Topics in Gaussian rough paths theory

Submitted by Diplom-Mathematiker Sebastian Riedel, Hannover

From Faculty II - Mathematics and Natural Sciences of the Technische Universität Berlin, in fulfilment of the requirements for the academic degree of

Doktor der Naturwissenschaften (Dr. rer. nat.)

approved dissertation

Doctoral committee:

Chair: Prof. Dr. Gitta Kutyniok
Examiner: Prof. Dr. Peter K. Friz
Examiner: Prof. Dr. Martin Hairer

Date of the scientific defence: 23 April 2013

Berlin 2013
D 83

Berlin, May 5, 2013

Acknowledgement

At first, I would like to express my gratitude to my PhD advisor, Professor Peter Friz, who constantly supported me during the time of my doctorate. In particular, I would like to thank Peter for the time he always found for discussing with me and for the patience he had. His enduring encouragement laid the basis for the current work. Next, I would like to thank Professor Martin Hairer for being my second examiner, and Professor Gitta Kutyniok, who kindly agreed to be the chair of the examination board. I am indebted to all my collaborators who worked with me during the last three years. Namely, I would like to thank Doctor Christian Bayer, Doctor Benjamin Gess, Professor Archil Gulisashvili, Professor Peter Friz, PD Doctor John Schoenmakers and Weijun Xu. Furthermore, I would like to thank Professor Terry Lyons for inviting me to Oxford during my PhD and for the valuable discussions we had. This work could not have been written without the financial support of the International Research Training Group "Stochastic models of complex processes" and the Berlin Mathematical School (BMS). I would like to thank all the people working there for the helpfulness and kindness they showed me during the last years. Special thanks go to Joscha Diehl, Clément Foucart, Birte Schröder and Maite Wilke Berenguer for reading parts of this thesis and giving valuable comments. At this point, I would like to mention my colleagues and the friends I met in the mathematical institute of the Technische Universität Berlin, who gave me a very warm welcome and provided an open and friendly atmosphere during the time of my doctorate. In particular, I would like to thank Professor Michael Scheutzow, my BMS mentor, who gave me a lot of helpful advice concerning my PhD. I am also more than thankful to the following people: Michele, who made me laugh uncountably many times and who introduced me to the dark secrets of pasta and Facebook.
To Joscha for the possibility to ask the really relevant questions about rough paths. Thank you, Simon, for many fruitful discussions about and not about math. Maite, thank you for making me get up on Monday before 7:00 by offering coffee and for the joint exercise sessions. Thanks to Clément for having many beers with me after work and for the first part of Brice de Nice. Last, thank you, Stefano, for B.F.H., the 1st of May and the 2nd chaos. At the end, I would like to thank my family, in particular my parents, who always had faith in what I am doing. Finally, my biggest thanks go to Birte for encouraging me during the last years, for sharing successes and defeats, joy and sorrow, and for chasing math away when it should not be there.

To Birte

Contents

Introduction

Notation and basic definitions

1 Convergence rates for the full Brownian rough paths with applications to limit theorems for stochastic flows
  1.1 Rates of convergence for the full Brownian rough path

2 Convergence rates for the full Gaussian rough paths
  2.1 Iterated integrals and the shuffle algebra
  2.2 Multidimensional Young integration and grid-controls
  2.3 The main estimates
  2.4 Main result

3 Integrability of (non-)linear rough differential equations and integrals
  3.1 Basic definitions
  3.2 Cass, Litterer and Lyons revisited
  3.3 Transitivity of the tail estimates under locally linear maps
  3.4 Linear RDEs
  3.5 Applications in stochastic analysis

4 A simple proof of distance bounds for Gaussian rough paths
  4.1 2D variation and Gaussian rough paths
  4.2 Main estimates
  4.3 Applications

5 Spatial rough path lifts of stochastic convolutions
  5.1 Main result
  5.2 Conditions in terms of Fourier coefficients
  5.3 Lifting Ornstein-Uhlenbeck processes in space

6 From rough path estimates to multilevel Monte Carlo
  6.1 Rough path estimates revisited
  6.2 Probabilistic convergence results for RDEs
  6.3 Giles' complexity theorem revisited
  6.4 Multilevel Monte Carlo for RDEs

Appendix
  A Kolmogorov theorem for multiplicative functionals

Introduction

“What I don’t like about measure theory is that you have to say ’almost everywhere’ almost everywhere.” – Kurt Friedrichs

“What is Ω? Cats?” – Michele Salvi

In order to describe the research contributions of this thesis correctly, we begin with a brief history of the theory of stochastic integration, with a focus on the attempts to define a pathwise stochastic integral. We will explain the original goal of the pioneers of the field as well as the generalisations made by some of its leaders. In particular, we will highlight the link between pathwise stochastic integration and Gaussian analysis. The introduction closes with an outline of our results. A basic problem in stochastic analysis is to give a meaning to differential equations of the form

Ẏ_t = f(Y_t) Ẋ_t;  Y_0 = ξ ∈ W,  (1)

Y taking values in some Banach space W, X : [0, T] → V being some random signal with values in a Banach space V, and f taking values in the space of linear maps from V to W. In a deterministic setting, these equations are also called controlled differential equations. In many cases in stochastics, it is natural to assume that Ẋ denotes some "noise" term which can formally be written as the differential of a Brownian motion B. However, this causes problems when we try to give a rigorous meaning to (1). In fact, a famous property of the trajectories t ↦ ω(t) of the Brownian motion, i.e. of its sample paths, is their non-differentiability on a set of full measure. Therefore, we cannot apply the deterministic theory of controlled differential equations. One approach is to rewrite the differential equation (1) as an integral equation:

Y_t = Y_0 + ∫_0^t f(Y_s) dX_s.  (2)

By doing this, we shift the problem of defining (1) to the problem of how to define the (stochastic) integral in (2). More generally, we may ask the following question: How can we define a stochastic integral of the form

∫_0^t Y_s dX_s  (3)

where X and Y are stochastic processes taking values in V resp. L(V, W)? There are basically two strategies we can follow. The first one ignores all the probabilistic structure the processes X and Y might have and tries to build up a deterministic theory of integration which is rich enough to integrate all sample paths of X and Y with respect to each other. We will call this the pathwise approach. The second strategy uses the probabilistic properties of the processes under consideration in order to define the integral; we will call this the probabilistic approach. We will see that Lyons' rough paths theory can be seen as a pathwise approach, whereas the classical Itô theory is rather a probabilistic approach. In the following, we will summarise the most important attempts at defining stochastic integrals of the form (3) in order to better understand the contribution of rough paths theory in the context of stochastic integration. This permits us to explain the notion of Gaussian rough paths, which provides the framework for this thesis.

Young's approach. The first and probably most "natural" pathwise approach is to define the integral (3) as the limit (at least in probability) of Riemann sums:

∫_0^t Y_s dX_s = lim_{|Π|→0} Σ_{t_i ∈ Π} Y_{t_i} (X_{t_{i+1}} − X_{t_i}),  (4)

where the Π are finite partitions of the interval [0, t]. It is commonly known that this limit exists (pathwise) if the sample paths of X have bounded variation (which is the same as saying that the sample paths have finite length):

lim_{|Π|→0} Σ_{t_i ∈ Π} |X_{t_{i+1}} − X_{t_i}| < ∞  a.s.

Unfortunately, this is not the case for the Brownian motion, and the above quantity will be infinite almost surely in this case. A more elaborate approach was developed by Laurence C. Young, starting from the article [You36] and further developed in a series of papers. Recall the notion of p-variation, a generalisation of the concept of bounded variation: if x : [0, T] → V is a path and p ≥ 1, the p-variation of x is defined as

sup_Π ( Σ_{t_i ∈ Π} |x_{t_{i+1}} − x_{t_i}|^p )^{1/p}.
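For illustration, the behaviour of such sums can be checked numerically. The following minimal Python sketch (all names and parameters here are illustrative choices, not part of the thesis) computes Σ|x_{t_{i+1}} − x_{t_i}|^p along dyadic partitions of a simulated Brownian path; note that these sums only give a lower bound for the p-variation, which is a supremum over all partitions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Sample a Brownian path on a fine dyadic grid of [0, 1]:
# 2**16 increments, each distributed as N(0, 2**-16).
N = 2**16
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / N), size=N))])

def dyadic_variation_sum(path, level, p):
    """Sum of |increments|**p along the dyadic partition with 2**level
    intervals; a lower bound for the p-variation (a sup over ALL partitions)."""
    step = (len(path) - 1) // 2**level
    incr = np.diff(path[::step])
    return np.sum(np.abs(incr) ** p)

# Along refining dyadic partitions, the sums blow up for p < 2,
# stabilise near t = 1 for p = 2, and vanish for p > 2.
for p in (1.0, 2.0, 3.0):
    print(p, [round(dyadic_variation_sum(B, lvl, p), 4) for lvl in (4, 8, 12)])
```

This dichotomy, blow-up for p < 2 versus stabilisation at the quadratic variation for p = 2, is exactly the regime discussed for Brownian motion below.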

The main theorem of Young can be stated as follows: If x and y are paths of finite p- resp. q-variation with 1/p + 1/q > 1, the limit in (4) exists and can be bounded in terms of the p- and q-variation of x and y. Let us note that the condition 1/p + 1/q > 1 is necessary; indeed, Young gives a counterexample by constructing paths x and y which have finite 2-variation only and for which the Riemann sums (4) diverge. Recall now that our initial aim was to solve integral equations of the form

y_t = y_0 + ∫_0^t f(y_s) dx_s.  (5)

If this equation has a solution, we expect (at least for smooth f) that the solution y has a regularity (on small scales) similar to the regularity of x. In other words, if x has finite p-variation, y and f(y) should also have finite p-variation. This means that as long as x has finite p-variation for some p < 2, equation (5) should be solvable. That this is indeed the case was first rigorously worked out - to the author's knowledge - by Terry Lyons in [Lyo94] for finite dimensional Banach spaces, see also [LCL07] for the general case. Lyons solves equation (5) by a Picard iteration scheme and shows that the solution y varies continuously in x with respect

¹ Note here that Young considers the case of complex-valued paths only; however, the same proof also works in Banach spaces, cf. [LCL07].

to p-variation topology. Let us go back to stochastics now. If we consider again the Brownian motion B, it turns out that

sup_Π Σ_{t_i ∈ Π} |B_{t_{i+1}} − B_{t_i}|² = ∞  a.s.

(for a proof, cf. [FV10b, Section 13.9]), hence the sample paths of the Brownian motion do not have finite p-variation (almost surely) for p ≤ 2. In other words, the regularity of the trajectories slightly fails to fulfill the necessary regularity condition, and therefore the theory of Young integration cannot be applied in the Brownian motion case. This, of course, is a remarkable drawback of Young's theory. However, it can still be used to solve equations of the form (2) if the trajectories of the driving signal X are "not too rough"; for instance, it applies when X is a fractional Brownian motion with Hurst parameter H > 1/2 (the precise definition of a fractional Brownian motion will be given below).

Itô's theory of stochastic integration. We will only consider finite dimensional Banach spaces in this paragraph. In the seminal work [Itô44], Kiyosi Itô was the first to give a satisfactory definition of the stochastic integral (3) in the case where X is a Brownian motion. In [Itô51], he used this definition to solve differential equations driven by a Brownian motion. Since his approach differs very much from the pathwise approach, we decided to sketch it briefly. In modern language, the Itô integral is constructed by first identifying a family of "simple" processes which are piecewise constant, left-continuous and adapted with respect to the Brownian filtration. Adaptedness can be understood as saying that at time t, the process does not have more "information" about the Brownian motion than the Brownian motion provides up to time t (for instance, it cannot look into the future). This is of course a probabilistic notion. The integral is then defined in a natural way with respect to these processes. One realizes that these simple processes and the stochastic integral both belong to a certain space of processes called martingale spaces, and the stochastic integral defines an isometry between these spaces.
Taking the closure in the space of integrable processes then defines the stochastic integral. In contrast to the pathwise approach, the Itô integral is thus defined as an element of some space of processes via an isometry. In a second step, one can show that the Riemann sums (4) also converge to this object, but in general only in probability (which is weaker than almost sure convergence). The theory of Itô integration has had an enormous success and is now widely used in stochastic calculus. Together with a change of variable formula, called Itô's Lemma, it provides a powerful tool to solve stochastic differential equations of the form (2), even for more general driving processes X. However, the theory has certain constraints. We list two of them:

(i) The class of driving processes is (essentially) limited to (semi-)martingales, i.e. to processes which have the probabilistic properties of a "fair game". It is not hard to imagine models (e.g. in finance) for which the driving signal does not have this structure.

(ii) Since the integral is defined in a "global" way, it is a priori not clear what happens on the level of trajectories. Recall that Lyons proved pathwise continuity of the map x ↦ I_f(x, ξ) := y, y being the solution of (5), when x is a path of finite p-variation for p < 2. However, for Brownian trajectories ω, we do not know which regularity properties the map ω ↦ I_f(ω, ξ) enjoys. We will actually see that it is not (and cannot be) continuous.
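For illustration, the probabilistic construction can be observed numerically for ∫_0^1 B_s dB_s: left-point Riemann sums approximate the Itô integral, while the midpoint-type sums discussed next approximate a different object. A minimal sketch (illustrative only, not part of the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)

# Brownian path on a fine grid of [0, 1].
n = 2**16
dB = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])

left = B[:-1]                  # Y_{t_i}: left-point sums, Ito-type
mid = (B[:-1] + B[1:]) / 2.0   # (Y_{t_i} + Y_{t_{i+1}})/2: midpoint-type

ito = np.sum(left * np.diff(B))
strat = np.sum(mid * np.diff(B))
qv = np.sum(np.diff(B) ** 2)   # quadratic variation, close to t = 1

# Exact telescoping identities on any partition:
#   ito   = (B_1**2 - qv) / 2
#   strat =  B_1**2 / 2
# so strat - ito = qv / 2, which converges to 1/2: the two limits differ.
print(ito, strat, strat - ito)
```

The gap strat − ito is half the quadratic variation of the path, which is exactly the Itô correction term appearing below.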

The Itô integral has some unexpected properties. For instance, when replacing the Riemann sums in (4) by

Σ_{t_i ∈ Π} Ỹ_{t_i} (X_{t_{i+1}} − X_{t_i}),


where Ỹ_{t_i} = (Y_{t_i} + Y_{t_{i+1}})/2, and then passing to the limit |Π| → 0, we still have convergence (in probability), but not to the same object, which we call the Stratonovich integral in this case. This phenomenon does not occur for the Riemann-Stieltjes or the Young integral. Moreover, the change-of-variable formula (or Itô formula) for the Itô integral contains an additional, unexpected term, the Itô correction term. This term does not occur for the Stratonovich integral. This already indicates that stochastic integration is very different from the usual integration theory we know, and that one has some "freedom" when defining the integral.

Föllmer's "Itô formula without probability". An interesting contribution in the direction of a pathwise approach was made by Hans Föllmer in the year 1981 in the work [Föl81]. Föllmer considers the quadratic variation [x] of a continuous, real valued path x with respect to a sequence of partitions Π_n whose mesh size tends to 0 as n → ∞, defined by

lim_{n→∞} Σ_{t_i ∈ Π_n, t_i ≤ t} |x_{t_{i+1}} − x_{t_i}|² =: [x]_t.

Föllmer shows that one can then define the integral

∫_0^t f'(x_s) dx_s

as the limit of Riemann sums along the sequence of partitions Π_n. In the case of the Brownian motion, it is well known that for its trajectories ω we have [ω]_t = t almost surely for any sequence of nested partitions (Π_n)_n, hence the integral

∫_0^t f'(B_s) dB_s

can be defined in a pathwise manner. The work of Föllmer is interesting for us since he found a sufficient criterion, finiteness of the quadratic variation, for defining a stochastic integral in a pathwise sense.

Lyons' key insights and the birth of rough paths theory. In the humble opinion of the author, the final breakthrough in the task of defining stochastic integrals in a pathwise manner was made by Terry Lyons. For a better understanding of the issues, we will first give some "negative" results which show what will not work. Then we will try to sketch the main ideas of Lyons which constitute what is known today under the term rough paths theory. In the work [Lyo91], Lyons proves the following result: Let C ⊂ C([0, T], R) be a class of paths for which the Itô (or Stratonovich) integral ∫ µ dν exists (as a limit of Riemann sums) for all µ, ν ∈ C. Then C has Wiener measure 0. This result shows that even if we managed to extend the definition of the Young integral to a wider class of paths, using, for instance, a finer notion than the notion of p-variation, we would never be able to integrate all Brownian paths with respect to each other. One could hope that a different and maybe more sophisticated definition of the integral might help us get out of this trouble. That this is not the case is

shown by Lyons in [LCL07, Proposition 1.29]: Let B be a Banach space on which the Wiener measure can be defined in a natural way². Then there is no bilinear, continuous functional I : B × B → R for which I(µ, ν) = ∫_0^1 µ_t dν_t when µ, ν ∈ B are trigonometric polynomials. This implies that whatever we choose as a linear subspace³ C ⊂ C([0, T], R) (where C should be at least rich enough to handle Brownian paths), we will never be able to define a bilinear, continuous functional I : C × C → R (which should be thought of as our integral) which fulfills the basic requirement that I(µ, ν) = ∫_0^1 µ_t dν_t holds for all µ, ν ∈ {cos(2πn·), sin(2πn·) | n ∈ N}. On the level of controlled differential equations, Lyons proves the following (cf. [LCL07, Section 1.5.2]): The map x ↦ I_f(x, ξ) is not continuous in 2-variation topology. The way Lyons proves this gives us a first hint of what goes wrong for p ≥ 2 and how one might overcome this issue. Lyons defines f in such a way that the solution y = I_f(x, ξ) is given explicitly as

y_t = (y_t^1, y_t^2) = ( x_t − x_0, ∫_{0<u_1<u_2<t} dx_{u_1} ⊗ dx_{u_2} ) ∈ V ⊕ (V ⊗ V),

so that the map

( x_· − x_0, ∫_{0<u_1<u_2<·} dx_{u_1} ⊗ dx_{u_2} ) =: x_· ↦ I_f(x, ξ)

becomes continuous once the path is augmented by its iterated integral in this way.

² cf. [LCL07, Proposition 1.29] for the precise definition here.
³ We will see that the condition of a linear subspace will be crucial; indeed, our integration theory for rough paths will not be linear.


Consequently, he defines a p-rough path x to be the path x together with its first ⌊p⌋ iterated integrals. One should note at this point that rough paths spaces are not vector spaces (they actually cannot be linear if we want to be able to integrate Brownian paths, as we saw before) but are metric spaces. The distance of two rough paths x and y takes into account the distance in p-variation between the paths x and y and between the higher order iterated integrals. Lyons then defines a notion of an integral along a rough path x. Note that in general it will not be possible to integrate two rough paths x and y with respect to each other, since the joint integral would necessarily contain mixed integrals of x and y and hence information which is not included in the respective rough paths. Instead, Lyons first defines the integral over (sufficiently smooth) 1-forms α : V → L(V, W): If x is a p-rough path,

∫ α(x) dx

is defined to be another p-rough path, and the map x ↦ ∫ α(x) dx is continuous in rough paths topology. If we can make sense of (x, y) as a joint rough path⁴, we will be able to define the integral

∫ f(y) dx

for sufficiently smooth functions f : W → L(V, W) as a rough integral. It turns out that in the situation of controlled differential equations we can indeed follow this strategy. Lyons solves the equation

dy_t = f(y_t) dx_t;  y_0 = ξ  (6)

via a Picard iteration for a p-rough path x and sufficiently smooth f. The solution y is again a p-rough path, and the map x ↦ y =: I_f(x, ξ) is seen to be continuous in rough paths topology. These results are technically quite involved, but now well understood and outlined in several monographs (cf. [LQ98], [LCL07]). Before we come to the application of rough paths theory in the field of stochastic analysis, we will give some further remarks concerning the deterministic theory.

(i) If (x_n) is a sequence of smooth paths, we can solve equation (6) and obtain smooth solutions (y_n). If the iterated integrals of x_n converge to a rough path x, continuity of the map x ↦ I_f(x, ξ) implies that the iterated integrals of y_n also converge to the solution y, and the limit does not depend on the choice of the initial sequence. This theorem is known as the universal limit theorem. It can be seen as a deterministic analogue of the well-known Wong-Zakai theorem for Stratonovich stochastic differential equations.

(ii) The statement that the necessary extra information to define a rough path is encoded in its iterated integrals is slightly misleading. In fact, the information is encoded in all iterated integrals indexed by rooted trees (cf. Gubinelli's work [Gub10] for a clarification). However, the original statement is correct when we define the product of two iterated integrals in such a way that the algebra of iterated integrals is isomorphic to the shuffle algebra. In this case, the rough path x has a nice geometric feature; namely, it is seen to take values in a Lie group, the free nilpotent group of step ⌊p⌋ over V. Such paths are called weakly geometric rough paths. Iterated integrals of smooth paths also take values in this Lie group, and taking the closure with respect to the p-variation metric defines the space of geometric rough paths. Every geometric rough path is also weakly geometric, but the converse is false, cf. Friz and Victoir [FV06a]. The geometric point of view of rough paths theory is worked out in great detail by Friz and Victoir in the monograph [FV10b].

⁴ This is, of course, stronger than just defining x and y as rough paths; the situation can be compared to the fact that the distributions of two random variables X and Y do not determine the joint distribution of (X, Y).

(iii) Although the space of rough paths is not a linear space, one can show that for a fixed reference rough path x, there is a linear space of paths whose elements can all be integrated with respect to x. These spaces are called spaces of controlled paths and were introduced by Gubinelli in [Gub04]. The integration theory for controlled paths is often more flexible and easier to handle than Lyons' original integration theory and is now widely used, see also the forthcoming monograph [FH].

(iv) Rough paths theory was, from the very beginning, closely related to numerical approximation schemes. In the work [Dav07], Davie showed that deterministic Euler and Milstein schemes converge to the solution of the respective rough differential equation. This was generalised to step-N Taylor schemes for geometric rough paths by Friz and Victoir in [FV08b], see also [FV10b, Chapter 10].

(v) The map (x, f, ξ) ↦ I_f(x, ξ), which we will call the Itô-Lyons map in the following, is even more regular than already stated. In fact, it can be seen that it is locally Lipschitz continuous in every argument (cf. [FV10b, Chapter 10]). Moreover, the map x ↦ I_f(x, ξ) is even Fréchet differentiable (cf. Li and Lyons [LL06] for the case p < 2 and Friz and Victoir [FV10b, Theorem 11.6] for the general case of geometric rough paths).
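For illustration, two of the objects appearing in these remarks can be computed explicitly for piecewise linear paths: the level-2 iterated integral, whose antisymmetric part is the Lévy area, and the shuffle product from remark (ii). A minimal sketch (function names and the example path are our own choices, not from the thesis):

```python
import numpy as np

def level2(points):
    """Second iterated integral int_{0<u1<u2<T} dx ⊗ dx of the piecewise
    linear path through `points` (shape (n, d)), built segment by segment:
    each segment contributes (x_{t_k} - x_0) ⊗ incr + (1/2) incr ⊗ incr."""
    points = np.asarray(points, dtype=float)
    d = points.shape[1]
    pos = np.zeros(d)          # running increment x_t - x_0
    out = np.zeros((d, d))
    for incr in np.diff(points, axis=0):
        out += np.outer(pos, incr) + 0.5 * np.outer(incr, incr)
        pos = pos + incr
    return out

def shuffle(u, v):
    """All shuffles of the words u and v (with multiplicity)."""
    if not u: return [v]
    if not v: return [u]
    return [u[0] + w for w in shuffle(u[1:], v)] + \
           [v[0] + w for w in shuffle(u, v[1:])]

# Unit square traversed counterclockwise: the path returns to its start,
# so the symmetric part vanishes and the Levy area
# A = (1/2)(I[0,1] - I[1,0]) equals the enclosed signed area, 1.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
I = level2(square)
print(0.5 * (I[0, 1] - I[1, 0]))  # -> 1.0

# The shuffle of two letters encodes integration by parts:
# "integral times integral = sum over shuffled iterated integrals".
print(shuffle("a", "b"))  # -> ['ab', 'ba']
```

The square example also shows why the area is genuinely extra information: the path increment x_T − x_0 is zero, yet the level-2 object is not.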

Rough paths theory applied to stochastic analysis. Let us now go back to the initial problem of solving controlled differential equations driven by some random signal X. If we want to apply rough paths theory, we have to say what the iterated integrals of X should be. On the level of trajectories, it is not clear what a "natural" choice of an iterated integral is⁵. We will see that taking into account the probabilistic properties of the process helps to find a "natural" candidate for an iterated integral. From now on, we will only consider finite dimensional Banach spaces. In the case of the Brownian motion with independent components, the natural choices for the iterated integrals are the usual Itô and Stratonovich integrals:

∫ dB_{u_1} ⊗ dB_{u_2},   ∫ ∘dB_{u_1} ⊗ ∘dB_{u_2}.

⁵ However, it can be seen that every path of finite p-variation can be lifted to a geometric rough path, cf. Lyons and Victoir [LV07]. The problem is that this lift is not (and cannot be) unique.

The sample paths of a fractional Brownian motion B^H with Hurst parameter H are seen to be α-Hölder continuous for every α < H, hence we can apply Young's integration theory for H > 1/2. For H ≤ 1/2, the sample paths fail to be α-Hölder for α > 1/2. The question now is: are there still "natural" choices of iterated integrals with respect to a fractional Brownian motion in the case H < 1/2? The first article which gave an answer to this question is the work of Coutin and Qian [CQ02]. The authors consider the sequence (Π_n) of dyadic partitions of the interval [0, t] and define the process B^H(n) to be the process B^H with sample paths piecewise linearly approximated at the points of Π_n. Considering the process

( B_t^H(n) − B_0^H(n), ∫_{0<u_1<u_2<t} dB_{u_1}^H(n) ⊗ dB_{u_2}^H(n) ),

they show that for H > 1/3 this is a Cauchy sequence almost surely in rough paths topology. The limiting object is defined to be the rough path lift of B^H. The authors also consider the third iterated integral and prove convergence, hence there is also a lift for H > 1/4. However, for H = 1/4, they can show that the second iterated integral diverges. This notion of a rough path lift of a fractional Brownian motion is indeed quite natural, for at least two reasons. First, for H = 1/2, we recover the usual Stratonovich integral. Second, it generalises the Wong-Zakai theorem to the fractional Brownian motion: If Y(n) denotes the solution of the random controlled differential equation

dY(n)_t = f(Y(n)_t) dB_t^H(n);  Y(n)_0 = ξ,

then, by the universal limit theorem, Y(n) converges almost surely in rough paths topology to a limit which is precisely the solution Y of the corresponding rough differential equation (projected to the first tensor level). We would like to mention at this point that there are also different approaches to defining a rough path lift of a fractional Brownian motion (cf. e.g. [Unt09] and the following articles by the same author), but we will not comment on these approaches here.

Gaussian rough paths in the sense of Friz–Victoir. In [FV10a], Friz and Victoir generalise the method of Coutin and Qian and give a sufficient criterion on the covariance function R under which a given Gaussian process can be lifted "in a natural way" to a process with sample paths in a rough paths space. In this thesis, we will always work in their framework, therefore we decided to sketch their main ideas here. Let X = (X^1, ..., X^d) be a d-dimensional Gaussian process with independent and identically distributed⁶ components. The main problem is to make sense of the integral

∫_0^t (X_s^i − X_0^i) dX_s^j

for i ≠ j. If the trajectories of X are differentiable, we can formally calculate the second moment:

E[ ( ∫_0^t (X_s^i − X_0^i) dX_s^j )² ] = E ∫_0^t ∫_0^t (X_s^i − X_0^i)(X_u^i − X_0^i) ∂_s X_s^j ∂_u X_u^j ds du
= ∫_{[0,t]²} E[(X_s^i − X_0^i)(X_u^i − X_0^i)] ∂_s ∂_u E[X_s^j X_u^j] ds du
= ∫_{[0,t]²} ( R(s, u) − R(s, 0) − R(0, u) + R(0, 0) ) dR(s, u),

where R denotes the covariance function and the right hand side is a suitable version of a 2D Young integral. Fortunately, there is indeed a theory of two-dimensional Young integration (developed by Towghi in [Tow02]) and we can bound the right hand side in terms of the 2

⁶ The assumption that the components have the same distribution is not really necessary and is only made for the sake of simplicity.

dimensional ρ-variation of R provided ρ < 2. Natural approximations of the sample paths of the process X (such as piecewise linear approximation or convolution with a smooth function) yield approximations of the covariance function whose ρ-variation is seen to be uniformly bounded. The following result should therefore not come as a surprise: Assume that the covariance function of every component of X has finite ρ-variation for some ρ < 2. Then there exists a natural lift of X to a process with values in a rough paths space. The lift of the process will be denoted by X in the following. The results of Friz and Victoir are sharp in the sense that the covariance function of a fractional Brownian motion is seen to have finite ρ-variation for ρ = 1/(2H), but not better. The threshold ρ = 2 therefore corresponds to the Hurst parameter H = 1/4, for which Coutin and Qian already showed that the natural approximation of the second iterated integral diverges. Once the existence of a Gaussian rough paths lift is established under this very general condition, it can be shown that many theorems from stochastic analysis proven for the Brownian motion generalise to Gaussian rough paths. For instance, in the article [FV10a], Friz and Victoir prove Fernique estimates for the lift X (see also the work of Friz and Oberhauser [FO10] for a different proof of this result). A support theorem for Gaussian rough paths is proven in [FV10a, Theorem 55]. A large deviation principle for the lift of a fractional Brownian motion was proven by Millet and Sanz-Solé in [MSS06] and later generalised to Gaussian rough paths by Friz and Victoir in [FV10b, Theorem 15.55]. A Malliavin-type calculus was established (cf. Cass, Friz and Victoir [CFV09], Friz and Victoir [FV10b, Chapter 20], Cass and Friz [CF11]) and a Hörmander-type theorem for Gaussian rough paths can be proven (cf. Cass, Friz [CF10], Cass, Litterer, Lyons [CLL], Cass, Hairer, Litterer and Tindel [CHLT12]).
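For illustration, the objects in this computation are completely explicit for fractional Brownian motion, whose covariance is R(s, t) = (s^{2H} + t^{2H} − |t − s|^{2H})/2; the rectangular increments of R are exactly the quantities E[(B_t − B_s)(B_v − B_u)] whose 2D ρ-variation is measured. A minimal sketch (illustrative only, not part of the thesis):

```python
def R(s, t, H):
    """Covariance of fractional Brownian motion with Hurst parameter H."""
    return 0.5 * (s**(2*H) + t**(2*H) - abs(t - s)**(2*H))

def rect_increment(s, t, u, v, H):
    """R([s,t] x [u,v]) = E[(B_t - B_s)(B_v - B_u)], the 2D increment of R."""
    return R(t, v, H) - R(s, v, H) - R(t, u, H) + R(s, u, H)

# For H = 1/2 (Brownian motion), increments over disjoint intervals are
# independent, so off-diagonal rectangular increments vanish ...
print(rect_increment(0.1, 0.2, 0.5, 0.6, 0.5))

# ... while for H < 1/2 they are negative and for H > 1/2 positive,
# reflecting the correlation structure that the rho-variation quantifies.
print(rect_increment(0.1, 0.2, 0.5, 0.6, 0.25))
print(rect_increment(0.1, 0.2, 0.5, 0.6, 0.75))
```

The blow-up of the 2D ρ-variation of R as H decreases toward 1/4 is what the sharpness statement above refers to.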

The results of this thesis

We will now summarise the main contributions of this thesis. More details on the respective results can be found at the beginning of the corresponding chapters.

In Chapter 1, we consider the Brownian rough paths lift B, seen as the increments of a Brownian motion B : [0, 1] → R^d together with its iterated Stratonovich integrals. By Lyons' extension theorem, we can lift the sample paths of B to any p-rough paths space provided p > 2. If we approximate the trajectories of the underlying Brownian motion piecewise linearly at the points {0 < 1/n < 2/n < ... < 1}, we obtain another process B_n with piecewise linear trajectories. This process can be lifted to a process B_n with sample paths in a p-rough paths space using Riemann-Stieltjes theory. The first result is the following.

Theorem I. For all p > 2 and η < 1/2 − 1/p,

ρ_{1/p-Höl}(B, B_n) ≤ C (1/n)^η

almost surely for all n ∈ N, where C is a finite random variable.

Here ρ_{1/p-Höl}(·, ·) denotes a rough paths metric. Note that the convergence rate increases for large p but does not exceed 1/2. From the local Lipschitz continuity of the Itô–Lyons map, we immediately obtain convergence rates for the Wong–Zakai theorem. Moreover, for sufficiently smooth vector fields, the solution flow of a rough differential equation is differentiable, and our convergence rates for the Wong–Zakai theorem also hold true on the level of flows:

Theorem II. Let f = (f_0, f_1, ..., f_d) be smooth vector fields and consider the (random) flow y_0 ↦ U_{B_n, t←0}(y_0) on R^e defined by

dy = f_0(y) dt + Σ_{i=1}^d f_i(y) dB_n^i,  y(0) = y_0.


Then a.s.

U_{B_n, t←0}(y_0)

converges uniformly (as do all its derivatives in y_0) on every compact subset K ⊂ [0, ∞) × R^e; and the limit

U_{B, t←0}(y_0) := lim_{n→∞} U_{B_n, t←0}(y_0)

solves the Stratonovich SDE

dy = f_0(y) dt + Σ_{i=1}^d f_i(y) ∘dB^i,  y(0) = y_0.

Moreover, for every η < 1/2, every k ∈ {1, 2, ...} and every compact K ⊂ [0, ∞) × R^e, there exists an a.s. finite random variable C such that

max_{α = (α_1, ..., α_e), |α| = α_1 + ... + α_e ≤ k} |∂_α U_{B, ·←0}(·) − ∂_α U_{B_n, ·←0}(·)|_{∞;K} ≤ C (1/n)^η

for all n ∈ N. Note that this implies an almost sure Wong–Zakai convergence rate of (almost) 1/2, which is known to be sharp (modulo possible logarithmic corrections). The results in this chapter were obtained in collaboration with Prof. Peter Friz and are published in the journal Bulletin des Sciences Mathématiques, see [FR11].
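For illustration, the first level of this convergence, the uniform distance between a Brownian path and its piecewise linear approximation at n points, can be observed in simulation. This toy check (all parameters are illustrative choices; it says nothing about the higher rough path levels treated by the theorems) exhibits the roughly n^{-1/2} decay:

```python
import numpy as np

rng = np.random.default_rng(7)

# Fine Brownian path on [0, 1] used as the "true" trajectory.
N = 2**14
t = np.linspace(0.0, 1.0, N + 1)
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / N), size=N))])

def sup_error(n):
    """Uniform distance between B and its piecewise linear interpolation
    at the points k/n (evaluated on the fine grid; n should divide N)."""
    knots = np.linspace(0.0, 1.0, n + 1)
    Bn = np.interp(t, knots, np.interp(knots, t, B))
    return np.max(np.abs(B - Bn))

for n in (16, 64, 256, 1024):
    print(n, sup_error(n))
```

The printed errors shrink roughly by a factor of two per fourfold refinement, consistent with a rate of (almost) 1/2 up to logarithmic corrections.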

In Chapter 2, we generalise the results of Chapter 1 to lifts of Gaussian processes X in the sense of Friz–Victoir. Again, X_n denotes the process with piecewise linearly approximated trajectories. Our main theorem can be stated as follows:

Theorem III. Assume that the covariance of X has finite ρ-variation in the 2D sense and that the ρ-variation over every square [s, t]² ⊂ [0, 1]² can be bounded by a constant times |t − s|^{1/ρ}. Then for all η < 1/ρ − 1/2 and p > 2ρ/(1 − 2ρη),

ρ_{1/p-Höl}(X, X_n) → 0

as n → ∞, almost surely and in L^q for any q ≥ 1, with rate η.

Note again that a good convergence rate forces p to be chosen large. Note also that our theorem holds for much more general approximations than piecewise linear ones. As a consequence, we obtain almost sure convergence rates for the Wong–Zakai theorem for Gaussian rough paths.

Corollary IV. Let $f = (f_0, f_1, \dots, f_d)$ be smooth vector fields and consider the random controlled differential equation

$$dY_n = f_0(Y_n)\, dt + \sum_{i=1}^d f_i(Y_n)\, dX_n^i; \qquad Y_n(0) = \xi.$$

Then a.s. $Y_n \to Y$ uniformly for $n \to \infty$ with rate $\eta$ for any $\eta < \frac{1}{\rho} - \frac{1}{2}$, and the limit solves the random rough differential equation $dY = f(Y)\, d\mathbf{X};\ Y(0) = \xi$.

Recall that Davie presented a step-2 Taylor scheme for solving rough differential equations (RDEs) and computed the convergence rate (cf. [Dav07]). Step-N schemes with convergence rates are considered in Friz and Victoir [FV10b, Chapter 10]. In [DNT12], Deya, Neuenkirch and Tindel present a simplified Milstein-type scheme for solving rough differential equations driven by a fractional Brownian motion. The advantage of this numerical scheme is that the iterated integrals (which are hard to simulate numerically) are replaced by a product of increments. Our results imply sharp convergence rates for these schemes in a general Gaussian setting.
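To make the idea of such a scheme concrete, here is a small Python sketch of one possible simplified Milstein-type step in the spirit of [DNT12]: the level-2 iterated integrals are replaced by half the product of the driver's increments. All names and the interface are our own illustration, not taken from the cited works.

```python
import numpy as np

def simplified_milstein(f0, fields, jacobians, xi, dX, dt):
    """Run a simplified Milstein-type scheme: the iterated integral of the
    driver over each step is replaced by (1/2) * dX^i * dX^j.

    f0        -- drift vector field, maps R^e -> R^e
    fields    -- list of d diffusion vector fields f_i : R^e -> R^e
    jacobians -- list of d Jacobians Df_i : R^e -> R^{e x e}
    xi        -- initial condition in R^e
    dX        -- array of shape (n, d): increments of the driving signal
    dt        -- mesh size of the time discretisation
    """
    y = np.asarray(xi, dtype=float)
    for k in range(dX.shape[0]):
        incr = f0(y) * dt
        for i, fi in enumerate(fields):
            incr = incr + fi(y) * dX[k, i]
        # second level: (1/2) dX^i dX^j stands in for the iterated integral
        for i, fi in enumerate(fields):
            for j, Dfj in enumerate(jacobians):
                incr = incr + 0.5 * (Dfj(y) @ fi(y)) * dX[k, i] * dX[k, j]
        y = y + incr
    return y
```

As a sanity check, for the smooth driver $X_t = t$ and the linear equation $dY = Y\,dX$ the scheme reproduces the exponential.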

Corollary V. The approximation $Y_n$ obtained by running a simplified step-3 Taylor scheme⁷ with mesh size $1/n$ for solving the random rough differential equation

dY = f(Y ) dX; Y (0) = ξ

converges almost surely uniformly to the solution $Y$ with rate $\eta$ for any $\eta < \frac{1}{\rho} - \frac{1}{2}$. This proves a conjecture stated by Deya, Neuenkirch and Tindel in the work [DNT12]. The results in this chapter were obtained in collaboration with Prof. Peter Friz and are accepted for publication by the journal Annales de l'Institut Henri Poincaré Probabilités et Statistiques, see [FR].

In Chapter 3, we consider the work [CLL] of Cass, Litterer and Lyons. Before we describe their results, it will be useful to make the following definition. Recall that for each Gaussian process X, there is an associated Cameron–Martin space (or reproducing kernel Hilbert space).

Definition VI. We say that complementary Young regularity holds for the trajectories of a Gaussian process and its Cameron–Martin paths if the Cameron–Martin space is continuously embedded in the space of paths which have finite q-variation, the trajectories of X have finite p-variation almost surely and

$$\frac{1}{p} + \frac{1}{q} > 1.$$

The condition assures that we can make sense of the Young integral between the Cameron–Martin paths and the trajectories of the process. In [FV06b, Corollary 1], Friz and Victoir show that complementary Young regularity holds for the fractional Brownian motion with Hurst parameter $H > \frac{1}{4}$, and from their work [FV10a, Proposition 17] it follows that complementary Young regularity holds for a Gaussian process $X$ for which the covariance has finite $\rho$-variation for $\rho < \frac{3}{2}$. The aim of the article [CLL] is to prove that the Jacobian of a Gaussian RDE flow has finite $L^q$ moments for every $q \ge 1$.⁸ They introduce a map which assigns an integer $N_\alpha(\mathbf{x})$ to a $p$-rough path $\mathbf{x}$ (which counts how often the $p$-th power of the $p$-variation of $\mathbf{x}$ exceeds the barrier $\alpha$). The main work of [CLL] is to show that if we replace the rough path $\mathbf{x}$ by the lift of a Gaussian process $X$, this number has tails which are strictly "better" than exponential tails. More precisely, $N_\alpha(\mathbf{X})$ is seen to have Weibull tails with shape parameter strictly greater than 1 provided the trajectories of the underlying Gaussian process $X$ and its Cameron–Martin paths have complementary Young regularity. Our first contribution is the identification of so-called locally linear maps $\Psi$, mapping from one rough paths space to another, under which the tail estimates remain valid. Our result is purely deterministic.
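For orientation, the Young integral appearing here is the limit of ordinary Riemann–Stieltjes sums, which exists as soon as $1/p + 1/q > 1$. The following Python fragment is our own toy illustration with a smooth integrator (not a rough trajectory), showing the left-point sums the definition is built on:

```python
import numpy as np

def riemann_stieltjes_sum(y, x):
    """Left-point Riemann-Stieltjes sum approximating the integral of y dx
    along a common time grid; for y of finite q-variation and x of finite
    p-variation with 1/p + 1/q > 1 these sums converge (Young)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(y[:-1] * np.diff(x)))

t = np.linspace(0.0, 1.0, 2001)
approx = riemann_stieltjes_sum(t, t ** 2)  # integral of t d(t^2) over [0,1]
```

Here the exact value is $\int_0^1 t\, d(t^2) = 2/3$, and the sums converge to it as the mesh is refined.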

⁷ In the case $\rho = 1$, a step-2 scheme converges with the same rate.
⁸ The motivation for this is that this result can be used to prove that the solution of a Gaussian rough differential equation has a smooth density with respect to Lebesgue measure at every fixed time point $t$, cf. also [CHLT12].


Theorem VII. Let $\Psi$ be a locally linear map. Then there is an $\alpha_0$ such that

$$N_{\alpha_0}(\Psi(\mathbf{x})) \le N_\alpha(\mathbf{x}).$$

Since the $p$-variation of $\mathbf{x}$ can be bounded by a constant times $N_\alpha(\mathbf{x})$, the tail estimates obtained for $N_\alpha(\mathbf{x})$ remain valid for the $p$-variation of $\mathbf{x}$. Rough integration and the Itô–Lyons map are examples of locally linear maps; hence we immediately obtain

Corollary VIII. Assume that complementary Young regularity holds for the trajectories of $X$ and its Cameron–Martin paths. Then the following objects have exponential tails:⁹

(i) The rough integral

$$\int \alpha(\mathbf{X})\, d\mathbf{X}$$

where α is a suitable one-form.

(ii) The p-variation of Y where Y solves the random rough differential equation

dY = f(Y) dX; Y (0) = ξ

for smooth and bounded vector fields f.

For linear vector fields f ∈ L(W, L(V,W )) =∼ L(V,L(W, W )), the situation is different.

Theorem IX. If y solves the linear rough differential equation

$$dy = f(y)\, d\mathbf{x} = f(d\mathbf{x})(y); \qquad y(0) = \xi, \qquad (7)$$

then

$$N_\alpha(\mathbf{y}) \le C\big(1 + |\xi|\big)^p \exp\big(C\, N_\alpha(\mathbf{x})\big)$$

for a constant $C$. In particular, if $N_\alpha(\mathbf{X})$ has Weibull tails with shape parameter strictly greater than 1 (which is the case if complementary Young regularity holds for the trajectories of $X$ and its Cameron–Martin paths), the $p$-variation of the solution $Y$ of the random linear rough differential equation (7) has finite $L^q$ moments for any $q \ge 1$.

Our estimates particularly imply that the Jacobian of a Gaussian RDE flow has finite Lq moments for any q ≥ 1, which was the main result of the work [CLL]. The estimates are also robust in the sense that they can be used to prove uniform tail estimates. As an example, we show that a certain rough integral over a family of Gaussian processes has uniformly Gaussian tails, a technical result needed by Hairer in [Hai11]. The results in this chapter were obtained in collaboration with Prof. Peter Friz and are published in the journal Stochastic Analysis and Applications, see [FR13].

In Chapter 4, we apply the methods from Lyons and Xu presented in [LX12] to bound the distance between two Gaussian rough paths in p-variation topology. Our estimates are very similar to the ones needed for proving Theorem III, but we show how to avoid the algebraic machinery presented in Chapter 2 and still get optimal bounds in the case ρ = 1. Our main theorem states the following:

⁹ In fact, our tail estimates are sharper and can be expressed in terms of Weibull tails, cf. Chapter 3. We restrict ourselves to exponential tails for the sake of simplicity.

Theorem X. Let $(X, Y) = (X^1, Y^1, \dots, X^d, Y^d)$ be a jointly Gaussian process and let $(X^i, Y^i)$ and $(X^j, Y^j)$ be independent for $i \ne j$. Assume that there is a $\rho \in [1, \frac{3}{2})$ such that the $\rho$-variation of $R_{(X^i, Y^i)}$ is bounded by a constant $K$ for every $i = 1, \dots, d$. Let $\gamma \ge \rho$ be such that $\frac{1}{\gamma} + \frac{1}{\rho} > 1$. Then, for every $p > 2\gamma$, $q \ge 1$ and $\delta > 0$ small enough, there exists a constant $C = C(K)$ such that

(i) if $\frac{1}{2\gamma} + \frac{1}{\rho} > 1$, then

$$\big| \rho_{p\text{-var}}(\mathbf{X}, \mathbf{Y}) \big|_{L^q} \le C \sup_t |X_t - Y_t|_{L^2}^{1 - \frac{\rho}{\gamma}}, \qquad (8)$$

(ii) if $\frac{1}{2\gamma} + \frac{1}{\rho} \le 1$, then

$$\big| \rho_{p\text{-var}}(\mathbf{X}, \mathbf{Y}) \big|_{L^q} \le C \sup_t |X_t - Y_t|_{L^2}^{3 - 2\rho - \delta}. \qquad (9)$$

In our theorem, $\rho_{p\text{-var}}(\cdot,\cdot)$ denotes a $p$-rough path metric. Note that for $\rho = 1$, we can always use the estimate (8). Inequality (8) is actually valid for all $\gamma \ge \rho$ provided $\frac{1}{\gamma} + \frac{1}{\rho} > 1$, which can be seen by using the techniques developed in Chapter 2, but one aim of Chapter 4 is to show that we can avoid a bulk of calculations and still obtain estimate (9) (which is not sharp though). One can show that our results imply convergence rates as in Theorem III, and for $\rho = 1$ we obtain optimal convergence. Another application of Theorem X appears in the field of stochastic partial differential equations. In [Hai11], Hairer considers the stationary solution $\psi$ of the equation

$$d\psi = (\partial_{xx} - 1)\psi\, dt + \sigma\, dW \qquad (10)$$

where $\sigma$ is a positive constant, the spatial variable $x$ takes values in $[0, 2\pi]$, $\partial_{xx}$ is equipped with periodic boundary conditions and $dW$ is space-time white noise, i.e. the noise of a standard cylindrical Wiener process on $L^2([0,2\pi], \mathbb{R})$. Hairer shows that for every fixed time point $t$, the Gaussian process $\bar\psi_t$ obtained by taking $d$ independent copies of the spatial process $\psi_t$ can be lifted to a process $\bar\Psi_t$ with values in a $p$-rough paths space for any $p > 2$. He also shows that there is a continuous modification of the map $t \mapsto \bar\Psi_t$. Our results imply optimal time regularity.

Corollary XI. There is an $\alpha$-Hölder continuous modification of the map $t \mapsto \bar\Psi_t$ for every $\alpha < \frac{1}{4} - \frac{1}{2p}$.

Note that the Hölder exponent increases for large $p$ and is bounded by $1/4$, which is known to be a sharp bound. The results in this chapter were obtained in collaboration with Weijun Xu and are available online, see [RX12].

In Chapter 5, we reinvestigate the solution of the modified heat equation (10). The space regularity of $\psi_t$ essentially depends on two factors: the smoothing effect of the operator $\partial_{xx}$ and the "colouring" of the noise $dW$. We have already seen that the crucial condition for lifting $\psi_t$ to a process in a rough paths space (in the sense of Friz–Victoir) is a sufficiently regular covariance function $R_{\psi_t}$ in terms of two-dimensional $\rho$-variation. The parameter $\rho$ should therefore also depend on the smoothing effect of the operator and the colouring of the noise.

Our main theorem determines the parameter $\rho$ for which the $\rho$-variation of $R_{\psi_t}$ is finite in terms of the spectrum of the operator and the noise. For simplicity, we only state the result for the fractional heat equation here.


Theorem XII. Let ψ be the stationary solution of the fractional, modified heat equation

$$d\psi = \big( -(-\partial_{xx})^\alpha - 1 \big)\psi\, dt + \sigma\, dW; \qquad \alpha \in (1/2, 1] \qquad (11)$$

with periodic boundary conditions where $dW$ is space-time white noise. Then the $\rho$-variation of $R_\psi$ is finite for $\rho = \frac{1}{2\alpha - 1}$. In particular, if we set $\bar\psi^\alpha := (\psi^1, \dots, \psi^d)$ where the $\psi^i$ are independent copies of $\psi$, for every fixed $t$ we can lift the trajectories of $\bar\psi^\alpha_t$ to $p$-rough paths in the sense of Friz–Victoir for all $p > \frac{2}{2\alpha - 1}$ provided $\alpha > 3/4$. Moreover, there is a Hölder continuous modification of the lifted process $t \mapsto \bar\Psi^\alpha_t$.

In addition, our results imply uniform bounds on the $\rho$-variation for viscosity and Galerkin approximations of (11) which can be used in a future work for numerical considerations. We also give a new and easy criterion on the covariance of a Gaussian process with stationary increments to have finite $\rho$-variation. If the process is given as a random Fourier series (as in the situation above), these conditions translate into conditions on the Fourier coefficients. The results in this chapter were obtained in collaboration with Prof. Peter Friz, Dr. Benjamin Gess and Prof. Archil Gulisashvili and are available online, see [FGGR12].

In Chapter 6 we come back to numerical considerations. Let $Y$ be the solution of a random rough differential equation

$$dY = f(Y)\, d\mathbf{X}; \qquad Y(0) = \xi, \qquad (12)$$

$\mathbf{X}$ being the lift of a Gaussian process whose covariance is of finite $\rho$-variation. In Corollary V, we saw that there is an easily implementable numerical scheme which converges almost surely to the solution of (12). Let $Y_n$ denote an approximation of $Y$ using such a scheme with mesh size $1/n$. Assume now that we are interested in evaluating a quantity of the form $E g(Y)$ where $g$ is a functional which may depend on the whole path of $Y$. The first obstacle is that we do not know, even for smooth $g$, with which rate $E g(Y_n)$ converges to $E g(Y)$ for $n \to \infty$, since we only proved almost sure convergence, not $L^1$ convergence (which would imply a convergence rate when $g$ is at least Lipschitz). This is our first result.

Theorem XIII. Assume that complementary Young regularity holds for the trajectories of X and its Cameron–Martin paths. Then the Wong-Zakai approximation in Corollary IV and the simple Taylor scheme in Corollary V both converge in Lq for any q ≥ 1 with the same convergence rate.

We would like to mention here that the proof of this theorem is more involved than one might expect at first sight. In fact, we improve the estimate for the Lipschitz constant of the Itô–Lyons map slightly, using similar estimates as in Chapter 3 for the case of linear rough differential equations, and then use the results of Cass, Litterer and Lyons in [CLL] to prove the assertions. As an immediate corollary of Theorem XIII, we obtain strong convergence rates for our numerical scheme in the case when $g$ is Lipschitz. However, at the present stage, we can only bound the weak convergence rate from below by the strong rate, whereas the weak rate might be better, at least for smooth $g$. If we want to evaluate $E g(Y)$, a Monte Carlo evaluation would be a possible and easy method. In the seminal work [Gil08b], Giles showed that one can reduce the computational complexity (more precisely: its asymptotics for a given mean squared error) dramatically when using a multilevel Monte Carlo method. For us, the multilevel method is also interesting because the strong convergence rate plays a more important role here than the weak one (which would be used when calculating the complexity of the usual Monte Carlo evaluation). Indeed, we can prove an abstract, more general complexity theorem as in [Gil08b] which fits our purposes. Applied to the evaluation of $E g(Y)$, we can prove the following result.

Theorem XIV. Assume that complementary Young regularity holds for the trajectories of $X$ and its Cameron–Martin paths and that $g$ is Lipschitz. Then the Monte Carlo evaluation of

a path-dependent functional of the form $E g(Y)$, to within a mean squared error of $\varepsilon^2$, can be achieved with computational complexity

$$O(\varepsilon^{-\theta}) \qquad \forall\, \theta > \frac{2\rho}{2 - \rho}.$$

In the case of a Brownian motion, the asymptotics of the computational complexity is bounded by $O(\varepsilon^{-\theta})$ for any $\theta > 2$, which is known to be sharp modulo logarithmic corrections, cf. [Gil08a, Gil08b]. Compared to a usual Monte Carlo method, we see that indeed a multilevel method decreases the computational complexity in the general Gaussian setting. The results in this chapter were obtained in collaboration with Dr. Christian Bayer, Prof. Peter Friz and PD Dr. John Schoenmakers.
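The generic multilevel mechanism behind such a complexity theorem can be sketched in a few lines; the telescoping-sum estimator below follows [Gil08b], while the `solver` interface (returning coupled fine and coarse approximations driven by the same sample) is a hypothetical stand-in of ours, not the actual scheme of this chapter.

```python
import numpy as np

def mlmc_estimate(solver, g, L, M):
    """Multilevel Monte Carlo estimator of E g(Y).

    solver(level, rng) -- returns (Y_fine, Y_coarse): approximations on two
                          consecutive mesh levels driven by the SAME sample
                          of the signal (Y_coarse is None on level 0)
    g                  -- functional whose expectation is wanted
    L                  -- finest level
    M                  -- list of sample sizes, one per level
    """
    rng = np.random.default_rng(0)
    est = 0.0
    for level in range(L + 1):
        corrections = []
        for _ in range(M[level]):
            y_fine, y_coarse = solver(level, rng)
            d = g(y_fine) - (g(y_coarse) if y_coarse is not None else 0.0)
            corrections.append(d)
        est += np.mean(corrections)  # telescoping sum over the levels
    return est
```

The point of the complexity analysis is that the sample sizes M[level] may decrease with the level: only the small corrections g(Y_fine) − g(Y_coarse), whose variance is governed by the strong convergence rate, must be resolved accurately.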


Notation and basic definitions

In this chapter, we introduce the most important concepts and definitions from rough path theory. For a detailed account, we refer to [FV10b], [LCL07] and [LQ02]. Fix a time interval $[0,T]$. For all $s < t \in [0,T]$, we define the $n$-simplex

$$\Delta^n_{s,t} := \{(u_1, \dots, u_n) \mid s < u_1 < \dots < u_n < t\}.$$

We will simply write $\Delta_{s,t}$ instead of $\Delta^2_{s,t}$ and $\Delta$ instead of $\Delta_{0,T}$. Let $(E, d)$ be a metric space and $x \in C([0,T], E)$. For $p \ge 1$ and $\alpha \in (0,1]$ we define

$$\|x\|_{p\text{-var};[s,t]} := \sup_{D \subset [s,t]} \Big( \sum_{t_i, t_{i+1} \in D} d(x_{t_i}, x_{t_{i+1}})^p \Big)^{1/p} \quad \text{and} \quad \|x\|_{\alpha\text{-Höl};[s,t]} := \sup_{(u,v) \in \Delta_{s,t}} \frac{d(x_u, x_v)}{|v - u|^\alpha}$$

where $D \subset [s,t]$ means that $D$ is a finite dissection of the form $\{s = t_0 < \dots < t_M = t\}$ of the interval $[s,t]$. We will use the shorthand notation $\|\cdot\|_{p\text{-var}}$ and $\|\cdot\|_{\alpha\text{-Höl}}$ for $\|\cdot\|_{p\text{-var};[0,T]}$ resp. $\|\cdot\|_{\alpha\text{-Höl};[0,T]}$, which are easily seen to be semi-norms. Given a positive integer $N$, the truncated tensor algebra of degree $N$ is given by the direct sum

$$T^N(\mathbb{R}^d) = \mathbb{R} \oplus \mathbb{R}^d \oplus (\mathbb{R}^d)^{\otimes 2} \oplus \dots \oplus (\mathbb{R}^d)^{\otimes N} = \bigoplus_{n=0}^N (\mathbb{R}^d)^{\otimes n}.$$
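For a path observed at finitely many times, the supremum in the definition of the $p$-variation norm runs over finitely many dissections and can be computed exactly by dynamic programming. The following one-dimensional Python sketch is our own illustration of the definition, not a tool used in the thesis:

```python
import numpy as np

def p_variation(x, p):
    """Exact discrete p-variation norm of a real-valued path sampled at
    indices 0..n-1: maximise sum |x_{t_{i+1}} - x_{t_i}|^p over all
    dissections via dynamic programming, then take the p-th root."""
    x = np.asarray(x, dtype=float)
    best = np.zeros(len(x))  # best[j] = optimal value for the piece up to j
    for j in range(1, len(x)):
        best[j] = max(best[i] + abs(x[j] - x[i]) ** p for i in range(j))
    return best[-1] ** (1.0 / p)
```

For p = 1 this recovers the total variation; e.g. the zig-zag 0, 1, 0, 1 has 1-variation 3.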

We will write $\pi_n : T^N(\mathbb{R}^d) \to (\mathbb{R}^d)^{\otimes n}$ for the projection onto the $n$-th tensor level. $T^N(\mathbb{R}^d)$ is a (finite-dimensional) $\mathbb{R}$-vector space. For elements $g, h \in T^N(\mathbb{R}^d)$, we define $g \otimes h \in T^N(\mathbb{R}^d)$ by

$$\pi_n(g \otimes h) = \sum_{i=0}^n \pi_{n-i}(g) \otimes \pi_i(h).$$

One can easily check that $(T^N(\mathbb{R}^d), +, \otimes)$ is an associative algebra with unit element $e := 1 + 0 + 0 + \dots + 0$. We call it the truncated tensor algebra of level $N$. A norm is defined by

$$|g|_{T^N(\mathbb{R}^d)} = \max_{n=0,\dots,N} |\pi_n(g)|$$

which turns $T^N(\mathbb{R}^d)$ into a Banach space.

A continuous map $\mathbf{x} : \Delta \to T^N(\mathbb{R}^d)$ is called a multiplicative functional if for all $s < u < t$ one has $\mathbf{x}_{s,t} = \mathbf{x}_{s,u} \otimes \mathbf{x}_{u,t}$. For a path $x = (x^1, \dots, x^d) : [0,T] \to \mathbb{R}^d$ and $s < t$, we will use the notation $x_{s,t} = x_t - x_s$. If $x$ has bounded variation (or finite 1-variation), we define its $n$-th iterated integral by

$$\mathbf{x}^n_{s,t} = \int_{\Delta^n_{s,t}} dx \otimes \dots \otimes dx = \sum_{1 \le i_1, \dots, i_n \le d} \Big( \int_{\Delta^n_{s,t}} dx^{i_1} \cdots dx^{i_n} \Big)\, e_{i_1} \otimes \dots \otimes e_{i_n} \in (\mathbb{R}^d)^{\otimes n}$$

where $\{e_1, \dots, e_d\}$ denotes the Euclidean basis in $\mathbb{R}^d$ and $(s,t) \in \Delta$. The canonical lift $S_N(x) : \Delta \to T^N(\mathbb{R}^d)$ is defined by

$$\pi_n\big(S_N(x)_{s,t}\big) = \begin{cases} \mathbf{x}^n_{s,t} & \text{if } n \in \{1, \dots, N\} \\ 1 & \text{if } n = 0. \end{cases}$$
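For a piecewise linear path, this canonical lift can be computed explicitly: the lift of a single linear piece with increment $v$ is the tensor exponential $\exp(v) = \sum_n v^{\otimes n}/n!$, and the pieces are combined by multiplication in $T^N(\mathbb{R}^d)$ (Chen's relation, quoted next). The following dense-array Python sketch is our own illustration:

```python
import numpy as np
from math import factorial

def tensor_mult(g, h, N, d):
    """Truncated tensor product: pi_n(g x h) = sum_i pi_{n-i}(g) x pi_i(h)."""
    out = [np.zeros((d,) * n) for n in range(N + 1)]
    for n in range(N + 1):
        for i in range(n + 1):
            out[n] = out[n] + np.multiply.outer(g[n - i], h[i])
    return out

def canonical_lift(x, N):
    """S_N(x)_{0,T} for a piecewise linear path x (rows = points in R^d):
    multiply the tensor exponentials of the linear pieces."""
    d = x.shape[1]
    sig = [np.ones(())] + [np.zeros((d,) * n) for n in range(1, N + 1)]
    for k in range(len(x) - 1):
        v = x[k + 1] - x[k]
        piece = [np.ones(())] + [np.zeros((d,) * n) for n in range(1, N + 1)]
        pw = np.ones(())
        for n in range(1, N + 1):
            pw = np.multiply.outer(pw, v)  # v^{(x) n}
            piece[n] = pw / factorial(n)
        sig = tensor_mult(sig, piece, N, d)
    return sig
```

For instance, for the L-shaped path $(0,0) \to (1,0) \to (1,1)$ one obtains $\pi_2 = \begin{pmatrix} 1/2 & 1 \\ 0 & 1/2 \end{pmatrix}$, whose antisymmetric part is the enclosed (Lévy) area $1/2$.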

It is well known (as a consequence of Chen’s theorem) that SN (x) is a multiplicative functional. Set xt := SN (x)0,t. One can show that xt really takes values in

$$G^N(\mathbb{R}^d) = \big\{ g \in T^N(\mathbb{R}^d) : \exists\, x \in C^{1\text{-var}}([0,1], \mathbb{R}^d) : g = S_N(x)_{0,1} \big\},$$

a submanifold of $T^N(\mathbb{R}^d)$, called the free step-$N$ nilpotent Lie group with $d$ generators. The dilation operator $\delta : \mathbb{R} \times G^N(\mathbb{R}^d) \to G^N(\mathbb{R}^d)$ is defined by

$$\pi_i(\delta_\lambda(g)) = \lambda^i \pi_i(g), \qquad i = 0, \dots, N.$$

The Carnot-Caratheodory norm, given by

$$\|g\| = \inf\big\{ \mathrm{length}(x) : x \in C^{1\text{-var}}([0,1], \mathbb{R}^d),\ S_N(x)_{0,1} = g \big\}$$

defines a continuous norm on $G^N(\mathbb{R}^d)$, homogeneous with respect to $\delta$. This norm induces a (left-invariant) metric on $G^N(\mathbb{R}^d)$ known as the Carnot–Caratheodory metric,

$$d(g, h) := \|g^{-1} \otimes h\|.$$

Let $x, y \in C_0([0,T], G^N(\mathbb{R}^d))$, the space of continuous $G^N(\mathbb{R}^d)$-valued paths started at the neutral element. We will use the canonical notion of increments expressed by

$$x_{s,t} := x_s^{-1} \otimes x_t.$$

We define $p$-variation and $\alpha$-Hölder distances by

$$d_{p\text{-var};[s,t]}(x, y) := \sup_{D \subset [s,t]} \Big( \sum_{t_i, t_{i+1} \in D} d(x_{t_i, t_{i+1}}, y_{t_i, t_{i+1}})^p \Big)^{1/p} \quad \text{and}$$

$$d_{\alpha\text{-Höl};[s,t]}(x, y) := \sup_{(u,v) \in \Delta_{s,t}} \frac{d(x_{u,v}, y_{u,v})}{|v - u|^\alpha}.$$

As usual, we set $d_{p\text{-var}} := d_{p\text{-var};[0,T]}$ and $d_{\alpha\text{-Höl}} := d_{\alpha\text{-Höl};[0,T]}$. Note that $d_{\alpha\text{-Höl}}(x, 0) = \|x\|_{\alpha\text{-Höl}}$ and $d_{p\text{-var}}(x, 0) = \|x\|_{p\text{-var}}$ where $0$ denotes the constant path equal to the neutral element. These metrics are called homogeneous rough paths metrics. We define the following path spaces:

(i) $C^{p\text{-var}}_0([0,T], G^N(\mathbb{R}^d))$: the set of continuous functions $x$ from $[0,T]$ into $G^N(\mathbb{R}^d)$ such that $\|x\|_{p\text{-var}} < \infty$ and $x_0 = e$.

(ii) $C^{\alpha\text{-Höl}}_0([0,T], G^N(\mathbb{R}^d))$: the set of continuous functions $x$ from $[0,T]$ into $G^N(\mathbb{R}^d)$ such that $\|x\|_{\alpha\text{-Höl}} < \infty$ and $x_0 = e$.

(iii) $C^{0,p\text{-var}}_0([0,T], G^N(\mathbb{R}^d))$: the $d_{p\text{-var}}$-closure of

$$\big\{ S_N(x) : x\colon [0,T] \to \mathbb{R}^d \text{ smooth} \big\}.$$

(iv) $C^{0,\alpha\text{-Höl}}_0([0,T], G^N(\mathbb{R}^d))$: the $d_{\alpha\text{-Höl}}$-closure of

$$\big\{ S_N(x) : x\colon [0,T] \to \mathbb{R}^d \text{ smooth} \big\}.$$

If $N = \lfloor p \rfloor$, the elements of the spaces (i) and (ii) are called weak geometric (Hölder) rough paths, the elements of (iii) and (iv) are called geometric (Hölder) rough paths. It is clear by definition that every $p$-rough path is also a multiplicative functional. By Lyons' First Theorem (or Extension Theorem, see [Lyo98, Theorem 2.2.1] or [FV10b, Theorem 9.5]) every $p$-rough path (or $\frac{1}{p}$-Hölder rough path) $\mathbf{x}$ has a unique lift to a path with finite $p$-variation in $G^N(\mathbb{R}^d)$ for any $N \ge \lfloor p \rfloor$. We denote this lift by $S_N(\mathbf{x})$ and call it the Lyons lift. Note that the map $S_N$ is continuous in rough paths topology. For $p$-rough paths and $\frac{1}{p}$-Hölder rough paths $\mathbf{x}$, we will also use the notation

$$\mathbf{x}^n_{s,t} := \int_{\Delta^n_{s,t}} d\mathbf{x} \otimes \dots \otimes d\mathbf{x} := \pi_n\big(S_N(\mathbf{x})_{s,t}\big)$$

for $N \ge n$. This is consistent with our former definition in the case where $x$ had bounded variation. An immediate consequence of the extension theorem is that any $p$-rough path $\mathbf{x}$ can be lifted to a $q$-rough path for every $q \ge p$. We will sometimes abuse notation and use the same letter $\mathbf{x}$ to denote this lift. In this work, we will often be interested in inhomogeneous rough paths metrics which we aim to define now. First, recall that a control is a function $\omega : \Delta \to \mathbb{R}_+$ which is continuous and super-additive in the sense that for all $s \le u \le t$ one has

ω(s, u) + ω(u, t) ≤ ω(s, t).
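A standard example: $\omega(s,t) = (t-s)^\theta$ is super-additive (hence a control) precisely for $\theta \ge 1$, while for $\theta < 1$ super-additivity fails. A quick numerical sanity check of the condition (our own illustration, not from the text):

```python
import numpy as np

def is_superadditive(omega, T=1.0, n=40):
    """Check omega(s,u) + omega(u,t) <= omega(s,t) on a grid of triples
    s <= u <= t in [0,T] (a necessary-condition spot check, not a proof)."""
    ts = np.linspace(0.0, T, n)
    return all(omega(s, u) + omega(u, t) <= omega(s, t) + 1e-12
               for i, s in enumerate(ts)
               for j, u in enumerate(ts[i:], i)
               for t in ts[j:])
```

E.g. $(t-s)^2$ passes the check while $(t-s)^{1/2}$ fails it (take $s=0$, $u=1/2$, $t=1$: $\sqrt{1/2} + \sqrt{1/2} > 1$).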

If $\omega$ is a control, we define

$$\|x\|_{p\text{-}\omega;[s,t]} := \sup_{s \le u < v \le t} \frac{\|x_{u,v}\|}{\omega(u,v)^{1/p}} \qquad \text{(with the convention } 0/0 := 0\text{)}$$

and the inhomogeneous distances

$$\rho^N_{p\text{-}\omega}(x, y) := \max_{n=1,\dots,N} \sup_{s \le u < v \le t} \frac{|\pi_n(x_{u,v}) - \pi_n(y_{u,v})|}{\omega(u,v)^{n/p}}, \qquad \rho^N_{p\text{-var}}(x, y) := \max_{n=1,\dots,N} \sup_{D} \Big( \sum_{t_i, t_{i+1} \in D} |\pi_n(x_{t_i,t_{i+1}}) - \pi_n(y_{t_i,t_{i+1}})|^{\frac{p}{n}} \Big)^{\frac{n}{p}}$$

where $N \ge \lfloor p \rfloor$. If $N = \lfloor p \rfloor$, $\rho^N_{p\text{-}\omega}$ and $\rho^N_{p\text{-var}}$ define rough paths metrics and we will write $\rho_{p\text{-}\omega}$ and $\rho_{p\text{-var}}$ in that case. Note that the metrics $d_{p\text{-var}}$ and $\rho_{p\text{-var}}$ both induce the same topology on the rough paths spaces, as do the metrics $d_{\frac{1}{p}\text{-Höl}}$ and $\rho_{p\text{-}\omega}$ on $C^{\frac{1}{p}\text{-Höl}}([0,T], G^N(\mathbb{R}^d))$ with the choice $\omega(s,t) = |t-s|$; cf. [FV10b] for more details. We will now define two notions of two-dimensional $p$-variation. Let $A = [a,b] \times [c,d] \subseteq [0,T]^2$ be a rectangle. If $a = b$ or $c = d$ we call $A$ degenerate. Two rectangles are called essentially disjoint if their intersection is empty or degenerate. A partition $\Pi$ of a rectangle $R \subseteq [0,T]^2$ is a finite set of essentially disjoint rectangles whose union is $R$; the family of all such partitions is denoted by $\mathcal{P}(R)$. A rectangular increment of a function $f : [0,T]^2 \to \mathbb{R}$ is defined in terms of $f$ evaluated at the four corner points of $A$,

$$f(A) := f\begin{pmatrix} a, b \\ c, d \end{pmatrix} := f(b,d) - f(b,c) - f(a,d) + f(a,c).$$
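As a concrete illustration (ours, not from the text): for the Brownian covariance $R(s,t) = \min(s,t)$, the rectangular increment over $[a,b] \times [c,d]$ is the length of the overlap $[a,b] \cap [c,d]$, so on any grid only the diagonal squares contribute and the grid 1-variation over $[0,1]^2$ equals 1:

```python
import numpy as np

def rect_increment(f, a, b, c, d):
    """Rectangular increment f([a,b] x [c,d]) via the four corner values."""
    return f(b, d) - f(b, c) - f(a, d) + f(a, c)

def grid_p_variation(f, times, p):
    """p-variation sum of f over ONE grid-like partition built from `times`
    (taking a sup over all dissections would give V_p; sketch only)."""
    total = 0.0
    for i in range(len(times) - 1):
        for j in range(len(times) - 1):
            inc = rect_increment(f, times[i], times[i + 1],
                                 times[j], times[j + 1])
            total += abs(inc) ** p
    return total ** (1.0 / p)
```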


Recall that a dissection D of an interval [a, b] ⊂ [0,T ] is of the form

$$D = \{a = t_0 < t_1 < \dots < t_n = b\}$$

and we will write $\mathcal{D}([a,b])$ for the family of all such dissections. For a function $f : [0,T]^2 \to \mathbb{R}$ and $p \ge 1$ we set

$$V_p(f; [s,t] \times [u,v]) := \sup_{\substack{D = (t_i) \in \mathcal{D}([s,t]) \\ D' = (t'_j) \in \mathcal{D}([u,v])}} \Big( \sum_{i,j} \Big| f\begin{pmatrix} t_i, t_{i+1} \\ t'_j, t'_{j+1} \end{pmatrix} \Big|^p \Big)^{1/p},$$

$$|f|_{p\text{-var};[s,t] \times [u,v]} := \sup_{\Pi \in \mathcal{P}([s,t] \times [u,v])} \Big( \sum_{A \in \Pi} |f(A)|^p \Big)^{1/p}.$$

We say that $f$ has finite $p$-variation if $V_p(f; [0,T]^2) < \infty$ and finite controlled $p$-variation if $|f|_{p\text{-var};[0,T]^2} < \infty$. If in addition $|f|^p_{p\text{-var};[s,t]^2} \le C|t-s|$ for a constant $C$ and all $[s,t] \subseteq [0,T]$, we say that $f$ has finite Hölder controlled $p$-variation. The difference between 2D $p$-variation and controlled $p$-variation is that in the former one only takes the supremum over grid-like partitions, whereas in the latter one takes the supremum over all partitions of the rectangle. The reason for introducing these two notions of 2D $p$-variation comes from the fact that $[s,t] \times [u,v] \mapsto V_p(f; [s,t] \times [u,v])^p$ will in general not be superadditive, hence does not define a 2D control function: A function $\omega : \Delta \times \Delta \to \mathbb{R}_+$ is called a (2D) control if it is continuous, zero on degenerate rectangles and super-additive in the sense that for all rectangles $R \subset [0,T]^2$,

$$\sum_{i=1}^n \omega(R_i) \le \omega(R)$$

whenever $\{R_i : i = 1, \dots, n\} \in \mathcal{P}(R)$. $\omega$ is called symmetric if $\omega([s,t] \times [u,v]) = \omega([u,v] \times [s,t])$ holds for all $s < t$ and $u < v$. We say that the $p$-variation of $f$ is controlled by $\omega$ if $|f(R)|^p \le \omega(R)$ holds for all rectangles $R \subset [0,T]^2$. It is easy to see that if $\omega$ is a 2D control, $(s,t) \mapsto \omega([s,t]^2)$ defines a 1D control. By superadditivity, the existence of a control $\omega$ which controls the $p$-variation of $f$ implies that $f$ has finite controlled $p$-variation and $|f|_{p\text{-var};R} \le \omega(R)^{1/p}$. In this case, we can always assume w.l.o.g. that $\omega$ is symmetric; otherwise we just substitute $\omega$ by its symmetrization $\omega^{\mathrm{sym}}$ given by

$$\omega^{\mathrm{sym}}([s,t] \times [u,v]) = \omega([s,t] \times [u,v]) + \omega([u,v] \times [s,t]).$$

The connection between finite $p$-variation and finite controlled $p$-variation is summarized in the following theorem.

Theorem 0.0.1. Let $f : [0,T]^2 \to \mathbb{R}$ be continuous and $R \subset [0,T]^2$ be a rectangle.

(i) We have

$$V_1(f, R) = |f|_{1\text{-var};R}.$$

(ii) For any $p \ge 1$ and $\varepsilon > 0$ there is a constant $C = C(p, \varepsilon)$ such that

$$\frac{1}{C}\, |f|_{(p+\varepsilon)\text{-var};R} \le V_p(f, R) \le |f|_{p\text{-var};R}.$$

(iii) If $f$ has finite controlled $p$-variation, then $R \mapsto |f|^p_{p\text{-var};R}$ is a 2D control. In particular, there exists a 2D control $\omega$ such that for all rectangles $R \subset [0,T]^2$ we have $|f(R)|^p \le \omega(R)$, i.e. $\omega$ controls the $p$-variation of $f$.

Proof. [FV11, Theorem 1].

For more details on higher dimensional $p$-variations we refer to [FV11, FV10b].

1

Convergence rates for the full Brownian rough paths with applications to limit theorems for stochastic flows

The purpose of this chapter is to give a proof of a quantitative version of a well-known limit theorem for stochastic flows which goes back to Bismut, Kunita, Malliavin, ... (see [Mal97, Thm 6.1] and the references therein). It says, in essence, that if one uses piecewise linear approximations to multidimensional Brownian driving signals, the resulting solutions to the (random) ODEs will converge as stochastic flows to the solution of the (Stratonovich) stochastic differential equations; that is, the solution flows and all their derivatives will converge uniformly on compacts. It has been understood in recent years that rough path theory [Lyo98, LQ02, FV10b] is ideally suited to prove such limit theorems, also on the level of flows [LQ98, FV10b]. In fact, in [Mal97, p. 216] Malliavin himself remarked "Lyons's forthcoming theory will reduce the proof of [Mal97, Thm 6.1] to a limit theorem for Brownian motion and Lévy's area". The price one has to pay for this reduction is that one has to work with refined Hölder (or $p$-variation) metrics on rough path spaces; to wit, if $\mathbf{B}$ denotes the Brownian rough path, i.e. $d$-dimensional Brownian motion $B$ and $so(d)$-valued Lévy area $A$ written as

$$\mathbf{B} = \exp(B + A) = 1 + B + \int_0^\cdot B \otimes {\circ dB} \in \mathbb{R} \oplus \mathbb{R}^d \oplus (\mathbb{R}^d)^{\otimes 2}$$

its rough path regularity is summarized in that there exists a $1/\alpha \in [2,3)$, in fact we may take any $\alpha \in (1/3, 1/2)$, such that¹

$$\rho_{\alpha\text{-Höl};[0,T]}(\mathbf{B}, 1) \equiv \sup_{0 \le s < t \le T} \frac{|B_t - B_s|}{|t-s|^\alpha} \vee \sup_{0 \le s < t \le T} \frac{\big| \int_s^t (B_r - B_s) \otimes {\circ dB_r} \big|}{|t-s|^{2\alpha}} < \infty \quad \text{a.s.}$$

Moreover,

$$\rho_{\alpha\text{-Höl};[0,T]}\big(\mathbf{B}, S_2(B^n)\big) \to 0 \quad \text{a.s.}$$

¹ We set $1 = \exp(0) = 1 + 0 + 0$ in $\mathbb{R} \oplus \mathbb{R}^d \oplus (\mathbb{R}^d)^{\otimes 2}$.

where $B^n = B^{D_n}$ denotes the piecewise linear approximation to $B$, based on the dyadics $D_n = \{iT/2^n : i = 0, 1, \dots, 2^n\}$,

$$S_2(B^n) = 1 + B^n + \int_0^\cdot B^n_r \otimes \dot B^n_r\, dr$$

is the canonical lift of the piecewise smooth sample paths of $B^n$. There are various ways of proving this, the most elegant perhaps being the argument of [Mal97, Thm 5.2], combined with interpolation and a uniformity consideration based on Doob's maximal inequality [FV10b, Ch. 12]; such martingale arguments are not applicable when working with more general piecewise linear approximations, such as those based on $D_n = \{iT/n : i = 0, 1, \dots, n\}$. Nonetheless a direct computation [HN09] shows that

$$\forall\, \eta < \frac{1}{2} - \alpha : \quad \rho_{\alpha\text{-Höl};[0,T]}\big(\mathbf{B}, S_2(B^n)\big) = O\Big( \Big(\frac{1}{n}\Big)^\eta \Big) \quad \text{a.s.}$$

and one can see that this is the best result of this type. As a corollary, local Lipschitzness of the Itô(-Lyons) map implies convergence of the corresponding stochastic flows with the same rate $\eta$. It is interesting to compare this with known convergence rates of such approximations as established in the works of Gyöngy [GS06] and others; there, it seems to be folklore of the subject that convergence takes place with rate $\eta < 1/2$ (similar to the rate of strong convergence of the Euler scheme, which is also of rate $\eta < 1/2$). At first glance, these results are recovered by taking $\alpha \downarrow 0$. Unfortunately, this is not possible here. The problem is that in the current setting, based on $N = 2$ iterated integrals, $\rho_{\alpha\text{-Höl};[0,T]}$ ceases to be a rough path metric for $\alpha \le 1/3$. Indeed, rough path theory dictates a sharp relationship between the number of required levels, $N$, and the ($\alpha$-Hölder type) regularity of the signals under consideration:

$$N = \lfloor 1/\alpha \rfloor;$$

see [FO09] for some subtle (counter-)examples in this context. Back to the case $N = 2$: we are forced into the regime $\alpha \in (1/3, 1/2]$. In particular, the "best" rate $\eta < \frac{1}{2} - \alpha$ can only be taken arbitrarily close to

$$\frac{1}{2} - \frac{1}{3} = \frac{1}{6},$$

which leaves a significant gap to the known rate "anything less than 1/2". This gap was also noted, in the context of fractional Brownian motion, in [DNT12]. Having explained the problem from this point of view, the idea to work with general level $N$, instead of level 2, is not far, and this is precisely what we shall do. To wit, we shall establish

Theorem 1.0.1 (Rates for the full Brownian rough path). Let $B^n$ denote the piecewise linear approximation to $d$-dimensional Brownian motion $B$ based on the dissection $D_n = \{iT/n : 0 \le i \le n\}$ of $[0,T]$. For any integer $N$, $\alpha \in [0, 1/2)$ and $\eta < \frac{1}{2} - \alpha$ there exists an a.s. finite random variable $C(\omega)$ such that

$$\rho_{\alpha\text{-Höl};[0,T]}\big(S_N(\mathbf{B}), S_N(B^n)\big) \le C(\omega) \Big(\frac{1}{n}\Big)^\eta.$$

Proof. By scaling it suffices to discuss T = 1. This is the content of the next section.

The above rates now lead to the following quantitative version of the limit theorem for stochastic flows [Mal97, Thm 6.1].

Corollary 1.0.2. Consider, on $\mathbb{R}^e$, $(d+1)$ $C^\infty$-bounded vector fields $V_0, V_1, \dots, V_d$. Consider the (random) flow $y_0 \mapsto U_{B^n, t\leftarrow 0}(y_0)$ on $\mathbb{R}^e$ defined by

$$dy = V_0(y)\, dt + \sum_{i=1}^d V_i(y)\, dB^{n,i}, \qquad y(0) = y_0$$

where $B^n$ is the piecewise linear approximation to Brownian motion based on the dissection $D_n = \{iT/n : 0 \le i \le n\}$ of $[0,T]$. Then a.s.

$$U_{B^n, t\leftarrow 0}(y_0)$$

converges uniformly (as do all its derivatives in $y_0$) on every compact subset $K \subset [0,T] \times \mathbb{R}^e$; and the limit

$$U_{B, t\leftarrow 0}(y_0) := \lim_{n\to\infty} U_{B^n, t\leftarrow 0}(y_0)$$

solves the Stratonovich SDE

$$dy = V_0(y)\, dt + \sum_{i=1}^d V_i(y) \circ dB^i, \qquad y(0) = y_0.$$

Moreover, for every $\eta < 1/2$, every $k \in \{1, 2, \dots\}$ and every compact $K \subset [0,T] \times \mathbb{R}^e$, there exists an a.s. finite random variable $C(\omega)$ such that

$$\max_{\substack{\alpha = (\alpha_1, \dots, \alpha_e) \\ |\alpha| = \alpha_1 + \dots + \alpha_e \le k}} \big| \partial_\alpha U_{B,\cdot\leftarrow 0}(\cdot) - \partial_\alpha U_{B^n,\cdot\leftarrow 0}(\cdot) \big|_{\infty;K} \le C(\omega) \Big(\frac{1}{n}\Big)^\eta.$$

Proof. We recall that $U_{B, t\leftarrow 0}(y_0)$ can be obtained as the solution to the rough differential equation (RDE)

$$dy = V_0(y)\, dt + \sum_{i=1}^d V_i(y)\, d\mathbf{B} \equiv V(y)\, d(t, \mathbf{B}).$$

As is well known, the map $\mathbf{B} \mapsto y$ is (locally) Lipschitz continuous when regarding $\mathbf{B}$ as a geometric $\alpha$-Hölder rough path, $\alpha \in (1/3, 1/2)$, and $y$ an $\mathbb{R}^e$-valued $\alpha$-Hölder path. (The presence of a drift vector field $V_0$ does not affect this; it suffices to remark that the so-called Young-pairing map $\mathbf{B} \mapsto (t, \mathbf{B})$ has similar Lipschitz regularity in rough path metric.) By basic consistency properties of RDE solutions, $y$ is also the solution of the same RDE driven by the Lyons lift $S_N(\mathbf{B})$, any $N \ge 2$, and the map $S_N(\mathbf{B}) \mapsto y$ is (locally) Lipschitz continuous when regarding $S_N(\mathbf{B})$ as a geometric $\alpha$-Hölder rough path, $\alpha \in (1/(N+1), 1/N)$, and $y$ an $\mathbb{R}^e$-valued $\alpha$-Hölder path. Moreover, this (local) Lipschitz regularity persists when one regards the ensemble

$$\{ \partial_a U_{B,\cdot\leftarrow 0}(\cdot) : |a| \le k \}$$

(here $a = (a_1, \dots, a_e)$ denotes a multi-index of order $|a| = a_1 + \dots + a_e$), which corresponds to a (non-explosive! cf. Thm 11.12 in [FV10b]) system of rough differential equations; after localization the entire ensemble solves a (high-dimensional) rough differential equation and we argue as above. Finally, given $\eta < 1/2$ we pick $\alpha > 0$ such that $\eta < 1/2 - \alpha$ and then $N := \lfloor 1/\alpha \rfloor$. From our "Rates for the full Brownian rough path" and (local) Lipschitz continuity of the solution map, on the level of stochastic flows, the result follows.


1.1 Rates of Convergence for the full Brownian rough path

Note that in the forthcoming section, we will use generic constants $c$ which may depend on the dimension $d$ without explicitly mentioning it. As seen before, we can restrict ourselves to the case $[0,T] = [0,1]$.

Lemma 1.1.1. Let $\mathbf{x}$ be a multiplicative functional in $T^N(\mathbb{R}^d)$, $(s,t) \in \Delta$ and $s = t_0 < \dots < t_m = t$, $m \ge 1$. Then

$$\pi_n(\mathbf{x}_{s,t}) = \sum_{i=0}^{m-1} \pi_n(\mathbf{x}_{t_i, t_{i+1}}) + \sum_{i=1}^{m-1} \sum_{l=1}^{n-1} \pi_{n-l}(\mathbf{x}_{s, t_i}) \otimes \pi_l(\mathbf{x}_{t_i, t_{i+1}}) \qquad (1.1)$$

for every $n = 1, \dots, N$.

Proof. Easy, for instance by induction over $m$.

The next lemma gives an $L^2$-estimate for the higher order iterated Stratonovich integrals of $B$.

Lemma 1.1.2. Let $B$ be a Brownian motion in $\mathbb{R}^d$ and take $n \in \mathbb{N}$. Then there is a constant $c$ depending only on $n$ such that

$$\Big| \int_{\Delta^n_{s,t}} {\circ dB} \otimes \dots \otimes {\circ dB} \Big|_{L^2} \le c\, |t-s|^{n/2}$$

for any $(s,t) \in \Delta$.

Proof. Follows from Brownian scaling.

Let $D = \{0 = t_0 < t_1 < \dots < t_m = 1\}$ be a partition of the unit interval. We use the notation $|D| = \max_{i=0,\dots,m-1} |t_{i+1} - t_i|$ for the mesh size of $D$. $B^D : [0,1] \to \mathbb{R}^d$ denotes the piecewise linear approximation of $B$ w.r.t. $D$, i.e. $B^D_{t_i} = B_{t_i}$ for $t_i \in D$ and, for $t \in [t_i, t_{i+1}]$, $B^D_t$ is defined via

$$\frac{B^D_t - B^D_{t_i}}{t - t_i} = \frac{B_{t_{i+1}} - B_{t_i}}{t_{i+1} - t_i}.$$

It is clear that $B^D$ is a Gaussian process with paths of finite variation. Therefore we can define its $n$-th iterated integral

$$\int_{\Delta^n_{0,1}} dB^D \otimes \dots \otimes dB^D \in (\mathbb{R}^d)^{\otimes n}$$

in the Riemann–Stieltjes sense. We will later show that Lemma 1.1.2 also holds for $B^D$ where the constant $c$ does not depend on the choice of $D$. Note that for $t \in (t_i, t_{i+1})$,

$$\dot B^D_t = \frac{B_{t_{i+1}} - B_{t_i}}{t_{i+1} - t_i} \overset{d}{=} \frac{B_1}{(t_{i+1} - t_i)^{1/2}}.$$
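The construction of $B^D$ amounts to linear interpolation of a sampled trajectory through its values on the partition $D$; a small Python sketch (our own illustration, with a fixed random seed):

```python
import numpy as np

def brownian_path(times, rng):
    """Sample a Brownian path at the given times (increments ~ N(0, dt))."""
    dB = rng.normal(0.0, np.sqrt(np.diff(times)))
    return np.concatenate([[0.0], np.cumsum(dB)])

def piecewise_linear(times, B, D):
    """B^D: agrees with B at the points of the partition D and is linear in
    between; returned evaluated on the fine grid `times`."""
    B_on_D = np.interp(D, times, B)      # restrict B to the partition D
    return np.interp(times, D, B_on_D)   # linear interpolation between them
```

By construction the approximation matches the path on $D$ and replaces each excursion in between by a chord.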

Therefore, in the special case (s, t) = (ti, ti+1), we already see that

$$\Big| \int_{\Delta^n_{t_i, t_{i+1}}} dB^D \otimes \dots \otimes dB^D \Big|_{L^2} = \frac{|B_1^{\otimes n}|_{L^2}}{|t_{i+1} - t_i|^{n/2}} \int_{\Delta^n_{t_i, t_{i+1}}} du_1 \cdots du_n \qquad (1.2)$$

$$= \frac{|B_1^{\otimes n}|_{L^2}}{|t_{i+1} - t_i|^{n/2}} \cdot \frac{|t_{i+1} - t_i|^n}{n!} = c\, |t_{i+1} - t_i|^{n/2}$$


with $c = |B_1^{\otimes n}|_{L^2} / n!$, which only depends on $n$. Next, we prove a technical lemma which we will need for the proof of Lemma 1.1.4. Recall the definition of the Lévy area of a Brownian motion $B = (B^1, \dots, B^d)$: For $(s,t) \in \Delta$, $A_{s,t} \in \mathbb{R}^d \otimes \mathbb{R}^d$ is defined by

X i,j As,t = As,tei ⊗ ej where 1≤i,j≤d Z t  i,j 1 i j j i As,t = Bs,r dBr − Bs,r dBr . 2 s
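As a quick numerical sanity check on this definition (our own illustration, not from the text): approximating $A^{1,2}_{0,1}$ by left-point Riemann sums, the sample mean should vanish and the sample variance should be close to the known second moment $E[(A^{1,2}_{0,1})^2] = 1/4$.

```python
import numpy as np

def levy_area(dB1, dB2):
    """Left-point Riemann-sum approximation of A^{1,2}_{0,1} = (1/2) int_0^1 (B1 dB2 - B2 dB1)."""
    B1 = np.concatenate([[0.0], np.cumsum(dB1)])[:-1]   # path values at left endpoints
    B2 = np.concatenate([[0.0], np.cumsum(dB2)])[:-1]
    return 0.5 * np.sum(B1 * dB2 - B2 * dB1)

rng = np.random.default_rng(42)
n_steps, n_samples = 200, 20000
dt = 1.0 / n_steps
dB = rng.standard_normal((n_samples, 2, n_steps)) * np.sqrt(dt)
areas = np.array([levy_area(d[0], d[1]) for d in dB])
# E[A] = 0 and E[A^2] = 1/4 for the Levy area of a 2d Brownian motion over [0, 1]
```

Since the integrand is antisymmetric, the Itô and Stratonovich readings of the area coincide, so the left-point sum is an unbiased discretization here.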

Lemma 1.1.3. Let $B$ be a Brownian motion in $\mathbb{R}^d$ on a probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$. Let $D = \{s = t_0 < \ldots < t_m = t\}$ be a partition of the interval $[s,t] \subset [0,1]$. Assume that $X$ is a process with values in $\left( \mathbb{R}^d \right)^{\otimes n}$ and that $X_{t_i}$ is $\mathcal{F}_{t_i}$-measurable for all $t_i \in D$.

(i) There is a constant $c_1 = c_1(n)$ such that
$$\left| \sum_{i=0}^{m-1} X_{t_i} \otimes B_{t_i,t_{i+1}} \right|_{L^2} \le c_1 \max_{i=0,\ldots,m-1} |X_{t_i}|_{L^2}\, |t-s|^{1/2}.$$

(ii) There is a constant $c_2 = c_2(n)$ such that
$$\left| \sum_{i=0}^{m-1} X_{t_i} \otimes A_{t_i,t_{i+1}} \right|_{L^2} \le c_2\, |D|^{1/2} \max_{i=0,\ldots,m-1} |X_{t_i}|_{L^2}\, |t-s|^{1/2}.$$

Proof. (i) The process $X$ can be written as
$$X_t = \sum_{1 \le j_1, \ldots, j_n \le d} X^{j_1,\ldots,j_n}_t\, e_{j_1} \otimes \ldots \otimes e_{j_n},$$
where the $X^{j_1,\ldots,j_n}$ are real-valued processes for any choice of $j_1, \ldots, j_n \in \{1,\ldots,d\}$. Fix $j, j_1, \ldots, j_n \in \{1,\ldots,d\}$. Using independence of the Brownian increments,
$$E\left[ \left( \sum_{i=0}^{m-1} X^{j_1,\ldots,j_n}_{t_i} B^j_{t_i,t_{i+1}} \right)^2 \right] = \sum_{i=0}^{m-1} E\left[ \left( X^{j_1,\ldots,j_n}_{t_i} B^j_{t_i,t_{i+1}} \right)^2 \right] = \sum_{i=0}^{m-1} \left| X^{j_1,\ldots,j_n}_{t_i} \right|^2_{L^2} \left| B^j_{t_i,t_{i+1}} \right|^2_{L^2}$$
$$\le \max_{i=0,\ldots,m-1} \left| X^{j_1,\ldots,j_n}_{t_i} \right|^2_{L^2} \sum_{i=0}^{m-1} |t_{i+1} - t_i| \le \max_{i=0,\ldots,m-1} |X_{t_i}|^2_{L^2}\, |t-s|.$$

(ii) Fix $j, k, j_1, \ldots, j_n \in \{1,\ldots,d\}$. As for the Brownian increments, we know that $A^{j,k}_{t_i,v}$ and $X^{j_1,\ldots,j_n}_{t_i}$ are independent and that $E\left( A^{j,k}_{t_i,v} \right) = 0$ for all $t_i < v$. Therefore,
$$E\left[ \left( \sum_{i=0}^{m-1} X^{j_1,\ldots,j_n}_{t_i} A^{j,k}_{t_i,t_{i+1}} \right)^2 \right] = \sum_{i=0}^{m-1} \left| X^{j_1,\ldots,j_n}_{t_i} \right|^2_{L^2} \left| A^{j,k}_{t_i,t_{i+1}} \right|^2_{L^2} \le c \max_{i=0,\ldots,m-1} \left| X^{j_1,\ldots,j_n}_{t_i} \right|^2_{L^2} \sum_{i=0}^{m-1} |t_{i+1} - t_i|^2$$
$$\le c\, |D| \max_{i=0,\ldots,m-1} \left| X^{j_1,\ldots,j_n}_{t_i} \right|^2_{L^2} \sum_{i=0}^{m-1} |t_{i+1} - t_i| \le c\, |D| \max_{i=0,\ldots,m-1} |X_{t_i}|^2_{L^2}\, |t-s|.$$

Define $S_N(B) \colon \Delta \to T^N(\mathbb{R}^d)$ by
$$\pi_n\left( S_N(B)_{s,t} \right) = \begin{cases} \int_{\Delta^n_{s,t}} \circ dB \otimes \ldots \otimes \circ dB & \text{if } n \in \{1,\ldots,N\} \\ 1 & \text{if } n = 0. \end{cases}$$

It is known that $S_N(B)$ is a multiplicative functional. In rough path terms, $S_N(B)$ is the step-$N$ Lyons lift of the enhanced Brownian motion $\mathbf{B}$. Since $B$ and $B^D$ are Gaussian processes, the random variables $\pi_n\left( S_N(B)_{s,t} \right)$ and $\pi_n\left( S_N(B^D)_{s,t} \right)$ are elements of the $n$-th (non-homogeneous) Wiener chaos $C^n(P)$. Note that for $Z \in C^n(P)$ and $q > 2$,
$$|Z|_{L^2} \le |Z|_{L^q} \le (n+1)\,(q-1)^{n/2}\, |Z|_{L^2}$$
(see e.g. [FV10b, Ch. 15, Sect. 3.1]). As a consequence, all $L^q$-norms are equivalent on $C^n(P)$. In particular, for $Z \in C^n(P)$ and $W \in C^m(P)$,
$$|Z \otimes W|_{L^2} \le c(n,m)\, |Z|_{L^2}\, |W|_{L^2}.$$
The next lemma contains the main work of this section.

Lemma 1.1.4. Let $B$ be a Brownian motion on a probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$. Let $n \le N$ and $D = \{s = t_0 < t_1 < \ldots < t_m = t\}$ be any partition of any interval $[s,t] \subset [0,1]$. Then there is a constant $c = c(n)$ such that
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le c\, |D|^{1/2}\, |t-s|^{(n-1)/2}.$$

Remark 1.1.5. By the scaling property of Brownian motion, it would have been enough to prove the lemma for $(s,t) = (0,1)$. Indeed, for arbitrary $s < t$,
$$\pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \stackrel{\mathcal{D}}{=} |t-s|^{n/2}\, \pi_n\left( S_N(B^{\tilde{D}})_{0,1} - S_N(B)_{0,1} \right)$$
where $\tilde{D} = \left\{ 0 = \tilde{t}_0 < \ldots < \tilde{t}_m = 1 \right\}$ is defined by $\tilde{t}_i = \frac{t_i - s}{t - s}$ for all $i = 0,\ldots,m$. Clearly, $|\tilde{D}| = \frac{|D|}{|t-s|}$ and hence
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} = |t-s|^{n/2} \left| \pi_n\left( S_N(B^{\tilde{D}})_{0,1} - S_N(B)_{0,1} \right) \right|_{L^2} \le c\, |t-s|^{n/2}\, |\tilde{D}|^{1/2} = c\, |D|^{1/2}\, |t-s|^{(n-1)/2}.$$


Proof. By induction over $n$. For the cases $n = 1, 2$ and $(s,t) = (0,1)$, we use [FV10b, Prop. 13.20] where we let $r \downarrow 0$. For general $(s,t) \in \Delta$ the estimate follows from Remark 1.1.5. Suppose now that the statement is true for all $n' \in \{1, \ldots, n-1\}$. We have to show the estimate for $n$ with $n \ge 3$. By Lemma 1.1.1, we know that
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le \sum_{i=0}^{m-1} \left| \pi_n\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2}$$
$$+ \sum_{l=1}^{n-1} \left| \sum_{i=1}^{m-1} \pi_{n-l}\left( S_N(B^D)_{s,t_i} \right) \otimes \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) - \pi_{n-l}\left( S_N(B)_{s,t_i} \right) \otimes \pi_l\left( S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2}.$$

We claim that
$$\sum_{i=0}^{m-1} \left| \pi_n\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2} \le c_0\, |D|^{1/2}\, |t-s|^{(n-1)/2} \qquad (1.3)$$
and
$$\left| \sum_{i=1}^{m-1} \pi_{n-l}\left( S_N(B^D)_{s,t_i} \right) \otimes \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) - \pi_{n-l}\left( S_N(B)_{s,t_i} \right) \otimes \pi_l\left( S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2} \le c_l\, |D|^{1/2}\, |t-s|^{(n-1)/2} \qquad (1.4)$$
for $l = 1, \ldots, n-1$. Setting $c = c_0 + \ldots + c_{n-1}$ then gives us the desired result. We start with (1.3). Since $n \ge 3$, we can use Lemma 1.1.2, (1.2) and an estimate of the form $|a-b| \le |a| + |b|$ to see that
$$\sum_{i=0}^{m-1} \left| \pi_n\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2} \le \sum_{i=0}^{m-1} \left( \left| \pi_n\left( S_N(B^D)_{t_i,t_{i+1}} \right) \right|_{L^2} + \left| \pi_n\left( S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2} \right)$$
$$\le c(n) \sum_{i=0}^{m-1} |t_{i+1} - t_i|^{n/2} \le c\, |D|^{1/2} \sum_{i=0}^{m-1} |t_{i+1} - t_i|^{(n-1)/2} \le c\, |D|^{1/2} \left( \sum_{i=0}^{m-1} |t_{i+1} - t_i| \right)^{(n-1)/2} = c\, |D|^{1/2}\, |t-s|^{(n-1)/2}.$$

We used here the basic inequality
$$|a_1|^p + \ldots + |a_n|^p \le \left( |a_1| + \ldots + |a_n| \right)^p,$$
which is true for $p \ge 1$. Now we come to (1.4). First we consider the case $l = 1$. Then,

$$\sum_{i=1}^{m-1} \pi_{n-1}\left( S_N(B^D)_{s,t_i} \right) \otimes B^D_{t_i,t_{i+1}} - \pi_{n-1}\left( S_N(B)_{s,t_i} \right) \otimes B_{t_i,t_{i+1}} = \sum_{i=1}^{m-1} \pi_{n-1}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes B_{t_i,t_{i+1}}.$$
Since $\pi_{n-1}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right)$ is $\mathcal{F}_{t_i}$-measurable for all $t_i$ we can use Lemma 1.1.3 to see that
$$\left| \sum_{i=1}^{m-1} \pi_{n-1}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes B_{t_i,t_{i+1}} \right|_{L^2} \le c \max_{i=1,\ldots,m-1} \left| \pi_{n-1}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \right|_{L^2} |t-s|^{1/2}.$$

For fixed $t_i$, by the induction hypothesis,
$$\left| \pi_{n-1}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \right|_{L^2} \le c\, |\tilde{D}_i|^{1/2}\, |t-s|^{(n-2)/2} \le c\, |D|^{1/2}\, |t-s|^{(n-2)/2}$$
where $\tilde{D}_i = \{s = t_0 < \ldots < t_i\}$. Hence,
$$\left| \sum_{i=1}^{m-1} \pi_{n-1}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes B_{t_i,t_{i+1}} \right|_{L^2} \le c\, |D|^{1/2}\, |t-s|^{(n-1)/2}.$$

Assume now $l \in \{2, \ldots, n-1\}$. Using the equality $a \otimes b - a' \otimes b' = (a - a') \otimes b + a' \otimes (b - b')$, we can decompose the sum in (1.4) into two sums and obtain, applying the triangle inequality,
$$\left| \sum_{i=1}^{m-1} \pi_{n-l}\left( S_N(B^D)_{s,t_i} \right) \otimes \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) - \pi_{n-l}\left( S_N(B)_{s,t_i} \right) \otimes \pi_l\left( S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2}$$
$$\le \sum_{i=1}^{m-1} \left| \pi_{n-l}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) \right|_{L^2} \qquad (1.5)$$
$$+ \left| \sum_{i=1}^{m-1} \pi_{n-l}\left( S_N(B)_{s,t_i} \right) \otimes \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2}. \qquad (1.6)$$

We start estimating the sum (1.5). Using equivalence of $L^q$-norms yields
$$\sum_{i=1}^{m-1} \left| \pi_{n-l}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) \right|_{L^2} \le c(n,l) \sum_{i=1}^{m-1} \left| \pi_{n-l}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \right|_{L^2} \left| \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) \right|_{L^2}.$$

Now we use the induction hypothesis to see that for fixed $t_i$,
$$\left| \pi_{n-l}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \right|_{L^2} \le c\, |\tilde{D}_i|^{1/2}\, |t-s|^{(n-l-1)/2} \le c\, |D|^{1/2}\, |t-s|^{(n-l-1)/2}$$


where $\tilde{D}_i = \{s = t_0 < t_1 < \ldots < t_i\}$. In (1.2) we have seen that $\left| \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) \right|_{L^2} \le c\, |t_{i+1} - t_i|^{l/2}$ and since $l/2 \ge 1$,
$$\sum_{i=1}^{m-1} \left| \pi_{n-l}\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \right|_{L^2} \left| \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) \right|_{L^2} \le c\, |D|^{1/2}\, |t-s|^{(n-l-1)/2} \sum_{i=1}^{m-1} |t_{i+1} - t_i|^{l/2}$$
$$\le c\, |D|^{1/2}\, |t-s|^{(n-l-1)/2}\, |t - t_1|^{l/2} \le c\, |D|^{1/2}\, |t-s|^{(n-1)/2}.$$

Now we come to the sum (1.6). We first consider the case $l = 2$. An easy computation shows
$$\pi_2\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) = S_2\left( B^D \right)_{t_i,t_{i+1}} - S_2(B)_{t_i,t_{i+1}} = -A_{t_i,t_{i+1}}.$$
Hence,
$$\left| \sum_{i=1}^{m-1} \pi_{n-2}\left( S_N(B)_{s,t_i} \right) \otimes \pi_2\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2} = \left| \sum_{i=1}^{m-1} \pi_{n-2}\left( S_N(B)_{s,t_i} \right) \otimes A_{t_i,t_{i+1}} \right|_{L^2}.$$

It is clear that the process $u \mapsto \pi_{n-2}\left( S_N(B)_{s,u} \right)$, $u \ge s$, is adapted to the filtration $(\mathcal{F}_u)$ and we can use Lemma 1.1.3 and Lemma 1.1.2 to see that
$$\left| \sum_{i=1}^{m-1} \pi_{n-2}\left( S_N(B)_{s,t_i} \right) \otimes A_{t_i,t_{i+1}} \right|_{L^2} \le c\, |D|^{1/2} \max_{i=1,\ldots,m-1} \left| \pi_{n-2}\left( S_N(B)_{s,t_i} \right) \right|_{L^2} |t-s|^{1/2} \le c\, |D|^{1/2}\, |t_{m-1} - s|^{(n-2)/2}\, |t-s|^{1/2} \le c\, |D|^{1/2}\, |t-s|^{(n-1)/2}.$$

Finally we look at (1.6) for the cases $l \in \{3, \ldots, n-1\}$. Equivalence of $L^q$-norms and Lemma 1.1.2 show that
$$\left| \sum_{i=1}^{m-1} \pi_{n-l}\left( S_N(B)_{s,t_i} \right) \otimes \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2} \le c \sum_{i=1}^{m-1} \left| \pi_{n-l}\left( S_N(B)_{s,t_i} \right) \right|_{L^2} \left| \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} - S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2}$$
$$\le c \sum_{i=1}^{m-1} |t_i - s|^{(n-l)/2} \left( \left| \pi_l\left( S_N(B^D)_{t_i,t_{i+1}} \right) \right|_{L^2} + \left| \pi_l\left( S_N(B)_{t_i,t_{i+1}} \right) \right|_{L^2} \right) \le c\, |t-s|^{(n-l)/2} \sum_{i=1}^{m-1} |t_{i+1} - t_i|^{l/2}$$
$$\le c\, |D|^{1/2}\, |t-s|^{(n-l)/2} \sum_{i=1}^{m-1} |t_{i+1} - t_i|^{(l-1)/2} \le c\, |D|^{1/2}\, |t-s|^{(n-1)/2}.$$
This finishes the proof.


Lemma 1.1.6. Let $B$ be a Brownian motion and $D$ be any partition of $[0,1]$. Take $n \in \mathbb{N}$. Then there is a constant $c$ depending only on $n$ such that
$$\left| \int_{\Delta^n_{s,t}} dB^D \otimes \ldots \otimes dB^D \right|_{L^2} \le c\, |t-s|^{n/2}$$
for any $(s,t) \in \Delta$.

Proof. Assume first that $s,t \in D$. Define $\tilde{D}$ to be the subpartition $\tilde{D} = \{s = t_j < \ldots < t_{j+m} = t\}$ of $D$. Clearly, $|\tilde{D}| \le |t-s|$. From Lemma 1.1.4 and Lemma 1.1.2 we then get
$$\left| \int_{\Delta^n_{s,t}} dB^D \otimes \ldots \otimes dB^D \right|_{L^2} \le \left| \int_{\Delta^n_{s,t}} dB^D \otimes \ldots \otimes dB^D - \int_{\Delta^n_{s,t}} \circ dB \otimes \ldots \otimes \circ dB \right|_{L^2} + \left| \int_{\Delta^n_{s,t}} \circ dB \otimes \ldots \otimes \circ dB \right|_{L^2}$$
$$\le c_1\, |\tilde{D}|^{1/2}\, |t-s|^{(n-1)/2} + c_2\, |t-s|^{n/2} \le (c_1 + c_2)\, |t-s|^{n/2}.$$

Now assume that there are $t_i, t_{i+1} \in D$ such that $t_i \le s < t \le t_{i+1}$. Since on $(t_i, t_{i+1})$ we have
$$\dot{B}^D = \frac{B_{t_{i+1}} - B_{t_i}}{t_{i+1} - t_i} \stackrel{\mathcal{D}}{=} \frac{B_1}{(t_{i+1} - t_i)^{1/2}},$$
we obtain
$$\left| \int_{\Delta^n_{s,t}} dB^D \otimes \ldots \otimes dB^D \right|_{L^2} = \frac{\left| B_1^{\otimes n} \right|_{L^2}}{(t_{i+1} - t_i)^{n/2}} \int_{\Delta^n_{s,t}} du_1 \cdots du_n \le \left| B_1^{\otimes n} \right|_{L^2} \frac{|t-s|^n}{|t-s|^{n/2}\, n!} = c_3\, |t-s|^{n/2}.$$

Finally, for $t_{i-1} \le s \le t_i < t_j \le t \le t_{j+1}$, we use the identity
$$S_N\left( B^D \right)_{s,t} = S_N\left( B^D \right)_{s,t_i} \otimes S_N\left( B^D \right)_{t_i,t_j} \otimes S_N\left( B^D \right)_{t_j,t}.$$
For $n \le N$, by our previous estimates,
$$\left| \pi_n\left( S_N\left( B^D \right)_{s,t} \right) \right|_{L^2} = \left| \sum_{\alpha+\beta+\gamma=n} \pi_\alpha\left( S_N\left( B^D \right)_{s,t_i} \right) \otimes \pi_\beta\left( S_N\left( B^D \right)_{t_i,t_j} \right) \otimes \pi_\gamma\left( S_N\left( B^D \right)_{t_j,t} \right) \right|_{L^2}$$
$$\le c(n) \sum_{\alpha+\beta+\gamma=n} \left| \pi_\alpha\left( S_N\left( B^D \right)_{s,t_i} \right) \right|_{L^2} \left| \pi_\beta\left( S_N\left( B^D \right)_{t_i,t_j} \right) \right|_{L^2} \left| \pi_\gamma\left( S_N\left( B^D \right)_{t_j,t} \right) \right|_{L^2}$$
$$\le c \sum_{\alpha+\beta+\gamma=n} |t_i - s|^{\alpha/2}\, |t_j - t_i|^{\beta/2}\, |t - t_j|^{\gamma/2} \le c \sum_{\alpha+\beta+\gamma=n} |t-s|^{(\alpha+\beta+\gamma)/2} \le c\, |t-s|^{n/2},$$
and the proof is finished.
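The multiplicative (Chen) identity used above can be checked numerically for the step-2 signature of a piecewise linear path. The following sketch (our own illustration with ad-hoc names, not part of the thesis) verifies that concatenating signatures over $[s,u]$ and $[u,t]$ with the truncated tensor product reproduces the signature over $[s,t]$.

```python
import numpy as np

def sig2(increments):
    """Levels 1 and 2 of the signature of the piecewise linear path with the given d-dim increments."""
    d = increments.shape[1]
    x1, x2 = np.zeros(d), np.zeros((d, d))
    for delta in increments:
        # a linear segment contributes (x1 so far) (x) delta plus half its own square
        x2 += np.outer(x1, delta) + 0.5 * np.outer(delta, delta)
        x1 += delta
    return x1, x2

def chen(a, b):
    """Chen product of (1, a1, a2) and (1, b1, b2), truncated at level 2."""
    return a[0] + b[0], a[1] + b[1] + np.outer(a[0], b[0])

rng = np.random.default_rng(3)
incs = rng.standard_normal((10, 2))
whole = sig2(incs)
glued = chen(sig2(incs[:6]), sig2(incs[6:]))
assert np.allclose(whole[0], glued[0]) and np.allclose(whole[1], glued[1])
```

The assertion holds for any split point, which is exactly the multiplicativity used throughout this section.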


Theorem 1.1.7. Let $D$ be any partition of $[0,1]$ and let $N \in \mathbb{N}$. Then for all $n \le N$ there is a constant $c = c(n)$ such that for all $1/r \in [0, 1/2]$ and $(s,t) \in \Delta$,
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le c\, |D|^{1/2 - 1/r}\, |t-s|^{n/r}.$$

Remark 1.1.8. This theorem is a generalization of [FV10b, Prop. 13.20], where the statement was shown for $N = 2$. One might wonder if all the work in this section could be avoided by applying the Lipschitz property of the Lyons lift map $S_N$ (see e.g. [FV10b, Thm. 9.10]). The reasoning would be as follows: by definition of $\rho^{(n)}_{\alpha\text{-Höl}}$,
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le \left| \rho^{(n)}_{\alpha\text{-Höl}}\left( S_N(B^D), S_N(B) \right) \right|_{L^2} |t-s|^{n\alpha}$$
for every $\alpha \in (0,1]$. Lipschitz continuity of $S_N$ tells us that for $1/3 < \alpha \le 1/2$,
$$\rho_{\alpha\text{-Höl}}\left( S_N(B^D), S_N(B) \right) \le c\, \rho_{\alpha\text{-Höl}}\left( S_2(B^D), S_2(B) \right)$$
and therefore
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le c \left| \rho_{\alpha\text{-Höl}}\left( S_2(B^D), S_2(B) \right) \right|_{L^2} |t-s|^{n\alpha} \le c\, |D|^{1/2 - \alpha'}\, |t-s|^{n\alpha}$$
for $\alpha' \in (\alpha, 1/2]$ by [FV10b, Cor. 13.21]. The point we want to make here is that since $\alpha' \in (1/3, 1/2]$, the optimal rate of convergence with this approach will only be arbitrarily close to $1/6$. Theorem 1.1.7, on the other hand, states that
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le c\, |D|^{1/2 - \alpha}\, |t-s|^{n\alpha}$$
with $\alpha \in (0, 1/2]$; hence we can choose $\alpha$ close to $0$ to obtain a convergence rate close to $1/2$.

Proof. Fix $n \le N$. Assume first that $t_i \le s < t \le t_{i+1}$ for $t_i, t_{i+1} \in D$. Applying Lemmas 1.1.2 and 1.1.6 gives us
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le \left| \pi_n\left( S_N(B^D)_{s,t} \right) \right|_{L^2} + \left| \pi_n\left( S_N(B)_{s,t} \right) \right|_{L^2} \le c_1\, |t-s|^{n/2} = c_1\, |t-s|^{(n-1)/2} \min\{|D|, |t-s|\}^{1/2}.$$

From Lemma 1.1.4, for $s = t_i < t_j = t \in D$,
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le c_2\, |t-s|^{(n-1)/2}\, |\tilde{D}|^{1/2} = c_2\, |t-s|^{(n-1)/2} \min\{|\tilde{D}|, |t-s|\}^{1/2} \le c_2\, |t-s|^{(n-1)/2} \min\{|D|, |t-s|\}^{1/2}$$
where $\tilde{D} = \{s = t_i < \ldots < t_j = t\}$. Now assume that there are $t_i, t_j \in D$ such that $t_{i-1} \le s \le t_i < t_j \le t \le t_{j+1}$. Since $S_N(B^D)$ and $S_N(B)$ are multiplicative functionals,
$$S_N(B^D)_{s,t} - S_N(B)_{s,t} = \left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes S_N(B^D)_{t_i,t} + S_N(B)_{s,t_i} \otimes \left( S_N(B^D)_{t_i,t_j} - S_N(B)_{t_i,t_j} \right) \otimes S_N(B^D)_{t_j,t}$$
$$+ S_N(B)_{s,t_j} \otimes \left( S_N(B^D)_{t_j,t} - S_N(B)_{t_j,t} \right).$$


Now we project this down onto the $n$-th level and use the previous estimates. Note that for all $u < v$,
$$\pi_0\left( S_N(B^D)_{u,v} - S_N(B)_{u,v} \right) = 1 - 1 = 0$$
and for $t_i < t_j \in D$,
$$\pi_1\left( S_N(B^D)_{t_i,t_j} - S_N(B)_{t_i,t_j} \right) = B_{t_i,t_j} - B_{t_i,t_j} = 0.$$

Hence
$$\left| \pi_n\left( \left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes S_N(B^D)_{t_i,t} \right) \right|_{L^2} = \left| \sum_{l=1}^{n} \pi_l\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \otimes \pi_{n-l}\left( S_N(B^D)_{t_i,t} \right) \right|_{L^2}$$
$$\le c(n) \sum_{l=1}^{n} \left| \pi_l\left( S_N(B^D)_{s,t_i} - S_N(B)_{s,t_i} \right) \right|_{L^2} \left| \pi_{n-l}\left( S_N(B^D)_{t_i,t} \right) \right|_{L^2}$$
$$\le c \sum_{l=1}^{n} |t_i - s|^{(l-1)/2} \min\{|D|, |t_i - s|\}^{1/2}\, |t - t_i|^{(n-l)/2} \le c \min\{|D|, |t-s|\}^{1/2} \sum_{l=1}^{n} |t-s|^{(l-1+n-l)/2} \le c_3 \min\{|D|, |t-s|\}^{1/2}\, |t-s|^{(n-1)/2}.$$

In the same way one obtains
$$\left| \pi_n\left( S_N(B)_{s,t_j} \otimes \left( S_N(B^D)_{t_j,t} - S_N(B)_{t_j,t} \right) \right) \right|_{L^2} \le c_4 \min\{|D|, |t-s|\}^{1/2}\, |t-s|^{(n-1)/2}$$
and
$$\left| \pi_n\left( S_N(B)_{s,t_i} \otimes \left( S_N(B^D)_{t_i,t_j} - S_N(B)_{t_i,t_j} \right) \otimes S_N(B^D)_{t_j,t} \right) \right|_{L^2} \le c_5 \min\{|D|, |t-s|\}^{1/2}\, |t-s|^{(n-1)/2}.$$

Using the triangle inequality, we end up with
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le (c_3 + c_4 + c_5) \min\{|D|, |t-s|\}^{1/2}\, |t-s|^{(n-1)/2}.$$
We have shown that this estimate holds for all $s < t$. Assume $2/r \in (0,1)$. By geometric interpolation,
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le c\, |t-s|^{(n-1)/2} \left( |D|^{1-2/r}\, |t-s|^{2/r} \right)^{1/2} \le c\, |D|^{1/2 - 1/r}\, |t-s|^{(n-1)/r}\, |t-s|^{1/r} = c\, |D|^{1/2 - 1/r}\, |t-s|^{n/r}.$$

Since the constant $c$ does not depend on $r$, this estimate also holds for $1/r \in \{0, 1/2\}$.

Corollary 1.1.9. Let $0 \le \alpha < 1/2$. Then, for every $\eta \in (0, 1/2 - \alpha)$, there is a constant $c = c(\alpha, \eta, N)$ such that for all $q \in [1, \infty)$,
$$\left| \rho_{\alpha\text{-Höl}}\left( S_N(B^D), S_N(B) \right) \right|_{L^q} \le c\, q^{N/2}\, |D|^{\eta}.$$


Proof. Set $1/r := 1/2 - \eta$. Note that with this choice, $\alpha < 1/r < 1/2$. By Theorem 1.1.7, we know that for all $n = 1, \ldots, N$,
$$\left| \pi_n\left( S_N(B^D)_{s,t} - S_N(B)_{s,t} \right) \right|_{L^2} \le c_1\, |D|^{1/2 - 1/r}\, |t-s|^{n/r}.$$
Lemmas 1.1.2 and 1.1.6 show that
$$\left| \pi_n\left( S_N(B)_{s,t} \right) \right|_{L^2} \le c_2\, |t-s|^{n/r}, \qquad \left| \pi_n\left( S_N(B^D)_{s,t} \right) \right|_{L^2} \le c_3\, |t-s|^{n/r}.$$
Applying [FV10b, Prop. 15.24], we obtain
$$\left| \rho^{(n)}_{\alpha\text{-Höl}}\left( S_N(B^D), S_N(B) \right) \right|_{L^q} \le c(n, \alpha, \eta, N)\, q^{n/2}\, |D|^{1/2 - 1/r} \le c(\alpha, \eta, N)\, q^{N/2}\, |D|^{\eta},$$
therefore
$$\left| \rho_{\alpha\text{-Höl}}\left( S_N(B^D), S_N(B) \right) \right|_{L^q} \le c\, q^{N/2}\, |D|^{\eta}.$$

The next theorem states the main result of this section.

Theorem 1.1.10. Let $B$ be a Brownian motion on a probability space $\left( \Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,1]}, P \right)$. Assume that we have a sequence of partitions $(D_n)_{n \in \mathbb{N}}$ of $[0,1]$ such that the sequence $(|D_n|)_{n \in \mathbb{N}}$ of real numbers is contained in $\bigcup_{q \ge 1} l^q$. Let $N \in \mathbb{N}$, $0 \le \alpha < 1/2$ and $\eta \in (0, 1/2 - \alpha)$. Then there is an (almost surely finite) random variable $C$, $\mathcal{F}$-measurable, depending also on $\alpha$, $\eta$ and $N$, such that
$$\rho_{\alpha\text{-Höl}}\left( S_N(B^{D_n}), S_N(B) \right) \le C\, |D_n|^{\eta} \quad \text{a.s.}$$
for all $n \in \mathbb{N}$.

Remark 1.1.11. We shall apply this theorem with $N = [1/\alpha]$, which makes $\rho_{\alpha\text{-Höl}}$ a rough path metric, and let $\alpha \downarrow 0$ to obtain optimal rates of convergence.

Remark 1.1.12. The typical example of a sequence of partitions $(D_n)_{n \in \mathbb{N}}$ satisfying the condition in the theorem are the dyadic partitions. Another example would be the uniform partitions
$$D_n = \left\{ 0 < \frac{1}{n} < \frac{2}{n} < \ldots < \frac{n-1}{n} < 1 \right\}$$
since for $q > 1$,
$$\left| (|D_n|) \right|^q_{l^q} = \sum_{n=1}^{\infty} \left( \frac{1}{n} \right)^q < \infty.$$

Proof. Choose $\eta'$ such that $\eta < \eta' < 1/2 - \alpha$ and define $\epsilon := \eta' - \eta > 0$. Applying Corollary 1.1.9 with $\eta'$ gives us
$$\left| \frac{\rho_{\alpha\text{-Höl}}\left( S_N(B^{D_n}), S_N(B) \right)}{|D_n|^{\eta}} \right|_{L^q} \le c\, q^{N/2}\, |D_n|^{\epsilon}$$
for every $q \ge 1$. Using the Markov inequality shows that for every $\delta > 0$,
$$\sum_{n=1}^{\infty} P\left( \frac{\rho_{\alpha\text{-Höl}}\left( S_N(B^{D_n}), S_N(B) \right)}{|D_n|^{\eta}} \ge \delta \right) \le \frac{1}{\delta^q} \sum_{n=1}^{\infty} \left| \frac{\rho_{\alpha\text{-Höl}}\left( S_N(B^{D_n}), S_N(B) \right)}{|D_n|^{\eta}} \right|^q_{L^q} \le c \sum_{n=1}^{\infty} |D_n|^{q\epsilon}.$$


From the assumptions on $(D_n)$ we can choose $q$ big enough such that the series converges. With Borel–Cantelli we conclude that
$$\frac{\rho_{\alpha\text{-Höl}}\left( S_N(B^{D_n}), S_N(B) \right)}{|D_n|^{\eta}} \to 0 \quad \text{a.s.}$$
for $n \to \infty$. We set
$$C := \sup_n \frac{\rho_{\alpha\text{-Höl}}\left( S_N(B^{D_n}), S_N(B) \right)}{|D_n|^{\eta}},$$
which is finite almost surely. Since $C$ is the supremum of $\mathcal{F}$-measurable random variables, it is itself $\mathcal{F}$-measurable.
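The level-2 rate $|D|^{1/2}$ behind these results can be observed in simulation. The sketch below (our own illustration; all names are ad hoc) compares the Lévy area of piecewise linear approximations on two coarse partitions against a fine-grid reference: quartering the mesh should roughly halve the $L^2$ error.

```python
import numpy as np

def area(d1, d2):
    """Levy area of the piecewise linear path with increment sequences (d1, d2)."""
    B1 = np.concatenate([[0.0], np.cumsum(d1)])[:-1]
    B2 = np.concatenate([[0.0], np.cumsum(d2)])[:-1]
    return 0.5 * np.sum(B1 * d2 - B2 * d1)

def coarsen(d, factor):
    """Increments over the coarser partition obtained by merging blocks of fine intervals."""
    return d.reshape(-1, factor).sum(axis=1)

rng = np.random.default_rng(7)
n_fine, n_samples = 1024, 4000
dt = 1.0 / n_fine
errs = {8: [], 32: []}
for _ in range(n_samples):
    d1 = rng.standard_normal(n_fine) * np.sqrt(dt)
    d2 = rng.standard_normal(n_fine) * np.sqrt(dt)
    reference = area(d1, d2)        # fine-grid stand-in for the true Levy area
    for m in errs:
        errs[m].append(reference - area(coarsen(d1, n_fine // m), coarsen(d2, n_fine // m)))
rms = {m: np.sqrt(np.mean(np.square(e))) for m, e in errs.items()}
ratio = rms[8] / rms[32]            # mesh ratio 4, so the errors should differ by about 4**0.5 = 2
```

The observed ratio close to 2 is consistent with the $|D|^{1/2}$ rate on level 2 (up to the $\epsilon$-losses in the almost-sure statement).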

2 Convergence rates for the full Gaussian rough paths

Recall that rough path theory [Lyo98, LQ02, FV10b] is a general framework that allows one to establish existence, uniqueness and stability of differential equations driven by multi-dimensional continuous signals $x \colon [0,T] \to \mathbb{R}^d$ of low regularity. Formally, a rough differential equation (RDE) is of the form
$$dy_t = \sum_{i=1}^{d} V_i(y_t)\, dx^i_t \equiv V(y_t)\, dx_t; \qquad y_0 \in \mathbb{R}^e \qquad (2.1)$$
where $(V_i)_{i=1,\ldots,d}$ is a family of vector fields in $\mathbb{R}^e$. When $x$ has finite $p$-variation, $p < 2$, such differential equations can be handled by Young integration theory. Of course, this point of view does not allow one to handle differential equations driven by Brownian motion; indeed,
$$\sup_{D \subset [0,T]} \sum_{t_i \in D} \left| B_{t_{i+1}} - B_{t_i} \right|^2 = +\infty \quad \text{a.s.},$$
let alone differential equations driven by stochastic processes with less sample path regularity than Brownian motion (such as fractional Brownian motion (fBM) with Hurst parameter $H < 1/2$). Lyons' key insight was that the low regularity of $x$, say $p$-variation or $1/p$-Hölder for some $p \in [1, \infty)$, can be compensated by including "enough" higher order information of $x$, such as all increments
$$\mathbf{x}^n_{s,t} \equiv \int_{s < t_1 < \ldots < t_n < t} dx_{t_1} \otimes \ldots \otimes dx_{t_n} \qquad (2.2)$$

where "enough" means $n \le [p]$ ($\{e_1, \ldots, e_d\}$ denotes just the usual Euclidean basis in $\mathbb{R}^d$ here). Subject to some generalized $p$-variation (or $1/p$-Hölder) regularity, the ensemble $\left( \mathbf{x}^1, \ldots, \mathbf{x}^{[p]} \right)$ then constitutes what is known as a rough path.¹ In particular, no higher order information is necessary in the Young case; whereas the regime relevant for Brownian motion requires second order - or level 2 - information ("Lévy's area"), and so on. Note that the iterated integral on the

¹A basic theorem of rough path theory asserts that further iterated integrals up to any level $N \ge [p]$, i.e.
$$S_N(\mathbf{x}) := \left( \mathbf{x}^n : n \in \{1, \ldots, N\} \right),$$
are then deterministically determined, and the map $\mathbf{x} \mapsto S_N(\mathbf{x})$, known as the Lyons lift, is continuous in rough path metrics.

r.h.s. of (2.2) is not - in general - a well-defined Riemann–Stieltjes integral. Instead one typically proceeds by mollification: given a multi-dimensional sample path $x = X(\omega)$, consider piecewise linear approximations or convolution with a smooth kernel, compute the iterated integrals and then pass, if possible, to a limit in probability. Following this strategy one can often construct a "canonical" enhancement of some stochastic process to a (random) rough path. Stochastic integration and differential equations are then discussed in a (rough) pathwise fashion, even in the complete absence of a semi-martingale structure. It should be emphasized that rough path theory was - from the very beginning - closely related to higher order Euler schemes. Let $D = \{0 = t_0 < \ldots < t_{\#D-1} = 1\}$ be a partition of the unit interval.²

Considering the solution $y$ of (2.1), the step-$N$ Euler approximation $y^{\text{Euler}^N;D}$ is given by
$$y^{\text{Euler}^N;D}_0 = y_0,$$
$$y^{\text{Euler}^N;D}_{t_{j+1}} = y^{\text{Euler}^N;D}_{t_j} + V_i\left( y^{\text{Euler}^N;D}_{t_j} \right) \mathbf{x}^i_{t_j,t_{j+1}} + V_{i_1} V_{i_2}\left( y^{\text{Euler}^N;D}_{t_j} \right) \mathbf{x}^{i_1,i_2}_{t_j,t_{j+1}} + \ldots + V_{i_1} \cdots V_{i_{N-1}} V_{i_N}\left( y^{\text{Euler}^N;D}_{t_j} \right) \mathbf{x}^{i_1,\ldots,i_N}_{t_j,t_{j+1}}$$
at the points $t_j \in D$, where we use the Einstein summation convention, $V_i$ stands for the differential operator $\sum_{k=1}^{e} V^k_i \partial_{x_k}$ and $\mathbf{x}^{i_1,\ldots,i_n}_{s,t} = \int_{s < t_1 < \ldots < t_n < t} dx^{i_1}_{t_1} \cdots dx^{i_n}_{t_n}$. The simplified step-$N$ Euler scheme³ is given by

$$y^{\text{sEuler}^N;D}_0 = y_0,$$
$$y^{\text{sEuler}^N;D}_{t_{j+1}} = y^{\text{sEuler}^N;D}_{t_j} + V_i\left( y^{\text{sEuler}^N;D}_{t_j} \right) x^i_{t_j,t_{j+1}} + \frac{1}{2} V_{i_1} V_{i_2}\left( y^{\text{sEuler}^N;D}_{t_j} \right) x^{i_1}_{t_j,t_{j+1}} x^{i_2}_{t_j,t_{j+1}} + \ldots + \frac{1}{N!} V_{i_1} \cdots V_{i_{N-1}} V_{i_N}\left( y^{\text{sEuler}^N;D}_{t_j} \right) x^{i_1}_{t_j,t_{j+1}} \cdots x^{i_N}_{t_j,t_{j+1}}.$$
Since $x^1_{t_j,t_{j+1}} = X_{t_j,t_{j+1}}(\omega) = X_{t_{j+1}}(\omega) - X_{t_j}(\omega)$, this is precisely the effect of replacing the underlying sample path segment of $X$ by its piecewise linear approximation, i.e.
$$\left\{ X_t(\omega) : t \in [t_j, t_{j+1}] \right\} \;\longleftrightarrow\; \left\{ X_{t_j}(\omega) + \frac{t - t_j}{t_{j+1} - t_j}\, X_{t_j,t_{j+1}}(\omega) : t \in [t_j, t_{j+1}] \right\}.$$
Therefore, as pointed out in [DNT12] in the level $N = 2$ Hölder rough path context, it is immediate that a Wong–Zakai type result, i.e. a.s. convergence of $y^{(k)} \to y$ for $k \to \infty$, where $y^{(k)}$ solves
$$dy^{(k)}_t = V\left( y^{(k)}_t \right) dx^{(k)}_t; \qquad y^{(k)}_0 = y_0 \in \mathbb{R}^e$$
and $x^{(k)}$ is the piecewise linear approximation of $x$ at the points $(t_j)_{j=0}^{k} = D_k$, i.e.
$$x^{(k)}_t = x_{t_j} + \frac{t - t_j}{t_{j+1} - t_j}\, x_{t_j,t_{j+1}} \quad \text{if } t \in [t_j, t_{j+1}],\ t_j \in D_k,$$

²A general time horizon $[0,T]$ is handled by trivial reparametrization of time.
³... which one would call a Milstein scheme when $N = 2$ ...
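For a smooth driver the simplified step-2 scheme can be tried out directly. The sketch below (our own toy example, not the thesis's setting; all names are ad hoc) applies it to the scalar equation $dy = y\,dx$ with $x_t = \sin t$, whose exact solution is $y_t = y_0 e^{x_t}$.

```python
import numpy as np

def simplified_euler2(y0, x, V, dV):
    """Simplified step-2 (Milstein-type) scheme for the scalar equation dy = V(y) dx:
    y_{j+1} = y_j + V(y_j) dx + (1/2) (dV * V)(y_j) dx^2, using increments of x only."""
    y = y0
    for dx in np.diff(x):
        y = y + V(y) * dx + 0.5 * dV(y) * V(y) * dx ** 2
    return y

t = np.linspace(0.0, 1.0, 201)
x = np.sin(t)
approx = simplified_euler2(1.0, x, lambda y: y, lambda y: 1.0)
exact = np.exp(np.sin(1.0))
assert abs(approx - exact) < 1e-3
```

Each step multiplies $y$ by $1 + \Delta x + \Delta x^2/2$, the degree-2 truncation of $e^{\Delta x}$, which is exactly the "increments only" replacement of the iterated integral $\mathbf{x}^{i_1,i_2}$ by $\frac{1}{2}\Delta x^{i_1}\Delta x^{i_2}$.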

leads to the convergence of the simplified (and implementable!) step-$N$ Euler scheme. While Wong–Zakai type results in rough path metrics are available for large classes of stochastic processes [FV10b, Chapters 13, 14, 15, 16], our focus here is on Gaussian processes which can be enhanced to rough paths. This problem was first discussed in [CQ02] where it was shown in particular that piecewise linear approximations to fBM are convergent in $p$-variation rough path metric if and only if $H > 1/4$. A practical (and essentially sharp) structural condition for the covariance, namely finite $\rho$-variation based on rectangular increments for some $\rho < 2$ of the underlying Gaussian process, was given in [FV10a] and allowed for a unified and detailed analysis of the resulting class of Gaussian rough paths. This framework has since proven useful in a variety of different applications ranging from non-Markovian Hörmander theory [CF10] to non-linear PDEs perturbed by space-time white noise [Hai11]. Of course, fractional Brownian motion can also be handled in this framework (for $H > 1/4$) and we shall make no attempt to survey its numerous applications in engineering, finance and other fields. Before describing our main result, let us recall in more detail some aspects of Gaussian rough path theory (e.g. [FV10a], [FV10b, Chapter 15], [FV11]). The basic object is a centred, continuous Gaussian process with sample paths $X(\omega) = \left( X^1(\omega), \ldots, X^d(\omega) \right) \colon [0,1] \to \mathbb{R}^d$ where $X^i$ and $X^j$ are independent for $i \ne j$. The law of this process is determined by $R_X \colon [0,1]^2 \to \mathbb{R}^{d \times d}$, the covariance function, given by

$$R_X(s,t) = \operatorname{diag}\left( E\left[ X^1_s X^1_t \right], \ldots, E\left[ X^d_s X^d_t \right] \right).$$
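To make the notion of rectangular increments of $R_X$ concrete, here is a small sketch (our own illustration, not from the text) computing the grid $\rho$-variation with $\rho = 1$ for the Brownian covariance $R(s,t) = \min(s,t)$: off-diagonal rectangles give zero, so the sum equals the total diagonal mass 1.

```python
import numpy as np

def rect_increment(R, s, t, u, v):
    """Rectangular increment of a covariance function R over [s,t] x [u,v]."""
    return R(t, v) - R(t, u) - R(s, v) + R(s, u)

R_bm = lambda a, b: min(a, b)       # covariance of 1d Brownian motion
grid = np.linspace(0.0, 1.0, 11)
var1 = sum(abs(rect_increment(R_bm, grid[i], grid[i + 1], grid[j], grid[j + 1]))
           for i in range(10) for j in range(10))
# for Brownian motion the rho = 1 grid variation over [0,1]^2 equals 1
assert abs(var1 - 1.0) < 1e-12
```

This is the sense in which Brownian motion has finite $\rho$-variation with $\rho = 1$; rougher Gaussian processes spread covariance mass off the diagonal and require larger $\rho$.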

The main result in this context (see e.g. [FV10b, Theorem 15.33], [FV11]) now asserts that if there exists $\rho < 2$ such that $V_\rho\left( R_X, [0,1]^2 \right) < \infty$, then $X$ lifts to an enhanced Gaussian process $\mathbf{X}$ with sample paths in the $p$-variation rough path space $C^{0,p\text{-var}}\left( [0,1], G^{[p]}(\mathbb{R}^d) \right)$, any $p \in (2\rho, 4)$. This lift is "natural" in the sense that for a large class of smooth approximations $X^{(k)}$ of $X$ (say piecewise linear, mollifier, Karhunen–Loève), the corresponding iterated integrals of $X^{(k)}$ converge (in probability) to $\mathbf{X}$ with respect to the $p$-variation rough path metric. (We recall from [FV10b] that $\rho_{p\text{-var}}$, the so-called inhomogeneous $p$-variation metric for $G^N(\mathbb{R}^d)$-valued paths, is called $p$-variation rough path metric when $[p] = N$; the Itô–Lyons map enjoys local Lipschitz regularity in this $p$-variation rough path metric.) Moreover, this condition is sharp; indeed, fBM falls into this framework with $\rho = 1/(2H)$ and we know that piecewise linear approximations to Lévy's area diverge when $H = 1/4$. Our main result (cf. Theorem 2.4.1), when applied to (mesh-size $1/k$) piecewise linear approximations $X^{(k)}$ of $X$, reads as follows.

Theorem 2.0.1. Let $X = \left( X^1, \ldots, X^d \right) \colon [0,1] \to \mathbb{R}^d$ be a centred Gaussian process on a probability space $(\Omega, \mathcal{F}, P)$ with continuous sample paths where $X^i$ and $X^j$ are independent for $i \ne j$. Assume that the covariance $R_X$ has finite $\rho$-variation for $\rho \in [1,2)$ and $K \ge V_\rho\left( R_X, [0,1]^2 \right)$. Then there is an enhanced Gaussian process $\mathbf{X}$ with sample paths a.s. in $C^{0,p\text{-var}}\left( [0,1], G^{[p]}(\mathbb{R}^d) \right)$ for any $p \in (2\rho, 4)$ and
$$\left| \rho_{p\text{-var}}\left( S_{[p]}\left( X^{(k)} \right), \mathbf{X} \right) \right|_{L^r} \to 0$$
for $k \to \infty$ and every $r \ge 1$ ($|\cdot|_{L^r}$ denotes just the usual $L^r(P)$-norm for real-valued random variables here). Moreover, for any $\gamma > \rho$ such that $\frac{1}{\gamma} + \frac{1}{\rho} > 1$, any $q > 2\gamma$ and any $N \in \mathbb{N}$, there is a constant $C = C(q, \rho, \gamma, K, N)$ such that
$$\left| \rho_{q\text{-var}}\left( S_N\left( X^{(k)} \right), S_N(\mathbf{X}) \right) \right|_{L^r} \le C\, r^{N/2} \sup_{0 \le t \le 1} \left| X^{(k)}_t - X_t \right|^{1 - \rho/\gamma}_{L^2}$$
holds for every $k \in \mathbb{N}$.


As an immediate consequence we obtain (essentially) sharp a.s. convergence rates for Wong–Zakai approximations and the simplified step-3 Euler scheme.

Corollary 2.0.2. Consider an RDE with $C^\infty$-bounded vector fields driven by a Gaussian Hölder rough path $\mathbf{X}$. Then mesh-size $1/k$ Wong–Zakai approximations (i.e. solutions of ODEs driven by $X^{(k)}$) converge uniformly with a.s. rate $k^{-(1/\rho - 1/2 - \epsilon)}$, any $\epsilon > 0$, to the RDE solution. The same rate is valid for the simplified (and implementable) step-3 Euler scheme.

Proof. See Corollary 2.4.5 and Corollary 2.4.7.

Several remarks are in order.

• Rough path analysis usually dictates that $N = 2$ (resp. $N = 3$) levels need to be considered when $\rho \in [1, 3/2)$ resp. $\rho \in [3/2, 2)$. Interestingly, the situation for the Wong–Zakai error is quite different here: referring to Theorem 2.0.1, when $\rho = 1$ we can and will take $\gamma$ arbitrarily large in order to obtain the optimal convergence rate. Since $\rho_{q\text{-var}}$ is a rough path metric only in the case $N = [q] \ge [2\gamma]$, we see that we need to consider all levels $N$, which is what Theorem 2.0.1 allows us to do. On the other hand, as $\rho$ approaches 2, there is not so much room left for taking $\gamma > \rho$. Even so, we can always find $\gamma$ with $[\gamma] = 2$ such that $1/\gamma + 1/\rho > 1$. Picking $q > 2\gamma$ small enough shows that we need $N = [q] = 4$.

• The assumption of $C^\infty$-bounded vector fields in the corollary was made for simplicity only. In the proof we employ local Lipschitz continuity of the Itô–Lyons map for $q$-variation rough paths (involving $N = [q]$ levels). As is well known, this requires $\mathrm{Lip}^{q+\epsilon}$-regularity of the vector fields⁴. Curiously again, we need $C^\infty$-bounded vector fields when $\rho = 1$ but only $\mathrm{Lip}^{4+\epsilon}$ as $\rho$ approaches the critical value 2.

• Brownian motion falls into this framework with $\rho = 1$. While the a.s. (Wong–Zakai) rate $k^{-(1/2-\epsilon)}$ is part of the folklore of the subject (e.g. [GS06]), the $C^\infty$-boundedness assumption appears unnecessarily strong. Our explanation here is that our rates are universal (i.e. valid away from one universal null-set, not dependent on starting points, coefficients etc.). In particular, the (Wong–Zakai) rates are valid on the level of stochastic flows of diffeomorphisms; we previously discussed these issues in the Brownian context in [FR11].

• A surprising aspect appears in the proof of Theorem 2.0.1. The strategy is to give sharp estimates for the levels $n = 1, \ldots, 4$ first, then to perform an induction similar to the one used in Lyons' Extension Theorem ([Lyo98]) for the higher levels. This is in contrast to the usual consideration of levels 1 to 3 only (without level 4!) which is typical for Gaussian rough paths. (Recall that we deal with Gaussian processes which have sample paths of finite $p$-variation, $p \in (2\rho, 4)$, hence $[p] \le 3$, which indicates that we would need to control the first 3 levels only before using the Extension Theorem.)

• Although Theorem 2.0.1 was stated here for (step-size $1/k$) piecewise linear approximations $X^{(k)}$, the estimate holds in great generality for (Gaussian) approximations whose covariance satisfies a uniform $\rho$-variation bound. The statements of Theorem 2.4.1 and Theorem 2.4.2 reflect this generality.

• Wong–Zakai rates for the Brownian rough path (level 2) were first discussed in [HN09]. They prove that Wong–Zakai approximations converge (in $\gamma$-Hölder metric) with rate $k^{-(1/2-\gamma-\epsilon)}$ (in fact, a logarithmic sharpening thereof without $\epsilon$) provided $\gamma \in (1/3, 1/2)$. This restriction on $\gamma$ is serious (for they fully rely on "level 2" rough path theory); in particular, the best "uniform" Wong–Zakai convergence rate implied is $k^{-(1/2-1/3-\epsilon)} = k^{-(1/6-\epsilon)}$, leaving a significant gap to the well-known Brownian a.s. Wong–Zakai rate.

⁴...in the sense of E. Stein; cf. [LQ02, FV10b] for instance.


• Wong–Zakai (and Milstein) rates for the fractional Brownian rough path (level 2 only, Hurst parameter $H > 1/3$) were first discussed in [DNT12]. They prove that Wong–Zakai approximations converge (in $\gamma$-Hölder metric) with rate $k^{-(H-\gamma-\epsilon)}$ (again, in fact, a logarithmic sharpening thereof without $\epsilon$) provided $\gamma \in (1/3, H)$. Again, the restriction on $\gamma$ is serious and the best "uniform" Wong–Zakai convergence rate - and the resulting rate for the Milstein scheme - is $k^{-(H-1/3-\epsilon)}$. This should be compared to the rate $k^{-(2H-1/2-\epsilon)}$ obtained from our corollary. In fact, this rate was conjectured in [DNT12] and is sharp, as may be seen from a precise result concerning Lévy's stochastic area for fBM; see [NTU10].

The remainder of this chapter is structured as follows: Section 2.1 recalls the connection between the shuffle algebra and iterated integrals. In particular, we will use the shuffle structure to see that in order to show the desired estimates, we can concentrate on some iterated integrals which, in a sense, generate all the others. Our main tool for showing $L^2$-estimates on the lower levels is multidimensional Young integration, which we present in Section 2.2. The main work, namely showing the desired $L^2$-estimates for the difference of high-order iterated integrals, is done in Section 2.3. After some preliminary lemmas in Subsection 2.3.1, we show the estimates for the lower levels, namely for $n = 1, 2, 3, 4$, in Subsection 2.3.2, then give an induction argument in Subsection 2.3.3 for the higher levels $n > 4$. Section 2.4 contains our main result, namely sharp a.s. convergence rates for a class of Wong–Zakai approximations, including piecewise linear and mollifier approximations. We further show in Subsection 2.4.3 how to use these results in order to obtain sharp convergence rates for the simplified Euler scheme. As already mentioned, we can restrict ourselves to the case $[0,T] = [0,1]$ in the following chapter.

2.1 Iterated integrals and the shuffle algebra

Let $x = \left( x^1, \ldots, x^d \right) \colon [0,1] \to \mathbb{R}^d$ be a path of finite variation. Forming finite linear combinations of iterated integrals of the form
$$\int_{\Delta^n_{0,1}} dx^{i_1} \cdots dx^{i_n}, \qquad i_1, \ldots, i_n \in \{1, \ldots, d\},\ n \in \mathbb{N},$$
defines a vector space over $\mathbb{R}$. In this section, we will see that this vector space is also an algebra, where the product is given simply by the usual multiplication. Moreover, we will describe precisely what the product of two iterated integrals looks like.

2.1.1 The shuffle algebra

Let $A$ be a set which we will from now on call the alphabet. In the following, we will only consider the finite alphabet $A = \{a, b, \ldots\} = \{a_1, a_2, \ldots, a_d\} = \{1, \ldots, d\}$. We denote by $A^*$ the set of words composed of the letters of $A$, hence $w = a_{i_1} a_{i_2} \cdots a_{i_n}$, $a_{i_j} \in A$. The empty word is denoted by $e$. $A^+$ is the set of non-empty words. The length of a word $w$ is denoted by $|w|$ and $|w|_a$ denotes the number of occurrences of the letter $a$. We denote by $\mathbb{R}\langle A \rangle$ the vector space of noncommutative polynomials on $A$ over $\mathbb{R}$; hence every $P \in \mathbb{R}\langle A \rangle$ is a linear combination of words in $A^*$ with coefficients in $\mathbb{R}$. $(P, w)$ denotes the coefficient in $P$ of the word $w$. Hence every polynomial $P$ can be written as
$$P = \sum_{w \in A^*} (P, w)\, w$$
and the sum is finite since the $(P, w)$ are non-zero only for a finite set of words $w$. We define the degree of $P$ as
$$\deg(P) = \max\left\{ |w| : (P, w) \ne 0 \right\}.$$


A polynomial is called homogeneous if all its monomials have the same degree. We want to define a product on $\mathbb{R}\langle A \rangle$. Since a polynomial is determined by its coefficients on each word, we can define the product $PQ$ of $P$ and $Q$ by
$$(PQ, w) = \sum_{w = uv} (P, u)(Q, v).$$

Note that this definition coincides with the usual multiplication in a (noncommutative) polynomial ring. We call this product the concatenation product, and the algebra $\mathbb{R}\langle A \rangle$ endowed with this product the concatenation algebra. There is another product on $\mathbb{R}\langle A \rangle$ which will be of special interest for us. We need some notation first. Given a word $w = a_{i_1} a_{i_2} \cdots a_{i_n}$ and a subsequence $U = (j_1, j_2, \ldots, j_k)$ of $(i_1, \ldots, i_n)$, we denote by $w(U)$ the word $a_{j_1} a_{j_2} \cdots a_{j_k}$ and we call $w(U)$ a subword of $w$. If $w, u, v$ are words and if $w$ has length $n$, we denote by $\binom{w}{u\ v}$ the number of subsequences $U$ of $(1, \ldots, n)$ such that $w(U) = u$ and $w(U^c) = v$.

Definition 2.1.1. The (homogeneous) polynomial
$$u * v = \sum_{w \in A^*} \binom{w}{u\ v}\, w$$
is called the shuffle product of $u$ and $v$. By linearity we extend it to a product on $\mathbb{R}\langle A \rangle$.
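The counting in this definition is easy to implement recursively; the small sketch below (our own illustration, not from the thesis) returns the shuffle product of two words as a multiset of words with multiplicities.

```python
from collections import Counter

def shuffle(u, v):
    """Shuffle product of two words, as a Counter mapping words to multiplicities."""
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    out = Counter()
    for w, c in shuffle(u[:-1], v).items():    # shuffles ending with the last letter of u
        out[w + u[-1]] += c
    for w, c in shuffle(u, v[:-1]).items():    # shuffles ending with the last letter of v
        out[w + v[-1]] += c
    return out

# shuffle("ab", "c") gives the three interleavings abc, acb, cab, each with multiplicity 1
result = shuffle("ab", "c")
```

Note that repeated letters produce multiplicities, e.g. $a * a = 2\,aa$, which is exactly the coefficient $\binom{w}{u\ v}$ above.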

In order to prove our main result, we want to use some sort of induction over the length of the words. Therefore, the following definition will be useful.

Definition 2.1.2. If $U$ is a set of words of the same length, we call a subset $\{w_1, \ldots, w_k\}$ of $U$ a generating set for $U$ if for every word $w \in U$ there is a polynomial $R$ and real numbers $\lambda_1, \ldots, \lambda_k$ such that
$$w = \sum_{j=1}^{k} \lambda_j w_j + R$$
where $R$ is of the form $R = \sum_{u,v \in A^+} \mu_{u,v}\, u * v$ for real numbers $\mu_{u,v}$.

n1 nd ∗ Definition 2.1.3. We say that a word w is composed by a1 , . . . , ad if w ∈ {a1, . . . , ad} and |w| = n for i = 1, . . . , d, hence every letter appears in the word with the given multiplicity. ai i The aim now is to find a (possibly small) generating set for the set of all words composed by some given letters. The next definition introduces a special class of words which will be important for us.

Definition 2.1.4. Let A be totally ordered and put on A∗ the alphabetical order. If w is a word such that whenever w = uv for u, v ∈ A+ one has u < v, then w is called a Lyndon word.
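Lyndon words are easy to test for in practice via the standard equivalent characterization: a word is Lyndon if and only if it is strictly smaller than each of its proper suffixes. A small sketch (function name and examples ours):

```python
def is_lyndon(w):
    """A nonempty word is Lyndon iff it is strictly smaller (in the
    alphabetical order) than every proper suffix; this is equivalent to
    the factorization condition u < v for every w = uv in the text."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))

# With the order a < b < c:
print([w for w in ["aab", "aba", "baa", "ab", "aabc", "aacb", "abac"]
       if is_lyndon(w)])  # ['aab', 'ab', 'aabc', 'aacb', 'abac']
```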

Proposition 2.1.5. (i) For the set {words composed by a, a, b} a generating set is given by {aab}.

(ii) For the set {words composed by a, a, a, b} a generating set is given by {aaab}.

(iii) For the set {words composed by a, a, b, b} a generating set is given by {aabb}.

(iv) For the set {words composed by a, a, b, c} a generating set is given by {aabc, aacb, baac}.

40 Iterated integrals and the shuffle algebra

Proof. Consider the alphabet $A = \{a, b, c\}$ with the order $a < b < c$. A general theorem states that every word $w$ has a unique decreasing factorization into Lyndon words, i.e. $w = l_1^{i_1} \cdots l_k^{i_k}$ where $l_1 > \dots > l_k$ are Lyndon words and $i_1, \dots, i_k \geq 1$ (see [Reu93, Theorem 5.1 and Corollary 4.7]), and the formula
$$\frac{1}{i_1! \cdots i_k!}\, l_1^{* i_1} * \cdots * l_k^{* i_k} = w + \sum_u \alpha_u\, u$$
holds with real coefficients $\alpha_u$.

From the identity
$$abac = b * aac - baac - aabc - aacb$$
it follows that also $\{aabc, aacb, baac\}$ generates this set.

2.1.2 The connection to iterated integrals

Let $x = (x^1, \dots, x^d): [0,1] \to \mathbb{R}^d$ be a path of finite variation and fix $s < t \in [0,1]$. For a word $w = (a_{i_1} \cdots a_{i_n}) \in A^*$, $A = \{1, \dots, d\}$, we define
$$x^w_{s,t} = \begin{cases} \int_{\Delta^n_{s,t}} dx^{i_1} \cdots dx^{i_n} & \text{if } w \in A^+ \\ 1 & \text{if } w = e. \end{cases}$$

Let $(\mathbb{R}\langle A\rangle, +, *)$ be the shuffle algebra over the alphabet $A$. We define a map $\Phi: \mathbb{R}\langle A\rangle \to \mathbb{R}$ by $\Phi(w) = x^w_{s,t}$ and extend it linearly to polynomials $P \in \mathbb{R}\langle A\rangle$. The key observation is the following:

Theorem 2.1.6. $\Phi$ is an algebra homomorphism from the shuffle algebra $(\mathbb{R}\langle A\rangle, +, *)$ to $(\mathbb{R}, +, \cdot)$.

Proof. [Reu93], Corollary 3.5.
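Theorem 2.1.6 can be checked numerically for a concrete smooth path. The sketch below (all names and the choice $x = (t, t^2)$ are ours) approximates the iterated integrals $x^w_{0,1}$ by left-point Riemann sums and verifies $\Phi(u)\Phi(v) = \Phi(u * v)$ up to discretization error:

```python
N = 2000
ts = [i / N for i in range(N + 1)]
path = {"1": [t for t in ts], "2": [t * t for t in ts]}  # x = (t, t^2)

def iterated(word):
    """x^w_{0,1} via the recursion x^{wa}_{0,t} = int_0^t x^w_{0,u} dx^a_u,
    discretized with left-point Riemann sums."""
    vals = [1.0] * (N + 1)                 # empty word: x^e = 1
    for a in word:
        acc, out = 0.0, [0.0]
        for i in range(N):
            acc += vals[i] * (path[a][i + 1] - path[a][i])
            out.append(acc)
        vals = out
    return vals[N]

def shuffle(u, v):
    if not u: return {v: 1}
    if not v: return {u: 1}
    out = {}
    for w, c in shuffle(u[1:], v).items(): out[u[0] + w] = out.get(u[0] + w, 0) + c
    for w, c in shuffle(u, v[1:]).items(): out[v[0] + w] = out.get(v[0] + w, 0) + c
    return out

u, v = "12", "2"
lhs = iterated(u) * iterated(v)
rhs = sum(c * iterated(w) for w, c in shuffle(u, v).items())
print(abs(lhs - rhs))  # close to 0: Phi(u) Phi(v) = Phi(u * v)
```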

In the following, X will denote a Gaussian process as in Theorem 2.0.1 and X denotes the natural Gaussian rough path. We will need the following Proposition:

Proposition 2.1.7. Let $X$ be as in Theorem 2.0.1 and assume that $\omega$ controls the $\rho$-variation of the covariance of $X$, $\rho \in [1,2)$. Then for every $n \in \mathbb{N}$ there is a constant $C(n) = C(n,\rho)$ such that
$$\left\| X^n_{s,t} \right\|_{L^2} \leq C(n)\, \omega\big([s,t]^2\big)^{\frac{n}{2\rho}}$$
for any $s < t$.

Proof. For $n = 1, 2, 3$ this is proven in [FV10b, Proposition 15.28]. For $n \geq 4$ and fixed $s < t$, we set $\tilde{X}_\tau := \omega([s,t]^2)^{-\frac{1}{2\rho}}\, X_{s + \tau (t-s)}$. Then $\big| R_{\tilde X} \big|_{\rho\text{-var};[0,1]^2} \leq 1 =: K$ and by the standard (deterministic) estimates for the Lyons lift,
$$\frac{\big| X^n_{s,t} \big|^{1/n}}{\omega\big([s,t]^2\big)^{\frac{1}{2\rho}}} \leq c_1 \big\| S_n(\tilde X) \big\|_{p\text{-var};[0,1]} \leq c_2(n,p)\, \big\| \tilde X \big\|_{p\text{-var};[0,1]}$$
for any $p \in (2\rho, 4)$. Now we take the $L^2$-norm on both sides. From [FV10b, Theorem 15.33] we know that $\big\| \| \tilde X \|_{p\text{-var};[0,1]} \big\|_{L^2}$ is bounded by a constant only depending on $p$, $\rho$ and $K$, which shows the claim. Alternatively (and more in the spirit of the forthcoming arguments), one performs an induction similar to (but easier than) the one in the proof of Proposition 2.3.21.

The next proposition shows that it suffices to prove the desired estimates for the iterated integrals which generate the others.

Proposition 2.1.8. Let $(X,Y) = (X^1, Y^1, \dots, X^d, Y^d)$ be a Gaussian process on $[0,1]$ with paths of finite variation. Let $A = \{1, \dots, d\}$ be the alphabet, let $U$ be a set of words of length $n$ and $V = \{w_1, \dots, w_k\}$ a generating set for $U$. Let $\omega$ be a control, $\rho, \gamma \geq 1$ constants and $s < t \in [0,1]$. Assume that there are constants $C = C(|w|)$ such that
$$\big| X^w_{s,t} \big|_{L^2} \leq C(|w|)\, \omega(s,t)^{\frac{|w|}{2\rho}} \quad \text{and} \quad \big| Y^w_{s,t} \big|_{L^2} \leq C(|w|)\, \omega(s,t)^{\frac{|w|}{2\rho}}$$
holds for every word $w \in A^*$ with $|w| \leq n-1$. Assume also that for some $\epsilon > 0$
$$\big| X^w_{s,t} - Y^w_{s,t} \big|_{L^2} \leq \epsilon\, C(|w|)\, \omega(s,t)^{\frac{1}{2\gamma}}\, \omega(s,t)^{\frac{|w|-1}{2\rho}}$$
holds for every word $w$ with $|w| \leq n-1$ and for every $w \in V$. Then there is a constant $\tilde C$ which depends on the constants $C$, on $n$ and on $d$ such that
$$\big| X^w_{s,t} - Y^w_{s,t} \big|_{L^2} \leq \epsilon\, \tilde C\, \omega(s,t)^{\frac{1}{2\gamma}}\, \omega(s,t)^{\frac{n-1}{2\rho}}$$
holds for every $w \in U$.

Remark 2.1.9. We could absorb the factor $\omega(s,t)^{\frac{1}{2\gamma}}$ into $\epsilon$ here, but the present form is how we shall use this proposition later on.

Proof. Consider a copy $\bar A$ of $A$. If $a \in A$, we denote by $\bar a$ the corresponding letter in $\bar A$. If $w = a_{i_1} \cdots a_{i_n} \in A^*$, we define $\bar w = \bar a_{i_1} \cdots \bar a_{i_n} \in \bar A^*$ and in the same way we define $\bar P \in \mathbb{R}\langle \bar A\rangle$ for $P \in \mathbb{R}\langle A\rangle$. Now we consider $\mathbb{R}\langle A \,\dot\cup\, \bar A\rangle$ equipped with the usual shuffle product. Define $\Psi: \mathbb{R}\langle A \,\dot\cup\, \bar A\rangle \to \mathbb{R}$ by
$$\Psi(w) = \int_{\Delta^n_{s,t}} dZ^{b_{i_1}} \cdots dZ^{b_{i_n}}$$
for a word $w = b_{i_1} \cdots b_{i_n}$, where
$$Z^{b_j} = \begin{cases} X^{a_j} & \text{if } b_j = a_j \\ Y^{a_j} & \text{if } b_j = \bar a_j \end{cases}$$
and extend this definition linearly. By Theorem 2.1.6, we know that $\Psi$ is an algebra homomorphism. Take $w \in U$. By assumption, there is a vector $\lambda = (\lambda_1, \dots, \lambda_k)$ such that
$$w - \bar w = \sum_{j=1}^{k} \lambda_j (w_j - \bar w_j) + R - \bar R$$
where $R$ is of the form $R = \sum_{u,v \in A^+,\, |u|+|v|=n} \mu_{u,v}\, u * v$ with real numbers $\mu_{u,v}$. Applying $\Psi$ and taking the $L^2$ norm yields
$$\big| X^w_{s,t} - Y^w_{s,t} \big|_{L^2} \leq \sum_{j=1}^{k} |\lambda_j|\, \big| X^{w_j}_{s,t} - Y^{w_j}_{s,t} \big|_{L^2} + \big| \Psi(R - \bar R) \big|_{L^2} \leq \epsilon\, c_1\, \omega(s,t)^{\frac{1}{2\gamma}}\, \omega(s,t)^{\frac{n-1}{2\rho}} + \big| \Psi(R - \bar R) \big|_{L^2}.$$


Now,
$$R - \bar R = \sum_{u,v} \mu_{u,v}\, (u * v - \bar u * \bar v) = \sum_{u,v} \mu_{u,v}\, (u - \bar u) * v + \mu_{u,v}\, \bar u * (v - \bar v).$$
Applying $\Psi$ and taking the $L^2$ norm then gives
$$\begin{aligned} \big| \Psi(R - \bar R) \big|_{L^2} &\leq \sum_{u,v} |\mu_{u,v}|\, \big| \big( X^u_{s,t} - Y^u_{s,t} \big) X^v_{s,t} \big|_{L^2} + |\mu_{u,v}|\, \big| Y^u_{s,t} \big( X^v_{s,t} - Y^v_{s,t} \big) \big|_{L^2} \\ &\leq c_2 \sum_{u,v} \big| X^u_{s,t} - Y^u_{s,t} \big|_{L^2}\, \big| X^v_{s,t} \big|_{L^2} + \big| Y^u_{s,t} \big|_{L^2}\, \big| X^v_{s,t} - Y^v_{s,t} \big|_{L^2} \\ &\leq \epsilon\, c_3 \sum_{u,v} \omega(s,t)^{\frac{1}{2\gamma}}\, \omega(s,t)^{\frac{|u|+|v|-1}{2\rho}} \leq \epsilon\, c_4\, \omega(s,t)^{\frac{1}{2\gamma}}\, \omega(s,t)^{\frac{n-1}{2\rho}} \end{aligned}$$
where we used equivalence of $L^q$-norms in the Wiener chaos (cf. [FV10b, Proposition 15.19 and Theorem D.8]). Putting everything together shows the assertion.

2.2 Multidimensional Young-integration and grid-controls

Let $f: [0,1]^n \to \mathbb{R}$ be a continuous function. If $s_1 < t_1, \dots, s_n < t_n$ and $u_1, \dots, u_n$ are elements in $[0,1]$, we make the following recursive definition:
$$f\begin{pmatrix} s_1, t_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} := f\begin{pmatrix} t_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} - f\begin{pmatrix} s_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$
and
$$f\begin{pmatrix} s_1, t_1 \\ \vdots \\ s_{k-1}, t_{k-1} \\ s_k, t_k \\ u_{k+1} \\ \vdots \\ u_n \end{pmatrix} := f\begin{pmatrix} s_1, t_1 \\ \vdots \\ s_{k-1}, t_{k-1} \\ t_k \\ u_{k+1} \\ \vdots \\ u_n \end{pmatrix} - f\begin{pmatrix} s_1, t_1 \\ \vdots \\ s_{k-1}, t_{k-1} \\ s_k \\ u_{k+1} \\ \vdots \\ u_n \end{pmatrix}.$$
We will also use the simpler notation
$$f(R) = f\begin{pmatrix} s_1, t_1 \\ \vdots \\ s_n, t_n \end{pmatrix}$$
for the rectangle $R = [s_1,t_1] \times \dots \times [s_n,t_n] \subset [0,1]^n$. Note that for $n = 2$ this is consistent with our initial definition of $f\binom{s_1, t_1}{s_2, t_2}$. If $f, g: [0,1]^n \to \mathbb{R}$ are continuous functions, the $n$-dimensional Young-integral is defined by
$$\int_{[s_1,t_1] \times \dots \times [s_n,t_n]} f(x_1, \dots, x_n)\, dg(x_1, \dots, x_n) := \lim_{|D_1|, \dots, |D_n| \to 0} \sum_{t^1_{i_1} \in D_1, \dots, t^n_{i_n} \in D_n} f\big( t^1_{i_1}, \dots, t^n_{i_n} \big)\, g\begin{pmatrix} t^1_{i_1}, t^1_{i_1+1} \\ \vdots \\ t^n_{i_n}, t^n_{i_n+1} \end{pmatrix}$$

43 Rates Gaussian rough paths if this limit exists. Take p ≥ 1. The n-dimensional p-variation of f is defined by

$$V_p(f, [s_1,t_1] \times \dots \times [s_n,t_n]) = \left( \sup_{\substack{D_1 \subset [s_1,t_1] \\ \vdots \\ D_n \subset [s_n,t_n]}}\; \sum_{\substack{t^1_{i_1} \in D_1 \\ \vdots \\ t^n_{i_n} \in D_n}} \left| f\begin{pmatrix} t^1_{i_1}, t^1_{i_1+1} \\ \vdots \\ t^n_{i_n}, t^n_{i_n+1} \end{pmatrix} \right|^p \right)^{1/p}$$

and if $V_p(f, [0,1]^n) < \infty$ we say that $f$ has finite ($n$-dimensional) $p$-variation. The fundamental theorem is the following:

Theorem 2.2.1. Assume that $f$ has finite $p$-variation and $g$ finite $q$-variation where $\frac{1}{p} + \frac{1}{q} > 1$. Then the joint Young-integral below exists and there is a constant $C = C(p,q)$ such that
$$\left| \int_{[s_1,t_1] \times \dots \times [s_n,t_n]} f\begin{pmatrix} s_1, u_1 \\ \vdots \\ s_n, u_n \end{pmatrix} dg(u_1, \dots, u_n) \right| \leq C\, V_p(f, [s_1,t_1] \times \dots \times [s_n,t_n])\, V_q(g, [s_1,t_1] \times \dots \times [s_n,t_n]).$$

Proof. [Tow02], Theorem 1.2 (c).
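The Riemann-sum definition of the multidimensional Young integral is easy to test on smooth integrands, where the limit can be computed in closed form. A minimal sketch with the (hypothetical) choice $f(u,v) = g(u,v) = uv$, for which the rectangular increments of $g$ reduce to $h \cdot k$ and the integral equals $\int_{[0,1]^2} uv\, du\, dv = 1/4$:

```python
# f(u,v) = uv integrated against g(u,v) = uv over [0,1]^2.
# The rectangular increment of g over [u,u+h] x [v,v+h] is
# g(u+h,v+h) - g(u+h,v) - g(u,v+h) + g(u,v) = h*h, so the
# Riemann sums converge to the iterated integral of u*v, i.e. 1/4.
N = 400
h = 1.0 / N

def g(u, v):
    return u * v

total = 0.0
for i in range(N):
    for j in range(N):
        u, v = i * h, j * h
        inc = g(u + h, v + h) - g(u + h, v) - g(u, v + h) + g(u, v)
        total += u * v * inc
print(total)  # tends to 1/4 as N grows
```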

We will mainly consider the case n = 2, but we will also need n = 3 and 4 later on. In particular, the discussion of level n = 4 will require us to work with 4D grid control functions which we now introduce. With no extra complication we make the following general definition.

Definition 2.2.2 ($n$-dimensional grid control). A map $\tilde\omega: \underbrace{\Delta \times \dots \times \Delta}_{n \text{ times}} \to \mathbb{R}^+$ is called an $n$-D grid-control if it is continuous and partially super-additive, i.e. for all $(s_1,t_1), \dots, (s_n,t_n) \in \Delta$ and $s_i < u_i < t_i$ we have
$$\tilde\omega([s_1,t_1] \times \dots \times [s_i,u_i] \times \dots \times [s_n,t_n]) + \tilde\omega([s_1,t_1] \times \dots \times [u_i,t_i] \times \dots \times [s_n,t_n]) \leq \tilde\omega([s_1,t_1] \times \dots \times [s_i,t_i] \times \dots \times [s_n,t_n])$$
for every $i = 1, \dots, n$. $\tilde\omega$ is called symmetric if
$$\tilde\omega([s_1,t_1] \times \dots \times [s_n,t_n]) = \tilde\omega\big( [s_{\sigma(1)}, t_{\sigma(1)}] \times \dots \times [s_{\sigma(n)}, t_{\sigma(n)}] \big)$$
holds for every $\sigma \in S_n$.

The point of this definition is that $|f(A)|^p \leq \tilde\omega(A)$ for every rectangle $A \subset [0,1]^n$ implies that $V_p(f, R)^p \leq \tilde\omega(R)$ for every rectangle $R \subset [0,1]^n$. Note that a 2D control is automatically a 2D grid-control. The following immediate properties will be used in Section 2.3.2 with $m = n = 2$.

Lemma 2.2.3. (i) The restriction of an $(m+n)$-dimensional grid-control to $m$ arguments is an $m$-dimensional grid-control.

(ii) The product of an $m$- and an $n$-dimensional grid-control is an $(m+n)$-dimensional grid-control.


2.2.1 Iterated 2D-integrals

In the 1-dimensional case, the classical Young theory allows one to define iterated integrals of functions with finite $p$-variation where $p < 2$. There, the superadditivity of $(s,t) \mapsto |\cdot|^p_{p\text{-var};[s,t]}$ played an essential role. We will see that Theorem 0.0.1 can be used to define and estimate iterated 2D-integrals. This will play an important role in Section 2.3 when we estimate the $L^2$-norm of iterated integrals of Gaussian processes.

Lemma 2.2.4. Let $f, g: [0,1]^2 \to \mathbb{R}$ be continuous, where $f$ has finite $p$-variation and $g$ finite controlled $q$-variation with $p^{-1} + q^{-1} > 1$. Let $(s,t) \in \Delta$ and assume that $f(s,\cdot) = f(\cdot,s) = 0$. Define $\Phi: [s,t]^2 \to \mathbb{R}$ by
$$\Phi(u,v) = \int_{[s,u] \times [s,v]} f\, dg.$$
Then there is a constant $C = C(p,q)$ such that
$$V_{q\text{-var}}\big( \Phi; [s,t]^2 \big) \leq C(p,q)\, V_{p\text{-var}}\big( f; [s,t]^2 \big)\, |g|_{q\text{-var};[s,t]^2}.$$

Proof. Let $t_i < t_{i+1}$ and $\tilde t_j < \tilde t_{j+1}$. Then
$$\Phi\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} = \int_{[t_i,t_{i+1}] \times [\tilde t_j, \tilde t_{j+1}]} f\, dg.$$
Now let $t_i < u < t_{i+1}$ and $\tilde t_j < v < \tilde t_{j+1}$. Then one has
$$f\binom{t_i, u}{\tilde t_j, v} = f(u,v) - f(t_i, v) - f(u, \tilde t_j) + f(t_i, \tilde t_j).$$

Therefore,
$$\left| \Phi\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right| \leq \left| \int f\binom{t_i, u}{\tilde t_j, v}\, dg(u,v) \right| + \left| \int f(t_i, v)\, dg(u,v) \right| + \left| \int f(u, \tilde t_j)\, dg(u,v) \right| + \left| \int f(t_i, \tilde t_j)\, dg(u,v) \right|,$$
all integrals being taken over $[t_i, t_{i+1}] \times [\tilde t_j, \tilde t_{j+1}]$. For the first integral we use the 2D Young estimate to see that
$$\left| \int_{[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]} f\binom{t_i, u}{\tilde t_j, v}\, dg(u,v) \right| \leq c_1(p,q)\, V_p\big( f, [t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}] \big)\, V_q\big( g, [t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}] \big) \leq c_1(p,q)\, V_p\big( f, [s,t]^2 \big)\, |g|_{q\text{-var};[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]}.$$

For the second, one has by a Young 1D-estimate
$$\left| \int_{[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]} f(t_i, v)\, dg(u,v) \right| = \left| \int_{[\tilde t_j,\tilde t_{j+1}]} f(t_i, v)\, d\big( g(t_{i+1}, v) - g(t_i, v) \big) \right| \leq c_2 \sup_{u \in [s,t]} |f(u,\cdot)|_{p\text{-var};[s,t]}\, |g|_{q\text{-var};[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]}.$$

Similarly,
$$\left| \int_{[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]} f(u, \tilde t_j)\, dg(u,v) \right| \leq c_2 \sup_{v \in [s,t]} |f(\cdot,v)|_{p\text{-var};[s,t]}\, |g|_{q\text{-var};[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]}.$$


Finally,
$$\left| \int_{[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]} f(t_i, \tilde t_j)\, dg(u,v) \right| = \left| f(t_i, \tilde t_j)\, g\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right| \leq |f|_{\infty;[s,t]}\, |g|_{q\text{-var};[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]}.$$
Putting all together, we get
$$\left| \Phi\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right|^q \leq c_3 \left( V_p\big( f, [s,t]^2 \big) + \sup_{u \in [s,t]} |f(u,\cdot)|_{p\text{-var};[s,t]} + \sup_{v \in [s,t]} |f(\cdot,v)|_{p\text{-var};[s,t]} + |f|_{\infty;[s,t]} \right)^q |g|^q_{q\text{-var};[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]}.$$
Take a partition $D \subset [s,t]$ and $u \in [s,t]$. Then

$$\sum_{t_i \in D} |f(u, t_{i+1}) - f(u, t_i)|^p = \sum_{t_i \in D} \left| f\binom{s, u}{t_i, t_{i+1}} \right|^p \leq V_p\big( f, [s,t]^2 \big)^p$$
and hence
$$\sup_{u \in [s,t]} |f(u,\cdot)|_{p\text{-var};[s,t]} \leq V_p\big( f, [s,t]^2 \big).$$
The same way one obtains

$$\sup_{v \in [s,t]} |f(\cdot,v)|_{p\text{-var};[s,t]} \leq V_p\big( f, [s,t]^2 \big).$$

Finally, for $u, v \in [s,t]$,
$$|f(u,v)| = \left| f\binom{s, u}{s, v} \right| \leq V_p\big( f, [s,t]^2 \big)$$
and therefore $|f|_{\infty;[s,t]} \leq V_p\big( f, [s,t]^2 \big)$. Putting everything together, we end up with

$$\left| \Phi\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right|^q \leq c_4\, V_p\big( f, [s,t]^2 \big)^q\, |g|^q_{q\text{-var};[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]}.$$
Hence for every pair of partitions $D, \tilde D \subset [s,t]$ one gets, using superadditivity of $|g|^q_{q\text{-var}}$,
$$\sum_{t_i \in D,\, \tilde t_j \in \tilde D} \left| \Phi\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right|^q \leq c_4\, V_p\big( f, [s,t]^2 \big)^q \sum_{t_i \in D,\, \tilde t_j \in \tilde D} |g|^q_{q\text{-var};[t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}]} \leq c_4\, V_p\big( f, [s,t]^2 \big)^q\, |g|^q_{q\text{-var};[s,t]^2}.$$
Passing to the supremum over all partitions shows the assertion.

This lemma allows us to define iterated 2D-integrals. Let $f, g_1, \dots, g_n: [0,1]^2 \to \mathbb{R}$. An iterated 2D-integral is given by
$$\int_{\Delta^1_{s,t} \times \Delta^1_{s',t'}} f\, dg_1 = \int_{[s,t] \times [s',t']} f(u,v)\, dg_1(u,v)$$
for $n = 1$ and recursively defined by
$$\int_{\Delta^n_{s,t} \times \Delta^n_{s',t'}} f\, dg_1 \cdots dg_n := \int_{[s,t] \times [s',t']} \left( \int_{\Delta^{n-1}_{s,u} \times \Delta^{n-1}_{s',v}} f\, dg_1 \cdots dg_{n-1} \right) dg_n(u,v)$$
for $n \geq 2$.


Proposition 2.2.5. Let $f, g_1, g_2, \dots : [0,1]^2 \to \mathbb{R}$ and $p, q_1, q_2, \dots$ be real numbers such that $p^{-1} + q_1^{-1} > 1$ and $q_i^{-1} + q_{i+1}^{-1} > 1$ for every $i \geq 1$. Assume that $f$ has finite $p$-variation and $g_i$ has finite $q_i$-variation for $i = 1, 2, \dots$ and that for $(s,t) \in \Delta$ we have $f(s,\cdot) = f(\cdot,s) = 0$. Then for every $n \in \mathbb{N}$ there is a constant $C = C(p, q_1, \dots, q_n)$ such that
$$\left| \int_{\Delta^n_{s,t} \times \Delta^n_{s,t}} f\, dg_1 \cdots dg_n \right| \leq C\, V_p\big( f, [s,t]^2 \big)\, V_{q_1}\big( g_1, [s,t]^2 \big) \cdots V_{q_n}\big( g_n, [s,t]^2 \big).$$

Proof. Define $\Phi^{(n)}(u,v) = \int_{\Delta^n_{s,u} \times \Delta^n_{s,v}} f\, dg_1 \cdots dg_n$. We will show a stronger result; namely, that for every $n \in \mathbb{N}$ and $q_n' > q_n$ there is a constant $C = C(p, q_1, \dots, q_n, q_n')$ such that
$$V_{q_n'}\big( \Phi^{(n)}, [s,t]^2 \big) \leq C\, V_p\big( f, [s,t]^2 \big)\, V_{q_1}\big( g_1, [s,t]^2 \big) \cdots V_{q_n}\big( g_n, [s,t]^2 \big).$$
To do so, let $\tilde q_1, \tilde q_2, \dots$ be a sequence of real numbers such that $\tilde q_j > q_j$ and $\frac{1}{\tilde q_{j-1}} + \frac{1}{\tilde q_j} > 1$ for every $j = 1, 2, \dots$, where we set $\tilde q_0 = p$. We make an induction over $n$. For $n = 1$, we have $\tilde q_1 > q_1$ and $\frac{1}{p} + \frac{1}{\tilde q_1} > 1$; hence from Theorem 0.0.1 we know that $g_1$ has finite controlled $\tilde q_1$-variation, and Lemma 2.2.4 gives us
$$V_{\tilde q_1}\big( \Phi^{(1)}; [s,t]^2 \big) \leq c_1\, V_p\big( f; [s,t]^2 \big)\, |g_1|_{\tilde q_1;[s,t]^2} \leq c_2\, V_p\big( f; [s,t]^2 \big)\, V_{q_1}\big( g_1; [s,t]^2 \big).$$
W.l.o.g. we may assume that $q_1' > \tilde q_1 > q_1$; otherwise we choose $\tilde q_1$ smaller in the beginning. From $V_{q_1'}\big( \Phi^{(1)}; [s,t]^2 \big) \leq V_{\tilde q_1}\big( \Phi^{(1)}; [s,t]^2 \big)$ the assertion follows for $n = 1$. Now take $n \in \mathbb{N}$. Note that
$$\Phi^{(n)}(u,v) = \int_{[s,u] \times [s,v]} \Phi^{(n-1)}\, dg_n$$
and clearly $\Phi^{(n-1)}(s,\cdot) = \Phi^{(n-1)}(\cdot,s) = 0$. We can use Lemma 2.2.4 again to see that
$$V_{\tilde q_n}\big( \Phi^{(n)}, [s,t]^2 \big) \leq c_3\, V_{\tilde q_{n-1}}\big( \Phi^{(n-1)}; [s,t]^2 \big)\, |g_n|_{\tilde q_n\text{-var};[s,t]^2} \leq c_4\, V_{\tilde q_{n-1}}\big( \Phi^{(n-1)}; [s,t]^2 \big)\, V_{q_n}\big( g_n; [s,t]^2 \big).$$
Using our induction hypothesis shows the result for $\tilde q_n$. By choosing $\tilde q_n$ smaller in the beginning if necessary, we may assume that $q_n' > \tilde q_n$ and the assertion follows.

2.3 The main estimates

In the following section, $(X,Y) = (X^1, Y^1, \dots, X^d, Y^d)$ will always denote a centred continuous Gaussian process where $(X^i, Y^i)$ and $(X^j, Y^j)$ are independent for $i \neq j$. We will also assume that the $\rho$-variation of $R_{(X,Y)}$ is finite for a $\rho < 2$ and controlled by a symmetric 2D-control $\omega$ (this in particular implies that the $\rho$-variation of $R_X$, $R_Y$ and $R_{X-Y}$ is controlled by $\omega$, see [FV10b, Section 15.3.2]). Let $\gamma > \rho$ be such that $\frac{1}{\rho} + \frac{1}{\gamma} > 1$. The aim of this section is to show that for every $n \in \mathbb{N}$ there are constants $C(n)$ such that${}^5$
$$\left\| X^n_{s,t} - Y^n_{s,t} \right\|_{L^2\left( (\mathbb{R}^d)^{\otimes n} \right)} \leq \epsilon\, C(n)\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}} \quad \text{for every } s < t \tag{2.4}$$
where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$ (see Definition 2.3.1 below for the exact definition of $V_\infty$). Equivalently, we may show (2.4) coordinate-wise, i.e. prove that the same estimate holds for

${}^5$ We prefer to write it in this notation instead of writing $\omega\big([s,t]^2\big)^{\frac{1}{2\gamma} + \frac{n-1}{2\rho}}$ to emphasize the different roles of the two terms. The first term will play no particular role and just comes from interpolation, whereas the second one will be crucial when doing the induction step from lower to higher levels in Proposition 2.3.21.


$\big| X^w_{s,t} - Y^w_{s,t} \big|_{L^2(\mathbb{R})}$ for every word $w$ formed from the alphabet $A = \{1, \dots, d\}$. In some special cases, i.e. if a word $w$ has a very simple structure, we can do this directly using multidimensional Young integration. This is done in Subsection 2.3.1. Subsection 2.3.2 shows (2.4) for $n = 1, 2, 3, 4$ coordinate-wise, using the shuffle algebra structure for iterated integrals and multidimensional Young integration. In Subsection 2.3.3, we show (2.4) coordinate-free for all $n > 4$, using an induction argument very similar to the one Lyons used for proving the Extension Theorem (cf. [Lyo98]). We start by giving a 2-dimensional analogue of the one-dimensional interpolation inequality.

Definition 2.3.1. If $f: [0,1]^2 \to B$ is a continuous function with values in a Banach space $B$ and $(s,t) \times (u,v) \in \Delta \times \Delta$, we set
$$V_\infty(f, [s,t] \times [u,v]) = \sup_{A \subset [s,t] \times [u,v]} |f(A)|.$$

Lemma 2.3.2. For $\gamma > \rho \geq 1$ we have the interpolation inequality
$$V_{\gamma\text{-var}}(f, [s,t] \times [u,v]) \leq V_\infty(f, [s,t] \times [u,v])^{1 - \rho/\gamma}\, V_{\rho\text{-var}}(f, [s,t] \times [u,v])^{\rho/\gamma}$$
for all $(s,t), (u,v) \in \Delta$.

Proof. Exactly as 1D-interpolation, see [FV10b, Proposition 5.5].

2.3.1 Some special cases

If $Z: [0,1] \to \mathbb{R}$ is a process with smooth sample paths, we will use the notation
$$Z^{(n)}_{s,t} = \int_{\Delta^n_{s,t}} dZ \cdots dZ$$
for $s < t$.
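For smooth one-dimensional paths one has $Z^{(n)}_{s,t} = (Z_{s,t})^n / n!$, an identity used repeatedly below. A quick numerical sanity check under the (illustrative) choice $Z_t = \sin t$:

```python
from math import sin, factorial

N, s, t = 2000, 0.2, 0.9
z = [sin(s + (t - s) * i / N) for i in range(N + 1)]

def iterated_n(n):
    """int over the simplex Delta^n_{s,t} of dZ ... dZ, approximated by
    left-point Riemann sums of the recursion for iterated integrals."""
    vals = [1.0] * (N + 1)
    for _ in range(n):
        acc, out = 0.0, [0.0]
        for i in range(N):
            acc += vals[i] * (z[i + 1] - z[i])
            out.append(acc)
        vals = out
    return vals[N]

for n in (2, 3, 4):
    # compare with (Z_{s,t})^n / n!
    print(n, iterated_n(n), (z[N] - z[0]) ** n / factorial(n))
```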

Lemma 2.3.3. Let $X: [0,1] \to \mathbb{R}$ be a centred Gaussian process with continuous paths of finite variation and assume that the $\rho$-variation of the covariance $R_X$ is controlled by a 2D-control $\omega$. For fixed $s < t$, define
$$f(u,v) = E\big[ X^{(n)}_{s,u}\, X^{(n)}_{s,v} \big].$$
Then there is a constant $C = C(\rho, n)$ such that
$$V_\rho\big( f, [s,t]^2 \big) \leq C\, \omega\big([s,t]^2\big)^{\frac{n}{\rho}}.$$

Proof. Let $t_i < t_{i+1}$, $\tilde t_j < \tilde t_{j+1}$. Then
$$f\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} = E\Big[ \big( X^{(n)}_{s,t_{i+1}} - X^{(n)}_{s,t_i} \big)\big( X^{(n)}_{s,\tilde t_{j+1}} - X^{(n)}_{s,\tilde t_j} \big) \Big].$$
We know that $X^{(n)}_{s,t} = \frac{(X_{s,t})^n}{n!}$. From the identity $b^n - a^n = (b - a)\big( a^{n-1} + a^{n-2} b + \dots + a b^{n-2} + b^{n-1} \big)$ we deduce that
$$f\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} = \frac{1}{(n!)^2} \sum_{k,l=0}^{n-1} E\Big[ X_{t_i,t_{i+1}}\, X_{\tilde t_j,\tilde t_{j+1}}\, X_{s,t_{i+1}}^{\,n-1-k}\, X_{s,t_i}^{\,k}\, X_{s,\tilde t_{j+1}}^{\,n-1-l}\, X_{s,\tilde t_j}^{\,l} \Big].$$

48 The main estimates

We want to apply Wick's formula now (cf. [Jan97, Theorem 1.28]). If
$$Z, \tilde Z \in \big\{ X_{s,t_{i+1}},\, X_{s,t_i},\, X_{s,\tilde t_{j+1}},\, X_{s,\tilde t_j} \big\}$$
we know that
$$\big| E[X_{t_i,t_{i+1}} Z] \big|^\rho \leq \omega([t_i,t_{i+1}] \times [s,t]), \qquad \big| E[X_{t_i,t_{i+1}} X_{\tilde t_j,\tilde t_{j+1}}] \big|^\rho \leq \omega\big( [t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}] \big), \qquad \big| E[Z \tilde Z] \big|^\rho \leq \omega\big([s,t]^2\big),$$
and the same holds for $X_{\tilde t_j,\tilde t_{j+1}}$. Now take two partitions $D, \tilde D$ of $[s,t]$. Then, by Wick's formula and the estimates above,
$$\sum_{t_i \in D,\, \tilde t_j \in \tilde D} \left| f\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right|^\rho \leq c_1(\rho,n)\, \omega\big([s,t]^2\big)^{n-2} \sum_{t_i \in D,\, \tilde t_j \in \tilde D} \omega([t_i,t_{i+1}] \times [s,t])\, \omega\big( [\tilde t_j,\tilde t_{j+1}] \times [s,t] \big) + c_2(\rho,n)\, \omega\big([s,t]^2\big)^{n-1} \sum_{t_i \in D,\, \tilde t_j \in \tilde D} \omega\big( [t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}] \big) \leq c_3\, \omega\big([s,t]^2\big)^{n}.$$
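Wick's formula states that the fourth moment of jointly centred Gaussian variables is the sum over pairings of products of covariances. The sketch below verifies this exactly for linear combinations of independent standard normals (the coefficient matrix `a` is an arbitrary illustrative choice):

```python
from itertools import product

# X_i = sum_j a[i][j] xi_j with xi_j independent N(0,1).
a = [[1.0, 0.5, 0.0],
     [0.3, 2.0, 1.0],
     [0.0, 1.0, 1.5],
     [0.7, 0.0, 2.0]]

def cov(i, j):
    return sum(a[i][k] * a[j][k] for k in range(3))

def moment4():
    """E[X1 X2 X3 X4] computed exactly: expand the product of the four
    linear forms and use E[xi]=E[xi^3]=0, E[xi^2]=1, E[xi^4]=3."""
    mom = {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}
    total = 0.0
    for k in product(range(3), repeat=4):   # choose one xi_j per factor
        coef = a[0][k[0]] * a[1][k[1]] * a[2][k[2]] * a[3][k[3]]
        term = 1.0
        for j in range(3):
            term *= mom[k.count(j)]         # moment of xi_j^(multiplicity)
        total += coef * term
    return total

# Wick / Isserlis: sum over the three pairings of {1,2,3,4}.
wick = cov(0, 1) * cov(2, 3) + cov(0, 2) * cov(1, 3) + cov(0, 3) * cov(1, 2)
print(moment4(), wick)  # the two values agree
```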

Lemma 2.3.4. Let $(X,Y)$ be a centred Gaussian process in $\mathbb{R}^2$ with continuous paths of finite variation. Assume that the $\rho$-variation of $R_{(X,Y)}$ is controlled by a 2D-control $\omega$ for $\rho < 2$ and take $\gamma > \rho$. Then for every $n \in \mathbb{N}$ there is a constant $C = C(n)$ such that
$$\big| X^{(n)}_{s,t} - Y^{(n)}_{s,t} \big|_{L^2} \leq \epsilon\, C(n)\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}}$$
for any $s < t$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. By induction. For $n = 1$ we simply have, from Lemma 2.3.2,
$$|X_{s,t} - Y_{s,t}|^2_{L^2} = E[(X_{s,t} - Y_{s,t})(X_{s,t} - Y_{s,t})] \leq V_{\gamma\text{-var}}\big( R_{X-Y}, [s,t]^2 \big) \leq \epsilon^2\, V_{\rho\text{-var}}\big( R_{X-Y}, [s,t]^2 \big)^{\rho/\gamma} \leq \epsilon^2\, \omega\big([s,t]^2\big)^{\frac{1}{\gamma}}.$$
For $n \in \mathbb{N}$ we use the identity
$$X^{(n)}_{s,t} - Y^{(n)}_{s,t} = \frac{1}{n}\Big( X_{s,t}\, X^{(n-1)}_{s,t} - Y_{s,t}\, Y^{(n-1)}_{s,t} \Big)$$
and hence
$$\big| X^{(n)}_{s,t} - Y^{(n)}_{s,t} \big|_{L^2} \leq c_1 \Big( |X_{s,t} - Y_{s,t}|_{L^2}\, \big| X^{(n-1)}_{s,t} \big|_{L^2} + \big| X^{(n-1)}_{s,t} - Y^{(n-1)}_{s,t} \big|_{L^2}\, |Y_{s,t}|_{L^2} \Big) \leq \epsilon\, c_2\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}}.$$


Assume that $(Z^1, Z^2)$ is a centred, continuous Gaussian process in $\mathbb{R}^2$ with smooth sample paths and that both components are independent. Then (at least formally, cf. [FV10a]),
$$\left| \int_0^1 Z^1_{0,u}\, dZ^2_u \right|^2_{L^2} = E\left[ \left( \int_0^1 Z^1_{0,u}\, dZ^2_u \right)^2 \right] = E\left[ \int_{[0,1]^2} Z^1_{0,u} Z^1_{0,v}\, dZ^2_u\, dZ^2_v \right] \tag{2.5}$$
$$= \int_{[0,1]^2} E\big[ Z^1_{0,u} Z^1_{0,v} \big]\, dE\big[ Z^2_u Z^2_v \big] = \int_{[0,1]^2} R_{Z^1}\binom{0, \cdot}{0, \cdot}\, dR_{Z^2} \tag{2.6}$$
where the integrals in the second row are 2D Young-integrals (to make this rigorous, one uses that the integrals are a.s. limits of Riemann sums and that a.s. convergence implies convergence in $L^1$ in the (inhomogeneous) Wiener chaos). These kinds of computations, together with our estimates for 2D Young-integrals, will be heavily used from now on.
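The identity (2.5)-(2.6) can be illustrated on a toy example. With the (hypothetical) choices $Z^1_t = \xi_1 \sin t$ and $Z^2_t = \xi_2 t$ for independent standard normals $\xi_1, \xi_2$, both sides equal $(1 - \cos 1)^2$; the sketch below compares a Monte-Carlo estimate of the left-hand side with a Riemann-sum evaluation of the 2D integral of $R_{Z^1}$ against $R_{Z^2}$:

```python
from math import sin, cos
import random

random.seed(0)

# Z^1_t = xi_1 sin(t), Z^2_t = xi_2 t, xi_i independent N(0,1).
# Left-hand side: Monte-Carlo estimate of E[ ( int_0^1 Z^1_{0,u} dZ^2_u )^2 ].
n = 200_000
mc = 0.0
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    mc += (x1 * x2 * (1 - cos(1))) ** 2  # int_0^1 sin(u) du = 1 - cos(1)
mc /= n

# Right-hand side: 2D (here: Riemann) integral of R_{Z^1} against R_{Z^2},
# with R_{Z^1}(u,v) = sin(u) sin(v) and R_{Z^2}(u,v) = u v.
N, h = 500, 1.0 / 500
young = 0.0
for i in range(N):
    for j in range(N):
        u, v = i * h, j * h
        inc = (u + h) * (v + h) - (u + h) * v - u * (v + h) + u * v  # = h*h
        young += sin(u) * sin(v) * inc
print(mc, young)  # both close to (1 - cos(1))^2 ~ 0.211
```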

Lemma 2.3.5. Let $(X,Y) = (X^1, Y^1, \dots, X^d, Y^d)$ be a centred Gaussian process with continuous paths of finite variation where $(X^i, Y^i)$ and $(X^j, Y^j)$ are independent for $i \neq j$. Assume that the $\rho$-variation of $R_{(X,Y)}$ is controlled by a 2D-control $\omega$ for $\rho < 2$. Let $w$ be a word of the form $w = i_1 \cdots i_n$ where $i_1, \dots, i_n \in \{1, \dots, d\}$ are all distinct. Take $\gamma > \rho$ such that $\frac{1}{\rho} + \frac{1}{\gamma} > 1$. Then there is a constant $C = C(\rho, \gamma, n)$ such that
$$\big| X^w_{s,t} - Y^w_{s,t} \big|_{L^2} \leq \epsilon\, C(n)\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}}$$
for any $s < t$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. By the triangle inequality,
$$\big| X^w_{s,t} - Y^w_{s,t} \big|_{L^2} = \left| \int_{\Delta^n_{s,t}} dX^{i_1} \cdots dX^{i_n} - \int_{\Delta^n_{s,t}} dY^{i_1} \cdots dY^{i_n} \right|_{L^2} \leq \sum_{k=1}^{n} \left| \int_{\Delta^n_{s,t}} dY^{i_1} \cdots dY^{i_{k-1}}\, d\big( X^{i_k} - Y^{i_k} \big)\, dX^{i_{k+1}} \cdots dX^{i_n} \right|_{L^2}.$$
From independence, Proposition 2.2.5 and Lemma 2.3.2,
$$\left| \int_{\Delta^n_{s,t}} dY^{i_1} \cdots dY^{i_{k-1}}\, d\big( X^{i_k} - Y^{i_k} \big)\, dX^{i_{k+1}} \cdots dX^{i_n} \right|^2_{L^2} = \int_{\Delta^n_{s,t} \times \Delta^n_{s,t}} dR_{Y^{i_1}} \cdots dR_{Y^{i_{k-1}}}\, dR_{X^{i_k} - Y^{i_k}}\, dR_{X^{i_{k+1}}} \cdots dR_{X^{i_n}}$$
$$\leq c_1\, V_\rho\big( R_{Y^{i_1}}, [s,t]^2 \big) \cdots V_\rho\big( R_{Y^{i_{k-1}}}, [s,t]^2 \big)\, V_\gamma\big( R_{X^{i_k} - Y^{i_k}}, [s,t]^2 \big)\, V_\rho\big( R_{X^{i_{k+1}}}, [s,t]^2 \big) \cdots V_\rho\big( R_{X^{i_n}}, [s,t]^2 \big) \leq c_1\, V_\gamma\big( R_{X-Y}, [s,t]^2 \big)\, \omega\big([s,t]^2\big)^{\frac{n-1}{\rho}} \leq c_1\, \epsilon^2\, \omega\big([s,t]^2\big)^{\frac{1}{\gamma}}\, \omega\big([s,t]^2\big)^{\frac{n-1}{\rho}}.$$
The first equality above is an immediate generalization of the calculations made in (2.5) and (2.6). Note that the respective random terms are not only pairwise but mutually independent here, since we are dealing with a Gaussian process $(X,Y)$. Interchanging the limits is allowed since convergence in probability implies convergence in $L^p$, for any $p > 0$, in the Wiener chaos.

2.3.2 Lower levels

n = 1, 2

Proposition 2.3.6. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there are constants $C(1), C(2)$ which depend on $\rho$ and $\gamma$ such that
$$\big| X^n_{s,t} - Y^n_{s,t} \big|_{L^2} \leq \epsilon\, C(n)\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}}$$
holds for $n = 1, 2$ and every $(s,t) \in \Delta$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. The coordinate-wise estimates are just special cases of Lemma 2.3.4 and Lemma 2.3.5.

n = 3

Proposition 2.3.7. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there is a constant $C(3)$ which depends on $\rho$ and $\gamma$ such that
$$\big| X^3_{s,t} - Y^3_{s,t} \big|_{L^2} \leq \epsilon\, C(3)\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{2}{2\rho}}$$
holds for every $(s,t) \in \Delta$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. We have to show the estimate for $X^{i,j,k} - Y^{i,j,k}$ where $i, j, k \in \{1, \dots, d\}$. From Propositions 2.1.8 and 2.1.5 it follows that it is enough to show the estimate for $X^w - Y^w$ where
$$w \in \{ iii,\; ijk,\; iij : i, j, k \in \{1, \dots, d\} \text{ distinct} \}.$$
The cases $w = iii$ and $w = ijk$ are special cases of Lemma 2.3.4 and Lemma 2.3.5. The rest of this section is devoted to showing the estimate for $w = iij$.

Lemma 2.3.8. Let $(X,Y): [0,1] \to \mathbb{R}^2$ be a centred Gaussian process and consider
$$f(u,v) = E[(X_u - Y_u)\, X_v].$$
Assume that the $\rho$-variation of $R_{(X,Y)}$ is controlled by a 2D-control $\omega$ where $\rho \geq 1$. Let $s < t$ and consider a rectangle $[\sigma,\tau] \times [\sigma',\tau'] \subset [s,t]^2$. Let $\gamma > \rho$. Then
$$V_{\gamma\text{-var}}\big( f, [\sigma,\tau] \times [\sigma',\tau'] \big) \leq \epsilon\, \omega\big([s,t]^2\big)^{\frac{1}{2}\left( \frac{1}{\rho} - \frac{1}{\gamma} \right)}\, \omega\big( [\sigma,\tau] \times [\sigma',\tau'] \big)^{1/\gamma}$$
where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. Let $u < v$ and $u' < v'$ in $[s,t]$. Then
$$\big| E\big[ (X_{u,v} - Y_{u,v})\, X_{u',v'} \big] \big| \leq |X_{u,v} - Y_{u,v}|_{L^2}\, |X_{u',v'}|_{L^2} \leq V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1/2}\, V_{\rho\text{-var}}\big( R_{(X,Y)}, [s,t]^2 \big)^{1/2}$$
and hence
$$\sup_{u<v,\; u'<v'} \big| E\big[ (X_{u,v} - Y_{u,v})\, X_{u',v'} \big] \big| \leq V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1/2}\, \omega\big([s,t]^2\big)^{\frac{1}{2\rho}}.$$

Now take a partition $D$ of $[\sigma,\tau]$ and a partition $\tilde D$ of $[\sigma',\tau']$. Then
$$\sum_{t_i \in D,\, \tilde t_j \in \tilde D} \big| E\big[ (X_{t_i,t_{i+1}} - Y_{t_i,t_{i+1}})\, X_{\tilde t_j,\tilde t_{j+1}} \big] \big|^\gamma \leq \Big( \sup_{u<v,\; u'<v'} \big| E\big[ (X_{u,v} - Y_{u,v})\, X_{u',v'} \big] \big| \Big)^{\gamma - \rho} \sum_{t_i \in D,\, \tilde t_j \in \tilde D} \big| E\big[ (X_{t_i,t_{i+1}} - Y_{t_i,t_{i+1}})\, X_{\tilde t_j,\tilde t_{j+1}} \big] \big|^\rho \leq \epsilon^\gamma\, \omega\big([s,t]^2\big)^{\frac{\gamma - \rho}{2\rho}}\, \omega\big( [\sigma,\tau] \times [\sigma',\tau'] \big).$$
Taking the supremum over all partitions shows the claim.

Lemma 2.3.9. Let $(X,Y): [0,1] \to \mathbb{R}^2$ be a centred Gaussian process with continuous paths of finite variation. Assume that the $\rho$-variation of $R_{(X,Y)}$ is controlled by a 2D-control $\omega$ where $\rho \geq 1$. Consider the function
$$g(u,v) = E\Big[ \big( X^{(2)}_{s,u} - Y^{(2)}_{s,u} \big)\big( X^{(2)}_{s,v} - Y^{(2)}_{s,v} \big) \Big].$$
Then for every $\gamma > \rho$ there is a constant $C = C(\rho, \gamma)$ such that
$$V_{\gamma\text{-var}}\big( g, [s,t]^2 \big) \leq C\, \epsilon^2\, \omega\big([s,t]^2\big)^{\frac{1}{\gamma} + \frac{1}{\rho}}$$
holds for every $(s,t) \in \Delta$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. Let $u < v$ and $u' < v'$. Then
$$g\binom{u, v}{u', v'} = E\Big[ \Big( \big( X^{(2)}_{s,v} - X^{(2)}_{s,u} \big) - \big( Y^{(2)}_{s,v} - Y^{(2)}_{s,u} \big) \Big)\Big( \big( X^{(2)}_{s,v'} - X^{(2)}_{s,u'} \big) - \big( Y^{(2)}_{s,v'} - Y^{(2)}_{s,u'} \big) \Big) \Big] = \frac{1}{2^2}\, E\Big[ \big( X_{s,v}^2 - X_{s,u}^2 - (Y_{s,v}^2 - Y_{s,u}^2) \big)\big( X_{s,v'}^2 - X_{s,u'}^2 - (Y_{s,v'}^2 - Y_{s,u'}^2) \big) \Big].$$
Now,
$$X_{s,v}^2 - X_{s,u}^2 - \big( Y_{s,v}^2 - Y_{s,u}^2 \big) = X_{u,v}(X_{s,u} + X_{s,v}) - Y_{u,v}(Y_{s,u} + Y_{s,v}) = X_{u,v}(X_{s,u} - Y_{s,u}) + (X_{u,v} - Y_{u,v})\, Y_{s,u} + X_{u,v}(X_{s,v} - Y_{s,v}) + (X_{u,v} - Y_{u,v})\, Y_{s,v}.$$
The same way one gets
$$X_{s,v'}^2 - X_{s,u'}^2 - \big( Y_{s,v'}^2 - Y_{s,u'}^2 \big) = X_{u',v'}(X_{s,u'} - Y_{s,u'}) + (X_{u',v'} - Y_{u',v'})\, Y_{s,u'} + X_{u',v'}(X_{s,v'} - Y_{s,v'}) + (X_{u',v'} - Y_{u',v'})\, Y_{s,v'}.$$
Now we expand the product of both sums and take expectations. For the first term we obtain, using the Wick formula and Lemma 2.3.8,
$$\big| E\big[ X_{u,v}(X_{s,u} - Y_{s,u})\, X_{u',v'}(X_{s,u'} - Y_{s,u'}) \big] \big| \leq \big| E[X_{u,v} X_{u',v'}]\, E[(X_{s,u} - Y_{s,u})(X_{s,u'} - Y_{s,u'})] \big| + \big| E[X_{u,v}(X_{s,u'} - Y_{s,u'})]\, E[X_{u',v'}(X_{s,u} - Y_{s,u})] \big| + \big| E[X_{u',v'}(X_{s,u'} - Y_{s,u'})]\, E[X_{u,v}(X_{s,u} - Y_{s,u})] \big|$$
$$\leq V_{\rho\text{-var}}\big( R_{(X,Y)}, [u,v] \times [u',v'] \big)\, V_{\gamma\text{-var}}\big( R_{X-Y}, [s,t]^2 \big) + 2\, V_{\gamma\text{-var}}\big( R_{(X,X-Y)}, [u,v] \times [s,t] \big)\, V_{\gamma\text{-var}}\big( R_{(X,X-Y)}, [u',v'] \times [s,t] \big)$$
$$\leq \epsilon^2\, \omega\big( [u,v] \times [u',v'] \big)^{1/\rho}\, \omega\big([s,t]^2\big)^{1/\gamma} + 2\, \epsilon^2\, \omega\big([s,t]^2\big)^{\frac{1}{\rho} - \frac{1}{\gamma}}\, \omega([u,v] \times [s,t])^{1/\gamma}\, \omega\big( [u',v'] \times [s,t] \big)^{1/\gamma}.$$

Now take two partitions $D, \tilde D$ of $[s,t]$. With our calculations above,
$$\sum_{t_i \in D,\, \tilde t_j \in \tilde D} \big| E\big[ X_{t_i,t_{i+1}} (X_{s,t_i} - Y_{s,t_i})\, X_{\tilde t_j,\tilde t_{j+1}} (X_{s,\tilde t_j} - Y_{s,\tilde t_j}) \big] \big|^\gamma \leq c_1\, \epsilon^{2\gamma}\, \omega\big([s,t]^2\big) \sum_{t_i \in D,\, \tilde t_j \in \tilde D} \omega\big( [t_i,t_{i+1}] \times [\tilde t_j,\tilde t_{j+1}] \big)^{\gamma/\rho} + c_2\, \epsilon^{2\gamma}\, \omega\big([s,t]^2\big)^{\gamma/\rho - 1} \sum_{t_i \in D,\, \tilde t_j \in \tilde D} \omega([t_i,t_{i+1}] \times [s,t])\, \omega\big( [\tilde t_j,\tilde t_{j+1}] \times [s,t] \big)$$
$$\leq c_3\, \epsilon^{2\gamma} \Big( \omega\big([s,t]^2\big)\, \omega\big([s,t]^2\big)^{\gamma/\rho} + \omega\big([s,t]^2\big)^{\gamma/\rho - 1}\, \omega\big([s,t]^2\big)^2 \Big) \leq 2 c_3\, \epsilon^{2\gamma}\, \omega\big([s,t]^2\big)^{1 + \gamma/\rho}.$$

The other terms are treated exactly the same way. Taking the supremum over all partitions shows the result.

The next corollary completes the proof of Proposition 2.3.7.

Corollary 2.3.10. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there is a constant $C = C(\rho, \gamma)$ such that
$$\big| X^{i,i,j}_{s,t} - Y^{i,i,j}_{s,t} \big|_{L^2} \leq \epsilon\, C\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{2}{2\rho}}$$
holds for every $(s,t) \in \Delta$ and $i \neq j$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. From the triangle inequality,
$$\big| X^{i,i,j}_{s,t} - Y^{i,i,j}_{s,t} \big|_{L^2} \leq \left| \int_{[s,t]} \big( X^{i,i}_{s,u} - Y^{i,i}_{s,u} \big)\, dX^j_u \right|_{L^2} + \left| \int_{[s,t]} Y^{i,i}_{s,u}\, d\big( X^j_u - Y^j_u \big) \right|_{L^2}.$$
For the first integral, we use independence to move the expectation inside the integral, as seen in the proof of Lemma 2.3.5; then we use 2D Young integration and Lemma 2.3.9 to obtain the desired estimate. The second integral is estimated in the same way using Lemma 2.3.3.

n = 4

Proposition 2.3.11. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there is a constant $C(4)$ which depends on $\rho$ and $\gamma$ such that
$$\big| X^4_{s,t} - Y^4_{s,t} \big|_{L^2} \leq \epsilon\, C(4)\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{3}{2\rho}}$$
holds for every $(s,t) \in \Delta$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. From Propositions 2.1.8 and 2.1.5 one sees that it is enough to show the estimate for $X^w - Y^w$ where
$$w \in \{ iiii,\; ijkl,\; iijj,\; iiij,\; iijk,\; jiik : i, j, k, l \in \{1, \dots, d\} \text{ distinct} \}.$$
The cases $w = iiii$ and $w = ijkl$ are special cases of Lemma 2.3.4 and Lemma 2.3.5. Hence it remains to show the estimate for
$$w \in \{ iijj,\; iiij,\; iijk,\; jiik : i, j, k \in \{1, \dots, d\} \text{ pairwise distinct} \}.$$
This is the content of the remainder of this section.


Lemma 2.3.12. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there is a constant $C = C(\rho, \gamma)$ such that
$$\big| X^{i,i,j,k}_{s,t} - Y^{i,i,j,k}_{s,t} \big|_{L^2} \leq \epsilon\, C\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{3}{2\rho}}$$
holds for every $(s,t) \in \Delta$ where $i, j, k$ are distinct and $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. From the triangle inequality,
$$\big| X^{i,i,j,k}_{s,t} - Y^{i,i,j,k}_{s,t} \big|_{L^2} = \left| \int_{\{s<u<v<t\}} X^{i,i}_{s,u}\, dX^j_u\, dX^k_v - \int_{\{s<u<v<t\}} Y^{i,i}_{s,u}\, dY^j_u\, dY^k_v \right|_{L^2} \leq \left| \int \big( X^{i,i}_{s,u} - Y^{i,i}_{s,u} \big)\, dX^j_u\, dX^k_v \right|_{L^2} + \left| \int Y^{i,i}_{s,u}\, d\big( X^j - Y^j \big)_u\, dX^k_v \right|_{L^2} + \left| \int Y^{i,i}_{s,u}\, dY^j_u\, d\big( X^k - Y^k \big)_v \right|_{L^2}.$$
For the first term, independence and Proposition 2.2.5 give
$$\left| \int \big( X^{i,i}_{s,u} - Y^{i,i}_{s,u} \big)\, dX^j_u\, dX^k_v \right|^2_{L^2} = \int E\Big[ \big( X^{i,i}_{s,\cdot} - Y^{i,i}_{s,\cdot} \big)\big( X^{i,i}_{s,\cdot} - Y^{i,i}_{s,\cdot} \big) \Big]\, dR_{X^j}\, dR_{X^k} \leq c_1\, \epsilon^2\, \omega\big([s,t]^2\big)^{\frac{1}{\gamma} + \frac{3}{\rho}},$$
where Lemma 2.3.9 was used to bound the $\gamma$-variation of the first factor. For the other two integrals we also use Proposition 2.2.5 together with Lemma 2.3.3 to obtain the same estimate.

Lemma 2.3.13. Let $(X,Y): [0,1] \to \mathbb{R}^2$ be a centred Gaussian process with continuous paths of finite variation. Assume that the $\rho$-variation of $R_{(X,Y)}$ is controlled by a 2D-control $\omega$ where $\rho \geq 1$. Consider the function
$$g(u,v) = E\Big[ \big( X^{(3)}_{s,u} - Y^{(3)}_{s,u} \big)\big( X^{(3)}_{s,v} - Y^{(3)}_{s,v} \big) \Big].$$
Then for every $\gamma > \rho$ there is a constant $C = C(\rho, \gamma)$ such that
$$V_{\gamma\text{-var}}\big( g, [s,t]^2 \big) \leq C\, \epsilon^2\, \omega\big([s,t]^2\big)^{\frac{1}{\gamma} + \frac{2}{\rho}}$$
holds for every $(s,t) \in \Delta$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

Proof. Similar to the proof of Lemma 2.3.9, applying again Wick's formula.

Corollary 2.3.14. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there is a constant $C = C(\rho, \gamma)$ such that
$$\big| X^{i,i,i,j}_{s,t} - Y^{i,i,i,j}_{s,t} \big|_{L^2} \leq \epsilon\, C\, \omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\, \omega\big([s,t]^2\big)^{\frac{3}{2\rho}}$$
holds for every $(s,t) \in \Delta$ and $i \neq j$, where $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.


Proof. The triangle inequality gives
$$\big| X^{i,i,i,j}_{s,t} - Y^{i,i,i,j}_{s,t} \big|_{L^2} = \left| \int_{[s,t]} X^{i,i,i}_{s,u}\, dX^j_u - \int_{[s,t]} Y^{i,i,i}_{s,u}\, dY^j_u \right|_{L^2} \leq \left| \int_{[s,t]} \big( X^{i,i,i}_{s,u} - Y^{i,i,i}_{s,u} \big)\, dX^j_u \right|_{L^2} + \left| \int_{[s,t]} Y^{i,i,i}_{s,u}\, d\big( X^j - Y^j \big)_u \right|_{L^2}.$$
For the first integral, we move the expectation inside the integral, use 2D Young integration and Lemma 2.3.13 to conclude the estimate. The second integral is estimated the same way applying Lemma 2.3.3.

It remains to show the estimates for $X^w - Y^w$ where $w \in \{iijj, jiik\}$. We need to be a bit careful here for the following reason: it is clear that $X^{i,i,j}_{0,1} = \int_{[0,1]} X^{i,i}_{0,u}\, dX^j_u$. One might expect that also $X^{j,i,i}_{0,1} = \int_{[0,1]} X^j_{0,u}\, dX^{i,i}_{0,u}$ holds, but this is not true in general. Indeed, just take $f(u) = g(u) = u$. Then
$$\int_0^1 f(u)\, d\left( \int_0^u g\, dg \right) = \int_0^1 u\, d\left( \tfrac{1}{2} u^2 \right) = \int_0^1 u^2\, du = \frac{1}{3}$$
but
$$\int_{\Delta^2_{0,1}} f(u)\, dg(u)\, dg(v) = \int_{\Delta^3_{0,1}} du_1\, du_2\, du_3 = \frac{1}{6}.$$
On the other hand, if $g$ is smooth, we can use Fubini to see that
$$\int_{\Delta^2_{0,1}} f(u)\, dg(u)\, dg(v) = \int_{[0,1]^2} f(u)\, g'(u)\, g'(v)\, \mathbf{1}_{\{u < v\}}\, du\, dv.$$

Lemma 2.3.15. Let $f: [0,1]^2 \to \mathbb{R}$ be a continuous function. Set
$$\bar f(u_1, u_2, v_1, v_2) = f(u_1 \wedge u_2,\; v_1 \wedge v_2).$$

(i) Let $u_1 < \tilde u_1$, $u_2 < \tilde u_2$, $v_1 < \tilde v_1$, $v_2 < \tilde v_2$ be all in $[0,1]$. Then
$$\bar f\begin{pmatrix} u_1, \tilde u_1 \\ u_2, \tilde u_2 \\ v_1, \tilde v_1 \\ v_2, \tilde v_2 \end{pmatrix} = f\binom{u, \tilde u}{v, \tilde v}$$


where we set
$$[u, \tilde u] = \begin{cases} [u_1, \tilde u_1] \cap [u_2, \tilde u_2] & \text{if } [u_1, \tilde u_1] \cap [u_2, \tilde u_2] \neq \emptyset \\ [0,0] & \text{if } [u_1, \tilde u_1] \cap [u_2, \tilde u_2] = \emptyset \end{cases}$$
$$[v, \tilde v] = \begin{cases} [v_1, \tilde v_1] \cap [v_2, \tilde v_2] & \text{if } [v_1, \tilde v_1] \cap [v_2, \tilde v_2] \neq \emptyset \\ [0,0] & \text{if } [v_1, \tilde v_1] \cap [v_2, \tilde v_2] = \emptyset. \end{cases}$$

(ii) For $s < t$, $\sigma < \tau$ and $p \geq 1$ we have
$$V_p(f, [s,t] \times [\sigma,\tau]) = V_p\big( \bar f, [s,t]^2 \times [\sigma,\tau]^2 \big).$$

Proof. (i) By definition of the higher-dimensional increments,
$$\bar f\begin{pmatrix} u_1, \tilde u_1 \\ u_2, \tilde u_2 \\ v_1 \\ v_2 \end{pmatrix} = \bar f\begin{pmatrix} \tilde u_1 \\ \tilde u_2 \\ v_1 \\ v_2 \end{pmatrix} - \bar f\begin{pmatrix} \tilde u_1 \\ u_2 \\ v_1 \\ v_2 \end{pmatrix} - \bar f\begin{pmatrix} u_1 \\ \tilde u_2 \\ v_1 \\ v_2 \end{pmatrix} + \bar f\begin{pmatrix} u_1 \\ u_2 \\ v_1 \\ v_2 \end{pmatrix} = f(\tilde u_1 \wedge \tilde u_2, v_1 \wedge v_2) - f(\tilde u_1 \wedge u_2, v_1 \wedge v_2) - f(u_1 \wedge \tilde u_2, v_1 \wedge v_2) + f(u_1 \wedge u_2, v_1 \wedge v_2).$$
By a case distinction, one sees that this is equal to $f(\tilde u, v_1 \wedge v_2) - f(u, v_1 \wedge v_2)$. One goes on with
$$\bar f\begin{pmatrix} u_1, \tilde u_1 \\ u_2, \tilde u_2 \\ v_1, \tilde v_1 \\ v_2, \tilde v_2 \end{pmatrix} = \bar f\begin{pmatrix} u_1, \tilde u_1 \\ u_2, \tilde u_2 \\ \tilde v_1 \\ \tilde v_2 \end{pmatrix} - \bar f\begin{pmatrix} u_1, \tilde u_1 \\ u_2, \tilde u_2 \\ \tilde v_1 \\ v_2 \end{pmatrix} - \bar f\begin{pmatrix} u_1, \tilde u_1 \\ u_2, \tilde u_2 \\ v_1 \\ \tilde v_2 \end{pmatrix} + \bar f\begin{pmatrix} u_1, \tilde u_1 \\ u_2, \tilde u_2 \\ v_1 \\ v_2 \end{pmatrix} = h(\tilde v_1 \wedge \tilde v_2) - h(\tilde v_1 \wedge v_2) - h(v_1 \wedge \tilde v_2) + h(v_1 \wedge v_2) = h(\tilde v) - h(v)$$
where $h(\cdot) = f(\tilde u, \cdot) - f(u, \cdot)$. Hence
$$h(\tilde v) - h(v) = f(\tilde u, \tilde v) - f(u, \tilde v) - f(\tilde u, v) + f(u, v) = f\binom{u, \tilde u}{v, \tilde v}.$$

(ii) Let $D$ be a partition of $[s,t]$ and $\tilde D$ a partition of $[\sigma,\tau]$. Then, by (i),
$$\sum_{t_i \in D,\, \tilde t_j \in \tilde D} \left| f\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right|^p = \sum_{t_i \in D,\, \tilde t_j \in \tilde D} \left| \bar f\begin{pmatrix} t_i, t_{i+1} \\ t_i, t_{i+1} \\ \tilde t_j, \tilde t_{j+1} \\ \tilde t_j, \tilde t_{j+1} \end{pmatrix} \right|^p \leq V_p\big( \bar f, [s,t]^2 \times [\sigma,\tau]^2 \big)^p,$$
hence $V_p(f, [s,t] \times [\sigma,\tau]) \leq V_p\big( \bar f, [s,t]^2 \times [\sigma,\tau]^2 \big)$. Now let $D^1, D^2$ be partitions of $[s,t]$ and $\tilde D^1, \tilde D^2$ be partitions of $[\sigma,\tau]$. Set $D = D^1 \cup D^2$, $\tilde D = \tilde D^1 \cup \tilde D^2$. Then $D$ is a partition of $[s,t]$ and $\tilde D$ a partition of $[\sigma,\tau]$ (see Figure 1 below). By (i),
$$\sum_{\substack{t^1_{i_1} \in D^1,\, t^2_{i_2} \in D^2 \\ \tilde t^1_{j_1} \in \tilde D^1,\, \tilde t^2_{j_2} \in \tilde D^2}} \left| \bar f\begin{pmatrix} t^1_{i_1}, t^1_{i_1+1} \\ t^2_{i_2}, t^2_{i_2+1} \\ \tilde t^1_{j_1}, \tilde t^1_{j_1+1} \\ \tilde t^2_{j_2}, \tilde t^2_{j_2+1} \end{pmatrix} \right|^p = \sum_{t_i \in D,\, \tilde t_j \in \tilde D} \left| f\binom{t_i, t_{i+1}}{\tilde t_j, \tilde t_{j+1}} \right|^p \leq V_p(f, [s,t] \times [\sigma,\tau])^p$$
and we also get $V_p\big( \bar f, [s,t]^2 \times [\sigma,\tau]^2 \big) \leq V_p(f, [s,t] \times [\sigma,\tau])$.
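Lemma 2.3.15 (i) can be tested numerically: the 4D rectangular increment of $\bar f$ equals the 2D increment of $f$ over the intersection intervals, and vanishes when the intervals are disjoint. A small sketch (test function and intervals are our arbitrary choices):

```python
from itertools import product

def f(u, v):                      # any continuous test function
    return u * u * v + 3 * u * v * v

def fbar(u1, u2, v1, v2):
    return f(min(u1, u2), min(v1, v2))

def inc4(rects):
    """Rectangular increment of fbar over a product of 4 intervals:
    alternating-sign sum over the 2^4 corners."""
    total = 0.0
    for corners in product((0, 1), repeat=4):
        sign = (-1) ** (4 - sum(corners))   # -1 per lower endpoint
        total += sign * fbar(*(r[c] for r, c in zip(rects, corners)))
    return total

def inc2(f, ru, rv):
    return f(ru[1], rv[1]) - f(ru[1], rv[0]) - f(ru[0], rv[1]) + f(ru[0], rv[0])

# overlapping intervals: [u, u~] = [0.3, 0.4], [v, v~] = [0.6, 0.7]
lhs = inc4([(0.2, 0.4), (0.3, 0.5), (0.5, 0.7), (0.6, 0.9)])
rhs = inc2(f, (0.3, 0.4), (0.6, 0.7))
print(lhs, rhs)   # equal

# disjoint u-intervals: the increment vanishes
print(inc4([(0.1, 0.2), (0.3, 0.5), (0.5, 0.7), (0.6, 0.9)]))  # 0.0
```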


Lemma 2.3.16. Let $(X,Y): [0,1] \to \mathbb{R}^2$ be a centred Gaussian process with continuous paths of finite variation and assume that $\omega$ is a symmetric control which controls the $\rho$-variation of $R_{(X,Y)}$, where $\rho \geq 1$. Take $(s,t) \in \Delta$, $\gamma > \rho$ and set $\epsilon^2 = V_\infty\big( R_{X-Y}, [s,t]^2 \big)^{1 - \rho/\gamma}$.

(i) Set $f(u_1, u_2, v_1, v_2) = E[X_{u_1} X_{u_2} X_{v_1} X_{v_2}]$. Then there is a constant $C_1 = C_1(\rho)$ and a symmetric 4D grid-control $\tilde\omega_1$ which controls the $\rho$-variation of $f$ such that
$$V_\rho\big( f, [s,t]^4 \big) \leq \tilde\omega_1\big( [s,t]^4 \big)^{1/\rho} = C_1\, \omega\big([s,t]^2\big)^{2/\rho}.$$

(ii) Set $\tilde f(u_1, u_2, v_1, v_2) = E\big[ X^{(2)}_{s, u_1 \wedge u_2}\, X^{(2)}_{s, v_1 \wedge v_2} \big]$. Then there is a constant $C_2 = C_2(\rho)$ such that
$$V_\rho\big( \tilde f, [s,t]^4 \big) \leq C_2\, \omega\big([s,t]^2\big)^{2/\rho}.$$

(iii) Set

g (u1, u2, v1, v2) = E [(Xu1 Xu2 − Yu1 Yu2 )(Xv1 Xv2 − Yv1 Yv2 )] .

Then there is a constant C3 = C3 (ρ, γ) and a symmetric 4D grid-control ω˜2 which controls the γ-variation of g and

$$V_\gamma\big(g,[s,t]^4\big)\le\tilde\omega_2\big([s,t]^4\big)^{1/\gamma}=C_3\,\varepsilon^2\,\omega\big([s,t]^2\big)^{1/\gamma+1/\rho}.$$

(iv) Set

$$\tilde g(u_1,u_2,v_1,v_2)=\mathbb{E}\Big[\big(\mathbf{X}^{(2)}_{s,u_1\wedge u_2}-\mathbf{Y}^{(2)}_{s,u_1\wedge u_2}\big)\big(\mathbf{X}^{(2)}_{s,v_1\wedge v_2}-\mathbf{Y}^{(2)}_{s,v_1\wedge v_2}\big)\Big].$$

Then there is a constant C4 = C4 (ρ, γ) such that

$$V_\gamma\big(\tilde g,[s,t]^4\big)\le C_4\,\varepsilon^2\,\omega\big([s,t]^2\big)^{1/\gamma+1/\rho}.$$

Proof. (i) Let $u_1<\tilde u_1$, $u_2<\tilde u_2$, $v_1<\tilde v_1$, $v_2<\tilde v_2$. By the Wick formula,

$$\begin{aligned}
&\big|\mathbb{E}[X_{u_1,\tilde u_1}X_{u_2,\tilde u_2}X_{v_1,\tilde v_1}X_{v_2,\tilde v_2}]\big|^\rho\\
&\quad\le3^{\rho-1}\big|\mathbb{E}[X_{u_1,\tilde u_1}X_{u_2,\tilde u_2}]\,\mathbb{E}[X_{v_1,\tilde v_1}X_{v_2,\tilde v_2}]\big|^\rho
+3^{\rho-1}\big|\mathbb{E}[X_{u_1,\tilde u_1}X_{v_1,\tilde v_1}]\,\mathbb{E}[X_{u_2,\tilde u_2}X_{v_2,\tilde v_2}]\big|^\rho\\
&\qquad+3^{\rho-1}\big|\mathbb{E}[X_{u_1,\tilde u_1}X_{v_2,\tilde v_2}]\,\mathbb{E}[X_{u_2,\tilde u_2}X_{v_1,\tilde v_1}]\big|^\rho\\
&\quad\le3^{\rho-1}\,\omega([u_1,\tilde u_1]\times[u_2,\tilde u_2])\,\omega([v_1,\tilde v_1]\times[v_2,\tilde v_2])
+3^{\rho-1}\,\omega([u_1,\tilde u_1]\times[v_1,\tilde v_1])\,\omega([u_2,\tilde u_2]\times[v_2,\tilde v_2])\\
&\qquad+3^{\rho-1}\,\omega([u_1,\tilde u_1]\times[v_2,\tilde v_2])\,\omega([u_2,\tilde u_2]\times[v_1,\tilde v_1])\\
&\quad=:\tilde\omega_1\big([u_1,\tilde u_1]\times[u_2,\tilde u_2]\times[v_1,\tilde v_1]\times[v_2,\tilde v_2]\big).
\end{aligned}$$

It is easy to see that $\tilde\omega_1$ is a symmetric grid-control and that it fulfils the stated property.

(ii) A direct consequence of Lemma 2.3.3 and Lemma 2.3.15.

(iii) We have

$$X_{u_1}X_{u_2}-Y_{u_1}Y_{u_2}=(X_{u_1}-Y_{u_1})\,X_{u_2}+Y_{u_1}\,(X_{u_2}-Y_{u_2}).$$

Rates Gaussian rough paths

Hence for $u_1<\tilde u_1$, $u_2<\tilde u_2$, $v_1<\tilde v_1$, $v_2<\tilde v_2$,

$$\begin{aligned}
g\begin{pmatrix}u_1,\tilde u_1\\u_2,\tilde u_2\\v_1,\tilde v_1\\v_2,\tilde v_2\end{pmatrix}
&=\mathbb{E}\big[(X-Y)_{u_1,\tilde u_1}X_{u_2,\tilde u_2}\,(X-Y)_{v_1,\tilde v_1}X_{v_2,\tilde v_2}\big]
+\mathbb{E}\big[Y_{u_1,\tilde u_1}(X-Y)_{u_2,\tilde u_2}\,(X-Y)_{v_1,\tilde v_1}X_{v_2,\tilde v_2}\big]\\
&\quad+\mathbb{E}\big[(X-Y)_{u_1,\tilde u_1}X_{u_2,\tilde u_2}\,Y_{v_1,\tilde v_1}(X-Y)_{v_2,\tilde v_2}\big]
+\mathbb{E}\big[Y_{u_1,\tilde u_1}(X-Y)_{u_2,\tilde u_2}\,Y_{v_1,\tilde v_1}(X-Y)_{v_2,\tilde v_2}\big].
\end{aligned}$$

For the first term we have, using Lemma 2.3.8,

$$\begin{aligned}
&\big|\mathbb{E}\big[(X-Y)_{u_1,\tilde u_1}X_{u_2,\tilde u_2}(X-Y)_{v_1,\tilde v_1}X_{v_2,\tilde v_2}\big]\big|^\gamma\\
&\quad\le3^{\gamma-1}\big|\mathbb{E}\big[(X-Y)_{u_1,\tilde u_1}X_{u_2,\tilde u_2}\big]\big|^\gamma\,\big|\mathbb{E}\big[(X-Y)_{v_1,\tilde v_1}X_{v_2,\tilde v_2}\big]\big|^\gamma\\
&\qquad+3^{\gamma-1}\big|\mathbb{E}\big[(X-Y)_{u_1,\tilde u_1}(X-Y)_{v_1,\tilde v_1}\big]\big|^\gamma\,\big|\mathbb{E}\big[X_{u_2,\tilde u_2}X_{v_2,\tilde v_2}\big]\big|^\gamma\\
&\qquad+3^{\gamma-1}\big|\mathbb{E}\big[(X-Y)_{u_1,\tilde u_1}X_{v_2,\tilde v_2}\big]\big|^\gamma\,\big|\mathbb{E}\big[X_{u_2,\tilde u_2}(X-Y)_{v_1,\tilde v_1}\big]\big|^\gamma\\
&\quad\le3^{\gamma-1}\varepsilon^{2\gamma}\,\omega\big([s,t]^2\big)^{\frac\gamma\rho-1}\,\omega([u_1,\tilde u_1]\times[u_2,\tilde u_2])\,\omega([v_1,\tilde v_1]\times[v_2,\tilde v_2])\\
&\qquad+3^{\gamma-1}\varepsilon^{2\gamma}\,\omega([u_1,\tilde u_1]\times[v_1,\tilde v_1])\,\omega([u_2,\tilde u_2]\times[v_2,\tilde v_2])^{\frac\gamma\rho}\\
&\qquad+3^{\gamma-1}\varepsilon^{2\gamma}\,\omega\big([s,t]^2\big)^{\frac\gamma\rho-1}\,\omega([u_1,\tilde u_1]\times[v_2,\tilde v_2])\,\omega([u_2,\tilde u_2]\times[v_1,\tilde v_1])\\
&\quad\le3^{\gamma-1}\varepsilon^{2\gamma}\,\omega\big([s,t]^2\big)^{\frac\gamma\rho-1}\Big(\omega([u_1,\tilde u_1]\times[u_2,\tilde u_2])\,\omega([v_1,\tilde v_1]\times[v_2,\tilde v_2])\\
&\qquad\qquad+\omega([u_1,\tilde u_1]\times[v_1,\tilde v_1])\,\omega([u_2,\tilde u_2]\times[v_2,\tilde v_2])
+\omega([u_1,\tilde u_1]\times[v_2,\tilde v_2])\,\omega([u_2,\tilde u_2]\times[v_1,\tilde v_1])\Big)\\
&\quad=:\tilde\omega\big([u_1,\tilde u_1]\times[u_2,\tilde u_2]\times[v_1,\tilde v_1]\times[v_2,\tilde v_2]\big).
\end{aligned}$$

$\tilde\omega$ is a symmetric grid-control and fulfils the stated property. The other terms are treated in the same way.

(iv) Follows from Lemma 2.3.9 and Lemma 2.3.15.

Corollary 2.3.17. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there is a constant $C=C(\rho,\gamma)$ such that

$$\Big\|\mathbf{X}^{i,i,j,j}_{s,t}-\mathbf{Y}^{i,i,j,j}_{s,t}\Big\|_{L^2}\le C\,\varepsilon\,\omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\,\omega\big([s,t]^2\big)^{\frac{3}{2\rho}}$$

holds for every $(s,t)\in\Delta$ and $i\ne j$, where $\varepsilon^2=V_\infty\big(R_{X-Y},[s,t]^2\big)^{1-\rho/\gamma}$.

Proof. As seen before, we can use Fubini to obtain

$$\mathbf{X}^{i,i,j,j}_{s,t}=\int_{\Delta^2_{s,t}}\mathbf{X}^{i,i}_{s,u_1}\,dX^j_{u_1}\,dX^j_{u_2}=\frac12\int_{[s,t]^2}\mathbf{X}^{i,i}_{s,u_1\wedge u_2}\,d\big(X^j_{u_1}X^j_{u_2}\big)$$

and hence

$$\Big\|\mathbf{X}^{i,i,j,j}_{s,t}-\mathbf{Y}^{i,i,j,j}_{s,t}\Big\|_{L^2}
\le\frac12\left\|\int_{[s,t]^2}\big(\mathbf{X}^{i,i}_{s,u_1\wedge u_2}-\mathbf{Y}^{i,i}_{s,u_1\wedge u_2}\big)\,d\big(X^j_{u_1}X^j_{u_2}\big)\right\|_{L^2}
+\frac12\left\|\int_{[s,t]^2}\mathbf{Y}^{i,i}_{s,u_1\wedge u_2}\,d\big(X^j_{u_1}X^j_{u_2}-Y^j_{u_1}Y^j_{u_2}\big)\right\|_{L^2}.$$


We use a Young 4D-estimate and the estimates of Lemma 2.3.16 to see that

$$\begin{aligned}
&\left\|\int_{[s,t]^2}\big(\mathbf{X}^{i,i}_{s,u_1\wedge u_2}-\mathbf{Y}^{i,i}_{s,u_1\wedge u_2}\big)\,d\big(X^j_{u_1}X^j_{u_2}\big)\right\|^2_{L^2}\\
&\quad=\int_{[s,t]^4}\mathbb{E}\Big[\big(\mathbf{X}^{i,i}_{s,u_1\wedge u_2}-\mathbf{Y}^{i,i}_{s,u_1\wedge u_2}\big)\big(\mathbf{X}^{i,i}_{s,v_1\wedge v_2}-\mathbf{Y}^{i,i}_{s,v_1\wedge v_2}\big)\Big]\,d\mathbb{E}\big[X^j_{u_1}X^j_{u_2}X^j_{v_1}X^j_{v_2}\big]\\
&\quad\le c_1\,\varepsilon^2\,\omega\big([s,t]^2\big)^{\frac1\gamma}\,\omega\big([s,t]^2\big)^{\frac3\rho}.
\end{aligned}$$

The second term is estimated in the same way using again Lemma 2.3.16.

Lemma 2.3.18. Let $f\colon[0,1]^2\to\mathbb{R}$ and $g\colon[0,1]^2\times[0,1]^2\to\mathbb{R}$ be continuous, where $g$ is symmetric in the first and the last two variables. Let $(s,t)\in\Delta$ and assume that $f(s,\cdot)=f(\cdot,s)=0$. Assume also that $f$ has finite $p$-variation and that the $q$-variation of $g$ is controlled by a symmetric 4D grid-control $\tilde\omega$ where $\frac1p+\frac1q>1$. Define

$$\Psi(u,v)=\int_{[s,u]^2\times[s,v]^2}f(u_1\wedge u_2,\,v_1\wedge v_2)\,dg(u_1,u_2;v_1,v_2).$$

Then there is a constant C = C (p, q) such that

$$V_q\big(\Psi;[s,t]^2\big)\le C\,V_p\big(f;[s,t]^2\big)\,\tilde\omega\big([s,t]^4\big)^{1/q}.$$

Proof. Set

$$\tilde f(u_1,u_2,v_1,v_2)=f(u_1\wedge u_2,\,v_1\wedge v_2).$$

Let $u<v$ and $u'<v'$. Note that

$$\mathbf{1}_{[s,v]^2\times[s,v']^2}-\mathbf{1}_{[s,u]^2\times[s,v']^2}-\mathbf{1}_{[s,v]^2\times[s,u']^2}+\mathbf{1}_{[s,u]^2\times[s,u']^2}
=\mathbf{1}_{([s,v]^2\setminus[s,u]^2)\times[s,v']^2}-\mathbf{1}_{([s,v]^2\setminus[s,u]^2)\times[s,u']^2}
=\mathbf{1}_{([s,v]^2\setminus[s,u]^2)\times([s,v']^2\setminus[s,u']^2)}.$$

If we remove the square $[s,u]^2$ from the larger square $[s,v]^2$, what is left is the union of three essentially disjoint rectangles. More precisely,

$$[s,v]^2\setminus[s,u]^2=[u,v]^2\cup\big([s,u]\times[u,v]\big)\cup\big([u,v]\times[s,u]\big).$$

The same holds for $u'$ and $v'$. Hence,

$$\begin{aligned}
\big([s,v]^2\setminus[s,u]^2\big)\times\big([s,v']^2\setminus[s,u']^2\big)
&=\Big([u,v]^2\cup\big([s,u]\times[u,v]\big)\cup\big([u,v]\times[s,u]\big)\Big)\times\Big([u',v']^2\cup\big([s,u']\times[u',v']\big)\cup\big([u',v']\times[s,u']\big)\Big)\\
&=[u,v]^2\times[u',v']^2\;\cup\;[u,v]^2\times[s,u']\times[u',v']\;\cup\;[u,v]^2\times[u',v']\times[s,u']\\
&\quad\cup\;[s,u]\times[u,v]\times[u',v']^2\;\cup\;[s,u]\times[u,v]\times[s,u']\times[u',v']\;\cup\;[s,u]\times[u,v]\times[u',v']\times[s,u']\\
&\quad\cup\;[u,v]\times[s,u]\times[u',v']^2\;\cup\;[u,v]\times[s,u]\times[s,u']\times[u',v']\;\cup\;[u,v]\times[s,u]\times[u',v']\times[s,u']
\end{aligned}$$

and all these are unions of essentially disjoint sets. Using continuity and the symmetry of $\tilde f$ and $g$ we then have

$$\Psi\begin{pmatrix}u,v\\u',v'\end{pmatrix}=\int_{([s,v]^2\setminus[s,u]^2)\times([s,v']^2\setminus[s,u']^2)}\tilde f\,dg
=\int_{[u,v]^2\times[u',v']^2}\tilde f\,dg+2\int_{[u,v]^2\times[s,u']\times[u',v']}\tilde f\,dg
+2\int_{[s,u]\times[u,v]\times[u',v']^2}\tilde f\,dg+4\int_{[s,u]\times[u,v]\times[s,u']\times[u',v']}\tilde f\,dg.$$

For the first integral we use Young 4D-estimates. Since $\tilde f(s,\cdot,\cdot,\cdot)=\dots=\tilde f(\cdot,\cdot,\cdot,s)=0$, we can proceed as in the proof of Lemma 2.2.4 and use Lemma 2.3.15 to see that

$$\left|\int_{[u,v]^2\times[u',v']^2}\tilde f\,dg\right|\le c_1\,V_p\big(f,[s,t]^2\big)\,V_q\big(g,[u,v]^2\times[u',v']^2\big)
\le c_1\,V_p\big(f,[s,t]^2\big)\,\tilde\omega\big([u,v]^2\times[u',v']^2\big)^{1/q}.$$

For the second integral, we have

$$\int_{[u,v]^2\times[s,u']\times[u',v']}\tilde f\,dg
=\int_{[u,v]^2\times[s,u']\times[u',v']}f(u_1\wedge u_2,\,v_1\wedge v_2)\,dg(u_1,u_2;v_1,v_2)
=\int_{[u,v]^2\times[s,u']}f(u_1\wedge u_2,\,v_1)\,d\big(g(u_1,u_2;v_1,v')-g(u_1,u_2;v_1,u')\big).$$

We now use a Young 3D-estimate to see that

$$\left|\int_{[u,v]^2\times[s,u']\times[u',v']}\tilde f\,dg\right|
\le c_2\,V_p\big(f(\cdot\wedge\cdot,\cdot),[s,t]^3\big)\,V_q\big(g(\cdot,\cdot;\cdot,v')-g(\cdot,\cdot;\cdot,u'),[u,v]^2\times[s,u']\big).$$

As in Lemma 2.3.15, one can show that $V_p\big(f(\cdot\wedge\cdot,\cdot),[s,t]^3\big)=V_p\big(f,[s,t]^2\big)$. For $g$, we have

$$V_q\big(g(\cdot,\cdot;\cdot,v')-g(\cdot,\cdot;\cdot,u'),[u,v]^2\times[s,u']\big)
\le V_q\big(g,[u,v]^2\times[s,u']\times[u',v']\big)
\le\tilde\omega\big([u,v]^2\times[s,t]\times[u',v']\big)^{1/q}.$$

Hence

$$\left|\int_{[u,v]^2\times[s,u']\times[u',v']}\tilde f\,dg\right|
\le c_2\,V_p\big(f,[s,t]^2\big)\,\tilde\omega\big([u,v]^2\times[s,t]\times[u',v']\big)^{1/q}.$$

Similarly, using Young 3D and 2D estimates, we get

$$\left|\int_{[s,u]\times[u,v]\times[u',v']^2}\tilde f\,dg\right|
\le c_3\,V_p\big(f,[s,t]^2\big)\,\tilde\omega\big([s,t]\times[u,v]\times[u',v']^2\big)^{1/q}$$

and

$$\left|\int_{[s,u]\times[u,v]\times[s,u']\times[u',v']}\tilde f\,dg\right|
\le c_4\,V_p\big(f,[s,t]^2\big)\,\tilde\omega\big([s,t]\times[u,v]\times[s,t]\times[u',v']\big)^{1/q}.$$


Putting everything together and using the symmetry of $\tilde\omega$, we have shown that

$$\left|\Psi\begin{pmatrix}u,v\\u',v'\end{pmatrix}\right|^q\le c_5\,V_p\big(f,[s,t]^2\big)^q\,\tilde\omega\big([u,v]\times[u',v']\times[s,t]^2\big).$$

Since $\tilde\omega_2\big([u,v]\times[u',v']\big):=\tilde\omega\big([u,v]\times[u',v']\times[s,t]^2\big)$ is a 2D grid-control, this shows the claim.

We are now able to prove the remaining estimate.

Corollary 2.3.19. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then there is a constant $C=C(\rho,\gamma)$ such that

$$\Big\|\mathbf{X}^{j,i,i,k}_{s,t}-\mathbf{Y}^{j,i,i,k}_{s,t}\Big\|_{L^2}\le C\,\varepsilon\,\omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\,\omega\big([s,t]^2\big)^{\frac{3}{2\rho}}$$

holds for every $(s,t)\in\Delta$ and $i,j,k$ pairwise distinct, where $\varepsilon^2=V_\infty\big(R_{X-Y},[s,t]^2\big)^{1-\rho/\gamma}$.

Proof. From

$$\int_{\Delta^2_{s,w}}X^j_{s,u_1}\,dX^i_{u_1}\,dX^i_{u_2}=\frac12\int_{[s,w]^2}X^j_{s,u_1\wedge u_2}\,d\big(X^i_{u_1}X^i_{u_2}\big)$$

we see that

$$\mathbf{X}^{j,i,i,k}_{s,t}=\frac12\int_s^t\left(\int_{[s,w]^2}X^j_{s,u_1\wedge u_2}\,d\big(X^i_{u_1}X^i_{u_2}\big)\right)dX^k_w.$$

Hence

$$\Big\|\mathbf{X}^{j,i,i,k}_{s,t}-\mathbf{Y}^{j,i,i,k}_{s,t}\Big\|_{L^2}
\le\frac12\left\|\int_s^t\Psi_1(w)\,dX^k_w\right\|_{L^2}+\frac12\left\|\int_s^t\Psi_2(w)\,dX^k_w\right\|_{L^2}+\frac12\left\|\int_s^t\Psi_3(w)\,d\big(X^k-Y^k\big)_w\right\|_{L^2}$$

where

$$\begin{aligned}
\Psi_1(w)&=\int_{[s,w]^2}\big(X^j_{s,u_1\wedge u_2}-Y^j_{s,u_1\wedge u_2}\big)\,d\big(X^i_{u_1}X^i_{u_2}\big),\\
\Psi_2(w)&=\int_{[s,w]^2}Y^j_{s,u_1\wedge u_2}\,d\big(X^i_{u_1}X^i_{u_2}-Y^i_{u_1}Y^i_{u_2}\big),\\
\Psi_3(w)&=\int_{[s,w]^2}Y^j_{s,u_1\wedge u_2}\,d\big(Y^i_{u_1}Y^i_{u_2}\big).
\end{aligned}$$

We start with the first integral. From independence and Young 2D-estimates,

$$\left\|\int_s^t\Psi_1(w)\,dX^k_w\right\|^2_{L^2}=\int_{[s,t]^2}\mathbb{E}[\Psi_1(w_1)\Psi_1(w_2)]\,d\mathbb{E}\big[X^k_{w_1}X^k_{w_2}\big]
\le c_1\,V_\rho\big(\mathbb{E}[\Psi_1(\cdot)\Psi_1(\cdot)],[s,t]^2\big)\,V_\rho\big(R_{X^k},[s,t]^2\big).$$

Now,

$$\mathbb{E}[\Psi_1(w_1)\Psi_1(w_2)]=\int_{[s,w_1]^2\times[s,w_2]^2}\mathbb{E}\Big[\big(X^j_{s,u_1\wedge u_2}-Y^j_{s,u_1\wedge u_2}\big)\big(X^j_{s,v_1\wedge v_2}-Y^j_{s,v_1\wedge v_2}\big)\Big]\,d\mathbb{E}\big[X^i_{u_1}X^i_{u_2}X^i_{v_1}X^i_{v_2}\big].$$


In Lemma 2.3.16 we have seen that the $\rho$-variation of $\mathbb{E}\big[X^i_\cdot X^i_\cdot X^i_\cdot X^i_\cdot\big]$ is controlled by a symmetric grid-control $\tilde\omega_1$. Hence we can apply Lemma 2.3.18 to conclude that

$$V_\rho\big(\mathbb{E}[\Psi_1(\cdot)\Psi_1(\cdot)],[s,t]^2\big)\le c_2\,V_\gamma\big(R_{X-Y};[s,t]^2\big)\,\tilde\omega_1\big([s,t]^4\big)^{1/\rho}
\le c_3\,\varepsilon^2\,\omega\big([s,t]^2\big)^{\frac1\gamma}\,\omega\big([s,t]^2\big)^{\frac2\rho}.$$

Clearly, $V_\rho\big(R_{X^k},[s,t]^2\big)\le\omega\big([s,t]^2\big)^{1/\rho}$ and therefore

$$\left\|\int_s^t\Psi_1(w)\,dX^k_w\right\|^2_{L^2}\le c_4\,\varepsilon^2\,\omega\big([s,t]^2\big)^{\frac1\gamma}\,\omega\big([s,t]^2\big)^{\frac3\rho}.$$

Now we come to the second integral. From independence,

$$\left\|\int_s^t\Psi_2(w)\,dX^k_w\right\|^2_{L^2}=\int_{[s,t]^2}\mathbb{E}[\Psi_2(w_1)\Psi_2(w_2)]\,d\mathbb{E}\big[X^k_{w_1}X^k_{w_2}\big]
\le c_5\,V_\gamma\big(\mathbb{E}[\Psi_2(\cdot)\Psi_2(\cdot)],[s,t]^2\big)\,V_\rho\big(R_{X^k},[s,t]^2\big).$$

Now

$$\begin{aligned}
\mathbb{E}[\Psi_2(w_1)\Psi_2(w_2)]
&=\int_{[s,w_1]^2\times[s,w_2]^2}\mathbb{E}\big[Y^j_{s,u_1\wedge u_2}Y^j_{s,v_1\wedge v_2}\big]\,d\mathbb{E}\big[\big(X^i_{u_1}X^i_{u_2}-Y^i_{u_1}Y^i_{u_2}\big)\big(X^i_{v_1}X^i_{v_2}-Y^i_{v_1}Y^i_{v_2}\big)\big]\\
&=:\int_{[s,w_1]^2\times[s,w_2]^2}\mathbb{E}\big[Y^j_{s,u_1\wedge u_2}Y^j_{s,v_1\wedge v_2}\big]\,dg(u_1,u_2,v_1,v_2).
\end{aligned}$$

In Lemma 2.3.16 we have seen that the 4D $\gamma$-variation of $g$ is controlled by a symmetric 4D grid-control $\tilde\omega_2$ where

$$\tilde\omega_2\big([s,t]^4\big)^{1/\gamma}=c_6\,\varepsilon^2\,\omega\big([s,t]^2\big)^{\frac1\rho+\frac1\gamma}.$$

Hence

$$V_\gamma\big(\mathbb{E}[\Psi_2(\cdot)\Psi_2(\cdot)],[s,t]^2\big)\le c_7\,V_\rho\big(R_{Y^j};[s,t]^2\big)\,\tilde\omega_2\big([s,t]^4\big)^{1/\gamma}\le c_8\,\varepsilon^2\,\omega\big([s,t]^2\big)^{\frac2\rho+\frac1\gamma}.$$

This gives us

$$\left\|\int_s^t\Psi_2(w)\,dX^k_w\right\|^2_{L^2}\le c_9\,\varepsilon^2\,\omega\big([s,t]^2\big)^{\frac1\gamma}\,\omega\big([s,t]^2\big)^{\frac3\rho}.$$

For the third integral we see again that

$$\left\|\int_s^t\Psi_3(w)\,d\big(X^k-Y^k\big)_w\right\|^2_{L^2}
=\int_{[s,t]^2}\mathbb{E}[\Psi_3(w_1)\Psi_3(w_2)]\,d\mathbb{E}\big[\big(X^k-Y^k\big)_{w_1}\big(X^k-Y^k\big)_{w_2}\big]
\le c_{10}\,V_\rho\big(\mathbb{E}[\Psi_3(\cdot)\Psi_3(\cdot)],[s,t]^2\big)\,V_\gamma\big(R_{X-Y},[s,t]^2\big).$$

From

$$\mathbb{E}[\Psi_3(w_1)\Psi_3(w_2)]=\int_{[s,w_1]^2\times[s,w_2]^2}\mathbb{E}\big[Y^j_{s,u_1\wedge u_2}Y^j_{s,v_1\wedge v_2}\big]\,d\mathbb{E}\big[Y^i_{u_1}Y^i_{u_2}Y^i_{v_1}Y^i_{v_2}\big]$$

we see that we can apply Lemma 2.3.18 to obtain

$$V_\rho\big(\mathbb{E}[\Psi_3(\cdot)\Psi_3(\cdot)],[s,t]^2\big)\le c_{11}\,V_\rho\big(R_{Y^j};[s,t]^2\big)\,\omega\big([s,t]^2\big)^{\frac2\rho}\le c_{11}\,\omega\big([s,t]^2\big)^{\frac3\rho}.$$


Clearly, $V_\gamma\big(R_{X-Y},[s,t]^2\big)\le\varepsilon^2\,\omega\big([s,t]^2\big)^{1/\gamma}$ and hence

$$\left\|\int_s^t\Psi_3(w)\,d\big(X^k-Y^k\big)_w\right\|^2_{L^2}\le c_{12}\,\varepsilon^2\,\omega\big([s,t]^2\big)^{\frac1\gamma}\,\omega\big([s,t]^2\big)^{\frac3\rho},$$

which gives the claim.

Remark 2.3.20. Even though Propositions 2.3.6, 2.3.7 and 2.3.11 are only formulated for Gaussian processes with sample paths of finite variation, the estimate (2.4) is also valid for general Gaussian rough paths for $n=1,2,3,4$. Indeed, this follows from the fact that Gaussian rough paths are defined as $L^2$ limits of smooth paths, cf. [FV10a].

2.3.3 Higher levels

Once we have shown our desired estimates for the first four levels, we can use induction to obtain the higher levels as well. This is done in the next proposition.

Proposition 2.3.21. Let $X$ and $Y$ be Gaussian processes as in Theorem 2.0.1. Let $\rho,\gamma$ be fixed and $\omega$ be a control. Assume that there are constants $\tilde C=\tilde C(n)$ such that

$$\big\|\mathbf{X}^n_{s,t}\big\|_{L^2},\ \big\|\mathbf{Y}^n_{s,t}\big\|_{L^2}\le\tilde C(n)\,\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta\,\big(\tfrac{n}{2\rho}\big)!}$$

holds for $n=1,\dots,[2\rho]$ and constants $C=C(n)$ such that

$$\big\|\mathbf{X}^n_{s,t}-\mathbf{Y}^n_{s,t}\big\|_{L^2}\le C(n)\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(s,t)^{\frac{n-1}{2\rho}}}{\beta\,\big(\tfrac{n-1}{2\rho}\big)!}$$

holds for $n=1,\dots,[2\rho]+1$ and every $(s,t)\in\Delta$. Here, $\varepsilon>0$ and $\beta$ is a positive constant such that

$$\beta\ge4\rho\left(1+2^{\frac{[2\rho]+1}{2\rho}}\left(\zeta\Big(\frac{[2\rho]+1}{2\rho}\Big)-1\right)\right)$$

where $\zeta$ is the usual Riemann zeta function. Then for every $n\in\mathbb{N}$ there is a constant $C=C(n)$ such that

$$\big\|\mathbf{X}^n_{s,t}-\mathbf{Y}^n_{s,t}\big\|_{L^2}\le C\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(s,t)^{\frac{n-1}{2\rho}}}{\beta\,\big(\tfrac{n-1}{2\rho}\big)!}$$

holds for every $(s,t)\in\Delta$.

Proof. From Proposition 2.1.7 we know that for every $n\in\mathbb{N}$ there are constants $\tilde C(n)$ such that

$$\big\|\mathbf{X}^n_{s,t}\big\|_{L^2},\ \big\|\mathbf{Y}^n_{s,t}\big\|_{L^2}\le\tilde C(n)\,\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta\,\big(\tfrac{n}{2\rho}\big)!}$$

holds for all $s<t$. We will prove the assertion by induction over $n$. The induction basis is fulfilled by assumption. Suppose that the statement is true for $k=1,\dots,n$ where $n\ge[2\rho]+1$. We will show the statement for $n+1$. Let $D=\{s=t_0<t_1<\dots<t_j=t\}$ be any partition of $[s,t]$. Set

$$\bar{\mathbf{X}}_{s,t}:=\big(1,\mathbf{X}^1_{s,t},\dots,\mathbf{X}^n_{s,t},0\big)\in T^{n+1}\big(\mathbb{R}^d\big),\qquad
\bar{\mathbf{X}}^D_{s,t}:=\bar{\mathbf{X}}_{s,t_1}\otimes\cdots\otimes\bar{\mathbf{X}}_{t_{j-1},t}$$


and the same for $\mathbf{Y}$. We know that $\lim_{|D|\to0}\bar{\mathbf{X}}^D_{s,t}=S_{n+1}(\mathbf{X})_{s,t}$ a.s. and the same holds for $\mathbf{Y}$ (indeed, this is just the definition of the Lyons lift, cf. [Lyo98, Theorem 2.2.1]). By multiplicativity, $\pi_k\big(\bar{\mathbf{X}}^D_{s,t}\big)=\mathbf{X}^k_{s,t}$ for $k\le n$. We will show that for any dissection $D$ we have

$$\Big\|\pi_{n+1}\big(\bar{\mathbf{X}}^D_{s,t}-\bar{\mathbf{Y}}^D_{s,t}\big)\Big\|_{L^2}\le C(n+1)\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta\,\big(\tfrac{n}{2\rho}\big)!}.$$

We use the notation $\mathbf{X}^{D,k}:=\pi_k\big(\bar{\mathbf{X}}^D\big)$. Assume that $j\ge2$. Let $D'$ be the partition of $[s,t]$ obtained by removing a point $t_i$ of the dissection $D$ for which

$$\omega(t_{i-1},t_{i+1})\le\begin{cases}\dfrac{2\,\omega(s,t)}{j-1}&\text{for }j\ge3,\\[1ex]\omega(s,t)&\text{for }j=2\end{cases}$$

holds (Lemma 2.2.1 in [Lyo98] shows that there is indeed such a point). By the triangle inequality,

$$\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{Y}^D_{s,t}\big)\Big\|_{L^2}
\le\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{X}^{D'}_{s,t}\big)-\pi_{n+1}\big(\mathbf{Y}^D_{s,t}-\mathbf{Y}^{D'}_{s,t}\big)\Big\|_{L^2}
+\Big\|\pi_{n+1}\big(\mathbf{X}^{D'}_{s,t}-\mathbf{Y}^{D'}_{s,t}\big)\Big\|_{L^2}.$$

We estimate the first term on the right-hand side. As seen in the proof of [Lyo98, Theorem 2.2.1],

$$\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{X}^{D'}_{s,t}\big)=\sum_{l=1}^{n}\mathbf{X}^l_{t_{i-1},t_i}\,\mathbf{X}^{n+1-l}_{t_i,t_{i+1}}.$$

Set $\mathbf{R}^l:=\mathbf{Y}^l-\mathbf{X}^l$. Then

$$\begin{aligned}
\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{X}^{D'}_{s,t}\big)-\pi_{n+1}\big(\mathbf{Y}^D_{s,t}-\mathbf{Y}^{D'}_{s,t}\big)
&=\sum_{l=1}^{n}\Big(\mathbf{X}^l_{t_{i-1},t_i}\,\mathbf{X}^{n+1-l}_{t_i,t_{i+1}}-\big(\mathbf{X}^l_{t_{i-1},t_i}+\mathbf{R}^l_{t_{i-1},t_i}\big)\big(\mathbf{X}^{n+1-l}_{t_i,t_{i+1}}+\mathbf{R}^{n+1-l}_{t_i,t_{i+1}}\big)\Big)\\
&=\sum_{l=1}^{n}\Big(-\mathbf{X}^l_{t_{i-1},t_i}\,\mathbf{R}^{n+1-l}_{t_i,t_{i+1}}-\mathbf{R}^l_{t_{i-1},t_i}\,\mathbf{Y}^{n+1-l}_{t_i,t_{i+1}}\Big).
\end{aligned}$$

By the triangle inequality, equivalence of $L^q$-norms in the Wiener chaos, our moment estimates for $\mathbf{X}^k$ and $\mathbf{Y}^k$ and the induction hypothesis,

$$\begin{aligned}
&\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{X}^{D'}_{s,t}\big)-\pi_{n+1}\big(\mathbf{Y}^D_{s,t}-\mathbf{Y}^{D'}_{s,t}\big)\Big\|_{L^2}\\
&\quad\le c_1(n+1)\sum_{l=1}^{n}\Big(\big\|\mathbf{X}^l_{t_{i-1},t_i}\big\|_{L^2}\,\big\|\mathbf{R}^{n+1-l}_{t_i,t_{i+1}}\big\|_{L^2}+\big\|\mathbf{R}^l_{t_{i-1},t_i}\big\|_{L^2}\,\big\|\mathbf{Y}^{n+1-l}_{t_i,t_{i+1}}\big\|_{L^2}\Big)\\
&\quad\le c_2(n+1)\sum_{l=1}^{n}\Bigg(\varepsilon\,\omega(t_i,t_{i+1})^{\frac{1}{2\gamma}}\,\frac{\omega(t_{i-1},t_i)^{\frac{l}{2\rho}}}{\beta\big(\tfrac{l}{2\rho}\big)!}\,\frac{\omega(t_i,t_{i+1})^{\frac{n-l}{2\rho}}}{\beta\big(\tfrac{n-l}{2\rho}\big)!}
+\varepsilon\,\omega(t_{i-1},t_i)^{\frac{1}{2\gamma}}\,\frac{\omega(t_{i-1},t_i)^{\frac{l-1}{2\rho}}}{\beta\big(\tfrac{l-1}{2\rho}\big)!}\,\frac{\omega(t_i,t_{i+1})^{\frac{n+1-l}{2\rho}}}{\beta\big(\tfrac{n+1-l}{2\rho}\big)!}\Bigg)\\
&\quad\le2c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\sum_{l=0}^{n}\frac{\omega(t_{i-1},t_i)^{\frac{l}{2\rho}}}{\beta\big(\tfrac{l}{2\rho}\big)!}\,\frac{\omega(t_i,t_{i+1})^{\frac{n-l}{2\rho}}}{\beta\big(\tfrac{n-l}{2\rho}\big)!}\\
&\quad=\frac{4\rho\,c_2}{\beta^2}\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{1}{2\rho}\sum_{l=0}^{n}\frac{\omega(t_{i-1},t_i)^{\frac{l}{2\rho}}\,\omega(t_i,t_{i+1})^{\frac{n-l}{2\rho}}}{\big(\tfrac{l}{2\rho}\big)!\,\big(\tfrac{n-l}{2\rho}\big)!}\\
&\quad\le4\rho\,c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(t_{i-1},t_{i+1})^{\frac{n}{2\rho}}}{\beta^2\,\big(\tfrac{n}{2\rho}\big)!}
\end{aligned}$$

where we used the neo-classical inequality (cf. [HH10]) and the superadditivity of the control function. Hence for $j\ge3$,

$$\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{X}^{D'}_{s,t}\big)-\pi_{n+1}\big(\mathbf{Y}^D_{s,t}-\mathbf{Y}^{D'}_{s,t}\big)\Big\|_{L^2}
\le4\rho\,c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(t_{i-1},t_{i+1})^{\frac{n}{2\rho}}}{\beta^2\,\big(\tfrac{n}{2\rho}\big)!}
\le4\rho\,c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\Big(\frac{2}{j-1}\Big)^{\frac{n}{2\rho}}\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta^2\,\big(\tfrac{n}{2\rho}\big)!}.$$

For $j=2$ we get

$$\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{X}^{D'}_{s,t}\big)-\pi_{n+1}\big(\mathbf{Y}^D_{s,t}-\mathbf{Y}^{D'}_{s,t}\big)\Big\|_{L^2}
\le4\rho\,c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta^2\,\big(\tfrac{n}{2\rho}\big)!}$$

but then $D'=\{s,t\}$ and therefore $\pi_{n+1}\big(\mathbf{X}^{D'}_{s,t}-\mathbf{Y}^{D'}_{s,t}\big)=0$. Hence by successively dropping points we see that

$$\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{Y}^D_{s,t}\big)\Big\|_{L^2}\le\left(1+\sum_{j=3}^{\infty}\Big(\frac{2}{j-1}\Big)^{\frac{n}{2\rho}}\right)4\rho\,c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta^2\,\big(\tfrac{n}{2\rho}\big)!}$$

holds for all partitions $D$. Since $n\ge[2\rho]+1$,

$$\sum_{j=3}^{\infty}\Big(\frac{2}{j-1}\Big)^{\frac{n}{2\rho}}\le\sum_{j=3}^{\infty}\Big(\frac{2}{j-1}\Big)^{\frac{[2\rho]+1}{2\rho}}\le2^{\frac{[2\rho]+1}{2\rho}}\left(\zeta\Big(\frac{[2\rho]+1}{2\rho}\Big)-1\right)$$

and thus

$$\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{Y}^D_{s,t}\big)\Big\|_{L^2}\le\frac{4\rho\Big(1+2^{\frac{[2\rho]+1}{2\rho}}\big(\zeta\big(\frac{[2\rho]+1}{2\rho}\big)-1\big)\Big)}{\beta}\,c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta\,\big(\tfrac{n}{2\rho}\big)!}.$$

By the choice of $\beta$, we get the uniform bound

$$\Big\|\pi_{n+1}\big(\mathbf{X}^D_{s,t}-\mathbf{Y}^D_{s,t}\big)\Big\|_{L^2}\le c_2\,\varepsilon\,\omega(s,t)^{\frac{1}{2\gamma}}\,\frac{\omega(s,t)^{\frac{n}{2\rho}}}{\beta\,\big(\tfrac{n}{2\rho}\big)!}$$

which holds for all partitions $D$. Noting that a.s. convergence implies convergence in $L^2$ in the Wiener chaos, we obtain our claim by sending $|D|\to0$.

Corollary 2.3.22. Let $(X,Y)$, $\omega$, $\rho$ and $\gamma$ be as in Lemma 2.3.5. Then for all $n\in\mathbb{N}$ there are constants $C=C(\rho,\gamma,n)$ such that

$$\big\|\mathbf{X}^n_{s,t}-\mathbf{Y}^n_{s,t}\big\|_{L^2}\le C\,\varepsilon\,\omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\,\omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}}$$

holds for every $(s,t)\in\Delta$, where $\varepsilon^2=V_\infty\big(R_{X-Y},[0,1]^2\big)^{1-\rho/\gamma}$.

Proof. For $n=1,2,3,4$ this is the content of Propositions 2.3.6, 2.3.7 and 2.3.11. By making the constants larger if necessary, we also get

$$\big\|\mathbf{X}^n_{s,t}-\mathbf{Y}^n_{s,t}\big\|_{L^2}\le c(n)\,\varepsilon\,\omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\,\frac{\omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}}}{\beta\,\big(\tfrac{n-1}{2\rho}\big)!}$$

with $\beta$ chosen as in Proposition 2.3.21. We have already seen that

$$\big\|\mathbf{X}^n_{s,t}\big\|_{L^2},\ \big\|\mathbf{Y}^n_{s,t}\big\|_{L^2}\le\tilde c(n)\,\frac{\omega\big([s,t]^2\big)^{\frac{n}{2\rho}}}{\beta\,\big(\tfrac{n}{2\rho}\big)!}$$

holds for constants $\tilde c(n)$ where $n=1,2,3$. Since $\rho<2$, we have $[2\rho]+1\le4$. From Proposition 2.3.21 we can conclude that

$$\big\|\mathbf{X}^n_{s,t}-\mathbf{Y}^n_{s,t}\big\|_{L^2}\le c(n)\,\varepsilon\,\omega\big([s,t]^2\big)^{\frac{1}{2\gamma}}\,\frac{\omega\big([s,t]^2\big)^{\frac{n-1}{2\rho}}}{\beta\,\big(\tfrac{n-1}{2\rho}\big)!}$$

holds for every $n\in\mathbb{N}$ and constants $c(n)$. Setting $C(n)=c(n)\big/\big(\beta\,\big(\tfrac{n-1}{2\rho}\big)!\big)$ gives our claim.

2.4 Main result

Assume that $X$ is a Gaussian process as in Theorem 2.0.1 with paths of finite $p$-variation. Consider a sequence $(\Lambda_k)_{k\in\mathbb{N}}$ of continuous operators

$$\Lambda_k\colon C^{p\text{-var}}([0,1],\mathbb{R})\to C^{1\text{-var}}([0,1],\mathbb{R}).$$

If $x=\big(x^1,\dots,x^d\big)\in C^{p\text{-var}}\big([0,1],\mathbb{R}^d\big)$, we will write $\Lambda_k(x)=\big(\Lambda_k(x^1),\dots,\Lambda_k(x^d)\big)$. Assume that $\Lambda_k$ fulfils the following conditions:

(i) $\Lambda_k(x)\to x$ in the $|\cdot|_\infty$-norm as $k\to\infty$ for every $x\in C^{p\text{-var}}\big([0,1],\mathbb{R}^d\big)$.

(ii) If $R_X$ has finite controlled $\rho$-variation, then, for some $C=C(\rho)$,

$$\sup_{k,l\in\mathbb{N}}\big|R_{(\Lambda_k(X),\Lambda_l(X))}\big|_{\rho\text{-var};[0,1]^2}\le C\,|R_X|_{\rho\text{-var};[0,1]^2}.$$

Our main result is the following:

Theorem 2.4.1. Let $X$ be a Gaussian process as in Theorem 2.0.1 for $\rho<2$ and $K\ge V_\rho\big(R_X,[0,1]^2\big)$. Then there is an enhanced Gaussian process $\mathbf{X}$ with sample paths in $C^{0,p\text{-var}}\big([0,1],G^{[p]}\big(\mathbb{R}^d\big)\big)$ w.r.t. $(\Lambda_k)_{k\in\mathbb{N}}$ where $p\in(2\rho,4)$, i.e.

$$\big|\rho_{p\text{-var}}\big(S_{[p]}(\Lambda_k(X)),\mathbf{X}\big)\big|_{L^r}\to0$$

for $k\to\infty$ and every $r\ge1$. Moreover, choose $\gamma$ such that $\gamma>\rho$ and $\frac1\gamma+\frac1\rho>1$. Then for $q>2\gamma$ and every $N\in\mathbb{N}$ there is a constant $C=C(q,\rho,\gamma,K,N)$ such that

$$\big|\rho_{q\text{-var}}\big(S_N(\Lambda_k(X)),S_N(\mathbf{X})\big)\big|_{L^r}\le C\,r^{N/2}\,\sup_{0\le t\le1}\big|\Lambda_k(X)_t-X_t\big|^{1-\frac\rho\gamma}_{L^2(\mathbb{R}^d)}$$

holds for every $k\in\mathbb{N}$.

Proof. The first statement is a fundamental result about Gaussian rough paths, see [FV10b, Theorem 15.33]. For the second, take $\delta>0$ and set

$$\gamma'=(1+\delta)\,\gamma\qquad\text{and}\qquad\rho'=(1+\delta)\,\rho.$$

By choosing $\delta$ smaller if necessary we can assume that $\frac{1}{\rho'}+\frac{1}{\gamma'}>1$ and $q>2\gamma'$. Set

$$\omega_{k,l}(A)=\big|R_{(\Lambda_k(X),\Lambda_l(X))}\big|^{\rho'}_{\rho'\text{-var};A}$$

for a rectangle $A\subset[0,1]^2$ and

$$\varepsilon_{k,l}=V_\infty\big(R_{\Lambda_k(X)-\Lambda_l(X)},[0,1]^2\big)^{\frac12-\frac{\rho'}{2\gamma'}}=V_\infty\big(R_{\Lambda_k(X)-\Lambda_l(X)},[0,1]^2\big)^{\frac12-\frac{\rho}{2\gamma}}.$$

From Theorem 0.0.1 we know that $\omega_{k,l}$ is a 2D control function which controls the $\rho'$-variation of $R_{(\Lambda_k(X),\Lambda_l(X))}$. From Corollary 2.3.22 we can conclude that there is a constant $c_1$ such that

$$\Big\|\pi_n\big(S_N(\Lambda_k(X))_{s,t}-S_N(\Lambda_l(X))_{s,t}\big)\Big\|_{L^2}\le c_1\,\varepsilon_{k,l}\,\omega_{k,l}\big([s,t]^2\big)^{\frac{1}{2\gamma'}}\,\omega_{k,l}\big([s,t]^2\big)^{\frac{n-1}{2\rho'}}$$

holds for every $n=1,\dots,N$, $(s,t)\in\Delta$ and $k,l\in\mathbb{N}$. Now,

$$\omega_{k,l}\big([s,t]^2\big)^{\frac{n-1}{2\rho'}}=\omega_{k,l}\big([0,1]^2\big)^{\frac{n-1}{2\rho'}}\left(\frac{\omega_{k,l}\big([s,t]^2\big)}{\omega_{k,l}\big([0,1]^2\big)}\right)^{\frac{n-1}{2\rho'}}
\le\omega_{k,l}\big([s,t]^2\big)^{\frac{n-1}{2\gamma'}}\,\omega_{k,l}\big([0,1]^2\big)^{\frac{n-1}{2\rho'}-\frac{n-1}{2\gamma'}}.$$

From Theorem 0.0.1 and our assumptions on the $\Lambda_k$ we know that

$$\omega_{k,l}\big([0,1]^2\big)^{1/\rho'}\le c_2\,\big|R_X\big|_{\rho'\text{-var};[0,1]^2}\le c_3\,V_\rho\big(R_X,[0,1]^2\big)\le c_4\big(\rho,\rho',K\big)$$

holds uniformly over all $k,l$. Hence

$$\Big\|\pi_n\big(S_N(\Lambda_k(X))_{s,t}-S_N(\Lambda_l(X))_{s,t}\big)\Big\|_{L^2}\le c_5\,\varepsilon_{k,l}\,\omega_{k,l}\big([s,t]^2\big)^{\frac{n}{2\gamma'}}.$$

Proposition 2.1.7 shows with the same argument that

$$\Big\|\pi_n\big(S_N(\Lambda_k(X))_{s,t}\big)\Big\|_{L^2}\le c_6\,\omega_{k,l}\big([s,t]^2\big)^{\frac{n}{2\rho'}}\le c_7\,\omega_{k,l}\big([s,t]^2\big)^{\frac{n}{2\gamma'}}$$

for every $k\in\mathbb{N}$ and the same holds for $S_N(\Lambda_l(X))_{s,t}$. From [FV10b, Proposition 15.24] we can conclude that there is a constant $c_8$ such that

$$\big|\rho_{q\text{-var}}\big(S_N(\Lambda_k(X)),S_N(\Lambda_l(X))\big)\big|_{L^r}\le c_8\,r^{N/2}\,\varepsilon_{k,l}$$

holds for all $k,l\in\mathbb{N}$. In particular, we have shown that $\big(S_N(\Lambda_k(X))\big)_{k\in\mathbb{N}}$ is a Cauchy sequence in $L^r$ and it is clear that the limit is given by the Lyons lift $S_N(\mathbf{X})$ of the enhanced Gaussian process $\mathbf{X}$. Now fix $k\in\mathbb{N}$. For every $l\in\mathbb{N}$,

$$\begin{aligned}
\big|\rho_{q\text{-var}}\big(S_N(\Lambda_k(X)),S_N(\mathbf{X})\big)\big|_{L^r}
&\le\big|\rho_{q\text{-var}}\big(S_N(\Lambda_k(X)),S_N(\Lambda_l(X))\big)\big|_{L^r}+\big|\rho_{q\text{-var}}\big(S_N(\Lambda_l(X)),S_N(\mathbf{X})\big)\big|_{L^r}\\
&\le c_8\,r^{N/2}\,\varepsilon_{k,l}+\big|\rho_{q\text{-var}}\big(S_N(\Lambda_l(X)),S_N(\mathbf{X})\big)\big|_{L^r}.
\end{aligned}$$

It is easy to see that

$$\varepsilon_{k,l}\to V_\infty\big(R_{\Lambda_k(X)-X},[0,1]^2\big)^{\frac12-\frac{\rho}{2\gamma}}$$

for $l\to\infty$ and since

$$\big|\rho_{q\text{-var}}\big(S_N(\Lambda_l(X)),S_N(\mathbf{X})\big)\big|_{L^r}\to0$$

for $l\to\infty$, we can conclude that

$$\big|\rho_{q\text{-var}}\big(S_N(\Lambda_k(X)),S_N(\mathbf{X})\big)\big|_{L^r}\le c_8\,r^{N/2}\,V_\infty\big(R_{\Lambda_k(X)-X},[0,1]^2\big)^{\frac12-\frac{\rho}{2\gamma}}$$


holds for every $k\in\mathbb{N}$. Finally, for $[\sigma,\tau]\times[\sigma',\tau']\subset[0,1]^2$ we have

$$\left|R_{\Lambda_k(X)-X}\begin{pmatrix}\sigma,\tau\\\sigma',\tau'\end{pmatrix}\right|_{\mathbb{R}^{d\times d}}\le4\sup_{0\le s,t\le1}\big|R_{\Lambda_k(X)-X}(s,t)\big|_{\mathbb{R}^{d\times d}},$$

$$\big|R_{\Lambda_k(X)-X}(s,t)\big|_{\mathbb{R}^{d\times d}}\le\big|\Lambda_k(X)_s-X_s\big|_{L^2(\mathbb{R}^d)}\,\big|\Lambda_k(X)_t-X_t\big|_{L^2(\mathbb{R}^d)}\le\sup_{0\le t\le1}\big|\Lambda_k(X)_t-X_t\big|^2_{L^2(\mathbb{R}^d)}$$

and therefore

$$V_\infty\big(R_{\Lambda_k(X)-X},[0,1]^2\big)^{\frac12-\frac{\rho}{2\gamma}}\le c_9\sup_{0\le t\le1}\big|\Lambda_k(X)_t-X_t\big|^{1-\frac\rho\gamma}_{L^2(\mathbb{R}^d)}$$

which shows the result.

The next Theorem gives pathwise convergence rates for the Wong-Zakai error for suitable approximations of the driving signal.

Theorem 2.4.2. Let $X$ be as in Theorem 2.0.1 for $\rho<2$, $K\ge V_\rho\big(R_X,[0,1]^2\big)$ and $X^{(k)}=\Lambda_k(X)$. Consider the SDEs

$$dY_t=V(Y_t)\,d\mathbf{X}_t,\qquad Y_0\in\mathbb{R}^n\qquad(2.7)$$
$$dY^{(k)}_t=V\big(Y^{(k)}_t\big)\,dX^{(k)}_t,\qquad Y^{(k)}_0=Y_0\in\mathbb{R}^n\qquad(2.8)$$

where $|V|_{\mathrm{Lip}^\theta}\le\nu<\infty$ for a $\theta>2\rho$. Assume that there is a constant $C_1$ and a sequence $(\varepsilon_k)_{k\in\mathbb{N}}\subset\bigcup_{r\ge1}\ell^r$ such that

$$\sup_{0\le t\le1}\big|X_t-X^{(k)}_t\big|^2_{L^2}\le C_1\,\varepsilon_k^{1/\rho}\quad\text{for all }k\in\mathbb{N}.$$

Choose $\eta,q$ such that

$$0\le\eta<\min\Big(\frac1\rho-\frac12,\;\frac{1}{2\rho}-\frac1\theta\Big)\qquad\text{and}\qquad q\in\Big(\frac{2\rho}{1-2\rho\eta},\,\theta\Big).$$

Then both SDEs (2.7) and (2.8) have unique solutions $Y$ and $Y^{(k)}$ and there is a finite random variable $C$ and a null set $M$ such that

$$\big|Y^{(k)}(\omega)-Y(\omega)\big|_{\infty;[0,1]}\le\big|Y^{(k)}(\omega)-Y(\omega)\big|_{q\text{-var};[0,1]}\le C(\omega)\,\varepsilon_k^\eta\qquad(2.9)$$

holds for all $k\in\mathbb{N}$ and $\omega\in\Omega\setminus M$. The random variable $C$ depends on $\rho$, $q$, $\eta$, $\nu$, $\theta$, $K$, $C_1$, the sequence $(\varepsilon_k)_{k\in\mathbb{N}}$ and the driving process $X$, but not on the equation itself. The same holds for the set $M$.

Remark 2.4.3. Note that this means that we have universal rates, i.e. the set $M$ and the random variable $C$ are valid for all starting points (and also for vector fields subject to a uniform $\mathrm{Lip}^\theta$-bound). In particular, our convergence rates apply to solutions viewed as $C^l$-diffeomorphisms where $l=[\theta-q]$, cf. [FV10b, Theorem 11.12] and [FR11].


Proof of Theorem 2.4.2. Note that $\gamma>\rho$ together with $\frac1\rho+\frac1\gamma>1$ is equivalent to $0<\frac{1}{2\rho}-\frac{1}{2\gamma}<\frac1\rho-\frac12$. Hence there is a $\gamma_0>\rho$ such that $\eta=\frac{1}{2\rho}-\frac{1}{2\gamma_0}$ and $\frac1\rho+\frac{1}{\gamma_0}>1$. Furthermore, $2\gamma_0=\frac{2\rho}{1-2\rho\eta}<q$. Choose $\gamma_1>\gamma_0$ such that still $2\gamma_1<q$ and $\eta<\frac{1}{2\rho}-\frac{1}{2\gamma_1}<\frac1\rho-\frac12$, hence also $\frac1\rho+\frac{1}{\gamma_1}>1$, hold. Set $\alpha:=\frac{1}{2\rho}-\frac{1}{2\gamma_1}-\eta>0$. From Theorem 2.4.1 we know that for every $r\ge1$ and $N\in\mathbb{N}$ there is a constant $c_1$ such that

$$\Big|\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)\Big|_{L^r}\le c_1\,r^{N/2}\sup_{0\le t\le1}\big|X^{(k)}_t-X_t\big|^{1-\frac{\rho}{\gamma_1}}_{L^2}\le c_2\,r^{N/2}\,\varepsilon_k^{\frac{1}{2\rho}-\frac{1}{2\gamma_1}}$$

holds for every $k\in\mathbb{N}$. Hence

$$\frac{\Big|\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)\Big|_{L^r}}{\varepsilon_k^\eta}\le c_2\,r^{N/2}\,\varepsilon_k^\alpha$$

for every $k\in\mathbb{N}$. From the Markov inequality, for any $\delta>0$,

$$\sum_{k=1}^\infty\mathbb{P}\left[\frac{\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)}{\varepsilon_k^\eta}\ge\delta\right]
\le\frac{1}{\delta^r}\sum_{k=1}^\infty\frac{\Big|\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)\Big|^r_{L^r}}{\varepsilon_k^{\eta r}}\le c_3\sum_{k=1}^\infty\varepsilon_k^{\alpha r}.$$

By assumption, we can choose $r$ large enough such that the series converges. With Borel-Cantelli we can conclude that

$$\frac{\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)}{\varepsilon_k^\eta}\to0$$

outside a null set $M$ for $k\to\infty$. We set

$$C_2:=\sup_{k\in\mathbb{N}}\frac{\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)}{\varepsilon_k^\eta}<\infty\quad\text{a.s.}$$

Since $C_2$ is the supremum of $\mathcal{F}$-measurable random variables, it is itself $\mathcal{F}$-measurable. Now set $N=[q]$, which turns $\rho_{q\text{-var}}$ into a rough path metric. Note that since $\theta>2\rho$, (2.7) and (2.8) indeed have unique solutions $Y$ and $Y^{(k)}$. We substitute the driver $\mathbf{X}$ by $S_N(\mathbf{X})$ resp. $X^{(k)}$ by $S_N(\mathbf{X}^{(k)})$ in the above equations, now considered as RDEs in the $q$-rough paths space. Since $\theta>q$, both (RDE-)equations again have unique solutions and it is clear that they coincide with $Y$ and $Y^{(k)}$. From

$$\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),1\big)\le\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)+\rho_{q\text{-var}}\big(S_N(\mathbf{X}),1\big)\le C_1+\rho_{q\text{-var}}\big(S_N(\mathbf{X}),1\big)$$

we see that for every $\omega\in\Omega\setminus M$ the $S_N\big(\mathbf{X}^{(k)}(\omega)\big)$ are uniformly bounded over all $k$ in the topology given by the metric $\rho_{q\text{-var}}$. Thus we can apply local Lipschitz-continuity of the Itô-Lyons map (see [FV10b, Theorem 10.26]) to see that there is a random variable $C_3$ such that

$$\big|Y^{(k)}-Y\big|_{q\text{-var};[0,1]}\le C_3\,\rho_{q\text{-var}}\big(S_N(\mathbf{X}^{(k)}),S_N(\mathbf{X})\big)\le C_3\,C_2\,\varepsilon_k^\eta$$

holds for every $k\in\mathbb{N}$ outside $M$. Finally,

$$\big|Y^{(k)}_t-Y_t\big|=\big|Y^{(k)}_{0,t}-Y_{0,t}\big|\le\big|Y^{(k)}-Y\big|_{q\text{-var};[0,t]}\le\big|Y^{(k)}-Y\big|_{q\text{-var};[0,1]}$$

is true for all $t\in[0,1]$ and the claim follows.


2.4.1 Mollifier approximations

Let $\phi$ be a mollifier function with support $[-1,1]$, i.e. $\phi\in C_0^\infty([-1,1])$ is positive and $|\phi|_{L^1}=1$. If $x\colon[0,1]\to\mathbb{R}$ is a continuous path, we denote by $\bar x\colon\mathbb{R}\to\mathbb{R}$ its continuous extension to the whole real line, i.e.

$$\bar x_u=\begin{cases}x_0&\text{for }u\in(-\infty,0],\\x_u&\text{for }u\in[0,1],\\x_1&\text{for }u\in[1,\infty).\end{cases}$$

For $\varepsilon>0$ set

$$\phi^\varepsilon(u):=\frac1\varepsilon\,\phi(u/\varepsilon)\qquad\text{and}\qquad x^\varepsilon_t:=\int_{\mathbb{R}}\phi^\varepsilon(t-u)\,\bar x_u\,du.$$

Let $(\varepsilon_k)_{k\in\mathbb{N}}$ be a sequence of real numbers such that $\varepsilon_k\to0$ for $k\to\infty$. Define

$$\Lambda_k(x):=x^{\varepsilon_k}.$$

In [FV10b], Chapter 15.2.3 it is shown that the sequence $(\Lambda_k)_{k\in\mathbb{N}}$ fulfils the conditions of Theorem 2.4.1.

Corollary 2.4.4. Let $X$ be as in Theorem 2.0.1 and assume that there is a constant $C$ such that $V_\rho\big(R_X;[s,t]^2\big)\le C\,|t-s|^{1/\rho}$ holds for all $s<t$. Choose $(\varepsilon_k)_{k\in\mathbb{N}}\in\bigcup_{r\ge1}\ell^r$ and set $X^{(k)}=X^{\varepsilon_k}$. Then the solutions $Y^{(k)}$ of the SDE (2.8) converge pathwise to the solution $Y$ of (2.7) in the sense of (2.9) with rate $O\big(\varepsilon_k^\eta\big)$ where $\eta$ is chosen as in Theorem 2.4.2.

Proof. It suffices to note that for every $\varepsilon>0$, $Z\in\{X^1,\dots,X^d\}$ and $t\in[0,1]$ we have

$$\begin{aligned}
\mathbb{E}\big[|Z^\varepsilon_t-Z_t|^2\big]
&=\mathbb{E}\left[\left(\int_{\mathbb{R}}\phi^\varepsilon(t-u)\big(\bar Z_u-Z_t\big)\,du\right)^2\right]
=\mathbb{E}\left[\left(\int_{[t-\varepsilon,t+\varepsilon]}\phi^\varepsilon(t-u)\big(\bar Z_u-Z_t\big)\,du\right)^2\right]\\
&=\int_{[t-\varepsilon,t+\varepsilon]^2}\phi^\varepsilon(t-u)\,\phi^\varepsilon(t-v)\,\mathbb{E}\big[\big(\bar Z_u-Z_t\big)\big(\bar Z_v-Z_t\big)\big]\,du\,dv\\
&\le\sup_{\substack{t\in[0,1]\\|h_1|,|h_2|\le\varepsilon}}\big|\mathbb{E}\big[\big(\bar Z_{t+h_1}-Z_t\big)\big(\bar Z_{t+h_2}-Z_t\big)\big]\big|
\le\sup_{\substack{t\in[0,1]\\|h|\le\varepsilon}}\mathbb{E}\Big[\big|\bar Z_{t+h}-Z_t\big|^2\Big]\le c_1\,\varepsilon^{1/\rho},
\end{aligned}$$

from which it follows that $\sup_{0\le t\le1}\big|X^{\varepsilon_k}_t-X_t\big|^2_{L^2}\le c_1\,\varepsilon_k^{1/\rho}$. We conclude with Theorem 2.4.2.
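Numerically, $x^\varepsilon$ is just a discrete convolution of the constantly extended path with the rescaled bump function. The following sketch is an illustrative assumption (the helper name `mollify` and the discretisation are not from the text; the standard bump $\phi(u)=c\,e^{-1/(1-u^2)}$ is used):

```python
import numpy as np

def mollify(x, eps, dt):
    """Discrete version of x^eps_t = int phi^eps(t-u) xbar_u du, where phi is the
    standard bump c*exp(-1/(1-u^2)) on [-1,1] and xbar the constant extension."""
    u = np.arange(-eps, eps + dt / 2, dt) / eps      # kernel grid, u in [-1, 1]
    inside = np.abs(u) < 1.0
    phi = np.zeros_like(u)
    phi[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    phi /= phi.sum() * dt                            # normalise: sum(phi) * dt = 1
    m = len(phi)
    pad_l = (m - 1) // 2
    pad_r = m - 1 - pad_l
    # constant extension xbar of the path beyond [0, 1]
    xbar = np.concatenate([np.full(pad_l, x[0]), x, np.full(pad_r, x[-1])])
    return np.convolve(xbar, phi, mode="valid") * dt
```

A constant path is reproduced exactly (up to rounding), and for a smooth path the sup-distance shrinks with $\varepsilon$, in line with the $L^2$ estimate of the proof.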

2.4.2 Piecewise linear approximations

If $D=\{0=t_0<t_1<\dots<t_{\#D-1}=1\}$ is a partition of $[0,1]$ and $x\colon[0,1]\to\mathbb{R}$ a continuous path, we denote by $x^D$ the piecewise linear approximation of $x$ at the points of $D$, i.e. $x^D$ coincides with $x$ at the points $t_i$ and for $t_i\le t<t_{i+1}$ we have

$$\frac{x^D_{t_{i+1}}-x^D_t}{t_{i+1}-t}=\frac{x_{t_{i+1}}-x_{t_i}}{t_{i+1}-t_i}.$$


Let $(D_k)_{k\in\mathbb{N}}$ be a sequence of partitions of $[0,1]$ such that $|D_k|:=\max_{t_i\in D_k}|t_{i+1}-t_i|\to0$ for $k\to\infty$. If $x\colon[0,1]\to\mathbb{R}$ is continuous, we define

$$\Lambda_k(x):=x^{D_k}.$$

In [FV10b, Chapter 15.2.3] it is shown that $(\Lambda_k)_{k\in\mathbb{N}}$ fulfils the conditions of Theorem 2.4.1. If $R_X$ is the covariance of a Gaussian process, we set

$$|D|_{R_X,\rho}:=\max_{t_i\in D}V_\rho\big(R_X;[t_i,t_{i+1}]^2\big)^\rho.$$
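The quantity $|D|_{R_X,\rho}$ is built from rectangular increments of the covariance over the squares $[t_i,t_{i+1}]^2$. As an illustrative assumption (the helper `grid_rho_var` is not from the text), one can approximate the $\rho$-variation sum on one fixed uniform grid; this is only a lower bound for the supremum over all partitions. For Brownian motion, $R(s,t)=s\wedge t$ and the increments vanish off the diagonal:

```python
import numpy as np

def grid_rho_var(R, lo, hi, m, rho):
    """rho-variation sum of the covariance R over the square [lo, hi]^2,
    evaluated on a fixed uniform m-by-m grid; returns the sum of
    |rectangular increment|^rho (a lower bound for V_rho(...)^rho)."""
    t = np.linspace(lo, hi, m + 1)
    s, u = np.meshgrid(t, t, indexing="ij")
    Rm = R(s, u)
    # rectangular increments R([t_i, t_{i+1}] x [t_j, t_{j+1}])
    inc = Rm[1:, 1:] - Rm[:-1, 1:] - Rm[1:, :-1] + Rm[:-1, :-1]
    return float(np.sum(np.abs(inc) ** rho))
```

For the Brownian covariance with $\rho=1$ this evaluates to the interval length $t-s$, matching the bound $V_\rho\big(R_X;[s,t]^2\big)\le c\,|t-s|^{1/\rho}$ in this case.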

Corollary 2.4.5. Let $X$ be as in Theorem 2.0.1. Choose a sequence of partitions $(D_k)_{k\in\mathbb{N}}$ of the interval $[0,1]$ such that $\big(|D_k|_{R_X,\rho}\big)_{k\in\mathbb{N}}\in\bigcup_{r\ge1}\ell^r$ and set $X^{(k)}=X^{D_k}$. Then the solutions $Y^{(k)}$ of the SDE (2.8) converge pathwise to the solution $Y$ of (2.7) in the sense of (2.9) with rate $O\big(\varepsilon_k^\eta\big)$, where $(\varepsilon_k)_{k\in\mathbb{N}}=\big(|D_k|_{R_X,\rho}\big)_{k\in\mathbb{N}}$ and $\eta$ is chosen as in Theorem 2.4.2.

Proof. Let $D$ be any partition of $[0,1]$ and $t\in[t_i,t_{i+1}]$ where $t_i,t_{i+1}\in D$. Take $Z\in\{X^1,\dots,X^d\}$. Then

$$Z^D_t-Z_t=\frac{t-t_i}{t_{i+1}-t_i}\,Z_{t_i,t_{i+1}}-Z_{t_i,t}.$$

Therefore

$$\big\|Z^D_t-Z_t\big\|_{L^2}\le\big\|Z_{t_i,t_{i+1}}\big\|_{L^2}+\big\|Z_{t_i,t}\big\|_{L^2}\le2\,V_\rho\big(R_X;[t_i,t_{i+1}]^2\big)^{1/2}\le2\,|D|^{\frac{1}{2\rho}}_{R_X,\rho}.$$

We conclude with Theorem 2.4.2.

Example 2.4.6. Let $X=B^H$ be fractional Brownian motion with Hurst parameter $H\in(1/4,1/2]$. Set $\rho=\frac{1}{2H}<2$. Then one can show that $R_X$ has finite $\rho$-variation and

$$V_\rho\big(R_X;[s,t]^2\big)\le c(H)\,|t-s|^{1/\rho}$$

for all $(s,t)\in\Delta$ (see [FV11], Example 1). Assume that the vector fields in (2.7) are sufficiently smooth, by which we mean that $\frac1\rho-\frac12\le\frac{1}{2\rho}-\frac1\theta$, i.e.

$$\theta\ge\frac{2\rho}{\rho-1}=\frac{1}{1/2-H}.$$

Let $(D_k)_{k\in\mathbb{N}}$ be the sequence of uniform partitions. By Corollary 2.4.5, for every $\eta<2H-\frac12$ there is a random variable $C$ such that

$$\big|Y^{(k)}-Y\big|_\infty\le C\Big(\frac1k\Big)^\eta\quad\text{a.s.},$$

hence we have a Wong-Zakai convergence rate arbitrarily close to $2H-\frac12$. In particular, for Brownian motion we obtain a rate close to $\frac12$, see also [GS06] and [FR11]. For $H\to\frac14$, the convergence rate tends to $0$, which reflects the fact that the Lévy area indeed diverges for $H=\frac14$, see [CQ02].

2.4.3 The simplified step-N Euler scheme

Consider again the SDE

$$dY_t=V(Y_t)\,d\mathbf{X}_t,\qquad Y_0\in\mathbb{R}^n,$$

interpreted as a pathwise RDE driven by the lift $\mathbf{X}$ of a Gaussian process $X$ which fulfils the conditions of Theorem 2.0.1. Let $D$ be a partition of $[0,1]$. We recall the simplified step-$N$ Euler scheme from the introduction:

$$\begin{aligned}
Y^{\mathrm{sEuler}^N;D}_0&=Y_0,\\
Y^{\mathrm{sEuler}^N;D}_{t_{j+1}}&=Y^{\mathrm{sEuler}^N;D}_{t_j}
+V_i\big(Y^{\mathrm{sEuler}^N;D}_{t_j}\big)X^i_{t_j,t_{j+1}}
+\frac12\,V_{i_1}V_{i_2}\big(Y^{\mathrm{sEuler}^N;D}_{t_j}\big)X^{i_1}_{t_j,t_{j+1}}X^{i_2}_{t_j,t_{j+1}}\\
&\quad+\dots+\frac{1}{N!}\,V_{i_1}\cdots V_{i_{N-1}}V_{i_N}\big(Y^{\mathrm{sEuler}^N;D}_{t_j}\big)X^{i_1}_{t_j,t_{j+1}}\cdots X^{i_N}_{t_j,t_{j+1}},
\end{aligned}$$

where $t_j\in D$. In this section, we will investigate the convergence rate of this scheme. For simplicity, we will assume that

$$V_\rho\big(R_X;[s,t]^2\big)=O\big(|t-s|^{1/\rho}\big),$$

which can always be achieved at the price of a deterministic time-change based on

$$[0,1]\ni t\mapsto\frac{V_\rho\big(R_X;[0,t]^2\big)^\rho}{V_\rho\big(R_X;[0,1]^2\big)^\rho}\in[0,1].$$

Set $D_k=\big\{\tfrac ik:i=0,\dots,k\big\}$.
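As a concrete illustration for $N=2$ (the simplified Milstein scheme), the following sketch implements the recursion; the names `simplified_milstein`, `V` and `DV` are illustrative assumptions, and the composed vector field $V_{i_1}V_{i_2}$ is realised as the directional derivative $DV_{i_2}\cdot V_{i_1}$:

```python
import numpy as np

def simplified_milstein(V, DV, y0, x, t):
    """Simplified step-2 Euler (simplified Milstein) scheme: the iterated
    integral over [t_j, t_{j+1}] is replaced by the product of increments
    X^{i1} X^{i2} / 2.  V(y) returns an (n, d) matrix of vector fields and
    DV(y) an (n, n, d) array of Jacobians, so V_{i1}V_{i2}(y) = DV[:,:,i2] @ V[:,i1]."""
    y = np.asarray(y0, dtype=float).copy()
    out = [y.copy()]
    for j in range(len(t) - 1):
        dx = x[j + 1] - x[j]                 # increments X^i_{t_j, t_{j+1}}
        Vy, DVy = V(y), DV(y)
        incr = Vy @ dx                       # first-order term V_i(y) dx^i
        for i1 in range(len(dx)):            # second-order term, weight 1/2
            for i2 in range(len(dx)):
                incr = incr + 0.5 * (DVy[:, :, i2] @ Vy[:, i1]) * dx[i1] * dx[i2]
        y = y + incr
        out.append(y.copy())
    return np.array(out)
```

For the scalar linear test case $dY=Y\,dx$ with the smooth driver $x_t=t$, the per-step factor is $1+h+h^2/2$, so the scheme approximates $Y_1=e$ as the mesh is refined.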

Corollary 2.4.7. Let $p>2\rho$ and assume that $|V|_{\mathrm{Lip}^\theta}<\infty$ for $\theta>p$. Choose $\eta$ and $N$ such that

$$\eta<\min\Big(\frac1\rho-\frac12,\;\frac{1}{2\rho}-\frac1\theta\Big)\qquad\text{and}\qquad N\le[\theta].$$

Then there are random variables $C_1$ and $C_2$ such that

$$\max_{t_j\in D_k}\Big|Y_{t_j}-Y^{\mathrm{sEuler}^N;D_k}_{t_j}\Big|\le C_1\Big(\frac1k\Big)^\eta+C_2\Big(\frac1k\Big)^{\frac{N+1}{p}-1}\quad\text{a.s. for all }k\in\mathbb{N}.$$

Proof. Recall the step-$N$ Euler scheme from the introduction (or cf. [FV10b, Chapter 10]). Set $X^{(k)}=X^{D_k}$ and let $Y^{(k)}$ be the solution of the SDE (2.8). Then $Y^{\mathrm{sEuler}^N;D_k}_{t_j}=Y^{(k),\mathrm{Euler}^N;D_k}_{t_j}$ for every $t_j\in D_k$ and therefore, using the triangle inequality,

$$\max_{t_j\in D_k}\Big|Y_{t_j}-Y^{\mathrm{sEuler}^N;D_k}_{t_j}\Big|\le\sup_{t\in[0,1]}\big|Y_t-Y^{(k)}_t\big|+\max_{t_j\in D_k}\Big|Y^{(k)}_{t_j}-Y^{(k),\mathrm{Euler}^N;D_k}_{t_j}\Big|.$$

By the choice of $D_k$ we have $|D_k|_{R_X,\rho}=O\big(k^{-1}\big)$. Applying Corollary 2.4.5, we obtain $\big|Y-Y^{(k)}\big|_\infty=O\big(k^{-\eta}\big)$ for the first term. Referring to [FV10b, Theorem 10.30], we see that the second term is of order $O\big(k^{-(\frac{N+1}{p}-1)}\big)$.

Remark 2.4.8. Assume that the vector fields are sufficiently smooth, i.e. $\theta\ge\frac{2\rho}{\rho-1}$. Then we obtain an error of $O\big(k^{-(2/p-1/2)}\big)+O\big(k^{-(\frac{N+1}{p}-1)}\big)$ for any $p>2\rho$. That means that in the case $\rho=1$, the step-2 scheme (i.e. the simplified Milstein scheme) gives an optimal convergence rate of (almost) $1/2$. For $\rho\in(1,2)$, the step-3 scheme gives an optimal rate of (almost) $1/\rho-1/2$. In particular, we see that using higher-order schemes does not improve the convergence rate, since in that case the Wong-Zakai error persists. In the fractional Brownian motion case, the simplified Milstein scheme gives an optimal convergence rate of (almost) $1/2$ for Brownian motion, and for $H\in(1/4,1/2)$ the step-3 scheme gives an optimal rate of (almost) $2H-1/2$. This answers a conjecture stated in [DNT12].

3 Integrability of (non-)linear rough differential equations and integrals

Integrability properties of linear rough differential equations (RDEs), and related topics, driven by a Brownian and then a Gaussian rough path (GRP), a random rough path $\mathbf{X}=\mathbf{X}(\omega)$, have been a serious difficulty in a variety of recent applications of rough path theory. To wit, for solutions of linear RDEs one has the typical - and as such sharp - estimate $O\big(\exp\big((\mathrm{const})\times\omega_{\mathbf{x}}(0,T)\big)\big)$ where $\omega_{\mathbf{x}}(0,T)=\|\mathbf{x}\|^p_{p\text{-var};[0,T]}$ denotes some (homogeneous) $p$-variation norm (raised to power $p$). In a Gaussian rough path setting, $\|\mathbf{X}\|_{p\text{-var};[0,T]}$ enjoys Gaussian integrability, but as soon as $p>2$ (the "interesting" case, which covers Brownian and rougher situations) one has lost all control over moments of such (random) linear RDE solutions. In a recent work, Cass, Litterer and Lyons [CLL] have overcome a similar problem, the integrability of the Jacobian of a Gaussian RDE flow, as needed in non-Markovian Hörmander theory [HP]. With these (and some other, cf. below) problems in mind, we revisit the work of Cass, Litterer and Lyons and propose (what we believe to be) a particularly user-friendly formulation. We avoid the concept of "localized $p$-variation", as introduced in [CLL], and work throughout with a quantity called $N_{[0,T]}(\mathbf{x})$. As it turns out, in many (deterministic) rough path estimates, as obtained in [FV10b] for instance, one may replace $\omega_{\mathbf{x}}(0,T)$ by $N_{[0,T]}(\mathbf{x})$. Doing so does not require revisiting the (technical) proofs of these rough path estimates, but rather applying the existing estimates repeatedly on the intervals of a carefully chosen partition of $[0,T]$. The point is that $N_{[0,T]}(\mathbf{X})$ enjoys much better integrability than $\omega_{\mathbf{X}}(0,T)$. Of course, this does not rule out that for some rough paths $\mathbf{x}$ one has $N_{[0,T]}(\mathbf{x})\approx\omega_{\mathbf{x}}(0,T)$, in agreement with the essentially optimal nature of existing rough path estimates in terms of $\omega_{\mathbf{x}}(0,T)$. For instance, both quantities will scale like $\lambda$ when $p=2$ and $\mathbf{x}$ is the pure-area rough path, dilated by $\lambda\gg1$.
Differently put, the point is that $N_{[0,T]}(\mathbf{X}(\omega))$ will be smaller than $\omega_{\mathbf{X}(\omega)}(0,T)$ for most realizations of $\mathbf{X}(\omega)$. The consequent focus on $N_{[0,T]}$ rather than "localized $p$-variation" aside, let us briefly list our contributions relative to [CLL].

(i) A technical condition “p > q [p]” is removed; this shows that the Cass, Litterer, Lyons results are valid assuming only “complementary Young regularity of the Cameron-Martin space”, i.e. H,→ Cq−var where 1/p + 1/q > 1 and sample paths have finite p-variation, a natural condition, in particular in the context of Malliavin calculus, whose importance was confirmed in a number of papers, [FV10a], [FO10], [CFV09], [CF10], see also [FV10b].

(ii) Their technical main result, Weibull tails of $N_{[0,T]}(\mathbf X)$ with shape parameter $2/q$, where $\mathbf X$ is a Gaussian rough path, remains valid for general rough paths obtained as the image of $\mathbf X$ under locally linear maps on (rough) path space. (These random rough paths may be far from Gaussian: examples of locally linear maps are given by rough integration and (solving) rough differential equations.)

(iii) The arguments are adapted to deal with (random) linear (and also linear-growth) rough differential equations (the solution maps here are not locally linear!) driven by $\mathbf X(\omega)$. As above, it suffices that $\mathbf X$ is the locally linear image of a Gaussian rough path.

We conclude with two applications. First, we show how to recover log-Weibull tails for $|J|$, the Jacobian of a Gaussian RDE flow. (The afore-mentioned extended validity and some minor sharpening of the norm of $|J|$ aside, this was the main result of [CLL].) Our point here is that [CLL] use somewhat involved (known) explicit estimates for $J$ in terms of the Gaussian driving signal. In contrast, our "user-friendly" formulation allows for a simple step-by-step approach: recall that $J$ solves $dJ = J\,d\mathbf M$ where $\mathbf M$ is a non-Gaussian driving rough path, obtained by solving an RDE / performing a rough integration. Since $\mathbf M$ is the locally linear image of a Gaussian rough path, we can immediately appeal to (iii). As was pointed out recently by [HP], such estimates are, in combination with a Norris lemma for rough paths, the key to a non-Markovian Hörmander and then ergodic theory. Secondly, as a novel application, we consider (random) rough integrals of the form $\int G(\mathbf X)\,d\mathbf X$, with $G \in \mathrm{Lip}^{\gamma-1}$, $\gamma > p$, and establish Weibull tails with shape parameter $2/q$, uniform over classes of Gaussian processes whose covariance satisfies a uniform variational estimate. A special case arises when $X$ is taken, independently in each component, as solution to the stochastic heat equation on the 1D torus, $\dot u = u_{xx} + \dot W$ with hyper-viscosity term $-\varepsilon^2 u_{xxxx}$, as a function of the space variable, for fixed time. Complementary Young regularity is seen to hold with $p > 2$ and $q = 1$, and we so obtain (and in fact improve from exponential to Gaussian integrability) the uniform-in-$\varepsilon$ integrability estimate [Hai11, Theorem 5.1], a somewhat central technical result whose proof encompasses almost a third of that paper.

3.1 Basic definitions

Definition 3.1.1. Let $\omega$ be a control. For $\alpha > 0$ and $[s,t] \subset [0,T]$ we set
$$\tau_0(\alpha) = s,$$
$$\tau_{i+1}(\alpha) = \inf\left\{u : \omega(\tau_i(\alpha), u) \ge \alpha,\ \tau_i(\alpha) < u \le t\right\} \wedge t$$
and define
$$N_{\alpha,[s,t]}(\omega) = \sup\left\{n \in \mathbb{N} \cup \{0\} : \tau_n(\alpha) < t\right\}.$$

When $\omega$ arises from a (homogeneous) $p$-variation norm of a ($p$-rough) path, such as $\omega_x = \|x\|^p_{p\text{-var};[\cdot,\cdot]}$ or $\bar\omega_x := |||x|||^p_{p\text{-var};[\cdot,\cdot]}$ (detailed definitions are given later in the text), we shall also write $N_{\alpha,[s,t]}(x) := N_{\alpha,[s,t]}(\omega_x)$ and $\bar N_{\alpha,[s,t]}(x) := N_{\alpha,[s,t]}(\bar\omega_x)$. In fact, we will be in a situation where $C^{-1}\bar\omega_x \le \omega_x \le C\bar\omega_x$ for some constant $C$, which entails (cf. Lemma 3.1.3 below)
$$\bar N_{\alpha C,[\cdot,\cdot]}(x) \le N_{\alpha,[\cdot,\cdot]}(x) \le \bar N_{\alpha/C,[\cdot,\cdot]}(x).$$

Furthermore, the precise value of $\alpha > 0$ will not matter (cf. Lemma 3.1.4 below), so that a factor $C$ or $1/C$ is indeed inconsequential; effectively, this means that one can switch between $N$ and $\bar N$ as one pleases. We now study the scaling of $N_\alpha$. Note that $N_{\alpha,[s,t]}(\omega) \searrow 0$ as $\alpha \nearrow \infty$.

Lemma 3.1.2. Let $\omega$ be a control and $\lambda > 0$. Then $(s,t) \mapsto \lambda\omega(s,t)$ is again a control and for all $s < t$,
$$N_{\alpha,[s,t]}(\lambda\omega) = N_{\alpha/\lambda,[s,t]}(\omega).$$

Proof. Follows directly from the definition.
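The greedy stopping times of Definition 3.1.1 are straightforward to compute for a concrete control. The following is a minimal numerical sketch; the function names and the toy control $\omega(s,t) = t - s$, evaluated on an integer grid to avoid floating-point hitting issues, are our own illustrative choices, not part of the text.

```python
def N_alpha(omega, s, t, alpha, grid):
    """N_{alpha,[s,t]}(omega) of Definition 3.1.1: count the greedy stopping
    times tau_n(alpha) that land strictly before t.  The infimum over u is
    taken over a finite grid of candidate times."""
    count, prev = 0, s
    while prev < t:
        # tau_{i+1}: first grid point u > prev with omega(prev, u) >= alpha, else t
        nxt = next((u for u in grid if prev < u <= t and omega(prev, u) >= alpha), t)
        if nxt >= t:
            break
        count, prev = count + 1, nxt
    return count

# toy control omega(s,t) = t - s: the greedy times are s + alpha, s + 2*alpha, ...
omega = lambda s, t: t - s
```

For $\omega(s,t) = t-s$ on $[0,100]$ with $\alpha = 30$ the greedy times are $30, 60, 90$, giving $N = 3$; the scaling identity of Lemma 3.1.2, $N_{\alpha}(\lambda\omega) = N_{\alpha/\lambda}(\omega)$, can be checked in the same way.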


Lemma 3.1.3. Let $\omega_1, \omega_2$ be two controls, $s < t$ and $\alpha > 0$. Assume that $\omega_1(u,v) \le C\omega_2(u,v)$ holds whenever $\omega_2(u,v) \le \alpha$, for a constant $C$. Then $N_{C\alpha,[s,t]}(\omega_1) \le N_{\alpha,[s,t]}(\omega_2)$.

Proof. It suffices to consider the case $C = 1$; the general case follows by the scaling of $N$ (Lemma 3.1.2). Set
$$\tau^j_0(\alpha) = s, \qquad \tau^j_{i+1}(\alpha) = \inf\left\{u : \omega_j\left(\tau^j_i, u\right) \ge \alpha,\ \tau^j_i(\alpha) < u \le t\right\} \wedge t$$
for $j = 1, 2$. It suffices to show that $\tau^2_i \le \tau^1_i$ holds for every $i \in \mathbb{N}$, which we prove by induction over $i$. For $i = 0$ this is clear. If $\tau^2_i \le \tau^1_i$ for some fixed $i$, then, using monotonicity of controls,
$$\omega_1\left(\tau^1_i, u\right) \le \omega_2\left(\tau^1_i, u\right) \le \omega_2\left(\tau^2_i, u\right)$$
whenever $\omega_2\left(\tau^2_i, u\right) \le \alpha$. Hence
$$\inf\left\{u : \omega_2\left(\tau^2_i, u\right) \ge \alpha\right\} \le \inf\left\{u : \omega_1\left(\tau^1_i, u\right) \ge \alpha\right\}$$
and therefore $\tau^2_{i+1} \le \tau^1_{i+1}$.

Lemma 3.1.4. Let $\omega$ be a control and $0 < \alpha \le \beta$. Then
$$N_{\alpha,[s,t]}(\omega) \le \frac{\beta}{\alpha}\left(2N_{\beta,[s,t]}(\omega) + 1\right).$$

Proof. Set
$$\omega_\alpha(s,t) := \sup_{\substack{(t_i) = D \subset [s,t] \\ \omega(t_i,t_{i+1}) \le \alpha}} \sum_i \omega(t_i, t_{i+1}).$$
We clearly have $\omega_\alpha(s,t) \le \omega_\beta(s,t)$ and
$$\alpha\, N_{\alpha,[s,t]}(\omega) = \sum_{i=0}^{N_{\alpha,[s,t]}(\omega)-1} \omega(\tau_i(\alpha), \tau_{i+1}(\alpha)) \le \omega_\alpha(s,t).$$
Finally, Proposition 4.6 in [CLL] shows that $\omega_\beta(s,t) \le \left(2N_{\beta,[s,t]}(\omega) + 1\right)\beta$. (Strictly speaking, Proposition 4.6 is formulated for a particular control $\omega$, namely the control induced by the $p$-variation of a rough path. However, the proof only uses general properties of control functions and the conclusion remains valid.)
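For the toy control $\omega(s,t) = t - s$ the quantity $N_{\alpha,[0,L]}$ has the closed form $\lceil L/\alpha\rceil - 1$ (the greedy times are $\alpha, 2\alpha, \ldots$), which makes the comparison of Lemma 3.1.4 easy to check numerically. A minimal sketch under that assumption; `N_toy` and `lemma_3_1_4_holds` are our own illustrative names.

```python
import math

def N_toy(L, alpha):
    # N_{alpha,[0,L]} for the control omega(s,t) = t - s: the greedy times are
    # tau_i = i*alpha, and N counts those with tau_i strictly less than L
    return math.ceil(L / alpha) - 1

# Lemma 3.1.4: N_alpha <= (beta/alpha) * (2*N_beta + 1) whenever alpha <= beta
def lemma_3_1_4_holds(L, alpha, beta):
    return N_toy(L, alpha) <= (beta / alpha) * (2 * N_toy(L, beta) + 1)
```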

Let $x\colon [0,T] \to G^N(\mathbb{R}^d)$ be a path. In the whole section, $\|\cdot\|_{p\text{-var}}$ denotes the $p$-variation norm for such paths induced by the Carnot–Carathéodory metric; cf. [FV10b]. Set $\omega_x(s,t) = \|x\|^p_{p\text{-var};[s,t]}$ and $N_{\alpha,[s,t]}(x) = N_{\alpha,[s,t]}(\omega_x)$ (the fact that $\omega_x$ is indeed a control is well known; cf. [FV10b]).

Lemma 3.1.5. For any $\alpha > 0$,
$$\|x\|_{p\text{-var};[s,t]} \le \alpha^{1/p}\left(N_{\alpha,[s,t]}(x) + 1\right).$$

Proof. Let $u = u_0 < u_1 < \ldots < u_m = v$. Note that
$$\|x_{u,v}\|^p = \left\|x_{u,u_1} \otimes x_{u_1,u_2} \otimes \cdots \otimes x_{u_{m-1},v}\right\|^p \le m^{p-1}\sum_{i=0}^{m-1}\left\|x_{u_i,u_{i+1}}\right\|^p.$$
Let $D$ be a dissection of $[s,t]$ and $(\tau_j)_{j=0}^{N_{\alpha,[s,t]}(x)} = (\tau_j(\alpha))_{j=0}^{N_{\alpha,[s,t]}(x)}$. Set $\bar D = D \cup (\tau_j)_{j=0}^{N_{\alpha,[s,t]}(x)}$. Then
$$\sum_{t_i \in D}\left\|x_{t_i,t_{i+1}}\right\|^p \le \left(N_{\alpha,[s,t]}(x) + 1\right)^{p-1}\sum_{\bar t_i \in \bar D}\left\|x_{\bar t_i,\bar t_{i+1}}\right\|^p \le \left(N_{\alpha,[s,t]}(x) + 1\right)^{p-1}\sum_{j=0}^{N_{\alpha,[s,t]}(x)}\|x\|^p_{p\text{-var};[\tau_j,\tau_{j+1}]} \le \left(N_{\alpha,[s,t]}(x) + 1\right)^p\alpha.$$
Taking the supremum over all dissections shows the claim.
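For a discrete path, the control $\omega_x(s,t) = \|x\|^p_{p\text{-var};[s,t]}$ can be computed by dynamic programming over dissections, and the bound of Lemma 3.1.5 checked directly. A word of caution for this sketch: on a discrete grid the greedy time may overshoot the level $\alpha$, so the bound is only guaranteed when increments are small relative to $\alpha$, as in the example below. The function names are ours; we use scalar paths with the Euclidean distance.

```python
def pvar_p(x, p, i0, i1):
    # omega_x(i0, i1) = ||x||_{p-var;[i0,i1]}^p for a discrete scalar path x,
    # via dynamic programming over all dissections of the index set {i0,...,i1}
    n = i1 - i0
    V = [0.0] * (n + 1)
    for j in range(1, n + 1):
        V[j] = max(V[i] + abs(x[i0 + j] - x[i0 + i]) ** p for i in range(j))
    return V[n]

def N_greedy(x, p, alpha, i0, i1):
    # N_{alpha,[i0,i1]}(omega_x) of Definition 3.1.1, on the index grid
    count, prev = 0, i0
    while prev < i1:
        nxt = next((u for u in range(prev + 1, i1 + 1)
                    if pvar_p(x, p, prev, u) >= alpha), i1)
        if nxt == i1:
            break
        count, prev = count + 1, nxt
    return count
```

For the zigzag path $x = (0,1,0,1,0,1,0)$ and $p = 2$ one finds $\omega_x(0,6) = 6$ (each unit increment taken separately), $N_{2.5,[0,6]}(x) = 1$, and indeed $\|x\|_{2\text{-var}} = \sqrt6 \le \sqrt{2.5}\,(1+1)$.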

3.2 Cass, Litterer and Lyons revisited

The basic object is a continuous $d$-dimensional Gaussian process, say $X$, realized as coordinate process on the (not-too-abstract) Wiener space $(E, \mathcal H, \mu)$, where $E = C([0,T], \mathbb{R}^d)$ is equipped with a Gaussian measure $\mu$ such that $X$ has zero mean and independent components, and such that $V_{\rho\text{-var}}\left(R, [0,T]^2\right)$, the $\rho$-variation in the 2D sense of the covariance $R$ of $X$, is finite for some $\rho \in [1,2)$. From [FV10b, Theorem 15.33] it follows that we can lift the sample paths of $X$ to $p$-rough paths for any $p > 2\rho$, and we denote this process by $\mathbf X$, called the enhanced Gaussian process. We also assume that the Cameron–Martin space $\mathcal H$ has complementary Young regularity, in the sense that $\mathcal H$ embeds continuously in $C^{q\text{-var}}\left([0,T], \mathbb{R}^d\right)$ with $\frac1p + \frac1q > 1$. Note $q \le p$, since $\mu$ is supported on the paths of finite $p$-variation. There are many examples of such a situation [FV10b]; let us just note that fractional Brownian motion (fBM) with Hurst parameter $H > 1/4$ falls in this class of Gaussian rough paths. In this section, we present, in a self-contained fashion, the results of [CLL]. In fact, we present a slightly modified argument which avoids the technical condition "$p > q[p]$" made in [CLL, Theorem 6.2, condition (3)] (which still applies to fBM with $H > 1/4$ but causes some discontinuities in the resulting estimates when $H$ crosses the barrier $1/3$). Our argument also gives a unified treatment for all $p$, thereby clarifying the structure of the proof (in [CLL, Theorem 6.2] the cases $[p] = 2, 3$ are treated separately "by hand"). That said, we clearly follow [CLL] in their ingenious use of Borell's inequality. In the whole section, if not stated otherwise, for a $p$-rough path $\mathbf x$, set

$$|||\mathbf x|||_{p\text{-var};[s,t]} := \left(\sum_{k=1}^{[p]}\left\|\mathbf x^{(k)}\right\|^{p/k}_{p/k\text{-var};[s,t]}\right)^{1/p}.$$

Then $|||\cdot|||_{p\text{-var}}$ is a homogeneous rough path norm. Recall that, as a consequence of Theorem 7.44 in [FV10b], the norms $|||\cdot|||_{p\text{-var}}$ and $\|\cdot\|_{p\text{-var}}$ are equivalent; hence there is a constant $C$ such that
$$\frac{1}{C}\,|||\cdot|||_{p\text{-var}} \le \|\cdot\|_{p\text{-var}} \le C\,|||\cdot|||_{p\text{-var}}. \qquad (3.1)$$
The map $(s,t) \mapsto \bar\omega_{\mathbf x}(s,t) = |||\mathbf x|||^p_{p\text{-var};[s,t]}$ is a control and we set $\bar N_{\alpha,[s,t]}(\mathbf x) = N_{\alpha,[s,t]}(\bar\omega_{\mathbf x})$.

Lemma 3.2.1. Assume that $\mathcal H$ has complementary Young regularity to $X$. Then for any $a > 0$, the set
$$A_a = \left\{|||\mathbf X|||_{p\text{-var};[0,T]} < a\right\}$$
has positive $\mu$-measure. Moreover, if $M \ge V_{\rho\text{-var}}\left(R; [0,T]^2\right)$, we have the lower bound
$$\mu\left\{|||\mathbf X|||_{p\text{-var};[0,T]} < a\right\} \ge 1 - \frac{C}{\exp(a)}$$

where $C$ is a constant only depending on $\rho$, $p$ and $M$.

Proof. The support theorem for Gaussian rough paths ([FV10b, Theorem 15.60]) shows that
$$\operatorname{supp}\left[\mathbf X_*\mu\right] = \overline{S_{[p]}(\mathcal H)}$$
holds for $p \in (2\rho, 4)$. Hence every neighbourhood of the zero path has positive measure, which gives the first statement. The general case follows from the a.s. estimate
$$|||S_{[p']}(\mathbf X)|||_{p'\text{-var}} \le |||S_{[p']}(\mathbf X)|||_{p\text{-var}} \le C_{p,p'}\,|||\mathbf X|||_{p\text{-var}} \qquad (3.2)$$
which holds for every $p \le p'$, cf. [FV10b, Theorem 9.5]. For the lower bound, recall that from [FV10b, Theorem 15.33] one can deduce that
$$\mathbb E\left[\exp\left(|||\mathbf X|||_{p\text{-var};[0,T]}\right)\right] \le C \qquad (3.3)$$
for $p \in (2\rho, 4)$, where $C$ only depends on $\rho$, $p$ and $M$. Using (3.2) shows that this actually holds for every $p > 2\rho$. Finally, by Chebyshev's inequality,
$$\mu\left\{|||\mathbf X|||_{p\text{-var};[0,T]} < a\right\} \ge 1 - \frac{C}{\exp(a)}.$$

In the next theorem we cite the famous isoperimetric inequality due to C. Borell (for a proof cf. [Led96, Theorem 4.3]).

Theorem 3.2.2 (Borell). Let $(E, \mathcal H, \mu)$ be an abstract Wiener space and let $\mathcal K$ denote the unit ball in $\mathcal H$. If $A \subset E$ is a Borel set with positive measure, then for every $r \ge 0$
$$\mu(A + r\mathcal K) \ge \Phi\left(\Phi^{-1}(\mu(A)) + r\right)$$
where $\Phi$ is the cumulative distribution function of a standard normal random variable, i.e. $\Phi(\cdot) = (2\pi)^{-1/2}\int_{-\infty}^{\cdot}\exp\left(-x^2/2\right)dx$.

Corollary 3.2.3. Let $f, g\colon E \to [0,\infty]$ be measurable maps and $a, \sigma > 0$ such that
$$A_a := \{x : f(x) \le a\}$$
has positive measure, and let $\hat a \le \Phi^{-1}(\mu(A_a))$. Assume furthermore that there exists a null set $N$ such that for all $x \in N^c$ and $h \in \mathcal H$:
$$f(x - h) \le a \;\Longrightarrow\; \sigma\,\|h\|_{\mathcal H} \ge g(x).$$
Then $g$ has a Gaussian tail; more precisely, for all $r > 0$,
$$\mu(\{x : g(x) > r\}) \le \exp\left(-\frac{\left(\hat a + \frac{r}{\sigma}\right)^2}{2}\right).$$

Proof. W.l.o.g. $\sigma = 1$. Then
$$\{x : g(x) \le r\} = \bigcup_{h \in r\mathcal K}\{x : \|h\|_{\mathcal H} \ge g(x)\} \supset \bigcup_{h \in r\mathcal K}\{x : f(x - h) \le a\} = \bigcup_{h \in r\mathcal K}\{x + h : f(x) \le a\} = A_a + r\mathcal K.$$
By Theorem 3.2.2,
$$\mu(\{x : g(x) > r\}) \le \mu\left(\left(A_a + r\mathcal K\right)^c\right) \le \bar\Phi(\hat a + r)$$
where $\bar\Phi = 1 - \Phi$. The claim follows from the standard estimate $\bar\Phi(r) \le \exp\left(-r^2/2\right)$.
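The standard Gaussian tail estimate invoked at the end of the proof, $\bar\Phi(r) \le \exp(-r^2/2)$ for $r \ge 0$, is easy to confirm numerically via the complementary error function. A quick sketch; the sample points are arbitrary choices of ours.

```python
import math

def gauss_tail(r):
    # bar-Phi(r) = 1 - Phi(r) for a standard normal, expressed through erfc:
    # 1 - Phi(r) = erfc(r / sqrt(2)) / 2
    return 0.5 * math.erfc(r / math.sqrt(2.0))
```

For instance, $\bar\Phi(2) \approx 0.0228 \le e^{-2} \approx 0.135$.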


Proposition 3.2.4. Let $X$ be a continuous $d$-dimensional Gaussian process, realized as coordinate process on $(E, \mathcal H, \mu)$, where $E = C([0,T], \mathbb{R}^d)$ is equipped with a Gaussian measure $\mu$ such that $X$ has zero mean and independent components, and such that the covariance $R$ of $X$ has finite $\rho$-variation for some $\rho \in [1,2)$. Let $\mathbf X$ be its enhanced Gaussian process with sample paths in a $p$-rough path space, $p > 2\rho$. Assume $\mathcal H$ has complementary Young regularity, so that Cameron–Martin paths enjoy finite $q$-variation regularity, $q \le p$ and $\frac1p + \frac1q > 1$. Then there exists a set $\tilde E \subset E$ of full measure with the following property: if
$$|||\mathbf X(\omega - h)|||_{p\text{-var};[0,T]} \le \alpha^{1/p} \qquad (3.4)$$
for some $\omega \in \tilde E$, $h \in \mathcal H$ and $\alpha > 0$, then
$$C\,|h|_{q\text{-var};[0,T]} \ge \alpha^{1/p}\,\bar N_{\beta,[0,T]}(\mathbf X(\omega))^{1/q}$$
where $\beta = 2^p[p]\,\alpha$ and $C$ depends only on $p$ and $q$.

Proof. Set
$$\tilde E = \left\{\omega : T_h(\mathbf X(\omega)) = \mathbf X(\omega + h)\ \text{for all}\ h \in \mathcal H\right\}.$$
From [FV10b, Lemma 15.58] we know that $\tilde E$ has full measure. Define the random partition $(\tau_i)_{i=0}^\infty = (\tau_i(\beta))_{i=0}^\infty$ for the control $\bar\omega_{\mathbf X}$. Let $h \in \mathcal H$ and assume that (3.4) holds. We claim that there is a constant $C_{p,q}$ such that
$$C_{p,q}\,|h|_{q\text{-var};[\tau_i,\tau_{i+1}]} \ge \alpha^{1/p}\quad\text{for all } i = 0,\ldots,\bar N_{\beta,[0,T]}(\mathbf X) - 1. \qquad (3.5)$$
The statement then follows from
$$C^q_{p,q}\,|h|^q_{q\text{-var};[0,T]} \ge C^q_{p,q}\sum_{i=0}^{\bar N_{\beta,[0,T]}(\mathbf X)-1}|h|^q_{q\text{-var};[\tau_i,\tau_{i+1}]} \ge \alpha^{q/p}\,\bar N_{\beta,[0,T]}(\mathbf X).$$
To show (3.5), we first notice that for every $i = 0,\ldots,\bar N_{\beta,[0,T]}(\mathbf X) - 1$,
$$\beta = |||\mathbf X(\omega)|||^p_{p\text{-var};[\tau_i,\tau_{i+1}]} = \sum_{k=1}^{[p]}\left\|\mathbf X^{(k)}(\omega)\right\|^{p/k}_{p/k\text{-var};[\tau_i,\tau_{i+1}]}.$$
Fix $i$. Then there is a $k \in \{1,\ldots,[p]\}$ such that $\left\|\mathbf X^{(k)}(\omega)\right\|^{p/k}_{p/k\text{-var};[\tau_i,\tau_{i+1}]} \ge \frac{\beta}{[p]}$. Let $D = (t_j)_{j=0}^M$ be any dissection of $[\tau_i,\tau_{i+1}]$. We define the vector
$$\mathbf X^{(k)}(\omega) := \left(\mathbf X^{(k)}_{t_0,t_1}(\omega),\ldots,\mathbf X^{(k)}_{t_{M-1},t_M}(\omega)\right)$$
and do the same for $\mathbf X^{(k)}(\omega - h)$ and for the mixed iterated integrals
$$\int_{\Delta^k}dZ^{i_1}\otimes\cdots\otimes dZ^{i_k}\quad\text{where}\quad Z^i = \begin{cases} X & \text{if } i = 0, \\ h & \text{if } i = 1.\end{cases}$$
We then have
$$\mathbf X^{(k)}(\omega - h) = \sum_{(i_1,\ldots,i_k)\in\{0,1\}^k}(-1)^{i_1+\cdots+i_k}\int_{\Delta^k}dZ^{i_1}\otimes\cdots\otimes dZ^{i_k}$$
and by the triangle inequality,
$$\left\|\int_{\Delta^k}dh\otimes\cdots\otimes dh\right\|_{l^{p/k}} \ge \left\|\mathbf X^{(k)}(\omega)\right\|_{l^{p/k}} - \left(\left\|\mathbf X^{(k)}(\omega - h)\right\|_{l^{p/k}} + \sum_{\substack{(i_1,\ldots,i_k)\in\{0,1\}^k\\ 0 < i_1+\cdots+i_k < k}}\left\|\int_{\Delta^k}dZ^{i_1}\otimes\cdots\otimes dZ^{i_k}\right\|_{l^{p/k}}\right). \qquad (3.6)$$


Since $q < 2$ and $p \ge q$, we can use Young's inequality and super-additivity of $|h|^q_{q\text{-var}}$ to see that
$$\left\|\int_{\Delta^k}dh\otimes\cdots\otimes dh\right\|^{p/k}_{l^{p/k}} \le c^{p/k}_{q,k}\sum_j|h|^p_{q\text{-var};[t_j,t_{j+1}]} \le c^{p/k}_{q,k}\,|h|^p_{q\text{-var};[\tau_i,\tau_{i+1}]}.$$
For the mixed integrals one has, for any $u < v$,
$$\left|\int_{\Delta^k_{u,v}}dZ^{i_1}\otimes\cdots\otimes dZ^{i_k}\right| \le c_{k,l,p,q}\,|h|^l_{q\text{-var};[u,v]}\,|||\mathbf X(\omega)|||^{k-l}_{p\text{-var};[u,v]}$$
where $l = i_1+\cdots+i_k$ (this follows from [FV10b, Theorem 9.26]). Hence we have, using Hölder's inequality and super-additivity,
$$\sum_j\left|\int_{\Delta^k_{t_j,t_{j+1}}}dZ^{i_1}\otimes\cdots\otimes dZ^{i_k}\right|^{p/k} \le c^{p/k}_{k,l,p,q}\sum_j|h|^{lp/k}_{q\text{-var};[t_j,t_{j+1}]}\,|||\mathbf X(\omega)|||^{(k-l)p/k}_{p\text{-var};[t_j,t_{j+1}]}$$
$$\le c^{p/k}_{k,l,p,q}\left(\sum_j|h|^p_{q\text{-var};[t_j,t_{j+1}]}\right)^{l/k}\left(\sum_j|||\mathbf X(\omega)|||^p_{p\text{-var};[t_j,t_{j+1}]}\right)^{(k-l)/k} \le c^{p/k}_{k,l,p,q}\,|h|^{pl/k}_{q\text{-var};[\tau_i,\tau_{i+1}]}\,|||\mathbf X(\omega)|||^{p(k-l)/k}_{p\text{-var};[\tau_i,\tau_{i+1}]}$$
and hence
$$\left\|\int_{\Delta^k}dZ^{i_1}\otimes\cdots\otimes dZ^{i_k}\right\|_{l^{p/k}} \le c_{k,l,p,q}\,|h|^l_{q\text{-var};[\tau_i,\tau_{i+1}]}\,|||\mathbf X(\omega)|||^{k-l}_{p\text{-var};[\tau_i,\tau_{i+1}]} = c_{k,l,p,q}\,|h|^l_{q\text{-var};[\tau_i,\tau_{i+1}]}\,\beta^{\frac{k-l}{p}}.$$
By assumption,
$$\left\|\mathbf X^{(k)}(\omega - h)\right\|_{l^{p/k}} \le |||\mathbf X(\omega - h)|||^k_{p\text{-var};[0,T]} \le \alpha^{k/p}.$$
Plugging this into (3.6) yields
$$c_{q,k}\,|h|^k_{q\text{-var};[\tau_i,\tau_{i+1}]} \ge \left\|\mathbf X^{(k)}(\omega)\right\|_{l^{p/k}} - \left(\alpha^{k/p} + \sum_{l=1}^{k-1}c_{k,l,p,q}\,|h|^l_{q\text{-var};[\tau_i,\tau_{i+1}]}\,\beta^{\frac{k-l}{p}}\right).$$
Now we can take the supremum over all dissections $D$ and obtain, using $\left\|\mathbf X^{(k)}(\omega)\right\|_{p/k\text{-var};[\tau_i,\tau_{i+1}]} \ge \left(\frac{\beta}{[p]}\right)^{k/p}$,
$$c_{q,k}\,|h|^k_{q\text{-var};[\tau_i,\tau_{i+1}]} \ge \left\|\mathbf X^{(k)}(\omega)\right\|_{p/k\text{-var};[\tau_i,\tau_{i+1}]} - \left(\alpha^{k/p} + \sum_{l=1}^{k-1}c_{k,l,p,q}\,|h|^l_{q\text{-var};[\tau_i,\tau_{i+1}]}\,\beta^{\frac{k-l}{p}}\right)$$
$$\ge \left(\frac{\beta}{[p]}\right)^{k/p} - \left(\alpha^{k/p} + \sum_{l=1}^{k-1}c_{k,l,p,q}\,|h|^l_{q\text{-var};[\tau_i,\tau_{i+1}]}\,\beta^{\frac{k-l}{p}}\right) = \left(2^k - 1\right)\alpha^{k/p} - \sum_{l=1}^{k-1}\left(2[p]^{1/p}\right)^{k-l}c_{k,l,p,q}\,|h|^l_{q\text{-var};[\tau_i,\tau_{i+1}]}\,\alpha^{\frac{k-l}{p}}.$$
By making constants larger if necessary, we may assume that there is a constant $c_{k,p,q}$ such that
$$|h|^k_{q\text{-var};[\tau_i,\tau_{i+1}]} \ge \frac{2^k - 1}{c_{k,p,q}}\,\alpha^{k/p} - \sum_{l=1}^{k-1}|h|^l_{q\text{-var};[\tau_i,\tau_{i+1}]}\,\alpha^{\frac{k-l}{p}}.$$


This implies that there is a constant $C_{k,p,q}$, depending on $c_{k,p,q}$, such that $C_{k,p,q}\,|h|_{q\text{-var};[\tau_i,\tau_{i+1}]} \ge \alpha^{1/p}$. Setting $C_{p,q} = \max\left\{C_{1,p,q},\ldots,C_{[p],p,q}\right\}$ finally shows (3.5).

Now we come to the main result.

Corollary 3.2.5. Let $X$ be a centered Gaussian process in $\mathbb{R}^d$ with independent components and covariance $R_X$ of finite $\rho$-variation, $\rho < 2$. Consider the Gaussian $p$-rough path $\mathbf X$ for $p > 2\rho$ and assume that there is a continuous embedding
$$\iota\colon \mathcal H \hookrightarrow C^{q\text{-var}}$$
where $\frac1p + \frac1q > 1$, and let $K \ge \|\iota\|_{op}$. Then for every $\alpha > 0$, $\bar N_{\alpha,[0,T]}(\mathbf X)$ has a Weibull tail with shape $2/q$. More precisely, there is a constant $C = C(p,q)$ such that
$$\mu\left(\bar N_{\alpha,[0,T]}(\mathbf X) > r\right) \le \exp\left(-\frac12\left(\hat a + \frac{\alpha^{1/p}r^{1/q}}{CK}\right)^2\right)$$
for every $r > 0$, where $\hat a > -\infty$ is chosen such that
$$\hat a \le \Phi^{-1}\left(\mu\left(|||\mathbf X|||^p_{p\text{-var};[0,T]} \le \frac{\alpha}{2^p[p]}\right)\right).$$

Proof. Set
$$A_a = \left\{\omega : |||\mathbf X(\omega)|||_{p\text{-var};[0,T]} \le a^{1/p}\right\}.$$

Lemma 3.2.1 guarantees that $A_a$ has positive measure for any $a > 0$. From Proposition 3.2.4 we know that there is a set $\tilde E$ of full measure such that whenever $|||\mathbf X(\omega - h)|||_{p\text{-var};[0,T]} \le a^{1/p}$ for $\omega \in \tilde E$, $h \in \mathcal H$ and $a > 0$, we have
$$a^{1/p}\,\bar N_{\beta,[0,T]}(\mathbf X)^{1/q} \le c_{p,q}\,|h|_{q\text{-var};[0,T]} \le c_{p,q}\,\|\iota\|_{op}\,\|h\|_{\mathcal H}$$
where $\beta = 2^p[p]\,a$. Setting $a = \alpha/(2^p[p])$, Corollary 3.2.3 shows that
$$\mu\left(\left\{\omega : \bar N_{\alpha,[0,T]}(\mathbf X(\omega)) > r\right\}\right) \le \exp\left(-\frac12\left(\hat a + \frac{\alpha^{1/p}r^{1/q}}{2[p]^{1/p}c_{p,q}\,\|\iota\|_{op}}\right)^2\right)$$
where $\hat a \le \Phi^{-1}(\mu(A_a))$.

Remark 3.2.6. Corollary 3.2.5 remains valid if one replaces $\bar N_{\alpha,[0,T]}$ by $N_{\alpha,[0,T]}$ in the statement. This follows directly from (3.1) and Lemma 3.1.3 by absorbing the constant $C$ of (3.1) into the constant $C_{p,q}$ of the respective corollaries.

Remark 3.2.7. In [FV10b, Proposition 15.7] it is shown that
$$|h|_{\rho\text{-var}} \le \sqrt{V_{\rho\text{-var}}\left(R_X; [0,T]^2\right)}\,\|h\|_{\mathcal H}$$
holds for all $h \in \mathcal H$. Hence in the regime $\rho \in [1, 3/2)$ we can always choose $q = \rho$ and the conditions of Corollary 3.2.5 are fulfilled. For fractional Brownian motion with Hurst parameter $H$ one can show that $\rho = \frac{1}{2H}$ and $q > \frac{1}{H + 1/2}$ are valid choices (cf. [FV10b, Chapter 15]), and the results of Corollary 3.2.5 remain valid provided $H > 1/4$.

Remark 3.2.8. If $\mathbf X$ is a Gaussian rough path, we know that $\|\mathbf X\|_{p\text{-var}}$ has a Gaussian tail (i.e. a Weibull tail with shape parameter 2), obtained e.g. by a non-linear Fernique theorem, cf. [FO10], whereas Corollary 3.2.5 combined with Lemma 3.1.5 only gives that $\|\mathbf X\|_{p\text{-var}}$ has a Weibull tail with shape $2/q$; thus the estimate is not sharp for $q > 1$. On the other hand, Lemma 3.1.5 is robust and also available in situations where Fernique- (or Borell-) type arguments are not directly available, e.g. in a non-Gaussian setting.


3.3 Transitivity of the tail estimates under locally linear maps

Existing estimates for rough integrals and RDE solutions suggest that we consider maps $\Psi$ such that $\|\Psi(\mathbf x)\|_{p\text{-var};I} \le \mathrm{const}\cdot\|\mathbf x\|_{p\text{-var};I}$, uniformly over all intervals $I \subset [0,T]$ on which $\|\mathbf x\|_{p\text{-var};I} \le R$, for some $R > 0$. More formally,

Definition 3.3.1. We call $\Psi\colon C^{p\text{-var}}\left([0,T]; G^N(\mathbb{R}^d)\right) \to C^{p\text{-var}}\left([0,T]; G^M(\mathbb{R}^e)\right)$ a locally linear map if there is an $R \in (0,\infty]$ such that
$$\|\Psi\|_R := \inf\left\{C > 0 : \|\Psi(\mathbf x)\|_{p\text{-var};[u,v]} \le C\,\|\mathbf x\|_{p\text{-var};[u,v]}\ \text{for all } (u,v) \in \Delta\ \text{and all } \mathbf x\ \text{s.t. } \|\mathbf x\|_{p\text{-var};[u,v]} \le R\right\}$$
is finite.

Remark 3.3.2. (i) For $\lambda \in \mathbb{R}$, we denote by $\delta_\lambda$ the dilation map and set $(\delta_\lambda\Psi)\colon \mathbf x \mapsto \delta_\lambda\Psi(\mathbf x)$. Then $\|\cdot\|_R$ is homogeneous w.r.t. dilation, i.e. $\|\delta_\lambda\Psi\|_R = |\lambda|\,\|\Psi\|_R$.

(ii) If $\Psi$ commutes with the dilation map $\delta$ modulo $p$-variation, i.e. $\|\Psi(\delta_\lambda\mathbf x)\|_{p\text{-var};I} = \|\delta_\lambda\Psi(\mathbf x)\|_{p\text{-var};I}$ for any $\mathbf x$, $\lambda \in \mathbb{R}$ and interval $I \subset [0,T]$, we have $\|\Psi\|_R = \|\Psi\|_\infty$ for any $R > 0$. An example of such a map is the Lyons lift map
$$S_N\colon C^{p\text{-var}}\left([0,T]; G^{[p]}(\mathbb{R}^d)\right) \to C^{p\text{-var}}\left([0,T]; G^N(\mathbb{R}^d)\right)$$
for which we have $\|S_N\|_\infty \le C(N,p) < \infty$, cf. [FV10b, Theorem 9.5].

(iii) If $\phi\colon [0,T] \to \phi([0,T]) \subset [0,T]$ is a bijective, continuous and increasing function and $\mathbf x$ a rough path, we set $\mathbf x^\phi_t = \mathbf x_{\phi(t)}$ and call $\mathbf x^\phi$ a reparametrization of $\mathbf x$. If $\Psi$ commutes with reparametrization modulo $p$-variation, i.e. $\left\|\Psi\left(\mathbf x^\phi\right)\right\|_{p\text{-var};I} = \left\|\Psi(\mathbf x)^\phi\right\|_{p\text{-var};I}$ for any $\mathbf x$, $\phi$ and interval $I \subset [0,T]$, we have
$$\|\Psi\|_R = \inf\left\{C > 0 : \|\Psi(\mathbf x)\|_{p\text{-var};[0,T]} \le C\,\|\mathbf x\|_{p\text{-var};[0,T]}\ \text{for all } \mathbf x\ \text{s.t. } \|\mathbf x\|_{p\text{-var};[0,T]} \le R\right\}.$$

This follows by a standard reparametrization argument. Examples of such maps are rough integration against one-forms, i.e. $\mathbf x \mapsto \int_0^\cdot\varphi(\mathbf x)\,d\mathbf x$, and the Itô–Lyons map, i.e. $\Psi(\mathbf x)_{s,t} = \mathbf y_{s,t}$ where $\mathbf y$ solves $d\mathbf y = V(\mathbf y)\,d\mathbf x$ with initial condition $\mathbf y_0 \in G^{[p]}(\mathbb{R}^e)$. In this case,
$$\|\Psi\|_\infty = \sup_{\mathbf x}\frac{\|\Psi(\mathbf x)\|_{p\text{-var};[0,T]}}{\|\mathbf x\|_{p\text{-var};[0,T]}}$$
(where $0/0 := 0$) and we find the usual operator norm. (Note, however, that we cannot speak of linear maps in this context, since rough path spaces are typically non-linear.)

(iv) Clearly, if $\|\Psi\|_\infty < \infty$, then $\|\Psi(\mathbf X)\|_{p\text{-var};[s,t]}$ inherits the integrability properties of $\|\mathbf X\|_{p\text{-var};[s,t]}$. However, for the most interesting maps, e.g. the Itô–Lyons map, we will not have $\|\Psi\|_\infty < \infty$, but only $\|\Psi\|_R < \infty$ for every finite $R > 0$. In a way, the purpose of this section is to show that one still has transitivity of integrability if one considers $N_{\alpha,[s,t]}(\mathbf X)$ instead of $\|\mathbf X\|_{p\text{-var};[s,t]}$.

Lemma 3.3.3. Let $\Psi$ and $\Phi$ be locally linear maps with $\|\Psi\|_R < \infty$ and $\|\Phi\|_{R\|\Psi\|_R} < \infty$. Then $\Phi \circ \Psi$ is again locally linear and
$$\|\Phi \circ \Psi\|_R \le \|\Phi\|_{R\|\Psi\|_R}\,\|\Psi\|_R.$$


Proof. Let $\|\mathbf x\|_{p\text{-var};[u,v]} \le R$. Then $\|\Psi(\mathbf x)\|_{p\text{-var};[u,v]} \le \|\Psi\|_R\,\|\mathbf x\|_{p\text{-var};[u,v]} \le R\,\|\Psi\|_R$, which implies
$$\|\Phi \circ \Psi(\mathbf x)\|_{p\text{-var};[u,v]} \le \|\Phi\|_{R\|\Psi\|_R}\,\|\Psi(\mathbf x)\|_{p\text{-var};[u,v]} \le \|\Phi\|_{R\|\Psi\|_R}\,\|\Psi\|_R\,\|\mathbf x\|_{p\text{-var};[u,v]}.$$

The interesting property of locally linear maps is formulated in the next proposition.

Proposition 3.3.4. Let $\Psi\colon C^{p\text{-var}}\left([0,T]; G^N(\mathbb{R}^d)\right) \to C^{p\text{-var}}\left([0,T]; G^M(\mathbb{R}^e)\right)$ be locally linear with $\|\Psi\|_R < \infty$ for some $R \in (0,\infty]$. Then
$$N_{\alpha\|\Psi\|^p_R,[s,t]}(\Psi(\mathbf x)) \le N_{\alpha,[s,t]}(\mathbf x)$$
for any $s < t$ and $\alpha \in (0, R^p]$.

Proof. Follows directly from Lemma 3.1.3.
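In the simplest case, the dilation $\Psi = \delta_\lambda$ (for scalar paths just $x \mapsto \lambda x$, with $\|\Psi\|_R = |\lambda|$) already illustrates Proposition 3.3.4: with $p = 1$ and the 1-variation control one even has equality, $N_{\alpha|\lambda|,[s,t]}(\lambda x) = N_{\alpha,[s,t]}(x)$, by Lemma 3.1.2. A self-contained numerical sketch; the function names and test path are our own illustrative choices.

```python
def onevar(x, i0, i1):
    # omega_x(i0, i1): 1-variation of the discrete scalar path x on [i0, i1] (p = 1)
    return sum(abs(x[i + 1] - x[i]) for i in range(i0, i1))

def N(x, alpha, i0, i1):
    # greedy count N_{alpha,[i0,i1]}(omega_x) of Definition 3.1.1 on the index grid
    count, prev = 0, i0
    while prev < i1:
        nxt = next((u for u in range(prev + 1, i1 + 1)
                    if onevar(x, prev, u) >= alpha), i1)
        if nxt == i1:
            break
        count, prev = count + 1, nxt
    return count
```

$N$ computed for the path scaled by $\lambda$ at level $|\lambda|\alpha$ agrees with $N$ for the original path at level $\alpha$; in particular the inequality of Proposition 3.3.4 holds.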

3.3.1 Full RDEs

Consider the full RDE
$$d\mathbf y = V(\mathbf y)\,d\mathbf x; \qquad \mathbf y_0 \in G^{[p]}(\mathbb{R}^e) \qquad (3.7)$$
where $\mathbf x$ is a weak geometric $p$-rough path with values in $G^{[p]}(\mathbb{R}^d)$, $V = (V_i)_{i=1,\ldots,d}$ is a collection of $\mathrm{Lip}^\gamma$-vector fields in $\mathbb{R}^e$ with $\gamma > p$, and $\mathbf y_0$ is the initial value. Theorems 10.36 and 10.38 in [FV10b] state that (3.7) possesses a unique solution $\mathbf y$ which is a weak geometric $p$-rough path with values in $G^{[p]}(\mathbb{R}^e)$.

Corollary 3.3.5. The Itô–Lyons map $\Psi\colon \mathbf x \mapsto \mathbf y$ is locally linear with

$$\|\Psi\|_R \le K\left(\|V\|_{\mathrm{Lip}^{\gamma-1}} \vee \|V\|^p_{\mathrm{Lip}^{\gamma-1}}\,R^{p-1}\right) \qquad (3.8)$$
for any $R \in (0,\infty)$, where $K$ only depends on $p$ and $\gamma$. Moreover, if $\|V\|_{\mathrm{Lip}^{\gamma-1}} \le \nu$, then for any $\alpha > 0$ there is a constant $C = C(p,\gamma,\nu,\alpha)$ such that
$$N_{\alpha,[s,t]}(\mathbf y) \le C\left(N_{\alpha,[s,t]}(\mathbf x) + 1\right)$$
for any $s < t$.

Proof. (3.8) follows from the estimate (10.26) of [FV10b, Theorem 10.36]. From Proposition 3.3.4 we obtain
$$N_{\beta,[s,t]}(\Psi(\mathbf x)) \le N_{\alpha,[s,t]}(\mathbf x)$$
where $\beta = \alpha\,\|\Psi\|^p_{\alpha^{1/p}}$. This already shows the claim if $\|\Psi\|^p_{\alpha^{1/p}} \le 1$. In the case $\|\Psi\|^p_{\alpha^{1/p}} > 1$, we conclude with Lemma 3.1.4.

3.3.2 Rough integrals

If $\mathbf x$ is a $p$-rough path and $\varphi = (\varphi_i)_{i=1,\ldots,d}$ a collection of $\mathrm{Lip}^{\gamma-1}\left(\mathbb{R}^d, \mathbb{R}^e\right)$-maps, one can define the rough integral
$$\int\varphi(\mathbf x)\,d\mathbf x \qquad (3.9)$$
as an element of $C^{p\text{-var}}\left([0,T]; G^{[p]}(\mathbb{R}^e)\right)$ (cf. [FV10b, Chapter 10.6]).


Corollary 3.3.6. The map $\Psi\colon \mathbf x \mapsto \mathbf z$, with $\mathbf z$ given by the rough integral (3.9), is locally linear with
$$\|\Psi\|_R \le K\,\|\varphi\|_{\mathrm{Lip}^{\gamma-1}}\left(1 \vee R^{p-1}\right) \qquad (3.10)$$
for any $R \in (0,\infty)$, where $K$ only depends on $p$ and $\gamma$. Moreover, if $\|\varphi\|_{\mathrm{Lip}^{\gamma-1}} \le \nu$, then for any $\alpha > 0$ there is a constant $C = C(p,\gamma,\nu,\alpha)$ such that
$$N_{\alpha,[s,t]}(\mathbf z) \le C\left(N_{\alpha,[s,t]}(\mathbf x) + 1\right)$$
for any $s < t$.

Proof. (3.10) follows from [FV10b, Theorem 10.47]. One proceeds as in the proof of Corollary 3.3.5.

3.4 Linear RDEs

For a p-rough path x, consider the full linear RDE

$$d\mathbf y = V(\mathbf y)\,d\mathbf x; \qquad \mathbf y_0 \in G^{[p]}(\mathbb{R}^e) \qquad (3.11)$$
where $V = (V_i)_{i=1,\ldots,d}$ is a collection of linear vector fields of the form $V_i(z) = A_iz + b_i$, where the $A_i$ are $e \times e$ matrices and $b_i \in \mathbb{R}^e$. It is well known (e.g. [FV10b, Section 10.7]) that in this case (3.11) has a unique solution $\mathbf y$. Unfortunately, the map $\Psi\colon \mathbf x \mapsto \mathbf y$ is not locally linear in the sense of Definition 3.3.1, and the tools of the former section do not apply. However, we can do a more direct analysis and obtain a different transitivity of the tail estimates. Let $\nu$ be a bound on $\max_i(|A_i| + |b_i|)$ and set $y = \pi_1(\mathbf y)$. From [FV10b, Theorem 10.53] one sees that there is a constant $C$ depending only on $p$ such that
$$\|\mathbf y_{s,t}\| \le C\,(1 + |y_s|)\,\nu\,\|\mathbf x\|_{p\text{-var};[s,t]}\exp\left(C\nu^p\,\|\mathbf x\|^p_{p\text{-var};[s,t]}\right) \qquad (3.12)$$
holds for all $s < t \in [0,T]$. (Strictly speaking, [FV10b] only states the estimate for $(s,t) = (0,1)$; the general case follows by reparametrization.) We start with an estimate for the supremum norm of $y$.

Lemma 3.4.1. For any $\alpha > 0$ there is a constant $C = C(p,\nu,\alpha)$ such that
$$|y|_{\infty;[s,t]} \le C\,(1 + |y_s|)\exp\left(C\,N_{\alpha;[s,t]}(\mathbf x)\right)$$
holds for any $s < t$.

Proof. From (3.12) we have
$$|y_{u,v}| \le C\,(1 + |y_u|)\,\nu\,\|\mathbf x\|_{p\text{-var};[u,v]}\exp\left(C\nu^p\,\|\mathbf x\|^p_{p\text{-var};[u,v]}\right) \qquad (3.13)$$
for any $u < v \in [s,t]$. From $|y_{u,v}| = |y_{s,v} - y_{s,u}| \ge |y_{s,v}| - |y_{s,u}|$ we obtain
$$|y_{s,v}| \le C\,(1 + |y_u|)\,\nu\,\|\mathbf x\|_{p\text{-var};[u,v]}\exp\left(C\nu^p\,\|\mathbf x\|^p_{p\text{-var};[u,v]}\right) + |y_{s,u}| \le C\,(1 + |y_s| + |y_{s,u}|)\exp\left\{C\nu^p\,\|\mathbf x\|^p_{p\text{-var};[u,v]}\right\}$$
by making $C$ larger. Now let $s = \tau_0 < \ldots < \tau_M < \tau_{M+1} = u \le t$ with $M \ge 0$. By induction, one sees that
$$|y_{s,u}| \le C^{M+1}\,(M+1)\,(1 + |y_s|)\exp\left\{C\sum_{i=0}^{M}\nu^p\,\|\mathbf x\|^p_{p\text{-var};[\tau_i,\tau_{i+1}]}\right\} \le C^{M+1}\,(1 + |y_s|)\exp\left\{C\sum_{i=0}^{M}\nu^p\,\|\mathbf x\|^p_{p\text{-var};[\tau_i,\tau_{i+1}]}\right\}$$
after possibly enlarging $C$ again.


This shows that for every $u \in [s,t]$,
$$|y_{s,u}| \le C^{N_{\alpha;[s,t]}(\mathbf x)+1}\,(1 + |y_s|)\exp\left\{C\nu^p\alpha\left(N_{\alpha;[s,t]}(\mathbf x) + 1\right)\right\} = (1 + |y_s|)\exp\left\{\left(\log C + C\nu^p\alpha\right)\left(N_{\alpha;[s,t]}(\mathbf x) + 1\right)\right\}$$
and hence
$$\sup_{u \in [s,t]}|y_{s,u}| \le C\,(1 + |y_s|)\exp\left(C\,N_{\alpha;[s,t]}(\mathbf x)\right)$$
for a constant $C = C(p,\nu,\alpha)$, and therefore also
$$|y|_{\infty;[s,t]} \le C\,(1 + |y_s|)\exp\left(C\,N_{\alpha;[s,t]}(\mathbf x)\right).$$

Corollary 3.4.2. Let $\alpha > 0$. Then there is a constant $C = C(p,\nu,\alpha)$ such that
$$N_{\alpha,[s,t]}(\mathbf y) \le C\,(1 + |y_s|)^p\exp\left(C\,N_{\alpha;[s,t]}(\mathbf x)\right)$$
for any $s < t$.

Proof. Using (3.12) we can deduce that
$$\|\mathbf y_{u,v}\| \le C\left(1 + |y|_{\infty;[s,t]}\right)\nu\,\|\mathbf x\|_{p\text{-var};[u,v]}\exp\left(C\nu^p\,\|\mathbf x\|^p_{p\text{-var};[u,v]}\right)$$
holds for any $u < v \in [s,t]$, and hence also
$$\|\mathbf y\|_{p\text{-var};[u,v]} \le C\left(1 + |y|_{\infty;[s,t]}\right)\nu\,\|\mathbf x\|_{p\text{-var};[u,v]}\exp\left(C\nu^p\,\|\mathbf x\|^p_{p\text{-var};[u,v]}\right)$$
for any $u < v \in [s,t]$. Now take $u < v \in [s,t]$ such that $\|\mathbf x\|^p_{p\text{-var};[u,v]} \le \alpha$. We then have $\|\mathbf y\|^p_{p\text{-var};[u,v]} \le \tilde C\,\|\mathbf x\|^p_{p\text{-var};[u,v]}$ where
$$\tilde C = C^p\left(1 + |y|_{\infty;[s,t]}\right)^p\nu^p\exp\left(pC\nu^p\alpha\right).$$
From Lemma 3.1.3,
$$N_{\tilde C\alpha,[s,t]}(\mathbf y) \le N_{\alpha,[s,t]}(\mathbf x).$$
If $\tilde C \le 1$, this already shows the claim. For $\tilde C > 1$, we use Lemma 3.1.4 and Lemma 3.4.1 to see that
$$N_{\alpha,[s,t]}(\mathbf y) \le \left(2N_{\tilde C\alpha,[s,t]}(\mathbf y) + 1\right)\tilde C \le C\left(N_{\alpha,[s,t]}(\mathbf x) + 1\right)\left(1 + |y|_{\infty;[s,t]}\right)^p \le C\,(1 + |y_s|)^p\exp\left(C\,N_{\alpha,[s,t]}(\mathbf x)\right).$$

Remark 3.4.3 (Unbounded vector fields). Let $\mathbf x$ be a $p$-rough path. Consider a collection $V = (V_i)_{1 \le i \le d}$ of locally $\mathrm{Lip}^{\gamma-1}$-vector fields on $\mathbb{R}^e$, $\gamma \in (p, [p]+1)$, such that the $V_i$ are Lipschitz continuous and the vector fields $V^{[p]} = \left(V_{i_1},\ldots,V_{i_{[p]}}\right)_{i_1,\ldots,i_{[p]}\in\{1,\ldots,d\}}$ are $(\gamma - [p])$-Hölder continuous. Then the RDE
$$d\mathbf y = V(\mathbf y)\,d\mathbf x; \qquad \mathbf y_0 \in G^{[p]}(\mathbb{R}^e)$$
has a unique solution (cf. [FV10b, Exercise 10.56] and the solution thereafter, and [Lej09]). Moreover, in [FV10b] it is shown that
$$\|\mathbf y_{0,1}\| \le C\,(1 + |y_0|)\,\nu\,\|\mathbf x\|_{p\text{-var};[0,1]}\exp\left(C\nu^p\,\|\mathbf x\|^p_{p\text{-var};[0,1]}\right)$$
where $C = C(p,\gamma)$ and $\nu$ is a bound on $\left\|V^{[p]}\right\|^{1/[p]}_{(\gamma-[p])\text{-Höl}} \vee \sup_{y \ne z}\frac{|V(y) - V(z)|}{|y - z|}$. This shows that Lemma 3.4.1 and Corollary 3.4.2 apply to $\mathbf y$; hence for any $\alpha > 0$ there is a constant $C = C(p,\gamma,\nu,\alpha)$ such that
$$N_{\alpha,[s,t]}(\mathbf y) \le C\,(1 + |y_s|)^p\exp\left(C\,N_{\alpha;[s,t]}(\mathbf x)\right)$$
for all $s < t$ in this case.

3.5 Applications in stochastic analysis

3.5.1 Tail estimates for stochastic integrals and solutions of SDEs driven by Gaussian signals

We now apply our results to solutions of SDEs and stochastic integrals driven by Gaussian signals, i.e. by a Gaussian rough path $\mathbf X$. Remark that all results here may be immediately formulated for SDEs and stochastic integrals driven by random rough paths, as long as suitable quantitative Weibull-tail estimates for $N_{\alpha,[0,T]}(\mathbf X)$ are assumed. We first consider the non-linear case.

Proposition 3.5.1. Let $X$ be a centered Gaussian process in $\mathbb{R}^d$ with independent components and covariance $R_X$ of finite $\rho$-variation, $\rho < 2$. Consider the Gaussian $p$-rough path $\mathbf X$ for $p > 2\rho$ and assume that there is a continuous embedding
$$\iota\colon \mathcal H \hookrightarrow C^{q\text{-var}}$$
where $\frac1p + \frac1q > 1$. Let $Y\colon [0,T] \to \mathbb{R}^e$ be the pathwise solution of the stochastic RDE
$$dY = V(Y)\,d\mathbf X; \qquad Y_0 \in \mathbb{R}^e$$
where $V = (V_i)_{i=1,\ldots,d}$ is a collection of $\mathrm{Lip}^\gamma$-vector fields in $\mathbb{R}^e$ with $\gamma > p$. Moreover, let $Z\colon [0,T] \to \mathbb{R}^e$ be the stochastic integral given by
$$Z_t = \pi_1\left(\int_0^t\varphi(\mathbf X)\,d\mathbf X\right)$$
where $\varphi = (\varphi_i)_{i=1,\ldots,d}$ is a collection of $\mathrm{Lip}^{\gamma-1}\left(\mathbb{R}^d, \mathbb{R}^e\right)$-maps, $\gamma > p$. Then both $\|Y\|_{p\text{-var};[0,T]}$ and $\|Z\|_{p\text{-var};[0,T]}$ have Weibull tails with shape parameter $2/q$. More precisely, if $K \ge \|\iota\|_{op}$, $M \ge V_{\rho\text{-var}}\left(R; [0,T]^2\right)$ and $\nu \ge \|V\|_{\mathrm{Lip}^{\gamma-1}}$, there is a constant $\eta = \eta(p,q,\rho,\gamma,\nu,K,M) > 0$ such that
$$P\left(\|Y\|_{p\text{-var};[0,T]} > r\right) \le \frac1\eta\exp\left(-\eta\,r^{2/q}\right)\quad\text{for all } r \ge 0,$$
and the same holds for $\|Z\|_{p\text{-var};[0,T]}$ if $\nu \ge \|\varphi\|_{\mathrm{Lip}^{\gamma-1}}$ instead. In particular, $\|Y\|_{p\text{-var};[0,T]}$ and $\|Z\|_{p\text{-var};[0,T]}$ have finite exponential moments as long as $q < 2$.

Proof. From Lemma 3.2.1 we know that there is an $\alpha = \alpha(\rho,p,M)$ such that
$$P\left(|||\mathbf X|||^p_{p\text{-var};[0,T]} \le \frac{\alpha}{2^p[p]}\right) \ge \frac12.$$


Hence, by Corollary 3.2.5, applied with $\hat a = \Phi^{-1}\left(\frac12\right) = 0$, and the remark thereafter,
$$P\left(N_{\alpha,[0,T]}(\mathbf X) > r\right) \le \exp\left(-\frac12\left(\frac{\alpha^{1/p}r^{1/q}}{c_1}\right)^2\right)\quad\text{for all } r \ge 0$$
with $c_1 = c_1(p,q,K,M)$. Corollary 3.3.5 shows that there is a constant $c_2 = c_2(p,q,K,M,\gamma,\nu)$ such that also
$$P\left(N_{\alpha,[0,T]}(\mathbf Y) > r\right) \le c_2\exp\left(-\frac{r^{2/q}}{c_2}\right)\quad\text{for all } r \ge 0.$$
From Lemma 3.1.5 we see that
$$\|Y\|_{p\text{-var};[0,T]} \le \|\mathbf Y\|_{p\text{-var};[0,T]} \le \alpha^{1/p}\left(N_{\alpha,[0,T]}(\mathbf Y) + 1\right)$$
which shows the claim for $\|Y\|_{p\text{-var};[0,T]}$. The same holds true for $\|Z\|_{p\text{-var};[0,T]}$, using Corollary 3.3.6.

Remark 3.5.2. In the Brownian motion case ($q = 1$), we recover the well-known fact that solutions $Y$ of the Stratonovich SDE
$$dY = V(Y)\circ dB; \qquad Y_0 \in \mathbb{R}^e$$
have Gaussian tails at any fixed time, i.e. $Y_t$ has a Gaussian tail, provided $V$ is sufficiently smooth. We also recover that the Stratonovich integral
$$\int_0^t\varphi(B)\circ dB$$
has a Gaussian tail for every $t \ge 0$, $\varphi$ sufficiently smooth.

Proposition 3.5.3. Let $X$ be as in Proposition 3.5.1. Let $Y\colon [0,T] \to \mathbb{R}^e$ be the pathwise solution of the stochastic linear RDE
$$dY = V(Y)\,d\mathbf X; \qquad Y_0 \in \mathbb{R}^e$$
where $V = (V_i)_{i=1,\ldots,d}$ is a collection of linear vector fields of the form $V_i(z) = A_iz + b_i$, where the $A_i$ are $e \times e$ matrices and $b_i \in \mathbb{R}^e$. Then $\log\left(\|Y\|_{p\text{-var};[0,T]}\right)$ has a Weibull tail with shape $2/q$. More precisely, if $K \ge \|\iota\|_{op}$, $M \ge V_{\rho\text{-var}}\left(R; [0,T]^2\right)$ and $\nu \ge \max_i(|A_i| + |b_i|)$, there is a constant $\eta = \eta(p,q,\rho,\nu,K,M) > 0$ such that
$$P\left(\log\left(\|Y\|_{p\text{-var};[0,T]}\right) > r\right) \le \frac1\eta\exp\left(-\eta\,r^{2/q}\right)\quad\text{for all } r \ge 0.$$
In particular, $\|Y\|_{p\text{-var};[0,T]}$ has finite $L^s$-moments for any $s > 0$, provided $q < 2$.

Proof. Same as for Proposition 3.5.1, using Corollary 3.4.2.

Remark 3.5.4. In the case $q = 1$, which covers Brownian driving signals, we have log-normal tails. This is in agreement with trivial examples such as the standard Black–Scholes model, in which the stock price $S_t$ is log-normally distributed.

Remark 3.5.5. The same conclusion holds for unbounded vector fields, as seen in Remark 3.4.3.


3.5.2 The Jacobian of the solution flow for SDEs driven by Gaussian signals

Let $x\colon [0,T] \to \mathbb{R}^d$ be smooth and let $V = \left(V^1,\ldots,V^d\right)$, $V^i\colon \mathbb{R}^e \to \mathbb{R}^e$, be a collection of vector fields. We can interpret $V$ as a function $V\colon \mathbb{R}^e \to L\left(\mathbb{R}^d, \mathbb{R}^e\right)$ with derivative $DV\colon \mathbb{R}^e \to L\left(\mathbb{R}^e, L\left(\mathbb{R}^d, \mathbb{R}^e\right)\right) \cong L\left(\mathbb{R}^d, \mathrm{End}(\mathbb{R}^e)\right)$. It is well known that for sufficiently smooth $V$, the ODE $dy = V(y)\,dx$ has a solution for every starting point $y_0$ and the solution flow $y_0 \mapsto U_{t\leftarrow 0}(y_0) = y_t$ is (Fréchet) differentiable. We denote its derivative by $J^x_{t\leftarrow 0}(y_0) = DU_{t\leftarrow 0}(\cdot)|_{\cdot = y_0}$. Moreover, for fixed $y_0$, the Jacobian $J_t = J^x_{t\leftarrow 0}(y_0)$ is given as the solution of the linear ODE
$$dJ_t = dM_t\cdot J_t; \qquad J_0 = \mathrm{Id}$$
where $M_t \in \mathrm{End}(\mathbb{R}^e)$ is given by the integral
$$M_t = \int_0^t DV(y_s)\,dx_s. \qquad (3.14)$$
If $\mathbf x$ is a $p$-rough path, one proceeds in a similar fashion. First, in order to make sense of (3.14) when $x$ and $y$ are rough paths, one has to define the joint rough path $(\mathbf x, \mathbf y) = \mathbf z \in C^{p\text{-var}}\left([0,T], G^{[p]}\left(\mathbb{R}^d \oplus \mathbb{R}^e\right)\right)$ first. To do so, one defines $\mathbf z$ as the solution of the full RDE
$$d\mathbf z = \tilde V(\mathbf z)\,d\mathbf x; \qquad \mathbf z_0 = \exp(0, y_0)$$
where $\tilde V = (\mathrm{Id}, V)$. Then one defines $\mathbf M \in C^{p\text{-var}}\left([0,T], G^{[p]}\left(\mathbb{R}^{e\times e}\right)\right)$ as the rough integral
$$\mathbf M_t = \int_0^t\phi(\mathbf z)\,d\mathbf z$$
where $\phi\colon \mathbb{R}^d \oplus \mathbb{R}^e \to L\left(\mathbb{R}^d \oplus \mathbb{R}^e, \mathrm{End}(\mathbb{R}^e)\right)$ is given by $\phi(x,y)(x',y') = DV(y)(x')$ for all $x, x' \in \mathbb{R}^d$ and $y, y' \in \mathbb{R}^e$. Finally, one obtains $J^{\mathbf x}_t = J^{\mathbf x}_{t\leftarrow 0}(y_0)$ as the solution of the linear RDE
$$dJ^{\mathbf x}_t = d\mathbf M_t\cdot J^{\mathbf x}_t; \qquad J^{\mathbf x}_0 = \mathrm{Id}.$$
All this can be made rigorous; see for instance [FV10b, Theorem 11.3]. Next, we give an alternative proof of the main result of [CLL], slightly sharpened in the sense that we consider the $p$-variation norm instead of the supremum norm.

Proposition 3.5.6. Let $X$ be a centered Gaussian process in $\mathbb{R}^d$ with independent components and covariance $R_X$ of finite $\rho$-variation, $\rho < 2$. Consider the Gaussian $p$-rough path $\mathbf X$ for $p > 2\rho$ and assume that there is a continuous embedding
$$\iota\colon \mathcal H \hookrightarrow C^{q\text{-var}}$$
where $\frac1p + \frac1q > 1$. Then $\log\left(\left|J^{\mathbf X}_{\cdot\leftarrow 0}(y_0)\right|_{p\text{-var};[0,T]}\right)$ has a Weibull tail with shape $2/q$. In particular, if $q < 2$, this implies that $\left|J^{\mathbf X}_{\cdot\leftarrow 0}(y_0)\right|_{p\text{-var};[0,T]}$ has finite $L^r$-moments for any $r > 0$.

Proof. From Corollary 3.2.5 we know that $N_{1,[0,T]}(\mathbf X)$ has a Weibull tail with shape $2/q$. Combining Corollaries 3.3.5, 3.3.6 and 3.4.2 shows that there is a constant $C$ such that
$$\log\left(N_{1,[0,T]}\left(\mathbf J^{\mathbf X}_{\cdot\leftarrow 0}(y_0)\right) + 1\right) \le C\left(N_{1;[0,T]}(\mathbf X) + 1\right).$$
From Lemma 3.1.5 we know that
$$\left|J^{\mathbf X}_{\cdot\leftarrow 0}(y_0)\right|_{p\text{-var};[0,T]} \le \left\|\mathbf J^{\mathbf X}_{\cdot\leftarrow 0}(y_0)\right\|_{p\text{-var};[0,T]} \le N_{1,[0,T]}\left(\mathbf J^{\mathbf X}_{\cdot\leftarrow 0}(y_0)\right) + 1.$$


3.5.3 An example from rough SPDE theory

In situations where one performs a change of measure to an equivalent measure on a path space, one often has to make sense of the exponential moments of a stochastic integral, i.e. to show that
$$\mathbb E\left[\exp\left(\int G(X)\,dX + \int F(X)\,dt\right)\right] \qquad (3.15)$$
is finite for a given process $X$ and some suitable maps $G$ and $F$. The second integral is often trivially handled (say, when $F$ is bounded), and we thus take $F = 0$ in what follows. Various situations in the literature (e.g. [Hai11], [CDFO]) require bounding (3.15) uniformly over a family of processes, say $(X^\varepsilon : \varepsilon > 0)$. We will see in this section that our results are perfectly suited for doing this. In the following, we study the situation of [Hai11, Section 4]. Here $\psi^\varepsilon = \psi^\varepsilon(t,x;\omega)$ is the stationary (in time) solution to the damped stochastic heat equation with hyper-viscosity of parameter $\varepsilon > 0$,
$$d\psi = -\varepsilon^2\,\partial_{xxxx}\psi\,dt + (\partial_{xx} - 1)\psi\,dt + \sqrt2\,dW_t$$
where $W$ is space-time white noise, i.e. a cylindrical Wiener process over $L^2(\mathbb T)$, where $\mathbb T$ denotes the torus, say $[-\pi,\pi]$ with periodic boundary conditions. Following [Hai11] we fix $t$, so that the "spatial" interval $[-\pi,\pi]$ plays the role of our previous "time horizon" $[0,T]$. Note that $x \mapsto \psi^\varepsilon(x,t)$ is a centered Gaussian process on $\mathbb T$, with independent components and covariance given by
$$\mathbb E\left(\psi^\varepsilon(x,t)\otimes\psi^\varepsilon(y,t)\right) = R^\varepsilon(x,y)\,I = K^\varepsilon(x-y)\,I$$
where $K^\varepsilon(x)$ is proportional to
$$\sum_{k\in\mathbb Z}\frac{\cos(kx)}{1 + k^2 + \varepsilon^2k^4}.$$
As was pointed out by Hairer, it can be very fruitful in a non-linear SPDE context to consider $\lim_{\varepsilon\to 0}\psi^\varepsilon(t,\cdot,\omega)$ as a random spatial rough path. To this end, it is stated (without proof) in [Hai11] that the covariance of $\psi^\varepsilon$ has finite $\rho$-variation in the 2D sense, $\rho > 1$, uniformly in $\varepsilon$. In fact, we can show something slightly stronger. Following [Hai11], $\psi^\varepsilon$ is $C^1$ in $x$ for every $\varepsilon > 0$, and can be seen as a $p$-rough path, any $p > 2$, when $\varepsilon = 0$.

Lemma 3.5.7. The map $\mathbb{T}^2 \ni (x,y) \mapsto R^\varepsilon(x,y)$ has finite 1-variation in the 2D sense, uniformly in $\varepsilon$. That is,
$$M := \sup_{\varepsilon \ge 0} V_{1\text{-var}}\big(R^\varepsilon; \mathbb{T}^2\big) < \infty.$$

Proof. By lower semi-continuity of variation norms under pointwise convergence, it suffices to consider $\varepsilon > 0$. (Alternatively, the case $\varepsilon = 0$ is treated explicitly in [Hai11].) We then note that
$$\sum_{k \in \mathbb{Z}} \frac{\cos(kx)}{1 + \varepsilon^2 k^2} = \frac{\pi}{\varepsilon}\,\frac{\cosh\big(\frac{1}{\varepsilon}(|x| - \pi)\big)}{\sinh\big(\frac{\pi}{\varepsilon}\big)}$$
in $L^2(\mathbb{T})$, as may be seen by Fourier expansion on $[-\pi,\pi]$ of the function $x \mapsto \cosh\big(\frac{1}{\varepsilon}(|x| - \pi)\big)$. Since $\big|\partial^2_{x,y} R^\varepsilon(x,y)\big| = \big|K_\varepsilon''(x-y)\big|$ we have
$$V_{1\text{-var}}\big(R^\varepsilon; \mathbb{T}^2\big) = \int_{\mathbb{T}^2} \big|\partial^2_{x,y} R^\varepsilon(x,y)\big|\,dx\,dy = \int_{\mathbb{T}^2} \big|K_\varepsilon''(x-y)\big|\,dx\,dy.$$

On the other hand,
$$\big|K_\varepsilon''(x)\big| \le \Big|\sum_{k \in \mathbb{Z}} \Big(\frac{1}{1 + \varepsilon^2 k^2} - \frac{k^2}{1 + k^2 + \varepsilon^2 k^4}\Big)\cos(kx)\Big| + \frac{\pi}{\varepsilon}\,\frac{\cosh\big(\frac{1}{\varepsilon}(|x| - \pi)\big)}{\sinh\big(\frac{\pi}{\varepsilon}\big)}$$
$$= \Big|1 + \sum_{k \ne 0} \frac{\cos(kx)}{k^2\,(1 + \varepsilon^2 k^2)\big(\frac{1}{k^2} + 1 + \varepsilon^2 k^2\big)}\Big| + \frac{\pi}{\varepsilon}\,\frac{\cosh\big(\frac{1}{\varepsilon}(|x| - \pi)\big)}{\sinh\big(\frac{\pi}{\varepsilon}\big)}$$
$$\le 1 + \sum_{k \ne 0} \frac{1}{k^2} + \frac{\pi}{\varepsilon}\,\frac{\cosh\big(\frac{1}{\varepsilon}(|x| - \pi)\big)}{\sinh\big(\frac{\pi}{\varepsilon}\big)} \le 1 + \frac{\pi^2}{3} + \frac{\pi}{\varepsilon}\,\frac{\cosh\big(\frac{1}{\varepsilon}(|x| - \pi)\big)}{\sinh\big(\frac{\pi}{\varepsilon}\big)}.$$
Hence
$$\int_{\mathbb{T}^2} \big|K_\varepsilon''(x-y)\big|\,dx\,dy \le (2\pi)^2\Big(1 + \frac{\pi^2}{3}\Big) + \int_{\mathbb{T}^2} \frac{\pi}{\varepsilon}\,\frac{\cosh\big(\frac{1}{\varepsilon}(|x-y| - \pi)\big)}{\sinh\big(\frac{\pi}{\varepsilon}\big)}\,dx\,dy.$$
We leave it to the reader to see that the final integral is bounded, independently of $\varepsilon$. For instance, introduce $z = x - y$ as a new variable, so that only
$$\int_{-2\pi}^{2\pi} \frac{\pi}{\varepsilon}\,\frac{\cosh\big(\frac{1}{\varepsilon}(|z| - \pi)\big)}{\sinh\big(\frac{\pi}{\varepsilon}\big)}\,dz = 4\,\frac{\pi}{\varepsilon \sinh\big(\frac{\pi}{\varepsilon}\big)} \int_0^\pi \cosh\Big(\frac{z}{\varepsilon}\Big)\,dz$$
needs to be controlled. Using $\cosh \approx \sinh \approx \exp$ for large arguments (or integrating explicitly), we get
$$\frac{\pi}{\varepsilon \sinh\big(\frac{\pi}{\varepsilon}\big)} \int_0^\pi \cosh\Big(\frac{z}{\varepsilon}\Big)\,dz \approx \frac{\pi}{\varepsilon \exp\big(\frac{\pi}{\varepsilon}\big)} \int_0^\pi \exp\Big(\frac{z}{\varepsilon}\Big)\,dz = \pi\,\frac{\exp\big(\frac{\pi}{\varepsilon}\big) - 1}{\exp\big(\frac{\pi}{\varepsilon}\big)} \le \pi.$$
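The uniform bound of Lemma 3.5.7 can also be checked numerically. The sketch below (our own sanity check, not part of the proof) takes the normalization $K_\varepsilon(x) = \sum_k \cos(kx)/(1+k^2+\varepsilon^2k^4)$ — the lemma only fixes $K_\varepsilon$ up to a constant — truncates the series for $K_\varepsilon''$ at an ad-hoc level, and approximates $\int_{-\pi}^{\pi}|K_\varepsilon''(x)|\,dx$ for several values of $\varepsilon$; the pointwise majorant in the proof above bounds this 1D integral by $2\pi(1+\pi^2/3) + 2\pi \approx 33.3$, uniformly in $\varepsilon$.

```python
import math

def K2(x, eps, N=2000):
    """Truncated series for K_eps''(x) = -sum_{k in Z} k^2 cos(kx) / (1 + k^2 + eps^2 k^4).
    Terms for k and -k coincide, hence the factor 2; k = 0 contributes nothing."""
    s = 0.0
    for k in range(1, N + 1):
        s += k * k * math.cos(k * x) / (1.0 + k * k + (eps * k * k) ** 2)
    return -2.0 * s

def one_var(eps, n_grid=300):
    """Trapezoidal approximation of int_{-pi}^{pi} |K_eps''(x)| dx."""
    h = 2.0 * math.pi / n_grid
    ys = [abs(K2(-math.pi + i * h, eps)) for i in range(n_grid + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

for eps in (1.0, 0.5, 0.25):
    print(eps, one_var(eps))  # bounded uniformly as eps decreases
```

The truncation level and grid size are rough numerical choices; refining them changes the values only slightly.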

We then have the following sharpening of [Hai11, Theorem 5.1].

Theorem 3.5.8. Fix $\gamma > 2$ and $p \in (2,\gamma)$. Assume $G = (G_i)_{i=1,\dots,d}$ is a collection of $\mathrm{Lip}^{\gamma-1}(\mathbb{R}^d, \mathbb{R}^e)$ maps. Then for some constant $\eta = \eta\big(\gamma, p, \|G\|_{\mathrm{Lip}^{\gamma-1}}\big) > 0$ we have the uniform estimate
$$\sup_{t \in [0,\infty)} \sup_{\varepsilon \ge 0} \mathbb{E}\exp\bigg(\eta\,\Big|\int_{\mathbb{T}} G\big(\psi^\varepsilon(x,t)\big)\,d_x\psi^\varepsilon(x,t)\Big|^2\bigg) < \infty.$$
(When $\varepsilon > 0$, $\psi^\varepsilon$ is known to be $C^1$ in $x$, so that we deal with Riemann–Stieltjes integrals; when $\varepsilon = 0$, the integral is understood in the rough path sense.)

Proof. By stationarity in time of $\psi^\varepsilon(\cdot,t)$, uniformity in $t$ is trivial. Note that the Riemann–Stieltjes integral
$$\int_{\mathbb{T}} G\big(\psi^\varepsilon(x,t)\big)\,d_x\psi^\varepsilon(x,t)$$
can also be seen as a rough integral where the integrator is given by the "smooth" rough path $\big(\psi^\varepsilon, \int \psi^\varepsilon \otimes d_x\psi^\varepsilon\big)$ when $\varepsilon > 0$. For $\varepsilon = 0$, the above integral is a genuine rough integral; the existence of a canonical lift of $\psi^0(\cdot,t)$ to a geometric rough path is a standard consequence (cf. [FV10a, FV10b]) of the finite 1-variation of $R^0$, the covariance function of $\psi^0(\cdot,t)$. After these remarks,
$$\sup_{\varepsilon \ge 0} \mathbb{E}\exp\bigg(\eta\,\Big|\int_{\mathbb{T}} G\big(\psi^\varepsilon(x,t)\big)\,d_x\psi^\varepsilon(x,t)\Big|^2\bigg) < \infty$$
is an immediate application of Lemma 3.5.7 and Proposition 3.5.1.

4

A simple proof of distance bounds for Gaussian rough paths

The intersection between rough path theory and Gaussian processes has been an active research area in recent years ([FV10a], [FV10b], [Hai11]). The central idea of rough paths, as realized by Lyons ([Lyo98]), is that the key properties needed for defining integration against an irregular path come not only from the path itself, but from the path together with a sequence of iterated integrals along the path, namely

$$\mathbf{X}^n_{s,t} = \int_{s < u_1 < \dots < u_n < t} dX_{u_1} \otimes \cdots \otimes dX_{u_n}. \qquad (4.1)$$
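For smooth paths, the iterated integrals in (4.1) reduce to ordinary Riemann–Stieltjes integrals and can be approximated by Riemann sums. The following sketch (our own illustration; the path and the step count are arbitrary choices) computes levels 1 and 2 for the unit circle and checks that the antisymmetric part of level 2 — the Lévy area — equals the enclosed area $\pi$, in accordance with Green's theorem.

```python
import math

def path(t):
    # smooth path x(t) = (cos t, sin t), t in [0, 2*pi]
    return (math.cos(t), math.sin(t))

def levels_1_2(n_steps=20000, T=2.0 * math.pi):
    """Left-point Riemann sums for X^1_{0,T} and X^2_{0,T} = int X^1_{0,u} (x) dX_u."""
    h = T / n_steps
    X1 = [0.0, 0.0]
    X2 = [[0.0, 0.0], [0.0, 0.0]]
    prev = path(0.0)
    for i in range(1, n_steps + 1):
        cur = path(i * h)
        dx = (cur[0] - prev[0], cur[1] - prev[1])
        for a in range(2):
            for b in range(2):
                X2[a][b] += X1[a] * dx[b]   # accumulate int (X_u - X_0)^a dX_u^b
        X1[0] += dx[0]
        X1[1] += dx[1]
        prev = cur
    return X1, X2

X1, X2 = levels_1_2()
levy_area = 0.5 * (X2[0][1] - X2[1][0])
print(X1, levy_area)  # increment ~ (0, 0) for the closed loop; Levy area ~ pi
```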

In particular, Lyons' extension theorem shows that for paths of finite $p$-variation, the first $\lfloor p \rfloor$ levels of iterated integrals determine all higher levels. For instance, if $p = 1$, the path has bounded variation and the higher iterated integrals coincide with the usual Riemann–Stieltjes integrals. However, for $p \ge 2$, this is not true anymore and one has to say what the second (and possibly higher) order iterated integrals should be before they determine the whole rough path. Lyons and Zeitouni ([LZ99]) were the first to study iterated Wiener integrals in the sense of rough paths. They provide sharp exponential bounds on the iterated integrals of all levels by controlling the variation norm of the Lévy area. The case of more general Gaussian processes was studied by Friz and Victoir in [FV10a] and [FV10b]. They showed that if $X$ is a Gaussian process with covariance of finite $\rho$-variation for some $\rho \in [1,2)$, then its iterated integrals in the sense of (4.1) can be defined in a natural way and we can lift $X$ to a Gaussian rough path $\mathbf{X}$. In the recent work [FR], Friz and the first author compared the two lift maps $\mathbf{X}$ and $\mathbf{Y}$ for the joint process $(X,Y)$. It was shown that their average distance in rough paths topology can be controlled by the value $\sup_t |X_t - Y_t|_{L^2}^{\zeta}$ for some $\zeta > 0$, and a sharp quantitative estimate for $\zeta$ was given. In particular, it was shown that considering both rough paths in a larger rough paths space (and therefore in a different topology) allows for larger choices of $\zeta$. Using this, the authors derived essentially optimal convergence rates for $\mathbf{X}^\varepsilon \to \mathbf{X}$ in rough paths topology as $\varepsilon \to 0$, where $X^\varepsilon$ is a suitable approximation of $X$.

In order to prove this result, sharp estimates of $|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}|$ need to be calculated on every level $n$. Under the assumption $\rho \in [1, \frac{3}{2})$, the sample paths of $\mathbf{X}$ and $\mathbf{Y}$ are $p$-rough paths for any $p > 2\rho$, hence we can always choose $p < 3$, and therefore the first two levels determine the entire rough path. Lyons' continuity theorem then suggests that one only needs to give sharp estimates on levels 1 and 2; the estimates on the higher levels can be obtained from the lower levels through induction. On the other hand, interestingly, one additional level was estimated "by hand" in [FR] before performing the induction. To understand the necessity of computing this additional term, let us note from [Lyo98] that the standard distance for two deterministic $p$-rough paths takes the form of the smallest constants $C_n$ such that
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big| \le C_n\,\omega(s,t)^{\frac{n}{p}}, \qquad n = 1, \dots, \lfloor p \rfloor$$
holds for all $s < t$, where $\omega$ is a control function to be defined later. The exponent on the control for the next level is expected to be
$$\frac{n+1}{p} = \frac{\lfloor p \rfloor + 1}{p} > 1, \qquad (4.2)$$
so when one repeats Young's trick of dropping points in the induction argument (the key idea of the extension theorem), condition (4.2) will ensure that one can establish a maximal inequality for the next level. However, in the current problem where Gaussian randomness is involved, the $L^2$ distance for the first $\lfloor 2\rho \rfloor$ iterated integrals takes the form
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big|_{L^2} < C_n\,\epsilon\,\omega(s,t)^{\frac{1}{2\gamma} + \frac{n-1}{2\rho}}, \qquad n = 1, 2, \quad \rho \in [1, \tfrac{3}{2}),$$
where $\gamma$ might be much larger than $\rho$. Thus, the '$n-1$' in the exponent leaves condition (4.2) unsatisfied, and one needs to compute the third level by hand before starting the induction on $n$. In this chapter, we resolve the difficulty by moving part of $\epsilon$ to fill in the gap in the control, so that the exponent for the third level control reaches 1. In this way, we obtain the third level estimate merely based on the first two levels, and it takes the form
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big|_{L^2} < C_3\,\epsilon^{\eta}\,\omega(s,t),$$
where $\eta \in (0,1]$ and its exact value depends on $\gamma$ and $\rho$. We see that there is a $1-\eta$ reduction in the exponent of $\epsilon$, which is due to the fact that it is used to compensate the control exponent. This interplay between the 'rate' exponent and the control exponent can be viewed as an analogue of the relationship between time and space regularities for solutions to SPDEs. We will make the above heuristic argument rigorous in Section 4.2. We also refer to the recent work [LX12] for the situation of deterministic rough paths. Our main theorem is the following.

Theorem 4.0.1. Let $(X,Y) = (X^1, Y^1, \dots, X^d, Y^d)\colon [0,T] \to \mathbb{R}^{d+d}$ be a centered Gaussian process on a probability space $(\Omega, \mathcal{F}, P)$, where $(X^i, Y^i)$ and $(X^j, Y^j)$ are independent for $i \ne j$, with continuous sample paths and covariance function $R_{(X,Y)}\colon [0,T]^2 \to \mathbb{R}^{2d \times 2d}$. Assume further that there is a $\rho \in [1, \frac{3}{2})$ such that the $\rho$-variation of $R_{(X,Y)}$ is bounded by a finite constant $K$. Let $\gamma \ge \rho$ be such that $\frac{1}{\gamma} + \frac{1}{\rho} > 1$. Then, for every $\sigma > 2\gamma$, $N \ge \lfloor \sigma \rfloor$, $q \ge 1$ and every $\delta > 0$ small enough, there exists a constant $C = C(\rho, \gamma, \sigma, K, \delta, q)$ such that

(i) If $\frac{1}{2\gamma} + \frac{1}{\rho} > 1$, then
$$\big|\rho^N_{\sigma\text{-var}}(\mathbf{X}, \mathbf{Y})\big|_{L^q} \le C \sup_{t \in [0,T]} |X_t - Y_t|_{L^2}^{1 - \frac{\rho}{\gamma}}.$$

(ii) If $\frac{1}{2\gamma} + \frac{1}{\rho} \le 1$, then
$$\big|\rho^N_{\sigma\text{-var}}(\mathbf{X}, \mathbf{Y})\big|_{L^q} \le C \sup_{t \in [0,T]} |X_t - Y_t|_{L^2}^{3 - 2\rho - \delta}.$$

The proof of this theorem will be postponed to subsection 4.2.2 after we have established all the estimates needed.


Remark 4.0.2. We emphasize that the constant C in the above theorem depends on the process (X,Y ) only through the parameters ρ and K.

This chapter is structured as follows. In section 4.1, we introduce the class of Gaussian pro- cesses which possess a lift to Gaussian rough paths and estimate the difference of two Gaussian rough paths on level one and two. Section 4.2 is devoted to the proof of the main theorem. We first obtain the third level estimate directly from the first two levels, which requires a technical extension of Lyons’ continuity theorem, and justify the heuristic argument above rigorously. All higher level estimates are then obtained with the induction procedure in [Lyo98], and the claim of the main theorem follows. In section 4.3, we give two applications of our main theorem. The first one deals with convergence rates for Wong-Zakai approximations in the context of rough differential equations. The second example shows how to derive optimal time regularity for the solution of a modified stochastic heat equation seen as an evolution in rough paths space.

Notations. Throughout the chapter, C,Cn,Cn(ρ, γ) will denote constants depending on certain parameters only, and their actual values may change from line to line.

4.1 2D variation and Gaussian rough paths

If $I = [a,b]$ is an interval, a dissection of $I$ is a finite subset of points of the form $\{a = t_0 < \dots < t_m = b\}$. The family of all dissections of $I$ is denoted by $D(I)$. Let $I \subset \mathbb{R}$ be an interval and $A = [a,b] \times [c,d] \subset I \times I$ be a rectangle. Recall the following definitions: if $f\colon I \times I \to V$ is a function mapping into a normed vector space $V$, we define the rectangular increment $f(A)$ by setting
$$f(A) := f\begin{pmatrix} a, & b \\ c, & d \end{pmatrix} := f\begin{pmatrix} b \\ d \end{pmatrix} - f\begin{pmatrix} a \\ d \end{pmatrix} - f\begin{pmatrix} b \\ c \end{pmatrix} + f\begin{pmatrix} a \\ c \end{pmatrix}.$$
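In code, the rectangular increment is just a four-point evaluation. The following minimal sketch (our own illustration) uses the Brownian covariance $R(s,t) = \min\{s,t\}$, whose increment over $[s,t] \times [u,v]$ is the length of the overlap $(s,t) \cap (u,v)$ (cf. Example 4.1.4 below).

```python
def rect_increment(f, a, b, c, d):
    """Rectangular increment of f over A = [a,b] x [c,d]:
    f(b,d) - f(a,d) - f(b,c) + f(a,c)."""
    return f(b, d) - f(a, d) - f(b, c) + f(a, c)

# Brownian covariance: R(s, t) = min(s, t)
print(rect_increment(min, 0.0, 1.0, 0.0, 1.0))  # overlap of (0,1) with itself: 1.0
print(rect_increment(min, 0.0, 1.0, 2.0, 3.0))  # disjoint intervals: 0.0
print(rect_increment(min, 0.0, 2.0, 1.0, 3.0))  # overlap (1,2): 1.0
```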

Definition 4.1.1. Let $p \ge 1$ and $f\colon I \times I \to V$. For $[s,t] \times [u,v] \subset I \times I$, set
$$V_p(f; [s,t] \times [u,v]) := \Bigg(\sup_{\substack{(t_i) \in D([s,t]) \\ (t'_j) \in D([u,v])}} \sum_{t_i,\, t'_j} \bigg|f\begin{pmatrix} t_i, & t_{i+1} \\ t'_j, & t'_{j+1} \end{pmatrix}\bigg|^p\Bigg)^{\frac{1}{p}}.$$
If $V_p(f; I \times I) < \infty$, we say that $f$ has finite (2D) $p$-variation. We also define
$$V_\infty(f; [s,t] \times [u,v]) := \sup_{\substack{\sigma, \tau \in [s,t] \\ \mu, \nu \in [u,v]}} \bigg|f\begin{pmatrix} \sigma, & \tau \\ \mu, & \nu \end{pmatrix}\bigg|.$$
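For a function given on a grid, the 2D $p$-variation over dissections drawn from that grid can be computed by brute force. The toy sketch below (our own; only feasible for a handful of grid points) does this for the Brownian covariance $\min\{s,t\}$ on $[0,1]^2$: there all rectangular increments are non-negative and sum to 1 for any pair of dissections, so $V_1 = 1$.

```python
from itertools import combinations

def rect_increment(f, a, b, c, d):
    return f(b, d) - f(a, d) - f(b, c) + f(a, c)

def grid_p_variation(f, points, p):
    """Brute-force sup over all dissections (t_i), (t'_j) drawn from `points`
    of (sum_{i,j} |increment of f over [t_i,t_{i+1}] x [t'_j,t'_{j+1}]|^p)^(1/p)."""
    interior = points[1:-1]
    dissections = []
    for r in range(len(interior) + 1):
        for mid in combinations(interior, r):
            dissections.append([points[0], *mid, points[-1]])
    best = 0.0
    for D1 in dissections:
        for D2 in dissections:
            total = sum(
                abs(rect_increment(f, D1[i], D1[i + 1], D2[j], D2[j + 1])) ** p
                for i in range(len(D1) - 1) for j in range(len(D2) - 1))
            best = max(best, total)
    return best ** (1.0 / p)

print(grid_p_variation(min, [0.0, 0.25, 0.5, 0.75, 1.0], 1.0))  # 1.0
```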

Lemma 4.1.2. Let $f\colon I \times I \to V$ be a continuous map and $1 \le p \le p' < \infty$. Assume that $f$ has finite $p$-variation. Then for every $[s,t] \times [u,v] \subset I \times I$ we have
$$V_{p'}(f; [s,t] \times [u,v]) \le V_\infty(f; [s,t] \times [u,v])^{1 - \frac{p}{p'}}\, V_p(f; [s,t] \times [u,v])^{\frac{p}{p'}}.$$


Proof. Let $(t_i) \in D([s,t])$ and $(t'_j) \in D([u,v])$. Then,
$$\sum_{t_i,\, t'_j} \bigg|f\begin{pmatrix} t_i, & t_{i+1} \\ t'_j, & t'_{j+1} \end{pmatrix}\bigg|^{p'} \le V_\infty(f; [s,t] \times [u,v])^{p' - p} \sum_{t_i,\, t'_j} \bigg|f\begin{pmatrix} t_i, & t_{i+1} \\ t'_j, & t'_{j+1} \end{pmatrix}\bigg|^{p}.$$
Taking the supremum over all dissections gives the claim.

Lemma 4.1.3. Let $f\colon I \times I \to \mathbb{R}$ be continuous with finite $p$-variation. Choose $p'$ such that $p' \ge p$ if $p = 1$ and $p' > p$ if $p > 1$. Then there is a control $\omega$ and a constant $C = C(p, p')$ such that
$$V_{p'}(f; J \times J) \le \omega(J)^{\frac{1}{p'}} \le C\, V_p(f; J \times J)$$
holds for every interval $J \subset I$.

Proof. Follows from [FV11, Theorem 1].

Let $X = (X^1, \dots, X^d)\colon I \to \mathbb{R}^d$ be a centered stochastic process. Then the covariance function $R_X(s,t) := \mathrm{Cov}_X(s,t) = \mathbb{E}(X_s \otimes X_t)$ is a map $R_X\colon I \times I \to \mathbb{R}^{d \times d}$ and we can ask for its $\rho$-variation (we will use the letter $\rho$ instead of $p$ in this context). Clearly, $R_X$ has finite $\rho$-variation if and only if for every $i, j \in \{1, \dots, d\}$ the map $(s,t) \mapsto \mathbb{E}(X^i_s X^j_t)$ has finite $\rho$-variation. In particular, if $X^i$ and $X^j$ are independent for $i \ne j$, $R_X$ has finite $\rho$-variation if and only if $R_{X^i}$ has finite $\rho$-variation for every $i = 1, \dots, d$. In the next example, we calculate the $\rho$-variation for the covariances of some well-known real-valued Gaussian processes. In particular, we will see that many interesting Gaussian processes have a covariance of finite 1-variation.

Example 4.1.4

(i) Let $X = B$ be a Brownian motion. Then $R_B(s,t) = \min\{s,t\}$ and thus, for $A = [s,t] \times [u,v]$,
$$|R_B(A)| = |(s,t) \cap (u,v)| = \int_{[s,t] \times [u,v]} \delta_{x=y}\,dx\,dy.$$
This shows that $R_B$ has finite 1-variation on any interval $I$.

(ii) More generally, let $f\colon [0,T] \to \mathbb{R}$ be a left-continuous, locally bounded function. Set
$$X_t = \int_0^t f(r)\,dB_r.$$
Then, for $A = [s,t] \times [u,v]$, we have by the Itô isometry,
$$R_X(A) = \mathbb{E}\bigg[\int_{[s,t]} f\,dB \int_{[u,v]} f\,dB\bigg] = \int_{[s,t] \times [u,v]} \delta_{x=y}\, f(x) f(y)\,dx\,dy,$$
which shows that $R_X$ has finite 1-variation.

(iii) Let $X$ be an Ornstein–Uhlenbeck process, i.e. $X$ is the solution of the SDE

dXt = −θXt dt + σ dBt (4.3)

for some $\theta, \sigma > 0$. If we assume that $X_0 = 0$, one can show that $X$ is centered Gaussian, and a direct calculation shows that the covariance of $X$ has finite 1-variation on any interval $[0,T]$. The same is true for the stationary solution of (4.3).


(iv) If X is a continuous Gaussian martingale, it can be written as a time-changed Brownian motion. Since the ρ-variation of its covariance is invariant under time-change, X has again a covariance of finite 1-variation.

(v) If $X\colon [0,T] \to \mathbb{R}$ is centered Gaussian with $X_0 = 0$, we can define a Gaussian bridge by
$$X_{\mathrm{Bridge}}(t) = X_t - \frac{t}{T}\, X_T.$$
One can easily show that if the covariance of $X$ has finite $\rho$-variation, the same is true for $X_{\mathrm{Bridge}}$. In particular, Brownian bridges have a covariance of finite 1-variation.
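The covariances in items (iii) and (v) can be written down in closed form, which makes the claims easy to check. The sketch below (our own; the parameter values are arbitrary) verifies the standard Ornstein–Uhlenbeck covariance $\mathrm{Cov}(X_s, X_t) = \frac{\sigma^2}{2\theta}\big(e^{-\theta|t-s|} - e^{-\theta(s+t)}\big)$ for $X_0 = 0$ against the Itô isometry, and the Brownian bridge covariance $\min\{s,t\} - st/T$ against the defining expansion.

```python
import math

def ou_cov(s, t, theta=2.0, sigma=1.5):
    """OU covariance with X_0 = 0 (closed form)."""
    return sigma**2 / (2.0 * theta) * (math.exp(-theta * abs(t - s))
                                       - math.exp(-theta * (s + t)))

def ou_cov_isometry(s, t, theta=2.0, sigma=1.5, n=20000):
    """Same covariance via the Ito isometry:
    sigma^2 * int_0^{min(s,t)} e^{-theta(s-u)} e^{-theta(t-u)} du (midpoint rule)."""
    m, acc = min(s, t), 0.0
    h = m / n
    for i in range(n):
        u = (i + 0.5) * h
        acc += math.exp(-theta * (s - u)) * math.exp(-theta * (t - u))
    return sigma**2 * acc * h

def bridge_cov(s, t, T=1.0):
    """Brownian bridge covariance min(s,t) - s*t/T."""
    return min(s, t) - s * t / T

def bridge_cov_expanded(s, t, T=1.0):
    """Expand Cov(X_s - (s/T)X_T, X_t - (t/T)X_T) using E[X_a X_b] = min(a,b)."""
    return (min(s, t) - (t / T) * min(s, T)
            - (s / T) * min(T, t) + (s * t / T**2) * T)

print(ou_cov(0.7, 1.3), ou_cov_isometry(0.7, 1.3))
print(bridge_cov(0.3, 0.8), bridge_cov_expanded(0.3, 0.8))
```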

Next, we cite the fundamental existence result about Gaussian rough paths. For a proof, cf. [FV10a] or [FV10b, Chapter 15].

Theorem 4.1.5 (Friz, Victoir). Let $X\colon [0,T] \to \mathbb{R}^d$ be a centered Gaussian process with continuous sample paths and independent components. Assume that there is a $\rho \in [1,2)$ such that $V_\rho(R_X; [0,T]^2) < \infty$. Then $X$ admits a lift $\mathbf{X}$ to a process whose sample paths are geometric $p$-rough paths for any $p > 2\rho$, i.e. with sample paths in $C^{0,p\text{-var}}\big([0,T], G^{\lfloor p \rfloor}(\mathbb{R}^d)\big)$, and $\pi_1(\mathbf{X}_{s,t}) = X_t - X_s$ for any $s < t$.

In the next proposition, we give an upper L2-estimate for the difference of two Gaussian rough paths on the first two levels.

Proposition 4.1.6. Let $(X,Y) = (X^1, Y^1, \dots, X^d, Y^d)\colon [0,T] \to \mathbb{R}^{d+d}$ be a centered Gaussian process with continuous sample paths, where $(X^i, Y^i)$ and $(X^j, Y^j)$ are independent for $i \ne j$. Let $\rho \in [1, \frac{3}{2})$ and assume that $V_{\rho'}\big(R_{(X,Y)}; [0,T]^2\big) \le K < +\infty$ for a constant $K > 0$, where $\rho' < \rho$ in the case $\rho > 1$ and $\rho' = 1$ in the case $\rho = 1$. Let $\gamma \ge \rho$ be such that $\frac{1}{\gamma} + \frac{1}{\rho} > 1$. Then there are constants $C_0, C_1, C_2$ depending on $\rho, \rho', \gamma$ and $K$ and a control $\omega$ such that $\omega(0,T) \le C_0$ and
$$\big|X_{s,t} - Y_{s,t}\big|_{L^2} \le C_1 \sup_{u \in [s,t]} |X_u - Y_u|_{L^2}^{1 - \frac{\rho}{\gamma}}\ \omega(s,t)^{\frac{1}{2\gamma}}$$
and
$$\bigg|\int_s^t X_{s,u} \otimes dX_u - \int_s^t Y_{s,u} \otimes dY_u\bigg|_{L^2} \le C_2 \sup_{u \in [s,t]} |X_u - Y_u|_{L^2}^{1 - \frac{\rho}{\gamma}}\ \omega(s,t)^{\frac{1}{2\gamma} + \frac{1}{2\rho}}$$
hold for every $s < t$.

Proof. Note first that, by the assumption on $V_{\rho'}\big(R_{(X,Y)}; [0,T]^2\big)$, Lemma 4.1.3 guarantees that there is a control $\omega$ and a constant $c_1 = c_1(\rho, \rho')$ such that
$$V_\rho\big(R_X; [s,t]^2\big) \vee V_\rho\big(R_Y; [s,t]^2\big) \vee V_\rho\big(R_{(X-Y)}; [s,t]^2\big) \le \omega(s,t)^{1/\rho}$$
holds for all $s < t$, with the property that $\omega(0,T) \le c_1 K^\rho =: C_0$. We will estimate both levels componentwise. We start with the first level. Let $i \in \{1, \dots, d\}$. Then,
$$\big|X^i_{s,t} - Y^i_{s,t}\big|^2_{L^2} = R_{(X^i - Y^i)}\begin{pmatrix} s, & t \\ s, & t \end{pmatrix} \le V_\gamma\big(R_{(X-Y)}; [s,t]^2\big)$$
and thus
$$\big|X_{s,t} - Y_{s,t}\big|_{L^2} \le c_2 \sqrt{V_\gamma\big(R_{(X-Y)}; [s,t]^2\big)}.$$

For the second level, consider first the case $i = j$. Using that $(X,Y)$ is Gaussian and that we are dealing with geometric rough paths, we have
$$\bigg|\int_s^t X^i_{s,u}\,dX^i_u - \int_s^t Y^i_{s,u}\,dY^i_u\bigg|_{L^2} = \frac{1}{2}\,\big|(X^i_{s,t})^2 - (Y^i_{s,t})^2\big|_{L^2} = \frac{1}{2}\,\big|(X^i_{s,t} - Y^i_{s,t})(X^i_{s,t} + Y^i_{s,t})\big|_{L^2} \le c_3\,\big|X^i_{s,t} - Y^i_{s,t}\big|_{L^2}\big(|X^i_{s,t}|_{L^2} + |Y^i_{s,t}|_{L^2}\big).$$
From the first part, we know that
$$\big|X^i_{s,t} - Y^i_{s,t}\big|_{L^2} \le \sqrt{V_\gamma\big(R_{(X-Y)}; [s,t]^2\big)}.$$
Furthermore,
$$\big|X^i_{s,t}\big|_{L^2} = \sqrt{R_X\begin{pmatrix} s, & t \\ s, & t \end{pmatrix}} \le \sqrt{V_\rho\big(R_X; [s,t]^2\big)} \le \omega(s,t)^{\frac{1}{2\rho}},$$
and the same holds for $|Y^i_{s,t}|_{L^2}$. Hence
$$\bigg|\int_s^t X^i_{s,u}\,dX^i_u - \int_s^t Y^i_{s,u}\,dY^i_u\bigg|_{L^2} \le c_4 \sqrt{V_\gamma\big(R_{(X-Y)}; [s,t]^2\big)}\ \omega(s,t)^{\frac{1}{2\rho}}.$$
For $i \ne j$,
$$\bigg|\int_s^t X^i_{s,u}\,dX^j_u - \int_s^t Y^i_{s,u}\,dY^j_u\bigg|_{L^2} \le \bigg|\int_s^t (X^i - Y^i)_{s,u}\,dX^j_u\bigg|_{L^2} + \bigg|\int_s^t Y^i_{s,u}\,d(X^j - Y^j)_u\bigg|_{L^2}.$$
We estimate the first term. From independence,
$$\mathbb{E}\bigg[\bigg(\int_s^t (X^i - Y^i)_{s,u}\,dX^j_u\bigg)^2\bigg] = \int_{[s,t]^2} R_{(X^i - Y^i)}\begin{pmatrix} s, & u \\ s, & v \end{pmatrix}\,dR_{X^j}(u,v),$$
where the integral on the right is a 2D Young integral.¹ By a 2D Young estimate (cf. [Tow02]),
$$\bigg|\int_{[s,t]^2} R_{(X^i - Y^i)}\begin{pmatrix} s, & u \\ s, & v \end{pmatrix}\,dR_{X^j}(u,v)\bigg| \le c_5(\rho,\gamma)\,V_\gamma\big(R_{(X^i - Y^i)}; [s,t]^2\big)\,V_\rho\big(R_{X^j}; [s,t]^2\big) \le c_6\,V_\gamma\big(R_{(X-Y)}; [s,t]^2\big)\,\omega(s,t)^{1/\rho}.$$
The second term is treated in exactly the same way. Summarizing, we have shown that
$$\bigg|\int_s^t X_{s,u} \otimes dX_u - \int_s^t Y_{s,u} \otimes dY_u\bigg|_{L^2} \le C \sqrt{V_\gamma\big(R_{(X-Y)}; [s,t]^2\big)}\ \omega(s,t)^{\frac{1}{2\rho}}.$$

¹ The reader might feel a bit uncomfortable at this point, asking why it is allowed to put the expectation inside the integral (which is not even an integral in the Riemann–Stieltjes sense). However, this can be made rigorous by first dealing with processes whose sample paths have bounded variation and passing to the limit afterwards (cf. [FV10a, FV10b, FR, FH]). We decided not to go into too much detail here in order not to distract the reader from the main ideas and to improve readability.


Finally, by Lemma 4.1.2,
$$V_\gamma\big(R_{(X-Y)}; [s,t]^2\big) \le V_\infty\big(R_{(X-Y)}; [s,t]^2\big)^{1 - \rho/\gamma}\ \omega(s,t)^{1/\gamma},$$
and by the Cauchy–Schwarz inequality,
$$V_\infty\big(R_{(X-Y)}; [s,t]^2\big) \le 4 \sup_{u \in [s,t]} |X_u - Y_u|^2_{L^2},$$
which gives the claim.

Corollary 4.1.7. Under the assumptions of Proposition 4.1.6, for every $\gamma$ satisfying $\gamma \ge \rho$ and $\frac{1}{\gamma} + \frac{1}{\rho} > 1$, every $p > 2\rho$ and every $\gamma' > \gamma$, there is a (random) control $\hat\omega$ such that
$$\big|\mathbf{X}^n_{s,t}\big| \le \hat\omega(s,t)^{n/p} \qquad (4.4)$$
$$\big|\mathbf{Y}^n_{s,t}\big| \le \hat\omega(s,t)^{n/p} \qquad (4.5)$$
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big| \le \epsilon\,\hat\omega(s,t)^{\frac{1}{2\gamma'} + \frac{n-1}{p}} \qquad (4.6)$$
hold a.s. for all $s < t$ and $n = 1, 2$, where $\epsilon = \sup_{u \in [0,T]} |X_u - Y_u|_{L^2}^{1 - \rho/\gamma}$. Furthermore, there is a constant $C = C(p, \rho, \gamma, \gamma', K)$ such that
$$|\hat\omega(0,T)|_{L^q} \le CT\big(q^{p/2} + q^{\gamma'}\big)$$
holds for all $q \ge 1$.

Proof. Let $\omega$ be the control from Proposition 4.1.6. We know that

$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big|_{L^2} \le c_1\,\epsilon\,\omega(s,t)^{\frac{1}{2\gamma} + \frac{n-1}{2\rho}}$$
holds for a constant $c_1$ for all $s < t$ and $n = 1, 2$. Furthermore, $|\mathbf{X}^n_{s,t}|_{L^2} \le c_2\,\omega(s,t)^{\frac{n}{2\rho}}$ for a constant $c_2$ for all $s < t$ and $n = 1, 2$, and the same holds for $\mathbf{Y}$ (this just follows from setting $Y \equiv \mathrm{const}$ and $\gamma = \rho$ in Proposition 4.1.6). Now introduce a new process $\tilde X\colon [0,T] \to \mathbb{R}^d$ on the same sample space as $X$ such that for all sample points we have
$$\tilde X_{\omega(0,t)/\omega(0,T)} = X_t, \qquad \forall t \in [0,T],$$
and define $\tilde Y$ in the same way. Then $\tilde{\mathbf{X}}, \tilde{\mathbf{Y}}$ are well defined, multiplicative, and we can replace the control $\omega$ by $c_3 K|t-s|$ for the two re-parametrized processes. Using that $X, Y$ are Gaussian, we may pass from $L^2$ to $L^q$ estimates, and we know that $|\mathbf{X}^n|_{L^q} = O(q^{n/2})$ (same for $\mathbf{Y}$ and $\mathbf{X} - \mathbf{Y}$, cf. [FV10b, Appendix A]). Hence
$$\big|\tilde{\mathbf{X}}^n_{s,t}\big|_{L^q} \le c_4\big(\sqrt{q}\,K^{1/\rho}\big)^n\,|t-s|^{\frac{n}{2\rho}} \qquad (4.7)$$
$$\big|\tilde{\mathbf{Y}}^n_{s,t}\big|_{L^q} \le c_4\big(\sqrt{q}\,K^{1/\rho}\big)^n\,|t-s|^{\frac{n}{2\rho}} \qquad (4.8)$$
$$\big|\tilde{\mathbf{X}}^n_{s,t} - \tilde{\mathbf{Y}}^n_{s,t}\big|_{L^q} \le c_4\,\tilde\epsilon\,\big(\sqrt{q}\,K^{1/\rho}\big)^n\,|t-s|^{\frac{1}{2\gamma} + \frac{n-1}{2\rho}} \qquad (4.9)$$
hold for all $s < t$, $n = 1, 2$ and $q \ge 1$, with $\tilde\epsilon = \epsilon\,K^{\frac{1}{2\gamma} - \frac{1}{2\rho}}$. Using Lemma A.1.1 in the appendix, we see that there is a constant $c_5 = c_5(p, \rho, \gamma, \gamma', K)$ such that
$$\bigg|\sup_{s<t} \frac{|\tilde{\mathbf{X}}^n_{s,t}|}{|t-s|^{n/p}}\bigg|_{L^q} \le c_5\,q^{n/2} \qquad (4.10)$$
$$\bigg|\sup_{s<t} \frac{|\tilde{\mathbf{Y}}^n_{s,t}|}{|t-s|^{n/p}}\bigg|_{L^q} \le c_5\,q^{n/2} \qquad (4.11)$$
$$\bigg|\sup_{s<t} \frac{|\tilde{\mathbf{X}}^n_{s,t} - \tilde{\mathbf{Y}}^n_{s,t}|}{\tilde\epsilon\,|t-s|^{1/p(n)}}\bigg|_{L^q} \le c_5\,q^{n/2} \qquad (4.12)$$


hold for $q$ sufficiently large and $n = 1, 2$, where $\frac{1}{p(n)} = \frac{1}{2\gamma'} + \frac{n-1}{p}$. Set
$$\hat\omega^n_X(s,t) := \sup_{D \subset [s,t]} \sum_{t_i \in D} \big|\mathbf{X}^n_{t_i,t_{i+1}}\big|^{p/n}$$
$$\hat\omega^n_Y(s,t) := \sup_{D \subset [s,t]} \sum_{t_i \in D} \big|\mathbf{Y}^n_{t_i,t_{i+1}}\big|^{p/n}$$
$$\hat\omega^n_{X-Y}(s,t) := \sup_{D \subset [s,t]} \sum_{t_i \in D} \big|\mathbf{X}^n_{t_i,t_{i+1}} - \mathbf{Y}^n_{t_i,t_{i+1}}\big|^{p(n)}$$
and
$$\hat\omega(s,t) := \sum_{n=1,2} \Big(\hat\omega^n_X(s,t) + \hat\omega^n_Y(s,t) + \epsilon^{-p(n)}\,\hat\omega^n_{X-Y}(s,t)\Big)$$
for $s < t$. Clearly, $\hat\omega$ fulfils (4.4), (4.5) and (4.6). Moreover, the notion of $p$-variation is invariant under reparametrization, hence
$$\hat\omega^n_X(0,T) = \sup_{D \subset [0,T]} \sum_{t_i \in D} \big|\mathbf{X}^n_{t_i,t_{i+1}}\big|^{p/n} = \sup_{D \subset [0,T]} \sum_{t_i \in D} \big|\tilde{\mathbf{X}}^n_{t_i,t_{i+1}}\big|^{p/n} \le T\,\sup_{s<t} \frac{|\tilde{\mathbf{X}}^n_{s,t}|^{p/n}}{|t-s|},$$
and similar estimates hold for $\hat\omega^n_Y(0,T)$ and $\hat\omega^n_{X-Y}(0,T)$. By the triangle inequality and the estimates (4.10), (4.11) and (4.12),
$$|\hat\omega(0,T)|_{L^q} \le \sum_{n=1,2} \Big(|\hat\omega^n_X(0,T)|_{L^q} + |\hat\omega^n_Y(0,T)|_{L^q} + \epsilon^{-p(n)}\,|\hat\omega^n_{X-Y}(0,T)|_{L^q}\Big) \le c_6 T\big(q^{p/2} + q^{p(1)/2} + q^{p(2)}\big) \le c_7 T\big(q^{p/2} + q^{\gamma'}\big)$$
for $q$ large enough. We can extend the estimate to all $q \ge 1$ by making the constant larger if necessary.

Corollary 4.1.8. Let $\hat\omega$ be the random control defined in the previous corollary. Then, for every $n$, there exists a constant $c_n$ such that
$$\big|\mathbf{X}^n_{s,t}\big| < c_n\,\hat\omega(s,t)^{\frac{n}{p}}, \qquad \big|\mathbf{Y}^n_{s,t}\big| < c_n\,\hat\omega(s,t)^{\frac{n}{p}}$$
a.s. for all $s < t$. The constants $c_n$ are deterministic and can be chosen such that $c_n \le \frac{2^n}{(n/p)!}$, where $x! := \Gamma(x + 1)$.

Proof. Follows from the extension theorem, cf. [Lyo98, Theorem 2.2.1] or [LCL07, Theorem 3.7].

4.2 Main estimates

In what follows, we let $p \in (2\rho, 3)$. Let $\gamma \ge \rho$ be such that $\frac{1}{\gamma} + \frac{1}{\rho} > 1$. We write $\log^+ x = \max\{\log x, 0\}$, and set
$$\epsilon = \sup_{u \in [0,T]} |X_u - Y_u|_{L^2}^{1 - \frac{\rho}{\gamma}}.$$


4.2.1 Higher level estimates

We first introduce some notation. Suppose $\mathbf{X}$ is a multiplicative functional in $T^N(\mathbb{R}^d)$ with finite $p$-variation controlled by $\omega$, $N \ge \lfloor p \rfloor$. Then, define
$$\hat{\mathbf{X}}_{s,t} = 1 + \sum_{n=1}^{N} \mathbf{X}^n_{s,t} \in T^{N+1}(\mathbb{R}^d).$$
Then $\hat{\mathbf{X}}$ is multiplicative in $T^N$, but in general not in $T^{N+1}$. For any partition $D = \{s = u_0 < u_1 < \dots < u_L < u_{L+1} = t\}$, define
$$\hat{\mathbf{X}}^D_{s,t} := \hat{\mathbf{X}}_{s,u_1} \otimes \cdots \otimes \hat{\mathbf{X}}_{u_L,t} \in T^{N+1}(\mathbb{R}^d).$$

The following lemma gives a construction of the unique multiplicative extension of X to higher degrees. It was first proved in Theorem 2.2.1 in [Lyo98].

Lemma 4.2.1. Let $\mathbf{X}$ be a multiplicative functional in $T^N$. Let $D = \{s < u_1 < \dots < u_L < t\}$ be any partition of $(s,t)$, and let $D^j$ denote the partition with the point $u_j$ removed from $D$. Then,
$$\hat{\mathbf{X}}^D_{s,t} - \hat{\mathbf{X}}^{D^j}_{s,t} = \sum_{n=1}^{N} \mathbf{X}^n_{u_{j-1},u_j} \otimes \mathbf{X}^{N+1-n}_{u_j,u_{j+1}} \in T^{N+1}(\mathbb{R}^d). \qquad (4.13)$$
In particular, its projection onto the subspace $T^N$ is the 0-vector. Suppose further that $\mathbf{X}$ has finite $p$-variation controlled by $\omega$, and $N \ge \lfloor p \rfloor$; then the limit
$$\lim_{|D| \to 0} \hat{\mathbf{X}}^D_{s,t} \in T^{N+1}(\mathbb{R}^d)$$
exists. Furthermore, it is the unique multiplicative extension of $\mathbf{X}$ to $T^{N+1}$ with finite $p$-variation controlled by $\omega$.
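In the simplest setting $d = 1$, where the level-$n$ signature of a smooth path $x$ is $(x_t - x_s)^n/n!$, the construction of Lemma 4.2.1 can be carried out directly. The toy sketch below (our own, hypothetical example) takes $N = 2$, sets the level-3 slot of $\hat{\mathbf{X}}$ to zero, multiplies out $\hat{\mathbf{X}}^D$ in $T^3(\mathbb{R})$ over a fine uniform dissection, and observes that level 3 converges to the true value $(x_t - x_s)^3/3!$.

```python
def chen3(a, b):
    """Product of truncated tensor series (1, a1, a2, a3) over R^1, keeping
    levels 1..3: c_n = sum_k a_k * b_{n-k} (the unit level 0 is implicit)."""
    return (a[0] + b[0],
            a[1] + b[1] + a[0] * b[0],
            a[2] + b[2] + a[0] * b[1] + a[1] * b[0])

def xhat(s, t, x):
    """X-hat of Lemma 4.2.1: exact levels 1 and 2 of the smooth path x, level 3 = 0."""
    inc = x(t) - x(s)
    return (inc, inc**2 / 2.0, 0.0)

def extended_level3(x, s=0.0, t=1.0, n=200):
    """Level 3 of X-hat^D for the uniform dissection of [s,t] into n pieces."""
    h = (t - s) / n
    acc = (0.0, 0.0, 0.0)
    for i in range(n):
        acc = chen3(acc, xhat(s + i * h, s + (i + 1) * h, x))
    return acc[2]

x = lambda u: u * u                     # toy smooth path on [0, 1]
approx = extended_level3(x)
exact = (x(1.0) - x(0.0)) ** 3 / 6.0    # true third level, here 1/6
print(approx, exact)
```

The difference between the two printed values is exactly the sum of the discarded level-3 contributions of the individual factors, which is of order $1/n^2$ here.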

Theorem 4.2.2. Let $(X,Y)$ and $\rho, \gamma$ be as in Proposition 4.1.6. Then for every $p > 2\rho$ and $\gamma' > \gamma$ there exist a constant $C_3$ depending on $p$ and $\gamma'$ and a (random) control $\hat\omega$ such that for all $q \ge 1$ we have
$$|\hat\omega(0,T)|_{L^q} \le M < +\infty,$$
where $M = M(p, \rho, \gamma, \gamma', K, q)$, and the following holds a.s. for all $[s,t]$:

(i) If $\frac{1}{2\gamma'} + \frac{2}{p} > 1$, then
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3\,\epsilon\,\hat\omega(s,t)^{\frac{1}{2\gamma'} + \frac{2}{p}}.$$

(ii) If $\frac{1}{2\gamma'} + \frac{2}{p} = 1$, then
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3\,\epsilon\,\Big(1 + \log^+\big(\hat\omega(0,T)/\epsilon^{\frac{p}{3-p}}\big)\Big)\,\hat\omega(s,t).$$

(iii) If $\frac{1}{2\gamma'} + \frac{2}{p} < 1$, then
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3\,\epsilon^{\frac{3-p}{1 - p/(2\gamma')}}\,\hat\omega(s,t).$$


Proof. Let $s < t \in [0,T]$ and let $\hat\omega$ be the (random) control defined in Corollary 4.1.7. Then, by the same corollary, for every $q \ge 1$, $|\hat\omega(0,T)|_{L^q} \le M$. Fix an enhanced sample rough path $(\mathbf{X}, \mathbf{Y})$ up to level 2 and, for simplicity, use $\omega$ to denote the corresponding realisation of the (random) control $\hat\omega$. We can assume without loss of generality that
$$\epsilon < \omega(s,t)^{\frac{1}{p} - \frac{1}{2\gamma'}}, \qquad (4.14)$$
otherwise there is nothing to prove. Let $D = \{s = u_0 < \dots < u_{L+1} = t\}$ be a dissection. Then (cf. [Lyo98, Lemma 2.2.1]) there exists a $j$ such that
$$\omega(u_{j-1}, u_{j+1}) \le \frac{2}{L}\,\omega(s,t), \qquad L \ge 1. \qquad (4.15)$$
Let $D^j$ denote the dissection with the point $u_j$ removed from $D$. Then, we have
$$\big|(\hat{\mathbf{X}}^D_{s,t} - \hat{\mathbf{Y}}^D_{s,t})^3\big| < \big|(\hat{\mathbf{X}}^{D^j}_{s,t} - \hat{\mathbf{Y}}^{D^j}_{s,t})^3\big| + \sum_{k=1}^{2} \Big(\big|R^k_{u_{j-1},u_j} \otimes \mathbf{X}^{3-k}_{u_j,u_{j+1}}\big| + \big|\mathbf{X}^k_{u_{j-1},u_j} \otimes R^{3-k}_{u_j,u_{j+1}}\big| + \big|R^k_{u_{j-1},u_j} \otimes R^{3-k}_{u_j,u_{j+1}}\big|\Big),$$
where $R_{s,t} = \mathbf{Y}_{s,t} - \mathbf{X}_{s,t}$. By assumption,
$$\big|R^k_{u_{j-1},u_j} \otimes R^{3-k}_{u_j,u_{j+1}}\big| < C \cdot \min\bigg\{\epsilon\,\Big(\frac{1}{L}\,\omega(s,t)\Big)^{\frac{1}{2\gamma'} + \frac{2}{p}},\ \Big(\frac{1}{L}\,\omega(s,t)\Big)^{\frac{3}{p}}\bigg\}, \qquad (4.16)$$
and similar inequalities hold for the other two terms in the bracket. Thus, we have
$$\big|(\hat{\mathbf{X}}^D_{s,t} - \hat{\mathbf{Y}}^D_{s,t})^3\big| < \big|(\hat{\mathbf{X}}^{D^j}_{s,t} - \hat{\mathbf{Y}}^{D^j}_{s,t})^3\big| + C_3 \min\bigg\{\epsilon\,\Big(\frac{1}{L}\,\omega(s,t)\Big)^{\frac{1}{2\gamma'} + \frac{2}{p}},\ \Big(\frac{1}{L}\,\omega(s,t)\Big)^{\frac{3}{p}}\bigg\}.$$
Let $N$ be the integer such that
$$\Big[\frac{1}{N+1}\,\omega(s,t)\Big]^{\frac{1}{p} - \frac{1}{2\gamma'}} \le \epsilon < \Big[\frac{1}{N}\,\omega(s,t)\Big]^{\frac{1}{p} - \frac{1}{2\gamma'}}; \qquad (4.17)$$
then
$$\epsilon\,\Big[\frac{1}{L}\,\omega(s,t)\Big]^{\frac{1}{2\gamma'} + \frac{2}{p}} < \Big[\frac{1}{L}\,\omega(s,t)\Big]^{\frac{3}{p}}$$
if and only if $L \le N$. By Lemma 4.2.1, we have
$$\mathbf{X}^3_{s,t} = \lim_{|D| \to 0} (\hat{\mathbf{X}}^D_{s,t})^3, \qquad \mathbf{Y}^3_{s,t} = \lim_{|D| \to 0} (\hat{\mathbf{Y}}^D_{s,t})^3.$$
Thus, for a fixed partition $D$, we choose a point each time according to (4.15) and drop the points successively. Letting $|D| \to 0$, we obtain
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| \le C_3 \bigg(\sum_{L=1}^{N} \epsilon\,\Big[\frac{1}{L}\,\omega(s,t)\Big]^{\frac{1}{2\gamma'} + \frac{2}{p}} + \sum_{L=N+1}^{+\infty} \Big[\frac{1}{L}\,\omega(s,t)\Big]^{\frac{3}{p}}\bigg).$$
Approximating the sums by integrals, we have
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3 \bigg[\epsilon\,\omega(s,t)^{\frac{1}{2\gamma'} + \frac{2}{p}} \Big(1 + \int_1^N x^{-(\frac{1}{2\gamma'} + \frac{2}{p})}\,dx\Big) + \omega(s,t)^{\frac{3}{p}} \int_N^{+\infty} x^{-\frac{3}{p}}\,dx\bigg].$$
Computing the second integral and using
$$\Big[\frac{1}{N+1}\,\omega(s,t)\Big]^{\frac{1}{p} - \frac{1}{2\gamma'}} \le \epsilon,$$
we obtain
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3 \bigg[\epsilon\,\omega(s,t)^{\frac{1}{2\gamma'} + \frac{2}{p}} \Big(1 + \int_1^N x^{-(\frac{1}{2\gamma'} + \frac{2}{p})}\,dx\Big) + \epsilon^{\frac{3-p}{1 - p/(2\gamma')}}\,\omega(s,t)\bigg]. \qquad (4.18)$$
Now we apply the above estimates to the three situations respectively.

1. $\frac{1}{2\gamma'} + \frac{2}{p} > 1$. In this case, the integral
$$\int_1^N x^{-(\frac{1}{2\gamma'} + \frac{2}{p})}\,dx < \int_1^{+\infty} x^{-(\frac{1}{2\gamma'} + \frac{2}{p})}\,dx < +\infty$$
converges. On the other hand, (4.14) implies
$$\epsilon^{\frac{3-p}{1 - p/(2\gamma')}}\,\omega(s,t) < \epsilon\,\omega(s,t)^{\frac{1}{2\gamma'} + \frac{2}{p}};$$
thus, from (4.18), we get
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3\,\epsilon\,\omega(s,t)^{\frac{1}{2\gamma'} + \frac{2}{p}}.$$

2. $\frac{1}{2\gamma'} + \frac{2}{p} = 1$. In this case, $\frac{1}{p} - \frac{1}{2\gamma'} = \frac{3-p}{p}$ and $\frac{3-p}{1 - p/(2\gamma')} = 1$. Thus, by the second inequality in (4.17), we have
$$\int_1^N x^{-1}\,dx = \log N < \log \omega(s,t) - \frac{p}{3-p}\,\log \epsilon.$$
On the other hand, (4.14) gives
$$\log \omega(s,t) - \frac{p}{3-p}\,\log \epsilon > 0.$$
Combining the previous two bounds with (4.18), we get
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3\,\epsilon\,\Big[1 + \log \omega(s,t) - \frac{p}{3-p}\,\log \epsilon\Big]\,\omega(s,t).$$
We can simplify the above inequality to
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3\,\epsilon\,\Big[1 + \log^+\big(\omega(0,T)/\epsilon^{\frac{p}{3-p}}\big)\Big]\,\omega(s,t),$$
where we have also included the possibility that $\epsilon \ge \omega(0,T)^{\frac{3}{p} - 1}$.

3. $\frac{1}{2\gamma'} + \frac{2}{p} < 1$. Now we have
$$1 + \int_1^N x^{-(\frac{1}{2\gamma'} + \frac{2}{p})}\,dx < C\,N^{1 - \frac{1}{2\gamma'} - \frac{2}{p}} < C\,\epsilon^{-\big(1 - \frac{1}{2\gamma'} - \frac{2}{p}\big)/\big(\frac{1}{p} - \frac{1}{2\gamma'}\big)}\,\omega(s,t)^{1 - \frac{1}{2\gamma'} - \frac{2}{p}},$$
where the second inequality follows from (4.17). Combining the above bound with (4.18), we obtain
$$\big|\mathbf{X}^3_{s,t} - \mathbf{Y}^3_{s,t}\big| < C_3\,\epsilon^{\frac{3-p}{1 - p/(2\gamma')}}\,\omega(s,t).$$


The following theorem, obtained with the standard induction argument, gives estimates for all levels n = 1, 2, ··· .

Theorem 4.2.3. Let $(X,Y)$ and $\rho, \gamma$ be as in Proposition 4.1.6, $p > 2\rho$ and $\gamma' > \gamma$. Then there exists a (random) control $\hat\omega$ such that for every $q \ge 1$ we have
$$|\hat\omega(0,T)|_{L^q} \le M,$$
where $M = M(p, \rho, \gamma, \gamma', q, K)$, and for each $n$ there exists a (deterministic) constant $C_n$ depending on $p$ and $\gamma'$ such that a.s. for all $[s,t]$:

(i) If $\frac{1}{2\gamma'} + \frac{2}{p} > 1$, then we have
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big| < C_n\,\epsilon\,\hat\omega(s,t)^{\frac{1}{2\gamma'} + \frac{n-1}{p}}.$$

(ii) If $\frac{1}{2\gamma'} + \frac{2}{p} = 1$, then we have
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big| < C_n\,\epsilon\,\Big(1 + \log^+\big(\hat\omega(0,T)/\epsilon^{\frac{p}{3-p}}\big)\Big)\,\hat\omega(s,t)^{\frac{1}{2\gamma'} + \frac{n-1}{p}}.$$

(iii) If $\frac{1}{2\gamma'} + \frac{2}{p} < 1$, then for all $s < t$ and all small $\epsilon$, we have
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big| < C_n\,\epsilon^{\frac{3-p}{1 - p/(2\gamma')}}\,\hat\omega(s,t)^{\frac{n-1+\{p\}}{p}}. \qquad (4.19)$$

Proof. We prove the case $\frac{1}{2\gamma'} + \frac{2}{p} < 1$; the other two situations are similar. Let $\hat\omega$ be the control from the previous theorem. Fix an enhanced sample path $(\mathbf{X}, \mathbf{Y})$, the corresponding realisation $\omega$ of $\hat\omega$, and $s < t \in [0,T]$. We may still assume (4.14) without loss of generality. Thus, for $n = 1, 2$, we have
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big| < \epsilon\,\omega(s,t)^{\frac{1}{2\gamma'} + \frac{n-1}{p}} < C_n\,\epsilon^{\frac{3-p}{1 - p/(2\gamma')}}\,\omega(s,t)^{\frac{n-1+\{p\}}{p}},$$
where the second inequality comes from (4.14). The above inequality also holds for $n = 3$ by the previous theorem. Now, suppose (4.19) holds for $k = 1, \dots, n$, where $n \ge 3$; then for level $k = n+1$, the exponent is expected to be
$$\frac{n + \{p\}}{p} > 1,$$
so that the usual induction procedure works (cf. [Lyo98, Theorem 2.2.2]). Thus, we prove (4.19) for all $n$.

4.2.2 Proof of Theorem 4.0.1

Proof. We prove the second situation, when $\frac{1}{2\gamma} + \frac{1}{\rho} \le 1$; the first one is similar. Let $\epsilon = \sup_{u \in [0,T]} |X_u - Y_u|_{L^2}^{1 - \frac{\rho}{\gamma}}$. It is sufficient to show that for every $p > 2\rho$ there is a constant $C$ such that
$$\big|\rho^N_{\sigma\text{-var}}(\mathbf{X}, \mathbf{Y})\big|_{L^q} \le C\,\epsilon^{\frac{3-p}{1 - \rho/\gamma}},$$
where $\sigma > 2\gamma$ and $N \ge \lfloor \sigma \rfloor$ both satisfy the assumptions of Theorem 4.0.1. Set
$$\rho' := (1+\eta)\rho, \qquad p := 2(1+2\eta)\rho, \qquad \gamma' := (1+\eta)\gamma, \qquad \gamma'' := (1+2\eta)\gamma$$


for some $\eta > 0$. We can choose $\eta$ small enough such that $\frac{1}{\rho'} + \frac{1}{\gamma'} > 1$ and $p < 3$ hold, and the conditions of Theorem 4.2.3 are satisfied for $\rho'$ and $\gamma'$. Clearly $\frac{1}{2\gamma''} + \frac{2}{p} < \frac{1}{2\gamma} + \frac{1}{\rho} \le 1$; thus Theorem 4.2.3 implies that
$$\big|\mathbf{X}^n_{s,t} - \mathbf{Y}^n_{s,t}\big| < C_n\,\epsilon^{\frac{3-p}{1 - \rho/\gamma}}\,\hat\omega(s,t)^{\frac{n-1+\{p\}}{p}}$$
holds a.s. for any $n$ and $s < t$, where $\hat\omega$ is a random control as in Theorem 4.2.3. Furthermore, for any $n$,
$$\frac{n-1+\{p\}}{p} = \frac{n}{2\gamma''} + n\Big(\frac{1}{p} - \frac{1}{2\gamma''}\Big) - \frac{1-\{p\}}{p} = \frac{n}{2\gamma''} + (n-1)\Big(\frac{1}{p} - \frac{1}{2\gamma''}\Big) + \Big(1 - \frac{2}{p} - \frac{1}{2\gamma''}\Big). \qquad (4.20)$$
Note that the last expression implies that
$$\theta_n := n\Big(\frac{1}{p} - \frac{1}{2\gamma''}\Big) - \frac{1-\{p\}}{p} > 0$$
for all $n$. Fix a dissection $D = \{0 = u_0 < \dots < u_L < T\}$ of the interval $[0,T]$. Using $\hat\omega(u_i, u_{i+1}) \le \hat\omega(0,T)$, we have
$$\sum_i \big|\mathbf{X}^n_{u_i,u_{i+1}} - \mathbf{Y}^n_{u_i,u_{i+1}}\big|^{\frac{\sigma}{n}} \le C_n^{\frac{\sigma}{n}}\,\epsilon^{\frac{\sigma}{n}\cdot\frac{3-p}{1-\rho/\gamma}}\,\hat\omega(0,T)^{\frac{\sigma}{n}\,\theta_n} \sum_i \hat\omega(u_i, u_{i+1})^{\frac{\sigma}{2\gamma''}}.$$
Choosing $\eta$ smaller if necessary, we may assume that $\sigma \ge 2\gamma''$, and super-additivity of the control implies
$$\sum_i \hat\omega(u_i, u_{i+1})^{\frac{\sigma}{2\gamma''}} \le \hat\omega(0,T)^{\frac{\sigma}{2\gamma''}}.$$
Passing to the supremum over all dissections of $[0,T]$, we have
$$\sup_D \sum_i \big|\mathbf{X}^n_{u_i,u_{i+1}} - \mathbf{Y}^n_{u_i,u_{i+1}}\big|^{\frac{\sigma}{n}} \le C_n^{\frac{\sigma}{n}}\,\epsilon^{\frac{\sigma}{n}\cdot\frac{3-p}{1-\rho/\gamma}}\,\hat\omega(0,T)^{\frac{\sigma}{n}\cdot\frac{n-1+\{p\}}{p}}.$$
Let $q \ge 1$. By Theorem 4.2.3, there is a constant $M$ depending on $\rho, \gamma, \sigma, \delta, q$ and $K$ such that $|\hat\omega(0,T)|_{L^q} \le M$. Taking the $L^q$ norm on both sides, we have
$$\big|\rho^N_{\sigma\text{-var}}(\mathbf{X}, \mathbf{Y})\big|_{L^q} \le C\,\epsilon^{\frac{3-p}{1-\rho/\gamma}},$$
which was the claim.

4.3 Applications

4.3.1 Convergence rates for rough differential equations

Consider the rough differential equation of the form

$$dY_t = \sum_{i=1}^{d} V_i(Y_t)\,dX^i_t =: V(Y_t)\,dX_t; \qquad Y_0 \in \mathbb{R}^e \qquad (4.21)$$


where $X$ is a centered Gaussian process in $\mathbb{R}^d$ with independent components and $V = (V_i)_{i=1}^{d}$ is a collection of bounded, smooth vector fields on $\mathbb{R}^e$ with bounded derivatives. Rough path theory gives meaning to the pathwise solution of (4.21) in the case when the covariance $R_X$ has finite $\rho$-variation for some $\rho < 2$. Assume that $\rho \in [1, \frac{3}{2})$ and that there is a constant $K$ such that
$$V_\rho\big(R_X; [s,t]^2\big) \le K|t-s|^{\frac{1}{\rho}} \qquad (4.22)$$
for all $s < t$ (note that this condition implies that the sample paths of $X$ are $\alpha$-Hölder for all $\alpha < \frac{1}{2\rho}$). For simplicity, we also assume that $[0,T] = [0,1]$. For every $k \in \mathbb{N}$, we approximate the sample paths of $X$ piecewise linearly at the time points $\{0 < 1/k < 2/k < \dots < (k-1)/k < 1\}$. We denote this process by $X^{(k)}$. Clearly, $X^{(k)} \to X$ uniformly as $k \to \infty$. Now we substitute $X^{(k)}$ for $X$ in (4.21), solve the equation and obtain a solution $Y^{(k)}$; we call this the Wong–Zakai approximation of $Y$. One can show, using rough path theory, that $Y^{(k)} \to Y$ a.s. in uniform topology as $k \to \infty$. The proposition below is an immediate consequence of Theorem 4.0.1 and gives us rates of convergence.

Proposition 4.3.1. The mesh size $\frac{1}{k}$ Wong–Zakai approximation converges uniformly to the solution of (4.21) with a.s. rate at least $k^{-(\frac{3}{2\rho} - 1 - \delta)}$ for any $\delta \in \big(0, \frac{3}{2\rho} - 1\big)$. In particular, the rate is arbitrarily close to $\frac{1}{2}$ when $\rho = 1$, which is the sharp rate in that case.

Proof. First, one shows that (4.22) implies that
$$\sup_{t \in [0,1]} \big|X^{(k)}_t - X_t\big|_{L^2} = O\big(k^{-\frac{1}{2\rho}}\big).$$
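For Brownian motion ($\rho = 1$) this estimate can be verified by hand: on each grid interval of length $h = 1/k$ the interpolation error is a Brownian bridge with variance $\lambda(1-\lambda)h$ at relative position $\lambda$, maximal at the midpoint, so that $\sup_t |X^{(k)}_t - X_t|_{L^2} = \frac{1}{2}k^{-1/2}$. The sketch below (our own check, with arbitrary $k$ and evaluation grid) recovers this from the covariance $\min\{s,t\}$ alone.

```python
import math

def interp_error_var(t, k):
    """E|X_t - X^{(k)}_t|^2 for Brownian motion X and its piecewise-linear
    interpolation X^{(k)} on the grid {i/k}, using only E[X_a X_b] = min(a,b)."""
    h = 1.0 / k
    i = min(int(t / h), k - 1)
    t0, t1 = i * h, (i + 1) * h
    lam = (t - t0) / h
    a, b = 1.0 - lam, lam          # X^{(k)}_t = a*X_{t0} + b*X_{t1}
    # expand E[(X_t - a X_{t0} - b X_{t1})^2] term by term
    return (t + a * a * t0 + b * b * t1
            - 2.0 * a * min(t, t0) - 2.0 * b * min(t, t1) + 2.0 * a * b * min(t0, t1))

k = 50
grid = [j / 5000.0 for j in range(5001)]
sup_l2 = max(math.sqrt(max(interp_error_var(t, k), 0.0)) for t in grid)
print(sup_l2, 0.5 / math.sqrt(k))  # both equal k^{-1/2}/2 up to grid resolution
```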

One can show (cf. [FV10b, Chapter 15.2.3]) that there is a constant C such that

1 2 ρ sup Vρ(R(X,X(k));[s, t] ) ≤ C|t − s| k∈N holds for all s < t. By choosing q large enough, a Borel-Cantelli type argument applied to (k) Theorem 4.0.1 shows that ρσ−var(X, X ) → 0 a.s. for k → ∞ with rate arbitrarily close to

− 1 ( 1 − 1 ) 1 1 k 2 ρ γ , if + > 1, 2γ ρ and arbitrarily close to

k^{−(\frac{3}{2ρ} − 1)}, \quad \text{if } \frac{1}{2γ} + \frac{1}{ρ} ≤ 1,

both cases being subject to γ ≥ 3/2 and 1/γ + 1/ρ > 1. Note that in the second situation the actual value of γ does not matter, and we always have a rate of 'almost' \frac{3}{2ρ} − 1. In the first situation, we need to let γ be as large as possible while still satisfying the constraints. The critical value is γ^* = \frac{1}{2(1 − 1/ρ)}, which also results in a rate that is arbitrarily close to \frac{3}{2ρ} − 1. Using the local Lipschitz property of the Itô-Lyons map (cf. [FV10b, Theorem 10.26]), we conclude that the Wong-Zakai convergence rate is faster than

k^{−(\frac{3}{2ρ} − 1 − δ)} \quad \text{for any } δ > 0 \text{ (but not for } δ = 0\text{)}.


Remark 4.3.2. For ρ ∈ (1, 3/2), the rate above is not optimal. In fact, the sharp rate in this case is 'almost' \frac{1}{ρ} − \frac{1}{2}, as shown in [FR]. The reason for the non-optimality is that we obtain the third level estimate merely from the first two levels, which leads to a reduction of the exponent in the rate. On the other hand, this method does not use any Gaussian structure on the third level and can be applied to more general processes. For the case ρ = 1, we recover the sharp rate of 'almost' 1/2.
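The case distinction in the proof can be sanity-checked by elementary arithmetic. The sketch below (illustrative only; the function names are ours) evaluates the rate exponent in both regimes and confirms that at the critical value γ^* = 1/(2(1 − 1/ρ)) both regimes produce the exponent 3/(2ρ) − 1:

```python
def wz_rate_exponent(rho, gamma):
    # Exponent of the a.s. Wong-Zakai rate k^(-exponent) in the two regimes
    # of the proof (constraints gamma >= 3/2 and 1/gamma + 1/rho > 1 assumed).
    if 1 / (2 * gamma) + 1 / rho > 1:
        return 0.5 * (1 / rho - 1 / gamma)   # first regime
    return 3 / (2 * rho) - 1                 # second regime

rho = 1.2
gamma_star = 1 / (2 * (1 - 1 / rho))  # critical gamma separating the regimes
limit = 3 / (2 * rho) - 1             # exponent claimed in Proposition 4.3.1
```

For ρ = 1.2 the critical value is γ^* = 3, and letting γ ↑ γ^* in the first regime reproduces the exponent 3/(2ρ) − 1 = 1/4, matching the second regime; for ρ = 1 the exponent becomes 1/2.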

4.3.2 The stochastic heat equation

In the theory of stochastic partial differential equations (SPDEs), one typically considers the SPDE as an evolution equation in a function space. When it comes to the question of time and space regularity of the solution, one discovers that these depend on the particular choice of this space. As a rule of thumb, the smaller the space, the lower the time regularity ([Hai09], Section 5.1). The most prominent examples of such spaces are Hilbert spaces, typically Sobolev spaces. However, in some cases it can be useful to choose rough paths spaces instead ([Hai11]). A natural question is whether the known regularity results for Hilbert spaces also hold for rough paths spaces. In this section, we study the example of a modified stochastic heat equation for which we can give a positive answer. Consider the stochastic heat equation

dψ = (∂_{xx} − 1)ψ\,dt + σ\,dW \tag{4.23}

where σ is a positive constant, the spatial variable x takes values in [0, 2π], W is space-time white noise, i.e. a standard cylindrical Wiener process on L^2([0, 2π], \mathbb{R}^d), and ψ denotes the stationary solution with values in \mathbb{R}^d. The solution ψ is expected to be almost 1/4-Hölder continuous in time and almost 1/2-Hölder continuous in space (cf. [Hai09]). In the next theorem, we show that this is indeed the case if we choose the appropriate rough paths space.

Theorem 4.3.3. Let p > 2. Then, for any fixed t ≥ 0, the process x ↦ ψ_t(x) is a Gaussian process (in space) which can be lifted to an enhanced Gaussian process Ψ_t(·), a process with sample paths in C^{0,p\text{-var}}([0, 2π], G^{⌊p⌋}(\mathbb{R}^d)). Moreover, t ↦ Ψ_t(·) has a Hölder continuous modification (which we denote by the same symbol). More precisely, for every α ∈ (0, \frac{1}{4} − \frac{1}{2p}), there exists a (random) constant C such that

ρ_{p\text{-var}}(Ψ_s, Ψ_t) ≤ C|t − s|^α

holds almost surely for all s < t. In particular, choosing p large gives a time regularity of almost 1/4-Hölder.

Proof. The fact that x ↦ ψ_t(x) can be lifted to a process with rough sample paths and that there is some Hölder continuity in time was shown in Lemma 3.1 in [Hai11], see also [FH]. We quickly repeat the argument and show where we can use our results in order to derive the exact Hölder exponents. Using the standard Fourier basis

e_k(x) = \begin{cases} \frac{1}{\sqrt{π}} \sin(kx) & \text{if } k > 0 \\ \frac{1}{\sqrt{2π}} & \text{if } k = 0 \\ \frac{1}{\sqrt{π}} \cos(kx) & \text{if } k < 0 \end{cases}

the equation (4.23) can be rewritten as a system of SDEs

dY_t^k = −(k^2 + 1)Y_t^k\,dt + σ\,dW_t^k


where (W^k)_{k∈ℤ} is a collection of independent standard Brownian motions and (Y^k)_{k∈ℤ} are the stationary solutions of the SDEs, i.e. a collection of centered, independent, stationary Ornstein-Uhlenbeck processes. The solution of (4.23) is thus given by the infinite sum ψ_t(x) = \sum_{k∈ℤ} Y_t^k e_k(x). One can easily see that

\mathbb{E}[ψ_s(x) ⊗ ψ_t(y)] = \frac{σ^2}{4π} \sum_{k∈ℤ} \frac{\cos(k(x−y))}{1+k^2}\, e^{−(1+k^2)|t−s|} × I_d

where I_d denotes the identity matrix in \mathbb{R}^{d×d}. In particular, for s = t,

\mathbb{E}[ψ_t(x) ⊗ ψ_t(y)] = K(x − y) × I_d

where K is given by

K(x) = \frac{σ^2}{4 \sinh(π)} \cosh(|x| − π)

for x ∈ [−π, π], extended periodically for the remaining values of x (this can be derived by a Fourier expansion of the function x ↦ \cosh(|x| − π)). In particular, one can calculate that x ↦ ψ_t(x) is a Gaussian process with covariance of finite 1-variation (see the remark at the end of the section for this fact), hence ψ_t can be lifted to a process Ψ_t with sample paths in the rough paths space C^{0,p\text{-var}}([0, 2π], G^{⌊p⌋}(\mathbb{R}^d)) for any p > 2. Furthermore, for any s < t, x ↦ (ψ_s(x), ψ_t(x)) is a Gaussian process which fulfils the assumptions of Theorem 4.0.1, and the covariance R_{(ψ_s,ψ_t)} also has finite 1-variation, uniformly bounded over all s < t, hence
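The claimed closed form for K can be checked against its Fourier series numerically; the sketch below (with an arbitrary illustrative value for σ) truncates the series at |k| ≤ N and compares:

```python
import math

sigma = 1.3   # illustrative noise intensity (any positive value works)
N = 4000      # truncation level of the Fourier series

def K_series(x):
    # sigma^2/(4 pi) * sum_{k in Z} cos(k x)/(1 + k^2), truncated at |k| <= N
    s = 1.0 + 2.0 * sum(math.cos(k * x) / (1.0 + k * k) for k in range(1, N + 1))
    return sigma ** 2 / (4.0 * math.pi) * s

def K_closed(x):
    # closed form sigma^2 cosh(|x| - pi) / (4 sinh(pi)) on [-pi, pi]
    return sigma ** 2 * math.cosh(abs(x) - math.pi) / (4.0 * math.sinh(math.pi))

max_gap = max(abs(K_series(x) - K_closed(x)) for x in (0.5, 1.0, 2.0, 3.0))
```

The remaining gap is of the order of the series tail, roughly σ^2/(2πN).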

\sup_{s<t} |R_{(ψ_s,ψ_t)}|_{1\text{-var};[0,2π]^2} =: c_1 < ∞.

Therefore, for any γ ∈ (1, p/2) and q ≥ 1 there is a constant C = C(p, γ, c1, q) such that

|ρ_{p\text{-var}}(Ψ_s, Ψ_t)|_{L^q} ≤ C \sup_{x∈[0,2π]} |ψ_t(x) − ψ_s(x)|_{L^2}^{1−\frac{1}{γ}}

holds for all s < t. A straightforward calculation (cf. [Hai11, Lemma 3.1]) shows that

|ψ_t(x) − ψ_s(x)|_{L^2} ≤ c_2|t − s|^{1/4}

for a constant c_2. In particular, we can find γ and q large enough such that

α < \frac{\frac{q}{4}(1 − \frac{1}{γ}) − 1}{q} = \frac{1}{4} − \frac{1}{4γ} − \frac{1}{q} < \frac{1}{4} − \frac{1}{2p}.

Since C^{0,p\text{-var}} is a Polish space, we can apply the usual Kolmogorov continuity criterion to conclude.
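The inequality chain for α is plain arithmetic and can be verified directly; in the sketch below (parameter values of our choosing, purely illustrative), γ is taken close to p/2 and q large:

```python
def kolmogorov_exponent(gamma, q):
    # ((q/4)(1 - 1/gamma) - 1)/q = 1/4 - 1/(4 gamma) - 1/q
    return 0.25 - 0.25 / gamma - 1.0 / q

p = 10.0
target = 0.25 - 1.0 / (2.0 * p)                 # 1/4 - 1/(2p) from Theorem 4.3.3
alpha = kolmogorov_exponent(gamma=4.99, q=1e6)  # gamma close to p/2, q large
```

The attainable exponent stays below 1/4 − 1/(2p) but approaches it as γ ↑ p/2 and q → ∞.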

Remark 4.3.4. We emphasize that here, for every fixed t, the process ψ_t(·) is a Gaussian process, where the spatial variable x should now be viewed as 'time'. This idea is due to M. Hairer. Knowing that the spatial regularity is 'almost' 1/2 for every fixed time t, one could guess that the covariance of this spatial Gaussian process has finite 1-variation. For a formal calculation, we refer to [Hai11] or [FH].

5

Spatial rough path lifts of stochastic convolutions

The lack of spatial regularity of solutions to PDEs subject to space-time white noise (or other infinite-dimensional noise) can cause serious obstacles concerning well-posedness and stability. In recent work, Hairer and coauthors [Hai11, HW13, Hai, HMW13] realized that several interesting, at first sight ill-posed, non-linear stochastic PDEs can be solved by constructing a spatial rough path associated to the linearized equation. This program was carried out for a system of stochastic Burgers equations (with motivation from path sampling problems) and more recently for the KPZ equation on the one-dimensional torus. In both cases, the linearized equation is the classical one-dimensional stochastic heat equation

dΨ_t = (∆ − 1)Ψ_t\,dt + dW_t, \quad \text{on } [0, 2π], \tag{SHE}

with periodic boundary conditions. Here ∆ is the one-dimensional Laplacian, i.e. ∆ = ∂_x^2, the summand −1 allows us to consider the corresponding stationary solution, and W is space-time white noise integrated in time. We shall follow Hairer's setup and consider d i.i.d. realizations of (SHE). The question then arises how to construct a spatial rough path over x ↦ Ψ(t, x; ω) for fixed time t. Once such a rough path lift has been constructed, one can view (SHE) as an evolution in a rough path space, a point of view which has proven extremely fruitful in solving new classes of hitherto ill-posed stochastic PDEs. In [Hai11] Hairer succeeded in establishing finite 1-variation (in the 2D sense) of the covariance of the stationary solution to (SHE), i.e. of (x, y) ↦ \mathbb{E}Ψ(t, x)Ψ(t, y). According to (Gaussian) rough path theory in the form of [FV10a, FV10b], this gives a "canonical" way of lifting Ψ to a rough path Ψ. The rough path Ψ is a "level 2" rough path or, more precisely: Ψ is a 1/p-Hölder geometric rough path for any p > 2 as a function of x, with local regularity properties akin to standard Brownian motion as a function of t. It is clear that the Brownian-like regularity of x ↦ Ψ(t, x; ω) is due to the competition between the smoothing effect of the Laplacian and the roughness of space-time white noise. Truncation of the higher noise modes (or suitable "coloring") leads to better spatial regularity; on the other hand, replacing ∆ by a fractional Laplacian, i.e. considering

dΨ_t = (−(−∆)^α − 1)Ψ_t\,dt + dW_t, \quad \text{on } [0, 2π], \tag{fSHE}

for some α ∈ (0, 1), dampens the smoothing effect, and x ↦ Ψ(t, x; ω) will have "rougher" regularity properties than a standard Brownian motion. One thus expects ρ-variation regularity of the spatial covariance of x ↦ Ψ(t, x; ω) only for some ρ > 1 and subsequently only the existence of a "rougher" rough path, i.e. necessarily with higher p than before. Applied in the present context, our main insights/results are as follows:

• The local covariance/decorrelation structure of the stationary fractional stochastic heat equation (fSHE) (in x) is akin to the same structure for fractional Brownian motion (in

t). In quantitative terms,

\text{fBM with Hurst parameter } H \;\longleftrightarrow\; \text{fSHE with } α = \tfrac{1}{2} + H.

As a consequence one has finite 2D ρ-variation of the covariance¹ with

ρ = \frac{1}{2H} \text{ for fBM with } H ≤ \tfrac{1}{2}, \qquad ρ = \frac{1}{2α − 1} \text{ for fSHE with } α ≤ 1.

• Recalling the criticality of ρ^* = 2, we are able to lift the stationary solution to (fSHE) in the spatial variable to a rough path provided

α > α^* = \tfrac{3}{4},

similar to the well-known condition H > H^* = 1/4 for fBM (cf. [CQ02]). More precisely, the resulting (geometric rough) path enjoys \frac{1}{p}-Hölder regularity for any p > 2ρ = \frac{2}{2α−1}. When α > 5/6 we have ρ = \frac{1}{2α−1} < \frac{3}{2} and can pick p < 3. The resulting rough path can then be realized as a "level 2" rough path. In the general case (similar to H ∈ (1/4, 1/3] in the fBM setting) one must go beyond the stochastic area and control the third-level iterated integrals. We emphasize that the notoriously difficult third-level computation need not be repeated in the present context. Everything is obtained as an application of available general theory, once finite ρ-variation of the covariance is established. A satisfactory approximation theory is also available, based on uniform ρ-variation estimates.
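The fBM/fSHE dictionary in the first bullet and the thresholds in the second amount to simple arithmetic, which the following sketch (function names ours) makes explicit:

```python
def rho_fbm(H):
    # 2D rho-variation parameter of the fBM covariance, H <= 1/2
    return 1.0 / (2.0 * H)

def rho_fshe(alpha):
    # 2D rho-variation parameter for fSHE, alpha <= 1; dictionary H = alpha - 1/2
    return 1.0 / (2.0 * alpha - 1.0)
```

Criticality ρ = 2 occurs at α = 3/4 (equivalently H = 1/4), and the "level 2" regime ρ < 3/2 corresponds exactly to α > 5/6.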

• On a more technical level, we give a novel criterion to control the 2D ρ-variation of the covariance of (not necessarily Gaussian) processes with stationary increments and a variance satisfying a decay condition at 0 and a concavity property; this translates into the required local covariance/decorrelation structure to which we alluded in the first point above. Checking this criterion for fBM is essentially trivial. For fSHE it involves a fair amount of Fourier analysis, as does the quest for uniform 2D ρ-variation estimates, e.g. in the case of smoothing fSHE with hyper-viscosity. Among others, we are then led to consider convexity, Hölder regularity and L^1-estimates for Fourier series.

We believe that, following a similar route as Hairer in [Hai],² our techniques may prove useful in solving the fractional KPZ equation

dX_t = −(−∆)^α X_t\,dt + (∇X_t)^2\,dt + dW_t, \quad \text{on } \mathbb{R},

with α ∈ (0, 1). Such fractional KPZ equations arise as models of growing surfaces with impurities (cf. [MW97, XTHX12, Kat03]). Another motivation comes from path sampling problems for diffusions driven by fractional noise. Recall that the original motivation for studying vector-valued stochastic Burgers equations emerged from path sampling problems for SDEs of the form

dZ_u = AZ_u\,du + f(Z_u)\,du + C\,dB_u, \quad \text{in } \mathbb{R}^d, \tag{5.1}

on [0, 2π]. Here f : \mathbb{R}^d → \mathbb{R}^d is a possibly non-linear function and B denotes standard Brownian motion in \mathbb{R}^d. In a series of papers [HSVW05, HSV07, Hai11] the authors realized that the law

¹The result for fBM is of course known (cf. [FV11] and the references therein).
²Let us note that Hairer considers the KPZ equation on the torus while most of the literature is on the whole real line. At the end of Section 5.3 we demonstrate that our results may be used to construct local spatial rough path lifts of solutions to (fSHE) on the real line.


L(Z) on C([0, 2π]; \mathbb{R}^d), conditioned on the endpoints, coincides with the invariant measure of vector-valued stochastic Burgers equations of the type

dX_t^i = \Big(∆X_t^i + \sum_{j=1}^n g_j^i(X_t)\,∂_x X_t^j\Big)dt + dW_t^i, \quad i = 1, \dots, d, \tag{5.2}

with appropriate boundary conditions. Based on this, efficient sampling algorithms for the paths of Z conditioned on the endpoints may be derived. It is tempting to try a similar approach in the fractional case; that is, to sample the law of (5.1), with B replaced by fractional Brownian motion B^H, conditional on its endpoints, via the stationary solution of a suitable fractional SPDE. However, combining the heuristics found in [HSV07] suggests an SPDE of the form (fSHE) with appropriate boundary conditions and with an additional non-local, nonlinear term. At present, handling the resulting SPDE is an open problem, although one suspects that the present considerations will prove useful in this regard. As this work neared completion we learned that Gubinelli and coauthors [GIP12] adapted Hairer's analysis of the stochastic Burgers equation to the fractional case (i.e. (5.2) with ∆ replaced by −(−∆)^α for α > 5/6) and established existence and uniqueness of solutions. In doing so they construct a rough path associated to (fSHE) by direct calculations under the stronger condition α > 5/6. In contrast, our construction of the associated rough path is based on general Gaussian rough path theory and thus allows immediate application of the corresponding general results, such as stability under approximations and moment estimates. Without further effort we deduce estimates on the rate of convergence of approximations following from general Gaussian rough path techniques. On the other hand, a very important contribution of [GIP12] is the possibility of going beyond space dimension one, which is not at all within the scope of our considerations here.
The chapter is structured as follows: In Section 5.1 we establish a general condition ensuring finite ρ-variation for processes with stationary increments and convex or concave variance function. We then use this condition to prove the existence of rough path lifts for such processes. The application of this general result to stationary solutions of fractional stochastic heat equations thus requires a proof of concavity of their (spatial) variance function. Sufficient conditions for the concavity of such variance functions in terms of their Fourier coefficients are derived in Section 5.2 and used in Section 5.3 to lift strictly stationary Ornstein-Uhlenbeck processes corresponding to fractional stochastic heat equations to Gaussian rough paths with respect to their space variable. Moreover, we prove strong convergence in rough path metric of hyper-viscosity and Galerkin approximations. Section 5.3 is then concluded by a brief investigation of the stochastic fractional heat equation on the whole real line.

5.1 Main Result

In this section, we shall consider centered, continuous stochastic processes X = (X^1, \dots, X^d) with i.i.d. components and stationary increments. If X is Gaussian, the construction of a (geometric) rough path associated to X then naturally passes through an understanding of the two-dimensional ρ-variation of the covariance of X^1. For brevity, we abuse notation and write X ≡ X^1 until further notice. The law of such a process is fully determined by

σ^2(u) := \mathbb{E}\,X_{t,t+u}^2 = R_X\begin{pmatrix} t, t+u \\ t, t+u \end{pmatrix}.

Lemma 5.1.1. (i) Assume that σ^2(·) is concave on [0, h] for some h > 0. Then non-overlapping increments are non-positively correlated in the sense that

\mathbb{E}\,X_{s,t}X_{u,v} = R_X\begin{pmatrix} s, t \\ u, v \end{pmatrix} ≤ 0,


for all 0 ≤ s ≤ t ≤ u ≤ v ≤ h.

(ii) Assume in addition that σ^2(·) restricted to [0, h] is non-decreasing. Then

0 ≤ \mathbb{E}\,X_{s,t}X_{u,v} = |\mathbb{E}\,X_{s,t}X_{u,v}| ≤ \mathbb{E}\,X_{u,v}^2 = σ^2(v − u),

for all 0 ≤ s ≤ u ≤ v ≤ t ≤ h.

Proof. (i): This result can be found in [MR06, Lemma 7.2.7]. It follows from the identity

2\,\mathbb{E}\,X_{s,t}X_{u,v} = σ^2(v − s) + σ^2(u − t) − σ^2(v − t) − σ^2(u − s)

and the concavity of the function σ^2.

(ii): Note X_{s,t}X_{u,v} = (a + b + c)b where a = X_{s,u}, b = X_{u,v}, c = X_{v,t}. Applying the algebraic identity

2(a + b + c)b = (a + b)^2 − a^2 + (c + b)^2 − c^2

and taking expectations yields

2\,\mathbb{E}\,X_{s,t}X_{u,v} = \mathbb{E}\,X_{s,v}^2 − \mathbb{E}\,X_{s,u}^2 + \mathbb{E}\,X_{u,t}^2 − \mathbb{E}\,X_{v,t}^2 = σ^2(v − s) − σ^2(u − s) + σ^2(t − u) − σ^2(t − v) ≥ 0,

where we used that σ^2(·) is non-decreasing. We thus see

0 ≤ \mathbb{E}\,X_{s,t}X_{u,v} = |\mathbb{E}\,X_{s,t}X_{u,v}|.

On the other hand, from (a + b + c) b = b2 + ab + cb, and then non-positive correlation of the non-overlapping increments,

\mathbb{E}\,X_{s,t}X_{u,v} = \mathbb{E}\,X_{u,v}^2 + \underbrace{\mathbb{E}\,X_{s,u}X_{u,v} + \mathbb{E}\,X_{v,t}X_{u,v}}_{≤\,0} ≤ \mathbb{E}\,X_{u,v}^2.

This concludes the proof of part (ii).
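The sign statement of Lemma 5.1.1 (i) can be illustrated with fractional Brownian motion, for which σ^2(u) = u^{2H} is concave when H < 1/2 and convex when H > 1/2 (the convex case anticipates Remark 5.1.5 below). A minimal numerical check:

```python
def fbm_cov(s, t, H):
    # Covariance of fractional Brownian motion; sigma^2(u) = u^(2H).
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

def incr_corr(s, t, u, v, H):
    # E[X_{s,t} X_{u,v}] as a rectangle increment of the covariance.
    return (fbm_cov(t, v, H) - fbm_cov(t, u, H)
            - fbm_cov(s, v, H) + fbm_cov(s, u, H))

neg = incr_corr(0.1, 0.2, 0.3, 0.5, H=0.3)  # concave sigma^2: H < 1/2
pos = incr_corr(0.1, 0.2, 0.3, 0.5, H=0.7)  # convex sigma^2: H > 1/2
```

For the disjoint intervals [0.1, 0.2] and [0.3, 0.5] the increment correlation is negative in the concave regime and positive in the convex one.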

Theorem 5.1.2. Let X be a real-valued stochastic process on [0, T] with stationary increments and variance function σ^2(u) := \mathbb{E}(X_u − X_0)^2. Assume that σ^2 : [0, T] → \mathbb{R}_+ is concave and that |σ^2(u)| ≤ C_σ|u|^{1/ρ} for all u ∈ [0, T]. Assume further that for some h > 0, σ^2|_{[0,h]} is non-decreasing. Then the covariance of X is of finite ρ-variation on every rectangle [s, t] × [u, v] ⊆ [0, T]^2. More precisely, if the interior of [s, t] × [u, v] does not intersect the diagonal D = \{(x, x) : x ∈ [0, T]\}, then

V_ρ(R_X; [s, t] × [u, v])^ρ ≤ C_σ^ρ \sqrt{|t − s|}\,\sqrt{|v − u|}. \tag{5.3}

If [s, t] × [u, v] is contained in the strip S_h = \{(x, y) ∈ [0, T]^2 : |x − y| ≤ h\}, then

V_ρ(R_X; [s, t] × [u, v])^ρ ≤ C(|t − s| ∧ |v − u|) \tag{5.4}

holds for some constant C = C(ρ, C_σ) > 0. In particular, there is a constant C_1 = C_1(ρ, C_σ, h) such that

V_ρ(R_X; [s, t]^2)^ρ ≤ C_1|t − s|

holds for all [s, t]^2 ⊆ [0, T]^2, and the covariance of X has finite Hölder-controlled ρ′-variation on [0, T] for all ρ′ > ρ.


Proof. We start by proving (5.3): Take a rectangle [s, t] × [u, v] ⊆ [0, T]^2 and let (t_i) be any dissection of [u, v] and (t′_j) any dissection of [s, t]. Assume first that [s, t] × [u, v] does not intersect D. Using that concavity of σ^2 implies non-positive correlations of disjoint increments (cf. Lemma 5.1.1) and the Cauchy-Schwarz inequality, we have

\sum_{t_i, t′_j} \left| R_X\begin{pmatrix} t_i, t_{i+1} \\ t′_j, t′_{j+1} \end{pmatrix} \right|^ρ = \sum_{t_i, t′_j} |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t′_j,t′_{j+1}}|^ρ ≤ \Big| \sum_{t_i, t′_j} \mathbb{E}\,X_{t_i,t_{i+1}}X_{t′_j,t′_{j+1}} \Big|^ρ
= |\mathbb{E}\,X_{s,t}X_{u,v}|^ρ ≤ σ^2(t − s)^{ρ/2}\, σ^2(v − u)^{ρ/2} ≤ C_σ^ρ \sqrt{|t − s|}\,\sqrt{|v − u|}.

Taking the supremum over all partitions shows (5.3).

Let us now prove (5.4): Let [s, t] × [u, v] ⊆ S_h and assume that |v − u| ≤ |t − s| ≤ h. The proof relies on separating diagonal and off-diagonal rectangles. We will distinguish 5 cases: case 1: u ≤ v ≤ s ≤ t, case 2: u ≤ s ≤ v ≤ t, case 3: s ≤ u ≤ v ≤ t, case 4: s ≤ u ≤ t ≤ v and case 5: s ≤ t ≤ u ≤ v. We will only prove cases 1 to 3; case 4 is similar to case 2 and case 5 is similar to case 1.

Case 1: For any t_i < t_{i+1} ∈ [u, v] we have, using again Lemma 5.1.1,

\sum_{t′_j} |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t′_j,t′_{j+1}}|^ρ ≤ |\mathbb{E}\,X_{t_i,t_{i+1}}X_{s,t}|^ρ

and

|\mathbb{E}\,X_{t_i,t_{i+1}}X_{s,t}| ≤ |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t_i,s}| + |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t_i,t}| ≤ 2σ^2(t_{i+1} − t_i).

Hence

\sum_{t_i, t′_j} |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t′_j,t′_{j+1}}|^ρ ≤ 2^ρ \sum_{t_i} σ^2(t_{i+1} − t_i)^ρ ≤ 2^ρ C_σ^ρ \sum_{t_i} |t_{i+1} − t_i| ≤ 2^ρ C_σ^ρ |v − u|

and passing to the supremum gives

V_ρ(R_X; [s, t] × [u, v])^ρ ≤ C_1|v − u|

where C_1 = 2^ρ C_σ^ρ.

Case 3: For t_i < t_{i+1} ∈ [u, v] we have

3^{1−ρ} \sum_{t′_j} |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t′_j,t′_{j+1}}|^ρ ≤ 3^{1−ρ} |\mathbb{E}\,X_{t_i,t_{i+1}}X_·|^ρ_{ρ\text{-var};[s,t]} ≤ |\mathbb{E}\,X_{t_i,t_{i+1}}X_·|^ρ_{ρ\text{-var};[s,t_i]} + |\mathbb{E}\,X_{t_i,t_{i+1}}X_·|^ρ_{ρ\text{-var};[t_i,t_{i+1}]} + |\mathbb{E}\,X_{t_i,t_{i+1}}X_·|^ρ_{ρ\text{-var};[t_{i+1},t]}. \tag{5.5}

As in case 1, we see that

|\mathbb{E}\,X_{t_i,t_{i+1}}X_·|_{ρ\text{-var};[s,t_i]} ≤ |\mathbb{E}\,X_{t_i,t_{i+1}}X_{s,t_i}| ≤ |\mathbb{E}\,X_{t_i,t_{i+1}}X_{s,t_{i+1}}| + \mathbb{E}\,X_{t_i,t_{i+1}}^2 ≤ 2σ^2(t_{i+1} − t_i).

The third term is bounded analogously. For the middle term in (5.5) we estimate

|\mathbb{E}\,X_{t_i,t_{i+1}}X_·|^ρ_{ρ\text{-var};[t_i,t_{i+1}]} = \sup_{D′ ⊂ [t_i,t_{i+1}]} \sum_{t′_j ∈ D′} |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t′_j,t′_{j+1}}|^ρ ≤ \sup_{D′ ⊂ [t_i,t_{i+1}]} \sum_{t′_j ∈ D′} σ^2(t′_{j+1} − t′_j)^ρ ≤ C_σ^ρ |t_{i+1} − t_i|.


Using these estimates in (5.5) yields

\sum_{t′_j} |\mathbb{E}\,X_{t_i,t_{i+1}}X_{t′_j,t′_{j+1}}|^ρ ≤ C_3|t_{i+1} − t_i|

for some constant C_3, and we conclude as in case 1.

Case 2: We use the estimates from cases 1 and 3 to see that

2^{1−ρ}\,V_ρ(R_X; [s, t] × [u, v])^ρ ≤ V_ρ(R_X; [s, t] × [u, s])^ρ + V_ρ(R_X; [s, t] × [s, v])^ρ

≤ C1|s − u| + C3|v − s|

≤ (C1 ∨ C3)|v − u|.

This finishes the proof of (5.4). If [s, t] × [u, v] is any rectangle contained in [0, T]^2, it can be written as a finite union of rectangles \{A_1, \dots, A_n\} which are contained in S_h or lie off-diagonal. From the inequality

V_ρ(R_X; [s, t] × [u, v])^ρ ≤ C(ρ, n) \sum_{k=1}^n V_ρ(R_X; A_k)^ρ

it follows that the ρ-variation is indeed finite over all rectangles. From equation (5.4), we know that

V_ρ(R_X; [s, t]^2)^ρ ≤ C|t − s|

holds for all [s, t]^2 provided that |t − s| ≤ h. Using (5.3) and choosing the constant C larger, we may conclude that the estimate holds for all squares [s, t]^2 ⊆ [0, T]^2. With [FV11, Theorem 1] this implies the claimed finite Hölder-controlled ρ′-variation for all ρ′ > ρ.
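The scaling |t − s| on the right-hand side of the last estimate can be illustrated for fBM with H < 1/2 (where ρ = 1/(2H)): grid sums of |rectangle increment|^ρ, which lower-bound the ρ-variation, scale exactly linearly in the side length of the square by self-similarity. A sketch:

```python
def fbm_cov(s, t, H):
    # fBM covariance; its rectangle increments scale like lambda^(2H).
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

def grid_var_sum(T, H, n=16):
    # Sum of |rectangle increment|^rho over a uniform n x n grid of [0, T]^2,
    # with rho = 1/(2H); this lower-bounds V_rho(R; [0, T]^2)^rho.
    rho = 1.0 / (2.0 * H)
    pts = [i * T / n for i in range(n + 1)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            inc = (fbm_cov(pts[i + 1], pts[j + 1], H) - fbm_cov(pts[i + 1], pts[j], H)
                   - fbm_cov(pts[i], pts[j + 1], H) + fbm_cov(pts[i], pts[j], H))
            total += abs(inc) ** rho
    return total

ratio = grid_var_sum(2.0, H=0.3) / grid_var_sum(1.0, H=0.3)  # = 2 by scaling
```

Doubling T doubles the grid sum, mirroring the bound V_ρ(R_X; [s, t]^2)^ρ ≤ C_1|t − s|.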

In the sense of the following remark, the assumption in Theorem 5.1.2 that σ^2 is non-decreasing on [0, h] is superfluous. We required it only in order to get control on the ρ-variation on the complete interval [0, h].

Remark 5.1.3. Let σ^2 : [0, T] → \mathbb{R}_+ be a continuous, concave function with σ^2(0) = 0 for some T > 0. Then there is an h ∈ (0, T] such that σ^2 is non-decreasing on [0, h].

Proof. Since σ^2 : [0, T] → \mathbb{R}_+ is a continuous, concave function, each local maximum is a global maximum. Let h := \inf\{t ∈ [0, T] : σ^2(t) = \max_{r∈[0,T]} σ^2(r)\}. Without loss of generality we may assume σ^2 \not\equiv 0. Thus h > 0, and σ^2 is non-decreasing on [0, h], since otherwise σ^2 would have a local maximum in [0, h), thus attaining \max_{r∈[0,T]} σ^2(r) in [0, h), in contradiction to the definition of h.

A continuous, centered Gaussian process satisfying the assumptions from Theorem 5.1.2 with ρ ∈ [1, 2) can be lifted to a geometric p-rough path on the interval [0,T ] (cf. [FV10a, Theorem 35], [FV10b, Theorem 15.33]). Thus, we obtain

Corollary 5.1.4. Let X = (X^1, \dots, X^d) : [0, T] → \mathbb{R}^d be a centered continuous stochastic process with independent components such that each X^i has stationary increments with variance function σ^{2,i}(u) := \mathbb{E}(X_u^i − X_0^i)^2. Assume that for all i = 1, \dots, d, σ^{2,i} : [0, T] → \mathbb{R}_+ is concave, |σ^{2,i}(u)| ≤ C_σ|u|^{1/ρ} for all u ∈ [0, T] and some ρ ≥ 1, and that there is an h > 0 such that σ^{2,i}|_{[0,h]} is non-decreasing. Then the covariance function R : [0, T]^2 → \mathbb{R}^{d×d} has finite ρ-variation and finite Hölder-controlled ρ′-variation for every ρ′ > ρ. In particular, if X is Gaussian and ρ < 2, there exists a continuous G^{[p]}(\mathbb{R}^d)-valued process X such that


(i) X is a geometric p-rough path with X ∈ C_0^{0,\frac{1}{p}\text{-Höl}}([0, T], G^{[p]}(\mathbb{R}^d)) almost surely for every p > 2ρ,

(ii) X lifts X in the sense that π1(Xt) = Xt − X0,

(iii) there is a C = C(ρ, C_σ, h) such that for all s < t in [0, T] and q ∈ [1, ∞),

|d(X_s, X_t)|_{L^q} ≤ C\sqrt{q}\,|t − s|^{\frac{1}{2ρ}},

(iv) (Fernique estimates) for all p > 2ρ there exists η = η(p, ρ, h) > 0 such that

\mathbb{E}\, e^{η \|X\|^2_{\frac{1}{p}\text{-Höl};[0,T]}} < ∞.

Remark 5.1.5. At this point, one might ask what can be said in the case where σ^2(u) = \mathbb{E}(X_u − X_0)^2 is convex on the interval [0, h]. Proceeding as in Lemma 5.1.1 (i), it can be shown that non-overlapping increments are non-negatively correlated in this case. By writing large increments X_{s,t} as a sum of smaller increments X_{s,t} = X_{s,u_1} + X_{u_1,u_2} + X_{u_2,t}, s ≤ u_1 ≤ u_2 ≤ t, it then follows that

R_X\begin{pmatrix} s, t \\ u, v \end{pmatrix} = \mathbb{E}\,X_{s,t}X_{u,v} ≥ 0,

for every rectangle [s, t] × [u, v] ⊆ [0, h]^2. In other words, every rectangular increment of R_X is non-negative. This readily implies finite 1-variation over every rectangle [s, t] × [u, v] ⊆ [0, h]^2. More precisely, we have the estimate

V1(RX ;[s, t] × [u, v]) ≤ |EXs,tXu,v|.

Furthermore, by convexity, 0 ≤ σ^2(u) ≤ uσ^2(1) for u ∈ [0, 1], and we deduce

V_1(R_X; [s, t] × [u, v]) \lesssim \sqrt{|t − s|}\,\sqrt{|v − u|}

for all [s, t] × [u, v] ⊆ [0, h]^2. The same is of course true for multidimensional processes X = (X^1, \dots, X^d) under the condition that every σ^{2,i}(u) = \mathbb{E}(X_u^i − X_0^i)^2 is convex on [0, h]. If X is Gaussian, this implies that we can lift X to a process with sample paths in the rough paths space C_0^{0,\frac{1}{p}\text{-Hölder}}([0, h], G^{[p]}(\mathbb{R}^d)) for every p > 2. Moreover, the concatenation of geometric rough paths over adjacent intervals is again a rough path (cf. [CLL12], Lemma 4.9), and hence we can lift the sample paths of Gaussian processes X : [0, T] → \mathbb{R}^d to Gaussian rough paths on the whole interval [0, T] provided there is a (possibly small) h > 0 such that every σ^{2,i} is convex on [0, h]. For instance, this applies to fractional Brownian motion with Hurst parameter H ≥ 1/2. However, from a rough paths point of view, these cases are rather trivial since we can use Itô calculus or Young's integration theory to define iterated integrals.

5.2 Conditions in terms of Fourier coefficients

The application of Theorem 5.1.2 to stationary solutions Ψ(x) of the stochastic fractional heat equation (fSHE), as functions of their spatial variable, leads to continuous, stationary, centered Gaussian processes Ψ : S^1 → \mathbb{R}, S^1 = [0, 2π]/\{0, 2π\}, with Fourier decomposition

Ψ(x) = \sum_{k∈ℤ} Z^k e^{ikx},

where the Z^k are complex Gaussian random variables³ with covariance

\mathbb{E}\,Z^k \bar{Z}^l = δ_{k,l}\,\frac{a_k}{2}

and real-valued coefficients a_k satisfying a_k = a_{−k} for all k ∈ ℕ. Then the covariance R_K is of the form

R_K(x, y) = \mathbb{E}\,Ψ(x)\overline{Ψ(y)} = \mathbb{E}\,Ψ(x)Ψ(y) = \frac{1}{2}\sum_{k∈ℤ} a_k e^{ik(x−y)}, \qquad ∀x, y ∈ [0, 2π].

Therefore,

R_K(x, y) = K(|x − y|) = \frac{a_0}{2} + \sum_{k=1}^∞ a_k \cos(k|x − y|), \tag{5.6}

for some K ∈ C([0, 2π]). In the following let ∆, ∆^2 be the first and second forward-difference operators, i.e. for a sequence \{a_k\}_{k∈ℕ},

∆a_k := a_{k+1} − a_k \quad \text{and} \quad ∆^2 := ∆ ∘ ∆.

Moreover, let

D_n(x) := \sum_{k=−n}^n e^{ikx} = 1 + 2\sum_{k=1}^n \cos(kx), \quad x ∈ \mathbb{R},

be the Dirichlet kernel and

F_n(x) := \sum_{k=0}^n D_k(x), \quad x ∈ \mathbb{R},

be the unnormalized Fejér kernel.
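For later use it is worth recalling the closed forms D_n(x) = \sin((n + \frac{1}{2})x)/\sin(x/2) and F_n(x) = (\sin(\frac{(n+1)x}{2})/\sin(x/2))^2, which give in particular F_n ≥ 0 and the bound F_n(x) ≤ C/x^2 + C/(2π − x)^2 used below. A quick numerical confirmation:

```python
import math

def dirichlet(n, x):
    # D_n(x) = 1 + 2 sum_{k=1}^n cos(kx)
    return 1.0 + 2.0 * sum(math.cos(k * x) for k in range(1, n + 1))

def fejer(n, x):
    # Unnormalized Fejer kernel F_n(x) = sum_{k=0}^n D_k(x)
    return sum(dirichlet(k, x) for k in range(n + 1))

n, x = 5, 1.0
d_closed = math.sin((n + 0.5) * x) / math.sin(0.5 * x)
f_closed = (math.sin(0.5 * (n + 1) * x) / math.sin(0.5 * x)) ** 2
```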

Lemma 5.2.1. Let \{a_k\}_{k∈ℕ} be such that ∆^2(k^2 a_k) ≤ 0 for all k ∈ ℕ and

\lim_{k→∞} \big( k^3|∆^2 a_k| + k^2|∆a_k| + k|a_k| \big) = 0. \tag{5.7}

Then

K(x) = \frac{a_0}{2} + \sum_{k=1}^∞ a_k \cos(kx)

exists locally uniformly in (0, 2π), is convex on [0, 2π] and decreasing on [0, π].

Proof. We first note that since

∆(k^2 a_k) = k^2 ∆a_k + (2k + 1)a_{k+1}

and

∆^2(k^2 a_k) = k^2 ∆^2 a_k + 2(2k + 1)∆a_{k+1} + 2a_{k+2},

assumption (5.7) is equivalent to

\lim_{k→∞} \big( k|∆^2(k^2 a_k)| + |∆(k^2 a_k)| + k|a_k| \big) = 0. \tag{5.8}

We now follow ideas from [Kra11]. Using the Abel transformation we observe

S_n(x) = \frac{a_0}{2} + \sum_{k=1}^n a_k \cos(kx) = −\frac{1}{2}\sum_{k=0}^{n} ∆a_k D_k(x) + \frac{1}{2} a_{n+1} D_n(x).

3Actually, the Gaussian structure plays no role in this section. Our entire analysis applies whenever the covariance function has the form (5.6).

114 Conditions in terms of Fourier coefficients

By the assumptions and (5.8) we have \sum_{k=1}^∞ |∆a_k| < ∞. Since \sup_{n∈ℕ} |D_n(x)| is bounded locally uniformly on (0, 2π) and a_n → 0, we observe that

K(x) := \frac{a_0}{2} + \sum_{k=1}^∞ a_k \cos(kx) = −\frac{1}{2}\sum_{k=0}^∞ ∆a_k D_k(x)

exists locally uniformly and is continuous in (0, 2π). The Cesàro means of the sequence S_n(x) are given by

σ_n(x) = \frac{a_0}{2} + \sum_{k=1}^n \Big(1 − \frac{k}{n+1}\Big) a_k \cos(kx).

By Fejér's Theorem [Zyg59, Theorem III.3.4] and continuity of K, σ_n → K locally uniformly in (0, 2π). Hence σ_n'' → K'' in the space of distributions on (0, 2π). Clearly,

σ_n''(x) = −\sum_{k=0}^n \Big(1 − \frac{k}{n+1}\Big) k^2 a_k \cos(kx).

Let β_k := \big(1 − \frac{k}{n+1}\big) k^2 a_k. Using summation by parts twice we obtain

2σ_n''(x) = \sum_{k=0}^n ∆β_k D_k(x) = ∆β_n F_n(x) − \sum_{k=0}^{n−1} ∆^2 β_k F_k(x)
= −\sum_{k=0}^{n−1} \Big( ∆^2(k^2 a_k) − \frac{k∆^2(k^2 a_k) + 2∆((k+1)^2 a_{k+1})}{n+1} \Big) F_k(x) − \frac{n^2}{n+1} a_n F_n(x),

where we set β_{n+1} := 0.

By (5.8) and 0 ≤ F_n(x) ≤ \frac{C}{x^2} + \frac{C}{(2π−x)^2} it follows that

\liminf_{n→∞} \inf_{x∈[ε,2π−ε]} σ_n''(x) ≥ 0

for every ε > 0. For any non-negative test function ϕ ∈ C_c^∞(0, 2π), Fatou's Lemma implies

K''(ϕ) = \lim_{n→∞} σ_n''(ϕ) ≥ \int_0^{2π} \liminf_{n→∞} σ_n''(x)\,ϕ(x)\,dx ≥ 0,

i.e. K'' is a non-negative distribution on (0, 2π). Thus K is convex on [0, 2π]. Assume now that K is not decreasing on [0, π], i.e. there are x < y ∈ [0, π] such that K(x) < K(y). Since K is given as a cosine series, we have K(x) = K(x′) and K(y) = K(y′) for x′ = 2π − x and y′ = 2π − y. Choose t ∈ (0, 1) such that tx + (1 − t)x′ = y. Then

K(tx + (1 − t)x′) = K(y) > K(x) = tK(x) + (1 − t)K(x′),

which contradicts the convexity of K.
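The forward-difference product identities used at the beginning of the proof can be checked mechanically; the sketch below does so for an arbitrary test sequence (the numbers are meaningless, only the identities matter):

```python
def forward_diff(seq):
    # (Delta a)_k = a_{k+1} - a_k for a finite sequence
    return [seq[k + 1] - seq[k] for k in range(len(seq) - 1)]

# arbitrary test values; only the identities are being exercised
a = [3.0, 1.0, 4.0, 1.5, 9.0, 2.0, 6.0]
k2a = [k * k * a[k] for k in range(len(a))]

d1, d2 = forward_diff(k2a), forward_diff(forward_diff(k2a))
da, d2a = forward_diff(a), forward_diff(forward_diff(a))

# Delta(k^2 a_k) = k^2 Delta a_k + (2k+1) a_{k+1}
first_ok = all(abs(d1[k] - (k * k * da[k] + (2 * k + 1) * a[k + 1])) < 1e-9
               for k in range(len(d1)))

# Delta^2(k^2 a_k) = k^2 Delta^2 a_k + 2(2k+1) Delta a_{k+1} + 2 a_{k+2}
second_ok = all(abs(d2[k] - (k * k * d2a[k] + 2 * (2 * k + 1) * da[k + 1]
                             + 2 * a[k + 2])) < 1e-9
                for k in range(len(d2)))
```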

So far, we need to require concavity of the Fourier coefficients k^2 a_k in order to control the 2D ρ-variation of the corresponding covariance function R_K. Smoothing R_K should preserve the control of its 2D ρ-variation, while it does not have to preserve concavity of the Fourier coefficients. The following proposition shows that finiteness of the 2D ρ-variation is indeed preserved under convolution with a measure of finite total variation and that it can be bounded by the total variation norm of this measure (cf. also [FV10b, Proposition 5.64]).


In the following let M(S^1) be the space of signed, real Borel measures on S^1 with finite total variation ‖·‖_{TV}. We say that a sequence μ_n ∈ M(S^1) converges weakly to μ ∈ M(S^1) if

μ_n(ϕ) := \int_0^{2π} ϕ(x)\,dμ_n(x) → μ(ϕ) = \int_0^{2π} ϕ(x)\,dμ(x), \quad \text{for } n → ∞,

for all 2π-periodic Lipschitz functions ϕ ∈ \text{Lip}([0, 2π]). Define M^w(S^1) to be M(S^1) endowed with the topology of weak convergence. For B ∈ L^1(S^1) we set μ_B := B\,dx ∈ M(S^1) to be the associated measure with density B.

Proposition 5.2.2. Assume that the covariance of Ψ is of the form

R(x, y) = \sum_{k∈ℤ} a_k b_k e^{ik(x−y)} \tag{5.9}

with a_k = a_{−k}, b_k = b_{−k} real-valued for all k ∈ ℕ, and that there is a real-valued measure μ ∈ M(S^1) such that

b_k = \int_0^{2π} e^{−ikz}\, μ(dz).

Moreover, assume that R_K(x, y) = \sum_{k∈ℤ} a_k e^{ik(x−y)} is of finite ρ-variation on [0, 2π]^2 and \sum_{k∈ℤ} |a_k| < ∞. Then R is of finite ρ-variation with

V_ρ(R; [s, t] × [u, v]) ≤ ‖μ‖_{TV} \sup_{0≤z≤2π} V_ρ(R_K; [s − z, t − z] × [u, v]) ≤ 2‖μ‖_{TV}\, V_ρ(R_K; [0, 2π]^2)

for every [s, t] × [u, v] ⊆ [0, 2π]^2. Both estimates also hold for the controlled ρ-variation.

Proof. Since \sum_{k∈ℤ} |a_k| < ∞ we observe

R(x, y) = \sum_{k∈ℤ} a_k b_k e^{ik(x−y)} = (R_K(·, y) ∗ μ)(x) = \int_0^{2π} R_K(x − z, y)\,dμ(z).

Thus, using Jensen's inequality,

\left| R\begin{pmatrix} x, y \\ x′, y′ \end{pmatrix} \right|^ρ ≤ \Big( \int_0^{2π} \left| R_K\begin{pmatrix} x − z, y − z \\ x′, y′ \end{pmatrix} \right| d|μ|(z) \Big)^ρ ≤ ‖μ‖_{TV}^ρ \int_0^{2π} \left| R_K\begin{pmatrix} x − z, y − z \\ x′, y′ \end{pmatrix} \right|^ρ \frac{d|μ|(z)}{‖μ‖_{TV}}.

We will first show the estimate for the controlled ρ-variation. Let Π ∈ P([s, t] × [u, v]). Then

\sum_{[x,y]×[x′,y′]∈Π} \left| R\begin{pmatrix} x, y \\ x′, y′ \end{pmatrix} \right|^ρ ≤ ‖μ‖_{TV}^ρ \sum_{[x,y]×[x′,y′]∈Π} \int_0^{2π} \left| R_K\begin{pmatrix} x − z, y − z \\ x′, y′ \end{pmatrix} \right|^ρ \frac{d|μ|(z)}{‖μ‖_{TV}}
≤ ‖μ‖_{TV}^ρ \int_0^{2π} |R_K|^ρ_{ρ\text{-var};[s−z,t−z]×[u,v]} \frac{d|μ|(z)}{‖μ‖_{TV}} ≤ ‖μ‖_{TV}^ρ \sup_{0≤z≤2π} |R_K|^ρ_{ρ\text{-var};[s−z,t−z]×[u,v]}.

Taking the supremum over all partitions yields the first inequality. For the second inequality, we use periodicity to see that

|R_K|^ρ_{ρ\text{-var};[s−z,t−z]×[u,v]} ≤ |R_K|^ρ_{ρ\text{-var};[−2π,2π]×[0,2π]} ≤ 2^{ρ−1}\big( |R_K|^ρ_{ρ\text{-var};[−2π,0]×[0,2π]} + |R_K|^ρ_{ρ\text{-var};[0,2π]^2} \big) ≤ 2^ρ |R_K|^ρ_{ρ\text{-var};[0,2π]^2}.


The estimate for the usual ρ-variation follows in exactly the same way by considering only grid-like partitions of [s, t] × [u, v], i.e. partitions of the form

\{ [t_i, t_{i+1}] × [t′_j, t′_{j+1}] \mid (t_i) \text{ a partition of } [s, t] \text{ and } (t′_j) \text{ a partition of } [u, v] \}.

Remark 5.2.3. In many cases, z ↦ V_ρ(R_K; [s − z, t − z] × [s, t]) attains its maximum at z = 0. If μ = δ_0, we have b_k = 1 for every k, and the estimate above is sharp.

In order to use Proposition 5.2.2 to control the ρ-variation of R(x, y), we need to control ‖μ‖_{TV}. Since b_k = b_{−k} we observe

\sum_{k∈ℤ} b_k e^{ikx} = 2\Big( \frac{b_0}{2} + \sum_{k=1}^∞ b_k \cos(kx) \Big).

Recall

Lemma 5.2.4. Let \{b_k\}_{k∈ℤ} be a sequence satisfying b_k = b_{−k} and b_k → b ∈ \mathbb{R} for k → ∞, and let S_n(x) := \frac{1}{2π}\sum_{k=−n}^n b_k e^{ikx}. Assume one of the following conditions:

(i) \sum_{k=1}^∞ |b_k − b| < ∞.

(ii) There exists a non-increasing sequence A_k such that \sum_{k=0}^∞ A_k < ∞ and |∆b_k| ≤ A_k for all k ≥ 0.

(iii) b_k is quasi-convex, i.e.

\sum_{k=0}^∞ (k + 1)|∆^2 b_k| < ∞.

Then B(x) = \frac{1}{2π}\sum_{k∈ℤ} (b_k − b)e^{ikx} converges locally uniformly on (0, 2π), and the right-hand side is the Fourier series of B. Moreover,

μ_{S_n} ⇀ μ_B + bδ_0 =: μ \quad \text{weakly in } M(S^1),

and b_k = \int_0^{2π} e^{−ikz}\, μ(dz). Moreover, there is a numerical constant C > 0 such that

P∞  k=1 |bk − b|, in case (1) P∞ kµkTV ≤ |b| + C k=1 Ak, in case (2) (5.10) P∞ 2 k=0(k + 1)|∆ bk|, in case (3). Proof. Step 1: We first restrict to the case b = 0. It is enough to prove that B(x) = 1 P b eikx exists locally uniformly and in L1([0, 2π]) 2π k∈Z k with L1-bound corresponding to (5.10). (1): obvious. P∞ (2): Assume that there exists a decreasing sequence Ak such that k=0 Ak < ∞ and |∆bk| ≤ Ak for all k ≥ 0. In this case the claim has been proven in [Tel73]. For completeness we include the proof. Arguing as in the proof of Lemma 5.2.1 we obtain that

$$
2\pi B(x) = \frac{b_0}{2} + \sum_{k=1}^\infty b_k\cos(kx) = \frac{1}{2}\sum_{k=0}^\infty \Delta b_k\, D_k(x)
$$

exists locally uniformly in $(0,2\pi)$.

117 Rough path lifts of stoch. convolutions

Since $\sum_{k=0}^\infty A_k < \infty$ we have $kA_k \to 0$. Using summation by parts twice we obtain

$$
\frac{b_0}{2} + \sum_{k=1}^\infty b_k\cos(kx) = \frac{1}{2}\sum_{k=0}^\infty A_k\,\frac{\Delta b_k}{A_k}\, D_k(x) = \frac{1}{2}\sum_{k=0}^\infty |\Delta A_k| \left(\sum_{i=0}^k \frac{\Delta b_i}{A_i}\, D_i(x)\right).
$$

Hence,

$$
\int_0^{2\pi}\left|\frac{b_0}{2} + \sum_{k=1}^\infty b_k\cos(kx)\right| dx \le \frac{1}{2}\sum_{k=0}^\infty |\Delta A_k| \int_0^{2\pi}\left|\sum_{i=0}^k \frac{\Delta b_i}{A_i}\, D_i(x)\right| dx.
$$

In [Tel73, Lemma 1] it is shown that there is a constant $C > 0$ such that for each sequence $\{a_i\}_{i\in\mathbb{N}}$ with $|a_i| \le 1$

$$
\int_0^{2\pi}\left|\sum_{i=0}^k a_i D_i(x)\right| dx \le C(k+1).
$$

Therefore,

$$
\int_0^{2\pi}\left|\frac{b_0}{2} + \sum_{k=1}^\infty b_k\cos(kx)\right| dx \le C\sum_{k=0}^\infty |\Delta A_k|(k+1) = C\sum_{k=0}^\infty A_k < \infty.
$$

Consequently, $\frac{b_0}{2} + \sum_{k=1}^\infty b_k\cos(kx) \in L^1([0,2\pi])$ and the $b_i$ are the Fourier coefficients corresponding to $B$.

(iii): Let now $\sum_{k=0}^\infty (k+1)|\Delta^2 b_k| < \infty$. A direct proof of this classical result can be found in [Kol23]. Here, we will prove that the assumptions in (ii) are implied. We choose $A_k := \sum_{i=k}^\infty |\Delta^2 b_i|$. Then

$$
\sum_{k=0}^\infty A_k = \sum_{k=0}^\infty \sum_{i=k}^\infty |\Delta^2 b_i| = \sum_{k=0}^\infty (k+1)|\Delta^2 b_k| < \infty
$$

and the claim follows from (ii).

Step 2: Let now $b \in \mathbb{R}$. We split up $S_n$ as

$$
2\pi S_n(x) = \sum_{k=-n}^n b_k e^{ikx} = \sum_{k=-n}^n (b_k - b)e^{ikx} + b\,D_n(x).
$$

By the first step we know that

$$
2\pi B(x) = \sum_{k\in\mathbb{Z}} (b_k - b)e^{ikx}
$$

exists locally uniformly in $(0,2\pi)$ and in $L^1([0,2\pi])$ with a bound on the $L^1$-norm given by (5.10). Therefore, the measures $\mu_{B_n} = \frac{1}{2\pi}\sum_{k=-n}^n (b_k - b)e^{ikx}\,dx$ converge weakly to $\mu_B$ in $M(S^1)$. It is well-known that $D_n(x)\,dx \to 2\pi\delta_0$ weakly in $M(S^1)$. Thus,

$$
\mu_{S_n} \rightharpoonup \mu_B + b\,\delta_0 =: \mu \quad \text{in } M(S^1).
$$

The bound on the total variation norm of µ then follows from step one.
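Condition (ii) can be illustrated numerically (a sanity check with an illustrative sequence, not taken from the text): for $b_k = 1/(k+1)$ one has $|\Delta b_k| = \frac{1}{(k+1)(k+2)} \le A_k := (k+1)^{-2}$ with $(A_k)$ non-increasing and summable, so the $L^1$-norms of the partial sums $S_n$ should stay bounded as $n$ grows.

```python
import numpy as np

# Illustrative check of Lemma 5.2.4 (ii) -- the sequence b_k = 1/(k+1) is a
# hypothetical choice: |Δb_k| <= A_k := 1/(k+1)^2, with (A_k) non-increasing
# and summable, so the L^1-norms of S_n(x) = (1/2π) Σ_{|k|<=n} b_k e^{ikx}
# should remain bounded uniformly in n.
x = np.linspace(1e-6, 2 * np.pi - 1e-6, 40_001)
dx = x[1] - x[0]

def l1_norm_partial_sum(n):
    # S_n is real-valued: (1/2π) (b_0 + 2 Σ_{k=1}^n b_k cos(kx))
    s = 1.0 + 2.0 * sum(np.cos(k * x) / (k + 1) for k in range(1, n + 1))
    return np.sum(np.abs(s / (2.0 * np.pi))) * dx

norms = [l1_norm_partial_sum(n) for n in (50, 100, 200, 400)]
```

The computed norms stabilize quickly, in line with the uniform $L^1$-bound $C\sum_k A_k$ behind (5.10).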

Lemma 5.2.4 in combination with Proposition 5.2.2 allows us to derive bounds on the $\rho$-variation of covariance functions of type (5.9) which depend on $\mu$ only via its total variation norm. Since we will use this to prove uniform estimates, we will need the following uniform estimates on the $L^1$-norm of Fourier series.


Lemma 5.2.5. Let $b \in C^1(0,\infty)$ with $b(r) \to 0$ for $r \to \infty$ and set $b_k^\tau := b(\tau^m k)$ for some $\tau, m > 0$. If

(i) $b$ is convex and non-increasing, then $b_k^\tau$ satisfies the assumptions of Lemma 5.2.4, (ii), $B^\tau(x) = \frac{b_0^\tau}{2} + \sum_{k=1}^\infty b_k^\tau\cos(kx)$ exists locally uniformly in $(0,2\pi)$ and
$$
\|B^\tau\|_{L^1([0,2\pi])} \le C\, b_0^\tau,
$$
for some $C > 0$.

(ii) $b \in C^2(0,\infty)$ with $r|b''(r)| \in L^1(\mathbb{R}_+)$, then $b_k^\tau$ satisfies the assumptions of Lemma 5.2.4, (iii), and
$$
\|B^\tau\|_{L^1([0,2\pi])} \le C\int_0^\infty r|b''(r)|\,dr,
$$
for some $C > 0$, with $B^\tau$ as in (i).

Proof. (i): Since $b$ and $|b'|$ are non-increasing, $\Delta b_k^\tau \le 0$ and $-\Delta b_k^\tau$ is non-increasing. We set $A_k := -\Delta b_k^\tau$. Clearly, $\sum_{k=0}^\infty A_k = b_0^\tau$ and the claim follows from Lemma 5.2.4.

(ii): Let $b^\tau(r) := b(\tau^m r)$ and observe

$$
\Delta^2 b_k^\tau = b_{k+2}^\tau - 2b_{k+1}^\tau + b_k^\tau = \int_{k+1}^{k+2} (b^\tau)'(s)\,ds - \int_{k+1}^{k+2} (b^\tau)'(s-1)\,ds = \int_{k+1}^{k+2}\int_{s-1}^s (b^\tau)''(r)\,dr\,ds.
$$

Since $(b^\tau)''(r)\,dr = \tau^m b''(\tau^m r)\,d(\tau^m r)$, we obtain

$$
\begin{aligned}
\sum_{k=0}^\infty (k+1)|\Delta^2 b_k^\tau| &\le \sum_{k=0}^\infty \int_{k+1}^{k+2}\int_{s-1}^s (r+1)\,|(b^\tau)''(r)|\,dr\,ds \\
&\le \sum_{k=0}^\infty \int_0^\infty \int_{k+1}^{k+2} \mathbf{1}_{[s-1,s]}(r)\,(r+1)\,|(b^\tau)''(r)|\,ds\,dr \\
&= \int_0^\infty \int_1^\infty \mathbf{1}_{[r,r+1]}(s)\,(r+1)\,|(b^\tau)''(r)|\,ds\,dr \\
&\le \int_0^\infty (r+1)(r\wedge 1)\,|(b^\tau)''(r)|\,dr \\
&= \int_0^\infty \tau^m(\tau^{-m}r+1)(\tau^{-m}r\wedge 1)\,|b''(r)|\,dr \\
&= \int_0^\infty r(\tau^{-m}r\wedge 1)\,|b''(r)|\,dr + \int_0^\infty (r\wedge\tau^m)\,|b''(r)|\,dr \\
&\le 2\int_0^\infty r|b''(r)|\,dr.
\end{aligned}
$$

By Lemma 5.2.4 this finishes the proof.
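The final bound can be sanity-checked for a concrete choice (an illustration, not part of the argument): for $b(r) = e^{-r}$ one has $b''(r) = e^{-r}$, so the right hand side equals $2$, while with $h = \tau^m$ the left hand side is exactly $(1-e^{-h})^2\sum_k (k+1)e^{-hk} = 1$ for every $h > 0$.

```python
import math

# Sanity check of Σ (k+1)|Δ²b_k^τ| <= 2 ∫ r |b''(r)| dr for the illustrative
# choice b(r) = e^{-r}: the right hand side is 2 ∫ r e^{-r} dr = 2, while
# the left hand side equals (1 - e^{-h})² Σ (k+1) e^{-hk} = 1, h = τ^m.
def quasi_convex_sum(h, kmax=100_000):
    b = lambda k: math.exp(-h * k)
    return sum((k + 1) * abs(b(k + 2) - 2.0 * b(k + 1) + b(k))
               for k in range(kmax))

lhs = [quasi_convex_sum(h) for h in (0.05, 0.5, 2.0)]
rhs = 2.0  # 2 ∫_0^∞ r e^{-r} dr
```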

Proposition 5.2.2 and Lemma 5.2.4 motivate the following partial order between bounded sequences

Definition 5.2.6. (i) A sequence $(b_k)$ is negligible if $(b_k)$ satisfies one of the conditions of Lemma 5.2.4.

(ii) A family of sequences $(b_k^\tau)$ is uniformly negligible if $(b_k^\tau)$ satisfies one of the conditions of Lemma 5.2.4 with uniformly bounded right hand side in (5.10).

(iii) For two bounded sequences $(a_k)$, $(c_k)$ we write $(c_k) \preceq (a_k)$ if there is a negligible sequence $(b_k)$ such that $c_k = a_k b_k$ for every $k$.


The reason for this definition is that as soon as we can control the $\rho$-variation of $R_K(x,y) = \sum_{k\in\mathbb{Z}} a_k e^{ik(x-y)}$ with $\sum_{k\in\mathbb{Z}} |a_k| < \infty$, we also have control on the $\rho$-variation of $R(x,y) = \sum_{k\in\mathbb{Z}} c_k e^{ik(x-y)}$ for each sequence $(c_k) \preceq (a_k)$, by Proposition 5.2.2 and Lemma 5.2.4. As an immediate corollary we obtain

Proposition 5.2.7. Assume that the covariance of $\Psi$ is of the form

$$
R(x,y) = \sum_{k\in\mathbb{Z}} c_k e^{ik(x-y)}
$$

with $c_k \preceq a_k$ and $a_k$ satisfying $a_k = a_{-k}$, $\Delta^2(k^2 a_k) \le 0$ for all $k \in \mathbb{N}$,

$$
\lim_{k\to\infty}\ k^3|\Delta^2 a_k| + k^2|\Delta a_k| + k|a_k| = 0
$$

and $(a_k)$ is non-increasing with $a_k = O\!\left(k^{-(1+\frac{1}{\rho})}\right)$ for some $\rho \ge 1$. Then the series $K(x) = \sum_{k\in\mathbb{Z}} a_k\cos(kx)$ is the Fourier series of a $\frac{1}{\rho}$-Hölder function, the covariance $R$ of the random process $x \mapsto (\Psi^1(x),\dots,\Psi^d(x))$ is of finite $\rho$-variation and there is a constant $C > 0$ such that

$$
V_\rho(R;[x,y]^2)^\rho \le C|y-x|
$$

holds for all $[x,y]^2 \subseteq [0,2\pi]^2$. The constant $C$ depends only on $\rho$, the Hölder norm of $K$ and the right hand side of (5.10). If, in addition, $\rho \in [1,2)$, then $x \mapsto (\Psi^1(x),\dots,\Psi^d(x))$ lifts to a geometric $p$-rough path $\mathbf{\Psi} \in C_0^{0,\frac1p\text{-Hölder}}([0,2\pi],G^{[p]}(\mathbb{R}^d))$ almost surely for every $p > 2\rho$.

Proof. We first consider the case $c_k = a_k$ for all $k \in \mathbb{N}$ and verify the assumptions of Corollary 5.1.4. By assumption $x \mapsto (\Psi^1(x),\dots,\Psi^d(x))$ is a centered, continuous Gaussian process with independent components. By Lemma 5.2.1 we know that

$$
x \mapsto \sigma^2(x) = \mathbb{E}|\Psi^1(x) - \Psi^1(0)|^2 = 2(K(0) - K(x))
$$

is concave on $[0,2\pi]$ and non-decreasing on $[0,\pi]$. Moreover, in [Lor48, Satz 8] it is shown that $K$ is $\frac1\rho$-Hölder if and only if $a_k = O\!\left(k^{-(1+\frac1\rho)}\right)$, and in particular $\sigma^2(x) \le 2|K|_{\frac1\rho\text{-Höl}}|x|^{\frac1\rho}$. Hence, Theorem 5.1.2 and Corollary 5.1.4 yield the claim. In the case of a general sequence $(c_k) \preceq (a_k)$, Proposition 5.2.2 together with Lemma 5.2.4 implies the claim.

5.3 Lifting Ornstein-Uhlenbeck processes in space

Recall that we aim to construct spatial geometric rough path lifts of Ornstein-Uhlenbeck processes corresponding to

$$
d\Psi_t^i = (-(-\Delta)^\alpha - 1)\Psi_t^i\,dt + dW_t^i \tag{5.11}
$$

with periodic boundary conditions on $[0,2\pi]$. Note that $(-\Delta)^\alpha$ admits an orthonormal basis of eigenvectors in $L^2([0,2\pi])$ which, up to multiplicities, is given by the Fourier basis

$$
e_k(x) := \begin{cases} \sin(kx), & k > 0 \\ \tfrac12, & k = 0 \\ \cos(kx), & k < 0. \end{cases}
$$


As a natural generalization of the fractional Laplacian (−∆)α in (5.11) we now consider the operator A on L2([0, 2π]) given by

$$
\begin{aligned}
D(A) &= \left\{ x \in L^2([0,2\pi]) \;:\; \sum_{k\in\mathbb{Z}} \mu_k^2\,(x,e_k)^2 < \infty \right\} \\
Ax &= \sum_{k\in\mathbb{Z}} \mu_k\,(x,e_k)\,e_k,
\end{aligned} \tag{5.12}
$$

where $(\mu_k)$ is a sequence satisfying $\mu_{-k} = \mu_k$ and $\mu_k \ge 0$ for all $k \in \mathbb{Z}$. The case of the (fractional) Laplace operator $A = (-\Delta)^\alpha$ is recovered with the choice $\mu_k = |k|^{2\alpha}$. We consider (possibly) colored Wiener noise $W_t$ with covariance operator $Q$ given by $Qe_k := \sigma_k e_k$, where $(\sigma_k)$ is a non-negative sequence of real numbers satisfying $\sigma_k = \sigma_{-k}$. Thus, (informally) we can write

$$
W_t = \sum_{k\in\mathbb{Z}} \sqrt{\sigma_k}\,\beta_t^k\, e_k,
$$

for a sequence $(\beta^k)$ of independent $\mathbb{R}^d$-valued standard Brownian motions. We will apply Corollary 5.1.4 to construct geometric rough path lifts of the strictly stationary solution to

$$
d\Psi_t^i = (-A - \lambda)\Psi_t^i\,dt + dW_t^i, \quad i = 1,\dots,d, \tag{5.13}
$$

with $\lambda > 0$. Note that $e_k$ diagonalizes $A + \lambda$ with eigenvalues

λk = µk + λ ≥ λ > 0.

Assume from now on

$$
\sum_{k\in\mathbb{Z}} \frac{\sigma_k}{\lambda_k} < \infty. \tag{5.14}
$$

Then there is a unique strictly stationary mild solution $\Psi$ to (5.13). The decomposition

$$
\Psi(t,x;\omega) = \sum_{k\in\mathbb{Z}} Y_t^k(\omega)\,e_k(x) = \sum_{k\in\mathbb{Z}} Z_t^k(\omega)\,e^{ikx} \tag{5.15}
$$

leads to a decoupled, infinite system of $d$-dimensional Ornstein-Uhlenbeck processes:

$$
dY_t^k = -\lambda_k Y_t^k\,dt + \sqrt{\sigma_k}\,d\beta_t^k,
$$

and $Z_t^k := \frac12\big(Y_t^{-|k|} - i\,\mathrm{sgn}(k)\,Y_t^{|k|}\big)$ with $\mathrm{sgn}(0) := 0$, for all $k \in \mathbb{Z}$. The $Y^k$ are independent $\mathbb{R}^d$-valued Gaussian processes with stationary increments and i.i.d. components. Since $\lambda_k = \lambda_{-k}$ and $\sigma_k = \sigma_{-k}$ we have $\mathcal{L}(Y^k) = \mathcal{L}(Y^{-k})$ for all $k \in \mathbb{N}$. In the following we consider a single component of $\Psi$ and $Y^k$ and suppress the corresponding index for simplicity of notation. Note that

$$
\mathbb{E} Z_t^k Z_s^l = \mathbb{E} Y_t^k Y_s^l = \frac{\sigma_k}{2\lambda_k}\, e^{-\lambda_k|t-s|}\,\delta_{k,l} \tag{5.16}
$$

and thus

$$
R_K(x,y) = \mathbb{E}\Psi(t,x)\Psi(t,y) = \frac12\sum_{k\in\mathbb{Z}} \frac{\sigma_k}{2\lambda_k}\, e^{ik(x-y)} = K(x-y)
$$

with

$$
K(x) = \frac{\sigma_0}{4\lambda_0} + \sum_{k=1}^\infty \frac{\sigma_k}{2\lambda_k}\cos(kx) = \sum_{k\in\mathbb{Z}} \frac{\sigma_k}{4\lambda_k}\, e^{ikx}.
$$
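For a quick numerical illustration (with hypothetical parameters $\sigma_k \equiv 1$, $\lambda_k = |k|^{2\alpha} + \lambda$, in the spirit of the fractional heat example below), one can evaluate the truncated series for $K$ and check basic properties of the stationary spatial covariance $R_K(x,y) = K(x-y)$:

```python
import numpy as np

# Truncated evaluation of K(x) = Σ_k σ_k/(4 λ_k) e^{ikx} for the hypothetical
# choice σ_k = 1, λ_k = |k|^{2α} + λ (cf. the fractional heat example below).
alpha, lam, kmax = 0.9, 1.0, 20_000
k = np.arange(1, kmax + 1)
coeff = 1.0 / (4.0 * (k ** (2.0 * alpha) + lam))  # σ_k/(4 λ_k) for k >= 1

def K(x):
    return 1.0 / (4.0 * lam) + 2.0 * np.sum(coeff * np.cos(k * x))

xs = np.linspace(0.0, np.pi, 50)
Kv = np.array([K(x) for x in xs])
sigma2 = 2.0 * (Kv[0] - Kv)   # E|Ψ_t(x) - Ψ_t(0)|² = 2 (K(0) - K(x))
```

Since all coefficients are positive, $K$ attains its maximum at $0$ and $\sigma^2 \ge 0$; by periodicity, $K(x) = K(2\pi - x)$.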

121 Rough path lifts of stoch. convolutions

Proposition 5.3.1. (i) Assume that $\left(\frac{\sigma_k}{\lambda_k}\right) \preceq (k^{-2\alpha})$ for some $\alpha \in (\frac12,1]$ and that $\left(\frac{\sigma_k}{\lambda_k}\right)$ is eventually non-increasing. Then, for every $t \ge 0$, the spatial process $x \mapsto \Psi_t(x)$ is a stationary, centered Gaussian process which admits a continuous modification (which we denote by the same symbol). Moreover, the covariance $R_K$ is of finite $\rho$-variation for all $\rho \ge \frac{1}{2\alpha-1}$ and

$$
V_\rho(R_K;[x,y]^2) \le C|x-y|^{\frac1\rho}
$$

holds for all $[x,y]^2 \subseteq [0,2\pi]^2$ and for each $t \in [0,T]$. If $\alpha > \frac34$, the process $x \mapsto \Psi_t(x)$ lifts to a geometric $p$-rough path $\mathbf{\Psi}(t) \in C_0^{0,\frac1p\text{-Hölder}}([0,2\pi],G^{[p]}(\mathbb{R}^d))$ almost surely for every $p > \frac{2}{2\alpha-1}$.

(ii)⁴ Assume that $(\sigma_k)$ is bounded, $(\lambda_k)$ is eventually non-decreasing and that

$$
\frac{|\Delta\lambda_k|}{\lambda_k} = O\!\left(\frac1k\right). \tag{5.17}
$$

Then there is a continuous (even Hölder continuous) modification of the map

$$
\begin{aligned}
\mathbf{\Psi} : [0,T] &\to C_0^{0,\frac1p\text{-Hölder}}([0,2\pi],G^{[p]}(\mathbb{R}^d)) \\
t &\mapsto \mathbf{\Psi}(t).
\end{aligned} \tag{5.18}
$$

Proof. (i): It is clear that the spatial processes are stationary, centered and Gaussian. Concerning continuity, note that since negligible sequences are bounded, $\left(\frac{\sigma_k}{\lambda_k}\right) \preceq (k^{-2\alpha})$ implies that there is a constant $C$ such that $\frac{\sigma_k}{\lambda_k} \le C\left(\frac1k\right)^{2\alpha}$ holds for all $k$. Since $\left(\frac{\sigma_k}{\lambda_k}\right)$ is eventually non-increasing, there is a non-increasing sequence $(a_k)$ and a sequence $(b_k)$ with $b_k = 0$ for all $k \ge N$, for some $N \in \mathbb{N}$, such that $\frac{\sigma_k}{\lambda_k} = a_k + b_k$. It follows that there is a constant $C_1$ such that $|a_k| \le C_1 k^{-2\alpha}$, and [Lor48, Satz 8] implies that the corresponding cosine series defines a $(2\alpha-1)$-Hölder continuous function. Hence also $K$ is the Fourier series of a $(2\alpha-1)$-Hölder continuous function. Therefore,

$$
\mathbb{E}|\Psi_t(x) - \Psi_t(y)|^2 = 2|K(0) - K(x-y)| \lesssim |x-y|^{2\alpha-1},
$$

which implies that there is a continuous modification. Hence we can apply Proposition 5.2.7, which yields the claim.

(ii): We will derive the existence of a continuous modification by an application of Kolmogorov's continuity theorem. Therefore, we need an estimate on a $q$-th moment of the distance in the $\frac1p$-Hölder metric of the rough paths $\mathbf{\Psi}(t), \mathbf{\Psi}(s)$ at different times $0 \le s < t \le T$. Such an estimate can be obtained by applying [FV10b, Theorem 15.37]: Let $0 \le s \le t \le T$, $\tau := |t-s|$ and $X(x) := (\Psi^1(t,x),\dots,\Psi^d(t,x))$, $Y(x) := (\Psi^1(s,x),\dots,\Psi^d(s,x))$. We will first prove that the covariance of $(X,Y)$ has finite Hölder dominated $\rho'$-variation for all $\rho' > \frac{1}{2\alpha-1}$, uniformly in $\tau$. Note that

$$
\begin{aligned}
R_\tau(x,y) &= \mathbb{E} X^1(x)Y^1(y) = \mathbb{E}\Psi^1(t,x)\Psi^1(s,y) = \sum_{k\in\mathbb{Z}} \frac{\sigma_k}{4\lambda_k}\, e^{-\lambda_k\tau}\cos(k(x-y)) \\
&= \sum_{k\in\mathbb{Z}} \left(\frac{\sigma_k|k|^{2\alpha}}{4\lambda_k}\right)\left(|k|^{-2\varepsilon\alpha}\, e^{-\lambda_k\tau}\right)|k|^{-2(1-\varepsilon)\alpha}\cos(k(x-y))
\end{aligned}
$$

⁴These conditions may be relaxed in various ways; we have formulated them in this way for the sake of simplicity.

for every $\varepsilon > 0$. We will apply Proposition 5.2.2 multiple times. We therefore need to prove that the first two sequences are (uniformly) negligible. For the first sequence this follows from our assumptions. We will show that the $\Delta$-criterion holds for the sequence $(|k|^{-2\varepsilon\alpha}\, e^{-\tau\lambda_k})$. By the mean value theorem,

$$
|\Delta e^{-\tau\lambda_k}| \le \frac{|\Delta\lambda_k|}{\lambda_k\wedge\lambda_{k+1}}\,\sup_{\xi>0}|\xi e^{-\xi}| \lesssim |k|^{-1}.
$$

Therefore,

$$
\left|\Delta\!\left(|k|^{-2\varepsilon\alpha}\, e^{-\tau\lambda_k}\right)\right| \le \left|\Delta |k|^{-2\varepsilon\alpha}\right| + |k|^{-2\varepsilon\alpha}\left|\Delta e^{-\tau\lambda_k}\right| \lesssim |k|^{-1-2\varepsilon\alpha}.
$$

Since this is summable for $\alpha, \varepsilon > 0$ we can apply Lemma 5.2.4, which shows that the second sequence is uniformly negligible. This implies that $R_\tau$ has finite Hölder dominated $\rho'$-variation for $\rho' = \frac{1}{2(1-\varepsilon)\alpha-1}$, uniformly in $\tau > 0$, for every $\varepsilon > 0$. Choose $\varepsilon > 0$ small enough such that $p > 2\rho'$ holds. Then [FV10a, Theorem 37] implies

$$
\left\| d_{\frac1p\text{-Höl}}(\mathbf{\Psi}(t),\mathbf{\Psi}(s)) \right\|_{L^q} \le C\sqrt{q}\, |R_{\Psi(t)-\Psi(s)}|_\infty^\theta,
$$

for some $\theta = \theta(p,\rho') > 0$ and all $q \in [1,\infty)$. In order to estimate the right hand side we note

$$
\begin{aligned}
|R_{\Psi^1(t)-\Psi^1(s)}(x,y)| &= \left|\mathbb{E}\big[\Psi^1(t,x)\Psi^1(t,y) - \Psi^1(t,x)\Psi^1(s,y) - \Psi^1(t,y)\Psi^1(s,x) + \Psi^1(s,x)\Psi^1(s,y)\big]\right| \\
&= \left|\sum_{k\in\mathbb{Z}} \frac{\sigma_k}{2\lambda_k}\left(1 - e^{-\lambda_k\tau}\right)e^{ik(x-y)}\right| \le \sum_{k\in\mathbb{N}} \frac{\sigma_k}{\lambda_k}\left(1 - e^{-\lambda_k\tau}\right) \\
&\le C\sum_{k\le N}\sigma_k|t-s| + C N^{1-2\alpha'}\sum_{k>N}\frac{k^{2\alpha'-1}\sigma_k}{\lambda_k} \le C\left(N|t-s| + N^{1-2\alpha'}\right)
\end{aligned}
$$

for $\alpha' \in (1/2,\alpha)$. We then choose $N \sim |t-s|^{-\frac{1}{2\alpha'}}$ to obtain

$$
|R_{\Psi^1(t)-\Psi^1(s)}(x,y)| \le C|t-s|^{1-\frac{1}{2\alpha'}},
$$

and thus

$$
\left\| d_{\frac1p\text{-Höl}}(\mathbf{\Psi}(t),\mathbf{\Psi}(s)) \right\|_{L^q} \le C\sqrt{q}\,|t-s|^{\theta\left(1-\frac{1}{2\alpha'}\right)},
$$

for some $\theta > 0$ and all $p > \frac{2}{2\alpha-1}$, $q \in [1,\infty)$. Kolmogorov's continuity theorem proves that for small $\gamma > 0$ there is a modification of $t \mapsto \mathbf{\Psi}(t)$ (again denoted by $\mathbf{\Psi}(t)$) such that

$$
\left(\mathbb{E}\|\mathbf{\Psi}\|^q_{\gamma\text{-Höl};[0,T]}\right)^{\frac1q} \le C < \infty,
$$

for some constant $C = C(\gamma,T,\rho,\theta,q)$.

Example 5.3.2. We consider the stochastic fractional heat equation with (possibly) colored noise on the 1-dimensional torus, i.e.

$$
d\Psi_t^i = -(A^\alpha + \lambda)\Psi_t^i\,dt + A^{-\frac{\gamma}{2}}\,dW_t^i, \quad i = 1,\dots,d, \tag{5.19}
$$


where $A^\alpha = (-\Delta)^\alpha$, $\alpha \in (0,1]$, $\gamma \ge 0$, $\lambda > 0$ and $W_t$ is a cylindrical Wiener process. Hence, $\lambda_k = |k|^{2\alpha} + \lambda$ and $\sigma_k = |k|^{-2\gamma}$. We claim that $\left(\frac{\sigma_k}{\lambda_k}\right) \preceq \left(k^{-(2\gamma+2\alpha)}\right)$. To see this, we need to show that $\left(\frac{|k|^{2\alpha}}{\lambda+|k|^{2\alpha}}\right)$ is negligible. Set $g(x) = \frac{1}{\lambda x + 1}$. By the mean value theorem,

$$
\left|\Delta\!\left(\frac{|k|^{2\alpha}}{\lambda+|k|^{2\alpha}}\right)\right| \le |g'|_\infty \left||k+1|^{-2\alpha} - |k|^{-2\alpha}\right| \lesssim |k|^{-2\alpha-1}.
$$

Since this is summable, the claim is shown for $\alpha > 0$. Therefore, the assumptions of Proposition 5.3.1 (i) are satisfied if $2\gamma + 2\alpha > \frac32$. Clearly,

$$
\frac{|\Delta\lambda_k|}{\lambda_k} = \frac{\left|\Delta|k|^{2\alpha}\right|}{|k|^{2\alpha}+\lambda} \sim |k|^{2\alpha-1}\,|k|^{-2\alpha} \sim |k|^{-1},
$$

and (5.17) holds. Thus, there is a geometric $p$-rough path ($\frac1p$-Hölder continuous) $\mathbf{\Psi}_t$, lifting the strictly stationary solution $\Psi_t$ to (5.19) in space, for every $p > \frac{2}{2\gamma+2\alpha-1}$ and $t \ge 0$. Moreover, there is a continuous modification of the map

$$
\mathbf{\Psi} : [0,T] \to C_0^{0,\frac1p\text{-Hölder}}([0,2\pi],G^{[p]}(\mathbb{R}^d)).
$$

5.3.1 Stability and approximations

In this section we consider hyper-viscosity approximations and Galerkin approximations to $\Psi$ and prove the strong convergence of the corresponding rough path lifts. The hyper-viscosity approximation $\Psi^\varepsilon = (\Psi^{\varepsilon,1},\dots,\Psi^{\varepsilon,d})$ is the solution to

$$
d\Psi_t^{\varepsilon,i} = (-A - \varepsilon A^\beta - \lambda)\Psi_t^{\varepsilon,i}\,dt + dW_t^i, \quad i = 1,\dots,d, \tag{5.20}
$$

for some (large) $\beta \ge 1$ and $\varepsilon > 0$. The $\Psi^\varepsilon$ are called (hyper-)viscosity approximations of $\Psi$. As before, $e_k$ diagonalizes $(A + \varepsilon A^\beta + \lambda)$ with eigenvalues

$$
\lambda_k^\varepsilon = \mu_k + \varepsilon\mu_k^\beta + \lambda > 0.
$$

The covariance $R^\varepsilon$ of every component of $\Psi^\varepsilon(t)$ is given by $R^\varepsilon(x,y) = K^\varepsilon(|x-y|)$ where

$$
K^\varepsilon(x) = \sum_{k\in\mathbb{Z}} \frac{\sigma_k}{4\lambda_k^\varepsilon}\cos(kx).
$$

The Galerkin approximation $\Psi_t^N$ of $\Psi_t$ is defined to be the projection of $\Psi$ onto the $(2N+1)$-dimensional subspace spanned by $\{e_k\}_{|k|\le N}$. This process solves the SPDE

$$
d\Psi_t^{N,i} = (-P_N A - \lambda)\Psi_t^{N,i}\,dt + dP_N W_t^i, \quad i = 1,\dots,d \tag{5.21}
$$

where $P_N A$ has the eigenvalues $\mu_k \mathbf{1}_{|k|\le N}$ and $P_N W_t$ has the covariance operator $Q^N$ given by $Q^N e_k = \mathbf{1}_{|k|\le N}\,\sigma_k e_k$. The covariance $R^N$ of $\Psi^N(t)$ is of the form $R^N(x,y) = K^N(|x-y|)$ where

$$
K^N(x) = \sum_{|k|\le N} \frac{\sigma_k}{4\lambda_k}\cos(kx).
$$

One easily checks that we can lift the spatial sample paths of $\Psi_t^\varepsilon$ and $\Psi_t^N$ to Gaussian rough paths and find continuous modifications of $t \mapsto \mathbf{\Psi}_t^\varepsilon$ resp. $t \mapsto \mathbf{\Psi}_t^N$. Moreover, we can prove the following strong convergence result:


Proposition 5.3.3. Under the same assumptions as in Proposition 5.3.1, for every $\rho > \frac{1}{2\alpha-1}$ there are constants $C_1, C_2$ such that

$$
\sup_{\varepsilon>0} V_\rho\big(R_{(\Psi_t,\Psi_t^\varepsilon)};[x,y]^2\big) \le C_1|x-y|^{\frac1\rho} \quad\text{and}\quad \sup_{N\in\mathbb{N}} V_\rho\big(R_{(\Psi_t,\Psi_t^N)};[x,y]^2\big) \le C_2|x-y|^{\frac1\rho} \tag{5.22}
$$

holds for every square $[x,y]^2 \subseteq [0,2\pi]^2$ and $t \in [0,T]$. Moreover, for every $p > \frac{2}{2\alpha-1}$, $q > 0$ and $t \in [0,T]$, one has

$$
\left\|d_{\frac1p\text{-Höl}}(\mathbf{\Psi}_t,\mathbf{\Psi}_t^\varepsilon)\right\|_{L^q(\mathbb{P})} \to 0 \quad\text{and}\quad \left\|d_{\frac1p\text{-Höl}}(\mathbf{\Psi}_t,\mathbf{\Psi}_t^N)\right\|_{L^q(\mathbb{P})} \to 0, \tag{5.23}
$$

for $\varepsilon \to 0$ resp. $N \to \infty$.

Proof. Hyper-viscosity approximation: To prove (5.22), we have to show that

$$
\sup_{\varepsilon>0} V_\rho(R^\varepsilon;[x,y]^2) \lesssim |x-y|^{1/\rho} \quad\text{and}\quad \sup_{\varepsilon>0} V_\rho(f^\varepsilon;[x,y]^2) \lesssim |x-y|^{1/\rho} \tag{5.24}
$$

holds for every $[x,y]^2$, where $f^\varepsilon(x,y) = \mathbb{E}\big[\Psi_t^i(x)\Psi_t^{\varepsilon,i}(y)\big]$. In order to get a control on the $\rho$-variation of $R^\varepsilon$ we apply Proposition 5.2.2 and Lemma 5.2.4 multiple times. To do so, we need to decompose the coefficients $\frac{\sigma_k}{4\lambda_k^\varepsilon}$ as a product of a well-controlled sequence $(a_k)$ and possibly multiple negligible coefficients. For simplicity of notation we write $\lambda_k := \lambda_k^0$. We note

$$
\frac{\sigma_k}{4\lambda_k^\varepsilon} = |k|^{-2(1-\delta)\alpha}\left(|k|^{2\alpha}\frac{\sigma_k}{4\lambda_k}\right)\left(|k|^{-2\delta\alpha}\frac{\lambda_k}{\lambda_k^\varepsilon}\right).
$$

We need to show that the second and the third sequence are uniformly negligible. For the second sequence, this follows by our assumptions. Note that $\frac{\lambda_k}{\lambda_k^\varepsilon} = \left(1 + \varepsilon\frac{\mu_k^\beta}{\lambda_k}\right)^{-1}$ and by the mean value theorem,

$$
\left|\Delta\!\left(\frac{\lambda_k}{\lambda_k^\varepsilon}\right)\right| \le \sup_{\xi>0}\frac{\xi}{(1+\xi)^2}\cdot\frac{\left|\Delta\!\left(\frac{\mu_k^\beta}{\lambda_k}\right)\right|}{\frac{\mu_k^\beta}{\lambda_k}\wedge\frac{\mu_{k+1}^\beta}{\lambda_{k+1}}} \sim \frac{\lambda_k}{\mu_k^\beta}\left|\Delta\!\left(\frac{\mu_k^\beta}{\lambda_k}\right)\right|.
$$

Furthermore,

$$
|\Delta\lambda_k^{-1}| \le \frac{|\Delta\lambda_k|}{\lambda_k^2} \le \frac{|\Delta\mu_k|}{\mu_k^2}
$$

and thus

$$
\left|\Delta\!\left(\frac{\mu_k^\beta}{\lambda_k}\right)\right| \le \mu_k^\beta\,|\Delta\lambda_k^{-1}| + \lambda_{k+1}^{-1}\,|\Delta\mu_k^\beta| \lesssim \mu_k^{\beta-2}|\Delta\mu_k| + \mu_k^{-1}|\Delta\mu_k^\beta|,
$$

which shows that

$$
\left|\Delta\!\left(\frac{\lambda_k}{\lambda_k^\varepsilon}\right)\right| \lesssim \frac{|\Delta\mu_k|}{\mu_k} + \frac{|\Delta\mu_k^\beta|}{\mu_k^\beta} \sim \frac{|\Delta\lambda_k|}{\lambda_k} = O(k^{-1}),
$$

uniformly in $\varepsilon$. Since $\frac{\lambda_k}{\lambda_k^\varepsilon} \le 1$,

$$
\left|\Delta\!\left(|k|^{-2\delta\alpha}\frac{\lambda_k}{\lambda_k^\varepsilon}\right)\right| \le \left|\Delta|k|^{-2\delta\alpha}\right| + |k|^{-2\delta\alpha}\left|\Delta\!\left(\frac{\lambda_k}{\lambda_k^\varepsilon}\right)\right| = O(k^{-1-2\delta\alpha}),
$$


which shows that for every $\delta > 0$, $\left(|k|^{-2\delta\alpha}\frac{\lambda_k}{\lambda_k^\varepsilon}\right)$ is uniformly negligible. For any $\rho > \frac{1}{2\alpha-1}$, we can choose $\delta > 0$ small enough such that $\rho = \frac{1}{2(1-\delta)\alpha-1}$ holds. This shows the left hand side of (5.24). For $\Psi_t^i(x) = \sum_{k\in\mathbb{Z}} Y_t^k e_k(x)$ and $\Psi_t^{\varepsilon,i}(x) = \sum_{k\in\mathbb{Z}} Y_t^{k,\varepsilon} e_k(x)$ we have

$$
\mathbb{E} Y_t^k Y_t^{l,\varepsilon} = \mathbb{E}\left(\int_{-\infty}^0 e^{\lambda_k s}\sqrt{\sigma_k}\,d\beta_s^k\right)\left(\int_{-\infty}^0 e^{\lambda_l^\varepsilon s}\sqrt{\sigma_l}\,d\beta_s^l\right) = \sigma_k\int_0^\infty e^{-(\lambda_k+\lambda_k^\varepsilon)s}\,ds\;\delta_{k,l} = \frac{\sigma_k}{\lambda_k+\lambda_k^\varepsilon}\,\delta_{k,l}.
$$

Hence

$$
f^\varepsilon(x,y) = \sum_{k\in\mathbb{Z}} e_k(x)e_k(y)\,\mathbb{E} Y_t^k Y_t^{k,\varepsilon} = \sum_{k\in\mathbb{Z}} e_k(x)e_k(y)\,\frac{\sigma_k}{\lambda_k+\lambda_k^\varepsilon} = \frac{\sigma_0}{2(\lambda_0+\lambda_0^\varepsilon)} + \sum_{k=1}^\infty \frac{\sigma_k}{\lambda_k+\lambda_k^\varepsilon}\cos(k(x-y)).
$$

We decompose the coefficients as follows:

$$
\frac{\sigma_k}{\lambda_k+\lambda_k^\varepsilon} = |k|^{-2(1-\delta)\alpha}\left(|k|^{2\alpha}\frac{\sigma_k}{\lambda_k}\right)\left(|k|^{-2\delta\alpha}\frac{\lambda_k}{2\lambda_k+\varepsilon\mu_k^\beta}\right)
$$

for $\delta > 0$. Noting that $\frac{\lambda_k}{2\lambda_k+\varepsilon\mu_k^\beta} = \left(2 + \varepsilon\frac{\mu_k^\beta}{\lambda_k}\right)^{-1}$, we can proceed as above to see that also the right hand side of (5.24) holds, and thus (5.22) is shown in the hyper-viscosity case. It remains to prove (5.23). Using [FV10a, Theorem 37] and the Cauchy-Schwarz inequality, it is enough to show that

$$
\sup_{x\in[0,2\pi]} \mathbb{E}|\Psi_t(x) - \Psi_t^\varepsilon(x)|^2 \to 0 \tag{5.25}
$$

for $\varepsilon \to 0$. We have

$$
\mathbb{E}|\Psi(x,t) - \Psi^\varepsilon(x,t)|^2 = \mathbb{E}|\Psi(x,t)|^2 + \mathbb{E}|\Psi^\varepsilon(x,t)|^2 - 2\,\mathbb{E}\Psi(x,t)\Psi^\varepsilon(x,t) \le \sum_k \left|\frac{\sigma_k}{4\lambda_k} + \frac{\sigma_k}{4\lambda_k^\varepsilon} - \frac{\sigma_k}{\lambda_k+\lambda_k^\varepsilon}\right|.
$$

Now

$$
\left|\frac{\sigma_k}{4\lambda_k} + \frac{\sigma_k}{4\lambda_k^\varepsilon} - \frac{\sigma_k}{\lambda_k+\lambda_k^\varepsilon}\right| \le \frac{\sigma_k}{4\lambda_k} + \frac{\sigma_k}{4\lambda_k+4\varepsilon\mu_k^\beta} + \frac{\sigma_k}{2\lambda_k+\varepsilon\mu_k^\beta} \le \frac{\sigma_k}{\lambda_k}
$$

and since this is summable, we can use dominated convergence to see that indeed (5.25) holds for $\varepsilon \to 0$.

Galerkin approximation: Again we have to prove that

$$
\sup_{N\in\mathbb{N}} V_\rho(R^N;[x,y]^2) \lesssim |x-y|^{1/\rho} \quad\text{and}\quad \sup_{N\in\mathbb{N}} V_\rho(f^N;[x,y]^2) \lesssim |x-y|^{1/\rho} \tag{5.26}
$$

holds, where $f^N(x,y) = \mathbb{E}\Psi_t^{N,i}(x)\Psi_t^i(y)$. Note that

$$
K^N(x) = \sum_{k\in\mathbb{Z}} |k|^{-2(1-\delta)\alpha}\left(|k|^{2\alpha}\frac{\sigma_k}{4\mu_k}\right)\left(|k|^{-2\delta\alpha}\,\mathbf{1}_{|k|\le N}\right)\cos(kx)
$$

for all $\delta > 0$. The second sequence is negligible by assumption. Using Proposition 5.2.2, it suffices to show that the Fourier series

$$
B^N = \sum_{k\in\mathbb{Z}} |k|^{-2\delta\alpha}\,\mathbf{1}_{|k|\le N}\, e_k = \sum_{|k|\le N} |k|^{-2\delta\alpha}\, e_k
$$

is uniformly bounded in $L^1([0,2\pi])$. Since $\Delta k^{-2\delta\alpha} = O(k^{-2\delta\alpha-1})$ and $\lim_{k\to\infty}\log(k)\,k^{-2\delta\alpha} = 0$, we can apply the Sidon-Telyakovskii Theorem (cf. [Tel73, Theorem 4]) to obtain $B^N \to B$ for $N \to \infty$ in $L^1([0,2\pi])$. This proves the left hand side of (5.26). If $\Psi_t^{N,i}(x) = \sum_{k\in\mathbb{Z}} \tilde Y_t^k e_k(x)$ with $\tilde Y_t^k = \mathbf{1}_{|k|\le N}\,Y_t^k$, one has

$$
f^N(x,y) = \sum_{k\in\mathbb{Z}} e_k(x)e_k(y)\,\mathbb{E} Y_t^k\tilde Y_t^k = \sum_{|k|\le N} e_k(x)e_k(y)\,\mathbb{E}|Y_t^k|^2 = \sum_{|k|\le N} \frac{\sigma_k}{4\mu_k}\cos(k(x-y)) = K^N(x-y).
$$

With the first part, this implies (5.26) and thus (5.22) in the case of the Galerkin approximations. Furthermore,

$$
\mathbb{E}|\Psi(x,t) - \Psi^N(x,t)|^2 \le \sum_{k\in\mathbb{Z}}\frac{\sigma_k}{4\mu_k} + \sum_{|k|\le N}\frac{\sigma_k}{4\mu_k} - 2\sum_{|k|\le N}\frac{\sigma_k}{4\mu_k} = \sum_{|k|>N}\frac{\sigma_k}{4\mu_k} \to 0
$$

for $N \to \infty$ due to summability.

Remark 5.3.4. One can check, using Lemma 5.2.5, that in the special case $\mu_k = |k|^\zeta$ we have

$$
\sup_{\varepsilon>0} V_\rho\big(R_{(\Psi_t,\Psi_t^\varepsilon)};[x,y]^2\big) \le C_1|x-y|^{\frac1\rho}
$$

even for $\rho = \frac{1}{2\alpha-1}$, for all squares $[x,y]^2 \subseteq [0,2\pi]^2$.

Remark 5.3.5. The bounds obtained in (5.22) can also be used for other purposes. For instance, they become crucial when proving uniform (exponential) integrability of certain related stochastic integrals, a question raised in [Hai11]. See [FR13] for further details.

Remark 5.3.6. Our calculations are well suited to also determine the rate of convergence in (5.23) for the inhomogeneous rough path metrics $\varrho_{\frac1p\text{-Höl}}(\cdot,\cdot)$ (cf. [FV10b] for the formal definition of these metrics). As an example, we consider the Galerkin approximations. Since

$$
\sum_{|k|>N}\frac{\sigma_k}{2\mu_k} \lesssim \sum_{|k|>N}\frac{1}{|k|^{2\alpha}} \sim \int_N^\infty x^{-2\alpha}\,dx,
$$

we have $\mathbb{E}|\Psi(x,t) - \Psi^N(x,t)|^2 \to 0$ for $N \to \infty$ with rate $2\alpha - 1$. Using the results of [FR] (see also [RX12]), for every $\varepsilon > 0$ there is a $p = p_{\rho,\varepsilon} > 2\rho$ such that

$$
\left\|\varrho_{\frac1p\text{-Höl}}(\mathbf{\Psi}_t,\mathbf{\Psi}_t^N)\right\|_{L^q(\mathbb{P})} \lesssim \sup_{x\in[0,2\pi]}\left(\mathbb{E}|\Psi(x,t) - \Psi^N(x,t)|^2\right)^{\frac12\left(1-\frac{\rho}{\lfloor p/2\rfloor}\right)-\varepsilon}.
$$

Hence for every $\varepsilon > 0$, we can choose $\rho$ close enough to $\frac{1}{2\alpha-1}$ to see that

$$
\left\|\varrho_{\frac1p\text{-Höl}}(\mathbf{\Psi}_t,\mathbf{\Psi}_t^N)\right\|_{L^q(\mathbb{P})} \to 0,
$$

for $N \to \infty$ with rate $2\alpha - \frac32 - \varepsilon$. Using a Borel-Cantelli argument, we also obtain almost sure convergence with the same rate. Note that in general, one has to choose $p$ large in order to obtain the optimal convergence rate (cf. [FR] for the optimal choice of $p$).
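The $L^2$-rate $2\alpha - 1$ above can be checked numerically with toy coefficients (illustrative only): the tail $\sum_{k>N} k^{-2\alpha}$ behaves like $N^{1-2\alpha}$, so comparing tails at $N$ and $2N$ should exhibit a $\log_2$-slope close to $2\alpha - 1$.

```python
import numpy as np

# Toy check of the Galerkin L²-rate: Σ_{k>N} k^{-2α} ~ N^{1-2α}, i.e. the
# squared error decays with "rate 2α - 1" in N (coefficients illustrative).
alpha = 0.9
k = np.arange(1, 1_000_001, dtype=float)
tail = (k ** (-2.0 * alpha))[::-1].cumsum()[::-1]  # tail[j] = Σ_{k>j} k^{-2α}

def observed_rate(N):
    # decay exponent between N and 2N on a log2 scale
    return np.log2(tail[N] / tail[2 * N])

rates = [observed_rate(N) for N in (200, 400, 800)]
```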


5.3.2 The continuous case

Consider the stationary solution of

$$
d\Psi_t = -((-\Delta)^\alpha + \lambda)\Psi_t\,dt + dW_t, \quad\text{on } \mathbb{R},
$$

for some $\alpha \in (0,1]$, $\lambda > 0$. The stationary solution can be written down explicitly (cf. [Wal86]), namely

$$
\Psi_t(x) = \int_{-\infty}^t\int_{\mathbb{R}} K_{t-s}(x,y)\,W(ds,dy),
$$

where $K$ is the fractional heat kernel associated to $-((-\Delta)^\alpha + \lambda)$ with Fourier transform given by

$$
\hat K_t(\xi) = e^{-t|\xi|^{2\alpha}-\lambda t}.
$$

After some calculations, one sees that the covariance R of the spatial process x 7→ Ψt(x) for every time point t is given by R(x, y) = K(x − y) where

$$
K(x) = \int_{-\infty}^\infty f(\xi)\,e^{-ix\xi}\,d\xi \qquad\text{and}\qquad f(\xi) = \frac{1}{2|\xi|^{2\alpha}+2\lambda}.
$$

In order to deduce the existence of a rough path lift of $x \mapsto \Psi_t(x)$ on compact intervals of $\mathbb{R}$ by means of Theorem 5.1.2, we have to prove convexity of $K$, which will follow from Lemma 5.3.7 below if $\alpha > \frac12$. It is easy to see that $\sigma^2(x) = \mathbb{E}(\Psi_t(x) - \Psi_t(0))^2 \lesssim |x|^{2\alpha-1}$. Hence, we can apply Corollary 5.1.4 to see that $\Psi_t$ can be lifted, for every fixed time point $t$, to a process $\mathbf{\Psi}_t$ with sample paths in $C_0^{0,\beta\text{-Hölder}}(I,G^{[1/\beta]}(\mathbb{R}^d))$, for every $\beta < \alpha - \frac12$, provided $\alpha > \frac34$, where $I$ can be an arbitrary compact interval in $\mathbb{R}$ containing $0$. It remains to give a continuous version of Lemma 5.2.1 proving convexity of $K$, i.e. a criterion for the convexity of Fourier transforms. For a function $f \in L^1(\mathbb{R})$ we define

$$
\hat f(x) = \int_{-\infty}^\infty f(\xi)\,e^{-ix\xi}\,d\xi.
$$

Then the following Lemma holds:

Lemma 5.3.7. Assume that $\hat f, f \in L^1(\mathbb{R})$, $f(\xi) = f(-\xi)$ for all $\xi$, $f$ is twice differentiable almost everywhere,

$$
\lim_{\xi\to\infty}\ \xi^3|f''(\xi)| + \xi^2|f'(\xi)| + \xi|f(\xi)| = 0
$$

and that there is an $x_0 \in (0,\infty]$ such that

$$
\limsup_{R\to\infty} \int_0^R \frac{\partial^2}{\partial\xi^2}\big(f(\xi)\,\xi^2\big)\,F_\xi(x)\,d\xi \le 0,
$$

for all $x \in (0,x_0)$, where $F_\xi(x) = \frac{1-\cos(\xi x)}{x^2}$ denotes the Fejér kernel. Then $\hat f$ is a convex function on $[0,x_0)$.


Proof. Since the proof is very similar to that of Lemma 5.2.1, we just sketch it briefly. By Fejér's Theorem for Fourier transforms (cf. [Kör89, Theorem 49.3]),

$$
\lim_{R\to\infty}\frac{1}{2\pi}\int_{-R}^R\left(1-\frac{|\xi|}{R}\right)\hat g(\xi)\,e^{ix\xi}\,d\xi = g(x),
$$

for all $x$ provided $g \in C\cap L^1$. Setting $g = \hat f$, we obtain from Fourier inversion

$$
\lim_{R\to\infty}\int_{-R}^R\left(1-\frac{|\xi|}{R}\right)f(\xi)\,e^{ix\xi}\,d\xi = \hat f(x).
$$

We then proceed as in the proof of Lemma 5.2.1.

Note that given $f \in L^1$ it does not follow in general that also $\hat f \in L^1$. However, Bernstein's Theorem states that the Fourier transforms of functions $f$ in the Sobolev space $H^s$ are contained in $L^1$ for all $s > \frac12$ (cf. [Hör83, Corollary 7.9.4]).
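For $\alpha = 1$ the convexity claim can be verified directly, since $K$ is then known in closed form; the following numerical check (illustrative, with $\lambda = 1$) compares the truncated integral with $(\pi/2)e^{-|x|}$ and its second differences:

```python
import numpy as np

# For α = 1, λ = 1: f(ξ) = 1/(2ξ² + 2) has Fourier transform
# K(x) = (π/2) e^{-|x|}, which is convex on (0, ∞).  We evaluate the
# truncated integral by the trapezoidal rule and check both facts.
xi = np.linspace(0.0, 2000.0, 400_001)
f = 1.0 / (2.0 * xi ** 2 + 2.0)

def K(x):
    g = 2.0 * f * np.cos(x * xi)          # K(x) = 2 ∫_0^∞ f(ξ) cos(xξ) dξ
    return np.sum((g[:-1] + g[1:]) / 2.0) * (xi[1] - xi[0])

xs = np.linspace(0.2, 2.0, 19)
Kv = np.array([K(x) for x in xs])
second_diff = Kv[:-2] - 2.0 * Kv[1:-1] + Kv[2:]   # >= 0 iff convex on grid
```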


6

From rough path estimates to multilevel Monte Carlo

We consider implementable schemes for large classes of stochastic differential equations (SDEs)

$$
dY_t = V(Y_t)\,dX_t(\omega)
$$

driven by multidimensional Gaussian signals, say $X = X_t(\omega) \in \mathbb{R}^d$. The interpretation of these equations is in Lyons' rough path sense [LQ02, LCL07, FV10b]. This requires smoothness/boundedness conditions on the vector fields $V = (V_1,\dots,V_d)$; for simplicity, the reader may assume bounded vector fields with bounded derivatives of all orders (but we will be more specific later). This also requires a "natural" lift of $X(\cdot,\omega)$ to a (random) rough path $\mathbf{X}_\cdot(\omega)$, a situation fairly well understood, cf. [FV10b, Ch. 15] and the references therein. For instance, fractional Brownian motion [CQ02] is covered for Hurst parameter $H > 1/4$. It may help the reader to recall that, in the case when $X = B$, a multidimensional Brownian motion, all this amounts to enhancing $B$ with Lévy's stochastic area or, equivalently, with all iterated stochastic integrals of $B$ against itself, say $\mathbb{B}_{s,t} = \int_s^t B_{s,\cdot}\otimes dB$. The (rough-)pathwise solution concept then agrees with the usual notion of an SDE solution (in Itô or Stratonovich sense, depending on which integration was used in defining $\mathbb{B}$). As is well-known, this provides a robust extension of the usual Itô framework of stochastic differential equations with an exploding number of new applications (including non-linear SPDE theory, robustness of the filtering problem, non-Markovian Hörmander theory). In a sense, the rough path interpretation of a differential equation is most closely related to strong, pathwise error estimates of Euler- resp. Milstein-approximations to stochastic differential equations. For instance, Davie's definition [Dav07] of a (rough)pathwise SDE solution is

$$
Y_t - Y_s \equiv Y_{s,t} = V_i(Y_s)\,B_{s,t}^i + V_i^k(Y_s)\,\partial_k V_j(Y_s)\,\mathbb{B}_{s,t}^{i,j} + o(|t-s|) \quad\text{as } t \downarrow s. \tag{6.1}
$$

In fact, this becomes an entirely deterministic definition, only assuming

$$
\exists\,\alpha \in (1/3,1/2):\quad |B_{s,t}| \le C|t-s|^\alpha,\quad |\mathbb{B}_{s,t}| \le C|t-s|^{2\alpha},
$$

something which is known to hold true almost surely (i.e. for $C = C(\omega) < \infty$ a.s.), and something which is not at all restricted to Brownian motion. As the reader may suspect, this approach leads to almost-sure convergence (with rates) of schemes which are based on the iteration of the approximation seen in the right-hand side of (6.1). The practical trouble is that Lévy's area, the antisymmetric part of $\mathbb{B}$, is notoriously difficult to simulate; leave alone the simulation of Lévy's area for other Gaussian processes. It has been understood for a while, at least in the Brownian setting, that the truncated (or: simplified) Milstein scheme, in which Lévy's area is omitted, i.e. in which $\mathbb{B}_{s,t}$ is replaced by $\mathrm{Sym}(\mathbb{B}_{s,t})$ in (6.1), still offers benefits: For instance, Talay [Tal86] replaces Lévy area by suitable Bernoulli random variables so as to obtain weak order 1 (see also Kloeden-Platen [KP92] and the references therein).¹ In the multilevel context, [GS12] use this truncated Milstein scheme together with a sophisticated antithetic (variance reduction) method. Finally, in the rough path context this scheme was used in [DNT12]: the convergence of the scheme can be traced down to an underlying Wong-Zakai type approximation for the driving random rough path - a (probabilistic!) result which is known to hold in great generality for stochastic processes, starting with [CQ02] in the context of fractional Brownian motion, see [FV10b, Ch. 15] and the references therein. A rather difficult problem is to go from almost-sure convergence (with rates) to $L^1$ (or even: $L^r$, any $r < \infty$) convergence. Indeed, as pointed out in [DNT12, Remark 1.2]: "Note that the almost sure estimate [for the simplified Milstein scheme] cannot be turned into an $L^1$-estimate [...].
This is a consequence of the use of the rough path method, which exhibits non-integrable (random) constants." The resolution of this problem forms the first contribution of this chapter. It is based on some recent progress [CLL, FR13], initially developed to prove smoothness of laws for (non-Markovian) SDEs driven by Gaussian signals under a Hörmander condition, [CF10, HP]. Having established $L^r$-convergence (any $r < \infty$, with rates) for implementable "simplified" Milstein schemes, we move to our second contribution: a multilevel algorithm, in the sense of Giles [Gil08b], for stochastic differential equations driven by large classes of Gaussian signals. The key remark here is that there is not much downside to replacing the weak error estimate ("rate $\alpha$"), which forms part of Giles' abstract condition [Gil08b, Theorem 3.1], by the corresponding strong estimate. Indeed, a strong $L^2$ error estimate ("rate $\beta/2$") is the key assumption in Giles' complexity theorem, and this is precisely what we have established in the first part. Some other extensions of Giles' theorem are necessary; indeed it is crucial to allow for $\alpha < 1/2$ (ruled out explicitly in [Gil08b]) whenever we deal with driving signals with sample path regularity "worse" than Brownian motion. Luckily this can be done without too much trouble. More precisely, we consider the following scheme for approximating $Y$, see [DNT12, FV10b].

Given a time-grid $0 = t_0 < t_1 < \dots < t_n = T$, the corresponding increments $X_{t_k,t_{k+1}}$, $k = 0,\dots,n-1$, of the driving noise and a (suitably big) integer $N \in \mathbb{N}$, define $\bar Y_0 \equiv Y_0$ and

$$
\bar Y_{t_{k+1}} = \bar Y_{t_k} + \sum_{l=1}^N \frac{1}{l!}\, V_{i_1}\cdots V_{i_l} I\big(\bar Y_{t_k}\big)\, X_{t_k,t_{k+1}}^{i_1}\cdots X_{t_k,t_{k+1}}^{i_l}, \tag{6.2}
$$

where $I(y) = y$ is the identity function and $V = (V_1,\dots,V_d)$ for vector fields $V_1,\dots,V_d$, which also have the interpretation as linear first order operators, acting on functionals by $V_i g(y) = \nabla g(y)\cdot V_i(y)$. Moreover, the Einstein summation convention is in force. For a more detailed description of the algorithm we refer to Section 6.2. Combining this discretization scheme with multilevel Monte Carlo simulation, we obtain the following main result:

Theorem 6.0.1. Assume the Gaussian driving signal has independent components, with Hölder dominated covariance of finite $\rho$-variation (in the precise sense of [FV10b, Ch. 15], [FV11]; examples include multi-dimensional Brownian motion with $\rho = 1$ and fractional Brownian motion with Hurst parameter $H \in (1/4,1/2]$, with $\rho = 1/(2H)$). Let $f : C([0,T],\mathbb{R}^m) \to \mathbb{R}^n$ be a Lipschitz continuous functional. Then the Monte Carlo evaluation of a path-dependent functional of the form $E(f(Y_\cdot))$, $Y$ being the solution of an RDE driven by this Gaussian signal, to within a MSE of $\varepsilon^2$, can be achieved with computational complexity

$$
O\big(\varepsilon^{-\theta}\big) \quad \forall\,\theta > \frac{2\rho}{2-\rho}.
$$

In the case of SDEs driven by Brownian motion ($\rho = 1$) our computational complexity is arbitrarily "close" to the known result $O\big(\varepsilon^{-2}(\log\varepsilon)^2\big)$ [Gil08a, Gil08b], recently sharpened to $O(\varepsilon^{-2})$ [GS12] with the aid of a suitable antithetic multilevel correction estimator.
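A minimal sketch of one step of (6.2), truncated at $N = 2$ and written for a scalar state: the driving paths and linear vector fields below are illustrative choices (smooth and commuting, so the exact solution is available as a cross-check) — this is not the stochastic setting of the theorem.

```python
import math

# One step of scheme (6.2) with N = 2 (increments only): the second-order
# term V_i V_j I(y) is the directional derivative DV_j(y) V_i(y), summed
# against X^i X^j.  The state y is scalar here; V, DV are toy choices.
def step(y, dx, V, DV):
    out = y
    for i in range(len(dx)):
        out += V[i](y) * dx[i]
    for i in range(len(dx)):
        for j in range(len(dx)):
            out += 0.5 * DV[j](y) * V[i](y) * dx[i] * dx[j]
    return out

# Commuting linear fields V_i(y) = a_i y driven by the smooth paths
# x¹_t = sin t and x²_t = t²/2; exact solution: y0 exp(a1 x¹_T + a2 x²_T).
a = (0.5, -0.3)
V = [lambda y, ai=ai: ai * y for ai in a]
DV = [lambda y, ai=ai: ai for ai in a]       # V_i'(y) = a_i

n, T, y = 2000, 1.0, 1.0
for m in range(n):
    t0, t1 = m * T / n, (m + 1) * T / n
    dx = (math.sin(t1) - math.sin(t0), (t1 ** 2 - t0 ** 2) / 2.0)
    y = step(y, dx, V, DV)

exact = math.exp(a[0] * math.sin(T) + a[1] * T ** 2 / 2.0)
```

Each step is a second-order Taylor expansion of the flow, so for a smooth driving path the global error is $O(n^{-2})$.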

¹A well-known counter-example by Clark and Cameron [CC80] shows that it is impossible to get strong order 1 if only Brownian increments are used.


A direct Monte Carlo implementation of the scheme (6.2) would require a complexity of $O(\varepsilon^{-(2+1/\alpha)})$ in order to attain an MSE of no more than $\varepsilon^2$. Here, $\alpha$ is the weak rate of convergence of the scheme. This rate clearly depends on the regularity of the functional, but under only weak regularity conditions, the rate is equal to the strong rate (under stronger regularity conditions, the weak rate can be significantly bigger). In this case, the number of time-steps needed to guarantee a weak discretization error $\big|E[f(Y_T)] - E[f(\bar Y_T)]\big| = O(\varepsilon)$ is $O(\varepsilon^{-1/\alpha})$, whereas the number of samples needed to guarantee a statistical error of order $O(\varepsilon)$ is $O(\varepsilon^{-2})$, which gives the claimed complexity bound of $O(\varepsilon^{-(2+1/\alpha)})$. Consequently, the use of multilevel Monte Carlo in this highly degenerate case gives a remarkable boost in efficiency, as the complexity can be reduced by a factor $\varepsilon$.
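The multilevel idea can be sketched in a few lines (a toy Brownian/Itô-Euler example with $f(y) = y$, where $E[f(Y_T)] = y_0$ — purely illustrative, not the Gaussian rough path setting of the theorem): each level corrects the previous one using coupled coarse/fine paths built from the same increments.

```python
import numpy as np

# Toy multilevel Monte Carlo in the sense of Giles for dY = a Y dW (Itô,
# Euler discretization), f(y) = y, so that E[f(Y_T)] = y0 exactly.  Level l
# uses 2^l steps; the coarse path on each level reuses the fine increments.
rng = np.random.default_rng(0)
a, y0, T, L = 0.2, 1.0, 1.0, 5

def euler(dW):
    y = np.full(dW.shape[0], y0)
    for i in range(dW.shape[1]):
        y = y + a * y * dW[:, i]
    return y

estimate = 0.0
for l in range(L + 1):
    M = 20_000 // 2 ** l + 1_000           # fewer samples on finer levels
    dW = rng.normal(0.0, np.sqrt(T / 2 ** l), size=(M, 2 ** l))
    fine = euler(dW)
    if l == 0:
        estimate += fine.mean()
    else:                                   # coarse path: pairwise-summed dW
        estimate += (fine - euler(dW.reshape(M, -1, 2).sum(axis=2))).mean()
```

The level corrections have rapidly decaying variance thanks to the coupling, which is exactly what drives the complexity gain in Theorem 6.0.1.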

6.1 Rough path estimates revisited

6.1.1 Preliminaries

Definition 6.1.1. Let ω be a control. For α > 0 and [s, t] ⊂ [0,T ] we set

τ0 (α) = s

$$
\tau_{i+1}(\alpha) = \inf\{u : \omega(\tau_i,u) \ge \alpha,\ \tau_i(\alpha) < u \le t\}\wedge t
$$

and define

$$
N_{\alpha,[s,t]}(\omega) = \sup\{n \in \mathbb{N}\cup\{0\} : \tau_n(\alpha) < t\}.
$$

When ω arises from the (homogeneous) p-variation norm k · kp−var of a (p -rough) path, x, i.e. p ωx = kxkp-var;[·,·], we shall also write Nα,[s,t] (x) := Nα,[s,t] (ωx). Recall that if ω1 and ω2 are controls, also ω1 + ω2 is a control.

Lemma 6.1.2. Let ω1 and ω2 be two controls. Then

$$
N_{\alpha,[s,t]}(\omega^1 + \omega^2) \le 2N_{\alpha,[s,t]}(\omega^1) + 2N_{\alpha,[s,t]}(\omega^2) + 2
$$

for every $s < t$ and $\alpha > 0$.

Proof. If $\omega$ is any control, set

$$
\omega_\alpha(s,t) := \sup_{\substack{(t_i) = D \subset [s,t] \\ \omega(t_i,t_{i+1}) \le \alpha}}\ \sum_i \omega(t_i,t_{i+1}).
$$

1 2 i Ifω ¯ := ω + ω ,ω ¯(ti, ti+1) ≤ α implies ω (ti, ti+1) ≤ α for i = 1, 2 and thereforeω ¯α (s, t) ≤ 1 2 i i  ωα (s, t)+ωα (s, t). From Proposition 4.6 in [CLL] we know that ωα (s, t) ≤ α 2Nα,[s,t] ω + 1 for i = 1, 2. (Strictly speaking, Proposition 4.6 is formulated for a particular control ω, namely the control induced by the p-variation of a rough path. However, the proof only uses general properties of control functions and the conclusion remains valid.) We conclude

Nα,[s,t](¯ω)−1 X αNα,[s,t] (¯ω) = ω¯(τi (α) , τi+1 (α)) i=0

≤ ω¯α(s, t) 1 2 ≤ ωα (s, t) + ωα (s, t) 1 2  ≤ α 2Nα,[s,t] ω + 2Nα,[s,t] ω + 2 .

133 Rough paths estimates and MLMC

Lemma 6.1.3. Let ω¹ and ω² be two controls and assume that ω²(s, t) ≤ K. Then

N_{α,[s,t]}(ω¹ + ω²) ≤ N_{α−K,[s,t]}(ω¹)

for every α > K.

Proof. Set ω̄ := ω¹ + ω² and

τ̄_0(α) = s,
τ̄_{i+1}(α) = inf{u : ω̄(τ̄_i(α), u) ≥ α, τ̄_i(α) < u ≤ t} ∧ t.

Similarly, we define (τ_i)_{i∈ℕ} = (τ_i(α − K))_{i∈ℕ} for ω¹. It suffices to show that τ̄_i ≥ τ_i for i = 0, …, N_{α,[s,t]}(ω̄). We do this by induction. For i = 0, this is clear. If τ̄_i ≥ τ_i for some i ≤ N_{α,[s,t]}(ω̄) − 1, superadditivity of control functions gives

α = ω̄(τ̄_i, τ̄_{i+1}) ≤ ω¹(τ_i, τ̄_{i+1}) + K

which implies τ_{i+1} ≤ τ̄_{i+1}.

Let x¹, x² be p-rough paths and ω a control. Let V^i = (V^i_1, …, V^i_d), i = 1, 2, be two families of vector fields, γ > p, ν a bound on |V¹|_{Lip^γ} and |V²|_{Lip^γ}, and y¹, y² the solutions of the RDEs

dy^i_t = V^i(y^i_t) dx^i_t;  y^i_s ∈ ℝ^e

for s ≤ t and i = 1, 2.

Lemma 6.1.4. Let s < t ∈ [0, T] and assume that ‖x^i‖_{p−ω;[s,t]} ≤ 1 for i = 1, 2. Then there is a constant C = C(γ, p) such that

ν|y¹ − y²|_{∞;[s,t]} ≤ [ν|y¹_s − y²_s| + |V¹ − V²|_{Lip^{γ−1}} + ν ρ_{p−ω;[s,t]}(x¹, x²)]
                      × (N_{α,[s,t]}(ω) + 1) exp{Cν^p α(N_{α,[s,t]}(ω) + 1)}

for every α > 0.

Proof. Set ȳ = y¹ − y² and

κ = |V¹ − V²|_{Lip^{γ−1}}/ν + ρ_{p−ω;[s,t]}(x¹, x²).

From [FV10b, Theorem 10.26] we can deduce that there is a constant C = C(γ, p) such that

|ȳ_{u,v}| ≤ Cν ω(u, v)^{1/p} [|ȳ_u| + κ] exp{Cν^p ω(u, v)}

for every u < v ∈ [s, t]. From |ȳ_{u,v}| ≥ |ȳ_{s,v}| − |ȳ_{s,u}| we obtain

|ȳ_{s,v}| ≤ Cν ω(u, v)^{1/p} [|ȳ_u| + κ] exp{Cν^p ω(u, v)} + |ȳ_{s,u}|
          ≤ C[|ȳ_s| + |ȳ_{s,u}| + κ] exp{Cν^p ω(u, v)}

for s ≤ u < v ≤ t. Now let s = τ_0 < τ_1 < … < τ_M < τ_{M+1} = v ≤ t for M ≥ 0. By induction, one sees that

|ȳ_{s,v}| ≤ (M + 1)(|ȳ_s| + κ) exp{Cν^p Σ_{i=0}^M ω(τ_i, τ_{i+1})}
          ≤ C^{M+1} [|ȳ_s| + κ] exp{Cν^p Σ_{i=0}^M ω(τ_i, τ_{i+1})}.


It follows that for every v ∈ [s, t],

|ȳ_{s,v}| ≤ [|ȳ_s| + κ](N_{α,[s,t]}(ω) + 1) exp{Cν^p α(N_{α,[s,t]}(ω) + 1)},

therefore

|ȳ_v| ≤ [|ȳ_s| + κ](N_{α,[s,t]}(ω) + 1) exp{Cν^p α(N_{α,[s,t]}(ω) + 1)} + |ȳ_s|

and finally

|ȳ|_{∞;[s,t]} ≤ [|ȳ_s| + κ](N_{α,[s,t]}(ω) + 1) exp{Cν^p α(N_{α,[s,t]}(ω) + 1)}.

Next, we recover the well-known local Lipschitz continuity property of the Itô–Lyons map. In comparison with [FV10b, Theorem 10.26], the estimate is sharpened in such a way that we can use the integrability results of [CLL].

Corollary 6.1.5. Consider the RDEs

dy^i_t = V^i(y^i_t) dx^i_t;  y^i_0 ∈ ℝ^e

for i = 1, 2 on [0, T], where V¹ and V² are two families of vector fields, γ > p and ν is a bound on |V¹|_{Lip^γ} and |V²|_{Lip^γ}. Then for every α > 0 there is a constant C = C(γ, p, ν, α) such that

|y¹ − y²|_{∞;[0,T]} ≤ C [|y¹_0 − y²_0| + |V¹ − V²|_{Lip^{γ−1}} + ρ_{p−var;[0,T]}(x¹, x²)]
                    × exp{C(N_{α,[0,T]}(x¹) + N_{α,[0,T]}(x²) + 1)}

holds.

Proof. Let ω be a control such that ‖x^i‖_{p−ω;[0,T]} ≤ 1 for i = 1, 2 (the precise choice of ω will be made later). From Lemma 6.1.4 we know that there is a constant C = C(γ, p, ν, α) such that

|y¹ − y²|_{∞;[0,T]} ≤ [|y¹_0 − y²_0| + |V¹ − V²|_{Lip^{γ−1}} + ρ_{p−ω;[0,T]}(x¹, x²)]
                    × exp{C(N_{α,[0,T]}(ω) + 1)}.

Now we set ω = ωx1,x2 where

ω_{x¹,x²}(s, t) = ‖x¹‖^p_{p−var;[s,t]} + ‖x²‖^p_{p−var;[s,t]} + Σ_{k=1}^{⌊p⌋} ( ρ^{(k)}_{p−var;[s,t]}(x¹, x²) / ρ^{(k)}_{p−var;[0,T]}(x¹, x²) )^{p/k}.

It is easy to check that

‖x¹‖_{p−ω_{x¹,x²};[0,T]} ≤ 1,
‖x²‖_{p−ω_{x¹,x²};[0,T]} ≤ 1 and
ρ_{p−ω_{x¹,x²};[0,T]}(x¹, x²) ≤ ρ_{p−var;[0,T]}(x¹, x²).

Finally, if α > ⌊p⌋ we can use Lemma 6.1.3 and Lemma 6.1.2 to see that

N_{α,[0,T]}(ω_{x¹,x²}) + 1 ≤ N_{α−⌊p⌋,[0,T]}(ω_{x¹} + ω_{x²}) + 1
                          ≤ 3(N_{α−⌊p⌋,[0,T]}(x¹) + N_{α−⌊p⌋,[0,T]}(x²) + 1).

Substituting α ↦ α + ⌊p⌋ gives the claimed estimate.


Remark 6.1.6. Comparing the result above with [FV10b, Theorem 10.26], one sees that we obtain a slightly weaker result; namely, the distance between y¹ and y² is measured here in the uniform topology instead of the p-variation topology. However, with a little more effort, one can show that the same estimate holds for ρ_{p-var;[0,T]}(y¹, y²) instead of |y¹ − y²|_{∞;[0,T]}.

6.1.2 Deterministic convergence of Euler approximations based on the entire rough path

We are now interested in convergence rates for Euler schemes. Recall the notation from [FV10b]: If V = (V_1, …, V_d) is a collection of sufficiently smooth vector fields on ℝ^e, g ∈ T^N(ℝ^d) and y ∈ ℝ^e, we define an increment of the step-N Euler scheme by

E_{(V)}(y, g) := Σ_{k=1}^N V_{i_1} ⋯ V_{i_k} I(y) g^{k,i_1,…,i_k}

where g^{k,i_1,…,i_k} = π_k(g)^{i_1,…,i_k} ∈ ℝ, I is the identity on ℝ^e and every V_j is identified with the first-order differential operator V_j^{(k)}(y) ∂/∂y^k (throughout, we use the Einstein summation convention). Furthermore, we set

E^g y := y + E_{(V)}(y, g).

Given D = {0 = t_0 < … < t_n = T} and a path x ∈ C^{p−var}([0, T]; G^{⌊p⌋}(ℝ^d)) we define the (step-N) Euler approximation to the RDE solution y of

dy = V (y) dx (6.3)

with starting point y_0 ∈ ℝ^e at time t_k ∈ D by

y^{Euler;D}_{t_k} := E^{t_k←t_0} y_0 := E^{S_N(x)_{t_{k−1},t_k}} ∘ ⋯ ∘ E^{S_N(x)_{t_0,t_1}} y_0.

Proposition 6.1.7. Let x ∈ C^{p−var}([0, T]; G^{⌊p⌋}(ℝ^d)) and set ω(s, t) = ‖x‖^p_{p−var;[s,t]}. Assume that V ∈ Lip^θ for some θ > p and let ν ≥ |V|_{Lip^θ}. Choose N ∈ ℕ such that ⌊p⌋ ≤ N ≤ θ. Fix a dissection D = {0 = t_0 < … < t_n = T} of [0, T] and let y^{Euler;D}_T denote the step-N Euler approximation of y. Then for every ζ ∈ [N/p, (N+1)/p) and α > 0 there is a constant C = C(p, θ, ζ, N, ν, α) such that

|y_T − y^{Euler;D}_T| ≤ C exp{C(N_{α,[0,T]}(x) + 1)} Σ_{k=1}^n ω(t_{k−1}, t_k)^ζ.

In particular, if x is a Hölder rough path and |t_{k+1} − t_k| ≤ |D| for all k we obtain

|y_T − y^{Euler;D}_T| ≤ CT ‖x‖^{ζp}_{1/p-Höl;[0,T]} exp{C(N_{α,[0,T]}(x) + 1)} |D|^{ζ−1}.   (6.4)

Proof. We basically repeat the proof of [FV10b, Theorem 10.30]. Recall the notation π_{(V)}(s, y_s; x) for the (unique) solution of (6.3) with starting point y_s at time s. Set

z^k = π_{(V)}(t_k, E^{t_k←t_0} y_0; x).

Then z^0_T = y_T, z^k_{t_k} = E^{t_k←t_0} y_0 for every k = 1, …, n and z^n_T = y^{Euler;D}_T, hence

|y_T − y^{Euler;D}_T| ≤ Σ_{k=1}^n |z^k_T − z^{k−1}_T|.


One can easily see that

z^{k−1}_T = π_{(V)}(t_{k−1}, z^{k−1}_{t_{k−1}}; x)_T = π_{(V)}(t_k, z^{k−1}_{t_k}; x)_T

for all k = 1, …, n. Applying Corollary 6.1.5 (in particular the Lipschitzness in the starting point) we obtain for any α > 0

|z^k_T − z^{k−1}_T| ≤ c_1 |z^k_{t_k} − z^{k−1}_{t_k}| exp{c_1(N_{α,[0,T]}(x) + 1)}.

Moreover (cf. [FV10b, Theorem 10.30]),

|z^k_{t_k} − z^{k−1}_{t_k}| ≤ |π_{(V)}(t_{k−1}, ·; x)_{t_{k−1},t_k} − E_{(V)}(·, S_N(x)_{t_{k−1},t_k})|_∞.

Let δ ∈ [0, 1) be such that ζ = (N + δ)/p. Since (N + δ) − 1 < N ≤ γ we have V ∈ Lip^{(N+δ)−1}. Thus we can apply [FV10b, Corollary 10.15] to see that

|π_{(V)}(t_{k−1}, ·; x)_{t_{k−1},t_k} − E_{(V)}(·, S_N(x)_{t_{k−1},t_k})|_∞ ≤ c_2 |V|^{N+δ}_{Lip^{(N+δ)−1}} ‖x‖^{N+δ}_{p−var;[t_{k−1},t_k]}
                                                                        ≤ c_2 |V|^{pζ}_{Lip^γ} ω(t_{k−1}, t_k)^ζ

which gives the claim.

6.2 Probabilistic convergence results for RDEs

6.2.1 L^r-rates for the step-N Euler approximation (based on the entire rough path)

We now give convergence rates for the step-N Euler scheme in L^r. Although the scheme described here is not easy to implement (simulation of higher order iterated integrals!), it will serve as a stepping stone to establish the rates for the (easy-to-implement) simplified Euler scheme discussed later in this section. For simplicity, we formulate it only in the Hölder case.

Theorem 6.2.1. Let X : [0, T] → ℝ^d be a continuous, centered Gaussian process with independent components. Assume further that V_ρ(R_X; [s, t]²) ≤ K|t − s|^{1/ρ} holds for all s < t, some ρ ∈ [1, 2) and a constant K. Let H denote the Cameron–Martin space associated to X. Assume that

ι : H ↪ C^{q−var}

and let M ≥ |ι|_{op}. Choose p > 2ρ and assume that V ∈ Lip^θ for some θ > p and let ν ≥ |V|_{Lip^θ}. Set D = {0 < ε < 2ε < … < (⌊T/ε⌋ − 1)ε < T} and let Y^{Euler;D}_T denote the step-N Euler approximation of Y, the (pathwise) solution of

dY = V(Y) dX;  Y_0 ∈ ℝ^e

where N is chosen such that ⌊p⌋ ≤ N ≤ θ. Then for every r ≥ 1, r′ > r and ζ ∈ [N/p, (N+1)/p) there is a constant C = C(ρ, p, q, θ, ν, K, M, r, r′, N, ζ) such that

‖Y_T − Y^{Euler;D}_T‖_{L^r} ≤ CT ‖‖X‖^{ζp}_{1/p-Höl;[0,T]}‖_{L^{r′}} ε^{ζ−1}

holds for all ε > 0.


Remark 6.2.2. By choosing p̂ ∈ (2ρ, p) one has (N+1)/p < (N+1)/p̂, and applying the Theorem with p̂ instead of p shows that

‖Y_T − Y^{Euler;D}_T‖_{L^r} ≲ ε^{(N+1)/p − 1}

holds for every p > 2ρ if ε → 0.

Proof. Similar to the proof of the forthcoming Theorem 6.2.4: we use the pathwise estimate (6.4) and take the L^r norm on both sides. The Hölder inequality shows that

‖Y_T − Y^{Euler;D}_T‖_{L^r} ≤ c_1 T ‖‖X‖^{ζp}_{1/p-Höl;[0,T]}‖_{L^{r′}} ‖exp{C(N_{α,[0,T]}(X) + 1)}‖_{L^{r″}} |D|^{ζ−1}

holds for some (possibly large) r″ > r. Applying the results of [CLL] (see also [FR13]) shows that ‖exp{C(N_{α,[0,T]}(X) + 1)}‖_{L^{r″}} < ∞ holds for all α > 0. However, if we want a bound depending only on K and M, we have to (and can!) choose α large enough (using the result of [FR13]) to obtain

‖exp{C(N_{α,[0,T]}(X) + 1)}‖_{L^{r″}} ≤ c_2 < ∞.

6.2.2 L^r-rates for Wong–Zakai approximations

We aim to formulate a version of the Wong–Zakai Theorem which contains convergence rates in L^r, any r ≥ 1, for a class of suitable approximations X^ε of X. By this, we mean that

(i) (X^ε, X) : [0, T] → ℝ^{d+d} is jointly Gaussian, (X^{ε;i}, X^i) and (X^{ε;j}, X^j) are independent for i ≠ j and

sup_{ε∈(0,1]} V_ρ(R_{(X^ε,X)}; [0, T]²) =: K < ∞

for some ρ ∈ [1, 2).

(ii) Uniform convergence of the second moments:

sup_{t∈[0,T]} E[|X^ε_t − X_t|²] =: δ(ε)^{1/ρ} → 0 for ε → 0.

Example 6.2.3. A typical example of such approximations are the piecewise linear approximations of X(ω) at the time points {0 < ε < 2ε < … < (⌊T/ε⌋ − 1)ε < T} (see [FV10b, Chapter 15.5]). In the case V_ρ(R_X; [s, t]²) ≲ |t − s|^{1/ρ} (i.e. if we deal with Hölder rough paths), one can show that δ(ε) ≲ ε.
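For Brownian motion (ρ = 1) this order can be checked by simulation: on a mesh interval of length ε the difference X^ε − X is a Brownian bridge, whose variance ε/4 at the midpoint realizes sup_t E|X^ε_t − X_t|². The sketch below (our own toy check; ε, path count and fine mesh are hypothetical choices) estimates this quantity on one mesh cell:

```python
import math, random

# On one cell [0, eps], the piecewise linear interpolation of Brownian motion
# differs from the path by a Brownian bridge; its second moment is maximal at
# the midpoint, where it equals eps/4.
def max_second_moment(eps=0.1, fine_per_cell=10, n_paths=20000, seed=7):
    rng = random.Random(seed)
    h = eps / fine_per_cell                 # fine step inside the cell
    acc = [0.0] * (fine_per_cell + 1)       # running sums of squared errors
    for _ in range(n_paths):
        w = [0.0]
        for _ in range(fine_per_cell):
            w.append(w[-1] + rng.gauss(0.0, math.sqrt(h)))
        for j in range(fine_per_cell + 1):
            lin = w[0] + (w[-1] - w[0]) * j / fine_per_cell  # linear interp.
            acc[j] += (lin - w[j]) ** 2
    return max(a / n_paths for a in acc)

m = max_second_moment()   # should be close to eps/4 = 0.025
```

This is consistent with δ(ε) ≲ ε for ρ = 1 in the notation above.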

Theorem 6.2.4. Let X : [0, T] → ℝ^d be a centered Gaussian process with continuous sample paths and independent components and X^ε a suitable approximation as above for some ρ ∈ [1, 2). Let H̄^ε denote the Cameron–Martin space of the joint process (X^ε, X). Assume that complementary Young regularity holds for the sample paths of (X^ε, X) and the paths in the Cameron–Martin space H̄^ε, i.e. there are p, q ≥ 1 such that p > 2ρ, 1/p + 1/q > 1 and that there is a continuous embedding

ι^ε : H̄^ε ↪ C^{q−var}

and furthermore

sup_{ε∈(0,1]} |ι^ε|_{op} =: M < ∞.


Let X and X^ε denote the lift of X resp. X^ε to a process with p-rough sample paths for some p > 2ρ. Let V = (V_1, …, V_d) be a collection of vector fields in ℝ^e with |V|_{Lip^θ} ≤ ν < ∞ where θ ≥ 2ρ/(ρ−1). Let Y, Y^ε : [0, T] → ℝ^e denote the pathwise solutions to the equations

dY_t = V(Y_t) dX_t;  Y_0 ∈ ℝ^e,
dY^ε_t = V(Y^ε_t) dX^ε_t;  Y^ε_0 = Y_0 ∈ ℝ^e.

Then, for any η < 1/ρ − 1/2 and r ≥ 1 there is a constant C = C(ρ, p, q, θ, ν, K, M, η, r) such that

‖|Y − Y^ε|_{∞;[0,T]}‖_{L^r} ≤ C δ(ε)^η

holds for all ε > 0.

Remark 6.2.5. The assumptions on the Cameron–Martin paths are always fulfilled for ρ ∈ [1, 3/2) using the Cameron–Martin embedding from [FV10b]. In this case, M ≤ √K. In the case ρ ∈ [3/2, 2), they are still fulfilled provided complementary Young regularity holds for the sample paths of X and the paths in its Cameron–Martin space H and if the operators Λ^ε : ω ↦ ω^ε are uniformly bounded as operators C^{q−var} → C^{q−var}. This is the case, for instance, when dealing with piecewise-linear or mollifier approximations.

Proof. Set X⁰ = X and let H^ε denote the Cameron–Martin space associated to X^ε. By assumption, we know that

|h|_{q−var} ≤ M |h|_{H^ε}

holds for every h ∈ H^ε and ε ≥ 0. Lemma 5 together with Corollary 2 and Remark 1 in [FR13] show that there is an α = α(p, ρ, K) > 0 and a positive constant c_1 = c_1(p, q, ρ, M) such that we have the uniform tail estimate

P(N_{α,[0,T]}(X^ε) > u) ≤ exp{−c_1 α^{2/p} u^{2/q}}

for all u > 0 and ε ≥ 0. Choose γ such that η = (1/(2ρ))(1 − ρ/γ). By assumption on η we have 1/ρ + 1/γ > 1 and we can choose p̂ ∈ (2γ, θ). Set X̂^ε = S_{⌊p̂⌋}(X^ε) and X̂ = S_{⌊p̂⌋}(X). Lipschitzness of the map S_{⌊p̂⌋} and [FR13, Lemma 2] show that also

P(N_{α,[0,T]}(X̂^ε) > u) ≤ exp{−c_1 α^{2/p} u^{2/q}}   (6.5)

holds for all u > 0 and ε ≥ 0 for a possibly smaller α > 0. Now we use Corollary 6.1.5 and the Cauchy–Schwarz inequality to see that

‖|Y − Y^ε|_{∞;[0,T]}‖_{L^r} ≤ c_2 ‖ρ_{p̂−var;[0,T]}(X̂^ε, X̂)‖_{L^{2r}} ‖exp{c_2(N_{α,[0,T]}(X̂^ε) + N_{α,[0,T]}(X̂) + 1)}‖_{L^{2r}}

for a constant c_2. The uniform tail estimates (6.5) show that

sup_{ε≥0} ‖exp{c_2(N_{α,[0,T]}(X̂^ε) + N_{α,[0,T]}(X̂) + 1)}‖_{L^{2r}} ≤ c_3 < ∞.

Using [FR, Theorem 6] gives

‖ρ_{p̂−var;[0,T]}(X̂^ε, X̂)‖_{L^{2r}} ≤ c_4 sup_{t∈[0,T]} ‖X^ε_t − X_t‖_{L²}^{1−ρ/γ} = δ(ε)^η

for a constant c_4, which finishes the proof.


6.2.3 L^r-rates for the simplified Euler schemes

For N ≥ 2, step-N Euler schemes contain iterated integrals whose distributions are not easy to simulate when dealing with Gaussian processes. In contrast, the simplified step-N Euler schemes avoid this difficulty by substituting the iterated integrals by a product of increments. In the context of fractional Brownian motion, this scheme was introduced in [DNT12]. We make the following definition: If V = (V_1, …, V_d) is sufficiently smooth, x is a p-rough path, y ∈ ℝ^e and N ≥ ⌊p⌋, we set

E^{simple}_{(V)}(y, S_N(x)_{s,t}) := Σ_{k=1}^N (1/k!) V_{i_1} ⋯ V_{i_k} I(y) x^{i_1}_{s,t} ⋯ x^{i_k}_{s,t}

for s < t and

E^{S_N(x)_{s,t}}_{simple} y := y + E^{simple}_{(V)}(y, S_N(x)_{s,t}).

Given D = {0 = t_0 < … < t_n = T} and a path x ∈ C^{p−var}([0, T]; G^{⌊p⌋}(ℝ^d)) we define the simplified (step-N) Euler approximation to the RDE solution y of

dy = V (y) dx

with starting point y_0 ∈ ℝ^e at time t_k ∈ D by

y^{simple Euler;D}_{t_k} := E^{t_k←t_0}_{simple} y_0 := E^{S_N(x)_{t_{k−1},t_k}}_{simple} ∘ ⋯ ∘ E^{S_N(x)_{t_0,t_1}}_{simple} y_0

and at time t ∈ (t_k, t_{k+1}) by

y^{simple Euler;D}_t := ((t − t_k)/(t_{k+1} − t_k)) (y^{simple Euler;D}_{t_{k+1}} − y^{simple Euler;D}_{t_k}) + y^{simple Euler;D}_{t_k}.
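For a smooth driver the simplified scheme can be tested directly against a closed-form solution. The following sketch (our own toy example, not from the text) applies the simplified step-2 Euler scheme to dy = y dx with x(t) = sin t, whose exact solution is y_t = y_0 e^{sin t}:

```python
import math

# Simplified step-2 Euler scheme for dy = y dx: with V = y d/dy one has
# V I(y) = y and V V I(y) = y, and the iterated integral is replaced by the
# product of increments dx * dx / 2!, so one step reads
#   y <- y + y*dx + (1/2)*y*dx^2.
def simplified_step2_euler(y0, x, t_grid):
    y = y0
    for t0, t1 in zip(t_grid, t_grid[1:]):
        dx = x(t1) - x(t0)
        y = y + y * dx + 0.5 * y * dx * dx
    return y

n = 1000
grid = [k / n for k in range(n + 1)]
approx = simplified_step2_euler(1.0, math.sin, grid)
exact = math.exp(math.sin(1.0))
```

With n = 1000 steps the error at t = 1 is of order n^{−2}, reflecting the third-order local accuracy of the step-2 increment for smooth drivers.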

Corollary 6.2.6. Let X : [0, T] → ℝ^d be as in Theorem 6.2.1. Assume that |V|_{Lip^θ} ≤ ν < ∞ for some θ ∈ (0, ∞] chosen such that θ ≥ 2ρ/(ρ−1). Choose N ∈ ℕ such that ⌊2ρ⌋ ≤ N ≤ θ and D = {0 < ε < 2ε < … < (⌊T/ε⌋ − 1)ε < T} for ε > 0. Then for any δ > 0 and r ≥ 1,

‖|Y − Y^{simple Euler;D}|_∞‖_{L^r} ≲ ε^{1/(2ρ) − δ} + ε^{1/ρ − 1/2 − δ} + ε^{(N+1)/(2ρ) − 1 − δ}

for all ε > 0.

Remark 6.2.7. In the proof we will see that the rate 1/(2ρ) − δ comes from the (almost) 1/(2ρ)-Hölder-regularity of the sample paths of Y, the rate 1/ρ − 1/2 − δ is the rate for the Wong–Zakai approximation and (N+1)/(2ρ) − 1 − δ comes from the step-N Euler approximation. Since we always assume ρ ≥ 1, the Wong–Zakai error always dominates the first error. In particular, for ρ = 1 we can choose N = 2 to obtain a rate arbitrarily close to 1/2. For ρ > 1, the choice N = 3 gives a rate of almost 1/ρ − 1/2. In both cases the rate does not increase for larger choices of N.

Proof. Let X^ε denote the Gaussian process whose sample paths are piecewise linearly approximated at the time points given by D and let Y^ε : [0, T] → ℝ^e denote the pathwise solution to the equation

dY^ε = V(Y^ε) dX^ε;  Y^ε_0 = Y_0 ∈ ℝ^e.

Then for any t_k, t_{k+1} ∈ D we have

X^{ε;k;i_1,…,i_k}_{t_k,t_{k+1}} = (1/k!) X^{i_1}_{t_k,t_{k+1}} ⋯ X^{i_k}_{t_k,t_{k+1}},


hence Y^{simple Euler;D}_t = Y^{ε;Euler;D}_t for any t ∈ D and thus

|Y_t − Y^{simple Euler;D}_t| ≤ |Y − Y^ε|_∞ + max_{t_k∈D} |Y^ε_{t_k} − Y^{ε;Euler;D}_{t_k}|

if t ∈ D. For t ∉ D, choose t_k ∈ D such that t_k < t < t_{k+1}. Set a = (t − t_k)/(t_{k+1} − t_k) and b = (t_{k+1} − t)/(t_{k+1} − t_k). Then a + b = 1 and by the triangle inequality,

|Y_t − Y^{simple Euler;D}_t| ≤ a|Y_t − Y_{t_{k+1}}| + b|Y_t − Y_{t_k}| + a|Y_{t_{k+1}} − Y^{simple Euler;D}_{t_{k+1}}| + b|Y_{t_k} − Y^{simple Euler;D}_{t_k}|
≲ ε^{1/p} ‖Y‖_{1/p-Höl;[0,T]} + max_{t_k∈D} |Y_{t_k} − Y^{simple Euler;D}_{t_k}|
≲ ε^{1/p} (‖X‖_{1/p-Höl;[0,T]} ∨ ‖X‖^p_{1/p-Höl;[0,T]}) + |Y − Y^ε|_∞ + max_{t_k∈D} |Y^ε_{t_k} − Y^{ε;Euler;D}_{t_k}|

for any p > 2ρ. Since the right hand side does not depend on t, we can pass to the sup-norm on the left hand side. We now take the L^r-norm on both sides and check that the conditions of Theorems 6.2.4 and 6.2.1 are fulfilled and that the constants can be chosen independently of ε. Since we are dealing with piecewise linear approximations, we have

sup_{ε>0} V_ρ(R_{(X^ε,X)}; [0, T]²) < ∞ and sup_{t∈[0,T]} E[|X^ε_t − X_t|²] ≲ ε^{1/ρ}

(cf. [FV10b, Chapter 15]). Furthermore, for every ω ∈ Ω one has |ω^ε|_{p−var} ≤ 3^{1−1/p}|ω|_{p−var} and |ω^ε|_{1/p−Höl} ≤ 3^{1−1/p}|ω|_{1/p−Höl} for every p ≥ 1 and ε > 0 (follows, for instance, from [FV10b, Theorem 5.23]). This shows that we can apply Theorem 6.2.4 to see that for any δ > 0,

‖|Y − Y^ε|_∞‖_{L^r} ≲ ε^{1/ρ − 1/2 − δ}

holds for all ε > 0. Furthermore, we can choose p′ > 2ρ such that (N+1)/p′ − 1 = (N+1)/(2ρ) − 1 − δ and then apply Theorem 6.2.1. Since ‖d_{1/p′-Höl}(X^ε, X)‖_{L^r} → 0 for ε → 0, clearly sup_{ε>0} ‖‖X^ε‖_{1/p′-Höl;[0,T]}‖_{L^r} < ∞ and the constants on the right hand side do indeed not depend on ε. Choosing p such that 1/p = 1/(2ρ) − δ gives the claim.

6.3 Giles’ complexity theorem revisited

We adapt the main theorem of [Gil08b] to our later needs. Below one should think of

P = f(Y_·)

for a Lipschitz function f and Y the solution to the Gaussian RDE dY = V(Y) dX. Let P̂_l denote some (modified) Milstein approximation à la [DNT12], for instance (6.2), based on a mesh-size h_l = T/(M_0 M^l). Recall the basic idea

E[P] ≈ E[P̂_L] for L large
     = E[P̂_0] + Σ_{l=1}^L E[P̂_l − P̂_{l−1}];


set P̂_{−1} ≡ 0 and define the (unbiased) estimator Ŷ_l of E[P̂_l − P̂_{l−1}], say

Ŷ_l = (1/N_l) Σ_{i=1}^{N_l} (P̂^{(i)}_l − P̂^{(i)}_{l−1})   (6.6)

based on i = 1, …, N_l independent samples. Note that P̂^{(i)}_l − P̂^{(i)}_{l−1} comes from approximations with different mesh but the same realization of the driving noise. In fact, we have the following abstract theorem, an extension of [Gil08b] to the case α < 1/2.
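Before stating the theorem, here is a self-contained toy sketch of the estimator (6.6) (our own illustration: an Euler scheme for geometric Brownian motion with f(Y) = Y_T, so that E[P] = exp(μT) is known in closed form; all parameter values are hypothetical):

```python
import math, random

# Toy multilevel Monte Carlo estimator (6.6) for dY = mu*Y dt + sigma*Y dW
# with f(Y) = Y_T; the exact value is E[Y_T] = exp(mu*T).
def mlmc_estimate(mu=0.05, sigma=0.2, T=1.0, L=4,
                  N=(20000, 8000, 4000, 2000, 1000), seed=42):
    rng = random.Random(seed)

    def euler_pair(level):
        """One coupled sample (P_fine, P_coarse): same Brownian increments,
        the coarse increments obtained by summing pairs of fine ones."""
        n_f = 2 ** level
        h_f = T / n_f
        dw = [rng.gauss(0.0, math.sqrt(h_f)) for _ in range(n_f)]
        yf = 1.0
        for d in dw:
            yf *= 1.0 + mu * h_f + sigma * d
        if level == 0:
            return yf, 0.0          # convention P_{-1} = 0
        yc, h_c = 1.0, 2.0 * h_f
        for k in range(0, n_f, 2):
            yc *= 1.0 + mu * h_c + sigma * (dw[k] + dw[k + 1])
        return yf, yc

    est = 0.0
    for level in range(L + 1):
        diffs = [pf - pc for pf, pc in (euler_pair(level) for _ in range(N[level]))]
        est += sum(diffs) / len(diffs)      # hat{Y}_l as in (6.6)
    return est

value = mlmc_estimate()   # should be close to exp(0.05) ~ 1.0513
```

Each level reuses the same Brownian increments for the fine and the coarse mesh, exactly as required for P̂^{(i)}_l − P̂^{(i)}_{l−1}.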

Theorem 6.3.1. Let 0 < α < 1/2 and 0 < β ≤ 2α. Following Giles, we assume that there are constants c_1, c_2, c_2′ and c_3 such that

(i) |E[P̂_l − P]| ≤ c_1 h_l^α,

(ii) E[Ŷ_0] = E[P̂_0] and E[Ŷ_l] = E[P̂_l − P̂_{l−1}], l > 0,

(iii) V[Ŷ_0] ≤ c_2′ N_0^{−1} and V[Ŷ_l] ≤ c_2 N_l^{−1} h_l^β for l ∈ ℕ,²

(iv) the complexity C_l of computing Ŷ_l is bounded by C_0 ≤ c_3 N_0 h_0^{−1} for l = 0 and C_l ≤ c_3 N_l (h_l^{−1} + h_{l−1}^{−1}) for l ≥ 1.³

Then for every ε > 0, there are choices L and N_l, 0 ≤ l ≤ L, to be given below in (6.10) and (6.11), respectively, and constants c_4 and c_5 given in (6.12) together with (6.13) such that the multilevel estimator Ŷ = Σ_{l=0}^L Ŷ_l satisfies the mean square error bound

MSE ≡ E[(Ŷ − E[P])²] < ε²,

with complexity bound

C ≤ const · ε^{−(1+2α−β)/α} + o(ε^{−(1+2α−β)/α}),

where const = c_4 for β < 2α and const = c_4 + c_5 for β = 2α.

Proof. We first ignore the basic requirement of L and N_l being integer-valued and obtain (almost) optimal real-valued choices for L and N_l. Then we verify the above bounds for the MSE and the complexity using the smallest integers dominating our real-valued choices. In this proof, we abuse notation by setting T = T/M_0, noting that both complexity and MSE depend on T and M_0 only through T/M_0.

² We distinguish between c_2′ and c_2, since the former controls the variance V[Ŷ_0], which is often already proportional to the variance of f(Y_·), whereas the latter controls the variance of the differences Ŷ_l, which is often much smaller in size.

³ Note that the complexity at the 0-level is proportional to the number of time-steps h_0^{−1}, whereas at higher levels, we need to apply the numerical scheme twice, once for the finer and once for the coarser grid.


The mean-square-error satisfies

MSE = E[(Ŷ − E[P])²]
    = E[(Ŷ − E[Ŷ])²] + (E[Ŷ] − E[P])²
    = V[Ŷ] + (E[P̂_L] − E[P])²
    ≤ Σ_{l=0}^L V[Ŷ_l] + c_1² h_L^{2α}
    ≤ c_2′ N_0^{−1} + c_2 T^β Σ_{l=1}^L N_l^{−1} M^{−lβ} + c_1² h_L^{2α}.

Now we need to minimize the total computational work

C ≤ c_3 N_0 h_0^{−1} + c_3 Σ_{l=1}^L N_l (h_l^{−1} + h_{l−1}^{−1})
  = c_3 T^{−1} [N_0 + ((M+1)/M) Σ_{l=1}^L N_l M^l]

under the constraint MSE ≤ ε². We first assume L to be given and minimize over N_0, …, N_L, and then we try to find an optimal L. We consider the Lagrange function

f(N_0, …, N_L, λ) ≡ c_3 T^{−1} [N_0 + ((M+1)/M) Σ_{l=1}^L N_l M^l]
                  + λ (c_2′ N_0^{−1} + c_2 T^β Σ_{l=1}^L N_l^{−1} M^{−lβ} + c_1² h_L^{2α} − ε²).

Taking derivatives with respect to N_l, 0 ≤ l ≤ L, we arrive at

∂f/∂N_0 = c_3 T^{−1} − λ c_2′ N_0^{−2} = 0,
∂f/∂N_l = c_3 T^{−1} ((M+1)/M) M^l − λ c_2 T^β M^{−lβ} N_l^{−2} = 0,

implying that

N_0 = √λ √(c_2′/c_3) √T,   (6.7a)
N_l = √λ √(c_2/c_3) T^{(1+β)/2} √(M/(M+1)) M^{−l(1+β)/2},  1 ≤ l ≤ L,   (6.7b)

which we insert into the bound for the MSE to obtain

√λ = [√(c_2′ c_3 T^{−β}) + √(c_2 c_3) √((M+1)/M) M^{(1−β)/2} (M^{L(1−β)/2} − 1)/(M^{(1−β)/2} − 1)]
     × T^{−(1−β)/2} / (ε² − c_1² T^{2α} M^{−2αL}).   (6.8)


By construction, we see that for any such choice of N_0, …, N_L, the MSE is, indeed, bounded by ε². For fixed L, the total complexity is now given by

C(L) ≡ c_3 T^{−1} [√(c_2′/c_3) √λ √T + ((M+1)/M) Σ_{l=1}^L M^l √λ √(c_2/c_3) T^{(1+β)/2} √(M/(M+1)) M^{−l(1+β)/2}]
     = √λ T^{−(1−β)/2} [√(c_2′ c_3 T^{−β}) + √(c_2 c_3) √((M+1)/M) M^{(1−β)/2} (M^{L(1−β)/2} − 1)/(M^{(1−β)/2} − 1)]   (6.9)
     = [√(c_2′ c_3 T^{−β}) + √(c_2 c_3) √((M+1)/M) M^{(1−β)/2} (M^{L(1−β)/2} − 1)/(M^{(1−β)/2} − 1)]²
       × T^{−(1−β)} / (ε² − c_1² T^{2α} M^{−2αL}).

In general, the optimal (but real-valued) choice of L would now be the arg-min of the above function, which we could not determine explicitly in an arbitrary regime. We parametrize the optimal choice of L by d_1 in

L = ⌈ log(d_1 c_1 T^α ε^{−1}) / (α log M) ⌉.   (6.10)

There are three different approaches to the choice of L: Giles chooses d_1 = √2, which is probably motivated by the considerations in Remark 6.3.3 below. If all the constants involved have already been estimated, then one could choose L by numerical minimization of the complexity, or one could provide an asymptotic optimizer L (for ε → 0). The latter approach has been carried out for the special case β = 1 in Theorem 6.3.4 below, and in this case the optimal L is indeed (almost) of the form (6.10) with d_1 weakly depending on ε. Moreover, with κ = (1−β)/(2α) we choose

N_0 = ⌈ (√(c_2′ T^β)/(ε²(1 − d_1^{−2}))) (√(c_2′ T^{−β}) + √(c_2) √((M+1)/M) M^{(1−β)/2} (d_1^κ c_1^κ T^{(1−β)/2} ε^{−κ} − 1)/(M^{(1−β)/2} − 1)) ⌉,   (6.11a)

N_l = ⌈ (√(c_2 M/(M+1)) T^β/(ε²(1 − d_1^{−2}))) (√(c_2′ T^{−β}) + √(c_2) √((M+1)/M) M^{(1−β)/2} (d_1^κ c_1^κ T^{(1−β)/2} ε^{−κ} − 1)/(M^{(1−β)/2} − 1)) M^{−l(1+β)/2} ⌉,   (6.11b)

1 ≤ l ≤ L. By construction, the MSE will be bounded by ε² using the choices (6.10) and (6.11). As x ≤ ⌈x⌉ ≤ x + 1 and using the inequalities

M^L ≤ d_1^{1/α} c_1^{1/α} T M ε^{−1/α},
M^{L(1−β)/2} ≤ d_1^κ c_1^κ T^{(1−β)/2} M^{(1−β)/2} ε^{−κ},

together with the shorthand notations

e_1 = √(c_2′ T^{−β}) − √(c_2) √((M+1)/M) M^{(1−β)/2}/(M^{(1−β)/2} − 1),
e_2 = d_1^κ c_1^κ √(c_2) T^{(1−β)/2} √((M+1)/M) M^{(1−β)/2}/(M^{(1−β)/2} − 1)

motivated from our choice (6.11), we arrive after a tedious calculation at

C ≤ c_3 T^{−1} [N_0 + ((M+1)/M) Σ_{l=1}^L N_l M^l]
  ≤ c_3 T^{−1} [1 + √(c_2′ T^β) (1 − d_1^{−2})^{−1} (e_1 + e_2 ε^{−κ}) ε^{−2}
      + √(c_2 T^β) (1 − d_1^{−2})^{−1} √((M+1)/M) (M^{(1−β)/2}/(M^{(1−β)/2} − 1)) (d_1^κ c_1^κ T^{(1−β)/2} M^{(1−β)/2} ε^{−κ} − 1)(e_1 + e_2 ε^{−κ}) ε^{−2}
      + ((M+1)/(M−1)) (d_1^{1/α} c_1^{1/α} T M ε^{−1/α} − 1)].

Arranging the terms according to powers of ε and recalling κ = (1−β)/(2α), we get

C ≤ c_4 ε^{−2(1+κ)} + c_5 ε^{−1/α} + c_6 ε^{−(2+κ)} + c_7 ε^{−2} + c_8,   (6.12)

where

c_4 = c_1^κ c_2^{1+κ} c_3 (d_1^{2κ}/(1 − d_1^{−2})) ((M+1)/M) M^{3(1−β)/2}/(M^{(1−β)/2} − 1)²,   (6.13a)
c_5 = c_1^{1/α} c_3 d_1^{1/α} M(M+1)/(M−1),   (6.13b)
c_6 = (c_1^κ + c_2^κ) √(c_2 c_3) (d_1^{2κ}/(1 − d_1^{−2})) √((M+1)/M) (M^{(1−β)/2}/(M^{(1−β)/2} − 1)) T^{−(1−β)/2}
      × (√(c_2′ T^{−β}) − √(c_2) √((M+1)/M) M^{(1−β)/2}/(M^{(1−β)/2} − 1)),   (6.13c,d)
c_7 = (c_3 T^{−(1−β)}/(1 − d_1^{−2})) e_1²,   (6.13e)
c_8 = −2 c_3 T^{−1}/(M − 1).   (6.13f)

We remark that, under the condition that β ≤ 2α, we have 2(1+κ) ≥ 1/α with equality iff β = 2α. Consequently, ε^{−2(1+κ)} is the dominating term in the complexity expansion. We further note that the second term in the expansion can be either ε^{−1/α} or ε^{−(2+κ)}.

The leading order coefficients c_4 and c_5 are positive, whereas the sign of c_6 is not clear. In particular, if we do not distinguish between the variance of Ŷ_0 (controlled by c_2′) and the variances of the differences Ŷ_l, l = 1, …, L, controlled by c_2, then c_6 will be negative. c_7 is again positive (but often small) and c_8 is negative. Clearly, we could simplify the complexity bound by omitting all terms with negative coefficients in (6.12). We further note that the leading order terms of the complexity do not depend on T or M_0. We do not know the rate of weak convergence α of our simplified Milstein scheme, but for Lipschitz functions f, we clearly have

|E[f(X_·) − f(Y_·)]| ≤ |f|_{Lip} E[|X − Y|_∞],

so that the weak rate of convergence is at least as good as the strong rate of convergence, i.e., α ≥ β/2 in the above notation. In fact, if we only impose minimal regularity conditions on f, then it is highly unlikely that we can get anything better than α = β/2.


Corollary 6.3.2. Under the assumptions of Theorem 6.3.1, let us additionally assume that α = β/2. Then the complexity of the above multi-level algorithm is bounded by

C ≤ c_4′ ε^{−1/α} + o(ε^{−1/α}),

where

c_4′ = c_3 d_1^{1/α} (M+1) [ c_1^{(1−β)/β} c_2^{1/β} (1/(d_1² − 1)) M^{3(1−β)/2}/(M (M^{(1−β)/2} − 1)²) + c_1^{2/β}/(M − 1) ].

The optimal choice of d_1 minimizing c_4′ is obtained by

d_1 = ( 1 − (1−β) f_1/(2 f_2) + √((1−β)² f_1² + 4β f_1 f_2)/(2 f_2) )^{1/2},

with

f_1 = c_1^{(1−β)/β} c_2^{1/β} c_3 ((M+1)/M) M^{3(1−β)/2}/(M^{(1−β)/2} − 1)²,
f_2 = c_1^{2/β} c_3 M(M+1)/(M−1).

Proof. We use c_4′ = c_4 + c_5 in order to obtain the formula for the constant. Then we consider c_4′ as a function of d_1 and get the minimizer as the unique zero of the derivative in (1, ∞), noting that c_4′ approaches ∞ on both boundaries of the domain.

Remark 6.3.3. In the now classical works of Giles on multilevel Monte Carlo, he usually chooses d_1 = √2, see for instance [Gil08b]. This means that we reserve the same error tolerance ε/√2 both for the bias or discretization error and for the statistical or Monte Carlo error. In many situations, this choice is not optimal. In fact, even in an ordinary Monte Carlo framework, one should not blindly follow this rule. For instance, for an SDE driven by a Brownian motion, the Euler scheme usually (i.e., under suitable regularity conditions) exhibits weak convergence with rate 1. Assuming the same constants for the weak error and the statistical error, a straightforward optimization will show that it is optimal to choose the number of time-steps and the number of Monte Carlo samples such that the discretization error is ε/3 and the statistical error is 2ε/3. In the above, the choice of d_1 corresponds to the distribution of the total MSE ε² between the statistical and the discretization error according to

ε² = ε²/d_1² + (1 − 1/d_1²) ε²,

where the first summand is the discretization error and the second the statistical error.

So, depending on the parameters, Corollary 6.3.2 shows that the canonical error distribution is not optimal. As the leading order coefficient c_4′ depends only mildly on M, we do not try to find an optimal choice of the parameter M.
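The one-third/two-thirds split mentioned in the remark follows from a one-line optimization; a sketch under the stated assumptions (weak rate 1, a common error constant c, and cost proportional to nN for n time-steps and N samples):

```latex
\frac{c}{n}=\theta\varepsilon,\quad \frac{c}{\sqrt N}=(1-\theta)\varepsilon
\;\Longrightarrow\;
n\propto\frac{1}{\theta\varepsilon},\quad
N\propto\frac{1}{(1-\theta)^2\varepsilon^2},\quad
\mathrm{cost}\propto nN\propto\frac{\varepsilon^{-3}}{\theta(1-\theta)^2};
\qquad
\frac{\mathrm d}{\mathrm d\theta}\bigl[\theta(1-\theta)^2\bigr]
=(1-\theta)(1-3\theta)=0
\;\Longrightarrow\;\theta=\tfrac13 .
```

Hence the discretization error should receive ε/3 and the statistical error 2ε/3, as claimed.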

The above analysis has also given us new insight into the classical multilevel Monte Carlo algorithm corresponding to the choice β = 1. Indeed, even in this case an equal distribution of the error tolerance ε among the bias and the statistical error is far from optimal. Indeed, we have


Theorem 6.3.4. For β = 1 and α ≥ 1/2, the optimal choice of L is (apart from rounding up) given by

L(ε) = (1/(α log M)) log{ ε^{−1} [c_1² T^{2α} (1 + α log M + √(c_2′ M/(c_2 T(M+1))) log ε^{−1})]^{1/2} } + O(log log ε^{−1}/log ε^{−1}),

which is of the form (6.10) with

d_1 ≈ (1 + α log M + √(c_2′ M/(c_2 T(M+1))) log ε^{−1})^{1/2}.

Proof. Let us investigate the behaviour of (6.9) for β ↑ 1. In the limit β ↑ 1 we obtain

C^{β=1}(L) = [√(c_2′ c_3 T^{−1}) + √(c_2 c_3) √((M+1)/M) L]² / (ε² − c_1² T^{2α} M^{−2αL}),   (6.14)

and we want to minimize this object over L. Let us abbreviate

D(L) := C(L)/(c_2′ c_3 T^{−1}) = (1 + aL)²/(ε² − b M^{−cL})

with obvious definitions for a, b, c. Setting the derivative to zero yields

D′(L) = 2a(1 + aL)/(ε² − bM^{−cL}) − (1 + aL)² bc M^{−cL} log M/(ε² − bM^{−cL})² = 0,

i.e.

2a(ε² − bM^{−cL}) = (1 + aL) bc M^{−cL} log M,

2a ε² M^{cL} = 2ab + bc log M + abcL log M;

abbreviating again,

ε² M^{cL} = b + (bc/(2a)) log M + (bc/2) L log M =: p + qL,

and write the latter as

M^{cL}/(p + qL) = ε^{−2}.   (6.15)

We now derive an asymptotic expansion for the solution L(ε) of (6.15) for ε ↓ 0. For this we take the logarithm of (6.15) to obtain, with y := log ε^{−2} (so y → ∞ as ε ↓ 0),

L = y/(c log M) + log(p + qL)/(c log M).   (6.16)

By reformulating (6.16) as

L = y/(c log M) + (1/(c log M)) [log L + log q + log(p/(qL) + 1)]
  = y/(c log M) + log q/(c log M) + log L/(c log M) + O(L^{−1}),  y → ∞,   (6.17)

and then writing (6.17) as

L = (y/(c log M) + log q/(c log M) + O(L^{−1})) / (1 − log L/(L c log M)),

we easily observe that L = O(y) as y → ∞. We thus get by iterating (6.16),

L = y/(c log M) + O(log y),

and iterating once again,

L = y/(c log M) + log(p + qy/(c log M) + O(log y))/(c log M)
  = y/(c log M) + log(p + qy/(c log M))/(c log M) + O(log y/y).

The next iteration yields

L = y/(c log M) + (1/(c log M)) log( p + (q/(c log M))(y + log(p + qy/(c log M))) ) + O(log y/y²),

and so on, each further iteration improving the error term to O(log y/y³), etc. After re-expressing the asymptotic solution in the original terms via

a = √((c_2/c_2′) T (M+1)/M),
b = c_1² T^{2α},
p = c_1² T^{2α} (1 + α log M √(c_2′ M/(c_2 T(M+1)))),
q = c_1² α T^{2α} log M,

we gather, respectively,

L(ε) = log ε^{−1}/(α log M) + O(log log ε^{−1}),

L(ε) = (1/(α log M)) log ε^{−1}
     + (1/(2α log M)) log( c_1² T^{2α} (1 + α log M + √(c_2′ M/(c_2 T(M+1))) log ε^{−1}) )   (6.18)
     + O(log log ε^{−1}/log ε^{−1}),

etc. The resulting complexity (6.9) is then obviously

C^{β=1}(L) = O(log² ε^{−1}/ε²),

where sharper expressions can be obtained by inserting one of the above asymptotic expansions for L(ε). Note that (6.18) may be written as

L(ε) = (1/(α log M)) log{ ε^{−1} [c_1² T^{2α} (1 + α log M + √(c_2′ M/(c_2 T(M+1))) log ε^{−1})]^{1/2} } + O(log log ε^{−1}/log ε^{−1}),

which suggests that in (6.10),

d_1 ≈ (1 + α log M + √(c_2′ M/(c_2 T(M+1))) log ε^{−1})^{1/2}

for the case β = 1.
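The fixed-point characterization (6.15)–(6.16) is also convenient numerically: iterating (6.16) converges quickly since the map is a contraction in L for small ε. The sketch below (our own illustration; the values of p, q, c, M and ε are toy choices, not the constants of the theorem) verifies the residual of (6.15):

```python
import math

# Iterate L <- (y + log(p + q*L))/(c*log M), the fixed-point form (6.16) of
# the optimality condition M^{cL}/(p + qL) = eps^{-2} from (6.15).
def solve_level(eps, p, q, c, M, n_iter=60):
    y = math.log(eps ** (-2))
    L = y / (c * math.log(M))          # zeroth iterate, cf. (6.16)
    for _ in range(n_iter):
        L = (y + math.log(p + q * L)) / (c * math.log(M))
    return L

eps, p, q, c, M = 1e-3, 2.0, 1.5, 1.0, 4.0
L = solve_level(eps, p, q, c, M)
residual = M ** (c * L) / (p + q * L) - eps ** (-2)   # should vanish
```

The iterate stabilizes after a few steps, in line with the expansion L(ε) = log ε^{−1}/(α log M) + O(log log ε^{−1}) obtained above.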

6.4 Multilevel Monte Carlo for RDEs

Let X : [0, T] → ℝ^d be Gaussian with the same assumptions as in Theorem 6.2.1 for some ρ ∈ [1, 2). Consider the solution Y : [0, T] → ℝ^m of the RDE

dY_t = V(Y_t) dX_t;  Y_0 ∈ ℝ^m,

where V = (V_1, …, V_d) is a collection of vector fields in ℝ^m with |V|_{Lip^γ} < ∞ for some γ ≥ 2ρ/(ρ−1). Set S := Y and let S^{(h_l)} be the simplified step-3 Euler approximation of Y with mesh-size h_l (in the case ρ = 1, it suffices to consider a step-2 approximation). Let f : C([0, T], ℝ^m) → ℝ^n be a Lipschitz continuous functional and set P := f(S), P̂_l := f(S^{(h_l)}). We want to calculate the quantities needed in Theorem 6.3.1. It suffices to apply the modified complexity theorem with α = β/2. To wit,

V[P̂_l − P] ≤ E[(P̂_l − P)²] ≤ |f|²_{Lip} E[|S^{(h_l)} − S|²_∞] = O(h_l^β)

and

V[P̂_l − P̂_{l−1}] ≤ (V[P̂_l − P]^{1/2} + V[P̂_{l−1} − P]^{1/2})² = O(h_l^β)

for all β < 2/ρ − 1. Of course the variance of the average of N_l i.i.d. samples becomes

V[Ŷ_l] = (1/N_l) V[P̂_l − P̂_{l−1}] = O(h_l^β/N_l).

This shows (iii). Trivially, a strong rate is also a weak rate, in the sense that

|E[P̂_l − P]| ≤ E[(P̂_l − P)²]^{1/2} = O(h_l^{β/2}).

Condition (ii), "unbiasedness", is obvious for the estimator (6.6). Finally, the computational complexity of Ŷ_l is obviously bounded by O(N_l/h_l) (create N_l sample paths with step-size ∼ h_l). Corollaries 6.2.6 and 6.3.2 then imply

Theorem 6.4.1. The Monte Carlo evaluation of a functional of an RDE driven by a Gaussian signal, to within an MSE of ε², can be achieved with computational complexity

O(ε^{−θ})  for every θ > 2ρ/(2 − ρ).


Appendix

A Kolmogorov theorem for multiplicative functionals

The next lemma is a slight modification of [FV10b, Theorem A.13]. The proof follows the ideas of [FH, Theorem 3.1].

Lemma A.1.1 (Kolmogorov for multiplicative functionals). Let X, Y : [0, T] × Ω → T^N(V) be random multiplicative functionals and assume that X(ω) and Y(ω) are continuous for all ω ∈ Ω. Let β, δ ∈ (0, 1] and choose β′ < β and δ′ < δ. Assume that there is a constant M > 0 such that

|X^n_{s,t}|_{L^{q/n}} ≤ M^n |t − s|^{nβ},
|Y^n_{s,t}|_{L^{q/n}} ≤ M^n |t − s|^{nβ},
|X^n_{s,t} − Y^n_{s,t}|_{L^{q/n}} ≤ ε M^n |t − s|^{δ+(n−1)β}

hold for all s < t ∈ [0, T] and n = 1, …, N, where ε is a positive constant and q ≥ q_0 where

$$ q_0 := 1 + \left( \frac{1}{\beta-\beta'} \vee \frac{1}{\delta-\delta'} \right). $$

Then there is a constant $C = C(N, \beta, \beta', \delta, \delta')$ such that

$$ \bigg| \sup_{s<t\in[0,T]} \frac{|X^n_{s,t}|}{|t-s|^{n\beta'}} \bigg|_{L^{q/n}} \le C M^n \tag{A.19} $$

$$ \bigg| \sup_{s<t\in[0,T]} \frac{|Y^n_{s,t}|}{|t-s|^{n\beta'}} \bigg|_{L^{q/n}} \le C M^n \tag{A.20} $$

$$ \bigg| \sup_{s<t\in[0,T]} \frac{|X^n_{s,t} - Y^n_{s,t}|}{|t-s|^{\delta'+(n-1)\beta'}} \bigg|_{L^{q/n}} \le C \varepsilon M^n \tag{A.21} $$

holds for all $n = 1,\dots,N$.

Proof. W.l.o.g., we may assume $T = 1$. Let $(D_k)_{k\in\mathbb{N}}$ be the sequence of dyadic partitions of the interval $[0,1)$, i.e. $D_k = \big\{ \frac{l}{2^k} : l = 0,\dots,2^k - 1 \big\}$. Clearly, $|D_k| = \frac{1}{\# D_k} = 2^{-k}$. Set

$$ K^n_{k,X} := \max_{t_i \in D_k} \big|X^n_{t_i,t_{i+1}}\big| $$
$$ K^n_{k,Y} := \max_{t_i \in D_k} \big|Y^n_{t_i,t_{i+1}}\big| $$
$$ K^n_{k,X-Y} := \frac{1}{\varepsilon} \max_{t_i \in D_k} \big|X^n_{t_i,t_{i+1}} - Y^n_{t_i,t_{i+1}}\big| $$
for $n = 1,\dots,N$ and $k \in \mathbb{N}$. By assumption, we have
$$ \mathbb{E}\big|K^n_{k,X}\big|^{q/n} \le \sum_{t_i \in D_k} \mathbb{E}\big|X^n_{t_i,t_{i+1}}\big|^{q/n} \le \# D_k \max_{t_i \in D_k} \mathbb{E}\big|X^n_{t_i,t_{i+1}}\big|^{q/n} \le M^q |D_k|^{q\beta - 1}. $$
In the same way one estimates $K^n_{k,Y}$ and $K^n_{k,X-Y}$, hence
$$ \big|K^n_{k,X}\big|_{L^{q/n}} \le M^n |D_k|^{n\beta - n/q} \tag{A.22} $$
$$ \big|K^n_{k,Y}\big|_{L^{q/n}} \le M^n |D_k|^{n\beta - n/q} \tag{A.23} $$
$$ \big|K^n_{k,X-Y}\big|_{L^{q/n}} \le M^n |D_k|^{\delta + (n-1)\beta - n/q}. \tag{A.24} $$
Note the following fact: for any dyadic rationals $s < t$, i.e. $s < t \in \Delta := \bigcup_{k=1}^{\infty} D_k$, there is an $m \in \mathbb{N}$ such that $|D_{m+1}| < |t-s| \le |D_m|$ and a partition

$$ s = \tau_0 < \tau_1 < \dots < \tau_N = t \tag{A.25} $$
of the interval $[s,t)$ with the property that for any $i = 0,\dots,N-1$ there is a $k \ge m+1$ with $[\tau_i,\tau_{i+1}) \in D_k$, but for fixed $k \ge m+1$ there are at most two such intervals contained in $D_k$.
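This partition fact can be made concrete. The following greedy routine (our own illustration; the function name and the greedy construction are not from the text) covers a dyadic interval $[s,t)$ by dyadic intervals, taking at each step the largest dyadic interval that starts at the current left endpoint and fits inside $[s,t)$; one checks that each dyadic level then contributes at most two intervals:

```python
from fractions import Fraction
from collections import Counter

def dyadic_decomposition(s, t):
    """Cover [s, t), with s < t dyadic rationals, by intervals of the
    form [l/2^k, (l+1)/2^k).  Greedy: at each step take the largest
    dyadic interval aligned at the current left endpoint that still
    fits inside [s, t)."""
    pieces, cur, t = [], Fraction(s), Fraction(t)
    while cur < t:
        size = Fraction(1, 1)
        # shrink until `cur` is aligned to `size` and cur + size <= t
        while (cur % size != 0) or (cur + size > t):
            size /= 2
        pieces.append((cur, cur + size))
        cur += size
    return pieces

pieces = dyadic_decomposition(Fraction(3, 8), Fraction(7, 8))
# multiplicity of each interval length: at most two per dyadic level
length_counts = Counter(b - a for a, b in pieces)
```

For $[3/8, 7/8)$ this yields the three intervals $[3/8,1/2)$, $[1/2,3/4)$, $[3/4,7/8)$, i.e. two of level 3 and one of level 2, consistent with the fact above for $m = 1$.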

Step 1: We claim that for every $n = 1,\dots,N$ there is a real random variable $K^n_X$ such that $|K^n_X|_{L^{q/n}} \le M^n c$ where $c = c(\beta, \beta', \delta, \delta')$, and that for any dyadic rationals $s < t$ and $m$, $(\tau_i)_{i=0}^N$ chosen as in (A.25) we have

$$ \sum_{i=0}^{N-1} \frac{\big|X^n_{\tau_i,\tau_{i+1}}\big|}{|t-s|^{n\beta'}} \le K^n_X. \tag{A.26} $$
Furthermore, the estimate (A.26) also holds for $\mathbf{Y}$ and a random variable $K^n_Y$. Indeed: by the choice of $m$ and $(\tau_i)_{i=0}^N$,
$$ \sum_{i=0}^{N-1} \frac{\big|X^n_{\tau_i,\tau_{i+1}}\big|}{|t-s|^{n\beta'}} \le \sum_{k=m+1}^{\infty} \frac{2 K^n_{k,X}}{|D_{m+1}|^{n\beta'}} \le 2 \sum_{k=m+1}^{\infty} \frac{K^n_{k,X}}{|D_k|^{n\beta'}} \le 2 \sum_{k=1}^{\infty} \frac{K^n_{k,X}}{|D_k|^{n\beta'}} =: K^n_X. $$
It remains to prove that $|K^n_X|_{L^{q/n}} \le M^n c$. By the triangle inequality and the estimate (A.22),
$$ \bigg\| \sum_{k=1}^{\infty} \frac{K^n_{k,X}}{|D_k|^{n\beta'}} \bigg\|_{L^{q/n}} \le M^n \sum_{k=1}^{\infty} |D_k|^{n(\beta - 1/q - \beta')} \le M^n \sum_{k=1}^{\infty} |D_k|^{\beta - 1/q_0 - \beta'} < \infty $$
since $\beta - 1/q_0 - \beta' > 0$, which shows the claim.
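The positivity of $\beta - 1/q_0 - \beta'$ invoked here is elementary; for completeness (this one-line computation is ours, not spelled out in the text):

```latex
q_0 \ge 1 + \frac{1}{\beta - \beta'}
\;\Longrightarrow\;
\frac{1}{q_0} \le \frac{\beta - \beta'}{1 + (\beta - \beta')} < \beta - \beta'
\;\Longrightarrow\;
\beta - \frac{1}{q_0} - \beta' > 0,
```

and symmetrically $\delta - 1/q_0 - \delta' > 0$ from the second term in the definition of $q_0$, which is what step 3 uses.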

Step 2: We show that (A.19) and (A.20) hold for all $n = 1,\dots,N$. It is enough to consider $\mathbf{X}$. Note first that, due to continuity, it is enough to show the estimate for $\sup_{s<t\in\Delta} \frac{|X^n_{s,t}|}{|t-s|^{n\beta'}}$. We argue by induction over $n$: the case $n = 1$ is the usual Kolmogorov continuity criterion, and for the induction step one expands $|X^n_{s,t}|$ along the partition (A.25) via the multiplicative property, exactly as in step 4 below. We can now take the supremum over all $s < t \in \Delta$ on the left. Taking the $L^{q/n}$-norm on both sides, using first the triangle, then the Hölder inequality and the estimates from step 1 together with the induction hypothesis gives the claim.

Step 3: As in step 1, we claim that for any $n = 1,\dots,N$ there is a random variable $K^n_{X-Y} \in L^{q/n}$ such that for any dyadic rationals $s < t$ and $m$, $(\tau_i)_{i=0}^N$ chosen as above we have

$$ \frac{1}{\varepsilon} \sum_{i=0}^{N-1} \frac{\big|X^n_{\tau_i,\tau_{i+1}} - Y^n_{\tau_i,\tau_{i+1}}\big|}{|t-s|^{\delta' + (n-1)\beta'}} \le K^n_{X-Y}. \tag{A.27} $$

Furthermore, we claim that $|K^n_{X-Y}|_{L^{q/n}} \le M^n \tilde{c}$ where $\tilde{c} = \tilde{c}(\beta, \beta', \delta, \delta')$. The proof follows the lines of step 1, setting
$$ \frac{1}{\varepsilon} \sum_{i=0}^{N-1} \frac{\big|X^n_{\tau_i,\tau_{i+1}} - Y^n_{\tau_i,\tau_{i+1}}\big|}{|t-s|^{\delta' + (n-1)\beta'}} \le 2 \sum_{k=1}^{\infty} \frac{K^n_{k,X-Y}}{|D_k|^{\delta' + (n-1)\beta'}} =: K^n_{X-Y}. $$
Step 4: We prove that (A.21) holds for all $n = 1,\dots,N$. By induction over $n$: the case $n = 1$ is again just the usual Kolmogorov continuity criterion applied to $t \mapsto \varepsilon^{-1}(X_t - Y_t)$. Assume the assertion is shown up to level $n-1$ and choose two dyadic rationals $s < t$. Using the multiplicative property, we have

$$ \big|X^n_{s,t} - Y^n_{s,t}\big| \le \sum_{i=0}^{N-1} \big|X^n_{\tau_i,\tau_{i+1}} - Y^n_{\tau_i,\tau_{i+1}}\big| + \sum_{l=1}^{n-1} \max_{i=1,\dots,N} \big|X^{n-l}_{s,\tau_i}\big| \sum_{i=0}^{N-1} \big|X^l_{\tau_i,\tau_{i+1}} - Y^l_{\tau_i,\tau_{i+1}}\big| + \sum_{l=1}^{n-1} \max_{i=1,\dots,N} \big|X^{n-l}_{s,\tau_i} - Y^{n-l}_{s,\tau_i}\big| \sum_{i=0}^{N-1} \big|Y^l_{\tau_i,\tau_{i+1}}\big|. $$
Now we proceed as in step 2, using the estimates from steps 1 to 3 and the induction hypothesis.


Bibliography

[BHOZ08] Francesca Biagini, Yaozhong Hu, Bernt Øksendal, and Tusheng Zhang, Stochastic calculus for fractional Brownian motion and applications, Probability and Its Applications, Springer, 2008.

[CC80] John M. C. Clark and R. J. Cameron, The maximum rate of convergence of discrete approximations for stochastic differential equations, Stochastic differential systems (Proc. IFIP-WG 7/1 Working Conf., Vilnius, 1978), Lecture Notes in Control and Information Sci., vol. 25, Springer, Berlin, 1980, pp. 162–171.

[CDFO] Dan Crisan, Joscha Diehl, Peter K. Friz, and Harald Oberhauser, Robust filtering: Correlated noise and multidimensional observation, to appear in Ann. Appl. Probab.

[CF10] Thomas Cass and Peter K. Friz, Densities for rough differential equations under Hörmander's condition, Ann. of Math. (2) 171 (2010), no. 3, 2115–2141.

[CF11] , Malliavin calculus and rough paths, Bull. Sci. Math. 135 (2011), no. 6-7, 542–556.

[CFV09] Thomas Cass, Peter K. Friz, and Nicolas B. Victoir, Non-degeneracy of Wiener functionals arising from rough differential equations, Trans. Amer. Math. Soc. 361 (2009), no. 6, 3359–3371.

[CHLT12] Thomas Cass, Martin Hairer, Christian Litterer, and Samy Tindel, Smoothness of the density for solutions to Gaussian rough differential equations, arXiv:1209.3100 (2012).

[CL05] Laure Coutin and Antoine Lejay, Semi-martingales and rough paths theory, Electron. J. Probab. 10 (2005), no. 23, 761–785 (electronic).

[CLL] Thomas Cass, Christian Litterer, and Terry J. Lyons, Integrability estimates for Gaussian rough differential equations, to appear in Ann. Probab.

[CLL12] , Rough paths on manifolds, New trends in stochastic analysis and related topics, Interdiscip. Math. Sci., vol. 12, World Sci. Publ., Hackensack, NJ, 2012, pp. 33–88.

[CQ02] Laure Coutin and Zhongmin Qian, Stochastic analysis, rough path analysis and fractional Brownian motions, Probab. Theory Related Fields 122 (2002), no. 1, 108–140.

[Dav07] Alexander M. Davie, Differential equations driven by rough paths: an approach via discrete approximation, Appl. Math. Res. Express. AMRX (2007), no. 2, Art. ID abm009, 40.

[DNT12] Aurélien Deya, Andreas Neuenkirch, and Samy Tindel, A Milstein-type scheme without Lévy area terms for SDEs driven by fractional Brownian motion, Ann. Inst. Henri Poincaré Probab. Stat. 48 (2012), no. 2, 518–550.

[FGGR12] Peter K. Friz, Benjamin Gess, Archil Gulisashvili, and Sebastian Riedel, Spatial rough path lifts of stochastic convolutions, arXiv:1211.0046 (2012).

[FH] Peter K. Friz and Martin Hairer, A short course on rough paths, Preprint.

[FO09] Peter K. Friz and Harald Oberhauser, Rough path limits of the Wong-Zakai type with a modified drift term, J. Funct. Anal. 256 (2009), no. 10, 3236–3256.

[FO10] , A generalized Fernique theorem and applications, Proc. Amer. Math. Soc. 138 (2010), no. 10, 3679–3688.

[Föl81] H. Föllmer, Calcul d'Itô sans probabilités, Seminar on Probability, XV (Univ. Strasbourg, Strasbourg, 1979/1980) (French), Lecture Notes in Math., vol. 850, Springer, Berlin, 1981, pp. 143–150.

[FR] Peter K. Friz and Sebastian Riedel, Convergence rates for the full Gaussian rough paths, to appear in Ann. Inst. Henri Poincaré Probab. Stat.

[FR11] , Convergence rates for the full Brownian rough paths with applications to limit theorems for stochastic flows, Bull. Sci. Math. 135 (2011), no. 6-7, 613–628.

[FR13] , Integrability of (non-)linear rough differential equations and integrals, Stoch. Anal. Appl. 31 (2013), no. 2, 336–358.

[FV05] Peter K. Friz and Nicolas B. Victoir, Approximations of the Brownian rough path with applications to stochastic analysis, Ann. Inst. H. Poincaré Probab. Statist. 41 (2005), no. 4, 703–724.

[FV06a] , A note on the notion of geometric rough paths, Probab. Theory Related Fields 136 (2006), no. 3, 395–416.

[FV06b] , A variation embedding theorem and applications, J. Funct. Anal. 239 (2006), no. 2, 631–637.

[FV08a] , The Burkholder-Davis-Gundy inequality for enhanced martingales, Séminaire de probabilités XLI, Lecture Notes in Math., vol. 1934, Springer, Berlin, 2008, pp. 421–438.

[FV08b] , Euler estimates for rough differential equations, J. Differential Equations 244 (2008), no. 2, 388–412.

[FV08c] , On uniformly subelliptic operators and stochastic area, Probab. Theory Related Fields 142 (2008), no. 3-4, 475–523.

[FV10a] , Differential equations driven by Gaussian signals, Ann. Inst. Henri Poincaré Probab. Stat. 46 (2010), no. 2, 369–413.

[FV10b] , Multidimensional stochastic processes as rough paths, Cambridge Studies in Advanced Mathematics, vol. 120, Cambridge University Press, Cambridge, 2010, Theory and applications.

[FV11] , A note on higher dimensional p-variation, Electron. J. Probab. 16 (2011), no. 68, 1880–1899.


[Gil08a] Michael B. Giles, Improved multilevel Monte Carlo convergence using the Milstein scheme, Monte Carlo and quasi-Monte Carlo methods 2006, Springer, Berlin, 2008, pp. 343–358.

[Gil08b] , Multilevel Monte Carlo path simulation, Oper. Res. 56 (2008), no. 3, 607– 617.

[GIP12] Massimiliano Gubinelli, Peter Imkeller, and Nicolas Perkowski, Paraproducts, rough paths and controlled distributions, arXiv:1210.2684 (2012), 1–30.

[GS06] István Gyöngy and Anton Shmatkov, Rate of convergence of Wong-Zakai approximations for stochastic partial differential equations, Appl. Math. Optim. 54 (2006), no. 3, 315–341.

[GS12] Michael B. Giles and Lukasz Szpruch, Antithetic multilevel Monte Carlo estimation for multi-dimensional SDEs without Lévy area simulation, arXiv:1202.6283 (2012).

[Gub04] Massimiliano Gubinelli, Controlling rough paths, J. Funct. Anal. 216 (2004), no. 1, 86–140.

[Gub10] , Ramification of rough paths, J. Differential Equations 248 (2010), no. 4, 693–721.

[Hai] Martin Hairer, Solving the KPZ equation, to appear in Ann. of Math.

[Hai09] , An introduction to stochastic PDEs, Preprint, Berlin, 2009.

[Hai11] , Rough stochastic PDEs, Comm. Pure Appl. Math. 64 (2011), no. 11, 1547– 1585.

[HH10] Keisuke Hara and Masanori Hino, Fractional order Taylor's series and the neoclassical inequality, Bull. Lond. Math. Soc. 42 (2010), no. 3, 467–477.

[HMW13] Martin Hairer, Jan Maas, and Hendrik Weber, Approximating rough stochastic PDEs, arXiv:1202.3094v2 (2013).

[HN09] Yaozhong Hu and David Nualart, Rough path analysis via fractional calculus, Trans. Amer. Math. Soc. 361 (2009), no. 5, 2689–2718.

[Hör83] Lars Hörmander, The analysis of linear partial differential operators. I, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 256, Springer-Verlag, Berlin, 1983, Distribution theory and Fourier analysis.

[HP] Martin Hairer and Natesh S. Pillai, Regularity of laws and ergodicity of hypoelliptic SDEs driven by rough paths, to appear in Ann. Probab.

[HSV07] Martin Hairer, Andrew M. Stuart, and Jochen Voss, Analysis of SPDEs arising in path sampling. II. The nonlinear case, Ann. Appl. Probab. 17 (2007), no. 5-6, 1657–1706.

[HSVW05] Martin Hairer, Andrew M. Stuart, Jochen Voss, and Petter Wiberg, Analysis of SPDEs arising in path sampling. I. The Gaussian case, Commun. Math. Sci. 3 (2005), no. 4, 587–603.

[HW13] Martin Hairer and Hendrik Weber, Rough Burgers-like equations with multiplicative noise, Probab. Theory Related Fields 155 (2013), no. 1-2, 71–126.


[Itô44] Kiyosi Itô, Stochastic integral, Proc. Imp. Acad. Tokyo 20 (1944), 519–524.

[Itô51] , On stochastic differential equations, Mem. Amer. Math. Soc. 1951 (1951), no. 4, 51.

[Jan97] Svante Janson, Gaussian Hilbert spaces, Cambridge Tracts in Mathematics, vol. 129, Cambridge University Press, Cambridge, 1997.

[Kat03] Eytan Katzav, Growing surfaces with anomalous diffusion: Results for the fractal Kardar-Parisi-Zhang equation, Phys. Rev. E 68 (2003), 031607.

[Kol23] Andrei N. Kolmogorow, Sur l'ordre de grandeur des coefficients de la série de Fourier-Lebesgue, Bull. Acad. Polon., Ser. A (1923), 83–86.

[Kör89] Thomas W. Körner, Fourier analysis, second ed., Cambridge University Press, Cambridge, 1989.

[KP92] Peter E. Kloeden and Eckhard Platen, Numerical solution of stochastic differential equations, Applications of Mathematics (New York), vol. 23, Springer-Verlag, Berlin, 1992.

[Kra11] Xhevat Z. Krasniqi, On the second derivative of the sums of trigonometric series, Annals of the University of Craiova - Mathematics and Computer Science Series 38 (2011), no. 4, 76–86.

[LCL07] Terry J. Lyons, Michael Caruana, and Thierry Lévy, Differential equations driven by rough paths, Lecture Notes in Mathematics, vol. 1908, Springer, Berlin, 2007, Lectures from the 34th Summer School on Probability Theory held in Saint-Flour, July 6–24, 2004, With an introduction concerning the Summer School by Jean Picard.

[Led96] Michel Ledoux, Isoperimetry and Gaussian analysis, Lectures on probability theory and statistics (Saint-Flour, 1994), Lecture Notes in Math., vol. 1648, Springer, Berlin, 1996, pp. 165–294.

[Lej09] Antoine Lejay, On rough differential equations, Electron. J. Probab. 14 (2009), no. 12, 341–364.

[LL06] Xiang-Dong Li and Terry J. Lyons, Smoothness of Itô maps and diffusion processes on path spaces. I, Ann. Sci. École Norm. Sup. (4) 39 (2006), no. 4, 649–677.

[Lor48] George G. Lorentz, Fourier-Koeffizienten und Funktionenklassen, Math. Z. 51 (1948), 135–149.

[LQ98] Terry J. Lyons and Zhongmin Qian, Flow of diffeomorphisms induced by a geometric multiplicative functional, Probab. Theory Related Fields 112 (1998), no. 1, 91–119.

[LQ02] , System control and rough paths, Oxford Mathematical Monographs, Oxford University Press, Oxford, 2002, Oxford Science Publications.

[LV07] Terry J. Lyons and Nicolas B. Victoir, An extension theorem to rough paths, Ann. Inst. H. Poincaré Anal. Non Linéaire 24 (2007), no. 5, 835–847.

[LX12] Terry J. Lyons and Weijun Xu, A uniform estimate for rough paths, arXiv:1110.5278v4 (2012).

[Lyo91] Terry J. Lyons, On the nonexistence of path integrals, Proc. Roy. Soc. London Ser. A 432 (1991), no. 1885, 281–290.


[Lyo94] , Differential equations driven by rough signals. I. An extension of an inequality of L. C. Young, Math. Res. Lett. 1 (1994), no. 4, 451–464.

[Lyo98] , Differential equations driven by rough signals, Rev. Mat. Iberoamericana 14 (1998), no. 2, 215–310.

[LZ99] Terry J. Lyons and Ofer Zeitouni, Conditional exponential moments for iterated Wiener integrals, Ann. Probab. 27 (1999), no. 4, 1738–1749.

[Mal97] Paul Malliavin, Stochastic analysis, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 313, Springer-Verlag, Berlin, 1997.

[MR06] Michael B. Marcus and Jay Rosen, Markov processes, Gaussian processes, and local times, Cambridge Studies in Advanced Mathematics, vol. 100, Cambridge University Press, Cambridge, 2006.

[MSS06] Annie Millet and Marta Sanz-Solé, Large deviations for rough paths of the fractional Brownian motion, Ann. Inst. H. Poincaré Probab. Statist. 42 (2006), no. 2, 245–271.

[MW97] J. Adin Mann and Wojbor A. Woyczynski, Rough surfaces generated by nonlinear transport, invited paper, Symposium on Non-linear Diffusion, TMS International Meeting, Indianapolis, 1997.

[NTU10] Andreas Neuenkirch, Samy Tindel, and Jérémie Unterberger, Discretizing the fractional Lévy area, Stochastic Process. Appl. 120 (2010), no. 2, 223–254.

[Reu93] Christophe Reutenauer, Free Lie algebras, London Mathematical Society Monographs. New Series, vol. 7, The Clarendon Press Oxford University Press, New York, 1993, Oxford Science Publications.

[RX12] Sebastian Riedel and Weijun Xu, A simple proof of distance bounds for Gaussian rough paths, arXiv:1206.5866 (2012), 1–20.

[Tal86] Denis Talay, Discrétisation d'une équation différentielle stochastique et calcul approché d'espérances de fonctionnelles de la solution, RAIRO Modél. Math. Anal. Numér. 20 (1986), no. 1, 141–179.

[Tel73] S. A. Teljakovskiĭ, A certain sufficient condition of Sidon for the integrability of trigonometric series, Mat. Zametki 14 (1973), 317–328.

[Tow02] Nasser Towghi, Multidimensional extension of L. C. Young's inequality, JIPAM. J. Inequal. Pure Appl. Math. 3 (2002), no. 2, Article 22, 13 pp. (electronic).

[Unt09] Jérémie Unterberger, Stochastic calculus for fractional Brownian motion with Hurst exponent H > 1/4: a rough path method by analytic extension, Ann. Probab. 37 (2009), no. 2, 565–614.

[Wal86] John B. Walsh, An introduction to stochastic partial differential equations, École d'été de probabilités de Saint-Flour, XIV—1984, Lecture Notes in Math., vol. 1180, Springer, Berlin, 1986, pp. 265–439.

[XTHX12] Hui Xia, Gang Tang, Dapeng Hao, and Zhipeng Xun, Dynamics of surface roughening in the space-fractional Kardar-Parisi-Zhang growth: numerical results, Journal of Physics A: Mathematical and Theoretical 45 (2012), no. 29, 295001.

[You36] Laurence C. Young, An inequality of the Hölder type, connected with Stieltjes integration, Acta Math. 67 (1936), no. 1, 251–282.


[Zyg59] Antoni Zygmund, Trigonometric series. 2nd ed. Vols. I, II, Cambridge University Press, New York, 1959.
