arXiv:0812.3102v3 [math.PR] 15 Dec 2011

The Annals of Statistics
2011, Vol. 39, No. 4, 2047–2073
DOI: 10.1214/11-AOS893
© Institute of Mathematical Statistics, 2011

PARAMETER ESTIMATION FOR ROUGH DIFFERENTIAL EQUATIONS

By Anastasia Papavasiliou and Christophe Ladroue

University of Warwick and University of Crete, and University of Bristol

We construct the "expected signature matching" estimator for differential equations driven by rough paths and we prove its consistency and asymptotic normality. We use it to estimate parameters of a diffusion and a fractional diffusion, that is, a differential equation driven by fractional Brownian motion.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2011, Vol. 39, No. 4, 2047–2073. This reprint differs from the original in pagination and typographic detail.

Received June 2010; revised January 2011.
Supported in part by EPSRC Grant EP/H019588/1 "Parameter Estimation for Rough Differential Equations with Applications to Multiscale Modelling" and FP7-REGPOT-2009-1 project "Archimedes Center for Modeling, Analysis and Computation."
AMS 2000 subject classifications. Primary 62M09, 62F12; secondary 62M99.
Key words and phrases. Rough paths, diffusions, fractional diffusions, generalized moment matching, parameter estimation.

1. Introduction. Statistical inference for stochastic processes is a huge field, both in terms of research output and importance. In particular, a lot of work has been done in the context of diffusions (see [3, 15, 22] for a general overview and [2] for some recent developments). Nevertheless, the problem of statistical inference for diffusions still poses many challenges, as, for example, constructing the Maximum Likelihood Estimator (MLE) for a general multi-dimensional diffusion. An alternative method in this case is that of the Generalized Moment Matching Estimator (GMME). While, in general, less efficient compared to the MLE, the GMME is usually easier to use, more flexible and has been successfully applied to general Markov processes (see [1, 10]).

On the other hand, most methods of statistical inference in the context of non-Markovian continuous processes are restricted to specific classes of models. In the case of fractional Brownian motion, an overview of some recent results can be found in [11, 24]. In [12], the author discusses the problem of parameter estimation for Volterra type processes—which include equations driven by fractional Brownian motion. In all these papers, the analysis is restricted to models that depend linearly on the parameter and to parameters appearing in the drift. Finally, for non-Markovian processes coming from stochastic delay equations, see [14, 23].

The theory of rough paths provides a general framework for making sense of differential equations driven by any type of noise modelled as a rough path—this includes diffusions, differential equations driven by fractional Brownian motion, delay equations and even delay equations driven by fractional Brownian motion (see [20]). The basic ideas have been developed in the 90s (see [18] and references within). However, the problem of statistical inference for differential equations driven by rough paths has not been addressed yet. This is exactly what we strive to do in this paper.

The exact setting of the statistical problem we consider is the following: we observe many independent copies of specific iterated integrals of the response Y, such as

∫∫_{0<u_1<u_2<T} dY^{(2)}_{u_1} dY^{(1)}_{u_2}.

2.1. Some basic results from the theory of rough paths. In this section, we review some of the basic results from the theory of rough paths. For more details, see [6, 19] and references within. The goal of this theory is to give meaning to the differential equation

(2.1) dY_t = f(Y_t) · dX_t, Y_0 = y_0,

for very general continuous paths X. More specifically, we think of X and Y as paths on a Euclidean space: X : I → R^n and Y : I → R^m for I := [0, T], so X_t ∈ R^n and Y_t ∈ R^m for each t ∈ I. Also, f : R^m → L(R^n, R^m), where L(R^n, R^m) is the space of linear functions from R^n to R^m, which is isomorphic to the space of m × n matrices. For the sake of simplicity, we will assume that f(y) is a polynomial in y—however, the theory holds for more general f.

The path X is any path of finite p-variation, meaning that

sup_{D⊂[0,T]} ( Σ_ℓ ||X_{t_ℓ} − X_{t_{ℓ−1}}||^p )^{1/p} < ∞,

where D = {t_ℓ}_ℓ goes through all possible partitions of [0, T] and ||·|| is the Euclidean norm. Note that we will later define finite p-variation for multiplicative functionals, also to be defined later. The fact that X is allowed to have any finite p-variation is exactly what makes this theory so general: Brownian motion is an example of a path that has finite p-variation for any p > 2, while fractional Brownian motion with Hurst index h has finite p-variation for p > 1/h. We will define fractional Brownian motion formally in the corresponding example—for now, let us just say that it is Gaussian and self-similar, but not Markovian except for h = 1/2, when it coincides with Brownian motion.

When p ∈ [1, 2), we say that Y is a solution of (2.1) if

Y_t = Y_s + ∫_s^t f(Y_u) · dX_u ∀(s,t) ∈ ∆_T,

where ∆_T := {(s,t); 0 ≤ s ≤ t ≤ T}. In this case, the integral is defined as the Young integral (see [18]).

What does it mean for Y to be a solution of (2.1) when p ≥ 2? In order to answer this question, we first need to define the integral. To make this task possible, we rewrite the integral so that the integrand is a function of the integrator: set f_{y_0}(·) := f(· + y_0). Define h : R^n ⊕ R^m → End(R^n ⊕ R^m) by

(2.2) h(x,y) := [ I_{n×n}      0_{n×m}
                  f_{y_0}(y)   0_{m×m} ].

Instead of defining ∫_s^t f(Y_u) · dX_u, we will define the integral

(2.3) ∫_s^t h(Z_u) · dZ_u ∀(s,t) ∈ ∆_T,

where Z = (X, Y). Note that if f is a polynomial in y, then h will also be a polynomial in z. More generally, we will define this integral for any path Z in R^{ℓ1} of finite p-variation and any polynomial h : R^{ℓ1} → L(R^{ℓ1}, R^{ℓ2}) of degree q. Since h is a polynomial, its Taylor expansion will be a finite sum:

h(z_2) = Σ_{k=0}^q h_k(z_1) (z_2 − z_1)^{⊗k}/k! ∀z_1, z_2 ∈ R^{ℓ1},

where h_0 = h and h_k : R^{ℓ1} → L((R^{ℓ1})^{⊗k}, L(R^{ℓ1}, R^{ℓ2})) and, for all z ∈ R^{ℓ1}, h_k(z) is a symmetric k-linear mapping from R^{ℓ1} to L(R^{ℓ1}, R^{ℓ2}), for k ≥ 1.

Suppose that Z is a path of bounded variation (i.e., p = 1). Then, using the symmetry of h_k(z) and the "shuffle product property," we can write

h(Z_u) = Σ_{k=0}^q h_k(Z_s) Z^k_{s,u} ∀(s,u) ∈ ∆_T,

where for every (s,t) ∈ ∆_T,

Z^0 ≡ 1 ∈ R

and

Z^k_{s,t} = ∫···∫_{s<u_1<···<u_k<t} dZ_{u_1} ⊗ ··· ⊗ dZ_{u_k} ∈ (R^{ℓ1})^{⊗k}.

In coordinates,

Z^{(i_1,...,i_k)}_{s,t} := ∫···∫_{s<u_1<···<u_k<t} dZ^{(i_1)}_{u_1} ··· dZ^{(i_k)}_{u_k}.
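To make these objects concrete: for a piecewise-linear path, the first two levels of the iterated integrals can be computed exactly, piece by piece. The following minimal sketch (ours, not part of the paper; all names are illustrative) does this in Python and checks it against a case where the integrals are known in closed form.

```python
import numpy as np

def signature_level12(path):
    """Levels 1 and 2 of the iterated integrals of a piecewise-linear path,
    given as an (N+1) x d array of sample points."""
    d = path.shape[1]
    Z1, Z2 = np.zeros(d), np.zeros((d, d))
    for dz in np.diff(path, axis=0):
        # on a linear piece the level-2 integral is (1/2) dz (x) dz;
        # the cross term Z1 (x) dz stitches the pieces together
        Z2 += np.outer(Z1, dz) + 0.5 * np.outer(dz, dz)
        Z1 += dz
    return Z1, Z2

# example: z_u = (u, u^2) on [0,1]; exactly Z^(1,2) = 2/3 and Z^(2,1) = 1/3
u = np.linspace(0.0, 1.0, 2001)
Z1, Z2 = signature_level12(np.column_stack([u, u**2]))
print(Z1)                  # ~ [1. 1.]
print(Z2[0, 1], Z2[1, 0])  # ~ 0.6667 0.3333
```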

Example 1. Consider an h as in (2.2) whose only nonconstant entry is 1 + z^{(3)} + (z^{(3)})². In coordinates, ((z_2 − z_1)^{⊗1})_i = z^{(i)}_2 − z^{(i)}_1 and ((z_2 − z_1)^{⊗2})_{i_1,i_2} = (z^{(i_1)}_2 − z^{(i_1)}_1)(z^{(i_2)}_2 − z^{(i_2)}_1), and thus the sum Σ_{k=0}^{2} h_k(z_1)(z_2 − z_1)^{⊗k}/k! becomes, in that entry,

(1 + z^{(3)}_1 + (z^{(3)}_1)²) + (1 + 2z^{(3)}_1)(z^{(3)}_2 − z^{(3)}_1) + (z^{(3)}_2 − z^{(3)}_1)²,

with all other entries unchanged, which is equal to h(z_2). It is easy to see that, for all 0 ≤ s ≤ t ≤ T,

(z^{(3)}_t − z^{(3)}_s) = ∫_s^t dz^{(3)}_u and (z^{(3)}_t − z^{(3)}_s)²/2 = ∫∫_{s<u_1<u_2<t} dz^{(3)}_{u_1} dz^{(3)}_{u_2}.

Thus, using the notation of the iterated integral, we write

h(z_u) = h(z_s) + ∂_3 h(z_s) Z^{(3)}_{s,u} + ∂_{3,3} h(z_s) Z^{(3,3)}_{s,u}

and if we integrate once more we get

∫_s^t h(z_u) dz^{(3)}_u = h(z_s) Z^{(3)}_{s,t} + ∂_3 h(z_s) Z^{(3,3)}_{s,t} + ∂_{3,3} h(z_s) Z^{(3,3,3)}_{s,t}.

Note that in the above example, we did not use the shuffle product formula because m = 1 (Y ∈ R). If the response Y lives in more than one dimension, then the shuffle product formula is used, for example, to say that

(1/2)(z_t − z_s)^{(i_2)}(z_t − z_s)^{(i_1)} = (1/2) Z^{(i_2)}_{s,t} Z^{(i_1)}_{s,t} = (1/2)(Z^{(i_1,i_2)}_{s,t} + Z^{(i_2,i_1)}_{s,t}).

Below we give a concrete example to show how the shuffle product formula extends integration by parts.

Example 2. Let us give here an example of the shuffle product. Let z_t be a smooth path in R^m for some m ≥ 1. Then, for any pair i_1, i_2 ∈ {1,...,m}, using the integration by parts formula, we get

Z^{(i_1,i_2)}_{s,t} = ∫_s^t ∫_s^u dz^{(i_1)}_{u_2} dz^{(i_2)}_u = ∫_s^t (z^{(i_1)}_u − z^{(i_1)}_s) dz^{(i_2)}_u
    = ∫_s^t z^{(i_1)}_u dz^{(i_2)}_u − z^{(i_1)}_s (z^{(i_2)}_t − z^{(i_2)}_s)
    = [z^{(i_1)}_u z^{(i_2)}_u]_s^t − ∫_s^t z^{(i_2)}_u dz^{(i_1)}_u − z^{(i_1)}_s (z^{(i_2)}_t − z^{(i_2)}_s)
    = z^{(i_2)}_t (z^{(i_1)}_t − z^{(i_1)}_s) − ∫_s^t z^{(i_2)}_u dz^{(i_1)}_u
    = (z^{(i_2)}_t − z^{(i_2)}_s)(z^{(i_1)}_t − z^{(i_1)}_s) − ∫_s^t (z^{(i_2)}_u − z^{(i_2)}_s) dz^{(i_1)}_u
    = Z^{(i_1)}_{s,t} Z^{(i_2)}_{s,t} − Z^{(i_2,i_1)}_{s,t},

which is in agreement with the shuffle product formula, since the shuffle product of two letters is (i_1) ⊔ (i_2) = {(i_1,i_2), (i_2,i_1)}.

It is now clear that in order to extend this construction to any path Z of finite p-variation, where p ≥ 2, we will first need to define their iterated integrals Z^k_{s,t}. These are not necessarily unique (e.g., if Z is Brownian motion, then Itô and Stratonovich gave two different definitions for the integral). Then, we will need to find those integrals that respect the "shuffle product property." Before going any further, we need to give some definitions.
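Before moving on to the definitions, the identity of Example 2 is easy to check symbolically for a concrete smooth path. A small sketch (ours, using sympy; the path z_u = (u, u²) is an arbitrary choice):

```python
import sympy as sp

s, t, u, u1, u2 = sp.symbols('s t u u1 u2')
z = (u, u**2)  # a smooth two-dimensional path, as a function of u on [s, t]

def Z1(i):
    # Z^(i)_{s,t} = z^(i)_t - z^(i)_s
    return z[i].subs(u, t) - z[i].subs(u, s)

def Z2(i, j):
    # Z^(i,j)_{s,t}: iterated integral over s < u1 < u2 < t
    inner = sp.integrate(sp.diff(z[i].subs(u, u1), u1), (u1, s, u2))
    return sp.integrate(inner * sp.diff(z[j].subs(u, u2), u2), (u2, s, t))

# shuffle identity for two letters: Z^(1) Z^(2) = Z^(1,2) + Z^(2,1)
print(sp.simplify(Z1(0) * Z1(1) - (Z2(0, 1) + Z2(1, 0))))  # 0
```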

Definition 2.1. Let ∆_T := {(s,t); 0 ≤ s ≤ t ≤ T} and let p ≥ 1 be a real number. We denote by T^{(k)}(R^{ℓ1}) the kth truncated tensor algebra

T^{(k)}(R^{ℓ1}) := R ⊕ R^{ℓ1} ⊕ (R^{ℓ1})^{⊗2} ⊕ ··· ⊕ (R^{ℓ1})^{⊗k}.

(1) Let Z : ∆_T → T^{(k)}(R^{ℓ1}) be a continuous map. For each (s,t) ∈ ∆_T, denote by Z_{s,t} the image of (s,t) through Z and write

Z_{s,t} = (Z^0_{s,t}, Z^1_{s,t}, ..., Z^k_{s,t}) ∈ T^{(k)}(R^{ℓ1}), where Z^j_{s,t} = {Z^{(i_1,...,i_j)}_{s,t}}_{i_1,...,i_j=1}^{ℓ1}.

The function Z is called a multiplicative functional of degree k in R^{ℓ1} if Z^0_{s,t} = 1 for all (s,t) ∈ ∆_T and

Z_{s,u} ⊗ Z_{u,t} = Z_{s,t} ∀s, u, t satisfying 0 ≤ s ≤ u ≤ t ≤ T,

that is, for every (i_1,...,i_l) ∈ {1,...,ℓ1}^l and l = 1,...,k:

(Z_{s,u} ⊗ Z_{u,t})^{(i_1,...,i_l)} = Σ_{j=0}^{l} Z^{(i_1,...,i_j)}_{s,u} Z^{(i_{j+1},...,i_l)}_{u,t}.

This is called Chen's identity.

(2) A p-rough path Z in R^{ℓ1} is a multiplicative functional of degree ⌊p⌋ in R^{ℓ1} that has finite p-variation, that is, ∀i = 1,...,⌊p⌋ and (s,t) ∈ ∆_T, it satisfies

||Z^i_{s,t}|| ≤ (M(t − s))^{i/p} / β(i/p)!,

where ||·|| is the Euclidean norm in the appropriate dimension, β is a real number depending only on p and M is a fixed constant. The space of p-rough paths in R^{ℓ1} is denoted by Ω_p(R^{ℓ1}).

(3) A geometric p-rough path is a p-rough path that can be expressed as a limit of 1-rough paths in the p-variation distance d_p, defined as follows: for any X, Y continuous functions from ∆_T to T^{(⌊p⌋)}(R^{ℓ1}),

d_p(X, Y) = max_{1≤i≤⌊p⌋} sup_{D⊂[0,T]} ( Σ_ℓ ||X^i_{t_{ℓ−1},t_ℓ} − Y^i_{t_{ℓ−1},t_ℓ}||^{p/i} )^{i/p},

where D = {t_ℓ}_ℓ goes through all possible partitions of [0, T]. The space of geometric p-rough paths in R^{ℓ1} is denoted by GΩ_p(R^{ℓ1}).
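Chen's identity at the first two levels can be verified numerically: concatenating the iterated integrals over [s,u] and [u,t] via the tensor product reproduces those over [s,t]. A sketch (ours; the computation is exact for piecewise-linear paths, so the check holds to floating-point precision):

```python
import numpy as np

def levels12(path):
    """Levels 1 and 2 of the iterated integrals of a piecewise-linear path."""
    d = path.shape[1]
    Z1, Z2 = np.zeros(d), np.zeros((d, d))
    for dz in np.diff(path, axis=0):
        Z2 += np.outer(Z1, dz) + 0.5 * np.outer(dz, dz)
        Z1 += dz
    return Z1, Z2

rng = np.random.default_rng(0)
path = rng.standard_normal((201, 2)).cumsum(axis=0)  # an irregular 2-d path
A1, A2 = levels12(path[:101])   # over [s, u]
B1, B2 = levels12(path[100:])   # over [u, t]
C1, C2 = levels12(path)         # over [s, t]

# Chen's identity: C1 = A1 + B1 and C2 = A2 + A1 (x) B1 + B2
print(np.allclose(C1, A1 + B1),
      np.allclose(C2, A2 + np.outer(A1, B1) + B2))  # True True
```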

One of the main results of the theory of rough paths is the following, called the “extension theorem.”

Theorem 2.2 (Theorem 3.7, [19]). Let p ≥ 1 be a real number and k ≥ 1 be an integer. Let X : ∆_T → T^{(k)}(R^n) be a multiplicative functional with finite p-variation. Assume that k ≥ ⌊p⌋. Then there exists a unique extension of X to a multiplicative functional X̂ : ∆_T → T^{(k+1)}(R^n).

Let X : [0, T] → R^n be an n-dimensional path of finite p-variation for n > 1. One way of constructing a p-rough path is by considering the set of all iterated integrals of degree up to ⌊p⌋. If X_t = (X^{(1)}_t,...,X^{(n)}_t), we define X : ∆_T → T^{(⌊p⌋)} as follows:

X^0 ≡ 1 ∈ R

X^k_{s,t} = { ∫···∫_{s<u_1<···<u_k<t} dX^{(i_1)}_{u_1} ··· dX^{(i_k)}_{u_k} }_{i_1,...,i_k} ∈ (R^n)^{⊗k}, k = 1,...,⌊p⌋.

(Z_{s,t})^{(i_1,i_2)} = (Z_{s,u})^{(i_1,i_2)} + (Z_{s,u})^{(i_1)}(Z_{u,t})^{(i_2)} + (Z_{u,t})^{(i_1,i_2)}.

This follows by breaking the domain of integration {u_1, u_2 : s < u_1 < u_2 < t} into the subdomains {s < u_1 < u_2 < u}, {s < u_1 < u ≤ u_2 < t} and {u ≤ u_1 < u_2 < t}.

We can now proceed to define the integral (2.3) when Z is a path of finite p-variation with p ≥ 2. First, it is clear that in order for the integral to be uniquely defined, we should define the first ⌊p⌋ iterated integrals, so we define the integral not with respect to Z but with respect to a corresponding p-rough path Z. To extend the previous construction, we also need that Z satisfies the "shuffle product property." It is not hard to see that geometric p-rough paths do satisfy this property, since they are limits of paths of bounded variation, and for paths of bounded variation the property follows from the usual integration by parts formula (see also [18]). So, we will define ∫h(Z) dZ, where Z is a geometric p-rough path in R^{ℓ1}, that is, Z ∈ GΩ_p(R^{ℓ1}).

By definition, there exists a sequence Z(r) ∈ Ω_1(R^{ℓ1}) such that d_p(Z(r), Z) → 0 as r → ∞. Then, for each r > 0, we define Z̃(r) := ∫h(Z(r)) dZ(r). These are also 1-rough paths in R^{ℓ2} and thus their higher iterated integrals are uniquely defined. In addition, it is possible to show that the map ∫h : Ω_1(R^{ℓ1}) → Ω_1(R^{ℓ2}) sending Z(r) to Z̃(r) is continuous in the p-variation topology.

We define Z̃ := ∫h(Z) dZ as the limit of the Z̃(r) with respect to d_p—this will also be a geometric p-rough path. In other words, the continuous map ∫h can be extended to a continuous map from GΩ_p(R^{ℓ1}) to GΩ_p(R^{ℓ2}), which are the closures of Ω_1(R^{ℓ1}) and Ω_1(R^{ℓ2}), respectively (see Theorem 4.12, [19]). Note that this construction of the integral can be extended to any h ∈ Lip(γ − 1) for γ > p (see [19]).

Remark 2.4. We say that a sequence Z(r) of p-rough paths converges to a p-rough path Z in the p-variation topology if there exists an M ∈ R and a sequence a(r) converging to zero as r → ∞, such that

||Z(r)^i_{s,t}||, ||Z^i_{s,t}|| ≤ (M(t − s))^{i/p}

and

||Z(r)^i_{s,t} − Z^i_{s,t}|| ≤ a(r)(M(t − s))^{i/p}

for i = 1,...,⌊p⌋ and (s,t) ∈ ∆_T. Note that this is not exactly equivalent to convergence in d_p: while convergence in d_p implies convergence in the p-variation topology, the opposite is not true. Convergence in the p-variation topology implies that there is a subsequence that converges in d_p.

We can now give the precise meaning of the solution of (2.1), when driven not by a path X but a geometric p-rough path X:

Definition 2.5. Consider X ∈ GΩ_p(R^n) and y_0 ∈ R^m. Set f_{y_0}(·) := f(· + y_0) and define h : R^n ⊕ R^m → End(R^n ⊕ R^m) as in (2.2). We call Z ∈ GΩ_p(R^n ⊕ R^m) a solution of (2.1) if the following two conditions hold:

(i) Z = ∫h(Z) dZ.
(ii) π_{R^n}(Z) = X, where by π_{R^n} we denote the projection of Z to R^n.

As in the case of ordinary differential equations (p = 1), it is possible to construct the solution using Picard iterations: we define Z(0) := (X, e), where by e we denote the trivial rough path e = (1, 0, 0, ...). Then, for every r ≥ 1, we define Z(r) = ∫h(Z(r − 1)) dZ(r − 1). The following theorem, called the "Universal Limit theorem," gives the conditions for the existence and uniqueness of the solution to (2.1). The theorem holds for any f ∈ Lip(γ) for γ > p, but we will assume that f is a polynomial. The proof is based on the convergence of the Picard iterations.

Theorem 2.6 (Theorem 5.3, [19]). Let p ≥ 1. For all X ∈ GΩ_p(R^n) and all y_0 ∈ R^m, equation (2.1) admits a unique solution Z = (X, Y) ∈ GΩ_p(R^n ⊕ R^m), in the sense of Definition 2.5. This solution depends continuously on X and y_0, and the mapping I_f : GΩ_p(R^n) → GΩ_p(R^m) which sends (X, y_0) to Y is continuous in the p-variation topology. The rough path Y is the limit of the sequence Y(r), where Y(r) is the projection of the rth Picard iteration Z(r) to R^m. For all ρ > 1, there exists T_ρ ∈ (0, T] such that

||Y(r)^i_{s,t} − Y(r + 1)^i_{s,t}|| ≤ 2^i ρ^{−r} (M(t − s))^{i/p} / β(i/p)!
∀(s,t) ∈ ∆_{T_ρ}, ∀i = 0,...,⌊p⌋.

The constant T_ρ depends only on f and p.

2.2. The problem. We now describe the problem that we are going to study in the rest of the paper. Let (Ω, F, P) be a probability space and X : Ω → GΩ_p(R^n) a random variable, taking values in the space of geometric p-rough paths endowed with the p-variation topology. For each ω ∈ Ω, the rough path X(ω) drives the following differential equation:

(2.5) dY_t(ω) = f(Y_t(ω); θ) · dX_t(ω), Y_0 = y_0,

where θ ∈ Θ ⊆ R^d, Θ being the parameter space. As before, f : R^m × Θ → L(R^n, R^m) and f_θ(y) := f(y; θ) is a polynomial in y for each θ ∈ Θ. According to Theorem 2.6, we can think of equation (2.5) as a map

(2.6) I_{f_θ, y_0} : GΩ_p(R^n) → GΩ_p(R^m),

sending a geometric p-rough path X to a geometric p-rough path Y, which is continuous with respect to the p-variation topology. Consequently,

Y := I_{f_θ, y_0} ∘ X : Ω → GΩ_p(R^m)

is also a random variable, taking values in GΩ_p(R^m) and, if P^T is the distribution of X_{0,T}, the distribution of Y_{0,T} will be

(2.7) Q^T_θ = P^T ∘ I^{−1}_{f_θ, y_0}.

Suppose that we know the expected signature of X at [0, T], that is, we know

E(X^{(i_1,...,i_k)}_{0,T}) := E( ∫···∫_{0<u_1<···<u_k<T} dX^{(i_1)}_{u_1} ··· dX^{(i_k)}_{u_k} )

for every word (i_1,...,i_k) ∈ {1,...,n}^k, k ∈ N.
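For the driving rough path X_t = (t, W_t) used in Section 5.1, the expected signature is known in closed form (see [13, 16]); for instance E(X^{(1,2)}_{0,T}) = 0 and, for the Stratonovich iterated integral, E(X^{(2,2)}_{0,T}) = E(W_T²/2) = T/2. A quick Monte Carlo sanity check (our sketch, not the authors' Mathematica code):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, N = 0.25, 1000, 20000
dt = T / n
dW = np.sqrt(dt) * rng.standard_normal((N, n))
WT = dW.sum(axis=1)                                  # W_T for each path

# X^(1,2)_{0,T} = int_0^T u dW_u (midpoint rule); X^(2,2)_{0,T} = W_T^2 / 2
X12 = ((np.arange(n) * dt + dt / 2) * dW).sum(axis=1)
X22 = WT**2 / 2

print(X12.mean(), X22.mean())   # ~ 0 and ~ T/2 = 0.125
```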

3. Method. In order to estimate θ, we are going to use a method that is similar to the "Method of Moments." The idea is simple: we will try to (partially) match the empirical expected signature of the observed p-rough path with the theoretical one, which is a function of the unknown parameters. Remember that the data we have available are several realizations of the p-rough path Y_{0,T} described in Section 2.2.

To make this more precise, let us introduce some notation: let

(3.1) E^τ(θ) := E_θ(Y^τ_{0,T})

be the theoretical expected signature corresponding to parameter value θ and word τ, and

(3.2) M^τ_N := (1/N) Σ_{i=1}^N Y^τ_{0,T}(ω_i)

be the empirical expected signature, which is a Monte Carlo approximation of the actual one. The word τ is constructed from the alphabet {1,...,m}, that is, τ ∈ W_m where W_m := ⋃_{k≥0} {1,...,m}^k. The idea is to find θ̂ such that

E^τ(θ̂) = M^τ_N ∀τ ∈ V ⊂ W_m

for some choice of a set of words V. Then θ̂ will be our estimate.

Remark 3.1. When m = 1, the expected signature of Y is equivalent to its moments, since

Y^{(1,...,1)}_{0,T} (the word consisting of m ones) = (Y_T − Y_0)^m / m!.

When m = 2, one example is to consider the word τ = (1,2). Then, one needs to compute the iterated integral (or an approximation of it, if the path is discretely observed)

Y^{(1,2)}_{0,T}(ω_i) = ∫_0^T ∫_0^s dY^{(1)}_u(ω_i) dY^{(2)}_s(ω_i)

for each path Y_t(ω_i) = (Y^{(1)}_t(ω_i), Y^{(2)}_t(ω_i)), i = 1,...,N. Then

M^{(1,2)}_N := (1/N) Σ_{i=1}^N Y^{(1,2)}_{0,T}(ω_i).

Note that this is closely related to the correlation of the two one-dimensional paths {Y^{(1)}_t}_{t∈[0,T]} and {Y^{(2)}_t}_{t∈[0,T]} since, by the shuffle product,

Y^{(1,2)}_{0,T}(ω_i) + Y^{(2,1)}_{0,T}(ω_i) = Y^{(1)}_{0,T}(ω_i) Y^{(2)}_{0,T}(ω_i)

and, by the law of large numbers,

lim_{N→∞} (M^{(1,2)}_N + M^{(2,1)}_N) = E((Y^{(1)}_T − Y^{(1)}_0)(Y^{(2)}_T − Y^{(2)}_0)).
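In code, M^{(1,2)}_N amounts to approximating the iterated integral on each discretely observed path and averaging. A minimal sketch (ours), assuming each observation comes as an array of samples of (Y^{(1)}, Y^{(2)}):

```python
import numpy as np

def Y12(path):
    """Piecewise-linear approximation of Y^(1,2)_{0,T} for one 2-d path."""
    d1, d2 = np.diff(path[:, 0]), np.diff(path[:, 1])
    run = np.concatenate([[0.0], np.cumsum(d1)[:-1]])  # Y^(1)_u - Y^(1)_0
    return np.sum(run * d2 + 0.5 * d1 * d2)

def M12(paths):
    """Empirical expected signature at the word (1, 2)."""
    return float(np.mean([Y12(p) for p in paths]))

# example: independent 2-d Brownian paths, for which E(Y^(1,2)_{0,T}) = 0
rng = np.random.default_rng(2)
paths = [np.vstack([np.zeros((1, 2)),
                    0.02 * rng.standard_normal((500, 2)).cumsum(axis=0)])
         for _ in range(500)]
print(M12(paths))   # ~ 0
```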

Several questions arise: (i) How can we get an analytic expression for E^τ(θ) as a function of θ? (ii) What is a good choice for V or, for m = 1, how do we choose which moments to match? (iii) How good is θ̂ as an estimate? We will try to answer these questions below.

3.1. Computing the theoretical expected signature. We want to get an analytic expression for the expected signature of the p-rough path Y at (0, T), where Y is the solution of (2.5) in the sense described above. In other words, we want to compute (3.1). We are given the expected signature of the p-rough path X which is driving the equation, again at (0, T), that is, we are given

E(X^σ_{0,T}) ∀σ ∈ {1,...,n}^k, k ∈ N.

In addition, we know the vector field f_θ(y) = f(y; θ) in (2.5), up to the parameter θ, and we know that it is polynomial.

It turns out that we cannot compute (3.1) in general. We need to make one more approximation, since the solution Y will not usually be available: we will approximate the solution by the rth Picard iteration Y(r), described in the Universal Limit theorem (Theorem 2.6). Finally, we will approximate the expected signature of the solution corresponding to a word τ, E^τ(θ), by the expected signature of the rth Picard iteration at τ, which we will denote by E^τ_r(θ):

(3.3) E^τ_r(θ) := E_θ(Y(r)^τ_{0,T}).

The good news is that when f_θ is a polynomial of degree q in y, for any q ∈ N, the rth Picard iteration of the solution is a linear combination of iterated integrals of the driving force X. More specifically, for any realization ω and any time interval (s,t) ∈ ∆_T, we can write

(3.4) Y(r)^τ_{s,t} = Σ_{|σ| ≤ |τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s; θ) X^σ_{s,t},

where α^τ_{r,σ}(y; θ) is a polynomial in y of degree q^r and |·| gives the length of a word. Thus,

(3.5) E^τ_r(θ) = Σ_{|σ| ≤ |τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s; θ) E(X^σ_{s,t}).

We will prove (3.4), first for p = 1 and then for any p ≥ 1 by taking limits with respect to d_p. We will need the following lemma.
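Before stating it, here is the simplest instance of (3.4): for the scalar linear equation dY = θY dX with smooth scalar driver X_t = t, the rth Picard iterate is a linear combination of the iterated integrals X^{(1,...,1)}_{0,t} = t^k/k!, with coefficients θ^k y_0. A sympy sketch (ours):

```python
import sympy as sp

theta, y0, t, s = sp.symbols('theta y0 t s')
X = t  # smooth driver: the k-fold iterated integral is t**k / factorial(k)

def picard(r):
    """Y(0) = y0;  Y(r+1)_t = y0 + int_0^t theta * Y(r)_s dX_s."""
    Y = y0
    for _ in range(r):
        Y = y0 + sp.integrate(theta * Y.subs(t, s) * sp.diff(X.subs(t, s), s),
                              (s, 0, t))
    return sp.expand(Y)

# coefficients of t^k/k! are exactly theta^k * y0, as in (3.4)
print(picard(3))  # y0 + t*theta*y0 + t**2*theta**2*y0/2 + t**3*theta**3*y0/6
```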

Lemma 3.2. Suppose that X ∈ GΩ_1(R^n), Y ∈ GΩ_1(R^m) and it is possible to write

(3.6) Y^{(j)}_{s,t} = Σ_{σ∈W_n, q_1≤|σ|≤q_2} α^{(j)}_σ(y_s) X^σ_{s,t} ∀(s,t) ∈ ∆_T and ∀j = 1,...,m,

where the α^{(j)}_σ : R^m → L(R, R) are polynomials of degree q, with q, q_1, q_2 ∈ N and q_1 ≥ 1. Then

(3.7) Y^τ_{s,t} = Σ_{σ∈W_n, |τ|q_1≤|σ|≤|τ|q_2} α^τ_σ(y_s) X^σ_{s,t}

for all (s,t) ∈ ∆_T and τ ∈ W_m. The α^τ_σ : R^m → L(R, R) are polynomials of degree ≤ q|τ|.

Proof. We will prove (3.7) by induction on |τ|, that is, the length of the word. By hypothesis, it is true when |τ| = 1. Suppose that it is true for any τ ∈ W_m such that |τ| = k ≥ 1. First, note that from (3.6) we get

dY^{(j)}_u = Σ_{σ∈W_n, q_1≤|σ|≤q_2} α^{(j)}_σ(y_s) X^{σ−}_{s,u} dX^{σ_ℓ}_u ∀u ∈ [s,t],

where σ− is the word σ without its last letter and σ_ℓ is the last letter. For example, if σ = (i_1,...,i_{b−1},i_b), then σ− = (i_1,...,i_{b−1}) and σ_ℓ = i_b. Note that this cannot be defined when σ is the empty word ∅ (b = 0). Now suppose that |τ| = k + 1, so τ = (j_1,...,j_k,j_{k+1}) for some j_1,...,j_{k+1} ∈ {1,...,m}. Then

Y^τ_{s,t} = ∫_s^t Y^{τ−}_{s,u} dY^{(j_{k+1})}_u
    = ∫_s^t ( Σ_{kq_1≤|σ_1|≤kq_2} α^{τ−}_{σ_1}(y_s) X^{σ_1}_{s,u} )( Σ_{q_1≤|σ_2|≤q_2} α^{(j_{k+1})}_{σ_2}(y_s) X^{σ_2−}_{s,u} dX^{σ_{2ℓ}}_u )
    = Σ_{kq_1≤|σ_1|≤kq_2, q_1≤|σ_2|≤q_2} (α^{τ−}_{σ_1}(y_s) α^{(j_{k+1})}_{σ_2}(y_s)) ∫_s^t X^{σ_1}_{s,u} X^{σ_2−}_{s,u} dX^{σ_{2ℓ}}_u.

Now we use the fact that for any geometric rough path X and any (s, u) ∈ ∆T , we can write

(3.8) X^{σ_1}_{s,u} X^{σ_2−}_{s,u} = Σ_{σ ∈ σ_1 ⊔ (σ_2−)} X^σ_{s,u},

where σ_1 ⊔ (σ_2−) is the shuffle product between the words σ_1 and σ_2−. Applying (3.8) above, we get

Y^τ_{s,t} = Σ_{σ∈W_n, (k+1)q_1≤|σ|≤(k+1)q_2} α^τ_σ(y_s) X^σ_{s,t},

where

α^τ_σ(y_s) = Σ_{(σ_1⊔σ_2−)∋σ−, σ_ℓ=σ_{2ℓ}} α^{τ−}_{σ_1}(y_s) α^{(τ_ℓ)}_{σ_2}(y_s)

is a polynomial of degree ≤ kq + q = (k + 1)q. Note that the above sum is over all σ_1, σ_2 ∈ W_n such that kq_1 ≤ |σ_1| ≤ kq_2 and q_1 ≤ |σ_2| ≤ q_2. □

We now prove (3.4) for p = 1.
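As an aside, the shuffle sets appearing in (3.8) can be enumerated recursively: every shuffle of σ_1 and σ_2 ends with the last letter of one of the two words, preceded by a shuffle of what remains. A short sketch (ours):

```python
def shuffle(w1, w2):
    """All shuffles of the words w1, w2 (as tuples, with multiplicity)."""
    if not w1:
        return [tuple(w2)]
    if not w2:
        return [tuple(w1)]
    return ([s + (w1[-1],) for s in shuffle(w1[:-1], w2)] +
            [s + (w2[-1],) for s in shuffle(w1, w2[:-1])])

print(shuffle((1,), (2,)))            # [(2, 1), (1, 2)], the two-letter case
print(len(shuffle((1, 2), (3, 4))))   # binomial(4, 2) = 6
```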

Lemma 3.3. Suppose that X ∈ GΩ_1(R^n) is driving system (2.1), where f : R^m → L(R^n, R^m) is a polynomial of degree q. Let Y(r) be the projection of the rth Picard iteration Z(r) to R^m, as described above. Then Y(r) ∈ GΩ_1(R^m) and it satisfies

(3.9) Y(r)^τ_{s,t} = Σ_{|σ| ≤ |τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s) X^σ_{s,t}

for all (s,t) ∈ ∆_T and τ ∈ W_m. α^τ_{r,σ}(y, s) is a polynomial of degree ≤ |τ|q^r in y.

Proof. For every r ≥ 0, Z(r) ∈ GΩ_1(R^{n+m}), since Z(0) := (X, e), X ∈ GΩ_1(R^n), and integrals preserve the roughness of the integrator. So Y(r) ∈ GΩ_1(R^m). We will prove the claim by induction on r. For r = 0, Y(0) = e and thus (3.9) becomes

Y(0)^τ_{s,t} = α^τ_{0,∅}(y_0, s) X^∅_{s,t}

and it is true for α^∅_{0,∅} ≡ 1 and α^τ_{0,∅} ≡ 0 for every τ ∈ W_m such that |τ| > 0. Now suppose it is true for some r ≥ 0. Remember that Z(r) = (X, Y(r)) and that Z(r + 1) is defined by

Z(r + 1) = ∫ h(Z(r)) dZ(r),

where h is defined in (2.2) and f_{y_0}(y) = f(y_0 + y). Since f is a polynomial of degree q, h is also a polynomial of degree q and thus it is possible to write

(3.10) h(z_2) = Σ_{k=0}^q h_k(z_1)(z_2 − z_1)^{⊗k}/k! ∀z_1, z_2 ∈ R^ℓ,

where ℓ = n + m. Then the integral is defined to be

Z(r + 1)_{s,t} := ∫_s^t h(Z(r)) dZ(r) = Σ_{k=0}^q h_k(Z(r)_s) Z(r)^{k+1}_{s,t} ∀(s,t) ∈ ∆_T.

Let us take a closer look at the functions h_k : R^ℓ → L((R^ℓ)^{⊗k}, L(R^ℓ, R^ℓ)). Since (3.10) is the Taylor expansion of the polynomial h, h_k is the kth derivative of h. So, for every word β ∈ W_ℓ such that |β| = k and every z = (x,y) ∈ R^ℓ, (h_k(z))^β = ∂_β h(z) ∈ L(R^ℓ, R^ℓ). By definition, h is independent of x and thus the derivative will always be zero if β contains any letters in {1,...,n}. Remember that Y(r + 1) is the projection of Z(r + 1) onto R^m. So, for each j ∈ {1,...,m},

(3.11) Y(r + 1)^{(j)}_{s,t} = Z(r + 1)^{(n+j)}_{s,t} = Σ_{k=0}^q (h_k(Z(r)_s) Z(r)^{k+1}_{s,t})^{(n+j)}
    = Σ_{i=1}^ℓ Σ_{τ∈W_m(0,q)} ∂_{τ+n} h_{n+j,i}(Z(r)_s) Z(r)^{(τ+n,i)}_{s,t}
    = Σ_{i=1}^n Σ_{τ∈W_m(0,q)} ∂_τ f_{j,i}(y_0 + Y(r)_s) Z(r)^{(τ+n,i)}_{s,t},

where W_m(k_1, k_2) = {τ ∈ W_m; k_1 ≤ |τ| ≤ k_2} for any k_1, k_2 ∈ N, that is, the set of all words of length between k_1 and k_2. By the induction hypothesis, we know that for every τ ∈ W_m,

Z(r)^{τ+n}_{s,t} = Y(r)^τ_{s,t} = Σ_{|σ|≤|τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s) X^σ_{s,t}

and thus, for every i = 1,...,n,

(3.12) Z(r)^{(τ+n,i)}_{s,t} = Σ_{|σ|≤|τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s) X^{(σ,i)}_{s,t}.

Putting this back into the equation above, we get

Y(r + 1)^{(j)}_{s,t} = Σ_{i=1}^n Σ_{|τ|≤q} ∂_τ f_{j,i}(y_0 + Y(r)_s) Σ_{|σ|≤|τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s) X^{(σ,i)}_{s,t}

and, by reorganizing the sums, we get

(3.13) Y(r + 1)^{(j)}_{s,t} = Σ_{|σ|≤q(q^r−1)/(q−1)+1=(q^{r+1}−1)/(q−1)} α^{(j)}_{r+1,σ}(y_0, s) X^σ_{s,t},

where α^{(j)}_{r+1,∅} ≡ 0 and, for every σ ∈ W_n − {∅},

α^{(j)}_{r+1,σ}(y_0, s) = Σ_{|σ−|(q−1)/(q^r−1) ≤ |τ| ≤ q} ∂_τ f_{j,σ_ℓ}(y_0 + Y(r)_s) α^τ_{r,σ−}(y_0, s).

If the α^τ_{r,σ} are polynomials of degree ≤ |τ|q^r, then the α^{(j)}_{r+1,σ} are polynomials of degree ≤ q^{r+1}. The result follows by applying Lemma 3.2. Notice that (in the notation of Lemma 3.2) q_1 ≥ 1, since α^{(j)}_{r+1,∅} ≡ 0. □

We will now prove (3.4) for any p ≥ 1.

Theorem 3.4. The result of Lemma 3.3 still holds when X ∈ GΩ_p(R^n), for any p ≥ 1.

Proof. Since X ∈ GΩ_p(R^n), there exists a sequence {X(k)}_{k≥0} in GΩ_1(R^n), such that X(k) → X as k → ∞ in the p-variation topology. We denote by Z(k, r) and Z(r) the rth Picard iterations corresponding to equation (2.1) driven by X(k) and X, respectively.

First, we show that Z(k, r) → Z(r), and consequently Y(k, r) → Y(r), as k → ∞ in the p-variation topology, for every r ≥ 0. It is clearly true for r = 0. Now suppose that it is true for some r ≥ 0. By definition, Z(r + 1) = ∫h(Z(r)) dZ(r). Remember that the integral is defined as the limit in the p-variation topology of the integrals corresponding to a sequence of 1-rough paths that converge to Z(r) in the p-variation topology. By the induction hypothesis, this sequence can be Z(k, r). It follows that Z(k, r + 1) = ∫h(Z(k, r)) dZ(k, r) converges to Z(r + 1), which proves the claim. Convergence of the rough paths in the p-variation topology implies convergence of each of the iterated integrals, that is,

Y(k, r)^τ_{s,t} → Y(r)^τ_{s,t} as k → ∞

for all r ≥ 0, (s,t) ∈ ∆_T and τ ∈ W_m. By Lemma 3.3, since X(k) ∈ GΩ_1(R^n) for every k ≥ 1, we can write

Y(k, r)^τ_{s,t} = Σ_{|σ|≤|τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s) X(k)^σ_{s,t}

for every τ ∈ W_m, (s,t) ∈ ∆_T and k ≥ 1. Since X(k) → X in the p-variation topology and the sum is finite, it follows that

Y(k, r)^τ_{s,t} → Σ_{|σ|≤|τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0, s) X^σ_{s,t} as k → ∞.

The statement of the theorem follows. 

3.2. The expected signature matching estimator. We can now give a precise definition of the estimator, which we will formally call the expected signature matching estimator (ESME): suppose that we are in the setting of the problem described in Section 2.2 and that M^τ_N and E^τ_r(θ) are defined as in (3.2) and (3.3), respectively, for every τ ∈ W_m. Let V ⊂ W_m be a set of d words constructed from the alphabet {1,...,m}. For each such V, we define the ESME θ̂^V_{r,N} as the solution to

(3.14) E^τ_r(θ) = M^τ_N ∀τ ∈ V.

This definition requires that (3.14) has a unique solution. This will not be true in general. Let V_r be the set of all V containing d words, such that E^τ_r(θ) = M_τ, ∀τ ∈ V, has a unique solution for all M_τ ∈ S_τ ⊆ R, where S_τ is the set of all possible values of M^τ_N, for any N ≥ 1. We will assume the following.

Assumption 1 (Observability). The set V_r is nonempty and known (at least up to a nonempty subset).

Then θ̂^V_{r,N} can be defined for every V ∈ V_r.

Remark 3.5. In order to achieve uniqueness of the estimator, we might need some extra information that we could get by looking at time correlations. We can fit this into our framework by considering scaled versions of (2.5) together with the original one: for example, consider the equations

dY_t(ω) = f(Y_t(ω); θ) · dX_t(ω), Y_0 = y_0,
dY(c)_t(ω) = f(Y(c)_t(ω); θ) · dX_{ct}(ω), Y(c)_0 = y_0,

for some appropriate constant c. Then Y(c)_t = Y_{ct} and the expected signature at [0, T] will also contain information about E(Y^{(j_1)}_T Y^{(j_2)}_{cT}) for any j_1, j_2 = 1,...,m.

It is very difficult to say anything about the solutions of system (3.14), as it is very general. However, under the assumption that f is also a polynomial in θ, (3.14) becomes a system of polynomial equations and the problem of identifiability becomes equivalent to the problem of existence and uniqueness of solutions for that system.

3.3. Properties of the ESME. It is possible to show that the ESME de- fined as the solution of (3.14) will converge to the true value of the parameter and will be asymptotically normal. More precisely, the following holds. PARAMETER ESTIMATION FOR ROUGH DIFFERENTIAL EQUATIONS 19

Theorem 3.6. Let θ̂^V_{r,N} be the expected signature matching estimator for the system described in Section 2.2 and V ∈ V_r. Assume that the expected signature of Y_{0,T} is finite and that f(y; θ) is a polynomial of degree q with respect to y and twice differentiable with respect to θ. Let θ_0 be the "true" parameter value, meaning that the distribution of the observed signature Y_{0,T} is Q^T_{θ_0}, defined in (2.7). Set

(3.15) D^V_r(θ)_{i,τ} = (∂/∂θ_i) E^τ_r(θ) and Σ_V(θ_0)_{τ,τ'} = cov(Y^τ_{0,T}, Y^{τ'}_{0,T})

and assume that inf_{r>0, θ∈Θ} ||D^V_r(θ)|| > 0, that is, D^V_r(θ) is uniformly nondegenerate with respect to r and θ. Then, for r ∝ log N and T sufficiently small,

(3.16) θ̂^V_{r,N} → θ_0 with probability 1

and

(3.17) √N Φ_V(θ_0)^{−1} (θ̂^V_{r,N} − θ_0) →_L N(0, I)

as N → ∞, where

(3.18) Φ_V(θ_0) = D^V(θ_0)^{−1} Σ_V(θ_0)^{1/2} with D^V(θ)_{i,τ} = (∂/∂θ_i) E^τ(θ).

Proof. By Theorem 3.4 and the definition of E^τ_r(θ),

E^τ_r(θ) = Σ_{|σ|≤|τ|(q^r−1)/(q−1)} α^τ_{r,σ}(y_0; θ) E(X^σ_{0,T}),

where the functions α^τ_{r,σ}(y_0; θ) are constructed recursively, as in Lemmas 3.2 and 3.3. Since f is twice differentiable with respect to θ, the functions α and consequently E^τ_r will also be twice differentiable with respect to θ. Thus, we can write

E^τ_r(θ) − E^τ_r(θ_0) = D^V_r(θ̃)_{·,τ} (θ − θ_0) ∀θ ∈ Θ ⊆ R^d

for some θ̃ within a ball of center θ_0 and radius ||θ − θ_0||, and the function D^V_r(θ) is continuous. By inverting D^V_r and setting θ = θ̂^V_{r,N}, we get

(3.19) (θ̂^V_{r,N} − θ_0) = D^V_r(θ̃^V_{r,N})^{−1}(E^V_r(θ̂^V_{r,N}) − E^V_r(θ_0)),

where E^V_r(θ) = {E^τ_r(θ)}_{τ∈V}. By definition,

(3.20) E^V_r(θ̂^V_{r,N}) = {M^τ_N}_{τ∈V} = { (1/N) Σ_{i=1}^N Y^τ_{0,T}(ω_i) }_{τ∈V},

where the Y_{0,T}(ω_i) are independent realizations of the random variable Y_{0,T}. Suppose that T is small enough so that the above Monte Carlo approximation satisfies both the Law of Large Numbers and the Central Limit theorem, that is, the covariance matrix satisfies 0 < ||Σ_V(θ_0)|| < ∞. Then, for N → ∞,

|E^τ_r(θ̂^V_{r,N}) − E^τ(θ_0)| = |E^τ_r(θ̂^V_{r,N}) − E(Y^τ_{0,T})| → 0 ∀τ ∈ V

with probability 1. Note that the convergence does not depend on r. Also, for r → ∞,

E^τ_r(θ_0) → E^τ(θ_0)

as a result of Theorem 2.6. Thus, for r ∝ log N,

|E^τ_r(θ̂^V_{r,N}) − E^τ_r(θ_0)| → 0 with probability 1, ∀τ ∈ V.

Combining this with (3.19) and the uniform nondegeneracy of D^V_r, we get (3.16). From (3.16) and the continuity and uniform nondegeneracy of D^V_r, we conclude that

D^V(θ_0) D^V_r(θ̃^V_{r,N})^{−1} → I with probability 1,

provided that T is small enough so that E^V(θ_0) < ∞. Now, since

Φ_V(θ_0)^{−1}(θ̂^V_{r,N} − θ_0) = Σ_V(θ_0)^{−1/2}(D^V(θ_0) D^V_r(θ̃^V_{r,N})^{−1})(E^V_r(θ̂^V_{r,N}) − E^V_r(θ_0)),

to prove (3.17) it is sufficient to prove that

√N Σ_V(θ_0)^{−1/2}(E^V_r(θ̂^V_{r,N}) − E^V_r(θ_0)) →_L N(0, I).

It follows directly from (3.20) that

√N Σ_V(θ_0)^{−1/2}(E^V_r(θ̂^V_{r,N}) − E^V(θ_0)) →_L N(0, I).

It remains to show that

√N Σ_V(θ_0)^{−1/2}(E^V_r(θ_0) − E^V(θ_0)) → 0.

It follows from Theorem 2.6 that

||E^V_r(θ_0) − E^V(θ_0)|| ≤ C ρ^{−r}

for any ρ > 1 and sufficiently small T. The constant C depends on V, p and T. Suppose that r = a log N for some a > 0 and choose ρ > exp(1/(2a)). Then

√N ||E^V_r(θ_0) − E^V(θ_0)|| ≤ C N^{1/2 − a log ρ},

which proves the claim. □

4. Extensions. In this section, we discuss how to extend the method described in Section 3 in two different directions. First, we generalize the ESME by matching linear combinations of the elements of the signature and considering issues of optimality. Then, we extend the method to a different setting where we observe one path of the signature of the response at many points in time rather than many independent signatures of the response at one fixed time T . PARAMETER ESTIMATION FOR ROUGH DIFFERENTIAL EQUATIONS 21

4.1. Generalized ESME and discussion of optimality. In some sense, our method is standard: we assumed that we observe N realizations of the random variable Y_{0,T} with distribution Q^T_θ defined in (2.7). Then, we estimate θ by matching the empirical and theoretical expectation of that random variable, the challenging part being the computation of the theoretical expectation. Similarly, we can generalize the method and study its optimality in the standard way (see [9]): we consider functions g(Y_{0,T}, ·) : Θ → R^d such that

E_θ(g(Y_{0,T}, θ)) ≡ 0.

An obvious choice is

(4.1) g(Y_{0,T}, θ) = {Y^τ_{0,T} − E_θ(Y^τ_{0,T})}_{τ∈V}

for V as before. More generally, we can consider linear combinations of iterated integrals Y^τ_{0,T}. This is sufficient, since products of iterated integrals can be written as linear combinations of iterated integrals. Then, the generalized moment matching estimator is defined as the solution to

(1/N) Σ_{i=1}^N g(Y_{0,T}(ω_i), θ) = 0.

We define the generalized ESME to be the solution of the system above, with the expectations approximated by the expectations of Picard iterations. For g defined in (4.1), we get back the ESME.

The asymptotically optimal choice of g among all linear combinations of iterated integrals Y^τ_{0,T} with τ ∈ V is the one minimizing the asymptotic variance. The optimization can be done iteratively: suppose that we want to choose parameters {α_σ}_{σ∈V} such that the estimator constructed by solving

Σ_{σ∈V} α_σ ( E_θ(Y^σ_{0,T}) − (1/N) Σ_{i=1}^N Y^σ_{0,T}(ω_i) ) = 0

in terms of θ, has minimal variance among all linear combinations of Y^σ_{0,T}. We go through the following steps:

(0) Choose an initial value α_σ(0) for the α_σ's.
(1) For the current value of the α_σ's, solve for θ.
(2) Compute the asymptotic variance as a function of the α's and θ.
(3) Set the α's equal to the argument minimizing the asymptotic variance in terms of the α's, for θ the solution of step (1).
(4) Go to step (1).

This is also discussed in [9].

4.2. Observing one path. Suppose that we observe one realization of the solution of (2.5), namely {Y_{0,t}(ω)}_{0≤t≤T}. We are going to say that the rough path Y is ergodic if, for T → ∞,

(4.2) (1/T) ∫_0^T δ_{Y_{0,t}(ω)} dt → μ_{θ_0} weakly, Q_{θ_0}-a.s.,

for θ_0 in the parameter space Θ. The limit μ_{θ_0} is a distribution on the space of geometric rough paths GΩ_p(R^m) and we call it the invariant distribution. Then, if Y_0 ∼ μ_{θ_0}, the process will be stationary. In particular, for all t ≥ 0 and words τ ∈ W_m,

(4.3) E_{θ_0}(Y^τ_{0,t}) = E_{θ_0}(Y^τ_0),

where the expectation E_{θ_0} is with respect to μ_{θ_0}. Thus, for T large and any S ≥ 0,

(4.4) (1/T) ∫_0^T Y^τ_{0,t}(ω) dt ≈ E_{θ_0}(Y^τ_{0,S}), Q_{θ_0}-a.s.,

where the left-hand side can be computed from the observations and the right-hand side is a function of θ_0. However, as before, the expectation E_{θ_0}(Y^τ_{0,S}) will not be known in general. We approximate it using (3.4): we get

E_{θ_0}(Y(r)^τ_{0,S}) ≈ Σ_{|σ|≤|τ|(q^r−1)/(q−1)} E_{θ_0}(α^τ_{r,σ}(Y_0, 0; θ)) E(X^σ_{0,S}).

According to Theorem 3.4, the functions α^τ_{r,σ}(y, 0; θ) are polynomials in y of degree ≤ |τ|q^r, where q is the degree of the polynomial f in (2.5) with respect to y. Thus, we write

E_{θ_0}(α^τ_{r,σ}(Y_0, 0; θ)) = E_{θ_0}( Σ_{k=1}^{|τ|q^r} Y_0^k · c^{r,σ,τ}_k(θ) ) = Σ_{k=1}^{|τ|q^r} E_{θ_0}(Y_0^k) · c^{r,σ,τ}_k(θ).

The expectation E_{θ_0}(Y_0^k) is still unknown, but it can be approximated using (4.2). We end up with the equation

(1/T) ∫_0^T Y^τ_{0,t}(ω) dt ≈ Σ_{|σ|≤|τ|(q^r−1)/(q−1)} ( Σ_{k=1}^{|τ|q^r} ( (1/T) ∫_0^T Y^k_{0,t}(ω) dt ) · c^{r,σ,τ}_k(θ) ) E(X^σ_{0,S}).

The coefficients c^{r,σ,τ}_k(θ) are polynomials with respect to θ. By considering several different words τ ∈ W_m, we construct a polynomial system of equations in θ. As before, if θ̂ is a solution of the system, we call it the expected signature matching estimator.

Note that S and T do not need to be the same. In fact, T should be large so that (4.4) holds while S needs to be small in order for the local approximation of the expectation by Picard iterations to be valid.

Remark 4.1. At the moment, there is no unified theory of ergodicity for rough paths. Some interesting results in this direction can be found in [8]. Note that we have assumed that the system is initialized by a rough path Y_0 rather than a point Y_0 ∈ R^m. This is consistent with the results in [8], where the authors point out the need to consider all the past {Y_t; t ≤ 0}.

5. Examples. In this section, we use the ESME in specific examples of diffusions and fractional diffusions. The code that was used in these examples is written in Mathematica and can be found at http://chrisladroue.com/software/brownian-motion-and-iterated-integrals-on-mathematica/. It can be used to generate more examples corresponding to different choices of drift and diffusion coefficient.

5.1. Diffusions. First, we apply the ESME to estimate the parameters of the following Stratonovich SDE:

(5.1) dY_t = a(1 − Y_t) dX^{(1)}_t + bY_t² dX^{(2)}_t, Y^{(1)}_0 = 0,

where X^{(1)}_t = t and X^{(2)}_t = W_t. We chose an SDE because the expected signature of (t, W_t) can easily be computed explicitly. After three Picard iterations and replacing the expected signature of (t, W_t) by its value (see [13, 16]), we get

E(Y(3)^{(1)}_{0,t}) = at − (1/2)a²t² + (1/6)a³t³ + (1/4)a³b²t⁴ − (1/10)a⁴b²t⁵,

E(2Y(3)^{(1,1)}_{0,t}) = a²t² − a³t³ + (7/12)a⁴t⁴ − (1/6)a⁵t⁵ + (7/10)a⁴b²t⁵
    + (1/36)a⁶t⁶ − (17/20)a⁵b²t⁶ + (191/420)a⁶b²t⁷
    − (11/105)a⁷b²t⁸ + (21/80)a⁶b⁴t⁸ + (1/144)a⁸b²t⁹
    − (43/180)a⁷b⁴t⁹ + (33/700)a⁸b⁴t¹⁰ + (1/50)a⁸b⁶t¹¹.

This gives us an approximation of the moments of the solution as polynomials in the parameters.

The empirical moments are computed from the data. We generate 2,000 approximate realizations of paths of the solution using Milstein's method with discretization step 0.001. We use these paths to approximate the iterated integrals over the interval [0, 1/4]. We use the values a = 1 and b = 2. Then we get an approximation to the empirical moments at T = 1/4 by averaging the different realizations of the iterated integrals of Y over [0, 1/4].

Finally, by equating the empirical and theoretical approximations to the moments for t = 1/4, we get a system of polynomials in (a, b) of degree 14. We get two exact real solutions to this system: (0.996353, 2.12892) and (0.996353, −2.12892). As expected, the sign of b cannot be identified. The estimates are very close to the true values.

We repeat this process 100 times and get 100 different estimates of (a, b). In Figure 1, we normalize the 100 positive solutions by the asymptotic variance (3.18), where D^V_r(θ)_{i,τ} and Σ_V(θ_0)_{τ,τ'} in (3.15) are computed, the first using the approximation of the theoretical moments from Picard iterations and the second from the data by Monte Carlo. The normalized estimates are shown in Figure 1 (right). Their covariance matrix is

( 0.97172    0.0243445
  0.0243445  0.954654 ),

which is very close to the identity.

Fig. 1. 100 realizations of the expected signature matching estimator, after centering and normalizing by the asymptotic variance. Left: from fractional Brownian motion paths (Hurst index = 11/24); right: from Brownian motion paths.
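The experiment is easy to mimic in outline: simulate (5.1) and average. A hedged sketch (ours, not the authors' Mathematica code): we use Euler–Maruyama on the Itô form of (5.1), whose drift gains the Stratonovich correction (1/2)σσ′ = b²Y³ (the paper uses Milstein), and compare the empirical first moment at T = 1/4 with the third Picard approximation above.

```python
import numpy as np

a, b, T, n, N = 1.0, 2.0, 0.25, 250, 2000
dt = T / n
rng = np.random.default_rng(3)

Y = np.zeros(N)  # Y_0 = 0, as in (5.1)
for _ in range(n):
    dW = np.sqrt(dt) * rng.standard_normal(N)
    # Ito form of dY = a(1 - Y) dt + b Y^2 o dW: extra drift b^2 Y^3
    Y += (a * (1.0 - Y) + b**2 * Y**3) * dt + b * Y**2 * dW

t = T
picard3 = a*t - a**2*t**2/2 + a**3*t**3/6 + a**3*b**2*t**4/4 - a**4*b**2*t**5/10
print(Y.mean(), picard3)   # both should be close to 0.225
```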

5.2. Fractional diffusions. We now apply the ESME to estimate the parameters of a differential equation driven by fractional Brownian motion with Hurst parameter h > 1/4. We choose the same vector field as before. Let

(5.2) dY_t = a(1 − Y_t) dX^{(1)}_t + bY_t² dX^{(2)}_t, Y^{(1)}_0 = 0,

where X^{(1)}_t = t and X^{(2)}_t = B^h_t, where B^h_t is fractional Brownian motion with Hurst parameter h. Fractional Brownian motion generalizes Brownian motion, in the sense that it is a self-similar Gaussian process. It is defined as the Gaussian process with correlation given by

E(B^h_s B^h_t) = (1/2)(|s|^{2h} + |t|^{2h} − |t − s|^{2h}).

Clearly, for h = 1/2 we get independent increments and Brownian motion. For h > 1/2 the increments are positively correlated and "smoother" than Brownian motion, while for h < 1/2 they are negatively correlated and they get more and more "rough" as h gets smaller. In particular, the paths of fractional Brownian motion possess finite p-variation for every p > 1/h.

Defining integration with respect to fractional Brownian motion is necessary in order for (5.2) to make sense. This is nontrivial and a very active area of research. One of the most successful approaches is given by rough paths—but it is limited to h > 1/4 (see [18], or [25] for a more recent approach), that is, to paths of finite p-variation for p < 4.

Having defined (5.2) as a differential equation driven by the rough path (t, B^h_t), we can proceed to estimate the parameters a and b. As in the diffusion case, we first construct an approximation to the theoretical moments, using Picard iterations. One difference is that, up to this moment, an analytic expression for the expected signature is not known. Instead, we get a numerical approximation by simulating many paths of fractional Brownian motion, computing their iterated integrals and then averaging.

We need to set some parameters: we choose T = 1/4 as before and h = 11/24. We use 1,000 paths of fractional Brownian motion with Hurst parameter h = 11/24—these are exact simulations with discretization step 10⁻³—to compute the iterated integrals appearing in the Picard iteration and then average to approximate their expectations. We get the following formulas for the theoretical approximation of the first two moments of the response Y:

E(Y(3)^{(1)}_{0,1/4}) = 0.25a − 0.03125a² + 0.00260417a³
    + 0.00044726a²b − 0.000111815a³b + 4.97138×10⁻⁶a⁴b
    + 0.00116494a³b² − 0.000115953a⁴b² + 2.53676×10⁻⁶a⁴b³,

E(2Y(3)^{(1,1)}_{0,1/4}) = 0.0625a² − 0.015625a³ + 0.00227865a⁴
    − 0.00016276a⁵ + 6.78168×10⁻⁶a⁶
    + 0.00022363a³b − 0.0000838612a⁴b + 0.0000118036a⁵b − 8.93081×10⁻⁷a⁶b
    + 2.58926×10⁻⁸a⁷b + 0.000814373a⁴b² − 0.000246738a⁵b² + 0.000033084a⁶b²
    − 1.92279×10⁻⁶a⁷b² + 3.27969×10⁻⁸a⁸b² + 4.39419×10⁻⁶a⁵b³ − 1.24474×10⁻⁶a⁶b³
    + 1.21202×10⁻⁷a⁷b³ − 3.26456×10⁻⁹a⁸b³ + 5.74363×10⁻⁶a⁶b⁴ − 1.31226×10⁻⁶a⁷b⁴
    + 6.56898×10⁻⁸a⁸b⁴ + 1.3868×10⁻⁸a⁷b⁵ − 1.39803×10⁻⁹a⁸b⁵ + 8.47574×10⁻⁹a⁸b⁶.

We create the data by numerically simulating 2,000 paths of the solution of (5.2) for h = 11/24, a = 1 and b = 2 and discretization step δ = 10⁻³. We use a method proposed by Davie that is the equivalent of Milstein's method for differential equations driven by fractional Brownian motion (see [5] and references within). The error is of order δ^{3h−1}, which for our choices of discretization step δ and Hurst parameter h is 0.075.

Finally, we match the theoretical moments, which are polynomials in (a, b), with the empirical moments and solve the system. As in the diffusion case, we get two solutions, corresponding to b positive or negative. Since fractional Brownian motion is a mean zero Gaussian process, we cannot expect to identify the sign of b. We repeat the process 100 times to get 100 realizations of the estimates. These are shown in Figure 1 (left), after normalization.
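Exact simulation of fractional Brownian motion on a grid can be done by a Cholesky factorization of the covariance matrix given above (adequate for the grid sizes used here; circulant-embedding methods are faster for long paths). A sketch (ours):

```python
import numpy as np

def fbm_paths(h, T, n, N, rng):
    """N exact fBm paths with Hurst index h, sampled at n points of (0, T]."""
    t = np.linspace(T / n, T, n)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * h) + u**(2 * h) - np.abs(s - u)**(2 * h))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # tiny jitter for stability
    return L @ rng.standard_normal((n, N))           # each column is one path

rng = np.random.default_rng(4)
B = fbm_paths(11 / 24, 0.25, 250, 1000, rng)
print(B[-1].var())   # ~ T^(2h) = 0.25**(11/12) ~ 0.28
```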

5.3. Parameter estimation from one path. As described in Section 4, we can apply this method to a single stationary path. We consider the fractional Ornstein–Uhlenbeck process:

(5.3) dY_t = a(Y_t − b) dt + c dB^h_t, Y^{(1)}_0 = Y_0.

Through Picard iteration we compute the expansions of the first two moments. If B^h_t is a Brownian motion, we obtain the polynomials

E(Y_t) = Y_0 − abt − (1/2)a²bt² − (1/6)a³bt³ − (1/24)a⁴bt⁴ − (1/120)a⁵bt⁵ + ···,

E(Y_t²) = Y_0² + c²t + a²b²t² + ac²t² + a³b²t³ + (2/3)a²c²t³ + (7/12)a⁴b²t⁴ + ···.

If B^h_t is a fractional Brownian motion, the two moments also have an analytic expression.

We use Davie's method (see [5]) to numerically simulate one path of the solution of (5.3) for h = 11/24 and h = 1/2, a = −5, b = 2, c = 1 and discretization step δ = 10⁻². The error is of order δ^{3h−1}, which for our choices of discretization step δ and Hurst parameter h is 0.17.

Fig. 2. 500 realizations of the ESME for single paths. Left: from fractional O.U. paths (Hurst index = 11/24), right: from O.U. paths.

Using the simulated data, we apply the method described in Section 4 for T = 7 and S = 0.01 in order to estimate b and c. Figure 2 shows 500 realizations of the estimates of the mean b and volatility c.
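In its crudest form, the single-path idea can be seen in the ordinary Ornstein–Uhlenbeck case h = 1/2: for known a < 0, the stationary law of (5.3) is Gaussian with mean b and variance c²/(2|a|), so time averages of Y_t and Y_t² along one long path already determine b and c. A hedged sketch (ours; the paper matches the Picard expansions instead of the stationary law):

```python
import numpy as np

a, b, c, T, dt = -5.0, 2.0, 1.0, 7.0, 1e-3
n = int(T / dt)
rng = np.random.default_rng(5)

Y = np.empty(n + 1)
Y[0] = b + c / np.sqrt(2 * abs(a)) * rng.standard_normal()  # stationary start
for k in range(n):
    Y[k + 1] = (Y[k] + a * (Y[k] - b) * dt
                + c * np.sqrt(dt) * rng.standard_normal())

m1, m2 = Y.mean(), (Y**2).mean()     # time averages in place of E(Y), E(Y^2)
b_hat = m1
c_hat = np.sqrt(2 * abs(a) * (m2 - m1**2))
print(b_hat, c_hat)   # ~ 2 and ~ 1
```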

REFERENCES

[1] Aït-Sahalia, Y. and Mykland, P. A. (2008). An analysis of Hansen–Scheinkman moment estimators for discretely and randomly sampled diffusions. J. Econometrics 144 1–26. MR2439920
[2] Beskos, A., Papaspiliopoulos, O. and Roberts, G. (2009). Monte Carlo maximum likelihood estimation for discretely observed diffusion processes. Ann. Statist. 37 223–245. MR2488350
[3] Bishwal, J. P. N. (2008). Parameter Estimation in Stochastic Differential Equations. Lecture Notes in Math. 1923. Springer, Berlin. MR2360279
[4] Calderon, C. P. (2007). Fitting effective diffusion models to data associated with a "glassy" potential: Estimation, classical inference procedures, and some heuristics. Multiscale Model. Simul. 6 656–687 (electronic). MR2338498
[5] Deya, A., Neuenkirch, A. and Tindel, S. (2010). A Milstein-type scheme without Lévy area terms for SDEs driven by fractional Brownian motion. Available at arXiv:1001.3344.
[6] Friz, P. K. and Victoir, N. B. (2010). Multidimensional Stochastic Processes as Rough Paths: Theory and Applications. Cambridge Studies in Advanced Mathematics 120. Cambridge Univ. Press, Cambridge. MR2604669
[7] Gobet, E., Hoffmann, M. and Reiß, M. (2004). Nonparametric estimation of scalar diffusions based on low frequency data. Ann. Statist. 32 2223–2253. MR2102509
[8] Hairer, M. and Pillai, N. S. (2009). Ergodicity of hypoelliptic SDEs driven by fractional Brownian motion. Available at arXiv:0909.4505.
[9] Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50 1029–1054. MR0666123

[10] Hansen, L. P. and Scheinkman, J. A. (1995). Back to the future: Generating moment implications for continuous-time Markov processes. Econometrica 63 767–804. MR1343081
[11] Hu, Y. and Nualart, D. (2010). Parameter estimation for fractional Ornstein–Uhlenbeck processes. Statist. Probab. Lett. 80 1030–1038. MR2638974
[12] Hult, H. (2003). Approximating some Volterra type stochastic integrals with applications to parameter estimation. Stochastic Process. Appl. 105 1–32. MR1972287
[13] Kloeden, P. E. and Platen, E. (1992). Numerical Solution of Stochastic Differential Equations. Applications of Mathematics (New York) 23. Springer, Berlin. MR1214374
[14] Küchler, U. and Sørensen, M. (1997). Exponential Families of Stochastic Processes. Springer, New York. MR1458891
[15] Kutoyants, Y. A. (2004). Statistical Inference for Ergodic Diffusion Processes. Springer, London. MR2144185
[16] Ladroue, C. (2010). Expectation of Stratonovich iterated integrals of Wiener processes. Available at arXiv:1008.4033.
[17] Li, J., Kevrekidis, P. G., Gear, C. W. and Kevrekidis, I. G. (2007). Deciding the nature of the coarse equation through microscopic simulations: The baby-bathwater scheme. SIAM Rev. 49 469–487 (electronic). MR2353808
[18] Lyons, T. and Qian, Z. (2002). System Control and Rough Paths. Oxford Univ. Press, Oxford. MR2036784
[19] Lyons, T. J., Caruana, M. and Lévy, T. (2007). Differential Equations Driven by Rough Paths. Lecture Notes in Math. 1908. Springer, Berlin. MR2314753
[20] Neuenkirch, A., Nourdin, I. and Tindel, S. (2008). Delay equations driven by rough paths. Electron. J. Probab. 13 2031–2068. MR2453555
[21] Papavasiliou, A. (2010). Coarse-grained modelling of multiscale diffusions: The p-variation estimates. In Stochastic Analysis 2010 (D. Crisan, ed.). Springer, Berlin.
[22] Prakasa Rao, B. L. S. (1999). Statistical Inference for Diffusion Type Processes. Kendall's Library of Statistics 8. Oxford Univ. Press, New York. MR1717690
[23] Reiss, M. (2005). Adaptive estimation for affine stochastic delay differential equations. Bernoulli 11 67–102. MR2121456
[24] Tudor, C. A. and Viens, F. G. (2007). Statistical aspects of the fractional stochastic calculus. Ann. Statist. 35 1183–1212. MR2341703
[25] Unterberger, J. (2009). Stochastic calculus for fractional Brownian motion with Hurst exponent H > 1/4: A rough path method by analytic extension. Ann. Probab. 37 565–614. MR2510017

Department of Statistics
University of Warwick
CV4 7AL, Coventry
United Kingdom
and
Department of Applied Mathematics
University of Crete
P.O. Box 2208, 71409 Heraklion
Crete, Greece
E-mail: [email protected]

Bristol Genetic Epidemiology Laboratories
MRC Centre for Causal Analyses in Translational Epidemiology
School of Social and Community Medicine
University of Bristol
Bristol BS8 2BN
United Kingdom