Rough Paths Theory and its Application to Time Series Analysis of Financial Data Streams
Antti K. Vauhkonen
Christ Church
University of Oxford
A thesis submitted in partial fulfillment of the degree of Master of Science in Mathematical Finance
Trinity 2017

Abstract

The signature of a continuous multi-dimensional path of bounded variation, i.e. the sequence of its iterated integrals, is a central concept in the theory of rough paths. The key insight of this theory is that for any path of finite $p$-variation with $p \geq 1$ (e.g. sample paths of Brownian motion have finite $p$-variation for any $p > 2$ almost surely), one can define a construct analogous to signature, called its rough path lift, that incorporates all the information required for solving controlled differential equations driven by the given path. In the first part of this thesis we give an intuitive yet mathematically rigorous introduction to rough paths.

Information encoded in the signatures of multi-dimensional discrete data streams can also be utilised in their time series analysis, and in some recent publications signatures of financial data streams have been used as feature sets in linear regression for the purposes of classifying data and making statistical predictions. In the second part of this thesis we present a novel application of this signature-based approach in the context of a market model where every variable is assumed to follow a diffusion process that either has a constant underlying drift or reverts to some long-term mean. Specifically, we show that third order areas of financial data streams – special linear combinations of their fourth order iterated integrals – provide an efficient means of determining the parameters of a market variable given one of its realisations in a space of finitely many Brownian sample paths that can drive the process, and thus enable one to distinguish between the two fundamental modes of market behaviour, namely upward or downward trending versus mean-reverting. An interesting line of future research would be to investigate the possibility of using third order areas as a tool for decomposing arbitrary market paths into mean-reverting path components with a spectrum of mean reversion speeds.

To the memory of my mother.
Acknowledgements
I would like to express my gratitude to my academic supervisor Prof. Ben Hambly for his technical guidance, careful reading of my thesis and valuable comments. I also owe a big debt of gratitude to Dr. Daniel Jones for giving his time so generously, his wise counsel and constant encouragement and support without which this thesis would probably never have been completed. My sincere thanks are also due to my family for their help, support and understanding while working on this thesis over a period that at times must have seemed interminable. Lastly, with love and eternal gratitude I remember my late mother, my most steadfast supporter in all of my varied endeavours, who sadly didn’t live to see this project reach its conclusion.

Contents
1 Rough paths theory
  1.1 Origins of rough paths
  1.2 Formal definition of rough paths

2 Application of rough paths theory to time series analysis of financial data streams
  2.1 Classical time series analysis
  2.2 Signatures as feature sets for linear regression analysis
  2.3 Lead and lag transforms of data streams
    2.3.1 Gyurkó-Lyons-Kontkowski-Field method
    2.3.2 Flint-Hambly-Lyons method (Mark 1)
    2.3.3 Flint-Hambly-Lyons method (Mark 2)
  2.4 Area processes of multi-dimensional paths
    2.4.1 Definition and basic properties of areas
    2.4.2 Higher order areas
  2.5 Classification of paths using third order areas
    2.5.1 Diffusion process market model
    2.5.2 Areas for pairs of diffusion processes
    2.5.3 Classifying sample paths of diffusion processes by using third order areas
  2.6 Conclusion

References

Appendix 1: Quadratic variation and cross-variation of data streams

Appendix 2: Python code
List of Figures

1 GLKF method of lead-lag transforming data streams.
2 FHL (Mark 1) method of lead-lag transforming data streams.
3 FHL (Mark 2) method of lead-lag transforming data streams.
4 Area between path components Xi and Xj.
5 A typical 2-dimensional Brownian sample path.
6 Scatter plot of areas of two pairs of MR processes with different long-term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path.
7 Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes.
8 Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes all driven by the same Brownian path.
9 Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs.
10 Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes.
11 Scatter plots of terminal values of the areas of two pairs of MR processes all having the same mean reversion speed for 500 simulation runs with the pairwise correlation between Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively.

List of Tables

1 Determining the mean reversion speed of a given ‘market’ path by minimising its third order area with three test paths all driven by the same Brownian motion.

Chapter 1
Rough paths theory
1.1 Origins of rough paths
There is no more fundamental question in science than that pertaining to the nature of change. Since antiquity thinkers have pondered over problems concerning motion, as illustrated by the famous paradoxes of Zeno. In one of them, Zeno argued that a flying arrow occupies a particular position in space at any given instant of time, hence is instantaneously motionless, and since time consists of instants, he concluded that motion is just an illusion; and in another paradox the Greek hero Achilles was unable to overtake a tortoise in a race where the latter had been given a head start, for in order to accomplish this he would need to traverse an infinite number of (progressively shorter) distances, which, according to Zeno, is impossible in a finite amount of time. While the arrow paradox can be seen simply as an acute observation that motion has no meaning with respect to a single instant of time – in fact any set of instants which has zero measure – the notion of an infinite series that is convergent to a limit provides a satisfactory resolution to the Achilles and tortoise paradox: specifically, that a geometric series like $1/2 + 1/4 + 1/8 + 1/16 + \cdots$ that arises in the equivalent dichotomy paradox does not grow without limit but converges to 1, enabling Achilles to quickly pass the tortoise.

The concept of a limit of a function $f$ that expresses the dependence of a variable $y$ on another variable $x$ as $y = f(x)$ was similarly crucial to the development in the late 1600s of modern differential and integral calculus which provides proper analytical tools for the mathematical study of change. The chief among these is the derivative of a function, usually denoted by $f'(x)$, $\dot{f}(x)$ or $\frac{df(x)}{dx}$, which, as the last notation due to Leibniz suggests, was originally conceived as the quotient of an infinitesimally small change $df(x)$ in the value of the function $f(x)$ corresponding to an infinitesimally small change $dx$ in the value of its argument $x$, until derivatives were defined in a more rigorous way using the $(\varepsilon, \delta)$-definition of a limit in the early 19th century.

Rather than needing to differentiate given functions, one is often faced with the (usually harder) inverse problem of finding a function $F(x)$ whose derivative is a given function $f(x)$, i.e. solving the differential equation
$$\frac{dF(x)}{dx} = f(x). \tag{1.1}$$

By the fundamental theorem of calculus, such an antiderivative $F(x)$ of $f(x)$ is the same as an indefinite integral of $f(x)$, i.e.

$$F(x) = \int_a^x f(z)\,dz$$

for any constant $a < x$ in the domain of $f$ where it is continuous.

Differential equations first emerged in the context of dynamical systems as a way to implicitly describe their time evolution, and most fundamental laws in the mathematical sciences from fluid dynamics and electromagnetism to general relativity and quantum mechanics – and also mathematical finance – are expressed in terms of differential equations. For example, if in (1.1), relabelling the independent variable $t$ for time, $f(t)$ is a time-varying force acting on a body of mass $m$, then, according to Newton’s second law of motion, the momentum $mv(t) = m\frac{dx(t)}{dt}$ of the body, where $x(t)$ is its position at time $t$, is a solution of this differential equation. Indeed, this is the first type of differential equation Newton considered and solved using infinite series in his Methodus fluxionum et Serierum Infinitarum of 1671. The second type of differential equations that Newton studied in the same work are of the form

$$\frac{dy}{dx} = f(x, y),$$

and we will be especially interested in the special case where $f$ is a function of the unknown variable $y$ only, i.e.

$$\frac{dy}{dx} = f(y). \tag{1.2}$$

However, not all functions are differentiable. Up to the second half of the 19th century, it was a general belief among mathematicians that continuous functions had to be everywhere differentiable except at some isolated singular points, until the first examples of ‘pathological’ continuous functions that are nowhere differentiable were constructed by Riemann and Weierstrass. Actually, far from being pathological, such functions are in fact the norm rather than exceptions, for almost all continuous functions – viewed as sample paths of a Brownian motion – can be seen to be nowhere differentiable!

For non-differentiable functions, we would like to generalise (1.2) and write it (in the manner of Leibniz) in the following form:
$$dy = f(y)\,dx \tag{1.3}$$

– hoping that we can still make sense of it subject to some conditions! We can think of (1.3) as describing a dynamical system that evolves in such a way that the change in its state $y = y_t$ over an infinitesimally small time interval $[t, t + dt]$ is given by the product of its velocity in the current state, as specified by the vector field $f$ on the state space, and the corresponding increment in the control process $x_t$ driving the system. In general, the state space of a dynamical system is some manifold that is locally either a Euclidean space or a Banach space so that in these two cases (1.3) can be rewritten as

$$dy_t = \sum_{i=1}^{d} f_i(y_t)\,dx_t^i \tag{1.4}$$

where $y_t \in \mathbb{R}^e$, $f_i : \mathbb{R}^e \to \mathbb{R}^e$ and $x_t^i \in \mathbb{R}$ for $i = 1, \dots, d$, or

$$dy_t = f(y_t)\,dx_t \tag{1.5}$$

where $y_t \in U$, $x_t \in V$ and $f : U \to L(V, U)$ with $U$ and $V$ (possibly infinite-dimensional) Banach spaces. Equations of the type of (1.4) and (1.5) are called controlled differential equations.
Thus, assuming that initially at time 0 the system is in state $y_0$, solving for its state $y_t$ at an arbitrary time $t > 0$ involves iterating its equation of motion (1.5) an infinite number of times and integrating all the infinitesimally small local changes into a global change over the time interval $[0, t]$, so that

$$y_t = y_0 + \int_0^t f(y_u)\,dx_u. \tag{1.6}$$

Within the theory of rough paths, whose formal definition will be given in the next section, the function $I_f : (x_t, y_0) \mapsto y_t$ is called the Itô map associated with the vector field $f$.
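As a minimal numerical sketch of (1.6), assuming a smooth 1-dimensional driving path sampled on a uniform grid, the integral can be approximated by a left-point (Euler) scheme in which each increment of $y$ is $f(y)$ times the corresponding increment of $x$. The function name `euler_cde` and the test case $f(y) = y$, $x_t = \sin t$ (for which the solution is $y_t = y_0 \exp(x_t - x_0)$) are illustrative assumptions and are not taken from the thesis.

```python
import math
from typing import Callable, List


def euler_cde(f: Callable[[float], float], x: List[float], y0: float) -> List[float]:
    """Approximate y_t = y_0 + int_0^t f(y_u) dx_u along a sampled 1-d path x."""
    y = [y0]
    for k in range(1, len(x)):
        dx = x[k] - x[k - 1]              # increment of the control path
        y.append(y[-1] + f(y[-1]) * dx)   # left-point (Euler) update
    return y


# Illustrative driver (an assumption): x_t = sin(t) on [0, 2] and f(y) = y,
# so that the exact solution is y_t = y_0 * exp(x_t - x_0).
n = 20000
times = [2.0 * k / n for k in range(n + 1)]
x_path = [math.sin(t) for t in times]
y_path = euler_cde(lambda y: y, x_path, 1.0)
exact = math.exp(math.sin(2.0))
print(abs(y_path[-1] - exact))  # discretisation error at the terminal time
```

The scheme only ever sees the increments of $x$, which is why it is meaningful for a path of bounded variation; it says nothing about convergence for rougher drivers.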
In other words, in the language of differential geometry, for finding $y_t$ we should be able to integrate the one-form $f$ on $V$ with values in the linear space of vector fields on $U$ with respect to the path $x_t$ in $V$. As one might expect, this cannot be done in general without imposing some regularity conditions on $f$ and $x_t$. It is a classical result (the Picard-Lindelöf theorem, see [10, Theorem 1.3]) that if the vector field $f$ is Lipschitz continuous and the control process $x_t$ is a path of bounded variation in $V$, then, for any initial condition $y_0 \in U$, (1.5) has a unique solution given by (1.6) where the integral is defined as a Stieltjes integral. Under the weaker condition that $f$ is merely continuous, a solution is still guaranteed to exist by the Cauchy-Peano theorem (see [10, Theorem 1.4]), but it may not be unique.

But for less regular driving signals – e.g. sample paths of a Brownian motion – classical integration methods are known to fail. Let us see why this is the case by considering the formal solution of controlled differential equations using iteration.
For simplicity, we shall consider the 1-dimensional case where $x_t$, $y_t$ and $f$ all take values in $\mathbb{R}$. Hence, formally integrating (1.4) gives

$$\int_{u=s}^{u=t} dy_u = \int_{u=s}^{u=t} f(y_u)\,dx_u. \tag{1.7}$$

Under any reasonable definition of an integral, the left hand side of (1.7) must be equal to $\delta y_{s,t} := y_t - y_s$, so we have

$$\delta y_{s,t} = \int_{u=s}^{u=t} f(y_u)\,dx_u. \tag{1.8}$$
Further, expanding $f$ about $y_s$ in a formal Taylor series (effectively assuming that $f$ is an analytic function) on the right hand side of (1.8), then using (1.8) to substitute integral expressions for the increments $\delta y_{s,t}$ in the Taylor series expansion, and repeating the procedure yields after three iterations

$$\begin{aligned}
\delta y_{s,t} ={}& f(y_s) \int_{u=s}^{u=t} dx_u \\
&+ f'(y_s) f(y_s) \int_{u=s}^{u=t} \int_{v=s}^{v=u} dx_v\,dx_u \\
&+ f'(y_s)^2 f(y_s) \int_{u=s}^{u=t} \int_{v=s}^{v=u} \int_{w=s}^{w=v} dx_w\,dx_v\,dx_u \\
&+ \frac{1}{2} f''(y_s) f(y_s)^2 \int_{u=s}^{u=t} \int_{v=s}^{v=u} \int_{w=s}^{w=u} dx_v\,dx_w\,dx_u + \dots
\end{aligned} \tag{1.9}$$

where the remaining terms (not shown above) all involve fourth or higher order iterated integrals. Moreover, provided that the above integrals satisfy the usual integration by parts formula, the integral in the last term can be written as

$$\int_{u=s}^{u=t} \int_{v=s}^{v=u} \int_{w=s}^{w=u} dx_v\,dx_w\,dx_u = 2 \int_{u=s}^{u=t} \int_{v=s}^{v=u} \int_{w=s}^{w=v} dx_w\,dx_v\,dx_u.$$

In the general multi-dimensional case, an expression analogous to (1.9) can be just as easily derived for each component $y_t^j$ of $y_t$ for $j = 1, \dots, e$. For each integer $n \geq 1$, let us formally define $n$th order componentwise iterated integrals of a path $x_t$ in $\mathbb{R}^d$ over the time interval $[s, t]$ by
$$x_{s,t}^{i_1,\dots,i_n} := \int_{u_n=s}^{u_n=t} \cdots \int_{u_1=s}^{u_1=u_2} dx_{u_1}^{i_1} \cdots dx_{u_n}^{i_n} \tag{1.10}$$

for $i_1, \dots, i_n \in \{1, \dots, d\}$. In particular, $x_{s,t}^i = x_t^i - x_s^i$ for $i = 1, \dots, d$, so the first order iterated integrals of $x_t \in \mathbb{R}^d$ are just its componentwise linear increments over $[s, t]$. Then, for each $j \in \{1, \dots, e\}$, we have

$$y_t^j = y_s^j + \sum_{n=1}^{\infty} \sum_{i_1,\dots,i_n \in \{1,\dots,d\}} F_{i_1,\dots,i_n}^j(y_s)\, x_{s,t}^{i_1,\dots,i_n} \tag{1.11}$$

where the functions $F_{i_1,\dots,i_n}^j : \mathbb{R}^e \to \mathbb{R}$ are products of partial derivatives of components of the vector fields $f_i^j : \mathbb{R}^e \to \mathbb{R}$ evaluated at $y_s$ for $i = 1, \dots, d$ and $j = 1, \dots, e$, as in (1.9).

As illustrated by (1.11), the importance of iterated integrals for solving controlled differential equations is due to the fact that the local behaviour of the solution is controlled by the sequence of iterated integrals of the path driving the equation – assuming that the series in (1.11) converges and a solution does indeed exist. However, this is by no means always the case. For we need to remind ourselves that the solution in (1.11) was derived under the strongest possible condition on the vector field $f$ (namely analyticity), and, moreover, we didn’t specify how the iterated integrals in (1.10) should be constructed, but rather tacitly assumed that they can be canonically defined as limits of Riemann sums even though we also didn’t impose any condition on the regularity of the path $x_t$. To advance beyond the classical Picard-Lindelöf and Cauchy-Peano theorems, one would like to be able to solve (1.5) for vector fields which satisfy some mildly stronger form of continuity than plain continuity, and for paths
which are not of bounded variation but whose irregularity – colloquially roughness – is nevertheless controlled. For this purpose we introduce the concept of $p$-variation of a path.

For a closed bounded time interval $[0, T]$, a subdivision $D$ of $[0, T]$ will be taken to mean a finite ordered set of real numbers $(t_0, t_1, \dots, t_k)$ such that $0 = t_0 < t_1 < \cdots < t_k = T$, denoting the set of all subdivisions of $[0, T]$ by $\mathcal{D}([0, T])$. Then we make the following

Definition 1.1. Let $x : [0, T] \to \mathbb{R}^d$ be a continuous function. Then, for any real number $p \geq 1$, the $p$-variation of $x$ on $[0, T]$ is defined by

$$\|x\|_{p,[0,T]} = \sup_{D \in \mathcal{D}([0,T])} \left( \sum_{h=1}^{k} |x_{t_h} - x_{t_{h-1}}|^p \right)^{1/p}$$

where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^d$.
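For a path known only at finitely many sample points, Definition 1.1 can be sketched numerically by restricting the supremum to subdivisions whose points lie on the sampling grid, which turns it into a small dynamic programme. The function name `p_variation` and the two test paths are illustrative assumptions; the quadratic-time recursion below is one straightforward way to organise the search, not a method taken from the thesis.

```python
import math
from typing import List, Sequence


def p_variation(path: Sequence[Sequence[float]], p: float) -> float:
    """p-variation of a sampled path, with the sup taken over sub-grids of the samples."""
    m = len(path)
    # best[j]: largest value of sum |increment|^p over subdivisions from index 0 to j
    best: List[float] = [0.0] * m
    for j in range(1, m):
        best[j] = max(
            best[i] + math.dist(path[j], path[i]) ** p for i in range(j)
        )
    return best[-1] ** (1.0 / p)


# Two illustrative paths in R^1 (assumptions made for the example).
zigzag = [(0.0,), (1.0,), (0.0,), (1.0,)]   # 0 -> 1 -> 0 -> 1
monotone = [(0.0,), (1.0,), (2.0,)]         # 0 -> 1 -> 2

print(p_variation(zigzag, 1.0))    # total variation of the zig-zag
print(p_variation(zigzag, 2.0))    # finest subdivision is optimal here
print(p_variation(monotone, 2.0))  # coarsest subdivision is optimal here
```

The monotone example shows why the definition uses a supremum over all subdivisions: for $p > 1$ the finest subdivision need not give the largest sum.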
The concept of $p$-variation can also be straightforwardly extended to paths $x_t$ that take values in an arbitrary Banach space $V$ by replacing the Euclidean norm with a norm $\|\cdot\|_V$ on $V$ in the above definition. Up to reparameterisation, saying that a path has finite $p$-variation is equivalent to saying that it is Hölder continuous with exponent $\frac{1}{p}$. Of course, paths with finite 1-variation are just paths of bounded variation. It is worth emphasizing that the $p$-variation of a path is defined by taking the supremum over all the subdivisions of the time interval, not as a limit as the mesh of the subdivision tends to zero, as there are paths of finite (non-zero) $p$-variation with $p > 1$ for which the latter is zero. One should also note that if a path has finite $p$-variation, then it also has finite $q$-variation for all $q > p$.

As a major advance to the classical theory of integration, L. C. Young discovered in 1936 (see [13]) that Stieltjes integrals can also be defined for paths which are of unbounded variation but have finite $p$-variation for some $p > 1$ as long as the integrand is a continuous function of finite $q$-variation such that $\frac{1}{p} + \frac{1}{q} > 1$. This result allows (1.5) to be solved for paths of finite $p$-variation with $1 \leq p < 2$ provided that the vector field $f$ is Lipschitz-$\gamma$ continuous with $p - 1 < \gamma \leq 1$, and, subject to these conditions, the Young integral, as a function of $t$, also has finite $p$-variation.

However, even with Young’s extension of the classical theory, integration against sample paths of stochastic processes remained tantalisingly out of reach for a long time, as many important classes of stochastic processes have finite $p$-variation with $p \in [2, 3)$. In particular, almost all Brownian paths have infinite 2-variation and finite $p$-variation for all $p > 2$ on any finite time interval – which is not to be confused with the fundamental fact that the quadratic variation process of a Brownian motion
$(B_t)_{t \geq 0}$ is deterministic, finite and equal to $t$ – and sample paths of semi-martingales almost surely have finite $p$-variation for $p \in (2, 3)$.

It wasn’t until 1945 that integrals of some tractable stochastic processes against Brownian motion were successfully defined when K. Itô published his construction of what is now called, in his honour, the Itô integral, which has subsequently been extended to other martingales and, further, semi-martingales as integrators. Essentially the Itô integral is a Riemann-Stieltjes type of stochastic integral in that it is defined as the limit of a sequence of Riemann sums of random variables that converges in probability.

Unfortunately, such limits do not usually exist in a pathwise sense. This perhaps isn’t all that surprising: as a process, Brownian motion has exceedingly nice properties (it is a Gaussian process with independent and stationary increments), whereas its sample paths are very rough, being (almost surely) nowhere differentiable and having unbounded variation on any time interval (no matter how small). So, in view of this, while Itô’s theory of stochastic integration ranks among the principal achievements of 20th century mathematics, developing a theory of integration for Brownian motion paths would appear, on the face of it, an even more challenging task, although some early results in this direction – notably the construction of the Lévy area of a 2-dimensional Brownian path, defined as the difference of two second order iterated integrals – had been established even before the invention of stochastic integrals.

Let us now examine in some detail, albeit somewhat heuristically, what goes wrong when one tries to define iterated integrals of Brownian paths as classical Riemann integrals, as this will give us important clues as to how one should formally define rough paths. But first, as a precursor, let us briefly return to the construction of iterated integrals for more regular paths.
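The roughness of Brownian sample paths described above can be illustrated with a small simulation, offered here as a hedged sketch rather than as part of the thesis. It assumes Brownian increments over a uniform grid of $[0, 1]$ are simulated as independent Gaussians with variance equal to the step size; the helper name `brownian_increments`, the seed and the grid sizes are arbitrary choices, and each grid size uses a freshly simulated path rather than a refinement of one fixed path. The sum of squared increments along a fixed grid approximates the quadratic variation, which is a different object from the 2-variation of Definition 1.1.

```python
import math
import random
from typing import List


def brownian_increments(n: int, horizon: float, rng: random.Random) -> List[float]:
    """Independent N(0, horizon/n) increments of a simulated Brownian path."""
    sd = math.sqrt(horizon / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
for n in (1_000, 10_000, 100_000):
    dB = brownian_increments(n, 1.0, rng)
    abs_sum = sum(abs(d) for d in dB)   # grows roughly like sqrt(n)
    sq_sum = sum(d * d for d in dB)     # stays close to the horizon t = 1
    print(n, round(abs_sum, 2), round(sq_sum, 4))
```

The growing sum of absolute increments reflects unbounded variation, while the stable sum of squares reflects the deterministic quadratic variation $t$.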
For a continuous path $x_t = (x_t^1, \dots, x_t^d) \in \mathbb{R}^d$ on a time interval $[0, T]$ all of whose components are differentiable functions of $t$, we can define its $n$th iterated integrals $x_{s,t}^{i_1,\dots,i_n}$ over a subinterval $[s, t]$ for any $n \geq 1$ as limits of the sequences of Riemann sums

$$S_{s,t}^n(N) = \sum_{i_n=1}^{N} \sum_{i_{n-1}=1}^{i_n} \cdots \sum_{i_1=1}^{i_2} \left( x_{t_{i_1}}^{i_1} - x_{t_{(i_1-1)}}^{i_1} \right) \cdots \left( x_{t_{i_{n-1}}}^{i_{n-1}} - x_{t_{(i_{n-1}-1)}}^{i_{n-1}} \right) \left( x_{t_{i_n}}^{i_n} - x_{t_{(i_n-1)}}^{i_n} \right) \tag{1.12}$$

where $t_{i_k} - t_{(i_k-1)} = (t - s)/N$ for $k = 1, \dots, n$, so that

$$x_{s,t}^{i_1,\dots,i_n} = \lim_{N \to \infty} S_{s,t}^n(N). \tag{1.13}$$
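For $n = 2$ the Riemann sum (1.12) can be evaluated in a single pass by keeping a running total of the inner sum. The sketch below does this for the illustrative smooth path $x_t = (t, t^2)$ on $[0, 1]$, whose second order iterated integrals are $x^{1,2}_{0,1} = 2/3$ and $x^{2,1}_{0,1} = 1/3$; the function name `second_order_riemann_sum`, the test path and the use of 0-based component indices in the code are assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple


def second_order_riemann_sum(
    path: Callable[[float], Sequence[float]],
    a: int, b: int, s: float, t: float, n_steps: int,
) -> float:
    """Riemann sum (1.12) with n = 2 for components (a, b) on a uniform grid of [s, t]."""
    h = (t - s) / n_steps
    prev = path(s)
    inner = 0.0   # running inner sum of increments of component a (i_1 <= i_2)
    total = 0.0
    for k in range(1, n_steps + 1):
        cur = path(s + k * h)
        inner += cur[a] - prev[a]
        total += inner * (cur[b] - prev[b])
        prev = cur
    return total


def curve(u: float) -> Tuple[float, float]:
    """Illustrative smooth path x_t = (t, t^2)."""
    return (u, u * u)


s12 = second_order_riemann_sum(curve, 0, 1, 0.0, 1.0, 2000)
s21 = second_order_riemann_sum(curve, 1, 0, 0.0, 1.0, 2000)
print(s12, s21, s12 + s21)
```

In the limit the two values sum to the product of the first order increments, $x^1_{0,1} x^2_{0,1} = 1$, which is the integration by parts formula for this path.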
Assuming that $(t - s) \ll 1$, we have, by Taylor’s theorem, that

$$x_{t_{i_k}}^{i_k} - x_{t_{(i_k-1)}}^{i_k} = \dot{x}^{i_k}(t_{(i_k-1)})\,(t - s)/N + o\big((t - s)/N\big),$$