Rough Paths Theory and its Application to Time Series Analysis of Financial Data Streams

Antti K. Vauhkonen
Christ Church, University of Oxford

A thesis submitted in partial fulfillment of the degree of Master of Science in Mathematical Finance

Trinity 2017

Abstract

The signature of a continuous multi-dimensional path of bounded variation, i.e. the sequence of its iterated integrals, is a central concept in the theory of rough paths. The key insight of this theory is that for any path of finite p-variation with p ≥ 1 (e.g. sample paths of Brownian motion have finite p-variation for any p > 2 almost surely), one can define a construct analogous to the signature, called its rough path lift, that incorporates all the information required for solving controlled differential equations driven by the given path. In the first part of this thesis we give an intuitive yet mathematically rigorous introduction to rough paths. Information encoded in the signatures of multi-dimensional discrete data streams can also be utilised in their time series analysis, and in some recent publications signatures of financial data streams have been used as feature sets in linear regression for the purposes of classifying data and making statistical predictions. In the second part of this thesis we present a novel application of this signature-based approach in the context of a market model where every variable is assumed to follow a diffusion process that either has a constant underlying drift or reverts to some long-term mean. Specifically, we show that third order areas of financial data streams – special linear combinations of their fourth order iterated integrals – provide an efficient means of determining the parameters of a market variable given one of its realisations in a space of finitely many Brownian sample paths that can drive the process, and thus enable one to distinguish between the two fundamental modes of market behaviour, namely upward or downward trending versus mean-reverting. An interesting line of future research would be to investigate the possibility of using third order areas as a tool for decomposing arbitrary market paths into mean-reverting path components with a spectrum of mean reversion speeds.

To the memory of my mother.

Acknowledgements

I would like to express my gratitude to my academic supervisor Prof. Ben Hambly for his technical guidance, careful reading of my thesis and valuable comments. I also owe a big debt of gratitude to Dr. Daniel Jones for giving his time so generously, his wise counsel and constant encouragement and support, without which this thesis would probably never have been completed. My sincere thanks are also due to my family for their help, support and understanding while I was working on this thesis over a period that at times must have seemed interminable. Lastly, with love and eternal gratitude I remember my late mother, my most steadfast supporter in all of my varied endeavours, who sadly didn't live to see this project reach its conclusion.

Contents

1 Rough paths theory
  1.1 Origins of rough paths
  1.2 Formal definition of rough paths

2 Application of rough paths theory to time series analysis of financial data streams
  2.1 Classical time series analysis
  2.2 Signatures as feature sets for linear regression analysis
  2.3 Lead and lag transforms of data streams
    2.3.1 Gyurkó-Lyons-Kontkowski-Field method
    2.3.2 Flint-Hambly-Lyons method (Mark 1)
    2.3.3 Flint-Hambly-Lyons method (Mark 2)
  2.4 Area processes of multi-dimensional paths
    2.4.1 Definition and basic properties of areas
    2.4.2 Higher order areas
  2.5 Classification of paths using third order areas
    2.5.1 Diffusion process market model
    2.5.2 Areas for pairs of diffusion processes
    2.5.3 Classifying sample paths of diffusion processes by using third order areas
  2.6 Conclusion

References

Appendix 1: Quadratic variation and cross-variation of data streams

Appendix 2: Python code

List of Figures

1 GLKF method of lead-lag transforming data streams.
2 FHL (Mark 1) method of lead-lag transforming data streams.
3 FHL (Mark 2) method of lead-lag transforming data streams.
4 Area between path components $X^i$ and $X^j$.
5 A typical 2-dimensional Brownian sample path.
6 Scatter plot of areas of two pairs of MR processes with different long-term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path.
7 Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes.
8 Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes, all driven by the same Brownian path.
9 Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs.
10 Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes.
11 Scatter plots of terminal values of the areas of two pairs of MR processes, all having the same mean reversion speed, for 500 simulation runs with the pairwise correlation between the Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively.

List of Tables

1 Determining the mean reversion speed of a given 'market' path by minimising its third order area with three test paths all driven by the same Brownian motion.

Chapter 1

Rough paths theory

1.1 Origins of rough paths

There is no more fundamental question in science than that pertaining to the nature of change. Since antiquity thinkers have pondered over problems concerning motion, as illustrated by the famous paradoxes of Zeno. In one of them, Zeno argued that a flying arrow occupies a particular position in space at any given instant of time, hence is instantaneously motionless, and since time consists of instants, he concluded that motion is just an illusion; and in another paradox the Greek hero Achilles was unable to overtake a tortoise in a race where the latter had been given a head start, for in order to accomplish this he would need to traverse an infinite number of (progressively shorter) distances, which, according to Zeno, is impossible in a finite amount of time. While the arrow paradox can be seen simply as an acute observation that motion has no meaning with respect to a single instant of time – in fact any set of instants which has zero measure – the notion of an infinite series that is convergent to a limit provides a satisfactory resolution to the Achilles and tortoise paradox: specifically, that a geometric series like $1/2 + 1/4 + 1/8 + 1/16 + \cdots$ that arises in the equivalent dichotomy paradox does not grow without limit but converges to 1, enabling Achilles to quickly pass the tortoise.

The concept of a limit of a function f that expresses the dependence of a variable y on another variable x as y = f(x) was similarly crucial to the development in the late 1600s of modern differential and integral calculus, which provides proper analytical tools for the mathematical study of change. The chief among these is the derivative of a function, usually denoted by $f'(x)$, $\dot{f}(x)$ or $\frac{df(x)}{dx}$, which, as the last notation due to Leibniz suggests, was originally conceived as the quotient of an infinitesimally small change $df(x)$ in the value of the function $f(x)$ corresponding to an infinitesimally

small change $dx$ in the value of its argument x, until derivatives were defined in a more rigorous way using the $(\varepsilon, \delta)$-definition of a limit in the early 19th century. Rather than needing to differentiate given functions, one is often faced with the (usually harder) inverse problem of finding a function F(x) whose derivative is a given function f(x), i.e. solving the differential equation

$$\frac{dF(x)}{dx} = f(x). \qquad (1.1)$$

By the fundamental theorem of calculus, such an antiderivative F(x) of f(x) is the same as an indefinite integral of f(x), i.e.

$$F(x) = \int_a^x f(z)\,dz$$

for any constant a < x in the domain of f where it is continuous.

Differential equations first emerged in the context of dynamical systems as a way to implicitly describe their time evolution, and most fundamental laws in the mathematical sciences from fluid dynamics and electromagnetism to general relativity and quantum mechanics – and also mathematical finance – are expressed in terms of differential equations. For example, if in (1.1), relabelling the independent variable t for time, f(t) is a time-varying force acting on a body of mass m, then, according to Newton's second law of motion, the momentum $mv(t) = m\frac{dx(t)}{dt}$ of the body, where x(t) is its position at time t, is a solution of this differential equation. Indeed, this is the first type of differential equation Newton considered and solved using infinite series in his Methodus Fluxionum et Serierum Infinitarum of 1671. The second type of differential equations that Newton studied in the same work are of the form

$$\frac{dy}{dx} = f(x, y),$$

and we will be especially interested in the special case where f is a function of the unknown variable y only, i.e.

$$\frac{dy}{dx} = f(y). \qquad (1.2)$$

However, not all functions are differentiable. Up to the second half of the 19th century, it was a general belief among mathematicians that continuous functions had to be everywhere differentiable except at some isolated singular points, until the first examples of 'pathological' continuous functions that are nowhere differentiable were constructed by Riemann and Weierstrass. Actually, far from being pathological, such functions are in fact the norm rather than exceptions, for almost all continuous

functions – viewed as sample paths of a Brownian motion – can be seen to be nowhere differentiable! For non-differentiable functions, we would like to generalise (1.2) and write it (in a manner of Leibniz) in the following form:

$$dy = f(y)\,dx \qquad (1.3)$$

– hoping that we can still make sense of it subject to some conditions! We can think of (1.3) as describing a dynamical system that evolves in such a way that the change in its state $y = y_t$ over an infinitesimally small time interval $[t, t + dt]$ is given by the product of its velocity in the current state, as specified by the vector field f on the state space, and the corresponding increment in the control process $x_t$ driving the system.

In general, the state space of a dynamical system is some manifold that is locally either a Euclidean space or a Banach space, so that in these two cases (1.3) can be rewritten as

$$dy_t = \sum_{i=1}^{d} f_i(y_t)\,dx_t^i \qquad (1.4)$$

where $y_t \in \mathbb{R}^e$, $f_i : \mathbb{R}^e \to \mathbb{R}^e$ and $x_t^i \in \mathbb{R}$ for $i = 1, \dots, d$, or

$$dy_t = f(y_t)\,dx_t \qquad (1.5)$$

where $y_t \in U$, $x_t \in V$ and $f : U \to L(V, U)$, with U and V (possibly infinite-dimensional) Banach spaces. Equations of the type of (1.4) and (1.5) are called controlled differential equations.

Thus, assuming that initially at time 0 the system is in state $y_0$, solving for its state $y_t$ at an arbitrary time t > 0 involves iterating its equation of motion (1.5) an infinite number of times and integrating all the infinitesimally small local changes into a global change over the time interval $[0, t]$, so that

$$y_t = y_0 + \int_0^t f(y_u)\,dx_u. \qquad (1.6)$$

Within the theory of rough paths, whose formal definition will be given in the next section, the function $I_f : (x_t, y_0) \mapsto y_t$ is called the Itô map associated with the vector field f.
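As a concrete illustration of (1.6), the simplest numerical approximation replaces the integral by a first-order Euler sum over a partition of $[0, t]$. The following is a minimal sketch (our own illustration, not part of the original text) for a scalar control path; it assumes the control is regular enough for the classical theory to apply.

import numpy as np

def euler_cde(f, y0, x):
    """First-order Euler scheme for dy = f(y) dx driven by a sampled
    control path x (a 1-d array of values on a partition)."""
    y = np.empty_like(x, dtype=float)
    y[0] = y0
    for k in range(len(x) - 1):
        # local change = velocity at current state times control increment
        y[k + 1] = y[k] + f(y[k]) * (x[k + 1] - x[k])
    return y

# Sanity check: dy = y dx with x_t = t has solution y_t = y_0 e^t
t = np.linspace(0.0, 1.0, 1001)
y = euler_cde(lambda y: y, 1.0, t)
assert abs(y[-1] - np.e) < 2e-3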

In other words, in the language of differential geometry, for finding $y_t$ we should be able to integrate the one-form f on V, with values in the linear space of vector fields on U, with respect to the path $x_t$ in V. As one might expect, this cannot be done

in general without imposing some regularity conditions on f and $x_t$. It is a classical result (the Picard-Lindelöf theorem, see [10, Theorem 1.3]) that if the vector field f is

Lipschitz continuous and the control process $x_t$ is a path of bounded variation in V, then, for any initial condition $y_0 \in U$, (1.5) has a unique solution given by (1.6) where the integral is defined as a Stieltjes integral. Under the weaker condition that f is merely continuous, a solution is still guaranteed to exist by the Cauchy-Peano theorem (see [10, Theorem 1.4]), but it may not be unique. But for less regular driving signals – e.g. sample paths of a Brownian motion – classical integration methods are known to fail. Let us see why this is the case by considering the formal solution of controlled differential equations using iteration.

For simplicity, we shall consider the 1-dimensional case where $x_t$, $y_t$ and f all take values in $\mathbb{R}$. Hence, formally integrating (1.4) gives

$$\int_{u=s}^{u=t} dy_u = \int_{u=s}^{u=t} f(y_u)\,dx_u. \qquad (1.7)$$

Under any reasonable definition of an integral, the left hand side of (1.7) must be equal to $\delta y_{s,t} := y_t - y_s$, so we have

$$\delta y_{s,t} = \int_{u=s}^{u=t} f(y_u)\,dx_u. \qquad (1.8)$$

Further, expanding f about $y_s$ in a formal Taylor series (effectively assuming that f is an analytic function) on the right hand side of (1.8), then using (1.8) to substitute integral expressions for the increments $\delta y_{s,t}$ in the Taylor series expansion, and repeating the procedure, yields after three iterations

$$\begin{aligned}
\delta y_{s,t} = {} & f(y_s) \int_{u=s}^{u=t} dx_u \\
& + f'(y_s) f(y_s) \int_{u=s}^{u=t} \int_{v=s}^{v=u} dx_v\, dx_u \\
& + f'(y_s)^2 f(y_s) \int_{u=s}^{u=t} \int_{v=s}^{v=u} \int_{w=s}^{w=v} dx_w\, dx_v\, dx_u \\
& + \frac{1}{2} f''(y_s) f(y_s)^2 \int_{u=s}^{u=t} \left( \int_{v=s}^{v=u} dx_v \right) \left( \int_{w=s}^{w=u} dx_w \right) dx_u + \dots
\end{aligned} \qquad (1.9)$$

where the remaining terms (not shown above) all involve fourth or higher order iterated integrals. Moreover, provided that the above integrals satisfy the usual integration by parts formula, the integral in the last term can be written as

$$\int_{u=s}^{u=t} \left( \int_{v=s}^{v=u} dx_v \right) \left( \int_{w=s}^{w=u} dx_w \right) dx_u = 2 \int_{u=s}^{u=t} \int_{v=s}^{v=u} \int_{w=s}^{w=v} dx_w\, dx_v\, dx_u.$$

In the general multi-dimensional case, an expression analogous to (1.9) can be just as easily derived for each component $y_t^j$ of $y_t$ for $j = 1, \dots, e$. For each integer $n \ge 1$, let us formally define the $n$th order componentwise iterated integrals of a path $x_t$ in $\mathbb{R}^d$ over the time interval $[s, t]$ by

$$x_{s,t}^{i_1, \dots, i_n} := \int_{u_n=s}^{u_n=t} \cdots \int_{u_1=s}^{u_1=u_2} dx_{u_1}^{i_1} \dots dx_{u_n}^{i_n} \qquad (1.10)$$

for $i_1, \dots, i_n \in \{1, \dots, d\}$. In particular, $x_{s,t}^i = x_t^i - x_s^i$ for $i = 1, \dots, d$, so the first order iterated integrals of $x_t \in \mathbb{R}^d$ are just its componentwise linear increments over $[s, t]$. Then, for each $j \in \{1, \dots, e\}$, we have

$$y_t^j = y_s^j + \sum_{n=1}^{\infty} \sum_{i_1, \dots, i_n \in \{1, \dots, d\}} F_{i_1, \dots, i_n}^j(y_s)\, x_{s,t}^{i_1, \dots, i_n} \qquad (1.11)$$

where the functions $F_{i_1, \dots, i_n}^j : \mathbb{R}^e \to \mathbb{R}$ are products of partial derivatives of the components $f_i^j : \mathbb{R}^e \to \mathbb{R}$ of the vector fields $f_i$ evaluated at $y_s$, for $i = 1, \dots, d$ and $j = 1, \dots, e$, as in (1.9).

As illustrated by (1.11), the importance of iterated integrals for solving controlled differential equations is due to the fact that the local behaviour of the solution is controlled by the sequence of iterated integrals of the path driving the equation – assuming that the series in (1.11) converges and a solution does indeed exist. However, this is by no means always the case. For we need to remind ourselves that the solution in (1.11) was derived under the strongest possible condition on the vector field f (namely analyticity), and, moreover, we didn't specify how the iterated integrals in (1.10) should be constructed, but rather tacitly assumed that they can be canonically defined as limits of Riemann sums, even though we also didn't impose any condition on the regularity of the path $x_t$. To advance beyond the classical Picard-Lindelöf and Cauchy-Peano theorems, one would like to be able to solve (1.5) for vector fields which satisfy some mildly stronger form of continuity than plain continuity, and for paths

which are not of bounded variation but whose irregularity – colloquially roughness – is nevertheless controlled.

For this purpose we introduce the concept of p-variation of a path. For a closed bounded time interval $[0, T]$, a subdivision D of $[0, T]$ will be taken to mean a finite ordered set of real numbers $(t_0, t_1, \dots, t_k)$ such that $0 = t_0 < t_1 < \cdots < t_k = T$, denoting the set of all subdivisions of $[0, T]$ by $\mathcal{D}([0, T])$. Then we make the following

Definition 1.1. Let $x : [0, T] \to \mathbb{R}^d$ be a continuous function. Then, for any real number $p \ge 1$, the p-variation of x on $[0, T]$ is defined by

$$\|x\|_{p,[0,T]} = \sup_{D \in \mathcal{D}([0,T])} \left( \sum_{h=1}^{k} |x_{t_h} - x_{t_{h-1}}|^p \right)^{1/p}$$

where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^d$.
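For a discretely sampled path, the supremum in Definition 1.1 over subdivisions through the sample points can be computed exactly by dynamic programming. A minimal sketch (our own illustration, not from the thesis appendix):

import numpy as np

def p_variation(x, p):
    """p-variation of a sampled path x (shape (N+1, d)), taking the
    supremum over all subdivisions through the sample points via
    O(N^2) dynamic programming."""
    n = len(x)
    best = np.zeros(n)  # best[j]: max sum of |increment|^p over subdivisions ending at j
    for j in range(1, n):
        steps = np.linalg.norm(x[j] - x[:j], axis=1) ** p
        best[j] = np.max(best[:j] + steps)
    return best[-1] ** (1.0 / p)

# A scaled random walk (a crude Brownian proxy): its 1-variation grows
# with the sampling frequency while its 2-variation stays bounded.
rng = np.random.default_rng(0)
w = np.cumsum(rng.normal(0.0, 0.02, size=(2001, 1)), axis=0)
print(p_variation(w, 1.0), p_variation(w, 2.0))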

The concept of p-variation can also be straightforwardly extended to paths $x_t$ that take values in an arbitrary Banach space V by replacing the Euclidean norm with a norm $\|\cdot\|_V$ on V in the above definition. Up to reparameterisation, a path having finite p-variation is equivalent to saying that it is Hölder continuous with exponent $\frac{1}{p}$. Of course, paths with finite 1-variation are just paths of bounded variation. It is worth emphasizing that the p-variation of a path is defined by taking the supremum over all the subdivisions of the time interval, not as a limit as the mesh of the subdivision tends to zero, as there are paths of finite (non-zero) p-variation with p > 1 for which the latter is zero. One should also note that if a path has finite p-variation, then it also has finite q-variation for all q > p.

As a major advance to the classical theory of integration, L. C. Young discovered in 1936 (see [13]) that Stieltjes integrals can also be defined for paths which are of unbounded variation but have finite p-variation for some p > 1, as long as the integrand is a continuous function of finite q-variation such that $\frac{1}{p} + \frac{1}{q} > 1$. This result allows (1.5) to be solved for paths of finite p-variation with $1 \le p < 2$ provided that the vector field f is Lipschitz-γ continuous with $p - 1 < \gamma \le 1$, and, subject to these conditions, the Young integral, as a function of t, also has finite p-variation.

However, even with Young's extension of the classical theory, integration against sample paths of stochastic processes remained tantalisingly out of reach for a long time, as many important classes of stochastic processes have finite p-variation with $p \in [2, 3)$. In particular, almost all Brownian paths have infinite 2-variation and finite p-variation for all p > 2 on any finite time interval – which is not to be confused with the fundamental fact that the quadratic variation process of a Brownian motion

$(B_t)_{t \ge 0}$ is deterministic, finite and equal to t – and sample paths of semi-martingales almost surely have finite p-variation for $p \in (2, 3)$.

It wasn't until 1945 that integrals of some tractable stochastic processes against Brownian motion were successfully defined, when K. Itô published his construction of what is now called, in his honour, the Itô integral, which has subsequently been extended to other martingales and, further, semi-martingales as integrators. Essentially the Itô integral is a Riemann-Stieltjes type of stochastic integral in that it is defined as the limit of a sequence of Riemann sums of random variables that converges in probability.

Unfortunately, such limits do not usually exist in a pathwise sense – which perhaps isn't all that surprising considering that Brownian motion has exceedingly nice properties as a process – it is a Gaussian process with independent and stationary increments – whereas its sample paths are very rough, being (almost surely) nowhere differentiable and having unbounded variation on any time interval (no matter how small). So, in view of this, while Itô's theory of stochastic integration ranks among the principal achievements of 20th century mathematics, developing a pathwise theory of integration for Brownian motion paths would appear, on the face of it, an even more challenging task, although some early results in this direction – notably the construction of the Lévy area of a 2-dimensional Brownian path, defined as the difference of two second order iterated integrals – had been established even before the invention of stochastic integrals.

Let us now examine in some detail, albeit somewhat heuristically, what goes wrong when one tries to define iterated integrals of Brownian paths as classical Riemann integrals, as this will give us important clues as to how one should formally define rough paths. But first, as a precursor, let us briefly return to the construction of iterated integrals for more regular paths.

For a continuous path $x_t = (x_t^1, \dots, x_t^d) \in \mathbb{R}^d$ on a time interval $[0, T]$ all of whose components are differentiable functions of t, we can define its $n$th iterated integrals $x_{s,t}^{i_1, \dots, i_n}$ over a subinterval $[s, t]$ for any $n \ge 1$ as limits of the sequences of Riemann sums

$$S_{s,t}^n(N) = \sum_{i_n=1}^{N} \sum_{i_{n-1}=1}^{i_n} \cdots \sum_{i_1=1}^{i_2} \left( x_{t_{i_1}}^{i_1} - x_{t_{(i_1-1)}}^{i_1} \right) \cdots \left( x_{t_{i_n}}^{i_n} - x_{t_{(i_n-1)}}^{i_n} \right) \qquad (1.12)$$

where $t_{i_k} - t_{(i_k-1)} = (t - s)/N$ for $k = 1, \dots, n$, so that

$$x_{s,t}^{i_1, \dots, i_n} = \lim_{N \to \infty} S_{s,t}^n(N). \qquad (1.13)$$
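For a smooth path the limit (1.13) can be checked directly. As a quick illustration (ours, not the author's), take $x_t = (t, t^2)$ on $[0, 1]$, for which $x_{0,1}^{1,2} = \int_0^1 u\,d(u^2) = \int_0^1 2u^2\,du = 2/3$:

import numpy as np

# Left-point Riemann-sum approximation of the second order iterated
# integral x^{1,2}_{0,1} for the smooth path x_t = (t, t^2).
N = 100_000
t = np.linspace(0.0, 1.0, N + 1)
x1, x2 = t, t**2
# x^{1,2}_{0,1} = sum over k of (x^1_{t_k} - x^1_0) * (x^2_{t_{k+1}} - x^2_{t_k})
approx = np.sum((x1[:-1] - x1[0]) * np.diff(x2))
assert abs(approx - 2.0 / 3.0) < 1e-4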

Assuming that $(t - s) \ll 1$, we have, by Taylor's theorem, that

$$x_{t_{i_k}}^{i_k} - x_{t_{(i_k-1)}}^{i_k} = \dot{x}^{i_k}(t_{(i_k-1)})(t - s)/N + o\big((t - s)/N\big),$$

which, when substituted into (1.12), implies, by (1.13), that $x_{s,t}^{i_1, \dots, i_n} \sim (t - s)^n$. If $x_t$ has bounded variation on $[0, T]$, then its iterated integrals can be similarly defined as Stieltjes integrals, and we also have $x_{s,t}^{i_1, \dots, i_n} \sim (t - s)^n$.

Even when $x_t$ is of unbounded variation but has finite p-variation for some $p \in (1, 2)$, its iterated integrals can still be defined in this way as Young integrals, but now $x_{s,t}^{i_1, \dots, i_n} \sim (t - s)^{n/p}$. Thus, in common with paths of bounded variation, this means that also in this case $x_{s,t}^{i_1, \dots, i_n} = o(t - s)$ for any $n \ge 2$, so that second and higher order iterated integrals all become negligible as $t \to s$.

Finally, let $x_t$ be a sample path of a Brownian motion $(B_t)_{0 \le t \le T}$, and, for the sake of simplicity, let us assume that d = 1, i.e. the Brownian motion $B_t$ is 1-dimensional.

It is instructive to consider iterated integrals of $x_t$ from the viewpoint of expected values of corresponding stochastic integrals of $B_t$, using the defining properties of Brownian motion – even though this will not lead us to precisely the right answers.

For example, the sum of the expected absolute increments of $B_t$ over $[0, T]$ in the limit as the size of time increments tends to zero is given by

$$\lim_{N \to \infty} \sum_{i=1}^{N} \mathbb{E}\left[ \big| B_{i(T/N)} - B_{(i-1)(T/N)} \big| \right] = \lim_{N \to \infty} \sum_{i=1}^{N} \sqrt{\frac{2T}{\pi N}} = \lim_{N \to \infty} \sqrt{\frac{2TN}{\pi}} = \infty$$

since, for all $0 \le s < t \le T$, $B_t - B_s$ is normally distributed with mean 0 and variance $t - s$, and hence $\mathbb{E}|B_t - B_s| = \sqrt{\frac{2}{\pi}}\sqrt{t - s}$. These results suggest that a Brownian path $x_t$ has infinite variation on any finite time interval (as T can be made arbitrarily small) – which is correct (almost surely) – and that $x_{s,t}^1 = x_t - x_s \sim (t - s)^{1/2}$ – which is almost correct.
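These moment computations are easy to confirm by simulation; a quick Monte Carlo check (ours) of $\mathbb{E}|B_t - B_s| = \sqrt{2(t-s)/\pi}$:

import numpy as np

rng = np.random.default_rng(1)
dt = 0.37                                  # an arbitrary interval length t - s
incs = rng.normal(0.0, np.sqrt(dt), size=1_000_000)
print(np.mean(np.abs(incs)))               # sample mean of |B_t - B_s|
print(np.sqrt(2 * dt / np.pi))             # theoretical value, ~0.4854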

Similarly, one can get a measure of the 2-variation of $x_t$ on $[0, T]$ by computing

$$\sum_{i=1}^{N} \mathbb{E}\left[ \big( B_{i(T/N)} - B_{(i-1)(T/N)} \big)^2 \right] = \sum_{i=1}^{N} T/N = T, \qquad (1.14)$$

which indicates that Brownian paths have finite 2-variation – which, as we know by now, is nearly but not quite true.

Further, corresponding to the second order iterated integral $x_{s,t}^2$ of $x_t$ over $[s, t]$ defined by

$$x_{s,t}^2 := \int_{u=s}^{u=t} \int_{v=s}^{v=u} dx_v\, dx_u$$

we have the following discrete stochastic approximation

$$\sum_{i=1}^{N} \sum_{j=1}^{i} \left( B_{s+j(t-s)/N} - B_{s+(j-1)(t-s)/N} \right) \left( B_{s+i(t-s)/N} - B_{s+(i-1)(t-s)/N} \right). \qquad (1.15)$$

Since disjoint increments of Brownian motion are independent, taking the expectation of (1.15) simply yields

$$\sum_{i=1}^{N} \mathbb{E}\left[ \big( B_{s+i(t-s)/N} - B_{s+(i-1)(t-s)/N} \big)^2 \right] \qquad (1.16)$$

which, by (1.14), is just equal to $t - s$. Thus, we might conjecture that $x_{s,t}^2 \sim (t - s)$ – which again is slightly wrong. In order not to mislead the reader any further, let us state the correct expressions for the orders of magnitude of the first and second iterated integrals of Brownian paths $x_t$: for any p > 2, $x_{s,t}^1 \sim (t - s)^{1/p}$ and $x_{s,t}^2 \sim (t - s)^{2/p}$. Hence, for Brownian paths, second iterated integrals do not become negligible as $t \to s$, but rather both their first and second iterated integrals are greater than first order in $(t - s)$.

Now it is also apparent from (1.15) and (1.16) why the second iterated integral of a Brownian path $x_t$ cannot be defined as the limit of Riemann sums, for this would include the following expression

$$\lim_{N \to \infty} \sum_{i=1}^{N} \left( x_{s+i(t-s)/N} - x_{s+(i-1)(t-s)/N} \right)^2$$

which is infinite since Brownian paths have unbounded 2-variation almost surely, and the total contribution from the cross terms involving disjoint subintervals of $[s, t]$ can be expected to be finite and small.

In general, if $x_t \in \mathbb{R}^d$ is a path of finite p-variation on $[0, T]$ for some $p \ge 1$, then, motivated by the above discussion, we would like its iterated integrals – if they can be defined – to satisfy the analytic condition $x_{s,t}^{i_1, \dots, i_n} \sim (t - s)^{n/p}$ for all $0 \le s < t \le T$ and $n \ge 1$. In particular, this would mean that $x_{s,t}^{i_1, \dots, i_n} = o(t - s)$ for any $n > \lfloor p \rfloor$, and thus, going back to our example of solving a controlled differential equation using iteration and assuming that the control process has finite p-variation with $p \in [3, 4)$, its solution would be given to first order in $(t - s)$ by the terms shown in (1.9), while ignoring any of these terms would mean that the Itô map taking the control and an initial condition to the solution would in general fail to be continuous.

One can now also fully appreciate the significance of p = 2 as a key threshold, since for paths of finite p-variation with $1 \le p < 2$ iterated integrals can be canonically defined as either Riemann-Stieltjes (p = 1) or Young (1 < p < 2) integrals, even though they are not required beyond first order linear increments for solving controlled differential equations, whereas when $p \ge 2$ second and possibly also higher order iterated integrals would be needed in the solution, if only they could be defined! As we will see, the theory of rough paths provides a general resolution to this fundamental dichotomy. But for now, let us just note, as already mentioned, that Brownian motion is one of those special stochastic processes with irregular sample paths for which second order iterated integrals can be defined pathwise – the Lévy area of a 2-dimensional Brownian path involving one specific construction – and with them all differential equations controlled by Brownian paths can be solved (subject to vector fields being regular enough).

The problem of integrating one-forms f along a path $x_t$ in $\mathbb{R}^d$ essentially boils down to being able to give meaning to the differential $dx_t$. The challenges that one faces when trying to define differentials of less regular paths are well illustrated by considering Brownian paths. While a differentiable function x(t) of time t becomes smooth and linear on sufficiently small time scales, so that its differential can be expressed as $dx(t) = \dot{x}(t)\,dt$, there is no time scale on which a typical Brownian path $w_t$ could be guaranteed to behave in a regular fashion: for, as $\delta t \to 0$, $\delta w_t := w_{t+\delta t} - w_t$ can go through a whole range of positive and negative values – indeed taking arbitrarily large values with non-zero probability for any $\delta t > 0$ – so that $\delta w_t / \delta t$ does not approach any limiting value; all we can say is that $|\delta w_t|$ is expected to be of the order $(\delta t)^{1/2}$. Therefore, simply knowing linear increments of a Brownian path is not sufficient to define its differential.

For a differential equation of the form $dy_t = \sum_{i=1}^{d} f_i(y_t)\,dx_t^i$ to make sense, we should be able to write down a full expression for the change $\delta y_{t,t+\delta t}$ in $y_t$ over a small time interval $[t, t + \delta t]$ that is first order in $\delta t$, which, assuming that $x_t$ has finite p-variation for some $p \ge 1$ on $[t, t + \delta t]$, involves, as we have seen above, its $n$th order iterated integrals for $n = 1, \dots, \lfloor p \rfloor$, supposing that these satisfy the analytic condition $x_{t,t+\delta t}^{i_1, \dots, i_n} \sim (\delta t)^{n/p}$ for all $n \ge 1$.
Then, letting $\delta t \to 0$, any higher order terms become negligible and vanish for an infinitesimally small change dt, so, by (1.11), we have that

$$dy_t = \sum_{n=1}^{\lfloor p \rfloor} \sum_{i_1, \dots, i_n \in \{1, \dots, d\}} F_{i_1, \dots, i_n}(y_t)\, dx_t^{i_1, \dots, i_n} \qquad (1.17)$$

which can then be normally integrated with respect to time t. In this sense, we can say that the sequence of iterated integrals $dx_t^{i_1, \dots, i_n} := x_{t,t+dt}^{i_1, \dots, i_n}$ over the infinitesimal

time interval $[t, t + dt]$ for $n = 1, \dots, \lfloor p \rfloor$ describes the full differential $dx_t$ of $x_t$.

We have seen above that the sequence of iterated integrals of a path $x_t \in \mathbb{R}^d$ emerges in a natural way when one (formally) solves a differential equation controlled by $x_t$ through iteration. Moreover, it has been known since the works of K. T. Chen in the 1970s (see [1]) that if $x_t$ is a path of bounded variation and one forms all the iterated integrals of $x_t$ into a single mathematical object, called the signature of the path, viewing it as an element of the infinite sequence of successive tensor product powers of $\mathbb{R}^d$, then this object can be shown to possess some remarkable algebraic (multiplicative) properties.

The central idea of the theory of rough paths is to define for any path of finite p-variation with $p \ge 1$ an analogous object, called a p-rough path, as an extension of the path into an extended tensor product space that satisfies the relevant algebraic conditions. Furthermore, as part of the definition, second and higher order components of a p-rough path with $p \ge 2$ – which play the role of canonically defined iterated integrals for more regular paths with $1 \le p < 2$, and provide the data that enables differential equations controlled by the rough path to be solved – are assumed to satisfy the analytic condition prescribed above, thus extending the concept of finite p-variation to rough paths. All of these foundational ideas will be made rigorous in the following section where rough paths are formally defined.
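To make the coming definitions concrete, here is a minimal numpy sketch (our own, not from the thesis appendix) that computes the truncated signature of a piecewise-linear path: the signature of a single linear segment with increment $\Delta$ has level-$n$ term $\Delta^{\otimes n}/n!$, and segments are combined with the multiplicative (Chen) identity stated as Theorem 1.3 in the next section.

import numpy as np

def sig_of_segment(delta, depth):
    """Signature of a linear path with increment vector `delta`:
    the level-n component is delta^{(x)n} / n!."""
    levels = [np.array(1.0)]
    for n in range(1, depth + 1):
        levels.append(np.multiply.outer(levels[-1], delta) / n)
    return levels

def chen_product(a, b, depth):
    """Tensor product of two truncated signatures (Chen's identity):
    (a (x) b)^n = sum_{k=0}^{n} a^k (x) b^{n-k}."""
    return [sum(np.multiply.outer(a[k], b[n - k]) for k in range(n + 1))
            for n in range(depth + 1)]

def signature(path, depth):
    """Truncated signature of the piecewise-linear interpolation of
    `path`, an array of shape (N+1, d)."""
    sig = sig_of_segment(np.zeros(path.shape[1]), depth)  # trivial path = identity
    for k in range(len(path) - 1):
        sig = chen_product(sig, sig_of_segment(path[k + 1] - path[k], depth), depth)
    return sig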

1.2 Formal definition of rough paths

Let $x : [0, T] \to V$ be a continuous path of finite 1-variation, as defined in Definition 1.1, where $V = \mathbb{R}^d$ with an ordered set of basis vectors $\{e_1, \dots, e_d\}$, so that, for each integer $n \ge 1$, the set $\{e_{i_1} \otimes \dots \otimes e_{i_n} : i_1, \dots, i_n \in \{1, \dots, d\}\}$ furnishes a basis for the $n$th tensor power $V^{\otimes n}$ of V. For short, we shall denote $e_{i_1} \otimes \dots \otimes e_{i_n}$ by $e_{i_1, \dots, i_n}$. It is easy to see that, for each $n \ge 1$, $V^{\otimes n}$ is isomorphic as a vector space to the space of homogeneous polynomials of degree n in non-commuting indeterminates $X_1, \dots, X_d$. Hence, the extended tensor product algebra $T(V)$ of V defined by

$$T(V) := \mathbb{R} \oplus V \oplus V^{\otimes 2} \oplus \dots$$

with componentwise addition and multiplication induced by the tensor product is isomorphic as an algebra to the space of all formal power series in $X_1, \dots, X_d$, with the tensor product of elements of $T(V)$ corresponding to the product of non-commuting polynomials.

Under the above assumptions, we define for any $n \ge 1$ the $n$th order iterated integral $x_{s,t}^n$ of the path $x_t \in V$ over any time interval $[s, t]$ with $0 \le s < t \le T$ as an element of $V^{\otimes n}$ by

$$x_{s,t}^n = \sum_{i_1, \dots, i_n \in \{1, \dots, d\}} x_{s,t}^{i_1, \dots, i_n}\, e_{i_1, \dots, i_n} \qquad (1.18)$$

where the coefficients $x_{s,t}^{i_1, \dots, i_n}$ are defined in (1.10). By the multi-linearity of tensor products, iterated path integrals can be equivalently expressed in the following coordinate-free way:

$$x_{s,t}^n = \int_{u_n=s}^{u_n=t} \cdots \int_{u_1=s}^{u_1=u_2} dx_{u_1} \otimes \dots \otimes dx_{u_n} \qquad (1.19)$$

which is very useful as it allows this definition to be generalised to paths that take values in arbitrary infinite-dimensional Banach spaces. We now have all the requisite ingredients for defining an object that will serve as the prototype for rough paths.

Definition 1.2 (Signature). Let $x : [0, T] \to V$ be a continuous path of bounded variation taking values in a Banach space V, and let $\Delta_T := \{(s, t) : 0 \le s \le t \le T\}$. Then the signature $S(x) : \Delta_T \to T(V)$ of x is the continuous functional mapping $(s, t)$ to $S(x)_{s,t} := (x_{s,t}^0, x_{s,t}^1, x_{s,t}^2, \dots)$ where $x_{s,t}^n$ for $n \ge 1$ are the iterated integrals defined in (1.19) and $x_{s,t}^0 \equiv 1$.

Signatures of bounded variation paths can readily be shown to have the following fundamental property:

Theorem 1.3 (Multiplicative property). Let $S(x)$ be the signature of a bounded variation path $x : [0, T] \to V$. Then, for all $0 \le s \le u \le t \le T$, we have that

$$S(x)_{s,u} \otimes S(x)_{u,t} = S(x)_{s,t}.$$

This result is usually called Chen's identity (even though K. T. Chen wasn't the first person to discover it), and the signature $S(x)$ of a bounded variation path x is commonly called the Chen lift of x, since it extends – lifts – a path x in V to an element $S(x)$ in $T(V)$ such that the projection of $S(x)_{s,t}$ onto V is $x_{s,t}^1 = x_t - x_s$.

Let $x : [t, u] \to V$ and $y : [v, w] \to V$ be two arbitrary bounded variation paths taking values in a Banach space V. Then the concatenation of x and y is defined to be the path $x * y : [t, u + w - v] \to V$ satisfying

$$x * y(s) = \begin{cases} x(s) & \text{for } t \le s \le u, \\ x(u) + y(s - u + v) - y(v) & \text{for } u \le s \le u + w - v. \end{cases}$$

The set of V-valued bounded variation paths, denoted by $\mathcal{BV}(V)$, is clearly closed under concatenation, and, moreover, as this operation is associative, $(\mathcal{BV}(V), *)$ is a semigroup (or even a monoid, since each trivial path $x : [t, t] \to V$ is an identity element for the operation of concatenation). Thus, we have

$$S(x)_{t,u} \otimes S(y)_{v,w} = S(x * y)_{t,u} \otimes S(x * y)_{u,u+w-v} \qquad (1.20)$$

as signatures are invariant under time translations of paths, and, further, by Chen's identity

$$S(x * y)_{t,u} \otimes S(x * y)_{u,u+w-v} = S(x * y)_{t,u+w-v} \qquad (1.21)$$

which, combined with (1.20), shows that the range of the signature map $S : \mathcal{BV}(V) \to T(V)$ is closed under the multiplication in $T(V)$ induced by the tensor product $\otimes$. Moreover, every element $v = (v^0, v^1, v^2, \dots)$ of $T(V)$ with $v^0 \in \mathbb{R} \setminus \{0\}$ possesses an inverse element, namely

$$v^{-1} = \frac{1}{v^0} \sum_{n=0}^{\infty} \left( 1 - \frac{v}{v^0} \right)^{\otimes n}$$

where $\mathbf{1}$ is the multiplicative unit element $(1, 0, 0, \dots)$, as one can directly verify. In particular, the subset

$$\widetilde{T}(V) := \{(1, v^1, v^2, \dots) : v^n \in V^{\otimes n},\ n \ge 1\} \subset T(V)$$

is a group which contains the range of the signature map as a subgroup, since the inverse of the signature of a bounded variation path is the signature of the path 'run backwards', i.e. for any $x : [s, u] \to V$ belonging to $\mathcal{BV}(V)$

$$(S(x)_{s,u})^{-1} = S(\overleftarrow{x})_{s,u} \qquad (1.22)$$

where $\overleftarrow{x}(t) := x(s + u - t)$ for $s \le t \le u$ (see [10, Proposition 2.14]). Hence, we have established that the signature map is a homomorphism from the monoid $(\mathcal{BV}(V), *)$ into the group $(\widetilde{T}(V), \otimes)$.

In fact, projections of the range of the signature map $S : \mathcal{BV}(V) \to \widetilde{T}(V)$ onto the truncated tensor product algebras $T^{(n)}(V) := \mathbb{R} \oplus V \oplus V^{\otimes 2} \oplus \dots \oplus V^{\otimes n}$ are Lie groups, as defined below, for all $n \ge 1$.

Definition 1.4. For a Banach space V, let us define

$$[V, V]_n := \left\{ \big[ v_n, [v_{n-1}, \dots, [v_2, v_1] \dots ] \big] : v_i \in V,\ 1 \le i \le n \right\}$$

Proposition 1.5 ([10, Proposition 2.27]). For any n 1, G(n)(V ) coincides with the ≥ projection of the range of the signature map S : (V ) T (V ) onto the truncated BV → tensor product algebra T (n)(V ). Thus, every element of G(n)(V ) can be expressed as e the truncated signature of a bounded variation path in V .

It is also natural to enquire about the kernel of the signature map in (V ). By BV (1.22), for any path x (V ), S(x) S( x ) = S(x x ) = 1, i.e. any path concate- ∈ BV ⊗ ←− ∗ ←− nated with its reverse path has trivial signature, and, furthermore, any path that can be reduced to a constant path by successively removing pairs of path segments of the form (x, ←−x ) also has trivial signature. Such paths are called tree-like, and one should point out that path segments x and ←−x in such paths do not necessarily need to be adjacent (and may even be infinitesimal): e.g. a path of the form x y z z y x ∗ ∗ ∗←−∗←−∗←− is tree-like and has trivial signature. As a profound converse statement, B. Ham- bly and T. Lyons have proved (see [6, Theorem 1]) for bounded variation paths in finite-dimensional Euclidean spaces Rd that a path being tree-like is also a necessary condition for it to have trivial signature. Thus, we can define an equivalence relation on (V ) by x y if and only if x y is tree-like. Then we have that S(x) = S(y) BV ∼ ∗ ←− 1 if and only if S(x) S(y)− = S(x) S( y ) = S(x y ) = 1, i.e. if and only if x and ⊗ ⊗ ←− ∗ ←− y are tree-like equivalent. In addition to the above geometric interpretation of signatures as elements of T (V ) that are in one-to-one correspondence with classes of tree-like equivalent bounded variation paths in V , the signature of each bounded variation path x : 0,T V →   14 can be characterised as the solution of the following ‘universal’ rough differential equation, i.e. a differential equation on the extended tensor product algebra T (V ):

$$dS_t = S_t \otimes dx_t \qquad (1.23)$$

with the initial value $S_0 = \mathbf{1}$, and where $dx_t$ represents the element $(0, x_{t+dt} - x_t, 0, \dots)$ of $T(V)$. Indeed, it is nice to observe how the signature of a path builds up through repeated application of tensor multiplication by infinitesimal path increments $dx_t$ in

(1.23), so that $S_t = S(x)_{0,t}$ is the unique solution to (1.23). Thus, informally we can think of the signature of a bounded variation path as a universal non-commutative exponential of the path. Furthermore, this provides a succinct proof of Theorem 1.3 above – viz. the multiplicative property of signatures – since for all $0 \le s \le t \le T$ both $S(x)_{0,s} \otimes S(x)_{s,t}$ and $S(x)_{0,t}$ satisfy the same differential equation (1.23) with the same initial condition, and hence must be equal. We take this key characteristic of signatures of bounded variation paths as the defining property of a more general abstract object.
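Reusing the signature() and chen_product() functions from the sketch at the end of Section 1.1, the multiplicative property is easy to verify numerically for a piecewise-linear path (a check we add purely for illustration):

import numpy as np

rng = np.random.default_rng(2)
path = np.cumsum(rng.normal(size=(101, 2)), axis=0)  # a 2-d random walk
depth = 3
left = signature(path[:51], depth)    # S(x)_{0,s}
right = signature(path[50:], depth)   # S(x)_{s,t} (the segments share point 50)
whole = signature(path, depth)        # S(x)_{0,t}
prod = chen_product(left, right, depth)
assert all(np.allclose(a, b) for a, b in zip(prod, whole))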

Definition 1.6 (Multiplicative functional). With the above notation, a multiplicative functional is a continuous functional $\mathbf{x} : \Delta_T \to T(V)$ with $\mathbf{x}_{s,t} = (1, x_{s,t}^1, x_{s,t}^2, \dots)$, where $x_{s,t}^n \in V^{\otimes n}$ for $n \ge 1$, satisfying the multiplicative property

$$\mathbf{x}_{s,u} \otimes \mathbf{x}_{u,t} = \mathbf{x}_{s,t} \qquad (1.24)$$

for all $0 \le s \le u \le t \le T$.

Next we extend the concept of p-variation to multiplicative functionals.

Definition 1.7 (p-variation). Let $\mathbf{x} : \Delta_T \to T(V)$ be a multiplicative functional. Then, for any real number $p \ge 1$, the p-variation of $\mathbf{x}$ on $[0, T]$ is defined by

$$\|\mathbf{x}\|_{p,[0,T]} = \sup_{n \ge 1} \sup_{D \in \mathcal{D}([0,T])} \left( \sum_{h=1}^{k} \big\| x_{t_{h-1},t_h}^n \big\|_{V^{\otimes n}}^{p/n} \right)^{n/p}$$

where $\|\cdot\|_{V^{\otimes n}}$ denote a set of compatible norms on $V^{\otimes n}$ for $n \ge 1$.

As with paths of finite p-variation in V, we note that any multiplicative functional of finite p-variation in $T(V)$ also has finite q-variation for all q > p. A multiplicative functional having finite p-variation, as defined above, is equivalent to it satisfying the condition in the following

Proposition 1.8 ([11, Proposition 3.3.2]). Let $\mathbf{x} : \Delta_T \to T(V)$ be a multiplicative functional. Then $\mathbf{x}$ has finite p-variation on $[0, T]$ for some $p \ge 1$, in the sense of Definition 1.7 above, if and only if there exists a super-additive continuous function $\omega : \Delta_T \to \mathbb{R}^+$, called a control function, such that

$$\big\| x_{s,t}^n \big\|_{V^{\otimes n}} \le \omega(s, t)^{n/p}$$

for all $(s, t) \in \Delta_T$ and $n \ge 1$.

In fact, the above condition is the same, in the general context of Banach spaces, as the analytic condition that we formulated earlier for iterated integrals of finite p-variation paths in Euclidean spaces, with the control function $\omega(s, t) = C(t - s)$, where C is a constant that may depend on n and p. However, the reader should be cautioned about this terminology, as we have also used the word 'control' for paths that drive controlled differential equations, rather than referring to functions that control the regularity of iterated path integrals. Finally, we can state the formal definition of a p-rough path.

Definition 1.9 (p-rough path). A rough path of regularity p, or a p-rough path for short, is a multiplicative functional of finite p-variation.

The first fundamental result on rough paths in the development of the theory by T. Lyons is the following theorem ([9, Theorem 2.2.1]); it shows that only the first $\lfloor p \rfloor$ components of a p-rough path really matter, since a p-rough path is uniquely determined by its truncature at level $\lfloor p \rfloor$. For this reason, we may regard p-rough paths as multiplicative functionals of degree $\lfloor p \rfloor$ with finite p-variation, taking values in the truncated tensor product algebra $T^{(\lfloor p \rfloor)}(V)$.

Theorem 1.10 (Truncature of p-rough paths). If $\mathbf{x}$ and $\mathbf{y}$ are p-rough paths in $T(V)$ such that $x_{s,t}^n = y_{s,t}^n$ for all $(s, t) \in \Delta_T$ and $n = 1, \dots, \lfloor p \rfloor$, then $\mathbf{x} = \mathbf{y}$. Conversely, any multiplicative functional of degree $\lfloor p \rfloor$ with finite p-variation can be uniquely extended to a multiplicative functional with finite p-variation of arbitrarily high degree $r > \lfloor p \rfloor$.

By contrast, it is important to realise that for a p-rough path $\mathbf{x} : \Delta_T \to T^{(\lfloor p \rfloor)}(V)$, for any k satisfying $1 < k \le \lfloor p \rfloor$, the terms $x_{s,t}^n$ for $n = k, \dots, \lfloor p \rfloor$ are never uniquely determined by the lower order terms $x_{s,t}^m$ for $m = 1, \dots, k - 1$. For example, a sample path of a Brownian motion in $\mathbb{R}^d$ can be extended to a p-rough path for any p > 2 by defining its second order components as either Itô or Stratonovich integrals, which are distinct in general.

Thus, abstracting the concept and characteristic properties of the signature of a bounded variation path, a p-rough path is an extension of a path of finite p-variation taking values in a Banach space V to its extended tensor product algebra $T(V)$, or its truncature $T^{(\lfloor p \rfloor)}(V)$ at level $\lfloor p \rfloor$, such that the second and higher order components – to be interpreted as, or actually defining, its iterated path integrals – satisfy an analogous regularity condition. In other words, a p-rough path incorporates a full sequence of iterated integrals, and hence encodes all the necessary data to provide an unambiguous solution to any rough differential equation controlled by the p-rough path. However, only components up to order $\lfloor p \rfloor$ are needed for the solution of an RDE. In particular, as already noted, in the classical case where $1 \le p < 2$ only first order linear increments of the controlling path are required for integration.

The so-called Universal Limit Theorem ([10, Theorem 5.3]), the main result of the theory of rough paths, asserts that any rough differential equation $dy_t = f(y_t)\,dx_t$ controlled by a p-rough path $\mathbf{x}_t$, subject to the vector field f being Lipschitz-γ continuous with $1 \le p < \gamma$, has a unique solution $y_t$ that is also a p-rough path, and the Itô map $I_f : (\mathbf{x}_t, y_0) \mapsto y_t$ is uniformly continuous. This is a deep and satisfying result.

As to the meaning of the defining multiplicative property of p-rough paths, this simply corresponds to the additive property of iterated integrals over contiguous time domains: for example, equating second order components on the left and right hand sides of (1.24) for $0 \le s \le t \le u \le T$ yields

$$x_{s,t}^2 + x_{t,u}^2 + x_{s,t}^1 \otimes x_{t,u}^1 = x_{s,u}^2$$

which expresses the natural requirement that the second order iterated integral over $[s, u]$ should equal the sum of the second order integrals over $[s, t]$ and $[t, u]$, plus the product of the first order integrals (i.e. linear increments) over $[s, t]$ and $[t, u]$. One should carefully note the presence of the tensor product term on the left hand side of the above equation, for second and higher order components of multiplicative functionals are not additive!

For any $p \ge 1$, let $\Omega_p(V)$ denote the set of all p-rough paths $\mathbf{x} : \Delta_T \to T^{(\lfloor p \rfloor)}(V)$. In particular, by our earlier remark, we note that $\Omega_1(V)$ is contained in $\Omega_p(V)$ for all p > 1. Further, we can make $\Omega_p(V)$ into a metric space by equipping it with the following metric:

Definition 1.11 (p-variation distance). If $\mathbf{x}$ and $\mathbf{y}$ are two elements of $\Omega_p(V)$, then their p-variation distance is defined by

$$d_p(\mathbf{x}, \mathbf{y}) = \max_{1 \le n \le \lfloor p \rfloor} \sup_{D \in \mathcal{D}([0,T])} \left( \sum_{h=1}^{k} \big\| x_{t_{h-1},t_h}^n - y_{t_{h-1},t_h}^n \big\|_{V^{\otimes n}}^{p/n} \right)^{n/p}.$$

In fact, it is straightforward to show that $(\Omega_p(V), d_p)$ is a complete metric space for all $p \ge 1$ (see [11, Lemma 3.3.3]). However, for $p \ge 2$, $\Omega_p(V)$ is not a linear space due to the non-linearity of the multiplicative property: in general the sum or difference of two multiplicative functionals fails to be multiplicative.

With this metric, we can identify an important subclass of p-rough paths in Ωp(V ), namely those elements that can be approximated arbitrarily closely by 1-rough paths as measured by the p-variation distance.

Definition 1.12 (Geometric p-rough paths). The closure in $\Omega_p(V)$ of the space $\Omega_1(V)$ of 1-rough paths under the topology induced by the p-variation distance $d_p$ is called the space of geometric p-rough paths and denoted by $G\Omega_p(V)$.

Thus, an element $\mathbf{x}$ of $\Omega_p(V)$ is a geometric p-rough path if and only if there exists a sequence $(\mathbf{x}_n)_{n \ge 1}$ of 1-rough paths such that $\lim_{n \to \infty} d_p(\mathbf{x}_n, \mathbf{x}) = 0$. Based on the above topological description, one may wonder what is 'geometric' about geometric p-rough paths. The reason for this nomenclature is that geometric p-rough paths take their values in the free nilpotent Lie group of step $\lfloor p \rfloor$ – the very interesting algebro-geometric object we defined above!

However, when $p \ge 2$ there are also p-rough paths that take values in $G^{(\lfloor p \rfloor)}(V)$ but cannot be expressed as the limit of a sequence of 1-rough paths in the p-variation distance. We shall denote the space of such weakly geometric p-rough paths on V by $WG\Omega_p(V)$. Hence, $G\Omega_p(V) \subseteq WG\Omega_p(V)$, with the inclusion being strict for $p \ge 2$. One should note that even though by Proposition 1.5 each weakly geometric p-rough path $\mathbf{x} \in WG\Omega_p([0,T], V)$ can be expressed as the truncated signature of a bounded variation path for any $(s, t) \in \Delta_T$, there isn't a single $y \in \mathcal{BV}([0,T], V)$ satisfying $\mathbf{x}_{s,t} = S(y)_{s,t}$ for all $(s, t) \in \Delta_T$ – unless, of course, $\mathbf{x}$ is actually a 1-rough path.

Generally though the difference between geometric and weakly geometric rough paths is insignificant, and for our purposes we shall ignore it and just talk about geometric rough paths without distinction. The key implication of this is that for any geometric p-rough path $\mathbf{x}$, we have

$$\mathbf{x}_{s,t} = \mathbf{x}_{0,s}^{-1} \otimes \mathbf{x}_{0,t}$$

for all $0 \le s \le t \le T$, where $\mathbf{x}_{0,s}^{-1}$ is the group inverse of $\mathbf{x}_{0,s}$ in $G^{(\lfloor p \rfloor)}(V)$, hence providing for $\mathbf{x}_{s,t}$ the natural interpretation of an increment of the p-rough path $\mathbf{x}$ over the time interval $[s, t]$.

In addition to the two characterisations – a topological and an algebro-geometric one – of geometric rough paths given above, there is also a third way of describing them that is analytical in nature and somewhat more concrete than the previous ones.

Let $V = \mathbb{R}^d$. Then it can be shown that among all the p-rough paths in $\Omega_p(V)$ only the geometric ones $\mathbf{x} \in G\Omega_p(V)$ satisfy the following identity

$$x_{s,t}^{i_1, \dots, i_m} \cdot x_{s,t}^{j_1, \dots, j_n} = \sum_{(k_1, \dots, k_{m+n}) \in \{i_1, \dots, i_m\} \sqcup \{j_1, \dots, j_n\}} x_{s,t}^{k_1, \dots, k_{m+n}} \qquad (1.25)$$

where $\{i_1, \dots, i_m\} \sqcup \{j_1, \dots, j_n\}$ is the shuffle product of these two sets of indices, i.e. the set of all permutations of $\{i_1, \dots, i_m, j_1, \dots, j_n\}$ that preserve the orderings of the $i_k$ ($k = 1, \dots, m$) and the $j_l$ ($l = 1, \dots, n$) – just like the orderings of cards in each half of the deck are preserved in a riffle shuffle. It was first observed by R. Ree (see [12]) that if $\mathbf{x}$ is the signature of a bounded variation path in V constructed canonically by means of Riemann integrals, then $\mathbf{x}$ satisfies the above shuffle product identity. The analytical content of this rather combinatorial looking result may not be immediately obvious, but if we set m = n = 1 above, then (1.25) becomes

$$x_{s,t}^i \cdot x_{s,t}^j = x_{s,t}^{i,j} + x_{s,t}^{j,i}$$

which, when interpreting the terms as normal first and second order iterated integrals, is nothing other than the familiar integration by parts formula! Thus, the shuffle product identity can be seen to be a generalisation of the integration by parts formula to higher order iterated integrals. Therefore, geometric rough paths can be characterised as those rough paths that obey the standard rules of calculus, and it is in this fact that their analytical significance lies.

To finish off our brief overview of the theory of rough paths, let us say a few words about non-geometric p-rough paths for $p \ge 2$, which hence don't follow the ordinary rules of calculus without correction terms. In 2010, M. Gubinelli published a new theory (see [4]) in which he defines branched rough paths as functionals mapping from a simplex $\Delta_T$ into a Hopf algebra that is generated, as an algebra, by the set of rooted trees with vertices labelled by the basis elements of the path space $V = \mathbb{R}^d$ (containing the tensor product algebra $T(V)$ as a linear subspace spanned by the set of linear, i.e. non-branched, trees), such that these functionals satisfy two algebraic conditions analogous to the multiplicative property and the shuffle product identity for geometric rough paths, as well as an analytic condition that corresponds to finite p-variation. Other parallels with the theory of geometric rough paths include the fact that the set of branched rough paths also forms a Lie group in the Hopf algebra which is very similar to the free nilpotent Lie group. However, as branched, i.e. non-geometric, rough paths will not be used in the rest of this thesis, we will not explore

19 this fascinating theory in greater detail.
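As a small numerical illustration (ours, not the author's) of the m = n = 1 case of the shuffle identity (1.25), i.e. the integration by parts formula $x_{s,t}^i\, x_{s,t}^j = x_{s,t}^{i,j} + x_{s,t}^{j,i}$, for a smooth path sampled on a fine grid:

import numpy as np

t = np.linspace(0.0, 1.0, 100_001)
x = np.stack([np.sin(t), t**2], axis=1)   # an arbitrary smooth 2-d path
dx = np.diff(x, axis=0)

inc = dx.sum(axis=0)                      # first order integrals x^1, x^2
run = np.cumsum(dx, axis=0) - dx          # x_{t_k} - x_0 at the left endpoints
x12 = np.sum(run[:, 0] * dx[:, 1])        # x^{1,2} by left-point Riemann sums
x21 = np.sum(run[:, 1] * dx[:, 0])        # x^{2,1}

assert np.isclose(inc[0] * inc[1], x12 + x21, atol=1e-4)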

In this chapter we have endeavoured to give an intuitive introduction to the theory of rough paths, and especially we have wanted to show that even though this is a very modern theory, developed over the past two decades, it has deep historical roots going back to classical infinitesimal calculus and further all the way to ancient Greek mathematics, and indisputably represents one of the most important advances in the mathematical study of change since the days of Newton and Leibniz.

From its initial, purely analytical problem of solving differential equations controlled by irregular paths – equivalently, integrating differential forms along such paths – the key insight of the theory of rough paths has been to take the whole sequence of iterated path integrals as the fundamental object driving differential equations and controlling the local behaviour of their solutions, and, endowing them individually with a rich algebraic structure, consequently discovering that collectively they form a beautiful geometric object. Underpinning this pleasing aesthetics of the theory of rough paths lies its fundamental achievement of giving meaning to the differential of a function with controlled irregularity – and, in the field of mathematical analysis, surely nothing is more fundamental than that.

Chapter 2

Application of rough paths theory to time series analysis of financial data streams

In this chapter we consider ways in which the theory of rough paths can be applied to analyse time series, focussing particularly on high frequency financial data streams. We begin with a brief discussion of the methods traditionally employed in classical time series analysis, and then give a literature review of recent applications in which the signature of a financial time series is used for the purposes of data classification and prediction based on supervised learning algorithms. Finally, we present our own novel application of rough paths theory in the context of financial data analysis.

2.1 Classical time series analysis

Financial data is usually obtained in discrete form as a time series: an ordered set of multi-dimensional numerical values $\hat{X} = \{(\hat{X}_{t_i}^1, \dots, \hat{X}_{t_i}^d) \in \mathbb{R}^d : i = 0, \dots, N\}$ observed at finitely many time points $t_0 < t_1 < \dots < t_N$. In traditional time series analysis it is a common approach to view discrete data points as samples of an underlying continuous-time process, which is typically assumed to have a specific form in order to capture certain characteristics of the time series being modelled – e.g. autoregression (AR), moving-average (MA) or conditional heteroscedasticity (CH) – and whose parameters are estimated using regression techniques so as to best fit the chosen model to the given data stream.

However, parametric approaches to modelling time series have several inherent limitations. First of all, they usually depend to a large extent on the assumptions

made about the underlying (unknown) data-generating process, and thus are subject to the potential risk of model misspecification, for it may happen that a chosen model cannot adequately describe a given time series even with an optimal calibration. Secondly, in some cases sampling may not be an effective way to approximate a continuous process – especially when dealing with highly oscillatory processes – as a sequence of data points may fail to capture the order of events between different coordinates of a multi-dimensional process, and hence be incapable of detecting latencies and causal effects within the structure of the data stream. Though increasing the sampling frequency normally improves the accuracy of a discrete approximation, this is not always the case: for example, it is known that sampling a Brownian motion, as the driving signal of a dynamical system, with arbitrarily high frequency does not necessarily provide sufficient statistical information for its effects on the evolution of the system to be predicted.

Moreover, sampling at high rates is beset with its own fundamental problems, chief of which is the curse of dimensionality. For instance, recording a high frequency financial data stream tick by tick – ticks may be only milliseconds apart – is usually an inefficient way of representing such data, for it is bound to contain lots of redundant information, meaningless market noise, that might obscure the main structural characteristics of the data stream, and to carry out regression analysis on increments as features of the data stream would be infeasible because of prohibitively high dimensionality. Therefore, one would like to find methods to summarise high frequency data streams in a more concise way – to compress big data sets without losing key information – and thus to achieve a significant dimension reduction enabling standard regression techniques to be applied. As we will see, using the signature of a data stream, truncated to a suitable order, as the feature set of the data stream accomplishes these objectives.

2.2 Signatures as feature sets for linear regression analysis

As we recall from Section 1.2 above, B. Hambly and T. Lyons showed in [6] that the signature of a multi-dimensional path of finite length uniquely determines the path up to tree-like equivalence (i.e. up to modifications that have null effect when the path is used as a system control), and hence pointed out that mapping a path to its signature can be viewed as a faithful data transform in the sense that no information is lost in the process. Following this notion in [8], D. Levin, T. Lyons

and H. Ni were the first to propose using signatures of time series for the purposes of analysing financial data, and demonstrated the potential this approach has for machine learning and statistical inference. Specifically, in their paper the authors built a general non-parametric model for determining the conditional distribution of the output variable of a system as a linear functional of the components of the expected truncated signature of a random input stream (for a large number of series of data samples), and thus, using the signature of a data stream as a feature set, estimated the functional relationship between an input stream and the corresponding noisy system response by employing standard techniques of regression analysis. Moreover, they showed that classical parametric time series models such as AR, ARCH and GARCH can be considered to be special cases of the expected signature model.

Given that the signature is an intrinsically non-linear object, its use as a feature set for linear regression analysis might, on first thought, seem somewhat counterintuitive. However, assuming that the conditional distribution of a future system output is a smooth function of the signature of the current input stream, it is reasonable also to assume that this function can be well approximated locally by a polynomial. But, by the shuffle product property of signatures, as stated in (1.25) above, any polynomial in signature components, i.e. iterated path integrals, can be expressed as a linear combination of higher order signature components. Thus, it is indeed natural to assume a linear relationship between the expected signature of input streams and the system output.

and found that it achieves similar forecasting accuracy with a computational cost that is lower by two orders of magnitude!

Even though some information is inevitably lost when the full signature of a path is projected onto a finite-dimensional truncated tensor product space, the low order components contain most of the information, with the truncation error decreasing factorially as the degree of the truncated signature increases, and, moreover, these leading components are not particularly sensitive to the sampling frequency used. Hence, the iterated integrals of a path (of bounded variation) provide very efficient statistics of the path in the sense that they determine the response of any linear system driven by the path very accurately. Indeed, the beauty and power of this whole approach lies in the fundamental fact that the signature of a path efficiently summarises information on normal time scales in a way that enables the effects of the path in dynamical interactions to be effectively predicted without needing to know the behaviour of the path on microscopic scales – which for some less regular paths can be highly complex.

In [5], G. Gyurkó et al. took a similar approach by embedding financial time series into continuous processes using linear interpolation between discrete data points, computing truncated signatures of the paths thus constructed, and, by performing standard linear regression (combined with the LASSO shrinkage method) on the signature components as a feature set, classifying data streams in a given learning set according to some selected properties, and then proceeding to classify fresh data streams based on their signatures in out-of-sample testing. For example, in one of the numerical tests presented in [5], it was explored to what extent the signatures of streams of WTI crude oil futures market data (including mid-price, bid-ask spread, order imbalance and cumulative trading volume) sampled by the minute from standard 30 minute time intervals determine the time buckets they are sampled from, and, by using standard statistical indicators to measure the accuracy of classification, it was shown that a very small number of low-dimensional signature components of data streams suffice to characterise their time buckets with a high degree of accuracy (the ratio of correct classification exceeding 90% in most cases). This example, together with the other experiments presented in [5] aiming to characterise two different trade execution algorithms by distinguishing between parent orders generated by them and hence to detect their traces in market data, demonstrates again that signatures of data streams efficiently capture information in a non-parametric way that avoids traditional statistical modelling of time series data.

This paper is also notable for (i) introducing lead and lag transforms of a multi-dimensional data stream – special types of time re-parameterisation of the data stream that preserve its signature – in order to capture the quadratic variation of path components, as this quantity – i.e. volatility – is of fundamental importance in financial applications, and (ii) using first and higher order areas between path components to analyse data streams. We will discuss these topics in detail in the next two sections, especially the latter, since in our novel application of the signature approach (to be presented in Section 2.5) third order areas will play a key role as sensitive tools for detecting mean-reverting behaviour in financial time series. However, let us first state the fundamental properties of signatures that will be used in practice, as the theoretical foundation of our numerical algorithms, to compute signatures of data streams, as well as the invariance property of signatures under time re-parameterisations upon which the usability of lead and lag transforms rests.

Let $X_t = (X^1_t,\dots,X^d_t) \in \mathbb{R}^d$ be a continuous $d$-dimensional path of bounded variation defined on a time interval $[0,T]$. For any multi-index, i.e. an ordered set of indices $I = (i_1,\dots,i_k)$ with $k \ge 1$ and $i_j \in \{1,\dots,d\}$ for $j = 1,\dots,k$, we define, as in (1.10), the $k$th order iterated integral of the path $X$ corresponding to the multi-index $I$ over the time interval $[s,t]$ for any $0 \le s < t \le T$ by

$$X^{(i_1,\dots,i_k)}_{s,t} := \int_{s<u_1<\cdots<u_k<t} dX^{i_1}_{u_1} \cdots dX^{i_k}_{u_k}.$$

Then the signature $S(X)_{s,t}$ of $X$ over the time interval $[s,t]$ is defined to be the sequence of iterated integrals $(X^I_{s,t})_{I \in \mathcal{I}}$ where $\mathcal{I}$ is the set of all multi-indices, with the zeroth order component of the signature corresponding to the empty set of indices defined to be 1, and, for any non-negative integer $n$, the truncated signature $S^n(X)_{s,t}$ of degree $n$ is the sequence $(X^I_{s,t})_{I \in \mathcal{I}_n}$ where $\mathcal{I}_n$ is the set of all multi-indices that consist of at most $n$ indices. With this notation, we have the following key properties of signatures:

(i) Uniqueness ([6, Theorem 1]): The signature $S(X)_{s,t}$ of a path $(X_u)_{s \le u \le t}$ of bounded variation taking values in $\mathbb{R}^d$ determines the path, i.e. the function $u \mapsto (X_u - X_s)$ for $s \le u \le t$, up to tree-like equivalence. Moreover, if at least one of the co-ordinates $X^i_u$ with $i \in \{1,\dots,d\}$ is a monotonically increasing function of $u$, then the path $(X_u)_{s \le u \le t}$ is uniquely determined by the signature $S(X)_{s,t}$. However, the proof of this uniqueness result in [6] is non-constructive, and recently X. Geng has provided an explicit method in a more general setting for reconstructing a rough path from its signature (see [3]), thus effectively inverting the signature map $(X_u)_{s \le u \le t} \mapsto S(X)_{s,t}$. It should be noted that any two 1-dimensional paths whose initial and final values differ by the same amount are tree-like equivalent and hence have the same signature, irrespective of the way the distance between the start and end points is traversed (whether travelling straight or zigzagging). By contrast, this is not the case for multi-dimensional paths, whose higher order iterated integrals are not uniquely determined by their first order increments, but in general depend on the trajectories between the start and end points. Nevertheless, by the uniqueness property and (iii) below, the signature of a path of arbitrary dimension that is tree-like equivalent to a linear path is uniquely determined by its first order increments.

(ii) Invariance under time re-parameterisations: For any continuous and monotonically increasing function $f : [0,T] \to [U,V]$ and $(i_1,\dots,i_k) \in \mathcal{I}$, we have

$$\int_{s<u_1<\cdots<u_k<t} dX^{i_1}_{u_1} \cdots dX^{i_k}_{u_k} = \int_{f(s)<u_1<\cdots<u_k<f(t)} dX^{i_1}_{f^{-1}(u_1)} \cdots dX^{i_k}_{f^{-1}(u_k)}$$

for any $0 \le s < t \le T$. Therefore, $S(X)_{s,t} = S(\hat X)_{f(s),f(t)}$ where the path $(\hat X_u)_{f(s) \le u \le f(t)}$ is an arbitrary time re-parameterisation of the original path $(X_u)_{s \le u \le t}$ such that $\hat X_u = X_{f^{-1}(u)}$ for $f(s) \le u \le f(t)$.

(iii) Signature of a linear path: If $X_t = X_0 + Yt$ for some fixed points $X_0$ and $Y = (Y_1,\dots,Y_d)$ in $\mathbb{R}^d$ and all $t \in [0,T]$, then for any multi-index $(i_1,\dots,i_k)$

$$X^{(i_1,\dots,i_k)}_{s,t} = \frac{(t-s)^k}{k!} \prod_{j=1}^{k} Y_{i_j}$$

for any $0 \le s < t \le T$. Thus, each iterated integral of a linear path is simply the product of its increments in the relevant co-ordinates over the time interval divided by the factorial of the order of the iterated integral. This means that the signature of a linear path – in fact that of any path – is independent of its initial value $X_0$, or, to put it differently, signatures are invariant under translations of paths in the spatial domain.

(iv) Multiplicative property: For all $0 \le s \le t \le u \le T$, we have

$$S(X)_{s,t} \otimes S(X)_{t,u} = S(X)_{s,u}.$$

Application of lead and lag transforms to data streams relies on property (ii), in that time re-parameterisations of paths leave their signatures invariant, whereas properties (iii) and (iv) will be used to compute truncated signatures of data streams through the following procedure: 1) for given data streams, continuous paths are constructed by linearly interpolating between discrete data points, 2) the signature of each linear segment of the path is computed using (iii), and 3) the signatures of contiguous linear segments are joined together to form the signature of the whole piecewise linear path using (iv).
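To make steps 2) and 3) concrete, the following is a minimal Python sketch of this procedure; the function names and the dictionary representation of a truncated signature are illustrative choices, not the author's Appendix 2 implementation.

```python
import math
import numpy as np
from itertools import product

def linear_segment_signature(increment, depth):
    """Truncated signature of a linear segment via property (iii): the
    component indexed by (i_1,...,i_k) is the product of the increments
    in those co-ordinates divided by k!."""
    d = len(increment)
    sig = {(): 1.0}  # zeroth order component
    for k in range(1, depth + 1):
        for I in product(range(d), repeat=k):
            sig[I] = np.prod([increment[i] for i in I]) / math.factorial(k)
    return sig

def chen_product(sig1, sig2, depth, d):
    """Join two signatures via the multiplicative property (iv): the
    component of the tensor product indexed by a word I is the sum of
    sig1 on each prefix of I times sig2 on the complementary suffix."""
    return {I: sum(sig1[I[:j]] * sig2[I[j:]] for j in range(len(I) + 1))
            for k in range(depth + 1) for I in product(range(d), repeat=k)}

def stream_signature(points, depth):
    """Truncated signature of the piecewise linear interpolation of a
    data stream given as an (N+1) x d array of points."""
    points = np.asarray(points, dtype=float)
    d = points.shape[1]
    sig = linear_segment_signature(points[1] - points[0], depth)
    for i in range(1, len(points) - 1):
        seg = linear_segment_signature(points[i + 1] - points[i], depth)
        sig = chen_product(sig, seg, depth, d)
    return sig
```

For a 2-dimensional stream truncated at depth 2, for instance, this yields the seven components $1, X^{(1)}, X^{(2)}, X^{(1,1)}, X^{(1,2)}, X^{(2,1)}, X^{(2,2)}$ (indexed here from 0 rather than 1).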

Note (on Numerical Algorithms). Even though free open-source software packages are available for the computation of signatures¹, all numerical algorithms used in this thesis were developed from scratch and implemented in Python by the author. These include a function that produces a time-indexed sequence of signatures, truncated to an arbitrary degree specified by the user, for a given serial data stream of arbitrary dimension, and functions that generate different types of lead and lag transforms of input data streams, as well as various routines used to visualise the outputs of such functions. All of these programs were rigorously tested (e.g. by checking signature components against iterated integrals computed in an Excel spreadsheet) to ensure that they do not contain any bugs. Code samples are exhibited in Appendix 2.

2.3 Lead and lag transforms of data streams

Since financial data usually comes in the form of a time series, whereas signatures are defined for continuous paths, our first task is to embed discrete data streams into paths defined over continuous time intervals. Let $\hat X = \{(\hat X^1_{t_i},\dots,\hat X^d_{t_i}) \in \mathbb{R}^d : i = 0,\dots,N\}$ be a set of data points observed at finitely many time points $t_0 < t_1 < \cdots < t_N$. Obviously there are various possible ways of embedding $\hat X$ into a continuous time path $(X_t)_{t_0 \le t \le t_N}$ so that $X_{t_i} = \hat X_{t_i}$ for $i = 0,\dots,N$ – and different ways of 'joining the dots' generally produce paths with different signatures. The following methods are the most relevant for our current purposes: the first two – constructing piecewise linear or piecewise constant paths – are standard approaches (applied, for instance, in [5] and [8], respectively), whereas the third method – lead and lag transforms – was introduced by B. Hoff in his D.Phil. thesis [7] in 2005.

¹ For example, the sigtools Python package, which is based on the libalgebra library of the CoRoPa project (downloadable from http://sourceforge.net/projects/coropa) and was used in both [5] and [8].

(i) Piecewise linear interpolation: For $t \in [t_i, t_{i+1}]$ with $i = 0,\dots,N-1$, we define

$$X_t = \hat X_{t_i} + \frac{t - t_i}{t_{i+1} - t_i}\left(\hat X_{t_{i+1}} - \hat X_{t_i}\right).$$

(ii) Piecewise constant 'axis' path: For $t \in [t_i, t_{i+1})$ with $i = 0,\dots,N-1$, we set $X_t = \hat X_{t_i}$, so that at each time point $t_{i+1}$ ($i = 0,\dots,N-1$) the path jumps discontinuously to the value $\hat X_{t_{i+1}}$. It is worth remarking that even though such axis paths are continuous time paths in the sense that they are defined for a continuous range of time values within a specified interval, they are clearly not continuous functions of time, and to call them that, as is done in some research papers (see e.g. [8]), is somewhat misleading. (A code sketch of embeddings (i) and (ii) follows this list.)

For any embedding of a data stream $(\hat X_{t_i})_{i=0}^N$ into a continuous time path $(X_t)_{t_0 \le t \le t_N}$, we define the signature of the data stream as $S(X)_{t_0,t_N}$. As said, in general different embeddings yield different signatures, but it is easy to see that the piecewise linear path $X^{\mathrm{lin}}_t$ and the piecewise constant path $X^{\mathrm{con}}_t$ defined from the same data stream have the same signature. However, one should note that even though $S(X^{\mathrm{lin}})_{t_i,t_j} = S(X^{\mathrm{con}})_{t_i,t_j}$ for any $0 \le i < j \le N$, $S(X^{\mathrm{lin}})_{t_i,t}$ does not equal $S(X^{\mathrm{con}})_{t_i,t}$ for any $t \in (t_j, t_{j+1})$, since $S(X^{\mathrm{con}})_{t_j,t} = (1, 0, 0, \dots)$.

(iii) Lead and lag transforms: The idea behind lead and lag transforming a given $d$-dimensional time series $(\hat X_{t_i})_{i=0}^N$ is to create new backward ('lag') and forward ('lead') time series by adding data points to the original time series in two distinct ways that both preserve its increments, and hence leave its signature invariant, since such transforms are time re-parameterisations of the given data stream. The data points of the lag and lead transformed streams are then joined together to form axis or piecewise linear paths, depending on the method applied. Several different definitions of lead and lag transforms can be found in the literature, of which we will review three methods below.
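As flagged above, embeddings (i) and (ii) admit essentially one-line implementations for a 1-dimensional stream; the helper names below are illustrative, not the author's own code.

```python
import numpy as np

def embed_linear(times, values, t):
    """Piecewise linear embedding (i) of a 1-dimensional data stream,
    evaluated at an arbitrary time t in [t_0, t_N]."""
    return np.interp(t, times, values)

def embed_axis(times, values, t):
    """Piecewise constant 'axis' embedding (ii): the last observed value
    at or before time t."""
    i = np.searchsorted(times, t, side="right") - 1
    return values[max(i, 0)]
```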

2.3.1 Gyurkó-Lyons-Kontkowski-Field method

In [5, Section 2.5], Gyurkó, Lyons, Kontkowski and Field ('GLKF') defined the lead and lag transforms of a $d$-dimensional data stream $(\hat X_{t_i})_{i=0}^N$ as follows: for $i = 0,\dots,N$, $\hat X^{\mathrm{lead}}_{t_i} = \hat X^{\mathrm{lag}}_{t_i} = \hat X_{t_i}$, and, for $i = 1,\dots,N$, $\hat X^{\mathrm{lead}}_{t_{i-\frac{1}{2}}} = \hat X_{t_i}$ and $\hat X^{\mathrm{lag}}_{t_{i-\frac{1}{2}}} = \hat X_{t_{i-1}}$. Thus, the lead and lag transforms of a given stream of $N+1$ data points consist of $2N+1$ data points. Indeed, from the above description it is easy to see that they can be created by repeating the data points of the original stream, and deleting the first and last data points in order to obtain the lead and lag transforms, respectively. Hence, lead and lag transforms are time translations of each other, as illustrated below in Figure 1, which displays the lead and lag transforms produced by applying the GLKF method to a 1-dimensional data stream whose increments are randomly sampled from a standard normal distribution.

Figure 1: GLKF method of lead-lag transforming data streams.
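The repeat-and-delete description above translates directly into code; a minimal sketch (the function name glkf_lead_lag is an illustrative choice):

```python
import numpy as np

def glkf_lead_lag(X):
    """GLKF lead-lag transform of a data stream X of shape (N+1, d).
    Returns two streams of 2N+1 points each: repeating every point
    gives 2N+2 points; dropping the first yields the lead stream and
    dropping the last yields the lag stream."""
    X = np.asarray(X, dtype=float)
    doubled = np.repeat(X, 2, axis=0)   # each point repeated: 2N+2 points
    lead = doubled[1:]                  # drop first point -> 2N+1 points
    lag = doubled[:-1]                  # drop last point  -> 2N+1 points
    return lead, lag
```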

This definition was motivated by the authors' desire to be able to easily read off the volatilities of the components $(\hat X^j_{t_i})_{i=0}^N$, for $j = 1,\dots,d$, of a given data stream from the signature of the $(2d)$-dimensional data stream $(\hat Y_{t_{i/2}})_{i=0}^{2N} = (\hat X^{\mathrm{lead}}_{t_{i/2}}, \hat X^{\mathrm{lag}}_{t_{i/2}})_{i=0}^{2N}$, as volatilities of market variables are highly relevant quantities in financial applications. For, it is straightforward to verify by a direct calculation that for any $j = 1,\dots,d$

$$Y^{(j,\,j+d)}_{t_0,t_N} - Y^{(j+d,\,j)}_{t_0,t_N} = \sum_{i=0}^{N-1} \left(\hat X^j_{t_{i+1}} - \hat X^j_{t_i}\right)^2$$

where $(Y_t)_{t_0 \le t \le t_N}$ is the piecewise constant or piecewise linear interpolation of the data stream $(\hat Y_{t_{i/2}})_{i=0}^{2N}$, i.e. the quadratic variation of the $j$th component of the original data stream $(\hat X_{t_i})_{i=0}^N$ is equal to the difference between the iterated integral of the $j$th components of the lead and lag transformed data streams and the iterated integral of the same components in reverse order. This latter quantity is twice the area between the $j$th components of the lead and lag transformed data streams, as defined in Section 2.3 of [5]. We will use the same definition of area between path components in this work.

Furthermore, in Section 2.5 of [5] it was claimed that "the (signed) area between the $i$th component of the lead-transform and the $j$th component of the lag-transform equals to the quadratic cross-variation of the trajectories $\hat X^i$ and $\hat X^j$". Unfortunately, this is not a valid statement, as one can readily show either analytically or using a numerical simulation. The current author has verified this in both ways. To prove this point mathematically, in Appendix 1 we provide an explicit calculation of the signature of a multi-dimensional data stream with one lead-transformed and one lag-transformed component.
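The diagonal case of the identity above – which, unlike the cross-variation claim, is valid – is easy to check numerically. A sketch reusing the illustrative glkf_lead_lag and stream_signature helpers from earlier (components are indexed from 0, so component 0 is the lead and component 1 the lag):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((51, 1)), axis=0)  # 1-d random walk

lead, lag = glkf_lead_lag(X)
Y = np.hstack([lead, lag])            # 2-dimensional lead-lag stream
sig = stream_signature(Y, depth=2)    # piecewise linear interpolation

qv_from_signature = sig[(0, 1)] - sig[(1, 0)]
qv_direct = np.sum(np.diff(X[:, 0]) ** 2)
assert np.isclose(qv_from_signature, qv_direct)
```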

2.3.2 Flint-Hambly-Lyons method (Mark 1)

In the early version of [2] (of February 2014), Flint, Hambly and Lyons ('FHL') used a different definition (see Definition 1.2 of that version of their paper) for lead and lag transforms, setting $\hat X^{\mathrm{lead}}_{t_i} = \hat X_{t_{i+1}}$ and $\hat X^{\mathrm{lag}}_{t_i} = \hat X_{t_{i-1}}$ for $i = 1,\dots,N-1$, $\hat X^{\mathrm{lead}}_{t_{i+\frac{1}{2}}} = \hat X_{t_{i+1}}$ and $\hat X^{\mathrm{lag}}_{t_{i+\frac{1}{2}}} = \hat X_{t_i}$ for $i = 0,\dots,N-1$, with $\hat X^{\mathrm{lead}}_{t_0} = \hat X^{\mathrm{lag}}_{t_0} = \hat X_{t_0}$ and $\hat X^{\mathrm{lead}}_{t_N} = \hat X^{\mathrm{lag}}_{t_N} = \hat X_{t_N}$, and then linearly interpolating between data points to form continuous time paths. Figure 2 below illustrates the lead and lag transforms of the same data stream as in Figure 1 produced by using this method. From Figure 2, we can see that at the start and at the end of the data stream its lead and lag transforms are not simple time translations of each other; nevertheless, this method of lead-lag transforming a data stream does preserve its increments, and hence leaves its signature invariant.

2.3.3 Flint-Hambly-Lyons method (Mark 2)

In the current version of [2] (of September 2016), the authors have modified their earlier definition of lead and lag transforms (see Definition 2.1 of that version of the paper), and now define them by setting $\hat X^{\mathrm{lead}}_{t_i} = \hat X^{\mathrm{lead}}_{t_{i+\frac{1}{4}}} = \hat X^{\mathrm{lead}}_{t_{i+\frac{1}{2}}} = \hat X_{t_{i+1}}$ and $\hat X^{\mathrm{lead}}_{t_{i+\frac{3}{4}}} = \hat X_{t_{i+2}}$ for $i = 0,\dots,N-2$, with $\hat X^{\mathrm{lead}}_{t_{N-1}} = \hat X^{\mathrm{lead}}_{t_{N-\frac{3}{4}}} = \hat X^{\mathrm{lead}}_{t_{N-\frac{1}{2}}} = \hat X^{\mathrm{lead}}_{t_{N-\frac{1}{4}}} = \hat X^{\mathrm{lead}}_{t_N} = \hat X_{t_N}$, and $\hat X^{\mathrm{lag}}_{t_i} = \hat X^{\mathrm{lag}}_{t_{i+\frac{1}{4}}} = \hat X^{\mathrm{lag}}_{t_{i+\frac{1}{2}}} = \hat X^{\mathrm{lag}}_{t_{i+\frac{3}{4}}} = \hat X_{t_i}$ for $i = 0,\dots,N-1$, with $\hat X^{\mathrm{lag}}_{t_N} = \hat X_{t_{N-1}}$. (Strictly speaking, Definition 2.1 of [2] fails to assign a value to the penultimate point

Figure 2: FHL (Mark 1) method of lead-lag transforming data streams.

$\hat X^{\mathrm{lead}}_{t_{N-\frac{1}{4}}}$ of the lead transform.) Thus, under this method the lead and lag transforms of a time series of $N+1$ data points consist of $4N+1$ data points. In Figure 3 below these are illustrated for the same data stream that was used in Figures 1 and 2. This new definition was suggested by a context in mathematical finance where an investor would readjust at time $t_{i+1}$ the amounts of stock he holds in his portfolio based on the stock prices at time $t_i$ – i.e. where there is a delay between receiving market information and acting on it by trading – for defining the lead and lag transforms of the time series of stock prices in this way allows one to express the profit (or loss) made by the investor's trading strategy as an exact integral of a function of the lag transform with the lead transform as an integrator.

The problem with this definition is that lead and lag transforms specified as above do not preserve the increments of a data stream, since $\hat X^{\mathrm{lead}}_{t_0} = \hat X_{t_1}$ and $\hat X^{\mathrm{lag}}_{t_N} = \hat X_{t_{N-1}}$ mean that the first and last increments of the original data stream are missing from the lead and lag transformed streams, respectively, as can be seen from Figure 3 below, and consequently their signatures generally differ from that of the original data stream (as well as from each other), as one can readily verify with a numerical example. However, this situation can be easily remedied by redefining $\hat X^{\mathrm{lead}}_{t_0} = \hat X_{t_0}$ and $\hat X^{\mathrm{lag}}_{t_N} = \hat X_{t_N}$.

Figure 3: FHL (Mark 2) method of lead-lag transforming data streams.
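A sketch of the Mark 2 transform on the quarter grid $t_{i/4}$, including as an option the increment-preserving fix just described (the function name and the remedy flag are illustrative):

```python
import numpy as np

def fhl_mark2_lead_lag(X, remedy=True):
    """FHL (Mark 2) lead-lag transform of a stream X of N+1 points,
    producing 4N+1 points each on the quarter grid t_{i/4}."""
    X = np.asarray(X, dtype=float)
    N = len(X) - 1
    lead = np.empty((4 * N + 1,) + X.shape[1:])
    lag = np.empty_like(lead)
    for i in range(N - 1):
        lead[4 * i : 4 * i + 3] = X[i + 1]   # t_i, t_{i+1/4}, t_{i+1/2}
        lead[4 * i + 3] = X[i + 2]           # t_{i+3/4}
    lead[4 * (N - 1):] = X[N]                # t_{N-1} through t_N
    for i in range(N):
        lag[4 * i : 4 * i + 4] = X[i]        # t_i through t_{i+3/4}
    lag[4 * N] = X[N - 1]                    # per Definition 2.1 of [2]
    if remedy:                               # restore missing increments
        lead[0] = X[0]
        lag[4 * N] = X[N]
    return lead, lag
```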

2.4 Area processes of multi-dimensional paths

2.4.1 Definition and basic properties of areas

In Subsection 2.3.1 above we already encountered the concept of area between two components of a multi-dimensional path. This is now formalised in the following

Definition 2.1 (Area). Let $X : u \in [0,T] \mapsto (X^1_u,\dots,X^d_u) \in \mathbb{R}^d$ be a continuous path of finite length. Then the area $A^{(i,j)}_{s,t}$ between two path components $X^i_u$ and $X^j_u$ with $1 \le i, j \le d$ over any time interval $[s,t]$, where $0 \le s \le t \le T$, is defined by

$$A^{(i,j)}_{s,t} := \frac{1}{2}\left(\int_{v=s}^{t}\int_{u=s}^{v} dX^i_u\, dX^j_v - \int_{v=s}^{t}\int_{u=s}^{v} dX^j_u\, dX^i_v\right) = \frac{1}{2}\left(X^{(i,j)}_{s,t} - X^{(j,i)}_{s,t}\right).$$

Figure 4: Area between path components $X^i$ and $X^j$.

As immediate consequences of the above definition, we have that $A^{(i,i)}_{s,t} = 0$ and $A^{(i,j)}_{s,t} = -A^{(j,i)}_{s,t}$ for all $1 \le i, j \le d$ and $0 \le s \le t \le T$.

The quantity $A^{(i,j)}_{s,t}$ has a natural geometric interpretation – which also explains its name – as the signed area between the curve $u \mapsto (X^i_u, X^j_u)$ for $u \in [s,t]$ and the chord that connects the start point $(X^i_s, X^j_s)$ and end point $(X^i_t, X^j_t)$ of the curve. This is illustrated in Figure 4 above. For, it is clear that the second order iterated integrals $X^{(i,j)}_{s,t}$ and $X^{(j,i)}_{s,t}$ represent the areas between the curve and the vertical and horizontal axes, respectively, so that $X^{(i,j)}_{s,t} + X^{(j,i)}_{s,t} = X^{(i)}_{s,t} X^{(j)}_{s,t}$. Moreover, denoting the area between the curve and the chord by $A^{(i,j)}_{s,t}$, which is shaded yellow in Figure 4, we have that $A^{(i,j)}_{s,t} + X^{(j,i)}_{s,t} = \frac{1}{2} X^{(i)}_{s,t} X^{(j)}_{s,t}$, which, when substituted into the previous equation, yields $A^{(i,j)}_{s,t} = \frac{1}{2}\big(X^{(i,j)}_{s,t} - X^{(j,i)}_{s,t}\big)$.

One should emphasize that $A^{(i,j)}_{s,t}$ is a signed area, i.e. that it can take positive or negative (or zero) values: indeed, if $A^{(i,j)}_{s,t} > 0$, then, as we have seen, it follows straight from the definition that $A^{(j,i)}_{s,t} < 0$; or, in geometric terms, reflecting the curve in Figure 4 in the chord connecting its start and end points, which corresponds to reversing the order of the path components in the area calculation, would give a negative area (of the same absolute magnitude) above the chord.
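For a sampled path embedded by piecewise linear interpolation, the area process can be computed exactly in closed form, since the cross terms within each linear segment cancel. A minimal sketch (area_process is an illustrative name; it will be reused in later sections):

```python
import numpy as np

def area_process(x, y):
    """Running signed area A^{(1,2)}_{0, t_n} between two sampled path
    components x and y under piecewise linear interpolation. Within each
    segment the cross terms cancel, leaving the exact increments
    dA_n = (1/2) * [(x_n - x_0) * dy_n - (y_n - y_0) * dx_n]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = np.diff(x), np.diff(y)
    increments = 0.5 * ((x[:-1] - x[0]) * dy - (y[:-1] - y[0]) * dx)
    return np.concatenate([[0.0], np.cumsum(increments)])
```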

Figure 5: A typical 2-dimensional Brownian sample path.

However, sample paths of stochastic processes don't usually look like the smooth monotonic curve displayed in Figure 4. Rather, a typical 2-dimensional Brownian-like random walk with independent normal increments is shown in Figure 5 above. It should be noted that for less regular paths that zigzag and cross themselves the simple geometric interpretation of the area between path components as the signed area enclosed between the curve and the chord is no longer valid in general. In particular, it is easy to see that the sign of the area enclosed within a loop – a path segment whose start and end points coincide – depends on the direction in which the loop is traversed. Yet none of the many research articles or text books we have come across that deal with areas between path components point out this basic fact while presenting the standard geometric interpretation.

Another invalid view about areas between path components that can be found in the literature is expressed in the following statement (see Section 2.3 of [5] under the heading of 'Lead-lag relationship'): "if an increase (respectively decrease) of the component $X^1$ is typically followed by an increase (decrease) in the component $X^2$, then the area $A^{1,2}$ is positive. If a move in $X^1$ is followed by a move in $X^2$ to the opposite direction, the area is negative". This assertion is clearly false, since, for example in Figure 4 above, both the path and its reflection in the chord connecting its start and end points have positive increments in their components, but the area enclosed by the former below the chord is positive whereas the area enclosed by the latter above the chord is negative. It is also evident that the area between path components which have increments of opposite signs can be either positive or negative (or zero).

Even though the area between path components does not capture correlation between increments in the components, as is clear from the above discussion, by viewing the operation of taking iterated integrals of path components as a kind of 'product' on the space of path components, the operation of computing areas between path components can be regarded as a 'commutator' or 'Lie bracket' on the space of path components, and in this sense the area between two path components can be viewed as a measure of their non-commutativity. We will formalise this idea in the next subsection, and will subsequently explore what kind of algebraic structure it endows on the path space.

2.4.2 Higher order areas

In the previous subsection, we defined the area $A^{(i_1,i_2)}_{s,t}$ between two components $X^{i_1}_u$ and $X^{i_2}_u$ of a continuous $d$-dimensional path $(X^1_u,\dots,X^d_u)_{0 \le u \le T}$ of finite length, where $i_k \in \{1,\dots,d\}$ for $k = 1, 2$, over a time interval $[s,t]$ with $0 \le s \le t \le T$. For a fixed $s$, $A^{(i_1,i_2)}_{s,u}$ is a function of $u$ for $s \le u \le t$, and thus $A^{(i_1,i_2)}_{s,u}$ can be viewed as a 1-dimensional path defined on the time interval $[s,t]$. Furthermore, as $A^{(i_1,i_2)}_{s,u}$ is clearly continuous and has finite length, we can define the second order area $A^{((i_1,i_2),i_3)}_{s,t}$ between three path components $X^{i_1}_u$, $X^{i_2}_u$ and $X^{i_3}_u$, where $i_k \in \{1,\dots,d\}$ for $k = 1, 2, 3$, over a time interval $[s,t]$ with $0 \le s \le t \le T$ as follows:

$$A^{((i_1,i_2),i_3)}_{s,t} := \frac{1}{2}\left(\int_{v=s}^{t}\int_{u=s}^{v} dA^{(i_1,i_2)}_{s,u}\, dX^{i_3}_v - \int_{v=s}^{t}\int_{u=s}^{v} dX^{i_3}_u\, dA^{(i_1,i_2)}_{s,v}\right). \tag{2.1}$$

By the anti-commutativity of the area operation with respect to the order of path indices, we have that $A^{((i_1,i_2),i_3)}_{s,t} = -A^{(i_3,(i_1,i_2))}_{s,t} = A^{(i_3,(i_2,i_1))}_{s,t}$. However, one should carefully note that the operation of forming second order areas is not associative – the way path indices are bracketed certainly matters – so that in general $A^{((i_1,i_2),i_3)}_{s,t}$ is not equal to $A^{(i_1,(i_2,i_3))}_{s,t}$. For example, when $i_2 = i_3$, $A^{(i_1,(i_2,i_3))}_{s,t} = 0$, but for $i_1 \ne i_2$, $A^{((i_1,i_2),i_3)}_{s,t}$ may well be non-zero.

Let us examine the area differential $dA^{(i_1,i_2)}_{s,u} := A^{(i_1,i_2)}_{s,u+du} - A^{(i_1,i_2)}_{s,u}$ that appeared in the above definition, as it will not only enable us to express second and higher order areas (to be analogously defined shortly) as linear combinations of iterated integrals of one degree higher, but will also give the crucial idea for our classification of paths using third order areas. Since the area differential can be written as

$$dA^{(i_1,i_2)}_{s,u} = \frac{1}{2}\left(\left(\int_{v=s}^{u} dX^{i_1}_v\right) dX^{i_2}_u - \left(\int_{v=s}^{u} dX^{i_2}_v\right) dX^{i_1}_u\right) = \frac{1}{2}\left\{\left(X^{i_1}_u - X^{i_1}_s\right) dX^{i_2}_u - \left(X^{i_2}_u - X^{i_2}_s\right) dX^{i_1}_u\right\}, \tag{2.2}$$

we have that $dA^{(i_1,i_2)}_{s,u} = 0$ is equivalent to

$$dX^{i_2}_u = \frac{X^{i_2}_u - X^{i_2}_s}{X^{i_1}_u - X^{i_1}_s}\, dX^{i_1}_u$$

provided that $X^{i_1}_u - X^{i_1}_s \ne 0$. Therefore, it follows that $A^{(i_1,i_2)}_{s,u} = 0$ for all $u \in [s,t]$ if and only if

$$X^{i_2}_u = X^{i_2}_s + \frac{X^{i_2}_t - X^{i_2}_s}{X^{i_1}_t - X^{i_1}_s}\left(X^{i_1}_u - X^{i_1}_s\right)$$

with $X^{i_1}_t \ne X^{i_1}_s$, or $X^{i_1}_u = X^{i_1}_s$ for all $u \in [s,t]$. Thus, the area process $u \mapsto A^{(i_1,i_2)}_{s,u}$ for $u \in [s,t]$ is identically zero if and only if the point $(X^{i_1}_u, X^{i_2}_u)$ traces a straight line of fixed slope from the start point $(X^{i_1}_s, X^{i_2}_s)$ to the end point $(X^{i_1}_t, X^{i_2}_t)$ for $u \in [s,t]$ – possibly moving backwards and forwards or pausing for some time along the way. Indeed, this is a natural result bearing in mind the earlier picture of an area between a curve and the chord connecting its start and end points! For, in order to have a zero area enclosed between a curve and its chord at every point in time the two must always coincide; equivalently, a non-zero area can arise only if the curve has non-zero curvature at some point in time. Thus, the area process of two path components can be seen to capture any curvature in their trajectory.

We can now use the above expression for the area differential to derive a general formula for a second order area as a linear combination of third order iterated integrals. For, by substituting (2.2) into (2.1), after some manipulation of the iterated integrals, we obtain

$$A^{(i_1,(i_2,i_3))}_{s,t} = \frac{1}{4}\left(X^{(i_1,i_2,i_3)}_{s,t} + X^{(i_2,i_1,i_3)}_{s,t} + X^{(i_3,i_2,i_1)}_{s,t} - X^{(i_1,i_3,i_2)}_{s,t} - X^{(i_2,i_3,i_1)}_{s,t} - X^{(i_3,i_1,i_2)}_{s,t}\right). \tag{2.3}$$

For indices $i_1$, $i_2$ and $i_3$ that are all distinct, the third order iterated integrals corresponding to all the permutations of them generally have different values, so that the expression for their second order area consists of $6\,(= 3!)$ different terms, whereas, when $i_1 = i_2$ or $i_1 = i_3$, two of the terms cancel each other out and the remaining four comprise two pairs of equal terms so that $A^{(i_1,(i_1,i_3))}_{s,t} = \frac{1}{2}\big(X^{(i_1,i_1,i_3)}_{s,t} - X^{(i_1,i_3,i_1)}_{s,t}\big)$, and, when $i_2 = i_3$, $A^{(i_1,(i_2,i_3))}_{s,t}$ of course vanishes. In particular, one should note that the signs of the iterated integrals in the expression for the second order area are not given by the signs of the permutations of the indices.

Analogously, third order areas involve four path components, and can be defined

in two fundamentally different ways: as the area between path component $X^{i_1}_u$ and the second order area $A^{(i_2,(i_3,i_4))}_{s,u}$ between path components $X^{i_2}_u$, $X^{i_3}_u$ and $X^{i_4}_u$, or as the area between the areas $A^{(i_1,i_2)}_{s,u}$ and $A^{(i_3,i_4)}_{s,u}$ of two pairs of path components. Formally, we define

$$A^{(i_1,(i_2,(i_3,i_4)))}_{s,t} := \frac{1}{2}\left(\int_{v=s}^{t}\int_{u=s}^{v} dX^{i_1}_u\, dA^{(i_2,(i_3,i_4))}_{s,v} - \int_{v=s}^{t}\int_{u=s}^{v} dA^{(i_2,(i_3,i_4))}_{s,u}\, dX^{i_1}_v\right) \tag{2.4}$$

$$A^{((i_1,i_2),(i_3,i_4))}_{s,t} := \frac{1}{2}\left(\int_{v=s}^{t}\int_{u=s}^{v} dA^{(i_1,i_2)}_{s,u}\, dA^{(i_3,i_4)}_{s,v} - \int_{v=s}^{t}\int_{u=s}^{v} dA^{(i_3,i_4)}_{s,u}\, dA^{(i_1,i_2)}_{s,v}\right). \tag{2.5}$$

For our path classification purposes, we will find third order areas of the latter type much more useful, and in the sequel whenever we refer to a third order area without qualification we will always mean a third order area of this type. Applying the differential operator to (2.3) and substituting into (2.4) yields

$$\begin{aligned}
A^{(i_1,(i_2,(i_3,i_4)))}_{s,t} = \frac{1}{8}\Big(
&X^{(i_1,i_2,i_3,i_4)}_{s,t} + X^{(i_2,i_1,i_3,i_4)}_{s,t} + X^{(i_2,i_3,i_1,i_4)}_{s,t}\\
&+ X^{(i_1,i_3,i_2,i_4)}_{s,t} + X^{(i_3,i_1,i_2,i_4)}_{s,t} + X^{(i_3,i_2,i_1,i_4)}_{s,t}\\
&+ X^{(i_1,i_4,i_3,i_2)}_{s,t} + X^{(i_4,i_1,i_3,i_2)}_{s,t} + X^{(i_4,i_3,i_1,i_2)}_{s,t}\\
&+ X^{(i_2,i_4,i_3,i_1)}_{s,t} + X^{(i_3,i_4,i_2,i_1)}_{s,t} + X^{(i_4,i_2,i_3,i_1)}_{s,t}\\
&- X^{(i_1,i_2,i_4,i_3)}_{s,t} - X^{(i_2,i_1,i_4,i_3)}_{s,t} - X^{(i_2,i_4,i_1,i_3)}_{s,t}\\
&- X^{(i_1,i_4,i_2,i_3)}_{s,t} - X^{(i_4,i_1,i_2,i_3)}_{s,t} - X^{(i_4,i_2,i_1,i_3)}_{s,t}\\
&- X^{(i_1,i_3,i_4,i_2)}_{s,t} - X^{(i_3,i_1,i_4,i_2)}_{s,t} - X^{(i_3,i_4,i_1,i_2)}_{s,t}\\
&- X^{(i_2,i_3,i_4,i_1)}_{s,t} - X^{(i_3,i_2,i_4,i_1)}_{s,t} - X^{(i_4,i_3,i_2,i_1)}_{s,t}\Big).
\end{aligned} \tag{2.6}$$

Similarly, substituting (2.2) into (2.5) gives

$$\begin{aligned}
A^{((i_1,i_2),(i_3,i_4))}_{s,t} = \frac{1}{8}\Big(
&X^{(i_1,i_2,i_3,i_4)}_{s,t} + X^{(i_1,i_3,i_2,i_4)}_{s,t} + X^{(i_3,i_1,i_2,i_4)}_{s,t}\\
&+ X^{(i_2,i_1,i_4,i_3)}_{s,t} + X^{(i_2,i_4,i_1,i_3)}_{s,t} + X^{(i_4,i_2,i_1,i_3)}_{s,t}\\
&+ X^{(i_4,i_3,i_1,i_2)}_{s,t} + X^{(i_4,i_1,i_3,i_2)}_{s,t} + X^{(i_1,i_4,i_3,i_2)}_{s,t}\\
&+ X^{(i_3,i_4,i_2,i_1)}_{s,t} + X^{(i_3,i_2,i_4,i_1)}_{s,t} + X^{(i_2,i_3,i_4,i_1)}_{s,t}\\
&- X^{(i_2,i_1,i_3,i_4)}_{s,t} - X^{(i_2,i_3,i_1,i_4)}_{s,t} - X^{(i_3,i_2,i_1,i_4)}_{s,t}\\
&- X^{(i_1,i_2,i_4,i_3)}_{s,t} - X^{(i_1,i_4,i_2,i_3)}_{s,t} - X^{(i_4,i_1,i_2,i_3)}_{s,t}\\
&- X^{(i_3,i_4,i_1,i_2)}_{s,t} - X^{(i_3,i_1,i_4,i_2)}_{s,t} - X^{(i_1,i_3,i_4,i_2)}_{s,t}\\
&- X^{(i_4,i_3,i_2,i_1)}_{s,t} - X^{(i_4,i_2,i_3,i_1)}_{s,t} - X^{(i_2,i_4,i_3,i_1)}_{s,t}\Big).
\end{aligned} \tag{2.7}$$

Thus, as one can see from (2.6) and (2.7) above, the general expressions for the two types of third order area as linear combinations of $4! = 24$ fourth order iterated integrals are indeed different. Further, general formulae for different types of higher order areas could be worked out just as straightforwardly – though the tedium of such an exercise increases factorially, and for our current purposes it is unnecessary to go beyond third order areas. In addition to mathematically proving (2.3), (2.6) and (2.7), these expressions have also been verified numerically by computing second and third order areas from first principles according to their definitions (2.1), (2.4) and (2.5), and by comparing the results obtained using the two methods to make sure that they agree.

Finally, we return to our earlier suggestion of viewing the operation of computing areas as a Lie bracket on the space of 1-dimensional paths. For, if $(X^1_u)_{s \le u \le t}$ and $(X^2_u)_{s \le u \le t}$ are two paths defined over the time interval $[s,t]$, we can define their product $(X^1 * X^2)_{s \le u \le t}$ to be the path $u \mapsto X^{(1,2)}_{s,u}$ for $u \in [s,t]$, i.e. multiplying paths means computing their second order iterated integral. Further, we define the Lie bracket $[X^1, X^2]_{s \le u \le t}$ of $X^1_u$ and $X^2_u$ to be their commutator, so that for $u \in [s,t]$

$$[X^1, X^2]_u := (X^1 * X^2 - X^2 * X^1)_u = X^{(1,2)}_{s,u} - X^{(2,1)}_{s,u} = 2A^{(1,2)}_{s,u}. \tag{2.8}$$

It is clear that the multiplication $*$ on the path space is non-commutative, since in general $X^{(1,2)}_{s,u}$ is not equal to $X^{(2,1)}_{s,u}$, and it is also important to realise that it is not associative either, as can readily be seen through theoretical considerations as well as verified by numerical examples, i.e.

$$\big((X^1 * X^2) * X^3\big)_u \ne \big(X^1 * (X^2 * X^3)\big)_u$$

for arbitrary paths $X^1_u$, $X^2_u$ and $X^3_u$ defined on $[s,t]$. Moreover, it should be pointed out that generally $((X^1 * X^2) * X^3)_u$ is not equal to $X^{(1,2,3)}_{s,u}$: e.g. for a 3-dimensional linear path $(X^1_u, X^2_u, X^3_u)_{s \le u \le t}$ the former equals $\frac{1}{4} X^{(1)}_{s,u} X^{(2)}_{s,u} X^{(3)}_{s,u}$ whereas the latter equals $\frac{1}{6} X^{(1)}_{s,u} X^{(2)}_{s,u} X^{(3)}_{s,u}$.

Despite being both bilinear and anti-commutative, the Lie bracket defined by (2.8) does not endow the path space with the algebraic structure of a Lie algebra, because the Jacobi identity does not hold due to the non-associativity of the $*$ multiplication. However, using (2.3), the Lie bracket is easily seen to satisfy the following interesting identity for any $u \in [s,t]$:

$$\big[[X^1,X^2],X^3\big]_u + \big[[X^2,X^3],X^1\big]_u + \big[[X^3,X^1],X^2\big]_u = X^{(1,2,3)}_{s,u} - X^{(1,3,2)}_{s,u} + X^{(2,3,1)}_{s,u} - X^{(2,1,3)}_{s,u} + X^{(3,1,2)}_{s,u} - X^{(3,2,1)}_{s,u}.$$

Thus, we note that the expression on the right hand side of the above Jacobi-type identity for our Lie bracket is an alternating sum of third order iterated integrals over all the permutations of the path indices, where the sign of the term indexed by $(i_1, i_2, i_3)$ with distinct $i_k \in \{1,2,3\}$ for $k = 1,2,3$ is the sign of the permutation that maps $(1,2,3)$ to $(i_1,i_2,i_3)$.
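The numerical verification 'from first principles' mentioned above amounts to iterating the first order area computation; a sketch building on the illustrative area_process helper of Subsection 2.4.1. Note that the area process of a piecewise linear path is piecewise quadratic rather than piecewise linear, so treating its samples as a new piecewise linear path makes these functions first order approximations of (2.1), (2.4) and (2.5) that converge as the time grid is refined.

```python
def second_order_area(x1, x2, x3):
    """A^{((1,2),3)} of (2.1): the area between the sampled first order
    area process of (x1, x2), viewed as a path, and x3."""
    return area_process(area_process(x1, x2), x3)

def third_order_area(x1, x2, x3, x4):
    """A^{((1,2),(3,4))} of (2.5): the area between the two sampled
    first order area processes."""
    return area_process(area_process(x1, x2), area_process(x3, x4))
```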

2.5 Classification of paths using third order areas

2.5.1 Diffusion process market model

The idea of modelling the evolution of stock prices and other variables in financial markets by diffusion processes goes back to the pioneering work of L. Bachelier at the turn of the 20th century. By a diffusion process we mean an $n$-dimensional stochastic process $X_t = (X^1_t,\dots,X^n_t)_{t \ge 0}$ that satisfies a stochastic differential equation of the form

$$dX^i_t = \mu^i(t, X_t)\,dt + \sum_{j=1}^{m} \sigma^i_j(t, X_t)\,dB^j_t$$

where, for each $i \in \{1,\dots,n\}$, $\mu^i : \mathbb{R}^{n+1} \to \mathbb{R}$ and $\sigma^i_j : \mathbb{R}^{n+1} \to \mathbb{R}$ are the drift and diffusion (volatility) coefficients of $X^i_t$, and $(B^j_t)_{t \ge 0}$ are independent Brownian motion processes for $j = 1,\dots,m$.

In a financial setting, a diffusion process $X_t = (X^1_t,\dots,X^n_t)_{t \ge 0}$ may be used to model a financial market that consists of $n$ market variables $X^i_t$ ($i = 1,\dots,n$) such as stock and commodity prices, currency exchange rates and interest rates of various maturities. Thus, in such a modelling framework the market is driven by

$m$ random processes – which may be interpreted as representing macro- and micro-economic factors as well as political or other events that affect the prices of financial instruments – whose impacts on the value of each market variable are superimposed on an underlying growth rate or 'drift' that is specific to each variable but may depend on the values of other variables as well as time. In an alternative, more restricted formulation, each market variable would be driven by a single random process associated with it, though the Brownian motions for different variables would be assumed to be correlated, and the drift and volatility of each variable would be functions of its own value and time only. In this section, we shall adopt this approach to modelling a financial market, as specified below.

Some market variables could be expected to grow in value at a constant or possibly deterministic time-dependent rate: e.g. it might be appropriate to assume that the share price of a public listed company would grow at a constant rate subject to random shocks due to market releases of company-specific information or general economic data. By contrast, other market variables have a tendency to fluctuate around some long-term mean levels in a cyclical fashion – e.g. interest rates exhibit such mean-reverting behaviour over economic cycles – and for any such market variable $X^i_t$ one could postulate its drift to be of the form $\theta^i(\alpha^i - X^i_t)$ where the parameters $\theta^i$ and $\alpha^i$ are called its mean reversion speed and mean-reverting level (or long-term mean), respectively, which in general might be time-dependent. Hence, if the current value of such a mean-reverting process is below its long-term mean, the drift is positive, whereas if its current value is above the mean-reverting level, the drift is negative, so that in both cases the process tends to revert towards its long-term mean.

From now on, we shall consider a financial market consisting of $n$ variables $X^i_t$ ($i = 1,\dots,n$) each of which follows either a Wiener process with a constant drift ('CD process')

$$dX^i_t = \mu^i\,dt + \sigma^i\,dB^i_t \tag{2.9}$$

where $\mu^i$ and $\sigma^i > 0$ are constants, or a mean-reverting Ornstein-Uhlenbeck process ('MR process')

$$dX^i_t = \theta^i\left(\alpha^i - X^i_t\right)dt + \sigma^i\,dB^i_t \tag{2.10}$$

where $\alpha^i$, $\theta^i > 0$ and $\sigma^i > 0$ are constants, and the correlation $\rho^{ij}$ between the Brownian motion $B^i_t$ driving the variable $X^i_t$ and the Brownian driver $B^j_t$ of another variable $X^j_t$ is given by $\mathbb{E}[dB^i_t\,dB^j_t] = \rho^{ij}\,dt$. For $t \ge 0$, (2.9) and (2.10) can be easily integrated to give

$$X^i_t = X^i_0 + \mu^i t + \sigma^i B^i_t \tag{2.11}$$

and

$$X^i_t = X^i_0 e^{-\theta^i t} + \alpha^i\left(1 - e^{-\theta^i t}\right) + \sigma^i \int_0^t e^{-\theta^i (t-s)}\, dB^i_s. \tag{2.12}$$

Thus, for a mean-reverting variable $X^i_t$, the limit of $\mathbb{E}[X^i_t]$ as $t$ tends to infinity is $\alpha^i$, so this parameter is indeed the long-term mean of such a process.

We will simulate the evolution of our market $X_t = (X^1_t,\dots,X^n_t)_{0 \le t \le T}$, as defined above, over a finite time horizon $[0,T]$ by generating a large number of correlated Brownian sample paths $(B^i_t)_{0 \le t \le T}$ for the market variables $X^i_t$ for $i = 1,\dots,n$, assuming that all pairs of Brownian motions driving distinct market variables have the same correlation $\rho$.

Even though in our diffusion model all paths of market variables are, by construction, realisations of CD or MR processes for some combinations of drift (or mean reversion speed and long-term mean) and volatility parameters, just by looking at such market paths it is usually far from apparent which type of diffusion process was used to generate them and what parameter values those processes may have had. For example, individual realisations of an MR process for different Brownian sample paths might well appear either upward or downward trending rather than exhibiting mean-reverting behaviour, depending on the characteristics of the Brownian sample paths driving the process. Indeed, an arbitrary continuous path can be represented as a realisation of either a CD or MR process with any combination of parameters for some continuous path as the Brownian driver of the given path; and, further, any number of arbitrary paths can all be regarded as realisations of, say, the same CD process driven by different Brownian sample paths.

As can be seen from (2.9) and (2.10), for a small time step $\delta t$, the drift term of the increment of a CD or MR process is of the order of $\delta t$ whereas the expected absolute value of the diffusive term is of the order of $\sqrt{\delta t}$. This means that over short time scales the diffusive term tends to dominate the drift term, with random Brownian movements obscuring any underlying constant drift or mean-reverting trend, such latent tendencies only manifesting themselves over longer time periods. In practice, when trying to estimate the parameters of a CD process one commonly finds that while one can usually obtain a reasonably accurate estimate for the volatility parameter by calculating the standard deviation of increments for a small number of realised paths – even for a single realisation – the sample mean of increments is often, even for a significantly larger number of realisations, a woefully inadequate estimate for the drift parameter.
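For the simulations that follow, CD and MR sample paths on a common time grid can be generated from shared Brownian increments. A minimal sketch, using an Euler-Maruyama discretisation for the MR case rather than the exact solution (2.12); the function names are illustrative:

```python
import numpy as np

def cd_path(x0, mu, sigma, dB, dt):
    """Realisation of the constant drift process (2.9) on a fixed grid,
    driven by the given Brownian increments dB."""
    return np.concatenate([[x0], x0 + np.cumsum(mu * dt + sigma * dB)])

def mr_path(x0, theta, alpha, sigma, dB, dt):
    """Euler-Maruyama realisation of the Ornstein-Uhlenbeck process
    (2.10), driven by the same grid of Brownian increments."""
    x = np.empty(len(dB) + 1)
    x[0] = x0
    for i, db in enumerate(dB):
        x[i + 1] = x[i] + theta * (alpha - x[i]) * dt + sigma * db
    return x
```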

2.5.2 Areas for pairs of diffusion processes

The key to classifying paths in our diffusion process market model is to derive general expressions for the area differentials of pairs of realisations of CD or MR processes.

For two CD processes $(X^{i_k}_t)_{0 \le t \le T}$ with drift and volatility parameters $\mu^{i_k}$ and $\sigma^{i_k}$, where $i_k \in \{1,\dots,n\}$ for $k \in \{1,2\}$, both of which are driven by the same Brownian path $(B_t)_{0 \le t \le T}$, by substituting (2.9) and (2.11) into (2.2) one readily obtains

$$dA^{(i_1,i_2)}_{0,t} = \frac{1}{2}\left(\mu^{i_1}\sigma^{i_2} - \mu^{i_2}\sigma^{i_1}\right)\left\{t\, dB_t - B_t\, dt\right\} \tag{2.13}$$

for $0 \le t \le T$. Similarly, for two MR processes $(Y^{i_k}_t)_{0 \le t \le T}$ with long-term mean and volatility parameters $\alpha^{i_k}$ and $\sigma^{i_k}$ and the same mean reversion speed $\theta$, where $i_k \in \{1,\dots,n\}$ for $k \in \{1,2\}$, both of which are driven by the same Brownian path $(B_t)_{0 \le t \le T}$, by substituting (2.10) and (2.12) into (2.2) and after some algebra one arrives at the following expression:

$$dA^{(i_1,i_2)}_{0,t} = \frac{1}{2}\left\{\left(\alpha^{i_1} - Y^{i_1}_0\right)\sigma^{i_2} - \left(\alpha^{i_2} - Y^{i_2}_0\right)\sigma^{i_1}\right\}\left\{f_t(\theta)\, dB_t - \theta Z_t(\theta)\, dt\right\} \tag{2.14}$$

where $f_t(\theta) = 1 - e^{-\theta t}$ and $Z_t(\theta) = \int_0^t e^{-\theta(t-s)}\, dB_s$ for $0 \le t \le T$. We note that as $\theta \to 0$, $f_t(\theta) \to \theta t$ and $Z_t(\theta) \to B_t$, so in the limit when the mean reversion speed approaches zero, we recover (2.13) with the drifts $\mu^{i_k}$ equal to the initial drifts $\theta(\alpha^{i_k} - Y^{i_k}_0)$ of the two MR processes for $k \in \{1,2\}$.

Just as straightforwardly one could also derive a (somewhat messier) expression for the area differential of a CD process and an MR process driven by the same

Brownian path as a linear combination of $dB_t$ and $dt$ terms whose coefficients are functions of $f_t(\theta)$, $t$, $B_t$ and $Z_t(\theta)$ as well as the parameters of the two processes, but since this won't be used in our subsequent analysis it is omitted.

Hence, as can be seen from (2.13) and (2.14), the increment of the area of two CD or two MR processes with the same mean reversion speed and driven by the same Brownian path over a small (non-infinitesimal) time interval $\delta t$ has a stochastic dependence on the cumulative value of the underlying Brownian motion as well as a deterministic time dependence, and, depending on the relative magnitudes and signs of these quantities, the area increment can have either the same or the opposite sign as the Brownian increment $\delta B_t$ at different points in time along the path. Thus, in this sense increments of the area process of two such CD or MR processes are in general neither perfectly correlated nor anti-correlated with increments of the Brownian motion driving the processes.

However, the key observation is that the ratio of area increments of two pairs of CD processes, or of two pairs of MR processes that have the same mean reversion speed, with all four processes driven by the same Brownian path, is a deterministic, time-invariant constant that depends only on the drift and volatility parameters of the four CD processes or on the long-term mean and volatility parameters of the four MR processes (as well as their initial values), but is independent of the mean reversion speed shared by the four MR processes. This means that in these cases the trajectory $t \mapsto \big(A^{(i_1,i_2)}_{0,t}, A^{(i_3,i_4)}_{0,t}\big)$ is a straight line of constant gradient for $t \in [0,T]$, or, equivalently, the third order area $A^{((i_1,i_2),(i_3,i_4))}_{0,t}$ is identically equal to zero for all $t \in [0,T]$. This is illustrated in Figure 6 below where, at time points $t = \frac{i}{100}T$ for $i = 0,1,\dots,100$, the area of two MR processes is plotted against the area of another pair of MR processes with different long-term mean and volatility parameters, but all four processes having the same mean reversion speed and driven by the same Brownian path.

Figure 6: Scatter plot of areas of two pairs of MR processes with different long-term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path.
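A sketch of the experiment behind Figure 6, reusing the illustrative mr_path, area_process and third_order_area helpers from above; the specific parameter values are arbitrary stand-ins, not those used for the thesis figures:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 100, 1.0
dt = T / n
dB = np.sqrt(dt) * rng.standard_normal(n)    # one shared Brownian path

theta = 5.0                                  # common mean reversion speed
Y1 = mr_path(0.0, theta, 1.0, 0.2, dB, dt)
Y2 = mr_path(0.0, theta, -0.5, 0.3, dB, dt)
Y3 = mr_path(0.0, theta, 2.0, 0.1, dB, dt)
Y4 = mr_path(0.0, theta, 0.7, 0.4, dB, dt)

A12 = area_process(Y1, Y2)                   # areas of the two pairs
A34 = area_process(Y3, Y4)
# The points (A12[i], A34[i]) should lie, up to discretisation error,
# on a straight line through the origin, i.e. the third order area
# should stay close to zero along the whole path.
print(np.max(np.abs(third_order_area(Y1, Y2, Y3, Y4))))
```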

To underline the fundamental importance of the assumptions of the same mean reversion speed for MR processes and a common Brownian driving path to the linear relationship between areas of pairs of CD or MR processes, Figure 7 below illustrates the dramatic impact that changing the mean reversion speed – even slightly – for one of the MR processes that were used to create Figure 6 can have on this relationship – it suddenly becomes highly non-linear!

Figure 7: Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes.

In fact, the area process of a pair of CD or MR processes with the same mean reversion speed can sometimes be completely ‘derailed’ in this manner by changing any parameter of one of the processes for the duration of a single time step only while having a barely perceptible effect on the future evolution of the process that has been momentarily perturbed. So, any hope that the above result on the linear relationship between area processes could be extended to CD or MR processes with deterministic time-dependent drift or volatility parameters is unfortunately forlorn (as is also evident from a theoretical perspective by relaxing the parameter assumptions in (2.13) and (2.14) above) – it only works for constant parameters.

If instead one pairs a CD process with an MR process and plots their area against the area of a pair of either two CD processes or two MR processes, or against that of another mixed pair of CD and MR processes, one can no longer expect to observe a linear relationship between the two area processes even though all four path processes may still be driven by the same Brownian motion. And, naturally, the same will be true for a scatter plot of the areas of a pair of CD paths and a pair of MR paths. Figure 8 below exhibits the areas of a pair of CD processes and a mixed pair of CD and MR processes plotted against each other for one Brownian sample path – and for different realisations of Brownian motion this scatter plot would look very different!

Figure 8: Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes all driven by the same Brownian path.

As a further example of intriguing non-linear behaviour that can be witnessed in some such cases, in Figure 9 below we display the final values of the areas of two mixed pairs of CD and MR processes at the end of the time interval $[0,T]$ for 500 Brownian sample paths.

Figure 9: Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs.

As can be seen from Figure 9, different Brownian paths have quite disparate effects on these two area processes, resulting in a highly non-linear scatter plot. It is rather surprising that by simply changing some of the parameter values (e.g. the long-term means of the two MR processes) the relationship between the two area processes can be rendered approximately linear, as is shown in Figure 10 below, so that for each Brownian path both of these areas evolve in essentially the same way. Furthermore, the linear relationship can be made even tighter by increasing the mean reversion speeds of the MR processes (without affecting the slope).

Figure 10: Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes.

As said, for the areas of two pairs of CD or MR processes to be linearly related, in addition to all four processes having the same mean reversion speed (regarding CD processes as having zero mean reversion speed even though in general they have non-zero drifts), it is also a crucial requirement that they should all be driven by the same Brownian path. To illustrate this point, in Figure 11 below are plotted the terminal values of the areas of the two pairs of MR processes that were used to produce Figure 6, for 500 simulation runs, under scenarios where the Brownian motions driving the four MR processes have pairwise correlations of 1.00, 0.99 and 0.90, respectively. We observe in the scatter plots increasing dispersion about the gradient as correlation is lowered. Indeed, this effect is quite pronounced even for a marginal reduction in correlation from 1.00 to 0.99, though this makes the Brownian paths diverge only slightly from one another. Reducing correlation further to 0.90 – which is still a high value for path correlation – renders the linear relationship scarcely discernible, and for lower values of correlation the points appear to be scattered at random without any recognisable pattern.

Figure 11: Scatter plots of terminal values of the areas of two pairs of MR processes, all having the same mean reversion speed, for 500 simulation runs with the pairwise correlation between Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively.

Alternatively, one could consider the third order area of four CD or MR processes with the same mean reversion speed for a single simulation run (with the same four sequences of independent Gaussian increments) for different values of the pairwise correlation $\rho$. By examining the entries of the Cholesky factorization of the correlation matrix² used to produce correlated Gaussian increments from four sequences of independent drawings from the standard normal distribution, it is not too difficult to see that, for values of $\rho$ close to 1, the third order area is $O\big(\sqrt{1-\rho^2}\big)$, so that, for example, reducing correlation from 99.99% to 99.00% increases the third order area approximately by a factor of 10. However, when all the diffusion processes have the same parameters – so that the four paths are realisations of the same CD or MR process for highly correlated Brownian motions – in the expression (2.7) for the third order area all the fourth order iterated integrals that are $O\big(\sqrt{1-\rho^2}\big)$ actually cancel out, wherefore in this case the third order area is in fact $O(1-\rho^2)$, and, for this reason, it grows twice as fast on a logarithmic scale as correlation is reduced from 1.

² For it can readily be shown that each of the entries below the top row $(1, \rho, \rho, \rho)$ of the upper triangular Cholesky factor tends to some constant multiple of $\sqrt{1-\rho^2}$ as $\rho$ approaches 1.

2.5.3 Classifying sample paths of diffusion processes by using third order areas

In the previous subsection, we established, via (2.13) and (2.14), the fundamental result that the area process of the areas of two pairs of paths – i.e. the third order area (of this type) of the four paths concerned – that are realisations of either CD processes (with arbitrary drift and volatility parameters) or MR processes which have the same mean reversion speed (but arbitrary long-term means and volatilities) is identically zero as a function of time whenever all of these processes are driven by the same Brownian motion.

We also saw that the third order area provides a sensitive means of detecting differences in mean reversion speeds between four given diffusion processes that are driven by the same Brownian motion, as the value of the third order area of their sample paths can deviate spectacularly from zero even when discrepancies in mean reversion speeds are so slight that their impacts on the sample paths are hardly visible.

Thus, third order areas can be used to classify sample paths of diffusion processes according to their mean reversion speeds, for given an arbitrary 'market' path that is assumed to be the realisation of a CD or MR process for some Brownian sample path, we can combine it with three other paths that are realisations of either CD or MR processes with the same mean reversion speed (but drift or long-term mean and volatility parameters that can be given arbitrary values) all driven by the same Brownian motion, and run a sufficiently large number of simulations to determine the value of mean reversion speed and the Brownian sample path that minimise the third order area of these four paths. More precisely, we want to find the combination of mean reversion speed and Brownian sample path that minimises the Euclidean norm of the third order area over the time interval $[0,T]$, as given by the expression below

$$\left(\sum_{i=0}^{n} \left(A^{((1,2),(3,4))}_{0,\,i(T/n)}\right)^2\right)^{1/2}$$

where $n$ is the number of time steps per simulation. In this way, computing third order areas of market paths with sets of three test paths of known characteristics can be used to differentiate between the two basic modes of market behaviour, namely upward or downward trending (with constant drift) versus mean-reverting.

However, going back to our earlier discussion in Subsection 2.5.1, since an arbitrary market path can be represented as the realisation of either a CD or MR process (with any given parameters) for some Brownian sample path, there are no absolute grounds for regarding one path as 'upward trending', say, and another path as 'mean-reverting' if one works with the whole sample space of Brownian motion, that is, the set of all continuous paths. To make the distinction between these two types of path meaningful, we need to restrict, for the purposes of our analysis, the cardinality of the set of Brownian sample paths driving diffusion processes to a finite number. Nevertheless, this number, which we will denote by $N$, can be chosen to be arbitrarily large.

Let us now illustrate our path classification method with a numerical example. As a first step, we generate 1,000 Brownian sample paths, each comprising 100 independent Gaussian increments (so in this example $n = 100$ and $N = 1000$). For our market path, we choose the realisation of an MR process with mean reversion speed $\theta_{\mathrm{mkt}}$ equal to 5.1 (and some arbitrary long-term mean and volatility) for the first Brownian sample path (i.e. of index 1). Then, for three 'test' MR processes which have the same mean reversion speed $\theta_{\mathrm{test}}$ (and fixed but arbitrary long-term means and volatilities), we compute the third order area of the quadruple of market and test paths that are the realisations of the test MR processes for each of the 1,000 Brownian sample paths – so that in each case the three test paths are driven by the same Brownian motion – and for each value of $\theta_{\mathrm{test}} \in \{0.0, 1.0, \dots, 9.0\}$ we record the minimum Euclidean norm of the third order area and the index of the Brownian sample path that attains the minimum area value³. The results of this experiment are displayed in Table 1.
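A sketch of the search loop just described, reusing the illustrative mr_path and third_order_area helpers from above; the function names and the layout of the test parameters are hypothetical choices:

```python
import numpy as np

def third_area_norm(market, tests):
    """Euclidean norm over the time grid of the third order area
    A^{((mkt, test1), (test2, test3))}, per the expression above."""
    A = third_order_area(market, tests[0], tests[1], tests[2])
    return float(np.sqrt(np.sum(A ** 2)))

def classify(market, brownian_increments, dt, thetas, test_params):
    """For each candidate mean reversion speed, find the Brownian sample
    path minimising the third order area norm of (market, 3 test paths)."""
    best = {}
    for theta in thetas:
        norms = []
        for dB in brownian_increments:   # the 3 test paths share each dB
            tests = [mr_path(x0, theta, alpha, sigma, dB, dt)
                     for (x0, alpha, sigma) in test_params]
            norms.append(third_area_norm(market, tests))
        k = int(np.argmin(norms))
        best[theta] = (k, norms[k])      # minimising path index, its norm
    return best
```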

Table 1: Determining the mean reversion speed of a given ‘market’ path by minimising its third order area with three test paths all driven by the same Brownian motion.

³ For $\theta_{\mathrm{test}} = 0.0$, the test paths are actually realisations of CD processes with different drifts and volatilities. Hence, for a market path that is the realisation of some CD process, the third order area would be identically zero when the test paths are driven by the first Brownian sample path.

There are several interesting observations that one can make from these results. Firstly, we note a general tendency for the third order area to increase with increasing values of $\theta_{\mathrm{test}}$ – which is indeed what one would expect on the basis of (2.14). However, this trend is bucked at $\theta_{\mathrm{test}} = 5.0$ – the value of mean reversion speed for the test paths that is closest to that of the market path (recall $\theta_{\mathrm{mkt}} = 5.1$) – where we observe a sharp dip in the minimum value of the Euclidean norm of the third order area, which is attained when the test paths are driven by the same Brownian motion as the market path. It is also worth noticing that while for $\theta_{\mathrm{test}} = 6.0$ the minimum third order area is also attained by the same Brownian sample path (of index 1) – though with a much higher minimum value compared to $\theta_{\mathrm{test}} = 5.0$ – for all other values of $\theta_{\mathrm{test}}$ different Brownian sample paths are responsible for producing the minima, as shown in Table 1.

Having thus found an approximate value for $\theta_{\mathrm{mkt}}$ – namely, that it is around 5.0 – as well as correctly identified the Brownian sample path that drives the market path, we could proceed to determine $\theta_{\mathrm{mkt}}$ to an arbitrary degree of accuracy by iterating on the value of $\theta_{\mathrm{test}}$ to make the third order area converge towards zero. However, there is a more direct and efficient way to determine the exact value of $\theta_{\mathrm{mkt}}$ as well as those of the other parameters: since both the market path and the Brownian sample path driving it are now completely known, we can write down equations for, say, the first three increments of the market path using (2.10), and as these are three simultaneous equations that are linear in $\theta_{\mathrm{mkt}}$, $\sigma_{\mathrm{mkt}}$ and $\theta_{\mathrm{mkt}}\alpha_{\mathrm{mkt}}$, they can be easily solved for the mean reversion speed, volatility and long-term mean of the MR process whose realisation the market path is.
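A sketch of this direct parameter recovery, stated here for an Euler discretisation of (2.10) (the exact solution (2.12) could be used in the same way); the helper name is hypothetical:

```python
import numpy as np

def recover_mr_parameters(X, dB, dt):
    """Solve three simultaneous increment equations, each of the form
    dX_i = (theta*alpha)*dt - theta*X_i*dt + sigma*dB_i, which are
    linear in the unknowns (theta*alpha, theta, sigma)."""
    A = np.array([[dt, -X[i] * dt, dB[i]] for i in range(3)])
    b = np.diff(X[:4])                        # first three increments
    theta_alpha, theta, sigma = np.linalg.solve(A, b)
    return theta, theta_alpha / theta, sigma  # theta, alpha, sigma
```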

As this example demonstrates, in a market model where the evolution of every variable is the realisation of a CD or MR diffusion process for one of finitely many Brownian sample paths, the above method of computing third order areas can be used to classify market paths into upward/downward trending versus mean-reverting ones by efficiently determining the values of mean reversion speed and other parameters as well as the Brownian sample path that drives the diffusion process.

In order to simulate the application of this path classification method to real market data streams (i.e. arbitrary continuous paths), we also carried out an experiment where market paths were realisations of CD or MR processes for Brownian sample paths not belonging to the set of Brownian paths that were used to drive the test paths. Unfortunately, unlike in the previous experiment, here the approximate value of the mean reversion speed of the market path could not be identified as the value of mean reversion speed for test paths that produces the lowest third order area – even when the market and test paths had exactly the same mean reversion speed, their third order area did not stand out by having a distinctively low, or even zero, value, and in most cases the Brownian driver of test paths that minimised the third order area bore no resemblance to the Brownian path driving the market path! Moreover, increasing the number of Brownian sample paths from 1,000 to 10,000 or even 100,000 did not improve the performance of this path classification method.

Evidently, the lesson to learn from this latter exercise is that even in a very large but finite space of Brownian sample paths the probability of chancing upon a path that is, in some defined sense, close to a given arbitrary path is vanishingly small – in fact zero – and in the previous subsection we saw how even tiny discrepancies in the driving paths of CD or MR processes can have a dramatic impact on their area processes! In view of this negative (though not at all surprising) result, it is clear that the proposed method of classifying paths according to their mean reversion speeds is not applicable, without modification, in the general market setting where paths of market variables can be thought of as realisations of diffusion processes for arbitrary sample paths of Brownian motion (i.e. any continuous paths). However, if the problem could somehow be reduced to one involving a space of only finitely many Brownian sample paths, this method would be rendered viable, as was shown in the first experiment.

One idea in this direction would be to try to use third order areas of quadruples of an arbitrary market path and three CD or MR test paths, computed, as in the above experiments, for a range of mean reversion speeds and a finite number of Brownian sample paths, in order to approximate a given market path by a sum of realisations of CD and MR processes for paths in a fixed, finite sample space of Brownian motion. This suggestion will be pursued by the author as a line of future research outside the scope of this thesis.

2.6 Conclusion

We began this chapter by surveying the existing literature on applications of the theory of rough paths to time series analysis, and found that so far the two main approaches have been to use the signatures of multi-dimensional discrete data streams as feature sets in linear regression for the purposes of statistical classification and prediction – an approach that can capture subtle underlying features of market behaviour when applied to financial data – and to employ the expected signatures of multi-dimensional stochastic processes for estimating their parameters.

In our original research work presented in this thesis, we have taken a different, more direct (non-statistical and non-probabilistic) approach by exploring ways in which first and higher order areas of multi-dimensional data streams – an nth order area being a specific linear combination of (n + 1)! signature components of order (n + 1) – can be used to classify data streams according to their basic characteristics. Having first developed all the requisite mathematical and computational tools for this task, we have shown, as a particular application of this approach, that in a market model where every variable follows either a Wiener process with a constant drift or a mean-reverting Ornstein-Uhlenbeck process driven by one of finitely many Brownian sample paths, third order areas provide an efficient means of determining the parameters of a market variable given any of its realisations, thus enabling one to distinguish between the two fundamental modes of market behaviour, namely upward or downward trending versus mean-reverting.

In conclusion, our path classification method based on third order areas represents a novel way of using the signature components of multi-dimensional paths for the purposes of time series analysis of financial data streams. An interesting idea for future research would be to investigate the possibility of using third order areas as a tool for decomposing arbitrary market paths into mean-reverting path components with a spectrum of mean reversion speeds.

References

[1] K. T. Chen. Iterated path integrals. Bulletin of the American Mathematical Society, 83:831–879, 1977.

[2] G. Flint, B. Hambly, and T. Lyons. Discretely sampled signals and the rough Hoff process. Stochastic Processes and their Applications, arXiv:1310.4054v11, 2016.

[3] X. Geng. Reconstruction for the signature of a rough path. Preprint arXiv:1508.06890v2, 2016.

[4] M. Gubinelli. Ramification of rough paths. Journal of Differential Equations, 248(4):693–721, 2010.

[5] L. J. Gyurkó, T. Lyons, M. Kontkowski, and J. Field. Extracting information from the signature of a financial data stream. Preprint arXiv:1307.7244v2, 2014.

[6] B. M. Hambly and T. J. Lyons. Uniqueness for the signature of a path of bounded variation and the reduced path group. Annals of Mathematics, 171(1):109–167, 2010.

[7] B. Hoff. The Brownian frame process as a rough path. D.Phil. thesis, University of Oxford, 2005.

[8] D. Levin, T. Lyons, and H. Ni. Learning from the past, predicting the statistics for the future, learning an evolving system. Preprint arXiv:1309.0260v6, 2016.

[9] T. J. Lyons. Differential equations driven by rough signals. Revista Matemática Iberoamericana, 14(2):215–310, 1998.

[10] T. J. Lyons, M. Caruana, and T. Lévy. Differential Equations Driven by Rough Paths. Number 1908 in Lecture Notes in Mathematics. Springer-Verlag, 2007.

[11] T. J. Lyons and Z. Qian. System Control and Rough Paths. Oxford Mathematical Monographs. Oxford University Press, 2002.

[12] R. Ree. Lie elements and an algebra associated with shuffles. Annals of Mathematics, 68(2):210–220, 1958.

[13] L. C. Young. An inequality of Hölder type, connected with Stieltjes integration. Acta Mathematica, 67:251–282, 1936.

Appendix 1: Quadratic variation and cross-variation of data streams

In this appendix we compute the signature of a multi-dimensional data stream after lead- and lag-transforming two of its path components respectively.

Let $\{X_t\}_{t=0}^{N} = \{X_t^1 e_1 + \cdots + X_t^d e_d\}_{t=0}^{N}$ be a data stream in $V = \mathbb{R}^d$ with a set of basis vectors $\{e_1, \ldots, e_d\}$, and denote by $\{Y_{t/2}\}_{t=0}^{2N}$ the data stream derived from $\{X_t\}_{t=0}^{N}$ by lead-transforming its $i$th component and lag-transforming its $j$th component in accordance with the GLKF method, so that $Y_t^i = X_t^i$ and $Y_t^j = X_t^j$ for $t = 0, \ldots, N$, and $Y_{t-\frac{1}{2}}^i = X_t^i$ and $Y_{t-\frac{1}{2}}^j = X_{t-1}^j$ for $t = 1, \ldots, N$. As we are ultimately interested in the area between the $Y^i$ and $Y^j$ path components – i.e. (half of) the difference between the $(i,j)$ and $(j,i)$ second order signature components of $Y$ – in our calculation we will explicitly show only those signature components that can contribute to this quantity (i.e. the first order increments in $Y^i$ and $Y^j$ in addition to the aforementioned second order signature components).

Computing the signature of $Y$ over the time interval $[0, \frac{1}{2}]$, we get
$$S(Y)_{0,\frac{1}{2}} = 1 + (X_1^i - X_0^i)\,e_i + 0\,e_j + \tfrac{1}{2}(X_1^i - X_0^i)\cdot 0\; e_i \otimes e_j + \tfrac{1}{2}\,0\cdot(X_1^i - X_0^i)\; e_j \otimes e_i + \ldots = 1 + (X_1^i - X_0^i)\,e_i + \ldots$$
Likewise, $S(Y)_{\frac{1}{2},1} = 1 + (X_1^j - X_0^j)\,e_j + \ldots$, whence we obtain
$$S(Y)_{0,1} = S(Y)_{0,\frac{1}{2}} \otimes S(Y)_{\frac{1}{2},1} = 1 + (X_1^i - X_0^i)\,e_i + (X_1^j - X_0^j)\,e_j + (X_1^i - X_0^i)(X_1^j - X_0^j)\, e_i \otimes e_j + \ldots$$
Similarly, we have

$$S(Y)_{1,2} = S(Y)_{1,\frac{3}{2}} \otimes S(Y)_{\frac{3}{2},2} = 1 + (X_2^i - X_1^i)\,e_i + (X_2^j - X_1^j)\,e_j + (X_2^i - X_1^i)(X_2^j - X_1^j)\, e_i \otimes e_j + \ldots$$
which, as $S(Y)_{0,1} \otimes S(Y)_{1,2} = S(Y)_{0,2}$, gives
$$\begin{aligned}
S(Y)_{0,2} = {}& 1 + (X_2^i - X_0^i)\,e_i + (X_2^j - X_0^j)\,e_j \\
& + \bigl[(X_1^i - X_0^i)(X_1^j - X_0^j) + (X_2^i - X_1^i)(X_2^j - X_1^j) + (X_1^i - X_0^i)(X_2^j - X_1^j)\bigr]\, e_i \otimes e_j \\
& + (X_1^j - X_0^j)(X_2^i - X_1^i)\, e_j \otimes e_i + \ldots
\end{aligned}$$
Further, extending the signature calculation to the next time step yields

$$\begin{aligned}
S(Y)_{0,3} = {}& 1 + (X_3^i - X_0^i)\,e_i + (X_3^j - X_0^j)\,e_j \\
& + \bigl[(X_1^i - X_0^i)(X_1^j - X_0^j) + (X_2^i - X_1^i)(X_2^j - X_1^j) + (X_3^i - X_2^i)(X_3^j - X_2^j) \\
& \quad + (X_1^i - X_0^i)(X_2^j - X_1^j) + (X_2^i - X_0^i)(X_3^j - X_2^j)\bigr]\, e_i \otimes e_j \\
& + \bigl[(X_1^j - X_0^j)(X_2^i - X_1^i) + (X_2^j - X_0^j)(X_3^i - X_2^i)\bigr]\, e_j \otimes e_i + \ldots
\end{aligned}$$
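The pattern that emerges from these calculations – an inductive observation consistent with the cases computed above, which we state here without proof – is that for general $N$
$$S(Y)_{0,N}^{(i,j)} = \sum_{1 \le s \le t \le N} \Delta X_s^i\, \Delta X_t^j, \qquad S(Y)_{0,N}^{(j,i)} = \sum_{1 \le s < t \le N} \Delta X_s^j\, \Delta X_t^i,$$
where $\Delta X_t^i := X_t^i - X_{t-1}^i$, so that
$$S(Y)_{0,N}^{(i,j)} - S(Y)_{0,N}^{(j,i)} = \sum_{t=1}^{N} \Delta X_t^i\, \Delta X_t^j + \sum_{1 \le s < t \le N} \bigl(\Delta X_s^i\, \Delta X_t^j - \Delta X_s^j\, \Delta X_t^i\bigr).$$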

Thus, we can see from the above expressions for $S(Y)_{0,1}$, $S(Y)_{0,2}$, $S(Y)_{0,3}$, etc. that if $X_t^i = X_t^j$ for all $t = 0, \ldots, N$ – i.e. $Y^i$ and $Y^j$ are the lead- and lag-transforms of the same data stream – then the difference between the $(i,j)$ and $(j,i)$ signature components of $Y$ is equal to the quadratic variation of this data stream. However, when $X^i$ and $X^j$ are two different data streams, it is evident that subtracting the $(j,i)$ component from the $(i,j)$ component does not in general cancel out the 'extra' terms in the latter so as to leave just the quadratic cross-variation of $X^i$ and $X^j$. This can also be demonstrated experimentally in random simulations, where (twice) the area between lead- and lag-transformed data streams and the quadratic cross-variation of the original streams usually have decidedly different values.
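The quadratic variation identity is easy to verify numerically; the following minimal sketch (our illustration, using the summation formulas stated above rather than the signature function of Appendix 2) checks it for a random one-dimensional stream:

import numpy as np

np.random.seed(0)
N = 100
dX = np.random.randn(N)    # increments of a single data stream X

# Second order signature components of the GLKF lead-lag transform of X,
# read off from the summation pattern derived above:
#   (i,j) component = sum over s <= t of dX_s*dX_t
#   (j,i) component = sum over s <  t of dX_s*dX_t
cum = np.cumsum(dX)
S_ij = np.sum(cum*dX)           # sum over s <= t
S_ji = np.sum((cum - dX)*dX)    # sum over s < t

print S_ij - S_ji               # equals the quadratic variation of X ...
print np.sum(dX**2)             # ... i.e. the sum of squared increments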

Appendix 2: Python code

In this appendix we exhibit samples of the source code of the Python programs that were used to produce all the computational results presented in this thesis.

1. Signature function

The numerical algorithm that is fundamental to all the computational work done in this piece of research is a function that generates a time-indexed sequence of truncated signatures of an arbitrary degree specified by the user for a given multi-dimensional time series. This is accomplished through the following two-step procedure: 1) by repeated application of outer matrix multiplication (a standard Numpy functionality) computing from path increments iterated integrals of successively higher orders for each linear segment of the continuous multi-dimensional path constructed by linearly interpolating between discrete data points, and constituting these into an incremental signature over the time step by appending them to a list starting with 1 as the zeroth order component; and 2) by joining together incremental signatures for contiguous time steps using tensor multiplication (by another application of outer matrix multiplication) to form a cumulative signature over the whole time period. The source code of our user-defined signature function is shown in full below.

import numpy as np

""" Definition of a function to compute a time-indexed sequence of truncated
    signatures of a specified degree of multi-dimensional path increments """

""" For t=0,1,...,(t_fin-t_ini) and k=0,1,...,p, the Sig() function computes
    signature components (i.e. iterated integrals) S[t][k][d_1,...,d_k]
    with d_j=0,1,...,d-1 for j=1,...,k """

def Sig(path_increments, t_ini, t_fin, signature_degree):

    I = path_increments     # matrix of floats with n rows and d columns
    p = signature_degree    # degree of truncated signature
    n = len(I[:,0])         # number of increments per path component
    d = len(I[0,:])         # number of path components (dimension)

    # t_ini (0,...,n-1) and t_fin (1,...,n) are initial and final time points

    """ Initialize signature for time t_ini """

    Sig_0 = [1.0]
    Z = np.zeros(d)

    for i in range(p):
        s = np.multiply.outer(Sig_0[i], Z)
        Sig_0.append(s)

    Sig = Sig_0
    S = [Sig_0]

    """ Compute incremental signature for each time step and update
        cumulative signature """

    for i in range(t_ini, t_fin, 1):    # iterate over time points from t_ini to t_fin-1

        J = I[i,:]          # extract ith row of path increments
        Sig_inc = [1.0]

        for j in range(p):  # compute incremental signature
            x = np.multiply.outer(Sig_inc[j], J)
            x = x/(j+1)
            Sig_inc.append(x)

        Sig_new = [1.0]

        for k in range(1, p+1, 1):      # k = 1,...,p
            y = Sig_0[k]
            for l in range(k+1):        # l = 0,1,...,k (Chen's identity)
                y = y + np.multiply.outer(Sig[l], Sig_inc[k-l])
            Sig_new.append(y)

        Sig = Sig_new       # update cumulative signature
        S.append(Sig)       # form sequence of signatures indexed by time

    return S                # return value of user-defined signature function
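As a brief usage illustration (a hypothetical check, not part of the original program): by construction the first order components of the signature over the whole interval equal the total path increments, which the following snippet verifies for a small two-dimensional example.

inc = np.array([[0.5, -0.2], [0.3, 0.1], [-0.1, 0.4]])    # a 2-dimensional path with 3 increments
S = Sig(inc, 0, 3, 2)    # time-indexed sequence of signatures truncated at degree 2

print S[3][1]            # first order components over the whole interval ...
print inc.sum(axis=0)    # ... equal the total increments of the path
print S[3][2]            # second order iterated integrals (a 2 x 2 array)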

2. Lead-lag transforms

Below we exhibit a Python script that can be used to obtain lead and lag transforms of arbitrary multi-dimensional data streams in accordance with the three methods examined in Section 2.3, and to compute their signatures – in order to check whether they are indeed invariant under these transforms – as well as to visualise the original and lead-lag transformed paths.

import numpy as np
import matplotlib.pyplot as plt
import pylab
from matplotlib.ticker import MultipleLocator

majorLocator = MultipleLocator(0.20)
minorLocator = MultipleLocator(0.10)

""" Set dimensionality and number of time steps """ d = 3 # number of paths (dimensionality) n = 10 # number of increments per path p = 3 # degree of signature

""" Generate original Brownian paths """ np.random.seed(1) # initialize random number generation

I = np.random.randn(n, d) # standard normal random increments z = np.zeros(d) dP = np.vstack( (z, I) )

P = np.cumsum(dP, axis=0) # original Brownian paths

S = Sig(I, 0, n, p) # compute their signature

""" Lead transform paths (GLKF method) """ c1 = np.delete( np.repeat(P[:,0], 2), 0) # lead transform first path c2 = np.delete( np.repeat(P[:,1], 2), 0) # lead transform second path c3 = np.delete( np.repeat(P[:,2], 2), 0) # lead transform third path

R = np.column_stack( (c1, c2, c3) ) # lead transformed paths

R1 = np.delete(R, 0, 0)      # delete top row
R2 = np.delete(R, 2*n, 0)    # delete bottom row

J = R1 - R2 # increments of lead transformed paths

T = Sig(J, 0, 2*n, p) # compute their signature

""" Lag transform paths (GLKF method) """

C1 = np.delete( np.repeat(P[:,0], 2), (2*n + 1) )    # lag transform first path
C2 = np.delete( np.repeat(P[:,1], 2), (2*n + 1) )    # lag transform second path
C3 = np.delete( np.repeat(P[:,2], 2), (2*n + 1) )    # lag transform third path

Q = np.column_stack( (C1, C2, C3) ) # lag transformed paths

Q1 = np.delete(Q, 0, 0)      # delete top row
Q2 = np.delete(Q, 2*n, 0)    # delete bottom row

K = Q1 - Q2 # increments of lag transformed paths

U = Sig(K, 0, 2*n, p) # compute their signature

""" Lead and lag transformed paths have the same signature as the original paths! """

print S    # original signature
print T    # lead transformed signature
print U    # lag transformed signature

""" Plot the original and lead/lag transformed paths """ t = np.linspace(0.0, 1.0, n+1) u = np.linspace(0.0, 1.0, (2*n)+1) j = 1 # choose path (j = 0, . . . , d-1) plt.figure() plt.plot(t, P[:,j], ’-o’, color=’green’, label = ’data stream’) # original path plt.plot(u, R[:,j], ’-o’, color=’red’, label = ’lead transform’) # lead transformed path plt.plot(u, Q[:,j], ’-o’, color=’blue’, label = ’lag transform’) # lag transformed path plt.title(’Gyurko-Lyons-Kontkowski-Field Lead-Lag Transforms’, fontsize=14) ax = plt.axes() ax.xaxis.set_major_locator(majorLocator) ax.xaxis.set_minor_locator(minorLocator) ax.xaxis.grid(which=’minor’) plt.xlabel(’time’) plt.ylabel(’value’) pylab.legend(loc = ’upper left’) plt.grid() plt.show()

""" Lead transform paths (old FHL method) """ d1 = np.insert( np.repeat(P[:,0], 2), 2*n+2, P[n,0] ) # lead transform first path d1 = np.delete( np.delete(d1, 1), 1) d2 = np.insert( np.repeat(P[:,1], 2), 2*n+2, P[n,1] ) # lead transform second path d2 = np.delete( np.delete(d2, 1), 1) d3 = np.insert( np.repeat(P[:,2], 2), 2*n+2, P[n,2] ) # lead transform third path d3 = np.delete( np.delete(d3, 1), 1)

V = np.column_stack( (d1, d2, d3) ) # lead transformed paths

V1 = np.delete(V, 0, 0)      # delete top row
V2 = np.delete(V, 2*n, 0)    # delete bottom row

L = V1 - V2 # increments of lead transformed paths

X = Sig(L, 0, 2*n, p) # compute their signature

""" Lag transform paths (old FHL method) """

D1 = np.insert( np.repeat(P[:,0], 2), 0, P[0,0] )    # lag transform first path
D1 = np.delete( np.delete(D1, 2*n), 2*n )
D2 = np.insert( np.repeat(P[:,1], 2), 0, P[0,1] )    # lag transform second path
D2 = np.delete( np.delete(D2, 2*n), 2*n )
D3 = np.insert( np.repeat(P[:,2], 2), 0, P[0,2] )    # lag transform third path
D3 = np.delete( np.delete(D3, 2*n), 2*n )

W = np.column_stack( (D1, D2, D3) ) # lag transformed paths

W1 = np.delete(W, 0, 0)      # delete top row
W2 = np.delete(W, 2*n, 0)    # delete bottom row

M = W1 - W2 # increments of lag transformed paths

Y = Sig(M, 0, 2*n, p) # compute their signature

""" Lead and lag transformed paths have the same signature as the original paths! """ print S # original signature print X # lead transformed signature print Y # lag transformed signature

""" Plot the original and lead/lag transformed paths """ t = np.linspace(0.0, 1.0, n+1) u = np.linspace(0.0, 1.0, (2*n)+1) j = 1 # choose path (j = 0, . . . , d-1) plt.figure() plt.plot(t, P[:,j], ’-o’, color=’green’, label = ’data stream’) # original path plt.plot(u, V[:,j], ’-o’, color=’red’, label = ’lead transform’) # lead transformed path plt.plot(u, W[:,j], ’-o’, color=’blue’, label = ’lag transform’) # lag transformed path plt.title(’Flint-Hambly-Lyons Lead-Lag Transforms: Old Method’, fontsize=14) ax = plt.axes() ax.xaxis.set_major_locator(majorLocator) ax.xaxis.set_minor_locator(minorLocator) ax.xaxis.grid(which=’minor’) plt.xlabel(’time’) plt.ylabel(’value’) pylab.legend(loc = ’upper left’) plt.grid() plt.show()

""" Lead transform paths (new FHL method) """ e1 = np.delete( np.repeat(np.delete(P[:,0], 0), 4) , 0) # lead transform first path e1 = np.insert( np.insert( e1, 4*n - 1, P[n,0]), 4*n, P[n,0]) # in order to preserve signature, set . . . #e1[0] = 0.0 e2 = np.delete( np.repeat(np.delete(P[:,1], 0), 4) , 0) # lead transform second path e2 = np.insert( np.insert( e2, 4*n - 1, P[n,1]), 4*n, P[n,1]) # in order to preserve signature, set . . . #e2[0] = 0.0 e3 = np.delete( np.repeat(np.delete(P[:,2], 0), 4) , 0) # lead transform third path e3 = np.insert( np.insert( e3, 4*n - 1, P[n,2]), 4*n, P[n,2]) # in order to preserve signature, set . . . #e3[0] = 0.0 v = np.column_stack( (e1, e2, e3) ) # lead transformed paths v1 = np.delete(v, 0, 0) # delete top row v2 = np.delete(v, 4*n, 0) # delete bottom row

l = v1 - v2              # increments of lead transformed paths
x = Sig(l, 0, 4*n, p)    # compute their signature

""" Lag transform paths (new FHL method) """

E1 = np.repeat(np.delete(P[:,0], n), 4)    # lag transform first path (corrected way)
E1 = np.insert( E1, 4*n, P[n,0] )
E1[4*n] = E1[4*n-1]                        # incorrect FHL definition
E2 = np.repeat(np.delete(P[:,1], n), 4)    # lag transform second path (corrected way)
E2 = np.insert( E2, 4*n, P[n,1] )
E2[4*n] = E2[4*n-1]                        # incorrect FHL definition
E3 = np.repeat(np.delete(P[:,2], n), 4)    # lag transform third path (corrected way)
E3 = np.insert( E3, 4*n, P[n,2] )
E3[4*n] = E3[4*n-1]                        # incorrect FHL definition
w = np.column_stack( (E1, E2, E3) )        # lag transformed paths
w1 = np.delete(w, 0, 0)      # delete top row
w2 = np.delete(w, 4*n, 0)    # delete bottom row
m = w1 - w2                  # increments of lag transformed paths
y = Sig(m, 0, 4*n, p)        # compute their signature

""" Neither lead nor lag transform preserves signature! """ print S # original signature print x # lead transformed signature print y # lag transformed signature

""" Plot the original and lead/lag transformed paths """ t = np.linspace(0.0, 1.0, n+1) f = np.linspace(0.0, 1.0, (4*n)+1) j = 1 # choose path (j = 0, . . . , d-1) plt.figure() plt.plot(t, P[:,j], ’-o’, color=’green’, label = ’data stream’) # original path plt.plot(f, v[:,j], ’-o’, color=’red’, label = ’lead transform’) # lead transformed path plt.plot(f, w[:,j], ’-o’, color=’blue’, label = ’lag transform’) # lag transformed path plt.title(’Flint-Hambly-Lyons Lead-Lag Transforms: New Method’, fontsize=14) ax = plt.axes() ax.xaxis.set_major_locator(majorLocator) ax.xaxis.set_minor_locator(minorLocator) ax.xaxis.grid(which=’minor’) plt.xlabel(’time’) plt.ylabel(’value’) pylab.legend(loc = ’upper left’) plt.grid() plt.show()

3. Area simulations

The code sample displayed below is an extract from a program that for a set of constant drift Wiener and mean-reverting Ornstein-Uhlenbeck processes with user-specified parameters and correlation matrix computes first, second and third order areas of their sample paths as functions of time using the closed-form formulae in terms of second, third and fourth order iterated integrals, respectively (as worked out in Section 2.4) for a given number of simulation runs.

import numpy as np
import sys

""" Set parameters for simulation """ d = 8 # number of paths (i.e. dimensionality of market) n = 100 # number of increments per path (i.e. number of time steps) t_ini = 0 # initial time point t_fin = n # final time point p = 4 # degree of signature dt = 1.0/n # length of time step sqrdt = np.sqrt(dt) rho = 0.99 # correlation of any pair of paths M = 1000 # number of simulation runs np.random.seed(1) # initialize random number generation

""" Set parameters for diffusion processes """

P = np.zeros((d, 3))

""" Each row of matrix P specifies mean reversion speed (set 0.0 for any constant drift Wiener process), long-term mean or drift, and volatility of an Ornstein-Uhlenbeck or Wiener process """

P[0] = [0.0, 10.0, 5.0]
P[1] = [0.0, 5.0, 10.0]
P[2] = [0.0, -10.0, 5.0]
P[3] = [0.0, -5.0, 10.0]
P[4] = [10.0, 10.0, 5.0]
P[5] = [10.0, 5.0, 10.0]
P[6] = [10.0, -10.0, 5.0]
P[7] = [10.0, -5.0, 10.0]

iv = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # initial values of the paths
correlation = 'same'    # 'same' to use the same correlation (rho) for all pairs of paths

""" Choose pairs of paths for two first order areas """

# indices run from 0 to d-1
# set i1 indices to be less than d/2, i2 indices d/2 or greater; similarly for higher order areas

i1 = [0,1]
i2 = [4,5]

""" Choose triples of paths for two second order areas """ j1 = [0,1,2] j2 = [4,5,6]

""" Choose quadruples of paths for two third order areas of type (a) """ k1 = [0,1,2,3] k2 = [4,5,6,7]

""" Choose quadruples of paths for two third order areas of type (b) """ l1 = [0,1,2,3] l2 = [4,5,6,7]

""" Specify distinct correlations for each pair of paths """

R = np.zeros((d, d))

R[0] = [1.0, 0.5, 0.4, 0.7, 0.3, 0.2, 0.4, 0.6]
R[1] = [0.0, 1.0, 0.3, 0.4, 0.2, 0.5, 0.6, 0.4]
R[2] = [0.0, 0.0, 1.0, 0.6, 0.4, 0.3, 0.5, 0.2]
R[3] = [0.0, 0.0, 0.0, 1.0, 0.4, 0.5, 0.3, 0.7]
R[4] = [0.0, 0.0, 0.0, 0.0, 1.0, 0.3, 0.2, 0.5]
R[5] = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.4, 0.2]
R[6] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.5]
R[7] = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

R = R + np.transpose(R) - np.eye(d)    # correlation matrix

if rho < 1.0 or correlation != 'same':

    if correlation == 'same':    # use the same correlation (rho) for all pairs of paths

        R = np.ones((d,d))*rho + np.eye(d)*(1 - rho)

    E = np.linalg.eigvals(R)    # compute eigenvalues of correlation matrix

    if E[0]<=0 or E[1]<=0 or E[2]<=0 or E[3]<=0 or E[4]<=0 or E[5]<=0 or E[6]<=0 or E[7]<=0:

        sys.exit("Correlation matrix is not positive definite")

    else:

        L = np.linalg.cholesky(R)    # lower triangular Cholesky factor of correlation matrix
        U = np.transpose(L)          # upper triangular Cholesky factor of correlation matrix

""" Simulation loop """

# For multiple simulation runs (M > 1), first and higher order areas at the final time point (n)

A1 = [ ]     # first order areas for first pair of paths
B1 = [ ]     # first order areas for second pair of paths
A2 = [ ]     # second order areas for first triple of paths
B2 = [ ]     # second order areas for second triple of paths
A3a = [ ]    # third order areas of type (a) for first quadruple of paths
B3a = [ ]    # third order areas of type (a) for second quadruple of paths
A3b = [ ]    # third order areas of type (b) for first quadruple of paths
B3b = [ ]    # third order areas of type (b) for second quadruple of paths

# For a single simulation run (M = 1), first and higher order areas at each time point (0, 1,..., n)
a1 = [ ]     # first order areas for first pair of paths
b1 = [ ]     # first order areas for second pair of paths
a2 = [ ]     # second order areas for first triple of paths
b2 = [ ]     # second order areas for second triple of paths
a3a = [ ]    # third order areas of type (a) for first quadruple of paths
b3a = [ ]    # third order areas of type (a) for second quadruple of paths
a3b = [ ]    # third order areas of type (b) for first quadruple of paths
b3b = [ ]    # third order areas of type (b) for second quadruple of paths

for m in range(M):

""" Generate independent standard normal path increments """

dW = np.random.randn(n,d)

if rho == 1.0 and correlation == ’same’:

for k in range(d):

dW[:,k] = dW[:,0]

""" Correlate path increments """

if rho == 1.0 and correlation == ’same’:

dY = dW

else:

dY = np.dot(dW, U)

""" Create correlated Brownian paths """

Z = np.zeros(d) Z = np.vstack( (Z, dY*sqrdt) ) Z = np.cumsum(Z, axis=0)

""" Create path increments for diffusion processes """

for k in range(d):

if P[k, 0] == 0.0: # create increments for a Wiener process with drift

dY[:,k] = dY[:,k]*P[k,2]*sqrdt + P[k,1]*dt

66 else: # create increments for an Ornstein-Uhlenbeck process

X = np.zeros(n+1) dX = np.zeros(n)

X[0] = iv[k]

for l in range(n): # l = 0,1,..., n-1

dX[l] = P[k,0]*(P[k,1] - X[l])*dt + P[k,2]*sqrdt*dY[l,k]

X[l+1] = X[l] + dX[l]

dY[:,k] = dX

""" Create paths for diffusion processes """

Y_0 = iv # initial values of paths Y = np.vstack( (Y_0, dY) ) X = np.cumsum(Y, axis=0) # entire paths

""" Compute sequence of time-indexed signatures of multi-dimensional path increments """

S1 = Sig(dY[:,:(d/2)], t_ini, t_fin, p) S2 = Sig(dY[:,(d/2):], t_ini, t_fin, p) s1 = S1[t_fin - t_ini] # signatures over the whole time interval s2 = S2[t_fin - t_ini] x1 = s1[2] # second order iterated integrals x2 = s2[2] if p > 2:

y1 = s1[3] # third order iterated integrals y2 = s2[3] if p > 3:

z1 = s1[4] # fourth order iterated integrals z2 = s2[4]

""" Function to compute first order area """ def Area1(i1, i2, x):

# i1 is index of first path # i2 is index of second path # x is array of second order iterated integrals

A1 = 0.5*(x[i1,i2] - x[i2,i1])

return A1

67 """ Function to compute second order area """ def Area2(j1, j2, j3, y):

# j1 is path to be integrated with area # j2 is first path for area # j3 is second path for area # y is array of third order iterated integrals

# second order area A(j1,(j2,j3)) is computed

A2 = 0.25*(y[j1,j2,j3] - y[j1,j3,j2] + y[j2,j1,j3] - y[j2,j3,j1] - y[j3,j1,j2] + y[j3,j2,j1])

return A2

""" Functions to compute third order areas (two types) """ def Area3a(k1, k2, k3, k4, z):

# k1 is path to be integrated with second order area # k2 is path to be integrated with area # k3 is first path for area # k4 is second path for area # z is array of fourth order iterated integrals

# third order area A(k1,(k2,(k3,k4))) is computed

A3a = 0.125*(z[k1,k2,k3,k4] + z[k2,k1,k3,k4] + z[k2,k3,k1,k4]

+ z[k1,k3,k2,k4] + z[k3,k1,k2,k4] + z[k3,k2,k1,k4]

+ z[k1,k4,k3,k2] + z[k4,k1,k3,k2] + z[k4,k3,k1,k2]

- z[k1,k2,k4,k3] - z[k2,k1,k4,k3] - z[k2,k4,k1,k3]

- z[k1,k3,k4,k2] - z[k3,k1,k4,k2] - z[k3,k4,k1,k2]

- z[k1,k4,k2,k3] - z[k4,k1,k2,k3] - z[k4,k2,k1,k3]

- z[k2,k3,k4,k1] - z[k3,k2,k4,k1] - z[k4,k3,k2,k1]

+ z[k2,k4,k3,k1] + z[k3,k4,k2,k1] + z[k4,k2,k3,k1])

return A3a def Area3b(l1, l2, l3, l4, z):

        # l1 is first path for first area
        # l2 is second path for first area
        # l3 is first path for second area
        # l4 is second path for second area
        # z is array of fourth order iterated integrals

        # third order area A((l1,l2),(l3,l4)) is computed

        A3b = 0.125*(z[l1,l2,l3,l4] + z[l1,l3,l2,l4] + z[l3,l1,l2,l4]
                     - z[l1,l2,l4,l3] - z[l1,l4,l2,l3] - z[l4,l1,l2,l3]
                     - z[l2,l1,l3,l4] - z[l2,l3,l1,l4] - z[l3,l2,l1,l4]
                     + z[l2,l1,l4,l3] + z[l2,l4,l1,l3] + z[l4,l2,l1,l3]
                     + z[l4,l3,l1,l2] + z[l4,l1,l3,l2] + z[l1,l4,l3,l2]
                     - z[l4,l3,l2,l1] - z[l4,l2,l3,l1] - z[l2,l4,l3,l1]
                     - z[l3,l4,l1,l2] - z[l3,l1,l4,l2] - z[l1,l3,l4,l2]
                     + z[l3,l4,l2,l1] + z[l3,l2,l4,l1] + z[l2,l3,l4,l1])

        return A3b

    A1.append(Area1(i1[0], i1[1], x1))
    B1.append(Area1(i2[0]-(d/2), i2[1]-(d/2), x2))

    if p > 2:

        A2.append(Area2(j1[0], j1[1], j1[2], y1))
        B2.append(Area2(j2[0]-(d/2), j2[1]-(d/2), j2[2]-(d/2), y2))

    if p > 3:

        A3a.append(Area3a(k1[0], k1[1], k1[2], k1[3], z1))
        B3a.append(Area3a(k2[0]-(d/2), k2[1]-(d/2), k2[2]-(d/2), k2[3]-(d/2), z2))
        A3b.append(Area3b(l1[0], l1[1], l1[2], l1[3], z1))
        B3b.append(Area3b(l2[0]-(d/2), l2[1]-(d/2), l2[2]-(d/2), l2[3]-(d/2), z2))

    if M == 1:

        for t in range(t_fin - t_ini + 1):    # iterate over all time points (t = 0, 1,..., n)

            s1 = S1[t]    # signatures over time interval from 0 to time point t
            s2 = S2[t]

            x1 = s1[2]    # second order iterated integrals
            x2 = s2[2]

            if p > 2:

                y1 = s1[3]    # third order iterated integrals
                y2 = s2[3]

            if p > 3:

                z1 = s1[4]    # fourth order iterated integrals
                z2 = s2[4]

            a1.append(Area1(i1[0], i1[1], x1))
            b1.append(Area1(i2[0]-(d/2), i2[1]-(d/2), x2))

            if p > 2:

                a2.append(Area2(j1[0], j1[1], j1[2], y1))
                b2.append(Area2(j2[0]-(d/2), j2[1]-(d/2), j2[2]-(d/2), y2))

            if p > 3:

                a3a.append(Area3a(k1[0], k1[1], k1[2], k1[3], z1))
                b3a.append(Area3a(k2[0]-(d/2), k2[1]-(d/2), k2[2]-(d/2), k2[3]-(d/2), z2))
                a3b.append(Area3b(l1[0], l1[1], l1[2], l1[3], z1))
                b3b.append(Area3b(l2[0]-(d/2), l2[1]-(d/2), l2[2]-(d/2), l2[3]-(d/2), z2))

# end of simulation loop
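As a quick sanity check of the area functions above (a hypothetical example of ours, not part of the original program, assuming the Sig and Area1 functions are in scope): the first order area of a closed two-dimensional loop equals the signed area it encloses, so for the unit square traversed counterclockwise it is exactly 1.

inc = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])    # unit square traversed counterclockwise
S = Sig(inc, 0, 4, 2)    # signature truncated at degree 2
x = S[4][2]              # second order iterated integrals over the whole loop

print Area1(0, 1, x)     # prints 1.0, the (signed) area enclosed by the loop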

4. Path classification

Our final code sample shown below is an extract from the source code of a program that implements our novel method of classifying paths as either upward/downward trending or mean-reverting, based on computing the third order area of a given market path and three test paths that are realisations of diffusion processes with the same mean reversion speed for the same Brownian path driving the processes. For each value of mean reversion speed assigned to the test processes, the program computes the minimum Euclidean norm of the third order area of a given market path and three test paths over all the finitely many Brownian paths that can drive the processes, and identifies the Brownian path that attains the minimum.

import numpy as np

""" Set parameters for simulation """ n = 100 # number of increments per path (i.e. number of time steps) t_ini = 0 # initial time point t_fin = n # final time point p = 4 # degree of signature dt = 1.0/n # length of time step sqrdt = np.sqrt(dt)

""" Generate Brownian paths driving market and test paths """

N = 1000    # number of Brownian paths
np.random.seed(4)    # initialize random number generation
dV = np.random.randn(N, n)    # standard normal increments

dW = np.transpose(dV)    # for given n and random seed, dW[:,0] are increments of the first path for any N

W = np.zeros(N)
W = np.vstack( (W, sqrdt*dW) )
W = np.cumsum(W, axis=0)    # Brownian paths

""" Set up matrices of increments and paths for computing third order areas """ dX = np.zeros((4,n,N)) # matrix of increments of four Wiener/OU paths for all underlying Brownian paths X = np.zeros((4,n+1,N)) # matrix of four Wiener/OU paths for all underlying Brownian paths a3b = np.zeros(n+1) # time-indexed sequence of third order areas for market and test paths en = np.zeros(N) # Euclidean norm of third order area of market and test paths for each Brownian path M = np.zeros((10,3)) # for each value of mrs, minimum third order area and Brownian path that attains it

""" Initialize market and test paths """ ini_val = [0.0, 0.0, 0.0, 0.0] # initial values of market and test paths for i in range(4):

X[i,0,:] = ini_val[i]*np.ones(N)

""" Set parameters for market path """ theta = 5.1 # mean reversion speed ‘mrs’ (theta = 0.0 for a Wiener process with constant drift) mu = 7.5 # drift/long-term mean of Wiener/Ornstein-Uhlenbeck process sigma = 10.0 # volatility

P = np.zeros((4, 3)) # matrix of parameters for market and test paths

P[0] = [theta, mu, sigma] # market path parameters

""" Generate market path """ if P[0,0] == 0.0: # create increments for a Wiener process with constant drift

for m in range(n): # m = 0,1,..., n-1

dX[0,m,0] = P[0,1]*dt + P[0,2]*sqrdt*dW[m,0] # the first Brownian path drives the market path

X[0,m+1,0] = X[0,m,0] + dX[0,m,0] else: # create increments for an Ornstein-Uhlenbeck process

for m in range(n):

dX[0,m,0] = P[0,0]*(P[0,1] - X[0,m,0])*dt + P[0,2]*sqrdt*dW[m,0]

X[0,m+1,0] = X[0,m,0] + dX[0,m,0]

""" Set parameters for test paths """

P[1] = [1.0, 2.0*mu, 0.5*sigma]    # first column entries to be multiplied by 'mrs' in the loop below
P[2] = [1.0, 0.0, 2.0*sigma]
P[3] = [1.0, (-1.0)*mu, 1.5*sigma]

71 """ Iterate over values of mean reversion speed (mrs) for test paths """ for mrs in range(10):

""" Generate test paths for all underlying Brownian paths """

for i in range(1, 4, 1): # i = 1, 2 and 3

if mrs == 0.0: # create increments for a Wiener process with constant drift

for m in range(n):

dX[i,m,:] = P[i,1]*dt + P[i,2]*sqrdt*dW[m,:]

X[i,m+1,:] = X[i,m,:] + dX[i,m,:]

else: # create increments for an Ornstein-Uhlenbeck process

for m in range(n):

dX[i,m,:] = mrs*P[i,0]*(P[i,1] - X[i,m,:])*dt + P[i,2]*sqrdt*dW[m,:]

X[i,m+1,:] = X[i,m,:] + dX[i,m,:]

""" Compute signature of market and test paths for each underlying Brownian path """

for j in range(N):

I = np.column_stack( ( dX[0,:,0], dX[1,:,j], dX[2,:,j], dX[3,:,j] ) ) # path increment matrix

S = Sig( I, t_ini, t_fin, p)

for t in range(t_fin - t_ini +1):

z = S[t][4] # fourth order iterated integrals over interval from 0 to time point t

a3b[t] = Area3b(0,1,2,3,z) # third order area of market and test paths

en[j] = np.sqrt( sum(a3b**2) ) # Euclidean norm of third order area over the whole time interval

M[mrs] = [ mrs, np.amin(en), np.argmin(en)+1 ]

print M
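Finally, a short post-processing step (our addition, not part of the original extract) that reads off the best-fitting mean reversion speed and Brownian driver from the results matrix M:

best = M[np.argmin(M[:,1])]    # row of M with the smallest minimum norm

print best    # [best-fitting mrs, minimum norm, index of the Brownian path attaining it]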
