Rough Paths Theory and Its Application to Time Series Analysis of Financial Data Streams

Rough Paths Theory and its Application to Time Series Analysis of Financial Data Streams Antti K. Vauhkonen Christ Church University of Oxford A thesis submitted in partial fulfillment of the degree of Master of Science in Mathematical Finance Trinity 2017 Abstract The signature of a continuous multi-dimensional path of bounded variation, i.e. the sequence of its iterated integrals, is a central concept in the theory of rough paths. The key insight of this theory is that for any path of finite p-variation with p 1 (e.g. sample paths of Brownian motion have ≥ finite p-variation for any p > 2 almost surely), one can define a construct analogous to signature, called its rough path lift, that incorporates all the information required for solving controlled differential equations driven by the given path. In the first part of this thesis we give an intuitive yet mathematically rigorous introduction to rough paths. Information encoded in the signatures of multi-dimensional discrete data streams can also be utilised in their time series analysis, and in some recent publications signatures of financial data streams have been used as feature sets in linear regression for the purposes of classifying data and making statistical predictions. In the second part of this thesis we present a novel application of this signature-based approach in the context of a market model where every variable is assumed to follow a diffusion process that either has a constant underlying drift or reverts to some long-term mean. Specifically, we show that third order areas of financial data streams – special linear combinations of their fourth order iterated integrals – provide an efficient means of determining the parameters ofa market variable given one of its realisations in a space of finitely many Brownian sample paths that can drive the process, and thus enable one to distinguish between the two fundamental modes of market behaviour, namely upward or downward trending versus mean-reverting. An interesting line of future research would be to investigate the possibility of using third order areas as a tool for decomposing arbitrary market paths into mean-reverting path components with a spectrum of mean reversion speeds. To the memory of my mother. Acknowledgements I would like to express my gratitude to my academic supervisor Prof. Ben Hambly for his technical guidance, careful reading of my thesis and valuable comments. I also owe a big debt of gratitude to Dr. Daniel Jones for giving his time so generously, his wise counsel and constant encouragement and support without which this thesis would probably never have been completed. My sincere thanks are also due to my family for their help, support and understanding while working on this thesis over a period that at times must have seemed interminable. Lastly, with love and eternal gratitude I remember my late mother, my most steadfast supporter in all of my varied endeavours, who sadly didn’t live to see this project reach its conclusion. Contents 1 Rough paths theory 1 1.1 Origins of rough paths . 1 1.2 Formal definition of rough paths . 11 2 Application of rough paths theory to time series analysis of financial data streams 21 2.1 Classical time series analysis . 21 2.2 Signatures as feature sets for linear regression analysis . 22 2.3 Lead and lag transforms of data streams . 27 2.3.1 Gyurk´o-Lyons-Kontkowski-Field method . 28 2.3.2 Flint-Hambly-Lyons method (Mark 1) . 30 2.3.3 Flint-Hambly-Lyons method (Mark 2) . 30 2.4 Area processes of multi-dimensional paths . 32 2.4.1 Definition and basic properties of areas . 32 2.4.2 Higher order areas . 35 2.5 Classification of paths using third order areas . 39 2.5.1 Diffusion process market model . 39 2.5.2 Areas for pairs of diffusion processes . 42 2.5.3 Classifying sample paths of diffusion processes by using third order areas . 49 2.6 Conclusion . 52 References 54 Appendix 1: Quadratic variation and cross-variation of data streams 56 Appendix 2: Python code 58 i List of Figures 1 GLKF method of lead-lag transforming data streams. 29 2 FHL (Mark1) method of lead-lag transforming data streams. 31 3 FHL (Mark 2) method of lead-lag transforming data streams. 32 4 Area between path components Xi and Xj. 33 5 A typical 2-dimensional Brownian sample path. 34 6 Scatter plot of areas of two pairs of MR processes with different long- term means and volatilities, but all four processes having the same mean reversion speed and driven by the same Brownian path. 43 7 Scatter plot of areas of the same two pairs of MR processes as in Figure 6 after slightly altering the mean reversion speed for one of the processes. 44 8 Scatter plot of the areas of a pair of two CD processes and a mixed pair of CD and MR processes all driven by the same Brownian path. 45 9 Scatter plot of terminal values of the areas of two mixed pairs of CD and MR processes for 500 simulation runs. 46 10 Scatter plot of terminal values of the same two areas as in Figure 9 with different long-term means assigned to the MR processes. .47 11 Scatter plots of terminal values of the areas of two pairs of MR processes all having the same mean reversion speed for 500 simulation runs with the pairwise correlation between Brownian motions driving the processes equal to 1.00, 0.99 and 0.90, respectively. 48 ii List of Tables 1 Determining the mean reversion speed of a given ‘market’ path by minimising its third order area with three test paths all driven by the same Brownian motion. 50 iii Chapter 1 Rough paths theory 1.1 Origins of rough paths There is no more fundamental question in science than that pertaining to the nature of change. Since antiquity thinkers have pondered over problems concerning motion, as illustrated by the famous paradoxes of Zeno. In one of them, Zeno argued that a flying arrow occupies a particular position in space at any given instant of time,hence is instantaneously motionless, and since time consists of instants, he concluded that motion is just an illusion; and in another paradox the Greek hero Achilles was unable to overtake a tortoise in a race where the latter had been given a head start, for in order to accomplish this he would need to traverse an infinite number of (progressively shorter) distances, which, according to Zeno, is impossible in a finite amount of time. While the arrow paradox can be seen simply as an acute observation that motion has no meaning with respect to a single instant of time – in fact any set of instants which has zero measure – the notion of an infinite series that is convergent to a limit provides a satisfactory resolution to the Achilles and tortoise paradox: specifically, that a geometric series like 1/2 + 1/4 + 1/8 + 1/16 + that arises in the equivalent ∙ ∙ ∙ dichotomy paradox does not grow without limit but converges to 1, enabling Achilles to quickly pass the tortoise. The concept of a limit of a function f that expresses the dependence of a variable y on another variable x as y = f(x) was similarly crucial to the development in the late 1600s of modern differential and integral calculus which provides proper analytical tools for the mathematical study of change. The chief among these is the derivative of ˙ df(x) a function, usually denoted by f 0(x), f(x) or dx , which, as the last notation due to Leibniz suggests, was originally conceived as the quotient of an infinitesimally small change df(x) in the value of the function f(x) corresponding to an infinitesimally 1 small change dx in the value of its argument x, until derivatives were defined in a more rigorous way using the (, δ)-definition of a limit in the early 19th century. Rather than needing to differentiate given functions, one is often faced with the (usually harder) inverse problem of finding a function F (x) whose derivative is a given function f(x), i.e. solving the differential equation dF (x) = f(x). (1.1) dx By the fundamental theorem of calculus, such an antiderivative F (x) of f(x) is the same as an indefinite integral of f(x), i.e. x F (x) = f(z) dz Za for any constant a < x in the domain of f where it is continuous. Differential equations first emerged in the context of dynamical systems asaway to implicitly describe their time evolution, and most fundamental laws in the mathematical sciences from fluid dynamics and electromagnetism to general relativity and quantum mechanics – and also mathematical finance – are expressed in terms of differential equations. For example, if in (1.1), relabelling the independent variable t for time, f(t) is a time-varying force acting on a body of mass m, then, according to dx(t) Newton’s second law of motion, the momentum mv(t) = m dt of the body, where x(t) is its position at time t, is a solution of this differential equation. Indeed, this is the first type of differential equation Newton considered and solved using infinite series in his Methodus fluxionum et Serierum Infinitarum of 1671. The second type of differential equations that Newton studied in the same work are of the form dy = f(x, y), dx and we will be especially interested in the special case where f is a function of the unknown variable y only, i.e.

Rough Paths Theory and Its Application to Time Series Analysis of Financial Data Streams

Convergence Rates for the Full Gaussian Rough Paths

Topics in Gaussian Rough Paths Theory

Final Report (PDF)

Arxiv:1712.01343V2 [Math.DS]

Rough Paths and Rough Partial Differential Equations

Diffusive and Rough Homogenisation in Fractional Noise Field

A Note on the Continuity in the Hurst Index of the Solution of Rough

Doob--Meyer for Rough Paths

Stochastic Control with Rough Paths Joscha Diehl, Peter Friz, Paul Gassiat

An Introduction to Rough Paths Antoine Lejay

Topics in Stochastic Differential Equations and Rough Path Theory

On the Theory of Rough Paths, Fractional and Multifractional Brownian Motion with Applications to Finance Fabian Andsem Harang, MAT5960 Master’S Thesis, Autumn 2015