
SOME EXTENSIONS OF FRACTIONAL ORNSTEIN-UHLENBECK MODEL

José Igor Morlanes

Some Extensions of Fractional Ornstein-Uhlenbeck Model

Arbitrage and Other Applications

©José Igor Morlanes, Stockholm University 2017

ISBN print 978-91-7649-994-8 ISBN PDF 978-91-7649-995-5

Printed in Sweden by Universitetsservice US-AB, 2017
Distributor: Department of Statistics, Stockholm University

List of Papers

The following papers, referred to in the text by their Roman numerals, are included in this thesis.

PAPER I: MORLANES, J.I., RASILA, A., SOTTINEN, T. (2009) Empirical Evidence on Arbitrage by Changing the Stock Exchange. Advances and Applications in Statistics. Vol. 12, Issue 2, pages 223-233.

PAPER II: GASBARRA, D., MORLANES, J.I., VALKEILA, E. (2011) Initial Enlargement in a Markov Chain Market Model. Stochastics and Dynamics. Vol. 11, Nos. 2 & 3, pages 389-413.

PAPER III: AZMOODEH, E., MORLANES, J.I. (2015) Drift Parameter Estimation for Fractional Ornstein-Uhlenbeck Process of the Second Kind. Statistics: A Journal of Theoretical and Applied Statistics. Vol. 49, Issue 1, pages 1-18.

PAPER IV: MORLANES, J.I., ANDREEV, A. (2017) On Simulation of Fractional Ornstein-Uhlenbeck of the Second Kind by Circulant Embedding Method. Research Report. No. 2017:1, SU, pages 1-11.

PAPER V: ANDREEV, A., MORLANES, J.I. (2017) Simulations-based Study of Covariance Structure for Fractional Ornstein-Uhlenbeck Process of the Second Kind. Preprint.

Reprints were made with permission from the publishers.

Motivation

There may be many paths one can follow to become a well-rounded scientific researcher. The path I decided to follow involves three main steps: first, to master the theoretical concepts and methods in the field of research; second, to develop tools that can be applied to theoretical or empirical problems; and third, to use the knowledge accumulated in the first two steps to solve problems with real data in an area of interest. This doctoral dissertation is an effort to accomplish the first two steps. The families of fractional Brownian motion and Ornstein-Uhlenbeck processes have been the focus. As I have been moving toward my goal, I have always kept the third step in mind too. This dissertation has thus been framed in the area of finance and economics. As a result, this thesis consists of five scientific articles. In the first three articles, I settle my theoretical background. In the last two, I master my programming skills and gain a deeper understanding of numerical methods.

Contents

List of Papers iii

Motivation v

1 Introduction 1

2 Computational Aspects of Stochastic Differential Equations 3
2.1 Generalities of Stochastic Processes ...... 3
2.1.1 Martingales ...... 4
2.2 Brownian Motion ...... 4
2.2.1 Brownian motion as the limit of a random walk ...... 5
2.2.2 Karhunen-Loeve expansion of Brownian motion ...... 6
2.2.3 Brownian Bridge ...... 7
2.3 Itô and Stratonovich Diffusion Processes ...... 8
2.3.1 Numerical Stochastic Integral ...... 9
2.3.2 Switching from an Itô to a Stratonovich Differential Equation ...... 10
2.3.3 Itô Lemma ...... 11
2.3.4 The Lamperti Transform ...... 11
2.3.5 Families of Stochastic Processes ...... 11
2.4 Simulating Diffusion Processes ...... 13
2.4.1 Euler-Maruyama Approximation ...... 13
2.4.2 Milstein Scheme ...... 14
2.4.3 Error and Accuracy ...... 14

3 Ornstein-Uhlenbeck Process 17
3.1 Numerical Implementation of OU ...... 18
3.1.1 Simulating SDE ...... 18
3.1.2 Integral Solution ...... 19
3.1.3 Analytical Solution ...... 19

4 Fractional Brownian Motion 21
4.1 Definition and Properties of fBm ...... 21
4.1.1 Correlation and Long-Range Dependence of Time Series Data ...... 22
4.1.2 Numerical Integral with respect to fBm ...... 23
4.1.3 Fractional Itô Lemma ...... 24
4.2 Fractional Ornstein-Uhlenbeck SDE ...... 25
4.3 Numerical Simulation ...... 25
4.3.1 Circulant Embedding Method in a Nutshell ...... 26

5 Fundamental Theory Background 29
5.1 Malliavin Calculus ...... 29
5.1.1 Malliavin derivative ...... 29
5.1.2 Divergence Operator and Skorohod integral ...... 33
5.1.3 Wick-Itô Integral ...... 34
5.2 Kernels ...... 35
5.3 Regular conditional expectation ...... 37
5.3.1 Construction of regular conditional probability ...... 40
5.3.2 Likelihood ratio function ...... 41
5.4 Maximum likelihood ratio process and inference ...... 43

Bibliography 45

6 Summary of Papers 49

7 Swedish Summary 53

1. Introduction

A key contribution of this thesis is that it gives deeper insight into some aspects of mathematical finance from a statistical and probabilistic point of view. The thesis offers the possibility of improving the modelling of financial data and of gaining insight into the modelling of information and arbitrage strategies, which may be used to write financial instruments that take a potential insider into account.

My journey began by learning the concepts of several important topics, among them Malliavin Calculus. I next developed probability and statistical tools to model random dynamic systems. These tools and models can be applied to many research fields, such as physics, engineering and telecommunications. Although I have a background in these fields, my true enthusiasm has been to seek applications in financial and economic problems. I have thus also learnt the main concepts of Continuous Stochastic Finance and Econometrics.

The main topic of this dissertation is the so-called fractional Ornstein-Uhlenbeck process of the second kind (fOU2). The motivation for studying this process is that it exhibits short-range dependence for all values of the self-similarity parameter. Combined with a long-range dependent process, it gives a more flexible dynamic model for capturing the covariance structure of sampled data such as network traffic data, interest rates of treasury bonds or the stochastic volatility of financial derivatives. The fOU2 process is introduced in the scholarly paper by Kaarakka and Salminen [11]. The authors prove many mathematical properties of the process, e.g. that it is locally Hölder continuous and stationary. They also consider the kernel representation of its covariance.

The purpose of Articles III-V in this dissertation is to extend the research on the fOU2 process into the field of simulation and statistical inference. These papers represent the first two research steps mentioned in the motivation. An estimator for the drift part of the process is constructed, and simulation procedures to synthesize its sample trajectories are explored. The knowledge in these articles can then be used in a third step involving modelling and calibration of the process with real data. Article III describes a least-squares estimator and studies its asymptotic properties. A logical next step is to examine the robustness of the estimator. In order to achieve this, Articles IV-V present different algorithmic procedures to create the sample paths of the fOU2 process. The procedures synthesize the exact covariance structure and also consider fOU2 paths with the most widely used marginal distributions by means of the circulant embedding method (CEM). The CEM is exact and easy to extend to non-Gaussian distributions in several dimensions.

Articles I and II are devoted to financial derivatives and the trading of non-public information. They are motivated by the need to reduce the likelihood of arbitrage opportunities for traders with access to insider information.

Article I shows the possibility of making money without risk when a publicly-traded company switches marketplace, for example, from NASDAQ to the New York Stock Exchange. A trader with this information thus has an arbitrage opportunity. This is because the switch of marketplaces creates a shift in the price of the company's derivative products. This shift is studied empirically in the paper and extended in a chapter of this dissertation by means of advanced econometric techniques such as the Logistic Smooth Transition Autoregressive model and the Artificial Neural Network model.

In Article II an insider trader places orders in a marketplace modeled by a Markov chain. We investigate whether or not arbitrage opportunities exist for an insider in a high-frequency framework. The Markov chain process models high-frequency trading by tiny jumps between transactions. Some of the jumps may be accessible or predictable, but in the original filtration all jumps are inaccessible. Although the jumps change to accessible or predictable, the insider does not necessarily have arbitrage opportunities. This helps one to understand how the insider may profit even when forced to close his position before the expiration time. Along the research path of this paper, I have acquired knowledge in themes such as the general theory of processes, enlargement of filtrations and stochastic calculus with jumps.

2. Computational Aspects of Stochastic Differential Equations

This chapter is devoted to classical stochastic processes: their definitions, main properties and important examples. In particular, we focus on the Wiener process (Brownian motion) and the fractional Brownian motion process. Here, we construct approximations of stochastic integrals and prove an error estimate.

2.1 Generalities of Stochastic Processes

A real-valued stochastic process is a parameterized collection of random variables {X_t}_{t∈T} defined on a probability space (Ω, F, P), taking values in R. The random variables are functions of two variables of the form

X(t,ω) : T × Ω → R.

If T = N, the process is said to be a discrete time process, and if T ⊂ R, we have a continuous time process. Here we consider continuous time processes with T = [0,∞), and we always think of T as the time axis. We denote a continuous time process by X = {X_t, t ≥ 0}. We also adopt the notation X_t = X_t(ω) = X(t,ω). Note that for each fixed t ∈ T we obtain a random variable

ω → X_t(ω), ω ∈ Ω,

which represents the set of possible states of the process at time t. For each fixed ω ∈ Ω we can consider the function X(·,ω) : T → R, given by

t → X_t(ω),

called a path, trajectory or realization of the stochastic process X; it represents one possible evolution of the process.

2.1.1 Martingales

We now briefly review the definition and some properties of a martingale. Let (Ω,F,P) be a probability space. A family of sub-σ-algebras {F_t, t ≥ 0} of the σ-algebra F is called a filtration if for every s < t it holds that F_s ⊂ F_t. A stochastic process X is called adapted to the filtration {F_t, t ≥ 0} if and only if for every t ∈ T the random variable X_t(ω) is F_t-measurable. A martingale is a process X such that E|X_t| < ∞ for all t, it is adapted to the filtration {F_t, t ≥ 0}, and for every s ≤ t the equality E(X_t | F_s) = X_s holds. This means that X_s is the best predictor of X_t given F_s. A process X is a Markov process if for every s < t and every Borel set B ⊂ R it holds that P(X_t ∈ B | F_s) = P(X_t ∈ B | X_s).

2.2 Brownian Motion

This process is named in honor of the botanist Robert Brown. In 1827 he published a paper on the irregular motion of pollen particles suspended in water, noting that the path of a given particle was very irregular. In 1900 Bachelier described fluctuations in stock prices mathematically, and this was later extended by Einstein in 1905. Einstein proposed the explanation that the observed "Brownian" motion was caused by individual water molecules hitting the pollen. He studied the details from a mathematical point of view and suggested that the main characteristics of this motion were randomness, its independent increments, its Gaussian distribution and its continuous paths. Similar theoretical ideas were also published independently by Smoluchowski in 1906. In 1923 Wiener gave the first construction of Brownian motion as a measure on the space of continuous functions, now called the Wiener measure.

The Brownian motion is also a Wiener process, i.e., a continuous time Gaussian process with independent increments such that B_0 = 0 with probability 1, E(B_t) = 0, and covariance function Cov(B_t, B_s) = min{t,s} for all 0 ≤ s ≤ t. In particular, Var(B_t − B_s) = t − s. Recall that every Gaussian process is uniquely determined by its mean function and its covariance function. From a simulation point of view, the most important features of a Brownian motion process are that, for a fixed time increment Δt,

B_{t+Δt} − B_t ∼ √Δt · N(0,1),   (2.1)

and that on any two disjoint intervals the increments are independent. For more information on the mathematical theory of Brownian motion, see for example [6].

1 Consider the discretization 0 = t_0 < t_1 < ··· < t_N = T of the desired interval.
2 Generate Z_1, ..., Z_N iid ∼ N(0,1).
3 Output

B_{t_j} = ∑_{i=1}^{j} √(t_i − t_{i−1}) · Z_i,   j = 1, ..., N.

Algorithm 1: Generating Brownian motion.
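A minimal Python sketch of Algorithm 1 (the function name and the use of NumPy are illustrative choices, not part of the algorithm):

import numpy as np

def brownian_motion(T=1.0, N=1000, rng=None):
    # Sample a Brownian path at N equidistant points of [0, T] (Algorithm 1).
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    dB = np.sqrt(dt) * rng.standard_normal(N)    # increments, cf. (2.1)
    B = np.concatenate(([0.0], np.cumsum(dB)))   # cumulative sums, B_0 = 0
    return np.linspace(0.0, T, N + 1), B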


Figure 2.1: Brownian motion trajectory.

2.2.1 Brownian motion as the limit of a random walk

The simplest model of Brownian motion is to see it as a limit of a random walk. The term "random walk" was originally proposed by Karl Pearson in 1905. In a letter to Nature, he gave a simple model to describe a mosquito infestation in a forest: at each time step, a single mosquito moves a fixed length a, at a randomly chosen angle. Pearson wanted to know the distribution of the mosquitos after many steps had been taken. Brownian motion can be seen as the limit of a random walk in the following way. The steps are random displacements assumed to be a sequence of independent and identically distributed random variables X_1, X_2, ..., X_n, taking only the values +1 and −1. Consider the partial sum

S_n = X_1 + X_2 + ··· + X_n;

then, as n → ∞,

P(S_{[nt]}/√n < x) → P(B_t < x),

where [x] is the integer part of the real number x. Note that this result is a refinement of the central limit theorem which, in our case, asserts that S_n/√n → N(0,1) in distribution.

(a) Random walk with 100 steps. (b) Scaled random walk with 100 steps on (0,1).

Figure 2.2: Two random walks with different scaling. The sample trajectory (a) is a random walk with increment ΔX_t = 1. The sample trajectory (b) is constructed with a scaling factor √dt.
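The scaled random walk of panel (b) can be generated with a short sketch like the following (illustrative Python, assuming the ±1 steps described above):

import numpy as np

def scaled_random_walk(n=100, rng=None):
    # Partial sums S_k of iid +-1 steps, rescaled as S_[nt]/sqrt(n) on (0, 1].
    rng = np.random.default_rng() if rng is None else rng
    S = np.cumsum(rng.choice([-1.0, 1.0], size=n))
    t = np.arange(1, n + 1) / n
    return t, S / np.sqrt(n)    # converges in distribution to B_t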

Remark 2.2.1. Heuristically, we can consider the Brownian motion as a random walk over infinitesimal time steps of length dt whose increments

dB_t = B_{t+dt} − B_t ∼ √dt · N(0,1)   (2.2)

are approximated by the random variable taking only the values ±√dt. Note that from (2.1) we have

E(ΔB_t) = (1/2)√Δt − (1/2)√Δt = 0

and

Var(ΔB_t) = E(ΔB_t²) = (1/2)Δt + (1/2)Δt = Δt.

Then, as Δt → 0, we have the interpretation (2.2).

2.2.2 Karhunen-Loeve expansion of Brownian motion

Another characterization of the Wiener process, quite useful in statistics, is the Karhunen-Loeve expansion of B on some fixed interval [0,T]. The Karhunen-Loeve expansion is an L²([0,T],dt) expansion of random processes in terms of a sequence of orthogonal functions and random coefficients. We recall that L²([0,T],dt) is the space of square integrable functions from [0,T] to R. The paths of a Brownian motion belong to the space L²([0,T],dt), and B takes the Karhunen-Loeve expansion

B_t(ω) = B(t,ω) = ∑_{i=0}^{∞} Z_i(ω) φ_i(t),

with

φ_i(t) = (2√(2T)/((2i+1)π)) sin((2i+1)πt/(2T)).

The functions φ_i form an orthogonal basis in L²([0,T],dt) and Z_i is a sequence of independent and identically distributed Gaussian random variables.

1 Consider the discretization 0 = t_0 < t_1 < ··· < t_N = T of the desired interval.
2 Generate Z_1, ..., Z_N iid ∼ N(0,1) for sufficiently large N.
3 Output the approximation

B_{t_j} ≈ ∑_{i=1}^{N} Z_i (2√(2T)/((2i+1)π)) sin((2i+1)πt_j/(2T)),   j = 1, ..., N.

Algorithm 2: Brownian motion generation via Karhunen-Loeve.
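A possible Python version of Algorithm 2 (a sketch; the grid and the truncation level N are illustrative):

import numpy as np

def bm_karhunen_loeve(T=1.0, n_grid=500, N=1000, rng=None):
    # Truncated Karhunen-Loeve expansion of Brownian motion on [0, T].
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, T, n_grid)
    Z = rng.standard_normal(N)
    i = np.arange(N)
    coeff = 2.0 * np.sqrt(2.0 * T) / ((2 * i + 1) * np.pi)
    phi = np.sin(np.outer(t, 2 * i + 1) * np.pi / (2.0 * T))  # sin part of phi_i(t)
    return t, phi @ (coeff * Z)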

2.2.3 Brownian Bridge

A Brownian bridge is a stochastic process {X_t, t_0 ≤ t ≤ T} whose distribution is that of a Wiener process on the interval [t_0, T] conditioned on B_{t_0} = a and B_T = b. In other words, a Brownian bridge starting at a at time t_0 is "tied down" to have the value b at time T. It plays a crucial role in the Kolmogorov-Smirnov test and in sampling used in combination with stratification, e.g., in the calculation of the expected payoff of a financial derivative. Another frequent use of the Brownian bridge is for refinement of a discrete time Wiener process. This process is easily simulated using a simulated trajectory of the Wiener process: if B_t is a Wiener process, then

X_t = a + B_{t−t_0} − ((t − t_0)/(T − t_0)) (B_{T−t_0} − b + a),   t_0 ≤ t ≤ T,

is a Brownian bridge.
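This construction translates directly into code; a sketch in the same notation (the endpoint values a and b are illustrative defaults matching Figure 2.3):

import numpy as np

def brownian_bridge(a=2.0, b=0.0, t0=0.0, T=1.0, N=500, rng=None):
    # Tie a simulated Wiener path down to X_{t0} = a and X_T = b.
    rng = np.random.default_rng() if rng is None else rng
    dt = (T - t0) / N
    B = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(N))))
    t = np.linspace(t0, T, N + 1)
    return t, a + B - (t - t0) / (T - t0) * (B[-1] - b + a)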


Figure 2.3: Two random Brownian bridge samples with starting point at (0,2) and finishing point at (1,0).

2.3 Itô and Stratonovich Diffusion Processes

A stochastic differential equation (SDE) for a stochastic process {Xt , t ≥ 0} is an expression of the form

dX_t = μ(t,X_t) dt + σ(t,X_t) dB_t,   X_0 = x,   (2.3)

where {B_t, t ≥ 0} is a Wiener process and μ(t,x) and σ(t,x) are deterministic functions. The coefficient μ is called the drift and σ is called the diffusion coefficient. The resulting process {X_t, t ≥ 0} is referred to as a diffusion process. Although equation (2.3) looks like an ordinary differential equation, standard methods to solve it are not applicable. This is because the paths of Brownian motion are not differentiable, although they are continuous. To get an intuition for this claim, recall the interpretation of Brownian motion as a random walk taking values ±√dt (cf. Remark 2.2.1). According to this representation,

dB_t/dt = ±√dt/dt = ±1/√dt → ∞,

so the derivative is not well defined and dB_t = (dB_t/dt) dt has no meaning. A way around the obstacle was found in the 1940s by Kiyoshi Itô, a Japanese mathematician, who gave a rigorous meaning to (2.3) by writing it as

X_t = X_0 + ∫_0^t μ(s,X_s) ds + ∫_0^t σ(s,X_s) dB_s,   X_0 = x,   (2.4)

where the integral with respect to B on the right-hand side is called the Itô stochastic integral.

It is possible to replace the driving process B by a Lévy process, a class which contains Brownian motion and a large variety of jump processes. Lévy processes are useful tools when one is interested in modeling the jump character of real-life processes, such as the strong oscillations of foreign exchange rates or crashes of the stock market.

2.3.1 Numerical Stochastic Integral

The realizations of Brownian motion are not differentiable at any point. As a consequence, we cannot naively define sample-path by sample-path an integral ∫_0^t h(s) dB_s in the Riemann-Stieltjes sense. The two most common concepts to overcome this problem are the Itô integral and the Stratonovich integral.

Recall from standard calculus how the Riemann-Stieltjes integral is defined. Given a suitable function h, the integral ∫_0^T h(s) dG_s over [0,T] is approximated by the Riemann sum

∑_{j=1}^{N} h(t_j)(G_{j+1} − G_j),   (2.5)

where t_j = jΔt with Δt = T/N for some positive integer N. The integral is defined by taking the limit Δt → 0 in (2.5). In a similar way, we may consider the sum

∑_{j=1}^{N} h(t*)(B_{j+1} − B_j),   (2.6)

and, by analogy, we may consider it as an approximation to the stochastic integral. The choice of t* leads to different types of integrals. If we choose the left point, t* = t_j, we obtain the Itô integral ∫_0^t h_s dB_s, and if we choose the midpoint, t* = (t_{j+1} + t_j)/2, we obtain the Stratonovich integral, denoted by ∫_0^t h_s ◦ dB_s. The two approximations give different results, and this mismatch does not disappear as Δt → 0 in (2.6).

From a simulation standpoint, this highlights a significant difference between deterministic and stochastic integration in defining an integral as the limit of a Riemann-Stieltjes sum: we must be precise about how the sum is formed. The interpretation of the stochastic integral depends on the situation at hand. For the Itô integral we choose the left point in the interval, so in the limit the function h is adapted to the filtration of the Wiener process, i.e., h_t ∈ F^B_t (cf. Martingales 2.1.1). The reason for this requirement is the use of martingale theory in the construction of the integral. The Itô integral is then a martingale, which offers rich structural properties. This gives an important computational advantage in many real-world applications, such as the modeling of stock prices in financial mathematics. On the other hand, we cannot use ordinary calculus with an Itô integral, and it behaves dreadfully under changes of coordinates in applications within the physical sciences. A Stratonovich integral is not a martingale but obeys the classical chain rule, i.e., there are no second order terms in the Stratonovich analogue of Itô's lemma. This property makes the Stratonovich integral natural to use, for example, in connection with stochastic differential equations on manifolds. More information on the Stratonovich type of integral can be found in [23], and on the Itô integral in [22].

Remark 2.3.1. To ensure regularity of the Itô integral (such as the existence of the first and second moments), the integrand h has to compensate for the roughness of the paths of Brownian motion. This fact implies the technical condition E ∫_0^t h_s² ds < ∞.

Remark 2.3.2. For computational purposes, it is best to present the Stratonovich integral as

∫_0^t h_s ◦ dB_s = ∫_0^t h_s dB_s + (1/2)⟨h, B⟩_t,

where the integral on the right-hand side is interpreted in the Itô sense and ⟨·,·⟩ is the quadratic covariation between h and B [12].

Remark 2.3.3. If Y_t = h(X_t), then d⟨Y, B⟩_t = (∂h/∂x)(X_t) d⟨X, B⟩_t.
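The mismatch between the two evaluation points is easy to observe numerically. A sketch with the integrand h = B, for which both limits are known in closed form (the midpoint value of B is approximated by the average of the endpoint values):

import numpy as np

rng = np.random.default_rng(1)
T, N = 1.0, 100_000
dt = T / N
B = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(N))))
dB = np.diff(B)

ito = np.sum(B[:-1] * dB)                    # left-point rule, cf. (2.6)
strat = np.sum(0.5 * (B[:-1] + B[1:]) * dB)  # midpoint (Stratonovich) rule

print(ito, (B[-1]**2 - T) / 2)   # Ito limit:          (B_T^2 - T)/2
print(strat, B[-1]**2 / 2)       # Stratonovich limit:  B_T^2 / 2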

2.3.2 Switching from an Itô to a Stratonovich Differential Equation

Let a dynamical system be modeled by the Itô stochastic differential equation (2.3),

dX_t = μ(t,X_t) dt + σ(t,X_t) dB_t,   X_0 = x;

then the same process can be described by the Stratonovich stochastic differential equation

dX_t = [μ(t,X_t) − (1/2)(∂σ(t,X_t)/∂x) σ(t,X_t)] dt + σ(t,X_t) ◦ dB_t,   X_0 = x.

If the system is defined by the Stratonovich differential equation

dX_t = μ(t,X_t) dt + σ(t,X_t) ◦ dB_t,   X_0 = x,

then its Itô counterpart becomes

dX_t = [μ(t,X_t) + (1/2)(∂σ(t,X_t)/∂x) σ(t,X_t)] dt + σ(t,X_t) dB_t,   X_0 = x.

Note that the diffusion term is the same in both the Itô and Stratonovich SDEs.

2.3.3 Itô Lemma

An important tool from Itô calculus is the Itô formula, which is also useful in simulations. If f is twice differentiable in t and x, then

df(t,X_t) = [∂f/∂t(t,X_t) + μ(t,X_t) ∂f/∂x(t,X_t) + (1/2)σ²(t,X_t) ∂²f/∂x²(t,X_t)] dt + σ(t,X_t) ∂f/∂x(t,X_t) dB_t.   (2.7)

A special case is the product rule:

d(X_t Y_t) = Y_t dX_t + X_t dY_t + d[X,Y]_t.

In terms of Stratonovich integrals, the Itô formula looks like the fundamental theorem of calculus:

df(t,X_t) = ∂f/∂t(t,X_t) dt + ∂f/∂x(t,X_t) ◦ dX_t.

2.3.4 The Lamperti Transform

An important application of the Itô formula to simulations is to transform

dX_t = μ(t,X_t) dt + σ(X_t) dB_t,   X_0 = x,

into an equation with unit diffusion coefficient by applying the Lamperti transform

Y_t = F(X_t) = ∫_z^{X_t} (1/σ(u)) du,   (2.8)

where z is an arbitrary value in the state space of X. The process Y solves the SDE

dY_t = [μ(t,X_t)/σ(X_t) − (1/2)(∂σ/∂x)(X_t)] dt + dB_t.

Note that the transform requires the diffusion term to depend only on the state of the process X.

2.3.5 Families of Stochastic Processes

Now we present an overview of existing models in finance:

• Brownian Bridge.

dX_t = ((b − X_t)/(T − t)) dt + dB_t,   X_0 = a, X_T = b,

where a and b are constants. By applying the Itô lemma to Y_t = T X_t/(T − t) and taking integrals, the solution is

X_t = (a/T)(T − t) + (b/T) t + (T − t) ∫_0^t (1/(T − s)) dB_s.

• Geometric Brownian motion. This is the well-known process called the Black-Scholes model in finance,

dX_t = μX_t dt + σX_t dB_t,   (2.9)

where μ and σ are constants. By applying the Itô lemma to Y_t = log X_t and taking integrals, the solution is

X_t = X_0 e^{(μ − σ²/2)t + σB_t}.   (2.10)

(A simulation sketch based on this exact solution is given after this list.)

• Geometric Mean Reverting process.

dX_t = θ(k − log X_t) X_t dt + σX_t dB_t,

where θ, k and σ are constants. Applying the Itô lemma to the transformation Y_t = log X_t, the process is reduced to the linear stochastic differential equation

dY_t = [θ(k − Y_t) − σ²/2] dt + σ dB_t.

Now applying the product rule to e^{θt} Y_t, the solution is

log X_t = e^{−θt} log X_0 + (k − σ²/(2θ))(1 − e^{−θt}) + σ ∫_0^t e^{−θ(t−s)} dB_s.

• Cox-Ingersoll-Ross (CIR) model.

dX_t = θ(k − X_t) dt + σ√X_t dB_t,

where k, θ and σ are constants. By applying the Itô lemma to Y_t = e^{θt} X_t and taking integrals, the solution is

X_t = e^{−θt} X_0 + k(1 − e^{−θt}) + σ ∫_0^t e^{−θ(t−s)} √X_s dB_s.

• Constant Elasticity of Variance model (CEV).

dX_t = μX_t dt + σ(t,X_t) X_t dB_t,   σ(t,X_t) = σ X_t^{γ−1},

where μ is constant and σ(t,X_t) is a local volatility function. By applying the product rule to Y_t = e^{−μt} X_t, the solution is

X_t = e^{μt} (e^{−μ(1−γ)t} X_0^{1−γ} + σ ∫_0^t e^{−μ(1−γ)s} dB_s)^{1/(1−γ)}.

• Stochastic Verhulst equation.

dX_t = (λX_t − X_t²) dt + σX_t dB_t,

where λ and σ are constants. Applying the Itô lemma to the transformation Y_t = 1/X_t, the process is reduced to a linear stochastic differential equation,

dY_t = (1 + (σ² − λ)Y_t) dt − σY_t dB_t.

• Miscellaneous process.

dX_t = −(1/2) a² X_t dt − a √(1 − X_t²) dB_t.

This equation is reducible to the Stratonovich equation

dX_t = −a √(1 − X_t²) ◦ dB_t,

with solution

X_t = cos(aB_t + arccos X_0).
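As an illustration of how these closed-form solutions are used in practice, here is a sketch that simulates geometric Brownian motion exactly through (2.10) (the function name and defaults are illustrative):

import numpy as np

def gbm_exact(X0=1.0, mu=0.05, sigma=0.8, T=1.0, N=500, rng=None):
    # Exact sampling of the Black-Scholes SDE (2.9) via its solution (2.10).
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    t = np.linspace(0.0, T, N + 1)
    B = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(N))))
    return t, X0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B)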

2.4 Simulating Diffusion Processes

There are different iterative methods that can be used to integrate SDE systems. The most widely used ones are introduced in the following sections.

2.4.1 Euler-Maruyama Approximation

To apply a numerical method to equation (2.3), we assume that {X_t, 0 ≤ t ≤ T} is a solution on [0,T]. We discretize the interval with step Δt = T/N for some integer N and set t_j = jΔt. We also denote X_{t_j} = X_j. The Euler-Maruyama approximation of X is the stochastic process Y satisfying the iterative scheme

Y_j = Y_{j−1} + μ(t_{j−1}, Y_{j−1})Δt + σ(t_{j−1}, Y_{j−1})(B_j − B_{j−1}).

Between any two points t_{j−1} and t_j it is natural to apply linear interpolation,

Y_t = Y_{j−1} + (Y_j − Y_{j−1})(t − t_{j−1})/(t_j − t_{j−1}),   t ∈ [t_{j−1}, t_j).

We only need to generate the increments (B_j − B_{j−1}).

1 Consider the discretization 0 = t_0 < t_1 < ··· < t_N = T of the desired interval.
2 Generate the initial value Y_0, either from a distribution or as a fixed value.
3 Draw Z_{j−1} ∼ N(0,1).
4 Evaluate Y_j = Y_{j−1} + μ(t_{j−1}, Y_{j−1})Δt + σ(t_{j−1}, Y_{j−1})√Δt Z_{j−1}.
5 Set j = j + 1 and go to Step 3.
Algorithm 3: Euler's Method.
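A Python sketch of Algorithm 3 for a general SDE (2.3), with the drift and diffusion passed as functions (signatures are illustrative):

import numpy as np

def euler_maruyama(mu, sigma, X0, T=1.0, N=500, rng=None):
    # dX = mu(t, X) dt + sigma(t, X) dB approximated on a grid of N steps.
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    t = np.linspace(0.0, T, N + 1)
    X = np.empty(N + 1)
    X[0] = X0
    for j in range(1, N + 1):
        Z = rng.standard_normal()
        X[j] = (X[j-1] + mu(t[j-1], X[j-1]) * dt
                + sigma(t[j-1], X[j-1]) * np.sqrt(dt) * Z)
    return t, X

# Black-Scholes drift and diffusion with the parameters of Figure 2.4:
t, X = euler_maruyama(lambda t, x: 0.05 * x, lambda t, x: 0.8 * x, X0=1.0)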


Figure 2.4: The solution to the geometric Brownian motion, or Black-Scholes, stochastic differential equation (2.9). The exact solution (2.10) is plotted as a purple curve. The Euler-Maruyama approximation with time step Δt = 1/8 is plotted as a dark curve. The drift and diffusion parameters are set to μ = 0.05 and σ = 0.8, respectively.

2.4.2 Milstein Scheme

The Milstein scheme applies the Itô formula to the drift and diffusion terms to increase the accuracy of the approximation. It reads

Y_j = Y_{j−1} + μ(t_{j−1}, Y_{j−1})Δt + σ(t_{j−1}, Y_{j−1})(B_j − B_{j−1}) + (1/2)σ(t_{j−1}, Y_{j−1})(∂σ/∂x)(t_{j−1}, Y_{j−1}){(B_j − B_{j−1})² − Δt}.   (2.11)

1 Consider the discretization 0 = t_0 < t_1 < ··· < t_N = T of the desired interval.
2 Generate the initial value Y_0, either from a distribution or as a fixed value.
3 Draw Z_{j−1} ∼ N(0,1).
4 Evaluate Y_j from (2.11).
5 Set j = j + 1 and go to Step 3.
Algorithm 4: Milstein's Method.
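A sketch of the corresponding Milstein step (2.11); the x-derivative of the diffusion coefficient is supplied by the caller:

import numpy as np

def milstein(mu, sigma, dsigma_dx, X0, T=1.0, N=500, rng=None):
    # Milstein scheme: Euler-Maruyama plus the second-order correction in (2.11).
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    t = np.linspace(0.0, T, N + 1)
    X = np.empty(N + 1)
    X[0] = X0
    for j in range(1, N + 1):
        dB = np.sqrt(dt) * rng.standard_normal()
        s = sigma(t[j-1], X[j-1])
        X[j] = (X[j-1] + mu(t[j-1], X[j-1]) * dt + s * dB
                + 0.5 * s * dsigma_dx(t[j-1], X[j-1]) * (dB**2 - dt))
    return t, X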

2.4.3 Error and Accuracy

Two methods are mainly used in the literature to check the optimality of the numerical approximation: the strong and the weak orders of convergence. First, we consider strong convergence; that is, we check the approximation of the sample path. In contrast, in weak convergence we are only concerned with the distributions.

A discrete-time approximation is said to converge strongly to the solution X_T at time T if

lim_{Δt→0} E|X_T − X_T^{Δt}| = 0,   (2.12)

where X_T^{Δt} is the approximate solution of the continuous-time process X_T computed with stepsize Δt. It is said to be of general strong order of convergence β if

E|X_T − X_T^{Δt}| ≤ C Δt^β,   (2.13)

with C a constant independent of Δt. Strong convergence allows one to approximate a sample path accurately, which is required in some applications. In other applications, such as Monte Carlo estimates, the goal is to obtain the distribution of the solution X_T; individual sample paths may not be of interest. We give a definition along the same lines as for strong convergence. A discrete-time approximation is said to converge weakly to the solution X_T at time T if

lim_{Δt→0} E f(X_T^{Δt}) = E f(X_T)   (2.14)

for all polynomials f(x). It is said to be of general weak order of convergence γ if

|E f(X_T) − E f(X_T^{Δt})| ≤ C Δt^γ,   (2.15)

with C a constant independent of Δt. The Euler scheme has a strong order of convergence of 0.5 and a weak order of convergence of 1. The Milstein scheme has strong and weak orders of convergence equal to 1.

3. Ornstein-Uhlenbeck Process

The Ornstein-Uhlenbeck process (OU) is widely used to model data series that fluctuate about certain constant or time-varying levels. Such data tend to revert to a constant level every time they move away from it; examples are interest rates or currency exchange rates in financial markets, or the fluctuations of a spring under certain conditions in physics. This behavior can be described mathematically by the stochastic differential equation

dX_t = −θ(X_t − μ) dt + σ dB_t,   (3.1)

where θ > 0, μ and σ > 0 are parameters and B = {B_t}_{t≥0} is a standard Brownian motion. The solution of this equation is called an Ornstein-Uhlenbeck process. The above representation is also known as the Vasicek model. For comparison purposes, we can think of the OU process as the continuous version of the discrete-time AR(1) process defined by

ΔX_{n+1} = −(1 − θ)(X_n − μ/(1 − θ)) + σε_{n+1},   |θ| < 1.

The drift part of equation (3.1) embodies the mean-reverting property. If μ is the long-run mean and the current value is X_t > μ, the drift is negative and the process moves downwards toward μ. If the current value is X_t < μ, the drift is positive and the process moves upwards towards μ. Observe that the magnitude of the reversion is proportional to the distance of X_t from the long-run mean: the further away X_t is, the stronger the pull back towards it. The parameter θ adjusts the speed at which the process reverts.

The solution of the stochastic differential equation (3.1) can be found by applying the Itô lemma to the ansatz e^{θt} X_t:

d(e^{θt} X_t) = (θe^{θt} X_t − e^{θt} θ(X_t − μ)) dt + e^{θt} σ dB_t.

Integrating both sides,

e^{θt} X_t − X_0 = θμ ∫_0^t e^{θs} ds + σ ∫_0^t e^{θs} dB_s,

which leads to

X_t = e^{−θt} X_0 + μ(1 − e^{−θt}) + σ ∫_0^t e^{θ(s−t)} dB_s,   (3.2)

where the stochastic integral is understood as an Itô integral (cf. Section 2.3.1). It follows that X is a Gaussian process whenever X_0 is Gaussian, with mean function

dXt = −θ(Xt − μ)dt + σdWt , (3.1) where θ > 0, μ and σ > 0 are parameters and W = {Wt }t≥0 is a standard Brow- nian motion. The solution of this equation is called an Ornstein-Uhlenbeck process. The above representation is also known as the . For comparison purposes, we can think of the OU -process as the continu- ous version of the discrete-time AR(1) defined by μ ΔX + = −(1 − θ) X − + σε + , |θ|< 1. n 1 n (1 − θ) n 1 The drift part of equation (3.1) embodies the mean-reverting property. If μ is the long-run mean and the current value is Xt > μ, the drift is negative and the process moves downwards toward μ. If the current value is Xt < μ, the drift is positive and the process moves upwards towards μ. Observe that the magnitude of the reversion is proportional to the current value of Xt , the further away Xt is from the long-run mean, the stronger the return towards it. The parameter θ adjusts the speed at which the process reverts. The solution of the stochastic differential equation (3.1) can be found by θt applying Itô lemma to the ansatz e Xt θt θt θt θt d(e Xt )= θe Xt − e θ(Xt − μ) dt + e σdBt integrating both sides t t θt θs θs e Xt − X0 = θμ e ds + σ e dBs, 0 0 which leads to t −θt −θt θ(s−t) Xt = e X0 + μ(1 − e )+σ e dBs (3.2) 0 17 where the stochastic integral is understood as an Itô integral (cf. Section 1.3.1). It follows that X is a Gaussian process whenever X0 is Gaussian with mean function

−θt −θt E(Xt )=e E(X0)+μ(1 − e ) (3.3) and covariance function

Cov(X X )=E((X − E(X ))(X − E(X ))) t s s s t t s t 2 θ(u−s) θ(v−t) = σ e dBu · e dBv 0 0 s 2 s t 2 −θ(s+t) θu θu θv = σ e e dBu + e dBu · e dBv 0 0 s s −θ( + ) θ = σ 2e s t e2 u du 0 2 σ −θ( + ) θ { , } = e s t (e2 min t s − 1). 2θ

In particular,

1 − e−2θt Var(X )=σ 2 . t 2θ

This shows that the Ornstein-Uhlenbeck process converges in distribution to a Gaussian random variable with distribution N(μ,σ 2/(2θ)) as t → ∞.

3.1 Numerical Implementation of OU

3.1.1 Simulating SDE

The stochastic differential equation (3.1) can be discretized and approximated using the Euler-Maruyama method

X_j = X_{j−1} + μ(t_{j−1}, X_{j−1})Δt + σ(t_{j−1}, X_{j−1})(B_j − B_{j−1}),

where the (B_j − B_{j−1}) are independent and identically distributed Gaussian Brownian increments with distribution √Δt · N(0,1).

18 4.0

3.5

3.0

X 2.5 t 2.0

1.5

1.0

0.5 0 0.5 1 1.5 2 2.5 Time

Figure 3.1: Solution to the Ornstein-Uhlenbeck stochastic differential equation with long-term mean μ = 1.2, variance σ = 0.3 and speed of reversion θ = 1. Two Euler-Maruyama approximations are plotted as green and blue curves with X_0 = 1 and X_0 = 1.7, respectively. A Brownian motion is plotted as a red curve with B_0 = 1.2, mean μ = 1.5 and variance σ = 0.5.

3.1.2 Integral Solution

To obtain a numerical solution of equation (3.2), we need to approximate the integral term numerically using an SDE integration scheme like Euler-Maruyama. Then equation (3.2) becomes the recursion

X_j = e^{−θΔt} X_{j−1} + μ(1 − e^{−θΔt}) + σ e^{−θΔt} (B_j − B_{j−1}).

3.1.3 Analytical Solution

To compute the analytical solution (3.2) of an Ornstein-Uhlenbeck process numerically, we use the time-change property of Brownian motion: for {B_t, t ≥ 0} a Brownian process, define Z_t = B_{C_t}, t ≥ 0, for some given deterministic function C_t = ∫_0^t f(s)² ds < ∞ for all t ≤ T. Then the stochastic process {Z_t, 0 ≤ t ≤ T} has the same distribution as the Itô integral process {Y_t, t ≥ 0} defined by

Y_t = ∫_0^t f(s) dB_s ∼ N(0, ∫_0^t f(s)² ds).

We then have that

X_t = e^{−θt} X_0 + μ(1 − e^{−θt}) + σ e^{−θt} B((e^{2θt} − 1)/(2θ)),   (3.4)

19 where B(t)=Bt is a Brownian motion.

1 Consider the discretization 0 = t_0 < t_1 < ··· < t_N = T of the desired interval.
2 Generate the initial value X_0, either from a distribution or as a fixed value.
3 Compute σ √((1 − e^{−2θΔt})/(2θ)) Z_{j−1}, where Z_1, ..., Z_N iid ∼ N(0,1).
4 Evaluate X_j = e^{−θΔt} X_{j−1} + μ(1 − e^{−θΔt}) + σ √((1 − e^{−2θΔt})/(2θ)) Z_{j−1}.
5 Set j = j + 1 and go to Step 3.
Algorithm 5: Ornstein-Uhlenbeck sample path via the analytical solution.
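A Python sketch of Algorithm 5 (the default parameter values are those of Figure 3.1; names are illustrative):

import numpy as np

def ou_exact(theta=1.0, mu=1.2, sigma=0.3, X0=1.0, T=2.5, N=500, rng=None):
    # Exact OU sampling: X_{t+dt} given X_t is Gaussian, so this recursion
    # carries no discretization error, unlike the Euler-Maruyama scheme of 3.1.1.
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    e = np.exp(-theta * dt)
    sd = sigma * np.sqrt((1.0 - np.exp(-2.0 * theta * dt)) / (2.0 * theta))
    X = np.empty(N + 1)
    X[0] = X0
    for j in range(1, N + 1):
        X[j] = e * X[j-1] + mu * (1.0 - e) + sd * rng.standard_normal()
    return np.linspace(0.0, T, N + 1), X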

4. Fractional Brownian Motion

Fractional Brownian motion represents a natural one-parameter extension of Brownian motion, with parameter H ∈ (0,1) called the Hurst parameter. The parameter is named after the hydrologist Hurst, who developed in [10] a statistical analysis of the yearly water run-offs of the Nile river. The process was first introduced by Kolmogorov within a Hilbert space framework [14]. He introduced continuous time Gaussian processes with stationary increments and the self-similarity property; Kolmogorov named such processes 'Wiener spirals'. Mandelbrot and Van Ness in [19] established an integral representation of the process in terms of Brownian motion. They named the process fractional Brownian motion (fBm). The studies of fBm have moved in several directions, with applications in hydrology, telecommunications and mathematical finance among others. If H > 1/2 then fBm has long memory. This property is appropriate for modelling data from problems in weather derivatives, water levels in a river, the widths of consecutive annual rings of a tree or the values of the log returns of a stock. If H < 1/2 then fractional Brownian motion has found applications in mathematical finance, such as modeling the prices of electricity in the liberalized Nordic electricity market.

4.1 Definition and Properties of fBm

A fractional Brownian motion is a continuous centered Gaussian process {B^H_t, t ≥ 0} with B^H_0 = 0 and covariance function

Cov(B^H_t, B^H_s) = (1/2)σ²(s^{2H} + t^{2H} − |t − s|^{2H})   (4.1)

for all 0 ≤ s ≤ t. In particular,

Var(B^H_t) = σ² t^{2H}.   (4.2)

Notice that for H = 1/2 the covariance function is σ² min{t,s} and fBm becomes a Brownian motion. The process B^H is self-similar with Hurst parameter H > 0, i.e., for any constant c > 0 the rescaled process {c^{−H} B^H_{ct}, t ≥ 0} has the same probability distribution as B^H. This property is an immediate consequence of the fact that the covariance function (4.1) is homogeneous of order 2H, that is,

Cov(B^H_{ct}, B^H_{cs}) = c^{2H} Cov(B^H_t, B^H_s),   c > 0.

The increments (B^H_t − B^H_s) are stationary with zero mean Gaussian distribution and variance

Var(B^H_t − B^H_s) = σ² |t − s|^{2H}.

An important property of fractional Brownian motion is that it is neither a semimartingale nor a Markov process. More details on fractional Brownian motion, modeling and applications can be found in [20].

4.1.1 Correlation and Long-Range Dependence of Time Series Data

The time series {Y_n = (B^H_n − B^H_{n−1}), n ≥ 1} is a discrete Gaussian sequence with mean zero and covariance function

Cov(Y_{n+k}, Y_n) = (1/2)σ²((k+1)^{2H} + (k−1)^{2H} − 2k^{2H}),   (4.3)

called fractional Gaussian noise with Hurst parameter H. In particular, for H = 1/2 the increments are independent and B^H becomes a Brownian motion. It follows from (4.3) that the autocorrelation is

ρ_H(k) = (1/2)((k+1)^{2H} + (k−1)^{2H} − 2k^{2H}) ≈ H(2H − 1)k^{2H−2}

as k tends to infinity. For H ∈ (1/2, 1), the autocorrelations are positive, ρ_H(k) > 0, and decay so slowly to zero that ∑_{n=1}^{∞} ρ_H(n) = ∞, i.e., they exhibit long-range dependence. In the case that H ∈ (0, 1/2), the autocorrelations are negative, ρ_H(k) < 0, and decay to zero at a rate faster than 1/n, so the increments have the short-range dependence property, that is, ∑_{n=1}^{∞} |ρ_H(n)| < ∞. The parameter H controls the regularity of the trajectories.

Remark 4.1.1. A particular class of stationary time series models which capture both long and short range dependence are the fractional ARIMA models, or FARIMA(p,d,q), where the integration parameter d is linked to the Hurst parameter by

d = H − 1/2,   d ∈ (−1/2, 1/2).

Remark 4.1.2. A usual definition of a self-similar structure is the repetition of a geometry or image on different size scales as one zooms in or out. For example, Fig. 4.1(a) shows a repeating pattern, the so-called Koch snowflake.

4.1.2 Numerical Integral with respect to fBm

As in Section 2.3.1, we may approximate an integral with respect to fBm by the Riemann-Stieltjes sum

∑_{j=1}^{N} f(B^H_j)(B^H_{j+1} − B^H_j),   (4.5)

and, taking into account the results of Young in [27], we may consider it as an approximation to the stochastic integral with respect to fBm. In particular, if H > 1/2, the path-wise Riemann-Stieltjes integral ∫_0^T f(B^H_t) dB^H_t exists if f is a continuously differentiable function. Nevertheless, unlike the Itô stochastic integral with respect to Brownian motion, the path-wise integral with respect to fBm does not have mean zero.

We may also consider a Riemann-Stieltjes sum in which we replace the ordinary products in the approximation (4.5) with Wick products (cf. Divergence operator 5.1.2 and the Wick-Itô Integral 5.1.3):

∑_{j=1}^{N} f(B^H_j) ⋄ (B^H_{j+1} − B^H_j).

This approximation is called the Wick-Itô integral, and it has mean zero (cf. Remark 5.1.2).

Remark 4.1.3. Let f be continuous with continuous derivative, both satisfying the growth condition |f(x)| ≤ c e^{λx²} [21]. The Wick-Itô integral is then a stochastic integral defined as

∫_0^T f(B^H_t) ⋄ dB^H_t = ∫_0^T f(B^H_t) dB^H_t − H ∫_0^T f′(B^H_t) t^{2H−1} dt,   (4.6)

where ∫_0^T f(B^H_t) dB^H_t is the path-wise Riemann-Stieltjes integral.

4.1.3 Fractional Itô Lemma

If f is twice differentiable and f, f′ and f″ satisfy the growth condition |f(x)| ≤ c e^{λx²}, then

df(t,B^H_t) = ∂f/∂t(t,B^H_t) dt + ∂f/∂x(t,B^H_t) ⋄ dB^H_t + H ∂²f/∂x²(t,B^H_t) t^{2H−1} dt.

For the path-wise Riemann-Stieltjes integral we have the classical fundamental theorem of calculus,

df(t,B^H_t) = ∂f/∂t(t,B^H_t) dt + ∂f/∂x(t,B^H_t) dB^H_t.   (4.7)

There is a more general fractional Itô formula for (4.7) if 1/2 < H < 1 in [5]. Let dX_t = μ(t,X_t) dt + σ(t,X_t) dB^H_t; then

df(t,X_t) = [∂f/∂t(t,X_t) + μ(t,X_t) ∂f/∂x(t,X_t)] dt + σ(t,X_t) ∂f/∂x(t,X_t) dB^H_t.   (4.8)

Remark 4.1.4. Notice that there is no term (1/2)σ² ∂²f/∂x²(t,X_t), in contrast to the usual Itô formula (2.7).

24 4.2 Fractional Ornstein-Uhlenbeck SDE

The fractional Ornstein-Uhlenbeck process is a fractional analogue of the Ornstein-Uhlenbeck process, that is, a continuous process X that is the solution of the equation

dX_t = −θ(X_t − μ) dt + σ dB^H_t,   (4.9)

where θ > 0, μ and σ > 0 are parameters and B^H = {B^H_t}_{t≥0} is a fractional Brownian motion with Hurst parameter H ∈ (1/2, 1). The solution of the stochastic differential equation (4.9) can be found by applying the fractional Itô lemma to the ansatz e^{θt} X_t:

d(e^{θt} X_t) = (θe^{θt} X_t − e^{θt} θ(X_t − μ)) dt + e^{θt} σ dB^H_t.

Integrating both sides,

e^{θt} X_t − X_0 = θμ ∫_0^t e^{θs} ds + σ ∫_0^t e^{θs} dB^H_s,

which leads to

X_t = e^{−θt} X_0 + μ(1 − e^{−θt}) + σ ∫_0^t e^{θ(s−t)} dB^H_s,   (4.10)

where the stochastic integral is understood as a path-wise integral.

4.3 Numerical Simulation

We simulate fractional Brownian motion by means of the circulant embedding method (CEM). The CEM is an exact method to simulate stationary Gaussian processes on a time interval. Although fractional Brownian motion is not stationary, we may simulate it from the fractional Gaussian noise increments, as they form a stationary sequence.

We discretize the interval [0,T] into equally spaced intervals with Δt = T/N for some integer N and t_j = jΔt. Define X_j = B^H_{t_{j+1}} − B^H_{t_j}, so that B^H_{t_0} = 0 and B^H_{t_j} = ∑_{k=0}^{j−1} X_k. The process {X_j, j = 0, 1, ..., N−1} is a fractional Gaussian noise with mean zero and covariance function (cf. (4.3))

Cov(X_i, X_j) = E[(B^H_{t_{i+1}} − B^H_{t_i})(B^H_{t_{j+1}} − B^H_{t_j})]
= (1/2)(T/N)^{2H} [|j − i + 1|^{2H} + |j − i − 1|^{2H} − 2|j − i|^{2H}].

We denote Cov(X_i, X_j) = γ(|j − i|), that is, as a function of one variable. In order to use the covariance function numerically, we rewrite it as a function of the distance between two grid points, k = t_j − t_i = |i − j|Δt, with Δt as a parameter:

γ(k) = ((Δt)^{2H}/2) [|k/Δt + 1|^{2H} + |k/Δt − 1|^{2H} − 2|k/Δt|^{2H}],   k = 0, Δt, ..., (N−1)Δt.

We also redefine the increments as X̃_j = (Δt)^{−H} X_j. This process has the covariance function γ̃(k) = (Δt)^{−2H} γ(k), so that

γ̃(k) = (1/2) [|k/Δt + 1|^{2H} + |k/Δt − 1|^{2H} − 2|k/Δt|^{2H}],   k = 0, Δt, ..., (N−1)Δt.   (4.11)

Notice that now γ̃(0) = 1. We then simulate X̃ by means of the CEM and construct fBm by

B^H_{t_j} = (T/N)^H ∑_{k=0}^{j−1} X̃_k,   (4.12)

with B^H_{t_0} = 0.

4.3.1 Circulant Embedding Method in a Nutshell

The algorithm is based on the fact that the covariance matrix of a stationary discrete time Gaussian process can be embedded in a so-called circulant matrix. This latter matrix should be non-negative definite for the algorithm to work, which is indeed the case for fractional Gaussian noise. The constructed circulant matrix can be diagonalized explicitly, and the computations are done efficiently with the so-called Fast Fourier Transform (FFT) algorithm.

A circulant matrix C ∈ R^{N×N} with first column c_1 has the decomposition C = WDW*, where W is a Fourier matrix and D is a diagonal matrix with diagonal λ = √N W* c_1. The columns of W are the eigenvectors of C, and D contains the eigenvalues. If all eigenvalues are non-negative, then define R = WD^{1/2}, and C has the factorization C = RR*.

Next generate the complex-valued vector X = X_1 + iX_2 = Rξ, where ξ is a complex Gaussian vector of length N with distribution ξ ∼ CN(0, 2I_N). Then X ∼ CN(0, 2C) and X_1, X_2 ∼ N(0, C).

1 Consider the discretization 0 = t_0 < t_1 < ··· < t_N = T of the desired interval.
2 Generate the initial value X_0, either from a distribution or as a fixed value.
3 Embed the covariance matrix in a circulant matrix C.
4 Factorize C = RR* by the fast Fourier transform.
5 Generate a complex vector ξ = ξ_1 + iξ_2, where ξ_1, ξ_2 iid ∼ N(0, I_N).
6 Evaluate X = Rξ by the fast Fourier transform.
7 Take the real part ℜ{X} and the imaginary part ℑ{X}.
8 Output the first N values of ℜ{X} and ℑ{X}.
Algorithm 6: Generation of two sample vectors of length N via the Circulant Embedding Method.
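A compact Python sketch of Algorithm 6 for fractional Gaussian noise, together with the fBm construction (4.12) (a sketch under the minimal circulant embedding; function names are illustrative):

import numpy as np

def fgn_cem(N, H, rng=None):
    # Two independent fGn samples of length N with Hurst index H, unit
    # variance, covariance gamma-tilde of (4.11), via circulant embedding.
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(N)
    gamma = 0.5 * ((k + 1.0)**(2*H) + np.abs(k - 1.0)**(2*H) - 2.0 * k**(2.0*H))
    c = np.concatenate([gamma, gamma[-2:0:-1]])   # first column of C (length 2N-2)
    lam = np.fft.fft(c).real                      # eigenvalues of C
    if lam.min() < 0:
        raise ValueError("circulant embedding is not nonnegative definite")
    M = len(c)
    xi = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # xi ~ CN(0, 2I)
    X = np.fft.fft(np.sqrt(lam / M) * xi)         # X ~ CN(0, 2C)
    return X.real[:N], X.imag[:N]                 # two N(0, C) samples

def fbm_cem(N, H, T=1.0, rng=None):
    # fBm on [0, T] by cumulating scaled increments, cf. (4.12).
    x, _ = fgn_cem(N, H, rng=rng)
    return np.concatenate(([0.0], (T / N)**H * np.cumsum(x)))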

Example 4.3.1. Let us consider the fractional Gaussian noise vector X = (X_0, X_1, X_2) with Hurst parameter H = 0.7. The covariance matrix is

1     0.32  0.19
0.32  1     0.32
0.19  0.32  1

according to (4.11). This is a symmetric Toeplitz matrix but not a circulant one. We can always embed a Toeplitz matrix inside a larger symmetric circulant matrix by means of the so-called minimal circulant embedding: we add the first entry of the second column to the last entry of the first column, the penultimate entry of the first column must equal the first entry of the third column, and so on. Here we add one column and one row to the covariance matrix:

1     0.32  0.19  0.32
0.32  1     0.32  0.19
0.19  0.32  1     0.32
0.32  0.19  0.32  1

This is a circulant matrix with first column c_1 = (1, 0.32, 0.19, 0.32)^T, and it has the decomposition C = WDW* with

W = (1/√4) ·  1  1   1   1
              1  −i  −1  i
              1  −1  1   −1
              1  i   −1  −i,        D = diag(1.8, 0.8, 0.5, 0.8),

and W* the conjugate transpose of W. The minimal circulant extension has all eigenvalues non-negative, so it is a valid covariance matrix. We generate X = WD^{1/2}ξ, where ξ ∼ CN(0, I_4). The output sample is shown in Table 4.1.

Table 4.1: First three samples for fGn and fBm with H = 0.7 on the interval [0,1].

time    X               fGn     fBm
0       0.02 − i0.83    0.02    0
1       −0.42 + i1.35   −0.42   0.01
2       0.35 − i1.66    0.35    −0.24
3       −0.64 − i2.25           −0.02
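The numbers in Example 4.3.1 can be checked with a few lines, since the eigenvalues of a circulant matrix are the discrete Fourier transform of its first column:

import numpy as np

c1 = np.array([1.0, 0.32, 0.19, 0.32])   # first column from Example 4.3.1
lam = np.fft.fft(c1).real
print(lam)   # approximately [1.83, 0.81, 0.55, 0.81]: all non-negative,
             # matching the diagonal of D up to rounding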

5. Fundamental Theory Background

In this introductory chapter, some basic concepts of the fundamental theories necessary to understand the methods used in the articles are presented. We summarize some standard concepts from the general theory of stochastic processes, measure theory, stochastic calculus for Gaussian processes and maximum likelihood estimation for counting processes. The aim is to define standard terminology and to set out results needed to understand the research papers presented in this thesis. To make them reader-friendly, we give an intuitive perspective on the theory. For proofs and further technical details, references are given in the text.

5.1 Malliavin Calculus

The mathematical theory known as the Malliavin calculus, or the stochastic calculus of variations, was first introduced by Paul Malliavin in [17, 18] as an infinite dimensional integration by parts technique. The original motivation, and the most important application of this theory, has been to provide a probabilistic proof of Hörmander's hypoellipticity theorem. The Malliavin calculus is a Gaussian calculus, i.e. a calculus with respect to a Gaussian process. Nowadays the theory has found many applications, which include numerical methods, stochastic control and quantum physics. The theory has been developed by Stroock, Watanabe, Nualart, Øksendal and others. The integration-by-parts formula, which relates the Malliavin derivative operator on the Wiener space and the divergence operator, called the Itô-Skorohod stochastic integral in this setting, represents a crucial fact in this theory.

5.1.1 Malliavin derivative We introduce a notion of the differential DF of a smooth square integrable random variable F : Ω → R. The aim is to differentiate F with respect to the random parameter ω ∈ Ω. We follow [2], [8] and [20]. Let G = {G(h), h ∈ H} be a isonormal Gaussian process defined in a com- plete probability space (Ω,F,P) associated with the separable real Hilbert

29 space H. Then, G is a centered Gaussian family of random variables with covariance function E(G(h1)G(h2)) = h1,h2 H (5.1) for all h1,h2 in H. We assume that F is generated by G. It turns out that this general notion allows the treatment of many abstract Gaussian families. Consider the smooth function f : Rn → R such that f and all of its partial derivatives have polynomial growth. Denote by S the set of smooth random variables of the form

F = f (G(h1),...,G(hn)). (5.2)

Let P ⊂ S be the dense subspace of smooth random variables where f is a polynomial in n variables. Definition 5.1.1. The derivative D of a smooth random variable of the form (5.2) is the random variable DF : Ω → H given by n ∂ = f ( ( ),..., ( )) DF ∑ ∂ G h1 G hn hi (5.3) i=1 xi It is called the stochastic gradient or the Malliavin derivative. The Malliavin derivative DF can be interpret as an extension to random variables of the relationship between the directional derivative of a function f in the direction u and its gradient, that is,

D_u f(x) = ∇f(x) · u,   (5.4)

where ∇f = (∂_1 f, ..., ∂_n f) is the gradient of f and the partial derivative of f with respect to the i-th variable is denoted by ∂_i f. To see this, let us calculate the inner product in H of DF and an arbitrary vector h ∈ H:

⟨DF, h⟩_H = ∑_{i=1}^{n} (∂f/∂x_i)(G(h_1), ..., G(h_n)) ⟨h_i, h⟩_H
= ∇f(x)|_{(G(h_1),...,G(h_n))} · (⟨h_1,h⟩_H, ..., ⟨h_n,h⟩_H)
= lim_{ε→0} (1/ε) [f(G(h_1) + ε⟨h_1,h⟩_H, ..., G(h_n) + ε⟨h_n,h⟩_H) − f(G(h_1), ..., G(h_n))],   (5.5)

where in the last equality we used property (5.4). For each h ∈ H and ε > 0, we can define the shifted isonormal Gaussian process G^h_ε = {G(g) + ε⟨g,h⟩_H, g ∈ H}. Similarly, for each F ∈ S there is a shifted random variable F^h_ε = f(G(h_1) + ε⟨h_1,h⟩_H, ..., G(h_n) + ε⟨h_n,h⟩_H) in the direction of h. Therefore, by (5.5),

⟨DF, h⟩_H = lim_{ε→0} (F^h_ε − F)/ε

is the directional derivative of F in the direction of h, and DF can be thought of as the total derivative of F.

Example 5.1.1 (Derivative in the white noise case). Fix a time interval [0,T]. Consider a Brownian motion {B_t, t ∈ [0,T]}. Let also the Hilbert space of deterministic square integrable functions f : [0,T] → R be denoted by H = L²([0,T],dt). For each h ∈ H, define the random variable

G(h) = ∫_0^T h(t) dB_t,

where the integral is interpreted in the Itô sense. This is a Gaussian random variable with mean E(G(h)) = 0 and covariance

E(G(h)G(g)) = ∫_0^T h(t)g(t) dt = ⟨h, g⟩_H,

by the Itô isometry. Then G = {G(h), h ∈ H} is an isonormal Gaussian process. Moreover,

G(I_{[0,t]}) = ∫_0^T I_{[0,t]}(s) dB_s = B_t,   t ≥ 0.   (5.6)

A formal way of viewing the Brownian motion B is as a stochastic process taking values in the set of all possible trajectories. Let Ω = C([0,T]) be the space of continuous functions ω : [0,T] → R. The Brownian motion induces a measure W on Ω called the Wiener measure. Then (Ω,F,W) is a probability space, where F is the Kolmogorov sigma algebra generated by the coordinate maps {π_t(ω) = ω(t), t ∈ [0,T]}. Under the Wiener measure the coordinate maps are a Brownian motion. Then each ω ∈ Ω is a trajectory of the Brownian motion, B(t,ω) = B_t(ω) = ω(t). Note that B_t is the coordinate map π_t under the Wiener measure. Let f be a polynomial and h_1, ..., h_n ∈ H. We define a class of cylindrical Wiener functionals

S_1 = {F : F = f(B_{t_1}, ..., B_{t_n}) = f(ω(t_1), ..., ω(t_n))}

and the larger class of Wiener polynomials

S_2 = {F : F = f(G(h_1), ..., G(h_n))},

where [G(h)](ω) = ∫_0^T h(t) dB_t(ω) ∈ L²(Ω,F,W). [G(h)](ω) is well defined only W-a.s.; hence F ∈ S_2 is defined only W-a.s. By Girsanov's theorem, the Cameron-Martin directions

h̃(·) = ∫_0^· h(t) dt,   h ∈ H,   (5.7)

are fine for shifting the argument ω of the functional while keeping it well defined, as the shifted Wiener measure is equivalent to the Wiener measure W. Consider the subspace

H_1 = {h̃ ∈ Ω : h̃(t) = ∫_0^t h(s) ds, h ∈ L²([0,T],dt)},

that is, the space of continuous functions with square integrable derivatives. This space is called the Cameron-Martin subspace. Take F ∈ S_1, so F(ω) = f(B(I_{[0,t_1]}), ..., B(I_{[0,t_n]})) = f(ω(t_1), ..., ω(t_n)); then by (5.5)

⟨DF, h⟩_H = lim_{ε→0} (1/ε) [f(ω(t_1) + ε∫_0^{t_1} h(s) ds, ..., ω(t_n) + ε∫_0^{t_n} h(s) ds) − f(ω(t_1), ..., ω(t_n))]
= (d/dε) F(ω + ε∫_0^· h(t) dt)|_{ε=0} = (d/dε) F(ω + εh̃)|_{ε=0}.

This shows that ⟨DF, h⟩ is a Gâteaux derivative. If a random variable F has a directional derivative D_γ F in the direction of

γ(t) = ∫_0^t h(s) ds,   h ∈ L²[0,T],

in the usual sense,

D_γ F(ω) = lim_{ε→0} (F(ω + εγ) − F(ω))/ε,

and in addition

D_γ F(ω) = ∫_0^T ψ(t,ω) h(t) dt   (5.8)

for some stochastic process ψ ∈ L²([0,T] × Ω), then F has a Fréchet derivative defined by DF(ω) = ψ(t,ω).

As mentioned above, the derivative of a random variable F is a random variable DF ∈ L²(Ω;H), where in the white noise case H = L²([0,T],dt). Thus, for each ω ∈ Ω, DF(ω) is a function in L²([0,T],dt). We write {D_t F, t ∈ [0,T]} for the derivative process at time t. This means that DF is a stochastic process, that is, an element of the canonical space L²(Ω × [0,T], F ⊗ B, dt × P).

Remark 5.1.1. If F is F_s-adapted, then D_t F = 0 for t > s. The intuition is that F depends only on the paths up to time s, so perturbing the paths after s, i.e. for t > s, should not change anything.
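As a simple illustration of Definition 5.1.1 and of this remark (a standard textbook example), take n = 1, h_1 = I_{[0,t]} and f(x) = x² in the white noise setting of Example 5.1.1, so that F = f(G(I_{[0,t]})) = B_t² by (5.6). Then (5.3) gives

DF = f′(G(I_{[0,t]})) I_{[0,t]} = 2B_t I_{[0,t]},

that is, D_s F = 2B_t I_{[0,t]}(s), which indeed vanishes for s > t since F is F_t-measurable.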

Now we give some properties and state theorems without proofs, which can be found, for example, in [2, 20]. By definition of the gradient operator for smooth random variables of the form (5.3), we have

D(FG) = F DG + G DF.

We also have an integration by parts formula on the Wiener space framework

E(⟨DF, h⟩_H) = E(F B(h)).

We can apply this result to the product of random variables FG. This yields

E(G⟨DF, h⟩_H) = −E(F⟨DG, h⟩_H) + E(FG B(h)).   (5.9)

5.1.2 Divergence Operator and Skorohod integral

In the framework of an isonormal Gaussian process over a Hilbert space H, the divergence operator δ is the adjoint of the derivative operator D. More precisely, the domain Dom(δ) consists of those u ∈ L²(Ω;H) such that there exists F ∈ L²(Ω) with the relationship

E(Fδ(u)) = E⟨DF, u⟩_H.   (5.10)

Remark 5.1.2. Taking F ≡ 1 in (5.10), we obtain E(δ(u)) = 0.

Consider two random variables F, G and h ∈ H. We can then compute δ(Fh) using (5.10) and (5.9)

E(Gδ(Fh)) = E(⟨DG, Fh⟩_H)
= E(F⟨DG, h⟩_H)
= −E(G⟨DF, h⟩_H) + E(GF B(h)),

which implies

δ(Fh) = F B(h) − ⟨DF, h⟩_H.   (5.11)

If we consider H = L²([0,T],dt), then the divergence operator is interpreted as a stochastic integral. Taking F ≡ 1, we get that δ coincides with the Itô integral on (deterministic) L²[0,T] functions, that is,

δ(h) = B(h) = ∫_0^T h(t) dB_t.

Moreover, if F is F_r-adapted and h = I_{(r,s]}(t), then by Remark 5.1.1, D_t F = 0 for t > r and

⟨DF, h⟩_H = ∫_0^T I_{(r,s]}(t) D_t F dt = 0.   (5.12)

Using (5.11) and the definition of the Itô integral on adapted step functions, we obtain

δ(F I_{(r,s]}) = F B(I_{(r,s]}) = F(ω(s) − ω(r)) = ∫_0^T F I_{(r,s]}(t) dB_t.   (5.13)

The divergence operator coincides with an extension of the Itô stochastic integral to integrands which are not necessarily F_t-adapted, introduced by Skorohod in [25]. For this reason the divergence operator is also called the Skorohod integral in this context. We denote

δ(u) = ∫_0^T u_s δB_s.   (5.14)

Remark 5.1.3. For u ∈ H and B(u) = δ(u), equation (5.11) also reads

δ(Fu) = Fδ(u) − ⟨DF, u⟩_H.

5.1.3 Wick-Itô Integral

Consider u = ∑_{i=1}^{m} F_i h_i, where F_i ∈ D^{1,2} and h_i ∈ H. Then u belongs to Dom(δ), and from (5.11),

δ(u) = ∑_{i=1}^{m} (F_i B(h_i) − ⟨DF_i, h_i⟩_H).   (5.15)

The expression F_i B(h_i) − ⟨DF_i, h_i⟩_H is called the Wick product of the random variables F_i and B(h_i), and it is denoted

F_i ⋄ B(h_i) = F_i B(h_i) − ⟨DF_i, h_i⟩_H.

With this notation, (5.15) yields

δ(u) = ∑_{i=1}^{m} F_i ⋄ B(h_i).   (5.16)

We may then use the notation δ(u) = ∫_0^T u_s ⋄ dB_s when u ∈ Dom(δ).
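As a simple illustration of (5.11) and Remark 5.1.2 (a standard example, not taken from the articles), let F = B_T and h = I_{[0,T]}. The integrand u_s = B_T, 0 ≤ s ≤ T, is not adapted, and

∫_0^T B_T δB_s = δ(B_T I_{[0,T]}) = B_T B(I_{[0,T]}) − ⟨DB_T, I_{[0,T]}⟩_H = B_T² − T,

which has mean zero, in agreement with Remark 5.1.2, whereas the ordinary product would give E(B_T · B_T) = T.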

34 5.2 Kernels

Non-negative kernels are the building blocks used to construct the fundamental concepts of this theory. The concept is of central importance for understanding the following sections. It is used to define regular conditional probabilities and to construct stochastic processes such as stochastic integrals with respect to the Wiener process and fractional Brownian motion. Let Ω be a set and A a countably generated collection of measurable subsets of Ω. The pair (Ω,A) is a measurable space called the state space, where A is a σ-algebra. The elements of Ω are called states. We first define a signed measure on (Ω,A) as a map μ : A → R that is countably additive, i.e.,

μ(∪_{i=1}^{∞} A_i) = ∑_{i=1}^{∞} μ(A_i)

for pairwise disjoint A_i.

A kernel on (Ω,A) is a function K : Ω × A → R having the following proper- ties (Nummelin 1984, section 1.1):

• For each fixed A ∈ A, the function x → K(x,A) is Borel measurable. This function is denoted by K(·,A).

• For each fixed x ∈ Ω, the function A → K(x,A) is a signed measure. This set function is denoted by K(x,·).

Kernels have the following standard operations. Given two kernels K1 and K2 we can "multiply" them to obtain a new kernel called the convolution of K1 and K2, denoted K3 = K1 ∗ K2, and defined by

K_3(x,A) = ∫ K_1(x,dy) K_2(y,A),   A ∈ A.

A new measure λ can be defined by "left multiplying" a kernel K by a signed measure μ, as

λ(A) = ∫ μ(dx) K(x,A),   A ∈ A.

We can also define a new measurable function g by "right multiplying" a kernel K by a measurable function f : Ω → R, as

g(x) = ∫ K(x,dy) f(y),

provided the integral exists.

There are two types of kernel we use. We denote by I the identity kernel, defined by

I(x,A) := I_A(x) := 1 if x ∈ A, and 0 otherwise.

Observe that this identity defines two familiar notions: the map I(·,A) is the indicator function of the set A, and the map I(x,·) is the probability delta measure concentrated at the point x. A second kernel is the integral kernel, denoted by

K(x,dy) := k(x,y) φ(dy),

where k(x,y) is a non-negative Borel measurable function and φ a signed measure.
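On a finite state space these notions become concrete: a kernel K(x,A) reduces to a matrix, the identity kernel to the identity matrix, and the three operations above to linear algebra. A small Python sketch (all numbers are invented for illustration):

import numpy as np

# A kernel K(x, A) on the two-point state space {0, 1} is a 2x2 matrix K[x, y];
# the identity kernel I(x, A) is the identity matrix.
I  = np.eye(2)
K1 = np.array([[0.9, 0.1],
               [0.4, 0.6]])
K2 = np.array([[0.5, 0.5],
               [0.2, 0.8]])
mu = np.array([0.3, 0.7])    # a (signed) measure on {0, 1}
f  = np.array([1.0, -2.0])   # a measurable function

K3  = K1 @ K2                # convolution K3 = K1 * K2
lam = mu @ K1                # "left multiplication": a new measure
g   = K1 @ f                 # "right multiplication": a new function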

Example 5.2.1. The Hilbert-Schmidt operator on L2 ([0,T],dx). The back- ground needed about integral operators to study the asymptotic properties of the estimator in paper III are based in this important concept. Let T > 0,a map

\[
k : [0,T] \times [0,T] \longrightarrow \mathbb{R}, \qquad (t,s) \longmapsto k(t,s)
\]
with $\int_{[0,T]^2} k(t,s)^2\, dt\, ds < +\infty$ defines a Hilbert–Schmidt operator $K$ on $L^2([0,T],dx)$ by

\[
K : L^2([0,T],dx) \longrightarrow L^2([0,T],dx), \qquad
\varphi \longmapsto K\varphi := \Bigl( t \mapsto \int_0^T k(t,s)\,\varphi(s)\, ds \Bigr).
\]

Notice that $K\varphi$ is a measurable function in $L^2([0,T],dx)$ obtained by the "right multiplication" kernel operation
\[
(K\varphi)(t) = \int_0^T K(t,ds)\,\varphi(s),
\]
where the integral kernel is defined by $K(t,ds) = k(t,s)\,ds$. In this case the integral kernel can be identified with the measurable function $k(t,s)$. If the kernel is supported on the set of points $D = \{(t,s) \in [0,T]^2 : t \geq s\}$, then $k(t,s)$ is called a Volterra kernel in $L^2$.
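Numerically, the action of such an integral operator is routinely approximated by quadrature, turning $K\varphi$ into a matrix-vector product. A minimal sketch, with a hypothetical Volterra kernel $k(t,s) = e^{-(t-s)}$ chosen only for illustration:

```python
import numpy as np

T, n = 1.0, 200
t = np.linspace(0.0, T, n)
dt = T / (n - 1)

# Volterra kernel k(t,s) = exp(-(t-s)) for t >= s, zero above the diagonal
tt, ss = np.meshgrid(t, t, indexing="ij")
k = np.where(tt >= ss, np.exp(-(tt - ss)), 0.0)

phi = np.sin(2 * np.pi * t)    # a test function in L^2([0,T])
Kphi = k @ phi * dt            # (K phi)(t_i) ~ int_0^{t_i} k(t_i,s) phi(s) ds

# Hilbert-Schmidt norm ||K||_HS^2 = int int k(t,s)^2 dt ds, approximated
hs_norm = np.sqrt(np.sum(k**2) * dt * dt)
```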

Example 5.2.2. In the general theory of counting processes, we have to define measures with respect to an increasing process. These measures are the building blocks used to construct a likelihood ratio estimator in a jump space. Let $A$ be a right-continuous increasing process on $(\mathbb{R}_+ \times \Omega,\, \mathcal{B}(\mathbb{R}_+) \otimes \mathcal{F}_\infty)$ such that $E[A_\infty] < \infty$. This process uniquely determines a random measure $M$ on $\mathbb{R}_+$ with

values $M((s,t],\omega) = A_t(\omega) - A_s(\omega)$ for all $\omega \in \Omega$. Observe that by construction $M(dt,\omega) = dA_t(\omega)$. One can compare the increasing process $A$ with the analogous fact that to each probability measure on $\mathbb{R}$ there is associated a unique distribution function. With the help of $A$, a measure $\mu_A$ can be defined on the product space by

\[
\mu_A([0,T] \times B) := \int_{\mathbb{R}_+ \times \Omega} I\bigl[(t,\omega),\, [0,T] \times B\bigr]\, d(M \otimes P)
= \int_B \int_0^T dA_t(\omega)\, P(d\omega), \qquad B \in \mathcal{F}_\infty.
\]

The measure $\mu_A$ is defined by the "left multiplication" of the identity kernel $I_{[0,T]\times B}(t,\omega)$ and the measure $d(M \otimes P) := dA_t(\omega)\, P(d\omega)$.

5.3 Regular conditional expectation

This section describes the construction of a regular conditional probability. In paper II we require a stronger notion of conditional probability than the one provided by the Kolmogorov conditional expectation.

Before we begin, let us recall the definition of conditional expectation for two discrete random variables $(X,Y)$. The conditional probability of $X = x$ given $Y = y$, denoted $P(X = x\,|\,Y = y)$, is defined by

\[
P(X = x \mid Y = y) = \frac{P(X = x,\, Y = y)}{P(Y = y)}, \qquad P(Y = y) \neq 0. \qquad (5.17)
\]

For a fixed y, P(X|Y = y) is a probability measure of X and the conditional expectation is defined by

\[
E(X \mid Y = y) = \sum_i x_i\, P(X = x_i \mid Y = y).
\]

Remark 5.3.1. Notice that for a fixed $x$, $P(X = x\,|\,Y)$ is a random variable, i.e., a measurable Borel function. This tells us that the conditional probability can be interpreted as a kernel, as defined in Section 5.2. As we will see below, this interpretation is not always correct, and only under certain conditions can we guarantee the existence of a kernel that defines the conditional probability.

$E(X|Y = y)$ is a function of the value $y$, and $E(X|Y)$ is a random variable whose value depends on the value of $Y$. For example, if $Y(\omega) = y$ then

\[
E(X \mid Y)(\omega) = E(X \mid Y = y).
\]

The family of all conditional probability distributions of $X$ with respect to $Y$ is denoted $P(X|Y)$, and it can be defined by the conditional expectation $P(X|Y) := E(I_X\,|\,Y)$.

For continuous random variables $(X,Y)$, we can no longer interpret the conditional probability as in (5.17). It turns out that it is still possible to define the conditional probability as in the discrete case, by means of the conditional probability density function $f(x|y)$,

\[
f(x \mid y) = \frac{f(x,y)}{f_Y(y)}, \qquad f_Y(y) \neq 0,
\]
where $f(x,y)$ is the joint density function and $f_Y(y)$ is the marginal density of $Y$. The conditional expectation is defined by
\[
E(X \mid Y = y) = \int_{\mathbb{R}} x\, f(x \mid y)\, dx.
\]
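For a quick sanity check of these definitions, one can tabulate a small joint pmf and verify numerically the tower property $E(X) = E(E(X|Y))$ discussed below. The table here is arbitrary, invented only for illustration:

```python
import numpy as np

# Joint pmf p[i, j] = P(X = x_i, Y = y_j) for a made-up example
x = np.array([0.0, 1.0, 2.0])
p = np.array([[0.10, 0.20],
              [0.30, 0.10],
              [0.05, 0.25]])

pY = p.sum(axis=0)          # marginal P(Y = y_j)
cond = p / pY               # P(X = x_i | Y = y_j), as in (5.17)
E_X_given_Y = x @ cond      # E(X | Y = y_j) for each j

# Tower property: E(E(X|Y)) equals E(X)
assert np.isclose(E_X_given_Y @ pY, x @ p.sum(axis=1))
```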

For any pair of random variables $(X,Y)$, discrete or continuous, $E(X|Y)$ is a random variable with properties that can be checked in any elementary probability book. The most important property is that $E(X) = E(E(X|Y))$ (see Casella for a proof). This property characterizes Kolmogorov's definition of conditional expectation with respect to a $\sigma$-algebra.

The notion of conditioning on a $\sigma$-algebra is more general than the notion of conditioning on a random variable. The question is with respect to what we are conditioning in a $\sigma$-algebra. For example, if an experiment has an outcome $\omega$, the only information available is the collection of events generated by the values of $Y$, contained in $\sigma(Y)$. These events are sets of the form $\{\omega : Y(\omega) \in B\}$ for $B \in \mathcal{B}$, the Borel $\sigma$-algebra. The intuition is then that $E(X|\sigma(Y))(\omega)$ is the expected value of $X$ given this information. In particular, if $Y$ is a finite discrete random variable, then the sets are of the form $\{\omega : Y(\omega) = y_i\}$ for $i = 1,2,\ldots,n$. One knows exactly which set has occurred and, as a result, the value of $Y$. The $\sigma$-algebra $\mathcal{G}$ formalizes the informal concept of partial information with respect to a $\sigma$-algebra [3].

Formally, the modern definition of conditional expectation was introduced by Kolmogorov using the Radon-Nikodym theorem [13]:

Definition 5.3.1. Let $(\Omega,\mathcal{F},P)$ be a probability space and $\mathcal{G} \subset \mathcal{F}$ a sub-$\sigma$-algebra. The conditional expectation of a random variable $X$ given $\mathcal{G}$, denoted $E(X|\mathcal{G})$, is a $\mathcal{G}$-measurable random variable with the following property (the Kolmogorov definition):
\[
\int_B X\, dP = \int_B E(X \mid \mathcal{G})\, dP, \qquad B \in \mathcal{G}. \qquad (5.18)
\]
Such a random variable always exists and is unique up to a set of probability zero (for a proof of this result see, for example, Williams [26]). The conditional probability of a set $A \in \mathcal{F}$ with respect to the $\sigma$-algebra $\mathcal{G}$ is the

random variable defined as $P(A|\mathcal{G}) := E(I_A|\mathcal{G})$, where $I_A$ is the indicator function of the set $A$. Observe that, by (5.18), the conditional probability has the property
\[
P(A, B) = \int_B P(A \mid \mathcal{G})\, dP \qquad (5.19)
\]
which resembles the law of total probability.

It is natural to ask whether the conditional probability can be considered a probability measure, so that we can define the conditional expectation
\[
E(X \mid \mathcal{G}) = \int_{-\infty}^{\infty} x\, P(dx \mid \mathcal{G}). \qquad (5.20)
\]

The usual answer is to seek a kernel function $K(\cdot,\cdot)$ such that, for fixed $A$, $K(\cdot,A) = P(A|\mathcal{G})$ is a $\mathcal{G}$-measurable function and, for fixed $x$, $K(x,\cdot)$ is a probability measure. Such a kernel is called a regular conditional probability of $X$ with respect to $\mathcal{G}$. The formal definition is taken from Ash [1].

Definition 5.3.2. Let $X : (\Omega,\mathcal{F}) \longrightarrow (\Omega',\mathcal{F}')$ be a random object, and $\mathcal{G}$ a sub-$\sigma$-algebra of $\mathcal{F}$. The kernel function $K : \Omega \times \mathcal{F}' \longrightarrow [0,1]$ is called a regular conditional probability for $X$ given $\mathcal{G}$ iff

(1) K(ω,A) is a probability measure in A for each fixed ω ∈ Ω, and

(2) for each fixed $A \in \mathcal{F}'$, $K(\cdot,A) = P(X \in A \mid \mathcal{G})$ a.s.

Example 5.3.1. As an example of a conditional expectation defined with respect to a regular conditional probability, consider the drift parameter estimation of an ergodic diffusion process on the interval $[0,T]$,

\[
dX_t = S(\vartheta, X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0 = 0,\; t \geq 0,
\]
where $\vartheta \in \Theta \subset \mathbb{R}^d$ is the unknown parameter. A dynamical statistical model of the diffusion consists of a family of experiments $(\Omega, \mathcal{F}, \{P_\vartheta;\, \vartheta \in \Theta\})$ and the partial information generated by the sample path of the process, described by the filtration $\{\mathcal{F}_t^X;\, t \geq 0\}$. In a Bayesian framework, suppose the unknown parameter $\vartheta$ has a prior probability measure on $\Theta$ with density $f(\cdot)$. Further suppose that the loss function $\ell(y) = y^2$ is given. We define the mean risk function

\[
R(\bar{\vartheta}_T) = \inf_{\bar{\vartheta}_T} \int_\Theta \int_\Omega \ell(\bar{\vartheta}_T - \theta)\, f(\theta)\, dP_\theta\, d\theta.
\]
A Bayesian estimator $\tilde{\vartheta}_T$ that minimizes the mean risk function is given by
\[
\tilde{\vartheta}_T = E(\vartheta \mid \mathcal{F}_T^X) = \int_\Theta \theta\, P(d\theta \mid \mathcal{F}_T^X),
\]

where $P(d\theta \mid \mathcal{F}_T^X)$ is the posterior probability distribution, i.e., the conditional probability of $\vartheta$ with respect to the sub-$\sigma$-algebra generated by the process $X$ on the interval $[0,T]$. Observe that this is equation (5.20) written in terms of the kernel density. For further details, the interested reader is referred to [15].
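As a numerical illustration (a sketch under simplifying assumptions, not taken from paper III), take $S(\vartheta,x) = -\vartheta x$ and $\sigma \equiv 1$, an Ornstein-Uhlenbeck process. By the likelihood formula of Section 5.4 below, the log-likelihood ratio is $\ell(\vartheta) = -\vartheta \int_0^T X\,dX - \tfrac{\vartheta^2}{2}\int_0^T X^2\,dt$, and the posterior mean can be approximated on a grid:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, T, n = 2.0, 50.0, 50_000
dt = T / n

# Euler-Maruyama path of dX = -theta * X dt + dW (ergodic for theta > 0)
X = np.zeros(n + 1)
dW = rng.standard_normal(n) * np.sqrt(dt)
for i in range(n):
    X[i + 1] = X[i] - theta_true * X[i] * dt + dW[i]

# Sufficient statistics of the likelihood ratio
S1 = np.sum(X[:-1] * np.diff(X))   # int_0^T X dX (Ito sum)
S2 = np.sum(X[:-1] ** 2) * dt      # int_0^T X^2 dt

# Posterior mean under a N(0, 4) prior on theta, computed on a grid
grid = np.linspace(0.0, 5.0, 2001)
log_post = -grid * S1 - 0.5 * grid**2 * S2 - grid**2 / 8.0
w = np.exp(log_post - log_post.max())
theta_bayes = np.sum(grid * w) / np.sum(w)   # approximates E(theta | F_T^X)
```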

However, to define such a kernel is not a straightforward task.

5.3.1 Construction of regular conditional probability

The problem with Kolmogorov's definition is that it depends on the set $A$. Although the random variable $P(A|\mathcal{G})$ always exists for each measurable set $A$, each $P(A|\mathcal{G})$ is defined uniquely only up to a set of probability zero. This means that $P(A|\mathcal{G})$ is not uniquely defined for all $\omega \in \Omega$, but only for $\omega$ outside a $P$-null set $N_A$, which may depend on the set $A$. This implies that $P(A|\mathcal{G})$ may fail to be a probability measure. For example, a countable sequence of disjoint sets $A_1, A_2, \ldots \in \mathcal{F}$ satisfies the countable additivity property
\[
P\Bigl(\bigcup_{i=1}^{\infty} A_i \Bigm| \mathcal{G}\Bigr)(\omega) = \sum_{i=1}^{\infty} P(A_i \mid \mathcal{G})(\omega)
\]
only for $\omega$ belonging to $\Omega \setminus N(A_1,A_2,\ldots)$ for some exceptional set $N(A_1,A_2,\ldots)$; that is, $P(\cdot|\mathcal{G})(\omega)$ is a probability measure only if $\omega \in \Omega \setminus N(A_1,A_2,\ldots)$. The difficulty is that the union of the sets $N(A_1,A_2,\ldots)$ over all such sequences may not be a set of probability zero, or need not even be measurable at all, because it is built from an uncountable union of sets. For another example of non-existence see [24], Section II.43.

To avoid this difficulty, we require that the space $(\Omega',\mathcal{F}')$ where $X$ takes its values is a complete separable metric space with the Borel $\sigma$-algebra ([1], Section 6, Theorem 6.6). This assures the existence and uniqueness of a kernel function $K(x,A)$ such that, for fixed $A$, $K(\cdot,A) = P(A|\mathcal{G})$ is a $\mathcal{G}$-measurable random variable. For the proof of existence, the interested reader is referred to [7], Theorem 19, Chapter 21.

Example 5.3.2. In paper II we consider a stochastic process with respect to a filtration of $\sigma$-algebras on the space $(\Omega = \mathbb{R}^\infty, \mathcal{F} = \mathcal{B}(\mathbb{R}^\infty))$. This is a complete separable metric space with its Borel $\sigma$-algebra, which ensures the existence of regular conditional probabilities.

Example 5.3.3 (Continuation of Example 5.2.2). In the general theory of counting processes $\{N_t : 0 \leq t \leq \infty\}$, a stochastic intensity $\lambda$ is heuristically defined as the predictable conditional expectation of the number of jumps per unit time,

\[
\lambda_t\, dt = E(dN_t \mid \mathcal{F}_{t-}). \qquad (5.21)
\]

Formally, the integral of the intensity defines a new process by
\[
\Lambda_t = \int_0^t \lambda_u\, du.
\]
This process is left-continuous and increasing. It thus determines a random measure $\mu_\Lambda$ on $\mathbb{R}_+$ with values $\mu_\Lambda((0,t],\omega) = \Lambda_t(\omega)$. The measure is denoted $\Lambda_t$ by an abuse of notation. The intensity $\lambda_t$ is a likelihood ratio function of the restriction to $\mathcal{F}_{t-}$ of the measure $\Lambda$ with respect to the restriction to $\mathcal{F}_{t-}$ of the Lebesgue measure,
\[
\lambda_t = \frac{d\Lambda_t}{dt}\Big|_{(\mathcal{F}_{t-})}. \qquad (5.22)
\]

It can thus be interpreted as $d\Lambda_t = E(dN_t \mid \mathcal{F}_{t-})$. With the help of $\Lambda$, a measure $\mu_\Lambda$ can be defined on the product space by
\[
\mu_\Lambda([0,T] \times B) := \int_B \int_0^T E(dN_t \mid \mathcal{F}_{t-})(\omega)\, P(d\omega), \qquad B \in \mathcal{F}_\infty.
\]

The measure $\mu_\Lambda$ is defined by the "left multiplication" of the identity kernel $I_{[0,T]\times B}(t,\omega)$ and the product measure $E(dN_t \mid \mathcal{F}_{t-})(\omega)\, P(d\omega)$.

5.3.2 Likelihood ratio function

In paper II we interpret the regular conditional probabilities as a likelihood ratio process. This is natural since Kolmogorov's Definition 5.3.1 is based on the Radon-Nikodym theorem [26].

Theorem 5.3.1. Suppose that $(\Omega,\mathcal{F},P)$ is a probability triple in which $\mathcal{F}$ is separable, in the sense that $\mathcal{F} = \sigma(F_n : n \in \mathbb{N})$ for some sequence $(F_n)$ of subsets of $\Omega$. Suppose that $Q$ is a finite measure on $(\Omega,\mathcal{F})$ which is absolutely continuous relative to $P$, in the sense that for $F \in \mathcal{F}$, $P(F) = 0 \Rightarrow Q(F) = 0$. Then there exists $X$ in $L^1(\Omega,\mathcal{F},P)$ such that
\[
Q(F) = \int_F X\, dP, \qquad F \in \mathcal{F}.
\]
The variable $X$ is called a version of the Radon-Nikodym derivative of $Q$ relative to $P$ on $(\Omega,\mathcal{F})$.

If we consider the measures restricted to the sub-$\sigma$-algebra $\mathcal{G}$ and define
\[
\lambda(B) := \int_B X\, dP, \qquad B \in \mathcal{G},
\]
then $\lambda$ is finite and absolutely continuous with respect to $P$ [1]. We thus define the Radon-Nikodym derivative
\[
E(X \mid \mathcal{G}) = \frac{\lambda(d\omega)}{P(d\omega)}\Big|_{(\mathcal{G})}, \qquad (5.23)
\]
where the superscript $(\mathcal{G})$ indicates that the measures are restricted to $\mathcal{G}$. We thus get the result
\[
\lambda(B) = \int_B E(X \mid \mathcal{G})\, P(d\omega), \qquad B \in \mathcal{G}.
\]

Observe that equation (5.23) may be interpreted as the so-called likelihood ratio function of two probability distributions. If $X = I_A$ then by (5.19)
\[
\lambda(B) := P(A, B) = \int_B P(A \mid \mathcal{G})\, P(d\omega), \qquad B \in \mathcal{G}. \qquad (5.24)
\]

Example 5.3.4 (Continuation of Example 5.3.3). In credit risk theory a very important class of dynamic models is the Cox process, also known as the doubly stochastic Poisson process. The martingale characterization of this process is an example of a counting process with stochastic intensity [4]. Under certain conditions, a Cox process is a counting process $N$ adapted to a filtration $\{\mathcal{F}_t : t \geq 0\}$ with stochastic intensity $\lambda_t^* = f(t, X_t)$ for some random process $X_t$. The Cox process also has an $\mathcal{F}_t$-predictable version of the intensity, defined as
\[
\lambda_t = E(\lambda_t^* \mid \mathcal{F}_{t-}) = \frac{d(P \otimes N_t)}{P(d\omega)\, dt}\Big|_{(\mathcal{F}_{t-})},
\]
that is, it suffices to define $\lambda_t$ as a likelihood ratio function of the restriction to $\mathcal{F}_{t-}$ of the measure $P \otimes N_t$ with respect to the restriction to $\mathcal{F}_{t-}$ of the measure $P(d\omega)\, dt$. Notice that this is the same as the likelihood ratio function (5.22). Hence, we may define the measure
\[
\Lambda_t = \int_0^t E(\lambda_u^* \mid \mathcal{F}_{u-})\, du,
\]
where $\Lambda_t$ is the left-continuous increasing version of the process $N$ with intensity $\lambda_t$.
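A Cox process with a bounded intensity $\lambda_t^* = f(t, X_t)$ can be simulated by thinning once a path of $X$ has been drawn. The following sketch uses Lewis-Shedler thinning with a hypothetical intensity invented for illustration (an exponential of an Ornstein-Uhlenbeck path, capped so that a bound is available):

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 10.0, 10_000
dt = T / n
t = np.linspace(0.0, T, n)

# Hypothetical random intensity lambda*_t = f(X_t) with X an OU path
X = np.zeros(n)
for i in range(n - 1):
    X[i + 1] = X[i] - X[i] * dt + np.sqrt(dt) * rng.standard_normal()
lam_max = 5.0
lam = lam_max * np.exp(X - X.max())    # bounded above by lam_max

# Lewis-Shedler thinning: propose homogeneous Poisson(lam_max) points,
# keep each proposal s with probability lam(s) / lam_max
m = rng.poisson(lam_max * T)
proposals = np.sort(rng.uniform(0.0, T, m))
keep = rng.uniform(size=m) < lam[np.searchsorted(t, proposals) - 1] / lam_max
jumps = proposals[keep]                # jump times of the Cox process N
```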

Example 5.3.5. The interpretation of Theorem 3.1 in paper II is explained in Remarks 3.1-3.4 of that paper.

5.4 Maximum likelihood ratio process and inference

In statistical inference for ergodic diffusion processes, the maximum likelihood estimation, likelihood ratio or likelihood function plays a central role, and several statistical estimators are based on it. The likelihood function is the so-called Radon-Nikodym derivative (cf. Section 5.3.2).

Let us consider a statistical model $(C_T, \mathcal{B}_T, \mathcal{P})$ where $C_T$ is the space of continuous functions $x : [0,T] \to \mathbb{R}$ and $\mathcal{P} := \{P_\vartheta;\, \vartheta \in \Theta\}$ is an indexed family of probability distributions absolutely continuous with respect to each other. Let further $\vartheta_0 \in \Theta$ satisfy $P_\vartheta \ll P_{\vartheta_0}$ for all $\vartheta \in \Theta$. Consider also the filtration $\{\mathcal{F}_t^W;\, t \geq 0\}$ generated by the Wiener process. Assume that $X = \{X_t,\, t \geq 0\}$ is a stochastic process and $X^T := \{X_t;\, 0 \leq t \leq T\}$ is a sample path in $C_T$ evolving according to one measure $P_\vartheta$. Then the likelihood ratio function at time $T$ is defined as
\[
L(\vartheta, \vartheta_0) = \frac{dP_\vartheta}{dP_{\vartheta_0}}(X^T),
\]
where the measures are restricted to the $\sigma$-algebra $\mathcal{F}_T^W$. The maximum likelihood estimator $\hat{\vartheta}_T$ is defined to be the solution of the equation

\[
L(\hat{\vartheta}_T, \vartheta_0) = \sup_{\vartheta \in \Theta} L(\vartheta, \vartheta_0). \qquad (5.25)
\]
Suppose $X$ is a diffusion process with stochastic differential equation

\[
dX_t = \mu(t, X_t, \vartheta)\, dt + \sigma(t, X_t)\, dB_t, \qquad X_0 = x,
\]
where $\{B_t,\, t \geq 0\}$ is a Wiener process (cf. Section 1.3) and $\vartheta \in \Theta \subset \mathbb{R}^k$ is unknown. Denote by $P_\vartheta$ the measure induced by the process $X$ on $C_T$ with $\vartheta$ unknown. Liptser and Shiryaev ([16], Theorem 7.19) proved that the likelihood function is given by
\[
\frac{dP_\vartheta}{dP_{\vartheta_0}}(X^T) = \exp\Biggl\{ \int_0^T \frac{\mu(t, X_t, \vartheta) - \mu(t, X_t, \vartheta_0)}{\sigma^2(t, X_t)}\, dX_t
- \frac{1}{2}\int_0^T \frac{\mu(t, X_t, \vartheta)^2 - \mu(t, X_t, \vartheta_0)^2}{\sigma^2(t, X_t)}\, dt \Biggr\} \qquad (5.26)
\]
almost surely. In particular, if we choose the measure $P_B$ induced by the Wiener process on $C_T$, so that $P_\vartheta \ll P_B$, then the likelihood function is defined as $\frac{dP_\vartheta}{dP_B}$. This is because, by the Girsanov theorem, the process $X$ may be interpreted as a Brownian motion with drift $\mu(t, X_t, \vartheta)$. The likelihood function $\frac{dP_\vartheta}{dP_B}$ contains all the information about the drift term needed for estimating the unknown parameter $\vartheta$. The maximum likelihood estimator
\[
L(\hat{\vartheta}_T, B) = \sup_{\vartheta \in \Theta} \frac{dP_\vartheta}{dP_B}(X^T)
\]

is the same as the one in (5.25) (cf. Remark 5.4.2). Liptser and Shiryaev ([16], Theorem 7.7) proved that the likelihood is, in this case,
\[
\frac{dP_\vartheta}{dP_B}(X^T) = \exp\Biggl\{ \int_0^T \frac{\mu(t, X_t, \vartheta)}{\sigma^2(t, X_t)}\, dX_t - \frac{1}{2}\int_0^T \frac{\mu(t, X_t, \vartheta)^2}{\sigma^2(t, X_t)}\, dt \Biggr\}.
\]
For more details on the likelihood ratio for diffusion processes, we refer the interested reader to [15, 16].
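To make (5.25)-(5.26) concrete, take $\mu(t,x,\vartheta) = -\vartheta x$ and $\sigma \equiv 1$. Then $\log \frac{dP_\vartheta}{dP_B}$ is quadratic in $\vartheta$ and the maximizer has the closed form $\hat{\vartheta}_T = -\int_0^T X\,dX \big/ \int_0^T X^2\,dt$. A hedged simulation check (Euler discretization, an illustration added here rather than taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, T, n = 1.5, 100.0, 100_000
dt = T / n

# Euler-Maruyama path of dX = -theta * X dt + dB
X = np.zeros(n + 1)
for i in range(n):
    X[i + 1] = X[i] - theta * X[i] * dt + np.sqrt(dt) * rng.standard_normal()

# Maximizer of log dP_theta/dP_B = -theta * int X dX - theta^2/2 * int X^2 dt
theta_hat = -np.sum(X[:-1] * np.diff(X)) / (np.sum(X[:-1] ** 2) * dt)
print(theta_hat)   # close to 1.5 for large T
```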

Remark 5.4.1. The process $X$ is defined on a probability space $(\Omega,\mathcal{F},P)$. The Radon-Nikodym derivative is
\[
L_T = \exp\Biggl\{ \int_0^T \varphi_t(\omega)\, dB_t - \frac{1}{2}\int_0^T \varphi_t(\omega)^2\, dt \Biggr\}
= \exp\Biggl\{ \int_0^T \frac{\varphi_t(\omega)}{\sigma(t, X_t)}\, \sigma(t, X_t)\, dB_t - \frac{1}{2}\int_0^T \varphi_t(\omega)^2\, dt \Biggr\},
\]
where
\[
\varphi_t(\omega) = \frac{\mu(t, X_t, \vartheta) - \mu(t, X_t, \vartheta_0)}{\sigma(t, X_t)}.
\]
Now adding and subtracting the term
\[
\int_0^T \frac{\bigl(\mu(t, X_t, \vartheta) - \mu(t, X_t, \vartheta_0)\bigr)\, \mu(t, X_t, \vartheta)}{\sigma^2(t, X_t)}\, dt
\]
to $L_T$ we obtain equation (5.26).

Remark 5.4.2. Notice that the definition of the maximum likelihood estimator in (5.25) does not depend on the choice of $P_{\vartheta_0}$. If we choose another probability measure $P_{\vartheta_1}$ such that $P_\vartheta \ll P_{\vartheta_0} \ll P_{\vartheta_1}$, then by the chain rule property of the Radon-Nikodym derivative

\[
\frac{dP_\vartheta}{dP_{\vartheta_1}}(X^T) = \frac{dP_\vartheta}{dP_{\vartheta_0}} \cdot \frac{dP_{\vartheta_0}}{dP_{\vartheta_1}}(X^T).
\]

The term $\frac{dP_{\vartheta_0}}{dP_{\vartheta_1}}$, as a function of $\vartheta$, is a constant $\eta$, so
\[
L(\hat{\vartheta}_T, \vartheta_1) = \eta \cdot \sup_{\vartheta \in \Theta} \frac{dP_\vartheta}{dP_{\vartheta_0}}(X^T).
\]

Bibliography

[1] R. B. Ash. Real Analysis and Probability. Probability and Mathematical Statistics: A Series of Monographs and Textbooks. Academic Press, 2014.

[2] F. Biagini, B. Øksendal, A. Sulem, and N. Wallner. An introduction to white-noise theory and Malliavin calculus for fractional Brownian motion. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 460, pages 347-372. The Royal Society, 2004.

[3] P. Billingsley. Probability and Measure. John Wiley & Sons, 2008.

[4] P. Brémaud. Point Processes and Queues. Springer, 1981.

[5] W. Dai and C. C. Heyde. Itô's formula with respect to fractional Brownian motion and its application. International Journal of Stochastic Analysis, 9(4):439-448, 1996.

[6] R. Durrett. Stochastic Calculus: A Practical Introduction, volume 6. CRC Press, 1996.

[7] B. E. Fristedt and L. F. Gray. A Modern Approach to Probability Theory. Springer Science & Business Media, 1997.

[8] Y. Hu and B. Øksendal. Fractional white noise calculus and applications to finance. Infinite Dimensional Analysis, Quantum Probability and Related Topics, 6(1):1-32, 2003.

[9] Y. Hu and D. Nualart. Parameter estimation for fractional Ornstein-Uhlenbeck processes. Statistics & Probability Letters, 80(11):1030-1038, 2010.

[10] H. E. Hurst. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116:770-808, 1951.

[11] T. Kaarakka and P. Salminen. On fractional Ornstein-Uhlenbeck processes. Communications on Stochastic Analysis, 5(1):121-133, 2011.

[12] P. E. Kloeden and R. A. Pearson. The numerical solution of stochastic differential equations. The Journal of the Australian Mathematical Society, Series B, Applied Mathematics, 20(1):8-12, 1992.

[13] A. N. Kolmogorov. Foundations of the Theory of Probability. Chelsea Publishing Co., 1950.

[14] A. N. Kolmogorov. Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum. C. R. (Dokl.) Acad. Sci. URSS, 26:115-118, 1940.

[15] Y. A. Kutoyants. Statistical Inference for Ergodic Diffusion Processes. Springer Science & Business Media, 2004.

[16] R. Liptser and A. N. Shiryaev. Statistics of Random Processes I: General Theory, volume 5. Springer Science & Business Media, 1977.

[17] P. Malliavin. C^k-hypoellipticity with degeneracy. Stochastic Analysis, pages 199-214, 1978.

[18] P. Malliavin. Stochastic calculus of variation and hypoelliptic operators. In Proc. Intern. Symp. SDE Kyoto 1976, pages 195-263. Kinokuniya, 1978.

[19] B. B. Mandelbrot and J. W. Van Ness. Fractional Brownian motions, fractional noises and applications. SIAM Review, 10(4):422-437, 1968.

[20] D. Nualart. The Malliavin Calculus and Related Topics. Probability and its Applications. Springer-Verlag, Berlin, 2006.

[21] D. Nualart. Fractional Brownian motion: stochastic calculus and applications. In International Congress of Mathematicians, volume 3, pages 1541-1562, 2006.

[22] B. Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer Science & Business Media, 2013.

[23] P. Protter. Stochastic Integration and Differential Equations: A New Approach. Springer, 1990.

[24] L. C. G. Rogers and D. Williams. Diffusions, Markov Processes and Martingales: Volume 1, Foundations. Cambridge University Press, 2000.

[25] A. V. Skorokhod. On a generalization of a stochastic integral. Theory of Probability & Its Applications, 20(2):219-233, 1976.

[26] D. Williams. Probability with Martingales. Cambridge University Press, 1991.

[27] L. C. Young. An inequality of the Hölder type, connected with Stieltjes integration. Acta Mathematica, 67(1):251-282, 1936.

6. Summary of Papers

Paper I: Empirical evidence on arbitrage by changing the stock exchange

In this paper we study the change of stock exchange from the perspective of mathematical finance. This work was motivated by the 2006 press release of the software developer Red Hat, Inc. announcing its departure from the NASDAQ stock exchange to be listed on the New York Stock Exchange, in the belief that it would reduce trading volatility.

The paper shows how arbitrage can be generated in an options market by a change in the volatility due to the change of stock exchange of the underlying asset. We introduce the dynamics of a financial asset by a Black-Scholes model that is non-homogeneous in volatility. We assume a reduction in the trading volatility after the change of stock exchange. The prices of the options in the models on the two stock exchanges do not coincide, and this gives rise to arbitrage opportunities. We construct one arbitrage opportunity by short-selling a convex European vanilla option, and we extend the result to general convex payoff functions by means of the integral representation of convex functions.

We examine the feasibility of the volatility-reduction assumption with daily adjusted closing prices of the Red Hat stock during the period from February 6, 2006 to May 8, 2008. We analyse the data using a technical method called Bollinger bands to identify periods of high and low volatility. We also carry out a left-tailed F-test at the 1% significance level, as sketched below. In short, the analysis confirms that the volatility changes in a significant manner after the switch of trading market and that the structural change in volatility described by our model exists in a practical setting.
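The variance test of the last step can be sketched as follows; the returns here are synthetic stand-ins, not the Red Hat data, and the sample sizes are invented. The null hypothesis is equal variances against the left-tailed alternative that the post-move variance is smaller.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
before = rng.normal(0.0, 0.030, 250)   # synthetic daily log-returns, pre-move
after  = rng.normal(0.0, 0.022, 250)   # synthetic daily log-returns, post-move

# Left-tailed F-test of H0: var_after = var_before vs H1: var_after < var_before
F = after.var(ddof=1) / before.var(ddof=1)
p = stats.f.cdf(F, len(after) - 1, len(before) - 1)   # left-tail p-value
print(F, p < 0.01)   # True means rejection at the 1% level
```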

Paper II: Initial Enlargement in a Markov Chain Market Model.

Enlargement of filtrations is a classical topic in the general theory of stochas- tic processes. This theory has been applied to stochastic finance in order to

analyze models with insider information. We study initial enlargement in a Markov chain market model introduced by Norberg. In this model the state of the economy is modeled by a finite-state Markov chain, and the state of the economy determines the dynamics of the risky assets. The ordinary agent has the information described by the filtration generated by an observable process, but the insider has the additional information given by a certain random variable. We assume that the ordinary agent has no arbitrage possibilities. Then, in the initial enlargement, the following things can happen:

• In the original filtration the jump times are totally inaccessible, but in the enlarged filtration there can be accessible and predictable jump times.

• Independently of the possible changes in the properties of the jump times, the insider may or may not have arbitrage possibilities.

The motivation for this study comes from the jump model example introduced by Kohatsu-Higa. Our results show some additional features in the enlarge- ment theory for processes with jumps.

Paper III: Drift parameter estimation for fractional Ornstein- Uhlenbeck of the second kind

In this paper we consider the least squares estimator of the unknown drift parameter based on continuous observations of a process driven by the fractional Ornstein-Uhlenbeck process of the second kind. This set-up is an extension of the fractional Ornstein-Uhlenbeck process of the first kind studied in [9].

We replace the Gaussian noise driving the process with an equivalent (in distribution) noise and do the computations in a statistically equivalent model. We prove that the least squares estimator introduced in [9] provides a consistent estimator. Moreover, using a central limit theorem for multiple Wiener integrals, we prove asymptotic normality of the estimator, valid for the whole range H ∈ (1/2, 1).

Paper IV: On Simulation of fractional Ornstein-Uhlenbeck of the second kind by Circulant Embedding method

Fractional Ornstein-Uhlenbeck of the second kind (fOU2) comprises a family of Gaussian processes constructed via the Lamperti transformation of frac- tional Brownian motion with index H (0 < H < 1).

A random vector of fOU2 can be simulated by factorizing its covariance matrix. Among the existing algorithms, Hosking's method and the Cholesky decomposition are the standard ways to do this. These methods can simulate one-dimensional fOU2 accurately, but they have a high computational cost when applied on a fine grid. The reason is that the fOU2 vector becomes very long and its covariance matrix too large to handle in terms of storage and computational requirements. However, if the covariance matrix is circulant, an algorithm that is better in terms of the dual criteria of accuracy and efficiency can be applied: the circulant embedding method.

The circulant embedding method exploits the fact that a circulant matrix can be factorized at very low computational cost using the fast Fourier transform, particularly if the number of grid points is a power of two. Although the fOU2 covariance matrix is not circulant, we can embed it into a circulant matrix.
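The following sketch shows the general circulant embedding construction for a stationary Gaussian sequence given its covariance function; it is illustrative rather than the calibrated fOU2 construction of the paper, and it is exemplified with fractional Gaussian noise, for which the embedding is known to be non-negative definite.

```python
import numpy as np

def circulant_embedding(cov, n, rng=None):
    """One exact sample of a stationary Gaussian sequence of length n,
    given its covariance cov(k) at integer lags k = 0, ..., n-1.
    Minimal sketch; assumes the circulant embedding is non-negative
    definite (true, e.g., for fractional Gaussian noise)."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.array([cov(k) for k in range(n)])
    row = np.concatenate([r, r[-2:0:-1]])   # circulant first row, size 2n-2
    m = len(row)
    lam = np.fft.fft(row).real              # circulant eigenvalues via FFT
    if lam.min() < -1e-10:
        raise ValueError("embedding is not non-negative definite")
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    w = np.fft.fft(np.sqrt(np.clip(lam, 0, None) / (2 * m)) * z)
    return w.real[:n]                       # w.imag[:n] is an independent copy

# Example: fractional Gaussian noise, cumulated to fractional Brownian motion
H = 0.75
cov = lambda k: 0.5 * (abs(k-1)**(2*H) - 2*abs(k)**(2*H) + abs(k+1)**(2*H))
fbm = np.cumsum(circulant_embedding(cov, 2**12))
```

The FFT makes the cost $O(m \log m)$, versus $O(n^3)$ for a Cholesky factorization, and the power-of-two grid size keeps the FFT at its fastest.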

Paper V: Simulations-based study of covariance structure for fractional Ornstein-Uhlenbeck process of the second kind

In this paper we explore the behaviour of the covariance structure as a function of the Hurst parameter H and the drift parameter θ. We present well-known computational algorithms based on the covariance function to simulate Gaussian and non-Gaussian fractional Ornstein-Uhlenbeck processes of the second kind (fOU2). We also check the assumptions that the covariance has to fulfil for these methods to apply.

We propose a new closed form of the fOU2 covariance function in order to study the range of the parameter values. The key point is to study the asymptotic behaviour of the integral part of the new expression by means of numerical growth rate analysis.

In the range of parameter values defined in this study, the fOU2 Gaussian process can be generated exactly by means of the circulant embedding method. As an extension of this algorithm, we study the non-Gaussian case by Hermite approximations and memoryless non-linear translations, sketched below.

We illustrate the algorithms with a variety of distributions such as the Gaussian, Gamma, log-normal, Pareto and t-distributions. We also investigate the quantitative mean error of the sample covariance for an individual realization.
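The memoryless translation step can be sketched as follows (a generic illustration, not the calibrated construction of the paper): a sample x with standard Gaussian marginals is mapped pointwise through y = F⁻¹(Φ(x)), which imposes the target marginal F.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.standard_normal(10_000)          # stand-in for a normalized Gaussian fOU2 sample

u = stats.norm.cdf(x)                    # uniform marginals via Phi
y_gamma  = stats.gamma.ppf(u, a=2.0)     # Gamma(2) marginals
y_pareto = stats.pareto.ppf(u, b=3.0)    # Pareto(3) marginals
```

Note that the pointwise map distorts the covariance of the input process; relating the input and output covariances is where the Hermite expansions studied in the paper enter.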

7. Swedish Summary

This doctoral dissertation aims to extend probabilistic and statistical models based on stochastic differential equations. The models described capture essential features of data that are not explained by classical diffusion models driven by Brownian motion.

New results, derived by the author, are presented in five papers, organized in two parts. Part 1 contains three papers on statistical inference and simulation of a family of stochastic processes related to fractional Brownian motion and the Ornstein-Uhlenbeck process, the so-called fractional Ornstein-Uhlenbeck processes of the second kind (fOU2). In two of the papers we show how fOU2 processes can be simulated by means of circulant embedding and memoryless transformations. In the third paper we construct a least squares estimator that yields consistent estimation of the drift parameter and prove a central limit theorem with techniques from the statistical analysis of Gaussian processes and Malliavin calculus.

Part 2 of my research consists of two papers on market models with sudden jumps and portfolio strategies with arbitrage for an insider trader. One of the papers describes two arbitrage-free markets with the risk-neutral valuation formula and an arbitrage strategy that consists of switching between the markets. The essential component is the difference between the volatilities of the markets. Statistical evidence for this situation is presented on the basis of a sequential data set. In the second paper we analyse arbitrage strategies of an insider trader in a financial market that evolves according to a Markov chain in which all changes of state are sudden jumps. We do this by means of a likelihood process, which we construct on an enlarged filtration using Itô calculus and the general theory of stochastic processes.
