Imperial College of Science, Technology and Medicine Department of Computing

Parallel Numerical methods for SDEs and Applications

Nada Atallah

Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy in Computing and the Diploma of Imperial College London, July 2016

Abstract

Stochastic Differential Equations (SDEs) constitute an important mathematical tool with applications in many areas of research such as finance, physics and computer science. The analytical study of these equations is problematic, especially in the multi-dimensional case; for this reason, numerical techniques prove to be necessary to solve such equations. In this project, parallel numerical techniques for SDEs are studied. Two kinds of parallelism will be explored: in space and in time. These techniques will be implemented (using C++ and MPI) and applied to several systems of SDEs, and performance measures like speedup and efficiency will be investigated on medium-scale computer clusters.

In the second part of the thesis, a major application area in the field of computer and communication networks will be studied, that of second-order stochastic fluid networks. Recently, interest has been growing in networks with large numbers of components, with applications in diverse fields such as internet performance evaluation, the spread of computer viruses and biochemistry. Such models have state spaces so large that discrete-state models are numerically infeasible. Fluid approximations are therefore preferable and tend to be more accurate in large state spaces. In fluid models, an integer counter is replaced by a real number representing a volume, and the solution method becomes based on differential equations rather than on difference equations. Some analytical solutions are possible in special cases but, in general, numerical methods are required. In this project, parallel numerical studies of second order fluid networks will be conducted and the results will be analysed in the context of the performance of computer and communication networks, facilitating design improvements to their architecture.

Acknowledgements

I would like to express my sincere gratitude to:

• my supervisor Peter Harrison,

• my second supervisor Tony Field,

• AESOP group,

• my brother, sister and all my friends who made this experience enjoyable and fruitful.

Contents

Abstract

Acknowledgements

1 Introduction

2 Mathematical background
2.1 Measure and probability
2.1.1 Probability Spaces
2.1.2 Random variables and stochastic processes
2.1.3 Expected value and variance
2.1.4 Distribution Functions and independence
2.1.5 Limit Theorems
2.1.6 Conditional expectation
2.1.7 Martingales
2.2 Brownian Motion
2.2.1 Properties
2.2.2 Brownian Martingales
2.2.3 Reflection principle
2.2.4 Path properties of Brownian motion
2.3 Poisson Process
2.3.1 Properties
2.3.2 Poisson Martingales
2.3.3 Path properties of Poisson processes
2.4 Lévy processes
2.5 Stochastic Integrals and Itô's Formula
2.5.1 Stochastic integral for simple step integrand
2.5.2 Stochastic integral for square integrable adapted integrands
2.5.3 Further extension of the stochastic integral
2.5.4 Itô formula (with respect to Brownian motion)
2.5.5 Itô's formula (with respect to Itô processes)
2.6 Stochastic differential equations

3 Numerical Methods for SDEs
3.1 Examples of explicitly solvable SDEs
3.2 Existence and uniqueness of strong solutions
3.2.1 Solutions for ODEs
3.2.2 Solutions for SDEs
3.3 Stochastic discrete time approximations
3.3.1 Types of approximation
3.3.2 Wagner-Platen expansion
3.3.3 Convergence
3.4 Strong approximations
3.4.1 Euler Scheme
3.4.2 Milstein Scheme
3.4.3 Higher order schemes
3.4.4 Geometric Brownian motion simulation
3.5 Weak approximations
3.5.1 Weak error criterion
3.5.2 Euler scheme
3.5.3 Higher order schemes

4 Parallel stochastic simulation
4.1 Phase space Parallelism
4.1.1 Method
4.1.2 Numerical simulation
4.2 Parallelism in time
4.2.1 The Parareal algorithm
4.2.2 Ordinary differential equations
4.2.3 Stochastic differential equations

5 Application: Second order stochastic fluid models
5.1 and fluid models
5.2 Second order single fluid queue model
5.2.1 Pathwise construction of the dynamics of a single server fluid queue
5.2.2 Diffusion approximation and second order model
5.2.3 Analytical study of second order fluid queues in a random environment
5.2.4 Example of a single fluid queue and numerical results
5.3 Second order stochastic queueing network model
5.3.1 Traffic equations for single class Generalized Jackson networks
5.3.2 Pathwise construction of the dynamics of a single class open GJN
5.3.3 Diffusion approximation and second order model
5.3.4 Second order fluid network model analytically
5.3.5 Second order fluid network model numerically

6 Conclusion

List of Tables

4.1 Timing for different numbers of processors

List of Figures

2.1 Histograms for different values of n
2.2 Sample paths of Brownian Motion
2.3 Sample path of Brownian Motion and its reflected path
2.4 Sample path of a Poisson process
2.5 Sample path of a Compound Poisson process of intensity 10 and marks uniformly distributed over 0-1
3.1 Sample path of the solution for N = 64
3.2 Sample path of the solution for N = 1024
3.3 Sample path of the solution for N = 16
3.4 Sample path of the solution for N = 64
3.5 GBM sample paths for different values of N
3.6 Log-Log plot of the absolute error versus the time increment for the Euler scheme
3.7 Log-Log plot of the absolute error versus the time increment for the Milstein scheme
3.8 Q-Q Plot comparing the simulation result of the Naive Euler algorithm with the exact solution of the GBM
4.1 Speedup versus the number of processors
4.2 System efficiency versus number of processors
4.3 Parareal algorithm [43]
4.4 Parareal simulation of reflected SDE
4.5 First Parareal simulation of GBM
5.1 Single fluid queue
5.2 Markov modulated process
5.3 On-Off process
5.4 Q-Q plot in the fluid flow case σ1 = σ2 = 0
5.5 Q-Q plot when σ1 = 1, σ2 = 0
5.6 Analytical and numerical solution of F(x) versus x when σ1 = 0, σ2 = 0
5.7 Analytical and numerical solution of F(x) versus x when σ1 = 1, σ2 = 0
5.8 Analytical and numerical solution of F(x) versus x when σ1 = 0, σ2 = 1
5.9 Analytical and numerical solution of F(x) versus x when σ1 = σ2 = 1
5.10 Stationary distribution of the buffer content
5.11 Expectation of the buffer content as a function of the variance
5.12 A generalized Jackson network (GJN)
5.13 Tandem of fluid queues
5.14 Mean queue length as a function of time with Q1(0) equal to 3
5.15 Mean queue length as a function of time with Q1(0) equal to 0
5.16 Variance of the queue length as a function of time with Q1(0) equal to 3
5.17 Variance of the queue length as a function of time with Q1(0) equal to 0
5.18 Stationary distribution with Q1(0) equal to 3
5.19 Stationary distribution with Q1(0) equal to 0

Chapter 1

Introduction

Motivation, Objectives and Contribution

Stochastic differential equations arise in many fields of study such as physics, computing and biology. They extend the concept of differential equations by taking the noise factor into consideration. Few SDEs can be solved analytically; for this reason, the numerical approach constitutes a very important method for solving SDEs, especially high-dimensional ones. In many cases, however, these methods prove to be highly demanding in terms of processing time or memory utilisation. Therefore, parallelising these methods becomes a necessity for many problems faced in this field of study. For example, in physics, many problems related to nonequilibrium statistical mechanics and cosmology involve SDEs, such as the problem of the escape rate from a potential well [35]. These invoke the solution of infinite-dimensional SDEs, which requires numerical simulation, and parallelisation techniques both for better performance and to avoid memory or processing time related issues.

In this thesis, numerical techniques to solve SDEs will be presented and applied to problems in computer and communication networks. Fluid modelling is an important tool when estimating the performance of queueing systems like computer and communication networks, especially in the heavy traffic limit or in cases when we are faced with the state space explosion problem. Second order fluid queue models are fluid models that take into account the noise factor; consequently, the mathematical description of these models tends to involve stochastic differential equations. The solution of these equations turns out to be analytically intractable in most cases, especially when considering large-scale models of computer and communication systems.

Throughout this thesis, an extensive study of parallel numerical techniques for stochastic differential equations will be undertaken. It starts with the embarrassingly parallel case of phase space parallelism. These techniques, involving Monte Carlo methods, are useful when the interest lies in the weak approximation of stochastic differential equations. On the other hand, when the focus is on the strong approximation, the only kind of parallelism available to explore is parallelism in time, which was originally explored for stochastic differential equations by G. Bal [43] in his extension of the parareal algorithm to the SDE case. In his study he considered the Euler scheme as a coarse solver and carried out the convergence study to prove an improvement in the convergence order by a factor of k over the Euler technique after k iterations of the parareal algorithm, thus obtaining a convergence rate of k/2 with an Euler coarse solver of order 1/2. We have proceeded with an extended study, taking the Milstein scheme as a coarse solver, and proved that after k iterations we obtain an order of convergence of k with a Milstein coarse solver of order 1. Several stochastic differential equations have been simulated to visualise the convergence to the exact solution.

In the second part of the thesis, second order fluid models of computer and communication systems were analysed. A detailed study of the single fluid queue case was undertaken analytically and numerically: the weak approximation techniques explored in previous chapters were used to study this kind of fluid queue numerically, and the analytical solution was used to show the convergence of these techniques to the exact solution when simulating fluid queues with the noise factor. Generalized Jackson networks and their reflected Brownian motion (RBM) approximation were introduced. Numerical techniques available for multidimensional RBM were presented, and the example of a tandem of fluid queues was explored numerically.

Chapters outline

This thesis starts with a chapter briefly describing the mathematical terminology and tools necessary for a good formalisation of physical problems where the noise factor is taken into consideration. More specifically, stochastic differential equations and Itô calculus are presented in a nutshell, visiting briefly concepts from measure and probability theory, Brownian motion, Lévy processes and Itô integration.

In the following chapter, numerical techniques used to solve stochastic differential equations are explored. It begins with a short survey of the analytical background behind the existence and uniqueness of the solution of stochastic differential equations and the assumptions necessary for that, which sets out the conditions without which solving an equation numerically might lead to erroneous results. Afterwards, numerical schemes for the simulation of strong and weak approximations of stochastic differential equations are studied thoroughly with their orders of convergence, and numerical examples are provided for the sake of clarity.

The fourth chapter deals with the parallelisation of the numerical schemes for stochastic differential equations. In this chapter, phase space parallelism and time parallelism are investigated. Their domains of application are different: it depends on whether one is seeking a strong or a weak approximation of these differential equations. Both aspects of parallelism are studied thoroughly, though the main emphasis, as stated earlier, was on the parareal algorithm, where an extension to previous studies was undertaken by considering the Milstein scheme as the coarse solver and proving that after k iterations, a convergence rate of k is achieved.

In the last chapter, second order fluid queues in the field of computer and communication networks are investigated. An analytical and numerical study of the single fluid queue case is undertaken with the purpose of validating the numerical results against the exact ones. This study is extended to the case of Generalized Jackson networks, which were studied analytically at a first stage and numerically thereafter. The example of a tandem of fluid queues was analysed numerically.

The purpose of the sixth chapter is to provide a suitable conclusion for this PhD thesis.

Statement of Originality

To the best of my knowledge, this thesis contains no material previously published or written by another person except where due acknowledgment is made in the thesis itself.

Chapter 2

Mathematical background

In this chapter, we will discuss briefly the relevant mathematical concepts behind the theory of stochastic differential equations.

2.1 Measure and probability

2.1.1 Probability Spaces

Suppose that Ω is a non-empty set.

Definition 2.1.1. The set of all possible outcomes of an experiment is called the sample space and is denoted by Ω. Events are subsets of the sample space.

Definition 2.1.2. A field on Ω is a collection F of subsets of Ω with the following properties:

(i) ∅ ∈ F.

(ii) If E ∈ F, then $E^C \in \mathcal{F}$, where $E^C = \Omega - E$.

(iii) If E1,E2 ∈ F, then E1 ∪ E2 ∈ F.

Definition 2.1.3. A σ-algebra is a collection F of subsets of Ω with the following properties:

(i) ∅ ∈ F.

(ii) If E ∈ F, then $E^C \in \mathcal{F}$, where $E^C = \Omega - E$.

5 6 Chapter 2. Mathematical background

(iii) If $E_1, E_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} E_i \in \mathcal{F}$.

A σ-algebra is closed under the operation of taking countable intersections.

Definition 2.1.4. A measure µ is a function defined on a σ-algebra F over a set Ω and taking values in the extended interval [0,∞] such that the following properties are satisfied:

(i) The empty set has measure zero: µ(∅) = 0.

(ii) Countable additivity or σ-additivity: if $E_1, E_2, \ldots$ is a countable sequence of pairwise disjoint sets in $\mathcal{F}$, then $\mu\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mu(E_i)$.

The triple (Ω,F,µ) is then called a measure space, and the members of F are called measurable sets [1, 2, 3].

Definition 2.1.5. A Borel σ-field B is the σ-field generated by all intervals. The elements of B are called Borel sets. The class B of Borel sets in Euclidean $\mathbb{R}^n$ is the smallest collection of sets that includes the open and closed sets:

$$\mathcal{B} = \bigcap \{\mathcal{F} : \mathcal{F} \text{ is a } \sigma\text{-field containing all intervals}\}$$

Definition 2.1.6. A probability space is a triple (Ω, F, P) where Ω is an arbitrary set, F is a σ-algebra of subsets of Ω, and P is a measure on F such that P(Ω) = 1, called a probability measure on F or, briefly, a probability.

2.1.2 Random variables and stochastic processes

Probability spaces may not be "directly observable"; therefore it is useful to introduce mappings X from Ω to $\mathbb{R}^n$, the values of which can be observed [4, 5].

Definition 2.1.7. Let (Ω, F, P) be a probability space. A mapping X: Ω → $\mathbb{R}^n$ is called an n-dimensional random variable if for each B ∈ B, we have:

$$X^{-1}(B) \in \mathcal{F}$$

Lemma 2.1.8. Let X: Ω → $\mathbb{R}^n$ be a random variable. Then $\mathcal{U}(X) := \{X^{-1}(B) \mid B \in \mathcal{B}\}$ is the σ-algebra generated by X. This is the smallest sub-σ-algebra of F with respect to which X is measurable.

Definition 2.1.9. Given a probability space (Ω, F, P) and a measurable space (S, Σ), a stochastic process is a collection of S-valued random variables {X(t), t ∈ T} on the set Ω, indexed by the parameter t taking values in the set T. The random variables take values in the set S, called the state-space of the stochastic process. For each point ω ∈ Ω, the mapping t → X(t, ω) is the corresponding sample path.

The parameter t usually represents time. When T is a countable set, the process is said to be a discrete-time process. If T is an interval of the real line, the process is said to be a continuous-time process.

Continuity of stochastic processes

In this paragraph, we introduce the notion of continuity for stochastic processes, for more details the reader is referred to references [50, 51, 52].

Definition 2.1.10. (Continuity in Sample Paths) A stochastic process {X(t, .):Ω → R, t ∈ T } is continuous at time s, s ∈ T, if for almost all ω ∈ Ω, t → s implies X(t, ω) → X(s, ω). A process is continuous if, for almost all ω ∈ Ω,X(., ω) is a continuous function.

Definition 2.1.11. (Càdlàg) A stochastic process {X(t, ·): Ω → R, t ∈ T} is càdlàg if for almost all ω ∈ Ω we have that, for every s ∈ T, t ↓ s implies X(t) → X(s), and for t ↑ s the limit $\lim_{t \uparrow s} X(t)$ exists, but need not be X(s). In other words, a stochastic process X(t) is càdlàg if almost all its sample paths are continuous from the right and limited from the left at every point.

The term càdlàg stands for the French expression 'continues à droite, limites à gauche'. These processes are also referred to as RCLL (right continuous with left limits). Although this property is weaker than pathwise continuity, it is still a very important one, widely used when real life applications are considered.

Definition 2.1.12. (Continuity in L2) A stochastic process {X(t, ·): Ω → R, t ∈ T} is continuous in mean square if $E[X(t)^2] < \infty$ for all t and $\lim_{s \to t} E[(X(s) - X(t))^2] = 0$ for all t ∈ T.

The notion of expectation E[X] is introduced in section (2.1.3).

Definition 2.1.13. (Continuity in Lp) A stochastic process {X(t, ·): Ω → R, t ∈ T} is continuous in Lp if $E[|X(t)|^p] < \infty$ for all t and $\lim_{s \to t} E[|X(s) - X(t)|^p] = 0$ for all t ∈ T.

Definition 2.1.14. (Continuity in Probability) A stochastic process {X(t, ·): Ω → R, t ∈ T} is continuous in probability at time s ∈ T if:

$$\lim_{t \to s} P[\omega \in \Omega : |X(t) - X(s)| \geq \varepsilon] = 0, \quad \text{for all } \varepsilon > 0.$$

{X(t), t ∈ T } is continuous in probability or stochastically continuous if the previous condition holds for all t ≥ 0.

Evidently, continuity in sample paths implies continuity in Lp and continuity in probability; for more details, please refer to [48].

Indistinguishable processes

Stochastic processes are mathematical tools used to represent natural phenomena. The following definitions are useful to determine when two processes $X_t$ and $X'_t$ represent the same phenomenon [48].

Definition 2.1.15. Let $\{X_t, t \in T\}$ and $\{X'_t, t \in T\}$ be two stochastic processes sharing the same state space (G, G), defined respectively on the probability spaces (Ω, F, P) and (Ω', F', P'). The two processes are equivalent if for every finite set of instants of time $\{t_1, t_2, \ldots, t_n\}$ in T and elements $A_1, A_2, \ldots, A_n$ of G:

$$P(X_{t_1} \in A_1, X_{t_2} \in A_2, \ldots, X_{t_n} \in A_n) = P'(X'_{t_1} \in A_1, X'_{t_2} \in A_2, \ldots, X'_{t_n} \in A_n).$$

$\{X_t\}$ and $\{X'_t\}$ are also said to have the same law.

Definition 2.1.16. Let $\{X_t, t \in T\}$ and $\{X'_t, t \in T\}$ be two stochastic processes defined on the same probability space (Ω, F, P) sharing the same state space (G, G). The process $\{X_t\}$ is a modification of $\{X'_t\}$ if:

$$X_t = X'_t \quad \text{a.s. for each } t \in T.$$

Definition 2.1.17. Let $\{X_t, t \in T\}$ and $\{X'_t, t \in T\}$ be two stochastic processes defined on the same probability space (Ω, F, P) sharing the same state space (G, G). The processes $\{X_t\}$ and $\{X'_t\}$ are P-indistinguishable if for almost all ω ∈ Ω:

$$X_t(\omega) = X'_t(\omega) \quad \forall\, t \in T.$$

Lemma 2.1.18. Let $\{X_t, t \in T\}$ and $\{X'_t, t \in T\}$ be two right-continuous stochastic processes defined on the same probability space (Ω, F, P) sharing the same state space (G, G). If the process $\{X_t\}$ is a modification of $\{X'_t\}$, then they are indistinguishable.

For more details, please refer to [48].

Examples of stochastic processes in real life

Stochastic processes have applications in many fields such as biology, physics, finance, telecommunication networks and many others [6].

Example 2.1.19. (Queues) Let X(t) be the number of customers waiting for service in a service facility. {X(t), t ≥ 0} is a continuous-time stochastic process with state-space S = {0, 1, 2, ...}.

Example 2.1.20. (Manufacturing) Let N be the number of machines in a machine shop. Each machine can be in one of two states: working or under repair. Let $X_i(t) = 1$ if the i-th machine is working at time t, and 0 otherwise. Then $\{X(t) = (X_1(t), X_2(t), \ldots, X_N(t)),\, t \geq 0\}$ is a continuous-time stochastic process with state-space $S = \{0, 1\}^N$.

Example 2.1.21. (DNA Analysis) DNA (Deoxyribonucleic Acid) is a long molecule that consists of a string of four basic molecules called bases, represented by A, T, G, C. Let $X_n$ be the n-th base in the DNA. Then $\{X_n, n \geq 0\}$ is a discrete-time stochastic process with state-space S = {A, T, G, C}.

2.1.3 Expected value and variance

Let X :Ω → R be a random variable.

Definition 2.1.22. The expected value or mean value of X is defined by:

$$E(X) := \int_\Omega X\, dP$$

Definition 2.1.23. The variance of X is defined by:

$$V(X) := \int_\Omega |X - E(X)|^2\, dP$$

where $|\cdot|$ denotes the Euclidean norm.

Lemma 2.1.24. (Markov's inequality). If X is a nonnegative random variable, then, for any value k > 0,

$$P(X \geq k) \leq \frac{E[X]}{k}$$

Lemma 2.1.25. (Chebyshev's inequality). If X is a random variable with mean µ and variance σ², then, for any value k > 0,

$$P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}$$

The importance of Markov's and Chebyshev's inequalities lies in the fact that they provide us with a means to derive bounds on probabilities when we only know the value of the mean, or the values of both the mean and the variance, while the probability distribution is unknown. Certainly, when the distribution is known, the sought probabilities can be determined exactly without resorting to these bounds [7].

2.1.4 Distribution Functions and independence

Let X :Ω → Rn be a random variable defined on the probability space (Ω,F,P) and let A and B be two events with P (B) > 0.

Definition 2.1.26. The distribution function of X is the function $F_X : \mathbb{R}^n \to [0, 1]$ defined by:

$$F_X(x) := P(X \leq x) \quad \text{for all } x \in \mathbb{R}^n$$

Definition 2.1.27. If there exists a nonnegative, integrable function $f : \mathbb{R}^n \to \mathbb{R}$ such that:

$$F(x) = \int_{-\infty}^{x} f(y)\, dy$$

then f is called the density function for X.

Proposition 2.1.28. Suppose $g : \mathbb{R}^n \to \mathbb{R}$, and Y = g(X) is integrable. Then

$$E(Y) = \int_{\mathbb{R}^n} g(x) f(x)\, dx.$$

Definition 2.1.29. Two events A and B are called independent if:

P (A ∩ B) = P (A).P (B).

Definition 2.1.30. Let $X_i : \Omega \to \mathbb{R}^n$ be random variables (i = 1, ...). We say the random variables $X_1, \ldots$ are independent if for all k ∈ N with k ≥ 2 and all choices of Borel sets $B_1, \ldots, B_k \subseteq \mathbb{R}^n$:

$$P(X_1 \in B_1, X_2 \in B_2, \ldots, X_k \in B_k) = P(X_1 \in B_1)\, P(X_2 \in B_2) \cdots P(X_k \in B_k).$$

2.1.5 Limit Theorems

A very famous result in probability theory is the strong law of large numbers; it states that the average of a sequence of independent, identically distributed random variables converges almost surely to the expected value of that distribution [1].

Theorem 2.1.31. (Strong law of large numbers). Let $X_1, \ldots, X_n, \ldots$ be a sequence of independent, identically distributed random variables defined on the same probability space, each having a finite mean $\mu = E[X_i]$. Then,

$$P\left(\lim_{n \to \infty} \frac{X_1 + \cdots + X_n}{n} = \mu\right) = 1.$$

Another very well-known theorem in probability theory is the central limit theorem. In addition to its theoretical value, this theorem presents a simple method to compute approximate probabilities for sums of independent random variables. It also provides an explanation for the prevalence of the normal distribution curve shape among the empirical frequencies of many real populations.

Theorem 2.1.32. (Central limit theorem). Let $X_1, \ldots, X_n, \ldots$ be a sequence of independent, identically distributed, real-valued random variables each having mean µ and variance σ². Then for all a, −∞ < a < +∞,

$$\lim_{n \to \infty} P\left(\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \leq a\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx.$$

Example 2.1.33. The experiment below illustrates this important mathematical concept. We generate N sequences of exponentially distributed random variables $\{X_i\}_{i=1}^{n}$, with distribution function:

$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \geq 0 \end{cases}$$

where λ = 0.5. The mean of this exponential distribution is µ = 2 and the variance is σ² = 4. For each of the N generated sequences $\{X_i\}_{i=1}^{n}$, we determine the value of the normalised random variable $Z_n$ given by:

$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$$

These N values of $Z_n$ are used to plot the histogram approximating the simulated density function of $Z_n$. This experiment is repeated for different values of n. From figure 2.1, it is obvious that as n increases, the density function of $Z_n$, approximated by the histogram, tends more and more to the density function of a standard Gaussian distribution.

2.1.6 Conditional expectation

Definition 2.1.34. Let X be an integrable random variable on a probability space (Ω, F, P) and U a σ-field contained in F. Then E(X | U) is defined to be a random variable such that:
(1) E(X | U) is U-measurable;
(2) for any A ∈ U,

$$\int_A X\, dP = \int_A E(X \mid \mathcal{U})\, dP.$$

Figure 2.1: Histograms for different values of n (panels: n = 3, 5, 10, 15, 100, 500)

Theorem 2.1.35. (Properties of conditional expectation).
(1) E(E(X | U)) = E(X).
(2) If a, b are constants, E(aX + bY | U) = aE(X | U) + bE(Y | U) a.s.
(3) If X is U-measurable and XY is integrable, then

$$E(XY \mid \mathcal{U}) = X\, E(Y \mid \mathcal{U}) \quad \text{a.s.}$$

(4) If X is positive, then E(X | U) ≥ 0 a.s.
(5) If W ⊆ U, we have

$$E(E(X \mid \mathcal{U}) \mid \mathcal{W}) = E(X \mid \mathcal{W}) \quad \text{a.s.}$$

2.1.7 Martingales

Definition 2.1.36. Let (Ω, F, P) be a probability space. A (discrete) filtration is an increasing sequence of sub-σ-fields $(\mathcal{F}_n)_{n \geq 0}$ of F; i.e.

F 0 ⊂ F 1 ⊂ F 2 ⊂ ... ⊂ F n ⊂ ... ⊂ F.

We write F = (F n)n≥0 [1, 2, 8].

Definition 2.1.37. The sequence (Xn)n≥0 of random variables is adapted to the filtration F if

Xn is F n-measurable for every n ≥ 0. The tuple (Ω,F,(F n)n≥0,P) is called a filtered probability space.

Definition 2.1.38. A sequence of real-valued random variables X1,X2, ... is called a martingale with respect to a filtration F 1, F 2, ... if:

(1) Xn is integrable for each n = 1, 2, ... ;

(2) X1,X2, ... is adapted to F 1, F 2, ... ;

(3) E(Xk+1 | F k) = Xk a.s. for each k = 1, 2, ... .

Definition 2.1.39. A sequence of real-valued random variables X1,X2, ... is called a super- martingale with respect to a filtration F 1, F 2, ... if:

(1) Xn is integrable for each n = 1, 2, ... ;

(2) X1,X2, ... is adapted to F 1, F 2, ... ; 16 Chapter 2. Mathematical background

(3) E(Xk+1 | F k) ≤ Xk a.s. for each k = 1, 2, ... .

Definition 2.1.40. A sequence of real-valued random variables X1,X2, ... is called a submartin- gale with respect to a filtration F 1, F 2, ... if:

(1) Xn is integrable for each n = 1, 2, ... ;

(2) X1,X2, ... is adapted to F 1, F 2, ... ;

(3) E(Xk+1 | F k) ≥ Xk a.s. for each k = 1, 2, ... .

Example 2.1.41. (Symmetric ) Let X1,X2, ... be a sequence of independent iden- tically distributed random variables such that:

P {Xn = 1} = P {Xn = −1} = 0.5

and Yn the symmetric random walk given by:

Yn = X1 + X2 + ... + Xn,

then $Y_n$ and $Y_n^2 - n$ are martingales with respect to the filtration $\mathcal{F}_n = \sigma(X_1, X_2, \ldots, X_n)$.
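A quick way to get a feel for this is by simulation; the hedged C++ sketch below (sample sizes and seed are illustrative) checks empirically that $E[Y_n]$ and $E[Y_n^2 - n]$ stay at 0, as the martingale property implies:

```cpp
// Minimal sketch: empirical check that E[Y_n] = 0 and E[Y_n^2 - n] = 0
// for the symmetric random walk of Example 2.1.41.
#include <iostream>
#include <random>

int main() {
    const int N = 200000, n_max = 50;   // illustrative sample sizes
    std::mt19937 gen(1);
    std::bernoulli_distribution coin(0.5);
    double mean_y = 0.0, mean_y2_minus_n = 0.0;
    for (int k = 0; k < N; ++k) {
        long y = 0;
        for (int n = 0; n < n_max; ++n) y += coin(gen) ? 1 : -1;
        mean_y += y;
        mean_y2_minus_n += (double)y * y - n_max;
    }
    std::cout << "E[Y_n]       ~ " << mean_y / N << "\n"
              << "E[Y_n^2 - n] ~ " << mean_y2_minus_n / N << "\n";
}
```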

Definition 2.1.42. A random variable τ taking values in the set {1, 2, ...} ∪ {∞} is called a with respect to a filtration F n if:

{τ = n} ∈ F n, for each n = 1, 2, ...

Theorem 2.1.43. (Doob's Martingale inequalities)
(1) Doob's maximal inequality: if $X_n$ is a non-negative submartingale with respect to a filtration $\mathcal{F}_n$, then for all λ > 0,

$$P\left(\max_{0 \leq k \leq n} X_k \geq \lambda\right) \leq \frac{1}{\lambda}\, E\left(X_n \mathbf{1}_{\{\max_{0 \leq k \leq n} X_k \geq \lambda\}}\right)$$

(2) Doob's maximal $L^2$ inequality: if $X_n$ is a non-negative square integrable submartingale with respect to a filtration $\mathcal{F}_n$, then

$$E\left(\left|\max_{0 \leq k \leq n} X_k\right|^2\right) \leq 4\, E(|X_n|^2).$$

2.2 Brownian Motion

A Brownian motion () is the most important example of a continuous-time martingale. It plays a central role in probability theory, the theory of stochastic processes as well as in many other fields of study [8, 9, 10].

Definition 2.2.1. A Brownian motion {Wt, (F t)t∈T } on (Ω,F,P) is a real-valued stochastic process that has continuous sample paths and stationary Gaussian independent increments, such that:

1. W0 = 0.

2. t → Wt(ω) is continuous.

3. For t ≥ s, Wt − Ws is a Gaussian random variable with mean 0 and variance t − s, and

is independent of the history F s.

Figure 2.2: Sample paths of Brownian Motion
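Sample paths such as those in figure 2.2 follow directly from the definition: a Brownian path on [0, T] is built from independent N(0, dt) increments. A minimal sketch (grid size and seed are illustrative) is:

```cpp
// Minimal sketch: simulate one Brownian path on [0, T] from
// independent Gaussian increments W_{t+dt} = W_t + N(0, dt).
#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double T = 1.0;
    const int N = 1000;                  // number of time steps
    const double dt = T / N;
    std::mt19937 gen(7);
    std::normal_distribution<double> dW(0.0, std::sqrt(dt));

    double W = 0.0;                      // W_0 = 0
    for (int i = 1; i <= N; ++i) {
        W += dW(gen);                    // add an N(0, dt) increment
        std::cout << i * dt << "\t" << W << "\n";
    }
}
```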

2.2.1 Properties

Let {Wt, (F t)t∈T } be a standard one-dimensional Brownian motion on (Ω,F,P) . Then it follows that:

1. : for any finite sub-family (t1, ..., tn) of T, the finite dimensional dis-

tributions P(Wt1 ≤ x1, ..., Wtn ≤ xn) are multivariate normal with E(Wt) = 0 and

E(Wt.Ws) = min{s, t} for t ≥ 0 and s ≥ 0.

2. Time homogeneity: the process {Wt+s − Ws}, t ≥ 0 is a Brownian motion, for any s > 0.

3. Symmetry: the process {−Wt}, t ≥ 0 is a Brownian motion.

4. Scaling: the process $\{\alpha W_{t/\alpha^2}\}$, t ≥ 0, is a Brownian motion, for any α > 0.

5. Time inversion: the process {t.W1/t}, t > 0 with W0 = 0 is a Brownian motion.

6. : {Wt}, t ∈ T, is a Markov process,

P [Wt ≤ x | F s] = P [Wt ≤ x | Ws], 0 ≤ s < t

P [Wt ≤ x | Ws] = P [Wt−s ≤ x], 0 ≤ s < t.

7. Strong Law of Large Numbers: $\lim_{t \to \infty} \frac{W_t}{t} = 0$, a.s.

For a more detailed study and proofs of the statements above please refer to [45, 46].

2.2.2 Brownian Martingales

Proposition 2.2.2. A Brownian motion $\{W_t\}$, t ≥ 0, is a martingale on the filtered probability space (Ω, F, P).

Proof. Briefly, (Wt − Ws) is independent of F s for s ≤ t, consequently:

$$E[W_t - W_s \mid \mathcal{F}_s] = E[W_t - W_s] = 0$$

which implies that:

$$E[W_t \mid \mathcal{F}_s] = W_s$$

Proposition 2.2.3. The process $\{W_t^2 - t\}$ is a martingale on the filtered probability space (Ω, F, P).

Proof. Briefly,

$$E[W_t^2 - W_s^2 \mid \mathcal{F}_s] = E[(W_t - W_s)^2 + 2W_s(W_t - W_s) \mid \mathcal{F}_s] = E[(W_t - W_s)^2 \mid \mathcal{F}_s] = E[(W_t - W_s)^2] = t - s$$

which implies that:

$$E[W_t^2 - t \mid \mathcal{F}_s] = W_s^2 - s$$

Proposition 2.2.4. The process $\exp(\delta W_t - (\delta^2/2)t)$ is a martingale on the filtered probability space (Ω, F, P), for δ ∈ R.

Proof. Briefly, for t ≥ s, $(W_t - W_s)$ is a Gaussian random variable with mean 0 and variance (t − s), independent of the history $\mathcal{F}_s$. Consequently:

$$E[e^{\delta(W_t - W_s)} \mid \mathcal{F}_s] = e^{(t-s)\delta^2/2}, \qquad E[e^{\delta(W_t - W_s) - \delta^2(t-s)/2} \mid \mathcal{F}_s] = 1$$

implying that:

$$E[e^{\delta W_t - (\delta^2/2)t} \mid \mathcal{F}_s] = e^{\delta W_s - (\delta^2/2)s}$$

2.2.3 Reflection principle

Let {Wt, t ≥ 0} be a Brownian motion on the filtered probability space (Ω,F,P).

Definition 2.2.5. The running maximum to date of Wt is defined as

$$M_t := \sup_{s \in [0,t]} W_s = \max_{s \in [0,t]} W_s$$

Definition 2.2.6. The first passage time Tx of Wt to a level x ∈ R is defined as:

Tx := inf{t ≥ 0,Wt = x}.

If m is a given positive level and t a given positive time, certain paths of the Brownian motion might exceed level m before or at time t, others might only reach it, while others might stay below it. Among the paths that exceed or reach this level we have two types: those that reach level m before time t and are found at some level a below m at time t, and those that reach level m before t but are found at a level above m at time t. This can be expressed mathematically by the equation below:

$$P[T_m < t] = P[T_m < t, W_t > m] + P[T_m < t, W_t < m]. \tag{2.1}$$

The reflection principle provides a heuristic argument which states that for every path that crossed level m before time t and is found at some point w below level m at time t, there exists a symmetric path (obtained by reflection with respect to level m) which is found at the point (2m − w) at time t, as in figure 2.3, and both paths have the same probability because of the symmetry, with respect to level m, of the Brownian motion starting at this level. Therefore, to every path of the Brownian motion crossing level m before time t and ending at some point below level m there corresponds an equally probable path crossing level m before time t but ending up above level m. This can be expressed by the equation:

$$P[T_m < t, W_t < m] = P[T_m < t, W_t > m] = P[W_t > m]. \tag{2.2}$$

Figure 2.3: Sample path of Brownian Motion and its reflected path

This leads to the statement of the theorem below.

Theorem 2.2.7. The reflection equality states that for m ≥ 0 and w ≤ m,

$$P[M_t \geq m, W_t \leq w] = P[W_t \geq 2m - w] = 1 - \Phi\left(\frac{2m - w}{\sqrt{t}}\right), \tag{2.3}$$

where Φ(·) is the CDF of the standard normal distribution.

Though based on the reflection principle, a rigorous mathematical proof can be provided using the strong Markov property, as for example in reference [45, 46].

Proposition 2.2.8. The density of the first passage time $T_m$ of $W_t$ to a level m ∈ R is given by:

$$f_{T_m}(t) = \frac{|m|}{\sqrt{2\pi t^3}}\, e^{-m^2/2t}$$

Proof. Briefly,

$$P[T_m < t] = P[T_m < t, W_t > m] + P[T_m < t, W_t < m]$$

using equation 2.2 we obtain that:

$$P[T_m < t] = 2\, P[W_t > m] = 2\left(1 - \Phi\left(\frac{m}{\sqrt{t}}\right)\right).$$

By differentiating with respect to t, we obtain the density $f_{T_m}(t)$.
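The identity $P[T_m < t] = 2P[W_t > m]$ is easy to check by simulation; the hedged sketch below approximates $T_m$ on a fine grid (which slightly underestimates the hitting probability) and compares with the closed form. Grid size, sample size and seed are illustrative:

```cpp
// Minimal sketch: Monte Carlo check of P[T_m < t] = 2 P[W_t > m],
// approximating the first passage time on a discrete grid.
#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double m = 1.0, t = 1.0;
    const int steps = 2000, paths = 50000;
    const double dt = t / steps;
    std::mt19937 gen(19);
    std::normal_distribution<double> dW(0.0, std::sqrt(dt));

    int hit = 0;
    for (int p = 0; p < paths; ++p) {
        double W = 0.0;
        for (int i = 0; i < steps; ++i) {
            W += dW(gen);
            if (W >= m) { ++hit; break; }            // first passage before t
        }
    }
    double tail = 0.5 * std::erfc(m / std::sqrt(2.0 * t)); // P[W_t > m]
    std::cout << "empirical P[T_m < t] ~ " << (double)hit / paths
              << "  theory " << 2.0 * tail << "\n";
}
```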

2.2.4 Path properties of Brownian motion

Let {Wt, t ≥ 0} be a Brownian motion on the filtered probability space (Ω,F,P). We fix a time horizon 0 < T < ∞.

Definition 2.2.9. A partition of [0, T ] is a finite set Π = {t0, t1, t2, ..., tm} ⊂ [0,T ](m ∈ N) with 0 = t0 < t1 < ... < tm = T.

The maximal step size of Π is $\|\Pi\| := \max_i |t_{i+1} - t_i|$.

Theorem 2.2.10. ( of Brownian motion) If (Πn) is a refining sequence of 2.2. Brownian Motion 23

partitions with kΠkn → 0 then, for all t ∈ [0,T ],

X 2 lim |Wt ∧t − Wt | = t almost surely. n i+1 i ti∈Πn,ti≤t

Definition 2.2.11. For a function α: [0, T] → R, the total variation of α up to t ∈ [0, T] is defined as

$$TV_t(\alpha) := \sup_{\Pi} \sum_{t_i \in \Pi,\, t_i \leq t} |\alpha(t_{i+1} \wedge t) - \alpha(t_i)|,$$

where the supremum is taken over all partitions. For a continuous function α(·) and a refining sequence $(\Pi_n)$ of partitions with $\|\Pi_n\| \to 0$,

$$TV_t(\alpha) = \lim_n \sum_{t_i \in \Pi_n,\, t_i \leq t} |\alpha(t_{i+1} \wedge t) - \alpha(t_i)|.$$

Theorem 2.2.12. (Total variation of Brownian motion) For a refining sequence $(\Pi_n)$ of partitions with $\|\Pi_n\| \to 0$,

$$\lim_n \sum_{t_i \in \Pi_n,\, t_i \leq t} |W_{t_{i+1} \wedge t} - W_{t_i}| = +\infty \quad \text{almost surely.}$$

Theorem 2.2.13. (Law of Iterated Logarithm)

$$\limsup_{t \to 0} \frac{W_t}{\sqrt{2t \log\log(1/t)}} = 1, \quad \text{a.s.}$$

$$\liminf_{t \to 0} \frac{W_t}{\sqrt{2t \log\log(1/t)}} = -1, \quad \text{a.s.}$$

$$\limsup_{t \to \infty} \frac{W_t}{\sqrt{2t \log\log t}} = 1, \quad \text{a.s.}$$

$$\liminf_{t \to \infty} \frac{W_t}{\sqrt{2t \log\log t}} = -1, \quad \text{a.s.}$$

For a mathematical proof of the theorems above and for further details please refer to [45, 46].

2.3 Poisson Process

Poisson processes play a central role in stochastic modelling especially when jumps are taken into consideration.

Let (Ω, F, F t, P) be a filtered probability space.

Definition 2.3.1. An F t-Poisson process {Nt}, t ≥ 0, of intensity (parameter) λ is a right- continuous, adapted and integer-valued process with stationary independent increments, such that:

1. (Independent Poisson Increments) For t ≥ s, Nt − Ns is a Poisson distributed random

variable with parameter λ(t − s) and is independent of the history F s:

$$P(N_t - N_s = k) = e^{-\lambda(t-s)}\, \frac{(\lambda(t-s))^k}{k!}.$$

2. (Step Function Paths) For t > 0, t → Nt(ω) is an increasing step function of t with jumps

of size one and initial value N0 = 0.

Figure 2.4: Sample path of a Poisson process
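A sample path like the one in figure 2.4 can be generated from the fact that the inter-arrival times of a Poisson process of intensity λ are independent Exp(λ) variables; a minimal sketch (λ, horizon and seed are illustrative) is:

```cpp
// Minimal sketch: Poisson path on [0, T]; jump times are cumulative
// sums of independent Exp(lambda) inter-arrival times.
#include <iostream>
#include <random>

int main() {
    const double lambda = 2.0, T = 10.0;
    std::mt19937 gen(3);
    std::exponential_distribution<double> interarrival(lambda);

    double t = 0.0;
    int n = 0;                           // N_0 = 0
    while (true) {
        t += interarrival(gen);          // next jump time
        if (t > T) break;
        ++n;                             // jump of size one
        std::cout << t << "\t" << n << "\n";
    }
}
```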

2.3.1 Properties

Let {Nt}, t ≥ 0, be an F t-Poisson process of intensity λ on the filtered space (Ω, F, F t, P). Then it follows that:

1. Memoryless property: $P(N_{t+s} > x + y \mid N_s > y) = P(N_t > x)$, ∀ x, y ≥ 0;

2. $E[N_t] = \lambda t$, $\mathrm{Var}[N_t] = \lambda t$.

2.3.2 Poisson Martingales

Let {Nt}, t ≥ 0, be an F t-Poisson process of intensity λ on the filtered space (Ω, F, F t, P)

Proposition 2.3.2. The compensated Poisson process {Nt − λ.t}, t ≥ 0, is a martingale on the filtered probability space (Ω,F,F t,P).

Proof. Briefly, (Nt − Ns) is independent of F s for s ≤ t, consequently:

E[Nt − Ns | F s] = E[Nt − Ns]

= λ.(t − s), which implies that:

E[Nt | F s] − Ns = λ.(t − s), therefore,

$$E[N_t - \lambda t \mid \mathcal{F}_s] = N_s - \lambda s.$$

Proposition 2.3.3. The process $\{(N_t - \lambda t)^2 - \lambda t\}$ is a martingale on the filtered probability space (Ω, F, P).

Proof. Briefly,

$$E[(N_t - \lambda t)^2 \mid \mathcal{F}_s] = E[(N_t - N_s + N_s - \lambda t)^2 \mid \mathcal{F}_s]$$
$$= E[(N_t - N_s)^2 \mid \mathcal{F}_s] + (N_s - \lambda t)^2 + 2(N_s - \lambda t)\, E[N_t - N_s \mid \mathcal{F}_s]$$
$$= \lambda(t-s) + \lambda^2(t-s)^2 + (N_s - \lambda t)^2 + 2\lambda(t-s)(N_s - \lambda t)$$
$$= \lambda(t-s) + (N_s - \lambda s)^2$$

which implies that:

$$E[(N_t - \lambda t)^2 - \lambda t \mid \mathcal{F}_s] = (N_s - \lambda s)^2 - \lambda s$$

Proposition 2.3.4. The process exp(Nt.ln(b) + (1 − b)λ.t) is a martingale on the filtered probability space (Ω,F,P), for b > 0.

Proof. Briefly, for t ≥ s, $(N_t - N_s)$ is a Poisson random variable, with intensity λ(t − s), independent of the history $\mathcal{F}_s$. Consequently:

$$E[e^{N_t \ln(b) + (1-b)\lambda t} \mid \mathcal{F}_s] = e^{N_s \ln(b) + (1-b)\lambda t}\, E[e^{(N_t - N_s)\ln(b)} \mid \mathcal{F}_s]$$
$$= e^{N_s \ln(b) + (1-b)\lambda t}\, E[e^{(N_t - N_s)\ln(b)}]$$
$$= e^{N_s \ln(b) + (1-b)\lambda t}\, e^{\lambda(t-s)(e^{\ln(b)} - 1)}$$

Therefore,

$$E[e^{N_t \ln(b) + (1-b)\lambda t} \mid \mathcal{F}_s] = e^{N_s \ln(b) + (1-b)\lambda s}$$

2.3.3 Path properties of Poisson processes

Let $\{N_t\}$, t ≥ 0, be an $\mathcal{F}_t$-Poisson process of intensity λ on the filtered space (Ω, F, $\mathcal{F}_t$, P). We fix a time horizon 0 < T < ∞.

Theorem 2.3.5. (Quadratic variation of Poisson processes) If $(\Pi_n)$ is a refining sequence of partitions with $\|\Pi_n\| \to 0$ then, for all t ∈ [0, T],

$$\lim_n \sum_{t_i \in \Pi_n,\, t_i \leq t} |N_{t_{i+1} \wedge t} - N_{t_i}|^2 = N_t.$$

Theorem 2.3.6. (Total variation of Poisson processes) For a refining sequence $(\Pi_n)$ of partitions with $\|\Pi_n\| \to 0$,

$$\lim_n \sum_{t_i \in \Pi_n,\, t_i \leq t} |N_{t_{i+1} \wedge t} - N_{t_i}| = N_t.$$

Figure 2.5: Sample path of a Compound Poisson process of intensity 10 and marks uniformly distributed over 0-1
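A path like the one in figure 2.5 can be sketched by summing uniform marks at Poisson jump times; the following minimal code (seed and horizon are illustrative) does this:

```cpp
// Minimal sketch matching Figure 2.5: compound Poisson path of
// intensity 10 whose jump sizes (marks) are uniform on [0, 1].
#include <iostream>
#include <random>

int main() {
    const double lambda = 10.0, T = 1.0;
    std::mt19937 gen(5);
    std::exponential_distribution<double> interarrival(lambda);
    std::uniform_real_distribution<double> mark(0.0, 1.0);

    double t = 0.0, x = 0.0;
    while ((t += interarrival(gen)) <= T) {
        x += mark(gen);                  // add a uniform mark at each jump
        std::cout << t << "\t" << x << "\n";
    }
}
```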

2.4 Lévy processes

Lévy processes represent a class of stochastic processes that includes Poisson processes and Wiener processes. Recently, many applications have come to rely on models based on these processes, for example in the fields of finance and communication networks [58].

Let (Ω, F, F t, P) be a filtered probability space.

Definition 2.4.1. An $\mathcal{F}_t$-adapted process $\{X_t, t \geq 0\}$ with $X_0 = 0$ a.s. is a Lévy process if:

1. (Independent Increments) For 0 ≤ s < t < ∞,Xt − Xs is independent of the history F s.

2. (Stationary Increments) For 0 ≤ s ≤ t, Xt − Xs is distributed similarly to Xt−s.

3. (Continuous in Probability) $\lim_{t \to s} P[\omega \in \Omega : |X(t) - X(s)| \geq \varepsilon] = 0$, for all ε > 0.

2.5 Stochastic Integrals and Itô's Formula

Let (Ω, F, P) be a probability space with a finite time horizon T < ∞ and a Brownian motion $(W_t)_{t \in [0,T]}$, where $(\mathcal{F}_t)_{t \in [0,T]}$ is the filtration generated by the Brownian motion and completed by the P-null sets, that is $\mathcal{F}_t = (\mathcal{F}_t^W)^P$. Without loss of generality, we let $\mathcal{F} := \mathcal{F}_T$ [9, 11, 12, 36].

2.5.1 Stochastic integral for simple step integrand

Definition 2.5.1. The class of functions f: [0, T] × Ω → R with the properties:
(1) f: (t, ω) → f(t, ω) is $\mathcal{B}([0,T]) \otimes \mathcal{F}$-measurable,
(2) $f_t$: ω → f(t, ω) is $\mathcal{F}_t$-measurable, for each t,
(3) $E\left[\int_0^T |f(s, \omega)|^2\, ds\right] < \infty$,
defines the class $M_T^2$; in other words, any product measurable, adapted and bounded f: (t, ω) → R is in $M_T^2$.

Definition 2.5.2. Let $M_{step}^2 \subset M_T^2$ denote the class of simple step-functions f of the form

$$f(t, \omega) = \sum_{i=0}^{m-1} \eta_i\, \mathbf{1}_{(t_i, t_{i+1}]}(t)$$

for some partition $\Pi = \{t_0, t_1, \ldots, t_m\}$ of [0, T], where $\eta_i$, the value that $f_t$ takes for $t \in (t_i, t_{i+1}]$, belongs to $L^2(\Omega, \mathcal{F}_{t_i}, P)$, ∀ i ∈ {0, 1, ..., m − 1}.

Definition 2.5.3. (Stochastic integral for simple step integrand) For $f \in M_{step}^2$, the integral up to time t ∈ [0, T] is defined as:

$$I_t(f) \equiv \int_0^t f_s\, dW_s := \sum_{i=0}^{m-1} \eta_i\, (W_{t_{i+1} \wedge t} - W_{t_i \wedge t}).$$

Theorem 2.5.4. For simple step integrands $f, g \in M_{step}^2$, the stochastic integral verifies the following properties:
(a) Linearity: for a, b ∈ R, $af + bg \in M_{step}^2$ and

$$I_t(af + bg) = a\, I_t(f) + b\, I_t(g).$$

(b) Martingale property: for t ∈ [0, T], $I_t(f)$ is a martingale.
(c) Itô isometry:

$$E[(I_t(f))^2] = E\left[\int_0^t |f_s|^2\, ds\right].$$

Theorem 2.5.5. For any $f \in M_T^2$, there is a sequence $(f^n)_{n \in \mathbb{N}}$ in $M_{step}^2$ such that $f^n$ converges to f in $L^2([0,T] \times \Omega, \mathcal{B}([0,T]) \otimes \mathcal{F}, m \otimes P)$; that is,

$$\|f - f^n\|_{L^2([0,T] \times \Omega)}^2 = E\left[\int_0^T |f_s - f_s^n|^2\, ds\right] \to 0 \quad \text{for } n \to \infty.$$

2.5.2 Stochastic integral for square integrable adapted integrands

Definition 2.5.6. For $f \in M_T^2$, there is a sequence of $f^n \in M_{step}^2$ such that $\|f - f^n\|_{L^2([0,T] \times \Omega)} \to 0$. Hence $(f^n)$ is a Cauchy sequence in $L^2([0,T] \times \Omega, m \otimes P)$, therefore $(I_t(f^n))$ is a Cauchy sequence in $L^2(\Omega, P)$ by the isometry and therefore has an $L^2$-limit. We define

$$I_t(f) \equiv \int_0^t f_s\, dW_s := L^2\text{-}\lim_{n \to \infty} I_t(f^n).$$

Theorem 2.5.7. For $f, g \in M_T^2$, the stochastic integral $I_t(f) \equiv \int_0^t f_s\, dW_s$ verifies the following properties:
(a) Linearity: for a, b ∈ R, $af + bg \in M_T^2$ and

$$I_t(af + bg) = a\, I_t(f) + b\, I_t(g).$$

(b) Martingale property: for t ∈ [0, T], $I_t(f)$ is a martingale.
(c) Itô isometry:

$$E[(I_t(f))^2] = E\left[\int_0^t |f_s|^2\, ds\right].$$

Theorem 2.5.8. (Continuity of the stochastic integral) For $f \in M_T^2$, one can choose a (unique) version of the stochastic integral $I_t(f)$ such that $t \mapsto I_t(f)$, t ∈ [0, T], has continuous paths.

2.5.3 Further extension of the stochastic integral

Definition 2.5.9. The class of functions f: [0, T] × Ω → R with the properties:
(1) f: (t, ω) → f(t, ω) is $\mathcal{B}([0,T]) \otimes \mathcal{F}$-measurable,
(2) $f_t$: ω → f(t, ω) is $\mathcal{F}_t$-measurable, for each t,
(3) $P\left[\int_0^T |f(s, \omega)|^2\, ds < \infty\right] = 1$,
is the class $M_{loc}^2$.

Definition 2.5.10. (Stochastic integral for integrands from $M_{loc}^2$) For $f \in M_{loc}^2$, the integral up to time t ∈ [0, T] is defined as:

$$I_t(f) \equiv \int_0^t f_s\, dW_s := \text{a.s.-}\lim_n \int_0^t \mathbf{1}_{[0, \tau_n(\omega)]}(s)\, f_s\, dW_s,$$

where $(\tau_n)$ is a localising sequence of stopping times.

2.5.4 Itô formula (with respect to Brownian motion)

Theorem 2.5.11. (Simplest version of Itô's formula) Let F: R → R be twice continuously differentiable, i.e. $F \in C^2(\mathbb{R})$; then we have a.s.

$$F(W_t) - F(W_0) = \int_0^t F'(W_s)\, dW_s + \frac{1}{2} \int_0^t F''(W_s)\, ds, \quad t \in [0, T],$$

which can be expressed in differential form by:

$$dF(W_t) = F'(W_t)\, dW_t + \frac{1}{2} F''(W_t)\, dt, \quad t \in [0, T].$$
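A well-known consequence of this formula, with $F(x) = x^2/2$, is that $\int_0^t W_s\, dW_s = (W_t^2 - t)/2$. The hedged sketch below (grid size and seed are illustrative) checks this numerically with a left-endpoint Riemann-Itô sum:

```cpp
// Minimal sketch: verify int_0^T W dW ~ (W_T^2 - T)/2 by approximating
// the Ito integral with a left-endpoint Riemann sum.
#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double T = 1.0;
    const int N = 100000;
    const double dt = T / N;
    std::mt19937 gen(23);
    std::normal_distribution<double> dWdist(0.0, std::sqrt(dt));

    double W = 0.0, ito_sum = 0.0;
    for (int i = 0; i < N; ++i) {
        double dW = dWdist(gen);
        ito_sum += W * dW;               // integrand at the left endpoint
        W += dW;
    }
    std::cout << "Riemann-Ito sum : " << ito_sum << "\n"
              << "(W_T^2 - T) / 2 : " << 0.5 * (W * W - T) << "\n";
}
```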

Theorem 2.5.12. (Simplest version of the time-dependent Itô formula) Let F: [0, T] × R → R, (t, x) → F(t, x), be continuously differentiable in t and twice continuously differentiable in x, i.e. $F \in C^{1,2}([0,T] \times \mathbb{R})$; then we have a.s., for t ∈ [0, T]:

$$F(t, W_t) = F(0, W_0) + \int_0^t \frac{\partial}{\partial s} F(s, W_s)\, ds + \int_0^t \frac{\partial}{\partial x} F(s, W_s)\, dW_s + \frac{1}{2} \int_0^t \frac{\partial^2}{\partial x^2} F(s, W_s)\, ds,$$

where the space derivatives are evaluated at $x = W_s$. This can be expressed in differential form by:

$$dF(t, W_t) = \frac{\partial}{\partial t} F(t, W_t)\, dt + \frac{\partial}{\partial x} F(t, W_t)\, dW_t + \frac{1}{2} \frac{\partial^2}{\partial x^2} F(t, W_t)\, dt, \quad x = W_t.$$

Example 2.5.13. (Geometric Brownian motion and its SDE) By applying the time-dependent Itô formula we obtain that the process:

$$S_t := s\, \exp\left(\sigma W_t + \left(\mu - \frac{1}{2}\sigma^2\right) t\right)$$

with s ∈ (0, ∞) and with parameters σ > 0 and µ ∈ R is a solution to the SDE:

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$$

with initial value $S_0 = s$.
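This explicit solution makes geometric Brownian motion a convenient test case for the numerical schemes studied later in the thesis; a minimal sketch sampling $S_t$ on a grid (parameter values are illustrative) is:

```cpp
// Minimal sketch of Example 2.5.13: sample the exact GBM solution
// S_t = s * exp(sigma*W_t + (mu - sigma^2/2) t) on a grid.
#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double s = 1.0, mu = 0.05, sigma = 0.2, T = 1.0;
    const int N = 250;
    const double dt = T / N;
    std::mt19937 gen(11);
    std::normal_distribution<double> dW(0.0, std::sqrt(dt));

    double W = 0.0;
    for (int i = 1; i <= N; ++i) {
        W += dW(gen);                    // build the Brownian path
        double t = i * dt;
        double S = s * std::exp(sigma * W + (mu - 0.5 * sigma * sigma) * t);
        std::cout << t << "\t" << S << "\n";
    }
}
```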

Definition 2.5.14. (Itô process) A process $(X_t)$ is called an Itô process if it can be written as

$$X_t = x_0 + \int_0^t \alpha_s\, ds + \int_0^t \sigma_s\, dW_s, \quad t \in [0, T]$$

for $x_0 \in \mathbb{R}$, and product measurable and adapted functions $\alpha_s = \alpha(s, \omega)$, $\sigma_s = \sigma(s, \omega)$ satisfying

$$P\left[\int_0^T |\alpha_s|\, ds < \infty\right] = 1, \quad \text{and} \quad P\left[\int_0^T |\sigma_s|^2\, ds < \infty\right] = 1.$$

Examples 2.5.15. (a) A Brownian motion W is an Itô process.
(b) If $F \in C^{1,2}$ then $X_t := F(t, W_t)$ is an Itô process by Itô's formula.
(c) An absolutely continuous process of the form $X_t = X_0 + \int_0^t \alpha(s, \omega)\, ds$ is an Itô process.
(d) Every adapted process whose paths are continuously differentiable in t is an Itô process, since $X_t(\omega) = X_0(\omega) + \int_0^t \frac{\partial}{\partial s} X(s, \omega)\, ds$. For instance, $X_t = t$ is an Itô process.

Definition 2.5.16. (Stochastic integral with respect to an Itô process) We define the stochastic integral with respect to an Itô process X as

$$\int_0^t f_s\, dX_s := \int_0^t f_s \alpha_s\, ds + \int_0^t f_s \sigma_s\, dW_s$$

for suitable integrand functions $f_s = f(s, \omega)$ which are such that the integrals on the RHS are well defined; that means that $f\alpha \in L^1([0,T], ds)$ a.s. and $f\sigma \in M_{loc}^2$.

2.5.5 Itô's formula (with respect to Itô processes)

Theorem 2.5.17. (Time-dependent Itô formula with respect to Itô processes, one-dimensional) For a function $F \in C^{1,2}([0,T] \times \mathbb{R})$ and an Itô process X it holds that

$$dF(t, X_t) = \frac{\partial}{\partial t} F(t, X_t)\, dt + \frac{\partial}{\partial x} F(t, X_t)\, dX_t + \frac{1}{2} \frac{\partial^2}{\partial x^2} F(t, X_t)\, (\sigma_t)^2\, dt, \quad t \in [0, T].$$

Theorem 2.5.18. (Multi-dimensional Itô formula with respect to two Itô processes) Let X and Y be two Itô processes such that:

$$dX_t = \alpha_t\, dt + \sigma_t\, dW_t, \qquad dY_t = \alpha'_t\, dt + \sigma'_t\, dW_t.$$

For a function $F(x, y) \in C^2(\mathbb{R} \times \mathbb{R})$ we have, for t ∈ [0, T],

$$dF(X_t, Y_t) = \frac{\partial}{\partial x} F(X_t, Y_t)\, dX_t + \frac{\partial}{\partial y} F(X_t, Y_t)\, dY_t + \frac{1}{2} \frac{\partial^2}{\partial x^2} F(X_t, Y_t)\, (dX_t)^2 + \frac{1}{2} \frac{\partial^2}{\partial y^2} F(X_t, Y_t)\, (dY_t)^2 + \frac{\partial^2}{\partial x \partial y} F(X_t, Y_t)\, (dX_t\, dY_t).$$

Example 2.5.19.

$$d(X_t Y_t) = Y_t\, dX_t + X_t\, dY_t + (dX_t\, dY_t) = (Y_t \alpha_t + X_t \alpha'_t + \sigma_t \sigma'_t)\, dt + (Y_t \sigma_t + X_t \sigma'_t)\, dW_t.$$

2.6 Stochastic differential equations

The inclusion of random effects in differential equations leads to stochastic differential equations, whose solutions have non-differentiable sample paths when the differential equation is forced by an irregular stochastic process such as Gaussian white noise [12, 13, 14].

Example 2.6.1. We can consider the molecular bombardment of a speck of dust on a water surface, which results in Brownian motion. Taking $X_t$ as one of the components of the velocity of the particle, Langevin wrote the equation

$$\frac{dX_t}{dt} = -a X_t + b B_t$$

for the acceleration of the particle. This equation represents the sum of a retarding frictional force depending on the velocity and the molecular forces represented by a white noise process $B_t$ with intensity b, which is independent of the velocity. The Langevin equation is written symbolically as a stochastic differential equation

$$dX_t = -a X_t\, dt + b\, dW_t$$

where $dW_t = B_t\, dt$.

It must be emphasized that a stochastic differential equation doesn't have a well defined mathematical meaning on its own and should always be interpreted as an integral equation with Itô or Stratonovich stochastic integrals. For example, the previous SDE corresponding to the Langevin equation is represented by the following stochastic integral equation:

$$X_t = X_0 - a \int_0^t X_s\, ds + b \int_0^t dW_s$$

where the second integral is an Itô stochastic integral.

In general, the solution of an SDE inherits the non-differentiability of sample paths from the Wiener processes in the stochastic integrals.

If we apply a functional F to a given Itô process, say a Brownian motion $(W_t)$, we can use the Itô formula as a chain rule to obtain an SDE that is satisfied by $F(t, W_t)$. If, on the other hand, we start from a given SDE, we want to find the process that solves this SDE.

Example 2.6.2. The geometric Brownian motion solves the SDE:

$$dY_t = Y_t(\mu\, dt + \sigma\, dW_t)$$

and we know the solution Y explicitly from example 2.5.13.

A more systematic method than guessing the explicit solution in advance (and verifying it by the Itô formula) is to start by making an ansatz for the parametric form of the solution and to use, after an application of the Itô formula, the method of matching the coefficients (of dt and dW). For the above SDE, for instance, the ansatz $X_t = F(t, W_t)$ leads to the solution.

Example 2.6.3. (The Ornstein-Uhlenbeck process) Consider the SDE:

dYt = −αYt.dt + σ.dWt

with Y0 = y0, α, σ ∈ (0, ∞) and y0 ∈ R. For a and b being differentiable functions of time only with a > 0 and a(0) = 1, let the process X be given by

$$X_t = a(t) \left(x_0 + \int_0^t b(s)\, dW_s\right)$$

If we want to choose the parametric form of X such that X is a solution to the SDE for Y, it is immediate to set $x_0 := y_0$. By the Itô formula we can compute $dX_t$. Matching the coefficients with those from the SDE for Y, we arrive at the equations $a(t)\, b(t) = \sigma$ and $a'(t)/a(t) = -\alpha$. Solving for the functions a and b yields

a(t) = exp(−α.t) and b(t) = σ.exp(αt)

Substituting these into the general definition of X, the Itô formula yields that X with these functions a and b indeed solves the SDE for Y. It is called the Ornstein-Uhlenbeck process.
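Since the transition of the Ornstein-Uhlenbeck process over a step of size dt is Gaussian with mean $e^{-\alpha\, dt} X_t$ and variance $\sigma^2(1 - e^{-2\alpha\, dt})/(2\alpha)$, it can be sampled exactly on a grid; a minimal C++ sketch (parameter values are illustrative) is:

```cpp
// Minimal sketch of Example 2.6.3: exact simulation of the
// Ornstein-Uhlenbeck process via its Gaussian one-step transition.
#include <cmath>
#include <iostream>
#include <random>

int main() {
    const double alpha = 1.5, sigma = 0.3, y0 = 2.0, T = 5.0;
    const int N = 500;
    const double dt = T / N;
    const double a = std::exp(-alpha * dt);
    const double sd = sigma * std::sqrt((1.0 - a * a) / (2.0 * alpha));
    std::mt19937 gen(13);
    std::normal_distribution<double> xi(0.0, 1.0);

    double X = y0;
    for (int i = 1; i <= N; ++i) {
        X = a * X + sd * xi(gen);        // exact one-step transition
        std::cout << i * dt << "\t" << X << "\n";
    }
}
```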

Chapter 3

Numerical Methods for SDEs

The analytical study of SDEs proves to be very complicated in most situations, and few stochastic differential equations are known to have an explicit solution. For this reason, numerical techniques play a very important role in finding approximate solutions to these equations in many fields of application, for example finance, performance modelling and biological networks.

3.1 Examples of explicitly solvable SDEs

Below are listed some SDEs and their explicit solutions; these will later be used for the validation of the numerical methods.

Additive noise

Constant Coefficients : homogeneous case

dXt = (a.Xt + b).dt + c.dWt

$$X_t = e^{at}\left(X_0 + \frac{b}{a}\left(1 - e^{-at}\right) + c \int_0^t e^{-as}\, dW_s\right)$$

36 3.1. Examples of explicitly solvable SDEs 37

Variable coefficients:

dXt = (a(t).Xt + b(t)).dt + c(t).dWt

$$X_t = \Phi_t\left(X_0 + \int_0^t \Phi_s^{-1} b(s)\, ds + \int_0^t \Phi_s^{-1} c(s)\, dW_s\right)$$

with fundamental solution

$$\Phi_t = \exp\left(\int_0^t a(s)\, ds\right).$$

Multiplicative noise

Constant Coefficients : homogeneous case

dXt = a.Xt.dt + b.Xt.dWt

$$X_t = X_0 \exp\left(\left(a - \frac{1}{2} b^2\right) t + b W_t\right)$$

Constant Coefficients : inhomogeneous case

dXt = (a.Xt + c).dt + (b.Xt + d).dWt

$$X_t = \Phi_t\left(X_0 + (c - bd) \int_0^t \Phi_s^{-1}\, ds + d \int_0^t \Phi_s^{-1}\, dW_s\right)$$

with fundamental solution

$$\Phi_t = \exp\left(\left(a - \frac{1}{2} b^2\right) t + b W_t\right)$$

Variable coefficients: homogeneous case

dXt = a(t).Xt.dt + b(t).Xt.dWt

$$X_t = X_0 \exp\left(\int_0^t \left(a(s) - \frac{1}{2} b^2(s)\right) ds + \int_0^t b(s)\, dW_s\right)$$

Variable coefficients: inhomogeneous case

dXt = (a(t).Xt + c(t)).dt + (b(t).Xt + d(t)).dWt

$$X_t = \Phi_{t,0}\left(X_0 + \int_0^t \Phi_{s,0}^{-1}\left(c(s) - b(s)\, d(s)\right) ds + \int_0^t \Phi_{s,0}^{-1} d(s)\, dW_s\right)$$

with fundamental solution

$$\Phi_{t,0} = \exp\left(\int_0^t \left(a(s) - \frac{1}{2} b^2(s)\right) ds + \int_0^t b(s)\, dW_s\right)$$

(for more details please refer to [14])

3.2 Existence and uniqueness of strong solutions

When an explicit solution of the SDE is not available, it is important to know whether the solution exists and whether it is unique for some initial value $X_0$ before proceeding with the numerical simulation. In this section, we discuss briefly the existence and uniqueness of solutions for stochastic differential equations [53, 54, 46]. We consider the case of one-dimensional stochastic differential equations (the same study can be generalised to the multi-dimensional case):

$$dX_t = a(t, X_t)\, dt + b(t, X_t)\, dW_t \tag{3.1}$$

where a(t, x), b(t, x) are two Borel-measurable functions defined from [0, ∞) × R into R, and $W = \{W_t;\, t \in [0, \infty)\}$ is a one-dimensional Brownian motion on the filtered probability space (Ω, F, $\mathcal{F}_t$, P) for t ≥ 0. $X = \{X_t;\, t \in [0, \infty)\}$ is a continuous stochastic process that satisfies the SDE above with initial value $X_0$, which is $\mathcal{F}_0$-measurable.

3.2.1 Solutions for ODEs

When the function b(t,x) is equal to zero for all the values of x and t, the SDE reduces to an ordinary differential equation:

$$dX_t = a(t, X_t)\, dt$$

which can be written as:

$$X_t = X_0 + \int_0^t a(s, X_s)\, ds \tag{3.2}$$

The existence and uniqueness theorem ensures that we have a unique solution $X(t, X_0)$ by requiring, as a sufficient condition, that the function a(t, x) verifies the Lipschitz condition:

$$|a(t, x) - a(t, y)| \leq K |x - y|$$

for all x, y ∈ R, where K is a positive constant. The proof goes by considering the Picard-Lindelöf iterations:

$$X_t^{(n+1)} = X_0 + \int_0^t a(s, X_s^{(n)})\, ds, \quad n \geq 0, \qquad X_t^{(0)} = X_0,$$

and showing that this sequence converges to the continuous solution of equation (3.2) and that this solution is unique.
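For intuition, the following illustrative sketch runs the Picard-Lindelöf iteration on a grid for the simple choice a(t, x) = x with $X_0 = 1$ (so the exact solution is $e^t$); the grid, the trapezoidal quadrature and the iteration count are assumptions of this sketch, not part of the proof:

```cpp
// Illustrative sketch of the Picard-Lindelof iteration:
// X^{(k+1)}(t) = X_0 + int_0^t a(s, X^{(k)}(s)) ds on a grid,
// here for a(t,x) = x, compared against the exact value e at t = 1.
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    const int N = 1000;                  // grid points on [0, 1]
    const double dt = 1.0 / N;
    auto a = [](double /*t*/, double x) { return x; };

    std::vector<double> X(N + 1, 1.0);   // X^{(0)}(t) = X_0 = 1
    for (int k = 0; k < 10; ++k) {
        std::vector<double> Xn(N + 1, 1.0);
        for (int i = 1; i <= N; ++i)     // cumulative trapezoidal integral
            Xn[i] = Xn[i - 1]
                  + 0.5 * dt * (a((i - 1) * dt, X[i - 1]) + a(i * dt, X[i]));
        X = Xn;
        std::cout << "iter " << k + 1 << ": X(1) = " << X[N]
                  << "  (exact e = " << std::exp(1.0) << ")\n";
    }
}
```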

To ensure the global existence of the solution (for all t > 0), the growth bound condition on a(t, x) must be satisfied:

$$|a(t, x)|^2 \leq L\,(1 + |x|^2)$$

for all x ∈ R, where L is a positive constant. Without this additional condition, the solution might become unbounded after a small t, as for example the solution $x(t) = x_0 / (1 - 3t\, x_0^3)^{1/3}$ of the ODE $dx/dt = x^4$, which explodes at $t = 1/(3 x_0^3)$.

Although these conditions are sufficient for the existence and uniqueness of a solution, they are not necessary. But in their absence we might find ourselves attempting differential equations that turn out not to be solvable, or that turn out to have a continuum of solutions [59].

A similar theory was developed for SDEs where uniqueness and existence theorems based on Lipschitz-type conditions proved to be successful as in the ODE case.

3.2.2 Solutions for SDEs

Theorem 3.2.1. Consider the stochastic differential equation (3.1); if the functions a(t, x) and b(t, x) are locally Lipschitz-continuous in the space variable x, then strong uniqueness holds for this SDE.

Before working out the proof of Theorem 3.2.1, it is useful to state the definition of locally Lipschitz continuous functions and Gronwall's inequality, given the important role they play in this proof.

Definition 3.2.2. (Local Lipschitz condition) A function f : [0, ∞)×R → R is locally Lipschitz continuous with respect to its second argument if for every integer n ≥ 1 there exists a real constant Kn > 0 such that, for every t ≥ 0, | x |≤ n and | y |≤ n :

| f(t, x) − f(t, y) |≤ Kn. | x − y | .

Lemma 3.2.3. (Gronwall's inequality) Assume that the continuous functions u: [0, T] → [0, ∞) and v: [0, T] → R satisfy:

$$u(t) \leq v(t) + K \int_0^t u(s)\, ds, \quad \forall t \in [0, T],$$

where K ≥ 0. Then the Gronwall inequality is:

$$u(t) \leq v(t) + K \int_0^t v(s)\, e^{K(t-s)}\, ds, \quad \forall t \in [0, T].$$

Proof of Theorem 3.2.1. We suppose that on the probability space (Ω, F, P) there exist two strong solutions X and X' for the SDE (3.1) with respect to the same Brownian motion W, verifying the same initial condition $X_0$ and with almost surely continuous sample paths.

Let $\tau_N$ be the stopping time such that $\tau_N(\omega) = \inf\{t \geq 0;\, |X_t(\omega)| \geq N\}$ for N ≥ 1, and $\tau'_N$ another stopping time defined the same way but relative to X'. We define $S_N := \tau_N \wedge \tau'_N$; for t ∈ [0, T] we have that

$$X_{t \wedge S_N} - X'_{t \wedge S_N} = \int_0^{t \wedge S_N} \left(a(s, X_s) - a(s, X'_s)\right) ds + \int_0^{t \wedge S_N} \left(b(s, X_s) - b(s, X'_s)\right) dW_s$$

$$E[|X_{t \wedge S_N} - X'_{t \wedge S_N}|^2] = E\left[\left|\int_0^{t \wedge S_N} \left(a(s, X_s) - a(s, X'_s)\right) ds + \int_0^{t \wedge S_N} \left(b(s, X_s) - b(s, X'_s)\right) dW_s\right|^2\right]$$

Using the inequality $(v + u)^2 \leq 2(v^2 + u^2)$ we obtain:

$$E[|X_{t \wedge S_N} - X'_{t \wedge S_N}|^2] \leq 2 E\left|\int_0^{t \wedge S_N} \left(a(s, X_s) - a(s, X'_s)\right) ds\right|^2 + 2 E\left|\int_0^{t \wedge S_N} \left(b(s, X_s) - b(s, X'_s)\right) dW_s\right|^2$$

Applying the Cauchy-Schwarz inequality to the first term and the Itô isometry to the second term, we get

$$E[|X_{t \wedge S_N} - X'_{t \wedge S_N}|^2] \leq 2t\, E \int_0^{t \wedge S_N} |a(s, X_s) - a(s, X'_s)|^2\, ds + 2 E \int_0^{t \wedge S_N} |b(s, X_s) - b(s, X'_s)|^2\, ds$$

and the local Lipschitz condition implies that

$$E[|X_{t \wedge S_N} - X'_{t \wedge S_N}|^2] \leq 2(T + 1) K_N^2 \int_0^t E|X_{s \wedge S_N} - X'_{s \wedge S_N}|^2\, ds$$

Finally, by applying Gronwall's inequality with u(t) taken to be $E[|X_{t \wedge S_N} - X'_{t \wedge S_N}|^2]$ and v(t) = 0, we obtain that $E[|X_{t \wedge S_N} - X'_{t \wedge S_N}|^2] = 0$. We then deduce that $\{X_{t \wedge S_N};\, 0 \leq t < \infty\}$ and $\{X'_{t \wedge S_N};\, 0 \leq t < \infty\}$ are modifications of one another and thus are indistinguishable. By taking the limit as N goes to ∞, we deduce that this conclusion is also valid for $\{X_t;\, 0 \leq t < \infty\}$ and $\{X'_t;\, 0 \leq t < \infty\}$.

Even though theorem 3.2.1 ensures the uniqueness of a strong solution, it does not guarantee its global existence. For this reason, a stronger condition than the local Lipschitz continuity will be required to confirm the existence.

Theorem 3.2.4. Consider the stochastic differential equation 3.1. If the assumptions below are satisfied:

i- the functions a(t,x) and b(t,x) are jointly (L²-)measurable in (t,x) ∈ [0,T] × R;

ii- the functions a(t,x) and b(t,x) are globally Lipschitz-continuous in the space variable x: there exists K > 0 such that
\[ |a(t, x) - a(t, y)| \le K|x - y|, \qquad |b(t, x) - b(t, y)| \le K|x - y| \]
for all t ∈ [0,T] and x, y ∈ R;

iii- linear growth bound conditions hold on the functions a(t,x) and b(t,x): there exists K > 0 such that
\[ |a(t, x)|^2 \le K^2(1 + |x|^2), \qquad |b(t, x)|^2 \le K^2(1 + |x|^2) \]
for all t ∈ [0,T] and x ∈ R;

iv- X_0 is F_0-measurable with E(|X_0|²) < ∞;

then the SDE has a pathwise unique strong solution X_t on [0,T] with
\[ \sup_{0 \le t \le T} E|X_t|^2 < \infty. \]

For a proof of Theorem 3.2.4, please refer to [53].

Although existence and uniqueness theorems provide a very useful tool to prove that stochastic differential equations satisfying certain conditions have a unique solution, these conditions are sufficient but not necessary, and a wider class of SDEs proves to have unique solutions, as the examples below show.

Example 3.2.5. We consider the SDE:
\[ dX_t = |X_t|^a\,dW_t \]
where X_0 = 0. The coefficient function |x|^a is not differentiable at x = 0 for a ∈ (0, 1) and does not satisfy the local Lipschitz condition near x = 0. Nevertheless, the equation has a unique solution X_t = 0 when a ≥ 0.5, but has infinitely many solutions when a ∈ (0, 0.5).

Example 3.2.6. We consider the SDE:
\[ dX_t = \frac{n-1}{2X_t}\,dt + dW_t \]
where X_0 = x_0 > 0 and n ≥ 2. The assumptions of Theorem 3.2.4 are not satisfied because of the singularity of the drift at x = 0. In spite of this fact, the equation has a unique solution: the Bessel process of dimension n.

3.3 Stochastic discrete time approximations

Due to the fact that explicit solutions of stochastic differential equations are available only in a limited number of cases, numerical techniques must be used. Different numerical approaches, such as the Euler and Milstein schemes and variance reduction techniques, will be introduced in the paragraphs below [14, 15, 39].

3.3.1 Types of approximation

There are two types of approximations for the solution of a stochastic differential equation: the strong approximation and the weak one. Depending on the application at hand, one must choose which to adopt. If one is interested in a good pathwise approximation Y of X, then the strong approximation is the suitable one; it is implemented numerically using scenario simulations, which will be discussed later in this chapter. Alternatively, in many applications a close pathwise approximation is not necessary and one might only be interested in a good approximation of the probability distribution, or of moments of functionals of the solution of the SDE. In that case the weak approximation is the appropriate choice, and it is modelled numerically using Monte Carlo algorithms.

3.3.2 Wagner-Platen expansion

The deterministic Taylor formula plays a vital role in approximating smooth functions around some expansion point to a certain level of accuracy. The stochastic version of this formula is Itô's formula, which provides a good approximation of functionals of diffusion processes. The iterated application of the Itô formula leads to the Wagner-Platen expansion, also known as the Itô-Taylor expansion. The latter proves very useful in the construction of discrete time approximations of the solution of stochastic differential equations. In the paragraphs below, some Wagner-Platen expansions will be derived [56].

Let {W_t, t ∈ [0,T]} be a Brownian motion on the filtered probability space (Ω, F, P) and {X_t, t ∈ [t_0,T]} a stochastic process satisfying the SDE below:
\[ X_t = X_{t_0} + \int_{t_0}^t a(X_s)\,ds + \int_{t_0}^t b(X_s)\,dW_s, \qquad (3.3) \]
where we assume that the drift a and the diffusion coefficient b of the equation above are real-valued functions that satisfy the conditions of the existence and uniqueness theorem. Itô's formula implies:

\[ f(X_t) = f(X_{t_0}) + \int_{t_0}^t \Big( a(X_s)\,\frac{\partial}{\partial x} f(X_s) + \frac{1}{2} b^2(X_s)\,\frac{\partial^2}{\partial x^2} f(X_s) \Big)\,ds + \int_{t_0}^t b(X_s)\,\frac{\partial}{\partial x} f(X_s)\,dW_s. \qquad (3.4) \]
We introduce the differential operators:

\[ L^0 = a\,\frac{\partial}{\partial x} + \frac{1}{2} b^2\,\frac{\partial^2}{\partial x^2} \qquad (3.5) \]
\[ L^1 = b\,\frac{\partial}{\partial x} \qquad (3.6) \]

Equation 3.4 can be written as:

\[ f(X_t) = f(X_{t_0}) + \int_{t_0}^t L^0 f(X_s)\,ds + \int_{t_0}^t L^1 f(X_s)\,dW_s. \qquad (3.7) \]

By expanding the functions f = a and f = b using equation 3.7, and then substituting the result into the SDE 3.3, we obtain:

\[ X_t = X_{t_0} + \int_{t_0}^t \Big( a(X_{t_0}) + \int_{t_0}^s L^0 a(X_z)\,dz + \int_{t_0}^s L^1 a(X_z)\,dW_z \Big)\,ds + \int_{t_0}^t \Big( b(X_{t_0}) + \int_{t_0}^s L^0 b(X_z)\,dz + \int_{t_0}^s L^1 b(X_z)\,dW_z \Big)\,dW_s \qquad (3.8) \]
\[ = X_{t_0} + a(X_{t_0}) \int_{t_0}^t ds + b(X_{t_0}) \int_{t_0}^t dW_s + R_0, \]
where the remainder term R_0 is equal to:

\[ R_0 = \int_{t_0}^t \int_{t_0}^s L^0 a(X_z)\,dz\,ds + \int_{t_0}^t \int_{t_0}^s L^1 a(X_z)\,dW_z\,ds + \int_{t_0}^t \int_{t_0}^s L^0 b(X_z)\,dz\,dW_s + \int_{t_0}^t \int_{t_0}^s L^1 b(X_z)\,dW_z\,dW_s. \qquad (3.9) \]

Thus we have obtained a Wagner-Platen expansion of the process X_t. We can apply the Itô formula, equation 3.7, again, this time to the function f = L^1 b, to obtain a finer version of the Wagner-Platen expansion:

\[ X_t = X_{t_0} + a(X_{t_0}) \int_{t_0}^t ds + b(X_{t_0}) \int_{t_0}^t dW_s + L^1 b(X_{t_0}) \int_{t_0}^t \int_{t_0}^s dW_z\,dW_s + R_1, \qquad (3.10) \]

where the remainder term R1 is equal to:

\[ \begin{aligned} R_1 = {}& \int_{t_0}^t \int_{t_0}^s L^0 a(X_z)\,dz\,ds + \int_{t_0}^t \int_{t_0}^s L^1 a(X_z)\,dW_z\,ds + \int_{t_0}^t \int_{t_0}^s L^0 b(X_z)\,dz\,dW_s \\ &+ \int_{t_0}^t \int_{t_0}^s \int_{t_0}^z L^0 L^1 b(X_u)\,du\,dW_z\,dW_s + \int_{t_0}^t \int_{t_0}^s \int_{t_0}^z L^1 L^1 b(X_u)\,dW_u\,dW_z\,dW_s. \qquad (3.11) \end{aligned} \]

The advantage of the Wagner-Platen expansion is that it allows us to write a smooth function of a process as the sum of a finite number of multiple Itô integrals with constant integrands. On the one hand, one can benefit from the fact that those Itô integrals are martingales, whose rich properties make the analytical study easier; on the other hand, this expansion can be used to approximate the increment of the solution of the SDE over small intervals of time, thus contributing to the construction of efficient numerical algorithms for solving the SDE [14, 15].

3.3.3 Convergence

As mentioned before, depending on the application at hand, one must choose the kind of convergence that the scheme must satisfy. In this paragraph, tools to quantify the accuracy of an approximation, whether weak or strong, will be discussed.

Order of strong convergence

When strong convergence is considered, the error is estimated theoretically using the absolute error criterion:
\[ \varepsilon(\delta) = E\big[|X_T - Y_T|\big], \qquad (3.12) \]

where δ is the maximum time step size, T is the final instant of time, XT is the exact solution of the SDE and YT the discrete time approximation.

Definition 3.3.1. An approximating process Y^δ converges in the strong sense to X at time T if:
\[ \lim_{\delta \to 0} \varepsilon(\delta) = 0. \]
Y^δ converges in the strong sense with order β ∈ (0, ∞] if there exist a constant K and a positive constant δ_0 such that
\[ \varepsilon(\delta) = E\big[|X_T - Y_T^{\delta}|\big] \le K\,\delta^{\beta} \qquad (3.13) \]
for any time discretisation with maximum step size δ ∈ (0, δ_0).

The order of strong convergence helps to classify different numerical schemes according to the pathwise closeness of their approximations to the exact solution.

Order of weak convergence

When the pathwise approximation is not necessary and the problem requires the computation of the moments of the solution of the SDE at the end of some interval [0,T ], or more generally the expectation of some function f of the solution E[f(XT )], the weak approximation is considered. As in the case of the strong approximation or scenario simulation, some error criterion is needed to classify the accuracy and efficiency of the numerical schemes. In the weak approximation (Monte Carlo simulation), the error criterion is usually given by:

ε(δ) = |E[f(XT )] − E[f(YT )]|, (3.14)

where f ∈ CP , CP being the set of all polynomials g: R → R, δ is the maximum time step size,

T is the final instant of time, XT is the exact solution of the SDE and YT the discrete time approximation.

Definition 3.3.2. An approximating process Y^δ converges in the weak sense with order β ∈ (0, ∞] if for each f ∈ C_P there exist a constant K and a positive constant δ_0 such that:
\[ \varepsilon(\delta) = |E[f(X_T)] - E[f(Y_T^{\delta})]| \le K\,\delta^{\beta} \qquad (3.15) \]
for any time discretisation with maximum step size δ ∈ (0, δ_0).

3.4 Strong approximations

By a suitable truncation of the Wagner-Platen expansion discussed above, many numerical schemes for the strong approximation of the solution of the SDE 3.3 can be designed. The accuracy required plays the major role in determining where the truncation should take place. In this section, several schemes for scenario simulation will be studied.

3.4.1 Euler Scheme

Consider an SDE of the form:

dXt = a(Xt).dt + b(Xt).dWt (3.16)

where W_t is a Brownian motion process, a and b are functions of the process X_t and t ∈ [0,T]; a is called the drift coefficient and b the diffusion coefficient. By truncating the Wagner-Platen expansion, equation 3.8, so that we include only the time and Wiener integrals of multiplicity one, we obtain the Euler scheme:

Xt+h = Xt + a(Xt).h + b(Xt).(Wt+h − Wt). (3.17)

Because W_t is a Brownian motion, its increment (W_{t+h} − W_t) follows the normal distribution N(0,h); hence (W_{t+h} − W_t) can be replaced by √h·Z, where Z is an N(0,1) random variable which can be generated by means of a normal random number generator.

To generate a sample path of X_t, we divide the time interval into n subintervals of step size h = T/n and denote X_{ih} by X_i, where i = 1, 2, ..., n. Then we generate n independent N(0,1)-distributed random variables Z_1, ..., Z_n and compute recursively the values of X_{i+1}, starting from i = 0 until i = n − 1, by using:
\[ X_{i+1} = X_i + a(X_i)\,h + b(X_i)\,\sqrt{h}\,Z_{i+1}. \]
X_0, ..., X_n is one sample path of X_t, called the Euler scheme.
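As an illustration, the following is a minimal C++ sketch of this recursion, in the spirit of the C++ implementations used in this project; the coefficient callables, the generator and all names are illustrative assumptions rather than the exact code used here.

#include <cmath>
#include <random>
#include <vector>

// Minimal sketch of the Euler scheme for dX = a(X) dt + b(X) dW.
// a and b are assumed callables, e.g. lambdas; gen is a seeded Mersenne Twister.
template <class A, class B>
std::vector<double> euler_path(double x0, double T, int n, A a, B b,
                               std::mt19937 &gen) {
    std::normal_distribution<double> Z(0.0, 1.0);   // N(0,1) variates
    const double h = T / n, sqrth = std::sqrt(h);
    std::vector<double> X(n + 1);
    X[0] = x0;
    for (int i = 0; i < n; ++i)  // X_{i+1} = X_i + a h + b sqrt(h) Z_{i+1}
        X[i + 1] = X[i] + a(X[i]) * h + b(X[i]) * sqrth * Z(gen);
    return X;
}

For the geometric Brownian motion examples below, one would pass, for instance, a(x) = 1.5x and b(x) = 2x as lambdas.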

Order of convergence

When the drift and diffusion functions, a and b, satisfy the Lipschitz and linear growth conditions, the Euler scheme has an order of convergence of 1/2. However, for special cases of the drift and diffusion coefficients this scheme can exhibit a higher order of convergence; for example, when the diffusion coefficient b is a deterministic function of time we speak of additive noise, and in this situation the order of convergence of the Euler scheme reaches 1 [14, 15, 39].

Example

As an application of the Euler scheme, we consider the linear SDE with 2-dimensional noise, below:

\[ dX_t = a X_t\,dt + b X_t\,dW_t^1 + c X_t\,dW_t^2 \]
The solution of this differential equation is given by
\[ X_t = X_0 \exp\Big( \big(a - \tfrac{b^2 + c^2}{2}\big)t + b W_t^1 + c W_t^2 \Big). \]

In this example, we take a = 1.5, b = 1, c = 1, X_0 = 1 and the time interval [t_0,T] = [0,1]. To simulate paths of this SDE using the Euler approximation, we proceed recursively: starting from the initial value Y_0 = X_0, we iterate using the equation:
\[ Y_{i+1} = Y_i + a Y_i \Delta + b Y_i \Delta W_i^1 + c Y_i \Delta W_i^2, \]
where we partition the time interval into N equidistant intervals of length ∆ = T/N, ∆W_i^1 is the i-th increment of the Brownian motion W_t^1 and ∆W_i^2 is the i-th increment of the Brownian motion W_t^2. Each of the increments is obtained by using a normal random number generator to produce N(0,∆) iid random variables. We obtain figures 3.1 and 3.2 by plotting the explicit and the numerical solutions corresponding to the same sample path of the Brownian motion. The only parameter that varies between these figures is the value of the time increment ∆.

3.4.2 Milstein Scheme

The Milstein scheme is more accurate than the Euler scheme and converges faster. To derive the Milstein scheme, we consider the SDE:

dXt = a(Xt).dt + b(Xt).dWt and write the corresponding stochastic integral equation:

\[ X_{t+h} = X_t + \int_t^{t+h} a(X_u)\,du + \int_t^{t+h} b(X_u)\,dW_u. \]

By including an additional term of the truncated Wagner-Platen expansion, equation 3.10, beyond those used in the Euler scheme, we obtain the discretized form of the stochastic integral below:

\[ X_{t+h} = X_t + a(X_t)\,h + b(X_t)\,(W_{t+h} - W_t) + \tfrac{1}{2}\,b(X_t)\,b'(X_t)\,\big((W_{t+h} - W_t)^2 - h\big). \]

The last term is nothing but the double integral:

\[ L^1 b(X_t) \int_t^{t+h} \int_t^s dW_z\,dW_s = b(X_t)\,b'(X_t) \int_t^{t+h} (W_s - W_t)\,dW_s = \tfrac{1}{2}\,b(X_t)\,b'(X_t)\,\big((W_{t+h} - W_t)^2 - h\big). \qquad (3.18) \]

It is worth noting that in the general multidimensional case, the computation of the double integral might become much more complicated than the one-dimensional case, for a detailed discussion of this subject the reader is referred to [14].

Figure 3.1: Sample path of the solution for N = 64

Figure 3.2: Sample path of the solution for N = 1024

To generate a sample path of X_t, we divide the time interval into n subintervals of step size h = T/n and denote X_{ih} by X_i, where i = 1, 2, ..., n. Then we generate n independent N(0,1)-distributed random variables Z_1, ..., Z_n and compute recursively the values of X_{i+1}, starting from i = 0 until i = n − 1, by using:
\[ X_{i+1} = X_i + a(X_i)\,h + b(X_i)\,\sqrt{h}\,Z_{i+1} + \tfrac{1}{2}\,b(X_i)\,b'(X_i)\,\big(h Z_{i+1}^2 - h\big). \]
X_0, ..., X_n is one sample path of X_t, called the Milstein scheme.
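A one-step C++ sketch of this recursion is given below; it reuses the conventions of the Euler sketch above, and the names (including bp for the derivative b′) are illustrative assumptions.

#include <cmath>

// Sketch of one Milstein step for dX = a(X) dt + b(X) dW, where bp(x) = b'(x).
// z is an N(0,1) draw; the last term is the order-1 Milstein correction.
template <class A, class B, class Bp>
double milstein_step(double x, double h, double z, A a, B b, Bp bp) {
    const double dW = std::sqrt(h) * z;            // Brownian increment over h
    return x + a(x) * h + b(x) * dW
             + 0.5 * b(x) * bp(x) * (dW * dW - h); // correction (dW^2 - h)
}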

Order of convergence

In the one-dimensional case, when the coefficients a and b of the SDE satisfy the Lipschitz and linear growth conditions, the Milstein scheme converges in the strong sense with order of convergence 1.0. In the multi-dimensional case the situation is more complicated and the order can decrease to 1/2.

Example: Brownian Exponential

As an application of the Milstein scheme, we consider the SDE of the Brownian exponential below:

dXt = a.Xt.dt + b.Xt.dWt

where X0 = 1. The solution of this differential equation is given by

\[ X_t = \exp\Big( \big(a - \tfrac{b^2}{2}\big)t + b W_t \Big). \]

In this example, we take a = 1.5, b = 2 and the time interval [t_0,T] = [0,1]. To simulate paths of this SDE using the Milstein approximation, we proceed recursively: starting from the initial value Y_0 = X_0, we iterate using the equation:
\[ Y_{i+1} = Y_i + a Y_i \Delta + b Y_i \Delta W_i + \tfrac{b^2}{2}\,Y_i\,\big((\Delta W_i)^2 - \Delta\big), \]
where we partition the time interval into N equidistant intervals of length ∆ = T/N and ∆W_i is the i-th increment of the Brownian motion W_t. Each of the ∆W_i is obtained by using a normal random number generator to produce N(0,∆) iid random variables. We obtain figures 3.3 and 3.4 by plotting the exact and the numerical solutions corresponding to the same sample path of the Brownian motion. The only parameter that varies between these figures is the value of the time increment ∆.

3.4.3 Higher order schemes

By including more terms of the truncated Wagner-Platen expansion, equation 3.10, considered in the derivation of the Milstein scheme, we obtain more accurate schemes. While for some applications the accuracy of the Euler or Milstein scheme is sufficient, for others a higher accuracy is necessary, for example when capturing the behavior of the paths in the tails of the log-return distribution in finance. The strong order 1.5 Taylor scheme is written as:
\[ \begin{aligned} X_{t+h} = X_t &+ a(X_t)\,h + b(X_t)\,(W_{t+h} - W_t) + \tfrac{1}{2}\,b(X_t)\,b'(X_t)\,\big((W_{t+h} - W_t)^2 - h\big) \\ &+ a'(X_t)\,b(X_t)\,\Delta Z + \tfrac{1}{2}\big(a(X_t)\,a'(X_t) + \tfrac{1}{2} b^2(X_t)\,a''(X_t)\big)h^2 \\ &+ \big(a(X_t)\,b'(X_t) + \tfrac{1}{2} b^2(X_t)\,b''(X_t)\big)\big(h\,(W_{t+h} - W_t) - \Delta Z\big) \\ &+ \tfrac{1}{2}\,b(X_t)\big(b(X_t)\,b''(X_t) + b'^2(X_t)\big)\big(\tfrac{1}{3}(W_{t+h} - W_t)^2 - h\big)(W_{t+h} - W_t) \qquad (3.19) \end{aligned} \]

where ∆Z represents the double integral \(\int_t^{t+h} \int_t^s dW_z\,ds\).

3.4.4 Geometric Brownian motion simulation

In the example below, the simulation of an SDE with a well-known explicit solution is provided to illustrate the convergence properties of the schemes studied above, especially the strong convergence properties of the Milstein scheme compared with the Euler scheme [39]. Consider the Geometric Brownian motion:

dXt = a.Xt.dt + b.Xt.dWt

Figure 3.3: Sample path of the solution for N = 16

Figure 3.4: Sample path of the solution for N = 64

The solution of this differential equation is given by

\[ X_t = X_0 \exp\Big( \big(a - \tfrac{b^2}{2}\big)t + b W_t \Big). \]

In this example, we take a = 1.5, b = 2, X_0 = 1 and the time interval [t_0,T] = [0,1]. To simulate paths of the Geometric Brownian Motion using the Euler approximation, we proceed recursively: starting from the initial value Y_0 = X_0, we iterate using the equation:

Yi+1 = Yi + a.Yi.∆ + b.Yi.∆Wi

where we partition the time interval into N equidistant intervals ∆ = T/N and ∆Wi is the ith increment of the Brownian motion Wt. ∆Wi is obtained by using a normal random number generator to produce N(0, ∆) iid random variables. To simulate the Milstein approximation of the Geometric Brownian Motion we proceed the same way as in the Euler case but this time we use the recursive equation below:

\[ Y_{i+1} = Y_i + a Y_i \Delta + b Y_i \Delta W_i + \tfrac{b^2}{2}\,Y_i\,\big((\Delta W_i)^2 - \Delta\big). \]

The exact solution for the same sample path of the Brownian motion Wt is given by:

\[ X_i = X_0 \exp\Big( \big(a - \tfrac{b^2}{2}\big)t_i + b W_{t_i} \Big). \]

We obtain the figures below by plotting the explicit and the numerical solutions corresponding to the same sample path of the Brownian motion; the only parameter that varies between these figures is the value of the time increment ∆. It is clear that the numerical approximations converge progressively toward the exact solution as the number of subintervals increases. Furthermore, figure 3.5 illustrates the fact that the Milstein scheme provides stronger convergence properties than the Euler scheme.

Figure 3.5: GBM sample paths for different values of N (panels (a)-(f): N = 4, 16, 64, 256, 1024, 4096)

Order of convergence

To approximate the order of convergence of both schemes, we simulate the same example as above, with a = 1.5, b = 2, X_0 = 1 and the time interval [t_0,T] = [0,1]. We generate 3000 sample paths of this GBM, enabling us to compute the expectation of the absolute error ε(∆) = E[|X_T − Y_T^∆|] at time T = 1 for different values of ∆ ∈ (0, 0.25]. In figures 3.6 and 3.7, a Log-Log plot of ε(∆) with respect to ∆ is presented, where it is clear that the data lie on a straight line. By performing a regression analysis of the available data (a sketch of such a fit is given after the list below), we obtain the following results:

• In the case of the Euler scheme, we obtain:

log(ε(∆)) = −2.1 + 0.512372 log(∆)

therefore the experimental order of convergence is 0.51 ± 0.02, where 0.02 is the standard deviation of the slope of the fitted line. We conclude that the empirical result is a very good approximation of the theoretical value 0.5.

• In the case of the Milstein scheme, we obtain:

log(ε(∆)) = −2.14 + 0.994486 log(∆)

therefore the experimental order of convergence is 0.99 ± 0.02, where 0.02 is the standard deviation of the slope of the fitted line. We conclude that the empirical result is a very good approximation of the theoretical value 1.
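The slope fit referenced above is elementary; the following small C++ sketch estimates the convergence order by least squares from pairs (∆, ε(∆)). The names are assumptions, and this is not the exact analysis code used for the figures.

#include <cmath>
#include <vector>

// Least-squares slope of log(err) against log(delta): the empirical order
// of convergence. Inputs are the step sizes and the measured mean errors.
double empirical_order(const std::vector<double> &delta,
                       const std::vector<double> &err) {
    const int n = static_cast<int>(delta.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; ++i) {
        const double x = std::log(delta[i]), y = std::log(err[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    return (n * sxy - sx * sy) / (n * sxx - sx * sx); // regression slope
}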

3.5 Weak approximations

Similarly to the strong approximations, weak approximations of the solution of stochastic differential equations can be obtained from the Wagner-Platen expansion. The number of terms included in the truncation of the expansion depends on the desired order of convergence.

Figure 3.6: Log-Log plot of the absolute error versus the time increment for the Euler scheme

Figure 3.7: Log-Log plot of the absolute error versus the time increment for the Milstein scheme

However, in the weak approximation one is not interested in the paths of the process X_t but in the probability law induced by this process. For this reason, reaching a certain order of convergence requires fewer terms of the expansion in the weak convergence case than in the strong convergence one.

3.5.1 Weak error criterion

The weak approximation is used to estimate functionals v of the solution of the SDE of the form:
\[ v = E[f(X_T)]. \]

We evaluate v using its raw Monte Carlo estimate, by generating a number N of discrete time weak approximations Y_T^∆(w_i) of the solution of the SDE at time T, where i ∈ {1, ..., N}, ∆ is the time increment and w_i ∈ Ω, and taking the sample average:
\[ v_{N,\Delta} = \frac{1}{N} \sum_{i=1}^{N} f\big(Y_T^{\Delta}(w_i)\big). \]

The weak error can therefore be written as:
\[ \varepsilon_{N,\Delta} = \big| v_{N,\Delta} - E[f(X_T)] \big|. \]

3.5.2 Euler scheme

Consider an SDE of the form:

dXt = a(Xt).dt + b(Xt).dWt (3.20)

where Wt is a Brownian motion process, a and b are functions of the process Xt and t ∈ [0,T ]. As in the case of the strong approximation, the Euler scheme takes the form:

Xt+h = Xt + a(Xt).h + b(Xt).(Wt+h − Wt). (3.21)

While the Euler scheme has strong order of convergence 0.5 under Lipschitz and linear growth conditions on the drift and diffusion functions, under the same conditions it has weak order of convergence one, illustrating the fact that for the same Wagner-Platen expansion truncation, weak approximations have a higher order of convergence than strong approximations; for more details the reader is referred to [14, 15].

Simplified Euler scheme

Since in the weak approximation we are not interested in the paths of the solution but in the distribution of the solution of the SDE at some point in time, the Gaussian increment (W_{t+h} − W_t) in 3.21 can be replaced by another random variable ∆Z with similar moment properties, leading to the simplified version of the Euler scheme:

Xn+1 = Xn + a(Xn).h + b(Xn).∆Zn. (3.22)

where the ∆Z_n must be independent, F_{n+1}-measurable random variables with moments satisfying the condition below:
\[ |E(\Delta Z_n)| + |E((\Delta Z_n)^3)| + |E((\Delta Z_n)^2) - \Delta| \le K\,\Delta^2, \qquad (3.23) \]
for some constant K. An example of such a random variable is the Bernoulli random variable:
\[ P\big(\Delta Z_n = \pm\sqrt{\Delta}\big) = \tfrac{1}{2}. \]
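Generating this two-point increment is cheaper than drawing a Gaussian; a few illustrative C++ lines follow (the names are assumptions).

#include <cmath>
#include <random>

// Two-point increment +-sqrt(Delta) with probability 1/2 each, which satisfies
// the moment condition 3.23 (mean 0, variance Delta, zero third moment).
double simplified_increment(double Delta, std::mt19937 &gen) {
    std::bernoulli_distribution coin(0.5);
    return coin(gen) ? std::sqrt(Delta) : -std::sqrt(Delta);
}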

3.5.3 Higher order schemes

Weak approximations of the solution of stochastic differential equations can be obtained from the Wagner-Platen expansion by including more terms in the truncated expansion until the desired order is reached, as for example in the scheme below. The order 2.0 weak Taylor scheme is written as:

\[ \begin{aligned} X_{t+h} = X_t &+ a(X_t)\,h + b(X_t)\,(W_{t+h} - W_t) + \tfrac{1}{2}\,b(X_t)\,b'(X_t)\,\big((W_{t+h} - W_t)^2 - h\big) \\ &+ a'(X_t)\,b(X_t)\,\Delta Z + \tfrac{1}{2}\big(a(X_t)\,a'(X_t) + \tfrac{1}{2} b^2(X_t)\,a''(X_t)\big)h^2 \\ &+ \big(a(X_t)\,b'(X_t) + \tfrac{1}{2} b^2(X_t)\,b''(X_t)\big)\big(h\,(W_{t+h} - W_t) - \Delta Z\big) \qquad (3.24) \end{aligned} \]

where ∆Z represents the double integral \(\int_t^{t+h} \int_t^s dW_z\,ds\). Similarly to the Euler scheme, a simplified version of the weak Taylor scheme can be constructed, providing a more efficient implementation that shares the same order of convergence as the weak Taylor scheme.

Figure 3.8: Q-Q Plot comparing the simulation result of the Naive Euler algorithm with the exact solution of the GBM

Chapter 4

Parallel stochastic simulation

Many techniques are available for the numerical solution of SDEs; it is up to the user to decide which one to rely on in order to optimize the accuracy and precision of the results while taking into consideration the available computational resources. These techniques are computationally expensive, and resorting to parallel computing, in order to reduce the computational time and avoid memory issues, is necessary for most applications modelled using SDEs. In the sections below, two kinds of parallelism will be explored: phase space parallelism and time parallelism. Depending on whether one is interested in the path of the stochastic solution or in its law, one type of parallelism tends to be more advantageous than the other. When one is interested only in the law of the solution, the choice falls on phase space parallelism, whereas when the solution's paths are of interest one should consider time parallelism.

4.1 Phase space Parallelism

Due to the probabilistic nature of the problem, it turns out to be embarrassingly parallel when one is interested in approximating the distribution of the solution, i.e., the weak approximation. This is because in this case one has to consider Monte Carlo techniques, which are based on simulating a certain number of sample paths and averaging to obtain the expected value of a certain observable at a certain instant of time. Thus the problem can be divided into subsystems of independent tasks, each consisting of one or many sample paths, where each subsystem is assigned to a different processor. In other words, we partition the sample space into several disjoint blocks, each running on a different processor. Therefore static load balancing is inherent to the parallel version of this problem. These properties of task independence and workload balance among the subsystems result in embarrassing phase space parallelism of these numerical (MC) techniques. The parallel version of the weak approximation of the solution of SDEs proves necessary for applications where the interest lies in the behavior of the tail of the probability distribution, because a huge number of independent sample paths has to be simulated using a high order weak approximation scheme in order to obtain the desired level of accuracy.

4.1.1 Method

We consider the stochastic differential equation below:

dXt = a(t, Xt).dt + b(t, Xt).dWt (4.1) where a(t,x), b(t,x) are two Borel-measurable functions defined from [0, ∞) × R into R and

W = {Wt; t ∈ [0, ∞)} is a one-dimensional Brownian motion on the filtered probability space

(Ω,F, Ft,P) for t ∈ [0,T ].

X = {Xt; t ∈ [0,T ]} is a continuous stochastic process that satisfies the SDE above with initial value X0 which is F0-measurable.

Using a numerical scheme for solving SDEs, for example the Euler, Milstein or higher order schemes introduced in chapter three, we generate a number N of paths of the solution of the SDE. The sample space explored is divided into several blocks, each one running on a different processor. Since the communication cost is minor due to the independence of the tasks, the parallelism is expected to be highly efficient.

4.1.2 Numerical simulation

Consider the Geometric Brownian motion:

dXt = a.Xt.dt + b.Xt.dWt

In this example, we take a = 4.0, b = 2.0, X_0 = 1 and the time interval [t_0,T] = [0,1]. To simulate paths of the Geometric Brownian Motion using the Milstein approximation, we proceed recursively: starting from the initial value Y_0 = X_0, we iterate using the equation:
\[ Y_{i+1} = Y_i + a Y_i \Delta + b Y_i \Delta W_i + \tfrac{b^2}{2}\,Y_i\,\big((\Delta W_i)^2 - \Delta\big), \]
where we partition the time interval into N equidistant intervals of length ∆ = T/N and ∆W_i is the i-th increment of the Brownian motion W_t, obtained by using a normal random number generator to produce N(0,∆) iid random variables. In this example, the weak approximation is used to estimate the expectation of the square of the solution of the SDE at time T:

\[ v = E\big[(Y_T)^2\big]. \]

We evaluate v by generating a number N = 300000 of paths of Y_t and averaging over the sample of values of Y_T obtained. We fix the value of the time increment ∆ to 10⁻⁴ and take T = 1:
\[ v_N = \frac{1}{N} \sum_{i=1}^{N} Y_T^2. \]
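A minimal C++/MPI sketch of this embarrassingly parallel computation follows; the seeds, the per-rank path counts and the reduction layout are illustrative assumptions and not the exact program timed in table 4.1.

#include <mpi.h>
#include <cmath>
#include <random>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const double a = 4.0, b = 2.0, dt = 1e-4, T = 1.0;
    const long paths = 300000, steps = static_cast<long>(T / dt);
    const long local = paths / size;                 // static load balancing
    std::mt19937 gen(12345u + rank);                 // independent stream per rank
    std::normal_distribution<double> dW(0.0, std::sqrt(dt)); // N(0, dt) increments

    double local_sum = 0.0;
    for (long p = 0; p < local; ++p) {               // this rank's block of paths
        double y = 1.0;                              // Y_0 = X_0 = 1
        for (long i = 0; i < steps; ++i) {
            const double w = dW(gen);
            y += a * y * dt + b * y * w
               + 0.5 * b * b * y * (w * w - dt);     // Milstein step for GBM
        }
        local_sum += y * y;                          // f(Y_T) = Y_T^2
    }
    double total = 0.0;
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("v ~ %f\n", total / static_cast<double>(local * size));
    MPI_Finalize();
    return 0;
}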

The execution times are given in table 4.1, followed by a plot of the speedup versus the number of processors, displayed in figure 4.1. Due to the independence of the parallel tasks and the static load balancing resulting from the workload being distributed more or less evenly over the processors at the beginning of the program, we can see clearly that the speedup acquired is excellent even when the number of processors is considerably large. It can be seen in figure 4.2 that the system efficiency is almost one, independently of the number of processors, which is no surprise. These results are expected and make the parallel weak approximation a trivial problem, where optimal efficiency is a direct consequence of the embarrassingly parallel nature of Monte Carlo methods. The simulation above has been programmed on the Camelot cluster [57].

N processors | 1   | 10      | 20      | 30      | 40      | 50      | 60
Timing (s)   | 492 | 52.9942 | 26.8462 | 17.7046 | 13.5005 | 10.6402 | 8.94209

Table 4.1: Timing for a different number of processors.

Figure 4.1: Speedup versus the number of processors

Figure 4.2: System efficiency versus number of processors

4.2 Parallelism in time

When considering simulations of sample paths over a long time scale, where the integrand is a computationally expensive function, it turns out to be advantageous to consider parallelization in time. The algorithm considered in this thesis is the parareal algorithm, which was introduced by Lions et al. [41] for ODEs in 2001 and modified slightly by Bal and Maday [42]. The parareal algorithm for SDEs was first explored by G. Bal in reference [43], where the Euler algorithm was chosen as the coarse solver. In this thesis, we aim at achieving strong convergence results using a higher order scheme than Euler, the Milstein scheme, as the coarse solver. Below, parallelism in time will be explored for the simulation of ODEs and SDEs. Convergence will be discussed and numerical results will be presented.

4.2.1 The Parareal algorithm

The parareal algorithm was developed to speed up the simulation of certain types of differential equations through the use of parallel architectures. While phase space parallelization is trivial and very easy to implement, the situation is much more complicated when it comes to time parallelism. When simulating a time evolution equation on an interval [0,T] with initial condition at t = 0 using time parallelism, we divide the interval [0,T] into N subintervals [T_i, T_{i+1}], where N corresponds to the number of processors considered and i ∈ {1, 2, ..., N}. The problem is that, when simulating each subinterval on a processor, knowledge of the initial value on each subinterval is necessary for the simulation of the differential equation. The parareal algorithm faces this problem through the use of a coarse solver that gives a rough estimate of the initial value on each subinterval [T_i, T_{i+1}]; a fine solver and a propagator are then used, in a second step, for a more accurate estimate. The parareal algorithm is divided into several steps:

• The first step consists of solving the differential equation sequentially on the interval [0,T] using a numerical scheme with coarse time step ∆T. The values of the solution obtained will correspond to the initial values of the subintervals {[T_i, T_{i+1}], i ∈ {1, 2, ..., N}} of length ∆T = T/N used in the following step.

• In the second step, the evolution equation will be solved in parallel on N processors. Each subinterval resulting from the coarse discretization of the interval [0,T] in the previous step will be discretised finely into M fine time intervals of length δt = ∆T/M. The same numerical scheme, or a higher order one, will be used to provide a solution to the evolution equation on each subinterval.

• In the third step, the error between the coarse and fine discretizations of the first and second steps, respectively, is propagated along the whole interval [0,T] using the coarse discretization numerical scheme of the first step.

• The following steps correspond to repeating the latter two steps (k − 1) times.

This algorithm replaces a serial coarse algorithm of convergence order m on the coarse grid (used in the first step) by an algorithm of order km on the same grid, which is a big improvement in the accuracy of the solution. Furthermore, provided that a large number of processors is at our disposal, this algorithm can turn an extremely expensive solution, in terms of computational time, into a real time solution [42, 43, 44].

The details of the algorithm will be made clear and written explicitly in the sections below for the case of the ODE and SDE.

4.2.2 Ordinary differential equations

The parareal algorithm was first introduced by Lions et al. [41] to solve ODEs in parallel and was modified slightly by Bal and Maday [42]. In this section we provide a brief discussion of this algorithm for the case of the ODE, the reader is referred to [41, 42, 43] for more details.

Introduction

We consider the ordinary differential equation below:

dX(t) = a(t, X(t)).dt, (4.2)

where t ∈ [0,T], T > 0 and X(t) is a vector in R with X(0) = X_0. The function a(t, x) from [0,T] × R to R satisfies the Lipschitz condition and the growth bound condition:

|a(t, x) − a(t, y)| ≤ K.|x − y| (4.3)

\[ |a(t, x)| \le K\,(1 + |x|) \qquad (4.4) \]
for all x, y ∈ R and t ∈ [0,T], where K is a positive universal constant.

The existence and uniqueness theorem ensures the uniqueness and global existence of the solution X(t) of equation 4.2 when the conditions above are satisfied. In this paragraph, we introduce the notation for the coarse discretisation of equation 4.2 and some important inequalities useful for the proof of convergence of the parareal algorithm. We divide the interval [0,T] into N subintervals [T^i, T^{i+1}], each of length ∆T = T/N, where i ∈ {0, 1, ..., N}. Fine solver S(t, x): throughout the convergence analysis, the fine solver, S(t, x) = X(t + ∆T), will be approximated by the solution of the initial value problem specified by the ODE 4.2 starting at x at time t:

dX(t1) = a(t1,X(t1)).dt1, t1 ≥ t (4.5)

X(t) = x. (4.6)

Coarse solver S_∆(T^n, X^n): this solver approximates the function S by a discrete version S_∆, such that the coarse solution to 4.2 is obtained iteratively from:
\[ X(0) = X_0, \qquad (4.7) \]
\[ X^{i+1} = S_\Delta(T^i, X^i), \quad 0 \le i \le N-1. \qquad (4.8) \]

Accuracy and order of convergence: the accuracy of the discrete approximation above is determined by studying, iteratively, the term
\[ X(T^{n+1}) - X^{n+1} = S(T^n, X(T^n)) - S_\Delta(T^n, X^n). \qquad (4.9) \]
Thus, if we suppose that the numerical scheme is of order m > 0, with some additional constraints on the function a, we have that:
\[ \sup_{1 \le n \le N;\, x \in \mathbb{R}} |S(T^n, x) - S_\Delta(T^n, x)| \le K\,(\Delta T)^{m+1}\,(1 + |x|). \qquad (4.10) \]

Equation 4.3 implies that:
\[ \sup_{t \in [0,T];\, x,y \in \mathbb{R}} |S(t, x) - S(t, y)| \le (1 + K\,\Delta T)\,|x - y|. \qquad (4.11) \]

Using equations 4.11, 4.10 we can easily derive the inequality below,

\[ |X(T) - X^N| \le K\,T\,(\Delta T)^m\,(1 + |X_0|). \qquad (4.12) \]

Parareal algorithm for ODEs

In this section, we present the parareal algorithm for ordinary differential equations together with a short analysis of its convergence; for a more detailed discussion please refer to [41, 43].

The main steps of the algorithm consist of the following. To start with, the interval [0,T] is divided into N coarse intervals [T^n, T^{n+1}]. Using the coarse solver S_∆ of the section above, we initialize the value of X at the beginning of every coarse interval [T^i, T^{i+1}], i ∈ {0, 1, ..., N − 1}, as follows:
\[ X_1^0 = X_0, \qquad (4.13) \]
\[ X_1^{i+1} = S_\Delta(T^i, X_1^i), \quad 0 \le i \le N-1. \qquad (4.14) \]

Then the algorithm reduces to a two-step iterative process determining the values of X_{k+1}^i, i ∈ {0, 1, ..., N − 1}, at iteration level k ≥ 1. The values of X_k^i are used as predictors on the N provided processors, where a fine scheme, approximated by the exact solution S, is used to compute a value of X_k^{i+1}. We calculate the jumps by propagating the error across the time interval and add the values of the jumps to X_k^i to obtain a new value of X_{k+1}^i, i ∈ {0, 1, ..., N − 1}.

Thus at iteration level (k+1), X_{k+1}^i is given by:
\[ X_{k+1}^0 = X_0, \qquad (4.15) \]
\[ X_{k+1}^{i+1} = S_\Delta(T^i, X_{k+1}^i) + \sum_{j=1}^{k} \big( S(T^i, X_j^i) - X_j^{i+1} \big), \quad 0 \le i \le N-1. \qquad (4.16) \]

It is straightforward to show that equation 4.16 is equivalent to:
\[ X_{k+1}^{i+1} = S_\Delta(T^i, X_{k+1}^i) + \delta S(T^i, X_k^i), \quad 0 \le i \le N-1, \qquad (4.17) \]
where the difference operator δS is given by:
\[ \delta S(T^i, x) = S(T^i, x) - S_\Delta(T^i, x). \qquad (4.18) \]

For clarity, the parareal algorithm can be summarised by the iterative steps below (with the sign of the δS term as in equation 4.17):
\[ X_k^0 = X_0, \quad \text{for } k \ge 0, \qquad (4.19) \]
\[ X_0^{i+1} = S_\Delta(T^i, X_0^i), \qquad (4.20) \]
\[ X_{k+1}^{i+1} = S_\Delta(T^i, X_{k+1}^i) + \delta S(T^i, X_k^i), \quad \text{for } k \ge 1, \qquad (4.21) \]
for 0 ≤ i ≤ N − 1.

Thus,
\[ X_{k+1}^{i+1} - X(T^{i+1}) = \big[ S_\Delta(T^i, X_{k+1}^i) - S_\Delta(T^i, X(T^i)) \big] + \big[ \delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)) \big]. \qquad (4.22) \]

After imposing restrictive regularisation conditions on the coarse solver S_∆ of order m, and on the difference operator δS as well, an analytical study of this algorithm leads to:
\[ |X_k^N - X(T)| \le K\,T^k\,(\Delta T)^{km}\,|X_0|. \qquad (4.23) \]

This implies that the algorithm replaces a scheme of order m by one of order km after k − 1 iterations [41, 43].

4.2.3 Stochastic differential equations

The parareal algorithm for SDEs was first explored by G. Bal in reference [43], where the Euler algorithm was chosen as the coarse solver. In this thesis, we aim at achieving strong convergence results using a higher order scheme than Euler, the Milstein scheme, as the coarse solver. An analysis of the convergence of this parareal algorithm will be undertaken, and a proof of convergence, concluding that the order of the algorithm is km after (k − 1) iterations, will be provided below.

Introduction

To start with, let us consider a stochastic differential equation of the form:

dXt = a(t, Xt).dt + b(t, Xt).dWt (4.24)

where X_0 ∈ R with E[(X_0)²] < ∞ and W_t is a Brownian motion process defined on the probability space (Ω, F, P). The functions a ∈ C^{1,1}(R_+ × R) and b ∈ C^{1,2}(R_+ × R) are assumed to satisfy the Lipschitz and linear growth conditions below:
\[ |a(t, x) - a(t, y)|^2 + |b(t, x) - b(t, y)|^2 \le K\,|x - y|^2, \qquad (4.25) \]
\[ |a(t, x)|^2 + |b(t, x)|^2 \le K\,(1 + |x|^2), \qquad (4.26) \]
for all x, y ∈ R, where K is a positive constant, thus ensuring the existence and uniqueness of the solution to SDE 4.24.

We apply the Itô-Taylor expansion 3.10, introduced in the previous chapter, to the solution of the SDE 4.24 at time T^{i+1}:

\[ \begin{aligned} X(T^{i+1}) = X(T^i) &+ a(T^i, X(T^i))\,\Delta T + b(T^i, X(T^i))\,\big(W(T^{i+1}) - W(T^i)\big) \\ &+ 0.5\,L^1 b(T^i, X(T^i))\,\big((W(T^{i+1}) - W(T^i))^2 - \Delta T\big) + R_1, \qquad (4.27) \end{aligned} \]

where the remainder term R1 is equal to:

\[ \begin{aligned} R_1 = {}& \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} L^0 a(z, X(z))\,dz\,ds + \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} L^1 a(z, X(z))\,dW_z\,ds \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} L^0 b(z, X(z))\,dz\,dW_s + \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} L^0 L^1 b(u, X(u))\,du\,dW_z\,dW_s \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} L^1 L^1 b(u, X(u))\,dW_u\,dW_z\,dW_s, \qquad (4.28) \end{aligned} \]

and the differential operators L0 and L1 are given by:

\[ L^0 = a\,\frac{\partial}{\partial x} + \frac{1}{2} b^2\,\frac{\partial^2}{\partial x^2} \qquad (4.29) \]
\[ L^1 = b\,\frac{\partial}{\partial x} \qquad (4.30) \]

Moreover, since we are dealing with the Milstein scheme as a coarse solver, Lipschitz-continuity assumptions will be imposed on the functions L^j b(t,x) and L^j L^{j_1} b(t,x), such that there exists K > 0 with:
\[ |L^j b(t, x) - L^j b(t, y)|^2 \le K\,|x - y|^2, \qquad |L^j L^{j_1} b(t, x) - L^j L^{j_1} b(t, y)|^2 \le K\,|x - y|^2, \qquad (4.31) \]
for all t ∈ [0,T], x, y ∈ R and j, j_1 = 0 or 1.

The algorithm for SDEs

The coarse solver, S_∆, is given by the Milstein scheme, such that:
\[ \begin{aligned} S_\Delta(T^i, X^i) = X^{i+1} &= X^i + a(T^i, X^i)\,\Delta T + b(T^i, X^i)\,\big(W(T^{i+1}) - W(T^i)\big) \\ &\quad + \tfrac{1}{2}\,b(T^i, X^i)\,b'(T^i, X^i)\,\big((W(T^{i+1}) - W(T^i))^2 - \Delta T\big). \qquad (4.32) \end{aligned} \]

The fine solver, S, is approximated by the exact solution:
\[ S(T^i, X(T^i)) = X(T^{i+1}) = X(T^i) + \int_{T^i}^{T^{i+1}} a(t, X(t))\,dt + \int_{T^i}^{T^{i+1}} b(t, X(t))\,dW_t. \qquad (4.33) \]

As in the previous section, the parareal algorithm can be summarised by the iterative steps below (again with the sign of the δS term as in equation 4.17):
\[ X_k^0 = X_0, \quad \text{for } k \ge 0, \qquad (4.34) \]
\[ X_0^{i+1} = S_\Delta(T^i, X_0^i), \qquad (4.35) \]
\[ X_{k+1}^{i+1} = S_\Delta(T^i, X_{k+1}^i) + \delta S(T^i, X_k^i), \quad \text{for } k \ge 1, \qquad (4.36) \]
for 0 ≤ i ≤ N − 1.
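The iteration 4.34-4.36 translates directly into code. Below is a schematic C++ sketch under stated assumptions: coarse(i, x) is a hypothetical single Milstein step of size ∆T from T^i, fine(i, x) a hypothetical accurate integration over [T^i, T^{i+1}], and both are assumed to reuse the same Brownian increments across iterations. In a real implementation the fine evaluations run in parallel, one subinterval per processor.

#include <functional>
#include <vector>

// Schematic parareal sweep: X[i] approximates the solution at T^i.
// coarse(i, x) and fine(i, x) propagate x from T^i to T^{i+1}.
void parareal(std::vector<double> &X, int N, int iterations,
              const std::function<double(int, double)> &coarse,
              const std::function<double(int, double)> &fine) {
    for (int i = 0; i < N; ++i)           // serial coarse initialisation (4.35)
        X[i + 1] = coarse(i, X[i]);
    for (int k = 0; k < iterations; ++k) {
        std::vector<double> Xold = X;     // values at the previous iteration
        for (int i = 0; i < N; ++i)       // correction sweep (4.36) in deltaS form
            X[i + 1] = coarse(i, X[i]) + fine(i, Xold[i]) - coarse(i, Xold[i]);
    }
}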

Convergence analysis

In this section, we aim at studying the absolute error η_k^N at time T^N and iteration k. It is defined by:
\[ \eta_k^N(\Delta T) = E\big[|X_k^N - X(T)|\big]. \qquad (4.37) \]

The absolute error η_k^N will be estimated from the root mean square error ε_k^N via Jensen's inequality:
\[ \big(\eta_k^N(\Delta T)\big)^2 \le \varepsilon_k^N, \qquad (4.38) \]
where the root mean square error ε_k^i is given by:
\[ \varepsilon_k^i = E\big[(X_k^i - X(T^i))^2\big]. \qquad (4.39) \]

Equation 4.38 implies that:
\[ \eta_k^N(\Delta T) \le (\varepsilon_k^N)^{1/2}. \qquad (4.40) \]

From equation 4.22, we have:
\[ X_{k+1}^{i+1} - X(T^{i+1}) = \big[ S_\Delta(T^i, X_{k+1}^i) - S_\Delta(T^i, X(T^i)) \big] + \big[ \delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)) \big]. \qquad (4.41) \]

Thus,
\[ \varepsilon_{k+1}^{i+1} = E\big[(X_{k+1}^{i+1} - X(T^{i+1}))^2\big] = E\big[(U + V)^2\big], \qquad (4.42) \]

where:
\[ U = S_\Delta(T^i, X_{k+1}^i) - S_\Delta(T^i, X(T^i)), \qquad (4.43) \]
\[ V = \delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)). \qquad (4.44) \]

Consequently, the three terms of the following inequality will be studied in order to estimate the order of the root mean square error ε_k^i:
\[ \varepsilon_{k+1}^{i+1} \le E[U^2] + E[V^2] + 2\,E[U V]. \qquad (4.45) \]

The root mean square of U:
\[ E[U^2] = E\big[(S_\Delta(T^i, X_{k+1}^i) - S_\Delta(T^i, X(T^i)))^2\big]. \qquad (4.46) \]

Let:
\[ A(T^i) = a(T^i, X_{k+1}^i) - a(T^i, X(T^i)), \]
\[ B(T^i) = b(T^i, X_{k+1}^i) - b(T^i, X(T^i)), \qquad (4.47) \]
\[ C(T^i) = 0.5\,\big( b(T^i, X_{k+1}^i)\,b'(T^i, X_{k+1}^i) - b(T^i, X(T^i))\,b'(T^i, X(T^i)) \big), \]
where b′(t,x) denotes the derivative of b(t,x) with respect to x. Using the coarse solver, equation 4.32, we get:

\[ E[U^2] = E\Big[\big( (X_{k+1}^i - X(T^i)) + A(T^i)\,\Delta T + B(T^i)\,(W(T^{i+1}) - W(T^i)) + C(T^i)\,((W(T^{i+1}) - W(T^i))^2 - \Delta T) \big)^2\Big], \qquad (4.48) \]

\[ \begin{aligned} E[U^2] \le E\Big[ &(X_{k+1}^i - X(T^i))^2 + 2\,(X_{k+1}^i - X(T^i))\,A(T^i)\,\Delta T + A(T^i)^2\,\Delta T^2 \\ &+ \big(B(T^i)\,(W(T^{i+1}) - W(T^i))\big)^2 + \big(C(T^i)\,((W(T^{i+1}) - W(T^i))^2 - \Delta T)\big)^2 \Big], \qquad (4.49) \end{aligned} \]

By using the Lipschitz conditions 4.25, 4.26, 4.31 and the Itô isometry, we obtain:
\[ E[U^2] \le \varepsilon_{k+1}^i + K'\,\Delta T\,\varepsilon_{k+1}^i + K\,\Delta T^2\,\varepsilon_{k+1}^i + K\,\Delta T\,\varepsilon_{k+1}^i + 2\,K\,\Delta T^2\,\varepsilon_{k+1}^i = \big(1 + (K + K')\,\Delta T + 3\,K\,\Delta T^2\big)\,\varepsilon_{k+1}^i. \qquad (4.50) \]

As a consequence, the root mean square of U can be written as:
\[ E[U^2] \le (1 + K_1\,\Delta T)\,\varepsilon_{k+1}^i, \qquad (4.51) \]
where K′ and K_1 are positive constants.

The root mean square of V: equation 4.44 implies that:
\[ E[V^2] = E\big[(\delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)))^2\big]. \qquad (4.52) \]

Using equation 4.18, we obtain:
\[ E[V^2] = E\Big[\big( (S(T^i, X_k^i) - S_\Delta(T^i, X_k^i)) - (S(T^i, X(T^i)) - S_\Delta(T^i, X(T^i))) \big)^2\Big]. \qquad (4.53) \]

Let Y(t) be the solution of the SDE 4.24 starting at X_k^i at time T^i. From equations 4.27, 4.28 and 4.33 we deduce that:
\[ E[V^2] = E\big[(R_Y - R_X)^2\big], \qquad (4.54) \]
where,

\[ \begin{aligned} R_X = {}& \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} L^0 a(z, X(z))\,dz\,ds + \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} L^1 a(z, X(z))\,dW_z\,ds \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} L^0 b(z, X(z))\,dz\,dW_s + \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} L^0 L^1 b(u, X(u))\,du\,dW_z\,dW_s \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} L^1 L^1 b(u, X(u))\,dW_u\,dW_z\,dW_s, \qquad (4.55) \end{aligned} \]
and R_Y is defined in the same way with X replaced by Y.

Thus,

\[ \begin{aligned} E[V^2] = E\Big[\Big( &\int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 a(z, X(z)) - L^0 a(z, Y(z))\big)\,dz\,ds \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^1 a(z, X(z)) - L^1 a(z, Y(z))\big)\,dW_z\,ds \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 b(z, X(z)) - L^0 b(z, Y(z))\big)\,dz\,dW_s \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^0 L^1 b(u, X(u)) - L^0 L^1 b(u, Y(u))\big)\,du\,dW_z\,dW_s \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^1 L^1 b(u, X(u)) - L^1 L^1 b(u, Y(u))\big)\,dW_u\,dW_z\,dW_s \Big)^2\Big]. \qquad (4.56) \end{aligned} \]

\[ \begin{aligned} E[V^2] \le {}& 8\,E\Big[\Big( \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 a(z, X(z)) - L^0 a(z, Y(z))\big)\,dz\,ds \Big)^2\Big] \\ &+ 8\,E\Big[\Big( \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^1 a(z, X(z)) - L^1 a(z, Y(z))\big)\,dW_z\,ds \Big)^2\Big] \\ &+ 8\,E\Big[\Big( \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 b(z, X(z)) - L^0 b(z, Y(z))\big)\,dz\,dW_s \Big)^2\Big] \\ &+ 8\,E\Big[\Big( \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^0 L^1 b(u, X(u)) - L^0 L^1 b(u, Y(u))\big)\,du\,dW_z\,dW_s \Big)^2\Big] \\ &+ 8\,E\Big[\Big( \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^1 L^1 b(u, X(u)) - L^1 L^1 b(u, Y(u))\big)\,dW_u\,dW_z\,dW_s \Big)^2\Big]. \qquad (4.57) \end{aligned} \]

Using the Cauchy-Schwarz inequality, Itô's isometry and Fubini's theorem we obtain:
\[ \begin{aligned} E[V^2] \le {}& 4\,\Delta T^2 \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} E\big[(L^0 a(z, X(z)) - L^0 a(z, Y(z)))^2\big]\,dz\,ds \\ &+ 8\,\Delta T \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} E\big[(L^1 a(z, X(z)) - L^1 a(z, Y(z)))^2\big]\,dz\,ds \\ &+ 8\,\Delta T \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} E\big[(L^0 b(z, X(z)) - L^0 b(z, Y(z)))^2\big]\,dz\,ds \\ &+ 8\,\Delta T \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} E\big[(L^0 L^1 b(u, X(u)) - L^0 L^1 b(u, Y(u)))^2\big]\,du\,dz\,ds \\ &+ 8 \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} E\big[(L^1 L^1 b(u, X(u)) - L^1 L^1 b(u, Y(u)))^2\big]\,du\,dz\,ds. \qquad (4.58) \end{aligned} \]

Taking into consideration the Lipschitz conditions 4.31, we obtain:
\[ \begin{aligned} E[V^2] \le {}& 4\,K\,\Delta T^2 \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} E\big[(X(z) - Y(z))^2\big]\,dz\,ds + 8\,K\,\Delta T \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} E\big[(X(z) - Y(z))^2\big]\,dz\,ds \\ &+ 8\,K\,\Delta T \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} E\big[(X(z) - Y(z))^2\big]\,dz\,ds + 8\,K\,\Delta T \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} E\big[(X(u) - Y(u))^2\big]\,du\,dz\,ds \\ &+ 8\,K \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} E\big[(X(u) - Y(u))^2\big]\,du\,dz\,ds. \qquad (4.59) \end{aligned} \]

X(t) and Y(t) are two solutions of the same SDE 4.24 on the time interval [T^i, T^{i+1}], starting at the different initial values X(T^i) and X_k^i respectively. Thus, writing the SDE 4.24 in integral form, we have that:
\[ X(t) = X(T^i) + \int_{T^i}^{t} a(s, X(s))\,ds + \int_{T^i}^{t} b(s, X(s))\,dW_s, \]
\[ Y(t) = X_k^i + \int_{T^i}^{t} a(s, Y(s))\,ds + \int_{T^i}^{t} b(s, Y(s))\,dW_s, \qquad (4.60) \]
for t ∈ [T^i, T^{i+1}]. Therefore, using the Lipschitz conditions 4.25, the Itô isometry, the Cauchy-Schwarz inequality and Fubini's theorem, we obtain:

\[ \begin{aligned} E\big[(X(t) - Y(t))^2\big] &\le 2\,E\big[(X(T^i) - X_k^i)^2\big] + 2\,E\Big[\Big(\int_{T^i}^{t} (a(s, X(s)) - a(s, Y(s)))\,ds\Big)^2\Big] + 2\,E\Big[\Big(\int_{T^i}^{t} (b(s, X(s)) - b(s, Y(s)))\,dW_s\Big)^2\Big] \\ &\le 2\,\varepsilon_k^i + 2\,K\,(t - T^i) \int_{T^i}^{t} E\big[(X(s) - Y(s))^2\big]\,ds + 2\,K \int_{T^i}^{t} E\big[(X(s) - Y(s))^2\big]\,ds \\ &\le 2\,\varepsilon_k^i + 2\,K\,\Delta T \int_{T^i}^{t} E\big[(X(s) - Y(s))^2\big]\,ds + 2\,K \int_{T^i}^{t} E\big[(X(s) - Y(s))^2\big]\,ds \\ &\le 2\,\varepsilon_k^i + K_2 \int_{T^i}^{t} E\big[(X(s) - Y(s))^2\big]\,ds. \qquad (4.61) \end{aligned} \]

Gronwall's inequality implies that:
\[ E\big[(X(t) - Y(t))^2\big] \le 2\,\varepsilon_k^i\,\exp(K_2\,(t - T^i)) \le 2\,\varepsilon_k^i\,\exp(K_2\,\Delta T), \qquad (4.62) \]
\[ E\big[(X(t) - Y(t))^2\big] \le K_3\,\varepsilon_k^i, \qquad (4.63) \]

where K_2 and K_3 are positive constants and t ∈ [T^i, T^{i+1}].

By replacing result 4.63 in equation 4.59, we get:
\[ E[V^2] \le K_4\,\varepsilon_k^i\,\Delta T^4 + K_4\,\varepsilon_k^i\,\Delta T^3 + K_4\,\varepsilon_k^i\,\Delta T^3 + K_4\,\varepsilon_k^i\,\Delta T^4 + K_4\,\varepsilon_k^i\,\Delta T^3. \qquad (4.64) \]

Then,
\[ E[V^2] \le K_5\,\varepsilon_k^i\,\Delta T^3, \qquad (4.65) \]
where K_5 is a positive constant.

Expectation of the cross term UV: we know from equations 4.43 and 4.44 that:
\[ U = S_\Delta(T^i, X_{k+1}^i) - S_\Delta(T^i, X(T^i)), \qquad V = \delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)). \qquad (4.66) \]

More explicitly,

i i i i U = (Xk+1 − X(T )) + (a(Xk+1) − a(X(T ))).4T

i i i+1 i + (b(Xk+1) − b(X(T ))).(W (T ) − W (T )) (4.67)

0 i 0 i i+1 i 2 + 0.5(L b(Xk+1) − L b(X(T ))).((W (T ) − W (T )) − 4T ),

\[ \begin{aligned} V = {}& \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 a(z, X(z)) - L^0 a(z, Y(z))\big)\,dz\,ds + \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^1 a(z, X(z)) - L^1 a(z, Y(z))\big)\,dW_z\,ds \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 b(z, X(z)) - L^0 b(z, Y(z))\big)\,dz\,dW_s + \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^0 L^1 b(u, X(u)) - L^0 L^1 b(u, Y(u))\big)\,du\,dW_z\,dW_s \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^1 L^1 b(u, X(u)) - L^1 L^1 b(u, Y(u))\big)\,dW_u\,dW_z\,dW_s. \qquad (4.68) \end{aligned} \]

\[ \begin{aligned} E[U V] = E\Big[\Big( &(X_{k+1}^i - X(T^i)) + \big(a(X_{k+1}^i) - a(X(T^i))\big)\,\Delta T \\ &+ \big(b(X_{k+1}^i) - b(X(T^i))\big)\,\big(W(T^{i+1}) - W(T^i)\big) \\ &+ 0.5\,\big(L^1 b(X_{k+1}^i) - L^1 b(X(T^i))\big)\,\big((W(T^{i+1}) - W(T^i))^2 - \Delta T\big) \Big) \cdot \big(\delta S(T^i, X_k^i) - \delta S(T^i, X(T^i))\big) \Big]. \qquad (4.69) \end{aligned} \]

We write
\[ E[U V] = UV_1 + UV_2, \qquad (4.70) \]
where,

\[ UV_1 = E\Big[\big( (X_{k+1}^i - X(T^i)) + (a(X_{k+1}^i) - a(X(T^i)))\,\Delta T \big)\cdot\big( \delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)) \big)\Big], \qquad (4.71) \]

\[ \begin{aligned} UV_2 = E\Big[\Big( &\big(b(X_{k+1}^i) - b(X(T^i))\big)\,\big(W(T^{i+1}) - W(T^i)\big) \\ &+ 0.5\,\big(L^1 b(X_{k+1}^i) - L^1 b(X(T^i))\big)\,\big((W(T^{i+1}) - W(T^i))^2 - \Delta T\big) \Big) \cdot \big(\delta S(T^i, X_k^i) - \delta S(T^i, X(T^i))\big) \Big]. \qquad (4.72) \end{aligned} \]

\[ \begin{aligned} UV_1 = E\Big[\big( (X_{k+1}^i - X(T^i)) + (a(X_{k+1}^i) - a(X(T^i)))\,\Delta T \big)\cdot\Big( &\int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 a(z, X(z)) - L^0 a(z, Y(z))\big)\,dz\,ds \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^1 a(z, X(z)) - L^1 a(z, Y(z))\big)\,dW_z\,ds \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 b(z, X(z)) - L^0 b(z, Y(z))\big)\,dz\,dW_s \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^0 L^1 b(u, X(u)) - L^0 L^1 b(u, Y(u))\big)\,du\,dW_z\,dW_s \\ &+ \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \int_{T^i}^{z} \big(L^1 L^1 b(u, X(u)) - L^1 L^1 b(u, Y(u))\big)\,dW_u\,dW_z\,dW_s \Big)\Big], \qquad (4.73) \end{aligned} \]

Since the first factor is F_{T^i}-measurable and the stochastic integrals in V are martingale increments over [T^i, T^{i+1}] with zero conditional expectation, only the first integral contributes:
\[ UV_1 = E\Big[\big( (X_{k+1}^i - X(T^i)) + (a(X_{k+1}^i) - a(X(T^i)))\,\Delta T \big)\cdot\Big( \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 a(z, X(z)) - L^0 a(z, Y(z))\big)\,dz\,ds \Big)\Big], \qquad (4.74) \]

\[ UV_1 \le E\Big[\big| (X_{k+1}^i - X(T^i)) + (a(X_{k+1}^i) - a(X(T^i)))\,\Delta T \big|\cdot\Big| \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 a(z, X(z)) - L^0 a(z, Y(z))\big)\,dz\,ds \Big|\Big]. \qquad (4.75) \]

Using the Cauchy-Schwarz inequality, we obtain:
\[ UV_1 \le E\Big[\big| (X_{k+1}^i - X(T^i)) + (a(X_{k+1}^i) - a(X(T^i)))\,\Delta T \big|^2\Big]^{1/2} \cdot E\Big[\Big| \int_{T^i}^{T^{i+1}} \int_{T^i}^{s} \big(L^0 a(z, X(z)) - L^0 a(z, Y(z))\big)\,dz\,ds \Big|^2\Big]^{1/2}. \qquad (4.76) \]

Using equation 4.63 and the Cauchy-Schwarz inequality we obtain:
\[ UV_1 \le \big[(1 + K_7\,\Delta T)\,\varepsilon_{k+1}^i\big]^{1/2}\cdot\big[K_3\,\varepsilon_k^i\,\Delta T^4\big]^{1/2}, \qquad (4.77) \]

where K7 is a positive constant.

\[ UV_1 \le \big[K_3\,(\Delta T^4 + K_7\,\Delta T^5)\,\varepsilon_k^i\,\varepsilon_{k+1}^i\big]^{1/2} \qquad (4.78) \]
\[ UV_1 \le K_8\,\Delta T^2\,\big[\varepsilon_k^i\,\varepsilon_{k+1}^i\big]^{1/2}, \qquad (4.79) \]

where K8 is a positive constant.

Using Young's inequality, we have:
\[ UV_1 \le K_9\,\Delta T\,\varepsilon_{k+1}^i + K_9\,\Delta T^3\,\varepsilon_k^i, \qquad (4.80) \]

where K9 is a positive constant.

Equation 4.72 implies that,
\[ \begin{aligned} UV_2 \le {}& E\Big[\big(b(X_{k+1}^i) - b(X(T^i))\big)\,\big(W(T^{i+1}) - W(T^i)\big)\cdot\big(\delta S(T^i, X_k^i) - \delta S(T^i, X(T^i))\big)\Big] \\ &+ E\Big[0.5\,\big(L^1 b(X_{k+1}^i) - L^1 b(X(T^i))\big)\,\big((W(T^{i+1}) - W(T^i))^2 - \Delta T\big)\cdot\big(\delta S(T^i, X_k^i) - \delta S(T^i, X(T^i))\big)\Big]. \qquad (4.81) \end{aligned} \]

Using Young's inequality, we obtain:
\[ \begin{aligned} UV_2 \le {}& \tfrac{1}{2}\,E\Big[\big( (b(X_{k+1}^i) - b(X(T^i)))\,(W(T^{i+1}) - W(T^i)) \big)^2\Big] + \tfrac{1}{2}\,E\Big[\big( \delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)) \big)^2\Big] \\ &+ \tfrac{1}{2}\,E\Big[\big( 0.5\,(L^1 b(X_{k+1}^i) - L^1 b(X(T^i)))\,((W(T^{i+1}) - W(T^i))^2 - \Delta T) \big)^2\Big] + \tfrac{1}{2}\,E\Big[\big( \delta S(T^i, X_k^i) - \delta S(T^i, X(T^i)) \big)^2\Big]. \qquad (4.82) \end{aligned} \]

Using equations 4.52, 4.65 and Itô's isometry, we obtain:
\[ UV_2 \le \tfrac{1}{2}\,K\,\varepsilon_{k+1}^i\,\Delta T + K_5\,\varepsilon_k^i\,\Delta T^3 + \tfrac{1}{2}\,K\,\varepsilon_{k+1}^i\,\Delta T^2, \qquad (4.83) \]
\[ UV_2 \le K_{10}\,\varepsilon_k^i\,\Delta T^3 + K_{10}\,\varepsilon_{k+1}^i\,\Delta T, \qquad (4.84) \]

where K10 is a positive constant.

We obtain the expectation of the cross term UV from equation 4.70 by adding UV_1 and UV_2. Using inequalities 4.80 and 4.84, we obtain:
\[ E[U V] = UV_1 + UV_2 \qquad (4.85) \]
\[ \le K_{11}\,\Delta T\,\varepsilon_{k+1}^i + K_{11}\,\Delta T^3\,\varepsilon_k^i. \qquad (4.86) \]

The root mean square error: equation 4.45 shows that the root mean square error is bounded by:
\[ \varepsilon_{k+1}^{i+1} \le E[U^2] + E[V^2] + 2\,E[U V]. \qquad (4.87) \]

Using equations 4.51, 4.65 and 4.86, we get:
\[ \varepsilon_{k+1}^{i+1} \le (1 + K'\,\Delta T)\,\varepsilon_{k+1}^i + K'\,\Delta T^3\,\varepsilon_k^i, \qquad (4.88) \]
where K′ is a positive constant.

Let ε_{0,max} = max_{0≤i≤N} ε_0^i. Since ε_{0,max} is bounded (the Milstein scheme converges strongly with order one under the Lipschitz and linear growth assumptions imposed previously [14]), the binomial recurrence in 4.88 implies that:
\[ \varepsilon_{k+1}^{i+1} \le \varepsilon_{0,\max}\,\binom{i}{k}\,(1 + K'\,\Delta T)^{(i-k)}\,(K'\,\Delta T^3)^k. \qquad (4.89) \]

For i + 1 = N and k ≪ N, equation 4.89 becomes:
\[ \varepsilon_k^N \le \varepsilon_{0,\max}\,\binom{N}{k}\,(1 + K'\,\Delta T)^{(N-k)}\,(K'\,\Delta T^3)^k \qquad (4.90) \]

\[ \le \varepsilon_{0,\max}\,(1 + K'\,T)\,\big((K')^k\,T^k\,\Delta T^{2k}\big) \qquad (4.91) \]
\[ \le K''\,\Delta T^{2k}, \qquad (4.92) \]
where K″ is a positive constant. Consequently, using the inequality 4.40, the absolute error given in equation 4.37 satisfies:

\[ \eta_k^N(\Delta T) \le (K''\,\Delta T^{2k})^{1/2}. \qquad (4.93) \]

Thus,
\[ E\big[|X_k^N - X(T)|\big] \le C\,\Delta T^k, \qquad (4.94) \]
where C is a positive constant.

Therefore, the convergence analysis above shows that, using the Milstein scheme of order one as a coarse solver, we obtain a parallel algorithm of order k after k iterations. Whether a generalisation of this algorithm to stiff equations exists is a question outside the scope of this thesis.

The convergence of this technique has been explored numerically on Camelot [57] and the results are shown in the figures below. This experiment was done in order to check the convergence of the algorithm for different types of SDEs:

• The exact and numerical solutions of a reflected SDE, simulated using the parareal algorithm, are displayed in figure 4.4. The reflecting boundary is at level zero, the drift is equal to 1.5 and the variance b is equal to 2. The parameters of the parareal algorithm are set as follows:

δt = 0.0025; T = 10; N = 4000; a = 1.5; b = 2; ∆T = 0.025; Y[0] = 1.0.

• The exact and numerical solutions of a Geometric Brownian motion, simulated using the parareal algorithm, are displayed in figure 4.5. The drift is equal to 1.5 and the variance b is equal to 2. The parameters of the parareal algorithm are set as follows:

δt = 0.0025; T = 10; a = 1.5; b = 2; ∆T = 0.025; Y[0] = 1.0.

Figure 4.3: Parareal algorithm [43]

Figure 4.4: Parareal simulation of reflected SDE

Figure 4.5: First Parareal simulation of GBM

Chapter 5

Application: Second order stochastic fluid models

5.1 Queueing theory and fluid models

Queueing theory is the study of stochastic processing systems, whose primary elements are servers and customers. Customer arrival times and service times are assumed to possess some randomness, which necessitates the use of a variety of methods from the theory of stochastic processes to analyse these systems. Applications of the theory are wide-ranging, including manufacturing, telecommunications, computer networks and web-servers, internet traffic, inventory, and insurance/risk theory [16, 17]. The exact analysis of these stochastic processing networks turns out to be highly complex, and impossible in some cases [18, 19].

Classical queueing theory generally involves single server systems or networks (many-server systems) with somewhat restrictive probabilistic assumptions on the distributions of the interarrival and service times, and on the service disciplines employed in a queueing network [20, 21, 22]. These restrictions exclude the use of such theory for many practical systems. The modern theory of stochastic processing networks, developed mostly in the last 15 years, attempts to address non-Markovian models, multiclass systems, and related methods. This theory makes use of a hierarchy of approximate models as the basis for the analysis and synthesis of such systems. In particular, the analytical theory associated with first order (functional law of large numbers) approximations, called fluid models, and second order (functional central limit theorem) approximations, called diffusion or Brownian motion models, has produced important insights into understanding how the performance of a multiclass network depends on different design and control parameters [23].

In considering fluid queue models, we are motivated to a large extent by the need for model-based performance and dependability analysis of high-speed communication networks [29, 31]. Since data carried by these networks are packaged in many small packets, it is convenient to model the flow as a fluid which enters and leaves the buffer according to randomly varying rates. In the second order models, the fluid flow is not considered constant in a given state, but is defined by a mean flow rate and a variance. These models were introduced by Karandikar and Kulkarni [29] and Asmussen [28]. In these papers, a white noise factor was taken into consideration, representing the variability of the traffic during the transmission periods, and the fluid level is described by a reflected Brownian motion modulated by a continuous time Markov chain (CTMC) [25, 26, 27, 30].

5.2 Second order single fluid queue model

In this part, an analytical and numerical study of the second order model for a single fluid queue will be carried out. We will consider a fluid entering and leaving a queueing system consisting of a single server and an infinite size buffer. The amount of fluid in the buffer is represented by the stochastic process Q = {Q(t), t ≥ 0}. The input and output rates of fluid, λ(Z(t),Q(t)) and µ(Z(t),Q(t)) respectively, depend deterministically on an external environment and on a small component, the white noise, that is responsible for further randomness. This is modelled by a fluid process driven by a composite process {(Z(t),B(t)), t ≥ 0}, where the Z component is a CTMC (with discrete state space S and generator J), and B = {B(t), t ≥ 0} is a standard white noise process [33, 34]. The variability of the traffic at time t is represented by the Brownian motion {B(t)} and a local variance σ(Z(t),Q(t)).

Figure 5.1: Single fluid queue

5.2.1 Pathwise construction of the dynamics of a single server fluid queue

Two important performance measures in queueing theory are the queue length process Q = {Q(t), t ≥ 0} and the workload process W = {W (t), t ≥ 0}. In order to provide a pathwise construction of these processes, the notions of interarrival time ui and service time vi will be introduced in the paragraph below.

Let Q(t) be the amount of jobs present in the buffer at time t and Q(0) the fluid level of the buffer at time t = 0. The input and service rates in the buffer are denoted, respectively, by λ(Z(t),Q(t)) and µ(Z(t),Q(t)). We define the interarrival time u_i between the (i−1)-th and the i-th job arrivals, i ∈ {2, 3, ...}, and u_1 the arrival time of the first job to the buffer. Let v_i be the time required to serve the i-th job in the buffer, including Q(0), i ∈ {1, 2, ...}. Besides being mutually independent, each of the sequences of interarrival times and service times is independent and identically distributed, with means respectively equal to 1/λ and 1/µ and coefficients of variation respectively equal to c_u and c_v.

We define two counting processes, the arrival process A = {A(t), t ≥ 0} and the service process S = {S(t), t ≥ 0}, as:
\[ A(t) = \sup\Big\{ n : \sum_{i=1}^{n} u_i \le t,\; n > 0 \Big\} \]
\[ S(t) = \sup\Big\{ n : \sum_{i=1}^{n} v_i \le t,\; n > 0 \Big\} \]

The arrival process A represents the number of jobs that reached the buffer during the interval [0, t], and the service process S represents the potential number of jobs served before time t, assuming that the server is kept busy throughout that period of time.
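To make the definition concrete, here is a small C++ sketch that evaluates A(t) from a given sequence of interarrival times; it is an illustrative helper with assumed names, not part of the thesis code.

#include <vector>

// A(t): number of arrivals in [0, t], given interarrival times u[0], u[1], ...
// The partial sums u_1 + ... + u_n are the arrival instants of successive jobs.
int arrivals_by(const std::vector<double> &u, double t) {
    double arrival_time = 0.0;
    int n = 0;
    for (double ui : u) {
        arrival_time += ui;           // instant of the (n+1)-th arrival
        if (arrival_time > t) break;  // the next job arrives after time t
        ++n;
    }
    return n;
}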

Q(0), A and S provide the primitive data for the modelling of the processes Q and W. Obviously the queue length process Q can be written as:

Q(t) = Q(0) + A(t) − S(D(t)) (5.1)

where the busy time process D = {D(t), t ≥ 0} represents the duration of time in the interval [0, t] when the server is busy:

$$D(t) = \int_0^t \mathbf{1}_{\{Q(s) > 0\}}\, ds. \qquad (5.2)$$

The necessity of introducing D comes from the fact that the buffer might get empty during parts of the time interval [0, t]; for this reason S(t) does not represent the number of jobs served during the interval [0, t], whereas S(D(t)) does. The idle time process I = {I(t), t ≥ 0} denotes the total amount of time when the server is idle during the time interval [0, t]:

$$I(t) = t - D(t) = \int_0^t \mathbf{1}_{\{Q(s) = 0\}}\, ds. \qquad (5.3)$$

A very important process, the workload process (measured in time units) at time t, W = {W (t), t ≥ 0}, is defined as follows:

W(t) = V (Q(0) + A(t)) − D(t). (5.4)

where $V(0) := 0$ and $V(n) := \sum_{i=1}^{n} v_i$, $n \ge 1$.

5.2.2 Diffusion approximation and second order model

From the paragraph above, we conclude that the queue length process Q can be written as:

$$Q(t) = Q(0) + A(t) - S(D(t)), \quad \text{where } D(t) = \int_0^t \mathbf{1}_{\{Q(s) > 0\}}\, ds.$$

We know from the strong law of large numbers that:

$$A(t)/t \xrightarrow{t \to \infty} \lambda, \qquad S(t)/t \xrightarrow{t \to \infty} \mu.$$

Thus, we rewrite the queue length process in terms of the centered versions of the processes A and S, as follows:

$$Q(t) = X(t) + Y(t), \qquad (5.5)$$

where:

$$X(t) = Q(0) + [A(t) - \lambda t] - [S(D(t)) - \mu D(t)] + (\lambda - \mu)t,$$

$$Y(t) = \mu(t - D(t)) = \mu\, I(t).$$

Since Q represents the buffer content, it is strictly positive during the busy period D(t) and null during the idle period I(t). Y is always positive; it increases only during idle periods, when Q = 0. Thus we have the following relations:

Q(t) ≥ 0, (5.6)

dY(t) ≥ 0, Y(0) = 0, (5.7)

$$Q(t)\, dY(t) = 0, \qquad (5.8)$$

for all t ≥ 0. The relations 5.5 to 5.8 define the one-dimensional reflection mapping from X to (Q, Y). Given a process X, the reflection mapping uniquely determines Q and Y as follows:

$$Y(t) = -\inf_{0 \le s \le t} \min\big(0, X(s)\big),$$

$$Q(t) = X(t) - \inf_{0 \le s \le t} \min\big(0, X(s)\big), \qquad (5.9)$$

where Y acts as a regulator of X to ensure that Q is always confined to the positive orthant. Given an approximating process X_1 of X, it does not come as a surprise that the process Q_1 below acts as an approximation for Q,

$$Q_1(t) = X_1(t) - \inf_{0 \le s \le t} \min\big(0, X_1(s)\big). \qquad (5.10)$$
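For intuition (a sketch added here), the reflection mapping 5.9 is straightforward to apply to a discretized path: given samples of X on a grid, a single pass produces the reflected path Q.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Discrete one-dimensional reflection (Skorokhod) map:
    // q[k] = x[k] - min(0, min_{j<=k} x[j]), matching equation 5.9.
    std::vector<double> reflect(const std::vector<double>& x) {
        std::vector<double> q(x.size());
        double running_inf = 0.0;                      // inf of min(0, X(s)) so far
        for (std::size_t k = 0; k < x.size(); ++k) {
            running_inf = std::min(running_inf, x[k]);
            q[k] = x[k] - std::min(0.0, running_inf);  // Y(t_k) = -min(0, running_inf)
        }
        return q;
    }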

We introduce the scaled and centered processes A_N(t) and S_N(t):

$$A_N(t) = \big(A(Nt) - N\lambda t\big)/\sqrt{N}, \qquad S_N(t) = \big(S(Nt) - N\mu t\big)/\sqrt{N}.$$

Let $\hat{A}$, $\hat{S}$ be two independent Brownian motions with drift zero and variances $\lambda c_u^2$ and $\mu c_v^2$ respectively. By making use of the functional central limit theorem, the Skorohod representation theorem and the scaling property of driftless Brownian motion, we obtain that:

$$A_N(t) \stackrel{d}{\approx} \hat{A}(t),$$

$$A(t) - \lambda t \stackrel{d}{\approx} \sqrt{N}\,\hat{A}(t/N) \stackrel{d}{=} \hat{A}(t).$$

By going through the same steps, replacing the process A by S and approximating the busy time D(t) by (ρ ∧ 1)t, we obtain:

$$S(D(t)) - \mu D(t) \stackrel{d}{\approx} \hat{S}(D(t)) \stackrel{d}{\approx} \hat{S}\big((\rho \wedge 1)t\big).$$

Therefore, we get:

$$X_1(t) = Q(0) + \hat{A}(t) - \hat{S}\big((\rho \wedge 1)t\big) + (\lambda - \mu)t.$$

Thus the buffer content Q(t) can be described by the following reflected stochastic differential equation:

$$dQ(t) = b\, dt + \sigma\, dB(t) + dY(t),$$

where Q(0) = 0, B(t) is a standard Brownian motion, the drift b, in other words the net input rate at time t, is given by b = λ − µ, and the variance σ² is given by:

$$\sigma^2 = \lambda c_u^2 + (\lambda \wedge \mu)\, c_v^2.$$
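As a quick numerical illustration (the rates here are chosen for this added example only): for λ = 1, µ = 2 and c_u² = c_v² = 1,

$$b = \lambda - \mu = -1, \qquad \sigma^2 = 1 \cdot 1 + (1 \wedge 2) \cdot 1 = 2,$$

so increasing the variability coefficients c_u², c_v² directly inflates the diffusion coefficient of the approximating reflected Brownian motion.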

As mentioned earlier, the stochastic regulator process Y = {Y(t), t ≥ 0} is introduced to ensure that the process Q never goes below zero.

The above derivation of the reflected Brownian approximation of the queue length Q serves as a simplified and nonrigorous way to motivate the second order fluid model for single queues. Thus, for an infinite capacity buffer and more general drift b and variance σ² as functions of (Z(t),Q(t)), the queue length process in the second order fluid model is approximated by the reflected Brownian motion starting at Q(0), with drift (λ − µ) and variance λc_u² + (λ ∧ µ)c_v². It satisfies the relation:

$$Q(t) = Q(0) + \int_0^t b(Z(s),Q(s))\, ds + \int_0^t \sigma(Z(s),Q(s))\, dB_s + Y(t),$$

where:

$$Y(t) = -\inf_{0 \le s \le t}\, \min\Big(0,\ Q(0) + \int_0^s b(Z(u),Q(u))\, du + \int_0^s \sigma(Z(u),Q(u))\, dB_u \Big).$$

The RSDE above is very compact; when solving it numerically, special cases of the drift function b and the variance σ will be considered. Sample paths will be generated using the techniques described in the previous chapter. The purpose of the numerical study of single fluid queues is not so much to obtain new results as to estimate the accuracy and precision of our program against analytical results derived for this model in the literature, as for example in references [28, 29]. Thus, after checking the reliability of the program in the one-dimensional case of a fluid queue, we will discuss the steps required and the obstacles met in generalising it to the multidimensional case of stochastic networks, where numerical simulation becomes a necessity for the study of the problem.

5.2.3 Analytical study of second order fluid queues in a random environment

We consider a server with an infinite capacity buffer and a fluid arriving following a Brownian motion which is affected by an external environment. The dynamics of the environment is described by a continuous time Markov chain {Z(t), t ≥ 0} with discrete state space S = {1, 2, ..., n}. We assume that Z(t) is an irreducible CTMC with generator matrix J. The fluid level in the buffer is modelled by a Markov modulated Brownian motion Q(t) whose drift µ_i and variance σ_i² depend solely on the state i of the environment, where i ∈ S. An example of the sample paths of Z(t) and Q(t) is shown in the figure below, where the state space S is taken as {1, 2, 3, 4} and the drift µ_3 is negative whereas µ_1, µ_2 and µ_4 are positive.

Since the process Q(t) represents the content of the buffer, Q(t) cannot take negative values; therefore a reflecting boundary condition at level zero is imposed on the Brownian motion modelling the length of the queue. The limiting distribution π of the process Z(t) on the state space S is the solution to:

$$\pi J = 0, \qquad \sum_i \pi_i = 1.$$

The solution π is unique due to the assumed irreducibility of the CTMC Z(t).

Figure 5.2: Markov modulated process

When the process Z(t) reaches steady state, the average drift of the process Q(t) can be estimated by $\sum_i \pi_i \mu_i$. In order to ensure that the level of the fluid in the infinite buffer does not grow beyond any finite limit, the following condition has to be satisfied:

$$\sum_i \pi_i \mu_i < 0. \qquad (5.11)$$

This condition is essential for the stability of the buffer content, and therefore for the existence of a limiting distribution for the process {(Q(t),Z(t)), t ≥ 0}.

Notation

By hypothesis, the process {(Q(t),Z(t)), t ≥ 0} is a Markov process on the state space [0, ∞)×S.

Due to the fact that the boundary behaviour of the process {(Q(t),Z(t)), t ≥ 0} depends on the state j in S, we divide the state space S into six subspaces:

• S_0 = {i ∈ S : σ_i² = 0, µ_i = 0},

• S_{0+} = {i ∈ S : σ_i² = 0, µ_i > 0},

• S_{0−} = {i ∈ S : σ_i² = 0, µ_i < 0},

• S_{++} = {i ∈ S : σ_i² > 0, µ_i > 0},

• S_{+−} = {i ∈ S : σ_i² > 0, µ_i < 0},

• S_{+0} = {i ∈ S : σ_i² > 0, µ_i = 0}.

Let n_0, n_{0+}, n_{0−}, n_+ and n_* = n_{0+} + n_{0−} be the respective cardinalities of S_0, S_{0+}, S_{0−}, S_+ = S_{++} ∪ S_{+−} ∪ S_{+0} and S_* = S_{0+} ∪ S_{0−}. We denote by F(t, x, j; y, i) the cumulative distribution of {(Q(t),Z(t)), t ≥ 0}:

$$F(t, x, j; y, i) = P\big(Q(t) \le x,\ Z(t) = j \mid Q(0) = y,\ Z(0) = i\big), \quad i, j \in S,\ x, y \in [0, \infty),$$

p(t, x, j; y, i) its density when x > 0, and l(t, j; y, i) its mass at the lower boundary x = 0. More care should be taken when defining the probability density, due to the non-differentiability of the distribution F(x) at some points of discontinuity; but in practice, for the infinite buffer case, no accumulated probability mass occurs other than at the lower boundary. For this reason, and to keep the analysis simple, we will not resort to a more rigorous definition of the probability density. Assuming that the stability condition mentioned in the previous paragraph is satisfied, we define the limiting distribution of {(Q(t),Z(t)), t ≥ 0} by:

$$F(x) = [F(x, 1), F(x, 2), \ldots, F(x, n)], \quad \text{with } F(x, j) = \lim_{t \to \infty} F(t, x, j; y, i), \quad i, j \in S,\ x, y \in [0, \infty).$$

Similarly, we define the limiting density vector p(x) and mass vector l by:

$$p(x) = [p(x, 1), p(x, 2), \ldots, p(x, n)], \qquad l = [l(1), l(2), \ldots, l(n)],$$

with:

$$p(x, j) = \lim_{t \to \infty} p(t, x, j; y, i), \qquad l(j) = \lim_{t \to \infty} l(t, j; y, i), \quad i, j \in S,\ x, y \in [0, \infty).$$

Transient equations

Let g be a real function defined on [0, ∞) × S such that g(·, i) is twice continuously differentiable. The generator L of the Markov process {(Q(t),Z(t)), t ≥ 0} is given by [54]:

$$Lg(x, i) = \frac{1}{2}\sigma_i^2 \frac{\partial^2}{\partial x^2} g(x, i) + \mu_i \frac{\partial}{\partial x} g(x, i) + \sum_{j=1}^{n} g(x, j)\, J_{ij}.$$

Therefore the Fokker-Planck equation applied to {(Q(t),Z(t)), t ≥ 0} implies that the transient distribution satisfies the PDE below:

$$\frac{\partial}{\partial t} F(t, x, j; y, i) = \frac{1}{2}\sigma_j^2 \frac{\partial^2}{\partial x^2} F(t, x, j; y, i) - \mu_j \frac{\partial}{\partial x} F(t, x, j; y, i) + \sum_{k} F(t, x, k; y, i)\, J_{kj},$$

when x > 0, t > 0. The transition density satisfies the PDE below:

$$\frac{\partial}{\partial t} p(t, x, j; y, i) = \frac{1}{2}\sigma_j^2 \frac{\partial^2}{\partial x^2} p(t, x, j; y, i) - \mu_j \frac{\partial}{\partial x} p(t, x, j; y, i) + \sum_{k} p(t, x, k; y, i)\, J_{kj},$$

when x > 0, t > 0. While at x = 0, we have the following boundary conditions:

$$\frac{\partial}{\partial t} l(t, j; y, i) = \frac{1}{2}\sigma_j^2 \frac{\partial}{\partial x} p(t, 0, j; y, i) - \mu_j\, p(t, 0, j; y, i) + \sum_{k} l(t, k; y, i)\, J_{kj},$$

for all j, with:

l(t, j; y, i) = 0 for j ∈ S_+ ∪ S_{0+}.

Stationary distribution

Assuming that the process {(Q(t),Z(t)), t ≥ 0} is stable, by taking the limit when t goes to infinity of the 3 PDEs above, we obtain that the limiting distributions and densities satisfy:

$$\frac{1}{2}\sigma_j^2 \frac{\partial^2}{\partial x^2} F(x, j) - \mu_j \frac{\partial}{\partial x} F(x, j) + \sum_{k} F(x, k)\, J_{kj} = 0,$$

$$\frac{1}{2}\sigma_j^2 \frac{\partial^2}{\partial x^2} p(x, j) - \mu_j \frac{\partial}{\partial x} p(x, j) + \sum_{k} p(x, k)\, J_{kj} = 0,$$

when x > 0.

While at x = 0, we have the following boundary conditions:

$$\frac{1}{2}\sigma_j^2 \frac{\partial}{\partial x} p(0, j) - \mu_j\, p(0, j) + \sum_{k} l(k)\, J_{kj} = 0,$$

for all j, with:

l(j) = 0 for j ∈ S_+ ∪ S_{0+}.

The equations above can be written in the matrix form below:

$$\frac{\partial^2}{\partial x^2} F(x)\,\Sigma - \frac{\partial}{\partial x} F(x)\, D + F(x)\, J = 0, \qquad (5.12)$$

$$\frac{\partial^2}{\partial x^2} p(x)\,\Sigma - \frac{\partial}{\partial x} p(x)\, D + p(x)\, J = 0, \qquad (5.13)$$

$$\frac{\partial}{\partial x} p(0)\,\Sigma - p(0)\, D + l\, J = 0, \qquad (5.14)$$

where the matrices D and Σ are defined by:

$$D = \mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_n), \qquad \Sigma = \mathrm{diag}\Big(\tfrac{1}{2}\sigma_1^2, \tfrac{1}{2}\sigma_2^2, \ldots, \tfrac{1}{2}\sigma_n^2\Big).$$

Spectral Representation of F(x)

We assume that $F(x) = e^{\lambda x}\phi$, where λ is a scalar and φ is an n-dimensional (row) vector. By substituting this form of F(x) into Equation 5.12, we get:

$$\phi\,[\lambda^2 \Sigma - \lambda D + J] = 0, \qquad (5.15)$$

consequently, we obtain λ by solving the equation:

$$\det[\lambda^2 \Sigma - \lambda D + J] = 0, \qquad (5.16)$$

and φ from equation 5.15. The pair (λ, φ) is called an (eigenvalue, eigenvector) pair of equation 5.15.

We mention briefly the solution to equation 5.15; for more details and proofs, please refer to [28, 29].

Theorem 5.2.1. Equation 5.15 has 2n_+ + n_* solutions. When the average drift d < 0, n_+ + n_{0+} of these solutions have negative real parts, n_+ + n_{0−} − 1 have positive real parts and one is equal to zero.

Assuming that the 2n_+ + n_* eigenvalues are distinct and that φ_i is the eigenvector corresponding to the eigenvalue λ_i, where i ∈ {1, 2, ..., 2n_+ + n_*}, the spectral representation of F(x) is given by:

$$F(x) = \sum_{i=1}^{2n_+ + n_*} \alpha_i\, e^{\lambda_i x}\, \phi_i, \qquad (5.17)$$

where the α_i are scalars determined by the boundary conditions and the normalizing condition for the probability distribution; in our infinite buffer case they are the solution of the system below:

$$\alpha_i = 0 \quad \text{for } \operatorname{Re}(\lambda_i) \ge 0, \qquad (5.18)$$

$$l - \sum_{j=1}^{n_+ + n_{0+}} \frac{\alpha_j}{\lambda_j}\, \phi_j = \pi, \qquad \operatorname{Re}(\lambda_j) < 0, \qquad (5.19)$$

where l_j = 0 for j ∈ S_+ ∪ S_{0+}.
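In higher dimensions the quadratic eigenproblem 5.16 is typically solved numerically. Below is a minimal C++ sketch using the Eigen library (an assumption of this added example; this is not the thesis code). It linearizes the problem to an ordinary eigenproblem of double size and assumes all σ_i > 0, so that Σ is invertible; states with σ_i = 0 reduce the eigenvalue count to 2n_+ + n_* (cf. Theorem 5.2.1) and need a more careful treatment.

    #include <Eigen/Dense>

    // Eigenvalues lambda with det(lambda^2*Sigma - lambda*D + J) = 0,
    // via the standard companion linearization of the transposed problem
    // (same determinant). Requires Sigma nonsingular.
    Eigen::VectorXcd quadraticEigenvalues(const Eigen::MatrixXd& Sigma,
                                          const Eigen::MatrixXd& D,
                                          const Eigen::MatrixXd& J) {
        const int n = static_cast<int>(Sigma.rows());
        Eigen::MatrixXd SigmaInv = Sigma.inverse();
        Eigen::MatrixXd C = Eigen::MatrixXd::Zero(2 * n, 2 * n);
        C.topRightCorner(n, n)    = Eigen::MatrixXd::Identity(n, n);
        C.bottomLeftCorner(n, n)  = -SigmaInv * J.transpose();
        C.bottomRightCorner(n, n) =  SigmaInv * D.transpose();
        return Eigen::EigenSolver<Eigen::MatrixXd>(C).eigenvalues();
    }

For each computed λ_i, the corresponding row eigenvector φ_i can then be recovered as a left null vector of λ_i²Σ − λ_iD + J.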

5.2.4 Example of a single fluid queue and numerical results

The use of Monte Carlo techniques is not a necessity when studying a single fluid queue; the problem can be solved via its PDE using finite difference schemes such as Crank–Nicolson, or finite element methods. But these numerical techniques for solving PDEs suffer from the curse of dimensionality: implementing them when the dimension of the stochastic process is above 3 turns out to be extremely demanding and computationally very expensive. For this reason, in high dimensions, Monte Carlo methods become necessary. In this part we proceed by solving the single fluid queue problem numerically in two ways, via the Fokker–Planck PDE and via stochastic simulation of the paths of the SDE, and we compare the numerical results to validate the use of our software in higher dimensions where solving PDEs becomes unmanageable.

On-Off single fluid queue

We consider the case of an On-Off source where the fluid entering the buffer continuously alternates between two states, the first with positive drift, the other with negative drift. In the figure below, a special case of such a process is considered. Mathematically, the environment process {Z(t), t ≥ 0} is described by a CTMC on the discrete state space S = {1, 2}, where state 1 corresponds to the On state and state 2 to the Off state. The generating matrix J is

Figure 5.3: On-Off process

given by:

$$J = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}.$$

The drift matrix D and the variance matrix Σ can be written as:

$$D = \begin{pmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \tfrac{1}{2}\sigma_1^2 & 0 \\ 0 & \tfrac{1}{2}\sigma_2^2 \end{pmatrix},$$

where µ_1 > 0 and µ_2 < 0. The stationary distribution of the process Z(t) is:

$$\pi = \Big(\frac{b}{a+b},\ \frac{a}{a+b}\Big).$$

The parameters of the system are chosen in such a way that the average drift d is strictly negative, ensuring the existence of a steady state distribution for the process {(Q(t),Z(t)), t ≥ 0}:

$$d = \mu_1 \frac{b}{a+b} + \mu_2 \frac{a}{a+b} < 0.$$

Analysis through solving Fokker-Planck equation

In this paragraph, we consider the case when the drift matrix D_1 is equal to:

$$D_1 = \begin{pmatrix} 2 & 0 \\ 0 & -3 \end{pmatrix},$$

and the generator matrix of the process {Z(t), t ≥ 0} is given by:

$$J = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.$$

The stationary distribution of the process {Z(t), t ≥ 0} is:

$$\pi = \Big(\frac{1}{2},\ \frac{1}{2}\Big).$$

The steady state distribution 5.17 of the process {(Q(t),Z(t)), t ≥ 0} is determined analytically by solving equations 5.15, 5.16 and 5.18, which in this case is not a difficult task. In general, however, when the dimensionality of the matrix increases, finding the eigenvalues and eigenvectors of equation 5.15 becomes very hard analytically and necessitates the use of specialised algorithms, as for example [55]. Even the numerical techniques adapted to solve the Fokker–Planck equations suffer from the curse of dimensionality; for this reason, though time consuming, stochastic process simulation (chapter 3) seems to be necessary in high dimensions.
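To make the spectral machinery concrete, the following worked computation (added here as an illustration; the reported figures were produced by the thesis software) treats the purely fluid case σ_1 = σ_2 = 0 with the matrices D_1 and J above. Equation 5.16 reduces to

$$\det[-\lambda D_1 + J] = \det\begin{pmatrix} -2\lambda - 1 & 1 \\ 1 & 3\lambda - 1 \end{pmatrix} = -6\lambda^2 - \lambda = 0,$$

so λ ∈ {0, −1/6}, in agreement with Theorem 5.2.1 (2n_+ + n_* = 2 solutions, one zero and one with negative real part). For λ = −1/6 the left eigenvector is φ ∝ (3, 2), hence F(x) = π + α e^{−x/6}(3, 2). State 1 belongs to S_{0+}, so l(1) = F(0, 1) = 0 forces α = −1/6, giving

$$F(x, 1) = \tfrac{1}{2} - \tfrac{1}{2}e^{-x/6}, \qquad F(x, 2) = \tfrac{1}{2} - \tfrac{1}{3}e^{-x/6},$$

with a probability mass l(2) = F(0, 2) = 1/6 of an empty buffer in the Off state.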

Analysis through stochastic process simulation

The reflected stochastic differential equation describing the dynamics of the buffer content Q(t) can be written as:

$$dQ(t) = \mu(Z(t))\, dt + \sigma(Z(t))\, dB(t) + dY(t),$$

where, as explained above, the stochastic regulator Y = {Y(t), t ≥ 0} can be expressed as:

$$Y(t) = -\inf_{0 \le s \le t}\, \min\Big(0,\ Q(0) + \int_0^s \mu(Z(u))\, du + \int_0^s \sigma(Z(u))\, dB(u) \Big).$$

As mentioned before, in high dimensions solving Fokker–Planck PDEs becomes intractable, numerically and analytically, in most cases. Resorting to the stochastic simulation of paths of the SDE describing the dynamics of the process (or Langevin equations), via the methods explained in chapter 3, becomes necessary. In this paragraph, 150,000 paths of the SDE above have been simulated using the Asmussen–Glynn algorithm in order to extract information about the steady state distribution of Q(t) and its transient properties, like its expectation and variance as a function of time, for several values of σ_i.
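For orientation, a minimal C++ sketch of one path of a plain Euler scheme with projection at zero for this Markov modulated SDE is given below (added here as an illustration; the study itself used the more accurate Asmussen–Glynn scheme, and the step size, horizon and seed are assumptions of this sketch):

    #include <algorithm>
    #include <cmath>
    #include <random>

    int main() {
        std::mt19937 gen(7);
        std::normal_distribution<double> gauss(0.0, 1.0);
        std::uniform_real_distribution<double> unif(0.0, 1.0);

        // On-Off environment: switching rates a (On -> Off) and b (Off -> On),
        // per-state drifts mu[] and volatilities sigma[] (values of D_1, J above).
        const double a = 1.0, b = 1.0;
        const double mu[2] = {2.0, -3.0}, sigma[2] = {1.0, 1.0};
        const double dt = 1e-3, T = 50.0, sqdt = std::sqrt(dt);

        double Q = 0.0;  // buffer content
        int z = 0;       // environment state: 0 = On, 1 = Off
        for (double t = 0.0; t < T; t += dt) {
            // first-order CTMC step: switch with probability rate * dt
            if (unif(gen) < (z == 0 ? a : b) * dt) z = 1 - z;
            // Euler increment, then reflect at zero
            Q = std::max(0.0, Q + mu[z] * dt + sigma[z] * sqdt * gauss(gen));
        }
        return 0;  // in the real study, many such paths feed the estimators
    }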

In figures 5.4, 5.5, 5.6, 5.7, 5.8 and 5.9 below, we take the same values for the drift matrix D_1 and the generator matrix J as in the paragraph above. We compare the steady state distribution obtained analytically, through solving the Fokker–Planck equation, with the one obtained through stochastic simulation of the paths of the SDE, using the Asmussen–Glynn algorithm in general and the Euler algorithm when the variance is zero. The figures provide an assessment of the weak error of the numerical algorithm and show that the numerical solution converges towards the exact solution as expected. Special cases of the variances have been considered:

• σ_1 = σ_2 = 0,

• σ_1 = 1, σ_2 = 0,

• σ_1 = 0, σ_2 = 1,

• σ_1 = σ_2 = 1, in figure 5.9.

On the physical level, the figures below, especially 5.11, show the crucial role that the variance plays in queueing network models, and the special care one needs to take when selecting first order or second order fluid models to represent a physical process mathematically. For example, the congestion effect becomes much more significant when the variance increases, which is illustrated clearly by the heavier distribution tails in figure 5.10 and the linear asymptotic growth of the expectation of the buffer content as a function of the variance in figure 5.11. Therefore, when relying on mathematical models to quantify the performance of a service facility, it is crucial to take into consideration the effect of the variance on performance measures such as the length of the fluid queue.

Figure 5.4: Q-Q plot in the fluid flow case σ_1 = σ_2 = 0.

Figure 5.5: Q-Q plot when σ1 = 1, σ2 = 0.

Figure 5.6: Analytical and numerical solution of F(x) versus x when σ_1 = 0, σ_2 = 0.

Figure 5.7: Analytical and numerical solution of F(x) versus x when σ1 = 1, σ2 = 0.

Figure 5.8: Analytical and numerical solution of F(x) versus x when σ_1 = 0, σ_2 = 1.

Figure 5.9: Analytical and numerical solution of F(x) versus x when σ1 = σ2 = 1.

In figures 5.10 and 5.11 below, we take the drift matrix D_2 to be equal to:

$$D_2 = \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix},$$

and the generator matrix of the process {Z(t), t ≥ 0} is given by:

$$J = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Figure 5.10: Stationary distribution of the buffer content.

Figure 5.11: Expectation of the buffer content as a function of the variance.

5.3 Second order stochastic queueing network model

In this section we will introduce the reflected Brownian motion (RBM) approximation for a stochastic fluid network, specifically single class open generalized Jackson networks [62, 63, 64, 65, 66]. We will address the model at hand only; for background on queueing networks the reader is referred to [60]. We will discuss the extent to which the stochastic simulation of the single fluid queue, presented in section 5.2.4, can be generalized to the network case, and the challenges faced. Note that, in this chapter, we will refer to the semimartingale reflected Brownian motion (SRBM) as RBM; in other words, RBM with or without drift are both referred to as RBM.

5.3.1 Traffic equations for single class Generalized Jackson network

We consider a continuous time queueing network consisting of d nodes, each with infinite buffer capacity. Let A_i(t) be the exogenous arrival renewal process at node i with arrival rate a_i, and S_i(t) the service process at node i with service rate µ_i and a first-in-first-out (FIFO) service discipline.

Let A, S, a, µ be the d-dimensional vectors with respective components A_i(t), S_i(t), a_i, µ_i, where i = 1, ..., d. Jobs served at node i either move to node j or exit the system, with respective probabilities p_ij and p_i0, where:

$$p_{i0} = 1 - \sum_{j=1}^{d} p_{ij}.$$

The d × d substochastic matrix P formed by the elements p_ij is called the routing matrix of the queueing network. We only consider the case of single class open networks where the spectral radius of P is less than one. We assume that interarrival times, service times and routing decisions are independent sequences of independent and identically distributed random variables.

Figure 5.12: A generalized Jackson network (GJN)

Let the d-dimensional vector λ denote the effective inflow rate, whose i-th component λ_i represents the net inflow rate at node i. The traffic equations can be written as:

$$\lambda_i = a_i + \sum_{j=1}^{d} (\lambda_j \wedge \mu_j)\, p_{ji}, \quad i = 1, \ldots, d.$$

The traffic intensity ρi is given by:

ρi = λi/µi, i = 1, ... , d.

Nodes i that satisfy ρ_i < 1 are called nonbottleneck; the ones that do not are called bottleneck. Among the bottleneck nodes, those for which ρ_i = 1 are called balanced bottleneck, and the remaining ones are called strict bottleneck.

When all the components of the traffic intensity vector ρ satisfy ρ_i < 1, λ is strictly less than µ and the effective inflow rate, solution to the traffic equation:

$$\lambda = a + P'(\lambda \wedge \mu),$$

is given by:

$$\lambda = (I - P')^{-1}\, a.$$
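As a small added illustration, anticipating the tandem example of section 5.3.5: for a d-station tandem, p_{i,i+1} = 1 and all other routing probabilities vanish, so that, in the nonbottleneck case,

$$\lambda = (I - P')^{-1} a = \big(a_1,\ a_1 + a_2,\ \ldots,\ a_1 + \cdots + a_d\big)',$$

since (I − P')^{-1} is the lower triangular matrix of ones; in particular, when only the first station receives exogenous traffic, λ_i = a_1 at every station.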

5.3.2 Pathwise construction of the dynamics of a single class open GJN

As mentioned previously in the single fluid queue case, an important performance measure in queueing theory is the queue length process Q = {Q(t), t ≥ 0}. In order to provide a pathwise construction of this process, the notions of interarrival time u(n) and service time v(n) will be introduced in the paragraph below.

Let Q(t) be a d-dimensional vector, such that each component Q_i of Q represents the buffer content at node i at time t, and Q_i(0) the fluid level of the i-th node at time t = 0, where i = 1, ..., d. The input and service rates at node i are denoted, respectively, by the previously introduced net input rate λ_i and service rate µ_i, where i = 1, ..., d.

We introduce u_i(n) as the interarrival time at node i between the (n−1)-th and the n-th job arrival, and u_i(1) the time of arrival of the first job to node i, where n ∈ {2, 3, ...} and i = 1, ..., d. Let v_i(n) be the time required to serve the n-th job at node i including Q(0), where n ∈ {1, 2, ...} and i = 1, ..., d. As in the single queue case, besides being independent of each other, the sequences of interarrival times and service times are each independent and identically distributed, with finite means and coefficients of variation respectively equal to the d-dimensional vectors c_u and c_v. Nodes that do not have exogenous arrivals, i.e. a_i = 0, have corresponding coefficient of variation c_{u_i} = 0.

Consequently the two counting processes, the arrival process A = {A(t), t ≥ 0} and the service process S = {S(t), t ≥ 0}, can be written as:

$$A_i(t) = \sup\Big\{ n > 0 : \sum_{k=1}^{n} u_i(k) \le t \Big\},$$

$$S_i(t) = \sup\Big\{ n > 0 : \sum_{k=1}^{n} v_i(k) \le t \Big\}.$$

The components A_i(t) of the d-dimensional arrival process A, i = 1, ..., d, represent the number of jobs that reached buffer i during the interval of time [0, t]. The components S_i(t) of the d-dimensional service process S, i = 1, ..., d, represent the potential number of jobs served at node i before time t, assuming that the server is kept busy throughout the interval [0, t].

The routing choices of the l-th job served at node j, l ∈ N*, are represented by a sequence of independent, identically distributed d-dimensional random vectors C^j(l) with state space {e_1, e_2, ..., e_d, 0}, where e_i is the unit vector in the i-th direction of R^d.

C^j(l) taking the value e_i denotes that the l-th job served at node j will move to node i, whereas the value 0 denotes that this job will leave the network. The probability of C^j(l) taking the value e_i or 0 is respectively p_ji and p_j0. Let T^i(l) be a d-dimensional vector such that:

$$T^i(0) = 0, \qquad T^i(l) = \sum_{k=1}^{l} C^i(k),$$

where l ∈ N* and i = 1, ..., d.

The j-th component T^i_j(l) of T^i(l) denotes the cumulative number of jobs served at node i, up to and including the l-th job, that were routed to node j. Let T be the d × d matrix whose i-th column corresponds to the vector T^i, where i = 1, ..., d.

Q(0), A and S provide the primitive data for the modelling of the process Q. Obviously the i-th component of the queue length process Q can be written as:

$$Q_i(t) = Q_i(0) + A_i(t) + \sum_{j=1}^{d} T^j_i\big(S_j(D_j(t))\big) - S_i(D_i(t)), \qquad (5.20)$$

where the busy time process D_i(t) represents the duration of time in the interval [0, t] when the i-th node is busy:

$$D_i(t) = \int_0^t \mathbf{1}_{\{Q_i(s) > 0\}}\, ds, \qquad (5.21)$$

for t ≥ 0 and i = 1, ..., d.

The necessity of introducing D comes from the fact that buffers might get empty during parts of the time interval [0, t]; for this reason S(t) does not represent the number of jobs served during the interval [0, t], whereas S(D(t)) does. The idle time process I_i(t) denotes the total amount of time when node i is idle during the time interval [0, t]:

$$I_i(t) = t - D_i(t) = \int_0^t \mathbf{1}_{\{Q_i(s) = 0\}}\, ds, \qquad (5.22)$$

for t ≥ 0 and i = 1, ..., d.

5.3.3 Diffusion approximation and second order model

From the paragraph above, we conclude that the queue length process Q_i at node i can be written as:

$$Q_i(t) = Q_i(0) + A_i(t) + \sum_{j=1}^{d} T^j_i\big(S_j(D_j(t))\big) - S_i(D_i(t)), \quad \text{where } D_i(t) = \int_0^t \mathbf{1}_{\{Q_i(s) > 0\}}\, ds,$$

for t ≥ 0 and i = 1, ..., d.

We know from the strong law of large numbers that:

$$A(t)/t \xrightarrow{t \to \infty} a, \qquad S(t)/t \xrightarrow{t \to \infty} \mu, \qquad T(x)/x \xrightarrow{x \to \infty} P'.$$

Thus, as in the single fluid queue case, we rewrite the queue length process in terms of the centered versions of A, S and T as follows:

$$Q(t) = X(t) + (I - P')\, Y(t), \qquad (5.23)$$

$$X(t) = Q(0) + [A(t) - at] + \big(P'S(D(t)) - P'\mu D(t)\big) - [S(D(t)) - \mu D(t)] + \big(T'(S(D(t)))\,e - P'S(D(t))\big) + \big(a - (I - P')\mu\big)t,$$

$$Y(t) = \mu\big(et - D(t)\big) = \mu\, I(t),$$

for all t ≥ 0, where e is a d-dimensional vector whose elements are all equal to one and I is the identity matrix. An extended expression would be:

$$Q_j(t) = X_j(t) + Y_j(t) - \sum_{i=1}^{d} p_{ij}\, Y_i(t), \qquad (5.24)$$

where:

$$X_j(t) = Q_j(0) + [A_j(t) - a_j t] + \sum_{i=1}^{d} p_{ij}\big(S_i(D_i(t)) - \mu_i D_i(t)\big) - [S_j(D_j(t)) - \mu_j D_j(t)] + \sum_{i=1}^{d}\big(T_{ij}(S_i(D_i(t))) - p_{ij} S_i(D_i(t))\big) + \Big(a_j - \mu_j + \sum_{i=1}^{d} p_{ij}\mu_i\Big)t,$$

$$Y_j(t) = \mu_j\big(t - D_j(t)\big) = \mu_j\, I_j(t),$$

for all t ≥ 0 and j = 1, ..., d.

Since the queue length Q is strictly positive during busy periods and null during idle periods, Y is always positive and increases only in the idle periods. Thus we have the following relations:

$$Q_j(t) \ge 0, \qquad (5.25)$$

$$dY_j(t) \ge 0, \quad Y_j(0) = 0, \qquad (5.26)$$

$$\int_0^t Q_j(s)\, dY_j(s) = 0, \qquad (5.27)$$

for all t ≥ 0 and j = 1, ..., d.

The relations 5.24 to 5.27 define an oblique reflection mapping from X to (Q, Y), where the reflection matrix (I − P') is an M-matrix (P has spectral radius less than one, since we only consider single class open generalized Jackson networks).

By proceeding similarly to the single fluid queue case in paragraph 5.2.2, we obtain that the queue length Q(t) can be described by the following reflected stochastic differential equation:

$$dQ(t) = b\, dt + \Lambda\, dB(t) + (I - P')\, dY(t), \qquad (5.28)$$

where Q(0) = 0, B(t) is a d-dimensional standard Brownian motion, the drift vector b, in other words the net input rate at time t, is given by:

$$b = a - (I - P')\, \mu, \qquad (5.29)$$

and the covariance matrix Γ = ΛΛ^T is given by:

$$\Gamma_{jk} = \sum_{i=1}^{d} (\lambda_i \wedge \mu_i)\big[p_{ij}(\delta_{jk} - p_{ik}) + c_{v_i}^2 (p_{ij} - \delta_{ij})(p_{ik} - \delta_{ik})\big] + a_j\, c_{u_j}^2\, \delta_{jk}, \qquad (5.30)$$

where j = 1, ..., d and k = 1, ..., d.
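Since formula 5.30 is easy to get wrong when coded by hand, a small C++ helper that evaluates it directly from the network data may be useful (a sketch added here; all names are mine):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    using Mat = std::vector<std::vector<double>>;
    using Vec = std::vector<double>;

    // Covariance matrix of the RBM approximation, per equation 5.30.
    // P: routing matrix, lambda/mu: effective inflow and service rates,
    // cu2/cv2: squared coefficients of variation, a: exogenous arrival rates.
    Mat covariance(const Mat& P, const Vec& lambda, const Vec& mu,
                   const Vec& cu2, const Vec& cv2, const Vec& a) {
        const std::size_t d = mu.size();
        auto delta = [](std::size_t x, std::size_t y) { return x == y ? 1.0 : 0.0; };
        Mat G(d, Vec(d, 0.0));
        for (std::size_t j = 0; j < d; ++j)
            for (std::size_t k = 0; k < d; ++k) {
                for (std::size_t i = 0; i < d; ++i)
                    G[j][k] += std::min(lambda[i], mu[i]) *
                               (P[i][j] * (delta(j, k) - P[i][k]) +
                                cv2[i] * (P[i][j] - delta(i, j)) * (P[i][k] - delta(i, k)));
                G[j][k] += a[j] * cu2[j] * delta(j, k);
            }
        return G;
    }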

Consequently, the queue length in a single class open GJN can be approximated by a reflected Brownian motion with drift vector b given by 5.29, covariance matrix Γ given by 5.30 and reflection matrix R equal to I − P'.

The above derivation of the RBM approximation for the single class open GJN serves as a simplified, heuristic way to motivate the second order fluid model for the open GJN case. The reader is referred to paper [67] for a rigorous proof of the weak convergence of the RBM model in the heavy traffic regime, and to references [66, 68], where the convergence of the steady state distribution is covered as well.

5.3.4 Second order fluid network model analytically

Unlike the one-dimensional setting, a closed form solution of the stationary distribution of multidimensional reflected Brownian motions can be derived only in an extremely limited number of cases. In this section, we study analytically the RBM approximation of a single class open GJN: we examine the conditions for existence and uniqueness of its stationary distribution, derive the basic adjoint relation characterising stationary distributions, and determine its product form solution when available.

Existence and uniqueness of the stationary distribution

We know from paragraph 5.3.3 that the queue length process Q = {Q(t), t ≥ 0} in a single class open GJN (with d queueing stations, an exogenous arrival rate vector a, a service rate vector µ, a routing matrix P and coefficients of variation of interarrival and service times equal respectively to c_u and c_v) can be modelled by a d-dimensional reflected Brownian motion which follows the reflected stochastic differential equation below:

$$dQ(t) = b\, dt + \Lambda\, dB(t) + R\, dY(t), \qquad (5.31)$$

where Q(0) = 0, b ∈ R^d is the drift vector given by relation 5.29, Γ = ΛΛ^T ∈ R^{d×d} is the positive definite covariance matrix given by relation 5.30, B(t) is a d-dimensional standard Brownian motion and R = I − P' is the reflection matrix.

As stated in the previous paragraph, the process Q(t) is the solution to the following Skorokhod problem with input process X and initial value Q(0):

$$dQ(t) = dX(t) + R\, dY(t), \quad \text{where } dX(t) = b\, dt + \Lambda\, dB(t), \qquad (5.32)$$

$$Q_j(t) \ge 0, \qquad (5.33)$$

$$dY_j(t) \ge 0, \quad Y_j(0) = 0, \qquad (5.34)$$

$$\int_0^t Q_j(s)\, dY_j(s) = 0, \qquad (5.35)$$

for all t ≥ 0 and j = 1, ..., d.

The relations 5.32 to 5.35 define an oblique reflection mapping from X, a d-dimensional BM(b, Γ), to (Q, Y) with reflection matrix R = (I − P'). Since we only consider single class open GJN, we deduce that P^n → 0 as n → ∞; P has nonnegative elements and a spectral radius strictly less than one, which implies that the reflection matrix R = (I − P') is an M-matrix, and that the matrix R^{-1} exists and has nonnegative entries [69] (for more details on M-matrices please refer to [73]). When R is an M-matrix, the strong solution Q to the Skorokhod problem (relations 5.32 to 5.35) associated with X exists and is unique. The process Q is called a reflected Brownian motion RBM(b, Γ, R). For additional information please refer to [71, 72].

The RBM Q(t) behaves like a Brownian motion inside the positive d-orthant R^d_+ and is reflected from its boundaries along a direction determined by the reflection matrix R: for the i-th boundary surface F_i = {x ∈ R^d_+ : x_i = 0}, the reflection direction is the i-th column of the matrix R, i = 1, ..., d.

Harrison and Reiman proved in paper [69] that when R is an M-matrix, the stationary distribution of the reflected Brownian motion RBM(b, Γ, R) exists and is unique under the stability condition below:

$$R^{-1}\, b < 0, \qquad (5.36)$$

where the inequality in condition 5.36 is meant coordinate-wise. Thus, in the single class open generalized Jackson network case, where the reflection matrix R is always an M-matrix, condition 5.36 is necessary and sufficient for the existence and uniqueness of a steady state distribution of the RBM(b, Γ, R); for further details please refer to [75]. We recall from relation 5.29 that the drift b is equal to:

$$b = a - (I - P')\, \mu.$$

It is useful to note that condition 5.36, R^{-1}b < 0, for the existence and uniqueness of the steady-state distribution of the reflected Brownian motion RBM(b, Γ, R) with M-matrix R, is equivalent to:

$$(I - P')^{-1}\, a < \mu.$$

Indeed, R^{-1}b = R^{-1}(a − Rµ) = (I − P')^{-1}a − µ. This is, unsurprisingly, the same as the condition for the existence and uniqueness of a stationary distribution for an open single class GJN, "the traffic intensity is strictly less than one, ρ < 1" (in other words: λ = (I − P')^{-1}a < µ); the latter condition was proven in references [77, 78] for the GJN case.

The basic adjoint relation

The stationary distribution of an RBM(b, Γ, R) is bound to satisfy a weak form of an adjoint elliptic partial differential equation with oblique derivative boundary conditions. This equation is called the basic adjoint relation (BAR). Below, we derive the BAR for the case considered throughout this chapter, a single class open GJN approximated by a multidimensional RBM(b, Γ, R) [74, 79]. We know from paragraph 5.3.3 that the queue length process Q = {Q(t), t ≥ 0} in a single class open GJN (with d queueing stations, an exogenous arrival rate vector a, a service rate vector µ, a routing matrix P and coefficients of variation of interarrival and service times equal respectively to c_u and c_v) can be approximated by a d-dimensional reflected Brownian motion RBM(b, Γ, R):

$$Q(t) = Q(0) + b\, t + \Lambda\, B(t) + R\, Y(t), \qquad (5.37)$$

where Q(0) = 0, Y(0) = 0, b ∈ R^d is the drift vector given by relation 5.29, Γ = ΛΛ^T ∈ R^{d×d} is the positive definite covariance matrix given by relation 5.30, B(t) is a d-dimensional standard Brownian motion and R = I − P' is the reflection matrix.

If the d-dimensional reflected Brownian motion Q has a stationary distribution, we know from the previous paragraph that it is unique. Let π be this stationary distribution and ν_i the boundary distribution associated with this RBM. The measures π and ν_i are absolutely continuous with respect to the Lebesgue measures dx on R^d_+ and dσ_i on F_i respectively, i = 1, ..., d. We define their respective densities by p_0 = dπ/dx on R^d_+ and p_i = dν_i/dσ_i on F_i. Thus we can write:

$$E_\pi\Big[\int_0^t f(Q(s))\, dY_i(s)\Big] = t \int_{F_i} f(x)\, d\nu_i(x) = t \int_{F_i} f\, p_i\, d\sigma_i, \qquad (5.38)$$

where F_i is the i-th boundary face, F_i = {x ∈ R^d_+ : x_i = 0}.

We know from relation 5.37 that Q(t) is a reflected Brownian motion. Itô's formula implies that for any function f ∈ C_b²(R^d_+) we have:

$$f(Q(t)) - f(Q(0)) = \int_0^t \nabla f(Q(s)) \cdot dB(s) + \int_0^t Lf(Q(s))\, ds + \sum_{i=1}^{d} \int_0^t D_i f(Q(s))\, dY_i(s), \qquad (5.39)$$

for t ≥ 0, where the operators L and D_i are defined by:

$$Lg = \frac{1}{2}\sum_{i,j=1}^{d} \Gamma_{ij} \frac{\partial^2 g}{\partial x_i \partial x_j} + \sum_{i=1}^{d} b_i \frac{\partial g}{\partial x_i}, \qquad D_i g = R_i \cdot \nabla g,$$

where R_i is the i-th column of R, i = 1, ..., d. By taking the expectation across equation 5.39 with Q(0) distributed according to the stationary distribution π of Q (the stochastic integral has zero expectation, and under π, f(Q(t)) and f(Q(0)) have the same expectation), we obtain:

$$0 = E_\pi\Big[\int_0^t Lf(Q(s))\, ds\Big] + \sum_{i=1}^{d} E_\pi\Big[\int_0^t D_i f(Q(s))\, dY_i(s)\Big]. \qquad (5.40)$$

Applying Fubini's theorem and using relation 5.38, we get:

$$t \int_{\mathbb{R}^d_+} Lf(x)\, d\pi(x) + \sum_{i=1}^{d} t \int_{F_i} D_i f(x)\, d\nu_i(x) = 0. \qquad (5.41)$$

Thus, dividing by t, we obtain the basic adjoint relation:

$$\int_{\mathbb{R}^d_+} Lf(x)\, d\pi(x) + \sum_{i=1}^{d} \int_{F_i} D_i f(x)\, d\nu_i(x) = 0. \qquad (5.42)$$

Product form solution for the stationary distribution

Although we obtained the basic adjoint relation 5.42 in order to derive explicit forms of stationary distributions for d-dimensional reflected Brownian motions RBM(b, Γ, R), it is rarely the case that the steady state distribution can be determined explicitly, even for dimensions as low as two, due to the difficulty of the problem. One special category of reflected Brownian motions RBM(b, Γ, R), which does present a tractable form of the stationary distribution called product form, is described below. Let δ be the d × d diagonal matrix whose diagonal elements are the diagonal elements of the matrix R, and Ω the d × d diagonal matrix whose diagonal elements are the diagonal elements of the matrix Γ; all non-diagonal elements of the matrices δ and Ω are equal to zero.

Harrison and Williams [62] proved that the steady state distribution of a reflected Brownian motion RBM(b, Γ, R) with Minkowski reflection matrix (M-matrix), satisfying the condition of existence and uniqueness 5.36, R^{-1}b < 0, and the skew symmetry condition below:

$$2\Gamma = R\, \delta^{-1}\Omega + \Omega\, \delta^{-1} R', \qquad (5.43)$$

has a product form. Its density function is given by:

$$\pi(x) = \prod_{i=1}^{d} \pi_i(x_i), \qquad (5.44)$$

where:

$$\pi_i(x_i) = u_i\, \exp(-u_i x_i), \qquad (5.45)$$

and

$$u = -2\, \Omega^{-1} \delta\, R^{-1} b, \qquad (5.46)$$

for x ∈ R^d_+, x_i ≥ 0, i = 1, ..., d. For more details please refer to papers [62, 75, 80]. Note that the M-matrix condition on R can be extended to completely-S matrices and the product form will still exist, but we did not make this wider assumption since we are dealing with open single class GJNs, whose reflection matrices are always M-matrices.
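As a sanity check on formula 5.46 (added here): in dimension one, R = δ = 1 and Ω = Γ = σ², the skew symmetry condition 5.43 reads 2σ² = σ² + σ² and holds trivially, and 5.46 gives

$$u = -\frac{2b}{\sigma^2},$$

recovering the familiar exponential stationary density, with mean σ²/(2|b|), of a one-dimensional reflected Brownian motion with negative drift b.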

5.3.5 Second order fluid network model numerically

Multidimensional RBMs have been researched for decades in many disciplines, from mathematical finance to operations research and computer and communication networks, to mention a few. However, analytical tools are still lagging behind and do not deliver much when approximating highly multidimensional problems [81, 82, 84]. In queueing networks, if one moves away from the nice model of a Jackson network, where the stationary distribution admits a product form solution, to more complicated models of multiclass networks involving collaboration among servers, analytical explicit solutions become very unlikely to be found, and one is naturally driven to consider numerical techniques in the quest for solutions. On the numerical level, things are no simpler: unlike the one-dimensional case, in multiple dimensions the options are more limited. Below, we present the numerical techniques available for the simulation of generalized Jackson networks in their reflected Brownian motion approximation. Then we give the example of the numerical simulation of a tandem fluid queue.

Numerical techniques available

In this section, we introduce numerical techniques developed for the simulation of the RBM approximation of queueing networks, in particular generalized Jackson networks. These techniques go beyond the ones used for the one-dimensional case because, when moving to higher dimensions, we shift from the nice orthogonal reflection on the smooth positive half-line to the oblique reflection in the non-smooth positive quadrant in R²_+, extending to the oblique reflection in the positive orthant in R^d_+. This adds layers of difficulty to the algorithm design task over the one-dimensional model. As seen in the second order single fluid queue section of this chapter, many techniques exist for an accurate simulation of the one-dimensional RBM model: for example, finite difference or finite element methods such as Crank–Nicolson to solve the corresponding Fokker–Planck PDE, naive Euler stochastic simulation of the RBM, or the Asmussen–Glynn–Pitman improved Euler scheme [85] to sample N paths of the RBM and compute the desired observable with decent accuracy and good convergence rates. In the multidimensional case of the RBM approximation for GJN networks, with the oblique reflection and the non-smooth boundary domain (the positive orthant R^d_+), not many numerical techniques have been suggested. In 1992, Dai and Harrison [64] proposed the QNET algorithm to find the steady state distribution of reflected Brownian motion through solving the basic adjoint relation. Its drawback is that it relies on solving a partial differential equation (the BAR), and numerical PDEs are famous for suffering from the curse of dimensionality; therefore, while it provides a solution in multidimensional settings, it is not helpful in high dimensions [64, 76].

In 2015, a new algorithm was published by Blanchet and Chen [86], relying on time reversal, dominated coupling from the past and a wavelet approximation of Brownian motion to simulate the stationary distribution of RBM. Although promising, this algorithm is still at a theoretical stage; its authors note that its convergence time is at present extremely high and that its implementation is still a research problem in its own right.

Example of a tandem of fluid queues

In this section we consider a tandem queueing network composed of 3 servers as in figure 5.13 below:

Figure 5.13: Tandem of fluid queues

We know from paragraph 5.3.3 that a tandem queue, in other words a d-station generalized Jackson network in series, can be modelled by a reflected Brownian motion. The queue length process Q = {Q(t), t ≥ 0} in a single class open GJN (with d queueing stations, an exogenous arrival rate vector a, a service rate vector µ, a routing matrix P and coefficients of variation of interarrival and service times equal respectively to c_u and c_v) can be approximated by a d-dimensional RBM which follows the reflected stochastic differential equation below:

$$dQ(t) = b\, dt + \Lambda\, dB(t) + R\, dY(t), \qquad (5.47)$$

where b ∈ R^d is the drift vector given by:

$$b = a - (I - P')\, \mu, \qquad (5.48)$$

Γ = ΛΛ^T ∈ R^{d×d} is the positive definite covariance matrix given by:

$$\Gamma_{jk} = \sum_{i=1}^{d} (\lambda_i \wedge \mu_i)\big[p_{ij}(\delta_{jk} - p_{ik}) + c_{v_i}^2 (p_{ij} - \delta_{ij})(p_{ik} - \delta_{ik})\big] + a_j\, c_{u_j}^2\, \delta_{jk}, \qquad (5.49)$$

B(t) is a d-dimensional standard Brownian motion, R = I − P' is the reflection matrix, j = 1, ..., d and k = 1, ..., d.

In our example, the dimension d is equal to 3. We set the parameters of the tandem as follows:

$$c_{u_j}^2 = c_{v_j}^2 = 0.1, \quad j \in \{1, 2, 3\},$$

$$a = [0.95\ \ 0\ \ 0], \qquad \mu = [1\ \ 1\ \ 1], \qquad b = [-0.05\ \ 0\ \ 0],$$

$$R = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}, \qquad \Gamma = \begin{pmatrix} 0.195 & -0.1 & 0 \\ -0.1 & 0.2 & -0.1 \\ 0 & -0.1 & 0.2 \end{pmatrix},$$

$$\Lambda = \begin{pmatrix} 0.441588 & 0 & 0 \\ -0.226455 & 0.38564 & 0 \\ 0 & -0.259309 & 0.364361 \end{pmatrix}.$$

Thus,

$$R^{-1} b = \begin{pmatrix} -0.05 \\ -0.15 \\ -0.25 \end{pmatrix}.$$

As mentioned previously, in high dimensions solving Fokker–Planck PDEs becomes intractable, numerically and analytically, in most cases. Resorting to the stochastic simulation of paths of the SDE describing the dynamics of the process (or Langevin equations), via the methods explained in chapter 3, becomes necessary. In this paragraph, 10,000 paths of the RSDE 5.47 above have been simulated using naive Euler in order to extract information about the steady state distribution of Q(t) and its transient properties, like its expectation and variance as a function of time.
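For concreteness, here is a minimal C++ sketch of one naive Euler step for the tandem (added as an illustration; it is not the thesis code, and the function names are mine). Because R is lower triangular with unit diagonal for a tandem, the discrete reflection at each step can be resolved sequentially, node by node:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // One naive Euler step of dQ = b dt + Lambda dB + R dY for a tandem,
    // where R is lower triangular with unit diagonal and Lambda is lower
    // triangular (e.g. a Cholesky factor of Gamma). The per-step pushes
    // dy[i] >= 0 are found sequentially so that q stays nonnegative.
    void eulerStep(std::vector<double>& q,
                   const std::vector<double>& b,
                   const std::vector<std::vector<double>>& Lambda,
                   const std::vector<std::vector<double>>& R,
                   double dt, std::mt19937& gen) {
        const std::size_t d = q.size();
        std::normal_distribution<double> gauss(0.0, std::sqrt(dt));
        std::vector<double> dB(d), dy(d, 0.0);
        for (auto& z : dB) z = gauss(gen);
        for (std::size_t i = 0; i < d; ++i) {
            double x = q[i] + b[i] * dt;                      // drift increment
            for (std::size_t j = 0; j <= i; ++j) x += Lambda[i][j] * dB[j];
            for (std::size_t j = 0; j < i; ++j)  x += R[i][j] * dy[j]; // upstream pushes
            dy[i] = std::max(0.0, -x);                        // push just enough (R[i][i] = 1)
            q[i] = x + dy[i];                                 // reflected position
        }
    }

A full run repeats this step over the horizon for each independent path and accumulates the empirical mean, variance and distribution of Q(t). For general GJN reflection matrices, a per-step linear complementarity problem must be solved instead of this sequential pass.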

In our example, the steady state distribution of Q(t) exists, since the condition for its existence and uniqueness, R^{-1}b < 0 coordinate-wise, is verified. We simulate this tandem queue first starting from an empty buffer in the three queues of the tandem, then by taking the first buffer occupied, Q_1(0) = 3, and the other two empty. Needless to say, the mean queue length (figures 5.14 and 5.15) and the variance of the queue length (figures 5.16 and 5.17) of the three servers in both cases converge respectively to the same limits, the mean and variance of the stationary distribution of Q(t). The stationary distribution, presented in figures 5.18 and 5.19, is identical in both cases as well. Thus the stationary state is unaffected by the initial state of the queues, as expected.

Although simulating special cases of tandems and networks, like Jackson networks, can be done by the path simulation above, in general, when we add collaboration among servers, which is the case in generalized Jackson networks, multidimensional numerical techniques are still far behind. One promising algorithm, by Blanchet and Chen [86], which is still at a theoretical stage, might extend those techniques, relying on time reversal [61], dominated coupling from the past and a wavelet approximation of Brownian motion to simulate the stationary distribution of RBM. The only algorithm for GJN that survives until now is Dai and Harrison's [64], which resorts to solving the BAR, thus falling again into the curse of dimensionality that numerical PDEs suffer from.

Figure 5.14: Mean queue length as a function of time with Q_1(0) equal to 3

Figure 5.15: Mean queue length as a function of time with Q_1(0) equal to 0

Figure 5.16: Variance of the queue length as a function of time with Q_1(0) equal to 3

Figure 5.17: Variance of the queue length as a function of time with Q_1(0) equal to 0

Figure 5.18: Stationary distribution with Q_1(0) equal to 3

Figure 5.19: Stationary distribution with Q_1(0) equal to 0

Chapter 6

Conclusion

Throughout this thesis, the focus was mainly on studying potential parallel techniques for stochastic differential equations. Two types of parallelism were explored: phase space parallelism and time parallelism.

The phase space parallelism, through block partitioning of the phase space, was the embarrassingly parallel scenario. Its study was done for the sake of clarity and completeness, and due to its wide range of applications, from the field of application of this thesis, computer and communication networks, to financial stock markets and biological networks.

The more interesting case of parallelism is the time parallelism, which is the ultimate scenario if one's interest lies in the strong approximation, i.e. modelling paths of the solution of a stochastic differential equation. This kind of parallelism is complicated because, while partitioning the time interval into several subdivisions each running on a different processor, an initial value must be provided to every processor, which is not feasible without the mathematical sophistication of the parareal algorithm. This algorithm, as mentioned previously, was initiated in reference [41] for the ODE case and has been studied extensively in the literature for ODEs and PDEs. G. Bal, in reference [43], extended it to the case of SDEs through the use of an Euler solver, proving its order of convergence k/2 at the k-th iteration. Our work extended this research to cover the case of a parareal algorithm with a different solver, the Milstein solver for SDEs. A proof of convergence was provided, showing an improvement from order k/2 to order k after k iterations over the coarse Milstein solver.

The second part of the thesis covered second order fluid queues in the field of computer and communication networks, more specifically the diffusion approximation of queueing networks. This chapter was divided into two sections: the first one covered the case of a single fluid queue, while the second one dealt with stochastic fluid networks. In the case of the single fluid queue, we explored its Brownian approximation, known as the second order or diffusion model, by presenting it from an analytical and a numerical point of view. Analytically, we introduced the model by deriving its mathematical validation, presented the stochastic and partial differential equations behind the dynamics of its buffer content and derived its stationary distribution when it exists. We went further to explore the numerical techniques used to analyse these models computationally, and presented numerical solutions for several examples of second order single fluid queues, which proved to fit well with their analytical solutions, thus reflecting the reliability of the computational programs used.

The single fluid queue study was generalized in the second part of the chapter to cover the case of the queueing network, especially the GJN network and its Brownian approximation. Analytically, we presented the pathwise construction of the dynamics of an open single class GJN, the corresponding traffic equations and the mathematical validation of the reflected Brownian motion approximation as a tool to model GJN networks. We examined the conditions for existence and uniqueness of its stationary distribution, derived the basic adjoint relation characteristic of its stationary distribution and determined its product form solution and the conditions for the latter's availability.

Numerically, we presented the few available numerical techniques to deal with RBM approximations of GJN networks, the most efficient being Dai and Harrison's technique. Then we generalized the stochastic scenario simulation used in the single fluid queue case to simulate paths of the multidimensional RBM approximating the queue length in a tandem fluid queue. This approach is efficient when dealing with simple tandem queues and special cases of networks, but it proves to be limited: it covers only a very narrow range of reflection matrices and therefore cannot be used to solve networks with elaborate collaboration among their stations, as in the case of most GJNs.

Therefore, in this thesis, we studied parallel techniques for SDEs. We presented several means to obtain faster simulations of these differential equations. We extended the parareal algorithm, in the time parallelization of SDEs, to obtain an algorithm with a higher order of convergence using a Milstein solver.

We studied in detail the RBM approximation for single fluid queues and stochastic networks, specifically GJNs. We applied the stochastic scenario simulation, explored in the previous chapters in serial and parallel programming, to the case of stochastic networks, which can produce interesting results for single fluid queues and simple network cases.

But the study of RBM approximations for GJN networks in high dimensions is still a challenging, active field of research, analytically and numerically. Multidimensional RBMs are very difficult to solve analytically; they have been researched for decades in many disciplines, from mathematical finance to operations research and computer and communication networks, to mention a few. However, as mentioned previously, analytical tools are still lagging behind and do not deliver much when approximating highly multidimensional problems. For example, finding tail asymptotics and closed form solutions for steady state distributions, or even stability conditions beyond the 3-dimensional setting, is still a challenging research question except for a few rare cases. This led Gamarnik to speak of undecidability, in other words that a universal algorithm does not exist for the calculation of the steady state distribution, the estimation of its tail decay rate and the stability verification of a multidimensional reflecting process [81, 82, 83]. In queueing networks, if one moves away from the nice model of a Jackson network, where the stationary distribution admits a product form solution, to more complicated models of multiclass networks involving collaboration among servers, analytical explicit solutions become very unlikely to be found, and one is naturally driven to consider numerical techniques in the quest for solutions.

On the numerical level, things are no simpler: unlike the one-dimensional case, in multiple dimensions the options are more limited. The stochastic scenario simulations that we explored, although efficient in high dimensions, prove to be limited to special reflection matrices, like the orthogonal one and very few others. Dai and Harrison [64] proposed an algorithm to find the steady state distribution of the reflected Brownian motion approximating GJNs through solving the basic adjoint relation. Its drawback is that it relies on solving a partial differential equation (the BAR), and numerical PDEs are famous for suffering from the curse of dimensionality; therefore, while it provides a solution in the multidimensional setting, it is not helpful in high dimensions [64, 76]. Recently, a new algorithm was published by Blanchet and Chen [86], relying on time reversal, dominated coupling from the past and a wavelet approximation of Brownian motion to simulate the stationary distribution of RBM. Although promising, this algorithm is still at a theoretical stage; its authors note that at this stage its convergence time is extremely high and its implementation is still a research problem in its own right.
As for future work: throughout this thesis, while exploring analytical methods to handle multidimensional RBMs, I came across studies of the steady state distributions of RBMs and their corresponding decay rates using the theory of large deviations, which would be an interesting direction to investigate [87, 88, 89]. On the numerical level, it would be interesting to examine Blanchet and Chen's algorithm [86] and investigate practical ways of implementing it serially and in parallel, which is, to my knowledge, the only promising path available at the moment to handle GJN networks in very high dimensional settings.

Bibliography

[1] Peter E Kopp, M Capinski, Measure, Integral and Probability. Springer-Verlag, 1994.

[2] Williams, David, Probability with Martingales. Cambridge University Press, 1991.

[3] Frank Jones, Lebesgue Integration on Euclidean Space. Jones and Bartlett, 2000.

[4] David Pollard, A user’s guide to measure theoretic probability. Cambridge University Press, 2002.

[5] Sheldon Ross, Stochastic Processes. Wiley, 1995.

[6] Vidyadhar G. Kulkarni, Modeling and Analysis of Stochastic Systems. Chapman and Hall, 1996.

[7] Sheldon Ross, Introduction to probability models. Elsevier, 2007.

[8] L. Rogers, D. Williams, Diffusions, Markov Processes and Martingales. Cambridge University Press, 2000.

[9] T. Mikosch, Elementary stochastic processes. World scientific, 1998.

[10] J. Michael Harrison, Brownian motion and stochastic flow systems. John Wiley and sons, 1985.

[11] L. Rogers, D. Williams, Itô calculus. Cambridge University Press, 2000.

[12] B. Øksendal, Stochastic differential equations. Springer, 2005.


[13] G. Pavliotis, A. Stuart , Multiscale Methods: Averaging and Homogenization. Springer, 2008.

[14] P. Kloeden, E. Platen, Numerical Solution of Stochastic Differential Equations. Springer, 2000.

[15] E. Platen, N. Bruti-Liberati, Numerical solution of stochastic differential equations with jumps in finance. Springer, 2010.

[16] D. Bertsekas, R. Gallager, Data Networks. Prentice-Hall, NJ, 1992.

[17] D. D. Yao, Probability Models in Manufacturing Systems. Springer, New York, 1994.

[18] M. Bramson, and J. Dai , Heavy traffic limits for some queueing networks. Ann. Appl. Probab. 11, 49-90, 2001.

[19] E. Uysal, An Overview of The Application of Heavy Traffic Theory and Brownian Approximations to the Control of Multiclass Queuing Networks. 2004.

[20] Baskett, F., Chandy, K. M., Muntz, R. R. and F. G. Palacios, Open, closed and mixed networks of queues with different classes of customers. Assoc. Comput. Mach. 22 248-260, 1975.

[21] F. P. Kelly, Networks of queues with customers of different types. J. Appl. Probab. 12, 542-554, 1975.

[22] Jackson, J. R., Networks of waiting lines. Oper. Res. 5 518-521, 1957.

[23] Gribaudo, M. and R. Gaeta, Efficient steady-state analysis of second-order fluid stochastic Petri nets. Perform. Eval. 63(9-10): 1032-1047, 2006.

[24] L. Rabehasaina and B. Sericola, A second-order Markov-modulated fluid queue with linear service rate. J. Appl. Probab. Volume 41, Number 3 , 758-777, 2004.

[25] M. Gribaudo, M. Telek, Fluid Models in Performance Analysis. SFM 2007: 271-317, 2007.

[26] M. Gribaudo, D. Manini, B. Sericola and M. Telek, Second order fluid models with general boundary behaviour. Annals of Operations Research, 2008.

[27] W. Whitt, Stochastic-Process Limits. Springer, 2002.

[28] S. Asmussen, Stationary distributions for fluid flow models with or without Brownian noise. Comm. Statist. Stochastic Models, 11 , 21-49, 1995.

[29] R.L. Karandikar and V.G. Kulkarni, Second-order fluid flow models: Reflected Brownian motion in a random environment. Oper. Res. 43, 77-88, 1995.

[30] V.G. Kulkarni, Fluid models for single buffer systems. Frontiers in Queueing; Models and Applications in Science and Engineering, CRC Press, 321-338, 1997.

[31] A.I. Elwalid, D. Mitra, Statistical multiplexing with loss priorities in rate-based congestion control of high-speed networks. IEEE Trans. Commun., 42, 11, 2989-3002, 1994.

[32] A. Zisowsky, K. Wolter, Numerical Solution of Second Order Stochastic Fluid Models. TOOLS2000, 1999.

[33] S. Karlin, H. M. Taylor, A Second Course in Stochastic Processes. Elsevier Science and Technology, 1981.

[34] N. U. Prabhu, Stochastic Storage Processes: Queues, Insurance Risk, and Dams. Springer-Verlag, 1981.

[35] S. Chandrasekhar, Stochastic Problems in Physics and Astronomy. Reviews of Modern Physics 15 (1), 1943.

[36] Stochastic Processes.

[37] J. Banks, J. Carson, B. Nelson, D. Nicol, Discrete-Event System Simulation. Pearson Education International, 2005.

[38] D. E. Knuth, The Art of Computer Programming. Addison-Wesley, 1997.

[39] S. Asmussen, P. W. Glynn, Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability, Vol. 57, Springer, 2007.

[40] H. M. Taylor, S. Karlin, An Introduction to Stochastic Modeling. Academic Press, 1998.

[41] J.-L. Lions, Y. Maday, G. Turinici, Résolution d'EDP par un schéma en temps « pararéel ». C. R. Acad. Sci. Paris Sér. I Math. 332, no. 7, 661-668, 2001.

[42] G. Bal, Y. Maday, A "parareal" time discretization for non-linear PDE's with application to the pricing of an American put. Recent Developments in Domain Decomposition Methods (Zürich, 2001), volume 23 of Lect. Notes in Comput. Sci. Eng., 189-202. Springer, Berlin, 2002.

[43] G. Bal, Parallelization in time of (stochastic) ordinary differential equations. Preprint, 2005.

[44] K. Burrage, Parallel methods for systems of ordinary differential equations. 1995.

[45] D. Revuz, M. Yor, Continuous martingales and Brownian motion. Springer, 1998.

[46] I. Karatzas, S. Shreve, Brownian Motion and Stochastic Calculus. Springer, 1998.

[47] S. Shreve, Stochastic Calculus for Finance II: Continuous-Time Models. Springer, 2004.

[48] C. Dellacherie, P.-A. Meyer, Probabilities and Potential. North-Holland, 1978.

[49] O. Nikodym, Sur une généralisation des intégrales de M. J. Radon. Fundamenta Mathematicae 15, 131-179, 1930.

[50] G. Weiss, Continuity of Stochastic Processes. J. Appl. Probab., 1975.

[51] M. Loève, Probability Theory. Graduate Texts in Mathematics, Volumes 45-46, 4th edition, Springer, 1978.

[52] H. Cramér, M. R. Leadbetter, Stationary and related stochastic processes: sample function properties and their applications. Wiley, 1967.

[53] P. Protter, Stochastic integration and differential equations. Springer, 2004.

[54] R. Situ, Theory of stochastic differential equations with jumps. Springer, 2005.

[55] M. Agapie, K. Sohraby, Algorithmic solution to second-order fluid flow. IEEE INFOCOM 2001, 3, 1261-1270, 2001.

[56] E. Platen, W. Wagner, On a Taylor formula for a class of Itô processes. Probab. Math. Statist. 3 (1), 37-51, 1982.

[57] Camelot cluster, http://aesop.doc.ic.ac.uk/help/grail.

[58] K. Sato, Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, 2011.

[59] V. I. Arnold, Ordinary Differential Equations. The MIT Press, 1978.

[60] F. P. Kelly, Networks of queues. Advances in Applied Probability 8 (2), 416-432, 1976.

[61] F. P. Kelly, Reversibility and stochastic networks. Wiley, 1979.

[62] J. M. Harrison, R. J. Williams, Multidimensional reflected Brownian motions having exponential stationary distributions. Annals of Probability 15, 115-137, 1987.

[63] H. Chen, D. Yao, Fundamentals of queueing networks. Springer, 2001.

[64] J. G. Dai, J. M. Harrison, Reflected Brownian motion in an orthant: numerical methods for steady-state analysis. Annals of Applied Probability 2, 65-86, 1992.

[65] M. Miyazawa, Light tail asymptotics in multidimensional reflecting processes for queueing networks. TOP: An Official Journal of the Spanish Society of Statistics and Operations Research 19 (2), 233-299, 2011.

[66] D. Gamarnik, A. Zeevi, Validity of the heavy traffic steady-state approximations in gener- alized Jackson networks. The Annals of Applied Probability 16 (1), 56-90, 2006.

[67] M. I. Reiman, Open queueing networks in heavy traffic. Math. Oper. Res. 9, 441-458, 1984.

[68] A. Budhiraja, C. Lee, Stationary distribution convergence for generalized Jackson networks in heavy traffic. Math. Oper. Res. 34, 45-56, 2009.

[69] J. M. Harrison, M. I. Reiman, Reflected Brownian motion on an orthant. Ann. Probab. 9, 302-308, 1981.

[70] J. G. Dai, A. B. Dieker, Nonnegativity of solutions to the basic adjoint relationship for some diffusion processes. Queueing Syst. 68, 295-303, 2011.

[71] M. I. Reiman, R. J. Williams, A boundary property of reflecting Brownian motions. Probability Theory and Related Fields 77, 87-97, 1988.

[72] J. G. Dai, R. J. Williams, Existence and uniqueness of semimartingale reflecting Brownian motions in convex polyhedrons. Theory of Probability and Its Applications 40, 1-40, 1995.

[73] A. Berman, R. J. Plemmons, Nonnegative matrices in the mathematical sciences. Academic Press, 1979.

[74] R. J. Williams, Semimartingale reflecting Brownian motions in the orthant. IMA Volumes in Mathematics and Its Applications 71, 125-137, 1995.

[75] J. M. Harrison, R. J. Williams, Brownian models of open queueing networks with homogeneous customer populations. IMA 10, 147-186, 1987.

[76] X. Shen, H. Chen, J. G. Dai, W. Dai, The finite element method for computing the stationary distribution of an SRBM in a hypercube with applications to finite buffer queueing networks. Queueing Systems 42, 33-62, 2002.

[77] K. Sigman, The stability of open queueing networks. Stoch. Process. Appl. 35, 11-25, 1990.

[78] D. D. Down, S. P. Meyn, Piecewise linear test functions for stability and instability of queueing networks. Queueing Systems Theory Appl. 27, 205-226, 1997.

[79] J. Dai, T. Kurtz, Characterization of the stationary distribution for a semimartingale reflecting Brownian motion in a convex polyhedron. Preprint, 1997.

[80] R. J. Williams, Reflected Brownian motion with skew symmetric data in a polyhedral domain. Probab. Theory Rel. Fields 75, 459-485, 1987.

[81] D. Gamarnik, On deciding stability of constrained homogeneous random walks and queueing systems. Mathematics of Operations Research 27, 272-293, 2002.

[82] D. Gamarnik, Computing stationary probability distribution and large deviations rates for constrained homogeneous random walks. Mathematics of Operations Research 32, 257-265, 2007.

[83] M. Sipser, Introduction to the Theory of Computation. PWS, 1997.

[84] M. Miyazawa, Light tail asymptotics in multidimensional reflecting processes for queueing networks. TOP 19, 233-299, 2011.

[85] S. Asmussen, P. Glynn, J. Pitman, Discretization error in simulation of one-dimensional reflecting Brownian motion. Ann. Appl. Probab. 5, 875-896, 1995.

[86] J. Blanchet, X. Chen, Steady-state simulation of reflected Brownian motion and related stochastic networks. The Annals of Applied Probability 25 (6), 3209-3250, 2015.

[87] P. Dupuis, R.S. Ellis, A weak convergence approach to the theory of large deviations. John Wiley and Sons, 1997.

[88] D. Bertsimas, I. C. Paschalidis, J. N. Tsitsiklis, On the large deviations behavior of acyclic networks of G/G/1 queues. Annals of Applied Probability 8, 1027-1069, 1998.

[89] K. Majewski, Large deviations of stationary reflected Brownian motions. In: Stochastic Networks: Theory and Applications, Oxford University Press, 1996.