Louisiana State University LSU Digital Commons

LSU Doctoral Dissertations Graduate School

3-25-2018

General Stochastic Integral and Itô Formula with Application to Stochastic Differential Equations and Mathematical Finance

Jiayu Zhai, Louisiana State University and Agricultural and Mechanical College, [email protected]

Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations Part of the Numerical Analysis and Computation Commons, and the Probability Commons

Recommended Citation Zhai, Jiayu, "General Stochastic Integral and Itô Formula with Application to Stochastic Differential Equations and Mathematical Finance" (2018). LSU Doctoral Dissertations. 4518. https://digitalcommons.lsu.edu/gradschool_dissertations/4518

This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please contact [email protected].

GENERAL STOCHASTIC INTEGRAL AND ITÔ FORMULA WITH APPLICATIONS TO STOCHASTIC DIFFERENTIAL EQUATIONS AND MATHEMATICAL FINANCE

A Dissertation Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Department of Mathematics

by
Jiayu Zhai
B.S., Ludong University, 2009
M.S., Shanghai University, 2012
M.S., Louisiana State University, 2014
May 2018

Acknowledgements

This dissertation would be impossible without the contribution and support from several people. I would like to express my deepest appreciation to my advisers,

Professor Hui-Hsiung Kuo and Professor Xiaoliang Wan, for their guidance and encouragement throughout my PhD study at Louisiana State University. They set examples for me to follow.

Meanwhile, I wish to thank Professor Hongchao Zhang and Professor Padmanabhan Sundar for their advice on my research, particularly in optimization theory and stochastic differential equations, respectively. I also want to thank Professor

Daniel Cohen and the Dean's representative Professor Parampreet Singh for their help and instruction during my graduate study and dissertation writing. I appreciate their serving as committee members for my general exam and thesis defense.

I want to thank the Department of Mathematics at Louisiana State University and its faculty, staff and graduate students for providing me with a pleasant environment in which to study, discuss and work.

I also want to thank the programs that financially supported my research directly. The work on Part II of my dissertation was supported by AFOSR grant

FA9550-15-1-0051 and NSF Grant DMS-1620026. The work on Chapter 8 was supported in part by an appointment with the NSF Mathematical Sciences Summer Internship Program sponsored by the National Science Foundation, Division of Mathematical Sciences (DMS). This program is administered by the Oak Ridge

Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and NSF. ORISE is managed by

ORAU under DOE contract number DE-SC0014664.

I am extremely grateful to my mother for standing behind me. Her unconditional love and help provide me with an inexhaustible source of energy to continue my research. Some of the credit for my success must go to my friends, who are always there whenever I need them.

This dissertation is particularly dedicated to my father Zhigang Zhai, who passed away in May 2017. I would never have achieved my success without his support and love, especially when I made critical decisions. Thank you and love you, Dad!

May you rest in peace!

Table of Contents

Acknowledgements

Abstract

Part I: General Stochastic Integral and Its Applications

Chapter 1: Introduction
1.1 Background
1.2 Stochastic Process and Brownian Motion
1.3 Conditional Expectation and Martingale
1.4 Itô Integral and the Itô Formula
1.5 Girsanov Theorem

Chapter 2: General Stochastic Integral and General Itô Formula
2.1 Stochastic Integral for Instantly Independent Stochastic Processes
2.2 General Stochastic Integral
2.3 Itô Isometry for General Stochastic Integral
2.4 A General Itô Formula

Chapter 3: Near-Martingale Property of Anticipating Stochastic Integration
3.1 Near-Martingale Property
3.2 Doob–Meyer's Decomposition for Near-Submartingales
3.3 General Girsanov Theorem

Chapter 4: Application to Stochastic Differential Equations
4.1 Stochastic Differential Equations for Exponential Processes
4.2 Stochastic Differential Equations with Anticipative Coefficients
4.3 Stochastic Differential Equations with Anticipative Initial Condition
4.4 Conditional Expectation of the Solution of Theorem 4.3.2

Chapter 5: A Black–Scholes Model with Anticipative Initial Conditions
5.1 The Classical Black–Scholes Model
5.2 A Black–Scholes Model with Anticipative Initial Conditions
5.3 Arbitrage-Free Property
5.4 Completeness of the Market
5.5 Option Pricing Formula
5.6 Hedging Portfolio

References

Part II: Temporal Minimum Action Method for Rare Events in Stochastic Dynamic Systems

Chapter 6: Introduction
6.1 Large Deviation Principle
6.2 Statistical Physics
6.3 Freidlin-Wentzell Theory

Chapter 7: Temporal Minimum Action Method
7.1 Problem Description
7.2 Introduction to the Temporal Minimum Action Method
7.3 Reformulation and Finite Element Discretization of the Problem
7.4 Problem I with a Fixed T
7.5 Problem II with a Finite T*
7.6 Problem II with an Infinite T*
7.7 A Priori Error Estimate for a Linear ODE System
7.8 Numerical Experiments

Chapter 8: Temporal Minimum Action Method for Time Delayed System
8.1 Introduction of Large Deviation Principle for Stochastic Differential Delay Equations
8.2 Penalty Method
8.3 Variational Formulation with Delay for Finite Element Approximation
8.4 Computation of the Gradient of Ŝ
8.5 Numerical Examples

References

Vita

Abstract

A general stochastic integration theory for adapted and instantly independent stochastic processes arises when we consider anticipative stochastic differential equations. In Part I of this thesis, we conduct deeper research on the general stochastic integral introduced by W. Ayed and H.-H. Kuo in 2008. We provide a rigorous mathematical framework for the integral in Chapter 2 and prove that the integral is well-defined. Then a general Itô formula is given. In Chapter 3, we present an intrinsic property, the near-martingale property, of the general stochastic integral, and Doob–Meyer's decomposition for near-submartingales. We apply the new stochastic integration theory to several kinds of anticipative stochastic differential equations and discuss their properties in Chapter 4. In Chapter 5, we apply our results to a general Black–Scholes model with anticipative initial conditions to obtain the properties of a market with inside information and the corresponding pricing strategy, which can help us better understand insider trading in mathematical finance.

In dynamical systems, small random noise can induce transitions between equilibria. Such transitions happen rarely, but they can have a crucial impact on the system, which motivates their numerical study. In Part II of this thesis, we focus on the temporal minimum action method introduced by X. Wan, which uses an optimal linear time scaling and finite element approximation to compute the most probable transition paths, also known as minimal action paths, numerically. We present the method and its numerical analysis in Chapter 7. In Chapter 8, we apply the method to a stochastic dynamical system with time delay using a penalty method.

Part I

General Stochastic Integral and Its Applications

Chapter 1 Introduction

1.1 Background

The historical background of this work can be traced back to the early 19th century. In 1827, Robert Brown published in [6] his observation that pollen grains suspended in water move continuously in a random way. However, Brown neither explained the source of the motion, which was later named after him as Brownian motion in physics, nor set up a mathematical model for it.

The application of Brownian motion in mathematical modelling predates its theoretical framework. It was used separately in two subjects: (1) in mathematical finance, Louis Bachelier used Brownian motion to model the stock market in his doctoral thesis in 1900; and (2) in physics, Albert Einstein [9] used Brownian motion to model atoms in 1905.

Einstein's description of Brownian motion is close to its rigorous definition in modern mathematics. In 1923, Norbert Wiener [33] provided a mathematical definition of Brownian motion using real analysis, and so it is also called a Wiener process now. Based on this definition, the whole theory of stochastic processes and their analysis was built up.

The Itô integral, introduced by K. Itô in 1942 [16] as a generalization of the Wiener integral, has become a powerful and standard tool in the study of statistical physics, control engineering, biological systems, mathematical finance, and so on. However, the classical Itô integration theory relies on the requirement that the integrand be adapted; yet even in solving adapted problems, for example the hedging portfolio in the Black–Scholes model, non-adapted SDEs can be involved. For instance, the Itô integral is not

enough to solve even the simple SDE

dX_t = X_t dB(t),  X_0 = B(1),  t ≥ 0,  (1.1.1)

since the initial condition is anticipating. Wiener's work inspired L. Gross to introduce the abstract Wiener space in 1965 [12], based on which the white noise distribution theory was built up by T. Hida in 1975 [13]. In 1976, P. Malliavin provided a proof of the existence of the transition probability (see [30]). In the white noise distribution theory, the Hitsuda–Skorokhod integral provides a certain generalization of the Itô integral to define the stochastic integral ∫_a^t B(1) dB(s), which is used in solving (1.1.1). However, this generalization has two problems:

1. The integral ∫_a^t f(s) dB(s) := ∫_a^t ∂_s^* f(s) ds is defined as a generalized function in the space (S)* and lacks probabilistic meaning.

2. The white noise integral ∫_a^t ∂_s^* f(s) ds is a Hitsuda–Skorokhod integral if it is a random variable in L^p(S′(R), µ) for some p > 1. However, there is no theorem that characterizes when a generalized function in (S)* belongs to L^p(S′(R), µ).

[Diagram: relationships among the four subjects of stochastic analysis — Itô calculus (1942), abstract Wiener space (Gross 1965), white noise theory (Hida 1975), and Malliavin's work (1976) — with the present work located at the position marked (∗).]

E. Pardoux and P. Protter in [32] defined the forward and backward stochastic integrals separately in order to solve forward-backward SDEs. This provides an explicit expression for the solutions of some kinds of backward SDEs. However, it cannot define a simple stochastic integral like ∫_0^t (B(1) − B(s)) dB(s), t ∈ [0, 1]. The disadvantages of the existing theories and the need to solve general SDEs

motivated us to develop a new stochastic integral. As shown in the above diagram

(see [21]) of relationships among the four subjects of stochastic analysis, our work is at (∗).

In the following sections, the definitions of the preliminary mathematical concepts and their properties are given.

1.2 Stochastic Process and Brownian Motion

Definition 1.2.1 (Stochastic Process). Let (Ω, F, P) be a probability space, where Ω is a sample space, F is a σ-field and P is a probability measure. A function X(t, ω) : [0, +∞) × Ω → R is called a stochastic process if it satisfies the following two conditions.

1. For each t ∈ [0, +∞), X(t, ω) : Ω → R is a random variable.

2. For each ω ∈ Ω, X(t, ω) : [0, +∞) → R is a measurable function of t.

Definition 1.2.2 (Brownian Motion). A stochastic process B(t, ω) : [0, ∞)×Ω →

R is called a Brownian motion or a Wiener process, if it satisfies the following four conditions.

i. For any 0 ≤ s < t, B(t, ω)−B(s, ω) is a random variable normally distributed

with expectation 0 and variance t − s;

ii. B(t) starts at 0 almost surely, i.e.,

P {ω : B(0, ω) = 0} = 1;

iii. For any 0 ≤ t1 < . . . < tn, the random variables B(t1, ω), B(t2, ω) −

B(t1, ω),... , B(tn, ω) − B(tn−1, ω) are independent;

iv. Almost all sample paths of B(t) are continuous, i.e.,

P {ω : B(t, ω) is continuous in [0, ∞)} = 1.
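The four conditions above lend themselves to a quick numerical sanity check on simulated sample paths. The following sketch (ours, not part of the dissertation; NumPy assumed, and all parameter choices illustrative) builds Brownian paths from i.i.d. Gaussian increments and checks conditions i–iii empirically:

```python
import numpy as np

def brownian_paths(n_paths, n_steps, T, seed=0):
    """Simulate Brownian sample paths on [0, T] via i.i.d. normal increments.

    Increments over a step of length dt are N(0, dt); condition ii (B(0) = 0)
    is enforced by starting every path at zero.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)
    return B

B = brownian_paths(n_paths=50_000, n_steps=100, T=1.0)
# Condition i: B(t) - B(s) ~ N(0, t - s); here t = 1.0 and s = 0.25.
incr = B[:, 100] - B[:, 25]
print(incr.mean(), incr.var())  # near 0 and near t - s = 0.75
# Condition iii: disjoint increments are uncorrelated (consistent with independence).
incr2 = B[:, 25] - B[:, 0]
print(np.corrcoef(incr, incr2)[0, 1])  # near 0
```

Condition iv (continuity) is reflected in the construction itself: each path is a cumulative sum of small increments.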

1.3 Conditional Expectation and Martingale

Definition 1.3.1 (Conditional Expectation). Let (Ω, F, P) be a probability space, G a sub-σ-field of F, and X a random variable in L¹(Ω). Then the conditional expectation of X given G, denoted by E[X|G], is a random variable Y satisfying the following conditions:

1. Y is measurable with respect to G;

2. For any A ∈ G, we have

∫_A Y dP = ∫_A X dP.

Remark 1.3.2. The conditional expectation E[X|G] is the best approximation of X given the information in G.

Theorem 1.3.3 (Properties of Conditional Expectation). Let X and Y be random variables on Ω. Assume that G is a sub-σ-field of F. Then we have

1. E[E[X|G]] = E[X].

2. E[X + Y|G] = E[X|G] + E[Y|G].

3. If X is measurable with respect to G, then E[X|G] = X.

4. If X is independent of G, then E[X|G] = E[X].

5. If Y is measurable with respect to G and XY ∈ L¹(Ω), then E[XY|G] = Y E[X|G].

6. If H is a sub-σ-field of G, then E[E[X|G] | H] = E[X|H].
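On a finite probability space, conditional expectation reduces to averaging over the atoms of G, and several of the properties above can then be verified exactly. A minimal sketch (the sample space and the random variable X = UV + U² are our illustrative choices, not the dissertation's):

```python
from itertools import product
from fractions import Fraction

# Finite sample space Ω = {(u, v) : u ∈ {0,1,2}, v ∈ {0,1}} with uniform P.
omega = list(product([0, 1, 2], [0, 1]))
P = Fraction(1, len(omega))

def X(u, v):
    # An illustrative random variable: X = U*V + U^2.
    return Fraction(u * v + u * u)

def cond_exp_given_U(u):
    # E[X | σ(U)] on the atom {U = u}: average X over that atom.
    atom = [(uu, vv) for (uu, vv) in omega if uu == u]
    return sum(X(uu, vv) for (uu, vv) in atom) / len(atom)

E_X = sum(X(u, v) * P for (u, v) in omega)                 # E[X]
E_EXG = sum(cond_exp_given_U(u) * P for (u, v) in omega)   # E[E[X|σ(U)]]
print(E_X, E_EXG)  # property 1: the two coincide exactly
```

Property 3 is also visible here: cond_exp_given_U depends on u alone, i.e., E[X|σ(U)] is σ(U)-measurable.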

Definition 1.3.4 (Filtration). A family of σ-fields {Ft; 0 ≤ t < ∞} on (Ω, F,P ) is called a filtration, if Fs ⊂ Ft ⊂ F for any 0 ≤ s < t.

Definition 1.3.5 (Adaptedness). Let X(t, ω) be a stochastic process and {Ft; 0 ≤ t < ∞} be a filtration. We say that X is adapted to {Ft}, if for each t ≥ 0, X(t, ω) is measurable with respect to Ft.

Remark 1.3.6. From now on, we will also use Bt and Xt to denote a Brownian motion and a stochastic process, respectively. We say that {Ft} is the natural filtration, if it is generated from the corresponding Brownian motion, i.e., for each t ≥ 0, Ft = σ{B(s); 0 ≤ s ≤ t}.

Example 1.3.7. A Brownian motion B(t) is adapted to its natural filtration {Ft}.

Definition 1.3.8 (Martingale). Let {Ft} be a filtration and X(t) be a stochastic process adapted to {Ft} with E|Xt| < ∞ for all t ≥ 0. We call X(t) a martingale if for each 0 ≤ s < t,

E[X(t)|Fs] = X(s), almost surely.

Remark 1.3.9. Similar to Remark 1.3.2, a martingale is a stochastic process Xt whose best approximation, given the information prior to time s, is the value Xs itself.

Example 1.3.10. A Brownian motion B(t) is a martingale with respect to its natural filtration {Ft}. In fact, by Definition 1.2.2 and Theorem 1.3.3,

E[B(t)|Fs] = E[B(t) − B(s) + B(s)|Fs]
= E[B(t) − B(s)|Fs] + E[B(s)|Fs]
= E[B(t) − B(s)] + B(s)
= B(s). (1.3.1)

Many important stochastic processes are not martingales, but they are still worth studying, e.g., B(t)² in Example 1.3.13 below. Although they are not martingales, a similar property is satisfied.

Definition 1.3.11 (Submartingale). A stochastic process Xt, a ≤ t ≤ b, with

E|Xt| < ∞ for all t, is called a submartingale with respect to a filtration {Ft}, if for all a ≤ s < t ≤ b, we have

E[Xt|Fs] ≥ Xs, almost surely. (1.3.2)

Definition 1.3.12 (Supermartingale). A stochastic process Xt, a ≤ t ≤ b, with

E|Xt| < ∞ for all t, is called a supermartingale with respect to a filtration {Ft}, if for all a ≤ s < t ≤ b, we have

E[Xt|Fs] ≤ Xs, almost surely. (1.3.3)

Example 1.3.13. Let Xt = B(t)². Then for t > s,

E[Xt|Fs] = E[B(t)²|Fs]
= E[(B(t) − B(s) + B(s))²|Fs]
= E[(B(t) − B(s))²|Fs] + 2B(s) E[B(t) − B(s)|Fs] + E[B(s)²|Fs]
= E[(B(t) − B(s))²] + B(s)²
= B(s)² + t − s ≥ B(s)² = Xs.

So Xt is a submartingale.
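The computation in Example 1.3.13 can be illustrated by Monte Carlo: freeze one realization of B(s) and sample the conditional law of B(t) given F_s. A sketch (ours; the parameters s = 0.5 and t = 1 are illustrative, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(42)
s, t = 0.5, 1.0
# Freeze one realization of B(s); B(s) ~ N(0, s).
b_s = rng.normal(0.0, np.sqrt(s))
# Conditional law of B(t) given F_s: B(t) = B(s) + increment, increment ~ N(0, t - s).
cont = b_s + rng.normal(0.0, np.sqrt(t - s), size=1_000_000)
cond_mean_sq = (cont ** 2).mean()
print(cond_mean_sq, b_s ** 2 + (t - s))  # the two agree: E[B(t)^2 | F_s] = B(s)^2 + (t - s)
```

In particular cond_mean_sq exceeds b_s², which is the submartingale inequality (1.3.2) for this conditioning.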

A crucial and useful property of submartingale processes is Doob–Meyer's decomposition.

Theorem 1.3.14 (Doob–Meyer’s Decomposition, [20]). Let Xt, a ≤ t ≤ b, be a continuous submartingale with respect to a continuous filtration {Ft; a ≤ t ≤ b}.

Then Xt has a unique decomposition

Xt = Mt + At, a ≤ t ≤ b, (1.3.4)

where Mt is a continuous martingale with respect to {Ft}, and At is a continuous stochastic process satisfying the conditions:

1. A_a = 0;

2. At is increasing in t almost surely;

3. At is adapted to {Ft}.

Definition 1.3.15 (Compensator). The continuous stochastic process At in Theorem 1.3.14 is called the compensator of the submartingale Xt.

1.4 Itô Integral and the Itô Formula

From now on we will consider, without loss of generality, all the stochastic processes involved in a time interval [a, b] for arbitrary 0 ≤ a < b, or [0,T ] for an arbitrary

T > 0.

We use L²_ad([a, b] × Ω) to denote the space of square integrable stochastic processes f(t) adapted to {Ft}, that is, ∫_a^b E[|f(t)|²] dt < ∞.

Definition 1.4.1 (Itô Integral). Let B(t) be a Brownian motion, and f(t) be a stochastic process in L²_ad([a, b] × Ω). The Itô integral of f is a random variable denoted in stochastic integral form

∫_a^b f(t) dB(t),

and defined in the following steps.

Step 1: Step Processes.

If f(t) is an adapted step stochastic process

f(t) = Σ_{i=1}^n ξ_{i−1} 1_{[t_{i−1}, t_i)}(t),

where a = t0 < t1 < ··· < tn = b is a partition of [a, b], then we define its Itô integral as

∫_a^b f(t) dB(t) = Σ_{i=1}^n ξ_{i−1} (B(t_i) − B(t_{i−1})).

Step 2: Square Integrable Processes Approximated by Step Processes.

For any process f(t) ∈ L²_ad([0, T] × Ω), we can prove that there is a sequence of adapted step stochastic processes {f_n(t)}_{n=1}^∞ that converges to f(t) in L²([0, T] × Ω).

Step 3: Square Integrable Processes.

Define the Itô integral of f by

∫_a^b f(t) dB(t) = lim_{n→∞} ∫_a^b f_n(t) dB(t), in L²(Ω).

One can prove that the stochastic integral above is well-defined. For details, see [20]. Although the Itô integral is defined in a rigorous framework, its intuition is just an analogue of the Riemann integral. Its consistency with the Riemann integral is proved in the following theorem.

Theorem 1.4.2. Assume that f(t) is left continuous and adapted to {Ft}. Then the limit of its Riemann type sums exists and is equal to its Itô integral, i.e.,

∫_a^b f(t) dB(t) = lim_{‖∆_n‖→0} Σ_{i=1}^n f(t_{i−1}) (B(t_i) − B(t_{i−1})). (1.4.1)

Remark 1.4.3. Notice that on the right hand side of Equation (1.4.1), as the limit of a Riemann type sum, the evaluation point of f on the subinterval [t_{i−1}, t_i) is the left endpoint t_{i−1}. We will see that this makes the Itô integral process a martingale, and this is the key characteristic of Itô integration.

Now, we show the most important properties of the Itô integral. They are used, for example, in the proof of existence and uniqueness of solutions of stochastic differential equations, the Freidlin–Wentzell theory, the Girsanov theorem and so on. Among others, the Itô formula plays the central role in modern stochastic analysis and its applications to physics, biology, engineering and finance. For the proofs, refer to [20].

Theorem 1.4.4 (Itô Isometry). Let f ∈ L²_ad([0, T] × Ω). Then its Itô integral ∫_0^T f(t) dB(t) is a random variable satisfying

E[∫_0^T f(t) dB(t)] = 0,

and

E[(∫_0^T f(t) dB(t))²] = ∫_0^T E[f(t)²] dt.

In addition, if f(t) is deterministic, then its Itô integral is normally distributed.

Theorem 1.4.5 (Martingale Property). Let Xt, 0 ≤ t ≤ T, be the stochastic process

Xt = ∫_0^t f(s) dB(s),

where f ∈ L²_ad([0, T] × Ω). Then Xt is a continuous martingale.

Definition 1.4.6 (). Let f(t) be a stochastic process and

∆ = {a = t0 < t1 < . . . < tn−1 < tn = b} be a partition of [a, b]. Then the quadratic variation of f on [a, b] is defined by

n X 2 hfi[a,b] = lim (f(ti) − f(ti−1)) , k∆k→0 i=1 provided the limit exists in probability.

Example 1.4.7. The quadratic variation ⟨B⟩_[a,b] of a Brownian motion B(t) is b − a for any b > a > 0.
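This can be observed numerically: for a single simulated path, the sum of squared increments over a fine partition of [a, b] concentrates near b − a. A sketch (ours; interval and partition size are illustrative, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
a, b, n = 0.25, 1.25, 200_000
dt = (b - a) / n
# Increments of one Brownian path over a uniform partition of [a, b].
dB = rng.normal(0.0, np.sqrt(dt), size=n)
quad_var = np.sum(dB ** 2)   # Σ (B(t_i) - B(t_{i-1}))^2
print(quad_var)              # concentrates near b - a = 1.0
```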

Definition 1.4.8 (Itô Process). Let f(t) ∈ L_ad(Ω, L²[0, T]) and g(t) ∈ L_ad(Ω, L¹[0, T]). Then we call the stochastic process X(t) defined by

X(t) = ∫_0^t f(s) dB(s) + ∫_0^t g(s) ds (1.4.2)

an Itô process.

Remark 1.4.9. In Equation (1.4.2), the first term is defined as an Itô integral, and the second term is defined as a Riemann integral for almost every ω ∈ Ω. By Theorem 1.4.5, Xt is continuous. Equation (1.4.2) is often written in the stochastic differential form

dX(t) = f(t) dB(t) + g(t) dt.

Theorem 1.4.10 (Itô's Formula, [17]). Let {X_t^(i)}_{i=1}^n be Itô processes as in (1.4.2) with f_i, g_i, i = 1, 2, ..., n, and let ϕ(x_1, x_2, ..., x_n) be a twice continuously differentiable real function on R^n. Then

ϕ(X_t^(1), X_t^(2), ..., X_t^(n)) = ϕ(X_0^(1), X_0^(2), ..., X_0^(n))
  + Σ_{i=1}^n ∫_0^t (∂ϕ/∂x_i)(X_s^(1), ..., X_s^(n)) dX_s^(i)
  + (1/2) Σ_{i,j=1}^n ∫_0^t (∂²ϕ/(∂x_i ∂x_j))(X_s^(1), ..., X_s^(n)) dX_s^(i) dX_s^(j), (1.4.3)

or in stochastic differential form,

dϕ(X_t^(1), X_t^(2), ..., X_t^(n)) = Σ_{i=1}^n (∂ϕ/∂x_i)(X_t^(1), ..., X_t^(n)) dX_t^(i)
  + (1/2) Σ_{i,j=1}^n (∂²ϕ/(∂x_i ∂x_j))(X_t^(1), ..., X_t^(n)) dX_t^(i) dX_t^(j), (1.4.4)

where dX_t^(i) are the stochastic differentials of the Itô processes X_t^(i), and their products follow the rule shown in the following Itô table:

×      dB(t)   dt
dB(t)  dt      0
dt     0       0

Example 1.4.11. By applying Theorem 1.4.10 to X(t) = B(t) (i.e., f(t) = 1 and g(t) = 0) and ϕ(x) = x², with ϕ′(x) = 2x and ϕ″(x) = 2, we have

B(t)² = 2 ∫_0^t B(s) dB(s) + t, (1.4.5)

or equivalently,

∫_0^t B(s) dB(s) = (1/2)(B(t)² − t). (1.4.6)

This calculates the Itô integral of B(t) on [0, t].
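Equation (1.4.6) can be checked pathwise against the left-endpoint Riemann sums of Theorem 1.4.2. A sketch (ours; NumPy assumed, partition size illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
t, n = 1.0, 100_000
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])
# Left-endpoint (Itô) Riemann sum for ∫_0^t B(s) dB(s).
ito_sum = np.sum(B[:-1] * dB)
closed_form = 0.5 * (B[-1] ** 2 - t)   # right hand side of (1.4.6)
print(ito_sum, closed_form)            # agree pathwise as the mesh refines
```

The discrepancy between the two numbers is exactly (t − Σ dB²)/2, i.e., the quadratic-variation error, which vanishes as the partition refines.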

Example 1.4.12 (Exponential Process). Let B(t) be a Brownian motion and {Ft} be its natural filtration. Let f(t) be a stochastic process in L²_ad([0, T] × Ω). Define

E_f(t) = exp{ ∫_0^t f(s) dB(s) − (1/2) ∫_0^t f(s)² ds }.

Then by Theorem 1.4.10, we have

dE_f(t) = f(t) E_f(t) dB(t).

This is in fact an analogue of the differential of an exponential function, so we call E_f(t) the exponential process of f.
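For a constant integrand f, the exponential process reduces to E_f(t) = exp(f B(t) − f²t/2), and the martingale property of Theorem 1.4.5 applied to dE_f = f E_f dB suggests E[E_f(t)] = E_f(0) = 1. A Monte Carlo sketch (the constant f = 0.8 is our illustrative choice; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
f, t, n_paths = 0.8, 1.0, 200_000
Bt = rng.normal(0.0, np.sqrt(t), size=n_paths)   # B(t) ~ N(0, t)
Ef = np.exp(f * Bt - 0.5 * f ** 2 * t)           # E_f(t) for constant f
print(Ef.mean())                                  # close to 1 = E_f(0)
```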

1.5 Girsanov Theorem

The Girsanov theorem shows that a certain class of translations of a Brownian motion are again Brownian motions with respect to equivalent probability measures. As an important corollary of the Itô formula, it has a wide range of applications.

Theorem 1.5.1 (Girsanov Theorem, [11]). Let B(t) be a Brownian motion and {Ft} be its natural filtration. Let f(t) be a square integrable {Ft}-adapted stochastic process such that E_P[E_f(t)] < ∞ for all t ∈ [0, T]. Then the stochastic process given by

B̃(t) = B(t) − ∫_0^t f(s) ds

is a Brownian motion with respect to the probability measure Q given by

dQ = E_f(T) dP = exp{ ∫_0^T f(t) dB(t) − (1/2) ∫_0^T f(t)² dt } dP, (1.5.1)

which is equivalent to P.
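For constant f, Theorem 1.5.1 says B̃(t) = B(t) − f t is a Q-Brownian motion, where dQ = E_f(T) dP. This can be checked by reweighting P-samples with the density E_f(T); a sketch (the constant f = 0.5 is our illustrative choice, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
f, T, n_paths = 0.5, 1.0, 400_000
BT = rng.normal(0.0, np.sqrt(T), size=n_paths)
weights = np.exp(f * BT - 0.5 * f ** 2 * T)  # dQ/dP = E_f(T) for constant f
# Under Q, B gains drift f, so B~(T) = B(T) - f*T should have Q-mean 0.
q_mean = np.mean(weights * (BT - f * T))
print(weights.mean(), q_mean)  # near 1 (Q is a probability measure) and near 0
```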

Chapter 2 General Stochastic Integral and General Itô Formula

As indicated in Section 1.1, for non-adapted or anticipative stochastic processes, the classical Itô integration theory is not enough. So we need to define a general stochastic integral to deal with both adapted and non-adapted cases. In this chapter, we give the definition of a general stochastic integral and prove its Itô formula. It is worth pointing out that this Itô formula coincides with the one proved in the white noise distribution theory, which resolves the problem stated in Section 1.1. However, as indicated there, the conclusions from the white noise distribution theory are too abstract to have a probabilistic meaning.

We start with the definition of the stochastic integral for non-adapted stochastic processes (the so-called counter part).

2.1 Stochastic Integral for Instantly Independent Stochastic Processes

Definition 2.1.1 (Counter Filtration). A family {G^(t); a ≤ t ≤ b} of complete σ-fields is called a counter {Ft}-filtration if it satisfies the following conditions:

1. {G^(t); a ≤ t ≤ b} is instantly independent with respect to {Ft; a ≤ t ≤ b}, i.e., for each t ∈ [a, b], G^(t) and Ft are independent;

2. {G^(t); a ≤ t ≤ b} is decreasing, i.e., for t1 < t2, G^(t1) ⊃ G^(t2);

3. G̃^(t) := σ{B(b) − B(s) : t ≤ s ≤ b} ⊂ G^(t) for each t ∈ [a, b].

Remark 2.1.2. Note that

1. A family {G^(t); a ≤ t ≤ b} that is instantly independent with respect to {Ft; a ≤ t ≤ b} may not be decreasing. For example, G^(t) = σ{B(b) − B(t)} is instantly independent with respect to {Ft}, but not decreasing.

2. The backward filtration G̃^(t) is a counter {Ft}-filtration, but it is not the only one. For example,

G^(t) := σ{B(b) − B(s), B(2b) − B(s) : t ≤ s ≤ b}

is also a counter {Ft}-filtration.

3. If Xt and Y^(t) are processes adapted to {Ft} and {G^(t)}, respectively, where {Ft} and {G^(t)} are instantly independent, then Xt and Y^(t) are instantly independent. In the sense of the Hilbert space L²(Ω), Xt and Y^(t) are orthogonal for each t. Thus, we can consider L²(G^(t)) as a subspace that is orthogonal to L²(Ft).

Proposition 2.1.3. If {G^(t)} is a counter {Ft}-filtration, and a stochastic process Xt is adapted to {G^(t)}, then Xt is instantly independent of {Ft}.

Proof. For any Borel set A and t ∈ [a, b], X_t^{−1}(A) ∈ G^(t). So X_t^{−1}(A) is independent of Ft. Thus, Xt is independent of Ft for each t.

From now on, we use L²_ct([a, b] × Ω) to denote the space of all stochastic processes g(t, ω), a ≤ t ≤ b, ω ∈ Ω, satisfying the following conditions:

1. g(t, ω) is adapted to the counter {Ft}-filtration {G^(t)};

2. ∫_a^b E[|g(t)|²] dt < ∞.

Remark 2.1.4. In fact, in L²([a, b] × Ω), L²_ct([a, b] × Ω) is a subspace of the subspace that is orthogonal to L²_ad([a, b] × Ω), i.e., L²_ad([a, b] × Ω) ⊥ L²_ct([a, b] × Ω).

Now we can define the stochastic integral ∫_a^b g(t) dB(t), and then ∫_a^b f(t)g(t) dB(t), for g ∈ L²_ct([a, b] × Ω) and f ∈ L²_ad([a, b] × Ω). We divide the definition into the following steps.

Step 1: g is a step stochastic process in L²_ct([a, b] × Ω).

Suppose g is a step stochastic process given by

g(t, ω) = Σ_{i=1}^n η_i(ω) 1_{[t_{i−1}, t_i)}(t),

where η_i is G^(t_i)-measurable and E[η_i²] < ∞. Then g ∈ L²_ct([a, b] × Ω). Define

I(g) = Σ_{i=1}^n η_i (B(t_i) − B(t_{i−1})).

Then I(f) + I(g) = I(f + g) for step processes f and g in L²_ct([a, b] × Ω).

Lemma 2.1.5. E[I(g)] = 0 and E[|I(g)|²] = ∫_a^b E[|g(t)|²] dt.

Proof. For each 1 ≤ i ≤ n,

E[η_i (B(t_i) − B(t_{i−1}))] = E[ E[η_i (B(t_i) − B(t_{i−1})) | G^(t_i)] ]
= E[ η_i E[B(t_i) − B(t_{i−1}) | G^(t_i)] ]
= E[ η_i E[B(t_i) − B(t_{i−1})] ]
= E[η_i · 0] = 0.

So E[I(g)] = Σ_{i=1}^n E[η_i (B(t_i) − B(t_{i−1}))] = 0. Next,

|I(g)|² = Σ_{i,j=1}^n η_i η_j (B(t_i) − B(t_{i−1})) (B(t_j) − B(t_{j−1})).

For i < j,

E[η_i η_j (B(t_i) − B(t_{i−1})) (B(t_j) − B(t_{j−1}))]
= E[ E[η_i η_j (B(t_i) − B(t_{i−1})) (B(t_j) − B(t_{j−1})) | G^(t_i)] ]
= E[ η_i η_j (B(t_j) − B(t_{j−1})) E[B(t_i) − B(t_{i−1}) | G^(t_i)] ]
= E[ η_i η_j (B(t_j) − B(t_{j−1})) E[B(t_i) − B(t_{i−1})] ]
= 0.

So when i ≠ j, E[η_i η_j (B(t_i) − B(t_{i−1})) (B(t_j) − B(t_{j−1}))] = 0. Then

E[η_i² (B(t_i) − B(t_{i−1}))²] = E[ E[η_i² (B(t_i) − B(t_{i−1}))² | G^(t_i)] ]
= E[ η_i² E[(B(t_i) − B(t_{i−1}))² | G^(t_i)] ]
= E[ η_i² E[(B(t_i) − B(t_{i−1}))²] ]
= E[η_i² (t_i − t_{i−1})]
= E[η_i²] (t_i − t_{i−1}).

Thus,

E[|I(g)|²] = E[ Σ_{i,j=1}^n η_i η_j (B(t_i) − B(t_{i−1})) (B(t_j) − B(t_{j−1})) ]
= E[ Σ_{i=1}^n η_i² (B(t_i) − B(t_{i−1}))² ]
= Σ_{i=1}^n E[η_i²] (t_i − t_{i−1})
= ∫_a^b E[|g(t)|²] dt.

Step 2: An approximation lemma.

Lemma 2.1.6. Suppose g ∈ L²_ct([a, b] × Ω). Then there exists a sequence {g_n(t); n ≥ 1} of step stochastic processes in L²_ct([a, b] × Ω) such that

lim_{n→∞} ∫_a^b E[|g(t) − g_n(t)|²] dt = 0.

Proof. Case I: E[g(t)g(s)] is a continuous function of (t, s) ∈ [a, b]².

Let a = t0 < t1 < ··· < t_{n−1} < t_n = b be a partition of [a, b] and define a stochastic process g_n(t, ω) by

g_n(t, ω) = g(t_i, ω), t_{i−1} ≤ t < t_i.

Then {g_n} is a sequence of G^(t)-adapted processes. By the continuity of E[g(t)g(s)] on [a, b]², we have

lim_{s→t} E[|g(t) − g(s)|²] = lim_{s→t} [E[g(t)g(t)] − 2E[g(t)g(s)] + E[g(s)g(s)]] = 0,

which implies that for each t ∈ [a, b],

lim_{n→∞} E[|g(t) − g_n(t)|²] = 0.

Moreover,

|g(t) − g_n(t)|² ≤ 2(|g(t)|² + |g_n(t)|²).

So for all a ≤ t ≤ b,

E[|g(t) − g_n(t)|²] ≤ 2(E[|g(t)|²] + E[|g_n(t)|²]) ≤ 4 sup_{a≤s≤b} E[|g(s)|²].

The right hand side is finite by the continuity assumption. Thus, by the Lebesgue Dominated Convergence Theorem, we have

lim_{n→∞} ∫_a^b E[|g(t) − g_n(t)|²] dt = ∫_a^b lim_{n→∞} E[|g(t) − g_n(t)|²] dt = 0.

Case II: g is bounded: |g| ≤ M for some M > 0.

Define g̃_n(t, ω) = ∫_0^{n(b−t)} e^{−τ} g(t + n^{−1}τ, ω) dτ. Then g̃_n is adapted to G^(t), since the integral is a pointwise limit of sums of stochastic processes that are measurable with respect to G^(t). Moreover,

∫_a^b E[|g̃_n(t)|²] dt = ∫_a^b E[ (∫_0^{n(b−t)} e^{−τ} g(t + n^{−1}τ, ω) dτ)² ] dt
≤ ∫_a^b E[ (∫_0^{n(b−t)} M dτ)² ] dt
= ∫_a^b M² n² (b − t)² dt < ∞.

And we have the following properties of g̃_n.

◦ Claim (a): For each n, E[g̃_n(t) g̃_n(s)] is a continuous function of (t, s).

Take the substitution u = t + n^{−1}τ. Then

g̃_n(t, ω) = ∫_t^b n e^{n(t−u)} g(u, ω) du.

Thus,

0 ≤ lim_{t→s} E[|g̃_n(t) − g̃_n(s)|²] = lim_{t→s} E[ (∫_t^s n e^{n(t−u)} g(u, ω) du)² ] ≤ lim_{t→s} n² M² (s − t)² = 0,

which implies

lim_{t→s} E[|g̃_n(t) − g̃_n(s)|²] = 0.

So

0 ≤ |E[g̃_n(t) g̃_n(s)] − E[g̃_n(t₀) g̃_n(s₀)]|
≤ |E[(g̃_n(t) − g̃_n(t₀)) g̃_n(s)]| + |E[g̃_n(t₀) (g̃_n(s) − g̃_n(s₀))]|
≤ √(E[|g̃_n(t) − g̃_n(t₀)|²] E[|g̃_n(s)|²]) + √(E[|g̃_n(t₀)|²] E[|g̃_n(s) − g̃_n(s₀)|²])
→ 0,

as t → t₀ and s → s₀. The claim follows.

◦ Claim (b): ∫_a^b E[|g(t) − g̃_n(t)|²] dt → 0, as n → ∞.

Indeed,

g(t) − g̃_n(t) = ∫_0^∞ e^{−τ} (g(t) − g(t + n^{−1}τ)) dτ,

where g(t) is understood to be zero when t > b. Since e^{−τ} dτ is a probability measure on [0, ∞), we can apply the Schwarz inequality to get

|g(t) − g̃_n(t)|² ≤ ∫_0^∞ e^{−τ} |g(t) − g(t + n^{−1}τ)|² dτ.

Then

∫_a^b E[|g(t) − g̃_n(t)|²] dt ≤ ∫_a^b E[ ∫_0^∞ e^{−τ} |g(t) − g(t + n^{−1}τ)|² dτ ] dt
= ∫_a^b ∫_0^∞ e^{−τ} E[|g(t) − g(t + n^{−1}τ)|²] dτ dt
= ∫_0^∞ e^{−τ} ( ∫_a^b E[|g(t) − g(t + n^{−1}τ)|²] dt ) dτ
≤ 4M²(b − a) < ∞.

Thus, by the Lebesgue Dominated Convergence Theorem, we get the claim.

Now, by Claim (a) we can apply Case I to g̃_n for each n to get a G^(t)-adapted step stochastic process g_n(t, ω) such that

∫_a^b E[|g_n(t) − g̃_n(t)|²] dt < 1/n.

Then by Claim (b), we have

lim_{n→∞} ∫_a^b E[|g(t) − g_n(t)|²] dt = 0.

Case III: The general case for g ∈ L²_ct([a, b] × Ω).

Let g ∈ L²_ct([a, b] × Ω). For each n, define

ḡ_n(t, ω) = g(t, ω) if |g(t, ω)| ≤ n, and ḡ_n(t, ω) = 0 if |g(t, ω)| > n.

Then

|g(t) − ḡ_n(t)|² ≤ 2(|g(t)|² + |ḡ_n(t)|²) ≤ 4|g(t)|².

By the Lebesgue Dominated Convergence Theorem and ∫_a^b E[|g(t)|²] dt < ∞, we have

lim_{n→∞} ∫_a^b E[|g(t) − ḡ_n(t)|²] dt = 0.

Now, for each n ≥ 1, we apply Case II to ḡ_n to get a G^(t)-adapted step stochastic process g_n(t, ω) such that

∫_a^b E[|g_n(t) − ḡ_n(t)|²] dt < 1/n.

Hence,

lim_{n→∞} ∫_a^b E[|g(t) − g_n(t)|²] dt = 0,

by the two equations above.

Step 3: The stochastic integral ∫_a^b g(t) dB(t) for g ∈ L²_ct([a, b] × Ω).

By Lemma 2.1.6, we can find a sequence {g_n(t, ω); n ≥ 1} of G^(t)-adapted step stochastic processes such that

lim_{n→∞} ∫_a^b E[|g(t) − g_n(t)|²] dt = 0.

For each n, I(g_n) is defined as in Step 1. By Lemma 2.1.5, we get

E[|I(g_n) − I(g_m)|²] = ∫_a^b E[|g_n(t) − g_m(t)|²] dt → 0, as m, n → ∞,

since I(g_n) − I(g_m) = I(g_n − g_m), where g_n − g_m is still a G^(t)-adapted step stochastic process. Hence the sequence {I(g_n)} is a Cauchy sequence in L²(Ω). Define

I(g) = lim_{n→∞} I(g_n), in L²(Ω). (2.1.1)

The stochastic integral I(g) is well-defined. In fact, if there is another sequence {f_n} of G^(t)-adapted step stochastic processes such that

lim_{n→∞} ∫_a^b E[|g(t) − f_n(t)|²] dt = 0,

then

E[|I(f_n) − I(g_m)|²] = E[|I(f_n − g_m)|²]
= ∫_a^b E[|f_n(t) − g_m(t)|²] dt
≤ 2 ∫_a^b E[|f_n(t) − g(t)|²] dt + 2 ∫_a^b E[|g(t) − g_m(t)|²] dt
→ 0, as m, n → ∞.

Thus,

lim_{n→∞} I(f_n) = lim_{m→∞} I(g_m), in L²(Ω).

Now, we define the stochastic integral of the counter part as follows.

Definition 2.1.7. The limit I(g) defined in Equation (2.1.1) is called the general stochastic integral of g ∈ L²_ct([a, b] × Ω) and is denoted by ∫_a^b g(t) dB(t).

It is obvious that I is linear, namely, for any α, β ∈ R and f, g ∈ L²_ct([a, b] × Ω), we have I(αf + βg) = αI(f) + βI(g).

Theorem 2.1.8 (Isometry). Suppose $g\in L^2_{ct}([a,b]\times\Omega)$. Then $\int_a^b g(t)\,dB(t)$ is a random variable with $E\big(\int_a^b g(t)\,dB(t)\big)=0$ and
\[ E\Big(\int_a^b g(t)\,dB(t)\Big)^2 = E\int_a^b |g(t)|^2\,dt. \]

Proof. By the definition of $\int_a^b g(t)\,dB(t)$, there exists a sequence of $\{\mathcal G^{(t)}\}$-adapted step stochastic processes $g_n$ such that
\[ \lim_{n\to\infty} E\int_a^b |g(t)-g_n(t)|^2\,dt = 0, \tag{2.1.2} \]
and
\[ \int_a^b g(t)\,dB(t) = \lim_{n\to\infty}\int_a^b g_n(t)\,dB(t), \quad \text{in } L^2(\Omega). \tag{2.1.3} \]

But by Lemma 2.1.5, we have
\[ E\Big(\int_a^b g_n(t)\,dB(t)\Big) = 0 \quad\text{and}\quad E\Big(\int_a^b g_n(t)\,dB(t)\Big)^2 = E\int_a^b |g_n(t)|^2\,dt. \]

Therefore,
\[
\begin{aligned}
\Big| E\Big(\int_a^b g(t)\,dB(t)\Big)\Big| &= \Big| E\Big(\int_a^b g(t)\,dB(t)\Big) - E\Big(\int_a^b g_n(t)\,dB(t)\Big)\Big|\\
&= \Big| E\Big(\int_a^b \big(g(t)-g_n(t)\big)\,dB(t)\Big)\Big|\\
&\le \Big[ E\Big(\int_a^b \big(g(t)-g_n(t)\big)\,dB(t)\Big)^2\Big]^{1/2}\\
&= \Big( E\int_a^b |g(t)-g_n(t)|^2\,dt\Big)^{1/2} \to 0, \quad\text{as } n\to\infty.
\end{aligned}
\]
The left-hand side is independent of $n$, so $E\big(\int_a^b g(t)\,dB(t)\big) = 0$. By Equations (2.1.2) and (2.1.3), we have
\[ \Big\|\int_a^b g(t)\,dB(t)\Big\|_{L^2(\Omega)} = \lim_{n\to\infty}\Big\|\int_a^b g_n(t)\,dB(t)\Big\|_{L^2(\Omega)}, \]
and
\[ \|g\|_{L^2_{ct}([a,b]\times\Omega)} = \lim_{n\to\infty}\|g_n\|_{L^2_{ct}([a,b]\times\Omega)}. \]

So
\[
\begin{aligned}
E\Big(\int_a^b g(t)\,dB(t)\Big)^2 &= \Big\|\int_a^b g(t)\,dB(t)\Big\|^2_{L^2(\Omega)} = \lim_{n\to\infty}\Big\|\int_a^b g_n(t)\,dB(t)\Big\|^2_{L^2(\Omega)}\\
&= \lim_{n\to\infty} E\Big(\int_a^b g_n(t)\,dB(t)\Big)^2 = \lim_{n\to\infty} E\int_a^b |g_n(t)|^2\,dt\\
&= \lim_{n\to\infty}\|g_n\|^2_{L^2_{ct}([a,b]\times\Omega)} = \|g\|^2_{L^2_{ct}([a,b]\times\Omega)} = E\int_a^b |g(t)|^2\,dt.
\end{aligned}
\]

By this theorem, the stochastic integration $I\colon L^2_{ct}([a,b]\times\Omega)\to L^2(\Omega)$ is an isometry.
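The isometry can be illustrated numerically. The sketch below is not part of the original development; it uses NumPy and the illustrative integrand $g(t) = B(1) - B(t)$, which is instantly independent of $\{\mathcal F_t\}$. It approximates the general integral by right-endpoint Riemann sums and checks by Monte Carlo that $E\,I(g) = 0$ and $E\,I(g)^2 = \int_0^1 (1-t)\,dt = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_paths = 100, 50_000
dt = 1.0 / n_steps

# Brownian paths on [0, 1]: B[:, j] = B(t_j).
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

# g(t) = B(1) - B(t) is instantly independent of {F_t}; the counter-part
# Riemann sums evaluate g at the RIGHT endpoint of each subinterval.
g_right = B[:, -1:] - B[:, 1:]        # g(t_j), j = 1, ..., n_steps
I = np.sum(g_right * dB, axis=1)      # approximates the integral of (B(1)-B(t)) dB(t)

print(I.mean())            # close to 0
print((I ** 2).mean())     # close to 1/2
```

The empirical second moment matches the $L^2_{ct}$-norm of $g$ up to Monte Carlo and discretization error, which is the content of Theorem 2.1.8 in this special case.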

From Case I of the proof of Lemma 2.1.6, we have the following theorem:

Theorem 2.1.9. Suppose $g\in L^2_{ct}([a,b]\times\Omega)$ and assume that $E(g(t)g(s))$ is a continuous function of $t$ and $s$. Then
\[ \int_a^b g(t)\,dB(t) = \lim_{n\to\infty}\sum_{i=1}^n g(t_i)\big(B(t_i)-B(t_{i-1})\big), \quad\text{in } L^2(\Omega). \]

We can rewrite the equation in Theorem 2.1.9 as
\[ \lim_{n\to\infty}\sum_{i=1}^n\Big[ g(t_i)\big(B(t_i)-B(t_{i-1})\big) - \int_{t_{i-1}}^{t_i} g(t)\,dB(t)\Big] = 0, \quad\text{in } L^2(\Omega), \]
and hence also in probability. In this sense, we can write in an asymptotic and symbolic way that $\Delta Y^{(t_i)} \approx -g(t_i)\,\Delta B_i$, where the stochastic process $Y^{(t)} := \int_t^T g(s)\,dB(s)$, and $\Delta Y^{(t_i)} = Y^{(t_i)} - Y^{(t_{i-1})}$, $\Delta B_i = B(t_i) - B(t_{i-1})$.

The continuity of the stochastic process $Y^{(t)}$ defined above can be proved similarly as in [20] for the classical Itô stochastic integral.

Now, we can define the stochastic integral for a combination of adapted and instantly independent processes. From now on, unless otherwise specified, the statement "$\varphi(t)$ is instantly independent of $\{\mathcal F_t\}$" means "$\varphi(t)$ is adapted to a counter $\{\mathcal F_t\}$-filtration".

2.2 General Stochastic Integral

The general stochastic integral for both adapted and non-adapted processes was first introduced in [1] by W. Ayed and H.-H. Kuo.

Definition 2.2.1 (Ayed–Kuo [1]). The stochastic integral of $f(t)\varphi(t)$, where $f$ is adapted and $\varphi$ is instantly independent, is defined by
\[ \int_a^b f(t)\varphi(t)\,dB(t) = \lim_{\|\Delta_n\|\to 0}\sum_{j=1}^n f(t_{j-1})\varphi(t_j)\big(B(t_j)-B(t_{j-1})\big), \]
provided that the limit in probability exists.

Definition 2.2.2 (Hwang–Kuo–Saitô–Zhai [14]). Suppose $\Phi(t)$, $a\le t\le b$, is a stochastic process of the following form
\[ \Phi(t) = \sum_{i=1}^m f_i(t)\varphi_i(t), \tag{2.2.1} \]
where the $f_i(t)$'s are $\{\mathcal F_t\}$-adapted continuous stochastic processes and the $\varphi_i(t)$'s are continuous stochastic processes instantly independent of $\{\mathcal F_t\}$. The stochastic integral of $\Phi(t)$ is defined by
\[ \int_a^b \Phi(t)\,dB(t) = \sum_{i=1}^m \int_a^b f_i(t)\varphi_i(t)\,dB(t), \]
where the integrals $\int_a^b f_i(t)\varphi_i(t)\,dB(t)$ are defined by
\[ \int_a^b f_i(t)\varphi_i(t)\,dB(t) = \lim_{\|\Delta_n\|\to 0}\sum_{j=1}^n f_i(t_{j-1})\varphi_i(t_j)\big(B(t_j)-B(t_{j-1})\big), \]
provided that the limits in probability exist.

Intuitively, the new stochastic integral uses the left endpoint for the Itô (adapted) part and the right endpoint for the counter (instantly independent) part as the evaluation points of integration. In comparison, the Itô integral has only an adapted part, and thus uses the left endpoint. However, it is not proved in [1] that the new integral is well-defined. We fill this gap in the following lemma.

Lemma 2.2.3 (Hwang–Kuo–Saitô–Zhai [14]). Let $f_i(t)$, $1\le i\le m$, and $g_j(t)$, $1\le j\le n$, be $\{\mathcal F_t\}$-adapted continuous stochastic processes, and let $\varphi_i(t)$, $1\le i\le m$, and $\xi_j(t)$, $1\le j\le n$, be continuous stochastic processes instantly independent of $\{\mathcal F_t\}$. Suppose the stochastic integrals $\int_a^b f_i(t)\varphi_i(t)\,dB(t)$ and $\int_a^b g_j(t)\xi_j(t)\,dB(t)$ exist for $1\le i\le m$, $1\le j\le n$. Assume that
\[ \sum_{i=1}^m f_i(t)\varphi_i(t) = \sum_{j=1}^n g_j(t)\xi_j(t), \quad a\le t\le b. \]
Then the following equality holds:
\[ \sum_{i=1}^m\int_a^b f_i(t)\varphi_i(t)\,dB(t) = \sum_{j=1}^n\int_a^b g_j(t)\xi_j(t)\,dB(t). \tag{2.2.2} \]

Remark 2.2.4. When $f(t)\varphi(t) = g(t)\xi(t)$ (Case 1 in the proof below), we do not need to assume the existence of $\int_a^b f(t)\varphi(t)\,dB(t)$ and $\int_a^b g(t)\xi(t)\,dB(t)$ as limits in probability. Equation (2.2.2) for this case is understood to mean that if the limit in probability on one side of the equality exists, then it also exists on the other side and the equality holds.

Proof. We divide the proof into several cases.

Case 1: $m = n = 1$.

Suppose $f(t)\varphi(t) = g(t)\xi(t)$, $a\le t\le b$. Without loss of generality, we assume that $f(t), \varphi(t), g(t), \xi(t)$ are non-zero. Then
\[ \frac{f(t)}{g(t)} = \frac{\xi(t)}{\varphi(t)}. \]
Note that in the above equation, the left-hand side is $\{\mathcal F_t\}$-adapted and the right-hand side is instantly independent of $\{\mathcal F_t\}$. However, a continuous stochastic process that is both adapted to and instantly independent of $\{\mathcal F_t\}$ must be a continuous deterministic function. Denote this function by $a(t)$. Then
\[ \frac{f(t)}{g(t)} = \frac{\xi(t)}{\varphi(t)} = a(t). \]
So
\[ f(t) = a(t)g(t) \quad\text{and}\quad \varphi(t) = \frac{1}{a(t)}\,\xi(t). \]
Thus, for any partition $a = t_0 < t_1 < \cdots < t_l = b$,
\[
\begin{aligned}
\int_a^b f(t)\varphi(t)\,dB(t) &\approx \sum_{i=1}^l f(t_{i-1})\varphi(t_i)\big(B(t_i)-B(t_{i-1})\big)\\
&\approx \sum_{i=1}^l a(t_{i-1})g(t_{i-1})\,\frac{1}{a(t_i)}\,\xi(t_i)\big(B(t_i)-B(t_{i-1})\big)\\
&\approx \sum_{i=1}^l g(t_{i-1})\xi(t_i)\big(B(t_i)-B(t_{i-1})\big)\\
&\approx \int_a^b g(t)\xi(t)\,dB(t),
\end{aligned}
\]
which proves the lemma for this case.

Case 2: $m = 1$, $n = 2$.

Assume
\[ f(t)\varphi(t) = g_1(t)\xi_1(t) + g_2(t)\xi_2(t). \tag{2.2.3} \]

Let $a(t) = E\varphi(t)$, $b_1(t) = E\xi_1(t)$, $b_2(t) = E\xi_2(t)$, and assume without loss of generality that $a(t), b_1(t), b_2(t)$ are non-zero. Then, by taking the conditional expectation of Equation (2.2.3) with respect to $\{\mathcal F_t\}$, we get
\[ f(t)a(t) = g_1(t)b_1(t) + g_2(t)b_2(t). \tag{2.2.4} \]

Let $\tilde a(t) = a(t)/b_2(t)$ and $\tilde b_1(t) = b_1(t)/b_2(t)$. Then we have
\[ g_2(t) = f(t)\tilde a(t) - g_1(t)\tilde b_1(t). \tag{2.2.5} \]

Substitute $g_2(t)$ from Equation (2.2.5) into Equation (2.2.3) to get
\[ f(t)\big(\varphi(t) - \tilde a(t)\xi_2(t)\big) = g_1(t)\big(\xi_1(t) - \tilde b_1(t)\xi_2(t)\big). \tag{2.2.6} \]

By Remark 2.2.4, we can apply Case 1. Hence
\[ \int_a^b f(t)\big(\varphi(t) - \tilde a(t)\xi_2(t)\big)\,dB(t) = \int_a^b g_1(t)\big(\xi_1(t) - \tilde b_1(t)\xi_2(t)\big)\,dB(t). \tag{2.2.7} \]

But we can use Definition 2.2.1 to see that
\[ \int_a^b f(t)\big(\varphi(t) - \tilde a(t)\xi_2(t)\big)\,dB(t) = \int_a^b f(t)\varphi(t)\,dB(t) - \int_a^b f(t)\tilde a(t)\xi_2(t)\,dB(t), \tag{2.2.8} \]
and
\[ \int_a^b g_1(t)\big(\xi_1(t) - \tilde b_1(t)\xi_2(t)\big)\,dB(t) = \int_a^b g_1(t)\xi_1(t)\,dB(t) - \int_a^b g_1(t)\tilde b_1(t)\xi_2(t)\,dB(t), \tag{2.2.9} \]
since both sides have the same limit in probability. It follows from Equations (2.2.5), (2.2.7), (2.2.8), and (2.2.9) that
\[
\begin{aligned}
\int_a^b f(t)\varphi(t)\,dB(t) &= \int_a^b g_1(t)\xi_1(t)\,dB(t) + \int_a^b \big(f(t)\tilde a(t) - g_1(t)\tilde b_1(t)\big)\xi_2(t)\,dB(t)\\
&= \int_a^b g_1(t)\xi_1(t)\,dB(t) + \int_a^b g_2(t)\xi_2(t)\,dB(t),
\end{aligned}
\]
which proves Case 2.

Case 3: $m = 1$, $n \ge 3$.

Assume
\[ f(t)\varphi(t) = \sum_{j=1}^n g_j(t)\xi_j(t). \]
Then, similar to Equation (2.2.5), we have
\[ g_n(t) = f(t)\tilde a(t) - \sum_{j=1}^{n-1} g_j(t)\tilde b_j(t), \]
where $\tilde a(t) = E\varphi(t)/E\xi_n(t)$ and $\tilde b_j(t) = E\xi_j(t)/E\xi_n(t)$, $1\le j\le n-1$. Then, similar to Equation (2.2.6), we have
\[ f(t)\big(\varphi(t) - \tilde a(t)\xi_n(t)\big) = \sum_{j=1}^{n-1} g_j(t)\big(\xi_j(t) - \tilde b_j(t)\xi_n(t)\big). \]
Thus, this case has been reduced to the case for $n-1$. Hence we can use the same arguments as those in Case 2 and induction to prove the lemma for this case.

Case 4: $m \ge 2$, $n \ge 1$.

Assume
\[ \sum_{i=1}^m f_i(t)\varphi_i(t) = \sum_{j=1}^n g_j(t)\xi_j(t), \]
which can be rewritten as
\[ f_1(t)\varphi_1(t) = \sum_{j=1}^n g_j(t)\xi_j(t) - \sum_{i=2}^m f_i(t)\varphi_i(t). \]

Therefore, this case is reduced to Case 3, and the lemma is proved in the general case.

In general, if a stochastic process $\Phi(t)$ can be written as a series of terms of the form $f(t)\varphi(t)$ in $L^2(\Omega)$, we define the stochastic integral of $\Phi(t)$ to be the sum of the series in probability.

Definition 2.2.5 (Hwang–Kuo–Saitô–Zhai [14]). Suppose $\Phi(t)$, $a\le t\le b$, is a stochastic process and there exists a sequence $\{\Phi_n(t)\}_{n=1}^\infty$ of stochastic processes of the form in Equation (2.2.1) satisfying the conditions:

1. $\int_a^b |\Phi(t) - \Phi_n(t)|^2\,dt \to 0$ almost surely;

2. $\int_a^b \Phi_n(t)\,dB(t)$ converges in probability.

Then the stochastic integral of $\Phi(t)$ is defined by
\[ \int_a^b \Phi(t)\,dB(t) = \lim_{n\to\infty}\int_a^b \Phi_n(t)\,dB(t), \quad\text{in probability}. \tag{2.2.10} \]

Example 2.2.6. We have
\[ \int_0^t B(1)\,dB(s) = \int_0^t \big( B(s) + (B(1) - B(s))\big)\,dB(s) = B(1)B(t) - t, \tag{2.2.11} \]
for $0\le t\le 1$.

2.3 Itô Isometry for the General Stochastic Integral

It is mentioned in Theorem 1.4.4 that the classical Itô integration operator is an isometry. In [25], an isometry equality for the general stochastic integral is proved, and it coincides with the isometry equality obtained in the white noise distribution theory.

Theorem 2.3.1 ([25]). Let $f(t)$ and $\varphi(t)$ be stochastic processes adapted to and instantly independent of $\{\mathcal F_t\}$, respectively. Then we have
\[ E\Big[\int_0^T f(t)\varphi(t)\,dB(t)\Big] = 0. \]

Theorem 2.3.2 ([25]). Let $f$ and $\varphi$ be as above. In addition, assume that $f$ and $\varphi$ are equal to their Maclaurin series on $\mathbb R$. Let $B(t)$ be a Brownian motion. Then we have
\[
\begin{aligned}
E\Big[\Big(\int_0^T f(B(t))\,\varphi(B(T)-B(t))\,dB(t)\Big)^2\Big] &= \int_0^T E\big[\big( f(B(t))\,\varphi(B(T)-B(t))\big)^2\big]\,dt\\
&\quad + 2\int_0^T\!\!\int_0^t E\big[ f(B(s))\varphi'(B(T)-B(s))\,f'(B(t))\varphi(B(T)-B(t))\big]\,ds\,dt.
\end{aligned}
\tag{2.3.1}
\]

2.4 A General Itô Formula

The Itô formula plays a central role in both the theory and applications of Itô calculus. For the general stochastic integral defined in Section 2.2, we proved an Itô formula in [14]. Consider the following stochastic processes
\[ X_t = X_a + \int_a^t g(s)\,dB(s) + \int_a^t h(s)\,ds, \tag{2.4.1} \]
\[ Y^{(t)} = Y^{(b)} + \int_t^b \xi(s)\,dB(s) + \int_t^b \eta(s)\,ds, \tag{2.4.2} \]
where $g(t)$ and $h(t)$ are $\{\mathcal F_t\}$-adapted so that $X_t$ is an Itô process, and $\xi(t)$ and $\eta(t)$ are instantly independent of $\{\mathcal F_t\}$ such that $Y^{(t)}$ is also instantly independent of $\{\mathcal F_t\}$. Thus $Y^{(t)}$ is a stochastic process in the counter part. Then we have the following theorem, with the proof from [14].

Theorem 2.4.1. Let $X_t$, $a\le t\le b$, be an Itô process given by Equation (2.4.1) and $Y^{(t)}$, $a\le t\le b$, an instantly independent process given by Equation (2.4.2). Suppose $\theta(x,y)$ is a real-valued $C^2$-function on $\mathbb R^2$. Then the following equality holds for $a\le t\le b$:
\[
\begin{aligned}
\theta(X_t, Y^{(t)}) = \theta(X_a, Y^{(a)}) &+ \int_a^t \theta_x(X_s, Y^{(s)})\,dX_s + \frac12\int_a^t \theta_{xx}(X_s, Y^{(s)})\,(dX_s)^2\\
&+ \int_a^t \theta_y(X_s, Y^{(s)})\,dY^{(s)} - \frac12\int_a^t \theta_{yy}(X_s, Y^{(s)})\,(dY^{(s)})^2,
\end{aligned}
\]
which can be expressed symbolically in terms of stochastic differentials as
\[ d\theta(X_t, Y^{(t)}) = \theta_x\,dX_t + \frac12\,\theta_{xx}\,(dX_t)^2 + \theta_y\,dY^{(t)} - \frac12\,\theta_{yy}\,(dY^{(t)})^2. \tag{2.4.3} \]

Proof. Since $\theta(x,y)$ is a real-valued $C^2$-function, we have the estimate
\[
\begin{aligned}
\theta(x,y) \approx \theta(x_0,y_0) &+ \theta_x(x_0,y_0)(x-x_0) + \theta_y(x_0,y_0)(y-y_0)\\
&+ \frac12\,\theta_{xx}(x_0,y_0)(x-x_0)^2 + \frac12\,\theta_{yy}(x_0,y_0)(y-y_0)^2\\
&+ \theta_{xy}(x_0,y_0)(x-x_0)(y-y_0).
\end{aligned}
\tag{2.4.4}
\]

Now, we calculate the stochastic differential $d\theta(X_t, Y^{(t)})$. For any partition $a = t_0 < t_1 < \cdots < t_n = t$ of the interval $[a,t]$, we have
\[ \theta(X_t, Y^{(t)}) = \theta(X_a, Y^{(a)}) + \sum_{i=1}^n\Big[\theta(X_{t_i}, Y^{(t_i)}) - \theta(X_{t_{i-1}}, Y^{(t_{i-1})})\Big]. \tag{2.4.5} \]
Then for each $i = 1, 2, \dots, n$, by Equation (2.4.4), we get
\[
\begin{aligned}
\theta(X_{t_i}, Y^{(t_i)}) &- \theta(X_{t_{i-1}}, Y^{(t_{i-1})})\\
&\approx \theta_x(X_{t_{i-1}}, Y^{(t_{i-1})})(X_{t_i}-X_{t_{i-1}}) + \theta_y(X_{t_{i-1}}, Y^{(t_{i-1})})(Y^{(t_i)}-Y^{(t_{i-1})})\\
&\quad + \frac12\,\theta_{xx}(X_{t_{i-1}}, Y^{(t_{i-1})})(X_{t_i}-X_{t_{i-1}})^2 + \frac12\,\theta_{yy}(X_{t_{i-1}}, Y^{(t_{i-1})})(Y^{(t_i)}-Y^{(t_{i-1})})^2\\
&\quad + \theta_{xy}(X_{t_{i-1}}, Y^{(t_{i-1})})(X_{t_i}-X_{t_{i-1}})(Y^{(t_i)}-Y^{(t_{i-1})})\\
&= \mathrm I_i + \mathrm{II}_i + \mathrm{III}_i + \mathrm{IV}_i + \mathrm V_i,
\end{aligned}
\tag{2.4.6}
\]
where $\mathrm I_i, \mathrm{II}_i, \mathrm{III}_i, \mathrm{IV}_i, \mathrm V_i$ are defined term by term in the corresponding order.

Now, we consider the summations of these terms over $i$. For the first term in Equation (2.4.6), we have
\[
\begin{aligned}
\sum_{i=1}^n \mathrm I_i &= \sum_{i=1}^n \theta_x(X_{t_{i-1}}, Y^{(t_{i-1})})(X_{t_i}-X_{t_{i-1}})\\
&\approx \sum_{i=1}^n\Big[\theta_x(X_{t_{i-1}}, Y^{(t_i)}) - \theta_{xy}(X_{t_{i-1}}, Y^{(t_i)})(Y^{(t_i)}-Y^{(t_{i-1})})\Big](X_{t_i}-X_{t_{i-1}})\\
&\to \int_a^t \theta_x(X_s, Y^{(s)})\,dX_s - \int_a^t \theta_{xy}(X_s, Y^{(s)})\,(dX_s)(dY^{(s)}).
\end{aligned}
\tag{2.4.7}
\]

For the second term in Equation (2.4.6), we have
\[
\begin{aligned}
\sum_{i=1}^n \mathrm{II}_i &= \sum_{i=1}^n \theta_y(X_{t_{i-1}}, Y^{(t_{i-1})})(Y^{(t_i)}-Y^{(t_{i-1})})\\
&\approx \sum_{i=1}^n\Big[\theta_y(X_{t_{i-1}}, Y^{(t_i)}) - \theta_{yy}(X_{t_{i-1}}, Y^{(t_i)})(Y^{(t_i)}-Y^{(t_{i-1})})\Big](Y^{(t_i)}-Y^{(t_{i-1})})\\
&\to \int_a^t \theta_y(X_s, Y^{(s)})\,dY^{(s)} - \int_a^t \theta_{yy}(X_s, Y^{(s)})\,(dY^{(s)})^2.
\end{aligned}
\tag{2.4.8}
\]

For the third term in Equation (2.4.6), we have
\[ \sum_{i=1}^n \mathrm{III}_i = \frac12\sum_{i=1}^n \theta_{xx}(X_{t_{i-1}}, Y^{(t_{i-1})})(X_{t_i}-X_{t_{i-1}})^2 \to \frac12\int_a^t \theta_{xx}(X_s, Y^{(s)})\,(dX_s)^2. \tag{2.4.9} \]
Note that we do not need to change $\theta_{xx}(X_{t_{i-1}}, Y^{(t_{i-1})})$ to $\theta_{xx}(X_{t_{i-1}}, Y^{(t_i)})$, since the integrator $(dX_s)^2 = g(s)^2\,ds$ by Equation (2.4.1). Similarly, for the fourth term in Equation (2.4.6), we have
\[ \sum_{i=1}^n \mathrm{IV}_i = \frac12\sum_{i=1}^n \theta_{yy}(X_{t_{i-1}}, Y^{(t_{i-1})})(Y^{(t_i)}-Y^{(t_{i-1})})^2 \to \frac12\int_a^t \theta_{yy}(X_s, Y^{(s)})\,(dY^{(s)})^2. \tag{2.4.10} \]

For the fifth term in Equation (2.4.6), we have
\[ \sum_{i=1}^n \mathrm V_i = \sum_{i=1}^n \theta_{xy}(X_{t_{i-1}}, Y^{(t_{i-1})})(X_{t_i}-X_{t_{i-1}})(Y^{(t_i)}-Y^{(t_{i-1})}) \to \int_a^t \theta_{xy}(X_s, Y^{(s)})\,(dX_s)(dY^{(s)}). \tag{2.4.11} \]

Finally, we sum up Equations (2.4.7)-(2.4.11); the $\theta_{xy}$ terms in (2.4.7) and (2.4.11) cancel, and we obtain the stochastic differential of $\theta(X_t, Y^{(t)})$. Equation (2.4.3) and the theorem follow.

It is easy to generalize the Itô formula to multiple stochastic processes.

Theorem 2.4.2. Let $X_t^{(i)}$, $1\le i\le n$, be Itô processes given by Equation (2.4.1) and $Y_j^{(t)}$, $1\le j\le m$, instantly independent processes given by Equation (2.4.2). Suppose $\theta(t, x_1, \dots, x_n, y_1, \dots, y_m)$ is a real-valued function that is $C^1$ in $t$ and $C^2$ in the $x_i$ and $y_j$. Then the stochastic differential of $\theta(t, X_t^{(1)}, \dots, X_t^{(n)}, Y_1^{(t)}, \dots, Y_m^{(t)})$ is given by
\[
\begin{aligned}
d\theta(t, X_t^{(1)}, \dots, X_t^{(n)}, Y_1^{(t)}, \dots, Y_m^{(t)}) &= \theta_t\,dt + \sum_{i=1}^n \theta_{x_i}\,dX_t^{(i)} + \frac12\sum_{i,k=1}^n \theta_{x_i x_k}\,(dX_t^{(i)})(dX_t^{(k)})\\
&\quad + \sum_{j=1}^m \theta_{y_j}\,dY_j^{(t)} - \frac12\sum_{j,l=1}^m \theta_{y_j y_l}\,(dY_j^{(t)})(dY_l^{(t)}).
\end{aligned}
\tag{2.4.12}
\]

Corollary 2.4.3. Let $X_t, Y_t$ be Itô processes given by Equation (2.4.1), and $X^{(t)}, Y^{(t)}$ instantly independent processes given by Equation (2.4.2). Then we have the following product rules:
\[ d(X_t Y_t) = X_t\,dY_t + Y_t\,dX_t + (dX_t)(dY_t), \]
\[ d(X^{(t)} Y^{(t)}) = X^{(t)}\,dY^{(t)} + Y^{(t)}\,dX^{(t)} - (dX^{(t)})(dY^{(t)}), \]
\[ d(X_t Y^{(t)}) = X_t\,dY^{(t)} + Y^{(t)}\,dX_t. \]
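The third product rule is exact even at the discrete level: with the left endpoint for the adapted factor and the right endpoint for the counter factor, the cross terms telescope and no correction term appears. A small pathwise sketch (illustrative, with $X_t = B(t)$ and $Y^{(t)} = B(1) - B(t)$; not from the original text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])

X = B                # adapted Ito process X_t = B(t)
Y = B[-1] - B        # instantly independent Y^(t) = B(1) - B(t)

# Left endpoint for the adapted factor, right endpoint for the counter factor:
#   X_{t_{i-1}} (Y^(t_i) - Y^(t_{i-1})) + Y^(t_i) (X_{t_i} - X_{t_{i-1}})
increments = X[:-1] * np.diff(Y) + Y[1:] * np.diff(X)

lhs = X[-1] * Y[-1] - X[0] * Y[0]    # telescoping sum of d(X_t Y^(t))
print(abs(lhs - increments.sum()))   # zero up to rounding: no correction term
```

Each summand satisfies $X_{t_{i-1}}\Delta Y_i + Y^{(t_i)}\Delta X_i = X_{t_i}Y^{(t_i)} - X_{t_{i-1}}Y^{(t_{i-1})}$ identically, which is precisely why this endpoint convention removes the $(dX_t)(dY^{(t)})$ term.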

Chapter 3
Near-Martingale Property of Anticipating Stochastic Integration

3.1 Near-Martingale Property

In Section 1.4, we have seen from Theorem 1.4.5 the martingale property of a stochastic process defined by the classical Itô integral. However, one would not expect this property for a non-adapted process, let alone its general stochastic integral. Nevertheless, a similar property holds for general stochastic integrals. We discuss this property in this chapter. Recall the definition of the martingale property. It is equivalent to the following three conditions:

1. For each $t\in[0,T]$, $E|X_t| < \infty$;

2. $X_t$ is adapted to $\{\mathcal F_t\}$;

3. $E[X_t - X_s\,|\,\mathcal F_s] = 0$ for all $0\le s < t\le T$.

The main reason that a stochastic process defined by the general stochastic integral fails to be a martingale is that Condition 2 above does not hold for the counter part. So we look for a property that retains Conditions 1 and 3. It was first introduced in [24] by H.-H. Kuo, A. Sae-Tang and B. Szozda.

Definition 3.1.1 (Kuo–Sae-Tang–Szozda [24]). A stochastic process $X_t$ is called a near-martingale with respect to a filtration $\{\mathcal F_t\}$ if $E|X_t| < \infty$ for all $t\in[a,b]$ and $E[X_t - X_s\,|\,\mathcal F_s] = 0$ for all $a\le s < t\le b$.

The following theorem gives an intrinsic characterization of a near-martingale.

Theorem 3.1.2 (Hwang–Kuo–Saitô–Zhai [15]). Let $X_t$, $a\le t\le b$, be a stochastic process with $E|X_t| < \infty$ for each $t\in[a,b]$ and let $Y_t = E[X_t\,|\,\mathcal F_t]$. Then $X_t$ is a near-martingale if and only if $Y_t$ is a martingale.

Proof. First assume that $X_t$ is a near-martingale. Then for any $s\le t$ we have
\[ E[Y_t\,|\,\mathcal F_s] = E\big[ E[X_t\,|\,\mathcal F_t]\,\big|\,\mathcal F_s\big] = E[X_t\,|\,\mathcal F_s] = E[X_s\,|\,\mathcal F_s] = Y_s. \]
So $Y_t$ is a martingale. On the other hand, if $Y_t$ is a martingale, then for any $s\le t$, we have
\[ E[X_t\,|\,\mathcal F_s] = E\big[ E[X_t\,|\,\mathcal F_t]\,\big|\,\mathcal F_s\big] = E[Y_t\,|\,\mathcal F_s] = Y_s = E[X_s\,|\,\mathcal F_s], \]
which implies that $X_t$ is a near-martingale.

Example 3.1.3. From Example 2.2.6, we have seen the general stochastic integral
\[ X_t = \int_0^t B(1)\,dB(s) = B(1)B(t) - t. \]
Then
\[ Y_t = E[X_t\,|\,\mathcal F_t] = E[B(1)B(t) - t\,|\,\mathcal F_t] = B(t)\,E[B(1)\,|\,\mathcal F_t] - t = B(t)^2 - t. \]
But
\[
\begin{aligned}
E[Y_t\,|\,\mathcal F_s] &= E[B(t)^2 - t\,|\,\mathcal F_s]\\
&= E\big[(B(t) - B(s) + B(s))^2\,\big|\,\mathcal F_s\big] - t\\
&= E\big[(B(t)-B(s))^2\,\big|\,\mathcal F_s\big] + 2E\big[(B(t)-B(s))B(s)\,\big|\,\mathcal F_s\big] + E\big[ B(s)^2\,\big|\,\mathcal F_s\big] - t\\
&= E(B(t)-B(s))^2 + 2B(s)\,E(B(t)-B(s)) + B(s)^2 - t\\
&= (t - s) + 0 + B(s)^2 - t\\
&= B(s)^2 - s = Y_s,
\end{aligned}
\]
which means $Y_t$ is a martingale. By Theorem 3.1.2, $X_t$ is a near-martingale.

Example 3.1.3 suggests that, similar to the martingale property of the Itô integral shown in Theorem 1.4.5, a stochastic process defined by the general stochastic integral should be a near-martingale. In [24], the authors proved this conjecture.

Theorem 3.1.4 (Kuo–Sae-Tang–Szozda [24]). The stochastic process $X_t$ defined in Definition 2.2.1, i.e.,
\[ X_t = \int_a^t f(s)\varphi(s)\,dB(s), \tag{3.1.1} \]
is a near-martingale with respect to $\{\mathcal F_t\}$.

Moreover, $X_t$ defined in Theorem 3.1.4 is also a near-martingale with respect to a counter $\{\mathcal F_t\}$-filtration $\{\mathcal G^{(t)}\}$.

Theorem 3.1.5 (Kuo–Sae-Tang–Szozda [24]). The stochastic process $X_t$ defined in Theorem 3.1.4 is a near-martingale with respect to a counter $\{\mathcal F_t\}$-filtration $\{\mathcal G^{(t)}\}$.

It is natural to extend the submartingale and supermartingale properties (Definitions 1.3.11 and 1.3.12) to general stochastic integrals.

Definition 3.1.6 (Hwang–Kuo–Saitô–Zhai [15]). A stochastic process $X_t$, $a\le t\le b$, with $E|X_t| < \infty$ for all $t$, is called a near-submartingale with respect to a filtration $\{\mathcal F_t\}$ if, for all $a\le s < t\le b$, we have
\[ E[X_t\,|\,\mathcal F_s] \ge E[X_s\,|\,\mathcal F_s], \quad\text{almost surely}, \]
or equivalently,
\[ E[X_t - X_s\,|\,\mathcal F_s] \ge 0, \quad\text{almost surely}. \tag{3.1.2} \]

Definition 3.1.7 (Hwang–Kuo–Saitô–Zhai [15]). A stochastic process $X_t$, $a\le t\le b$, with $E|X_t| < \infty$ for all $t$, is called a near-supermartingale with respect to a filtration $\{\mathcal F_t\}$ if, for all $a\le s < t\le b$, we have
\[ E[X_t\,|\,\mathcal F_s] \le E[X_s\,|\,\mathcal F_s], \quad\text{almost surely}, \]
or equivalently,
\[ E[X_t - X_s\,|\,\mathcal F_s] \le 0, \quad\text{almost surely}. \tag{3.1.3} \]

By similar arguments, the intrinsic characterizations of near-submartingales and near-supermartingales can be proved.

Theorem 3.1.8 (Hwang–Kuo–Saitô–Zhai [15]). Let $X_t$, $a\le t\le b$, be a stochastic process with $E|X_t| < \infty$ for each $t\in[a,b]$ and let $Y_t = E[X_t\,|\,\mathcal F_t]$. Then $X_t$ is a near-submartingale or near-supermartingale if and only if $Y_t$ is a submartingale or supermartingale, respectively.

3.2 Doob–Meyer's Decomposition for Near-Submartingales

In Theorem 1.3.14, we have seen a unique decomposition of a submartingale. Its analogue, the near-submartingale, has a similar decomposition. The Doob–Meyer decomposition for a near-submartingale random sequence $\{X_n\}_{n=1}^\infty$ with respect to a filtration sequence $\{\mathcal F_n\}_{n=1}^\infty$ is given in [27]. In this section, we give the Doob–Meyer decomposition for near-submartingale processes.

Theorem 3.2.1 (Hwang–Kuo–Saitô–Zhai [15]). Let $X_t$, $a\le t\le b$, be a continuous near-submartingale with respect to a continuous filtration $\{\mathcal F_t;\ a\le t\le b\}$. Then $X_t$ has a unique decomposition
\[ X_t = M_t + A_t, \quad a\le t\le b, \tag{3.2.1} \]
where $M_t$ is a continuous near-martingale with respect to $\{\mathcal F_t\}$, and $A_t$ is a continuous stochastic process satisfying the conditions:

1. $A_a = 0$;

2. $A_t$ is increasing in $t$ almost surely;

3. $A_t$ is adapted to $\{\mathcal F_t\}$.

Remark 3.2.2. Notice that the three conditions on $A_t$ in the theorem are the same as the conditions on the compensator in the Doob–Meyer decomposition for submartingales, and that $A_t$ indeed plays the same role in the decompositions. So we also call $A_t$ the compensator of the near-submartingale $X_t$.

Proof. We prove the uniqueness first. Suppose
\[ X_t = M_t + A_t = \widetilde M_t + \widetilde A_t, \]
where $M_t, \widetilde M_t$ are continuous near-martingales and $A_t, \widetilde A_t$ are continuous stochastic processes satisfying the three conditions in the theorem. By taking the conditional expectation, we have
\[ E[X_t\,|\,\mathcal F_t] = E[M_t\,|\,\mathcal F_t] + A_t = E[\widetilde M_t\,|\,\mathcal F_t] + \widetilde A_t. \]
By Theorem 3.1.2, $E[M_t\,|\,\mathcal F_t]$ and $E[\widetilde M_t\,|\,\mathcal F_t]$ are martingales, and by Theorem 3.1.8, $E[X_t\,|\,\mathcal F_t]$ is a submartingale. Thus, the above equation provides two Doob–Meyer decompositions of the submartingale $E[X_t\,|\,\mathcal F_t]$. So $A_t = \widetilde A_t$ by Theorem 1.3.14, and then $M_t = \widetilde M_t$ by the decomposition of $X_t$. The uniqueness is proved.

To prove the existence, let $Y_t = E[X_t\,|\,\mathcal F_t]$. By Theorem 3.1.8, $Y_t$ is a submartingale. By Theorem 1.3.14, the Doob–Meyer decomposition for submartingales, we have $Y_t = N_t + A_t$, where $N_t$ is a continuous martingale and $A_t$ is a continuous stochastic process satisfying the three conditions in the theorem. Define
\[ M_t = X_t - E[X_t\,|\,\mathcal F_t] + N_t. \]
Then we have
\[ X_t = M_t + E[X_t\,|\,\mathcal F_t] - N_t = M_t + Y_t - N_t = M_t + A_t, \tag{3.2.2} \]
and
\[ E[M_t\,|\,\mathcal F_t] = E[X_t\,|\,\mathcal F_t] - E\big[ E[X_t\,|\,\mathcal F_t]\,\big|\,\mathcal F_t\big] + E[N_t\,|\,\mathcal F_t] = E[N_t\,|\,\mathcal F_t] = N_t. \]
But $N_t$ is a martingale, so $M_t$ is a near-martingale by Theorem 3.1.2. Therefore, Equation (3.2.2) provides a Doob–Meyer decomposition for the near-submartingale $X_t$. This proves the existence and completes the proof.

3.3 General Girsanov Theorem

The most important application of the Itô formula is the Girsanov theorem (see Section 1.5). It plays a crucial role in the transformation of probability measures and in applications to mathematical finance. For the general stochastic integral, H.-H. Kuo, Y. Peng and B. Szozda proved in [23] a general Girsanov theorem by applying the general Itô formula. Recall that in the proof of the Girsanov theorem, the key tool is Lévy's Characterization Theorem, which characterizes a Brownian motion.

Theorem 3.3.1 (Lévy's Characterization Theorem). A stochastic process $X_t$ is a Brownian motion if and only if there exist a probability measure $Q$ and a filtration $\{\mathcal F_t\}$ such that

1. $X_t$ is a continuous martingale with respect to $\{\mathcal F_t\}$ under $Q$;

2. $Q(X_0 = 0) = 1$;

3. The quadratic variation $\langle X\rangle_t$ of $X_t$ with respect to $Q$ on the interval $[0,t]$ is equal to $t$ almost surely.

As discussed in Section 3.1, we cannot expect the martingale property in the non-adapted case. So Condition 1 must be relaxed to the near-martingale property. The following theorem shows the analogous conclusion and gives the construction of the corresponding probability measure $Q$.

Theorem 3.3.2 (Kuo–Peng–Szozda [23]). Suppose that $B(t)$ is a Brownian motion and $\{\mathcal F_t\}$ is its natural filtration on a probability space $(\Omega, \mathcal F, P)$. Let $f(t)$ and $g(t)$ be continuous square-integrable stochastic processes such that $f(t)$ is adapted to $\{\mathcal F_t\}$ and $g(t)$ is adapted to a counter $\{\mathcal F_t\}$-filtration $\{\mathcal G^{(t)}\}$, such that $E[\mathcal E_f(t)] < \infty$ and $E[\mathcal E^g(t)] < \infty$ for all $t > 0$, where
\[ \mathcal E_f(t) = \exp\Big\{\int_0^t f(s)\,dB(s) - \frac12\int_0^t f(s)^2\,ds\Big\}, \]
and
\[ \mathcal E^g(t) = \exp\Big\{-\int_t^T g(s)\,dB(s) - \frac12\int_t^T g(s)^2\,ds\Big\}. \]
Let
\[ \widetilde B_t = B_t + \int_0^t \big( f(s) + g(s)\big)\,ds. \]
Then $\widetilde B_t$ is a near-martingale with respect to $(\Omega, \mathcal F, Q)$, where
\[ dQ = \exp\Big\{-\int_0^T \big( f(t)+g(t)\big)\,dB_t - \frac12\int_0^T \big( f(t)+g(t)\big)^2\,dt\Big\}\,dP. \tag{3.3.1} \]

By the construction of $\widetilde B$, Condition 2 in Lévy's Characterization Theorem also holds. The following theorem gives the analogue of Condition 3.

Theorem 3.3.3 (Kuo–Peng–Szozda [23]). Suppose that the assumptions of Theorem 3.3.2 hold. Then the $Q$-quadratic variation of $\widetilde B$ on the interval $[0,t]$ is equal to $t$.
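For a deterministic drift $f + g$ the density in (3.3.1) can be sampled directly. The following Monte Carlo sketch is an illustrative special case, not the general theorem: it takes $f(t) + g(t) \equiv 1$ on $[0, T]$ and checks that $Q$ has total mass one and that an increment of $\widetilde B_t = B_t + t$ has mean zero under $Q$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 1_000_000, 1.0
c = 1.0                                   # f(t) + g(t) = c, deterministic here
s, t = 0.25, 0.75

d1 = rng.normal(0.0, np.sqrt(s), N)       # B_s
d2 = rng.normal(0.0, np.sqrt(t - s), N)   # B_t - B_s
d3 = rng.normal(0.0, np.sqrt(T - t), N)   # B_T - B_t
BT = d1 + d2 + d3

# Radon-Nikodym density (3.3.1); with a deterministic integrand the
# stochastic integral is simply c * B_T.
w = np.exp(-c * BT - 0.5 * c**2 * T)

print(w.mean())                           # close to 1: Q is a probability measure
# Increment of B~_t = B_t + c t; weighting by w estimates its Q-mean, which is 0.
print((w * (d2 + c * (t - s))).mean())    # close to 0
```

In this deterministic special case $\widetilde B$ is in fact a $Q$-Brownian motion by the classical Girsanov theorem; the interest of Theorem 3.3.2 is that the near-martingale conclusion survives when $g$ is genuinely anticipating.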

Chapter 4
Application to Stochastic Differential Equations

4.1 Stochastic Differential Equations for Exponential Processes

As shown in Example 1.4.12, exponential processes play a central role in the transformation of probability measures and the Girsanov theorem. Although the general Girsanov theorem shown in Section 3.3 is proved, its proof does not use exponential processes but rather the classical Girsanov theorem. This means that a full understanding of the measure transformation and the exponential process is still missing. In this section, we give some results in the study of non-adapted exponential processes using the general Itô formula.

Recall from Example 1.4.12 that the exponential process for $f \in L^2_{ad}(\Omega, L^2[0,1])$,
\[ \mathcal E_f(t) = \exp\Big\{\int_0^t f(s)\,dB(s) - \frac12\int_0^t f(s)^2\,ds\Big\}, \]
satisfies the stochastic differential equation
\[ \begin{cases} dX(t) = f(t)X(t)\,dB(t), & 0\le t\le 1,\\ X(0) = 1. \end{cases} \tag{4.1.1} \]
However, if $f$ is not adapted, e.g., $f(t) = B(1)$, then we can prove that the exponential process defined as above does not satisfy the stochastic differential equation (4.1.1). In fact, by Equation (2.2.11), we have
\[ \exp\Big\{\int_0^t B(1)\,dB(s) - \frac12\int_0^t B(1)^2\,ds\Big\} = \exp\Big\{ B(1)B(t) - t - \frac12 B(1)^2 t\Big\}. \]
Then, by applying Theorem 2.4.2 to $X_t = B(t)$, $Y^{(t)} = B(1) - B(t)$ and $\theta(t,x,y) = \exp\big\{(x+y)x - t - \frac12(x+y)^2 t\big\}$, we can see that Equation (4.1.1) does not hold. Nevertheless, if we define for $B(1)$ the exponential process as
\[ \mathcal E(t) = \exp\Big[ B(1)\int_0^t e^{-(t-s)}\,dB(s) - \frac14 B(1)^2\big(1 - e^{-2t}\big) - t \Big], \tag{4.1.2} \]
then we have

Theorem 4.1.1. The stochastic process $\mathcal E(t)$ defined in Equation (4.1.2) is the solution to the stochastic differential equation
\[ \begin{cases} d\mathcal E(t) = B(1)\mathcal E(t)\,dB(t), & 0\le t\le 1,\\ \mathcal E(0) = 1. \end{cases} \tag{4.1.3} \]

Remark 4.1.2. This stochastic differential equation was first studied by Buckdahn [7] by a different method. The white noise version is a special case of Theorem 13.34 in the book [19].

In fact, we can define a general exponential process for both adapted and anticipative processes. Let $X_t$ be an Itô process
\[ X_t = \int_a^t g(s)\,dB(s) - \frac12\int_a^t g(s)^2\,ds, \quad a\le t\le b, \tag{4.1.4} \]
where $g \in L^2_{ad}([a,b]\times\Omega)$, and $Y^{(t)}$ an instantly independent process
\[ Y^{(t)} = -\int_t^b h(s)\,dB(s) - \frac12\int_t^b h(s)^2\,ds, \quad a\le t\le b, \tag{4.1.5} \]
where $h \in L^2_{ct}([a,b]\times\Omega)$. Define the exponential process $\mathcal E(t)$ associated with $g$ and $h$ by
\[ \mathcal E(t) = e^{X_t}e^{Y^{(t)}}, \quad a\le t\le b. \tag{4.1.6} \]
Apply Theorem 2.4.2 to $X_t$, $Y^{(t)}$ and $\theta(x,y) = e^x e^y$. Then we obtain
\[ d\theta\big(X_t, Y^{(t)}\big) = \big(g(t) + h(t)\big)\,\theta\big(X_t, Y^{(t)}\big)\,dB(t), \]
i.e.,
\[ d\mathcal E(t) = \big(g(t) + h(t)\big)\mathcal E(t)\,dB(t). \]

The above argument proves

Theorem 4.1.3 (Hwang–Kuo–Saitô–Zhai [14]). Let $g \in L^2_{ad}([a,b]\times\Omega)$ and $h \in L^2_{ct}([a,b]\times\Omega)$. Then the exponential process
\[ \mathcal E(t) = \exp\Big[\int_a^t g(s)\,dB(s) - \frac12\int_a^t g(s)^2\,ds - \int_t^b h(s)\,dB(s) - \frac12\int_t^b h(s)^2\,ds\Big] \tag{4.1.7} \]
is the solution to the stochastic differential equation
\[ d\mathcal E(t) = \big(g(t) + h(t)\big)\mathcal E(t)\,dB(t). \]

Remark 4.1.4. The exponential process (4.1.7) includes two special cases of stochastic differential equations:

1. When $h = 0$,
\[ \mathcal E(t) = \exp\Big[\int_a^t g(s)\,dB(s) - \frac12\int_a^t g(s)^2\,ds\Big] \]
solves the forward equation
\[ \begin{cases} d\mathcal E(t) = g(t)\mathcal E(t)\,dB(t), & a\le t\le b,\\ \mathcal E(a) = 1. \end{cases} \tag{4.1.8} \]

2. When $g = 0$,
\[ \mathcal E(t) = \exp\Big[-\int_t^b h(s)\,dB(s) - \frac12\int_t^b h(s)^2\,ds\Big] \]
solves the backward equation
\[ \begin{cases} d\mathcal E(t) = h(t)\mathcal E(t)\,dB(t), & a\le t\le b,\\ \mathcal E(b) = 1. \end{cases} \tag{4.1.9} \]

4.2 Stochastic Differential Equations with Anticipative Coefficients

In the white noise distribution theory (see [19]), the following theorem gives the solution of a stochastic integral equation with anticipative coefficients.

Theorem 4.2.1. Let $0\le a < b < \infty$ and let $f \in L^\infty([a,b])$. Then the stochastic integral equation
\[ X(t) = x_0 + \int_0^t \partial_s^*\big( f(s)B(b)X(s)\big)\,ds, \quad a\le t\le b,\ (x_0\in\mathbb R), \]
has a unique solution in $L^2([a,b]; (L^2))$ given by
\[ X(t) = x_0\exp\Big[ B(b)\int_a^t f(s)e^{-\int_s^t f(\tau)\,d\tau}\,dB(s) - \frac12 B(b)^2\int_a^t f(s)^2 e^{-2\int_s^t f(\tau)\,d\tau}\,ds - \int_a^t f(s)^2\,ds \Big]. \tag{4.2.1} \]
The stochastic integral equation can be written in stochastic differential form as
\[ \begin{cases} dX_t = f(t)B(b)X_t\,dB_t, & t\in[a,b],\\ X_a = x_0. \end{cases} \]
In particular, if $f(t)\equiv 1$, $a = 0$, $b = 1$, and $x_0 = 1$, then the solution of
\[ \begin{cases} dX_t = B(1)X_t\,dB_t, & t\in[0,1],\\ X_0 = 1, \end{cases} \]
is
\[ X(t) = \exp\Big[ B(1)\int_0^t e^{-(t-s)}\,dB(s) - \frac14 B(1)^2\big(1 - e^{-2t}\big) - t \Big]. \]
The following theorem gives the solution of the stochastic differential equation with $B(1)$ replaced by a general anticipative coefficient $\int_0^1 h(t)\,dB(t)$, for $h \in L^2([0,1])$.

Theorem 4.2.2. Let $h \in L^2([0,1])$. Then the stochastic differential equation
\[ \begin{cases} dX_t = \Big(\int_0^1 h(s)\,dB(s)\Big) h(t)\,X_t\,dB_t, & t\in[0,1],\\ X_0 = 1, \end{cases} \]
has a unique solution in $L^2([0,1]; (L^2))$ given by
\[ X(t) = \exp\Big[ \Big(\int_0^1 h(s)\,dB(s)\Big)\int_0^t e^{-(H(t)-H(s))} h(s)\,dB(s) - \frac14\Big(\int_0^1 h(s)\,dB(s)\Big)^2\big(1 - e^{-2H(t)}\big) - H(t) \Big], \tag{4.2.2} \]
where $H(t) = \int_0^t h(s)^2\,ds$.

Proof. Let
\[ Y^{(t)} = \int_t^1 h(s)\,dB(s), \qquad Z_t = \int_0^t h(s)e^{H(s)}\,dB(s). \]

Then
\[ dY^{(t)} = -h(t)\,dB(t), \qquad dZ_t = h(t)e^{H(t)}\,dB(t). \]

Moreover, $dH(t) = h(t)^2\,dt$.

Now,
\[ X(t) = \exp\Big( Y^{(0)} Z_t e^{-H(t)} - \frac14\big( Y^{(0)}\big)^2\big(1 - e^{-2H(t)}\big) - H(t)\Big) = \theta\big( Y^{(0)}, Z_t, t\big), \]
where $\theta(y,z,t) = \exp\big( yze^{-H(t)} - \frac14 y^2\big(1 - e^{-2H(t)}\big) - H(t)\big)$. For this $\theta$, we have
\[ \frac{\partial\theta}{\partial z} = ye^{-H(t)}\,\theta, \]
\[ \frac{\partial^2\theta}{\partial z^2} = \big( ye^{-H(t)}\big)^2\,\theta = y^2e^{-2H(t)}\,\theta, \]
\[ \frac{\partial^2\theta}{\partial z\,\partial y} = \Big( ze^{-H(t)} - \frac12\, y\big(1 - e^{-2H(t)}\big)\Big) ye^{-H(t)}\,\theta + e^{-H(t)}\,\theta, \]
\[ \frac{\partial\theta}{\partial t} = \Big( -yze^{-H(t)}h(t)^2 - \frac12\, y^2 e^{-2H(t)}h(t)^2 - h(t)^2\Big)\theta. \]

By applying the general Itô formula in Theorem 2.4.2 to $\theta$ above, we get
\[
\begin{aligned}
dX_t &= d\theta\big( Y^{(0)}, Z_t, t\big)\\
&= \frac{\partial\theta}{\partial z}\,dZ_t + \frac12\frac{\partial^2\theta}{\partial z^2}\,(dZ_t)^2 - \frac{\partial^2\theta}{\partial z\,\partial y}\,(dZ_t)(dY^{(t)}) + \frac{\partial\theta}{\partial t}\,dt\\
&= \theta\times\Big\{ Y^{(0)}e^{-H(t)}e^{H(t)}h(t)\,dB_t + \frac12\big( Y^{(0)}\big)^2 e^{-2H(t)}e^{2H(t)}h(t)^2\,dt\\
&\qquad + \Big( Z_t e^{-H(t)}Y^{(0)}e^{-H(t)} - \frac12\big( Y^{(0)}\big)^2 e^{-H(t)} + \frac12\big( Y^{(0)}\big)^2 e^{-3H(t)} + e^{-H(t)}\Big) e^{H(t)}h(t)^2\,dt\\
&\qquad + \Big( -Y^{(0)}Z_t e^{-H(t)}h(t)^2 - \frac12\big( Y^{(0)}\big)^2 e^{-2H(t)}h(t)^2 - h(t)^2\Big)\,dt\Big\}\\
&= \theta\times\Big\{ Y^{(0)}h(t)\,dB_t + \frac12\big( Y^{(0)}\big)^2 h(t)^2\,dt + Z_t Y^{(0)}e^{-H(t)}h(t)^2\,dt - \frac12\big( Y^{(0)}\big)^2 h(t)^2\,dt\\
&\qquad + \frac12\big( Y^{(0)}\big)^2 e^{-2H(t)}h(t)^2\,dt + h(t)^2\,dt - Y^{(0)}Z_t e^{-H(t)}h(t)^2\,dt\\
&\qquad - \frac12\big( Y^{(0)}\big)^2 e^{-2H(t)}h(t)^2\,dt - h(t)^2\,dt\Big\}\\
&= \theta\, Y^{(0)}h(t)\,dB_t\\
&= X_t\, Y^{(0)}h(t)\,dB_t,
\end{aligned}
\]
where, for simplicity, we omit the variables of $\theta$ and its partial derivatives. Thus, $X_t$ satisfies the stochastic differential equation
\[ \begin{cases} dX_t = X_t\, Y^{(0)}h(t)\,dB_t, & t\in[0,1],\\ X_0 = 1, \end{cases} \]
with $Y^{(0)} = \int_0^1 h(s)\,dB(s)$.

4.3 Stochastic Differential Equations with Anticipative Initial Condition

In [18], the authors gave the solution to stochastic differential equations with anticipative initial condition in the following theorem.

Theorem 4.3.1 (Khalifa–Kuo–Ouerdiane–Szozda [18]). Suppose that $\alpha \in L^2([a,b])$ and $\beta \in L^2_{ad}([a,b]\times\Omega)$. Suppose also that $\rho$ belongs to the test-function class $\mathcal M^\infty_T\mathcal S(\mathbb R)$ assumed in [18]. Then the stochastic differential equation
\[ \begin{cases} dX_t = \alpha(t)X_t\,dB_t + \beta(t)X_t\,dt, & t\in[a,b],\\ X_a = \rho(B_b - B_a), \end{cases} \]
has a unique solution given by
\[ X_t = \big[\rho(B_b - B_a) - \xi(t, B_b - B_a)\big] Z_t, \]
where
\[ \xi(t,y) = \int_a^t \alpha(s)\,\rho'\Big( y - \int_s^t \alpha(u)\,du\Big)\,ds, \]
and
\[ Z_t = \exp\Big(\int_a^t \alpha(s)\,dB_s + \int_a^t\Big(\beta(s) - \frac12\,\alpha(s)^2\Big)\,ds\Big). \]

However, the statement of this theorem is not transparent, and its proof does not reveal the essential structure of the stochastic differential equation. In [14], we proved a simplified form of Theorem 4.3.1 with an intrinsic proof. In fact, by the general Itô formula in Theorem 2.4.2, we can give an explicit formula for the solution of a stochastic differential equation with anticipative initial condition.

Theorem 4.3.2 (Hwang–Kuo–Saitô–Zhai [14]). Let $\alpha(t)$ be a deterministic function in $L^2([a,b])$, $\beta(t)$ an adapted stochastic process such that $E\int_a^b |\beta(t)|^2\,dt < \infty$, and $\rho$ a continuous function on $\mathbb R$. Then the stochastic differential equation
\[ \begin{cases} dX_t = \alpha(t)X_t\,dB(t) + \beta(t)X_t\,dt, & a\le t\le b,\\ X_a = \rho\big(B(b) - B(a)\big), \end{cases} \tag{4.3.1} \]
has a unique solution given by
\[ X_t = \rho\Big( B(b) - B(a) - \int_a^t \alpha(s)\,ds\Big)\exp\Big[\int_a^t \alpha(s)\,dB(s) + \int_a^t\Big(\beta(s) - \frac12\,\alpha(s)^2\Big)\,ds\Big]. \tag{4.3.2} \]
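Formula (4.3.2) makes simulation straightforward: sample one Brownian path, form the terminal increment, and evaluate the closed form along the path. The sketch below uses illustrative choices that are not from the text (constant $\alpha$, $\beta$ and $\rho = \tanh$) and checks that the anticipative initial condition $X_a = \rho(B(b) - B(a))$ holds exactly by construction.

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, n = 0.0, 1.0, 1_000
dt = (b - a) / n
t = np.linspace(a, b, n + 1)

alpha, beta = 0.5, 0.1     # constant coefficients, chosen only for illustration
rho = np.tanh              # a continuous function of the anticipative datum

dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])       # B(t_j) - B(a) along the path

# Equation (4.3.2): shift the argument of rho by the running integral of alpha
# and multiply by the classical exponential factor.
int_alpha = alpha * (t - a)                                  # integral of alpha ds
stoch_int = np.concatenate([[0.0], np.cumsum(alpha * dB)])   # integral of alpha dB
X = rho(B[-1] - int_alpha) * np.exp(stoch_int + (beta - 0.5 * alpha**2) * (t - a))

# The anticipative initial condition holds exactly: X_a = rho(B(b) - B(a)).
print(X[0] - rho(B[-1]))
```

Note that $X_t$ depends on the terminal value $B(b) - B(a)$ for every $t$, which is exactly the anticipative feature that a classical adapted solver cannot produce.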

This kind of equation has strong practical meaning. As we will see in Section 5.2, we will use this model to represent a stock market with inside information. In fact, such equations were considered in [8] and [10] in the white noise distribution theory. However, as mentioned in Section 1.1, those results are difficult to interpret probabilistically, and hence to apply to the analysis of real markets and to option pricing. The solution expression in Theorem 4.3.2 provides an explicit way of calculating and simulating the market, and can offer an option pricing formula (see Sections 5.2, 5.5 and 5.6).

4.4 Conditional Expectation of the Solution of Theorem 4.3.2

Now, let $Y_t = E[X_t\,|\,\mathcal F_t]$ be the conditional expectation of the solution process $X_t$ in Equation (4.3.2). Then $Y_t$ is adapted to the filtration $\{\mathcal F_t\}$. It is natural to conjecture that $Y_t$ is the solution to the adapted version of the stochastic differential equation (4.3.1), namely,
\[ \begin{cases} dX_t = \alpha(t)X_t\,dB_t + \beta(t)X_t\,dt, & t\in[a,b],\\ X_a = E\rho(B_b - B_a). \end{cases} \]
However, this is not the case, and we have the following result.

Theorem 4.4.1 (Kuo–Zhai [29]). Suppose $\alpha(t)$ and $\beta(t)$ are as in Theorem 4.3.2. Moreover, assume that $\rho$ is a smooth function that is equal to its Maclaurin series. Suppose also that $X_1(t)$ and $X_2(t)$ are the solutions of the same linear stochastic differential equation
\[ dX_t = \alpha(t)X_t\,dB_t + \beta(t)X_t\,dt, \quad t\in[a,b], \]
with the different initial conditions
\[ X_1(a) = \rho(B_b - B_a) \quad\text{and}\quad X_2(a) = \rho'(B_b - B_a), \]
respectively. Let $Y_1(t) = E[X_1(t)\,|\,\mathcal F_t]$ and $Y_2(t) = E[X_2(t)\,|\,\mathcal F_t]$. Then $Y_1(t)$ satisfies the following stochastic differential equation:
\[ \begin{cases} dY_1(t) = \alpha(t)Y_1(t)\,dB_t + \beta(t)Y_1(t)\,dt + Y_2(t)\,dB_t, & t\in[a,b],\\ Y_1(a) = E\rho(B_b - B_a). \end{cases} \tag{4.4.1} \]

48 Proof. Define Z t Z t n 1 2 o Zt = exp α(s) dB(s) + β(s) − α(s) ds . (4.4.2) a a 2

By the assumption, we can write X_1(t) as
\[
X_1(t) = \rho\Big(B_b-B_a-\int_a^t \alpha(s)\,ds\Big)Z_t
= Z_t \sum_{k=0}^{\infty}\frac{1}{k!}\,\rho^{(k)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)(B_b-B_t)^k.
\]
Then
\[
\begin{aligned}
Y_1(t) &= E[X_1(t)\,|\,F_t]\\
&= Z_t \sum_{k=0}^{\infty}\frac{1}{k!}\,\rho^{(k)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)\,E\big[(B_b-B_t)^k\,\big|\,F_t\big]\\
&= Z_t \sum_{k=0}^{\infty}\frac{1}{(2k)!}\,\rho^{(2k)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)(b-t)^k(2k-1)!!\\
&= Z_t \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\rho^{(2k)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)(b-t)^k\\
&= Z_t \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,V_t^k,
\end{aligned}
\]
where V_t^k = ρ^{(2k)}(B_t − B_a − ∫_a^t α(s) ds)(b−t)^k. Similarly, we have
\[
Y_2(t) = Z_t \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\rho^{(2k+1)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)(b-t)^k.
\]
Now both V_t^k and Z_t are adapted. In order to find dY_1(t), we first find dV_t^k and dZ_t. By the definition of Z_t,

dZt = α(t)Zt dBt + β(t)Zt dt.

By Itô's formula (2.4.12), we have
\[
\begin{aligned}
dV_t^k ={}& \rho^{(2k+1)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)(b-t)^k\,\big(dB_t-\alpha(t)\,dt\big)\\
&+ \frac12\,\rho^{(2k+2)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)(b-t)^k\,dt\\
&- \rho^{(2k)}\Big(B_t-B_a-\int_a^t \alpha(s)\,ds\Big)\,k(b-t)^{k-1}\,dt.
\end{aligned}
\]

So
\[
\begin{aligned}
dY_1(t) &= \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\Big[V_t^k\,dZ_t + Z_t\,dV_t^k + (dV_t^k)(dZ_t)\Big]\\
&= \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\rho_{2k}(b-t)^k\big(\alpha(t)Z_t\,dB_t+\beta(t)Z_t\,dt\big)\\
&\quad + \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,Z_t\,\rho_{2k+1}(b-t)^k\big(dB_t-\alpha(t)\,dt\big)
+ \frac12\sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,Z_t\,\rho_{2k+2}(b-t)^k\,dt\\
&\quad - \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,Z_t\,\rho_{2k}\,k(b-t)^{k-1}\,dt
+ \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\alpha(t)Z_t\,\rho_{2k+1}(b-t)^k\,dt\\
&= \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\rho_{2k}(b-t)^k\,\alpha(t)Z_t\,dB_t
+ \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\rho_{2k}(b-t)^k\,\beta(t)Z_t\,dt\\
&\quad + \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,Z_t\,\rho_{2k+1}(b-t)^k\,dB_t
- \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,Z_t\,\rho_{2k+1}(b-t)^k\,\alpha(t)\,dt\\
&\quad + \frac12\sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,Z_t\,\rho_{2k+2}(b-t)^k\,dt
- \frac12\sum_{k=1}^{\infty}\frac{1}{(2(k-1))!!}\,Z_t\,\rho_{2k}(b-t)^{k-1}\,dt\\
&\quad + \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\alpha(t)Z_t\,\rho_{2k+1}(b-t)^k\,dt\\
&= \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\rho_{2k}(b-t)^k\,\alpha(t)Z_t\,dB_t
+ \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,\rho_{2k}(b-t)^k\,\beta(t)Z_t\,dt
+ \sum_{k=0}^{\infty}\frac{1}{(2k)!!}\,Z_t\,\rho_{2k+1}(b-t)^k\,dB_t\\
&= \alpha(t)Y_1(t)\,dB_t + \beta(t)Y_1(t)\,dt + Y_2(t)\,dB_t,
\end{aligned}
\]

where, for simplicity, ρ_n denotes ρ^{(n)}(B_t − B_a − ∫_a^t α(s) ds), and the fourth equality follows from the cancellation of the last four terms.

Remark 4.4.2. In view of the near-martingale property satisfied by the general stochastic integral, the appearance of the extra term Y_2(t) dB_t in the stochastic differential equation (4.4.1) is not unexpected. Due to the existence of the anticipative part, the stochastic differential equation satisfied by the conditional expectation Y_1(t) of X_1(t) is not simply the adapted version of equation (4.3.1). Moreover, the derivative ρ' appearing in the initial condition of X_2(t) is due to the extra term in the Itô isometry for the general stochastic integral (see Theorem 2.3.2), which essentially comes from the definition of the general stochastic integral and its Itô formula.

A case of particular interest for ρ in Theorem 4.4.1 is when it is a Hermite polynomial.

Recall that the Hermite polynomial of degree n with parameter η is defined by (see, e.g., [20])
\[
H_n(x;\eta) = (-\eta)^n e^{x^2/2\eta}\, D_x^n\, e^{-x^2/2\eta},
\tag{4.4.3}
\]
where D_x is the differential operator with respect to the variable x. The Hermite polynomials H_n have the following properties:
\[
D_x H_n(x;\eta) = nH_{n-1}(x;\eta),
\tag{4.4.4}
\]
\[
D_\eta H_n(x;\eta) = -\frac12\,D_x^2 H_n(x;\eta),
\tag{4.4.5}
\]
\[
H_n(x+y;\eta) = \sum_{k=0}^{n}\binom{n}{k} H_{n-k}(x;\eta)\,y^k.
\tag{4.4.6}
\]
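For computation, H_n(x; η) is conveniently generated by the three-term recurrence H_{k+1}(x; η) = x H_k(x; η) − kη H_{k−1}(x; η) with H_0 = 1, H_1 = x, which is consistent with (4.4.3)–(4.4.4). The sketch below (function names are ours) builds H_n this way and checks the translation formula (4.4.6) at a sample point:

```python
import math

def hermite(n, x, eta):
    """H_n(x; eta) of (4.4.3) via the recurrence
    H_0 = 1, H_1 = x, H_{k+1} = x*H_k - k*eta*H_{k-1}."""
    h_prev, h = 1.0, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * eta * h_prev
    return h

def hermite_translated(n, x, y, eta):
    """Right-hand side of the translation formula (4.4.6)."""
    return sum(math.comb(n, k) * hermite(n - k, x, eta) * y**k
               for k in range(n + 1))

# H_3(x; eta) = x^3 - 3*eta*x, and (4.4.6) at a sample point:
print(hermite(3, 1.3, 0.7))                   # 1.3**3 - 3*0.7*1.3 = -0.533
print(hermite_translated(4, 0.7, -0.3, 2.0))  # equals hermite(4, 0.4, 2.0)
```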

Lemma 4.4.3 ([20]). The stochastic process X_t = H_n(B(t)−B(a); t−a), a ≤ t ≤ b, is a martingale.

Proof. Applying Itô's formula to X_t, we obtain
\[
\begin{aligned}
dX_t &= dH_n(B(t)-B(a);\,t-a)\\
&= D_x H_n(B(t)-B(a);\,t-a)\,dB(t) + \frac12 D_x^2 H_n(B(t)-B(a);\,t-a)\,dt + D_\eta H_n(B(t)-B(a);\,t-a)\,dt\\
&= nH_{n-1}(B(t)-B(a);\,t-a)\,dB(t) + \frac12 D_x^2 H_n(B(t)-B(a);\,t-a)\,dt - \frac12 D_x^2 H_n(B(t)-B(a);\,t-a)\,dt\\
&= nH_{n-1}(B(t)-B(a);\,t-a)\,dB(t),
\end{aligned}
\tag{4.4.7}
\]

where we used Equations (4.4.4) and (4.4.5). So

\[
X_t = H_n(B(t)-B(a);\,t-a) = \int_a^t nH_{n-1}(B(s)-B(a);\,s-a)\,dB(s),
\]
which is an Itô integral of the adapted process nH_{n−1}(B(t)−B(a); t−a), and thus a martingale.
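The martingale property of Lemma 4.4.3 can be observed by simulation: along simulated Brownian paths (taking a = 0), the sample mean of H_n(B(t); t) should stay near H_n(0; 0) = 0 for every t. A minimal Monte Carlo sketch, written self-contained with a vectorized form of the same Hermite recurrence:

```python
import numpy as np

def hermite(n, x, eta):
    # H_{k+1} = x*H_k - k*eta*H_{k-1}, the normalization of (4.4.3);
    # works elementwise for numpy arrays x and eta.
    h_prev, h = np.ones_like(x), x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * eta * h_prev
    return h

rng = np.random.default_rng(0)
n_paths, n_steps, T = 100000, 50, 1.0
dt = T / n_steps
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
t = dt * np.arange(1, n_steps + 1)
M = hermite(4, B, t)                 # X_t = H_4(B(t); t), taking a = 0
print(np.abs(M.mean(axis=0)).max())  # martingale => E[X_t] = 0 for all t
```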

For the Hermite polynomial ρ(x) = H_n(x; b − a), we have

Theorem 4.4.4 (Hwang–Kuo–Saitô–Zhai [15]). Let α(t) be a deterministic function in L²([a,b]), let β(t) be an adapted stochastic process such that E∫_a^b |β(t)|² dt < ∞, and let n be a fixed natural number. Let X_t be the solution of the stochastic differential equation
\[
\begin{cases}
dX_t = \alpha(t)X_t\,dB(t) + \beta(t)X_t\,dt, & a\le t\le b,\\
X_a = H_n\big(B(b)-B(a);\,b-a\big),
\end{cases}
\tag{4.4.8}
\]
and let Y_t = E[X_t | F_t]. Then we have
\[
Y_t = H_n\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)Z_t, \qquad a\le t\le b,
\tag{4.4.9}
\]
where Z_t is defined in Equation (4.4.2). Moreover, Y_t satisfies the stochastic differential equation
\[
dY_t = \Big[\alpha(t)Y_t + D_x H_n\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)Z_t\Big]dB(t) + \beta(t)Y_t\,dt,
\tag{4.4.10}
\]
or equivalently
\[
dY_t = \Big[\alpha(t)Y_t + nH_{n-1}\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)Z_t\Big]dB(t) + \beta(t)Y_t\,dt,
\tag{4.4.10'}
\]
for a ≤ t ≤ b, with initial condition Y_a = 0.

Proof. First we prove Equation (4.4.9). By Theorem 4.3.2, we have, for a ≤ t ≤ b,
\[
X_t = H_n\Big(B(b)-B(a)-\int_a^t \alpha(s)\,ds;\;b-a\Big)Z_t.
\]
Note that Z_t is adapted to {F_t}. So
\[
Y_t = E[X_t\,|\,F_t] = E\Big[H_n\Big(B(b)-B(a)-\int_a^t \alpha(s)\,ds;\;b-a\Big)\,\Big|\,F_t\Big]Z_t.
\tag{4.4.11}
\]

By Equation (4.4.6) with x = B(b) − B(a), y = −∫_a^t α(s) ds, and η = b − a, we get
\[
H_n\Big(B(b)-B(a)-\int_a^t \alpha(s)\,ds;\;b-a\Big)
= \sum_{k=0}^{n}\binom{n}{k} H_{n-k}\big(B(b)-B(a);\,b-a\big)\Big(-\int_a^t \alpha(s)\,ds\Big)^k.
\]

Taking the conditional expectation with respect to F_t, we obtain
\[
\begin{aligned}
E\Big[H_n\Big(B(b)-B(a)-\int_a^t \alpha(s)\,ds;\;b-a\Big)\,\Big|\,F_t\Big]
&= \sum_{k=0}^{n}\binom{n}{k}\,E\Big[H_{n-k}\big(B(b)-B(a);\,b-a\big)\Big(-\int_a^t \alpha(s)\,ds\Big)^k\,\Big|\,F_t\Big]\\
&= \sum_{k=0}^{n}\binom{n}{k}\,E\big[H_{n-k}(B(b)-B(a);\,b-a)\,\big|\,F_t\big]\Big(-\int_a^t \alpha(s)\,ds\Big)^k\\
&= \sum_{k=0}^{n}\binom{n}{k}\,H_{n-k}\big(B(t)-B(a);\,t-a\big)\Big(-\int_a^t \alpha(s)\,ds\Big)^k\\
&= H_n\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big),
\end{aligned}
\tag{4.4.12}
\]
since ∫_a^t α(s) ds is adapted to {F_t} and we have used Lemma 4.4.3. Combining Equations (4.4.11) and (4.4.12), we get Equation (4.4.9):

\[
Y_t = E[X_t\,|\,F_t] = H_n\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)Z_t, \qquad a\le t\le b.
\]

By Itô's formula, we have
\[
\begin{aligned}
dH_n\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)
&= D_x H_n\,dB(t) - D_x H_n\,\alpha(t)\,dt + \frac12 D_x^2 H_n\,dt + D_\eta H_n\,dt\\
&= nH_{n-1}\,dB(t) - nH_{n-1}\,\alpha(t)\,dt + \frac12 D_x^2 H_n\,dt - \frac12 D_x^2 H_n\,dt\\
&= nH_{n-1}\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)\,dB(t)\\
&\quad - nH_{n-1}\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)\,\alpha(t)\,dt,
\end{aligned}
\]

where we used the same computation as in Equation (4.4.7), and

dZt = α(t)Zt dBt + β(t)Zt dt, and thus

\[
\begin{aligned}
dY_t &= H_n\,dZ_t + Z_t\,dH_n + (dH_n)(dZ_t)\\
&= H_n\alpha(t)Z_t\,dB_t + H_n\beta(t)Z_t\,dt + nH_{n-1}Z_t\,dB(t) - nH_{n-1}\alpha(t)Z_t\,dt + nH_{n-1}\alpha(t)Z_t\,dt\\
&= \Big[\alpha(t)Y_t + nH_{n-1}\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)Z_t\Big]dB(t) + \beta(t)Y_t\,dt,
\end{aligned}
\]
which proves Equations (4.4.10) and (4.4.10').
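Formula (4.4.9) can also be tested numerically. Given F_t, the only remaining randomness in X_t is the increment W = B(b) − B(t) ~ N(0, b − t), so the claim (with the adapted factor Z_t cancelling on both sides) reduces to E[H_n(x + W; b − a)] = H_n(x; t − a). A Monte Carlo sketch with constant α (all specific values below are hypothetical):

```python
import numpy as np

def hermite(n, x, eta):
    # recurrence form of (4.4.3): H_{k+1} = x*H_k - k*eta*H_{k-1}
    h_prev, h = np.ones_like(x), x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - k * eta * h_prev
    return h

a, b, t, alpha, n = 0.0, 1.0, 0.4, 0.7, 5
Bt = 0.9                              # a sample value of B(t) - B(a)
A = alpha * (t - a)                   # int_a^t alpha(s) ds for constant alpha
rng = np.random.default_rng(1)
W = rng.normal(0.0, np.sqrt(b - t), 500000)   # B(b) - B(t), independent of F_t
lhs = hermite(n, Bt + W - A, b - a).mean()    # E[H_n(B(b)-B(a)-A; b-a) | F_t]
rhs = float(hermite(n, np.array(Bt - A), t - a))  # value claimed by (4.4.9)
print(lhs, rhs)
```

The agreement of the two printed values illustrates how conditioning shrinks the Hermite parameter from b − a to t − a.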

Remark 4.4.5. Using Theorem 4.4.1, we can obtain Equations (4.4.10) and (4.4.10') directly. In fact, denote explicitly the fixed natural number n in Theorem 4.4.4 by writing X_t and Y_t as X_t^{[n]} and Y_t^{[n]}, respectively. Note that for ρ(x) = H_n(x; b−a), Equation (4.4.4) gives ρ'(x) = nH_{n−1}(x; b−a). Applying Theorem 4.4.1 with X_1 = X_t^{[n]} and X_2 = nX_t^{[n−1]} (and hence Y_1 = Y_t^{[n]} and Y_2 = nY_t^{[n−1]}), we get
\[
dY_t^{[n]} = \big(\alpha(t)Y_t^{[n]} + nY_t^{[n-1]}\big)\,dB(t) + \beta(t)Y_t^{[n]}\,dt,
\]
with Y_a^{[n]} = 0, which is the same conclusion as in Theorem 4.4.4.

Remark 4.4.6. The spirit of Theorem 4.4.4 is that if ρ is a Hermite polynomial, then we can express the conditional expectation process Y_t = E[X_t | F_t] in terms of ρ. This observation is not reflected in Theorem 4.4.1, and it leads to the following discussion.

It is well known that the Hermite polynomials form an orthogonal basis of the Hilbert space L²(R, w(x) dx), and thus a basis of the Sobolev space H¹(R, w(x) dx), where the measure w(x) dx is given by the Gaussian weight function
\[
w(x) = \frac{1}{\sqrt{2\pi(b-a)}}\, e^{-x^2/2(b-a)}.
\]
So we can extend Theorem 4.4.4 from Hermite polynomials to functions in H¹(R, w(x) dx).

Theorem 4.4.7 (Kuo–Zhai [29]). Suppose α(t) and β(t) are as in Theorem 4.3.2, and suppose ρ ∈ H¹(R, w(x) dx) can be written as a Hermite series
\[
\rho(x) = \sum_{n=0}^{\infty} a_n H_n(x;\,b-a).
\]
Suppose also that X_1(t) and X_2(t) are the solutions of the same linear stochastic differential equation
\[
dX_t = \alpha(t)X_t\,dB_t + \beta(t)X_t\,dt, \qquad t\in[a,b],
\]
with the different initial conditions
\[
X_1(a) = \rho(B_b-B_a) \qquad\text{and}\qquad X_2(a) = \rho'(B_b-B_a),
\]
respectively. Let Y_1(t) = E[X_1(t) | F_t] and Y_2(t) = E[X_2(t) | F_t]. Then
\[
Y_1(t) = \sum_{n=0}^{\infty} a_n H_n\Big(B(t)-B(a)-\int_a^t \alpha(s)\,ds;\;t-a\Big)Z_t, \qquad a\le t\le b,
\]
and it satisfies the following stochastic differential equation:
\[
\begin{cases}
dY_1(t) = \alpha(t)Y_1(t)\,dB_t + \beta(t)Y_1(t)\,dt + Y_2(t)\,dB_t, & t\in[a,b],\\
Y_1(a) = E\big[\rho(B_b-B_a)\big].
\end{cases}
\tag{4.4.13}
\]

Remark 4.4.8. The proofs of Theorems 4.4.1 and 4.4.7 use the Maclaurin expansion and the Hermite expansion, respectively. It is natural to conjecture that these theorems can be extended to larger function spaces for ρ.

Chapter 5
A Black–Scholes Model with Anticipative Initial Conditions

5.1 The Classical Black–Scholes Model

In mathematical finance, stochastic modelling has become a standard tool in pricing analysis. The Black–Scholes model introduced by Black and Scholes [5] and Merton [31] is widely used in market analysis and investment strategy making.

The classical Black–Scholes model assumes adaptedness of the market, i.e., an adapted stochastic process X_t = (X_t^{(0)}, X_t^{(1)}) such that
\[
\begin{cases}
\text{(risk-free)} & dX_t^{(0)} = \gamma(t)X_t^{(0)}\,dt, \quad X_0^{(0)} = 1,\\[2pt]
\text{(risky)} & dX_t^{(1)} = \alpha(t)X_t^{(1)}\,dB(t) + J(t)X_t^{(1)}\,dt, \quad X_0^{(1)} = 1.
\end{cases}
\tag{5.1.1}
\]

This means that the model assumes the investors know only the current information about the market, on the basis of which they make pricing decisions. However, in reality some individuals in the market possess inside information (about the future), which they can use to make better pricing decisions. The inside information makes the market anticipative (to them). In this situation, we assume that the initial conditions of the market depend on the future: the dynamics of the system evolve as usual (adapted), but the decision maker grasps the inside information and uses it in the initial invested wealth. So we consider the generalized Black–Scholes model with X_0^{(0)} = q_0(B(T)) and X_0^{(1)} = q_1(B(T)). This model has two useful aspects in realistic applications:

• The first application is to the initial public offering (IPO). In IPO pricing, it is inevitable that someone has inside information about the investment. If the influence of inside information on the market is fully analysed, then the pricing strategy for an IPO can help lower the risk for the investor, the money lender and the company, so that it attracts more investors into the game. It also helps the strategy maker give a clearer prediction, or a more credible promise of the profit, by putting the future information into the initial allocation design.

• The second application is for the government or regulatory organizations to restrict the abuse of insider trading. For example, during the quiet periods in an IPO's history, insiders and the company are restricted from discussing, analysing or trading the offering. However, even though the quiet periods are set up to prevent insider trading, regulatory organizations cannot fully guarantee that the inside information is not abused by the individuals involved, directly or indirectly. So it is of high importance for the regulatory organizations to have a deep understanding of the influence of inside information on the market and on investment. In this way, these organizations can provide an effective, open and fair scheme for IPOs. Moreover, the scheme can be designed case by case to avoid rigidity; for instance, a fixed quiet period can be too long or too short for a particular case and thereby harm investment behavior.

We give a full analysis of the Black–Scholes market with anticipative initial conditions in this chapter. We discuss the arbitrage-free property and the completeness of this general Black–Scholes market in Sections 5.3 and 5.4. The option pricing formula for this anticipative market is given in Section 5.5, and the corresponding hedging portfolio in Section 5.6.

Since the solution of this new anticipative market (see (5.2.2) and (5.2.3)) is no longer adapted, the framework of classical Itô stochastic integration is not sufficient. We need the general stochastic integral and its properties for the pricing analysis.

5.2 A Black–Scholes Model with Anticipative Initial Conditions

Consider the market determined by the following Black–Scholes model with anticipative initial conditions:
\[
\begin{cases}
dX_t^{(0)} = \gamma(t)X_t^{(0)}\,dt, & X_0^{(0)} = q_0(B(T)),\\[2pt]
dX_t^{(1)} = \alpha(t)X_t^{(1)}\,dB(t) + J(t)X_t^{(1)}\,dt, & X_0^{(1)} = q_1(B(T)),
\end{cases}
\tag{5.2.1}
\]
where q_0 and q_1 are continuously differentiable functions, and γ(t), α(t), J(t) are F_t-adapted. X_t^{(0)} is the unit price of the safe investment and X_t^{(1)} is the unit price of the risky investment. Then by Theorem 4.3.2, the solutions of the above stochastic differential equations with anticipative initial conditions are
\[
X_t^{(0)} = q_0\big(B(T)\big)\, e^{\int_0^t \gamma(s)\,ds},
\tag{5.2.2}
\]
and
\[
X_t^{(1)} = q_1\Big(B(T)-\int_0^t \alpha(s)\,ds\Big)\exp\Big\{\int_0^t \alpha(s)\,dB(s)+\int_0^t\Big(J(s)-\frac12\alpha(s)^2\Big)ds\Big\}.
\tag{5.2.3}
\]

Let
\[
h(t) = \frac{\gamma(t)-J(t)}{\alpha(t)}.
\]

Define the normalized market
\[
\widetilde X_t^{(0)} = q_0(B(T)) \qquad\text{and}\qquad \widetilde X_t^{(1)} = X_t^{(1)}\xi(t),
\]
where
\[
\xi(t) = \exp\Big\{-\int_0^t \gamma(s)\,ds\Big\}
\]
is the normalization factor.

A portfolio in the market X_t is a stochastic process
\[
p(t) = \big(p_0(t),\,p_1(t)\big),
\]
where p_i(t), i = 0, 1, represents the number of units of the ith investment held at time t. The value of a portfolio p(t) is defined by
\[
V_p(t) = p(t)\cdot X_t = p_0(t)X_t^{(0)} + p_1(t)X_t^{(1)}.
\]

A portfolio p(t) is self-financing if its value V_p(t) satisfies
\[
V_p(t) = V_p(0) + \int_0^t p(s)\cdot dX_s,
\]
or equivalently, in stochastic differential form,
\[
dV_p(t) = p(t)\cdot dX_t.
\]
The self-financing property means that we can rearrange the assets, but there is no external new investment or withdrawal of money: in order to buy a new asset we must sell an old one, and if we sell an asset, we must use the money to buy another from the same market.

A self-financing portfolio is called admissible if its value V_p(t) is bounded below for almost all (t, ω) ∈ [0,T] × Ω, i.e., there is a constant C > 0 such that
\[
V_p(t,\omega) \ge -C, \qquad \text{for almost all } (t,\omega)\in[0,T]\times\Omega.
\]
This means that creditors have a debt limit.

5.3 Arbitrage-Free Property

The first important property of a market is the arbitrage-free property.

Definition 5.3.1. An admissible portfolio p(t) is called an arbitrage in a market X_t, 0 ≤ t ≤ T, if its value V_p(t) satisfies the conditions
\[
V_p(0)=0, \qquad V_p(T)\ge 0, \qquad P\big(V_p(T)>0\big)>0.
\]

In Theorem 5.3.3, we give a sufficient condition for a general Black–Scholes market to be arbitrage-free. We will use the following auxiliary lemma to facilitate the proofs. Its meaning is that we can always work with the normalized market, i.e., the safe investment does not change and the risky investment changes (decreases) proportionally with the rate ξ(t).

Lemma 5.3.2. A portfolio p(t) is an arbitrage in X(t) if and only if it is an arbitrage in the normalized market Xe(t).

Proof. If p(t) is an arbitrage in X(t), then
\[
V_p(0)=0, \qquad V_p(T)\ge 0, \qquad P\big(V_p(T)>0\big)>0.
\]
So
\[
\widetilde V_p(0)=\xi(0)V_p(0)=0, \qquad \widetilde V_p(T)=\xi(T)V_p(T)\ge 0, \qquad P\big(\widetilde V_p(T)>0\big)=P\big(V_p(T)>0\big)>0.
\]
The other direction is similar.

Theorem 5.3.3 (Arbitrage-Free Property). Assume the condition
\[
E\,\exp\Big\{\frac12\int_0^T\Big(\frac{\gamma(t)-J(t)}{\alpha(t)}\Big)^2 dt\Big\} < \infty.
\]
Then the market X(t), 0 ≤ t ≤ T, determined by (5.2.1) has no arbitrage.

Proof. Let
\[
B_h(t) = B(t) - \int_0^t h(s)\,ds.
\]
By the Girsanov Theorem 1.5.1 and the F_t-adaptedness of h(t), B_h(t) is a Brownian motion with respect to the probability measure Q determined by
\[
dQ = \exp\Big\{\int_0^T h(t)\,dB(t) - \frac12\int_0^T h(t)^2\,dt\Big\}\,dP.
\]
Moreover,
\[
\begin{aligned}
dX_t^{(1)} &= \alpha(t)X_t^{(1)}\,dB(t) + J(t)X_t^{(1)}\,dt\\
&= \alpha(t)X_t^{(1)}\big(dB_h(t)+h(t)\,dt\big) + J(t)X_t^{(1)}\,dt\\
&= \alpha(t)X_t^{(1)}\,dB_h(t) + \gamma(t)X_t^{(1)}\,dt.
\end{aligned}
\]

By Itô's lemma, we get
\[
\begin{aligned}
d\widetilde X_t^{(1)} &= X_t^{(1)}\,d\xi(t) + \xi(t)\,dX_t^{(1)}\\
&= -X_t^{(1)}\gamma(t)\xi(t)\,dt + \xi(t)\big(\alpha(t)X_t^{(1)}\,dB_h(t) + \gamma(t)X_t^{(1)}\,dt\big)\\
&= \xi(t)\alpha(t)X_t^{(1)}\,dB_h(t)\\
&= \alpha(t)\widetilde X_t^{(1)}\,dB_h(t).
\end{aligned}
\]

Suppose p(t) = (p_0(t), p_1(t)) is an arbitrage in X(t); by Lemma 5.3.2, it is also an arbitrage in the normalized market. By the general Itô formula, we have dq_0(B(T)) = 0. Then
\[
\widetilde V_p(t) = \int_0^t p(s)\cdot d\widetilde X_s
= \int_0^t p_0(s)\,dq_0(B(T)) + \int_0^t p_1(s)\,d\widetilde X_s^{(1)}
= \int_0^t p_1(s)\alpha(s)\widetilde X_s^{(1)}\,dB_h(s).
\]
Thus, by Theorem 3.1.4, Ṽ_p(t) is a near-martingale with respect to Q. So
\[
E_Q\big[\widetilde V_p(T)\big]
= E_Q\Big[E_Q\big[\widetilde V_p(T)\,\big|\,F_0\big]\Big]
= E_Q\Big[E_Q\big[\widetilde V_p(0)\,\big|\,F_0\big]\Big]
= E_Q\big[\widetilde V_p(0)\big] = 0.
\]

On the other hand, P and Q are equivalent. So the conditions for p(t) to be an arbitrage are equivalent to
\[
V_p(0)=0, \qquad V_p(T)\ge 0 \quad Q\text{-a.s.}, \qquad Q\big(V_p(T)>0\big)>0.
\]
Hence E_Q[Ṽ_p(T)] > 0, which is a contradiction. Therefore, there is no arbitrage in the market X(t).
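The Girsanov change of measure at the heart of this proof is easy to sanity-check by simulation: weighting samples by the density dQ/dP should make B_h(T) = B(T) − hT behave like a centered Gaussian with variance T. A minimal sketch with a constant, hypothetical h:

```python
import numpy as np

rng = np.random.default_rng(0)
h, T, n_paths = 0.8, 1.0, 400000
B_T = rng.normal(0.0, np.sqrt(T), n_paths)      # B(T) under P
# dQ/dP = exp( int_0^T h dB - (1/2) int_0^T h^2 dt ), for constant h
density = np.exp(h * B_T - 0.5 * h**2 * T)
Bh_T = B_T - h * T                              # B_h(T) = B(T) - int_0^T h ds
print(density.mean())              # ~ 1  (Q is a probability measure)
print((density * Bh_T).mean())     # ~ 0  (E_Q[B_h(T)] = 0)
print((density * Bh_T**2).mean())  # ~ T  (E_Q[B_h(T)^2] = T)
```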

5.4 Completeness of the Market

A lower bounded F_T-measurable random variable Φ is called a T-claim. A T-claim is called attainable in X_t, 0 ≤ t ≤ T, if there exist a real number r and an admissible portfolio p(t) such that
\[
\Phi = V_p(T) = r + \int_0^T p(t)\cdot dX_t.
\]

This portfolio is called a hedging portfolio for Φ.

Definition 5.4.1. A market is said to be complete if every bounded T -claim is attainable.

The completeness of a market means that one can always find a hedging portfolio to realize every possible claim. The following theorem gives a sufficient condition for a general Black–Scholes market to be complete.

Theorem 5.4.2 (Completeness of the Market). Let X_t, 0 ≤ t ≤ T, be a Black–Scholes model satisfying the assumption in Theorem 5.3.3. In addition, assume that
\[
\sigma\big(B(s) : 0\le s\le t\big) = \sigma\big(B_h(s) : 0\le s\le t\big), \qquad \forall\,0\le t\le T.
\]
Then X_t is complete.

Proof. For a bounded T-claim Φ, we need to find a number r and a portfolio p(t) such that
\[
\Phi = r + \int_0^T p(t)\cdot dX(t),
\tag{5.4.1}
\]
or
\[
\xi(T)\Phi = \widetilde V_p(T) = r + \int_0^T d\widetilde V_p(t).
\tag{5.4.1'}
\]
But
\[
\begin{aligned}
d\widetilde V_p(t) &= p_0(t)\,dq_0(B(T)) + p_1(t)\,d\widetilde X_t^{(1)}\\
&= p_1(t)\,d\big(X_t^{(1)}\xi(t)\big)\\
&= p_1(t)\big(X_t^{(1)}\,d\xi(t) + \xi(t)\,dX_t^{(1)}\big)\\
&= -p_1(t)X_t^{(1)}\xi(t)\gamma(t)\,dt + p_1(t)\xi(t)\big(\alpha(t)X_t^{(1)}\,dB(t) + J(t)X_t^{(1)}\,dt\big)\\
&= -p_1(t)X_t^{(1)}\xi(t)\gamma(t)\,dt + p_1(t)\xi(t)\big(\alpha(t)X_t^{(1)}(dB_h(t)+h(t)\,dt) + J(t)X_t^{(1)}\,dt\big)\\
&= p_1(t)X_t^{(1)}\xi(t)\big({-\gamma(t)}\,dt + \alpha(t)\,dB_h(t) + \alpha(t)h(t)\,dt + J(t)\,dt\big)\\
&= p_1(t)X_t^{(1)}\xi(t)\alpha(t)\,dB_h(t).
\end{aligned}
\]

So
\[
V_p(t) = r + \int_0^t \alpha(s)p_1(s)X_s^{(1)}\,dB_h(s)
\tag{5.4.2}
\]
is equivalent to
\[
\widetilde V_p(t) = r + \int_0^t \alpha(s)p_1(s)\widetilde X_s^{(1)}\,dB_h(s),
\tag{5.4.2'}
\]
and Ṽ_p(t) = r + ∫_0^t α(s)p_1(s)ξ(s)X_s^{(1)} dB_h(s) is a near-martingale with respect to P and Q by Theorem 3.1.4. Then (5.4.1') is equivalent to
\[
\xi(T)\Phi = r + \int_0^T \alpha(t)p_1(t)\xi(t)X_t^{(1)}\,dB_h(t).
\]

On the other hand, ξ(T)Φ is bounded and F_T^{B_h}-measurable. By the martingale representation theorem, there is a process θ ∈ L²_{ad}([0,T] × Ω) such that
\[
\xi(T)\Phi = E_Q[\xi(T)\Phi] + \int_0^T \theta(t)\,dB_h(t).
\]

Let r = E_Q[ξ(T)Φ], and define
\[
p_1(t) = \big(\alpha(t)\xi(t)X_t^{(1)}\big)^{-1}\theta(t)
\]
and
\[
p_0(t) = q_0(B(T))^{-1}\Big(E_Q[\xi(T)\Phi] + \xi(t)A(t) + \int_0^t A(s)\gamma(s)\xi(s)\,ds\Big),
\]
where
\[
A(t) = \int_0^t p_1(s)\,dX_s^{(1)} - p_1(t)X_t^{(1)}.
\]
Then p(t) = (p_0(t), p_1(t)) is a hedging portfolio for Φ. Therefore, by the arbitrariness of Φ, X_t is a complete market.

5.5 Option Pricing Formula

Let Φ be a T-claim. A (European) option on Φ is a guarantee to be paid the amount Φ(ω) at time t = T > 0. Then:

1. The buyer's price of the T-claim Φ is
\[
P_b(\Phi) = \sup\Big\{x : \exists\,p(t)\ \text{such that}\ -x+\int_0^T p(t)\cdot dX_t+\Phi\ge 0\ \text{a.s.}\Big\}.
\tag{5.5.1}
\]

2. The seller's price of Φ is
\[
P_s(\Phi) = \inf\Big\{y : \exists\,p(t)\ \text{such that}\ y+\int_0^T p(t)\cdot dX_t\ge\Phi\ \text{a.s.}\Big\}.
\tag{5.5.2}
\]

Remark 5.5.1. If the infimum in (5.5.2) does not exist, we assume Ps(Φ) = ∞.

Property 5.5.2. For the prices P_b and P_s defined above, we have:

1. essinf Φ ≤ P_b(Φ).

2. If X_t satisfies the assumptions in Theorem 5.3.3, then
\[
P_b(\Phi) \le E_Q\big[\xi(T)\Phi(\omega)\big] \le P_s(\Phi).
\tag{5.5.3}
\]

Proof. Property 1 is trivial by taking p = 0. We prove Property 2 in the following. Suppose x ∈ {x : ∃ p(t) such that −x + ∫_0^T p(t)·dX_t + Φ ≥ 0 a.s.}. Then there exists a portfolio p(t) such that −x + ∫_0^T p(t)·dX_t ≥ −Φ a.s. So by (5.4.2), we have
\[
-x + \int_0^T p_1(t)\alpha(t)X_t^{(1)}\xi(t)\,dB_h(t) \ge -\xi(T)\Phi, \quad\text{a.s.}
\tag{5.5.4}
\]
Since ∫_0^t p_1(s)α(s)X_s^{(1)}ξ(s) dB_h(s) is a near-martingale with respect to P and Q by Theorem 3.1.4, we have, for all t ∈ [0,T],
\[
E_Q\Big[\int_0^t p_1(s)\alpha(s)X_s^{(1)}\xi(s)\,dB_h(s)\Big]
= E_Q\Big[E_Q\Big[\int_0^t p_1(s)\alpha(s)X_s^{(1)}\xi(s)\,dB_h(s)\,\Big|\,F_0\Big]\Big]
= E_Q\Big[\int_0^0 p_1(s)\alpha(s)X_s^{(1)}\xi(s)\,dB_h(s)\Big]
= E_Q[0] = 0.
\]

Thus, taking the expectation with respect to Q in (5.5.4), we get
\[
x \le E_Q\big[\xi(T)\Phi\big].
\]
Therefore, by the arbitrariness of x,
\[
P_b(\Phi) \le E_Q\big[\xi(T)\Phi\big].
\]

Similarly, assume P_s(Φ) < ∞. Then there exists y such that, for some p(t),
\[
y + \int_0^T p(t)\cdot dX_t \ge \Phi, \quad\text{a.s.}
\]
Then by (5.4.2),
\[
y + \int_0^T p_1(t)\alpha(t)X_t^{(1)}\xi(t)\,dB_h(t) \ge \xi(T)\Phi, \quad\text{a.s.}
\]
Taking the expectation with respect to Q, we get
\[
y \ge E_Q\big[\xi(T)\Phi\big].
\]
By the arbitrariness of y,
\[
P_s(\Phi) \ge E_Q\big[\xi(T)\Phi\big].
\]
This completes the proof.

Definition 5.5.3. The price of a T-claim Φ is said to exist if P_b(Φ) = P_s(Φ). Their common value is called the price of the option on Φ at time t = 0, and is denoted by P(Φ).

Now we can give the pricing formula for the option on any claim Φ.

Theorem 5.5.4 (Option Pricing Formula). Let X_t be a market satisfying the assumptions in Theorems 5.3.3 and 5.4.2. Then for any T-claim Φ with E_Q[ξ(T)Φ] < ∞, the price of Φ at time t = 0 is given by
\[
P(\Phi) = E_Q\big[\xi(T)\Phi\big].
\tag{5.5.5}
\]

Proof. Since X_t satisfies the assumptions in Theorems 5.3.3 and 5.4.2, it is complete. Define
\[
\Phi_n(\omega) = \begin{cases}\Phi(\omega), & \text{if } \Phi(\omega)\le n,\\ n, & \text{if } \Phi(\omega)>n.\end{cases}
\]
Then Φ_n is a bounded T-claim, since Φ is lower bounded.

Since X_t is complete, there is y_n ∈ R such that, for some portfolio p_n(t) = (p_n^{(0)}(t), p_n^{(1)}(t)), we have
\[
y_n + \int_0^T p_n(t)\cdot dX_t = \Phi_n,
\]
or equivalently,
\[
y_n + \int_0^T p_n^{(1)}(t)\xi(t)\alpha(t)X_t^{(1)}\,dB_h(t) = \xi(T)\Phi_n.
\]

By the completeness of X_t, Φ_n is attainable, so the integral in the above equation is a Q-near-martingale, and hence
\[
y_n = E_Q\big[\xi(T)\Phi_n\big].
\]

By the definition of P_s, we have
\[
P_s(\Phi_n) \le y_n,
\]
and since Φ ≥ Φ_n,
\[
P_s(\Phi_n) \ge P_s(\Phi).
\]
Thus,
\[
P_s(\Phi) \le E_Q\big[\xi(T)\Phi\big].
\]

Combining this with inequality (5.5.3) in Property 5.5.2, we obtain
\[
P_s(\Phi) = E_Q\big[\xi(T)\Phi\big].
\]
Similarly, we have the other equality
\[
P_b(\Phi) = E_Q\big[\xi(T)\Phi\big].
\]
Therefore, the price of the option is
\[
P(\Phi) = E_Q\big[\xi(T)\Phi\big].
\]

Now we find an explicit formula for the option price. In order for the price to have an explicit formula, we assume that γ(t) and α(t) are deterministic functions and that Φ is of the form
\[
\Phi = F\big(X_T^{(1)}\big),
\]
which means that Φ is not only F_T-measurable, but also completely determined by the knowledge of the market at time T.

Theorem 5.5.5 (Calculation of the Option Price). The explicit formula for the price (5.5.5) is given by
\[
\begin{aligned}
P(\Phi) &= E_Q\big[\xi(T)F\big(X_T^{(1)}\big)\big]\\
&= \exp\Big\{-\int_0^T\gamma(t)\,dt\Big\}\,
E_Q\Big[F\Big(q_1\Big(B(T)-\int_0^T\alpha(t)\,dt\Big)
\exp\Big\{\int_0^T\alpha(t)\,dB(t)+\int_0^T\Big(J(t)-\frac12\alpha(t)^2\Big)dt\Big\}\Big)\Big].
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
\int_0^T\alpha(t)\,dB(t)+\int_0^T\Big(J(t)-\frac12\alpha(t)^2\Big)dt
&= \int_0^T\alpha(t)\big(dB_h(t)+h(t)\,dt\big)+\int_0^T\Big(J(t)-\frac12\alpha(t)^2\Big)dt\\
&= \int_0^T\alpha(t)\,dB_h(t)+\int_0^T\Big(\gamma(t)-\frac12\alpha(t)^2\Big)dt.
\end{aligned}
\]
So
\[
\begin{aligned}
P(\Phi) &= \exp\Big\{-\int_0^T\gamma(t)\,dt\Big\}\,
E_Q\Big[F\Big(q_1\Big(B(T)-\int_0^T\alpha(t)\,dt\Big)
\exp\Big\{\int_0^T\alpha(t)\,dB_h(t)+\int_0^T\Big(\gamma(t)-\frac12\alpha(t)^2\Big)dt\Big\}\Big)\Big]\\
&= \exp\Big\{-\int_0^T\gamma(t)\,dt\Big\}\,
E_Q\Big[F\Big(q_1\Big(B_h(T)-\int_0^T\big(\alpha(t)-h(t)\big)dt\Big)
\exp\Big\{\int_0^T\alpha(t)\,dB_h(t)+\int_0^T\Big(\gamma(t)-\frac12\alpha(t)^2\Big)dt\Big\}\Big)\Big].
\end{aligned}
\]
Furthermore, let α(t) = α and γ(t) = γ be constant functions, and let J(t) be a deterministic function. Then h(t) is deterministic, and we have the option pricing formula as an explicit integral:
\[
\begin{aligned}
P(\Phi) &= e^{-\gamma T}\,E_Q\Big[F\Big(q_1\Big(B_h(T)-\alpha T+\int_0^T h(t)\,dt\Big)\,
e^{\alpha B_h(T)+(\gamma-\frac12\alpha^2)T}\Big)\Big]\\
&= \frac{e^{-\gamma T}}{\sqrt{2\pi T}}\int_{\mathbb R}
F\Big(q_1\Big(y-\alpha T+\int_0^T h(t)\,dt\Big)\,e^{\alpha y+(\gamma-\frac12\alpha^2)T}\Big)\,
e^{-y^2/2T}\,dy.
\end{aligned}
\tag{5.5.6}
\]

Example 5.5.6. Suppose q_1(x) = x and F(x) = x^n. Then the option price is
\[
\begin{aligned}
P(\Phi) &= \frac{e^{-\gamma T}}{\sqrt{2\pi T}}\int_{\mathbb R}
\Big(y-\alpha T+\int_0^T h(t)\,dt\Big)^n
\exp\Big\{\alpha ny+n\Big(\gamma-\frac12\alpha^2\Big)T-\frac{y^2}{2T}\Big\}\,dy\\
&= \frac{1}{\sqrt{2\pi T}}\exp\Big\{(n-1)\gamma T+\frac12 n(n-1)\alpha^2 T\Big\}
\int_{\mathbb R}\Big(y-\alpha T+\int_0^T h(t)\,dt\Big)^n
\exp\Big\{-\frac{(y-\alpha nT)^2}{2T}\Big\}\,dy\\
&= \frac{1}{\sqrt{2\pi T}}\exp\Big\{(n-1)\gamma T+\frac12 n(n-1)\alpha^2 T\Big\}
\int_{\mathbb R}\Big(y+(n-1)\alpha T+\int_0^T h(t)\,dt\Big)^n e^{-y^2/2T}\,dy.
\end{aligned}
\]
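Formula (5.5.6) admits a simple numerical sanity check (all parameter values below are hypothetical). For the claim F(x) = q_1(x) = x we have Φ = X_T^{(1)} with X_0^{(1)} = B(T); since ξ(t)X_t^{(1)} is a Q-near-martingale, the price should equal E_Q[B(T)] = E_Q[B_h(T)] + ∫_0^T h(t) dt = ∫_0^T h(t) dt, and a direct quadrature of (5.5.6) reproduces exactly this value:

```python
import numpy as np

alpha, gamma, J, T = 0.5, 0.03, 0.07, 2.0   # constant (hypothetical) coefficients
h = (gamma - J) / alpha                     # h(t) = (gamma - J)/alpha, constant
H = h * T                                   # int_0^T h(t) dt

q1 = lambda x: x
F = lambda x: x                             # claim Phi = F(X_T^(1))

# Riemann sum on a fine grid; the Gaussian integrand decays fast
y = np.linspace(-12.0, 12.0, 20001)
integrand = (F(q1(y - alpha * T + H)
               * np.exp(alpha * y + (gamma - 0.5 * alpha**2) * T))
             * np.exp(-y**2 / (2 * T)))
price = np.exp(-gamma * T) / np.sqrt(2 * np.pi * T) * integrand.sum() * (y[1] - y[0])
print(price, H)   # the two values agree
```

Note that the option price depends on the inside-information structure only through h; here γ < J makes h, and hence the price of this claim, negative.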

5.6 Hedging Portfolio

Section 5.5 gives a formula for fixing the price of an arbitrary option. In order to realize the claim, we need to find its hedging portfolio.

Suppose α(t) = α and γ(t) = γ are constant functions, and J(t) is a deterministic function. From the proof of Theorem 5.4.2, the hedging portfolio corresponding to the option pricing formula in Theorem 5.5.5 is given by

\[
p_0(t) = q_0\big(B(T)\big)^{-1}\Big(E_Q\big[\xi(T)F(X_T^{(1)})\big] + \xi(t)A(t) + \int_0^t A(s)\gamma(s)\xi(s)\,ds\Big),
\tag{5.6.1}
\]
\[
p_1(t) = \big(\alpha(t)\xi(t)X_t^{(1)}\big)^{-1}\theta(t),
\tag{5.6.2}
\]
where A(t) = ∫_0^t p_1(s) dX_s^{(1)} − p_1(t)X_t^{(1)}, and θ(t) is a stochastic process satisfying
\[
\xi(T)F\big(X_T^{(1)}\big) = E_Q\big[\xi(T)F(X_T^{(1)})\big] + \int_0^T \theta(t)\,dB_h(t).
\]

The function θ(t) above is calculated as
\[
\theta(t) = E_Q\Big[\frac{\delta}{\delta t}\Big(e^{-\gamma T}F\big(X_T^{(1)}\big)\Big)\,\Big|\,F_t^{B_h}\Big],
\]
with the variational derivative (see [20]) calculated as
\[
\begin{aligned}
\frac{\delta}{\delta t}\Big(e^{-\gamma T}F\big(X_T^{(1)}\big)\Big)
&= \frac{\delta}{\delta t}\Big(e^{-\gamma T}F\Big(q_1\Big(B_h(T)-\alpha T+\int_0^T h(t)\,dt\Big)
\exp\Big\{\alpha B_h(T)+\Big(\gamma-\frac12\alpha^2\Big)T\Big\}\Big)\Big)\\
&= e^{-\gamma T}F'\big(X_T^{(1)}\big)\,\exp\Big\{\alpha B_h(T)+\Big(\gamma-\frac12\alpha^2\Big)T\Big\}\\
&\quad\times\Big[q_1'\Big(B_h(T)-\alpha T+\int_0^T h(t)\,dt\Big)
+ \alpha\,q_1\Big(B_h(T)-\alpha T+\int_0^T h(t)\,dt\Big)\Big].
\end{aligned}
\]

So, writing B_h(T) = B_h(t) + y with y ~ N(0, T − t) independent of F_t^{B_h},
\[
\begin{aligned}
\theta(t) = \frac{e^{-\gamma T}}{\sqrt{2\pi(T-t)}}\int_{\mathbb R}
&F'\Big(q_1\Big(y+B_h(t)-\alpha T+\int_0^T h(s)\,ds\Big)\,
e^{\alpha(y+B_h(t))+(\gamma-\frac12\alpha^2)T}\Big)\\
&\times\Big[q_1'\Big(y+B_h(t)-\alpha T+\int_0^T h(s)\,ds\Big)
+ \alpha\,q_1\Big(y+B_h(t)-\alpha T+\int_0^T h(s)\,ds\Big)\Big]\\
&\times e^{\alpha(y+B_h(t))+(\gamma-\frac12\alpha^2)T}\, e^{-y^2/2(T-t)}\,dy.
\end{aligned}
\]

References

[1] Ayed, W. and Kuo, H.-H.: An extension of the Itô integral, Communications on Stochastic Analysis 2, no. 3 (2008) 323–333.

[2] Ayed, W. and Kuo, H.-H.: An extension of the Itô integral: toward a general theory of stochastic integration, Theory of Stochastic Processes 16 (32), no. 1 (2010) 1–11.

[3] Basse-O'Connor, A., Graversen, S.-E., and Pedersen, J.: Stochastic integration on the real line, Theory of Probability and Its Applications 58, no. 2, 193–215.

[4] Björk, T.: Arbitrage Theory in Continuous Time, Oxford University Press, 2004.

[5] Black, F. and Scholes, M.: The pricing of options and corporate liabilities, J. Political Economy 81, (1973) 637–659.

[6] Brown, R.: A brief account of microscopical observations made in the months of June, July and August, 1827, on the particles contained in the pollen of plants; and on the general existence of active molecules in organic and inorganic bodies, Philosophical Magazine N.S. 4 (1828), 161–173.

[7] Buckdahn, R.: Anticipating linear stochastic differential equations, Lecture Notes in Control and Information Sciences 136, (1989) 18–23, Springer- Verlag.

[8] Buckdahn, R. and Nualart, D.: Linear stochastic differential equations and Wick products, Probability Theory and Related Fields 99, no. 4 (1994) 501–526.

[9] Einstein, A.: Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen, Annalen der Physik 322 (1905), no. 8, 549–560.

[10] Esunge, J.: A class of anticipating linear stochastic differential equations, Communications on Stochastic Analysis, 3, no. 1 (2009) 155–164.

[11] Girsanov, I. V.: On transforming a certain class of stochastic processes by absolutely continuous substitution of measures, Theory of Probability & Its Applications 5, no. 3 (1960) 285–301.

[12] Gross, L.: Abstract Wiener spaces, Proc. 5th Berkeley Symp. Math. Stat. and Probab., 2, part 1 (1965), 31–42, University of California Press, Berkeley.

[13] Hida, T.: Analysis of Brownian Functionals, Carleton Mathematical Lecture Notes 13, 1975.

[14] Hwang, C.-R., Kuo, H.-H., Saitô, K., and Zhai, J.: A general Itô formula for adapted and instantly independent stochastic processes, Communications on Stochastic Analysis 10 (2016), no. 3, 341–362.

[15] Hwang, C.-R., Kuo, H.-H., Saitˆo,K., and Zhai, J.: Near-martingale property of anticipating stochastic integration, Communications on Stochastic Anal- ysis 11 (2017), no. 4, 491–504.

[16] Itô, K.: Differential equations determining Markov processes, Zenkoku Shijō Sūgaku Danwakai 244 (1942), no. 1077, 1352–1400 (in Japanese).

[17] Itô, K.: Stochastic integral, Proceedings of the Imperial Academy 20, no. 8 (1944) 519–524.

[18] Khalifa, N., Kuo, H.-H., Ouerdiane, H., and Szozda, B.: Linear stochastic differential equations with anticipating initial conditions, Communications on Stochastic Analysis 7, no. 2 (2013) 245–253.

[19] Kuo, H.-H.: White Noise Distribution Theory, CRC Press, 1996.

[20] Kuo, H.-H.: Introduction to Stochastic Integration, Universitext, Springer, 2006.

[21] Kuo, H.-H.: The Itô calculus and white noise theory: a brief survey toward general stochastic integration, Communications on Stochastic Analysis 8, no. 1 (2014) 111–139.

[22] Kuo, H.-H., Peng, Y., and Szozda, B.: Itô formula and Girsanov theorem for anticipating stochastic integrals, Communications on Stochastic Analysis 7, no. 3 (2013) 441–458.

[23] Kuo, H.-H., Peng, Y., and Szozda, B.: Generalization of the anticipative Girsanov theorem, Communications on Stochastic Analysis 7, no. 4 (2013) 573–589.

[24] Kuo, H.-H., Sae-Tang, A., and Szozda, B.: A stochastic integral for adapted and instantly independent stochastic processes, in “Advances in , Probability and ” Vol. I, Stochastic Processes, Finance and Control: A Festschrift in Honour of Robert J. Elliott (eds.: Cohen, S., Madan, D., Siu, T. and Yang, H.), World Scientific, 2012, 53–71.

[25] Kuo, H.-H., Sae-Tang, A., and Szozda, B.: An isometry formula for a new stochastic integral, In “Proceedings of International Conference on Quantum Probability and Related Topics,” May 29–June 4, 2011, Levico, Italy, QP–PQ: Quantum Probability and White Noise Analysis 29 (2013) 222–232.

[26] Kuo, H.-H., Sae-Tang, A., and Szozda, B.: The Itô formula for a new stochastic integral, Communications on Stochastic Analysis 6, no. 4 (2012) 603–614.

[27] Kuo, H.-H. and Saitˆo, K.: Doob’s decomposition theorem for near- submartingales, Communications on Stochastic Analysis 9, no. 4 (2015) 467– 476.

[28] Kuo, H.-H., Sudip, S., and Zhai, J.: A Black–Scholes model with anticipating initial conditions, preprint.

[29] Kuo, H.-H. and Zhai, J.: On some anticipative stochastic differential equa- tions, preprint.

[30] Malliavin, P.: Stochastic calculus of variation and hypoelliptic operators, Proceedings of the International Symposium on Stochastic Differential Equations, Tokyo, 1976, pp. 195–263, Wiley, New York–Chichester–Brisbane, 1978.

[31] Merton, R.: Theory of rational option pricing, Bell Journal of Economics and Management Science 4, no. 2 (1973) 141–183.

[32] Pardoux, E. and Protter, P.: A two-sided stochastic integral and its calculus, Probab Theory Related Fields, 76 (1987), 15–49.

[33] Wiener, N.: Differential space, Journal of Mathematical Physics 2 (1923), 131–174.

Part II

Temporal Minimum Action Method for Rare Events in Stochastic Dynamic Systems

Chapter 6
Introduction

6.1 Large Deviation Principle

We start with the motivation for large deviation theory from the mathematical point of view. Recall the most important limit theorems in probability: the law of large numbers and the central limit theorem.

Theorem 6.1.1 (Law of Large Numbers).
\[
\bar X_n \to \mu, \quad\text{as } n\to\infty,
\]
where X̄_n = (1/n)∑_{i=1}^n X_i is the average of i.i.d. random variables X_1, X_2, … with mean μ, and the limit is in the sense of convergence in probability for the weak law and almost sure convergence for the strong law.

This law of large numbers (weak or strong) asserts the stability of the mean value: when n is large enough, the average of the observations approaches its expectation. Equivalently, the probability that the mean value is far away from the expectation (that a large deviation happens) is very small.

Theorem 6.1.2.
\[
\frac{n_X}{n} \to p_X, \quad\text{as } n\to\infty,
\]
where n_X and p_X are the number of occurrences and the probability of the event X, respectively, and the limit is in the sense of convergence in probability for Bernoulli's law and almost sure convergence for Borel's law.

This law of large numbers asserts the stability of frequency: when n is large enough, the frequency of occurrence of an event approaches its probability. Equivalently, the probability that the frequency is far away from the probability is very small.

Theorem 6.1.3 (Lindeberg–Lévy Central Limit Theorem).

\[
\sqrt n\Big(\frac1n\sum_{i=1}^{n} X_i - \mu\Big) \to N(0,\sigma^2),
\]
where the limit is in distribution.

The central limit theorem asserts the normality of the limiting distribution: the normalized sum of a sequence of i.i.d. random variables approaches the normal distribution. This fact is the foundation of the effectiveness of using Brownian motion or white noise when we simulate stochastic noise.

The central limit theorem quantifies the asymptotics of the average value approaching the expectation stated in the law of large numbers, that is, how small the probability of a large deviation is:
\[
P\big(|S_n|\ge\delta\big) = 1 - \frac{1}{\sqrt{2\pi}}\int_{-\delta\sqrt n}^{\delta\sqrt n} e^{-x^2/2}\,dx.
\]
Applying L'Hôpital's rule twice, we obtain
\[
\frac1n \log P\big(|S_n|\ge\delta\big) \to -\frac{\delta^2}{2}, \quad\text{as } n\to\infty.
\tag{6.1.1}
\]
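The limit (6.1.1) can be observed numerically using the exact Gaussian tail, since here S_n ~ N(0, 1/n) (the average of n independent standard normal variables):

```python
import math

delta = 0.5
for n in [10, 100, 1000, 2000]:
    # S_n ~ N(0, 1/n), so P(|S_n| >= delta) = erfc(delta * sqrt(n/2))
    p = math.erfc(delta * math.sqrt(n / 2.0))
    print(n, math.log(p) / n)     # tends to -delta**2/2 = -0.125
```

The slow convergence (the error decays only like (log n)/n) is typical: the large deviation principle captures the exponential rate, not the polynomial prefactor.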

More generally, we can consider the asymptotic behavior of the probability of S_n exhibiting any kind of large deviation, for example,
\[
P\big(\sqrt n\,S_n \in A\big) \to \frac{1}{\sqrt{2\pi}}\int_A e^{-x^2/2}\,dx, \quad\text{as } n\to\infty.
\]
This leads to the usual discussion of the asymptotic behavior of (1/n) log μ_n(A), or more generally of ε log μ_ε(A), and to the following definition.

Definition 6.1.4 (Large Deviation Principle). Let $\{\mu_\varepsilon\}$ be a family of probability measures. We say that it satisfies the large deviation principle if the inequality
$$-\inf_{x \in A^\circ} I(x) \le \liminf_{\varepsilon \to 0} \varepsilon \log \mu_\varepsilon(A^\circ) \le \limsup_{\varepsilon \to 0} \varepsilon \log \mu_\varepsilon(\bar{A}) \le -\inf_{x \in \bar{A}} I(x)$$
holds for some rate function $I(x)$ and all Borel sets $A$, where $A^\circ$ and $\bar{A}$ denote the interior and the closure of $A$, respectively.

6.2 Statistical Physics

In physics, the large deviation principle is also motivated by statistical mechanics.

Modern physics holds that some macroscopic physical phenomena are in fact the statistical appearance of the particles contained in the system; for example, the heat of the system is the average kinetic energy of the particles. These phenomena sometimes exhibit rare behavior, and they follow a large deviation principle. The rate function here is, originally, the relative entropy.

Consider a vector $\gamma = (\gamma_1, \ldots, \gamma_\alpha) \in \mathbb{R}^\alpha$ satisfying $\gamma_i \ge 0$ and $\sum_{i=1}^\alpha \gamma_i = 1$. We call $\gamma$ a probability vector. The set of all probability vectors is denoted by $\mathcal{P}_\alpha$, which is also called the $\alpha$-simplex in $\alpha$-dimensional space.

Definition 6.2.1 (Relative Entropy). For a fixed $\rho \in \mathcal{P}_\alpha$, the relative entropy of $\gamma \in \mathcal{P}_\alpha$ with respect to $\rho$ is defined by
$$I_\rho(\gamma) = \sum_{i=1}^\alpha \gamma_i \log \frac{\gamma_i}{\rho_i}.$$
Property 6.2.2. The relative entropy satisfies the following properties:

(1). $I_\rho(\gamma) \ge 0$, and $I_\rho(\gamma) = 0 \Leftrightarrow \rho = \gamma$.

(2). $I_\rho$ is strictly convex.

Remark 6.2.3. Property 6.2.2 (1) explains the reason for using $I_\rho$: it describes the difference (deviation) between two distributions.
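Both properties are easy to verify numerically; a small sketch (added here for illustration, with the convention $0 \log 0 = 0$ assumed):

```python
import numpy as np

# Relative entropy I_rho(gamma) = sum_i gamma_i * log(gamma_i / rho_i),
# with the convention 0 * log 0 = 0, and a check of Property 6.2.2.
def relative_entropy(gamma, rho):
    gamma, rho = np.asarray(gamma, float), np.asarray(rho, float)
    mask = gamma > 0
    return float(np.sum(gamma[mask] * np.log(gamma[mask] / rho[mask])))

rho = [0.5, 0.3, 0.2]
print(relative_entropy(rho, rho))              # 0 exactly when gamma = rho
print(relative_entropy([0.2, 0.3, 0.5], rho))  # strictly positive otherwise
```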

Boltzmann discovered, from a statistical point of view, that in a multi-particle system the probability of the empirical distribution $\omega$ being in a neighborhood of the distribution $\gamma$ decays asymptotically as $\exp[-n I_\rho(\gamma)]$. The rate function is exactly the relative entropy $I_\rho$. If we call $\rho$ the equilibrium distribution, then it is in fact the minimizer of the rate function (relative entropy), and the empirical distribution must converge to $\rho$. But if $\gamma$ is another state, different from the equilibrium $\rho$, then "the empirical distribution approaches $\gamma$" is a rare event, and its probability (that a large deviation happens) converges to 0 asymptotically as the exponential of the rate $-n I_\rho(\gamma)$. This principle is called the principle of maximum entropy in physics.
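Boltzmann's decay rate can be checked against the exact multinomial probability; the sketch below (an illustration added here, with specific $\rho$ and $\gamma$ assumed) confirms that $\frac{1}{n}\log P \to -I_\rho(\gamma)$:

```python
import math

# Sanov-type check of the exponential decay exp(-n * I_rho(gamma)): the
# exact multinomial probability that the empirical distribution of n
# i.i.d. draws from rho equals gamma satisfies (1/n) log P -> -I_rho(gamma).
rho = [0.5, 0.3, 0.2]
gamma = [0.2, 0.3, 0.5]
I = sum(g * math.log(g / r) for g, r in zip(gamma, rho))

def normalized_log_prob(n):
    counts = [round(n * g) for g in gamma]    # n * gamma assumed integral
    lp = math.lgamma(n + 1)
    for c, r in zip(counts, rho):
        lp += c * math.log(r) - math.lgamma(c + 1)
    return lp / n

for n in (100, 1000, 10000):
    print(n, normalized_log_prob(n), -I)
```

The gap between $\frac{1}{n}\log P$ and $-I_\rho(\gamma)$ is of polynomial (Stirling) order $O(\log n / n)$.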

6.3 Freidlin-Wentzell Theory

Consider a stochastic differential equation

$$dX_t^\varepsilon = b(X_t^\varepsilon)\, dt + \sqrt{\varepsilon}\, dB_t, \qquad (6.3.1)$$
or more generally a degenerate version
$$dX_t^\varepsilon = b(X_t^\varepsilon)\, dt + \sqrt{\varepsilon}\, \sigma(X_t^\varepsilon)\, dB_t, \qquad (6.3.2)$$
with the initial condition $X_0^\varepsilon = 0$.
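A small-noise sample path of (6.3.1) is easy to generate; the sketch below (added here for illustration, with the double-well drift $b(x) = x - x^3$ an assumed example) uses the standard Euler-Maruyama discretization:

```python
import numpy as np

# Euler-Maruyama discretization of (6.3.1), dX = b(X) dt + sqrt(eps) dB,
# for the (assumed, illustrative) double-well drift b(x) = x - x**3.
def euler_maruyama(b, x0, T, n_steps, eps, rng):
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + b(x[k]) * dt + np.sqrt(eps) * dB
    return x

rng = np.random.default_rng(42)
path = euler_maruyama(lambda x: x - x**3, 0.1, 10.0, 1000, 1e-4, rng)
# With eps this small the path relaxes to the stable equilibrium x = 1
# and rarely leaves its neighborhood on this time scale.
print(path[-1])
```

Transitions between the two stable equilibria $\pm 1$ are the rare events whose probabilities the Freidlin-Wentzell theory quantifies.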

Example 6.3.1 (Langevin Equation). A multi-Brownian-particle system satisfies the Langevin equation
$$\begin{cases} dQ_i(t) = \dfrac{P_i(t)}{m}\, dt, \\[4pt] dP_i(t) = -\nabla V(Q_i(t))\, dt - \gamma\, \dfrac{P_i(t)}{m}\, dt - \sqrt{2\gamma k_B T}\, dW_i(t), \end{cases}$$
where $\gamma$ is the damping coefficient, $k_B$ is the Boltzmann constant, and $T$ is the absolute temperature. Here $k_B T$ plays the role of $\varepsilon$, the magnitude of the environmental heat noise.

Consider the system (6.3.1). Let $I(w) = \frac{1}{2}\int_0^T |w_t' - b(w_t)|^2\, dt$. Then the distribution of the solution process $X_t^\varepsilon$ in the Banach space $C_0[0,T]$ satisfies the large deviation principle with rate function $I$, namely,
$$-I(A^\circ) \le \liminf_{\varepsilon \to 0} \varepsilon \log P[X^\varepsilon \in A^\circ] \le \limsup_{\varepsilon \to 0} \varepsilon \log P[X^\varepsilon \in \bar{A}] \le -I(\bar{A}),$$
for $A \in \mathcal{B}(C_0)$, where $I(A) := \inf_{w \in A} I(w)$. One of the most important consequences of Freidlin-Wentzell theory is the following comparison. In the deterministic system with several equilibria (attractors) and without

random noise,
$$dX_t = b(X_t)\, dt, \qquad (6.3.3)$$
a particle/state is not able to escape from one equilibrium and transit to another.

But in a system with noise, even if the noise has a very small amplitude, it is still possible to escape from one equilibrium and move to another. Under the small-white-noise assumption, this movement is a rare event. While its probability is positive, it converges to 0 asymptotically at a rate governed by the rate function $I$. Among all these rare events there is a most probable (least unlikely) one: the most probable way of moving from one equilibrium to another driven by the noise, which is in fact the minimizer of $I$. The minimum of $I$ corresponds to the minimum action, so $I$ is also called the action functional, related to the Lagrangian-Hamiltonian formalism in physics. Thus the long-term behavior of the perturbed system (6.3.1) is characterized by the small-noise-induced transitions between the equilibria of the unperturbed system (6.3.3). Although these transitions rarely occur, they can describe many critical phenomena in physical, chemical, and biological systems, such as non-equilibrium interface growth [12, 24], regime change in climate [36], switching in biophysical networks [35], hydrodynamic instability [31, 30], etc.

Therefore, finding this minimum action and its minimizer is very important for studying the stability of the system in a noisy environment.

We now describe our problem rigorously. To do so, from now on we denote the action functional $I$ by
$$S_T(\varphi) = I(\varphi) = \frac{1}{2} \int_0^T |\varphi' - b(\varphi)|^2\, dt. \qquad (6.3.4)$$
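A discrete version of (6.3.4) is simple to evaluate; the sketch below (added here for illustration, with the assumed linear drift $b(x) = -x$) uses forward differences and midpoint values:

```python
import numpy as np

# Finite-difference approximation of the action (6.3.4),
# S_T(phi) = 1/2 * int_0^T |phi' - b(phi)|^2 dt, on a uniform grid with
# forward differences and midpoint values of phi.
def action(phi, T, b):
    phi = np.asarray(phi, float)
    dt = T / (len(phi) - 1)
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[:-1] + phi[1:])
    return 0.5 * np.sum((dphi - b(mid)) ** 2) * dt

# Sanity checks for b(x) = -x: a solution of phi' = b(phi) has (nearly)
# zero action, while the straight line from 1 to 0 on [0, 1] has
# action 1/2 * int_0^1 t^2 dt = 1/6.
b = lambda x: -x
t = np.linspace(0.0, 1.0, 2001)
print(action(np.exp(-t), 1.0, b))   # close to 0
print(action(1.0 - t, 1.0, b))      # close to 1/6
```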

Let $x_1$ and $x_2$ be two arbitrary points in the phase space. If we restrict the transition to a certain time interval $[0, T]$, we have
$$\lim_{\delta \downarrow 0} \lim_{\varepsilon \downarrow 0} -\varepsilon \log \Pr(\tau_\delta \le T) = \inf_{\varphi(0)=x_1,\, \varphi(T)=x_2} S_T(\varphi), \qquad (6.3.5)$$
where $\tau_\delta$ is the first entrance time into the $\delta$-neighborhood of $x_2$ for the random process $X(t)$ starting from $x_1$. The path variable $\varphi$ connecting $x_1$ and $x_2$, over which the action functional is minimized, is called a transition path. If the time scale is not specified, the transition probability can be described with respect to the quasi-potential $V(x_1, x_2)$ from $x_1$ to $x_2$:

$$V(x_1, x_2) := \inf_{T \in \mathbb{R}^+}\ \inf_{\varphi(0)=x_1,\, \varphi(T)=x_2} S_T(\varphi). \qquad (6.3.6)$$
The probabilistic meaning of $V(x_1, x_2)$ is
$$V(x_1, x_2) = \inf_{T \in \mathbb{R}^+} \lim_{\delta \downarrow 0} \lim_{\varepsilon \downarrow 0} -\varepsilon \log \Pr(\tau_\delta \le T). \qquad (6.3.7)$$

We in general call the asymptotic results given in Equations (6.3.5) and (6.3.7) the large deviation principle (LDP). We use $\varphi^*$ to indicate the transition path that minimizes the action functional in Equation (6.3.5) or (6.3.6), which is also called the minimal action path (MAP) [9], or the instanton in the physical literature related to path integrals. The MAP $\varphi^*$ is the most probable transition path from $x_1$ to $x_2$. For the quasi-potential, we let $T^*$ indicate the optimal integration time, which can be either finite or infinite depending on $x_1$ and $x_2$. The importance of the LDP is that it reduces the computation of the transition probability, a path integral in a function space, to seeking the minimizers $\varphi^*$ or $(T^*, \varphi^*)$.

Chapter 7 Temporal Minimum Action Method

7.1 Problem Description

As we have seen from Section 6.3, the results (6.3.5) and (6.3.6) from the Freidlin-Wentzell (F-W) theory of large deviations lead to the following two problems:
$$\text{Problem I}: \quad S_T(\varphi^*) = \inf_{\varphi(0)=x_1,\, \varphi(T)=x_2} S_T(\varphi), \qquad (7.1.1)$$
and
$$\text{Problem II}: \quad S_{T^*}(\varphi^*) = \inf_{T \in \mathbb{R}^+}\ \inf_{\varphi(0)=x_1,\, \varphi(T)=x_2} S_T(\varphi). \qquad (7.1.2)$$

Here $\varphi(t)$ is a path connecting $x_1$ and $x_2$ in the phase space on the time interval $[0, T]$. The minima and minimizers of Problems I and II characterize the difficulty of the small-noise-induced transition from $x_1$ to the vicinity of $x_2$; see Equations (6.3.5) and (6.3.7). In Problem I the transition is restricted to a certain time scale $T$, a restriction that is relaxed in Problem II.

To analyze the convergence of numerical approximations for Problems I and II, we need the following assumptions on $b(x)$.

Assumption 7.1.1. (1) $b(x)$ is Lipschitz continuous in a big ball, i.e., there exist constants $K > 0$ and $R_1 > 0$ such that
$$|b(x) - b(y)| \le K|x - y|, \quad \forall\, x, y \in B_{R_1}(0), \qquad (7.1.3)$$
where $|\cdot|$ denotes the $\ell_2$ norm of a vector in $\mathbb{R}^n$;

(2) There exist positive numbers $\beta, R_2$ such that
$$\langle b(x), x \rangle \le -\beta |x|^2, \quad \forall\, |x| \ge R_2, \qquad (7.1.4)$$
where $R_2^2 \le R_1^2 - \frac{S^*}{\beta}$, and
$$S^* = \max_{x, y \in B_{R_2}(0)} \frac{1}{2} \int_0^1 \big| y - x - b\big(x + (y - x)t\big) \big|^2\, dt.$$

(3) The solution points of b(x) = 0 are isolated.

Lemma 7.1.2. Let Assumption (7.1.4) hold. If both the starting and ending points of a MAP $\varphi(t)$ are inside $B_{R_2}(0)$, then $\varphi(t)$ is located within $B_{R_1}(0)$ for any $t$.

Proof. Suppose that $\varphi(t)$ is a MAP outside of $B_{R_2}(0)$ but connecting two points $x$ and $y$ on the surface of $B_{R_2}(0)$. Let $w(t) = \varphi' - b(\varphi)$. We have
$$\varphi' = b(\varphi) + w. \qquad (7.1.5)$$

Taking the inner product of both sides of the above equation with $2\varphi$, we get
$$\frac{d|\varphi|^2}{dt} = 2\langle b(\varphi), \varphi \rangle + 2\langle w, \varphi \rangle. \qquad (7.1.6)$$
Then, by using Cauchy's inequality with $\beta$ and Assumption (7.1.4), we get
$$\frac{d|\varphi|^2}{dt} \le -2\beta|\varphi|^2 + \frac{1}{2\beta}|w|^2 + 2\beta|\varphi|^2 = \frac{1}{2\beta}|w|^2. \qquad (7.1.7)$$

Integrating and using the definition of the minimum action, we obtain a bound for any $t$ along the MAP:
$$|\varphi(t)|^2 \le |x|^2 + \frac{1}{2\beta}\int_0^t |w|^2\, dt \le R_2^2 + \frac{1}{\beta} S_{T^*}(\varphi) \le R_2^2 + \frac{1}{\beta} S^* \le R_1^2, \qquad (7.1.8)$$
which means that the whole MAP is located within $B_{R_1}(0)$.

Remark 7.1.3. Assumptions (7.1.3) and (7.1.4) allow most physically relevant smooth nonlinear dynamics. It is seen from Lemma 7.1.2 that Assumption (7.1.4) is used to restrict all MAPs of interest inside $B_{R_1}(0)$. For simplicity and without loss of generality, we will assume from now on that the Lipschitz continuity of $b(x)$ is global, namely, $R_1 = \infty$. For the general case given in Assumption 7.1.1, one can obtain all the conclusions by restricting the theorems and proofs to $B_{R_1}(0)$.

For simplicity, we will use the following notation in this chapter. For $\varphi(t) \in \mathbb{R}^n$ defined on $\Gamma_T = [0, T]$, denote $|\varphi|_{m,\Gamma_T}^2 = \sum_{i=1}^n |\varphi_i|_{m,\Gamma_T}^2$, where $\varphi_i$ is the $i$-th component of $\varphi$ and $|\varphi_i|_{m,\Gamma_T}^2 = \int_{\Gamma_T} |\varphi_i^{(m)}|^2\, dt$. Denote $\|\varphi\|_{m,\Gamma_T}^2 = \sum_{i=1}^n \|\varphi_i\|_{m,\Gamma_T}^2$, where $\|\varphi_i\|_{m,\Gamma_T}^2 = \sum_{k \le m} \int_{\Gamma_T} |\varphi_i^{(k)}|^2\, dt$. For $f(t), g(t): \Gamma_T \to \mathbb{R}^n$, we denote the inner products $\langle f, g \rangle = \sum_{i=1}^n f_i g_i$ and $\langle f, g \rangle_{\Gamma_T} = \int_{\Gamma_T} \big( \sum_{i=1}^n f_i g_i \big)\, dt$.

7.2 Introduction to the Temporal Minimum Action Method

Starting from [8], the large deviation principle given by the F-W theory has been approximated numerically, especially for non-gradient systems; the resulting numerical methods are, in general, called minimum action methods (MAM).

The numerical difficulty is to separate the slow dynamics around critical points from the fast dynamics elsewhere. In fact, since the noise is small, it takes an infinite time to escape from a small neighborhood of a critical point. So the MAP is mainly captured by the fast dynamics on a finite time interval, but it takes an infinite time to pass a critical point. To overcome this difficulty, two basic techniques exist: (1) non-uniform temporal discretization, including the moving mesh technique and the adaptive finite element method (aMAM), and (2) reformulation of the action functional with respect to arc length (gMAM). Both techniques aim to solve Problem II with $T^* = \infty$. In aMAM a finite but large $T$ is used, while in gMAM the infinite $T^*$ is mapped to a finite arc length. Thus aMAM cannot deal with Problem II subject to a finite $T^*$, since a fixed $T$ is required; and since $T$ has been removed, gMAM cannot deal with Problem I.

To deal with both Problems I and II in a consistent way, we have developed a minimum action method with optimal linear time scaling (temporal minimum action method, or tMAM), coupled with a finite element discretization. This method is based on two observations:

1. For any given transition path, there exists a unique $T$ that minimizes the action functional subject to a linear scaling of time;

2. For transition paths defined on a finite element approximation space, the optimal integration time $T^*$ is always finite, but it increases to infinity as the approximation space is refined.

The first observation removes the parameter $T$ in Problem II and allows us to always discretize on the unit interval $\Gamma_1$. The second observation guarantees that the discrete version of Problem II is well-posed after $T$ is removed. In addition, Problem I becomes a special case of our reformulation of Problem II. We thus achieve our aim of dealing with both Problems I and II in a consistent way.

Although many techniques have been developed from the algorithmic point of view, little numerical analysis has been done for minimum action methods. We partially fill this gap in this chapter. We start with the idea of tMAM.

7.3 Reformulation and Finite Element Discretization of the Problem

By Maupertuis' principle of least action, for the minimizer $(T^*, \varphi^*)$ of Problem II we have the following necessary condition:

Lemma 7.3.1 ([17]). Let $(T^*, \varphi^*)$ be the minimizer of Problem II. Then $\varphi^*$ is located on the surface $H(\varphi, \frac{\partial L}{\partial \varphi'}) = 0$, where $H$ is the Hamiltonian given by the Legendre transform of $L(\varphi, \varphi') := \frac{1}{2}|\varphi' - b(\varphi)|^2$. More specifically, for Equation (6.3.1),
$$H\Big(\varphi, \frac{\partial L}{\partial \varphi'}\Big) = 0 \iff |\varphi'(t)| = |b(\varphi(t))|, \quad \forall\, t. \qquad (7.3.1)$$

We call Equation (7.3.1) the zero-Hamiltonian constraint. It provides a nonlinear mapping between the arc length of the geometrically fixed lines on the surface $H = 0$ and the time $t$ (see Section 7.6 for more details).

Here we consider a linear time scaling on $\Gamma_T$, which is simpler and more flexible for numerical approximation. For any given transition path $\varphi$ and a fixed $T$, we consider the change of variable $s = t/T \in [0, 1] = \Gamma_1$. Let $\varphi(t) = \varphi(sT) =: \bar{\varphi}(s)$. Then $\bar{\varphi}'(s) = \varphi'(t)\, T$, and we can rewrite the action functional as
$$S_T(\varphi(t)) = S_T(\bar{\varphi}(s)) = \frac{T}{2} \int_0^1 \big| T^{-1} \bar{\varphi}'(s) - b(\bar{\varphi}(s)) \big|^2\, ds =: S(T, \bar{\varphi}). \qquad (7.3.2)$$

Now the two variables $\bar{\varphi}$ and $T$ are independent, since $\bar{\varphi}$ is just an arbitrary path defined on $[0, 1]$ connecting $x_1$ and $x_2$. So, for a fixed path, using the optimality condition we can find the minimizer of $S$, which is then simply a function of $T$.

Lemma 7.3.2. For any given transition path $\bar{\varphi}$, we have
$$\hat{S}(\bar{\varphi}) := S(\hat{T}(\bar{\varphi}), \bar{\varphi}) = \inf_{T \in \mathbb{R}^+} S(T, \bar{\varphi}), \qquad (7.3.3)$$
if $\hat{T}(\bar{\varphi}) < \infty$, where
$$\hat{T}(\bar{\varphi}) = \frac{|\bar{\varphi}'|_{0,\Gamma_1}}{|b(\bar{\varphi})|_{0,\Gamma_1}}. \qquad (7.3.4)$$

Proof. It is easy to verify that $\hat{T}(\bar{\varphi})$ is nothing but the unique solution of the optimality condition $\partial_T S(T, \bar{\varphi}) = 0$.
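Lemma 7.3.2 is easy to check numerically; the sketch below (added here for illustration, with the assumed drift $b(x) = -x$) compares the ratio of $L^2$ norms in (7.3.4) with a brute-force scan over $T$:

```python
import numpy as np

# Numerical check of Lemma 7.3.2 (illustrative, with assumed drift
# b(x) = -x): T_hat, the ratio of the L^2 norms of phi' and b(phi) on
# [0, 1], minimizes S(T, phi) over T.
def S(T, dphi, bm, ds):
    return 0.5 * T * np.sum((dphi / T - bm) ** 2) * ds

b = lambda x: -x
s = np.linspace(0.0, 1.0, 1001)
ds = s[1] - s[0]
phi = 1.0 - s                        # a path from 1 to 0 on [0, 1]
dphi = np.diff(phi) / ds
bm = b(0.5 * (phi[:-1] + phi[1:]))

T_hat = np.sqrt(np.sum(dphi**2) / np.sum(bm**2))   # here sqrt(3)
Ts = np.linspace(0.2, 10.0, 2000)
T_best = Ts[np.argmin([S(T, dphi, bm, ds) for T in Ts])]
print(T_hat, T_best)                 # the two values nearly coincide
```

For this path $|\bar{\varphi}'|_{0,\Gamma_1}^2 = 1$ and $|b(\bar{\varphi})|_{0,\Gamma_1}^2 = 1/3$, so $\hat{T} = \sqrt{3}$.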

Corollary 7.3.3. Let $(T^*, \varphi^*)$ be the minimizer of Problem II. If $T^* < \infty$, we have $T^* = \hat{T}(\bar{\varphi}^*)$, where $\bar{\varphi}^*(s) := \varphi^*(sT^*)$.

Proof. By the zero-Hamiltonian constraint (7.3.1) and the definition of $\bar{\varphi}$, we have
$$|(\bar{\varphi}^*)'| = |(\varphi^*)'|\, T^* = |b(\varphi^*)|\, T^* = |b(\bar{\varphi}^*)|\, T^*.$$
Integrating both sides over $\Gamma_1$, we have the conclusion.

For any absolutely continuous path $\varphi$, it is shown in Theorem 5.6.3 in [6] that $S_T$ can be written as
$$S_T(\varphi) = \begin{cases} S_T(\varphi), & \varphi \in H^1(\Gamma_T; \mathbb{R}^n), \\ \infty, & \text{otherwise.} \end{cases} \qquad (7.3.5)$$
Thus, we can solve the problem in the framework of the Sobolev space $H^1(\Gamma_T; \mathbb{R}^n)$.

From now on, we use $H^1(\Gamma_T)$ to indicate $H^1(\Gamma_T; \mathbb{R}^n)$ for simplicity. The same rule applies to other spaces such as $H_0^1(\Gamma; \mathbb{R}^n)$ and $L^2(\Gamma; \mathbb{R}^n)$. Define the following two admissible sets consisting of transition paths:
$$\mathcal{A}_T = \{ \varphi \in H^1(\Gamma_T) : \varphi(0) = 0,\ \varphi(T) = x \}, \qquad (7.3.6)$$
$$\mathcal{A}_1 = \{ \bar{\varphi} \in H^1(\Gamma_1) : \bar{\varphi}(0) = 0,\ \bar{\varphi}(1) = x \}, \qquad (7.3.7)$$
where we let $x_1 = 0$ and $x_2 = x$ just for convenience.

Lemma 7.3.4. If $T^* < \infty$, we have
$$S_{T^*}(\varphi^*) = \hat{S}(\bar{\varphi}^*) = \inf_{\bar{\varphi} \in \mathcal{A}_1} \hat{S}(\bar{\varphi}), \qquad (7.3.8)$$
and $T^* = \hat{T}(\bar{\varphi}^*)$ (see Equation (7.3.4)), where $\varphi^*(t) = \bar{\varphi}^*(t/T^*)$ (or $\bar{\varphi}^*(s) = \varphi^*(sT^*)$).

Proof. If $(T^*, \varphi^*)$ is a minimizer of $S_T(\varphi)$ and $T^* < \infty$, then
$$S_{T^*}(\varphi^*) = \inf_{T \in \mathbb{R}^+} \inf_{\varphi \in \mathcal{A}_T} S_T(\varphi) = \inf_{\bar{\varphi} \in \mathcal{A}_1} \hat{S}(\bar{\varphi}) \le \hat{S}(\bar{\varphi}^*),$$
and
$$\hat{S}(\bar{\varphi}^*) = \inf_{T \in \mathbb{R}^+} S(T, \bar{\varphi}^*) \le S(T^*, \bar{\varphi}^*) = S_{T^*}(\varphi^*).$$
Thus $S_{T^*}(\varphi^*) = S(T^*, \bar{\varphi}^*) = \hat{S}(\bar{\varphi}^*)$, that is, $\bar{\varphi}^*$ is a minimizer of $\hat{S}(\bar{\varphi})$ for $\bar{\varphi} \in \mathcal{A}_1$, and $T^* = \hat{T}(\bar{\varphi}^*)$ from Corollary 7.3.3.

Conversely, if $\bar{\varphi}^*$ is a minimizer of $\hat{S}(\bar{\varphi})$, we let $T^* = \hat{T}(\bar{\varphi}^*)$ and $\varphi^*(t) = \bar{\varphi}^*(t/T^*)$ for $t \in [0, T^*]$. We have
$$S_{T^*}(\varphi^*) = S(\hat{T}(\bar{\varphi}^*), \bar{\varphi}^*) = \hat{S}(\bar{\varphi}^*) = \inf_{\bar{\varphi} \in \mathcal{A}_1} \hat{S}(\bar{\varphi}) = \inf_{T \in \mathbb{R}^+} \inf_{\varphi \in \mathcal{A}_T} S_T(\varphi),$$
when $T^* < \infty$. Then $(T^*, \varphi^*)$ is a minimizer of $S_T(\varphi)$. So the minimizers of $\hat{S}(\bar{\varphi})$ and $S_T(\varphi)$ are in one-to-one correspondence when the optimal integration time is finite.

Lemma 7.3.4 shows that for a finite $T^*$ we can use Equation (7.3.8) instead of Problem II to approximate the quasi-potential, such that the optimization parameter $T$ is removed and we obtain a new problem
$$\hat{S}(\bar{\varphi}^*) = \inf_{\bar{\varphi}(0)=x_1,\, \bar{\varphi}(1)=x_2} \hat{S}(\bar{\varphi}) \qquad (7.3.9)$$
that is equivalent to Problem II.

Now we discretize the problems and solve them numerically. Let $\mathcal{T}_h$ and $\bar{\mathcal{T}}_h$ be partitions of $\Gamma_T$ and $\Gamma_1$, respectively. We define the following approximation spaces given by linear finite elements:
$$B_h = \{ \varphi_h \in \mathcal{A}_T : \varphi_h|_I \text{ is affine for each } I \in \mathcal{T}_h \},$$
$$\bar{B}_h = \{ \bar{\varphi}_h \in \mathcal{A}_1 : \bar{\varphi}_h|_I \text{ is affine for each } I \in \bar{\mathcal{T}}_h \}.$$
For any $h$, we define the following discretized action functionals:
$$S_{T,h}(\varphi_h) = \begin{cases} \frac{1}{2} \int_0^T |\varphi_h' - b(\varphi_h)|^2\, dt, & \text{if } \varphi_h \in B_h, \\ \infty, & \text{if } \varphi_h \notin B_h, \end{cases} \qquad (7.3.10)$$
and
$$\hat{S}_h(\bar{\varphi}_h) = \begin{cases} \frac{\hat{T}(\bar{\varphi}_h)}{2} \int_0^1 \big| \hat{T}(\bar{\varphi}_h)^{-1} \bar{\varphi}_h' - b(\bar{\varphi}_h) \big|^2\, dt, & \text{if } \bar{\varphi}_h \in \bar{B}_h, \\ \infty, & \text{if } \bar{\varphi}_h \notin \bar{B}_h. \end{cases} \qquad (7.3.11)$$

7.4 Problem I with a Fixed T

For this case, our main results are summarized in the following theorem:

Theorem 7.4.1. For Problem I with a fixed $T$, we have
$$\min_{\varphi \in \mathcal{A}_T} S_T(\varphi) = \lim_{h \to 0}\ \inf_{\varphi_h \in B_h} S_{T,h}(\varphi_h),$$
namely, the minima of $S_{T,h}$ converge to the minimum of $S_T(\varphi)$ as $h \to 0$. Moreover, if $\{\varphi_h\} \subset B_h$ is a sequence of minimizers of $S_{T,h}$, then there is a subsequence that converges weakly in $H^1(\Gamma_T)$ to some $\varphi \in \mathcal{A}_T$, which is a minimizer of $S_T$.
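The minimization in Theorem 7.4.1 can be carried out explicitly in a simple linear example (an illustration added here; the drift $b(x) = -x$, $T = 1$, and endpoints $0$ and $1$ are assumed). The discretized action is then a convex quadratic in the nodal values of a piecewise-linear path, so its minimizer solves a tridiagonal linear system, and the exact minimizer $\varphi(t) = \sinh(t)/\sinh(1)$ of the Euler-Lagrange equation $\varphi'' = \varphi$ is available for comparison:

```python
import numpy as np

# Sketch of Problem I for b(x) = -x with T = 1, x1 = 0, x2 = 1: the
# discretized action is a convex quadratic in the nodal values, so the
# finite element minimizer solves a linear system.  The exact minimizer
# is phi(t) = sinh(t)/sinh(1), with S_T = (e^2 - 1)/(e - 1/e)^2.
n = 64
t = np.linspace(0.0, 1.0, n + 1)
dt = t[1] - t[0]
c1, c2 = -1.0 / dt + 0.5, 1.0 / dt + 0.5   # element coefficients of phi' + phi

H = np.zeros((n + 1, n + 1))               # Hessian of the quadratic action
He = dt * np.array([[c1 * c1, c1 * c2], [c1 * c2, c2 * c2]])
for k in range(n):
    H[k:k + 2, k:k + 2] += He

phi = np.zeros(n + 1)
phi[-1] = 1.0                              # boundary conditions
# Solve for the interior nodes with the endpoints held fixed.
phi[1:-1] = np.linalg.solve(H[1:-1, 1:-1], -H[1:-1, [0, -1]] @ phi[[0, -1]])

S_min = 0.5 * np.sum((np.diff(phi) / dt + 0.5 * (phi[:-1] + phi[1:])) ** 2) * dt
exact = (np.e**2 - 1) / (np.e - 1.0 / np.e) ** 2
print(S_min, exact)
```

The discrete minimum and the discrete minimizer both converge to their exact counterparts at second order in $h$, consistent with the theorem.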

We prove this theorem in three steps:

Step 1: The existence of the minimizer of $S_T(\varphi)$ in $\mathcal{A}_T$.

Lemma 7.4.2 (Solution Existence). There exists at least one function $\varphi^* \in \mathcal{A}_T$ such that
$$S_T(\varphi^*) = \min_{\varphi \in \mathcal{A}_T} S_T(\varphi).$$

Proof. We first prove the coerciveness of $S_T(\varphi) = \frac{1}{2}\int_0^T |\varphi' - b(\varphi)|^2\, dt$. Define an auxiliary function $g$ by
$$g(t) = \varphi(t) - \int_0^t b(\varphi(u))\, du.$$
Then $g' = \varphi' - b(\varphi)$ and $g(0) = 0$. Since $b$ is globally Lipschitz continuous, we have
$$|\varphi'(t)| \le |b(\varphi(t)) - b(0)| + |b(0)| + |g'(t)| \le K|\varphi(t)| + |b(0)| + |g'(t)| \le K \int_0^t |\varphi'(s)|\, ds + |b(0)| + |g'(t)|,$$
where the last step uses $\varphi(0) = 0$.

By Gronwall's inequality, we have
$$|\varphi'(t)| \le K \int_0^t \big(|b(0)| + |g'(s)|\big) e^{K(t-s)}\, ds + |b(0)| + |g'(t)|,$$
from which we obtain
$$|\varphi|_{1,\Gamma_T}^2 \le C_1 |b(0)|^2 + C_2 |g|_{1,\Gamma_T}^2,$$
where $C_1$ and $C_2$ are two positive constants depending on $K$ and $T$. Thus,
$$S_T(\varphi) = \frac{1}{2}|g|_{1,\Gamma_T}^2 \ge \frac{1}{2} C_2^{-1} |\varphi|_{1,\Gamma_T}^2 - \frac{1}{2} C_1 C_2^{-1} |b(0)|^2.$$

The coerciveness of the action functional follows. On the other hand, the integrand $|\varphi' - b(\varphi)|^2$ is bounded below by 0 and convex in $\varphi'$. By Theorem 2 on Page 448 in [11], $S_T(\varphi)$ is weakly lower semicontinuous on $H^1(\Gamma_T)$.

For any minimizing sequence $\{\varphi_k\}_{k=1}^\infty$, the coerciveness gives
$$\sup_k |\varphi_k|_{1,\Gamma_T} < \infty.$$
Let $\varphi_0 \in \mathcal{A}_T$ be any fixed function, e.g., the linear function on $\Gamma_T$ from $0$ to $x$. Then $\varphi_k - \varphi_0 \in H_0^1(\Gamma_T)$, and
$$|\varphi_k|_{0,\Gamma_T} \le |\varphi_k - \varphi_0|_{0,\Gamma_T} + |\varphi_0|_{0,\Gamma_T} \le C_p |\varphi_k - \varphi_0|_{1,\Gamma_T} + |\varphi_0|_{0,\Gamma_T} < \infty,$$
by Poincaré's inequality. Thus $\{\varphi_k\}_{k=1}^\infty$ is bounded in $H^1(\Gamma_T)$. Then there exists a subsequence $\{\varphi_{k_j}\}_{j=1}^\infty$ converging weakly to some $\varphi^*$ in $H^1(\Gamma_T)$, and $\varphi_{k_j} - \varphi_0$ converges weakly to $\varphi^* - \varphi_0$ in $H_0^1(\Gamma_T)$. By Mazur's Theorem [11], $H_0^1(\Gamma_T)$ is weakly closed. So $\varphi^* - \varphi_0 \in H_0^1(\Gamma_T)$, i.e., $\varphi^* \in \mathcal{A}_T$.

Therefore, $S_T(\varphi^*) \le \liminf_{j \to \infty} S_T(\varphi_{k_j}) = \inf_{\varphi \in \mathcal{A}_T} S_T(\varphi)$. Since $\varphi^* \in \mathcal{A}_T$, we reach the conclusion.

Step 2: The $\Gamma$-convergence of $S_{T,h}$ to $S_T$ as $h \to 0$. The following property will be used frequently.

Property 7.4.3. For any sequence $\{\varphi_h\} \subset B_h$ converging weakly to $\varphi \in H^1(\Gamma_T)$, we have
$$\lim_{h \to 0} |b(\varphi_h) - b(\varphi)|_{0,\Gamma_T} = 0.$$
Proof. Since $\varphi_h$ converges weakly to $\varphi$ in $H^1(\Gamma_T)$, $\varphi_h \to \varphi$ in $L^2(\Gamma_T)$, i.e., $\varphi_h$ converges strongly to $\varphi$ in the $L^2$ sense. By the Lipschitz continuity of $b$, we reach the conclusion.

Lemma 7.4.4 ($\Gamma$-convergence of $S_{T,h}$). Let $\{\mathcal{T}_h\}$ be a sequence of finite element meshes with $h \to 0$. For every $\varphi \in \mathcal{A}_T$, the following two properties hold:

• Lim-inf inequality: for every sequence $\{\varphi_h\}$ converging weakly to $\varphi$ in $H^1(\Gamma_T)$, we have
$$S_T(\varphi) \le \liminf_{h \to 0} S_{T,h}(\varphi_h). \qquad (7.4.1)$$

• Lim-sup inequality: there exists a sequence $\{\varphi_h\} \subset B_h$ converging weakly to $\varphi$ in $H^1(\Gamma_T)$, such that
$$S_T(\varphi) \ge \limsup_{h \to 0} S_{T,h}(\varphi_h). \qquad (7.4.2)$$

Proof. We only need to consider a sequence $\{\varphi_h\} \subset B_h$ for the lim-inf inequality, since otherwise (7.4.1) is trivial by the definition of $S_{T,h}(\varphi)$. Let $\{\varphi_h\} \subset B_h$ be an arbitrary sequence converging weakly to $\varphi$ in $H^1(\Gamma_T)$. The discretized action functional can be rewritten as
$$\int_0^T |\varphi_h' - b(\varphi_h)|^2\, dt = \int_0^T |\varphi_h'|^2\, dt + \int_0^T |b(\varphi_h)|^2\, dt - 2 \int_0^T \langle \varphi_h', b(\varphi_h) \rangle\, dt = I_1 + I_2 + I_3. \qquad (7.4.3)$$

The first summand $I_1$ is obviously weakly lower semicontinuous in $H^1(\Gamma_T)$, since the integrand is convex with respect to $\varphi'$.

For $I_2$ in Equation (7.4.3), using Property 7.4.3 we have
$$\lim_{h \to 0} |b(\varphi_h)|_{0,\Gamma_T} = |b(\varphi)|_{0,\Gamma_T}.$$

For $I_3$ in Equation (7.4.3), we have
$$\Big| \int_0^T \langle \varphi_h', b(\varphi_h) \rangle\, dt - \int_0^T \langle \varphi', b(\varphi) \rangle\, dt \Big| = \Big| \int_0^T \langle \varphi_h', b(\varphi_h) - b(\varphi) \rangle\, dt + \int_0^T \langle \varphi_h' - \varphi', b(\varphi) \rangle\, dt \Big| \le |\varphi_h|_{1,\Gamma_T} |b(\varphi_h) - b(\varphi)|_{0,\Gamma_T} + |\langle \varphi_h' - \varphi', b(\varphi) \rangle_{\Gamma_T}|.$$
Using Property 7.4.3 and the fact that $\sup_h |\varphi_h|_{1,\Gamma_T} < \infty$, the first term of the above bound converges to 0. The second term also converges to 0, due to the weak convergence of $\varphi_h$ to $\varphi$ in $H^1(\Gamma_T)$. Thus,
$$\lim_{h \to 0} \int_0^T \langle \varphi_h', b(\varphi_h) \rangle\, dt = \int_0^T \langle \varphi', b(\varphi) \rangle\, dt.$$

Combining the results for $I_1$, $I_2$, and $I_3$, we obtain the lim-inf inequality:
$$\liminf_{h \to 0} \int_0^T |\varphi_h' - b(\varphi_h)|^2\, dt = \liminf_{h \to 0} \int_0^T |\varphi_h'|^2\, dt + \lim_{h \to 0} \int_0^T |b(\varphi_h)|^2\, dt - 2 \lim_{h \to 0} \int_0^T \langle \varphi_h', b(\varphi_h) \rangle\, dt \ge \int_0^T |\varphi'|^2\, dt + \int_0^T |b(\varphi)|^2\, dt - 2 \int_0^T \langle \varphi', b(\varphi) \rangle\, dt = \int_0^T |\varphi' - b(\varphi)|^2\, dt.$$

Now we address the lim-sup inequality. Since $H^2(\Gamma_T)$ is dense in $H^1(\Gamma_T)$, for any $\varphi \in H^1(\Gamma_T)$ and $\varepsilon > 0$ there exists a non-zero $u_\varepsilon \in H^2(\Gamma_T)$ such that $\|\varphi - u_\varepsilon\|_{1,\Gamma_T} < \varepsilon$. We have
$$|I_h u_\varepsilon - u_\varepsilon|_{1,\Gamma_T} \le c h |u_\varepsilon|_{2,\Gamma_T} \le c\varepsilon,$$
by letting
$$h = h(\varepsilon) = \min\Big\{ \frac{\varepsilon}{|u_\varepsilon|_{1,\Gamma_T}}, \frac{\varepsilon}{|u_\varepsilon|_{2,\Gamma_T}}, \varepsilon \Big\},$$
where $I_h$ is the interpolation operator defined by the linear finite elements. Let $\varphi_h = I_h u_\varepsilon$. Then $\varphi_h \in B_h$, and

$$|\varphi_h - \varphi|_{1,\Gamma_T} \le |\varphi_h - u_\varepsilon|_{1,\Gamma_T} + |u_\varepsilon - \varphi|_{1,\Gamma_T} = |I_h u_\varepsilon - u_\varepsilon|_{1,\Gamma_T} + |u_\varepsilon - \varphi|_{1,\Gamma_T} \le c h |u_\varepsilon|_{2,\Gamma_T} + \varepsilon,$$
$$|\varphi_h - \varphi|_{0,\Gamma_T} \le |\varphi_h - u_\varepsilon|_{0,\Gamma_T} + |u_\varepsilon - \varphi|_{0,\Gamma_T} = |I_h u_\varepsilon - u_\varepsilon|_{0,\Gamma_T} + |u_\varepsilon - \varphi|_{0,\Gamma_T} \le c h |u_\varepsilon|_{1,\Gamma_T} + \varepsilon,$$
and both bounds go to 0 as $\varepsilon \to 0$. So $\varphi_h$ converges to $\varphi$ in $H^1(\Gamma_T)$, and in particular converges weakly in $H^1(\Gamma_T)$.

By Property 7.4.3, we know that $b(\varphi_h) \to b(\varphi)$ in $L^2(\Gamma_T)$. Thus,
$$\lim_{h \to 0} S_{T,h}(\varphi_h) = \lim_{h \to 0} \frac{1}{2} |\varphi_h' - b(\varphi_h)|_{0,\Gamma_T}^2 = S_T(\varphi),$$
which proves the lim-sup inequality.

Step 3: The proof of Theorem 7.4.1.

Proof. With the solution existence and the $\Gamma$-convergence, we only need the equi-coerciveness of $S_{T,h}$ for the final conclusion. For any $\varphi_h \in B_h$, we have $S_{T,h}(\varphi_h) = S_T(\varphi_h)$. Then the equi-coerciveness of $S_{T,h}$ in $B_h$ follows from the coerciveness of $S_T$ restricted to $B_h \subset \mathcal{A}_T$ (see the first part of the proof of Lemma 7.4.2).

7.5 Problem II with a Finite $T^*$

For this case, we consider the reformulation of $S_T$ given in Section 7.3. From Lemma 7.3.4, we know that Problem II with a finite $T^*$ is equivalent to minimizing $\hat{S}$ in $\mathcal{A}_1$ (see Equation (7.3.8)). Our main results are summarized in the following theorem:

Theorem 7.5.1. For Problem II with a finite $T^*$, we have
$$\min_{\bar{\varphi} \in \mathcal{A}_1} \hat{S}(\bar{\varphi}) = \lim_{h \to 0}\ \inf_{\bar{\varphi}_h \in \bar{B}_h} \hat{S}_h(\bar{\varphi}_h),$$
namely, the minima of $\hat{S}_h$ converge to the minimum of $\hat{S}$ as $h \to 0$. Moreover, if $\{\bar{\varphi}_h\} \subset \bar{B}_h$ is a sequence of minimizers of $\hat{S}_h$, then there is a subsequence that converges weakly in $H^1(\Gamma_1)$ to some $\bar{\varphi} \in \mathcal{A}_1$, which is a minimizer of $\hat{S}$.

Similar to Problem I with a fixed $T$, we prove this theorem, now in four steps:

Step 1: The strict positivity of the functional $\hat{T}$.

Property 7.5.2. There exists a constant $C_{\hat{T}} > 0$ such that
$$\hat{T}(\bar{\varphi}) \ge C_{\hat{T}}, \qquad (7.5.1)$$
for any $\bar{\varphi} \in \mathcal{A}_1$.

Proof. For any $\bar{\varphi} \in \mathcal{A}_1$, let $\bar{\varphi} = \bar{\varphi}_0 + \bar{\varphi}_L$, where $\bar{\varphi}_0 \in H_0^1(\Gamma_1)$ and $\bar{\varphi}_L(s) = xs$, $s \in [0, 1]$, is the linear path connecting $0$ and $x$. We have
$$\hat{T}(\bar{\varphi}) = \frac{|\bar{\varphi}_0' + x|_{0,\Gamma_1}}{|b(\bar{\varphi}_0 + \bar{\varphi}_L)|_{0,\Gamma_1}} \ge \frac{|\bar{\varphi}_0' + x|_{0,\Gamma_1}}{|b(\bar{\varphi}_0 + \bar{\varphi}_L) - b(\bar{\varphi}_L)|_{0,\Gamma_1} + |b(\bar{\varphi}_L)|_{0,\Gamma_1}} \ge \frac{|\bar{\varphi}_0' + x|_{0,\Gamma_1}}{K|\bar{\varphi}_0|_{0,\Gamma_1} + |b(\bar{\varphi}_L)|_{0,\Gamma_1}} \ge \frac{|\bar{\varphi}_0' + x|_{0,\Gamma_1}}{K C_p |\bar{\varphi}_0'|_{0,\Gamma_1} + |b(\bar{\varphi}_L)|_{0,\Gamma_1}},$$
where $C_p$ is the constant in Poincaré's inequality. So
$$\hat{T}(\bar{\varphi})^2 \ge \frac{|\bar{\varphi}_0' + x|_{0,\Gamma_1}^2}{2K^2 C_p^2 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + 2|b(\bar{\varphi}_L)|_{0,\Gamma_1}^2} = \frac{|\bar{\varphi}_0' + x|_{0,\Gamma_1}^2}{C_1 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + C_2} =: J(\bar{\varphi}_0) > 0,$$
where $C_1 = 2K^2 C_p^2 > 0$ and $C_2 = 2|b(\bar{\varphi}_L)|_{0,\Gamma_1}^2 > 0$.

Let $\delta\bar{\varphi} \in H_0^1(\Gamma_1)$ be a perturbation function with $\delta\bar{\varphi}(0) = \delta\bar{\varphi}(1) = 0$. Then
$$J(\bar{\varphi}_0 + \delta\bar{\varphi}) - J(\bar{\varphi}_0) = \frac{|\bar{\varphi}_0' + x + \delta\bar{\varphi}'|_{0,\Gamma_1}^2}{C_1 |\bar{\varphi}_0' + \delta\bar{\varphi}'|_{0,\Gamma_1}^2 + C_2} - \frac{|\bar{\varphi}_0' + x|_{0,\Gamma_1}^2}{C_1 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + C_2} = \frac{2\langle \bar{\varphi}_0' + x, \delta\bar{\varphi}' \rangle_{\Gamma_1} (C_1 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + C_2) - 2 C_1 \langle \bar{\varphi}_0', \delta\bar{\varphi}' \rangle_{\Gamma_1} |\bar{\varphi}_0' + x|_{0,\Gamma_1}^2}{(C_1 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + C_2)^2} + R(\bar{\varphi}_0', x, \delta\bar{\varphi}'),$$
where $R$ is a remainder term of order $O(|\delta\bar{\varphi}|_{1,\Gamma_1}^2)$. Noting that $\langle x, \delta\bar{\varphi}' \rangle_{\Gamma_1} = 0$, since $\int_0^1 \delta\bar{\varphi}'\, ds = \delta\bar{\varphi}(1) - \delta\bar{\varphi}(0) = 0$, the first-order variation of $J$ is
$$\delta J = \frac{2\langle \bar{\varphi}_0', \delta\bar{\varphi}' \rangle_{\Gamma_1} \big( C_1 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + C_2 - C_1 |\bar{\varphi}_0' + x|_{0,\Gamma_1}^2 \big)}{(C_1 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + C_2)^2}.$$

The optimality condition $\delta J = 0$ is then satisfied only in two cases: $\bar{\varphi}_0' = 0$, or $C_1 |\bar{\varphi}_0'|_{0,\Gamma_1}^2 + C_2 = C_1 |\bar{\varphi}_0' + x|_{0,\Gamma_1}^2$. In the first case $\bar{\varphi}_0$ is a constant; but $\bar{\varphi}_0 \in H_0^1(\Gamma_1)$, so $\bar{\varphi}_0 = 0$, and $J(0) = \frac{|x|^2}{C_2} > 0$. In the second case, $J(\bar{\varphi}_0) = \frac{1}{C_1} > 0$. Thus,
$$\hat{T}^2(\bar{\varphi}) \ge \min\Big\{ \frac{|x|^2}{C_2}, \frac{1}{C_1} \Big\},$$
or more specifically,
$$\hat{T}(\bar{\varphi}) \ge C_{\hat{T}} := \min\Big\{ \frac{|x|}{\sqrt{2}\, |b(\bar{\varphi}_L)|_{0,\Gamma_1}}, \frac{1}{\sqrt{2}\, K C_p} \Big\} > 0,$$
which proves the property.

Step 2: The existence of the minimizer of $\hat{S}(\bar{\varphi})$ in $\mathcal{A}_1$.

Lemma 7.5.3 (Solution Existence). If the optimal integration time $T^*$ for Problem II is finite, there exists at least one function $\varphi^* \in \mathcal{A}_{T^*}$ such that
$$S_{T^*}(\varphi^*) = \min_{T \in \mathbb{R}^+,\, \varphi \in \mathcal{A}_T} S_T(\varphi) = \min_{\bar{\varphi} \in \mathcal{A}_1} \hat{S}(\bar{\varphi}).$$

Proof. We first prove the weak lower semicontinuity of $\hat{S}(\bar{\varphi})$ in $H^1(\Gamma_1)$. Rewriting $\hat{S}(\bar{\varphi})$ by substituting (7.3.4), we get
$$\hat{S}(\bar{\varphi}) = \frac{\hat{T}(\bar{\varphi})}{2} \int_0^1 \big| \hat{T}^{-1}(\bar{\varphi})\, \bar{\varphi}' - b(\bar{\varphi}) \big|^2\, dt = |\bar{\varphi}'|_{0,\Gamma_1}\, |b(\bar{\varphi})|_{0,\Gamma_1} - \langle \bar{\varphi}', b(\bar{\varphi}) \rangle_{\Gamma_1}.$$

For any sequence $\bar{\varphi}_k$ converging weakly to $\bar{\varphi}$ in $H^1(\Gamma_1)$, $\{\bar{\varphi}_k'\}$ is bounded in $L^2(\Gamma_1)$ and $\bar{\varphi}_k \to \bar{\varphi}$ in $L^2(\Gamma_1)$. Together with the global Lipschitz continuity of $b$, we obtain
$$\lim_{k \to \infty} |b(\bar{\varphi}_k)|_{0,\Gamma_1}^2 = |b(\bar{\varphi})|_{0,\Gamma_1}^2, \qquad \lim_{k \to \infty} \langle \bar{\varphi}_k', b(\bar{\varphi}_k) \rangle_{\Gamma_1} = \langle \bar{\varphi}', b(\bar{\varphi}) \rangle_{\Gamma_1}.$$
The weak lower semicontinuity of $|\bar{\varphi}'|_{0,\Gamma_1}$ yields
$$\liminf_{k \to \infty} |\bar{\varphi}_k'|_{0,\Gamma_1} \ge |\bar{\varphi}'|_{0,\Gamma_1}. \qquad (7.5.2)$$
Combining the above results, we have
$$\liminf_{k \to \infty} \hat{S}(\bar{\varphi}_k) = \liminf_{k \to \infty} \big( |\bar{\varphi}_k'|_{0,\Gamma_1} |b(\bar{\varphi}_k)|_{0,\Gamma_1} - \langle \bar{\varphi}_k', b(\bar{\varphi}_k) \rangle_{\Gamma_1} \big) = \liminf_{k \to \infty} |\bar{\varphi}_k'|_{0,\Gamma_1} |b(\bar{\varphi}_k)|_{0,\Gamma_1} - \lim_{k \to \infty} \langle \bar{\varphi}_k', b(\bar{\varphi}_k) \rangle_{\Gamma_1} \ge |\bar{\varphi}'|_{0,\Gamma_1} |b(\bar{\varphi})|_{0,\Gamma_1} - \langle \bar{\varphi}', b(\bar{\varphi}) \rangle_{\Gamma_1} = \hat{S}(\bar{\varphi}),$$
that is, $\hat{S}(\bar{\varphi})$ is weakly lower semicontinuous in $H^1(\Gamma_1)$.

Now we establish the coerciveness of $\hat{S}(\bar{\varphi})$. Since $T^*$ is finite, there exists $M \in (T^*, \infty)$ such that
$$\inf_{\bar{\varphi} \in \mathcal{A}_1} \hat{S}(\bar{\varphi}) = \inf_{\bar{\varphi} \in \mathcal{A}_1,\, \hat{T}(\bar{\varphi}) < M} \hat{S}(\bar{\varphi}),$$
so it suffices to consider a minimizing sequence $\{\bar{\varphi}_k\}$ of $\hat{S}$. The assumption $T^* < \infty$ allows us to add the condition $\sup_k \hat{T}(\bar{\varphi}_k) < M$; otherwise $\hat{T}(\bar{\varphi}_k)$ must go to infinity, and the continuity of $S(T, \bar{\varphi})$ with respect to $T$ would yield $T^* = \infty$, which contradicts our assumption that $T^* < \infty$. Now let $\hat{T}^{-1}(\bar{\varphi})\, \bar{\varphi}'(s) - b(\bar{\varphi}(s)) = \bar{g}'(s)$. Then for any $\bar{\varphi} \in \mathcal{A}_1$ with $\hat{T}(\bar{\varphi}) < M$,

$$|\bar{\varphi}'| \le \hat{T}(\bar{\varphi})\, |b(\bar{\varphi})| + \hat{T}(\bar{\varphi})\, |\bar{g}'| \le M |b(\bar{\varphi})| + M |\bar{g}'| \le MK |\bar{\varphi}| + M |b(0)| + M |\bar{g}'| \le MK \int_0^s |\bar{\varphi}'(u)|\, du + M |b(0)| + M |\bar{g}'|.$$
By Gronwall's inequality, we have
$$|\bar{\varphi}'(s)| \le M^2 K \int_0^s \big( |\bar{g}'(u)| + |b(0)| \big) e^{KM(s-u)}\, du + M |b(0)| + M |\bar{g}'(s)|,$$
which yields
$$|\bar{\varphi}'|_{0,\Gamma_1}^2 \le C_1 |b(0)|^2 + C_2 |\bar{g}'|_{0,\Gamma_1}^2, \qquad (7.5.3)$$
where $C_1, C_2 \in (0, \infty)$ depend only on $M$ and $K$. So
$$\hat{S}(\bar{\varphi}) = \frac{\hat{T}(\bar{\varphi})}{2} \int_0^1 \big| \hat{T}^{-1}(\bar{\varphi})\, \bar{\varphi}'(s) - b(\bar{\varphi}(s)) \big|^2\, ds = \frac{\hat{T}(\bar{\varphi})}{2} |\bar{g}'|_{0,\Gamma_1}^2 \ge \frac{C_{\hat{T}}}{2} \Big( \frac{1}{C_2} |\bar{\varphi}'|_{0,\Gamma_1}^2 - \frac{C_1}{C_2} |b(0)|^2 \Big),$$
where we used Property 7.5.2 in the last step. The coerciveness is proved.

For any minimizing sequence $\{\bar{\varphi}_k\}_{k=1}^\infty$ of $\hat{S}(\bar{\varphi})$, we have
$$\sup_k |\bar{\varphi}_k'|_{0,\Gamma_1}^2 \le \frac{2C_1}{C_{\hat{T}}} |b(0)|^2 + \frac{2C_2}{C_{\hat{T}}} \sup_k \hat{S}(\bar{\varphi}_k) < \infty.$$
Let $\bar{\varphi}_0 \in \mathcal{A}_1$. Then
$$|\bar{\varphi}_k|_{0,\Gamma_1} \le |\bar{\varphi}_k - \bar{\varphi}_0|_{0,\Gamma_1} + |\bar{\varphi}_0|_{0,\Gamma_1} \le C_p |\bar{\varphi}_k' - \bar{\varphi}_0'|_{0,\Gamma_1} + |\bar{\varphi}_0|_{0,\Gamma_1} < \infty,$$
by Poincaré's inequality. Thus $\{\bar{\varphi}_k\}_{k=1}^\infty$ is bounded in $H^1(\Gamma_1)$. Then there is a subsequence $\{\bar{\varphi}_{k_j}\}_{j=1}^\infty$ converging weakly to some $\bar{\varphi}^* \in H^1(\Gamma_1)$, and $\bar{\varphi}_{k_j} - \bar{\varphi}_0$ converges weakly to $\bar{\varphi}^* - \bar{\varphi}_0$ in $H_0^1(\Gamma_1)$. By Mazur's Theorem, $H_0^1(\Gamma_1)$ is weakly closed, so $\bar{\varphi}^* - \bar{\varphi}_0 \in H_0^1(\Gamma_1)$ and $\bar{\varphi}^* \in \mathcal{A}_1$. By Lemma 7.3.4, the path $\varphi^* \in \mathcal{A}_{T^*}$ corresponding to $\bar{\varphi}^* \in \mathcal{A}_1$ is a minimizer of $S_T(\varphi)$, and $T^* = \hat{T}(\bar{\varphi}^*)$.
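The closed form used in this proof, $\hat{S}(\bar{\varphi}) = |\bar{\varphi}'|_{0,\Gamma_1} |b(\bar{\varphi})|_{0,\Gamma_1} - \langle \bar{\varphi}', b(\bar{\varphi}) \rangle_{\Gamma_1}$, is also convenient computationally. A quick consistency check (an illustration added here, with assumed drift $b(x) = -x$ and an arbitrary test path):

```python
import numpy as np

# Check: S_hat = |phi'| * |b(phi)| - <phi', b(phi)> (L^2 quantities on
# [0, 1]) coincides with S(T_hat, phi) for T_hat = |phi'| / |b(phi)|,
# and is nonnegative by the Cauchy-Schwarz inequality.
b = lambda x: -x
s = np.linspace(0.0, 1.0, 2001)
ds = s[1] - s[0]
phi = np.sin(0.5 * np.pi * s)        # some path from 0 to 1
dphi = np.diff(phi) / ds
bm = b(0.5 * (phi[:-1] + phi[1:]))

norm_dphi = np.sqrt(np.sum(dphi**2) * ds)
norm_b = np.sqrt(np.sum(bm**2) * ds)
inner = np.sum(dphi * bm) * ds

S_hat = norm_dphi * norm_b - inner
T_hat = norm_dphi / norm_b
S_at_T_hat = 0.5 * T_hat * np.sum((dphi / T_hat - bm) ** 2) * ds
print(S_hat, S_at_T_hat)             # identical up to rounding
```

The identity holds exactly, even in the discretized arithmetic, because substituting $T = \hat{T}$ into $\frac{1}{2T}A + \frac{T}{2}B - C$ with $\hat{T} = \sqrt{A/B}$ gives $\sqrt{AB} - C$.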

Step 3: The $\Gamma$-convergence of $\hat{S}_h$ to $\hat{S}$ as $h \to 0$.

Lemma 7.5.4 ($\Gamma$-convergence of $\hat{S}_h$). Let $\{\bar{\mathcal{T}}_h\}$ be a sequence of finite element meshes. For every $\bar{\varphi} \in \mathcal{A}_1$, the following two properties hold:

• Lim-inf inequality: for every sequence $\{\bar{\varphi}_h\}$ converging weakly to $\bar{\varphi}$ in $H^1(\Gamma_1)$, we have
$$\hat{S}(\bar{\varphi}) \le \liminf_{h \to 0} \hat{S}_h(\bar{\varphi}_h). \qquad (7.5.4)$$

• Lim-sup inequality: there exists a sequence $\{\bar{\varphi}_h\} \subset \bar{B}_h$ converging weakly to $\bar{\varphi}$ in $H^1(\Gamma_1)$, such that
$$\hat{S}(\bar{\varphi}) \ge \limsup_{h \to 0} \hat{S}_h(\bar{\varphi}_h). \qquad (7.5.5)$$

Proof. We only need to consider a sequence $\{\bar{\varphi}_h\} \subset \bar{B}_h$ for the lim-inf inequality, since otherwise the inequality is trivial. As in the proof of Lemma 7.5.3, rewrite the discretized action functional as
$$\hat{S}_h(\bar{\varphi}_h) = \frac{\hat{T}(\bar{\varphi}_h)}{2} \int_0^1 \big| \hat{T}^{-1}(\bar{\varphi}_h)\, \bar{\varphi}_h' - b(\bar{\varphi}_h) \big|^2\, dt = |\bar{\varphi}_h'|_{0,\Gamma_1} |b(\bar{\varphi}_h)|_{0,\Gamma_1} - \langle \bar{\varphi}_h', b(\bar{\varphi}_h) \rangle_{\Gamma_1}.$$
By the same argument as in the proof of Lemma 7.4.4, we have
$$\liminf_{h \to 0} |\bar{\varphi}_h'|_{0,\Gamma_1} \ge |\bar{\varphi}'|_{0,\Gamma_1}, \qquad \lim_{h \to 0} |b(\bar{\varphi}_h)|_{0,\Gamma_1} = |b(\bar{\varphi})|_{0,\Gamma_1}, \qquad \lim_{h \to 0} \langle \bar{\varphi}_h', b(\bar{\varphi}_h) \rangle_{\Gamma_1} = \langle \bar{\varphi}', b(\bar{\varphi}) \rangle_{\Gamma_1}.$$
Combining these results, we have the lim-inf inequality. The lim-sup inequality can be obtained by the same argument as in the proof of Lemma 7.4.4.

Step 4: The proof of Theorem 7.5.1.

Proof. Similar to the proof of Theorem 7.4.1, the only thing left is to verify the equi-coerciveness of $\hat{S}_h(\bar{\varphi}_h)$, which can be obtained directly from the coerciveness of $\hat{S}(\bar{\varphi})$ restricted to $\bar{B}_h \subset \mathcal{A}_1$ (see the second step in the proof of Lemma 7.5.3). So the proof is complete.

7.6 Problem II with an Infinite T ∗

In this case, the integration domain becomes the whole positive real line, which cannot be rescaled to the unit interval. The geometric MAM (gMAM) [17] rescales this infinite domain by arc length, assuming finite total arc length instead of finite transition time. This way the zero-Hamiltonian constraint (7.3.1) can also remove the parameter $T$. However, since the Jacobian of the transformation between the time and arc length variables is singular at critical points, the numerical accuracy deteriorates when unknown critical points exist along the MAP.

So we still use the reformulation in time, but we use a large yet finite integration time to deal with the infinite $T^*$. We simplify the scenario while preserving the numerical difficulties. Let $0 \in D$ be an asymptotically stable equilibrium point, where $D$ is contained in the basin of attraction of $0$ and $\langle b(y), n(y) \rangle < 0$ for any $y \in \partial D$, with $n(y)$ the exterior normal to the boundary $\partial D$. Then, starting from any point in $D$, a trajectory of system (6.3.3) converges to $0$. We assume that the ending point $x$ of Problem II is located on $\partial D$.

If we consider a general change of variable, say $\alpha = \alpha(t)$, we have (see Lemma 3.1, Chapter 4 in [13])
$$S_T(\varphi) \ge S(\tilde{\varphi}) = \int_{\alpha(0)}^{\alpha(T)} \big( |\tilde{\varphi}'|\, |b| - \langle \tilde{\varphi}', b \rangle \big)\, d\alpha, \qquad (7.6.1)$$
where $\tilde{\varphi}(\alpha) = \varphi(t(\alpha))$, $\tilde{\varphi}'$ is the derivative with respect to $\alpha$, and equality holds when the zero-Hamiltonian constraint (7.3.1) is satisfied. In terms of $\alpha$, the zero-Hamiltonian constraint can be written as
$$|\tilde{\varphi}'|\, \dot{\alpha}(t) = |b(\tilde{\varphi})|, \qquad (7.6.2)$$
which yields
$$t = \int_0^{\alpha(t)} \frac{|\tilde{\varphi}'|}{|b(\tilde{\varphi})|}\, d\alpha. \qquad (7.6.3)$$
If $|\tilde{\varphi}'| \equiv \text{const}$, the variable $\alpha$ is just the rescaled arc length. Assuming that the length of the optimal curve is finite, we can rescale the total arc length to one, i.e., $\alpha(T) = 1$, which yields gMAM.
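The geometric action in (7.6.1) depends only on the curve traced out, not on its parametrization, because the integrand is 1-homogeneous in $\tilde{\varphi}'$ and the parameter increments cancel. A small check (added here for illustration, with an assumed 2-D drift on a quarter-circle curve):

```python
import numpy as np

# The geometric action int (|phi'| |b| - <phi', b>) d(alpha) in (7.6.1)
# is 1-homogeneous in phi', so the parameter increments cancel and the
# discrete value depends only on the sample points on the curve.
def geometric_action(alpha, pts, b):
    d = np.diff(pts, axis=0) / np.diff(alpha)[:, None]
    bm = b(0.5 * (pts[:-1] + pts[1:]))
    integrand = (np.linalg.norm(d, axis=1) * np.linalg.norm(bm, axis=1)
                 - np.sum(d * bm, axis=1))
    return float(np.sum(integrand * np.diff(alpha)))

b = lambda p: np.column_stack((-p[:, 0], -2.0 * p[:, 1]))  # assumed drift
curve = lambda a: np.column_stack((np.cos(0.5 * np.pi * a),
                                   np.sin(0.5 * np.pi * a)))
a1 = np.linspace(0.0, 1.0, 4001)          # uniform parameter values
a2 = a1**2                                # warped parameter values
ga_uniform = geometric_action(a1, curve(a1), b)
ga_warped = geometric_action(a2, curve(a2), b)
print(ga_uniform, ga_warped)              # nearly identical
```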

For any transition path $\tilde{\varphi}(\alpha) = \varphi(t(\alpha))$ that satisfies the zero-Hamiltonian constraint, if $\alpha$ is the arc length variable with $\tilde{\varphi}(0) = 0$, then $|\tilde{\varphi}'| = 1$. Let $y$ be an arbitrary point on $\tilde{\varphi}$. Then the integration time from $0$ to $y$ is
$$t = \int_0^{\alpha_y} \frac{1}{|b(\tilde{\varphi})|}\, d\alpha,$$
where $\alpha_y$ is the arc length of the curve connecting $0$ and $y$. The total arc length from the equilibrium $0$ to $y$ along $\tilde{\varphi}$ is small if $y$ is in a small neighborhood of $0$. However, from
$$|b(\tilde{\varphi})| = |b(\tilde{\varphi}) - b(0)| \le K|\tilde{\varphi}| \le K\alpha,$$
we get
$$t = \int_0^{\alpha_y} \frac{1}{|b(\tilde{\varphi})|}\, d\alpha \ge \int_0^{\alpha_y} \frac{1}{K\alpha}\, d\alpha = \infty,$$
as long as $\alpha_y > 0$. Thus $T^* = \infty$, since $0$ is a critical point.
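The logarithmic rate of this divergence can be made concrete; a sketch (added here for illustration, with $K = 1$ assumed) truncates the lower limit at $\delta$ and watches the travel time grow like $\log(1/\delta)$:

```python
import math

# Travel time near a critical point: with |b| <= K * alpha (K = 1
# assumed), truncating the lower limit of int d(alpha)/alpha at delta
# gives int_delta^1 d(alpha)/alpha = log(1/delta), which grows without
# bound as delta -> 0.  Midpoint-rule evaluation:
def travel_time(delta, n=200000):
    h = (1.0 - delta) / n
    return sum(h / (delta + (i + 0.5) * h) for i in range(n))

for delta in (1e-1, 1e-2, 1e-3):
    print(delta, travel_time(delta))   # approximately log(1/delta)
```

This slow logarithmic blow-up is exactly why a large but finite integration time is a workable surrogate for $T^* = \infty$ in tMAM.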

For clarity, we indicate the starting and ending points of a transition path in its notation. Let $\varphi_{y,x}^*$ denote the minimizer of Problem II with starting point $y$ and ending point $x$, and $T_{y,x}^*$ the corresponding optimal integration time. Then, for any $y$ on $\varphi_{0,x}^*$, $T_{0,y}^* = \infty$, and $T_{y,x}^* < \infty$ whenever the path $\varphi_{y,x}^*$ has finite arc length.

We can construct a minimizing sequence for $\hat{S}(\bar{\varphi})$. In fact, let $\varphi_{0,y}^L = yt$ be the linear path from $0$ to $y$ in one time unit $T = 1$. Then
$$S_{T_{0,y}^*}(\varphi_{0,y}^*) \le S_T(\varphi_{0,y}^L) = \frac{1}{2} \int_0^1 |y - b(yt)|^2\, dt \le \int_0^1 \big( |y|^2 + K^2 |y|^2 t^2 \big)\, dt \le C(K)\rho^2,$$

where |y| ≤ ρ. So the action S_{T*_{0,y}}(φ*_{0,y}) decreases to zero as ρ does, even though T*_{0,y} = ∞ for any finite ρ. Consider the sequence of optimization problems

    \hat{S}(\bar{\varphi}^{*,n}_{0,x}) = \inf_{\hat{T}(\bar{\varphi}) \leq n,\; \bar{\varphi} \in \mathcal{A}_1} \hat{S}(\bar{\varphi}), \qquad n = 1, 2, 3, \ldots,    (7.6.4)

obtained by adding the constraint T̂(φ̄) ≤ n. We have

Lemma 7.6.1. {φ̄^{*,n}_{0,x}}_{n=1}^∞ is a minimizing sequence of (7.3.9).

Proof. First of all, Ŝ(φ̄^{*,n}_{0,x}) is decreasing as n increases. Choose ρ such that x ∉ B_ρ(0) and define ρ_k = 2^{-k}ρ, k = 1, 2, .... Let y_k be the first intersection point of φ*_{0,x} and B_{ρ_k}(0) when travelling along φ*_{0,x} from x to 0. Then |y_k| = ρ_k and the optimal transition time T*_{y_k,x} < ∞. For each k, we construct a path from 0 to x by

    \varphi_k =
    \begin{cases}
      \varphi^L_{0,y_k}, & t \in [-T^*_{y_k,x} - 1, \, -T^*_{y_k,x}], \\
      \varphi^*_{y_k,x}, & t \in [-T^*_{y_k,x}, \, 0].
    \end{cases}

Due to additivity, we know that φ*_{y_k,x} is located on φ*_{0,x} since y_k ∈ φ*_{0,x}. Then {(T_k = T*_{y_k,x} + 1, φ_k)} is a minimizing sequence as ρ_k decreases, and

    S_{T_k}(\varphi_k) \leq S_{T^*_{0,x}}(\varphi^*_{0,x}) + C(K)\rho_k^2.

Consider n = ⌈T_k⌉. We have

    \hat{S}(\bar{\varphi}^{*,n}_{0,x}) \leq S_{T_k}(\varphi_k) \leq S_{T^*_{0,x}}(\varphi^*_{0,x}) + C(K)\rho_k^2,

where the first inequality holds because T̂(φ_k) is not equal to T_k in general. We then reach the conclusion.

When T* = ∞, we have T̂(φ̄) → ∞ as φ̄ approaches the minimizer, which implies |φ̄′|_{0,Γ_1} → ∞. Thus, for this case, it is not possible to conduct the convergence analysis in H^1(Γ_1). Instead, we use a larger space C̄^{x_2}_{x_1}(0,T), the space of absolutely continuous functions connecting x_1 and x_2 on [0,T] with 0 < T ≤ ∞. To prove the convergence of the minimizing sequence in Lemma 7.6.1, we use the following proposition:

Proposition 7.6.2 (Proposition 2.1 in [17]). Assume that the sequence (T_k, φ_k)_{k∈ℕ}, with T_k > 0 and φ_k ∈ C̄^{x_2}_{x_1}(0, T_k) for every k ∈ ℕ, is a minimizing sequence of (6.3.6), and that the lengths of the curves φ_k are uniformly bounded, i.e.,

    \lim_{k\to\infty} S_{T_k}(\varphi_k) = V(x_1, x_2) \quad \text{and} \quad \sup_{k\in\mathbb{N}} \int_0^{T_k} |\dot{\varphi}_k(t)| \, dt < \infty.

Then the action functional Ŝ has a minimizer φ*, and for some subsequence (φ_{k_l})_{l∈ℕ} we have

    \lim_{l\to\infty} d(\varphi_{k_l}, \varphi^*) = 0,

where d denotes the Fréchet distance.

Theorem 7.6.3. Assume that the lengths of the curves φ^{*,n}_{0,x} are uniformly bounded. Then there exists a subsequence φ^{*,n_l}_{0,x} that converges to a minimizer φ* ∈ C̄^x_0(0,T) with respect to the Fréchet distance.

Proof. By Lemma 7.3.4, we have a one-to-one correspondence between {φ̄^{*,n}_{0,x}}_{n=1}^∞ and {φ^{*,n}_{0,x}}_{n=1}^∞. So by Lemma 7.6.1, {(n, φ^{*,n}_{0,x})}_{n=1}^∞ defines a minimizing sequence of Problem II. The theorem then follows directly from Proposition 7.6.2.

Although we constructed {φ^{*,n}_{0,x}} only for an exit problem in the neighborhood of an asymptotically stable equilibrium, it is easy to see that the idea can be applied to a global transition, say when both x_1 and x_2 are asymptotically stable equilibria, as long as there exist finitely many critical points along the MAP.

In Equation (7.6.4), we introduced an extra constraint T̂(φ̄) ≤ n, which implies that the infimum may be reached at the boundary T̂(φ̄) = n. However, such a box-type constraint is not favorable from the optimization point of view. In the next lemma, we show that this constraint is not needed for the discrete problems, which is the key observation on which tMAM is built (see Section 7.2).

Lemma 7.6.4. If φ̄*_h is the minimizer of Ŝ_h(φ̄_h) over B_h, then T̂(φ̄*_h) ≤ C_h < ∞ for any given h.

Proof. Note that

    \hat{T}^2(\bar{\varphi}^*_h) = \frac{|(\bar{\varphi}^*_h)'|^2_{0,\Gamma_1}}{|b(\bar{\varphi}^*_h)|^2_{0,\Gamma_1}} \leq C \, \frac{|\bar{\varphi}^*_h|^2_{0,\Gamma_1}}{|b(\bar{\varphi}^*_h)|^2_{0,\Gamma_1}},

where the last inequality follows from the inverse inequality of the finite element discretization, and the constant C only depends on the mesh [4]. Suppose T̂(φ̄*_h) = ∞. Then either |b(φ̄*_h)|_{0,Γ_1} = 0 or |φ̄*_h|_{0,Γ_1} = ∞. The first case implies that b(φ̄*_h(s)) = 0 for all s ∈ Γ_1, which contradicts Assumption 7.1.1 (3). The second case implies that φ̄*_h contradicts Lemma 7.1.2 due to the continuity.

Lemma 7.6.4 means that for a discrete problem the constraint T̂(φ̄) ≤ n in Equation (7.6.4) is not necessary, in the sense that there always exists a number n such that T̂(φ̄*_h) < n. We can then consider a sequence {T_h} of finite element meshes and treat the minimization of Ŝ_h(φ̄_h) exactly in the same way as the case that T* is finite. Simply speaking, {(T̂(φ̄*_h), φ̄*_h)} defines a minimizing sequence as h → 0, no matter whether T* is finite or infinite. The only difference between T* < ∞ and T* = ∞ is that we address the convergence of φ̄*_h in H^1(Γ_1) for T* < ∞ and in C̄^{x_2}_{x_1} for T* = ∞.

7.7 A Priori Error Estimate for a Linear ODE System

In this section we apply our strategy to a linear ODE system with b(x) = −Ax, where A is a symmetric positive definite matrix. Then x = 0 is a global attractor. The Euler–Lagrange equation associated with Ŝ(φ̄) becomes

    -\hat{T}^{-2}(\bar{\varphi})\, \bar{\varphi}'' + A^2 \bar{\varphi} = 0,    (7.7.1)

which is a nonlocal elliptic problem of Kirchhoff type. For Problem I with a fixed T, i.e., T̂(φ) = T, Equation (7.7.1) becomes a standard diffusion-reaction equation.
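For a fixed T, the resulting diffusion-reaction problem can be solved directly by linear finite elements. The sketch below is our own illustration in the scalar case A = a > 0 (the function name is made up): it assembles the standard stiffness-plus-mass system for −T⁻²u″ + a²u = 0 and checks it against the exact solution built from sinh.

```python
import numpy as np

def solve_fixed_T(a, T, u0, u1, N):
    """Linear FEM for -T^{-2} u'' + a^2 u = 0 on (0,1), u(0)=u0, u(1)=u1.
    Element stiffness (1/h)[[1,-1],[-1,1]] scaled by T^{-2}; mass (h/6)[[2,1],[1,2]]."""
    h = 1.0 / N
    diag = np.full(N - 1, 2.0 / (h * T**2) + a**2 * 4.0 * h / 6.0)
    off = np.full(N - 2, -1.0 / (h * T**2) + a**2 * h / 6.0)
    K = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    c = -1.0 / (h * T**2) + a**2 * h / 6.0    # coupling to a boundary node
    f = np.zeros(N - 1)
    f[0] -= u0 * c                            # lift the Dirichlet data
    f[-1] -= u1 * c
    u = np.empty(N + 1)
    u[0], u[-1] = u0, u1
    u[1:-1] = np.linalg.solve(K, f)
    return u

a, T, u0, u1, N = 2.0, 1.0, 1.0, 0.5, 64
s = np.linspace(0.0, 1.0, N + 1)
exact = (u0 * np.sinh(a * T * (1 - s)) + u1 * np.sinh(a * T * s)) / np.sinh(a * T)
err = np.max(np.abs(solve_fixed_T(a, T, u0, u1, N) - exact))
print(err)   # second-order small; grows with T, consistent with a CT^2-type bound
```

Making T large while keeping the mesh fixed makes the problem singularly perturbed, which is the boundary-layer effect discussed below.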

Let φ̄* = φ̄*_0 + φ_L ∈ A_1 be the minimizer of Ŝ(φ̄), and φ̄*_h = φ̄*_{h,0} + φ_L ∈ B_h the minimizer of Ŝ_h(φ̄_h), where φ_L = x_1 + (x_2 − x_1)s is the linear function connecting x_1 and x_2 on Γ_1. Let

    V_h = \{ v : v|_I \text{ is affine for all } I \in \mathcal{T}_h, \; v(0) = v(1) = 0 \} \subset H^1_0(\Gamma_1).

For a fixed T, Equation (7.7.1) has a unique solution, and the standard argument shows that

    |\bar{\varphi}^* - \bar{\varphi}^*_h|_{1,\Gamma_1} = |\bar{\varphi}^*_0 - \bar{\varphi}^*_{h,0}|_{1,\Gamma_1} \leq C T^2 \inf_{w \in V_h} |\bar{\varphi}^*_0 - w|_{1,\Gamma_1}.    (7.7.2)

If T is large enough, Equation (7.7.1) can be regarded as a singularly perturbed problem, and the best approximation given by a uniform mesh cannot reach the optimal convergence rate due to the existence of a boundary layer.

We now consider Problem II with a finite T*. The minimizer φ̄* of Ŝ(φ̄) satisfies the weak form of Equation (7.7.1):

    \langle (\bar{\varphi}^*)', v' \rangle_{\Gamma_1} = -\hat{T}^2(\bar{\varphi}^*) \langle A\bar{\varphi}^*, Av \rangle_{\Gamma_1}, \qquad \forall\, v \in H^1_0(\Gamma_1).    (7.7.3)

The minimizer φ̄*_h of Ŝ_h(φ̄_h) satisfies the discrete weak form:

    \langle (\bar{\varphi}^*_h)', v' \rangle_{\Gamma_1} = -\hat{T}^2(\bar{\varphi}^*_h) \langle A\bar{\varphi}^*_h, Av \rangle_{\Gamma_1}, \qquad \forall\, v \in V_h.    (7.7.4)

We have the following a priori error estimate for Problem II with a finite T*:

Proposition 7.7.1. Consider a subsequence φ̄*_h converging weakly to φ̄* in H^1(Γ_1) as h → 0. Assume that φ̄* and φ̄*_h satisfy Equations (7.7.3) and (7.7.4), respectively. For Problem II with a finite T*, there exists a constant C ∼ (T*)^2 such that

    |\bar{\varphi}^* - \bar{\varphi}^*_h|_{1,\Gamma_1} = |\bar{\varphi}^*_0 - \bar{\varphi}^*_{h,0}|_{1,\Gamma_1} \leq C \inf_{w \in V_h} |\bar{\varphi}^*_0 - w|_{1,\Gamma_1},    (7.7.5)

when h is small enough.

Proof. Let η be the best approximation of φ̄* in V_h ⊕ φ_L, i.e.,

    |\bar{\varphi}^* - \eta|_{1,\Gamma_1} = \inf_{w \in V_h \oplus \varphi_L} |\bar{\varphi}^* - w|_{1,\Gamma_1}.

Then we have

    \langle (\bar{\varphi}^* - \eta)', w' \rangle = 0, \qquad \forall\, w \in V_h,

where φ̄* − η ∈ H^1_0(Γ_1). Consider

    |\bar{\varphi}^*_h - \eta|^2_{1,\Gamma_1}
    = \langle (\bar{\varphi}^*_h - \bar{\varphi}^*)', (\bar{\varphi}^*_h - \eta)' \rangle_{\Gamma_1} + \langle (\bar{\varphi}^* - \eta)', (\bar{\varphi}^*_h - \eta)' \rangle_{\Gamma_1}
    = \langle (\bar{\varphi}^*_h - \bar{\varphi}^*)', (\bar{\varphi}^*_h - \eta)' \rangle_{\Gamma_1}
    = -\left\langle \hat{T}^2(\bar{\varphi}^*_h)\bar{\varphi}^*_h - \hat{T}^2(\bar{\varphi}^*)\bar{\varphi}^*, \, A^2(\bar{\varphi}^*_h - \eta) \right\rangle_{\Gamma_1}
    = -\left\langle \hat{T}^2(\bar{\varphi}^*_h)\bar{\varphi}^*_h - \hat{T}^2(\eta)\eta, \, A^2(\bar{\varphi}^*_h - \eta) \right\rangle_{\Gamma_1}
      - \left\langle \hat{T}^2(\eta)\eta - \hat{T}^2(\bar{\varphi}^*)\bar{\varphi}^*, \, A^2(\bar{\varphi}^*_h - \eta) \right\rangle_{\Gamma_1}
    = I_1 + I_2,    (7.7.6)

where the second equality uses φ̄*_h − η ∈ V_h together with the orthogonality above.

We look at I_2 first. Note that

    |\hat{T}^2(\eta) - \hat{T}^2(\bar{\varphi}^*)|
    = \left| \frac{|\eta|^2_{1,\Gamma_1}}{|A\eta|^2_{0,\Gamma_1}} - \frac{|\bar{\varphi}^*|^2_{1,\Gamma_1}}{|A\bar{\varphi}^*|^2_{0,\Gamma_1}} \right|
    = \left| \frac{|\eta|^2_{1,\Gamma_1} - |\bar{\varphi}^*|^2_{1,\Gamma_1}}{|A\eta|^2_{0,\Gamma_1}}
      + \frac{|\bar{\varphi}^*|^2_{1,\Gamma_1}\big(|A\bar{\varphi}^*|^2_{0,\Gamma_1} - |A\eta|^2_{0,\Gamma_1}\big)}{|A\eta|^2_{0,\Gamma_1}|A\bar{\varphi}^*|^2_{0,\Gamma_1}} \right|
    \leq C_{\hat{T}}(\eta, \bar{\varphi}^*)\, |\eta - \bar{\varphi}^*|_{1,\Gamma_1},    (7.7.7)

where

    C_{\hat{T}}(\eta, \bar{\varphi}^*) = \frac{|\eta|_{1,\Gamma_1} + |\bar{\varphi}^*|_{1,\Gamma_1}}{|A\eta|^2_{0,\Gamma_1}}
      + \frac{|\bar{\varphi}^*|^2_{1,\Gamma_1}\big(|A\eta|_{0,\Gamma_1} + |A\bar{\varphi}^*|_{0,\Gamma_1}\big)\|A\| C_p}{|A\eta|^2_{0,\Gamma_1}|A\bar{\varphi}^*|^2_{0,\Gamma_1}},

and C_p is the Poincaré constant.

Then we have

    |I_2| = \left| \left\langle \hat{T}^2(\eta)\eta - \hat{T}^2(\bar{\varphi}^*)\bar{\varphi}^*, \, A^2(\bar{\varphi}^*_h - \eta) \right\rangle_{\Gamma_1} \right|
    \leq \left| \left\langle \hat{T}^2(\eta)(\eta - \bar{\varphi}^*), \, A^2(\bar{\varphi}^*_h - \eta) \right\rangle_{\Gamma_1} \right|
      + \left| \left\langle \big(\hat{T}^2(\eta) - \hat{T}^2(\bar{\varphi}^*)\big)\bar{\varphi}^*, \, A^2(\bar{\varphi}^*_h - \eta) \right\rangle_{\Gamma_1} \right|
    \leq \big( \hat{T}^2(\eta) C_p^2 \|A\|^2 + C_{\hat{T}}(\eta, \bar{\varphi}^*)|A\bar{\varphi}^*|_{0,\Gamma_1}\|A\| C_p \big)\, |\eta - \bar{\varphi}^*|_{1,\Gamma_1} |\bar{\varphi}^*_h - \eta|_{1,\Gamma_1}
    = C_{I_2}(\eta, \bar{\varphi}^*)\, |\eta - \bar{\varphi}^*|_{1,\Gamma_1} |\bar{\varphi}^*_h - \eta|_{1,\Gamma_1}.    (7.7.8)

By the definition of η, we have

    \lim_{h\to 0} |\eta|_{1,\Gamma_1} = |\bar{\varphi}^*|_{1,\Gamma_1}, \qquad \lim_{h\to 0} |A\eta|_{0,\Gamma_1} = |A\bar{\varphi}^*|_{0,\Gamma_1}.

We then have

    \lim_{h\to 0} C_{I_2}(\eta, \bar{\varphi}^*) = 2M + 3M^2,

where M = ‖A‖ C_p T*.

We now look at I_1. Since lim_{h→0} T̂(φ̄*_h) = lim_{h→0} T̂(η) = T* and T* < ∞, we know that when h is small enough,

    I_1 \sim -(T^*)^2 \left\langle \bar{\varphi}^*_h - \eta, \, A^2(\bar{\varphi}^*_h - \eta) \right\rangle_{\Gamma_1} < 0.

Combining this fact with Equations (7.7.6) and (7.7.8), we have that for h small enough there exists a constant C > 2M + 3M² such that

    |\bar{\varphi}^*_h - \eta|_{1,\Gamma_1} \leq C |\bar{\varphi}^* - \eta|_{1,\Gamma_1} = C \inf_{w \in V_h \oplus \varphi_L} |\bar{\varphi}^* - w|_{1,\Gamma_1}.

The proof is complete.

We have thus obtained an a priori error estimate similar to that for Problem I with a fixed T. Since T* can be arbitrarily large, the optimal convergence rate may degenerate when a boundary layer exists. Using Proposition 7.7.1, we can easily obtain the optimal convergence rate with respect to the error of the action functional:

    |\hat{S}(\bar{\varphi}^*) - \hat{S}(\bar{\varphi}^*_h)| \sim |\delta^2 \hat{S}(\bar{\varphi}^*)| \sim |\bar{\varphi}^*_h - \bar{\varphi}^*|^2_{1,\Gamma_1},    (7.7.9)

where the second-order variation is taken with respect to the perturbation function δφ̄ = φ̄* − φ̄*_h.

7.8 Numerical Experiments

We will use the following simple linear stochastic ODE system to demonstrate our analysis results:

    dX(t) = AX(t)\, dt + \sqrt{\varepsilon}\, dW(t),    (7.8.1)

where

    A = B^{-1} J B = \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} a & b \\ -b & a \end{pmatrix},

with a = 1/3, b = √8/3, λ_1 = −10, and λ_2 = −2. Then z = (0, 0)^T is a stable fixed point. For the corresponding deterministic system, namely when ε = 0, and any given point X(0) = x ≠ z in the phase space, the trajectory X(t) = e^{tA}x converges to z as t → ∞. When noise exists, this trajectory is also the minimal action path φ* from x to e^{tA}x with T* = t, since V(x, e^{tA}x) = 0. Moreover, if the ending point is z, then T* = ∞. This is obviously not an exit problem, which is the typical application of MAM. However, it includes most of the numerical difficulties of MAM, and the trajectory can serve as an exact solution, which simplifies the discussion.
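A quick numerical check (ours, not part of the dissertation) confirms the spectrum of A and the decay of the deterministic trajectory e^{tA}x toward z:

```python
import numpy as np

a, b = 1.0 / 3.0, np.sqrt(8.0) / 3.0        # a^2 + b^2 = 1, so B is a rotation
Binv = np.array([[a, -b], [b, a]])          # B^{-1} = B^T for a rotation
J = np.diag([-10.0, -2.0])
A = Binv @ J @ Binv.T                       # A = B^{-1} J B, symmetric negative definite
print(np.linalg.eigvalsh(A))                # [-10., -2.]

# e^{tA} = B^{-1} e^{tJ} B, so the trajectory from x decays to z = (0, 0)^T
t, x = 5.0, np.array([1.0, 1.0])
etA = Binv @ np.diag(np.exp(t * np.diag(J))) @ Binv.T
print(np.linalg.norm(etA @ x))              # ~e^{-2t} in magnitude
```

The spectral gap between λ_1 = −10 and λ_2 = −2 is what produces the scale separation, and hence the node clustering, discussed for Case (ii) below.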

Consider the minimal action path from x (≠ z) to e^{tA}x such that T* = t. Since the minimal action path corresponds to a trajectory, we can use the value of the action functional as the measure of error, with an optimal rate O(h²) ∼ O(N⁻²) (see Equation (7.7.9)), where N is the number of elements. We will look at the following two cases:

(i) T* is finite and small. According to Theorem 7.5.1, φ̄*_h converges to φ̄*. Since T* is small, according to Proposition 7.7.1, we expect the optimal convergence rate of φ̄_h as h → 0.

(ii) T* = ∞. We will compare the convergence behavior of tMAM and of MAM with a fixed large T. According to Theorem 7.6.3, we have convergence in C̄^z_x. However, the convergence in this case is similar to that for a finite but large T*, where we expect a deteriorated convergence rate.

Case (i):

Let x = (1, 1). We use e^A x as the ending point such that T* = 1. In Figure 7.1 we plot the convergence behavior of tMAM with uniform linear finite element discretization. It is seen that the optimal convergence rate is reached for both the action functional and T* estimated by T̂(φ̄*_h).

FIGURE 7.1. [32] Convergence behavior of tMAM for Case (i). Left: errors of the action functional; Right: errors of T*.

FIGURE 7.2. [32] Convergence behavior of tMAM and MAM with a fixed T for Case (ii). Left: errors of the action functional; Right: estimated T* of tMAM, i.e., T̂(φ̄*_h).

Case (ii):

For this case, we still use x = (1, 1) as the starting point. The ending point is chosen as z = (0, 0)^T so that T* = ∞. Besides tMAM, we use MAM with a fixed T to approximate this case, where T is supposed to be large. In general, we do not have a criterion for how large is large enough, because the accuracy is affected by two competing issues: 1) the fact that T* = ∞ favors a large T; but 2) a fixed discretization favors a small T. This implies that for any given h, an "optimal" finite T exists. For the purpose of demonstration, we choose T = 100, which is actually too large from the accuracy point of view. Let φ*_T(t) be the approximate MAP given by MAM with a fixed T. We know that φ̄*_T(s) = φ*_T(sT) yields a smaller action with the integration time T̂(φ̄*_T(s)). In this sense, no matter what T is chosen, for the same discretization tMAM will always provide a better approximation than MAM with a fixed T. The reason we use an overlarge T is to demonstrate the deterioration of the convergence rate. In Figure 7.2, we plot the convergence behavior of tMAM and MAM with T = 100 on the left, and the estimated T* given by tMAM on the right. It is seen that the convergence is slower than O(N⁻²), as analyzed in Section 7.7. For the same discretization, tMAM has an accuracy that is several orders of magnitude better than MAM with T = 100. In the right plot of Figure 7.2, we see that the optimal integration time for a certain discretization is actually not large at all. This implies that MAM with a fixed T is actually not very reliable for Case (ii).

FIGURE 7.3. [32] Approximate MAPs given by tMAM and MAM with a fixed T for Case (ii).

In Figure 7.3, we compare the MAPs given by tMAM and MAM with the exact solution e^{tA}x, where all symbols indicate the nodes of the finite element discretization. First, we note that the number of effective nodes in MAM is small because of the scale separation between fast and slow dynamics: most nodes are clustered around the fixed point. This is called the clustering problem (see [25, 33] for a discussion of this issue). Second, if the chosen T is too large, oscillation is observed in the paths given by MAM, especially when the resolution is relatively low; tMAM, on the other hand, does not suffer from such oscillation because it adjusts the integration time according to the resolution. Third, although tMAM is able to provide a good approximation even with a coarse discretization, more nodes than necessary are placed in the region around the fixed point, which corresponds to the deterioration of the convergence rate. To recover the optimal convergence rate, we need to resort to adaptivity (see [28, 33] for the construction of the algorithm).

Chapter 8
Temporal Minimum Action Method for Time-Delayed Systems

8.1 Introduction of the Large Deviation Principle for Stochastic Differential Delay Equations

In reality, some systems in physics, chemistry, biology, medicine, economics, engineering, etc., depend not only on the current state, but also on past states (see, e.g., [21]). In addition to characterizing transitions as rare events in stochastic delayed systems, the study of the LDP gives us insight into the impact of delays on such systems.

A delay can give a system memory in different ways, for example, through discrete delays at finitely many previous time points, or through a distributed delay over a previous time interval. In this chapter, we apply tMAM to systems with discrete delay. Without loss of generality, we consider a delay at one previous point, i.e., the stochastic differential delay equation (SDDE)

    \begin{cases}
      dX_t = b(X_t, X_{t-\tau})\, dt + \sqrt{\varepsilon}\, dW_t, & t \in (0, T], \\
      X(t) = f(t), & t \in [-\tau, 0],
    \end{cases}    (8.1.1)

where τ is a positive constant, b(x, y) is Lipschitz continuous with respect to both x and y, and X_t ∈ ℝⁿ. The solution of a time-delayed system is not uniquely defined by the sole knowledge of the pointwise initial condition at t = 0, but by a functional initial condition f(·) defined over the interval [−τ, 0] [16]. In some literature, this is also referred to as a memory effect. Due to the dependence on a function instead of a point, Equation (8.1.1) is not a finite-dimensional system, but an infinite-dimensional one.

The large deviation principle for this SDDE is proved in [22] with the action functional

    S_T(\varphi) = \frac{1}{2} \int_0^T |\varphi'(t) - b(\varphi(t), \varphi(t - \tau))|^2 \, dt.    (8.1.2)

To apply tMAM, we need to scale the time linearly and rewrite S_T(φ) as

    S(T, \bar{\varphi}) = \frac{T}{2} \int_0^1 |T^{-1}\bar{\varphi}'(s) - b(\bar{\varphi}(s), \bar{\varphi}(s - \tau/T))|^2 \, ds,    (8.1.3)

and then seek the optimal linear time scaling for φ, where φ̄(s) is the linearly rescaled path of φ on [0, 1]. However, compared with the simple quadratic equation arising in tMAM for systems without delay, T is now coupled with the delay, so that the optimality condition ∂_T S = 0 yields a complicated nonlinear equation:

    \partial_T S = \frac{1}{2} \langle b, b \rangle - \frac{1}{2} T^{-2} \langle \bar{\varphi}'_s, \bar{\varphi}'_s \rangle - \left\langle \nabla_2 b\, \bar{\varphi}'_{\hat{s}}\, T^{-1}\tau, \; T^{-1}\bar{\varphi}'_s - b \right\rangle = 0,    (8.1.4)

where ⟨·,·⟩ denotes the inner product in L²([0,1]), φ̄_{s−τ/T} = φ̄_ŝ, and ∇₂b indicates the gradient of b with respect to φ̄_ŝ. Although a root-finding algorithm is always possible, we would like to control T more explicitly, which is important for the development and understanding of the algorithm.

Note that if there is no delay, i.e., τ = 0, we have only one positive solution,

    T = \frac{\langle \bar{\varphi}'_s, \bar{\varphi}'_s \rangle^{1/2}}{\langle b, b \rangle^{1/2}},    (8.1.5)

which is the same as the result in Chapter 7, as expected. We will use this rescaling time for the penalty method introduced in the next section, and we will see at the end of Section 8.4 that it is a good approximation of the exact optimal time T*.
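For the delay-free case, (8.1.5) can be evaluated directly on a discrete path. The small sketch below is our own (the quadrature is a simple rectangle-type rule, and `optimal_T` is a made-up name); it recovers the true integration time of a rescaled trajectory:

```python
import numpy as np

def optimal_T(path, b, s):
    """Delay-free optimal time scaling (8.1.5): T = ||phi_bar'|| / ||b(phi_bar)||
    with L^2([0,1]) norms; path is an (N+1, d) array sampled at the points s."""
    dphi = np.gradient(path, s, axis=0)               # phi_bar'(s)
    bval = np.apply_along_axis(b, 1, path)            # b along the path
    num = np.sum(np.sum(dphi**2, axis=1)) / len(s)    # ~ int |phi_bar'|^2 ds
    den = np.sum(np.sum(bval**2, axis=1)) / len(s)    # ~ int |b|^2 ds
    return float(np.sqrt(num / den))

# phi_bar(s) = phi(Ts), where phi(t) = e^{-t} is a trajectory of dx/dt = -x, T = 2
s = np.linspace(0.0, 1.0, 401)
path = np.exp(-2.0 * s)[:, None]
print(optimal_T(path, lambda x: -x, s))               # close to T* = 2
```

On a rescaled true trajectory, φ̄′ = T b(φ̄) pointwise, so the ratio of norms returns the integration time exactly in the continuum limit.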

8.2 Penalty Method

We investigate the penalty method by introducing an auxiliary unknown path ψ̄ for the delayed function φ̄(s − τ/T). The consistency between φ(t) and φ(t − τ) is maintained through a penalty term. This strategy gives us more explicit control of T.

We rewrite the action functional (8.1.3) as

    S(T, \bar{\varphi}_s, \bar{\psi}_s) = \frac{T}{2} \int_0^1 |T^{-1}\bar{\varphi}'_s - b(\bar{\varphi}_s, \bar{\psi}_s)|^2 \, ds,    (8.2.1)

where we replace φ̄_{s−τ/T} by ψ̄_s, and introduce the pointwise constraint ψ̄_s = φ̄_ŝ = φ̄(s − τ/T). We will use the penalty method to deal with this constraint. In other words, we define

    \hat{S}(T, \bar{\varphi}_s, \bar{\psi}_s) = \frac{T}{2} \int_0^1 |T^{-1}\bar{\varphi}'_s - b(\bar{\varphi}_s, \bar{\psi}_s)|^2 \, ds + \frac{\beta^2 T}{2} \int_0^1 |\bar{\psi}_s - \bar{\varphi}_{\hat{s}}|^2 \, ds,    (8.2.2)

where 0 ≠ β ∈ ℝ. This way, we make φ̄_s and ψ̄_s independent and link them through the penalty term.

Remark 8.2.1. In fact, there are two kinds of penalty terms:

    \frac{\beta^2}{2} \int_0^T |\psi_t - \varphi_{t-\tau}|^2 \, dt = \frac{\beta^2 T}{2} \int_0^1 |\bar{\psi}_s - \bar{\varphi}_{\hat{s}}|^2 \, ds
    \qquad \text{and} \qquad
    \frac{\beta^2}{2} \int_0^1 |\bar{\psi}_s - \bar{\varphi}_{\hat{s}}|^2 \, ds.

In this dissertation, we use the first one.

Assuming that φ̄_s and ψ̄_s in Equation (8.2.1) are independent, we can obtain an optimal linear time scaling in a similar way to the delay-free case. Taking the partial derivative of (8.2.1) with respect to the time variable T, we have

    \frac{\partial S(T, \bar{\varphi}_s, \bar{\psi}_s)}{\partial T} = -\frac{1}{2T^2} \int_0^1 |\bar{\varphi}'_s|^2 \, ds + \frac{1}{2} \int_0^1 |b(\bar{\varphi}_s, \bar{\psi}_s)|^2 \, ds = 0.

Then we get the optimal time T̂ of (8.2.1) as a functional of φ̄_s and ψ̄_s:

    \hat{T}(\bar{\varphi}_s, \bar{\psi}_s) = \sqrt{\frac{\int_0^1 |\bar{\varphi}'_s|^2 \, ds}{\int_0^1 |b(\bar{\varphi}_s, \bar{\psi}_s)|^2 \, ds}} = \frac{\|\bar{\varphi}'\|_{L^2([0,1])}}{\|b(\bar{\varphi}, \bar{\psi})\|_{L^2([0,1])}}.    (8.2.3)

Substituting this optimal time into (8.2.2), we get the tMAM action functional on L²([0,1]):

    \hat{S}(\bar{\varphi}_s, \bar{\psi}_s)
    = \frac{\hat{T}}{2} \int_0^1 |\hat{T}^{-1}\bar{\varphi}'_s - b(\bar{\varphi}_s, \bar{\psi}_s)|^2 \, ds + \frac{\beta^2 \hat{T}}{2} \int_0^1 |\bar{\psi}_s - \bar{\varphi}_{\hat{s}}|^2 \, ds
    = \|\bar{\varphi}'\| \|b(\bar{\varphi}, \bar{\psi})\| - \langle \bar{\varphi}', b(\bar{\varphi}, \bar{\psi}) \rangle + \frac{\beta^2 \|\bar{\varphi}'\|}{2\|b(\bar{\varphi}, \bar{\psi})\|} \|\bar{\psi}_s - \bar{\varphi}_{\hat{s}}\|^2,    (8.2.4)

where ‖·‖ denotes the norm in L²([0,1]), T̂ = T̂(φ̄_s, ψ̄_s), and ŝ = s − τ/T̂(φ̄_s, ψ̄_s) is a functional of φ̄_s and ψ̄_s. Now, we have reformulated the optimization problem

    \min_{T > 0} \; \min_{\substack{\varphi(t) \in H^1([0,T]), \\ \varphi(0) = x_0, \, \varphi(T) = x_1}} S_T(\varphi(t))    (8.2.5)

into the new optimization problem

    \min_{\substack{\bar{\varphi}(s), \bar{\psi}(s) \in H^1([0,1]), \\ \bar{\varphi}(0) = x_0, \, \bar{\varphi}(1) = x_1}} \hat{S}(\bar{\varphi}(s), \bar{\psi}(s)).    (8.2.6)
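The reduced functional (8.2.4) is easy to evaluate for discrete paths. The sketch below is our own scalar simplification (uniform grid; the pre-history φ̄(s) for s − τ/T̂ < 0 is simply clamped to φ̄(0), which is an assumption, not the dissertation's treatment of f). When ψ̄ equals the correctly delayed path, the penalty vanishes and Ŝ reduces to the delay-free value:

```python
import numpy as np

def s_hat(phi, psi, b, tau, beta, s):
    """Penalized tMAM action (8.2.4) for scalar paths on a uniform grid over [0,1].
    History for arguments below 0 is clamped to phi[0] (simplifying assumption)."""
    h = s[1] - s[0]
    dphi = np.gradient(phi, s)
    bval = b(phi, psi)
    nphi = np.sqrt(h * np.sum(dphi**2))
    nb = np.sqrt(h * np.sum(bval**2))
    T = nphi / nb                                    # optimal time scaling (8.2.3)
    phi_delayed = np.interp(s - tau / T, s, phi)     # phi_bar(s - tau/T)
    pen = h * np.sum((psi - phi_delayed)**2)
    return nphi * nb - h * np.sum(dphi * bval) + beta**2 * nphi / (2.0 * nb) * pen

s = np.linspace(0.0, 1.0, 401)
phi = np.exp(-2.0 * s)                               # trajectory of b(x, y) = -x
b = lambda x, y: -x                                  # drift ignores the delay here
T0 = np.sqrt(np.sum(np.gradient(phi, s)**2) / np.sum(phi**2))
psi = np.interp(s - 0.5 / T0, s, phi)                # consistent delayed path, tau = 0.5
val = s_hat(phi, psi, b, 0.5, 10.0, s)
print(val)                                           # ~0: zero action, zero penalty
```

By the Cauchy–Schwarz inequality, the first two terms of (8.2.4) are nonnegative and vanish exactly on (reparametrized) trajectories, which is what the check exploits.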

8.3 Variational Calculus with Delay for Finite Element Approximation

As discussed in [32], the Euler–Lagrange equation of this kind of action functional is a high-order, nonlinear, singular PDE, which is very difficult to solve. So we still consider the minimization problem itself directly and use a variational approach. In [32], we gave a finite element approximation framework and provided an analysis of its convergence. We showed that this framework is efficient for a system without delay, no matter whether the optimal transition time is finite or infinite. We use the same framework for discretizing the action functional (8.1.2) and its reformulation (8.2.4).

Suppose that we have a finite-dimensional approximation space V_h ⊂ H¹([0,1]) spanned by {φ̄_i(s), ψ̄_l(s); 1 ≤ i ≤ M, 1 ≤ l ≤ N} such that

    \bar{\varphi}(s) = \sum_{i=1}^{M} x_i \bar{\varphi}_i(s), \qquad \bar{\psi}(s) = \sum_{l=1}^{N} y_l \bar{\psi}_l(s),

where x_i, y_l ∈ ℝⁿ. We then have the discretized version of Problem (8.2.6):

    \min_{\substack{\bar{\varphi}(s), \bar{\psi}(s) \in V_h, \\ \bar{\varphi}(0) = x_0, \, \bar{\varphi}(1) = x_1}} \hat{S}(\bar{\varphi}(s), \bar{\psi}(s)).    (8.3.1)

Since most modern optimization algorithms require the gradient, we now compute the gradient of the discretized action functional in (8.3.1) using a variational approach. Let

    \delta\bar{\varphi}(s) = \sum_{i=1}^{M} \delta x_i \bar{\varphi}_i(s), \qquad \delta\bar{\psi}(s) = \sum_{l=1}^{N} \delta y_l \bar{\psi}_l(s),

where δx_i, δy_l ∈ ℝⁿ. Then

    \delta\hat{S}(\bar{\varphi}, \bar{\psi}) = \left\langle \frac{\delta\hat{S}}{\delta\bar{\varphi}}, \delta\bar{\varphi} \right\rangle + \left\langle \frac{\delta\hat{S}}{\delta\bar{\psi}}, \delta\bar{\psi} \right\rangle
    = \sum_{i=1}^{M} \left\langle \frac{\delta\hat{S}}{\delta\bar{\varphi}}, \delta x_i \bar{\varphi}_i(s) \right\rangle + \sum_{l=1}^{N} \left\langle \frac{\delta\hat{S}}{\delta\bar{\psi}}, \delta y_l \bar{\psi}_l(s) \right\rangle.

Consider a particular choice of δφ̄ whose coefficients are all zero except the jth component δx_{i,j} of δx_i. Then we have

    \delta\hat{S} = \left\langle \frac{\delta\hat{S}}{\delta\bar{\varphi}}, \bar{\varphi}_i(s) e_j \right\rangle \delta x_{i,j},

which implies that

    \big(\nabla\hat{S}(\bar{\varphi}, \bar{\psi})\big)_{k(i,j)} = \frac{\partial\hat{S}}{\partial x_{i,j}} = \left\langle \frac{\delta\hat{S}}{\delta\bar{\varphi}}, \bar{\varphi}_i(s) e_j \right\rangle,    (8.3.2)

where e_j is the unit vector in ℝⁿ whose jth component is 1 and whose remaining components are 0, and k(i,j) is a global index for the variable φ̄ uniquely determined by i = 1, 2, ..., M and j = 1, 2, ..., n. The same holds for ∂Ŝ/∂y_{l,j}:

    \big(\nabla\hat{S}(\bar{\varphi}, \bar{\psi})\big)_{\tilde{k}(l,j)} = \frac{\partial\hat{S}}{\partial y_{l,j}} = \left\langle \frac{\delta\hat{S}}{\delta\bar{\psi}}, \bar{\psi}_l(s) e_j \right\rangle,    (8.3.3)

where k̃(l,j) is a global index for the variable ψ̄.
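Identity (8.3.2) simply says that each gradient component is the variational derivative tested against a basis function. A toy verification (our own, not from the text) for the simpler functional S(φ̄) = ½‖φ̄′‖² with hat functions, whose tested derivative is the stiffness product ⟨φ̄′, φ̄′_i⟩:

```python
import numpy as np

N = 8
h = 1.0 / N
x = np.random.default_rng(0).normal(size=N - 1)   # interior hat-function coefficients

def S(coeff):
    """S(phi) = 1/2 ||phi'||^2 for the hat expansion with phi(0) = phi(1) = 0."""
    phi = np.concatenate(([0.0], coeff, [0.0]))
    return 0.5 * np.sum(np.diff(phi)**2) / h

# Variational gradient: <phi', phi_i'> = (2 x_i - x_{i-1} - x_{i+1}) / h
phi = np.concatenate(([0.0], x, [0.0]))
grad_var = (2.0 * phi[1:-1] - phi[:-2] - phi[2:]) / h

# Central finite differences in each coefficient reproduce the same numbers
eps = 1e-6
grad_fd = np.array([(S(x + eps * e) - S(x - eps * e)) / (2.0 * eps)
                    for e in np.eye(N - 1)])
print(np.max(np.abs(grad_var - grad_fd)))          # negligible
```

The same testing-against-basis-functions pattern is what produces the explicit formulas (8.4.2) and (8.4.3) below.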

Now, we introduce the finite element approximation space. Note that the variable φ̄ ∈ H¹([0,1]) is subject to the boundary conditions φ̄(0) = x_0 and φ̄(1) = x_1. Then φ̄(s) − l(s) ∈ H¹₀([0,1]), where l(s) = x_0 + s(x_1 − x_0) is the linear path connecting x_0 and x_1. The variable ψ̄ that replaces the delayed part of φ̄ does not need to satisfy the boundary conditions. So φ̄ ∈ H¹₀([0,1]) + l and ψ̄ ∈ H¹([0,1]). While higher-order techniques can be considered, we use linear finite elements here. Let

    \mathcal{T}_h : 0 = s_0 < s_1 < \cdots < s_N = 1

be a partition of the interval [0,1]. We define the linear finite element spaces

    U_h \subset H^1_0([0,1]), \qquad W_h \subset H^1([0,1]),

and the admissible set

    V_h = (U_h \times W_h) + (l, 0)

for (φ̄, ψ̄). Let {φ̄_i}_{i=1}^{N−1} be the basis of U_h and {ψ̄_i}_{i=1}^{N+1} be the basis of W_h. Then the variables (φ̄, ψ̄) can be approximated as

    (\bar{\varphi}, \bar{\psi}) \approx (\bar{\varphi}_h, \bar{\psi}_h) = \left( \sum_{i=1}^{N-1} x_i \bar{\varphi}_i + l, \; \sum_{i=1}^{N+1} y_i \bar{\psi}_i \right).

Then the discretized action functional in (8.3.1) has the form

    \hat{S}(\bar{\varphi}_h, \bar{\psi}_h) = \|\bar{\varphi}'_h\| \|b(\bar{\varphi}_h, \bar{\psi}_h)\| - \langle \bar{\varphi}'_h, b(\bar{\varphi}_h, \bar{\psi}_h) \rangle + \frac{\beta^2 \|\bar{\varphi}'_h\|}{2\|b(\bar{\varphi}_h, \bar{\psi}_h)\|} \left\| \bar{\psi}_h(s) - \bar{\varphi}_h\Big(s - \frac{\tau}{\hat{T}(\bar{\varphi}_h, \bar{\psi}_h)}\Big) \right\|^2.

8.4 Computation of the Gradient of Ŝ

Let (φ̄*_h, ψ̄*_h) = (Σ_{i=1}^{N−1} x*_i φ̄_i(s) + l(s), Σ_{l=1}^{N+1} y*_l ψ̄_l(s)) be the finite element approximation of the minimizer (φ̄*, ψ̄*) for the minimal action path φ̄*. Then the finite element coefficients x_{i,j} and y_{l,j} satisfy

    \hat{S}(\bar{\varphi}^*_h, \bar{\psi}^*_h) = \min_{(\bar{\varphi}_h, \bar{\psi}_h) \in V_h} \hat{S}(\bar{\varphi}_h, \bar{\psi}_h) = \min_{x_{i,j}, \, y_{l,j} \in \mathbb{R}} \hat{S}(x_{i,j}, y_{l,j}).    (8.4.1)

Now, we derive the partial derivatives ∂Ŝ/∂x_{i,j} and ∂Ŝ/∂y_{l,j}. First, we have

    b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi} + \delta\bar{\psi}) = b(\bar{\varphi}, \bar{\psi}) + \nabla_1 b(\bar{\varphi}, \bar{\psi})\delta\bar{\varphi} + \nabla_2 b(\bar{\varphi}, \bar{\psi})\delta\bar{\psi} + O(\|\delta\bar{\varphi}\|^2 + \|\delta\bar{\psi}\|^2),

where δφ̄ and δψ̄ are small perturbations of φ̄ and ψ̄, respectively, and ∇₁b and ∇₂b are the gradients of b with respect to the first and second variables. Then we get

    \hat{S}(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi}) - \hat{S}(\bar{\varphi}, \bar{\psi})
    = \big( \|(\bar{\varphi} + \delta\bar{\varphi})'\| - \|\bar{\varphi}'\| \big) \|b(\bar{\varphi}, \bar{\psi})\|
      + \|\bar{\varphi}'\| \big( \|b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi})\| - \|b(\bar{\varphi}, \bar{\psi})\| \big)
      - \langle \delta\bar{\varphi}', b(\bar{\varphi}, \bar{\psi}) \rangle
      - \langle \bar{\varphi}', b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi}) - b(\bar{\varphi}, \bar{\psi}) \rangle
      + \frac{\beta^2 \big( \|(\bar{\varphi} + \delta\bar{\varphi})'\| - \|\bar{\varphi}'\| \big)}{2\|b(\bar{\varphi}, \bar{\psi})\|} \, \|\bar{\psi} - \bar{\varphi}(s - \tau/\hat{T}(\bar{\varphi}, \bar{\psi}))\|^2
      - \frac{\beta^2 \|\bar{\varphi}'\|}{2\|b(\bar{\varphi}, \bar{\psi})\|^2} \big( \|b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi})\| - \|b(\bar{\varphi}, \bar{\psi})\| \big) \|\bar{\psi} - \bar{\varphi}(s - \tau/\hat{T}(\bar{\varphi}, \bar{\psi}))\|^2
      + \frac{\beta^2 \|\bar{\varphi}'\|}{2\|b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi})\|} \big( \hat{T}(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi}) - \hat{T}(\bar{\varphi}, \bar{\psi}) \big) \frac{\delta \|\bar{\psi} - \bar{\varphi}(s - \tau/\hat{T})\|^2}{\delta\hat{T}}
      - \beta^2 \left\langle \bar{\psi} - \bar{\varphi}(s - \tau/\hat{T}), \, \delta\bar{\varphi}(s - \tau/\hat{T}) \right\rangle
      + O(\|\delta\bar{\varphi}\|^2 + \|\delta\bar{\varphi}'\|^2),

with

    \|(\bar{\varphi} + \delta\bar{\varphi})'\| - \|\bar{\varphi}'\| = \frac{1}{\|\bar{\varphi}'\|} \langle \bar{\varphi}', \delta\bar{\varphi}' \rangle + O(\|\delta\bar{\varphi}'\|^2),

    \|b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi})\| - \|b(\bar{\varphi}, \bar{\psi})\| = \frac{1}{\|b(\bar{\varphi}, \bar{\psi})\|} \langle b(\bar{\varphi}, \bar{\psi}), \nabla_1 b(\bar{\varphi}, \bar{\psi})\delta\bar{\varphi} \rangle + O(\|\delta\bar{\varphi}\|^2),

    \langle \bar{\varphi}', b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi}) - b(\bar{\varphi}, \bar{\psi}) \rangle = \langle \bar{\varphi}', \nabla_1 b(\bar{\varphi}, \bar{\psi})\delta\bar{\varphi} \rangle + O(\|\delta\bar{\varphi}\|^2),

    \hat{T}(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi}) - \hat{T}(\bar{\varphi}, \bar{\psi}) = \frac{\|(\bar{\varphi} + \delta\bar{\varphi})'\| - \|\bar{\varphi}'\|}{\|b(\bar{\varphi}, \bar{\psi})\|} - \frac{\|\bar{\varphi}'\|}{\|b(\bar{\varphi}, \bar{\psi})\|^2} \big( \|b(\bar{\varphi} + \delta\bar{\varphi}, \bar{\psi})\| - \|b(\bar{\varphi}, \bar{\psi})\| \big),

    \frac{\delta \|\bar{\psi} - \bar{\varphi}(s - \tau/\hat{T})\|^2}{\delta\hat{T}} = -\frac{2\tau}{\hat{T}^2} \left\langle \bar{\psi} - \bar{\varphi}\Big(s - \frac{\tau}{\hat{T}}\Big), \, \bar{\varphi}'\Big(s - \frac{\tau}{\hat{T}}\Big) \right\rangle.

Combining the equations above and simplifying, we have

    \frac{\partial\hat{S}}{\partial x_{i,j}}
    = \hat{T} \left\langle \hat{T}^{-1}\bar{\varphi}' - b(\bar{\varphi}, \bar{\psi}), \; \hat{T}^{-1}\bar{\varphi}'_i e_j - \nabla_1 b(\bar{\varphi}, \bar{\psi})\bar{\varphi}_i e_j \right\rangle
      - \beta^2 \left\langle \bar{\psi} - \bar{\varphi}\Big(s - \frac{\tau}{\hat{T}}\Big), \, \bar{\varphi}_i\Big(s - \frac{\tau}{\hat{T}}\Big) e_j \right\rangle
      - \frac{\tau\beta^2}{\|\bar{\varphi}'\|^2} \left\langle \bar{\psi} - \bar{\varphi}\Big(s - \frac{\tau}{\hat{T}}\Big), \, \bar{\varphi}'\Big(s - \frac{\tau}{\hat{T}}\Big) \right\rangle
        \Big( \hat{T}^{-1} \langle \bar{\varphi}', \bar{\varphi}'_i e_j \rangle - \hat{T} \langle b(\bar{\varphi}, \bar{\psi}), \nabla_1 b(\bar{\varphi}, \bar{\psi})\bar{\varphi}_i e_j \rangle \Big).    (8.4.2)

Similarly, we have

    \frac{\partial\hat{S}}{\partial y_{l,j}}
    = -\hat{T} \left\langle \hat{T}^{-1}\bar{\varphi}' - b(\bar{\varphi}, \bar{\psi}), \; \nabla_2 b(\bar{\varphi}, \bar{\psi})\bar{\psi}_l e_j \right\rangle
      + \beta^2 \left\langle \bar{\psi} - \bar{\varphi}\Big(s - \frac{\tau}{\hat{T}}\Big), \, \bar{\psi}_l e_j \right\rangle
      + \frac{\tau\beta^2 \hat{T}}{\|\bar{\varphi}'\|^2} \left\langle \bar{\psi} - \bar{\varphi}\Big(s - \frac{\tau}{\hat{T}}\Big), \, \bar{\varphi}'\Big(s - \frac{\tau}{\hat{T}}\Big) \right\rangle \left\langle b(\bar{\varphi}, \bar{\psi}), \nabla_2 b(\bar{\varphi}, \bar{\psi})\bar{\psi}_l e_j \right\rangle.    (8.4.3)

With the gradient available, we can use a gradient-type optimization algorithm, such as conjugate gradient, a stochastic gradient method, or BFGS, to solve the discretized problem (8.3.1) (or (8.4.1)). Once the minimizer (φ̄*_h, ψ̄*_h), and hence the minimal action path φ̄*_h, is obtained, the optimal integration time can be approximated by

    \hat{T}^* \approx \hat{T}\big(\bar{\varphi}^*_h, \bar{\varphi}^*_h(s - \tau/\hat{T})\big) = \frac{\|(\bar{\varphi}^*_h)'\|}{\|b(\bar{\varphi}^*_h, \bar{\varphi}^*_h(s - \tau/\hat{T}))\|}
    \qquad \text{or} \qquad
    \hat{T}^* \approx \hat{T}(\bar{\varphi}^*_h, \bar{\psi}^*_h) = \frac{\|(\bar{\varphi}^*_h)'\|}{\|b(\bar{\varphi}^*_h, \bar{\psi}^*_h)\|}.
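The whole pipeline can be sketched in a few lines: discretize, evaluate the reduced action, and hand the coefficients to a quasi-Newton solver. The delay-free scalar example below is our own illustration; it uses SciPy's L-BFGS-B with finite-difference gradients instead of the analytic formulas (8.4.2)–(8.4.3), and recovers the zero-action relaxation path together with T̂ ≈ ln(x₀/x₁).

```python
import numpy as np
from scipy.optimize import minimize

x0, x1, N = 1.0, 0.2, 40
h = 1.0 / N
b = lambda x: -x                       # drift of dX = -X dt + sqrt(eps) dW

def action(interior):
    """Delay-free tMAM action ||phi'|| ||b(phi)|| - <phi', b(phi)> on [0, 1]."""
    phi = np.concatenate(([x0], interior, [x1]))
    dphi = np.diff(phi) / h            # piecewise-constant phi'
    bmid = b(0.5 * (phi[1:] + phi[:-1]))
    nphi = np.sqrt(h * np.sum(dphi**2))
    nb = np.sqrt(h * np.sum(bmid**2))
    return nphi * nb - h * np.sum(dphi * bmid)

guess = np.linspace(x0, x1, N + 1)[1:-1]          # linear initial path
res = minimize(action, guess, method="L-BFGS-B")  # finite-difference gradients
phi = np.concatenate(([x0], res.x, [x1]))
dphi = np.diff(phi) / h
That = (np.sqrt(h * np.sum(dphi**2)) /
        np.sqrt(h * np.sum(b(0.5 * (phi[1:] + phi[:-1]))**2)))
print(res.fun, That)                   # action near 0, T_hat near log(5)
```

Moving with the drift costs no action, and the recovered optimal time matches the relaxation time ln(x₀/x₁) of the deterministic flow, mirroring the role of T̂* above.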

8.5 Numerical Examples

We verify our method using the same system with different delay (initial history) functions. We consider in this section the stochastic delayed system (8.1.1) with b(x, y) = Ax + By, where

    A = \begin{pmatrix} -1 & 0 \\ 0 & -5 \end{pmatrix}, \qquad B = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.

116 FIGURE 8.1. Approximate MAPs for different delayed systems.

Example 8.5.1. Suppose τ = 1, T = 2, f(t) = (t, t), i.e., the initial (delay) function is simply linear.

We use the method of Section 8.2 to find the minimal action path numerically. The left plot in Figure 8.1 shows the approximate minimal action path with n = 16 and n = 32 nodes. For this simple case, the solution is already close to the exact solution when n = 16. The approximate time scalings are T̂₁₆ ≈ 1.90 and T̂₃₂ ≈ 1.97, and the corresponding minimum actions are Ŝ₁₆ ≈ 4.90e−3 and Ŝ₃₂ ≈ 1.23e−3, respectively. This means that, as the mesh is refined, the approximate optimal time scaling converges to the optimal time scaling T* = 2, and the discrete action decreases to the minimum action S* = 0.

Example 8.5.2. Suppose τ = 1, T = 2, f(t) = (t², −t²), i.e., the initial (delay) function is nonlinear.

We apply the method of Section 8.2 to solve the problem numerically. The right plot in Figure 8.1 shows the approximate minimal action path with n = 16 and n = 32 nodes. For this case, we can see that the solution for n = 16 is still far from the exact solution, but when n = 32 the solution is close to the exact solution, a clear contrast between the two resolutions. The approximate time scalings are T̂₁₆ ≈ 1.44 and T̂₃₂ ≈ 1.85, which again shows the convergence of the approximate optimal time scaling to the optimal time scaling T* = 2 as the mesh is refined. The corresponding minimum actions are Ŝ₁₆ ≈ 8.75e−3 and Ŝ₃₂ ≈ 1.36e−3, respectively, showing the decrease of the discrete action toward the minimum action S* = 0.

Remark 8.5.3. It is worth pointing out that the convergence rate of the method with a uniform mesh and linear elements is very slow. So we need to improve the algorithm through adaptivity (see [28, 33], and also the discussion at the end of Section 7.8).

References

[1] Braides, A.: Γ-Convergence for Beginners, Oxford University Press, 2002.
[2] Braides, A.: Local Minimization, Variational Evolution and Γ-Convergence, Lecture Notes in Mathematics, vol. 2094, Springer, Berlin, 2014.
[3] Cerrai, S. and Freidlin, M.: Large deviations for the Langevin equation with strong damping, Appl. Math. Optim., 161 (2015), 859–875.
[4] Ciarlet, P. G.: The Finite Element Method for Elliptic Problems, SIAM, 2002.
[5] Dal Maso, G.: An Introduction to Γ-Convergence, Birkhäuser, Boston, 1993.
[6] Dembo, A. and Zeitouni, O.: Large Deviations Techniques and Applications, 2nd ed., Springer, 1998.
[7] Duong, M. H., Peletier, M. A., and Zimmer, J.: Conservative-dissipative approximation schemes for a generalized Kramers equation, Mathematical Methods in the Applied Sciences, 37 (2014), 2517–2540.
[8] E, W., Ren, W., and Vanden-Eijnden, E.: String method for the study of rare events, Phys. Rev. B, 66 (2002), 052301.
[9] E, W., Ren, W., and Vanden-Eijnden, E.: Minimum action method for the study of rare events, Commun. Pure Appl. Math., 57 (2004), 637–656.
[10] E, W., Ren, W., and Vanden-Eijnden, E.: Simplified and improved string method for computing the minimum energy paths in barrier-crossing events, J. Chem. Phys., 126 (2007), 164103.
[11] Evans, L. C.: Partial Differential Equations, 2nd ed., American Mathematical Society, 2010.
[12] Fogedby, H. C. and Ren, W.: Minimum action method for the Kardar-Parisi-Zhang equation, Phys. Rev. E, 80 (2009), 041116.
[13] Freidlin, M. and Wentzell, A.: Random Perturbations of Dynamical Systems, 2nd ed., Springer-Verlag, New York, 1998.
[14] Grafke, T., Grauer, R., Schäfer, T., and Vanden-Eijnden, E.: Arclength parametrized Hamilton's equations for the calculation of instantons, Multiscale Model. Simul., 12(2) (2014), 566–580.
[15] Grafke, T., Schäfer, T., and Vanden-Eijnden, E.: Long-term effects of small random perturbations on dynamical systems: theoretical and computational tools, in: R. Melnik, R. Makarov, and J. Belair (eds.), Recent Progress and Modern Challenges in Applied Mathematics, Modeling and Computational Science, Fields Institute Communications, vol. 79, Springer, New York, NY.

[16] Hale, J. K. and Verduyn Lunel, S. M.: Introduction to Functional Differential Equations, Springer, New York, 1991.

[17] Heymann, M. and Vanden-Eijnden, E.: The geometric minimum action method: A least action principle on the space of curves, Commun. Pure Appl. Math., 61 (2008), 1052–1117.

[18] Jónsson, H., Mills, G., and Jacobsen, K.: Nudged elastic band method for finding minimum energy paths of transitions, in: Classical and Quantum Dynamics in Condensed Phase Simulations, B. Berne, G. Ciccotti, and D. Coker (eds.), 1998.

[19] Kesavan, S.: Topics in Functional Analysis and Applications, John Wiley & Sons Inc., 1989.

[20] Lei, H., Baker, N. A., and Li, X.: Data-driven parameterization of the generalized Langevin equation, PNAS, 113 (2016), 14183–14188.

[21] Mao, X.: Stochastic Differential Equations and Applications, 2nd ed., Horwood, England, 1997.

[22] Mo, C. and Luo, J.: Large deviations for stochastic differential delay equa- tions. Nonlinear Analysis. 80 (2013), 202–210.

[23] Nocedal, J. and Wright, S.: Numerical Optimization, Springer Series in Operations Research, Springer, New York, 1999.

[24] Smith, N. R., Meerson, B., and Sasorov, P. V.: Local average height distribution of fluctuating interfaces, Phys. Rev. E, 95 (2017), 012134.

[25] Sun, Y. and Zhou, X.: An improved adaptive minimum action method for the calculation of transition path in non-gradient systems, arXiv:1701.04044.

[26] Wan, X.: An adaptive high-order minimum action method, J. Comput. Phys., 230 (2011), 8669–8682.

[27] Wan, X.: A minimum action method for small random perturbations of two-dimensional parallel shear flows, J. Comput. Phys., 235 (2013), 497–514.

[28] Wan, X.: A minimum action method with optimal linear time scaling, Commun. Comput. Phys., 18(5) (2015), 1352–1379.

[29] Wan, X. and Lin, G.: Hybrid parallel computing of minimum action method, Parallel Computing, 39 (2013), 638–651.

[30] Wan, X. and Yu, H.: A dynamic-solver-consistent minimum action method: With an application to 2D Navier-Stokes equations, J. Comput. Phys., 33 (2017), 209–226.

[31] Wan, X., Yu, H., and E, W.: Model the nonlinear instability of wall-bounded shear flows as a rare event: A study on two-dimensional Poiseuille flows, Nonlinearity, 28 (2015), 1409–1440.

[32] Wan, X., Yu, H., and Zhai, J.: Convergence analysis of finite element ap- proximation of large deviation principle, Accepted (2018), arXiv:1710.03471.

[33] Wan, X., Zheng, B., and Lin, G.: An hp-adaptive minimum action method based on a posteriori error estimate, Communications in Computational Physics, 23(2) (2018), 408–439.

[34] Wan, X., Zhou, X., and E, W.: Study of the noise-induced transition and the exploration of the configuration space for the Kuramoto-Sivashinsky equation using the minimum action method, Nonlinearity, 23 (2010), 475–493.

[35] Wells, D. K., Kath, W. L., and Motter, A. E.: Control of stochastic and induced switching in biophysical networks, Phys. Rev. X 5 (2015), 031036.

[36] Yao, W. and Ren, W.: Noise-induced transition in barotropic flow over topography and application to Kuroshio, J. Comput. Phys., 300 (2015), 352–364.

[37] Zhou, X. and E, W.: Study of noise-induced transitions in the Lorenz system using the minimum action method, Comm. Math. Sci., 8(2) (2010), 341–355.

[38] Zhou, X., Ren, W., and E, W.: Adaptive minimum action method for the study of rare events, J. Chem. Phys., 128 (2008), 104111.

Vita

Jiayu Zhai was born in June 1986, in Weifang, Shandong Province, China. He

finished his undergraduate studies at Ludong University in July 2009. He earned a master of science degree in mathematics from Shanghai University in July 2012. In

August 2012, he came to Louisiana State University to pursue graduate studies in mathematics. He earned a master of science degree in mathematics from Louisiana

State University in May 2014. He is currently a candidate for the degree of Doctor of Philosophy in mathematics, which will be awarded in May 2018.
