
Introduction to Stochastic Calculus (Math 545)

Contents

Chapter 1. Introduction
  1. Motivations
  2. Outline For a Course
Chapter 2. Probabilistic Background
  1. Countable Probability Spaces
  2. Uncountable Probability Spaces
  3. Distributions and Convergence of Random Variables
Chapter 3. Wiener Process and Stochastic Processes
  1. An Illustrative Example: A Collection of Random Walks
  2. General Stochastic Processes
  3. Definition of Brownian Motion (Wiener Process)
  4. Constructive Approach to Brownian Motion
  5. Brownian Motion has Rough Trajectories
  6. More Properties of Random Walks
  7. More Properties of General Stochastic Processes
  8. A Glimpse of the Connection with PDEs
Chapter 4. Itô Integrals
  1. Properties of the Noise Suggested by Modeling
  2. Riemann–Stieltjes Integrals
  3. A Motivating Example
  4. Itô Integrals for a Simple Class of Step Functions
  5. Extension to the Closure of Elementary Processes
  6. Properties of Itô Integrals
  7. A Continuous-in-Time Version of the Itô Integral
  8. An Extension of the Itô Integral
  9. Itô Processes
Chapter 5. Stochastic Calculus
  1. Itô's Formula for Brownian Motion
  2. Quadratic Variation and Covariation
  3. Itô's Formula for an Itô Process
  4. Full Multidimensional Version of the Itô Formula
  5. Collection of the Formal Rules for Itô's Formula and Quadratic Variation
Chapter 6. Stochastic Differential Equations
  1. Definitions
  2. Examples of SDEs
  3. Existence and Uniqueness for SDEs
  4. Weak Solutions to SDEs
  5. Markov Property of Itô Diffusions
Chapter 7. PDEs and SDEs: The Connection
  1. Infinitesimal Generators
  2. Martingales Associated with Diffusion Processes
  3. Connection with PDEs
  4. Time-Homogeneous Diffusions
  5. Stochastic Characteristics
  6. A Fundamental Example: Brownian Motion and the Heat Equation
Chapter 8. Martingales and Localization
  1. Martingales & Co.
  2. Optional Stopping
  3. Localization
  4. Quadratic Variation for Martingales
  5. Lévy–Doob Characterization of Brownian Motion
  6. Random Time Changes
  7. Time Change for an SDE
  8. Martingale Representation Theorem
  9. Martingale Inequalities
Chapter 9. Girsanov's Theorem
  1. An Illustrative Example
  2. Tilted Brownian Motion
  3. Shifting Wiener Measure: Simple Cameron–Martin Theorem
  4. Girsanov's Theorem for SDEs
Bibliography
Appendix A. Some Results from

CHAPTER 1

Introduction

1. Motivations

Evolutions in time with random influences/random dynamics. Let N(t) be the "number of rabbits in some population" or "the price of a stock". Then one might want to make a model of the dynamics which includes "random influences". A (very) simple example is

  dN(t)/dt = a(t)N(t)  where  a(t) = r(t) + "noise" .   (1.1)

Making sense of "noise" and learning how to make calculations with it is one of the principal objectives of this course. This will allow us to predict, in a probabilistic sense, the behavior of N(t). Examples of situations like the one introduced above are ubiquitous in nature:

i) The gambler's ruin problem. We play the following game: we start with $3 in our pocket and we flip a coin. If the result is tails we lose one dollar, while if the result is heads we win one dollar. We stop when we have no money left to bet, or when we reach $9. We may ask: what is the probability that I end up broke?

ii) Population dynamics/infectious diseases. As anticipated, (1.1) can be used to model the evolution of the number of rabbits in some population. Similar models are used to describe the number of genetic mutations in an animal species. We may also think of N(t) as the number of sick individuals in a population. Reasonable and widely applied models for the spread of infectious diseases are obtained by modifying (1.1) and observing its behavior. In all these cases, one may be interested in knowing whether it is likely for the disease/mutation to take over the population, or rather to go extinct.

iii) Stock prices. We may think of a set of M risky investments (e.g. stocks), where the price N_i(t), for i ∈ {1, ..., M}, per unit at time t evolves according to (1.1). In this case, one would like to optimize one's choice of stocks to maximize the total value ∑_{i=1}^M α_i N_i(T) at a later time T.

Connections with the theory of PDEs.
There exists a deep connection between noisy processes such as the one introduced above and the deterministic theory of partial differential equations. This startling connection will be explored and expanded upon during the course, but we anticipate some examples below:

i) Dirichlet problem. Let u(x, y) be the solution to the PDE given below with the noted boundary conditions. Here ∆ = ∂²/∂x² + ∂²/∂y². The amazing fact is the following: if we start a Brownian motion diffusing from a point (x₀, y₀) inside the domain, then the probability that it first hits the boundary in the darker region is given by u(x₀, y₀).

ii) Black–Scholes equation. Suppose that at time t = 0 the person in iii) is offered the right (without obligation) to buy one unit of the risky asset at a specified price S and at a specified future time t = T. Such a right is called a European call. How much should the person be willing to pay for such an option? This question can be answered by solving the famous Black–Scholes equation, giving for any stock price N(t) the right value of the European option.

[Figure: a planar domain in which (1/2)∆u = 0, with boundary condition u = 1 on the darker part of the boundary and u = 0 on the rest.]
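The gambler's ruin question from example i) above can be estimated by direct simulation. The following is a minimal Monte Carlo sketch; the stake of $3 and target of $9 are the values from the example, while the function name, trial count, and seed are illustrative choices.

```python
import random

def gamblers_ruin(start=3, target=9, trials=100_000, seed=0):
    """Monte Carlo estimate of the probability of going broke before
    reaching the target, playing a fair one-dollar-per-flip game."""
    rng = random.Random(seed)
    broke = 0
    for _ in range(trials):
        money = start
        while 0 < money < target:
            money += 1 if rng.random() < 0.5 else -1
        broke += money == 0
    return broke / trials

# For a fair game the classical answer is 1 - start/target = 2/3,
# which the simulation should approximate.
print(gamblers_ruin())
```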

2. Outline For a Course

What follows is a rough outline of the class, giving a good indication of the topics to be covered, though there will be modifications.

i) Week 1: Motivation and Introduction
  (a) Motivating examples: random walks, population model with noise, Black–Scholes, Dirichlet problems.
  (b) Themes: direct calculation with stochastic calculus, connections with PDEs.
  (c) Introduction: probability spaces, expectations, σ-algebras, conditional expectations, random walks and discrete time stochastic processes. Continuous time stochastic processes and characterization of the law of a process by its finite dimensional distributions (Kolmogorov Extension Theorem). Markov processes and martingales.

ii) Weeks 2–3: Brownian Motion and its Properties
  (a) Definitions of Brownian motion (BM) as a continuous process with independent increments. Chapman–Kolmogorov equation, forward and backward Kolmogorov equations for BM. Continuity of sample paths (Kolmogorov Continuity Theorem). More on BM as a Markov process and a martingale.
  (b) First and second variation (a.k.a. variation and quadratic variation); application to BM.

iii) Week 4: Stochastic Integrals
  (a) The Riemann–Stieltjes integral. Why can't we use it?
  (b) Building the Itô and Stratonovich integrals. (Making sense of "∫₀ᵗ σ dB".)
  (c) Standard properties of integrals hold: linearity, additivity.
  (d) Itô isometry: E(∫ f dB)² = E ∫ f² ds.

iv) Week 5: Itô's Formula and Applications
  (a) Change of variable.
  (b) Connections with PDEs and the backward Kolmogorov equation.

v) Week 6: Stochastic Differential Equations
  (a) What does it mean to solve an SDE?
  (b) Existence of solutions (Picard iteration). Uniqueness of solutions.

vi) Week 7: Stopping Times
  (a) Definition. σ-algebra associated to a stopping time. Bounded stopping times. Doob's optional stopping theorem.
  (b) Dirichlet problems and hitting probabilities.
  (c) Localization via stopping times.

vii) Week 8: Lévy–Doob Theorem and Girsanov's Theorem
  (a) How to tell when a continuous martingale is a Brownian motion.
  (b) Random time changes to turn a martingale into a Brownian motion.
  (c) Hermite polynomials and the exponential martingale.
  (d) Girsanov's theorem, Cameron–Martin formula, and changes of measure.
    (1) The simple example of i.i.d. Gaussian random variables shifted.
    (2) Idea of importance sampling and how to sample from tails.
    (3) The shift of a Brownian motion.
    (4) Changing the drift in a diffusion.

viii) Week 9: SDEs with Jumps

ix) Week 10: Feller Theory of One Dimensional Diffusions
  (a) Speed measures, natural scales, the classification of boundary points.


CHAPTER 2

Probabilistic Background

Example 0.1. We begin with the following motivating example. Consider a random sequence ω = {ω_i}_{i=1}^N where

  ω_i = +1 with probability 1/2, and ω_i = −1 with probability 1/2,

independent of the other ω's. We will also write this as

  P[(ω_1, ω_2, ..., ω_N) = (s_1, s_2, ..., s_N)] = 1/2^N

for s_i = ±1. We can group the possible outcomes into subsets depending on the value of ω_1:

  F_1 = { {ω ∈ Ω : ω_1 = 1}, {ω ∈ Ω : ω_1 = −1} } .   (2.1)

These subsets contain the possible events that we may observe knowing the value of ω_1. In other words, this separation represents the information we have on the process at that point. Let Ω be the set of all such sequences of length N (i.e. Ω = {−1, 1}^N), and consider now the sequence of functions {X_n : Ω → Z} where

  X_0(ω) = 0,  X_n(ω) = ∑_{i=1}^n ω_i   (2.2)

for n ∈ {1, ..., N}. Each X_n is an example of an integer-valued random variable. The set Ω is called the sample space (or outcome space) and the measure given by P on Ω is called the probability measure.
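Because Ω = {−1, 1}^N is finite, moments of the random walk X_n can be computed exactly by enumerating every outcome. A small sketch (the function name is illustrative, not from the text):

```python
from itertools import product

def walk_moments(N):
    """Enumerate all 2**N equally likely outcomes ω in {-1, 1}**N and
    compute E[X_N] and E[X_N**2] exactly for X_N = ω_1 + ... + ω_N."""
    first = second = 0
    for omega in product((-1, 1), repeat=N):
        x = sum(omega)
        first += x
        second += x * x
    return first / 2**N, second / 2**N

# By symmetry E[X_N] = 0, and independence of the steps gives E[X_N**2] = N.
print(walk_moments(5))
```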

We now recall some basic definitions from the theory of probability which will allow us to put this example on solid ground. Intuitively, each ω ∈ Ω is a possible outcome of all of the randomness in our system. The measure P gives the chance of the various outcomes. In the above setting, where the outcome space Ω consists of a finite number of elements, we are able to define everything in a straightforward way. We begin with a quick review of a number of definitions in the countably infinite (possibly finite) setting.

1. Countable probability spaces

If Ω is countable it is enough to define the probability of each element in Ω. That is to say, give a function p : Ω → [0, 1] with ∑_{ω∈Ω} p(ω) = 1 and define

  P[ω] = p(ω)

for each ω ∈ Ω. An event A is just a subset of Ω. We naturally extend the definition of P to an event A by

  P[A] := ∑_{ω∈A} P[ω] .

Observe that this definition has a number of consequences. In particular, if the A_i are disjoint events, that is to say A_i ⊂ Ω and A_i ∩ A_j = ∅ if i ≠ j, then

  P[⋃_{i=1}^∞ A_i] = ∑_{i=1}^∞ P[A_i] ,

and if A^c := {ω ∈ Ω : ω ∉ A} is the complement of A then P[A] = 1 − P[A^c]. Given two events A and B, the conditional probability of A given B is defined by

  P[A|B] := P[A ∩ B] / P[B] .   (2.3)

For fixed B, this is just a new probability measure P[ · |B] on Ω which gives probability P[ω|B] to the outcome ω ∈ Ω.

A real-valued random variable X is simply a real-valued function on Ω. Similarly, a random variable taking values in some set X is a function X : Ω → X. We can then define the expected value of a random variable X (or simply the expectation of X) as

  E[X] := ∑_{x ∈ Range(X)} x P[X = x] = ∑_{ω∈Ω} X(ω) P[ω] .

Here we have used the convention that {X = x} is shorthand for {ω ∈ Ω : X(ω) = x} and the definition Range(X) = {x ∈ X : ∃ω with X(ω) = x} = X(Ω).

Two events A and B are independent if P[A ∩ B] = P[A]P[B]. Two random variables are independent if P[X = x, Y = y] = P[X = x]P[Y = y]. Of course this implies for any sets A and B that P[X ∈ A, Y ∈ B] = P[X ∈ A]P[Y ∈ B] and that E[XY] = E[X]E[Y]. A collection of events A_i is said to be mutually independent if

  P[A_1 ∩ ··· ∩ A_n] = ∏_{i=1}^n P[A_i] .

Similarly, a collection of random variables X_i is mutually independent if for any collection of sets A_i from their ranges, the collection of events {X_i ∈ A_i} is mutually independent. As before, as a consequence one has that

  E[X_1 ··· X_n] = ∏_{i=1}^n E[X_i] .

Given two X-valued random variables Y and Z, for any z ∈ Range(Z) we define the conditional expectation of Y given {Z = z} as

  E[Y|Z = z] := ∑_{y ∈ Range(Y)} y P[Y = y|Z = z] ,   (2.4)

which is to say that E[Y|Z = z] is just the expected value of Y under the probability measure P[ · |Z = z]. In general, for any event A we can define the conditional expectation of Y given A as

  E[Y|A] := ∑_{y ∈ Range(Y)} y P[Y = y|A] .   (2.5)

We can extend the definition of E[Y|Z = z] to E[Y|Z], which we understand to be a function of Z which takes the value E[Y|Z = z] when Z = z. More formally, E[Y|Z] := h(Z) where h : Range(Z) → X is given by h(z) = E[Y|Z = z].

Example 1.1. Let us compute an example of conditional expectation. Using the notation from Example 0.1 we see

  E[(X_3)²|X_2 = 2] = ∑_{i∈N} i P[(X_3)² = i|X_2 = 2]
    = (1)² P[X_3 = 1|X_2 = 2] + (3)² P[X_3 = 3|X_2 = 2] = 5 .

Of course, X_2 can also take the values −2 and 0. For these values of X_2 we have

  E[(X_3)²|X_2 = −2] = (−1)² P[X_3 = −1|X_2 = −2] + (−3)² P[X_3 = −3|X_2 = −2] = 5 ,
  E[(X_3)²|X_2 = 0] = (−1)² P[X_3 = −1|X_2 = 0] + (1)² P[X_3 = 1|X_2 = 0] = 1 .

Hence E[(X_3)²|X_2] = h(X_2) where

  h(x) = 5 if x = ±2, and h(x) = 1 if x = 0.

By clever rearrangement one does not always have to calculate the function E[Y|Z] so explicitly. Consider the following examples.

  E[X_7|X_6] = E[∑_{i=1}^7 ω_i | X_6] = E[∑_{i=1}^6 ω_i + ω_7 | X_6]
    = E[X_6 + ω_7|X_6] = E[X_6|X_6] + E[ω_7|X_6]
    = X_6 + E[ω_7] = X_6

since ω_7 is independent of X_6 and E[ω_7] = 0. We can also do a similar calculation for the previous example:

  E[X_3²|X_2] = E[(X_2 + ω_3)²|X_2] = E[X_2² + 2ω_3X_2 + ω_3²|X_2]
    = X_2² + 2E[ω_3]X_2 + E[ω_3²] = X_2² + 1

since E[ω_3] = 0 and E[ω_3²] = 1. Compare this to the definition of h given above.
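The function h(z) = E[(X_3)²|X_2 = z] from Example 1.1 can also be recovered by brute force, grouping the eight equally likely outcomes by the value of X_2. A sketch with hypothetical helper names:

```python
from itertools import product
from collections import defaultdict

def cond_expectation():
    """h(z) = E[X_3**2 | X_2 = z], computed exactly by grouping the 8
    equally likely outcomes ω in {-1, 1}**3 by the value of X_2(ω)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for omega in product((-1, 1), repeat=3):
        x2 = omega[0] + omega[1]
        x3 = x2 + omega[2]
        sums[x2] += x3 ** 2
        counts[x2] += 1
    return {z: sums[z] / counts[z] for z in sums}

# Matches the hand computation: h(2) = h(-2) = 5 and h(0) = 1.
print(cond_expectation())
```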

2. Uncountable Probability Spaces

We will need to consider Ω which have uncountably many points. An example is a uniform point drawn from [0, 1]. To handle such settings completely rigorously we need ideas from basic measure theory. However, if one is willing to accept a few formal rules of manipulation, we can proceed with learning basic stochastic calculus without needing to distract ourselves with too much measure theory.

An R-valued random variable X is called a continuous random variable if there exists a density function ρ : R → R so that

  P[X ∈ [a, b]] = ∫_a^b ρ(x) dx

for any [a, b] ⊂ R. More generally, an R^n-valued random variable X is called a continuous random variable if there exists a density function ρ : R^n → R so that

  P[X ∈ [a, b]] = ∫_{a_1}^{b_1} ··· ∫_{a_n}^{b_n} ρ(x_1, ..., x_n) dx_1 ··· dx_n = ∫_{[a,b]} ρ(x) dx = ∫_{[a,b]} ρ(x) Leb(dx)

for any [a, b] = ∏_i [a_i, b_i] ⊂ R^n. The last two expressions are just different ways of writing the same thing. Here we have introduced the notation Leb(dx) for the standard Lebesgue measure on R^n given by dx_1 ··· dx_n. Analogously to the countable case we define the expectation of a continuous random variable with density ρ by

  E[h(X)] = ∫_{R^n} h(x)ρ(x) dx .

Definition 2.1. A real-valued random variable X is Gaussian with mean µ and variance σ² if

  P[X ∈ A] = (1/√(2πσ²)) ∫_A e^{−(x−µ)²/(2σ²)} dx .

If a random variable has this distribution we will write X ∼ N(µ, σ²).

If X and Y are R^n-valued and R^m-valued random variables, respectively, then the vector (X, Y) is again a continuous R^{n+m}-valued random variable which has a density, called the joint density of X and Y. If Y has density ρ_Y and ρ_{XY} is the joint density of X and Y, we can define

  P[X ∈ A|Y = y] = ∫_A (ρ_{XY}(x, y)/ρ_Y(y)) dx .

Hence X given Y = y is a new continuous random variable with density x ↦ ρ_{XY}(x, y)/ρ_Y(y) for fixed y. The conditional expectation is defined using this density.

While many calculations can be handled satisfactorily at this level, we will soon see that we need to consider random variables on much more complicated spaces, such as the space of real-valued continuous functions on the time interval [0, T], which will be denoted C([0, T]; R). To give all of the details in such a setting would require a level of technical detail which we do not wish to enter into on our first visit to the subject of stochastic calculus. If one is willing to "suspend a little disbelief" one can learn the formal rules of manipulation, much as one did when one first learned regular calculus. The technical details are important but better appreciated after one first has the big picture.

2.1. Sigma Algebras. To this end, we will introduce the idea of a sigma algebra (usually written σ-algebra, or σ-field as in [Klebaner]). In Section 1, we defined our probability measures by beginning with assigning a probability to each ω ∈ Ω. This was fine when Ω was finite or countably infinite. When Ω = [0, 1], as in the case of picking a uniform point from the unit interval, the probability of any given point must be zero. Otherwise the sum of all of the probabilities would be ∞, since there are infinitely many points and each of them has the same probability, as no point is more or less likely than another. Formally,

  ∑_{x∈[0,1]} P[{x}] = ∞ if P[{x}] > 0 .

This is only the tip of the iceberg. There are many more complicated issues. The solution is to fix a collection of subsets of Ω about which we are "allowed" to ask "what is the probability of this event?". We will be able to make this collection of subsets very large, but it will not, in general, contain all of the subsets of Ω in situations where Ω is uncountable. This collection of subsets is called the σ-algebra. The triplet (Ω, F, P) of an outcome space Ω, a probability measure P and a σ-algebra F is called a Probability Space. For any event A ∈ F, the "probability of this event happening" is well defined and equal to P[A]. A subset of Ω which is not in F might not have a well defined probability. Essentially all of the events you will think of naturally will be in the σ-algebra with which we will work. In light of this, it is reasonable to ask why we bring them up at all. It turns out that σ-algebras are a useful way to "encode the information" contained in a collection of events or random variables. This idea and notation is used in many different contexts. If you want to be able to read the literature, it is useful to have an operational understanding of σ-algebras without entering into the technical details.

Before attempting to convey any intuition or operational knowledge about σ-algebras, we give the formal definitions, since they are short (even if unenlightening).

Definition 2.2. Given a set Ω, a σ-algebra F is a collection of subsets of Ω such that
i) Ω ∈ F,
ii) A ∈ F ⟹ A^c = Ω∖A ∈ F,
iii) given {A_n} a countable collection of elements of F, we have ⋃_{i=1}^∞ A_i ∈ F.
In this case the pair (Ω, F) is referred to as a measurable space.

For us a σ-algebra is the embodiment of information.

Example 2.3. If Ω = R^n, or any subset of it, we talk about the Borel σ-algebra as the σ-algebra generated by all of the intervals [a, b] with a, b ∈ Ω. This σ-algebra contains essentially any event you would think about in most all reasonable problems. Using (a, b), or [a, b), or (a, b], or some mixture of them makes no difference.

Given any collection of subsets G of Ω we can talk about the "σ-algebra generated by G" as simply what we get by taking all of the elements of G and exhaustively applying all of the operations listed above in the definition of a σ-algebra. More formally,

Definition 2.4. Given Ω and F a collection of subsets of Ω, σ(F) is the σ-algebra generated by F. This is defined as the smallest (in terms of number of sets) σ-algebra which contains F. Intuitively σ(F) represents all of the probability data contained in F.

Example 2.5 (Example 0.1 continued). In (2.1) we defined the set F_1 as the collection of sets of possible outcomes fixing ω_1. This collection of sets generates a σ-algebra on Ω, given by

  F_1 := {∅, Ω, {ω ∈ Ω : ω_1 = 1}, {ω ∈ Ω : ω_1 = −1}} ,   (2.6)

representing the information we have on the process knowing ω_1.

To complete our measurable space (Ω, F) into a probability space we need to add a probability measure. Since we will not build our measure from its definition on individual ω ∈ Ω as we did in Section 1, we will instead assume that it satisfies certain reasonable properties which follow from this construction in the countable or finite case. The fact that the following assumptions are all that is needed would be covered in a measure-theoretic probability or analysis class.

Definition 2.6. A measure P on a measurable space (Ω, F) is a probability measure if
i) P[Ω] = 1,
ii) P[A^c] = 1 − P[A] for all A ∈ F,
iii) given {A_i} a countable collection of pairwise disjoint sets in F, P[⋃_i A_i] = ∑_i P[A_i].
In this case the triplet (Ω, F, P) is referred to as a probability space.

Definition 2.7. If (Ω, F) and (X, B) are measurable spaces, then ξ : Ω → X is an X-valued random variable if for all B ∈ B we have ξ^{−1}(B) ∈ F. In other words, admissible events in X pull back to admissible events in Ω.

Given any events A and B in F, we define the conditional probability just as before, namely

  P[A|B] = P[A ∩ B] / P[B] .

Given a real-valued random variable X on a probability space (Ω, F, P), we define the expected value of X in a way analogous to before:

  E[X] = ∫_Ω X(ω) P(dω) .

We will take for granted that this integral makes sense; that it does follows from the general theory of measure spaces.

Definition 2.8. Given a random variable X on the probability space (Ω, F, P) taking values in a measurable space (X, B), we define the σ-algebra generated by the random variable X as

  σ(X) = σ({X^{−1}(B) | B ∈ B}) .

The idea is that σ(X) contains all of the information contained in X. If an event is in σ(X), then whether this event happens or not is completely determined by knowing the value of the random variable X.

Example 2.9 (Example 0.1 continued). By definition (2.2), since X_1 = ω_1 the σ-algebra generated by the random variable X_1 is σ(X_1) = F_1 from (2.6). However, the σ-algebra generated by X_2 = ω_1 + ω_2 is

  σ(X_2) = σ({ {ω ∈ Ω : (ω_1, ω_2) = (1, 1)}, {ω ∈ Ω : (ω_1, ω_2) = (−1, −1)}, {ω ∈ Ω : ω_1 + ω_2 = 0} }) ,

the σ-algebra generated by the three level sets {X_2 = 2}, {X_2 = −2} and {X_2 = 0}. Note that this σ-algebra is different from

  F_2 = σ({ {ω ∈ Ω : (ω_1, ω_2) = (s_1, s_2)} : s_1, s_2 ∈ {−1, 1} }) .

Indeed, knowing the value of X_2 is not always sufficient to know the value of ω_1 = X_1. Conversely, knowing the value of (ω_1, ω_2) (the information contained in F_2) definitely implies that you know the value of X_2. In other words, (the information of) σ(X_2) is contained in F_2; concisely, σ(X_2) ⊂ F_2.

Now compare σ(X_2) and σ(Y) where Y = X_2². Let's consider three events: A = {X_2 = 2}, B = {X_2 = 0}, C = {X_2 is even}. Clearly all three events are in the σ-algebra generated by X_2 (i.e. σ(X_2)), since if you know the value of X_2 then you always know whether the events happen or not. Next notice that B ∈ σ(Y), since if you know that Y = 0 then X_2 = 0, and if Y ≠ 0 then X_2 ≠ 0. Hence no matter what the value of Y is, knowing it you can decide whether X_2 = 0 or not. However, knowing the value of Y does not always tell you whether X_2 = 2. It does sometimes, but not always. If Y = 0 then you know that X_2 ≠ 2. However, if Y = 4 then X_2 could be equal to either 2 or −2. We conclude that A ∉ σ(Y) but B ∈ σ(Y).

Since X_2 is always even, we do not need to know any information to decide C, and it is in fact in both σ(X_2) and σ(Y). Indeed, C = Ω, and Ω is in any σ-algebra, since by definition Ω and the empty set ∅ are always included. Lastly, since whenever we know X_2 we know Y, it is clear that σ(X_2) contains all of the information contained in σ(Y); this follows from the definition and the fact that σ(Y) ⊂ σ(X_2). To say that one σ-algebra is contained in another is to say that the second contains all of the information of the first and possibly more.
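The membership tests in this example can be automated: an event lies in σ(Z) exactly when its truth value is constant on each level set of Z. A sketch on the two-step outcome space (the function names are ours, not the text's):

```python
from itertools import product

# The four equally likely outcomes of (ω1, ω2); X2 = ω1 + ω2 and Y = X2**2.
pairs = list(product((-1, 1), repeat=2))
X2 = lambda w: w[0] + w[1]
Y = lambda w: X2(w) ** 2

def determined_by(event, stat):
    """An event lies in σ(stat) iff its truth value is constant on every
    level set {stat = value}, i.e. knowing stat always decides the event."""
    for value in {stat(w) for w in pairs}:
        truths = {event(w) for w in pairs if stat(w) == value}
        if len(truths) > 1:
            return False
    return True

print(determined_by(lambda w: X2(w) == 2, Y))   # A = {X2 = 2} is not in σ(Y)
print(determined_by(lambda w: X2(w) == 0, Y))   # B = {X2 = 0} is in σ(Y)
```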

Definition 2.10. We say that a real-valued random variable X is measurable with respect to a σ-algebra G if every set of the form X^{−1}([a, b]) is in G.¹

¹If X is a random variable taking values in a measurable space (X, B) (recall that B is a σ-algebra over X), then we require that X^{−1}(B) ∈ G for all B ∈ B.

Speaking intuitively, a random variable is measurable with respect to a given σ-algebra if the information in the σ-algebra is always sufficient to fix the value of the random variable. Of course the random variable X is always measurable with respect to σ(X). More specifically, σ(X) is the smallest σ-algebra G on Ω such that X is G-measurable. In the previous example, Y is measurable with respect to σ(X_2), since knowing the value of X_2 fixes the value of Y.

Definition 2.11. If a random variable X is measurable with respect to a σ-algebra F then we will write X ∈ F. While this is a slight abuse of notation, it will be very convenient.

Example 2.12. Let X be a random variable taking values in [−1, 1]. Let g be the function from [−1, 1] to {−1, 1} such that g(x) = −1 if x ≤ 0 and g(x) = 1 if x > 0. Define the random variable Y by Y(ω) = g(X(ω)). Hence Y is a random variable taking values in {−1, 1}. Let F_Y be the σ-algebra generated by the random variable Y; that is, F_Y = σ(Y) := {Y^{−1}(B) : B ∈ B(R)}. In this case, we can figure out exactly what F_Y looks like. Since Y takes on only two values, we see that for any subset B in B(R) (the Borel σ-algebra of R)

  Y^{−1}(B) = Y^{−1}(−1) := {ω : Y(ω) = −1}   if −1 ∈ B, 1 ∉ B,
  Y^{−1}(B) = Y^{−1}(1) := {ω : Y(ω) = 1}     if 1 ∈ B, −1 ∉ B,
  Y^{−1}(B) = ∅                               if −1 ∉ B, 1 ∉ B,
  Y^{−1}(B) = Ω                               if −1 ∈ B, 1 ∈ B.

Thus F_Y consists of exactly four sets, namely {∅, Ω, Y^{−1}(−1), Y^{−1}(1)}. For a function f : Ω → R to be measurable with respect to the σ-algebra F_Y, the inverse image of any set B ∈ B(R) must be one of the four sets in F_Y. This is another way of saying that f must be constant on both Y^{−1}(−1) and Y^{−1}(1). Note that together Y^{−1}(−1) ∪ Y^{−1}(1) = Ω.

We now re-examine the idea of a conditional expectation of a random variable with respect to a σ-algebra. To do so, we introduce the indicator function.

Definition 2.13. Given (Ω, F, P) a probability space and A ∈ F, the indicator function of A is

  1_A(x) = 1 if x ∈ A, and 0 otherwise,   (2.7)

and is a measurable function.

Fixing a probability space (Ω, F, P), we have

Proposition 2.14. If X is a random variable on (Ω, F, P) with E[|X|] < ∞, and G ⊂ F is a σ-algebra, then there is a unique random variable Y on (Ω, G, P) such that

i) E[|Y|] < ∞,

ii) E[1_A Y] = E[1_A X] for all A ∈ G.

Definition 2.15. We define the conditional expectation with respect to a σ-algebra G as the unique random variable Y from Proposition 2.14, i.e., E[X|G] := Y.

The intuition behind Definition 2.15 is that the conditional expectation with respect to a σ-algebra G ⊂ F of a random variable X ∈ F is the random variable Y ∈ G that is equivalent (in terms of expected value, or predictive power) to X given the information contained in G. In other words, Y = E[X|G] is the best approximation of the value of X given the information in G. The previous definition of conditional expectation with respect to a fixed set of events is obtained by evaluating the random variable E[X|G] on the events of interest, i.e., by fixing the events in G that may have occurred.

When we condition on a random variable we are really conditioning on the information that random variable is giving to us. In other words, we are conditioning on the σ-algebra generated by that random variable:

  E[X|Z] := E[X|σ(Z)] .

As in the discrete case, one can show that there exists a function h : Range(Z) → X such that

  E[Y|Z](ω) = h(Z(ω)) ,

and hence we can think about the conditional expectation as a function of Z(ω). In particular, this allows us to define E[Y|Z = z] := h(z).

Example 2.16. When the set Ω is countable, we can write every random variable Y on Ω as

  Y(ω) = ∑_{y ∈ Range(Y)} y 1_{Y=y}(ω) .

Consequently, writing

  E[X|Z](ω) = ∑_{z ∈ Range(Z)} E[X|Z = z] 1_{Z=z}(ω) ,

we obtain

  E[1_{Z=z} X] = E[1_{Z=z} E[X|Z = z]] = E[X|Z = z] · E[1_{Z=z}] .

Now, recognizing that E[1_A] = P[A], if P[Z = z] ≠ 0 we finally obtain

  E[X|Z = z] = E[1_{Z=z} X] / P[Z = z] = ∑_{x ∈ Range(X)} x P[Z = z, X = x] / P[Z = z] ,

and recover (2.4).

We now list some properties of the conditional expectation:

• Linearity: for all α, β ∈ R we have

  E[αX + βY|G] = αE[X|G] + βE[Y|G] .

• If X is G-measurable then

  E[XY|G] = X E[Y|G] .

Intuitively, since X ∈ G (X is measurable with respect to the σ-algebra G), the best approximation of X given the information in G is X itself, so we do not need to approximate it.

• Tower property: if G and H are both σ-algebras with G ⊂ H, then

  E[E[X|H]|G] = E[E[X|G]|H] = E[X|G] .

Since G is a smaller σ-algebra, the functions which are measurable with respect to it are contained in the space of functions measurable with respect to H. More intuitively, being measurable with respect to G means that only the information contained in G is left free to vary. E[E[X|H]|G] means: first give me your best guess given only the information contained in H as input, and then reevaluate this guess making use of only the information in G, which is a subset of the information in H. Limiting oneself to the information in G is the bottleneck, so in the end it is the only effect one sees. In other words, once one takes the conditional expectation with respect to a smaller σ-algebra one loses information. Therefore, in E[E[X|G]|H] one loses information in the innermost expectation that cannot be recovered by the outer one.

• Jensen's inequality: if g : I → R is convex² on I ⊆ R, then for a random variable X with Range(X) ⊆ I we have

  g(E[X|G]) ≤ E[g(X)|G] .

• Chebyshev's inequality: for a nonnegative random variable X we have that for any λ > 0

  P[X > λ|G] ≤ E[X|G] / λ .

• Optimal approximation: the conditional expectation with respect to a σ-algebra G ⊂ F satisfies

  E[Y|G] = argmin_{Z measurable w.r.t. G} E[(Y − Z)²] .   (2.8)

This should be thought of as the best guess of the value of Y given the information in G.
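The optimal-approximation property (2.8) can be checked exhaustively in the random walk example: among functions of X_2, the conditional expectation h from Example 1.1 minimizes the mean squared error in predicting X_3². A sketch (names are illustrative):

```python
from itertools import product

# The 8 equally likely outcomes of three ±1 steps.
triples = list(product((-1, 1), repeat=3))

def x2(w):
    return w[0] + w[1]

def target(w):
    return (x2(w) + w[2]) ** 2        # Y = X_3**2, the variable to approximate

def mse(g):
    """E[(Y - g(X_2))**2] under the uniform measure on Ω."""
    return sum((target(w) - g(x2(w))) ** 2 for w in triples) / len(triples)

h = {2: 5, 0: 1, -2: 5}               # the conditional expectation from Example 1.1
print(mse(lambda z: h[z]))            # error of the conditional expectation
print(mse(lambda z: 3))               # the constant guess E[Y] = 3 does worse
```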

Example 2.17 (Example 2.12 continued). In the previous example, E[X|F_Y] is the best approximation to X which is measurable with respect to F_Y, that is, constant on Y^{−1}(−1) and Y^{−1}(1). In other words, E[X|F_Y] is the random variable built from a function h_min composed with the random variable Y such that the expression

  E[(X − h_min(Y))²]

is minimized. Since Y(ω) takes only two values in our example, the only details of h_min which matter are its values at 1 and −1. Furthermore, since h_min(Y) only depends on the information in Y, it is measurable with respect to F_Y. If by chance X is measurable with respect to F_Y, then the best approximation to X is X itself, so in that case E[X|F_Y](ω) = X(ω).

In light of (2.8), we see that

  E[X|Y_1, ..., Y_k] = E[X|σ(Y_1, ..., Y_k)] .

This fits with our intuitive idea that σ(Y_1, ..., Y_k) embodies the information contained in the random variables Y_1, Y_2, ..., Y_k and that E[X|σ(Y_1, ..., Y_k)] is our best guess at X if we only know the information in σ(Y_1, ..., Y_k).
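The tower property stated above can be checked exactly on the three-step coin space, with G = σ(ω_1) ⊂ H = σ(ω_1, ω_2). The sketch below implements conditional expectation as averaging over the cells of a finite partition (our construction, not from the text):

```python
from itertools import product

omegas = list(product((-1, 1), repeat=3))   # 8 equally likely outcomes

def cond_exp(X, label):
    """E[X | G] where G is generated by the partition that `label`
    induces on Ω: the result is constant on each cell of the partition."""
    cells = {}
    for w in omegas:
        cells.setdefault(label(w), []).append(w)
    means = {k: sum(X(w) for w in ws) / len(ws) for k, ws in cells.items()}
    return lambda w: means[label(w)]

X = lambda w: (w[0] + w[1] + w[2]) ** 2     # X_3 squared
G = lambda w: w[0]                          # generates σ(ω1)
H = lambda w: (w[0], w[1])                  # generates σ(ω1, ω2), finer than σ(ω1)

tower = cond_exp(cond_exp(X, H), G)         # E[E[X|H] | G]
direct = cond_exp(X, G)                     # E[X | G]
print(all(tower(w) == direct(w) for w in omegas))
```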

Definition 2.18. Given (Ω, F, P) a probability space and A, B ∈ F, we say that A and B are independent if

  P[A ∩ B] = P[A] · P[B] .   (2.9)

Similarly, random variables {X_i} are jointly independent if for all C_i,

  P[X_1 ∈ C_1 and ... and X_n ∈ C_n] = ∏_{i=1}^n P[X_i ∈ C_i] .   (2.10)

It is important to note that given two independent random variables X and Y one has

  E[XY] = E[X] · E[Y] .   (2.11)

²A function g is convex on I ⊆ R if for all x, y ∈ I with [x, y] ⊆ I and for all λ ∈ [0, 1] one has g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y).

3. Distributions and Convergence of Random Variables

Definition 3.1. We say that two X-valued random variables X and Y have the same distribution, or have the same law, if for all bounded (measurable) functions f : X → R we have E[f(X)] = E[f(Y)]. This equivalence is sometimes written

  Law(X) = Law(Y) .   (2.12)

Remark 3.2. Either of the following is equivalent to two random variables X and Y on a probability space (Ω, F, P) having the same distribution:

i) E[f(X)] = E[f(Y)] for all continuous f with compact support;
ii) P[X ∈ A] = P[Y ∈ A] for all measurable sets A.

There are many ways a sequence of random variables {X_n}_{n∈N} can converge to another random variable X:

Definition 3.3. Let {X_n}_{n∈N} be a sequence of random variables on a probability space (Ω, F, P), and let X be a random variable on the same space. Then

• almost sure convergence: {X_n} converges to X almost surely if

  P[{ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}] = 1 ;

• convergence in probability: {X_n} converges to X in probability if, for all ε > 0,

  lim_{n→∞} P[{ω ∈ Ω : |X_n(ω) − X(ω)| > ε}] = 0 ;

• weak convergence: {X_n} converges weakly (or in distribution) to X if, for every bounded continuous function f,

  lim_{n→∞} E[f(X_n)] = E[f(X)] .

Remark 3.4. The above definitions can be ordered by strength: we have the following implications almost sure convergence ñ convergence in probability ñ weak convergence . Moreover, we note that in order to have convergence in distributions the two random variables do not need to live on the same probability space. A useful method of showing that the distribution of a sequence of random variables converges to another is to consider the associated sequence of Fourier transforms, or the characteristic function of a random variable as it is called in . Definition 3.5. The characteristic function (or Fourier Transform) of a random variable X is defined as

    ψ(t) = E[exp(itX)]   for all t ∈ R.

It is a basic fact that the characteristic function of a random variable uniquely determines its distribution. Furthermore, the following convergence theorem is classical in probability theory.

Theorem 3.6. Let X_n be a sequence of real-valued random variables and let ψ_n be the associated characteristic functions. Assume that there exists a function ψ so that for each t ∈ R

    lim_{n→∞} ψ_n(t) = ψ(t) .

If ψ is continuous at zero, then there exists a random variable X so that the distribution of X_n converges to the distribution of X. Furthermore, the characteristic function of X is ψ.

Example 3.7. If X is a Gaussian random variable with mean m and variance σ², then using the Fourier transform,

    E[e^{iλX}] = e^{iλm − σ²λ²/2}   for all λ.

Using this, we say that X = (X_1, ..., X_k) is a k-dimensional Gaussian if there exist m ∈ R^k and a positive definite symmetric k × k matrix R so that for all λ ∈ R^k we have

    E[e^{iλ·X}] = e^{iλ·m − (Rλ)·λ/2} .
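As a quick numerical aside (not part of the original notes), the Gaussian characteristic function in Example 3.7 can be checked by Monte Carlo; the values of m, σ, and λ below are arbitrary illustrative choices.

```python
import numpy as np

# Compare a Monte Carlo estimate of E[exp(i*lam*X)] for X ~ N(m, sig^2)
# with the closed form exp(i*lam*m - sig^2*lam^2/2). Parameters illustrative.
rng = np.random.default_rng(0)
m, sig, lam = 1.0, 2.0, 0.7
X = rng.normal(m, sig, size=200_000)

empirical = np.mean(np.exp(1j * lam * X))           # Monte Carlo estimate
exact = np.exp(1j * lam * m - sig**2 * lam**2 / 2)  # closed form
print(abs(empirical - exact))   # small; shrinks like 1/sqrt(sample size)
```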


CHAPTER 3

Wiener Process and Stochastic Processes

1. An Illustrative Example: A Collection of Random Walks

Fixing an n ≥ 0, let {ξ_k^{(n)} : k = 1, ..., 2^n} be a collection of independent random variables, each distributed as a normal with mean zero and variance 2^{−n}. For t = k2^{−n} with k ∈ {1, ..., 2^n}, we define

    B^{(n)}(t) = Σ_{j=1}^{k} ξ_j^{(n)} .   (3.1)

For intermediate times t ∈ [0, 1] not of the form k2^{−n} we define the function by linear interpolation between the two nearest points of that form. In other words, if t ∈ [s, r] where s = k2^{−n} and r = (k + 1)2^{−n}, then

    B^{(n)}(t) = 2^n (r − t) B^{(n)}(s) + 2^n (t − s) B^{(n)}(r) .

We will see momentarily that B^{(n)} has the following properties, independent of n:

 i) B^{(n)}(0) = 0.
 ii) E B^{(n)}(t) = 0 for all t ∈ [0, 1].
 iii) E|B^{(n)}(t) − B^{(n)}(s)|² = t − s for 0 ≤ s < t ≤ 1 of the form k2^{−n}.
 iv) The distribution of B^{(n)}(t) − B^{(n)}(s) is Gaussian for 0 ≤ s < t ≤ 1 of the form k2^{−n}.
 v) The random variables B^{(n)}(t_i) − B^{(n)}(t_{i−1}) are mutually independent whenever 0 ≤ t_0 < t_1 < ··· < t_m ≤ 1 for some m and the t_i are of the form k2^{−n}.

The first property is clear since the sum in (3.1) is empty. The second property for t = k2^{−n} follows from

    E B^{(n)}(t) = Σ_{j=1}^{k} E ξ_j^{(n)} = 0

since E ξ_j^{(n)} = 0. For general t we have E B^{(n)}(t) = 2^n(r − t) E B^{(n)}(s) + 2^n(t − s) E B^{(n)}(r) with s, r of the form k2^{−n}. For the second moment calculation take s = m2^{−n} and t = k2^{−n} and observe that

    E|B^{(n)}(t) − B^{(n)}(s)|² = E[(Σ_{j=m+1}^{k} ξ_j)(Σ_{ℓ=m+1}^{k} ξ_ℓ)] = Σ_{j=m+1}^{k} Σ_{ℓ=m+1}^{k} E[ξ_j ξ_ℓ]
        = Σ_{j=m+1}^{k} E[ξ_j²] + Σ_{j≠ℓ} E[ξ_j] E[ξ_ℓ] = Σ_{j=m+1}^{k} 2^{−n} = k2^{−n} − m2^{−n} = t − s ,

since ξ_j and ξ_ℓ are independent if j ≠ ℓ, E[ξ_j] = 0, and E[ξ_j²] = 2^{−n}.

Since B^{(n)}(t) is a sum of independent Gaussians, it is itself Gaussian with mean and variance given by the sums of the individual means and variances respectively. Because for disjoint time intervals the differences B^{(n)}(t_i) − B^{(n)}(t_{i−1}) are sums over disjoint collections of the ξ_j's, they are mutually independent.

Since all of these properties are independent of n, it is tempting to think about the limit as n → ∞ and the mesh becomes increasingly fine. It is not clear that such a limit would exist, as the curves B^{(n)} become increasingly "wiggly." We will see that in fact it does exist. We begin by taking an abstract perspective in the next sections, though we will return to a more concrete perspective at the end.
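As an aside not in the original text, properties ii) and iii) are easy to check by simulation; the sketch below builds many copies of B^{(n)} directly from (3.1), with the level n and sample size as arbitrary choices.

```python
import numpy as np

# Simulate many independent copies of the dyadic walk B^(n) from (3.1)
# and check the mean-zero and variance properties at t = 1. The level n
# and the number of samples are illustrative choices.
rng = np.random.default_rng(1)
n, samples = 6, 50_000
xi = rng.normal(0.0, np.sqrt(2.0**-n), size=(samples, 2**n))  # xi_k^(n)
B = np.cumsum(xi, axis=1)      # B^(n)(k 2^-n) for k = 1, ..., 2^n

print(B[:, -1].mean())         # ~ 0, property ii)
print(B[:, -1].var())          # ~ 1 = t - s with s = 0, t = 1, property iii)
```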

2. General Stochastic Processes

Motivated by the example of the previous section, we pause to discuss the idea of a stochastic process more generally.

Definition 2.1. Let (Ω, F, P) be a probability space and let (X, B) be a measurable space. Also let T be an indexing set, which for our purposes will typically be R, R⁺, N, or Z. Suppose that for each t ∈ T we have a measurable function X_t : Ω → X. Then the set {X_t} is a stochastic process on T with values in X. Also, given ω ∈ Ω, the map t ↦ X_t(ω) is called a path or trajectory of {X_t}.

Remark 2.2. Commonly used notations for stochastic processes include {X_t}, X_t, X_t(ω), X(t, ω), ... We want to define a type of equivalence between stochastic processes. But first we recall the following notion of equivalence of random variables.

Definition 2.3. We say that two stochastic processes have the same distribution or have the same law if for all t_1 < ··· < t_n ∈ T we have

    Law(X_{t_1}, ..., X_{t_n}) = Law(Y_{t_1}, ..., Y_{t_n}) ,

where we think of the vector (X_{t_1}, ..., X_{t_n}) as a random variable taking values in the product space X^n. We would like to state a nice extension theorem for constructing stochastic processes, but first we need a definition.

Definition 2.4. Given a set of finite dimensional distributions {μ_{t_1...t_m}} over an indexing set T on X, we say that the set is compatible if

 i) For all t_1 < ··· < t_{m+1} ∈ T and A_1, ..., A_m ∈ B we have

    μ_{t_1...t_m}(A_1, ..., A_m) = μ_{t_1...t_{m+1}}(A_1, ..., A_m, X) ,

 ii) For all t_1 < ··· < t_m, A_1, ..., A_m ∈ B, and σ a permutation on m letters, we have

    μ_{t_1...t_m}(A_1, ..., A_m) = μ_{t_{σ(1)}...t_{σ(m)}}(A_{σ(1)}, ..., A_{σ(m)}) .

The first condition roughly says that if one considers a null condition (the total space) in a higher-dimensional measure, one gets the same result without the null condition in the lower-dimensional measure. The second condition says that the order of the indexing of the μ doesn't matter.

Remark 2.5. The first of the above two requirements is sometimes called the Chapman–Kolmogorov equation.

Theorem 2.6 (Kolmogorov Extension Theorem). Given a set of compatible finite dimensional distributions {μ_{t_1...t_m}} with indexing set T, there exists a probability space (Ω, F, P) and a stochastic process {X_t} so that X_t has the required finite dimensional distributions, i.e. for all t_1 < ··· < t_m ∈ T and A_1, ..., A_m ∈ B we have

    P[X_{t_1} ∈ A_1 and ... and X_{t_m} ∈ A_m] = μ_{t_1...t_m}(A_1, ..., A_m) .   (3.2)

3. Definition of Brownian motion (Wiener Process)

Looking back at Section 1, the list of properties of B^{(n)} suggests a reasonable collection of compatible finite dimensional distributions: namely, independent increments with each increment distributed normally with mean zero and variance proportional to the length of the time interval. Accordingly, we make the following definition.

Definition 3.1. Standard Brownian motion {B_t} is a stochastic process on R such that

 i) B_0 = 0 almost surely (i.e. P[{ω ∈ Ω : B_0(ω) ≠ 0}] = 0),

 ii) B_t has independent increments: for any t_1 < t_2 < ... < t_n,

    B_{t_1}, B_{t_2} − B_{t_1}, ..., B_{t_n} − B_{t_{n−1}}  are independent,

 iii) the increments B_t − B_s are Gaussian random variables with mean 0 and variance given by the length of the interval:

    Var(B_t − B_s) = |t − s| .

Since this is a compatible collection of finite dimensional distributions, Theorem 2.6 guarantees the existence of the process we have described. Looking back at B^{(n)} from (3.1), it might be reasonable to hope that the Brownian motion {B_t} defined above is a continuous function of time. Notice that the above definition makes no mention of continuity. The following definition makes clear what we mean by continuous.

Definition 3.2. A stochastic process is continuous if all the trajectories t ↦ X_t(ω) are continuous.

It turns out that the finite dimensional distributions cannot guarantee that a stochastic process is almost surely continuous. However, they can imply that it is possible for a given process to be continuous.

Definition 3.3. A stochastic process {X_t} is a version (or modification) of a second stochastic process {Y_t} if for all t, P[X_t = Y_t] = 1. Notice that this is a symmetric relation.

Theorem 3.4 (Kolmogorov Continuity Theorem (a version)). Suppose that a stochastic process {X_t}, t ≥ 0, satisfies the following estimate: for all T > 0 there exist positive constants α, β, D so that

    E[|X_t − X_s|^α] ≤ D|t − s|^{1+β}   ∀ t, s ∈ [0, T];   (3.3)

then there exists a version of X_t which is continuous.

Remark 3.5. The estimate in (3.3) holds for Brownian motion. We give the details in one dimension. First recall that if X is a Gaussian random variable with mean 0 and variance σ², then E[X⁴] = 3σ⁴. Applying this to Brownian motion we have E[|B_t − B_s|⁴] = 3|t − s|² and conclude that (3.3) holds with α = 4, β = 1, D = 3. Hence it is not incompatible with the already assumed properties of Brownian motion to assume that B_t is continuous almost surely.
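The fourth-moment identity used in Remark 3.5 can also be checked numerically; the sketch below (not part of the notes, with arbitrary t and s) samples the increment B_t − B_s directly from its N(0, t − s) distribution.

```python
import numpy as np

# Check E|B_t - B_s|^4 = 3|t - s|^2 for a Brownian increment, which is
# exactly the continuity estimate (3.3) with alpha = 4, beta = 1, D = 3.
# The times t and s are illustrative choices.
rng = np.random.default_rng(2)
t, s, samples = 0.9, 0.4, 400_000
incr = rng.normal(0.0, np.sqrt(t - s), size=samples)   # B_t - B_s

print(np.mean(incr**4), 3 * (t - s) ** 2)   # both ~ 0.75
```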

23 Remark 3.5 shows that continuity is a fundamental attribute of Brownian motion. In fact we have the following second (and equivalent) definition of Brownian motion which assumes a form of continuity as a basic assumption, replacing other assumptions.

Theorem 3.6. Let B_t be a stochastic process such that the following conditions hold:

 i) E[B_1²] = constant,
 ii) B_0 = 0 almost surely,
 iii) B_{t+h} − B_t is independent of {B_s : s ≤ t},
 iv) the distribution of B_{t+h} − B_t is independent of t ≥ 0 (stationary increments),
 v) (continuity in probability) for all δ > 0,

    lim_{h→0} P[|B_{t+h} − B_t| > δ] = 0 ;

then B_t is a Brownian motion. When E[B_1²] = 1 we call it standard Brownian motion.

The process introduced above can be straightforwardly generalized to n dimensions:

Definition 3.7. n-dimensional standard Brownian motion {B_t} is a stochastic process on R^n such that

 i) B_0 = 0 almost surely (i.e. P[{ω : B_0 ≠ 0}] = 0),

 ii) B_t has independent increments: for any t_1 < t_2 < ... < t_m,

    B_{t_1}, B_{t_2} − B_{t_1}, ..., B_{t_m} − B_{t_{m−1}}  are independent,

 iii) the increments B_t − B_s are Gaussian random variables with mean 0 and, denoting by (B_t)_i the i-th component of B_t,

    Cov((B_t)_i − (B_s)_i, (B_t)_j − (B_s)_j) = |t − s| if i = j, and 0 otherwise.
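As an illustrative aside, condition iii) of Definition 3.7 says the components of an increment are uncorrelated Gaussians, each with variance |t − s|; the check below uses arbitrary t and s in two dimensions.

```python
import numpy as np

# Empirical covariance of a 2-dimensional Brownian increment B_t - B_s:
# it should be approximately (t - s) times the identity matrix. The
# dimension, times, and sample size are illustrative choices.
rng = np.random.default_rng(8)
t, s, samples = 1.2, 0.5, 200_000
incr = rng.normal(0.0, np.sqrt(t - s), size=(samples, 2))  # independent components

cov = np.cov(incr, rowvar=False)
print(cov)   # ~ [[0.7, 0.0], [0.0, 0.7]]
```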

4. Constructive Approach to Brownian motion

Returning to the construction of Section 1, one might be tempted to hope that the random walks B^{(n)} converge to a Brownian motion B_t as n → ∞. While it is true that the distribution of B^{(n)} converges weakly to that of B_t as n → ∞, a moment's reflection shows that there is no hope that the sequence converges almost surely, since B^{(n)} and B^{(n+1)} have no relation for a given realization of the underlying random variables {ξ_k^{(n)}}. We will now show that by cleverly rearranging the randomness we can construct a new sequence of random walks W^{(n)}(t) so that the stochastic process W^{(n)} has the same distribution as B^{(n)}, yet W^{(n)} will converge almost surely to a realization of Brownian motion.

We begin by defining a new collection of random variables {η_k^{(n)}} from the {ξ_k^{(n)}}. We define η_1^{(0)} to be a normal random variable with mean 0 and variance 1 which is independent of all of the ξ's. Then for n ≥ 0 and k ∈ {1, ..., 2^n} we define

    η_{2k}^{(n+1)} = (1/2) η_k^{(n)} + (1/2) ξ_k^{(n)}   and   η_{2k−1}^{(n+1)} = (1/2) η_k^{(n)} − (1/2) ξ_k^{(n)} .

Since each η_k^{(n)} is the sum of independent Gaussian random variables, it is itself Gaussian. It is easy to see that η_k^{(n)} has mean zero and variance 2^{−n}. Since for any n ≥ 0 and j, k ∈ {1, ..., 2^n} with j ≠ k we have E[η_k^{(n)} η_j^{(n)}] = 0, and the variables are jointly Gaussian, we conclude that the collection {η_k^{(n)} : k ∈ {1, ..., 2^n}} is mutually independent. Hence if we define

    W^{(n)}(k2^{−n}) = Σ_{j=1}^{k} η_j^{(n)}

and at intermediate times take the value of the line connecting the two nearest such points, then W^{(n)} has the same distribution as B^{(n)} from Section 1.
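The key feature of the refinement above is that η_{2k−1}^{(n+1)} + η_{2k}^{(n+1)} = η_k^{(n)}, so W^{(n+1)} agrees with W^{(n)} at the coarse dyadic points and only adds detail in between. The sketch below (an illustrative implementation, not from the text) checks this numerically.

```python
import numpy as np

# One refinement step of the construction: from eta^(n) and fresh xi^(n)
# build eta^(n+1), then verify that the refined walk W^(n+1) matches
# W^(n) at the points k 2^-n. The level n is an arbitrary choice.
rng = np.random.default_rng(3)
n = 5
eta = rng.normal(0.0, np.sqrt(2.0**-n), size=2**n)  # eta_k^(n)
xi = rng.normal(0.0, np.sqrt(2.0**-n), size=2**n)   # xi_k^(n)

eta_next = np.empty(2 ** (n + 1))
eta_next[0::2] = 0.5 * eta - 0.5 * xi   # eta_{2k-1}^(n+1) (odd labels)
eta_next[1::2] = 0.5 * eta + 0.5 * xi   # eta_{2k}^(n+1)   (even labels)

W_n = np.cumsum(eta)          # W^(n)(k 2^-n)
W_next = np.cumsum(eta_next)  # W^(n+1)(k 2^-(n+1))
print(np.allclose(W_next[1::2], W_n))   # True: coarse points are unchanged
```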

Theorem 4.1. With probability one, the sequence of functions (W^{(n)}(t))(ω) on [0, 1] converges uniformly to a continuous function B_t(ω) as n → ∞, and the process B_t is a Brownian motion on [0, 1].

Proof. For n ≥ 0 and k ∈ {1, ..., 2^n} define

    Z_k^{(n)} = sup_{t ∈ [(k−1)2^{−n}, k2^{−n}]} |W^{(n)}(t) − W^{(n+1)}(t)|

and observe that

    Z_k^{(n)} = |(1/2) η_k^{(n)} − η_{2k−1}^{(n+1)}| = |(1/2) ξ_k^{(n)}| .

Since (1/2) ξ_k^{(n)} is normal with mean zero and variance 2^{−(n+2)}, we have by the Markov inequality that

    P[Z_k^{(n)} > δ] ≤ E[|Z_k^{(n)}|⁴]/δ⁴ = 3 · 2^{−2(n+2)}/δ⁴ .

In turn, since the {Z_k^{(n)} : k = 1, ..., 2^n} are identically distributed, this implies that

    P[sup_{t∈[0,1]} |W^{(n)}(t) − W^{(n+1)}(t)| > δ] = P[max_k Z_k^{(n)} > δ] ≤ 2^n P[Z_1^{(n)} > δ]
        ≤ 2^n · 3 · 2^{−2(n+2)}/δ⁴ =: ψ(n, δ) .

Since ψ(n, 2^{−n/5}) = c 2^{−n/5} for some c > 0, we have that

    Σ_{n=1}^∞ P[sup_{t∈[0,1]} |W^{(n)}(t) − W^{(n+1)}(t)| > 2^{−n/5}] < ∞ .

Hence the Borel–Cantelli lemma implies that with probability one there exists a random k(ω) so that if n ≥ k(ω) then

    sup_{t∈[0,1]} |W^{(n)}(t) − W^{(n+1)}(t)| ≤ 2^{−n/5} .

In other words, with probability one the {W^{(n)}} form a Cauchy sequence in the supremum norm. Let B_t denote the limit. It is not hard to see that B_t has the properties that define Brownian motion. Furthermore, since each W^{(n)} is continuous and the W^{(n)} converge in the supremum norm to B_t, we conclude that with probability one B_t is also continuous. □

5. Brownian motion has Rough Trajectories

In Section 4 we saw that Brownian motion can be obtained as the limit of ever roughening paths. This leads us to wonder: how rough is Brownian motion? We know it is continuous, but is it differentiable?

Definition 5.1. The (standard) p-th variation on the interval (s, t) of a continuous function f is defined to be

    V_p[f](s, t) = sup_Γ Σ_k |f(t_{k+1}) − f(t_k)|^p   (3.4)

where the supremum is over all partitions

    Γ = {t_k : s = t_0 < t_1 < ··· < t_{n−1} < t_n = t} ,   (3.5)

for some n.

For a given partition {t_k} let us define the mesh width of the partition to be

    |Γ| = sup_{0 < k ≤ n} |t_k − t_{k−1}| .

The variations of Brownian motion are finite only for a certain range of p:

Proposition 5.2. If B_t is a Brownian motion on the interval [0, T] with T < ∞, then

    V_p[B](0, T) < ∞ a.s. if and only if p > 2 .   (3.6)

Proof. See [17, 12] for details. □

The fact that large values of p imply finiteness of the p-th variation may be surprising at first. However, this results from the fact that in order for the supremum in (3.4) to diverge for a continuous function we must consider a sequence Γ_N of partitions with a diverging number of intervals. As these intervals become smaller, the variation that each of them captures becomes smaller. These small contributions become even smaller when raised to a power p > 1, whence the (possible) convergence. This concept is in close relation with that of Hölder continuity.

Notice that if a function f has a bounded derivative on the interval [0, t], then V_1[f](0, t) < ∞: since

    |f(t_{k+1}) − f(t_k)| = |∫_{t_k}^{t_{k+1}} f′(s) ds| ≤ (sup_{s∈[0,t]} |f′(s)|) |t_{k+1} − t_k| ,

we see that

    V_1[f](0, t) ≤ (sup_{s∈[0,t]} |f′(s)|) t .

Similar considerations hold if f is Lipschitz continuous on [0, T] with Lipschitz constant L:

    V_1[f](0, T) = sup_Γ Σ_k |f(t_{k+1}) − f(t_k)| ≤ L Σ_k (t_{k+1} − t_k) = LT .

Hence (3.6) implies that with probability one Brownian motion cannot have a bounded derivative on any interval. In fact something much stronger is true: with probability one, Brownian motion is nowhere differentiable as a function of time (see [12] for details).

From Proposition 5.2 we see that p = 2 is the border case, and it is quite subtle. On the one hand the statement (3.6) is true; yet if one considers a specific sequence of shrinking partitions Γ^{(N)} (such that each successive partition contains the previous one as a sub-partition) then

    Q_N(T) := Σ_{Γ^{(N)}} |B(t_{k+1}^{(N)}) − B(t_k^{(N)})|² → T   a.s.

Initially we will prove the following simpler statement.

Theorem 5.3. Let Γ^{(N)} be a sequence of partitions of [0, T] as in (3.5) with lim_{N→∞} |Γ^{(N)}| = 0. Then

    Q_N(T) := Σ_{Γ^{(N)}} |B(t_{k+1}^{(N)}) − B(t_k^{(N)})|² → T   as N → ∞   (3.7)

in L²(Ω, P).

Corollary 5.4. Under the conditions of the above theorem, we have lim_{N→∞} Q_N(T) = T in probability.

Proof. By the Chebyshev inequality, for any ε > 0,

    P[ω : |Q_N(ω) − T| > ε] ≤ E[|Q_N − T|²]/ε² → 0   as |Γ^{(N)}| → 0 . □

Proof of Theorem 5.3. Fix any sequence of partitions

    Γ^{(N)} := {t_k^{(N)} : 0 = t_1^{(N)} < t_2^{(N)} < ··· < t_n^{(N)} = T}

of [0, T] with |Γ^{(N)}| → 0 as N → ∞. Defining

    Z_N := Σ_k [B(t_{k+1}^{(N)}) − B(t_k^{(N)})]² ,

we need to show that E[(Z_N − T)²] → 0. Since E[Z_N] = T, we have

    E[(Z_N − T)²] = E[Z_N²] − 2T E[Z_N] + T² = E[Z_N²] − T² .

Using the convenient notation Δ_k B := B(t_{k+1}^{(N)}) − B(t_k^{(N)}) and Δ_k t := t_{k+1}^{(N)} − t_k^{(N)}, we have

    E[Z_N²] = Σ_n Σ_k E[(Δ_n B)² (Δ_k B)²]
            = Σ_n E[(Δ_n B)⁴] + Σ_{n≠k} E[(Δ_k B)²] E[(Δ_n B)²]
            = 3 Σ_n (Δ_n t)² + Σ_{n≠k} (Δ_k t)(Δ_n t) ,

since E[(Δ_k B)²] = Δ_k t and E[(Δ_k B)⁴] = 3(Δ_k t)², because Δ_k B is a Gaussian random variable with mean zero and variance Δ_k t, and since Δ_n B and Δ_k B are independent for n ≠ k. The first term goes to 0 as the maximum partition spacing goes to zero since

    3 Σ_n (Δ_n t)² ≤ 3 (sup_n Δ_n t) T .

Returning to the remaining term,

    Σ_{n≠k} (Δ_k t)(Δ_n t) = Σ_k Δ_k t [Σ_{n<k} Δ_n t + Σ_{n>k} Δ_n t]
                           = Σ_k Δ_k t (T − Δ_k t)
                           = T Σ_k Δ_k t − Σ_k (Δ_k t)² → T² − 0 .

Summarizing, we have shown that E[(Z_N − T)²] → 0 as N → ∞. □

Corollary 5.5. Under the conditions of Theorem 5.3, if in addition Γ^{(N)} ⊂ Γ^{(N+1)}, then lim_{N→∞} Q_N(T) = T almost surely.
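Theorem 5.3 is easy to visualize by simulation. The sketch below (illustrative choices of T and partition sizes, not from the text) samples Q_N(T) on uniform partitions of [0, T] and shows it concentrating around T as the mesh shrinks.

```python
import numpy as np

# Sample the sum of squared Brownian increments on uniform partitions of
# [0, T]; by Theorem 5.3 it converges to T as the mesh goes to zero.
rng = np.random.default_rng(4)
T = 2.0

def sample_quadratic_variation(num_intervals: int) -> float:
    """One sample of Q_N(T) = sum_k |B(t_{k+1}) - B(t_k)|^2."""
    dt = T / num_intervals
    increments = rng.normal(0.0, np.sqrt(dt), size=num_intervals)
    return float(np.sum(increments**2))

for N in (10, 1_000, 100_000):
    print(N, sample_quadratic_variation(N))   # values approach T = 2.0
```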

6. More Properties of Random Walks

We now return to the family of random walks constructed in Section 1. The collection of random walks B^{(n)}(t) constructed in (3.1) has additional properties which are useful to identify and isolate, as the general structures will be important for our development of stochastic calculus.

Fixing an n ≥ 0, define t_k = k2^{−n} for k = 0, ..., 2^n. Then notice that for each such k, B^{(n)}(t_k) is a Gaussian random variable since it is the sum of the mutually independent random variables ξ_j^{(n)}. Furthermore, for any collection of times (t_{k_1}, ..., t_{k_m}) with k_j ∈ {1, ..., 2^n}, the vector (B^{(n)}(t_{k_1}), ..., B^{(n)}(t_{k_m})) is a multidimensional Gaussian vector.

Next notice that for 0 ≤ t_ℓ < t_k ≤ 1 we have

    E[B^{(n)}(t_k) | B^{(n)}(t_ℓ)] = E[(B^{(n)}(t_k) − B^{(n)}(t_ℓ)) + B^{(n)}(t_ℓ) | B^{(n)}(t_ℓ)]
        = E[B^{(n)}(t_k) − B^{(n)}(t_ℓ)] + B^{(n)}(t_ℓ) = B^{(n)}(t_ℓ)   (3.8)

since B^{(n)}(t_k) − B^{(n)}(t_ℓ) is independent of B^{(n)}(t_ℓ) and E[B^{(n)}(t_k) − B^{(n)}(t_ℓ)] = 0. We will choose to view this single fact as the result of two finer grained facts. The first is that the conditional distribution of the walk at time t_k given the values {B_s^{(n)} : s ≤ t_ℓ} is the same as the conditional distribution of the walk at time t_k given only B^{(n)}(t_ℓ). In light of Definition 3.1, we can state this more formally by saying that for all functions f

    E[f(B^{(n)}(t_k)) | F_{t_ℓ}] = E[f(B^{(n)}(t_k)) | B^{(n)}(t_ℓ)] ,

where F_t = σ(B_s^{(n)} : s ≤ t). This is called the Markov property, which states that the distribution of the future depends on the past only through the present value of the process. There is a stronger version of this property, called the strong Markov property, which states that one can in fact restart the process from its current (random) value, run it for the remaining amount of time, and obtain the same answer. To state this more precisely, let us introduce the process X(t) = x + B^{(n)}(t), the random walk starting from the point x, and let P_x be the measure induced on C([0, 1]; R) by the trajectory of X(t) for fixed initial x. Let E_x be the expected value associated to P_x. Of course P_0 is simply the law of the random walk starting from 0 that we have previously been considering. Then the Markov property implies that for any function f and t_ℓ < t_k

    E_0[f(X_{t_k})] = E_0[F(X_{t_ℓ}, t_k − t_ℓ)]   where F(x, t) = E_x[f(X_t)] .

Neither of these Markov properties alone is enough to produce (3.8). We also need a fact about the mean of the process given the past. Again defining F_t = σ(B_s^{(n)} : s ≤ t), we can rewrite (3.8), using the Markov property, as

    E[B^{(n)}(t_k) | F_{t_ℓ}] = B^{(n)}(t_ℓ) .

This equality is the principal fact that makes a process what is called a martingale. We now revisit these ideas, making more general definitions which abstract these properties so we can talk about and use them in broader contexts.

7. More Properties of General Stochastic Processes

Gaussian processes. We begin by giving the general definition of a Gaussian process, of which Brownian motion is an example.

Definition 7.1. {X_t} is a Gaussian random process if all finite dimensional distributions of X are Gaussian. That is, for all t_1 < ··· < t_k ∈ T and A_1, ..., A_k ∈ B (where here B represents the Borel sets on the real line), there exist a positive definite symmetric k × k matrix R and m ∈ R^k so that, writing X = (X_{t_1}, ..., X_{t_k}),

    P[X_{t_1} ∈ A_1 and ... and X_{t_k} ∈ A_k] = ∫_{A_1×···×A_k} (2π)^{−k/2} (det R)^{−1/2} e^{−(x−m)ᵀ R^{−1} (x−m)/2} dx .

We have the associated definitions

    μ_t := E[X_t] ,   R_{t,s} := cov(X_t, X_s) = E[(X_t − μ_t)(X_s − μ_s)] .

Example 7.2. By definition, Brownian motion is a Gaussian process. Its mean function is μ_t = 0 for all t ≥ 0 and, assuming without loss of generality that s < t, its covariance is

    Cov(B_t, B_s) = Cov(B_s + (B_t − B_s), B_s) = E[B_s²] + E[(B_t − B_s) B_s]
                  = E[B_s²] + E[B_t − B_s] E[B_s] = s ,

where in the third identity we have used the independence of increments of Brownian motion. This shows that for general t, s ≥ 0 we have

    Cov(B_t, B_s) = min{t, s} .   (3.9)

In fact, because the mean and covariance structure of a Gaussian process completely determine its finite dimensional distributions, if a Gaussian process has the same covariance and mean as a Brownian motion then it is a Brownian motion.

Theorem 7.3. A Brownian motion is a Gaussian process with zero mean function and covariance function min(t, s). Conversely, a Gaussian process with zero mean function and covariance function min(t, s) is a Brownian motion.

Proof. Example 7.2 proves the forward direction. To prove the reverse direction, assume that X_t is a Gaussian process with zero mean and Cov(X_t, X_s) = min(t, s). Then the increments of the process, given by (X_t, X_{t+s} − X_t), are Gaussian random variables with mean 0. The variance of the increment X_{t+s} − X_t is given by

    Var(X_{t+s} − X_t) = Cov(X_{t+s}, X_{t+s}) − 2 Cov(X_t, X_{t+s}) + Cov(X_t, X_t) = (t + s) − 2t + t = s .

The independence of X_t and X_{t+s} − X_t follows immediately from

    Cov(X_t, X_{t+s} − X_t) = Cov(X_t, X_{t+s}) − Cov(X_t, X_t) = t − t = 0 . □

Martingales. In order to introduce the concept of a martingale, we first adapt the concept of a σ-algebra to the framework of stochastic processes. In particular, in the case of a stochastic process we would like to encode the idea of the history of a process: by observing a process up to a time t > 0 we have all the information on the behavior of the process before that time, but none after it. Furthermore, as t increases we increase the amount of information we have on that process. This idea underlies the concept of filtration:

Definition 7.4. Given an indexing set T, a filtration of σ-algebras is a set of σ-algebras {F_t}_{t∈T} such that for all t_1 < ··· < t_m ∈ T we have

    F_{t_1} ⊂ ··· ⊂ F_{t_m} .

We now define in which sense a filtration contains the information associated to a certain process.

Definition 7.5. A stochastic process {X_t} is adapted to a filtration {F_t} if its marginals are measurable with respect to the corresponding σ-algebras, i.e., if σ(X_t) ⊆ F_t for all t ∈ T. In this case we say that the process is adapted to the filtration {F_t}.

We also extend the concept of the σ-algebra generated by a random variable to the case of a filtration. In this case, the filtration generated by a process {X_t} is the smallest filtration containing enough information about {X_t}.

Definition 7.6. Let {X_t} be a stochastic process on (Ω, F, P). Then the filtration {F_t^X} generated by X_t is given by

    F_t^X = σ(X_s | 0 ≤ s ≤ t) ,

which means the smallest σ-algebra with respect to which the random variables X_s are measurable for all s ∈ [0, t]. Thus F_t^X contains σ(X_s) for all 0 ≤ s ≤ t. The canonical example of this is the filtration generated by a discrete random process (i.e. T = N):

Example 7.7 (Example 0.1 continued). The filtration generated by the N-coin-flip process for m < N is

    F_m := σ(X_0, ..., X_m) .

Intuitively, we will think of {F_t^X} as the history of {X_t} up to time t, or the information about {X_t} up to time t. Roughly speaking, an event A is in F_t^X if its occurrence can be determined by knowing {X_s} for all s ∈ [0, t]. For example, if B_t is a Brownian motion, consider the event

    A = {ω ∈ Ω : max_{s∈(0,1/2)} |B_s(ω)| ≤ 2} .

It is clear that we have A ∈ F_{1/2}, as the history of B_t up to time t = 1/2 determines whether A has occurred or not. However, we have that A ∉ F_{1/3}, as the process may not yet have reached 2 by time t = 1/3 but may do so before t = 1/2. We now have all the tools to define the concept of a martingale:

Definition 7.8. {X_t} is a martingale with respect to a filtration {F_t} if for all t > s we have

 i) X_t is F_t-measurable,

 ii) E[X_t | F_s] = X_s,

 iii) E[|X_t|] < ∞.

Condition ii) in Def. 7.8 involves a conditional expectation with respect to the σ-algebra F_s. Recall that E[X_{t+s} | F_t] is an F_t-measurable random variable which approximates X_{t+s} in a certain optimal way (and it is uniquely defined). Then Def. 7.8 states that, given the history of X_t up to time t, our best estimate of X_{t+s} is simply X_t, the value of {X_t} at the present time t. In other words, a martingale is the equivalent in stochastic calculus of a straight line.

Example 7.9. Brownian motion is a martingale with respect to {F_t^B}. Indeed, we have that

    E[B_{t+s} | F_t] = E[B_t + (B_{t+s} − B_t) | F_t] = E[B_t | F_t] + E[B_{t+s} − B_t | F_t] = B_t + 0 ,

by the independence of increments property. The above strategy can be extended to general functions g, as the only property that was used is independence of the increments, which implies that

    E[g(B_{t+s} − B_t) | F_t^B] = E[g(B_{t+s} − B_t)] .   (3.10)

Example 7.10. The process X_t := B_t² − t is a martingale with respect to {F_t^B}. Indeed, the process is obviously adapted to {F_t^B} and E[|B_t²|] = t < ∞, verifying i) and iii) from Def. 7.8. For ii) we have

    E[B_{t+s}² | F_t] = E[(B_t + B_{t+s} − B_t)² | F_t]
        = E[B_t² | F_t] + 2 B_t E[B_{t+s} − B_t | F_t] + E[(B_{t+s} − B_t)² | F_t] = B_t² + s .

Subtracting (t + s) from both sides of the above equation we obtain E[B_{t+s}² − (t + s) | F_t] = B_t² − t.

Example 7.11. The process Y_t := exp[λB_t − λ²t/2] is a martingale with respect to {F_t^B}. Again, the process is obviously adapted to {F_t^B} and, computing the moment generating function of a Gaussian random variable, we have that E[exp(λB_t)] = exp(tλ²/2) < ∞, verifying i) and iii) from Def. 7.8. For ii) we have

    E[exp(λB_{t+s}) | F_t] = E[exp(λ(B_t + B_{t+s} − B_t)) | F_t] = exp(λB_t) E[exp(λ(B_{t+s} − B_t)) | F_t]
        = exp(λB_t) E[exp(λB_s)] = exp(λB_t) exp(sλ²/2) .

Multiplying both sides of the above equation by exp[−(t + s)λ²/2] we obtain

    E[exp(λB_{t+s} − λ²(t + s)/2) | F_t] = exp(λB_t − λ²t/2) .

Markov processes. We now turn to the general idea of a Markov process. As we have seen in the examples above, this family of processes has the "memoryless" property, i.e., their future depends on their past (their history, their filtration) only through their present (their state at the present time, or the σ-algebra generated by the random variable of the process at that time). In the discrete time and countable sample space setting, this holds if, given t_1 < ··· < t_m < t, the distribution of X_t given (X_{t_1}, ..., X_{t_m}) equals the distribution of X_t given X_{t_m}.
This is the case if for all A ∈ B(X) and s_1, ..., s_m ∈ X we have that

    P[X_t ∈ A | X_{t_1} = s_1, ..., X_{t_m} = s_m] = P[X_t ∈ A | X_{t_m} = s_m] .

This property can be stated in more general terms as follows:

Definition 7.12. A random process {X_t} is called Markov with respect to a filtration {F_t} when X_t is adapted to the filtration and, for any s > t, X_s is independent of F_t given X_t.

The above definition can be restated in terms of Brownian motion as follows: for any set A ∈ B(X) we have

    P[B_t ∈ A | F_s^B] = P[B_t ∈ A | B_s]   a.s.   (3.11)

Remember that conditional probabilities with respect to a σ-algebra are really random variables, in the same way that a conditional expectation with respect to a σ-algebra is a random variable. That is,

    P[B_t ∈ A | F_s^B] = E[1_{B_t∈A}(ω) | F_s^B]   and   P[B_t ∈ A | B_s] = E[1_{B_t∈A}(ω) | σ(B_s)] .

Example 7.13. The fact that (3.11) holds can be shown directly by using characteristic functions. Indeed, to show that the distributions of the right and left hand sides of (3.11) coincide it is enough to identify their characteristic functions. We compute

    E[e^{iϑB_t} | F_s^B] = E[e^{iϑ(B_s + B_t − B_s)} | F_s^B] = e^{iϑB_s} E[e^{iϑ(B_t − B_s)} | F_s^B] = e^{iϑB_s} E[e^{iϑ(B_t − B_s)}] ,

and similarly

    E[e^{iϑB_t} | B_s] = E[e^{iϑ(B_s + B_t − B_s)} | B_s] = e^{iϑB_s} E[e^{iϑ(B_t − B_s)} | B_s] = e^{iϑB_s} E[e^{iϑ(B_t − B_s)}] .

Stopping times and Strong Markov property. We now introduce the concept of a stopping time. As the name suggests, a stopping time is a time at which one can stop the process. The accent in this sentence should be put on can, in the following sense: if someone observing the process as it evolves is given instructions on when to stop, they can stop the process based on their observations alone. In other words, the observer does not need future information to know whether the event triggering the stop of the process has occurred or not. We now define this concept formally:

Definition 7.14. For a measurable space (Ω, F) and a filtration {F_t} with F_t ⊆ F for all t ∈ T, a random variable τ is a stopping time with respect to {F_t} if {ω ∈ Ω : τ ≤ t} ∈ F_t for all t ∈ T. Classical examples of stopping times are hitting times, such as the one defined in the following example.

Example 7.15. The random time τ_1 := inf{s > 0 : B_s ≥ 1} is a stopping time with respect to the natural filtration of Brownian motion F_t^B. Indeed, at any time t one can decide whether τ_1 ≤ t by looking at the past history of B_t. However, the random time τ_0* := sup{s ∈ (0, t*) : B_s = 0} is NOT a stopping time with respect to F_t^B for t < t*, as before t* we cannot know for sure whether the process will reach 0 again.

The strong Markov property introduced below is a generalization of the Markov property to stopping times (as opposed to the fixed times in Def. 7.12). More specifically, we say that a stochastic process has the strong Markov property if its future after any stopping time depends on its past only through the present (i.e., its state at the stopping time).

Definition 7.16. The stochastic process {X_t} has the strong Markov property if for all finite stopping times τ one has

    P[X_{τ+s} ∈ A | F_τ] = P[X_{τ+s} ∈ A | X_τ] ,

where F_τ := {A ∈ F : {τ ≤ t} ∩ A ∈ F_t , ∀ t > 0}.

In the above definition, the σ-algebra Fτ can be interpreted as “all the information we have on the process up to time τ”. Theorem 7.17. Brownian motion has the strong Markov property. We now use the above result to investigate some of the properties of Brownian motion:

Example 7.18. For any t > 0 define the maximum of Brownian motion on the interval [0, t] as M_t := max_{s∈[0,t]} B_s. Similarly, for any m > 0 define the hitting time of m as τ_m := inf{s > 0 : B_s ≥ m}. Then we write

    P[M_t ≥ m] = P[τ_m ≤ t] = P[τ_m ≤ t, B_t ≥ m] + P[τ_m ≤ t, B_t < m]
               = P[τ_m ≤ t, B_t − B_{τ_m} ≥ 0] + P[τ_m ≤ t, B_t − B_{τ_m} < 0] .

Using the strong Markov property of Brownian motion, we have that (B_{τ_m+s} − B_{τ_m})_{s≥0} is independent of F_{τ_m} and is a Brownian motion. So by the symmetry of Brownian motion we have that

    P[τ_m ≤ t, B_t − B_{τ_m} < 0] = P[τ_m ≤ t, B_t − B_{τ_m} ≥ 0] = P[B_t ≥ m] ,

and we conclude that

    P[M_t ≥ m] = 2 P[B_t ≥ m] = (2/√(2πt)) ∫_m^∞ e^{−y²/(2t)} dy .

This argument is called the reflection principle for Brownian motion. From the above argument one can also extract that

    lim_{T→∞} P[τ_m < T] = 1 ,

i.e., that the hitting times of Brownian motion are almost surely finite. From the above example we can also derive the following formula.
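The reflection principle lends itself to a direct Monte Carlo check; this sketch (illustrative parameters, not from the text) compares the simulated probability P[M_t ≥ m] with 2P[B_t ≥ m]. The discrete-time maximum slightly undershoots the true maximum, so a small bias is expected.

```python
import numpy as np

# Estimate P[M_t >= m] from simulated Brownian paths and compare with
# 2 * P[B_t >= m] from the reflection principle. Parameters illustrative.
rng = np.random.default_rng(5)
t, m, steps, paths = 1.0, 1.0, 1_000, 5_000
dt = t / steps
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(paths, steps)), axis=1)

p_max = np.mean(B.max(axis=1) >= m)       # P[M_t >= m], time-discretized
p_reflected = 2 * np.mean(B[:, -1] >= m)  # 2 * P[B_t >= m]
print(p_max, p_reflected)   # both near 2 * (1 - Phi(1)) ~ 0.317
```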

Example 7.19. We will compute the probability density ρ_{τ_m}(s) of the hitting time of level m by the Brownian motion, defined by P[τ_m ≤ t] = ∫_0^t ρ_{τ_m}(s) ds. To do so we write

    P[τ_m ≤ t] = P[M_t ≥ m] = 2 P[B_t ≥ m] = (2/√(2πt)) ∫_m^∞ e^{−y²/(2t)} dy
               = √(2/π) ∫_{m/√t}^∞ e^{−u²/2} du ,   (3.12)

where in the last equality we made the change of variables u = y/√t. Now, differentiating (3.12) with respect to t we obtain, by the Leibniz rule,

    ρ_{τ_m}(t) = d/dt P[τ_m ≤ t] = (|m|/√(2π)) t^{−3/2} e^{−m²/(2t)} .   (3.13)

We immediately see from (3.13) that

    E[τ_m] = ∫_0^∞ s (|m|/√(2π)) s^{−3/2} e^{−m²/(2s)} ds = (|m|/√(2π)) ∫_0^∞ s^{−1/2} e^{−m²/(2s)} ds = ∞ .

$$P\big[B_s = 0 \text{ for some } s\in[0,t] \mid B_0 = x\big] = P\Big[\max_{s\in[0,t]} B_s \ge 0 \,\Big|\, B_0 = x\Big] = P\Big[\max_{s\in[0,t]} B_s \ge -x \,\Big|\, B_0 = 0\Big]$$
$$= P[\tau_{-x} \le t] = \int_0^t \frac{|x|}{\sqrt{2\pi}}\,s^{-3/2}\,e^{-\frac{x^2}{2s}}\,ds\,. \qquad (3.14)$$
Since the above expression holds for all $x$, we obtain the distribution of zeros in the interval $[a,b]$ by integrating (3.14) over all possible $x$, weighted by the probability of reaching $x$ at time $a$:
$$P\big[B_s = 0 \text{ for some } s\in[a,b] \mid B_0 = 0\big] = \int_{-\infty}^\infty P\big[B_s = 0 \text{ for some } s\in[0,b-a] \mid B_0 = x\big]\,P[B_a \in dx]$$
$$= \int_{-\infty}^\infty \int_0^{b-a} \frac{|x|}{\sqrt{2\pi}}\,s^{-3/2}\,e^{-\frac{x^2}{2s}}\,ds\;\frac{e^{-\frac{x^2}{2a}}}{\sqrt{2\pi a}}\,dx = \frac{2}{\pi}\arccos\Big(\sqrt{\frac{a}{b}}\Big)\,.$$
By taking the complement of the above we also obtain the probability that Brownian motion has no zeros in the interval $[a,b]$:

$$P\big[B_s \ne 0\ \forall s\in[a,b]\big] = 1 - P\big[B_s = 0 \text{ for some } s\in[a,b]\big] = \frac{2}{\pi}\arcsin\Big(\sqrt{\frac{a}{b}}\Big)\,.$$
The above result is referred to as the arcsine law for Brownian motion.
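The arcsine law can also be checked by simulation. The following sketch (grid, sample size, and seed are illustrative choices) detects zeros as sign changes between neighboring grid points, which slightly undercounts them:

```python
import numpy as np

# Monte Carlo check of the arcsine law:
# P[B_s != 0 for all s in [a,b]] = (2/pi) arcsin(sqrt(a/b)).
rng = np.random.default_rng(1)
a, b = 0.25, 1.0
n_paths, n_steps = 20_000, 800
dt = b / n_steps

B = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
start = int(round(a / dt)) - 1                  # grid index of time a
seg = B[:, start:]                              # the piece of each path on [a, b]
no_zero = np.all(np.sign(seg[:, 1:]) == np.sign(seg[:, :-1]), axis=1)

est = no_zero.mean()
exact = (2 / np.pi) * np.arcsin(np.sqrt(a / b))  # = 1/3 for a/b = 1/4
print(est, exact)
```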

8. A glimpse of the connection with pdes

The Gaussian

$$\rho(t,x) = \frac{1}{\sqrt{2\pi t}}\,e^{-\frac{x^2}{2t}}$$

is the fundamental solution to the heat equation. By direct calculation one sees that if $t > 0$ then $\frac{\partial\rho}{\partial t} = \frac12\frac{\partial^2\rho}{\partial x^2}$. There is a small problem at $t = 0$, namely $\rho$ blows up. However, for any "nice" function $\varphi(x)$ (smooth with compact support),

$$\lim_{t\to 0}\int \rho(t,x)\,\varphi(x)\,dx = \varphi(0)\,.$$
This is the defining property of the "delta function" $\delta(x)$. (If this is uncomfortable to you, look at [16].) Hence we see that $\rho(t,x)$ is the (weak) solution to

$$\frac{\partial\rho}{\partial t} = \frac12\frac{\partial^2\rho}{\partial x^2}\,, \qquad \rho(0,x) = \delta(x)\,.$$

To see the connection to probability, we set $p(t,x,y) = \rho(t,x-y)$ and observe that for any function $f$ we have

$$E\big[f(B_t)\mid B_0 = x\big] = \int_{-\infty}^\infty f(y)\,p(t,x,y)\,dy\,.$$
We will write $E^x f(B_t)$ for $E[f(B_t)\mid B_0 = x]$. Now notice that if $u(t,x) = E^x f(B_t)$ then $u(t,x)$ solves

$$\frac{\partial u}{\partial t} = \frac12\frac{\partial^2 u}{\partial x^2}\,, \qquad u(0,x) = f(x)\,.$$

This is Kolmogorov's backward equation. We can also write it in terms of the transition density $p(t,x,y)$:

$$\frac{\partial p}{\partial t} = \frac12\frac{\partial^2 p}{\partial x^2}\,, \qquad p(0,x,y) = \delta(x-y)\,.$$

From this we see why it is called the "backward" equation: it is a differential equation in the variable $x$, the initial (backward) point of $p(t,x,y)$. This begs a question: is there also a forward equation? Yes, and it is written in terms of the forward variable $y$.

$$\frac{\partial p}{\partial t} = \frac12\frac{\partial^2 p}{\partial y^2}\,, \qquad p(0,x,y) = \delta(x-y)\,.$$

In this case it is identical to the backward equation; in general it will not be. We make one last observation: the transition densities $p(s,t,x,y) = p(t-s,x,y) = \rho(t-s,x-y)$ satisfy the Chapman–Kolmogorov equation (the semigroup property). Namely, for any $s < r < t$ and any $x, y$ we have

$$p(s,t,x,y) = \int_{-\infty}^\infty p(s,r,x,z)\,p(r,t,z,y)\,dz\,.$$

This also suggests the following form for the Kolmogorov forward equation. If we write an equation for $p(s,t,x,y)$ evolving in $s$ and $y$, then we get an equation with a final condition instead of an initial condition. Namely, for $s \le t$,
$$\frac{\partial p}{\partial s} = -\frac12\frac{\partial^2 p}{\partial y^2}\,, \qquad p(t,t,x,y) = \delta(x-y)\,.$$
Hence, we are solving backwards in time.
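The claim that the Gaussian kernel solves the heat equation can be checked numerically. A minimal finite-difference sketch (the evaluation point and step size are arbitrary choices):

```python
import numpy as np

# Finite-difference check that rho(t,x) = exp(-x^2/(2t))/sqrt(2 pi t)
# satisfies the heat equation  d rho/dt = (1/2) d^2 rho/dx^2.
def rho(t, x):
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t0, x0, h = 1.0, 0.7, 1e-4
d_dt = (rho(t0 + h, x0) - rho(t0 - h, x0)) / (2 * h)                    # time derivative
d2_dx2 = (rho(t0, x0 + h) - 2 * rho(t0, x0) + rho(t0, x0 - h)) / h**2   # spatial 2nd derivative
print(d_dt, 0.5 * d2_dx2)   # the two sides of the heat equation agree
```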


CHAPTER 4

Itô Integrals

1. Properties of the noise Suggested by Modeling

If we want to model a process $X_t$ which is subject to random noise, we might think of writing down a differential equation for $X_t$ with a "noise" term regularly injecting randomness, like the following:

$$\frac{dX_t}{dt} = f(X_t) + g(X_t)\cdot(\text{noise})_t\,.$$
The way we have written it suggests the "noise" is generic and is shaped to the specific state of the system by the coefficient $g(X_t)$. It is instructive to write this in integral form over the interval $[0,t]$:
$$X_t = X_0 + \int_0^t f(X_s)\,ds + \int_0^t g(X_s)\cdot(\text{noise})_s\,ds\,. \qquad (4.1)$$
It is reasonable to take the "noise" term to be a pure noise, independent of the structure of $X_t$, and leave the "shaping" of the noise to a particular setting to the function $g(X_t)$. Since we want all moments of time to be treated the same, it is reasonable to assume that the distribution of the "noise" is stationary in time. We would also like the noise at one moment to be independent of the noise at a different moment. Since in particular both of these properties should hold when the function $g \equiv 1$, we consider that simplified case to gain insight. Defining
$$V_t = \int_0^t (\text{noise})_s\,ds\,,$$
stationarity translates to the distribution of $V_{t+h} - V_t$ being independent of $t$. Independence translates to

$$V_{t_1} - V_0,\ V_{t_2} - V_{t_1},\ \dots,\ V_{t_{n+1}} - V_{t_n}$$
being a collection of mutually independent random variables for any collection of times

$$0 < t_1 < t_2 < \cdots < t_{n+1}\,.$$
Rewriting (4.1) with $g \equiv 1$ produces
$$X_t = X_0 + \int_0^t f(X_s)\,ds + V_t\,.$$
We see that if we further decide that we would like to model processes $X_t$ which are continuous in time, we need to require that $t \mapsto V_t$ is almost surely a continuous process. Clearly from our informal definition, $V_0 = 0$. Collecting all of the properties we desire of $V_t$:
i) $V_0 = 0$
ii) Stationary increments
iii) Independent increments
iv) $t \mapsto V_t$ is almost surely continuous.

Comparing this list with Theorem 3.6 we see that $V_t$ must be a Brownian motion. We will choose it to be standard Brownian motion to fix a normalization. Hence we are left to make sense of the integral equation
$$X_t = X_0 + \int_0^t f(X_s)\,ds + \int_0^t g(X_s)\,\frac{dB_s}{ds}\,ds\,.$$
Of course this leads to its own problems, since we saw in Section 5 that $B_s$ is nowhere differentiable. Formally canceling the "$ds$", maybe we can make sense of the integral
$$\int_0^t g(X_s)\,dB_s\,.$$
There is a well established classical theory of integrals of this form called "Riemann–Stieltjes" integration. We will briefly sketch this theory in the next section. However, we will see that even this theory is not applicable to the above integral. This will lead us to consider a new type of integration theory designed explicitly for random functions like Brownian motion. It is named the Itô integral after Kiyoshi Itô, who developed the modern version, though earlier versions exist (notably in the work of Paley, Wiener and Zygmund).

2. Riemann-Stieltjes Integral

Before we try to understand how to integrate against Brownian motion, we recall the classical Riemann–Stieltjes integration theory. Given two continuous functions $f$ and $g$, we want to define
$$\int_0^T f(t)\,dg(t)\,. \qquad (4.2)$$
We begin by considering a piecewise constant function $\varphi$ defined by

$$\varphi(t) = \begin{cases} a_0 & \text{for } t\in[t_0,t_1] \\ a_k & \text{for } t\in(t_k,t_{k+1}]\,,\ k = 1,\dots,n-1 \end{cases} \qquad (4.3)$$
for some partition

$$0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = T$$
and constants $a_k \in \mathbb{R}$. For such a function $\varphi$ it is intuitively clear that

$$\int_0^T \varphi(t)\,dg(t) = \sum_{k=0}^{n-1}\int_{t_k}^{t_{k+1}}\varphi(s)\,dg(s) = \sum_{k=0}^{n-1} a_k\int_{t_k}^{t_{k+1}}dg(s) = \sum_{k=0}^{n-1} a_k\,[g(t_{k+1}) - g(t_k)]\,, \qquad (4.4)$$
because $\int_{t_k}^{t_{k+1}}dg(s) = g(t_{k+1}) - g(t_k)$ by the fundamental theorem of calculus (since $\int_{t_k}^{t_{k+1}}dg(s) = \int_{t_k}^{t_{k+1}}g'(s)\,ds$ if $g$ is differentiable).

The basic idea for defining (4.2) is to approximate $f$ by a sequence of step functions $\{\varphi_n(t)\}$, each of the form given in (4.3), so that

$$\sup_{t\in[0,T]} |f(t) - \varphi_n(t)| \to 0 \quad \text{as } n \to \infty\,. \qquad (4.5)$$

A natural choice of partition at the $n$th level is $t_k^{(n)} = Tk2^{-n}$ for $k = 0,\dots,2^n$; we then define the $n$th approximating function by
$$\varphi_n(t) = \begin{cases} f(t_k^{(n)}) & \text{if } t\in[t_k^{(n)}, t_{k+1}^{(n)}) \\ f(T) & \text{if } t = T \end{cases}$$
If $f$ is continuous, it is easy to see that (4.5) holds.

We are then left to show that there exists a constant $\alpha$ so that
$$\Big|\int_0^T \varphi^{(n)}(t)\,dg(t) - \alpha\Big| \longrightarrow 0 \quad \text{as } n \to \infty\,.$$
We would then define the integral $\int_0^T f(t)\,dg(t)$ to be equal to $\alpha$. One of the keys to proving this convergence is a uniform bound on the approximating integrals $\int_0^T \varphi^{(n)}(t)\,dg(t)$. Observe that
$$\Big|\int_0^T \varphi^{(n)}(t)\,dg(t)\Big| \le \|f\|_\infty \sum_{k=0}^{n-1} |g(t_{k+1}) - g(t_k)| \le \|f\|_\infty\, V_1[g](0,T)\,,$$
where
$$\|f\|_\infty = \sup_{t\in[0,T]} |f(t)|$$
and $V_1[g](0,T)$ denotes the total variation of $g$ on $[0,T]$. This uniform bound implies that the $\int_0^T \varphi^{(n)}(t)\,dg(t)$ stay in a compact subset of $\mathbb{R}$. Hence there must be a limit point $\alpha$ of the sequence and a subsequence which converges to it. It is not then hard to show that this limit point is unique; that is to say, if any other subsequence converges it must also converge to $\alpha$. We define the value of $\int_0^T f(t)\,dg(t)$ to be $\alpha$. Hence it is sufficient that $V_1[g](0,T) < \infty$ when $g$ and $f$ are continuous. This condition can also be shown to be necessary for a reasonable class of $f$.

It is further possible to show, using essentially the same calculations, that the limit $\alpha$ is independent of the sequence of partitions as long as the maximal spacing goes to zero, and independent of the choice of point at which to evaluate the integrand $f$. In the above discussion we chose the left endpoint $t_k$ of the interval $[t_k, t_{k+1}]$; however, we were free to choose any point in the interval.

While the compactness argument above is a standard path, it is often more satisfying to explicitly show that the integrals of the $\{\varphi^{(n)}\}$ form a Cauchy sequence by showing that for any $\varepsilon > 0$ there exists an $N$ so that if $n, m > N$ then
$$\Big|\int_0^T \varphi^{(m)}(t)\,dg(t) - \int_0^T \varphi^{(n)}(t)\,dg(t)\Big| < \varepsilon\,.$$
Since $\varphi^{(m)} - \varphi^{(n)}$ is again a step function of the form (4.3), the integral $\int_0^T [\varphi^{(m)} - \varphi^{(n)}](t)\,dg(t)$ is well defined and given by a sum of the form (4.4). Hence we have
$$\Big|\int_0^T \varphi^{(m)}(t)\,dg(t) - \int_0^T \varphi^{(n)}(t)\,dg(t)\Big| = \Big|\int_0^T [\varphi^{(m)} - \varphi^{(n)}](t)\,dg(t)\Big| \le \|\varphi^{(m)} - \varphi^{(n)}\|_\infty\, V_1[g](0,T)\,.$$
Since $f$ is continuous and the partition spacing goes to zero, it is not hard to see that the $\{\varphi^{(n)}\}$ form a Cauchy sequence in the $\|\cdot\|_\infty$ norm, which completes the proof that the integrals of the step functions form a Cauchy sequence.
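The left-endpoint sums of (4.4) are easy to compute directly. A minimal sketch (the choices of $f$, $g$, $T$, and partition size are illustrative): with $g(t) = t^2$, which has finite total variation, $dg = 2t\,dt$, so $\int_0^1 t\,dg(t) = 2/3$.

```python
import numpy as np

# Left-endpoint Riemann-Stieltjes sums, as in (4.4):
#   sum_k f(t_k) [g(t_{k+1}) - g(t_k)]  on a uniform partition of [0, T].
def riemann_stieltjes(f, g, T=1.0, n=100_000):
    t = np.linspace(0.0, T, n + 1)
    return np.sum(f(t[:-1]) * np.diff(g(t)))

val = riemann_stieltjes(lambda t: t, lambda t: t**2)
print(val)   # close to 2/3
```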

3. A motivating example

We begin by considering the example
$$\int_0^T B_s\,dB_s \qquad (4.6)$$
where $B$ is a standard Brownian motion. Since $V_1[B](0,T) = \infty$ almost surely, we cannot entirely follow the prescription of the Riemann–Stieltjes integral given in Section 2. However, it still seems reasonable to approximate the integrand $B_s$ by a sequence of step functions of the form (4.3). Since $B$ is random, the $a_k$ from (4.3) will have to be random variables.

Fixing a partition $0 = t_0 < t_1 < \cdots < t_N = T$, we define two different sequences of step function approximations of $B$. For $t \in [0,T]$, we define
$$\varphi^N(t) = B(t_k) \ \text{ if } t\in[t_k,t_{k+1})\,, \qquad \hat\varphi^N(t) = B(t_{k+1}) \ \text{ if } t\in(t_k,t_{k+1}]\,.$$
Just as in the Riemann–Stieltjes setting (see (4.4)), for such step functions it is clear that one should define the respective integrals in the following manner:

$$\int_0^T \varphi^N(t)\,dB_t = \sum_{k=0}^{N-1} B(t_k)\,[B(t_{k+1}) - B(t_k)]\,,$$
$$\int_0^T \hat\varphi^N(t)\,dB_t = \sum_{k=0}^{N-1} B(t_{k+1})\,[B(t_{k+1}) - B(t_k)]\,.$$
In the Riemann–Stieltjes setting the two are the same. But in this case the two have very different properties, as the following calculation shows. Since $B(t_k)$ and $B(t_{k+1}) - B(t_k)$ are independent, we have $E\big[B(t_k)[B(t_{k+1}) - B(t_k)]\big] = E[B(t_k)]\,E[B(t_{k+1}) - B(t_k)] = 0$. So
$$E\Big[\int_0^T \varphi^N\,dB_t\Big] = \sum_{k=0}^{N-1} E\big[B(t_k)[B(t_{k+1}) - B(t_k)]\big] = 0\,.$$
On the other hand, since $E[B(t_{k+1}) - B(t_k)]^2 = t_{k+1} - t_k$, we have
$$E\Big[\int_0^T \hat\varphi^N\,dB_t\Big] = \sum_{k=0}^{N-1} E\big[B(t_k)[B(t_{k+1}) - B(t_k)]\big] + \sum_{k=0}^{N-1} E\big[B(t_{k+1}) - B(t_k)\big]^2 = 0 + \sum_{k=0}^{N-1} (t_{k+1} - t_k) = T\,.$$
Hence, how we construct our step functions will be important in our analysis. The choice of endpoint used in $\varphi^N(t)$ leads to what is called the Itô integral. The choice used in $\hat\varphi^N(t)$ is called the Klimontovich integral, while if the midpoint is chosen, this leads to the Stratonovich integral. The question of which to use is a modeling question and depends on the problem being studied. We will see that it is possible to translate between all three in most cases. We will concentrate on the Itô integral since it has some nice additional properties which make the analysis attractive.
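The calculation above can be seen directly in simulation. A sketch (path count, step count, and seed are illustrative choices) comparing the empirical means of the left- and right-endpoint sums:

```python
import numpy as np

# Left- vs right-endpoint sums for int_0^T B dB have different expectations:
# the left sum averages to 0, the right sum averages to T.
rng = np.random.default_rng(2)
T, N, n_paths = 1.0, 500, 10_000
dt = T / N

dB = rng.normal(0.0, np.sqrt(dt), (n_paths, N))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

left = np.sum(B[:, :-1] * dB, axis=1)    # sum B(t_k)     [B(t_{k+1}) - B(t_k)]
right = np.sum(B[:, 1:] * dB, axis=1)    # sum B(t_{k+1}) [B(t_{k+1}) - B(t_k)]
print(left.mean(), right.mean())         # approx 0 and approx T = 1
```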

4. Itô integrals for a simple class of step functions

Let $(\Omega, \mathcal F, P)$ be a probability space and $B_t$ a standard Brownian motion. Let $\mathcal F_t$ be a filtration on the probability space to which $B_t$ is adapted. For example, one could take $\mathcal F_t = \sigma(B_s : s \le t)$, the filtration generated by the Brownian motion $B_t$.

Definition 4.1. $\varphi(t,\omega)$ is an elementary stochastic process if there exists a collection of bounded, disjoint intervals $\{I_k\} = \{[t_0,t_1), [t_1,t_2), \dots, [t_{N-1},t_N)\}$ associated to a partition $0 \le t_0 < t_1 < \cdots < t_N$ and a collection of random variables $\{\alpha_k : k = 0,\dots,N-1\}$ so that

$$\varphi(t,\omega) = \sum_{k=0}^{N-1} \alpha_k(\omega)\,\mathbf 1_{I_k}(t)\,,$$
where each random variable $\alpha_k(\omega)$ is measurable with respect to $\mathcal F_{t_k}$ and $E[|\alpha_k(\omega)|^2] < \infty$. We denote the space of all elementary processes by $S_2$. To be precise, stochastic integrals are defined on the class of progressively measurable processes, defined below:

Definition 4.2. A stochastic process $\{X_t\}_{t\ge 0}$ on $(\Omega, \mathcal F, P, \{\mathcal F_t\})$ is called progressively measurable if, for any $t \ge 0$, the map $(s,\omega) \mapsto X_s(\omega)$ on $[0,t]\times\Omega$ is $\mathcal B_{[0,t]} \times \mathcal F_t$-measurable, where $\mathcal B_{[0,t]}$ is the Borel σ-algebra on $[0,t]$.

Fact 1. Every adapted right continuous with left limits ("càdlàg") or left continuous with right limits process is progressively measurable.

The reason to assume progressive measurability is to ensure that the expectation and the integral can be interchanged (by Fubini's theorem).

Fact 2. Any progressively measurable process $X = \{X_t\}_{t\in[0,T]}$ can be approximated by a sequence of simple processes $X^n = \{X_t^n\}_{t\in[0,T]} \in S_2$ in the $L^2$ sense, that is,
$$E\Big[\int_0^T |X_t^n - X_t|^2\,dt\Big] = \int_0^T E\big[|X_t^n - X_t|^2\big]\,dt \to 0 \quad \text{as } n \to \infty\,.$$
The proof of this approximation can be found in [13]. Because of the above result, we will be able to extend the notion of integral to adapted processes, and we restrict our attention to such processes for the rest of the chapter.

Next, we define a functional $I$ which will be our integral operator; that is to say, $I(\varphi) = \int_0^\infty \varphi\,dB$. Just as for the Riemann–Stieltjes integral, if $\varphi$ is an elementary stochastic process it is relatively clear what we should mean by $I(\varphi)$, namely:

Definition 4.3. The stochastic integral of an elementary stochastic process is given by

$$I(\varphi) := \sum_k \alpha_k\,[B(t_{k+1}) - B(t_k)]\,.$$
We first observe that $I$ satisfies one of the standard properties of an integral in that it is a linear functional. In other words, if $\lambda \in \mathbb{R}$ and $\varphi$ and $\psi$ are elementary stochastic processes, then
$$I(\lambda\varphi) = \lambda I(\varphi) \quad \text{and} \quad I(\psi + \varphi) = I(\psi) + I(\varphi)\,. \qquad (4.7)$$

Thanks to our requirement that the $\alpha_k$ are measurable with respect to the filtration associated to the left endpoint of the interval $[t_k, t_{k+1})$, we have the following properties, which will play a central role in what follows and should be compared to the calculations in Section 3.

Lemma 4.4. If $\varphi$ is an elementary stochastic process then
$$E[I(\varphi)] = 0 \quad \text{(mean zero)}$$
$$E\big[I(\varphi)^2\big] = \int_0^\infty E|\varphi(t)|^2\,dt \quad \text{(Itô isometry)}$$

Remark 4.5. An isometry is a map between two spaces which preserves distance (i.e. the norm). If we consider

$$I(S_2) = \{\, I(\varphi) : \varphi \in S_2 \,\}$$
then, according to Lemma 4.4, the map $\varphi \mapsto I(\varphi)$ is an isometry between the space of elementary stochastic processes $L^2(S_2, P[d\omega]\times dt)$ equipped with the norm
$$\|\varphi\| = \Big(\int_0^\infty E\varphi^2(t,\omega)\,dt\Big)^{1/2}$$
and the space of random variables
$$L^2\big(I(S_2), P\big) = \big\{\, X \in I(S_2) : \|X\| = \sqrt{E(X^2)} < \infty \,\big\}\,.$$

Proof of Lemma 4.4. We begin by showing $I(\varphi)$ is mean zero.

$$E[I(\varphi)] = \sum_k E\big[\alpha_k(B(t_{k+1}) - B(t_k))\big] = \sum_k E\Big[E\big[\alpha_k(B(t_{k+1}) - B(t_k)) \mid \mathcal F_{t_k}\big]\Big] = \sum_k E\Big[\alpha_k\,E\big[B(t_{k+1}) - B(t_k) \mid \mathcal F_{t_k}\big]\Big] = 0\,.$$
Turning to the Itô isometry,
$$E\big[I(\varphi)^2\big] = E\Big[\Big(\sum_k \alpha_k(B(t_{k+1}) - B(t_k))\Big)\Big(\sum_j \alpha_j(B(t_{j+1}) - B(t_j))\Big)\Big] = \sum_{j,k} E\big[\alpha_k\alpha_j[B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)]\big]$$
$$= 2\sum_{k<j} E\big[\alpha_k\alpha_j[B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)]\big] + \sum_k E\big[\alpha_k^2\,[B(t_{k+1}) - B(t_k)]^2\big]\,.$$
Next, we examine each component separately. Recall that for $t_{k+1} \le t_j$,
$$E\big[\alpha_k\alpha_j[B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)]\big] = E\Big[E\big[\alpha_k\alpha_j[B(t_{k+1}) - B(t_k)][B(t_{j+1}) - B(t_j)] \mid \mathcal F_{t_j}\big]\Big]$$
$$= E\Big[\alpha_k\alpha_j[B(t_{k+1}) - B(t_k)]\,E\big[B(t_{j+1}) - B(t_j) \mid \mathcal F_{t_j}\big]\Big] = 0\,,$$
since $E[B(t_{j+1}) - B(t_j) \mid \mathcal F_{t_j}] = 0$. Similarly,
$$\sum_k E\big[\alpha_k^2\,[B(t_{k+1}) - B(t_k)]^2\big] = \sum_k E\Big[E\big[\alpha_k^2\,[B(t_{k+1}) - B(t_k)]^2 \mid \mathcal F_{t_k}\big]\Big] = \sum_k E\big[\alpha_k^2\,(t_{k+1} - t_k)\big]\,.$$
Hence we have
$$E\big[I(\varphi)^2\big] = 0 + \sum_k E[\alpha_k^2]\,(t_{k+1} - t_k) = \int_0^\infty E[\varphi^2(s)]\,ds\,. \qquad \square$$

So far we have defined the Itô integral on the whole positive half line $[0,\infty)$. For any $0 \le s < t \le \infty$ we make the following definition:
$$\int_s^t \varphi_r\,dB_r = I(\varphi\,\mathbf 1_{[s,t)})\,.$$
We can now talk about the stochastic process
$$M_t = I(\varphi\,\mathbf 1_{[0,t)}) = \int_0^t \varphi_s\,dB_s \qquad (4.8)$$
associated to a given elementary stochastic process $\varphi_t$, where the last two expressions are just different notation for the same object. We now state a few simple consequences of our definitions.

Lemma 4.6. Let $\varphi \in S_2$ and $0 < s < t$. Then
$$\int_0^t \varphi_r\,dB_r = \int_0^s \varphi_r\,dB_r + \int_s^t \varphi_r\,dB_r$$
and
$$M_t = \int_0^t \varphi_r\,dB_r$$
is measurable with respect to $\mathcal F_t$.

Proof of Lemma 4.6. Clearly $M_t$ is measurable with respect to $\mathcal F_t$, since the $\{\varphi_s : s \le t\}$ are by assumption and the construction of the integral only uses the information from $\{B_s : s \le t\}$. The first property follows from $\varphi\,\mathbf 1_{[0,t)} = \varphi\,\mathbf 1_{[0,s)} + \varphi\,\mathbf 1_{[s,t)}$, and hence
$$I(\varphi\,\mathbf 1_{[0,t)}) = I(\varphi\,\mathbf 1_{[0,s)}) + I(\varphi\,\mathbf 1_{[s,t)})$$
by (4.7). $\square$

Lemma 4.7. Let $M_t$ be as in (4.8) for an elementary process $\varphi$ and a Brownian motion $B_t$, both adapted to a filtration $\{\mathcal F_t : t \ge 0\}$. Then $M_t$ is a martingale with respect to the filtration $\mathcal F_t$.

Proof of Lemma 4.7. Looking at Definition 7.8, there are three conditions we need to verify. The measurability is contained in Lemma 4.6. The fact that $E[M_t^2] < \infty$ follows from the Itô isometry, since
$$E[M_t^2] = \int_0^t E|\varphi_s|^2\,ds \le \int_0^\infty E|\varphi_s|^2\,ds < \infty$$
because the last integral is finite by the definition of an elementary stochastic process. All that remains is to verify that for $s < t$,

$$E\big[M_t(\varphi) - M_s(\varphi) \mid \mathcal F_s\big] = 0\,.$$

There are a few cases. Taking one case, say $s$ and $t$ are in the disjoint intervals $[t_k, t_{k+1})$ and $[t_j, t_{j+1})$ respectively. We have that
$$M_t(\varphi) - M_s(\varphi) = \alpha_k[B(t_{k+1}) - B_s] + \Big(\sum_{n=k+1}^{j-1} \alpha_n[B(t_{n+1}) - B(t_n)]\Big) + \alpha_j[B_t - B(t_j)]\,.$$
Next, take repeated conditional expectations with respect to the filtrations $\{\mathcal F_a\}$, where $a \in \{t_j, t_{j-1}, \dots, t_{k+1}, s\}$. This would then imply that for each $a$,
$$E\big[\alpha_a[B(t_{a+1}) - B(t_a)] \mid \mathcal F_a\big] = \alpha_a\,E\big[B(t_{a+1}) - B(t_a) \mid \mathcal F_a\big] = \alpha_a\,E\big[B(t_{a+1}) - B(t_a)\big] = 0\,.$$

Hence $E[M_t(\varphi) - M_s(\varphi) \mid \mathcal F_s] = 0$. The other cases can be done similarly, and the conclusion immediately follows. $\square$
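The conclusions of Lemma 4.4 can be checked numerically for a concrete elementary process. A sketch (grid, sample size, and seed are illustrative choices) taking $\alpha_k = B(t_k)$ on each $[t_k, t_{k+1})$, for which $\int_0^T E|\varphi|^2\,dt = \sum_k t_k\,(t_{k+1}-t_k)$ exactly:

```python
import numpy as np

# Monte Carlo check of Lemma 4.4 (mean zero and Ito isometry) for the
# elementary process phi(t) = B(t_k) on [t_k, t_{k+1}).
rng = np.random.default_rng(3)
T, N, n_paths = 1.0, 50, 200_000
t = np.linspace(0.0, T, N + 1)
dt = np.diff(t)

dB = rng.normal(0.0, 1.0, (n_paths, N)) * np.sqrt(dt)
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

I_phi = np.sum(B[:, :-1] * dB, axis=1)   # I(phi) = sum alpha_k [B(t_{k+1}) - B(t_k)]
lhs = np.mean(I_phi**2)                  # empirical E[I(phi)^2]
rhs = np.sum(t[:-1] * dt)                # int_0^T E|phi|^2 dt = sum t_k (t_{k+1}-t_k)
print(np.mean(I_phi), lhs, rhs)          # approx 0, and lhs approx rhs
```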

Lemma 4.8. In the same setting as Lemma 4.7, $M_t$ is a continuous stochastic process. (That is to say, with probability one the map $t \mapsto M_t$ is continuous for all $t \in [0,\infty)$.)

Proof of Lemma 4.8. We begin by noticing that if $\varphi(t,\omega)$ is a simple process with

$$\varphi(t,\omega) = \sum_{k=1}^{N-1} \alpha_k(\omega)\,\mathbf 1_{[t_k,t_{k+1})}(t)\,,$$
then for $t \in (t_{k^*}, t_{k^*+1})$,
$$M_t(\varphi,\omega) = \sum_{k=1}^{k^*-1} \alpha_k(\omega)\,[B(t_{k+1},\omega) - B(t_k,\omega)] + \alpha_{k^*}(\omega)\,[B(t,\omega) - B(t_{k^*},\omega)]\,.$$
Hence it is clear that $M_t(\varphi,\omega)$ is continuous if $\varphi$ is a simple function, since the Brownian motion $B_t$ is continuous. $\square$

5. Extension to the Closure of Elementary Processes

We will denote by $\bar S_2$ the closure in $L^2(P\times dt)$ of the square integrable elementary processes $S_2$. Namely,

$$\bar S_2 = \Big\{\text{all stochastic processes } f(t,\omega) : \text{there exists a sequence } \varphi_n(t,\omega) \in S_2 \text{ so that } \int_0^\infty E\big(f(t) - \varphi_n(t)\big)^2\,dt \to 0 \text{ as } n \to \infty\Big\}\,.$$
Also recall that we define
$$L^2(\Omega, P) := \Big\{ f : \Omega \to \mathbb{R} : \int_\Omega f(\omega)^2\,P[d\omega] < \infty \Big\} = \{\text{random variables } X \text{ with } E(X^2) < \infty\}$$
and
$$L^2(\Omega\times[0,\infty), P\times dt) := \Big\{ f : \Omega\times[0,\infty) \to \mathbb{R} : \int_0^\infty\int_\Omega f(t,\omega)^2\,P[d\omega]\,dt < \infty \Big\} = \Big\{\text{stochastic processes } X_t \text{ with } \int_0^\infty E(X_t^2)\,dt < \infty\Big\}\,.$$
In general, in the interest of brevity, we will write $L^2(\Omega)$ for the first and $L^2(\Omega\times[0,\infty))$ for the second space above. We will occasionally write $\|X\|_{L^2(\Omega)}$ for $\sqrt{E(X^2)}$ and $\|X\|_{L^2(\Omega\times[0,\infty))}$ for $(\int_0^\infty E(X_t^2)\,dt)^{1/2}$, though we will simply write $\|\cdot\|_{L^2}$ when the context is clear.

Recalling that our definition of $S_2$, and hence of $\bar S_2$, had a filtration $\mathcal F_t$ in the background, we define $L^2_{ad}(\Omega\times[0,\infty))$ to be the subset of $L^2(\Omega\times[0,\infty))$ in which all of the stochastic processes are adapted to the filtration $\mathcal F_t$. We have the following simple observation.

Lemma 5.1. The closure of the space of elementary processes is contained in the space of square integrable, adapted processes, i.e., $\bar S_2 \subset L^2_{ad}(\Omega\times[0,\infty))$.

Proof. The adaptedness follows from the adaptedness of $S_2$. The fact that elements of $\bar S_2$ are square integrable follows from the following calculation. Fix an $f \in \bar S_2$ and a sequence $\varphi_n \in S_2$ with $\|\varphi_n - f\| \to 0$ as $n \to \infty$, and fix an $n$ so that $\|\varphi_n - f\| \le 1$. Since for any real numbers $a$ and $b$ one has $|a+b|^2 \le 2|a|^2 + 2|b|^2$, we see that
$$\|f\|_{L^2}^2 = \|\varphi_n + (f - \varphi_n)\|_{L^2}^2 \le 2\|\varphi_n\|_{L^2}^2 + 2\|f - \varphi_n\|_{L^2}^2 \le 2\big(\|\varphi_n\|_{L^2}^2 + 1\big) < \infty\,,$$
where the term on the far right is finite since for every $n$, $\varphi_n \in S_2$ (observe that $\|\varphi_n\|_{L^2}^2 = \sum_k E(\alpha_k^2)(t_{k+1} - t_k)$). $\square$

In fact, $\bar S_2$ is exactly $L^2_{ad}(\Omega\times[0,\infty))$:

Theorem 5.2. $\bar S_2 = L^2_{ad}(\Omega\times[0,\infty))$.

Proof. $\bar S_2 \subset L^2_{ad}(\Omega\times[0,\infty))$ was just proven in Lemma 5.1. For the other direction, see the proof of [13, Theorem 3.1.5] or [11]. $\square$

We are now ready to state the main theorem of this section. This result shows that given a sequence of elementary stochastic processes $\{\varphi_n\} \in S_2$ converging in $L^2(\Omega\times[0,\infty))$ to a process $f \in \bar S_2$, there exists a random variable $X \in L^2(\Omega)$ to which the stochastic integrals $I(\varphi_n)$ converge in $L^2(\Omega)$. We will then define this random variable $X$ to be the stochastic integral $I(f) = \int_0^\infty f_s\,dB_s$. The construction is summarized in the following scheme:
$$\begin{array}{ccc} \{\varphi_n(t)\} \in S_2 & \xrightarrow{\;I(\cdot)\;} & I(\varphi_n) \in I(S_2) \\[2pt] \downarrow & & \downarrow \\[2pt] \{f(t)\} \in \bar S_2 & \xrightarrow{\;I(\cdot)\;} & X =: \int f\,dB_s \in L^2(\Omega) \end{array}$$
where the left arrow denotes $\int_0^\infty E[(\varphi_n - f)^2]\,dt \to 0$ and the right arrow denotes $E[(I(\varphi_n) - X)^2] \to 0$ as $n \to \infty$.

Theorem 5.3. For every $f \in \bar S_2$ there exists a random variable $X \in L^2(\Omega)$ so that if $\{\varphi_n : n \in \mathbb{N}\}$ is a sequence of elementary stochastic processes (i.e. elements of $S_2$) converging to the stochastic process $f$ in $L^2(\Omega\times[0,\infty))$, then $I(\varphi_n)$ converges to $X$ in $L^2(\Omega)$.

Definition 5.4. For any $f \in \bar S_2$, we define the Itô integral
$$I(f) = \int_0^\infty f_t\,dB_t$$
to be the random variable $X \in L^2(\Omega)$ given by Theorem 5.3.

Proof of Theorem 5.3. We start by showing that $I(\varphi_n)$ is a Cauchy sequence in $L^2(\Omega)$. By the linearity of the map $I$ (see (4.7)), $I(\varphi_n) - I(\varphi_m) = I(\varphi_n - \varphi_m)$. Hence by the Itô isometry for elementary stochastic processes (see Lemma 4.4), we have that
$$E\big[(I(\varphi_n) - I(\varphi_m))^2\big] = E\big[I(\varphi_n - \varphi_m)^2\big] = \int_0^\infty E\big[(\varphi_n(t) - \varphi_m(t))^2\big]\,dt\,. \qquad (4.9)$$
Since the sequence $\varphi_n$ converges to $f$ in $L^2(\Omega\times[0,\infty))$, we know that it is a Cauchy sequence in $L^2(\Omega\times[0,\infty))$. By the above calculation we hence have that $\{I(\varphi_n)\}$ is a Cauchy sequence in $L^2(\Omega)$. It is a classical fact from real analysis that this space is complete, which implies that every Cauchy sequence converges to a point in the same space. Let $X \in L^2(\Omega)$ denote the limit.

To see that $X$ does not depend on the sequence $\{\varphi_n\}$, let $\{\tilde\varphi_n\}$ be another sequence converging to $f$. The same reasoning as above ensures the existence of an $\tilde X \in L^2(\Omega)$ so that $I(\tilde\varphi_n) \to \tilde X$ in $L^2(\Omega)$. On the other hand,
$$E\big[(I(\varphi_n) - I(\tilde\varphi_n))^2\big] = E\big[I(\varphi_n - \tilde\varphi_n)^2\big] = \int_0^\infty E\big[(\varphi_n(t) - \tilde\varphi_n(t))^2\big]\,dt \le 2\int_0^\infty E\big[(\varphi_n(t) - f(t))^2\big]\,dt + 2\int_0^\infty E\big[(f(t) - \tilde\varphi_n(t))^2\big]\,dt\,,$$
where in the inequality we have again used the fact that $(\varphi_n - \tilde\varphi_n)^2 = [(\varphi_n - f) + (f - \tilde\varphi_n)]^2 \le 2(\varphi_n - f)^2 + 2(f - \tilde\varphi_n)^2$. Since both $\varphi_n$ and $\tilde\varphi_n$ converge to $f$ in $L^2(\Omega\times[0,\infty))$, the last two terms on the right-hand side go to $0$. This in turn implies that $E[(X - \tilde X)^2] = 0$ and hence that the two random variables are the same. $\square$

Remark 5.5. While the above construction might seem like magic since it is so soft, it is an application of a general principle in mathematics: if one can define a linear map on a dense subset of elements of a space in such a way that the map is an isometry, then the map can be uniquely extended to the whole space. This approach is beautifully presented in our current context in [10] using the following lemma.

Lemma 5.6 (Extension Theorem). Let $B_1$ and $B_2$ be two Banach spaces and let $B_0 \subset B_1$ be a linear subspace. If $L : B_0 \to B_2$ is defined for all $b \in B_0$ and $\|Lb\|_{B_2} = \|b\|_{B_1}$ for all $b \in B_0$, then there exists a unique extension of $L$ to $\bar B_0$ (the closure of $B_0$), called $\bar L$, with $\bar L b = Lb$ for all $b \in B_0$.

Example 5.7. We use the above theorem to show that
$$I_T(B_s) = \int_0^T B_s\,dB_s = \frac12\big(B_T^2 - T\big)\,.$$
To do so, we show that the sequence $\varphi_n = \sum_{j=1}^{N_n} B_{t_j^{(n)}}\,\mathbf 1_{[t_j^{(n)}, t_{j+1}^{(n)})}$, with $\Delta t^{(n)} = \max_j (t_{j+1}^{(n)} - t_j^{(n)}) \to 0$, converges in $L^2(\Omega\times[0,T))$ to $\{B_t\}$. Indeed we have that
$$\int_0^T E\big[|\varphi_n - B_s|^2\big]\,ds = \sum_j \int_{t_j^{(n)}}^{t_{j+1}^{(n)}} E\big[(B_{t_j^{(n)}} - B_s)^2\big]\,ds = \sum_j \int_{t_j^{(n)}}^{t_{j+1}^{(n)}} (s - t_j^{(n)})\,ds = \frac12\sum_j \big(t_{j+1}^{(n)} - t_j^{(n)}\big)^2 \to 0\,.$$
Then, by Theorem 5.3 we have that
$$\int_0^T B_s\,dB_s = \lim_{n\to\infty} I(\varphi_n) = \lim_{n\to\infty}\sum_j B_{t_j^{(n)}}\,\Delta B_j^n\,.$$
Now, writing $\Delta B_j^n := B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}$ we have
$$\Delta(B_j^2) := B_{t_{j+1}^{(n)}}^2 - B_{t_j^{(n)}}^2 = \big(B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}\big)^2 + 2B_{t_j^{(n)}}\big(B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}\big) = (\Delta B_j^n)^2 + 2B_{t_j^{(n)}}\,\Delta B_j^n$$
and therefore

$$\sum_j B_{t_j^{(n)}}\,\Delta B_j^n = \frac12\Big(\sum_j \Delta(B_j^2) - \sum_j (\Delta B_j^n)^2\Big) = \frac12\Big(B_T^2 - \sum_j (\Delta B_j^n)^2\Big)\,.$$
The term on the right-hand side converges to $(B_T^2 - T)/2$ in $L^2(\Omega)$, which therefore is the value of $\int_0^T B_s\,dB_s$.

6. Properties of Itô integrals

Proposition 6.1. Let $f, g \in L^2_{ad}(\Omega\times[0,\infty))$ and $\lambda \in \mathbb{R}$. Then
i) Linearity:
$$\int_0^\infty (\lambda f_s + g_s)\,dB_s = \lambda\int_0^\infty f_s\,dB_s + \int_0^\infty g_s\,dB_s\,, \qquad (4.10)$$
ii) Separability: for all $S > 0$,
$$\int_0^\infty f_s\,dB_s = \int_0^S f_s\,dB_s + \int_S^\infty f_s\,dB_s\,,$$
iii) Mean zero:
$$E\Big[\int_0^\infty f_s\,dB_s\Big] = 0\,, \qquad (4.11)$$
iv) Itô isometry:
$$E\Big[\Big(\int_0^\infty f_s\,dB_s\Big)^2\Big] = \int_0^\infty E\big[f_s^2\big]\,ds\,. \qquad (4.12)$$

Proof. For the proof of points i) and ii) we refer to [9]. To prove iv), for $f \in L^2_{ad}(\Omega\times[0,\infty))$ let $\{\varphi_n\} \in S_2$ be a sequence of elementary stochastic processes converging to $f$ in $L^2(\Omega\times[0,\infty))$. By Theorem 5.3 there exists $X \in L^2(\Omega)$ with $E[(X - I(\varphi_n))^2] \to 0$ as $n \to \infty$. Since $E[X^2], E[I(\varphi_n)^2] < \infty$, using the Cauchy–Schwarz (or Hölder) inequality we write
$$\big|E[I(\varphi_n)^2] - E[X^2]\big| = \big|E\big[(X - I(\varphi_n))(X + I(\varphi_n))\big]\big| \le \big|E\big[(X - I(\varphi_n))I(\varphi_n)\big]\big| + \big|E\big[(X - I(\varphi_n))X\big]\big| \le \Big(\sqrt{E[X^2]} + \sqrt{E[I(\varphi_n)^2]}\Big)\sqrt{E\big[(X - I(\varphi_n))^2\big]}\,,$$
with the term on the right-hand side $\sqrt{E[(X - I(\varphi_n))^2]} = \|X - I(\varphi_n)\|_{L^2(\Omega)} \to 0$ as $n \to \infty$. This implies that in the limit $n \to \infty$ we have
$$E\big[I(\varphi_n)^2\big] \to E[X^2]\,.$$
At the same time,
$$E\big[I(\varphi_n)^2\big] = \int_0^\infty E\big[\varphi_n(t)^2\big]\,dt \to \int_0^\infty E\big[f(t)^2\big]\,dt\,.$$
Combining these facts produces
$$E[X^2] = \int_0^\infty E\big[f(t)^2\big]\,dt$$
as desired. The exact same logic produces $E[X] = 0$, proving iii). $\square$

7. A continuous in time version of the Itô integral

In this section, we consider the following.

Definition 7.1. For any $f \in L^2_{ad}(\Omega\times[0,\infty))$ and any $t \ge 0$, we define the Itô integral process $\{I_t(f)\}$ as
$$I_t(f) = \int_0^t f_s(\omega)\,dB_s(\omega) := I(f\,\mathbf 1_{(0,t]})\,.$$
Note that the process introduced above is well defined and adapted to $\mathcal F_t$. In fact, since
$$E\big[I_t(f)^2\big] = \int_0^t E\big[f_s^2\big]\,ds \le \int_0^\infty E\big[f_s^2\big]\,ds < \infty\,,$$
we see that $I_t(f)$ is an adapted stochastic process whose second moment is uniformly bounded in time.

It is not immediately clear that we can fix the realization $\omega$ (which in turn fixes the realization of $f$ and $B$) and then vary the time $t$ in $I_t(f)(\omega) = \int_0^t f_s(\omega)\,dB_s(\omega)$. We built the integral as a limit at each fixed time $t$; changing the time $t$ requires us to repeat the limiting process. This would be fine except that we built the Itô integral through an $L^2$ limit, hence at each time it is only defined up to sets of probability zero. If we only wanted to define $I_t$ for some countable sequence of times this would still not be a problem, because the countable union of measure zero sets is still measure zero. However, we would like to define $I_t(f)$ for all $t \in [0,T]$. This is a problem, and more work is required.

Theorem 7.2. Let $f \in L^2_{ad}(\Omega\times[0,\infty))$. Then there exists a continuous version of $I_t(f)$; i.e., there exists a $t$-continuous stochastic process $\{J_t\}$ on $(\Omega, \mathcal F, P)$ such that for all $t \ge 0$,
$$P[J_t = I_t(f)] = 1\,.$$
To prove this result, we state a very useful theorem which we will prove later.

Theorem 7.3 (Doob's Martingale Inequality). Let $M_t$ be a continuous-time martingale with respect to the filtration $\mathcal F_t$. Then
$$P\Big[\sup_{0\le t\le T} |M_t| \ge \lambda\Big] \le \frac{E\big[|M_T|^p\big]}{\lambda^p}$$
where $\lambda \in \mathbb{R}^+$ and $p \ge 1$.
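Doob's inequality is easy to illustrate by simulation for the martingale $M_t = B_t$ with $p = 2$, where the bound reads $P[\sup_{t\le T}|B_t| \ge \lambda] \le T/\lambda^2$. A sketch with illustrative parameter choices:

```python
import numpy as np

# Monte Carlo illustration of Doob's inequality with p = 2 for M_t = B_t:
#   P[sup_{0<=t<=T} |B_t| >= lam]  <=  E[B_T^2] / lam^2  =  T / lam^2.
rng = np.random.default_rng(4)
T, N, n_paths, lam = 1.0, 500, 20_000, 1.5
dt = T / N

B = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, N)), axis=1)
lhs = np.mean(np.abs(B).max(axis=1) >= lam)   # P[sup |B_t| >= lam], discretized
rhs = T / lam**2                              # Doob bound
print(lhs, rhs)   # lhs is well below rhs
```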

Proof of Theorem 7.2. By Lemma 4.8 we know that $I_t(\varphi,\omega)$ is continuous if $\varphi$ is a simple function. If we have a sequence of simple functions $\varphi_n$ converging to $f$ in $L^2$, we would like to "transfer" the continuity of the $I_t(\varphi_n)$ to $I_t(f)$. To do so we use the following fact: the uniform limit of continuous functions is continuous. In other words, if each $f_n$ is continuous and $\sup_{t\in[0,T]}|f_n(t) - f(t)| \to 0$ as $n \to \infty$, then $f$ is continuous.

To do so, let $I_t(\varphi_n) = \int_0^t \varphi_n(s)\,dB_s$ and consider a Cauchy sequence $\{\varphi_n\}$, for which we have, by Doob's inequality and the Itô isometry,
$$P\Big[\sup_{0\le t\le T} |I_t(\varphi_n) - I_t(\varphi_m)| > \varepsilon\Big] \le \frac{1}{\varepsilon^2}\,E\big[|I_T(\varphi_n) - I_T(\varphi_m)|^2\big] \le \frac{1}{\varepsilon^2}\,E\Big[\int_0^T (\varphi_n(s) - \varphi_m(s))^2\,ds\Big]\,.$$
The last term goes to zero as $n, m \to \infty$. Hence we can find a subsequence $\{n_k\}$ so that
$$P\Big[\sup_{0\le t\le T} |I_t(\varphi_{n_{k+1}}) - I_t(\varphi_{n_k})| > \frac{1}{k^2}\Big] \le \frac{1}{2^k}\,.$$
If we set
$$A_k = \Big\{\sup_{0\le t\le T} |I_t(\varphi_{n_{k+1}}) - I_t(\varphi_{n_k})| > \frac{1}{k^2}\Big\}$$
then $\sum_k P[A_k] \le \sum_k 2^{-k} < \infty$. Hence the Borel–Cantelli lemma tells us that there is a random $N(\omega)$ so that
$$k \ge N(\omega) \implies \sup_{0\le t\le T} |I_t(\varphi_{n_{k+1}}) - I_t(\varphi_{n_k})| \le \frac{1}{k^2}\,.$$
If we set $J_t^{(k)} = I_t(\varphi_{n_k})$ then the $\{J_t^{(k)}\}$ form a Cauchy sequence in the sup norm ($|f|_{\sup} = \sup_{t\in[0,T]}|f(t)|$). Since the convergence is uniform in $t$ and each $J_t^{(k)}$ is continuous, we know that for almost every $\omega$ the limit $J_t = \lim_{k\to\infty} J_t^{(k)}$ is also continuous in $t$. Finally, since by assumption we also have $J_t^{(k)} \to I_t(f) = \int_0^t f_s(\omega)\,dB_s(\omega)$ in $L^2$, we have that
$$\int_0^t f_s(\omega)\,dB_s(\omega) = J_t(\omega) \quad \text{a.s.}\,,$$
as required. $\square$

8. An Extension of the Itô Integral

Up until now we have only considered the Itô integral for integrands $f$ such that $E\int_0^T f^2\,ds < \infty$. However, it is possible to make sense of $\int_0^T f_s(\omega)\,dB_s(\omega)$ if we only know that
$$P\Big[\int_0^T |f_s(\omega)|^2\,ds < \infty\Big] = 1\,.$$
Most of the previous properties hold. In particular, $\int_0^T f_s(\omega)\,dB_s(\omega)$ is a perfectly fine random variable which is almost surely finite. However, it is not necessarily true that $E\big[\int_0^T f_s(\omega)\,dB_s(\omega)\big] = 0$, which in turn means that $\int_0^t f_s(\omega)\,dB_s(\omega)$ need not be a martingale. (In fact it is what is called a local martingale.) For the same reasons, the Itô isometry property (4.12) may not hold in this case.

Example 8.1.

9. Itô Processes

Let $(\Omega, \mathcal F, P)$ be the canonical probability space and let $\mathcal F_t$ be the σ-algebra generated by the Brownian motion $B_t(\omega)$.

Definition 9.1. $X_t(\omega)$ is an Itô process if there exist stochastic processes $f(t,\omega)$ and $\sigma(t,\omega)$ such that

i) $f(t,\omega)$ and $\sigma(t,\omega)$ are $\mathcal F_t$-measurable,
ii) $\int_0^t |f|\,ds < \infty$ and $\int_0^t |\sigma|^2\,ds < \infty$ almost surely,
iii) $X_0(\omega)$ is $\mathcal F_0$-measurable,
iv) with probability one the following holds:
$$X_t(\omega) = X_0(\omega) + \int_0^t f_s(\omega)\,ds + \int_0^t \sigma_s(\omega)\,dB_s(\omega)\,. \qquad (4.13)$$
The processes $f(t,\omega)$ and $\sigma(t,\omega)$ are referred to as the drift and diffusion coefficients of $X_t$. For brevity, one often writes (4.13) as

$$dX_t(\omega) = f_t(\omega)\,dt + \sigma_t(\omega)\,dB_t(\omega)\,.$$
But this is just notation for the integral equation above!


CHAPTER 5

Stochastic Calculus

This section introduces the fundamental tools for the computation of stochastic integrals. Indeed, similarly to what is done in classical calculus, stochastic integrals are rarely computed by applying the definition of the Itô integral from the previous chapter. In classical calculus, instead of applying the definition of the integral, one usually computes $\int f(x)\,dx$ by applying the fundamental theorem of calculus, choosing $F$ such that $\frac{d}{dx}F(x) = f(x)$ so that
$$\int f(x)\,dx = \int \frac{d}{dx}F(x)\,dx = F(x)\,. \qquad (5.1)$$
Even though, as we have seen in a previous section, differentiation in this framework is not possible, it is possible to obtain a similar result for Itô integrals. In what follows we will introduce such a formula (called the Itô formula) allowing for rapid computation of stochastic integrals.
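As a numerical preview of this chapter's main tool, the Itô formula for Brownian motion, $f(B_t) = f(0) + \int_0^t f'(B_s)\,dB_s + \frac12\int_0^t f''(B_s)\,ds$, can be checked by simulation. A sketch with $f(x) = x^3$ (path and step counts are illustrative choices; the residual is discretization error):

```python
import numpy as np

# Pathwise check of Ito's formula for f(x) = x^3:
#   f(B_T) - f(0)  ≈  sum f'(B_{t_k}) dB_k  +  (1/2) sum f''(B_{t_k}) dt.
rng = np.random.default_rng(5)
T, N, n_paths = 1.0, 2_000, 2_000
dt = T / N

dB = rng.normal(0.0, np.sqrt(dt), (n_paths, N))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

lhs = B[:, -1]**3                                  # f(B_T) - f(0)
stoch = np.sum(3 * B[:, :-1]**2 * dB, axis=1)      # int f'(B) dB (left-endpoint sums)
corr = 0.5 * np.sum(6 * B[:, :-1] * dt, axis=1)    # (1/2) int f''(B) ds correction
err = np.mean(np.abs(lhs - (stoch + corr)))
print(err)   # small, shrinking as N grows
```

Note that without the correction term `corr` the two sides would differ by roughly $3\int_0^T B_s\,ds$, which is why the second-order term cannot be dropped.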

1. Itô's Formula for Brownian motion

We first introduce the Itô formula for the Brownian motion process.

Theorem 1.1. Let $f \in C^2(\mathbb{R})$ (the set of twice continuously differentiable functions on $\mathbb{R}$) and let $B_t$ be a standard Brownian motion. Then for any $t > 0$,
$$f(B_t) = f(0) + \int_0^t f'(B_s)\,dB_s + \frac12\int_0^t f''(B_s)\,ds\,.$$
To prove this theorem, we first state the following partial result, proven at the end of the section.

Lemma 1.2. Let $g$ be a continuous function and let $\Gamma^n := \{t_k^n : k = 0,\dots,N(n)\}$ be a sequence of partitions of $[0,t]$ such that $t_0^n = 0$, $t_{N(n)}^n = t$ and
$$|\Gamma^n| := \sup_i |t_{i+1}^n - t_i^n| \to 0 \quad \text{as } n \to \infty\,. \qquad (5.2)$$
Then
$$\sum_{k=0}^{N(n)-1} g(\xi_k^n)\big(B_{t_{k+1}^n} - B_{t_k^n}\big)^2 \to \int_0^t g(B_s)\,ds\,,$$
for any choice of $\xi_k^n$ between $B_{t_k^n}$ and $B_{t_{k+1}^n}$.

Proof of Theorem 1.1. Without loss of generality we can assume that $f$ and its first two derivatives are bounded. After establishing the result for such functions we can approximate a general function by such a sequence and pass to the limit to obtain the general result.

Let $\Gamma^n = \{t_k^n : k = 0,\dots,N(n)\}$ be a sequence of partitions of $[0,t]$ as in Lemma 1.2. Now for any level $n$,

$$f(B_t) - f(0) = \sum_{k=1}^{N(n)} \big(f(B_{t_k}) - f(B_{t_{k-1}})\big)\,. \qquad (5.3)$$
Taylor's Theorem implies
$$f(B_{t_k}) - f(B_{t_{k-1}}) = f'(B_{t_{k-1}})\big(B_{t_k} - B_{t_{k-1}}\big) + \frac12 f''(\xi_k)\big(B_{t_k} - B_{t_{k-1}}\big)^2$$
for some $\xi_k$ between $B_{t_{k-1}}$ and $B_{t_k}$. Returning to (5.3), we have
$$f(B_t) - f(0) = \sum_{k=1}^{N(n)} f'(B_{t_{k-1}})\big(B_{t_k} - B_{t_{k-1}}\big) + \frac12\sum_{k=1}^{N(n)} f''(\xi_k)\big(B_{t_k} - B_{t_{k-1}}\big)^2\,. \qquad (5.4)$$
By the construction of the Itô integral, for the first term on the right-hand side of (5.4) we have

$$\sum_{k=1}^{N(n)} f'(B_{t_{k-1}})\bigl(B_{t_k} - B_{t_{k-1}}\bigr) \longrightarrow \int_0^t f'(B_s)\,dB_s$$
in $L^2$ as $n \to \infty$. Combining this with the result of Lemma 1.2 (proven below), applied with $g = f''$ to the second term on the right-hand side of (5.4), concludes the proof. $\square$

Proof of Lemma 1.2. We want to show that
$$A_n := \sum_{k=1}^{N(n)} g(\xi_k)\bigl(B_{t_k} - B_{t_{k-1}}\bigr)^2 \longrightarrow \int_0^t g(B_s)\,ds$$
in probability as $n \to \infty$. We begin by showing that
$$C_n := \sum_{k=1}^{N(n)} g(B_{t_{k-1}})\bigl(B_{t_k} - B_{t_{k-1}}\bigr)^2 \longrightarrow \int_0^t g(B_s)\,ds \qquad (5.5)$$
in probability as $n \to \infty$. First, since $g(B_s)$ is continuous, we have that
$$D_n := \sum_{k=1}^{N(n)} g(B_{t_{k-1}})(t_k - t_{k-1}) \longrightarrow \int_0^t g(B_s)\,ds$$
as $n \to \infty$. Therefore we obtain (5.5) by showing that $|C_n - D_n|$ converges to $0$ in $L^2(\Omega)$, as this directly implies convergence in probability. To that end observe that
$$\mathbb{E}\bigl[(D_n - C_n)^2\bigr] = \mathbb{E}\Bigl[\sum_{k=1}^{N(n)} g(B_{t_{k-1}})^2 \bigl(\Delta_k t - \Delta_k B^2\bigr)^2\Bigr] + 2\,\mathbb{E}\Bigl[\sum_{j<k} g(B_{t_{k-1}})\, g(B_{t_{j-1}}) \bigl(\Delta_k t - \Delta_k B^2\bigr)\bigl(\Delta_j t - \Delta_j B^2\bigr)\Bigr] \qquad (5.6)$$
where $\Delta_k t := t_k - t_{k-1}$ and $\Delta_k B^2 := (B_{t_k} - B_{t_{k-1}})^2$. Now, using the Gaussian moments of the increments,
$$\mathbb{E}\bigl[(\Delta_k t - \Delta_k B^2)^2\bigr] = (\Delta_k t)^2 - 2(\Delta_k t)^2 + 3(\Delta_k t)^2 = 2(\Delta_k t)^2 .$$
Considering the first term in (5.6), we therefore have
$$\mathbb{E}\Bigl[\sum_{k=1}^{N(n)} g(B_{t_{k-1}})^2 \bigl(\Delta_k t - \Delta_k B^2\bigr)^2\Bigr] = 2\sum_{k=1}^{N(n)} \mathbb{E}\bigl[g(B_{t_{k-1}})^2\bigr](\Delta_k t)^2 \le 2|\Gamma^n| \sum_{k=1}^{N(n)} \mathbb{E}\bigl[g(B_{t_{k-1}})^2\bigr]\Delta_k t .$$
Since the sum converges to
$$\int_0^t \mathbb{E}\bigl[g(B_s)^2\bigr]\,ds < \infty$$
as $n \to \infty$ and $|\Gamma^n| \to 0$, the product goes to zero. All that remains is the second sum in (5.6). Since, for $j < k$,
$$\mathbb{E}\bigl[g(B_{t_{k-1}}) g(B_{t_{j-1}}) (\Delta_k t - \Delta_k B^2)(\Delta_j t - \Delta_j B^2)\bigr] = \mathbb{E}\Bigl[g(B_{t_{k-1}}) g(B_{t_{j-1}}) (\Delta_j t - \Delta_j B^2)\,\mathbb{E}\bigl[\Delta_k t - \Delta_k B^2 \,\big|\, \mathcal{F}_{t_{k-1}}\bigr]\Bigr]$$
and $\mathbb{E}\bigl[\Delta_k t - \Delta_k B^2 \,\big|\, \mathcal{F}_{t_{k-1}}\bigr] = 0$, we see that the second sum is in fact zero.

All that remains is to show that $A_n - C_n$ converges to zero. Now
$$|C_n - A_n| \le \sum_{k=1}^{N(n)} \bigl|g(\xi_k) - g(B_{t_{k-1}})\bigr| \bigl(B_{t_k} - B_{t_{k-1}}\bigr)^2 \le \Bigl(\sup_k \bigl|g(\xi_k) - g(B_{t_{k-1}})\bigr|\Bigr) \sum_{k=1}^{N(n)} \bigl(B_{t_k} - B_{t_{k-1}}\bigr)^2 .$$
Since the first factor goes to zero in probability as $n \to \infty$ by the continuity and boundedness of $g(B_s)$, and the second factor converges to the quadratic variation of $B_t$ (which equals $t$), the product converges to zero in probability. $\square$

1.1. A second look at Itô's Formula. Looking back at (5.4), one sees that the Itô integral term in Itô's formula comes from the sum against the increments of Brownian motion. This term results directly from the first-order Taylor expansion of $f$, and can be identified with the first-order derivative term that we are used to seeing in the fundamental theorem of calculus (5.1). The second sum,
$$\sum_{k=1}^{N(n)} f''(\xi_k)\bigl(B_{t_k} - B_{t_{k-1}}\bigr)^2 ,$$
which contains the squares of the increments of Brownian motion, results from the second-order term in the Taylor expansion and is absent in the classical calculus formulation. However, since the sum
$$\sum_{k=1}^{N(n)} \bigl(B_{t_k} - B_{t_{k-1}}\bigr)^2$$
converges in probability to the quadratic variation of the Brownian motion $B_t$, which according to Lemma 5.3 is simply $t$, this term gives a nonzero contribution in the limit $n \to \infty$ and must be kept in this framework. We refer to this term as the Itô correction term. In light of this remark, if we let $[B]_t$ denote the quadratic variation of $B_t$, then one can reinterpret the Itô correction term
$$\frac{1}{2}\int_0^t f''(B_s)\,ds \quad \text{as} \quad \frac{1}{2}\int_0^t f''(B_s)\,d[B]_s . \qquad (5.7)$$
We wish to derive a more general version of Itô's formula for a general Itô process $X_t$ defined by
$$X_t = X_0 + \int_0^t f_s\,ds + \int_0^t g_s\,dB_s .$$
Beginning in the same way as before, we write the expression analogous to (5.4), namely

$$f(X_t) - f(X_0) = \sum_{k=1}^{N(n)} f'(X_{t_{k-1}})\bigl(X_{t_k} - X_{t_{k-1}}\bigr) + \frac{1}{2}\sum_{k=1}^{N(n)} f''(\xi_k)\bigl(X_{t_k} - X_{t_{k-1}}\bigr)^2 \qquad (5.8)$$
for some twice continuously differentiable function $f$ and some partition $\{t_k\}$ of $[0,t]$. It is reasonable to expect the first and second sums to converge respectively to the integrals
$$\int_0^t f'(X_s)\,dX_s \quad \text{and} \quad \frac{1}{2}\int_0^t f''(X_s)\,d[X]_s . \qquad (5.9)$$
The first is the Itô stochastic integral with respect to the Itô process $X_t$, while the second is an integral with respect to the differential of the quadratic variation $[X]_t$ of the process $X_t$. So far this discussion has proceeded mainly by analogy with the simple Brownian motion case. To make sense of the two terms in (5.9) we need to better understand the quadratic variation of an Itô process $X_t$ and to define the concept of a stochastic integral against $X_t$. While the former point is covered in the following section, the latter is quickly clarified by the following intuitive definition:

Definition 1.3. Given an Itô process $\{X_t\}$ with differential $dX_t = f_t\,dt + \sigma_t\,dB_t$ and an adapted stochastic process $\{h_t\}$ such that
$$\int_0^\infty |h_s f_s|\,ds < \infty \quad \text{and} \quad \int_0^\infty (h_s \sigma_s)^2\,ds < \infty \quad \text{a.s.,}$$
we define the integral of $h_t$ against $X_t$ as
$$\int_0^t h_s\,dX_s := \int_0^t h_s f_s\,ds + \int_0^t h_s \sigma_s\,dB_s . \qquad (5.10)$$

2. Quadratic Variation and Covariation

We generalize the definition (3.7) of quadratic variation to that of quadratic covariation.

Definition 2.1. Let $X_t, Y_t$ be two adapted stochastic processes. Their quadratic covariation is defined as
$$[X, Y]_t := \lim_{N\to\infty} \sum_{j=0}^{j_N - 1} \bigl(X_{t_{j+1}^N} - X_{t_j^N}\bigr)\bigl(Y_{t_{j+1}^N} - Y_{t_j^N}\bigr) ,$$
where the limit is taken in probability and $\{t_j^N\}$ is a set partitioning the interval $[0,t]$, defined by
$$\Gamma^N := \bigl\{ \{t_j^N\} : 0 = t_0^N < t_1^N < \cdots < t_{j_N}^N = t \bigr\} \qquad (5.11)$$
with $|\Gamma^N| := \sup_j |t_{j+1}^N - t_j^N| \to 0$ as $N \to \infty$. Furthermore, we define the quadratic variation of $X_t$ as
$$[X]_t := [X, X]_t . \qquad (5.12)$$
We can also speak about the quadratic variation on an interval other than $[0,t]$. For $0 \le s < t$ we will write $[X]_{s,t}$ and $[X, Y]_{s,t}$ for the quadratic variation and cross-quadratic variation on the interval $[s,t]$, respectively.

Just from the algebraic form of the pre-limiting object, the quadratic variation satisfies a number of properties.

Lemma 2.2. Assuming all of the objects are defined, then for any adapted, continuous stochastic processes $X_t, Y_t$:
i) for any constant $c \in \mathbb{R}$ we have $[cX]_t = c^2 [X]_t$;
ii) for $0 < s < t$ we have
$$[X]_{0,s} + [X]_{s,t} = [X]_{0,t} ; \qquad (5.13)$$
iii) we have that
$$0 \le [X]_{0,s} \le [X]_{0,t} \qquad (5.14)$$
for $t > s \ge 0$; in other words, the map $t \mapsto [X]_t$ is nondecreasing a.s.;
iv) we can write
$$[X \pm Y]_t = [X]_t + [Y]_t \pm 2[X, Y]_t . \qquad (5.15)$$
Consequently, quadratic covariations can be written in terms of quadratic variations as
$$[X, Y]_t = \frac{1}{2}\bigl([X+Y]_t - [X]_t - [Y]_t\bigr) = \frac{1}{4}\bigl([X+Y]_t - [X-Y]_t\bigr) . \qquad (5.16)$$

Proof. Parts i) and ii) are a direct consequence of Definition 2.1, while part iii) results from ii) and the fact that $[X]_{s,t}$ is defined as a limit of sums of squares and is therefore nonnegative: $[X]_{0,t} = [X]_{0,s} + [X]_{s,t} \ge [X]_{0,s}$. Part iv) is obtained by noticing that
$$[X \pm Y]_t = \lim_{N\to\infty} \sum_{j=0}^{j_N - 1} \Bigl( \bigl(X_{t_{j+1}^N} - X_{t_j^N}\bigr) \pm \bigl(Y_{t_{j+1}^N} - Y_{t_j^N}\bigr) \Bigr)^2 = \lim_{N\to\infty} \sum_{j=0}^{j_N - 1} \Bigl( \bigl(X_{t_{j+1}^N} - X_{t_j^N}\bigr)^2 \pm 2\bigl(X_{t_{j+1}^N} - X_{t_j^N}\bigr)\bigl(Y_{t_{j+1}^N} - Y_{t_j^N}\bigr) + \bigl(Y_{t_{j+1}^N} - Y_{t_j^N}\bigr)^2 \Bigr) = [X]_t \pm 2[X, Y]_t + [Y]_t ,$$
while (5.16) is obtained by rearranging the terms of the above result (for the first equality) and by using it to compute $[X+Y]_t - [X-Y]_t$ (for the second). $\square$

In the following sections we will see that the quadratic variation of Itô integrals acquires a particularly simple form. We will show this by first considering quadratic variations of Itô integrals and then extending this result to Itô processes.

2.1. Quadratic Variation of an Itô Integral.

Lemma 2.3. Let $\sigma_t$ be a process adapted to the filtration $\{\mathcal{F}_t^B\}$ and such that $\int_0^\infty \sigma_s^2\,ds < \infty$ a.s. Then, defining $M_t := I_t(\sigma) = \int_0^t \sigma_s\,dB_s$, we have that
$$[M]_t = \int_0^t \sigma_s^2\,ds , \qquad (5.17)$$
or, in differential notation, $d[M]_t = \sigma_t^2\,dt$.

Proof of Lemma 2.3. It is enough to prove (5.17) when $\sigma_s$ is an elementary stochastic process in $S^2$. The general case can then be handled by approximation, as in the proof of the Itô isometry. Hence we assume that
$$\sigma_t = \sum_{j=1}^{K} \alpha_{j-1} \mathbf{1}_{[t_{j-1}, t_j)}(t) \qquad (5.18)$$
where the $\alpha_j$ satisfy the properties required by $S^2$ and $K$ is some integer. Without loss of generality we can assume that $t$ is the right endpoint of our interval, so that the partition takes the form
$$0 = t_0 < t_1 < \cdots < t_K = t .$$
Now observe that if $[r, s] \subset [t_{j-1}, t_j]$ then
$$\int_r^s \sigma_\tau\,dB_\tau = \alpha_{j-1}\bigl(B_s - B_r\bigr) .$$
Hence, if $\{s_\ell^{(n)}\}$ is a sequence of partitions of the interval $[t_{j-1}, t_j]$ with
$$t_{j-1} = s_0^{(n)} < s_1^{(n)} < \cdots < s_{N(n)}^{(n)} = t_j$$
and $|\Gamma^{(n)}| = \sup_\ell |s_\ell^{(n)} - s_{\ell-1}^{(n)}| \to 0$ as $n \to \infty$, then the quadratic variation of $M_t$ on the interval $[t_{j-1}, t_j]$ is the limit as $n \to \infty$ of
$$\sum_{\ell=1}^{N(n)} \bigl(M_{s_\ell} - M_{s_{\ell-1}}\bigr)^2 = \alpha_{j-1}^2 \sum_{\ell=1}^{N(n)} \bigl(B_{s_\ell} - B_{s_{\ell-1}}\bigr)^2 .$$
Since the summation on the right-hand side converges to the quadratic variation of the Brownian motion $B$ on the interval $[t_{j-1}, t_j]$, which we know to be $t_j - t_{j-1}$, we conclude that
$$[M]_{t_{j-1}, t_j} = \alpha_{j-1}^2 (t_j - t_{j-1}) .$$
Since the quadratic variation on disjoint intervals adds, we have that
$$[M]_t = \sum_{j=1}^{K} [M]_{t_{j-1}, t_j} = \sum_{j=1}^{K} \alpha_{j-1}^2 (t_j - t_{j-1}) = \int_0^t \sigma_s^2\,ds ,$$
where the last equality follows from the fact that $\sigma_s$ takes the form (5.18). As mentioned at the start, the general case follows from this calculation by approximation by functions in $S^2$. $\square$

Remark 2.4. The proof of the above result in Klebaner (Theorem 4.14 on p. 106) has a subtle issue. When bounding
$$\mathbb{E}\Bigl[\sum_{i=0}^{n-1} g^2(B_{t_i^n})\bigl(t_{i+1}^n - t_i^n\bigr)^2\Bigr] \le 2\delta_n \sum_{i=0}^{n-1} \mathbb{E}\bigl[g^2(B_{t_i^n})\bigr]\bigl(t_{i+1}^n - t_i^n\bigr) ,$$
Klebaner asserts that as $n \to \infty$, $\delta_n = |\Gamma^n| \to 0$ while $\sum_{i=0}^{n-1} \mathbb{E}[g^2(B_{t_i^n})](t_{i+1}^n - t_i^n)$ stays finite, and thus their product goes to $0$. However, the finiteness of $\sum_{i=0}^{n-1} \mathbb{E}[g^2(B_{t_i^n})](t_{i+1}^n - t_i^n)$ is unjustified. In fact, if it were finite, it would converge to $\int_0^t \mathbb{E}[g^2(B_s)]\,ds$ (a Riemann sum). However, this integral might be infinite for certain choices of $g$, for example $g(x) = e^{x^2}$ (see Example 4.5 on p. 99 of [Klebaner]). The proof here uses the same computation of the second moment, but only for "nice" functions (i.e., those with compact support). The convergence in probability (note: this is weaker than convergence in $L^2$) for general continuous functions is established using approximation. Stopping rules for the Itô integral are needed here, but we defer them to a later part of the course.

We now consider the quadratic covariation of two Itô integrals with respect to independent Brownian motions.
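Before proving that this covariation vanishes, both (5.17) and the vanishing covariation (5.19) below can be checked numerically. The sketch below is only illustrative: the choice of integrand $\sigma_s = B_s$, the number of steps, and the random seed are all arbitrary assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 200_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)                  # increments of B
dW = rng.normal(0.0, np.sqrt(dt), n)                  # increments of an independent W
B_left = np.concatenate(([0.0], np.cumsum(dB)[:-1]))  # B at left partition endpoints

sigma = B_left                  # illustrative integrand: sigma_s = B_s
dM = sigma * dB                 # increments of M_t = int_0^t sigma_s dB_s
dN = sigma * dW                 # increments of N_t = int_0^t sigma_s dW_s

qv_M = np.sum(dM ** 2)          # discrete approximation of [M]_t
qcov_MN = np.sum(dM * dN)       # discrete approximation of [M, N]_t
target = np.sum(sigma ** 2) * dt  # Riemann sum for int_0^t sigma_s^2 ds

print(qv_M, target)             # close to each other, as (5.17) predicts
print(qcov_MN)                  # close to 0, as (5.19) predicts
```

On a fine mesh the two quadratic-variation estimates agree closely, while the covariation estimate fluctuates around zero at a scale set by the mesh size.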

Lemma 2.5. Let $B_t, W_t$ be two independent Brownian motions, and let $f_s, g_s$ be two stochastic processes, all adapted to the underlying filtration $\mathcal{F}_t$ and such that $\int_0^\infty f_s^2\,ds < \infty$ and $\int_0^\infty g_s^2\,ds < \infty$ almost surely. We define
$$M_t := \int_0^t f_s\,dB_s \quad \text{and} \quad N_t := \int_0^t g_s\,dW_s .$$
Then, for all $t \ge 0$, one has
$$[N, M]_t = 0 . \qquad (5.19)$$

Proof of (5.19). Again, without loss of generality it is enough to prove the result for $f_s$ and $g_s$ in $S^2$. We can further assume that both are defined with respect to the same partition
$$0 = t_0 < t_1 < \cdots < t_K = t .$$
Since, as observed in (5.13), the quadratic variation on disjoint intervals adds, we need only show that $[N, M]_{t_{j-1}, t_j} = 0$ on any of the partition intervals $[t_{j-1}, t_j]$.

Fixing such an interval $[t_{j-1}, t_j]$, we see that $[N, M]_{t_{j-1}, t_j} = f_{t_{j-1}} g_{t_{j-1}} [W, B]_{t_{j-1}, t_j}$. The easiest way to see that this vanishes is to use the polarization equality (5.16):
$$2[W, B]_{t_{j-1}, t_j} = \Bigl[\tfrac{W+B}{\sqrt{2}}\Bigr]_{t_{j-1}, t_j} - \Bigl[\tfrac{W-B}{\sqrt{2}}\Bigr]_{t_{j-1}, t_j} = (t_j - t_{j-1}) - (t_j - t_{j-1}) = 0 ,$$
since $\frac{W+B}{\sqrt{2}}$ and $\frac{W-B}{\sqrt{2}}$ are standard Brownian motions and hence have quadratic variation equal to the length of the time interval. $\square$

Remark 2.6. One can also prove the above result more directly, following the same argument as in Theorem 5.3 of Chapter 3. The key calculation is to show that the expected value of the approximating quadratic covariation is $0$, rather than the length of the time interval as in the proof of Theorem 5.3 of Chapter 3. For any partition $\{s_\ell\}$ of $[t_{j-1}, t_j]$ one has, by independence,
$$\mathbb{E}\Bigl[\sum_\ell \bigl(B_{s_\ell} - B_{s_{\ell-1}}\bigr)\bigl(W_{s_\ell} - W_{s_{\ell-1}}\bigr)\Bigr] = \sum_\ell \mathbb{E}\bigl(B_{s_\ell} - B_{s_{\ell-1}}\bigr)\,\mathbb{E}\bigl(W_{s_\ell} - W_{s_{\ell-1}}\bigr) = 0 .$$

2.2. Quadratic Variation of an Itô Process. In this section, using the results presented above, we finally obtain a simple expression for the quadratic variation of an Itô process.

Lemma 2.7. If $X_t$ is an Itô process with differential $dX_t = \mu_t\,dt + \sigma_t\,dB_t$, then
$$[X]_t = [I(\sigma)]_t = \int_0^t \sigma_s^2\,ds , \qquad (5.20)$$
or equivalently $d[X]_t = \sigma_t^2\,dt$.

By comparing this result with (5.17) we notice that the only contribution to the quadratic variation process comes from the Itô integral. The following result will be useful in the proof of (5.20).

Lemma 2.8. Let $X_t$ and $Y_t$ be adapted stochastic processes such that $X_t$ is continuous a.s. and $Y_t$ has trajectories with finite first variation ($V_1[Y](t) < \infty$). Then $[X, Y]_t = 0$ a.s.

Before we give the proof of Lemma 2.8 we observe that it immediately yields (5.20).

Proof of Lemma 2.7. Defining $F_t = \int_0^t \mu_s\,ds$ and $M_t = \int_0^t \sigma_s\,dB_s$, observe that $X_t = F_t + M_t$ and that $F_t$ is continuous and of finite first variation almost surely. Hence $[F]_t = 0$. Since $M_t$ is continuous a.s., we also have that $[M, F]_t = 0$ almost surely. Hence
$$[X]_t = [F]_t + 2[F, M]_t + [M]_t = [M]_t = \int_0^t \sigma_s^2\,ds . \qquad \square$$

Proof of Lemma 2.8. Let $\Gamma^N := \{t_i : i = 0, \dots, i_N\}$ be a sequence of partitions of $[0,t]$ such that $|\Gamma^N| = \sup_i |t_{i+1}^N - t_i^N| \to 0$ as $N \to \infty$. Now
$$\Bigl|\sum_{i=1}^{i_N} \bigl(X_{t_i} - X_{t_{i-1}}\bigr)\bigl(Y_{t_i} - Y_{t_{i-1}}\bigr)\Bigr| \le \Bigl(\sup_i |X_{t_i} - X_{t_{i-1}}|\Bigr) \sum_{i=1}^{i_N} |Y_{t_i} - Y_{t_{i-1}}| .$$
The summation on the right-hand side is bounded from above by the first variation of $Y_t$, which by assumption is finite a.s. On the other hand, as $N \to \infty$ the supremum goes to zero, since $|\Gamma^N| \to 0$ and $X_t$ is a.s. continuous. $\square$

Remark 2.9. Similarly to the formal considerations in Section 1.1, we may think of the differential of the quadratic variation process $d[X]_t$ as the limit of the difference term $(X_{t_{k+1}^n} - X_{t_k^n})^2$, which in turn is the square $(dX_t)^2$ of the differential of the process. Therefore, formally speaking, we can obtain the result of the previous lemma by writing
$$d[X]_t = (dX_t)^2 = (\mu_t\,dt + \sigma_t\,dB_t)^2 = \mu_t^2 (dt)^2 + 2\mu_t \sigma_t (dt)(dB_t) + \sigma_t^2 (dB_t)^2 = \sigma_t^2\,dt ,$$
where we have applied $(dt)^2 = (dt)(dB_t) = 0$ (cf. Lemma 2.8) and $(dB_t)^2 = dt$ (cf. Lemma 2.3). These formal multiplication rules are summarized in the following table:

$\times$ | $dt$ | $dB_t$
$dt$   | $0$  | $0$
$dB_t$ | $0$  | $dt$

By the same formal arguments, such rules apply to the computation of the quadratic covariation of two Itô processes. For $dY_t = \mu_t'\,dt + \sigma_t'\,dB_t$:
$$d[X, Y]_t = (dX_t)(dY_t) = (\mu_t\,dt + \sigma_t\,dB_t)(\mu_t'\,dt + \sigma_t'\,dB_t) = \mu_t \mu_t' (dt)^2 + \mu_t \sigma_t' (dt)(dB_t) + \mu_t' \sigma_t (dt)(dB_t) + \sigma_t \sigma_t' (dB_t)^2 = \sigma_t \sigma_t'\,dt . \qquad (5.21)$$
This result can be verified by going through the steps of the proof of the above lemmas.
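The formal rule (5.21) can also be tested numerically. A minimal sketch, under the entirely arbitrary illustrative choices $\mu_t = 1$, $\sigma_t = 2$, $\mu'_t = -1$, $\sigma'_t = t$ (so that $\int_0^1 \sigma_s \sigma'_s\,ds = 1$), a fixed seed, and $10^5$ steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 100_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
s = np.linspace(0.0, t, n, endpoint=False)  # left endpoints of the partition

# Two Ito processes driven by the SAME Brownian motion (arbitrary coefficients):
dX = 1.0 * dt + 2.0 * dB     # mu = 1,   sigma = 2
dY = -1.0 * dt + s * dB      # mu' = -1, sigma'_s = s

qcov = np.sum(dX * dY)            # discrete approximation of [X, Y]_1
target = np.sum(2.0 * s) * dt     # Riemann sum for int_0^1 sigma_s sigma'_s ds

print(qcov, target)
```

The drift contributions wash out at order $dt$, and the covariation estimate matches the Riemann sum for $\int_0^t \sigma_s\sigma'_s\,ds$, exactly as the multiplication table predicts.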

3. Itô's Formula for an Itô Process

Using the lessons learned in the previous sections, we now proceed to compute the infinitesimal version of (5.8) as
$$df(X_t) = f'(X_t)\,dX_t + \frac{1}{2} f''(X_t)(dX_t)^2 . \qquad (5.22)$$
Of course, this is just notation for the integral equation
$$f(X_t) - f(X_0) = \int_0^t f'(X_s)\,dX_s + \frac{1}{2}\int_0^t f''(X_s)(dX_s)^2 .$$
The first integral is simply the integral against an Itô process, as we have already discussed. In light of Remark 2.9, we should interpret $d[X]_t = (dX_t)^2 = (\mu_t\,dt + \sigma_t\,dB_t)^2 = \sigma_t^2\,dt$. Hence
$$\frac{1}{2}\int_0^t f''(X_s)(dX_s)^2 = \frac{1}{2}\int_0^t f''(X_s)\,\sigma_s^2\,ds .$$
This formal calculation (which is correct) leads us to suggest the following general Itô formula.

Theorem 3.1. Let $X_t$ be the Itô process given by
$$dX_t = \mu_t\,dt + \sigma_t\,dB_t .$$
If $f$ is a $C^2$ function, then
$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dX_s + \frac{1}{2}\int_0^t f''(X_s)\,d[X]_s = f(X_0) + \int_0^t f'(X_s)\mu_s\,ds + \int_0^t f'(X_s)\sigma_s\,dB_s + \frac{1}{2}\int_0^t f''(X_s)\sigma_s^2\,ds .$$

Proof of Theorem 3.1. Without loss of generality, we can assume that both $\mu_t$ and $\sigma_t$ are adapted elementary stochastic processes satisfying the assumptions required of an Itô process. Furthermore, by inserting partition points if needed, we can assume that they are both defined on the same partition
$$0 = t_0 < t_1 < \cdots < t_N = t .$$
Since
$$f(X_t) - f(X_0) = \sum_{\ell=1}^{N} \bigl( f(X_{t_\ell}) - f(X_{t_{\ell-1}}) \bigr) ,$$
we need only prove Itô's formula for $f(X_{t_\ell}) - f(X_{t_{\ell-1}})$. Now, since the coefficients are constant for $t \in [t_{\ell-1}, t_\ell)$, for $[r, s] \subset [t_{\ell-1}, t_\ell]$ we have
$$X_s - X_r = \int_r^s \mu_\tau\,d\tau + \int_r^s \sigma_\tau\,dB_\tau = \mu_{t_{\ell-1}}(s - r) + \sigma_{t_{\ell-1}}(B_s - B_r) .$$
Let $\{s_j^{(n)} : j = 0, \dots, K(n)\}$ be a sequence of partitions of $[t_{\ell-1}, t_\ell]$ such that $|\Gamma^{(n)}| = \sup_j |s_j^{(n)} - s_{j-1}^{(n)}| \to 0$ as $n \to \infty$. Using Taylor's theorem we have
$$f(X_{s_j}) - f(X_{s_{j-1}}) = f'(X_{s_{j-1}})\bigl(X_{s_j} - X_{s_{j-1}}\bigr) + \frac{1}{2} f''(\xi_j)\bigl(X_{s_j} - X_{s_{j-1}}\bigr)^2$$
for some $\xi_j \in (X_{s_{j-1}}, X_{s_j})$. Hence we have
$$f(X_{t_\ell}) - f(X_{t_{\ell-1}}) = \sum_{j=1}^{K(n)} \bigl( f(X_{s_j}) - f(X_{s_{j-1}}) \bigr) = \sum_{j=1}^{K(n)} f'(X_{s_{j-1}})\bigl(X_{s_j} - X_{s_{j-1}}\bigr) + \frac{1}{2}\sum_{j=1}^{K(n)} f''(\xi_j)\bigl(X_{s_j} - X_{s_{j-1}}\bigr)^2 = (I) + \tfrac{1}{2}(II) .$$
Since
$$\bigl(X_{s_j} - X_{s_{j-1}}\bigr)^2 = \mu_{t_{\ell-1}}^2 (s_j - s_{j-1})^2 + 2\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}} (s_j - s_{j-1})\bigl(B_{s_j} - B_{s_{j-1}}\bigr) + \sigma_{t_{\ell-1}}^2 \bigl(B_{s_j} - B_{s_{j-1}}\bigr)^2 ,$$
we have
$$(I) = \mu_{t_{\ell-1}} \sum_{j=1}^{K(n)} f'(X_{s_{j-1}})(s_j - s_{j-1}) + \sigma_{t_{\ell-1}} \sum_{j=1}^{K(n)} f'(X_{s_{j-1}})\bigl(B_{s_j} - B_{s_{j-1}}\bigr) = (Ia) + (Ib)$$
and
$$(II) = \mu_{t_{\ell-1}}^2 \sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})^2 + 2\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}} \sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})\bigl(B_{s_j} - B_{s_{j-1}}\bigr) + \sigma_{t_{\ell-1}}^2 \sum_{j=1}^{K(n)} f''(\xi_j)\bigl(B_{s_j} - B_{s_{j-1}}\bigr)^2 = (IIa) + (IIb) + (IIc) .$$
As $n \to \infty$, it is clear that
$$(Ia) \longrightarrow \int_{t_{\ell-1}}^{t_\ell} f'(X_s)\mu_s\,ds \quad \text{and} \quad (Ib) \longrightarrow \int_{t_{\ell-1}}^{t_\ell} f'(X_s)\sigma_s\,dB_s .$$
Using the same arguments as in Theorem 1.1, we see that
$$(IIc) \longrightarrow \sigma_{t_{\ell-1}}^2 \int_{t_{\ell-1}}^{t_\ell} f''(X_s)\,ds = \int_{t_{\ell-1}}^{t_\ell} \sigma_s^2 f''(X_s)\,ds .$$
All that remains is to show that $(IIa)$ and $(IIb)$ converge to zero as $n \to \infty$. Observe that
$$|(IIa)| \le \mu_{t_{\ell-1}}^2 |\Gamma^{(n)}| \Bigl|\sum_{j=1}^{K(n)} f''(\xi_j)(s_j - s_{j-1})\Bigr| \quad \text{and} \quad |(IIb)| \le 2\bigl|\mu_{t_{\ell-1}}\sigma_{t_{\ell-1}}\bigr| \, |\Gamma^{(n)}| \Bigl|\sum_{j=1}^{K(n)} f''(\xi_j)\bigl(B_{s_j} - B_{s_{j-1}}\bigr)\Bigr| .$$
Since the two sums converge to $\int f''(X_s)\,ds$ and $\int f''(X_s)\,dB_s$ respectively, the fact that $|\Gamma^{(n)}| \to 0$ implies that $(IIa)$ and $(IIb)$ converge to zero as $n \to \infty$. Putting all of these results together produces the quoted result. $\square$

Remark 3.2. Notice that the "multiplication table" given in Remark 2.9 is reflected in the details of the proof of Theorem 3.1. Each of the terms in $(dX_t)^2$ corresponds to one of the terms labeled $(II)$, which came from $(X(s_j) - X(s_{j-1}))^2$ in the Taylor expansion. The term $(IIa)$, which corresponds to $(dt)^2$, limits to zero, as the multiplication table indicates. The term $(IIb)$, which corresponds to $(dt)(dB_t)$, also tends to zero, again as the table indicates. Lastly, $(IIc)$, which corresponds to $(dB_t)^2$, limits to an integral against $dt$, as indicated in the table.

Example 3.3. Consider the stochastic process with differential
$$dX_t = \frac{1}{2} X_t\,dt + X_t\,dB_t .$$
The above process is an example of a geometric Brownian motion, a process widely used in finance to model the price of a stock. We apply the Itô formula to the function $f(x) := \log x$. Using that $\partial_x f(x) = x^{-1}$ and $\partial^2_{xx} f(x) = -x^{-2}$, we obtain
$$d\log X_t = \frac{1}{X_t}\,dX_t - \frac{1}{2}\frac{1}{X_t^2}\,d[X]_t = \frac{1}{X_t}\Bigl(\frac{1}{2} X_t\,dt + X_t\,dB_t\Bigr) - \frac{1}{2}\frac{1}{X_t^2}\,X_t^2\,dt = dB_t .$$
In integral form the above can be written as $\log X_t = \log X_0 + B_t$, and therefore $X_t = X_0 e^{B_t}$.
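The closed form of Example 3.3 can be compared with a direct discretization of the sde along one realization of the driving noise. In the sketch below, the Euler scheme, the step count, and the seed are illustrative choices, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, x0 = 100_000, 1.0, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)

# Euler-Maruyama discretization of dX = (1/2) X dt + X dB from Example 3.3
x = x0
for db in dB:
    x += 0.5 * x * dt + x * db

exact = x0 * np.exp(np.sum(dB))  # closed form X_t = X_0 exp(B_t)
print(x, exact)                  # agree up to discretization error
```

Note that the naive guess $X_t = X_0 e^{B_t + t/2}$ (ignoring the Itô correction) would be visibly wrong here; the drift $\frac12 X_t\,dt$ is exactly absorbed by the correction term.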

4. Full Multidimensional Version of Itô Formula

We now give the full multidimensional version of Itô's formula. We will include the possibility that the function depends on time and that there is more than one Brownian motion. We begin with the definition of a multidimensional Itô process.

Definition 4.1. A stochastic process $X_t = (X_1(t), \dots, X_d(t)) \in \mathbb{R}^d$ is an Itô process if each of the coordinate processes $X_i(t)$ is a one-dimensional Itô process.

Let $\mu_i(t)$ and $\sigma_{ij}(t)$ be adapted stochastic processes and let $\{B_j(t)\}_{j=1}^m$ be $m$ mutually independent standard Brownian motions such that for $i = 1, \dots, d$ we can write each component of the $d$-dimensional Itô process as
$$dX_i(t) = \mu_i(t)\,dt + \sum_{j=1}^{m} \sigma_{ij}(t)\,dB_j(t) . \qquad (5.23)$$
If we collect the Brownian motions into one $m$-dimensional Brownian motion $B_t = (B_1(t), \dots, B_m(t))$ and define the $\mathbb{R}^d$-valued process $\mu_t = (\mu_1(t), \dots, \mu_d(t))$ and the matrix-valued process $\sigma_t$ whose matrix elements are the $\sigma_{ij}(t)$, then we can write
$$dX_t = \mu_t\,dt + \sigma_t\,dB_t . \qquad (5.24)$$
While this is nice and compact, it is perhaps more suggestive to define the $\mathbb{R}^d$-valued processes $\sigma^{(j)} = (\sigma_{1j}, \dots, \sigma_{dj})$ for $j = 1, \dots, m$ and write
$$dX_t = \mu_t\,dt + \sum_{j=1}^{m} \sigma_t^{(j)}\,dB_j(t) . \qquad (5.25)$$
This emphasizes that the process $X_t$ at each moment of time is pushed in the direction in which $\mu_t$ points and is given $m$ random kicks, in the directions the $\sigma_t^{(j)}$ point, whose magnitude and sign are dictated by the Brownian motions $B_j(t)$.

We now want to derive the process which describes the evolution of $F(X_t)$ where $F: \mathbb{R}^d \to \mathbb{R}$; in other words, the multidimensional Itô formula. We begin by developing some intuition. Recall Lemma 2.5, stating that the cross-quadratic variation of independent Brownian motions is zero. Hence, if $B_t$ and $W_t$ are independent standard Brownian motions, the multiplication table for the differentials $dt$, $dB_t$, and $dW_t$ is given in the following table.

$\times$ | $dt$ | $dB_t$ | $dW_t$
$dt$   | $0$  | $0$    | $0$
$dB_t$ | $0$  | $dt$   | $0$
$dW_t$ | $0$  | $0$    | $dt$

Table 1. Formal multiplication rules for differentials of two independent Brownian motions
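Each entry of Table 1 corresponds to a sum of products of increments that can be simulated directly. A short check (the step count and the seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 100_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)  # increments of B
dW = rng.normal(0.0, np.sqrt(dt), n)  # increments of an independent W

print(np.sum(dB * dB))  # (dB)(dB) -> dt : sum is close to t = 1
print(np.sum(dB * dW))  # (dB)(dW) -> 0  : sum is close to 0
print(np.sum(dt * dB))  # (dt)(dB) -> 0  : sum is close to 0
print(n * dt * dt)      # (dt)(dt) -> 0  : sum equals t*dt, negligible
```

Only the diagonal Brownian entries survive in the limit of a fine mesh; all the other sums shrink with the mesh size.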

Theorem 4.2. Let $F: \mathbb{R}^d \to \mathbb{R}$ be a function such that $F(x)$ is $C^2$ in $x \in \mathbb{R}^d$. If $X_t$ is as above, then
$$dF(X_t) = \sum_{i=1}^{d} \frac{\partial F}{\partial x_i}(X_t)\,dX_i(t) + \frac{1}{2}\sum_{i=1}^{d}\sum_{k=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t)\,d[X_i, X_k]_t . \qquad (5.26)$$
Furthermore, one has
$$\sum_{i=1}^{d}\sum_{k=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t)\,d[X_i, X_k](t) = \sum_{i=1}^{d}\sum_{k=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t)\,a_{ik}(t)\,dt \qquad (5.27)$$
where
$$a_{ik}(t) = \sum_{j=1}^{m} \sigma_{ij}(t)\,\sigma_{kj}(t) .$$
The matrix $a(t)$ can be written compactly as $\sigma(t)\sigma(t)^T$, and is often called the diffusion matrix.

We will only sketch the proof of this version of the Itô formula, since it follows the same logic as the others already proven. Full proofs can be found in many places, including [13, 7, 3].

Sketch of proof. Similarly to the proof of Theorem 3.1, we introduce a family of partitions $\Gamma^N = \{s_\ell\}$ of the interval $[0,t]$ as in (5.11), with $\lim_{N\to\infty} |\Gamma^N| = 0$, and Taylor expand the function $F$ over each partition interval:
$$F(X_t) - F(X_0) = \sum_{\ell=1}^{N} \Bigl\{ \sum_{i=1}^{d} \frac{\partial F}{\partial x_i}(X_{s_{\ell-1}})\bigl(X_i(s_\ell) - X_i(s_{\ell-1})\bigr) + \frac{1}{2}\sum_{i,j=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_j}(\xi_\ell)\bigl(X_i(s_\ell) - X_i(s_{\ell-1})\bigr)\bigl(X_j(s_\ell) - X_j(s_{\ell-1})\bigr) \Bigr\} = \sum_{\ell=1}^{N} \bigl\{ (I)_\ell + \tfrac{1}{2}(II)_\ell \bigr\}$$
for some $\xi_\ell \in \prod_{i=1}^{d} [X_i(s_{\ell-1}), X_i(s_\ell)]$. For the first-order term, it is straightforward to generalize the proof of Theorem 3.1 to obtain that, in probability,
$$\lim_{N\to\infty} \sum_{\ell=1}^{N} (I)_\ell = \int_0^t \sum_{i=1}^{d} \frac{\partial F}{\partial x_i}(X_s)\,dX_i(s) .$$
We formally recover the expression for the second-order term by combining (5.21) with the rules of Table 1:
$$\lim_{N\to\infty} \sum_{\ell=1}^{N} (II)_\ell = \int_0^t \sum_{i,j=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_j}(X_s)\,(dX_i(s))(dX_j(s)) = \int_0^t \sum_{i,j=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_j}(X_s) \sum_{k,l=1}^{m} \bigl(\sigma_{ik}(s)\,dB_k(s)\bigr)\bigl(\sigma_{jl}(s)\,dB_l(s)\bigr) = \int_0^t \sum_{i,j=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_j}(X_s) \sum_{k=1}^{m} \sigma_{ik}(s)\,\sigma_{jk}(s)\,ds ,$$
where in the second equality we have used that $(dt)(dB_j(t)) = 0$ and in the third that $(dB_k(t))(dB_l(t)) = 0$ for $k \ne l$. Note that one should also check that, when taking the limit, $F(\xi_\ell)$ can be replaced by $F(X_{s_\ell})$; this can be done by reproducing the proof of Lemma 1.2. $\square$

Remark 4.3. The fact that (5.27) holds requires that $B_i$ and $B_j$ be independent for $i \ne j$. However, (5.26) holds even when they are not independent.

Theorem 4.2 is for a scalar-valued function $F$. However, by applying the result to each coordinate function $F_k: \mathbb{R}^d \to \mathbb{R}$ of a function $F: \mathbb{R}^d \to \mathbb{R}^p$ with $F = (F_1, \dots, F_p)$, we obtain the full multidimensional version. Instead of writing it again in coordinates, we take the opportunity to write a version aligned with the perspective and notation of (5.25). Recall that the directional derivative of $F$ in the direction $\nu \in \mathbb{R}^d$ at the point $x \in \mathbb{R}^d$ is
$$DF(x)[\nu] = \lim_{\varepsilon\to 0} \frac{F(x + \varepsilon\nu) - F(x)}{\varepsilon} = \sum_{k=1}^{p} \sum_{i=1}^{d} \frac{\partial F_k}{\partial x_i}(x)\,\nu_i\,e_k ,$$
where $e_k$ is the $k$-th unit vector of $\mathbb{R}^p$. Similarly, the second directional derivative at the point $x \in \mathbb{R}^d$ in the directions $\nu, \eta \in \mathbb{R}^d$ is given by
$$D^2 F(x)[\nu, \eta] = \lim_{\varepsilon\to 0} \frac{DF(x + \varepsilon\eta)[\nu] - DF(x)[\nu]}{\varepsilon} = \sum_{k=1}^{p} \sum_{i=1}^{d}\sum_{j=1}^{d} \frac{\partial^2 F_k}{\partial x_i \partial x_j}(x)\,\nu_i\,\eta_j\,e_k .$$
Then, in this notation, Theorem 4.2 can be rewritten as
$$dF(X(t)) = DF(X(t))[\mu(t)]\,dt + \sum_{i=1}^{m} DF(X(t))[\sigma^{(i)}(t)]\,dB_i(t) + \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} D^2 F(X(t))[\sigma^{(i)}(t), \sigma^{(j)}(t)]\,d[B_i, B_j](t) .$$
If $B_i$ and $B_j$ are assumed to be independent for $i \ne j$, then
$$dF(X(t)) = DF(X(t))[\mu(t)]\,dt + \sum_{i=1}^{m} DF(X(t))[\sigma^{(i)}(t)]\,dB_i(t) + \frac{1}{2}\sum_{i=1}^{m} D^2 F(X(t))[\sigma^{(i)}(t), \sigma^{(i)}(t)]\,dt .$$

We now consider special cases of Theorem 4.2 that will be helpful in practice. The first describes the evolution of a function $F(x,t)$ that depends explicitly on time.

Corollary 4.4. Let $F: \mathbb{R}^d \times [0,\infty) \to \mathbb{R}$ be such that $F(x,t)$ is $C^2$ in $x \in \mathbb{R}^d$ and $C^1$ in $t \in [0,\infty)$. Furthermore, let $X_t$ be a $d$-dimensional Itô process as in (5.23). Then
$$dF(X_t, t) = \frac{\partial F}{\partial t}(X_t, t)\,dt + \sum_{i=1}^{d} \frac{\partial F}{\partial x_i}(X_t, t)\,dX_i(t) + \frac{1}{2}\sum_{i=1}^{d}\sum_{k=1}^{d} \frac{\partial^2 F}{\partial x_i \partial x_k}(X_t, t)\,d[X_i, X_k]_t , \qquad (5.28)$$
where the covariations can be expressed through the diffusion matrix $a$ defined as in (5.27).

Proof. In this proof we will make use of the "multiplication table" at the beginning of this section. Consider the $(d+1)$-dimensional process $\bar{X}_t := (X_1(t), \dots, X_d(t), Y_t)$ with $Y_t$ given by $dY_t = dt + 0\,dB_t$. Then, applying Itô's formula (5.26) and the fact that $Y_t$ is of finite variation (so $d[Z, Y]_t = 0$ for any continuous $Z_t$), we obtain
$$dF(X_t, t) = \sum_{i=1}^{d} \partial_i F(X_t, t)\,dX_i(t) + \partial_t F(X_t, t)\,(dt + 0\,dB_t) + \frac{1}{2}\Bigl( \sum_{i,j=1}^{d} \partial^2_{ij} F(X_t, t)\,d[X_i, X_j]_t + 2\sum_{i=1}^{d} \partial_i \partial_t F(X_t, t)\,d[X_i, Y]_t + \partial^2_t F(X_t, t)\,d[Y, Y]_t \Bigr)$$
$$= \sum_{i=1}^{d} \partial_i F(X_t, t)\,dX_i(t) + \partial_t F(X_t, t)\,dt + \frac{1}{2}\sum_{i,j=1}^{d} \partial^2_{ij} F(X_t, t)\,d[X_i, X_j]_t .$$
Note that the existence of the second derivative in $t$ is not needed in the above formula; this assumption can therefore be dropped. $\square$
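A simple instance of Corollary 4.4 is $F(x,t) = x^2 - t$ applied to $X_t = B_t$: here $\partial_t F\,dt = -dt$ exactly cancels the Itô correction $\frac{1}{2}\partial^2_{xx}F\,d[B]_t = dt$, leaving $d(B_t^2 - t) = 2B_t\,dB_t$, i.e. $B_t^2 - t = 2\int_0^t B_s\,dB_s$. The discretized check below (step count and seed are arbitrary choices) evaluates both sides along one trajectory:

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 100_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.cumsum(dB)
B_left = np.concatenate(([0.0], B[:-1]))  # B at left endpoints (Ito convention)

ito_integral = np.sum(2.0 * B_left * dB)  # 2 * int_0^t B_s dB_s
print(B[-1] ** 2 - t, ito_integral)       # both sides of B_t^2 - t = 2 int B dB
```

Evaluating the integrand at the left endpoint is essential: the right-endpoint sum would converge to $B_t^2 + t$ instead, which is one way to see the Itô correction at work.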

Corollary 4.5. Let $X_t, Y_t$ be two Itô processes. Then
$$d(X_t Y_t) = Y_t\,dX_t + X_t\,dY_t + d[X, Y]_t . \qquad (5.29)$$
This result is known as the stochastic integration by parts formula.

Proof. Let $F: \mathbb{R}^2 \to \mathbb{R}$ with $F(x, y) = x \cdot y$. Since
$$\partial_x F(x,y) = y , \quad \partial_y F(x,y) = x , \quad \partial^2_{xx} F(x,y) = \partial^2_{yy} F(x,y) = 0 , \quad \partial^2_{xy} F(x,y) = 1 ,$$
Itô's formula (5.26) gives
$$d(X_t Y_t) = dF(X_t, Y_t) = Y_t\,dX_t + X_t\,dY_t + d[X, Y]_t . \qquad (5.30) \qquad \square$$

Example 4.6. We compute the stochastic integral
$$\int_0^t s\,dB_s .$$
Applying the integration by parts formula (5.30) with $dX_t = dB_t$ and $dY_t = dt$ (so $Y_t = t$), we obtain
$$d(t B_t) = t\,dB_t + B_t\,dt + d[B, Y]_t .$$
Since $Y_t$ is of finite variation we have $d[B, Y]_t = 0$, and by integrating and rearranging the terms we obtain
$$\int_0^t s\,dB_s = \int_0^t d(s B_s) - \int_0^t B_s\,ds = t B_t - \int_0^t B_s\,ds .$$

Example 4.7. Assume that $f(x,t) \in C^{2,1}(\mathbb{R} \times \mathbb{R}_+)$ satisfies the pde
$$\frac{\partial}{\partial t} f(x,t) + \frac{1}{2}\frac{\partial^2}{\partial x^2} f(x,t) = 0$$
and $\mathbb{E}\bigl[f(B_t, t)^2\bigr] < \infty$. Then we have that
$$df(B_t, t) = \partial_t f(B_t, t)\,dt + \partial_x f(B_t, t)\,dB_t + \frac{1}{2}\partial^2_{xx} f(B_t, t)\,d[B]_t = \Bigl(\partial_t + \frac{1}{2}\partial^2_{xx}\Bigr) f(B_t, t)\,dt + \partial_x f(B_t, t)\,dB_t = \partial_x f(B_t, t)\,dB_t .$$
Therefore $f(B_t, t) = f(0, 0) + \int_0^t \partial_x f(B_s, s)\,dB_s$ is a martingale.

5. Collection of the Formal Rules for Itô's Formula and Quadratic Variation

We now recall some of the formal calculations, bringing them all together in one place. We consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a filtration $\mathcal{F}_t$. We assume that $B_t$ and $\hat{B}_t$ are independent standard Brownian motions adapted to the filtration $\mathcal{F}_t$. For any $\rho \in [0, 1]$, $Z_t = \rho B_t + \sqrt{1 - \rho^2}\,\hat{B}_t$ is again a standard Brownian motion. Furthermore,

$$[Z]_t = [Z, Z]_t = \rho^2 [B, B]_t + 2\rho\sqrt{1-\rho^2}\,[B, \hat{B}]_t + (1 - \rho^2)[\hat{B}, \hat{B}]_t = \rho^2 t + 0 + (1 - \rho^2) t = t .$$

Or, in the formal differential notation, $d[Z]_t = dt$. This result can be understood by using the formal multiplication table for differentials, which formally states:

$$d[Z]_t = (dZ_t)^2 = \bigl(\rho\,dB_t + \sqrt{1-\rho^2}\,d\hat{B}_t\bigr)^2 = \rho^2 (dB_t)^2 + 2\rho\sqrt{1-\rho^2}\,(dB_t)(d\hat{B}_t) + (1-\rho^2)(d\hat{B}_t)^2 = \rho^2\,dt + 0 + (1-\rho^2)\,dt = dt .$$

Similarly, one has

$$d[Z, B]_t = (dZ_t)(dB_t) = \rho\,(dB_t)^2 + \sqrt{1-\rho^2}\,(d\hat{B}_t)(dB_t) = \rho\,dt + 0 = \rho\,dt ,$$
$$d[Z, \hat{B}]_t = (dZ_t)(d\hat{B}_t) = \rho\,(dB_t)(d\hat{B}_t) + \sqrt{1-\rho^2}\,(d\hat{B}_t)^2 = 0 + \sqrt{1-\rho^2}\,dt = \sqrt{1-\rho^2}\,dt .$$

Now let $\sigma_t$ and $g_t$ be stochastic processes adapted to $\mathcal{F}_t$ with
$$\int_0^t \sigma_s^2\,ds < \infty \quad \text{and} \quad \int_0^t g_s^2\,ds < \infty$$
almost surely. Now define

$$dM_t = \sigma_t\,d\hat{B}_t , \qquad dN_t = g_t\,d\hat{B}_t ,$$

$$dU_t = \sigma_t\,dB_t , \qquad dV_t = \sigma_t\,dZ_t .$$

Of course, these are just formal expressions; for example, $dM_t = \sigma_t\,d\hat{B}_t$ means $M_t = M_0 + \int_0^t \sigma_s\,d\hat{B}_s$. Using the multiplication table from before, we have
$$d[M]_t = (dM_t)^2 = \sigma_t^2 (d\hat{B}_t)^2 = \sigma_t^2\,dt , \qquad d[U]_t = (dU_t)^2 = \sigma_t^2 (dB_t)^2 = \sigma_t^2\,dt ,$$
$$d[N]_t = (dN_t)^2 = g_t^2 (d\hat{B}_t)^2 = g_t^2\,dt , \qquad d[V]_t = (dV_t)^2 = \sigma_t^2 (dZ_t)^2 = \sigma_t^2\,dt ,$$
and the cross-quadratic variations
$$d[M, N]_t = (dM_t)(dN_t) = \sigma_t g_t (d\hat{B}_t)^2 = \sigma_t g_t\,dt ,$$
$$d[M, U]_t = (dM_t)(dU_t) = \sigma_t^2 (d\hat{B}_t)(dB_t) = 0 ,$$
$$d[M, Z]_t = (dM_t)(dZ_t) = \rho\sigma_t\,(d\hat{B}_t)(dB_t) + \sqrt{1-\rho^2}\,\sigma_t\,(d\hat{B}_t)^2 = \sqrt{1-\rho^2}\,\sigma_t\,dt .$$
Next we define

$$dH_t = \mu_t\,dt \qquad \text{and} \qquad dK_t = f_t\,dt$$
and observe that, since $H_t$ and $K_t$ have finite first variation, we have that
$$d[H]_t = (dH_t)^2 = \mu_t^2 (dt)^2 = 0 \qquad \text{and} \qquad d[K]_t = (dK_t)^2 = f_t^2 (dt)^2 = 0 .$$

Furthermore, if $X_t = H_t + M_t$ and $Y_t = K_t + N_t$, then using the previous calculations
$$d[X]_t = d[X, X]_t = d[H + M, H + M]_t = d[H]_t + d[M]_t + 2\,d[H, M]_t = \sigma_t^2\,dt ,$$
$$d[X, Y]_t = d[H + M, K + N]_t = d[H, K + N]_t + d[M, K + N]_t = d[M, N]_t = \sigma_t g_t\,dt ,$$
or, using the formal algebra,
$$d[X]_t = (dX_t)^2 = \mu_t^2 (dt)^2 + 2\mu_t \sigma_t (dt)(d\hat{B}_t) + \sigma_t^2 (d\hat{B}_t)^2 = 0 + 0 + \sigma_t^2\,dt = \sigma_t^2\,dt ,$$
$$d[X, Y]_t = (dX_t)(dY_t) = \mu_t f_t (dt)^2 + \sigma_t f_t (dt)(d\hat{B}_t) + \mu_t g_t (dt)(d\hat{B}_t) + \sigma_t g_t (d\hat{B}_t)^2 = 0 + 0 + 0 + \sigma_t g_t\,dt = \sigma_t g_t\,dt .$$


CHAPTER 6

Stochastic Differential Equations

1. Definitions

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space equipped with a filtration $\{\mathcal{F}_t\}$. Let $B_t = (B_1(t), \dots, B_m(t)) \in \mathbb{R}^m$ be an $m$-dimensional Brownian motion, with $\{B_j(t)\}_{j=1}^m$ a collection of mutually independent Brownian motions such that $B_j(t) \in \mathcal{F}_t$ and, for any $0 \le s < t$, $B_j(t) - B_j(s)$ is independent of $\mathcal{F}_s$. Obviously, these conditions are satisfied by the natural filtration $\{\mathcal{F}_t^B\}$.

Definition 1.1. Let $\mu: \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma_j: \mathbb{R}^d \to \mathbb{R}^d$ for $j = 1, \dots, m$ be fixed functions. An equation of the form
$$dX_t = \mu(X_t)\,dt + \sum_{j=1}^{m} \sigma_j(X_t)\,dB_j(t) , \qquad (6.1)$$
representing the integral equation
$$X_t = x + \int_0^t \mu(X_s)\,ds + \sum_{j=1}^{m} \int_0^t \sigma_j(X_s)\,dB_j(s) , \qquad (6.2)$$
where $X_t$ is an unknown process, is a Stochastic Differential Equation (sde) driven by the Brownian motion $\{B_t\}$. The functions $\mu(x)$ and $\sigma(x)$ are called the drift and diffusion coefficients, respectively.

It is more compact to introduce the $d \times m$ matrix
$$\sigma(x) = \begin{pmatrix} | & & | \\ \sigma_1(x) & \cdots & \sigma_m(x) \\ | & & | \end{pmatrix}$$
and write
$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t . \qquad (6.3)$$
There are different concepts of solution for an sde. The most natural is that of a strong solution.

Definition 1.2. A stochastic process $\{X_t\}$ is a strong solution to the sde (6.1) driven by the Brownian motion $B_t$ with (possibly random) initial condition $X_0$ if the following holds:
i) $\{X_t\}$ is adapted to $\{\mathcal{F}_t\}$;
ii) $\{X_t\}$ is continuous;
iii) $X_t = X_0 + \int_0^t \mu(X_s)\,ds + \sum_{j=1}^{m} \int_0^t \sigma_j(X_s)\,dB_j(s)$ almost surely.

Remark 1.3. Often, the choice of Brownian motion in the above definition is implicit. However, it is important to remember that the strong solution of an sde depends on the chosen Brownian motion driving it. A conceptually useful way to restate strong existence, say for all $t \ge 0$, is that there exists a measurable map $\Phi: (t, B) \mapsto X_t(B)$ from $[0, \infty) \times C([0, \infty), \mathbb{R}^m) \to \mathbb{R}^d$ such that $X_t = \Phi(t, B)$ solves (6.2) and $X_t$ is measurable with respect to the filtration generated by $B_t$.

Definition 1.4. We say that a strong solution to (6.1) (driven by a Brownian motion $B_t$) is strongly unique if, for any two solutions $X_t, Y_t$ of (6.1) with the same initial condition $X_0$, we have that
$$\mathbb{P}\bigl[ X_t = Y_t \text{ for all } t \in [0,T] \bigr] = 1 .$$

Remark 1.5. By definition, the strong solution of an sde is continuous. For this reason, to prove strong uniqueness it is sufficient to prove that two solutions $X_t, Y_t$ satisfy
$$\mathbb{P}[X_t = Y_t] = 1 \quad \text{for all } t \in [0,T] .$$
Indeed, assuming that $X_t$ is a version of $Y_t$, by countable additivity the set $A = \{\omega : X_t(\omega) = Y_t(\omega) \text{ for all } t \in \mathbb{Q}_+\}$ has probability one. By continuity of the sample paths, it follows that $X$ and $Y$ have the same paths for all $\omega \in A$.

2. Examples of SDEs We now consider a few useful examples of sdes that have a strong solution.

2.1. Geometric Brownian Motion. The geometric Brownian motion, or Black–Scholes model in finance, is a stochastic process $X_t$ that solves the sde
$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t , \qquad (6.4)$$
where $\mu, \sigma \in \mathbb{R}$ are constants. This model can be used to describe the evolution of the price $X_t$ of a stock, whose mean increase and whose fluctuations both depend linearly on the stock price $X_t$. The coefficients $\mu, \sigma$ are called the percentage drift and percentage volatility, respectively.

We see immediately that (6.4) has a solution by Itô's formula: letting $f(x) = \log x$ we have that
$$d\log X_t = \frac{1}{X_t}\bigl(\mu X_t\,dt + \sigma X_t\,dB_t\bigr) - \frac{1}{2}\frac{1}{X_t^2}\,\sigma^2 X_t^2\,dt = \Bigl(\mu - \frac{1}{2}\sigma^2\Bigr)\,dt + \sigma\,dB_t .$$
Therefore, by integrating and exponentiating, the solution of the equation reads
$$X_t = X_0 \exp\Bigl[\Bigl(\mu - \frac{1}{2}\sigma^2\Bigr) t + \sigma B_t\Bigr] .$$
The uniqueness of this solution will be proven shortly.
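The explicit solution can be compared with an Euler–Maruyama discretization of (6.4) along one realization of the noise. In the sketch below, the parameter values $\mu = 0.1$, $\sigma = 0.3$, the step count, and the seed are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, x0 = 0.1, 0.3, 1.0  # illustrative parameters
n, t = 100_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)

x = x0
for db in dB:
    x += mu * x * dt + sigma * x * db  # Euler-Maruyama step for (6.4)

exact = x0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * np.sum(dB))
print(x, exact)                        # agree up to discretization error
```

Note the $-\frac12\sigma^2 t$ in the exponent of the exact solution: it is the Itô correction, and omitting it makes the comparison fail visibly.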

2.2. Stochastic Exponential. Let Xt be an Itˆoprocess with differential dXt “ µt dt ` σt dBt. We consider the following sde:

dUt “ Ut dXt (6.5) with initial condition U0 “ u0 P R. Note that often one chooses u0 “ 1. Since the above sde is analogous to the ode df “ f dt whose solution is given by the exponential function fptq “ expptq, the process Ut solving (6.5) is often called the stochastic exponential and one writes Xt “ EpXqt. The following result ensures that this process exists and is unique:

Proposition 2.1. The sde (6.5) has a unique strong solution, given by
$$U_t = \mathcal{E}(X)_t := U_0 \exp\left[X_t - X_0 - \frac{1}{2}[X]_t\right] = U_0 \exp\left[\int_0^t \left(\mu_s - \frac{1}{2}\sigma_s^2\right) ds + \int_0^t \sigma_s\, dB_s\right]. \qquad (6.6)$$
Proof. If $u_0 = 0$, then it is immediate by (6.5) that $U_t \equiv 0$ for all $t \geq 0$, and it is the only solution. Now suppose $u_0 \neq 0$. We start by proving existence. The proposed solution is

clearly adapted, so defining $V_t = X_t - X_0 - \frac{1}{2}[X]_t$ we only have to verify that $e^{V_t}$ satisfies (6.5), i.e., $d(e^{V_t}) = e^{V_t}\, dX_t$. First, using Itô's formula, we get
$$dV_t = dX_t - \frac{1}{2}\, d[X]_t = dX_t - \frac{1}{2}\sigma_t^2\, dt \qquad (6.7)$$
$$d(e^{V_t}) = e^{V_t}\left(dV_t + \frac{1}{2}\, d[V]_t\right). \qquad (6.8)$$
Note that (6.7) implies $[V]_t = [X]_t$, because the $\sigma_t^2\, dt$ part is of finite variation. Therefore $dV_t = dX_t - \frac{1}{2}\, d[V]_t$. Plugging this back into (6.8), we have the desired equality.
We next check that the solution is unique. Suppose we have another solution $\tilde U_t$ that satisfies (6.5). Since $U_t$ is nonzero when $u_0 \neq 0$, we can compute $d(\tilde U_t / U_t)$:

$$d(\tilde U_t / U_t) = \tilde U_t\, d(U_t^{-1}) + \frac{1}{U_t}\, d\tilde U_t + d[\tilde U, U^{-1}]_t = \tilde U_t\left[-\frac{dU_t}{U_t^2} + \frac{d[U]_t}{U_t^3}\right] + \frac{\tilde U_t\, dX_t}{U_t} + d[\tilde U, U^{-1}]_t = \frac{\tilde U_t \sigma_t^2\, dt}{U_t} + d[\tilde U, U^{-1}]_t.$$
We need the stochastic differential of $U^{-1}$:
$$d(U_t^{-1}) = -\frac{dU_t}{U_t^2} + \frac{d[U]_t}{U_t^3} = -\frac{dX_t}{U_t} + \frac{\sigma_t^2\, dt}{U_t}, \qquad d[\tilde U, U^{-1}]_t = -\sigma_t^2\, \tilde U_t U_t^{-1}\, dt.$$

Plugging this back gives $d(\tilde U_t / U_t) = 0$. So the ratio of the two solutions stays constant for all $t \geq 0$. Since both solutions start at $u_0$, we conclude $P[\tilde U_t = U_t,\ \forall t \geq 0] = 1$. $\square$
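The closed form (6.6) can likewise be checked numerically against an Euler scheme for (6.5); the time-dependent coefficients below are an arbitrary illustration, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 4000
dt = T / n
t = dt * np.arange(n)
mu_t = 0.1 * np.cos(t)            # illustrative time-dependent drift
sig_t = 0.2 + 0.1 * np.sin(t)     # illustrative time-dependent volatility
dB = rng.normal(0.0, np.sqrt(dt), size=n)
dX = mu_t * dt + sig_t * dB       # increments of the Ito process X

# Closed form (6.6) with U_0 = 1: exp( int (mu - sig^2/2) ds + int sig dB )
U_closed = np.exp(np.cumsum((mu_t - 0.5 * sig_t**2) * dt + sig_t * dB))

# Euler scheme for dU = U dX, U_0 = 1
U = np.empty(n + 1); U[0] = 1.0
for k in range(n):
    U[k + 1] = U[k] * (1.0 + dX[k])

err = np.max(np.abs(U[1:] - U_closed))
print(err)
```

The two discretizations agree up to terms that vanish with the step size, reflecting that the exponent in (6.6) is exactly the Itô correction of $\log U_t$.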

Similarly to the above definition, we introduce the stochastic logarithm $X_t = \mathcal{L}(U)_t$ of a process $U_t$ with stochastic differential $dU_t = \mu_t\, dt + \sigma_t\, dB_t$ and $U_t \neq 0$ as the solution to the following sde:
$$dX_t = \frac{dU_t}{U_t} \quad \text{and} \quad X_0 = 0. \qquad (6.9)$$
Again we have that the solution to the above sde exists and is unique.
Proposition 2.2. Under the conditions listed above, the sde (6.9) has a unique solution given by
$$\mathcal{L}(U)_t = \log\left(\frac{U_t}{U_0}\right) + \int_0^t \frac{d[U]_s}{2U_s^2}.$$
Furthermore, as suggested by the framework of stochastic calculus, these two operations are inverse to each other:

Proposition 2.3. If $u_0 = 1$ we have $\mathcal{L}(\mathcal{E}(X))_t = X_t$, and if $U_t \neq 0$ then $\mathcal{E}(\mathcal{L}(U))_t = U_t$.

Proof. It is enough to check that $dX_t = d(\mathcal{E}(X)_t)/\mathcal{E}(X)_t$ and $dU_t = U_t\, d\mathcal{L}(U)_t$. $\square$

2.3. Linear SDEs. Let $\{\alpha_t\}, \{\beta_t\}, \{\gamma_t\}, \{\delta_t\}$ be given (i.e., independent of $X_t$) continuous stochastic processes adapted to the filtration $\{\mathcal{F}_t\}$. We consider the family of sdes given by

$$dX_t = (\alpha_t + \beta_t X_t)\, dt + (\gamma_t + \delta_t X_t)\, dB_t. \qquad (6.10)$$
We proceed to solve this family of sdes, which includes as special cases some of the examples treated previously in this course. We do so in two steps:

i) First we consider the case where $\alpha_t = \gamma_t \equiv 0$. In this case we should solve the sde

$$dU_t = \beta_t U_t\, dt + \delta_t U_t\, dB_t,$$

which by Proposition 2.1, choosing $U_0 = 1$ and defining $dY_t = \beta_t\, dt + \delta_t\, dB_t$, has the unique solution
$$U_t = \mathcal{E}(Y)_t = \exp\left[\int_0^t \left(\beta_s - \frac{1}{2}\delta_s^2\right) ds + \int_0^t \delta_s\, dB_s\right].$$
ii) We now proceed to consider the full sde (6.10), and make the ansatz of a separable solution, i.e., assume that $X_t = U_t V_t$ where $dV_t = a_t\, dt + b_t\, dB_t$ for unknown processes $\{a_t\}, \{b_t\}$. Then we compute

$$dX_t = U_t\, dV_t + V_t\, dU_t + d[U, V]_t$$

$$= U_t a_t\, dt + U_t b_t\, dB_t + V_t \beta_t U_t\, dt + V_t \delta_t U_t\, dB_t + b_t \delta_t U_t\, dt$$

$$= (a_t U_t + b_t \delta_t U_t + \beta_t X_t)\, dt + (b_t U_t + \delta_t X_t)\, dB_t.$$
We notice that the above expression coincides with the rhs of (6.10) if

$$a_t = \frac{\alpha_t - \delta_t \gamma_t}{U_t} \quad \text{and} \quad b_t = \frac{\gamma_t}{U_t}.$$

This uniquely defines the process $V_t$ (whose initial condition is fixed to $V_0 = X_0$ by the fact that $U_0 = 1$) as
$$V_t = X_0 + \int_0^t \frac{\alpha_s - \delta_s \gamma_s}{U_s}\, ds + \int_0^t \frac{\gamma_s}{U_s}\, dB_s,$$
which in turn defines the solution to (6.10) as $X_t = U_t \cdot V_t$.

Example 2.4. Letting $a, b \in \mathbb{R}$, consider the following sde on $t \in (0,T)$:
$$dX_t = \frac{b - X_t}{T - t}\, dt + dB_t \quad \text{with } X_0 = a. \qquad (6.11)$$
It is clear that this is a linear sde with
$$\alpha_t = \frac{b}{T - t}, \qquad \beta_t = -\frac{1}{T - t}, \qquad \gamma_t = 1, \qquad \delta_t = 0.$$
Therefore, the solution to (6.11) is given by
$$X_t = a\left(1 - \frac{t}{T}\right) + b\,\frac{t}{T} + (T - t)\int_0^t \frac{1}{T - s}\, dB_s. \qquad (6.12)$$
Since $\int_0^t (T - s)^{-2}\, ds < \infty$ for all $t < T$, the Itô integral in (6.12) is a martingale. Furthermore, as we have proven in Homework 2, it is a gaussian process. Hence, $X_t$ is also a gaussian process, with
$$\mathbb{E}[X_t] = a + (b - a)\,\frac{t}{T}$$
and covariance structure
$$\mathrm{Cov}(X_s, X_t) = (T - t)(T - s)\,\mathrm{Cov}\left(\int_0^{\min(s,t)} \frac{dB_q}{T - q},\ \int_0^{\min(s,t)} \frac{dB_q}{T - q} + \int_{\min(s,t)}^{\max(s,t)} \frac{dB_q}{T - q}\right)$$
$$= (T - \max(s,t))(T - \min(s,t))\,\mathrm{Var}\left(\int_0^{\min(s,t)} \frac{dB_q}{T - q}\right) = \min(s,t) - \frac{st}{T},$$
where in the second equality we have used that the covariance of independent random variables is $0$. The above expression shows that the variance of the process $X_t$ is $0$ at $t = 0$ and $t = T$, and maximized at $t = T/2$, while the expected value of the process $X_t$ lies on the line interpolating between $a$ and $b$. Hence the name Brownian bridge: the process above can be seen as a Brownian motion

with initial condition $B_0 = a$, conditioned on its final value $B_T = b$. Indeed, one can prove (cf. Klebaner, Example 5.11) that
$$\lim_{t \to T}\, (T - t) \int_0^t \frac{1}{T - s}\, dB_s = 0 \quad \text{a.s.}$$

3. Existence and Uniqueness for SDEs

In this section we prove a theorem giving sufficient conditions on the coefficients $\mu(\cdot), \sigma(\cdot)$ for the existence and uniqueness of the solution to the associated sde. We will consider solutions to the equation

$$dX_t = \mu(t, X_t)\, dt + \sigma(t, X_t)\, dB_t. \qquad (6.13)$$
Note that we have allowed explicit dependence on time $t$ of the drift and diffusion coefficients, as this will let us weaken the conditions of the following theorem.

Theorem 3.1. Fix a terminal time $T > 0$. Assume that $\sigma(t,x)$ and $\mu(t,x)$ are globally Lipschitz continuous, i.e., that there is a positive constant $K$ so that for any $t \in [0,T]$ and any $x$ and $y$ we have
$$|\mu(t,x) - \mu(t,y)| + |\sigma(t,x) - \sigma(t,y)| \leq K|x - y|.$$
Then the sde (6.13) with initial condition $X_0 = x$ has a solution
$$X_t = x + \int_0^t \mu(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s,$$
and this solution is strongly unique. Furthermore the solution is in $L^2(\Omega \times [0,T])$, i.e.,
$$\mathbb{E}\int_0^t X_s^2\, ds < \infty.$$
It is clear that in order for the solution $X_t$ to be well defined we need that
$$\int_0^t |\mu(s, X_s)|\, ds < \infty \quad \text{and} \quad \int_0^t \sigma(s, X_s)^2\, ds < \infty \quad \text{a.s.}$$
However, the assumptions of Theorem 3.1, while being easier to check, are stricter than the ones above. It is useful to recall that such assumptions are needed even for odes to have existence and uniqueness: the following examples recall how a non-Lipschitz drift can destroy existence or uniqueness.

Example 3.2 (Existence). The ode
$$\frac{dx}{dt} = x^2 \quad \text{with } x(0) = 1$$
has a drift coefficient $\mu(x) = x^2$ that is not uniformly Lipschitz continuous (although it is locally Lipschitz continuous) because it grows faster than linearly. This ode has the solution $x(t) = \frac{1}{1 - t}$. However, this solution is only well defined for $t \in [0, 1)$ and diverges as $t \to 1$. In other words, the solution to this ode does not exist beyond $t = 1$.

Example 3.3 (Uniqueness). The ode
$$\frac{dx}{dt} = 2\sqrt{|x|}$$
has a drift that is not locally Lipschitz continuous at $x = 0$. The solution of this ode with initial condition $x_0 = 0$ is not unique. Indeed, it is immediate to check that
$$x_{1,t} = 0 \quad \text{and} \quad x_{2,t} = t^2$$
are both solutions to this equation with the given initial condition.

3.1. Proof of Theorem 3.1.

Preparatory results. We start by proving two very useful lemmas.

Lemma 3.4 (Gronwall's inequality). Let $y(t)$ be a nonnegative function such that
$$y(t) \leq A + D \int_0^t y(s)\, ds \qquad (6.14)$$
for nonnegative constants $A, D \in \mathbb{R}$. Then $y(t)$ satisfies $y(t) \leq A \exp(Dt)$.

Proof of Lemma 3.4. By repeatedly iterating (6.14) we obtain
$$y(t) \leq A + \int_0^t D\, y(s)\, ds \leq A + \int_0^t D\left(A + \int_0^s D\, y(q)\, dq\right) ds \leq A + ADt + D^2\int_0^t \int_0^s \left(A + D\int_0^q y(r)\, dr\right) dq\, ds$$
$$\leq A + ADt + A\,\frac{D^2 t^2}{2} + A\,\frac{D^3 t^3}{3!} + D^4 \int_0^t \int_0^s \int_0^q \int_0^r y(\tau)\, d\tau\, dr\, dq\, ds \leq \dots$$
We notice that repeating the above procedure $k$ times we obtain the first $k$ terms of the Taylor expansion of $A \exp(Dt)$, plus a remainder term given by an integral iterated $k+1$ times. For finite $T$ we can bound this integral by defining the constants
$$C := \int_0^T y(s)\, ds < \infty \quad \text{and} \quad G := A + DC,$$
so that $y(t) \leq G$. Consequently, we can bound the remainder term by $G (Dt)^{k+1}/(k+1)!$, which vanishes in the limit $k \to \infty$, uniformly in $t \in [0,T]$. Alternative proofs, assuming the existence and uniqueness of solutions to odes, can be found in any good ode or dynamics book, for instance [6] or [5]. $\square$

Lemma 3.5. Let $\{y_n(t)\}$ be a sequence of functions satisfying

• $y_0(t) \leq C$,
• $y_{n+1}(t) \leq D \int_0^t y_n(s)\, ds < \infty$ for all $n \geq 0$, $t \in [0,T]$,
for positive constants $C, D \in \mathbb{R}$. Then $y_n(t) \leq C D^n t^n / n!$.

Proof. The proof of this result goes by induction: the base case is trivial, while for the induction step we have
$$y_{n+1}(t) \leq D \int_0^t y_n(s)\, ds \leq D \int_0^t C\,\frac{D^n s^n}{n!}\, ds = C\,\frac{D^{n+1} t^{n+1}}{(n+1)!}. \qquad \square$$

Uniqueness for sde. If $X_1(t)$ and $X_2(t)$ are two solutions, then taking their difference produces
$$X_1(t) - X_2(t) = \int_0^t \left[\mu(s, X_1(s)) - \mu(s, X_2(s))\right] ds + \int_0^t \left[\sigma(s, X_1(s)) - \sigma(s, X_2(s))\right] dB_s.$$
We now use the fact that $\max\{(a-b)^2, (a+b)^2\} \leq (a-b)^2 + (a+b)^2 = 2a^2 + 2b^2$, which gives
$$|X_1(t) - X_2(t)|^2 \leq 2\left|\int_0^t \left[\mu(s, X_1(s)) - \mu(s, X_2(s))\right] ds\right|^2 + 2\left|\int_0^t \left[\sigma(s, X_1(s)) - \sigma(s, X_2(s))\right] dB_s\right|^2 =: (I) + (II).$$
Next recall that Hölder's (or the Cauchy-Schwarz) inequality implies that $\left(\int_0^t f\, ds\right)^2 \leq t \int_0^t f^2\, ds$ (apply Hölder to the product $1 \cdot f$ with $p = q = 2$). Hence
$$\mathbb{E}(I) \leq 2t\,\mathbb{E}\int_0^t \left[\mu(s, X_1(s)) - \mu(s, X_2(s))\right]^2 ds \leq 2tK^2\,\mathbb{E}\int_0^t |X_1(s) - X_2(s)|^2\, ds.$$
Applying Itô's isometry to the second term gives
$$\mathbb{E}(II) = 2\,\mathbb{E}\int_0^t |\sigma(s, X_1(s)) - \sigma(s, X_2(s))|^2\, ds \leq 2K^2\,\mathbb{E}\int_0^t |X_1(s) - X_2(s)|^2\, ds.$$
Putting this all together and recalling that $t \in [0,T]$ gives
$$\mathbb{E}\,|X_1(t) - X_2(t)|^2 \leq 2K^2(T + 1) \int_0^t \mathbb{E}\,|X_1(s) - X_2(s)|^2\, ds.$$
Hence by Gronwall's inequality (Lemma 3.4) we conclude that $\mathbb{E}\,|X_1(t) - X_2(t)|^2 = 0$ for all $t \in [0,T]$. Hence $X_1(t)$ and $X_2(t)$ are identical almost surely.

Existence for sde. The existence of solutions is proved by a variant of Picard's iteration. Fixing an initial value $x$, we define a sequence of processes $X_n(t)$ as follows; by induction, the processes have continuous paths and are adapted.

$$X_0(t) = x$$
$$X_1(t) = x + \int_0^t \mu(s, x)\, ds + \int_0^t \sigma(s, x)\, dB_s$$
$$\vdots$$
$$X_{n+1}(t) = x + \int_0^t \mu(s, X_n(s))\, ds + \int_0^t \sigma(s, X_n(s))\, dB_s$$
Fix $t \geq 0$; we will show that $X_n(t)$ converges in $L^2$, so that there is a random variable $X(t) \in L^2(\Omega, \mathcal{F}, P)$ with $X_n(t) \xrightarrow{L^2} X(t)$. Let $y_n(t) = \mathbb{E}\left[(X_{n+1}(t) - X_n(t))^2\right]$; we will verify the two conditions in Lemma 3.5. First, for $n = 0$ and any $t \in [0,T]$,
$$y_0(t) = \mathbb{E}\left[(X_1(t) - X_0(t))^2\right] \leq 2\,\mathbb{E}\left[\left(\int_0^t \mu(s,x)\, ds\right)^2\right] + 2\,\mathbb{E}\left[\left(\int_0^t \sigma(s,x)\, dB_s\right)^2\right]$$
$$\leq 2\,\mathbb{E}\left[\left(\int_0^t K(1 + |x|)\, ds\right)^2\right] + 2\,\mathbb{E}\left[\left(\int_0^t K(1 + |x|)\, dB_s\right)^2\right] \leq C < \infty,$$
where the second inequality uses the fact that the coefficients grow no faster than linearly. Second, a computation similar to the one for uniqueness yields
$$y_{n+1}(t) \leq 2K^2(1 + T) \int_0^t y_n(s)\, ds \qquad \forall t \in [0,T],\ n = 0, 1, 2, \dots,$$
which is finite by induction. Lemma 3.5 implies
$$y_n(t) = \mathbb{E}\left[(X_{n+1}(t) - X_n(t))^2\right] \leq C\,\frac{\left(2K^2(1 + T)\right)^n t^n}{n!},$$
which goes to zero uniformly for $t \in [0,T]$. We thus conclude that $X_n(t)$ converges in $L^2$ uniformly, with $L^2$-limit $X(t) \in L^2(\Omega, \mathcal{F}, P)$.
It remains to show that the limit process $X(t)$ solves (6.13). Since $X_n \xrightarrow{L^2} X$, we have
$$\mathbb{E}\left[\mu(t, X_n(t)) - \mu(t, X(t))\right]^2 + \mathbb{E}\left[\sigma(t, X_n(t)) - \sigma(t, X(t))\right]^2 \leq 2K^2\,\mathbb{E}\left[(X_n(t) - X(t))^2\right] \to 0, \quad \text{uniformly in } t.$$
By Itô's isometry and Fubini:

$$\mathbb{E}\left[\left(\int_0^t \sigma(s, X_n(s))\, dB_s - \int_0^t \sigma(s, X(s))\, dB_s\right)^2\right] = \mathbb{E}\left[\left(\int_0^t \left(\sigma(s, X_n(s)) - \sigma(s, X(s))\right) dB_s\right)^2\right]$$
$$= \mathbb{E}\int_0^t \left(\sigma(s, X_n(s)) - \sigma(s, X(s))\right)^2 ds \xrightarrow{n \to \infty} 0.$$
Similarly, by the Cauchy-Schwarz inequality we have that:

$$\mathbb{E}\left[\left(\int_0^t \mu(s, X_n(s))\, ds - \int_0^t \mu(s, X(s))\, ds\right)^2\right] = \mathbb{E}\left[\left(\int_0^t \left(\mu(s, X_n(s)) - \mu(s, X(s))\right) ds\right)^2\right]$$
$$\leq t\,\mathbb{E}\int_0^t \left(\mu(s, X_n(s)) - \mu(s, X(s))\right)^2 ds \xrightarrow{n \to \infty} 0.$$
We thus have
$$X(t) = x + \int_0^t \mu(s, X(s))\, ds + \int_0^t \sigma(s, X(s))\, dB_s,$$
i.e., $X(t)$ solves (6.13).

Remark 3.6. Looking through the proof of Theorem 3.1, we see that the assumption of global Lipschitz continuity can be weakened to the following:
i) $|\mu(t,x)| + |\sigma(t,x)| \leq C(1 + |x|)$ (used for existence),
ii) $|\mu(t,x) - \mu(t,y)| + |\sigma(t,x) - \sigma(t,y)| \leq C|x - y|$ (used for uniqueness).
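The Picard iteration used in the existence proof is easy to mimic numerically: on a single fixed Brownian path, iterate the map $X \mapsto x + \int_0^\cdot \mu(X_s)\,ds + \int_0^\cdot \sigma(X_s)\,dB_s$ (left-point sums) and watch successive iterates collapse onto each other. The globally Lipschitz coefficients $\sin$ and $\cos$ are an illustrative choice, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 0.5, 1000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
x0 = 0.5
mu = np.sin      # globally Lipschitz drift (illustrative)
sigma = np.cos   # globally Lipschitz diffusion (illustrative)

def picard_step(X):
    # X_{n+1}(t) = x0 + int mu(X_n) ds + int sigma(X_n) dB  (left-point sums)
    Y = np.empty(n + 1); Y[0] = x0
    incr = mu(X[:-1]) * dt + sigma(X[:-1]) * dB
    Y[1:] = x0 + np.cumsum(incr)
    return Y

X = np.full(n + 1, x0)           # X_0(t) = x0
gaps = []
for _ in range(10):
    X_new = picard_step(X)
    gaps.append(np.max(np.abs(X_new - X)))   # sup-norm gap between iterates
    X = X_new
print(gaps)
```

The gaps shrink rapidly, mirroring the factorial decay $C D^n t^n / n!$ of $y_n(t)$ in the proof.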

4. Weak solutions to SDEs

Until now we have studied strong solutions to sdes, i.e., solutions for which a Brownian motion (and a probability space) is given in advance, and which are constructed from that Brownian motion. If we are only given functions $\mu(x)$ and $\sigma(x)$, without fixing a Brownian motion, we may still be able to construct a weak solution to an sde of the form (6.1). For such solutions we are allowed to choose a convenient Brownian motion (and consequently a probability space!) for the solution $X_t$ to satisfy the desired sde.

Definition 4.1. A weak solution of the stochastic differential equation (6.1) with initial condition $X_0 \sim \rho_0$, for a given probability distribution $\rho_0$, is a continuous stochastic process $X_t$ defined on some probability space $(\Omega, \mathcal{F}, P)$ such that, for some Brownian motion $B_t$ and some filtration $\{\mathcal{F}_t\}$ with $B_t \in \mathcal{F}_t$ and, for any $0 \leq s < t$, $B_t - B_s$ independent of $\mathcal{F}_s$, the process $X_t$ is adapted and satisfies the stochastic integral equation (6.2).
In other words, in the case of a weak solution we are free to choose some convenient Brownian motion that allows $X_t$ to be a solution. In this sense, these solutions are also distributional solutions, i.e., solutions that have the "right" marginals. Because two solutions $X_t, Y_t$ may live on different probability spaces, we cannot compare their paths as in the case of strong solutions. Instead, we weaken the concept of strong uniqueness to that of weak uniqueness, i.e., uniqueness in law of the solution process:

Definition 4.2. The weak solution of an sde is said to be weakly unique if any two solutions $X_t, Y_t$ have the same law, i.e., for all $\{t_i \in [0,T]\}$ and $\{A_i \in \mathcal{B}\}$ we have

$$P\left[\bigcap_i \{X_{t_i} \in A_i\}\right] = P\left[\bigcap_i \{Y_{t_i} \in A_i\}\right].$$
Example 4.3. Consider the sde
$$dY_t = dB_t \quad \text{with initial condition } Y_0 = 0.$$
This sde clearly has a strong solution, namely $Y_t = B_t$. If we let $W_t$ be another Brownian motion (possibly defined on another probability space), then $W_t$ will not be, in general, a strong solution to the above sde (in the case that the two probability spaces are different, the two solutions cannot even be compared). It will, however, be a weak solution to the sde, as being a Brownian motion completely determines the marginals of the process. We will now consider an example for which there exists a weak solution, but not a strong solution:

Example 4.4 (Tanaka's sde). For certain $\mu$ and $\sigma$, solutions to (6.1) may exist for some Brownian motions and some admissible filtrations but not for others. Consider the sde

$$dX_t = \mathrm{sign}(X_t)\, dB_t, \quad X_0 = 0, \qquad (6.15)$$
where $\sigma(t,x) = \mathrm{sign}(x)$ is the sign function
$$\mathrm{sign}(x) = \begin{cases} +1, & \text{if } x \geq 0, \\ -1, & \text{if } x < 0. \end{cases}$$
The function $\sigma(x)$ is not continuous and thus not Lipschitz. A strong solution does not exist for this sde with the filtration $\mathbb{F} = (\mathcal{F}_t)$ chosen to be $\mathcal{F}_t := \sigma(B_s, 0 \leq s \leq t)$. Suppose $X_t$ is a strong solution to Tanaka's sde; then we must have

$$\tilde{\mathcal{F}}_t := \sigma(X_s, 0 \leq s \leq t) \subseteq \mathcal{F}_t. \qquad (6.16)$$
Notice that for any $T \geq 0$ we have $\int_0^T \mathbb{E}\left[\mathrm{sign}(X_s)^2\right] ds < \infty$, so the Itô integral $\int_0^t \mathrm{sign}(X_s)\, dB_s$ is well defined and $X_t$ is a martingale. Moreover, the quadratic variation of $X_t$ is
$$[X]_t = \int_0^t \left[\mathrm{sign}(X_s)\right]^2 ds = \int_0^t 1\, ds = t,$$
thus $X_t$ must be a Brownian motion (by Lévy's characterization, to be proved later). We may denote $X_t = \tilde B_t$ to emphasize that it is a Brownian motion. Now, multiplying both sides of (6.15) by $\mathrm{sign}(X_t)$, we obtain
$$dB_t = \mathrm{sign}(\tilde B_t)\, d\tilde B_t, \qquad (6.17)$$

and thus $B_t = \int_0^t \mathrm{sign}(\tilde B_s)\, d\tilde B_s$. By Tanaka's formula (to be shown later), we then have
$$B_t = |\tilde B_t| - \tilde L_t,$$
where $\tilde L_t$ is the local time of $\tilde B_t$ at zero. It follows that $B_t$ is $\sigma(|\tilde B_s|, 0 \leq s \leq t)$-measurable. This leads to a contradiction with (6.16), because it would imply that

$$\mathcal{F}_t \subseteq \sigma(|\tilde B_s|, 0 \leq s \leq t) \subsetneq \sigma(\tilde B_s, 0 \leq s \leq t) = \tilde{\mathcal{F}}_t.$$

Still, as we have seen above, choosing $X_t = \tilde B_t$ there exists a Brownian motion $B_t$ such that Tanaka's sde holds. Such a pair of Brownian motions forms a weak solution to Tanaka's equation.

5. Markov property of Itô diffusions

The solutions (weak or strong) to stochastic differential equations are referred to as diffusion processes (or Itô diffusions).

Definition 5.1. An Itô process $dX_t = \mu_t\, dt + \sigma_t\, dB_t$ is an Itô diffusion if $\mu_t, \sigma_t$ are measurable wrt the filtration $\{\mathcal{F}_t^X\}$ generated by $X_t$ for all $t \in [0,T]$, i.e., $\mu_t, \sigma_t \in \mathcal{F}_t^X$.

Remark 5.2. It is clear that the solution $\{X_t\}$ to the sde $dX_t = \mu(t, X_t)\, dt + \sigma(t, X_t)\, dB_t$ for continuous functions $\mu, \sigma$ is an Itô diffusion. For this reason, such sdes are said to be of diffusion type.
Recall (Def. 7.12) that a Markov process is a process whose future depends on its past only through its present value; if this property holds also for stopping times, the process is said to have the strong Markov property (cf. Def. 7.16).

Theorem 5.3. The solution $\{X_t\}$ to the sde

$$dX_t = \mu(t, X_t)\, dt + \sigma(t, X_t)\, dB_t \qquad (6.18)$$
has the strong Markov property.
While we do not present the proof of this result, which can be found in [13], it should be intuitively clear why solutions to (6.18) have the Markov property. Indeed, the drift and diffusion coefficients of the above sde only depend on the time and on the value of $X_t$ at that time (and not on its past values). This fact, combined with the independence of the increments of Brownian motion, results in the Markov (and the strong Markov) property of such solutions.

CHAPTER 7

PDEs and SDEs: The connection

Throughout this chapter, except when specified otherwise, we let $\{X_t\}$ be a solution to the sde

$$dX_t = \mu(t, X_t)\, dt + \sigma(t, X_t)\, dB_t. \qquad (7.1)$$
As we have seen in the last chapter, solutions to the above equation are referred to as diffusion processes. This name comes from the fact that Brownian motion, the archetypal diffusion process, was invented to model the diffusion of a dust particle in water. Similarly, in the world of partial differential equations, diffusion equations model precisely the same type of phenomenon. In this chapter we will see that the correspondence between these two domains goes well beyond this point.

1. Infinitesimal generators

Having seen in the previous chapter that solutions to sdes possess the strong Markov property, we introduce the following operator to study the evolution of their finite-dimensional distributions:

Definition 1.1. The infinitesimal generator of a continuous-time Markov process $X_t$ is the operator $\mathcal{A}$ such that for any function $f$,

$$\mathcal{A}_t f(x) := \lim_{dt \downarrow 0} \frac{\mathbb{E}\left[f(X_{t+dt})\,|\,X_t = x\right] - f(x)}{dt}, \qquad (7.2)$$
provided the limit exists. The set of functions for which the above limit exists is called the domain $\mathcal{D}(\mathcal{A}_t)$ of the generator.

This operator encodes the infinitesimal change in the probability distribution of the process $X_t$. One way of seeing this is by choosing $f(x) = \mathbb{1}_A(x)$ for a Borel set $A \subseteq \mathbb{R}^d$. We now look at some examples where we find the explicit form of the generator for Itô diffusions:

Example 1.2. The infinitesimal generator for a standard one-dimensional Brownian motion $B_t$ is
$$\mathcal{A} = \frac{1}{2}\frac{d^2}{dx^2}$$
for all $f$ that are $C^2$ with compact support. To derive this, we first apply Itô's formula to any such $f$ and write
$$f(B_t) = f(B_0) + \int_0^t f'(B_s)\, dB_s + \frac{1}{2}\int_0^t f''(B_s)\, ds.$$
Applying this formula at the two time points $t$ and $t + r$, we have
$$f(B_{t+r}) = f(B_t) + \int_t^{t+r} f'(B_s)\, dB_s + \frac{1}{2}\int_t^{t+r} f''(B_s)\, ds.$$
When $f$ has compact support, $f'(x)$ is bounded, say $|f'(x)| \leq K$, and thus for each $t$,
$$\mathbb{E}\int_0^t f'(B_s)^2\, ds \leq \int_0^t K^2\, ds = K^2 t < \infty.$$
Hence the first integral has expectation zero, due to Itô's isometry. It follows that
$$\mathbb{E}\left[f(B_{t+r})\,|\,B_t = x\right] = f(x) + \mathbb{E}\left[\left.\int_t^{t+r} f'(B_s)\, dB_s + \frac{1}{2}\int_t^{t+r} f''(B_s)\, ds\ \right|\ B_t = x\right] = f(x) + \mathbb{E}\left[\left.\frac{1}{2}\int_t^{t+r} f''(B_s)\, ds\ \right|\ B_t = x\right],$$
where the second equality is due to the independence of the post-$t$ process $B_{t+s} - B_t$ from $B_t$. Subtracting $f(x)$, dividing by $r$ and letting $r \to 0$ on both sides, we obtain

$$\mathcal{A}f(x) = \lim_{r \downarrow 0} \frac{\mathbb{E}\left[f(B_{t+r})\,|\,B_t = x\right] - f(x)}{r} = \lim_{r \downarrow 0} \frac{1}{r}\,\mathbb{E}\left[\left.\int_t^{t+r} \frac{1}{2} f''(B_s)\, ds\ \right|\ B_t = x\right]$$
$$= \frac{d}{dr}\left.\int_t^{t+r} \frac{1}{2}\,\mathbb{E}\left[f''(B_s)\,|\,B_t = x\right] ds\,\right|_{r = 0} = \frac{1}{2}\,\mathbb{E}\left[f''(B_s)\,|\,B_t = x\right]\Big|_{s=t} = \frac{1}{2} f''(x).$$
In the above calculation we exchanged the order of the integrals using the Fubini-Tonelli theorem.

Remark 1.3. Here we omit the subscript $t$ in the generator $\mathcal{A}$ because Brownian motion is time-homogeneous, i.e.,

$$\frac{\mathbb{E}\left[f(B_{t+dt})\,|\,B_t = x\right] - f(x)}{dt} = \frac{\mathbb{E}\left[f(B_{s+dt})\,|\,B_s = x\right] - f(x)}{dt},$$
and thus $\mathcal{A}_t f(x) = \mathcal{A}_s f(x)$. The generator $\mathcal{A} = \frac{1}{2}\frac{d^2}{dx^2}$ does not change with time.
The procedure used to obtain the infinitesimal generator of Brownian motion can be straightforwardly generalized to the case of Itô diffusions:

Example 1.4. Assume that $X_t$ satisfies the sde (7.1); then its generator $\mathcal{A}_t$ is
$$\mathcal{A}_t f(x) = \mu(t,x)\frac{d}{dx}f(x) + \frac{\sigma^2(t,x)}{2}\frac{d^2}{dx^2}f(x) \qquad (7.3)$$
for all $f \in C^2$ with compact support. The computation is similar to the Brownian motion case. First

apply Itô's formula to $f(X_t)$ and get
$$f(X_{t+r}) = f(X_t) + \int_t^{t+r} \left\{\mu(s, X_s)\frac{d}{dx}f(X_s) + \frac{\sigma^2(s, X_s)}{2}\frac{d^2}{dx^2}f(X_s)\right\} ds + \int_t^{t+r} \sigma(s, X_s)\frac{d}{dx}f(X_s)\, dB_s.$$
Then, using the fact that $f \in C^2$ with compact support, the last integral has expectation zero. Conditioning on $X_t = x$, computing $\frac{\mathbb{E}[f(X_{t+r})\,|\,X_t = x] - f(x)}{r}$, exchanging integrals by Fubini-Tonelli and taking $r \to 0$, we conclude that the generator has the form (7.3).
The above example can be further generalized to the case when the function $f$ also depends on time:

Example 1.5. Consider the two-dimensional process $(t, X_t)$, where the first coordinate is deterministic and the second coordinate $X_t$ satisfies (7.1). We treat it as a process $Y_t = (t, X_t) \in \mathbb{R}^2$. In this case, the generator of $Y_t$, according to the definition in (7.2), is given by

$$\mathcal{A}_t f(t,x) := \lim_{dt \downarrow 0} \frac{\mathbb{E}\left[f(Y_{t+dt})\,|\,Y_t = (t,x)\right] - f(t,x)}{dt} = \mu(t,x)\frac{\partial}{\partial x}f(t,x) + \frac{\sigma^2(t,x)}{2}\frac{\partial^2}{\partial x^2}f(t,x) + \frac{\partial}{\partial t}f(t,x) \qquad (7.4)$$
for any $f \in C^{1,2}$ that has compact support.
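The defining limit (7.2) can be probed by brute force: take one small Euler step of the diffusion started at $X_t = x$, estimate the conditional expectation by Monte Carlo, and compare the difference quotient with formula (7.3). The coefficients, the test function $\sin$ (smooth and bounded, though not compactly supported), and the numbers below are all illustrative, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative diffusion dX = mu(X) dt + sigma(X) dB (not from the notes)
mu = lambda x: -x
sigma = lambda x: 0.5
f = np.sin                       # smooth test function

x, dt, N = 0.7, 1e-2, 1_000_000

# One Euler step started at X_t = x, then the difference quotient in (7.2)
X_next = x + mu(x) * dt + sigma(x) * np.sqrt(dt) * rng.normal(size=N)
est = (f(X_next).mean() - f(x)) / dt

# Formula (7.3): A f(x) = mu(x) f'(x) + (sigma(x)^2 / 2) f''(x)
exact = mu(x) * np.cos(x) + 0.5 * sigma(x) ** 2 * (-np.sin(x))
print(est, exact)
```

The estimate matches (7.3) up to Monte Carlo noise and an $O(dt)$ bias from the finite difference quotient.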

Formally speaking, if $\mathcal{A}_t$ is the generator of $X_t$, what $\mathcal{A}_t$ does to $f$ is to map it to the "drift coefficient" in the stochastic differential of $f(X_t)$, i.e.,

$$df(X_t) = \mathcal{A}_t f(X_t)\, dt + (\text{something})\, dB_t.$$
Remark 1.6. The notation here is slightly different from [Klebaner], where $L_t$ always denotes the operator acting on functions $f \in C^{1,2}$ by
$$L_t f(t,x) = \mu(t,x)\frac{\partial}{\partial x}f(t,x) + \frac{\sigma^2(t,x)}{2}\frac{\partial^2}{\partial x^2}f(t,x), \qquad (7.5)$$
and such $L_t$ is called the "generator of $X_t$". Comparing this form with (7.3) and (7.4), and since $L_t$ acts on $C^{1,2}$ functions, we can relate $L_t$ to the generator $\mathcal{A}_t$ of $(t, X_t)$:
$$L_t f(t,x) + \frac{\partial}{\partial t}f(t,x) = \mathcal{A}_t f(t,x).$$
When we look at martingales constructed from the generators, $\mathcal{A}_t$ will give a more compact (and maybe more intuitive) notation.

Exercise. Find the generator $\mathcal{A}_t$ of $(X_t, Y_t)$, where $X_t$ and $Y_t$ satisfy

$$dX_t = \mu(t, X_t)\, dt + \sigma(t, X_t)\, dB_t, \qquad dY_t = \alpha(t, Y_t)\, dt + \beta(t, Y_t)\, dB_t.$$

What if $X_t$ and $Y_t$ are driven by two independent Brownian motions?

2. Martingales associated with diffusion processes

Suppose $X_t$ solves (7.1) and $\mathcal{A}_t$ is its generator (see (7.3)). For $f \in C^2$, we know from Itô's formula that
$$f(X_t) = f(X_0) + \int_0^t \mathcal{A}_s f(X_s)\, ds + \int_0^t \sigma(s, X_s) f'(X_s)\, dB_s.$$
Under proper conditions, the third term on the right is a well-defined Itô integral and also a martingale. We can construct martingales by isolating this integral, i.e., let
$$M_t := f(X_t) - f(X_0) - \int_0^t \mathcal{A}_s f(X_s)\, ds\ \left(= \int_0^t \sigma(s, X_s) f'(X_s)\, dB_s\right), \qquad (7.6)$$
and we will see that for certain functions $\mu, \sigma$ and $f$, $M_t$ is a martingale. First of all, if $X_t$ is a solution to (7.1) (either weak or strong), then, by definition, $M_t$ is always $\mathcal{G}_t := \sigma(X_s, 0 \leq s \leq t)$ measurable. So from now on, for the purpose of constructing martingales, we will only say "$X_t$ solves the sde (7.1)" without specifying whether $X_t$ is a strong or a weak solution. Recall that if
$$\int_0^t \mathbb{E}\left[\sigma^2(s, X_s) f'(X_s)^2\right] ds < \infty, \qquad (7.7)$$
then the Itô integral $\int_0^t \sigma(s, X_s) f'(X_s)\, dB_s$ is a martingale. Therefore, the usual technical step is to prove (7.7) in order to conclude that $M_t$ is a martingale.
Theorem 6.2 in [Klebaner] gives a set of conditions for $M_t$ to be a martingale:
Condition 1. Let the following assumptions hold:
i) $\mu(t,x)$ and $\sigma(t,x)$ are locally Lipschitz in $x$ with a Lipschitz constant independent of $t$, and grow at most linearly in $x$; and
ii) $f \in C^2$ and $f'$ is bounded.
Condition (i) implies that (7.1) has a strong solution but, more importantly, it controls the speed of growth of the solution $X_t$ (see Theorem 5.4 and also the proof of Theorem 6.2 in [Klebaner] for more details); (ii) controls the magnitude of $f$, which together with (i) ensures the finiteness of the integral in (7.7). [9, Theorem 6.3] gives an alternative set of conditions to Condition 1; the proof, however, follows the same idea. The above discussion can be summarized in the following theorem.

Theorem 2.1. Let $\{X_t\}$ be a solution to (7.1) and $f$ a function such that Condition 1 holds. Then $M_t$ defined in (7.6) is a martingale.

We now generalize the above result to the case when $f$ is time-dependent. Let $X_t$ solve (7.1). If $\mathcal{A}_t$ is the generator of the two-dimensional process $(t, X_t)$ (see the expression in (7.4)), then for any function $f(t,x) \in C^{1,2}$,
$$M_t := f(t, X_t) - f(0, X_0) - \int_0^t \mathcal{A}_s f(s, X_s)\, ds \qquad (7.8)$$
can be a martingale if $\mu, \sigma$ and $f$ satisfy certain conditions. Again, using Itô's formula, we see that
$$M_t = \int_0^t \sigma(s, X_s)\frac{\partial}{\partial x}f(s, X_s)\, dB_s.$$
The approach to show that $M_t$ is a martingale is the same as above. For example, if Condition 1 (ii) above is modified as follows:
Condition 2. Let the following assumptions hold:
i) $\mu(t,x)$ and $\sigma(t,x)$ are locally Lipschitz in $x$ with a Lipschitz constant independent of $t$, and grow at most linearly in $x$; and
ii') $f \in C^{1,2}$ and $\frac{\partial}{\partial x}f(t,x)$ is bounded for all $t$ and $x$;
then we can conclude that $M_t$ defined in (7.8) is a martingale:

Theorem 2.2. Let $\{X_t\}$ be a solution to (7.1) and $f$ a function such that Condition 2 holds. Then $M_t$ defined in (7.8) is a martingale.

One advantage of using $\mathcal{A}_t$ instead of $L_t$ is that we can express $M_t$ in the same form, that is, $M_t := f(X_t) - f(X_0) - \int_0^t \mathcal{A}_s f(X_s)\, ds$, provided $\mathcal{A}_t$ is chosen to be the generator of $X_t$ (which might be high-dimensional). The following are a few immediate consequences, stated under Condition 2. However, one should keep in mind that there are other conditions under which these claims are also true.

Corollary 2.3 (Dynkin's formula). Suppose that $X_t$ solves (7.1) and that Condition 2 holds. Let $\mathcal{A}_t$ be the generator of $(t, X_t)$ (see (7.4)). Then for any $t \in [0,T]$,
$$\mathbb{E}\left[f(t, X_t)\right] = \mathbb{E}\left[f(0, X_0)\right] + \mathbb{E}\left[\int_0^t \mathcal{A}_s f(s, X_s)\, ds\right].$$
The result is also true if $t$ is replaced by a bounded stopping time $\tau \in [0,T]$.

Corollary 2.4. Assume that $X_t$ solves (7.1) and that Condition 2 holds. If $f$ solves the following pde
$$\left(\mathcal{A}_t f(t,x) =\right)\ \mu(t,x)\frac{\partial}{\partial x}f(t,x) + \frac{\sigma^2(t,x)}{2}\frac{\partial^2}{\partial x^2}f(t,x) + \frac{\partial}{\partial t}f(t,x) \equiv 0,$$
then $f(t, X_t)$ is a martingale.

Example 2.5. Consider $X_t = B_t$; then $\sigma \equiv 1$ and $\mu \equiv 0$, which satisfies Condition 2 (i). Then, for any $f(t,x)$ that satisfies Condition 2 (ii') (or the conditions in Theorem 6.3 of Klebaner) and solves
$$\frac{1}{2}\frac{\partial^2}{\partial x^2}f(t,x) + \frac{\partial}{\partial t}f(t,x) \equiv 0,$$
$f(t, B_t)$ is a martingale. For example, $f(t,x) = x$, $x^2 - t$, $x^3 - 3tx$, $e^{t/2}\sin(x)$, or $e^{x - t/2}$.
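A quick Monte Carlo check of the martingale property: for a martingale, $\mathbb{E}[f(t, B_t)]$ stays at $f(0, B_0)$ for all $t$. The starting point $x_0 = 0.5$ and time $t = 1$ below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
N, t, x0 = 1_000_000, 1.0, 0.5
Bt = x0 + np.sqrt(t) * rng.normal(size=N)    # B_t given B_0 = x0

# f(t,x) = x^3 - 3 t x solves f_t + f_xx / 2 = 0
m1 = np.mean(Bt**3 - 3 * t * Bt)
print(m1, x0**3)         # should both be ~ f(0, x0) = x0^3

# f(t,x) = e^{t/2} sin(x) also solves the pde
m2 = np.exp(t / 2) * np.mean(np.sin(Bt))
print(m2, np.sin(x0))    # should both be ~ f(0, x0) = sin(x0)
```

Both empirical means agree with $f(0, x_0)$ up to Monte Carlo error, as the corollary predicts.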

3. Connection with PDEs

In the previous section we have seen that, under proper conditions, the solution $f$ of certain pdes can be used to construct a martingale. In this section we will see that the solutions of certain pdes may be represented via expectations of the solution of an sde. Throughout this section we will assume that $X_t$ solves the sde (7.1), whose coefficients satisfy Condition 2 (i), and that $\mathcal{A}_t$ is the generator of $(t, X_t)$ as given in (7.4). Furthermore, we assume that $f$ satisfies Condition 2 (ii'). Note that other conditions under which $M_t$ defined in (7.8) is a martingale would also work.

3.1. Kolmogorov Backwards Equation.

Theorem 3.1. Under the standing assumptions, if $f(t,x)$ solves the pde
$$\begin{cases} \mathcal{A}_t f = 0 & \text{for all } t \in (0,T), \\ f(T,x) = g(x), \end{cases} \qquad (7.9)$$
for some function $g$ such that $\mathbb{E}\left[|g(X_T)|\right] < \infty$, then

$$f(t,x) = \mathbb{E}\left[g(X_T)\,|\,X_t = x\right] \quad \text{for all } t \in [0,T].$$

Proof. Under the standing assumptions and the fact that $\mathcal{A}_t f = 0$, we know that $f(t, X_t)$ is a martingale, due to Corollary 2.4. Then for any $t \in [0,T]$,

$$\mathbb{E}\left[f(T, X_T)\,|\,\mathcal{F}_t\right] = f(t, X_t).$$

Using the boundary condition, we have $f(T, X_T) = g(X_T)$. The result then follows from the Markov property of the solution to the diffusion-type sde, i.e.,

$$f(t, X_t) = \mathbb{E}\left[g(X_T)\,|\,\mathcal{F}_t\right] = \mathbb{E}\left[g(X_T)\,|\,X_t\right]. \qquad \square$$
Remark 3.2. Note that the above theorem, assuming that $f(t,x)$ solves the given pde, represents expectation values of the process $X_t$ in terms of such solutions. Under suitable regularity conditions on the coefficients of the sde (7.1) and on the boundary condition $g(x)$, one can show that this expected value is the unique solution to the pde (7.9). These results, however, go beyond the scope of this course and will not be presented here. We refer the interested reader to, e.g., [13].
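The representation can be verified numerically in the simplest case: for Brownian motion ($\mu = 0$, $\sigma = 1$) and $g(x) = x^2$, the backward pde $\partial_t f + \frac{1}{2}\partial^2_{xx} f = 0$ with $f(T,x) = x^2$ is solved by $f(t,x) = x^2 + (T - t)$, which can be compared with a Monte Carlo estimate of $\mathbb{E}[g(B_T)\,|\,B_t = x]$. All numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
T, t, x, N = 2.0, 0.5, 1.3, 1_000_000
g = lambda y: y**2

# Monte Carlo for f(t,x) = E[ g(B_T) | B_t = x ]
BT = x + np.sqrt(T - t) * rng.normal(size=N)
mc = g(BT).mean()

# For g(x) = x^2 the pde solution is f(t,x) = x^2 + (T - t)
exact = x**2 + (T - t)
print(mc, exact)
```

The Monte Carlo value matches the pde solution up to sampling error, illustrating the probabilistic representation of the backward equation.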

Definition 3.3. For an Itô diffusion $\{X_t\}$ solving (7.1), the pde (7.9) is called the Kolmogorov Backwards equation.

The name of the above pde comes from the fact that it has to be solved backwards in time, i.e., the boundary condition in (7.9) is fixed at the end of the time interval of interest. This may at first seem counterintuitive. One possible way to interpret this fact is that, bringing the time derivative to the other side of the equality, we obtain $-\partial_t f = L_t f$, where $L_t$ is the operator defined in (7.5). In this form, the "arrow of time" is given by the fact that $\sigma^2$ is nonnegative, so the second-derivative term "widens the support" of $f$, while the negative sign in front of the time derivative corresponds to an evolution in the "reverse" direction. Another way of understanding why the direction of time is reversed in (7.9) is the following: in order to establish the expectation of a certain function in the future (e.g., the value of an option at expiration), an operator that evolves such a function should project it backwards to the information we have at the moment, i.e., the value of $X_t$.

Example 3.4. Letting $g(x) = \mathbb{1}_A(x)$, we have that being able to solve (7.9) is equivalent to knowing
$$\mathbb{E}\left[\mathbb{1}_A(X_T)\,|\,X_t = x\right] = P\left[X_T \in A\,|\,X_t = x\right].$$

Example 3.5. Letting $X_t$ be the solution to the Black-Scholes model $dX_t = \mu X_t\, dt + \sigma X_t\, dB_t$ and $g(x) = V(x)$ some value function of an option at time $T$, being able to solve (7.9) is equivalent to knowing the expected value of that option at expiration: $\mathbb{E}\left[V(X_T)\,|\,X_t = x\right]$. We now state an extension of Theorem 3.1 which deals with the case where the right side of the pde is nonzero.

Theorem 3.6. Under the standing assumptions, if $f(t,x)$ solves the pde

$$\begin{cases} \mathcal{A}_t f(t,x) = -\phi(x) & \text{for all } t \in (0,T), \\ f(T,x) = g(x), \end{cases}$$
for some bounded function $\phi: \mathbb{R} \to \mathbb{R}$ and $g$ such that $\mathbb{E}\left[|g(X_T)|\right] < \infty$, then
$$f(t,x) = \mathbb{E}\left[\left.g(X_T) + \int_t^T \phi(X_s)\, ds\ \right|\ X_t = x\right] \quad \text{for all } t \in [0,T].$$
Proof. By Theorem 2.2 we have that
$$M_t := f(t, X_t) - f(0, X_0) - \int_0^t \mathcal{A}_s f(s, X_s)\, ds$$
is a martingale. Plugging in $\mathcal{A}_t f(t,x) = -\phi(x)$ and taking the conditional expectation of $M_T$ given $\mathcal{F}_t$, since $M_t = \mathbb{E}\left[M_T\,|\,\mathcal{F}_t\right]$ we get
$$f(t, X_t) - f(0, X_0) - \int_0^t \mathcal{A}_s f(s, X_s)\, ds = \mathbb{E}\left[f(T, X_T)\,|\,\mathcal{F}_t\right] - f(0, X_0) - \mathbb{E}\left[\left.\int_0^T \mathcal{A}_s f(s, X_s)\, ds\ \right|\ \mathcal{F}_t\right],$$
which can be rewritten as
$$f(t, X_t) = \mathbb{E}\left[\left.g(X_T) + \int_t^T \phi(X_s)\, ds\ \right|\ \mathcal{F}_t\right].$$
Finally, by the Markov property of Itô diffusions, we obtain
$$f(t,x) = \mathbb{E}\left[\left.g(X_T) + \int_t^T \phi(X_s)\, ds\ \right|\ X_t = x\right]. \qquad \square$$

3.2. Feynman-Kac formula. Theorem 3.1 can be generalized even further:

Theorem 3.7 (Feynman-Kac Formula). Under the standing assumptions, if $f(t,x)$ solves
$$\begin{cases} \mathcal{A}_t f(t,x) = r(t,x) f(t,x) & \text{for all } t \in [0,T], \\ f(T,x) = g(x), \end{cases} \qquad (7.10)$$
where $r(t,x)$ and $g(x)$ are some bounded functions, then

$$f(t,x) = \mathbb{E}\left[\left.e^{-\int_t^T r(s, X_s)\, ds}\, g(X_T)\ \right|\ X_t = x\right].$$
Furthermore, $f(t,x)$ above is the unique solution to (7.10).

Proof. The proof of uniqueness of the solution goes beyond the scope of this lecture and we do not give it here. As in the previous cases, we want to show that the quantity inside the expectation is a martingale. Therefore, consider
$$M_\tau := e^{-\int_t^\tau r(s, X_s)\, ds}\, f(\tau, X_\tau).$$
Defining $U_\tau = e^{-\int_t^\tau r(s, X_s)\, ds}$ and $Y_\tau = f(\tau, X_\tau)$, we apply Itô's formula to obtain
$$dM_\tau = d(U_\tau Y_\tau) = U_\tau\, dY_\tau + Y_\tau\, dU_\tau + d[U, Y]_\tau.$$

Recall from the chapter on the stochastic exponential that $U_\tau = \mathcal{E}\left(-\int_t^{\cdot} r(s, X_s)\, ds\right)_\tau$, and that therefore $dU_\tau = -r(\tau, X_\tau)\, U_\tau\, d\tau$. Furthermore, we recognize that $U_\tau$ has finite variation, so $d[U, Y]_\tau = 0$. Combining these observations we obtain, by Itô's formula for $f$,
$$dM_\tau = U_\tau\left(\left[\partial_t f(\tau, X_\tau) + \mu(\tau, X_\tau)\,\partial_x f(\tau, X_\tau) + \frac{1}{2}\sigma(\tau, X_\tau)^2\,\partial^2_{xx} f(\tau, X_\tau)\right] d\tau\right.$$

` σpτ,Xτ qBxfpτ, Xτ q dBτ ´ rpτ, Xτ qfpτ, Xτ qdτ ¸ “ Uτ ppAτ ´ rpτ, Xτ qqfpτ, Xτ q ` σpτ,Xτ qBxfpτ, Xτ q dBτ q . We immediately realize that the drift term in the above formula vanishes by assumption, and that the Itˆointegral term is a martingale by the standing assumptions on f, σ and µ. Consequently, the expected vylue of the martingale is constant and we have that

$$f(t,x)\,e^{0} = M_t = \mathbb{E}[M_T \mid \mathcal{F}_t] = \mathbb{E}\Big[ e^{-\int_t^T r(s,X_s)\,ds}\, g(X_T) \;\Big|\; X_t = x \Big]. \qquad \square$$

Example 3.8 (Example 3.5 continued). Let us consider the Black–Scholes model, i.e., $dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$ for $\sigma, \mu \in \mathbb{R}$. We consider the case where one can cash in the option and obtain a risk-free interest that satisfies the ode

$$dX_t = r X_t\,dt,$$
for a positive constant $r \in \mathbb{R}$. Then one needs to factor such possible risk-free earnings into the value $V(t, X_t)$ of the asset $X_t$ (the underlying), i.e., compare the expected value at future time $T$, $V^*(X_T) = V(X_T)$, with the projected risk-free value today:
$$e^{r(T-t)}\, V^*(t, X_t) = \mathbb{E}[V(X_T) \mid X_t = x]$$
or, in other words,
$$V^*(t, X_t) = \mathbb{E}\big[ e^{-r(T-t)} V(X_T) \mid X_t = x \big].$$

The above is an example of the expected value in Theorem 3.7, and therefore obeys the pde
$$\begin{cases} \partial_t V^*(t,x) + \mu x\,\partial_x V^*(t,x) + \tfrac12 \sigma^2 x^2\,\partial_{xx}^2 V^*(t,x) - r\,V^*(t,x) = 0 & \text{for all } t \in [0,T] \\ V^*(T,x) = V(x), \end{cases}$$
which is called the Black–Scholes equation.
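The Feynman–Kac representation above lends itself to a Monte Carlo check. The sketch below (a minimal illustration, with the hypothetical payoff $g(x) = x$ and arbitrary parameter values, none of which come from the text) uses the fact that geometric Brownian motion can be sampled exactly at time $T$, and compares the sampled discounted expectation with the closed form $\mathbb{E}[e^{-r(T-t)} X_T \mid X_t = x] = x\,e^{(\mu - r)(T-t)}$.

```python
import numpy as np

# Monte Carlo check of the Feynman-Kac representation for the
# Black-Scholes model dX_t = mu*X_t dt + sigma*X_t dB_t with a constant
# discount rate r.  For the (hypothetical) payoff g(x) = x the value
# f(t,x) = E[exp(-r(T-t)) X_T | X_t = x] = x exp((mu - r)(T - t))
# is known in closed form, which gives us something to compare against.
rng = np.random.default_rng(0)
mu, sigma, r = 0.05, 0.2, 0.03   # illustrative parameter choices
t, T, x = 0.0, 1.0, 1.0
n_paths = 200_000

# Geometric Brownian motion sampled exactly at time T:
# X_T = x * exp((mu - sigma^2/2)(T-t) + sigma * B_{T-t}).
B = rng.standard_normal(n_paths) * np.sqrt(T - t)
X_T = x * np.exp((mu - 0.5 * sigma**2) * (T - t) + sigma * B)

f_mc = np.exp(-r * (T - t)) * X_T.mean()
f_exact = x * np.exp((mu - r) * (T - t))
print(f_mc, f_exact)
```

The two printed numbers should agree up to Monte Carlo error; replacing `g` by an actual option payoff only changes the last averaging step.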

4. Time-homogeneous Diffusions

In this section we consider a class of diffusion processes whose drift and diffusion coefficients do not depend explicitly on time:

Definition 4.1. If Xt solves

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t \tag{7.11}$$
then $X_t$ is a time-homogeneous Itô diffusion process.

Intuitively, the evolution of such processes does not depend on the time at which the process is started, i.e.,
$$\mathbb{P}[X_t \in A \mid X_0 = x_0] = \mathbb{P}[X_{t+s} \in A \mid X_s = x_0]$$
for all $A \in \mathcal{B}(\mathbb{R})$ and $s \in \mathcal{T}$ such that $t+s \in \mathcal{T}$. In other words, their evolution is invariant with respect to translations in time, whence the name time-homogeneous.

Definition 4.2. Given a sde with a unique solution we define the associated Markov semigroup $P_t$ by

$$(P_t\varphi)(x) = \mathbb{E}_x \varphi(X_t).$$
To see that this definition satisfies the semigroup property, observe that the Markov property states that

$$(P_{t+s}\varphi)(x) = \mathbb{E}_x \varphi(X_{t+s}) = \mathbb{E}_x \mathbb{E}_{X_s} \varphi(X_t) = \mathbb{E}_x (P_t\varphi)(X_s) = (P_s P_t \varphi)(x).$$
Note that for the class of processes introduced above the infinitesimal generator is time-independent, i.e., we have $A_t f = A f$. As a further consequence of the translation invariance (in time) of the sde (7.11), the fact that the final condition of the backward Kolmogorov equation is at a specific time $T$ is not relevant in this framework. This enables us to "store" the time-reversal in the function itself and look at the backward Kolmogorov equation as a forward equation, as we explain below.

Let $f_-(x,t)$ be a bounded $C^{2,1}$ function satisfying
$$\begin{cases} \dfrac{\partial f_-}{\partial t} = L f_- \\ f_-(x,0) = g(x), \end{cases} \tag{7.12}$$
where $L$ is the generator defined in (7.5). For simplicity we also assume that $g$ is bounded and continuous. Then we have the analogous result to Theorem 3.1.

Theorem 4.3. Under the above assumptions, $M_t = f_-(X_t, T-t)$ is a martingale for $t \in [0,T)$.

Proof. The proof is identical to the Brownian case. We start by applying Itô's formula:
$$f_-(X_s, T-s) - f_-(X_0, T) = \int_0^s \Big( -\frac{\partial f_-}{\partial t}(X_\gamma, T-\gamma) + L f_-(X_\gamma, T-\gamma) \Big)\, d\gamma + \int_0^s \frac{\partial f_-}{\partial x}(X_\gamma, T-\gamma)\,dB_\gamma.$$
As before, the integrand of the first integral is identically zero because $\frac{\partial f_-}{\partial t} = L f_-$. Hence only the stochastic integral is left on the right-hand side. $\square$

And as before we have

Corollary 4.4. In the above setting
$$f_-(x,t) = \mathbb{E}[g(X_t) \mid X_0 = x].$$
The restriction to bounded and continuous $g$ is not needed.

Proof of Cor. 4.4. Since $s \mapsto f_-(X_s, T-s) - f_-(X_0, T)$ is a martingale,
$$\mathbb{E}[f_-(X_T, 0) \mid X_0 = x] = \mathbb{E}[f_-(X_0, T) \mid X_0 = x],$$
because at $s = 0$ we see that $f_-(X_s, T-s) - f_-(X_0, T) = 0$. Since $\mathbb{E}[f_-(X_0, T) \mid X_0 = x] = f_-(x,T)$ and $\mathbb{E}[f_-(X_T, 0) \mid X_0 = x] = \mathbb{E}[g(X_T) \mid X_0 = x]$, the proof is complete. $\square$

For a more detailed discussion of Poisson and Dirichlet problems we refer to [13].
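The semigroup property of Definition 4.2 can be made concrete numerically. The sketch below uses the Ornstein–Uhlenbeck process $dX_t = -X_t\,dt + \sigma\,dB_t$ (a hypothetical choice made only because its transition density is Gaussian and known exactly) and checks, for the test function $\varphi(x) = x$, that estimating $(P_{t+s}\varphi)(x)$ directly agrees with the two-stage estimate $(P_s P_t \varphi)(x)$ and with the closed form $x e^{-(s+t)}$.

```python
import numpy as np

# Illustration of the Markov semigroup P_t phi(x) = E_x phi(X_t) of
# Definition 4.2 for the Ornstein-Uhlenbeck process dX = -X dt + sigma dB,
# whose transition law is X_t | X_0 = x ~ N(x e^{-t}, sigma^2(1-e^{-2t})/2).
rng = np.random.default_rng(1)
sigma = 1.0

def sample_transition(x, t, n):
    """Draw n exact samples of X_t given X_0 = x (x may be an array)."""
    mean = x * np.exp(-t)
    var = sigma**2 * (1 - np.exp(-2 * t)) / 2
    return mean + np.sqrt(var) * rng.standard_normal(n)

# Take phi(x) = x, so that (P_t phi)(x) = x e^{-t} in closed form.
x0, s, t, n = 1.5, 0.3, 0.7, 400_000

# Direct estimate of (P_{t+s} phi)(x0) ...
direct = sample_transition(x0, s + t, n).mean()

# ... and the two-stage estimate (P_s P_t phi)(x0): run to time s, then
# restart the time-homogeneous dynamics from X_s and run for time t.
X_s = sample_transition(x0, s, n)
two_stage = sample_transition(X_s, t, n).mean()

exact = x0 * np.exp(-(s + t))
print(direct, two_stage, exact)
```

Both Monte Carlo estimates should match the exact value up to sampling error, which is precisely the identity $(P_{t+s}\varphi)(x) = (P_s P_t \varphi)(x)$.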

5. Stochastic Characteristics

To better understand Theorem 4.3 and Corollary 4.4, we begin by considering the deterministic case
$$\begin{cases} \dfrac{\partial f_-}{\partial t} = (b \cdot \nabla) f_- \\ f_-(x,0) = f(x). \end{cases} \tag{7.13}$$
We want to make an analogy between the method of characteristics used to solve (7.13) and the results in Theorem 4.3 and Corollary 4.4. The method of characteristics is a method for solving (7.13) which, in this simple setting, amounts to finding a collection of curves ("characteristic curves") along which the solution is constant. Let us call these curves $\xi(t)$, where $t$ is the parametrizing variable. Mathematically, we want $f_-(\xi(t), T-t)$ to be a constant independent of $t \in [0,T]$ for some fixed $T$. Hence the constant depends only on the choice of $\xi(t)$. We will look for $\xi(t) = (\xi_1(t), \dots, \xi_d(t))$ which solve an ODE, and thus we can parametrize the curves $\xi(t)$ by their initial condition $\xi(0) = x$. It may seem odd (and unneeded) to introduce the final time $T$. This is done so that $f_-(T, x) = f_-(T, \xi(0))$ and to keep the analogy close to what is traditionally done for sdes. Differentiating $f_-(\xi(t), T-t)$ with respect to $t$, we see that keeping it constant amounts to requiring

$$\sum_{i=1}^d \frac{\partial f_-}{\partial x_i}(\xi(t), T-t)\, \frac{d\xi_i}{dt}(t) = \frac{\partial f_-}{\partial t}(\xi(t), T-t) = \sum_{i=1}^d b_i(\xi(t))\, \frac{\partial f_-}{\partial x_i}(\xi(t), T-t),$$
where the last equality follows from (7.13). We conclude that for this equality to hold in general we need
$$\frac{d\xi}{dt} = b(\xi(t)) \quad \text{and} \quad \xi(0) = x.$$

Since $f_-(\xi(t), T-t)$ is a constant we have

$$f_-(\xi(0), T) = f_-(\xi(T), 0) \implies f_-(x, T) = f(\xi(T)), \tag{7.14}$$
which provides a solution to (7.13) at all points which can be reached by the curves $\xi(T)$. Under mild assumptions this is all of $\mathbb{R}^d$.

Looking back at Theorem 4.3, we notice that, differently from the ode case, we did not find an sde $X_t$ which keeps $f_-(X_t, T-t)$ constant in the fully fledged sense. However, we have obtained something very close to it: we chose $t \mapsto f_-(X_t, T-t)$ to be a martingale, i.e., a process that is constant on average! This is the content of Theorem 4.3 and Corollary 4.4 (putting the accent on the expectation part of the result), which mimics the result of (7.14), only with the addition of expected values. Hence we might be provoked to make the following fanciful statement.

Stochastic differential equations are the method of characteristics for diffusions. Rather than follow a single characteristic back to the initial value to find the current value, we trace an infinite collection of stochastic curves, each back to its own initial value, which we then average, weighting with the probability of the curve.
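The deterministic side of this analogy can be sketched in a few lines. Below we pick the illustrative (hypothetical) drift $b(x) = -x$ in one dimension, so the characteristic ODE $\dot\xi = b(\xi)$, $\xi(0) = x$ has the exact solution $\xi(t) = x e^{-t}$ and (7.14) predicts $f_-(x,T) = f(x e^{-T})$; the code integrates the characteristic numerically and compares with that closed form.

```python
import numpy as np

# Method of characteristics for the transport equation
# df_/dt = b(x) df_/dx with the illustrative choice b(x) = -x and
# initial data f(x) = sin(x).  The characteristic ODE dxi/dt = b(xi),
# xi(0) = x, solves to xi(t) = x e^{-t}, so by (7.14) the pde solution
# is f_(x, T) = f(x e^{-T}).

def b(x):
    return -x

def f(x):
    return np.sin(x)

def solve_characteristic(x, T, n_steps=10_000):
    """Integrate dxi/dt = b(xi) from xi(0) = x up to time T by explicit Euler."""
    xi, dt = x, T / n_steps
    for _ in range(n_steps):
        xi = xi + dt * b(xi)
    return xi

x, T = 1.2, 0.8
xi_T = solve_characteristic(x, T)
f_minus = f(xi_T)              # value transported along the characteristic
f_exact = f(x * np.exp(-T))    # closed-form answer for this choice of b
print(f_minus, f_exact)
```

In the stochastic version of the next chapters, the single curve $\xi$ is replaced by an ensemble of diffusion paths and the point evaluation $f(\xi(T))$ by an expectation.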

6. A fundamental example: Brownian motion and the Heat Equation

We now consider the simple but fundamental case of standard Brownian motion. Let us consider a compact subset $D \subset \mathbb{R}^2$ with a smooth boundary $\partial D$ and a function $f(x)$ defined on $\partial D$.

The Dirichlet problem: We are looking for a function $u$ such that
$$\Delta u = \frac{\partial^2 u}{\partial y_1^2} + \frac{\partial^2 u}{\partial y_2^2} = 0 \quad \text{for } y = (y_1, y_2) \text{ inside } D,$$
$$\lim_{y \to x} u(y) = f(x) \quad \text{for all } x \in \partial D.$$

Let $B(t,\omega) = (B_1(t,\omega), B_2(t,\omega))$ be a two-dimensional Brownian Motion. Define the stopping time
$$\tau = \inf\{t > 0 : B(t) \notin D\}.$$

Let $\mathbb{E}_y$ be the expectation with respect to the Wiener measure for a Brownian Motion starting from $y$ at time $t = 0$. Let us define $\varphi(y) = \mathbb{E}_y f(B(\tau))$. We are going to show that $\varphi$ solves the Dirichlet problem.

Lemma 6.1. With probability 1, $\tau < \infty$. In fact, $\mathbb{E}\tau^r < \infty$ for all $r > 0$.

Proof.
$$\mathbb{P}\{\tau \ge n\} \le \mathbb{P}\{|B(1)-B(0)| \le \operatorname{diam} D,\ |B(2)-B(1)| \le \operatorname{diam} D,\ \dots,\ |B(n)-B(n-1)| \le \operatorname{diam} D\}$$
$$\le \prod_{k=1}^n \mathbb{P}\{|B(k)-B(k-1)| \le \operatorname{diam} D\} = \alpha^n, \quad \text{where } \alpha \in (0,1).$$
Hence $\sum_{n=1}^\infty \mathbb{P}\{\tau \ge n\} < \infty$ and the Borel–Cantelli lemma says that $\tau$ is almost surely finite. Now let us look at the moments:
$$\mathbb{E}\tau^r = \int x^r\, \mathbb{P}\{\tau \in dx\} \le \sum_{n=1}^\infty n^r\, \mathbb{P}\{\tau \in (n-1,n]\} \le \sum_{n=1}^\infty n^r\, \mathbb{P}\{\tau \ge n-1\} \le \sum_{n=1}^\infty n^r \alpha^{n-1} < \infty. \qquad \square$$

Let us fix a point $y$ in the interior of $D$ and put a small circle of radius $\rho$ around $y$ so that the circle is contained completely in $D$. Let $\tau_{\rho,y}$ be the first moment of time $B(t)$ hits the circle of radius $\rho$ centered at $y$. Because the law of Brownian Motion is invariant under rotations, we see that $B(\tau_{\rho,y})$ is distributed uniformly on the circle centered at $y$. (Let us call this circle $S_\rho(y)$.)

Theorem 6.2. $\varphi$ solves the Laplace equation.

Proof. i) We start by proving the mean value property. To do so we invoke the Strong Markov property of $B(t)$. Let $\tau_S = \inf\{t : B(t) \in S_\rho(y)\}$ and let $z_\vartheta = y + (\rho\cos\vartheta, \rho\sin\vartheta)$ be the point on $S_\rho(y)$ at angle $\vartheta$. We notice that any path from $y$ to the boundary of $D$ must pass through $S_\rho(y)$. Thus we can think of $\varphi(y)$ as the weighted average of $\mathbb{E}\{f(B(\tau)) \mid B(\tau_S) = z_\vartheta\}$, where $\vartheta$ moves us around the circle $S_\rho(y)$. Each entry in this average is weighted by the chance of hitting that point on the sphere starting from $y$. Since this chance is uniform (all points are equally likely), we simply get the factor of $\frac{1}{2\pi}$ to normalize things to be a probability measure:
$$\varphi(y) = \frac{1}{2\pi} \int_0^{2\pi} \mathbb{E}\{f(B(\tau)) \mid B(\tau_S) = z_\vartheta\}\, d\vartheta = \frac{1}{2\pi} \int_0^{2\pi} \varphi(z_\vartheta)\, d\vartheta. \tag{7.15}$$

ii) $\varphi$ is infinitely differentiable. This can be easily shown, but let us just assume it, since we are doing this exercise in explicit calculation to improve our understanding, not to prove every detail of the theorems.

iii) Now we see that $\varphi$ satisfies $\Delta\varphi = \frac{\partial^2\varphi}{\partial y_1^2} + \frac{\partial^2\varphi}{\partial y_2^2} = 0$. We expand about a point $y$ in the interior of $D$:
$$\varphi(z) = \varphi(y) + \frac{\partial\varphi}{\partial y_1}(z_1 - y_1) + \frac{\partial\varphi}{\partial y_2}(z_2 - y_2)$$
$$\qquad + \frac{1}{2}\left[ \frac{\partial^2\varphi}{\partial y_1^2}(z_1-y_1)^2 + \frac{\partial^2\varphi}{\partial y_2^2}(z_2-y_2)^2 + 2\,\frac{\partial^2\varphi}{\partial y_1 \partial y_2}(z_1-y_1)(z_2-y_2) \right] + O(|z-y|^3).$$
Now we integrate this over the circle $S_\rho(y)$ centered at $y$ of radius $\rho$. We take $\rho$ to be sufficiently small so that the entire disk is in the domain $D$. By direct calculation we have

$$\int_{S_\rho(y)} (z_1 - y_1)\,dz = 0, \qquad \int_{S_\rho(y)} (z_2 - y_2)\,dz = 0, \qquad \int_{S_\rho(y)} (z_1-y_1)(z_2-y_2)\,dz = 0,$$
and

$$\int_{S_\rho(y)} (z_1 - y_1)^2\,dz = (\text{const})\,\rho^2, \qquad \int_{S_\rho(y)} (z_2 - y_2)^2\,dz = (\text{const})\,\rho^2.$$
Since, by the mean value property,

$$\varphi(y) = (\text{const}) \int_{S_\rho(y)} \varphi(z)\,dz,$$
we see that
$$0 = (\text{const})\,\rho^2 \left( \frac{\partial^2\varphi}{\partial y_1^2} + \frac{\partial^2\varphi}{\partial y_2^2} \right) + O(\rho^3).$$
And thus,
$$\Delta\varphi = \frac{\partial^2\varphi}{\partial y_1^2} + \frac{\partial^2\varphi}{\partial y_2^2} = 0. \qquad \square$$
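The representation $\varphi(y) = \mathbb{E}_y f(B(\tau))$ can also be used the other way: to solve the Dirichlet problem by simulation. The sketch below takes $D$ to be the unit disk (a concrete choice made only for illustration) and repeatedly applies the uniform-exit fact proved above — from the current point, Brownian motion exits the largest circle around it contained in $D$ at a uniform point — until the path is within a small tolerance of $\partial D$. For the boundary data $f(x) = x_1$ the harmonic extension is $u(y) = y_1$, giving an exact value to check against.

```python
import numpy as np

# Monte Carlo sketch of phi(y) = E_y[f(B(tau))] on the unit disk
# D = {|y| < 1}.  We iterate the rotation-invariance fact from the text:
# started at the center of a circle contained in D, Brownian motion exits
# that circle uniformly, so jumping to a uniform point on the largest such
# circle ("walk on spheres") produces an approximate sample of B(tau).
rng = np.random.default_rng(2)

def sample_exit_point(y, eps=1e-4):
    """Walk on circles inside the unit disk until within eps of the boundary."""
    y = np.array(y, dtype=float)
    while True:
        r = 1.0 - np.linalg.norm(y)      # largest circle around y inside D
        if r < eps:
            return y / np.linalg.norm(y)  # project onto the boundary
        theta = rng.uniform(0.0, 2.0 * np.pi)
        y = y + r * np.array([np.cos(theta), np.sin(theta)])

def f(x):
    return x[0]   # boundary data; its harmonic extension is u(y) = y_1

y0 = (0.3, 0.4)
n = 10_000
phi_mc = np.mean([f(sample_exit_point(y0)) for _ in range(n)])
print(phi_mc)  # should be close to u(y0) = 0.3
```

This is only an illustration of the probabilistic representation; the choice of disk, tolerance, and boundary data are all assumptions of the sketch.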


CHAPTER 8

Martingales and Localization

This chapter is dedicated to a more in-depth study of martingales and their properties. Some of the results presented here are fairly general, and their proof in full generality requires tools that are more advanced than the ones we have at our disposal. For this reason, some of the proofs will be given in a simplified setting or under stronger assumptions, together with a reference for the more general result.

1. Martingales & Co. We recall the definition of a martingale given at the beginning of the course, extending it slightly.

Definition 1.1. $\{X_t\}$ is a Martingale with respect to a filtration $\mathcal{F}_t$ ($\mathcal{F}_t$-martingale for short) if for all $t > s$ we have
i) $X_t$ is $\mathcal{F}_t$-measurable,
ii) $\mathbb{E}[|X_t|] < \infty$,
iii) $\mathbb{E}[X_t \mid \mathcal{F}_s] = X_s$ a.s.

Similarly, $X_t$ is an $\mathcal{F}_t$-supermartingale [$\mathcal{F}_t$-submartingale] if it satisfies conditions i) and ii) above, and
$$\mathbb{E}[X_t \mid \mathcal{F}_s] \le X_s \quad \big[\mathbb{E}[X_t \mid \mathcal{F}_s] \ge X_s\big] \quad \text{a.s.}$$
When the filtration is clear from the context we simply say that a process is a [super/sub-]martingale. Super- and submartingales extend the idea of a process that is constant in expectation to processes that are, respectively, nonincreasing and nondecreasing in expectation. It is clear that a martingale is both a supermartingale and a submartingale, while a supermartingale that is also a submartingale is a martingale.

Proposition 1.2. A supermartingale [submartingale] $M_t$ is a martingale on $[0,T]$ if and only if $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.

Proof. xx $\square$

Remark 1.3. By Jensen's inequality on conditional expectations, for any convex function $g : \mathbb{R} \to \mathbb{R}$ a martingale $M_t$ satisfies
$$\mathbb{E}[g(M_t) \mid \mathcal{F}_s] \ge g(\mathbb{E}[M_t \mid \mathcal{F}_s]) = g(M_s),$$
so application of a convex [concave] map to a martingale makes it a submartingale [supermartingale].

Recall that a random variable $X$ is [square-]integrable if $\mathbb{E}[|X|] < \infty$ [$\mathbb{E}[X^2] < \infty$]. The condition of simple integrability of a random variable $X$ can be equivalently stated as the condition
$$\lim_{n\to\infty} \mathbb{E}\big[|X| \mathbf{1}_{|X|>n}\big] = 0. \tag{8.1}$$

Indeed, on one hand, as $\lim_{n\to\infty} |X|\mathbf{1}_{|X|>n} = 0$ a.s. and $\mathbb{E}[|X|\mathbf{1}_{|X|>n}] \le \mathbb{E}[|X|] < \infty$, we have by the dominated convergence theorem¹ that $\lim_{n\to\infty} \mathbb{E}[|X|\mathbf{1}_{|X|>n}] = \mathbb{E}[0] = 0$. On the other hand, we have that
$$\mathbb{E}[|X|] = \mathbb{E}\big[|X|\mathbf{1}_{|X|\le n}\big] + \mathbb{E}\big[|X|\mathbf{1}_{|X|>n}\big] \le n + \mathbb{E}\big[|X|\mathbf{1}_{|X|>n}\big], \tag{8.2}$$
and by using that both summands on the right hand side are bounded (the first by definition and the second by assumption) we obtain that $\mathbb{E}[|X|] < \infty$. We now generalize the definitions above to stochastic processes.

¹See Theorem 0.1 in the appendix for a reminder of this theorem.

Definition 1.4. A stochastic process $X_t$ on $\mathcal{T} = [0,T]$ (where possibly $T = \infty$) is
i) integrable if $\sup_{t\in\mathcal{T}} \mathbb{E}[|X_t|] < \infty$,
ii) square integrable if $\sup_{t\in\mathcal{T}} \mathbb{E}[X_t^2] < \infty$ (i.e., the second moments are uniformly bounded),
iii) uniformly integrable if $\lim_{n\to\infty} \sup_{t\in\mathcal{T}} \mathbb{E}\big[|X_t|\mathbf{1}_{|X_t|>n}\big] = 0$.

The introduction of (8.1) allows us to separate the concepts of simple and uniform integrability for stochastic processes, as in the latter definition the limit is taken after the supremum. As one would expect, uniform integrability is stronger than simple integrability: similarly to (8.2) one has

$$\sup_{t\in\mathcal{T}} \mathbb{E}[|X_t|] \le \sup_{t\in\mathcal{T}} \mathbb{E}\big[|X_t|\mathbf{1}_{|X_t|\le n}\big] + \sup_{t\in\mathcal{T}} \mathbb{E}\big[|X_t|\mathbf{1}_{|X_t|>n}\big] \le n + \sup_{t\in\mathcal{T}} \mathbb{E}\big[|X_t|\mathbf{1}_{|X_t|>n}\big] < \infty.$$
For the converse result we need stronger assumptions. We give below examples of such results:

Proposition 1.5. A stochastic process $\{X_t\}$ is uniformly integrable if either
i) it is dominated by an integrable random variable $Y$ defined on the same probability space, i.e., $|X_t(\omega)| \le Y(\omega)$ with $\mathbb{E}[|Y|] < \infty$, or
ii) there exists some positive function $G(x)$ on $(0,\infty)$ with $\lim_{x\to\infty} G(x)/x = \infty$ such that
$$\sup_{t\in\mathcal{T}} \mathbb{E}[G(|X_t|)] < \infty.$$

Proof. We only prove the first result, for which we have that
$$\lim_{n\to\infty} \sup_{t\in\mathcal{T}} \mathbb{E}\big[|X_t|\mathbf{1}_{|X_t|>n}\big] \le \lim_{n\to\infty} \mathbb{E}\big[|Y|\mathbf{1}_{|Y|>n}\big] = 0.$$
For the proof of the second result we refer, e.g., to [15]. $\square$

In ii) of the above proposition we see that we need something slightly better than simple integrability to have uniform integrability. In particular, we see that $G(x) = x^{1+\varepsilon}$ for any $\varepsilon > 0$ satisfies condition ii). In particular, all square integrable martingales are uniformly integrable.

Theorem 1.6. Let $Y$ be an integrable random variable on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{P}, \mathcal{F}_t)$; then
$$M_t := \mathbb{E}[Y \mid \mathcal{F}_t] \tag{8.3}$$
is a uniformly integrable martingale.

Proof. xx $\square$

We say that a martingale such as the one in (8.3) is closed by the random variable $Y$. In particular, for any finite time interval $[0,T]$, every martingale is by definition closed by its value at $T$, since $\mathbb{E}[M_T \mid \mathcal{F}_t] = M_t$, and we have the following corollary.

Corollary 1.7. Any martingale Mt on a finite time interval is uniformly integrable. The above results can be extended to infinite time intervals.

Theorem 1.8 (Martingale convergence theorem). Let $M_t$ on $\mathcal{T} = [0,\infty)$ be an integrable [sub/super]-martingale. Then there exists an almost sure (i.e., pointwise) limit $\lim_{t\to\infty} M_t = Y$, and $Y$ is an integrable random variable.

The above theorem does not establish a correspondence between the random variables in terms of expected values. In particular, we may have cases where the theorem above applies but $\lim_{t\to\infty} \mathbb{E}[M_t] \ne \mathbb{E}[Y]$:

Example 1.9. Consider the martingale $M_t = \exp[B_t - t/2]$. Because it is positive, we have that
$$\sup_{t\in\mathcal{T}} \mathbb{E}[|M_t|] = \sup_{t\in\mathcal{T}} \mathbb{E}[M_t] = \mathbb{E}[M_0] = 1 < \infty,$$
so it converges almost surely to a random variable $Y$ by Theorem 1.8. However, by the law of large numbers for Brownian Motion, $B_t/t \to 0$ a.s., and therefore
$$\mathbb{E}[Y] = \mathbb{E}\Big[\lim_{t\to\infty} M_t\Big] = \mathbb{E}\Big[\lim_{t\to\infty} e^{t(B_t/t - 1/2)}\Big] = 0,$$
which differs from $\lim_{t\to\infty} \mathbb{E}[M_t] = 1$.

The above observation means in particular that the conditions of Theorem 1.8 do not guarantee convergence in the $L^1$ norm. Under the stronger condition of uniform integrability of the process one obtains the same result with convergence in $L^1$ norm and, consequently, the closedness of the martingale:

Theorem 1.10. Let $M_t$ be a uniformly integrable martingale on $\mathcal{T} = [0,\infty)$; then it converges as $t \to \infty$ in $L^1$ and a.s. to a random variable $Y$. Conversely, if $M_t$ converges in $L^1$ to an integrable random variable $Y$, then it is uniformly integrable and converges almost surely. In both cases $M_t$ is closed by $Y$.
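The tension in Example 1.9 is easy to see numerically. In the sketch below (sample sizes and times are arbitrary choices) we sample $M_t = \exp[B_t - t/2]$ exactly at a moderate time, where the empirical mean is close to $\mathbb{E}[M_t] = 1$, and at a large time, where almost every sampled path is essentially zero even though the theoretical mean is still $1$: the unit of mass escapes into an ever rarer set of enormous values, which is exactly the failure of uniform integrability.

```python
import numpy as np

# Numerical illustration of Example 1.9: M_t = exp(B_t - t/2) satisfies
# E[M_t] = 1 for all t, yet M_t -> 0 almost surely.  We sample B_t
# directly (B_t ~ N(0, t)), so no path discretization is needed.
rng = np.random.default_rng(3)
n = 100_000

# At a moderate time the empirical mean is close to 1 ...
t_small = 1.0
M_small = np.exp(np.sqrt(t_small) * rng.standard_normal(n) - t_small / 2)
mean_small = M_small.mean()

# ... while at a large time almost every sampled value is tiny, even
# though the theoretical mean is still exactly 1.
t_large = 100.0
M_large = np.exp(np.sqrt(t_large) * rng.standard_normal(n) - t_large / 2)
frac_tiny = np.mean(M_large < 1e-3)
print(mean_small, frac_tiny)
```

At large times the empirical mean itself becomes an unreliable estimator of $\mathbb{E}[M_t]$, for the same heavy-tail reason.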

2. Optional stopping

After studying martingales per se, we consider their relation with stopping times. In particular, we will see that martingales behave nicely with respect to stopping times. To be more explicit, given a stochastic process $X_t$ and recalling the definition Def. 7.14 of a stopping time $\tau$, we write $\tau \wedge t = \min(\tau, t)$ and define the stopped process
$$X_t^\tau := X_{\tau\wedge t} = \begin{cases} X_t & \text{if } t < \tau \\ X_\tau & \text{else.} \end{cases}$$
The following theorem gives an example of the nice relationship between martingales and stopping times: it says that the martingale property is maintained when a process is stopped.

Theorem 2.1. For an $\mathcal{F}_t$-martingale $M_t$ and any stopping time $\tau$, the process $M_{\tau\wedge t}$ is an $\mathcal{F}_t$-martingale (and therefore an $\mathcal{F}_{\tau\wedge t}$-martingale), so
$$\mathbb{E}[M_{\tau\wedge t}] = \mathbb{E}[M_0] \quad \text{for all } t > 0. \tag{8.4}$$

Proof. xx $\square$

Martingales are often thought of as fair games because of their property of conserving their expected value: it is impossible, on average, to make positive gains by playing such a game. Under this interpretation, Theorem 2.1 states that even if a player is given the possibility of quitting the game using any betting strategy, he/she will not be able to make net gains at time $t$, provided that his/her strategy only depends on past information (cf. Def. 7.14 of stopping times). However, the above property is lost if the player is patient enough, as the following example shows:

Example 2.2. Let $B_t$ be a Standard Brownian Motion (a martingale, hence an example of a "fair game": you can think of it as a continuous version of betting one dollar on every coin flip) and define $\tau_1 := \inf\{t : B_t \ge 1\}$ (the strategy of stopping as soon as you have a net gain of 1\$). Then by definition we have that $B_{\tau_1} = 1 \ne 0 = \mathbb{E}[B_0]$.

A similar situation to the one described above holds when considering the "martingale" betting strategy of doubling your bet every time you lose a coin flip. This strategy leads to an almost sure net win of 1\$ if one is patient enough (and has enough money to bet). As the examples above show, stopped martingales may lose the property of conserving the expected value in the limit $t \to \infty$. The following theorem gives sufficient conditions for the martingale property to hold in this limit, i.e., for the expected value of a game to be conserved at a stopping time $\tau$:

Theorem 2.3 (Optional stopping theorem). Let $M_t$ be a martingale and $\tau$ a stopping time. Then we have $\mathbb{E}[M_\tau] = \mathbb{E}[M_0]$ if any of the following conditions holds:
‚ The stopping time $\tau$ is bounded a.s., i.e., there exists $K < \infty$ such that $\tau \le K$,
‚ The martingale $M_t$ is uniformly integrable,
‚ The stopping time is finite a.s. (i.e., $\mathbb{P}[\tau = \infty] = 0$), $M_t$ is integrable, and
$$\lim_{t\to\infty} \mathbb{E}[M_t \mathbf{1}_{\tau > t}] = 0.$$

Proof. $\square$

Under the gaming interpretation above, we see that a game is "fair", i.e., it is impossible to make net gains, on average, using only past information, if any of the conditions i)-iii) holds. In particular, in the case of coin-flip games (or casino games) we see that a winning strategy does not exist, as condition ii) holds: there is only a finite amount of money in the world, so the martingale is uniformly bounded, and in particular uniformly integrable. A simplified example of such a situation is given next:

Example 2.4. Let $B_t$ be a Standard Brownian Motion on the interval $a < 0 < b$ and define the stopping time $\tau = \tau_{ab} = \inf\{t \in [0,\infty) : B_t \notin (a,b)\}$. The stopped process $B_{\tau\wedge t}$ is uniformly bounded and in particular uniformly integrable. Hence, by Theorem 2.3 we have that $\mathbb{E}[B_\tau] = \mathbb{E}[B_0] = 0$. However, we also have that $B_\tau = b$ with probability $p$ and $B_\tau = a$ with probability $1-p$; therefore
$$0 = \mathbb{E}[B_\tau] = a(1-p) + bp \implies \mathbb{P}[B_\tau = b] = p = \frac{-a}{b-a}.$$
We have reached this conclusion using only the martingale properties of $B_t$, so it extends to any martingale for which $\tau_{ab}$ is finite a.s.

We conclude the section by presenting the converse of Theorem 2.3:

Proposition 2.5. Let Xt be a stochastic process such that for any stopping time τ, Xτ is integrable and E rX0s “ E rXτ s. Then Xt is a martingale. Proof. xx 
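The exit probability $\mathbb{P}[B_\tau = b] = -a/(b-a)$ of Example 2.4 can be verified by simulation. Since the argument only used the martingale property, the sketch below applies it to the simple symmetric random walk instead of Brownian motion (a deliberate substitution: the walk is also a martingale, and it avoids any path-discretization bias). The barriers $a = -2$, $b = 3$ are an arbitrary illustrative choice, for which the formula gives $p = 2/5$.

```python
import numpy as np

# Monte Carlo check of the exit probability from Example 2.4,
# P[hit b before a] = -a/(b - a), for the simple symmetric random walk
# started at 0 (itself a martingale, so the same optional-stopping
# argument applies exactly).
rng = np.random.default_rng(4)
a, b = -2, 3
n = 50_000

pos = np.zeros(n, dtype=int)
hit_b = np.zeros(n, dtype=bool)
active = np.ones(n, dtype=bool)     # walkers not yet absorbed
while active.any():
    steps = rng.choice([-1, 1], size=active.sum())
    pos[active] += steps
    hit_b[active & (pos == b)] = True   # record absorption at b
    active &= (pos > a) & (pos < b)     # deactivate absorbed walkers

p_mc = hit_b.mean()
p_exact = -a / (b - a)   # = 0.4 for these barriers
print(p_mc, p_exact)
```

The same vectorized loop works for any integer barriers $a < 0 < b$.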

3. Localization

This section is devoted to the use of stopping times for the study of the properties of stochastic processes. As we have seen, a stopped process may have properties that the original process did not have (e.g., uniform integrability on $[0,\infty)$ in Example 2.4). One can generalize this situation to a sequence of stopping times, as in the following example:

Example 3.1. Consider, similarly to Example 2.4, a Standard Brownian Motion $B_t$ and, for $n \in \mathbb{N}$, the interval $(-n, n)$. Then we can define the stopping times $\tau_n := \inf\{t : B_t \notin (-n,n)\}$. For each $n > 0$, the stopped process $B_{\tau_n\wedge t}$ is uniformly integrable.

In the above example, by taking the limit $n \to \infty$ one approaches the original setting of unbounded Brownian Motion by approximating it with uniformly bounded stopped processes. This procedure can be extremely useful for obtaining stronger results than the ones obtained previously in the course, as we will see later in this section, and it justifies the following definition:

Definition 3.2. A property of a stochastic process $X_t$ is said to hold locally if there exists a sequence $\{\tau_n\}$ of stopping times with $\lim_{n\to\infty} \tau_n(\omega) = \infty$ a.s. such that the stopped process $X_{\tau_n\wedge t}$ has that property. In this case, the sequence $\{\tau_n\}$ is called the localizing sequence.

A particularly useful example is the one of the martingale property:

Definition 3.3. An adapted process $M_t$ is a local martingale if there exists a sequence of stopping times $\{\tau_n\}$ such that $\lim_{n\to\infty} \tau_n(\omega) = \infty$ a.s. and the stopped process $M_{\tau_n\wedge t}$ is a martingale for all $n$.

It is clear that if a property holds in the original sense, then it also holds locally: one just has to take $\tau_n = n > t$. On the contrary, a local martingale is in general not a martingale:

Example 3.4. Consider the Itô integral $M_t = \int_0^t \exp[B_s^2]\,dB_s$ for $t < 1/4$, and define $\tau_n := \inf\{t > 0 : \exp[B_t^2] = n\}$. The process $M_{\tau_n\wedge t}$ is a martingale, since
$$M_{\tau_n\wedge t} = \int_0^t \exp[B_s^2]\,\mathbf{1}_{\exp[B_s^2]\le n}\,dB_s$$
is square integrable by the Itô isometry. However, we have that
$$\mathbb{E}\big[\exp[2B_t^2]\big] = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{2x^2}\, e^{-x^2/(2t)}\,dx,$$
which diverges for $t > 1/4$, implying that $M_t$ is not integrable.

We now list some results that, besides allowing us to practice the use of localization methods, give sufficient conditions for a local martingale to be a martingale.

Proposition 3.5. Let $M_t$ be a local martingale such that $|M_t| \le Y$ for an integrable random variable $Y$; then $M_t$ is a uniformly integrable martingale.

Proof. xx $\square$

Proposition 3.6. A non-negative local martingale $M_t$, for $t \in [0,T]$, is a supermartingale.

Proof. Let $\{\tau_n\}$ be the localizing sequence of $M_t$. Then for any $t$ we have that $\lim_{n\to\infty} \tau_n \wedge t = t$ a.s., and therefore that $\lim_{n\to\infty} M_{\tau_n\wedge t} = M_t$. Consequently, by Fatou's lemma on conditional expectations we have
$$\mathbb{E}[M_t \mid \mathcal{F}_s] = \mathbb{E}\Big[\liminf_{n\to\infty} M_{\tau_n\wedge t} \,\Big|\, \mathcal{F}_s\Big] \le \liminf_{n\to\infty} \mathbb{E}[M_{\tau_n\wedge t} \mid \mathcal{F}_s] = \liminf_{n\to\infty} M_{\tau_n\wedge s} = M_s \quad \text{a.s.},$$
where in the first equality we have used that the limit exists. In particular, we have that $\mathbb{E}[M_t] \le \mathbb{E}[M_0] < \infty$. $\square$

Corollary 3.7. A non-negative local martingale $M_t$ on $\mathcal{T} = [0,T]$ for $T < \infty$ is a martingale if and only if $\mathbb{E}[M_T] = M_0$.

Proof. This is a direct result of Proposition 1.2 and Proposition 3.6. $\square$

Remark 3.8. As explained in [9], there exists a necessary and sufficient condition for a local martingale to be a martingale: the local martingale must be of "Dirichlet Class", i.e., such that the collection of random variables

$$\mathcal{X} = \{X_\tau : \tau \text{ is a finite stopping time}\}$$
is uniformly integrable, i.e., $\sup_{X\in\mathcal{X}} \lim_{n\to\infty} \mathbb{E}\big[|X|\mathbf{1}_{|X|>n}\big] = 0$.

We now give some slightly more advanced examples of the use of the localization procedure. We begin by revisiting the problem of proving moment bounds for Itô integrals.

Moment Bounds for Itô Integrals. We let $I_t = \int_0^t \sigma_s\,dB_s$. We want to prove the moment bounds
$$\mathbb{E}|I_t|^{2p} \le (2p-1)(2p-3)\cdots 3\cdot 1\cdot M^{2p}\, t^p,$$
under the assumption that $|\sigma_s| \le M$ a.s. The case $p = 1$ follows from the Itô isometry. Therefore, we proceed by induction: let us assume the inequality for $p-1$ and use it to prove the inequality for $p$. For any $N > 0$, we define
$$\tau_N = \inf\Big\{t \ge 0 : \int_0^t |I_s|^{4p-2}\sigma_s^2\,ds \ge N\Big\}.$$
Applying Itô's formula to $x \mapsto |x|^{2p}$ and evaluating at the time $t \wedge \tau_N$ produces

$$|I_{t\wedge\tau_N}|^{2p} = p(2p-1)\int_0^{t\wedge\tau_N} |I_s|^{2(p-1)}\sigma_s^2\,ds + 2p\int_0^{t\wedge\tau_N} |I_s|^{2p-1}\sigma_s\,dB_s = (\mathrm{I}) + (\mathrm{II}).$$
Now, by the induction hypothesis,
$$\mathbb{E}(\mathrm{I}) \le p(2p-1)\int_0^t \mathbb{E}|I_s|^{2(p-1)}\sigma_s^2\,ds \le p(2p-1)\int_0^t (2p-3)\cdots 3\cdot 1\cdot M^{2(p-1)} s^{p-1}\, M^2\,ds$$
$$\le (2p-1)(2p-3)\cdots 3\cdot 1\cdot M^{2p}\, t^p.$$
If we define
$$U_t = \int_0^t |I_s|^{2p-1}\sigma_s \mathbf{1}_{s\le\tau_N}\,dB_s,$$
then $U_t$ is a martingale since

$$\int_0^t |I_s|^{4p-2}|\sigma_s|^2 \mathbf{1}_{s\le\tau_N}\,ds = \int_0^{t\wedge\tau_N} |I_s|^{4p-2}|\sigma_s|^2\,ds \le N.$$
Since $t\wedge\tau_N$ is a bounded stopping time, the optional stopping lemma says that $\mathbb{E}U_{t\wedge\tau_N} = 0$. However, as noted above, $(\mathrm{II}) = 2p\,U_{t\wedge\tau_N}$, so $\mathbb{E}(\mathrm{II}) = 0$ and one obtains
$$\mathbb{E}|I_{t\wedge\tau_N}|^{2p} \le (2p-1)(2p-3)\cdots 3\cdot 1\cdot M^{2p}\, t^p.$$
Since $\int_0^t |I_s|^{4p-2}\sigma_s^2\,ds$ is almost surely finite, we know that $\tau_N \to \infty$ with probability one as $N \to \infty$. Hence $|I_{t\wedge\tau_N}|^{2p} \to |I_t|^{2p}$ almost surely. Then by Fatou's lemma we have
$$\mathbb{E}|I_t|^{2p} \le \liminf_{N\to\infty} \mathbb{E}|I_{t\wedge\tau_N}|^{2p} \le (2p-1)(2p-3)\cdots 3\cdot 1\cdot M^{2p}\, t^p.$$

SDEs with Superlinear Coefficients. Let $b : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma^{(i)} : \mathbb{R}^d \to \mathbb{R}^d$ be such that for any $R > 0$ there exists a $C$ such that
$$|b(x) - b(y)| + \sum_{i=1}^m |\sigma^{(i)}(x) - \sigma^{(i)}(y)| \le C|x-y|,$$
$$|b(x)| + |\sigma(x)| \le C,$$

for any $x, y \in B_0(R)$, where $B_0(r) := \{x \in \mathbb{R}^d : \|x\|_2 < r\}$. Consider the sde
$$dX_t = b(X_t)\,dt + \sum_{i=1}^m \sigma^{(i)}(X_t)\,dB_t^{(i)}. \tag{8.5}$$
For any $R$, let $b_R$ and $\sigma_R^{(i)}$ be functions such that $b_R(x) = b(x)$ and $\sigma_R(x) = \sigma(x)$ in $B_0(R)$, and which are globally bounded and globally Lipschitz on $\mathbb{R}^d$. Since $b_R$ and $\sigma_R$ satisfy the existence and uniqueness assumptions of Chapter 2.3, there exists a solution $X_t^{(R)}$ to the equation
$$dX_t^{(R)} = b_R(X_t^{(R)})\,dt + \sum_{i=1}^m \sigma_R^{(i)}(X_t^{(R)})\,dB_t^{(i)}. \tag{8.6}$$
For any $R > 0$ we define the stopping time
$$\tau_R := \inf\{t \ge 0 : |X_t^{(R)}| > R\}.$$

Theorem 3.9. If

$$\mathbb{P}\Big[\lim_{R\to\infty} \tau_R = \infty\Big] = 1,$$
then there exists a unique strong solution to (8.5).

Proof. Fix a $T > 0$. For $R \in \mathbb{N}$ let $\Omega_R = \{\tau_{R-1} < T \le \tau_R\}$. By the assumption,
$$\mathbb{P}\Big[\bigcup_{R=1}^\infty \Omega_R\Big] = 1.$$
Also notice that the $\Omega_R$ are disjoint for $R = 1, 2, \dots$, and we can define the process
$$X_t(\omega) = X_t^{(R)}(\omega) \quad \text{for } t \in [0,T] \text{ if } \omega \in \Omega_R.$$
Since $\sup_{t\le T} |X_t^{(R)}(\omega)| \le R$ for $\omega \in \Omega_R$, we know that $b(X_t^{(R)}(\omega)) = b_R(X_t^{(R)}(\omega))$ and $\sigma(X_t^{(R)}(\omega)) = \sigma_R(X_t^{(R)}(\omega))$ for all $t \in [0,T]$. Hence $X_t$ as defined solves the original equation. Uniqueness comes from the fact that solutions to (8.6) are unique. $\square$

4. Quadratic variation for martingales

Recall the definition of the quadratic variation of a stochastic process:

Definition 4.1. The quadratic variation of an adapted stochastic process $X_t$ is defined as
$$[X]_t := \operatorname*{p-lim}_{N\to\infty} \sum_{j=0}^{j_N} \big(X_{t_{j+1}^N} - X_{t_j^N}\big)^2,$$
where $\operatorname{p-lim}$ denotes a limit in probability and $\{t_j^N\}$ is a set partitioning the interval $[0,t]$ defined by
$$\Gamma^N := \big\{\{t_j^N\} : 0 = t_0^N < t_1^N < \cdots < t_{j_N}^N = t\big\} \tag{8.7}$$
with $|\Gamma^N| := \sup_j |t_{j+1}^N - t_j^N| \to 0$ as $N \to \infty$.

The process defined above is a sum of positive contributions and is therefore nondecreasing in $t$ a.s.

Now let $M_t$ be a [local] martingale. In light of Remark 1.3 we know that $M_t^2$ is a [local] submartingale. Hence, we would like to know if we can transform $M_t^2$ back into a martingale, for example by subtracting a "compensation process" removing the nondecreasing part of the squared

process. It turns out that such a process exists and is precisely the quadratic variation process. The intuition behind this result comes from the following computation: assume that $s < t$; then we have
$$\mathbb{E}[M_t M_s] = \mathbb{E}\big[M_s\, \mathbb{E}[M_t \mid \mathcal{F}_s]\big] = \mathbb{E}[M_s^2],$$
where in the second equality we have used the martingale property. As a consequence of this we can write
$$\mathbb{E}\big[(M_t - M_s)^2\big] = \mathbb{E}[M_t^2] - 2\mathbb{E}[M_t M_s] + \mathbb{E}[M_s^2] = \mathbb{E}[M_t^2] - \mathbb{E}[M_s^2]. \tag{8.8}$$
In particular, this implies that the summands in the definition of the quadratic variation can be expressed, in expectation, as differences of expectation values that cancel telescopically, leading to (part of) the following theorem.

Theorem 4.2. This theorem can be stated in the martingale and local martingale versions:

i) Let $M_t$ be a square-integrable martingale; then the quadratic variation process $[M]_t$ exists and $M_t^2 - [M]_t$ is a martingale.
ii) Let $M_t$ be a local martingale; then the quadratic variation process $[M]_t$ exists and $M_t^2 - [M]_t$ is a local martingale.

Proof. We only prove point i) of the theorem above. Point ii) follows for locally square integrable martingales by localization, i.e., by substituting $t \to \tau_n \wedge t$ where $\tau_n$ is the localizing sequence. Repeating the calculation leading to (8.8) with conditional expectations we obtain
$$\mathbb{E}\big[M_t^2 - M_s^2 \mid \mathcal{F}_s\big] = \mathbb{E}\Big[\sum_{j=1}^{j_N} \big(M_{t_j^N} - M_{t_{j-1}^N}\big)^2 \,\Big|\, \mathcal{F}_s\Big].$$
Now, taking the limit in probability of the right hand side (we do not prove that such a limit exists here, but we refer to [15]) and rearranging, we obtain that $\mathbb{E}\big[M_t^2 - [M]_t \mid \mathcal{F}_s\big] = M_s^2 - [M]_s$, as desired. $\square$

We conclude this section by proving a surprising result about martingales with finite first variation.

Lemma 4.3. Let $M_t$ be a continuous local martingale with finite first variation. Then $M_t$ is almost surely constant.

The intuition behind the above result is quite simple: on a continuum time interval, constraining a continuous martingale to behave "nicely" enough to have finite first variation (for example, monotonically or in a differentiable way), the martingale would somehow have to be "consistent with its trend at $t^-$" (except of course on a set of measure 0) and could therefore not respect the constant conditional expectation property. In other words, martingales with finite first variation are too "stiff" to be different from a constant.

Proof of Lemma 4.3. We assume for this proof that $M_t$ is a [locally] bounded martingale with $M_0 = 0$. We will show that the variance of $M_t$ is zero and hence $M_t$ is constant. Picking some partition of time $0 = t_0 < t_1 < \cdots < t_k = t$ and recalling (8.8), we consider the variance at time $t$:
$$\mathbb{E}M_t^2 = \sum_n \mathbb{E}\big(M_{t_n}^2 - M_{t_{n-1}}^2\big) = \sum_n \mathbb{E}\big(M_{t_n} - M_{t_{n-1}}\big)^2 \le \mathbb{E}\Big[ \sup_n |M_{t_n} - M_{t_{n-1}}| \sum_n |M_{t_n} - M_{t_{n-1}}| \Big].$$
Since the first variation $V(t) = \lim_{\Delta T\to 0} \sum_n |M_{t_n} - M_{t_{n-1}}|$ was assumed to be finite, we obtain
$$\mathbb{E}M_t^2 \le (\text{const})\, \mathbb{E}\Big[ \lim_{\Delta T\to 0} \sup_n |M_{t_n} - M_{t_{n-1}}| \Big];$$
this limit is zero because $M$ was assumed to be continuous. Hence the variance of $M_t$ is zero and thus $M_t$ is constant almost surely, i.e., $M_t$ is constant for any countable collection of times. Use the rational numbers and then continuity to conclude that it is constant, and the same constant, for all times. $\square$

5. Lévy–Doob Characterization of Brownian Motion

Theorem 5.1 (Lévy–Doob). If $X(t)$ is a continuous martingale such that
i) $X(0) = 0$,
ii) $X(t)$ is a square-integrable martingale with respect to the filtration it generates,
iii) $X(t)^2 - t$ is a square-integrable martingale with respect to the filtration it generates,
then $X(t)$ is a standard Brownian motion.

It is important that $X(t)$ be continuous. For example, if $N_t$ is a Poisson process with rate one, then $N_t - t$ and $(N_t - t)^2 - t$ are both martingales, but $N_t - t$ is quite different from Brownian motion.
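The counterexample above can be checked numerically: the compensated unit-rate Poisson process $M_t = N_t - t$ matches Brownian motion in its first two moments ($\mathbb{E}[M_t] = 0$, $\mathbb{E}[M_t^2] = t$), yet its paths are piecewise constant with unit jumps, so continuity is what Theorem 5.1 really buys. A small sanity check (sample size and time horizon are arbitrary choices):

```python
import numpy as np

# Check of the remark above: for a rate-one Poisson process N_t, the
# compensated process M_t = N_t - t satisfies E[M_t] = 0 and
# E[M_t^2] = t (the same first two moments as Brownian motion), yet it
# is not a Brownian motion since its paths jump.
rng = np.random.default_rng(5)
t = 4.0
n = 200_000

N_t = rng.poisson(t, size=n)   # N_t ~ Poisson(t) for a unit-rate process
M_t = N_t - t

mean_M = M_t.mean()            # should be near 0
var_M = (M_t**2).mean()        # should be near t
print(mean_M, var_M)
```

The moments agree with Brownian motion, but a single sampled path of $N_t - t$ (integer jumps minus a linear drift) makes the difference in law obvious.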

Proof. Our proof essentially follows that of Doob found in [2], which approaches the problem as a central limit theorem, proved through a clever computational trick. Fix a positive integer N and an ε ą 0. Define

$$\tau(\varepsilon, N) = \inf\Big\{s > 0 : \sup_{\substack{s_1 < s_2 < s \\ |s_1 - s_2| < 1/N}} |X(s_1) - X(s_2)| \ge \varepsilon\Big\}.$$
If there is no such time $s$, set $\tau = \infty$. Fix a time $t$. We want to show that the random variable $X(t)$ has the same Gaussian distribution as $B_t$. To do this it is enough to show that $\mathbb{E}e^{i\alpha X(t)} = e^{-\alpha^2 t/2}$, that is, to show that they both have the same characteristic function (Fourier transform). It is a standard result in basic probability that if a sequence of random variables has characteristic functions which converge for each $\alpha$ to $e^{-\alpha^2 t/2}$, then the sequence of random variables has a limit and it is Gaussian. See [1] for a nice discussion of characteristic functions and convergence of probability measures. Hence, we will show that

$$\mathbb{E}\big\{e^{i\alpha X(t\wedge\tau)}\big\} \to e^{-\frac{\alpha^2 t}{2}} + O(\varepsilon) \quad \text{as } N \to \infty,$$
for any $\varepsilon > 0$. Since $\varepsilon$ is arbitrary and the left-hand side is independent of $\varepsilon$, this will imply the result. Partition the interval $[0,t]$ with points $t_k = \frac{kt}{N}$. Set

$$I = \Big|\,\mathbb{E}\Big\{\prod_{j=1}^N e^{i\alpha(X_j - X_{j-1})} - \prod_{j=1}^N e^{-\frac{\alpha^2}{2}(t_j - t_{j-1})}\Big\}\Big|$$
where $X_j := X(t_j \wedge \tau)$. In general, observe that the following identity holds:

$$\begin{aligned}
A_1A_2A_3\cdots A_N - B_1B_2\cdots B_N &= A_1A_2\cdots A_{N-1}(A_N - B_N)\\
&\quad + A_1A_2\cdots A_{N-2}(A_{N-1} - B_{N-1})B_N\\
&\quad + A_1A_2\cdots A_{N-3}(A_{N-2} - B_{N-2})B_{N-1}B_N\\
&\qquad\vdots\\
&\quad + (A_1 - B_1)B_2B_3\cdots B_N.
\end{aligned}$$

Hence

$$I = \Big|\,\mathbb{E}\sum_{k=1}^N \Big(\prod_{j=1}^{N-k} e^{i\alpha(X_j - X_{j-1})}\Big)\Big(e^{i\alpha(X_{N-k+1} - X_{N-k})} - e^{-\frac{\alpha^2}{2}(t_{N-k+1} - t_{N-k})}\Big)\prod_{j=N-k+2}^{N} e^{-\frac{\alpha^2}{2}(t_j - t_{j-1})}\Big|.$$
All of the terms in the first product have modulus one and all of the terms in the second product are at most one. Hence
$$I \le \sum_{k=1}^N \mathbb{E}\,\Big|\mathbb{E}\Big\{e^{i\alpha(X_{N-k+1} - X_{N-k})} - e^{-\frac{\alpha^2}{2}(t_{N-k+1} - t_{N-k})}\,\Big|\,\mathcal{F}_{t_{N-k}\wedge\tau}\Big\}\Big|.$$
Now observe that by Taylor's theorem

$$e^{i\alpha\Delta_kX} - e^{-\frac{\alpha^2}{2}\Delta_kt} = i\alpha\Delta_kX - \frac{\alpha^2}{2}(\Delta_kX)^2 + \frac{\alpha^2}{2}\Delta_kt + O(\Delta_kX)^3 + O(\Delta_kt)^2$$
where $\Delta_kX = X_k - X_{k-1}$ and $\Delta_kt = t_k - t_{k-1}$. The constants implied by $O(\Delta_kX)^3$ and $O(\Delta_kt)^2$ can be taken to be uniformly bounded for $\varepsilon\in(0,\varepsilon_0]$ and $N\in[N_0,\infty)$ for some $\varepsilon_0 > 0$ and $N_0 < \infty$. Observe that by using the martingale assumptions on $X$ and the optional stopping lemma, we have that

$$\mathbb{E}\big\{\Delta_{N-k}X \,\big|\, \mathcal{F}_{t_{N-k}\wedge\tau}\big\} = 0, \qquad \mathbb{E}\big\{(\Delta_{N-k}X)^2 \,\big|\, \mathcal{F}_{t_{N-k}\wedge\tau}\big\} = \Delta_{N-k}(t\wedge\tau) \le t_{N-k} - t_{N-k-1}.$$

Here $\Delta_k(t\wedge\tau) = t_k\wedge\tau - t_{k-1}\wedge\tau$. By our definition of $\tau$, $|\Delta_{N-k}X| \le \varepsilon$. So we have
$$\mathbb{E}\big\{|\Delta_{N-k}X|^3\,\big|\,\mathcal{F}_{t_{N-k}\wedge\tau}\big\} \le \sup_{k,\omega}|\Delta_{N-k}X|\;\mathbb{E}\big\{(\Delta_{N-k}X)^2\,\big|\,\mathcal{F}_{t_{N-k}\wedge\tau}\big\} \le \varepsilon\,\mathbb{E}\big\{(\Delta_{N-k}X)^2\,\big|\,\mathcal{F}_{t_{N-k}\wedge\tau}\big\}.$$
And thus,

$$\begin{aligned}
I &\le \mathbb{E}\Big\{\sum_{k=1}^N \Big|\,\mathbb{E}\{i\alpha\Delta_kX\,|\,\mathcal{F}_{t_{N-k}}\} - \frac{\alpha^2}{2}\mathbb{E}\{(\Delta_kX)^2\,|\,\mathcal{F}_{t_{N-k}}\} + \frac{\alpha^2}{2}\Delta_kt + C(\Delta_kt)^2 + C\,\mathbb{E}\{|\Delta_kX|^3\,|\,\mathcal{F}_{t_{N-k}\wedge\tau}\}\Big|\Big\}\\
&\le N\Big[\Big(-\frac{\alpha^2}{2}\Big(\frac{t}{N}\Big) + \frac{\alpha^2}{2}\Big(\frac{t}{N}\Big)\Big) + C\Big(\frac{t}{N}\Big)^2 + C\varepsilon\Big(\frac{t}{N}\Big)\Big] = C\frac{t^2}{N} + \varepsilon Ct.
\end{aligned}$$
Observe that $\tau\to\infty$ as $N\to\infty$ for any fixed $\varepsilon$. Hence we have that

$$\Big|\,\mathbb{E}\big\{e^{i\alpha X(t)}\big\} - e^{-\frac{\alpha^2 t}{2}}\Big| \le \lim_{N\to\infty} I \le \varepsilon Ct.$$
Notice that the left-hand side is independent of $\varepsilon$. Since $C$ and $t$ are fixed and $\varepsilon$ was an arbitrary number in $(0,\varepsilon_0]$, we conclude that

$$\mathbb{E}\big\{e^{i\alpha X(t)}\big\} = e^{-\frac{\alpha^2 t}{2}}. \qquad\square$$
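As a sanity check on the identity just proved, one can compare the empirical characteristic function of a Gaussian sample against $e^{-\alpha^2 t/2}$ (a NumPy sketch; the values of $t$ and $\alpha$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
t, alpha, n = 2.0, 1.3, 500_000

# B_t ~ N(0, t); the empirical characteristic function E[exp(i*alpha*B_t)]
# should match exp(-alpha^2 * t / 2).
B = rng.normal(0.0, np.sqrt(t), size=n)
empirical = np.exp(1j * alpha * B).mean()
target = np.exp(-alpha**2 * t / 2)
print(abs(empirical - target))   # small Monte Carlo error
```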

We now give a slightly different formulation of the Lévy-Doob theorem. Let $M_t$ be a continuous martingale. Then by Theorem 4.2, if $[M]_t = t$, condition iii) of Theorem 5.1 is satisfied and we obtain the following result.

Theorem 5.2 (Lévy-Doob theorem). If $M_t$ is a continuous martingale with $[M]_t = t$ and $M_0 = 0$, then $M_t$ is standard Brownian motion.

6. Random time changes

Let $M_t$ be a continuous martingale with respect to the filtration $\mathcal{F}_t$. Since the quadratic variation map $t\mapsto[M]_t$ is non-decreasing, we can define its left-inverse by

$$\tau_t = \inf\{s \ge 0 : [M]_s \ge t\} \tag{8.9}$$
and the limiting value
$$[M]_\infty = \lim_{t\to\infty}[M]_t.$$

Theorem 6.1 (Dambis-Dubins-Schwarz). If $[M]_\infty > T$ then $B_t = M_{\tau_t}$ is a Brownian motion on the interval $[0,T]$ with respect to the filtration $\mathcal{G}_t = \mathcal{F}_{\tau_t}$. Conversely, there exists a standard Brownian motion $B_t$ such that $M_t = B_{[M]_t}$ for $t \ge 0$.

Remark 6.2. Theorem 6.1 shows that any continuous martingale is just a time change of Brownian motion, with $[M]_t$ giving the rate at which randomness is injected into the system.

Proof of Theorem 6.1. Now $[B]_t = [M]_{\tau_t} = t$ since $\tau_t$ is the inverse of $t\mapsto[M]_t$. Hence $B_t^2 - t$ is a martingale. Since
$$\mathbb{E}(B_t\,|\,\mathcal{G}_s) = \mathbb{E}(M_{\tau_t}\,|\,\mathcal{F}_{\tau_s}) = M_{\tau_s} = B_s,$$
we see that $B_t$ is also a martingale. Hence by the Lévy-Doob Theorem (Theorem 5.2), $B_t$ is a standard Brownian motion. The second statement follows from the first: since $\tau_{[M]_t} = t$, we have $B_{[M]_t} = M_{\tau_{[M]_t}} = M_t$. □

It is useful to see the above construction explicitly for an Itô process
$$dM_t = \sigma_t\,dB_t$$
where $\sigma_t > 0$ for all $t\ge0$. Then
$$\partial_t[M]_t = \sigma_t^2.$$
Since $\tau_t$, defined in (8.9), is the inverse of $t\mapsto[M]_t$, we have that
$$\partial_t\tau_t = \frac{1}{\sigma_t^2}.$$
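A small simulation makes the time change concrete (a NumPy sketch with the illustrative deterministic choice $\sigma_t = \sqrt{1+t}$, so that $[M]_t = t + t^2/2$ and $\tau$ can be inverted in closed form): sampling $M_{\tau_u}$ should produce mean zero and variance $u$, as for a Brownian motion.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, T, n_steps = 100_000, 3.0, 2_000
dt = T / n_steps

# dM_t = sigma_t dB_t with sigma_t = sqrt(1 + t), so [M]_t = t + t^2/2
# and the inverse clock is tau_u = -1 + sqrt(1 + 2u).
u = 1.5
tau = -1.0 + np.sqrt(1.0 + 2.0 * u)
k_tau = int(round(tau / dt))

M = np.zeros(n_paths)
for k in range(k_tau):            # Euler sum of the Ito integral up to tau_u
    s = k * dt
    M += np.sqrt(1.0 + s) * rng.normal(0.0, np.sqrt(dt), size=n_paths)

# B_u = M_{tau_u} should look like a Brownian motion at time u.
print(M.mean(), M.var())          # approximately 0 and u = 1.5
```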

7. Time Change for an sde

Consider the one dimensional sde

$$dX_t = \sigma(X_t)\,dB_t$$
with $\sigma(x) > 0$ and $\int_0^\infty \sigma^2(X_s)\,ds = \infty$ almost surely. (A simple condition which ensures this is $\sigma(x) \ge c > 0$ for all $x$.) Then as before
$$[X]_t = \int_0^t \sigma^2(X_s)\,ds \qquad\text{and}\qquad \tau_t = \inf\{s : [X]_s > t\} = \int_0^t \frac{du}{\sigma^2(X_{\tau_u})}.$$
Hence if we define $M_t = X_{\tau_t}$ then $[M]_t = [X]_{\tau_t} = t$, so $M_t$ is a Brownian motion, and
$$dM_t = \frac{1}{\sigma^2(X_{\tau_t})}\,dX_{\tau_t} = \frac{1}{\sigma(X_{\tau_t})}\,dB_{\tau_t} = \frac{1}{\sigma(M_t)}\,dB_{\tau_t}.$$
Now consider the sde
$$dX_t = \sigma^2(X_t)f(X_t)\,dt + \sigma(X_t)g(X_t)\,dB_t$$
where $\sigma:\mathbb{R}^d\to(0,\infty)$, $f:\mathbb{R}^d\to\mathbb{R}^d$, $g:\mathbb{R}^d\to\mathbb{R}^{d\times m}$, and $B_t$ is an $m$-dimensional Brownian motion. Then with the same definitions as above, define $dM_t = \sigma(X_t)\,dB_t$ and $B_t = M_{\tau_t}$ (on the left, $B_t$ denotes the new, time-changed process). Since $[M]_t = \int_0^t \sigma^2(X_s)\,ds$, we see that $B_t$ is a standard Brownian motion. Observe that $d\tau_t = \sigma^{-2}(X_{\tau_t})\,dt$ and
$$dB_t = \frac{1}{\sigma(X_{\tau_t})}\,dB_{\tau_t}.$$

Defining $Y_t = X_{\tau_t}$, we have that

$$dY_t = \frac{1}{\sigma^2(X_{\tau_t})}\,dX_{\tau_t} = \frac{1}{\sigma^2(X_{\tau_t})}\big(\sigma^2(X_{\tau_t})f(X_{\tau_t})\,dt + \sigma(X_{\tau_t})g(X_{\tau_t})\,dB_{\tau_t}\big) = f(Y_t)\,dt + g(Y_t)\frac{1}{\sigma(X_{\tau_t})}\,dB_{\tau_t} = f(Y_t)\,dt + g(Y_t)\,dB_t.$$

Hence Yt solves the equation

$$dY_t = f(Y_t)\,dt + g(Y_t)\,dB_t.$$

Note that it is only a weak solution, since the Brownian motion $B_t$ was constructed at the same time as the solution $Y_t$. A strong solution requires that the Brownian motion be specified in advance.

Remark 7.1. The same argument shows that if
$$dX_t = \sigma_t^2 f(X_t)\,dt + \sigma_t g(X_t)\,dB_t$$
for some positive, adapted stochastic process $\sigma_t$, and if $\tau_t = \int_0^t \sigma_s^{-2}\,ds$ and $Y_t = X_{\tau_t}$, then
$$dY_t = f(Y_t)\,dt + g(Y_t)\,dB_t$$
for the standard Brownian motion $B_t = M_{\tau_t}$, where $M_t = \int_0^t \sigma_s\,dB_s$.

8. Martingale representation theorem

9. Martingale inequalities

Theorem 9.1 (Doob's Martingale Inequality). Let $M_t$ be a continuous-time martingale with respect to the filtration $\mathcal{F}_t$. Then
$$\mathbb{P}\Big(\sup_{0\le t\le T}|M_t| \ge \lambda\Big) \le \frac{\mathbb{E}\big[|M_T|^p\big]}{\lambda^p}$$
where $\lambda\in\mathbb{R}^+$ and $p\ge1$.

We now prove a limited version of the very useful Burkholder-Davis-Gundy inequalities. For the complete versions see [4, 8, 14].

Theorem 9.2 (Burkholder's Inequality). Let $X_t = \int_0^t f_s(\omega)\,dB_s$. Then for any $p\ge1$,
$$\mathbb{E}\sup_{0\le s\le t}|X_s|^{2p} \le C_p\,\mathbb{E}\langle X\rangle_t^p$$
where $C_p$ is a constant independent of the process, depending only on $p$.

Proof. Doob's $L^p$ maximal inequality combined with Itô's formula implies that
$$\mathbb{E}\sup_{0\le s\le t}|X_s|^{2p} \le q^{2p}\,\mathbb{E}|X_t|^{2p} \tag{8.10}$$
$$= q^{2p}\,\mathbb{E}\Big\{\frac{2p(2p-1)}{2}\int_0^t |X_s|^{2(p-1)}f(s)^2\,ds\Big\} + q^{2p}\,\mathbb{E}\Big\{2p\int_0^t |X_s|^{2p-1}f(s)\,dB_s\Big\} \tag{8.11}$$
where $\frac1p + \frac1q = 1$. Next we introduce the stopping time
$$\tau_N = \inf\Big\{t\ge0 : \int_0^t |X_s|^{4p-2}|f_s|^2\,ds \ge N\Big\}.$$
Let $I_N(t) = 2p\int_0^t |X_{s\wedge\tau_N}|^{2p-1}f_{s\wedge\tau_N}\,dB_s$. Since the integrand is bounded by the construction of $\tau_N$, we have that $\mathbb{E}I_N(t) = 0$. Notice that
$$\mathbb{E}\Big\{2p\int_0^{t\wedge\tau_N}|X_s|^{2p-1}f(s)\,dB_s\Big\} = \mathbb{E}I_N(t\wedge\tau_N) = 0,$$
where the last equality follows from the Optional Stopping Theorem. Next observe that
$$\mathbb{E}\Big\{\int_0^t |X_s|^{2(p-1)}f(s)^2\,ds\Big\} \le \mathbb{E}\Big\{\sup_{0\le s\le t}|X_s|^{2(p-1)}\int_0^t f(s)^2\,ds\Big\} = \mathbb{E}\Big\{\sup_{0\le s\le t}|X_s|^{2(p-1)}\,\langle X\rangle_t\Big\};$$
then by Hölder's inequality we have that this is
$$\le \Big(\mathbb{E}\sup_{0\le s\le t}|X_s|^{2p}\Big)^{1-\frac1p}\big(\mathbb{E}\langle X\rangle_t^p\big)^{\frac1p}.$$
Putting everything together produces
$$\mathbb{E}\sup_{0\le s\le t\wedge\tau_N}|X_s|^{2p} \le C\Big(\mathbb{E}\sup_{0\le s\le t\wedge\tau_N}|X_s|^{2p}\Big)^{1-\frac1p}\Big(\mathbb{E}\langle X\rangle_{t\wedge\tau_N}^p\Big)^{\frac1p}.$$
By the definition of the stopping time everything is finite, hence we can divide through by the first term on the right to obtain
$$\Big(\mathbb{E}\sup_{0\le s\le t\wedge\tau_N}|X_s|^{2p}\Big)^{\frac1p} \le C\Big(\mathbb{E}\langle X\rangle_{t\wedge\tau_N}^p\Big)^{\frac1p};$$
raising everything to the power $p$ and then removing the stopping time by taking the limit as $N\to\infty$ gives the result. □
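Doob's inequality (with $p = 2$) is simple to check by simulation (a NumPy sketch; the horizon, level $\lambda$, and grid are arbitrary choices, and discretization only makes the left side smaller):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, T, lam = 20_000, 500, 1.0, 1.5
dt = T / n_steps

# Brownian paths on a grid; B_t is a martingale, so Doob with p = 2 gives
# P(sup |B_t| >= lam) <= E|B_T|^2 / lam^2.
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
lhs = (np.abs(B).max(axis=1) >= lam).mean()
rhs = (B[:, -1] ** 2).mean() / lam**2
print(lhs, rhs)   # lhs should not exceed rhs
```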


CHAPTER 9

Girsanov’s Theorem

1. An illustrative example

We begin with a simple example. We will frame it in a rather formal way, as this will make the analogies with later examples clearer. Consider the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ where $\Omega = \mathbb{R}$ and $\mathbb{P}$ is the standard Gaussian measure with mean zero and variance one. (For completeness, let $\mathcal{F}$ be the Borel $\sigma$-algebra on $\mathbb{R}$.) Now consider two random variables $Z$ and $\tilde Z$ on this probability space. As always, a real-valued random variable is a function from $\Omega$ into $\mathbb{R}$. Let us define $Z(\omega) = \omega$ and $\tilde Z(\omega) = \omega + \mu$ for some fixed constant $\mu$. Since $\omega$ is drawn under $\mathbb{P}$ with respect to the $N(0,1)$ measure on $\mathbb{R}$, we have that $Z$ is distributed $N(0,1)$ and $\tilde Z$ is distributed $N(\mu,1)$. Now let us introduce the density function associated to $\mathbb{P}$:
$$\varphi(\omega) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{\omega^2}{2}\Big).$$
Now we introduce the function
$$\rho_\mu(\omega) = \frac{\varphi(\omega+\mu)}{\varphi(\omega)} = \exp\Big(-\omega\mu - \frac{\mu^2}{2}\Big).$$
Since $\rho_\mu$ is a function from $\Omega$ to $\mathbb{R}$, it can be viewed as a random variable, and we have
$$\mathbb{E}_\mathbb{P}\,\rho_\mu = \int_\Omega \rho_\mu(\omega)\,\mathbb{P}(d\omega) = \int_{-\infty}^\infty \rho_\mu(\omega)\varphi(\omega)\,d\omega = \int_{-\infty}^\infty \varphi(\omega+\mu)\,d\omega = 1,$$
since $\varphi(\omega+\mu)$ is the density of an $N(-\mu,1)$ random variable. Hence $\rho_\mu$ is an $L^1(\Omega,\mathbb{P})$ random variable, and we can define a new measure $\mathbb{Q}$ on $\Omega$ by

$$\mathbb{Q}(d\omega) = \rho_\mu(\omega)\,\mathbb{P}(d\omega).$$
This means that for any random variable $X$ on $\Omega$, the expected value with respect to $\mathbb{Q}$, denoted by $\mathbb{E}_\mathbb{Q}$, is defined by

$$\mathbb{E}_\mathbb{Q}[X] = \mathbb{E}_\mathbb{P}[X\rho_\mu].$$
Furthermore, observe that for any bounded $f:\mathbb{R}\to\mathbb{R}$,
$$\mathbb{E}_\mathbb{Q}f(\tilde Z) = \mathbb{E}_\mathbb{P}[f(\tilde Z)\rho_\mu] = \int_{-\infty}^\infty f(\tilde Z(\omega))\rho_\mu(\omega)\varphi(\omega)\,d\omega = \int_{-\infty}^\infty f(\omega+\mu)\varphi(\omega+\mu)\,d\omega = \int_{-\infty}^\infty f(\omega)\varphi(\omega)\,d\omega = \mathbb{E}_\mathbb{P}f(Z),$$
which implies that

Under the measure $\mathbb{Q}$: $\tilde Z$ is distributed $N(0,1)$.
Under the measure $\mathbb{P}$: $\tilde Z$ is distributed $N(\mu,1)$.

Now let us consider a higher dimensional version. Let $\Omega = \mathbb{R}^n$ and let $\mathbb{P}$ be the $n$-dimensional Gaussian measure with covariance $\sigma^2 I$, where $\sigma > 0$ and $I$ is the $n\times n$ identity matrix. In analogy to before, we define, for $\omega = (\omega_1,\dots,\omega_n)\in\mathbb{R}^n$,
$$\varphi(\omega) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^n \omega_i^2\Big)$$
and, for $\mu = (\mu_1,\dots,\mu_n)\in\mathbb{R}^n$,
$$\rho_\mu(\omega) = \frac{\varphi(\omega+\mu)}{\varphi(\omega)} = \exp\Big(-\frac{1}{\sigma^2}\sum_{i=1}^n \omega_i\mu_i - \frac{1}{2\sigma^2}\sum_{i=1}^n \mu_i^2\Big).$$
Now define the $\mathbb{R}^n$-valued random variables $Z(\omega) = (Z_1(\omega),\dots,Z_n(\omega))$ and $\tilde Z(\omega) = (\tilde Z_1(\omega),\dots,\tilde Z_n(\omega)) = Z(\omega) + \mu$. If we define $\mathbb{Q}(d\omega) = \rho_\mu(\omega)\,\mathbb{P}(d\omega)$, then following the same reasoning as before,

Under the measure $\mathbb{Q}$: $\tilde Z$ is distributed $N(0,\sigma^2 I)$.
Under the measure $\mathbb{P}$: $\tilde Z$ is distributed $N(\mu,\sigma^2 I)$.

Example 1.1 (Importance Sampling). Let $f:\mathbb{R}\to\mathbb{R}$, and let $X$ be distributed $N(0,1)$. We have:

$$\mathbb{E}f(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty f(x)\,e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty f(x)\,e^{-\frac{(x-\mu)^2}{2}}\,e^{\frac{\mu^2}{2}-\mu x}\,dx$$

for some $\mu\in\mathbb{R}$. For $n$ large and $\{X_i\}_{i=1}^n$ iid $N(0,1)$, we have:

$$\mathbb{E}[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(X_i).$$

Let Y be distributed N(µ, 1). We have:

$$\mathbb{E}[f(X)] = \mathbb{E}\Big[f(Y)\,e^{-\mu Y + \frac{\mu^2}{2}}\Big] \approx \frac{1}{n}\sum_{i=1}^n f(Y_i)\,e^{-\mu Y_i + \frac{\mu^2}{2}}$$

n for tYiui“1 iid N(µ, 1).
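A concrete use of this identity is rare-event estimation. The sketch below (NumPy; $f = \mathbb{1}_{\{x>3\}}$ and $\mu = 3$ are illustrative choices) compares the plain Monte Carlo estimator of $\mathbb{P}(X > 3)$ with the tilted estimator from the last display:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
n, mu = 100_000, 3.0

# Plain Monte Carlo: X ~ N(0,1); the event {X > 3} is rare.
X = rng.normal(size=n)
plain = (X > 3.0).mean()

# Importance sampling: draw Y ~ N(mu, 1) and reweight with exp(-mu*Y + mu^2/2),
# exactly the identity E f(X) = E[f(Y) exp(-mu*Y + mu^2/2)].
Y = rng.normal(mu, 1.0, size=n)
tilted = ((Y > 3.0) * np.exp(-mu * Y + mu**2 / 2)).mean()

exact = 0.5 * (1.0 - erf(3.0 / sqrt(2.0)))   # P(X > 3) for a standard normal
print(plain, tilted, exact)
```

The tilted estimator concentrates samples where $f$ is nonzero, so its relative error is far smaller than the plain estimator's at the same sample size.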

2. Tilted Brownian motion

Let $X_t = B_t + \mu t$ where $B_t$ is standard Brownian motion and $\mu\in\mathbb{R}$. Let $0 = t_0 < t_1 < \cdots < t_n \le T$, and let $f, g:\mathbb{R}^n\to\mathbb{R}$ be such that $f(x_1, x_2, \dots, x_n) = g(x_1, x_2 - x_1, \dots, x_n - x_{n-1})$. We have:

$$\mathbb{E}\big[g(X_{t_1}, X_{t_2} - X_{t_1},\dots, X_{t_n} - X_{t_{n-1}})\big] = \int_{\mathbb{R}^n} g(x_1, x_2 - x_1,\dots,x_n - x_{n-1})\,\mathbb{P}(X_{t_1}\in dx_1,\dots,X_{t_n}\in dx_n)$$
where
$$\mathbb{P}(X_{t_1}\in dx_1,\dots,X_{t_n}\in dx_n) = \frac{\prod_{i=1}^n \exp\Big(-\frac{[(x_i - x_{i-1}) - \mu(t_i - t_{i-1})]^2}{2(t_i - t_{i-1})}\Big)}{(2\pi)^{n/2}\,t_1^{1/2}(t_2 - t_1)^{1/2}\cdots(t_n - t_{n-1})^{1/2}}\,dx_1\cdots dx_n$$
with $x_0 = 0$ and $t_0 = 0$. Now:

$$\prod_{i=1}^n e^{-\frac{[(x_i - x_{i-1}) - \mu(t_i - t_{i-1})]^2}{2(t_i - t_{i-1})}} = \prod_{i=1}^n e^{-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}}\;\prod_{i=1}^n e^{\mu(x_i - x_{i-1}) - \frac{\mu^2}{2}(t_i - t_{i-1})} = \Big(\prod_{i=1}^n e^{-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}}\Big)\,e^{\mu x_n - \frac{\mu^2}{2}t_n}.$$

And so:

$$\mathbb{E}\big[f(X_{t_1},\dots,X_{t_n})\big] = \mathbb{E}\Big[f(B_{t_1},\dots,B_{t_n})\,e^{\mu B_{t_n} - \frac12\mu^2 t_n}\Big] = \mathbb{E}\big[f(B_{t_1},\dots,B_{t_n})\,M_{t_n}\big]$$

where $M_t$ is the exponential martingale $e^{\mu B_t - \frac12\mu^2 t}$.
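The reweighting identity above can be checked numerically at a single time (a NumPy sketch; $f(x) = x^2$, $T = 1$, and $\mu = 0.7$ are illustrative choices): weighting samples of $B_T$ by $M_T$ reproduces moments of $X_T = B_T + \mu T$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, mu = 200_000, 1.0, 0.7

B = rng.normal(0.0, np.sqrt(T), size=n)      # samples of B_T
M = np.exp(mu * B - 0.5 * mu**2 * T)         # exponential martingale at time T

# E[f(X_T)] with X_T = B_T + mu*T and f(x) = x^2, computed two ways:
weighted = (B**2 * M).mean()                 # E[f(B_T) M_T]
exact = T + (mu * T) ** 2                    # Var + mean^2 of N(mu*T, T)
print(weighted, exact)
```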

3. Shifting Wiener Measure: a Simple Cameron-Martin Theorem

3.1. Some calculations. Let $Z_1,\dots,Z_n$ be iid $N(0,1)$ on $(\Omega,\mathcal{F},\mathbb{P})$. Also let $f:\mathbb{R}^n\to\mathbb{R}$. Then:

$$\mathbb{E}_\mathbb{P}f(Z_1,\dots,Z_n) = \int_{\Omega\times\cdots\times\Omega} f(\zeta_1,\dots,\zeta_n)\,\mathbb{P}(Z_1\in d\zeta_1,\dots,Z_n\in d\zeta_n) = \int_{\mathbb{R}^n} f(\zeta_1,\dots,\zeta_n)\Big(\frac{1}{2\pi}\Big)^{n/2} e^{-\frac12\sum_{i=1}^n\zeta_i^2}\,d\zeta_1\cdots d\zeta_n.$$

Now define a new measure $\tilde{\mathbb{P}}$:
$$\tilde{\mathbb{P}}(d\omega) = Y\,\mathbb{P}(d\omega), \qquad\text{where } Y = e^{\sum_{i=1}^n \mu_i Z_i(\omega) - \frac12\sum_{i=1}^n \mu_i^2} \text{ and } \mu = (\mu_1,\dots,\mu_n)\in\mathbb{R}^n.$$

The expectation of $f$ under $\tilde{\mathbb{P}}$ is given by:
$$\mathbb{E}_{\tilde{\mathbb{P}}}f(Z_1,\dots,Z_n) = \int_{\Omega\times\cdots\times\Omega} f(\zeta_1,\dots,\zeta_n)\,e^{\sum_{i=1}^n\mu_i\zeta_i - \frac12\sum_{i=1}^n\mu_i^2}\,\mathbb{P}(Z_1\in d\zeta_1,\dots,Z_n\in d\zeta_n) = \int_{\mathbb{R}^n} f(\zeta_1,\dots,\zeta_n)\Big(\frac{1}{2\pi}\Big)^{n/2} e^{-\frac12\sum_{i=1}^n(\zeta_i - \mu_i)^2}\,d\zeta_1\cdots d\zeta_n.$$

Now let $\tilde Z = Z - \mu$. We see that:

$$\mathbb{E}_{\tilde{\mathbb{P}}}Z_i = \mu_i, \qquad \mathbb{E}_{\tilde{\mathbb{P}}}(Z_i - \mu_i)^2 = 1, \qquad \mathbb{E}_\mathbb{P}Z_i = 0, \qquad \mathbb{E}_\mathbb{P}Z_i^2 = 1.$$

We also have:

$$\mathbb{E}_\mathbb{P}f(Z_1,\dots,Z_n) = \mathbb{E}_{\tilde{\mathbb{P}}}f(\tilde Z_1,\dots,\tilde Z_n) = \mathbb{E}_\mathbb{P}\big[Y f(\tilde Z_1,\dots,\tilde Z_n)\big] = \mathbb{E}_\mathbb{P}\big[Y f(Z_1 - \mu_1,\dots, Z_n - \mu_n)\big].$$

4. Girsanov's Theorem for sdes

4.1. Girsanov (1D).

Definition 4.1. Given two measures $\mu, \nu$, we say $\nu \ll \mu$ ($\nu$ is absolutely continuous with respect to $\mu$) if $\mu(A) = 0 \Rightarrow \nu(A) = 0$ for all measurable sets $A$.

Lemma 4.2. Let $\mu$ and $\nu$ be probability measures on $(\Omega,\mathcal{G})$ with $d\nu(\omega) = f(\omega)\,d\mu(\omega)$ for some $f\in L^1(\mu)$. Let $X$ be a random variable with:

$$\mathbb{E}_\nu|X| = \int |X(\omega)|\,d\nu(\omega) = \int |X(\omega)|f(\omega)\,d\mu(\omega) < \infty.$$
If $\mathcal{H}\subset\mathcal{G}$ is a $\sigma$-algebra, then:

$$\mathbb{E}_\nu[X\,|\,\mathcal{H}]\;\mathbb{E}_\mu[f\,|\,\mathcal{H}] = \mathbb{E}_\mu[fX\,|\,\mathcal{H}].$$
Let $dY_t = a_t(\omega)\,dt + dB_t$ and define
$$M_t = e^{-\int_0^t a_s\,dB_s - \frac12\int_0^t a_s^2\,ds}.$$
($\star$) Assume $\mathbb{E}|M_t| < \infty$ so that $M_t$ is a martingale. ($\star$) holds if any of the following hold:

i) There exists $c(t)$ such that $|a_s(\omega)| \le c(t)$ for all $0 < s \le t$ a.s.
ii) Novikov's condition: $\mathbb{E}_\mathbb{P}\big[e^{\frac12\int_0^t a_s^2\,ds}\big] < \infty$.

Claim 4.3. If $d\mathbb{Q}(\omega) = M_T(\omega)\,d\mathbb{P}(\omega)$, then for $t \le T$, $Y_t$ is a standard Brownian motion (SBM) with respect to $\mathbb{Q}$.

Proof: We want to show that $Y_t$ is a SBM wrt $\mathbb{Q}$. We are going to show:

i) $Y_t$ is a martingale wrt $\mathbb{Q}$;
ii) $Y_t^2 - t$ is a martingale wrt $\mathbb{Q}$.

Assuming $Y_0 = 0$, Lévy-Doob then implies that $Y_t$ is a SBM wrt $\mathbb{Q}$. Let $K_t = M_tY_t = g(M_t, Y_t)$, where $g(x,y) = xy$. Then
$$dK_t = M_t\,dY_t + Y_t\,dM_t + dY_t\,dM_t.$$

$$dM_t = -M_ta_t\,dB_t$$
$$\therefore\quad dK_t = M_t(a_t\,dt + dB_t) - Y_tM_ta_t\,dB_t - M_ta_t\,dt = M_t(1 - Y_ta_t)\,dB_t.$$

And so Kt is a martingale wrt P. Now,

$$\mathbb{E}_\mathbb{Q}[Y_t\,|\,\mathcal{F}_s] = \frac{\mathbb{E}_\mathbb{P}[M_tY_t\,|\,\mathcal{F}_s]}{\mathbb{E}_\mathbb{P}[M_t\,|\,\mathcal{F}_s]} = \frac{K_s}{M_s} = Y_s.$$

And so $Y_t$ is a martingale wrt $\mathbb{Q}$. An analogous computation with $K_t = M_t(Y_t^2 - t)$ shows that $Y_t^2 - t$ is a martingale wrt $\mathbb{Q}$, which completes the proof.

Theorem 4.4. Let:

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$$
$$dY_t = [\gamma(t,\omega) + b(Y_t)]\,dt + \sigma(Y_t)\,dB_t, \qquad X(0) = Y(0) = x.$$
Assume there exists $u(t,\omega)$ with:

i) $\sigma(Y(t,\omega))\,u(t,\omega) = \gamma(t,\omega)$ a.s.;
ii) $\mathbb{E}\big[e^{\frac12\int_0^T u_s^2\,ds}\big] < \infty$.

Let
$$M_T = e^{-\int_0^T u_s\,dB_s - \frac12\int_0^T u_s^2\,ds}, \qquad d\mathbb{Q} = M_T\,d\mathbb{P}, \qquad \hat B_t = \int_0^t u(s,\omega)\,ds + B_t,$$
with $\mathbb{Q}$ the law of $Y_t$ and $\mathbb{P}$ the law of $X_t$. Then $Y_t$ solves
$$dY_t = b(Y_t)\,dt + \sigma(Y_t)\,d\hat B_t$$
and $\hat B_t$ is a BM wrt $\mathbb{Q}$.

Now let $X_t$, $Y_t$ be defined by:

$$dX_t = a(X_t)\,dt + \sigma(X_t)\,dB_t$$
$$dY_t = b(Y_t)\,dt + \sigma(Y_t)\,dB_t$$
$$X_0 = Y_0 = x.$$
Define $u(y)$ by $\sigma(y)u(y) = b(y) - a(y)$ and let
$$M_t = e^{-\int_0^t u(Y_s)\,dB_s - \frac12\int_0^t |u(Y_s)|^2\,ds}, \qquad d\mathbb{Q} = M_t\,d\mathbb{P}.$$
The $\mathbb{Q}$-law of $Y_t$ is the same as the $\mathbb{P}$-law of $X_t$. Setting
$$\hat B_t = B_t + \int_0^t u(Y_s)\,ds,$$
under $\mathbb{Q}$, $\hat B_t$ is a standard BM.
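This drift-changing mechanism can be exercised numerically (a NumPy sketch with illustrative choices: simulated process driftless, target Ornstein-Uhlenbeck drift $b(x) = -x$, $\sigma \equiv 1$, Euler discretization): expectations under the SDE with drift are recovered by weighting functionals of driftless Brownian paths with the Girsanov density, here checked against the known variance of the OU process.

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, n_steps, T = 100_000, 400, 1.0
dt = T / n_steps
b = lambda x: -x                  # drift of the target SDE (OU, illustrative)

B = np.zeros(n_paths)             # driftless Brownian paths
logM = np.zeros(n_paths)          # log weight: int b(B) dB - 1/2 int b(B)^2 dt
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    logM += b(B) * dB - 0.5 * b(B) ** 2 * dt
    B += dB

# E[X_T^2] for dX = -X dt + dB, X_0 = 0, estimated as E[B_T^2 M_T].
est = (B**2 * np.exp(logM)).mean()
exact = (1.0 - np.exp(-2.0 * T)) / 2.0   # OU variance at time T
print(est, exact)
```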

$$dY_t = a(Y_t)\,dt + \sigma(Y_t)\,d\hat B_t = a(Y_t)\,dt + \sigma(Y_t)\big[u(Y_t)\,dt + dB_t\big] = a(Y_t)\,dt + b(Y_t)\,dt - a(Y_t)\,dt + \sigma(Y_t)\,dB_t = b(Y_t)\,dt + \sigma(Y_t)\,dB_t.$$
And so the law of $Y_t$ on $C(0,T;\mathbb{R}^n)$ is equivalent to the law of $X_t$ on $C(0,T;\mathbb{R}^n)$.

Example 4.5 (Brownian motion tracking a continuous function). Let $B_t$ be a BM starting at 0. Let $g(t)$ be a continuous function with $g(0) = 0$. We would like to establish the truth of the following statement:

$$\mathbb{P}\Big(\sup_{0\le t\le 1}|B_t - g(t)| < \varepsilon\Big) > 0.$$
There exists $h\in C^1$ such that $|g(s) - h(s)| \le \frac{\varepsilon}{2}$ for $0 < s \le 1$. By the triangle inequality,
$$|B_t - g(t)| \le |B_t - h(t)| + |h(t) - g(t)|,$$
and so we need only show that
$$\mathbb{P}\Big(\sup_{0\le t\le 1}|B_t - h(t)| < \frac{\varepsilon}{2}\Big) > 0.$$
Let the event $G$ be given by
$$G = \Big\{|B_s - h(s)| < \frac{\varepsilon}{2},\; s\in[0,1]\Big\} = \Big\{|X_s| < \frac{\varepsilon}{2},\; s\in[0,1]\Big\},$$
where we define $X_s = B_s - h(s)$. We then see that
$$dX_s = dB_s - h'(s)\,ds \qquad\text{and}\qquad M_t = e^{\int_0^t h'(s)\,dB_s - \frac12\int_0^t|h'(s)|^2\,ds}.$$
With $d\mathbb{Q} = M_t\,d\mathbb{P}$, we see that:

i) Under $\mathbb{Q}$, $X_t$ is a standard BM.
ii) $\mathbb{P}\sim\mathbb{Q}$ (i.e., $\mathbb{Q}(G) > 0 \Rightarrow \mathbb{P}(G) > 0$).

$$\mathbb{Q}(G) = \int_G \frac{d\mathbb{Q}}{d\mathbb{P}}\,d\mathbb{P} \le \Big(\int \Big(\frac{d\mathbb{Q}}{d\mathbb{P}}\Big)^2\,d\mathbb{P}\Big)^{1/2}\,\mathbb{P}(G)^{1/2},$$
and so:
$$\mathbb{P}(G) \ge \frac{\mathbb{Q}(G)^2}{\int\big(\frac{d\mathbb{Q}}{d\mathbb{P}}\big)^2\,d\mathbb{P}}.$$
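The positivity claim can be observed directly by simulation (a NumPy sketch; $g(t) = \tfrac12\sin(\pi t)$ and $\varepsilon = 1$ are illustrative choices satisfying $g(0) = 0$): a visible fraction of Brownian paths stays in the $\varepsilon$-tube around $g$.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_steps, eps = 20_000, 500, 1.0
dt = 1.0 / n_steps
t = np.linspace(dt, 1.0, n_steps)
g = 0.5 * np.sin(np.pi * t)          # a continuous g with g(0) = 0 (illustrative)

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
in_tube = (np.abs(B - g).max(axis=1) < eps).mean()
print(in_tube)                       # a strictly positive fraction tracks g
```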

Bibliography

1. Leo Breiman, Probability, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992, Corrected reprint of the 1968 original. MR 93d:60001
2. J. L. Doob, Stochastic processes, John Wiley & Sons Inc., New York, 1953. MR 15,445b
3. Richard Durrett, Stochastic calculus, Probability and Stochastics Series, CRC Press, Boca Raton, FL, 1996, A practical introduction. MR 1398879 (97k:60148)
4. ———, Stochastic calculus, a practical introduction, CRC Press, 1996.
5. John Guckenheimer and Philip Holmes, Nonlinear oscillations, dynamical systems, and bifurcations of vector fields, Applied Mathematical Sciences, vol. 42, Springer-Verlag, New York, 1990, Revised and corrected reprint of the 1983 original.
6. Philip Hartman, Ordinary differential equations, second ed., Birkhäuser, Boston, Mass., 1982.
7. Ioannis Karatzas and Steven E. Shreve, Brownian motion and stochastic calculus, second ed., Graduate Texts in Mathematics, vol. 113, Springer-Verlag, New York, 1991. MR 1121940 (92h:60127)
8. ———, Brownian motion and stochastic calculus, second ed., Springer-Verlag, New York, 1991. MR 92h:60127
9. Fima C. Klebaner, Introduction to stochastic calculus with applications, third ed., Imperial College Press, London, 2012. MR 2933773
10. N. V. Krylov, Introduction to the theory of random processes, Graduate Studies in Mathematics, vol. 43, American Mathematical Society, Providence, RI, 2002. MR 1885884 (2003d:60001)
11. Hui-Hsiung Kuo, Introduction to stochastic integration, Universitext, Springer, New York, 2006. MR 2180429 (2006e:60001)
12. Peter Mörters and Yuval Peres, Brownian motion, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 2010, With an appendix by Oded Schramm and Wendelin Werner. MR 2604525 (2011i:60152)
13. Bernt Øksendal, Stochastic differential equations, fifth ed., Universitext, Springer-Verlag, Berlin, 1998, An introduction with applications. MR 1619188 (99c:60119)
14. Philip Protter, Stochastic integration and differential equations: a new approach, Springer-Verlag, 1990.
15. Philip E. Protter, Stochastic integration and differential equations, Stochastic Modelling and Applied Probability, vol. 21, Springer-Verlag, Berlin, 2005, Second edition, Version 2.1, Corrected third printing. MR 2273672 (2008e:60001)
16. Walter A. Strauss, Partial differential equations, John Wiley & Sons Inc., New York, 1992, An introduction. MR 92m:35001
17. S. J. Taylor, Exact asymptotic estimates of Brownian path variation, Duke Mathematical Journal 39 (1972), no. 2, 219–241. MR 0295434, Zbl 0241.60069


APPENDIX A

Some Results from Analysis

Recall that, given a probability space $(\Omega,\Sigma,\mathbb{P})$ and a random variable $X$ on such a space, we define the expectation of a function $f$ of $X$ as the integral
$$\mathbb{E}[f(X)] = \int_\Omega f(X(\omega))\,\mathbb{P}(d\omega),$$
where $\mathbb{P}$ denotes the (probability) measure against which we are integrating. The following results are stated for a general measure $\mu$ (i.e., not necessarily a probability measure).

Theorem 0.1 (Lebesgue's Dominated Convergence theorem). Let $\{f_n\}$ be a sequence of measurable functions on a measure space $(\Omega,\Sigma,\mu)$. Suppose that the sequence converges pointwise to a function $f$ and is dominated by some integrable function $g$ in the sense that
$$|f_n(x)| \le g(x)$$
for all $n$ in the index set of the sequence and all points $x\in\Omega$. Then $f$ is integrable and

$$\lim_{n\to\infty}\int_\Omega |f_n - f|\,d\mu = 0,$$
which also implies
$$\lim_{n\to\infty}\int_\Omega f_n\,d\mu = \int_\Omega f\,d\mu.$$

Theorem 0.2 (Fatou's Lemma). Given a measure space $(\Omega,\Sigma,\mu)$ and a set $X\in\Sigma$, let $\{f_n\}$ be a sequence of $(\Sigma,\mathcal{B}_{\mathbb{R}_{\ge0}})$-measurable non-negative functions $f_n: X\to[0,+\infty]$. Define the function $f: X\to[0,+\infty]$ by setting
$$f(x) = \liminf_{n\to\infty} f_n(x)$$
for every $x\in X$. Then $f$ is $(\Sigma,\mathcal{B}_{\mathbb{R}_{\ge0}})$-measurable, and

$$\int_X f\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu,$$
where the integrals may be finite or infinite.
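A standard example shows the inequality can be strict (a NumPy sketch with $f_n = n\,\mathbb{1}_{(0,1/n)}$ on $[0,1]$, an illustrative choice): $f_n \to 0$ pointwise, yet each integral equals one.

```python
import numpy as np

# f_n(x) = n on (0, 1/n) and 0 elsewhere: f_n -> 0 pointwise on (0, 1],
# while every integral over [0, 1] equals 1, so Fatou's inequality
# 0 = integral(liminf f_n) <= liminf integral(f_n) = 1 is strict.
m = 1_000_000
x = np.linspace(0.0, 1.0, m + 1)[1:]    # grid on (0, 1], avoiding x = 0
dx = 1.0 / m

for n in (10, 100, 1000):
    fn = np.where(x < 1.0 / n, float(n), 0.0)
    print(n, fn.sum() * dx)             # each Riemann sum is approximately 1
```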

Remark 0.3. The above theorem can in particular be used when $f_n$ is the indicator function $\mathbb{1}_{A_n}$ for a sequence of sets $\{A_n\}\subset\Sigma$, obtaining
$$\mu\big(\liminf_{n\to\infty} A_n\big) = \int_\Omega \liminf_{n\to\infty}\mathbb{1}_{A_n}\,d\mu \le \liminf_{n\to\infty}\int_\Omega \mathbb{1}_{A_n}\,d\mu = \liminf_{n\to\infty}\mu(A_n).$$
