Lecture 8: Stochastic Differential Equations

Readings
Recommended:
• Pavliotis (2014) 3.2-3.5
• Oksendal (2005) Ch. 5
Optional:
• Gardiner (2009) 4.3-4.5
• Oksendal (2005) 7.1, 7.2 (on the Markov property)
• Koralov and Sinai (2010) 21.4 (on the Markov property)

We’d like to understand solutions to the following type of equation, called a Stochastic Differential Equation (SDE):

dX_t = b(X_t, t) dt + σ(X_t, t) dW_t. (1)

Recall that (1) is short-hand for the integral equation

X_t = X_0 + ∫_0^t b(X_s, s) ds + ∫_0^t σ(X_s, s) dW_s. (2)

In the physics literature, you will often see (1) written as

dx/dt = b(x, t) + σ(x, t) η(t),

where η(t) is a white noise: a mean-zero Gaussian process with covariance function Eη(s)η(t) = δ(t − s). Each term in (1) has a different interpretation.

• The term b(X_t, t) dt is called the drift term. It describes the deterministic part of the equation. When this is the only term, we obtain a canonical ODE.
• The term σ(X_t, t) dW_t is called the diffusion term. It describes random motion proportional to a Brownian motion. Over small times, this term causes the probability to spread out diffusively with a diffusivity locally proportional to σ².

If the diffusion term is constant, i.e. σ(x, t) ≡ σ ∈ R, then the noise is said to be additive. If the diffusion term depends on x, i.e. ∂σ(x,t)/∂x ≠ 0 in (1), the noise is said to be multiplicative. We will see that equations with multiplicative noise have to be treated more carefully than equations with additive noise.

We learned how to define the stochastic integrals in the expressions above last class. In this one we’ll look at properties of the solutions themselves. We will ask: when do solutions exist? Are they unique? And how can we actually solve them, and extract useful information?
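Before turning to the theory, it can help to see how an equation like (1) is simulated in practice. The sketch below uses the Euler-Maruyama scheme, X_{n+1} = X_n + b(X_n, t_n)Δt + σ(X_n, t_n)ΔW_n with ΔW_n ∼ N(0, Δt). It is written in pure Python with illustrative parameters; it is a minimal sketch, not a definitive implementation.

```python
import math
import random

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = b(X,t) dt + sigma(X,t) dW by Euler-Maruyama."""
    dt = T / n_steps
    x, t = x0, 0.0
    path = [x0]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + b(x, t) * dt + sigma(x, t) * dW
        t += dt
        path.append(x)
    return path

# Example: b = 0, sigma = 1 reduces to Brownian motion itself,
# so the sample variance of X_T should be close to T.
rng = random.Random(0)
T, n_paths = 1.0, 2000
finals = [euler_maruyama(lambda x, t: 0.0, lambda x, t: 1.0, 0.0, T, 100, rng)[-1]
          for _ in range(n_paths)]
var = sum(v * v for v in finals) / n_paths
print(round(var, 2))  # should be close to T = 1.0
```

With drift zero and unit diffusion the scheme is exact, so this checks only the bookkeeping; for general b, σ the scheme has weak order 1.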

Miranda Holmes-Cerfon, Applied Stochastic Analysis, Spring 2019

8.1 Existence and uniqueness

Definition. A process X = (X_t)_{t≥0} is a strong solution to the SDE (1) for 0 ≤ t ≤ T if X is continuous with probability 1, X is adapted¹ (to W_t), b(X_t, t) ∈ L^1(0, T), σ(X_t, t) ∈ L^2(0, T), and Equation (2) holds with probability 1 for all 0 ≤ t ≤ T.

Definition. A strong solution X to an SDE of the form (1) is called a diffusion process.

Remark. To be a diffusion process, it is important that the coefficients of (1) depend only on (X_t, t) – they can’t be general adapted functions f(ω, t).

Theorem. Given equation (1), suppose b : R^n × [0, T] → R^n and σ : R^n × [0, T] → R^{n×m} satisfy the global Lipschitz and linear growth conditions:

|b(x,t) − b(y,t)| + |σ(x,t) − σ(y,t)| ≤ K|x − y|,
|b(x,t)| + |σ(x,t)| ≤ K(1 + |x|),

for all x, y ∈ R^n, t ∈ [0, T], where K > 0 is a constant. Assume the initial value X_0 = ξ is a random variable with Eξ² < ∞ which is independent of (W_t)_{t≥0}. Then (1) has a unique strong solution X.

Remark. “Unique” means that if X_1, X_2 are two strong solutions, then P(X_1(t,ω) = X_2(t,ω) for all t) = 1. That is, the two solutions are equal everywhere with probability 1. This is different from the statement that X_1, X_2 are versions of each other – you should think about how.

This theorem bears a lot in common with similar theorems regarding the existence and uniqueness of the solution to an ODE. Counterexamples that show the necessity of each of the conditions of the theorem for ODEs can also be used for SDEs.

Example. To construct an equation whose solution is not unique, we drop the condition of Lipschitz continuity. Consider the ODE dX_t = 3X_t^{2/3} dt, which has solutions X_t = 0 for t ≤ a, X_t = (t − a)³ for t > a, for any a > 0. However, b(x) = 3x^{2/3} is not Lipschitz continuous at 0. For an example involving a Brownian motion, consider

dX_t = 3X_t^{1/3} dt + 3X_t^{2/3} dW_t,  X_0 = 0.

This has (at least) two solutions: X_t = 0 and X_t = W_t³. But, again, the coefficients of the SDE are not Lipschitz continuous.

Example. To construct an equation which has no global solution, we drop the linear growth conditions. Consider

dX_t = X_t² dt,  X_0 = x_0.

The solution is X_t = x_0/(1 − x_0 t), which blows up at t = 1/x_0.

Proof (Uniqueness, 1d). (from Evans (2013), section 5.B.3) Let X_t, X̂_t ∈ V([0,T]) be two strong solutions to (1). Then

X_t − X̂_t = ∫_0^t (b(X_s, s) − b(X̂_s, s)) ds + ∫_0^t (σ(X_s, s) − σ(X̂_s, s)) dW_s.

¹Actually we ask for something slightly stronger, namely that X be progressively measurable with respect to F, the filtration generated by (W_t)_{t≥0}.


Square each side, use (a + b)2 ≤ 2a2 + 2b2, and take expectations to get

E|X_t − X̂_t|² ≤ 2E|∫_0^t (b(X_s,s) − b(X̂_s,s)) ds|² + 2E|∫_0^t (σ(X_s,s) − σ(X̂_s,s)) dW_s|².

We estimate the first term on the right-hand side using the Cauchy-Schwarz inequality, which implies that (∫_0^t f ds)² ≤ t ∫_0^t |f|² ds. We then use the Lipschitz continuity of b. The result is

E|∫_0^t (b(X_s,s) − b(X̂_s,s)) ds|² ≤ T E∫_0^t |b(X_s,s) − b(X̂_s,s)|² ds ≤ K²T ∫_0^t E|X_s − X̂_s|² ds.

Now we estimate the second term using the Itô isometry and the Lipschitz continuity of σ:

E|∫_0^t (σ(X_s,s) − σ(X̂_s,s)) dW_s|² = ∫_0^t E|σ(X_s,s) − σ(X̂_s,s)|² ds ≤ K² ∫_0^t E|X_s − X̂_s|² ds.

Putting these estimates together shows that

E|X_t − X̂_t|² ≤ C ∫_0^t E|X_s − X̂_s|² ds

for some constant C, for 0 ≤ t ≤ T. If we let φ(t) ≡ E|X_t − X̂_t|², then the inequality is

φ(t) ≤ C ∫_0^t φ(s) ds for all 0 ≤ t ≤ T.

Now we can use Gronwall’s Inequality, which says that if we are given a function f and nonnegative numbers a, b ≥ 0, then

f(t) ≤ a + b ∫_0^t f(s) ds  ⟹  f(t) ≤ a e^{bt}.

The proof is given in the appendix. Applying Gronwall’s Inequality with f(t) = φ(t) = E|X_t − X̂_t|² and a = 0, b = C shows that

E|X_t − X̂_t|² = 0 for all 0 ≤ t ≤ T.

Therefore for each fixed t ∈ [0,T] we have that X_t = X̂_t a.s. We have to show this holds for all t simultaneously, i.e. the whole trajectory is equal, except for ω in a set of measure 0. We can argue that X_r = X̂_r a.s. for all rational 0 ≤ r ≤ T, i.e. P(X_t = X̂_t ∀t ∈ Q ∩ [0,T]) = 1. This is because we can extend the equality to a countable set of t-values, say {t_1, t_2, ...}: for each t_i, the “bad” ω-values {ω : X_{t_i}(ω) ≠ X̂_{t_i}(ω)} form a measure-zero set, and a countable union of measure-zero sets is measure zero. By assumption X, X̂ have continuous sample paths almost surely, so we can extend the equality to all values of t using the fact that the rationals form a dense set in R. Therefore P(X_t = X̂_t ∀t ∈ [0,T]) = 1.

Proof (Existence, for a simpler equation). (Based on Evans (2013), section 5.B.1) We will show existence for the simpler equation

dX_t = b(X_t) dt + dW_t,  X_0 = x ∈ R, (3)

where b ∈ C¹ with |b′| ≤ K for some constant K. The proof in the more general case uses similar ideas, see e.g. Evans (2013) section 5.B.3 or Oksendal (2005) section 5.2.


The proof is based on Picard iteration, as for the typical ODE existence proof. Let X_t^0 = x, and define

X_t^{n+1} = x + ∫_0^t b(X_s^n) ds + W_t,  n = 0, 1, ....

Define

D^n(t) ≡ max_{0≤s≤t} |X_s^{n+1} − X_s^n|.

Notice that for a given, continuous sample path of Brownian motion we have

D^0(t) = max_{0≤s≤t} |∫_0^s b(x) dr + W_s| ≤ C

for all 0 ≤ t ≤ T, where the constant C depends on the sample path via ω. We claim that

D^n(t) ≤ C K^n t^n / n!.

We show this by induction. The base case n = 0 is true. Assume it holds for n − 1 and calculate

D^n(t) = max_{0≤s≤t} |∫_0^s (b(X_r^n) − b(X_r^{n−1})) dr|
  ≤ K ∫_0^t D^{n−1}(s) ds
  ≤ K ∫_0^t C K^{n−1} s^{n−1}/(n−1)! ds  (by the induction assumption)
  = C K^n t^n / n!.

Therefore for m ≥ n we have

max_{0≤t≤T} |X_t^m − X_t^n| ≤ C ∑_{k=n}^∞ K^k T^k / k! → 0 as n → ∞.

Therefore with probability 1, Xn converges uniformly for t ∈ [0,T] to a limit process X. One can check that X is continuous, adapted and solves (3). (See Varadhan (2007), p.90 for a more explicit construction of the uniform convergence argument.)
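The Picard iteration above is easy to carry out numerically on one fixed Brownian path. The sketch below (pure Python; the drift b(x) = sin x, with K = 1 and T = 1, and the grid are illustrative choices) iterates X^{n+1}_t = x + ∫_0^t b(X^n_s) ds + W_t and records the sup-norm gaps D^n, which the bound C K^n t^n/n! says should shrink roughly factorially.

```python
import math
import random

rng = random.Random(7)
n_steps, T = 1000, 1.0
dt = T / n_steps
# One fixed Brownian path: the Picard iteration is pathwise.
W = [0.0]
for _ in range(n_steps):
    W.append(W[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))

def picard_step(X, b, x0):
    """One Picard iterate: X^{n+1}_t = x0 + int_0^t b(X^n_s) ds + W_t (left Riemann sum)."""
    out, integral = [x0 + W[0]], 0.0
    for j in range(n_steps):
        integral += b(X[j]) * dt
        out.append(x0 + integral + W[j + 1])
    return out

b = math.sin                 # Lipschitz drift with |b'| <= 1
X = [0.0] * (n_steps + 1)    # X^0 = x = 0
gaps = []
for n in range(8):
    X_new = picard_step(X, b, 0.0)
    gaps.append(max(abs(u - v) for u, v in zip(X, X_new)))  # sup-norm gap D^n
    X = X_new
print([round(g, 6) for g in gaps])  # gaps should shrink rapidly with n
```

The first gap D^0 = max|W_t| is order one, and after a handful of iterations the gap is far below the discretization error, consistent with factorial convergence.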

8.2 Examples of SDEs and their solutions

Let’s look at some specific SDEs and their solutions. First we recall some useful properties of the Itô integral. We showed last lecture that:

• The non-anticipating property: E ∫_0^t f(s,ω) dW_s = 0.
• The Itô isometry: E(∫_0^t f(s,ω) dW_s)² = E ∫_0^t f(s,ω)² ds.


Another useful property is the following: for adapted processes g, h,

E[(∫_0^t g(s,ω) dW_s)(∫_0^t h(s,ω) dW_s)] = ∫_0^t E[g(s,ω)h(s,ω)] ds. (4)

To prove this property, apply the Itô isometry with f = h + g.

Formally, (4), as well as the Itô isometry, can be derived from the substitutions E dW_u = E dW_v = 0, E dW_u dW_v = δ(u − v) du dv, and the fact that g(u), h(v) are adapted, so they are each independent of dW_u, dW_v respectively. That is, write

E[(∫_0^t g(u) dW_u)(∫_0^t h(v) dW_v)] = ∫_0^t ∫_0^t E[g(u)h(v) dW_u dW_v].

Now decompose the integrand into different pieces, depending on the relationship between u, v. On the region v < u, the increment dW_u is independent of g(u), h(v), and dW_v, so

E[g(u)h(v) dW_u dW_v 1_{v<u}] = E[g(u)h(v) dW_v 1_{v<u}] E[dW_u] = 0,

and similarly the region u < v contributes zero. Only the diagonal u = v survives, where E dW_u dW_v = δ(u − v) du dv gives the right-hand side of (4).
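Property (4) (with g = h it reduces to the Itô isometry) is easy to test by Monte Carlo. The sketch below takes g = h = W_s, for which ∫_0^1 W dW = (W_1² − 1)/2 and the predicted second moment is ∫_0^1 E W_s² ds = 1/2. The discretization evaluates the integrand at the left endpoint, as the Itô integral requires; parameters are illustrative.

```python
import math
import random

rng = random.Random(1)
n_steps, n_paths, T = 200, 5000, 1.0
dt = T / n_steps
acc = 0.0
for _ in range(n_paths):
    W, I = 0.0, 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))
        I += W * dW          # Ito sum: integrand at the LEFT endpoint
        W += dW
    acc += I * I
mean_sq = acc / n_paths
# Property (4) with g = h = W predicts E[(int_0^1 W dW)^2] = int_0^1 s ds = 1/2
print(round(mean_sq, 2))
```

Evaluating at the midpoint instead (the Stratonovich convention) would change the answer, which is one concrete way to see that the choice of evaluation point matters.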

Example (Ornstein-Uhlenbeck process). Consider the equation

dX_t = −aX_t dt + σ dW_t,  X_0 = ξ, (5)

where ξ is independent of (W_t)_{t≥0}. When a > 0 the first term causes an exponential decay to zero, while the second causes fluctuations about this exponential decay, with additive noise. Here are some examples of simulated trajectories with a = σ = 1. With initial condition ξ = 0:

With initial condition ξ = 5:


You should notice that in the second set of trajectories, the process initially decays quite quickly, and when it gets close to zero it resembles the first set of trajectories.

Let’s solve explicitly for the solution. Multiply both sides by e^{at} and integrate:

e^{at} dX_t + a e^{at} X_t dt = σ e^{at} dW_t  ⟹  e^{at} X_t − X_0 = σ ∫_0^t e^{as} dW_s,

where the left-hand side of the first equation is d(e^{at} X_t). Therefore the solution is

X_t = e^{−at} X_0 + σ ∫_0^t e^{−a(t−s)} dW_s .
         (a)               (b)

Term (a) shows the initial condition is “forgotten” exponentially quickly. Term (b) represents the stochastic fluctuations, and is similar to the convolution of an exponential kernel with Brownian motion, e^{−at} ∗ W_t, except it only looks at values in the past. This term is Gaussian, since it is the integral of a Gaussian process. Hence, if X_0 is Gaussian, then X_t is Gaussian.

From the solution formula we can calculate the moments. The mean is

EX_t = e^{−at} EX_0.

The mean decays exponentially to 0. The covariance is, using our formal calculations introduced at the beginning of the section,

B(s,t) = EX_sX_t − EX_s EX_t
       = e^{−as} e^{−at} (EX_0² − (EX_0)²) + σ² ∫_0^s ∫_0^t e^{−a(t−v)} e^{−a(s−u)} E dW_u dW_v
       = e^{−a(s+t)} Var(X_0) + σ² ∫_0^{s∧t} e^{−a(s−u)} e^{−a(t−u)} du   (using E dW_u dW_v = δ(u−v) du dv)
       = e^{−a(s+t)} Var(X_0) + (σ²/2a) e^{−a(s+t)} (e^{2a(s∧t)} − 1)
       = e^{−a(s+t)} Var(X_0) + (σ²/2a) (e^{−a|s−t|} − e^{−a(s+t)}).

We see that as t → ∞, Var(X_t) = B(t,t) → σ²/2a.

Now consider taking initial condition X_0 ∼ N(0, σ²/2a). Then the mean and covariance are

EX_t = 0,  Cov(X_s, X_t) = (σ²/2a) e^{−a|s−t|}.

The mean is constant and the covariance only depends on the time difference s − t, so X_t is weakly stationary. Since X_t is Gaussian, it is also strongly stationary. Since it is strongly stationary, the one-point distributions are constant in time. Therefore X_t ∼ N(0, σ²/2a) for all t. Such a distribution, which doesn’t change with time, is called a stationary distribution. We found it here by a clever guess, but later we will see a systematic way to find stationary distributions.

It turns out the Ornstein-Uhlenbeck process is the only process that is stationary, Markov, Gaussian, and has continuous paths. It comes up in a LOT of models. It is what you get when you linearize many SDEs.
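The relaxation to the stationary distribution N(0, σ²/2a) can be checked numerically. The sketch below uses the exact one-step update X_{t+Δt} = e^{−aΔt} X_t + (Gaussian noise), with the noise variance σ²(1 − e^{−2aΔt})/(2a) read off from the solution formula above. Starting from X_0 = 5 as in the second set of trajectories, by time t = 10 ≫ 1/a the sample mean should be near 0 and the sample variance near σ²/2a = 1/2. Parameters are illustrative.

```python
import math
import random

rng = random.Random(2)
a, sigma, dt = 1.0, 1.0, 0.01
n_steps, n_paths = 1000, 2000       # simulate up to t = 10, well past the decay time 1/a
decay = math.exp(-a * dt)
noise_sd = sigma * math.sqrt((1.0 - decay**2) / (2.0 * a))  # exact one-step std dev
finals = []
for _ in range(n_paths):
    x = 5.0                         # start far from equilibrium
    for _ in range(n_steps):
        x = decay * x + noise_sd * rng.gauss(0.0, 1.0)
    finals.append(x)
mean = sum(finals) / n_paths
var = sum((v - mean)**2 for v in finals) / n_paths
print(round(mean, 2), round(var, 2))  # mean near 0, variance near sigma^2/(2a) = 0.5
```

Because the update uses the exact transition density, there is no time-discretization bias here, only Monte Carlo error.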


Example (Geometric Brownian Motion). Consider the equation

dN_t = rN_t dt + αN_t dW_t (6)

This is a model for population growth, where r is the average growth rate, and αdWt represents fluctuations in the growth rate. It’s also a model for the stock market, with Nt = price of asset, r = interest rate, α = volatility. The difference from the OU process is in the diffusion term – now the noise is multiplicative.

To solve: first, divide by N_t:

dN_t/N_t = r dt + α dW_t.

It would be nice to write the LHS as d(log N_t), but we can’t, since we need to use the Itô formula. Instead, we calculate

d(log N_t) = dN_t/N_t − (dN_t)²/(2N_t²) = dN_t/N_t − (α²/2) dt.

Now substitute for dNt using the equation to get

d(log N_t) = (r − α²/2) dt + α dW_t.

The solution is

N_t = N_0 e^{(r − α²/2)t + αW_t}.
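Since the solution is explicit, its statistics are easy to explore by sampling W_T ∼ N(0, T) directly. The sketch below (pure Python, illustrative parameters) previews a point made next: with 0 < r < α²/2, a typical path is small by time T even though the mean e^{rT} grows.

```python
import math
import random

rng = random.Random(3)
r, alpha, T = 0.1, 1.0, 10.0     # 0 < r < alpha^2/2: typical paths die, mean grows
n_paths = 4000
# Exact solution with N_0 = 1: N_T = exp((r - alpha^2/2) T + alpha W_T), W_T ~ N(0, T)
finals = sorted(math.exp((r - alpha**2 / 2) * T + alpha * rng.gauss(0.0, math.sqrt(T)))
                for _ in range(n_paths))
median = finals[n_paths // 2]
frac_small = sum(1 for v in finals if v < 1.0) / n_paths
print(round(median, 3), round(frac_small, 2))
# Typical path: median ~ exp((r - alpha^2/2) T) = exp(-4) ~ 0.018, and most paths
# end below 1, even though the exact mean E[N_T] = exp(rT) = e ~ 2.72 is growing.
```

A sample mean of the same data is a very noisy estimate of e^{rT}, because the lognormal tail is heavy; that is exactly the rare-fluctuation effect discussed below.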

Let’s look at properties of this solution. First, let’s calculate the mean, EN_t. To find this, we’ll need to find the mean of Y_t = e^{αW_t}. Let’s express Y_t as an Itô integral, so we can use the non-anticipating property of the Itô integral.

dY_t = α e^{αW_t} dW_t + (α²/2) e^{αW_t} dt
∴ Y_t = Y_0 + α ∫_0^t e^{αW_s} dW_s + (α²/2) ∫_0^t e^{αW_s} ds
∴ EY_t = EY_0 + (α²/2) ∫_0^t EY_s ds
∴ d/dt EY_t = (α²/2) EY_t,  EY_0 = 1
∴ EY_t = e^{α²t/2}.

Therefore EN_t = (EN_0) e^{rt}. The expected value grows with the average rate. What happens to the trajectories themselves? Do they also increase a.s. with the average rate? Recall the Law of the Iterated Logarithm, which says that

lim sup_{t→∞} W_t / √(2t log log t) = 1 a.s.

This implies that the supremum of e^{αW_t} grows as e^{α√(2t log log t)} as t → ∞, whose exponent is sublinear in t, so it is eventually dominated by e^{ct} for any c > 0. Therefore the trajectory behaviour depends on the sign of r − α²/2:


• If r > α²/2, then N_t → ∞ a.s. as t → ∞.
• If r < α²/2, then N_t → 0 a.s.
• If r = α²/2, then N_t will fluctuate between values that are arbitrarily large and arbitrarily close to zero, a.s.

Notice that the mean and the trajectory do not always behave the same way at ∞. If 0 < r < α²/2, then EN_t → ∞ while N_t → 0 a.s.! This apparent paradox arises because increasingly large (but rare) fluctuations dominate the expectation. You should think about this.

Example (Stochastically forced harmonic oscillator). Consider the ODE

m d²X_t/dt² + kX_t + γ dX_t/dt = f(t). (7)

Here X_t is a harmonic oscillator, representing say the angle of a pendulum, like a swing, under small perturbations from its rest state, m is its mass, k is a spring constant, γ is a damping coefficient, and f(t) is the external forcing. If you have taken a mechanics course, you know that when a harmonic oscillator is forced periodically, say with forcing f(t) = sin ωt, and the forcing frequency does not equal the resonant frequency, ω ≠ √(k/m), then the oscillations will be bounded, and usually quite small – if you pump your legs on a swing too quickly or too slowly, the swing doesn’t move very much. However, when the frequency of the forcing exactly equals the resonant frequency, f(t) = sin(√(k/m) t), the oscillations will grow without bound – with no damping, you could make the swing go all the way around.

What happens when the forcing is stochastic? If you pump your legs completely at random, will you swing? We will answer this by letting f(t) = ση(t), where η(t) is a white noise, and solving explicitly for a solution.

We can write this as an SDE by letting V_t = dX_t/dt:

dX_t = V_t dt
dV_t = (−k²X_t − γV_t) dt + σ dW_t^{(2)}

⟹ d(X_t, V_t)^T = [[0, 1], [−k², −γ]] (X_t, V_t)^T dt + [[0, 0], [0, σ]] d(W_t^{(1)}, W_t^{(2)})^T,

with the two matrices on the right denoted −A and B. This has the form

dU_t = −AU_t dt + B dW_t,

where U_t = (X_t, V_t)^T, W_t = (W_t^{(1)}, W_t^{(2)})^T, and A, B are constant matrices (here we have set m = 1 and written the spring constant as k²). This is a 2-dimensional Ornstein-Uhlenbeck process. We can solve it in the same way as the first example:

d(e^{At} U_t) = e^{At} dU_t + A e^{At} U_t dt
             = e^{At} (−AU_t dt + B dW_t) + A e^{At} U_t dt
             = e^{At} B dW_t
⟹ U_t = e^{−At} U_0 + ∫_0^t e^{−A(t−s)} B dW_s.

Consider cases.


• Case γ = 0 (no damping). Then A = [[0, −1], [k², 0]], which has eigenvalues λ = ±ik and corresponding eigenvectors u_1 = (1, −ik)^T, u_2 = (1, ik)^T. Then e^{At} = Ue^{Dt}U^{−1}, with U = (u_1 u_2), D = diag(ik, −ik). Therefore

e^{At} = [[cos kt, −(1/k) sin kt], [k sin kt, cos kt]].

Consider the solution with initial condition X_0 = 0, V_0 = 0:

X_t = (σ/k) ∫_0^t sin k(t − s) dW_s^{(2)}.

This represents oscillations that grow without bound. To see this, notice that EX_t = 0 by the non-anticipating property of the Itô integral, and calculate the variance as

EX_t² = (σ²/k²) ∫_0^t sin² k(t − s) ds  (by the Itô isometry)
      = (σ²/k²) (t/2 − sin 2kt/(4k))
      ∼ σ²t/(2k²) → ∞ as t → ∞.

This means that a stochastically forced swingset will swing!² So if you pump your legs completely at random, you can make the swing go as high as you want, in principle. It is somewhat remarkable that although you are forcing all frequencies equally, and almost all of these are non-resonant, you still have enough forcing near the resonant frequency to make the oscillations grow.

• Case γ ≠ 0 (with damping). In this case the eigenvalues of A have a real and an imaginary part. The real part is negative and represents exponential damping. The imaginary part represents oscillations, with a frequency that is slightly shifted from k.

Exercise 8.1. Write down the solution to (7) explicitly, when γ ≠ 0.
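The linear growth of Var(X_t) in the undamped case can be checked numerically. The sketch below splits each time step into the exact rotation for the deterministic oscillator followed by a noise kick on the velocity; this splitting scheme is an illustrative choice, not a method from the notes. With k = σ = 1 and T = 20 the formula above gives Var(X_T) ≈ 9.8.

```python
import math
import random

rng = random.Random(4)
k, sigma_f, dt, T = 1.0, 1.0, 0.01, 20.0
n_steps, n_paths = int(T / dt), 1000
c, s = math.cos(k * dt), math.sin(k * dt)
finals = []
for _ in range(n_paths):
    x, v = 0.0, 0.0
    for _ in range(n_steps):
        # exact rotation for dx = v dt, dv = -k^2 x dt, then a noise kick on v
        x, v = c * x + (s / k) * v, -k * s * x + c * v
        v += sigma_f * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    finals.append(x)
var = sum(xx * xx for xx in finals) / n_paths
print(round(var, 1))
# theory: Var(X_T) = (sigma^2/k^2)(T/2 - sin(2kT)/(4k)) ~ 9.8 at T = 20
```

Using the exact rotation avoids the spurious energy growth that a plain explicit Euler step would add to the oscillator.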

One way to gain insight into the properties of the solution is to assume that a stationary solution exists, and to calculate its spectral density. Let us multiply (7) at time t, with (7) at time 0, to get

(mX_t″ + kX_t + γX_t′)(mX_0″ + kX_0 + γX_0′) = σ² η(t)η(0).

Now we take the expectation and use that

EX_t′X_0 = −EX_tX_0′ = C′(t),  EX_t′X_0′ = −C″(t),  EX_t″X_0 = EX_tX_0″ = C″(t), (8)
EX_t″X_0′ = −EX_t′X_0″ = −C‴(t),  EX_t″X_0″ = C^{(4)}(t), (9)

where C(t) = EX_tX_0 is the stationary covariance function,

to get

m²C^{(4)}(t) + k²C(t) − γ²C″(t) + 2mkC″(t) = σ²δ(t).

2We haven’t shown it does so almost surely, only that the variance increases without bound – you can try to think about why the oscillations will grow without bound a.s., using the properties of Brownian motion.


Figure 1: Spectral density for a stationary swing.

Taking the Fourier transform gives

m²λ⁴ f(λ) + k² f(λ) + γ²λ² f(λ) − 2mkλ² f(λ) = σ²/2π,

f(λ) = (1/2π) σ² / ((mλ² − k)² + λ²γ²).

This density is plotted in Figure 1. The density has a peak near λ = √(k/m − γ²/(2m²)), which is close to (but slightly less than) the resonant frequency √(k/m). The swing swings, but it stays bounded. Notice that the spectral density is not integrable when γ = 0, so we wouldn’t expect a stationary solution with no damping.

8.3 Stratonovich Integral

Suppose we want to work with the Stratonovich integral. Can this be related to the Itô integral? Yes it can. There is a simple formula to relate the two.

Theorem. Suppose Xt solves the Stratonovich equation

dX_t = b(t, X_t) dt + σ(t, X_t) ◦ dW_t. (10)

Then X_t also solves the Itô equation

dX_t = (b(t, X_t) + (1/2) σ(t, X_t) ∂σ(t, X_t)/∂x) dt + σ(t, X_t) dW_t. (11)

Conversely, suppose X_t solves the Itô equation

dX_t = b(t, X_t) dt + σ(t, X_t) dW_t. (12)


Then X_t also solves the Stratonovich equation

dX_t = (b(t, X_t) − (1/2) σ(t, X_t) ∂σ(t, X_t)/∂x) dt + σ(t, X_t) ◦ dW_t. (13)

Therefore, a Stratonovich SDE is equivalent to an Itô SDE with an additional drift term. The drift arises when the noise is multiplicative, and it depends on the rate of change of the magnitude of the noise. Heuristically, this is because the Stratonovich integral can “see” into the future, so it needs to account for the changing magnitude of the noise.

Note that the Stratonovich correction presented here only applies when X_t solves an SDE, and not for a more general Itô process. See the remarks at the end of this section for more discussion of this important, but subtle, point.

Proof sketch. (from Pavliotis (2014), p. 62.) Suppose X_t solves (10). We’ll work with the integral form of the equation. Write ∫_0^t σ(s, X_s) ◦ dW_s as a limit of sums:

∫_0^t σ(s, X_s) ◦ dW_s = m.s. lim_{Δt→0} ∑_j ((σ_j + σ_{j+1})/2) (W_{j+1} − W_j). (14)

We write σ_j ≡ σ(t_j, X_{t_j}), W_j ≡ W_{t_j} for brevity. We will write increments as ΔW_j = W_{t_{j+1}} − W_{t_j}, and similarly for increments of X_t, t. Now estimate σ_{j+1} from σ_j using a Taylor expansion.

σ_{j+1} = σ_j + (∂σ_j/∂x) ΔX_j + O((ΔX_j)²)
       = σ_j + (∂σ_j/∂x) (b_j Δt_j + ((σ_j + σ_{j+1})/2) ΔW_j) + O(Δt_j)   (using (10) to approximate ΔX_j)
       = σ_j + (∂σ_j/∂x) (b_j Δt_j + ((σ_j + σ_j + (∂σ_j/∂x)ΔX_j)/2) ΔW_j) + O(Δt_j)   (approximate σ_{j+1} using the first line)
       = σ_j + (∂σ_j/∂x) σ_j ΔW_j + O(Δt_j).

We only kept terms up to O(√Δt_j), since we will be multiplying all of this by ΔW_j. For the term σ_{j+1} in the second line, we substituted recursively from the first line. Now substitute the approximation for σ_{j+1} into (14) to obtain

∫_0^t σ(s, X_s) ◦ dW_s = m.s. lim_{Δt→0} ∑_j (σ_j + (1/2)(∂σ_j/∂x) σ_j ΔW_j) ΔW_j + error
                     = ∫_0^t σ(s, X_s) dW_s + ∫_0^t (1/2)(∂σ(s, X_s)/∂x) σ(s, X_s) ds.

Therefore the Stratonovich integral in (10) equals an Itô integral plus the extra drift term in (11). The drift term in (10) is unchanged, a fact you can easily verify. Therefore X_t also solves the Itô equation (11).

Remark. To make the proof rigorous, one needs to rigorously control the error terms. This is left as an exercise.


Example (Geometric Brownian Motion, revisited). Consider the Stratonovich equation

dN_t = rN_t dt + αN_t ◦ dW_t.

This is equivalent to the Itô equation

dN_t = (r + (1/2)α²) N_t dt + αN_t dW_t,

whose solution we found earlier to be N_t = N_0 e^{rt + αW_t}. This is also the solution we would have found using the regular chain rule, d(log N_t) = dN_t/N_t. It turns out the chain rule from classical calculus works in Stratonovich calculus. We will investigate this more on the homework.

Why use the Stratonovich integral? There are several advantages:

• The regular chain rule holds, i.e. d f(X_t) = f′(X_t) ◦ dX_t. (ELFS on HW, or see Pavliotis (2014).)
• If you start with smooth, non-white noise in an ODE, and take a limit to make the noise white, you typically end up with a Stratonovich integral. That is, suppose you have an ODE of the form

dx^ε/dt = b(x^ε, t) + σ(x^ε, t) ξ^ε(t),

where ξ^ε(t) is a smooth, stationary stochastic process whose covariance function C_ε(t) approaches δ(t) as ε → 0. Then as ε → 0, x^ε(t) approaches (in some sense) the solution to the Stratonovich SDE dX_t = b(X_t, t) dt + σ(X_t, t) ◦ dW_t. See e.g. Pavliotis (2014), section 5.1.
• If you restrict your process to lie on a submanifold of R^n, the most natural way to do this is through the Stratonovich integral. For example, “Brownian motion” on the surface of a sphere is the solution to

dX_t = P(X_t) ◦ dB_t,

where B_t ∈ R³ is a 3-dimensional Brownian motion and P(X_t) is the orthogonal projection matrix onto the tangent space to the surface of the unit sphere.

What are the disadvantages?

• The Itô isometry is lost.
• The non-anticipating property E ∫_0^t σ(t,ω) ◦ dW_t = 0 no longer holds, i.e. the Stratonovich integral “looks into the future.”

Both of these make the rigorous analysis of the Stratonovich integral significantly harder. Mathematically, the Itô integral is a “martingale” but the Stratonovich integral is not; since there are many powerful tools developed for processes which are martingales, it is often more convenient to use the Itô integral when developing the theory of diffusion processes.

Does it matter which integral you use? Not really – you can always convert from one to the other. In multiple dimensions the conversion is given by the following.

Theorem. Given X_t ∈ R^n, W_t ∈ R^m, σ ∈ R^{n×m}. The Stratonovich equation

dX_t = b(t, X_t) dt + σ(t, X_t) ◦ dW_t

is equivalent to the Itô equation

dX_t = (b(t, X_t) + (1/2) ∇_x σ : σ(t, X_t)) dt + σ(t, X_t) dW_t.

Here (∇_x σ : σ)_i = ∑_{j,k} (∂_k σ_{ij}) σ_{kj}.
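One way to see the Itô-Stratonovich drift correction concretely is to discretize the same geometric-Brownian-motion equation dN = αN (·) dW two ways: the Euler-Maruyama scheme (left endpoint) converges to the Itô solution, while a Heun predictor-corrector (midpoint average) converges to the Stratonovich one. This is a numerical sketch with illustrative parameters; E[log N_T] should be near −α²T/2 for Itô and near 0 for Stratonovich.

```python
import math
import random

rng = random.Random(5)
alpha, T, dt, n_paths = 1.0, 1.0, 0.001, 500
n_steps = int(T / dt)
sqdt = math.sqrt(dt)
ito_logs, strat_logs = [], []
for _ in range(n_paths):
    n_ito = n_str = 1.0
    for _ in range(n_steps):
        dW = sqdt * rng.gauss(0.0, 1.0)
        n_ito += alpha * n_ito * dW                 # Euler-Maruyama -> Ito integral
        pred = n_str + alpha * n_str * dW           # Heun predictor...
        n_str += alpha * 0.5 * (n_str + pred) * dW  # ...midpoint corrector -> Stratonovich
    ito_logs.append(math.log(n_ito))
    strat_logs.append(math.log(n_str))
ito_mean = sum(ito_logs) / n_paths
strat_mean = sum(strat_logs) / n_paths
print(round(ito_mean, 2), round(strat_mean, 2))
# theory: E[log N_T] = -alpha^2 T/2 = -0.5 for Ito, 0 for Stratonovich
```

The gap between the two estimates is the extra drift (1/2)α²N dt predicted by the conversion theorem.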

Remark. To apply the Itô-Stratonovich conversion, it is very important that X_t solves an SDE – the coefficients of the equation must depend only on X_t, t and not on any other processes. For example, suppose you are given the following equation:

dY_t = b(X_t) ◦ dW_t,  where X_t solves dX_t = α(X_t) dt + β(X_t) dW_t. (15)

It is tempting, but wrong, to convert the equation for Y_t to Itô form as

NOT TRUE:  dY_t = (1/2) b′(X_t) b(X_t) dt + b(X_t) dW_t.

You have to work out a different conversion rule in this case.

Exercise 8.2. Work out the conversion rule for the example above to write equation (15) for Y_t in Itô form. Start from a discrete approximation for the integrals involved. (Try this before you read what follows!)

Remark. Here is a more general approach to the Stratonovich integral. Given processes X_t, Y_t,³ the Stratonovich integral is defined in terms of the Itô integral as

∫_0^t Y_s ◦ dX_s = ∫_0^t Y_s dX_s + (1/2)⟨X, Y⟩_t.

Here ⟨X, Y⟩_t is the quadratic covariation of X_t, Y_t, defined as

⟨X, Y⟩_t = m.s. lim_{|σ|→0} ∑_{j: t_j < t} (X_{j+1} − X_j)(Y_{j+1} − Y_j),

where σ denotes the partition and |σ| its mesh size.

It satisfies ⟨X, Y⟩_t = (1/2)([X + Y]_t − [X]_t − [Y]_t), where [X]_t = ⟨X, X⟩_t is the quadratic variation of X. For an Itô process X_t = ∫_0^t f(s) ds + ∫_0^t g(s) dW_s, we have that [X]_t = ∫_0^t g²(s) ds.

For the example in the previous remark, we must calculate ⟨b(X), W⟩_t. Defining Z_t = b(X_t), we have from the Itô formula that

Z_t = Z_0 + ∫_0^t (b′(X_s)α(X_s) + (1/2) b″(X_s)β²(X_s)) ds + ∫_0^t b′(X_s)β(X_s) dW_s.

The quadratic covariation is ⟨Z, W⟩_t = ∫_0^t b′(X_s)β(X_s) ds, giving

Y_t = Y_0 + ∫_0^t b(X_s) dW_s + (1/2) ∫_0^t b′(X_s)β(X_s) ds  ⟺  dY_t = b(X_t) dW_t + (1/2) b′(X_t)β(X_t) dt.

³X_t, Y_t must be continuous semimartingales, with respect to the filtration generated by W_t. A process X_t is a semimartingale if it can be decomposed as X_t = M_t + A_t, where M_t is a local martingale, and A_t is a progressively measurable process of locally bounded variation.
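The conversion rule just derived can be sanity-checked numerically in the simplest case α = 0, β = 1 (so X_t = W_t) and b(x) = x: the rule predicts ∫_0^T W ∘ dW = ∫_0^T W dW + T/2. The sketch below compares the midpoint (Stratonovich) sum with the left-endpoint (Itô) sum on one discretized path; parameters are illustrative.

```python
import math
import random

rng = random.Random(6)
n_steps, dt = 1000, 0.001
W_vals = [0.0]
for _ in range(n_steps):
    W_vals.append(W_vals[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
# Stratonovich sum for int_0^T W o dW  (b(x) = x, X_t = W_t, i.e. alpha = 0, beta = 1)
strat = sum(0.5 * (W_vals[j] + W_vals[j + 1]) * (W_vals[j + 1] - W_vals[j])
            for j in range(n_steps))
# Ito form predicted by the conversion rule: int W dW + (1/2) int b'(X) beta dt = int W dW + T/2
ito = sum(W_vals[j] * (W_vals[j + 1] - W_vals[j]) for j in range(n_steps))
T = n_steps * dt
print(round(strat - (ito + 0.5 * T), 3))  # small: the two forms agree up to discretization error
```

Here the difference between the two sums is (1/2)∑(ΔW_j)², which concentrates at T/2 as the mesh shrinks; this is exactly the quadratic-covariation term ⟨Z, W⟩_T/2 with b′β ≡ 1.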


8.4 Appendix

Theorem (Gronwall’s Inequality). Let φ(t), b(t) ≥ 0 be nonnegative, continuous functions defined for 0 ≤ t ≤ T, and let a ≥ 0 be a constant. If

φ(t) ≤ a + ∫_0^t b(s)φ(s) ds for all 0 ≤ t ≤ T,

then

φ(t) ≤ a e^{∫_0^t b(s) ds} for all 0 ≤ t ≤ T.

Proof. (Theorem and proof as stated in Evans (2013), section 5.B.3) Let Φ(t) = a + ∫_0^t b(s)φ(s) ds. Then Φ′ = bφ ≤ bΦ, so

(e^{−∫_0^t b ds} Φ)′ = (Φ′ − bΦ) e^{−∫_0^t b ds} ≤ (bΦ − bΦ) e^{−∫_0^t b ds} = 0.

Therefore

Φ(t) e^{−∫_0^t b ds} ≤ Φ(0) = a  ⟹  φ(t) ≤ Φ(t) ≤ a e^{∫_0^t b ds}.

References

Evans, L. C. (2013). An Introduction to Stochastic Differential Equations. American Mathematical Society.
Gardiner, C. (2009). Stochastic Methods: A Handbook for the Natural Sciences. Springer, 4th edition.
Koralov, L. B. and Sinai, Y. G. (2010). Theory of Probability and Random Processes. Springer.
Oksendal, B. (2005). Stochastic Differential Equations. Springer, 6th edition.
Pavliotis, G. A. (2014). Stochastic Processes and Applications. Springer.
Varadhan, S. R. S. (2007). Stochastic Processes, volume 16 of Courant Lecture Notes. American Mathematical Society.
