Lecture 8: Stochastic Differential Equations
Readings
Recommended:
• Pavliotis (2014) 3.2–3.5
• Oksendal (2005) Ch. 5
Optional:
• Gardiner (2009) 4.3–4.5
• Oksendal (2005) 7.1, 7.2 (on Markov property)
• Koralov and Sinai (2010) 21.4 (on Markov property)
We'd like to understand solutions to the following type of equation, called a Stochastic Differential Equation (SDE):

    dX_t = b(X_t, t) dt + σ(X_t, t) dW_t .    (1)

Recall that (1) is short-hand for an integral equation

    X_t = X_0 + ∫_0^t b(X_s, s) ds + ∫_0^t σ(X_s, s) dW_s .    (2)

In the physics literature, you will often see (1) written as

    dx/dt = b(x, t) + σ(x, t) η(t),

where η(t) is a white noise: a Gaussian process with mean 0 and covariance function E η(s)η(t) = δ(t − s). Each term in (1) has a different interpretation.
• The term b(X_t, t) dt is called the drift term. It describes the deterministic part of the equation. When this is the only term, we obtain a canonical ODE.
• The term σ(X_t, t) dW_t is called the diffusion term. It describes random motion proportional to a Brownian motion. Over small times, this term causes the probability to spread out diffusively with a diffusivity locally proportional to σ².
If the diffusion term is constant, i.e. σ(x,t) ≡ σ ∈ R, then the noise is said to be additive. If the diffusion term depends on x, i.e. ∂σ(x,t)/∂x ≠ 0 in (1), the noise is said to be multiplicative. We will see that equations with multiplicative noise have to be treated more carefully than equations with additive noise.
We learned how to define the integrals in the expressions above last class. In this one we'll look at properties of the solutions themselves. We will ask: when do solutions exist? Are they unique? And how can we actually solve them, and extract useful information?
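Before studying solutions analytically, it may help to see how an equation of the form (1) is simulated numerically. The sketch below uses the Euler-Maruyama scheme (the stochastic analogue of forward Euler); the function name and parameter values are illustrative choices, not from the notes.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """Simulate one path of dX = b(X,t) dt + sigma(X,t) dW on [0,T] with n steps."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))          # Brownian increment ~ N(0, dt)
        x[i + 1] = x[i] + b(x[i], t[i]) * dt + sigma(x[i], t[i]) * dW
    return t, x

# Additive-noise example: drift b(x,t) = -x, constant diffusion sigma = 1
rng = np.random.default_rng(0)
t, x = euler_maruyama(lambda x, t: -x, lambda x, t: 1.0, x0=1.0, T=1.0, n=1000, rng=rng)
```

For multiplicative noise one simply passes a state-dependent `sigma`; the scheme is unchanged, though its accuracy analysis is more delicate.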
1 Miranda Holmes-Cerfon Applied Stochastic Analysis, Spring 2019
8.1 Existence and uniqueness
Definition. A stochastic process X = (X_t)_{t≥0} is a strong solution to the SDE (1) for 0 ≤ t ≤ T if X is continuous with probability 1, X is adapted¹ (to W_t), b(X_t, t) ∈ L¹(0,T), σ(X_t, t) ∈ L²(0,T), and Equation (2) holds with probability 1 for all 0 ≤ t ≤ T.
Definition. A strong solution X to an SDE of the form (1) is called a diffusion process.
Remark. To be a diffusion process, it is important that the coefficients of (1) depend only on (X_t, t) – they can't be general adapted functions f(ω, t).
Theorem. Given equation (1), suppose b ∈ Rⁿ, σ ∈ R^{n×m} satisfy global Lipschitz and linear growth conditions:

    |b(x,t) − b(y,t)| + |σ(x,t) − σ(y,t)| ≤ K|x − y|
    |b(x,t)| + |σ(x,t)| ≤ K(1 + |x|)

for all x, y ∈ Rⁿ, t ∈ [0,T], where K > 0 is a constant. Assume the initial value X_0 = ξ is a random variable with Eξ² < ∞ which is independent of (W_t)_{t≥0}. Then (1) has a unique strong solution X.
Remark. "Unique" means that if X¹, X² are two strong solutions, then

    P( X¹(t,ω) = X²(t,ω) for all t ) = 1 .

That is, the two solutions are equal everywhere with probability 1. This is different from the statement that X¹, X² are versions of each other – you should think about how.
This theorem bears a lot in common with similar theorems regarding the existence and uniqueness of the solution to an ODE. Counterexamples that show the necessity of each of the conditions of the theorem for ODEs can also be used for SDEs.
Example. To construct an equation whose solution is not unique, we drop the condition of Lipschitz continuity. Consider the ODE dX_t = 3X_t^{2/3} dt, which has solutions X_t = 0 for t ≤ a, X_t = (t − a)³ for t > a, for any a > 0. However, b(x) = 3x^{2/3} is not Lipschitz continuous at 0. For an example involving a Brownian motion, consider

    dX_t = 3X_t^{1/3} dt + 3X_t^{2/3} dW_t ,   X_0 = 0.

This has (at least) two solutions: X_t = 0 and X_t = W_t³. But, again, the coefficients of the SDE are not Lipschitz continuous.
Example. To construct an equation which has no global solution, we drop the linear growth conditions. Consider

    dX_t = X_t² dt ,   X_0 = x_0 .

The solution is X_t = 1/(1/x_0 − t), which blows up at t = 1/x_0.
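The non-uniqueness example can be checked numerically: fixing one Brownian path and forming left-endpoint (Itô) sums, W_t³ approximately satisfies the integral form of the SDE above. This is only a sketch of the check, with illustrative discretization parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 200_000, 1.0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate([[0.0], np.cumsum(dW)])
X = W ** 3                                   # candidate solution X_t = W_t^3

# Integral equation with X_s = W_s^3, so X^{1/3} = W and X^{2/3} = W^2:
drift = np.cumsum(3.0 * W[:-1] * dt)         # int_0^t 3 X^{1/3} ds
noise = np.cumsum(3.0 * W[:-1] ** 2 * dW)    # int_0^t 3 X^{2/3} dW  (Ito sum)
rhs = np.concatenate([[0.0], drift + noise])

err = np.max(np.abs(X - rhs))                # small: W^3 solves the SDE
```

The trivial solution X ≡ 0 satisfies the equation exactly, so both candidate solutions pass, illustrating the loss of uniqueness.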
Proof (Uniqueness, 1d). (from Evans (2013), section 5.B.3) Let X_t, X̂_t ∈ V([0,T]) be two strong solutions to (1). Then

    X_t − X̂_t = ∫_0^t [ b(X_s,s) − b(X̂_s,s) ] ds + ∫_0^t [ σ(X_s,s) − σ(X̂_s,s) ] dW_s .
¹Actually we ask for something slightly stronger, namely that X be progressively measurable with respect to F, the filtration generated by (W_t)_{t≥0}.
Square each side, use (a + b)² ≤ 2a² + 2b², and take expectations to get

    E|X_t − X̂_t|² ≤ 2 E( ∫_0^t [b(X_s,s) − b(X̂_s,s)] ds )² + 2 E( ∫_0^t [σ(X_s,s) − σ(X̂_s,s)] dW_s )² .

We estimate the first term on the right-hand side using the Cauchy-Schwarz inequality, which implies that (∫_0^t f ds)² ≤ t ∫_0^t |f|² ds. We then use the Lipschitz continuity of b. The result is
    E( ∫_0^t [b(X_s,s) − b(X̂_s,s)] ds )² ≤ T E ∫_0^t |b(X_s,s) − b(X̂_s,s)|² ds ≤ K²T ∫_0^t E|X_s − X̂_s|² ds .

Now we estimate the second term using the Itô isometry and the Lipschitz continuity of σ:
    E( ∫_0^t [σ(X_s,s) − σ(X̂_s,s)] dW_s )² = E ∫_0^t |σ(X_s,s) − σ(X̂_s,s)|² ds ≤ K² ∫_0^t E|X_s − X̂_s|² ds .

Putting these estimates together shows that

    E|X_t − X̂_t|² ≤ C ∫_0^t E|X_s − X̂_s|² ds
for some constant C, for 0 ≤ t ≤ T. If we let φ(t) ≡ E|X_t − X̂_t|², then the inequality is

    φ(t) ≤ C ∫_0^t φ(s) ds   for all 0 ≤ t ≤ T .

Now we can use Gronwall's Inequality, which says that if we are given a function f and nonnegative numbers a, b ≥ 0, then

    f(t) ≤ a + b ∫_0^t f(s) ds   =⇒   f(t) ≤ a e^{bt} .

The proof is given in the appendix. Applying Gronwall's Inequality with f(t) = φ(t) = E|X_t − X̂_t|² and a = 0, b = C shows that

    E|X_t − X̂_t|² = 0   for all 0 ≤ t ≤ T .
Therefore for each fixed t ∈ [0,T] we have that X_t = X̂_t a.s. We have to show this holds for all t simultaneously, i.e. the whole trajectory is equal, except for ω in a set of measure 0. We can argue that X_r = X̂_r a.s. for all rational 0 ≤ r ≤ T, i.e. P(X_t = X̂_t ∀t ∈ Q ∩ [0,T]) = 1. This is because we can extend the equality to a countable set of t-values, say {t₁, t₂, ...}: for each t_i, the "bad" ω-values {ω : X_{t_i}(ω) ≠ X̂_{t_i}(ω)} form a measure-zero set, and a countable union of measure-zero sets is measure zero. By assumption X, X̂ have continuous sample paths almost surely, so we can extend the equality to all values of t using the fact that the rationals form a dense set in R, so P(X_t = X̂_t ∀t ∈ [0,T]) = 1.
Proof (Existence, for a simpler equation). (Based on Evans (2013), section 5.B.1) We will show existence for the simpler equation
    dX_t = b(X_t) dt + dW_t ,   X_0 = x ∈ R ,   (3)

where b ∈ C¹ with |b′| ≤ K for some constant K. The proof in the more general case uses similar ideas, see e.g. Evans (2013) section 5.B.3 or Oksendal (2005) section 5.2.
The proof is based on Picard iteration, as for the typical ODE existence proof. Let X_t⁰ = x, and define

    X_t^{n+1} = X_0 + ∫_0^t b(X_s^n) ds + W_t ,   n = 0, 1, ... .

Define

    Dⁿ(t) ≡ max_{0≤s≤t} |X_s^{n+1} − X_s^n| .

Notice that for a given, continuous sample path of Brownian motion we have
    D⁰(t) = max_{0≤s≤t} | ∫_0^s b(x) dr + W_s | ≤ C

for all 0 ≤ t ≤ T, where the constant C depends on the sample path via ω. We claim that

    Dⁿ(t) ≤ C Kⁿ tⁿ / n! .

We show this by induction. The base case n = 0 is true. Assume it holds for n − 1 and calculate
    Dⁿ(t) = max_{0≤s≤t} | ∫_0^s [ b(X_r^n) − b(X_r^{n−1}) ] dr |
          ≤ K ∫_0^t D^{n−1}(s) ds
          ≤ K ∫_0^t C K^{n−1} s^{n−1} / (n−1)! ds    (by the induction assumption)
          = C Kⁿ tⁿ / n! .
Therefore for m ≥ n we have, by the triangle inequality,

    max_{0≤t≤T} |X_t^m − X_t^n| ≤ Σ_{k=n}^{m−1} Dᵏ(T) ≤ C Σ_{k=n}^∞ Kᵏ Tᵏ / k!  → 0   as n → ∞ .
Therefore with probability 1, Xn converges uniformly for t ∈ [0,T] to a limit process X. One can check that X is continuous, adapted and solves (3). (See Varadhan (2007), p.90 for a more explicit construction of the uniform convergence argument.)
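The Picard iterates can be computed numerically for a concrete Lipschitz drift. The sketch below (the choice b(x) = −x with K = 1 is mine, for illustration) fixes one Brownian path and iterates the map X ↦ x₀ + ∫₀ᵗ b(X_s) ds + W_t; the successive sup-differences Dⁿ shrink factorially, as the proof predicts.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_steps = 1.0, 2000
dt = T / n_steps
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])

b = lambda x: -x                  # Lipschitz drift with constant K = 1
x0 = 1.0

# Picard iteration: X^{n+1}_t = x0 + int_0^t b(X^n_s) ds + W_t
X = np.full(n_steps + 1, x0)      # X^0 = x0
sup_diffs = []                    # D^n = max_t |X^{n+1}_t - X^n_t|
for _ in range(8):
    integral = np.concatenate([[0.0], np.cumsum(b(X[:-1]) * dt)])
    X_new = x0 + integral + W
    sup_diffs.append(np.max(np.abs(X_new - X)))
    X = X_new
```

After a handful of iterations the iterates agree to within C·Tⁿ/n!, mirroring the uniform convergence used in the existence proof.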
8.2 Examples of SDEs and their solutions
Let's look at some specific SDEs and their solutions. First we recall some useful properties of the Itô integral. We showed last lecture that
• The non-anticipating property: E ∫_0^t f(s,ω) dW_s = 0.
• The Itô isometry: E( ∫_0^t f(s,ω) dW_s )² = E ∫_0^t f²(s,ω) ds.
Another useful property is the following: for adapted processes g, h,

    E[ ∫_0^t g(s,ω) dW_s ∫_0^t h(s,ω) dW_s ] = ∫_0^t E[g(s,ω)h(s,ω)] ds .   (4)

To prove this property, apply Itô's isometry with f = h + g.
Formally, (4), as well as Itô's isometry, can be derived from the substitutions E dW_u = E dW_v = 0, E dW_u dW_v = δ(u − v) du dv, and the fact that g(u), h(v) are adapted, so they are each independent of dW_u, dW_v respectively. That is, write

    E[ ∫_0^t g(u) dW_u ∫_0^t h(v) dW_v ] = ∫_0^t ∫_0^t E[ g(u)h(v) dW_u dW_v ] .

Now decompose the integrand into different pieces, depending on the relationship between u, v. On the piece where v < u, the increment dW_u is independent of g(u)h(v) dW_v, so

    E[ g(u)h(v) dW_u dW_v ] = E[ g(u)h(v) dW_v ] E[dW_u] = 0 ,

and by the same argument the piece where u < v vanishes. Only the diagonal piece u = v survives, where E dW_u dW_v = δ(u − v) du dv, which gives (4).
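Property (4) can be checked by Monte Carlo. Taking the test functions g(s,ω) = W_s² and h ≡ 1 (my choice, for illustration), both sides equal ∫_0^T E W_s² ds = T²/2. The sketch below approximates the Itô integrals by left-endpoint sums:

```python
import numpy as np

rng = np.random.default_rng(3)
M, n, T = 50_000, 200, 1.0       # M sample paths, n time steps
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), (M, n))
W = np.concatenate([np.zeros((M, 1)), np.cumsum(dW, axis=1)], axis=1)

# I_g = int_0^T W_s^2 dW_s (left-endpoint Ito sum);  I_h = int_0^T 1 dW_s = W_T
I_g = np.sum(W[:, :-1] ** 2 * dW, axis=1)
I_h = W[:, -1]

lhs = np.mean(I_g * I_h)                               # E[ I_g I_h ]
rhs = np.mean(np.sum(W[:, :-1] ** 2, axis=1) * dt)     # E int_0^T g h ds
# both sides approximate T^2/2 = 0.5
```

Using the left endpoint of each interval is essential here: it is what makes the discrete sums adapted, matching the Itô convention the identity relies on.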
Example (Ornstein-Uhlenbeck process). Consider the equation

    dX_t = −a X_t dt + σ dW_t ,   X_0 = ξ ,   (5)

where ξ is independent of (W_t)_{t≥0}. When a > 0 the first term causes an exponential decay to zero, while the second causes fluctuations about this exponential decay, with additive noise. Here are some examples of simulated trajectories with a = σ = 1. With initial condition ξ = 0:
With initial condition ξ = 5:
You should notice that in the second set of trajectories, the process initially decays quite quickly, and when it gets close to zero it resembles the first set of trajectories.
Let's solve explicitly for the solution. Multiply both sides by e^{at} and integrate:

    e^{at} dX_t + a e^{at} X_t dt = σ e^{at} dW_t   (the left-hand side is d(e^{at} X_t))
    ⇒   e^{at} X_t − X_0 = σ ∫_0^t e^{as} dW_s .

Therefore the solution is

    X_t = e^{−at} X_0  [term (a)]  +  σ ∫_0^t e^{−a(t−s)} dW_s  [term (b)] .

Term (a) shows the initial condition is "forgotten" exponentially quickly. Term (b) represents the stochastic fluctuations, and is similar to the convolution of an exponential kernel with Brownian motion, e^{−at} ∗ W_t, except it only looks at values in the past. This term is Gaussian, since it is the Itô integral of a deterministic function. Hence, if X_0 is Gaussian, then X_t is Gaussian.
From the solution formula we can calculate the moments. The mean is

    E X_t = e^{−at} E X_0 ,

which decays exponentially to 0. The covariance is, using our formal calculations introduced at the beginning of the section,

    B(s,t) = E X_s X_t − E X_s E X_t
           = e^{−a(s+t)} ( E X_0² − (E X_0)² ) + σ² ∫_0^s ∫_0^t e^{−a(s−u)} e^{−a(t−v)} E dW_u dW_v    (E dW_u dW_v = δ(u−v) du dv)
           = e^{−a(s+t)} Var(X_0) + σ² ∫_0^{s∧t} e^{−a(s−u)} e^{−a(t−u)} du
           = e^{−a(s+t)} Var(X_0) + (σ²/2a) e^{−a(s+t)} ( e^{2a(s∧t)} − 1 )
           = e^{−a(s+t)} Var(X_0) + (σ²/2a) ( e^{−a|s−t|} − e^{−a(s+t)} ) .

We see that as t → ∞, Var(X_t) = B(t,t) → σ²/2a.
Now consider taking initial condition X_0 ∼ N(0, σ²/2a). Then the mean and covariance are

    E X_t = 0 ,   Cov(X_s, X_t) = (σ²/2a) e^{−a|s−t|} .
The mean is constant and the covariance only depends on the time difference s − t, so X_t is weakly stationary. Since X_t is Gaussian, it is also strongly stationary. Since it is strongly stationary, the one-point distributions are constant in time; therefore X_t ∼ N(0, σ²/2a) for all t. Such a distribution, which doesn't change with time, is called a stationary distribution. We found it here by a clever guess, but later we will see a systematic way to find stationary distributions.
It turns out the Ornstein-Uhlenbeck process is the only process that is simultaneously stationary, Markov, Gaussian, and has continuous paths. It comes up in a LOT of models. It is what you get when you linearize many SDEs.
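The stationary distribution can be verified by simulation. Because the OU transition over a step dt is Gaussian with known mean and variance, we can update paths exactly rather than by Euler discretization (a standard trick; the parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
a, sigma = 1.0, 1.0
var_stat = sigma**2 / (2 * a)                  # stationary variance sigma^2/(2a)

M, n, dt = 100_000, 200, 0.01                  # M paths, n steps of size dt
decay = np.exp(-a * dt)
step_sd = np.sqrt(var_stat * (1 - decay**2))   # exact one-step transition noise

X = rng.normal(0.0, np.sqrt(var_stat), M)      # start in N(0, sigma^2/(2a))
X0 = X.copy()
for _ in range(n):
    X = decay * X + step_sd * rng.normal(0.0, 1.0, M)

t = n * dt                                     # t = 2.0
emp_var = np.var(X)                            # should stay near sigma^2/(2a)
emp_cov = np.mean(X0 * X)                      # should be near (sigma^2/2a) e^{-a t}
```

Started in N(0, σ²/2a), the empirical variance stays at σ²/2a and the lag-t covariance decays like e^{−at}, matching the formulas above.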
Example (Geometric Brownian Motion). Consider the equation
    dN_t = r N_t dt + α N_t dW_t    (6)
This is a model for population growth, where r is the average growth rate, and αdWt represents fluctuations in the growth rate. It’s also a model for the stock market, with Nt = price of asset, r = interest rate, α = volatility. The difference from the OU process is in the diffusion term – now the noise is multiplicative.
To solve: first, divide by N_t:

    dN_t / N_t = r dt + α dW_t .
It would be nice to write the left-hand side as d(log N_t), but we can't, since we need to use the Itô formula. Instead, we calculate

    d(log N_t) = (1/N_t) dN_t − (1/2N_t²)(dN_t)² = (1/N_t) dN_t − (α²/2) dt .
Now substitute for dNt using the equation to get
    d(log N_t) = (r − α²/2) dt + α dW_t .

The solution is

    N_t = N_0 e^{(r − α²/2)t + αW_t} .
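The closed-form solution can be checked against a direct Euler-Maruyama discretization of (6) driven by the same Brownian increments (the parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
r, alpha, N0 = 0.05, 0.3, 1.0
T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)

# Euler-Maruyama recursion N_{i+1} = N_i (1 + r dt + alpha dW_i), as a product
N_em = N0 * np.prod(1.0 + r * dt + alpha * dW)

# Closed form on the same Brownian path: N_T = N0 exp((r - alpha^2/2) T + alpha W_T)
N_exact = N0 * np.exp((r - alpha**2 / 2) * T + alpha * np.sum(dW))

rel_err = abs(N_em - N_exact) / N_exact        # shrinks as dt -> 0
```

Note the −α²t/2 Itô correction in the exponent: without it the discretized path and the closed form would disagree systematically, not just by discretization error.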
Let's look at properties of this solution. First, let's calculate the mean, E N_t. To find this, we'll need to find the mean of Y_t = e^{αW_t}. Let's express Y_t as an Itô integral, so we can use the non-anticipating property of the Itô integral.
    dY_t = α e^{αW_t} dW_t + (α²/2) e^{αW_t} dt
    ∴  Y_t = Y_0 + α ∫_0^t e^{αW_s} dW_s + (α²/2) ∫_0^t e^{αW_s} ds
    ∴  E Y_t = E Y_0 + (α²/2) ∫_0^t E Y_s ds
    ∴  d/dt E Y_t = (α²/2) E Y_t ,   E Y_0 = 1
    ∴  E Y_t = e^{α²t/2} .
Therefore E N_t = (E N_0) e^{rt}. The expected value grows with the average rate.
What happens to the trajectories themselves? Do they also increase a.s. with the average rate? Recall the Law of the Iterated Logarithm, which says that

    limsup_{t→∞} W_t / √(2t log log t) = 1   a.s.

This implies that the supremum of e^{αW_t} grows as e^{α√(2t log log t)} as t → ∞, which is slower than linear in the exponential. Therefore the trajectory behaviour depends on r − α²/2:
• If r > α²/2, then N_t → ∞ a.s. as t → ∞.
• If r < α²/2, then N_t → 0 a.s..
• If r = α²/2, then N_t will fluctuate between values that are arbitrarily large and arbitrarily close to zero, a.s..
Notice that the mean and the trajectory do not always behave the same way at ∞. If 0 < r < α²/2, then E N_t → ∞ while N_t → 0 a.s.! This apparent paradox arises because increasingly large (but rare) fluctuations dominate the expectation. You should think about this.
Example (Stochastically forced harmonic oscillator). Consider the ODE

    m d²X_t/dt² + γ dX_t/dt + k X_t = f(t) .   (7)

Here X_t is a harmonic oscillator, representing say the angle of a pendulum, like a swing, under small perturbations from its rest state, m is its mass, k is a spring constant, γ is a damping coefficient, and f(t) is the external forcing. If you have taken a mechanics course, you know that when a harmonic oscillator is forced periodically, say with forcing f(t) = sin ωt, and the forcing frequency does not equal the resonant frequency, ω ≠ √(k/m), then the oscillations will be bounded, and usually quite small – if you pump your legs on a swing too quickly or too slowly, the swing doesn't move very much. However, when the frequency of the forcing exactly equals the resonant frequency, f(t) = sin(√(k/m) t), the oscillations will grow without bound – with no damping, you could make the swing go all the way around.
What happens when the forcing is stochastic? If you pump your legs completely at random, will you swing? We will answer this by letting f(t) = ση(t), where η(t) is a white noise, and solving explicitly for a solution.
We can write this as an SDE by letting V_t = dX_t/dt (taking m = 1 and writing the spring constant as k², so the resonant frequency is k):

    dX_t = V_t dt
    dV_t = (−k² X_t − γ V_t) dt + σ dW_t^{(2)}

or in matrix form

    d [X_t; V_t] = [0, 1; −k², −γ] [X_t; V_t] dt + [0, 0; 0, σ] d[W_t^{(1)}; W_t^{(2)}] ,

where the drift matrix is −A and the noise matrix is B. This has the form

    dU_t = −A U_t dt + B dW_t ,

where U_t = (X_t, V_t)^T, W_t = (W_t^{(1)}, W_t^{(2)})^T, and A, B are constant matrices. This is a 2-dimensional Ornstein-Uhlenbeck process. We can solve it in the same way as the first example:
    d(e^{At} U_t) = e^{At} dU_t + A e^{At} U_t dt
                 = e^{At} (−A U_t dt + B dW_t) + A e^{At} U_t dt
                 = e^{At} B dW_t
    ⇒   U_t = e^{−At} U_0 + ∫_0^t e^{−A(t−s)} B dW_s .

Consider cases.
• Case γ = 0 (no damping). Then A = [0, −1; k², 0], which has eigenvalues λ = ±ik and corresponding eigenvectors u₁ = (1, −ik)^T, u₂ = (1, ik)^T. Then e^{At} = U e^{Dt} U^{−1}, with U = (u₁ u₂), D = [ik, 0; 0, −ik]. Therefore

    e^{At} = [cos kt, −(1/k) sin kt; k sin kt, cos kt] .

Consider the solution with initial condition X_0 = 0, V_0 = 0:

    X_t = (σ/k) ∫_0^t sin(k(t − s)) dW_s^{(2)} .
This represents oscillations that grow without bound. To see this, notice that E X_t = 0 by the non-anticipating property of the Itô integral, and calculate the variance:

    E X_t² = (σ²/k²) ∫_0^t sin²(k(t − s)) ds    (by the Itô isometry)
           = (σ²/k²) ( t/2 − sin(2kt)/(4k) )
           ∼ (σ²/k²) (t/2)  → ∞   as t → ∞.

This means that a stochastically forced swingset will swing!² So if you pump your legs completely at random, you can make the swing go as high as you want, in principle. It is somewhat remarkable that although you are forcing all frequencies equally, and almost all of these are non-resonant, you still have enough forcing near the resonant frequency to make the oscillations grow.
• Case γ ≠ 0 (with damping). In this case the eigenvalues of A have a real and imaginary part. The real part is negative and represents exponential damping. The imaginary part represents oscillations, with a frequency that is slightly shifted from k.
Exercise 8.1. Write down the solution to (7) explicitly, when γ ≠ 0.
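The linear growth of the variance can be seen in simulation. The sketch below integrates the undamped system with a semi-implicit (symplectic) Euler step, which handles the oscillatory part more stably than plain Euler; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
k, sigma = 1.0, 1.0
T, n, M = 20.0, 4000, 10_000      # final time, steps, number of paths
dt = T / n

X = np.zeros(M)                   # X_0 = 0
V = np.zeros(M)                   # V_0 = 0
for _ in range(n):
    # semi-implicit Euler: update V with the old X, then X with the new V
    V = V - k**2 * X * dt + sigma * rng.normal(0.0, np.sqrt(dt), M)
    X = X + V * dt

# theoretical variance (sigma^2/k^2)(T/2 - sin(2kT)/(4k)) ~ sigma^2 T/(2k^2)
var_theory = sigma**2 / k**2 * (T / 2 - np.sin(2 * k * T) / (4 * k))
emp_var = np.var(X)
```

The empirical variance of X_T tracks the formula above and grows roughly linearly in T, the signature of stochastic resonance-less amplification discussed in the text.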
One way to gain insight into the properties of the solution is to assume that a stationary solution exists, and to calculate its spectral density. Let us multiply (7) at time t, with (7) at time 0, to get