Stochastic Differential Equations: A SAD Primer
Adam Monahan
July 8, 1998

1 The Wiener Process

The stochastic process $W_t$ is a Wiener process if its increments are normally distributed,
$$W_t - W_s \sim \mathcal{N}(0, t - s), \tag{1}$$
and are independent when non-overlapping:
$$E\{(W_{t_1} - W_{s_1})(W_{t_2} - W_{s_2})\} = 0 \quad \text{if } (s_1, t_1) \cap (s_2, t_2) = \emptyset. \tag{2}$$
In particular, we have
$$E\{W_t - W_s\} = 0 \tag{3}$$
$$E\{(W_t - W_s)^2\} = t - s. \tag{4}$$
Furthermore, if we fix the initial value of the process to be zero,
$$W_0 = 0, \tag{5}$$
then we obtain
$$E\{W_t\} = 0 \tag{6}$$
$$E\{(W_t)^2\} = t \tag{7}$$
$$E\{W_t W_s\} = \min(t, s), \tag{8}$$
where the third of these equations follows from (2) by noting that if $t > s$, then
$$W_t = (W_t - W_s) + W_s, \tag{9}$$
so
$$E\{W_t W_s\} = E\{((W_t - W_s) + W_s) W_s\} \tag{10}$$
$$= E\{(W_t - W_s) W_s\} + E\{(W_s)^2\} \tag{11}$$
$$= 0 + s \tag{12}$$
by linearity of the expectation operator, the first term vanishing by the independence of increments (2).

Realisations of the Wiener process are a.s. continuous, i.e.,
$$P\left(\lim_{s \to t} W_s = W_t\right) = 1, \tag{13}$$
but are nowhere differentiable. This latter result can be understood to arise from the fact that
$$\frac{W_{t+h} - W_t}{h} \sim \mathcal{N}\left(0, \frac{1}{h}\right), \tag{14}$$
a distribution whose variance diverges in the limit $h \to 0$.

We can actually sort of make sense of the above result if we indulge in a little "physics math" (PM): define the process $\xi_t$, which we'll call Gaussian white noise, as the distribution with moments
$$E\{\xi_t\} = 0 \tag{15}$$
$$E\{\xi_{t_1} \xi_{t_2}\} = \delta(t_1 - t_2) \tag{16}$$
$$E\{\xi_{t_1} \xi_{t_2} \xi_{t_3}\} = 0 \tag{17}$$
$$E\{\xi_{t_1} \xi_{t_2} \xi_{t_3} \xi_{t_4}\} = \delta(t_1 - t_2)\,\delta(t_3 - t_4) + \delta(t_2 - t_3)\,\delta(t_4 - t_1) + \delta(t_2 - t_4)\,\delta(t_1 - t_3) \tag{18}$$
and so on for all the higher moments of $\xi_t$. Note that the process $\xi_t$ has infinite variance and a zero decorrelation time; it is a curious beast indeed. As is usually the case with objects involving generalised functions, it can be tamed by integration; define the process $Y_t$ by
$$Y_t = \int_0^t dt'\, \xi_{t'}. \tag{19}$$
By explicit calculation, we see that $Y_t$ is well-defined and has all the properties of the Wiener process; for example, for $s < t$,
$$E\{(Y_t - Y_s)^2\} = \int_s^t \int_s^t dt'\, dt''\, E\{\xi_{t'} \xi_{t''}\} \tag{20}$$
$$= \int_s^t \int_s^t dt'\, dt''\, \delta(t' - t'') \tag{21}$$
$$= \int_s^t dt' \tag{22}$$
$$= t - s. \tag{23}$$
Thus, throwing rigour joyously to the wind, we will write
$$W_t = \int_0^t dt'\, \xi_{t'} \tag{24}$$
and
$$dW_t = \xi_t\, dt, \tag{25}$$
where you'll note that we haven't actually defined yet how to take integrals over a stochastic process (i.e., we haven't answered questions like: "How do we define convergence of Riemann sums?"). This PM approach is nice for those of us from a physics background, because it's somewhat more intuitive (at first, anyhow). Its lack of rigour is not really a problem for a certain class of stochastic differential equations (SDEs), those with so-called additive noise, but for the more general class of SDEs involving what's referred to as multiplicative noise, the PM approach muddies rather than clarifies matters, and we must fall back on the mathematically rigorous theory of integration over Wiener processes.

As a final point here, we note that the variance of $W_{t+h} - W_t$ scales as $h$ and not as $h^2$; this is a direct consequence of the fact that white noise as we've defined it has this strange delta-function autocovariance structure, which in particular implies an infinite variance.
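These moment relations are easy to check numerically: integrating white noise amounts to summing independent $\mathcal{N}(0, \Delta t)$ increments. The following Monte Carlo sketch (not part of the original primer; it uses NumPy, and the step size, horizon, ensemble size, and seed are arbitrary choices) verifies (6)-(8) and illustrates the divergence (14) of the difference quotient as $h \to 0$.

import numpy as np

rng = np.random.default_rng(0)
dt, T, n_paths = 1e-3, 1.0, 20000
n_steps = int(T / dt)

# Independent increments W_{t+dt} - W_t ~ N(0, dt), per (1) and (2);
# partial sums give sample paths of W_t with W_0 = 0, per (5).
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

i_t, i_s = int(0.8 / dt), int(0.3 / dt)     # t = 0.8, s = 0.3
print(np.mean(W[:, i_t]))                   # ~ 0,                cf. (6)
print(np.var(W[:, i_t]))                    # ~ t = 0.8,          cf. (7)
print(np.mean(W[:, i_t] * W[:, i_s]))       # ~ min(t, s) = 0.3,  cf. (8)

# Difference quotient (14): its variance is 1/h, diverging as h -> 0.
for h in (0.1, 0.01, 0.001):
    k = int(round(h / dt))
    q = (W[:, i_t + k] - W[:, i_t]) / h
    print(h, np.var(q))                     # ~ 1/h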
Suppose, by contrast, we consider the Gaussian coloured noise process $\zeta_t$, defined such that
$$E\{\zeta_t\} = 0 \tag{26}$$
$$E\{\zeta_t \zeta_{t+s}\} = g(s), \tag{27}$$
where $g(s)$ is a nice, normal, continuous function such that
$$g(|s|) \le g(0) < \infty \quad \forall s > 0 \tag{28}$$
and
$$\lim_{s \to \infty} g(|s|) = 0. \tag{29}$$
Then, for the integrated process $\tilde{W}_t = \int_0^t dt'\, \zeta_{t'}$,
$$E\{(\tilde{W}_{t+h} - \tilde{W}_t)^2\} = \int_t^{t+h} \int_t^{t+h} dt'\, dt''\, g(t' - t''). \tag{30}$$
As $g$ is continuous and finite on the domain of integration, from (28) we can bound the double integral from above:
$$E\{(\tilde{W}_{t+h} - \tilde{W}_t)^2\} \le g(0) \int_t^{t+h} \int_t^{t+h} dt'\, dt'' \tag{31}$$
$$= g(0)\, h^2, \tag{32}$$
so
$$E\left\{\left(\frac{\tilde{W}_{t+h} - \tilde{W}_t}{h}\right)^2\right\} \le g(0). \tag{33}$$
Thus, the variance of increments of the integrated coloured noise process scales as $h^2$, not as $h$, and so we can expect the derivatives to exist (in some sense).

2 Stochastic Differential Equations I: Additive Noise and the Ornstein-Uhlenbeck Process

Let's start off looking at SDEs by considering pretty much the simplest example, that of damped Brownian motion. It seems simple enough now, but it helped earn Albert Einstein a Nobel prize, so it's not trivial. Let's start with the PM equation
$$\frac{dv_t}{dt} = -\gamma v_t + \eta \xi_t. \tag{34}$$
Differential equations of this type, involving a stochastic term, are often referred to in the physics literature as Langevin equations. We imagine that this is the equation of motion for a grain of pollen in water; the pollen is subject to two forces: a frictional damping proportional to the velocity, and a random molecular jostling. Note that the use of Gaussian white noise to model this jostling force is a mathematical idealisation; a strictly zero decorrelation time is unphysical, as the molecular motions should have a nonzero memory. However, if that memory is very short compared to the characteristic timescale $\tau = \gamma^{-1}$ of the damping, then we assume (i.e., hope) that to a good approximation we can treat the stochastic forcing as delta-function autocorrelated. We see then that underlying this equation is the assumption of a broad temporal separation of scales: the timescale, $\nu$, of the internal memory of the fluctuations is very much smaller than the macroscopic timescale $\tau$ of the response. This is among the basic justifications for representing the effects on "slow" variables of "fast", chaotic variables as white stochastic forcing.

Note further that this separation of scales implies also that $\xi_t$ should have a (sort of) Gaussian distribution. On intermediate timescales $T$, such that
$$\nu \ll T \ll \tau, \tag{35}$$
we think of the net effect of these molecular jostlings as the sum of a very large number of independent small bumps, all with the same distribution. Now, invoking the central limit theorem, regardless of the precise characteristics of the distribution of the effect of an individual bump, the sum of a large number of them will have a Gaussian distribution. Thus, the separation-of-scales assumption allows us to model the effect of the noise on macroscopic timescales ($O(\tau)$) as delta-function autocorrelated with a Gaussian distribution.

Now, blazing forth with the PM approach to (34), we note that this equation can be rewritten as
$$\frac{d}{dt}\left(e^{\gamma t} v_t\right) = \eta\, e^{\gamma t} \xi_t, \tag{36}$$
which we integrate to obtain
$$e^{\gamma t} v_t - v_0 = \eta \int_0^t dt'\, e^{\gamma t'} \xi_{t'} \tag{37}$$
or
$$v_t = e^{-\gamma t} v_0 + \eta \int_0^t dt'\, e^{-\gamma(t - t')} \xi_{t'}, \tag{38}$$
where we'll assume the initial velocity $v_0$ is certain (i.e., has a delta-function distribution). The first term in (38) represents the deterministic drift toward the fixed point $v_t = 0$; in the absence of stochastic perturbations, a system started away from this fixed point drifts exponentially and monotonically toward it. The second term represents the integrated effect of all the rapid, random molecular jostlings on our grain of pollen; it has the effect of keeping the system away from the fixed point.
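Before working out the moments of $v_t$ analytically, it is instructive to integrate (34) numerically. Here is a minimal sketch, assuming the standard Euler-Maruyama discretisation (a scheme not discussed in this primer) in which, per (25), $\xi_t\, dt$ is replaced by an $\mathcal{N}(0, \Delta t)$ increment at each step; the parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
gamma, eta, v0 = 2.0, 1.0, 3.0           # arbitrary parameter choices
dt, T, n_paths = 1e-3, 2.0, 10000
n_steps = int(T / dt)

v = np.full(n_paths, v0)                 # certain initial velocity, as assumed above
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    v += -gamma * v * dt + eta * dW      # one Euler-Maruyama step of (34)

# Ensemble mean ~ first (deterministic) term of (38); ensemble variance
# ~ (eta^2 / 2 gamma)(1 - exp(-2 gamma T)), derived below as (45).
print(np.mean(v), np.exp(-gamma * T) * v0)
print(np.var(v), eta**2 / (2 * gamma) * (1 - np.exp(-2 * gamma * T)))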
Now, $v_t$ is a linear combination of the Gaussian $\xi_t$, so the distribution of $v_t$ is also Gaussian and consequently is characterised uniquely by its first and second moments. Let's have a look now at these moments:
$$E\{v_t\} = e^{-\gamma t} v_0, \tag{39}$$
as the white noise has mean zero, and (assuming $s > 0$)
$$E\{(v_t - E\{v_t\})(v_{t+s} - E\{v_{t+s}\})\} = \eta^2 \int_0^t dt' \int_0^{t+s} dt''\, e^{-\gamma(t - t')} e^{-\gamma(t + s - t'')} E\{\xi_{t'} \xi_{t''}\} \tag{40}$$
$$= \eta^2 \int_0^t dt' \int_0^{t+s} dt''\, e^{-\gamma(t - t')} e^{-\gamma(t + s - t'')} \delta(t' - t'') \tag{41}$$
$$= \eta^2 \int_0^t dt'\, e^{-\gamma(2t + s - 2t')} \tag{42}$$
$$= \frac{\eta^2}{2\gamma} e^{-\gamma(2t + s)} \left(e^{2\gamma t} - 1\right) \tag{43}$$
$$= \frac{\eta^2}{2\gamma} \left(e^{-\gamma s} - e^{-\gamma(2t + s)}\right). \tag{44}$$
In particular, the variance of $v_t$ is given by
$$E\{(v_t - E\{v_t\})^2\} = \frac{\eta^2}{2\gamma} \left(1 - e^{-2\gamma t}\right). \tag{45}$$
We see that the mean, variance, and autocovariance function of $v_t$ are not stationary for finite $t$, but that as $t \to \infty$ they approach the limiting stationary values
$$E^s\{v_t\} = 0 \tag{46}$$
$$E^s\{v_t^2\} = \frac{\eta^2}{2\gamma} \tag{47}$$
$$E^s\{v_t v_{t+s}\} = \frac{\eta^2}{2\gamma} e^{-\gamma |s|}, \tag{48}$$
where $E^s$ denotes expectation values taken with the limiting stationary distribution
$$p^s(v, t) = \left(\frac{\pi \eta^2}{\gamma}\right)^{-1/2} \exp\left(-\frac{\gamma}{\eta^2} v^2\right) \tag{49}$$
and conditional probability distribution (transition probabilities)
$$p^s(v_2, t + s \mid v_1, t) = \left(\frac{\pi \eta^2}{\gamma}\right)^{-1/2} \left(1 - e^{-2\gamma s}\right)^{-1/2} \exp\left(-\frac{1}{2} \frac{(v_2 - v_1 e^{-\gamma s})^2}{(\eta^2 / 2\gamma)(1 - e^{-2\gamma s})}\right), \tag{50}$$
which equation follows from the fact that the transition probability is Gaussian with mean $v_1 e^{-\gamma s}$ (cf. (39)) and variance $(\eta^2 / 2\gamma)(1 - e^{-2\gamma s})$ (cf. (45)).
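Because the transition probability (50) is Gaussian with mean $v_1 e^{-\gamma s}$ and variance $(\eta^2/2\gamma)(1 - e^{-2\gamma s})$, the Ornstein-Uhlenbeck process can be sampled exactly at discrete times, with no discretisation error. The following sketch (again not from the original primer; parameters and chain length are arbitrary) starts a trajectory in the stationary law (49) and checks the stationary autocovariance (48).

import numpy as np

rng = np.random.default_rng(2)
gamma, eta = 2.0, 1.0                    # arbitrary parameter choices
s, n_steps = 0.05, 200000                # sampling interval and chain length
var_s = eta**2 / (2 * gamma)             # stationary variance, cf. (47)
a = np.exp(-gamma * s)                   # conditional mean factor, cf. (50)
sd = np.sqrt(var_s * (1 - a**2))         # conditional standard deviation, cf. (50)

v = np.empty(n_steps)
v[0] = rng.normal(0.0, np.sqrt(var_s))   # initial draw from the stationary law (49)
for k in range(1, n_steps):
    v[k] = a * v[k - 1] + sd * rng.normal()   # exact draw from (50)

# Stationary autocovariance (48): (eta^2 / 2 gamma) exp(-gamma |lag|).
for lag in (0, 1, 5, 20):
    emp = np.mean(v[:n_steps - lag] * v[lag:])
    print(lag * s, emp, var_s * np.exp(-gamma * lag * s))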