
A short tutorial on the path-integral formalism for stochastic differential equations

Dani Martí
Group for Neural Theory, École Normale Supérieure, Paris, France

This is a very short and incomplete introduction to the formalism of path integrals for stochastic systems. The tutorial is strongly inspired by the very nice introduction by Chow and Buice (2015), which is far more complete than this text and whose reading I highly recommend if you are interested in the subject.

Goal: Provide a general formalism to compute moments, correlations, and transition probabilities for any stochastic process, regardless of its nature. The formalism should allow for a systematic perturbative treatment.

Preliminaries: Moment generating functionals

Single variable.  Consider a single random variable X with density function p(x), not necessarily normalized. The n-th moment of X is defined as

\[
\langle X^n \rangle \equiv \frac{\int x^n p(x)\, dx}{\int p(x)\, dx}.
\]

Instead of computing the moments one by one, we can make use of the moment generating function, defined as

\[
Z(\lambda) \equiv \int e^{\lambda x} p(x)\, dx. \qquad (1)
\]

With this definition, moments can be computed by simply taking derivatives of Z(λ) and dividing the result by Z(0):

\[
\langle X^n \rangle = \frac{1}{Z(0)} \frac{d^n}{d\lambda^n} Z(\lambda) \bigg|_{\lambda=0}. \qquad (2)
\]

The expansion of Z(λ) in a power series then reads

\[
Z(\lambda) = Z(0) \sum_{m=0}^{\infty} \frac{\lambda^m}{m!} \langle X^m \rangle. \qquad (3)
\]
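As a quick numerical sanity check of Eqs. (1)-(3) (my own illustration, not part of the original argument), one can estimate the first two moments of an unnormalized density both directly and through finite-difference derivatives of Z(λ); the density, the integration cutoff, and the step size below are arbitrary choices.

    # Moments from derivatives of Z(lambda), Eq. (2), vs. direct integrals.
    # The density p(x) = exp(-x) on [0, 50] (a proxy for [0, inf)) is an
    # illustrative choice, as are the cutoff and the finite-difference step.
    import numpy as np
    from scipy.integrate import quad

    p = lambda x: np.exp(-x)                                   # unnormalized density
    Z = lambda lam: quad(lambda x: np.exp(lam * x) * p(x), 0, 50)[0]

    eps = 1e-3
    Z0 = Z(0.0)
    first  = (Z(eps) - Z(-eps)) / (2 * eps) / Z0               # dZ/dlam at 0
    second = (Z(eps) - 2 * Z0 + Z(-eps)) / eps**2 / Z0         # d^2Z/dlam^2 at 0

    m1 = quad(lambda x: x * p(x), 0, 50)[0] / Z0               # direct <X>
    m2 = quad(lambda x: x**2 * p(x), 0, 50)[0] / Z0            # direct <X^2>
    print(first, m1)    # both close to 1
    print(second, m2)   # both close to 2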

• Example: moment-generating function for a Gaussian random variable X with mean µ and variance σ². Plugging the (unnormalized) Gaussian probability density p(x) = exp[−(x − µ)²/(2σ²)] into Eq. (1) gives

\[
Z(\lambda) = \int_{-\infty}^{\infty} \exp\!\left[ -\frac{(x - \mu)^2}{2\sigma^2} + \lambda x \right] dx, \qquad (4)
\]

which can be evaluated by 'completing the square' in the exponent,

\[
-\frac{1}{2\sigma^2}(x - \mu)^2 + \lambda x = -\frac{1}{2\sigma^2}\left(x - \mu - \lambda\sigma^2\right)^2 + \lambda\mu + \frac{\lambda^2\sigma^2}{2}.
\]

Carrying out the integral over x yields

\[
Z(\lambda) = Z(0)\, \exp\!\left( \lambda\mu + \frac{1}{2}\lambda^2\sigma^2 \right),
\]

where Z(0) = ∫ exp[−(x − µ)²/(2σ²)] dx = √(2π) σ is the normalization factor. Once we have Z(λ) we can easily compute the moments using Eq. (2). Thus, the first and second moments of X are

\[
\langle X \rangle = \frac{d}{d\lambda} \exp\!\left( \lambda\mu + \frac{1}{2}\lambda^2\sigma^2 \right) \bigg|_{\lambda=0} = \mu,
\qquad
\langle X^2 \rangle = \frac{d^2}{d\lambda^2} \exp\!\left( \lambda\mu + \frac{1}{2}\lambda^2\sigma^2 \right) \bigg|_{\lambda=0} = \mu^2 + \sigma^2.
\]
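The same computation can be checked symbolically; below is a minimal sketch using sympy (my own illustration), which simply reproduces the two derivatives above.

    # Symbolic check of the Gaussian example: moments via Eq. (2).
    import sympy as sp

    lam = sp.Symbol('lambda', real=True)
    mu = sp.Symbol('mu', real=True)
    sigma = sp.Symbol('sigma', positive=True)

    Z_ratio = sp.exp(lam * mu + lam**2 * sigma**2 / 2)         # Z(lambda)/Z(0)
    print(sp.diff(Z_ratio, lam).subs(lam, 0))                  # -> mu
    print(sp.simplify(sp.diff(Z_ratio, lam, 2).subs(lam, 0)))  # -> mu**2 + sigma**2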

The cumulants of the random variable X, denoted by ⟨⟨Xⁿ⟩⟩, are defined by

\[
\langle\langle X^n \rangle\rangle = \frac{d^n}{d\lambda^n} \ln \frac{Z(\lambda)}{Z(0)} \bigg|_{\lambda=0}. \qquad (5)
\]

Equivalently, the expansion of ln Z(λ) in a power series is

\[
\ln Z(\lambda) = \ln Z(0) + \sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \langle\langle X^n \rangle\rangle. \qquad (6)
\]

The relation between cumulants and moments follows from equating the expansion (6) to the logarithm of the expansion (3),

\[
\sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \langle\langle X^n \rangle\rangle
= \ln \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} \langle X^n \rangle
= \ln\!\left( 1 + \sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \langle X^n \rangle \right), \qquad (7)
\]

expanding the right-hand side of Eq. (7), and equating coefficients of equal powers of λ. The first three cumulants are

\[
\langle\langle X \rangle\rangle = \langle X \rangle, \qquad
\langle\langle X^2 \rangle\rangle = \langle X^2 \rangle - \langle X \rangle^2, \qquad
\langle\langle X^3 \rangle\rangle = \langle X^3 \rangle - 3 \langle X^2 \rangle \langle X \rangle + 2 \langle X \rangle^3.
\]
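These relations can be verified mechanically by expanding the logarithm in Eq. (7) with a computer algebra system; the sketch below (my own illustration, truncated at third order) recovers the three cumulants just quoted.

    # Expand ln(1 + sum_n lambda^n <X^n>/n!) and read off the cumulants, Eq. (7).
    import sympy as sp

    lam = sp.Symbol('lambda')
    m1, m2, m3 = sp.symbols('m1 m2 m3')          # stand-ins for <X>, <X^2>, <X^3>
    series = sp.log(1 + m1*lam + m2*lam**2/2 + m3*lam**3/6).series(lam, 0, 4).removeO()
    series = sp.expand(series)

    for n in (1, 2, 3):
        print(n, sp.expand(sp.factorial(n) * series.coeff(lam, n)))
    # prints: m1,  m2 - m1**2,  m3 - 3*m1*m2 + 2*m1**3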

• Example: cumulant-generating function for a Gaussian random variable X with mean µ and variance σ². We just compute the logarithm of the moment-generating function found in the previous example,

\[
\ln Z(\lambda) = \ln Z(0) + \lambda\mu + \frac{1}{2}\lambda^2\sigma^2,
\]

and identify the cumulants through relation (6). We have ⟨⟨X⟩⟩ = µ, ⟨⟨X²⟩⟩ = σ², and ⟨⟨Xⁿ⟩⟩ = 0 for n > 2.

Multidimensional variables

Let X be a random variable having r components X₁, X₂, ..., X_r, and with joint probability density p(x) ≡ p(x₁, x₂, ..., x_r). The moments of X are

\[
\langle X_1^{m_1} X_2^{m_2} \cdots X_r^{m_r} \rangle = \frac{\int x_1^{m_1} x_2^{m_2} \cdots x_r^{m_r}\, p(x)\, dx}{\int p(x)\, dx},
\]

where dx ≡ dx₁ dx₂ ··· dx_r. The moment-generating function of X is a straightforward generalization of the one-dimensional version,

\[
Z(\lambda) \equiv \int \exp(\lambda^{\mathsf{T}} x)\, p(x)\, dx, \qquad (8)
\]

where here λ is an r-dimensional vector and λᵀx is just the dot product between λ and x. Moments are generated with

\[
\langle X_1^{m_1} X_2^{m_2} \cdots X_r^{m_r} \rangle = \frac{1}{Z(0)}\, \frac{\partial^{m_1}}{\partial\lambda_1^{m_1}} \frac{\partial^{m_2}}{\partial\lambda_2^{m_2}} \cdots \frac{\partial^{m_r}}{\partial\lambda_r^{m_r}}\, Z(\lambda) \bigg|_{\lambda=0}. \qquad (9)
\]

• Example: moment-generating function for a Gaussian random vector X with zero mean and covariance matrix Σ = K, X ∼ N(0, K). If we drop, as usual, the normalization factor of the probability density, we have

\[
Z(\lambda) = \int \exp\!\left( -\frac{1}{2}\, x^{\mathsf{T}} K^{-1} x + \lambda^{\mathsf{T}} x \right) dx. \qquad (10)
\]

The matrix K is symmetric and therefore diagonalizable (if K happens to be non-symmetric, the contribution from its antisymmetric part to the quadratic form xᵀK⁻¹x vanishes). Let us denote by {uᵢ} the basis of unit eigenvectors of K, and by U the matrix that results from arranging the set {uᵢ} in columns. Because {uᵢ} is an orthonormal set, UᵀU = 1. The diagonalization condition can be summarized by KU = UD, where D is the diagonal matrix with the eigenvalue σᵢ² at its i-th diagonal entry. In terms of the new basis, K⁻¹ = UD⁻¹Uᵀ, x = Σᵢ cᵢuᵢ = Uc, and λ = Σᵢ dᵢuᵢ = Ud, where c and d are coefficient vectors. We have

\[
\exp\!\left( -\frac{1}{2}\, x^{\mathsf{T}} K^{-1} x + \lambda^{\mathsf{T}} x \right)
= \exp\!\left( -\frac{1}{2}\, c^{\mathsf{T}} D^{-1} c + d^{\mathsf{T}} c \right)
= \prod_i \exp\!\left( -\frac{c_i^2}{2\sigma_i^2} + c_i d_i \right),
\]

and therefore

\[
\begin{aligned}
Z(\lambda) &= \int \prod_i \exp\!\left( -\frac{c_i^2}{2\sigma_i^2} + c_i d_i \right) |U|\, dc
= \prod_i \int \exp\!\left( -\frac{c_i^2}{2\sigma_i^2} + c_i d_i \right) dc_i \\
&= \prod_i \left( 2\pi\sigma_i^2 \right)^{1/2} \exp\!\left( \frac{d_i^2 \sigma_i^2}{2} \right)
= |2\pi D|^{1/2} \exp\!\left( \frac{1}{2}\, d^{\mathsf{T}} D\, d \right)
= |2\pi K|^{1/2} \exp\!\left( \frac{1}{2}\, \lambda^{\mathsf{T}} K \lambda \right), \qquad (11)
\end{aligned}
\]

where in the first line we used that the Jacobian |U| is 1 for orthogonal transformations, and where in the second line we completed the square for all the decoupled factors.
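A quick Monte Carlo sanity check of the ratio Z(λ)/Z(0) = exp(½ λᵀKλ) implied by Eq. (11): sample x ∼ N(0, K) and average exp(λᵀx). The particular K and λ below are arbitrary choices of mine.

    # Monte Carlo check: <exp(lambda^T x)> = exp(lambda^T K lambda / 2) for x ~ N(0, K).
    import numpy as np

    rng = np.random.default_rng(0)
    K = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
    lam = np.array([0.3, -0.2])

    x = rng.multivariate_normal(mean=np.zeros(2), cov=K, size=200_000)
    estimate = np.mean(np.exp(x @ lam))
    exact = np.exp(0.5 * lam @ K @ lam)
    print(estimate, exact)   # should agree to within ~1%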

Although the Gaussian example is just a particular case, it is an important one because it can be computed in a closed form and because it is the basis for many perturbative schemes. The moment generating function Z(λ) is an exponential of a quadratic form in λ, which implies that only moments of even order will survive (see Eq. (9)). In particular we have

\[
\langle x_a x_b \rangle = \frac{1}{Z(0)}\, \frac{\partial}{\partial\lambda_a} \frac{\partial}{\partial\lambda_b}\, Z(\lambda) \bigg|_{\lambda=0} = K_{ab},
\]

where a, b are any two indices of the components of x.

Moreover, it is not difficult to convince oneself that for Gaussian vectors any even-order moment can be obtained as a sum over all the possible pairings between variables, for example:

\[
\langle x_a x_b x_c x_d \rangle = \langle x_a x_b \rangle \langle x_c x_d \rangle + \langle x_a x_c \rangle \langle x_b x_d \rangle + \langle x_a x_d \rangle \langle x_b x_c \rangle.
\]

This property is usually referred to as Wick's theorem, or Isserlis's theorem.
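Wick's theorem is also easy to verify numerically; the sketch below compares a sampled fourth-order moment with the sum over pairings, for a covariance matrix of my own choosing.

    # Monte Carlo check of Wick's theorem for a 3-component Gaussian vector.
    import numpy as np

    rng = np.random.default_rng(1)
    K = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])
    x = rng.multivariate_normal(np.zeros(3), K, size=500_000)

    a, b, c, d = 0, 1, 2, 0                      # indices; repetitions are allowed
    lhs = np.mean(x[:, a] * x[:, b] * x[:, c] * x[:, d])
    rhs = K[a, b] * K[c, d] + K[a, c] * K[b, d] + K[a, d] * K[b, c]
    print(lhs, rhs)                              # equal up to sampling error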

Fields.  We can extend the formalism from finite-dimensional systems to functions defined on some domain [0, T]. The basic idea is to discretize the domain interval into n segments of length h such that T = nh, and then take the limits n → ∞ and h → 0 while keeping T = nh constant. In this limit the originally discrete set {t₀, t₁, ..., tₙ = T} becomes dense on the interval [0, T]. Any t can then be approximated as t = jh for some j = 0, 1, ..., n, and we can identify

\[
x_j \longrightarrow x(t), \qquad \lambda_j \longrightarrow \lambda(t), \qquad K_{ij} \longrightarrow K(s, t).
\]

Now instead of a moment-generating function we will have a moment-generating functional, an object that maps functions into real numbers. For the Gaussian case we have the functional version of Eqs. (10) and (11),

\[
\begin{aligned}
Z[\lambda(t)] &= \int \mathcal{D}x(s)\, \exp\!\left( -\frac{1}{2} \iint x(u)\, K^{-1}(u, v)\, x(v)\, du\, dv + \int \lambda(u)\, x(u)\, du \right) \\
&= Z[0]\, \exp\!\left( \frac{1}{2} \iint \lambda(u)\, K(u, v)\, \lambda(v)\, du\, dv \right), \qquad (12)
\end{aligned}
\]

where we use square brackets to denote a functional, and where Z[0] is the value of the generating functional with the external source λ(t) set to 0. We also defined the measure for integration over functions as

\[
\mathcal{D}x(t) \equiv \lim_{n\to\infty} \prod_{i=0}^{n} dx_i.
\]

The integral in Eq. (12) is an example of a path integral, or functional integral (Cartier and DeWitt-Morette, 2010). The presentation here is obviously very handwavy, and there are several ill-defined quantities lying around, like, for instance, the prefactor Z[0] = lim_{n→∞} |2πK|^{1/2}, which is formally infinite. Fortunately, the moments we derive from the moment-generating functional are well defined.

Now we need to compute the moments, for which we need the notion of functional derivative. Although functional derivatives can be defined rigorously, here we only need to remember the following axioms:

\[
\frac{\delta\lambda(t)}{\delta\lambda(s)} = \delta(t - s)
\qquad \left( \text{equivalent to } \frac{\partial\lambda_j}{\partial\lambda_i} = \delta_{ij} \right),
\]
\[
\frac{\delta}{\delta\lambda(t)} \int \lambda(s)\, x(s)\, ds = x(t)
\qquad \left( \text{equivalent to } \frac{\partial}{\partial\lambda_i} \sum_j \lambda_j x_j = x_i \right),
\]

where δ(·) is the Dirac delta, or point-mass functional, and δᵢⱼ is the Kronecker delta: δᵢⱼ = 1 if i = j, and 0 otherwise. As usual, moments are obtained by taking functional derivatives of the moment-generating functional:

\[
\left\langle \prod_i x(t_i) \right\rangle = \frac{1}{Z[0]} \prod_i \frac{\delta}{\delta\lambda(t_i)}\, Z[\lambda] \bigg|_{\lambda(t)=0}.
\]

• Example: given the moment-generating functional of a Gaussian process, Eq. (12), the two-point correlation function is

\[
\langle x(t)\, x(s) \rangle = \frac{1}{Z[0]}\, \frac{\delta}{\delta\lambda(t)} \frac{\delta}{\delta\lambda(s)}\, Z[\lambda] \bigg|_{\lambda(u)=0} = K(t, s).
\]

More generally, any (even-order) moment will be of the form

\[
\left\langle \prod_i x(t_i) \right\rangle = \frac{1}{Z[0]} \prod_i \frac{\delta}{\delta\lambda(t_i)}\, Z[\lambda] \bigg|_{\lambda(t)=0}
= \sum_{\text{all possible pairings}} K(t_{i_1}, t_{i_2}) \cdots K(t_{i_{2s-1}}, t_{i_{2s}}),
\]

because of Wick’s theorem.
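In practice the field picture is just the large-n limit of the multivariate Gaussian of the previous section, which suggests a simple numerical illustration: discretize [0, T], build the covariance matrix K(tᵢ, tⱼ) for some kernel, sample paths, and check the two-point function. The kernel K(s, t) = exp(−|s − t|) and the grid below are my own choices.

    # Discretized Gaussian "field": sample paths and check <x(t) x(s)> = K(t, s).
    import numpy as np

    rng = np.random.default_rng(2)
    t = np.linspace(0.0, 1.0, 50)                  # discretized domain [0, T]
    K = np.exp(-np.abs(t[:, None] - t[None, :]))   # covariance matrix K(t_i, t_j)

    paths = rng.multivariate_normal(np.zeros(t.size), K, size=100_000)
    i, j = 10, 30
    print(np.mean(paths[:, i] * paths[:, j]), K[i, j])   # should agree closely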

Functional formalism for stochastic differential equations

Imagine we have a stochastic process defined by the Langevin equation

\[
\frac{dx}{dt} = f(x, t) + g(x, t)\, \xi(t), \qquad (13)
\]

with initial condition x(t₀) = x₀, and with x(t) defined on the domain t ∈ [t₀, T]. The symbol ξ(t) denotes a source of white noise with zero mean and unit variance:

\[
\langle \xi(t) \rangle = 0, \qquad \langle \xi(t)\, \xi(t') \rangle = \delta(t - t').
\]

Equation (13) is to be interpreted as the Itô stochastic differential equation

\[
dx = f(x, t)\, dt + g(x, t)\, dB_t, \qquad (14)
\]

where B_t is a Brownian stochastic process and dB_t its increment. According to the convention for an Itô stochastic process, g(x, t) is non-anticipating, meaning that when we evaluate the integrals over time and B_t, g(x, t) is independent of B_τ for τ > t. To explicitly enforce our choice of initial condition we rewrite Eq. (13) as

\[
\frac{dx}{dt} = f(x, t) + g(x, t)\, \xi(t) + x_0\, \delta(t - t_0). \qquad (15)
\]

The goal now is to use the path-integral formalism to compute the moments and correlation functions of x(t).
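Before developing the path-integral machinery it is useful to keep in mind what a direct simulation of Eq. (13) looks like; the Itô (Euler-Maruyama) update below is the same discretization that reappears as Eq. (16). The particular drift and noise amplitude, an Ornstein-Uhlenbeck process, are illustrative choices of mine.

    # Euler-Maruyama simulation of dx/dt = f(x,t) + g(x,t) xi(t), Ito convention.
    import numpy as np

    def simulate(f, g, x0, t0, T, h, rng):
        n = int(round((T - t0) / h))
        x = np.empty(n + 1)
        x[0] = x0
        for i in range(n):
            t = t0 + i * h
            # non-anticipating: g is evaluated at (x_i, t_i), before the noise acts
            x[i + 1] = (x[i] + f(x[i], t) * h
                        + g(x[i], t) * np.sqrt(h) * rng.standard_normal())
        return x

    rng = np.random.default_rng(3)
    f = lambda x, t: -x             # drift of an Ornstein-Uhlenbeck process
    g = lambda x, t: 0.5            # constant (additive) noise amplitude
    path = simulate(f, g, x0=1.0, t0=0.0, T=10.0, h=1e-3, rng=rng)
    print(path.mean(), path.var())  # mean ~ 0, variance ~ g^2/2 after the transient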

Probability density functional of a stochastic process

We need first the probability density functional for x(t) given the initial condition x(t₀) = x₀. We denote this functional by P[x(t)|x₀]. This functional is also called the transition probability and is the equivalent of the probability density function for finite-dimensional variables, in the sense that it assigns a probability measure to each realization x(t) obeying x(t₀) = x₀. To work on less shaky ground, we start by discretizing Eq. (15) and working with a finite-dimensional system. For a small time step h, and according to the Itô prescription, Eq. (15) is to be interpreted as

\[
x_{i+1} - x_i = f_i(x_i)\, h + g_i(x_i)\, \xi_i \sqrt{h} + x_0\, \delta_{i0}, \qquad (16)
\]

where i ∈ {0, 1, ..., N}, T = Nh, δᵢⱼ is the Kronecker delta, and ξᵢ is a discrete random variable that satisfies ⟨ξᵢ⟩ = 0 and ⟨ξᵢξⱼ⟩ = δᵢⱼ. The subindices in fᵢ and gᵢ mean that the functions f(x, t)

and g(x, t) are evaluated at time t = ih. Note that the discretized stochastic variable xᵢ depends on both the discretized white-noise process ξᵢ and the initial condition x₀. Formally, the joint probability density for the vector x can be written as a point mass concentrated on the particular solution of the stochastic differential equation:

\[
P(x\,|\,\xi; x_0) = \prod_{i=0}^{N} \delta\!\left( x_{i+1} - x_i - f_i(x_i)\, h - g_i(x_i)\, \xi_i \sqrt{h} - x_0\, \delta_{i0} \right), \qquad (17)
\]

where we introduced the vectors x = (x₀, x₁, ..., x_N) and ξ = (ξ₀, ξ₁, ..., ξ_{N−1}). By particular solution we mean the one associated with a particular realization of ξ. We are interested in the joint probability density P(x|x₀) that results from integrating out the noise,

\[
P(x\,|\,x_0) = \int P(x\,|\,\xi; x_0) \prod_{j=0}^{N} p(\xi_j)\, d\xi_j \equiv \int P(x\,|\,\xi; x_0)\, p(\xi)\, d\xi. \qquad (18)
\]

To compute Eq. (18), with P(x|ξ; x₀) given by Eq. (17), we introduce the Fourier representation of the Dirac delta,

\[
\delta(x) = \frac{1}{2\pi} \int e^{-ikx}\, dk,
\]

for every factor in Eq. (17). We get

\[
P(x\,|\,\xi; x_0) = \int \prod_{j=0}^{N} \frac{dk_j}{2\pi}\, \exp\!\left[ -i \sum_j k_j \left( x_{j+1} - x_j - f_j(x_j)\, h - g_j(x_j)\, \xi_j \sqrt{h} - x_0\, \delta_{j0} \right) \right].
\]

To go further we need to specify the statistical properties of the noise source, determined by the probability density of ξᵢ, p(ξᵢ). Let us assume for the moment that the noise is Gaussian and white, for which

\[
p(\xi_i) = \frac{1}{\sqrt{2\pi}}\, \exp\!\left( -\xi_i^2/2 \right).
\]

The transition probability, Eq. (18), then reads

\[
\int P(x\,|\,\xi; x_0) \prod_{j=0}^{N} p(\xi_j)\, d\xi_j
= \int \prod_{j=0}^{N} \frac{dk_j}{2\pi}\, \exp\!\left[ -i \sum_j k_j \left( x_{j+1} - x_j - f_j(x_j)\, h - x_0\, \delta_{j0} \right) \right]
\times \int \prod_{j=0}^{N} \frac{d\xi_j}{\sqrt{2\pi}}\, \exp\!\left( i k_j g_j(x_j)\, \xi_j \sqrt{h} - \xi_j^2/2 \right).
\]

We can evaluate the last integral by completing the square, getting

\[
\begin{aligned}
P(x\,|\,x_0) &= \int \prod_{j=0}^{N} \frac{dk_j}{2\pi}\, \exp\!\left[ -\sum_j i k_j \left( \frac{x_{j+1} - x_j}{h} - f_j(x_j) - x_0 \frac{\delta_{j0}}{h} \right) h + \sum_j \frac{1}{2} (i k_j)^2\, g_j^2(x_j)\, h \right] \\
&\equiv \int \prod_{j=0}^{N} \frac{d\tilde{x}_j}{2\pi i}\, \exp\!\left[ -\sum_j \tilde{x}_j \left( \frac{x_{j+1} - x_j}{h} - f_j(x_j) - x_0 \frac{\delta_{j0}}{h} \right) h + \sum_j \frac{1}{2}\, \tilde{x}_j^2\, g_j^2(x_j)\, h \right],
\end{aligned}
\]

where for convenience we introduced the new complex variable x̃ᵢ = ikᵢ. If we now take the continuum limit h → 0 and N → ∞ while keeping T = Nh constant, and use

\[
x_i \longrightarrow x(t), \qquad
\tilde{x}_i \longrightarrow \tilde{x}(t), \qquad
\frac{x_{i+1} - x_i}{h} \longrightarrow \dot{x}(t), \qquad
\sum_{i=0}^{N} h \longrightarrow \int_0^T dt,
\]
\[
\prod_{j=0}^{N} \frac{d\tilde{x}_j}{2\pi i} \longrightarrow \mathcal{D}\tilde{x}(t), \qquad
x_0\, \frac{\delta_{i0}}{h} \longrightarrow x_0\, \delta(t - t_0), \qquad
f_i(x_i) \longrightarrow f\big(x(t), t\big),
\]

we obtain the probability density functional

\[
P[x(t)\,|\,x_0] = \int \mathcal{D}\tilde{x}(t)\, \exp\!\big( -S[x, \tilde{x}] \big), \qquad (19)
\]

where we have defined the action

\[
S[x, \tilde{x}] \equiv \int \left\{ \tilde{x}(t) \left[ \dot{x}(t) - f\big(x(t), t\big) - x_0\, \delta(t - t_0) \right] - \frac{1}{2}\, \tilde{x}^2(t)\, g^2\big(x(t), t\big) \right\} dt. \qquad (20)
\]

The action S[x, x̃] is commonly known as the Martin-Siggia-Rose-Janssen-de Dominicis action. The new field x̃(t) is called the auxiliary field, or response field. Note that because of the definition ikᵢ → x̃(t), the integrals over x̃(t) are to be evaluated along the imaginary axis, which is why no explicit imaginary units appear in the action. The expectation of any observable quantity O[x(t)] can now be formally represented as

\[
\langle O[x] \rangle = \int \mathcal{D}x(t)\, O[x(t)]\, P[x(t)\,|\,x_0]
= \int \mathcal{D}x(t)\, \mathcal{D}\tilde{x}(t)\, O[x(t)]\, \exp\!\big( -S[x, \tilde{x}] \big).
\]
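For additive noise (g constant) the Gaussian integral over x̃ in Eq. (19) can be carried out explicitly, and minus the logarithm of the discretized transition density, Eqs. (16)-(18), reduces to a sum of squared increments plus a normalization term. The sketch below evaluates that discretized weight for a simulated path; it is my own illustration under the additive-noise assumption, not a general implementation of the action (20).

    # Discretized -log P(x_1..x_N | x_0) for the Ito scheme of Eq. (16), additive noise.
    import numpy as np

    def minus_log_prob(x, f, g, h, t0=0.0):
        t = t0 + h * np.arange(len(x) - 1)
        increments = np.diff(x) - f(x[:-1], t) * h          # x_{i+1} - x_i - f_i h
        return np.sum(increments**2 / (2 * g**2 * h)
                      + 0.5 * np.log(2 * np.pi * g**2 * h))

    # Weight of an Ornstein-Uhlenbeck path (same f and g as in the earlier sketch)
    rng = np.random.default_rng(4)
    h, g = 1e-2, 0.5
    f = lambda x, t: -x
    x = np.empty(1001)
    x[0] = 1.0
    for i in range(1000):
        x[i + 1] = x[i] - x[i] * h + g * np.sqrt(h) * rng.standard_normal()
    print(minus_log_prob(x, f, g, h))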

Moment-generating functional.  We define the moment-generating functional analogously to how we defined the finite-dimensional version, Eq. (8), noting that scalar products become time integrals in function space:

\[
\lambda^{\mathsf{T}} x \longrightarrow \int \lambda(t)\, x(t)\, dt.
\]

The moment-generating functional for x(t) and x̃(t) reads

\[
\begin{aligned}
Z[J, \tilde{J}] &\equiv \left\langle \exp\!\left( \int \tilde{J}(t)\, x(t)\, dt + \int J(t)\, \tilde{x}(t)\, dt \right) \right\rangle \\
&= \int \mathcal{D}x(t)\, P[x(t)\,|\,x_0]\, \exp\!\left( \int \tilde{J}(t)\, x(t)\, dt + \int J(t)\, \tilde{x}(t)\, dt \right) \\
&= \int \mathcal{D}x(t)\, \mathcal{D}\tilde{x}(t)\, \exp\!\left( -S[x, \tilde{x}] + \int \tilde{J}(t)\, x(t)\, dt + \int J(t)\, \tilde{x}(t)\, dt \right).
\end{aligned}
\]

What if the noise is not white or Gaussian?  The formalism can incorporate noise other than Brownian. The noise source is by itself a stochastic process, which can be characterized by some probability functional

\[
P[\xi(t)\,|\,\xi_0, t_0] = \int \mathcal{D}\tilde{\xi}(t)\, \exp\!\big( -S[\xi, \tilde{\xi}] \big) = \exp\!\big( -S[\xi] \big),
\]

where we assumed that the action of the stochastic process depends on ξ only, S[ξ, ξ̃] = S[ξ]. So, for example, the action for a white Gaussian process is

\[
S[\xi] = \frac{1}{2} \int \xi^2(t)\, dt.
\]

Given the probability functional for the noise we can now write the transition probability for x(t) for any general noise source:

\[
\begin{aligned}
P[x(t)\,|\,x_0, t_0] &= \int \mathcal{D}\xi(t)\, \delta\!\big( \dot{x}(t) - f(x, t) - \xi(t) - x_0\, \delta(t - t_0) \big)\, \exp\!\big( -S[\xi(t)] \big) \\
&= \int \mathcal{D}\xi(t)\, \mathcal{D}\tilde{x}(t)\, \exp\!\left( -\int \tilde{x}(t) \left[ \dot{x}(t) - f(x, t) - x_0\, \delta(t - t_0) \right] dt + \int \tilde{x}(t)\, \xi(t)\, dt - S[\xi(t)] \right).
\end{aligned}
\]

We can further simplify this expression by noting that

\[
\int \mathcal{D}\xi(t)\, \exp\!\left( -S[\xi(t)] + \int \tilde{x}(t)\, \xi(t)\, dt \right)
\]

is nothing but the definition of the generating functional for ξ(t) with source x̃(t), i.e., Z[x̃(t)] ≡ exp(W[x̃(t)]). We can thus write

\[
P[x(t)\,|\,x_0, t_0] = \int \mathcal{D}\tilde{x}(t)\, \exp\!\left( -\int \tilde{x}(t) \left[ \dot{x}(t) - f(x, t) - x_0\, \delta(t - t_0) \right] dt + W[\tilde{x}(t)] \right).
\]

We have now defined all the quantities we need to compute the moments. Whether you want to learn how to actually compute them in several relevant examples, or you want to grasp the power of the path-integral formalism in the general case, I strongly encourage you to read Chow and Buice (2015).

References

Cartier P, DeWitt-Morette C (2010) Functional integration: action and symmetries. Cambridge University Press.

Chow C, Buice M (2015) Path integral methods for stochastic differential equations. The Journal of Mathematical Neuroscience 5(1):8. doi:10.1186/s13408-015-0018-5.
