Martingale calculus and a maximal inequality for supermartingales

B. Hajek

Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign

March 15, 2010

Abstract:

In the first hour of this two-part presentation, the calculus of semimartingales, which includes martingales with both continuous and discrete components, will be reviewed. In the second hour of the presentation, a tight upper bound is given involving the maximum of a supermartingale. Specifically, it is shown that if Y is a semimartingale with initial value zero and quadratic variation process [Y, Y] such that Y + [Y, Y] is a supermartingale, then the probability that the maximum of Y is greater than or equal to a positive constant a is less than or equal to 1/(1 + a). The proof uses the semimartingale calculus and is inspired by dynamic programming. If Y has stationary independent increments, the bounds of J.F.C. Kingman apply to this situation. Complements and extensions will also be given. (Preliminary paper posted to arXiv: http://arxiv.org/abs/0911.4444)

Outline

Brief review of martingale calculus

Kingman’s moment bound

Bound on maximum of a supermartingale (with no SII assumption)

The big jump construction

Proof of the upper bound

Discrete time, version I

Discrete time, version II

A lemma

Comparison to Doob’s moment bounds

PART I: Brief review of martingale calculus

See, for example, [4, 5, 7, 8, 9] for more detail.

The usual underlying conditions

- Assume (Ω, F, P) is complete (subsets of events with probability zero are events).
- Assume the filtration of σ-algebras F• = (F_t : t ≥ 0) is
  - right-continuous, and
  - each F_t includes all zero-probability events.
- Thus martingales, supermartingales, and submartingales have càdlàg (right-continuous with finite left limits) versions. We assume in these slides that such versions are used, without further explicit mention.

Predictable processes and L² stochastic integrals

- P = predictable subsets of R₊ × Ω (the σ-algebra of subsets of R₊ × Ω generated by random processes U(ω)I_{(a,b]}(t), where U is F_a-measurable).

- A process H = (H(t, ω) : t ≥ 0) is predictable if it is a P-measurable function of (t, ω).

- X is said to admit a predictable compensator A if A is a predictable process and X − A is a martingale.
- If M is an L² martingale (so sup_{t≥0} M_t² is integrable), then M_t² has a predictable compensator, written ⟨M, M⟩.
- Integrals H • M = (∫₀ᵗ H_s dM_s : t ≥ 0) for such M and a class of predictable processes H can be defined by focusing on the isometry, for t fixed:

  E[(H • M_t)²] = E[(H² • ⟨M, M⟩)_t].

- H • M is then also a martingale.

Localization in time and semimartingales

- A process X = (X_t : t ≥ 0) stopped at a stopping time T:

  X_t^T = X_0 if T = 0;  X_t if 0 ≤ t ≤ T and T > 0;  X_T if t ≥ T and T > 0.

- M is a local martingale if there is a sequence of stopping times T_n with T_n ≤ T_{n+1} and T_n → ∞ such that M^{T_n} is a martingale for each n.
- H is locally bounded if there is such a sequence (T_n) so that H^{T_n} is bounded for each n.
- A semimartingale is a random process X that can be represented as the sum of a local martingale and a (càdlàg) adapted process of locally finite variation.

Semimartingales as integrators

- Can define H • X = (∫₀ᵗ H_s dX_s : t ≥ 0) for H locally bounded and predictable and X a semimartingale.
- Use (T_n) such that X^{T_n} = X_0^n + M^n + A^n, where X_0^n is bounded, sup_t |M_t^n|² is integrable, the variation of A^n is bounded, and H^{T_n} is bounded.
- Define H • M^n as an L² stochastic integral and H • A^n as a Lebesgue–Stieltjes integral.
- Let H • X_t = lim_{n→∞} (H • M^n + H • A^n)_t.
- It is shown that the limit exists and is the same for all choices of (T_n), A^n, M^n.

Note: Δ(H • X)_t = H_t ΔX_t.

Quadratic variation processes

- [Y, Y] for a semimartingale Y is defined by

  [Y, Y]_t = lim_{n→∞} Σ_{i=0}^{k_n − 1} (Y_{t_{i+1}^n} − Y_{t_i^n})²

  for any partitions 0 = t_0^n < t_1^n < ··· < t_{k_n}^n = t such that max_i |t_{i+1}^n − t_i^n| → 0 as n → ∞.
- Decomposition: [Y, Y]_t = Σ_{s≤t} (ΔY_s)² + [Y, Y]_t^c.
- If M is a square integrable martingale, then M² − [M, M] and [M, M] − ⟨M, M⟩ are martingales. In particular, ⟨M, M⟩ is the predictable compensator of both M² and [M, M].
- If Y_t = X_t + bt then [X, X] = [Y, Y].
- Define [X, Y] similarly, or as ½([X + Y, X + Y] − [X, X] − [Y, Y]).
- If either X or Y has locally finite variation, then

  [X, Y]_t = X_0 Y_0 + Σ_{0<s≤t} ΔX_s ΔY_s.
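The partition-sum definition above can be illustrated numerically. A minimal sketch (assuming numpy; the grid sizes are arbitrary choices) approximates [W, W]_1 for a simulated standard Brownian path, where the partition sums approach t = 1 as the mesh shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)

# One Brownian path on [0, 1], sampled on a fine grid of n steps.
n = 2**16
dt = 1.0 / n
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

# Partition sums sum_i (W_{t_{i+1}} - W_{t_i})^2 over dyadic partitions
# of [0, 1]; as the mesh 2^-m shrinks, they approach [W, W]_1 = 1.
for m in (4, 8, 16):
    incr = np.diff(W[:: n // 2**m])
    qv = float(np.sum(incr**2))
    print(f"mesh 2^-{m}: partition sum = {qv:.4f}")
```

For a path with jumps, the same sums instead pick up the Σ(ΔY_s)² term in the decomposition above.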

Let W denote standard Brownian motion. Then, as is well known,

  [W, W]_t = [W, W]_t^c = ⟨W, W⟩_t = t.

Example: Poisson process

- Let N be a rate λ Poisson process.
- λt is the predictable compensator of N, so M_t = N_t − λt defines a martingale M.
- [N, N]_t = [M, M]_t = Σ_{s≤t} (ΔN_s)² = N_t, and ⟨M, M⟩_t = λt.

Generalized Itô formula

Let F be a twice continuously differentiable function and let X be a semimartingale. The generalized Itô formula (aka the Doléans-Dade–Meyer change of variables formula) is:

  F(X_t) = F(X_0) + ∫₀ᵗ F′(X_{u−}) dX_u + ½ ∫₀ᵗ F″(X_{u−}) d[X, X]_u^c
           + Σ_{0<u≤t} ( F(X_u) − F(X_{u−}) − F′(X_{u−}) ΔX_u ).
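For a piecewise-constant (pure-jump) semimartingale the formula can be checked exactly: [X, X]^c = 0, the integral reduces to Σ F′(X_{u−})ΔX_u, and the correction sum supplies the rest. A small sketch (assuming numpy; the jump sizes are arbitrary), with F(x) = x²:

```python
import numpy as np

rng = np.random.default_rng(1)

# A pure-jump path: X jumps by jumps[k] at time k+1 and is constant between.
jumps = rng.normal(size=20)
X = np.concatenate([[0.0], np.cumsum(jumps)])  # X[k] is the value after k jumps

F = lambda x: x**2
dF = lambda x: 2.0 * x

# Integral term: sum of F'(X_{u-}) dX_u over the jump times.
integral_term = np.sum(dF(X[:-1]) * jumps)
# Correction term: sum of F(X_u) - F(X_{u-}) - F'(X_{u-}) dX_u.
# For F(x) = x^2 each summand is (dX_u)^2, so this term equals [X, X]_t.
correction_term = np.sum(F(X[1:]) - F(X[:-1]) - dF(X[:-1]) * jumps)

lhs = F(X[-1]) - F(X[0])
rhs = integral_term + correction_term
print(lhs, rhs)
```

The run confirms the identity X_t² = 2∫X_{u−}dX_u + [X, X]_t, the F(x) = x² case of the formula.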

Part II: Kingman’s moment bound

Let S_0 = 0 and S_k = U_1 + ··· + U_k for k ≥ 1, where U_1, U_2, ... are iid with mean −µ and variance σ² < ∞. Let S* = sup_{k≥0} S_k. Kingman’s bound is

  E[S*] ≤ σ²/(2µ).   (1)

Proof of Kingman’s bound

Let W_n = max{0, U_1, U_1 + U_2, ..., U_1 + ··· + U_n}. Then W_n ↗ S* as n ↗ ∞, so by monotone convergence, E[W_n] ↗ E[S*].

  W_{n+1} = max{0, U_1 + max{0, U_2, U_2 + U_3, ..., U_2 + ··· + U_{n+1}}},

where the inner maximum has the same distribution as W_n, so

  W_{n+1} =_d (U + W_n)_+.   (2)

Trivially,

  U + W_n = (U + W_n)_+ − (U + W_n)_−.   (3)

Taking expectations on both sides of (3) and applying (2) and the fact E[W_{n+1}] ≥ E[W_n] yields

  E[(U + W_n)_−] ≥ µ.   (4)

Squaring both sides of (3), using a_+ a_− ≡ 0, taking expectations, and using (2) and E[W_{n+1}²] ≥ E[W_n²] yields

  E[U²] − 2µE[W_n] ≥ E[(U + W_n)_−²].   (5)

Rearranging (5) and applying (4) yields

  E[W_n] ≤ (E[U²] − E[(U + W_n)_−²])/(2µ)
        ≤ (E[U²] − µ² − E[(U + W_n)_−²] + E[(U + W_n)_−]²)/(2µ)
        = (σ² − Var((U + W_n)_−))/(2µ)
        ≤ σ²/(2µ).

Finally, letting n → ∞ yields (1).

Kingman’s bound in continuous time
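As a check on the constant, consider Brownian motion with drift, Y_t = σB_t − µt, the canonical SII example: its supremum Y* is exponentially distributed with mean σ²/(2µ), so Kingman’s bound holds with equality. A Monte Carlo sketch (assuming numpy; step size, horizon, and path count are arbitrary choices, and discretization biases the estimate slightly downward):

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma = 1.0, 1.0                  # drift -mu, variance parameter sigma^2
dt, horizon, n_paths = 0.01, 5.0, 5000
n_steps = int(horizon / dt)

# Discretized paths of Y_t = sigma * B_t - mu * t, started at Y_0 = 0.
incr = rng.normal(-mu * dt, sigma * np.sqrt(dt), size=(n_paths, n_steps))
Y = np.cumsum(incr, axis=1)

# Y* = sup_t Y_t; it is at least 0 since Y_0 = 0.
sup_est = np.maximum(Y.max(axis=1), 0.0)
est = float(sup_est.mean())
bound = sigma**2 / (2 * mu)
print(f"E[Y*] estimate = {est:.3f}, Kingman bound = {bound:.3f}")
```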

Kingman’s bound readily extends to continuous time. Let Y be a stationary independent increment (SII) process with Y_0 = 0 such that for some µ > 0 and σ² > 0,

  E[Y_t] = −µt and Var(Y_t) = σ²t.   (6)

Then E[Y*] ≤ σ²/(2µ).

Proof of Kingman’s bound in continuous time

For each integer n ≥ 0, let S^n denote the process S_k^n = Y_{k2⁻ⁿ}. Let S^{n*} = sup_{k≥0} S_k^n. By Kingman’s moment bound for discrete-time processes,

  E[S^{n*}] ≤ Var(S_1^n)/(−2E[S_1^n]) = σ²/(2µ).

Since S^{n*} is nondecreasing in n and converges a.s. to Y*, the result follows.

Reformulation of Kingman’s bound in continuous time

- Suppose Y is SII with Y_0 = 0. Then:
- E[Y_t] = −µt ↔ Y_t + µt is a martingale.
- Var(Y_t) = σ²t ↔ [Y, Y]_t − σ²t is a martingale. (M_t = Y_t + µt and [M, M]_t − σ²t are martingales; [M, M] = [Y, Y].)
- So (6) is equivalent to:

  Y_t + µt and [Y, Y]_t − σ²t are martingales.   (7)

- Let γ = µ/σ². Then Y_t + γ[Y, Y]_t = (Y_t + µt) + γ([Y, Y]_t − σ²t), so (7) implies that Y + γ[Y, Y] is a martingale. We have:

Proposition (Kingman bound in continuous time)
Let Y be an SII process with Y_0 = 0 such that Y + γ[Y, Y] is a supermartingale. Then E[Y*] ≤ 1/(2γ). Also, for any a > 0, P{Y* ≥ a} ≤ 1/(2aγ).

Part III: Bound on maximum of a supermartingale (no SII assumption, continuous time)

Suppose γ > 0.

- Condition 1: Y is a semimartingale with Y_0 = 0, and Y + γ[Y, Y] is a supermartingale.
- Condition 2 (stronger than Condition 1): Y_0 = 0, γ = µ/σ², and both (Y_t + µt : t ≥ 0) and ([Y, Y]_t − σ²t : t ≥ 0) are supermartingales.

Condition 2 implies Condition 1 because Y_t + γ[Y, Y]_t = (Y_t + µt) + γ([Y, Y]_t − σ²t).

Let Y* = sup{Y_t : t ≥ 0}.

Supermartingale bound

Proposition 1. Under Condition 1 or 2, for a ≥ 0:

(a) The following holds:

  P{Y* ≥ a} ≤ 1/(1 + γa).   (8)

(b) Equality holds in (8) if and only if the following is true, with T = inf{t ≥ 0 : Y_t ≥ a}: (Y_{t∧T} : t ≥ 0) has no continuous martingale component, Y is sample-continuous over [0, T) with probability one, P(Y_T = a | T < ∞) = 1, and ((Y + γ[Y, Y])_{t∧T} : t ≥ 0) is a martingale.

- Question: Does Condition 1 or 2 imply a finite upper bound on E[Y*] depending only on γ?

- Question: How closely can the inequalities in Proposition 1 be met for a single choice of Y not depending on a?

The following addresses these two questions.

Proposition 2. Given γ ≥ 0 there exists Y satisfying Condition 2 such that

  P{Y* ≥ a} ≥ 1/(5(1 + aγ))   (9)

for all a ≥ 0. In particular, E[Y*] = +∞ for this choice of Y.

Part IV. The big jump construction

This construction shows that the bound on P{Y* ≥ a} is tight, and provides a proof of Proposition 2.

Let µ > 0, σ² > 0, and let h : R₊ → R₊ be nondecreasing. Consider a random process Y of the form:

  Y_t = −y(t) if t < T;  Y_t = h(y(T)) − µ(t − T) if t ≥ T,

for a deterministic, continuous function y = (y(t) : t ≥ 0) and a random variable T described below. Note that if T < +∞ then Y_{T−} = −y(T), Y_T = h(y(T)), and ΔY_T = y(T) + h(y(T)).

Let y be a solution to the differential equation

  ẏ = µ + σ²/(y + h(y)),   y(0) = 0,

and let T be an extended nonnegative random variable such that for all t ≥ 0,

  P(T ≥ t) = exp(−∫₀ᵗ κ(y(s)) ds)  with  κ(y) = σ²/(y + h(y))².

The function κ(y(t)) is the failure rate function of T: P(T ≤ t + η | T ≥ t) = κ(y(t))η + o(η). The function κ was chosen so that

  E[(Y_{t+η} − Y_t)² | T > t] = (y(t) + h(y(t)))² κ(y(t))η + o(η) = σ²η + o(η),

and the differential equation for y was chosen so that

  E[Y_{t+η} − Y_t | T > t] = −ẏ(t)η + (y(t) + h(y(t)))κ(y(t))η + o(η) = −µη + o(η).

Therefore, Y satisfies Condition 2. If h is strictly increasing, then for any c ≥ 0, a change of variable of integration from t to y yields:

  P{Y* ≥ h(c)} = P{T ≥ y⁻¹(c)} − P{T = ∞} = exp(−I(c)) − exp(−I(∞)),   (10)

where

  I(c) = ∫₀^{y⁻¹(c)} κ(y(t)) dt = ∫₀^c κ(y) (µ + σ²/(y + h(y)))⁻¹ dy.

Example 1: Meeting the moment bound with equality

Take h(y) ≡ a for some a > 0. We don’t use (10) because h is not strictly increasing, but similar reasoning yields:

  P{Y* ≥ a} = 1 − P{T = ∞}
            = 1 − exp(−∫₀^∞ κ(y(t)) dt)
            = 1 − exp(−∫₀^∞ κ(y) (µ + σ²/(y + a))⁻¹ dy)
            = 1 − exp(−∫₀^∞ [1/(y + a) − µ/(yµ + aµ + σ²)] dy)
            = 1/(1 + aµ/σ²).

So Y satisfies the bound of Proposition 1 with equality for γ = µ/σ².

Example 2: Proof of Proposition 2

Take h(y) = b + y for some b > 0. Equation (10) yields that for c ≥ 0,

  P{Y* ≥ b + c} = exp(−I(c)) − exp(−I(∞)),

where

  I(c) = ∫₀^c (σ²/(b + 2y)²) (µ + σ²/(b + 2y))⁻¹ dy
       = ∫₀^c [1/(b + 2y) − µ/(µ(b + 2y) + σ²)] dy
       = ½ ln((b + 2c)/(µ(b + 2c) + σ²)) − ½ ln(b/(µb + σ²)).

Using this and (10), and setting c = a − b, yields

  P{Y* ≥ a} = (µb/(µb + σ²))^{1/2} [ (1 + σ²/(µ(2a − b)))^{1/2} − 1 ]   for a ≥ b,
  P{Y* ≥ a} = 1 − (µb/(µb + σ²))^{1/2}   for 0 < a ≤ b.   (11)

Let b = 16σ²/(9µ). Then P{Y* ≥ a} = 1/5 ≥ 1/(5(1 + µa/σ²)) for 0 ≤ a ≤ b. By checking derivatives, it is easy to verify that (1 + α/2)^{1/2} − 1 ≥ α/(4(1 + α)) for any α > 0. Therefore, for this choice of b, and a ≥ b,

  P{Y* ≥ a} ≥ (4/5) [ (1 + σ²/(2µa))^{1/2} − 1 ] ≥ 1/(5(1 + µa/σ²)).

This bound for the process Y proves Proposition 2.

Part V. Proof of the upper bound

Reformulation of the bound using X = a − Y.

Condition 1′: X is a semimartingale and γ ≥ 0, such that X − γ[X, X] is a submartingale.

Proposition 1′ (equivalent to Proposition 1). Suppose X and γ satisfy Condition 1′ and X_0 = a for some a ≥ 0. Let T = inf{t : X_t ≤ 0} (so T = ∞ if X_t > 0 for all t). (a) The following holds:

  P{T < ∞} ≤ 1/(1 + γa).   (12)

(b) Equality holds in (12) if and only if (X_{t∧T} : t ≥ 0) has no continuous martingale component, X is sample-continuous over [0, T) with probability one, P{X_T = 0 | T < ∞} = 1, and ((X − γ[X, X])_{t∧T} : t ≥ 0) is a martingale.

(Proof of Proposition 1′(a))

First show there is no loss of optimality if X does not overshoot zero. So we assume X_t ≥ 0 for all t. (Details in the paper.)

Let p(x) = 1/(1 + γx) and 0 ≤ s < t. By the generalized Itô formula,

  p(X_t) = p(X_s) + ∫_s^t p′(X_{u−}) dX_u + ½ ∫_s^t p″(X_{u−}) d[X, X]_u^c
           + Σ_{s<u≤t} ( p(X_u) − p(X_{u−}) − p′(X_{u−}) ΔX_u ).   (13)

  p′(X_{u−}) = −γ p(X_{u−})²,   (14)

  p(X_u) − p(X_{u−}) − p′(X_{u−}) ΔX_u = (ΔX_u)² γ² p(X_{u−})² p(X_u)
                                       ≤ (ΔX_u)² γ² p(X_{u−})²,   (15)

and

  p″(X_{u−}) = 2γ² p(X_{u−})³ ≤ 2γ² p(X_{u−})².   (16)

Combining (13)–(16) and the fact [X, X]_u = [X, X]_u^c + Σ_{v≤u} (ΔX_v)² yields

  p(X_t) ≤ p(X_s) − γ ∫_s^t p(X_{u−})² dG_u,   (17)

where G = X − γ[X, X]. By assumption, G is a submartingale, so

  E[∫_s^t p(X_{u−})² dG_u | F_s] ≥ 0.   (18)

Therefore, p(X) is a supermartingale, so that E[p(X_t)] ≤ p(X_0) = p(a). For t ≥ 0, {T ≤ t} = {p(X_t) = 1}. Therefore, P{T ≤ t} ≤ E[p(X_t)] ≤ p(a) for all t. Thus, P{T < ∞} = lim_{t→∞} P{T ≤ t} ≤ p(a), completing the proof of Proposition 1′(a).

(Proof of Proposition 1′(b))

To prove the uniqueness of the process constructed to meet the bound, we simply look over the last three slides to see when the various inequalities used in the upper bound are tight. Details are in the paper.

Part VI. Discrete time, version I

For the discrete-time setup suppose that (Ω, F, P) is a complete probability space with a filtration of σ-algebras (F_k : k ∈ Z₊).

Condition D1: S = (S_k : k ∈ Z₊) is an adapted random process with S_0 = 0 and, with U_j = S_j − S_{j−1} for j ≥ 1, (S_k + γ Σ_{1≤j≤k} U_j² : k ≥ 0) is a supermartingale.

Let S* = sup{S_k : k ∈ Z₊}.

Proposition 3. Suppose S and γ satisfy Condition D1 and a ≥ 0. (a) The following holds:

  P{S* ≥ a} ≤ 1/(1 + γa).   (19)

(b) For any γ ≥ 0, a ≥ 0 and ε > 0, there is a process S satisfying Condition D1 such that P{S* ≥ a} ≥ 1/(1 + γa) − ε.

(Proof of Proposition 3(a))

The filtration (F_k : k ∈ Z₊) can be extended to a filtration (F_t : t ∈ R₊) by letting F_t = F_⌊t⌋ for t ∈ R₊, and the process S can be extended to a piecewise constant process (Y_t : t ∈ R₊) by letting Y_t = S_⌊t⌋ for t ∈ R₊. Then S* = Y* and Y satisfies Condition 1. Thus, by Proposition 1, P{S* ≥ a} = P{Y* ≥ a} ≤ 1/(1 + γa). This establishes (a).

(Proof of Proposition 3(b) (outline))

- Start with a continuous-time process using the big jump construction, Example 1, with very small µ̃ and σ̃², and ã a bit larger than a.

- Sample the process at integer times.

- The constants should be selected so that if the continuous-time process reaches ã then the discrete-time process reaches a. (The details are in the paper.)

Part VII. Discrete-time, version II

Conditions 1 and 2 yield the same bounds for continuous time. The same is not true in discrete time. In this section we explore the discrete-time version of Condition 2, which we call Condition D2.

Application of Proposition 3

Condition D2(a): Suppose S = (S_k : k ∈ Z₊) is an adapted discrete-time process with S_0 = a and the increments U_k = S_k − S_{k−1} are such that U_{k+1} has conditional mean µ and conditional variance σ², given F_k.

Let T = min{k ≥ 0 : S_k ≤ 0}.

The process a − S satisfies Condition D1 with γ = µ/(µ² + σ²), so by Proposition 3, P{T < ∞} ≤ 1/(1 + γa). But this bound is not tight. For example, lim_{a↘0} P{T < ∞} < 1.

Dynamic programming equations

For a ≥ 0, let V_n(a) = max_{S ∼ Condition D2(a)} P{T ≤ n}. In particular, V_n(0) = 1. Let V_0(a) = I_{a=0}. Define the dynamic programming operator T by

  TU(a) = sup_{X ∼ (a+µ, σ²)} E[U(X)].

Then V_n = TⁿV_0 and the functions V_n increase with n. Denote the (pointwise) limit by V_∞. By the monotone convergence theorem, V_∞(a) = sup_{S ∼ Condition D2(a)} P{T < ∞}.

As noted, Proposition 3 with γ = µ/(µ² + σ²) yields V_∞ ≤ V^c, where V^c(a) = 1/(1 + γa). Since the operator T is monotone, we have by induction on n that V_∞ ≤ V̄_n for n ≥ 1, where V̄_n = TⁿV^c. Moreover, as n → ∞, for any fixed a > 0, the intervals [V_n(a), V̄_n(a)] shrink down to V_∞(a).

The operator T involves a maximization over a probability distribution. Let T̂ be the operator that results if, instead of maximizing over distributions on R₊ with mean a + µ and variance σ², we use the unique such distribution supported by zero and one other point. The operator T̂ can be expressed as follows:

  T̂U(a) = 1 − ((a + µ)²/((a + µ)² + σ²)) (1 − U(a + µ + σ²/(a + µ))).
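The iterates T̂ⁿV₀ are easy to compute numerically. A sketch (the function names are mine, and µ = σ² = 1 is an arbitrary choice) that also checks the iterates against the upper bound V^c(a) = 1/(1 + γa):

```python
mu, sigma2 = 1.0, 1.0
gamma = mu / (mu**2 + sigma2)

def T_hat(U, a):
    # One application of the two-point operator: from state a, the next
    # state has mean m = a + mu and variance sigma2, and is supported on
    # {0, b} with b = m + sigma2/m; the closed form uses U(0) = 1.
    m = a + mu
    p = m**2 / (m**2 + sigma2)          # probability of the point b
    return 1.0 - p * (1.0 - U(m + sigma2 / m))

def V_hat(n, a):
    # The n-th iterate T_hat^n V_0, where V_0(a) = 1{a = 0}.
    if n == 0:
        return 1.0 if a == 0 else 0.0
    return T_hat(lambda x: V_hat(n - 1, x), a)

a = 1.0
vals = [V_hat(n, a) for n in range(1, 30)]
upper = 1.0 / (1.0 + gamma * a)          # V^c(a), an upper bound on the limit
print(vals[-1], upper)
```

The run shows the iterates increasing in n while staying below V^c(a), consistent with the sandwich described next.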

Likewise, define V̂_n and V̂̄_n the way V_n and V̄_n were defined, but using T̂ in place of T. The intervals [V̂_n(a), V̂̄_n(a)] shrink down to a limit point V̂_∞(a), which is the maximum probability of ever reaching zero from an initial state a, using the two-point distributions. Unraveling the definitions yields that:

  V̂_n(a) = 1 − ∏_{k=1}^n (a(k) + µ)²/((a(k) + µ)² + σ²),   (20)

  V̂̄_n(a) = 1 − (a(n)µ/(a(n)µ + σ²)) ∏_{k=1}^n (a(k) + µ)²/((a(k) + µ)² + σ²),   (21)

where a(1) = a and a(k+1) = a(k) + µ + σ²/(a(k) + µ) for k ≥ 1.

We performed numerical calculations of the third derivative of V̂_∞ using (20) and (21), and found it to be negative for several choices of µ/σ, leading to the following conjecture.

Conjecture: The function V̂_∞ is twice continuously differentiable over (0, ∞) and its second derivative is monotone nonincreasing. (We find that 1 = V̂_∞(0) > V̂_∞(0+), so V̂_∞ is not continuous at zero.)

The function V̂_∞ is a fixed point of T̂. If the conjecture were true, then the lemma in the next section would imply that TV̂_∞ = T̂V̂_∞, implying that V̂_∞ is a fixed point of T also. Therefore, if the conjecture is true, V̂_∞ = V_∞, and the process S that has the maximum chance of reaching zero subject to the given constraints is the one using the two-point distributions.

Part VIII. A lemma providing insight

Let m > 0 and σ² ≥ 0. Let X_b have the unique probability distribution with mean m and variance σ² supported by {0, b} or {b} for some point b > 0. Specifically, b = (m² + σ²)/m,

  P{X_b = b} = m²/(m² + σ²)  and  P{X_b = 0} = σ²/(m² + σ²).
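The moments of X_b, and the comparison E[φ(X)] ≤ E[φ(X_b)] asserted in the lemma that follows, can be checked numerically. A sketch (assuming numpy; the Gamma alternative for X is an arbitrary choice with matching mean and variance), with φ(x) = 1/(1 + x):

```python
import numpy as np

rng = np.random.default_rng(3)

m, sigma2 = 2.0, 1.5

# Two-point distribution with mean m and variance sigma2, supported on {0, b}.
b = (m**2 + sigma2) / m
p_b = m**2 / (m**2 + sigma2)

phi = lambda x: 1.0 / (1.0 + x)   # phi'' = 2/(1+x)^3 is nonincreasing on (0, inf)

e_phi_two_point = p_b * phi(b) + (1.0 - p_b) * phi(0.0)

# An alternative nonnegative X with the same mean and variance:
# Gamma(shape k, scale theta) with k*theta = m and k*theta^2 = sigma2.
theta = sigma2 / m
k = m / theta
X = rng.gamma(k, theta, size=200_000)
e_phi_gamma = float(phi(X).mean())
print(e_phi_gamma, e_phi_two_point)
```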

Lemma. Suppose φ : R₊ → [0, 1] is a continuous function which is twice continuously differentiable over (0, ∞). Suppose φ″ is nonincreasing. (It easily follows that φ is nonincreasing and convex. For example, φ(x) = 1/(1 + x).) Let X be any nonnegative random variable with mean m and variance σ². Then E[φ(X)] ≤ E[φ(X_b)].

Proof of lemma (outline)

Let L(x) = φ(x) + λ₁x − λ₂x² for x ∈ R₊, where

  λ₁ = (2φ(0) − 2φ(b) + bφ′(b))/b  and  λ₂ = (φ(0) − φ(b) + bφ′(b))/b².

Then L(0) = L(b) and L′(b) = 0, and it can be shown that L(x) ≤ L(0) = L(b) for all x ≥ 0. So E[L(X)] ≤ L(0) = E[L(X_b)]. Since E[X] = E[X_b] and E[X²] = E[X_b²],

  E[φ(X_b)] − E[φ(X)] = E[L(X_b)] − E[L(X)] ≥ 0.

Part IX. Comparison to Doob’s moment bounds

A big jump construction for Doob’s L^p inequality for p > 1. Doob’s inequality for a nonnegative submartingale X, p > 1:

  ‖X*‖_p ≤ (p/(p − 1)) ‖X_T‖_p.   (22)

The Dubins and Gilat [2] construction showing tightness can be expressed as a “big jump” process as follows. Let h be a positive, nondecreasing function on the interval [0, 1], let U be uniformly distributed on the interval [0, 1], let 0 < c < 1, and let:

  X_t = h(t) if t < U;  X_t = ch(U) if t ≥ U.

The drift of X at time t is h′(t) − (1 − c)h(t)/(1 − t). Setting it to zero yields h(t) = 1/(1 − t)^{1−c}. Let T = 1. Note that X* = h(U) and X_T = cX*. Given p > 1, X_T is in L^p if (1 − c)p < 1. We thus have:

  ‖X*‖_p = (1/c)‖X_T‖_p < ∞ if 1/c < p/(p − 1),

which shows that the constant in (22) is the best possible.

The analysis of [2] is related to work of Blackwell and Dubins [1], which makes a connection to the Hardy–Littlewood maximal function [3], h, of a nondecreasing integrable function g on [0, 1], defined as follows:

  h(t) = (1/(1 − t)) ∫_t^1 g(u) du.

In fact, h is the unique function such that a process that follows h up until time U, and then jumps and sticks to value g(U), is a martingale.

References I

[1] D. Blackwell and L.E. Dubins. A converse to the dominated convergence theorem. Ill. J. Math., 7:508–514, 1963.
[2] L.E. Dubins and D. Gilat. On the distribution of maxima of martingales. Proc. Amer. Math. Soc., 68(3):337–338, 1978.
[3] G.H. Hardy and J.E. Littlewood. A maximal theorem with function-theoretic applications. Acta Math., 54(1):81–116, 1930.
[4] J. Jacod. Calcul stochastique et problèmes de martingales, Lecture Notes in Math., vol. 714. Springer-Verlag, New York, 1979.

References II

[5] O. Kallenberg. Foundations of Modern Probability (2nd ed.). Springer, 2002.
[6] J.F.C. Kingman. Some inequalities for the queue GI/G/1. Biometrika, 49(3/4):315–324, December 1962.
[7] P.A. Meyer. Un cours sur les intégrales stochastiques, pages 246–400. Springer-Verlag, New York, 1976.
[8] P. Protter. Stochastic Integration and Differential Equations (2nd ed.). Springer, 2004.
[9] E. Wong and B. Hajek. Stochastic Processes in Engineering Systems. Springer-Verlag, New York, 1985.