Notes on the Itô Calculus

Steven P. Lalley

November 14, 2016

1 Itô Integral: Definition and Basic Properties

1.1 Elementary integrands

Let $W_t = W(t)$ be a (one-dimensional) Wiener process, and fix an admissible filtration $\mathbb{F}$. An adapted process $V_t$ is called elementary if it has the form

$$V_t = \sum_{j=0}^{K-1} \xi_j \, 1_{(t_j,\, t_{j+1}]}(t) \qquad (1)$$

where $0 = t_0 < t_1 < \cdots < t_K < \infty$, and for each index $j$ the random variable $\xi_j$ is measurable relative to $\mathcal{F}_{t_j}$.

Definition 1. For a simple process $\{V_t\}_{t\ge 0}$ satisfying equation (1), define the Itô integral as follows:
$$I_t(V) = \int_0^t V_s\, dW_s := \sum_{j=0}^{K-1} \xi_j \big(W(t_{j+1}\wedge t) - W(t_j\wedge t)\big) \qquad (2)$$

(Note: The alternative notation $I_t(V)$ is commonly used in the literature, and I will use it interchangeably with the integral notation.)

If the Brownian path $t \mapsto W(t)$ were of bounded variation, then the definition (2) would coincide with the usual definition of a Lebesgue-Stieltjes integral. But since the paths of $W$ are not of bounded variation, the extension of the integral (2) to a larger (and more interesting) class of integrands must be done differently than in the usual Lebesgue theory.
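Before developing the theory further, it may help to see Definition 1 in action. The following is a minimal numerical sketch (not part of the notes): it simulates a Brownian path on a fine grid and evaluates the sum (2) directly for an elementary integrand. All function names and the grid-based path lookup are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_path(T, n_steps, rng):
    """Simulate W on the uniform grid k*T/n_steps, k = 0, ..., n_steps."""
    dt = T / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), n_steps)
    return np.concatenate([[0.0], np.cumsum(increments)])

def W_at(path, T, t):
    """Value of the simulated path at time t (nearest grid point)."""
    n_steps = len(path) - 1
    return path[int(round(t / T * n_steps))]

def ito_integral_elementary(path, T, jump_times, xi):
    """I_T(V) for V_t = sum_j xi_j 1_{(t_j, t_{j+1}]}(t), per formula (2)."""
    return sum(
        xi_j * (W_at(path, T, t_hi) - W_at(path, T, t_lo))
        for xi_j, t_lo, t_hi in zip(xi, jump_times[:-1], jump_times[1:])
    )

T = 1.0
path = brownian_path(T, 2**12, rng)
jump_times = [0.0, 0.25, 0.5, 0.75, 1.0]
# xi_j must be F_{t_j}-measurable; xi_j = W(t_j) is a legitimate choice.
xi = [W_at(path, T, t) for t in jump_times[:-1]]
print(ito_integral_elementary(path, T, jump_times, xi))
```

Note that for the constant integrand $V \equiv 1$ the sum (2) telescopes to $W(T) - W(0)$, which is a quick sanity check on any implementation.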

Properties of the Itô Integral:

(A) Linearity: $I_t(aV + bU) = aI_t(V) + bI_t(U)$.
(B) Measurability: $I_t(V)$ is adapted to $\mathbb{F}$.
(C) Continuity: $t \mapsto I_t(V)$ is continuous.

These are all immediate from the definition.

Proposition 1. Assume that $V$ and $U$ are elementary processes satisfying $EV_t^2 + EU_t^2 < \infty$ for every $t \ge 0$ (equivalently, the random variables $\xi_j$ in the definition (1) all have finite second moments). Then
$$EI_t(V) = EI_t(U) = 0, \qquad (3)$$
$$EI_t(U)I_t(V) = \int_0^t EV_sU_s\, ds, \quad\text{and hence} \qquad (4)$$
$$EI_t(V)^2 = \int_0^t EV_s^2\, ds. \qquad (5)$$

Proof. Because the Itô integral is linear, it suffices to prove the formulas (3) and (4) in the special case where the integrands are elementary processes with only one jump:
$$U_t = \xi 1_{(r,s]}(t) \quad\text{and}\quad V_t = \zeta 1_{(r,s]}(t)$$
where $\xi$ and $\zeta$ are both $\mathcal{F}_r$-measurable random variables with finite second moments. (Exercise: Explain why we can assume that the interval $(r,s]$ is the same for both processes.) Now for any $t$, since $\xi, \zeta$ are $\mathcal{F}_r$-measurable and $L^2$,
$$EI_t(U) = E(\xi(W_{t\wedge s} - W_{t\wedge r})) = EE(\xi(W_{t\wedge s} - W_{t\wedge r}) \mid \mathcal{F}_r) = E\xi\, E(W_{t\wedge s} - W_{t\wedge r} \mid \mathcal{F}_r) = 0,$$
and similarly,
$$EI_t(U)^2 = E(\xi^2(W_{t\wedge s} - W_{t\wedge r})^2) = EE(\xi^2(W_{t\wedge s} - W_{t\wedge r})^2 \mid \mathcal{F}_r) = E\xi^2\, E((W_{t\wedge s} - W_{t\wedge r})^2 \mid \mathcal{F}_r) = \int_0^t EU_s^2\, ds.$$
(Note: We don't know a priori that $EI_t(U)^2 < \infty$, but nevertheless we can still use the "filtering" rule for conditional expectation, because all of the random variables involved are nonnegative.) The covariance formula (4) now follows by polarization (that is, using the variance formula for $I_t(U+V)$ and $I_t(U-V)$, then subtracting).

The equality (5) is of crucial importance: it asserts that the mapping that takes the process $V$ to its Itô integral at any time $t$ is an $L^2$-isometry relative to the $L^2$-norm for the product measure Lebesgue $\times$ $P$. This will be the key to extending the integral to a wider class of integrands. The simple calculations that lead to (3) and (5) also yield the following useful information about the process $I_t(V)$:
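The isometry (5) is easy to check by Monte Carlo. The sketch below (illustrative, not part of the notes) uses the integrand $V_s = W_s$ on $[0,1]$, discretized at left endpoints so that the Riemann-type sum is an elementary Itô integral; (5) predicts $EI_1(W)^2 = \int_0^1 EW_s^2\,ds = 1/2$.

```python
import numpy as np

# Monte Carlo sanity check of (3) and (5) for V_s = W_s on [0, 1].
rng = np.random.default_rng(42)
n_paths, n_steps = 20_000, 256
dt = 1.0 / n_steps

dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW        # W at the left endpoint of each step
I = np.sum(W_left * dW, axis=1)            # elementary approximation of I_1(W)

print(np.mean(I))       # ≈ 0, by (3)
print(np.mean(I**2))    # ≈ 1/2, by (5)
```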

Proposition 2. Assume that $V_t$ is elementary with representation (1), and assume that each of the random variables $\xi_j$ has finite second moment. Then $I_t(V)$ is an $L^2$-martingale relative to $\mathbb{F}$. Furthermore, if
$$[I(V)]_t := \int_0^t V_s^2\, ds; \qquad (6)$$
then $I_t(V)^2 - [I(V)]_t$ is a martingale.

Note: The process $[I(V)]_t$ is called the quadratic variation of the martingale $I_t(V)$. The square bracket notation is standard in the literature.

Proof. First recall that a linear combination of martingales is a martingale, so to prove that $I_t(V)$ is a martingale it suffices to consider elementary functions $V_t$ with just one step:
$$V_t = \xi 1_{(s,r]}(t)$$
with $\xi$ measurable relative to $\mathcal{F}_s$. For such a process $V$ the integral $I_t(V)$ is zero for all $t \le s$, and $I_t(V) = I_r(V)$ for all $t \ge r$, so to show that $I_t(V)$ is a martingale it is only necessary to check that
$$E(I_t(V) \mid \mathcal{F}_u) = I_u(V) \quad\text{for } s \le u < t \le r$$
$$\iff\quad E(\xi(W_t - W_s) \mid \mathcal{F}_u) = \xi(W_u - W_s).$$
But this follows routinely from basic properties of conditional expectation, since $\xi$ is measurable relative to $\mathcal{F}_u$ and $W_t$ is a martingale with respect to $\mathbb{F}$.

It is only slightly more difficult to check that $I_t(V)^2 - [I(V)]_t$ is a martingale (you have to decompose a sum of squares). Let $V_t$ be elementary, and assume that the random variables $\xi_j$ in the representation (1) are in $L^2$. We must show that for every $s, t \ge 0$,
$$E(I_{t+s}(V)^2 \mid \mathcal{F}_t) - E([I(V)]_{t+s} \mid \mathcal{F}_t) = I_t(V)^2 - [I(V)]_t.$$

It suffices to prove this for values of $s$ such that $s \le t_j - t_{j-1}$ (where the $t_j$ are the discontinuity points in the representation (1)), by the tower property of conditional expectations. Thus, we may assume without loss of generality that $V_r$ is constant on the interval $r \in [t, t+s]$, that is, $V_r = \xi$ where $\xi \in L^2$ and $\xi$ is measurable with respect to $\mathcal{F}_t$. Under this assumption,
$$I_{t+s}(V) - I_t(V) = \xi(W_{t+s} - W_t).$$

Now $I_t(V)$ is measurable relative to $\mathcal{F}_t$, and hence, since the process $I_r(V)$ is a martingale (by the first part of the proof),

$$E(I_{t+s}(V)^2 \mid \mathcal{F}_t) = I_t(V)^2 + E((I_{t+s}(V) - I_t(V))^2 \mid \mathcal{F}_t) + 2I_t(V)\, E(I_{t+s}(V) - I_t(V) \mid \mathcal{F}_t)$$
$$= I_t(V)^2 + E((I_{t+s}(V) - I_t(V))^2 \mid \mathcal{F}_t)$$
$$= I_t(V)^2 + \xi^2 E((W_{t+s} - W_t)^2 \mid \mathcal{F}_t)$$
$$= I_t(V)^2 + \xi^2 s$$
$$= I_t(V)^2 + E([I(V)]_{t+s} \mid \mathcal{F}_t) - [I(V)]_t.$$

1.2 Extension to the class $\mathcal{V}_T$

Fix $T \le \infty$. Define the class $\mathcal{V}_T$ to be the set of all adapted processes $\{V_t\}_{t\le T}$ such that there exists a sequence $\{V_t^{(n)}\}_{t\le T}$ of elementary processes for which
$$\lim_{n\to\infty} E\int_0^T |V_t^{(n)} - V_t|^2\, dt = 0. \qquad (7)$$

Proposition 3. The space $\mathcal{V}_T$ is a (real) Hilbert space when endowed with the inner product

$$\langle U, V\rangle = E\int_0^T U_t V_t\, dt. \qquad (8)$$
The elementary processes are dense in this Hilbert space. Furthermore, every process $\{V_t\}_{t\in[0,T]} \in \mathcal{V}_T$ is progressively measurable, that is, for every $0 \le s \le T$, when $V_t(\omega) = V(t,\omega)$ is viewed as a function of two variables $(t,\omega) \in [0,s]\times\Omega$, it is jointly measurable with respect to the product $\sigma$-algebra $\mathcal{B}_{[0,s]} \times \mathcal{F}_s$.

Remark 1. The last assertion (that every process in $\mathcal{V}_T$ is progressively measurable) is only worth mentioning because it guarantees that integrals with respect to the product measure $dt \times dP$ are well-defined. This justifies changing the order of integration as follows:
$$\int_0^s EV_t^2\, dt = E\int_0^s V_t^2\, dt =: E[V]_s. \qquad (9)$$
Note: The inner integral $[V]_s$ is called the observed quadratic variation of $V$ up to time $s$.

Proof of Proposition 3. That $\mathcal{V}_T$ is a Hilbert space follows by a routine modification of the usual proof that an $L^2$ space is a Hilbert space. The elementary functions are dense by construction, because every element of $\mathcal{V}_T$ is by definition a limit of elementary processes, by (7). Finally, progressive measurability follows because (i) every elementary process is progressively measurable (exercise: why?); and (ii) limits of jointly measurable functions are jointly measurable (check your real analysis text).

Proposition 4. If $\{V_t\}_{t\le T}$ is a uniformly bounded, adapted process with continuous sample paths then $\{V_t\}_{t\le T}$ is an element of the Hilbert space $\mathcal{V}_T$. More generally, let $\{V_t\}_{t\le T}$ be an adapted process such that
$$\lim_{\delta\to 0}\ \sup_{s,t\le T:\, |t-s|\le\delta} E|V_t - V_s|^2 = 0. \qquad (10)$$
Then $\{V_t\}_{t\le T}$ is an element of the Hilbert space $\mathcal{V}_T$.

Note 1. It follows, for instance, that the Wiener process $\{W_t\}_{t\le T}$ is an element of $\mathcal{V}_T$.

Proof. I will only prove the first assertion; the second is similar. Define $V_t^{(n)}$ to be the elementary process obtained from $V_t$ by setting $V_t^{(n)}$ equal to $V_{kT/2^n}$ on the interval $t \in [kT/2^n, (k+1)T/2^n)$. Because $V_t$ has continuous paths,
$$\lim_{n\to\infty} V_t^{(n)} = V_t$$
for every $t \le T$. Since the process $V_t$ is uniformly bounded, there is a constant $C < \infty$ such that $|V_t| \le C$ for every $t \le T$, and so by the dominated convergence theorem
$$\lim_{n\to\infty} E|V_t^{(n)} - V_t|^2 = 0$$
for every $t \le T$. Another application of the dominated convergence theorem (this time for the Lebesgue integral) now implies that
$$\lim_{n\to\infty} E\int_0^T |V_t^{(n)} - V_t|^2\, dt = 0.$$
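The dyadic construction in the proof above is easy to test numerically. For $V = W$ on $[0,1]$ the approximation error $E\int_0^1 |V_t^{(n)} - V_t|^2\,dt$ works out to $2^{-n-1}$ (the same computation appears in Lemma 1 below), and the sketch below (illustrative parameters) checks that the sample error halves with each refinement.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_grid = 4_000, 2**10
dt = 1.0 / n_grid
steps = rng.normal(0.0, np.sqrt(dt), (n_paths, n_grid))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

errors = []
for n in range(1, 6):
    block = n_grid // 2**n
    # V^(n)_t = W(k/2^n) for t in [k/2^n, (k+1)/2^n): freeze W at dyadic left endpoints
    Vn = np.repeat(W[:, :-1:block], block, axis=1)
    err = (dt * np.sum((W[:, :-1] - Vn) ** 2, axis=1)).mean()
    errors.append(err)
    print(n, err, 2.0 ** (-n - 1))   # sample error vs the exact value 2^{-n-1}
```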

Theorem 1. (Itô Isometry) The Itô integral $I_t(V)$ defined by (2) extends to all integrands $V \in \mathcal{V}_T$ in such a way that for each $t \le T$ the mapping $V \mapsto I_t(V)$ is a linear isometry from the space $\mathcal{V}_t$ to the $L^2$-space of square-integrable random variables. In particular, if $V_n$ is any sequence of bounded elementary functions such that $\|V_n - V\| \to 0$, then for all $t \le T$,
$$I_t(V) = \int_0^t V\, dW := L^2\text{-}\lim_{n\to\infty}\int_0^t V_n\, dW \qquad (11)$$
exists and is independent of the approximating sequence $V_n$.

Proof. If $V_n \to V$ in the norm (9) then the sequence $V_n$ is Cauchy with respect to this norm. Consequently, by Proposition 1, the sequence of random variables $I_t(V_n)$ is Cauchy in $L^2(P)$, and so it has an $L^2$-limit. Linearity and uniqueness of the limit both follow by routine $L^2$ arguments.

This extended Itô integral inherits all of the properties of the Itô integral for elementary functions. Following is a list of these properties. Assume that $V \in \mathcal{V}_T$ and $t \le T$.

Properties of the Itô Integral:

(A) Linearity: $I_t(aV + bU) = aI_t(V) + bI_t(U)$.
(B) Measurability: $I_t(V)$ is progressively measurable.
(C) Continuity: $t \mapsto I_t(V)$ is continuous (for some version).
(D) Mean: $EI_t(V) = 0$.
(E) Variance: $EI_t(V)^2 = \|V\|_{\mathcal{V}_t}^2$.
(F) Martingale Property: $\{I_t(V)\}_{t\le T}$ is an $L^2$-martingale.
(G) Quadratic Martingale Property: $\{I_t(V)^2 - [I(V)]_t\}_{t\le T}$ is an $L^1$-martingale, where
$$[I(V)]_t := \int_0^t V_s^2\, ds. \qquad (12)$$

All of these, with the exception of (C), follow routinely from (11) and the corresponding properties of the integral for elementary functions by easy arguments using DCT and the like (but you should fill in the details for (F) and (G)). Property (C) follows from Proposition 6 in the lecture notes on continuous martingales, because for elementary $V_n$ the process $I_t(V_n)$ has continuous paths.

1.3 Quadratic Variation and $\int_0^T W\, dW$

There are tools for calculating stochastic integrals that usually make it unnecessary to use the definition of the Itô integral directly. The most useful of these, the Itô formula, will be discussed in the following sections. It is instructive, however, to do one explicit calculation using only the definition. This calculation will show (i) that the Fundamental Theorem of Calculus does not hold for Itô integrals; and (ii) the central importance of the Quadratic Variation formula in the Itô calculus. The Quadratic Variation formula, in its simplest guise, is this:

Proposition 5. For any $T > 0$,
$$P\text{-}\lim_{n\to\infty} \sum_{k=0}^{2^n-1} (\Delta_k^n W)^2 = T, \qquad (13)$$
where
$$\Delta_k^n W := W\left(\frac{(k+1)T}{2^n}\right) - W\left(\frac{kT}{2^n}\right).$$

Proof. For each fixed $n$, the increments $\Delta_k^n W$ are independent, identically distributed Gaussian random variables with mean zero and variance $2^{-n}T$. Hence, the result follows from the WLLN for $\chi^2$ random variables.

Exercise 1. Prove that the convergence holds almost surely. HINT: Borel-Cantelli and exponential estimates.

Exercise 2. Let $f: \mathbb{R} \to \mathbb{R}$ be a continuous function with compact support. Show that
$$\lim_{n\to\infty} \sum_{k=0}^{2^n-1} f(W(kT/2^n))(\Delta_k^n W)^2 = \int_0^T f(W(s))\, ds.$$

Exercise 3. Let $W_1(t)$ and $W_2(t)$ be independent Wiener processes. Prove that
$$P\text{-}\lim_{n\to\infty} \sum_{k=0}^{2^n-1} (\Delta_k^n W_1)(\Delta_k^n W_2) = 0.$$
HINT: $(W_1(t) + W_2(t))/\sqrt{2}$ is a standard Wiener process.

The Wiener process $W_t$ is itself in the class $\mathcal{V}_T$, for every $T < \infty$, because
$$\int_0^T EW_s^2\, ds = \int_0^T s\, ds = \frac{T^2}{2} < \infty.$$
Thus, the integral $\int_0^T W\, dW$ is well-defined and is an element of $L^2$. To evaluate it, we will use the most obvious approximation of $W_s$ by elementary functions. For simplicity, set $T = 1$. Let $\mu_s^{(n)}$ be the elementary function whose jumps are at the dyadic rationals $1/2^n, 2/2^n, 3/2^n, \ldots$, and whose value in the interval $[k/2^n, (k+1)/2^n)$ is $W(k/2^n)$: that is,
$$\mu_s^{(n)} = \sum_{k=0}^{2^n-1} W(k/2^n)\, 1_{[k/2^n,(k+1)/2^n)}(s).$$

Lemma 1. $\lim_{n\to\infty}\int_0^1 E(W_s - \mu_s^{(n)})^2\, ds = 0.$

Proof.
Since the simple process $\mu_s^{(n)}$ takes the value $W(k/2^n)$ for all $s \in [k/2^n, (k+1)/2^n)$,
$$\int_0^1 E(W_s - \mu_s^{(n)})^2\, ds = \sum_{k=0}^{2^n-1}\int_{k/2^n}^{(k+1)/2^n} E(W_s - W_{k/2^n})^2\, ds = \sum_{k=0}^{2^n-1}\int_{k/2^n}^{(k+1)/2^n} (s - k/2^n)\, ds = 2^n \cdot 2^{-2n}/2 = 2^{-n-1} \to 0.$$

The Itô Isometry Theorem 1 now implies that the stochastic integral $\int W_s\, dW_s$ is the limit of the stochastic integrals $\int \mu_s^{(n)}\, dW_s$. Since $\mu_s^{(n)}$ is elementary, its stochastic integral is defined to be
$$\int \mu_s^{(n)}\, dW_s = \sum_{k=0}^{2^n-1} W_{k/2^n}(W_{(k+1)/2^n} - W_{k/2^n}).$$
To evaluate this sum, we use the technique of "summation by parts" (the discrete analogue of integration by parts). Here, the technique takes the form of observing that the sum can be modified slightly to give a sum that "telescopes":

$$W_1^2 = \sum_{k=0}^{2^n-1}\big(W_{(k+1)/2^n}^2 - W_{k/2^n}^2\big) = \sum_{k=0}^{2^n-1}(W_{(k+1)/2^n} - W_{k/2^n})(W_{(k+1)/2^n} + W_{k/2^n})$$
$$= 2\sum_{k=0}^{2^n-1} W_{k/2^n}(W_{(k+1)/2^n} - W_{k/2^n}) + \sum_{k=0}^{2^n-1}(W_{(k+1)/2^n} - W_{k/2^n})^2.$$

The first sum on the right side is $2\int \mu_s^{(n)}\, dW_s$, and so converges to $2\int_0^1 W_s\, dW_s$ as $n\to\infty$. The second sum is the same sum that occurs in the Quadratic Variation Formula (Proposition 5), and so converges, as $n\to\infty$, to 1. Therefore, $\int_0^1 W\, dW = (W_1^2 - 1)/2$. More generally,
$$\int_0^T W_s\, dW_s = \frac{1}{2}\big(W_T^2 - T\big). \qquad (14)$$
Note that if the Itô integral obeyed the Fundamental Theorem of Calculus, then the value of the integral would be

$$\int_0^t W_s\, dW_s = \int_0^t W(s)W'(s)\, ds = \frac{W_s^2}{2}\Big|_0^t = \frac{W_t^2}{2}.$$
Thus, formula (14) shows that the Itô calculus is fundamentally different than ordinary calculus.
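Both the Quadratic Variation formula (13) and the evaluation (14) can be seen numerically on a single simulated path. The sketch below (illustrative, not part of the notes) builds a fine dyadic partition and checks that the squared increments sum to approximately $T$, and that the left-endpoint sums reproduce $(W_T^2 - T)/2$.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 2.0, 2**16
dW = rng.normal(0.0, np.sqrt(T / n), n)           # increments over a fine grid
W_left = np.concatenate([[0.0], np.cumsum(dW)[:-1]])
W_T = W_left[-1] + dW[-1]

qv = np.sum(dW**2)            # quadratic variation sum, ≈ T by (13)
ito = np.sum(W_left * dW)     # left-endpoint sums, ≈ (W_T^2 - T)/2 by (14)
print(qv, T)
print(ito, (W_T**2 - T) / 2)
```

Note that the left-endpoint sum telescopes exactly to $(W_T^2 - \sum(\Delta W)^2)/2$, so its distance from $(W_T^2 - T)/2$ is controlled by the quadratic variation error.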

1.4 Stopping Rule for Itô Integrals

Proposition 6. Let $V \in \mathcal{V}_T$ and let $\tau \le T$ be a stopping time relative to the filtration $\mathbb{F}$. Then
$$\int_0^{\tau} V_s\, dW_s = \int_0^{T} V_s 1_{[0,\tau]}(s)\, dW_s. \qquad (15)$$
In other words, if the Itô integral $I_t(V)$ is evaluated at the random time $t = \tau$, the result is a.s. the same as the Itô integral $I_T(V 1_{[0,\tau]})$ of the truncated process $V_s 1_{[0,\tau]}(s)$.

Proof. First consider the special case where both $V_s$ and $\tau$ are elementary (in particular, $\tau$ takes values in a finite set). Then the truncated process $V_s 1_{[0,\tau]}(s)$ is elementary (Exercise ?? above), and so both sides of (15) can be evaluated using formula (2). It is routine to check that they give the same value (do it!).

Next, consider the case where $V$ is elementary and $\tau \le T$ is an arbitrary stopping time. Then there is a sequence $\tau_m \le T$ of elementary stopping times such that $\tau_m \downarrow \tau$. By path-continuity of $I_t(V)$ (property (C) above),

$$\lim_{m\to\infty} I_{\tau_m}(V) = I_\tau(V).$$

On the other hand, by the dominated convergence theorem, the sequence $V 1_{[0,\tau_m]}$ converges to $V 1_{[0,\tau]}$ in $\mathcal{V}_T$-norm, so by the Itô isometry,
$$L^2\text{-}\lim_{m\to\infty} I_T(V 1_{[0,\tau_m]}) = I_T(V 1_{[0,\tau]}).$$
Therefore, the equality (15) holds, since it holds for each $\tau_m$.

Finally, consider the general case $V \in \mathcal{V}_T$. By Proposition ??, there is a sequence $V_n$ of bounded elementary functions such that $V_n \to V$ in the $\mathcal{V}_T$-norm. Consequently, by the dominated convergence theorem, $V_n 1_{[0,\tau]} \to V 1_{[0,\tau]}$ in $\mathcal{V}_T$-norm, and so
$$I_T(V_n 1_{[0,\tau]}) \to I_T(V 1_{[0,\tau]})$$
in $L^2$, by the Itô isometry. But on the other hand, Doob's Maximal Inequality (see Proposition ??), together with the Itô isometry, implies that

$$\max_{t\le T}\, |I_t(V_n) - I_t(V)| \to 0$$
in probability. The equality (15) follows.

Corollary 1. (Localization Principle) Let $\tau \le T$ be a stopping time. Suppose that $V, U \in \mathcal{V}_T$ are two processes that agree up to time $\tau$, that is, $V_t 1_{[0,\tau]}(t) = U_t 1_{[0,\tau]}(t)$. Then
$$\int_0^\tau U\, dW = \int_0^\tau V\, dW. \qquad (16)$$

Proof. Immediate from Proposition 6.

1.5 Extension to the Class $\mathcal{W}_T$

Fix $T \le \infty$. Define $\mathcal{W} = \mathcal{W}_T$ to be the class of all progressively measurable processes $V_t = V(t)$ such that
$$P\left\{\int_0^T V_s^2\, ds < \infty\right\} = 1. \qquad (17)$$

Proposition 7. Let $V \in \mathcal{W}_T$, and for each $n \ge 1$ define $\tau_n = T \wedge \inf\{t: \int_0^t V_s^2\, ds \ge n\}$. Then for each $n$ the process $V(t)1_{[0,\tau_n]}(t)$ is an element of $\mathcal{V}_T$, and
$$\lim_{n\to\infty}\int_0^t V_s 1_{[0,\tau_n]}(s)\, dW_s =: \int_0^t V_s\, dW_s =: I_t(V) \qquad (18)$$
exists almost surely and varies continuously with $t \le T$. The process $\{I_t(V)\}_{t\le T}$ is called the Itô integral process associated to the integrand $V$.

Proof. First observe that $\lim_{n\to\infty}\tau_n = T$ almost surely; in fact, with probability one, for all but finitely many $n$ it will be the case that $\tau_n = T$. Let $G_n = \{\tau_n = T\}$. By the Localization Principle (Corollary 1) and the Stopping Rule, for all $n, m$,
$$\int_0^{t\wedge\tau_n} V_s 1_{[0,\tau_{n+m}]}(s)\, dW_s = \int_0^{t\wedge\tau_n} V_s 1_{[0,\tau_n]}(s)\, dW_s.$$
Consequently, on the event $G_n$,
$$\int_0^{t} V_s 1_{[0,\tau_{n+m}]}(s)\, dW_s = \int_0^{t} V_s 1_{[0,\tau_n]}(s)\, dW_s$$
for all $m = 1, 2, \ldots$. Therefore, the integrals stabilize on the event $G_n$, for all $t \le T$. Since the events $G_n$ converge up to an event of probability one, it follows that the integrals stabilize a.s. Continuity in $t$ follows because each of the approximating integrals is continuous.

Caution: The Itô integral defined by (18) does not share all of the properties of the Itô integral for integrands of class $\mathcal{V}_T$. In particular, the integrals may not have finite first moments; hence they are no longer necessarily martingales; and there is no Itô isometry.

2 The Itô Formula

2.1 Itô formula for Wiener functionals

The cornerstone of the Itô calculus is the Itô Formula, the stochastic analogue of the Fundamental Theorem of (ordinary) calculus. The simplest form is this:

Theorem 2. (Univariate Itô Formula) Let $u(t,x)$ be twice continuously differentiable in $x$ and once continuously differentiable in $t$. If $W_t$ is a standard Wiener process, then

$$u(t, W_t) - u(0,0) = \int_0^t u_s(s, W_s)\, ds + \int_0^t u_x(s, W_s)\, dW_s + \frac{1}{2}\int_0^t u_{xx}(s, W_s)\, ds. \qquad (19)$$

Proof. See section 2.5 below.
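Formula (19) can be checked numerically for any smooth $u$. The sketch below (an illustrative experiment, not part of the notes) uses $u(t,x) = x^3$ and compares the left side against left-endpoint discretizations of the integrals on the right side.

```python
import numpy as np

rng = np.random.default_rng(11)
T, n = 1.0, 2**15
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate([[0.0], np.cumsum(dW)])

# For u(t, x) = x^3: u_t = 0, u_x = 3x^2, u_xx = 6x, so (19) reads
#   W_T^3 = ∫_0^T 3 W_s^2 dW_s + 3 ∫_0^T W_s ds.
lhs = W[-1] ** 3
rhs = np.sum(3 * W[:-1] ** 2 * dW) + 3 * np.sum(W[:-1]) * dt
print(lhs, rhs)   # close for a fine grid
```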

One of the reasons for developing the Itô integral for filtrations larger than the minimal filtration is that this allows us to use the Itô calculus for functions and processes defined on several independent Wiener processes. Recall that a $k$-dimensional Wiener process is an $\mathbb{R}^k$ vector-valued process
$$\mathbf{W}(t) = (W_1(t), W_2(t), \ldots, W_k(t)) \qquad (20)$$
whose components $W_i(t)$ are mutually independent one-dimensional Wiener processes. Assume that $\mathbb{F}$ is a filtration that is admissible for each component process, that is, such that each process $W_i(t)$ is a martingale relative to $\mathbb{F}$. Then a progressively measurable process $V_t$ relative to $\mathbb{F}$ can be integrated against any one of the Wiener processes $W_i(t)$. If $\mathbf{V}(t)$ is itself a vector-valued process each of whose components $V_i(t)$ is in the class $\mathcal{W}_T$, then define
$$\int_0^t \mathbf{V}\cdot d\mathbf{W} = \sum_{i=1}^k \int_0^t V_i(s)\, dW_i(s). \qquad (21)$$
When there is no danger of confusion I will drop the boldface notation.

Theorem 3. (Multivariate Itô Formula) Let $u(t,\mathbf{x})$ be twice continuously differentiable in each $x_i$ and once continuously differentiable in $t$. If $\mathbf{W}_t$ is a standard $k$-dimensional Wiener process, then

$$u(t, \mathbf{W}_t) - u(0,\mathbf{0}) = \int_0^t u_s(s, \mathbf{W}_s)\, ds + \int_0^t \nabla_x u(s, \mathbf{W}_s)\cdot d\mathbf{W}_s + \frac{1}{2}\int_0^t \Delta_x u(s, \mathbf{W}_s)\, ds. \qquad (22)$$
Here $\nabla_x$ and $\Delta_x$ denote the gradient and Laplacian operators in the $x$-variables, respectively.

Proof. This is essentially the same as the proof in the univariate case.

Example 1. First consider the case of one variable, and let $u(t,x) = x^2$. Then $u_{xx} = 2$ and $u_t = 0$, and so the Itô formula gives another derivation of formula (14). Actually, the Itô formula will be proved in general by mimicking the derivation that led to (14), using a two-term Taylor series approximation for the increments of $u(t, W_t)$ over short time intervals.

Example 2. (Exponential Martingales.) Fix $\theta \in \mathbb{R}$, and let $u(t,x) = \exp\{\theta x - \theta^2 t/2\}$. It is readily checked that $u_t + u_{xx}/2 = 0$, so the two ordinary integrals in the Itô formula cancel, leaving just the stochastic integral. Since $u_x = \theta u$, the Itô formula gives
$$Z_\theta(t) = 1 + \int_0^t \theta Z_\theta(s)\, dW(s) \qquad (23)$$
where
$$Z_\theta(t) := \exp\big\{\theta W(t) - \theta^2 t/2\big\}.$$
Thus, the exponential martingale $Z_\theta(t)$ is a solution of the linear stochastic differential equation $dZ_t = \theta Z_t\, dW_t$.

Example 3. A function $u: \mathbb{R}^k \to \mathbb{R}$ is called harmonic in a domain $D$ (an open subset of $\mathbb{R}^k$) if it satisfies the Laplace equation $\Delta u = 0$ at all points of $D$. Let $u$ be a harmonic function on $\mathbb{R}^k$ that is twice continuously differentiable. Then the multivariable Itô formula implies that if $W_t$ is a $k$-dimensional Wiener process,
$$u(W_t) = u(W_0) + \int_0^t \nabla u(W_s)\cdot dW_s.$$
It follows, by localization, that if $\tau$ is a stopping time such that $\nabla u(W_s)$ is bounded for $s \le \tau$ then $u(W_{t\wedge\tau})$ is an $L^2$ martingale.

Exercise 4. Check that in dimension $d \ge 3$ the Newtonian potential $u(x) = |x|^{-d+2}$ is harmonic away from the origin. Check that in dimension $d = 2$ the logarithmic potential $u(x) = \log|x|$ is harmonic away from the origin.
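The martingale property of Example 2 implies in particular that $EZ_\theta(t) = 1$ for every $t$, which is easy to verify by Monte Carlo using only the closed form of $Z_\theta$. The parameter values below are arbitrary choices for the experiment.

```python
import numpy as np

# Monte Carlo check that E Z_theta(t) = 1 for the exponential martingale.
rng = np.random.default_rng(5)
theta, t, n_paths = 0.8, 1.5, 200_000
W_t = rng.normal(0.0, np.sqrt(t), n_paths)
Z = np.exp(theta * W_t - theta**2 * t / 2)
print(Z.mean())   # ≈ 1
```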

2.2 Itô processes

An Itô process is a solution of a stochastic differential equation. More precisely, an Itô process is an $\mathbb{F}$-progressively measurable process $X_t$ that can be represented as
$$X_t = X_0 + \int_0^t A_s\, ds + \int_0^t V_s\cdot dW_s \quad \forall\, t \le T, \qquad (24)$$
or equivalently, in differential form,
$$dX(t) = A(t)\, dt + V(t)\cdot dW(t). \qquad (25)$$
Here $W(t)$ is a $k$-dimensional Wiener process; $V(s)$ is a $k$-dimensional vector-valued process with components $V_i \in \mathcal{W}_T$; and $A_t$ is a progressively measurable process that is integrable (in $t$) relative to Lebesgue measure with probability 1, that is,
$$\int_0^T |A_s|\, ds < \infty \quad \text{a.s.} \qquad (26)$$
If $X_0 = 0$ and the integrand $A_s = 0$ for all $s$, then call $X_t = I_t(V)$ an Itô integral process. Note that every Itô process has (a version with) continuous paths. Similarly, a $k$-dimensional Itô process is a vector-valued process $X_t$ with representation (24) where $A, V$ are vector-valued and $W$ is a $k$-dimensional Wiener process relative to $\mathbb{F}$. (Note: In this case $\int V\, dW$ must be interpreted as $\int V\cdot dW$.) If $X_t$ is an Itô process with representation (24) (either univariate or multivariate), its quadratic variation is defined to be the process
$$[X]_t := \int_0^t |V_s|^2\, ds. \qquad (27)$$
If $X_1(t)$ and $X_2(t)$ are Itô processes relative to the same driving $d$-dimensional Wiener process, with representations (in differential form)

$$dX_i(t) = A_i(t)\, dt + \sum_{j=1}^d V_{ij}(t)\, dW_j(t), \qquad (28)$$
then the quadratic covariation of $X_1$ and $X_2$ is defined by
$$d[X_i, X_j]_t := \sum_{l=1}^d V_{il}(t)V_{jl}(t)\, dt. \qquad (29)$$

Theorem 4. (Univariate Itô Formula) Let $u(t,x)$ be twice continuously differentiable in $x$ and once continuously differentiable in $t$, and let $X(t)$ be a univariate Itô process. Then
$$du(t, X(t)) = u_t(t, X(t))\, dt + u_x(t, X(t))\, dX(t) + \frac{1}{2}u_{xx}(t, X(t))\, d[X]_t. \qquad (30)$$

Note: It should be understood that the differential equation in (30) is shorthand for an integral equation. Since $u$ and its partial derivatives are assumed to be continuous, the ordinary and stochastic integrals of the processes on the right side of (30) are well-defined up to any finite time $t$. The differential $dX(t)$ is interpreted as in (25).

Theorem 5. (Multivariate Itô Formula) Let $u(t,\mathbf{x})$ be twice continuously differentiable in $\mathbf{x} \in \mathbb{R}^k$ and once continuously differentiable in $t$, and let $X(t)$ be a $k$-dimensional Itô process whose components $X_i(t)$ satisfy the stochastic differential equations (28). Then
$$du(t, X(t)) = u_t(t, X(t))\, dt + \nabla_x u(t, X(t))\cdot dX(t) + \frac{1}{2}\sum_{i,j=1}^k u_{x_i,x_j}(t, X(t))\, d[X_i, X_j](t). \qquad (31)$$
Note: Unlike the Multivariate Itô Formula for functions of Wiener processes (Theorem 3 above), this formula includes mixed partials.

Theorems 4 and 5 can be proved by similar reasoning as in sec. 2.5 below. Alternatively, they can be deduced as special cases of the general Itô formula for Itô integrals relative to continuous local martingales. (See notes on web page.)

2.3 Example: Ornstein-Uhlenbeck process

Recall that the Ornstein-Uhlenbeck process with mean-reversion parameter $\alpha > 0$ is the mean zero Gaussian process $X_t$ whose covariance function is
$$EX_sX_t = \exp\{-\alpha|t - s|\}.$$
This process is the continuous-time analogue of the autoregressive-1 process, and is a weak limit of suitably scaled AR processes. It occurs frequently as a weak limit of stochastic processes with some sort of mean-reversion, for much the same reason that the classical harmonic oscillator equation (Hooke's Law) occurs in mechanical systems with a restoring force. The natural stochastic analogue of the harmonic oscillator equation is

$$dX_t = -\alpha X_t\, dt + dW_t; \qquad (32)$$
$\alpha$ is called the relaxation parameter. To solve equation (32), set $Y_t = e^{\alpha t}X_t$ and use the Itô formula along with (32) to obtain
$$dY_t = e^{\alpha t}\, dW_t.$$
Thus, for any initial value $X_0 = x$ the equation (32) has the unique solution
$$X_t = X_0 e^{-\alpha t} + e^{-\alpha t}\int_0^t e^{\alpha s}\, dW_s. \qquad (33)$$
It is easily checked that the Gaussian process defined by this equation has the covariance function of the Ornstein-Uhlenbeck process with parameter $\alpha$. Since Gaussian processes are determined by their means and covariances, it follows that the process $X_t$ defined by (33) is a stationary Ornstein-Uhlenbeck process, provided the initial value $X_0$ is chosen to be a standard normal variate independent of the driving $W_t$.
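The representation (33) lends itself to simulation: discretize the stochastic integral by left-endpoint sums and check the claimed covariance. The sketch below (illustrative parameters, not part of the notes) takes $\alpha = 1/2$, for which the stationary variance of (32) is exactly 1 and the covariance $e^{-\alpha|t-s|}$ holds with a standard normal start.

```python
import numpy as np

rng = np.random.default_rng(9)
alpha = 0.5            # with alpha = 1/2 the stationary variance of (32) is 1
n_paths, n, T = 20_000, 400, 2.0
dt = T / n
ts = np.linspace(dt, T, n)

# Discretize (33), with X_0 standard normal and independent of the driving noise:
X0 = rng.normal(0.0, 1.0, n_paths)
dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n))
integral = np.cumsum(np.exp(alpha * (ts - dt)) * dW, axis=1)   # left-endpoint sums
X = np.exp(-alpha * ts) * (X0[:, None] + integral)

s_idx, t_idx = n // 4 - 1, 3 * n // 4 - 1     # times s = 0.5 and t = 1.5
cov = np.mean(X[:, s_idx] * X[:, t_idx])
print(cov, np.exp(-alpha * 1.0))   # both ≈ e^{-0.5} ≈ 0.607
```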

2.4 Example: Brownian bridge

Recall that the standard Brownian bridge is the mean zero Gaussian process $\{Y_t\}_{0\le t\le 1}$ with covariance function $EY_sY_t = s(1-t)$ for $0 \le s \le t \le 1$. The Brownian bridge is the continuum limit of scaled simple random walk conditioned to return to 0 at time $2n$. But simple random walk conditioned to return to 0 at time $2n$ is equivalent to the random walk gotten by sampling without replacement from a box with $n$ tickets marked $+1$ and $n$ marked $-1$. Now if $S_{[nt]} = k$, then there will be an excess of $k$ tickets marked $-1$ left in the box, and so the next step is a biased Bernoulli. This suggests that, in the continuum limit, there will be an instantaneous drift whose direction (in the $(t,x)$-plane) points to $(1,0)$. Thus, let $W_t$ be a standard Brownian motion, and consider the stochastic differential equation
$$dY_t = -\frac{Y_t}{1-t}\, dt + dW_t \qquad (34)$$
for $0 \le t < 1$. To solve this, set $U_t = f(t)Y_t$ and use (34) together with the Itô formula to determine which choice of $f(t)$ will make the $dt$ terms vanish. The answer is $f(t) = 1/(1-t)$ (easy exercise), and so
$$d\big((1-t)^{-1}Y_t\big) = (1-t)^{-1}\, dW_t.$$
Consequently, the unique solution to equation (34) with initial value $Y_0 = 0$ is given by
$$Y_t = (1-t)\int_0^t (1-s)^{-1}\, dW_s. \qquad (35)$$
It is once again easily checked that the $Y_t$ defined by (35) is a mean zero Gaussian process whose covariance function matches that of the standard Brownian bridge. Therefore, the solution of (34) with initial condition $Y_0 = 0$ is a standard Brownian bridge.
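An Euler discretization of (34) makes the pinning behavior visible: the variance at time $t$ should be close to $t(1-t)$, and the paths should be forced back to 0 as $t \to 1$. The sketch below is an illustrative experiment (step count and seed are arbitrary), stopping one step short of $t = 1$ where the drift coefficient blows up.

```python
import numpy as np

rng = np.random.default_rng(13)
n_paths, n = 50_000, 1_000
dt = 1.0 / n
Y = np.zeros(n_paths)
var_at_half = None
for k in range(n - 1):        # stop one step short of t = 1 (the drift blows up there)
    t = k * dt
    Y += -Y / (1.0 - t) * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
    if k + 1 == n // 2:
        var_at_half = Y.var()

print(var_at_half)        # ≈ (1/2)(1 - 1/2) = 0.25
print(np.abs(Y).mean())   # small: the paths are pinned toward 0 as t -> 1
```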

2.5 Proof of the univariate Itô formula

For ease of notation, I will consider only the case where the driving Wiener process is 1-dimensional; the argument in the general case is similar. First, I claim that it suffices to prove the result for functions $u$ with compact support. This follows by a routine argument using the Stopping Rule and Localization Principle for Itô integrals: let $D_n$ be an increasing sequence of open sets in $\mathbb{R}_+\times\mathbb{R}$ that exhaust the space, and let $\tau_n$ be the first time that $X_t$ exits the region $D_n$. Then by continuity, $u(t\wedge\tau_n, X(t\wedge\tau_n)) \to u(t, X(t))$ as $n\to\infty$, and
$$\int_0^{t\wedge\tau_n} \longrightarrow \int_0^t$$
for each of the integrals in (30). Thus, the result (30) will follow from the corresponding formula with $t$ replaced by $t\wedge\tau_n$. For this, the function $u$ can be replaced by a function $\tilde{u}$ with compact support such that $\tilde{u} = u$ in $D_n$, by the Localization Principle. (Note: the Localization Lemma in the Appendix to the notes on harmonic functions implies that there is a $C^\infty$ function $\psi_n$ with compact support that takes values between 0 and 1 and is identically 1 on $D_n$. Set $\hat{u} = u\psi_n$ to obtain a $C^2$ function with compact support that agrees with $u$ on $D_n$.) Finally, if (30) can be proved with $u$ replaced by $\tilde{u}$, then it will hold with $t$ replaced by $t\wedge\tau_n$, using the Stopping Rule again. Thus, we may now assume that the function $u$ has compact support, and therefore that it and its partial derivatives are bounded and uniformly continuous.

Second, I claim that it suffices to prove the result for elementary Itô processes, that is, processes $X_t$ of the form (24) where $V_s$ and $A_s$ are bounded, elementary processes. This follows by a routine approximation argument, because any Itô process $X(t)$ can be approximated by elementary Itô processes.

It remains to prove (30) for elementary Itô processes $X_t$ and functions $u$ of compact support. Assume that $X$ has the form (24) with

$$A_s = \sum_j \zeta_j 1_{(t_j,t_{j+1}]}(s), \qquad V_s = \sum_j \xi_j 1_{(t_j,t_{j+1}]}(s)$$
where $\zeta_j, \xi_j$ are bounded random variables, both measurable relative to $\mathcal{F}_{t_j}$. Now it is clear that to prove the Itô formula (30) it suffices to prove it for $t \in (t_j, t_{j+1})$ for each index $j$. But this is essentially the same as proving that for an elementary Itô process $X$ of the form (24) with $A_s = \zeta 1_{[a,b]}(s)$ and $V_s = \xi 1_{[a,b]}(s)$, and $\zeta, \xi$ measurable relative to $\mathcal{F}_a$,
$$u(t, X_t) - u(a, X_a) = \int_a^t u_s(s, X_s)\, ds + \int_a^t u_x(s, X_s)\, dX_s + \frac{1}{2}\int_a^t u_{xx}(s, X_s)\, d[X]_s$$
for all $t \in (a,b)$. Fix $a < t$ and set $T = t - a$. Define
$$\Delta_k^n X := X(a + (k+1)T/2^n) - X(a + kT/2^n),$$
$$\Delta_k^n U := U(a + (k+1)T/2^n) - U(a + kT/2^n), \quad\text{where}$$
$$U(s) := u(s, X(s)); \quad U_s(s) = u_s(s, X(s)); \quad U_x(s) = u_x(s, X(s)); \quad\text{etc.}$$

Notice that because of the assumptions on $A_s$ and $V_s$,

$$\Delta_k^n X = \zeta 2^{-n}T + \xi\Delta_k^n W. \qquad (36)$$
Now by Taylor's theorem,

n 1 2 ° n u(t, Xt ) u(a, Xa) ¢kU (37) ° = k 0 = X n 1 n 1 2 ° 2 ° n 1 n n n 2° T ° Us(kT/2 ) Ux (kT/2 )¢k X = k 0 + k 0 = = n 1 X X n 1 2 ° 2 ° n n 2 n Uxx(kT/2 )(¢k X ) /2 Rk + k 0 + k 0 X= X= n where the remainder term Rk satisfies

$$|R_k^n| \le \varepsilon_n\big(2^{-n} + (\Delta_k^n X)^2\big) \qquad (38)$$
and the constants $\varepsilon_n$ converge to zero as $n\to\infty$. (Note: This uniform bound on the remainder terms follows from the assumption that $u(s,x)$ is $C^{1,2}$ and has compact support, because this ensures that the partial derivatives $u_s$ and $u_{xx}$ are uniformly continuous and bounded.)

16 Finally, let’s see how the four sums on the right side of (37) behave as n . First, !1 because the partial derivative us(s,x) is uniformly continuous and bounded, the first sum is just a Riemann sum approximation to the integral of a continuous function; thus,

$$\lim_{n\to\infty}\sum_{k=0}^{2^n-1} 2^{-n}T\, U_s(a + kT/2^n) = \int_a^t u_s(s, X_s)\, ds.$$
Next, by (36), the second sum also converges, since it can be split into a Riemann sum for a Riemann integral and an elementary approximation to an Itô integral:

$$\lim_{n\to\infty}\sum_{k=0}^{2^n-1} U_x(a + kT/2^n)\Delta_k^n X = \int_a^t u_x(s, X_s)\, dX_s.$$
The third sum is handled using Proposition 5 on the quadratic variation of the Wiener process, and equation (36) to reduce the quadratic variation of $X$ to that of $W$ (Exercise: Use the fact that $u_{xx}$ is uniformly continuous and bounded to fill in the details):

$$\lim_{n\to\infty}\sum_{k=0}^{2^n-1} U_{xx}(a + kT/2^n)(\Delta_k^n X)^2/2 = \frac{1}{2}\int_a^t u_{xx}(s, X_s)\, d[X]_s.$$
Finally, by (38) and Proposition 5,

$$\lim_{n\to\infty}\sum_{k=0}^{2^n-1} R_k^n = 0.$$

3 Complex Exponential Martingales and their Uses

Assume in this section that $W_t = (W_t^1, W_t^2, \ldots, W_t^d)$ is a $d$-dimensional Brownian motion started at the origin, and let $\mathbb{F} = \{\mathcal{F}_t\}_{t\ge 0}$ be an admissible filtration.

3.1 Exponential Martingales

Let $V_t$ be a progressively measurable, $d$-dimensional process such that for each $T < \infty$ the process $V_t$ is in the class $\mathcal{W}_T$, that is,

$$P\left\{\int_0^T \|V_s\|^2\, ds < \infty\right\} = 1.$$
Then the Itô formula (30) implies that the (complex-valued) exponential process

$$Z_t := \exp\left\{i\int_0^t V_s\cdot dW_s + \frac{1}{2}\int_0^t \|V_s\|^2\, ds\right\}, \quad t \le T, \qquad (39)$$
satisfies the stochastic differential equation

$$dZ_t = iZ_t V_t\cdot dW_t. \qquad (40)$$

This alone does not guarantee that the process $Z_t$ is a martingale, because without further assumptions the integrand $Z_tV_t$ might not be in the class $\mathcal{V}_T$. Of course, if the integrand $V_t$ is uniformly bounded for $t \le T$ then so is $Z_t$, and so the stochastic differential equation (40) exhibits $Z_t$ as the Itô integral of a process in $\mathcal{V}_T$, which implies that $\{Z_t\}_{t\le T}$ is a martingale. This remains true under weaker hypotheses on $V_t$:

Proposition 8. Assume that for each $T < \infty$,
$$E\exp\left\{\frac{1}{2}\int_0^T \|V_s\|^2\, ds\right\} < \infty. \qquad (41)$$

Then the process $Z_t$ defined by (39) is a martingale, and in particular,

$$EZ_T = 1 \quad\text{for each } T < \infty. \qquad (42)$$

Proof. Set

$$X_t = X(t) = \int_0^t V_s\cdot dW_s,$$
$$[X]_t = [X](t) = \int_0^t \|V_s\|^2\, ds, \quad\text{and}$$
$$\tau_n = \inf\{t : [X]_t = n\}.$$
Since $Z(t\wedge\tau_n)V(t\wedge\tau_n)$ is uniformly bounded for each $n$, the process $Z(t\wedge\tau_n)$ is a bounded martingale. But

$$|Z(t\wedge\tau_n)| = \left|\exp\{iX(t\wedge\tau_n) + [X](t\wedge\tau_n)/2\}\right| \le \exp\{[X]_t/2\},$$

so the random variables $Z(t\wedge\tau_n)$ are all dominated by the $L^1$ random variable $\exp\{[X]_t/2\}$ (note that the hypothesis (41) is the same as the assertion that $\exp\{[X]_t/2\}$ has finite first moment). Therefore the dominated convergence theorem for conditional expectations implies that the process $Z(t)$ is a martingale.
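As a quick numerical sanity check (an illustration, not part of the notes), the identity (42) can be verified by Monte Carlo for a simple bounded integrand; the choice $V_s = \cos s$ with $d=1$ is an arbitrary assumption made for the demonstration:

```python
import math
import random

random.seed(7)

def mean_exponential_martingale(T=1.0, n_steps=200, n_paths=4000):
    """Monte Carlo estimate of E Z_T for the complex exponential
    Z_T = exp{ i*int_0^T V_s dW_s + (1/2)*int_0^T V_s^2 ds },
    with the bounded deterministic integrand V_s = cos(s)."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    re_sum, im_sum = 0.0, 0.0
    for _ in range(n_paths):
        ito = 0.0   # left-endpoint approximation of the Ito integral
        qv = 0.0    # accumulated quadratic variation [X]_T
        for k in range(n_steps):
            v = math.cos(k * dt)
            ito += v * random.gauss(0.0, sd)
            qv += v * v * dt
        w = math.exp(qv / 2.0)
        re_sum += w * math.cos(ito)
        im_sum += w * math.sin(ito)
    return re_sum / n_paths, im_sum / n_paths
```

Since $|Z_T| = e^{[X]_T/2}$ is bounded for this integrand, the martingale property is not in doubt; the estimated mean should simply come out close to $1 + 0i$, illustrating (42).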

3.2 Radial part of a $d$-dimensional Brownian motion

Proposition 9. If $\Theta_t$ is any progressively measurable process taking values in the unit sphere of $\mathbb{R}^d$, then the Itô process

$$X_t = X(t) = \int_0^t \Theta_s\cdot dW_s$$

is a standard one-dimensional Brownian motion.

Proof. Since $X_t$ has continuous paths (recall that all Itô integral processes do), it suffices to show that $X_t$ has stationary, independent increments and that the marginal distribution of $X_t$ is Normal-$(0,t)$. Both of these tasks can be done simultaneously via characteristic functions (Fourier transforms), by showing that for all choices $0=t_0<t_1<\cdots<t_k$ and all $\beta_1,\beta_2,\dots,\beta_k\in\mathbb{R}$,

$$E\exp\left\{\sum_{j=1}^k i\beta_j(X(t_j)-X(t_{j-1})) + \sum_{j=1}^k \beta_j^2(t_j-t_{j-1})/2\right\} = 1.$$

Define

$$V_t = \beta_j\Theta(t) \quad\text{if } t_{j-1}<t\le t_j, \quad\text{and}\quad V_t = 0 \quad\text{if } t>t_k;$$

then

$$E\exp\left\{\sum_{j=1}^k i\beta_j(X(t_j)-X(t_{j-1})) + \sum_{j=1}^k \beta_j^2(t_j-t_{j-1})/2\right\} = E\exp\left\{i\int_0^{t_k} V_s\cdot dW_s + \frac{1}{2}\int_0^{t_k}\|V_s\|^2\,ds\right\}.$$

The process $V_t$ is clearly bounded in norm (by $\max_j|\beta_j|$), so Proposition 8 implies that the process

$$Z_t := \exp\left\{i\int_0^t V_s\cdot dW_s + \frac{1}{2}\int_0^t \|V_s\|^2\,ds\right\}$$

is a martingale, and it follows that $EZ_{t_k}=1$.

A Bessel process with dimension parameter $d$ is a solution (or a process whose law agrees with that of a solution) of the stochastic differential equation

$$dX_t = \frac{d-1}{2X_t}\,dt + dW_t, \qquad (43)$$

where $W_t$ is a standard one-dimensional Brownian motion. The problems of existence and uniqueness of solutions to the Bessel SDEs (43) will be addressed later. The next result shows that solutions exist when $d$ is a positive integer.

Proposition 10. Let $W_t=(W_t^1,W_t^2,\dots,W_t^d)$ be a $d$-dimensional Brownian motion started at a point $x\ne 0$, and let $R_t=|W_t|$ be the modulus of $W_t$. Then $R_t$ is a Bessel process of dimension $d$ started at $|x|$.

Proof. The process $R_t$ is gotten by applying a smooth (everywhere except at the origin) real-valued function to $d$-dimensional Brownian motion. Since $d$-dimensional Brownian motion started at a point $x\ne0$ will never visit the origin, Itô's formula applies, and (after a brief adventure in multivariate calculus) shows that for any $t>0$,

$$R_t - R_0 = \int_0^t \Theta_s\cdot dW_s + \int_0^t \frac{d-1}{2R_s}\,ds,$$

where $\Theta_s = W_s/|W_s|$ takes values in the unit sphere. By Proposition 9, the first integral determines a standard, one-dimensional Brownian motion.
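Proposition 10 can be illustrated numerically (an informal experiment, not from the notes): simulate the modulus of a 3-dimensional Brownian motion alongside an Euler-Maruyama discretization of the SDE (43) with $d=3$, and compare the sample means at a fixed time. The small positive floor in the Euler scheme is an ad hoc guard against the discretization stepping below zero, which the true Bessel process never does.

```python
import math
import random

random.seed(11)

def radial_bm_mean(T=0.5, n_steps=200, n_paths=2000):
    """Sample mean of R_T = |W_T| for a 3-dim BM started at (1, 0, 0)."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, y, z = 1.0, 0.0, 0.0
        for _ in range(n_steps):
            x += random.gauss(0.0, sd)
            y += random.gauss(0.0, sd)
            z += random.gauss(0.0, sd)
        total += math.sqrt(x * x + y * y + z * z)
    return total / n_paths

def bessel_sde_mean(T=0.5, n_steps=200, n_paths=2000, d=3):
    """Sample mean of X_T under the Euler scheme for
    dX = (d-1)/(2X) dt + dW, started at X_0 = 1."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        xv = 1.0
        for _ in range(n_steps):
            xv += (d - 1) / (2.0 * xv) * dt + random.gauss(0.0, sd)
            xv = max(xv, 1e-6)  # ad hoc guard against discretization overshoot
        total += xv
    return total / n_paths
```

The two sample means should agree up to Monte Carlo and discretization error, and both should exceed the starting radius 1 because of the outward drift.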

Remark 2. In section 4 we will use the stochastic differential equation (43) to show that the law of one-dimensional Brownian motion conditioned to stay positive forever coincides with that of the radial part $R_t$ of a 3-dimensional Brownian motion.

3.3 Time Change for Itô Integral Processes

Let $X_t=I_t(V)$ be an Itô integral process, where the integrand $V_s$ is a $d$-dimensional, progressively measurable process in the class $\mathcal{W}_T$. One should interpret the magnitude $|V_t|$ as representing instantaneous volatility; in particular, the conditional distribution of the increment $X(t+\delta t)-X(t)$ given the value $|V_t|=\sigma$ is approximately, for $\delta t\to0$, the normal distribution with mean zero and variance $\sigma^2\,\delta t$. One may view this in one of two ways: (1) the volatility $|V_t|$ is a damping factor, that is, it multiplies the next Wiener increment by $|V_t|$; alternatively, (2) $|V_t|$ is a time-regulation factor, either slowing or speeding the normal rate at which variance is accumulated. The next theorem makes this latter viewpoint precise:

Theorem 6. Every Itô integral process is a time-changed Wiener process. More precisely, let $X_t=I_t(V)$ be an Itô integral process with quadratic variation $[X]_t = \int_0^t|V_s|^2\,ds$. Assume that $[X]_t<\infty$ for all $t<\infty$, but that $[X]_\infty = \lim_{t\to\infty}[X]_t = \infty$ with probability one. For each $s\ge0$ define

$$\tau(s) = \inf\{t>0 : [X]_t = s\}. \qquad (44)$$

Then the process

$$\tilde W(s) = X(\tau(s)), \quad s\ge0, \qquad (45)$$

is a standard Wiener process.
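A small simulation (an illustration, not from the notes) makes Theorem 6 concrete for a deterministic integrand, chosen so that the time change (44) can be inverted in closed form. With the assumed integrand $V_s = 1+s$ one has $[X]_t = ((1+t)^3-1)/3$, hence $\tau(s) = (1+3s)^{1/3}-1$, and the sample variance of $\tilde W(s) = X(\tau(s))$ should be close to $s$:

```python
import math
import random

random.seed(3)

def time_changed_variance(s=0.5, n_steps=250, n_paths=4000):
    """Simulate X_t = int_0^t (1+u) dW_u up to t = tau(s) and return the
    sample variance of W~(s) = X(tau(s)); by Theorem 6 it should be near s."""
    tau = (1.0 + 3.0 * s) ** (1.0 / 3.0) - 1.0  # inverts [X]_t = ((1+t)^3 - 1)/3
    dt = tau / n_steps
    sd = math.sqrt(dt)
    samples = []
    for _ in range(n_paths):
        x = 0.0
        for k in range(n_steps):
            x += (1.0 + k * dt) * random.gauss(0.0, sd)  # left-endpoint Ito sum
        samples.append(x)
    mean = sum(samples) / n_paths
    return sum((v - mean) ** 2 for v in samples) / (n_paths - 1)
```

For deterministic integrands this is elementary ($X(\tau(s))$ is exactly Gaussian with variance $s$); the content of Theorem 6 is that the same time-change statement holds for random, progressively measurable integrands.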

Note: A more general theorem of P. Lévy asserts that every continuous martingale (not necessarily adapted to a Wiener filtration) is a time-changed Wiener process.

Proof. To prove Theorem 6 we must show (i) that the time-changed process $\tilde W_s$ has continuous sample paths, and (ii) that its increments are independent, mean-zero Gaussian random variables with the correct variances. The proof of (i) is a bit subtle, because the process $\tau(s)$ might have jumps; we will return to it after dealing with (ii). For (ii) we will follow the same strategy as in Proposition 9: we will show that for all choices of $0=t_0<t_1<\cdots<t_k$ and all $\beta_1,\beta_2,\dots,\beta_k\in\mathbb{R}$,

$$E\exp\left\{\sum_{j=1}^k i\beta_j(\tilde W(t_j)-\tilde W(t_{j-1})) + \sum_{j=1}^k \beta_j^2(t_j-t_{j-1})/2\right\} = 1. \qquad (46)$$

Since the process $\{\tilde W_s\}_{s\ge0}$ is adapted to the filtration $(\mathcal{F}_{\tau(s)})_{s\ge0}$, to prove (46) it suffices to show that for all $s,t\ge0$ and any $\theta\in\mathbb{R}$,

$$E\left(\exp\{i\theta(\tilde W_{t+s}-\tilde W_s) + \theta^2 t/2\}\mid\mathcal{F}_{\tau(s)}\right) = 1. \qquad (47)$$

But by Proposition 8, the process

$$Z_\theta(r) := \exp\{i\theta X(r\wedge\tau(s+t)) + \theta^2[X](r\wedge\tau(s+t))/2\}$$

is a martingale (the truncation at $\tau(s+t)$ guarantees that the integrand in (41) is bounded), and so (47) follows from the Optional Sampling Formula for martingales.

It remains to prove that the sample paths of $\tilde W_s$ are, with probability one, continuous. Clearly, if the mapping $s\mapsto\tau(s)$ is continuous then $s\mapsto\tilde W_s = X_{\tau(s)}$ is also continuous, because the process $X_t=I_t(V)$, being an Itô integral process, has continuous sample paths. The difficulty is that $s\mapsto\tau(s)$ might not be continuous: in particular, if $V_s=0$ a.e. on some time interval $s\in(a,b)$ then the process $\tau(s)$ will have a jump discontinuity of size at least $b-a$ at $s=[X]_a$. Thus, what we must show is that $X_t$ is constant on any time interval $(a,b)$ such that $[X]_a=[X]_b$. This is proved in the following lemma.

Lemma 2. Let $(V_s)_{s\ge0}$ be a progressively measurable process such that $\int_0^T V_s^2\,ds<\infty$ for every $T<\infty$. Let $X_t=I_t(V)$ be the corresponding Itô integral process, and denote by $[X]_t=\int_0^t V_s^2\,ds$ its quadratic variation. Then with probability one, on any nonempty time interval $(a,b)$ on which $[X]_t$ is constant, the process $X_t$ is also constant.

Proof. Clearly, it suffices to prove that, for every nonempty interval $(a,b)$ with rational endpoints $a,b$, we have $X_b=X_a$ almost surely on the event $\{[X]_b=[X]_a\}$. Furthermore, by a routine localization argument, it suffices to consider the case where $\int_0^T V_s^2\,ds < C$ for some constant $C<\infty$. In this case, the integrand $(V_s)_{s\ge0}$ is in the class $\mathcal{V}_T$ for every $T<\infty$, so the process $X_t^2-[X]_t$ is a martingale. Fix a rational interval $(a,b)\subset(0,\infty)$ and a small $\varepsilon>0$, and define

$$\tau_\varepsilon = \inf\{t\ge a : [X]_t-[X]_a = \varepsilon\}.$$

By Doob's maximal inequality,

$$\delta^2\,P\left\{\max_{a\le t\le b\wedge\tau_\varepsilon}|X_t-X_a|\ge\delta\right\} \le E|X_{b\wedge\tau_\varepsilon}-X_a|^2 = E[X]_{b\wedge\tau_\varepsilon} - E[X]_a \le \varepsilon.$$

Consequently, by the Borel-Cantelli lemma, with probability one only finitely many of the events

$$\left\{\max_{a\le t\le b\wedge\tau_{8^{-n}}}|X_t-X_a|\ge 2^{-n}\delta\right\}$$

will occur. But on the event $\{[X]_b=[X]_a\}$, for every $n=1,2,\dots$ it must be the case that $\tau_{8^{-n}}\ge b$; therefore, on this event

$$\max_{a\le t\le b}|X_t-X_a| = 0 \quad\text{with probability one.}$$

Corollary 2. Let $X_t=I_t(V)$ be an Itô integral process with quadratic variation $[X]_t$, and let $\tau(s)$ be defined by (44). Then for each $\alpha>0$,

$$P\left\{\max_{t\le\tau(s)}|X_t|\ge\alpha\right\} \le 2P\{|W_s|\ge\alpha\}. \qquad (48)$$

Consequently, the random variable $\max_{t\le\tau(s)}|X_t|$ has finite moments of all orders, and even a finite moment generating function.

Proof. The maximum of $|X(t)|$ up to time $\tau(s)$ coincides with the maximum of the Wiener process $|\tilde W|$ up to time $s$, so the result follows from the Reflection Principle.

3.4 Itô Representation Theorem

Theorem 7. Let $W_t$ be a $d$-dimensional Wiener process and let $\mathbb{F}^W=(\mathcal{F}_t^W)_{0\le t<\infty}$ be the minimal filtration for $W$. Then for any $\mathcal{F}_T^W$-measurable random variable $Y$ with mean zero and finite variance, there exists a ($d$-dimensional vector-valued) process $V_t$ in the class $\mathcal{V}_T$ such that

$$Y = I_T(V) = \int_0^T V_s\cdot dW_s. \qquad (49)$$

Corollary 3. If $\{M_t\}_{t\le T}$ is an $L^2$-bounded martingale relative to the minimal filtration $\mathbb{F}^W$ of a Wiener process, then $M_t=I_t(V)$ a.s. for some process $V_t$ in the class $\mathcal{V}_T$, and consequently $M_t$ has a version with continuous paths.

Proof of the Corollary. Assume that $T<\infty$; then $Y:=M_T$ satisfies the hypotheses of Theorem 7, and hence has the representation (49). For any integrand $V_t$ of class $\mathcal{V}_T$, the Itô integral process $I_t(V)$ is an $L^2$-martingale, and so by (49),

$$M_t = E(M_T\mid\mathcal{F}_t^W) = E(Y\mid\mathcal{F}_t^W) = I_t(V) \quad\text{a.s.}$$

Proof of Theorem 7. It is enough to consider random variables Y of the form

$$Y = f(W(t_1),W(t_2),\dots,W(t_J)), \qquad (50)$$

because such random variables are dense in $L^2(\Omega,\mathcal{F}_T^W,P)$. If there is a random variable $Y$ of the form (50) that is not a stochastic integral, then (by orthogonal projection) there exists such a $Y$ that is uncorrelated with every $Y'$ of the form (50) that is a stochastic integral. I will show that if $Y$ is a mean-zero, finite-variance random variable of the form (50) that is uncorrelated with every random variable $Y'$ of the same form that is also a stochastic integral, then $Y=0$ a.s. By (40) above (or alternatively see Example 2 in section 2), for all $\theta_j\in\mathbb{R}^d$ the random variable

$$Y' = \exp\left\{i\sum_{j=1}^J\langle\theta_j,W(t_j)\rangle\right\}$$

is a stochastic integral. Clearly, $Y'$ has finite variance. Thus, by hypothesis, for all $\theta_j\in\mathbb{R}^d$,

$$Ef(W(t_1),W(t_2),\dots,W(t_J))\exp\left\{i\sum_{j=1}^J\langle\theta_j,W(t_j)\rangle\right\} = 0. \qquad (51)$$

This implies that the random variable $Y$ must be 0 with probability one. To see this, consider the (signed) measure $\mu$ on $\mathbb{R}^{Jd}$ defined by

$$d\mu(x) = f(x)\,P\{(W(t_1),W(t_2),\dots,W(t_J))\in dx\};$$

then equation (51) implies that the Fourier transform of $\mu$ is identically 0, and this in turn implies that $\mu=0$.

Exercise 5. Does an $L^1$-bounded martingale $\{M_t\}_{t\le T}$ necessarily have a version with continuous paths?

3.5 Hermite Functions and Hermite Martingales

The Hermite functions $H_n(x,t)$ are polynomials in the variables $x$ and $t$ that satisfy the backward heat equation $H_t+H_{xx}/2=0$. As we have seen (e.g., in connection with the exponential function $\exp\{\theta x-\theta^2t/2\}$), if $H(x,t)$ satisfies the backward heat equation, then when the Itô Formula is applied to $H(W_t,t)$ the ordinary integrals cancel, leaving only the Itô integral; thus $H(W_t,t)$ is a (local) martingale. Consequently, the Hermite functions provide a sequence of polynomial martingales. The first few Hermite functions are

$$H_0(x,t) = 1, \qquad (52)$$
$$H_1(x,t) = x,$$
$$H_2(x,t) = x^2-t,$$
$$H_3(x,t) = x^3-3xt,$$
$$H_4(x,t) = x^4-6x^2t+3t^2.$$

The formal definition is by a generating function:

$$\sum_{n=0}^\infty \frac{\theta^n}{n!}H_n(x,t) = \exp\{\theta x-\theta^2 t/2\}. \qquad (53)$$

Exercise 6. Show that the Hermite functions satisfy the two-term recursion relation $H_{n+1} = xH_n - ntH_{n-1}$. Conclude that every term of $H_{2m}$ is a constant times $x^{2k}t^{m-k}$ for some $0\le k\le m$, and that the lead term is the monomial $x^{2m}$. Conclude also that each $H_n$ solves the backward heat equation.
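The recursion in Exercise 6 is easy to check by machine (an illustrative script, not part of the notes). Here a polynomial in $x$ and $t$ is stored as a dictionary mapping the exponent pair $(i,j)$ of $x^i t^j$ to its coefficient:

```python
def hermite(n):
    """H_n(x, t) via the recursion H_{n+1} = x*H_n - n*t*H_{n-1},
    with H_0 = 1 and H_1 = x.  A polynomial is a dict {(i, j): coeff}."""
    h_prev, h = {(0, 0): 1}, {(1, 0): 1}
    if n == 0:
        return h_prev
    for m in range(1, n):
        nxt = {}
        for (i, j), c in h.items():       # x * H_m
            nxt[(i + 1, j)] = nxt.get((i + 1, j), 0) + c
        for (i, j), c in h_prev.items():  # - m * t * H_{m-1}
            nxt[(i, j + 1)] = nxt.get((i, j + 1), 0) - m * c
        h_prev, h = h, {k: v for k, v in nxt.items() if v != 0}
    return h

def backward_heat_residual(p):
    """Return H_t + H_xx / 2 as a polynomial; it should be identically zero."""
    out = {}
    for (i, j), c in p.items():
        if j > 0:                         # d/dt term
            out[(i, j - 1)] = out.get((i, j - 1), 0) + c * j
        if i > 1:                         # (1/2) d^2/dx^2 term
            out[(i - 2, j)] = out.get((i - 2, j), 0) + c * i * (i - 1) / 2
    return {k: v for k, v in out.items() if v != 0}
```

For example, `hermite(4)` returns the coefficients of $x^4-6x^2t+3t^2$, matching (52), and the backward-heat residual vanishes for every $n$, as Exercise 6 predicts.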

Proposition 11. Let $V_s$ be a bounded, progressively measurable process, and let $X_t=I_t(V)$ be the Itô integral process with integrand $V$. Then for each $n\ge0$ the process $H_n(X_t,[X]_t)$ is a martingale.

Proof. Since each $H_n(x,t)$ satisfies the backward heat equation, the Itô formula implies that $H_n(X_{t\wedge\tau},[X]_{t\wedge\tau})$ is a martingale for each stopping time $\tau=\tau(m)=$ the first $t$ such that either $|X_t|=m$ or $[X]_t=m$. If $V_s$ is bounded, then Corollary 2 guarantees that for each $t$ the random variables $H_n(X_{t\wedge\tau(m)},[X]_{t\wedge\tau(m)})$ are dominated by an integrable random variable. Therefore, the DCT for conditional expectations implies that $H_n(X_t,[X]_t)$ is a martingale.

3.6 Moment Inequalities

Corollary 4. Let $V_t$ be a progressively measurable process in the class $\mathcal{W}_T$, and let $X_t=I_t(V)$ be the associated Itô integral process. Then for every integer $m\ge1$ and every time $T<\infty$ there exist constants $C_m<\infty$ such that

$$EX_T^{2m} \le C_m\,E[X]_T^m. \qquad (54)$$

Note: Burkholder, Davis, and Gundy have proved maximal inequalities that are considerably stronger than this, but the arguments are not elementary.

Proof. First, it suffices to consider the case where the integrand $V_s$ is uniformly bounded. To see this, define in general the truncated integrands $V_s^{(m)} := V(s)\wedge m$; then

$$\lim_{m\to\infty} I_t(V^{(m)}) = I_t(V) \quad\text{a.s., and}$$
$$\lim_{m\to\infty} [I(V^{(m)})]_t \uparrow [I(V)]_t.$$

Hence, if the result holds for each of the truncated integrands $V^{(m)}$, then it will hold for $V$, by Fatou's Lemma and the Monotone Convergence Theorem.

Assume, then, that $V_s$ is uniformly bounded. The proof of (54) in this case is by induction on $m$. First note that, because $V_t$ is assumed bounded, so is the quadratic variation $[X]_T$ at any finite time $T$, and so $[X]_T$ has finite moments of all orders. Also, if $V_t$ is bounded then it is an element of $\mathcal{V}_T$, and so the Itô isometry implies that

$$EX_T^2 = E[X]_T.$$

This takes care of the case $m=1$. The induction argument uses the Hermite martingales $H_{2m}(X_t,[X]_t)$ (Proposition 11). By Exercise 6, the lead term (in $x$) of the polynomial $H_{2m}(x,t)$ is $x^{2m}$, and the remaining terms are all of the form $a_{m,k}x^{2k}t^{m-k}$ for $k<m$.

Since $H_{2m}(X_0,[X]_0)=0$, the martingale identity implies

$$EX_T^{2m} = -\sum_{k=0}^{m-1} a_{m,k}\,EX_T^{2k}[X]_T^{m-k}$$
$$\implies EX_T^{2m} \le A_{2m}\sum_{k=0}^{m-1} EX_T^{2k}[X]_T^{m-k}$$
$$\implies EX_T^{2m} \le A_{2m}\sum_{k=0}^{m-1}\left(EX_T^{2m}\right)^{k/m}\left(E[X]_T^m\right)^{1-k/m},$$

the last by Hölder's inequality. Note that the constants $A_{2m}=\max_k|a_{m,k}|$ are determined by the coefficients of the Hermite function $H_{2m}$. Now divide both sides by $EX_T^{2m}$ to obtain

$$1 \le A_{2m}\sum_{k=0}^{m-1}\left(\frac{E[X]_T^m}{EX_T^{2m}}\right)^{1-k/m}.$$

The inequality (54) follows.

4 Girsanov’s Theorem

4.1 Change of measure

Let $(\Omega,\{\mathcal{F}_t\}_{t\ge0},P)$ be a filtered probability space. If $Z_T$ is a nonnegative, $\mathcal{F}_T$-measurable random variable with expectation 1 then it is a likelihood ratio; that is, the measure $Q$ on $\mathcal{F}_T$ defined by

$$Q(F) := E_P 1_F Z_T \qquad (55)$$

is a probability measure, and the likelihood ratio (Radon-Nikodym derivative) of $Q$ relative to $P$ is $Z_T$. It is of interest to know how the measure $Q$ restricts to the $\sigma$-algebra $\mathcal{F}_t\subset\mathcal{F}_T$ when $t<T$.

Proposition 12. Let $Z_T$ be an $\mathcal{F}_T$-measurable, nonnegative random variable such that $E^P Z_T=1$, and let $Q=Q_T$ be the probability measure on $\mathcal{F}_T$ with likelihood ratio $Z_T$ relative to $P$. Then for any $t\in[0,T]$ the restriction $Q_t$ of $Q$ to the $\sigma$-algebra $\mathcal{F}_t$ has likelihood ratio

$$Z_t := E^P(Z_T\mid\mathcal{F}_t). \qquad (56)$$

Proof. The random variable $Z_t$ defined by (56) is nonnegative, $\mathcal{F}_t$-measurable, and integrates to 1, so it is the likelihood ratio of a probability measure on $\mathcal{F}_t$. For any event $A\in\mathcal{F}_t$,

$$Q(A) := E^P Z_T 1_A = E^P E^P(Z_T\mid\mathcal{F}_t)1_A$$

by definition of conditional expectation, so $Z_t = (dQ/dP)_{\mathcal{F}_t}$.

4.2 Example: Brownian motion with drift

Assume now that $W_t$ is a standard one-dimensional Brownian motion with admissible filtration $\{\mathcal{F}_t\}_{t\ge0}$, and let $\mu\in\mathbb{R}$ be a fixed constant. Recall that the exponential process

$$Z_t^\mu := \exp\{\mu W_t-\mu^2 t/2\} \qquad (57)$$

is a nonnegative martingale with initial value 1. Thus, for each $T<\infty$ the random variable $Z_T$ is the likelihood ratio of a probability measure $Q_T=Q_T^\mu$ on $\mathcal{F}_T$, defined by (55).

Theorem 8. Under $Q=Q_T^\mu$, the process $\{W_t\}_{t\le T}$ is a Brownian motion with drift $\mu$; equivalently, the process $\{\tilde W_t = W_t-\mu t\}_{t\le T}$ is a standard Brownian motion.

Proof. Under $P$ the process $W_t-\mu t$ has continuous paths, almost surely. Since $Q$ is absolutely continuous with respect to $P$, it follows that under $Q$ the process $W_t-\mu t$ has continuous paths. Thus, to show that this process is a Brownian motion it is enough to show that it has the right finite-dimensional distributions. This can be done by calculating the joint moment generating functions of the increments: fix $0=t_0<t_1<\cdots<t_J=T$, and set

$$\Delta t_j = t_j-t_{j-1}, \qquad \Delta W_j = W(t_j)-W(t_{j-1}), \qquad \Delta\tilde W_j = \tilde W(t_j)-\tilde W(t_{j-1});$$

then

$$E_Q\exp\left\{\sum_{j=1}^J \alpha_j\Delta\tilde W_j\right\} = \exp\left\{-\mu\sum_{j=1}^J\alpha_j\Delta t_j\right\}E_P\exp\left\{\sum_{j=1}^J(\alpha_j+\mu)\Delta W_j - \mu^2 T/2\right\}$$
$$= \exp\left\{-\mu^2 T/2-\mu\sum_{j=1}^J\alpha_j\Delta t_j\right\}E_P\exp\left\{\sum_{j=1}^J(\alpha_j+\mu)\Delta W_j\right\}$$
$$= \exp\left\{-\mu^2 T/2-\mu\sum_{j=1}^J\alpha_j\Delta t_j\right\}\exp\left\{\sum_{j=1}^J(\alpha_j+\mu)^2\Delta t_j/2\right\}$$
$$= \exp\left\{\sum_{j=1}^J\alpha_j^2\Delta t_j/2\right\},$$

which is the joint moment generating function of independent mean-zero Gaussian increments with variances $\Delta t_j$.

By Proposition 12, the family of measures $\{Q_T^\mu\}_{T\ge0}$ is consistent, in the sense that the restriction of $Q_{T+S}$ to the $\sigma$-algebra $\mathcal{F}_T$ is just $Q_T$. It is a routine exercise in measure theory to show that these measures extend to a probability measure $Q^\mu=Q_\infty^\mu$ on the smallest $\sigma$-algebra $\mathcal{F}_\infty$ that contains $\cup_{t\ge0}\mathcal{F}_t$. It is important to note that, although each $Q_T$ is absolutely continuous relative to $P_T$, the extension $Q_\infty$ is singular relative to $P_\infty$: this is because the strong law of large numbers for sums of i.i.d. standard normals implies that

$$W_n/n \to 0 \quad\text{almost surely under } P, \quad\text{but}\quad W_n/n \to \mu \quad\text{almost surely under } Q.$$
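The change of measure in Theorem 8 can be checked directly by importance-sampling Monte Carlo (an illustration, not from the notes): weighting samples of $W_T\sim N(0,T)$ by the likelihood ratio $Z_T$ of (57) should reproduce the mean $\mu T$ of the drifted motion, while the weights themselves average to 1.

```python
import math
import random

random.seed(5)

def tilted_mean(mu=0.8, T=1.0, n_samples=20000):
    """Estimate E_Q W_T = E_P[Z_T * W_T] with Z_T = exp(mu*W_T - mu^2*T/2).
    Under Q the motion has drift mu, so the weighted mean should be near mu*T."""
    total_w = 0.0   # sum of likelihood-ratio weights (should average to 1)
    total_wx = 0.0  # sum of weight * W_T (should average to mu*T)
    for _ in range(n_samples):
        x = random.gauss(0.0, math.sqrt(T))
        z = math.exp(mu * x - mu * mu * T / 2.0)
        total_w += z
        total_wx += z * x
    return total_w / n_samples, total_wx / n_samples
```

Only the endpoint $W_T$ is needed here because the likelihood ratio (57) depends on the path through $W_T$ alone; for the path-dependent ratios of the next section a full discretized path is required.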

4.3 The Girsanov formula

The Girsanov theorem is a far-reaching extension of Theorem 8 that describes the change of measure needed to transform Brownian motion to Brownian motion plus a progressively measurable drift. Let $(\Omega,\mathcal{F},P)$ be a probability space that supports a $d$-dimensional Brownian motion $W_t$, and let $\{\mathcal{F}_t\}_{t\ge0}$ be an admissible filtration. Let $\{V_t\}_{t\ge0}$ be a progressively measurable process relative to the filtration, and assume that for each $T<\infty$,

$$P\left\{\int_0^T |V_s|^2\,ds < \infty\right\} = 1.$$

Then the Itô integral $I_t(V)$ is well-defined for every $t\ge0$, and by a routine application of the Itô formula, the process

$$Z_t = \exp\left\{\int_0^t V_s\cdot dW_s - \int_0^t |V_s|^2\,ds/2\right\} = \exp\{I_t(V)-[I(V)]_t/2\} \qquad (58)$$

satisfies the stochastic differential equation

$$dZ_t = Z_tV_t\cdot dW_t \iff Z_t - Z_0 = \int_0^t Z_sV_s\cdot dW_s. \qquad (59)$$

Proposition 13. If the process $\{V_t\}_{t\ge0}$ is bounded then $\{Z_t\}_{t\ge0}$ is a martingale, and consequently, for each $T<\infty$, $EZ_T=1$.

Remark 3. The hypothesis that $V_t$ be bounded can be weakened substantially: a theorem of Novikov asserts that $Z_t$ is a martingale if for every $T<\infty$,

$$E\exp\left\{\frac{1}{2}\int_0^T |V_s|^2\,ds\right\} < \infty. \qquad (60)$$

Proof of Proposition 13. Since the stochastic differential equation (59) has no $dt$ term, it suffices to show that for each $T<\infty$ the process $Z_tV_t$ is in the integrability class $\mathcal{V}_T$, and since the process $V_t$ is bounded, it suffices to show that for each $T<\infty$,

$$\int_0^T EZ_t^2\,dt < \infty. \qquad (61)$$

Clearly,

$$Z_t^2 \le \exp\left\{2\int_0^t V_s\cdot dW_s\right\}.$$

By the time-change theorem for Itô integral processes, the process $2I_t(V)$ in the last exponential is a time-changed Brownian motion; in particular, the process $\tilde W_s = I_{\tau(s)}(V)$ is a Brownian motion in $s$, where $\tau(s)$ is defined by (44). Because the process $V_t$ is bounded, there exists $C<\infty$ such that the accumulated quadratic variation $[I(V)]_t$ is bounded by $Ct$, for all $t$. Consequently,

$$\int_0^t V_s\cdot dW_s \le \max_{s\le Ct}\tilde W_s =: \tilde M_{Ct},$$

and so by Brownian scaling,

$$EZ_t^2 \le E\exp\{2\sqrt{Ct}\,\tilde M_1\}.$$

By the reflection principle, this last expectation can be bounded as follows:

$$E\exp\{2\sqrt{Ct}\,\tilde M_1\} = 2\int_0^\infty e^{2\sqrt{Ct}\,y}e^{-y^2/2}\,dy/\sqrt{2\pi} \le 2\int_{-\infty}^\infty e^{2\sqrt{Ct}\,y}e^{-y^2/2}\,dy/\sqrt{2\pi} = 2\exp\{2Ct\}.$$

It is now apparent that (61) holds.

Proposition 13 asserts that if the process $V_t$ is bounded then for each $T<\infty$ the random variable $Z(T)$ is a likelihood ratio, that is, a nonnegative random variable that integrates to 1. We have already noted that boundedness of $V_t$ is not necessary for $EZ_T=1$, which is all that is needed to ensure that

$$Q(F) = E_P(Z(T)1_F) \qquad (62)$$

defines a new probability measure on $(\Omega,\mathcal{F}_T)$. Girsanov's theorem describes the distribution of the stochastic process $\{W(t)\}_{t\ge0}$ under this new probability measure. Define

$$\tilde W(t) = W(t) - \int_0^t V_s\,ds. \qquad (63)$$

Theorem 9. (Girsanov) Assume that under $P$ the process $\{W_t\}_{t\ge0}$ is a $d$-dimensional Brownian motion with admissible filtration $\mathbb{F}=\{\mathcal{F}_t\}_{t\ge0}$, and that the exponential process $\{Z_t\}_{t\le T}$ defined by (58) is a martingale relative to $\mathbb{F}$ under $P$. Define $Q$ on $\mathcal{F}_T$ by equation (62). Then under the probability measure $Q$, the stochastic process $\{\tilde W(t)\}_{0\le t\le T}$ is a standard Wiener process.

Proof. To show that the process $\tilde W_t$ is, under $Q$, a standard Wiener process, it suffices to show that it has independent, normally distributed increments with the correct variances. For this, it suffices to show that either the joint moment generating function or the joint characteristic function (under $Q$) of the increments

$$\tilde W(t_1),\ \tilde W(t_2)-\tilde W(t_1),\ \dots,\ \tilde W(t_n)-\tilde W(t_{n-1}),$$

where $0<t_1<t_2<\cdots<t_n$, is the same as that of $n$ independent, normally distributed random variables with expectations 0 and variances $t_1, t_2-t_1,\dots$; that is, either

$$E_Q\exp\left\{\sum_{k=1}^n \alpha_k(\tilde W(t_k)-\tilde W(t_{k-1}))\right\} = \prod_{k=1}^n \exp\{\alpha_k^2(t_k-t_{k-1})/2\} \quad\text{or} \qquad (64)$$
$$E_Q\exp\left\{\sum_{k=1}^n i\theta_k(\tilde W(t_k)-\tilde W(t_{k-1}))\right\} = \prod_{k=1}^n \exp\{-\theta_k^2(t_k-t_{k-1})/2\}. \qquad (65)$$

Special Case: Assume that the integrand process $V_s$ is bounded. In this case it is easiest to use moment generating functions. Consider for simplicity the case $n=1$: to evaluate the expectation $E_Q$ on the left side of (64), we rewrite it as an expectation under $P$, using the basic likelihood ratio identity relating the two expectation operators:

$$E_Q\exp\{\alpha\tilde W(t)\} = E_Q\exp\left\{\alpha W(t)-\alpha\int_0^t V_s\,ds\right\}$$
$$= E_P\exp\left\{\alpha W(t)-\alpha\int_0^t V_s\,ds\right\}\exp\left\{\int_0^t V_s\,dW_s - \int_0^t V_s^2\,ds/2\right\}$$
$$= E_P\exp\left\{\int_0^t(\alpha+V_s)\,dW_s - \int_0^t(2\alpha V_s+V_s^2)\,ds/2\right\}$$
$$= e^{\alpha^2 t/2}\,E_P\exp\left\{\int_0^t(\alpha+V_s)\,dW_s - \int_0^t(\alpha+V_s)^2\,ds/2\right\}$$
$$= e^{\alpha^2 t/2},$$

as desired. In the last step we used the fact that the exponential integrates to one. This follows from Novikov's theorem, because the hypothesis that the integrand $V_s$ is bounded guarantees that Novikov's condition (60) is satisfied by $\alpha+V_s$. A similar calculation shows that (64) holds for $n>1$.

General Case: Unfortunately, the final step in the calculation above cannot be justified in general. However, a similar argument can be made using characteristic functions rather than moment generating functions. Once again, consider for simplicity the case $n=1$:

$$E_Q\exp\{i\theta\tilde W(t)\} = E_Q\exp\left\{i\theta W(t)-i\theta\int_0^t V_s\,ds\right\}$$
$$= E_P\exp\left\{i\theta W(t)-i\theta\int_0^t V_s\,ds\right\}\exp\left\{\int_0^t V_s\,dW_s - \int_0^t V_s^2\,ds/2\right\}$$
$$= E_P\exp\left\{\int_0^t(i\theta+V_s)\,dW_s - \int_0^t(2i\theta V_s+V_s^2)\,ds/2\right\}$$
$$= e^{-\theta^2 t/2}\,E_P\exp\left\{\int_0^t(i\theta+V_s)\,dW_s - \int_0^t(i\theta+V_s)^2\,ds/2\right\}$$
$$= e^{-\theta^2 t/2},$$

which is the characteristic function of the normal distribution $N(0,t)$. To justify the final equality, we must show that

$$E_P\exp\{X_t-[X]_t/2\} = 1,$$

where

$$X_t = \int_0^t(i\theta+V_s)\,dW_s \quad\text{and}\quad [X]_t = \int_0^t(i\theta+V_s)^2\,ds.$$

Itô's formula implies that

$$d\exp\{X_t-[X]_t/2\} = \exp\{X_t-[X]_t/2\}\,(i\theta+V_t)\,dW_t,$$

and so the martingale property will hold up to any stopping time $\tau$ that keeps the integrand on the right side bounded. Define stopping times

$$\tau(n) = \inf\{s : |X_s|=n \ \text{or}\ [X]_s=n \ \text{or}\ |V_s|=n\};$$

then for each $n=1,2,\dots$,

$$E_P\exp\{X_{t\wedge\tau(n)}-[X]_{t\wedge\tau(n)}/2\} = 1.$$

As $n\to\infty$, the integrand converges pointwise to $\exp\{X_t-[X]_t/2\}$, so to conclude the proof it suffices to verify uniform integrability. For this, observe that for any $s\le t$,

$$|\exp\{X_s-[X]_s/2\}| \le e^{\theta^2 s/2}Z_s \le e^{\theta^2 t/2}Z_s.$$

By hypothesis, the process $\{Z_s\}_{s\le t}$ is a positive martingale, and consequently the random variables $Z_{t\wedge\tau(n)}$ are uniformly integrable. This implies that the random variables $|\exp\{X_{t\wedge\tau(n)}-[X]_{t\wedge\tau(n)}/2\}|$ are also uniformly integrable.
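The special case of (64) with $n=1$ can be probed numerically even for a state-dependent bounded integrand (an informal check, not part of the notes; the choice $V_s = \cos W_s$ is an arbitrary bounded integrand assumed for the demonstration):

```python
import math
import random

random.seed(13)

def girsanov_mgf_check(alpha=0.5, t=0.5, n_steps=200, n_paths=4000):
    """Estimate E_Q exp(alpha*W~_t) = E_P[ Z_t * exp(alpha*W~_t) ] for
    V_s = cos(W_s); by (64) the answer should be exp(alpha^2 * t / 2)."""
    dt = t / n_steps
    sd = math.sqrt(dt)
    acc = 0.0
    for _ in range(n_paths):
        w = 0.0      # Brownian path
        ito = 0.0    # int V dW
        qv = 0.0     # int V^2 ds
        drift = 0.0  # int V ds
        for _ in range(n_steps):
            v = math.cos(w)
            dw = random.gauss(0.0, sd)
            ito += v * dw
            qv += v * v * dt
            drift += v * dt
            w += dw
        z = math.exp(ito - qv / 2.0)             # likelihood ratio (58)
        acc += z * math.exp(alpha * (w - drift)) # W~_t = W_t - int V ds
    return acc / n_paths
```

Because $|V_s|\le1$, Proposition 13 guarantees that $Z_t$ really is a mean-one martingale here, so the simulation is testing only the identity (64), not the martingale hypothesis.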

4.4 Example: Ornstein-Uhlenbeck revisited

Recall that the solution $X_t$ of the linear stochastic differential equation (32) with initial condition $X_0=x$ is an Ornstein-Uhlenbeck process with mean-reversion parameter $\alpha$. Because the stochastic differential equation (32) is of the same form as equation (63), Girsanov's theorem implies that a change of measure will convert a standard Brownian motion to an Ornstein-Uhlenbeck process with initial point 0. The details are as follows. Let $W_t$ be a standard Brownian motion defined on a probability space $(\Omega,\mathcal{F},P)$, and let $\mathbb{F}=\{\mathcal{F}_t\}_{t\ge0}$ be the associated Brownian filtration. Define

$$N_t = -\alpha\int_0^t W_s\,dW_s, \qquad (66)$$
$$Z_t = \exp\{N_t-[N]_t/2\}, \qquad (67)$$

and let $Q=Q_T$ be the probability measure on $\mathcal{F}_T$ with likelihood ratio $Z_T$ relative to $P$. Observe that the quadratic variation $[N]_t$ is just the integral $\alpha^2\int_0^t W_s^2\,ds$. Theorem 9 asserts that, under the measure $Q$, the process

$$\tilde W_t := W_t + \alpha\int_0^t W_s\,ds,$$

for $0\le t\le T$, is a standard Brownian motion. But this implies that the process $W$ itself must solve the stochastic differential equation

$$dW_t = -\alpha W_t\,dt + d\tilde W_t$$

under $Q$. It follows that $\{W_t\}_{0\le t\le T}$ is, under $Q$, an Ornstein-Uhlenbeck process with mean-reversion parameter $\alpha$ and initial point 0. Similarly, the shifted process $x+W_t$ is, under $Q$, an Ornstein-Uhlenbeck process with initial point $x$.

It is worth noting that the likelihood ratio $Z_t$ may be written in an alternative form that contains no Itô integrals. To do this, use the Itô formula with the quadratic function $u(x)=x^2$ to obtain $W_t^2 = 2I_t(W)+t$; this shows that the Itô integral in the definition (66) of $N$ may be replaced by $W_t^2/2 - t/2$. Hence, the likelihood ratio $Z_t$ may be written as

$$Z_t = \exp\left\{-\alpha W_t^2/2 + \alpha t/2 - \alpha^2\int_0^t W_s^2\,ds/2\right\}. \qquad (68)$$

A physicist would interpret the quantity in the exponential as the energy of the path $W$ in a quadratic potential well. According to the fundamental postulate of statistical physics (see Feynman, Lectures on Statistical Physics), the probability of finding a system in a given configuration $\sigma$ is proportional to the exponential $e^{-H(\sigma)/kT}$, where $H(\sigma)$ is the energy (Hamiltonian) of the system in configuration $\sigma$. Thus, a physicist might view the

Ornstein-Uhlenbeck process as describing fluctuations in a quadratic potential. (In fact, the Ornstein-Uhlenbeck process was originally invented to describe the instantaneous velocity of a particle undergoing rapid collisions with molecules in a gas. See the book by Edward Nelson on Brownian motion for a discussion of the physics of the Brownian motion process.) The formula (68) also suggests an interpretation of the change of measure in the language of acceptance/rejection sampling: run a standard Brownian motion for time $T$ to obtain a path $x(t)$; then "accept" this path with probability proportional to

$$\exp\left\{-\alpha x_T^2/2 - \alpha^2\int_0^T x_s^2\,ds/2\right\}.$$

The random paths produced by this acceptance/rejection scheme will be distributed as Ornstein-Uhlenbeck paths with initial point 0. This suggests (and it can be proved, though this is beyond the scope of these notes) that the Ornstein-Uhlenbeck measure on $C[0,T]$ is the weak limit of a sequence of discrete measures $\mu_n$ that weight random walk paths according to their potential energies.
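The acceptance/rejection description can be tried directly (a toy experiment, not from the notes). The exponential weight above is at most 1, so it can be used verbatim as an acceptance probability; the accepted endpoints $x_T$ should then have variance close to the Ornstein-Uhlenbeck value $(1-e^{-2\alpha T})/(2\alpha)$ for a process started at 0:

```python
import math
import random

random.seed(17)

def ou_by_rejection(alpha=1.0, T=1.0, n_steps=200, n_paths=5000):
    """Sample BM paths and accept each with probability
    exp(-alpha*x_T^2/2 - alpha^2 * int_0^T x_s^2 ds / 2), which is <= 1.
    Returns the sample variance of the accepted endpoints x_T."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    accepted = []
    for _ in range(n_paths):
        x = 0.0
        energy = 0.0  # alpha^2 * int_0^T x_s^2 ds / 2 (left-endpoint sum)
        for _ in range(n_steps):
            energy += alpha * alpha * x * x * dt / 2.0
            x += random.gauss(0.0, sd)
        if random.random() < math.exp(-alpha * x * x / 2.0 - energy):
            accepted.append(x)
    m = sum(accepted) / len(accepted)
    return sum((v - m) ** 2 for v in accepted) / (len(accepted) - 1)
```

Since (68) integrates to one, the mean acceptance rate works out to $e^{-\alpha T/2}$, so a substantial fraction of paths survive; with $\alpha=T=1$ the target endpoint variance is $(1-e^{-2})/2 \approx 0.432$.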

Problem 1. Show how the Brownian bridge can be obtained from Brownian motion by change of measure, and find an expression for the likelihood ratio that contains no Itô integrals.

4.5 Example: Brownian motion conditioned to stay positive

For each $x\in\mathbb{R}$, let $P^x$ be a probability measure on $(\Omega,\mathcal{F})$ such that under $P^x$ the process $W_t$ is a one-dimensional Brownian motion with initial point $W_0=x$. (Thus, under $P^x$ the process $W_t-x$ is a standard Brownian motion.) For $a<b$ define

$$T = T_{a,b} = \min\{t\ge0 : W_t\notin(a,b)\}.$$

Proposition 14. Fix $0<x<b$, and let $T=T_{0,b}$. Let $Q^x$ be the probability measure obtained from $P^x$ by conditioning on the event $\{W_T=b\}$ that $W$ reaches $b$ before 0, that is,

$$Q^x(F) := P^x(F\cap\{W_T=b\})/P^x\{W_T=b\}. \qquad (69)$$

Then under $Q^x$ the process $\{W_t\}_{t\le T}$ has the same distribution as does the solution of the stochastic differential equation

$$dX_t = X_t^{-1}\,dt + dW_t, \quad X_0=x, \qquad (70)$$

under $P^x$. In other words, conditioning on the event $W_T=b$ has the same effect as adding the location-dependent drift $1/X_t$.

Proof. $Q^x$ is a measure on the $\sigma$-algebra $\mathcal{F}_T$ that is absolutely continuous with respect to $P^x$. The likelihood ratio $dQ^x/dP^x$ on $\mathcal{F}_T$ is

$$Z_T := \frac{1\{W_T=b\}}{P^x\{W_T=b\}} = \frac{b}{x}\,1\{W_T=b\}.$$

For any (nonrandom) time $t\ge0$, the likelihood ratio $Z_{T\wedge t} = dQ^x/dP^x$ on $\mathcal{F}_{T\wedge t}$ is gotten by computing the conditional expectation of $Z_T$ under $P^x$ (Proposition 12). Since $Z_T$ is a function only of the endpoint $W_T$, its conditional expectation on $\mathcal{F}_{T\wedge t}$ is the same as the conditional expectation on $\sigma(W_{T\wedge t})$, by the (strong) Markov property of Brownian motion. Moreover, the conditional probability of the event $\{W_T=b\}$ is just $W_{T\wedge t}/b$ (gambler's ruin!). Thus,

$$Z_{T\wedge t} = W_{T\wedge t}/x.$$

This doesn't at first sight appear to be of the exponential form required by the Girsanov theorem, but actually it is: by the Itô formula,

$$Z_{T\wedge t} = \exp\{\log(W_{T\wedge t}/x)\} = \exp\left\{\int_0^{T\wedge t}W_s^{-1}\,dW_s - \int_0^{T\wedge t}W_s^{-2}\,ds/2\right\}.$$

Consequently, the Girsanov theorem (Theorem 9) implies that under $Q^x$ the process $W_{T\wedge t} - \int_0^{T\wedge t}W_s^{-1}\,ds$ is a Brownian motion. This is equivalent to the assertion that under $Q^x$ the process $W_{T\wedge t}$ behaves as a solution to the stochastic differential equation (70).

Observe that the stochastic differential equation (70) does not involve the stopping place $b$. Thus, Brownian motion conditioned to hit $b$ before 0 can be constructed by running a Brownian motion conditioned to hit $b+a$ before 0, and stopping it at the first hitting time of $b$. Alternatively, one can construct a Brownian motion conditioned to hit $b$ before 0 by solving¹ the stochastic differential equation (70) for $t\ge0$ and stopping it at the first hit of $b$. Now observe that if $W_t$ is a Brownian motion started at any fixed $x>0$, the events $\{W_{T_{0,b}}=b\}$ are decreasing in $b$, and their intersection over all $b>x$ is the event that $W_t>0$ for all $t$ and $W$ visits all $b>x$. (This is, of course, an event of probability zero.) For this reason, probabilists refer to the process $X_t$ solving the stochastic differential equation (70) as Brownian motion conditioned to stay positive.
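Two quantitative steps in this proof, the gambler's-ruin evaluation $P^x\{W_T=b\}=x/b$ and the fact that $Z_{T\wedge t}=W_{T\wedge t}/x$ has mean one, are easy to confirm by simulation (an informal check, not part of the notes; the step size and boundary-clipping are crude discretization choices):

```python
import math
import random

random.seed(23)

def gamblers_ruin_check(x=0.3, b=1.0, t=0.3, dt=0.002, n_paths=4000):
    """Simulate BM from x until it exits (0, b).  Returns
    (fraction of paths hitting b before 0, sample mean of W_{T^t} / x)."""
    sd = math.sqrt(dt)
    hit_b = 0
    stopped_sum = 0.0
    for _ in range(n_paths):
        w, s = x, 0.0
        w_at_t = None
        while 0.0 < w < b:
            if w_at_t is None and s >= t:
                w_at_t = w                 # value of W_{T ^ t} on {T > t}
            w += random.gauss(0.0, sd)
            s += dt
        if w >= b:
            hit_b += 1
        exit_value = 0.0 if w <= 0.0 else b  # clip discretization overshoot
        stopped_sum += w_at_t if w_at_t is not None else exit_value
    return hit_b / n_paths, stopped_sum / (n_paths * x)
```

The hitting fraction should be near $x/b$ and the normalized stopped mean near 1 (the latter is just optional stopping for the bounded martingale $W_{T\wedge t}$), up to the overshoot bias of the discrete scheme.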

CAUTION: Because the event that $W_t>0$ for all $t>0$ has probability zero, we cannot define "Brownian motion conditioned to stay positive" using conditional probabilities directly, in the same way (see equation (69)) that we defined "Brownian motion conditioned to hit $b$ before 0". Moreover, whereas Brownian motion conditioned to hit $b$ before 0 has a distribution that is absolutely continuous relative to that of Brownian motion, Brownian motion conditioned to stay positive for all $t>0$ has a distribution (on $C[0,\infty)$) that is necessarily singular relative to the law of Brownian motion.

¹We haven't yet established that the SDE (70) has a solution for all $t\ge0$. This will be done later.

5 Local Time and the Tanaka Formula

5.1 Occupation Measure of Brownian Motion

Let $W_t$ be a standard one-dimensional Brownian motion and let $\mathbb{F}:=\{\mathcal{F}_t\}_{t\ge0}$ be the associated Brownian filtration. A sample path of $\{W_s\}_{0\le s\le t}$ induces, in a natural way, an occupation measure $\Gamma_t$ on (the Borel field of) $\mathbb{R}$:

$$\Gamma_t(A) := \int_0^t 1_A(W_s)\,ds. \qquad (71)$$

Theorem 10. With probability one, for each $t<\infty$ the occupation measure $\Gamma_t$ is absolutely continuous with respect to Lebesgue measure, and its Radon-Nikodym derivative $L(t;x)$ is jointly continuous in $t$ and $x$.

The occupation density $L(t;x)$ is known as the local time of the Brownian motion at $x$. It was first studied by Paul Lévy, who showed, among other things, that it could be defined by the formula

$$L(t;x) = \lim_{\varepsilon\to0}\frac{1}{2\varepsilon}\int_0^t 1_{[x-\varepsilon,x+\varepsilon]}(W_s)\,ds. \qquad (72)$$

Joint continuity of $L(t;x)$ in $t,x$ was first proved by Trotter. The modern argument that follows is based on an integral representation discovered by Tanaka.
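The defining limit (72) can be visualized numerically (a rough illustration, not from the notes). For the level $x=0$ one can check that the mean of the $\varepsilon$-occupation average converges, as $\varepsilon\to0$, to $\sqrt{2t/\pi}$, which is also $E|W_t|$; the simulation below estimates both quantities in the same Monte Carlo run and they should roughly agree:

```python
import math
import random

random.seed(29)

def occupation_density_estimate(t=1.0, eps=0.12, n_steps=1500, n_paths=1200):
    """Monte Carlo means of (1/2eps) * int_0^t 1[-eps,eps](W_s) ds and of |W_t|.
    Both should be near sqrt(2t/pi), the former up to O(eps) and
    discretization bias."""
    dt = t / n_steps
    sd = math.sqrt(dt)
    occ_sum = 0.0
    abs_sum = 0.0
    for _ in range(n_paths):
        w = 0.0
        occ = 0.0
        for _ in range(n_steps):
            if -eps <= w <= eps:
                occ += dt
            w += random.gauss(0.0, sd)
        occ_sum += occ / (2.0 * eps)
        abs_sum += abs(w)
    return occ_sum / n_paths, abs_sum / n_paths
```

The time step must be much smaller than $\varepsilon^2$ so that the walk resolves its visits to the band $[-\varepsilon,\varepsilon]$; the parameter choices above are heuristic, and the $\varepsilon$-approximation is expected to sit a few percent below the limiting value.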

5.2 Tanaka’s Formula

Theorem 11. The process L(t;x) defined by

$$2L(t;x) = |W_t-x| - |x| - \int_0^t \mathrm{sgn}(W_s-x)\,dW_s \qquad (73)$$

is almost surely nondecreasing in $t$, jointly continuous in $t,x$, and constant on every open time interval during which $W_t\ne x$.

For the proof that $L(t;x)$ is continuous (more precisely, that it has a continuous version) we will need two auxiliary results. The first is the Burkholder-Davis-Gundy inequality (Corollary 4 above). The second, the Kolmogorov-Chentsov criterion for path-continuity of a stochastic process, we have seen earlier:

Proposition 15. (Kolmogorov-Chentsov) Let $Y(t)$ be a stochastic process indexed by a $d$-dimensional parameter $t$. Then $Y(t)$ has a version with continuous paths if there exist constants $p,\delta>0$ and $C<\infty$ such that for all $s,t$,

$$E|Y(t)-Y(s)|^p \le C|t-s|^{d+\delta}. \qquad (74)$$

Proof of Theorem 11: Continuity. Since the process $|W_t-x|-|x|$ is jointly continuous, the Tanaka formula (73) implies that it is enough to show that

\[
Y(t;x) := \int_0^t \operatorname{sgn}(W_s - x)\,dW_s
\]

is jointly continuous in $t$ and $x$. For this, we appeal to the Kolmogorov-Chentsov theorem: this asserts that to prove continuity of $Y(t;x)$ in $t,x$ it suffices to show that for some $p \ge 1$ and $C,\delta > 0$,

\[
E|Y(t;x) - Y(t;x')|^p \le C\,|x - x'|^{2+\delta} \quad\text{and} \tag{75}
\]
\[
E|Y(t;x) - Y(t';x)|^p \le C\,|t - t'|^{2+\delta}. \tag{76}
\]

I will prove only (75), with $p = 6$ and $\delta = 1$; the other inequality, with the same values of $p,\delta$, is similar. Begin by observing that for $x < x'$,

\[
Y(t;x) - Y(t;x') = \int_0^t \{\operatorname{sgn}(W_s - x) - \operatorname{sgn}(W_s - x')\}\,dW_s = 2\int_0^t \mathbf{1}_{(x,x')}(W_s)\,dW_s.
\]

This is a martingale in $t$ whose quadratic variation is flat when $W_s$ is not in the interval $(x,x')$ and grows linearly in time when $W_s \in (x,x')$. By Corollary 4,

\[
E|Y(t;x) - Y(t;x')|^{2m} \le C_m 2^{2m}\, E\Big(\int_0^t \mathbf{1}_{(x,x')}(W_s)\,ds\Big)^m
\le C_m 2^{3m} |x - x'|^m\, m! \int_{t_1=0}^{t}\int_{t_2=0}^{t-t_1}\cdots\int_{t_m=0}^{t-t_{m-1}} \frac{dt_1\,dt_2 \cdots dt_m}{\sqrt{t_1 t_2 \cdots t_m}}
\le C' |x - x'|^m.
\]

Taking $m = 3$ gives the exponent $3 = 2 + \delta$ required in (75) with $p = 6$.

Proof of Theorem 11: Conclusion. It remains to show that $L(t;x)$ is nondecreasing in $t$, and that it is constant on any time interval during which $W_t \ne x$. Observe that the Tanaka formula (73) is, in effect, a variation of the Itô formula for the absolute value function, since the sgn function is the derivative of the absolute value everywhere except at 0. This suggests that we try an approach by approximation, applying the usual Itô formula to a smoothed version of the absolute value function. To get a smoothed version of $|\cdot|$, convolve with a smooth probability density with support contained in $[-\varepsilon,\varepsilon]$. Thus, let $\varphi$ be an even, $C^\infty$ probability density on $\mathbb{R}$ with support contained in $(-1,1)$ (see the proof of Lemma ?? in the Appendix below), and define

\[
\varphi_n(x) := n\,\varphi(nx);
\]
\[
\psi_n(x) := -1 + 2\int_{-\infty}^x \varphi_n(y)\,dy; \quad\text{and}
\]
\[
F_n(x) := |x| \quad\text{for } |x| > 1/n,
\]
\[
F_n(x) := \frac{1}{n} + \int_{-1/n}^x \psi_n(z)\,dz \quad\text{for } |x| \le 1/n.
\]

Note that $\int_{-1/n}^{1/n} \psi_n = 0$, because of the symmetry of $\varphi$ about 0. (EXERCISE: Check this.) Consequently, $F_n$ is $C^\infty$ on $\mathbb{R}$, and agrees with the absolute value function outside the interval $[-n^{-1}, n^{-1}]$. The first derivative of $F_n$ is $\psi_n$, and hence is bounded in absolute value by 1; it follows that $F_n(y) \to |y|$ for all $y \in \mathbb{R}$. The second derivative is $F_n'' = 2\varphi_n$. Therefore, Itô's formula implies that

\[
F_n(W_t - x) - F_n(-x) = \int_0^t \psi_n(W_s - x)\,dW_s + \frac{1}{2}\int_0^t 2\varphi_n(W_s - x)\,ds. \tag{77}
\]

By construction, $F_n(y) \to |y|$ as $n \to \infty$, so the left side of (77) converges to $|W_t - x| - |x|$. Now consider the stochastic integral on the right side of (77): as $n \to \infty$, the function $\psi_n$ converges to sgn, and in fact $\psi_n$ coincides with sgn outside of $[-n^{-1}, n^{-1}]$; moreover, the difference $|\operatorname{sgn} - \psi_n|$ is bounded. Consequently, as $n \to \infty$,

\[
\lim_{n\to\infty} \int_0^t \psi_n(W_s - x)\,dW_s = \int_0^t \operatorname{sgn}(W_s - x)\,dW_s,
\]

because the quadratic variation of the difference converges to 0. (EXERCISE: Check this.) This proves that two of the three quantities in (77) converge as $n \to \infty$, and so the third must also converge:

\[
L(t;x) = \lim_{n\to\infty} \int_0^t \varphi_n(W_s - x)\,ds =: \lim_{n\to\infty} L_n(t;x). \tag{78}
\]

Each $L_n(t;x)$ is nondecreasing in $t$, because $\varphi_n \ge 0$; therefore, $L(t;x)$ is also nondecreasing in $t$. Finally, since $\varphi_n = 0$ except in the interval $[-n^{-1}, n^{-1}]$, each of the processes $L_{n+m}(t;x)$ is constant during time intervals when $W_t \notin [x - n^{-1},\, x + n^{-1}]$. Hence, $L(t;x)$ is constant on time intervals during which $W_s \ne x$.
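The smoothing construction above is easy to check on a grid. The sketch below (our own illustration, assuming `numpy`; the particular bump function and grid resolution are arbitrary choices) builds $\varphi_n$, $\psi_n$, and $F_n$ numerically and confirms that $|\psi_n| \le 1$ and that $F_n$ agrees with $|\cdot|$ outside $[-1/n, 1/n]$.

```python
import numpy as np

def bump(x):
    """An even, C-infinity density shape supported in (-1, 1)."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

n = 10                                 # smoothing index: phi_n lives on [-1/n, 1/n]
grid = np.linspace(-2.0, 2.0, 400_001)
dx = grid[1] - grid[0]

phi = bump(grid)
phi /= phi.sum() * dx                  # normalize so phi is a probability density
phi_n = n * np.interp(n * grid, grid, phi)   # phi_n(x) = n * phi(n x)

# psi_n(x) = -1 + 2 * int_{-inf}^x phi_n(y) dy : a smooth surrogate for sgn
psi_n = -1.0 + 2.0 * np.cumsum(phi_n) * dx

# F_n: antiderivative of psi_n, pinned so that F_n = |x| at the left edge
F_n = np.cumsum(psi_n) * dx
F_n += np.abs(grid[0]) - F_n[0]
```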

Proof of Theorem 10. It suffices to show that for every continuous function $f : \mathbb{R} \to \mathbb{R}$ with compact support,

\[
\int_0^t f(W_s)\,ds = \int_{\mathbb{R}} f(x)\,L(t;x)\,dx \quad\text{a.s.} \tag{79}
\]

Let $\varphi$ be, as in the proof of Theorem 11, a $C^\infty$ probability density on $\mathbb{R}$ with support $[-1,1]$, and let $\varphi_n(x) = n\varphi(nx)$; thus, $\varphi_n$ is a probability density with support $[-1/n,1/n]$. By Corollary ??, $\varphi_n * f \to f$ uniformly as $n \to \infty$. Therefore,

\[
\lim_{n\to\infty} \int_0^t \varphi_n * f(W_s)\,ds = \int_0^t f(W_s)\,ds.
\]

But

\[
\int_0^t \varphi_n * f(W_s)\,ds = \int_0^t \int_{\mathbb{R}} f(x)\,\varphi_n(W_s - x)\,dx\,ds = \int_{\mathbb{R}} f(x) \int_0^t \varphi_n(W_s - x)\,ds\,dx = \int_{\mathbb{R}} f(x)\,L_n(t;x)\,dx,
\]

where $L_n(t;x)$ is defined in equation (78) in the proof of Theorem 11. Since $L_n(t;x) \to L(t;x)$ as $n \to \infty$, equation (79) follows.
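With Theorems 10 and 11 in hand, it is a useful sanity check to compare the two descriptions of local time on a simulated path: the Tanaka representation (73) and the occupation-window formula (72). A rough numerical sketch (our own illustration, assuming `numpy`; the seed, step count, level $x$, and window $\varepsilon$ are arbitrary, and both estimators carry discretization error):

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 1.0, 500_000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate([[0.0], np.cumsum(dW)])

x = 0.2
# Discrete analogue of the stochastic integral int_0^t sgn(W_s - x) dW_s
ito_sum = np.sum(np.sign(W[:-1] - x) * dW)

# Local time at x from the Tanaka formula (73)
L_tanaka = 0.5 * (np.abs(W[-1] - x) - np.abs(x) - ito_sum)

# Independent estimate from the occupation-window formula (72)
eps = 0.02
L_window = dt * np.sum(np.abs(W[:-1] - x) <= eps) / (2.0 * eps)
```

On a fine grid the two estimates agree up to discretization error, and the Tanaka expression is (up to that error) nonnegative, as Theorem 11 asserts.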

5.3 Skorohod’s Lemma

Tanaka's formula (73) relates three interesting processes: the local time process $L(t;x)$, the reflected Brownian motion $|W_t - x|$, and the stochastic integral

\[
X(t;x) := \int_0^t \operatorname{sgn}(W_s - x)\,dW_s. \tag{80}
\]

Proposition 16. For each $x \in \mathbb{R}$ the process $\{X(t;x)\}_{t \ge 0}$ is a standard Wiener process.

Proof. Since $X(t;x)$ is given as the stochastic integral of a bounded integrand, the time-change theorem implies that it is a time-changed Brownian motion. To show that the time change is trivial it is enough to check that the accumulated quadratic variation up to time $t$ is $t$. But the quadratic variation is

\[
[X]_t = \int_0^t \operatorname{sgn}(W_s - x)^2\,ds \le t, \quad\text{and}\quad E[X]_t = \int_0^t P\{W_s \ne x\}\,ds = t.
\]
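The identity $\operatorname{sgn}(W_s - x)^2 = 1$ off the null set $\{W_s = x\}$, which drives this proof, is visible in simulation: the discrete quadratic variation of $X(t;x)$ coincides with that of $W$ itself. A small sketch (our own illustration, assuming `numpy`; seed and parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
t, n = 1.0, 100_000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate([[0.0], np.cumsum(dW)])

x = 0.5
dX = np.sign(W[:-1] - x) * dW      # increments of X(t; x)
qv_X = np.sum(dX ** 2)             # discrete quadratic variation of X
qv_W = np.sum(dW ** 2)             # discrete quadratic variation of W
```

Since each increment of $X$ is $\pm$ the corresponding increment of $W$, the two discrete quadratic variations agree exactly, and both concentrate near $t$ as the mesh shrinks.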

Thus, the process $|x| + X(t;x)$ is a Brownian motion started at $|x|$. Tanaka's formula, after rearrangement of terms, shows that this Brownian motion can be decomposed as the difference of the nonnegative process $|W_t - x|$ and the nondecreasing process $2L(t;x)$:

\[
|x| + X(t;x) = |W_t - x| - 2L(t;x). \tag{81}
\]

This is called Skorohod's equation. Skorohod discovered that there is only one such decomposition of a continuous path, and that the terms of the decomposition have a peculiar form:

Lemma 3. (Skorohod) Let $w(t)$ be a continuous, real-valued function of $t \ge 0$ such that $w(0) \ge 0$. Then there exists a unique pair of real-valued continuous functions $x(t)$ and $y(t)$ such that

(a) $x(t) = w(t) + y(t)$;
(b) $x(t) \ge 0$ and $y(0) = 0$;
(c) $y(t)$ is nondecreasing and is constant on any time interval during which $x(t) > 0$.

The functions x(t) and y(t) are given by

\[
y(t) = \Big(-\min_{s \le t} w(s)\Big) \vee 0 \quad\text{and}\quad x(t) = w(t) + y(t). \tag{82}
\]

Proof. First, let's verify that the functions $x(t)$ and $y(t)$ defined by (82) satisfy properties (a)-(c). It is easy to see that if $w(t)$ is continuous then its minimum to time $t$ is also continuous; hence, both $x(t)$ and $y(t)$ are continuous. Up to the first time $t = t_0$ that $w(t) = 0$, the function $y(s)$ will remain at the value 0, and so $x(s) = w(s) \ge 0$ for all $s \le t_0$. For $t \ge t_0$,

\[
y(t) = -\min_{s \le t} w(s) \quad\text{and}\quad x(t) = w(t) - \min_{s \le t} w(s);
\]

hence, $x(t) \ge 0$ for all $t \ge t_0$. Clearly, $y(t)$ never decreases; and after time $t_0$ it increases only when $w(t)$ is at its minimum, in which case $x(t) = 0$. Thus, (a)-(c) hold for the functions $x(t)$ and $y(t)$.

Now suppose that (a)-(c) hold for some other functions $\tilde{x}(t)$ and $\tilde{y}(t)$. By hypothesis, $y(0) = \tilde{y}(0) = 0$, so $x(0) = \tilde{x}(0) = w(0) \ge 0$. Suppose that at some time $t_* \ge 0$,

\[
\tilde{x}(t_*) - x(t_*) = \varepsilon > 0.
\]

Then up until the next time $s_* > t_*$ that $\tilde{x} = 0$, the function $\tilde{y}$ must remain constant, and so $\tilde{x} - w$ must remain constant. But $x - w = y$ never decreases, so up until time $s_*$ the difference $\tilde{x} - x$ cannot increase; at time $s_*$ (if this is finite) the difference $\tilde{x} - x$ must be $\le 0$, because $\tilde{x}(s_*) = 0$. Since $\tilde{x}(t) - x(t)$ is a continuous function of $t$ that begins at $\tilde{x}(0) - x(0) = 0$, it follows that the difference $\tilde{x}(t) - x(t)$ can never exceed $\varepsilon$. But $\varepsilon > 0$ is arbitrary, so it must be that

\[
\tilde{x}(t) - x(t) \le 0 \quad \forall\, t \ge 0.
\]

The same argument applies when the roles of $\tilde{x}$ and $x$ are reversed. Therefore, $x \equiv \tilde{x}$.

Corollary 5. (Lévy) For $x \ge 0$, the law of the vector-valued process $(|W_t - x|,\, 2L(t;x))$ coincides with that of $(W_t + x - M(t;x)^-,\, -M(t;x)^-)$, where

\[
M(t;x)^- := \min_{s \le t}\,(W_s + x) \wedge 0. \tag{83}
\]
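The explicit formula (82) in Skorohod's lemma translates directly into code. Below is a minimal sketch for discrete paths (our own illustration, assuming `numpy`; the function name `skorohod` and the sample path are our own choices), which verifies properties (a)-(c) pointwise:

```python
import numpy as np

def skorohod(w):
    """Skorohod decomposition (82): given a path w with w[0] >= 0, return
    (x, y) where y(t) = (-min_{s<=t} w(s)) v 0 and x(t) = w(t) + y(t)."""
    y = np.maximum(-np.minimum.accumulate(w), 0.0)
    return w + y, y

# A path that dips below 0: the decomposition "reflects" it upward.
w = np.array([0.0, 0.5, -0.25, -1.0, 0.3, -0.5])
x, y = skorohod(w)
```

Here `y` increases only while `w` is at a new running minimum, which is exactly when `x = 0`, mirroring condition (c).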

5.4 Application: Extensions of Itô’s Formula

Theorem 12. (Extended Itô Formula) Let $u : \mathbb{R} \to \mathbb{R}$ be twice differentiable (but not necessarily $C^2$). Assume that $|u''|$ is integrable on any compact interval. Let $W_t$ be a standard Brownian motion. Then

\[
u(W_t) = u(0) + \int_0^t u'(W_s)\,dW_s + \frac{1}{2}\int_0^t u''(W_s)\,ds. \tag{84}
\]

Proof. Exercise. Hint: First show that it suffices to consider the case where $u$ has compact support. Next, let $\varphi_\delta$ be a $C^\infty$ probability density with support $[-\delta,\delta]$. By Lemma ??, $u * \varphi_\delta$ is $C^\infty$, and so Theorem 2 implies that the Itô formula (84) is valid when $u$ is replaced by $u * \varphi_\delta$. Now use Theorem 10 to show that as $\delta \to 0$,

\[
\int_0^t u * \varphi_\delta''(W_s)\,ds \longrightarrow \int_0^t u''(W_s)\,ds.
\]
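To see Theorem 12 at work beyond the $C^2$ case, one can test it numerically on $u(x) = x^4 \sin(1/x)$ (with $u(0) = 0$): this function is twice differentiable everywhere, but $u''$ is discontinuous at 0. A rough sketch (our own illustration, assuming `numpy`; the seed and step count are arbitrary, and the two sides agree only up to Euler discretization error):

```python
import numpy as np

def _inv(x):
    # Safe 1/x; the value at x == 0 is masked out by the callers below.
    return 1.0 / np.where(x == 0.0, 1.0, x)

def u(x):
    return np.where(x == 0.0, 0.0, x**4 * np.sin(_inv(x)))

def du(x):   # u'(x) = 4x^3 sin(1/x) - x^2 cos(1/x), with u'(0) = 0
    return np.where(x == 0.0, 0.0,
                    4 * x**3 * np.sin(_inv(x)) - x**2 * np.cos(_inv(x)))

def d2u(x):  # u''(x) = 12x^2 sin(1/x) - 6x cos(1/x) - sin(1/x); u''(0) = 0
    return np.where(x == 0.0, 0.0,
                    12 * x**2 * np.sin(_inv(x)) - 6 * x * np.cos(_inv(x))
                    - np.sin(_inv(x)))

rng = np.random.default_rng(4)
t, n = 1.0, 1_000_000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate([[0.0], np.cumsum(dW)])

# Both sides of the extended Itô formula (84), discretized (u(0) = 0):
lhs = u(W[-1:])[0]
rhs = np.sum(du(W[:-1]) * dW) + 0.5 * dt * np.sum(d2u(W[:-1]))
```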

Tanaka's theorem shows that there is at least a reasonable substitute for the Itô formula when $u(x) = |x|$. This function fails to have a second derivative at $x = 0$, but it does have the property that its derivative is nondecreasing. This suggests that the Itô formula might generalize to convex functions $u$, as these also have nondecreasing (left) derivatives.

Definition 2. A function $u : \mathbb{R} \to \mathbb{R}$ is convex if for any real numbers $x < y$ and any $s \in (0,1)$,

\[
u(sx + (1-s)y) \le s\,u(x) + (1-s)\,u(y). \tag{85}
\]

Lemma 4. If $u$ is convex, then it has a right derivative $D^+u(x)$ at every point $x \in \mathbb{R}$, defined by

\[
D^+u(x) := \lim_{\varepsilon \to 0+} \frac{u(x + \varepsilon) - u(x)}{\varepsilon}. \tag{86}
\]

The function $D^+u(x)$ is nondecreasing and right continuous in $x$, and $u$ is differentiable everywhere except at the (at most countably many) points $x$ where $D^+u(x)$ has a jump discontinuity.

Proof. Exercise — or check any reasonable real analysis textbook, e.g. ROYDEN, ch. 5.

Since $D^+u$ is nondecreasing and right continuous, it is the cumulative distribution function of a Radon measure² $\mu$ on $\mathbb{R}$, that is,

\[
D^+u(x) = D^+u(0) + \mu((0,x]) \quad\text{for } x > 0, \tag{87}
\]
\[
D^+u(x) = D^+u(0) - \mu((x,0]) \quad\text{for } x \le 0.
\]

Consequently, by the Lebesgue differentiation theorem and an integration by parts (Fubini),

\[
u(x) = u(0) + D^+u(0)\,x + \int_{(0,x]} (x - y)\,d\mu(y) \quad\text{for } x > 0, \tag{88}
\]
\[
u(x) = u(0) + D^+u(0)\,x + \int_{[x,0]} (y - x)\,d\mu(y) \quad\text{for } x \le 0.
\]

Observe that this exhibits $u$ as a mixture of absolute value functions $a_y(x) := |x - y|$. Thus, given the Tanaka formula, the next result should come as no surprise.
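The representation (88) is easy to check for a concrete convex function. For $u(x) = e^x$ one has $D^+u(x) = e^x$ and $d\mu(y) = e^y\,dy$; a short numerical sketch (our own illustration, assuming `numpy`; the evaluation point and grid are arbitrary choices):

```python
import numpy as np

# Verify (88) for u(x) = e^x at a point x > 0:
#   u(x) = u(0) + D+u(0)*x + int_(0,x] (x - y) e^y dy,
# with the integral computed by a trapezoid rule.
x = 1.3
y = np.linspace(0.0, x, 100_001)
f = (x - y) * np.exp(y)
h = y[1] - y[0]
integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))

lhs = np.exp(x)
rhs = np.exp(0.0) + np.exp(0.0) * x + integral   # u(0) + D+u(0)*x + integral
```

Indeed $\int_0^x (x - y)e^y\,dy = e^x - x - 1$, so the right side collapses to $e^x$ exactly.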

Theorem 13. (Generalized Itô Formula) Let $u : \mathbb{R} \to \mathbb{R}$ be convex, with right derivative $D^+u$ and second derivative measure $\mu$, as above. Let $W_t$ be a standard Brownian motion and let $L(t;x)$ be the associated local time process. Then

\[
u(W_t) = u(0) + \int_0^t D^+u(W_s)\,dW_s + \int_{\mathbb{R}} L(t;x)\,d\mu(x). \tag{89}
\]

Proof. Another exercise. (Smooth; use Itô; take limits; use Tanaka and Trotter.)
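As a consistency check (our own remark, not part of the exercise), taking $u(x) = |x - a|$ in Theorem 13 recovers the Tanaka formula:

```latex
% For u(x) = |x - a| we have D^+u = \operatorname{sgn}(\cdot - a)
% (with D^+u(a) = +1) and second-derivative measure d\mu = 2\,\delta_a,
% so (89) becomes
|W_t - a| \;=\; |a| \;+\; \int_0^t \operatorname{sgn}(W_s - a)\,dW_s \;+\; 2L(t;a),
% which, after rearrangement, is precisely the Tanaka formula (73) with x = a.
```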

²A Radon measure is a Borel measure that attaches finite mass to any compact interval.
