An introduction to stochastic control

F. J. Silva Dip. di Matematica Guido Castelnuovo

March 2012

Some scattered but useful results

This section is based on [24, Chapter 1]. Let Ω be a nonempty set, and let F ⊂ 2Ω. We say that

 F is a π-system if A, B ∈ F ⇒ A ∩ B ∈ F.

 F is a λ-system if (i) Ω ∈ F; (ii) A, B ∈ F and A ⊆ B imply B \ A ∈ F; (iii) Ai ∈ F, Ai ↑ A implies that A ∈ F.

Lemma 1. [σ-field generated by a π-system, or π-λ lemma] If a π-system A is contained in a λ-system F, then σ(A) ⊆ F.

Proof: See any standard book on measure theory (e.g. [2]) 

Example of application: [Uniqueness of the extension of a measure defined on a π-system] Let P and Q be two probability measures on (Ω, F) which coincide on a π-system A. Then they coincide on σ(A). In fact, it is enough to define C := {C ∈ F ; P(C) = Q(C)} and to verify that it is a λ-system.

Corollary 1. [Measurable functions w.r.t. the σ-field of a π-system] Let A be a π-system. Let H be a linear space of functions from Ω to R such that

(i) 1 ∈ H; (ii) I_A ∈ H for all A ∈ A;

(iii) φi ∈ H, 0 ≤ φi ↑ φ, φ is bounded ⇒ φ ∈ H.

Then H contains all bounded σ(A)-measurable functions from Ω to R.

Proof: Let φ be σ(A)-measurable. Writing φ = φ⁺ − φ⁻, clearly we have that

φⁿ ↑ φ⁺, with φⁿ(ω) := Σ_{j≥0} j2^{−n} I_{φ⁺(ω) ∈ [j2^{−n}, (j+1)2^{−n})},   (1)

with an analogous approximation for φ⁻. Therefore, by (iii), it is enough to show that φⁿ ∈ H. But φⁿ is a sum of indicators of elements of σ(A). Therefore it is natural to consider the set F := {A ⊆ Ω ; I_A ∈ H}

and to prove that σ(A) ⊆ F. But this follows from lemma 1, since A is a π-system and A ⊆ F, which is easily shown to be a λ-system.

Theorem 1. [Dynkin theorem] Let (Ω, F) and (Ω′, F′) be two measurable spaces, and let (U, d) be a Polish space. Let ξ : Ω → Ω′ and φ : Ω → U be r.v.'s. Then φ is σ(ξ)-measurable, i.e. φ⁻¹(B(U)) ⊆ ξ⁻¹(F′), iff there exists a measurable η : Ω′ → U such that φ(ω) = η(ξ(ω)) for all ω ∈ Ω.

Proof: Consider the case U = R (the general case can be obtained using an isomorphism theorem, see [19]) and define the set

H := {η(ξ) ; for some F′-measurable map η : Ω′ → R}.

We have to show that the set of σ(ξ)/B(R)-measurable maps is contained in H. This can be done by checking the assumptions of corollary 1 with A = σ(ξ) and proving that H satisfies (i), (ii) and (iii).

Exercise: Do the details of the above proof.

Lemma 2. [Borel-Cantelli] Let (Ω, F, P) be a probability space and consider a sequence of events Ai ∈ F. We have

Σ_i P(Ai) < ∞ ⇒ P(∩_i ∪_{j≥i} Aj) = 0.

Proof: Straightforward. Note that P(∩i ∪j≥i Aj) ≤ P(∪j≥iAj) for all i and use the convergence of the series. 

Lemma 3. [Chebyshev inequality] Consider a nonnegative r.v. X. Then, for all p ∈ (0, ∞) and ε > 0 we have

P(X ≥ ε) ≤ E(X^p)/ε^p.

Proof: It suffices to note that

P(X ≥ ε) = P(X^p ≥ ε^p) = ∫_Ω I_{X^p ≥ ε^p} dP(ω) ≤ ∫_Ω (X^p/ε^p) dP(ω).
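The bound can be sanity-checked numerically. A minimal Monte Carlo sketch (the exponential sample, sample size and tolerance are arbitrary illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=100_000)  # a nonnegative r.v. with E(X) = 1

for p, eps in [(1.0, 2.0), (2.0, 3.0)]:
    lhs = np.mean(X >= eps)             # empirical P(X >= eps)
    rhs = np.mean(X**p) / eps**p        # empirical E(X^p) / eps^p
    assert lhs <= rhs + 1e-3            # Chebyshev bound, up to Monte Carlo noise
```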

Conditional expectation

This section is based on [24, Chapter 1]. Consider a probability space (Ω, F, P). Let X ∈ L¹(Ω) and G be a sub-σ-field of F. Define the signed measure µ : G → R as

µ(A) := ∫_A X(ω) dP(ω) for all A ∈ G.

By the Radon-Nikodym theorem there exists a r.v. f ∈ L¹_G(Ω) (in particular, G-measurable), unique up to a P|_G-null set, such that

∫_A f dP = ∫_A X dP for all A ∈ G.   (2)

The function f is called the conditional expectation of X given G, and we write

E(X|G) := f.

Fundamental properties of E(·|G)

All the properties below are simple consequences of (2) (do them as an exercise!).

(i) E(·|G) is a bounded linear operator.

(ii) For a constant a ∈ R, we have E(a|G) = a.

(iii) [Monotonicity] For X, Y ∈ L¹_F with X ≥ Y we have E(X|G) ≥ E(Y|G).

(iv) [Take out the measurable part] For X ∈ L²_F and Y ∈ L²_G, we have E(Y X|G) = Y E(X|G).

(v) [Characterization of independence] X is independent of G iff for every Borel f such that f(X) ∈ L¹(Ω) we have E(f(X)|G) = E(f(X)).

(vi) [Tower or "projection" property] If G1 ⊆ G2 ⊆ F, then E(E(X|G1)|G2) = E(E(X|G2)|G1) = E(X|G1).

(vii) [Jensen inequality] Let φ be convex such that φ(X) ∈ L¹(Ω). Then

φ(E(X|G)) ≤ E(φ(X)|G).
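When ξ takes finitely many values, E(X|ξ) is just the within-group average, and (2) as well as the tower property can be checked empirically. A small sketch (the specific distributions of ξ and X are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.integers(0, 3, size=60_000)        # ξ takes the values 0, 1, 2
X = xi + rng.standard_normal(60_000)        # X depends on ξ plus independent noise

# E(X|ξ) is σ(ξ)-measurable, i.e. a function η(ξ); empirically, η(k) is the group mean
eta = np.array([X[xi == k].mean() for k in range(3)])
cond_exp = eta[xi]

# (2) with A = {ξ = 1}, and the tower property E(E(X|ξ)) = E(X)
assert abs(cond_exp[xi == 1].mean() - X[xi == 1].mean()) < 1e-9
assert abs(cond_exp.mean() - X.mean()) < 1e-9
```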

 [Conditioning one r.v. X w.r.t. another r.v. ξ]

Let X ∈ L¹_F and ξ : (Ω, F) → (U, B(U)). Note that we can always define

E(X|ξ) := E(X|ξ⁻¹(B(U))) = η(ξ)

for some B(U)/B(R)-measurable function η, by Dynkin theorem. Therefore, it is natural to define

E(X|ξ = x) := η(x).

Another way to define this is by appealing to the Radon-Nikodym theorem. In fact, let us define on B(U) the measure

ν(B) := ∫_{ξ⁻¹(B)} X(ω) dP(ω),

which is absolutely continuous with respect to P_ξ := P ∘ ξ⁻¹ (the image measure of P under ξ). Therefore, there exists a Radon-Nikodym derivative dν/dP_ξ (unique P_ξ-a.s.) such that

∫_B (dν/dP_ξ)(x) dP_ξ(x) = ∫_{ξ⁻¹(B)} X(ω) dP(ω) for all B ∈ B(U).

We define

E(X|ξ = x) := (dν/dP_ξ)(x).

It can be checked that

η(ξ(ω)) = E(X|ξ⁻¹(B(U)))(ω) = (dν/dP_ξ)(ξ(ω))   P|_{ξ⁻¹(B(U))}-a.s.   (3)

Integrating over a set of the form ξ⁻¹(B) and using the definition of P_ξ, we obtain that

η(x) = (dν/dP_ξ)(x)   P_ξ-a.s.

Note that incidentally, this gives another proof of Dynkin theorem. Let us prove (3). We

have

∫_{ξ⁻¹(B)} (dν/dP_ξ)(ξ(ω)) dP(ω) = ∫_B (dν/dP_ξ)(x) dP_ξ(x)
                                 = ν(B)
                                 = ∫_{ξ⁻¹(B)} X(ω) dP(ω)
                                 = ∫_{ξ⁻¹(B)} E(X|ξ⁻¹(B(U)))(ω) dP(ω),

which yields the result.

Now, we define the conditional probability w.r.t. a σ-field G as

P(A|G) := E(IA|G).

Note that for each B ∈ F, P(B|G) is only defined P|_G-a.s. Thus, it is not guaranteed that we can find some A ∈ G with P(A) = 1 such that, for any fixed ω ∈ A, the map B ∈ F → P(B|G)(ω) is a probability measure. However, we can give a sense to this using the concept of regular conditional probability (see [2] for more on this).

 [Characterization of E(·|ξ) in terms of {g(ξ) ; g bounded continuous}] Let us now prove that

E(X|ξ) = 0 iff for all bounded continuous g we have E(g(ξ)X) = 0.

The “only if” part is direct. To prove the “if part”, first note that

E(X|ξ) = 0 iff E(φX) = 0 for all φ that are σ(ξ) measurable.

Now, consider the set

H := {φ : (Ω, F) → (R, B(R)) ; E(φX) = 0}.

We have to show that H contains the σ(ξ)-measurable functions. It is clear that H satisfies the assumptions (i) and (iii) of corollary 1. We only have to construct a π-system A such that IA ∈ H for all A ∈ A and σ(A) = σ(ξ). Let us take

A := {ξ⁻¹([a, b]) ; a < b}.

If A = ξ⁻¹([a, b]) ∈ A we have

E(I_A X) = E(I_{[a,b]}(ξ) X).

Now, take any sequence of bounded continuous functions gⁿ → I_{[a,b]} pointwise. Since E(gⁿ(ξ)X) = 0, by passing to the limit (dominated convergence) we get that E(I_A X) = 0, and so assumption (ii) of corollary 1 is verified.

 Note that the same proof yields that if G = σ(ξ1, ..., ξn) we have

E(X|G) = 0 iff for all bounded continuous g : Rⁿ → R we have E(g(ξ1, . . . , ξn)X) = 0.

 The interesting fact is that the result is also valid when we condition on a countable set of r.v.'s. More precisely, using the above result and the technique of its proof we get the following.

Proposition 1. [Checking conditions only on a finite set of variables] Consider a

sequence of variables ξ1, ξ2, . . . and define G := σ({ξi ; i ∈ N}). Then

n E(X|G) = 0 iff for all n ∈ N and g ∈ Cb(R ) we have

E(g(ξ1, . . . , ξn)X) = 0.

Stochastic processes: Basic definitions

Good references for this part are [7, 14].

 Let I be a non-empty index set and (Ω, F, P) a probability space. A family {X(t) ; t ∈ I} of r.v.'s is called a stochastic process. It can also be interpreted as follows: for each ω you have a function X(·, ω) defined on I.

 We define, for ti ∈ I, the so-called finite-dimensional distributions

F_{t1}(x1) := P(X(t1) ≤ x1)

F_{t1,t2}(x1, x2) := P(X(t1) ≤ x1, X(t2) ≤ x2)

⋮

F_{t1,...,tn}(x1, . . . , xn) := P(X(t1) ≤ x1, . . . , X(tn) ≤ xn).

Clearly, the family Ft1...tn satisfies:

(i) [Symmetry condition] For any permutation σ of (1, . . . , n) we have

F_{tσ(1),...,tσ(n)}(xσ(1), . . . , xσ(n)) = F_{t1,...,tn}(x1, . . . , xn).   (4)

(ii) [Compatibility condition] For all i < j

Ft1,...,ti,ti+1,...,tj (x1, . . . , xi, ∞,..., ∞) = Ft1,...,ti(x1, . . . , xi). (5)

Question: Given a family of functions F_{t1,...,tn}(·, . . . , ·) satisfying the symmetry and compatibility conditions, is there a stochastic process with the desired finite-dimensional distributions?

Theorem 2. [Kolmogorov’s existence theorem] Given a family Ft1,...,ti(·,..., ·) satisfying (4) and (5), there exists a probability space (Ω, F, P) and a stochastic process X whose finite dimensional distributions are the desired ones.

Proof: See e.g. [7]. 

From now on we will assume that I := R+.

 Given two processes X and X′ we say that X′ is a modification of X if for all t ≥ 0 we have P(X(t) = X′(t)) = 1. Note that the negligible sets depend on the time t. We say that X′ is a version of X if there exists A ∈ F with P(A) = 1 such that X(·, ω) = X′(·, ω) for all ω ∈ A.

Example: Let (Ω, F, P) = ([0, 1], F_Leb, L) and define X(t, ω) = 1 if ω = t and 0 otherwise. Then X′ ≡ 0 is a modification of X but not a version. In fact, for all ω ∈ [0, 1] we have that X(·, ω) ≠ X′(·, ω).

Despite the above example, it should be noted that modifications are in fact very useful. For example, a modification of a process preserves its finite-dimensional laws.

 [Continuous process] We say that X is continuous (right-continuous, left-continuous) if for all ω the path X(·, ω) is continuous (resp. right-continuous, left-continuous).

 [Filtration and usual conditions] A family of σ-fields {Ft}t≥0 is called a filtration if Fs ⊆ Ft ⊆ F for all 0 ≤ s ≤ t. We say that {Ft}t≥0 satisfies the usual conditions if F0 contains all the negligible sets, it is right-continuous, i.e. Ft = Ft+ := ∩_{s>t} Fs, and it is left-continuous, i.e. Ft = Ft− := σ(∪_{s<t} Fs).

 [Measurability of processes] A stochastic process X on [0, T] is:

(i) measurable, if X : [0, T] × Ω → R is B([0, T]) × F/B(R)-measurable.

(ii) adapted, if for all t ∈ [0, T] we have that X(t) is Ft/B(R)-measurable.

(iii) progressively measurable (to simplify we will say "progressive") if for all t ∈ [0, T] the restriction X : [0, t] × Ω → R is B([0, t]) × Ft/B(R)-measurable.

Evidently, every progressive process is adapted. What about the converse? In general, it is not true. However:

Proposition 2. [Adapted continuous processes are progressive] If an adapted process X is left- or right-continuous, then it is progressively measurable.

Proof: Suppose that X is right continuous and consider the sequence

Xⁿ(s) := X((k+1)t/n) for s ∈ (kt/n, (k+1)t/n], k = 0, . . . , n − 1, and Xⁿ(0) := X(0),

defined on [0, t] for each fixed t ∈ [0, T].

It is easy to see that the Xⁿ are progressive and, by the right continuity, they converge to X, so X is progressive. For the general case, we have the following difficult result (for the proof see [17]):

Proposition 3. [Progressive modifications] Every adapted process has a progressive modification.

Why is progressive measurability important? Since you have a "dynamic" form of product measurability, it allows (as we will see later) composing processes and random variables in such a way that the result is still at least adapted. Note that every process Y(t) of interest should be at least adapted (which corresponds to being able to decide whether Y(t) satisfies something or not using the events that can be measured up to time t, i.e. Ft).

 Now, we construct two natural filtrations in the space W[0,T ] := C[0,T ]. Set

Wt[0, T] := {ξ(· ∧ t) ; ξ ∈ W[0, T]},   Bt(W[0, T]) := σ(B(Wt[0, T])),

Bt+(W[0, T]) := ∩_{s>t} Bs(W[0, T]).

Remark 1. Note that by definition W[0, T] ∉ B(Wt[0, T]); this is one reason to consider σ(B(Wt[0, T])), the σ-field in W[0, T] generated by B(Wt[0, T]), which by definition contains W[0, T].

Note also that Bt(W[0, T]) ≠ Bt+(W[0, T]). In fact, the event

{ξ ∈ W[0, T] ; ξ has a local maximum at t̄}

belongs to B_{t̄+}(W[0, T]) \ B_{t̄}(W[0, T]).

 We denote by

A(U) the set of B_{t+}(W[0, T])-progressively measurable processes with values in a Polish space U,   (6)

i.e. if f ∈ A(U), the restriction of f to [0, t] × W[0, T] is B([0, t]) × B_{t+}(W[0, T])/B(U)-measurable.

The following theorem is the extension to filtrations of Dynkin theorem 1. For the proof see [24].

Theorem 3. [Dynkin theorem for filtrations generated by a continuous process] Consider (Ω, F, P) and a Polish space U. Let ξ : [0, T] × Ω → R be a continuous process and denote by F_t^ξ := σ(ξ(s) ; 0 ≤ s ≤ t) its associated filtration. Then a process φ : [0, T] × Ω → U is {F_t^ξ}_{t≥0}-adapted iff there exists η ∈ A(U) such that

φ(t, ω) = η(t, ξ(· ∧ t, ω)).

Next we provide a sufficient condition for a.s. Hölder continuity (up to a modification of the process) of a general stochastic process.

Theorem 4. [Kolmogorov-Centsov continuity criterion] Let X(t) be a stochastic process satisfying

E[ |X(t) − X(s)|^r ] ≤ C|t − s|^{1+γr},   0 ≤ s, t ≤ T,

for some r, γ, C > 0.

Then X has a modification which is α-Hölder continuous, for every α ∈ (0, γ).

Proof: See e.g. [7] or [14]. 

Stopping times

Let (Ω, F, {Ft}t≥0, P) be a filtered probability space satisfying the usual conditions.

 A mapping τ :Ω → [0, ∞] is a stopping time (w.r.t. {Ft}t≥0), if for all t ≥ 0 we have {τ ≤ t} ∈ Ft.

 The σ-field Fτ (of the events “before τ”) is defined as

Fτ := {A ∈ F ; A ∩ {τ ≤ t} ∈ Ft, ∀ t ≥ 0}.

Can we replace ≤ by < in the definitions of τ and Fτ? The answer is yes, because the filtration is right-continuous (exercise).

We will only list some interesting properties of stopping times. See [14] for the proofs.

Proposition 4. Let σ, τ and σi be stopping times. Then:

(i) [Construction of stopping times] The following r.v.'s are also stopping times:

σ + τ;   sup_i σi;   inf_i σi;   lim sup_i σi;   lim inf_i σi.

(ii) [Comparison of stopping times] The events {σ > τ}, {σ ≥ τ} and {σ = τ} belong to F_{σ∧τ}. Moreover, if A ∈ Fσ then A ∩ {σ ≤ τ} belongs to Fτ. In particular, if P-a.s. we have σ ≤ τ, then Fσ ⊆ Fτ.

(iii) [The σ-field of the infimum of stopping times] Let σ̂ = inf_i σi. Then F_{σ̂} = ∩_i F_{σi}.

Proposition 5. [Tower property along stopping times] Let τ and σ be two stopping times. Then

E(I_{σ>τ} X|Fτ) = I_{σ>τ} E(X|Fτ) = I_{σ>τ} E(X|F_{τ∧σ}),
E(I_{σ≥τ} X|Fτ) = I_{σ≥τ} E(X|Fτ) = I_{σ≥τ} E(X|F_{τ∧σ}),   (7)

and E(E(X|Fτ)|Fσ) = E(X|F_{τ∧σ}).

Proposition 6. [Stochastic processes and stopping times] Suppose that the filtration satisfies the usual conditions. Let X be progressive and τ a stopping time. Then X(τ) is Fτ-measurable and the process X(τ ∧ t) is progressive.

Important example of stopping times: [First hitting time and first exit time of a continuous process] Let X be adapted and continuous and E an open set. The first hitting time of X to E is defined as σE(ω) := inf{t ≥ 0 ; X(t, ω) ∈ E}, and the first exit time of X from E is defined as τE(ω) := inf{t ≥ 0 ; X(t, ω) ∈ E^c}. Both random times are stopping times (exercise!).
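A discretized sketch of the first exit time from E = (−1, 1) for an Euler approximation of Brownian motion. The sanity check E(τ) = a² = 1 for exit from (−a, a) is a classical fact not proved in the text, and all numerical parameters are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
paths, n, dt = 2000, 4000, 1/1000
# approximate Brownian paths on [0, 4] by cumulative Gaussian increments
W = np.cumsum(rng.standard_normal((paths, n)) * np.sqrt(dt), axis=1)

out = np.abs(W) >= 1.0                      # |X(t)| has left E = (-1, 1)
exited = out.any(axis=1)
first = np.where(exited, out.argmax(axis=1), n)
tau = (first + 1) * dt                      # discretized first exit time τ_E

# classical fact (assumed here, not from the text): E(τ) for exit from (-1, 1) is 1
assert abs(tau.mean() - 1.0) < 0.1
```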

Martingales

 A process X(t) is called an Ft-martingale (resp. submartingale, supermartingale) if it is adapted, X(t) ∈ L¹(Ω) for all t, and

E(X(t)|Fs) = X(s) (resp. ≥, ≤), for all 0 ≤ s ≤ t.

 Note that in particular for a martingale (resp. submartingale, supermartingale), t → E(X(t)) is constant (resp. increasing, decreasing).

 Given a submartingale X and a convex non-decreasing φ such that φ(X(t)) ∈ L¹(Ω) for all t ∈ [0, T], Jensen inequality yields that φ(X(t)) is also a submartingale.

Fundamental properties

See [14] for the proof of the following results:

 [Doob inequality I] Let p ≥ 1 and X(t) a right-continuous martingale with X(t) ∈ L^p(Ω) for all t ∈ [0, T]. Then, for all λ > 0,

P( sup_{t∈[0,T]} |X(t)| > λ ) ≤ E(|X(T)|^p)/λ^p.

Note that the above inequality is an important improvement of Chebyshev's inequality.

 [Doob inequality II] For all p > 1 and X(t) a right-continuous martingale with X(t) ∈ L^p(Ω) for all t ∈ [0, T], we have

E( sup_{t∈[0,T]} |X(t)|^p ) ≤ (p/(p − 1))^p E(|X(T)|^p).
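Both inequalities can be sanity-checked on a symmetric random walk, a discrete-time martingale; the parameters below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
steps, paths, lam, p = 200, 5000, 30.0, 2.0
# symmetric random walk: a (discrete-time) martingale
X = np.cumsum(rng.choice([-1.0, 1.0], size=(paths, steps)), axis=1)
sup_X = np.abs(X).max(axis=1)
m2 = (X[:, -1] ** 2).mean()                 # estimate of E|X(T)|²

# Doob I with p = 2: P(sup |X(t)| > λ) ≤ E|X(T)|² / λ²
assert (sup_X > lam).mean() <= m2 / lam**p + 0.01
# Doob II with p = 2: E sup |X(t)|² ≤ (p/(p−1))² E|X(T)|² = 4 E|X(T)|²
assert (sup_X**2).mean() <= 4 * m2
```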

 [Optional sampling theorem, or martingale property over stopping times] Let σ ≤ τ be two bounded stopping times. Then, for any martingale (resp. submartingale, supermartingale) X(t) we have

E (X(τ) |Fσ) = X(σ) (resp. ≥, ≤) P − a.s.

 As a corollary we obtain that for stopping times σ ≤ τ we have

E (X(t ∧ τ) − X(t ∧ σ)|Fσ) = 0.

Proof: Since X(t ∧ τ) is Ft-measurable, the optional sampling theorem and proposition 5 yield

X(t ∧ σ) = E(X(t ∧ τ)|F_{t∧σ}) = E(E(X(t ∧ τ)|Ft)|Fσ) = E(X(t ∧ τ)|Fσ),

which gives the result.

Multivariate Normal distribution

 A d-dimensional r.v. X is said to have the multivariate Gaussian distribution with mean µ and (nonsingular) covariance Σ if its density is given by

(2π)^{−d/2} (det Σ)^{−1/2} exp( −(1/2)(x − µ)^T Σ^{−1}(x − µ) )   for all x ∈ R^d.

The particular form of the density yields that if Σ is diagonal then the coordinates of X are independent one-dimensional normal variables (you can factorize the multiple integrals in order to calculate the distributions).

Proposition 7. [Characterization of a multivariate Gaussian] X is a multivariate Gaussian iff for every b ∈ R^d we have that b^T X is a univariate Gaussian.

Proof: See any basic probability book (e.g. [1]). 
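A numerical sketch of proposition 7: sampling X = µ + LZ with Σ = LL^T (a Cholesky factorization, a standard construction not detailed in the text) and checking that b^T X has the mean b^T µ and variance b^T Σ b of a univariate Gaussian; the numbers are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])

# X = mu + L Z with Sigma = L Lᵀ, Z standard normal, gives X ~ N(mu, Sigma)
L = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ L.T

b = np.array([0.5, -1.0])
Y = X @ b   # proposition 7: bᵀX should be univariate Gaussian N(bᵀµ, bᵀΣb)
assert abs(Y.mean() - b @ mu) < 0.05
assert abs(Y.var() - b @ Sigma @ b) < 0.05
```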

Brownian motion

Let (Ω, F, {Ft}t≥0, P) be a filtered probability space.

 An adapted R^m-valued process X(t) is called a Brownian motion over [0, ∞) if for all 0 ≤ s < t we have that:

(i) X(t) has continuous trajectories.

(ii) X(t) − X(s) is independent of Fs.

(iii) X(t)−X(s) is normally distributed with mean 0 and covariance matrix (t−s)I.

(iv) X(0) = 0, P-a.s. Therefore,

E(X(t) − X(s)|Fs) = 0, P-a.s.,
E( (X(t) − X(s))(X(t) − X(s))^T |Fs ) = (t − s)I.

In particular, an R^m-valued Brownian motion is the same as m independent one-dimensional Brownian motions. In order to simplify the exposition we will work only with a one-dimensional Brownian motion, but we will come back to the multidimensional case shortly before treating stochastic differential equations.

 Does a Brownian motion exist?

We now provide three sketches of a construction of such a process.

(i) [Construction on the space of continuous functions] We consider the space W := C([0, ∞)), endowed with the distance

ρ̂(w, ŵ) := Σ_{j≥1} 2^{−j} ( |w − ŵ|_{C([0,j])} ∧ 1 ).

It is seen that W, endowed with this metric, is a Polish space. We say that B ⊆ W is a cylinder in W if for some j ∈ N, 0 ≤ t1 < . . . < tj and E ∈ B(R^j),

B = {ξ ∈ W ; (ξ(t1), . . . , ξ(tj)) ∈ E}.   (8)

We call C the set of cylinders.

Lemma 4. We have σ(C) = B(W).

Proof: Since ξ ∈ W → (ξ(t1), . . . , ξ(tj)) ∈ R^j is continuous, we get that C ⊆ B(W). On the other hand, by continuity, note that for any ε > 0 and ξ0 ∈ W,

{ξ ∈ W ; ρ̂(ξ, ξ0) < ε} = {ξ ∈ W ; Σ_{j≥1} 2^{−j} [ sup_{t∈Q∩[0,j]} |ξ(t) − ξ0(t)| ∧ 1 ] < ε},

which belongs to σ(C). Since W is separable, open sets are countable unions of balls and thus B(W) ⊆ σ(C).

Let µ be a probability measure on (R, B(R)) and consider the density function of a N(0, t),

f(x, t) := (2πt)^{−1/2} e^{−|x|²/(2t)}   for t > 0, x ∈ R.

We define P_µ as follows: for any E_i ∈ B(R) we let

P_µ({ξ ∈ W ; ξ(t1) ∈ E1, . . . , ξ(tj) ∈ Ej}) := ∫_R dµ(x0) ∫_{E1} f(x1 − x0, t1) dx1 · · · ∫_{Ej} f(xj − xj−1, tj − tj−1) dxj.   (9)

It is seen that P_µ is additive. The difficult part consists in proving that it is σ-additive. This was shown by Wiener in 1923. Using lemma 4 we can extend P_µ to (W, B(W)). This extension (which of course is unique) is called the Wiener measure with initial distribution µ. Then, on the filtered probability space (W, B(W), {Bt(W)}t≥0, P_µ), we obtain a Brownian motion by defining

X(t, w) := w(t) and letting µ = δ0.

Note that, as we have seen before, the filtration is not right-continuous. 
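The finite-dimensional distributions (9) with µ = δ0 can be sampled directly by composing independent Gaussian increments, one per factor f(x_i − x_{i−1}, t_i − t_{i−1}). A sketch (the times and tolerances are arbitrary; the covariance identity E[ξ(s)ξ(t)] = s ∧ t used as a check is a standard Brownian fact):

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.array([0.5, 1.0, 2.0])          # 0 < t1 < t2 < t3, as in (9)
dt = np.diff(t, prepend=0.0)
# independent increments with variances t_i − t_{i−1}
inc = rng.standard_normal((200_000, 3)) * np.sqrt(dt)
xi = np.cumsum(inc, axis=1)            # samples of (ξ(t1), ξ(t2), ξ(t3)) under P_{δ0}

# sanity checks: E[ξ(s)ξ(t)] = s ∧ t and E[ξ(t2)²] = t2
assert abs((xi[:, 0] * xi[:, 2]).mean() - 0.5) < 0.02
assert abs((xi[:, 1] ** 2).mean() - 1.0) < 0.02
```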

(ii) [Construction on R^{[0,∞)}] We consider the set R^{[0,∞)}. As before we define cylinders in the same manner,

B = {ξ ∈ R^{[0,∞)} ; (ξ(t1), . . . , ξ(tj)) ∈ E},

where E ∈ B(R^j). We write B(R^{[0,∞)}) for the σ-field generated by the cylinders. We define a family of finite-dimensional distributions by replacing in (9) each E_i by (−∞, x_i]. It is easy to show that this family satisfies (4) and (5). Therefore, by applying theorem 2 we obtain a stochastic process X that has (9) as finite-dimensional distributions.

The only delicate point is that a priori there is no a.s. continuity. In fact, it is easy to see that W ∉ B(R^{[0,∞)}) (we only have countable information, which is not enough to establish continuity, since R^{[0,∞)} is too large), and that the only measurable subset of W belonging to B(R^{[0,∞)}) is the empty set.

However, using the Kolmogorov-Centsov criterion (theorem 4) and the finite-dimensional distributions, we obtain the existence of a modification which has α-Hölder paths a.s. for any α ∈ (0, 1/2) (exercise!). As we will see later, almost surely we do not have Hölder continuity for α ≥ 1/2.

(iii) [Approximation by random walks] Given a family {Y(i) ; i = 1, . . . , n} of independent r.v.'s with P(Y(i) = 1) = P(Y(i) = −1) = 1/2, we define the symmetric random walk as

M(0) := 0 and M(k) := Σ_{j=1}^{k} Y(j) for k = 1, . . . , n.

Now, we define the obvious continuous process M(t) by linear interpolation

M(t) := M([t]) + (t − [t])Y ([t] + 1) for all t ≥ 0.

Let us properly scale the process (recalling the Central Limit Theorem) defining

W^n(t) := (1/√n) M(nt) for all t ≥ 0.

Another proof of the existence of Brownian motion consists in proving the convergence of this process to a process that satisfies the desired properties. This is the so-called Donsker invariance principle (see [5]).
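The scaling can be illustrated numerically: for large n, W^n(1) = M(n)/√n is approximately N(0, 1) by the central limit theorem. This sketch only checks the one-dimensional marginal, not the full functional convergence of Donsker's principle; the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n, paths = 2500, 10_000
# n steps of the symmetric random walk, for many independent copies
Y = rng.integers(0, 2, size=(paths, n), dtype=np.int8) * 2 - 1
Wn_1 = Y.sum(axis=1, dtype=np.int64) / np.sqrt(n)   # Wⁿ(1) = M(n)/√n

# by the CLT, Wⁿ(1) is approximately N(0, 1)
assert abs(Wn_1.mean()) < 0.04
assert abs(Wn_1.var() - 1.0) < 0.05
```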

 Set t_k := k/n. Note that the following properties are clear (easy exercise!):

E(W^n(t_k)|F_{t_i}) = W^n(t_i) for all 0 ≤ i ≤ k,
E( (W^n(t_j))² − t_j |F_{t_i} ) = (W^n(t_i))² − t_i for i ≤ j,   (10)
[W^n, W^n](t_k) := Σ_{j=1}^{k} (W^n(t_j) − W^n(t_{j−1}))² = t_k.

In particular we have

(W^n(t_k))² − [W^n, W^n](t_k) is a discrete-time martingale.

 Return to the Brownian motion W (t): We now prove the analogous properties for the limit case, i.e. for W (t).

(i) W (·) is a martingale, i.e. if s ≤ t we have E(W (t)|Fs) = W (s). Proof:

E(W(t)|Fs) = E(W(t) − W(s) + W(s)|Fs) = W(s) + E(W(t) − W(s)|Fs) = W(s).



(ii) Let π^n = (t_i^n)_{i≥0} (t_0^n = 0) be a sequence of partitions of R_+. Let Δt_i^n := t_{i+1}^n − t_i^n and suppose that

|π^n| := sup_{i≥0} Δt_i^n → 0 as n ↑ ∞.

We define the discrete quadratic variation

QV^{π^n}(t) := Σ_{i≥1} |W(t_i^n ∧ t) − W(t_{i−1}^n ∧ t)|².

Proposition 8. We have that QV^{π^n}(t) → t in L²(Ω), for all t ≥ 0.

Proof: We have

E[ (QV^{π^n}(t) − t)² ] = E[ ( Σ_{i≥1} ( |W(t_i^n ∧ t) − W(t_{i−1}^n ∧ t)|² − (t_i^n ∧ t − t_{i−1}^n ∧ t) ) )² ].

Writing the square, using the independence of the increments and the fact that

W(t_i^n ∧ t) − W(t_{i−1}^n ∧ t) ∼ N(0, t_i^n ∧ t − t_{i−1}^n ∧ t),

we can eliminate the cross-terms and we obtain that

E[ (QV^{π^n}(t) − t)² ] = Σ_{i≥1} ( E[ |W(t_i^n ∧ t) − W(t_{i−1}^n ∧ t)|⁴ ] − (t_i^n ∧ t − t_{i−1}^n ∧ t)² ).

If a r.v. Y ∼ N(0, t) then E(Y⁴) = 3t² (exercise! which is also useful to prove the a.s. α-Hölder continuity of W(·) when α ∈ (0, 1/2)). Thus

E[ (QV^{π^n}(t) − t)² ] = 2 Σ_{i≥1} (t_i^n ∧ t − t_{i−1}^n ∧ t)² ≤ 2t|π^n| → 0.

 In fact, if the mesh of the partitions converges fast enough to 0 (for example for the dyadic partition t_i^n := i2^{−n}), it is possible to prove a.s. convergence of the whole trajectory QV^{π^n}(·) (see [7]).
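The L² convergence QV^{π^n}(t) → t is easy to observe numerically on uniform partitions; a sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(7)
t = 1.0
for n in (100, 10_000):
    dW = rng.standard_normal(n) * np.sqrt(t / n)   # Brownian increments on a uniform partition
    qv = float(np.sum(dW**2))                      # QV^{πⁿ}(t)
    # E[(QV^{πⁿ}(t) − t)²] = 2t|πⁿ| = 2t²/n, so the error is of order 1/√n
    assert abs(qv - t) < 5 * np.sqrt(2 * t**2 / n)
```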

Before passing to the third identity, let us first discuss some important consequences of the above result: First, from the inequality

QV^{π^n}(t) ≤ max_{j≥1} |W(t_j^n ∧ t) − W(t_{j−1}^n ∧ t)| Σ_{i≥1} |W(t_i^n ∧ t) − W(t_{i−1}^n ∧ t)|,

we see, by the continuity of W(·), that V^{π^n}(t) := Σ_{i≥1} |W(t_i^n ∧ t) − W(t_{i−1}^n ∧ t)| satisfies V^{π^n}(t) → ∞ a.s. This implies that a.s. the total variation of W(·) is +∞. Therefore, it is not possible to define (see [21])

∫_0^T f(t) dW(t, ω) for all f ∈ C[0, T].

Another trivial consequence is that W cannot have a bounded derivative on any interval. In fact, we will see later that a.s. W (·) is not differentiable at any point!

(iii) t → W(t)² − t is a martingale.

Proof: In fact, since W(t) is a martingale, for s < t we have

t − s = E[ (W(t) − W(s))² |Fs ] = E(W(t)²|Fs) − W(s)².



 [The augmented canonical filtration and the Blumenthal zero-one law] Clearly, by the continuity of the trajectories, the natural filtration associated with W(·), given by Ft = σ(W(s) ; s ≤ t), is left-continuous. However, it is easy to see that it is not right-continuous (think again of the event of having a local maximum). Therefore, it does not satisfy the usual conditions. Now we are going to complete the filtration and then we will show that the new filtration is right-continuous.

First we need a useful result.

Theorem 5. [Blumenthal zero-one law] For all A ∈ F0+ we have P(A) = 0 or P(A) = 1.

Proof: We will show that F0+ is independent of itself, which clearly implies the result. Let 0 < t1 < . . . < tk and g : R^k → R be bounded continuous. For A ∈ F0+, by continuity we get

E[ I_A g(W(t1), . . . , W(tk)) ] = lim_{ε→0} E[ I_A g(W(t1) − W(ε), . . . , W(tk) − W(ε)) ].

For ε small enough, we obtain the independence between the above increments and Fε,

and so the increments are also independent of F0+. Thus

E[ I_A g(W(t1), . . . , W(tk)) ] = lim_{ε→0} P(A) E[ g(W(t1) − W(ε), . . . , W(tk) − W(ε)) ] = P(A) E[ g(W(t1), . . . , W(tk)) ].

Using the usual argument based on the monotone class theorem, we obtain that F0+ is independent of σ(W(s) ; 0 < s ≤ t) = σ(W(s) ; 0 ≤ s ≤ t) (equality by continuity of W(·)), and so we get the independence of F0+ from itself. We define

F̄t := σ(Ft ∪ N(F)), where N(F) := {A ⊆ Ω ; ∃ B ∈ F, A ⊆ B with P(B) = 0}.

The filtration {F̄t}t≥0 is called the canonical augmented filtration.

Theorem 6. The augmented filtration {F̄t}t≥0 is continuous and W is an {F̄t}-Brownian motion.

Proof: The left continuity is clear. Clearly F̄0 ⊆ F̄0+, and by the above result every A ∈ F̄0+ has probability 0 or 1, hence coincides up to a null set with ∅ or Ω; thus F̄0+ ⊆ F̄0, which shows the right-continuity at zero. At the other times the same argument applies by using the independence of the increments.

 [Almost surely nowhere differentiable] As we have seen before, the Brownian motion has α-Hölder continuous paths for every α ∈ [0, 1/2). What happens for α ∈ [1/2, 1)?

Define G(α, c, ε) as the set of all ω ∈ Ω such that for some s ∈ [0, 1] we have

|W(s, ω) − W(t, ω)| ≤ c|s − t|^α for all t with |s − t| ≤ ε.   (11)

The proof of the following theorem is taken from the nice book [22].

Theorem 7. If α > 1/2, then G(α, c, ε) has probability 0 for all 0 < c < ∞ and ε > 0.

Proof: First divide [0, 1] into intervals of length 1/n. At each k/n consider m blocks to the right (where m will be fixed and chosen later!). For m/n ≤ ε, set

X_{n,k}(ω) := max{ |W((j + 1)/n, ω) − W(j/n, ω)| ; k ≤ j < k + m }.

If ω ∈ G(α, c, ε), then at least one block B_{n,k̄} := [k̄/n, (k̄ + m)/n] is such that, for the neighborhood where (11) holds, we have B_{n,k̄} ⊆ [s − ε, s + ε]. By the triangle inequality and (11) we get that X_{n,k̄}(ω) ≤ 2c(m/n)^α. Therefore, we have

G(α, c, ε) ⊆ { ω ; min_{0≤k≤n−m} X_{n,k} ≤ 2c(m/n)^α }.

On the other hand, by the properties of the increments of the Brownian motion W(t) (independence and stationarity), we get:

P( X_{n,k} ≤ 2c(m/n)^α ) = P( |W(1/n)| ≤ 2c(m/n)^α )^m = P( n^{−1/2}|W(1)| ≤ 2c(m/n)^α )^m.

Using the trivial bound P(|W(1)| ≤ x) ≤ 2x/√(2π), we easily obtain

P( X_{n,k} ≤ 2c(m/n)^α ) ≤ ( 4c m^α n^{1/2−α}/√(2π) )^m.

Thus,

P(G(α, c, ε)) ≤ n P( X_{n,k} ≤ 2c(m/n)^α ) ≤ ( 4c m^α/√(2π) )^m n^{1+m(1/2−α)}.

We conclude by choosing m such that m(α − 1/2) ≥ 1 and then letting n ↑ ∞.
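The blow-up for α > 1/2 can also be observed numerically: on a grid of mesh 1/n, the largest Brownian increment is of order √(log n / n), so dividing it by (1/n)^α with α > 1/2 diverges as the grid is refined. A single-path illustration (not a proof; parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)

def max_holder_quotient(n, alpha):
    # max_i |W((i+1)/n) - W(i/n)| / (1/n)^alpha along one sampled path on [0, 1]
    dW = rng.standard_normal(n) * np.sqrt(1.0 / n)
    return np.abs(dW).max() / (1.0 / n) ** alpha

# for α > 1/2 the quotient grows like n^{α − 1/2} as the grid is refined,
# consistent with theorem 7 (no α-Hölder continuity for α > 1/2)
q_coarse = max_holder_quotient(1_000, 0.75)
q_fine = max_holder_quotient(1_000_000, 0.75)
assert q_fine > 2 * q_coarse
```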

As a corollary, we get

Corollary 2. With probability one, a path of the Brownian motion is not differentiable at any t ∈ [0, 1].

Proof: Set D for the event that W is differentiable at some t. Then

D ⊆ ∪_{n≥1} ∪_{j≥1} G(1, n, 1/j),

which has zero probability.

[Anecdotal remark] Weierstrass in 1872 constructed the following function, which is continuous and nowhere differentiable:

f(t) := Σ_{n≥1} cos(3^n t)/2^n.

Thus we have answered the question of the α-Hölder property for α ∈ (1/2, 1). What happens in the limit case α = 1/2? The following important theorem gives, as a corollary, a negative answer for α = 1/2.

Theorem 8. [Law of the iterated logarithm] For the Brownian motion W(t) we have, with probability 1, that

lim sup_{t↓0} W(t)/√(2t log log(1/t)) = 1 and lim inf_{t↓0} W(t)/√(2t log log(1/t)) = −1.

Proof: See [7].

 [The strong Markov property] We recall that by definition of W(t) the process W(t + ·) − W(t) is independent of Ft. Our aim now is to generalize this property to the case of stopping times.

Theorem 9. Let τ be a stopping time. Consider the process

W^{(τ)}(t) := W(τ + t) − W(τ).

Then W^{(τ)}(t) is a Brownian motion independent of Fτ.

Proof: Note that, by the monotone class argument, it is enough to prove that

E[ I_A g(W^{(τ)}(t1), . . . , W^{(τ)}(tp)) ] = P(A) E[ g(W(t1), . . . , W(tp)) ]

for all 0 ≤ t1 < . . . < tp, all A ∈ Fτ and any continuous bounded g. Let [τ]_n be the smallest real number of the form k2^{−n} which is greater than τ. Clearly

E[ I_A g(W^{(τ)}(t1), . . . , W^{(τ)}(tp)) ] = lim_{n→∞} E[ I_A g(W^{([τ]_n)}(t1), . . . , W^{([τ]_n)}(tp)) ].

But

E[ I_A g(W^{([τ]_n)}(t1), . . . , W^{([τ]_n)}(tp)) ] = Σ_{k≥0} E[ I_{A_k} g( W(k2^{−n} + t1) − W(k2^{−n}), . . . , W(k2^{−n} + tp) − W(k2^{−n}) ) ],

where A_k := A ∩ {(k − 1)2^{−n} < τ ≤ k2^{−n}}. Since τ is a stopping time, A_k is F_{k2^{−n}}-measurable. Thus, by the independence of the increments of W, we get

E[ I_A g(W^{([τ]_n)}(t1), . . . , W^{([τ]_n)}(tp)) ] = E[ g(W(t1), . . . , W(tp)) ] Σ_{k≥0} P(A_k) = P(A) E[ g(W(t1), . . . , W(tp)) ],

from which we get the result.
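A numerical sketch of theorem 9: taking τ as the (discretized) first hitting time of level 1, the increment W(τ + 1) − W(τ) should again be N(0, 1). All parameters and tolerances are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(9)
paths, n, dt = 2000, 4000, 1/1000
W = np.cumsum(rng.standard_normal((paths, n)) * np.sqrt(dt), axis=1)

hit = W >= 1.0                        # crossing of level 1
reached = hit.any(axis=1)
k = hit.argmax(axis=1)                # index of the discretized hitting time τ
sel = reached & (k + 1000 < n)        # keep paths where τ + 1 fits inside the horizon
incr = W[sel, k[sel] + 1000] - W[sel, k[sel]]   # W(τ + 1) − W(τ)

# by the strong Markov property this increment is again N(0, 1)
assert abs(incr.mean()) < 0.1
assert abs(incr.var() - 1.0) < 0.15
```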

Stochastic Integral

Let (Ω, F, {Ft}t≥0, P) be a filtered probability space satisfying the usual conditions.

We need the definition of some spaces:

 L²_F([0, T]; R) is defined as the set of adapted measurable processes f such that

E[ ∫_0^T |f(t, ω)|² dt ] < ∞.

We have that L²_F([0, T]; R) is a Hilbert space. For simplicity, when the context is clear, we write only L²_F.

 M²([0, T]; R) is the set of square-integrable (i.e. in L²_F([0, T]; R)) Ft-martingales with right-continuous paths starting from 0. We set

⟨M1, M2⟩_{M²} := E[ M1(T) M2(T) ].   (12)

M1 and M2 are identified if, up to a P-null set, we have M1(·) = M2(·).

 M²_c([0, T]; R) is the subspace of continuous elements of M²([0, T]; R).

Theorem 10. M²([0, T]; R) endowed with (12) is a Hilbert space, and M²_c([0, T]; R) is closed in M²([0, T]; R).

Using this theorem, in particular the closedness of M²_c([0, T]; R), we will define the stochastic integral w.r.t. W(t),

I_f(·) ∈ M²_c for f ∈ L²_F.

Proof of theorem 10: The fact that ‖ · ‖_{M²} is a norm is easy. We now prove that the space is complete. Let Xn ∈ M² be a Cauchy sequence. Choosing a subsequence n_k such that ‖X_{n_{k+1}} − X_{n_k}‖_{M²} ≤ 2^{−3k}, Doob inequality (I) yields

P( sup_{t∈[0,T]} |X_{n_{k+1}}(t) − X_{n_k}(t)| > 2^{−k} ) ≤ 2^{2k} E( |X_{n_{k+1}}(T) − X_{n_k}(T)|² ) ≤ 2^{−k}.

By the Borel-Cantelli lemma (if Σ_i P(A_i) < +∞ then P(∩_i ∪_{j≥i} A_j) = 0) we get the existence of Ω0 ∈ F with P(Ω0) = 1 and of a right-continuous X, such that for all ω ∈ Ω0 we have sup_{t∈[0,T]} |X_{n_k}(t) − X(t)| → 0. By dominated convergence and Doob inequality (II), we get that X_{n_k}(t) → X(t) in L²(Ω) for all t ∈ [0, T]. By passing to the limit in the martingale equality we get that X ∈ M². Finally, if the X_{n_k} are continuous we have that X is continuous, because of the a.s. uniform convergence of the subsequence.

 L⁰_F([0, T]; R) is the subspace of L²_F of bounded simple processes, i.e. f ∈ L⁰ if there exist t0 = 0 < t1 < . . . < ti < . . . < T and F_{ti}-measurable functions f_i : Ω → R with sup_{i,ω} |f_i(ω)| < ∞ such that

f(t, ω) = Σ_{i≥0} f_i(ω) I_{(ti, ti+1]}(t).

 Define the stochastic integral I_f(ω, ·) : [0, T] → R as

I_f(ω, ·) := Σ_{i≥0} f_i(ω) [ W(ω, · ∧ ti+1) − W(ω, · ∧ ti) ].

Fundamental properties of the process (ω, t) → If (ω, t).

(i) For all ω ∈ Ω, the continuity of W(ω, ·) yields the continuity of I_f(ω, ·).

(ii) [Martingale property] We have that I_f is a martingale. In particular E(I_f(t)) = 0.

Proof: If s < t with s ∈ (tj, tj+1], we have

E(I_f(t)|Fs) = I_f(s) + f_j E(W(t ∧ tj+1) − W(s)|Fs) + Σ_{i≥j+1} E( f_i [W(t ∧ ti+1) − W(t ∧ ti)] |Fs ),

and conditioning on F_{ti} in every term gives the result.

(iii) [Itô isometry] ‖I_f(·)‖²_{M²} = E(I_f(T)²) = E[ ∫_0^T |f(t)|² dt ] = ‖f‖²_{L²_F}.

48 Proof:

E(I_f(T)²) = Σ_{i≥0} E[ |f_i|² (W(ti+1) − W(ti))² ] + 2 Σ_{i<j} E[ f_i f_j (W(ti+1) − W(ti))(W(tj+1) − W(tj)) ].

Conditioning on F_{ti} each term of the first sum and on F_{tj} each term of the second sum, we get the result.

Properties (i)-(iii) imply that I_f ∈ M²_c and that the linear map f ∈ L²_F → I_f ∈ M²_c is an isometry. Therefore we can extend f → I_f to f ∈ clos(L⁰) (the closure of L⁰ in L²_F). But... what is clos(L⁰)? Note that if f ∈ L²_F is bounded and a.s. continuous, we have that

f_n(·) := Σ_{i=0}^{2^n−1} f(iT/2^n) I_{(iT/2^n, (i+1)T/2^n]}(·) → f(·) in L²_F.

We have the following general result for all f ∈ L²_F.

0 2 Lemma 5. We have clos(L ) = LF .

Therefore, for every f \in L^2_{\mathcal{F}} we have defined its stochastic integral I_f(\cdot) \in \mathcal{M}^2_c, which is usually denoted by
\[
I_f(\cdot) =: \int_0^{\cdot} f(s)\,dW(s).
\]

Proof of lemma 5:

(i) As we have seen if f is bounded and a.s. continuous we are done.

(ii) If f is bounded and progressively measurable, we can first approximate it by the following continuous adapted (and thus progressively measurable) processes
\[
F^k(t) := k \int_{0\vee(t-\frac{1}{k})}^{t} f(s)\,ds, \tag{13}
\]
and we obtain the result by a diagonal argument.

(iii) If f is bounded and adapted, it has a progressively measurable modification, and the process analogous to (13) is progressively measurable. Using the argument based on the Fubini theorem (see [14]) we can prove that F^k(t) is adapted, and then we can proceed as in (ii).

(iv) If f is only measurable and adapted we truncate. □

Some remarks on the construction of the integral:

(i) One could think of approximating a progressively measurable process by sums as in the usual Lebesgue theory (see for example the sum in (1)). However, it is clear that this does not work, since an event like
\[
\{(t,\omega) \;;\; f(t,\omega) \in [j2^{-n}, (j+1)2^{-n})\}
\]
belongs only to \mathcal{B}([0,T]) \times \mathcal{F}_T, and thus we cannot construct elements of L^0 of this kind, which implies the loss of the isometry property, the martingale property, etc.

(ii) Note that in the construction of the integral we can avoid the use of the normal distribution of the increments: we used it only in the Itô isometry for simple processes, and the same result can be obtained using that W(t)^2 - t = W(t)^2 - [W,W](t) is a martingale (exercise!). This remark allows one to define the stochastic integral for arbitrary continuous square-integrable martingales M(t). In fact, there is a deep result called the Doob-Meyer decomposition which says that for such a martingale M there exists a process [M,M](t) (the quadratic variation of M) such that
\[
M(t)^2 - [M,M](t) \ \text{is a martingale.}
\]

However, the approximation is then done in the space

\[
L^2_M := \big\{ x : \Omega\times[0,T] \to \mathbb{R} \;;\; x \text{ is adapted and } \|x\|_{M,2} < \infty \big\}, \quad\text{where}\quad \|x\|_{M,2}^2 := \mathbb{E}\Big[ \int_0^T |x(t)|^2\,d[M,M](t) \Big].
\]

It can be shown that if the measure induced by [M,M](\cdot) is absolutely continuous w.r.t. the Lebesgue measure (which is the case for the Brownian motion), then we can approximate any element of L^2_M by simple functions. In the general case, however, we can only approximate the progressively measurable elements of L^2_M. For more information on this see the books [12, 14].

Fundamental properties of the stochastic integral. Consider f, g \in L^2_{\mathcal{F}} and s < t.

(i) [Conditional Itô isometry and a new martingale]
\[
\mathbb{E}\Big[\Big(\int_s^t f(r)\,dW(r)\Big)^2 \,\Big|\, \mathcal{F}_s\Big] = \mathbb{E}\Big[\int_s^t |f(r)|^2\,dr \,\Big|\, \mathcal{F}_s\Big],
\qquad
\Big(\int_0^{\cdot} f(r)\,dW(r)\Big)^2 - \int_0^{\cdot} |f(r)|^2\,dr \ \text{is a martingale.}
\]

Proof: The first identity follows as in the derivation of the unconditioned Itô isometry: the same proof yields the identity for simple processes, and then we pass to the limit in the conditional expectation. The second identity follows from a direct computation using the first one. In fact, using that \int_0^{\cdot} f(r)\,dW(r) is a martingale we get that
\[
\mathbb{E}\Big[\Big(\int_s^t f(r)\,dW(r)\Big)^2 \,\Big|\, \mathcal{F}_s\Big] = \mathbb{E}\Big[\Big(\int_0^t f(r)\,dW(r)\Big)^2 \,\Big|\, \mathcal{F}_s\Big] - \Big(\int_0^s f(r)\,dW(r)\Big)^2,
\]
and the result follows using the first identity. □

(ii) [Optional sampling] For stopping times \sigma \le \tau,
\[
\mathbb{E}\Big[\int_{s\wedge\sigma}^{t\wedge\tau} f(r)\,dW(r) \,\Big|\, \mathcal{F}_\sigma\Big] = 0,
\qquad
\mathbb{E}\Big[\Big(\int_{s\wedge\sigma}^{t\wedge\tau} f(r)\,dW(r)\Big)^2 \,\Big|\, \mathcal{F}_\sigma\Big] = \mathbb{E}\Big[\int_{s\wedge\sigma}^{t\wedge\tau} |f(r)|^2\,dr \,\Big|\, \mathcal{F}_\sigma\Big],
\]
\[
\mathbb{E}\Big[\Big(\int_{s\wedge\sigma}^{t\wedge\tau} f(r)\,dW(r)\Big)\Big(\int_{s\wedge\sigma}^{t\wedge\tau} g(r)\,dW(r)\Big) \,\Big|\, \mathcal{F}_\sigma\Big] = \mathbb{E}\Big[\int_{s\wedge\sigma}^{t\wedge\tau} f(r)g(r)\,dr \,\Big|\, \mathcal{F}_\sigma\Big].
\]

Proof: The first two identities are a direct consequence of the optional sampling theorem and (i). For the third identity, note that by the second identity we have
\[
\mathbb{E}\Big[\Big(\int_{s\wedge\sigma}^{t\wedge\tau} [f(r)+g(r)]\,dW(r)\Big)^2 \,\Big|\, \mathcal{F}_\sigma\Big] = \mathbb{E}\Big[\int_{s\wedge\sigma}^{t\wedge\tau} [f(r)+g(r)]^2\,dr \,\Big|\, \mathcal{F}_\sigma\Big];
\]
developing the squares and using the second identity again, we get the result. □

(iii) [Consistency] For any stopping time \sigma and f \in L^2_{\mathcal{F}}, let \hat f(t,\omega) := f(t,\omega)\,I_{\sigma(\omega)\ge t}. Then
\[
\int_0^{t\wedge\sigma} f(s)\,dW(s) = \int_0^t \hat f(s)\,dW(s).
\]

Proof: The proof of this fact follows easily from our construction (Exercise!). □

Now, we sketch the construction of the integral for a more general kind of process. Define

\[
L^{2,\mathrm{loc}}_{\mathcal{F}} := \Big\{ X : [0,T]\times\Omega \to \mathbb{R} \;;\; X \text{ is } \mathcal{F}_t\text{-adapted and } \int_0^T |X(t)|^2\,dt < \infty,\ \mathbb{P}\text{-a.s.} \Big\},
\]
\[
\mathcal{M}^{2,\mathrm{loc}} := \Big\{ X : [0,T]\times\Omega \to \mathbb{R} \;;\; \exists \text{ nondecreasing stopping times } \sigma_j \text{ with } \mathbb{P}\big(\lim_{j\to\infty}\sigma_j \ge T\big) = 1 \text{ and } X(\cdot\wedge\sigma_j) \in \mathcal{M}^2 \ \forall\, j\ge 1 \Big\},
\]
\[
\mathcal{M}^{2,\mathrm{loc}}_c := \Big\{ X \in \mathcal{M}^{2,\mathrm{loc}} \;;\; X \text{ is continuous} \Big\}.
\]
If X \in \mathcal{M}^{2,\mathrm{loc}} we say that X is a local martingale, and if X \in \mathcal{M}^{2,\mathrm{loc}}_c we say that X is a continuous local martingale.

Now, given f \in L^{2,\mathrm{loc}}_{\mathcal{F}}, we are going to construct its stochastic integral by a localization argument. In fact, for every j \ge 1 define the stopping time
\[
\sigma_j(\omega) := \inf\Big\{ t \in [0,T] \;;\; \int_0^t |f(s,\omega)|^2\,ds \ge j \Big\}.
\]

Define f_j(t) := f(t)\,I_{t\le\sigma_j}. By definition of the stopping times we have that f_j \in L^2_{\mathcal{F}}. We define the stochastic integral of f as
\[
\int_0^t f(s)\,dW(s) := \int_0^t f_j(s)\,dW(s) \quad \text{if } t \in [0,\sigma_j].
\]
We have to check that it is well defined: if t \le \sigma_i(\omega) \le \sigma_j(\omega), we have to verify that
\[
\int_0^t f_j(s)\,dW(s) = \int_0^t f_i(s)\,dW(s).
\]
To see this, note that by the consistency property we have
\[
\int_0^{t\wedge\sigma_i} f_j(s)\,dW(s) = \int_0^t f_j(s)\,I_{s\le\sigma_i}\,dW(s) = \int_0^t f(s)\,I_{s\le\sigma_i}\,dW(s) = \int_0^t f_i(s)\,dW(s).
\]

It can be proved that \int_0^{\cdot} f(s)\,dW(s) is a local martingale, but in general it is not a martingale.

Let us state an important inequality for the stochastic integral.

Theorem 11. [Burkholder-Davis-Gundy (BDG) inequality] Let W be an m-dimensional Brownian motion and \sigma \in L^2_{\mathcal{F}}([0,T];\mathbb{R}^{n\times m}). Then, for any r > 0, there exists a constant K_r > 0 such that for any stopping time \tau,
\[
\frac{1}{K_r}\,\mathbb{E}\Big[\Big(\int_0^\tau |\sigma(s)|^2\,ds\Big)^r\Big] \le \mathbb{E}\Big[\sup_{0\le t\le\tau}\Big|\int_0^t \sigma(s)\,dW(s)\Big|^{2r}\Big] \le K_r\,\mathbb{E}\Big[\Big(\int_0^\tau |\sigma(s)|^2\,ds\Big)^r\Big].
\]

Note that for r = 1 the first inequality is trivial and the second inequality is Doob's inequality (II) with p = 2.

Itô's formula

Note that if "standard" differential calculus were valid for the stochastic integral, one would expect that
\[
W^2(t) = 2\int_0^t W(s)\,dW(s).
\]

Let us check that this formula is wrong! In fact, evaluating at t \ne 0 and taking the expectation we obtain t = 0, a contradiction. Let us find the correct formula from the definition of the stochastic integral. Consider a sequence of partitions 0 = t_0^n < t_1^n < \dots < t_n^n = t such that \max_{i\ge 1} |t_i^n - t_{i-1}^n| \to 0 as n \uparrow \infty. Since W(\cdot) has continuous trajectories, we can approximate 2\int_0^t W(s)\,dW(s) as the limit of
\[
\sum_{i=1}^n 2W(t_{i-1})\big[W(t_i) - W(t_{i-1})\big] = \sum_{i=1}^n \big[W(t_i)^2 - W(t_{i-1})^2\big] - \sum_{i=1}^n \big[W(t_i) - W(t_{i-1})\big]^2 = W(t)^2 - \sum_{i=1}^n \big[W(t_i) - W(t_{i-1})\big]^2.
\]

By taking the L^2 limit we get
\[
2\int_0^t W(s)\,dW(s) = W(t)^2 - t. \qquad □
\]
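The computation above can be reproduced numerically: the left-point Riemann sums satisfy the algebraic identity exactly path by path, and the realized quadratic variation concentrates around t. A minimal sketch (helper name assumed):

```python
import numpy as np

def left_riemann_ito_sum(W):
    """Given a Brownian path W sampled on a uniform grid, return
    (S, QV): S = sum_i 2 W(t_{i-1}) [W(t_i) - W(t_{i-1})] and the realized
    quadratic variation QV = sum_i [W(t_i) - W(t_{i-1})]^2.
    The algebraic identity in the text gives S = W(t)^2 - W(0)^2 - QV exactly."""
    dW = np.diff(W)
    S = np.sum(2.0 * W[:-1] * dW)
    QV = np.sum(dW ** 2)
    return S, QV
```

As the mesh shrinks, QV tends to t in L^2, so S approaches W(t)^2 - t, in agreement with the formula above.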

Now we provide the following change of variables formula, better known as Itô's formula. We come back to the setting where the Brownian motion is m-dimensional.

It is important to fix now the following notation, which will be used throughout the notes. For a smooth function f : \mathbb{R}_+ \times \mathbb{R}^{m'} \to \mathbb{R} (m' \in \mathbb{N}), we set
\[
\partial_t f(t,x) := D_t f(t,x), \qquad Df(t,x) := D_x f(t,x), \qquad D^2 f(t,x) := D^2_{xx} f(t,x).
\]

Theorem 12. [Itô's formula for the Brownian motion] Let f : \mathbb{R}_+ \times \mathbb{R}^m \to \mathbb{R} be C^{1,2} and W(t) an m-dimensional Brownian motion. Then, with probability 1, for every t > 0 we have:
\[
f(t, W(t)) = f(0,0) + \int_0^t Df(s, W(s))^\top dW(s) + \int_0^t \Big[ \partial_t f(s, W(s)) + \tfrac12 \mathrm{Tr}\big(D^2 f(s, W(s))\big) \Big]\,ds.
\]

For the proof of this important result see [12]. It is important to note that, since the result holds with probability one, we can replace t by any stopping time \tau.

Now we extend the result to the so-called Itô processes. For adapted measurable processes b(t) and \sigma(t) with values in \mathbb{R}^n and \mathbb{R}^{n\times m} respectively, satisfying
\[
\int_0^T |b(s)|\,ds + \int_0^T |\sigma(s)|^2\,ds < \infty,
\]
we can define the process
\[
X(t) := X_0 + \int_0^t b(s)\,ds + \int_0^t \sigma(s)\,dW(s).
\]
A process of this type is called an Itô process. For these processes we have the following Itô formula:

Theorem 13. Let f : \mathbb{R}_+ \times \mathbb{R}^n \to \mathbb{R} be C^{1,2}. Then, with probability one, we have
\[
f(t, X(t)) = f(0, X_0) + \int_0^t Df(s, X(s))^\top dX(s) + \int_0^t \Big[ \partial_t f(s, X(s)) + \tfrac12 \mathrm{Tr}\big(\sigma(s)\sigma(s)^\top D^2 f(s, X(s))\big) \Big]\,ds.
\]

Stochastic Differential Equations (SDEs)

Let (\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathbb{P}) be given, let W(t) be an m-dimensional Brownian motion and \xi_0 an \mathcal{F}_0-measurable r.v. Let us consider two functions b \in A(\mathbb{R}^n), \sigma \in A(\mathbb{R}^{n\times m}) (recall (6)) and the following stochastic differential equation (the expression in the following display is only standard notation, whose meaning is given by the concept of solution defined below):
\[
dX(t) = b(t, X(\cdot))\,dt + \sigma(t, X(\cdot))\,dW(t), \qquad X(0) = \xi_0. \tag{14}
\]

We say that an adapted, continuous process X(t) is a solution of (14) if a.s. we have

(i) [Well posedness of the integrals]

\[
\int_0^t \big[ |b(s, X(\cdot))| + |\sigma(s, X(\cdot))|^2 \big]\,ds < \infty \quad \text{for all } t \ge 0, \ \mathbb{P}\text{-a.s.}
\]

(Note that by lemma 6 below the processes b(t, X(\cdot)) and \sigma(t, X(\cdot)) are adapted.)

(ii) [X solves the equation]

\[
X(t) = \xi_0 + \int_0^t b(s, X(\cdot))\,ds + \int_0^t \sigma(s, X(\cdot))\,dW(s) \quad \text{for all } t \ge 0.
\]

Lemma 6. Let b \in A(\mathbb{R}). Let (\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathbb{P}) be given satisfying the usual conditions, and let X be continuous and progressively measurable. Then the process (t,\omega) \mapsto b(t, X(\cdot,\omega)) is progressively measurable.

Proof: Consider the map (t,\omega) \mapsto \Phi(t,\omega) = (t, X(\cdot,\omega)) \in [0,T]\times\mathcal{W}. Then b(t, X(\cdot,\omega)) = b\circ\Phi(t,\omega). Since b \in A(\mathbb{R}) we have that b^{-1}(\mathcal{B}(\mathbb{R})) \subseteq \mathcal{B}([0,t])\times\mathcal{B}_{t+}(\mathcal{W}). Thus, since X^{-1}(\mathcal{B}_{t+}(\mathcal{W})) \subseteq \mathcal{F}_{t+} = \mathcal{F}_t (note that it is enough to verify this on the cylinders defined in (8)), we immediately obtain the result.

Assumptions for the existence of a unique solution:

(H) [Lipschitz conditions for the coefficients] There exists a constant L > 0 such that for all t \in [0,\infty) and x, y in \mathcal{W}:
\[
|b(t,x(\cdot)) - b(t,y(\cdot))| \le L\,\hat\rho(x(\cdot), y(\cdot)), \qquad |\sigma(t,x(\cdot)) - \sigma(t,y(\cdot))| \le L\,\hat\rho(x(\cdot), y(\cdot)),
\]
\[
|b(\cdot,0)| + |\sigma(\cdot,0)| \in L^2([0,T]) \quad \text{for all } T > 0.
\]

Note that if we fix \bar j \in \mathbb{N} we have
\[
\hat\rho\big(x(\cdot\wedge\bar j), y(\cdot\wedge\bar j)\big) \le L\,\|x(\cdot\wedge\bar j) - y(\cdot\wedge\bar j)\|_{C([0,\bar j])},
\]
because we clearly have
\[
\|x(\cdot\wedge\bar j) - y(\cdot\wedge\bar j)\|_{C([0,j])} \le \|x(\cdot\wedge\bar j) - y(\cdot\wedge\bar j)\|_{C([0,\bar j])} \quad \text{for all } j \in \mathbb{N}.
\]

For \ell \ge 1, T > 0, let us consider the space
\[
L^\ell_{\mathcal{F}}(\Omega; C([0,T];\mathbb{R}^n)) := \big\{ x : \Omega\times[0,T] \to \mathbb{R}^n \;;\; x \text{ is adapted, continuous and } \|x\|_{\ell,\infty} < \infty \big\},
\]
where
\[
\|x\|_{\ell,\infty}^\ell := \mathbb{E}\Big[ \sup_{t\in[0,T]} |x(t)|^\ell \Big].
\]
It is easy to show that L^\ell_{\mathcal{F}}(\Omega; C([0,T];\mathbb{R}^n)) is a Banach space (exercise!).

Theorem 14. [Existence and uniqueness] Under assumption (H), for all \ell \ge 1 and \xi_0 \in L^\ell(\Omega;\mathbb{R}^n), there exists a unique solution X(t) of (14). Moreover, for all T > 0 we have that X \in L^\ell_{\mathcal{F}}(\Omega; C([0,T];\mathbb{R}^n)).

Proof: We will use a fixed point technique. First, let us fix a deterministic time \tau, to be chosen later, and consider the space S_\tau := L^\ell_{\mathcal{F}}(\Omega; C([0,\tau];\mathbb{R}^n)). We define the map \mathcal{T} : S_\tau \to S_\tau by
\[
\mathcal{T}(x)(t) := \xi_0 + \int_0^t b(s, x(\cdot))\,ds + \int_0^t \sigma(s, x(\cdot))\,dW(s) \quad \text{for all } t \in [0,\tau].
\]

If the map is well defined and contractive, then we get our solution on [0,\tau], and of course using the same argument we can extend it to [\tau, 2\tau] (taking the natural initial condition in order to preserve continuity), etc. Therefore, it is enough to prove that:

(i) \mathcal{T} is well defined. We easily obtain the existence of a constant L_1 > 0 such that
\[
|\mathcal{T}(x)(t)|^\ell \le L_1\Big( |\xi_0|^\ell + \Big(\int_0^\tau |b(s,x(\cdot))|\,ds\Big)^\ell + \sup_{t\in[0,\tau]}\Big|\int_0^t \sigma(s,x(\cdot))\,dW(s)\Big|^\ell \Big). \tag{15}
\]
The BDG inequality gives
\[
\mathbb{E}\Big[\sup_{t\in[0,\tau]}\Big|\int_0^t \sigma(s,x(\cdot))\,dW(s)\Big|^\ell\Big] \le K_\ell\,\mathbb{E}\Big[\Big(\int_0^\tau |\sigma(s,x(\cdot))|^2\,ds\Big)^{\ell/2}\Big].
\]

Using (H) we get the existence of L_2 > 0 such that
\[
\mathbb{E}\Big[\sup_{t\in[0,\tau]}\Big|\int_0^t \sigma(s,x(\cdot))\,dW(s)\Big|^\ell\Big] \le L_2\Big( \Big(\int_0^\tau |\sigma(s,0)|^2\,ds\Big)^{\ell/2} + \tau^{\ell/2}\,\|x\|_{\ell,\infty}^\ell \Big) < \infty.
\]
Similarly, there exists L_3 > 0 such that
\[
\mathbb{E}\Big[\Big(\int_0^\tau |b(s,x(\cdot))|\,ds\Big)^\ell\Big] \le L_3\Big( \Big(\int_0^\tau |b(s,0)|\,ds\Big)^\ell + \tau^{\ell/2}\,\|x\|_{\ell,\infty}^\ell \Big) < \infty.
\]

Thus, by taking the supremum and then the expectation in (15) we obtain that T is well defined.

(ii) \mathcal{T} is a contraction. Given x, y in S_\tau, there exists a constant L_4 > 0 such that
\[
|\mathcal{T}(y)(t) - \mathcal{T}(x)(t)|^\ell \le L_4\Big( \Big(\int_0^\tau |b(s,x(\cdot)) - b(s,y(\cdot))|\,ds\Big)^\ell + \sup_{t\in[0,\tau]}\Big|\int_0^t [\sigma(s,x(\cdot)) - \sigma(s,y(\cdot))]\,dW(s)\Big|^\ell \Big). \tag{16}
\]
Using the BDG inequality we get an L_5 > 0 such that
\[
\mathbb{E}\Big[\sup_{t\in[0,\tau]}\Big|\int_0^t [\sigma(s,x(\cdot)) - \sigma(s,y(\cdot))]\,dW(s)\Big|^\ell\Big] \le L_5\,\mathbb{E}\Big[\Big(\int_0^\tau |\sigma(s,x(\cdot)) - \sigma(s,y(\cdot))|^2\,ds\Big)^{\ell/2}\Big].
\]
By (H) we obtain the existence of L_6 > 0 such that
\[
\mathbb{E}\Big[\sup_{t\in[0,\tau]}\Big|\int_0^t [\sigma(s,x(\cdot)) - \sigma(s,y(\cdot))]\,dW(s)\Big|^\ell\Big] \le L_6\,\tau^{\ell/2}\,\|x-y\|_{\ell,\infty}^\ell.
\]

By (H) again, we get the existence of L7 > 0 such that

\[
\mathbb{E}\Big[\Big(\int_0^\tau |b(s,x(\cdot)) - b(s,y(\cdot))|\,ds\Big)^\ell\Big] \le L_7\,\tau^\ell\,\|x-y\|_{\ell,\infty}^\ell \le L_7\,\tau^{\ell/2}\,\|x-y\|_{\ell,\infty}^\ell \quad \text{for } \tau \le 1.
\]

Therefore, by taking the supremum in (16) and then the expectation, we finally obtain the existence of L_8 > 0 such that
\[
\|\mathcal{T}(y) - \mathcal{T}(x)\|_{\ell,\infty}^\ell \le L_8\,\tau^{\ell/2}\,\|x-y\|_{\ell,\infty}^\ell.
\]
Letting \tau < \min\{1, (1/L_8)^{2/\ell}\}, we get the result. □
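The fixed-point argument can be mirrored numerically: on a discretized Brownian path, the Picard map has the Euler-Maruyama recursion as its fixed point. A sketch under these assumptions (function names and the left-point discretization are illustrative, not from the text):

```python
import numpy as np

def picard_sde(b, sigma, x0, t_grid, dW, n_iter=150):
    """Picard iteration X^{k+1}(t) = x0 + int_0^t b(s, X^k(s)) ds
    + int_0^t sigma(s, X^k(s)) dW(s), discretized with left-point sums.
    Because each new value depends only on strictly earlier grid values,
    the iteration converges after at most len(t_grid) sweeps."""
    dt = np.diff(t_grid)
    X = np.full_like(t_grid, x0, dtype=float)
    for _ in range(n_iter):
        drift = b(t_grid[:-1], X[:-1]) * dt
        diffusion = sigma(t_grid[:-1], X[:-1]) * dW
        X = x0 + np.concatenate([[0.0], np.cumsum(drift + diffusion)])
    return X

def euler_maruyama(b, sigma, x0, t_grid, dW):
    """Euler-Maruyama scheme: the fixed point of the discretized Picard map."""
    X = np.empty_like(t_grid, dtype=float)
    X[0] = x0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        X[k + 1] = X[k] + b(t_grid[k], X[k]) * dt + sigma(t_grid[k], X[k]) * dW[k]
    return X
```

Running both on the same path for a linear SDE (e.g. b(t,x) = -x, \sigma(t,x) = 0.5x) shows the Picard iterates settling on the Euler-Maruyama trajectory.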

[Markov solutions] Suppose that we are given b : [0,\infty)\times\mathbb{R}^n \to \mathbb{R}^n and \sigma : [0,\infty)\times\mathbb{R}^n \to \mathbb{R}^{n\times m} such that:

(H') [Lipschitz conditions for the coefficients (bis)] There exists a constant L > 0 such that for all t \in [0,\infty) and x, y in \mathbb{R}^n:
\[
|b(t,x) - b(t,y)| \le L|x-y|, \qquad |\sigma(t,x) - \sigma(t,y)| \le L|x-y|,
\]
\[
|b(\cdot,0)| + |\sigma(\cdot,0)| \in L^2([0,T]) \quad \text{for all } T > 0.
\]

We consider the SDE

\[
dX(t) = b(t, X(t))\,dt + \sigma(t, X(t))\,dW(t), \qquad X(0) = \xi_0. \tag{17}
\]

Clearly, this is a special case of the one studied before. Moreover, it is clear that (H') implies (H) when b and \sigma are viewed as elements of A(\mathbb{R}^n) and A(\mathbb{R}^{n\times m}) respectively. This particular structure is very important since the solutions are Markovian, i.e. they satisfy the Markov property.

Theorem 15. Under (H') there exists a unique solution X of (17). Moreover, X(t) is a Markov process, i.e.

E (φ(X(t + h))|Ft) = E (φ(X(t + h))|X(t)) .

Moreover, X(t) is a strong Markov process, i.e. for every stopping time τ, we have

E (φ(X(τ + h))|Fτ ) = E (φ(X(τ + h))|X(τ)) .

Idea of the proof: We only need to prove the Markov and strong Markov property. We will argue in a formal way (see [14] for a rigorous proof). Note that

\[
X(t+h) = X(t) + \int_t^{t+h} b(s, X(s))\,ds + \int_t^{t+h} \sigma(s, X(s))\,dW(s).
\]

Defining X'(h) := X(t+h), W'(h) := W(t+h) - W(t), b'(h,x) := b(t+h,x), \sigma'(h,x) := \sigma(t+h,x) and the filtration \mathcal{F}'_h := \mathcal{F}_{t+h}, we can write the equation in the

form

\[
X'(h) = X'(0) + \int_0^h b'(h', X'(h'))\,dh' + \int_0^h \sigma'(h', X'(h'))\,dW'(h').
\]

Since W'(h) is a Brownian motion w.r.t. the filtration \mathcal{F}'_h (and independent of \mathcal{F}_t, by the independence of the increments), the above SDE is well posed and its solution depends only on X'(0). By pathwise uniqueness of the solutions this implies the result. For the strong Markov property the previous ideas work in the same way if we have autonomous coefficients, noting that the new W'(h) := W(\tau+h) - W(\tau) is a Brownian motion independent of \mathcal{F}_\tau by the strong Markov property of the Brownian motion. Otherwise, we can add an artificial variable (dX_{n+1} = 1\,dt) to treat time as a part of X.

Proposition 9. [Important stability estimates] Suppose that (H) holds true. Then for all T > 0 there exists a constant K_T such that the unique solution X(t) of (14) satisfies:
(i)
\[
\mathbb{E}\Big[\sup_{0\le s\le T} |X(s)|^\ell\Big] \le K_T\big(1 + \mathbb{E}(|\xi|^\ell)\big).
\]
(ii)
\[
\mathbb{E}\big[|X(t) - X(s)|^\ell\big] \le K_T\big(1 + \mathbb{E}(|\xi|^\ell)\big)\,|t-s|^{\ell/2}.
\]
(iii) If \hat\xi \in L^\ell_{\mathcal{F}_0}(\Omega;\mathbb{R}^n) is another r.v. and \hat X(t) is the corresponding solution, then
\[
\mathbb{E}\Big[\sup_{0\le s\le T} |X(s) - \hat X(s)|^\ell\Big] \le K_T\,\mathbb{E}\big(|\xi - \hat\xi|^\ell\big).
\]

To prove these results we will need the Gronwall lemma.

Lemma 7. [Gronwall lemma] Let f : [a,b] \to \mathbb{R} be piecewise continuous, satisfying
\[
f(t) \le \alpha + \beta\int_a^t f(s)\,ds \quad \text{for all } t \in [a,b], \tag{18}
\]
for some positive \alpha and \beta. Then
\[
f(t) \le \alpha\, e^{\beta(t-a)}.
\]

Proof of lemma 7: Multiplying eq. (18) by e−β(t−a) we get

\[
\frac{d}{dt}\Big( e^{-\beta(t-a)} \int_a^t f(s)\,ds \Big) \le \alpha\, e^{-\beta(t-a)}.
\]

Integrating this equation yields

\[
e^{-\beta(t-a)} \int_a^t f(s)\,ds \le \frac{\alpha}{\beta}\big(1 - e^{-\beta(t-a)}\big).
\]

Plugging this expression in (18) gives the result. 
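The equality case of the lemma can be checked numerically: iterating the map f \mapsto \alpha + \beta\int_a^t f(s)\,ds converges to the function that attains the Gronwall bound. A small sketch (function name and discretization are assumptions for illustration):

```python
import numpy as np

def integral_equation_solution(alpha, beta, t_grid):
    """Iterate f |-> alpha + beta * int_0^t f(s) ds (trapezoidal rule).
    The fixed point is f(t) = alpha * e^{beta t}, i.e. the equality case
    of the integral inequality (18), which attains the Gronwall bound."""
    f = np.full_like(t_grid, alpha, dtype=float)
    for _ in range(200):
        integral = np.concatenate([[0.0], np.cumsum(
            0.5 * (f[1:] + f[:-1]) * np.diff(t_grid))])
        f = alpha + beta * integral
    return f
```

The Volterra structure makes the iteration converge regardless of the size of \beta, since the k-th iterate of the integral operator has norm of order \beta^k / k!.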

Proof of proposition 9: We will prove the result for \ell \ge 2. The proof for the general case can be found in [18].

Proof of (i): Given T > 0 and t ∈ [0,T ], arguing as in the first part of the proof of theorem 14 we easily obtain

\[
|X(t)|^\ell \le L_T\Big( |\xi|^\ell + \Big(\int_0^t |b(s, X(s))|\,ds\Big)^\ell + \sup_{0\le s\le t}\Big|\int_0^s \sigma(r, X(r))\,dW(r)\Big|^\ell \Big).
\]

The BDG inequality and the fact that \ell \ge 2 give
\[
\mathbb{E}\Big[\sup_{0\le s\le t}\Big|\int_0^s \sigma(r,X(r))\,dW(r)\Big|^\ell\Big] \le C\Big( \Big(\int_0^T |\sigma(s,0)|^2\,ds\Big)^{\ell/2} + \int_0^t \mathbb{E}\Big[\sup_{0\le r\le s}|X(r)|^\ell\Big]\,ds \Big),
\]
for some C > 0. In what follows C will always denote a generic constant. Clearly,
\[
\mathbb{E}\Big[\Big(\int_0^t |b(s,X(s))|\,ds\Big)^\ell\Big] \le C\Big( \Big(\int_0^T |b(s,0)|\,ds\Big)^\ell + \int_0^t \mathbb{E}\Big[\sup_{0\le r\le s}|X(r)|^\ell\Big]\,ds \Big).
\]
Therefore, we finally obtain
\[
\mathbb{E}\Big[\sup_{0\le s\le t}|X(s)|^\ell\Big] \le C\Big( 1 + \mathbb{E}\big(|\xi|^\ell\big) + \int_0^t \mathbb{E}\Big[\sup_{0\le r\le s}|X(r)|^\ell\Big]\,ds \Big),
\]
and the result follows from the Gronwall lemma.

Proof of (ii): Repeating the same kind of arguments we get
\[
\mathbb{E}\big[|X(t)-X(s)|^\ell\big] \le C\Big\{ \Big(\int_s^t |b(r,0)|\,dr\Big)^\ell + \mathbb{E}\Big[\Big(\int_s^t |X(r)|\,dr\Big)^\ell\Big] + \Big(\int_s^t |\sigma(r,0)|^2\,dr\Big)^{\ell/2} + \mathbb{E}\Big[\Big(\int_s^t |X(r)|^2\,dr\Big)^{\ell/2}\Big] \Big\}. \tag{19}
\]
By the Cauchy-Schwarz inequality we get
\[
\mathbb{E}\Big[\Big(\int_s^t |X(r)|\,dr\Big)^\ell\Big] \le (t-s)^{\ell/2}\,\mathbb{E}\Big[\Big(\int_0^T |X(r)|^2\,dr\Big)^{\ell/2}\Big].
\]
Using this fact, majorizing |X(r)| by \sup_{0\le r\le T}|X(r)| in (19) and using (i), we easily obtain the result.

Proof of (iii): Exercise! Very similar to (i). 

Stochastic equations with random coefficients

Let us consider the following SDE with explicit dependence on ω

\[
dX(t) = b(t, X(\cdot), \omega)\,dt + \sigma(t, X(\cdot), \omega)\,dW(t), \qquad X(0) = \xi_0, \tag{20}
\]
where b : [0,\infty)\times\mathcal{W}^n\times\Omega \to \mathbb{R}^n and \sigma : [0,\infty)\times\mathcal{W}^n\times\Omega \to \mathbb{R}^{n\times m}. The definition of a solution of (20) is the obvious modification of the one for (14). In order to give a sense to the above SDE, we suppose:

(i) For every \omega \in \Omega, we have that b(\cdot,\cdot,\omega) \in A^n(\mathbb{R}^n) and \sigma(\cdot,\cdot,\omega) \in A^n(\mathbb{R}^{n\times m}).
(ii) For any x \in \mathcal{W}^n, b(\cdot,x,\cdot) and \sigma(\cdot,x,\cdot) are adapted.

(iii) There exists a constant L > 0 such that for all t \in [0,\infty), \omega \in \Omega, and x, y in \mathcal{W}:
\[
|b(t,x(\cdot),\omega) - b(t,y(\cdot),\omega)| \le L\,\hat\rho(x(\cdot),y(\cdot)), \qquad |\sigma(t,x(\cdot),\omega) - \sigma(t,y(\cdot),\omega)| \le L\,\hat\rho(x(\cdot),y(\cdot)),
\]
\[
|b(\cdot,0,\cdot)| + |\sigma(\cdot,0,\cdot)| \in L^2_{\mathcal{F}}([0,T]) \quad \text{for all } T > 0.
\]

Theorem 16. [Existence and uniqueness for SDEs with random coefficients] Under the above assumptions there exists a unique solution of (20). Moreover, the obvious analogues (uniform in \omega) of the estimates of proposition 9 hold true.

Proof: It is a direct adaptation of the proof for the non-random coefficient case (Exercise!).

It is important to note that, since we have an explicit dependence on \omega, the coefficients "have memory", and thus the argument of the proof of the Markov property for diffusions does not apply. In general, the solution of (20) is not a Markov process.

Connections with PDEs

We come back to the setting of non-random coefficients. Let \{X^{t,x}(s),\ s \ge t\} be the unique solution of
\[
dX(s) = b(s,X(s))\,ds + \sigma(s,X(s))\,dW(s) \quad \text{for } s \ge t, \qquad X(t) = x.
\]
Given f : \mathbb{R}^n \to \mathbb{R}, define the infinitesimal generator \mathcal{A} by
\[
\mathcal{A}f(t,x) := \lim_{h\downarrow 0} \frac{\mathbb{E}\big[f(X^{t,x}(t+h))\big] - f(x)}{h} \quad \text{if the limit exists.}
\]
Itô's formula implies that \mathcal{A}f is well defined for every C^{1,2}-function f with bounded derivatives, and
\[
\mathcal{A}f(t,x) = b(t,x)^\top Df(t,x) + \tfrac12 \mathrm{Tr}\big(\sigma\sigma^\top(t,x)\,D^2 f(t,x)\big). \tag{21}
\]

Proposition 10. Assume that (t,x) \mapsto v(t,x) := \mathbb{E}\big[g\big(X^{t,x}(T)\big)\big] is C^{1,2}([0,T]\times\mathbb{R}^n). Then v solves the PDE
\[
\partial_t v + \mathcal{A}v = 0, \qquad v(T,\cdot) = g(\cdot).
\]

Proof: If the function v has bounded derivatives the proof below simplifies considerably (exercise). To treat the case when v is only C^{1,2}, we use a typical localization-in-space technique in order to be able to eliminate the diffusion term when taking expectations. In fact, define the stopping time
\[
\tau_1 := T \wedge \inf\{ s > t \;;\; |X^{t,x}(s) - x| \ge 1 \}.
\]
Itô's formula gives
\[
v\big(s\wedge\tau_1, X^{t,x}(s\wedge\tau_1)\big) = v(t,x) + \int_t^{s\wedge\tau_1} (\partial_t v + \mathcal{A}v)(r, X^{t,x}(r))\,dr + \int_t^{s\wedge\tau_1} Dv(r, X^{t,x}(r))^\top \sigma(r, X^{t,x}(r))\,dW(r).
\]

By taking the expectation, we get

\[
\mathbb{E}\Big[v\big(s\wedge\tau_1, X^{t,x}(s\wedge\tau_1)\big)\Big] - v(t,x) = \mathbb{E}\Big[\int_t^{s\wedge\tau_1} (\partial_t v + \mathcal{A}v)(r, X^{t,x}(r))\,dr\Big].
\]

But, by the strong Markov property we obtain
\[
v\big(s\wedge\tau_1(\omega), X^{t,x}(s\wedge\tau_1(\omega),\omega)\big) = \mathbb{E}\Big[ g\big( X^{\,s\wedge\tau_1,\,X^{t,x}(s\wedge\tau_1)}(T) \big) \,\Big|\, \mathcal{F}_{\tau_1} \Big](\omega).
\]

Therefore,
\[
v\big(s\wedge\tau_1(\omega), X^{t,x}(s\wedge\tau_1(\omega),\omega)\big) = \mathbb{E}\Big[ g\big( X^{t,x}(T) \big) \,\Big|\, \mathcal{F}_{\tau_1} \Big](\omega).
\]

By taking the expectation we get
\[
\mathbb{E}\Big[ v\big(s\wedge\tau_1, X^{t,x}(s\wedge\tau_1)\big) \Big] = v(t,x),
\]
from which
\[
\mathbb{E}\Big[ \int_t^{s\wedge\tau_1} (\partial_t v + \mathcal{A}v)(r, X^{t,x}(r))\,dr \Big] = 0.
\]
Dividing by s-t, letting s \downarrow t and using the Lebesgue theorem we obtain the result. □
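Proposition 10 can be illustrated numerically in the simplest case b = 0, \sigma = 1, g(x) = x^2, for which v(t,x) = x^2 + (T-t) solves \partial_t v + \tfrac12 v_{xx} = 0. A Monte Carlo sketch (toy data and function name are assumptions for illustration, not from the text):

```python
import numpy as np

def v_monte_carlo(t, x, T=1.0, n_paths=400_000, seed=7):
    """Monte Carlo evaluation of v(t,x) = E[g(X^{t,x}(T))] for the toy case
    b = 0, sigma = 1 (so X^{t,x}(T) = x + W(T) - W(t)) and g(x) = x^2.
    Closed form: v(t,x) = x^2 + (T - t)."""
    rng = np.random.default_rng(seed)
    XT = x + rng.normal(0.0, np.sqrt(T - t), size=n_paths)
    return np.mean(XT ** 2)
```

One can verify directly that x^2 + (T-t) satisfies the terminal condition and the PDE: \partial_t v = -1 and \tfrac12 v_{xx} = 1.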

Now, we consider a more general problem:

\[
\partial_t v + \mathcal{A}v - k(t,x)\,v + f(t,x) = 0 \quad \text{for } (t,x) \in [0,T)\times\mathbb{R}^n, \qquad v(T,\cdot) = g(\cdot). \tag{22}
\]

We suppose that: (i) b and \sigma are continuous, Lipschitz in x uniformly in t, and |b(\cdot,x)|, |\sigma(\cdot,x)| belong to L^2([0,T]); (ii) the function k is uniformly bounded from below; (iii) the function f has quadratic growth in x uniformly in t.

Theorem 17. [Probabilistic representation of the solution of a linear parabolic PDE, known as the Feynman-Kac formula] Suppose that the above assumptions hold true. Let v \in C^{1,2}([0,T]\times\mathbb{R}^n) be a solution of (22) with quadratic growth in x uniformly in t. Then v has the following representation:
\[
v(t,x) = \mathbb{E}\Big[ \int_t^T \beta^{t,x}(s)\, f\big(s, X^{t,x}(s)\big)\,ds + \beta^{t,x}(T)\, g\big(X^{t,x}(T)\big) \Big],
\]
where \beta^{t,x}(s) := \exp\big\{-\int_t^s k(r, X^{t,x}(r))\,dr\big\}, with X^{t,x}(s) being the solution of
\[
dX(s) = b(s,X(s))\,ds + \sigma(s,X(s))\,dW(s), \qquad X(t) = x.
\]

Proof: Define the sequence of stopping times
\[
\tau_n := T \wedge \inf\big\{ s > t \;;\; |X^{t,x}(s) - x| \ge n \big\}.
\]
By the continuity of the paths of X^{t,x}(s) it is clear that a.s. \tau_n \uparrow T as n \uparrow \infty. Using that v is smooth, we can apply Itô's formula to \beta^{t,x}(s)\, v(s, X^{t,x}(s)), obtaining that
\[
d\big[\beta^{t,x}(s)\, v(s, X^{t,x}(s))\big] = \beta^{t,x}(s)\big[-kv + \partial_t v + \mathcal{A}v\big]\big(s, X^{t,x}(s)\big)\,ds + \beta^{t,x}(s)\, Dv\big(s, X^{t,x}(s)\big)^\top \sigma\big(s, X^{t,x}(s)\big)\,dW(s).
\]

Using that v solves (22) and taking the expected value, we obtain

\[
\mathbb{E}\Big[\beta^{t,x}(\tau_n)\, v(\tau_n, X^{t,x}(\tau_n))\Big] - v(t,x) = \mathbb{E}\Big[-\int_t^{\tau_n} \beta^{t,x}(s)\, f\big(s, X^{t,x}(s)\big)\,ds\Big] + \mathbb{E}\Big[\int_t^{\tau_n} \beta^{t,x}(s)\, Dv\big(s, X^{t,x}(s)\big)^\top \sigma\big(s, X^{t,x}(s)\big)\,dW(s)\Big].
\]

Since X^{t,x}(s) is bounded before \tau_n, the last term in the above expression is zero. Therefore, we have
\[
v(t,x) = \mathbb{E}\Big[ \int_t^{\tau_n} \beta^{t,x}(s)\, f\big(s, X^{t,x}(s)\big)\,ds + \beta^{t,x}(\tau_n)\, v(\tau_n, X^{t,x}(\tau_n)) \Big].
\]
The result follows easily by letting n \uparrow \infty and using the Lebesgue theorem. In order to verify that the integrand is dominated, we use the quadratic growth of f and v, the estimates for the moments of X^{t,x}(s) and the fact that k is bounded from below. □
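A minimal numerical illustration of the Feynman-Kac representation, for the assumed toy data b = 0, \sigma = 1, k \equiv \rho constant, f \equiv 0 and g(x) = x^2 (so that \beta^{t,x}(T) = e^{-\rho(T-t)} is deterministic; all names are illustrative):

```python
import numpy as np

def feynman_kac_mc(t, x, rho=0.3, T=1.0, n_paths=400_000, seed=11):
    """Monte Carlo sketch of v(t,x) = E[beta^{t,x}(T) g(X^{t,x}(T))] for
    b = 0, sigma = 1, k = rho constant, f = 0, g(x) = x^2.
    Closed form: v(t,x) = e^{-rho (T-t)} (x^2 + (T-t)), which solves (22)."""
    rng = np.random.default_rng(seed)
    XT = x + rng.normal(0.0, np.sqrt(T - t), size=n_paths)
    return np.exp(-rho * (T - t)) * np.mean(XT ** 2)
```

Substituting v(t,x) = e^{-\rho(T-t)}(x^2 + (T-t)) into \partial_t v + \tfrac12 v_{xx} - \rho v = 0 confirms the closed form used in the comment.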

Stochastic control: Problem formulation

Excellent references for what follows are the books [11, 20, 24] and the lecture notes [23].

Let (\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathbb{P}) be a given probability space satisfying the usual conditions, and let W be a Brownian motion defined on this space. We suppose that the filtration is the canonical augmentation of the one generated by W(t). For (t,x) \in [0,T]\times\mathbb{R}^n, let us consider the following controlled SDE:
\[
dX(s) = b\big(s, X(s), u(s)\big)\,ds + \sigma\big(s, X(s), u(s)\big)\,dW(s), \qquad X(t) = x. \tag{23}
\]

In the notation above, u is the control that belongs to the space

\[
\mathcal{U} := \big\{ u : [0,T]\times\Omega \to U \;;\; u \text{ is adapted and } u(\cdot,\omega) \in L^2([0,T];\mathbb{R}^d) \big\},
\]
where U \subseteq \mathbb{R}^d is closed. For u \in \mathcal{U} we denote by X^{t,x}[u] the unique solution (if it exists!) of (23). Given functions \ell : [0,T]\times\mathbb{R}^n\times\mathbb{R}^d \to \mathbb{R} and g : \mathbb{R}^n \to \mathbb{R}, we consider the cost J : [0,T]\times\mathbb{R}^n\times\mathcal{U} \to \mathbb{R} (which is well defined under the assumptions below)
\[
J(t,x,u) := \mathbb{E}\Big[ \int_t^T \ell\big(s, X^{t,x}[u](s), u(s)\big)\,ds + g\big(X^{t,x}[u](T)\big) \Big].
\]

The stochastic control problem at (t,x) is to calculate the following value function:
\[
v(t,x) := \inf_{u\in\mathcal{U}} J(t,x,u), \qquad v(T,x) = g(x).
\]

[Assumptions to give a sense to the problem]. We first consider standard assumptions for the SDE:

(i) [Uniform continuity assumption] The maps b : [0,T]\times\mathbb{R}^n\times\mathbb{R}^d \to \mathbb{R}^n, \sigma : [0,T]\times\mathbb{R}^n\times\mathbb{R}^d \to \mathbb{R}^{n\times m}, \ell : [0,T]\times\mathbb{R}^n\times\mathbb{R}^d \to \mathbb{R} and g : \mathbb{R}^n \to \mathbb{R} are uniformly continuous.

(ii) [Lipschitz assumption] There exists a constant L > 0 such that for \varphi(t,x,u) = b(t,x,u), \sigma(t,x,u), \ell(t,x,u), g(x), we have
\[
|\varphi(t,x,u) - \varphi(t,y,u)| \le L|x-y| \quad \forall\, t \in [0,T],\ x,y \in \mathbb{R}^n,\ u \in U, \qquad |\varphi(t,0,u)| \le L \quad \forall\, (t,u) \in [0,T]\times U.
\]

Under these assumptions, for each u ∈ U, equation (23) admits a unique solution Xt,x[u] and the function J is well defined. For simplicity, we have considered these assumptions. However, most of the results presented later can be extended to more general settings as:

(i’) Instead of uniform continuity you ask only for continuity.

(ii’) Lipschitz assumption only for b and σ.

(iii') Instead of a uniform bound on b(t,0,u) and \sigma(t,0,u), you only ask for linear growth of b(t,x,u) and \sigma(t,x,u).

(iv') Instead of a uniform bound on \ell(t,0,u), you only ask for quadratic growth of \ell(t,x,u) and g(x).

For these assumptions see the lecture notes of N. Touzi [23] and the books [11, 20].

[An interesting reduction of the set of admissible controls] Let us consider the set
\[
\mathcal{U}_t := \big\{ u \in \mathcal{U} \;;\; u|_{[t,T]} \text{ is independent of } \mathcal{F}_t \big\}.
\]

We have:

Proposition 11. [A restriction of the set of admissible controls] The value function can be calculated as
\[
v(t,x) = \inf_{u\in\mathcal{U}_t} J(t,x,u).
\]

Proof: Clearly, v(t,x) \le \inf_{u\in\mathcal{U}_t} J(t,x,u). Now, let u \in \mathcal{U} and for simplicity suppose that \ell = 0. We know by the Dynkin theorem that u can be written as
\[
u(s) = h\big(s, W(\cdot\wedge s)\big).
\]
Therefore, for s > t we have
\[
u(s) = h\big(s, W(\cdot\wedge s)\big) = h\big(s, W(\cdot\wedge t) + [W(\cdot\wedge s) - W(\cdot\wedge t)]\big).
\]
Therefore, if "we freeze the trajectory" of W(\cdot\wedge t), we obtain, because of the independence of the increments of W(\cdot), a new control that is independent of \mathcal{F}_t. Using this observation, we have (denoting by \mu the Wiener measure) that
\[
J(t,x,u) = \mathbb{E}\big[g(X^{t,x}[u](T))\big] = \int_{C([0,T])} g\big(X^{t,x}[u](T,w)\big)\,d\mu(w) = \int_{C([0,t])}\int_{C([t,T])} g\big(X^{t,x}[u](T,w,w')\big)\,d\mu(w')\,d\mu(w) \ge \int_{C([0,t])} \inf_{u\in\mathcal{U}_t} J(t,x,u)\,d\mu(w) = \inf_{u\in\mathcal{U}_t} J(t,x,u),
\]
which yields the result. □

Some properties of the value function: As we will see later, v is the unique solution (in a weak sense to be defined) of a second order HJB equation. However, we can deduce some interesting properties of v without appealing to this PDE (which in turn proves a posteriori these properties for the solution of the PDE).

Theorem 18. [Linear growth of v; Lipschitz property of v w.r.t. x and local 1/2-Hölder property w.r.t. t] There exists K > 0 such that the value function satisfies
\[
|v(t,x)| \le K(1+|x|) \quad \forall\,(t,x) \in [0,T]\times\mathbb{R}^n,
\]
\[
|v(t,x) - v(\hat t,\hat x)| \le K\Big( |x-\hat x| + (1+|x|\vee|\hat x|)\,|t-\hat t|^{1/2} \Big) \quad \forall\, t,\hat t \in [0,T],\ x,\hat x \in \mathbb{R}^n.
\]

Proof: Since our assumptions are uniform in u, the proof is a direct consequence of our estimates for solutions of SDEs (with \ell = 1). □

[Continuous dependence on parameters] Consider a family of problems parameterized by \varepsilon > 0. The dynamics are of the form

\[
dX(s) = b_\varepsilon\big(s, X(s), u(s)\big)\,ds + \sigma_\varepsilon\big(s, X(s), u(s)\big)\,dW(s), \qquad X(t) = x, \tag{24}
\]
and the cost
\[
J_\varepsilon(t,x,u) := \mathbb{E}\Big[ \int_t^T \ell_\varepsilon\big(s, X_\varepsilon^{t,x}[u](s), u(s)\big)\,ds + g_\varepsilon\big(X_\varepsilon^{t,x}[u](T)\big) \Big],
\]
where X_\varepsilon^{t,x}[u](s) is the solution of (24) associated with u. The function v_\varepsilon is defined as
\[
v_\varepsilon(t,x) := \inf_{u\in\mathcal{U}} J_\varepsilon(t,x,u), \qquad v_\varepsilon(T,x) = g_\varepsilon(x).
\]

We suppose that, uniformly in \varepsilon, the data of the family of problems satisfy the same kind of assumptions as the data of the original problem. For \varepsilon = 0, we recover our original data.

Proposition 12. [A stability result] Suppose that, uniformly in (t,u) \in [0,T]\times U and x \in K for every compact K, we have
\[
\lim_{\varepsilon\downarrow 0} |\varphi_\varepsilon(t,x,u) - \varphi_0(t,x,u)| = 0, \quad \text{where } \varphi_\varepsilon = b_\varepsilon, \sigma_\varepsilon, \ell_\varepsilon, g_\varepsilon.
\]
Then v_\varepsilon(t,x) \to v(t,x) uniformly over compact sets.

Proof: Fix (t,x) \in [0,T]\times\mathbb{R}^n and u \in \mathcal{U}_t. For notational convenience, we do not write the dependence on time of the data of the problems. Let us set X_\varepsilon(\cdot) := X_\varepsilon^{t,x}[u](\cdot), X(\cdot) := X^{t,x}[u](\cdot), \delta_\varepsilon b(X) := b_\varepsilon(X_\varepsilon) - b(X) and \delta_\varepsilon\sigma(X) := \sigma_\varepsilon(X_\varepsilon) - \sigma(X). By Itô's formula, we have for every s \in [t,T],
\[
|X_\varepsilon(s) - X(s)|^2 = \int_t^s \Big[ 2(X_\varepsilon - X)^\top \delta_\varepsilon b(X) + \mathrm{Tr}\big(\delta_\varepsilon\sigma(X)\delta_\varepsilon\sigma(X)^\top\big) \Big]\,dr + 2\int_t^s (X_\varepsilon - X)^\top \delta_\varepsilon\sigma(X)\,dW(r).
\]

From now on we denote by K > 0 a generic constant. Thus,
\[
\mathbb{E}\Big[\sup_{t\le r\le s} |X_\varepsilon(r) - X(r)|^2\Big] \le K(I_1 + I_2),
\]
where
\[
I_1 = \mathbb{E}\Big[ \int_t^s |X_\varepsilon(r) - X(r)|\,|\delta_\varepsilon b(X)|\,dr + \int_t^s \mathrm{Tr}\big(\delta_\varepsilon\sigma(X)\delta_\varepsilon\sigma(X)^\top\big)\,dr \Big], \qquad
I_2 = \mathbb{E}\Big[ \sup_{t\le r\le s} \Big| \int_t^r (X_\varepsilon - X)^\top \delta_\varepsilon\sigma(X)\,dW(r') \Big| \Big].
\]

Let us first estimate I_1. Since |\delta_\varepsilon b(X)| \le K\big(|X_\varepsilon - X| + |b_\varepsilon(X) - b(X)|\big), with a similar expression for \delta_\varepsilon\sigma, we easily obtain
\[
I_1 \le K\Big\{ \mathbb{E}\Big[\int_t^s \sup_{t\le r'\le r} |X_\varepsilon(r') - X(r')|^2\,dr\Big] + \mathbb{E}\Big[\int_t^s \big(|b_\varepsilon(X) - b(X)|^2 + |\sigma_\varepsilon(X) - \sigma(X)|^2\big)\,dr\Big] \Big\}.
\]

Now we estimate I_2. By the BDG inequality we get
\[
I_2 \le K\,\mathbb{E}\Big[\Big(\int_t^s |X_\varepsilon - X|^2\,|\delta_\varepsilon\sigma(X)|^2\,dr\Big)^{1/2}\Big] \le K\,\mathbb{E}\Big[\Big(\sup_{t\le r\le s}|X_\varepsilon(r) - X(r)|^2 \int_t^s |\delta_\varepsilon\sigma(X)|^2\,dr\Big)^{1/2}\Big].
\]
By Young's inequality we obtain that
\[
K I_2 \le \tfrac12\,\mathbb{E}\Big[\sup_{t\le r\le s}|X_\varepsilon(r) - X(r)|^2\Big] + 2K^2\,\mathbb{E}\Big[\int_t^s |\delta_\varepsilon\sigma(X)|^2\,dr\Big].
\]

Therefore we get,

\[
\mathbb{E}\Big[\sup_{t\le r\le s}|X_\varepsilon(r)-X(r)|^2\Big] \le K\Big\{ \mathbb{E}\Big[\int_t^s \sup_{t\le r'\le r}|X_\varepsilon(r')-X(r')|^2\,dr\Big] + \mathbb{E}\Big[\int_t^s \big(|b_\varepsilon(X)-b(X)|^2 + |\sigma_\varepsilon(X)-\sigma(X)|^2\big)\,dr\Big] \Big\},
\]
which by the Gronwall inequality implies that
\[
\mathbb{E}\Big[\sup_{t\le s\le T}|X_\varepsilon(s)-X(s)|^2\Big] \le K\,\mathbb{E}\Big[\int_t^T \big(|b_\varepsilon(X(s))-b(X(s))|^2 + |\sigma_\varepsilon(X(s))-\sigma(X(s))|^2\big)\,ds\Big].
\]

By the uniform convergence of the parameters we can find \eta : [0,\infty)\times[0,\infty) \to [0,\infty), continuous, non-decreasing, with \eta(0,R) = 0 for all R \ge 0, such that
\[
|\varphi_\varepsilon(t,x,u) - \varphi_0(t,x,u)| \le \eta(\varepsilon,|x|) \quad \forall\,(t,x,u) \in [0,T]\times\mathbb{R}^n\times U.
\]
In fact, we can take
\[
\eta(\varepsilon,R) = \sup_{0\le\varepsilon'\le\varepsilon}\ \sup_{0\le r\le R}\ \sup_{(t,u)\in[0,T]\times U,\ x\in B(0,r)} \big|\varphi_{\varepsilon'}(t,x,u) - \varphi_0(t,x,u)\big|.
\]

Thus, we get that
\[
\mathbb{E}\Big[\sup_{t\le s\le T}|X_\varepsilon(s)-X(s)|^2\Big] \le K\,\mathbb{E}\Big[\int_t^T \eta(\varepsilon,|X(s)|)^2\,ds\Big].
\]

Now, we estimate the r.h.s. of the above equation. By our assumptions, we obtain

η(ε, |X(s)|) ≤ 2L(1 + |X(s)|).

Therefore, for any R > 0, we have

\[
\mathbb{E}\big[\eta(\varepsilon,|X(t)|)^2\big] \le \eta(\varepsilon,R)^2 + K\,\mathbb{E}\big[I_{|X(t)|>R}\,(1+|X(t)|)^2\big].
\]

Thus, by the Cauchy-Schwarz inequality, we obtain
\[
\mathbb{E}\big[\eta(\varepsilon,|X(t)|)^2\big] \le \eta(\varepsilon,R)^2 + K\,\mathbb{P}(|X(t)|>R)^{1/2}\,\mathbb{E}\big[(1+|X(t)|)^4\big]^{1/2}.
\]

By the Chebyshev inequality, we get
\[
\mathbb{E}\big[\eta(\varepsilon,|X(t)|)^2\big] \le \eta(\varepsilon,R)^2 + K\,\frac{\mathbb{E}\big(|X(t)|^4\big)^{1/2}}{R^2}\,(1+|x|^2) \le \eta(\varepsilon,R)^2 + K\,\frac{1+|x|^4}{R^2}.
\]

Therefore,
\[
\sup_{t\in[0,T]} \mathbb{E}\big[\eta(\varepsilon,|X(t)|)^2\big] \le \eta(\varepsilon,R)^2 + K\,\frac{1+|x|^4}{R^2},
\]
which gives
\[
\sup_{t\in[0,T]} \mathbb{E}\big[\eta(\varepsilon,|X(t)|)\big] \le K\Big( \eta(\varepsilon,R) + \frac{1+|x|^2}{R} \Big). \tag{25}
\]
In particular, letting first \varepsilon \downarrow 0 and then R \uparrow \infty, we get
\[
\mathbb{E}\Big[\sup_{t\le s\le T}|X_\varepsilon(s)-X(s)|\Big] \to 0.
\]

Thus,
\[
|J_\varepsilon(t,x,u) - J(t,x,u)| \le K\,\mathbb{E}\Big[ \sup_{t\le s\le T}|X_\varepsilon(s)-X(s)| + |g_\varepsilon(X(T)) - g(X(T))| + \int_t^T |\ell_\varepsilon(s,X(s),u) - \ell(s,X(s),u)|\,ds \Big].
\]

Using (25), we have

\[
|J_\varepsilon(t,x,u) - J(t,x,u)| \le K\Big( \eta(\varepsilon,R) + \frac{1+|x|^2}{R} \Big).
\]

This implies that J_\varepsilon(t,x,u) \to J(t,x,u) uniformly in u \in \mathcal{U}_t, t \in [0,T] and x in a compact set. This fact implies the result. □

As a particular case, let us suppose that we only perturb \sigma to \sigma_\varepsilon := (\sigma, \sqrt{2\varepsilon}\,I_n) \in \mathbb{R}^{n\times(m+n)} (i.e. we are adding n independent "small noises"). Then:

Corollary 3. [An error estimate] There exists a constant K > 0 such that
\[
|v_\varepsilon(t,x) - v(t,x)| \le K\sqrt{\varepsilon}.
\]

Proof: It is clear that now
\[
\mathbb{E}\Big[\sup_{t\le s\le T}|X_\varepsilon(s)-X(s)|\Big] = O(\sqrt{\varepsilon}),
\]
and the result follows directly from the expression for |J_\varepsilon(t,x,u) - J(t,x,u)|. □

[Semiconcavity of the value function under stronger assumptions] Let us recall that a function \varphi is said to be semiconcave if there exists K > 0 such that for all x, y \in \mathbb{R}^n and \lambda \in [0,1],
\[
\lambda\varphi(x) + (1-\lambda)\varphi(y) - \varphi(\lambda x + (1-\lambda)y) \le K\lambda(1-\lambda)|x-y|^2.
\]
The trivial and useful example of a semiconcave (and non-concave) function is |x|^2, thanks to the identity
\[
\lambda|x|^2 + (1-\lambda)|y|^2 = |\lambda x + (1-\lambda)y|^2 + \lambda(1-\lambda)|x-y|^2.
\]

As a direct consequence of the above example, we get: Proposition 13. [An equivalent definition] A function φ is semiconcave iff there exists a constant K > 0 such that φ(·) − K| · |2 is concave.

We say that a function φ is semiconvex if −φ is semiconcave, i.e. there exists a constant K > 0 such that φ(·) + K| · |2 is convex.

Let us assume that g(\cdot) and \ell(t,\cdot,u) are semiconcave uniformly in (t,u). Moreover, let us suppose that b(t,\cdot,u) and \sigma(t,\cdot,u) are differentiable and their derivatives are Lipschitz uniformly in (t,u).

Proposition 14. [Semiconcavity of v] Under the above assumptions, the function v(t, ·) is semiconcave, uniformly in t ∈ [0,T ].

Sketch of the proof: Take two points x_1, x_2 \in \mathbb{R}^n and set x_\lambda := \lambda x_1 + (1-\lambda)x_2. Given \varepsilon > 0, fix u \in \mathcal{U}_t such that J(t,x_\lambda,u) \le v(t,x_\lambda) + \varepsilon. Given this fact, it is enough to prove the semiconcavity of J(t,\cdot,u) with an associated constant independent of u. Let us denote
\[
X_1 = X^{t,x_1}[u], \qquad X_2 = X^{t,x_2}[u] \qquad \text{and} \qquad X_\lambda = X^{t,x_\lambda}[u].
\]

Due to the nonlinearity of b and \sigma we have in general X_\lambda \ne \lambda X_1 + (1-\lambda)X_2, and we cannot directly apply the semiconcavity of \ell(t,\cdot,u) and g(\cdot). So we have to work a little... In order to simplify the notation, suppose that \ell \equiv 0; then
\[
\lambda J(t,x_1,u) + (1-\lambda)J(t,x_2,u) - J(t,x_\lambda,u) = \mathbb{E}\big[ \lambda g(X_1(T)) + (1-\lambda)g(X_2(T)) - g(X_\lambda(T)) \big].
\]

Let us define X^\lambda(t) := \lambda X_1(t) + (1-\lambda)X_2(t). In order to use the semiconcavity of g we need to estimate
\[
\mathbb{E}\Big[ \sup_{s\in[t,T]} |X^\lambda(s) - X_\lambda(s)| \Big].
\]

As usual, this estimate will depend on the difference of the coefficients. Let us write the coefficient of the dt part:

λb(s, X_1(s)) + (1 − λ)b(s, X_2(s)) − b(s, X_λ(s)).

Because of the Lipschitz property of b, it is natural to write this as

λb(s, X_1(s)) + (1 − λ)b(s, X_2(s)) − b(s, X^λ(s)) + b(s, X^λ(s)) − b(s, X_λ(s)),

and to prove that

|λb(s, X_1(s)) + (1 − λ)b(s, X_2(s)) − b(s, X^λ(s))| ≤ Kλ(1 − λ)|X_1(s) − X_2(s)|^2.

This is easily checked using that b(s, ·) has a Lipschitz derivative. Analogously, we obtain

|λσ(s, X_1(s)) + (1 − λ)σ(s, X_2(s)) − σ(s, X^λ(s))| ≤ Kλ(1 − λ)|X_1(s) − X_2(s)|^2.

Then use the standard procedure: write |X^λ(s) − X_λ(s)|^2, apply the BDG inequality and then the Gronwall inequality to get the result (Exercise!). □

The Hamilton-Jacobi-Bellman equation

Now we make the link with PDEs. In fact, for the moment, we prove that if v ∈ C^{1,2}([0,T] × R^n) then it satisfies a second order Hamilton-Jacobi-Bellman (HJB) equation (compare with Proposition 10). Let us define the functions

Ĥ : [0,T] × R^n × U × R^n × R^{n×n} → R,
H : [0,T] × R^n × R^n × R^{n×n} → R,

as

Ĥ(t, x, u, p, P) := ℓ(t, x, u) + p^⊤ b(t, x, u) + (1/2) Tr(σ(t, x, u)σ(t, x, u)^⊤ P),

H(t, x, p, P) := inf_{u∈U} Ĥ(t, x, u, p, P).

Note that for a smooth f : [0,T] × R^n → R, we have

H(t, x, Df(t, x), D^2 f(t, x)) = inf_{u∈U} { ℓ(t, x, u) + A[u]f(t, x) },

where

A[u]f(t, x) := b(t, x, u)^⊤ Df(t, x) + (1/2) Tr[ σσ^⊤(t, x, u) D^2 f(t, x) ].
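To make the role of Ĥ and H concrete, here is a small numerical sketch (the coefficients b, σ, ℓ and the control set U below are hypothetical choices for illustration): H is obtained by minimizing Ĥ over a discretization of U, and for this particular quadratic example the minimizer is known in closed form.

```python
import numpy as np

# Illustration (coefficients are made up for this example):
#   b(t,x,u) = u, sigma(t,x,u) = 1, l(t,x,u) = x^2 + u^2, U = [-1, 1].
def Hhat(t, x, u, p, P):
    b, sigma, ell = u, 1.0, x**2 + u**2
    return ell + p * b + 0.5 * sigma**2 * P

def H(t, x, p, P, U=np.linspace(-1.0, 1.0, 2001)):
    # infimum of Hhat over a fine discretization of the control set U
    return min(Hhat(t, x, u, p, P) for u in U)

# For this example the unconstrained minimizer is u* = -p/2, so for |p| <= 2
# we have H(t,x,p,P) = x^2 - p^2/4 + P/2; check it at one point.
t, x, p, P = 0.0, 1.0, 0.8, 0.5
assert abs(H(t, x, p, P) - (x**2 - p**2 / 4 + P / 2)) < 1e-5
```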

The dynamic programming principle

“An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.” R. Bellman [4]

The following property will be crucial in order to characterize the value function as the solution of a PDE.

Theorem 19. [Dynamic programming principle] For any (t, x) ∈ [0,T] × R^n and t̄ ∈ [t, T] we have

v(t, x) = inf_{u∈U_t} E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u](s), u(s)) ds + v(t̄, X^{t,x}[u](t̄)) ].

Proof of the “easy inequality”: Let u ∈ Ut. We have

J(t, x, u) = E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u](s), u(s)) ds + ∫_{t̄}^T ℓ(s, X^{t,x}[u](s), u(s)) ds + g(X^{t,x}[u](T)) ]
= E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u](s), u(s)) ds + ∫_{t̄}^T ℓ(s, X^{t̄, X^{t,x}[u](t̄)}[u](s), u(s)) ds + g(X^{t̄, X^{t,x}[u](t̄)}[u](T)) ].

Dynkin's theorem (Theorem 3) implies that each time we freeze W(·) up to time t̄ (i.e. we condition w.r.t. F_{t̄} and evaluate at ω), there exists an admissible control û^ω ∈ U_{t̄} such that û^ω(s) = u(s, ω) for all s ∈ [t̄, T]. Conditioning w.r.t. F_{t̄} we get

J(t, x, u) = E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u](s), u(s)) ds + J(t̄, X^{t,x}[u](t̄), û) ]
≥ E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u](s), u(s)) ds + v(t̄, X^{t,x}[u](t̄)) ],

which implies that

v(t, x) ≥ inf_{u∈U_t} E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u](s), u(s)) ds + v(t̄, X^{t,x}[u](t̄)) ].

The other inequality is rather difficult to prove, especially for more general problems where the value function v is not a priori continuous. There are different types of proofs: El Karoui [9] used delicate measurable selection theorems. Yong and Zhou [24] consider a weak formulation of stochastic optimal control problems (i.e. the probability space is part of the control). B. Bouchard and N. Touzi [6] established a general “weak form” of the dynamic programming principle, which allows one to establish the HJB equation (26) in a very general framework. □

Let us consider the following second order HJB equation:

∂_t v(t, x) + H(t, x, Dv(t, x), D^2 v(t, x)) = 0 for (t, x) ∈ [0,T) × R^n,
v(T, x) = g(x) for x ∈ R^n. (26)

We have:

Theorem 20. Suppose that the value function v belongs to C^{1,2}([0,T] × R^n). Then v solves (26).

Proof: For simplicity we will suppose that the derivatives ∂_t v, Dv and D^2 v are bounded. Otherwise, we have to use a localization argument with stopping times (as in the proof of Proposition 10). Given u ∈ U define a control u ∈ U_t by u(s, ω) ≡ u. By the dynamic programming principle we have

v(t, x) ≤ E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u](s), u) ds + v(t̄, X^{t,x}[u](t̄)) ]. (27)

On the other hand, Itô's formula implies

v(t̄, X^{t,x}[u](t̄)) = v(t, x) + ∫_t^{t̄} [ ∂_t v(s, X^{t,x}[u](s)) + A[u]v(s, X^{t,x}[u](s)) ] ds
+ ∫_t^{t̄} Dv(s, X^{t,x}[u](s)) σ(s, X^{t,x}[u](s), u) dW(s).

By taking the expected value and using (27), we obtain

E[ ∫_t^{t̄} { ∂_t v(s, X^{t,x}[u](s)) + Ĥ(s, X^{t,x}[u](s), u, Dv(s, X^{t,x}[u](s)), D^2 v(s, X^{t,x}[u](s))) } ds ] ≥ 0.

Dividing by t̄ − t and letting t̄ ↓ t gives

∂_t v(t, x) + Ĥ(t, x, u, Dv(t, x), D^2 v(t, x)) ≥ 0.

By taking the infimum over u ∈ U we get

∂_t v(t, x) + H(t, x, Dv(t, x), D^2 v(t, x)) ≥ 0.

To prove the converse inequality, choose u_{ε,t̄} ∈ U_t such that

v(t, x) + ε(t̄ − t) ≥ E[ ∫_t^{t̄} ℓ(s, X^{t,x}[u_{ε,t̄}](s), u_{ε,t̄}(s)) ds + v(t̄, X^{t,x}[u_{ε,t̄}](t̄)) ].

Again, using Itô's formula, we get (we simplify the notation since the context is clear)

ε(t̄ − t) ≥ E[ ∫_t^{t̄} { ∂_t v(s, X[u_{ε,t̄}](s)) + Ĥ(s, X[u_{ε,t̄}](s), u_{ε,t̄}(s), Dv(s, X[u_{ε,t̄}](s)), D^2 v(s, X[u_{ε,t̄}](s))) } ds ]
≥ E[ ∫_t^{t̄} { ∂_t v(s, X[u_{ε,t̄}](s)) + H(s, X[u_{ε,t̄}](s), Dv(s, X[u_{ε,t̄}](s)), D^2 v(s, X[u_{ε,t̄}](s))) } ds ].

Using the uniform continuity of the functions involved, we can divide by t̄ − t and pass to the limit t̄ ↓ t to get the result. □

Theorem 21. [A verification theorem] Let v ∈ C^{1,2}([0,T] × R^n) be a solution of (26). Then:

(i) v(t, x) ≤ J(t, x, u) for all u ∈ U_t and (t, x) ∈ [0,T] × R^n.

(ii) We have the following verification argument for open-loop controls: an admissible pair (x̄(·), ū(·)) is optimal for v(t, x) iff

∂_t v(t, x̄(t)) + Ĥ(t, x̄(t), ū(t), Dv(t, x̄(t)), D^2 v(t, x̄(t))) = 0.

(iii) We have the following verification argument for Markov (feedback) controls. Suppose that for each (t, x) there exists ū(t, x) ∈ U such that

Ĥ(t, x, ū(t, x), Dv(t, x), D^2 v(t, x)) = H(t, x, Dv(t, x), D^2 v(t, x)).

Suppose also that the equation

dX(s) = b(s, X(s), ū(s, X(s))) ds + σ(s, X(s), ū(s, X(s))) dW(s), X(t) = x, (28)

admits a unique solution for all (t, x). If s ∈ [t, T] → ū(s, X(s)) ∈ U is admissible, then it is optimal.
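As an illustration of (iii), consider a scalar linear-quadratic problem (all coefficients below are hypothetical choices, not from the notes): with the ansatz v(t, x) = P(t)x^2 + s(t), P solves a Riccati ODE and the minimizing feedback in Ĥ is ū(t, x) = −(bP(t)/r)x. The sketch below integrates the Riccati equation numerically and checks this feedback against a grid search over controls.

```python
import numpy as np

# Scalar LQ problem (made-up coefficients):
#   dX = (a X + b u) dt + sig dW,  cost E[ int (q X^2 + r u^2) dt + gT X(T)^2 ].
# Ansatz v(t,x) = P(t) x^2 + s(t), with Riccati ODE
#   P'(t) = -(2 a P + q - b^2 P^2 / r),  P(T) = gT.
a, b, sig, q, r, gT, T = 0.5, 1.0, 0.3, 1.0, 1.0, 2.0, 1.0

def riccati_P0(n=10000):
    # integrate P backwards from t = T with explicit Euler; returns P(0)
    P, dt = gT, T / n
    for _ in range(n):
        P += dt * (2 * a * P + q - b**2 * P**2 / r)
    return P

P0 = riccati_P0()
x = 1.5
Dv, D2v = 2 * P0 * x, 2 * P0  # Dv = 2 P x, D^2 v = 2 P

def Hhat(u):
    return q * x**2 + r * u**2 + Dv * (a * x + b * u) + 0.5 * sig**2 * D2v

# candidate optimal feedback from completing the square in u
u_star = -b * P0 * x / r
grid = np.linspace(u_star - 2, u_star + 2, 4001)
u_grid = grid[np.argmin([Hhat(u) for u in grid])]
assert abs(u_grid - u_star) < 1e-2
```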

Proof: Let u ∈ U_t with associated trajectory x(·), x(t) = x. Then we have

E[g(x(T))] = v(t, x) + E[ ∫_t^T { ∂_t v(s, x(s)) + A[u]v(s, x(s)) } ds ]
≥ v(t, x) + E[ ∫_t^T { ∂_t v(s, x(s)) + H(s, x(s), Dv(s, x(s)), D^2 v(s, x(s))) − ℓ(s, x(s), u(s)) } ds ],

which, using equation (26), proves (i). For (ii), the inequality becomes an equality, from which the result follows easily. Finally, (iii) is a direct consequence of (ii). □

Viscosity solutions

As we have seen, if the value function is regular enough, then it solves equation (26). This equation can be written in the abstract form

F(t, x, v(t, x), ∂_t v(t, x), Dv(t, x), D^2 v(t, x)) = 0, plus limit conditions. (29)

In general this equation does not admit a classical solution. However, we can hope to define a weak notion of solution for which (29) is well posed, i.e. has a unique solution. The correct notion is that of viscosity solutions (introduced in this context by Lions in [15, 16]).

However, we need to pay attention to the following fact: one of the crucial assumptions in the theory of viscosity solutions is that F(t, x, v, q, p, P) is monotone w.r.t. P. The usual convention for the type of monotonicity is that F is nonincreasing w.r.t. P. This is also called the ellipticity condition. In order to follow this convention, we rewrite (26) as

−∂_t v(t, x) + G(t, x, Dv(t, x), D^2 v(t, x)) = 0 for (t, x) ∈ [0,T) × R^n,
v(T, x) = g(x) for x ∈ R^n, (30)

where

G(t, x, p, P) := sup_{u∈U} { −ℓ(t, x, u) − b(t, x, u)^⊤ p − (1/2) Tr(σ(t, x, u)σ(t, x, u)^⊤ P) } = sup_{u∈U} −Ĥ(t, x, u, p, P).

Evidently, we have G(t, x, p, X) ≤ G(t, x, p, Y) if X ≥ Y, and thus our operator is elliptic. Of course, we have

G(t, x, Dv(t, x), D^2 v(t, x)) = sup_{u∈U} { −ℓ(t, x, u) − A[u]v(t, x) }.

[Definition of viscosity solutions] (i) We say that v ∈ C([0,T] × R^n) is a viscosity subsolution of (30) if

v(T, x) ≤ g(x),

and for any φ ∈ C^{1,2}([0,T] × R^n), whenever v − φ attains a local maximum at (t, x) we have

−∂_t φ(t, x) + G(t, x, Dφ(t, x), D^2 φ(t, x)) ≤ 0.

(ii) We say that v ∈ C([0,T] × R^n) is a viscosity supersolution of (30) if

v(T, x) ≥ g(x),

and for any φ ∈ C^{1,2}([0,T] × R^n), whenever v − φ attains a local minimum at (t, x) we have

−∂_t φ(t, x) + G(t, x, Dφ(t, x), D^2 φ(t, x)) ≥ 0.

(iii) We say that v ∈ C([0,T] × R^n) is a viscosity solution of (30) if it is both a viscosity subsolution and a viscosity supersolution.

[Some important remarks] The following properties are proposed as an exercise! (in increasing order of difficulty) (i) We can suppose that at the test point (t, x) we have v(t, x) = φ(t, x). (ii) We can replace “local maximum” and “local minimum” by “strict local maximum” and “strict local minimum”. (iii) We can replace “strict local maximum” and “strict local minimum” by “strict global maximum” and “strict global minimum”.

In our stochastic framework, it will be convenient to work with the last notion in order to avoid localization arguments with stopping times (see the proof of theorem 22).

 [The value function is a viscosity solution of (30)] We have

Theorem 22. The value function v is a viscosity solution of (30).

Proof: We essentially repeat the proof of the regular case, but taking into account the test functions in order to be able to differentiate. Let φ ∈ C^{1,2}([0,T] × R^n) and (t, x) be such that v − φ has a global maximum at (t, x). Let u ∈ U and define u(·) ∈ U_t by u(s, ω) ≡ u. Let us denote by x(·) the associated state, with x(t) = x. We have, for any t̄ > t,

"Z t¯ # E {v(s, x(s)) − φ(s, x(s)) − [v(t, x) − φ(t, x)]} ds ≤ 0. t

By Itô's formula, we obtain

E[ ∫_t^{t̄} { v(s, x(s)) − v(t, x) − ∂_t φ(s, x(s)) − A[u]φ(s, x(s)) } ds ] ≤ 0.

Using the dynamic programming principle, we get

E[ ∫_t^{t̄} { v(s, x(s)) − v(t, x) } ds ] ≥ E[ ∫_t^{t̄} −ℓ(s, x(s), u) ds ].

Therefore,

E[ ∫_t^{t̄} { −∂_t φ(s, x(s)) − ℓ(s, x(s), u) − A[u]φ(s, x(s)) } ds ] ≤ 0.

By taking the supremum w.r.t. u ∈ U we obtain

E[ ∫_t^{t̄} { −∂_t φ(s, x(s)) + G(s, x(s), Dφ(s, x(s)), D^2 φ(s, x(s))) } ds ] ≤ 0.

Dividing by t̄ − t and taking the limit t̄ ↓ t yields the subsolution property.

For the supersolution property, let φ ∈ C^{1,2}([0,T] × R^n) and (t, x) be such that v − φ has a global minimum at (t, x). Thus, for t̄ > t, we have, for every adapted x(·) starting at x,

"Z t¯ # E {v(s, x(s)) − φ(s, x(s)) − [v(t, x) − φ(t, x)]} ds ≥ 0. (31) t

Using the dynamic programming principle, choose u_{ε,t̄} ∈ U_t, with associated state x_{ε,t̄}, such that

E[ ∫_t^{t̄} { v(t, x) − v(s, x_{ε,t̄}(s)) } ds ] ≥ E[ ∫_t^{t̄} ℓ(s, x_{ε,t̄}(s), u_{ε,t̄}(s)) ds ] − ε(t̄ − t).

Combining with (31) we get

E[ ∫_t^{t̄} { −ℓ(s, x_{ε,t̄}(s), u_{ε,t̄}(s)) + φ(t, x) − φ(s, x_{ε,t̄}(s)) } ds ] ≥ −ε(t̄ − t).

By Itô's formula we obtain

E[ ∫_t^{t̄} { −∂_t φ(s, x_{ε,t̄}(s)) − ℓ(s, x_{ε,t̄}(s), u_{ε,t̄}(s)) − A[u_{ε,t̄}(s)]φ(s, x_{ε,t̄}(s)) } ds ] ≥ −ε(t̄ − t),

which implies that

E[ ∫_t^{t̄} { −∂_t φ(s, x_{ε,t̄}(s)) + G(s, x_{ε,t̄}(s), Dφ(s, x_{ε,t̄}(s)), D^2 φ(s, x_{ε,t̄}(s))) } ds ] ≥ −ε(t̄ − t).

Dividing by t¯− t, letting t¯ ↓ t and noting that ε is arbitrary, we obtain that v is a supersolution. 

 [Basic results] We first provide a proposition that implies that viscosity solutions qualify as generalized solutions:

Proposition 15. [Equivalence of notions under regularity] Let v ∈ C^{1,2}([0,T] × R^n). Then v is a viscosity solution of (30) iff it is a classical solution.

Proof: The proof of the subsolution (supersolution) property follows very easily from the first and second order optimality conditions for a maximum (minimum) of v − φ at some (t, x), together with the ellipticity of G. □

In fact, in order for v to satisfy the equation pointwise, we can require, instead of v ∈ C^{1,2}, that v admits a “first order expansion in t” and a second order expansion in x.

Lemma 8. [Relation with “second order expansions in x”] Suppose that for all (t, x) there exists X ∈ S^n (for notational convenience we write D^2 v(t, x) = X) such that

v(s, y) = v(t, x) + ∂_t v(t, x)(s − t) + Dv(t, x)(y − x) + (1/2)⟨X(y − x), y − x⟩ + o(|s − t| + |y − x|^2).

Then, if v is a viscosity subsolution (supersolution) of (30), we have that

−∂_t v(t, x) + G(t, x, Dv(t, x), D^2 v(t, x)) ≤ (≥) 0.

Proof: For the subsolution case, it is enough to test using the quadratic function

φ_{ε,δ}(s, y) := v(t, x) + (∂_t v(t, x) + ε)(s − t) + Dv(t, x)(y − x) + (1/2)⟨[D^2 v(t, x) + δI](y − x), y − x⟩.

In fact, we have that v(s, y) − φ_{ε,δ}(s, y) has a local maximum at (t, x). We thus find that

−(∂_t v(t, x) + ε) + G(t, x, Dv(t, x), D^2 v(t, x) + δI) ≤ 0,

and the result follows by letting ε, δ → 0. An analogous argument applies for the supersolution property. □

Now we turn our attention to an important and rather surprising stability result.

Proposition 16. [Stability result] Let vε be a solution of

−∂_t v_ε + G_ε(t, x, Dv_ε, D^2 v_ε) = 0 in (0,T) × R^n,
v_ε(T, x) = g_ε(x) in R^n, (32)

where, for every ε > 0, the functions G_ε and g_ε are continuous. Suppose that, as ε ↓ 0, we have G_ε → G, g_ε → g and v_ε → v uniformly over any compact set. Then v is a viscosity solution of

−∂_t v + G(t, x, Dv, D^2 v) = 0 in (0,T) × R^n,
v(T, x) = g(x) in R^n. (33)

Proof: The proof, as well as several arguments in this theory, is based on the basic Lemma 9 below. Let us prove that v is a subsolution of (33). Let φ ∈ C^{1,2} and (t, x) be such that v − φ has a strict maximum at (t, x). Fix a sequence ε_n ↓ 0 and set v_n := v_{ε_n}. Using the lemma below, we obtain the existence of (t_n, x_n) → (t, x) such that v_n − φ has a local maximum at (t_n, x_n). Using that v_n is a subsolution of (32), we have

−∂_t φ(t_n, x_n) + G_{ε_n}(t_n, x_n, Dφ(t_n, x_n), D^2 φ(t_n, x_n)) ≤ 0.

By passing to the limit, we obtain the result. The supersolution property follows by the same procedure. □

Lemma 9. Let v : O → R be continuous (where O is a domain in a Euclidean space), with a strict local maximum at x_0. Suppose that there exists a sequence v_n → v locally uniformly. Then there exists a sequence x_n of local maxima of v_n such that x_n → x_0.

Proof of the lemma: There exists δ > 0 such that v(x_0) > v(x) for all x ∈ B̄_δ(x_0) \ {x_0}. Therefore, since v_n → v uniformly on B̄_δ(x_0), for n large enough any maximum x_n of v_n over B̄_δ(x_0) belongs to B_δ(x_0). In order to prove that x_n → x_0, let x̄ be a limit point of (x_n). Then, by uniform convergence, v_n(x_n) → v(x̄) = v(x_0) (uniform convergence implies convergence of the maxima of the functions; to see this, note that v_n(x_n) ≥ v_n(z) for all z ∈ B̄_δ(x_0) and pass to the limit). Therefore, since x_0 is the unique maximum of v in B̄_δ(x_0), we get x̄ = x_0, and the proof is complete. □
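A numerical illustration of Lemma 9 (the functions below are hypothetical choices for the demo): v(x) = −x^2 has a strict maximum at x_0 = 0, and the perturbations v_n = v + sin(5x)/n converge to v uniformly on [−1, 1]; the maximizers of v_n approach x_0.

```python
import numpy as np

# Lemma 9 demo: v(x) = -x^2 has a strict (global) maximum at x0 = 0;
# v_n(x) = -x^2 + sin(5x)/n converges to v uniformly on [-1, 1].
xs = np.linspace(-1.0, 1.0, 200001)
x0 = 0.0

errs = []
for n in (10, 100, 1000, 10000):
    vn = -xs**2 + np.sin(5 * xs) / n
    xn = xs[np.argmax(vn)]        # maximizer of the perturbed function
    errs.append(abs(xn - x0))

# the maximizers of v_n converge to the strict maximum x0 of v
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))
assert errs[-1] < 1e-2
```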

Now we state the important issue of uniqueness of a viscosity solution for (30).

Theorem 23. [Uniqueness theorem] The value function is the unique viscosity solution of (30).

As an important corollary we obtain an error estimate for the vanishing viscosity approximation of (30). In fact, equation (30) can be degenerate and in general does not have regular solutions. A natural way of regularizing the equation is to consider

−∂_t v_ε(t, x) − ε∆v_ε(t, x) + G(t, x, Dv_ε(t, x), D^2 v_ε(t, x)) = 0 for (t, x) ∈ [0,T) × R^n,
v_ε(T, x) = g(x) for x ∈ R^n. (34)

In this case, due to the regularizing Laplacian, it can be shown that there exists a unique solution v_ε ∈ C^{1,2} of (34). The uniqueness result of Theorem 23 and Corollary 3 imply:

Proposition 17. [Vanishing viscosity] There exists a constant K > 0 such that

|v_ε(t, x) − v(t, x)| ≤ K√ε for all (t, x) ∈ [0,T] × R^n, ε > 0.

We will not prove exactly the uniqueness theorem. However, we will prove a comparison principle for a model second order HJB equation. Up to important technical matters (see [24]), the same procedure allows to prove the uniqueness result of a viscosity solution of (30).

Let O ⊆ R^n be an open and bounded set. Let us consider the equation

H(v(x), Dv(x), D^2 v(x)) = 0, x ∈ O, (35)

where H : R × R^n × S^n → R. We assume:

(i) H is continuous.

(ii) [Ellipticity condition] We suppose that H is non-increasing w.r.t. the last variable, i.e. for every A, B ∈ Sn, with A ≤ B we have

H(v, p, A) ≥ H(v, p, B) for all (v, p) ∈ R × R^n.

(iii) There exists γ > 0 such that for all v_1 ≥ v_2 we have

H(v_1, p, P) − H(v_2, p, P) ≥ γ(v_1 − v_2) for all (p, P) ∈ R^n × S^n.

The definition of subsolutions and supersolutions is of course analogous to the one given for the parabolic case. The result that we want to prove is the following:

Theorem 24. Let v and v̂ be a subsolution and a supersolution, respectively, of (35). If v ≤ v̂ on ∂O, then v ≤ v̂ in O.

Proof for the regular case: Let us assume that v, v̂ ∈ C^2(O) (or, more generally, that they admit second order expansions). We argue by contradiction. Suppose that M := sup_{x∈O}[v(x) − v̂(x)] = v(x_0) − v̂(x_0) > 0 for some x_0 ∈ O. Then,

Dv(x_0) = Dv̂(x_0) and D^2 v(x_0) ≤ D^2 v̂(x_0). (36)

Moreover, by Lemma 8, we must have

H(v(x_0), Dv(x_0), D^2 v(x_0)) ≤ 0, −H(v̂(x_0), Dv̂(x_0), D^2 v̂(x_0)) ≤ 0. (37)

By combining (36) and (37) and using the ellipticity property, we obtain

H(v(x_0), Dv̂(x_0), D^2 v̂(x_0)) − H(v̂(x_0), Dv̂(x_0), D^2 v̂(x_0)) ≤ 0,

which yields a contradiction with assumption (iii), since the left hand side is bounded below by γM > 0. □

For the proof of the general case, we will try to “mimic” the above proof. We will need to approximate v and v̂ in such a way that they become almost twice

differentiable. The good approximation is by semiconvex functions, and one way to construct it is to use the so-called inf-convolutions and sup-convolutions.

The following arguments and the proof of the uniqueness result are based on the notes [8].

Why are semiconvex functions the good regularization? We provide below two theorems that answer this question:

Theorem 25. [Alexandrov theorem] Let w be a semiconvex function with constant M over the open set O. Then, for a.a. x ∈ O, the function w admits a second order expansion, i.e. there exists X ∈ S^n (which depends on x) such that

w(y) = w(x) + Dw(x)(y − x) + (1/2)⟨X(y − x), y − x⟩ + o(|y − x|^2).

Moreover, we have that X ≥ −2M I_n.

Proof: See [10]. 

Theorem 26. [Jensen maximum principle] Let w : O → R be a semiconvex function with a strict local maximum at some x_0 ∈ O. More precisely, set α := w(x_0) − max_{∂B_r(x_0)} w > 0, where r > 0. For δ > 0, define

E_δ := { x ∈ B_r(x_0) ; ∃ p ∈ R^n, |p| ≤ δ, w(y) − ⟨p, y − x⟩ ≤ w(x) ∀ y ∈ B_r(x_0) }.

Then, if δ ∈ (0, α/r), the set E_δ has strictly positive Lebesgue measure. Moreover,

L(E_δ) ≥ cδ^n, for some constant c > 0.

Proof: See [13]. 

We will use both theorems in the following way: if a point x ∈ E_δ admits a second order expansion, then

|Dw(x)| ≤ δ, D^2 w(x) ≤ 0, and w(y) ≤ w(x) + ⟨Dw(x), y − x⟩ ∀ y ∈ B_r(x_0).

This remark allows us to prove the following proposition:

Proposition 18. Consider a semiconvex function w with a strict local maximum at x_0. Then there exists a sequence x_n → x_0 such that w has a second order expansion at each x_n and

Dw(x_n) → 0 and D^2 w(x_n) ≤ 0.

Proof: It suffices to consider points x_n ∈ E_{1/n} such that w has a second order expansion at x_n. The announced properties then follow from the above remark, except for the convergence of x_n to x_0. But

w(x_0) − ⟨Dw(x_n), x_0 − x_n⟩ ≤ w(x_n),

which, by passing to the limit and using that x_0 is a strict local maximum, implies that every limit point of (x_n) equals x_0. The result follows. □

Now we state a version of the above proposition that does not require a strict local maximum.

Proposition 19. Consider a semiconvex function w with a local maximum at x_0. Then there exist a matrix X ∈ S^n and a sequence x_n → x_0 such that w has a second order expansion at each x_n and

Dw(x_n) → 0 and D^2 w(x_n) → X ≤ 0.

Proof: Apply Proposition 18 to the semiconvex function w_k(x) := w(x) − (1/k)|x − x_0|^2 and use a diagonal procedure (exercise!). □

Given a compact set K ⊆ R^n and a continuous function v : K → R, we define for α > 0 its sup-convolution v^α as

v^α(x) := sup{ v(y) − (1/α)|x − y|^2 ; y ∈ K } for all x ∈ K.

Analogously, for a continuous function v : K → R, we define for α > 0 its inf-convolution v_α as

v_α(x) := inf{ v(y) + (1/α)|x − y|^2 ; y ∈ K } for all x ∈ K.

Note that v_α = −(−v)^α, which allows us to extend properties of the sup-convolution to the inf-convolution.
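A short numerical sketch of the sup-convolution (the function v(x) = −|x| on K = [−1, 1] is a hypothetical example chosen for the demo): we check that v^α ≥ v and that v^α → v as α ↓ 0; for this particular v one can compute v^α(x) = −|x| + α/4 for |x| ≥ α/2, so the uniform error is exactly α/4.

```python
import numpy as np

# Sup-convolution demo (v and K are made-up choices):
#   v(x) = -|x| on K = [-1, 1],
#   v^alpha(x) = sup_{y in K} [ v(y) - |x - y|^2 / alpha ].
ys = np.linspace(-1.0, 1.0, 20001)

def v(x):
    return -np.abs(x)

def sup_conv(x, alpha):
    # brute-force supremum over a fine grid of K
    return np.max(v(ys) - (x - ys) ** 2 / alpha)

xs = np.linspace(-1.0, 1.0, 41)
errs = []
for alpha in (1.0, 0.1, 0.01):
    va = np.array([sup_conv(x, alpha) for x in xs])
    assert np.all(va >= v(xs) - 1e-12)       # v^alpha dominates v
    errs.append(np.max(np.abs(va - v(xs))))  # uniform distance to v

# v^alpha -> v as alpha decreases; here the error equals alpha/4
assert errs[0] > errs[1] > errs[2]
assert abs(errs[2] - 0.01 / 4) < 1e-4
```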

[Fundamental properties of the sup-convolution]

Lemma 10. [v^α is semiconvex and Lipschitz] For all α > 0, v^α is Lipschitz in K, and semiconvex with constant 1/α in the interior of K.

Proof: The Lipschitz property follows easily from the fact that K is compact. On the other hand, note that

v^α(x) + (1/α)|x|^2 = sup_{y∈K} { v(y) − (1/α)|y|^2 + (2/α)⟨x, y⟩ },

which, being a supremum of affine functions of x, is convex; the semiconvexity (with constant 1/α) follows. □

Lemma 11. [A convergence result for v^α] For all α > 0, we have v^α ≥ v. Also, v^α(x) → v(x) for all x ∈ K as α ↓ 0. Moreover,

lim_{α↓0, x_α→x} v^α(x_α) = v(x). (38)

Proof: Taking y = x in the definition of v^α we get v^α(x) ≥ v(x). Now, we prove that

lim sup_{α↓0, x_α→x} v^α(x_α) ≤ v(x). In fact, by compactness we have the existence of y_α ∈ K such that

v^α(x_α) = v(y_α) − (1/α)|x_α − y_α|^2 ≤ M − (1/α)|x_α − y_α|^2,

for M := sup_K |v|. This implies that y_α → x, because

(1/α)|x_α − y_α|^2 ≤ M − v^α(x_α) ≤ M − v(x_α) ≤ 2M.

Consider a subsequence such that lim v^{α_n}(x_{α_n}) = lim sup_{α↓0, x_α→x} v^α(x_α). We have

lim v^{α_n}(x_{α_n}) ≤ lim v(y_{α_n}) = v(x).

We use this result in order to prove (38). In fact,

v(x) = lim v(x_α) ≤ lim inf v^α(x_α) ≤ lim sup v^α(x_α) ≤ v(x). □

Lemma 12. [v^α is still a subsolution on a smaller domain] If v is a subsolution of (35), then v^α is a subsolution of the same equation, but on the open set

O_{α,k} := { x ∈ O ; v^α(x) > −k, d(x, ∂O) > [α(sup_O v + k)]^{1/2} }.

Proof: See [8]. 

Finally, we will need another simple lemma whose easy proof is left to the reader.

Lemma 13. Consider a continuous function w : K → R and a family of continuous functions w^α : K → R such that w^α ≥ w and lim sup_{α↓0, x_α→x} w^α(x_α) = w(x). Then for all ε > 0 there exists α_0 > 0 such that, for all α ∈ (0, α_0) and every maximum point x_α of w^α, there exists a maximum point x of w such that |x − x_α| ≤ ε. Moreover,

lim_{α↓0} max_K w^α = max_K w.

Proof of the comparison principle: We argue by contradiction. If the result is not true, since we have v ≤ v̂ on ∂O, we have the existence of x̂ ∈ O such that v(x̂) > v̂(x̂). Let us denote M := max_{x∈O}{v(x) − v̂(x)} > 0. As in the proof of the comparison principle for first order equations (see e.g. [3]), we double variables by introducing, for ε > 0,

w_ε(x, y) := v(x) − v̂(y) − (1/ε^2)|x − y|^2,

which we know “approximates well” the function v − v̂ (see e.g. [3]). By continuity and convergence we have max_{O×O} w_ε > 0 for ε small enough. Also, there exists θ > 0 (independent of ε) such that for every maximum point (x_ε, y_ε) of w_ε we have

d(x_ε, ∂O) > θ and d(y_ε, ∂O) > θ, for all ε small enough.

Now fix such an ε. To complete the proof we proceed in the following steps:

(i) We modify w_ε to

w_{ε,α}(x, y) := v^α(x) − v̂_α(y) − (1/ε^2)|x − y|^2 for all α > 0. (39)

This allows us to work with functions that admit a.e. a second order expansion, and to apply optimality conditions at a maximum point. In this way we try to proceed as if the original functions were differentiable. Using v^α is natural because we know that it is a subsolution over a smaller domain O^α. Analogously, using v̂_α is natural since it is a supersolution over a smaller domain O_α.

(ii) We construct O^α and O_α in such a way that any maximum (x_α, y_α) of w_{ε,α} satisfies x_α ∈ O^α and y_α ∈ O_α. In fact, to do this we need the following lemma (whose easy proof is left to the reader):

Lemma 14. Let f^α and f be u.s.c. over a compact set K. Suppose that

f^α ≥ f and lim sup_{α↓0, x_α→x} f^α(x_α) = f(x).

Then, for every ε > 0, there exists α_0 > 0 such that: for all α ∈ (0, α_0) and x_α ∈ argmax f^α there exists x ∈ argmax f such that |x − x_α| ≤ ε. Moreover,

lim_{α↓0} max_K f^α = max_K f.

Using the above lemma we see that, for α small enough, (x_α, y_α) is uniformly close to some (x_ε, y_ε) ∈ argmax w_ε. In particular, d(x_α, ∂O) > θ/2 and d(y_α, ∂O) > θ/2. The same lemma implies that

lim_{α↓0} max_{O×O} w_{ε,α} = max_{O×O} w_ε.

From the definition in terms of v^α and v̂_α and the fact that

v^α(x_α) − v̂_α(y_α) − (1/ε^2)|x_α − y_α|^2 = max w_{ε,α} > 0,

we readily obtain that v^α(x_α) and v̂_α(y_α) are bounded by a constant independent of (ε, α). Therefore, there exists a constant k such that

x_α ∈ O^α := { x ∈ O ; v^α(x) > −k ; d(x, ∂O) > [α(sup_O v + k)]^{1/2} },
y_α ∈ O_α := { x ∈ O ; v̂_α(x) < k ; d(x, ∂O) > [α(inf_O v̂ + k)]^{1/2} }.

(iii) We fix α and a maximum point (x̄, ȳ) ∈ O^α × O_α. We obtain information from this optimality thanks to the semiconvexity of w_{ε,α} and Proposition 19. In fact, there exists a sequence (x_n, y_n) → (x̄, ȳ) and a matrix A ∈ S^{2n}, with A ≤ 0, such that

Dw_{ε,α}(x_n, y_n) → 0 and D^2 w_{ε,α}(x_n, y_n) → A. (40)

(iv) In view of the particular structure of w_{ε,α} (a decoupled function plus a C^∞ function), we have an important piece of information: the fact that w_{ε,α} admits a second order expansion at (x_n, y_n) implies that v^α admits a second order expansion at x_n and v̂_α admits a second order expansion at y_n. Moreover, we have (in block notation)

Dw_{ε,α}(x_n, y_n) = ( Dv^α(x_n) − (2/ε^2)(x_n − y_n), −Dv̂_α(y_n) + (2/ε^2)(x_n − y_n) ),

D^2 w_{ε,α}(x_n, y_n) = [ D^2 v^α(x_n), 0 ; 0, −D^2 v̂_α(y_n) ] − (2/ε^2) [ I_n, −I_n ; −I_n, I_n ]. (41)

The convergence in (40) implies the existence of X, Y ∈ S^n such that:

Dv^α(x_n) → (2/ε^2)(x̄ − ȳ), Dv̂_α(y_n) → (2/ε^2)(x̄ − ȳ), (42)

and

A = [ X, 0 ; 0, −Y ] − (2/ε^2) [ I_n, −I_n ; −I_n, I_n ]. (43)

Testing with vectors of the form (z, z) ∈ R^n × R^n, we get X ≤ Y.

(v) Intuitively, we are almost done, because if we make the analogy with the regular case (recall that x̄ should play the role of ȳ), (42) is like Dv(x̄) = Dv̂(x̄) and (43) is like D^2 v(x̄) ≤ D^2 v̂(x̄). Let us finish the proof in the correct manner. Since v^α is a subsolution of (35) in O^α and it has a second order expansion at every x_n, we get

H(v^α(x_n), Dv^α(x_n), D^2 v^α(x_n)) ≤ 0.

The continuity of H implies

H(v^α(x̄), (2/ε^2)(x̄ − ȳ), X) ≤ 0.

Analogously, since v̂_α is a supersolution of (35) in O_α and it has a second order expansion at every y_n, we get

H(v̂_α(ȳ), (2/ε^2)(x̄ − ȳ), Y) ≥ 0.

Subtracting the inequalities we get

H(v^α(x̄), (2/ε^2)(x̄ − ȳ), X) − H(v̂_α(ȳ), (2/ε^2)(x̄ − ȳ), Y) ≤ 0.

After an easy calculation, using the ellipticity assumption (ii) (recall that X ≤ Y) and the monotonicity assumption (iii), we get

w_{ε,α}(x̄, ȳ) ≤ 0,

and we have a contradiction since w_{ε,α}(x̄, ȳ) > 0. □

References

[1] R.B. Ash. Basic probability theory. Wiley, NY, 1970.
[2] R.B. Ash. Real Analysis and Probability. Academic Press, NY, 1972.
[3] M. Bardi and I. Capuzzo Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Birkhäuser, 1996.
[4] R. Bellman. Dynamic Programming. Princeton Univ. Press, Princeton, New Jersey, 1957.
[5] P. Billingsley. Convergence of Probability Measures. Wiley, NY, 1968.
[6] B. Bouchard and N. Touzi. Weak dynamic programming principle for viscosity solutions. SIAM J. Control Optim., 49-3:948–962, 2011.
[7] L. Breiman. Probability. Addison-Wesley Publishing Company, Reading, MA, 1968.
[8] P. Cardaliaguet. Solutions de viscosité d'équations elliptiques et paraboliques non linéaires. Lecture Notes for the DEA program at Rennes, 2004.
[9] N. El Karoui. Les aspects probabilistes du contrôle stochastique. Lecture Notes in Math. 876, 1981.
[10] L.C. Evans and R.F. Gariepy. Measure theory and fine properties of functions. CRC Press, Boca Raton, FL, 1992. Studies in Advanced Mathematics.
[11] W.H. Fleming and H.M. Soner. Controlled Markov processes and viscosity solutions. Springer, New York, 1993.
[12] N. Ikeda and S. Watanabe. Stochastic differential equations and diffusion processes. Second Edition, North-Holland Publishing Co., Amsterdam, 1989.

[13] R. Jensen. The maximum principle for viscosity solutions of fully nonlinear second order partial differential equations. Arch. Ration. Mech. Anal., 101-1:1–27, 1988.
[14] I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus. Second Edition, Springer-Verlag, New York, 1991.
[15] P.-L. Lions. Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations. I. The dynamic programming principle and applications. Comm. Partial Differential Equations, 8(10):1101–1174, 1983.
[16] P.-L. Lions. Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations. Part 2: viscosity solutions and uniqueness. Comm. Partial Differential Equations, 8:1229–1276, 1983.
[17] P.A. Meyer. Probability and Potentials. Blaisdell Publishing Company, Waltham, Mass., 1966.
[18] L. Mou and J. Yong. A variational formula for stochastic controls and some applications. Pure and Applied Mathematics Quarterly, 3:539–567, 2007.
[19] K.R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York, 1967.
[20] H. Pham. Optimisation et contrôle stochastique appliqués à la finance, volume 61 of Mathématiques & Applications. Springer, Berlin, 2007.
[21] P. Protter. Stochastic integration and differential equations. Springer-Verlag, Berlin, 2nd edition, 2004.
[22] J. Steele. Stochastic calculus and financial applications. Springer-Verlag, New York, 2001.
[23] N. Touzi. Optimal Stochastic Control, Stochastic Target Problems, and Backward SDEs. Lecture Notes at the Fields Institute, 2010.

[24] J. Yong and X.Y. Zhou. Stochastic controls, Hamiltonian systems and HJB equations. Springer-Verlag, New York, Berlin, 2000.
