ON NUMERICAL STOCHASTIC OPTIMAL CONTROL VIA

BELLMAN’S DYNAMIC PROGRAMMING PRINCIPLE

PRINCE OSEI ABOAGYE

Master’s Program in Mathematical Sciences

APPROVED:

Michael Pokojovy, Ph.D., Chair

Sangjin Kim, Ph.D.

Thompson Sarkodie-Gyan, Ph.D.

Charles Ambler, Ph.D.
Dean of the Graduate School

Copyright ©

by

Prince Osei Aboagye

2018

ON NUMERICAL STOCHASTIC OPTIMAL CONTROL VIA

BELLMAN’S DYNAMIC PROGRAMMING PRINCIPLE

by

PRINCE OSEI ABOAGYE, B.A.

THESIS

Presented to the Faculty of the Graduate School of

The University of Texas at El Paso

in Partial Fulfillment

of the Requirements

for the Degree of

MASTER OF SCIENCE

Department of Mathematical Sciences

THE UNIVERSITY OF TEXAS AT EL PASO

August 2018

Acknowledgements

I give all the glory and honor to God for successfully taking me through my Master's degree. I am grateful to the Department of Mathematics at UTEP for this educational opportunity and to the professors who guided me and kept me on track. I also thank my committee for taking the time to attend my presentation and for their feedback; in particular, special thanks go to my mentor, Dr. Michael Pokojovy, for guiding me into an amazing career path in mathematics and for his lasting patience and understanding. My warm appreciation goes to my parents and siblings for their moral and emotional support. I would also like to express my heartfelt gratitude to my grandfather, Mr. George Ansanyi, who single-handedly funded my education from senior high school through my undergraduate degree. Grandpa, God richly bless you. Without them this wouldn't have happened; they make me feel blessed.

Abstract

In this work, we present an application of Stochastic Control Theory to Merton's portfolio optimization problem. The dynamic programming methodology is applied to reduce the whole problem to solving the well-known HJB (Hamilton-Jacobi-Bellman) equation that arises from Merton's portfolio optimization problem subject to the power utility function. Finally, a numerical method is proposed to solve the HJB equation and compute the optimal strategy. The numerical solutions are compared with the explicit solutions for the optimal consumption and investment control policies.

Table of Contents

Page

Acknowledgements ...... iv

Abstract ...... v

Table of Contents ...... vi

List of Figures ...... x

1 Introduction ...... 1

2 Stochastic Calculus ...... 5

2.1 Preliminaries ...... 5

2.1.1 Probability Space ...... 5

2.1.2 Random Variable ...... 5

2.1.3 Brief Sketch of Lebesgue’s Integral ...... 6

2.1.4 Convergence Concepts for Random Variables ...... 9

2.1.5 The Lebesgue-Stieltjes Integral ...... 11

2.2 Stochastic Processes and Brownian Motion ...... 11

2.2.1 Discrete Stochastic Processes ...... 12

2.2.2 Continuous Stochastic Processes ...... 13

2.2.3 Martingales ...... 14

2.2.4 Stopping Times and Optional Stopping ...... 15

2.2.5 The Wiener Process and White Noise ...... 16

2.2.6 Existence: A Multiscale Construction ...... 18

2.2.7 White Noise ...... 19

2.3 The Stochastic Integral ...... 20

2.3.1 Some Elementary Properties ...... 22

2.3.2 The ItˆoCalculus ...... 23

2.3.3 Girsanov’s Theorem ...... 24

2.3.4 The Martingale Representation Theorem ...... 25

3 Stochastic Control and Dynamic Programming ...... 27

3.1 Stochastic Control Problems in Standard Form ...... 27

3.2 The Dynamic Programming Principle ...... 30

3.2.1 A weak Dynamic Programming Principle ...... 30

3.3 The Dynamic Programming Equation ...... 31

3.3.1 Continuity of the Value Function for Bounded Controls ...... 32

4 Optimal Stopping and Dynamic Programming ...... 33

4.1 Optimal Stopping Problems ...... 33

4.2 The Dynamic Programming Principle ...... 34

4.3 The Dynamic Programming Equation ...... 35

4.4 Regularity of the Value Function ...... 35

4.4.1 Finite Horizon Optimal Stopping ...... 35

5 Solving Control Problems by Verification ...... 36

5.1 The Verification Argument for Stochastic Control Problems ...... 36

5.2 The Verification Argument for Optimal Stopping Problems ...... 39

6 Introduction to Viscosity Solutions ...... 41

6.1 Intuition Behind Viscosity Solutions ...... 41

6.2 Definition of Viscosity Solutions ...... 42

6.3 First Properties ...... 43

6.4 Comparison Result and Uniqueness ...... 44

6.4.1 Comparison of Classical Solutions in a Bounded Domain ...... 45

6.4.2 Semijets Definition of Viscosity Solutions ...... 45

6.4.3 The Crandall-Ishii’s lemma ...... 47

6.4.4 Comparison of Viscosity Solutions in a Bounded Domain ...... 47

6.5 Comparison in Unbounded Domains ...... 48

7 Finite Difference Numerical Approximations ...... 49

7.1 Introduction ...... 49

7.2 Controlled Discrete Time Markov Chains ...... 49

7.3 Finite Difference Approximations to HJB Equations ...... 52

7.4 Convergence of Finite Difference Approximations ...... 60

8 Application: Merton’s Portfolio Optimization Problem ...... 62

8.1 The Merton’s Portfolio Optimization Problem and the HJB Equation . . . 63

8.2 The HJB Equation ...... 64

8.2.1 Utility Function ...... 66

9 Numerical Results ...... 69

9.1 Brief Summary of the Steps Involved in Obtaining the HJB Equations Associated with the Merton's Portfolio Problem ...... 69

10 Conclusions and Further Directions ...... 76

Bibliography ...... 77

Appendix ...... 79

Curriculum Vitae ...... 88

List of Figures

9.1 Comparing theoretical and numerical HJB...... 73

9.2 Numerical approximation of the optimal investment and consumption strategies ...... 74

9.3 Wealth process ...... 75

Chapter 1: Introduction

The main thrust of Stochastic Control Theory is to determine an optimal control strategy for a controlled Markovian diffusion with the goal of optimizing a certain criterion of interest, such as maximizing the expected discounted utility of an investment strategy or the expected policy return of a learning strategy, or minimizing the average costs or the mean/median risk of a business operation or the discrepancy from a desired trajectory of a self-driving vehicle.

High-frequency data commonly occur in various areas of science and engineering, ranging from mathematical finance (high-frequency trading), data science and computer science to quality engineering (control charts) and geological sciences (analyzing seismic data). In many cases, the underlying data generating process can adequately be modeled as a Markovian diffusion, i.e., a solution to a stochastic differential equation from a wide class.

The canonical application area of Markovian diffusions is mathematical finance. Indeed, Markovian diffusions or stochastic differential equations are a hypernym for a whole class of stochastic models such as the standard geometric Brownian motion, the CEV model, the SABR model, the (continuous) GARCH model, the Vasicek model, the Hull-White model, etc. (only to name a few). Due to their generality, Markovian diffusions are further applied to

1. approximate computation of the ARL (average run length) and control limits for CUSUM- and EWMA-type control charts,

2. designing and assessing sequential tests in bio-statistical applications,

3. describing the limiting behavior of HMM (Hidden Markov Models),

4. approximating the limiting behavior of time-discrete econometric models (GARCH, ARMA, etc.),

5. design of experiments for temporal data,

6. longitudinal and functional data analysis (FDA),

7. machine learning (reinforcement learning, temporal difference learning, etc.),

8. pattern recognition and discriminant analysis for geological data, etc.

Once the data are analyzed and the parameters of the underlying data generating process are estimated, further inference procedures such as predictive analysis, statistical tests, confidence interval or region construction, etc., can be performed by the analyst to facilitate the decision making procedure. In many cases, the temporal evolution of the process can be controlled through an external input. For example, consider a portfolio managed by a financial institution. The portfolio value is directly affected by the portfolio structure and the price of the assets in the portfolio. Whereas the former can be selected according to an investment strategy, the latter (to a large extent) is determined by the market and, thus, falls beyond one’s control.

Research in this field was pioneered by Bellman and Pontryagin. A large body of research on control theory has developed over recent years, inspired in particular by problems from mathematical finance. Applying the dynamic programming principle (DPP) to a stochastic control problem for Markov processes leads to a nonlinear partial differential equation (PDE), called the Hamilton-Jacobi-Bellman (HJB) equation, an approach pioneered by Bellman. These PDEs are named after Sir William Rowan Hamilton, Carl Gustav Jacobi and Richard Bellman [1].

Even parameter estimation for Markovian diffusions can often be reduced to optimal control problems [2], etc. Due to their complexity, only very few optimal control problems for stochastic differential equations can be solved explicitly. Thus, there is an increasing need

for fast, efficient and reliable numerical techniques for solving this kind of problem. The goal of this thesis is to present a numerical technique for solving the HJB equation and computing the optimal strategy arising from Merton's portfolio optimization problem.

In the classical Merton portfolio optimization problem, an investor endowed with an initial capital consumes a certain amount of the capital and invests the remaining wealth into the financial market [3]. The investor has two investment options: a riskless asset with a constant interest rate and a risky asset whose price is assumed to follow a geometric Brownian motion. Given a fixed investment time period, the investor's objective is to find an optimal consumption-investment strategy in order to maximize the expected utility of wealth at the terminal trading time and of intermediate consumption [3].

In his landmark paper (Merton 1969), Merton formulated the optimal investment and consumption problem as a stochastic optimal control problem which could be solved with the dynamic programming principle. As a result, it leads to the well-known Hamilton- Jacobi-Bellman (HJB) equation, potentially a fully nonlinear partial differential equation (PDE). The main difficulty of the Merton problem is the nonlinearity of the HJB equation, which makes it difficult to solve analytically [4].
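To fix ideas, a standard formulation of this problem (consistent with [3]; the precise dynamics used in this thesis are derived only in Chapter 8) reads as follows. With π_t the fraction of wealth invested in the risky asset and c_t the consumption rate, the wealth process X_t evolves as
\[ dX_t = \big[ r X_t + (\mu - r)\,\pi_t X_t - c_t \big]\, dt + \sigma\, \pi_t X_t\, dW_t, \]
and the investor maximizes the expected utility
\[ V(t, x) = \sup_{(\pi, c)} \mathbb{E}\left[ \int_t^T e^{-\rho (s-t)}\, U(c_s)\, ds + e^{-\rho (T-t)}\, U(X_T) \,\Big|\, X_t = x \right], \]
where r is the riskless rate, μ and σ are the drift and volatility of the risky asset, ρ is a discount rate, and U is the (power) utility function.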

Our proposed approach will be to derive the dynamic programming equations for the re- sulting Merton portfolio optimization problem and solve it numerically using the finite difference scheme.

This thesis is organized as follows. Chapter 2 gives a brief introduction to Stochastic Calculus, which serves as a fundamental tool throughout this thesis. In Chapter 3 we outline the basic structure of a stochastic optimization problem and the features it is formulated with, and we show how the Dynamic Programming Principle can be used to derive the HJB equation. In Chapter 4, our objective is to derive results similar to those obtained in the previous chapter for standard stochastic control problems in the context of optimal stopping problems. In Chapter 5, we look at how to verify that some "guess" of the value function is indeed equal to the unknown

value function. In Chapter 6, we introduce the concept of viscosity solutions to the HJB equation, since the derivation of the HJB equation assumes the solution to be smooth enough, which is not always the case; viscosity solutions are proposed to circumvent this difficulty. In Chapter 7, we give a brief introduction to the numerical solution of HJB partial differential equations using finite difference schemes. In Chapter 8, we present an application in the form of an asset allocation problem, namely Merton's portfolio optimization problem. Finally, in Chapter 9, we use the finite difference numerical scheme to solve the HJB equation derived from Merton's portfolio optimization problem and compare the numerical results with the explicit solutions for the optimal consumption and investment control policies. Our main contribution in this thesis is to numerically solve Merton's portfolio optimization problem via Bellman's Dynamic Programming Principle.

Chapter 2: Stochastic Calculus

In this Chapter we are closely following [5], [6], [7], [8] without claiming direct authorship. Our thrust is only to summarize.

2.1 Preliminaries

2.1.1 Probability Space

Definition 2.1. A triple (Ω, F,P ) is called a probability space provided Ω is a nonempty set, F is a σ-algebra of subsets of Ω, and P is a σ-additive measure on (Ω, F) with P [Ω] = 1 [5].

2.1.2 Random Variable

Definition 2.2. The Borel subsets of Rn, denoted B comprise the smallest σ-algebra of subsets of Rn containing all open sets [5].

Definition 2.3. Let (Ω, F,P ) be a probability space. A mapping X :Ω → Rn is called an n-dimensional random variable if for each B ∈ B, we have X−1(B) ∈ F. We equivalently say that X is F-measurable [5].

2.1.3 Brief Sketch of Lebesgue's Integral

The Lebesgue integral of a random variable X can be defined in three steps [6].

1. For a discrete (simple) random variable of the form
\[ X = \sum_{i=1}^{n} \alpha_i \mathbf{1}_{A_i}, \qquad \alpha_i \in \mathbb{R},\ A_i \in \mathcal{F}, \]
the integral of X is defined as
\[ \mathbb{E}[X] := \int_\Omega X(\omega)\, dP(\omega) := \sum_i \alpha_i\, P[A_i]. \]

2. Let ξ denote the set of all discrete random variables. Consider the set of all random variables which are monotone limits of discrete random variables, i.e., define
\[ \xi^* := \{ X : \exists\, u_1 \le u_2 \le \dots,\ u_n \in \xi,\ u_n \uparrow X \}. \]
Remark: any random variable X with X ≥ 0 satisfies X ∈ ξ*. For X ∈ ξ* define
\[ \int_\Omega X\, dP := \lim_{n\to\infty} \int_\Omega u_n\, dP. \]

3. For an arbitrary random variable X consider the decomposition X = X⁺ − X⁻ with
\[ X^+ := \sup(X, 0), \qquad X^- := \sup(-X, 0). \]
According to (2), X⁺, X⁻ ∈ ξ*. If either E[X⁻] < ∞ or E[X⁺] < ∞, define
\[ \int_\Omega X\, dP := \int_\Omega X^+\, dP - \int_\Omega X^-\, dP. \]
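As a simple numerical illustration of step 1 (not taken from the source): if A and B are disjoint events with P[A] = 1/4 and P[B] = 1/2, then
\[ X = 2\cdot\mathbf{1}_{A} + 5\cdot\mathbf{1}_{B} \quad\Longrightarrow\quad \mathbb{E}[X] = 2\cdot\tfrac{1}{4} + 5\cdot\tfrac{1}{2} = 3. \]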

Properties of the Lebesgue Integral:

• Linearity: \( \int_\Omega (\alpha X + \beta Y)\, dP = \alpha \int_\Omega X\, dP + \beta \int_\Omega Y\, dP. \)

• Positivity: X ≥ 0 implies \( \int_\Omega X\, dP \ge 0 \) and
\[ \int_\Omega X\, dP > 0 \iff P[X > 0] > 0. \]

• Monotone Convergence (Beppo Levi). Let (Xₙ) be a monotone sequence of random variables (i.e., Xₙ ≤ Xₙ₊₁) with X₁ ≥ C for some constant C. Then
\[ X := \lim_n X_n \in \xi^* \]
and
\[ \lim_{n\to\infty} \int_\Omega X_n\, dP = \int_\Omega \lim_{n\to\infty} X_n\, dP = \int_\Omega X\, dP. \]

• Fatou's Lemma.
(i) For any sequence (Xₙ) of random variables which are bounded from below, one has
\[ \int_\Omega \liminf_{n\to\infty} X_n\, dP \le \liminf_{n\to\infty} \int_\Omega X_n\, dP. \]
(ii) For any sequence (Xₙ) of random variables bounded from above, one has
\[ \int_\Omega \limsup_{n\to\infty} X_n\, dP \ge \limsup_{n\to\infty} \int_\Omega X_n\, dP. \]

• Jensen’s Inequality Let X be an integrable random variable with values in R and u : R → R a convex function. Then one has

u (E [X]) ≤ E [u (X)] .
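For instance (a standard illustration, not from the source), taking the convex function u(x) = x² gives E[X]² ≤ E[X²], i.e. Var(X) = E[X²] − E[X]² ≥ 0.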

Lᵖ-Spaces (1 ≤ p < ∞)

Lᵖ(Ω) denotes the set of all real-valued random variables X on (Ω, F, P) with E[|X|ᵖ] < ∞. For X ∈ Lᵖ, the Lᵖ-norm is defined as
\[ \|X\|_p := \left( \mathbb{E}[|X|^p] \right)^{1/p}. \]

The Lᵖ-norm has the following properties:

1. Hölder's Inequality. Given X ∈ Lᵖ(Ω) and Y ∈ L^q(Ω) with 1/p + 1/q = 1, one has
\[ \int_\Omega |X| \cdot |Y|\, dP \le \left( \int_\Omega |X|^p\, dP \right)^{1/p} \cdot \left( \int_\Omega |Y|^q\, dP \right)^{1/q} < \infty. \]
In particular, since |X · Y| ≤ |X| · |Y|, this implies X · Y ∈ L¹(Ω).

2. Lᵖ(Ω) is a normed vector space. In particular, X, Y ∈ Lᵖ implies X + Y ∈ Lᵖ and one has
\[ \|X + Y\|_p \le \|X\|_p + \|Y\|_p \quad \text{(triangle inequality)}. \]

3. L^q ⊂ Lᵖ for p < q.
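The last inclusion is specific to finite measure spaces; a one-line justification (a sketch, not in the source) follows from Hölder's inequality applied with the exponent q/p:
\[ \mathbb{E}[|X|^p] = \mathbb{E}[|X|^p \cdot 1] \le \left( \mathbb{E}[|X|^q] \right)^{p/q} \left( \mathbb{E}[1] \right)^{1 - p/q} = \left( \mathbb{E}[|X|^q] \right)^{p/q}, \]
so that ‖X‖ₚ ≤ ‖X‖_q whenever p < q.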

2.1.4 Convergence Concepts for Random Variables

Definition 2.4. Let (Xₙ)ₙ∈ℕ, X be random variables on (Ω, F, P).

1. The sequence (Xₙ) converges to X P-almost surely if
\[ P[\{\omega : X_n(\omega) \to X(\omega)\}] = 1. \]
We will then write Xₙ → X P-a.s.

2. The sequence (Xₙ) converges to X in probability if, for every ε > 0,
\[ \lim_{n\to\infty} P[|X_n - X| > \varepsilon] = 0. \]
We will then write P-lim Xₙ = X.

3. Let (Xₙ) be in Lᵖ(Ω) for some p ∈ [1, ∞). The sequence (Xₙ) converges to X in Lᵖ if
\[ \lim_{n\to\infty} \|X_n - X\|_p = \lim_{n\to\infty} \left( \mathbb{E}[|X_n - X|^p] \right)^{1/p} = 0. \]

Definition 2.5. The sequence (Xₙ) converges to X weakly (in distribution) if, for every continuous bounded function f : E → ℝ,
\[ \lim_{n\to\infty} \int_{\Omega_n} f(X_n)\, dP_n = \int_\Omega f(X)\, dP. \]
We will then write Xₙ →ᴰ X.

Proposition 2.6. Let X be a random variable and (Xₙ) be a sequence of random variables on the probability space (Ω, F, P). The following implications hold [7]:

1. Xₙ → X a.s. ⟹ Xₙ → X in probability.

2. Xₙ → X in Lᵖ ⟹ Xₙ → X in probability.

3. Xₙ → X in Lᵖ ⟹ Xₙ → X in L^q (q ≤ p).

4. Xₙ → X in probability ⟹ Xₙ → X in law.

Definition 2.7. The sequence (Xₙ) is called uniformly integrable if
\[ \lim_{C\to\infty} \sup_n \int_{\{|X_n| > C\}} |X_n|\, dP = 0. \]

Sufficient conditions for uniform integrability are the following:

1. supₙ E[|Xₙ|ᵖ] < ∞ for some p > 1,

2. there exists a random variable Y ∈ L¹ such that |Xₙ| ≤ Y P-a.s. for all n. Condition 2 is Lebesgue's "dominated convergence" condition.

2.1.5 The Lebesgue-Stieltjes Integral

Consider a real-valued random variable X on (Ω, F, P) and a Borel-measurable mapping f : ℝ → ℝ, i.e. we have
\[ (\Omega, \mathcal{F}, P) \xrightarrow{\ X\ } (\mathbb{R}, \mathcal{B}, P_X) \xrightarrow{\ f\ } \mathbb{R} \]
with
\[ P_X[B] := P[X^{-1}(B)] \quad \text{(the distribution of } X\text{)}, \]
\[ F_X(x) := P_X[(-\infty, x]] = P[X \le x] \quad \text{(the distribution function of } X\text{)}. \]

Then the (Lebesgue-Stieltjes) integral \( \int_{\mathbb{R}} f(x)\, dF_X(x) \) is well defined due to the following integral transformation formula:

Proposition 2.8.
\[ \int_\Omega f \circ X\, dP = \int_{\mathbb{R}} f\, dP_X = \int_{\mathbb{R}} f(x)\, dF_X(x). \]

Properties of F = F_X:

1. F is isotone, i.e. x ≤ y ⟹ F(x) ≤ F(y),

2. F is right continuous,

3. \( \lim_{x\to-\infty} F(x) = 0; \quad \lim_{x\to\infty} F(x) = 1. \)

2.2 Stochastic Processes and Brownian Motion

A stochastic process is a family of random variables {X(t), t ∈ τ} defined on a probability space (Ω, F, P) and indexed by a parameter t, where t varies over a set τ. If the set τ is discrete, the stochastic process is called discrete. If the set τ is continuous, the stochastic process is called continuous. The parameter t usually plays the role of time, and the random variables can be discrete-valued or continuous-valued at each value of t [8].

11 2.2.1 Discrete Stochastic Processes

Let τ = {t0, t1, t2,...} be a set of discrete times. Let each element of the sequence of random variables X(t0),X(t1),X(t2),... be defined on the sample space Ω. The sequence

{Xₙ} is said to be a Markov process if the future is independent of the past given the present. A discrete-valued Markov process is called a Markov chain.

Let P (Xn+1 = xn+1 | Xn = xn) define the one-step transition probabilities for a Markov chain. That is, P (Xn+1 = xn+1 and Xn = xn) = P (Xn+1 = xn+1 | Xn = xn)P (Xn = xn) . If the transition probabilities are independent of time tn, then the Markov chain is said to have stationary transition probabilities and the Markov chain is referred to as a homogeneous Markov chain [8].
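As a small illustration (not part of the thesis), the following Python sketch simulates a homogeneous Markov chain on two states from its one-step transition matrix; the next state is drawn using only the current state, reflecting the Markov property:

import numpy as np

def simulate_markov_chain(P, x0, n_steps, rng=None):
    """Simulate a homogeneous Markov chain with transition matrix P, starting at state x0."""
    rng = np.random.default_rng() if rng is None else rng
    states = [x0]
    for _ in range(n_steps):
        # the next state depends only on the current state (Markov property)
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return np.array(states)

# two-state example with stationary (time-independent) transition probabilities
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(simulate_markov_chain(P, x0=0, n_steps=20))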

Definition 2.9. Let (Ω, F,P ) be a probability space. A (discrete time) filtration is an increasing sequence {Fn} of σ − algebras F0 ⊂ F1 ⊂ · · · ⊂ F. The quadruple

(Ω, F, {Fn} ,P ) is called a filtered probability space [7].

Definition 2.10. Let (Ω, F, {Fn} ,P ) be a filtered probability space. A stochastic pro- cess {Xn} is called (Fn−) adapted if Xn is Fn−measurable for every n, and is called

(Fn − predictable) if Xn is Fn−1−measurable for every n [7].

Hence, if {Xn} is adapted, then it means that Xn is a measurement of something in the past or present (up to and including time n), while in the predictable case Xn represents a measurement of something in the past (before time n) [7].

Definition 2.11. Let (Ω, F,P ) be a probability space and {Xn} be a stochastic process. X The filtration generated by {Xn} is defined as Fn = σ {X0,...,Xn}, and the process {Xn} X is Fn −adapted by construction [7].

12 2.2.2 Continuous Stochastic Processes

Let {X(t), t ∈ τ} be a continuous stochastic process defined on the probability space (Ω, F, P), where τ = [0, T] is an interval in time and the process is defined at all instants of time in the interval. A continuous-time stochastic process is a function X : τ × Ω → ℝ of two variables t and ω, and may be discrete-valued or continuous-valued. In particular, X(t) = X(t, ·) is a random variable for each value of t ∈ τ, and X(·, ω) maps the interval τ into ℝ and is called a sample path, a realization, or a trajectory of the stochastic process for each ω ∈ Ω [8].

Remark 2.12. From this point, we will primarily work in continuous time t ∈ [0, ∞] .

Definition 2.13. Let Xt be a stochastic process on some filtered probability space

(Ω, F, {Ft} ,P ) and time set T ⊂ [0, ∞] . Then Xt is called adapted if Xt is Ft−measurable for all t, is measurable if the random variable X : T × Ω → R is B (T) × F−measurable, and is called progressively measurable if X : [0, t] ∩ T × Ω → R is

B ([0, t] ∩ T) × Ft−measurable for all t [7].

If Xt is Fn− adapted and measurable, then it is Ft−progressively measurable [7].

Lemma 2.14. Let the process Xt have continuous sample paths. Then the random vari- ables inf Xt, supXt , lim inf Xt and lim supXt are measurable [7]. t t t t

Definition 2.15. Let Xt be an a.s. nonnegative stochastic process with continuous sample   1 paths. Then lim inf Xt ≤ lim inf (Xt). If there is a Y ∈ L such that Xt ≤ Y a.s. E t t E   for all t, then E lim supXt ≤ lim supXt E (Xt). t t

13 2.2.3 Martingales

A martingale is a very special type of stochastic process.

Definition 2.16. (Martingale) Let (Ω, F, {F_t}, P) be a filtered probability space, and let M(·) be an F_t-adapted integrable process with values in a separable Banach space W. Then M(·) is said to be a martingale if, for all r, s ∈ [t, T], s ≤ r,
\[ \mathbb{E}[M(r) \mid \mathcal{F}_s] = M(s) \quad P\text{-a.s.} \]
If W = ℝ, we say that M(·) is a submartingale (respectively, supermartingale) if
\[ \mathbb{E}[M(r) \mid \mathcal{F}_s] \ge M(s) \quad (\text{respectively, } \mathbb{E}[M(r) \mid \mathcal{F}_s] \le M(s)) \quad P\text{-a.s.} \]

Lemma 2.17. (Doob’s decomposition). Let (Ω, F,P ) be a probability space, let Fn 1 be a filtration and let {Xn} be Fn− adapted with Xn ∈ L for every n.

Then Xn = X0 + An + Mn P-a.s., where {An} is Fn− predictable and {Mn} is an

Fn−martingale with M0 = 0. Moreover, this decomposition is unique [7].

Definition 2.18. Let {Mn} be a martingale and {An} be a predictable process. Then n P (A · M)n = Ak(Mk −Mk−1), the martingale transform of M by A, is again a martingale, k=1 1 provided that An and (A · M)n are in L for all n [7].

Lemma 2.19. (Doob's upcrossing lemma). Let {Mₙ} be a martingale, and denote by Uₙ(a, b) the number of upcrossings of a ≤ b up to time n: that is, Uₙ(a, b) is the number of times that M_k crosses from below a to above b before time n. Then we have
\[ \mathbb{E}(U_n(a, b)) \le \frac{\mathbb{E}\big((a - M_n)^+\big)}{b - a} \quad [7]. \]

14 Theorem 2.20. (Martingale convergence). Let {Mn} be an Fn−martingale such that one of the following hold:

(a) supₙ E(|Mₙ|) < ∞; or (b) supₙ E((Mₙ)⁺) < ∞; or (c) supₙ E((Mₙ)⁻) < ∞.
Then there exists an F_∞-measurable random variable M_∞ ∈ L¹, where F_∞ = σ{Fₙ : n = 1, 2, ...}, such that Mₙ → M_∞ a.s.

Theorem 2.21. Let M_t be a martingale, i.e., E(M_t | F_s) = M_s a.s. for any s ≤ t, and assume that M_t has continuous sample paths. If any of the following conditions hold:

(a) sup_t E(|M_t|) < ∞; or (b) sup_t E((M_t)⁺) < ∞; or (c) sup_t E((M_t)⁻) < ∞; then there exists an F_∞-measurable random variable M_∞ ∈ L¹ such that M_t → M_∞ a.s.

2.2.4 Stopping Times and Optional Stopping

Definition 2.22. An (Fn−) is a random time τ :Ω → {0, 1,..., ∞} such that {ω ∈ Ω: τ (ω) ≤ n} ∈ Fn for every n.

Using the notion of a stopping time, we can define stopped processes as:

Definition 2.23. Let {Xn} be a stochastic process and τ < ∞ be a stopping time. Then

X_τ denotes the random variable X_{τ(ω)}(ω): i.e., this is the process Xₙ evaluated at τ. For any stopping time τ, the stochastic process X′ₙ(ω) = X_{n∧τ(ω)}(ω) is called the stopped process: i.e., X′ₙ = Xₙ for n < τ, and X′ₙ = X_τ for n ≥ τ [7].

Definition 2.24. Let (Fn) be a filtration and let τ be a stopping time. By definition,

Fτ = {A ∈ F∞ : A ∩ {τ ≤ n} ∈ Fn for all n} is the σ − algebra of events that occur before time τ (recall that F∞ = σ {Fn : n = 1, 2,...}) . If τ < ∞ a.s., then Xτ is well defined and

Fτ − measurable [7].

Let Xn be a martingale (or a super- or submartingale). By the above representation for the stopped process, it is evident that even the stopped process is a martingale (or super- or submartingale, respectively) [7].

15 Lemma 2.25. If Mn is a martingale (or super-, submartingale) and τ is a stopping time, then Mn∧τ is again a martingale (or supermartingale, submartingale, respectively) [7].

Theorem 2.26. (Optional stopping). Let Mn be a martingale, and let τ < ∞ be a stopping time. Then E (Mτ ) = E (M0) holds under any of the following conditions: (a) τ < K a.s. for some K ∈ N;

(b) |Mn| ≤ K for some K ∈ [0, ∞] and all n;

(c) |Mn − Mn−1| ≤ K a.s. for some K ∈ [0, ∞] and all n, and E (τ) < ∞. If Mn is a supermartingale, then under the above conditions E (Mτ ) ≤ E (M0) [7].

2.2.5 The Wiener Process and White Noise

Brownian motion is usually described as the limit of a as the time step and mean square displacement per time step converge to zero. In this section we will see that this limit actually coincides with a well defined stochastic process called the Wiener process and we will study its most important properties [7].

2.2.5.1 Basic properties and Uniqueness

The Wiener process is the limit as N → ∞ of the random walk

\[ x_t(N) = \sum_{n=1}^{\lfloor Nt \rfloor} \frac{\xi_n}{\sqrt{N}}, \]

where ξn are i.i.d. random variables with zero mean and unit variance.

Lemma 2.27. (Finite dimensional distributions). For any finite set of times t1 < t2 < ··· < tn, n < ∞ , the n−dimensional random variable (xt1 (N), . . . , xtn (N)) converges in law as N → ∞ to an n−dimensional random variable (xt1 , . . . , xtn ) such that xt1 , xt2 − xt1 , . . . , xtn − xtn−1 , are independent Gaussian random variables with zero mean and variance t1, t2 − t1, . . . , tn − tn−1 , respectively.
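A minimal Python sketch (not part of the thesis) of the scaled random walk x_t(N) defined above; for large N its increments over disjoint time intervals are approximately independent Gaussians with the variances stated in the lemma:

import numpy as np

def random_walk_path(N, T=1.0, rng=None):
    """Scaled random walk x_t(N) = sum_{n <= Nt} xi_n / sqrt(N) on [0, T]."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.choice([-1.0, 1.0], size=int(N * T))   # i.i.d. steps: zero mean, unit variance
    return np.concatenate(([0.0], np.cumsum(xi) / np.sqrt(N)))

# increments over disjoint intervals are (approximately) independent N(0, t2 - t1) variables
path = random_walk_path(N=10_000)
print(path[5_000] - path[0], path[10_000] - path[5_000])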

16 Definition 2.28. A stochastic process Wt is called a Wiener process if

1. the finite dimensional distributions of Wt are those of Lemma 2.27; and

2. the sample paths of Wt are continuous.

1 n An Rn−valued process Wt = (Wt ,...,Wt ) is called an n−dimensional Wiener process if 1 n Wt ,...,Wt are independent Wiener processes.

Proposition 2.29. (Uniqueness). If W_t and W′_t are two Wiener processes, then the C([0, ∞])-valued random variables W., W.′ : Ω → C([0, ∞]) have the same law [7].

W Given a Wiener process Wt, we can introduce its natural filtration Ft = σ {Ws : s ≤ t}.

Definition 2.30. Let Ft be a filtration. Then a stochastic process Wt is called an

Ft−Wiener process if Wt is a Wiener process, is Ft−adapted, and Wt − Ws is independent of Fs for any t > s [7].

Lemma 2.31. An Ft−Wiener process Wt is an Ft−martingale [7].

A Wiener process is also a Markov process.

Definition 2.32. An Ft−adapted process Xt is called an Ft−Markov process if we have

E (f (Xt) |Fs) = E (f (Xt) |Xs) for all t ≥ s and all bounded measurable functions f. When X the filtration is not specified, the natural filtration Ft is implied [7].

Lemma 2.33. An Ft−Wiener process Wt is an Ft−Markov process [7].

Lemma 2.34. With unit probability, the sample paths of a Wiener process Wt are non- differentiable at any rational time t [7].

Clearly, the sample paths of Brownian motion are very rough; certainly the of the Wiener process cannot be a sensible stochastic process [7]. It, therefore, confirms the

17 fact that white noise is not a stochastic process. Another measure of the irregularity of the sample paths of the Wiener process is their total variation [7]. For any real-valued function f(t), the total variation of f on the interval t ∈ [a, b] is defined as

\[ TV(f, a, b) = \sup_{k \ge 0} \ \sup_{(t_i) \in P(k, a, b)} \ \sum_{i=0}^{k} |f(t_{i+1}) - f(t_i)|, \]

where P (k, a, b) denotes the set of all partitions a = t0 < t1 < ··· < tk < tk+1 = b.

Lemma 2.35. With unit probability, TV (W., a, b) = ∞ for any a < b. In other words, the sample paths of the Wiener process are a.s. of infinite variation [7].

2.2.6 Existence: A Multiscale Construction

We are finally ready to construct a Wiener process.

Lemma 2.36. Let {Wt : t ∈ [0, 1]} be a stochastic process on the probability space (Ω, F,P ). 0 0 0 Then there exists a stochastic process {Wt : t ∈ [0, ∞]} on a probability space (Ω , F ,P ) for all t.

The second simplification is to make our random walks have continuous sample paths, unlike xt(N) which has jumps [7].

Lemma 2.37. Let fn (t) n = 1, 2,... be a sequence of continuous functions on t ∈ [0, 1] that converge uniformly to some function f (t) , i.e., sup | fn (t) − f (t)| → 0 as n → ∞. t∈[0,1] Then f (t) must be a continuous function [7].

Lemma 2.38. Let fn (t) n = 1, 2,... be a sequence of continuous functions on t ∈ [0, 1] , P such that sup | fn+1 (t) − fn (t)| < ∞. Then fn (t) converge uniformly to some continu- n t∈[0,1] ous function f (t) [7].

Theorem 2.39. There exists a Wiener process Wt on some probability space (Ω, F,P ) [7].

18 2.2.7 White Noise

White noise is generally defined as follows: it is a Gaussian “stochastic process” ξt with zero mean and covariance E (ξsξt) = δ (t − s) , where δ (·) is Dirac’s delta-“function” [7]. The delta function is defined by the relation

f (s) δ (s) ds = f (0) , ˆ where f is an element in a suitable space of test functions. The simplest space of test

∞ functions is the space C0 of smooth functions of compact support [7]. Obviously, ξt is not a stochastic process, since its covariance is not a function. However, we could think of ξt as an object whose sample paths are themselves generalized functions [7]. To make sense of this, we have to define the properties of white noise when integrated against a test function [7]. So let us integrate the defining properties of white noise against test functions:

E (ξ (f)) = 0 and

  (ξ (f)) ≡ f (s) ξsds g (t) ξtdt E E R+ R+ ´ ´ = f (s) g (t) δ (t − s) dsdt R+×R+ ´ = f (t) g (t) dt R+ ´ ≡ hf, gi .

In addition, the fact that ξt is a Gaussian “process” implies that ξ (f) should be a Gaussian random variable for any test function f. So we can now define white noise as a generalized

stochastic process: it is a random linear functional ξ on C₀^∞ such that ξ(f) is Gaussian, E(ξ(f)) = 0 and E(ξ(f) ξ(g)) = ⟨f, g⟩ for every f, g ∈ C₀^∞. Given a Wiener process W_t, the stochastic integral
\[ \xi(f) = \int_0^\infty f(t)\, dW_t, \qquad f \in C_0^\infty, \]
satisfies the definition of white noise as a generalized stochastic process.

19 ∞ Lemma 2.40. The stochastic integral of f ∈ C0 with respect to the Wiener process Wt (as defined through integration by parts) is a white noise functional [7].

2.3 The Stochastic Integral

Let (Ω, F, {F_t}_{t∈[0, ∞]}, P) be a filtered probability space and W_t an F_t-Wiener process. We are going to define stochastic integrals with respect to W_t.

Lemma 2.41. Let X. ∈ L²(μ_T × P), and suppose there exists a sequence of F_t-adapted simple processes X.ⁿ ∈ L²(μ_T × P) such that
\[ \|X^n_\cdot - X_\cdot\|^2_{2,\,\mu_T\times P} = \mathbb{E}\left[ \int_0^T (X^n_t - X_t)^2\, dt \right] \longrightarrow 0 \quad (n \to \infty). \]

Then I (X.) can be defined as the limit in L2 (P ) of the simple integrals I (X.n) , and the definition does not depend on the choice of simple approximations X.n .

Lemma 2.42. Let X. ∈ L²(μ_T × P) be F_t-adapted. Then there exists a sequence of F_t-adapted simple processes X.ⁿ ∈ L²(μ_T × P) such that ‖X.ⁿ − X.‖_{2,μ_T×P} → 0.

Definition 2.43. (Elementary Itˆointegral). Let Xt be any Ft−adapted process in 2 2 L (µT × P ). Then the Itˆointegral I (X.) , defined as the limit in L (P ) of simple integrals n n I (X. ) , exists and is unique (i.e., is independent of the choice of Xt ).

2.3.0.1 Continuous sample paths

Let Xₜⁿ be an F_t-adapted simple process in L²(μ_T × P) with jump times tᵢ. For any time t ≤ T, we define the simple integral
\[ I_t(X^n_\cdot) = \int_0^t X^n_s\, dW_s = \int_0^T X^n_s\, \mathbf{1}_{s\le t}\, dW_s = \sum_{i=0}^{N} X^n_{t_i} \left( W_{t_{i+1}\wedge t} - W_{t_i\wedge t} \right). \]

20 n The stochastic process It(X. ) has continuous sample paths; this follows immediately from the fact that the Wiener process has continuous sample paths [7].

n Lemma 2.44. It(X. ) is an Ft−martingale [7].

Lemma 2.45. Let X_t be an F_t-adapted process in L²(μ_T × P). Then the Itô integral I_t(X.), t ∈ [0, T], can be chosen to have continuous sample paths [7].

2.3.0.2 Localization

T 2 Lemma 2.46. For any Ft−adapted process X. ∈ T <∞ L (µT × P ) , we can define uniquely the Itˆointegral It(X.) as an Ft−adapted stochastic process on [0, ∞] with con- tinuous sample paths [7].

T 2 Lemma 2.47. Let Xt be an Ft−adapted process in T <∞ L (µT × P ) , and let τ be an

Ft−stopping time. Then It∧τ (X.) = It (X.I.¡τ ) [7].

Lemma 2.48. Let Xt be an Ft−adapted process which admits a localizing sequence τn.

Then It(X.) is uniquely defined as an Ft−adapted stochastic process on [0, ∞] with con- tinuous sample paths and is independent of the choice of localizing sequence [7].

Definition 2.49. (Itô integral). Let X_t be any F_t-adapted stochastic process with
\[ P\left[ \int_0^T X_t^2\, dt < \infty \right] = 1 \quad \text{for all } T < \infty. \]
Then the Itô integral
\[ I_t(X_\cdot) = \int_0^t X_s\, dW_s \]
is uniquely defined, by localization and the choice of a continuous modification, as an

Ft−adapted stochastic process on [0, ∞] with continuous sample paths [7].

21 2.3.1 Some Elementary Properties

Lemma 2.50. (Linearity). Let Xt and Yt be Itˆointegrable processes, and let α, β ∈ R.

Then It (αX. + βY.) = αIt (X.) + βIt(Y.) [7].

Lemma 2.51. Let Xt be Itˆointegrable and let τ be an Ft−stopping time. Then

t∧τ t Xs dWs = XsIs¡τ dWs. ˆ0 ˆ0

Lemma 2.52. Let X. ∈ ⋂_{T<∞} L²(μ_T × P). Then for any T < ∞
\[ \mathbb{E}\left[ \int_0^T X_t\, dW_t \right] = 0, \qquad \mathbb{E}\left[ \left( \int_0^T X_t\, dW_t \right)^{2} \right] = \mathbb{E}\left[ \int_0^T X_t^2\, dt \right], \]

and moreover It (X.) is an Ft−martingale [7].
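The following Python sketch (an illustration only, not part of the source) approximates the Itô integral of the adapted integrand X_t = W_t on [0, T] by the left-endpoint sums used in the construction above, and checks by Monte Carlo the zero-mean property and the Itô isometry of Lemma 2.52, for which E[∫₀ᵀ W_t² dt] = T²/2:

import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 1_000, 20_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1) - dW              # W_{t_i} at the left endpoint of each step
ito_integral = np.sum(W * dW, axis=1)       # sum_i X_{t_i} (W_{t_{i+1}} - W_{t_i}) with X = W

print(ito_integral.mean())                  # approximately 0 (zero-mean / martingale property)
print((ito_integral ** 2).mean(), T**2 / 2) # Itô isometry: approximately E[int_0^T W_t^2 dt] = T^2/2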

n 2 n Corollary 2.53. If X. → X. in L (µT × P ) , then It (X. ) → It (X.) in L (P ) . Moreover, n if the convergence is fast enough, then It (X. ) → It (X.) a.s. [7].

Definition 2.54. An Ft−measurable process Xt is called an Ft− if there exists a sequence of Ft−stopping times τn % ∞such that Xt∧τn is a martingale for every n. The sequence τn is called a reducing sequence for Xt [7].

Lemma 2.55. Any Itˆointegral It (X.) is a local martingale [7].

22 2.3.2 The ItˆoCalculus

Let us work on a filtered probability space (Ω, F, {F_t}_{t∈[0, ∞]}, P) on which we have defined an m-dimensional F_t-Wiener process W_t = (W_t¹, ..., W_tᵐ) (i.e., the W_tⁱ are independent F_t-Wiener processes). We consider F_t-adapted processes X¹, ..., Xⁿ of the form
\[ X^i_t = X^i_0 + \int_0^t F^i_s\, ds + \sum_{j=1}^{m} \int_0^t G^{ij}_s\, dW^j_s, \]
where F_sⁱ, G_s^{ij} are F_t-progressively measurable processes that satisfy
\[ \int_0^t |F^i_s|\, ds < \infty, \qquad \int_0^t (G^{ij}_s)^2\, ds < \infty \quad \text{a.s.} \quad \forall\, t < \infty,\ \forall\, i, j. \]
We call X_t = (X_t¹, ..., X_tⁿ) an n-dimensional Itô process [7].

1 n Definition 2.56. A process Xt = (Xt ,...,Xt ) satisfying the above conditions is called an n−dimensional Itˆoprocess. It is also denoted as

t t Xt = X0 + Fsds + GsdWs. ˆ0 ˆ0

Theorem 2.57. (Itô rule). Let u : [0, ∞] × ℝⁿ → ℝ be a function such that u(t, x) is C¹ with respect to t and C² with respect to x. Then u(t, X_t) is an Itô process itself:
\[ u(t, X_t) = u(0, X_0) + \sum_{i=1}^{n}\sum_{k=1}^{m} \int_0^t u_i(s, X_s)\, G^{ik}_s\, dW^k_s + \int_0^t \Big\{ u'(s, X_s) + \sum_{i=1}^{n} u_i(s, X_s)\, F^i_s + \tfrac{1}{2} \sum_{i,j=1}^{n}\sum_{k=1}^{m} u_{ij}(s, X_s)\, G^{ik}_s G^{jk}_s \Big\}\, ds, \]
where we have written u'(t, x) = ∂u(t, x)/∂t and u_i(t, x) = ∂u(t, x)/∂x_i [7].

23 Remark 2.58. (Itˆodifferentials). We will often use another notation for the Itˆoprocess, particularly when dealing with stochastic differential equations [7]:

dXt = Ftdt + GtdW t.
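As a standard one-dimensional illustration of the Itô rule (not taken verbatim from the source), consider the geometric Brownian motion dX_t = μX_t dt + σX_t dW_t, i.e. F_t = μX_t and G_t = σX_t, and u(t, x) = ln x. Then
\[ d\,\ln X_t = \left( \mu - \tfrac{1}{2}\sigma^2 \right) dt + \sigma\, dW_t, \qquad \text{so that} \qquad X_t = X_0\, \exp\!\big( (\mu - \tfrac{1}{2}\sigma^2)\, t + \sigma W_t \big), \]
which is the model used later for the risky asset price in Merton's problem.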

2.3.3 Girsanov’s Theorem

Girsanov’s theorem, tells us what happens to the Wiener process under a change of measure.

Theorem 2.59. (Girsanov). Let W_t be an m-dimensional F_t-Wiener process on the probability space (Ω, F, {F_t}_{t∈[0, ∞]}, P), and let X_t be an Itô process of the form
\[ X_t = \int_0^t F_s\, ds + W_t, \qquad t \in [0, T]. \]
Suppose furthermore that F_t is Itô integrable, and define
\[ \Lambda = \exp\left( -\int_0^T (F_s)^*\, dW_s - \frac{1}{2} \int_0^T \|F_s\|^2\, ds \right) \]
(where (F_s)^* dW_s = F_s^1 dW_s^1 + ··· + F_s^m dW_s^m). If Novikov's condition
\[ \mathbb{E}_P\left[ \exp\left( \frac{1}{2} \int_0^T \|F_s\|^2\, ds \right) \right] < \infty \]
is satisfied, then {X_t}_{t∈[0,T]} is an F_t-Wiener process under Q(A) = E_P(Λ 1_A).

Lemma 2.60. Let Mt, t ∈ [0,T ] be a nonnegative local martingale. Then Mt is a super- martingale. In particular, if E(MT ) = E(M0), then Mt is a martingale [7].

Lemma 2.61. Let Ft be Itˆointegrable and let Wt be a Wiener process. Then

s   t    t  1 1 2 E exp FsdWs ≤ E exp (Fs) dWs . 2 ˆ0 2 ˆ0

24 Lemma 2.62. Let Mt be a nonnegative local martingale and let τn be a reducing sequence.

If sup kMT ∧τn kp < ∞ for some p > 1, then {Mt}t∈[0,T ] is a martingale [7]. n

Theorem 2.63. For Itˆointegrable Ft, define the local martingale

 t t  ∗ 1 2 ξt (F.) = exp (Fs) dWs − kFsk ds . ˆ0 2 ˆ0

Suppose furthermore that the following condition is satisfied:

  t  1 2 E exp kFsk ds = K < ∞. 2 ˆ0

Then {ξt (F.)}t∈[0,T ] is in fact a martingale [7].

2.3.4 The Martingale Representation Theorem

Theorem 2.64. (Martingale representation). Let M_t be an F_t^W-martingale such that M_T ∈ L²(P). Then for a unique F_t^W-adapted process {H_t}_{t∈[0,T]} in L²(μ_T × P),
\[ M_t = M_0 + \int_0^t H_s\, dW_s \quad \text{a.s. for all } t \in [0, T], \]

where the uniqueness of Ht is meant up to a µT × P − null set [7].

Actually, the theorem is a trivial corollary of the following result.

W Theorem 2.65. (Itˆorepresentation). Let X be an Ft −measurable random variable 2 W 2 in L (P ). Then for a unique Ft −adapted process {Ht}t∈[0,T ] in L (µT × P )

T X = E (X) + HsdWs a.s., ˆ0

where the uniqueness of Ht is meant up to a µT × P − null set [7].

25 W Lemma 2.66. Introduce the following class of Ft −measurable random variables:

∞ S = {f(Wt1 , ..., Wtn } : n < ∞, t1, . . . , tn ∈ [0,T ] , f ∈ C0

∞ (recall that C0 is the class of smooth functions with compact support). Then for any  > 0 W 2 and Ft −measurable X ∈ L (P ), there is a Y ∈ S such that kX − Y k2 <  [7].

Lemma 2.67. (Le´vy’s upward theorem). Let X ∈ L2 (P ) be G-measurable, and let

2 Gn be a filtration such that G = σ {Gn}. Then E(X|Gn) → X a.s. and in L (P ).

W Lemma 2.68. (Approximate Itˆorepresentation). For any Y ∈ S, there is an Ft − 2 adapted process Ht in L (µT × P ) such that

T Y = E (Y ) + HsdWs. ˆ0

W 2 In particular, this implies that any Ft −measurable random variable X ∈ L (P ) can be approximated arbitrarily closely in L2 (P ) by an Itoˆ integral.

26 Chapter 3: Stochastic Control and Dynamic Programming

In this Chapter, we assume that the filtration F is the P −augmentation of the canonical

filtration of the Brownian motion Wt [9]. In this Chapter we are following [9] quite closely and do not claim authorship. Our thrust was only to summarize. We will also denote by

\[ S := [0, T) \times \mathbb{R}^d, \quad \text{where } T \in [0, \infty]. \]

The set S is called the parabolic interior of the state space [9]. We will denote by S̄ := cl(S) its closure, i.e. S̄ = [0, T] × ℝᵈ for finite T, and S̄ = S for T = ∞.

3.1 Stochastic Control Problems in Standard Form

Control processes. Given a subset U of Rk, we denote by U the set of all progressively measurable processes v = {vt, t < T } valued in U. The elements of U are called control processes.

Controlled process.

Let
\[ b : (t, x, u) \in S \times U \mapsto b(t, x, u) \in \mathbb{R}^d \quad \text{and} \quad \sigma : (t, x, u) \in S \times U \mapsto \sigma(t, x, u) \in M_{\mathbb{R}}(n, d) \]
be two continuous functions satisfying the conditions
\[ |b(t, x, u) - b(t, y, u)| + |\sigma(t, x, u) - \sigma(t, y, u)| \le K\, |x - y|, \tag{3.1.1} \]
\[ |b(t, x, u)| + |\sigma(t, x, u)| \le K\, (1 + |x| + |u|) \tag{3.1.2} \]

27 for some constant K independent of (t, x, y, u). For each control process v ∈ U, we consider the controlled stochastic differential equation [9]:

dXt = b (t, Xt, vt) dt + σ (t, Xt, vt) dWt. (3.1.3)

If the above equation has a unique solution X, for a given initial data, then the process X is called the controlled process, as its dynamics is driven by the action of the control process v [9].
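To make the role of the control concrete, the following Python sketch (an illustration, not the thesis code; the linear drift b = u − x, constant σ and the feedback rule are hypothetical choices) simulates the controlled SDE by an Euler-Maruyama discretization, with the control entering the drift through a feedback rule v_t = v(t, X_t):

import numpy as np

def euler_maruyama_controlled(x0, control, b, sigma, T=1.0, n_steps=1_000, rng=None):
    """Euler-Maruyama discretization of dX_t = b(t, X_t, v_t) dt + sigma(t, X_t, v_t) dW_t,
    where the control process is given in feedback form v_t = control(t, X_t)."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    t, x, path = 0.0, x0, [x0]
    for _ in range(n_steps):
        u = control(t, x)                                   # action of the control process
        x = x + b(t, x, u) * dt + sigma(t, x, u) * np.sqrt(dt) * rng.normal()
        t += dt
        path.append(x)
    return np.array(path)

# hypothetical coefficients and bounded feedback control (for illustration only)
b = lambda t, x, u: u - x                                   # drift steered by the control
sigma = lambda t, x, u: 0.2                                 # constant diffusion coefficient
control = lambda t, x: np.clip(1.0 - x, -1.0, 1.0)          # feedback rule with U = [-1, 1]
print(euler_maruyama_controlled(x0=0.0, control=control, b=b, sigma=sigma)[-1])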

We shall be working with the following subclass of control processes :

\[ \mathcal{U}_0 := \mathcal{U} \cap \mathbb{H}^2, \tag{3.1.4} \]
where ℍ² is the collection of all progressively measurable processes with finite L²(Ω × [0, T)) norm. Then, for every finite maturity T' ≤ T, it follows from the above uniform Lipschitz condition on the coefficients b and σ that [9]
\[ \mathbb{E}\left[ \int_0^{T'} \big( |b| + |\sigma| \big)^2 (s, x, v_s)\, ds \right] < \infty \quad \text{for all } v \in \mathcal{U}_0,\ x \in \mathbb{R}^d, \]
which guarantees the existence of a controlled process on the time interval [0, T'] for each given initial condition and control [9].

Theorem 3.1. Let v ∈ U₀ be a control process, and ξ ∈ L²(P) be an F₀-measurable random variable. Then there exists a unique F-adapted process Xᵛ satisfying (3.1.3) together with the initial condition X₀ᵛ = ξ. Moreover, for every T > 0, there is a constant C > 0 such that
\[ \mathbb{E}\left[ \sup_{0\le s\le t} |X^v_s|^2 \right] \le C\, \big( 1 + \mathbb{E}[|\xi|^2] \big)\, e^{Ct} \quad \text{for all } t \in [0, T). \tag{3.1.5} \]

28 Gain functional. Let

d d f, k : [0,T ) × R × U → R and g : R → R

− be given functions. We assume that f, k are continuous and kk k∞ < ∞ (i.e. max (−k, 0) is uniformly bounded). Moreover, we assume that f and g satisfy the quadratic growth condition [9]:

\[ |f(t, x, u)| + |g(x)| \le K\, (1 + |u| + |x|^2), \]
for some constant K independent of (t, x, u). We define the gain function J on [0, T] × ℝᵈ × U by [9]:
\[ J(t, x, v) := \mathbb{E}\left[ \int_t^T \beta^v(t, s)\, f\big(s, X^{t,x,v}_s, v_s\big)\, ds + \beta^v(t, T)\, g\big(X^{t,x,v}_T\big)\, \mathbf{1}_{T<\infty} \right], \]
when this expression is meaningful, where
\[ \beta^v(t, s) := e^{-\int_t^s k(r,\, X^{t,x,v}_r,\, v_r)\, dr}, \]
and {X_s^{t,x,v}, s ≥ t} is the solution of (3.1.3) with control process v and initial condition X_t^{t,x,v} = x.

Admissible control processes. In the finite horizon case T < ∞, the quadratic growth condition on f and g together with the bound on k− ensure that J (t, x, v) is well-defined for all control process v ∈ U0. We then define the set of admissible controls in this case by

U0 [9].

More attention is needed for the infinite horizon case. In particular, the discount term k needs to play a role to ensure the finiteness of the integral. In this setting the largest set

29 of admissible control processes is given by

  ∞   v  t,x,v 2  U0 := v ∈ U : E β (t, s) 1 + Xs + |vs| ds < ∞ for all x when T = ∞ ˆ0

The stochastic control problem. Consider the optimization problem

\[ V(t, x) := \sup_{v \in \mathcal{U}_0} J(t, x, v) \quad \text{for } (t, x) \in S. \]

Our main concern is to describe the local behavior of the value function V by means of the so-called dynamic programming equation, or Hamilton-Jacobi- Bellman equation [9].

3.2 The Dynamic Programming Principle

3.2.1 A weak Dynamic Programming Principle

The dynamic programming principle is the main tool in the theory of stochastic control [9]. We denote:

0 0 ∗ 0 0 V∗ (t, x) := lim inf V (t , x ) and V (t, x) := lim sup V (t , x ) , (t0,x0)→(t,x) (t0,x0)→(t,x) for all (t, x) ∈ S.

Theorem 3.2. Assume that V is locally bounded. Let (t, x) ∈ S be fixed, and let {θᵛ, v ∈ U_t} be a family of finite stopping times independent of F_t with values in [t, T]. Then:
\[ V(t, x) \le \sup_{v \in \mathcal{U}_t} \mathbb{E}\left[ \int_t^{\theta^v} \beta^v(t, s)\, f\big(s, X^{t,x,v}_s, v_s\big)\, ds + \beta^v(t, \theta^v)\, V_*\big(\theta^v, X^{t,x,v}_{\theta^v}\big) \right]. \]

Assume further that g is lower-semicontinuous and (θᵛ, X^{t,x}_· 1_{[t,θᵛ]}) is L^∞-bounded for all v ∈ U_t. Then
\[ V(t, x) \ge \sup_{v \in \mathcal{U}_t} \mathbb{E}\left[ \int_t^{\theta^v} \beta^v(t, s)\, f\big(s, X^{t,x,v}_s, v_s\big)\, ds + \beta^v(t, \theta^v)\, V^*\big(\theta^v, X^{t,x,v}_{\theta^v}\big) \right]. \]

3.3 The Dynamic Programming Equation

The dynamic programming equation is the infinitesimal counterpart of the dynamic pro- gramming principle. It is also widely called the Hamilton-Jacobi-Bellman equation. In this section, we shall derive it under strong smoothness assumptions on the value function [9].

Let S_d be the set of all d × d symmetric matrices with real coefficients, and define the map H : S × ℝ × ℝᵈ × S_d → ℝ by:
\[ H(t, x, r, p, \gamma) := \sup_{u \in U} \left\{ -k(t, x, u)\, r + b(t, x, u) \cdot p + \frac{1}{2} \mathrm{Tr}\left[ \sigma\sigma^{T}(t, x, u)\, \gamma \right] + f(t, x, u) \right\}. \]

We also need to introduce the linear second order operator Lᵘ associated to the controlled process {βᵘ(0, t) Xᵘ_t, t ≥ 0} controlled by the constant control process u:
\[ \mathcal{L}^u \varphi(t, x) := -k(t, x, u)\, \varphi(t, x) + b(t, x, u) \cdot D\varphi(t, x) + \frac{1}{2} \mathrm{Tr}\left[ \sigma\sigma^{T}(t, x, u)\, D^2\varphi(t, x) \right], \]
where D and D² denote the gradient and the Hessian operators with respect to the x variable. With this notation, we have by Itô's formula:
\[ \beta^v(0, s)\, \varphi(s, X^v_s) - \beta^v(0, t)\, \varphi(t, X^v_t) = \int_t^s \beta^v(0, r)\, (\partial_t + \mathcal{L}^{v_r})\, \varphi(r, X^v_r)\, dr + \int_t^s \beta^v(0, r)\, D\varphi(r, X^v_r) \cdot \sigma(r, X^v_r, v_r)\, dW_r \]

1,2 d for every s ≥ t and smooth function ϕ ∈ C [t, s] , R and each admissible control process v ∈ U0.

31 1,2 d Proposition 3.3. Assume the value function V ∈ C [0,T ), R , and let the coefficients k (·, ·, u) and f (·, ·, u) be continuous in (t, x) for all fixed u ∈ U. Then, for all (t, x) ∈ S :

2 −∂tV (t, x) − H(t, x, V (t, x) ,DV (t, x) ,D V (t, x)) ≥ 0. (3.3.1)

1,2 d 2 Proposition 3.4. Assume V ∈ C [0,T ), R and H(., V.DV, D V ) > −∞. Assume further that k is bounded and the function H is continuous. Then, for all (t, x) ∈ S :

2 −∂tV (t, x) − H(t, x, V (t, x) ,DV (t, x) ,D V (t, x)) ≤ 0. (3.3.2)

As a consequence of Propositions 3.3 and 3.4, we have the main result of this section:

Theorem 3.5. Let the conditions of Propositions 3.3 and 3.4 hold. Then, the value function V solves the Hamilton-Jacobi-Bellman equation

2 −∂tV − H(., V.DV, D V ) = 0 on S. (3.3.3)

Note: The value function should not be expected to be smooth in general.
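To see the structure of (3.3.3) in a concrete, purely illustrative one-dimensional instance (not from the source), take d = 1, k ≡ 0, f ≡ 0, U = [−1, 1], b(t, x, u) = u and a constant diffusion coefficient σ(t, x, u) = σ. The supremum over u can then be evaluated explicitly, and the HJB equation becomes
\[ -\partial_t V(t, x) - |\partial_x V(t, x)| - \tfrac{1}{2}\sigma^2\, \partial^2_{xx} V(t, x) = 0, \]
which shows how the optimization over controls makes the equation fully nonlinear even when the coefficients are simple.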

3.3.1 Continuity of the Value Function for Bounded Controls

Proposition 3.6. Let f = k ≡ 0, T < ∞, and assume that g is Lipschitz continuous. Then:

1. V is Lipschitz in x, uniformly in t.

1 2. Assume further that U is bounded. Then V is 2 -H¨older-continuous in t, and there is a constant C ≥ 0 such that:

\[ |V(t, x) - V(t', x)| \le C\, (1 + |x|)\, \sqrt{|t - t'|}\,; \qquad t, t' \in [0, T],\ x \in \mathbb{R}^d. \]

32 Chapter 4: Optimal Stopping and Dynamic Programming

In this Chapter, our goal is to derive similar results to those obtained in the previous chapter for standard stochastic control problems in the context of optimal stopping problems. In this Chapter, we are closely following [9]. Our thrust is only to summarize.

4.1 Optimal Stopping Problems

For 0 ≤ t ≤ T < ∞, we denote by T[t,T ] the collection of all F-stopping times with values in [t, T ]. We also recall the notation S := [0,T ) × Rn for the parabolic state space of the underlying state process X defined by the stochastic differential equation:

dXt = b(t, Xt)dt + σ(t, Xt)dWt, (4.1.1)

n where b and σ are defined on S and take values in R and Sn, respectively. We assume that b and σ satisfy the usual Lipschitz and linear growth conditions so that the above SDE has a unique strong solution [9].

The infinitesimal generator of the Markov diffusion process X is denoted by

\[ \mathcal{A}\varphi := b \cdot D\varphi + \frac{1}{2} \mathrm{Tr}\left[ \sigma\sigma^{T} D^2\varphi \right]. \]

Let g be a continuous function from Rn to R, and assume that:

  E sup |g (Xt)| < ∞. (4.1.2) 0≤t≤T

For instance, if g has polynomial growth, the previous integrability condition is automati-

cally satisfied. Under this condition, the criterion
\[ J(t, x, \tau) := \mathbb{E}\left[ g\big(X^{t,x}_\tau\big) \right] \tag{4.1.3} \]
is well-defined for all (t, x) ∈ S and τ ∈ T_{[t,T]}. Here, X^{t,x} denotes the unique strong solution of (4.1.1) with initial condition X^{t,x}_t = x.

The optimal stopping problem is now defined by:

\[ V(t, x) := \sup_{\tau \in \mathcal{T}_{[t,T]}} J(t, x, \tau) \quad \text{for all } (t, x) \in S. \tag{4.1.4} \]

A stopping time τ̂ ∈ T_{[t,T]} is called an optimal stopping rule if V(t, x) = J(t, x, τ̂).

The set S := {(t, x): V (t, x) = g (x)} (4.1.5) is called the stopping region and is of particular interest: whenever the state is in this region, it is optimal to stop immediately. Its complement Sc is called the continuation region [9].

4.2 The Dynamic Programming Principle

Theorem 4.1. Assume that V is locally bounded. For (t, x) ∈ S, let θ ∈ T_{[t,T]} be a stopping time such that X^{t,x}_θ is bounded. Then:
\[ V(t, x) \le \sup_{\tau \in \mathcal{T}_{[t,T]}} \mathbb{E}\left[ \mathbf{1}_{\{\tau < \theta\}}\, g\big(X^{t,x}_\tau\big) + \mathbf{1}_{\{\tau \ge \theta\}}\, V^*\big(\theta, X^{t,x}_\theta\big) \right], \tag{4.2.1} \]
\[ V(t, x) \ge \sup_{\tau \in \mathcal{T}_{[t,T]}} \mathbb{E}\left[ \mathbf{1}_{\{\tau < \theta\}}\, g\big(X^{t,x}_\tau\big) + \mathbf{1}_{\{\tau \ge \theta\}}\, V_*\big(\theta, X^{t,x}_\theta\big) \right]. \tag{4.2.2} \]

34 4.3 The Dynamic Programming Equation

Theorem 4.2. Assume that V ∈ C1,2 ([0,T ), Rn) , and let g : Rn → R be continuous. Then V solves the obstacle problem:

min {− (∂t + A) V,V − g} = 0 on S. (4.3.1)

4.4 Regularity of the Value Function

4.4.1 Finite Horizon Optimal Stopping

In this subsection, we consider the case T < ∞. Similar to the continuity result of Propo- sition 3.5 for the stochastic control framework, we have:

Proposition 4.3. Assume g is Lipschitz-continuous, and let T < ∞. Then, there is a constant C such that:
\[ |V(t, x) - V(t', x')| \le C \left( |x - x'| + \sqrt{|t - t'|} \right) \quad \text{for all } (t, x), (t', x') \in S. \]

35 Chapter 5: Solving Control Problems by Verification

In this Chapter, we present a general argument, based on Itˆo’sformula, which allows to show that some “guess” of the value function is indeed equal to the unknown value function. Namely, given a smooth solution v of the dynamic programming equation, we give sufficient conditions which allow to conclude that v coincides with the value function V . This is the so-called verification argument [9]. In this Chapter we are following [9] quite closely and do not claim authorship.

5.1 The Verification Argument for Stochastic Control Problems

We recall the stochastic control problem formulation of Section 3.1. The set of admissible control processes U0 ⊂ U is the collection of all progressively measurable processes with k values in the subset U ⊂ R . For every admissible control process v ∈ U0, the controlled process is defined by the stochastic differential equation:

v v v dXt = b(t, Xt , vt)dt + σ(t, Xt , vt)dWt.

The gain criterion is given by

 T  v t,x,v  v t,x,v J (t, x, v) := E β (t, s) f s, Xs , vs ds + β (t, T ) g XT ˆt with s t,x,v v − t k(r, Xs , vr)dr β (t, s) := e ´ .

The stochastic control problem is defined by the value function:

V (t, x) := supJ (t, x, v) , for (t, s) ∈ S. (5.1.1) v∈U0

36 We follow the notations of Section 3.3. We recall the Hamiltonian

d H : S × R × R × Sd defined by :

H(t, x, r, p, γ)  1 h i  := sup −k(t, x, u)r + b(t, x, u) · p + Tr σσT (t, x, u) γ + f (t, x, u) , u∈U 2 where b and σ satisfy the conditions (3.1.1) − (3.1.2), and the coefficients f and k are measurable. From the results of the previous section, the dynamic programming equation corresponding to the stochastic control problem (4.1.1) is:

2 −∂tv − H(., v, Dv, D v) = 0 and v(T,.) = g. (5.1.2)

A function v will be called a supersolution (resp. subsolution) of Equation (4.1.2) if

2 −∂tv − H(., v, Dv, D v) ≥ (resp. ≤) 0 and v (T,.) ≥ (resp. ≤) g.

The proof of the subsequent result will make use of the following linear second-order operator
\[ \mathcal{L}^u \varphi(t, x) := -k(t, x, u)\, \varphi(t, x) + b(t, x, u) \cdot D\varphi(t, x) + \frac{1}{2} \mathrm{Tr}\left[ \sigma\sigma^{T}(t, x, u)\, D^2\varphi(t, x) \right], \]
which corresponds to the controlled process {βᵘ(0, t) Xᵘ_t, t ≥ 0} controlled by the constant control process u, in the sense that
\[ \beta^v(0, s)\, \varphi(s, X^v_s) - \beta^v(0, t)\, \varphi(t, X^v_t) = \int_t^s \beta^v(0, r)\, (\partial_t + \mathcal{L}^{v_r})\, \varphi(r, X^v_r)\, dr + \int_t^s \beta^v(0, r)\, D\varphi(r, X^v_r) \cdot \sigma(r, X^v_r, v_r)\, dW_r \]

1,2 d for every t ≤ s and smooth function ϕ ∈ C [t, s] , R and each admissible control process v ∈ U0. The last expression is an immediate application of Itˆo’sformula.

37 1,2 d d Theorem 5.1. Let T < ∞, and v ∈ C [0,T ), R ∩ C [0,T ] × R . Assume that − kk k∞ < ∞ and v and f have quadratic growth, i.e. there is a constant C such that

2 d |f (t, x, u)| + |v (t, x)| ≤ C 1 + |x| for all (t, x, u) ∈ [0,T ) × R × U.

(i) Suppose that v is a supersolution of (4.1.2). Then v ≥ V on [0,T ] × Rd.

(ii) Let v be a solution of (4.1.2), and assume that there exists a minimizer uˆ (t, x) of Luv (t, x) + f (t, x, u) such that

uˆ(t, x) • 0 = ∂tv (t, x) + L v (t, x) + f (t, x, uˆ (t, x)) ,

• the stochastic differential equation

dXs = b (s, Xs, uˆ (t, x)) ds + σ (s, Xs, uˆ (t, x)) dWs

defines a unique solution X for each given initial date Xt = x,

• the process vˆs :=u ˆ (s, Xs) is a well-defined control process in U0. Then v = V and vˆ is an optimal Markov control process.

38 5.2 The Verification Argument for Optimal Stopping Problems

In this section, we develop the verification argument for finite horizon optimal stopping problems. Let T > 0 be a finite time horizon, and Xt,x denote the solution of the stochastic differential equation:

s s t,x t,x t,x X = x + b s, X ds + σ s, X dWs, (5.2.1) ˆt ˆt where b and σ satisfy the usual Lipschitz and linear growth conditions. Given the functions k, f : [0,T ] × Rd → R and g : Rd → R, we consider the optimal stopping problem

 τ  t,x v t,x V (t, x) := sup E β (t, s) f s, Xs ds + β (t, τ) g Xτ , (5.2.2) t ˆ τ∈T[t,T ] t whenever this expected value is well-defined, where

s t,x,v − t k(r, Xs , vr)dr β (t, s) := e ´ , 0 ≤ t ≤ s ≤ T.

By the results of the previous chapter, the corresponding dynamic programmin equation is:

d min {−∂tv − Lv − f, v − g} = 0 on [0,T ) × R , v (T,.) = g, (5.2.3) where L is the second order differential operator

1 h i Lv := b · Dv + Tr σσTD2v − kv. 2

Similar to Section 5.1, a function v will be called a supersolution (resp. subsolution) of (5.2.3) if

min {−∂tv − Lv − f, v − g} ≥ (resp. ≤) 0 and v (T,.) ≥ (resp. ≤) g.

39 Remark 5.2. Let v be a function in the Sobolev space W 1,2 (S) . By definition, for such a

n 1,2 n function v, there is a sequence of functions (v )n≥1 ⊂ C (S) such that v → v uniformly on compact subsets of S, and

n m n m 2 n 2 m k∂tv − ∂tv kL2(S) + kDv − Dv kL2(S) + D v − D v L2(S) → 0.

Then, Itˆo’sformula holds true for vn for all n ≥ 1, and is inherited by v by sending n → ∞.

1,2 d Theorem 5.3. Let T < ∞ and v ∈ W [0,T ), R . Assume further that v and f have quadratic growth. Then:

(i) If v is a supersolution of (5.2.3) , then v ≥ V.

(ii) If v is a solution of (5.2.3) , then v = V and

∗ τt := inf {s > t : v (s, Xs) = g (Xs)} is an optimal stopping time.

40 Chapter 6: Introduction to Viscosity Solutions

In this chapter, we will provide the main tools from the theory of viscosity solutions for the purpose of our applications to stochastic control problems. In this Chapter, we are reproducing the results from [9].

6.1 Intuition Behind Viscosity Solutions

We consider a non-linear second order partial differential equation

(E) F x, u(x), Du(x),D2u(x) = 0 for x ∈ O,

d d where O is an open subset of R and F is a continuous map from O × R × R × Sd → R. A crucial condition on F is the so-called ellipticity condition :

d Standing Assumption For all (x, r, p) ∈ O × R × R and A, B ∈ Sd :

F (x, r, p, A) ≤ F (x, r, p, B) whenever A ≥ B.

Definition 6.1. A function u : O → R is a classical supersolution (resp. subsolution) of (E) if u ∈ C2 (O) and

F x, u(x), Du(x),D2u(x) ≥ (resp. ≤) 0 for x ∈ O.

Proposition 6.2. Let u be a C2 (O) function. Then the following claims are equivalent.

(i) u is a classical supersolution (resp. subsolution) of (E)

2 (ii) for all pairs (x0, ϕ) ∈ O × C (O) such that x0 is a minimizer (resp. maximizer) of the

41 difference u − ϕ) on O, we have

2  F x0, u(x0), Dϕ(x0),D ϕ(x0) ≥ (resp. ≤) 0.

6.2 Definition of Viscosity Solutions

∗ For a locally bounded function u : O → R, we denote by u∗ and u the lower and upper ∗ semicontinuous envelopes of u [9]. u∗ is the largest lower semicontinuous minorant of u, u is the smallest upper semicontinuous majorant of u, and

0 ∗ 0 u∗ (x) = lim infu (x ) , u (x) = lim supu (x ) . x0→x x0→x

Definition 6.3. Let u : O → R be a locally bounded function.

(i) We say that u is a (discontinuous) viscosity supersolution of (E) if

2  F x0, u∗(x0), Dϕ(x0),D ϕ(x0) ≥ 0

2 for all pairs (x0, ϕ) ∈ O × C (O) such that x0 is a minimizer of the difference (u∗ − ϕ) on O.

(ii) We say that u is a (discontinuous) viscosity subsolution of (E) if

∗ 2  F x0, u (x0), Dϕ(x0),D ϕ(x0) ≤ 0

2 ∗ for all pairs (x0, ϕ) ∈ O × C (O) such that x0 is a maximizer of the difference (u − ϕ) on O.

(iii) We say that u is a (discontinuous) viscosity solution of (E) if it is both a viscosity supersolution and subsolution of (E).

2 Notation We will say that F (x0, u∗(x0), Dϕ(x0),D ϕ(x0)) ≥ 0 in the viscosity sense

42 whenever u∗ is a viscosity supersolution of (E). A similar notation will be used for subso- lution [9].

Remark 6.4. An immediate consequence of Proposition 6.2 is that any classical solution of (E) is also a viscosity solution of (E) [9].
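A standard illustration of the definition (not from the source) is the first-order equation |u'(x)| − 1 = 0 on O = (−1, 1): the function u(x) = 1 − |x| is not differentiable at x = 0, yet it is a viscosity solution. Indeed, away from 0 it is classical; at 0, any smooth ϕ touching u from above (a maximizer of u − ϕ) must satisfy |ϕ'(0)| ≤ 1, so the subsolution test holds, while no smooth ϕ can touch the concave corner of u from below at 0, so the supersolution test holds vacuously.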

6.3 First Properties

Proposition 6.5. Let u be a locally bounded (discontinuous) viscosity supersolution of

(E). If f is a C1(R) function with Df 6= 0 on R, then the function v : = f −1 ◦ u is a (discontinuous)

• viscosity supersolution, when Df > 0,

• viscosity subsolution, when Df < 0, of the equation K x, v (x) , Dv (x) ,D2v (x) = 0 for x ∈ O, where K(x, r, p, A) := F x, f (r) , Df (r) p, D2f (r) pp0 + Df (r) A .

Theorem 6.6. Let $u_\varepsilon$ be a lower semicontinuous viscosity supersolution of the equation
$$F_\varepsilon\bigl(x, u_\varepsilon(x), Du_\varepsilon(x), D^2u_\varepsilon(x)\bigr) = 0 \quad \text{for } x \in O,$$
where $(F_\varepsilon)_{\varepsilon > 0}$ is a sequence of continuous functions satisfying the ellipticity condition. Suppose that $(\varepsilon, x) \mapsto u_\varepsilon(x)$ and $(\varepsilon, z) \mapsto F_\varepsilon(z)$ are locally bounded, and define
$$u(x) := \liminf_{(\varepsilon, x') \to (0, x)} u_\varepsilon(x') \quad \text{and} \quad F(z) := \limsup_{(\varepsilon, z') \to (0, z)} F_\varepsilon(z').$$
Then, $u$ is a lower semicontinuous viscosity supersolution of the equation
$$F\bigl(x, u(x), Du(x), D^2u(x)\bigr) = 0 \quad \text{for } x \in O.$$

A similar statement holds for subsolutions.

Proposition 6.7. Let $A \subset \mathbb{R}^{d_1}$ and $B \subset \mathbb{R}^{d_2}$ be two open subsets, and let $u : A \times B \to \mathbb{R}$ be a lower semicontinuous viscosity supersolution of the equation
$$F\bigl(x, y, u(x, y), D_y u(x, y), D_y^2 u(x, y)\bigr) \ge 0 \quad \text{on } A \times B,$$
where $F$ is a continuous elliptic operator. Then, for all fixed $x_0 \in A$, the function $v(y) := u(x_0, y)$ is a viscosity supersolution of the equation
$$F\bigl(x_0, y, v(y), Dv(y), D^2 v(y)\bigr) \ge 0 \quad \text{on } B.$$

A similar statement holds for the subsolution property.

6.4 Comparison Result and Uniqueness

In this section, we show that the notion of viscosity solutions is consistent with the maximum principle for a wide class of equations [9]. We recall that the maximum principle is a stronger statement than uniqueness, i.e., any equation satisfying a comparison result has no more than one solution [9]. In the viscosity solutions literature, the maximum principle is rather called the comparison principle [9].

6.4.1 Comparison of Classical Solutions in a Bounded Domain

Let us first review the maximum principle in the simplest classical sense.

Proposition 6.8. Assume that $O$ is an open bounded subset of $\mathbb{R}^d$, and the nonlinearity $F(x, r, p, A)$ is elliptic and strictly increasing in $r$. Let $u, v \in C^2(\mathrm{cl}(O))$ be a classical subsolution and supersolution of (E), respectively, with $u \le v$ on $\partial O$. Then $u \le v$ on $\mathrm{cl}(O)$.

6.4.2 Semijets Definition of Viscosity Solutions

We first need to develop a convenient alternative definition of viscosity solutions.

For $x_0 \in O$, $r \in \mathbb{R}$, $p \in \mathbb{R}^d$, and $A \in \mathbb{S}_d$, we introduce the quadratic function:
$$q(y, r, p, A) := r + p \cdot y + \tfrac{1}{2} A y \cdot y, \qquad y \in \mathbb{R}^d.$$
For $v \in \mathrm{LSC}(O)$, let $(x_0, \varphi) \in O \times C^2(O)$ be such that $x_0$ is a local minimizer of the difference $(v - \varphi)$ in $O$. Then, defining $p := D\varphi(x_0)$ and $A := D^2\varphi(x_0)$, it follows from a second order Taylor expansion that:
$$v(x) \ge q\bigl(x - x_0, v(x_0), p, A\bigr) + o\bigl(|x - x_0|^2\bigr).$$
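Indeed, this uses only the definitions just given: since $x_0$ is a local minimizer of $v - \varphi$, we have $v(x) - \varphi(x) \ge v(x_0) - \varphi(x_0)$ for $x$ near $x_0$, while the second order Taylor expansion of $\varphi$ reads
$$\varphi(x) - \varphi(x_0) = p \cdot (x - x_0) + \tfrac{1}{2} A (x - x_0) \cdot (x - x_0) + o\bigl(|x - x_0|^2\bigr),$$
so that
$$v(x) \ge v(x_0) + p \cdot (x - x_0) + \tfrac{1}{2} A (x - x_0) \cdot (x - x_0) + o\bigl(|x - x_0|^2\bigr) = q\bigl(x - x_0, v(x_0), p, A\bigr) + o\bigl(|x - x_0|^2\bigr).$$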

Motivated by this observation, we introduce the subjet $J_O^- v(x_0)$ by
$$J_O^- v(x_0) := \bigl\{(p, A) \in \mathbb{R}^d \times \mathbb{S}_d : v(x) \ge q(x - x_0, v(x_0), p, A) + o(|x - x_0|^2)\bigr\}. \tag{6.4.1}$$
Similarly, we define the superjet $J_O^+ u(x_0)$ of a function $u \in \mathrm{USC}(O)$ at the point $x_0 \in O$ by
$$J_O^+ u(x_0) := \bigl\{(p, A) \in \mathbb{R}^d \times \mathbb{S}_d : u(x) \le q(x - x_0, u(x_0), p, A) + o(|x - x_0|^2)\bigr\}. \tag{6.4.2}$$
Then, it can be proved that a function $v \in \mathrm{LSC}(O)$ is a viscosity supersolution of the equation (E) if and only if
$$F(x_0, v(x_0), p, A) \ge 0 \quad \text{for all } (p, A) \in J_O^- v(x_0).$$

A symmetric statement holds for viscosity subsolutions. By continuity considerations, we can even enlarge the semijets $J_O^{\pm} w(x_0)$ to the following closure
$$\bar{J}_O^{\pm} w(x) := \bigl\{(p, A) \in \mathbb{R}^d \times \mathbb{S}_d : (x_n, w(x_n), p_n, A_n) \to (x, w(x), p, A) \text{ for some sequence } (x_n, p_n, A_n)_n \subset \mathrm{Graph}(J_O^{\pm} w)\bigr\},$$
where $(x_n, p_n, A_n) \in \mathrm{Graph}(J_O^{\pm} w)$ means that $(p_n, A_n) \in J_O^{\pm} w(x_n)$. The following result is obvious, and provides an equivalent definition of viscosity solutions.

Proposition 6.9. Consider an elliptic nonlinearity $F$, and let $u \in \mathrm{USC}(O)$, $v \in \mathrm{LSC}(O)$.

(i) Assume that $F$ is lower-semicontinuous. Then, $u$ is a viscosity subsolution of (E) if and only if:
$$F(x, u(x), p, A) \le 0 \quad \text{for all } x \in O \text{ and } (p, A) \in \bar{J}_O^{+} u(x).$$

(ii) Assume that $F$ is upper-semicontinuous. Then, $v$ is a viscosity supersolution of (E) if and only if:
$$F(x, v(x), p, A) \ge 0 \quad \text{for all } x \in O \text{ and } (p, A) \in \bar{J}_O^{-} v(x).$$

6.4.3 The Crandall-Ishii Lemma

Lemma 6.10. Let $O$ be an open locally compact subset of $\mathbb{R}^d$. Given $u \in \mathrm{USC}(O)$ and $v \in \mathrm{LSC}(O)$, we assume for some $(x_0, y_0) \in O^2$, $\varphi \in C^2\bigl(\mathrm{cl}(O)^2\bigr)$ that:
$$(u - v - \varphi)(x_0, y_0) = \max_{O^2}\,(u - v - \varphi). \tag{6.4.3}$$
Then, for each $\varepsilon > 0$, there exist $A, B \in \mathbb{S}_d$ such that
$$\bigl(D_x\varphi(x_0, y_0), A\bigr) \in \bar{J}_O^{+} u(x_0), \qquad \bigl(-D_y\varphi(x_0, y_0), B\bigr) \in \bar{J}_O^{-} v(y_0),$$
and the following inequality holds in the sense of symmetric matrices in $\mathbb{S}_{2d}$:
$$-\bigl(\varepsilon^{-1} + \bigl\|D^2\varphi(x_0, y_0)\bigr\|\bigr) I_{2d} \le \begin{pmatrix} A & 0 \\ 0 & -B \end{pmatrix} \le D^2\varphi(x_0, y_0) + \varepsilon\,\bigl(D^2\varphi(x_0, y_0)\bigr)^2.$$

6.4.4 Comparison of Viscosity Solutions in a Bounded Domain

Assumption 6.1. (i) There exists $\gamma > 0$ such that
$$F(x, r, p, A) - F(x, r', p, A) \ge \gamma\,(r - r') \quad \text{for all } r \ge r',\ (x, p, A) \in O \times \mathbb{R}^d \times \mathbb{S}_d.$$

(ii) There is a function $\varpi : \mathbb{R}_+ \to \mathbb{R}_+$ with $\varpi(0+) = 0$, such that
$$F\bigl(y, r, \alpha(x - y), B\bigr) - F\bigl(x, r, \alpha(x - y), A\bigr) \le \varpi\bigl(\alpha |x - y|^2 + |x - y|\bigr)$$
for all $x, y \in O$, $r \in \mathbb{R}$ and $A, B$ satisfying
$$-3\alpha \begin{pmatrix} I_d & 0 \\ 0 & I_d \end{pmatrix} \le \begin{pmatrix} A & 0 \\ 0 & -B \end{pmatrix} \le 3\alpha \begin{pmatrix} I_d & -I_d \\ -I_d & I_d \end{pmatrix}. \tag{6.4.4}$$

Theorem 6.11. Let $O$ be an open bounded subset of $\mathbb{R}^d$ and let $F$ be an elliptic operator satisfying Assumption 6.1. Let $u \in \mathrm{USC}(O)$ and $v \in \mathrm{LSC}(O)$ be a viscosity subsolution and supersolution of the equation (E), respectively. Then
$$u \le v \text{ on } \partial O \implies u \le v \text{ on } \bar{O} := \mathrm{cl}(O).$$

6.5 Comparison in Unbounded Domains

Assumption 6.2. (i) There exists $\gamma > 0$ such that
$$F(x, r, p, A) - F(x, r', p, A) \ge \gamma\,(r - r') \quad \text{for all } r \ge r',\ (x, p, A) \in O \times \mathbb{R}^d \times \mathbb{S}_d.$$

(ii) There is a function $\varpi : \mathbb{R}_+ \to \mathbb{R}_+$ with $\varpi(0+) = 0$, such that
$$F\bigl(y, r, \alpha(x - y), B\bigr) - F\bigl(x, r, \alpha(x - y), A\bigr) \le \varpi\bigl(\alpha |x - y|^2 + |x - y|\bigr)$$
for all $x, y \in O$, $r \in \mathbb{R}$ and $A, B$ satisfying
$$-4\alpha \begin{pmatrix} I_d & 0 \\ 0 & I_d \end{pmatrix} \le \begin{pmatrix} A & 0 \\ 0 & -B \end{pmatrix} \le 4\alpha \begin{pmatrix} I_d & -I_d \\ -I_d & I_d \end{pmatrix}. \tag{6.5.1}$$

Theorem 6.12. Let $F$ be a uniformly continuous elliptic operator satisfying Assumption 6.2. Let $u \in \mathrm{USC}(O)$ and $v \in \mathrm{LSC}(O)$ be a viscosity subsolution and supersolution of the equation (E), respectively, with $|u(x)| + |v(x)| = o(|x|^2)$ as $|x| \to \infty$. Then
$$u \le v \text{ on } \partial O \implies u \le v \text{ on } \mathrm{cl}(O).$$

Chapter 7: Finite Difference Numerical Approximations

7.1 Introduction

This chapter is intended to provide a numerical solution of the Hamilton-Jacobi-Bellman (HJB) equation for stochastic optimal control problems. The computational difficulty stems from the HJB equation being a second-order partial differential equation coupled with an optimization problem. For most optimal stochastic control models arising from applications, the dynamic programming equation can only be solved approximately by numerical computations [10]. We will consider a finite difference scheme due to Kushner for computing approximately the value function V(t, x) for a controlled Markov diffusion on a finite time horizon [10]. In this chapter, we follow [10] quite closely.

7.2 Controlled Discrete Time Markov Chains

In this section we will briefly explain the method of dynamic programming in discrete time, for a finite time horizon and an infinite time horizon with discounted cost criterion [10]. We first develop a finite horizon stochastic control problem. Let $\Sigma$ be a set, which is either finite or countably infinite, and let $\ell = k, k+1, \dots, M$ be our times, where $k$ denotes an initial time and $M$ a terminal time [10]. The state at time $\ell$ is denoted by $x^\ell$ and the control chosen at time $\ell$ by $u^\ell$. These are discrete-time stochastic processes, with $x^\ell \in \Sigma$, $u^\ell \in U$. The state dynamics are prescribed by a family of one-step transition probabilities $p_\ell^v(x, y)$. We define discrete time admissible control systems $\pi$ as follows. We call
$$\pi = \bigl(\Omega, \{\mathcal{F}_\ell\}, P, x^\cdot, u^\cdot\bigr)$$
admissible if $(\Omega, \mathcal{F}, P)$ is a probability space, $\{\mathcal{F}_\ell\}$ is an increasing family of $\sigma$-algebras ($\ell = k, k+1, \dots, M$), $\mathcal{F}_\ell \subset \mathcal{F}$, and

• $x^k = x$, and $x^\ell$ is $\mathcal{F}_\ell$-measurable;

• $u^\ell$ is $\mathcal{F}_\ell$-measurable;

• $P\bigl(x^{\ell+1} = y \mid \mathcal{F}_\ell\bigr) = p_\ell^{u^\ell}(x^\ell, y)$ $P$-almost surely for $\ell = k, \dots, M-1$.

The problem is to minimize a criterion (or payoff functional) of the form:

$$J_k(x; \pi) = E_{kx}\Biggl\{\sum_{\ell=k}^{M-1} L_\ell\bigl(x^\ell, u^\ell\bigr) + \psi\bigl(x^M\bigr)\Biggr\}. \tag{7.2.1}$$

To avoid undue technical complications we make the following rather strong assumptions:

(a) There exists $K$ such that $|L_\ell(x, v)| \le K$, $|\psi(x)| \le K$ for all $x \in \Sigma$, $v \in U$, $\ell = k, \dots, M-1$.
(b) $U$ is compact. (7.2.2)
(c) $p_\ell^v(x, y)$ is continuous on $U$, for all $x, y \in \Sigma$.
(d) For each $x \in \Sigma$ there exists a finite set $\Gamma_x$ such that $p_\ell^v(x, y) = 0$ for $y \notin \Gamma_x$.

Let
$$V_k(x) = \inf_{\pi} J_k(x; \pi), \quad x \in \Sigma \tag{7.2.3}$$
be the value function.

The value function is the unique bounded solution to the dynamic programming equation
$$V_k(x) = \min_{v \in U}\Biggl[\sum_{y \in \Sigma} p_k^v(x, y)\, V_{k+1}(y) + L_k(x, v)\Biggr], \quad k < M, \tag{7.2.4}$$
with the terminal data
$$V_M(x) = \psi(x). \tag{7.2.5}$$

Also, an optimal discrete time Markov control policy $u_k^\star(x)$ is found by taking the arg min over $U$ on the right side of (7.2.4).

Infinite horizon discounted control problem.

Let us now consider times $\ell = 0, 1, 2, \dots$ and autonomous state dynamics for a controlled Markov chain, prescribed by one-step transition probabilities $p^v(x, y)$ [10]. The concept of admissible control system $\pi$ is defined as above. Let
$$J(x; \pi) = E_x\Biggl[\sum_{\ell=0}^{\infty} \lambda^\ell L\bigl(x^\ell, u^\ell\bigr)\Biggr], \tag{7.2.6}$$
where $\lambda$ is a discount factor ($0 < \lambda < 1$). Let us make the same assumptions about $L$, $U$ and $p^v(x, y)$ as in (7.2.2). Let
$$V(x) = \inf_{\pi} J(x; \pi). \tag{7.2.7}$$

The dynamic programming equation is now
$$V(x) = \min_{v \in U}\Biggl[L(x, v) + \lambda \sum_{y \in \Sigma} p^v(x, y)\, V(y)\Biggr]. \tag{7.2.8}$$

Let us denote the right side of (7.2.8) by $F(V)(x)$. Then (7.2.8) states that $V = F(V)$; i.e. $V$ is a fixed point of $F$. It is easy to verify that
$$\|F(V) - F(W)\| \le \lambda\, \|V - W\|, \tag{7.2.9}$$
where $\|\cdot\|$ is the supremum norm. Since $0 < \lambda < 1$, the contraction property (7.2.9) implies that there is a unique fixed point $V$, which is in fact the value function in (7.2.7).

The two standard methods for computing the value function $V$ of an infinite horizon discounted problem are successive approximation (or value iteration) and approximation in policy space. The method of value iteration gives $V$ as the uniform limit of a sequence $W^m$, $m = 0, 1, 2, \dots$, where $W^{m+1} = F(W^m)$. From the definition (7.2.8), the operator $F$ is monotone:
$$F(\varphi_1) \le F(\varphi_2) \quad \text{if } \varphi_1 \le \varphi_2. \tag{7.2.10}$$
Therefore, if $W^0$ is chosen such that $W^0 \le F(W^0)$, then the approximating sequence is monotone nondecreasing: $W^m \le W^{m+1}$. The method of approximation in policy space proceeds as follows. Let $u^0$ be an initial choice of stationary Markov control policy. Define $W^m$, $u^{m+1}$ successively for $m = 0, 1, 2, \dots$ by
$$W^m(x) = L\bigl(x, u^m(x)\bigr) + \lambda \sum_{y \in \Sigma} p^{u^m(x)}(x, y)\, W^m(y), \quad x \in \Sigma, \tag{7.2.11}$$
$$u^{m+1}(x) \in \arg\min_{v \in U}\Biggl[L(x, v) + \lambda \sum_{y \in \Sigma} p^v(x, y)\, W^m(y)\Biggr]. \tag{7.2.12}$$

7.3 Finite Difference Approximations to HJB Equations

Let us assume autonomous state dynamics and running cost functions f (x, v) , σ (x, v) ,L (x, v) as in Section IV.2 of Fleming and Soner. Let us also assume in addition to IV(2.2) of Fleming and Soner that:

(a) $U$ is compact; (7.3.1)
(b) $f, \sigma, L, L_x$ and $L_t$ are bounded on $Q_0 \times U$.

We consider the HJB partial differential equation
$$-V_t + H\bigl(x, D_xV, D_x^2V\bigr) = 0, \tag{7.3.2}$$
with $H(x, p, A)$ as in IV(3.2) of Fleming and Soner. As in Chapters IV and V of Fleming and Soner, we consider (7.3.2) either in $Q_0$ with bounded terminal (Cauchy) data
$$V(t_1, x) = \psi(x), \quad x \in \mathbb{R}^n, \tag{7.3.3}$$
or in a cylindrical region $Q$ with the boundary data IV(3.4) in Fleming and Soner on $\partial^* Q$.

First of all, let us consider dimension $n = 1$ and the case of Cauchy data (7.3.3). Afterward, we include lateral boundary conditions and outline extensions to dimension $n > 1$. According to IV(3.2) in Fleming and Soner we have for $n = 1$:
$$H(x, p, A) = \max_{v \in U}\Bigl[-f(x, v)\, p - \tfrac{1}{2}\, a(x, v)\, A - L(x, v)\Bigr] \quad \text{with } a = \sigma^2.$$

Consider a time step $h > 0$ and a spatial step $\delta > 0$, which will be related in such a way that inequality (7.3.7) below holds. The approximating controlled discrete time Markov chain has as state space the one-dimensional lattice
$$\Sigma_0^h = \{x = j\delta : j = 0, \pm 1, \pm 2, \dots\}. \tag{7.3.4}$$
Let
$$f^+(x, v) = \max\bigl(f(x, v), 0\bigr), \qquad f^-(x, v) = \max\bigl(-f(x, v), 0\bigr). \tag{7.3.5}$$
We call $f^+$ and $f^-$ the positive and negative parts of $f$. The dynamics of the controlled Markov chain are specified by the one-step transition probabilities
$$p^v(x, x + \delta) = \frac{h}{\delta^2}\Bigl[\frac{a(x, v)}{2} + \delta f^+(x, v)\Bigr], \qquad
p^v(x, x - \delta) = \frac{h}{\delta^2}\Bigl[\frac{a(x, v)}{2} + \delta f^-(x, v)\Bigr], \qquad
p^v(x, x) = 1 - p^v(x, x + \delta) - p^v(x, x - \delta). \tag{7.3.6}$$
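As a small illustration of (7.3.6), the following MATLAB fragment (a sketch only; the handles f and sigma with signature (x, v) and the steps h and delta are assumed to be given in the workspace) builds the three one-step probabilities at a lattice point x for a control v; the last line checks the restriction (7.3.7) discussed next, which guarantees that the probability of staying put is nonnegative.

% One-step transition probabilities (7.3.6) at a lattice point x for a control v (sketch).
a  = sigma(x, v)^2;                            % diffusion coefficient a = sigma^2
fp = max(f(x, v), 0);                          % positive part f^+
fm = max(-f(x, v), 0);                         % negative part f^-
p_up   = h/delta^2*(a/2 + delta*fp);           % p^v(x, x + delta)
p_down = h/delta^2*(a/2 + delta*fm);           % p^v(x, x - delta)
p_stay = 1 - p_up - p_down;                    % p^v(x, x)
assert(h*(a + delta*abs(f(x, v))) <= delta^2); % sufficient condition (7.3.7): p_stay >= 0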

If $y = x + j\delta$ with $j \ne 0, \pm 1$, then $p^v(x, y) = 0$. Thus, one-step transitions are to nearest neighbor states. By definition, $p^v(x, x \pm \delta) \ge 0$. We also require that $p^v(x, x) \ge 0$, which imposes a restriction on $h$ and $\delta$. A sufficient condition for $p^v(x, x) \ge 0$ is that
$$h\bigl[a(x, v) + \delta\,|f(x, v)|\bigr] \le \delta^2 \tag{7.3.7}$$

for all (x, v) ∈ R1 × U. From now on we choose δ = δ (h) such that equation (7.3.7) holds. Let us consider the Markov chain control problem of minimizing

$$J_k^h(x; \pi) = E_{kx}\Biggl\{\sum_{\ell=k}^{M-1} h\, L\bigl(x^\ell, u^\ell\bigr) + \psi\bigl(x^M\bigr)\Biggr\}. \tag{7.3.8}$$
Thus, in (7.2.1) we take $L_\ell = hL$. Let $t_0^h = t_1 - Mh$, where $t_0^h \to t_0$ as $h \downarrow 0$. We write
$$Q_0^h = \bigl\{(t, x) : t = t_0^h + kh,\ k = 0, 1, \dots, M,\ x \in \Sigma_0^h\bigr\}.$$

We denote the value function in (7.2.3) by
$$V_k(x) = V^h(t, x), \quad (t, x) \in Q_0^h.$$

Thus, one discrete time step corresponds to a step of length $h$ on the time scale for the controlled Markov diffusion process. The dynamic programming equation (7.2.4) becomes
$$V^h(t, x) = \min_{v \in U}\bigl[p^v(x, x + \delta)\, V^h(t + h, x + \delta) + p^v(x, x - \delta)\, V^h(t + h, x - \delta) + p^v(x, x)\, V^h(t + h, x) + hL(x, v)\bigr], \tag{7.3.9}$$

with the terminal data (7.3.3) for t = t1. In order to rewrite (7.3.9) in a form which resembles the HJB equation (7.3.2) we introduce the following notations. For any function

$W(t, x)$, let
$$\Delta_x^+ W = \frac{W(t, x + \delta) - W(t, x)}{\delta}, \qquad
\Delta_x^- W = \frac{W(t, x) - W(t, x - \delta)}{\delta}, \qquad
\Delta_x^2 W = \frac{W(t, x + \delta) + W(t, x - \delta) - 2W(t, x)}{\delta^2}.$$
These are respectively the forward and backward first order difference quotients and the second order difference quotient in $x$. Similarly, we consider the first order difference quotient backward in time
$$\Delta_t^- W = \frac{W(t, x) - W(t - h, x)}{h}.$$

Let us replace t by t − h and t + h by t in (7.3.9). By using (7.3.6), rearranging terms and dividing by h we get

$$-\Delta_t^- V^h + \tilde{H}\bigl(x, \Delta_x^+ V^h, \Delta_x^- V^h, \Delta_x^2 V^h\bigr) = 0, \tag{7.3.10}$$
where
$$\tilde{H}(x, p^+, p^-, A) = \max_{v \in U}\Bigl[-f^+(x, v)\, p^+ + f^-(x, v)\, p^- - \frac{a(x, v)}{2}\, A - L(x, v)\Bigr]. \tag{7.3.11}$$

Observe that
$$\tilde{H}(x, p, p, A) = H(x, p, A). \tag{7.3.12}$$

Equation (7.3.10) is called an explicit finite difference scheme, backward in time. Since
$$V^h(t - h, x) = V^h(t, x) - h\tilde{H}\bigl(x, \Delta_x^+ V^h, \Delta_x^- V^h, \Delta_x^2 V^h\bigr) \tag{7.3.13}$$

and the difference quotients $\Delta_x^{\pm} V^h$, $\Delta_x^2 V^h$ are evaluated at $(t, x)$, the values of $V^h$ at time $t - h$ are explicitly expressed in terms of the values of $V^h$ at time $t$. We expect that $V^h \to V$ as $h \to 0$, where $V$ is the value function for the controlled diffusion process.
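For reference, one explicit backward step (7.3.13) on the interior of the spatial lattice can be coded in a few lines. This is only a minimal sketch under the section's notation: Vt holds $V^h(t, \cdot)$ on a lattice x_lat, boundary values are left untouched, and Htilde is an assumed function handle implementing (7.3.11); the full scheme, including the boundary treatment and the control optimization, is given in the appendix code.

function Vprev = explicit_step(Vt, x_lat, delta, h, Htilde)
    % One explicit backward step (7.3.13): computes V^h(t - h, .) from V^h(t, .).
    Vprev = Vt;
    for j = 2:(numel(x_lat) - 1)
        dxp = (Vt(j + 1) - Vt(j))/delta;                   % forward difference
        dxm = (Vt(j) - Vt(j - 1))/delta;                   % backward difference
        d2x = (Vt(j + 1) + Vt(j - 1) - 2*Vt(j))/delta^2;   % second difference
        Vprev(j) = Vt(j) - h*Htilde(x_lat(j), dxp, dxm, d2x);
    end
end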

Implicit finite difference scheme.

If the backward time difference $\Delta_t^-$ in (7.3.10) is replaced by a forward time difference $\Delta_t^+$, then we obtain instead of (7.3.13) the following equation:
$$W^h(t, x) = W^h(t + h, x) - h\tilde{H}\bigl(x, \Delta_x^+ W^h, \Delta_x^- W^h, \Delta_x^2 W^h\bigr), \tag{7.3.14}$$
where $\Delta_x^{\pm} W^h$, $\Delta_x^2 W^h$ are again evaluated at $(t, x)$. This is called a backward implicit finite difference scheme, since (nonlinear) equations must be solved to determine $W^h(t, \cdot)$ from $W^h(t + h, \cdot)$. The implicit scheme (7.3.14) also has a stochastic control interpretation, under restrictions on the step sizes $h$ and $\delta$ similar to (7.3.7).

Boundary conditions.

In the discussion above we considered $x \in \Sigma_0^h$, where $\Sigma_0^h$ is the infinite lattice defined by (7.3.4) with $\delta = \delta(h)$. For actual numerical calculations $\Sigma_0^h$ must be replaced by some finite subset $\Sigma^h$, and then the one-step transition probabilities must be changed at boundary points of $\Sigma^h$.

Infinite horizon discounted problem.

Let us consider the controlled diffusion problem formulated in Section IV.5 in Fleming and

Soner. For simplicity, we again take O = R1. The dynamics of the approximating discrete time Markov chain are again (7.3.6), and the criterion to be minimized is

$$J^h(x; \pi) = E_x\Biggl[\sum_{\ell=0}^{\infty} h\, \lambda^\ell L\bigl(x^\ell, u^\ell\bigr)\Biggr], \quad x \in \Sigma_0^h, \tag{7.3.15}$$
where $\lambda = \exp(-\beta h)$ and $\beta > 0$. The value function
$$V^h(x) = \inf_{\pi} J^h(x; \pi) \tag{7.3.16}$$
satisfies (7.2.8), which, after using (7.3.6) and rearranging terms, becomes
$$0 = \frac{1 - e^{-\beta h}}{h}\, V^h + \tilde{H}\bigl(x, \Delta_x^+ V^h, \Delta_x^- V^h, \Delta_x^2 V^h\bigr). \tag{7.3.17}$$

In view of (7.3.12) this can be regarded as a discretization of the HJB equation IV (5.8) in Fleming and Soner for the infinite horizon controlled diffusion problem.

Controlled diffusion in Rn, n > 1.

We again take autonomous $f(x, v)$, $\sigma(x, v)$ and $L(x, v)$ where now $x \in \mathbb{R}^n$, $f = (f_1, \dots, f_n)$ is $\mathbb{R}^n$-valued and $a = \sigma\sigma'$ is $n \times n$ matrix valued. As in the one-dimensional case, let $f_i^+$ and $f_i^-$ denote the positive and negative parts of $f_i$, $i = 1, \dots, n$ [10]. The matrices $a(x, v) = (a_{ij}(x, v))$, $i, j = 1, \dots, n$, are nonnegative definite; hence $a_{ii} \ge 0$. For $j \ne i$, let $a_{ij}^+$, $a_{ij}^-$ denote the positive and negative parts of $a_{ij}$. Let us assume
$$a_{ii}(x, v) - \sum_{j \ne i} |a_{ij}(x, v)| \ge 0, \tag{7.3.18}$$
$$h \sum_{i=1}^{n}\Bigl[a_{ii}(x, v) - \frac{1}{2}\sum_{j \ne i} |a_{ij}(x, v)| + \delta\,|f_i(x, v)|\Bigr] \le \delta^2.$$

Let $e_1, \dots, e_n$ denote the standard basis for $\mathbb{R}^n$. Thus
$$x = (x_1, \dots, x_n) = \sum_{i=1}^{n} x_i e_i.$$

The approximating controlled Markov chain has as state space the $n$-dimensional lattice
$$\Sigma_0^h = \Bigl\{x = \delta \sum_{i=1}^{n} j_i e_i\Bigr\},$$
where $j_1, \dots, j_n$ are any integers. The one-step transition probabilities are as follows:
$$p^v(x, x \pm \delta e_i) = \frac{h}{2\delta^2}\Bigl[a_{ii}(x, v) - \sum_{j \ne i} |a_{ij}(x, v)| + 2\delta f_i^{\pm}(x, v)\Bigr],$$
$$p^v(x, x + \delta e_i \pm \delta e_j) = \frac{h}{2\delta^2}\, a_{ij}^{\pm}(x, v), \quad i \ne j, \qquad
p^v(x, x - \delta e_i \pm \delta e_j) = \frac{h}{2\delta^2}\, a_{ij}^{\mp}(x, v), \quad i \ne j,$$
$$p^v(x, x) = 1 - \frac{h}{\delta^2} \sum_{i=1}^{n}\Bigl[a_{ii}(x, v) - \frac{1}{2}\sum_{j \ne i} |a_{ij}(x, v)| + \delta\,|f_i(x, v)|\Bigr]. \tag{7.3.19}$$

Moreover, $p^v(x, y) = 0$ for all other $y$. The dynamic programming equation is
$$V^h(t, x) = \min_{v \in U}\Biggl[\sum_{y \in \Sigma_0^h} p^v(x, y)\, V^h(t + h, y) + hL(x, v)\Biggr], \tag{7.3.20}$$
which reduces to (7.3.9) in dimension $n = 1$. By rearranging terms in (7.3.20) and dividing by $h$, we obtain an $n$-dimensional analogue of (7.3.10). Instead of writing this out explicitly, let us recall the definition IV(3.2) in Fleming and Soner of $H(x, p, A)$ in the HJB equation and explain which finite difference quotients are used to approximate the corresponding partial derivatives. For $i = 1, \dots, n$, and any function $W(t, x)$ let
$$\Delta_{x_i}^{\pm} W = \pm\delta^{-1}\bigl[W(t, x \pm \delta e_i) - W(t, x)\bigr], \qquad
\Delta_{x_i}^2 W = \delta^{-2}\bigl[W(t, x + \delta e_i) + W(t, x - \delta e_i) - 2W(t, x)\bigr].$$
The time derivative $V_t$ is replaced by $\Delta_t^- V^h$, just as in (7.3.10). If $f_i(x, v) \ge 0$, then $V_{x_i}$ is replaced by $\Delta_{x_i}^+ V^h$, and if $f_i(x, v) < 0$, by $\Delta_{x_i}^- V^h$. Similarly, $V_{x_i x_i}$ is replaced by $\Delta_{x_i}^2 V^h$. For the mixed second order partial derivatives, when $i \ne j$, $V_{x_i x_j}$ is replaced by $\Delta_{x_i x_j}^+ V^h$ if $a_{ij}(x, v) \ge 0$ and by $\Delta_{x_i x_j}^- V^h$ if $a_{ij}(x, v) < 0$, where
$$\Delta_{x_i x_j}^{+} W = \tfrac{1}{2}\delta^{-2}\bigl[2W(t, x) + W(t, x + \delta e_i + \delta e_j) + W(t, x - \delta e_i - \delta e_j)\bigr] - \tfrac{1}{2}\delta^{-2}\bigl[W(t, x + \delta e_i) + W(t, x - \delta e_i) + W(t, x + \delta e_j) + W(t, x - \delta e_j)\bigr],$$
$$\Delta_{x_i x_j}^{-} W = -\tfrac{1}{2}\delta^{-2}\bigl[2W(t, x) + W(t, x + \delta e_i - \delta e_j) + W(t, x - \delta e_i + \delta e_j)\bigr] + \tfrac{1}{2}\delta^{-2}\bigl[W(t, x + \delta e_i) + W(t, x - \delta e_i) + W(t, x + \delta e_j) + W(t, x - \delta e_j)\bigr].$$

In order to rewrite (7.3.20) as a backward difference equation like (7.3.13), we introduce

the following notation. For each $x$ and $p_i^{\pm}$, $A_{ii}$, $A_{ij}^{\pm}$, $i, j = 1, \dots, n$, let
$$\tilde{H}\bigl(x, p_i^{\pm}, A_{ii}, A_{ij}^{\pm}\bigr) = \max_{v \in U}\Biggl\{\sum_{i=1}^{n}\Bigl[-f_i^+(x, v)\, p_i^+ + f_i^-(x, v)\, p_i^- - \frac{a_{ii}(x, v)}{2}\, A_{ii} + \sum_{j \ne i}\Bigl(-\frac{a_{ij}^+(x, v)}{2}\, A_{ij}^+ + \frac{a_{ij}^-(x, v)}{2}\, A_{ij}^-\Bigr)\Bigr] - L(x, v)\Biggr\}. \tag{7.3.21}$$

Then, as in (7.3.13),
$$V^h(t - h, x) = V^h(t, x) - h\tilde{H}\bigl(x, \Delta_{x_i}^{\pm} V^h, \Delta_{x_i}^2 V^h, \Delta_{x_i x_j}^{\pm} V^h\bigr). \tag{7.3.22}$$

Computational methods. The value function $V^h(t, x)$ remains to be computed after the HJB equation has been replaced by its finite difference approximation. For the finite time horizon problem, the dynamic programming equation (7.3.20), or (7.3.9) in dimension $n = 1$, can be solved backward in time, at least in principle [10]. In practice these calculations can only be done for quite low dimension $n$. In (7.3.20) a minimum over $U$ must in general be computed repeatedly [10]. Fortunately, in many problems of interest there is an explicit formula for $H(x, p, A)$, and this tedious step can be avoided [10]. For numerical solutions of linear parabolic PDEs, implicit finite difference schemes are often found to be advantageous.

7.4 Convergence of Finite Difference Approximations

We wish to show that the value function $V^h$ obtained from the finite difference scheme converges to the value function $V$ for the controlled Markov diffusion as $h \to 0$.

Let us first describe the Barles-Souganidis method for the HJB equation (7.3.2) in $Q_0$, with the terminal (Cauchy) data (7.3.3). Let $\Sigma^h$ be a discrete subset of $\mathbb{R}^n$, for $0 < h \le 1$, and let $B(\Sigma^h)$ denote the space of bounded functions on $\Sigma^h$. We assume that
$$\lim_{h \downarrow 0} \mathrm{dist}(x, \Sigma^h) = 0 \quad \text{for all } x \in \mathbb{R}^n.$$

Let $F_h$ be an operator on $B(\Sigma^h)$. We consider the "abstract" finite difference equation, backward in time,
$$V^h(t, x) = F_h\bigl[V^h(t + h, \cdot)\bigr](x), \quad x \in \Sigma^h,\ t = t_0^h + kh,\ k = 0, 1, \dots, M - 1, \tag{7.4.1}$$
with
$$V^h(t_1, x) = \psi(x), \quad x \in \Sigma^h. \tag{7.4.2}$$

We make the following assumptions:

$$F_h(\varphi_1) \le F_h(\varphi_2) \quad \text{if } \varphi_1 \le \varphi_2. \quad \text{(monotonicity)} \tag{7.4.3}$$

$$F_h(\varphi + c) = F_h(\varphi) + c \quad \text{for all } c \in \mathbb{R}. \tag{7.4.4}$$

For $0 < h < 1$, there exists a solution $V^h$ to (7.4.1), (7.4.2) and a constant $K$ such that $\|V^h\| \le K$. (stability) (7.4.5)

$$\lim_{\substack{(s, y) \to (t, x) \\ h \downarrow 0}} h^{-1}\bigl[F_h[w(s + h, \cdot)](y) - w(s, y)\bigr] = w_t(t, x) - H\bigl(x, D_x w(t, x), D_x^2 w(t, x)\bigr) \tag{7.4.6}$$
for every "test function" $w \in C^{1,2}(\mathbb{R}^{n+1})$. (consistency)

Lemma 7.1. $V^*$ is a viscosity subsolution of the HJB equation, and $V_*$ is a viscosity supersolution.

Let us assume that (7.3.1) and IV(2.2) of Fleming and Soner hold. Let $V(t, x)$ be the value function for the controlled diffusion process, as in IV(2.10) of Fleming and Soner. We also make the following assumption, which ensures that $V^h$ assumes the terminal data (7.4.2) in a uniform way:
$$\lim_{\substack{(s, y) \to (t_1, x) \\ h \downarrow 0}} V^h(s, y) = \psi(x) \tag{7.4.7}$$
uniformly for $x$ in any compact subset of $\mathbb{R}^n$.

Theorem 7.2. Let $V^h$ be a solution to (7.4.1) and (7.4.2). Assume that (7.4.3)-(7.4.6) and (7.4.7) hold. Then
$$\lim_{\substack{(s, y) \to (t, x) \\ h \downarrow 0}} V^h(s, y) = V(t, x) \tag{7.4.8}$$
uniformly on any compact subset of $Q_0$.

Lemma 7.3. $V^h \le W$ for any supersolution $W$ of (7.4.1)-(7.4.2), and $Z \le V^h$ for any subsolution $Z$ of (7.4.1)-(7.4.2).

Theorem 7.4. Let $V^h(t, x)$ be defined by (7.3.20), with terminal data (7.3.3). Suppose that $\psi$ is bounded and uniformly continuous in addition to assumptions (7.3.1) and IV(2.2) of Fleming and Soner. Then (7.4.8) holds uniformly on $Q_0$.

Chapter 8: Application: Merton's Portfolio Optimization Problem

Merton's portfolio optimization problem is a well-known and classical topic in mathematical finance. It concerns finding the optimal investment strategy for an investor who has only two possible objects of investment: one is a risk-free asset (such as a bond), the price of which grows at a fixed rate, and the other is a risky asset (such as a stock), the price of which follows a geometric Brownian motion [4]. The investor has two ways to allocate his wealth: consumption and investment to accumulate further wealth. In order to maximize his expected utility from intermediate consumption and terminal wealth, he needs to choose how much to consume and how to allocate his wealth between the risky asset and the risk-free one [4].

In 1969 and 1971 Merton formulated the optimal investment and consumption problem as a stochastic optimal control problem which could be solved with the dynamic programming method [4]. Using the methods of stochastic control, he derived a fully nonlinear partial differential equation (the well-known Hamilton-Jacobi-Bellman (HJB) equation) for the value function of the optimization problem [11]. The main difficulty of the Merton problem is the nonlinearity of the HJB equation, which makes it difficult to solve analytically. The utility function, which describes the investor's risk aversion, plays an important role in the Merton problem. For some special cases of utility functions which belong to the constant relative risk aversion (CRRA) class (i.e., logarithmic and power utility), Merton found closed-form analytical solutions of the HJB equation on a finite horizon [4]. For these cases, it turns out that the optimal strategy is to keep a constant fraction of the wealth in stocks.

8.1 Merton's Portfolio Optimization Problem and the HJB Equation

Suppose we have a financial market with two assets being traded continuously on a finite horizon $[0,T]$. One asset is a risk-free asset called bond, whose price $\{P_t, t \ge 0\}$ is modelled according to the ordinary differential equation (ODE)
$$dP_t = rP_t\,dt, \quad t \in [0,T], \tag{8.1.1}$$
with $r$ being the risk-free interest rate. The other one is a risky asset called stock. We model the price $S_t$ of the risky asset as the solution of
$$dS_t = \mu S_t\,dt + \sigma_0 S_t\,dW_t, \quad t \in [0,T], \tag{8.1.2}$$

where $\mu$ is the drift rate, $\sigma_0$ is the volatility, and $W_t$ is a standard Brownian motion.

An investor is endowed with a known initial wealth $x_0$ and the wealth at time $t$ is denoted by $X_t$. In order to maximize his expected utility from intermediate consumption and terminal wealth, the investor needs to decide how much to consume and, in the meantime, how much to invest in the stock market at any time $t$ prior to $T$. The consumption rate per unit time at time $t$ is denoted by $c(t)$ and the investment proportion $\pi(t)$ represents the fraction of total wealth that is invested in the risky asset at time $t$ [4]. The remaining fraction $1 - \pi(t)$ is invested in the risk-free bond. As a result, the total wealth $X_t$ is governed by the following SDE:
$$dX_t = f\bigl(t, X_t, (\pi(t), c(t))\bigr)\,dt + \sigma\bigl(t, X_t, (\pi(t), c(t))\bigr)\,dW_t = \bigl\{[r + \pi(t)(\mu - r)]X_t - c(t)\bigr\}\,dt + X_t\,\pi(t)\,\sigma_0\,dW_t. \tag{8.1.3}$$

The main goal of the Merton portfolio optimization problem is to obtain the optimal investment and consumption strategies, i.e. to determine π (t) and c (t), such that the

expected utility from accumulated consumption and terminal wealth is maximized. (Parts of this chapter follow [4] and [11].) The objective functional is given as
$$\max_{(\pi(\cdot),\, c(\cdot))} E\Bigl[\int_0^{\tau} e^{-\beta s}\, U(c(s))\,ds + e^{-\beta T} B(X(\tau))\Bigr], \tag{8.1.4}$$
where $\tau = \inf\{s \mid X_s \le 0\}$; $E$ is the expectation operator; $\beta$ is the subjective discount rate; $U$ is a function measuring the utility from intermediate consumption $c(s)$; and $B$ is a function measuring the utility from terminal wealth $X(\tau)$ [4]. Moreover, to make sure the objective functional is well defined, some constraints may be imposed on the wealth and consumption processes. When the utility function is defined on $\mathbb{R}_+$, such as the power and logarithmic functions adopted by Merton (1969) [4], we need the following constraints:
$$0 \le \pi(t) \le 1, \quad c(t) \ge 0, \quad X_t \ge 0, \quad t \in [0,T]. \tag{8.1.5}$$

In summary, the Merton problem has been modelled as a stochastic optimal control problem with the objective functional (8.1.4), driven by the dynamics of the wealth (8.1.3), and subject to the possible constraint (8.1.5).
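To illustrate the controlled dynamics (8.1.3), a simple Euler-Maruyama simulation of the wealth process under a fixed (not necessarily optimal) strategy could look as follows. This is only an illustrative MATLAB sketch: the function name, the absorption at zero mimicking the ruin time, and the strategy handles pi_t and c_t are assumptions for the example, while the market constants match those used in Chapter 9.

function X = simulate_wealth(pi_t, c_t, x0, T, N)
    % Euler-Maruyama simulation of the wealth SDE (8.1.3) under a fixed strategy.
    % pi_t(t, x): investment fraction; c_t(t, x): consumption rate (assumed handles).
    r = 0.07; mu = 0.12; sigma0 = 0.4;      % market constants as in Chapter 9
    dt = T/N;
    X = zeros(1, N + 1);
    X(1) = x0;
    for i = 1:N
        t = (i - 1)*dt;
        drift = (r + pi_t(t, X(i))*(mu - r))*X(i) - c_t(t, X(i));
        diffu = sigma0*pi_t(t, X(i))*X(i);
        X(i + 1) = max(X(i) + drift*dt + diffu*sqrt(dt)*randn, 0);   % absorb at 0 (ruin)
    end
end

For instance, simulate_wealth(@(t, x) 0.5, @(t, x) 0.1*x, 1, 1, 1000) produces one path under a constant investment fraction and proportional consumption.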

8.2 The HJB Equation

The idea of dynamic programming is to break down the optimization problem into smaller sub-problems and then combine the solutions to reach an overall solution.

We apply the dynamic programming method to find a solution of the stochastic optimal control problem, as was done in Chapter 3, for the wealth process $X_t$.

Let $(t, x) \in [0,T) \times \mathbb{R}_+$ and consider the following control system over $[t, T]$

$$dX_s = f\bigl(s, X_s, (\pi(s), c(s))\bigr)\,ds + \sigma\bigl(s, X_s, (\pi(s), c(s))\bigr)\,dW_s, \qquad X_t = x, \tag{8.2.1}$$
i.e.,
$$dX_s = \bigl\{[r + \pi(s)(\mu - r)]X_s - c(s)\bigr\}\,ds + X_s\,\pi(s)\,\sigma_0\,dW_s, \qquad X_t = x, \tag{8.2.2}$$
with the same constraint (8.1.5). The cost functional is

$$J\bigl(t, x \mid \pi(\cdot), c(\cdot)\bigr) = E\Bigl[\int_t^{\tau} L\bigl(s, X_s, (\pi(s), c(s))\bigr)\,ds + \Psi(\tau, X_\tau)\Bigr] = E\Bigl[\int_t^{\tau} e^{-\beta s}\, U(c(s))\,ds + e^{-\beta T} B(X(\tau)) \,\Big|\, X_t = x\Bigr], \tag{8.2.3}$$
where $\tau = \inf\{s \mid X_s \le 0\}$, $L\bigl(s, X_s, (\pi(s), c(s))\bigr)$ is the running cost and $\Psi(\tau, X_\tau)$ is the terminal cost.

Define the value function as
$$V(t, x) = \max_{(\pi(\cdot),\, c(\cdot))} J\bigl(t, x \mid \pi(\cdot), c(\cdot)\bigr). \tag{8.2.4}$$

Let $\mathbb{S}_+^n$ denote the set of symmetric, nonnegative definite $n \times n$ matrices $A = (A_{ij})$, $i, j = 1, \dots, n$. Let $a = \sigma\sigma'$ and
$$\mathrm{tr}(aA) = \sum_{i, j=1}^{n} a_{ij}A_{ij}, \quad \text{which reduces to } aA \text{ in the scalar case } n = 1. \tag{8.2.5}$$

8.2.1 Utility Function

To be able to characterize the investor's decisions and preferences we need the concept of utility functions. A utility function expresses how satisfied the investor is with a certain outcome of the investment; as such, it is a function of the wealth or of the consumption [11].

Assumption:

1. The investor is risk averse, meaning that he will only accept investments which are better than a fair game; this implies strict concavity of the utility function [11].

2. The investor will always prefer more wealth to less; this can be referred to as non-satiation of the investor and it implies that the utility function is strictly monotonically increasing [11].

3. The utility function $B$ is assumed to take the same form as the utility function $U$.

Definition 8.1. For a subset $S \subseteq \mathbb{R}$, $U : S \to \mathbb{R}$ is a utility function if $U$ is strictly increasing, strictly concave and continuous on $S$.

In this thesis, we will consider only the power utility function, which is defined as:

$$U(x) = \frac{x^\gamma}{\gamma}, \tag{8.2.6}$$
where $\gamma \in \mathbb{R}$, $\gamma < 1$, $\gamma \ne 0$. The Arrow-Pratt measure of relative risk aversion of such a utility function is given as
$$R(x) = -\frac{x\,U''(x)}{U'(x)} = 1 - \gamma. \tag{8.2.7}$$
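For completeness, (8.2.7) follows from a one-line computation with the power utility (8.2.6):
$$U'(x) = x^{\gamma - 1}, \qquad U''(x) = (\gamma - 1)\,x^{\gamma - 2}, \qquad R(x) = -\frac{x\,(\gamma - 1)\,x^{\gamma - 2}}{x^{\gamma - 1}} = 1 - \gamma,$$
so the relative risk aversion is constant, which is why the power utility belongs to the CRRA class mentioned above.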

The HJB function arising from the Merton portfolio optimization problem subject to the power utility function reads as follows.

For $(t, x) \in \bar{Q}_0 = [t_0, t_1] \times \mathbb{R}^n$, $p \in \mathbb{R}^n$, $A \in \mathbb{S}_+^n$, we have
$$H(t, x, p, A) = \sup_{(\pi, c) \in U}\Bigl[-f(t, x, \pi, c) \cdot p - \tfrac{1}{2}\,\mathrm{tr}\,a(t, x, \pi, c)\,A - L(t, x, \pi, c)\Bigr] = \sup_{\pi \in [0, 1],\, c \ge 0}\Bigl[-\bigl[(r + \pi(\mu - r))x - c\bigr]p - \tfrac{1}{2}A\sigma_0^2 x^2 \pi^2 - e^{-\beta t}\,U(c)\Bigr]. \tag{8.2.8}$$

Let $p, A$ be given, then
$$g(\pi, c) = -\bigl[(r + \pi(\mu - r))x - c\bigr]p - \tfrac{1}{2}A\sigma_0^2 x^2 \pi^2 - e^{-\beta t}\,U(c). \tag{8.2.9}$$

We need to maximize the smooth function $g(\pi, c)$ over a closed domain. However, the Lagrange (first-order) necessary optimality condition is not directly applicable there. We therefore split the domain into its interior and its boundary and apply the first-order condition only in the interior. A straightforward calculation, which we omit here, shows that the optimal value cannot be attained at the boundary, so it suffices to consider the interior. Therefore, for $\pi \in (0, 1)$, $c > 0$, we get:

$$\nabla_{(\pi, c)}\, g(\pi, c) = \begin{pmatrix} -(\mu - r)\,x\,p - A\sigma_0^2 x^2 \pi \\ p - e^{-\beta t}\,U'(c) \end{pmatrix} = 0 \tag{8.2.10}$$
$$\iff \quad \begin{cases} \pi = -\dfrac{(\mu - r)\,p}{A\sigma_0^2\, x} \\[2mm] c^{\gamma - 1} = p\,e^{\beta t} \end{cases} \quad\iff\quad \begin{cases} \pi = -\dfrac{(\mu - r)\,p}{A\sigma_0^2\, x} \\[2mm] c = \bigl(p\,e^{\beta t}\bigr)^{\frac{1}{\gamma - 1}}. \end{cases} \tag{8.2.11}$$

Assuming $p \ge 0$, the optimal pair $(\pi^\star, c^\star)$ is given as
$$c^*(t, x, p, A) = \bigl(p\,e^{\beta t}\bigr)^{\frac{1}{\gamma - 1}}, \qquad \pi^*(t, x, p, A) = -\frac{(\mu - r)\,p}{A\sigma_0^2\, x}, \tag{8.2.12}$$
and
$$V(t, x) = h(t)^{1 - \gamma}\,\frac{x^\gamma}{\gamma},$$
where $h(t)$ takes strictly positive values for every $t$.
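The closed form (8.2.12) is what the anonymous function u_ast evaluates in the appendix code (where the exponent is named alpha instead of $\gamma$). A minimal sketch of the same map, assuming the constants mu, r, sigma0, beta and gamma of this chapter are available in the workspace:

% Candidate optimal controls (8.2.12) as functions of (t, x, p, A), with
% p = V_x and A = V_xx (sketch mirroring u_ast in the appendix code).
c_star  = @(t, x, p, A) (p*exp(beta*t))^(1/(gamma - 1));   % optimal consumption rate
pi_star = @(t, x, p, A) -(mu - r)*p/(A*sigma0^2*x);        % optimal investment fraction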

Plugging the optimal pair (π?, c?) into the HJB function (8.2.8), we get

$$H(t, x, p, A) = -f(t, x, c^*, \pi^*)\,p - \tfrac{1}{2}\,a(t, x, c^*, \pi^*)\,A - L(t, x, c^*, \pi^*) = -\bigl[(r + \pi^*(\mu - r))x - c^*\bigr]p - \tfrac{1}{2}A\sigma_0^2 x^2 (\pi^*)^2 - e^{-\beta t}\,U(c^*), \tag{8.2.13}$$
which is the theoretical HJB function.

Chapter 9: Numerical Results

In this chapter, we solve the HJB equation associated with the Merton portfolio problem introduced in the previous chapter using the finite difference approximations discussed in Chapter 7. We compare the theoretical HJB function to the numerical one, compute the optimal investment and consumption strategy $(\pi^\star, c^\star)$ and the optimal wealth process. This chapter contains our main contribution.

9.1 Brief Summary of the Steps Involved in Obtaining the HJB Equations Associated with the Merton’s Portfolio Problem

Let

$$dX_s = f\bigl(s, X_s, (\pi(s), c(s))\bigr)\,ds + \sigma\bigl(s, X_s, (\pi(s), c(s))\bigr)\,dW_s, \qquad X_t = x,$$
i.e.,
$$dX_s = \bigl\{[r + \pi(s)(\mu - r)]X_s - c(s)\bigr\}\,ds + X_s\,\pi(s)\,\sigma_0\,dW_s, \qquad X_t = x.$$

Here, we have the following variables:

State process: Xs

Control process: $(\pi(s), c(s))$, where $\pi(s)$ is the fraction of wealth invested in the risky asset and $c(s)$ is the consumption rate.

Constraints: 0 ≤ π ≤ 1, c ≥ 0,X ≥ 0

Here $a = \sigma\sigma' = \sigma^2$. The cost functional is
$$J\bigl(t, x \mid \pi(\cdot), c(\cdot)\bigr) = E\Bigl[\int_t^{\tau} L\bigl(s, X_s, (\pi(s), c(s))\bigr)\,ds + \Psi(\tau, X_\tau)\Bigr] = E\Bigl[\int_t^{\tau} e^{-\beta s}\, U(c(s))\,ds + e^{-\beta T} B(X(\tau)) \,\Big|\, X_t = x\Bigr],$$
where $\tau = \inf\{s \mid X_s \le 0\}$. Note that $\tau$ is the exit time of $X_s$. We take $U(\cdot) \equiv B(\cdot) = \frac{(\cdot)^\gamma}{\gamma}$, i.e. the utility function $B$ takes the same form as the utility function $U$. Let
$$H(t, x, p, A) = \sup_{(\pi, c) \in U}\Bigl[-f(t, x, \pi, c) \cdot p - \tfrac{1}{2}\,\mathrm{tr}\,a(t, x, \pi, c)\,A - L(t, x, \pi, c)\Bigr] = \sup_{\pi \in [0, 1],\, c \ge 0}\Bigl[-\bigl[(r + \pi(\mu - r))x - c\bigr]p - \tfrac{1}{2}A\sigma_0^2 x^2 \pi^2 - e^{-\beta t}\,U(c)\Bigr],$$
where $\mathrm{tr}(aA) = aA$ in the scalar case.

Let p, A be given, then

$$g(\pi, c) = -\bigl[(r + \pi(\mu - r))x - c\bigr]p - \tfrac{1}{2}A\sigma_0^2 x^2 \pi^2 - e^{-\beta t}\,U(c).$$

Therefore, for $\pi \in (0, 1)$, $c > 0$, we get:
$$\nabla_{(\pi, c)}\, g(\pi, c) = \begin{pmatrix} -(\mu - r)\,x\,p - A\sigma_0^2 x^2 \pi \\ p - e^{-\beta t}\,U'(c) \end{pmatrix} = 0$$
$$\iff \quad \begin{cases} \pi = -\dfrac{(\mu - r)\,p}{A\sigma_0^2\, x} \\[2mm] c^{\gamma - 1} = p\,e^{\beta t} \end{cases} \quad\iff\quad \begin{cases} \pi = -\dfrac{(\mu - r)\,p}{A\sigma_0^2\, x} \\[2mm] c = \bigl(p\,e^{\beta t}\bigr)^{\frac{1}{\gamma - 1}}. \end{cases}$$

Assuming $p \ge 0$, the optimal pair $(\pi^\star, c^\star)$ is given as
$$c^*(t, x, p, A) = \bigl(p\,e^{\beta t}\bigr)^{\frac{1}{\gamma - 1}}, \qquad \pi^*(t, x, p, A) = -\frac{(\mu - r)\,p}{A\sigma_0^2\, x},$$
and
$$V(t, x) = h(t)^{1 - \gamma}\,\frac{x^\gamma}{\gamma},$$
where $h(t)$ takes strictly positive values for every $t$.

Plugging the optimal pair (π?, c?) into the HJB function (8.2.8), we get

$$H(t, x, p, A) = -f(t, x, c^*, \pi^*)\,p - \tfrac{1}{2}\,a(t, x, c^*, \pi^*)\,A - L(t, x, c^*, \pi^*) = -\bigl[(r + \pi^*(\mu - r))x - c^*\bigr]p - \tfrac{1}{2}A\sigma_0^2 x^2 (\pi^*)^2 - e^{-\beta t}\,U(c^*),$$
which is the theoretical HJB function. Let
$$f^+(x, v) = \max\bigl(f(x, v), 0\bigr), \qquad f^-(x, v) = \max\bigl(-f(x, v), 0\bigr).$$

We call $f^+$ and $f^-$ the positive and negative parts of $f$. The dynamics of the controlled Markov chain are specified by the one-step transition probabilities
$$p^v(x, x + \delta) = \frac{h}{\delta^2}\Bigl[\frac{a(x, v)}{2} + \delta f^+(x, v)\Bigr], \qquad
p^v(x, x - \delta) = \frac{h}{\delta^2}\Bigl[\frac{a(x, v)}{2} + \delta f^-(x, v)\Bigr], \qquad
p^v(x, x) = 1 - p^v(x, x + \delta) - p^v(x, x - \delta).$$

The numerical HJB function is therefore given as:
$$\bar{H}\bigl(t, x, p^+, p^-, A\bigr) = \begin{cases} -\bigl[(r + \pi^*(\mu - r))x - c^*\bigr]p^+ - \tfrac{1}{2}A\sigma_0^2 x^2 (\pi^*)^2 - e^{-\beta t}\,U(c^*) & \text{if } f \ge 0, \\[2mm] -\bigl[(r + \pi^*(\mu - r))x - c^*\bigr]p^- - \tfrac{1}{2}A\sigma_0^2 x^2 (\pi^*)^2 - e^{-\beta t}\,U(c^*) & \text{if } f < 0. \end{cases}$$

Let us consider a financial market which consists of a bond with risk-free interest rate $r = 0.07$ and a single stock with drift $\mu = 0.12$ and volatility $\sigma_0 = 0.4$. The discount rate is $\beta = 0.15$, the utility exponent is $\alpha = 0.5$ (playing the role of $\gamma$ above), and the terminal trading time is $T = 1$.

[Figure 9.1: three surface plots over the $(t, x)$ domain, titled "HJB: theoretical", "HJB: numerical" and "HJB: discretization error".]

Figure 9.1: Comparing theoretical and numerical HJB. $V(t, x)$ measures the optimal average utility for an investor who starts following the optimal strategy at time $t$ with wealth $x$. So if the investor starts at a certain time $t$ with a certain wealth $x$, then both the graph of the theoretical and that of the numerical HJB function (the latter only approximately) tell us what the investor will get on average in terms of utility. The domain of the value function $V(t, x)$ is unbounded; we therefore truncate it to a compact set by imposing an artificial Neumann boundary condition. The last graph measures the difference between the theoretical and the numerical HJB function. It quantifies the discretization error, which is approximately less than 4%.


Figure 9.2: Numerical approximation of the optimal investment and consumption strategies.

$\pi^\star$ and $c^\star$ denote the numerical optimal investment and consumption strategies, respectively. The graph of the numerical optimal investment strategy $\pi^\star$ tells us that the more wealth the investor has, the more he or she is willing to risk, i.e. to invest. Likewise, the graph of the numerical optimal consumption strategy $c^\star$ tells us that the more money the investor has, the more he or she will consume. However, the investor consumes more not sooner but later: if the investor consumes too much now, there is the risk that he or she might not have much to consume later or might even go bankrupt before the terminal time. The numerical strategy $\pi^{\star, \delta, h}_{t, x}$ has a numerical artifact at $x = 0$ due to the introduction of the artificial boundary.

Figure 9.3: Wealth process. The graph measures the evolution of the investor's total wealth over time. The total wealth of the investor decreases over time since he consumes a fraction of it.

Chapter 10: Conclusions and Further Directions

In this thesis, we began with a brief introduction to stochastic calculus, which serves as a fundamental tool throughout this work. We outlined the basic structure of a stochastic optimization problem and the features it is formulated with, and showed how the Dynamic Programming Principle (DPP) can be used to derive the HJB equation. We applied this concept to Merton's portfolio optimization problem subject to the power utility function. Application of the DPP to Merton's portfolio optimization problem leads to a nonlinear partial differential equation (PDE), the well-known Hamilton-Jacobi-Bellman (HJB) equation. The main difficulty of Merton's portfolio optimization problem is the nonlinearity of the HJB equation, which usually makes it inaccessible to analytical solution attempts. We therefore derived the dynamic programming equation for the resulting Merton portfolio optimization problem and then solved it numerically using a finite difference scheme.

A future work would be to develop a new full-loop numerical technique for stochastic optimization of moderate- to (potentially) high-dimensional controlled Markovian diffusions. A possible approach could be based on Bellman's dynamic programming principle and a meshless discretization of the value function. The method would need to be implemented, analyzed, assessed and applied to various problems in mathematical finance and data science.

Bibliography

[1] Greif, C. (2017). Numerical Methods for Hamilton-Jacobi-Bellman Equations. University of Wisconsin-Milwaukee, U.S.A.

[2] Filatova, D., Orlowski, A., Dicoussar, V. (2014). Estimating the time-varying parameters of SDE models by maximum principle. 2014 19th International Conference on Methods and Models in Automation and Robotics (MMAR), Miedzyzdroje, Poland.

[3] Redeker, I. & Wunderlich, R. (2016). Portfolio optimization under dynamic risk constraints: continuous vs. discrete time trading. Statistics & Risk Modeling, De Gruyter, vol. 35(1-2), pages 1-21.

[4] Zhu, S. & Ma, G. (2018). An analytical solution for the HJB equation arising from the Merton problem. International Journal of Financial Engineering. 1850008. 10.1142/S2424786318500081.

[5] Evans, L. C. (2013). An Introduction to Stochastic Differential Equations. American Mathematical Society.

[6] Sondermann, D. (2006). Introduction to Stochastic Calculus for Finance: A New Didactic Approach. Springer-Verlag Berlin Heidelberg.

[7] Handel, R. V. (2007). Stochastic Calculus, Filtering, and Stochastic Control. Lecture Notes.

[8] Allen, E. (2007). Modeling with Itô Stochastic Differential Equations. Springer.

[9] Touzi, N. (2010). Optimal Stochastic Control, Stochastic Target Problems, and Backward SDE. Lecture Notes.

[10] Fleming, W. H. & Soner, H. M. (2006). Controlled Markov Processes and Viscosity Solutions. Second Edition. Springer.

[11] Tikosi, K. (2016). Merton’s Portfolio Problem. Central European University, Budapest, Hungary.

Appendix

Matlab Code

function merton_example
%% Constants and parameters
% Market constants
r = 0.07;
mu = 0.12;
sigma0 = 0.4;

% Utility and discounting constants
alpha = 0.5;
beta = 0.15;
T = 1.0;

U = @(x) x^alpha/alpha;
U_p = @(x) x^(alpha - 1);

% Costs
L = @(s, x, u) exp(-beta*s)*U(u(2));   % running costs
psi = @(s, x) exp(-beta*s)*U(x);       % terminal costs

% Market model drift and volatility
f = @(s, x, u) ((r + u(1)*(mu - r))*x - u(2));   % u = [_pi, c]
sigma = @(s, x, u) x*u(1)*sigma0;

% Optimal Markovian control strategy in the continuous case
u_ast = @(s, x, p, A) ...
    [-(mu - r)*p/(A*sigma0^2*x); (p*exp(beta*s))^(1/(alpha - 1))];

%% Solve the discrete HJB PDE
% x domain truncation
x_max = 1.5;

N_x = 50;
N_t = N_x^2*10;
x_lat = linspace(0, x_max, N_x);
t_lat = linspace(0, T, N_t);
delta = x_max/(N_x - 1);
h = T/(N_t - 1);

V = zeros(N_t, N_x);
for j = 1:N_x
    V(end, j) = psi(T, x_lat(j));
end

for i = 1:(N_t - 1)
    % At j = 0, Dirichlet 0

    % Inner points
    for j = 2:(N_x - 1)
        dxp = (V(end - i + 1, j + 1) - V(end - i + 1, j))/delta;
        dxm = (V(end - i + 1, j) - V(end - i + 1, j - 1))/delta;
        d2x = (V(end - i + 1, j + 1) + V(end - i + 1, j - 1) - 2*V(end - i + 1, j))/delta^2;

        [H, ~] = discrete_HJB(t_lat(end - i + 1), x_lat(j), dxp, dxm, d2x);

        V(end - i, j) = V(end - i + 1, j) - h*H;
    end

    % At j = end, Neumann 0
    V(end - i, end) = V(end - i, end - 1);
end

figure(1);

% Plotting theoretical solution
I_plot = [1:(N_t/100):N_t N_t];
[T_m, X_m] = meshgrid(t_lat(I_plot), x_lat);
subplot(1, 3, 1);
c_ = alpha/(1 - alpha)*(r + 1/(2*(1 - alpha))*(mu - r)^2/sigma0^2);
h_ = exp(-beta*T_m/(1 - alpha)).*exp(-c_*(T - T_m)) + ...
    (1 - alpha)*exp(-c_*T_m)/(beta - (1 - alpha)*c_).* ...
    (exp(-(beta - (1 - alpha)*c_)*T_m/(1 - alpha)) - exp(-(beta - (1 - alpha)*c_)/(1 - alpha)*T));
V_actual = h_.^(1 - alpha).*X_m.^alpha/alpha;
mesh(T_m, X_m, V_actual);
xlabel('t'); ylabel('x'); zlabel('V(t, x)');
view([70 30]);
title('HJB: theoretical');

% Plotting numerical solution
subplot(1, 3, 2);
mesh(T_m, X_m, V(I_plot, :)');
xlabel('t'); ylabel('x'); zlabel('V^{\delta, h}(t, x)');
view([70 30]);
title('HJB: numerical');

% Plotting numerical error
subplot(1, 3, 3);
mesh(T_m, X_m, V(I_plot, :)' - V_actual);
xlabel('t'); ylabel('x'); zlabel('\epsilon(t, x)');
view([70 30]);
title('HJB: discretization error');

%% MC simulation
N = 10;
x_path = zeros(1, N_t);
for k = 1:N
    x_ind = floor(N_x*0.6);
    x_path(k, 1) = x_lat(x_ind);

    for i = 1:(N_t - 1)
        s = t_lat(i);
        x = x_lat(x_ind);

        if ((x_ind == 1) || (x_ind == N_x))
            x_path(k, (i + 1):end) = x;
            break;
        end

        dxp = (V(i, x_ind + 1) - V(i, x_ind))/delta;
        dxm = (V(i, x_ind) - V(i, x_ind - 1))/delta;
        d2x = (V(i, x_ind + 1) + V(i, x_ind - 1) - 2*V(i, x_ind))/delta^2;

        [~, u] = discrete_HJB(s, x, dxp, dxm, d2x);

        f_ = f(s, x, u);
        f_p = f_*(f_ >= 0);
        f_m = -f_*(f_ < 0);

        p = [h/delta^2*sigma(t_lat(i), x, u)^2/2 + delta*f_m; 0; ...
             h/delta^2*sigma(t_lat(i), x, u)^2/2 + delta*f_p];
        p(2) = 1 - p(1) - p(3);

        c = randcat(p);
        x_ind = x_ind + c;
        x_path(k, i + 1) = x_lat(x_ind);
    end
end

figure(2);
hold on;
xlabel('t'); ylabel('X_t^{\ast, {\delta}, h}');
for k = 1:N
    plot(t_lat, x_path(k, :));
end

%% Optimal strategy
figure(3);
I_plot = [1:(N_t/100):N_t N_t];
[T_m, X_m] = meshgrid(t_lat(I_plot), x_lat(2:end-1));

PI = zeros(size(T_m));
C = zeros(size(T_m));
for j = 2:size(T_m, 1)
    for i = 1:size(T_m, 2)
        dxp = (V(I_plot(i), j + 1) - V(I_plot(i), j))/delta;
        dxm = (V(I_plot(i), j) - V(I_plot(i), j - 1))/delta;
        d2x = (V(I_plot(i), j + 1) + V(I_plot(i), j - 1) - 2*V(I_plot(i), j))/delta^2;

        [~, u] = discrete_HJB(T_m(j, i), X_m(j, i), dxp, dxm, d2x);

        PI(j, i) = u(1);
        C(j, i) = u(2);
    end
end

subplot(1, 2, 1);
hold on;
mesh(T_m, X_m, PI);
xlabel('t'); ylabel('x'); zlabel('\pi^{\ast, {\delta}, h}_{t, x}');
view([120 20]);

subplot(1, 2, 2);
hold on;
mesh(T_m, X_m, C);
xlabel('t'); ylabel('x'); zlabel('c^{\ast, {\delta}, h}_{t, x}');
view([-30 20]);

%% Auxiliary functions
function [H, u] = discrete_HJB(s, x, pp, pm, A)
    u_p = u_ast(s, x, pp, A);
    u_m = u_ast(s, x, pm, A);

    f_u_p = f(s, x, u_p);
    f_u_m = f(s, x, u_m);

    if (f_u_p > 0)
        H_u_p = -f_u_p*pp - sigma(s, x, u_p)^2/2*A - L(s, x, u_p);
    else
        H_u_p = -f_u_p*pm - sigma(s, x, u_p)^2/2*A - L(s, x, u_p);
    end

    if (f_u_m > 0)
        H_u_m = -f_u_m*pp - sigma(s, x, u_m)^2/2*A - L(s, x, u_m);
    else
        H_u_m = -f_u_m*pm - sigma(s, x, u_m)^2/2*A - L(s, x, u_m);
    end

    if (H_u_p >= H_u_m)
        H = H_u_p;
        u = u_p;
    else
        H = H_u_m;
        u = u_m;
    end
end

function c = randcat(p)
    % categorical pseudo-rv with support (-1, 0, 1) and probabilities p = (p1, p2, p3)
    u = rand(1);

    if (u <= p(1))
        c = -1;
    elseif (u <= p(1) + p(2))
        c = 0;
    else
        c = 1;
    end
end

end

Curriculum Vitae

Prince Osei Aboagye was born on September 6, 1992 as a son of Dickson and Ruth Aboagye. He graduated from Pope John Senior High School, Ghana, in 2014. He holds a Bachelor of Arts (BA) degree in Economics and Mathematics from the University of Ghana (UG). He rendered his national service as a Teaching and Research Assistant at the University of Ghana’s Department of Mathematics.

He has always liked studying Mathematics because he was very comfortable with it and also found it amazing how the abstract language of Mathematics can be used to describe and study real-world problems. He therefore opted for a combined major in Economics and Mathematics for his bachelor's degree at the University of Ghana. He has always been fascinated by the new ideas that arise out of deep involvement in a particular subject and the consequent discoveries and effects on society at large. His obsession with acquiring new ideas underlines his desire to read for his Master of Science (MS) in Mathematics at the University of Texas at El Paso (UTEP). He began his graduate studies at UTEP in the fall of 2018. While pursuing a Master's degree in Mathematics he worked as a Teaching and Research Assistant in the Mathematical Sciences Department. In spring 2017, he presented a poster at the 43rd National Society of Black Engineers Annual Convention, Kansas City, MO on March 30, 2017 on "Game Theory: An Application of the Prisoner's Dilemma".

After graduation, Prince Osei Aboagye will pursue his doctoral degree in Computing with specialization in Data Management and Analysis at the University of Utah. Email address: [email protected]
