Stochastic Control and Dynamic Asset Allocation∗

Gon¸calo dos Reis† David Sˇiˇska‡

4th April 2021

Contents

1 Introduction to stochastic control through examples 4 1.1 Discrete space and time ...... 4 1.2 Merton’s problem ...... 5 1.3 Optimal liquidation problem ...... 8 1.4 Systemic risk - toy model ...... 10 1.5 Optimal stopping ...... 11 1.6 Basic elements of a stochastic control problem ...... 11 1.7 Exercises ...... 13 1.8 Solutions to Exercises ...... 15

2 Controlled Markov chains 20 2.1 Problem setting ...... 20 2.2 Dynamic programming for controlled Markov chains ...... 20 2.3 for controlled ...... 22 2.4 Q-learning for unknown environments ...... 23 2.5 Robbins–Monro algorithm ...... 24 2.6 The Q-learning algorithm ...... 25 2.7 Exercises ...... 26 2.8 Solutions to Exercises ...... 26

3 Stochastic control of diffusion processes 27 3.1 Equations with random drift and diffusion ...... 27 3.2 Controlled diffusions ...... 33 3.3 Stochastic control problem with finite time horizon ...... 34 3.4 Exercises ...... 36

∗Lecture notes for academic year 2020/21, School of Mathematics, University of Edinburgh. †[email protected][email protected]

1 3.5 Solutions to Exercises ...... 37

4 Dynamic programming and the HJB Equation 39 4.1 Dynamic programming principle ...... 39 4.2 Hamilton-Jacobi-Bellman (HJB) and verification ...... 43 4.3 Solving control problems using the HJB equation and verification theorem 47 4.4 Policy Improvement Algorithm ...... 50 4.5 Exercises ...... 50 4.6 Solutions to Exercises ...... 52

5 Pontryagin maximum principle and BSDEs 55 5.1 Non-rigorous Derivation of Pontryagin’s Maximum Principle ...... 55 5.2 Deriving a Numerical Method from Pontryiagin’s maximum principle . 58 5.3 Backward Stochastic Differential Equations (BSDEs) ...... 58 5.4 Pontryagin’s Maximum Principle ...... 63 5.5 Exercises ...... 71 5.6 Solutions to the exercises ...... 72

A Appendix 73 A.1 Basic notation and useful review of analysis concepts ...... 73 A.2 Some useful results from stochastic analysis ...... 77 A.3 Useful Results from Other Courses ...... 83 A.4 Solutions to the exercises ...... 91

References 92

Reading these notes

The reader is expected to know basic stochastic analysis and ideally a little bit of financial mathematics. The notation and basic results used throughout the notes are in Appendix A. Section 1 is introduction. Section 2 is a brief introduction to the controlled Markov chains: a discrete space and time setting for stochastic control problems. Section 3 covers basics of stochastic differential equations and is essential reading for what fol- lows. Sections 4 and 5 are basically independent of each other.

Exercises

You will find a number of exercises throughout these notes. You must make an effort to solve them (individually or with friends). Solutions to some of the exercises will be made available as time goes by but remember: no one ever learned swimming solely by watching other people swim (and similarly

2 no-one ever learned mathematics solely by reading others’ solutions).

Other reading

It is recommended that you read the relevant chapters of Pham [14, at least Chapters 1-3 and 6] as well as Touzi [18, at least Chapters 1-4 and 9]. Additionally one recommends Krylov [12] for those wishing to see everything done in great generality and with proofs that do not contain any vague arguments but it is not an easy book to read. Chapter 1 however, is very readable and much recommended. Those interested in applications in algorithmic trading should read Cartea, Jaimungal and Penalva [4] and those who would like to learn about mean field games there is Carmona and Delarue [3].

3 1 Introduction to stochastic control through examples

We start with some motivating examples.

1.1 Discrete space and time

We start with an optimal stopping example. Example 1.1. A very simple example of an optimal stopping problem is the following: given a fair die we are told that we’re allowed to roll the die for up to three times. After each roll we can either choose to stop the game and our gain is equal to the number currently apearing on the die, or to carry on. If we choose to carry on then we get nothing for this roll and we hope to get more next time. Of course if this is the 3rd time we rolled the die then we have to accept whichever number it is we got in this last roll. In this case solving the problem is a matter of simple calculation, working backward in time. If we’re in the third round then we stop, because we have no choice. In the second round reason as follows: our expected winning in round three is 1 21 (1 + 2 + 3 + 4 + 5 + 6) = = 3.5 . 6 6 So we stop in the second round if we rolled 4, 5 or 6 as that’s more then our expected outcome from continuing. In the first round we reason as follows: our expected winning from continuing into round two are 1 1 21 (4 + 5 + 6) + = 2.5 + 1.75 = 4.25 6 2 6 The first part corresponds to the decision to stop in round two. The second part corresponds to the decision to continue, weighted by the respective probabilities. So in the first round it is optimal to stop if we got 5 or 6. The optimal expected “payoff” 2 for this optimal stopping problem is 4 + 3 . Example 1.2. There is a biased coin with p (0, 1), p = 1 , probability of getting ∈ ∕ 2 heads and q = 1 p probability of getting tails. − We will start with an initial wealth x = iδ, i N with i < m, with some m N fixed ∈ ∈ reasonably large. At each turn we choose an action a 1, 1 . By choosing a = 1 we bet that the ∈ {− } coin comes up heads and our wealth is increased by δ if we are correct, decreased by δ otherwise. By choosing a = 1 we bet on tails and our wealth is updated accordingly. − That is, given that Xn 1 = x and our action a 1, 1 we have − ∈ {− } P(Xn = x + aδ Xn 1 = x, a) = p , P(Xn = x aδ Xn 1 = x, a) = q . | − − | −

The game terminates when either x = 0 or x = mδ. Let N = min n N : Xn = { ∈ 0 or X = mδ . Our aim is to maximize n } α α J (x) = E X X0 = x N | ! " over functions α = α(x) telling what action to choose in each given state.

4 1.2 Merton’s problem

In this part we give a motivating example to introduce the problem of dynamic asset allocation and stochastic optimization. We will not be particularly rigorous in these calculations.

The market Consider an investor can invest in a two asset Black-Scholes market: a risk-free asset (“bank” or “Bond”) with rate of return r > 0 and a risky asset (“stock”) with mean rate of return µ > r and constant volatility σ > 0. Suppose that the price of the risk-free asset at time t, Bt, satisfies

dBt rt = r dt or Bt = B0e , t 0. Bt ≥ The price of the stock evolves according to the following SDE:

dSt = µ dt + σ dWt, St where (Wt)t 0 is a standard one-dimensional Brownian motion one the filtered prob- ≥ ability space (Ω, , F = ( t)t 0, P). F F ≥

0 The agent’s wealth process and investments Let Xt denote the investor’s wealth in the bank at time t 0. Let πt denote the wealth in the risky asset. Let 0 ≥ Xt = Xt + πt be the investor’s total wealth. The investor has some initial capital X0 = x > 0 to invest. Moreover, we also assume that the investor saves / consumes wealth at rate C at time t 0. t ≥ There are three popular possibilities to describe the investment in the risky asset:

(i) Let ξt denote the number of units stocks held at time t (allow to be fractional and negative),

(ii) the value in units of currency πt = ξtSt invested in the risky asset at time t,

(iii) the fraction ν = πt of current wealth invested in the risky asset at time t. t Xt

0 The investment in the bond is then determined by the accounting identity Xt = X π . The parametrizations are equivalent as long as we consider only positive t − t wealth processes (which we shall do). The gains/losses from the investment in the stock are then given by

πt Xtνt ξt dSt, dSt, dSt . St St

The last two ways to describe the investment are especially convenient when the model for S is of the exponential type, as is the Black-Scholes one. Using (ii),

t t t πs Xs πs Xt = x + dSs + − dBs Cs ds 0 Ss 0 Bs − 0 # t # # t = x + π (µ r) + rX C ds + π σ dW s − s − s s s #0 #0 $ %

5 or in differential form

dX = π (µ r) + rX C dt + π σ dW , X = x . t t − t − t t t 0 $ % Alternatively, using (iii), the equation simplifies even further.1 Recall π = νX.

dSt dBt dXt = Xtνt + Xt 1 νt Ct dt St − Bt − = X ν (µ r) +& r 'C dt + X ν σ dW . t t − − t t t t We can make a further s$imp&lification and'obtain% an SDE in “geometric Brownian motion” format if we assume that the consumption Ct can be written as a fraction of the total wealth, i.e. Ct = κtXt. Then

dX = X ν (µ r) + r κ dt + X ν σ dW . (1.1) t t t − − t t t t Exercise 1.3. Assuming tha$t all coefficients in% SDE (1.1) are integrable, solve the SDE for X and hence show X > 0 when X0 = x > 0. See Exercise 1.14 for a hint.

The optimization problem The investment allocation/consumption problem is to choose the best investment possible in the stock, bond and at the same time con- sume the wealth optimally. How to translate the words “best investment” into a mathematical criteria? Classical modeling for describing the behavior and preferences of agents and investors are: expected utility criterion and mean-variance criterion. In the first criterion relying on the theory of choice in uncertainty, the agent com- pares random incomes for which he knows the probability distributions. Under some conditions on the preferences, Von Neumann and Morgenstern show that they can be represented through the expectation of some function, called utility. Denoting it by U, the utility function of the agent, the random income X is preferred to a random in- come X′ if E[U(X)] E[U(X′)]. The deterministic utility function U is nondecreasing ≥ and concave, this last feature formulating the risk aversion of the agent. Example 1.4 (Examples of utility functions). The most common utility functions are

Exponential utility: U(x) = e αx, the parameter α > 0 is the risk aversion. • − − Log utility: U(x) = log(x) • Power utility: U(x) = (xγ 1)/γ for γ ( , 0) (0, 1). • − ∈ −∞ ∪ Iso-elastic: U(x) = x1 ρ/(1 ρ) for ρ ( , 0) (0, 1). • − − ∈ −∞ ∪ In this portfolio allocation context, the criterion consists of maximizing the expected utility from consumption and from terminal wealth. In the the finite time-horizon case: T < , this is ∞ T ν,κ ν,κ ν,κ sup E U κtXt dt + U XT , where Xt = Xt must satisfy (1.1). (1.2) ν,κ (#0 ) 1 & ' & ' Note that, if νt expresses the fraction of the total wealth X invested in the stock, then the fraction of wealth invested in the bank account is simply 1 νt. −

6 Note that we could also consider the problem including a constant discount factor γ = 0. This will lead to a problem that different to (1.2) above: ∕ T γt ν,κ γT ν,κ ν,κ sup E e− U κtXt dt + e− U XT , where Xt = Xt must satisfy (1.1). ν,κ (#0 ) & ' & ' Without consumption, i.e. t we have κ(t) = 0, the optimization problem (1.2) can ∀ be written as

ν ν sup E U Xt , where Xt = Xt must satisfy (1.1). (1.3) ν $ & '% Again, we could have included discounting but it’s not essential. In the infinite time-horizon case: T = and we must include a discount factor ∞ γ > 0 so that the integrals coverge. The optimization problem is then (recall that ν,κ Ct = κtXt )

∞ γt ν,κ ν,κ sup E e− U κtXt dt , where Xt = Xt must satisfy (1.1). (1.4) κ,ν (#0 ) & ' Let us go back to the finite horizon case: T < . The second criterion for ∞ describing the behavior and preferences of agents and investors, the mean-variance criterion, relies on the assumption that the preferences of the agent depend only on the expectation and variance of his random incomes. To formulate the feature that the agent likes wealth and is risk-averse, the mean-variance criterion focuses on mean- variance-efficient portfolios, i.e. minimizing the variance given an expectation. In our context and assuming that there is no consumption, i.e. t we have C = 0, ∀ t then the optimization problem is written as

ν ν inf Var X : E[X ] = m, m (0, ) . ν T T ∈ ∞ * & ' + We shall see that this problem may be reduced to the resolution of a problem in the form (1.2) for the quadratic utility function: U(x) = λ x2, λ R. See Example 5.12. − ∈ Exercise 1.5. Consider the problem (1.3) with U(x) = xγ, γ (0, 1]. Assume ∈ further that you are not allowed to change your investments as time goes by: i.e. you must choose the allocation ν and ν = ν for all t [0, T ]. You should solve the 0 t 0 ∈ problem (1.3) in the following steps:

i) Use Exercise 1.3 to obtain the solution to (1.1).

ii) Substitute this into (1.3) and use the fact that W √T N(0, 1) to express (1.3) T ∼ as a function of ν0. iii) Use calculus to maximize the above mentioned function.

Exercise 1.6. List some of the ways in which the model above may be considered to simplify reality too much. If you can, propose a different model or changes to the model that would rectify the issue you identified.

7 1.3 Optimal liquidation problem

Trader’s inventory, an R-valued process:

dQ = α du with Q = q > 0 initial inventory. u − u t Here α will typically be mostly positive as the trader should sell all the assets. We t,q,α will denote this process Qu = Qu because clearly it depends on the starting point q at time t and on the trading strategy α. Asset price, an R-valued process:

dSu = λ αu du + σ dWu , St = S .

t,S,α We will denote this process Su = Su because clearly it depends on the starting point S at time t and on the trading strategy. Here the constant λ controls how much permanent impact the trader’s own trades have on its price. Trader’s execution price (for κ > 0): Sˆ = S κα . t t − t This means that there is a temporary price impact of the trader’s trading: she doesn’t receive the full price St but less, in proportion to her selling intensity. Quite reasonably we wish to maximize (over trading strategies α), up to to some finite time T > 0, the expected amount gained in sales, whilst penalising the terminal inventory (with θ > 0):

T ˆt,S,α t,q,α t,S,α t,q,α 2 J(t, q, S, α) := E Su αu du + QT ST θ QT . t − | | ( # val. of inventory penalty for unsold ) gains from sale , -. / , -. / The goal is to find , -. / V (t, q, S) := sup J(t, q, S, α) . α In Section 4.2 we will show that V satisfies a nonlinear partial differential equation, called the HJB equation which will allow us to solve this problem and we will see that, in the case λ = 0, the value function (see also Figure 1.1) is

V (t, q, S) = qS + γ(t)q2 , whilst the optimal control (see also Figure 1.2) is 1 a∗(t, q, S) = γ(t)q , −κ where 1 1 1 − γ(t) = + (T t) . − θ κ − 0 1 It is possible to solve this with either the Bellman principle (see Exercise 4.12) or with Pontryagin maximum principle (see Example 5.11). Problems of this type arise in algorithmic trading. More can be found e.g. in Cartea, Jaimungal and Penalva [4].

8 Figure 1.1: Value function for the Optimal Liquidation problem, Section 1.3, as func- tion of time and inventory, in the case λ = 0, T = 1, θ = 10, κ = 1 and S = 100.

Figure 1.2: Optimal control for the Optimal Liquidation problem, Section 1.3, as function of time and inventory, in the case λ = 0, T = 1, θ = 10 and κ = 1.

9 1.4 Systemic risk - toy model

i The model describes a network of N banks. We will use Xt to denote the logarithm of cash reserves of bank i 1, . . . , N at time t [0, T ]. Let us assume that there ∈ { } ∈ are N +1 independent Wiener processes W 0, W 1, . . . W N . Let us fix ρ [ 1, 1]. Each i ∈ − bank’s reserves are impacted by Bt where Bi := 1 ρ2W i + ρW 0 . t − t t We will have bank i’s reserves influ2enced by “its own” i.e. “idiosyncratic” source of randomness W i and also by a source of uncertainty common to all the banks, namely 0 ¯ 1 N i W (the “common noise”). Let Xt := N i=1 Xt i.e. the mean level of log-reserves. We model the reserves as 3 dXi = a(X¯ Xi ) + αi du + σdBi , u [t, T ] , Xi = xi . u u − u u u ∈ t Let us look at the t$erms involved: %

i) The term a(X¯ Xi ) models inter-bank lending and borrowing; if bank i is below u − u the average then it borrows money (the log reserves increase) whilst if bank i’s level is above the average then it lends out money (the log reserves decrease). This happens at rate a > 0.

i ii) The term αt is the “control” of bank i and the interpretation is that it represents lending / borrowing outside the network of the N banks (e.g. taking deposits from / lending to individual borrowers). iii) The term stochastic term (with σ > 0) models unpredictable gains / losses to the bank’s reserves with the idiosyncratic and common noises as explained above.

iv) The initial reserve (at time t) of bank i is xi. i,t,x,α i v) Note that we should be really writing Xu for Xu since each bank’s reserves depend on the starting point x = (x1, . . . , xN ) of all the banks and also on the 1 N controls αu = (αu, . . . , αu ) of all the individual banks. The equations are thus fully coupled.

We will say that in this model each bank tries to minimize T i 1 i 2 i i,t,x,α i,t,x,α ε i,t,x,α i,t,x,α 2 J (t, x, α) := E α q α (X¯ X ) + X¯ X du 2| u| − u u − u 2| u − u | ( #t 4 c 5 + X¯ i,t,x,α Xi,t,x,α 2 . 2| T − T | ) Let’s again look at the terms involved:

i) The term 1 αi 2 indicates that lending / borrowing outside the bank network 2 | u| carries a cost. i ¯ i,t,x,α i,t,x,α ii) With q αu(Xu Xu ) for some constant q > 0 we insist that bank i will − − i want to borrow if it’s below the mean (αu > 0) and vice versa. iii) The final two terms provide a running penalty and terminal penalty for being too different from the average (think of this as the additional cost imposed on the bank if it’s “too big to fail” versus the inefficiency of a bank that is much smaller than competitors).

10 Amazingly, under the assumption that q2 ε it is possible to solve this problem ≤ explicitly, using either techniques we will develop in Sections 4 or 5. This is an example from the field of N-player games, much more can be found in Carmona and Delarue [3].

1.5 Optimal stopping

Consider a probability space (Ω, , P) on which we have a d′-dimensional Wiener F process W = (Wu)u [0,T ] generating u := σ(Ws : s u). Let t,T be the set of all ∈ F ≤ T ( )-stopping times taking values in [t, T ]. Ft d t,x t,x Given some R -valued (Xu )u [t,T ], such that Xt = x, adapted ∈ d to the filtration ( u)u [t,T ] and a reward function g : R R the optimal stopping F ∈ → problem is to find t,x w(t, x) = sup E g(Xτ ) . (1.5) τ t,T ∈T $ % Example 1.7. A typical example is the American put option. In the Black–Scholes t,x model for one risky asset the process (Xu )u [t,T ] is geometric Brownian motion, W ∈ is R-valued (and P denotes the risk-neutral measure in our notation here) so that dX = rX du + σX dW , u [t, T ] , X = x u u u u ∈ t where r R and σ [0, ) are given constants. For the American put option ∈ ∈ ∞ g(x) := [K x] . In this case w given by (1.6) gives the no-arbitrage price of the − + American put option for current asset price x at time t.

It has been shown (see Krylov [12] or Gy¨ongy and Sˇiˇska [6] )that the optimal stopping problem (1.6) is a special case of optimal control problem given by

T u t,x T t,x t ρr dr t ρr dr v(t, x) = sup E g(Xu ) ρu e− + g(XT ) e− (1.6) ρ R t ! ! ∈ ( # ) so that w(t, x) = v(t, x). Here the control processes ρu must be adapted and such that for a given ρ = (ρu)u [t,T ] there exists n N such that ρu [0, N] for all u [t, T ]. ∈ ∈ ∈ ∈

1.6 Basic elements of a stochastic control problem

The above investment-consumption problem and its variants (is the so-called “Merton problem” and) is an example of a stochastic optimal control problem. Several key elements, which are common to many stochastic control problems, can be seen. These include: Time horizon. The time horizon in the investment-consumption problem may be finite or infinite, in the latter case we take the time index to be t [0, ). We will also ∈ ∞ consider problems with finite horizon: [0, T ] for T (0, ); and indefinite horizon: ∈ ∞ [0, τ] for some stopping time τ (for example, the first exit time from a certain set). (Controlled) State process. The state process is a stochastic process which de- scribes the state of the physical system of interest. The state process is often given by the solution of an SDE, and if the control process appears in the SDE’s coefficients it is called a controlled stochastic differential equation. The evolution of the state pro- cess is influenced by a control. The state process takes values in a set called the state

11 space, which is typically a subset of Rd. In the investment-consumption problem, the state process is the wealth process Xν,C in (1.1). Control process. The control process is a stochastic process, chosen by the “control- ler” to influence the state of the system. For example, the controls in the investment- consumption problem are the processes (νt)t and (Ct)t (see (1.1)). We collect all the control parameters into one process denoted α = (ν, C). The control process (αt)t [0,T ] takes values in an action set A. The action set can be a complete ∈ separable metric space but most commonly A (Rm). ∈ B For the control problem to be meaningful, it is clear that the choice of control must allow for the state process to exist and be determined uniquely. More generally, the control may be forced satisfy further constraints like “no short-selling” (i.e. π(t) 0) ≥ and or the control space varies with time. The control map at time t should be decided at time t based on the available information . This translates into requiring the Ft control process to be adapted. Admissible controls. Typically, only controls which satisfy certain “admissibil- ity” conditions can be considered by the controller. These conditions can be both technical, for example, integrability or smoothness requirements, and physical, for example, constraints on the values of the state process or controls. For example, in the investment-consumption problem we will only consider processes Xν,C for which a solution to (1.1) exists. We will also require the consumption process Ct such that the investor has non-negative wealth at all times. Objective function. There is some cost/gain associated with the system, which may depend on the system state itself and on the control used. The objective function con- tains this information and is typically expressed as a function J(x, α) (or in finite-time horizon case J(t, x, α)), representing the expected total cost/gain starting from system state x (at time t in finite-time horizon case) if control process α is implemented. For example, in the setup of (1.3) the objective functional (or gain/cost map) is

ν J(0, x, ν) = E U X (T ) , (1.7) as it denotes the reward associated with ini$tial&wealth'x% and portfolio process ν. Note that in the case of no-consumption, and given the remaining parameters of the problem (i.e. µ and σ), both x and ν determine by themselves the value of the reward. Value function. The value function describes the value of the maximum possible gain of the system (or minimal possible loss). It is usually denoted by v and is obtained, for initial state x (or (t, x) in finite-time horizon case), by optimizing the cost over all admissible controls. The goal of a stochastic control problem is to find the value function v and find a control α∗ whose cost/gain attains the minimum/maximum value: v(t, x) = J(t, x, α∗) for starting time t and state x. For completeness sake, from (1.3) and (1.7), if ν∗ is the optimal control, then we have the value function

ν v(t, x) = sup E U X (T ) Xt = x = sup J(t, x, ν) = J(t, x, ν∗). (1.8) ν | ν $ & ' % Typical questions of interest Typical questions of interest in Stochastic control problems include:

Is there an optimal control? • Is there an optimal Markov control? •

12 How can we find an optimal control? • How does the value function behave? • Can we compute or approximate an optimal control numerically? • There are of course many more and, before we start, we need to review some concepts of stochastic analysis that will help in the rigorous discussion of the material in this section so far.

1.7 Exercises

The aim of the exercises in this section is to build some confidence in manipulating the basic objects that we will be using later. It may help to browse through Section A before attempting the exercises.

Exercise 1.8. Read Definition A.18. Show that . H ⊂ S Exercise 1.9 (On Gronwall’s lemma). Prove Gronwall’s Lemma (see Lemma A.6) by following these steps:

i) Let t t λ(r)dr z(t) = e− 0 λ(s)y(s) ds. ! 0 4 5 # and show that t λ(r)dr z′(t) λ(t)e− 0 (b(t) a(t)) . ≤ ! − ii) Integrate from 0 to t to obtain the first conclusion Lemma A.6.

iii) Obtain the second conclusion of Lemma A.6.

Exercise 1.10 (On liminf). Let (an)n N be a bounded sequence. Then the number ∈

lim (inf ak : k n ) n →∞ { ≥ } is called limit inferior and is denoted by lim infn an. →∞

1. Show that the limit inferior is well defined, that is, the limit limn (inf ak : k n ) →∞ { ≥ } exists and is finite for any bounded sequence (an).

2. Show that the sequence (an)n N has a subsequence that converges to limn inf an. ∈ →∞ Hint: Argue that for any n N one can find i n such that ∈ ≥ 1 inf a : k n a < inf a : k n + . { k ≥ } ≤ i { k ≥ } n Use this to construct the subsequence we are looking for.

Exercise 1.11 (Property of the supremum/infimum). Let a, b R. Prove that ∈ if b > 0, then sup a + bf(x) = a + b sup f(x), x X x X ∈ ∈ if b < 0, then sup *a + bf(x)6 = a + b inf f(x). x X x X ∈ ∈ * 6

13 Exercise 1.12. Assume that X = (Xt)t 0 is a martingale with respect to a filtration ≥ F := ( t)t 0. Show that: F ≥ 2 2 1. if for all t 0 it holds that E Xt < then the process given by Xt is a ≥ | | ∞ | | submartingale and

2. the process given by X is a submartingale. | t| Exercise 1.13 (ODEs). Assume that (rt) is an adapted stochastic process such that t for any t 0 rs ds < holds P-almost surely (in other words r ). ≥ 0 | | ∞ ∈ A 7 1. Solve dBt = Btrtdt, B0 = 1. (1.9)

2. Is the function t B continuous? Why? → t 3. Calculate d(1/Bt). Exercise 1.14 (Geometric Brownian motion). Assume that µ and σ . Let ∈ A ∈ S W be a real-valued Wiener martingale.

1. Solve dSt = St [µt dt + σt dWt] , S(0) = s. (1.10) Hint: Solve this first in the case that µ and σ are real constants. Apply Itˆo’s formula to the process S and the function x ln x. → 2. Is the function t S continuous? Why? → t 3. Calculate d(1/S ), assuming s = 0. t ∕ 4. With B given by (1.9) calculate d(St/Bt).

Exercise 1.15 (Multi-dimensional gBm). Let W be an Rd-valued Wiener martingale. m m d Let µ and σ × . Consider the stochastic processes Si = (Si(t))t [0,T ] given ∈ A ∈ S ∈ by m i i i i ij j i dSt = Stµt dt + St σt dWt , S0 = si, i = 1, . . . , m. (1.11) 8j=1 1. Solve (1.11) for i = 1, . . . , m. Hint: Proceed as when solving (1.10). Start by assuming that µ and σ are constants. Apply the multi-dimensional Itˆo formula to the process Si and the function x ln(x). Note that the process Si is just R-valued so the multi- → dimensionality only comes from W being Rd valued. 2. Is the function t Si continuous? Why? → t Exercise 1.16 (Ornstein–Uhlenbeck process). Let a, b, σ R be constants such that ∈ b > 0, σ > 0. Let W be a real-valued Wiener martingale.

1. Solve dr = (b ar ) dt + σ dW , r(0) = r . (1.12) t − t t t 0 Hint: Apply Itˆo’s formula to the process r and the function (t, x) eatx. →

14 2. Is the function t r continuous? Why? → t 2 3. Calculate E[rt] and E[rt ].

4. What is the distribution of rt?

Exercise 1.17. If X is a Gaussian random variable with E[X] = µ and Var(X) = E[X2 (E[X])2] = σ2 then we write X N(µ, σ2). Show that if X N(µ, σ2) then − 2 ∼ ∼ X µ+ σ E[e ] = e 2 .

1.8 Solutions to Exercises

Solution (Solution to Exercise 1.9). Let

t t λ(r)dr z(t) = e− 0 λ(s)y(s) ds. ! 0 ! " # Then, almost everywhere in I,

t t λ(r)dr z′(t) = λ(t)e− 0 y(t) λ(s)y(s) ds , − ! $ #0 % b(t) a(t) ≤ − by the inequality in our hypothesis. Hence fo&r a.a. s I'( ) ∈ s λ(r)dr z′(s) λ(s)e− 0 (b(s) a(s)) . ≤ ! − Integrating from 0 to t and using the fundamental theorem of calculus (which gives us t z′(s) ds = z(t) z(0) = z(t)) we obtain 0 − * t t t λ(r)dr s λ(r)dr λ(s)y(s) ds e 0 λ(s)e− 0 (b(s) a(s)) ds 0 ≤ ! 0 ! − # t # t λ(r)dr = λ(t)e s (b(s) a(s)) ds. − #0 ! Using the left hand side of above inequality as the right hand side in the inequality in our hypothesis we get

t t λ(r)dr y(t) + a(t) b(t) + λ(s)e s (b(s) a(s)) ds, ≤ − #0 ! which is the first conclusion of the lemma. Assume now further that b is monotone increasing and a nonnegative. Then

t t λ(r)dr y(t) + a(t) b(t) + b(t) λ(s)e s ds ≤ 0 ! # t t λ(r)dr t λ(r)dr = b(t) + b(t) de s = b(t) + b(t) 1 + e 0 − − #0 ! ! t λ(r) dr ! " = b(t)e 0 . ! Solution (Solution to Exercise 1.10). Let n N. ∈

1. The sequence bn := inf ak : k n is monotone increasing as ak : k n + 1 is a subset of a : k n ,{hence b≥ }b . Additionally, the sequ{ence is ≥also bou}nded { k ≥ } n ≤ n+1 by the same bounds as the initial sequence (an). A monotone and bounded sequence of real numbers must converge and hence we can conclude that lim infn an exists. →∞

15 2. It follows from the definition of infimum that there exists a sequence i = i(n) n such that ≥ 1 1 b = inf a : k n a < inf a : k n + = b + . n { k ≥ } ≤ i { k ≥ } n n n

The sequence of indices i(n) n might not be monotone, but since i(n) n it is ∈N ≥ always possible to select its subsequence, say j(n) n , that is monotone. + , ∈N Since ai(n) bn 0 and (bn)n converges to lim infn an, then so does (ai(n))n. | − | → ∈N + , →∞ As (aj(n))n is a subsequence of (ai(n))n the same is true for (aj(n))n. Hence the claim follows.

Solution (Solution to Exercise 1.11). We will show the result when b > 0, assuming that the sup takes a finite value. Let f ∗ := supx X f(x), and V ∗ := supx X a + bf(x) . ∈ ∈ To show that V ∗ = a + bf ∗, we start by showing that V ∗ a + bf ∗. ≤ - . Note that for all x X we have a + bf ∗ a + bf(x), that is, a + bf ∗ is an upper bound for ∈ ≥ the set y : y = a + bf(x) for some x X . As a consequence, its least upper bound V ∗ must { ∈ } be such that a + bf ∗ V ∗ = supx X a + bf(x) . ≥ ∈ { } To show the converse, note that from the definition of f ∗ as a supremum (see Definition A.1), ε ε we have that for any ε > 0 there must exist a x X such that f(x ) > f ∗ ε. ∈ − ε ε ε Hence a + bf(x ) > a + bf ∗ bε. Since x X, it is obvious that V ∗ a + bf(x ). Hence − ∈ ≥ V ∗ a + bf ∗ bε. Since ε was arbitrarily chosen, we have our result: V ∗ a + bf ∗. ≥ − ≥ Solution (to Exercise 1.12).

2 1. Since Xt is t-measurable it follows that Xt is also t-measurable. Integrability holds by assumptFion. We further note that t|he |conditionFal expectation of a non-negative random variable is non-negative and hence for t s 0 we have ≥ ≥ 2 2 2 0 E[ Xt Xs s] = E[ Xt s] 2E[XtXs s] + E[ Xs s] ≤ | − | |F | | |F − |F | | |F 2 2 2 2 = E[ Xt s] 2XsE[Xt s] + Xs = E[ Xt s] Xs , | | |F − |F | | | | |F − | | 2 2 since Xs is s-measurable and since X is a martingale. Hence E[ Xt s] Xs for all t s 0F. | | |F ≥ | | ≥ ≥ 2. First note that the adaptedness and integrability properties hold. Next note that

E[Xt s] E Xt s by standard properties of conditional expectations. Since |F ≤ | | F X is a martingal0e we/hav1e / / / / / / E[Xt s] = Xs |F and taking absolute value on both sides we see that

Xs = E[Xt s] E Xt s . | | |F ≤ | | F / / 0 / 1 Solution (Solution to Exercise 1.13). L/et t [0, / ). / ∈ ∞ / 1. We are looking to solve: t B(t) = 1 + r(s) ds, #0 which is equivalent to dB(t) = r(t)B(t) for almost all t, B(0) = 1. dt Let us calculate (using chain rule and the above equation)

d dB(t) 1 [ln B(t)] = = r(t). dt dt · B(t)

16 Integrating both sides and using the fundamental theorem of calculus

t ln B(t) ln B(0) = r(s) ds − #0 and hence t B(t) = exp r(s) ds . $#0 % 2. First we note that for any function f integrable on [0, ) we have that the map t t ∞ →x 0 f(x) dx is absolutely continuous in t and hence it is continuous. The function x e is continuous and composition of continuous functions is continuous. Hence t →B(t) m* ust be continuous. → 3. There are many ways to do this. We can start with (1.9) and use chain rule:

d 1 dB(t) 1 1 = = r(t) dt B(t) dt · −B2(t) − −B(t) 2 3 $ % $ % and so 1 1 d = r(t) dt. B(t) − B(t) $ % Or we can start with the solution that we have calculated write d 1 d t = exp r(s)ds dt B(t) dt − 2 3 $ #0 % t 1 = r(t) exp r(s)ds = r(t) − − − −B(t) $ #0 % $ % which leads to the same conclusion again.

Solution (Solution to Exercise 1.14). 1. We follow the hint (but skip directly to the gen- eral µ and σ). From Itˆo’s formula:

1 1 1 1 d(ln S(t) = dS(t) dS(t) dS(t) = µ(t) σ2(t) dt + µ(t)dW (t). S(t) − 2 S2(t) · − 2 $ % Now we write this in the full integral notation:

t 1 t ln S(t) = ln S(0) + µ(s) σ2(s) ds + µ(s)dW (s). − 2 #0 2 3 #0 Hence t 1 t S(t) = s exp µ(s) σ2(s) ds + µ(s)dW (s) . (1.13) − 2 $#0 2 3 #0 % Now this is the correct result but using invalid application of Itˆo’s formula. If we want a full proof we call (1.13) a guess and we will now check that it satisfies (1.10). To that end we apply Itˆo’s formula to x s exp(x) and the Itˆo process → t 1 t X(t) = µ(s) σ2(s) ds + µ(s)dW (s). − 2 #0 2 3 #0 Thus 1 dS(t) = d(f(X(t)) = seX(t)dX(t) + seX(t)dX(t)dX(t) 2 1 1 = S(t) µ(t) σ2(t) dt + µ(t)dW (t) + S(t)σ2(t)dt. − 2 2 2$ % 3 Hence we see that the process given by (1.13) satisfies (1.10).

17 2. The continuity question is now more intricate than in the previous exercise due to the presence of the stochastic integral. From stochastic analysis in finance you know that Z given by t Z(t) := σ(s)dW (s) #0 is a continuous stochastic process. Thus there is a set Ω′ such that P(Ω′) = 1 ∈ F and for each ω Ω′ the function t S(ω, t) is continuous since it’s a composition of continuous func∈tions. → 3. If s = 0 then S(t) = 0 for all t. We can thus use Itˆo’s formula ∕ ∕ 1 1 1 d = dS(t) + dS(t)dS(t) S(t) −S2(t) S3(t) $ % 1 1 = [µ(t)dt + σ(t)dW (t)] + σ2(t)dt −S(t) S(t) 1 = µ(t) + σ2(t) dt σ(t)dW (t) . S(t) − − 4+ , 5 4. We calculate this with Itˆo’s product rule:

S(t) 1 1 1 d = S(t)d + dS(t) + dS(t)d B(t) B(t) B(t) B(t) $ % $ % $ % S(t) S(t) S(t) = r(t) dt + µ(t) dt + σ(t) dW (t) − B(t) B(t) B(t) S(t) = [(µ(t) r(t)) dt + σ(t)dW (t)] . B(t) −

Solution (Solution to Exercise 1.15). 1. We Itˆo’s formula to the function x ln(x) and → the process Si. We thus obtain, for Xi(t) := ln(Si(t)), that 1 1 1 dXi(t) = d ln(Si(t)) = dSi(t) 2 dSi(t)dSi(t) Si(t) − 2 Si (t) n 1 n = µ (t)dt + σ (t)dW (t) σ2 (t)dt i ij j − 2 ij j=1 j=1 6 6 1 n n = µ (t) σ2 (t) dt + σ (t)dW (t).  i − 2 ij  ij j j=1 j=1 6 6   Hence

X (t) X (0) = ln S (t) ln S (t) i − i i − i t 1 n n t = µ (s) σ2 (s) ds + σ (s)dW (s).  i − 2 ij  ij j 0 j=1 j=1 0 # 6 6 #   And so

t 1 n n t S (t) = S (0) exp µ (s) σ2 (s) ds + σ (s) dW (s) . i i   i − 2 ij  ij j  0 j=1 j=1 0 # 6 6 #    2. Using the same argument as before and in particular noticing that for each j thefunction t t 0 σij(s)dWj(s) is continuous for almost all ω Ω we get that t Si(t) is almost su→rely continuous. ∈ → *

18 Solution (to Exercise 1.16). 1. What the hint suggests is sometimes referred to as the “integrating factor technique.” We see that

d(eatr(t)) = eatdr(t) + aeatr(t)dt = eat [bdt + σdW (t)] .

Integrating we get

t t eatr(t) = r(0) + easb ds + easσdW (s) #0 #0 and hence t t at a(t s) a(t s) r(t) = e− r(0) + e− − b ds + e− − σdW (s). #0 #0 2. Yes. The arguments are the same as in previous exercises. 3. We know that stochastic integral of a deterministic integrand is a normally distributed random variable with mean zero and variance given via the Itˆo isometry. Hence

at b at Er(t) = e− r(0) + 1 e− a − + , and

t 2 2 2 2at 2 as Er (t) = (Er(t)) + e− σ E e dW (s) A$#0 % B t 2 2 2at 2 2as 2 σ 2at = (Er(t)) + e− σ e ds = (Er(t)) + 1 e− . 2a − #0 + , Hence 2 σ 2at Var [r(t)] = 1 e− . 2a − 4. Stochastic integral of a deterministic integran+d is a nor,mally distributed random vari- able. Hence for each t we know that r(t) is normally distributed with mean and variance calculated above.

Solution (to Exercise 1.17). Let Y N(0, 1). Then ∼ X µ+σY 1 µ+σz 1 z2 1 1 [(z σ)2 σ2 2µ] Ee = Ee = e e− 2 dz = e− 2 − − − dz √2π √2π #R #R 1 σ2+µ 1 1 (z σ)2 1 σ2+µ = e 2 e− 2 − dz = e 2 , √2π #R 1 1 (z σ)2 since z e− 2 − is a density of normal random variable with mean σ and variance 1 → √2π and thus its integral over the whole of real numbers must be 1.

19 2 Controlled Markov chains

In this section we consider the control problem in the setting of discrete space and time. This will allow us to introduce the dynamic programming principle and the Bellman equation in a simpler setting that that required for controlled diffusions: in particular most of the tricky problems around measurability won’t arise here.

2.1 Problem setting

Let S be a discrete state space. Let ∂S S this is the set of absorbing states (often ⊂ we’ll think of this set as the boundary). Let A be an action set which will be assumed to be finite in this section. Let (Ω, , P) be a probability space. For a A, y, y′ S a F ∈ ∈ assume we are given p (y, y′) which are the transition probabilities of a discrete time α Markov chain (Xn )n=0,1,... so that

α α αn P(X = y′ X = y) = p (y, y′) . n+1 | n α Here α is control process. We require that αn is measurable with respect to σ(Xk : k n). We will label the collection of all such controlled processes by . This covers ≤ A the description of the controlled Markov chain. Let γ (0, 1) be a fixed discount factor. Let f : A S R and g : S R be the ∈ × → → given “running reward” and “terminal reward”. Assume that f(a, x) = 0 for all a A ∈ and x ∂S (in other words in the stopping states the running reward is zero). Let ∈ N α,x := min n = 0, 1, . . . : Xα ∂S and Xα = x (2.1) { n ∈ 0 } x be the first hitting time of one of the absorbing states. Let E [ ] := E[ X0 = x] and · ·| let N α x n α N α J (x) := E γ f(αn, Xn ) + γ g(XN ) . (2.2) 9 n=0 : 8 Our aim is to maximise J over all control processes α which are adapted as described above (i.e. over all α ). Finally, for all x S, let ∈ A ∈ v(x) := max J α(x) . (2.3) α ∈A Note that from (2.1) we have N x,α = 0 for x ∂S. So for x ∂S we have v(x) = g(x), ∈ ∈ due to (2.2)-(2.3) and since we are assuming that in the stopping states the running reward is 0.

2.2 Dynamic programming for controlled Markov chains

Theorem 2.1. Let f and g be bounded. Then for all x S ∂S we have ∈ \ x a a v(x) = max E f (x) + γv(X1 ) . (2.4) a A ∈ $ % Proof. Fix the control process α. From (2.2) and the tower property we get

N α x n α N α α J (x) = f(α0, x) + E E γ f(αn, X ) + γ g(X ) X . (2.5) n N | 1 9 n=1 : ( 8 ) 20 Consider now a control process β given by βn = αn+1 and the associated controlled β β α βn Markov chain Y s.t. Y0 = X1 with the transition probabilities given by p (y, y′). β α This means that Yn 1 has the same law as Xn and so − N N n α N α α n 1 β N 1 β β α E γ f(αn, Xn ) + γ g(XN ) X1 = γE γ − f(βn 1, Yn 1) + γ − g(YN 1) Y0 = X1 | − − | n=1 n=1 − ( 8 ) ( 8 ) N 1 − n β N 1 β β α β α = γE γ f(βn, Y ) + γ − g(Y ) Y = X = γJ (X ) , n N 1 | 0 1 1 n=0 − ( 8 ) since N 1 is the first hitting time of ∂S for the process Y β. Hence from (2.5) we get − α x β α J (x) = f(α0, x) + E γJ (X1 ) x α x a f(α0, x) + E $γv(X1 ) % max f(a, x) + E γv(X1 ) . ≤ ≤ a A ∈ $ % ! $ %" Taking supremum on the left hand side over all control processes α leads to

x a a v(x) max E f (x) + γv(X1 ) . ≤ a A ∈ ( ) It remains to prove the inequality in the other direction. Let a A be the action ∗ ∈ x a a ,y which achieves the maximum in maxa A E f (x) + γv(X1 ) . Let α∗ be the control ∈ process which, for a given y S, achieves t(he maximum in )(2.3) (with x replaced by ∈ y). Define a new control process

a∗ if n = 0 βn = ,y a∗ αn∗ 1 if n > 0 and X1 = y . ; −

,Xa∗ ,Xa∗ α∗ 1 α∗ 1 β β β We see that the processes (x, Y0 , Y1 , . . .) and (X0 , X1 , X2 , . . .) are identic- ally distributed. Then, since N 1 is the first hitting time of ∂S by the process ,Xa∗ − Y α∗ , we have

x a∗ E f(a∗, x) + γv(X1 )

N 1 a a $ % − a∗ ,X ∗ ,X ∗ x x n ,X1 α∗ 1 N 1 α∗ 1 a∗ = E f(a∗, x) + γE γ f(αn∗ , Yn ) + γ − g(YN 1 ) Y0 = X1 − | 9 n=0 : ( 8 ) N a a∗ a∗ ,X ∗ ,X1 ,X1 x x k 1 ∗ 1 α∗ N 1 α∗ a∗ = E f(a∗, x) + γE γ − f(αk 1 , Yk 1 ) + γ − g(YN 1 ) Y0 = X1 9 − − − | : ( 8k=1 ) N x β k 1 β N 1 β = E f(β0, X0 ) + γ γ − f(βk, Xk ) + γ − g(XN ) 9 : ( 8k=1 ) N x n β N β = E γ f(β, X ) + γ g(X ) v(x) . n N ≤ n=0 ( 8 ) This completes the proof.

An immediate consequence of the Bellman principle is that among all control processes α it is enough to consider the ones that depend only on the current state. Indeed,

21 2 define a function a∗ = a∗(x) as x b b a∗(x) arg max E f (x) + γv(X1) . ∈ b A ∈ $ % a(Xn∗ 1) Define Xn∗ as X0∗ = x and for n N define Xn∗ by P(Xn∗ = y′) = p − (Xn∗ 1, y′). ∈ − Define the control process αn∗ := a∗(Xn∗). Then

x a∗ v(x) = E f (x) + γv(X1∗) i.e. the optimal payoff is achieved with$ this Markov con%trol.

2.3 Bellman equation for controlled Markov chain

Note that x a a E [v(X1 )] = v(y)p (x, y) . y S 8∈ Define the following two vectors and the matrix: V := v(x ) , F a := f(a, x ) , i = 1, . . . , S , a A , i i i i | | ∈ P a := pa(x , x ) , i, j = 1, . . . , S , a A . ij i j | | ∈ Then (2.4) can be stated as the following nonlinear system which we will call the Bellman equation: a a Vi = max Fi + γ(P V )i for i such that xi S ∂S , a A ∈ \ ∈ (2.6) V = g(x )$ for i such that% x ∂S . i i i ∈ Note that if we have managed to solve (2.6) then we can very easily obtain the op- timal control for each state i = 1, . . . , S . Indeed we just have to solve the (static) | | maximization: a a ai∗ arg max Fi + γ(P V )i . ∈ a A ∈ $ % On the other hand if we somehow figure out the optimal decision ai∗ to be taken in each state x S then the Bellman equation reduces to a linear problem: i ∈ a∗ a V = F i + γ(P i∗ V ) , i = 1, . . . , S . i i i | | There are two basic numerical algorithms that can be used to solve the Bellman equation.

(0) S (k) S Value iteration Start with an initial guess V R| |. For k > define V R| | ∈ ∈ recursively as

(k) a a (k 1) Vi = max Fi + γ(P V − )i , i = 1, . . . , S , xi S ∂S . a A | | ∈ \ ∈ V (k) = g(x )$, x ∂S. % i i i ∈ (k) It can be shown that (often) limk V = V and moreover this convergence is fast (e.g. exponential). See Puterman→[1∞5, Ch. 6, Sec. 3].

2 x b b Since there may be multiple b which maximize the expression E f (x) + γv(X1 ) we need to make a choice of a specific b to make a∗ = a∗(x) into a function taking values in A. Since A is ! " countable this is easy. We may, for example, index the elements of A and then always choose the one with the lowest index.

22 (0) (k) Policy iteration Start with an initial guess of ai for each i = 1, . . . , S. Let Vi and a(k), for all i = 1, . . . , S be defined through the iterative procedure for k 0: i ≥ i) Solve the linear system

(k) a a(k) U = F i + γ(P i U) , i = 1, . . . , S i i i | | and set V (k+1) = U on S ∂S and set V (k+1) = g(x ) for all x ∂S. \ i i i ∈ ii) Solve the static maximization problem

(k+1) a a (k+1) ai arg max Fi + γ(P V )i . ∈ a A ∈ $ % (k) Again, it can be shown that (often) limk V = V and moreover this convergence is fast (e.g. exponential). See Puterman →[1∞5, Ch. 6, Sec. 4].

2.4 Q-learning for unknown environments

a a So far we assumed that p = p (y, y′), f = f (y), g = g(y) are known. If they are unknown then Q-learning provides an iterative method for learning the value function and hence for obtaining the optimal policy. There are other numerical methods, see Barto and Sutton [17] for a comprehensive overview of Reinforcement Learning. Define the Q-values (or action values) as:

a x a Q(x, a) := f (x) + γE v(X1 ) . (2.7) ! " We see that this is the (discounted) expected reward for executing action a in state x and then following the optimal policy thereafter. Let us now take the maximum over all possible actions a A. Then ∈ x a a max Q(x, a) = max E f (x) + γv(X1 ) . a A a A ∈ ∈ ! " From Theorem 2.4 we then see that

max Q(x, a) = v(x) . a A ∈ From this and the definition of Q-function (2.7) we have

a x a Q(x, a) = f (x) + γE max Q(X1 , b) b A ! ∈ " which we can re-arrange as

x a a 0 = E f (x) + γ max Q(X1 , b) Q(x, a) . b A − ! ∈ " At this point we are close to formulating learning the Q-function as a “stochastic approximation” algorithm. If our state space S and action space A are finite (as we assumed earlier e.g. in Section 2.3) then the Q function can be thought of as 3 S A a matrix Q with S A entries i.e. Q R| |×| |. Then Qik = Q(xi, ak) with | | × | | ∈ S = x1, . . . , x S and A = a1, . . . , a A . { | |} { | |} 3For a finite set S we write S to denote the number of elements in S. | |

23 Define ak (Q, X, xi, ak) := f (xi) + γ max Q(X, ak ) Q(xi, ak) . F k =1,..., A ′ − ′ | | To find Q we need to solve

xi,ak 0 = E (Q, X , xi, ak) for all i = 1, . . . , S , k = 1, . . . , A , F | | | | ! " xi,a where X k is the r.v. representing the state the system will be when action ak is taken in state xi.

2.5 Robbins–Monro algorithm

Forget for a moment everything about control and Q-learning and consider the prob- lem of finding θ Rp such that ∈ θ 0 = E c(X , θ) , where Xθ is an Rd-valued random varia$ble whic%h depends on the parameters θ and c = c(x, θ) is function of x and the parameters θ. I.e. we can say Xθ π for some ∼ θ family of distributions (πθ)θ Rp and write the problem equivalently as ∈

0 = c(x, θ) πθ(dx) . d #R

The Robbins–Monro algorithm starts with a guess θ0 and then updates the estimate as θk θk+1 = θk + δkc(x ) ,

θk where (x ) are samples from πθk . It is possible to prove that under fairly weak assumptions this sequence converges to the true solution as long as (δk)k N are such that δ = and δ2 < . ∈ k k ∞ k k ∞ Exam3ple 2.2. You c3an think about the problem if finding implied volatility in the Black–Scholes model.4 Let and Xθ = S exp((r (1/2)θ2)T +θ√T Z) with Z N(0, 1). − ∼ Let c(Xθ) = e rT [Xθ K] C. Here S, the current stock price, T , the maturity, − − + − r, the risk-free rate, K, the strike and C the market price of the call are fixed and known. The implied volatility is θ such that

rT θ θ C = e− E [X K]+ equivalently 0 = E c(X ) . − ! " $ % Example 2.3 (Ordinary least squares). Consider some Rd-valued r.v. X. We wish to find a linear approximation of f : Rd Rd′ so that f(X) W X +b for a matrix W d d d → ≈ 2∈ R ′× and vector b R which minimizes the mean-square error E f(X) W X b . ∈ | − − | If we had a finite number N of samples from X to estimate this we will solve this as by finding the minimum of 1 N f(x ) W x b 2. The minimum is achieved N k=1 | k − k − | when the gradient is 0 i.e. when 3 N N 2x (f(x ) W x b) = 0 and (f(x ) W x ) Nb = 0 . k k − k − k − k − 8k=1 8k=1 4In practice you would of course use the Black–Scholes formula and a nonlinear solver like the bisection method or Newton method.

24 Now we could solve this for the optimal (W, b). (k) If, however, our observations arrive in time as a sequence (x )k N, we don’t know how many there will be and we want to keep a running estimate of t∈he optimal (W, b) then we would start with a guess (W (0), b(0)) and then update these using Robins–Monro:

W (k+1) = W (k) 2δ x(k) f(x(k)) W x(k) b(k) and b(k+1) = b(k) 2δ f(x(k)) W x(k) b(k) . − k − − − k − − 4 5 4 5 The reason for this is that we want

2 2 W E f(X) W X b = 0 and b E f(X) W X b = 0 . ∇ | − − | ∇ | − − | ! " ! " This is

E 2X(f(X) W X b) = 0 and E f(X) W X b = 0 . − − − − − − ! " ! " So in the notation above our θ = (W, b) and the nonlinear function c is

2X f(X) W X b c(X, θ) = − − − . f(X) W X b < − & − − '= & ' 2.6 The Q-learning algorithm

Coming back to Q-learning we will formulate the algorithm as a version of the Robins–

Monro method. We need to fix the learning rate at each step: (δk)k N such that δ (0, 1), δ2 < and δ = . We can take δ = 1 if we wish ∈to. k ∈ k k ∞ k k ∞ k k We start by3making an init3ial guess for Q, call it Q(0) = Q(0)(x, a). The learning proceeds in episodes, where at the k-th episode:

i) We are in the state x(k) (this can be either the resulting state of a previous episode or one chosen at random). ii) We select and perform action a(k) (randomly, or cycle through all possible actions systematically, or using some ε-greedy heuristic5). iii) We observe the state we landed in, denoting it y(k). This is our sample from (k) (k) Xx ,a . If y(k) ∂S then set R(k) = f(a(k), x(k)) + g(y(k)) and we will re-start. ∈ If y(k) S ∂S then set R(k) = f(a(k), x(k)) + γV (k 1)(y(k)), where ∈ \ − (k 1) (k) (k 1) (k) V − (y ) := max Q − (y , b) . b A ∈ (k 1) (k) (k) (k) (k) (k) iv) We adjust the Q − (x , a ) to Q (x , a ) using δk as

(k) (k) (k) (k) (k) Q (xk, ak) = Q (xk, ak) + δk (Q, y , x , a ) = (1 δk)Qk 1(xk, ak) + δkRk F − − (k) (k 1) and we leave the remaining values of Q set as Q − in this episode. Finally, if y(k) ∂S then x(k+1) is chosen at random from S, while if y(k) S ∂S then ∈ ∈ \ x(k+1) = y(k).

5With probability ε (0, 1) we choose a random action (exploration), with probability 1 ε we ∈ − choose the optimal action according to our best knowledge at episode k (exploitation):

(k) (k 1) a arg max Q − (x, b) . ∈ b A ∈

25 (k) That’s all. If the algorithm converges so that Q(x, a) = limk Q (x, a) we then →∞ have v(x) = maxa Q(x, a) for all x and a, as we have seen above. An optimal behaviour in an unknown environment would then balance exploration and exploitation. You can find more on this in Sutton and Barto [17] and the proof of convergence of the Q-learning algorithm is in Watkins and Dayan [19].

2.7 Exercises

Exercise 2.4 (Simplified version of Example 1.2). There is a biased coin with p ∈ (0, 1), p = 1 , probability of getting heads and q = 1 p probability of getting tails. ∕ 2 − We will start with an initial wealth x = i, i N with i < m, with some m = 2. ∈ At each turn we choose an action a 1, 1 . By choosing a = 1 we bet that the ∈ {− } coin comes up heads and our wealth is increased by 1 if we are correct, decreased by 1 otherwise. By choosing a = 1 we bet on tails and our wealth is updated accordingly. − That is, given that Xn 1 = x and our action a 1, 1 we have − ∈ {− } P(Xn = x + a Xn 1 = x, a) = p , P(Xn = x a Xn 1 = x, a) = q . | − − | −

The game terminates when either x = 0 or x = m = 2. Let N = min n N : Xn = { ∈ 0 or X = m . Our aim is to maximize n } α α J (x) = E X X0 = x N | over functions α = α(x) telling what acti!on to choose"in each given state.

1. Write down what the state space S and the stopping set ∂S are and write down the transition probability matrix for P a for a = 1 and for a = 1. − 2. Write down the Bellman equation for the problem. 3. Assume that p > 1/2. Guess the optimal strategy. With your guess the Bellman equation is linear. Solve it for V .

2.8 Solutions to Exercises

Solution (to Exercise 2.4).

1. First, S = 0, 1, 2 and ∂S = 0, 2 . The transition probability matrices for { } { } a = 1 and a = 1 are, respectively, − 1 0 0 1 0 0 (a=1) (a= 1) P = q 0 p , P − = p 0 q .     0 0 1 0 0 1     2. There is no running reward and γ = 1 so the Bellman equation is

a Vi = max (P V )i . a 1,1 ∈{− } ! " 1 3. If p > 2 then we want to bet on heads i.e. a = 1. To solve the Bellman equation: we know that V0 = 0, V2 = 2 so we only need to find V1. From the Bellman equation with a = 1 we have

V1 = qV0 + 0 V1 + pV2 = 2p .

26 3 Stochastic control of diffusion processes

In this section we introduce existence and uniqueness theory for controlled diffusion processes and building on that formulate properly the stochastic control problem we want to solve. Finally we explore some properties of the value function associated to the control problem.

3.1 Equations with random drift and diffusion

Let a probability space (Ω, , ) be given. Let W be a d′-dimensional Wiener process d F P and let ξ be a R -valued random variable independent of W . Let t := σ(ξ, Ws : s F ≤ t). We consider a stochastic differential equation (SDE) of the form,

dX = b (X ) dt + σ (X ) dW , t [0, T ] , X = ξ . (3.1) t t t t t t ∈ 0 Equivalently, we can write this in the integral form as

t t X = ξ + b (X ) ds + σ (X ) dW , t [0, T ] . (3.2) t s s s s s ∈ #0 #0 d d d d Here σ : Ω [0, T ] R × ′ and b : Ω [0, T ] R R . Written component-wise, the × × × × → SDE is

d′ dXi = bi(t, X ) dt + σij(t, X ) dW j , t [0, T ] , Xi = ξi, i 1, , m . t t t t ∈ 0 ∈ { · · · } 8j=1 The drift and volatility coefficients

(t, ω, x) b (ω, x), σ (ω, x) → t t & d' are progressively measurable with respect to t (R ); as usual, we suppress ω in F ⊗ B the notation and will typically write bt(x) instead of bt(ω, x) etc. Note that t = 0 plays no special role in this; we may as well start the SDE at some time t 0 (even t,x t,x ≥ a stopping time), and we shall write X = (Xs )s [t,T ] for the solution of the SDE started at time t with initial value x (assuming it ex∈ists and is unique).

Definition 3.1 (Solution of an SDE). We say that a process X is a (strong) solution to the SDE (3.16) if

i) The process X is continuous on [0, T ] and adapted to ( t)t [0,T ], F ∈ ii) we have

T T 2 P bs(Xs) ds < = 1 and P σs(Xs) ds < = 1 , | | ∞ | | ∞ (#0 ) (#0 ) iii) The process X satisfies (3.11) almost surely for all t [0, T ] i.e. there is Ω¯ ∈ ∈ F such that P(Ω¯) = 1 and for all ω Ω¯ it holds that ∈ t t X (ω) = ξ(ω)+ b (ω, X (ω)) ds+ σ (ω, X (ω)) dW (ω) , t [0, T ] . (3.3) t s s s s s ∀ ∈ #0 #0

27 Given T 0, and m N, we write Hm for the set of progressively measurable processes ≥ ∈ T φ such that

1 T m m φ m := E φt dt < . HT | | ∞ 0 #0 1 Proposition 3.2 (Existence and uniqueness of solutions). Assume that for all x Rd ∈2 the processes (bt(x))t [0,T ] and (σt(x))t [0,T ] are progressively measurable, that E ξ < ∈ ∈ | | and that there exists a constant K such that a.s. for all t [0, T ] and x, y Rd it ∞ ∈ ∈ holds that

b(0) 2 + σ(0) 2 K, HT HT ≤ (3.4) b (x) b (y) + σ (x) σ (y) K x y . | t − t | | t − t | ≤ | − | Then the SDE has a unique (strong) solution X on the interval [0, T ]. Moreover, there exists a constant C = C(K, T ) such that

2 2 E sup Xt C 1 + E[ ξ ] . 90 t T | | : ≤ | | ≤ ≤ & ' We give an iterative scheme which we will show converges to the solution. To that end let X0 = ξ for all t [0, T ]. For n N let, for t [0, T ], the process Xn be given t ∈ ∈ ∈ by t t n n 1 n 1 Xt = ξ + bs(Xs − ) ds + σs(Xs − ) dWs . (3.5) #0 #0 Note that here the superscript on X indicates the iteration index.6 We can see that 0 1 X is ( t)t [0,T ]-adapted and hence (due to progressive measurability of b and σ) X F ∈ n is ( t)t [0,T ]-adapted and repeating this argument we see that each X is ( t)t [0,T ]- F ∈ F ∈ adapted. Before we prove Proposition 3.2 by taking the limit in the above iteration we will need the following result. Lemma 3.3. Under the conditions of Proposition 3.2 there is a constant C, depending on K and T (but independent of n) such that for all n N and t [0, T ] it holds that ∈ ∈ n 2 2 Ct E X < C(1 + E ξ )e . | t | | | Proof. We see that

t 2 t 2 n 2 2 n 1 n 1 E X 4E ξ + 4E bs(X − ) ds + 4E σs(X − ) dWs . | t | ≤ | | | s | | s | 0#0 1 0#0 1 Using H¨older’s inequality and Itˆo’s isometry we can see that

t t n 2 2 n 1 2 n 1 2 E X 4E ξ + 4tE bs(X − ) ds + 4E σs(X − ) ds . | t | ≤ | | | s | | s | #0 #0 Using the Lipschitz continuity and growth assumption (3.4) we thus obtain that

t t t t n 1 2 2 2 n 1 2 2 n 1 2 E bs(X − ) ds 2E bs(0) ds+2K E X − ds 2K 1 + E X − ds | s | ≤ | | | s | ≤ | s | #0 #0 #0 0 #0 1 6Instead of a power or index in a vector.

28 and similarly

t t n 1 2 2 n 1 2 E σs(X − ) ds 2K 1 + E X − ds . | s | ≤ | s | #0 0 #0 1 Thus for all t [0, T ] we have, with L := 16K2(t 1), that ∈ ∨ t n 2 2 n 1 2 E X L 1 + E ξ + L E X − ds . | t | ≤ | | | s | #0 & ' Let us iterate this. For n = 1 we have

1 2 2 2 2 2 2 E X L 1 + E ξ +LtE ξ L 1 + E ξ +LtL(1+E ξ ) = L 1 + E ξ [1 + Lt] . | t | ≤ | | | | ≤ | | | | | | For n = 2 w&e have ' & ' & '

t 2 2 2 2 1 2 2 2 (Lt) E X L 1 + E ξ + L E X ds L 1 + E ξ + L L(1 + E ξ )t + L | t | ≤ | | | s | ≤ | | · | | · 2 #0 & ' 2 & ' 2 (Lt) L(1 + E ξ ) 1 + Lt + . ≤ | | 2 ( ) If we carry on we see that

2 n j n 2 2 (Lt) (Lt) 2 ∞ (Lt) E X L(1 + E ξ ) 1 + Lt + + + L(1 + E ξ ) | t | ≤ | | 2! · · · n! ≤ | |  j!  ( ) 8j=0   and hence for all t [0, T ] we have that ∈ n 2 2 Lt E X L(1 + E ξ )e . | t | ≤ | |

Proof of Proposition 3.2. We start with (3.5), take the difference between iteration n+1 and n, take the square of the Rd norm, take supremum and take the expectation. Then we see that

n+1 n 2 E sup Xs Xs s t | − | ≤ s 2 s 2 n n 1 n n 1 2E sup [br(Xr ) br(Xr − )] dr + 2E sup [σr(Xr ) σr(Xr − )] dWr ≤ s t 0 − s t 0 − ≤ F# F ≤ F# F F F F F =: 2I1(t) F+ 2I2(t) . F F F F F F F We note that for all t [0, T ], having used H¨older’s inequality in the penultimate step ∈ and assumption (3.4) in the final one, it holds that

s 2 s 2 n n 1 n n 1 I1(t) = E sup [br(Xr ) br(Xr − )] dr E sup br(Xr ) br(Xr − ) dr s t 0 − ≤ s t 0 | − | ≤ F# F ≤ 0# 1 Ft 2F t F n n 1 F n n 1 2 E F br(X ) br(X − ) dr F tE br(X ) br(X − ) dr ≤ | r − r | ≤ | r − r | 0 0 1 0 # t # 2 n n 1 2 K tE X X − dr . ≤ | r − r | #0

29 t n n 1 Moreover Mt = 0 [σr(Xr ) σr(Xr − )] dWr is a martingale and so ( Mt )t [0,T ] is − | | ∈ a non-negative sub-martingale. Then Doob’s maximal inequality, see Theorem A.14 7 with p = 2, followed by Itˆo’s isometry implies that for all t [0, T ] it holds that ∈ s 2 t 2 n n 1 n n 1 I2(t) = E sup [σr(Xr ) σr(Xr − )] dWr 4E [σr(Xr ) σr(Xr − )] dWr s t 0 − ≤ 0 − ≤ F# F F# F tF F tF F F n n 1 2 F 2 F n n 1 2 F = 4E F σr(X ) σr(X − ) dr 4KF E F X X − dr . F | r − r | ≤ | r − r | #0 #0 Hence, with L := 2K2(T + 4) we have for all t [0, T ] that ∈ t n+1 n 2 n n 1 2 E sup Xs Xs L E Xr Xr − dr . (3.6) s t | − | ≤ 0 | − | ≤ # Let 1 2 C∗ := sup E Xt ξ t T | − | ∈ and note that Lemma 3.3 implies that C < . Using this and iterating the estim- ∗ ∞ ate (3.6) we see that for all t [0, T ] we have that ∈ n n n+1 n 2 L t E sup Xs Xs C∗ . (3.7) s t | − | ≤ n! ≤ d For f C([0, T ]; R ) let us define the norm f := sups [0,T ] fs . Due to Chebychev– ∈ ∞ ∈ | | Markov’s inequality we thus have

n+1 n 1 n+1 n 2 1 P X X > = P X X > − ∞ 2n+1 − ∞ 22(n+1) ( ) ( n n n n n ) n+1 L t 4 L t 4 C∗ = 4C∗ . ≤ n! n! n+1 n 1 7 Let En := ω Ω : X (ω) X (ω) > n+1 . Note that clearly it holds that { ∈ − ∞ 2 }

∞ PEn < . ∞ n=0 8 By the Borel–Cantelli Lemma it thus holds that there is Ω¯ and a random variable ∈ F N : Ω N such that P(Ω¯) = 1 and for all ω Ω¯ we have that → ∈ n+1 n (n+1) X (ω) X (ω) 2− n N(ω) . − ∞ ≤ ∀ ≥ For any ω Ω¯, any m N and n N(ω) we then have, due to the triangle inequality, ∈ ∈ ≥ that

m 1 − Xn+m(ω) Xn(ω) Xn+j+1(ω) Xn+j(ω) − ∞ ≤ − ∞ j=0 8 (3.8) m 1 1 m − (n+j+1) (n+1) 1 2 n 2− = 2− − 2− . ≤ 1 1 ≤ j=0 & 2' 8 − This means that the sequence Xn(ω) is a Cauchy sequence in the Banach space C([0, T ]; Rd) and thus a limit X(ω) such that Xn(ω) X(ω) in C([0, T ]; Rd) as → 7 xn x Indeed for any x R we have ∞ = e < . ∈ n=0 n! ∞ # 30 n n . Moreover for each n N and each t [0, T ] the random variable Xt is t → ∞ ∈ n∈ F measurable which means that Xt = limn Xt is t measurable. →∞ F Finally we have to show that the limit X satisfies the SDE. On the left hand side the convergence is trivial. To take the limit in the bounded variation integral we can use n n simply that for all ω Ω we have that X(ω) X (ω) < 2− for n N(ω). This ∈ − ∞ ≥ follows by taking m in (3.8) with n N fixed. Then → ∞ ∈ t t t b (ω, Xn(ω)) ds b (ω, X (ω)) ds K Xn(ω) X (ω) ds 0 s s − s s ≤ | s − s | → F#0 #0 F #0 F F as n F due to Lebesgue’s theorem on domiFnated convergence. →F ∞ F To deal with the stochastic integral we need to do a bit more work. We see that for any t [0, T ] it holds that ∈ 2 m 1 n+m n 2 − n+j+1 n+j (n+j) n+j E X X = E (X X )2− 2 . | t − t | F t − t F F j=0 F F8 F F F Using H¨older’s inequality we get thFat for any t [0, T ] it holds that F F ∈ F m 1 m 1 n+m n 2 − (n+j) − n+j+1 n+j 2 n+j E X X 4− E X X 4 . | t − t | ≤    | t − t |  8j=0 8j=0     We note that m 1 1 m − j 1 4 4 4− = − 1 1 ≤ 3 j=0 & 4' 8 − From (3.7) we thus get that for all m N and for all t [0, T ] it holds that ∈ ∈ m 1 n+j n+m n 2 4 n − (4Lt) 4 4Lt n E X X C∗4− C∗e 4− . | t − t | ≤ 3 (n + j)! ≤ 3 8j=0 n 2 n Hence for any t [0, T ] the sequence (Xt )n N is Cauchy in L (Ω) and so Xt Xt in 2 ∈ ∈ 2 n 2 → 2 Lt L (Ω) as n for all t [0, T ]. Finally E Xt lim infn E Xt C(1+ ξ )e → ∞ ∈ | | ≤ →∞ | | ≤ | | due to Lemma 3.3. Thus for each n N we have ∈ n 2 n 2 2 2 Lt E X Xt 2E X + 2E Xt 4C(1 + ξ )e =: g(t) . | t − | ≤ | t | | | ≤ | | Noting that g L1(0, T ) we can conclude, using Lebesgue’s theorem on dominated ∈ convergence that

T T n 2 n 2 lim E X Xt dt = lim E X Xt = 0 . n | t − | n | t − | →∞ #0 #0 →∞ This, together with Itˆo’s isometry and assumption (3.4) allows us to take the limit in the stochastic integral term arising in (3.5).

Remark 3.4. In the setup above the coefficients b and σ are random. In applications we will deal essentially with two settings for b and σ.

i) b and σ are deterministic, measurable, functions, i.e. (t, x) b (x) and (t, x) → t → σt(x) are not random.

31 ii) b and σ are effectively random maps, but the randomness has a specific form. Namely, the random coefficients b(t, ω, x) and σ(t, ω, x) are of the form

¯αt(ω) αt(ω) bt(ω, x) := bt (x) and σt(ω, x) := σ¯t (x)

where ¯b, σ¯ are deterministic measurable functions on [0, T ] Rd A , A is a com- × × plete separable metric space and (αt)t [0,T ] is a progressively measurable process valued in A. ∈ This case arises in stochastic control problems that we will study later on, an example of which can already be seen with SDE (1.1).

Some properties of SDEs

In the remainder, we always assume that the coefficients b and σ satisfy the above conditions.

Proposition 3.5 (Further moment bounds). Let m N, m 2. Assume that for d ∈ ≥ all x R the processes (bt(x))t [0,T ] and (σt(x))t [0,T ] are progressively measurable, ∈ ∈ ∈ that E ξ m < and that there exists a constant K such that a.s. for all t [0, T ] and | | ∞ ∈ x, y Rd it holds that ∈ b(0) m + σ(0) m K, HT HT ≤ b (x) b (y) + σ (x) σ (y) K x y . | t − t | | t − t | ≤ | − | Then there exists a constant C = C(K, T, m) such that

m m E sup Xt C 1 + E[ ξ ] . 90 t T | | : ≤ | | ≤ ≤ & ' This can be proved using similar steps to those used in the proof of Lemma 3.3 but employing the Burkholder–Davis–Gundy inequality when estimating the expectation of the supremum of the stochastic integral term.

Proposition 3.6 (Stability). Let m N, m 2. Assume that for all x Rd the ∈ ≥ ∈ m processes (bt(x))t [0,T ] and (σt(x))t [0,T ] are progressively measurable, that E ξ < ∈ ∈ | | ∞ and that there exists a constant K such that a.s. for all t [0, T ] and x, y Rd it ∈ ∈ holds that

b(0) m + σ(0) m K, HT HT ≤ b (x) b (y) + σ (x) σ (y) K x y . | t − t | | t − t | ≤ | − | d Let x, x′ R and 0 t t′ T . ∈ ≤ ≤ ≤ i) There exists a constant C = C(K, T, m) such that

t,x t,x′ m m E sup Xs Xs C x x′ . t s T | − | ≤ | − | 9 ≤ ≤ :

ii) Suppose in addition that there is a constant K′ such that

s′ 2 2 E br(0) + σr(0) dr K′ s s′ | | | | ≤ | − | 9#s :

32 for all 0 s s T . Then there exists C = C(K, T ) such that ≤ ≤ ′ ≤

t,x t′,x 2 2 E sup Xs Xs C(K + x ) t t′ . t s T | − | ≤ | | | − | 9 ′≤ ≤ :

To prove the above two propositions one uses often the following inequalities: Cauchy- Schwartz, H¨older and Young’s inequality; Gronwall’s inequality (see Lemma A.6); Doob’s maximal inequality (see Theorem A.14) and Burkholder–Davis–Gundy in- equality.

m Proposition 3.7 (Flow property). Let x R and 0 t t′ T . Then ∈ ≤ ≤ ≤ t ,Xt,x t,x ′ t X = Xs ′ , s [t′, T ]. s ∈

(This property holds even if t, t′ are stopping times.)

See Exercise 3.17 for proof.

d Proposition 3.8 (Markov property). Let x R and 0 t t′ s T . If b and σ ∈ ≤ ≤ ≤ ≤ are deterministic functions, then

t,x Xs is a function of t, x, s, and Wr Wt r [t,s]. − ∈ Moreover, & '

t,x t,x t,x E Φ Xr , t′ r s t = E Φ Xr , t′ r s X ≤ ≤ |F ′ ≤ ≤ | t′ ! " $ & ' % 0& m ' for all bounded and measurable functions Φ : C ([t′, s]; R ) R. → On the left hand side (LHS), the conditional expectation is on that contains all Ft′ the information from time t = 0 up to time t = t′. On the right hand side (RHS), that t,x information is replaced by the process X at time t = t′. In words, for Markovian t′ processes the best prediction of the future, given all knowledge of the present and past (what you see on the LHS), is the present (what you see on the RHS; all information on the past can be ignored).

3.2 Controlled diffusions

We now introduce controlled SDEs with a finite time horizon T > 0; the infinite- horizon case is discussed later. Again, (Ω, , P) is a probability space with filtration F ( ) and a d -dimensional Wiener process W compatible with this filtration. Ft ′ We are given an action set A (in general separable complete metric space) and let be the set of all A-valued progressively measurable processes, the controls. The A0 controlled state is defined through an SDE as follows. Let

d d d d d b : [0, T ] R A R and σ : [0, T ] R A R × ′ × × → × × → be measurable functions.

33 Assumption 3.9. Assume that for each t [0, T ] that (x, a) b(t, x, a) and (x, a) ∈ → → σ(t, x, a) are continuous, assume that for each t [0, T ] we have x b(t, x, a) and ∈ → x σ(t, x, a) continuous in x uniformly in a A and that there is a constant K such → ∈ that for any t, x, y, a we have

b(t, x, a) b(t, y, a) + σ(t, x, a) σ(t, y, a) K x y . (3.9) | − | | − | ≤ | − | Moreover for all t, x, a it holds that

b(t, x, a) + σ(t, x, a) K(1 + x + a ) . (3.10) | | | | ≤ | | | | We will fix m 2 and refer to the set ≥ m := α H : ω Ω, t [0, T ] αt(ω) A and α is progressively measurable A { ∈ T ∀ ∈ ∈ ∈ } set as admissible controls. Given a fixed control α , we consider the SDE for 0 t T for s [t, T ] ∈ A ≤ ≤ ≤ ∞ ∈

dXs = b s, Xs, αs dt + σ s, Xs, αs dWs, Xt = ξ. (3.11) & ' & ' With Assumption 3.9 the SDE (3.11) is a special case of an SDE with random coef- ficients, see (3.1). In particular, if we fix α then taking ˜b (x) := b(t, x, α ) and ∈ A t t σ˜t(x) := σ(t, x, αt) we have the progressive measurability of ˜b and σ˜ (since b, σ are assumed to be measurable and α is progressively measurable. Moreover

T T ˜ 2 2 2 2 2 2 2 b(0) 2 = E b(t, 0, αt) dt E K (1 + αt) ) dt 2K T + 2K α 2 < HT | | ≤ | | ≤ HT ∞ #0 #0 2 and similarly σ˜(0) 2 < . Finally the Lipschitz continuity of the coefficients in HT ∞ space clearly holds and so due to Proposition 3.2 we have the following result.

Proposition 3.10 (Existence and uniqueness). Let t [0, T ], ξ L2( ) and α . ∈ ∈ Ft ∈ A0 Then SDE (3.11) has a unique (strong) Markov solution X = Xt,ξ,α on the interval [t, T ] such that 2 2 sup E sup Xs c(1 + E ξ ) . α s [t,T ] | | ≤ | | ∈A ∈ Moreover, the solution has the properties listed in Propositions 3.5 and 3.6.

3.3 Stochastic control problem with finite time horizon

In this section we revisit the ideas of the opening one and give a stronger mathematical meaning to the general setup for optimal control problems. We distinguish the finite time horizon T < and the infinite time horizon T = , the functional to optimize ∞ ∞ must differ. In general, texts either discuss maximization or a minimization problems. Using ana- lysis results, it is easy to jump between minimization and maximization problems: max f(x) = min f(x) and the x that maximizes f is the same one that minim- x − x − ∗ izes f (draw a picture to convince yourself). −

34 Finite time horizon

Let

T α,t,ξ α,t,ξ J(t, ξ, α) := E f(s, Xs , αs) ds + g(XT ) , (#t ) where Xt,ξ solves (3.11) (with initial condition X(t) = ξ). The J here is called the objective functional. We refer to f as the running gain (or, if minimizing, running cost) and to g as the terminal gain (or terminal cost). We will ensure the good behavior of J through the following assumption.

Assumption 3.11. There is K > 0, m 0, 1, . . . such that for all t, x, y, a we have ∈ { } g(x) g(y) + f(t, x, a) f(t, y, a) K(1 + x m + y m) x y , | − | | − | ≤ | | | | | − | f(t, 0, a) K(1 + a 2) . | | ≤ | | Note that this assumption is not the most general. For bigger generality consult e.g. [12].

The optimal control problem formulation We will focus on the following stochastic control problem. Let t [0, T ] and x Rd. Let ∈ ∈ T α,t,x α,t,x v(t, x) := sup J(t, x, α) = sup E f s, Xs , αs ds + g XT (P ) α [t,T ] α [t,T ] t  ∈A ∈A (# )  α,t,x α,t,x & ' & ' and X solves (3.11) with Xt = x. The solution to the problem (P), is the value function, denoted by v. One of the mathematical difficulties in stochastic is that we don’t even know at this point whether v is measurable or not.

In many cases there is no optimal control process α∗ for which we would have v(t, x) = J(t, x, α∗). Recall that v is the value function of the problem (P). However there is always an ε-optimal control (simply by definition of supremum).

Definition 3.12 (ε-optimal controls). Take t [0, T ] and x Rm. Let ε 0. A ∈ ∈ ≥ control αε [t, T ] is said to be ε-optimal if ∈ A v(t, x) ε + J(t, x, αε) . (3.12) ≤ Lemma 3.13 (Lipschitz continuity in x of the value function). If Assumptions 3.9 m+1 and 3.11 hold and if H then there exists C = CT,K,m > 0 such that for all A ⊂ T t [0, T ] and x, y Rd we have ∈ ∈ J(t, x, α) J(t, y, α) C(1 + x m + y m) x y . | − | ≤ | | | | | − | and v(t, x) v(t, y) C(1 + x m + y m) x y . | − | ≤ | | | | | − | Proof. The first step is to show that there is C > 0 such that for any α we T,K,m ∈ U have I := J(t, x, α) J(t, y, α) C (1 + x m + y m) x y . | − | ≤ T | | | | | − |

35 Using Assumption 3.11 (local Lipschitz continuity in x of f and g) we get

T t,x,α t,y,α t,x,α t,x,α I E f(s, X , αs) f(s, X , αs) ds + g(X ) g(X ) ≤ | s − s | T − T (#t ) T F F t,x,α m t,y,α m t,x,α Ft,y,α F KE (1 + X + X ) X X ds ≤ | s | | s | | s − s | ( #t + (1 + Xt,x,α m + Xt,y,α m) Xt,x,α Xt,y,α . | T | | T | | T − T | ) We note that due to H¨older’s inequality

m 1 T m+1 T m+1 t,x,α m+1 t,y,α m+1 t,x,α t,y,α m+1 I CK,m E (1 + Xs + Xs ) ds E Xs Xs ds ≤ t | | | | t | − | 0 # m 1 0 # 1 1 t,x,α m+1 t,y,α m+1 m+1 t,x,α t,y,α m+1 m+1 + CK,m E(1 + X + X ) E X X . | T | | T | | T − T | 4 5 4 5 Then, using Propositions 3.5 and 3.6, we get

m 1 m+1 m+1 t,x,α m+1 t,y,α m+1 t,x,α t,y,α m+1 I CT,K,m sup E Xs + Xs sup E Xs Xs ≤ 0 be arbitrary and fixed. Then there is αε such that v(t, x) ε+J(t, x, αε). Moreover ∈ A ≤ v(t, y) J(t, y, α). Thus ≥ v(t, x) v(t, y) ε + J(t, x, αε) J(t, y, α) ε + C(1 + x m + y m) x y . − ≤ − ≤ | | | | | − | With ε > 0 still the same and fixed there would be βε such that v(t, y) ∈ A ≤ ε + J(t, y, βε). Moreover v(t, x) J(t, x, β) and so ≥ v(t, y) v(t, x) ε + J(t, y, βε) J(t, x, β) ε + C(1 + x m + y m) x y . − ≤ − ≤ | | | | | − | Hence

ε C(1 + x m + y m) x y v(t, x) v(t, y) ε + C(1 + x m + y m) x y . − − | | | | | − | ≤ − ≤ | | | | | − | Letting ε 0 concludes the proof. → An important consequence of this is that if we fix t then x v(t, x) is measurable (as → continuous functions are measurable).

3.4 Exercises

Exercise 3.14 (Non-existence of solution).

1 1. Let I = 0, 2 . Find a solution X for $ % dX t = X2 , t I , X = 1 . dt t ∈ 0

36 2. Does a solution to the above equation exist on I = [0, 1]? If yes, show that it satisfies Definition 3.1. In not, which property is violated?

Exercise 3.15 (Non-uniqueness of solution). Fix T > 0. Consider

dX t = 2 X , t [0, T ] , X = 0 . dt | t| ∈ 0 2 1. Show that X¯ := 0 for all t [0, T ] is a solution to the above ODE. t ∈ 2. Show that X := t2 for all t [0, T ] is also a solution. t ∈ 3. Find at least two more solutions different from X¯ and X.

Exercise 3.16. Consider the SDE

t t X = ξ + b (X ) ds + σ (X ) dW , t [0, T ] . t s s s s s ∈ #0 #0 and assume that the conditions of Proposition 3.2 hold. Show that the solution to the SDE is unique in the sense that if X and Y are two solutions with X0 = ξ = Y0 then

P sup Xt Yt > 0 = 0 . 0 t T | − | 9 ≤ ≤ : Exercise 3.17. Consider the SDE

dXt,x = b(Xt,x) ds + σ(Xt,x) dW , t s T, Xt,x = x . s s s s ≤ ≤ t t,x Assume it has a pathwise unique solution i.e. if Ys is another process that satisfies the SDE then t,x t,x P sup Xs Ys > 0 = 0 . t s T | − | 9 ≤ ≤ : Show that then the flow property holds i.e. for 0 t t T we have almost surely ≤ ≤ ′ ≤ that t ,Xt,x t,x ′ t X = Xs ′ , s [t′, T ]. s ∀ ∈

3.5 Solutions to Exercises

Solution (to Exercise 3.14).

2 1. We can use the following method to get a guess: from the ODE we get X− dX = dt 1 1 which means that, after integrating, we get X− = t + C. So Xt = (t + C)− . Since X = 1 we get C = 1. Thus − − 0 − 1 X = , t 0, 1 . t 1 t ∈ 2 − 4 5 dXt 2 2 We check by calculating that = (1 t)− = X so the equation holds in [0, 1/2]. dt − t 2. We can see that limt 1 Xt = and so the t Xt is not continuous on [0, 1]. ↗ ∞ → Solution (to Exercise 3.15).

¯ 1. Clearly X¯ = 0 and for t [0, T ] we have dXt = 0 = 2 X¯ . 0 ∈ dt | t| C 37 2. Clearly X = 0 and for t [0, T ] we have dXt = 2t = 2√t2 = 2 X . 0 ∈ dt | t| 3. Fix any τ (0, T ) and define C ∈ 0 for t [0, τ) , X(τ) := t (t τ)2 for t ∈ [τ, T ] . D − ∈ Then, clearly, dX(τ) = 0 and moreover if t [0, τ) then we have 0 ∈ dX(τ) t = 0 = 2 X(τ) , dt | t | E while if t [τ, T ] then we have ∈ dX(τ) t = 2(t τ) = 2 (t τ)2 = 2 X(τ) . dt − | − | t C E So, in fact, there are uncountably many different solutions.

Solution (to Exercise 3.16). Using the same estimates as in the proof of Proposition 3.2, see (3.6), we get that for some constant L > 0

t 2 2 E sup Xs Ys L E Xr Yr dr . s t | − | ≤ 0 | − | ≤ # Hence t 2 2 E sup Xs Ys L E sup Xs Ys dr . s t | − | ≤ 0 s r | − | ≤ # ≤ 2 From Gronwall’s lemma (applied with y(t) := E sups t Xs Ys , a(t) = 0, b(t) = 0 and λ(t) = L) we get that for all t [0, T ] we have ≤ | − | ∈ 2 E sup Xs Ys 0 . s t | − | ≤ ≤ But this means that 2 P sup Xs Ys = 0 = 1 . t T | − | 2 ≤ 3 t,x t′,X t Solution (to Exercise 3.17). Let Ys := Xs ′ for s [t′, T ] and note that the process Y t,x t′,X ∈ t t,x t,x solves the SDE for s [t′, T ] with Yt = X ′ = X . Let Xs := X for s [t′, T ] and ∈ ′ t′ t′ s ∈ note that this also solves the SDE for s [t′, T ] with ∈ X = Xt,x = Y . t′ t′ t′ Hence both Y and X solve the same SDE with the same starting point. By the pathwise uniqueness property of the solutions of this SDE we then have

P sup Xs Ys = 0 = 1 t s T | − | 2 ≤ ≤ 3 but this means that almost surely it holds that for all s [t′, T ] it holds that ∈ t,x t′,X t t,x Xs ′ = Ys = Xs = Xs .

38 4 Dynamic programming and the Hamilton–Jacobi–Bellman equation

4.1 Dynamic programming principle

Dynamic programming (DP) is one of the most popular approaches to study the stochastic control problem (P). The main idea was originated from the so-called Bell- man’s principle, which states

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

The following is the statement of Bellman’s principle / dynamic programming. Theorem 4.1 (Bellman’s principle / Dynamic programming). For any 0 t tˆ T , ≤ ≤ ≤ for any x Rd, we have ∈ tˆ v(t, x) = sup f s, Xα,t,x, α ds + v tˆ, Xα,t,x Xα,t,x = x . (4.1) E s s tˆ t α [t,tˆ] 9 t : ∈A # F & ' & 'F The idea behind the dynamic programming principle is as follFows. The expectation on the RHS of (4.1) represents the gain if we implement the time t until time tˆ optimal strategy and then implement the time tˆ until T optimal strategy. Clearly, this gain will be no larger than the gain associated with using the overall optimal strategy from the start (since we can apply the overall optimal control in both scenarios and obtain the LHS). What equation (4.1) says is that if we determine the optimal strategy separately on each of the time intervals [t, tˆ] and [tˆ, T ] we get the same answer as when we consider the whole time interval [t, T ] at once. Underlying this statement, hides a deeper one: that if one puts the optimal stategy over [t, tˆ) together with the optimal stategy over [tˆ, T ] this is still an optimal strategy. Note that without Lemma 3.13 we would not even be allowed to write (4.1) since we need v(tˆ, Xα,t,x) to be a random variable (so that we are allowed to take the tˆ expectation). Let us now prove the Bellman principle.

Proof of Theorem 4.1. We will start by showing that v(t, x) RHS of (4.1). We note ≤ that with α [t, T ] we have ∈ A tˆ T α α α α J(t, x, α) = E f(s, Xs , αs)ds + f(s, Xs , αs)ds + g(XT ) Xt = x . ˆ 9#t #t F : F We will use the tower property of conditional expectation and use the MF arkov property α F of the process. Let X := σ(Xα : t s tˆ). Then Ftˆ s ≤ ≤ J(t, x, α) tˆ T α α α Xα α = E f(s, Xs , αs)ds + E f(s, Xs , αs)ds + g(XT ) tˆ Xt = x ˆ F 9#t (#t F ) F : F F tˆ T F F α α α F α F α = E f(s, Xs , αs)ds + E f(s, Xs , αs)ds + g(XT ) Xtˆ Xt = x . ˆ 9#t (#t F ) F : F F F F 39 F F Now, because of the flow property of SDEs,

T α,t,x α,t,x α,t,x ˆ α,t,x ˆ α,t,x E f(s, Xs , αs)ds + g(XT ) Xˆ = J t, Xˆ , (αs)s [tˆ,T ] v t, Xˆ . ˆ t t ≤ t (#t F ) ∈ F 4 5 4 5 F Hence F tˆ J(t, x, α) sup f(s, Xα, α )ds + v tˆ, Xα,t,x Xα = x . E s s tˆ t ≤ α t ∈A 9# F : 4 5 F Taking supremum over control processes α on the left shows thatFv(t, x) RHS of (4.1). F ≤ We now need to show that RHS of (4.1) v(t, x). Fix ε > 0. Then there is αε [t, tˆ] ≤ ∈ A such that

tˆ ε ε ε RHS of (4.1) ε + f s, Xα ,t,x, αε ds + v tˆ, Xα ,t,x Xα ,t,x = x . E s s tˆ t ≤ 9 t : # F & ' & 'F αε,t,x F Let us write Xs := Xs for brevity from now on. We now have to be careful so that we can construct an ε-optimal control which is progressively measurable on the whole [t, T ]. To that end let δ = δ(ω) > 0 be such that

m m m 1 m 2 C(1 + X (ω) )δ(ω) < ε and 2 − δ(ω) < 1. | tˆ | d where C is the constant from Lemma 3.13. Take (xi)i N dense in R . By density of ∈ (x ) we know that for each δ(ω) there exists i(ω) such that x X (ω) δ(ω). i i | i(ω) − tˆ | ≤ Moreover

m m 1 m m 1 m m m C(1 + x )δ C(1 + 2 − x X + 2 − X )δ 2 C(1 + X )δ < ε . | i| ≤ | i − tˆ| | tˆ| ≤ | tˆ| d The open covering of R given by ω Ω Bδ(ω)(xi(ω)) has a countable sub-cover k N Bδk (xk). Let (Q ) be constructed as follows: ∈ ∈ k K K k 1 − Q = B (x ) and Q = B (x ) Q . 1 δ1 1 k δk k \ k′ kL′=1 Then for each x there is αε,i (tˆ, T ] such that v(tˆ, x ) ε + J(tˆ, x , αε,i). Moreover i ∈ A i ≤ i if X Q then X m 2m 1 X x m + 2m 1 x m and due to Lemma 3.13 we have, tˆ ∈ i | tˆ| ≤ − | tˆ− i| − | i| v(tˆ, X ) v(tˆ, x ) C(1 + X m + x m) X x | tˆ − i | ≤ | tˆ| | i| | tˆ − i| m 1 m m 1 m C(1 + 2 − X x + 2 − x ) X x ≤ | tˆ − i| | i| | tˆ − i| m 1 m m 1 m C(1 + 2 − δ + 2 − x )δ ≤ | i| m 1 m m m C(2 + 2 − x )δ 2 C(1 + x )δ < ε . ≤ | i| ≤ | i| Similarly we have J(tˆ, x , αε,i) J(tˆ, X , αε,i) ε. | i − tˆ | ≤ Hence we get

v(tˆ, X ) v(tˆ, x ) + ε ε + J(tˆ, x , αε,i) + ε ε + J(tˆ, X , αε,i) + 2ε . tˆ ≤ i ≤ i ≤ tˆ And so v(tˆ, X ) 3ε + J(tˆ, X , αε,i) . tˆ ≤ tˆ

40 Therefore RHS of (4.1)

tˆ αε,t,x ε 3ε+E f s, X , α ds ≤ s s 9 #t &T ' αε,i ε,i αε,i αε,i αε,t,x αε,t,x + E f(s, Ys , αs ) ds + g YT Ytˆ = Xˆ Xs = x . ˆ t (#t F ) : 4 5 F F F F Regarding controls we now have the following: αFε [t, tˆ] and foFr each i we have ∈ A αε,i (tˆ, T ]. From these we build one control process βε as follows: ∈ A ε ˆ ε αs s [t, t] βs := ε,i ∈ αε,t,x αs s (tˆ, T ] and X Q . M ∈ tˆ ∈ i This process is progressively measurable with values in A and so βε [t, T ]. Due to ∈ A the flow property we may write that RHS of (4.1)

tˆ T βε,t,x ε βε ε βε βε,t,x 3ε + E f s, Xs , βs ds + f(s, Xs , βs ) ds + g XT Xs = x . ≤ 9 t tˆ : # # 4 5 F & ' F Finally taking supremum over all possible control strategies we see thatF RHS of (4.1) ≤ 3ε + v(t, x). Letting ε 0 completes the proof. → 1 Lemma 4.2 ( 2 -H¨older continuity of value function in time). Let Assumptions 3.9 m+1 and 3.11 hold. Let H . Then there is a constant C = CT,K,m > 0 such that A ⊆ T for any x Rd, 0 t, tˆ T we have ∈ ≤ ≤ v(t, x) v(tˆ, x) C(1 + x m) t tˆ 1/2 . | − | ≤ | | | − | Proof. Still needs to be written down.

m+1 Corollary 4.3. Let Assumptions 3.9 and 3.11 hold. Let HT . Then there is a d A ⊆ constant C = CT,K,m > 0 such that for any x, y R , 0 s, t T we have ∈ ≤ ≤ v(s, x) v(t, y) C(1 + x m + y m) t tˆ 1/2 + x y . | − | ≤ | | | | | − | | − | 4 5 This means that the value function v is jointly measurable in (t, x). With this we get the following.

Theorem 4.4 (Bellman’s principle / Dynamic programming with stopping time). For any stopping times t, tˆ such that 0 t tˆ T , for any x Rd, we have (4.1). ≤ ≤ ≤ ∈ The proof uses the same arguments as before except that now have to cover the whole [0, T ] Rd and we need to use the 1 -H¨older continuity in time as well. × 2 Corollary 4.5 (Global optimality implies optimality from any time). Take x Rd. A β,0,x ∈ control β [0, T ] is optimal for (P) with the state process X = Xs for s [0, T ] ∈ A s ∈ if and only if for any tˆ [0, T ] we have ∈ ˆ ˆ v(t, Xtˆ) = J t, Xtˆ, (βr)r [tˆ,T ] . ∈ 4 5 41 Proof. To ease the notation we will take f = 0. The reader is encouraged to prove the general case. Due to the Bellman principle, Theorem 4.4, we have

v(0, x) = sup v tˆ, Xα,0,x v tˆ, Xβ,0,x . (4.2) E tˆ E tˆ α [0,tˆ] ≥ ∈A ! 4 5" ! 4 5" If β is an optimal control

v(0, x) = J(0, x, β) = E [g (XT )] , where the first equality follows from β being assumed to be an optimal control and second equality is definition of J. From this, using the tower property of conditional expectation, we see

X v(0, x) = E E g (XT ) = E J tˆ, Xˆ, β E v tˆ, Xˆ v(0, x) , Ftˆ t ≤ t ≤ $ $ F %% $ & '% $ & '% where the last inequality is (F4.2) again. Since the very left and very right of these inequalities are equal we get that

ˆ ˆ E J t, Xtˆ, β = E v t, Xtˆ

Moreover v J and so we can$con&clude th'%at v tˆ$, X& = J'% tˆ, X , β a.s. The completes ≥ tˆ tˆ the first part of the proof. The “only if” part of the proof is clear because we can take & ' & ' tˆ= 0 and get v(0, x) = J(0, x, β) which means that β is an optimal control.

From this observation we can prove the following description of optimality.

Theorem 4.6 (Martingale optimality). Let the assumptions required for Bellman’s principle hold. For any (t, x) and α let ∈ A s t,x,α αr t,x,α t,x,α Ms := fr Xr dr + v s, Xs . (4.3) #t & ' & ' t,x,α X α,t,x Then the process (Ms )s [t,T ] is an s := σ(Xr ; t r s) super-martingale. ∈ F ≤ ≤ Moreover α is optimal if and only if it is a martingale.

Proof. We have by, Theorem 4.1 (the Bellman principle) that for any 0 t s sˆ ≤ ≤ ≤ ≤ T that

sˆ t,x,α t,x,α t,x,α αˆr s,Xs ,αˆ s,Xs ,αˆ α,t,x v s, Xs = sup E fr Xr dr + v sˆ, Xsˆ Xs . αˆ s ∈A (# F ) & ' & ' & 'F From the Markov property we get that F

sˆ t,x,α t,x,α t,x,α αˆr s,Xs ,αˆ s,Xs ,αˆ X v s, Xs = sup E fr Xr dr + v sˆ, Xsˆ s . αˆ s F ∈A (# F ) & ' & ' & 'F Hence F

sˆ t,x,α t,x,α t,x,α αr s,Xs ,α s,Xs ,α X v s, Xs E fr Xr dr + v sˆ, Xsˆ s ≥ s F (# F ) & ' & ' & 'F F

42 and so, using the flow property of the SDE,

s sˆ t,x,α αr t,x,α αr t,x,α t,x,α X Ms fr Xr dr + E fr Xr dr + v sˆ, Xsˆ s ≥ t s F # (# F ) t,x&,α X ' & ' & ' = E M . F sˆ |Fs F ! " This means that M t,x,α is a super-martingale. Moreover we see that if α is optimal then the inequalities above are equalities and hence M t,x,α is a martingale. t,x,α t,x,α X Now assume that Ms = E[M ]. We want to ascertain that the control α sˆ |Fs driving M t,x,α is an optimal one. But the martingale property implies that J(t, x, α) = t,x,α t,x,α E[MT ] = E[Mt ] = v(t, x) and so α is indeed an optimal control.

One question you may ask yourself is: How can we use the dynamic programming principle to compute an optimal control? Remember that the idea behind the DPP is that it is not necessary to optimize the control α over the entire time interval [0, T ] at once; we can partition the time interval into smaller sub-intervals and optimize over each individually. We will see below that this idea becomes particularly powerful if we let the partition size go to zero: the calculation of the optimal control then becomes a pointwise minimization linked to certain PDEs (see Theorem A.27). That is, for each fixed state x we compute the optimal value of control, say a A, to apply whenever ∈ X(t) = x.

4.2 Hamilton-Jacobi-Bellman (HJB) and verification

If the value function v = v(t, x) is smooth enough, then we can apply Itˆo’s formula to v and X in (4.3). Thus we get the Hamilton-Jacobi-Bellman (HJB) equation (also know and the Dynamic Programming equation or Bellman equation). For notational convenience we will write σa(t, x) := σ(t, x, a), ba(t, x) := b(t, x, a) and f a(t, x) := f(t, x, a). We then define

a 1 a a a L v := tr [σ (σ )∗∂ v] + b ∂ v . 2 xx x Recall that trace is the sum of all the elements on the diagonal of a square matrix i.e. ij d d ii for a matrix (a )i,j=1 we get tr[a] = i=1 a , that ∂xxv denotes the Jacobian matrix i.e. (∂xxv)ij = ∂ i ∂ j v whilst ∂xv denotes the gradient vector i.e. (∂xv)i = ∂ i v. This x x 3 x means that

d d a a a a ij a a i tr [σ (σ )∗∂xxv] = [σ (σ )∗] ∂xixj v and b ∂xv = (b ) ∂xi v . i8,j=1 8i=1 Theorem 4.7 (Hamilton-Jacobi-Bellman (HJB)). Assume that the Bellman principle to holds. Assume that b and σ are bounded and right-continuous in t uniformly in the x variable.8 If the value function v for (P) is C1,2([0, T ) Rd), then it satisfies × a a d ∂tv + sup L v + f = 0 on [0, T ) R a A × ∈ (4.4) 4 5 d v(T, x) = g(x) x R . ∀ ∈ 8This is to simplify our proof.

43 Proof. Let x R and t [0, T ]. Then the condition v(T, x) = g(x) follows directly ∈ ∈ from the definition of v. Fix α [t, T ] and let M be given by (4.3) i.e. ∈ A s αr t,x,α t,x,α Ms := fr Xr dr + v s, Xs . #t & ' t,x,α & ' Then, Itˆo’s formula applied to v and X = (Xs )s [t,T ] yields ∈ αs αs α,t,x αs α,t,x dMs = ∂tv + L v + f s, Xs ds + (∂xv σ ) s, Xs ) dWs. !4 5 " ! " For any (t, x) [0, T ] R take the&stopping'time τ = τ α,t,x gi&ven by ' ∈ × t′ αs α,t,x 2 τ := inf t′ t : (∂xv σ ) s, Xs ) ds 1 . ≥ t ≥ N # + We know from Theorem 4.6 that M must be a& superma'rtingale. On the other hand the term given by the stochastic integral is a martingale (when cosidered stopped at τ). So (Mt τ )t can only be a supermartingale if ∧ f αs (s, X ) + (∂ v + Lαs v)(s, X ) 0 . s t s ≤ Since the starting point (t, x) and control α were arbitrary we get9 (∂ v + Lav + f a)(t, x) 0 t, x, a . t ≤ ∀ Taking the supremum over a A we get ∈ a a ∂tv(t, x) + sup[(L v + f )(t, x)] 0 t, x . a A ≤ ∀ ∈ We now need to show that in fact the inequality cannot be strict. We proceed by setting up a contradiction. Assume that there is (t, x) such that a a ∂tv(t0, x0) + sup[(L v + f )(t, x)] < 0 . a A ∈ We will show that this contradicts the Bellman principle and hence we must have equality, thus completing the proof. Now by continuity (recall that v C1,2([0, T ) Rd) we get that there must be ε > 0 ∈ × and an associated δ > 0 such that a a ∂tv + sup[(L v + f )] ε < 0 on [t, t + δ) Bδ(x). a A ≤ − × ∈ t,x,α Let us fix α [t, T ] and let X := Xs . We define the stopping time ∈ A s τ = τ t,x,α := s > t : Xt,x,α x > δ (t + δ) . { | s − | } ∧ Since b and σ are assumed bounded and the process Xs has a.s. continuous sample paths there is m > 0 such that for all α we get E[τ t] > m. Then − τ αr f (r, Xr) dr + v(τ, Xτ ) t # τ αr = v(t, x) + f (r, Xr) dr + v(τ, Xτ ) v(t, x) t − # τ τ αr αr αr = v(t, x) + ∂tv + L v + f r, Xr dr + (∂xv) σ r, Xr dWr #t #t !4 τ 5& '" ! "& ' αr v(t, x) ε(τ t) + (∂xv) σ r, Xr dWr . ≤ − − t # ! " 9 & ' This is not a sufficient argument as the time integral “ignores” what happens at a single point. 1,2 a a Since v is assume to be C we have ∂tv + L v + f continuous as a function of (t, x) and so this can be overcome.

44 Now we take conditional expectation Et,x := E[ X ] on both sides of the last inequal- ·|Ft ity, to get

τ t,x αr t,x E f (r, Xr) dr + v(τ, Xτ ) v(t, x) εE [τ t] < v(t, x) εm . ≤ − − − (#t ) We now can take the supremum over all controls α [t, τ] to get ∈ A τ t,x αs sup E f (s, Xs)ds + v(τ, Xτ ) v(t, x) εm . α t ≤ − ∈A (# ) But the Bellman principle states that:

τ t,x αs v(t, x) = sup E f (s, Xs) ds + v(τ, Xτ ) . α t ∈U (# ) Hence we’ve obtained a contradiction and completed the proof.

Theorem 4.8 (HJB verification). If, on the other hand, some u in C1,2([0, T ) Rd) × satisfies (4.4) and we have that for all (t, x) [0, T ] Rd there is some measurable ∈ × function a : [0, T ] Rd A such that × → a(t, x) arg max (Lau)(t, x) + f a(t, x) , (4.5) ∈ a A ∈ 4 5 and if

dXs∗ = b s, Xs∗, a(s, Xs∗ ds + σ s, Xs∗, a(s, Xs∗ dWs, Xt∗ = x admits a unique solu&tion, and if th'e proces&s '

t′ t′ ∂ u s, X∗ σ s, X∗, a(s, X∗ dW (4.6) → x s s s s #t & ' & ' is a martingale in t [t, T ], then ′ ∈

α∗ := a s, X∗ s [t, T ] s s ∈ is optimal for problem (P) and v(t, x&) = u('t, x).

Proof. Let αs∗ = a(s, Xs∗). Apply Itˆo’s formula to u and X∗ to see that

T T α α f s∗ X∗ ds + g(X∗ ) u(t, x) = f s∗ X∗ ds + u(T, X∗ ) u(t, x) s s T − s s T − #t #t T & ' & ' αs∗ αs∗ = ∂tu(s, Xs∗) + L (s, Xs∗)u(s, Xs∗) + fs Xs∗ ds #t ! T & '" + ∂xu s, Xs∗ σ s, Xs∗, a(s, Xs∗ dWs #t T & ' & ' = ∂xu s, Xs∗ σ s, Xs∗, a(s, Xs∗ dWs , #t & ' & ' since for all (t, x) it holds that

sup[La(t, x)u(t, x) + f a(t, x)] = La(t,x)(t, x)u(t, x) + f a(t,x)(t, x) . a A ∈ 45 Hence, as the stochastic integral is a martingale by assumption,

T α E f s∗ X∗ ds + g(X∗ ) u(t, x) = 0 . s s T − ( #t ) & ' So

T T αs∗ αs t,x,α t,x,α u(t, x) = E fs Xs∗ ds+g(XT∗ ) sup E fs X ds+g(X ) = v(t, x) . t ≤ α t ( # ) ∈A ( # ) & ' & ' (4.7) The same calculation with an arbitrary α and Itˆo formula applied to u and Xt,x,α ∈ A leads to T α t,x,α t,x,α E f s X ds + g(X ) u(t, x) 0 . s s T − ≤ ( #t ) Hence for any ε > 0 we have & '

T αε t,x,αε t,x,αε v(t, x) ε + E f s X ds + g(X ) u(t, x) . ≤ s s T ≤ (#t ) & ' Hence v(t, x) u(t, x) and with (4.7) we can conclude that v = u. ≤ Let s αr∗ Ms := fr Xr∗ dr + u s, Xs∗ . #t We would first like to see that this is a& ma'rtingale&. To t'hat end, let us apply Itˆo’s formula to v and X∗ to see that

αs∗ dMs = fs Xs∗ ds + dv s, Xs∗

αs∗ αs∗ = ∂tv&(s, X' s∗) + L &(s, Xs'∗)v(s, Xs∗) + fs Xs∗ ds + ∂xv s, Xs∗ σ s, Xs∗, a(s, Xs∗ dWs ! " = ∂xv s, Xs∗ σ s, Xs∗, a(s, Xs∗ dWs & ' & ' & ' since v = &u satis'fie&s (4.4). By a'ssumption this stochastic integral is a martingale and hence M is also a martingale. By Theorem 4.6 α∗ must be an optimal control process.

Theorem 4.8 is referred as the verification theorem. This is key for solving the control problem: if we know the value function v, then the dynamic optimization problem turns into a of static optimization problems at each point (t, x). Recall that (4.5) is calculated pointwise over (t, x).

Exercise 4.9. Find the HJB equation for the following problem. Let d = 1, U = 1 [σ0, σ ] (0, ), and k R. The dynamics of X are given by ⊂ ∞ ∈ α dXs α = k ds + αs dWs, Xs and the value function is

k(t T ) α,t,x k(t T ) α,t,x v(t, x) = sup E[e − g XT ] = inf E[ e − g XT ]. − α [t,T ] − α [t,T ] ∈A ∈A & ' & ' This can be interpreted as the pricing equation for an uncertain volatility model with constant interest rate k. The equation is called Black–Scholes–Barenblatt equation and the usual way to present this problem is through a maximization problem.

46 4.3 Solving control problems using the HJB equation and verifica- tion theorem

Theorem 4.7 provides an approach to find optimal solutions:

1. Solve the HJB equation (4.4) (this is typically done by taking a lucky guess and in fact is rarely possible with pen and paper).

2. Find the optimal Markovian control rule a(t, x) calculating (4.5). If you can, use calculus and anyway you probably had to do this in the step above anyway.

3. Check the optimally controlled SDE X∗ has unique solution. 4. Verify the martingale condition.

This approach may end up with failures. Step one is to solve a fully non-linear second order PDE, that may not have a solution or may have many solutions. In step two, given u that solves (4.4), the problem is a static optimization problem. This is generally much easier to solve. If we can reach step three, then this step heavily depends on functions b and σ, for which we usually check case by case.

Example 4.10 (Merton problem with power utility and no consumption). This is the classic finance application. The problem can be considered with multiple risky assets but we focus on the situation from Section 1.2.

Recall that we have risk-free asset Bt, risky asset St and that our portfolio has wealth given by

dX = X (ν (µ r) + r) ds + X ν σ dW , s [t, T ] , X = x > 0 . s s s − s s s ∈ t

Here νs is the control and it describes the fraction of our wealth invested in the risky asset. This can be negative (we short the stock) and it can be more than one (we borrow money from the bank and invest more than we have in the stock). We take g(x) := xγ with γ (0, 1) a constant. Our aim is to maximize J ν(t, x) := t,x ν ∈ E [g(XT )]. Thus our value function is

ν t,x ν v(t, x) = sup J (t, x) = sup E [g(XT )] . ν ν ∈U ∈U This should satisfy the HJB equation (Bellman PDE)

1 ∂ v + sup σ2u2x2∂ v + x[(µ r)u + r]∂ v = 0 on [0, T ) (0, ) t 2 xx − x × ∞ u ( ) v(T, x) = g(x) = xγ x > 0 . ∀ At this point our best chance is to guess what form the solution may have. We try v(t, x) = λ(t)xγ with λ = λ(t) > 0 differentiable and λ(T ) = 1. This way at least the terminal condition holds. If this is indeed a solution then (using it in HJB) we have

1 2 2 λ′(t) + sup σ u γ(γ 1) + (µ r)γu + rγ λ(t) = 0 t [0, T ) , λ(T ) = 1 . 2 − − ∀ ∈ u ( )

47 since xγ > 0 for x > 0 and thus we were allowed to divide by this. Moreover we can calculate the supremum by observing that it is quadratic in u with negative leading µ r term (γ 1)γ < 0. Thus it is maximized when u∗ = σ2(1− γ) . The maximum itself is − −

1 2 2 β(t) := σ (u∗) γ(γ 1) + (µ r)γu∗ + rγ . 2 − − Thus T λ′(t) = β(t)λ(t) , λ(T ) = 1 = λ(t) = exp β(s) ds . − ⇒ 0#t 1 Thus we think that the value function and the optimal control are

T γ µ r v(t, x) = exp β(s) ds x and u∗ = − . σ2(1 γ) 0#t 1 − This now needs to be verified using Theorem 4.8. First we note that the SDE for X∗ always has a solution if u∗ is a constant. γ 1 Next we note that ∂xv(s, Xs∗) = γλ(s)(Xs∗) − . From Itˆo’s formula

γ 1 γ 2 1 γ 3 dX − = (γ 1)X − dX + (γ 1)(γ 2)X − dX dX s − s s 2 − − s s s γ 1 1 = X − (γ 1)[u∗(µ r) + r] ds + (γ 1)(γ 2)u∗σ dW . s − − 2 − − s ( ) We can either solve this (like for geometric brownian motion) or appeal to Proposi- tion 3.6 to see that a solution will have all moments uniformly bounded in time on [0, T ]. Moreover λ = λ(t) is continuous on [0, T ] and thus bounded and so

T 2 γ 1 2 E λ (t) (X∗) − ds < | s | ∞ #0 $ % which means that the required expression is a true martingale. This completes veri- fication and Theorem 4.8 gives the conclusion that v is indeed the value function and u∗ is indeed the optimal control. Example 4.11 (Linear-quadratic control problem). This example is a classic engin- eering application. Note that it can be considered in multiple spatial dimensions but here we focus on the one-dimensional case for simplicity. The multi-dimensional ver- sion is solved using HJB e.g. in [13, Ch. 11] and we will solve the multi-dimensional version later using Pontryagin optimality principle, see Example 5.11. We consider

dX = [H(s)X + M(s)α ] ds + σ(s)dW , s [t, T ] , X = x . s s s s ∈ t Our aim is to maximize T α t,x 2 2 2 J (t, x) := E (C(s)Xs + D(s)αs) ds + RXT , (#t ) where C = C(t) 0, R 0 and D = D(t), are given and deterministic and bounded ≤ ≤ in t s.t. for some δ > 0 we have D(t) + δ < 0 for all t. The interpretation is the following: since we are losing money at rate C proportionally to X2, our aim is to make X2 as small as possible as fast as we can. However controlling X costs us at a rate D proportionally to the strength of control we apply.

48 α The value function is v(t, x) := supα J (t, x). Let us write down the Bellman PDE (HJB equation) we would expect the value function to satisfy:

1 2 2 2 2 ∂tv + sup σ ∂ v + [H x + M a]∂xv + C x + D a = 0 on [0, T ) R , 2 x × a ( ) 2 v(T, x) = Rx x R . ∀ ∈

Since the terminal condition is g(x) = Rx2 let us try v(t, x) = S(t)x2 + b(t) for some differentiable S and b. We re-write the HJB equation in terms of S and b: (omitting time dependence in H, M, σ, C and D), for (t, x) [0, T ) R, ∈ × 2 2 2 2 2 S′(t)x + b′(t) + σ S(t) + 2H S(t) x + C x + sup 2M a S(t) x + D a = 0 , a $S(T ) = R and b(T%) = 0 .

2 For fixed t and x we can calculate supa[2M(t)aS(t)x + D(t)a ] and hence write down the optimal control function a∗ = a∗(t, x). Indeed since D < 0 and since the ex- pression is quadratic in a we know that the maximum is reached with a∗(t, x) = (D 1M S)(t)x. − − We substitute a∗ back in to obtain ODEs for S = S(t) and b = b(t) from the HJB equation. Hence

1 2 2 2 2 S′(t) + 2H S(t) + C D− M S (t) x + b′(t) + σ S(t) = 0 , − $ S% (T ) = R and b(T ) = 0 . We collect terms in x2 and terms independent of x and conclude that this can hold only if 1 2 2 S′(t) = D− M S (t) 2H S(t) C , S(T ) = R − − and 2 b′(t) = σ S(t) , b(T ) = 0 . − The ODE for S is the Riccati equation which has unique solution for S(T ) = R. We can obtain the expression for b = b(t) by simply integrating:

T b(T ) b(t) = σ2(r)S(r) dr . − − #t Then

T 1 2 2 α∗(t, x) = (D− MS)(t)x and v(t, x) = S(t)x + σ (r)S(r) dr (4.8) − #t and we see that the control function is measurable. We will now check conditions of Theorem 4.8. The SDE with the optimal control is

dX∗ = ρ(s)X∗ ds + σ(s)dW , s [t, T ] , X∗ = x , s s s ∈ t 1 2 where ρ := H + D− M S. This is deterministic and bounded in time. The SDE thus satisfies the Lipschitz conditions and it has a unique strong solution for any t, x.

49 2 Since ∂xv(r, Xr∗) = 2S(r)Xs∗, since supr [t,T ] S (r) is bounded (continuous function on ∈ 2 a closed interval) and since supr [t,T ] E[ Xr∗ ] < (moment estimate for SDEs with ∈ | | ∞ Lipschitz coefficients) we get

T 2 2 E S(r) X∗ dr < | | | s | ∞ #t and thus conclude that s s S(r)X σ(r) dW is a martingale. Thus Theorem 4.8 → t r∗ r tells us that the value function and control given by (4.8) are indeed optimal. 7

4.4 Policy Improvement Algorithm

As in the controlled Markov chain case (see Section 2.3) one can solve the control problem using the policy improvement algorithm, stated below.

Algorithm 1 Policy improvement algorithm: Initialisation: make a guess of the control a0 = a0(t, x). while difference between vn+1 and vn is large do Given a control an = an(t, x) solve the linear PDE

n 1 2 n an n an d ∂tv + tr(σσ⊤D v ) + b Dxv + f = 0 on [0, T ) R , 2 x × (4.9) n d v (T, ) = g on x R . · ∈

Update the control

n+1 a n a a (t, x) = arg max [(b Dxv + f ) (t, x)] . (4.10) a A ∈ end while return vn, an+1.

Of course to solve the linear PDE one would need to employ a numerical methods (e.g. finite differences). Convergence and rate of convergence of the policy improvement algorithm can be found e.g. in [9] and in references there.

4.5 Exercises

Exercise 4.12 (Optimal liquidation with no permanent market impact). Solve the optimal liquidation problem of Section 1.3 in the case λ = 0 (i.e. there is no permanent price impact of our trading on the market price). Exercise 4.13 (Unattainable optimizer). Here is a simple example in which no op- timal control exists, in a finite horizon setting, T (0, ). Suppose that the state ∈ ∞ equation is dXs = αs ds + dWs s [t, T ] , Xt = x R. ∈ ∈ A control α is admissible (α ) if: α takes values in R, is ( t)t [0,T ]-adapted, and T ∈ A F ∈ E α2 ds < . 0 s ∞ t,x,α 2 Le7t J(t, x, α) := E[ X ]. The value function is v(t, x) := infα J(t, x, α). Clearly | T | ∈A v(t, x) 0. ≥

50 t,x,α i) Show that for any t [0, T ], x R, α we have E[ X 2] < . ∈ ∈ ∈ A | T | ∞ ii) Show that if α := cX for some constant c (0, ) then α and t − t ∈ ∞ ∈ A 2 α cX 1 1 2cx 2c(T t) J (t, x) = J (t, x) = − e− − . 2c − 2c Hint: with such an α, the process X is an Ornstein-Uhlenbeck process, see Exer- cise 1.16.

iii) Conclude that v(t, x) = 0 for all t [0, T ), x R. ∈ ∈ iv) Show that there is no α such that J(t, x, α) = 0. Hint: Suppose that there ∈ A is such a α and show that this leads to a contradiction.

v) The associated HJB equation is 1 ∂tv + inf ∂xxv + a∂xv = 0, on [0, T ) R. a R 2 × ∈ N v(T, x+) = x2 .

Show that there is no value α R for which the infimum is attained. ∈

Conclusions from Exercise 4.13: The value function v(t, x) = infα J(t, x, α) sat- ∈A isfies v(t, x) = 0 for all (t, x) [0, T ] R but there is no admissible control α which ∈ × attains the v (i.e. there is no α such that v(t, x) = J(t, x, α )). ∗ ∈ A ∗ The goal in this problem is to bring the state process as close as possible to zero at the terminal time T . However, as defined above, there is no cost of actually controlling the system. We can set α arbitrarily large without any negative consequences. From a modelling standpoint, there is often a trade-off between costs incurred in applying control and our overall objective. Compare this with Example 4.11. Exercise 4.14 (Merton problem with exponential utility and no consumption). We return to the portfolio optimization problem, see Section 1.2. Unlike in Example 4.10 we consider the utility function g(x) := e γx, γ > 0 a constant. We will also take − − r = 0 for simplicity and assume there is no consumption (C = 0). With Xt denoting the wealth at time time t we have the value function given by

π,t,x, v(t, x) = sup E g XT . π ∈U ! 4 5" i) Write down the expression for the wealth process in terms of π, the amount of wealth invested in the risky asset and with r = 0, C = 0.

ii) Write down the HJB equation associated to the optimal control problem. Solve the HJB equation by inspecting the terminal condition and thus suggesting a possible form for the solution. Write down the optimal control explicitly.

iii) Use verification theorem to show that the solution and control obtained in previ- ous step are indeed the value function and optimal control. Exercise 4.15 ([16]*p252, Prob. 4.8). Solve the problem

T X(t) 2 e− X(T ) max E ν (t) dt + e , ν − 2 #0 $ % 51 X(t) where ν takes values in R, subject to dX(t) = ν(t)e− dt+σ dW (t), X(0) = x0 R, ∈ σ (0, ), σ, x are fixed numbers. ∈ ∞ 0 Hint: Try a solution of the HJB equation of the form v(t, x) = φ(t)ex + ψ(t).

For more exercises, see [16, Exercise 4.13, 4.14, 4.15].

4.6 Solutions to Exercises

Solution (to Exercise 4.12). From Theorem 4.7 we can write down the HJB equation for V = V (t, S, q): 1 2 ∂Vt + σ ∂SS V + sup (S κa)a a∂qV = 0 on [0, T ) R R , (4.11) 2 a A { − − } × × ∈ with the terminal condition

2 V (T, q, S) = qS θq (q, S) R R . (4.12) − ∀ ∈ × Next we note that

2 S ∂qV a (S ∂qV )a κa attains its maximum with a∗ = − . → − − 2κ Hence the HJB equation (4.11) becomes

1 2 1 2 ∂Vt + σ ∂SS V + (S ∂qV ) = 0 on [0, T ) R R . (4.13) 2 4κ − × × We now have to “guess” an ansatz for V and, observing the similarities here with the linear-quadratic case of Example 4.11, we try V (t, q, S) = β(t)qS + γ(t)q2 . With β(T ) = 1 and γ(T ) = θ we have the terminal condition (4.12) satisfied. To proceed we − calculate the partial derivatives of V and substitute those into the HJB (4.13) to obtain

2 1 2 β′(t)qS + γ′(t)q + S β(t)S + 2γ(t)q = 0 (t, q, S) [0, T ) R R . (4.14) 4κ − ∀ ∈ × × This is equivalently ! "

2 β′(t)qS + γ′(t)q

1 2 + S2 2β(t)S2 + 2γ(t)qS + β(t)2S2 4β(t)γ(t)qS + 2γ(t)qS + 4γ(t)2q2 4κ − − =0 (t!, q, S) [0, T ) R R . " ∀ ∈ × × This has to hold for all S2, q2 and qS. Starting with S2 terms we get that

1 2β(t) + β(t)2 = 0 t [0, T ) − ∀ ∈ which can only be true if β(t) = 1 (since β(T ) must be 1 and we need β differentiable). Considering now the qS term we have (β′(t) = 0 since we now have β(t) = 1):

2γ(t) 4γ(t) + 2γ(t) = 0 t [0, T ) − ∀ ∈ which holds regardless of choice of γ. Finally we have the q2 terms which lead to

1 2 γ′(t) + γ(t) = 0 t [0, T ) . κ ∀ ∈ We recall the terminal condition γ(T ) = θ and solve this ODE10 thus obtaining − 1 1 1 − γ(t) = + (T t) . − θ κ − $ % This fully determines the value function

V (t, q, S) = qS + γ(t)q2

10 1 1 2 You can for instance recall that if f(t) = then f ′(t) = and so f ′(t) = f(t) . Manipulating − t t2 expressions of this type can lead you to the correct solution.

52 and the optimal control 1 a∗(t, q, S) = γ(t)q . − κ We note that the optimal control is independent of S and in fact the entire control problem does not depend on the volatility parameter σ.

Solution (to Exercise 4.13).

T 2 2 2 2 i) We use the fact that E αr dr < for admissible control. We also use that (a+b) 2a +2b . 0 ∞ ≤ Then for, any s [t, T ], ∈ & s 2 2 2 2 E[Xs ] 4x + 4E αr dr + 2E(Ws Wt) . ≤ − $'t % With H¨older’s inequality we get

s T 2 2 1/2 2 2 2 E[Xs ] 4x + 4(s t) E αr dr + 2(s t) cT 1 + x + E αr dr < . (4.15) ≤ − − ≤ ∞ 't $ '0 %

ii) Substitute αs = cXs. The Ornstein-Uhlenbeck SDE, see Exercise 1.16, has solution − T c(T t) c(T t) XT = e− − x + e− − dWr . 't We square this, take expectation (noting that the integrand in the stochastic integral is determ- inistic and square integrable):

T 2 2 2c(T t) 2 c(T t) EXT = e− − x + E e− − dWr . $'t % With Itˆo’s isometry we get

T 2 2c(T t) 2 2c(T t) EXT = e− − x + e− − dr . 't α cX 2 Now we just need to integrate to obtain J (t, x) = J (t, x) = EXT . iii) We know that v(t, x) 0 already. Moreover ≥ 2 α cX 1 1 2cx 2c(T t) v(t, x) = inf J (t, x) lim J (t, x) = lim − e− − = 0 . α ≤ c c 2c − 2c ∈U ↗∞ ↗∞ ( )

α∗,t,x α iv) Assume that an optimal α∗ exists so that E[X ] = J ∗ (t, x) = 0 for any t < T and any ∈ A T x. We will show this leads to contradiction. First of all, we can calculate using Itˆo formula that

2 d X∗ = 2X∗α∗ ds + 2X∗ dWs + ds . | s | s s s Hence T T 2 2 0 = E[(XT∗ ) ] = x + 2E (Xs∗αs∗ + 1) ds + E Xs∗ dWs . 't 't T 2 But since α∗ is admissible we have E(Xs∗) ds < due to (4.15). This means that the t ∞ stochastic integral is a martingale and hence its expectation is zero. We now use Fatou’s lemma & and take the limit as t T . Then ↗ T T 2 x = 2 lim inf E (Xs∗αs∗ + 1) ds 2E lim inf (Xs∗αs∗ + 1) ds = 0 . − t T ≥ t T ↗ 't ( ↗ 't ) 2 So x 0. This cannot hold for all x R and so we have contradiction. − ≥ ∈ v) If ∂xv(t, x) = 0, then a = . If ∂xV (t, x) = 0, then a is undefined. One way or another there ∕ ±∞ is no real number attaining the infimum.

Solution (to Exercise 4.14). The wealth process (with the control expressed as π, the amount of wealth invested in the risky asset and with r = 0, C = 0), is given by

dXs = πsµ ds + πsσ dWs , s [t, T ] , Xt = x > 0 . (4.16) ∈

53 The associated HJB equation is

1 2 2 ∂tv + sup p σ ∂xxv + p µ ∂xv = 0 on [0, T ) R, p 2 × ∈R ( ) v(T, x) = g(x) x R . ∀ ∈

γx We make a guess that v(t, x) = λ(t)g(x) = λ(t)e− for some differentiable function λ = λ(t) 0. γx − ≥ Then, since we can divide by e− = 0 and since we can factor out the non-negative λ(t), the HJB − ∕ equation will hold provided that

1 2 2 2 λ′(t) + sup p σ γ + p µ γ λ(t) = 0 on [0, T ), λ(T ) = 1. p − 2 ∈R ( ) µ The supremum is attained for p∗ = σ2γ since the expression we are maximizing is quadratic in p with negative leading order term. Thus λ′(t) + β(t)λ(t) = 0 and λ(T ) = 1 with

2 1 2 2 2 1 µ β(t) := (p∗) σ γ + p∗ µ γ = µγ + . − 2 − 2 σ2 We can solve the ODE for λ to obtain

T β(r) dr λ(t) = e t ! and hence our candidate value function and control are

T β(r) dr µ t ∗ v(t, x) = e g(x) and p = 2 . ! σ γ We now need to use Theorem 4.8 to be able to confirm that these are indeed the value function and optimal control.

First of all the solution for optimal X∗ always exists since we just need to integrate in the expres- sion (4.16) taking πt := p∗. We note that the resulting process is Gaussian.

γXs∗ Now ∂xv(s, Xs∗) = λ(t) γe− . We can now use what we know about moment generating functions of normal random variables to conclude that

T 2 2γX λ(s) e− s∗ ds < . ∞ 't The process t¯ γXs∗ t¯ λ(s) e− dWs → 't is thus a true martingale and the verification is complete.

Solution (to Exercise 4.15).

2 σ 2 σ2T/2 ψ(t) = 0, φ(t) = , C = (1 + σ )e− . Ceσ2t/2 1 −

54 5 Pontryagin maximum principle and backward stochastic differential equations

In the previous part, we developed the dynamic programming theory for the stochastic control problem with Markovian system. We introduce another approach called Pontryagin optimality principle, originally due to Pontryagin in the deterministic case. We will also study this approach to study the control problem (P).

5.1 Non-rigorous Derivation of Pontryagin’s Maximum Principle

Consider the control problem

Maximize, over α the functional ∈ A T  α,0,x α,0,x J(α) := E f s, Xs , αs ds + g XT ,  0  (# ) (P )  α,0,x where X uniq&uely solves, 'for t [&0, T ], th'e controlled SDE  t ∈ t t X = x + b(s, X , α ) ds + σ(s, X , α ) dW .  t s s s s s  0 0  # #  Going back to what we know about calculus, if a maximum of a certain functional exists then it satisfies a “first order condition” which postulates that the derivative is zero “in every direction”. Let us try to derive such first order condition.

The simplest interesting case: f = 0, σ = 0, d = 1 and A R. So we just ⊆ have a deterministic problem with 1-dimensional state and 1-dimensional control. Let x R and α be fixed. Let β . Then ∈ ∈ A ∈ A

d α d α+ε(β α) J(α + ε(β α)) = (∂ g)(X ) X − . dε − x T dε T Fε=0 Fε=0 F F F F We see that to proceed we need toF calculate how the controlled pFrocess changes with d α+ε(β α) − a change of control. Let us write Vt := dε Xt ε=0. From the equation for the controlled process we see that F F t t V = (∂ b)(s, Xα, α )V ds + (∂ b)(s, Xα, α )(β α ) ds . t x s s s a s s s − s #0 #0 Note that the equation for V is affine and can be solved using an integrating factor11 so that t t V = exp (∂ b)(r, Xα, α ) dr (∂ b)(s, Xα, α )(β α ) ds . t x r r a s s s − s #0 0#s 1 Hence d T T J(α+ε(β α)) = (∂ g)(Xα) exp (∂ b)(r, Xα, α ) dr (∂ b)(s, Xα, α )(β α ) ds . dε − x T x r r a s s s− s Fε=0 #0 0#s 1 F 11 F This works in thFe 1-dimensional case here. In higher dimension there still is an integrating factor but it no longer has the explicit form written above.

55 To simplify notation let us introduce the “backward process”

T α α α Yt := (∂xg)(XT ) exp (∂xb)(r, Xr , αr) dr 0#t 1 and note that that the backward process satisfies an equation which we will refer to as the “backward equation”:

dY = (∂ b)(t, Xα, α )Y α dt , t [0, T ] , Y α = (∂ g)(Xα). t − x t t t ∈ T x T Furthermore let H(t, x, y, a) := b(t, x, a)y (we will refer to H as the “Hamiltonian”) α α α α so that (∂ab)(t, Xt , αt)Yt = (∂aH)(t, Xt , Yt , αt). Then

d T J(α + ε(β α)) = (∂ H)(s, Xα, Y α, α )(β α ) ds . (5.1) dε − a s s s s − s Fε=0 #0 F F If α is a (locally) optimal contFrol then for any other control β we have, for any ε > 0, that J(α + ε(β α)) J(α) − ≤ and so 1 0 lim J(α + ε(β α)) J(α) ≥ ε 0 ε − − → d 4 T 5 = J(α + ε(β α)) = (∂ H)(t, Xα, Y α, α )(β α ) dt . dε − a t t t t − t Fε=0 #0 F So F T F 0 (∂ H)(t, Xα, Y α, α )(β α ) dt . ≥ a t t t t − t #0 Finally from the definition of the derivative we get that for almost all t [0, T ] we ∈ have

1 α α α α α α lim H(t, Xt , Yt , αt+ε(βt αt)) H(t, Xt , Yt , αt) = (∂aH)(t, Xt , Yt , αt)(βt αt) . ε 0 ε − − − → 4 5 Hence there are ε > 0 (small) such that

H(t, Xα, Y α, α + ε(β α )) H(t, Xα, Y α, α ) . t t t t − t ≤ t t t From this we can conclude that any optimal control “locally maximizes the Hamilto- nian” and any optimal control, together with the forward and backward processes must satify

α α αt arg max H(t, Xt , Yt , a) , ∈ a A ∈  α α α (5.2) dXt = b(t, Xt , αt) dt , t [0, T ] , X0 = x ,  ∈ dY α = (∂ H)(t, Xα, Y α, α ) dt , t [0, T ] , Y α = (∂ g)(Xα) . t − x t t t ∈ T x T   Stochastic case but with f = 0 for simplicity. Now σ = 0 and we have processes ∕ in higher dimensions. Then

d α d α+ε(β α) J(α + ε(β α)) = E (∂xg)(X ) X − . dε − T dε T Fε=0 ( Fε=0) F F F F F F 56 As before, to proceed, we need to calculate how the SDE changes with a change of d α+ε(β α) − control. Let us write Vt := dε Xs ε=0. From the SDE we see that t F t α F α Vt = (∂xb)(s, Xs , αs)Vs ds + (∂ab)(s, Xs , αs)(βs αs) ds 0 0 − # t # t + (∂ σ)(s, Xα, α )V dW + (∂ σ)(s, Xα, α )(β α ) dW . x s s s s x s s s − s s #0 #0 α α Now we will work with a backward equation directly, setting YT = (∂xg)(XT ) as before. It’s not so clear what its dynamics should be so let it just be a general Itˆo process dY = U dt + Z dW , t [0, T ] , Y α = (∂ g)(Xα) . t t t t ∈ T x T Recall that we are interested in E YT VT . We see that T T $ % T E YT VT = E Yt dVt + Vt dYt + dYt dVt (#0 #0 #0 ) $ % T T α α = E Yt(∂xb)(t, Xt , αt)Vt dt + E Yt(∂ab)(t, Xt , αt)(βt αt) dt 0 0 − # T T # T α α + E VtUt dt + E Zt(∂xσ)(t, X , αt)Vt dt + E Zt(∂aσ)(t, X , αt)(βt αt) dt . t t − #0 #0 #0 (5.3) We would like to derive something that looks like (5.1) i.e. keep β α terms but avoid − V in the dynamics. To that end we choose U = (∂ b)(t, Xα, α )Y (∂ σ)(t, Xα, α )Z . t − x t t t − x t t t Then d J(α + ε(β α)) = E [YT VT ] dε − Fε=0 T F T αF α = E (∂ab)(t, X F, αt)Yt(βt αt) ds + E (∂aσ)(t, X , αt)Zt(βt αt) dt . t − t − #0 #0 So if we let the Hamiltonian to be H(t, x, y, z, a) := b(t, x, a)y + σ(t, x, a)z then T d α α α J(α + ε(β α)) = E (∂aH)(t, X , Y , Z , αt)(βt αt) dt . (5.4) dε − t t t − Fε=0 #0 F F Arguing as before, see thFe argument going from (5.1) and (5.2), we arrive at α α α αt arg max H(t, Xt , Yt , Zt , a) , ∈ a A ∈  α α α α (5.5) dXt = b(t, Xt , αt) dt + σ(t, Xt , αt) dWt , t [0, T ] , X0 = x ,  ∈ dY α = (∂ H)(t, Xα, Y α, α ) dt + Zα dW , t [0, T ] , Y α = (∂ g)(Xα) . t − x t t t t t ∈ T x T  You may ask why not take Z = 0, since then (5.3) would be simplified. We would have been able to only keep β α terms as we wished. But without Z we would have − no reason to hope that (Yt)t [0,T ] is t-adapted since we are specifying a terminal ∈ F condition. This in turn would mean that α is not -adapted which would render the Ft optimality criteria useless.

57 5.2 Deriving a Numerical Method from Pontryiagin’s maximum prin- ciple

Imagine we have chosen a control α and solved the corresponding forward and ∈ A backward equations so that we have Xα and (Y α, Zα). How would we go about finding a control that’s “better”? Looking at (5.4) we would say that a better control would make the derivative positive (so that we change the control in a direction of the maximum): this means we will choose β such that β α = γ(∂ H)(t, Xα, Y α, Ztα, α ) t − t a t t t for some γ > 0 because then T d α α α 2 J(α + ε(β α)) = γE (∂aH)(t, X , Y , Z , αt) dt 0 . dε − | t t t | ≥ Fε=0 #0 F Of course a full algorithm wFould need to solve the forward SDE (easy) and the back- F ward SDE (not so easy) and carry out the update step repeatedly. In the deterministic case solving the forward and backward ODEs is easy and the method is known as “method of successive approximations” (MSA). It can be shown that a modification of this algorithm converges under appropriate conditions, see [10].

5.3 Backward Stochastic Differential Equations (BSDEs)

For a deterministic differential equation
\[
\frac{dx(t)}{dt} = b(t, x(t))\,,\quad t\in[0,T]\,,\qquad x(T) = a\,,
\]
we can reverse the time by changing variables. Let $\tau := T - t$ and $y(\tau) := x(t)$. Then we have
\[
\frac{dy(\tau)}{d\tau} = -b(T-\tau, y(\tau))\,,\quad \tau\in[0,T]\,,\qquad y(0) = a\,.
\]
So the backward ODE is equivalent to a forward ODE. The same argument would fail for SDEs, since the time-reversed SDE would not be adapted to the appropriate filtration and the stochastic integrals would not be well defined.

Recall the martingale representation theorem (see Theorem A.24), which says that any $\xi \in L^2(\mathcal{F}_T)$ can be uniquely represented as
\[
\xi = \mathbb{E}[\xi] + \int_0^T \phi_t\, dW_t\,.
\]
If we define $M_t := \mathbb{E}[\xi] + \int_0^t \phi_s\, dW_s$, then $M_t$ satisfies

\[
dM_t = \phi_t\, dW_t\,,\qquad M_T = \xi\,.
\]
This leads to the idea that a solution to a backward SDE must consist of two processes (in the case above, $M$ and $\phi$). Consider the backward SDE (BSDE)

\[
dY_t = g_t(Y_t, Z_t)\, dt + Z_t\, dW_t\,,\qquad Y_T = \xi\,.
\]
We shall give a few examples where this has an explicit solution.
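For a concrete warm-up (a small illustration of the martingale representation at work, assuming $W$ is real valued): take $\xi = W_T^2$. By Itô's formula $d(W_t^2) = 2W_t\, dW_t + dt$, so
\[
\xi = W_T^2 = T + \int_0^T 2W_t\, dW_t\,,
\]
i.e. $\mathbb{E}[\xi] = T$ and $\phi_t = 2W_t$. In the language of the BSDE above with $g = 0$ (cf. Example 5.1 below), the solution pair is $Y_t = \mathbb{E}[W_T^2\,|\,\mathcal{F}_t] = W_t^2 + (T-t)$ and $Z_t = 2W_t$.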

Example 5.1. Assume that $g = 0$. In this case $Y_t = \mathbb{E}[\xi\,|\,\mathcal{F}_t]$ and $Z$ is the process given by the martingale representation theorem.

Example 5.2. Assume that $g_t(y,z) = \gamma_t$. In this case, take $\hat\xi := \xi - \int_0^T \gamma_t\, dt$. We get the solution $(\hat Y, \hat Z)$ to
\[
d\hat Y_t = \hat Z_t\, dW_t\,,\qquad \hat Y_T = \hat\xi\,,
\]
as
\[
\hat Y_t = \mathbb{E}\big[\hat\xi\,\big|\,\mathcal{F}_t\big] = \mathbb{E}\Big[\xi - \int_0^T \gamma_t\, dt\,\Big|\,\mathcal{F}_t\Big]
\]
and we get $\hat Z$ from the martingale representation theorem. Then with $Y_t := \hat Y_t + \int_0^t \gamma_s\, ds$, $Z_t := \hat Z_t$ we have a solution $(Y,Z)$, so in particular
\[
Y_t = \mathbb{E}\Big[\xi - \int_0^T \gamma_t\, dt\,\Big|\,\mathcal{F}_t\Big] + \int_0^t \gamma_s\, ds = \mathbb{E}\Big[\xi - \int_t^T \gamma_s\, ds\,\Big|\,\mathcal{F}_t\Big]\,.
\]

Example 5.3. Assume that $g_t(y,z) = \alpha_t y + \beta_t z + \gamma_t$, where $\alpha = (\alpha_t)$, $\beta = (\beta_t)$, $\gamma = (\gamma_t)$ are real-valued adapted processes that satisfy certain integrability conditions (these will become clear). Assume that $W$ is only real valued.$^{12}$ We will construct a solution using an exponential transform and a change of measure. Consider a new measure $\mathbb{Q}$ given by the Radon–Nikodym derivative
\[
\frac{d\mathbb{Q}}{d\mathbb{P}} = \exp\Big( -\frac12 \int_0^T \beta_s^2\, ds - \int_0^T \beta_s\, dW_s \Big)
\]
and assume that $\mathbb{E}\big[\tfrac{d\mathbb{Q}}{d\mathbb{P}}\big] = 1$. Then, due to Girsanov's Theorem A.23, the process given by $W^{\mathbb{Q}}_t = W_t + \int_0^t \beta_s\, ds$ is a $\mathbb{Q}$-Wiener process. Consider the BSDE
\[
d\bar Y_t = \bar\gamma_t\, dt + \bar Z_t\, dW^{\mathbb{Q}}_t\,,\qquad \bar Y_T = \bar\xi\,,
\tag{5.6}
\]
where $\bar\gamma_t := \gamma_t \exp\big(-\int_0^t \alpha_s\, ds\big)$ and $\bar\xi := \xi \exp\big(-\int_0^T \alpha_s\, ds\big)$. We know from Example 5.2 that this BSDE has a solution $(\bar Y, \bar Z)$ and in fact we know that

\[
\bar Y_t = \mathbb{E}^{\mathbb{Q}}\Big[ \xi e^{-\int_0^T \alpha_s\, ds} - \int_t^T \gamma_s e^{-\int_0^s \alpha_r\, dr}\, ds\,\Big|\,\mathcal{F}_t\Big]\,.
\]
We let $Y_t := \bar Y_t \exp\big(\int_0^t \alpha_s\, ds\big)$ and $Z_t := \bar Z_t \exp\big(\int_0^t \alpha_s\, ds\big)$. Now using the Itô product rule with (5.6) and the equation for $W^{\mathbb{Q}}$ we can check that
\[
dY_t = d\Big( \bar Y_t e^{\int_0^t \alpha_s\, ds}\Big) = \alpha_t Y_t\, dt + e^{\int_0^t \alpha_s\, ds}\, d\bar Y_t = \alpha_t Y_t\, dt + \gamma_t\, dt + Z_t\, dW^{\mathbb{Q}}_t = (\alpha_t Y_t + \beta_t Z_t + \gamma_t)\, dt + Z_t\, dW_t
\]
and moreover $Y_T = \xi$. In particular we get

\[
Y_t = \mathbb{E}^{\mathbb{Q}}\Big[ \xi e^{-\int_t^T \alpha_s\, ds} - \int_t^T \gamma_s e^{-\int_t^s \alpha_r\, dr}\, ds\,\Big|\,\mathcal{F}_t\Big]\,.
\tag{5.7}
\]
To get the solution as an expression in the original measure we need to use the Bayes formula for conditional expectation, see Proposition A.44. We obtain

\[
Y_t = \frac{\mathbb{E}\Big[ \Big(\xi e^{-\int_t^T \alpha_s\, ds} - \int_t^T \gamma_s e^{-\int_t^s \alpha_r\, dr}\, ds\Big)\, e^{-\frac12 \int_0^T \beta_s^2\, ds - \int_0^T \beta_s\, dW_s}\,\Big|\,\mathcal{F}_t\Big]}{\mathbb{E}\Big[ e^{-\frac12 \int_0^T \beta_s^2\, ds - \int_0^T \beta_s\, dW_s}\,\Big|\,\mathcal{F}_t\Big]}\,.
\]

$^{12}$ It is possible to find a solution in higher dimensions but the integrating factor doesn't have this nice explicit form and we cannot apply Girsanov's theorem in the same way. See [7, Section 3].
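As a sanity check of (5.7) and of the change-of-measure representation above, here is a small Monte Carlo sketch with constant coefficients $\alpha_t \equiv a$, $\beta_t \equiv b$, $\gamma_t \equiv c$ (the coefficient values and the choice $\xi = W_T$ are hypothetical, purely for illustration).

\begin{verbatim}
import numpy as np

# Monte Carlo check of the linear BSDE at t = 0 (illustrative parameters):
# driver g_t(y, z) = a*y + b*z + c, terminal condition xi = W_T.
# By (5.7), Y_0 = E^Q[ xi ] e^{-aT} - c (1 - e^{-aT}) / a.
rng = np.random.default_rng(0)
a, b, c, T = 0.3, 0.5, 0.2, 1.0
n = 10**6

# Closed form: under Q we have W_t = W^Q_t - b t, hence E^Q[xi] = -b T.
y0_exact = -b * T * np.exp(-a * T) - c * (1.0 - np.exp(-a * T)) / a

# (i) Sample directly under Q: W^Q_T ~ N(0, T) and xi = W_T = W^Q_T - b T.
wq = rng.normal(0.0, np.sqrt(T), n)
y0_under_Q = np.mean((wq - b * T) * np.exp(-a * T)) - c * (1.0 - np.exp(-a * T)) / a

# (ii) Sample under P and reweight with dQ/dP = exp(-0.5 b^2 T - b W_T)
#      (the Bayes formula; at t = 0 the denominator is E[dQ/dP] = 1).
w = rng.normal(0.0, np.sqrt(T), n)
weights = np.exp(-0.5 * b**2 * T - b * w)
y0_under_P = np.mean((w * np.exp(-a * T) - c * (1.0 - np.exp(-a * T)) / a) * weights)

print(y0_exact, y0_under_Q, y0_under_P)   # all three should agree to Monte Carlo error
\end{verbatim}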

Proposition 5.4 (Boundedness of solutions to linear BSDEs). Consider the linear backward SDE with $g_t(y,z) = \alpha_t y + \beta_t z + \gamma_t$. If $\alpha$, $\beta$, $\gamma$ and $\xi$ are all bounded then the process $Y$ in the solution pair $(Y,Z)$ is bounded.

Proof. This is left as an exercise.

Example 5.5 (BSDE and replication in the Black–Scholes market). In a standard Black–Scholes market model we have a risk-free asset $dB_t = r B_t\, dt$ and risky assets

\[
dS_t = \operatorname{diag}(S_t)\big(\mu\, dt + \sigma\, dW_t\big)\,.
\]

Here $\mu$ is the (column) vector of drift rates of the risky assets and $\sigma$ is the volatility matrix. Let $\pi$ denote the cash amounts invested in the risky assets and $Y$ the replicating portfolio value (so $Y - \sum_{i=1}^m \pi^{(i)}$ is invested in the risk-free asset). Then the self-financing property says that (interpreting $1/S$ as $\operatorname{diag}(1/S^1, \ldots, 1/S^m)$)
\[
dY_t = \pi_t^\top \operatorname{diag}\Big(\frac{1}{S_t}\Big)\, dS_t + \frac{Y_t - \sum_{i=1}^m \pi^{(i)}_t}{B_t}\, dB_t\,,
\]
i.e.

\[
dY_t = \big( r Y_t + \pi_t^\top(\mu - r)\big)\, dt + \pi_t^\top \sigma\, dW_t\,.
\]

We can define $Z_t = \pi_t^\top\sigma$ and, if $\sigma^{-1}$ exists, then $\pi_t^\top = Z_t\sigma^{-1}$ and so
\[
dY_t = \big( r Y_t + Z_t \sigma^{-1}(\mu - r)\big)\, dt + Z_t\, dW_t\,.
\]
For any payoff $\xi$ at time $T$, the replication problem is to solve the BSDE given by these dynamics coupled with $Y_T = \xi$. If $\xi \in L^2(\mathcal{F}_T)$ the equation admits a unique square-integrable solution $(Y,Z)$. Hence the cash amount invested in the risky assets required in the replicating portfolio is $\pi_t^\top = Z_t\sigma^{-1}$, and the replication cost (contingent claim price) at time $t$ is $Y_t$. We see that this is a BSDE with a linear driver and so from Example 5.3, see (5.7), we have
\[
Y_t = \mathbb{E}^{\mathbb{Q}}\big[ \xi e^{-r(T-t)}\,\big|\,\mathcal{F}_t\big]\,,
\]
where
\[
\frac{d\mathbb{Q}}{d\mathbb{P}} = e^{-\frac12 |\sigma^{-1}(\mu - r)|^2 T - (\mu - r)^\top(\sigma^{-1})^\top W_T}\,.
\]
In other words, we see that $\mathbb{Q}$ is the usual risk-neutral measure we get in Black–Scholes pricing.
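To illustrate the last formula numerically, here is a small Monte Carlo sketch in one dimension with a call payoff $\xi = (S_T - K)^+$; all parameter values are hypothetical, and the comparison is against the classical Black–Scholes call price.

\begin{verbatim}
import numpy as np
from math import erf, sqrt, log, exp

# Monte Carlo evaluation of the replication price Y_0 = E^Q[ xi e^{-rT} ]
# for a call payoff xi = (S_T - K)^+ (illustrative parameter values).
rng = np.random.default_rng(1)
s0, K, r, sigma, T = 100.0, 100.0, 0.02, 0.2, 1.0
n = 10**6

# Under the risk-neutral measure Q the risky asset has drift r.
wq = rng.normal(0.0, sqrt(T), n)
s_T = s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * wq)
y0_mc = exp(-r * T) * np.mean(np.maximum(s_T - K, 0.0))

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Black-Scholes closed form for comparison.
d1 = (log(s0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)
y0_bs = s0 * Phi(d1) - K * exp(-r * T) * Phi(d2)

print(y0_mc, y0_bs)   # should agree up to Monte Carlo error
\end{verbatim}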

A standard backward SDE (BSDE) is formulated as

\[
dY_t = g_t(Y_t, Z_t)\, dt + Z_t\, dW_t\,,\qquad Y_T = \xi\,,
\tag{5.8}
\]
where $g = g_t(\omega, y, z)$ must be such that $g_t(y,z)$ is at least $\mathcal{F}_t$-measurable for any fixed $t, y, z$. We will refer to $g$ as the generator or driver of the backward SDE.

Definition 5.6. Given $\xi \in L^2(\mathcal{F}_T)$ and a generator $g$, a pair of $(\mathcal{F}_t)_{t\in[0,T]}$-adapted processes $(Y,Z)$ is called a solution of (5.8) if

\[
Y_t = \xi - \int_t^T g_s(Y_s, Z_s)\, ds - \int_t^T Z_s\, dW_s\,,\qquad \forall t\in[0,T]\,.
\]

Theorem 5.7 (Existence and uniqueness for BSDEs). Suppose $g = g_t(y,z)$ satisfies

(i) We have $g(0,0) \in \mathcal{H}$.

(ii) There exists a constant $L > 0$ such that

\[
|g_t(y,z) - g_t(\bar y,\bar z)| \le L\big( |y - \bar y| + |z - \bar z| \big)\,,\quad \text{a.s.}\ \forall t\in[0,T]\,,\ \forall y, z, \bar y, \bar z\,.
\]
Then for any $\xi \in L^2(\mathcal{F}_T)$, there exists a unique $(Y,Z) \in \mathcal{H}\times\mathcal{H}$ solving the BSDE (5.8).

Recall that $\mathcal{H}$ is the space introduced in Definition A.18.

Proof. We consider the map $\Phi = \Phi(U,V)$ for $(U,V)$ in $\mathcal{H}\times\mathcal{H}$. Given $(U,V)$ we define $(Y,Z) = \Phi(U,V)$ as follows. Let $\hat\xi := \xi - \int_0^T g_s(U_s, V_s)\, ds$. Then
\[
\begin{aligned}
\mathbb{E}\int_0^T |g_s(U_s,V_s)|^2\, ds &\le \mathbb{E}\int_0^T \big[ 2|g_s(U_s,V_s) - g_s(0,0)|^2 + 2|g_s(0,0)|^2 \big]\, ds\\
&\le \mathbb{E}\int_0^T \big[ 2L^2(|U_s| + |V_s|)^2 + 2|g_s(0,0)|^2\big]\, ds < \infty\,,
\end{aligned}
\tag{5.9}
\]
since $U$, $V$ and $g(0,0)$ are in $\mathcal{H}$. So $\hat\xi \in L^2(\mathcal{F}_T)$ and we know that for $\hat Y_t := \mathbb{E}[\hat\xi\,|\,\mathcal{F}_t]$ there is $Z$ such that
\[
d\hat Y_t = Z_t\, dW_t\,,\qquad \hat Y_T = \hat\xi\,.
\]
Take $Y_t := \hat Y_t + \int_0^t g_s(U_s, V_s)\, ds$. Then

\[
Y_t = \xi - \int_t^T g_s(U_s, V_s)\, ds - \int_t^T Z_s\, dW_s\,.
\tag{5.10}
\]
The next step is to show that $(U,V) \mapsto \Phi(U,V) = (Y,Z)$ described above is a contraction on an appropriate Banach space.

We will assume, for now, that $|\xi| \le N$ and that $|g| \le N$. We consider $(U,V)$ and $(U',V')$. From these we obtain $(Y,Z) = \Phi(U,V)$ and $(Y',Z') = \Phi(U',V')$. We will write

\[
(\bar U, \bar V) := (U - U', V - V')\,,\qquad (\bar Y, \bar Z) := (Y - Y', Z - Z')\,,\qquad \bar g := g(U,V) - g(U',V')\,.
\]
Then $d\bar Y_s = \bar g_s\, ds + \bar Z_s\, dW_s$ and with the Itô formula we see that

\[
d\bar Y_s^2 = 2\bar Y_s \bar g_s\, ds + 2\bar Y_s \bar Z_s\, dW_s + \bar Z_s^2\, ds\,.
\]

Hence, for some β > 0,

\[
d\big(e^{\beta s}\bar Y_s^2\big) = e^{\beta s}\big( 2\bar Y_s \bar g_s\, ds + 2\bar Y_s \bar Z_s\, dW_s + \bar Z_s^2\, ds + \beta \bar Y_s^2\, ds\big)\,.
\]

Noting that, due to (5.10), we have $\bar Y_T = Y_T - Y'_T = 0$, we get
\[
0 = \bar Y_0^2 + \int_0^T e^{\beta s}\big( 2\bar Y_s \bar g_s + \bar Z_s^2 + \beta \bar Y_s^2\big)\, ds + \int_0^T 2 e^{\beta s} \bar Y_s \bar Z_s\, dW_s\,.
\]

Since $\bar Z \in \mathcal{H}$ we have
\[
\mathbb{E}\int_0^T 4 e^{2\beta s} |\bar Y_s|^2 |\bar Z_s|^2\, ds \le e^{2\beta T}\, 4 N^2 (1+T)^2\, \mathbb{E}\int_0^T |\bar Z_s|^2\, ds < \infty
\]
and so, the stochastic integral being a martingale, we get

\[
\mathbb{E}\int_0^T e^{\beta s}\big( \bar Z_s^2 + \beta \bar Y_s^2\big)\, ds = -\mathbb{E}\bar Y_0^2 - \mathbb{E}\int_0^T e^{\beta s}\, 2\bar Y_s\bar g_s\, ds \le 2\,\mathbb{E}\int_0^T e^{\beta s} |\bar Y_s||\bar g_s|\, ds\,.
\]
Using the Lipschitz continuity of $g$ and Young's inequality (with $\varepsilon = 1/4$) we have
\[
e^{\beta s}|\bar Y_s||\bar g_s| \le e^{\beta s}|\bar Y_s|\, L(|\bar U_s| + |\bar V_s|) \le 2L^2 e^{\beta s}|\bar Y_s|^2 + \tfrac18\, e^{\beta s}(|\bar U_s| + |\bar V_s|)^2 \le 2L^2 e^{\beta s}|\bar Y_s|^2 + \tfrac14\, e^{\beta s}(|\bar U_s|^2 + |\bar V_s|^2)\,.
\]
We can now take $\beta = 1 + 4L^2$ and we obtain

\[
\mathbb{E}\int_0^T e^{\beta s}\big( \bar Z_s^2 + \bar Y_s^2\big)\, ds \le \frac12\, \mathbb{E}\int_0^T e^{\beta s}\big( |\bar U_s|^2 + |\bar V_s|^2\big)\, ds\,.
\tag{5.11}
\]
We now need to remove the assumption that $|\xi| \le N$ and $|g| \le N$. To that end consider $\xi^N := (-N) \vee \xi \wedge N$ and $g^N := (-N) \vee g \wedge N$ (so $|\xi^N| \le N$ and $|g^N| \le N$). We obtain $Y^N, Z^N$ as before. Note that

\[
\hat Y_t = \mathbb{E}[\hat\xi\,|\,\mathcal{F}_t] = \mathbb{E}\Big[ \lim_{N\to\infty} \hat\xi^N\,\Big|\,\mathcal{F}_t\Big] = \lim_{N\to\infty} \hat Y^N_t
\]
due to Lebesgue's dominated convergence theorem for conditional expectations. Indeed, we have $|\hat\xi^N| \le |\xi| + \int_0^T |g_s(U_s,V_s)|\, ds$ and this is in $L^2$ due to (5.9). Moreover
\[
\mathbb{E}\int_0^T |Z^N_t - Z_t|^2\, dt = \mathbb{E}\Big( \int_0^T (Z^N_t - Z_t)\, dW_t\Big)^2 = \mathbb{E}\big| \hat Y^N_T - \hat Y_T + \hat Y_0 - \hat Y^N_0\big|^2 \le 2\,\mathbb{E}\big|\hat Y^N_T - \hat Y_T\big|^2 + 2\,\mathbb{E}\big|\hat Y_0 - \hat Y^N_0\big|^2 \to 0 \ \text{ as } N\to\infty
\]
due to Lebesgue's dominated convergence theorem. Then from (5.11) we have, for each $N$,
\[
\mathbb{E}\int_0^T e^{\beta s}\big( |\bar Z^N_s|^2 + |\bar Y^N_s|^2\big)\, ds \le \frac12\, \mathbb{E}\int_0^T e^{\beta s}\big( |\bar U_s|^2 + |\bar V_s|^2\big)\, ds\,.
\]
But since the RHS is independent of $N$, we obtain (5.11) but now without the assumption that $|\xi| \le N$ and $|g| \le N$.

Consider now the Banach space $(\mathcal{H}\times\mathcal{H}, \|\cdot\|)$ with
\[
\|(Y,Z)\|^2 := \mathbb{E}\int_0^T e^{\beta s}\big( Z_s^2 + Y_s^2\big)\, ds\,.
\]
From (5.11) we have
\[
\|\Phi(U,V) - \Phi(U',V')\|^2 \le \frac12\, \|(U,V) - (U',V')\|^2\,.
\]
So the map $\Phi : \mathcal{H}\times\mathcal{H} \to \mathcal{H}\times\mathcal{H}$ is a contraction (with contraction constant $1/\sqrt2$) and due to Banach's Fixed Point Theorem there is a unique $(Y^*, Z^*)$ which solves the equation $\Phi(Y^*, Z^*) = (Y^*, Z^*)$. Hence
\[
Y^*_t = \xi - \int_t^T g_s(Y^*_s, Z^*_s)\, ds - \int_t^T Z^*_s\, dW_s
\]
due to (5.10).
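The fixed point construction above is theoretical; to actually compute BSDE solutions one typically discretizes time and approximates the conditional expectations, for example by least-squares regression. The sketch below (an illustration under simplifying assumptions, not part of the proof) does this for the linear driver $g_t(y,z) = a\,y$ with $\xi = W_T^2$, for which (5.7) gives the exact value $Y_0 = e^{-aT}\,T$.

\begin{verbatim}
import numpy as np

# Backward scheme with regression-based conditional expectations (illustrative).
# BSDE: dY = a*Y dt + Z dW, Y_T = xi = W_T^2; exact value Y_0 = exp(-a*T) * T by (5.7).
rng = np.random.default_rng(2)
a, T, N, n_paths = 0.3, 1.0, 50, 100_000
dt = T / N

dW = rng.normal(0.0, np.sqrt(dt), (n_paths, N))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

def cond_exp(x, target):
    """Estimate E[target | W_{t_i} = x] by least-squares regression on {1, x, x^2}."""
    basis = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return basis @ coef

Y = W[:, -1]**2                        # terminal condition xi = W_T^2
for i in range(N - 1, -1, -1):
    Ey = cond_exp(W[:, i], Y)          # E[Y_{t_{i+1}} | F_{t_i}]
    Y = Ey - a * Ey * dt               # explicit step: Y_{t_i} ~ E_i[Y_{t_{i+1}}] - a E_i[Y_{t_{i+1}}] dt

print(Y[0], np.exp(-a * T) * T)        # estimated Y_0 vs exact value
\end{verbatim}

With this quadratic terminal condition the true conditional expectations are quadratic in $W_{t_i}$, so the simple polynomial basis is exact up to Monte Carlo and time-discretization error.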

Theorem 5.8. Let $(Y^1, Z^1)$ and $(Y^2, Z^2)$ be solutions to BSDEs with generators and terminal conditions $g^1, \xi^1$ and $g^2, \xi^2$ respectively. Assume that $\xi^1 \le \xi^2$ a.s. and that $g^2_t(Y^2_t, Z^2_t) \le g^1_t(Y^1_t, Z^1_t)$ a.e. on $\Omega\times(0,T)$. Assume finally that the generators satisfy the assumptions of Theorem 5.7 and $\xi^1, \xi^2 \in L^2(\mathcal{F}_T)$. Then $Y^1 \le Y^2$.

Proof. We note that the BSDE satisfied by $\bar Y := Y^2 - Y^1$, $\bar Z := Z^2 - Z^1$ is
\[
d\bar Y_t = \big[g^2_t(Y^2_t, Z^2_t) - g^1_t(Y^1_t, Z^1_t)\big]\, dt + \bar Z_t\, dW_t\,,\qquad \bar Y_T = \bar\xi := \xi^2 - \xi^1\,.
\]
This is
\[
\begin{aligned}
d\bar Y_t = \big[&g^2_t(Y^2_t, Z^2_t) - g^2_t(Y^1_t, Z^2_t) + g^2_t(Y^1_t, Z^2_t) - g^2_t(Y^1_t, Z^1_t)\\
&+ g^2_t(Y^1_t, Z^1_t) - g^1_t(Y^1_t, Z^1_t)\big]\, dt + \bar Z_t\, dW_t\,,\qquad \bar Y_T = \bar\xi\,,
\end{aligned}
\]
which we can re-write as

\[
d\bar Y_t = [\alpha_t \bar Y_t + \beta_t \bar Z_t + \gamma_t]\, dt + \bar Z_t\, dW_t\,,\qquad \bar Y_T = \bar\xi\,,
\]
where
\[
\alpha_t := \frac{g^2_t(Y^2_t, Z^2_t) - g^2_t(Y^1_t, Z^2_t)}{Y^2_t - Y^1_t}\,\mathbb{1}_{Y^1_t \neq Y^2_t}\,,\qquad
\beta_t := \frac{g^2_t(Y^1_t, Z^2_t) - g^2_t(Y^1_t, Z^1_t)}{Z^2_t - Z^1_t}\,\mathbb{1}_{Z^1_t \neq Z^2_t}\,,
\]
and where
\[
\gamma_t := g^2_t(Y^1_t, Z^1_t) - g^1_t(Y^1_t, Z^1_t)\,.
\]
Due to the Lipschitz assumption on $g^2$ we get that $\alpha$ and $\beta$ are bounded, and since $Y^i, Z^i$ are in $\mathcal{H}$ we get that $\gamma \in \mathcal{H}$. Thus we have an affine BSDE for $(\bar Y, \bar Z)$ and the conclusion follows from (5.7) since we get

\[
\bar Y_t = \mathbb{E}^{\mathbb{Q}}\Big[ \underbrace{\bar\xi}_{\ge 0}\, e^{-\int_t^T \alpha_s\, ds} - \int_t^T \underbrace{\gamma_s}_{\le 0}\, e^{-\int_t^s \alpha_r\, dr}\, ds\,\Big|\,\mathcal{F}_t\Big] \ge 0
\]
from the assumptions that $\xi^1 \le \xi^2$ a.s. and that $g^2_t(Y^2_t, Z^2_t) \le g^1_t(Y^1_t, Z^1_t)$ a.e.

5.4 Pontryagin’s Maximum Principle

We now return to the optimal control problem (P). Recall that given running gain f and terminal gain g our aim is to optimally control

\[
dX^\alpha_t = b_t(X^\alpha_t, \alpha_t)\, dt + \sigma_t(X^\alpha_t, \alpha_t)\, dW_t\,,\quad t\in[0,T]\,,\qquad X^\alpha_0 = x\,,
\]
where $\alpha \in \mathcal{U}$ and we assume that Assumption 3.9 holds. Recall that by optimally controlling the process we mean finding a control which maximizes

\[
J(\alpha) := \mathbb{E}\Big[ \int_0^T f(t, X^\alpha_t, \alpha_t)\, dt + g(X^\alpha_T)\Big]
\]
over $\alpha \in \mathcal{U}$. Unlike in Chapter 4 we can consider the process starting from time $0$ (because we won't be exploiting the Markov property of the SDE) and, unlike in Chapter 4, we will assume that $A$ is a subset of $\mathbb{R}^m$.

We define the Hamiltonian $H : [0,T]\times\mathbb{R}^d\times A\times\mathbb{R}^d\times\mathbb{R}^{d\times d'} \to \mathbb{R}$ of the system as

\[
H_t(x, a, y, z) := b_t(x,a)\cdot y + \operatorname{tr}\big[\sigma_t^\top(x,a)\, z\big] + f_t(x,a)\,.
\]
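For orientation, here is a small illustrative instance (the specific choices of $b$, $\sigma$ and $f$ are hypothetical, just to show the objects involved): in one dimension with controlled drift $b_t(x,a) = a$, constant volatility $\sigma_t(x,a) = \sigma_0$ and running gain $f_t(x,a) = -\tfrac12 a^2$,
\[
H_t(x,a,y,z) = a\,y + \sigma_0\, z - \tfrac12 a^2\,,
\]
which is strictly concave in $a$ and maximized at $a = y$. The candidate optimal control suggested by maximizing the Hamiltonian is therefore $\alpha_t = Y^\alpha_t$, the adjoint process itself.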

Assumption 5.9. Assume that $x \mapsto H_t(x,a,y,z)$ is differentiable for all $a, t, y, z$ with derivative bounded uniformly in $a, t, y, z$. Assume that $g$ is differentiable in $x$ with the derivative having at most linear growth (in $x$).

Consider the adjoint BSDEs (one for each $\alpha \in \mathcal{U}$)
\[
dY^\alpha_t = -\partial_x H_t(X^\alpha_t, \alpha_t, Y^\alpha_t, Z^\alpha_t)\, dt + Z^\alpha_t\, dW_t\,,\qquad Y^\alpha_T = \partial_x g(X^\alpha_T)\,.
\]
Note that under Assumptions 5.9 and 3.9

\[
\mathbb{E}\big[ |\partial_x g(X^\alpha_T)|^2\big] \le \mathbb{E}\big[ \big(K(1 + |X^\alpha_T|)\big)^2\big] < \infty\,.
\]
Hence, due to Theorem 5.7, the adjoint BSDEs have unique solutions $(Y^\alpha, Z^\alpha)$.

We will now see that it is possible to formulate a sufficient optimality criterion based on the properties of the Hamiltonian and on the adjoint BSDEs. This is what is known as Pontryagin's Maximum Principle. Consider two control processes $\alpha, \beta \in \mathcal{U}$ and the two associated controlled diffusions, both starting from the same initial value, labelled $X^\alpha, X^\beta$. Then

\[
J(\beta) - J(\alpha) = \mathbb{E}\Big[ \int_0^T \big( f(t, X^\beta_t, \beta_t) - f(t, X^\alpha_t, \alpha_t)\big)\, dt + g(X^\beta_T) - g(X^\alpha_T)\Big]\,.
\]
We will need to assume that $g$ is concave (equivalently, that $-g$ is convex). Then $g(x) - g(y) \ge \partial_x g(x)(x - y)$ and so (recalling what the terminal condition in our adjoint equation is)

\[
\mathbb{E}\big[ g(X^\beta_T) - g(X^\alpha_T)\big] \ge \mathbb{E}\big[ (X^\beta_T - X^\alpha_T)\, \partial_x g(X^\beta_T)\big] = \mathbb{E}\big[ (X^\beta_T - X^\alpha_T)\, Y^\beta_T\big]\,.
\]
We use Itô's product rule and the fact that $X^\alpha_0 = X^\beta_0$. Let us write $\Delta b_t := b_t(X^\beta_t, \beta_t) - b_t(X^\alpha_t, \alpha_t)$ and $\Delta\sigma_t := \sigma_t(X^\beta_t, \beta_t) - \sigma_t(X^\alpha_t, \alpha_t)$. Then we see that
\[
\mathbb{E}\big[ (X^\beta_T - X^\alpha_T)\, Y^\beta_T\big] \ge \mathbb{E}\Big[ -\int_0^T (X^\beta_t - X^\alpha_t)\, \partial_x H_t(X^\beta_t, \beta_t, Y^\beta_t, Z^\beta_t)\, dt + \int_0^T \Delta b_t\, Y^\beta_t\, dt + \int_0^T \operatorname{tr}\big[ \Delta\sigma_t^\top\, Z^\beta_t\big]\, dt\Big]\,.
\]
Note that we are missing some details here, because the second stochastic integral term that we dropped isn't necessarily a martingale. However, with a stopping time argument and Fatou's Lemma the details can be filled in (and this is why we have an inequality). We also have that, for all $y, z$,

\[
\begin{aligned}
f(t, X^\beta_t, \beta_t) &= H_t(X^\beta_t, \beta_t, y, z) - b_t(X^\beta_t, \beta_t)\, y - \operatorname{tr}\big[\sigma_t^\top(X^\beta_t, \beta_t)\, z\big]\,,\\
f(t, X^\alpha_t, \alpha_t) &= H_t(X^\alpha_t, \alpha_t, y, z) - b_t(X^\alpha_t, \alpha_t)\, y - \operatorname{tr}\big[\sigma_t^\top(X^\alpha_t, \alpha_t)\, z\big]\,,
\end{aligned}
\]
and so
\[
f(t, X^\beta_t, \beta_t) - f(t, X^\alpha_t, \alpha_t) = \Delta H_t - \Delta b_t\, Y^\beta_t - \operatorname{tr}\big(\Delta\sigma_t^\top\, Z^\beta_t\big)\,,
\]
where
\[
\Delta H_t := H_t(X^\beta_t, \beta_t, Y^\beta_t, Z^\beta_t) - H_t(X^\alpha_t, \alpha_t, Y^\beta_t, Z^\beta_t)\,.
\]
Thus
\[
\mathbb{E}\Big[\int_0^T \big( f(t, X^\beta_t, \beta_t) - f(t, X^\alpha_t, \alpha_t)\big)\, dt\Big] = \mathbb{E}\Big[\int_0^T \big( \Delta H_t - \Delta b_t Y^\beta_t - \operatorname{tr}(\Delta\sigma_t^\top Z^\beta_t)\big)\, dt\Big]\,.
\]
Altogether

\[
J(\beta) - J(\alpha) \ge \mathbb{E}\Big[ \int_0^T \Big( \Delta H_t - (X^\beta_t - X^\alpha_t)\, \partial_x H_t(X^\beta_t, \beta_t, Y^\beta_t, Z^\beta_t)\Big)\, dt\Big]\,.
\]
If we now assume that $(x,a) \mapsto H_t(x, a, y, z)$ is differentiable and concave for any $t, y, z$, then