The δ-sensitivity and its application to stochastic optimal control of nonlinear diffusions.
Evangelos A. Theodorou1 and Emo Todorov2
Abstract— We provide optimal control laws by using tools from the stochastic calculus of variations and the mathematical concept of δ-sensitivity. The analysis relies on logarithmic transformations of the value functions and the use of linearly solvable Partial Differential Equations (PDEs). We derive the corresponding optimal control as a function of the δ-sensitivity of the logarithmic transformation of the value function for the case of nonlinear diffusion processes affine in control and noise.

I. INTRODUCTION

Stochastic optimal control for Markov diffusion processes affine in controls and noise has been considered in different areas of science and engineering such as machine learning, control theory and robotics [1], [3]–[6], [9]–[11], [15], [16]. One of the common methodologies for solving such optimal control problems is based on exponential transformations of value functions. The main idea is inspired by the fact that linear PDEs can be solved with the application of the Feynman-Kac lemma, which provides a tractable sampling-based numerical scheme. Therefore, instead of working with the initial nonlinear PDE, that is, the Hamilton-Jacobi-Bellman (HJB) equation, one can use the exponential transformation of the initial value function to get the corresponding linear PDE and then apply the Feynman-Kac lemma to find stochastic representations of its solution.

Parallel to the work in control theory, machine learning and robotics, researchers in the domain of financial engineering developed tools based on the stochastic calculus of variations to find sensitivities of expectations of payoff functions with respect to changes in the parameters of the underlying stochastic dynamics. Besides the estimation of the cost over sample paths, financial applications require the computation of its differential with respect to the initial condition x_{t₀} (δ, delta sensitivity), the drift b(x) (ρ, rho sensitivity) and the diffusion σ(x) (ν, vega sensitivity) of the stochastic dynamics in (2).

The contribution of this work is to show that the δ-sensitivity appears in the computation of the optimal control under the logarithmic transformation of the value function. In particular, we provide explicit formulas for the stochastic optimal control of Markov diffusion processes affine in control and noise. The analysis relies on the stochastic calculus of variations and the concept of the Malliavin derivative. Related work in this area traces back to [13], as well as more recent work on the application of Malliavin calculus to finance [7], [8]. Here we revisit the topic of the stochastic calculus of variations and show its applicability to finding optimal control laws. In addition, we provide all the underlying assumptions regarding smoothness and differentiability conditions of the stochastic dynamics under consideration.

The paper is organized as follows. In Section III we review basic theorems of the stochastic calculus of variations which will be used for computing the δ-sensitivity. In Section V we present the derivation of stochastic optimal control based on the logarithmic transformation of the value function. In the last section, VI, we conclude.

¹Evangelos A. Theodorou is a Postdoctoral Research Associate with the Department of Computer Science and Engineering, University of Washington, Seattle. [email protected]
²Emo Todorov is Associate Professor with the Departments of Computer Science and Engineering, and Applied Math, University of Washington, Seattle. [email protected]

II. NOTATION

We denote the norm of x ∈ L²(P) as ‖x‖_{L²(P)} = (E[x²])^{1/2} = (∫ x²(ω) P(dω))^{1/2}. A sample path ~x = {x_{t₁}, x_{t₂}, ..., x_{t_N}} is generated by forward sampling of the diffusion process:

  dx = b(x)dt + σ(x(t))dw(t)    (2)

The cost function Ψ(x) can be computed with Monte Carlo simulations.

III. STOCHASTIC CALCULUS OF VARIATIONS

In this section we provide the basic definitions and properties of the stochastic calculus of variations in the form of propositions and definitions.

Definition 1 (Malliavin Derivative): Let {w(t), 0 ≤ t ≤ T} be an n-dimensional Brownian motion on a complete probability space (Ω, F, P), and let G be the set of random variables F of the form:

  F = f( ∫₀^∞ h₁(t)dw(t), ..., ∫₀^∞ hₙ(t)dw(t) )    (3)

with f a smooth function.

The Malliavin derivative Z(t) = D_s x(t) of the state x(t) in (2) satisfies the integral equation:

  Z(t) = σ(x(s)) + ∫ₛᵗ J_b(x)Z(τ)dτ + ∫ₛᵗ D_s σ(x(τ))dw(τ)    (16)

or, in differential form,

  dZ(t) = J_b(x)Z(t)dt + Σ_{j=1}^n J_{σ_j}(x(t))Z(t)dw_j(t)    (17)
        = ( J_b(x)dt + Σ_{j=1}^n J_{σ_j}(x(t))dw_j(t) ) Z(t)    (18)

with initial condition Z(s) = σ(x(s)). From equations (16) and (17) we will have that:

  Z(t) = D_s x(t) = Υ(t)Υ(s)⁻¹σ(x(s))1_{s≤t},  s ≥ 0    (19)

To see this, we solve the equation above with respect to Υ(t), which gives Υ(t) = Z(t)σ(x(s))⁻¹Υ(s)1_{s≤t}, and then substitute into (16). More precisely, we write (16) starting from the initial time τ = s:

  Υ(t) − Υ(s) = ∫ₛᵗ J_b(x)Υ(τ)dτ + Σ_{j=1}^n ∫ₛᵗ J_{σ_j}(x)Υ(τ)dw_j(τ)

Substitution of Υ(t) results in:

  ( Z(t)σ(x(s))⁻¹ − I ) Υ(s) = [ ∫ₛᵗ J_b(x)Z(τ)dτ + Σ_{j=1}^n ∫ₛᵗ J_{σ_j}(x)Z(τ)dw_j(τ) ] σ(x(s))⁻¹Υ(s)

Multiplication of both sides of the equation above with Υ(s)⁻¹σ(x(s)) provides us with (17). Equations (11) and (12) can be trivially proved by application of the chain rule as defined in (6).

Next we derive another identity that will be used later in our derivations. More precisely, let α(t), t ∈ [0, T], be a deterministic function such that:

  ∫₀^T α(t)dt = 1

Then from (11) we have, for t = t_i, the expression that follows:

  Z(t_i) = D_s x(t_i) = Υ(t_i)Υ(s)⁻¹σ(x(s))1_{s≤t_i}    (20)

or

  Υ(t_i) ∫₀^T α(s)ds = ∫₀^T α(s)Z(t_i)σ(x(s))⁻¹Υ(s)1_{s≤t_i} ds

By making use of (20) we have the expression:

  Υ(t_i) = ∫₀^{t_N} α(s)Z(t_i)σ(x(s))⁻¹Υ(s)1_{s≤t_i} ds

and for t_i = t_N the final result is:

  Υ(t_N) = ∫₀^{t_N} α(s)Z(t_N)σ(x(s))⁻¹Υ(s)ds

In the remainder of the analysis in this paper we assume that:

Assumption 1: The diffusion matrix σ satisfies the uniform ellipticity condition: there exists ε > 0 such that ξᵀσ(x)σ(x)ᵀξ ≥ ε‖ξ‖², ∀ξ, x ∈ ℝⁿ.

IV. THE δ-SENSITIVITY

The goal in this section is to find the gradient of an expectation of the form:

  E[ J(~x) | x_{t_i} ] = E[ ∫_{t_i}^{t_N} q(x)dt | x_{t_i} ]    (21)

with respect to changes in the initial state x_{t_i} of the diffusion process:

  dx = b(x)dt + σ(x(t))dw(t)    (22)

The trajectories ~x are defined as ~x = (x_{t_{i+1}}, x_{t_{i+2}}, ..., x_{t_N}). We also denote by ∇_i the partial derivative with respect to the i-th argument, and we introduce the set Γ_m as follows:

  Γ_m = { α ∈ L²([0, T]) | ∫₀^{t_i} α dt = 1, ∀i = 1, 2, ..., m }    (23)

Lemma 1: Under Assumption 1, for any x ∈ ℝⁿ and for any α ∈ Γ_N, we have that:

  ∇_{x_{t_i}} E[ J(~x) | x_{t_i} ] = E[ Σ_{j=i+1}^N ∇_{x_{t_j}} J(~x)ᵀ Υ(t_j) | x_{t_i} ]

where Υ(t) is the stochastic flow defined as:

  dΥ(t) = J_b(x(t))Υ(t)dt + Σ_{i=1}^n J_{σ_i}(x(t))Υ(t)dw_i(t)    (24)

with the initial condition Υ(0) = I_{n×n}.

Proof: We assume that the functional J(~x) is continuously differentiable and therefore we will have that:

  ∇_{x_{t_i}} E[ J(~x) | x_{t_i} ] = E[ ∇_{x_{t_i}} J(~x) | x_{t_i} ]
    = E[ Σ_{j=i+1}^N ∇_{x_{t_j}} J(~x)ᵀ (dx_{t_j}/dx_{t_i}) | x_{t_i} ]
    = E[ Σ_{j=i+1}^N ∇_{x_{t_j}} J(~x)ᵀ Υ(t_j) | x_{t_i} ]

The derivation for the case where J(~x) is not continuously differentiable can be found in [8].

For the case where J(~x) = φ(x_T), so that J(~x) depends only on the last state of the state trajectory ~x = {x_{t₁}, ..., x_{t_N}}, we have the following proposition:

Proposition 3: Under Assumption 1, for any x ∈ ℝⁿ and for any α ∈ Γ_N, we have that:

  ∇_{x_{t_i}} E[ φ(x_{t_N}) | x_{t_i} ] = E[ φ(x_{t_N}) ∫_{t_i}^{t_N} α(t) ( σ(x)⁻¹Υ(t) )ᵀ dw(t) | x_{t_i} ]

where Υ(t) is the stochastic flow defined in (24) and the term v(t) = ∫₀^T α(t)( σ(x(t))⁻¹Υ(t) )ᵀ dw(t) forms the stochastic derivative.

V. STOCHASTIC OPTIMAL CONTROL

In this section we show how the δ-sensitivity is used for computing the optimal control in feedback form. More precisely, we consider stochastic optimal control in the classical sense, as a constrained optimization problem, with the cost function under minimization given by the mathematical expression:

  V(x) = min_u E⁽¹⁾[ J(x, u) ] = min_u E⁽¹⁾[ φ(x(t_N)) + ∫_{t₀}^{t_N} L(x, u, t)dt ]    (26)

subject to the nonlinear stochastic dynamics:

  dx = F(x, u)dt + B(x)dw    (27)

with x ∈ ℝ^{n×1} denoting the state of the system, u ∈ ℝ^{p×1} the control vector and dw ∈ ℝ^{p×1} Brownian noise. The function F(x, u) is a nonlinear function of the state x and affine in the controls u, and therefore is defined as F(x, u) = f(x) + G(x)u, with the matrix G(x) ∈ ℝ^{n×p}.

The exponentially transformed value function Ψ satisfies:

  Ψ(x_{t_i}) = E⁽⁰⁾[ exp( −(1/λ) ∫_{t_i}^{t_N} q̃(x)dt ) Ψ(x_{t_N}) ]    (35)

The expectation E⁽⁰⁾ is taken on sample paths ~x_i = (x_{t_i}, ..., x_{t_N}) generated with the forward sampling of the uncontrolled diffusion equation:

  dx = f(x_t)dt + B(x)dw    (36)

The optimal controls are specified as:

  u_{PI}(x, t) = −R⁻¹( q₁(x, t) − λ G(x)ᵀ ∇_xΨ(x, t) / Ψ(x, t) )    (37)

Since the initial value function V(x, t) is the minimum of the expectation of the objective function J(x, u) subject to the controlled stochastic dynamics in (27), it can be trivially shown that:

  V(x, t_i) = −λ log E⁽⁰⁾[ exp( −(1/λ) ∫_{t_i}^{t_N} q̃(x)dt ) Ψ(x_{t_N}) ] ≤ E⁽¹⁾[ J(x, u) ]    (38)

The expectation E⁽¹⁾ is taken on sample paths ~x_i = (x_{t_i}, ..., x_{t_N}) generated with the forward sampling of the controlled diffusion equation in (27). A closer look into the feedback control law in (37) reveals the use of the δ-sensitivity. In particular, the feedback control involves the term ∇_xΨ(x, t), which corresponds to the δ-Greek sensitivity of the function Ψ(x, t) with respect to the initial state of the diffusion in (36). The analysis above thus shows that the optimal feedback control (37) is an explicit function of the δ-sensitivity of Ψ.

VI. CONCLUSIONS

We derived the optimal control law by using the δ-sensitivity. Consider the case of estimating the expectation of a cost function over state paths generated by sampling of a diffusion process. Essentially, the δ-sensitivity corresponds to the differential of the expected cost function with respect to the initial state of the underlying diffusion process. Future research will involve the development of efficient algorithms for applications to dynamical systems in robotics and biology. Another line of research involves the use of the stochastic calculus of variations for optimal control of Markov jump diffusion processes.

REFERENCES

[1] Paolo Dai Pra, Lorenzo Meneghini, and Wolfgang Runggaldier. Connections between stochastic control and dynamic games. Mathematics of Control, Signals, and Systems (MCSS), 9(4):303–326, 1996.
[2] Mark H. A. Davis and Martin P. Johansson. Malliavin Monte Carlo Greeks for jump diffusions. Stochastic Processes and their Applications, 116(1):101–129, January 2006.
[3] W. H. Fleming and W. M. McEneaney. Risk-sensitive control on an infinite time horizon. SIAM J. Control Optim., 33:1881–1915, November 1995.
[4] W. H. Fleming and H. Mete Soner. Controlled Markov Processes and Viscosity Solutions. Applications of Mathematics. Springer, New York, 1st edition, 1993.
[5] W. H. Fleming and H. Mete Soner. Controlled Markov Processes and Viscosity Solutions. Applications of Mathematics. Springer, New York, 2nd edition, 2006.
[6] W. H. Fleming. Exit probabilities and optimal stochastic control. Applied Math. Optim., 9:329–346, 1971.
[7] Eric Fournié, Jean-Michel Lasry, Pierre-Louis Lions, and Jérôme Lebuchoux. Applications of Malliavin calculus to Monte-Carlo methods in finance. II. Finance and Stochastics, 5(2):201–236, 2001.
[8] Eric Fournié, Jean-Michel Lasry, Pierre-Louis Lions, Jérôme Lebuchoux, and Nizar Touzi. Applications of Malliavin calculus to Monte Carlo methods in finance. Finance and Stochastics, 3(4):391–412, 1999.
[9] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 11:P11011, 2005.
[10] H. J. Kappen. An introduction to stochastic control theory, path integrals and reinforcement learning. In J. Marro, P. L. Garrido, and J. J. Torres, editors, Cooperative Behavior in Neural Systems, volume 887 of American Institute of Physics Conference Series, pages 149–181, February 2007.
[11] Sanjoy K. Mitter and Nigel J. Newton. A variational approach to nonlinear estimation. SIAM J. Control Optim., 42(5):1813–1833, May 2003.
[12] Giulia Di Nunno, Bernt Øksendal, and Frank Proske. Malliavin Calculus for Lévy Processes with Applications to Finance. Springer, 1st edition, 2009.
[13] D. Ocone and I. Karatzas. A generalized Clark representation formula, with applications to optimal portfolios. Stochastics, 37:187–220, 1991.
[14] Robert F. Stengel. Optimal Control and Estimation. Dover Books on Advanced Mathematics. Dover Publications, New York, 1994.
[15] E. Todorov. Linearly-solvable Markov decision problems. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19 (NIPS 2007), Vancouver, BC, 2007. Cambridge, MA: MIT Press.
[16] E. Todorov. Efficient computation of optimal actions. Proc Natl Acad Sci U S A, 106(28):11478–83, 2009.
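As a numerical illustration of the two δ-sensitivity estimators discussed in the paper — the stochastic-flow (pathwise) estimator of Lemma 1 and the Malliavin-weight estimator of Proposition 3 with α(t) = 1/T — the following sketch estimates ∇_{x₀} E[φ(x_T)] for a one-dimensional diffusion. This is not the authors' code; the Ornstein-Uhlenbeck drift b(x) = −x, constant σ, quadratic terminal cost φ(x) = x², and all constants are illustrative assumptions chosen so a closed-form gradient (2 x₀ e^{−2T}) is available for comparison.

```python
import numpy as np

# Hypothetical 1-D example (not from the paper): estimate the delta-
# sensitivity  grad = d/dx0 E[phi(x_T)]  for  dx = b(x)dt + sigma dw,
# using (a) the stochastic-flow estimator of Lemma 1 and (b) the
# Malliavin-weight estimator of Proposition 3 with alpha(t) = 1/T.
rng = np.random.default_rng(0)

T, dt, sigma, x0 = 1.0, 0.01, 0.5, 1.0
n_steps, n_paths = int(T / dt), 200_000

b = lambda x: -x           # drift (Ornstein-Uhlenbeck, assumption)
Jb = lambda x: -1.0        # Jacobian of the drift (scalar case)
phi = lambda x: x**2       # terminal cost (assumption)
dphi = lambda x: 2.0 * x   # its gradient

x = np.full(n_paths, x0)
flow = np.ones(n_paths)      # stochastic flow Y(t) of Eq. (24), Y(0) = 1
weight = np.zeros(n_paths)   # Malliavin weight: integral of alpha sigma^-1 Y dw

for _ in range(n_steps):
    dw = rng.normal(0.0, np.sqrt(dt), n_paths)
    weight += (1.0 / T) * (flow / sigma) * dw  # weight increment (alpha = 1/T)
    flow += Jb(x) * flow * dt                  # dY = J_b Y dt (sigma constant here)
    x += b(x) * dt + sigma * dw                # Euler-Maruyama step of the diffusion

grad_flow = np.mean(dphi(x) * flow)        # Lemma 1 (pathwise) estimator
grad_malliavin = np.mean(phi(x) * weight)  # Proposition 3 estimator
grad_exact = 2.0 * x0 * np.exp(-2.0 * T)   # closed form for this OU example
print(grad_flow, grad_malliavin, grad_exact)
```

Both estimators agree with the closed-form value up to Monte Carlo and Euler-discretization error; the pathwise estimator needs φ differentiable, while the Malliavin-weight estimator does not, which is the practical appeal of Proposition 3.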
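The feedback law (37) can likewise be sketched numerically: Ψ(x₀) is estimated by forward sampling of the uncontrolled diffusion (36) as in (35), and its δ-sensitivity ∇_xΨ by pathwise differentiation through the stochastic flow (Lemma 1, since the exponentiated path cost depends on the whole trajectory). Everything below is an illustrative assumption, not the authors' implementation: 1-D dynamics f(x) = −x, constant B, G = R = λ = 1-ish constants, running cost q̃(x) = x², quadratic terminal cost, and the q₁(x, t) term of (37) set to zero for simplicity.

```python
import numpy as np

# Hypothetical sketch of the path-integral control law, Eq. (37), with the
# q1 term dropped (assumption). Psi is estimated via Eq. (35) by forward
# sampling of the uncontrolled diffusion, Eq. (36); grad Psi via the flow.
rng = np.random.default_rng(1)

lam, T, dt = 1.0, 1.0, 0.01
B, G, R = 0.5, 1.0, 1.0
n_steps, n_paths = int(T / dt), 100_000

f = lambda x: -x          # uncontrolled drift (assumption)
Jf = lambda x: -1.0       # its Jacobian (scalar case)
q_tilde = lambda x: x**2  # running state cost (assumption)
phi = lambda x: x**2      # terminal cost (assumption)

x0 = 1.0
x = np.full(n_paths, x0)
flow = np.ones(n_paths)    # stochastic flow Y(t), Y(0) = 1
cost = np.zeros(n_paths)   # accumulated running cost
dcost = np.zeros(n_paths)  # d(cost)/dx0, chain rule through the flow

for _ in range(n_steps):
    dw = rng.normal(0.0, np.sqrt(dt), n_paths)
    cost += q_tilde(x) * dt
    dcost += 2.0 * x * flow * dt   # derivative of x^2 running cost via Y(t)
    flow += Jf(x) * flow * dt      # dY = J_f Y dt (B constant here)
    x += f(x) * dt + B * dw        # Euler-Maruyama step of Eq. (36)

path_val = np.exp(-(cost + phi(x)) / lam)  # exp(-(1/lam) int q~ dt) * Psi(x_tN)
psi = np.mean(path_val)                    # Monte Carlo estimate of Eq. (35)
dpsi = np.mean(path_val * (-(dcost + 2.0 * x * flow) / lam))  # pathwise grad Psi
u = lam * (G / R) * dpsi / psi             # Eq. (37) with q1 = 0 (assumption)
print(psi, dpsi, u)
```

For x₀ > 0 the estimate ∇Ψ is negative, so the resulting control pushes the state toward the origin, as expected for this quadratic-cost example; the point of the sketch is that the feedback (37) is computed entirely from uncontrolled forward samples plus the δ-sensitivity of Ψ.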