
The δ-sensitivity and its application to stochastic optimal control of nonlinear diffusion processes

Evangelos A. Theodorou^1 and Emo Todorov^2

^1 Evangelos A. Theodorou is a Postdoctoral Research Associate with the Department of Computer Science and Engineering, University of Washington, Seattle. [email protected]
^2 Emo Todorov is Associate Professor with the Departments of Computer Science and Engineering, and Applied Math, University of Washington, Seattle. [email protected]

Abstract— We provide optimal control laws by using tools from the stochastic calculus of variations and the mathematical concept of δ-sensitivity. The analysis relies on logarithmic transformations of the value functions and the use of linearly solvable Partial Differential Equations (PDEs). We derive the corresponding optimal control as a function of the δ-sensitivity of the logarithmic transformation of the value function for the case of nonlinear diffusion processes affine in control and noise.

I. INTRODUCTION

Stochastic optimal control for Markov diffusion processes affine in controls and noise has been considered in different areas of science and engineering such as machine learning, control and robotics [1], [3]–[6], [9]–[11], [15], [16]. One of the common methodologies for solving such optimal control problems is based on exponential transformations of value functions. The main idea is inspired by the fact that linear PDEs can be solved with the application of the Feynman-Kac lemma, which provides a tractable sampling-based numerical scheme. Therefore, instead of working with the initial nonlinear PDE, that is, the Hamilton-Jacobi-Bellman (HJB) equation, one can use the exponential transformation of the initial value function to get the corresponding linear PDE and then apply the Feynman-Kac lemma to find stochastic representations of its solution.

Parallel to the work in control theory, machine learning and robotics, researchers in the domain of financial engineering developed tools based on the stochastic calculus of variations to find sensitivities of expectations of pay-off functions with respect to changes in the parameters of the underlying stochastic dynamics. Consider a cost function Ψ(x) defined as an expectation over sample paths, where ~x is a sample path ~x = {x_{t_1}, x_{t_2}, ..., x_{t_N}} generated by forward sampling of the diffusion process:

\[ dx = b(x)\,dt + \sigma(x(t))\,dw(t) \tag{2} \]

The cost function Ψ(x) can be computed with Monte-Carlo simulations. Beyond the estimation of the cost from sample paths, financial applications require the computation of its differential with respect to the initial condition x_{t_0} (δ, delta sensitivity), the drift b(x) (ρ, rho sensitivity) and the diffusion coefficient σ(x) (ν, vega sensitivity) of the stochastic dynamics in (2).

The contribution of this work is to show that the δ-sensitivity appears in the computation of the optimal control under the logarithmic transformation of the value function. In particular we provide explicit formulas for the stochastic optimal control of Markov diffusion processes affine in control and noise. The analysis relies on the stochastic calculus of variations and the concept of the Malliavin derivative. Related work in this area traces back to [13], as well as more recent work on the application of Malliavin calculus to finance [7], [8]. Here we revisit the topic of the stochastic calculus of variations and show its applicability to finding optimal control laws. In addition we provide the underlying assumptions regarding smoothness and differentiability conditions of the stochastic dynamics under consideration.

The paper is organized as follows. In section III we review basic theorems of the stochastic calculus of variations which will be used for computing the δ-sensitivity. In section IV we derive the δ-sensitivity. In section V we present the derivation of stochastic optimal control based on the logarithmic transformation of the value function. In the last section VI we conclude.

II. NOTATION

We denote the norm of x ∈ L²(P) as ||x||_{L²(P)} = (E[x²])^{1/2} = (∫ x²(ω) P(dω))^{1/2}.
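To make the forward-sampling construction concrete, here is a minimal numerical sketch that is not part of the original analysis: it samples the diffusion in (2) with an Euler-Maruyama scheme and estimates a path-cost expectation by Monte Carlo. The scalar drift b(x) = -x, the constant diffusion σ(x) = 0.2 and the running cost q(x) = x² are illustrative assumptions chosen only for this example.

import numpy as np

def sample_paths(b, sigma, x0, dt, N, n_paths, rng):
    """Forward-sample dx = b(x) dt + sigma(x) dw, eq. (2), with Euler-Maruyama."""
    x = np.full(n_paths, x0, dtype=float)
    paths = [x.copy()]
    for _ in range(N):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)   # Brownian increments
        x = x + b(x) * dt + sigma(x) * dw
        paths.append(x.copy())
    return np.array(paths)                           # shape (N+1, n_paths)

rng = np.random.default_rng(0)
b = lambda x: -x                    # illustrative drift
sigma = lambda x: 0.2 + 0.0 * x     # illustrative (constant) diffusion
q = lambda x: x ** 2                # illustrative running cost

dt = 0.01
paths = sample_paths(b, sigma, x0=1.0, dt=dt, N=100, n_paths=10_000, rng=rng)
cost = (q(paths[1:]) * dt).sum(axis=0)   # discretized integral of q along each path
print("Monte-Carlo estimate of the expected path cost:", cost.mean())

The δ-sensitivity discussed below is precisely the derivative of such a Monte-Carlo expectation with respect to the initial state x0.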

III. STOCHASTIC CALCULUS OF VARIATIONS

In this section we provide the basic definitions and properties of the stochastic calculus of variations in the form of propositions and definitions.

Definition 1 (Malliavin Derivative): Let {w(t), 0 ≤ t ≤ T} be an n-dimensional Brownian motion on a complete probability space (Ω, F, P). Let G be the set of random variables F of the form:

\[ F = f\left( \int_0^{\infty} h_1(t)\,dw(t), \;\dots,\; \int_0^{\infty} h_n(t)\,dw(t) \right) \tag{3} \]

with f a smooth function with derivatives of polynomial growth and h_1, ..., h_n deterministic square-integrable functions.
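For smooth cylindrical functionals of the form (3), the Malliavin derivative admits a standard componentwise expression. The display below is supplied from the standard theory (see, e.g., [12]) rather than recovered from the text above:

% Standard Malliavin derivative of a smooth cylindrical functional F as in (3);
% supplied here from the standard theory (cf. [12]).
\[
  D_t F \;=\; \sum_{i=1}^{n} \partial_i f\!\left( \int_0^{\infty} h_1(s)\,dw(s),
  \dots, \int_0^{\infty} h_n(s)\,dw(s) \right) h_i(t), \qquad 0 \le t \le T .
\]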

The Malliavin derivative Z(t) = D_s x(t) of the state of the diffusion (2) can be written in integral form as:

\[ Z(t) = \int_s^t J_b(x)\,Z(\tau)\,d\tau + \int_s^t D_s\sigma(x(\tau))\,dw(\tau) + \sigma(x(s)) \]

or, in differential form,

\[ dZ(t) = J_b(x)\,Z(t)\,dt + \sum_j J_{\sigma_j}(x(t))\,Z(t)\,dw_j(t) \tag{17} \]

\[ \phantom{dZ(t)} = \Big( J_b(x)\,dt + \sum_j J_{\sigma_j}(x(t))\,dw_j(t) \Big)\, Z(t) \tag{18} \]

with initial condition Z(s) = σ(x(s)). From equations (16) and (17) we will have that

\[ Z(t) = D_s x(t) = \Upsilon(t)\,\Upsilon(s)^{-1}\,\sigma(x(s))\,\mathbf{1}_{s\le t}, \qquad s \ge 0 \tag{19} \]

To see this, we solve the equation above with respect to Υ(t), namely Υ(t) = Z(t) σ(x(s))^{-1} Υ(s) 1_{s≤t}, and then substitute into (16). More precisely, we write (16) starting from the initial time τ = s:

\[ \Upsilon(t) - \Upsilon(s) = \int_s^t J_b(x)\,\Upsilon(\tau)\,d\tau + \sum_{j=1}^{N} \int_s^t J_{\sigma_j}(x)\,\Upsilon(\tau)\,dw_j(\tau) \]

Substitution of Υ(t) results in:

\[ \Big( Z(t)\,\sigma(x(s))^{-1} - I \Big)\,\Upsilon(s) = \left( \int_s^t J_b(x)\,Z(\tau)\,d\tau + \sum_{j=1}^{N} \int_s^t J_{\sigma_j}(x)\,Z(\tau)\,dw_j(\tau) \right) \sigma(x(s))^{-1}\,\Upsilon(s) \]

Multiplication of both sides of the equation above with Υ(s)^{-1} σ(x(s)) will provide us with (17). Equations (12) and (11) can be trivially proved by application of the chain rule as defined in (6).

Next we derive another identity that will be used later in our derivations. More precisely, let α(t), t ∈ [0, T], be a deterministic function such that:

\[ \int_0^T \alpha(t)\,dt = 1 \]

Then from (11) we have that for t = t_i the expression that follows:

\[ Z(t_i) = D_s x(t_i) = \Upsilon(t_i)\,\Upsilon(s)^{-1}\,\sigma(x(s))\,\mathbf{1}_{s\le t_i} \tag{20} \]

The equation above can be written as:

\[ \Upsilon(t_i) \int_0^T \alpha(s)\,ds = \int_0^T \alpha(s)\,Z(t_i)\,\sigma(x(s))^{-1}\,\Upsilon(s)\,\mathbf{1}_{s\le t_i}\,ds \]

By making use of (20) we have the expression:

\[ \Upsilon(t_i) = \int_0^{t_N} \alpha(s)\,Z(t_i)\,\sigma(x(s))^{-1}\,\Upsilon(s)\,\mathbf{1}_{s\le t_i}\,ds \]

and for t_i = t_N the final result is:

\[ \Upsilon(t_N) = \int_0^{t_N} \alpha(s)\,Z(t_N)\,\sigma(x(s))^{-1}\,\Upsilon(s)\,ds \]

In the remainder of the analysis in this paper we assume the following.

Assumption 1: The diffusion matrix σ satisfies the uniform ellipticity condition: there exists ε > 0 such that ξᵀ σᵀ σ ξ ≥ ε ||ξ||², for all ξ, x ∈ ℝⁿ.
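The stochastic flow Υ(t) and the Malliavin derivative Z(t) are easy to simulate jointly with the state. The sketch below, a minimal scalar illustration rather than anything from the paper, integrates the flow equation alongside the state with an Euler scheme; the drift, diffusion and their derivatives J_b, J_σ are illustrative assumptions.

import numpy as np

def flow_sample(b, Jb, sigma, Jsigma, x0, dt, N, rng):
    """Integrate dx = b dt + sigma dw jointly with the stochastic flow of
    eq. (24): dY = Jb(x) Y dt + Jsigma(x) Y dw, Y(0) = 1 (scalar case)."""
    x, Y = x0, 1.0
    xs, Ys = [x], [Y]
    for _ in range(N):
        dw = rng.normal(0.0, np.sqrt(dt))
        x, Y = (x + b(x) * dt + sigma(x) * dw,
                Y + Jb(x) * Y * dt + Jsigma(x) * Y * dw)
        xs.append(x); Ys.append(Y)
    return np.array(xs), np.array(Ys)

rng = np.random.default_rng(1)
xs, Ys = flow_sample(b=lambda x: -x, Jb=lambda x: -1.0,       # illustrative drift and Jacobian
                     sigma=lambda x: 0.3, Jsigma=lambda x: 0.0,  # illustrative constant diffusion
                     x0=1.0, dt=0.01, N=100, rng=rng)
# Ys[k] approximates the flow Υ(t_k) = dx_{t_k}/dx_0; by eq. (19) the Malliavin
# derivative is D_s x(t_k) ≈ Ys[k] / Ys[j] * sigma(xs[j]) for indices j <= k.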

IV. THE δ-SENSITIVITY

The goal in this section is to find the differential of the expectation of the form:

\[ E\big[ J(\vec{x}) \,\big|\, x_{t_i} \big] = E\left[ \int_t^{t_N} q(x)\,dt \,\Big|\, x_{t_i} \right] \tag{21} \]

with respect to changes in the initial state x_t of the diffusion process:

\[ dx = b(x)\,dt + \sigma(x(t))\,dw(t) \tag{22} \]

The trajectories ~x are defined as ~x = (x_{t_{i+1}}, x_{t_{i+2}}, ..., x_{t_N}). We also denote by ∇_i the gradient with respect to the i-th argument, and we introduce the set Γ_N as follows:

\[ \Gamma_N = \Big\{ \alpha \in L^2([0,T]) \;\Big|\; \int_0^{t_i} \alpha\,dt = 1, \;\; \forall i = 1, 2, \dots, N \Big\} \tag{23} \]

One admissible choice of α, which concentrates its mass before the first time instant t_1, is verified below.
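As an illustrative check, supplied here rather than taken from the paper: the choice below satisfies every constraint in (23) because α places all of its mass before t_1.

\[
  \alpha(t) = \frac{1}{t_1}\,\mathbf{1}_{[0,t_1]}(t)
  \quad\Longrightarrow\quad
  \int_0^{t_i} \alpha(t)\,dt = \int_0^{t_1} \frac{dt}{t_1} = 1
  \qquad \forall\, i = 1, \dots, N .
\]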

Lemma 1: Under Assumption 1, and for any x ∈ ℝⁿ and for any α ∈ Γ_N, we have that:

\[ \nabla_{x_{t_i}} E\big[ J(\vec{x}) \,\big|\, x_{t_i} \big] = E\left[ \sum_{j=i+1}^{N} \nabla_{x_{t_j}} J(\vec{x})^{T}\, \Upsilon(t_j) \,\Big|\, x_{t_i} \right] \]

where Υ(t) is the stochastic flow defined as:

\[ d\Upsilon(t) = J_b(x(t))\,\Upsilon(t)\,dt + \sum_{i=1}^{n} J_{\sigma_i}(x(t))\,\Upsilon(t)\,dw_i(t) \tag{24} \]

with the initial condition Υ(0) = I_{n×n}.

Proof: We assume that the functional J(~x) is continuously differentiable and therefore we will have that:

\[ \nabla_{x_{t_i}} E\big[ J(\vec{x}) \,\big|\, x_{t_i} \big] = E\big[ \nabla_{x_{t_i}} J(\vec{x}) \,\big|\, x_{t_i} \big] = E\left[ \sum_{j=i+1}^{N} \nabla_{x_{t_j}} J(\vec{x})^{T}\, \frac{dx_{t_j}}{dx_{t_i}} \,\Big|\, x_{t_i} \right] = E\left[ \sum_{j=i+1}^{N} \nabla_{x_{t_j}} J(\vec{x})^{T}\, \Upsilon(t_j) \,\Big|\, x_{t_i} \right] \]

The derivation for the case where J(~x) is not continuously differentiable can be found in [8].
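Lemma 1 suggests a direct pathwise Monte-Carlo estimator of the δ-sensitivity: differentiate the cost along each trajectory and weight it by the flow. The scalar sketch below does this for J(~x) = Σ_j q(x_{t_j}) dt; the dynamics and the cost derivative dq are illustrative assumptions, not the paper's example.

import numpy as np

def delta_sensitivity_pathwise(b, Jb, sigma, Jsigma, dq, x0, dt, N, n_paths, rng):
    """Estimate d/dx0 E[ sum_j q(x_{t_j}) dt ] via Lemma 1:
    E[ sum_j q'(x_{t_j}) * Y(t_j) dt ], with Y the stochastic flow dx_{t_j}/dx0."""
    grad = 0.0
    for _ in range(n_paths):
        x, Y, g = x0, 1.0, 0.0
        for _ in range(N):
            dw = rng.normal(0.0, np.sqrt(dt))
            x, Y = (x + b(x) * dt + sigma(x) * dw,
                    Y + Jb(x) * Y * dt + Jsigma(x) * Y * dw)
            g += dq(x) * Y * dt      # the term ∇_{x_{t_j}} J(~x)^T Υ(t_j), scalar case
        grad += g
    return grad / n_paths

rng = np.random.default_rng(2)
est = delta_sensitivity_pathwise(b=lambda x: -x, Jb=lambda x: -1.0,
                                 sigma=lambda x: 0.3, Jsigma=lambda x: 0.0,
                                 dq=lambda x: 2 * x,    # q(x) = x^2, illustrative
                                 x0=1.0, dt=0.01, N=100, n_paths=20_000, rng=rng)
print("pathwise estimate of the δ-sensitivity:", est)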

For the case where J(~x) = φ(x_{t_N}), so that J(~x) depends only on the last state of the state trajectory ~x = {x_{t_1}, ..., x_{t_N}}, we have the following proposition:

Proposition 3: Under Assumption 1, and for any x ∈ ℝⁿ and for any α ∈ Γ_N, we have that:

\[ \nabla_{x_{t_i}} E\big[ \phi(x_{t_N}) \,\big|\, x_{t_i} \big] = E\left[ \phi(x_{t_N}) \int_{t_i}^{t_N} \alpha(t) \big( \sigma(x)^{-1}\,\Upsilon(t) \big)^{T} dw(t) \right] \]

where Υ(t) is the stochastic flow defined as in (24).

Proof: By Lemma 1 and the identities of section III we have:

\[
\nabla_{x_{t_i}} E\big[ \phi(x_{t_N}) \,\big|\, x_{t_i} \big]
= E\Big[ \nabla_{x_{t_N}} \phi(x_{t_N})^{T}\, \Upsilon(t_N) \Big]
= E\left[ \nabla_{x_{t_N}} \phi(x_{t_N})^{T} \int_0^{t_N} D_s x(t_N)\,\alpha(s)\,\sigma(x(s))^{-1}\,\Upsilon(s)\,ds \right]
\]
\[
= E\left[ \nabla_{x_{t_N}} \phi(x_{t_N})^{T} \int_0^{t_N} D_t x(t_N)\,\alpha(t)\,\sigma(x(t))^{-1}\,\Upsilon(t)\,dt \right]
= E\left[ \int_0^{t_N} D_t \phi(x_{t_N})^{T}\,\alpha(t)\,\sigma(x(t))^{-1}\,\Upsilon(t)\,dt \right]
\]
\[
= E\left[ \phi(x_{t_N}) \int_0^{T} \alpha(t) \big( \sigma(x(t))^{-1}\,\Upsilon(t) \big)^{T} dw \right]
\]

where in the last line we make use of (8). Thus we have the final result:

\[ \nabla_{x_{t_i}} E\big[ \phi(x_{t_N}) \,\big|\, x_{t_i} \big] = E\big[ \phi(x_{t_N})\, v(t) \big] \]

where the term

\[ v(t) = \int_0^{T} \alpha(t) \big( \sigma(x(t))^{-1}\,\Upsilon(t) \big)^{T} dw(t) \]

forms the stochastic derivative weight.
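Proposition 3 yields a second estimator that never differentiates φ: weight φ(x_{t_N}) by the stochastic integral v. A minimal scalar sketch follows, with the simple choice α(t) = 1/t_N, which integrates to one over [0, t_N]; the terminal cost φ and the dynamics are illustrative assumptions.

import numpy as np

def delta_sensitivity_malliavin(b, Jb, sigma, Jsigma, phi, x0, dt, N, n_paths, rng):
    """Estimate d/dx0 E[ phi(x_{t_N}) ] via Proposition 3:
    E[ phi(x_{t_N}) * v ], v = ∫ α(t) (σ(x)⁻¹ Υ(t)) dw(t), with α ≡ 1/t_N."""
    T = N * dt
    total = 0.0
    for _ in range(n_paths):
        x, Y, v = x0, 1.0, 0.0
        for _ in range(N):
            dw = rng.normal(0.0, np.sqrt(dt))
            v += (1.0 / T) * (Y / sigma(x)) * dw   # Itô integral for the weight v
            x, Y = (x + b(x) * dt + sigma(x) * dw,
                    Y + Jb(x) * Y * dt + Jsigma(x) * Y * dw)
        total += phi(x) * v
    return total / n_paths

rng = np.random.default_rng(3)
est = delta_sensitivity_malliavin(b=lambda x: -x, Jb=lambda x: -1.0,
                                  sigma=lambda x: 0.3, Jsigma=lambda x: 0.0,
                                  phi=lambda x: x ** 2,   # illustrative terminal cost
                                  x0=1.0, dt=0.01, N=100, n_paths=50_000, rng=rng)
print("Malliavin-weight estimate of the δ-sensitivity:", est)

The design choice here mirrors the finance literature [7], [8]: the weight v moves the differentiation off the pay-off and onto the sampling measure, so the estimator remains usable when φ is not smooth.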

V. STOCHASTIC OPTIMAL CONTROL

In this section we will show how the δ-sensitivity is used for computing the optimal control in feedback form. More precisely, we consider stochastic optimal control in the classical sense, as a constrained optimization problem, with the cost function under minimization given by the mathematical expression:

\[ V(x) = \min_{u} E^{(1)}\big[ J(x, u) \big] = \min_{u} E^{(1)}\left[ \phi(x(t_N)) + \int_{t_0}^{t_N} \mathcal{L}(x, u, t)\,dt \right] \tag{26} \]

subject to the nonlinear stochastic dynamics:

\[ dx = F(x, u)\,dt + B(x)\,dw \tag{27} \]

with x ∈ ℝ^{n×1} denoting the state of the system, u ∈ ℝ^{p×1} the control vector and dw ∈ ℝ^{p×1} Brownian noise. The function F(x, u) is a nonlinear function of the state x and affine in the controls u, and therefore it is defined as F(x, u) = f(x) + G(x)u. The matrix G(x) ∈ ℝ^{n×p} is the control transition matrix. The running cost is quadratic in the controls:

\[ \mathcal{L}(x, u, t) = q_0(x, t) + q_1(x, t)^{T} u + \tfrac{1}{2}\, u^{T} R\, u \tag{28} \]

with R > 0 the corresponding weight. The stochastic HJB equation [5], [14] associated with this stochastic optimal control problem is expressed as follows:

\[ -\partial_t V = \min_{u} \Big( \mathcal{L} + (\nabla_x V)^{T} F + \tfrac{1}{2}\,\mathrm{tr}\big( (\nabla_{xx} V)\, B B^{T} \big) \Big) \tag{29} \]

To find the minimum, the cost function (28) is inserted into (29), and the gradient of the expression inside the parenthesis is taken with respect to the controls u and set to zero. The corresponding optimal control is given by the equation:

\[ u(x_t) = -R^{-1}\big( q_1(x, t) + G(x)^{T}\, \nabla_x V(x, t) \big) \tag{30} \]

These optimal controls push the system dynamics in the direction opposite that of the gradient of the value function ∇_x V(x, t). The value function satisfies the nonlinear, second-order PDE:

\[ -\partial_t V = \tilde{q} + (\nabla_x V)^{T} \tilde{f} - \tfrac{1}{2}\, (\nabla_x V)^{T} G R^{-1} G^{T} (\nabla_x V) + \tfrac{1}{2}\,\mathrm{tr}\big( (\nabla_{xx} V)\, B B^{T} \big) \tag{31} \]

with q̃(x, t) and f̃(x, t) defined as

\[ \tilde{q}(x, t) = q_0(x, t) - \tfrac{1}{2}\, q_1(x, t)^{T} R^{-1} q_1(x, t) \tag{32} \]

and

\[ \tilde{f}(x, t) = f(x, t) - G(x, t)\, R^{-1} q_1(x, t) \tag{33} \]

and the boundary condition V(x_{t_N}) = φ(x_{t_N}). Given the exponential transformation V(x, t) = −λ log Ψ(x, t) and the assumption λ G(x) R^{-1} G(x)^{T} = B(x) B(x)^{T} = Σ(x_t) = Σ, the resulting PDE is formulated as follows:

\[ -\partial_t \Psi = -\frac{1}{\lambda}\, \tilde{q}\, \Psi + \tilde{f}^{T} (\nabla_x \Psi) + \tfrac{1}{2}\,\mathrm{tr}\big( (\nabla_{xx} \Psi)\, \Sigma \big) \tag{34} \]

with boundary condition Ψ(x(t_N)) = exp(−(1/λ) φ(x(t_N))). Applying the Feynman-Kac lemma to the Chapman-Kolmogorov PDE (34) yields its solution in the form of an expectation over system trajectories. This solution is mathematically expressed as:

\[ \Psi(x_{t_i}) = E^{(0)}\left[ \exp\left( -\frac{1}{\lambda} \int_{t_i}^{t_N} \tilde{q}(x)\,dt \right) \Psi(x_{t_N}) \right] \tag{35} \]

The expectation E^{(0)} is taken over sample paths ~x_i = (x_i, ..., x_{t_N}) generated by forward sampling of the uncontrolled diffusion equation:

\[ dx = \tilde{f}(x_t)\,dt + B(x)\,dw \tag{36} \]

The optimal controls are then specified as:

\[ u_{PI}(x, t) = -R^{-1}\left( q_1(x, t) - \lambda\, G(x)^{T}\, \frac{\nabla_x \Psi(x, t)}{\Psi(x, t)} \right) \tag{37} \]
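The control law (37) can be checked numerically by combining forward sampling of the uncontrolled dynamics (36) with the Feynman-Kac average (35). In the scalar sanity-check sketch below, ∇_x Ψ is taken by central differences for simplicity; the paper's δ-sensitivity estimators above are the sampling-based alternative. The functions f̃, B, q̃, φ and the constants λ, G, R are illustrative assumptions, with B chosen so that λ G R⁻¹ Gᵀ = B Bᵀ holds as the transformation requires; this is not the authors' algorithm.

import numpy as np

def desirability(x0, f_tilde, B, q_tilde, phi, lam, dt, N, n_paths, rng):
    """Monte-Carlo evaluation of Ψ(x0) via eq. (35), sampling the
    uncontrolled dynamics dx = f̃ dt + B dw of eq. (36)."""
    x = np.full(n_paths, x0, dtype=float)
    S = np.zeros(n_paths)                       # accumulated integral of q̃
    for _ in range(N):
        S += q_tilde(x) * dt
        x += f_tilde(x) * dt + B(x) * rng.normal(0.0, np.sqrt(dt), n_paths)
    return np.mean(np.exp(-(S + phi(x)) / lam))

def u_pi(x0, q1, G, R, eps=1e-2, **kw):
    """Control law (37) with a central-difference ∇_x Ψ."""
    psi_p = desirability(x0 + eps, **kw)        # independent samples; common random
    psi_m = desirability(x0 - eps, **kw)        # numbers would reduce the variance
    psi, grad_psi = 0.5 * (psi_p + psi_m), (psi_p - psi_m) / (2 * eps)
    return -(1.0 / R) * (q1 - kw["lam"] * G * grad_psi / psi)

rng = np.random.default_rng(4)
lam = 0.1
kw = dict(f_tilde=lambda x: -x,                 # illustrative f̃
          B=lambda x: np.sqrt(lam),             # λ G R⁻¹ Gᵀ = B Bᵀ with G = R = 1
          q_tilde=lambda x: x ** 2, phi=lambda x: 0.5 * x ** 2,
          lam=lam, dt=0.01, N=100, n_paths=200_000, rng=rng)
print("u_PI(x=1.0):", u_pi(1.0, q1=0.0, G=1.0, R=1.0, **kw))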

Since the initial value of the function V(x, t) is the minimum of the expectation of the objective function J(x, u) subject to the controlled stochastic dynamics in (27), it can be trivially shown that:

\[ V(x, t_i) = -\lambda \log E^{(0)}\left[ \exp\left( -\frac{1}{\lambda} \int_{t_i}^{t_N} \tilde{q}(x)\,dt \right) \Psi(x_{t_N}) \right] \le E^{(1)}\big[ J(x, u) \big] \tag{38} \]

The expectation E^{(1)} is taken over sample paths ~x_i = (x_i, ..., x_{t_N}) generated by forward sampling of the controlled diffusion equation in (27). A closer look into the feedback control law in (37) reveals the use of the δ-sensitivity. In particular, the feedback control involves the term ∇_x Ψ(x, t), which corresponds to the δ-Greek sensitivity of the function Ψ(x, t) with respect to the initial state of the diffusion in (36). The analysis above is summarized by the following theorem:

Theorem 1: Consider the stochastic optimal control problem with performance criterion and stochastic dynamics expressed as in (26), (28) and (27), with the terms q_0(x, t), q_1(x, t), f(x), B(x), G(x) continuously differentiable, f(x), B(x), G(x) having bounded Lipschitz derivatives, and q̃(x, t) and f̃(x, t) given by (32) and (33). Under the assumption of twice continuous differentiability of the value function V(x), the optimal control law is expressed as:

\[ u_{PI}(x_{t_i}, t_i) = -R^{-1} q_1(x_{t_i}, t_i) + R^{-1}\, \frac{\lambda\, G(x_{t_i})^{T}}{\Psi(x_{t_i}, t_i)}\; E\left[ \sum_{j=i+1}^{N} \Upsilon(t_j)^{T}\, \nabla_{x(t_j)} J(\vec{x}) \right] \]

The term J(~x) is defined as J(~x) = exp( −(1/λ) ∫_{t_i}^{t_N} q̃(x(s)) ds ), while Υ(t) = dx(t)/dx(t_i), ∀t > t_i, is the stochastic flow:

\[ d\Upsilon(t) = J_{\tilde{f}}(x(t))\,\Upsilon(t)\,dt + \sum_{k=1}^{n} J_{B_k}(x(t))\,\Upsilon(t)\,dw_k(t) \]

that corresponds to the stochastic differential equation:

\[ dx = \tilde{f}(x_t)\,dt + B(x)\,dw \]

VI. CONCLUSIONS

We derived the optimal control law by using the δ-sensitivity. Consider the case of estimating the expectation of a cost function over state paths generated by sampling of a diffusion process. Essentially the δ-sensitivity corresponds to the differential of the expected cost function with respect to the initial state of the underlying diffusion process. Future research will involve the development of efficient algorithms for applications to dynamical systems in robotics and biology. Another line of research involves the use of the stochastic calculus of variations for optimal control of Markov jump diffusion processes.

REFERENCES

[1] Paolo Dai Pra, Lorenzo Meneghini, and Wolfgang Runggaldier. Connections between stochastic control and dynamic games. Mathematics of Control, Signals, and Systems (MCSS), 9(4):303–326, 1996.
[2] Mark H. A. Davis and Martin P. Johansson. Malliavin Monte Carlo Greeks for jump diffusions. Stochastic Processes and their Applications, 116(1):101–129, January 2006.
[3] W. H. Fleming and W. M. McEneaney. Risk-sensitive control on an infinite time horizon. SIAM J. Control Optim., 33:1881–1915, November 1995.
[4] W. H. Fleming and H. Mete Soner. Controlled Markov Processes and Viscosity Solutions. Applications of Mathematics. Springer, New York, 1st edition, 1993.
[5] W. H. Fleming and H. Mete Soner. Controlled Markov Processes and Viscosity Solutions. Applications of Mathematics. Springer, New York, 2nd edition, 2006.
[6] W. H. Fleming. Exit probabilities and optimal stochastic control. Applied Math. Optim., 9:329–346, 1971.
[7] Eric Fournié, Jean-Michel Lasry, Pierre-Louis Lions, and Jérôme Lebuchoux. Applications of Malliavin calculus to Monte-Carlo methods in finance. II. Finance and Stochastics, 5(2):201–236, 2001.
[8] Eric Fournié, Jean-Michel Lasry, Pierre-Louis Lions, Jérôme Lebuchoux, and Nizar Touzi. Applications of Malliavin calculus to Monte Carlo methods in finance. Finance and Stochastics, 3(4):391–412, 1999.
[9] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 11:P11011, 2005.
[10] H. J. Kappen. An introduction to stochastic control theory, path integrals and reinforcement learning. In J. Marro, P. L. Garrido, and J. J. Torres, editors, Cooperative Behavior in Neural Systems, volume 887 of American Institute of Physics Conference Series, pages 149–181, February 2007.
[11] Sanjoy K. Mitter and Nigel J. Newton. A variational approach to nonlinear estimation. SIAM J. Control Optim., 42(5):1813–1833, May 2003.
[12] Giulia Di Nunno, Bernt Øksendal, and Frank Proske. Malliavin Calculus for Lévy Processes with Applications to Finance. Springer, 1st edition, 2009.
[13] D. Ocone and I. Karatzas. A generalized Clark representation formula, with applications to optimal portfolios. Stochastics, 37:187–220, 1991.
[14] Robert F. Stengel. Optimal Control and Estimation. Dover Books on Advanced Mathematics. Dover Publications, New York, 1994.
[15] E. Todorov. Linearly-solvable Markov decision problems. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19 (NIPS 2007), Vancouver, BC, 2007. Cambridge, MA: MIT Press.
[16] E. Todorov. Efficient computation of optimal actions. Proc Natl Acad Sci U S A, 106(28):11478–83, 2009.