INTRODUCTION to the OPTIMAL CONTROL THEORY and SOME APPLICATIONS Contents 1. Introduction 1 2. Calculus of Variations and the Eu

INTRODUCTION TO THE OPTIMAL CONTROL THEORY AND SOME APPLICATIONS YUTONG QING Abstract. This paper aims to give a brief introduction to the optimal control theory and attempts to derive some of the central results of the subject, in- cluding the Hamilton-Jacobi-Bellman PDE and the Pontryagin Maximal Prin- ciple. Along the way, some of the more rigorous mathematical tools, such as Hamilton-Jacobi equations, viscosity solutions for PDEs, and the method of characteristics, will be introduced. Finally, some particular examples will be studied at the end of the paper using the developed theorems. Contents 1. Introduction 1 2. Calculus of Variations and the Euler-Lagrange Equations 3 3. Hamilton Equations 4 4. Hamilton-Jacobi Equations 5 5. Viscosity Solution for the Hamilton-Jacobi Equation 7 6. The Optimal Control Theory and the Hamilton-Jacobi-Bellman Equation 12 7. The Pontryagin Maximum Principle 15 8. Applications 17 8.1. Example: The Game \Tag" 17 Acknowledgments 18 References 18 1. Introduction We begin by defining control function, controlled dynamical systems, and the payoff functional of a controlled dynamical system. Definition 1.1. A control function is a function α(t) of time that maps time t 2 [0; +1) to a set A, where A is called the set of admissible controls. Definition 1.2. A controlled dynamical system can be characterized by its dynamics x_ (t) = f(x(t); α(t); t) and the initial state of the system x(0) = x0 where x 2 Rn, and the function f maps Rn × A × [0; +1) to Rn. Date: AUGUST 31, 2019. 1 2 YUTONG QING A dynamical system is called autonomous if it doesn't explicitly depend on time, i.e. its dynamics can be written as x_ (t) = f(x(t); α(t)): In this case, the function f maps Rn × A to Rn. Definition 1.3. The payoff functional of a dynamical system is of the form Z T P[α] = r(x(t); α(t)) dt + g(x(T )) 0 where the function r(x; α) is associated with the \running payoff” of the system and g(x) is related to the \terminal payoff” of the system. The optimal control theory aims to solve the problem of finding a control for a certain autonomous dynamical system that will make the payoff functional of the system P[α] attains its maximum value. Written out in mathematical notation, given a controlled dynamical system ( x_ (t) = f(x(t); α(t)) x(0) = x0 we want to find such an α∗(t) so that for all possible control functions α(t), we have P[α∗(t)] ≥ P[α(t)]: It is worth noting that in practice, usually we will have more requirements on the regularity for f(x; a), r(x; a), and g(x), like being Lipschitz continuous. Readers may better understand the concepts by viewing a toy example. Suppose there is a person originally at the origin of the number line, and this person will be awarded if he stays close to the origin by the end of 1 minute of time. If this person has maximal speed 1 meter per minute, then what is his best strategy to get awarded the most? In mathematical language, the dynamical system will be ( x_ (t) = α(t) x(0) = 0: The admissible set of control is A = B1(0); the running payoff function r(x; a) = 0; the terminal payoff function is g(x) = −|xj. Thus, the payoff functional is P[α] = −|x(1)j; and we want to maximize this functional. Therefore, in general terms, the key idea of optimal control is to find a function that maximizes a certain functional, i.e. an infinite dimensional optimization problem. Usually this type of problem can be resolved using a tool called the calculus of variations, so in Section 2 to Section 5, we will introduce the basic ideas in the calculus of variations, namely the Euler-Lagrange equations, the Hamilton equations, and the Hamilton-Jacobi equations. In Section 5 and 6, we will introduce a type of weak solutions for the Hamilton-Jacobi PDEs called \viscosity solutions" and then build the connection between viscosity solutions and optimal control theory and derive a sufficient condition for a system to have an optimal control. In Section 7, we will use a method called the method of characteristics to obtain necessary conditions for a control system to have optimal control, namely the Pontryagin INTRODUCTION TO THE OPTIMAL CONTROL THEORY AND SOME APPLICATIONS 3 Maximum Principle. Finally, we will apply these results to solve a toy example of an optimal control problem. 2. Calculus of Variations and the Euler-Lagrange Equations Suppose we are given a functional Z T (2.1) I[x(t)] = L(x_ (t); x(t); t) dt 0 where we assume x is a C1 function that maps time t 2 [0;T ] to Rn, and L is called the Lagrangian which is a real-valued function of variables x_ , x, and t, with x_ (t) being the time derivative of x(t), i.e. L(x_ ; x; t) := Rn × Rn × [0;T ] 7! R. Note that x is the second variable that gets plugged into L and also a path. If we want to find the extremal of a classical real value function with finite dimensional domain, one thing that we will usually do is to calculate the directional derivatives of that function and then find the point that makes all the directional derivatives be 0. In the infinite dimensional case, the procedure will be fairly similar, and that leads to the following theorem. Theorem 2.2 (Euler-Lagrange Equations). Suppose L is C2, and x : [0;T ] ! Rn. Then if x(t) is an extremal of I(·), we have the following equation d − [r L(x_ (t); x(t); t)] + r L(x_ (t); x(t); t) = 0: dt x_ x Proof. Suppose we have a smooth test function φ(t) 2 C1, where φ(0) = φ(T ) = 0. Because x is a local extremal of I(·), then if we perturb the original function x by a small function kφ with k small, the resulting functional I[x+kφ] should be roughly unchanged. In other words, we should have d [I(x + kφ)] = 0: dk k=0 Now, we just compute the derivative, and by Leibniz's rule we may exchange the integration sign with the derivative sign. Thus, we have for each coordinate of x Z T dL(x_ + kφ_ ; x + kφ ; t) i i i i dt = 0 for each i: 0 dk Hence, by the fact that we evaluate at k = 0, we get Z T @L @L _ φi + φi dt = 0: 0 @x_i @xi Then, we can use the integration by parts formula and the fact that φ is smooth and vanishes at the boundary to get T Z T Z T @L d @L @L 0 = φi − ( )φi dt + φi dt: @x_i 0 0 dt @x_i 0 @xi Consequently, Z T d @L @L φi(− ( ) + ) dt = 0: 0 dt @x_i @xi Also, by the fact that the choice of the test function is arbitrary, we have that for each coordinate d @L @L − ( ) + = 0: dt @x_i @xi 4 YUTONG QING Finally, we have the Euler-Lagrange equation d − [r L(x_ (t); x(t); t)] + r L(x_ (t); x(t); t) = 0: dt x_ x We notice that the Euler-Lagrange equation is a second-order partial differential equation, which is somehow more challenging to handle compared with a system of ordinary differential equations. Therefore, it will be of our benefit to transform the Euler-Lagrange equation to a system of ordinary differential equations. In the next section we shall show how to obtain the desired system of first order ordinary differential equations. 3. Hamilton Equations We first begin this section by defining the conjugate variable p(t) and the Hamiltonian H. Definition 3.1. The conjugate variable p(t) is defined to be p(t) = rx_ L(x_ (t); x(t); t): From now on, we assume that the function x_ (t) can be express as a function of p, x and t. The rigorous proof of this fact can be found in Evans' book \Partial Differential Equations", and the intuition here is that x_ is defined implicitly by p, x and t. Definition 3.2. The Hamiltonian H is defined to be H(p; x; t) = p · x_ − L(x_ ; x; t) with x_ being a function of p, x and t. We have the following theorem for the following system of ordinary differential equations. Theorem 3.3 (Hamilton Equations). Define the conjugate variable p and the Hamiltonian H as above. Then these two variables satisfy the following equations. ( x_ (t) = rpH(p(t); x(t); t) p_ (t) = −∇xH(p(t); x(t); t): Also, we have d H(p(t); x(t)) ≡ 0 dt if H doesn't explicitly depend on t. Proof. We first compute rpH, and along a specific trajectory we get rpH(p(t); x(t); t) = rp[p(t) · x_ (p; x; t) − L(x_ (p; x; t); x(t); t)] = x_ (p; x; t) + p(t) · rpx_ (p; x; t) − rx_ L · rpx_ (p; x; t): However, recall that p(t) is defined to be rx_ L, so we have rpH(p(t); x(t); t) = x_ (p; x; t) = x_ (t): INTRODUCTION TO THE OPTIMAL CONTROL THEORY AND SOME APPLICATIONS 5 Then we compute rxH, which yields rxH(p(t); x(t); t) = rx[p(t) · x_ (p; x; t) − L(x_ (p; x; t); x(t); t)] = p(t) · rxx_ (p; x; t) − rx_ L · rxx_ (p; x; t) − rxL = −∇xL: To complete the proof, it suffices to show that p_ (t) = rxL, and this fact follows immediately from the Euler-Lagrange Equations d − r L + r L = 0 dt x_ x d because p_ (t) = dt rx_ L.

Load more