Applications of Stochastic Optimal Control to Economics and Finance


A Service of econstor (ZBW – Leibniz Information Centre for Economics). Make Your Publications Visible.

Federico, Salvatore (Ed.); Ferrari, Giorgio (Ed.); Regis, Luca (Ed.)

Book — Published Version
Applications of Stochastic Optimal Control to Economics and Finance

Provided in Cooperation with: MDPI – Multidisciplinary Digital Publishing Institute, Basel

Suggested Citation: Federico, Salvatore (Ed.); Ferrari, Giorgio (Ed.); Regis, Luca (Ed.) (2020): Applications of stochastic optimal control to economics and finance, ISBN 978-3-03936-059-8, MDPI, Basel, http://dx.doi.org/10.3390/books978-3-03936-059-8

This Version is available at: http://hdl.handle.net/10419/230540

Terms of use: Documents in EconStor may be saved and copied for your personal and scholarly purposes. You are not to copy documents for public or commercial purposes, to exhibit the documents publicly, to make them publicly available on the internet, or to distribute or otherwise use the documents in public. If the documents have been made available under an Open Content Licence (especially Creative Commons Licences), you may exercise further usage rights as specified in the indicated licence.
https://creativecommons.org/licenses/by/4.0/ www.econstor.eu

Applications of Stochastic Optimal Control to Economics and Finance

Edited by Salvatore Federico, Giorgio Ferrari and Luca Regis

Printed Edition of the Special Issue Published in Risks
www.mdpi.com/journal/risks

Special Issue Editors: Salvatore Federico (University of Siena, Italy), Giorgio Ferrari (Bielefeld University, Germany), Luca Regis (University of Torino, Italy)

Editorial Office: MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal Risks (ISSN 2227-9091) (available at: https://www.mdpi.com/journal/risks/special issues/Stochastic Optimal Control).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below: LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Article Number, Page Range.

ISBN 978-3-03936-058-1 (Hbk)
ISBN 978-3-03936-059-8 (PDF)

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

Contents

About the Special Issue Editors
Preface to "Applications of Stochastic Optimal Control to Economics and Finance" . . . . . ix

Jason S. Anquandah and Leonid V. Bogachev
Optimal Stopping and Utility in a Simple Model of Unemployment Insurance
Reprinted from: Risks 2019, 7, 94, doi:10.3390/risks7030094 . . . . . 1

Francesco Rotondi
American Options on High Dividend Securities: A Numerical Investigation
Reprinted from: Risks 2019, 7, 59, doi:10.3390/risks7020059 . . . . . 43

Matteo Brachetta and Claudia Ceci
Optimal Excess-of-Loss Reinsurance for Stochastic Factor Risk Models
Reprinted from: Risks 2019, 7, 48, doi:10.3390/risks7020048 . . . . . 63

John Moriarty and Jan Palczewski
Imbalance Market Real Options and the Valuation of Storage in Future Energy Systems
Reprinted from: Risks 2019, 7, 39, doi:10.3390/risks7020039 . . . . . 87

Zbigniew Palmowski, Łukasz Stettner and Anna Sulima
Optimal Portfolio Selection in an Itô–Markov Additive Market
Reprinted from: Risks 2019, 7, 34, doi:10.3390/risks7010034 . . . . . 117

Carmine De Franco, Johann Nicolle and Huyên Pham
Dealing with Drift Uncertainty: A Bayesian Learning Approach
Reprinted from: Risks 2019, 7, 5, doi:10.3390/risks7010005 . . . . . 149

Abel Cadenillas and Ricardo Huamán-Aguilar
On the Failure to Reach the Optimal Government Debt Ceiling
Reprinted from: Risks 2018, 6, 138, doi:10.3390/risks6040138 . . . . . 167

About the Special Issue Editors

Salvatore Federico is an Associate Professor of mathematics for economics and finance at the University of Siena. He holds a PhD from Scuola Normale Superiore in the field of financial mathematics.

Giorgio Ferrari is an Associate Professor of mathematical finance at the Center for Mathematical Economics at Bielefeld University. He holds a PhD in mathematics for economic-financial applications from the University of Rome "La Sapienza".
His main research interests are the theory and application of singular stochastic control, optimal stopping, and stochastic games.

Luca Regis is an Associate Professor of mathematics for finance and insurance at the University of Torino, ESOMAS Department. His research interests are in the interplay between financial and actuarial mathematics, and include corporate finance.

Preface to "Applications of Stochastic Optimal Control to Economics and Finance"

In a world dominated by uncertainty, the modeling and understanding of the optimal behavior of agents are of the utmost importance. Many problems in economics, finance, and actuarial science naturally require decision-makers to undertake choices in stochastic environments. Examples include optimal individual consumption and retirement choices, optimal management of portfolios and risk, hedging, and optimal timing issues in pricing American options or in investment decisions. Stochastic control theory provides the methods and results to tackle all such problems.

This book collects the papers published in the 2019 Special Issue of Risks, "Applications of Stochastic Optimal Control to Economics and Finance", and contains seven peer-reviewed papers dealing with stochastic control models motivated by important questions in economics and finance. Each model is rigorously founded and treated mathematically, and numerical methods are employed where needed to derive the optimal solution. The topics of the book's chapters range from optimal public debt management to optimal reinsurance, real options in energy markets, and optimal portfolio choice in partial and complete information settings, just to mention a few. From a mathematical point of view, techniques and arguments from dynamic programming theory, filtering theory, optimal stopping, one-dimensional diffusions, and multi-dimensional jump processes are used.
The paper by Anquandah and Bogachev deals with unemployment insurance and proposes a simple model of the optimal entrance of an agent into a scheme. The paper analyzes individual decisions regarding the optimal entry time and the dependence of the optimal solution on macroeconomic variables and individual preferences.

In the second paper of the Special Issue, Rotondi documents a bias that may arise when American options are valued using the traditional least squares method. The estimates on which this method is based rely on a regression run on the in-the-money paths of the Monte Carlo simulation. Therefore, large biases may occur if, for instance, the option is far out of the money. The paper proposes two approaches that completely overcome this problem and evaluates their performance against the standard least squares method.

In the third paper, Ceci and Brachetta study optimal excess-of-loss reinsurance problems when both the intensity of the claims' arrival and the claims' size are influenced by an exogenous stochastic factor. The model allows for stochastic risk premia, which take risk fluctuations into account. Using stochastic control theory based on the Hamilton–Jacobi–Bellman equation, the authors analyze the optimal reinsurance strategy under the criterion of maximizing the expected exponential utility of terminal wealth.

In the fourth paper, Moriarty and Palczewski study a real option problem arising in the context of energy markets. They assess the real option value of an arrangement under which an autonomous energy-limited storage unit sells incremental balancing reserve. The problem is set as a perpetual American swing put option with random refraction times.

In the fifth paper, Palmowski, Stettner, and Sulima study a portfolio selection problem in a continuous-time Itô–Markov additive market, where the prices of financial assets are described by Markov additive processes that combine Lévy processes and regime-switching models.
The model takes into account two sources of risk: the jump-diffusion risk and the regime-switching risk.
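Rotondi's two corrective algorithms are not reproduced in this front matter. For orientation, the baseline he critiques, the Longstaff-Schwartz least squares Monte Carlo (LSM) method, can be sketched as follows. This is a minimal stdlib-only sketch with illustrative parameters (S0 = 36, K = 40, r = 0.06, sigma = 0.2, T = 1), not the paper's implementation; `quad_fit` is a hypothetical helper that fits the quadratic continuation-value regression on the in-the-money paths.

```python
import math
import random

def quad_fit(xs, ys):
    """Least-squares fit y ~ a + b*x + c*x^2 via the 3x3 normal equations."""
    n = len(xs)
    s1 = sum(xs); s2 = sum(x * x for x in xs)
    s3 = sum(x ** 3 for x in xs); s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys); t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    m = [[n, s1, s2, t0], [s1, s2, s3, t1], [s2, s3, s4, t2]]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= f * m[col][k]
    coef = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        coef[row] = (m[row][3] - sum(m[row][k] * coef[k]
                                     for k in range(row + 1, 3))) / m[row][row]
    return coef

def lsm_american_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2,
                     T=1.0, steps=50, n_paths=5000, seed=0):
    rng = random.Random(seed)
    dt = T / steps
    disc = math.exp(-r * dt)
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    # Simulate geometric Brownian motion paths, keeping every time slice.
    paths = [[s0] * n_paths]
    for _ in range(steps):
        paths.append([s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
                      for s in paths[-1]])
    # Cash flows at maturity are the terminal put payoffs.
    cf = [max(strike - s, 0.0) for s in paths[-1]]
    # Backward induction: regress only on in-the-money paths, as in LSM.
    for t in range(steps - 1, 0, -1):
        cf = [v * disc for v in cf]          # discount back to time t
        st = paths[t]
        itm = [i for i in range(n_paths) if strike > st[i]]
        if len(itm) < 3:
            continue
        a, b, c = quad_fit([st[i] for i in itm], [cf[i] for i in itm])
        for i in itm:
            if strike - st[i] > a + b * st[i] + c * st[i] ** 2:
                cf[i] = strike - st[i]       # early exercise beats continuation
    return disc * sum(cf) / n_paths          # discount the first step to t = 0

price = lsm_american_put()
```

The regression is run only on in-the-money paths, which is exactly the feature that can bias the estimate when few paths are in the money, for example for far out-of-the-money options; that is the situation Rotondi's paper examines.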
Recommended publications
  • PROPT Product Sheet
    PROPT - The world's fastest Optimal Control platform for MATLAB. PROPT - ONE OF A KIND, LIGHTNING FAST SOLUTIONS TO YOUR OPTIMAL CONTROL PROBLEMS! NOW WITH WELL OVER 100 TEST CASES! The PROPT software package is intended to solve dynamic optimization problems. Such problems are usually described by:
    • A state-space model of a system. This can be either a set of ordinary differential equations (ODE) or differential algebraic equations (DAE).
    • Initial and/or final conditions (sometimes also conditions at other points).
    • A cost functional, i.e. a scalar value that depends on the state trajectories and the control function.
    • Sometimes, additional equations and variables that, for example, relate the initial and final conditions to each other.
    When using PROPT, optimally coded analytical first and second order derivatives, including problem sparsity patterns, are automatically generated, thereby making it the first MATLAB package able to fully utilize NLP (and QP) solvers such as KNITRO, CONOPT, SNOPT and CPLEX. PROPT currently uses Gauss or Chebyshev-point collocation for solving optimal control problems. However, the code is written in a more general way, allowing for a DAE rather than an ODE formulation. Parameter estimation problems are also possible to solve. The goal of PROPT is to make it possible to input such problem descriptions as simply as possible, without having to worry about the mathematics of the actual solver. PROPT has three main functions:
    • Computation of the constant matrices used for the differentiation and integration of the polynomials used to approximate the solution to the trajectory optimization problem.
    • Source transformation to turn user-supplied expressions into optimized MATLAB code for the cost function f and constraint
  • STOCHASTIC CONTROL
    STOCHASTIC CONTROL. NAFTALI HARRIS. Abstract. In this paper we discuss the Stochastic Control Problem, which deals with the best way to control a stochastic system and has applications in fields as diverse as robotics, finance, and industry. After quickly reviewing mathematical probability and stochastic integration, we prove the Hamilton-Jacobi-Bellman equations for stochastic control, which transform the Stochastic Control Problem into a partial differential equation. Contents: 1. Introduction; 2. Mathematical Probability; 3. Stochastic Calculus; 4. Diffusions; 5. The Stochastic Control Problem; 6. The Hamilton-Jacobi-Bellman Equation; 7. Acknowledgements; References. 1. Introduction. The Stochastic Control Problem arises when we wish to control a continuous-time random system optimally. For example, we wish to fly an airplane through turbulence, or manage a stock portfolio, or precisely maintain the temperature and pressure of a system in the presence of random air currents and temperatures. To build up to a formal definition of this problem, we will first review mathematical probability and stochastic calculus. These will give us the tools to talk about Itô processes and diffusions, which will allow us to state the Stochastic Control Problem formally. Finally, we will spend the latter half of this paper proving necessary conditions and sufficient conditions for its solution. 2. Mathematical Probability. In this section we briskly define the mathematical model of probability required in the rest of this paper. We define probability spaces, random variables, and conditional expectation. For more detail, see Williams [4]. Definition 2.1 (Probability Space). A probability space is a triple (Ω, F, P), where Ω is an arbitrary set, F is a sigma algebra over Ω, and P is a measure on sets in F with P(Ω) = 1.
  • Optimal Control Theory Version 0.2
    An Introduction to Mathematical Optimal Control Theory Version 0.2 By Lawrence C. Evans Department of Mathematics University of California, Berkeley Chapter 1: Introduction Chapter 2: Controllability, bang-bang principle Chapter 3: Linear time-optimal control Chapter 4: The Pontryagin Maximum Principle Chapter 5: Dynamic programming Chapter 6: Game theory Chapter 7: Introduction to stochastic control theory Appendix: Proofs of the Pontryagin Maximum Principle Exercises References 1 PREFACE These notes build upon a course I taught at the University of Maryland during the fall of 1983. My great thanks go to Martino Bardi, who took careful notes, saved them all these years and recently mailed them to me. Faye Yeager typed up his notes into a first draft of these lectures as they now appear. Scott Armstrong read over the notes and suggested many improvements: thanks, Scott. Stephen Moye of the American Math Society helped me a lot with AMSTeX versus LaTeX issues. My thanks also to Atilla Yilmaz for spotting lots of typos and errors, which I have corrected. I have radically modified much of the notation (to be consistent with my other writings), updated the references, added several new examples, and provided a proof of the Pontryagin Maximum Principle. As this is a course for undergraduates, I have dispensed in certain proofs with various measurability and continuity issues, and as compensation have added various critiques as to the lack of total rigor. This current version of the notes is not yet complete, but meets I think the usual high standards for material posted on the internet. Please email me at [email protected] with any corrections or comments.
  • MARKOV DECISION PROCESSES
    July 06, 2010, 15:00. Draft, don't distribute. MARKOV DECISION PROCESSES. NICOLE BÄUERLE AND ULRICH RIEDER. Abstract: The theory of Markov Decision Processes is the theory of controlled Markov chains. Its origins can be traced back to R. Bellman and L. Shapley in the 1950's. During the decades of the last century this theory has grown dramatically. It has found applications in various areas like e.g. computer science, engineering, operations research, biology and economics. In this article we give a short introduction to parts of this theory. We treat Markov Decision Processes with finite and infinite time horizon, where we will restrict the presentation to the so-called (generalized) negative case. Solution algorithms like Howard's policy improvement and linear programming are also explained. Various examples show the application of the theory. We treat stochastic linear-quadratic control problems, bandit problems and dividend pay-out problems. AMS 2010 Classification: 90C40, 60J05, 93E20. Keywords and Phrases: Markov Decision Process, Markov Chain, Bellman Equation, Policy Improvement, Linear Programming. 1. Introduction. Do you want to play a card game? Yes? Then I will tell you how it works. We have a well-shuffled standard 32-card deck, which is also known as a piquet deck. 16 cards are red and 16 cards are black. Initially the card deck lies on the table face down. Then I start to remove the cards and you are able to see their faces. At some point you have to say "stop". If the next card is black you win 10 Euro; if it is red you lose 10 Euro.
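The card game above can be checked with a short dynamic program. The sketch below is my own illustration, assuming you must say "stop" at some point before the deck runs out; `value(red, black)` is the optimal expected payoff with the given numbers of cards remaining, computed in exact rational arithmetic.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def value(red, black):
    """Optimal expected payoff (in Euro) with `red` red and `black` black cards left."""
    if black == 0:
        return Fraction(-10)      # only red cards remain: stopping loses 10
    if red == 0:
        return Fraction(10)       # only black cards remain: stopping wins 10
    # Stop now: the next card is black with probability black/(red + black).
    stop = Fraction(10 * (black - red), red + black)
    # Continue: the revealed card is red or black, shrinking the deck.
    cont = (Fraction(red, red + black) * value(red - 1, black)
            + Fraction(black, red + black) * value(red, black - 1))
    return max(stop, cont)

game_value = value(16, 16)
```

For the full 16+16 deck the value comes out to exactly 0: by a martingale argument, stopping and continuing are equally good in every state, so no stopping rule beats even money.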
  • On Some Recent Aspects of Stochastic Control and Their Applications
    Probability Surveys Vol. 2 (2005) 506–549. ISSN: 1549-5787. DOI: 10.1214/154957805100000195. arXiv:math/0509711v3 [math.PR] 12 Jan 2006. On some recent aspects of stochastic control and their applications. Huyên Pham, Laboratoire de Probabilités et Modèles Aléatoires, CNRS, UMR 7599, Université Paris 7, e-mail: [email protected], and CREST. Abstract: This paper is a survey on some recent aspects and developments in stochastic control. We discuss the two main historical approaches, Bellman's optimality principle and Pontryagin's maximum principle, and their modern exposition with viscosity solutions and backward stochastic differential equations. Some original proofs are presented in a unifying context including degenerate singular control problems. We emphasize key results on characterization of optimal control for diffusion processes, with a view towards applications. Some examples in finance are detailed with their explicit solutions. We also discuss numerical issues and open questions. AMS 2000 subject classifications: 93E20, 49J20, 49L20, 60H30. Keywords and phrases: Controlled diffusions, dynamic programming, maximum principle, viscosity solutions, backward stochastic differential equations, finance. Received December 2005. This is an original survey paper. I would like to thank Philip Protter for his encouragement to write this paper. Contents: 1. Introduction; 2. The problem and discussion on methodology; 2.1 Problem formulation; 2.2 Bellman's optimality principle; 2.2.1 The Hamilton-Jacobi-Bellman (in)equation; 2.2.2 The classical verification approach; 2.3 Pontryagin's maximum principle; 2.4 Other control problems; 2.4.1 Random horizon
  • Ensemble Kalman Filtering for Inverse Optimal Control Andrea Arnold and Hien Tran, Member, SIAM
    Ensemble Kalman Filtering for Inverse Optimal Control. Andrea Arnold and Hien Tran, Member, SIAM. Abstract: Solving the inverse optimal control problem for discrete-time nonlinear systems requires the construction of a stabilizing feedback control law based on a control Lyapunov function (CLF). However, there are few systematic approaches available for defining appropriate CLFs. We propose an approach that employs Bayesian filtering methodology to parameterize a quadratic CLF. In particular, we use the ensemble Kalman filter (EnKF) to estimate parameters used in defining the CLF within the control loop of the inverse optimal control problem formulation. Using the EnKF in this setting provides a natural link between uncertainty quantification and optimal design and control, as well as a novel and intuitive way to find the one control out of an ensemble that stabilizes the system the fastest. Results are demonstrated on both a linear and nonlinear test problem. In the Bayesian framework, unknown parameters are modeled as random variables with probability density functions representing distributions of possible values. The EnKF is a nonlinear Bayesian filter which uses ensemble statistics in combination with the classical Kalman filter equations for state and parameter estimation [9]–[11]. The EnKF has been employed in many settings, including weather prediction [12], [13] and mathematical biology [11]. To the authors' knowledge, this is the first proposed use of the EnKF in inverse optimal control problems. The novelty of using the EnKF in this setting allows us to generate an ensemble of control laws, from which we can then select the control law that drives the system to zero the fastest.
  • Stochastic Control
    Stochastic Control. Roger Brockett. April 3, 2009. PREFACE. Teaching stochastic processes to students whose primary interests are in applications has long been a problem. On one hand, the subject can quickly become highly technical and if mathematical concerns are allowed to dominate there may be no time available for exploring the many interesting areas of applications. On the other hand, the treatment of stochastic calculus in a cavalier fashion leaves the student with a feeling of great uncertainty when it comes to exploring new material. Moreover, the problem has become more acute as the power of the differential equation point of view has become more widely appreciated. In these notes, an attempt is made to resolve this dilemma with the needs of those interested in building models and designing algorithms for estimation and control in mind. The approach is to start with Poisson counters and to identify the Wiener process with a certain limiting form. We do not attempt to define the Wiener process per se. Instead, everything is done in terms of limits of jump processes. The Poisson counter and differential equations whose right-hand sides include the differential of Poisson counters are developed first. This leads to the construction of sample path representations of a continuous time jump process using Poisson counters. This point of view leads to an efficient problem solving technique and permits a unified treatment of time varying and nonlinear problems. More importantly, it provides sound intuition for stochastic differential equations and their uses without allowing the technicalities to dominate. In treating estimation theory, the conditional density equation is given a central role.
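As a companion to the Poisson-counter viewpoint, here is a minimal simulation sketch (my own illustration, not code from the notes): a Poisson counter is sampled from its exponential inter-arrival times, and the identity E[N(T)] = Var[N(T)] = rate * T is checked empirically.

```python
import random

def poisson_counter_jumps(rate, T, rng):
    """Jump times of a Poisson counter N(t) on [0, T]:
    inter-arrival times are i.i.d. exponential with the given rate."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > T:
            return times
        times.append(t)

rng = random.Random(42)
rate, horizon, n_paths = 3.0, 2.0, 20000
counts = [len(poisson_counter_jumps(rate, horizon, rng)) for _ in range(n_paths)]
mean_n = sum(counts) / n_paths
var_n = sum((c - mean_n) ** 2 for c in counts) / n_paths
# Both sample moments should be close to rate * horizon = 6.
```

The jump-time representation is exactly what makes sample-path constructions of jump processes, as in Brockett's notes, straightforward to code.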
  • Optimal Control Theory
    Optimal Control Theory. Mrinal Kanti Ghosh, Department of Mathematics, Indian Institute of Science, Bangalore 560 012. email: [email protected]. 1 Introduction. Optimal Control Theory deals with optimization problems involving a controlled dynamical system. A controlled dynamical system is a dynamical system in which the trajectory can be altered continuously in time by choosing a control parameter u(t) continuously in time. A (deterministic) controlled dynamical system is usually governed by an ordinary differential equation of the form

        x'(t) = f(t, x(t), u(t)),  t > 0,    x(0) = x_0 ∈ R^d,        (1.1)

    where u : R_+ → U is a function called the control, U a given set called the control set, and f : R_+ × R^d × U → R^d. By choosing the value of u(t), the state trajectory x(t) can be controlled. The objective of controlling the state trajectory is to minimize a certain cost associated with (1.1). Consider the following important problem studied by Bushaw and Lasalle in the field of aerospace engineering. Let x(t) represent the deviation of a rocket trajectory from some desired flight path. Then, provided the deviation is small, it is governed to a good approximation by

        x'(t) = A(t)x(t) + B(t)u(t),  t > 0,    x(0) = x_0,  x(T) = 0,  u(t) ∈ [-1, 1],        (1.2)

    where A and B are appropriate matrix valued functions. The vector x_0 is the initial deviation, u(t) is the rocket thrust at time t, and T is the final time.
  • PSO-Based Soft Lunar Landing with Hazard Avoidance: Analysis and Experimentation
    aerospace Article. PSO-Based Soft Lunar Landing with Hazard Avoidance: Analysis and Experimentation. Andrea D'Ambrosio 1,*, Andrea Carbone 1, Dario Spiller 2 and Fabio Curti 1. 1 School of Aerospace Engineering, Sapienza University of Rome, Via Salaria 851, 00138 Rome, Italy; [email protected] (A.C.); [email protected] (F.C.). 2 Italian Space Agency, Via del Politecnico snc, 00133 Rome, Italy; [email protected]. * Correspondence: [email protected]. Abstract: The problem of real-time optimal guidance is extremely important for successful autonomous missions. In this paper, the last phases of autonomous lunar landing trajectories are addressed. The proposed guidance is based on Particle Swarm Optimization and the differential flatness approach, which is a subclass of the inverse dynamics technique. The trajectory is approximated by polynomials and the control policy is obtained in an analytical closed form solution, where boundary and dynamical constraints are a priori satisfied. Although this procedure leads to sub-optimal solutions, it results in being fast and thus potentially suitable for real-time purposes. Moreover, the presence of craters on the lunar terrain is considered; therefore, hazard detection and avoidance are also carried out. The proposed guidance is tested by Monte Carlo simulations to evaluate its performance, and a robust procedure, made up of safe additional maneuvers, is introduced to counteract optimization failures and achieve soft landing. Finally, the whole procedure is tested through an experimental facility, consisting of a robotic manipulator, equipped with a camera, and a simulated lunar terrain. The results show the efficiency and reliability of the proposed guidance and its possible use for real-time sub-optimal trajectory generation.
  • ID 467 Optimal Control Applied to an Economic Model for Reducing Unemployment Problems in Bangladesh
    Proceedings of the International Conference on Industrial Engineering and Operations Management, Bangkok, Thailand, March 5-7, 2019. Optimal Control Applied to an Economic Model for Reducing Unemployment Problems in Bangladesh. Md. Haider Ali Biswas, Uzzwal Kumar Mallick, Mst. Shanta Khatun, Mathematics Discipline, Khulna University, Khulna-9208, Bangladesh. [email protected], [email protected], [email protected]. Kazi Masum Parves, Joint Director, Bangladesh Bank, Khulna, Bangladesh. [email protected]. Abstract. In this paper, a nonlinear mathematical model of unemployment is described and analyzed for the reduction of unemployed persons using two controls, 'creation of new vacancies' and 'skilled manpower'. To maximize the number of employed people at the end of the implemented policy, we find the control profiles of the two controls using Pontryagin's Maximum Principle. This study concludes with two effective controls such that the number of unemployed persons at the end of the government's implemented policies is minimized while the systematic cost of policy making is also minimized. Our numerical results as illustrated in the figures strongly agree with the analytical findings. Keywords: Unemployment, Socio-economic damage, Mathematical model, Optimal control, Numerical simulation. 1. Introduction. Unemployment problems have become one of the most immense concerns all over the world. This issue is a significantly alarming concern in Bangladesh as well. Unemployment can be defined as a state of worklessness for a person who is fit, capable and willing to work. It is well known that the main reason for socio-economic damage and the erosion of morality and social values is unemployment. An increasing unemployed population in a country significantly affects other factors, such as income per person, health costs, quality of healthcare and poverty, and seriously hampers the growth of the national economy.
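The paper's specific unemployment model is not reproduced in this excerpt. To illustrate the kind of computation Pontryagin's Maximum Principle leads to, the sketch below runs a forward-backward sweep on a deliberately simple stand-in problem: minimize the integral of (x^2 + u^2) over [0, 1] subject to x' = x + u, x(0) = 1. The state is integrated forward, the adjoint backward from the transversality condition p(1) = 0, and the control is updated from the stationarity condition dH/du = 2u + p = 0. All numbers here are illustrative, not from the paper.

```python
N = 400                      # Euler grid on [0, 1]
dt = 1.0 / N
x0 = 1.0

def forward(u):
    """Integrate the state equation x' = x + u forward from x(0) = x0."""
    x = [x0] * (N + 1)
    for k in range(N):
        x[k + 1] = x[k] + dt * (x[k] + u[k])
    return x

def backward(x):
    """Integrate the adjoint p' = -(2x + p) backward from p(1) = 0."""
    p = [0.0] * (N + 1)
    for k in range(N - 1, -1, -1):
        p[k] = p[k + 1] + dt * (2.0 * x[k + 1] + p[k + 1])
    return p

def cost(x, u):
    return sum(dt * (x[k] ** 2 + u[k] ** 2) for k in range(N))

u = [0.0] * (N + 1)
uncontrolled = cost(forward(u), u)
for _ in range(200):                      # forward-backward sweep
    x = forward(u)
    p = backward(x)
    u_new = [-0.5 * pk for pk in p]       # dH/du = 2u + p = 0  =>  u = -p/2
    if max(abs(a - b) for a, b in zip(u, u_new)) < 1e-9:
        break
    u = [0.5 * a + 0.5 * b for a, b in zip(u, u_new)]  # relaxed update
optimal = cost(forward(u), u)
```

For this linear-quadratic stand-in the Riccati equation gives an optimal cost of roughly 1.69, well below the uncontrolled cost of (e^2 - 1)/2, which is about 3.19, so the sweep can be sanity-checked against those values.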
  • Optimal Control
    Optimal Control. The optimal control problem can be described by introducing the system dynamics

        x' = F(x, u)

    which is assumed to start in an initial state given by x(0) = x_0 and has controllable parameters u, with u ∈ U. The objective function consists of a function of the final state, Φ[x(T)], and a cost function (or loss function) ℓ that is integrated over time:

        J = Φ[x(T)] + ∫_0^T ℓ(u, x) dt

    The simplest optimization problem is the unconstrained minimization min_x F~(x); Figure 1 (omitted here) shows a simple case for optimization, a function of two variables with a single minimum at the point x*. Constraints. The simplest kind of constraint is an equality constraint, for example, G(x) = 0. Formally this addition is stated as

        min_x F~(x)  subject to  G(x) = 0

    This complication can be handled by the method of Lagrange. This method reduces the constrained problem to a new, unconstrained minimization problem with additional variables. The additional variables are known as Lagrange multipliers. To handle this problem, add G(x) to F~(x) using a Lagrange multiplier λ:

        F(x, λ) = F~(x) + λ G(x)

    The Lagrange multiplier is an extra scalar variable, so the number of degrees of freedom of the problem has increased, but the advantage is that now simple, unconstrained minimization techniques can be applied to the composite function. The problem becomes

        min_{x,λ} F(x, λ)

    Most often the equality constraint is the dynamic equations of the system itself, x' = F(x, u), and the problem is to find an optimal trajectory x(t).
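A concrete instance of the Lagrange multiplier method (my own toy example, not from the text): minimize F~(x, y) = x^2 + y^2 subject to G(x, y) = x + y - 1 = 0. Setting the gradient of the composite F = F~ + λG to zero gives the linear system 2x + λ = 0, 2y + λ = 0, x + y = 1, which can be solved and sanity-checked directly:

```python
def F_tilde(x, y):
    return x * x + y * y

def G(x, y):
    return x + y - 1.0

# Stationarity of F(x, y, lam) = F_tilde(x, y) + lam * G(x, y):
#   2x + lam = 0,  2y + lam = 0,  x + y - 1 = 0.
# The first two equations force x = y = -lam/2; the constraint then
# gives -lam = 1, i.e. lam = -1 and x = y = 1/2.
lam = -1.0
x = y = -lam / 2.0

feasible = abs(G(x, y)) < 1e-12
# Moving along the constraint direction (x + t, y - t) can only increase
# the cost, since F_tilde(x + t, y - t) = 1/2 + 2 t^2.
worse_nearby = all(F_tilde(x + t, y - t) > F_tilde(x, y)
                   for t in (-0.1, -0.01, 0.01, 0.1))
```

The extra variable λ turns the 2-variable constrained problem into a 3-variable unconstrained stationarity system, exactly as described above.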
  • Optimal and Autonomous Control Using Reinforcement Learning: a Survey
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 29, NO. 6, JUNE 2018, p. 2042. Optimal and Autonomous Control Using Reinforcement Learning: A Survey. Bahare Kiumarsi, Member, IEEE, Kyriakos G. Vamvoudakis, Senior Member, IEEE, Hamidreza Modares, Member, IEEE, and Frank L. Lewis, Fellow, IEEE. Abstract: This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications. Learning is classified as supervised, unsupervised, or reinforcement, depending on the amount and quality of feedback about the system or task. In supervised learning, the feedback information provided to learning algorithms is a labeled training data set, and the objective is to build the system model representing the learned relation between the input, output, and system parameters. In unsupervised learning, no feedback information is provided to the algorithm and the objective is to classify the sample sets to different groups based on the similarity between the input samples. Finally, reinforcement learning (RL) is a goal-oriented learning tool wherein the agent or decision maker learns a. Index Terms: Autonomy, data-based optimization, reinforcement learning (RL).
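As a minimal illustration of the RL ideas surveyed above (an illustrative sketch, not code from the paper): tabular Q-learning on a 5-state chain where the agent earns a reward of 1 for reaching the right end. Optimistic initialization and ε-greedy exploration are standard choices; all constants here are arbitrary.

```python
import random

# States 0..4 on a line; action 0 = left, 1 = right.
# Reaching state 4 ends the episode with reward 1; every other step pays 0.
GOAL, GAMMA, ALPHA, EPS = 4, 0.9, 0.5, 0.1

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

rng = random.Random(0)
Q = [[1.0, 1.0] for _ in range(GOAL + 1)]   # optimistic initialization
for _ in range(2000):
    s = rng.randrange(GOAL)                 # random non-terminal start
    for _ in range(100):                    # cap episode length
        if rng.random() < EPS:
            a = rng.randrange(2)            # explore
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1   # exploit greedily
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])    # Q-learning update
        s = s2
        if done:
            break

policy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(GOAL)]
```

With these settings the learned greedy policy moves right from every non-terminal state, matching the optimal action values Q*(s, right) = 0.9^(3 - s).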