Abstract Dynamic Programming, 2nd Edition, by Dimitri P. Bertsekas
Abstract Dynamic Programming
SECOND EDITION
Dimitri P. Bertsekas
Massachusetts Institute of Technology

WWW site for book information and orders: http://www.athenasc.com

Athena Scientific, Belmont, Massachusetts

Athena Scientific
Post Office Box 805
Nashua, NH 03061-0805 U.S.A.
Email: [email protected]
WWW: http://www.athenasc.com

Cover design and photography: Dimitri Bertsekas
Cover image from Simmons Hall, MIT (Steven Holl, architect)

© 2018 Dimitri P. Bertsekas
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

Publisher's Cataloging-in-Publication Data
Bertsekas, Dimitri P.
Abstract Dynamic Programming: Second Edition
Includes bibliographical references and index
1. Mathematical Optimization. 2. Dynamic Programming. I. Title.
QA402.5 .B465 2018 519.703 01-75941
ISBN-10: 1-886529-46-9, ISBN-13: 978-1-886529-46-5

ABOUT THE AUTHOR

Dimitri Bertsekas studied Mechanical and Electrical Engineering at the National Technical University of Athens, Greece, and obtained his Ph.D. in system science from the Massachusetts Institute of Technology. He has held faculty positions with the Engineering-Economic Systems Department, Stanford University, and the Electrical Engineering Department of the University of Illinois, Urbana. Since 1979 he has been teaching at the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology (M.I.T.), where he is currently the McAfee Professor of Engineering.

His teaching and research span several fields, including deterministic optimization, dynamic programming and stochastic control, large-scale and distributed computation, and data communication networks. He has authored or coauthored numerous research papers and sixteen books, several of which are currently used as textbooks in MIT classes, including "Dynamic Programming and Optimal Control," "Data Networks," "Introduction to Probability," "Convex Optimization Theory," "Convex Optimization Algorithms," and "Nonlinear Programming."

Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming" (co-authored with John Tsitsiklis), the 2001 AACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 AACC Richard Bellman Heritage Award, the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, and the MOS/SIAM 2015 George B. Dantzig Prize. In 2001, he was elected to the United States National Academy of Engineering for "pioneering contributions to fundamental research, practice and education of optimization/control theory, and especially its application to data communication networks."

ATHENA SCIENTIFIC OPTIMIZATION AND COMPUTATION SERIES

1. Abstract Dynamic Programming, 2nd Edition, by Dimitri P. Bertsekas, 2018, ISBN 978-1-886529-46-5, 360 pages
2. Dynamic Programming and Optimal Control, Two-Volume Set, by Dimitri P. Bertsekas, 2017, ISBN 1-886529-08-6, 1270 pages
3. Nonlinear Programming, 3rd Edition, by Dimitri P. Bertsekas, 2016, ISBN 1-886529-05-1, 880 pages
4. Convex Optimization Algorithms, by Dimitri P. Bertsekas, 2015, ISBN 978-1-886529-28-1, 576 pages
5. Convex Optimization Theory, by Dimitri P. Bertsekas, 2009, ISBN 978-1-886529-31-1, 256 pages
6. Introduction to Probability, 2nd Edition, by Dimitri P. Bertsekas and John N. Tsitsiklis, 2008, ISBN 978-1-886529-23-6, 544 pages
7. Convex Analysis and Optimization, by Dimitri P. Bertsekas, Angelia Nedić, and Asuman E. Ozdaglar, 2003, ISBN 1-886529-45-0, 560 pages
8. Network Optimization: Continuous and Discrete Models, by Dimitri P. Bertsekas, 1998, ISBN 1-886529-02-7, 608 pages
9. Network Flows and Monotropic Optimization, by R. Tyrrell Rockafellar, 1998, ISBN 1-886529-06-X, 634 pages
10. Introduction to Linear Optimization, by Dimitris Bertsimas and John N. Tsitsiklis, 1997, ISBN 1-886529-19-1, 608 pages
11. Parallel and Distributed Computation: Numerical Methods, by Dimitri P. Bertsekas and John N. Tsitsiklis, 1997, ISBN 1-886529-01-9, 718 pages
12. Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John N. Tsitsiklis, 1996, ISBN 1-886529-10-8, 512 pages
13. Constrained Optimization and Lagrange Multiplier Methods, by Dimitri P. Bertsekas, 1996, ISBN 1-886529-04-3, 410 pages
14. Stochastic Optimal Control: The Discrete-Time Case, by Dimitri P. Bertsekas and Steven E. Shreve, 1996, ISBN 1-886529-03-5, 330 pages

Contents

1. Introduction
   1.1. Structure of Dynamic Programming Problems
   1.2. Abstract Dynamic Programming Models
        1.2.1. Problem Formulation
        1.2.2. Monotonicity and Contraction Properties
        1.2.3. Some Examples
        1.2.4. Approximation Models - Projected and Aggregation Bellman Equations
        1.2.5. Multistep Models - Temporal Difference and Proximal Algorithms
   1.3. Organization of the Book
   1.4. Notes, Sources, and Exercises

2. Contractive Models
   2.1. Bellman's Equation and Optimality Conditions
   2.2. Limited Lookahead Policies
   2.3. Value Iteration
        2.3.1. Approximate Value Iteration
   2.4. Policy Iteration
        2.4.1. Approximate Policy Iteration
        2.4.2. Approximate Policy Iteration Where Policies Converge
   2.5. Optimistic Policy Iteration and λ-Policy Iteration
        2.5.1. Convergence of Optimistic Policy Iteration
        2.5.2. Approximate Optimistic Policy Iteration
        2.5.3. Randomized Optimistic Policy Iteration
   2.6. Asynchronous Algorithms
        2.6.1. Asynchronous Value Iteration
        2.6.2. Asynchronous Policy Iteration
        2.6.3. Optimistic Asynchronous Policy Iteration with a Uniform Fixed Point
   2.7. Notes, Sources, and Exercises

3. Semicontractive Models
   3.1. Pathologies of Noncontractive DP Models
        3.1.1. Deterministic Shortest Path Problems
        3.1.2. Stochastic Shortest Path Problems
        3.1.3. The Blackmailer's Dilemma
        3.1.4. Linear-Quadratic Problems
        3.1.5. An Intuitive View of Semicontractive Analysis
   3.2. Semicontractive Models and Regular Policies
        3.2.1. S-Regular Policies
        3.2.2. Restricted Optimization over S-Regular Policies
        3.2.3. Policy Iteration Analysis of Bellman's Equation
        3.2.4. Optimistic Policy Iteration and λ-Policy Iteration
        3.2.5. A Mathematical Programming Approach
   3.3. Irregular Policies/Infinite Cost Case
   3.4. Irregular Policies/Finite Cost Case - A Perturbation Approach
   3.5. Applications in Shortest Path and Other Contexts
        3.5.1. Stochastic Shortest Path Problems
        3.5.2. Affine Monotonic Problems
        3.5.3. Robust Shortest Path Planning
        3.5.4. Linear-Quadratic Optimal Control
        3.5.5. Continuous-State Deterministic Optimal Control
   3.6. Algorithms
        3.6.1. Asynchronous Value Iteration
        3.6.2. Asynchronous Policy Iteration
   3.7. Notes, Sources, and Exercises

4. Noncontractive Models
   4.1. Noncontractive Models - Problem Formulation
   4.2. Finite Horizon Problems
   4.3. Infinite Horizon Problems
        4.3.1. Fixed Point Properties and Optimality Conditions
        4.3.2. Value Iteration
        4.3.3. Exact and Optimistic Policy Iteration - λ-Policy Iteration
   4.4. Regularity and Nonstationary Policies
        4.4.1. Regularity and Monotone Increasing Models
        4.4.2. Nonnegative Cost Stochastic Optimal Control
        4.4.3. Discounted Stochastic Optimal Control
        4.4.4. Convergent Models
   4.5. Stable Policies for Deterministic Optimal Control
        4.5.1. Forcing Functions and p-Stable Policies
        4.5.2. Restricted Optimization over Stable Policies
        4.5.3. Policy Iteration Methods
   4.6. Infinite-Spaces Stochastic Shortest Path Problems
        4.6.1. The Multiplicity of Solutions of Bellman's Equation
        4.6.2. The Case of Bounded Cost per Stage
   4.7. Notes, Sources, and Exercises

Appendix A: Notation and Mathematical Conventions
   A.1. Set Notation and Conventions
   A.2. Functions

Appendix B: Contraction Mappings
   B.1. Contraction Mapping Fixed Point Theorems
   B.2. Weighted Sup-Norm Contractions

References

Index

Preface of the First Edition

This book aims at a unified and economical development of the core theory and algorithms of total cost sequential decision problems, based on the strong connections of the subject with fixed point theory. The analysis focuses on the abstract mapping that underlies dynamic programming (DP for short) and defines the mathematical character of the associated problem. Our discussion centers on two fundamental properties that this mapping may have: monotonicity and (weighted sup-norm) contraction (a notational sketch of both appears below). It turns out that the nature of the analytical and algorithmic DP theory is determined primarily by the presence or absence of these two properties, and the rest of the problem's structure is largely inconsequential.

In this book, with some minor exceptions, we will assume that monotonicity holds. Consequently, we organize our treatment around the contraction property, and we focus on four main classes of models:

(a) Contractive models, discussed in Chapter 2, which have the richest and strongest theory, and are the benchmark against which the theory of other models is compared. Prominent among these models are discounted stochastic optimal control problems. The development of these models is quite thorough and includes the analysis of recent approximation algorithms for large-scale problems (neuro-dynamic programming, reinforcement learning).

(b) Semicontractive models, discussed in Chapter 3 and parts of Chapter 4. The term "semicontractive" is used qualitatively here, to refer to a variety of models where some policies have a regularity/contraction-like property but others do not.
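To fix ideas, the following is a minimal notational sketch of the two properties referenced above. It uses the standard abstract DP setup (a mapping T acting on cost functions J over a state space X, with a weight function v and modulus α, and the usual discounted-control notation g, f, U(x)); it is meant only for orientation, not as a substitute for the precise definitions in Chapter 1 and Appendix B.

% Sketch of the two core properties of the abstract DP mapping T,
% which acts on cost functions J defined over a state space X.

% Monotonicity: T preserves the pointwise order on functions,
\[
  J \le J' \;\Longrightarrow\; TJ \le TJ'.
\]

% Weighted sup-norm contraction: for a positive weight function v on X
% and a modulus \alpha \in (0,1),
\[
  \|J\|_v \;=\; \sup_{x \in X} \frac{|J(x)|}{v(x)},
  \qquad
  \|TJ - TJ'\|_v \;\le\; \alpha\,\|J - J'\|_v .
\]

% When the contraction property holds, the Banach fixed point theorem
% yields a unique J* with T J* = J* (Bellman's equation), and value
% iteration T^k J -> J* converges geometrically. The prototypical
% contractive instance is discounted stochastic optimal control with
% bounded cost per stage, where (with discount factor \alpha)
\[
  (TJ)(x) \;=\; \min_{u \in U(x)}
    E\bigl\{\, g(x,u,w) + \alpha\, J\bigl(f(x,u,w)\bigr) \,\bigr\},
\]
% a mapping that is monotone and a sup-norm contraction with v \equiv 1.

The weighted (rather than plain) sup-norm is what lets the contractive framework cover models, such as certain stochastic shortest path problems, for which the DP mapping fails to be a contraction with respect to the unweighted sup-norm.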