An Introduction to Deep Reinforcement Learning
Full text available at: http://dx.doi.org/10.1561/2200000071

Other titles in Foundations and Trends® in Machine Learning:

Non-convex Optimization for Machine Learning
Prateek Jain and Purushottam Kar
ISBN: 978-1-68083-368-3

Kernel Mean Embedding of Distributions: A Review and Beyond
Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur and Bernhard Schölkopf
ISBN: 978-1-68083-288-4

Tensor Networks for Dimensionality Reduction and Large-scale Optimization, Part 1: Low-Rank Tensor Decompositions
Andrzej Cichocki, Anh-Huy Phan, Qibin Zhao, Namgil Lee, Ivan Oseledets, Masashi Sugiyama and Danilo P. Mandic
ISBN: 978-1-68083-222-8

Tensor Networks for Dimensionality Reduction and Large-scale Optimization, Part 2: Applications and Future Perspectives
Andrzej Cichocki, Anh-Huy Phan, Qibin Zhao, Namgil Lee, Ivan Oseledets, Masashi Sugiyama and Danilo P. Mandic
ISBN: 978-1-68083-276-1

Patterns of Scalable Bayesian Inference
Elaine Angelino, Matthew James Johnson and Ryan P. Adams
ISBN: 978-1-68083-218-1

Generalized Low Rank Models
Madeleine Udell, Corinne Horn, Reza Zadeh and Stephen Boyd
ISBN: 978-1-68083-140-5

An Introduction to Deep Reinforcement Learning

Vincent François-Lavet, McGill University, [email protected]
Peter Henderson, McGill University, [email protected]
Riashat Islam, McGill University, [email protected]
Marc G. Bellemare, Google Brain, [email protected]
Joelle Pineau, Facebook and McGill University, [email protected]

Boston - Delft

Foundations and Trends® in Machine Learning

Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339
United States
Tel. +1-781-985-4510
www.nowpublishers.com
[email protected]

Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is:
V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare and J. Pineau. An Introduction to Deep Reinforcement Learning. Foundations and Trends® in Machine Learning, vol. 11, no. 3-4, pp. 219–354, 2018.

ISBN: 978-1-68083-539-7
© 2018 V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare and J. Pineau

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.

Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The 'services' for users can be found on the internet at: www.copyright.com. For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale.

In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1 781 871 0245; www.nowpublishers.com; [email protected]

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder.
Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail: [email protected]

Foundations and Trends® in Machine Learning
Volume 11, Issue 3-4, 2018

Editorial Board

Editor-in-Chief: Michael Jordan, University of California, Berkeley, United States

Editors:
Peter Bartlett, UC Berkeley
Yoshua Bengio, Université de Montréal
Avrim Blum, Toyota Technological Institute
Craig Boutilier, University of Toronto
Stephen Boyd, Stanford University
Carla Brodley, Northeastern University
Inderjit Dhillon, Texas at Austin
Jerome Friedman, Stanford University
Kenji Fukumizu, ISM
Zoubin Ghahramani, University of Cambridge
David Heckerman, Amazon
Tom Heskes, Radboud University
Geoffrey Hinton, University of Toronto
Aapo Hyvarinen, Helsinki IIT
Leslie Pack Kaelbling, MIT
Michael Kearns, UPenn
Daphne Koller, Stanford University
John Lafferty, Yale
Michael Littman, Brown University
Gabor Lugosi, Pompeu Fabra
David Madigan, Columbia University
Pascal Massart, Université de Paris-Sud
Andrew McCallum, University of Massachusetts Amherst
Marina Meila, University of Washington
Andrew Moore, CMU
John Platt, Microsoft Research
Luc de Raedt, KU Leuven
Christian Robert, Paris-Dauphine
Sunita Sarawagi, IIT Bombay
Robert Schapire, Microsoft Research
Bernhard Schölkopf, Max Planck Institute
Richard Sutton, University of Alberta
Larry Wasserman, CMU
Bin Yu, UC Berkeley

Editorial Scope

Foundations and Trends® in Machine Learning publishes survey and tutorial articles in the following topics:

• Adaptive control and signal processing
• Applications and case studies
• Bayesian learning
• Behavioral, cognitive and neural learning
• Classification and prediction
• Clustering
• Data mining
• Dimensionality reduction
• Evaluation
• Game theoretic learning
• Graphical models
• Independent component analysis
• Inductive logic programming
• Kernel methods
• Markov chain Monte Carlo
• Model choice
• Nonparametric methods
• Online learning
• Optimization
• Reinforcement learning
• Relational learning
• Robustness
• Spectral methods
• Statistical learning theory
• Variational inference
• Visualization

Information for Librarians

Foundations and Trends® in Machine Learning, 2018, Volume 11, 6 issues. ISSN paper version 1935-8237. ISSN online version 1935-8245. Also available as a combined paper and online subscription.

Contents

1 Introduction
  1.1 Motivation
  1.2 Outline
2 Machine learning and deep learning
  2.1 Supervised learning and the concepts of bias and overfitting
  2.2 Unsupervised learning
  2.3 The deep learning approach
3 Introduction to reinforcement learning
  3.1 Formal framework
  3.2 Different components to learn a policy
  3.3 Different settings to learn a policy from data
4 Value-based methods for deep RL
  4.1 Q-learning
  4.2 Fitted Q-learning
  4.3 Deep Q-networks
  4.4 Double DQN
  4.5 Dueling network architecture
  4.6 Distributional DQN
  4.7 Multi-step learning
  4.8 Combination of all DQN improvements and variants of DQN
5 Policy gradient methods for deep RL
  5.1 Stochastic Policy Gradient
  5.2 Deterministic Policy Gradient
  5.3 Actor-Critic Methods
  5.4 Natural Policy Gradients
  5.5 Trust Region Optimization
  5.6 Combining policy gradient and Q-learning
6 Model-based methods for deep RL
  6.1 Pure model-based methods
  6.2 Integrating model-free and model-based methods
7 The concept of generalization
  7.1 Feature selection
  7.2 Choice of the learning algorithm and function approximator selection
  7.3 Modifying the objective function
  7.4 Hierarchical learning
  7.5 How to obtain the best bias-overfitting tradeoff
8 Particular challenges in the online setting
  8.1 Exploration/Exploitation dilemma
  8.2 Managing experience replay
9 Benchmarking Deep RL
  9.1 Benchmark Environments
  9.2 Best practices to benchmark deep RL
  9.3 Open-source software for Deep RL
10 Deep reinforcement learning beyond MDPs
  10.1 Partial observability and the distribution of (related) MDPs
  10.2 Transfer learning
  10.3 Learning without explicit reward function
  10.4 Multi-agent systems
11 Perspectives on deep reinforcement learning
  11.1 Successes of deep reinforcement learning
  11.2 Challenges of applying reinforcement learning to real-world problems
  11.3 Relations between deep RL and neuroscience
12 Conclusion
  12.1 Future development of deep RL
  12.2 Applications and societal impact of deep RL
Appendices
  A Appendix
    A.1 Deep RL frameworks
References

An Introduction to Deep Reinforcement Learning

Vincent François-Lavet1, Peter Henderson2, Riashat Islam3, Marc G. Bellemare4 and Joelle Pineau5

1McGill University; [email protected]
2McGill University;