
UNIVERSITY OF CALIFORNIA, MERCED

Learning Representations in Reinforcement Learning

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering and Computer Science

by

Jacob Rafati Heravi

Committee in charge:
Professor David C. Noelle, Chair
Professor Marcelo Kallmann
Professor Roummel F. Marcia
Professor Shawn Newsam
Professor Jeffrey Yoshimi

2019

Copyright Notice

Portion of Chapter 3 © 2015 Cognitive Science Society
• Jacob Rafati and David C. Noelle (2015). Lateral Inhibition Overcomes Limits of Temporal Difference Learning. In Proceedings of the 37th Annual Meeting of the Cognitive Science Society, Pasadena, CA.

Portion of Chapter 3 © 2017 Cognitive Computational Neuroscience
• Jacob Rafati and David C. Noelle (2017). Sparse Coding of Learned State Representations in Reinforcement Learning. 1st Cognitive Computational Neuroscience Conference, NYC, NY.

Portion of Chapter 4 © 2019 Association for the Advancement of Artificial Intelligence
• Jacob Rafati and David C. Noelle (2019). Learning Representations in Model-Free Hierarchical Reinforcement Learning. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI.

Portion of Chapter 5 © 2018 The European Association for Signal Processing
• Jacob Rafati, Omar DeGuchy, and Roummel F. Marcia (2018). Trust-Region Minimization Algorithms for Training Responses (TRMinATR): The Rise of Machine Learning Techniques. In Proceedings of the 26th European Signal Processing Conference (EUSIPCO 2018), Rome, Italy.

Portion of Chapter 5 © 2018 Institute of Electrical and Electronics Engineers (IEEE)
• Jacob Rafati and Roummel F. Marcia (2018). Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning. In Proceedings of the 17th IEEE International Conference on Machine Learning and Applications, Orlando, FL.

All Other Chapters © 2019 Jacob Rafati Heravi
All Rights Reserved.

The Dissertation of Jacob Rafati Heravi is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Marcelo Kallmann
Roummel F. Marcia
Shawn Newsam
Jeffrey Yoshimi
David C. Noelle, Chair

University of California, Merced
2019

Dedication

To my mother. She has worked so hard and has supported me throughout my entire life, all alone. She has tolerated our 8,000-mile distance for over 5 years due to the current travel bans. I hope this accomplishment brings joy to her life.

To Katie Williams, for all the emotional and spiritual support, and for all the love.

Contents

List of Symbols
List of Figures
List of Tables
List of Algorithms
Preface
Acknowledgment
Curriculum Vita
Abstract

1 Introduction
  1.1 Motivation
  1.2 Dissertation Outline and Objectives

2 Reinforcement Learning
  2.1 Reinforcement Learning Problem
    2.1.1 Agent and Environment Interaction
    2.1.2 Policy Function
    2.1.3 Objective in RL
  2.2 Markov Decision Processes
    2.2.1 Formal Definition of MDP
    2.2.2 Value Function
    2.2.3 Bellman Equations
    2.2.4 Optimal Value Function
    2.2.5 Value Iteration Algorithm
  2.3 Reinforcement Learning Algorithms
    2.3.1 Value-based vs. Policy-based Methods
    2.3.2 Bootstrapping vs. Sampling
    2.3.3 Model-Free vs. Model-Based RL
  2.4 Temporal Difference Learning
    2.4.1 SARSA
    2.4.2 Q-Learning
  2.5 Generalization in Reinforcement Learning
    2.5.1 Feed-forward Neural Networks
    2.5.2 Loss Function as Expectation of TD Error
  2.6 Empirical Risk Minimization in Deep RL

3 Learning Sparse Representations in Reinforcement Learning
  3.1 Introduction
  3.2 Background
  3.3 Methods for Learning Sparse Representations
    3.3.1 Lateral Inhibition
    3.3.2 k-Winners-Take-All Mechanism
    3.3.3 Feedforward kWTA Neural Network
  3.4 Numerical Simulations
    3.4.1 Experiment Design
    3.4.2 The Puddle-world Task
    3.4.3 The Mountain-car Task
    3.4.4 The Acrobot Task
  3.5 Results and Discussions
    3.5.1 The Puddle-world Task
    3.5.2 The Mountain-car Task
    3.5.3 The Acrobot Task
  3.6 Future Work
  3.7 Conclusions

4 Learning Representations in Model-Free Hierarchical Reinforcement Learning
  4.1 Introduction
  4.2 Failure of RL in Tasks with Sparse Feedback
  4.3 Hierarchical Reinforcement Learning
    4.3.1 Subgoals vs. Options
    4.3.2 Spatiotemporal Hierarchies
    4.3.3 Hierarchical Reinforcement Learning Subproblems
  4.4 Meta-controller/Controller Framework
  4.5 Intrinsic Motivation Learning
  4.6 Experiment on Intrinsic Motivation Learning
    4.6.1 Training the State-Goal Value Function
    4.6.2 Intrinsic Motivation Performance Results
    4.6.3 Reusing Learned Skills
  4.7 Unsupervised Subgoal Discovery
    4.7.1 Anomaly Detection
    4.7.2 K-Means Clustering
    4.7.3 Mathematical Interpretation
  4.8 A Unified Model-Free HRL Framework
  4.9 Experiments on Unified HRL Framework
    4.9.1 4-Room Task with Key and Lock
    4.9.2 Montezuma’s Revenge
  4.10 Neural Correlates of Model-Free HRL
  4.11 Future Work
    4.11.1 Learning Representations in Model-based HRL
    4.11.2 Solving Montezuma’s Revenge
  4.12 Conclusions

5 Trust-Region Methods for Empirical Risk Minimization
  5.1 Introduction
    5.1.1 Existing Methods
    5.1.2 Motivation and Objectives
  5.2 Background
    5.2.1 Unconstrained Optimization Problem
    5.2.2 Recognizing a Local Minimum
    5.2.3 Main Algorithms
  5.3 Optimization Strategies
    5.3.1 Line Search Method
    5.3.2 Trust-Region Quasi-Newton Method
  5.4 Quasi-Newton Optimization Methods
    5.4.1 The BFGS Update
    5.4.2 The SR1 Update
    5.4.3 Compact Representations
    5.4.4 Limited-Memory Quasi-Newton Methods
    5.4.5 Trust-Region Subproblem Solution
  5.5 Experiment on L-BFGS Line Search vs. Trust Region
    5.5.1 LeNet-5 Convolutional Neural Network Architecture
    5.5.2 MNIST Image Classification Task
    5.5.3 Results
  5.6 Proposed Quasi-Newton Matrix Initializations
    5.6.1 Initialization Method I
    5.6.2 Initialization Method II
    5.6.3 Initialization Method III
  5.7 Experiments on L-BFGS Initialization
    5.7.1 Computing Gradients
    5.7.2 Multi-batch Sampling
    5.7.3 Computing y_k
    5.7.4 Other Parameters
    5.7.5 Results and Discussions
  5.8 Future Work
  5.9 Conclusions

6 Quasi-Newton Optimization in Deep Reinforcement Learning
  6.1 Introduction
  6.2 Optimization Problems in RL
  6.3 Line-search L-BFGS Optimization
    6.3.1 Line Search Method
    6.3.2 Quasi-Newton Optimization Methods
    6.3.3 The BFGS Quasi-Newton Update
    6.3.4 Limited-Memory BFGS
  6.4 Deep L-BFGS Q-Learning Method
  6.5 Convergence Analysis
    6.5.1 Convergence of Empirical Risk
    6.5.2 Value Optimality
    6.5.3 Computation Time
  6.6 Experiments on ATARI 2600 Games
  6.7 Results and Discussions
  6.8 Future Work
  6.9 Conclusions

7 Concluding Remarks
  7.1 Summary of Contributions
  7.2 Future Work

Bibliography

List of Symbols

α    Learning rate
D    Agent’s experience memory
D_1  Controller’s experience memory in HRL
D_2  Meta-controller’s experience memory in HRL
ε    Exploration rate in the ε-greedy method
G    Subgoals space