Explorations in Efficient Reinforcement Learning

Academic dissertation

For the degree of Doctor at the Universiteit van Amsterdam, by authority of the Rector Magnificus, Prof. dr. J.J.M. Franse, to be defended in public before a committee appointed by the Doctorate Board, in the Aula of the University on a Wednesday in February, by Marco Wiering, born in Schagen.

Doctoral committee

Promotor: Prof. dr. ir. F.C.A. Groen
Co-promotor: Dr. Hab. J.H. Schmidhuber
Other members:
  Prof. dr. P. van Emde Boas, Universiteit van Amsterdam
  Dr. ir. B.J.A. Kröse, Universiteit van Amsterdam
  Prof. dr. P.M.B. Vitányi, Universiteit van Amsterdam
  Dr. M. Dorigo, Université Libre de Bruxelles
  Prof. dr. J. van den Herik, Universiteit van Maastricht

Faculty of Mathematics, Computer Science, Physics and Astronomy, Universiteit van Amsterdam

Cover design and photography by Marco Wiering.

The work described in this thesis was performed at IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) in Lugano, Switzerland. It was made possible by funding from IDSIA and was supported in part by the Swiss National Fund (SNF) grant "Long Short-Term Memory".

"The more a man knows about himself in relation to every kind of experience, the greater his chance of suddenly, one fine morning, realizing who in fact he is, or rather Who in Fact he Is."
Aldous Huxley, in Island

For You

Contents

Introduction
  Reinforcement Learning
  Reinforcement Learning with Incomplete Information
  Current Problems of RL
  Goals of the Thesis
  Outline and Principal Contributions

Markov Decision Processes
  Markov Decision Processes
  Value Functions
  Finite Horizon Problems
  Infinite Horizon Problems
  Action Evaluation Functions
  Contraction
  Dynamic Programming
  Policy Iteration
  Value Iteration
  Linear Programming
  Experiments
  Description of the Maze Task
  Scaling up Dynamic Programming
  More Difficult Problems
  Continuous State Spaces
  Curse of Dimensionality
  Conclusion

Reinforcement Learning
  Principles of Algorithms
  Delayed Reward and Credit Assignment Problem
  Markov Property
  TD(λ) Learning
  Temporal Difference Learning
  Replacing Traces
  Q-Learning
  Q(λ)-learning
  Fast Q(λ)-learning
  Team Q(λ)
  Extending the Eligibility Trace Framework
  Combining Eligibility Traces
  Experiments
  Evaluating Fast Online Q(λ)
  Experiments with Accumulating and Replacing Traces
  Multi-agent Experiments
  Conclusion

Learning World Models
  Extracting a Model
  Model-Based Q-learning
  Prioritized Sweeping
  Experiments
  Comparison between Model-based and Model-free RL
  Prioritized Sweeping Sensitivity Analysis
  Comparison between PS Methods
  Discussion
  Conclusion

Exploration
  Undirected Exploration
  Max-random Exploration Rule
  Boltzmann Exploration Rule
  Max-Boltzmann Exploration Rule
  Initialize High Exploration
  Directed Exploration
  Reward Function Frequency Based
  Reward Function Recency Based
  Reward Function Error Based
  False Exploration Reward Rules
  Learning Exploration Models
  Model-Based Interval Estimation
  Experiments
  Exploration with Prioritized Sweeping
  Exploitation/Exploration
  Experiments with Suboptimal Goals
  Discussion

Partially Observable MDPs
  Optimal Algorithms
  POMDP Specification
  Belief States
  Computing an Optimal Policy
  HQ-learning
  Memory in HQ
  Learning Rules
  Experiments
  Discussion
  Related Work
  Conclusion

Function Approximation for RL
  Function Approximation
  Linear Networks
  Local Function Approximators
  CMACs
  Function Approximation for Direct RL
  Extending Q(λ) to function approximators
  Notes on combining RL with FAs
  World Modeling with Function Approximators
  Linear Models
  Neural Gas Models
  CMAC Models
  A Soccer Case Study
  Soccer
  The Soccer Simulator
  Comparison Q-lin, Q-gas and PIPE
  Comparison CMACs vs PIPE
  Discussion
  Previous Work
  Conclusion

Conclusion
  Contributions
  Exact Algorithms for Markov Decision Problems
  Reinforcement Learning
  Model-based RL
  Exploration
  POMDPs
  Function Approximation
