Coping with the Curse of Dimensionality by Combining Linear Programming and Reinforcement Learning
Utah State University
DigitalCommons@USU
All Graduate Theses and Dissertations, Graduate Studies, 5-2010

Coping with the Curse of Dimensionality by Combining Linear Programming and Reinforcement Learning
Scott H. Burton, Utah State University

Follow this and additional works at: https://digitalcommons.usu.edu/etd
Part of the Computer Engineering Commons, and the Operational Research Commons

Recommended Citation
Burton, Scott H., "Coping with the Curse of Dimensionality by Combining Linear Programming and Reinforcement Learning" (2010). All Graduate Theses and Dissertations. 559. https://digitalcommons.usu.edu/etd/559

This Thesis is brought to you for free and open access by the Graduate Studies at DigitalCommons@USU. It has been accepted for inclusion in All Graduate Theses and Dissertations by an authorized administrator of DigitalCommons@USU. For more information, please contact [email protected].

COPING WITH THE CURSE OF DIMENSIONALITY BY COMBINING LINEAR PROGRAMMING AND REINFORCEMENT LEARNING

by

Scott H. Burton

A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in Computer Science

Approved:

Nicholas S. Flann, Major Professor
Donald H. Cooley, Committee Member
Dan W. Watson, Committee Member
Byron R. Burnham, Dean of Graduate Studies

UTAH STATE UNIVERSITY
Logan, Utah
2010

Copyright © Scott H. Burton 2010
All Rights Reserved

ABSTRACT

Coping with the Curse of Dimensionality by Combining Linear Programming and Reinforcement Learning

by

Scott H. Burton, Master of Science
Utah State University, 2010

Major Professor: Dr. Nicholas S. Flann
Department: Computer Science

Reinforcement learning techniques offer a powerful method of finding solutions in unpredictable problem environments where human supervision is not possible. However, in many real-world situations, the state space needed to represent the solutions becomes so large that using these methods is infeasible. Often the vast majority of these states are not valuable in finding the optimal solution. This work introduces a novel method of using linear programming to identify and represent the small area of the state space that is most likely to lead to a near-optimal solution, significantly reducing the memory requirements and the time needed to arrive at a solution.

An empirical study is provided to show the validity of this method with respect to a specific problem in vehicle dispatching. This study demonstrates that, on problems too large for a traditional reinforcement learning agent, the new approach yields solutions that are a significant improvement over other nonlearning methods. In addition, this new method is shown to be robust to changing conditions both during training and execution. Finally, some areas of future work are outlined to suggest how this new approach might be applied to additional problems and environments.

(66 pages)
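To make the combined idea concrete, the following is a minimal illustrative sketch, not the dispatcher developed in this thesis: an invented two-route delivery problem is solved with scipy.optimize.linprog to obtain target quotas, and tabular Q-learning is then confined to the narrow band of states consistent with those quotas, so only a fraction of the full state space ever has to be stored or explored. The LP formulation, state encoding, reward scheme, and all names below are assumptions made purely for illustration.

import numpy as np
from scipy.optimize import linprog

# Hypothetical LP: choose delivery quotas x1, x2 for two routes so as to
# maximize total deliveries under a shared capacity and per-route limits.
# (The objective is negated because linprog minimizes.)
res = linprog(c=[-1.0, -1.0],
              A_ub=[[2.0, 1.0]], b_ub=[30.0],    # shared capacity constraint
              bounds=[(0, 12), (0, 20)])         # per-route limits
quota = tuple(int(round(x)) for x in res.x)      # here (5, 20)

SLACK = 2  # how far a state may drift from the LP proportions and still be kept

def allowed(state):
    # Keep only states near the straight line from (0, 0) to the LP quotas.
    d1, d2 = state
    if d1 > quota[0] or d2 > quota[1]:
        return False
    progress = (d1 + d2) / (quota[0] + quota[1])
    return (abs(d1 - progress * quota[0]) <= SLACK and
            abs(d2 - progress * quota[1]) <= SLACK)

# The Q-table covers only the LP-bounded band, not the full grid of states.
Q = {(d1, d2): np.zeros(2)
     for d1 in range(quota[0] + 1)
     for d2 in range(quota[1] + 1)
     if allowed((d1, d2))}

rng = np.random.default_rng(0)
for episode in range(200):
    s = (0, 0)
    while s != quota:                                            # quota is the goal state
        greedy = int(np.argmax(Q[s]))
        a = rng.integers(2) if rng.random() < 0.1 else greedy    # epsilon-greedy choice
        nxt = (s[0] + 1, s[1]) if a == 0 else (s[0], s[1] + 1)   # dispatch route 1 or 2
        if nxt in Q:
            reward = 1.0                                         # one unit delivered
        else:
            reward, nxt = -1.0, s                                # outside the LP band: penalize, stay put
        Q[s][a] += 0.1 * (reward + 0.95 * np.max(Q[nxt]) - Q[s][a])  # Q-learning update
        s = nxt

full_grid = (quota[0] + 1) * (quota[1] + 1)
print(f"LP quotas {quota}: Q-table holds {len(Q)} states instead of {full_grid}")

The essential design choice in this sketch is that the LP solution, not the learner, decides which states are worth representing; the learner then only has to cope with the variability that arises inside that reduced region.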
CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER

I. INTRODUCTION
     Significance
     Background
     Reinforcement Learning
     Problem Definition

II. APPROACH AND METHODS
     Linear Programming Approach
     Reinforcement Learning Approach
     Combined Approach
     Random Dispatcher
     Simulator
     Comparison Methods

III. EMPIRICAL STUDY
     Study 1: The Results and Explored State Space of a Simple Problem
     Study 2: Overcoming Bad Initial Data Based on Experience
     Study 3: Examining Larger Problems
     Study 4: Varied Learning Parameters
     Study 5: Non-Uniform Problem Configuration

IV. CONCLUSION AND FUTURE WORK

REFERENCES

LIST OF TABLES

1. Advantages and Disadvantages of Methods
2. Summary of Algorithms
3. Mine-1 Optimal Quotas
4. Mine-1 Total Delivered
5. Mine-1 Unpaired t-test Results
6. Biased Mine-1 Results
7. Biased Mine-1 Relative Results
8. Mine-1-Small-Bias Unpaired t-test Results
9. Mine-1-Large-Bias Unpaired t-test Results
10. Mine-2 Processing Times
11. Mine-2 Distances
12. Mine-2 Optimal Quotas
13. Mine-2 Results
14. Mine-2 Relative Results
15. Mine-2-Small-Bias Unpaired t-test Results
16. Mine-2-Large-Bias Unpaired t-test Results
17. Mine-2 Varied Exploration Rates
18. Mine-2 Varied Discounting Rates
19. Mine-3 Processing Times
20. Mine-3 Distances
21. Mine-3 Optimal Quotas
22. Mine-3 Results
23. Mine-3 Relative Results
24. Mine-3-Small-Bias Unpaired t-test Results
25. Mine-3-Large-Bias Unpaired t-test Results

LIST OF FIGURES

1. A simple RL state space representation
2. RL state space with maximum upper bound
3. RL state space showing diagonal
4. Visual representation of Mine-1
5. Mine-1 total delivered
6. Mine-1 total delivered relative to LP-greedy
7. Visual representation of updates made during learning
8. Total delivered on the biased Mine-1 problems
9. Total delivered on biased Mine-1 problems relative to LP-greedy
10. Mine-1-Small-Bias learning progress
11. Mine-1-Large-Bias learning progress
12. Mine-2 total delivered