
Essays on Reinforcement Learning with Decision Trees and Accelerated Boosting of Partially Linear Additive Models

A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Operations, Business Analytics and Information Systems of the Carl H. Lindner College of Business

by Steven Dinger
M.S., Clemson University, 2014
B.S., Clemson University, 2005

Committee:
Uday Rao, PhD (Chair)
Yan Yu, PhD
Dungang Liu, PhD
Raj Bhatnagar, PhD

June 2019

Abstract

Reinforcement learning has become a popular research topic due to recent successes in combining deep learning for value function estimation with reinforcement learning. Because of the popularity of these methods, deep learning has become the de facto standard for function approximation in reinforcement learning. However, other function approximation methods offer advantages in speed, ease of use, interpretability and stability. In our first essay, we examine several existing reinforcement learning methods that use decision trees for function approximation. Testing on a benchmark reinforcement learning problem shows promising results for decision-tree-based methods. In addition, we propose the use of online random forests for reinforcement learning, which also shows competitive results.

In the second essay, we discuss accelerated boosting of partially linear additive models. Partially linear additive models are a powerful and flexible technique for modeling complex data. However, automatic variable selection among linear, nonlinear and uninformative terms can be computationally expensive. We propose using accelerated twin boosting to automatically select these terms and fit a partially linear additive model. Acceleration reduces the computational effort relative to non-accelerated methods while maintaining accuracy and ease of use. Twin boosting is adopted to improve the variable selection of accelerated boosting. We demonstrate our proposed method on simulated and real data sets and show that accelerated twin boosting produces accurate, parsimonious models with substantially less computation than non-accelerated twin boosting.

Acknowledgements

I would like to thank my advisor, Uday Rao, for all his help and support. Without your guidance and encouragement I would not be where I am today. I would also like to thank Yan Yu for her advice and feedback on my research. You helped me turn an idea into a paper in a short amount of time, and I am eternally grateful for it. To my committee members, Dungang Liu and Raj Bhatnagar, thank you for your feedback and willingness to help me through this process.

To my family, thank you for your support and understanding. To my friends, thanks for making me laugh, letting me complain and taking my mind off work. To Jason Thatcher, thank you for always being available to give me advice about academia. Finally, to my fiancée, Kate, thank you for being there for me. Thank you for helping me through tough times and taking me on amazing adventures. I am excited to spend the rest of my life with you.

Contents

Abstract
Acknowledgements
List of Figures
List of Tables
List of Algorithms
List of Abbreviations

1 Introduction

2 Reinforcement Learning with Decision Trees - Review, Implementation and Computation Comparison
  1 Introduction
  2 Literature Review
  3 Methods
    3.1 Background
      Markov Decision Process
      Bellman Optimality Equation
      Q-Learning
      Function Approximation
    3.2 Batch Methods
      Fitted Q Iteration
      Boosted FQI
      Neural FQI
    3.3 Online Methods
      G-Learning
      Abel’s Method
      Deep Q-Networks
    3.4 Online Random Forest Methods
  4 Testing
    4.1 Environment
    4.2 Batch Methods
    4.3 Online Methods
    4.4 Replications
    4.5 Parameter Tuning
  5 Results
    5.1 Batch Methods
      Single Tree FQI
      Random Forest FQI
      Boosted FQI
      Neural FQI
    5.2 Online
      G-Learning
      Abel’s Method
      Deep Q-Network
      Online Random Forest
    5.3 Comparison
  6 Limitations
  7 Future Work
  8 Discussion

3 Accelerated Boosting of Partially Linear Additive Models
  1 Introduction
  2 Partially Linear Additive Models and Boosting
    2.1 Partially Linear Additive Models
    2.2 Boosting of Partially Linear Additive Models
    2.3 Twin Boosting of Partially Linear Additive Models
  3 Accelerated Boosting
    3.1 Accelerated Boosting of Partially Linear Additive Models
    3.2 Accelerated Twin Boosting of Partially Linear Additive Models
  4 Numerical Study
    4.1 Variable Selection Results
    4.2 Estimation Results

4 Conclusion

Bibliography

List of Figures

1 RL Agent Interacting with the Environment
2 RL Example Problem Environment
3 Value Iteration Example
4 Q-Learning Example
5 Example Neural Network Structure
6 CartPole Testing Environment
7 Comparison of Raw and Filtered Reward Outputs
8 Comparison of All Replications and Averaged Rewards
9 Single Tree FQI Results
10 Random Forest FQI Results
11 Boosted FQI: Uncorrelated and Correlated Comparison
12 Boosted FQI: Final Results
13 Neural FQI Results
14 G-Learning Results
15 Abel’s Method Results
16 Deep Q-Network Results
17 DQN Results with Limited Memory
18 Online Random Forest FQI with Stopping Results
19 Moving Window Random Forest Results
20 Moving Window Random Forest Results with Limited Memory
21 Comparison of RL Methods

1 Twin versus Single Accelerated Boosting Results for Example 1
2 Variable Selection Results Example 1
3 Twin versus Single Accelerated Boosting Results for Example 2
4 Variable Selection Results Example 2
5 Results for Example 1
6 Boxplots for Example 1 vs Learning Rate
7 Results for Example 2
8 Boxplots for Example 2 vs Learning Rate
9 Boxplots for Taiwan Housing Data vs Learning Rate

List of Tables

1 Classification of RL Papers
2 Classification of RL Methods
3 Execution Time of RL Methods

1 Summary