EFFICIENT LEARNING OF RELATIONAL MODELS FOR SEQUENTIAL DECISION MAKING
BY THOMAS J. WALSH
A dissertation submitted to the Graduate School—New Brunswick Rutgers, The State University of New Jersey in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Graduate Program in Computer Science
Written under the direction of Michael L. Littman and approved by
New Brunswick, New Jersey
October, 2010

© 2010 Thomas J. Walsh
ALL RIGHTS RESERVED

ABSTRACT OF THE DISSERTATION
Efficient Learning of Relational Models for Sequential Decision Making
by Thomas J. Walsh Dissertation Director: Michael L. Littman
The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting.
These results are extended in an apprenticeship learning paradigm where an agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than those learnable in the online setting. We link the class of efficiently learnable dynamics in the apprenticeship setting to a rich class of models derived from well-known learning frameworks.
As an application, we present theoretical and empirical results on learning relational models of web-service descriptions using a dataflow model called a Task Graph to capture the important
connections between inputs and outputs of services in a workflow, with experiments constructed using publicly available web services. This application shows that compact relational models can be efficiently learned from limited amounts of basic data.
Finally, we present several extensions of the main results in the thesis, including expansions of the languages with Description Logics. We also explore the use of sample-based planners to speed up the computation time of our algorithms.
Acknowledgements
I want to thank a few people for their contributions to this work and my career.
First, I want to thank my advisor, Michael Littman, who has been an extraordinarily patient and wise mentor throughout this process. I am eternally grateful for the faith he showed in me and the advice he imparted to me over the years. Michael’s insights and sense of humor (puns and all) make his research lab a fantastic environment, and I am so proud to have been his student.
I also thank Alex Borgida, who was a mentor to me at Rutgers and worked with me on a number of different research projects, several of which are described in this thesis. Alex's openness to new research topics, wide breadth of knowledge, and endless supply of cookies were wonderful resources for me throughout grad school.
Also, thank you to the other members of my committee, Chung-chieh Shan and Roni Khardon, for helping to shape and clarify the ideas in this document. In particular, I want to thank Roni for his careful attention to detail throughout the whole thesis, which helped fix a number of errors and unclear statements from earlier versions.
Many portions of this thesis were expanded from earlier collaborative work, and while I have mentioned these papers in the corresponding chapters, I would like to thank Michael Littman, Alex Borgida, István Szita, Carlos Diuk, Kaushik Subramanian, and Sergiu Goschin, each of whom collaborated with me on these earlier works. I also thank Lihong Li, who worked with me on a number of publications and projects, and whose research echoes throughout this document. I would also like to thank my co-authors from other publications that were instrumental in my growth as a researcher: Bethany Leffler, Alex Strehl, Haym Hirsch, Ali Nouri, Fusun Yaman, Marie desJardins, and Rick Kuhn. A special thank-you goes to Marie desJardins, who was also my undergraduate advisor at UMBC. Her guidance and patience in those early years helped set me on the path I am on today.
While many of them have been mentioned above, I want to explicitly thank the members (past and present) of the RL3 lab, with whom I have worked, laughed, and weathered the storms of grad school. To the original cast of Alex Strehl, Carlos Diuk, Bethany Leffler, Ali Nouri,
and Lihong Li: you guys were there from the beginning and shared many of the moments of both exuberance and doubt that came in those early years. I can't think of a better group of people to be trapped in a van with on the way to Pittsburgh. And to the "next generation" of John Asmuth, Chris Mansley, Monica Babes, Michael Wunder, Ari Weinstein, Sergiu Goschin, and Kaushik Subramanian: thank you for your friendship and trips to "fancy lunch" over the last few years—I feel that this transition is leaving the lab in a high-reward state and that the future is very bright for all of you. Also, I want to specifically thank John, Ari, Chris, Sergiu, and Monica for carefully proofreading portions of this document.
Last but certainly not least, I’d like to thank my friends and family who have supported me throughout this whole endeavor. Specifically I’d like to thank my immediate family—my mother and father, Nancy and Tom, and my siblings Kathleen, Kenneth, and Susan for their moral support over the years.
Thank you all for helping to shape this document, and me as well.
Table of Contents
Abstract
Acknowledgements
List of Tables
List of Figures
1. Introduction
1.1. The Art of the State and the State of the Art
1.2. Bridging the Gap
1.3. A Roadmap for this Document
1.4. Common Threads
2. Background and Related Work
2.1. Reinforcement Learning
2.2. Sample Complexity in Supervised Learning
2.3. KWIK-R-max, a General Algorithm for Sample Efficient RL
2.4. Languages, Actions, and Learning for Relational Models
2.5. Moving Forward
3. Online Action-Schema Learning
3.1. Terminology and the Representation
3.2. Example Language and Benchmark Problems
3.3. Related Work
3.4. Action-Schema Learning Problems
3.5. Learning Effect Distributions
3.6. Learning Pre-conditions and Conditions
3.7. Learning Small Pre-conditions
3.8. Learning Conditions
3.9. Learning Effects and their Distributions
3.10. The Full CED-Learning Problem
4. Apprenticeship Learning of Action Schemas
4.1. Apprenticeship Learning: An Alternative Learning Protocol
4.2. Separating Sample Efficiency in the Online and Apprenticeship Frameworks
4.3. Apprenticeship Learning of Action Schemas
5. Web-Service Task Learning
5.1. Web Service Task Learning
5.2. Terminology and Representation
5.3. Simple Task Learning
5.4. Full Task Learning
5.5. Reasoning with Task Graphs
5.6. Examples with Real Services
5.7. Related Work
5.8. Linking Back to Action Schemas
6. Language Extensions, Planning, and Concluding Remarks
6.1. OOMDPs as Action Schemas
6.2. Learning Description Logic Operators
6.3. Planning in Large State Spaces
6.4. Future Work
6.5. Concluding Remarks
Bibliography
Vita
List of Tables
1.1. Previous literature and topics
2.1. KWIK-learnable classes and architectures
3.1. Action-schema dynamics settings
3.2. Deterministic Blocks World
3.3. Stochastic Paint/Polish World
3.4. Metal Paint/Polish World
3.5. A difficult set of effects to learn
3.6. Summary of online schema learning
4.1. Blocks world operators learned from traces
4.2. Summary of apprenticeship learning of action schemas
5.1. An action schema version of FlightLookup
6.1. An OOMDP operator
6.2. Some popular DL constructors
6.3. A simple DL action schema
6.4. A partial DL action schema in ALN
List of Figures
1.1. Representations for a grid world
1.2. Blocks world illustration
2.1. The online RL protocol
2.2. A simple MDP
3.1. The CED-Learning problem
3.2. Paint-Polish D-Learning results
3.3. Blocks world pre-conditional CD-Learning results
3.4. Paint-Polish CD-Learning results
3.5. Learning deterministic effects in blocks world
3.6. A CEDBN
3.7. Predicting next states with a CEDBN
4.1. Autonomous and apprenticeship learners in Blocks World
5.1. Web service and software interaction
5.2. A task graph for flight booking
5.3. A more complicated task graph
5.4. A task graph encoding the knapsack problem
5.5. A task graph for the Amazon AlbumBuy task
5.6. Empirical results in task learning
6.1. A DL pre-condition learning template
6.2. A concept tree in ALN
6.3. Planners in Paint-Polish world
Chapter 1 Introduction
1.1 The Art of the State and the State of the Art
It has been called a "state" (Sutton & Barto, 1998), a "context" (Martino et al., 2006), a "situation" (McCarthy, 1963), and a myriad of other names, but whatever term you use, it's a description of the important facts or impressions of the world that influence our decisions. Since this concept is central to the goals of Artificial Intelligence (AI) (Russell & Norvig, 1995), it is no surprise that so much work has been done on creating better representations to encode states. But somewhat paradoxically, while early work in AI employed powerful generalized models steeped in logic (McCarthy, 1963), most modern work in the reinforcement learning (RL) (Sutton & Barto, 1998) subfield of AI instead uses "bare bones" propositional representations. This thesis is primarily about RL algorithms that use more powerful representations, akin to the early AI models.
Traditional propositional representations of states and dynamics can model domains like the grid world in Figure 1.1 in a number of ways. For instance, they might use:
• A discrete flat state space where every square in the grid is its own state and there is no generalization on the dynamics across squares.
• A continuous factored state space where the state might be described as continuous coordinates of the agent.
• A propositional discrete factored state space over discrete x and y values with some known structure (like independence of x and y when moving), but where each possible x and y
value needs to be learned about independently.
Each of these representations has benefits and drawbacks and each one might be the right representation for different tasks. The point here is not that there is one best representation, but that the engineering of states, and of domain descriptions in general, is really an art in today's machine learning. The representations listed above are common in the literature, and many algorithms have been developed for using these models, but as we will soon see, they are fundamentally limited in their ability to compactly represent many domains. This thesis concentrates instead on relational representations that, in this case, could model dynamics like "moving up increases the y coordinate by 1" (bottom right in Figure 1.1), a kind of generalization not available to the "flat" propositional learners.

Figure 1.1: A simple RL grid world and some different representational views: a flat MDP (9 states, 4 actions), a factored MDP (x-coordinate and y-coordinate factors, each with 3 values), and a relational (OOMDP) model with one rule per action, e.g., moveUp(X) increments X.y when ClearAbove(X) holds and leaves X.y unchanged when WallAbove(X) holds.
In the rest of this section, we describe some background on traditional reinforcement learning problems, relational representations, and our goal of melding the two. We cover most of the issues described below in far greater detail throughout the thesis, particularly the expanded background description in Chapter 2, but here we provide a brief overview of the main results of this document.
1.1.1 Exploration vs. Exploitation: A Classic Trade-off
One of the most prominent dilemmas in the reinforcement learning (Sutton & Barto, 1998) paradigm is determining when an agent should perform actions that exploit its knowledge about its environment versus actions that explore the environment—better refining its model of the dynamics so it can later exploit a more complete model. For instance, suppose an agent in the simple grid world above knew about all the rewards in each of the cells except for the top-right cell, where it is uncertain what the reward is. The decision a learning agent faces in such a situation is whether to exploit its current knowledge of the world by performing actions that avoid the unknown cell, or go to the top right corner and find out what the reward value is there. Notice there is some risk here. If the top right corner actually had a reward of −10, all the steps going there would essentially be wasted. On the other hand, if the agent never went to the top right, it might consistently miss out on a reward of +10. The trade-off is non-trivial because in real domains, agents can't realistically be expected to explore every possible state of the world, and especially shouldn't if they have already found an optimal way to perform a task.
Making decisions about when to explore, and when to just exploit the known model is the crux of many works in reinforcement learning, especially in model-based RL. However, almost all such works have focused on the limited propositional representations described above. For instance, several works (Kakade, 2003; Strehl et al., 2009; Auer et al., 2007) have derived sample complexity bounds for flat MDPs. These results bound the worst-case number of steps where an agent is acting suboptimally or limit the amount of lost reward (the so-called regret) incurred over the agent’s lifetime. However, in both cases, these (polynomial) bounds are based (among other factors) on the number of states in the domain. While this is acceptable for tiny grid worlds like the one in Figure 1.1, in the real world, such bounds are not as helpful since the number of ground states based on the agent’s sensors may be extremely large.
Moving up the generalization ladder, a number of works (such as Kearns & Koller (1999); Strehl et al. (2007)) have also derived sample complexity bounds for more general propositional models, such as Dynamic Bayesian Networks (DBNs) (Dean & Kanazawa, 1989), an example of which appears in the bottom left of Figure 1.1. In learning DBNs, the exploration-exploitation trade-off can be reformulated as one where the agent has to choose between exploiting the part of the Bayes net structure and parameters it is certain of, versus exploring and performing actions that will teach it more about the true underlying structure and parameters of the network. In this case, however, the sample complexity bounds do not depend explicitly on the size of the state space. Instead, the bounds are only polynomial in the number of factors (assuming they have small in-degree), which is often logarithmic in the number of states. This thesis is mainly
focused on deriving bounds that similarly scale sublinearly with the size of the state space, but in domains described using an even more powerful representation: relational models.
1.1.2 Relational Representations
While algorithms using propositional representations have performed admirably in the domains they are built for, their weak representational power limits the type of domains they can actually be applied in. Their main deficiency is that they cannot scale in large domains with repeated structure in the dynamics. For instance, consider the classic Blocks World domain illustrated in Figure 1.2. The domain consists of a set of |O| objects, specifically |O| − 2 blocks, a hand, and a table. Blocks can be picked up and put down, or in some variants, moved from one block to another in one fell swoop. Either way, there is considerable repetition in the domain dynamics. Picking up block a has virtually the same effect as picking up block b; only the name of the block is different. In such cases, an agent that learns the dynamics of picking up any block X has a significant advantage over an agent that reasons about each individual object separately. But none of the propositional representations above make use of this structure. The number of states in the flat MDP model of blocks world grows exponentially with the number of blocks (there are over 400,000 states with just 8 blocks). And the propositional DBN does not fare much better, because it needs to add factors for every ground proposition (On(a, b) and On(b, c) need to be separate factors), and most of these factors are highly inter-related (the in-degree of each factor grows with the number of objects), so both representations end up being exponentially large in the number of objects.
In contrast, relational representations exploit this repeated structure in the domain dynamics resulting in models whose size is independent of the number of objects in a domain. They do so by representing transitions with compact rules that we generally call relational action schemas. An example of such a rule is the stochastic STRIPS operator for the move action shown in Figure 1.2. This action description uses variables (X, From, To) to represent generic objects being moved. In this specific language, pre-conditions enforce type and state constraints that must hold for the action to work and Add/Delete lists describe the (again variablized) probabilistic effects of an action. By using variables, the effects of an action can be described regardless of the number of ground atoms in the domain, perfectly capturing the intuition about moving generic blocks described earlier.
Figure 1.2: Blocks world and a Stochastic STRIPS operator for the move action. 10% of the time the action has no effect.

In recent years, a number of efforts have been made to use such relational state and action descriptions, mostly within the fields of action schema learning (such as Pasula et al. (2007)) and relational reinforcement learning (RRL) (Dzeroski et al., 2001). Work in action schema learning usually focuses on taking logs of experience data (either fully observable state-action trajectories (Pasula et al., 2007) or partially observable action trajectories (Shahaf, 2007; Zhuo et al., 2009)), with the goal of inferring the correct action models, such as the Stochastic STRIPS rule in Figure 1.2. In contrast, work in relational reinforcement learning usually has more traditional RL goals—instead of having a target model, an agent is judged based on the quality of its policy. Most RRL work has considered a situation where experience is collected using epsilon-greedy (randomized) or some other heuristic exploration (Dzeroski et al., 2001; Croonenborghs et al., 2007b; Lang et al., 2010). From there, algorithms from the Inductive Logic Programming (ILP) (Nienhuys-Cheng & de Wolf, 1997) community are used, usually to create relational representations of the value function itself (for instance a relational decision tree). While other methods have come closer to our own in trying to create MDP models with these ILP techniques (such as Croonenborghs et al. (2007b)), they are usually batch approaches—data is either already stored or naively collected. In contrast to a large portion of the literature in traditional reinforcement learning, and also in contrast to the emphasis on sample complexity results in behavioral cloning
with relational models (Khardon, 1999b), the exploration-exploitation dilemma has not been explicitly addressed in these works, and where it has been, only heuristic solutions have been considered (Lang et al., 2010).
1.2 Bridging the Gap
This thesis’s main objective is to bridge this RL/RRL disconnect by bringing traditional RL- style sample complexity analysis to sequential decision making domains where relational rep- resentations are used. The algorithms and constructive positive results in this document allow agents to learn generalized relational descriptions of otherwise large and unwieldy environments (like blocks world) and still perform efficient (polynomial in the parameters of the individual domain) exploration, limiting the number of steps needed to reach optimal behavior in a manner consistent with earlier efforts with propositional representations.
We also go beyond this traditional RL setting in later chapters to consider the sample complexity of learning action schemas in a setting (apprenticeship learning) where an agent has access not only to the environment, but also to a teacher that can show it examples of policies being enacted in the environment. We further consider apprenticeship learning in a real-world application of learning web-service task descriptions from traces of users performing the task (still a sequential decision making problem, but very different from traditional RL). In both cases we again show how to bound the amount of experience (in this case interactions with the teacher), needed to learn the relational models. This progression is expounded upon in the roadmap below, but in a rare attempt at brevity, we can describe the results of this thesis in one sentence as:
1.2.1 Thesis Statement
Compact relational models of actions, including action schemas and task-specific web service descriptions, can be efficiently learned in the online-learning and apprenticeship-learning paradigms, and used by agents that efficiently explore in corresponding sequential decision making domains.
1.3 A Roadmap for this Document
This section provides an outline of each chapter in this thesis, describing the different learn- ing paradigms and sequential decision making problems considered in each chapter. We also summarize some of the major results of each chapter at a high level.
The thesis progresses roughly from the highly theoretical results to more applied work. This transformation can be seen clearly in the types of examples and experiments conducted in these chapters. At first, a number of "toy domains" are used to showcase and empirically validate the algorithms introduced in each section. This is especially true in Chapters 3 and 4 where domains like Blocks World and a variant of Logistics World (Minton, 1988) are used in most of the experiments. This is partly done to showcase the algorithms in easy-to-understand domains, but also because we are working in a somewhat uninformative learning paradigm (Chapter 3) and using planners that are dependent on the (very large) size of the state space (both chapters, but relaxed in Chapter 6). In contrast, Chapter 5 is largely application driven, applying the learning paradigm from Chapter 4 and a more realistic relational representation to tackle a real-world web-service modeling challenge. Many of the examples in that chapter are built from real data and real-world third-party web services. While Chapter 6 mostly reverts back to the "toy domains" of the earlier chapters, the extensions utilized in this chapter are designed to push these earlier results towards use in real domains. These include the use of more complex action languages and better planners for tackling large state spaces efficiently. We now present more detailed summaries of each of the chapters.
1.3.1 Background Material
Chapter 2 introduces much of the background theory necessary for understanding the theoretical results in this work. Topics covered include basic reinforcement learning terminology and theory, basic model-based reinforcement learning algorithms like Rmax, and planning algorithms like Value Iteration. We also describe the exploration-exploitation dilemma in reinforcement learning and provide an extended treatment of the KWIK ("Knows What It Knows") framework (Li et al., 2008) for measuring the sample complexity of learning models in RL. While some of the related work in this thesis is covered on a chapter-by-chapter basis, Section 2.4 presents a detailed description of the historical and state-of-the-art methods from the field of relational reinforcement learning.
1.3.2 Online Action Schema Learning
Chapter 3 describes a plethora of efficient reinforcement learning algorithms for different sub-problems and types of dynamics for domains described with relational descriptions. Unlike later chapters, the analysis in this chapter considers an agent's experience to be confined to the traditional reinforcement learning channel: an online agent taking steps in the world and observing the changing state and rewards based on these actions. Because of this limited form of experience, there are several negative results in this chapter, but the findings are the most comparable to traditional reinforcement learning results. That is, "sequential decision making" in this chapter refers exactly to the traditional RL notion of choosing an action to maximize possible long-term reward and solving the exploration-exploitation dilemma efficiently. This latter quality is evaluated by whether or not the algorithms meet the KWIK criteria as introduced in Chapter 2.
This chapter introduces terminology and definitions for a class of languages we call relational action schemas. This class covers a wide array of languages that describe domain dynamics with relational state descriptions, including traditional STRIPS and more advanced stochastic descriptions. An example action schema for the action move(X, From, To) in Blocks World appears in Figure 1.2. While we will not get into the technical details of the representation here, it is important to note that in general, action schemas are composed of (1) a parameterized action (move(X, From, To)), (2) some pre-conditions or conditions based on a relational description of the current state, (3) a set of effects (perhaps conditional effects) that describe how the action will change the state (the Add and Delete lists here), and (4) a probability distribution over the effect sets determining the frequency of each effect when the action is invoked (in the example schema, the block is moved with probability 0.9 but nothing happens with probability 0.1).
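To make these four components concrete, the following sketch (illustrative Python, not the thesis's implementation; the exact pre-condition literals are an assumption based on standard Blocks World formulations) encodes the move(X, From, To) operator of Figure 1.2 as a data structure:

```python
from dataclasses import dataclass

@dataclass
class ActionSchema:
    """A relational action schema: a parameterized action, pre-conditions,
    a set of effects (add/delete lists over variablized literals), and a
    probability distribution over those effects."""
    name: str
    params: list[str]          # variables, e.g., ["X", "From", "To"]
    preconditions: list[str]   # conjunction of variablized literals
    effects: list[tuple[list[str], list[str]]]  # (add list, delete list)
    effect_probs: list[float]  # effect_probs[i] = Pr[effects[i] occurs]

# The move(X, From, To) operator of Figure 1.2: with probability 0.9 the
# block is moved; with probability 0.1 nothing happens (the empty effect).
move = ActionSchema(
    name="move",
    params=["X", "From", "To"],
    preconditions=["Block(X)", "On(X, From)", "Clear(X)", "Clear(To)"],
    effects=[
        (["On(X, To)", "Clear(From)"], ["On(X, From)", "Clear(To)"]),
        ([], []),  # no-op effect
    ],
    effect_probs=[0.9, 0.1],
)
```

Because the variables X, From, and To stand for generic objects, this single structure describes the move dynamics for any number of blocks.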
For this class of languages, we investigate several sub-problems that involve learning only portions of the schemas while given others (for instance, given the effects, how hard is it to learn the pre-conditions and the effect distributions? What about just the distributions?). Each of these sub-problems is in turn considered with different restrictions on the action schemas themselves (for instance if they are deterministic or stochastic, or if they have pre-conditions or conditional effects, and other restrictions like the size and type of the conditions). Each of these different learning-problem/domain-constraint pairings elicits a different architecture and set of component learning algorithms in the online setting, leading to a number of theoretical
results. Some of the main theoretical results of Chapter 3 include:
• Learning the probability distributions over effects is a non-trivial problem due to "effect ambiguity": multiple effects might look similar, making it impossible to simply "count" the frequency of each one. But a KWIK linear regression algorithm provides an efficient solution to this learning problem.
• Large conjunctions are not efficiently learnable as pre-conditions or conditions governing effects. But tractable algorithms are possible for conjunctions with only O(1) terms.
• Deterministic STRIPS effects are easily learnable, and more complex algorithms can be used to learn effects in a number of stochastic STRIPS settings, including one where the number of effects is known to be bounded by a constant.
• A number of KWIK architectural solutions that combine component KWIK learners (like the linear regression or small condition learners above) can be used to solve various com- binations of the sub-problems (like learning just the conditions and effect distributions when given the effects), and, with some assumptions, even the full action schemas are efficiently learnable using these KWIK architectures.
1.3.3 Apprenticeship Action Schema Learning
While Chapter 3 performed analysis of learning relational action schemas in the traditional (online) reinforcement learning setting, the class of domains that is learnable in this setting is limited by the types of models that are KWIK-learnable. Unfortunately, many classes of functions that are common with relational representations, such as conjunctions of more than a few terms, are not KWIK-learnable for online RL agents. To circumvent this deficiency, Chapter 4 considers learning such models in an apprenticeship learning protocol where an agent can learn from both its own experience and can receive traces (trajectories of ⟨state, action, reward⟩ tuples) of a teacher performing actions in the domain.
This new channel of experience slightly changes the type of sequential decision making problems being considered and the nature of the exploration/exploitation trade-off. Instead of trying to limit the number of suboptimal steps made by the agent during exploration, we instead need to limit the number of times the teacher will step in to show the agent how to perform a task. We do so by limiting the number of episodes where an agent executes a policy that is outperformed by the teacher in the same domain instance. Hence, the sequential decision
making problem is still one of an agent choosing actions, but instead of trying to maximize long-term reward in the overall environment, it is just trying to take actions that do no worse than its teacher.
While this dependence on the teacher’s policy may seem to be a drawback at first, one should note that if the teacher is optimal, the agent will also learn to be optimal, so in that case nothing would be lost versus the more limited traditional RL agents. Also, by essentially limiting the agent’s online exploration, practical problems where an agent would not want to explore a vast expanse of territory, or even worse fall off a cliff, are approachable thanks to this guarantee. Moreover, we show that there is a much larger class of models that is learnable under these conditions than in the traditional RL setting. These theoretical improvements allow us to relax the restriction on the size of pre-conditions and conditions from Chapter 3 and have implications of interest to the larger RL community. The major results of this chapter are:
• We introduce a new learning framework called Mistake Bound Predictor, which covers the classes of KWIK-learnable functions and Mistake Bound (Littlestone, 1988) learnable functions and many combinations of these classes.
• We present a general theoretical result connecting learnability in this model class to ef- ficient behavior under the “do no worse than the teacher” criteria for the apprenticeship learning setting.
• Using these results, we can relax the size restrictions on learning pre-conditions in all the learning problems from Chapter 3, allowing the conjunction to reference all the domain
literals.
• In the case of conditional effects, the size restriction can be similarly relaxed in the deter- ministic case where the effects are given by using a clever k-term DNF learning algorithm.
1.3.4 Apprenticeship Learning of Web Service Descriptions
While the previous two chapters dealt mostly with theoretical issues and toy problems, Chapter 5 is about a specific application of relational models in the apprenticeship setting. Specifically, we consider the problem of learning web-service task descriptions from traces of users performing a task. These tasks run the gamut from mundane to complex, including tasks for looking up and booking flights, filling in reimbursement forms for travel costs, and even buying birthday gifts for colleagues. Each task involves invoking a sequence of web services, which can be viewed 11
as parameterized actions, similar in form to the action schemas used in the theoretical portion of this work.
But unlike those earlier examples, the sequential decision making problem in this setting is not so much choosing actions (the sequence of web services that need to be invoked is known to the agent), but rather choosing the correct parameters to those actions and making predictions about their outcomes. This is different than the traditional reinforcement learning setting, especially because we perform the analysis under the apprenticeship learning protocol. But what makes it relevant to this thesis is that (1) relational models are necessary here to represent the links between elements of the task (for instance realizing that the “departure city” for a flight booking should be the flyer’s “home city”), and (2) there is a sequential component to this problem because we often need to choose inputs to invoke a service in order to gain information or because of requirements of a service used later in the task (for instance looking up flight information to make sure it fits a user’s constraints before booking it).
While this chapter is focused on practical matters, there are still some theoretical results reported based on the number of traces needed (similar to the criteria from Chapter 4) to learn several classes of tasks. But a large portion of the chapter is also devoted to the problem of mining the relational models from real data, because in this case the "traces" provided are only assumed to be XML documents reporting the inputs and outputs of each service invocation (a sort of structured log of users performing each task). The main results of this chapter are as follows.
• We introduce a new relational representation for web-service tasks called a task graph which stores structural and semantic links between the inputs and outputs of services as well as between elements associated with different services.
• We link the learning of these representations to the apprenticeship learning results from the previous chapter by showing, under certain conditions, these task graphs are mistake-bound learnable from traces of users completing the tasks.
• We introduce several complex relations for selection of elements and rules for handling missing objects and lists, which need to be modeled in order to understand the tasks. We show that our task-graph representation and efficient learning algorithms can be adapted
for use with these new challenges.
• We present empirical results demonstrating our system learning tasks from real-world traces of XML documents showing clients invoking the services in the task. We show experiments where the tasks are composed of services from single providers and also multiple service providers (both Google and Amazon services used in different parts of a task). Our system performs remarkably well in these situations, often learning the full tasks with only a few traces.

                             Sample Efficient   Sequential Decision   Relational
                             Learning           Making                Representations
Sample Efficient Learning    -                  PAC-MDP RL            PAC ILP
Sequential Decision Making   PAC-MDP RL         -                     Relational RL
Relational Representations   PAC ILP            Relational RL         -

Table 1.1: Pairwise combinations of the three main features of this thesis that have been considered in the literature. This thesis captures all three.
1.3.5 Extensions
Chapter 6 explores several extensions of the theoretical (and more mainstream-RL) work presented in Chapters 3 and 4. First, where those chapters dealt almost exclusively with examples from the Stochastic STRIPS language, we show that these results can be expanded to another relational language from the literature (Object Oriented MDPs) that is covered by the relational action schema family. We then investigate a class of action languages that go beyond the simple relational representations studied in the bulk of this work to allow for constructors from a powerful family of logical languages called Description Logics (Baader et al., 2003). This portion is meant to bridge the gap between simple relational languages like STRIPS and the more expressive Situation Calculus formalisms that allow full first-order logic descriptions of states. We present results that link the efficiency of such algorithms to well-studied operations from the DL literature.
Finally, we present results concerning the use of sample-based planners for the large state spaces considered in this work. This investigation is meant to speed up the computation time of our algorithms, which hitherto have relied on planners that scale at least linearly with the size of the state space (which is usually exponential in the number of objects in relational domains). We show in this section that certain sample-based planners, which scale independently of the state space size, can be integrated into our learning framework without sacrificing our sample efficiency guarantees, thus expanding the practical reach of our algorithms. However, other than in this section, we generally treat the computational burden of planning as an orthogonal issue, instead concentrating on the sample efficiency of the learning algorithms.
1.4 Common Threads
Above, we laid out a roadmap for the rest of this work, and while we cover a wide variety of problems, we note here that there are several common threads running through the chapters. Specifically, all of the algorithms perform sample efficient learning of relational representations for sequential decision making problems. Combinations of these factors have occurred in the literature before (see Table 1.1), including work on efficiently learning policies in relational worlds (Khardon, 1999b,a). However, these three properties have not been extensively studied together as we attempt here. This is unfortunate because fast learning, generalized representations, and chaining actions together are fundamental problems in AI and in our understanding of human decision making. While each factor can be studied in a vacuum, we contend that all three parts must be considered together to paint a true behavioral portrait. So without further ado...
Chapter 2 Background and Related Work
In this chapter, we cover a number of important frameworks, algorithms, and results that underpin the theoretical and empirical studies carried out in later chapters. We begin by describing some basic algorithms for reinforcement learning and frameworks for measuring their sample complexity. Later in the chapter, we recount prior work on modeling action dynamics with relational languages and then describe a branch of the literature (RRL) that connects this work back to reinforcement learning.
2.1 Reinforcement Learning
In this section we describe the general reinforcement learning (Sutton & Barto, 1998; Kaelbling et al., 1996) (RL) framework, algorithms, representations, and some important theoretical results. We begin with a representation-agnostic description of the terminology and protocol that underlies RL. Following that, we describe basic model-based reinforcement learning, which is the class of algorithms considered throughout this work. We also describe planning algorithms that are used in model-based RL, with specific attention on Value Iteration. After this, we briefly discuss a "compact" propositional RL representation (factored MDPs) that uses generalization over the state space to make learning and planning more tractable in large, but structured domains. Finally, we discuss some issues and algorithms related to the exploration/exploitation trade-off in RL, a topic we cover in far greater detail in the next section.
2.1.1 General Reinforcement Learning and Markov Decision Processes
At the most basic level, a reinforcement learning agent interacts with its environment through the protocol illustrated in Figure 2.1. At timestep $t$, the environment has some state $s_t$ and reports some observation $o_t$ to the reinforcement learning agent, as well as a special observation signal called the reward $r_t$, which may also be based on the agent's previous action. The agent then is able to perform an action $a_t$ in the environment, which, along with the environment's current state, will determine (perhaps stochastically) the environment's next state $s_{t+1}$, which in turn produces a new observation, and the cycle repeats. This back-and-forth interaction continues over the agent's lifetime, which may be infinite, have some finite horizon $T$, or be episodic (see below). In this work, we consider this interaction as taking place over discrete timesteps with no latency in the signals, though continuous-time RL has been studied in the literature (Puterman, 1994) and methods for dealing with latency in these signals have also been addressed (Walsh et al., 2009a). The agent's general goal during this interaction is not necessarily to reach a specific state, but instead to maximize some function of the reward sequence $r_0 \ldots r_t$ (see Section 2.1.2 below) collected over its lifetime.

Figure 2.1: The online RL protocol.
Delving deeper into the “environment box”, RL environments can be described using a general representation called a Markov Decision Process (MDP) (Puterman, 1994), defined as follows:
Definition 1. A general Markov Decision Process (MDP) is described by a 7-tuple $\langle S, A, R, T, \gamma, S_0, S_T \rangle$ where $S$ is a set of states the environment can be in, $A$ is a set of actions an agent can take, $R : S, A \mapsto \Pr[[R_{min}, R_{max}] \in \Re]$ is a reward function determining $r_t$, $T : S, A \mapsto \Pr[S]$ is a transition function determining state $s_{t+1}$, and $\gamma \in [0, 1)$ is a discount factor, which measures the degradation of future rewards. $S_0 = \Pr[S]$ is a probability distribution over start states (determining $s_0$) and $S_T$ is a set of terminal states for episodic environments as defined below.
As an example of the basic MDP structures, in the simple grid-world in Figure 1.1, the set of possible states is simply all 9 cells that the agent can occupy, and the set of possible actions is $A = \{Up, Down, Left, Right\}$, each of which has deterministic effects. Reward of +10 is received if the agent enters the upper-right square (which is also a terminal state), while $r_t = -1$ for all other state/action pairs.
Figure 2.2: A simple Markov Decision Process. The red (solid) action usually moves the agent to the right but with probability 0.1 moves it left. The blue (dashed) action does the opposite.
Throughout this work, we assume both $T$ and $R$ are stationary (time invariant), though we note that a portion of the literature has been devoted to non-stationary MDPs where these functions can shift (perhaps adversarially) over time (see Lane et al. (2007)). While the example above was deterministic, general MDPs can have non-determinism in both the transitions and the rewards. An example of a basic non-deterministic MDP is depicted in Figure 2.2. There are two actions in this MDP, a red (right) one and a blue (left) one, and each has a 0.9 probability of achieving its intended outcome, and with probability 0.1 the unintended outcome occurs.
In a flat MDP, the states $S$ have no room for generalization—that is, states $s_i$ and $s_j$ in $S$ where $i \neq j$ have no inherent commonality and all that can be said about them is that they are potentially different in terms of $T$ and $R$. We call such representations flat MDPs and when employing such a model, a reasonable representation for $T$ and $R$ is in a tabular form. That is, $T$ and $R$ can be stored simply as tables with unique indexes for every $\langle s, a \rangle$ pair. But such tables are potentially enormous in real-world domains. In this thesis, we will be investigating MDPs that contain far more structure, so that even with exponentially (in the domain factors) large $|S|$, compact models of $T$ and $R$ can be maintained without resorting to the tabular format.
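To make the tabular form concrete, here is a minimal sketch (assumed Python, not from the thesis) of the 3x3 grid world of Figure 1.1 stored as tables indexed by state-action pairs:

```python
# Tabular encoding of the 3x3 grid world from Figure 1.1 (an illustrative
# sketch, not thesis code): states are (x, y) cells, the four actions move
# deterministically, and reward is +10 for entering the top-right cell and
# -1 otherwise.
S = [(x, y) for x in range(3) for y in range(3)]
A = ["Up", "Down", "Left", "Right"]
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
GOAL = (2, 2)

def next_state(s, a):
    """Deterministic successor: move one cell unless a wall blocks the way."""
    nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    return nxt if nxt in S else s  # bumping into a wall leaves the agent put

# T[(s, a)] is a distribution over next states; R[(s, a)] is the reward.
# Both tables have one entry per <s, a> pair -- exactly the representation
# whose size this thesis aims to avoid depending on.
T = {(s, a): {next_state(s, a): 1.0} for s in S for a in A}
R = {(s, a): (10.0 if next_state(s, a) == GOAL else -1.0)
     for s in S for a in A}
```

Even in this toy domain the tables have $|S| \times |A| = 36$ entries; in the relational domains studied later, $|S|$ grows exponentially with the number of objects, which is why the tabular form is abandoned.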
When interacting with an MDP, an agent can be considered to be following a policy $\pi : S \mapsto \Pr[A]$ with regard to its action selection. A policy is said to be stationary if the policy that the agent uses at any time $t$ is the same as the policy it uses at $t + 1$. Furthermore, a policy is said to be deterministic if it maps each state to a single $a \in A$. The quality of a policy is determined by a function of the rewards it garners over the agent's lifetime, a subject we return to in Section 2.1.2.
Observability
Thus far, we have not discussed the observation function $\chi : S \mapsto \Pr[Obs]$, which (perhaps stochastically) generates an observation for the agent from the world state. In general, if $\chi$ takes on any stationary distribution where the observability property below does not hold, the model becomes a Partially Observable Markov Decision Process (POMDP), where planning and learning can become intractable (Littman, 1996) without extra structure. In this work, we assume that the MDP is fully observable, meaning the following:
Definition 2. A Markov Decision Process is fully observable if the set of observations Obs is a bijection onto the set of states S. That is, every state produces a unique observation.
We will, however, briefly return to this partially observable setting when discussing related work on relational RL models that are learned under these provably harsher conditions.
Episodic MDPs
A number of results in this thesis are for episodic MDPs where an agent interacts with the environment until reaching a terminal state, one of the $s_i \in S_T$, at which time the agent is reset into a new initial state as dictated by the initial state distribution $S_0$. We also often employ an empirical "hard cap" on the number of steps an agent can take in an environment, which we use simply to limit the length of experiments, though we return to the problem of "how long to let a bad agent run" when we study the apprenticeship learning setting in Chapter 4.
Factored MDPs
Above, we have described MDPs in their most general form, the so-called flat MDP, which has a completely unstructured state space. That is, there is no correlation or similarity between the effects of an action (or the rewards) when applied in states $s_i$ and $s_j$ when $i \neq j$. But this representation does not conform to the way we normally view the world. For instance, if an agent finds that its alarm sensor triggers easily when it is out in the rain, the sensor will activate if it's in the rain at night or during the day. The states are wildly different in terms of visibility, temperature, and other factors, but their commonality on a single factor gives us a way to compactly describe its transitions for the sensor observation, say: "With probability 0.8 the sensor activates when it is raining and it wasn't on before". Below, we review a compact representation for propositional MDPs (the classical factored MDP) that
formalizes such intuitions. This thesis, however, deals with an even further generalized and compacted representation, relational MDPs, as introduced in Section 2.4.
A factored-state Markov decision process (fMDP) is an MDP where a state $s$ is the Cartesian product of $m$ smaller propositional literals, or factors: $S = l_1 \times l_2 \times \ldots \times l_m$. A standard assumption (Kearns & Koller, 1999) in factored MDPs is that for each $i$ there exists a set $\Gamma_i$ of size $k = O(1)$ such that $s_{t+1}[i]$ depends only on $s_t[\Gamma_i]$ and $a_t$ (this can be relaxed to also allow members of $\Gamma_i$ in $s_{t+1}$ to capture correlated effects). That is, each factor is dependent only on a small set of factor values from the previous timestep (whether the sensor was on or not and whether it was raining) and the last action. The transition probabilities are then $T(l' \mid s, a) = \prod_{i=1}^{m} T_i(l'[i] \mid s[\Gamma_i], a)$ for each $l, l' \in S$, $a \in A$. Under these conditions, the transitions can be compactly described using a Dynamic Bayesian Network (DBN) (Dean & Kanazawa, 1989) as shown in the bottom-left representation in Figure 1.1. An analogous assumption can be made for the reward function. Algorithms exist for planning in DBNs with known parameters (for example, Guestrin et al. (2003b); Hoey et al. (1999)), for learning transition probabilities (Kearns & Koller, 1999) and dependency structure (Diuk et al., 2009), and for learning their reward functions (Walsh et al., 2009b).
Intuitively, DBNs capture the structure of transition functions where states are described in terms of attribute-values for a set of features whose size is logarithmic in $|S|$. This makes them a very powerful tool in RL because only this compact representation needs to be learned to know $T$ (not the full exponentially sized tabular form). However, in domains that contain objects, DBNs need to represent the changes to every object's attributes with different factors. So if instead of having one robot with a rain sensor, we had $n$ such robots, the DBN representing each of their sensor values would contain $n$ unlinked copies of the same structure. This is intuitively unappealing because we can describe this far more compactly with a "rule" about a generic robot's sensor dynamics as applied to all $n$ robots. The relational MDPs used in this thesis will do just that.
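The per-factor product above can be sketched directly (hypothetical Python; the factor names, action label, and all probabilities other than the 0.8 from the rain-sensor example are illustrative assumptions):

```python
def transition_prob(s, s_next, a, cpts, parents):
    """Factored transition probability
         T(s_next | s, a) = prod_i T_i(s_next[i] | s[Gamma_i], a),
    where parents[i] is the parent set Gamma_i and cpts[i] maps
    (parent values, action, next value of factor i) to a probability."""
    p = 1.0
    for i in range(len(s_next)):
        parent_vals = tuple(s[j] for j in parents[i])
        p *= cpts[i][(parent_vals, a, s_next[i])]
    return p

# Rain-sensor example from the text: factor 0 = raining, factor 1 = sensor
# on. The sensor depends only on the rain and its own previous value, so
# its CPT stays O(1) in size no matter how many other factors exist.
parents = {0: (0,), 1: (0, 1)}
cpts = {
    0: {((True,), "wait", True): 0.7, ((True,), "wait", False): 0.3,
        ((False,), "wait", True): 0.1, ((False,), "wait", False): 0.9},
    1: {((True, False), "wait", True): 0.8,   # sensor activates w.p. 0.8
        ((True, False), "wait", False): 0.2,  # when raining and it was off
        ((True, True), "wait", True): 1.0, ((True, True), "wait", False): 0.0,
        ((False, False), "wait", True): 0.0,
        ((False, False), "wait", False): 1.0,
        ((False, True), "wait", True): 1.0,
        ((False, True), "wait", False): 0.0},
}
print(transition_prob((True, False), (True, True), "wait", cpts, parents))
# -> 0.56, i.e., 0.7 (stays raining) * 0.8 (sensor turns on)
```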
2.1.2 The Value Function
A reinforcement learning agent's goal is to maximize some function of the sequence of rewards it sees while interacting with an environment. While a number of different functions have been studied in the literature, including average reward (Mahadevan, 1996), in this thesis we will consider maximization of expected discounted reward as calculated by the Value Function (Puterman, 1994) induced by an MDP and a policy $\pi$.
$$V^\pi(s) = \sum_{a \in A} \Pr(a \mid \pi)\left(R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V^\pi(s')\right) \qquad (2.1)$$
This equation itself represents a telescoping sum of the expected future rewards starting from state $s$ and following policy $\pi$ thereafter. Specifically, for a sequence of states $s_0, s_1, \ldots$ we have
$$V^\pi(s_0) = \sum_{t=0}^{\infty} \gamma^t r_t \qquad (2.2)$$
It is known (Puterman, 1994) that using such a value function, there exists at least one optimal policy $\pi^*$ such that $\forall s \in S,\ V^{\pi^*}(s) = V^*(s) \geq V^\pi(s)$, and that there always exists one such policy that is deterministic. We can write this equation (known as the Bellman Equation (Bellman, 1957)) for $V^{\pi^*}$ as

$$V^*(s) = R(s, \pi^*(s)) + \gamma \sum_{s' \in S} T(s, \pi^*(s), s')\, V^*(s')$$
$$= \max_a \left[ R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V^*(s') \right] \qquad (2.3)$$
We will often find it useful when discussing reinforcement learning algorithms that have to compare the values of different actions to consider a state-action version of the value function, $Q^\pi(s,a)$, which is defined as:

$$Q^\pi(s,a) = R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V^\pi(s') \qquad (2.4)$$
Intuitively, $Q^\pi(s,a)$ simply represents the sum of expected discounted rewards for an agent starting at state $s$, taking action $a$, and afterwards following policy $\pi$. Analogously, the Q-values for actions followed by the optimal policy can be written as:

$$Q^*(s,a) = R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V^*(s') \qquad (2.5)$$
Notice also that $V^*(s) = \max_a Q^*(s,a)$ because the optimal policy can always be represented deterministically. Since the ultimate goal of a reinforcement learning agent is to select actions to maximize this quantity, it is important that we have algorithms to calculate these values given an MDP. The next section discusses such planning algorithms.
2.1.3 MDP Planning Algorithms
This section considers the general MDP planning problem defined as the following:
Definition 3. The MDP Planning Problem with accuracy $\epsilon$ is, given an MDP, find a policy $\pi$ that is $\epsilon$-optimal, that is, $\sum_a \pi(s,a)\, Q^\pi(s,a) \geq V^*(s) - \epsilon$.

Notice that this planning problem is different from the full reinforcement learning problem because in the planning case, $T$ and $R$ are already known, and don't need to be learned. Thus, planning is simply a computational process; no interaction is needed with the environment.
In this section, since we are only considering general MDPs without any other structure, we focus only on representation-agnostic planners which work by finding the fixed point of the set of linear equations represented by the optimal value function for all states. Perhaps the most intuitive method for finding such a fixed point is a Dynamic Programming solution to the MDP planning problem called Value Iteration (Puterman, 1994) as described in Algorithm 1.
Algorithm 1 Value Iteration
1: ∀s, V(s) = 0
2: δ = ∞
3: while δ > ε do
4:   δ = 0
5:   for s ∈ S do
6:     oldV = V(s)
7:     V(s) = max_a [ R(s,a) + γ Σ_{s'∈S} T(s,a,s')V(s') ]
8:     π(s) = argmax_a [ R(s,a) + γ Σ_{s'∈S} T(s,a,s')V(s') ]
9:     if |oldV − V(s)| > δ then
10:      δ = |oldV − V(s)|
11:    end if
12:  end for
13: end while
14: Return π
Intuitively, Value Iteration simply iterates through all of the states and updates their value functions based on the Bellman Equation for the policy currently considered optimal and the current values of the other states. This approach takes polynomial (in $|S|$ and $|A|$) time per iteration, and the number of iterations before convergence is known to be pseudo-polynomial in the accuracy parameter $\epsilon$ and the discount factor $\gamma$ (for a more complete analysis see Madani (2002)).
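For concreteness, the loop of Algorithm 1 can be realized in a few lines of Python (an illustrative sketch over the tabular T and R dictionaries sketched earlier, not the thesis's implementation):

```python
def value_iteration(S, A, T, R, gamma, eps):
    """Tabular Value Iteration (Algorithm 1): sweep Bellman backups over
    all states until the largest per-state change falls below eps."""
    V = {s: 0.0 for s in S}

    def backup(s, a):
        # One-step Bellman backup: immediate reward plus discounted
        # expected value of the successor states.
        return R[(s, a)] + gamma * sum(p * V[s2]
                                       for s2, p in T[(s, a)].items())

    delta = float("inf")
    while delta > eps:
        delta = 0.0
        for s in S:
            old_v = V[s]
            V[s] = max(backup(s, a) for a in A)
            delta = max(delta, abs(old_v - V[s]))
    # Extract the greedy policy with respect to the converged values.
    pi = {s: max(A, key=lambda a: backup(s, a)) for s in S}
    return pi, V

# e.g., with the grid-world tables from Section 2.1.1:
# pi, V = value_iteration(S, A, T, R, gamma=0.95, eps=1e-4)
```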
Other methods exist for planning in general MDPs, including linear programming (Bertsimas & Tsitsiklis, 1997) (which provides strong polynomial time guarantees) and Policy Iteration (Puterman, 1994). For most of the algorithms in this thesis that rely on a "general MDP planner" all of these methods can be thought of as interchangeable, as they are each essentially representation agnostic (everything is treated as a flat MDP). With that being said, several variants of Value Iteration are presented in this thesis that exploit MDPs with specific structure, so in those cases analogues to the structural modifications would have to be developed for each algorithm.
2.1.4 Model-based Reinforcement Learning Algorithms
Having covered the general MDP model and planning algorithms above, we can now return to the reinforcement learning problem. Here we describe the model-based reinforcement learning setting, the framework used in most of this document. We will briefly discuss model-free methods in a later section.
In model-based reinforcement learning, the agent maintains some model $M_A$ of the environment's dynamics ($T$ and $R$) as learned so far and uses this internal model to decide how to act next. This model is not necessarily (and in this thesis is almost never) a flat MDP or even the same model class as the domain's true encoding (if it has one). Model-based reinforcement learning algorithms alternate between two phases on each timestep (see Algorithm 2). In the first phase, the algorithm selects an action using a planner $P$ (such as Value Iteration) based on some interpretation of its internal model ($M_A^I$). These interpretations fill in the unknown (by some threshold described below) parts of the model so that it is usable to the planner. The exact scheme used in instantiating this interpretation has a critical effect on the behavior of the algorithm and is discussed at length in the next section and beyond.

In the second phase of model-based reinforcement learning, the agent receives feedback from the environment based on the action it took (a new observation and reward). This feedback is then used to update $M_A$ and the process repeats on the next timestep. While this architecture is certainly elegant, a naïve implementation can falter, because the model interpretation involves filling in parts of $M_A$ where there is little information to construct a perfect model. We now describe how strategies for filling in these gaps affect an agent's ability to explore and exploit its domain.
Algorithm 2 General Model-based RL
1: Agent knows S (in some form), A, γ, ε, δ
2: Agent has access to planner P guaranteeing ε accuracy
3: Agent has access to the environment E which is representable as MDP M
4: Initialize agent model M_A
5: s_0 = start state
6: while episode not done do
7:   a_t = P.getAction(M_A^I)
8:   ⟨s_{t+1}, r_t⟩ = E.Execute(a_t)
9:   M_A.update(s_t, a_t, r_t, s_{t+1})
10: end while
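The interpretation step in line 7 is where the exploration strategy lives. One common optimistic scheme, sketched below in Python (an R-max-style illustration under our assumptions, not necessarily the exact scheme used later in this thesis), fills each insufficiently visited state-action pair with a transition to a fictitious maximal-reward state; the planner (e.g., the value_iteration sketch above) is then run on the augmented model:

```python
def interpret(T_hat, R_hat, counts, S, A, m, r_max):
    """Optimistic interpretation M_A^I (an R-max-style sketch): pairs
    tried fewer than m times are 'unknown' and are rerouted to a
    fictitious absorbing state of maximal reward, so the planner values
    reaching them and the agent is driven to explore."""
    s_max = "s_max"  # fictitious max-reward state
    T_I, R_I = {}, {}
    for s in S:
        for a in A:
            if counts.get((s, a), 0) >= m:
                # Known: use the empirical estimates learned so far.
                T_I[(s, a)], R_I[(s, a)] = T_hat[(s, a)], R_hat[(s, a)]
            else:
                # Unknown: optimism in the face of uncertainty.
                T_I[(s, a)], R_I[(s, a)] = {s_max: 1.0}, r_max
    for a in A:  # s_max is absorbing under every action
        T_I[(s_max, a)], R_I[(s_max, a)] = {s_max: 1.0}, r_max
    return S + [s_max], T_I, R_I  # plan on the augmented MDP
```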
2.1.5 Exploration / Exploitation in Reinforcement Learning
Intuitively, the exploration/exploitation dilemma describes a situation where an agent has a partially learned model with different degrees of certainty for different state-action pairs. In such a situation, the agent must decide whether it is best to explore the environment, that is, travel to areas where it has uncertainty and learn more about the dynamics and rewards there, or exploit the areas where it has great certainty by taking the best actions it can with respect to the known model.
The simplest example of this dilemma is embodied in the k-armed bandit problem (Fong, 1995), which is a degenerate reinforcement-learning environment with only a single state and k actions. Each of the actions can be thought of as a slot machine with a payoff distribution over [Rmin, Rmax], but this distribution is unknown for each arm. Since there are finitely many arms, at least one of them must be optimal, that is, it has the highest expected reward. On every trial, since the distributions are stationary, the agent should pull this optimal arm. That is exploitive behavior, but in order to find the optimal arm, the agent needs to try the arms systematically, which is exploration. One common measure of the sample complexity of this problem (described more formally below) is the number of times the agent pulls a suboptimal arm.
The solution to this trade-off is non-trivial as behavior at either extreme can lead to disaster.
If the agent pulls a suboptimal arm on the first try (t = 0), receives reward r_0 > Rmin, and then always exploits with no further exploration, it will continue to pull this suboptimal arm in perpetuity, resulting in an infinite sample complexity. Likewise, if the agent is constantly exploring, essentially ignoring all the empirical evidence it has gathered and just pulling arms
at random, in the limit its sample complexity is also infinite. In the k-armed bandit problem, an optimal solution to this dilemma with respect to the PAC framework (described in Section 2.2) was derived by Fong (1995). This algorithm simply maintains upper confidence bounds on the reward function for each arm by storing the sample mean and calculating confidence bounds using Hoeffding’s Inequality (Hoeffding, 1963). The agent then pulls, on each trial, the arm with the highest upper bound. This solution has a provably polynomial PAC sample complexity on the number of suboptimal arm pulls, with probability (1 − δ).
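A sketch of this interval-based arm selection (our illustration, not Fong's exact algorithm; the constant inside the bonus term and the [0, 1] reward range are assumptions):

    import math

    # Sketch of upper-confidence-bound arm selection via Hoeffding's
    # Inequality. Rewards are assumed to lie in [0, 1] for simplicity.
    def pull_arms(arms, trials, delta=0.05):
        counts = [0] * len(arms)
        means = [0.0] * len(arms)
        for t in range(trials):
            def upper_bound(i):
                if counts[i] == 0:
                    return float('inf')  # unexplored arms look maximally good
                bonus = math.sqrt(math.log(2 * trials / delta) / (2 * counts[i]))
                return means[i] + bonus
            i = max(range(len(arms)), key=upper_bound)
            r = arms[i]()  # each arm is a function returning a stochastic reward
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]  # incremental sample mean
        return means, counts

Note how exploration emerges automatically: an under-sampled arm keeps a wide confidence interval and therefore a high upper bound, so the agent revisits it until the interval shrinks.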
With this simple but intuitive example in mind, we now formally define a notion of sample complexity for a reinforcement-learning agent. Such definitions have taken on many forms in the literature (Kakade, 2003; Auer et al., 2007), but all of them subscribe to some basic tenets:
• They have some failure probability δ, which gives the agent slack when learning in a stochastic (and adversarial) environment.
• They require only ǫ-optimality, and demand polynomial dependence on ǫ.
• They bound either the number of steps where the agent acts using a policy π that has value less than V∗ − ǫ, or the expected loss in value associated with following such a policy.
In the first half of this thesis, we use the definition of PAC-MDP (Strehl et al., 2009) and the associated KWIK model-learning framework (Li et al., 2008) to analyze the complexity of model learning and reinforcement learning in a more general way. A topic for future study is the adaptation of the results of this thesis to related sample-complexity measures (such as regret (Auer et al., 2007)) mentioned above. Formally, a PAC-MDP bound on an agent’s behavior is defined as follows.
Definition 4. An algorithm A is considered PAC-MDP if, for any MDP M, ǫ > 0, 0 < δ < 1, and 0 < γ < 1, the sample complexity of A, that is, the number of steps t such that V^{A_t}(s_t) < V∗(s_t) − ǫ, is bounded by some function that is polynomial in the relevant quantities (1/ǫ, 1/δ, 1/(1−γ), and |M|, where |M| is a measure of the model complexity) with probability at least 1 − δ.
In an attempt to meet these criteria, we now consider two exploration strategies and their analogous model interpretations for Algorithm 2. The first set of strategies is not sufficient for PAC-MDP behavior but is mentioned briefly because of its use in the literature. The second
strategy is considered here only for the case where the flat MDP has S = Poly(|M|), but we will revisit this strategy for more general (compact) MDPs in Section 2.3.
Greedy Interpretations
Greedy interpretations such as ǫ-greedy and Boltzmann exploration (Sutton & Barto, 1998) make very strict interpretations of their learned models. For instance, the ǫ-greedy strategy is to use the maximum-likelihood model given the current data for M_A, find its value function, and act greedily with respect to this model except for a certain proportion of the timesteps (ǫ) where the agent simply chooses a random action. These exploration techniques are often combined with model-free RL approaches like Q-Learning (Watkins, 1989), which simply performs the following backup on every observed transition ⟨s, a, s′⟩:
Q(s, a) = Q(s, a) + α(r(s, a) + γ max_{a′} Q(s′, a′) − Q(s, a))
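A runnable sketch of this backup with ǫ-greedy action selection (our own illustration; the environment interface is an assumption, not from the original text):

    import random
    from collections import defaultdict

    # Sketch of tabular Q-learning with epsilon-greedy exploration.
    # `env.reset()` and `env.step(a)` are assumed interfaces.
    def q_learning(env, actions, episodes, alpha=0.1, gamma=0.95, eps=0.1):
        Q = defaultdict(float)  # Q[(s, a)], defaults to 0
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                if random.random() < eps:    # explore with probability eps
                    a = random.choice(actions)
                else:                        # otherwise act greedily
                    a = max(actions, key=lambda act: Q[(s, act)])
                s2, r, done = env.step(a)
                target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])  # the backup above
                s = s2
        return Q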
It has been shown (Whitehead, 1991) that in simple MDPs where a high reward is reached only by following a sequence of actions to a specific state, this dependence on randomness to reach unexplored regions can lead to an exponential number of suboptimal steps; hence, this exploration strategy is not PAC-MDP. Since greedy methods can lead to such exponential blowups in general (flat) MDPs, these negative results carry over directly to relational MDPs. Thus, for the bulk of this thesis we instead focus on an optimistic exploration strategy, which we now describe for polynomially sized flat MDPs.
R-max for flat MDPs
Instead of the strict interpretation of the learned model considered by greedy exploration strategies, the R-max algorithm (Brafman & Tennenholtz, 2002), shown in Algorithm 3, takes an “optimism in the face of uncertainty” approach. The algorithm is a specific instantiation of the model-based RL template in Algorithm 2 and works by explicitly constructing an optimistic model based on a “known-ness” parameter m. Intuitively, state-action pairs that have been tried more than m times are considered known, and their maximum-likelihood estimates can be used for T and R. The other state-action pairs are considered “unknown”, and their transitions are interpreted as taking the agent to an “R-max state” where maximum reward is gained in perpetuity. This potential over-valuation of the under-explored regions leads the agent to act so as to reach these areas, unless it can be fairly certain that better value can be obtained within the currently “known” MDP.
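As a concrete illustration of this optimistic construction (formalized in Algorithm 3 below), a minimal sketch; the data structures and the known-ness test are assumptions made for the example:

    # Sketch of R-max's optimistic model construction. Unknown (s, a) pairs
    # (tried fewer than m times) are rerouted to an absorbing s_max state
    # with maximal reward; known pairs use maximum-likelihood estimates.
    def build_optimistic_model(S, A, counts, trans_counts, reward_sums, m, r_max):
        S_opt = list(S) + ['s_max']
        T, R = {}, {}
        for s in S_opt:
            for a in A:
                if s == 's_max' or counts.get((s, a), 0) < m:
                    T[(s, a)] = {'s_max': 1.0}      # optimistic trap transition
                    R[(s, a)] = r_max
                else:
                    n = counts[(s, a)]
                    T[(s, a)] = {s2: c / n for s2, c in trans_counts[(s, a)].items()}
                    R[(s, a)] = reward_sums[(s, a)] / n  # empirical mean reward
        return T, R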
Algorithm 3 R-max (Brafman & Tennenholtz, 2002)
1: Agent knows S (in some compact form), A, γ, ǫ, δ
2: Agent has access to planner P guaranteeing ǫ accuracy
3: Calculate m = Poly(S, A, 1/(1−γ), 1/ǫ, 1/δ)
4: ∀s, a : count(s, a) = 0
5: for each step do
6:   Construct M_A = ⟨S ∪ {s_max}, A, R, T, γ⟩, where s_max is a trap state with reward Rmax and is reached by any (s, a) pair with count(s, a) < m; known pairs use their maximum-likelihood estimates
7:   a_t = P.getAction(M_A)
8:   Execute a_t, observe ⟨s_{t+1}, r_t⟩, increment count(s_t, a_t), and update the empirical model
9: end for

The analysis of the PAC-MDP sample complexity of R-max (Brafman & Tennenholtz, 2002) relies on showing that, with m = Poly(S, A, 1/ǫ, 1/δ, 1/(1−γ)), with probability (1 − δ) each state-action pair’s model reaches ǫ-accuracy after at most m samples, yielding only a polynomial number of suboptimal steps. The sample complexity of learning the model is then combined with known results about the value of a policy in two slightly different flat MDPs (Singh & Yee, 1994) to achieve a similar PAC-MDP bound.

Viewed this way, the exploration/exploitation dilemma in model-based reinforcement learning hinges on the efficient solution to a supervised model-learning problem where uncertainty needs to be quantified (m − k more samples until this pair is known) or at the very least qualified (“I don’t know what happens in this state, but I know what this other one is like”). For flat MDPs, R-max does just that, and the analysis is fairly intuitive (each sample increases our certainty about the model and the agent is encouraged to explore unknown parts). But for compact representations like factored MDPs or the relational MDPs considered in this thesis, S is exponential in |M|. In such representations, the dependence of the algorithm on |S| does not guarantee PAC-MDP behavior, and in fact guarantees that in most cases we won’t have efficient exploration, as the agent will continue to explore different state configurations that it ought to already know about (for example, the robot in the rain trying every street corner in the city to determine how its rain sensor acts there). In order to get a proper feel for the possible sample-complexity results in a wider class of models, we now delve into the literature on traditional (supervised) computational learning theory.

2.2 Sample Complexity in Supervised Learning

Model learning can be thought of as a supervised-learning problem where an agent’s interaction with the environment consists of an infinite sequence of predictions on inputs provided from the environment. Formally, the realizable supervised-learning paradigm has the following general form:

• There is a (potentially infinite) set X of possible inputs and an (again possibly infinite) set Y of possible labels.
• There are accuracy and certainty parameters ǫ and δ.
• There is a stationary hypothesis h∗ chosen from a hypothesis class H such that on every timestep t, E[y_t] = h∗(x_t).

There are now a large number of frameworks for measuring sample complexity in supervised learning (Valiant, 1984; Littlestone, 1988; Auer & Cesa-Bianchi, 1998; Helmbold et al., 2000; Li et al., 2008). Each of these places different assumptions on the way that each x_t is drawn from X (i.i.d. or adversarially), what kind of predictions the learner has to make (for example, can the learner decline to make a prediction?), and what sort of interaction counts as a sample. One early, and widely used, framework for measuring sample complexity was PAC (Probably Approximately Correct) (Valiant, 1984), under which inputs x_t are drawn i.i.d.
from an unknown (but stationary) distribution over X, and the sample complexity is the number of timesteps before, with probability (1 − δ), the learner can guarantee it is making ǫ-accurate predictions (||ŷ_t − h∗(x_t)|| ≤ ǫ). However, these i.i.d. assumptions on inputs are rarely encountered in the model-learning component of reinforcement-learning agents, because their shifting policies during exploration do not produce i.i.d. samples. For instance, in a factored MDP, disregarding certain factor combinations as unlikely based on the distribution of all states encountered so far may be incorrect, because further exploration may show these combinations to be part of a state that the optimal policy will visit many times. Below, we consider two different frameworks (MB and KWIK) that do not rely on i.i.d. assumptions, and instead assume that each x_t is drawn adversarially from X. These two frameworks will be instrumental in our study of sample complexity throughout this thesis.

2.2.1 Mistake Bound Learning

The Mistake Bound (MB) framework (Littlestone, 1988) is designed only for deterministic hypothesis classes where the true label is always provided by the environment. While this is clearly insufficient for learning stochastic MDP models, we will make use of this framework in the apprenticeship setting studied in Chapter 4, so we introduce it here and cover some of its properties and learnable classes, as well as its insufficiency for autonomous model-based RL.

Formally, the mistake-bound protocol is described in Algorithm 4. For every input, the agent predicts a label, and if its prediction is wrong, it gets to view that input’s true label. Notice that inputs are chosen adversarially by the environment (with no assumptions on the input distribution).

Algorithm 4 The MB Protocol
1: The agent knows the input space X, output space Y, and the hypothesis class H.
2: The environment chooses h∗ ∈ H adversarially.
3: for each input x ∈ X chosen adversarially by the environment do
4:   The agent predicts ŷ ∈ Y
5:   if ŷ ≠ h∗(x) then
6:     The agent has made a mistake.
7:     h∗(x) is revealed.
8:   end if
9: end for

A hypothesis class H is said to be efficiently learnable in the mistake-bound setting if the number of times the agent makes a mistake (ŷ ≠ h∗(x)) can be bounded by a polynomial function of the problem’s parameters. For instance, in a situation where an agent is given an input x ∈ [0...k] and has to predict a deterministic output from [0...k] for each x, an agent that simply memorizes each x ↦ y mapping could MB-learn the function with at most k mistakes.

Here, we summarize a few results from the literature on mistake-bound learning; a brief sketch of the first algorithm follows this list. Some of these classes will be used extensively in Chapter 4, where we go into further detail on their inner workings and analyses.

• Conjunctions over n literals, where the conjunction can contain any combination of 0’s and 1’s for each literal, can be mistake-bound learned (Littlestone, 1988; Kearns & Vazirani, 1994). The algorithm for doing so, which we call MB-Con, predicts false as the label until it sees a positive example. After that, it maintains the set of literals l_j ∈ L_H such that l_j = 1 in every positive example it has seen so far.² If every l_j ∈ L_H equals 1 in x_t, the agent correctly predicts true; otherwise it defaults to false. Further details are provided in Section 4.2.1.
• k-term-DNF (disjunctive normal form of k = O(1) terms). k-term-DNF are of the form (l_i ∧ l_j ∧ ...)_1 ∨ ... ∨ (... ∧ ...)_k, that is, a disjunction of k terms, each of at most size n. This class of functions is known to be MB-learnable (Kearns & Vazirani, 1994) by creating a conjunction of new literals, each representing a disjunction of k original literals (for k = 3 we would have l_ijm = l_i ∨ l_j ∨ l_m), and then using the conjunction-learning algorithm described above.

²For non-monotone conjunctions, the set of literals can be doubled to consider literals with 0 values.
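A minimal sketch of MB-Con for monotone conjunctions (our illustration, not the original text's presentation; inputs are assumed to be 0/1 tuples):

    # Sketch of MB-Con: predict false until the first positive example,
    # then keep only the literals that were 1 in every positive seen so far.
    class MBCon:
        def __init__(self, n):
            self.n = n
            self.L = None  # candidate literal indices; None until first positive

        def predict(self, x):
            if self.L is None:
                return False  # default to false before any positive example
            return all(x[j] == 1 for j in self.L)

        def observe(self, x, label):
            # Called after a mistake reveals the true label; only positives
            # carry information, and each one shrinks the candidate set.
            if label:
                if self.L is None:
                    self.L = {j for j in range(self.n) if x[j] == 1}
                else:
                    self.L = {j for j in self.L if x[j] == 1}

Each mistake on a positive example removes at least one literal from L, so the total number of mistakes is at most n + 1, which is the asymmetry the MB analysis exploits.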
Unlike PAC, MB does not rely on an i.i.d. assumption over the inputs, so it is a little closer to what we need for autonomous model learning in the reinforcement-learning setting. However, MB results are still insufficient for model learning that induces PAC-MDP behavior, because they fail the second criterion mentioned earlier: they do not keep track of model certainty. To see why this trait is vexing, consider an MDP that models a combination lock with n tumblers such that each tumbler can have values in {0, 1} and there are n + 1 actions, one to flip each tumbler and one to “open” the lock. All rewards are −1 except for opening the lock with the exact combination (where r = 0). Notice that learning the combination in the MB setting can be done with a single mistake: the agent simply predicts “false” until it sees the combination. However, for an RL agent using this MB hypothesis as its model, the “default to false” predictions force it to assume every combination will result in the same outcome (not open), and that none of them will have a reward that is any better than the others. There is no explicit demarcation of which combinations have been tried before and which have not. Thus, the MB learner provides no guidance for exploration, and the agent is left to thrash at random. MB learning is therefore not sufficient for PAC-MDP learning.

This incompatibility is addressed in the following KWIK framework, which, like MB, makes no assumptions about the distribution over the inputs, but, unlike MB, explicitly keeps track of which predictions it can make with confidence and which ones it does not know.

2.2.2 KWIK - A Framework for Model-Based Reinforcement Learning

The KWIK framework (Li et al., 2008; Li, 2009) and associated protocol (Algorithm 5) are similar to the MB setting described above in that inputs are picked adversarially, but there are distinct differences:

• Like the PAC framework, KWIK is designed for either deterministic or stochastic labeling, as the general goal is to predict the expected value of h∗(x). The labels provided to a KWIK learner may be noisy observations z(x_t) such that over time the expected label y_t = E[h∗(x_t)], and the agent’s goal is to predict the correct expected value y_t.
• Instead of always having to predict a label from Y, the learner has the option of predicting either a label or a unique symbol ⊥ (“I don’t know”), signaling the agent’s uncertainty as to the true label for x_t.
• Labels (potentially noisy) are only provided when the agent predicts ⊥.

We will often describe the process of using this framework as simply KWIK-learning.

Learner         Usage                                   Bound
Coin-Flipping   Binomial distribution                   O(1/ǫ² · 1/δ)
Dice-Learning   Multinomial (n outcomes) distribution   O(n/ǫ² · n/δ)
Enumeration     Hypothesis class H                      |H| − 1
Union           Combining k hypothesis classes          (k − 1) + Σ_i B_i(ǫ/2, δ/k)

Table 2.1: Some simple KWIK-learnable classes and combination architectures for KWIK learners and their KWIK bounds for realizable hypothesis classes.

With these differences in the protocols, the definition of sample efficiency also changes in the KWIK framework.
Definition 5. A model class is said to be efficiently KWIK-learnable (or just KWIK-learnable when the meaning is clear) if and only if, with high (1 − δ) probability: (1) all ŷ_t predictions that are not ⊥ are ǫ-accurate, that is, ||ŷ_t − E[h∗(x_t)]|| ≤ ǫ (or, in the discrete case, ŷ_t = E[h∗(x_t)]), and (2) the number of times ŷ_t = ⊥ is bounded by a polynomial function of the problem description.

Similarly, we can classify learning problems based on our ability to KWIK-solve them using the following definition.

Definition 6. A learning problem is said to be KWIK-solvable if and only if there exists a KWIK algorithm for a model class (as stipulated above) that will make predictions for inputs in the learning problem in accordance with the KWIK criteria.

Several simple KWIK learners and combination architectures for KWIK learners are listed in Table 2.1, along with their accompanying KWIK bounds from Li (2009).

Algorithm 5 The KWIK Protocol
1: The agent knows the accuracy parameters ǫ and δ, the input space X, output space Y, and the hypothesis class H.
2: The environment chooses h∗ ∈ H adversarially.
3: for each input x ∈ X chosen adversarially by the environment do
4:   The agent predicts ŷ ∈ Y or ⊥ (“I don’t know”)
5:   if ŷ = ⊥ then
6:     z(x) is revealed. This is a noisy observation of h∗(x).
7:   end if
8: end for

The relationships between PAC, MB, and KWIK are summarized as follows. Any KWIK algorithm for a deterministic hypothesis class can be turned into an MB algorithm by simply predicting a random element of Y any time the KWIK learner was going to predict ⊥. However, an efficient MB learner for a hypothesis class does not guarantee an efficient KWIK learner. A simple counter-example to this relationship is conjunction learning over n terms, as in the combination-lock example described above. MB can learn the conjunction by exploiting asymmetry in the information content of the labeled examples. By defaulting to false, it is able to only “count” the highly informative positive examples in its sample-complexity measure. KWIK learners are not afforded this luxury, because they must, with high probability, never make an incorrect prediction. Hence, when learning a conjunction, a KWIK learner can make 2^n − 1 ⊥ predictions, one for every new combination of literals, before it sees even one positive example. The relationship to PAC is then straightforward, because all MB algorithms can be turned into PAC algorithms, but the reverse is not necessarily true (Blum, 1994).

As a “self-aware” learning framework, KWIK lends itself to learning RL models (as elaborated on below) because, unlike PAC, it allows for adversarial inputs, and, unlike MB, KWIK learners explicitly admit when they are uncertain about portions of the model, providing exploration guidance.

2.3 KWIK-R-max, A General Algorithm for Sample-Efficient RL

We now return to the full reinforcement-learning problem and show how to generalize the Flat-Rmax algorithm (Algorithm 3) to work for any model class that is KWIK-learnable. We first describe the algorithm and give a sketch of the previous analysis of its PAC-MDP property. We then briefly list some domain classes that have been previously shown to be KWIK-learnable, and therefore PAC-MDP learnable. These results are proven in far greater detail by Li (2009) but are essential for the results described in the next chapter, and in Chapter 4 we will make a similar connection between a framework related to MB and efficient RL in the apprenticeship setting.
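To make the ⊥ mechanism concrete before presenting the full algorithm, here is a minimal sketch of the coin-flipping learner from Table 2.1 (our illustration; the Hoeffding-based threshold m is an assumption made for the example):

    import math

    # Sketch of the coin-flipping KWIK learner: predict "I don't know" until
    # enough samples have accumulated for an eps-accurate estimate, then
    # report the sample mean. Threshold chosen via Hoeffding's Inequality.
    class CoinKWIK:
        BOTTOM = None  # stands in for the ⊥ symbol

        def __init__(self, eps, delta):
            self.m = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
            self.total, self.count = 0.0, 0

        def predict(self):
            if self.count < self.m:
                return self.BOTTOM  # admit uncertainty instead of guessing
            return self.total / self.count

        def observe(self, z):
            # Only called when we predicted ⊥, per the KWIK protocol.
            self.total += z
            self.count += 1

The number of ⊥ predictions is exactly m, which is polynomial in 1/ǫ and log(1/δ), and every numeric prediction is ǫ-accurate with high probability; these are precisely the two conditions of Definition 5.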
KWIK-Rmax (Algorithm 6) is a model-based RL algorithm that generalizes the ideas of the original Flat-Rmax (Algorithm 3) by replacing the “counts” on states with a reliance on KWIK model learners. Specifically, instead of expressly enumerating the state space (which could be exponentially large for compact models such as factored MDPs), the algorithm connects the MDP planner (such as Value Iteration) directly to the KWIK model learner KL, which can be queried by the planner to establish the actual state transitions. If, for a queried transition or reward, KL predicts ⊥, this area of the state space is unknown (the analogue of count(s, a) < m in Flat-Rmax) and is interpreted optimistically as a transition to the Rmax state.

Algorithm 6 KWIK-Rmax (Li, 2009)
1: Agent knows S (in some compact form), A, γ, ǫ, δ
2: Agent has access to planner P guaranteeing ǫ accuracy
3: Agent has KWIK learner KL for the domain
4: s_0 = start state
5: while Episode not done do
6:   M_A = KL with ⊥ interpreted optimistically (Rmax state transition)
7:   a_t = P.getAction(M_A, s_0)
8:   Execute a_t, view s_{t+1}, KL.update(s_t, a_t, s_{t+1})
9: end while

The following theorem from Li (2009) establishes that this KWIK-Rmax algorithm is PAC-MDP for efficiently KWIK-learnable models. It also links the sample complexity of the RL agent directly to the sample complexity of learning the model, allowing us to concentrate on the latter when establishing PAC-MDP results.

Theorem 1. (From Li (2009)): Let M be a class of MDPs with state space S and action space A. If M can be (efficiently) KWIK-learned by algorithms KL_T (for transition functions) and KL_R (for reward functions) with respective KWIK bounds B_T and B_R, then the KWIK-Rmax algorithm is PAC-MDP.

The proof of the theorem (which is highly technical and can be found in detail in Li (2009)) relies on three properties of the model represented by KL: optimism, accuracy, and a bounded number of surprises. Optimism here means that, with high probability, the value function for a partially learned model satisfies V∗_{M_A} ≥ V∗_M because of the Rmax-state transitions. Accuracy and a bounded number of surprises (equivalently, bounded ⊥ predictions) follow from the two criteria for efficient learning in the KWIK framework. The proof also relies on a version of the simulation lemma, which establishes that if the learned model M_A is sufficiently accurate, behavior under policy π in M_A will net similar expected values to π executed in M. More formally, the lemma (originally due to Kearns & Singh (2002), though here a more specific version close to that from Li (2009) is presented) can be stated as:

Lemma 1. Let M and M_A be two MDPs with the same S, A, and γ, but whose transition (T and T_A) and reward (R and R_A) functions are such that there exist two constants ǫ_T and ǫ_R with ||T(·|s, a) − T_A(·|s, a)|| ≤ ǫ_T and ||R(s, a) − R_A(s, a)|| ≤ ǫ_R, where || · || is a 1-norm. Then for all s ∈ S, ||V(s) − V_A(s)|| ≤ (ǫ_R + γ V_max ǫ_T)/(1 − γ), where V_max = R_max/(1 − γ).

This lemma can be extended to state/action value functions as well as to the case where S is uncountable (proofs of both cases appear in Li (2009)). We will make use of the contrapositive of this statement (if the values of one MDP are very different from another’s, then at least one of their transition or reward entries must be very different) in our theoretical results later on. In summary, if a class of MDP models is KWIK-learnable, the KWIK-Rmax algorithm can be used along with the appropriate KWIK learner, and Algorithm 6 is guaranteed to be PAC-MDP.
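As a quick numerical instantiation of Lemma 1 (our own example; the constants are chosen arbitrarily for illustration):

    \[
    \epsilon_T = 0.01,\ \epsilon_R = 0.1,\ \gamma = 0.9,\ R_{\max} = 1
    \quad\Rightarrow\quad V_{\max} = \frac{R_{\max}}{1-\gamma} = 10,
    \]
    \[
    \|V(s) - V_A(s)\| \;\le\; \frac{\epsilon_R + \gamma V_{\max}\,\epsilon_T}{1-\gamma}
    \;=\; \frac{0.1 + 0.9 \cdot 10 \cdot 0.01}{0.1} \;=\; 1.9.
    \]

Note how the 1/(1 − γ) horizon factor amplifies even small per-entry model errors, which is why the model-learning accuracy must be set well below the desired value-function accuracy.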
This result generalizes RL sample-complexity results in a number of propositional domains, including compact models, that are known to be KWIK-learnable. These include flat MDPs, where KWIK-Rmax simply devolves into the Flat-Rmax algorithm introduced earlier, and factored MDPs (Diuk et al., 2009; Walsh et al., 2009b). In the latter, the sample-complexity bounds do not depend on the number of states; rather, they depend on the O(log(S)) number of factors, which is an exponential improvement over a naïve learner that would flatten such a state space and learn about every possible combination of factors. Several models of continuous dynamics (with S ⊆ ℜ^n) are also known to be KWIK-learnable (Strehl & Littman, 2007; Brunskill et al., 2008).

While these model classes are all of interest to the RL community and allow for generalization and tractable sample complexity in large state spaces, this thesis considers relational MDPs, where even more generalization and compression is possible because the dynamics of individual objects are represented in terms of relations with variables. The next section introduces relational languages and presents some results from the field of relational reinforcement learning, though the exact formalism used in this work (relational action schemas) is presented later (Section 3.1).

2.4 Languages, Actions, and Learning for Relational Models

This section covers background material on the use of relational languages in reinforcement-learning-like domains. Such representations allow us to model transitions with even more generality than the propositional DBNs mentioned above. This advantage is gained by using action descriptions that contain variables, thereby representing transitions over generic objects rather than over each individual grounding. For instance, in the Blocks World domain illustrated in Figure 1.2, the relational Stochastic STRIPS rule in the diagram represents the pre-conditions and effects of performing the move action on any 3 objects. It covers moving a from b to c, or moving b from the table to a, or any other valid movement to a block. In contrast, a propositional DBN or a flat MDP would need to be trained on each of these transitions separately.

The combination of relational representations and sequential decision-making domains goes back to the early days of Artificial Intelligence research, and has gone through many transformations as the fields of machine learning, inductive logic programming, and reinforcement learning each came of age. One major landmark in this evolution was the establishment of the field of Relational Reinforcement Learning (RRL) (Dzeroski et al., 2001; van Otterlo, 2009), which blended (mostly model-free) RL algorithms with relational state/dynamics descriptions. While we cover a number of important and relevant works from this literature, a far more comprehensive history and portrait of this still-maturing field can be found in the recent book by van Otterlo (2009). We begin by describing very early work with relational languages for action descriptions.

2.4.1 Early Approaches

The idea of using relational representations to model actions was present in many of the first theoretically grounded AI systems. These early works aimed to capture the most general version of the world possible, often incorporating full (and undecidable) first-order logic. Perhaps the most famous of these frameworks is the Situation Calculus (SC) (McCarthy, 1963).
The Situation Calculus is an axiomatized logical framework that allows state descriptions to be expressed in full first-order logic (quantifiers, disjunction, etc.); transitions are calculated by applying successor axioms (such as paintGreen(X) → PaintedGreen(X)) to the current situation. These actions can also be described conditionally, based on full first-order-logic formulas. Because of this, planning with the Situation Calculus or related languages often involves the use of a first-order theorem prover, as it is possible to encode undecidable problems in this language.

In contrast, the more restricted STRIPS (Fikes & Nilsson, 1971) language describes transitions for actions without axioms, using variablized action descriptions made up of a list of pre-conditions, an Add list, and a Delete list, similar to the Blocks World example in Figure 1.2 (but with only deterministic outcomes). A STRIPS operator (a single action description) can only reference the variables in its action list, and the action only succeeds (actually executes) if all the relations in the pre-condition list (with variable substitution based on the action’s ground parameters on invocation) are true in the current state. If so, the relations in the Add list are made true in the next state and the relations in the Delete list are made false. All other grounded relations in the current state retain their current truth values.

States in STRIPS domains do not contain logical formulas, and instead are encoded just as the set of grounded predicates that are currently true. Because of this, and the fact that planners must consider grounded action invocations, it is often remarked that “STRIPS is propositional”, which is true from the planning perspective, but not true in terms of the model (which is what our learning agents will construct). That is, the STRIPS operators themselves represent transitions using variables, and can be applied to multiple propositional groundings (for example, the pickup(X) operator can explain pickup(a) or pickup(b)), so the language is indeed relational. Because it does not come anywhere near capturing the complexity of first-order logic, planning in STRIPS is more reasonable than in SC (STRIPS planning is PSPACE-complete (Bylander, 1994)), and several heuristic planners perform well in many benchmark STRIPS domains (such as GraphPlan (Blum & Furst, 1995)).

Following the terminology of van Otterlo (2009), we can separate the families of languages derived from these earlier works into two classes: Action Formalisms and Action Languages. Action Formalisms are the languages derived from the likes of the Situation Calculus, such as the programming language GOLOG (Levesque et al., 1997). The tell-tale characteristic of action formalisms is their axiomatization of the world’s dynamics, that is, a logical grounding for the progression of states, as encoded in the successor-state and frame axioms (see Reiter (1991)) in SC. Because of this strong logical underpinning, Action Formalisms can support complicated (such as first-order) logical formulas in the state description. In contrast, Action Languages, like STRIPS or its stochastic variant, NID rules (Pasula et al., 2007), have no explicit axiomatization. States are usually represented with relations over ground objects (rather than having actual first-order formulas in the state), and the rules for transitions are somewhat less structured.
While somewhat less rigorous, the restricted (and more manageable/tractable) state spaces associated with action languages, and the freedom to model dynamics without resorting to logical axiomatization, make them a more practical and more easily studied representation for integration with reinforcement learning. This thesis considers action languages, though we note this does not preclude the application of these ideas to action formalisms; in fact, we move slightly in this direction with our consideration of a (closer-to-axiomatized) language based on Description Logics in Section 6.2.

2.4.2 Inductive Logic Programming and Early Attempts at Action Schema Learning

While hard-coded relational action languages for representing static domain dynamics made their debut early in the AI continuum (McCarthy, 1963), they were not widely considered in the machine-learning literature until the rise of the field of Inductive Logic Programming (ILP) (Nienhuys-Cheng & de Wolf, 1997) in the early 90’s. ILP is a large field that studies the supervised learning of relational concepts. A full summary of ILP results and algorithms is well outside the scope of this thesis, but we mention a few algorithms and sample-complexity results here because ILP has had a strong influence on the field of relational RL (covered below) and because some of these supervised sample-complexity results have a bearing on the model-learning components of our algorithms in later chapters.

Inductive Logic Programming algorithms take a set of training data, which can be thought of as instances of grounded relational predicates (or rows in a relational database), with a label on each instance marking it as positive or negative. For instance, suppose we were trying to learn which states are goal states and which are not in Blocks World (assuming goal states are instances where at least three blocks are in a stack). The instances in the training data might look like³ {On(a,b), On(b,c) | +}, {On(a,b), On(b,table), On(c,table) | −}, and {On(a,b), On(d,b), On(c,table) | +}. From these and other examples, an ILP algorithm would be expected to infer the concept Goal ← On(X,Y) ∧ On(Y,Z) ∧ Block(X) ∧ Block(Y) ∧ Block(Z).

³Descriptions shortened for readability.

In early ILP work, a number of sample-complexity results were derived in the PAC framework. These included early results on learning specific concept types with restrictions on the constructors and structure of the definitions (Dzeroski et al., 1992) and even those encoded with Description Logics (Cohen & Hirsh, 1994). A number of important PAC results (including a very fine separation of positive and negative learnability) in ILP, as well as some elegant constructive algorithms, came in a series of papers from Cohen (1995a,b). The positive algorithms in these works concentrated on representing the most specific concept that covered the training data seen so far and generalizing based on data, a tactic we employ in a number of algorithms in this thesis. However, the community has somewhat moved away from such sample-complexity analysis in recent years in favor of studying heuristic learning algorithms in richer languages (inching closer to full first-order logic).

Around the same time that ILP was emerging as a powerful field in machine learning, a number of researchers (Wang, 1995; Gil, 1994; Benson, 1996) began working on learning action schemas, a generalized term for the action languages (like STRIPS rules) shown above.
We describe many of these in greater detail in Section 3.3, but we mention here that a number of these algorithms began by applying ILP algorithms directly to logged experience or experience provided by a “teacher” (for example, Benson (1996)). Most of these algorithms used heuristic-search-style methods, and their goals were oriented more towards learning specific “gold standard” models than towards optimal behavior. However, these action-schema learners served as the first step towards using RL algorithms in concert with relational representations, and in many ways can be seen as primordial model-based relational RL algorithms. But interestingly, as the field of RRL developed (as chronicled below), the algorithms progressed⁴ from these model-based systems to model-free approaches (the opposite of the traditional RL timeline in many ways).

⁴Some might say digressed.

2.4.3 Relational Reinforcement Learning

The field of relational reinforcement learning (RRL) (Dzeroski et al., 2001; van Otterlo, 2009) sprung into its current form following a series of talks and workshops in the late 90’s, culminating in the seminal journal paper entitled Relational Reinforcement Learning (Dzeroski et al., 2001). That work introduced the Q-RRL algorithm, which used the model-free RL algorithm Q-learning as its learning backbone but connected it to a first-order decision-tree learner, using the tree learner as a function approximator Q̂ in the Q-learning backup:

Q̂(s, a) = Q̂(s, a) + α(r(s, a) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a))

This generic backup with approximators had been used in Q-learning with neural networks and linear function approximators (for Q̂) (Sutton & Barto, 1998), and the use of relational decision trees is a technique common in ILP. But here, for the first time, a function approximator specifically tailored to relational domains was employed in RL, and domains with sequential decision-making characteristics were considered from an ILP perspective. From the RL side, this was quite a leap because, all of a sudden, environments like Blocks World (where Q-RRL was extensively tested) could be considered regardless of the number of blocks.

By classifying states based on Q-values, Q-RRL implicitly performed a form of state abstraction known as Q∗-irrelevance (Li et al., 2006). This form of abstraction is known to be “safe” in the learning setting, in that treating states with the same Q-values under the optimal policy as being the same does not stop model-based or model-free algorithms from converging. Thus, a degree of the empirical success and the convergence guarantees for simple algorithms like Relational TD(λ) (Kersting & Raedt, 2004) can be attributed to the use of this safe state abstraction. We note that another algorithm introduced in the original RRL paper, P-RRL, which used an unsafe abstraction known as policy irrelevance (Jong & Stone, 2005), can fail to converge.

One major drawback of Q-RRL was that it had to recreate the entire decision tree on every step, forcing it to maintain a large number of instances and take an inordinate amount of computation time on each step. This problem was alleviated by the introduction of the RRL-TG algorithm (Driessens et al., 2001), which incorporated an incremental decision-tree learner (TG) into the Q-learning architecture, greatly speeding up the algorithm. Variants of the RRL-TG algorithm dominated the RRL field for the next several years and achieved a modicum of empirical success.
For example, an instance-based implementation of this algorithm (Driessens & Ramon, 2003) performed well in a diverse set of relational testbeds, and in a larger empirical study of RRL techniques (Dzeroski, 2002) it was used to play video games such as Tetris and “Digger”.

Meanwhile, the planning community was developing a number of powerful representations and new planning algorithms. For instance, Boutilier et al. (2001) introduced a version of Value Iteration that was shown to provably converge for general first-order MDPs (FOMDPs) written in a stochastic version of the Situation Calculus. This work then progressed to approximate linear programming (Sanner & Boutilier, 2005), approximate policy iteration (Fern et al., 2006), and linear function approximation (Sanner & Boutilier, 2006) for FOMDPs, as well as work on a similar representation based on the Fluent Calculus (Großmann et al., 2002). By the middle of the 2000’s, the stage was set for another batch of RRL and action-schema learning works, focused on more complex problems with more exotic planning and learning algorithms.

2.4.4 Later RRL Approaches

The next wave of RRL techniques used more exotic representations, better algorithms, or delved into other areas of AI. For example, a new instance-based RRL algorithm that used graph kernels to compare the structure of abstract states (Driessens et al., 2006) (as opposed to the older hand-tooled similarity metrics) saw success during this period. Work on model-based RRL (Croonenborghs et al., 2007b) also progressed, again using mostly random exploration. Larger environments were also tackled, including real-time strategy games, with advanced planning techniques (Guestrin et al., 2003a; Balla & Fern, 2009).

One area that saw a great amount of progress due to synergy with RRL was the field of transfer learning, which is concerned with porting models, policies, or value functions between domain instances, a natural fit for (variablized) relational models. Algorithms for transfer in navigation tasks (Lane & Wilson, 2005) and general RRL tasks with model-based approaches (Walker et al., 2007), options (extended-time/non-atomic actions) (Croonenborghs et al., 2007a), as well as environments where a teacher was available (Torrey et al., 2005), all achieved degrees of success in this fairly new field.

A number of new planning algorithms for relational and first-order MDPs were also introduced (a good summary of the modern results appears in Sanner & Boutilier (2009)). These included advancements in First Order Decision Diagrams (FODDs) (Wang et al., 2008; Joshi et al., 2009), which provided a compact representation of the dynamics and value function for a FOMDP. Also, an advanced planner (Lang & Toussaint, 2009) for Noisy Indeterministic Deictic (NID) rules (Pasula et al., 2007) showed how to use an effective Bayesian inference technique to produce approximate plans much faster than previously used, more general (but slower) techniques like Sparse Sampling (Kearns et al., 2002).

In this renewed climate, the field of action-schema learning saw a reprisal in a number of challenging domains with much stronger languages than were used in the first incarnation. We cover these in more detail in the next chapter, but they included work on learning NID rules (Stochastic STRIPS with “noise” outcomes) (Pasula et al., 2007), learning with partial observability on the data (Shahaf, 2007; Yang et al., 2007), and learning hierarchical structures
2.4.5 Excursions into Exploration Perhaps because of the historical roots in inductive logic programming, which assumes a large database of facts as training data, the classic RL problem of exploration versus exploitation has received far less attention in the RRL community than it has in traditional RL. Still, there are some exceptions. One early work on RRL (Driessens & Dzeroski, 2002) showed how to introduce a rudimentary but common exploration technique (Boltzman exploration) into RRL- TG. In order to keep the exploration focussed, the system was bootstrapped with a number of expert traces (similar to the apprenticeship learning paradigm we use in Chapter 4). But the reliance on Boltzman exploration and Q-learning, a combination known to have negative sample complexity results even in propositional domains (Whitehead, 1991) is not sufficient for attaining the theoretical (or practical) guarantees that algorithms like KWIK-Rmax can make. A later approach to integrating intelligent exploration and tree-based RRL came in the model based setting in the form of the MARLIE system (Croonenborghs et al., 2007b). Instead of representing the Q-function with a decision tree, MARLIE uses a set of relational decision trees to predict the transition and reward function for the domain. While MARLIE combines many of the same ingredients we use in this work (model-based RL, compact representations, and an approximate planner), the exploration in these domains is done in a greedy manner, without explicit demarcation of known and unknown regions, and so it does not give convergence guarantees. Instead, the “exploration” in this work is more a matter of speeding up the Q- learning backbone of RRL-TG by using a lookahead tree and learned model to propagate values faster. Perhaps one of the closest RRL systems to those presented in this dissertation is the REX (Lang et al., 2010) algorithm, which uses an architecture based on E3 (Kearns & Singh, 2002) to trade-off exploration and exploitation while learning NID rules. Like our Rmax-based techniques, REX uses optimism in the face of uncertainty to guide the agent. However, REX is designed for a more complicated language (with deictic references, noise outcomes, etc.) than those used in most of this thesis. This strong language and the planner used to reason about it (PRADA (Lang & Toussaint, 2009)) requires REX to approximate the uncertainty in states in a factored manner, resulting in good empirical exploration, though no sample-complexity bounds have been derived. However, the empirical validation of REX in many ways justifies many of the 40 same techniques we employ in our own solutions (model-based RL and optimism in the face of uncertainty, among others). In terms of sample complexity, there has been very little work in RRL on PAC-MDP bounds similar to those in mainstream RL, but there are again a few exceptions. Early work on relational MDPs (Guestrin et al., 2003a) considered more traditional PAC bounds for a scenario where agents are learning in a class of environments (what we later call a domain, like Blocks World) and a sample is defined as a single instance of this environment that the agent interacts with (Blocks World with 6 specific blocks). Thus, exploration is measured in terms of the number of worlds the agent is exposed to, not the number of steps it takes, and the derived PAC bounds are dependent on the distribution of worlds (which is not adversarial). 
In some ways, this measure is closer to the sample-complexity metric we use in Chapter 4, where we consider the number of teacher interactions with the agent in the complexity bounds. But the passive learning scenario and distributional assumptions considered in this earlier work do not allow for a direct translation. In recent years, PAC-MDP bounds have been derived for agents using the Object-Oriented MDP language (Diuk et al., 2008), but this was only for a specific deterministic language that we connect our formalism to in Section 6.1.1. Also, our own work that has appeared in various forums (Walsh & Littman, 2008; Walsh et al., 2009b) established sample-complexity and PAC-MDP bounds for specific languages and learning subproblems, some of which are presented in greater detail in Chapters 3 and 4. Other than these works, there has been very little consideration in the RRL community of the exploration-exploitation dilemma.

2.5 Moving Forward

This chapter has summarized results in two hitherto disparate fields. On the one hand, we have seen the field of traditional (propositional) reinforcement learning expend considerable energy on the exploration-exploitation dilemma. On the other hand, we have seen that the representations studied in the still-emerging field of relational reinforcement learning are able to compactly model domains that overwhelm the basic propositional representations used in traditional RL. Yet RRL has virtually ignored the exploration-exploitation dilemma, and in many ways has followed an opposing trajectory from traditional RL by drifting from clearly model-based approaches (action-schema learning) to model-free approaches (like RRL-TG) that have known difficulties with exploration. This disconnect is unfortunate because both sides have valid points and advantages: sturdy theoretical footing and exploration strategies in RL, and better representations and a stronger link to the seminal works of artificial intelligence in RRL. The rest of this thesis attempts, in one form or another, to bridge this divide, showing that we can use model-based RL with relational representations and still solve the exploration-exploitation trade-off efficiently in several different learning paradigms. In the next chapter, we present algorithms for doing so (and remark on their limitations) in the online RL setting.

Chapter 3
Online Action-Schema Learning

This chapter describes positive and negative results on the sample efficiency of learning a class of relational action schemas.¹ Our investigation of learning these models takes place within the context of the KWIK framework for a number of different dynamics settings and schema types. Specifically, we consider different combinations of determinism/stochasticity and pre-conditional/conditional effects, and several different learning settings where parts of the schemas are known to the agent beforehand. These results comprise the first general theoretical study of the exploration/exploitation tradeoff for agents that use relational representations.

3.1 Terminology and the Representation

This section lays out the formal terminology for action schemas and introduces two example domains used throughout this work: the Blocks World and Paint-Polish domains. Both of these domains are best described relationally because they contain objects and parameterized actions that act in the same way on different objects in comparable conditions. For instance, picking up block a off the table is virtually the same as picking up block b off the table.
Once one knows how to pick up a block, one can apply this knowledge to determine what will happen when picking up any block.

In contrast, representing these dynamics using a propositional structure, even a “generalized” one such as a DBN, would require factors for every possible ground literal (all combinations of On(X, Y) for all blocks X and Y). This number of factors is large, though not exponential if predicates have constant arity, but the number of parents for each factor will be very large (certainly not O(1), as the standard assumptions for DBN learning require). For instance, the factor On(a, b) will depend on every other On(a, X) factor, because these need to be considered to determine the success of the move action. We now describe a more compact, and more intuitive, representation for such domains. Specifically, we introduce a class of languages called relational action schemas, composed of relational conditions and effects, whose general form covers a wide variety of action languages.

¹Portions of this chapter appeared earlier in joint work with Michael Littman, István Szita, and Carlos Diuk (Walsh & Littman, 2008; Walsh et al., 2009b).

3.1.1 Terminology

A relational action schema can be thought of as a compressed version of the transition function (T) used in standard propositional MDPs. But, unlike the traditional MDP formalism, our schematic domain encodings describe dynamics at a conceptual, rather than a propositional, level (for example, travel(X, Y) rather than travel(paris, rome)). As covered in the previous chapter, a number of action-description languages have been proposed throughout the years, including STRIPS rules, OOMDPs, and even stronger (axiomatized) action formalisms such as the Situation Calculus or FOMDPs. In contrast, and somewhat in deference to this plethora of languages, in this work we try as much as possible to present results that are language agnostic, though the bulk of our results will be focused on a Stochastic STRIPS language.

Because our action schemas form a compact representation of the transition function, it should be noted that these results do not cover the model-free approaches taken in the RRL community, where relational state descriptions are associated with state or state-action values (V or Q). Rather, relational action schemas are built to represent the actual dynamics of the model, for use in a model-based reinforcement-learning setting. The notion of state in these environments will be dependent on the particular encoding we consider, but we will generally consider the states to be composed of a set of fluents f ∈ F that can be true or false at each timestep. For instance, in STRIPS, predicates such as At(a, b) are the fluents. In this work, we will often use the term literal in place of “fluent” when describing a part of a formal description rather than a changing part of the state. For instance, we may describe an action move(X, Y, Z) as affecting the literal On(X, Y), and the instantiation of the action move(a, b, c) as affecting the ground literal On(a, b).

With this relational representation of a state, we can now turn our attention to defining the generalized dynamics of action-schema domains. Formally, we start with definitions of a domain instance.

Definition 7.
A domain instance is a 6-tuple ⟨F, F₀, A, R, γ, S_T⟩, where F is a finite set of possible grounded fluents (over a set of available objects O), F₀ ⊆ F is the subset of fluents true initially, A is a set of action schemas as defined below, R is a reward function R : S, a_G ↦ ℜ, where S is a state comprised of grounded fluents and a_G is a grounded parameterized action as described below, γ is the discount factor, and S_T is a set of terminal states (described using F).

When describing environments, we may refer simply to a domain, with a similar definition but without defined initial or terminal states, and with only ungrounded fluents available (Blocks World without a specific set of blocks). We now describe the dynamics of action schemas with different restrictions on the conditions and effects encoded in the dynamics. We will progress from the most simple encoding (unconditional deterministic schemas) to the most general form (conditional stochastic schemas). A summary of the six settings considered and example actions encoded under these restrictions is presented in Table 3.1.

Definition 8. A deterministic unconditional action schema is a pair ⟨a, ω^a⟩, where a is a parameterized action with variables holding the parameter places and ω^a is an effect that describes the changes in the truth values of the fluent set.

For example, in our Traveling domain examples (Table 3.1), a deterministic unconditional action schema for the action travel might be a = travel(X, Y), ω^a := At(Y), ¬At(X). In cases where the literals are attribute-value pairs, effects simply represent changes in the values. For instance, the action moveUp(Object) might result in the effect [ω^a := Object.Ycoordinate = Object.Ycoordinate + 1].

We can press beyond these basic deterministic descriptions and consider the case where the outcomes of the actions can be stochastic. For instance, the action travel(X, Y) may result in the effect At(Y) with probability 0.9 and the effect At(X) with probability 0.1. More formally:

Definition 9. A stochastic unconditional action schema is a triple ⟨a, Ω^a, Π^a⟩, where a ∈ A is again a parameterized (with variables) action and Ω^a is a set of effects ω_i^a whose probabilities of occurrence are given by the corresponding p_i^a ∈ Π^a. Notice that |Ω^a| = |Π^a| = n and Σ_{i=1}^{n} p_i^a = 1.

Intuitively, when the action a is taken, exactly one of the effects from Ω^a occurs, according to the probability distribution induced by Π^a. Action schemas may also contain pre-conditions that govern whether the action succeeds or fails. For instance, in the travel example, travel(X, Y) may result in the distribution of effects above under the condition HoldingTicket(X, Y); otherwise the action will not have any effect. More formally:

Definition 10. A pre-conditional stochastic action schema is a 4-tuple ⟨a, c^a, Ω^a, Π^a⟩, where a is again a parameterized action and Ω^a and Π^a represent the possible effects and distribution
For instance, under the condition RainAt(X) the distribution of effects for travel(X, Y) could be At(Y) with probability .7 and At(X) with probability .3, while non-raining situations will result in the distribution above. While the nature of these conditions will again be a function of the language used, we can generally describe this situation as being based on a set of conditions Ca governing which a of many effect distributions (Ωi ) the current effect is drawn from. More formally: Definition 11. A Conditional Stochastic Action Schema is a 4-tuple ha, Ca, Ωa, Πai . a is again a parameterized (with variables) action, and Ca is a set of mutually exclusive conditions based on the set of fluents. For instance, Ca could be a set of conjunctions with variables drawn a a a a from the parameters of a. For each ci ∈ C , there is a corresponding effect distribution hΩi , Πi i. We note that the deterministic versions of these schemas involve the same condition structure a a but only a single outcome (Ωi = {ωi }) is linked to each condition. A summary of these different languages, with example schemas, is presented in Table 3.1. We note that these action schemas bear close resemblance to many specific languages in the literature. For instance, conditional stochastic action schemas with Add and Delete lists for effects and a special “noise” effect would be the same as NID rules (Pasula et al., 2007). Our schemas can also be seen as rules for defining the dynamics of basic First Order MDPs (FOMDPs) (Sanner & Boutilier, 2009) or PPDDL rules (Younes et al., 2005), though incorporating the complex (not just conjunction) conditions and universally quantified effects allowed in those languages are not considered in the bulk of this thesis. 3.1.2 Linking Action Schemas and general MDPs In this section, we formalize the link between relational action schemas and the propositional MDP models that have been the workhorse of model-based reinforcement learning. We show that the more compact action-schema representation gives an agent far fewer (even exponentially fewer) parameters of T to learn and discuss different ways the reward function may be structured in these domains. 46 Setting Parameters Example Deterministic Unconditional ha,ωai travel(X,Y): → At(Y), ¬At(X) Stochastic Unconditional ha, Ωa, Πai travel(X,Y): → At(Y), ¬At(X) (0.9), At(X) (0.1) Deterministic Pre-conditional ha,c,ωai travel(X,Y): HasTicket(X,Y) → At(Y), ¬At(X) Deterministic Conditional ha,C, Ωai travel(X,Y): RainAt(X) → At(X) Sunny(X) → At(Y), ¬At(X) Stochastic Pre-Conditional ha,c, Ωa, Πai travel(X,Y): HasTicket(X,Y) → At(Y), ¬At(X) (0.9), At(X) (0.1) Stochastic Conditional ha,C, Ωa, Πai travel(X,Y): RainAt(X) → At(Y), ¬At(X) (0.7), At(X) (0.3) Sunny(X) → At(Y), ¬At(X) (0.9), At(X) (0.1) Table 3.1: Six dynamics settings for action schemas and example operators. Boldface indicates a set of sets (conditional case). Action schemas as defined above can be thought of as a specific kind of transition-function de- composition. Such generalizations have been studied in the reinforcement-learning literature for general state classes (conditions) and outcomes (effects) (Sherstov & Stone, 2005). This previous work defines the transition function decomposition in the following way (with notational changes to make the mapping to our formalism easier to see): T (s,a,s′) = P r[η(s, τ(κ(s),a)) = s′] , κ : S 7→ C , τ : C, A 7→ P r[Ω] , η : S, Ω 7→ S. 
We note that these action schemas bear close resemblance to many specific languages in the literature. For instance, conditional stochastic action schemas with Add and Delete lists for effects and a special "noise" effect would be the same as NID rules (Pasula et al., 2007). Our schemas can also be seen as rules for defining the dynamics of basic First Order MDPs (FOMDPs) (Sanner & Boutilier, 2009) or PPDDL rules (Younes et al., 2005), though the complex (not just conjunctive) conditions and universally quantified effects allowed in those languages are not considered in the bulk of this thesis.

3.1.2 Linking Action Schemas and General MDPs

In this section, we formalize the link between relational action schemas and the propositional MDP models that have been the workhorse of model-based reinforcement learning. We show that the more compact action-schema representation gives an agent far fewer (even exponentially fewer) parameters of T to learn, and we discuss different ways the reward function may be structured in these domains.

Action schemas as defined above can be thought of as a specific kind of transition-function decomposition. Such generalizations have been studied in the reinforcement-learning literature for general state classes (conditions) and outcomes (effects) (Sherstov & Stone, 2005). This previous work defines the transition-function decomposition in the following way (with notational changes to make the mapping to our formalism easier to see):

$T(s, a, s') = \Pr[\eta(s, \tau(\kappa(s), a)) = s']$, where $\kappa : S \mapsto C$, $\tau : C, A \mapsto \Pr[\Omega]$, and $\eta : S, \Omega \mapsto S$.

Intuitively, κ is a "type" function that captures a general notion of the state (for example, "the floor is slippery"), τ maps types and an action to a probability distribution over "outcomes" (for example, "agent moves forward 2 units"), and η maps an effect and the current state to a next state. This formalism was previously used to define Relocatable Action Models (RAM) (Leffler et al., 2007), and here we show it can be used to describe an even broader class of models. To map our most general action-schema definition (the conditional stochastic case) to this formalism, we need only slightly modify the definition of κ to be based on the action as well as the state. That is, we have $T(s, a, s') = \Pr[\eta(s, \tau(\kappa(s, a))) = s']$ with $\kappa : S, A \mapsto C$. With that definition, κ captures the conditions of each action, τ captures the effect distributions (Ω^a and Π^a), and η describes a language-dependent semantics of effects. For instance, in STRIPS, η represents the semantics of general Add and Delete lists. We give a small illustrative sketch of this decomposition at the end of this subsection.

What remains now is to connect the reward function for domains as described in Definition 7 to the general reward function R(s,a). Ostensibly, this mapping is straightforward since any encoding of the reward function based on the fluents of a state and a grounded action is, by definition, determined by s and a. However, since the state space for a relational domain can be exponential in the number of objects, learning a reward function that, for instance, is only positive in a single state (one specific configuration of blocks world) will trivially lead to exponential sample complexity. If the reward function is not general, then the representation does not help us sidestep the size of the flat state space. This problem is not unique to relational representations; the same caveat must be heeded when employing DBNs. In the DBN case, a number of natural assumptions have been used to enforce structure on the reward function, leaving it compact enough to learn efficiently. Here, we list (though not exhaustively) some similarly small reward functions for action-schema domains.

• Environments with terminal states and rewards based on actions: Most of the examples in this work simply attach rewards to each action schema. So, invoking that particular action results in a given reward (or expected value over rewards), regardless of the state. Note such schemas only make sense in environments with terminal states (otherwise there would be no need to learn the dynamics because all states would have the same value). A similar setup is possible where the rewards depend on whether the pre-conditions of the action hold, or which condition associated with a conditional action schema matches the current state. As long as the associated parameter (conditions, actions, etc.) is learnable with polynomial samples with respect to the domain description, the reward function will be polynomially learnable.

• Reward based on a goal with existential variables: While goals based on a specific configuration of the objects (a single state) can require an exponential number of samples to learn, rewards based on a goal such as $\exists X_1 \ldots X_k\, On(X_1, X_2) \wedge \ldots \wedge On(X_{k-1}, X_k)$ (a stack of k = O(1) blocks in any order) are efficiently learnable. This situation can be seen as a special case of the "rewards linked to action conditions" case above, as one can always define an action "Done" whose pre-conditions are only satisfied when the goal conditions are met. We note that later results in this chapter will show that a non-constant-size conjunctive goal (for example, "stack all the blocks") is not polynomially KWIK-learnable.

• Rewards based on a linear combination of a small number of fluents: The reward function in a DBN is often assumed to be composed of a linear combination of the factor values and is solvable efficiently using a linear regression algorithm (Walsh et al., 2009b). With relational fluents, we can treat each fluent that is "on" in a given state as a 1 and the others as 0. If only a small number of these fluents contribute to the reward function, or if only a small number of literals in each action schema's scope contribute to R, then we can learn this mapping using the same linear regression tools. Note that this case also covers the situation where a single fluent might need to be true to produce reward, as we will see later in a logistics example where a done action produces different reward when executed on an object that is not yet Finished (the weighting here is simply −1 on the Finished literal).

Other combinations are surely possible, but the main point is that if the reward function is compactly described, efficient learning is still possible. For the rest of this work, we assume that the reward function is provided to the agent ahead of time (which is natural in goal-centered domains), so we will generally ignore issues of reward learning as they can be solved with techniques similar to those for the transition-learning component. We do note that the class of reward functions that is efficiently learnable expands in the next chapter when apprenticeship learning is considered, similarly to the expansion in the learnability of the transition function.
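As a purely illustrative rendering of the decomposition $T(s,a,s') = \Pr[\eta(s, \tau(\kappa(s,a))) = s']$ above, the sketch below (the schema interface and the Effect fields are hypothetical, in the spirit of the earlier sketch) computes a full next-state distribution by matching a condition (κ), looking up its effect distribution (τ), and applying each effect (η):

```python
def transition_distribution(state, action_schema):
    """Return {next_state: probability} for one grounded action in a
    conditional stochastic schema, via the kappa/tau/eta decomposition.
    `state` is a frozenset of grounded fluents."""
    # kappa: the conditions are mutually exclusive, so at most one matches.
    condition = next(c for c in action_schema.conditions if c.holds_in(state))
    # tau: the matched condition indexes a distribution over effects.
    effects, probs = action_schema.distribution_for(condition)
    # eta: apply each effect; effects yielding the same next state pool their mass.
    next_states = {}
    for omega, p in zip(effects, probs):
        s_next = (state - omega.delete) | omega.add   # eta(s, omega)
        next_states[s_next] = next_states.get(s_next, 0.0) + p
    return next_states
```

Note how two effects that map s to the same next state pool their probability; this aliasing is exactly the effect ambiguity that complicates learning in Section 3.5.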
3.2 Example Language and Benchmark Problems

In this section, we introduce a compact RL domain encoding (Stochastic STRIPS with rewards) that can be described using the conceptual terminology outlined above. We also introduce several benchmark problems encoded in this language that will be challenging for different learning scenarios.

3.2.1 Deterministic STRIPS with Rewards

STRIPS domains (Fikes & Nilsson, 1971) are made up of a set of uniquely named objects O (we deal with some of the issues resulting from overlapping names in Chapter 5), a set of predicates P, and a set of parameterized actions A. A state s is defined as a conjunction of positive literals (predicates with parameters filled by members of O). Actions have pre-conditions (PRE in the example tables) that are conjunctions over P with parameters filled by the arguments to the action (see Table 3.2). If the current state does not contain all of the elements of the pre-condition, attempting to invoke the action will result in a reported "failure signal" and the state will not change. Effects in STRIPS are specified by Add (ADD) and Delete (DEL) lists, which designate what predicates are added to and deleted from the world state when the action occurs, and again the variables in these lists must be linked to the action parameters.

pickup(X, From): reward = −1
  PRE: On(X, From), Clear(X), EmptyHand(), Block(X)
  ADD: Inhand(X), Clear(From)
  DEL: Clear(X), On(X, From), EmptyHand()
putdown(X, To): reward = −1
  PRE: Inhand(X), Clear(To), Block(To)
  ADD: On(X, To), Clear(X), EmptyHand()
  DEL: Clear(To), Inhand(X)
putdowntable(X, To): reward = −1
  PRE: Inhand(X), Table(To)
  ADD: On(X, To), Clear(X), EmptyHand()
  DEL: Inhand(X)

Table 3.2: Deterministic Blocks World
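Operationally, executing a STRIPS operator is just a subset test followed by set differences and unions. A small hypothetical sketch (consistent with Table 3.2, but using our own encoding of literals as (predicate, parameter-tuple) pairs):

```python
def ground(literals, binding):
    """Substitute parameters: ("On", ("X", "From")) with {"X": "b1", "From": "b2"}
    becomes the grounded fluent string "On(b1,b2)"."""
    return {pred + "(" + ",".join(binding[v] for v in params) + ")"
            for pred, params in literals}

def apply_strips(state, op, binding):
    """state: set of grounded fluents; op carries .pre/.add/.dele lists and .reward.
    Returns (next_state, reward, failed); failure leaves the state unchanged."""
    if not ground(op.pre, binding) <= state:
        return state, op.reward, True   # reported "failure signal"
    next_state = (state - ground(op.dele, binding)) | ground(op.add, binding)
    return next_state, op.reward, False

# e.g., pickup(X, From) from Table 3.2 invoked as pickup(b1, b2):
#   apply_strips(state, pickup, {"X": "b1", "From": "b2"})
```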
An example deterministic STRIPS domain is the blocks world example in Table 3.2. Intuitively, the agent has 3 actions involving picking up a block (it can only hold one at a time), and putting the block down on another block (but only if it is at the top of a stack) or on the table (this last action is needed because a table is always clear, unlike a block). All of the actions result in a reward of −1, and it is assumed that some goal configuration (which may be very general, say a stack of at least 3 blocks) indicates the end of an episode and will have a value of 0. The use of rewards with STRIPS domains, while not part of the core language, has precedent in probabilistic planning competitions (Younes et al., 2005). Throughout this work, we will also use more complex or misleading (to the learner) variants of this simple domain. These include the following mutations.

• Stochastic Blocks World: the operators above are deterministic; each action has only one possible effect if the pre-conditions are satisfied. But it is easy to create a stochastic version of this environment (using a stochastic STRIPS variant described later) by having each action have a possibility of neither adding nor deleting any fluents from the current state. For instance, pickup might have the outcome above with probability 0.8 and no effect with probability 0.2. Notice that because the failure signal is unique to the case where pre-conditions do not hold, this "do nothing" outcome will never be confused with a pre-condition failure.

• Blocks world with move(X, From, To) and moveToTable(X, From, To) actions. These larger-scope operators, which move a block from on top of block "From" to on top of block (or table) "To", are often used in other encodings of blocks world and are also considered in parts of this thesis. The main drawback to these extended actions is that the pre-condition lists for each action are longer than in the encoding above (because the fluents associated with all 3 blocks need to be considered at once), and the larger arity of the actions can be detrimental to the learning and planning algorithms used in this thesis. However, in several cases this encoding is used to illustrate points in this work, in which case we will explicitly mention that these operators are in play.

• Blocks world with "Dummy" versions of pickup and putdown. This version of the domain uses the same operators as the stochastic case mentioned above, but adds in two new actions, dummyPickup and dummyPutdown, which have the same pre-conditions and effects as their counterparts above, but with the probabilities reversed. Again, when this domain is used in examples or experiments, we will explicitly indicate that the "dummy" actions are available.

3.2.2 Stochastic STRIPS with Rewards

Stochastic STRIPS operators generalize the deterministic representation above by considering multiple possible action effects specified by ⟨Add, Delete, Prob⟩ tuples as in Table 3.3. Notice this is a specific grounding of the pre-conditional stochastic action schemas ⟨a, c^a, Ω^a, Π^a⟩, where c^a is the pre-conditions and each Add/Delete tuple is an effect ω_i^a ∈ Ω^a with probability p_i^a drawn from Π^a. Table 3.3 illustrates a variant of the classic Paint-Polish domain (Minton, 1988) in the Stochastic STRIPS setting. Intuitively, the world is comprised of objects that need to be painted and polished and then marked as finished.
But sometimes the action of painting the object scratches it, which requires more polishing (and perhaps repainting) before it can be finished. The environment also has a shortcut action that, with very low probability, paints and polishes an object in one fell swoop, but usually has no effect.

paint(X): reward = −1
  PRE: none
  ADD: Painted(X)                      DEL: none                        (0.6)
  ADD: Painted(X), Scratched(X)        DEL: UnScratched(X)              (0.3)
  ADD: none                            DEL: none                        (0.1)
polish(X): reward = −1
  PRE: none
  ADD: none                            DEL: Painted(X)                  (0.2)
  ADD: UnScratched(X)                  DEL: Scratched(X)                (0.2)
  ADD: Polished(X), UnScratched(X)     DEL: Painted(X), Scratched(X)    (0.3)
  ADD: Polished(X)                     DEL: Painted(X)                  (0.2)
  ADD: none                            DEL: none                        (0.1)
shortcut(X): reward = −1
  PRE: none
  ADD: Painted(X), Polished(X)         DEL: none                        (0.05)
  ADD: none                            DEL: none                        (0.95)
done(X): reward = 0 if the action succeeds and the object was previously unfinished, else −1
  PRE: Painted(X), Polished(X), UnScratched(X)
  ADD: Finished(X)                     DEL: none                        (1.0)
The goal is reached when all the objects are Finished.

Table 3.3: Stochastic Paint/Polish World

While environments like Blocks World and Paint-Polish demonstrate the effectiveness and power of relational representations, they also showcase opposite difficulties in the action-schema learning problem. In Blocks World, the actions have a fairly large arity (2 to 3 parameters) and the conditions for successful action execution are very strict, and therefore hard to discover, but the effects of actions are fairly straightforward to learn. In contrast, Paint-Polish world has fairly simple conditions, but its stochastic action effects are confusing to learn because often multiple effects can explain the same state transition. Each of these problems is dealt with in this chapter.

3.2.3 Conditional STRIPS Operators

Finally, we consider a conditional version of Stochastic STRIPS, conforming to the conditional stochastic action-schema case, where each action has a set of conditions, each of which has its own distribution over the possible effects. In the STRIPS instantiation of this setting, each action has a set of conjunctions (over literals with the same scoping assumption as the pre-conditions from earlier), and the conjunction that matches the current state determines the distribution over the possible Add/Delete pairs that are invoked. An example of a domain with such conditions appears in Table 3.4. It is a version of the Paint-Polish domain above, but now each object can be either metal or wooden, and objects can be turned into metal objects using the coatMetal action. Metal objects have a far lower chance of scratching than their wooden counterparts when the paint action is invoked.

paint(X): reward = −1
  c1: Wooden(X)
    ADD: Painted(X)                    DEL: none                        (0.1)
    ADD: Painted(X), Scratched(X)      DEL: UnScratched(X)              (0.8)
    ADD: none                          DEL: none                        (0.1)
  c2: Metal(X)
    ADD: Painted(X)                    DEL: none                        (0.8)
    ADD: Painted(X), Scratched(X)      DEL: UnScratched(X)              (0.1)
    ADD: none                          DEL: none                        (0.1)
polish(X): reward = −1
  c1: none
    ADD: none                          DEL: Painted(X)                  (0.2)
    ADD: UnScratched(X)                DEL: Scratched(X)                (0.2)
    ADD: Polished(X), UnScratched(X)   DEL: Painted(X), Scratched(X)    (0.3)
    ADD: Polished(X)                   DEL: Painted(X)                  (0.2)
    ADD: none                          DEL: none                        (0.1)
shortcut(X): reward = −1
  c1: none
    ADD: Painted(X), Polished(X)       DEL: none                        (0.05)
    ADD: none                          DEL: none                        (0.95)
coatMetal(X): reward = −1
  c1: none
    ADD: Metal(X)                      DEL: Wooden(X)                   (1.0)
The goal is reached when all the objects are Painted, Polished, and UnScratched.

Table 3.4: Metal Paint/Polish World

3.3 Related Work

Before defining our own learning algorithms, we summarize some other approaches to learning conditions, effects, and effect distributions.
For the most part, the major separation between our work and these previous attempts is that our algorithms and analyses guarantee PAC-MDP behavior by performing the model learning under the conditions of the KWIK framework. We also note that most previous work considered only a single representation, while many of our results will be extensible to multiple languages with common structure.

The problem of learning STRIPS operators with either pre-conditions or full conditional effects has been considered in a number of previous works. One of the earliest action-schema learners was EXPO (Gil, 1994), which was given an incomplete STRIPS-like domain description (missing some pre-conditions or effects of actions) with the rest being filled in through experience using operator refinement techniques. This application was more proactive than others in that it would "experiment" on operators, similar to the formalized active learning we present in this chapter. The OBSERVER system (Wang, 1995) also used a STRIPS-style language to represent operators and was trained with a mixture of both raw experience and grounded expert traces. Learning in OBSERVER involved maintaining a version space for the pre-conditions and effects of operators (a technique we modify in our own investigation), and the language used allowed for constants and conditions, among other constructs. The TRAIL system (Benson, 1996) used Inductive Logic Programming (ILP) to distill schemas from raw experience and a teacher. These systems were all deployed in the deterministic setting and did not provide the sample complexity guarantees that our learning agents have under these conditions. However, these early systems did demonstrate the insufficiency of using only "raw experience" to learn unbounded-size conjunctive (pre-)conditions, leading to a reliance on bootstrapping or expert traces. We will make use of a similar stream of experience in the next chapter.

The main body of work on learning stochastic STRIPS operators was due to Pasula et al. (2007), who introduced Noisy Indeterministic Deictic (NID) rules. These are similar to Stochastic STRIPS operators but with the following differences. Their operators were designed for the full conditional case and actually went beyond the standard STRIPS scoping assumptions by allowing for deictic references to objects that were not parameters to the action. Such references greatly expand the hypothesis space of potential action behavior and are generally not considered in this thesis, though we do discuss them in Sections 5.8 and 6.2.
Another major difference was that NID rules modeled "noise" effects, which covered low-probability transitions to any state (random changes not covered by the other effects). Such dynamics are both difficult to model efficiently (as we show in Section 3.5.2) and somewhat philosophically vexing (since all outcomes in some way could be considered "noise"), so in this work we do not consider such ill-defined effects. Other than these key differences, NID rules are covered by our action-schema terminology. The learning algorithm espoused by this prior work was a heuristic search algorithm that attempted to maximize the log-likelihood of previously collected samples of action outcomes. While this algorithm showed empirical success in simulated domains, it did not provide the theoretical guarantees we seek in the online setting. The REX algorithm (Lang et al., 2010) used the same NID representation, and actually the same operator-learning algorithm, but guided the collection of samples using various measures of uncertainty and an E^3-inspired (Kearns & Singh, 2002) architecture. While this approach is closer to our own and their empirical results were very encouraging, no theoretical sample-complexity results are known for this algorithm.

In recent years, there has been a stronger concentration on learning relational models in the partially observable setting. Under these conditions, literals may be missing from state descriptions even though they are true, resulting in a version of open-world semantics. This implicit information complicates both the learning (Shahaf, 2007; Zhuo et al., 2009) and the planning (Hoffmann et al., 2007) problems. For instance, schema learning with synthetic items (Holmes & Isbell, 2005) built deterministic operators for noisy and partially observable domains. The work on Simultaneous Learning and Filtering (SLAF) derived computational bounds for learning action-schema operators in different languages (including STRIPS) and under different sources of partial observability (Shahaf, 2007), but sample complexity was not considered. At the extreme of these approaches are the ARMS and HTN-Learn systems (Yang et al., 2007; Zhuo et al., 2009), which learn action schemas from plan traces that contain only initial states, actions, and goals, with no intermediate state information. In both systems, the data is then translated into a satisfiability problem and a heuristic solver is used to determine the likely operator definitions. Our work for the most part deals only with the fully observable scenario.

Another relational language is the Object-Oriented MDP (OOMDP) model (Diuk et al., 2008), which mixes attribute-value style fluents with defined STRIPS-style predicates. We make the connection between our work and OOMDPs more explicit in Section 6.1.

3.4 Action Schema Learning Problems

Throughout this chapter, we consider the complexity of learning different portions of the action schemas described earlier. This section lays out some intuitive difficulties in this learning process and defines a number of sub-problems of overall schema learning that we will focus on in different parts of this chapter.

3.4.1 Intuitions on the Difficulties of Learning

In the learning setting, an agent will construct a model of schema dynamics based on its own experience. Such learning can be difficult, even in the relatively benign deterministic setting.
Consider trying to update, for a given literal l, its membership in the pre-condition ("Pre" for short here), Add, and Delete lists based on an observed state transition ⟨s, a, s′⟩. Unfortunately, a single experience may not be enough to pin down l's role in any of these lists, as we see in the operator update rules below, which outline how to update a.Pre (Rules 1 and 2) and the effect lists (Rules 3-6) when an action succeeds.

1. l ∉ s: Fail, l ∈ s: Succeed, or l ∈ s: Fail. None of these situations on their own imply l ∈ a.Pre or l ∉ a.Pre.
2. If l ∉ s ∧ Succeed, then l ∉ a.Pre.
3. If l ∉ s ∧ l ∈ s′, then l ∈ a.Add and l ∉ a.Delete.
4. If l ∈ s ∧ l ∉ s′, then l ∈ a.Delete and l ∉ a.Add.
5. If l ∉ s ∧ l ∉ s′, then l ∉ a.Add. No information as to whether l ∈ a.Delete.
6. If l ∈ s ∧ l ∈ s′, then l ∉ a.Delete. No information as to whether l ∈ a.Add.

Further complications arise with parameterized actions when the same object appears multiple times in an action's parameter list. For example, if the learner experiences the action a(b, b), which adds P(b), it is not clear whether P(X) or P(Y) (or both) should be inserted into a(X, Y).Add. We propose avoiding such ambiguity by learning different action versions; that is, a separate operator is learned for each pattern of matching parameters. For instance, a(b,b) and a(b,c) yield a11(X) and a12(X,Y), respectively, where the indexes represent the first occurrence in the parameter list of each unique identifier. The maximum number of versions for an action of arity m corresponds to the mth Bell number (Rota, 1964), which is defined recursively as $B_m = \sum_{k=0}^{m-1} \binom{m-1}{k} B_k$. However, because m is considered a constant in our study, learning the schema for each version independently does not affect the worst-case sample complexity analyses of the algorithms studied. In practice, the number of action versions that are actually encountered is typically small. Also, in more practical terms, it is possible to share information between action versions; for instance, if P(X) ∈ a12(X, Y).Add, then it can be inferred that P(X) ∈ a11(X).Add. However, such reasoning is beyond the scope of this thesis and is not necessary for the theoretical contributions we have made. We assume throughout the rest of this work (except where explicitly considered in Chapter 5) that actions with the same object in multiple parameter locations are illegal.

Moving to the stochastic setting adds another wrinkle to the learning problem beyond the difficulties mentioned above. In the stochastic setting, effect probabilities cannot be learned simply by counting the number of occurrences of each Add/Delete tuple, because in some states there is ambiguity as to what effect actually occurred. For instance, in Paint-Polish World (Table 3.3), when an object o1 is already scratched but not painted, if one executes the paint(o1) action and the result is that the object is now scratched and painted, one cannot tell which of the first two possible effects occurred, though we know ω_3 (the "nothing changes" outcome) did not occur because o1 is now painted.

Finally, when we introduce conditional effects, the difficulty can increase because there is no longer a unique "failure" signal indicating whether a single conjunction was satisfied for a given action or not. Instead, the agent must infer which of the (currently being learned) conditions might have applied based on the perceived effect (which is complicated by the problems of learning effects and distributions mentioned above).
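The deterministic update rules above translate directly into a simple version-space-style learner, in the spirit of the OBSERVER-like technique mentioned in Section 3.3. The sketch below is illustrative only (it assumes fully grounded literals and a single action version) and tracks which literals remain possible members of each list:

```python
class DeterministicSchemaLearner:
    """Prune hypotheses about one action's Pre/Add/Delete lists using Rules 1-6."""
    def __init__(self, all_literals):
        self.literals = set(all_literals)
        self.maybe_pre = set(all_literals)   # not yet ruled out of a.Pre
        self.maybe_add = set(all_literals)   # not yet ruled out of a.Add
        self.maybe_del = set(all_literals)   # not yet ruled out of a.Delete
        self.add, self.dele = set(), set()   # literals known to be effects

    def observe(self, s, succeeded, s_next=None):
        if not succeeded:
            return                           # Rule 1: a failure alone pins nothing down
        self.maybe_pre &= s                  # Rule 2: absent + success => not in a.Pre
        for l in self.literals:
            if l not in s and l in s_next:       # Rule 3
                self.add.add(l); self.maybe_del.discard(l)
            elif l in s and l not in s_next:     # Rule 4
                self.dele.add(l); self.maybe_add.discard(l)
            elif l not in s:                     # Rule 5: absent before and after
                self.maybe_add.discard(l)
            else:                                # Rule 6: present before and after
                self.maybe_del.discard(l)
```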
3.4.2 Learning Sub-Problems

Just as action schemas have three main parameters (C^a, Ω^a, and Π^a), learning their dynamics can also be decomposed into three parts (as illustrated in Figure 3.1): learning the conditions, learning the effects, and learning the effect distributions. In this chapter, we will consider both the complete problem and several combinations of sub-problems in which the other schema components are given. We will investigate each of these problems using the KWIK framework for model learning, which allows for very small inaccuracies (ǫ) in the learned model. As such, it will be helpful to have a notion of an ǫ-accurate predictive model of action-schema dynamics, which we define here for the conditional case (the most general of those presented earlier).

Definition 12. An ǫ-accurate predictive model of an action schema for action a in conditional domain D is any model M (perhaps itself an action schema) that, when given a state s that is valid in a domain instance of D and composed of ground literals, can make a prediction over the probability of next states Pr[S′] such that $||\Pr[\eta(S, \Omega_i^a)] - \Pr[S'|M]|| \le \epsilon$, where Ω_i^a is the effect set associated with the condition c_i such that c_i[s] = true, η maps a state and effect set to a vector of next states, and || · || is an L1 norm over the next-state probability vector.

[Figure 3.1 depicts the decomposition of CED-Learning: learning the (pre-)conditions (e.g., On(X, From), Clear(X), EmptyHand()), learning the effects (e.g., (1) ADD: Holding(X), Clear(From); DEL: On(X, From), or (2) nothing happens), and learning the effect distributions (e.g., Pr(1) = .6, Pr(2) = .4), with CD-, ED-, and D-Learning as the sub-problems that omit one or more of these components.]

Figure 3.1: The CED-Learning problem and the decompositions considered in this work.

Intuitively, the only inaccuracy that is allowed is a very small inaccuracy in the probability distribution over next states. Typically (though not necessarily), algorithms that satisfy this criterion will need to correctly model the conditions and effects (with small inaccuracies in the probabilities) in the learned model. Achieving such an ǫ-accurate model with high probability and a polynomial number of ⊥ predictions will be sufficient for guaranteeing polynomial sample complexity in the KWIK framework, thus guaranteeing us the PAC-MDP behavior of the corresponding RL agents in these relational domains. More formally, the complete (Condition, Effects, and Distributions) learning problem is defined as follows (we state the formal definitions in the most general case, conditional stochastic action schemas, as they are easily relaxed to the more restricted cases):

Definition 13. The Condition, Effect, and Distribution (CED) Learning Problem is, given the set of literals L, actions A, reward function R, terminal (goal) states S_G, accuracy parameters ǫ and δ, and the ability to execute actions in the environment: for all actions a ∈ A, with probability (1 − δ), output an ǫ-accurate predictive model.

In this chapter, we will analyze this learning problem and several sub-problems, using the KWIK framework to analyze the model-learning component and PAC-MDP to characterize the corresponding agent behavior. When the meaning is clear, we will refer to both the model-learning and behavioral aspects as simply the CED-Learning problem. The sub-problems we consider in addition to the full problem are listed below. Each of the sub-problems, in addition to helping us solve the larger problem, has applicability in realistic scenarios where background information (like the possible effects but not their probabilities) is known. Our first sub-problem entails learning stochastic effects given the conditions (or in the unconditional case):
Definition 14. The Effect and Distribution (ED) Learning Problem has all the inputs of the CED-Learning problem as well as the true set of conditions C^a for each action a. The agent must then, for all actions a ∈ A, with probability (1 − δ), output an ǫ-accurate predictive model.

Decomposing this problem even further, we can consider an even smaller problem where the conditions and effects of the schemas are given, but the distributions over these effects must be learned. This problem can also be thought of as learning full stochastic operators given a traditional non-deterministic planning problem description (such as those considered in Kuter et al. (2008)).

Definition 15. The Distribution (D) Learning Problem has all the inputs of the ED-Learning problem as well as the true set of conditional effects Ω^a for each action a. The agent must then, for all actions a ∈ A, with probability (1 − δ), output an ǫ-accurate predictive model.

We also consider a sub-problem of interest to the robotics community where the effects themselves are given, but the conditions and the effect distributions must be learned. This occurs, for instance, when one has a robot that is known to sometimes move to its intended location, but sometimes slips to either side, and one does not know what governs this conditional behavior or what the probabilities are under different conditions.

Definition 16. The Condition and Distribution (CD) Learning Problem has all the inputs of the CED-Learning problem, as well as the true set of conditional effects Ω^a (but not the conditions themselves) for each action a. The agent must then, for all actions a ∈ A, with probability (1 − δ), output an ǫ-accurate predictive model.

Both sub-problems in CD-Learning, finding the correct conditions and effect distributions, are non-trivial to solve. We now begin our investigation with the distribution-learning portion: given the effects and conditions for an action schema, can we even learn the distribution over the effects in the online setting?

3.5 Learning Effect Distributions

In this section, we present a solution to the distribution learning problem (D-Learning) for relational action schemas. To concentrate on this particular sub-problem, we assume the conditions C^a and the associated effects Ω^a are known. We will relax these assumptions in later sections when we consider larger sub-problems (like CD-Learning or ED-Learning). Our solution to the D-Learning problem (KWIK-LR), presented here, will serve as a building block for solutions to the more complicated learning problems in the stochastic setting. Also, through this comparatively easier sub-problem, we will introduce concepts and strategies that will reappear throughout our investigation of online schema learning. These include: (1) the use of KWIK learners in action-schema learning, (2) the use and design of optimistic model interpretations in action-schema domains, and (3) constructing full agent algorithms from KWIK learners, a planner, and these optimistic interpretations.

Below, we formalize the difficulties stemming from effect ambiguity and describe a naïve algorithm for dealing with this situation.
We then show how a linear regression algorithm, and specifically a KWIK linear regression algorithm, can be used to solve this problem more efficiently than the naïve solution, and show how to integrate such an algorithm into a full learning agent by using different optimistic interpretations of the KWIK learner's predictions.

3.5.1 Effect Ambiguity and the Partition Algorithm

Learning the probabilities in D-Learning is non-trivial because for a given state-action pair (s, a), the effects are partitioned into equivalence classes E(s,a) = {{ω_i, ω_j, ω_k}, {ω_l, ω_m}, ...} where each e ∈ E(s,a) contains effects that are identical given state s. For instance, consider the paint action from the Paint-Polish domain (Table 3.3). The possible effects listed in the operator are that either the object is painted (ω_1), the object is painted and scratched (ω_2), or nothing changes (ω_3). But suppose this action was applied to an object that was already scratched, but not painted. In that case, there are only 2 possible outcomes of this action: either the object will be painted and scratched, or still just scratched (and unpainted). The reason is that the first two effects belong to the same equivalence class (E(s,a) = {{ω_1, ω_2}, {ω_3}}), because applying either of the first two pairs of Add/Delete lists results in the same next state (η(s, ω_1) = η(s, ω_2)).

Consider trying to update the probabilities of the three effects after observing a transition where the formerly "just scratched" object becomes painted and scratched. The standard "counting" method (storing the frequency of each effect divided by the total number of observations of this action) for learning probabilities cannot be used in such a setting because it is unclear which effect's (ω_1 or ω_2) count we should increment. But notice this sample is not entirely uninformative. While we cannot tell which of the first two effects occurred, we do know that the third effect did not occur (since the object came back painted), so something can be learned from this experience.

Based on this intuition, we can construct a naïve Partition algorithm as shown in Algorithm 7. (This algorithm, like many others in this chapter, is presented here for the pre-conditional case. In the conditional case, one simply needs to consider the partitioning given an action and a condition rather than simply based on the action.) Intuitively, the algorithm simply maintains separate counts (ρ) of outcomes for every possible pairing of equivalence classes. For instance, in the paint example, the experience used to predict the probability of the initially scratched object being painted or scratched would be calculated only from samples where the first two effects mapped to the same state (but the third effect was different).

Algorithm 7 Partition
1: Input: ǫ, δ, η : S, Ω ↦ S
2: Output: On each step, a prediction of the probabilities of each next state.
3: Calculate M = Poly(ǫ, δ, Ω), the number of samples needed to accurately make a prediction for a class
4: for each s, a do
5:   Let E = E(s,a) //set of equivalence classes (a partitioning of effects)
6:   if E has never been seen before then
7:     ∀e ∈ E, ρ[E][e] = 0 //Initialize counts (ρ)
8:   end if
9:   if Σ_{e∈E} ρ[E][e] > M then
10:    Ŷ = {ŷ_e}, where ŷ_e = ρ[E][e] / Σ_{e′∈E} ρ[E][e′], ∀e ∈ E
11:  else
12:    Ŷ = ⊥
13:  end if
14:  if Ŷ = ⊥ then
15:    Observe s′
16:    for e ∈ E do
17:      if η(s, ω_e) = s′, where ω_e is any effect in e, then
18:        ρ[E][e] += 1
19:      end if
20:    end for
21:  end if
22: end for
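The partitioning E(s,a) that drives Algorithm 7 is easy to compute: group effects by the next state they would produce from s. A sketch using the illustrative Effect encoding from earlier:

```python
def equivalence_classes(state, effects):
    """Group effect indices by eta(s, omega); effects that map s to the same
    next state are indistinguishable and share an equivalence class."""
    classes = {}
    for i, omega in enumerate(effects):
        s_next = frozenset((state - omega.delete) | omega.add)   # eta(s, omega)
        classes.setdefault(s_next, []).append(i)
    return list(classes.values())

# Paint-Polish: for an object that is Scratched but not Painted, the first two
# effects of paint alias, so equivalence_classes returns [[0, 1], [2]].
```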
Proposition 1. The Partition algorithm has a KWIK bound of $O\!\left(A \sum_{i=2}^{\Omega} S_2(\Omega, i)\, \frac{i}{\epsilon^2} \log \frac{i}{\delta}\right)$, where S_2 represents the Stirling number of the second kind (Sharp, 1968), that is, the number of i-sized partitions of Ω.

Proof. (sketch) The proposition holds from an application of the "dice learning" KWIK bound (see Table 2.1 and Li (2009) for more information) on learning a multinomial distribution. Specifically, this bound is derived by setting $M = O\!\left(\frac{i}{\epsilon^2} \log \frac{i}{\delta}\right)$ when there are i possible outcomes in a partition. The dice-learning bound itself is derived from applications of Hoeffding's inequality and a union bound. We note that the case where i = 1 is not considered because all the effects are in one partition (with outcome probability 1.0), and this bound is loose because it uses the dice-learning bound when i = 2, a case that can be more tightly bounded using Hoeffding's inequality directly.

There are several drawbacks to this approach. First, the KWIK bound is exponential in the number of effects, and while this number is often small (even O(1)), even in Paint-Polish world we see it as high as 5 (for the polish action), which makes the algorithm practically infeasible, though often not all equivalence-class partitions are reachable. Secondly, the Partition algorithm does not make use of the structure of the probability simplex associated with the domain dynamics. Specifically, since the probability of any equivalence class is equal to the sum of the probabilities of the effects it contains, the probabilities of many partitions can be derived from known probabilities in other partitions. For example, if there are four effects and the learner has established that Pr[ω_1] = 0.1 and Pr[ω_2] = 0.1, then for the equivalence class {{ω_1, ω_2}, {ω_3, ω_4}}, the learner can predict 0.2 and 0.8 as the correct probabilities even though the individual probabilities of ω_3 and ω_4 are unknown. In the next few sections, we show how to use an algorithm (KWIK-LR) that makes use of all this structure and has truly polynomial sample complexity (even when Ω is of polynomial size in the domain literals). We will use this superior algorithm in concert with optimistic interpretations (Section 3.5.4) and a full online agent algorithm (Section 3.5.5) to solve the D-Learning problem.

3.5.2 Viewing D-Learning as Linear Regression

An alternative view of the D-Learning problem is to treat the learning of the individual effect probabilities (p_i^a ∈ Π^a) as weights in a linear regression (LR) problem. In this section, we formalize this view and the construction of inputs and interpretation of outputs from a "black box" linear regression algorithm under these conditions. The section following this one will show how to replace this generic LR algorithm with a KWIK algorithm with polynomial sample complexity. Together, these two pieces give us a more efficient alternative to the naïve and inefficient Partition algorithm above.

We begin by reviewing some standard online linear regression terminology, using standard MATLAB-style notation ("," separating column entries, ";" for rows). We will concentrate on a basic linear regression problem without measures of uncertainty and without any regularization, features that are introduced in the next section. Let $X := \{\vec{x} \in \Re^n : ||\vec{x}|| \le 1\}$, and let $f : X \to \Re$ be a linear function with weights $\theta^* \in \Re^n$, $||\theta^*|| \le M$, i.e., $f(\vec{x}) := \vec{x}^T\theta^*$. Fix a timestep t.
For each i ∈ {1,...,t}, denote the stored samples by $\vec{x}_i$, their (unknown) expected values by $y_i := \vec{x}_i^T\theta^*$, and their observed values by $z_i := \vec{x}_i^T\theta^* + \eta_i$, where the noise η_i is assumed to form a martingale, i.e., $E(\eta_i|\eta_1,\ldots,\eta_{i-1}) = 0$, and to be bounded: $|\eta_i| \le S$. Define the matrix $D_t := [\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_t]^T \in \Re^{t \times n}$ and vectors $\vec{y}_t := [y_1; \ldots; y_t] \in \Re^t$ and $\vec{z}_t := [z_1; \ldots; z_t] \in \Re^t$. The online linear regression problem is simply, for every new query point $\vec{x}_t$ that arrives, to output $y_t = \vec{x}_t^T\theta^*$. When D_t is of full rank and has sufficient coverage to make accurate predictions, we can calculate this prediction by $\hat{y} = \vec{x}^T\hat{\theta}$, where $\hat{\theta}$ is the least-squares solution to the system.

With this machinery in hand, we can now craft a creative abstract solution to the D-Learning problem. The "trick" here is to represent each piece of experience in the data matrix with a series of binary indicator vectors, one for each equivalence class, and then mark the equivalence class that actually occurred in the output vector with a 1. Formally, the algorithm for constructing these entries in D_t and $\vec{y}_t$ is presented in Algorithm 8. In words: each equivalence class contains one or more effects (ω_i, ω_j, ... ∈ Ω^a), and an indicator vector $\vec{x}$ is created where x_i = 1 if ω_i ∈ e_t (Line 5), otherwise x_i = 0. For instance, in learning the probabilities associated with the paint action, if the object was already scratched and then came back painted and scratched (effects ω_1 and ω_2 are confusable, but not ω_3), the new data vectors would be [1, 1, 0] and [0, 0, 1], but if the action was taken on an object that was not scratched and not painted, there could be no confusion, so in that case 3 vectors would be added: [1, 0, 0], [0, 1, 0], and [0, 0, 1]. Note that for a given state each equivalence class in E(s,a) induces a unique (and disjoint) $\vec{x}$. Each of these is used to update the LR learner (Line 6), with an output (y) of 1 associated with the $\vec{x}$ that actually happened. The other equivalence-class vectors that did not occur are still introduced into the data matrix, but the corresponding output entry in $\vec{y}_t$ is set to 0.

Algorithm 8 Transition Probability Learner for action a with LR
1: Input: set of effects Ω^a and a linear regression algorithm LR^a
2: for each current state s_t do
3:   Take action a and observe e_t ∈ E(s_t, a)
4:   for each equivalence class e ∈ E(s_t, a) do
5:     Construct ~x where x_j = 1 if ω_j ∈ e, else 0
6:     Update LR^a with ~x and y = 1 if e = e_t, else y = 0
7:   end for
8: end for

Intuitively, linear regression is the correct tool for this task because each sample gives us a noisy indicator of the sum of the individual effect probabilities in the same equivalence class. For instance, consider 10 samples of the paint action, all of which occurred under the partitioning E(s,a) = {{ω_1, ω_2}, {ω_3}}, and suppose that 9 times out of the 10 the first (ambiguous) effect occurred. We cannot tell anything about Pr(ω_1) or Pr(ω_2) individually, but we do know their sum is close to 0.9 (and that Pr[ω_3] ≈ 0.1). At the highest level, we know that $\sum_{i=1}^{|\Omega|} \Pr[\omega_i] = 1$. The Partition algorithm from earlier inherently ignores each of these summation constraints. The solution to such a linear system is a weight vector containing the frequencies of each individual effect with respect to y. For instance, in the paint case, any two positive numbers that sum to 0.9 are valid weights for θ_1 and θ_2 for now, but as samples are added where these effects appear in different equivalence classes, the weights must respect the proportion of 1's associated with each individual effect while not violating the summation constraint. Ultimately, the solution to such a linear system is exactly the unknown probability vector.
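A small numpy demonstration (illustrative, not the thesis code) of this encoding for the paint example shows least squares recovering the individual probabilities once ambiguous and unambiguous rows are mixed:

```python
import numpy as np

rows, targets = [], []

def record(equiv_classes, observed):
    """One experience: an indicator row per equivalence class (Algorithm 8,
    Lines 4-7), with y = 1 only for the class that occurred. 3 effects here."""
    for cls in equiv_classes:
        x = np.zeros(3)
        x[cls] = 1.0
        rows.append(x)
        targets.append(1.0 if cls == observed else 0.0)

# 100 ambiguous experiences (object already scratched): classes {w1,w2} and {w3}.
for _ in range(90): record([[0, 1], [2]], [0, 1])   # came back painted
for _ in range(10): record([[0, 1], [2]], [2])      # nothing changed
# 100 unambiguous experiences (object unscratched and unpainted).
for count, cls in ((60, [0]), (30, [1]), (10, [2])):
    for _ in range(count): record([[0], [1], [2]], cls)

theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
print(np.round(theta, 3))   # -> [0.6 0.3 0.1], the true effect distribution
```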
In the case where all effects are unambiguous, the collected examples form a data matrix where each row of D_t contains only one 1 (essentially, on each step of experience, an identity matrix of size |Ω| is appended to the data matrix), and the single 1 added to $\vec{z}_t$ appears in the row of the effect that actually happened. In that case, the least-squares solution garners the correct probabilities, since each weight θ_i must be proportional to the fraction of 1's in $\vec{z}$ in rows where effect ω_i has a 1.

We also note that this linear regression technique would be inappropriate in a schema language that allowed for "noise" or "miscellaneous" outcomes such as those used by Pasula et al. (2007). In that work, conditional stochastic STRIPS operators were allowed to have a "noise" outcome where any possible change to the state space could occur. While theirs is a useful technique for capturing miscellaneous real-world dynamics, such an effect would always be a possible explanation for whatever outcome occurs, hence its associated column in the data matrix would be all 1's (since it is always ambiguous). In such a situation, a heuristic scoring function that tries to minimize the weights on the noise term (as was used in this related work) is likely a better solution. Finally, we note that a situation where "noise" simply referred to all outcomes except the other effects would be amenable to our solution, but could pose other problems when actually learning the effects.

With this intuition and using the algorithm above, we can make a formal connection between potential KWIK bounds for linear regression and KWIK bounds for the D-Learning problem.

Lemma 2. If linear regression can be KWIK-learned with a sample complexity B(n, ǫ, δ), then the D-Learning problem can be solved with a KWIK bound of O(A·B(Ω, ǫ, δ)), where Ω = max_a |Ω^a| and A is the number of actions. In the conditional case, the bound is O(A·C·B(Ω, ǫ, δ/C)), where C = max_a |C^a| (the largest number of conditions for an action) and Ω = max_{c,a} |Ω_c^a|.

Proof. We concentrate here on the case where the KWIK learner makes a non-⊥ prediction, as we have already assumed the number of ⊥'s can be bounded by B above (we construct such an algorithm in the next section). In that case, by the properties of the KWIK algorithm, we know that the solution to any linear system we will create will (with probability 1 − δ) be ǫ-accurate. Specifically, for an input $\vec{x}$ where a prediction $\hat{y}$ is made, $|\hat{y} - y| \le \epsilon$.

The proof utilizes the fact that a generic LR algorithm produces $\hat{\theta}$ such that $\hat{\theta} = D^{\dagger}z$, where † indicates a pseudoinverse. First, we consider the case where each effect is completely unambiguous in all situations, or where any ambiguous effects do not appear separated anywhere (that is, if ω_1 and ω_2 are ambiguous, nowhere do they appear alone or with other effects). In that latter case we can treat them as a single effect, so each |e_i(s,a)| = 1. In this unambiguous situation, each row of D is a unit vector $\vec{e}_i^T$. At timestep t, the number of rows in this unregularized form is Ω × t, but we will compact it in the next section to remove the factor of t.
$\vec{z}$ contains Ω·t − t 0's and t 1's. Suppose t is large enough that the proportion of 1's in $\vec{z}$ for each effect ω_i is within some ǫ_1 of p_i ∈ Π, so that an ǫ-accurate prediction can be made for any $\vec{x}_t$. For instance, ǫ_1 = ǫ/Ω would satisfy this requirement. Finding a least-squares solution for this system means calculating $\hat{\theta}$ such that:

$\hat{\theta} = D^{\dagger}z = (D^TD)^{-1}D^Tz = \frac{1}{t} I \vec{c} = \Pi \pm \epsilon_1$

where I is an identity matrix and c_i is a count of the number of times a 1 appeared in $\vec{z}$ when x_i = 1. The last two steps come from realizing that $D^TD$ is simply a diagonal matrix with entries of t (then inverted) and that $D^Tz$ is the count of times a 1 appeared in $\vec{z}$ simultaneously with a 1 for each x_i. That is, each element of $\hat{\theta}$ becomes the frequency of 1's for that effect divided by the total number of timesteps, which will be within some ǫ_1 of p_j, allowing ǫ-accurate predictions for the given input. Notice that if the LR algorithm were making predictions for an effect probability before these frequencies (with high probability) reached ǫ_1-closeness to the true Π, then in the limit (where the frequencies certainly approach Π) it would make inaccurate predictions on the weight vector, and thus would not be a KWIK algorithm. Also, note that any query for an equivalence class containing more than one effect (such as ⟨ω_1, ω_3⟩) can be answered from this data matrix by just summing the corresponding $\hat{\theta}_i$'s.

Now, let us fix the effects and consider the case where each row of the data matrix D contains at most k 1's (up to k effects are aliased), for 1 ≤ k ≤ |Ω|. Again, it is possible that the corresponding KWIK algorithm may not be able to make accurate predictions for all inputs, so we focus here on the case where the algorithm reports an ǫ-accurate output based on its $\hat{\theta}$ weights. The question we must answer is whether the rows with the larger equivalence classes could cause the least-squares solution to deviate greatly from the true probabilities. But notice that the frequency of a 1 in $\vec{z}$ corresponding to such a row aliasing ω_i ... ω_j is simply p_i + ... + p_j ± ǫ (at most). Therefore, setting each $\hat{\theta}_i$ to p_i (as was the case for the unambiguous data rows) will satisfy the KWIK error tolerance for these entries as well. Notice that in some cases (where not all the singleton probabilities are recoverable because of ambiguity) there will be multiple solutions (not within the ǫ-threshold of one another) to certain queries $\vec{x}_t$, because the linear system does not yet have a unique solution; one of these solutions is the true probability vector for these effects, and if a prediction is made for a vector $\vec{x}$ of k 1's, all of the solutions sum to the total of the k true probabilities. For instance, in the extreme case, if all effects are always ambiguous, any probability vector summing to 1 is a valid $\hat{\theta}$. Notice that if the true probability vector (again with some ǫ-tolerance) were not a least-squares solution to such a system, then there would be a different vector of weights $\hat{\theta}$ such that $\sum_{\mathrm{unique}\ d_i \in D} (d_i \cdot \hat{\theta} - f_i)^2$ was minimized, where f_i is the frequency of 1's in $\vec{z}$ for rows identical to d_i. But if the system is actually making non-⊥ predictions, those frequencies must be in line with the true probabilities, so with high probability this cannot happen. In cases where the different possible solutions differ by more than ǫ, it will be incumbent on the KWIK algorithm to report ⊥.
We have thus established that when a KWIK linear regression algorithm makes ǫ-accurate predictions with high probability, it can be used to predict next-state distributions based on effects, thereby solving the D-Learning problem. We now provide a detailed description of a KWIK linear regression algorithm that fulfills the requirements of the lemma with only a polynomial (in the number of effects) number of ⊥ predictions, and that also makes use of regularization.

3.5.3 Regularized KWIK-LR

Here, we describe a KWIK linear regression algorithm (KWIK-LR) that uses a compact representation of the data matrix (of size Ω × Ω), which makes the algorithm computationally tractable (as opposed to approaches that continue to increase the size of the data matrix with each instance). KWIK-LR, like all KWIK algorithms, admits when it is too uncertain to make a prediction, giving us a basis for performing exploration, which would not be the case with standard linear regression. Thus, an RL agent using KWIK-LR can learn the probabilities online and maintain the PAC-MDP guarantee.

Directly solving the linear system described above in the online case is problematic because (1) if D_t is rank-deficient, the least-squares solution may not be unique, and (2) even if we have a solution, we have no information on its confidence. We can avoid problem (1) by using regularization, which certainly distorts the solution, but this gives us a measure of confidence: if the distortion is large, the predictor should have low confidence and output ⊥. On the other hand, if the distortion is low, then $\vec{z}_t$ and $\vec{y}_t$ must be very similar.

Let $A_t := [I; D_t^T]^T$. The solution of the system $A_t\theta = [(\theta^*)^T; \vec{y}_t^T]^T$ is unique and equal to θ*. However, the right-hand side of this system is unknown, so we use the approximate system $A_t\theta = [\vec{0}^T; \vec{z}_t^T]^T$, which has the solution $\hat{\theta} = (A_t^TA_t)^{-1}A_t^T[\vec{0}^T; \vec{z}_t^T]^T$. Define $Q_t := (A_t^TA_t)^{-1}$. If $||Q_t\vec{x}||$ (which is directly proportional to the error $||\hat{y} - y||$) is larger than a suitable threshold α_0, our algorithm will output ⊥; otherwise it outputs $\hat{y}$, which is guaranteed to be an accurate (with high probability) prediction. The intuition behind this choice is that in areas of the input space where high error might be encountered because of insufficient data, the agent requests a sample by predicting ⊥, and this will help reduce the error in that subspace.

Algorithm 9 describes our method for KWIK-learning (as defined in Section 2.2.2) a linear model. Notice it avoids the problem of storing A_t and $\vec{z}_t$, which grow without bound as t → ∞. The quantities $Q_t = (A_t^TA_t)^{-1}$ and $\vec{w}_t = A_t^T[\vec{0}^T; \vec{z}_t^T]^T$ are sufficient for calculating the predictions, and can be updated incrementally.

Algorithm 9 KWIK-LR
1: Input: α_0
2: initialize: t := 0, m := 0, Q := I, ~w := ~0
3: repeat
4:   observe ~x_t
5:   if ||Q~x_t|| < α_0 then
6:     predict ŷ_t = ~x_t^T Q ~w //known state
7:   else
8:     predict ŷ_t = ⊥ //unknown state
9:   end if
10:  observe z_t
11:  Q := Q − (Q~x_t)(Q~x_t)^T / (1 + ~x_t^T Q ~x_t), ~w := ~w + ~x_t z_t
12:  t := t + 1
13: until there are no more samples

The algorithm is KWIK by the following theorem (the proof is presented in Walsh et al. (2009b) and the associated Technical Report):

Theorem 2. Suppose that the observation noise is zero-mean and bounded: for all t, $E[\eta_t] = 0$ and $|\eta_t| \le S$. Let δ > 0 and ǫ > 0. If Algorithm 9 is executed with $\alpha_0 := \min\left\{c_1 \frac{\epsilon^2}{n \log\frac{n}{\delta}},\ c_2 \frac{\epsilon^2}{n},\ \frac{\epsilon}{2M}\right\}$ for suitable constants c_1 and c_2 (recall that M bounds ||θ*||), then the number of ⊥'s will be

$B_{LR}(n, \epsilon, \delta) = O\left(\max\left\{\frac{n^3}{\epsilon^4}, \frac{n^2 \log\frac{n}{\delta}}{\epsilon^4}\right\}\right)$    (3.1)

and, with probability at least 1 − δ, for each sample $\vec{x}_t$ for which a prediction $\hat{y}_t$ is made, $|\hat{y}_t - f(\vec{x}_t)| \le \epsilon$ holds.
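Algorithm 9 is only a few lines of numpy. The sketch below (illustrative; α_0 would be set per Theorem 2) maintains Q with the rank-one update of Line 11, which is just the Sherman-Morrison formula, so no matrix inversion is ever performed:

```python
import numpy as np

class KwikLR:
    """KWIK linear regression (Algorithm 9): answer only when ||Qx|| is small;
    otherwise return None, standing in for the bottom symbol."""
    def __init__(self, n, alpha0):
        self.Q = np.eye(n)      # (A_t^T A_t)^{-1}, seeded by the regularizer I
        self.w = np.zeros(n)    # A_t^T [0; z_t]
        self.alpha0 = alpha0

    def predict(self, x):
        if np.linalg.norm(self.Q @ x) < self.alpha0:
            return float(x @ self.Q @ self.w)   # known: accurate w.h.p.
        return None                             # unknown: request a sample

    def update(self, x, z):
        Qx = self.Q @ x
        self.Q -= np.outer(Qx, Qx) / (1.0 + x @ Qx)   # Line 11 (Sherman-Morrison)
        self.w += x * z
```

For D-Learning, one such learner is instantiated per action with n = |Ω^a|, and it consumes exactly the indicator vectors and 0/1 targets produced by Algorithm 8.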
Theorem 2's bound is on par with the sample complexity of previous work on KWIK online linear regression (Strehl & Littman, 2007), and Algorithm 9 requires Θ(n²) operations per timestep t, in contrast to their approach, which stored (and operated on) a matrix with as many rows as samples that had been collected. With this bound on the number of ⊥'s produced by KWIK-LR, we can state the following theorem.

Theorem 3. The D-Learning problem can be solved with a KWIK bound of O(A·B_LR(Ω, ǫ, δ/A)) in the pre-conditional case and O(C·A·B_LR(Ω, ǫ, δ/CA)) in the conditional case, where B_LR is defined in Equation 3.1 and C and Ω are defined as in Lemma 2.

Proof. (sketch) The theorem holds based on Lemma 2 and Theorem 2 above, which respectively relate the bound on the D-Learning problem to the bound of KWIK-LR and explicitly state a polynomial KWIK-LR bound.

3.5.4 Optimistic Interpretations in D-Learning

Using the KWIK learner above, we can KWIK-solve the D-Learning problem and create an associated PAC-MDP agent in an environment where the effect distributions are unknown using the KWIK-Rmax template in Algorithm 6. This algorithm will insert an Rmax state transition anywhere the current data matrix cannot be used to make an accurate probability prediction. But this optimistic interpretation of the currently learned model is in some ways too optimistic. Consider the case where an agent is in stochastic blocks world and has a goal of stacking 4 blocks. The agent may have learned the probabilities for the pickup and putdown actions, but perhaps has not yet learned the probabilities for the putdowntable action. While the probabilities on the effects (which are either to put the block on the table or do nothing) are unknown, no good can come from using this action in this situation, no matter what the real probabilities are. But, using the Rmax heuristic, the agent will essentially think that its goal can be reached by just executing this action! The problem here is that the background knowledge, namely the known pre-conditions and effects, is not being properly leveraged in the optimistic interpretation. While this behavior still yields polynomial sample complexity bounds in the worst case, there is a much better solution in the best case that leverages the background knowledge in the problem without sacrificing the worst-case bounds.

Our solution is the Known-Edge Value Iteration algorithm (Algorithm 10). The algorithm assumes that the model is represented by means of KWIK learners for each action, though other architectures are possible. The outer shell of the algorithm remains very much like traditional Value Iteration, but inside the main loop over states, the transition function is computed for every possible equivalence class (e) induced by the state-action pair on every iteration. For equivalence classes (outcomes) where the model reports that the transition probabilities are known, the reported values are used in conjunction with the effect represented by that class (ω_e on Line 11).
On the other hand, for state-action pairs that have effects with known probabilities K = {ω_i, ω_j, ...} and unknown probabilities U = {ω_k, ω_l, ...}, the effect in U that leads to the highest-value next state is considered to have probability $1 - \sum_{\omega_i \in K} P(\omega_i)$ (Lines 15 and 16). Intuitively, at each iteration, for all the transition probabilities that are unknown, the "edge" (effect) leading to the next state with the highest value (based on the current estimate of the value function) is given all the probability mass not swallowed up by the known transitions. This change is designed to force Value Iteration to be "Pangloss"; that is, it considers the most optimistic of all models consistent with what has been learned and the known background information.

This "shifting of probabilities" during Value Iteration is a technique used in the model-based RL algorithm MBIE (Strehl & Littman, 2005). Like MBIE, Known-Edge Value Iteration converges to an optimistic value function, which facilitates efficient exploration. That is, $\hat{V}(s) \ge V(s) - \epsilon$ for all states s, where $\hat{V}$ is the calculated value function and V is the environment's true value function. To see why this statement is true, consider a state s where the probabilities have been shifted. In this case, there is some subset of truly reachable states S′ such that $V(s) = \max_a R(s,a) + \gamma \sum_{s' \in S'} T(s,a,s')V(s')$. The calculated value function $\hat{V}$ performs the summation over the same set of reachable states, but one state $\bar{s} \in S'$ has a higher weight (larger T(s,a,s′)). However, by definition, $\bar{s}$ has the highest V(s′) of any of the candidates. So, for all states, $\hat{V}$ is optimistic.

In practice, the behavior resulting from this algorithm can lead to better decisions than the Rmax heuristic, which ignores the background information (the known effects). For example, in the aforementioned blocks world example, where an Rmax-style planner attempted to learn a clearly useless putdowntable action, the Known-Edge variant will eschew this choice because the best outcome possible (in the stacking case it is the "nothing happens" outcome) will end up with probability 1.0 but will have a value worse than the putdown action, so it will never be explored.

Algorithm 10 Known-Edge Value Iteration
1: ∀s, V(s) = 0
2: δ = ∞
3: while δ > ǫ do
4:   δ = 0
5:   for s ∈ S do
6:     oldV = V(s)
7:     for action a ∈ A do
8:       knownProb = 0.0
9:       ∀e ∈ E(s,a), T(s, a, η(s, ω_e)) = 0.0 //Get probabilities for known edges
10:      for equivalence class e ∈ E(s,a) where Π̂(e) ≠ ⊥ do
11:        T(s, a, η(s, ω_e)) = Π̂(e)
12:        knownProb = knownProb + Π̂(e)
13:      end for
14:      //Shift the unknown probability mass to an optimistic (but possible by Ω^a) outcome.
15:      Let e_opt = e ∈ E(s,a) s.t. Π̂(e) = ⊥ and V(η(s, ω_e)) ≥ V(η(s, ω_e′)) for all e′ with Π̂(e′) = ⊥
16:      T(s, a, η(s, ω_{e_opt})) += 1.0 − knownProb
17:    end for
18:    V(s) = max_a R(s,a) + γ Σ_{ω∈Ω^a} T(s, a, η(s,ω)) V(η(s,ω))
19:    π(s) = argmax_a R(s,a) + γ Σ_{ω∈Ω^a} T(s, a, η(s,ω)) V(η(s,ω))
20:    if ||oldV − V(s)|| > δ then
21:      δ = ||oldV − V(s)||
22:    end if
23:  end for
24: end while
25: Return π

We note that this algorithm still enumerates the full state space, which may be necessary for exact planning, and therefore has computational bounds that are exponential in the size of the action-schema representations (particularly exponential in O). While the exponential run time does not affect the sample complexity of the associated learning algorithm below, it can, in practice, introduce intractable computation just like vanilla Value Iteration.
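The heart of Algorithm 10 is the probability-shifting step on Lines 8-16. A sketch of just that step, under the illustrative interfaces used earlier (a next_state helper for η and a model whose class predictions may be None, i.e., ⊥):

```python
def optimistic_transitions(state, action, classes, model, V):
    """Return {next_state: prob}: known class probabilities as reported, with
    all leftover mass shifted to the unknown class whose successor looks best."""
    T, known_mass, unknown = {}, 0.0, []
    for cls in classes:                    # one candidate successor per class
        s_next = next_state(state, cls)    # eta(s, omega_e); hypothetical helper
        p = model.class_probability(state, action, cls)   # None when KWIK-LR says ⊥
        if p is None:
            unknown.append(s_next)
        else:
            T[s_next] = T.get(s_next, 0.0) + p
            known_mass += p
    if unknown:                            # Lines 14-16: shift the unknown mass
        best = max(unknown, key=lambda s2: V[s2])
        T[best] = T.get(best, 0.0) + (1.0 - known_mass)
    return T
```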
The benefit of Known-Edge VI is instead that it encourages more efficient exploration by the learning agent. In Chapter 6, we will introduce some alternative planning algorithms whose runtime does not scale exponentially in the state-space size.

3.5.5 An Efficient Online Agent for D-Learning

We now make use of KWIK-LR and Known-Edge Value Iteration in learning effect probabilities online when an agent is given the full action-operator specifications, except for the probabilities (the online D-Learning problem). In the parlance of the planning community, we are turning a non-deterministic action specification (Kuter et al., 2008) into a stochastic one, with the probability values acquired through learning. However, we add the additional constraint that the learning agent in this setting should be PAC-MDP. Algorithm 11 presents a PAC-MDP algorithm for this exact situation in the pre-conditional case.

Algorithm 11 Schema Transition Probability Learner
1: Input: S, A (action schemas, sans p_i's), R, α
2: ∀a ∈ A, instantiate a KWIK-LR learner (Algorithm 9) LR^a(α)
3: for each current state s_t do
4:   for each e ∈ E(s,a), s ∈ S and a ∈ A do
5:     Construct x where x_j = 1 if j ∈ e, else 0.
6:     Π̂(e) = prediction of LR^a(x)  // can be ⊥
7:   end for
8:   Perform Known-Edge Value Iteration on {S, A, E, Π̂, R} to get greedy policy π
9:   Perform action a_t = π(s_t), observe e_t ∈ E(s_t, a_t)
10:  for equivalence classes e ∈ E(s_t, a_t) do
11:    Construct x where x_j = 1 if j ∈ e, else 0.
12:    Update LR^{a_t} with x and y = 1 if e = e_t, else y = 0
13:  end for
14: end for

The learning portion of the algorithm uses the same basic structure as the generic LR probability learner (Algorithm 8), but with KWIK-LR as the linear regression component, allowing it to identify states that imbue equivalence classes with unknown transition probabilities. It then communicates this awareness of the uncertainty to the Known-Edge Value Iteration algorithm, which encourages exploration via optimism in the face of uncertainty without trying actions in states where knowing their true transition probabilities is not beneficial to the agent. This calculation yields an optimistic action a_t, which will eventually result in experience indicating that one of the equivalence classes e_t ∈ E(s_t, a_t) actually occurred. After that, the appropriate KWIK-LR learner is updated with the experience and the process repeats. Again, planning in the flat state space can require exponential computation time, but learning in this algorithm is done in a sample-efficient manner, so we can formally claim that it is PAC-MDP, as shown in the following theorem.

Theorem 4. In the online setting, Algorithm 11 is PAC-MDP, given the true conditions and effects for the action schemas describing the environment's dynamics.

Proof. (sketch) The proof combines previous results from Theorem 1, Lemma 2, and Theorem 3. Specifically, Theorem 3 establishes the KWIK-learnability of the probability distributions using KWIK-LR, with help from Lemma 2. Together, these results establish a KWIK bound of B(ǫ, δ) = O(max{Ω³/ǫ⁴, (Ω² log(Ω/δ))/ǫ⁴}) per action (with Ω = Ω^a). Using this KWIK bound on learning each probability distribution for the |A| actions (and possibly C conditions for each action), we end up with a total of O(CA·B(ǫ, δ/(CA))) ⊥ predictions throughout the learning process.
Theorem 1 establishes that such a KWIK learner using the KWIK-Rmax architecture will be PAC-MDP. The only change we have made there is the introduction of Known-Edge Value Iteration (KEVI). To see why this modification does not change the properties of KWIK-Rmax, first note that KEVI reduces to standard Value Iteration when all the parameters are known. In the case where there are unknown transitions, the KEVI "shifting" of probabilities is the same as that used in the PAC-MDP algorithm MBIE (see Strehl & Littman (2005), which also details the correctness of this shifting technique). The value function induced is optimistic, as mentioned earlier, so it satisfies the portion of the KWIK-Rmax proof (optimism) responsible for efficient exploration while leveraging the known effect structure. So, given the conditions and effects of the correct action schemas, Algorithm 11 will be PAC-MDP.

We now present empirical evidence of the efficiency of Algorithm 11 in pre-conditional Stochastic STRIPS. We use the Stochastic STRIPS Paint-Polish world from Table 3.3 as our testbed. This specific instantiation has only one object, but with random start states constructed from varying combinations of the Painted, Polished, and Scratched predicates. We empirically test Algorithm 11 against Partition. Figure 3.2 shows the results for the Paint/Polish domain averaged over 1000 runs with randomized initial states for each episode (both learners receive the same initial states). We see that Algorithm 11 learns much faster because it is able to share information between equivalence partitions, though both algorithms eventually converge to the same (optimal) policy.

[Figure 3.2: D-Learning in Paint-Polish world with 1 object and random start states, averaged over 1000 runs. The plot compares the "Partition" learner and KWIK-LR on average steps to goal over 100 episodes.]

3.6 Learning Pre-conditions and Conditions

With our success above in learning effect distributions, we now turn our attention to the CD-Learning problem, where the possible effects Ω are given but the distributions over those effects, and the conditions that govern which distribution is active, need to be learned. This formulation covers the likely real-world scenario where the effects of an agent's actions (walking, grasping, etc.) are well known, but exactly how those outcomes are affected by environmental conditions is unknown. We consider CD-Learning first in the conjunctive pre-conditional case and then in the conjunctive conditional case, both with STRIPS operators, though the results of this section extend to any language covered by our action-schema formalism that has KWIK-learnable conditions. We begin by showing that large conjunctions are not KWIK-learnable.

3.6.1 The Difficulty of Learning Conjunctive Pre-conditions

As covered earlier, STRIPS operators are composed of a pre-condition in the form of a conjunction over a subset of the domain fluents (literals) F, and similar effects (Ω) consisting of Add and Delete lists. The variable arguments to these predicates must match variable parameters of the action itself. Thus, when actions have their arity bounded by a constant m and the P predicates have their arity bounded by a constant n, there are Pm^n literals to consider in each of the three operator lists (Pre-conditions, Add, and Delete). Note that if negated literals are considered in the pre-conditions, the same encoding suffices with only a doubling of the possible predicates. While this limited number of literals leads to tractable algorithms for learning other portions of the action schemas (as we will see later), we show that this is not generally true for the pre-conditions.
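As a quick sanity check on this count, the following hypothetical helper simply mirrors the Pm^n bound from the text; the function and the example numbers are ours.

def num_literals(P, m, n):
    # P predicates of max-arity n, with arguments drawn from the m action variables
    return P * m ** n

print(num_literals(P=5, m=3, n=2))   # 5 * 3**2 = 45 candidate literals per operator list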
The crucial problem is the limited feedback provided by the environment when the pre-conditions of the action do not hold. As stated in Definition 10, if an agent attempts to execute an action whose pre-conditions do not hold in the current state, the agent receives a "failure" signal and the state does not change. But notice that this failure signal does not stipulate what literal or combination of literals is missing from the current state. These negative examples are thus highly uninformative. For instance, trying to pick up a block that is under another block will fail. But, from the learner's perspective, without positive examples it cannot tell whether the failure was because the block was not Clear, or because it was not a Table, or some other combination of reasons. In contrast, positive examples, where actions do succeed, are highly informative, because they identify a superset of the literals that are part of the pre-condition. Here, we give a negative result showing that, even in the deterministic case, an exponential (in the number of predicates P, and more generally in the number of literals) KWIK bound can be reached when learning pre-conditions, because of overwhelming, uninformative, and unexpected negative examples. The example domain in the proof is a "combination lock" of D tumblers.

Theorem 5. The CD-Learning problem is not efficiently KWIK-solvable in the pre-conditional (or conditional) case when conditions may be conjunctions of arbitrary size, even in the deterministic setting. In fact, a lower bound of Ω(2^L) ⊥ predictions for environments with L literals can be established.

Proof. We construct a "combination lock" environment with a single object, lock, and 2D + 1 predicates: Unlocked(X), Zero_i(X), and One_i(X), for i ∈ [1,D] (note, D = (P−1)/2 in this environment with P predicates). There are A = 2D + 1 actions, one to set each predicate and falsify the other for pair i, and one to "open" the lock. More formally, the actions are of the form a0_i, a1_i, and open, again with i ∈ [1,D]. The schemas for these actions are:

a0_i(X):  PRE: ∅  ADD: Zero_i(X)  DEL: One_i(X)
a1_i(X):  PRE: ∅  ADD: One_i(X)  DEL: Zero_i(X)
open(X):  PRE: ∀i ∈ [1,D] One_i(X)  ADD: Unlocked(X)  DEL: ∅

The goal is to reach a state where the grounded fluent Unlocked(lock) is true. As a result, an agent that is started in an initial state, say with all the Zero_i fluents true, can fail (predicting ⊥ but not opening the lock) in 2^D − 1 = 2^{(P−1)/2} − 1 attempts to successfully invoke open (one for every possible state other than the one corresponding to the true combination). As a concrete example, starting from the all-Zero state, a KWIK algorithm must predict ⊥ for open's outcome. Notice that if it makes a non-⊥ prediction, then it could potentially be wrong (if the current state really was the correct combination) and therefore would fail the KWIK criteria, because in this discrete deterministic setting any incorrect transition prediction leads to a potentially greater-than-ǫ error in predicting the next state. The same holds in every successive setting of the lock (000...01, 000...10, ...) until all but one combination has been tried, because at every step, the algorithm would risk a failure under the KWIK conditions if it claimed the lock would open when it would not. So, any succession of 2^D − 1 unique combination settings where the lock does not open achieves the lower bound. Each failed plan is met only with a fail message, refuting only one hypothesis.
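The counting argument in the proof can be replayed in a few lines of Python. The bit-tuple encoding of tumblers is our own; the point is only that each bare failure signal refutes a single candidate combination, so 2^D − 1 failures are needed in the worst case.

from itertools import product

D = 4
secret = (1,) * D                              # open() requires One_i for all i
hypotheses = set(product([0, 1], repeat=D))    # all still-possible combinations

attempts = 0
while len(hypotheses) > 1:
    guess = next(iter(hypotheses - {secret}))  # adversarially, try a wrong setting
    hypotheses.discard(guess)                  # the failure refutes only this one
    attempts += 1
print(attempts)                                # 2**D - 1 = 15 failed attempts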
This result demonstrates the intractability of KWIK-learning arbitrary-length conjunctions, a known detriment of the KWIK protocol. The result is easily extendable to the conditional case if no restrictive assumptions (like having only 2 conditions) are made on the number and size of the conditions. As a result, we are faced with two choices: either limit the size of the conditions being learned, or change the way the agent and the environment interact so as to avoid the extremely uninformative negative examples that felled the agent above (all of those "unlock" failures). For the rest of this chapter, we will follow the first of these alternatives, assuming that the pre-conditions are defined by conjunctions of constant size (O(1)).[5] However, in Chapter 4 we will consider a different learning paradigm (Apprenticeship Learning) where this size restriction can be relaxed (and the combination lock domain can be efficiently learned).

[5] In the conditional case we will consider this assumption as well as an analogous assumption on the number of conditions.

3.7 Learning Small Pre-conditions

In the case where the conjunctions are of size no greater than a constant k = O(1), we can limit the number of unexpected negative examples the agent will encounter, leading to an efficient KWIK algorithm for learning this restricted class of pre-conditions. The "trick" here is that the hypothesis space for the pre-conditions is only exponential in k, so we can systematically prune the hypotheses one by one and make only a polynomial number of ⊥ predictions. For instance, if the combination lock above had only 3 tumblers, there are only D³ combinations, where D is the number of digits on each tumbler. When the hypothesis space is enumerable in this way and the labels are deterministic (z_t = y_t with no noise, which is the case with pre-conditions in either the deterministic or stochastic schemas defined earlier), the generic enumeration algorithm (Algorithm 12) can be used to KWIK-learn any hypothesis class H with a bound of |H| − 1 (Li et al., 2008).

Algorithm 12 KWIK-Enumeration
1: Given: Hypothesis class H
2: Ĥ = {h ∈ H}
3: for each input x_t do
4:   Y = {h(x_t)} for all h ∈ Ĥ
5:   if |Y| = 1 then
6:     Predict y ∈ Y
7:   else
8:     Predict ⊥
9:     Observe y_t
10:    Remove all incorrect hypotheses from Ĥ
11:  end if
12: end for

Returning to the case of learning conjunctions as pre-conditions for actions, we can rewrite the enumeration algorithm in the following way to learn k-term pre-conditions.

Algorithm 13 STRIPS-Enumeration
1: Given: Action a of arity m, predicate set P (max-arity n)
2: Output: For state/action pairs, predict one of {success, failure, ⊥}
3: Pre^a = enumeration algorithm (Algorithm 12) over all of the O((Pm^n)^k) potential conjunctions for action a.
4: for each input state s_t and ground action a_t do
5:   Predict Pre^{a_t}(s_t), observe s′ or failure
6:   Update Pre^{a_t} with the observation.
7: end for

Intuitively, the algorithm predicts that an action will fail in a state only if every currently valid hypothesis says it will, and the same unanimity is required for predicting that the action will succeed. Otherwise, ⊥ is predicted.
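A minimal Python sketch of the enumeration learner follows, with hypotheses represented as callables and None standing in for ⊥; both conventions are ours.

class KWIKEnumeration:
    """Sketch of Algorithm 12 for a finite, deterministic hypothesis class;
    makes at most |H| - 1 None (⊥) predictions."""
    def __init__(self, hypotheses):
        self.H = list(hypotheses)        # each h is a callable: input -> label

    def predict(self, x):
        labels = {h(x) for h in self.H}
        return labels.pop() if len(labels) == 1 else None   # unanimity or ⊥

    def observe(self, x, y):
        self.H = [h for h in self.H if h(x) == y]            # prune wrong hypotheses

In the STRIPS-Enumeration instantiation, each hypothesis is a candidate conjunction of up to k literals, predicting success exactly when the conjunction holds in the state (represented here as a set of ground literals):

from itertools import combinations

literals = ["Clear(X)", "On(X,Y)", "Block(X)"]               # illustrative only
conjunctions = [frozenset(c) for k in range(3) for c in combinations(literals, k)]
learner = KWIKEnumeration([lambda s, c=c: c <= s for c in conjunctions])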
In concert with an optimistic interpretation (covered below), the enumeration algorithm will drive the agent to explore states where it does not know whether the pre-conditions of an action hold (trying various combinations on the lock), but never retrying disproved conditions, and never exploring to try a new condition if one that is known to work is already available (if the combination lock is already set to the correct combination and the agent knows this, it will not try others). The KWIK bound for this STRIPS-Enumeration algorithm is O(A(Pm^n)^k), the size of the hypothesis space.

So, we now have a component for learning pre-conditions that are of constant size. Before combining it with our solution to the D-Learning problem, we must consider how to interpret such a model to facilitate efficient exploration (using the KWIK-Rmax architecture or instead a more clever algorithm like Known-Edge VI). There are several practical considerations for this design choice, as discussed in the next section.

3.7.1 Optimistic Schemas for CD-Learning

In our solution to the D-Learning problem, we introduced a heuristic optimistic interpretation of the KWIK learners for the effect distributions that leveraged the inherent background knowledge (the known effects) to improve the best-case behavior of the learner, specifically moving the probability mass of the unknown effects to the best possible outcome. One could employ a similar trick naïvely in the CD-Learning case, but we now argue that doing so is not advisable, though a smarter combination of this heuristic with KWIK-Rmax-style optimism does result in a more usable interpretation than using only KWIK-Rmax.

The trouble with employing just Known-Edge Value Iteration in the CD-Learning problem is that every ⊥ that originates from the pre-condition learner (STRIPS-Enumeration) needs to be interpreted as "this action will succeed here" in order to encourage the agent to actively explore the pre-condition hypothesis space. As a result, a planner that uses this optimistic interpretation will end up considering the application of effects and action parameters that are nonsensical and lead to otherwise unreachable states. For instance, in noisy blocks world, while an agent is learning the pre-conditions, the background knowledge about the effects might make it think it could pick up the table or put a block on top of itself. This assumption leads to an extremely large number of states for the planner to consider and is usually infeasible.

To combat this explosion without sacrificing the heuristic speed-up we saw in D-Learning with Known-Edge VI, we will use a combination of the Known-Edge and Rmax techniques, as sketched below. Specifically, when the planner needs an optimistic interpretation of the model, we will introduce Rmax state transitions anywhere a ⊥ comes from a learner that is uncertain whether the pre-conditions in a state are sufficient for an action to occur, and will use Known-Edge VI anytime the uncertainty stems just from the effect distributions being learned. This optimistic interpretation will keep the planning space relatively small in most practical domains and again does not sacrifice the worst-case bounds of the naïve Rmax interpretation used in general KWIK-Rmax. This approach is used in the algorithm below for the pre-conditional case, where the pre-condition and distribution learning are easily separable.
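The mixed interpretation can be summarized in a short sketch. All interfaces here (the predict methods, the "failure" token, the Rmax absorbing state) are hypothetical stand-ins rather than this chapter's actual implementation.

def optimistic_model(pre_learner, dist_learner, s, a, RMAX_STATE):
    """Route each kind of uncertainty to the right optimistic treatment:
    pre-condition ⊥ becomes an Rmax transition, effect-distribution ⊥
    is left for Known-Edge VI to shift."""
    pre = pre_learner.predict(s, a)
    if pre is None:                      # ⊥ from the pre-condition learner
        return {RMAX_STATE: 1.0}         # optimistic "this action succeeds" jump
    if pre == "failure":
        return {s: 1.0}                  # known failure: state unchanged
    return dist_learner.predict(s, a)    # may still contain ⊥ entries for KEVI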
In the conditional case, these signals will be more intertwined, and so, depending on the implementation of the algorithms, standard KWIK-Rmax may be the best solution.

3.7.2 Efficient Pre-conditional CD-Learning

We can now make use of the modularity of KWIK-learning algorithms to combine KWIK-LR (for learning distributions) and STRIPS-Enumeration (for bounded-length conjunctive pre-conditions) to make an efficient online algorithm for solving the CD-Learning problem for STRIPS, and more generally for any action-schema language where the pre-conditions are KWIK-learnable. The online RL algorithm for pre-conditional CD-Learning is presented in Algorithm 14, using the mixed optimistic interpretations described above.

Algorithm 14 Pre-conditional CD-OnlineLearn
1: Given: Action set A of max arity m, predicate set P of max-arity n, effect set Ω^a for each a ∈ A
2: Initialize a copy of Algorithm 13 (Pre^a) for each a ∈ A (more generally, a KWIK learner for the pre-condition hypothesis class)
3: Initialize a copy of Algorithm 9 (LR^a(Ω^a)) for each a ∈ A
4: for each episode do
5:   for s_t = current state do
6:     If any of the KWIK learners have changed, construct a model interpreting any Pre^a ⊥ as an Rmax transition, with LR^a used to determine transition probabilities
7:     Perform Known-Edge Value Iteration (Algorithm 10) on this model
8:     Execute the greedy action a_t
9:     Observe {success/failure}, s_{t+1}
10:    Update Pre^{a_t} (success/failure)
11:    Update LR^{a_t}(s_t, a_t, s_{t+1})
12:  end for
13: end for

Intuitively, Algorithm 14 maintains the consistent version space over pre-conditions using the enumerated possibilities in Algorithm 13, and uses Algorithm 9 to maintain the probabilities of each of the given effects for each action. Just as with Algorithm 11, Known-Edge Value Iteration is employed in this algorithm, but in an effort to constrain the state space considered by the planner, Rmax transitions are used for uncertain pre-condition/state pairings. This construction is again done on a state-by-state basis, so a plan may be returned that contains an action to be executed where its pre-conditions actually fail. As we show in the next theorem, this sort of exploration helps the learner find a consistent model by testing conflicting hypotheses, ultimately achieving a polynomial sample complexity bound on the CD-Learning problem.

Theorem 6. Algorithm 14 is PAC-MDP, as the model learners KWIK-solve the CD-Learning problem in the stochastic (or deterministic) pre-conditional case when each action's pre-condition is a conjunction of no more than k terms for constant k, or some other KWIK-learnable hypothesis class.

Proof. (sketch) The proof follows from the KWIK bounds for the component algorithms and the KWIK-Rmax Theorem. Specifically, there are a bounded number of ⊥ predictions that can be made by each of the components that will lead to suboptimal behavior. Any of the B_LR(Ω, ǫ, δ/A) (Equation 3.1) such ⊥ predictions by the KWIK-LR component that lead to suboptimal behavior will result in experience for that specific action in a state/equivalence class where the probability distributions are currently unknown, just as in the D-Learning case. The other O(A(Pm^n)^k) possible ⊥ predictions that result in suboptimal behavior (from Pre^a) will each cause at least one hypothesis (one that incorrectly predicted an action could be executed from a given state) to be eliminated.
As empirical evidence of the success of this algorithm, we present a comparison of Algorithm 14 against a flat MDP learner and Algorithm 11 for D-Learning, which is given the pre-conditions of actions. All three algorithms were run on stochastic blocks world with 3 blocks and the "dummy" actions (versions of pickup and putdown with the probabilities of the "nothing happens" outcome reversed from the two standard pickup/putdown actions) included. The number of blocks is kept low to accommodate the flat MDP learner and because we are still using exact planning algorithms (a situation dealt with in Section 6.3). Figure 3.3 displays the results. The KWIK CD-Learner takes around 20 episodes (with a maximum of 15 steps per episode and a goal of stacking the 3 blocks) to learn the pre-conditions, but still greatly outperforms its flat MDP counterpart (which actually takes several hundred episodes to perform consistently well) because it is able to generalize its knowledge about moving blocks from one situation to another, and it reaches the same optimal behavior as the D-Learner. Note that the D-Learner learns much faster because many fewer examples are needed to just learn the effect distributions than to find the positive examples necessary for pre-condition learning (a combination-lock-like problem, even with the bounded-size pre-conditions).

We have shown that under assumptions on the size of conjunctive pre-conditions, we can efficiently solve the CD-Learning problem. In the next section, we consider the conditional case for stochastic operators.

[Figure 3.3: Learning pre-conditions and distributions in stochastic blocks world.]

3.8 Learning Conditions

We now turn our attention to the conditional case, where the effects (or distributions on effects) are determined by which of the |C| conditions c_1...c_|C| actually holds in a given state (denoted as c_i[s] = true). Recall that Definition 11 stipulates that these conditions may not overlap and must cover the entire reachable state space, so for each state, one and only one condition will "match" the state.

The difficulty of learning conditions without any assumptions on the size or number of the conditions follows from Theorem 5 for the pre-conditional case, with the following construction. Notice that it requires D + 1 conditions because there is no "else" clause in this construction.

Corollary 1. Without restrictions on the conditions, the CD-Learning problem for conditional action schemas where conditions are conjunctions over the domain literals is not efficiently KWIK-solvable. In fact, again a lower bound that is exponential in the number of domain literals can be achieved.

Proof. We can change the combination-lock example from Theorem 5 into a conditional action schema by creating one condition for the combination that opens the lock (say One(X_1) ∧ One(X_2) ∧ One(X_3)) and D more conditions of size 1 that cover the possibility that tumbler i is set wrong. These extra D conditions are needed to represent the "else" behavior inherent to the pre-conditional case that is not explicitly representable in the conditional case. Notice that this problem is no more KWIK-solvable than the pre-conditional formulation—an agent is still required to try all 2^D combinations even though the domain encoding is linear in D.
Despite this result, we can pursue a strategy similar to our approach in the pre-conditional case; we will work under assumptions about the size, and later the number, of the conditions. We begin by considering the cases (deterministic and then stochastic) where the sizes of the individual conditions are bounded by a constant k = O(1). After that, we will also give positive (though less practical) results under the assumption that the conditions themselves can have non-constant size, but the number of conditions is restricted to be O(1). Notice that such restrictions do not allow for the construction of the combination lock, where D conditions are required, one of which is of size D.

Beyond the hardness discussed above, the conditional case presents difficulties for learning algorithms beyond those seen in the pre-conditional setting. Specifically, there is no longer a unique failure signal indicating whether or not a specific condition holds. For instance, in a deterministic version of the metal/wooden Paint-Polish world (where only metal objects can be painted without scratching), when the paint action is executed on an already scratched object, we receive no information about which condition actually "matched". Did the object become painted because it was wooden (or metal?), or because of some other condition (like being scratched already)? Note that we cannot just refine a single effect's associated condition in this case, because both the Painted and Painted ∧ Scratched effects explain the observation. The problem becomes even more pronounced in the stochastic case, where not only can multiple effects seem to have happened, but any one observation can appear to be from many different effect distributions. The following sections and algorithms deal with these problems in turn, starting with the deterministic case.

3.8.1 Learning Small Conditions in the Deterministic Case

In this section, we will work under the assumption that each condition will (like the pre-conditional results) have to be over only a small number of terms so that we can do a type of enumeration. For now, we will consider the deterministic variant of CD-Learning (where the effect distributions do not need to be learned) in order to highlight the challenges and solutions used in learning the conditions themselves. An example is a deterministic version of the metal/wooden Paint-Polish domain where painting a wooden object causes it to become painted and scratched, while painting a metal object simply paints it. In this case, the agent must experiment to determine what conditions are causing the different effects. But, unlike the pre-conditional case, we cannot completely decouple the condition-learning piece (Enumeration) from the effects themselves, because there is no unique failure signal. Instead, the effects themselves are the only clues as to which hypothesis needs to be updated, but unfortunately these can be ambiguous (as in the painting of a scratched object).

In light of this symbiotic relationship, we will use a slightly more complicated (but still Enumeration-like) KWIK algorithm called Active Union, which is applicable in deterministic settings where enumerable conditions govern which other learners should be making predictions. More specifically, the algorithm will enumerate all the possible conditions (conjunctions of up to size k), many of which may match a current state, and merge their predictions if possible (otherwise predicting ⊥).
Active Union is a generalization of the enumeration algorithm, where conditions determine which other learners should be consulted to make a prediction. The original Union algorithm was first introduced by Li et al. (2008), but was used to combine predictions from all the component learners, not just a set of "active" components as determined by the conditions. The original algorithm is inapplicable in our setting, as we do not want a component tied to the condition Metal(X) to make a prediction for a wooden object. The Active Union algorithm is described in Algorithm 15.

Algorithm 15 Active Union
1: Given: KWIK learners L = {KL_1...KL_n} and general Boolean conditions (like small conjunctions) C = {c_1...c_n}
2: for each input x_t do
3:   KL̂ = {KL_i | c_i(x_t) = true} for all KL_i ∈ L and c_i ∈ C
4:   Y = {KL(x_t)} for all KL ∈ KL̂
5:   if |Y| = 1 then
6:     Predict y ∈ Y  //could still be ⊥.
7:   else
8:     Predict ⊥
9:   end if
10:  if ⊥ was predicted then
11:    Observe y_t
12:    for each KL_i ∈ KL̂ do
13:      if KL_i(x_t) ≠ ⊥ and KL_i(x_t) ≠ y_t then
14:        L = L − KL_i
15:        C = C − c_i
16:      else if KL_i(x_t) = ⊥ then
17:        Update KL_i with x_t and y_t
18:      end if
19:    end for
20:  end if
21: end for

Intuitively, Active Union is initialized with a set of component KWIK learners, each of which has a corresponding Boolean formula (a condition). For every input x_t, those learners whose conditions are true for x_t become "active" learners and each makes a prediction. If these predictions are all the same, that prediction is made (note that if all the learners predict ⊥, then that prediction is made). If the learners disagree, ⊥ is predicted. Either way, if ⊥ is predicted by the overall algorithm, a deterministic label is received (we will cover a stochastic version in the next section). This label is used to eliminate component learners (and their corresponding conditions) that are predicting incorrect labels (and no longer admitting ⊥), and is also used to update learners that still declare uncertainty. We will use this algorithm for the deterministic conditional CD-Learning problem and later in the deterministic case where the effects also need to be learned, as well as a variant introduced later for the stochastic case. For this simple version above, we have the following KWIK result:

Lemma 3. The Active Union algorithm (Algorithm 15) over n components has a KWIK bound of (n − 1) + Σ_{i=1}^{n} B_i(ǫ, δ/n) in the case where the hypothesis class is realizable by the c_i partitioning, where ǫ and δ are the desired accuracy and confidence parameters, n is the number of component learners, and B_i is the KWIK bound of component learner KL_i.

Proof. (sketch) The proof follows along the same lines as that of the original Union algorithm (Li, 2009), which we sketch here. Every time the algorithm makes a ⊥ prediction, either one of the active component learners has made a ⊥ prediction, or all the components made non-⊥ predictions but they were not identical. In the latter case, at least one component learner will be eliminated based on its wrong prediction. This learner cannot be applicable under other conditions, because each c_i is static (though a whole c_i can be eliminated) and the true c_i's are not allowed to overlap unless they make the same prediction for a given x_t (hence the "realizable" caveat in the lemma). Thus, only n − 1 ⊥ predictions can come from such cases. The scenario in which some component learner is predicting ⊥ is covered by the summation over individual bounds, and a union bound is used to ensure the correct probability of failure δ, hence the stricter δ/n failure probability for the individual components.
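A compact Python rendering of Active Union for the deterministic setting follows, with (condition, learner) pairs and None for ⊥; these conventions are ours.

class ActiveUnion:
    """Sketch of Algorithm 15: only components whose condition matches
    the input are consulted; confidently wrong components are eliminated."""
    def __init__(self, components):
        self.components = list(components)      # (condition, learner) pairs

    def predict(self, x):
        active = [kl for c, kl in self.components if c(x)]
        preds = {kl.predict(x) for kl in active}
        return preds.pop() if len(preds) == 1 else None   # unanimity or ⊥

    def observe(self, x, y):
        # Called after an overall ⊥ (Lines 12-19): prune wrong components,
        # train the still-uncertain ones.
        survivors = []
        for c, kl in self.components:
            if not c(x):
                survivors.append((c, kl))        # inactive: untouched
                continue
            p = kl.predict(x)
            if p is None:
                kl.observe(x, y)                 # still uncertain: update it
                survivors.append((c, kl))
            elif p == y:
                survivors.append((c, kl))        # correct: survives
            # confidently wrong: component (and its condition) dropped
        self.components = survivors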
We note that the original Union algorithm allowed the outer learner to make a non-⊥ prediction if all of the y ∈ Y were within ǫ of one another, in which case it predicted the median value of this set. A similar modification is easily made to Active Union with only a slight change in the worst-case bounds (B(ǫ/2, δ/n) replaces the current B), but we do not make use of such functionality in this thesis (we use a different algorithm below in the stochastic case, where such continuous-value accuracy is apropos).

In the deterministic CD-Learning setting, the effects are given, so the individual KWIK learners whose hypotheses are being combined in Active Union are replaced simply by one static "learner" for each effect that always predicts that effect. We make one copy of each of these learners for each of the O((Pm^n)^k) possible conditions and link each copy to its condition, thus enumerating the cross product of the small conditions and known effects (a brief code sketch of these static predictors appears after the proof below). More formally, a general solution to the conditional deterministic CD-Learning problem that uses the Active Union algorithm as its backbone is presented in Algorithm 16.

Algorithm 16 Conditional Deterministic CD-OnlineLearn
1: Given: Action set A of max arity m, predicate set P of max-arity n, effect set Ω^a for each a ∈ A
2: Construct static predictors KL^a_ω, one for each possible effect of each action.
3: ∀a ∈ A, construct a condition learner CL^a by initializing a copy of Algorithm 15 with C containing |Ω^a| copies of each possible conjunction of size k = O(1) or less, and attach each copy to a different KL^a_ω.
4: for each episode do
5:   for s_t = current state do
6:     If any of the KWIK learners have changed, construct a model interpreting any CL^a ⊥ as an Rmax transition
7:     Perform Known-Edge Value Iteration or another ǫ-accurate deterministic planning algorithm.
8:     Execute the greedy action a_t, observe s_{t+1}
9:     Update CL^{a_t}(s_t, s_{t+1})
10:  end for
11: end for

Intuitively, the algorithm considers every possible condition/effect pairing for each action, and the optimistic interpretation of the KWIK Active Union predictions (again using a KWIK-Rmax interpretation to keep the considered state space close to the actual reachable state space's size) drives the agent towards portions of the state space where different conditions can be experimented with to determine how they correlate to the known effects. This algorithm has the following property for domains described by conditional deterministic action schemas where the true conditions do not overlap in any reachable state and the conditions themselves are conjunctions of at most k = O(1) terms.

Theorem 7. Algorithm 16 is PAC-MDP for domains described by deterministic conditional action schemas where the conditions are conjunctions of size O(1) or some other polynomially-enumerable hypothesis class.

Proof. (sketch) The proof follows simply from the KWIK bounds in Lemma 3 and Theorem 1, which connects KWIK learning of a model to PAC-MDP behavior. Each sub-optimal step contributes towards eliminating some disagreement between the still-possible conditions, which results in the elimination of one of these conditions for an action attached to a single effect.
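Returning to the static predictors of Line 2 of Algorithm 16, the following sketch shows how trivial they are when states are represented as sets of literals; the class and the pairing construction are illustrative, not the original design.

class StaticEffectPredictor:
    """A 'learner' with nothing to learn: always predicts the next state
    produced by applying one fixed effect (Add/Delete lists) to s."""
    def __init__(self, add, delete):
        self.add, self.delete = frozenset(add), frozenset(delete)

    def predict(self, s):
        return frozenset((s - self.delete) | self.add)

    def observe(self, s, s_next):
        pass   # elimination of wrong pairings happens inside ActiveUnion

# One (condition, predictor) pair per condition/effect combination, e.g.:
# components = [(cond, StaticEffectPredictor(add, dele))
#               for cond in all_k_conjunctions for (add, dele) in effects]
# learner = ActiveUnion(components)   # reusing the sketch above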
3.8.2 Learning Small Conditions in the Stochastic Case

We now present a KWIK algorithm that learns both the conditions and the effect distributions in the stochastic conditional CD-Learning problem, assuming that each condition is a conjunction of no more than k = O(1) literals. As with the earlier stochastic pre-conditional CD-Learning case, the problem is decomposed into learning the conditions and the effect distributions, but as in the deterministic conditional case above, we no longer have explicit feedback as to which condition was actually active in the last step.

The stochastic case introduces a second obstruction to learning conditions. Not only do ambiguous effects not tell us which condition (if any) needs to be updated, but now the feedback we do see is stochastic. As an example, consider the stochastic version of the Metal/Wooden Paint-Polish world where the object's surface composition (metal or wooden) determines the distribution over possible effects, with metal objects having a lower (but still non-zero) probability of becoming scratched and painted than their wooden counterparts. In the earlier deterministic case, if an object was unscratched to begin with, we could tell exactly which condition/effect pairs needed to be eliminated (based on whether the object came back scratched or not), but in this case, a single piece of experience is not informative enough to eliminate one of these pairs—if the possible effect distributions involved scratching with probability 0.7 for wooden objects and probability 0.3 for metal objects, it is still possible to see a metal object get scratched, and it is unclear what pairing of distributions (the 0.7 and the 0.3 probabilities would each be predictions of different component learners) should be eliminated. Furthermore, in the CD-Learning problem these distributions are not given, and instead need to be learned online, so the architecture from the deterministic case needs to be changed. The changes are made below by replacing Active Union with an algorithm called the Active k-Meteorologist Solver (AKMS), which is essentially a version of Active Union that can handle noisy observations. We also replace the "static" effect predictors from the deterministic case with KWIK-LR learners to learn the probability distributions over the given effect sets for each action.

The Active k-Meteorologist Solver

The difficulties described above require an algorithm that (1) considers multiple hypotheses for C and (2) does not commit to any condition set without overwhelming statistical evidence. We now describe such an algorithm, the Active k-Meteorologist Solver (AKMS). The algorithm is an extension of the original k-Meteorologist solver (Diuk et al., 2009), with an adaptation similar to our change to Union above—instead of all component learners making predictions, each component learner has an associated condition (Boolean formula) and only those learners whose conditions actually match the current input are consulted. The full algorithm appears in Algorithm 17.

Conceptually, AKMS is similar to Union in that it considers the predictions of several component learners and predicts ⊥ when they disagree. But there are two main differences. First, AKMS is built for continuous predictions, and so when components make predictions that are close, they can be merged using a median prediction (the same change is possible for Active Union, as discussed earlier). More importantly, AKMS does not eliminate hypotheses that make a single wrong prediction.
Instead, it waits for sufficient statistical evidence to accrue, indicating that a component is (with high probability) making consistently incorrect predictions.

The conventional metaphor for understanding the meteorologist solver is that of deciding which weather forecasters in a given location make accurate predictions. That is, each of the component learners could be, at the most basic level, simply predicting the probability of precipitation every day. The algorithm keeps track of pairwise statistics as to which meteorologist is more accurate than another. Once enough evidence is collected to show that two meteorologists are statistically different (Line 20), the farther outlier is eliminated. Only when all the remaining meteorologists make predictions within ǫ of one another can a prediction be made with high confidence. AKMS simply extends this algorithm to the setting where not all the meteorologists make a prediction at every step. For instance, if one of the meteorologists was out sick on a given day, we should not ignore the predictions of the others. Notice that the conservative approach of waiting for sufficient statistical evidence to eliminate a component learner is exactly what is necessary to deal with the noisy signals described above for stochastic conditional schema learning. Formally, AKMS has the following KWIK bound:

Lemma 4. With m = O((1/ǫ²) log(n/δ)), AKMS (Algorithm 17) has a KWIK bound of O((n/ǫ²) log(n/δ)) + Σ_{i=1}^{n} B_i(ǫ/8, δ/(n+1)), where B_i is the KWIK bound on component learner KL_i.

Algorithm 17 Active k-Meteorologist Solver (AKMS) [Extended from the work of Diuk et al. (2009)]
1: Given: KWIK learners L = {KL_1...KL_n} and general Boolean conditions (like small conjunctions) C = {C_1...C_n}
2: For 1 ≤ i, j ≤ n, count_ij = 0, ∆_ij = 0
3: for each input x_t do
4:   KL̂ = {KL_i | C_i(x_t) = true} for all KL_i ∈ L and C_i ∈ C
5:   Y = {KL(x_t)} for all KL ∈ KL̂
6:   if ∀y_i, y_j ∈ Y, ||y_i − y_j|| < ǫ then
7:     Y = {the median value of Y}
8:   end if
9:   if |Y| = 1 then
10:    Predict y ∈ Y  //could still be ⊥.
11:  else
12:    Predict ⊥
13:  end if
14:  if ⊥ was predicted then
15:    Observe z_t
16:    for each KL_i, KL_j ∈ KL̂ do
17:      if KL_i(x_t) ≠ ⊥ ∧ KL_j(x_t) ≠ ⊥ ∧ ||KL_i(x_t) − KL_j(x_t)|| ≥ ǫ/2 then
18:        count_ij += 1
19:        ∆_ij += (KL_i(x_t) − z_t)² − (KL_j(x_t) − z_t)²
20:        if count_ij > m then
21:          if ∆_ij > 0 then
22:            L = L − KL_i
23:            C = C − C_i
24:          else
25:            L = L − KL_j
26:            C = C − C_j
27:          end if
28:        end if
29:      end if
30:      if KL_i(x_t) = ⊥ then
31:        Update KL_i with x_t and z_t
32:      end if
33:    end for
34:  end if
35: end for

Proof. (sketch) The proof follows almost exactly the form of the proof for the original meteorologist solver (see Diuk et al. (2009) and Li (2009) for details). At a high level, the proof proceeds by counting the ⊥ predictions from each component learner (the second term in the bound above) and the number of times ⊥ will be predicted because of differing predictions before the components that are more consistently wrong are eliminated. The proof hinges on showing that the component learner(s) that are close to the true hypothesis have the lowest squared error on average and that, with high probability, only components with larger squared errors will be eliminated. The only difference between the AKMS analysis and the original result is that now not every component learner makes a prediction at each step. However, since only pairwise comparisons are ever used to eliminate component learners, and since the worst-case bounds are derived by considering the case where only one pairwise disagreement occurs at a time, the worst-case bounds are unchanged.
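A condensed Python sketch of AKMS for scalar predictions follows. The dictionary bookkeeping, method names, and None-for-⊥ convention are ours; m plays the role of the comparison threshold in Lemma 4.

class ActiveKMeteorologist:
    """Condensed sketch of Algorithm 17: pairwise squared-error statistics
    eliminate components that are consistently worse."""
    def __init__(self, components, eps, m):
        self.comps = dict(enumerate(components))   # cid -> (condition, learner)
        self.eps, self.m = eps, m
        self.stats = {}                            # (i, j) -> (count, delta)

    def predict(self, x):
        preds = [kl.predict(x) for c, kl in self.comps.values() if c(x)]
        if not preds or any(p is None for p in preds):
            return None                            # some active learner says ⊥
        if max(preds) - min(preds) < self.eps:
            return sorted(preds)[len(preds) // 2]  # merge close predictions (median)
        return None                                # real disagreement -> ⊥

    def observe(self, x, z):
        active = [(i, kl) for i, (c, kl) in self.comps.items() if c(x)]
        dead = set()
        for idx, (i, kli) in enumerate(active):
            pi = kli.predict(x)
            if pi is None:
                kli.observe(x, z)                  # still-uncertain learners train
                continue
            for j, klj in active[idx + 1:]:
                pj = klj.predict(x)
                if pj is None or abs(pi - pj) < self.eps / 2:
                    continue
                cnt, delta = self.stats.get((i, j), (0, 0.0))
                cnt += 1
                delta += (pi - z) ** 2 - (pj - z) ** 2   # squared-error race (Line 19)
                self.stats[(i, j)] = (cnt, delta)
                if cnt > self.m:
                    dead.add(i if delta > 0 else j)      # drop the consistently worse one
        for i in dead:
            self.comps.pop(i, None)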
Learning Small Conditions and Effect Distributions with AKMS

We now incorporate the Active k-Meteorologist Solver into a revised version of Algorithm 16, now for the stochastic conditional CD-Learning problem. In this setting, we need to consider the form of the given effects a little more closely, because the effects for a conditional stochastic action schema are actually a set of effect sets Ω^a, with each Ω^a_i ∈ Ω^a being linked to a different condition. Two types of background knowledge seem plausible in this situation. We may be given the set of effect sets Ω^a, or just told what effects might occur under any condition for that action (Ω^a_flat = ∪_{Ω ∈ Ω^a} Ω). Since we can always turn the former form into the latter by "flattening" the effect sets for each action (a short sketch of this flattening appears below), we consider here the case where we are only given this flat set of effects, though we remark on an alternative architecture for the former case. Our algorithm will be able to deal with potential effects that do not occur under certain conditions (but are included in the flattened set) because the probability-learning component will set their likelihood to 0.
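The flattening itself is straightforward; the dictionary layout below is a hypothetical convention of ours.

def flatten(effect_sets):
    """effect_sets: {condition: [effect, ...]} for one action; returns the
    pooled effect list (the flat Ω^a), preserving first-seen order."""
    flat = []
    for effects in effect_sets.values():
        for e in effects:
            if e not in flat:
                flat.append(e)
    return flat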
With this background knowledge, Algorithm 18 efficiently solves the CD-Learning problem when conditions are guaranteed to be conjunctions of size O(1), by using KWIK-LR learners (for learning effect distributions) as component learners for the meteorologist architecture (MET^a, which sorts out the correct conditions for each action). Intuitively, M meteorologist sub-learners are considered for each action, one for each pairing of a possible condition with an effect set. Each component is a KWIK-LR learner (Algorithm 9), which keeps a running estimate of the probabilities of each of the possible effects, estimated from data gathered while the corresponding condition held (because component learners are only updated when their corresponding conditions are active).

Algorithm 18 Stochastic Conditional CD-OnlineLearn
1: Given: Action set A of max arity m, predicate set P of max-arity n, effect sets Ω^a for each a ∈ A
2: Construct M = Σ_{i=0}^{k} (Pm^n choose i) KWIK-LR predictors (Algorithm 9) LR^a_i(ǫ/8, δ/M), one for every possible condition.
3: ∀a ∈ A, construct a meteorologist learner MET^a by initializing a copy of Algorithm 17 with C containing each of the M possible conditions and using the KWIK-LR learners above as component learners.
4: for each episode do
5:   for s_t = current state do
6:     If any of the KWIK learners have changed, construct a model interpreting any MET^a ⊥ as an Rmax transition
7:     Perform Known-Edge Value Iteration.
8:     Execute the greedy action a_t, observe s_{t+1}
9:     Update MET^{a_t}(s_t, s_{t+1})
10:  end for
11: end for

When the planner queries MET^a about executing a(o_1...o_m) in s_t, all of the meteorologists whose condition c_i is satisfied by s_t (with substitutions based on o_1...o_m) are asked to predict the distribution of possible next states. Depending on the status of their accompanying KWIK-LR algorithm and the current state, they either report ⊥ or a probability distribution, and these are combined using the meteorologist architecture. The agent uses these probability predictions to construct a model, plans with it, and then performs the recommended action and passes the experience to the appropriate meteorologists. This real experience activates the purging mechanism (Line 20 of Algorithm 17) to eliminate conditions whose component learners are making consistently bad predictions about the probability distributions of effects. In terms of exploration, the algorithm is driven by AKMS to areas of the state space where the possible conditions are separable (for example, trying to paint wooden or metal objects), and within those areas, actions whose probability distributions on effects are not known under these conditions are tested. Formally, we can state that the algorithm efficiently KWIK-solves the CD-Learning problem in the conditional stochastic setting when conditions are described by bounded conjunctions or some other enumerable and KWIK-learnable hypothesis class amenable to the AKMS architecture.

Theorem 8. Algorithm 18 is PAC-MDP and KWIK-solves the CD-Learning problem in the conditional case when each condition is a conjunction of no more than k = O(1) terms.

Proof. The proof follows from the KWIK bounds on the learning components. Specifically, since each KWIK-LR component learner runs with a bound of B_LR (Equation 3.1, with appropriate accuracy parameters as per the meteorologist) ⊥ predictions, and using Lemma 4 to bound the number of ⊥ predictions for all the MET^a learners, we end up with a total of O((A·L^k/ǫ²) log(L^k/δ)) + Σ_{i=1}^{n} B_LR(Ω, ǫ/8, δ/(n+1)) ⊥ predictions, where L is the number of domain literals in an action's scope (Pm^n in the STRIPS case). As the optimistic interpretations used have already been shown to be safe in translating KWIK algorithms to PAC-MDP behavior, the algorithm performs as required.

[Figure 3.4: CD-Learning in Metal/Wooden Paint-Polish World with two objects. On the left, Meteorologist+LR is compared to a learner that does not learn conditions. On the right, Meteorologist+LR is compared to a Flat MDP learner.]

As a proof of concept, Figure 3.4 shows the algorithm above being used in the stochastic metal/wooden Paint-Polish domain with 2 objects. On the left, Algorithm 18 (MET+KWIK-LR) is run against a learner that tries to apply KWIK-LR directly without learning the conditions. This learner is unable to learn the benefit of coating the objects with metal and does not consistently reach optimal behavior. A flat MDP learner is also shown (on the right), which neither generalizes over objects nor concentrates on the small conditions associated with each action, instead treating every combination of statuses of the two objects as a thoroughly independent state. Algorithm 18 performs dramatically better than this unstructured flat learner.

3.8.3 CD-Learning with a Small Number of Conditions

The positive KWIK results above were all derived under the assumption that the conditions themselves were small. However, in the conditional setting there is another natural assumption that yields a KWIK algorithm (although it is less practical due to large constants). Specifically, we consider the case where the conditions can be conjunctions over any number of literals, but the number of conditions is bounded by a constant (|C^a| = k = O(1)). Notice that the combination-lock example cannot be represented with such a restriction, because conditional action schemas have no "else" block—there is no failure mode in the conditional case the way there was for the pre-conditional case. However, it is just this lack of expressiveness that will allow us to enumerate the possible hypotheses for the conditions.
We note that such an else block can be accommodated in CD-Learning if the associated effect is always unique. Specifically, we can capture all the possible conditions by enumerating the set of decision trees with k leaves. Each root-to-leaf path in such a tree represents a possible condition, and each leaf/label pair represents an effect. Generally, the number of decision trees with k (uniquely labeled) leaves that depend on ν attributes is ν^{k−1}(k+1)2^{k−1} (Pichuka et al., 2007), and in our case, ν is the number of domain literals L (in STRIPS, L = Pm^n). Having thus enumerated the hypothesis space, we can utilize Active Union (Algorithm 15) to KWIK-solve the CD-Learning problem and induce PAC-MDP behavior as a direct consequence of Theorem 3 (which holds for any polynomially enumerable space), but now the bound is A·L^{k−1}(k+1)2^{k−1} = O(A·L^{|C|}), where |C| is the maximum (O(1)) number of conditions. However, the rather large constant and the somewhat inelegant enumeration of trees call the practicality of this approach into question. In Section 4.3.2, we will introduce an algorithm for a different learning paradigm (apprenticeship learning) that does not suffer from such a constant.

Moving to the stochastic case, we see that learning becomes more difficult, because when a condition matches a state, an effect is chosen from the distribution ⟨Ω^a_i, Π^a_i⟩, so the decision-tree enumeration needs to be augmented with a distribution-learning component (for the effect distribution), and we can no longer eliminate possible trees based on a single sample. Again, we can use the template from the "small conditions" case to alleviate both of these troubles. That is, we can use AKMS (Algorithm 17) here, but instead of just enumerating the conditions, we again enumerate all decision trees with k leaves. Each of these possible trees is treated as a meteorologist (in the terminology of AKMS), and again KWIK-LR is used to learn the distribution associated with each possible condition. While the resulting bound is actually polynomial, it is again hampered by enormous constants. It is an open question as to whether a more efficient algorithm is possible without more assumed structure.

This last set of algorithms, which are clearly approaching the edge of tractability, completes our study of CD-Learning in the online setting. We will return to this problem in the next chapter, where a more powerful learning paradigm will allow us to drop some of the restrictions used in this section.
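For a sense of scale, the tree count cited above can be evaluated directly; the helper function is ours, the formula is as cited from Pichuka et al. (2007).

def num_condition_trees(nu, k):
    # Decision trees with k uniquely labeled leaves over nu attributes.
    return nu ** (k - 1) * (k + 1) * 2 ** (k - 1)

print(num_condition_trees(nu=10, k=3))   # 10**2 * 4 * 4 = 1600 candidate trees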
3.9 Learning Effects and their Distributions

We now turn our attention to the ED-Learning problem, where the conditions are given (instead of having to be learned), but now both the effect distributions (Π^a or Π^a_c) and the effects themselves (Ω^a or Ω^a_c) need to be learned. In the original definition of ED-Learning (Definition 14), we did not stipulate the size of the learned action schemas, only that they be accurate. However, it seems reasonable to require these schemas to be as compact as possible. To ascertain the tractability of this goal, we consider the ED-Learning problem in the simple case where we are given a constant k = |Ω| that bounds the number of possible effects under any condition in the schemas. For instance, if k = 1, the schemas are deterministic. Our results will progress from the simple deterministic setting to more difficult stochastic settings, culminating in a KWIK algorithm for general ED-Learning when k = O(1).

In all of these positive results, we give language-specific results for STRIPS, because other languages may not have the same Add and Delete structure. We begin with the deterministic case, where STRIPS effects are surprisingly easy to KWIK-learn. Along the way, we show that computing a minimal action schema from incomplete data is NP-Hard, and that "greedy" approaches to learning effects can get caught in incorrect models, but our final algorithm sidesteps both of these difficulties by exploiting structure in the stochasticity of a schema and waiting until it has enough statistical evidence to commit to a specific model. Before considering such complications, we examine the simple deterministic case.

3.9.1 KWIK-learning Deterministic Effects

Unlike the stochastic setting, deterministic STRIPS effects can be inferred without considering multiple effects that might be producing the same outcome. In fact, the operator update rules from Section 3.4.1 already state how to infer the role of a literal in an action's effects (for example, seeing a block To no longer being clear after a putdown action means that Clear(To) is in putdown's Delete list). We can formally encapsulate these update rules in KWIK-DetEffectLearn (Algorithm 19) for learning the deterministic STRIPS effects of an action.[6]

[6] This algorithm can be trivially extended to the conditional case, since conditions are given in ED-Learning.

Algorithm 19 KWIK-DetEffectLearn (STRIPS)
1: Given: Literals L
2: Output: Predict the next state or ⊥
3: ∀l ∈ L, Add[l] := Delete[l] := ⊥
4: for each input state s with objects replaced with variables from a do
5:   ŝ = ∅
6:   for each literal l ∈ L do
7:     if (l ∈ s ∧ Delete[l] = ⊥) ∨ (l ∉ s ∧ Add[l] = ⊥) then
8:       Predict ⊥
9:       BREAK LOOP
10:    else if (l ∉ s ∧ Add[l] = true) ∨ (l ∈ s ∧ Delete[l] = false) then
11:      ŝ = ŝ ∪ l
12:    end if
13:  end for
14:  if no prediction was made yet then
15:    Predict ŝ
16:  else
17:    // ⊥ was predicted
18:    Observe s′ (with variable substitution again based on a)
19:    for each literal l ∈ L do
20:      if l ∉ s ∧ l ∈ s′ then
21:        Add[l] = true
22:      end if
23:      if l ∉ s ∧ l ∉ s′ then
24:        Add[l] = false
25:      end if
26:      if l ∈ s ∧ l ∉ s′ then
27:        Delete[l] = true
28:      end if
29:      if l ∈ s ∧ l ∈ s′ then
30:        Delete[l] = false
31:      end if
32:    end for
33:  end if
34: end for

Intuitively, the algorithm initially lists every literal in the scope of the action as being unknown in terms of its presence in the Add and Delete lists. Then, for every state given to the algorithm, it is asked to predict the next state after executing this action. If the Add and Delete lists contain uncertainty that would affect the next state (Line 7), then ⊥ is predicted; otherwise s′ is correctly predicted. Note that, depending on the state, correct predictions can still be made even when there is uncertainty in irrelevant parts of the Add and Delete lists (for example, if Table(X)'s deletion is unknown, but it is known not to be added, and it is not true in s, it does not affect the prediction of s′). If ⊥ is predicted, the next state is observed and unknown parts of the Add and Delete lists are filled in based on the observed changes, or lack thereof. For instance, consider the algorithm seeing its first state in deterministic Blocks World as having a on the table, b on the table, and c on b. The algorithm has seen no information thus far, so if asked to make a prediction for the move(a, table, c) action's effects, it must predict ⊥. The algorithm would then get a piece of experience (gathered using the full RL algorithm in the next section) showing a being moved on top of c. This piece of experience would be used in Lines 19 through 30 to update the Add and Delete lists of move(X, From, To) to contain information such as: Add[On(X,To)] = true; Add[Table(X)] = false; Delete[Clear(To)] = true; and Delete[Block(X)] = false. In general, each piece of experience after a ⊥ eliminates uncertainty for at least one literal in either the Add or Delete list.
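A direct Python transcription of Algorithm 19, with states as sets of variablized literals and None standing in for ⊥ (both conventions ours):

class KWIKDetEffectLearn:
    """Sketch of Algorithm 19 for one action."""
    UNKNOWN = "?"

    def __init__(self, literals):
        self.add = {l: self.UNKNOWN for l in literals}
        self.delete = {l: self.UNKNOWN for l in literals}

    def predict(self, s):
        nxt = set()
        for l in self.add:
            if (l in s and self.delete[l] == self.UNKNOWN) or \
               (l not in s and self.add[l] == self.UNKNOWN):
                return None                       # Line 7: a relevant bit is unknown
            if (l not in s and self.add[l] is True) or \
               (l in s and self.delete[l] is False):
                nxt.add(l)                        # literal added or preserved
        return frozenset(nxt)

    def observe(self, s, s_next):
        # Lines 19-30: each observed (non-)change pins down one Add/Delete bit.
        for l in self.add:
            if l not in s:
                self.add[l] = l in s_next         # added iff it appeared
            else:
                self.delete[l] = l not in s_next  # deleted iff it vanished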
Formally, the algorithm has the following KWIK bound.

Lemma 5. For deterministic STRIPS domains with P predicates of max-arity n and A actions of max-arity m, learning the effects of each action using Algorithm 19 has a KWIK bound of O(A·Pm^n).

Proof. The number of literals considered in each action's scope is Pm^n (for A actions of max arity m and P predicates of max-arity n), which is the number of predicates with unique variable permutations drawn from the action's variables. Each ⊥ prediction must occur because either the Add or the Delete list contains a ⊥ entry for at least one of these literals: either l ∈ s ∧ Delete[l] = ⊥ or l ∉ s ∧ Add[l] = ⊥. Either way, the observed s′ will determine definitively whether these should be set to true or false based on the rules in Lines 19 through 30 (since at least one of those must apply for each literal). Thus, each ⊥ prediction removes at least one bit of uncertainty from the Add or Delete lists, with a maximum of 2Pm^n such ⊥s for each action.

We note that while this result is specifically for STRIPS domains, any deterministic action-schema language where the effects are KWIK-learnable will be usable in the next section. For instance, deterministic OOMDPs with their arithmetic attribute-value changes are KWIK-learnable (Diuk et al., 2008) when the space of available changes is relatively small. We now consider how to translate such KWIK-learnable deterministic effect-learning algorithms into full agents for the ED-Learning problem.

3.9.2 Optimistic Interpretations and Deterministic ED-Learning

As with other sub-problems, we have a choice of the optimistic heuristics to guide the learner. One choice is to simply use the KWIK-Rmax architecture and replace all transitions where the next state is predicted as ⊥ with an Rmax state. However, this interpretation does not make use of the partially learned operators. That is, because KWIK-DetEffectLearn keeps track of the "known-ness" of each literal in each Add and Delete list, we sometimes know partial information about what literals are added or deleted, though maybe not all of them. In that case, the optimistic Add/Del heuristic has better best-case behavior, if all the rewards and pre-conditions are based on positive values of literals (as is the case for pre-conditions in traditional STRIPS). This heuristic constructs "optimistic operators" (a small sketch appears below), where every potential literal changed by an action is considered to be added if Add[l] = ⊥ or Add[l] = true, and is considered to be deleted only if Delete[l] = true. Under this heuristic, partially learned operators are thought to add as many literals as possible given prior evidence, and only delete those that have certainly been seen to be deleted earlier.
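The optimistic Add/Del construction can be sketched on top of the KWIKDetEffectLearn transcription above; again, this is an illustration rather than the chapter's implementation.

def optimistic_operator(learner):
    """Unknown adds are assumed to happen; unknown deletes are assumed not to."""
    add = {l for l, v in learner.add.items()
           if v is True or v == learner.UNKNOWN}
    delete = {l for l, v in learner.delete.items() if v is True}
    return add, delete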
STRIPS rules have pre-conditions based on monotone conjunctions, so if we assume that the reward function is similarly based only on monotone literals (and not their negations), this heuristic will always lead to higher-valued states (since no action will be wrongly precluded and reward values will only be over-estimated).

Both heuristics have their benefits. As in the CD-Learning setting, the Rmax-style heuristic prevents us from having to enumerate a large number of states that may not even be reachable. On the other hand, when enough partial information is known about the Add and Delete lists, the optimistic-Add/Del heuristic avoids some useless exploration, though the worst-case bounds are the same. One interesting melding of the two is to use the Rmax heuristic early on and then switch to the more focused optimistic-Add/Del heuristic once most of an operator is learned, hopefully avoiding the large state-space enumeration. In the following algorithm and our experiments, we simply use the Rmax heuristic, but we have seen empirical benefits to the other approach.

Our full algorithm for deterministic ED-Learning is presented in Algorithm 20. The algorithm is presented here for the conditional case, as it is the most general, but it is easily adapted to the pre-conditional or unconditional cases by creating only one copy of KWIK-DetEffectLearn for each action. The algorithm works simply by using the KWIK predictions of KWIK-DetEffectLearn for each action and updating each one with the pre- and post-states of each action taken, which are of course translated into a variablized form (such as On(X, From) instead of On(a, b)) by simple substitution based on the action's variables. Agents using this algorithm are explicitly drawn towards areas of the state space where they will be able to definitively tell whether a literal is added. For instance, if the agent has not learned whether paint actually paints an object, it will attempt the action on an object that is unpainted to determine whether the change actually happens. Formally, we can make the following statement about this algorithm in the deterministic ED-Learning setting.

Algorithm 20 Deterministic Conditional ED-OnlineLearn
1: Given: Action set A of max-arity m, predicate set P of max-arity n, conjunctions for conditions C^a = {c_1 ... c_{|C^a|}} for each a ∈ A
2: ∀a ∈ A, ∀c ∈ C^a, construct a copy of KWIK-DetEffectLearn (Algorithm 19), KDE^a_c, with Pm^n literals (those in scope of a)
3: for each episode do
4:   for s_t = current state do
5:     Construct a model interpreting any KDE^a_c ⊥ as an Rmax transition (or use optimistic schemas)
6:     Perform value iteration using this model
7:     Execute the greedy action a_t, observe s_{t+1}
8:     c_t = the c ∈ C^{a_t} such that c(s_t) = true
9:     Update KDE^{a_t}_{c_t}(s_t, s_{t+1})
10:  end for
11: end for

Theorem 9. Algorithm 20 is PAC-MDP given the conditions in domains encoded in deterministic STRIPS, KWIK-solving the ED-Learning problem for deterministic STRIPS schemas.

Proof. The theorem follows from Lemma 5 and the KWIK-Rmax theorem, because there are a bounded number of actions taken due to uncertainty in the model. Each such trajectory will lead to at least one observation that eliminates a ⊥ prediction for at least one literal in the Add and Delete arrays held by KWIK-DetEffectLearn. The total KWIK bound in this case is O(ACPm^n), where C is the maximum number of conditions associated with an action.
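Before turning to the experimental results, the main loop of Algorithm 20 can be sketched as follows. The planner, environment, and grounding helper (plan_with_rmax, env, env.ground) are assumed interfaces for illustration, not part of the thesis code.

# Sketch of Algorithm 20's main loop. plan_with_rmax is assumed to run value
# iteration on a model in which any ⊥ prediction is replaced by a transition
# to a maximally rewarding Rmax state.
def ed_online_learn(actions, conditions, literals_for, env, plan_with_rmax, n_episodes):
    # One deterministic effect learner per (action, condition) pair.
    learners = {(a, c): DetEffectLearner(literals_for(a))
                for a in actions for c in conditions[a]}
    for _ in range(n_episodes):
        s = env.reset()
        while not env.done():
            a = plan_with_rmax(s, learners)             # greedy under optimism
            s_next = env.step(a)
            c = next(c for c in conditions[a] if c(s))  # the condition that fired
            # Variablize the transition relative to a's parameters, then update.
            learners[(a, c)].observe(env.ground(s, a), env.ground(s_next, a))
            s = s_next

The optimism lives entirely inside the planner: wherever a learner answers ⊥, the model routes the transition to a maximally rewarding absorbing state, which is what drives the agent toward informative experiences.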
A small empirical demonstration of the algorithm is shown in Figure 3.5 for 4-Blocks World.

Figure 3.5: Learning deterministic effects in Blocks World.

The graph shows the average reward (over 10 trials) obtained by an agent using Algorithm 20 versus an agent that uses a flat MDP learner, which is unable to generalize across situations (not realizing that moving block a has anything to do with moving block b). As expected, the algorithm using relational action schemas learns all it needs to know about Blocks World during the first episode, while the flat MDP learner struggles for many episodes (each episode is capped at a step limit of 20) to uncover the true model.

3.9.3 Stochastic ED-Learning with "Signaled Effects"

We now begin transitioning to the stochastic ED-Learning problem as posed in noisy Blocks World, or the difficult-to-learn Stochastic STRIPS rule in Table 3.5.

Table 3.5: A difficult set of Add and Delete lists to learn in the stochastic case (with simple propositional literals).
ω1: Add: B     Del: C   (p1 = 1/3)
ω2: Add: AB    Del: D   (p2 = 1/3)
ω3: Add: (nil) Del: C   (p3 = 1/3)

Notice that the major difficulty for stochastic ED-Learning arises from effect ambiguity, which is complicated by the fact that the effects themselves are being learned. That is, for a given transition, not only may multiple real effects match this transition (as we saw in the D-Learning case), but because these effects may be partially learned, we may not be able to discern from a single example which effect should be updated. For instance, if we have seen A and B added at the same time, and then observe a transition from A to AB (only B was definitely added, though we cannot tell for A), was that the same effect, or a different one? In this section, we first consider the problem without this complication by assuming that an index number or other unique identifier of the effect that actually occurred is reported along with the next state; we remove this restriction in the following sections.

As an example, consider the noisy Blocks World with the modification that if the pickup action actually picks up a block, a 1 is reported (along with the next state showing the block in hand), and if the action instead resulted in no change, a 2 is reported along with the new state. This indicator will allow us to KWIK-learn the effects because we are no longer unsure how to "pair" effects from different samples (deciding how to group literals that were, say, added on different timesteps and might belong to the same effect bin). Note that this extra information also makes the distribution-learning part of the ED-Learning problem easier to solve, though for different reasons. The new observations do not preclude two effects from being able to explain the same transition, but there can be no ambiguity at any given timestep because the actual effect index is reported. Hence, KWIK-LR can be replaced by a simple counting algorithm (the "KWIK dice-learner" from Table 2.1) in this case.

Under these conditions, we can use the following algorithm (Algorithm 21) with component KWIK learners for learning the effects and the probabilities. Intuitively, it just uses the deterministic effect-learning algorithm to learn the Add and Delete lists for each indexed effect, and uses a simple KWIK dice-learning algorithm (sketched below) to determine the probabilities of each effect-set multinomial. Formally, it has the property stated in Theorem 10.
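A minimal sketch of such a dice-learner, i.e., a counting learner in the spirit of Table 2.1, with a simple Hoeffding-style "known-ness" test standing in for the exact analysis:

# Sketch of a KWIK "dice learner": estimates a multinomial over effect
# indices by counting, answering ⊥ until enough samples guarantee
# ε-accurate frequencies with probability at least 1 - δ.
import math

class DiceLearner:
    def __init__(self, n_outcomes, epsilon, delta):
        self.counts = [0] * n_outcomes
        self.total = 0
        # Hoeffding bound, union-bounded over the n_outcomes frequencies.
        self.needed = math.ceil((1 / (2 * epsilon ** 2))
                                * math.log(2 * n_outcomes / delta))

    def observe(self, index):
        self.counts[index] += 1
        self.total += 1

    def predict(self):
        if self.total < self.needed:
            return None  # ⊥: not enough evidence yet
        return [c / self.total for c in self.counts]

Because the effect index is observed directly, no credit assignment across effects is needed; counting suffices.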
Algorithm 21 Stochastic "Signaled Effect" Conditional ED-OnlineLearn
1: Given: Action set A of max-arity m, predicate set P of max-arity n, conjunctions for conditions C^a = {c_1 ... c_{|C^a|}} for each a ∈ A, and a maximum number of effects per condition |Ω|, with unique signals
2: ∀a ∈ A, ∀c ∈ C^a, ∀w ∈ [1...|Ω|], construct a copy of KWIK-DetEffectLearn (Algorithm 19), KDE^a_{cw}, with Pm^n literals (those in scope of a)
3: ∀a ∈ A, ∀c ∈ C^a, construct a "dice-learner" D^a_c to learn the effect probabilities
4: for each episode do
5:   for s_t = current state do
6:     If any of the KWIK learners have changed, construct a model interpreting any KDE^a_{cw} ⊥ where D^a_c(w) ≠ 0 or D^a_c = ⊥ as an Rmax transition
7:     Perform value iteration using this model
8:     Execute the greedy action a_t, observe s_{t+1} and effect index i
9:     c_t = the c ∈ C^{a_t} such that c(s_t) = true
10:    Update KDE^{a_t}_{c_t i}(s_t, s_{t+1}) and D^{a_t}_{c_t}(i)
11:  end for
12: end for

Theorem 10. Algorithm 21 is PAC-MDP and KWIK-solves the ED-Learning problem in the "signaled effect" setting, where each ω ∈ Ω^a_c has a unique index that is reported whenever that effect occurs.

Proof. (sketch) The overall KWIK bound for this algorithm is O(CAΩ(Pm^n + (1/ε^2) log(Ω/δ))). The first term comes from the effect-learning component and the second from the dice-learning bound (see Table 2.1). Special consideration has to be given in this case to effects that never happen, because we use the maximal number of effects to set up the learners, but there may be fewer than this number of effects under a given condition (for example, some conditions may produce deterministic effects while others have a distribution over two or more). This situation is taken care of in line 6, which halts exploration due to the effect-learning component if the dice-learner predicts probability 0 for an effect. The PAC-MDP property follows from the KWIK bound and the application of the KWIK-Rmax architecture.

Unfortunately, most of the problems considered in this thesis, and most in the literature, do not fit this profile. We now consider the more general case, where such an indicator is not provided.

3.9.4 Negative Results for ED-Learning: Flawed Approaches

Turning our attention to the case of k = O(1) general stochastic effects, we first describe a negative result on computing minimal action schemas from incomplete data. We then provide an example of the hazards of using a "greedy" approach to construct learned effects. These negative results are tempered in the next section, which shows how to KWIK-learn k stochastic effects in a non-greedy fashion and without constructing full schemas from incomplete data.

The Intractability of Computing Minimal Effects from Incomplete Data

We start by showing the difficulty of learning stochastic action-schema effects in the batch setting from a log of data (transitions) that does not uniquely induce an action schema. This result shows that the computation required to infer the minimal effects from a log of experience data makes the problem NP-Hard. While others (Pasula et al., 2007) have reported similar results, the proofs are not widely disseminated, so we provide our own here. This result does not directly imply that we cannot KWIK-learn stochastic effects (in fact, we show the opposite for k = O(1) effects later), but it provides good intuition about the difficulty of the problem.
Formally, the problem we consider here is: given a set of M "experience data points" for an unconditional Stochastic STRIPS action a, where each piece of data is a tuple ⟨s, s′⟩ showing the state of the domain before and after the action, determine whether there is an effect-set Ω^a of size k such that every experience tuple is explained by at least one ω ∈ Ω^a (η(s, ω) = s′).

Theorem 11. The ED-Learning problem for Stochastic STRIPS rules in the batch setting, with a log of M experience tuples for an action, is NP-Hard when the number of effects is larger than 2.

Proof. The proof is by a reduction from graph coloring. We begin by defining some notation. As per operator update rules 3-6 from Section 3.4.1, every experience tuple indicates one of four transformations for every literal l in the state description, which we denote with four symbols (1, 0, *, #), standing for: the literal was added (1), the literal was deleted (0), the literal was already true and not deleted (*), and the literal was already false and not added (#). Thus, given an ordering over the literals, we can write every experience tuple as a string of these four symbols. For instance, in a propositional domain instance with literals A, B, C, D, and E, the transition AB → ACE can be written as *01#1.

The k-graph-coloring problem, from which we reduce, is defined as: given a graph ⟨V, E⟩ with vertices V and edges ⟨v_i, v_j⟩ ∈ E, determine if there is a mapping κ : V → [1...k] (a "coloring" of the vertices) such that no two adjacent vertices have the same color (formally, ⟨v_i, v_j⟩ ∈ E → κ(v_i) ≠ κ(v_j)). This problem is known to be NP-Complete (Garey & Johnson, 1990).

The reduction from graph coloring to batch ED-Learning goes as follows. We construct a log of M = |V| experience tuples for a domain instance with |V| literals. Each experience tuple et_i has a 1 in position i. All other positions in the tuple are filled with one of two symbols: if ⟨v_i, v_j⟩ ∈ E, then position j is set to #; otherwise, it is set to *. The interpretation of each experience tuple is that literal l_i definitely is added in this instance, several other literals are shown not to be added with it (those that had edges in the graph-coloring instance), and the others may or may not be in the Add list that generated this instance (the *s).

All that needs to be shown now is that a valid solution to the constructed ED-Learning problem answers the original graph-coloring problem and vice versa. This connection is completed by interpreting each inferred effect ω_i ∈ Ω as a unique color in [1...k]. Color i is assigned to vertex v_j if experience tuple ⟨s, s′⟩_j can be explained by applying ω_i to s (η(s, ω_i) = s′). If multiple effects could explain the transition, v_j can be assigned any of the corresponding colors. This coloring will be consistent because each edge produced a 1/# pairing in experience tuple j, so no effect that explains this transition can apply to both vertices (because it would have to both add and not add each of the literals). In the other direction, if we have a solution to the graph-coloring problem and instances for ED-Learning constructed as described above, we can simply classify each experience tuple with the effect ω_i corresponding to the color i ∈ [1...k]. These effects must be valid because each instance has a 1 in position i, a # in any position corresponding to a neighbor of i, and a * (which does not conflict with any other instance) in all other positions.
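To make the reduction concrete, a small script (illustrative only, with hypothetical names) that builds the experience log from a graph:

# Sketch of the Theorem 11 reduction: one four-symbol experience tuple per
# vertex. Tuple i has '1' at position i, '#' at i's neighbors, '*' elsewhere.
def coloring_to_experience_log(n_vertices, edges):
    adjacency = {i: set() for i in range(n_vertices)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    log = []
    for i in range(n_vertices):
        tup = ['#' if j in adjacency[i] else '*' for j in range(n_vertices)]
        tup[i] = '1'  # literal i was definitely added in this tuple
        log.append(''.join(tup))
    return log

# A triangle needs 3 colors, so the induced instance needs 3 effects:
print(coloring_to_experience_log(3, [(0, 1), (1, 2), (0, 2)]))
# -> ['1##', '#1#', '##1']

A single effect can explain two tuples only if no position carries a 1 in one tuple and a # in the other, which is precisely the adjacency constraint of the coloring instance.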
While the previous theorem provides good intuition as to the intractability of learning effects, it only gives a negative result on the computational complexity of solving the ED-Learning problem with a specific strategy (constructing minimal effects from insufficient batch data). We now show that even if this computation is ignored, a greedy approach to learning stochastic effects (one that considers only a single minimal model) can falter. We will then show that a more careful algorithm, which makes better use of the probabilistic structure and waits to commit to a model, can efficiently solve the ED-Learning problem.

Problems for Greedy Approaches

While the result above shows the intractability of computing the minimal model from data when there are not enough samples to build probabilities of the effects, one may wonder how a "greedy" algorithm might fare. That is, consider a procedure that builds a single possible set of Add and Delete lists from the available data (using an NP-oracle to do the graph coloring / effect inference) and then tries to learn the probabilities of these lists until evidence shows that the lists were wrong. Unfortunately, the following example shows that such a strategy can lead the agent into a deadlock situation. The example uses the 3 real effects in Table 3.5 from earlier. We use the notation from the proof of Theorem 11, that is, the four symbols (1, 0, *, #), standing for: the literal was added (1), the literal was deleted (0), the literal was already true and not deleted (*), and the literal was already false and not added (#). Suppose the agent has collected the following samples from that operator and then chooses a "coloring" of them (grouping them into possible effect sets):

1**#  (red)
*1*0  (red)
*1##  (blue)
**0*  (blue)
1*##  (green)
*##*  (green)

This experience could induce the following effects (although several interpretations are possible, they can all be made to fail in similar ways):

ω1: Add: AB  Del: D  (Scapegoat)
ω2: Add: B   Del: C
ω3: Add: A   Del: C  (Wanted)

Now suppose the agent starts in a state where only B and D are true, and the goal is to reach a state where A and D are true. Unfortunately, outcomes from BD (assuming there is another action for resetting) will always be either "nothing happens" (the first and third real effects: B is added and C is deleted, or just C is deleted), which will be credited to the second learned bin, or A is added and D is deleted (blamed on the first learned bin, the scapegoat). The agent may think that the third learned bin (labeled "Wanted") will get it to the goal, but this outcome will never happen, and yet the agent will never see single-step evidence from this state that the effects it has created are definitely wrong. Thus, using only a single hypothesis on the Add and Delete lists before sufficient statistical evidence is collected can lead to a deadlock that requires backtracking. We now introduce an algorithm that avoids these pitfalls by making use of the special structure of the effect probabilities in the |Ω| = k = O(1) case.

3.9.5 Stochastic ED-Learning with only k Effects

So far, we have a positive result for the minimal ED-Learning problem when |Ω| = 1 (deterministic), and for a benign (signaled-effect) version of the stochastic effect case. But now we consider the case where the number of effects is greater than 1, but still small, specifically |Ω| = k = O(1).
We will introduce an algorithm (KWIK-CorEffect, Algorithm 23) that KWIK-learns these dynamics, but because of the difficulties encountered in the negative results above, we will be using a much different internal representation than the one considered in the rest of this thesis. Essentially, because of the intractability of computing and maintaining a single Stochastic STRIPS model during learning, and because KWIK learners are tasked only with making predictions (not necessarily outputting a specific model), we now introduce a graphical model for Stochastic STRIPS effects that leverages the probabilistic structure to compactly represent the valid hypotheses. We will use this model in subsequent sections to KWIK-solve the ED-Learning problem.

The Cross-Edged Dynamic Bayes Net

The model we introduce for representing Stochastic STRIPS effects is a Cross-Edged Dynamic Bayes Net (CEDBN), as illustrated in Figure 3.6 for the (difficult-to-learn) effects from Table 3.5.

Figure 3.6: Coupled effects represented using a Cross-Edged Dynamic Bayes Net (CEDBN), shown with the actual operators from Table 3.5 and CPTs for A+, B+, C−, and D−. Intermediate nodes have CPTs with up to k parents. For any given timestep, half the intermediate nodes are */# (will not affect the next state) and the others have values of 0 or 1. Dependencies flow down and to the right.

The use of similar graphical models for inducing probabilities of next states from relational operators has precedent in the planning community (Lang & Toussaint, 2009), but here we use this newly defined structure to make the learning process tractable. A CEDBN for an action represents a transition with the following structure. Like a standard DBN, it has nodes for each factor's value in the state before and after the action executes (the top and bottom levels in the figure). While these are labeled with propositional factors in our diagram (for readability), in STRIPS each of these nodes would correspond to a literal that is potentially affected by the action (such as On(X, From)), and their values are simply 1 or 0 depending on whether the literal is true or not. However, we need an extra set of nodes to capture the dynamics of the Add and Delete lists, as well as correlations between the factors. To see why we need to capture such correlations, notice that in the simple operator in Figure 3.6, if we know that A is added, then we can guarantee that B is also added (since A is only added in ω2). These correlations are captured in a CEDBN by introducing 2n (for n literals) linked intermediate nodes (the middle layer in Figure 3.6), which store the probabilities (in Conditional Probability Tables (CPTs), described later) of each literal being added or deleted (the A+ and A− nodes, respectively). The cross-edges within this intermediate layer encode statistical dependencies between these literals, and an ordering (in our example, simply alphabetical) is placed on these intermediate nodes such that dependencies flow only one way. For instance, the link between A+ and B+ indicates that there is a statistical correlation (in this case deterministic) from A being added to B being added.
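These correlations can be computed directly from the effect distribution. The snippet below is illustrative only; it assumes a pre-state where A and B are false and C and D are true (so A+, B+, C−, and D− are the active nodes) and recovers the CPT entries of Figure 3.6 from Table 3.5:

# Derive intermediate-node CPT entries from Table 3.5's effect distribution.
# Effects are (add_set, delete_set, probability).
effects = [({'B'}, {'C'}, 1/3),       # ω1
           ({'A', 'B'}, {'D'}, 1/3),  # ω2
           (set(), {'C'}, 1/3)]       # ω3

def pr(event, given=lambda add, dele: True):
    """Probability of `event` over effects, conditioned on `given`."""
    num = sum(p for add, dele, p in effects if given(add, dele) and event(add, dele))
    den = sum(p for add, dele, p in effects if given(add, dele))
    return num / den

print(pr(lambda a, d: 'A' in a))                             # Pr[A+=1] = 1/3
print(pr(lambda a, d: 'B' in a, lambda a, d: 'A' in a))      # Pr[B+=1 | A+=1] = 1.0
print(pr(lambda a, d: 'B' in a, lambda a, d: 'A' not in a))  # Pr[B+=1 | A+=0] = 0.5
print(pr(lambda a, d: 'C' in d, lambda a, d: 'A' not in a))  # Pr[C-=1 | A+=0] = 1.0

Conditioning on A+ collapses the effect set to ω2 alone (or to {ω1, ω3}), which is exactly the information the cross-edges carry.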
We note that a similar structure has been used to capture correlations between factors in prior work (Hoey et al., 1999). The values permissible for each intermediate node X+ are 1, which indicates that the literal was definitely added (as X was 0 in the pre-state), or 0 if X was definitely not added (X was 0 in the pre-state and will remain so). Similar values are allowed for X−. However, we also need to encode the transitions associated with the * and # markers from the earlier section, indicating the cases where we cannot tell whether X was added or deleted. Note that one of ⟨X+, X−⟩ must have this ambiguity in every state. Thus, we allow a third value for an intermediate node, labeled */#, indicating this ambiguity, and we will later see that nodes with this value can actually be ignored when making next-state predictions (because the probability of adding a literal that is already true is immaterial).

The semantics of the intermediate node values with respect to the next-state literals is simple once their values are known. If the intermediate node is labeled */#, it has no effect on the current value of literal X, and its "sister" node (+ or −) must be consulted instead. If it is labeled 1, then the literal X will have its value flipped (either off or on), and if it has a value of 0, then X will definitely maintain its current value. Since the */# values are directly observable from the current state of X, the CPTs for X need only store the probability of X+ (similarly X−) having the value 0 or 1, but the number of entries in the CPT depends on the parent nodes (other intermediate literals earlier in the ordering). Note that if we could compactly capture the CPTs for the values of the intermediate nodes, we would have the full dynamics of the model.

In a standard DBN, the number of entries (rows) in each node's CPT is exponential in the in-degree of the node, because each different parent configuration could induce a different distribution. When this in-degree is O(1), this learning is still tractable in the KWIK setting (Diuk et al., 2009), but a naïve interpretation of a CEDBN does not conform to this assumption, because a literal at the end of the ordering can have a statistical dependence on all the other literals (for instance, D in the diagram). However, in the next section we show that only k = O(1) of these edges actually need to be considered at any specific timestep (so only O(n^k) entries are needed) when making a next-state prediction, and we subsequently bound the size of CPTs sufficient for this task. To see how this compaction can be done, and as a glimpse of how this model is useful in KWIK-learning, we first show how a fully instantiated CEDBN (completely learned, with no ⊥ entries) with polynomial-size CPTs can be used to predict a next state.

Next State Prediction with CEDBNs

Suppose we are given a current state and an action (with |Ω^a| = k = O(1)) that can affect a set of n literals (as with the scoping of STRIPS actions by their parameters) and are asked to predict a distribution over next states in the unconditional (ED) case. Suppose we also have a fully instantiated CEDBN as described above. We will now show that CPTs of size O(n^k) for each literal are sufficient for a prediction algorithm to predict the next-state distributions using the corresponding CEDBN.
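As a concrete anchor for the data structure, the following sketch (hypothetical names, matching the semantics above) stores one CPT per intermediate node, keyed by the values of the earlier "choice" literals described next:

# Sketch of a CEDBN container for one action: per-literal intermediate nodes
# whose CPTs are keyed by the branch decisions made at earlier nodes.
class CEDBN:
    def __init__(self, ordered_literals):
        self.order = ordered_literals  # fixed ordering; dependencies flow forward
        # cpt[(literal, sign)] maps a key, i.e., a tuple of
        # (earlier_literal, sign, value) assignments, to Pr[node = 1].
        self.cpt = {(l, sign): {} for l in ordered_literals for sign in ('+', '-')}

    def prob_one(self, literal, sign, key):
        """Pr[this intermediate node = 1] given the branch decisions so far."""
        return self.cpt[(literal, sign)][key]

For the Figure 3.6 example, cpt[('B', '+')] would map the key (('A', '+', 1),) to 1.0 and (('A', '+', 0),) to 0.5.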
Specifically, the CPTs will have one row for every combination of "choice" literals that are ordered before it, that is, every earlier literal that may have had a probabilistic prediction (and therefore made us consider at least one extra "effect bin"). The prediction procedure is laid out in Algorithm 22. Intuitively, the algorithm starts with only a single effect (that says nothing is added and deleted) with probability 1.0 and an empty "key". The algorithm then looks at each intermediate node for each literal (in the given order) that might be added or deleted, and refines the possible effects, probabilities, and keys based on some of the intermediate nodes' CPT entries. If the intermediate node would not make a difference (line 9) (and is not even observable) due to the current state configuration (A+ is meaningless if A is already true), the algorithm sets the node's value to */# and moves on to the next intermediate node. Otherwise, the probability of that intermediate node having value 1 in each possible effect constructed so far is considered. Notice that the entry in this node's CPT is indexed using the key for that effect, not the effect itself. If the effect's key deterministically (probability 1.0) asserts that the node will have value 1, the corresponding literal is added to the effect's Add or Delete list (depending on whether it is X+ or X−). For instance, in the example, if an effect has A+ = 1 in its key, then B+ = 1 with probability 1, so B is added to that effect. However, in that case the key for that effect is not changed, because keys are meant to contain only the small amount of information necessary for differentiating this effect from others, and since the earlier literals deterministically determine this one, putting it in the key would be redundant.

The other case that needs to be considered is when the intermediate node has a non-zero (but non-deterministic) probability of taking on value 1. In that case, we have found a "branch" node, because it splits at least two effects that were aliased by the values of the previous intermediate nodes. For instance, in our example, if s[A] = 0, the very first node A+ has a 1/3 chance of having value 1 because there are no other literals in the order before it. Therefore, the initial (empty) effect has to be split into two cases: one where A is added, and one where it is not.

Algorithm 22 PredictState(s, L, M)
1: Input: Current state s, ordered literals L = [l_1 ... l_n], CEDBN M with CPT entries for each intermediate node l_i^+ (and l_i^-) for every combination of {0, 1} values of up to k earlier intermediate nodes l_j, j < i
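A rough Python sketch of the prediction procedure just described (a best-effort reading of the text above, not the thesis code; it assumes the CEDBN container sketched earlier):

# Sketch of CEDBN next-state prediction: walk the intermediate nodes in
# order, splitting aliased "effect bins" at branch nodes and keying every
# CPT lookup by the branch decisions made so far.
def predict_state_distribution(s, order, cedbn):
    # Each partial effect bin: (add_set, delete_set, probability, key).
    effects = [(frozenset(), frozenset(), 1.0, ())]
    for lit in order:
        # If lit is true in s, only the delete node is informative (the add
        # node is */#), and vice versa.
        sign = '-' if lit in s else '+'
        new_effects = []
        for add, dele, p, key in effects:
            p_one = cedbn.prob_one(lit, sign, key)
            changed = (add | {lit}, dele) if sign == '+' else (add, dele | {lit})
            if p_one == 1.0:    # determined by this key: extend, do not re-key
                new_effects.append((changed[0], changed[1], p, key))
            elif p_one == 0.0:  # this bin definitely leaves lit alone
                new_effects.append((add, dele, p, key))
            else:               # branch node: split the aliased bin
                new_effects.append((changed[0], changed[1], p * p_one,
                                    key + ((lit, sign, 1),)))
                new_effects.append((add, dele, p * (1 - p_one),
                                    key + ((lit, sign, 0),)))
        effects = new_effects
    # Apply each completed effect bin to s to form the next-state distribution.
    dist = {}
    for add, dele, p, _ in effects:
        s_next = frozenset((set(s) - dele) | add)
        dist[s_next] = dist.get(s_next, 0.0) + p
    return dist

Run on the running example (pre-state {C, D}, order A, B, C, D, and the CPT values computed earlier), the procedure branches once at A+ and once at B+ (under A+ = 0), recovering exactly ω1, ω2, and ω3 with probability 1/3 each.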