Hierarchical Methods for Optimal Long-Term Planning
Total Page:16
File Type:pdf, Size:1020Kb
UC Berkeley UC Berkeley Electronic Theses and Dissertations Title Hierarchical Methods for Optimal Long-Term Planning Permalink https://escholarship.org/uc/item/34q2f9vm Author Wolfe, Jason Andrew Publication Date 2011 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California Hierarchical Methods for Optimal Long-Term Planning by Jason Andrew Wolfe A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Stuart Russell, Chair Professor Christos Papadimitriou Professor Rhonda Righter Fall 2011 Hierarchical Methods for Optimal Long-Term Planning Copyright 2011 by Jason Andrew Wolfe 1 Abstract Hierarchical Methods for Optimal Long-Term Planning by Jason Andrew Wolfe Doctor of Philosophy in Computer Science University of California, Berkeley Professor Stuart Russell, Chair This thesis addresses the problem of generating goal-directed plans involving very many elementary actions. For example, to achieve a real-world goal such as earning a Ph.D., an intelligent agent may carry out millions of actions at the level of reading a word or striking a key. Given computational constraints, it seems that such long-term planning must incorporate reasoning with high-level actions (such as delivering a conference talk or typing a paragraph of a research paper) that abstract over the precise details of their implementations, despite the fact that these details must eventually be determined for the actions to be executed. This multi-level decision-making process is the subject of hierarchical planning. To most effectively plan with high-level actions, one would like to be able to correctly identify whether a high-level plan works, without first considering its low-level implementa- tions. The first contribution of this thesis is an \angelic" semantics for high-level actions that enables such inferences. This semantics also provides bounds on the costs of high-level plans, enabling the identification of provably high-quality (or even optimal) high-level solutions. Effective hierarchical planning also requires algorithms to efficiently search through the space of high-level plans for high-quality solutions. We demonstrate how angelic bounds can be used to speed up search, and introduce a novel decomposed planning framework that leverages task-specific state abstraction to eliminate many redundant computations. These techniques are instantiated in the Decomposed, Angelic, State-abstracted, Hierarchical A* (DASH-A*) algorithm, which can find hierarchically optimal solutions exponentially faster than previous algorithms. i Contents 1 Introduction 1 1.1 Classical Planning . 3 1.2 Hierarchical Planning . 3 1.3 Outline of the Thesis . 6 2 Background 10 2.1 Classical Planning and Search . 10 2.1.1 State-Space Search Problems . 10 2.1.2 Planning Problems: Representations and Examples . 13 2.1.3 State-Space Search Algorithms . 17 2.1.4 Partial-Order Planning . 22 2.2 AND/OR Graph Search . 23 2.2.1 AND/OR Search Problems . 23 2.2.2 Bottom-Up Search Algorithms . 30 2.2.3 Top-Down Search Algorithms . 31 2.2.4 Hybrid Search Algorithms . 41 2.3 Hierarchical Planning . 44 2.3.1 Historical Approaches . 46 2.3.2 This Thesis: Simplified Hierarchical Task Networks . 51 3 Decomposed Hierarchical Planning 60 3.1 Decomposition, Caching, and State Abstraction . 61 3.1.1 Decomposition and Caching . 61 3.1.2 State Abstraction . 65 3.2 Exhaustive Decomposed Search . 69 3.3 Cost-Ordered Decomposed Search . 70 3.3.1 DSH-LDFS: Recursive Top-Down Search . 70 3.3.2 DSH-UCS: Flattened Search . 74 ii 4 Angelic Semantics 77 4.1 Angelic Descriptions for Reachability . 78 4.1.1 Reachable Sets and Exact Descriptions . 78 4.1.2 Optimistic and Pessimistic Descriptions . 81 4.1.3 Simple Angelic Search . 83 4.2 Angelic Descriptions with Costs . 85 4.2.1 Valuations and Exact Descriptions . 85 4.2.2 Optimistic and Pessimistic Descriptions . 88 4.3 Representations and Inference . 91 4.3.1 Interface for Search Algorithms . 91 4.3.2 Declarative Representations: NCSTRIPS . 93 4.3.3 Procedural Specifications . 100 4.4 Origins of Descriptions . 102 4.4.1 Computing Descriptions . 103 4.4.2 Angelic Semantics, STRIPS, and the Real World . 105 5 Angelic Hierarchical Planning 108 5.1 Angelic Hierarchical A* . 109 5.1.1 Optimistic AH-A* . 109 5.1.2 AH-A* with Pessimistic Descriptions . 112 5.2 Singleton DASH-A* . 118 5.3 DASH-A* . 121 5.3.1 Introduction . 121 5.3.2 Overview . 127 5.3.3 Initial Algorithm . 132 5.3.4 Subsumption . 143 5.3.5 Same-Output Grouping . 144 5.3.6 Caching and State Abstraction . 150 5.3.7 Correctness of DASH-A* . 152 5.3.8 Analysis of DASH-A* . 155 5.4 Suboptimal Search . 157 6 Experimental Results 161 6.1 Implemented Algorithms . 161 6.2 Nav-Switch . 164 6.3 Discrete Manipulation . 167 6.4 Bit-Stash . 170 7 Related Work 174 7.1 Planning . 174 7.1.1 Explanation-Based Learning . 174 iii 7.1.2 Nondeterministic Planning . 175 7.1.3 HTN Planning Semantics . 175 7.2 Hierarchies of State Abstractions . 176 7.3 Hierarchical Reinforcement Learning . 178 7.4 Hierarchies in Robotics . 179 7.5 Interleaving Planning and Execution . 180 7.6 Formal Verification . 181 8 Conclusion 183 8.1 Summary . 183 8.2 Future Work . 184 8.2.1 Classical Planning . 184 8.2.2 Beyond Classical Planning . 185 8.3 Outlook . 186 iv Acknowledgments This dissertation would not have been possible without the help and support of advisors, colleagues, friends, and family. First and foremost, I would like to thank my advisor Stuart Russell. His wisdom, ency- clopedic knowledge of the field, and ample guidance have shaped many of the ideas in this dissertation, and helped me become a better writer, researcher, and thinker. I would also like to thank Bhaskara Marthi, who originated this research project, con- tributed to and helped refine many of the ideas presented here, and has taught me a lot about research over the years. I am very grateful to Professors Christos Papadimitriou and Rhonda Righter for serving on my dissertation committee, and offering helpful comments on this manuscript. Along with Professor Dan Klein, they also helped guide this research in its earlier stages by serving on my qualifying exam committee. This space is unfortunately too short to thank all those who have shaped my educa- tion and research, including a great number of brilliant faculty and students throughout undergrad and grad school at Berkeley. Conversations with Malik Ghallab, Pieter Abbeel, and many of the great researchers at Willow Garage stand out especially. Working on side- projects with Aria Haghighi and Dan Klein was also a great pleasure. My fellow RUGS members have been a never-ending source of great conversations, inspi- ration, and helpful suggestions on countless research ideas, papers, and talks over the years: Norm Aleks, Nimar Arora, Emma Brunskill, Kevin Canini, Shaunak Chatterjee, Daniel Duckworth, Nick Hay, Gregory Lawrence, Bhaskara Marthi, Brian Milch, David Moore, Rodrigo de Salvo Braz, Erik Sudderth, and Satish Kumar Thittamaranahalli. I am also grateful for all of the reviewers, researchers, and others I've had the pleasure to interact with throughout graduate school. In particular, the ICAPS community has inspired and welcomed papers based on many of the ideas in this thesis. Last and far from least, I would like to thank my friends and family. My parents, Rich and Sue, have always offered unconditional love and support for my endeavors, whatever they may be. I can't thank you enough! My girlfriend, Sarah Reed, has been a pillar of support and a perfect partner in crime over the past three years | I can't wait for you to join me as Dr. Reed! To all of my family and friends: I can't tell you how much your love, support, and understanding mean to me | you've helped make my time in graduate school the best of my life thus far. 1 Chapter 1 Introduction A central aspect of intelligence is the ability to plan ahead, selecting actions to do based on their expected impact on the world, including their influence on potential future action choices. In fact, nearly all actions that humans actually carry out on a day-to-day basis | striking a key, articulating a word, and so on | are only useful insofar as they take us tiny steps closer to achieving our long-term goals. Since the early days of Artificial Intelligence, hierarchical structure in behavior has been recognized as perhaps the most important tool for coping with the complex environments and long time horizons encountered in such real- world decision-making. Humans, who execute on the order of one trillion primitive motor commands in a lifetime, appear to make heavy use of hierarchy in their decision making| we entertain and commit to (or discard) high-level actions such as \write a Ph.D. thesis" and \run for President" without preplanning the motor commands involved, even though these high-level actions must eventually be refined down to motor commands (through many intermediate levels of hierarchy) in order to take effect. Consider a simulated robot tasked with cleaning up a room by delivering a set of objects (e.g., dirty cups, magazines) to target regions (e.g., dishwasher, coffee table), an example that will be used throughout this thesis. To accomplish this task, the robot must execute a coordinated sequence of many thousands of low-level \primitive" actions, corresponding to small movements of its base and arm, articulations of its gripper, and so on. Attempting to find such a solution by directly searching over the space of all possible such sequences seems like a hopeless, or at least misguided, pursuit.