Monitoring the Generation and Execution of Optimal Plans
Monitoring the Generation and Execution of Optimal Plans

by

Christian Fritz

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto

Copyright © 2009 by Christian Fritz

Abstract

Monitoring the Generation and Execution of Optimal Plans
Christian Fritz
Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
2009

In dynamic domains, the state of the world may change in unexpected ways during the generation or execution of plans. Regardless of the cause of such changes, they raise the question of whether they interfere with ongoing planning efforts. Unexpected changes during plan generation may invalidate the current planning effort, while discrepancies between the expected and actual state of the world during execution may render the executing plan invalid or sub-optimal with respect to previously identified planning objectives.

In this thesis we develop a general monitoring technique that can be used during both plan generation and plan execution to determine the relevance of unexpected changes, and which supports recovery. This way, time-intensive replanning from scratch in the new and unexpected state can often be avoided. The technique can be applied to a variety of objectives, including monitoring the optimality of plans, rather than just their validity. Intuitively, the technique operates in two steps: during planning, the plan is annotated with additional information that is relevant to the achievement of the objective; then, when an unexpected change occurs, this information is used to determine the relevance of the discrepancy with respect to the objective.
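The two steps just described can be illustrated with a minimal sketch for the simplest objective, plan validity. It is only an illustration under stated assumptions: the thesis works in the situation calculus, whereas this sketch substitutes a STRIPS-style set representation, and every name in it (`Action`, `regress`, `annotate`, `still_valid`, and the toy delivery domain) is hypothetical rather than taken from the thesis. It does not address optimality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action (an assumed simplification, not the thesis formalism)."""
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts the action makes true
    delete: frozenset   # facts the action makes false


def regress(condition, action):
    """Return the facts that must hold before `action` so that
    `condition` holds after it."""
    if condition & action.delete:
        raise ValueError(f"{action.name} destroys part of the condition")
    return (condition - action.add) | action.pre


def annotate(plan, goal):
    """Step 1 (planning time): annotations[i] is the condition under which
    the remaining plan plan[i:] still achieves the goal."""
    annotations = [frozenset(goal)]
    for action in reversed(plan):
        annotations.append(regress(annotations[-1], action))
    annotations.reverse()
    return annotations


def still_valid(annotations, step, state):
    """Step 2 (execution time): a discrepancy is relevant only if the
    observed state violates the annotation for the current step."""
    return annotations[step] <= frozenset(state)


# Hypothetical toy domain: carry a package from location a to location b.
pickup = Action("pickup", frozenset({"at_a", "pkg_at_a"}),
                frozenset({"holding"}), frozenset({"pkg_at_a"}))
move = Action("move", frozenset({"at_a"}),
              frozenset({"at_b"}), frozenset({"at_a"}))
drop = Action("drop", frozenset({"at_b", "holding"}),
              frozenset({"pkg_at_b"}), frozenset({"holding"}))

annotations = annotate([pickup, move, drop], {"pkg_at_b"})

# After pickup, a door unexpectedly closing is irrelevant to the plan,
# while unexpectedly losing the package is relevant.
irrelevant_change = still_valid(annotations, 1, {"at_a", "holding"})
relevant_change = not still_valid(annotations, 1, {"at_a", "door_open"})
```

In this sketch the annotation for each step mentions only the facts the remaining plan depends on, so a change to any other fact is recognized as irrelevant without replanning.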
We substantiate the claim of broad applicability of this relevance-based technique by developing four concrete applications: generating optimal plans despite frequent, unexpected changes to the initial state of the world, monitoring plan optimality during execution, monitoring the execution of near-optimal policies in stochastic domains, and monitoring the generation and execution of plans with procedural hard constraints. In all cases, we use the formal notion of regression to identify what is relevant for achieving the objective. We prove the soundness of these concrete approaches and present empirical results demonstrating that in some contexts orders of magnitude speed-ups can be gained by our technique compared to replanning from scratch.

Dedication

To my parents.

Acknowledgements

First and foremost I want to thank my supervisor, Sheila McIlraith, who, from all possible perspectives, is the best supervisor I could imagine. This regards not only the scientific guidance she gave me, which was always characterized by a maximal degree of objectivity and rationality, but also the mentoring I have received from her. Sheila was always an active, and pro-active, supporter of my, and her other students', best interests, constantly looking out for and setting up new opportunities for us, both in research and in academic networking. I greatly appreciate her flexibility regarding the topics I wanted to work on, and her support and encouragement when pursuing my own ideas or collaborating with other students. It is due to what I have learned from her that I now feel ready to pursue independent research.

I thank Jorge Baier for the endless discussions, whiteboard sessions, and collaborations. Jorge's analytical skills and his clear and calm way of pondering a problem have always been inspiring to me. Moreover, I greatly value his friendship.
Besides Jorge, I was fortunate to have a number of other collaborators whom I would like to thank for working with me: Meghyn Bienvenu, for our collaborations on planning with preferences; Richard Hull and Jianwen Su for giving me the chance to work with them at Bell Labs and beyond, and for broadening my horizon and triggering my interest in work-flow related issues.

For their extremely valuable feedback, comments, and suggestions I thank the members of my supervisory committee, Hector Levesque, Fahiem Bacchus, and Craig Boutilier. The presented thesis has greatly benefited from the thorough review and constructive criticism they have provided me with over the course of my program at the department. This also applies to my external examiner, Sven Koenig, who has done an extremely thorough and critical review of this thesis, providing a lot of useful comments and suggestions for future work.

I thank Gerhard Lakemeyer, who, as the supervisor of my Master's research in Aachen, gave me the chance to apply my algorithms in the highly dynamic RoboCup domain. This first-hand experience highlighted the need for, and triggered my interest in, the topic of this thesis. The chance to learn about the entire spectrum, from sophisticated, state-of-the-art theory to pragmatic, assumptions-defying practice, has made a lasting impression on me and has always been a source of motivation for my research.

As a member of the Knowledge Representation Group, I have benefited numerous times from the feedback provided by its other members, and I thank them for attending and critiquing my practice talks and presentations of early stage ideas. I have always appreciated the friendly, constructive, yet critically objective atmosphere that has been characteristic of our group meetings. My gratitude also extends to the Department of Computer Science itself, for creating a stimulating environment and for hosting a long list of excellent and inspiring guest speakers.
Finally, I thank my girlfriend Tanya for her loving and understanding support in hard times.

Contents

1 Introduction
  1.1 Motivation
  1.2 Framework
    1.2.1 Comparison To Existing Frameworks
  1.3 Outline and Contributions
    1.3.1 Contributions
2 Background
  2.1 The Situation Calculus
    2.1.1 Basic Action Theories
    2.1.2 The Frame Problem and A Solution for Deterministic Actions
  2.2 Regression
  2.3 Notation and Definitions
3 Monitoring Plan Validity
  3.1 Introduction
  3.2 Characterizing Existing Approaches
    3.2.1 Integrated Planning and Execution
    3.2.2 Expectation-Based Monitoring
    3.2.3 State Evaluation
  3.3 An Abstract Monitoring Approach
    3.3.1 Other Uses of Relevance in Planning
  3.4 Monitoring Plan Validity in the Situation Calculus
4 Monitoring Plan Optimality During Execution
  4.1 Introduction
    4.1.1 Contributions
  4.2 A∗ Search Based Planning
  4.3 A Sufficient Condition for Optimality
  4.4 Algorithm
    4.4.1 Annotation
    4.4.2 Execution Monitoring
    4.4.3 Exploiting the Search Tree
  4.5 An Illustrative Example
  4.6 Empirical Results
  4.7 Discussion
5 Generating Optimal Plans in Highly Dynamic Environments
  5.1 Introduction
    5.1.1 Contributions
    5.1.2 About Optimality in Dynamic Environments
  5.2 Algorithm
    5.2.1 Regression-Based A∗ planning
    5.2.2 Recovering from Unexpected Changes
  5.3 Empirical Results
  5.4 Discussion
6 Monitoring Policy Execution in Stochastic Domains
  6.1 Introduction
    6.1.1 Contributions
  6.2 Background
    6.2.1 Representing MDPs
    6.2.2 Solving MDPs through Search
  6.3 Algorithm
    6.3.1 Annotation
    6.3.2 Execution Monitoring
  6.4 An Illustrative Example
  6.5 Analysis
    6.5.1 Space Complexity
    6.5.2 Optimality Considerations
    6.5.3 Empirical Results
  6.6 Discussion
7 Generating and Executing Plans with Procedural Control
  7.1 Introduction
    7.1.1 Contributions
  7.2 Background
    7.2.1 Golog and ConGolog
  7.3 Compiling ConGolog into Basic Action Theories
  7.4 Analysis
    7.4.1 Theoretical Merits
    7.4.2 Practical Merits
  7.5 Monitoring the Execution of ConGolog Programs
  7.6 Related Work
  7.7 Discussion
8 Related Work
  8.1 Replanning
    8.1.1 Plan Repair
    8.1.2 Backtracking
    8.1.3 Learning
  8.2 Contingency Planning
  8.3 State Estimation
  8.4 External Monitoring
  8.5 Meta-reasoning
  8.6 Control Theory
9 Conclusion
  9.1 Summary
  9.2 Contributions