Long-Term Planning and Reactive Execution in Highly Dynamic Environments

DISSERTATION

Submitted for the academic degree of Doktoringenieurin (Dr.-Ing.), accepted by the Fakultät für Informatik of the Otto-von-Guericke-Universität Magdeburg

by M.Sc. Xenija Neufeld, born 18.01.1990 in Nikolskoje

Reviewers (Gutachterinnen/Gutachter):
Prof. Dr.-Ing. habil. Sanaz Mostaghim
Prof. Dr. Simon Lucas
Prof. Dr. Mike Preuss

Magdeburg, 17.12.2020

Abstract

In many highly dynamic environments, artificial agents need to follow long-term goals and are therefore required to reason and plan far into the future. At the same time, while following long-term plans, the agents must stay reactive to environmental changes and act deliberately while always maintaining robust and secure behaviors. In many cases, such agents act as parts of a larger system and need to collaborate while coordinating their actions. Generating agent behaviors that allow for long-term planning and reactive acting is a complex task, which becomes even more challenging with an increasing number of agents and an increasing size of the search space. This thesis focuses on video games as highly dynamic multi-agent environments and proposes a solution that combines long-term planning with reactive execution. On the one hand, the existing literature proposes a variety of planning approaches; however, plans executed in highly dynamic environments are very likely to fail during their execution, which can lead to high replanning frequencies and delayed execution. On the other hand, there are various reactive decision-making approaches that allow agents to quickly adjust their behaviors to environmental changes, but such approaches usually do not allow for long-term planning. Inspired by approaches observed in areas such as spacecraft control, robotics, and video games, this thesis proposes a hybrid approach.
The general idea of the hybrid solution combines a Hierarchical Task Network (HTN) planner and a reactive approach in a three-layer architecture. The approach separates the decision-making responsibilities between a planner, which is responsible for abstract long-term planning, and a reactive approach, which is responsible for local decision-making and task refinement at execution time. The major contribution of this work, which allows for such interleaved decision-making and continuous execution of long-term plans, is an extension of the plan tasks that is used by the reactive approach at execution time. The thesis describes two different implementations of this solution, using either Behavior Trees or Monte Carlo Tree Search (MCTS) as the reactive approach. It examines the effects of the interleaved decision-making in two different highly dynamic video game environments and evaluates the performance of agents using the hybrid approaches, comparing them to existing benchmark agents. Additionally, it proposes a way to automatically improve the execution of long-term tasks using an Evolutionary Algorithm. The results of the performed experiments show that, compared to a pure planning approach, the proposed solutions reduce the global replanning frequencies and decrease the total execution time of multi-agent long-term plans while increasing the success rates of their execution. Furthermore, the extended high-level plan tasks can be used to guide the search process of MCTS, resulting in emergent agent behaviors, which can be further improved by a learning mechanism such as an Evolutionary Algorithm.

Zusammenfassung

In many highly dynamic environments, agents must pursue long-term goals and therefore be able to plan far into the future. While executing long-term plans, they must be able to react quickly to changes in their environment and always exhibit deliberate, robust, and secure behavior.
In many cases, they act as parts of a larger system and must coordinate their actions. Generating agent behaviors that enable both the pursuit of long-term plans and reactive acting is a great challenge, which becomes even more complex with an increasing number of agents and an increasing size of the search space. This thesis examines video games as highly dynamic multi-agent environments and proposes a solution that combines the pursuit of long-term goals with reactive acting. On the one hand, the existing literature describes a multitude of different planning approaches; however, long-term plans often fail during their execution in highly dynamic environments. This can lead to frequent replanning and potentially delay the execution of the plans. On the other hand, many reactive decision-making systems exist that enable quick adjustments of agent behaviors but cannot plan far into the future. Inspired by different approaches from the areas of spaceflight, robotics, and video games, this thesis proposes a hybrid approach. In its basic idea, the approach combines a Hierarchical Task Network planner and a reactive decision-making system in a three-layer architecture. The decision-making responsibility is divided between the planner, which is responsible for abstract long-term planning, and a reactive system, which makes local decisions and refines the abstract tasks during execution. The main contribution of this work, which enables coupled decision-making and uninterrupted execution, is an extension of the planning domain, which is used by the reactive system during execution.
The thesis describes two concrete implementations of the proposed solution, which use either Behavior Trees or Monte Carlo Tree Search (MCTS) as the reactive system. The effects of the combined decision-making are examined in two different highly dynamic video game environments, and the hybrid agents are compared with existing benchmark agents based on their playing performance. In addition, a way to improve the execution of long-term tasks using an Evolutionary Algorithm is proposed. Experimental results show that, compared to a pure planning approach, the proposed solution can reduce the global frequency of replanning as well as the execution time of long-term plans while increasing the success rate of their execution. The extension of the planning domain furthermore makes it possible to guide the search process of MCTS, giving rise to emergent behaviors that can be further adapted by a learning mechanism such as an Evolutionary Algorithm.

Contents

1 Introduction
  1.1 Motivation
  1.2 Highly Dynamic Environments
  1.3 Goals of the Thesis
  1.4 Structure of the Thesis
2 Background and Related Work
  2.1 Reactive Decision-Making Approaches
    2.1.1 Behavior Trees
    2.1.2 Monte Carlo Tree Search
  2.2 Planning Approaches
    2.2.1 Classical Planning
    2.2.2 Real-Time Planning
    2.2.3 Hierarchical Task Network Planning
  2.3 Planning and Execution
    2.3.1 Interleaved Planning and Execution
    2.3.2 Multi-Agent Planning and Execution
  2.4 Conclusion
3 HTN Planning in a Highly Dynamic Game
  3.1 Goals
  3.2 Test Environment
  3.3 HTN Fighter
    3.3.1 Two-layer Architecture
    3.3.2 Planning Domain
    3.3.3 Top Layer
    3.3.4 Bottom Layer
  3.4 Evaluation
    3.4.1 Ordered Method Selection
    3.4.2 UCB Method Selection
  3.5 Limitations of Pure HTN Planning
  3.6 Conclusion
4 Hybrid Approach: General Idea
  4.1 Goals
  4.2 Three-Layer Architecture
  4.3 Top Layer
  4.4 Planning Domain
  4.5 Middle Layer
  4.6 Bottom Layer
5 Hybrid Approach I: HTN + BT
  5.1 Goals
  5.2 Test Environment
  5.3 Hybrid Approach
    5.3.1 Three-layer Architecture
    5.3.2 Planning Domain
    5.3.3 Top Layer
    5.3.4 Middle Layer
    5.3.5 Bottom Layer
  5.4 Evaluation
  5.5 Conclusion
6 Hybrid Approach II: HTN + MCTS
  6.1 Goals
  6.2 Test Environment
  6.3 Hybrid Approach
    6.3.1 Three-layer Architecture
    6.3.2 Planning Domain
    6.3.3 Top Layer
    6.3.4 Middle Layer
    6.3.5 Bottom Layer
  6.4 First Evaluation
  6.5 Evolution of Evaluation Functions
    6.5.1 Application of the Genetic Algorithm
    6.5.2 Second Evaluation
  6.6 Conclusion
7 Conclusion
  7.1 Summary
  7.2 Discussion
  7.3 Limitations and Future Work
Bibliography
List of Figures
List of Tables
Acronyms
Glossary
A Combo Results of the FightingIce Experiments with UCB Method Selection
B HTN Domains for the Hybrid Approach I
  B.1 Pure HTN Domain
  B.2 Hybrid HTN Domain
C Behavior Trees for the Hybrid Approach I
D Game Maps for the Hybrid Approach II

Chapter 1

Introduction

In many virtual as well as real-world environments, artificial agents are required to operate deliberately and pursue long-term goals under quickly changing environmental conditions. The difficulty of achieving such goals while staying reactive increases when further requirements such as coordination or cooperation are added. This work focuses on such problems and proposes different solutions to them. This chapter explains the motivation of this work in more detail and describes fundamental characteristics of the kind of environments considered in this work.
It clarifies the problems and challenges arising from such environments and, finally, defines the goals of this thesis.

1.1 Motivation

Virtual and physical artificial agents operate in many environments such as video games, simulation environments, smart factories, and various robotics applications. Often, they contribute to an overall high-level goal while being part of a larger system. In order to help achieve such goals, the agents are required to reason far into the future, searching for feasible (and potentially optimal) sequences of actions, and to execute them in a robust way over a potentially long period of time.
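As context for the chapters that follow, the three-layer idea summarized in the abstract (a top layer plans abstract long-term tasks, a middle layer refines them reactively at execution time, and a bottom layer executes primitive actions) can be sketched in a few lines of Python. This is a minimal toy illustration only; all function names and the toy fighting domain below are hypothetical and are not the implementation described in this thesis.

```python
# Toy sketch of a three-layer architecture: the long-term plan is produced
# once at an abstract level, while each abstract task is refined into
# primitive actions only at execution time, based on the current world state.

def plan_top_layer(goal):
    """Toy 'planner': returns a fixed sequence of abstract tasks for a goal."""
    domain = {"win_fight": ["approach", "attack", "retreat"]}
    return list(domain.get(goal, []))

def refine_middle_layer(task, world):
    """Reactive refinement: choose primitive actions from the current state."""
    if task == "approach":
        return ["step_forward"] * world["distance"]
    if task == "attack":
        return ["punch" if world["enemy_low_hp"] else "kick"]
    if task == "retreat":
        return ["step_back", "step_back"]
    return []

def execute_bottom_layer(action, world):
    """Apply a primitive action to the (toy) world state."""
    if action == "step_forward":
        world["distance"] -= 1
    elif action == "step_back":
        world["distance"] += 1
    return world

def run(goal, world):
    """Interleave refinement and execution while the abstract plan stays fixed."""
    trace = []
    for task in plan_top_layer(goal):                     # long-term plan
        for action in refine_middle_layer(task, world):   # refined reactively
            world = execute_bottom_layer(action, world)
            trace.append(action)
    return trace

world = {"distance": 2, "enemy_low_hp": True}
print(run("win_fight", world))
# prints ['step_forward', 'step_forward', 'punch', 'step_back', 'step_back']
```

The point of the sketch is the division of responsibility: the top layer never reasons about primitive actions, and the middle layer never changes the order of abstract tasks; it only decides, at execution time, how each task is carried out.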