Long-Term Planning and Reactive Execution in Highly Dynamic Environments

DISSERTATION

Submitted for the academic degree of Doktoringenieurin (Dr.-Ing.), accepted by the Fakultät für Informatik of the Otto-von-Guericke-Universität Magdeburg

by M.Sc. Xenija Neufeld, born 18.01.1990 in Nikolskoje

Reviewers (Gutachterinnen/Gutachter):
Prof. Dr.-Ing. habil. Sanaz Mostaghim
Prof. Dr. Simon Lucas
Prof. Dr. Mike Preuss

Magdeburg, 17.12.2020

Abstract

In many highly dynamic environments, artificial agents need to follow long-term goals and are therefore required to reason and plan far into the future. At the same time, while following long-term plans, the agents must stay reactive to environmental changes and act deliberately while always maintaining robust and secure behaviors. In many cases, such agents act as parts of a larger system and need to collaborate while coordinating their actions. Generating agent behaviors that allow for long-term planning and reactive acting is a complex task, which becomes even more challenging with an increasing number of agents and an increasing size of the search space. This thesis focuses on video games as highly dynamic multi-agent environments and proposes a solution that combines long-term planning with reactive execution. On the one hand, the existing literature proposes a variety of planning approaches; however, plans executed in highly dynamic environments are very likely to fail during their execution, which can lead to high replanning frequencies and delayed execution. On the other hand, there are various reactive decision-making approaches that allow agents to quickly adjust their behaviors to environmental changes, but such approaches usually do not allow for long-term planning. Inspired by approaches observed in areas such as spacecraft control, robotics, and video games, this thesis proposes a hybrid approach.
The general idea of the hybrid solution combines a Hierarchical Task Network (HTN) planner and a reactive approach in a three-layer architecture. The approach separates the decision-making responsibilities between a planner, which is responsible for abstract long-term planning, and a reactive approach, which is responsible for local decision-making and task refinement at execution time. The major contribution of this work, which allows for such interleaved decision-making and continuous execution of long-term plans, is an extension of the plan tasks that is used by the reactive approach at execution time. The thesis describes two different implementations of this solution, using either Behavior Trees or Monte Carlo Tree Search (MCTS) as the reactive approach. It examines the effects of the interleaved decision-making in two different highly dynamic video game environments and evaluates the performance of agents using the hybrid approaches, comparing them to existing benchmark agents. Additionally, it proposes a way to automatically improve the execution of long-term tasks using an Evolutionary Algorithm. The results of the performed experiments show that, compared to a pure planning approach, the proposed solutions reduce the global replanning frequencies and decrease the total execution time of multi-agent long-term plans while increasing the success rates of their execution. Furthermore, the extended high-level plan tasks can be used to guide the search process of MCTS, resulting in emergent agent behaviors, which can be further improved by a learning mechanism such as an Evolutionary Algorithm.

Zusammenfassung

In many highly dynamic environments, agents must pursue long-term goals and therefore be able to plan far into the future. While executing long-term plans, they must be able to react quickly to changes in their environment and always exhibit deliberate, robust, and secure behavior.
In many cases, they act as parts of a larger system and must coordinate their actions. Generating agent behaviors that enable both the pursuit of long-term plans and reactive acting is a great challenge, which becomes even more complex with an increasing number of agents and an increasing size of the search space. This thesis examines video games as highly dynamic multi-agent environments and proposes a solution that combines the pursuit of long-term goals with reactive acting. On the one hand, the existing literature describes a multitude of different planning approaches; however, long-term plans often fail during their execution in highly dynamic environments. This can lead to frequent replanning and potentially delay the execution of the plans. On the other hand, many reactive decision-making systems exist that enable quick adjustments of agent behaviors but cannot plan far into the future. Inspired by different approaches from the areas of spaceflight, robotics, and video games, this thesis proposes a hybrid approach. In its basic idea, the approach combines a Hierarchical Task Network planner and a reactive decision-making system in a three-layer architecture. The decision-making responsibility is divided between the planner, which is responsible for abstract long-term planning, and a reactive system, which makes local decisions and refines the abstract tasks during execution. The main contribution of this work, which enables coupled decision-making and uninterrupted execution, is an extension of the planning domain, which is used by the reactive system during execution.
The thesis describes two concrete implementations of the proposed solution, which use either Behavior Trees or Monte Carlo Tree Search (MCTS) as the reactive system. The effects of the combined decision-making are examined in two different highly dynamic video game environments, and the hybrid agents are compared with existing benchmark agents based on their playing performance. In addition, a way to improve the execution of long-term tasks using an Evolutionary Algorithm is proposed. Experimental results show that, compared to a pure planning approach, the proposed solution can reduce the global frequency of replanning as well as the execution time of long-term plans while increasing the success rate of their execution. The extension of the planning domain furthermore makes it possible to guide the search process of MCTS, giving rise to emergent behaviors that can be further adapted by a learning mechanism such as an Evolutionary Algorithm.

Contents

1 Introduction
  1.1 Motivation
  1.2 Highly Dynamic Environments
  1.3 Goals of the Thesis
  1.4 Structure of the Thesis
2 Background and Related Work
  2.1 Reactive Decision-Making Approaches
    2.1.1 Behavior Trees
    2.1.2 Monte Carlo Tree Search
  2.2 Planning Approaches
    2.2.1 Classical Planning
    2.2.2 Real-Time Planning
    2.2.3 Hierarchical Task Network Planning
  2.3 Planning and Execution
    2.3.1 Interleaved Planning and Execution
    2.3.2 Multi-Agent Planning and Execution
  2.4 Conclusion
3 HTN Planning in a Highly Dynamic Game
  3.1 Goals
  3.2 Test Environment
  3.3 HTN Fighter
    3.3.1 Two-layer Architecture
    3.3.2 Planning Domain
    3.3.3 Top Layer
    3.3.4 Bottom Layer
  3.4 Evaluation
    3.4.1 Ordered Method Selection
    3.4.2 UCB Method Selection
  3.5 Limitations of Pure HTN Planning
  3.6 Conclusion
4 Hybrid Approach: General Idea
  4.1 Goals
  4.2 Three-Layer Architecture
  4.3 Top Layer
  4.4 Planning Domain
  4.5 Middle Layer
  4.6 Bottom Layer
5 Hybrid Approach I: HTN + BT
  5.1 Goals
  5.2 Test Environment
  5.3 Hybrid Approach
    5.3.1 Three-layer Architecture
    5.3.2 Planning Domain
    5.3.3 Top Layer
    5.3.4 Middle Layer
    5.3.5 Bottom Layer
  5.4 Evaluation
  5.5 Conclusion
6 Hybrid Approach II: HTN + MCTS
  6.1 Goals
  6.2 Test Environment
  6.3 Hybrid Approach
    6.3.1 Three-layer Architecture
    6.3.2 Planning Domain
    6.3.3 Top Layer
    6.3.4 Middle Layer
    6.3.5 Bottom Layer
  6.4 First Evaluation
  6.5 Evolution of Evaluation Functions
    6.5.1 Application of the Genetic Algorithm
    6.5.2 Second Evaluation
  6.6 Conclusion
7 Conclusion
  7.1 Summary
  7.2 Discussion
  7.3 Limitations and Future Work
Bibliography
List of Figures
List of Tables
Acronyms
Glossary
A Combo Results of the FightingIce Experiments with UCB Method Selection
B HTN Domains for the Hybrid Approach I
  B.1 Pure HTN Domain
  B.2 Hybrid HTN Domain
C Behavior Trees for the Hybrid Approach I
D Game Maps for the Hybrid Approach II

Chapter 1

Introduction

In many virtual as well as real-world environments, artificial agents are required to operate deliberately and pursue long-term goals under quickly changing environmental conditions. The difficulty of achieving such goals while staying reactive increases when further requirements such as coordination or cooperation are added. This work focuses on such problems and proposes different solutions to them. This chapter explains the motivation of this work in more detail and describes fundamental characteristics of the kind of environments considered in this work.
It clarifies the problems and challenges arising from such environments and, finally, defines the goals of this thesis.

1.1 Motivation

Virtual and physical artificial agents operate in many environments such as video games, simulation environments, smart factories, and various robotics applications. Often, they contribute to an overall high-level goal while being part of a larger system. In order to help achieve such goals, the agents are required to reason far into the future, searching for feasible (and potentially optimal) sequences of actions, and to execute them in a robust way over a potentially long period of time.
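As context for the chapters that follow, the three-layer idea summarized in the abstract (a top layer plans abstract long-term tasks, a middle layer refines them reactively at execution time, and a bottom layer executes primitive actions) can be sketched in a few lines of Python. This is a minimal toy illustration only; all function names and the toy fighting domain below are hypothetical and are not the implementation described in this thesis.

```python
# Toy sketch of a three-layer architecture: the long-term plan is produced
# once at an abstract level, while each abstract task is refined into
# primitive actions only at execution time, based on the current world state.

def plan_top_layer(goal):
    """Toy 'planner': returns a fixed sequence of abstract tasks for a goal."""
    domain = {"win_fight": ["approach", "attack", "retreat"]}
    return list(domain.get(goal, []))

def refine_middle_layer(task, world):
    """Reactive refinement: choose primitive actions from the current state."""
    if task == "approach":
        return ["step_forward"] * world["distance"]
    if task == "attack":
        return ["punch" if world["enemy_low_hp"] else "kick"]
    if task == "retreat":
        return ["step_back", "step_back"]
    return []

def execute_bottom_layer(action, world):
    """Apply a primitive action to the (toy) world state."""
    if action == "step_forward":
        world["distance"] -= 1
    elif action == "step_back":
        world["distance"] += 1
    return world

def run(goal, world):
    """Interleave refinement and execution while the abstract plan stays fixed."""
    trace = []
    for task in plan_top_layer(goal):                     # long-term plan
        for action in refine_middle_layer(task, world):   # refined reactively
            world = execute_bottom_layer(action, world)
            trace.append(action)
    return trace

world = {"distance": 2, "enemy_low_hp": True}
print(run("win_fight", world))
# prints ['step_forward', 'step_forward', 'punch', 'step_back', 'step_back']
```

The point of the sketch is the division of responsibility: the top layer never reasons about primitive actions, and the middle layer never changes the order of abstract tasks; it only decides, at execution time, how each task is carried out.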