Outcome Prediction and Hierarchical Models in Real-Time Strategy Games
by Adrian Marius Stanescu

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Computing Science
University of Alberta

© Adrian Marius Stanescu, 2018


Abstract

For many years, traditional boardgames such as Chess, Checkers or Go have been the standard environments for testing new Artificial Intelligence (AI) algorithms aimed at building robust game-playing agents capable of defeating the best human players. More recently, the focus has shifted towards games that offer even larger action and state spaces, such as Atari and other video games. With a unique combination of strategic thinking and fine-grained tactical combat management, Real-Time Strategy (RTS) games have emerged as one of the most popular and challenging research environments. Besides state space complexity, RTS properties such as simultaneous actions, partial observability and real-time computing constraints make them an excellent testbed for decision-making algorithms under dynamic conditions.

This thesis makes contributions towards achieving human-level AI in these complex games. Specifically, we focus on learning, using abstractions and performing adversarial search in real-time domains with extremely large action and state spaces, for which forward models might not be available.

We present two abstract models for combat outcome prediction that are accurate yet computationally inexpensive. These models can inform high-level strategic decisions, such as when to force or avoid fighting, or serve as evaluation functions for look-ahead search algorithms. In both cases we obtained stronger results than the state-of-the-art heuristics of the time.

We introduce two approaches to designing adversarial look-ahead search algorithms that rely on abstractions to reduce search complexity. Firstly, Hierarchical Adversarial Search uses multiple search layers that work at different abstraction levels to decompose the original problem. Secondly, Puppet Search methods use configurable scripts as an action abstraction mechanism and offer more design flexibility and control. Both methods show performance similar to top scripted and state-of-the-art search-based agents on small maps, while outperforming them on larger ones. We show how to use Convolutional Neural Networks (CNNs) to effectively improve spatial awareness and evaluate game outcomes more accurately than our previous combat models. When incorporated into adversarial look-ahead search algorithms, this evaluation function increased their playing strength considerably.

In these complex domains forward models might be very slow or even unavailable, which makes search methods more difficult to use. We show how policy networks can be used to mimic our Puppet Search algorithm and to bypass the need for a forward model during gameplay. We combine the resulting, much faster method with other search-based tactical algorithms to produce RTS game-playing agents that are stronger than state-of-the-art algorithms. We then describe how to eliminate the need for simulators or forward models entirely by using Reinforcement Learning (RL) to learn autonomous, self-improving behaviors. The resulting agents defeated the built-in AI convincingly and showed complex cooperative behaviors in small-scale scenarios of a fully fledged RTS game.

Finally, learning becomes more difficult when controlling increasing numbers of agents.
We introduce a new approach that uses CNNs to produce a spatial decomposition mechanism and makes credit assignment from a single team reward signal more tractable. Applied to a standard Q-learning method, this approach resulted in increased performance over the original algorithm in both small and large scale scenarios.


Acknowledgements

I would first like to thank my supervisor Michael Buro for his excellent guidance and support throughout my doctoral studies, and for his help with less academic ventures such as becoming a prize-winning Skat player and a passable skier. I would also like to thank my committee members Vadim Bulitko and Jonathan Schaeffer for their comments and feedback that helped improve my work. Thanks to Dave Churchill for paving the way and helping me get started. Special thanks to Nicolas Barriga for all the coffee and moral support shared before deadlines, as well as the pizza, movie and scotch afterwards. Thanks to my roommates Sergio, Mona and Graham for letting me win at boardgames, sharing pancakes and countless other delicious meals. Thanks to Aristotle for offering invaluable assistance in staying positive (while overspending on biking items) and to Stuart for helping me stay fit.

I would especially like to thank my family. My wife, Raida, has been extremely supportive throughout this entire adventure and has made countless sacrifices to help me get to this point. My mom deserves many thanks for her infinite patience and implicit faith in my capabilities.


Table of Contents

1 Introduction
  1.1 Real Time Strategy Games
    1.1.1 Multiplayer Online Battle Arena (MOBA) Games
    1.1.2 eSports and RTS Game AI
  1.2 Challenges
  1.3 Contributions
  1.4 Publications
2 Literature Survey
  2.1 Reactive Control
  2.2 Tactics
  2.3 Strategy
  2.4 Holistic Approaches
  2.5 Deep Reinforcement Learning in RTS Games
    2.5.1 Reinforcement Learning Approaches
    2.5.2 Overview of Deep RL Methods used in RTS Games
3 Predicting Opponent's Production in Real-Time Strategy Games with Answer Set Programming
  3.1 Introduction
  3.2 Answer Set Programming
  3.3 Predicting Unit Production
    3.3.1 Generating Unit Combinations
    3.3.2 Removing Invalid Combinations
    3.3.3 Choosing the Most Probable Combination
  3.4 Experiments and Results
    3.4.1 Experiment Design
    3.4.2 Evaluation Metrics
    3.4.3 Results
    3.4.4 Running Times
  3.5 Conclusion
  3.6 Contributions Breakdown and Updates Since Publication
4 Predicting Army Combat Outcomes in StarCraft
  4.1 Introduction
    4.1.1 Purpose
    4.1.2 Motivation
    4.1.3 Objectives
  4.2 Background
    4.2.1 Bayesian networks
    4.2.2 Rating Systems
  4.3 Proposed model & Learning task
    4.3.1 The Model
    4.3.2 Learning
  4.4 Experiments
    4.4.1 Data
    4.4.2 Who Wins
    4.4.3 What Wins
  4.5 Future Work & Conclusions
  4.6 Contributions Breakdown and Updates Since Publication
5 Using Lanchester Attrition Laws for Combat Prediction in StarCraft
  5.1 Introduction
  5.2 Background
    5.2.1 Lanchester Models
  5.3 Lanchester Model Extensions for RTS Games
    5.3.1 Learning Combat Effectiveness
  5.4 Experiments and Results
    5.4.1 Experiments Using Simulator Generated Data
    5.4.2 Experiments in Tournament Environment
  5.5 Conclusions and Future Work
  5.6 Contributions Breakdown and Updates Since Publication
6 Hierarchical Adversarial Search Applied to Real-Time Strategy Games
  6.1 Introduction
    6.1.1 Motivation and Objectives
  6.2 Background and Related Work
    6.2.1 Multi-Agent Planning
    6.2.2 Goal Driven Techniques
    6.2.3 Goal Based Game Tree Search
  6.3 Hierarchical Adversarial Search
    6.3.1 Application Domain and Motivating Example
    6.3.2 Bottom Layer: Plan Evaluation and Execution
    6.3.3 Middle Layer: Plan Selection
    6.3.4 Implementation Details
  6.4 Empirical Evaluation
    6.4.1 Results
  6.5 Conclusions and Future Work
  6.6 Contributions Breakdown and Updates Since Publication
7 Evaluating Real-Time Strategy Game States Using Convolutional Neural Networks
  7.1 Introduction
  7.2 Related Work
    7.2.1 State Evaluation in RTS Games
    7.2.2 Neural Networks
  7.3 A Neural Network for RTS Game State Evaluation
    7.3.1 Data
    7.3.2 Features
    7.3.3 Network Architecture & Training Details
  7.4 Experiments and Results
    7.4.1 Winner Prediction Accuracy
    7.4.2 State Evaluation in Search Algorithms
  7.5 Conclusions and Future Work
  7.6 Contributions Breakdown and Updates Since Publication
8 Combining Strategic Learning and Tactical Search in RTS Games
  8.1 Puppet Search: Enhancing Scripted Behavior by Look-Ahead Search with Applications to RTS Games
    8.1.1 Contributions Breakdown and Updates Since Publication
  8.2 Game Tree Search Based on Non-Deterministic Action Scripts in RTS Games
    8.2.1 Contributions Breakdown
  8.3 Combining Strategic Learning and Tactical Search in RTS Games
    8.3.1 Contributions Breakdown and Updates Since Publication
9 Case Study: Using Deep RL in Total War: Warhammer
  9.1 Introduction
  9.2 Implementation
    9.2.1 Network Architecture
    9.2.2 Reward Shaping
    9.2.3 Experimental Setting
    9.2.4 Models and Training
  9.3 Results
  9.4 Hierarchical Reinforcement Learning
    9.4.1 Implementation
    9.4.2 Results
  9.5 Conclusions and Future Work
  9.6 Contributions Breakdown and Updates Since Publication
10 Multi-Agent Action Network Learning Applied to RTS Game Combat
  10.1 Introduction
  10.2 Background and Related Work
    10.2.1 Independent Q-Learning
    10.2.2 Centralised Learning
    10.2.3 Value Decomposition
    10.2.4 Abstractions and Hierarchies
  10.3 Spatial Action Decomposition Learning
    10.3.1 Spatial Action Decomposition Using Sectors
    10.3.2 Network Architecture