
Optimizing an Evolutionary Approach to Machine Generated Artificial Intelligence for Games

Master’s Thesis MTA 161030

Aalborg University Medialogy

http://www.aau.dk

Title: Optimizing an Evolutionary Approach to Machine Generated Artificial Intelligence for Games

Theme: Master's Thesis

Project Period: Spring Semester 2016

Project Group: MTA161030

Participants: Andrei Vlad Constantin, Richard Alan Cupit, Konstantinos Monastiridis

Supervisor: Martin Kraus

Copies: 1

Number of Pages: 90

Date of Completion: May 24, 2016

Abstract: This thesis presents an investigation into how to effectively optimize the production of machine generated game AI, exploring the behavior tree model and evolutionary computation. The optimization methods focus on providing a 'proof of concept' that a system can be designed and implemented, through a series of studies, capable of producing game AIs with alternative behaviors within a playthrough of a game. The construction of these behaviors should be informed by the evaluation of previous behaviors, as well as show a quantifiable improvement in performance. The studies evaluate the performance of a generated AI for the game XCOM 2, a turn-based tactics game. The AIs are evaluated by running combat simulations against the standard AI implemented by its developers. Ultimately, the results of the process led to a user experiment, in which the most successful machine generated game AI won 50% of matches.

The content of this report is freely available, but publication (with reference) may only be pursued in agreement with the authors.

Contents

List of Figures

List of Tables

1 Introduction

2 Background
  2.1 Game AI
    2.1.1 Perspectives
    2.1.2 History
    2.1.3 Modern Video Game AI
  2.2 Behavior Trees
    2.2.1 Overview
    2.2.2 Uses in game industry and research
  2.3 Evolutionary Algorithms
    2.3.1 Genetic Algorithms
    2.3.2 Genetic Programming
    2.3.3 Uses in game industry and research
  2.4 Evolving Behavior Trees
  2.5 Platform of Application
    2.5.1 Turn Based Tactic Games
    2.5.2 XCOM 2

3 Project Statement

4 Design and Implementation
  4.1 Mod Implementation
    4.1.1 Game Systems
    4.1.2 Normalization
    4.1.3 Environmental Cover
    4.1.4 Default AI
  4.2 Genetic Algorithm Implementation
    4.2.1 Chromosome Design
    4.2.2 Example Chromosome Implementation
    4.2.3 WatchMaker Framework
    4.2.4 Generational Evolution Engine

5 Experiment And Results
  5.1 Pilot Test
    5.1.1 Design
    5.1.2 Analysis of Data
  5.2 Study 1
    5.2.1 Design
    5.2.2 Analysis of Data
  5.3 Study 2
    5.3.1 Design
    5.3.2 Analysis of Data
  5.4 Study 3
    5.4.1 Design
    5.4.2 Analysis of Data
    5.4.3 Final Evaluation
  5.5 User Testing

6 Discussion and Conclusion
  6.1 Discussion
    6.1.1 Dynamic Elitism
    6.1.2 Fitness Function
    6.1.3 Chromosome Structure
    6.1.4 Unit Conditions and Decisions
  6.2 Conclusion

7 Future Directions
  7.1 Further Development
  7.2 Alternative Directions

Bibliography

Appendices
  A. Extra Content
  B. Unit Condition Implementation
  C. Unit Decision Implementation
  D. Questionnaire
  E. Classifying Evaluation Matches

List of Figures

2.1 Sequence checking for Ammunition and, if so, the agent Reloads and the Sequence returns Success.
2.2 The Selector will return Success when either of the depicted Actions returns Success.
2.3 Visual representation of a string encoded chromosome, holding the solution variables
2.4 Visual representation of a string encoded chromosome, holding random variables
2.5 Single-point crossover performed on a string.
2.6 Two-point crossover performed on a string.
2.7 Visualization of Uniform Crossover. The 'H' characters represent the positive result of a coin toss.
2.8 Visualization of Input (top) and Output (bottom) strings from a genetic mutation operator. The character 'H' represents a successful coin toss.

4.1 XCOM 2 unit movement UI.
4.2 An example decision tree structure, with example string representations
4.3 Complete example chromosome
4.4 Call to instantiate an Evolution Engine of type string, using the Generational Evolution Engine interface
4.5 Example fitness evaluator code

5.1 Chromosome encoding for the first action point
5.2 Example of crossover producing undesirable offspring
5.3 Average fitness % and win % of candidates per generation of the Pilot Test.
5.4 Graph showing the number of Unit Conditions contained within candidate solutions that won a minimum of one game


5.5 Graph showing the number of Unit Decisions contained within candidate solutions that won a minimum of one game
5.6 Chromosome decision structure for Study One
5.7 Chromosome encoding for the first action point
5.8 Average fitness % and win % of candidates per generation for Study One. Fitness average does not include modifier
5.9 Graph showing the number of Unit Conditions contained within candidate solutions that won a minimum of one game in Study One
5.10 Graph showing the number of Unit Decisions contained within candidate solutions that won a minimum of one game in Study One
5.11 Average fitness % and win % of candidates per generation for Study 2. Fitness average does not include modifier
5.12 Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches in Study 2
5.13 Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches in Study 2
5.14 Example chromosome structures showing the irrelevance of the order of the Unit Conditions
5.15 Average fitness % and win % of candidates per generation for Study 3. Fitness average does not include modifier
5.16 Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches in Study 3
5.17 Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches in Study 3
5.18 Playtime differences between XCOM 2 and XCOM: Enemy Unknown/Enemy Within
5.19 Correlation between participants' play times in descending order and combat outcome statistics
5.20 Most evolved candidate AI: 'BCDgdcdaihfADEigidfdeg' UC&D breakdown

1 Classification error rates for a trained K-nearest neighbor classifier, evaluating both sets
2 Confusion matrix of a K-nearest neighbor classifier evaluating the allOnlyResims set

List of Tables

4.1 Example Unit Condition characters for the GA to choose from, and their identifiers.
4.2 Example Unit Decision characters for the GA to choose from, and their identifiers.

5.1 Set of Unit Conditions used in the Pilot Test.
5.2 Set of Unit Decisions used in the Pilot Test.
5.3 Pilot Test's solution space size.
5.4 Candidates that won multiple games over the course of the evolution.
5.5 Set of Unit Conditions used in Study 1.
5.6 Set of Unit Decisions used in Study 1.
5.7 Solution space size for Study 1
5.8 Success of elite candidates produced by the first 3 generations of Study 1
5.9 Elite candidate performance during Study One.
5.10 Elite candidate performance during Study One.
5.11 Set of Unit Conditions used in Study 2.
5.12 Set of Unit Decisions used in Study 2.
5.13 Size of solution space for Study 2.
5.14 Elite candidate performance during Study 2.
5.15 Combat performance information about candidates that failed to win consecutive matches.
5.16 Study One's solution space size.
5.17 Elite candidate performance during Study 3.
5.18 Combat performance information about candidates that failed to win consecutive matches.
5.19 Combat performance information about candidates that failed to win consecutive matches.
5.20 Results from the Wilcoxon rank sum test, comparing candidates from Study 3 against those from Study 2


5.21 Results from the User Testing performed on the 5 best BTs evolved and on the Default AI.
5.22 Results from the User Testing performed on the 5 best BTs evolved and on the Default AI.

Chapter 1

Introduction

In recent years, the game industry has shown tremendous growth, becoming a multi-billion-dollar industry reaching millions of consumers [4]. This, and the continued increase in computational power, has resulted in advances in every aspect of video game software.

A single-player video game usually involves a player competing against enemies or obstacles. Hence, there is a need for the further development of game Artificial Intelligence (AI). AI is applied in different aspects of games, from movement to Non-Player Character (NPC) behaviors and reactions, and thus has a large impact on gameplay. This impact makes game AI a crucial part of game development. However, developing a strong game AI is an arduous task. The AI agents need to behave realistically, as well as seem as human as possible. Furthermore, they need to be able to react to random events and make decisions dynamically, depending on a player's reactions. These demanding attributes have made game AI development an area that the game industry has focused on less, in comparison to areas such as graphics, animation or physics. At the same time, this lack of focus makes game AI an interesting research topic for both academic and industry use.

Many game AI strategies have been developed to cater to the often disparate needs of games from different genres. Behavior trees (BTs) have been proposed as a new approach to the design of game AI. Their advantages over traditional AI approaches include simplicity of design and implementation, scalability as games grow larger and more complex, and modularity that aids reusability and portability. The popularization of BTs within the gaming industry [10] and the scientific community has led to research that applies techniques favored by fields of traditional AI, such as evolutionary algorithms. This research shows that diverse sets of BTs can be generated [14] which are able to solve the problem of defeating an opponent for specific game genres, by generating strategies that might not be immediately obvious to a developer manually implementing an AI [18], or simply by creating strategies that solve specific situational needs.

This research has the potential to expedite the production of game AIs, and to create sets of AIs which present diverse behaviors, or even use the generated behaviors as the sole method by which a game alters the challenge presented to players. However, such AIs still need to be pre-computed and implemented into a game; they cannot adjust their behavior based on the actions of a player, and thus still offer a predictable experience. Methods for creating machine generated game AIs are usually expensive, requiring a large amount of sample data. This has resulted in very little research exploring how to optimize these methods to such a degree that they are able to generate AIs within a single playthrough of a game. If this were possible, it could offer developers new ways to challenge players.

This report presents an investigation into how to effectively optimize the production of machine generated game AIs, using techniques from a sub-field of machine learning, evolutionary computation, in which 'candidates' are evaluated on their performance by some metric. The optimization methods proposed focus on providing a 'proof of concept' that a system can be designed and implemented, through a series of studies. The system should be capable of producing game AIs with alternative behaviors within a playthrough of a game, and the construction of these behaviors should be informed by the evaluation of previous behaviors, and show a quantifiable improvement in performance. The studies conducted evaluate the performance of a generated AI for the game XCOM 2, a turn-based tactics (TBT) game. The AIs will be evaluated by running simulated game scenarios against the standard AI behavior implemented by the developers of the game.

Chapter 2

Background

2.1 Game AI

In the video games industry, developers typically draw upon existing methodologies from the field of artificial intelligence to create behaviors for NPCs, which attempt to simulate the observed behavior of some known entity. However, a distinction should be made between what is considered general purpose AI, which encompasses many scientific disciplines attempting to solve the problem of creating a genuine intelligence, and game AI, which often refers to a broad set of algorithms that also employ techniques from control theory, computer graphics and computer science.

Traditionally, the development of game AI has been driven by providing the illusion of intelligence to players, and is focused on generating interesting or challenging gameplay, distinguishing it from the fields of general AI. Workarounds are employed to circumvent the limited intelligence of game AIs; for example, the difficulty of a game can be increased by making the player face more and more enemies.

Game AI and heuristic algorithms are utilized in a wide array of game systems. The most obvious is the control of any NPCs in the game, with scripting currently being the most common method. Pathfinding is another common use for AI [17], widely seen in real-time strategy games. Pathfinding is the method for determining how to get an NPC from one point in a game world to another, taking into consideration the terrain, obstacles and possibly visibility. The concept of emergent AI has also been explored in games¹ ² ³. The AI entities in these games are able to "learn" and adapt their behavior by analyzing the consequences of a player's actions, rather than the input driving the actions. While these choices are part of a limited pool, they often give the desired appearance of an intelligence on the other side of the screen.

1 Halo series, Bungie Studios, 2001
2 Black and White, Lionhead Studios, 2001
3 F.E.A.R., Monolith Productions, 2005

2.1.1 Perspectives

In recent years, game developers have shown an increasing awareness of scientific AI methods, and there is a growing interest in computer games within the academic community. There are significant differences between the various application domains of AI, which serve to show that game AI can be viewed as a distinct sub-field of general AI. A note must be made, however, that some game AI problems cannot be solved without workarounds. As an example, calculating the position of an obscured object based on previous observations is considered a very difficult problem when the AI is deployed in a robotics simulation, but in a computer game the NPC can simply look up the position in the game's scene graph. This can, however, also lead to unrealistic behavior, making it not always desirable.

2.1.2 History

Games featuring a single-player campaign with AI controlled enemies started appearing in the 1970s, with AI unit movement being based on stored patterns. Later, around the 1980s, the success of arcade video games like Space Invaders⁴ started popularizing the idea of AI opponents. Over the course of the next 10 years, this concept developed via the addition of features such as difficulty levels, distinct movement patterns, game events dependent on player input, unit formations, individual enemy personalities and leader-follower hierarchies, to name only a few of the advancements featured by the games of that decade.

In the 1990s, the emergence of new game genres and the general growth of the industry led to the creation of formal AI tools like Finite State Machines (FSMs). Real-time strategy games, for example, tasked the AI with many objectives under incomplete information, including pathfinding, real-time decision making and economic planning. Although the first games featuring this new AI implementation had major issues with the system, later games exhibited more sophisticated AI, confirming the benefits of the method and leading to further development of the concept.

2.1.3 Modern Video Game AI

Once these formal AI models became popularized, research and development shifted towards improving the behavior of computer controlled units. One example of the more beneficial and efficient features found in contemporary game AI is the ability to hunt player units. Many of the initial AIs exhibited what was perceived as machine-like behavior, which makes sense considering the binary nature of "yes/no" decisions. If the player was present in a specific area, the AI would react either entirely offensively or entirely defensively. By contrast, in this 'hunting' state, the AI will look for realistic cues, such as sounds made by the other units or footprints they may have left [24]. These developments ultimately allowed for more complex sets of rules, leading to a richer gameplay experience, because the player is encouraged to actually consider how to approach, or whether to avoid, an enemy.

Another valuable breakthrough in game AI was the development of a "survival instinct" for AI controlled units. In-game, the computer can recognize the shifting state of different objects or events in the environment and determine whether it is beneficial or detrimental to its survival. The AI can then search for an advantageous or safe position before engaging in a scenario that would otherwise leave it vulnerable, such as reloading or throwing a grenade. This can be achieved by setting markers that tell the AI when to act in a certain manner.

Alternatively, an AI could contain a condition that checks its avatar's health throughout a game; further commands can then be set so that it reacts in a specific way at a certain percentage of health. The more creative the conditions are, the more interesting the AI behaviors that can be achieved. However, the conditions and actions making up behaviors like these are usually optimized to make the AI seem more human. Even so, that is a very difficult task and there is still room for improvement in this area. Unlike a human player, the AI must be programmed for all possible scenarios, which severely compromises its ability to surprise the player unless optimized at doing so, perhaps via the help of other aspects of the game [24].

4 Tomohiro Nishikado, 1978
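The health-threshold rule described above can be sketched as a small decision function. This is an illustrative example of ours, not code from any shipped game; the function name and the 25%/50% thresholds are invented for the sake of the example:

```python
def choose_action(health: int, max_health: int) -> str:
    """Pick a behavior based on the avatar's remaining health percentage.

    Thresholds are arbitrary illustrative values: below 25% the
    'survival instinct' kicks in, below 50% the unit plays cautiously.
    """
    ratio = health / max_health
    if ratio <= 0.25:
        return "retreat_to_safe_position"  # survival instinct
    if ratio <= 0.50:
        return "move_to_cover"             # cautious play
    return "engage_enemy"                  # healthy enough to fight

print(choose_action(80, 100))  # engage_enemy
```

The more such conditions an AI checks, the richer the resulting behavior, which is exactly what the behavior tree formalism in the next section organizes systematically.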

2.2 Behavior Trees

Hierarchical, state-based techniques are simple and intuitive, so they can provide good solutions. Nevertheless, as they increase in size they become too complicated, and editing them can be risky, as simple reconfigurations could make the whole AI system break down. Furthermore, those methods lack flexibility, meaning that changes in design could require extensive programming work. Behavior trees (BTs) help to avoid these problems, providing a means to describe sophisticated behaviors through a simple hierarchical decomposition using basic building blocks.

2.2.1 Overview

A BT is a mathematical model of plan execution used in various fields of computer science, and in video games to implement game AI. This model is a type of finite state machine, with the purpose of switching between a given set of tasks in a modular fashion. The most beneficial aspect of BTs is their ability to create very complex tasks out of simple tasks, regardless of how the simple tasks are implemented. BTs also share features with hierarchical state machines, with the main building block of a behavior being a task rather than a state. A high level of readability makes BTs accessible to developers with varying levels of coding experience, and they are less prone to errors, which has seen them embraced by the game developer community [1][2].

Behavior trees provide a hierarchical way of organizing behaviors in descending order of complexity. They are made of nodes, with the outgoing node of a connected pair being the parent, and the incoming node being the child. The childless nodes are called leaves, and the unique parentless node is the Root. Each node in a BT, with the exception of the Root, is one of several possible types: Composite node (Selector, Sequence, Parallel or Decorator), Condition node or Action node. There is no limit on how many children a node can have. The execution of a BT always begins from the Root, which periodically sends ticks to its child with a certain frequency. A tick is a signal that enables the execution of a child. When a node in the BT is executed, it returns to its parent a status of RUNNING if its execution has not yet finished, SUCCESS if it has achieved its goal, or FAILURE in any other case [20][3].

Condition Nodes

A condition node checks whether a certain condition has been met. In order to accomplish this, the node must have a target variable (e.g. "Does the player have ammunition?") and a criterion on which to base the decision (e.g. "Is the player's ammunition enough for shooting?"). These nodes return SUCCESS if the condition has been met and FAILURE otherwise. Conditions never return RUNNING, nor do they change the state of the system.

Action Nodes

Action nodes perform computations that change the system state. This can be, for example, shooting an enemy with a specific weapon. An action node returns SUCCESS if the action completes (the player shoots), FAILURE if for any reason it could not be finished, for instance if the player has no ammunition, and RUNNING while the computation is still executing.

Composite Nodes

In order to remain relatively simple to work with while maintaining versatility, composite nodes are often employed. These flow-control nodes define the way in which the tree will be computed. The execution order will change according to the type and attributes of the composite node, but also according to the return values of its children. The two simplest composite nodes are the Selector node and the Sequence node.

• Sequence Node
Sequence nodes tick their child nodes in a defined order, executing them sequentially from left to right. A Sequence node returns SUCCESS if and only if all of its children return SUCCESS, and FAILURE as soon as one child node returns FAILURE. In programmatic terms, a Sequence works identically to the logical AND function. Figure 2.1 depicts a simple example of a Sequence sub-tree. Here, the Sequence node has two children, one condition ("Needs Reload?") and one action ("Reload"). In this sub-tree, the unit checks whether it needs to reload and, if that condition returns SUCCESS, the agent performs the "Reload" action; when the action also returns SUCCESS, the whole Sequence returns SUCCESS.

Figure 2.1: Sequence checking for Ammunition and, if so, the agent Reloads and the Sequence returns Success.

• Selector Node
A Selector node is the operational opposite of a Sequence node. The execution order remains unchanged, but Selector nodes return SUCCESS immediately when one of their children returns SUCCESS, and FAILURE only when all of their children return FAILURE. Analogously, a Selector is the behavior tree counterpart to the logical OR function. In figure 2.2, an example Selector is depicted. Here, the Selector node will return SUCCESS when one of its children returns SUCCESS. If the agent fails at moving to cover, the Selector will try to execute the next action, and the agent will attempt to use the ability "Hunker Down".

Figure 2.2: The Selector will return Success when either of the depicted Actions returns Success.

• Decorator Nodes
The purpose of a decorator node is to add functionality to a modular behavior, without necessarily knowing what that behavior does. In a sense, it takes the original behavior and adds new features, which increases the readability and expressiveness of behavior trees. An example of a Decorator node is one that inverts the result state of its child, similar to the NOT operator. There is no default algorithm for decorators; it depends on their purpose.

• Parallel Nodes The parallel node ticks all children at the same time, allowing them to work in parallel, a way to use concurrency in behavior trees. Parallel nodes return SUCCESS if the number of succeeding children is larger than a local constant S (this constant may be different for each parallel node), return FAILURE if the number of failing children is larger than a local constant F, or return RUNNING otherwise.
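The node semantics above can be made concrete with a minimal, illustrative sketch. All class and function names here are our own invention (the thesis itself builds trees for XCOM 2, not with this code); the sub-tree at the end mirrors the reload example of figure 2.1:

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Sequence:
    """Ticks children left to right; fails on the first FAILURE (logical AND)."""
    def __init__(self, *children): self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != Status.SUCCESS:
                return status  # FAILURE or RUNNING propagates upward
        return Status.SUCCESS

class Selector:
    """Ticks children left to right; succeeds on the first SUCCESS (logical OR)."""
    def __init__(self, *children): self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != Status.FAILURE:
                return status  # SUCCESS or RUNNING propagates upward
        return Status.FAILURE

class Condition:
    """Leaf checking a predicate; returns SUCCESS or FAILURE, never RUNNING."""
    def __init__(self, predicate): self.predicate = predicate
    def tick(self, ctx):
        return Status.SUCCESS if self.predicate(ctx) else Status.FAILURE

class Action:
    """Leaf that mutates state via an effect function returning a Status."""
    def __init__(self, effect): self.effect = effect
    def tick(self, ctx):
        return self.effect(ctx)

class Inverter:
    """Decorator that flips SUCCESS/FAILURE of its child (logical NOT)."""
    def __init__(self, child): self.child = child
    def tick(self, ctx):
        status = self.child.tick(ctx)
        if status == Status.SUCCESS: return Status.FAILURE
        if status == Status.FAILURE: return Status.SUCCESS
        return status

# Sub-tree mirroring figure 2.1: reload when ammunition is needed.
def reload(ctx):
    ctx["ammo"] = ctx["max_ammo"]
    return Status.SUCCESS

needs_reload = Condition(lambda ctx: ctx["ammo"] < ctx["max_ammo"])
reload_tree = Sequence(needs_reload, Action(reload))

ctx = {"ammo": 1, "max_ammo": 5}
print(reload_tree.tick(ctx), ctx["ammo"])  # Status.SUCCESS 5
```

With a full magazine, the condition fails and the Sequence fails without running the Action, which is exactly the AND-like short-circuiting described above.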

2.2.2 Uses in game industry and research

Behavior trees were first introduced in the game industry around 2004, most notably for Halo 2 by Damian Isla [10] and Façade by Andrew Stern and Michael Mateas [16], both building upon prior work in the fields of robotics and intelligent virtual agents. In later years, BTs became popular in the gaming industry as they could easily be used to implement game AI at different levels of complexity. Thus, they were used in AAA games such as the Halo series⁵, Spore⁶ and Black and White⁷.

Destroy All Humans! 2⁸ is an open world game where players are free to roam around and experience the game as they see fit. This aspect makes the production of an AI even more challenging [12]. Although this non-linear gameplay has been shown to immerse players, it also makes it difficult for the developers to control, limit, and pre-script the scenarios which players encounter. The way Pandemic Studios' developers tackled this issue was by modifying the classic behavior tree formalism of a hierarchical finite state machine (HFSM) into a more modular, "puzzle piece" system that was more flexible and easier to use. In their implementation, everything that characters can do in the game is constructed by putting together states. A state can have further sub-states, which are activated if the parent state is activated as well. Each sub-state is a smaller part of the parent state, responsible for a more specific job. By using this division, they were able to reuse, override or delete sub-states, making the behaviors in the game more dynamic.

In Crysis⁹, a first-person shooter by Crytek, the developers expanded the use of behavior trees by implementing a system for coordinated tactical actions among the NPCs [23]. In their implementation, they created a two-phase process. In the first phase, ideal candidates for the coordinated tactics are marked, but the action does not begin. The second phase starts the coordinated action once the minimum number of candidate NPCs have been marked as ideal. This implementation eliminates the chance of high priority actions being overridden, but also allows coordinated actions between NPCs, making the gameplay more interesting and sophisticated.

Driver: San Francisco, part of the Driver franchise¹⁰, is a mission based action-adventure game. The game offers a variety of missions that required creating AI drivers with different personalities and goals, such as reckless racers, cops or getaway drivers. Each driver had a specific goal which was in charge of generating and updating the paths that controlled their cars. These goals were built using an extension of traditional behavior trees called Hinted-execution Behavior Trees (HeBT) [19], which allow developers to dynamically modify AI behaviors. HeBT give developers an extra layer of control over their trees and allow them to create and test new features in a plug-in fashion. Agents were able to take hints about their behaviors; hints were pieces of information suggesting changes in how the agents should react. In the BTs created for each agent, a priority of actions was given to each Selector. This priority could change depending on the given hint, without having to redesign the whole branch [19]. Going one step further, the developers added a new type of Condition, called Hint Conditions, in order to improve the way Sequences worked with the new type of Selectors. Hint Conditions were able to bypass certain conditions depending on the hint given, resulting in the preferred behavior. With the method described, the developers were able to tweak and modify traditional BTs with few actual changes.

Stephan Delmer describes in his research [25] that BTs can be very helpful in the development of game AI for different game genres. More specifically, he points out the requirement for a highly sophisticated AI when it comes to real-time strategy games. In this genre, the AI should be able to both micro- and macro-manage operations in real time. Delmer mentions that human players solve this problem by putting their decisions in a hierarchy, and continues by suggesting a new kind of BT that imitates the human's hierarchical approach: the Hierarchical Behavior Tree System, or HBT. The AI must command all aspects of the game, and although a single tree could achieve this, its construction would be overly complicated and difficult to maintain. The proposed method tackles this problem by splitting the decision-making process into sub-trees for each aspect of the AI's behavior.

5 Bungie Studios, 2001
6 Maxis Studios, 2008
7 Lionhead Studios, 2001
8 Pandemic Studios, 2006
9 Crytek, 2007
10 Ubisoft, 2011

2.3 Evolutionary Algorithms

Evolutionary Algorithms (EAs) are inspired by the Darwinian principles of evolution and natural selection [9]. Over the course of many generations, species change to better suit their environmental needs. This is driven through the process of non-random natural selection, in which individuals of a species that are better suited to an environment, due to traits inherited from their parents or random mutation, are more likely to survive and produce offspring than individuals which are not [8].

EAs employ a simplified model of biological evolution in order to solve problems. To solve a target problem with an EA, one must create an environment in which a potential solution (individual) can evolve. This environment should be shaped by attributes which help define the problem, and encourage the evolution of good solutions [11].

2.3.1 Genetic Algorithms

Genetic Algorithms (GAs) are a class of EAs, and are implemented as a computer simulation of the evolutionary process [9]. The most common of these, the Generational Genetic Algorithm, has been shown to generate useful solutions to optimization and search problems [13] without having to enumerate every possible candidate within the solution space.

Methodology

Once an initial population of individuals is generated, the Generational Genetic Algorithm follows an iterative process, in which each individual within a generation is rated by a fitness function with respect to the problem being solved [8]. Once an entire population has been evaluated, a new generation can be created. During procreation, parent candidate solutions are obtained by applying a chosen selection strategy to the rated generation. This strategy discriminates in relation to candidates' fitness ratings. Offspring candidate solutions are generated using genetic operators, which recombine and mutate the parent candidate solutions to form a new generation of solutions.

Generational Genetic Algorithm Outline 1. Initialization Generate a population of individuals. These can be random, created or a combination of both.

2. Evaluation
Evaluate each individual using a fitness function.

3. Selection
Parent candidates for the next generation are chosen according to some selection strategy, favoring those with a higher fitness rating.

4. Evolution
Generate a new population by applying genetic operators such as crossover and mutation to pairs of selected individuals, producing offspring.

5. Iteration
Repeat steps 2-4 until a solution is found that meets a termination condition.
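The five steps above can be expressed as a generic loop. The following Python sketch is illustrative only - the toy bit-string problem, function names and parameter values are all assumptions chosen for demonstration, not code from the project:

```python
import random

def generational_ga(init, fitness, select, crossover, mutate,
                    pop_size, generations):
    # 1. Initialization: build the starting population.
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):  # 5. Iteration: fixed budget used here
        # 2. Evaluation: pair every individual with its fitness value.
        scored = [(fitness(ind), ind) for ind in population]
        new_population = []
        while len(new_population) < pop_size:
            # 3. Selection: pick two parents from the rated generation.
            p1, p2 = select(scored), select(scored)
            # 4. Evolution: recombine and mutate to produce an offspring.
            new_population.append(mutate(crossover(p1, p2)))
        population = new_population
    return max(population, key=fitness)

# Toy problem (an assumption for demonstration): evolve a 10-bit
# chromosome toward all ones, so the maximum fitness is 10.
best = generational_ga(
    init=lambda: [random.randint(0, 1) for _ in range(10)],
    fitness=sum,
    select=lambda scored: max(random.sample(scored, 2))[1],  # tournament of 2
    crossover=lambda a, b: a[:5] + b[5:],  # fixed-midpoint crossover
    mutate=lambda c: [1 - g if random.random() < 0.05 else g for g in c],
    pop_size=30, generations=40)
```

Passing the operators in as functions mirrors the outline: each of the concepts in the following subsections (chromosome, fitness, selection, crossover, mutation) plugs into one slot of this loop.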

Chromosome

In a GA, chromosomes are sets of parameters which define candidate solutions to a target problem. The standard representation of a chromosome is an array of bits, where the bits encode information that potentially provides a partial solution to the problem; however, other data structures can be used as well.

Chromosome design is introduced here in the form of a string of characters, as this closely follows the methodology employed in this report. For example, imagine that the solution to a given problem could be represented by the string "Am I a problem?", with each character representing a part of the solution to the larger problem. In this case, a chromosome could be represented by a string of fixed length, with each element able to take on some range of character values (e.g. alphabetic/ASCII). What these character values represent within the context of the problem is not relevant to this discussion. Figure 2.3 below shows a visual representation of each element of this chromosome.

Figure 2.3: Visual representation of a string encoded chromosome, holding the solution variables.

Population

A population is a set of individuals containing potential solutions to a target problem. Each generation of the algorithm will produce a new population of these individuals. Often the initial generation consists of randomly generated candidate solutions. An example of what a random individual might look like within the introduced context can be seen in figure 2.4 below.

Figure 2.4: Visual representation of a string encoded chromosome, holding random variables.
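As a concrete illustration, an initial population for the string example could be generated as follows. This is a sketch, not the project's code; the character pool is an assumption, since the report does not specify the exact range of allowed values:

```python
import random
import string

# Assumed allowed value range for each element (letters, punctuation
# and space, matching the kind of characters shown in the figures).
GENE_POOL = string.ascii_letters + string.punctuation + " "
CHROMOSOME_LENGTH = len("Am I a problem?")

def random_chromosome(length=CHROMOSOME_LENGTH):
    # Each element takes a uniformly random value from the allowed range.
    return "".join(random.choice(GENE_POOL) for _ in range(length))

def initial_population(size):
    # The first generation is simply `size` random candidate solutions.
    return [random_chromosome() for _ in range(size)]
```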

Fitness Function

The design of a GA's fitness function is critical in arriving at a solution to a problem, as it determines how well a candidate solution (chromosome) solves the problem. This is quantified by a numerical measure, which provides the fitness value by which candidate solutions are evaluated.

The form of a fitness function depends on the nature of the target problem. Following on from the above example of matching string values, if a character within a candidate solution is an exact match for the character at the same index within the target string, the candidate solution could gain a fitness value of 1, and 0 if not - meaning that the maximum fitness value would be the length of the string itself.

Alternatively, the characters at every index of the candidate and target strings could be compared, but rather than checking for an exact match, the distance between their ASCII values could be measured. For example, the character 'I' in the target string is represented by the ASCII value 73; if the corresponding character of a candidate solution were 'P', which has the ASCII value 80, then a quantifiable distance measure of 7 is available. The summed distances could then form a negative representation of an individual's fitness, with the ultimate solution's fitness value being 0.
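Both scoring schemes described above can be written as short fitness functions. This is an illustrative sketch of the two variants, not the fitness function used later in the project:

```python
def match_fitness(candidate, target):
    # 1 point per exact positional match; the maximum fitness is
    # therefore the length of the target string.
    return sum(c == t for c, t in zip(candidate, target))

def distance_fitness(candidate, target):
    # Negated sum of per-position ASCII distances; a perfect match
    # scores 0 and every mismatch pushes the fitness below zero.
    return -sum(abs(ord(c) - ord(t)) for c, t in zip(candidate, target))
```

For example, `distance_fitness("I", "P")` yields -7, matching the 'I' (73) versus 'P' (80) distance discussed above.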

Selection Strategy

Selection strategies decide which individuals will be used to form the chromosomes of a new generation. Individuals are either copied entirely to the new generation, or paired with others to become parents of offspring candidate solutions, which will populate an evolved generation.

All selection strategies favor the fitness value of individuals when choosing a parent. How a selection strategy utilizes the fitness value to select parents creates selection pressure. Because the GA selects parent candidates iteratively, individuals have the opportunity to be parents and reproduce multiple times, and the likelihood of this can increase with fitness value. Generally, when selection pressure is high, the fittest individuals are selected more often, breeding out those who are unfit.

Roulette Wheel Selection is a fitness-proportionate selection technique in which selection occurs based on the ratio of an individual's fitness to the fitness of all others within a population. The probability of selecting a candidate c can be summarized by equation 2.1.

P_sel(c) = Fitness(c) / ∑_{i=1}^{n} Fitness(c_i)    (2.1)

Unlike a real roulette wheel, where each section generally has the same sized slice of the wheel, and thus the same probability of being chosen, this fitness-proportionate method figuratively allows individuals to take up as much of the wheel as the ratio of their fitness to others' dictates. This makes it possible, and indeed probable, that one or more individuals will be selected as parents more than once. This is desirable, as in nature fitter individuals might be expected to breed more than those who are less fit. However, it can lead to premature convergence on less optimal solutions than other selection strategies.

Rank Selection is similar to Roulette Wheel Selection, except that the selection probability is based on fitness rank rather than the fitness value itself. Individuals are ranked by their absolute fitness values, and the probability of selection is based on this rank. This means it makes no difference to the selection preference whether the highest ranked candidate is 100% or 0.01% fitter than the next ranked; the selection probabilities will be the same. This strategy tends to avoid premature convergence, as it applies less selection pressure for large fitness differentials during early generations. However, as an optimal solution is approached in later generations, this method begins to apply higher selection pressure by amplifying small fitness differences.

Tournament Selection is almost the default selection strategy for GAs, as it works well for a wide range of problems. Each time a parent candidate is to be selected, a random sample of individuals from the population is ranked by fitness and the fittest of the sample is selected. The size of the sample taken from the population determines the selection pressure of this strategy: the more individuals sampled, the higher the chance of selecting fitter individuals. When the sample size is chosen to be 2 individuals, the selection pressure is often applied via a fixed probability: during selection, a random number between 0 and 1 is generated, and if it is lower than the fixed probability - which is usually greater than 0.5 to favor fitter individuals - then the fitter individual is selected. Both variants provide a simple way to control selection pressure.

Elitism is effectively a form of truncation selection, where a defined number of the fittest individuals are copied directly to the next generation. Due to the way offspring candidates are generated from parent candidates, ideal candidate solutions can sometimes be lost, and the offspring can be 'weaker' than the parents. GAs will often rediscover these candidate solutions later on, but this is not guaranteed. Elitism is designed to combat this, and can have a large impact on performance by ensuring that the algorithm does not waste time re-discovering partial solutions that were previously lost. Individuals which are preserved through generations via elitism remain eligible for selection as parents when breeding the rest of the offspring candidates of a new generation.
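The strategies above can be sketched as follows. The data layout (a list of `(fitness, individual)` pairs) and the 0.75 tournament probability are illustrative assumptions:

```python
import random

def roulette_select(scored):
    # Fitness-proportionate selection: each individual occupies a slice
    # of the wheel sized by its share of the total fitness.
    # `scored` is a list of (fitness, individual) pairs, fitness >= 0.
    total = sum(f for f, _ in scored)
    pick = random.uniform(0, total)
    running = 0.0
    for f, ind in scored:
        running += f
        if running >= pick:
            return ind
    return scored[-1][1]  # guard against floating-point round-off

def tournament_select(scored, sample_size=2, p_fitter=0.75):
    # Sample individuals at random; with probability p_fitter pick the
    # fittest of the sample, otherwise pick a random sampled one.
    sample = random.sample(scored, sample_size)
    if random.random() < p_fitter:
        return max(sample, key=lambda fi: fi[0])[1]
    return random.choice(sample)[1]

def elites(scored, count):
    # Elitism: copy the `count` fittest individuals straight through.
    return [ind for _, ind in sorted(scored, key=lambda fi: fi[0],
                                     reverse=True)[:count]]
```

Rank selection is omitted for brevity; it follows the roulette sketch with the fitness values replaced by rank positions.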

Genetic Operators

Genetic operators are used to produce the offspring candidates which form the next generation. This process maintains genetic diversity, which is necessary for successful evolution.

Crossover

This genetic operator is what distinguishes GAs from many other evolutionary algorithms [15], as selected candidates are not always simply copied to the next generation. Crossover takes two selected candidates and recombines them into offspring candidates. How this mixing and matching occurs depends on the format of the chromosome, but there are some commonly used methods.

• Single-Point Crossover
A single-point crossover method generates a point within the chromosome of the parent candidates, and swaps the values before that point. Continuing the string example, for a string of length L, a number would be generated between 0 and L-1, and this crossover point would represent a character's index within the string. All characters stored in the indices below the crossover point would be swapped between the parent candidates to form two offspring. See figure 2.5.

Figure 2.5: Single-point crossover performed on a string.

If the number generated is at either extreme of the range, little or no crossover occurs. It is possible to define custom ranges for crossover point generation, such that crossover occurs more centrally within the data structure, should this be desirable. A problem with single-point crossover is that it can inhibit evolution when there is linkage between elements of a chromosome: if neighboring elements rely on each other more than others to form partial solutions, then the generated crossover point can favor breaking up certain sections of the chromosome [15], due to the length of those sections.

• Two-Point Crossover
Two-point crossover swaps all elements of the parent candidates between two generated points in order to produce offspring. See figure 2.6. This method helps reduce the bias created by chromosome linkage; however, longer sections of linked elements are still more likely to be broken up than shorter ones. The number of crossover points can be increased further in multi-point crossover methods, where every second 'section' formed between the crossover points is swapped.

Figure 2.6: Two-point crossover performed on a string.

• Uniform Crossover
In order to treat each element fairly with respect to linkage, crossover can occur on each element of a chromosome independently. This is known as uniform crossover, where a coin toss with a fixed probability determines whether crossover should occur at each element of the parent candidates' chromosomes, as seen in figure 2.7 below.

Figure 2.7: Visualization of Uniform Crossover - The ‘H’ characters represent the positive result of a coin toss.
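The three crossover methods can be sketched for string chromosomes as below. This is an illustrative implementation assuming equal-length parents, not the operator used later in the project:

```python
import random

def single_point(p1, p2):
    # Generate a point in 0..L-1 and swap all characters below it.
    cut = random.randrange(len(p1))
    return p2[:cut] + p1[cut:], p1[:cut] + p2[cut:]

def two_point(p1, p2):
    # Swap the section lying between two distinct generated points.
    a, b = sorted(random.sample(range(len(p1)), 2))
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def uniform(p1, p2, p_swap=0.5):
    # Toss a coin at every element independently to decide a swap.
    c1, c2 = [], []
    for x, y in zip(p1, p2):
        if random.random() < p_swap:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return "".join(c1), "".join(c2)
```

Note that all three variants only exchange material between the parents; every character present in the parents reappears somewhere in the offspring.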

Mutation

Mutation is a tool to prevent premature convergence and to help maintain diversity within a population. In the early generations of a GA, a lot of information is discarded because it performs badly within the context of a particular candidate solution. As the algorithm begins to produce fitter candidates, these pieces of discarded information could offer new combinations that might be desirable.

The mutation operator iterates through every individual within a newly created generation, and at each element of its chromosome a coin toss with a certain probability is conducted to determine whether a mutation occurs. An example can be seen in figure 2.8 below.

Figure 2.8: Visualization of input (top) and output (bottom) strings from a genetic mutation operator. The character 'H' represents a successful coin toss.
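The per-element coin toss described above can be sketched as follows; the character pool and the 2% default rate are illustrative assumptions:

```python
import random
import string

GENE_POOL = string.ascii_letters + string.punctuation + " "  # assumed range

def mutate(chromosome, rate=0.02):
    # Per-element coin toss: a successful toss replaces the character
    # with a random one from the allowed range.
    return "".join(random.choice(GENE_POOL) if random.random() < rate else c
                   for c in chromosome)
```

A rate of 0 returns the chromosome unchanged, while a rate of 1 replaces every element; practical rates sit near the low end so that mutation perturbs rather than destroys candidate solutions.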

2.3.2 Genetic Programming

Genetic Programming (GP) is, in essence, an adaptation of the Genetic Algorithm, with changes made to accommodate a different data structure. The main difference between Genetic Programming and Genetic Algorithms is the representation of the solution: Genetic Programming classically creates computer programs, expressed in languages such as Lisp or Scheme, as the solution, whereas Genetic Algorithms create a string of values that represents the solution [11]. Genetic Programming uses four steps to solve problems: generating an initial population of random programs, evaluating each program to assign a fitness, creating a new population of programs via selection and genetic operators, and designating the best program found as the result.
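To make the difference in representation concrete, a GP individual can be sketched as a small expression tree. Here nested Python tuples stand in for Lisp-style S-expressions; this is purely illustrative:

```python
import operator

# A GP individual is a program tree; nested tuples play the role of
# Lisp-style S-expressions, with "x" as the input terminal.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, x):
    # Recursively evaluate the tree for a given input value.
    if node == "x":
        return x
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left, x), evaluate(right, x))

# ("+", ("*", "x", "x"), 1) encodes the program x*x + 1.
program = ("+", ("*", "x", "x"), 1)
```

A GA chromosome is a flat sequence of values, whereas this individual is a whole executable program; GP's genetic operators accordingly swap and mutate subtrees rather than sequence elements.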

2.3.3 Uses in game industry and research

This section discusses how genetic algorithms have been employed to generate solutions to game-related AI problems.

Bullen et al. [5] pointed to the ever increasing complexity of the game industry requiring more sophisticated AIs in games, focusing specifically on the development of Non-Player Characters (NPCs). Achieving high quality NPC AI is an arduous task involving many parameters that have to be defined and tuned according to the nature of the game in order to achieve the desired behaviors. Bullen et al. created a prototype system that evolved NPCs using Genetic Algorithms. In their experiment, they used the game Unreal Tournament 2004 as a research platform and put two groups of NPCs in competition with each other: one group evolved over time using a GA, while the other was a fixed control group. Through their experiment, the evolved NPCs kept improving their performance, demonstrating that Genetic Algorithms can be used to evolve NPCs capable of competing in a commercial game.

In a similar manner, Cole et al. [7] used GAs to tune bot behavior in Counter-Strike 11, a popular first-person shooter game. In Counter-Strike, the behavior of the NPCs is based on hard-coded parameters. By increasing the number of these parameters, the NPCs' behaviors become more sophisticated and realistic.

11Valve Corporation, 2000

But tuning these parameters is an arduous, time consuming and costly task. To solve this problem, a GA was implemented to fine-tune variables dealing with NPC weapon selection and aggressiveness. In their experiment, they compared the evolved NPCs with NPCs whose parameters had been manually tuned. Their results showed that the evolved NPCs performed as well as those with manually tuned parameters.

Genetic Algorithms have also been used in turn-based games. Byrne et al. [6] researched the use of genetic algorithms as a method for creating game AI in a turn-based fighting game called Toribash 12. They pinpointed that a problem with developing game AI is that it is usually quite restricted and prone to errors when the game environment changes. Thus, they suggested that GAs can solve this problem by adjusting AI behaviors in response to changes in the game environment. Their goal was to develop a GA capable of producing realistic AI behaviors; the use of GAs could potentially increase the replayability of the game and reduce human intervention and development time. In Toribash, one controls more than 20 joints of a rag doll, giving each of them one of four possible behaviors (extend, contract, relax or hold). There are more than 4 trillion possible move combinations, making a GA ideal for searching this space for solutions. Byrne et al.'s results show that GAs are capable of successfully evolving moves for Toribash, and suggest that GAs are well suited to developing game AI.

2.4 Evolving Behavior Trees

As discussed above, behavior trees are tree-based structures with condition, action and composite nodes. Their formalism implies that behavior trees could be evolved successfully using Genetic Algorithm techniques.

Perez et al. [22] investigated the use of Genetic Programming as a means for developing game AI in dynamic game environments. More specifically, they applied Grammatical Evolution (GE) [21], a grammar-based form of Genetic Programming, to evolve controllers for the Mario AI Benchmark 13 based on a Behavior Tree representation. GE works by using GAs to evolve integer strings, which are then mapped through a grammar to possible solutions to the problem at hand. These solutions are evaluated and their fitness is fed back into the system. Perez's implementation came fourth in the Mario AI Championship 14, strengthening the idea that Genetic Programming systems can be used successfully to evolve game AI, and that Genetic Algorithms can be combined with Behavior Trees and other AI methods to produce novel solutions.

12Nabi Studios, 2006
13Mario AI Benchmark, http://www.marioai.org/gameplay-track/marioai-benchmark
14Mario AI Benchmark, http://www.marioai.org/

The work of Lim et al. [14] specifically deals with evolving behavior tree structures. It used Genetic Programming (GP) to evolve AI controllers for the DEFCON 15 game. It starts with a set of hand-crafted trees encoding feasible behaviors for each of the game's parts, and separate GP runs are then used for each part, creating new behaviors from the original set. The final combined tree, after evolution, was pitted against the standard AI controller that comes with the game, and achieved a success rate above 50%. This hints at the possibility that such an approach is indeed feasible in the development of automated players for commercial games, and puts it on the map of viable methods for developing game AI.

In his thesis, Oakes [18] claims that interest in AI from the game industry has consistently increased in recent years, and mentions that there is a need for more complex and sophisticated game AI as players' expectations are also on the rise. Therefore, he highlights the importance of AI development as a research topic. His research focused on applying GAs to evolve strategies for a turn-based strategy game using Behavior Trees, which he then tested against each other as well as against the default AI of the game. To evaluate these strategies, he used Battle for Wesnoth 16, an open-source game. Oakes' results show that the evolved strategies can successfully compete with each other, and also win against the default game AI.

2.5 Platform of Application

2.5.1 Turn Based Tactic Games

Certain game genres require AIs which are able to produce complex, seemingly intelligent behaviors (strategy, tactics, shooters), often requiring the AI implementation to be performed via some behavioral model (e.g. Behavior Trees). One of these genres is turn-based tactics, or TBT. The gameplay of turn-based tactics games can be broken down into two major components: a turn-based timekeeping mechanic and tactical combat scenarios.

TBT games lean towards employing military tactics and focus on their intricate and planned-out execution. The genre is inspired by tactical and miniature war-gaming, and due to the static nature of turn-based gameplay, dice or random number generators are often used to emulate variables that can be perceived as based on chance - for example, attributes such as unit attack hit chance or attack critical hit chance.

The specific mechanics and encounter depth of a given TBT can vary greatly; however, the gameplay generally centers around two opponents (player or AI), each controlling a team of units, with the winning condition being to eliminate the opposing team. Each opponent takes turns to issue instructions to each of their units, to move or use abilities, and once every controlled unit has no more actions available for the active turn, control is passed to the opponent.

15Introversion Software, 2006
16David White, 2003

2.5.2 XCOM 2

The XCOM game series is a science fiction video game franchise that began with the turn-based tactics/strategy video game UFO: Enemy Unknown, created by Mythos Games and MicroProse in 1994. In 2012, the series was rebooted under the title XCOM: Enemy Unknown, belonging to the same TBT/strategy game genre, published by 2K Games17 and developed by Firaxis Games18, with the expansion entitled Enemy Within being released in 2013.

XCOM 2 was released in 2016 - the direct sequel to Enemy Unknown/Within - and bundled with the game was the Software Development Kit (SDK) that Firaxis used for making it. The campaign mode sees players in command of a mobile military base, fighting against alien overlords controlled by an 'AI'. The base forms the ground of the game's strategy layer, which essentially houses the game's progression systems. Progression manifests itself by allowing players to conduct research into upgrades for items and abilities, so that the units commanded by the player in the tactical layer of the game become more powerful or produce alternative gameplay.

While the strategic base management layer is an important aspect of XCOM 2, the true core of the game lies in the gameplay provided by the tactical missions. These involve the player commanding a squad of units and leading them into combat against alien units. Generally, the player's squad will have to fight several pods (the alien equivalent of a squad) in order to complete a mission. The environment of the tactical layer is traversed using a tile-based grid layout, representing fixed positions in the game world that units can be moved to. Combat scenarios begin when a unit under the control of the player enters the sight range of an alien unit, activating the entire pod, which then assumes a defensive position.
The combat scenario continues with the player and AI taking turns to issue orders to controlled units, until either side has no remaining units (the size of a player's squad is variable and tied to the strategic layer of the game). Each unit within a squad gains 2 action points (AP) when the player's turn begins, and the turn ends when all available action points have been exhausted or the player manually ends the turn. The APs are used to issue orders to units, and this is where tactics are employed in the game. Players must move units to tile positions on the map that give them some kind of advantage, so that they can use the units' abilities to deal damage,

172K Games, https://www.2k.com/
18Firaxis Games, https://www.firaxis.com/

heal allies, etc. The placement of units is important because of the way the game decides whether an ability selected by the player is successfully executed, as many abilities in the game are based on some element of chance. For example, when a player wants a controlled unit to shoot at an alien unit, the chance for the player unit to hit is based on a set of calculations, including the unit's base hit chance, ideal range, and whether or not the target is in cover - thus obstructing the view of the shot. However, this environmental cover is dependent on the tile-based levels, with movement being done by point-and-click interaction with the tiles. From the AI's perspective, all tiles are evaluated based on an internal weighting system with regard to the purpose of the movement.

Since a player can devise tactics that rely on multiple units working together, the AI features two systems that attempt to compensate for this advantage, thus retaining the illusion of intelligence and presenting a more appropriate challenge to the player. One of them is the "leader-follower" system, which allows the multiple pods of enemies present in a level to organize themselves and better tackle the player's advance. The other is the AI's "hunting" ability - tracking alerts triggered by the player or narrative level events - which in combination with the first system gives the AI opponent awareness of the level without using cheats.

Because of all these different layers of strategic and tactical considerations, the developers decided to employ behavior trees in the implementation of XCOM's AI system. As such, several types of trees were crafted manually, providing different but complementary behaviors for the various enemy units present in the game.
Furthermore, the AI system takes advantage of the implementation of level encounters featuring multiple groups of enemies on the same map, which maintain awareness of potential alerts triggered by the player or map narrative events. This allows the otherwise highly individualized behaviors to group together and produce emergent team behavior.

Chapter 3

Project Statement

Recent academic contributions to the field of game AI show genetic algorithms to be capable of producing useful AIs for games which use a BT interface to design their behaviors. Genetic algorithms are expensive search heuristics, and studies generally present methodologies which evolve AI solutions before they can be tested against a player, due to the volume of simulations required.

It is proposed that a methodology can be developed to evolve candidate AIs that sufficiently challenge a player, and that these can be produced within relatively few evaluations. Based on the work done previously with genetic algorithms and BT-driven game AI, the methodology will be developed by simulating game sessions between candidate AIs and a 'default' AI of the chosen test environment. It is expected that the methodology should be optimized sufficiently to produce solutions within a 'normal difficulty playthrough' of a game, such that it could be employed to learn from human players as they engage with every candidate AI of every generation, each with its own distinct behavior - creating a highly dynamic and potentially engaging experience.

The development platform for the project is the turn-based tactics game XCOM 2. This genre of games features some of the most complex gameplay rules, emphasizing the necessity for tactical and strategic player thinking. The gameplay provides a worthy test of an AI's ability to defeat human opponents, in a scenario where the latter have ample time to consider their options before taking action. As one of the most recent releases in the genre, XCOM 2 matches all the required criteria of this project, as well as being of AAA production value and providing a free SDK, making it an appropriate choice of development platform.


Considering all the above, the project's problem statement is thus: "Can a system be developed to generate game AI behaviors for a TBT game, which are capable of challenging human players? How can this process be optimized to the extent that BTs can be produced alongside a player's progression?"

Chapter 4

Design and Implementation

The evolution of AIs for XCOM 2 using a genetic algorithm required several elements to be designed and implemented in order to create an environment capable of producing solutions. The results of that process provide the basic structure of the proposed methodology created to develop the system used for this project.

4.1 Mod Implementation

4.1.1 Game Systems

To provide an environment to evolve AIs, several changes needed to be implemented in XCOM 2. These changes were made possible by creating a 'mod' for the game, using the SDK provided.

AI vs AI

By default, XCOM 2 does not provide functionality which allows the AI to play against itself. However, the source code accessed through the SDK is extensive, and almost all mechanics can be altered or recreated. There was no simple fix to this problem; however, a workaround was created based on the implementation of the 'Panic' effect. When a player's controlled units are in combat, certain events can cause this effect to trigger, removing control of the unit from the player for 1-3 turns. During this time, control over the unit is given to a specific BT contained in the AI configuration files. Therefore, an ability was created to run a modified version of the 'Panic' effect, such that units normally under the control of a player instead run a specified behavior inside the AI configuration files, created by the evolution strategies employed.


Manual simulation

The configuration file which contains the definition of behaviors available to the AI (and indeed all configuration files) is only read when the game is launched. Hence, any modifications made after launch require a game restart to take effect. This meant that the AI configuration files needed to be manually altered to contain the current generated AI to be evaluated, and thus the entire process of simulating matches could not be fully automated.

Data Export

To effectively evaluate the generated AIs, combat data was needed. XCOM 2 shipped with an analytics system which is used to display various "fun" statistics to a player at the end of a tactical mission. The implementation of this interface element is contained within the "MissionSummary" class, which was overridden within the implementation of the mod in order to output the information used to evaluate the generated AIs.

4.1.2 Normalization

XCOM 2 is a complex game, with numerous types of units, abilities and mission types, and a pseudo-random map generation system. It has various systems working together to allow its current set of AIs to produce the desired challenge to players. It was necessary to restrict a large portion of the game content to provide a controllable, stable and fair environment within which to evolve AIs.

Units

The generated AIs and their opposition each control a team of 6 units, and these units are identical with the exception of the definition of the BT driving their behavior. Setup of the units and their associated armor, weapons and statistics was achieved by editing the relevant configuration files. For example, 'XComGameData_CharacterStats.ini' provides access to units' health and combat statistics, such as critical strike chance. There are many unit variables that can be assigned values, but most of them are not relevant to this discussion. The main item of interest for this report is that the health of each of the units was initially set to 4.

Abilities

The abilities available to both sets of units are also identical, with the ones selected being the basic building blocks of the game, featuring within most of the fallback AI behaviors set up within the AI configuration file. It was felt that these should be sufficient to provide a level of complexity that would support the proof of concept and define manageable solution space sizes.

Attack/Shoot

Attack/Shoot is a generic ability available to all units active in the game world. Its action is tied to the weapon equipped on a unit, as the weapon defines the range of the attack, the damage it deals, etc. All units are equipped with a standard first-tier assault rifle, resulting in them doing 3 damage with a successful shot and 5 damage with a critical strike.

Whether or not a shot hits, or lands a critical strike, is determined not only by the statistics of the unit using the ability, but also by its target and their relative positions in the environment. For example, a target unit could have an ability which increases its defensive statistics, or it might be located on a tile which has environmental cover between itself and the attacking unit - reducing hit chance greatly. In fact, if there is no cover between two units, they are considered to be 'flanking' each other, and shots taken against flanked units have a greatly increased critical strike chance (50%). A unit can only shoot once per turn; however, it can do so after using a first action point for something else, as attacking requires a minimum of 1 AP and consumes any remaining APs.
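The exact XCOM 2 hit calculation is not reproduced here; the following Python sketch only illustrates the kind of chance-based shot resolution described above. The cover penalty value is an invented assumption, while the 3/5 damage values and the 50% flanking critical bonus follow the text:

```python
import random

def shot_result(base_hit, base_crit, target_in_cover, target_flanked,
                cover_penalty=40):
    # Cover lowers the chance to hit (the penalty value is an assumption);
    # flanking adds the 50% critical bonus described in the text.
    hit_chance = base_hit - (cover_penalty if target_in_cover else 0)
    crit_chance = base_crit + (50 if target_flanked else 0)
    if random.randint(1, 100) > hit_chance:
        return 0                      # miss: no damage
    if random.randint(1, 100) <= crit_chance:
        return 5                      # critical strike damage
    return 3                          # normal hit damage
```

The two chained percentage rolls capture why positioning matters: moving a unit out of cover, or into a flanking position, shifts both chances before any dice are rolled.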

Move

Move is another generic ability available to all units, as it facilitates a unit's traversal of the game world. The tactical mission game environments are comprised of an array of tiles arranged in a grid formation, and as such, a pathfinding algorithm is employed to find routes to all target tiles. The UI available to players (figure 4.1) gives a good overview of how movement is handled. The current active unit, pictured bottom-left, is considering a move to a tile location which provides environmental cover from two directions (indicated by the blue shield overlay).

Figure 4.1: XCOM 2 unit movement UI.

A unit can only move through a defined number of tiles per action point. The blue line (see figure 4.1) surrounding the active unit shows all tiles it can move to using a single AP. The outer yellow line represents the tiles it can move to at the cost of consuming both APs available for that turn. The variable which defines the mobility of a unit was set to 12.

When the XCOM 2 AI wants to move a unit, it has to evaluate each tile it can reach and assign it a value. This calculation is dependent on information such as the cover provided by the tile, the distance to the tile, how many enemies are visible from the tile, whether moving to that location affords a flanking position, and various other considerations. The AI configuration file contains a set of profiles, which hold weight values intended to represent different tactical movements. For example, an aggressive movement strategy might care less about the cover value of a destination tile than a defensive one. The strategies implemented are: Defensive, Standard, Aggressive, Fanatic, Hunting, Advance Cover, and Flanking. These form the basis of the movement options available to the evolution environment.
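The profile-weighted tile evaluation can be sketched as a weighted sum over tile features. All feature names, profile names and weight values below are illustrative assumptions, not the identifiers or numbers used in the XCOM 2 configuration files:

```python
def score_tile(tile, weights):
    # Weighted sum over the tile's feature values; each movement
    # profile supplies its own set of weights.
    return sum(weights.get(name, 0.0) * value
               for name, value in tile.items())

# Two contrasting illustrative profiles: an aggressive mover discounts
# cover and rewards flanking, while a defensive one does the opposite.
PROFILES = {
    "Aggressive": {"cover": 0.2, "distance": -0.5, "flanking": 2.0,
                   "enemies_visible": 1.0},
    "Defensive":  {"cover": 2.0, "distance": -1.0, "flanking": 0.5,
                   "enemies_visible": -0.5},
}

def best_tile(tiles, profile):
    # Pick the reachable tile with the highest score under a profile.
    return max(tiles, key=lambda t: score_tile(t, PROFILES[profile]))
```

Under this model, switching the profile changes which destination wins without changing the tile data, which is how a single movement system can express several distinct tactical styles.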

Overwatch Overwatch is an ability which, like shooting, consumes all remaining APs when used but only requires 1 AP to activate. It is designed to allow a unit to ‘watch’ an area of the map (within the unit's line of sight) during the opponent's subsequent turn, such that if the opponent moves a unit through this area, the overwatching unit will take a reaction shot at the moving target. Reaction shots taken with overwatch suffer a hit chance reduction and cannot deal critical damage. If a unit using overwatch takes damage, the effect is removed.

Hunker Down Hunker Down is a defensive ability that consumes both APs when used and only requires 1 AP for activation. With this ability, a unit will gain a large boost to their defensive statistics at the cost of not being able to use offensive abilities. To be able to hunker down, a unit must be on a tile which has environmental cover in at least 1 direction.

Map Implementation XCOM 2 features a procedural content generation (PCG) system to generate maps for its tactical missions. The implementation of this PCG system is quite deep and difficult to restrict, as it was designed to maximize variability and re-usability. This means most of the assets are compatible with each other and can be combined to create a large number of playable levels. Each map is considered a plot, which holds the data for any enemy encounters, victory conditions, level narrative elements, objectives, etc. Each plot has predetermined positions that can be filled by parcels. These smaller level elements - parcels - can be any combination of even smaller map elements, but usually form some sort of blueprinted structure, ranging from a small park to a house or even large buildings. The fact that this process cannot be altered or bypassed implies a lack of control over the test environment and might lead to noisy data. Due to the random nature of PCG, combined with how influential terrain is as a game mechanic (cover distribution, line of sight, height, etc.), it was decided to reduce the variability of the terrain by setting up a single map, designed to be more consistent in its content. This map was modified in the level editor provided with the XCOM 2 SDK, using assets from the PCG algorithm's selection pool which are less likely to impact combat scenarios. For example, assets designed for missions taking place in the wilderness are less likely to include explosive assets, such as gas canisters, which can damage XCOM 2 units. Any map in XCOM 2 must contain at least one parcel. The map implemented for the evolution environment contained a single small-sized parcel positioned in the centre of the map, with each team set up to spawn on either side of it.

4.1.3 Environmental Cover Environmental cover has been mentioned at various points of the report so far, and it clearly plays an important role in the defensive tactics employed in the game. Environmental elements such as trees often permanently occupy a given tile of a map. If a unit is located on a tile which is adjacent to a cover tile, the unit receives a ‘cover value’ in the direction between the two tiles. A unit's defense value is greatly influenced by the cover mechanic within

XCOM 2. There are 3 types of cover value that a unit can have, represented by the numbers 0 (not in cover), 1 (in low cover) and 2 (in high cover). These values are used as modifiers in a unit's defense calculation, imposing a 20% penalty on attack hit chance per unit of cover (20% in low cover and 40% in high cover).
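The interaction between cover and hit chance described above can be sketched as follows. This is a simplified illustrative model, not code from the XCOM 2 source: only the 20% penalty per unit of cover and the 50% critical chance against flanked targets come from the text, while the class and method names, and the base values in the usage example, are assumptions.

```java
// Illustrative model of the cover mechanic described above. Only the
// 20% hit-chance penalty per unit of cover and the 50% critical chance
// against flanked targets come from the text; names and base values
// are assumptions.
class CoverModel {
    static final int PENALTY_PER_COVER_UNIT = 20; // percent per cover level

    // cover: 0 = not in cover, 1 = low cover, 2 = high cover
    static int hitChance(int baseChance, int targetCover) {
        int chance = baseChance - targetCover * PENALTY_PER_COVER_UNIT;
        return Math.max(0, Math.min(100, chance)); // clamp to [0, 100]
    }

    // A flanked target (no cover along the attack direction) is shot
    // with a greatly increased critical strike chance of 50%.
    static int critChance(int baseCrit, int targetCover) {
        return targetCover == 0 ? 50 : baseCrit;
    }
}
```

For an assumed base hit chance of 65%, a target in high cover would be hit 25% of the time, while a flanked target would be hit at the full 65% and critically struck half of the time.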

4.1.4 Default AI To evaluate the generated AIs, an opponent AI was required. This ‘Default AI’ was created from the default behaviors constructed for the basic alien units. The default behaviors are not designed to be overly challenging to a human opponent, as the game uses other means to provide that challenge. As such, the generated candidates are expected to perform better against the default AI than against human opponents. The behaviors used to construct the default AI, along with its definition, are found in the AI configuration file, which contains all the BT nodes used to construct the game's AI.

AI configuration file Following the BT formalism, the behavior nodes contained within the ‘DefaultAI.ini’ configuration file are employed with a modular approach. The file contains a vast array of condition nodes, small action sequences and selectors, tactical weights, and various other items. Within the INI files, each item is defined as a ‘Behavior’ and is given a name, for example ‘Move_Defensive’. This name can then be referenced by various sequences and selectors to execute the associated instructions. The names of the ‘Behaviors’ are how the GA will generate AI solutions, using a data structure to represent selected ‘Behaviors’ from the configuration file. Additional ‘Behaviors’ can also be added and, as such, the output of the GA has to respect the syntax of the INI file when creating the new AIs after evaluating performance.

4.2 Genetic Algorithm Implementation

The design of the environment in which a GA will evolve candidate AIs for XCOM 2 encompasses more than balancing the settings and variables which define a game state. The environment in which the evolution takes place must be designed so as to encourage the generation of good solutions. This section will describe how the AIs' behavior tree structures are encoded into a chromosome, and the implementation of the GA itself.

4.2.1 Chromosome Design As stated previously, the data structure of a chromosome for this project will be based on a fixed-length string of characters. The exact length and structure of the string is altered between experiments, however two things are always required: the elements from which candidate solutions can be formed, and the structure in which they will be placed. The elements available for the GA to generate a candidate will be represented by two distinct sets - Unit Conditions and Unit Decisions.

Unit Conditions Unit Conditions are a set of variables which reference condition nodes contained within the XCOM 2 AI behavior configuration files. These enable an AI generated by the GA to consider information about a controlled unit's current match scenario. The Unit Conditions available to the GA for each experiment will be represented by a range of consecutive capitalized characters, depending on how many are needed.

Representative String Character    Example Condition Identifier
"A"                                ‘UnitHasHighHealth’
"B"                                ‘UnitIsFlankingAnEnemy’
"C"                                ‘UnitHasAmmo’

Table 4.1: Example Unit Condition characters for the GA to choose from, and their identifiers.

Unit Decisions Unit Decisions represent a set of XCOM 2 behaviors which end in a single action node, or one of several action nodes. They are designed to be the ‘decision’ that a generated AI makes after considering a number of Unit Conditions. They often contain more than one action node within a selector, to prevent a unit from exiting a behavior without performing an action. For example, if an action node that gives the instruction to ‘shoot’ is reached but the AI's currently controlled unit has no ammo, these behaviors could allow the unit to move or reload instead. In the SDK implementation, these are represented as selector nodes, and each component of a decision's name corresponds to a basic unit action. These basic actions are considered as such because they cannot be further decomposed, their implementation residing in the original source code.

Representative String Character    Example Decision Identifier
"a"                                ‘Shoot’
"b"                                ‘Move’
"c"                                ‘OverwatchOrShootOrReload’
"d"                                ‘MoveAggressiveOrFlanking’

Table 4.2: Example Unit Decision characters for the GA to choose from, and their identifiers.

The representation of Unit Conditions and Unit Decisions (UC&Ds) as capitalized and lower-case characters respectively does limit the maximum potential size of each set. Within the stripped-down state of the test environment, and with the attempts to keep string complexity as low as possible, this limit was never reached. Given that the evaluation of a chromosome requires the observation of an entire match simulation, representing partial solutions as recognizable characters enabled informed real-time analysis.

4.2.2 Example Chromosome Implementation The structure within which the sets of characters are placed in the string representing the chromosome is altered between experiments, however an example that works very similarly will be presented. A decision structure is required in order to define the form of the string. An example decision structure can be seen in figure 4.2, and it can be encoded in many ways, using a character's index within the string to always represent the value of a specific node in the decision structure. Unit Conditions and Unit Decisions are mutually exclusive for each index of the string.

Figure 4.2: An example decision tree structure, with example string representa- tions

Each AI generated will need to contain behaviors for the 2 action points that an XCOM unit has available per turn. As such, each chromosome will have its length doubled to encode two separate decision structures. Further considerations, such as restricting certain ordering situations or the duplication of characters, will be addressed for each experiment.

Figure 4.3: Complete example chromosome
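The fixed-index encoding described above can be sketched minimally as below, assuming a simplified layout in which condition and decision characters alternate; the actual layouts used in the experiments differ per study, and all class and method names here are illustrative.

```java
// Illustrative sketch of the fixed-index chromosome encoding: each
// index of the string always maps to the same node of the decision
// structure; Unit Conditions are upper case and Unit Decisions lower
// case. The alternating layout assumed here is a simplification.
class ChromosomeSketch {
    // Assumed layout: even indices hold Unit Conditions.
    static boolean isConditionIndex(int i) { return i % 2 == 0; }

    // A string is well-formed if every index holds the right set type.
    static boolean isWellFormed(String chromosome) {
        for (int i = 0; i < chromosome.length(); i++) {
            boolean upper = Character.isUpperCase(chromosome.charAt(i));
            if (isConditionIndex(i) != upper) return false;
        }
        return true;
    }

    // The full chromosome concatenates one structure per action point.
    static String[] splitActionPoints(String chromosome) {
        int half = chromosome.length() / 2;
        return new String[] { chromosome.substring(0, half),
                              chromosome.substring(half) };
    }
}
```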

4.2.3 WatchMaker Framework The WatchMaker Framework for Evolutionary Computation 1 is an object-oriented framework for implementing evolutionary/genetic algorithms in Java. This framework is a useful prototyping tool, as it enables users to freely design custom implementations for the individual elements of a GA, such as genetic operators. The central component of the Watchmaker Framework is its Generational Evolution Engine, for which a number of interfaces are implemented.

4.2.4 Generational Evolution Engine As the AIs will be encoded into strings, the Generational Evolution Engine interface was used with a string as its defined type. The arguments to the method call (figure 4.4) are elements of a GA that need to be either selected from pre-built implementations, or specifically designed and implemented for the context of the problem area. The framework provides common implementations found in Evolutionary Computation, as well as interfaces to quickly implement custom variants. The custom implementations developed for these experiments were iterated upon for each experiment, therefore an overview will be provided here, with specific information presented when required.

1WatchMaker Framework, http://watchmaker.uncommons.org/

Figure 4.4: Call to instantiate an Evolution Engine of type string, using the Generational Evolution Engine interface

The arguments to the method call (figure 4.4) are references to the following classes:

• Candidate Factory A class which contains a method which returns random candidate solutions in the form of a string

• Evolutionary Operator Pipeline A class which pipelines a number of classes which perform evolutionary operations on candidate solutions.

– Crossover A class which recombines candidate solutions into offspring. – Mutation A class which randomly mutates candidate solutions.

• Fitness Evaluator A class which returns an integer value to the evolution engine, representing the fitness of a candidate solution. This part of the algorithm will remain consistent between experiments from an implementation perspective, as the fitness evaluation takes place within XCOM 2. Once all candidates of a population have been evaluated, their fitness values are stored. When the evolution engine needs the fitness for a candidate, this class returns its associated value.

Figure 4.5: Example fitness evaluator code

Selection A class which allows the evolution engine to apply a chosen selection strategy. The GA used a pre-built implementation of Roulette-Wheel Selection for the studies conducted throughout this project, chosen because its fitness-proportionate nature encourages faster convergence.
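The Watchmaker Framework ships a pre-built roulette-wheel implementation; the sketch below re-implements the core idea purely for illustration, assuming non-negative fitness values, with illustrative names throughout.

```java
import java.util.List;
import java.util.Random;

// Minimal re-implementation of fitness-proportionate (roulette-wheel)
// selection for illustration; the project itself used the pre-built
// Watchmaker implementation. Assumes non-negative fitness values.
class RouletteWheel {
    // Returns the index of the selected candidate: each candidate is
    // picked with probability fitness[i] / sum(fitness).
    static int select(List<Double> fitness, Random rng) {
        double total = 0;
        for (double f : fitness) total += f;
        double spin = rng.nextDouble() * total; // position on the wheel
        double accumulated = 0;
        for (int i = 0; i < fitness.size(); i++) {
            accumulated += fitness.get(i);
            if (spin < accumulated) return i;
        }
        return fitness.size() - 1; // guard against floating-point edge
    }
}
```

A candidate with fitness 36 is therefore twice as likely to be chosen as one with fitness 18, which is what drives convergence towards fitter solutions.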

Experimental Procedure As discussed, the evaluation of a candidate solution within this experimental scenario is conducted outside of the GA framework. This necessitates a slightly convoluted experimental procedure:

1. Generate initial population of n candidate solutions.

2. Copy a single candidate (generated behavior tree) into the modified AI con- figuration file.

3. Launch XCOM 2, simulate a match of the generated AI vs the Default AI, and record data.

4. Repeat steps 2 and 3, for each candidate within a given generation.

5. Compile results and make available to GA.

6. Evolve the population to form a new generation of candidate solutions.

Chapter 5

Experiment And Results

This chapter will present the results of the development of the system employed to optimize the process of evolving Behavior Trees via a Genetic Algorithm for the turn-based tactics game XCOM 2. The development process involved conducting a pilot test, three studies and a final user test. Differences in implementation for each individual study, and the associated results, will be presented and discussed chronologically.

5.1 Pilot Test

The complete test environment required several modifications and custom implementations, therefore it was important to check that everything was working as expected. A pilot test was conducted to evaluate the entire experiment process, in order to analyze whether the test environment is fair, and whether the various elements of the process perform as expected. Particular consideration was given to the following:

• The fitness evaluation method

• Unit Health and Damage

• Map and Mission setup

• Initial evaluation of the sets of Unit Conditions and Unit Decisions

• Implementation of the GA.

5.1.1 Design The test will attempt to evolve a simple chromosome design over 4 generations of 40 candidate solutions, following the procedure described in the genetic algorithm experimental design section.


Chromosome The chromosome used in the pilot test was designed to be simpler than the example described previously. The primary reason for this was that the simpler representation made it possible to evaluate Unit Conditions and Decisions in real time, to see if their implementations were evaluating and acting upon the game state correctly, according to their value and location within the chromosome's structure. The behavior for each action point an XCOM 2 unit has available per turn was defined by six successive pairs of Unit Conditions and Unit Decisions. Each Unit Condition would be checked in sequence, and whenever one returned true, its associated Unit Decision would be used to enable a unit controlled by a candidate AI to perform a specific action within the game. The encoding shown in figure 5.1 is the representation of a single action point; the final chromosome will contain 2 of these representations concatenated sequentially, for AP1 and AP2 respectively.

Figure 5.1: Chromosome encoding for the first action point

To remove the possibility of performing redundant Unit Condition checks, and to reduce the size of the solution space, the formation of candidates was restricted so that a Unit Condition may not appear twice within a single action point. If the example action point encoding shown in figure 5.1 were to have the character contained at index 0 ("A") placed at any of the other indices available to Unit Conditions, it would be redundant, as it would either never be reached (the same Unit Condition at index 0 would call its associated Unit Decision) or do nothing (return false). The selection of Unit Decisions has no such restriction; the same value could be inserted at each index of the chromosome associated with a Unit Decision.
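The no-repeated-condition restriction amounts to a simple validity check, sketched below. The convention of upper-case characters standing for Unit Conditions follows the encoding used in this project, while the class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the validity rule described above: within the encoding of
// a single action point, no Unit Condition (upper-case character) may
// appear twice; Unit Decisions (lower case) may repeat freely.
class CandidateRestrictions {
    static boolean hasUniqueConditions(String actionPointEncoding) {
        Set<Character> seen = new HashSet<>();
        for (char c : actionPointEncoding.toCharArray()) {
            // add() returns false if the condition was already present
            if (Character.isUpperCase(c) && !seen.add(c)) return false;
        }
        return true;
    }
}
```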

Unit Conditions and Unit Decisions The set of Unit Conditions available to the GA for the pilot test is shown below (table 5.1). Details on how they were implemented in the XCOM 2 SDK can be found in Appendix B.

Representative String Character    Behavior tree identifier
"A"    ‘HasHighHP’
"B"    ‘HasWounds’
"C"    ‘HasKillShot’
"D"    ‘IsFlanked’
"E"    ‘OneEnemyVisible’
"F"    ‘MultipleEnemiesVisible’
"G"    ‘OneOrMoreOverwatchingTeammates’
"H"    ‘NoAllyIsHunkerDown’
"I"    ‘AnyAllyIsHunkerDown’
"J"    ‘NoOverwatchingTeammates’
"K"    ‘AllShotPercentagesAtOrAbove50’
"L"    ‘AllShotPercentagesBelow50’

Table 5.1: Set of Unit Conditions used in the pilot test.

The Unit Decisions available to the GA are shown below (table 5.2); their implementation within the AI configuration files is available in Appendix C.

Representative String Character    Behavior tree identifier
"a"    ’SelectMoveStandard’
"b"    ’SelectMove_Defensive’
"c"    ’SelectMove_Aggressive’
"d"    ’SelectMove_AdvanceCover’
"e"    ’SelectMove_Flanking’
"f"    ’SelectMove_Fanatic’
"g"    ’SelectMove_Hunter’
"h"    ’TryShootOrReload’
"i"    ’ConsiderHunkerDown’
"j"    ’TryOverwatchOrReload’
"k"    ’TryShootIfIdealOrReload’
"l"    ’TryShootIfFavorableOrReload’
"m"    ’TryShootIfFavorableOrOverwatch’
"n"    ’TryHunkerDownOrShootIfFavorable’
"o"    ’ConsiderHunkerDownOrOverwatch’

Table 5.2: Set of Unit Decisions used in the pilot test.

Solution Space The size of the solution space depends on a combination of the structure of the chromosome and the sizes of the sets of Unit Conditions and Decisions.

Size of set of Unit Conditions: c = 12
Size of set of Unit Decisions: d = 15
Number of Unit Conditions per AP: e = 6
Number of Unit Decisions per AP: f = 6
Total permutations per AP: [c·(c−1)·(c−2)·(c−3)·(c−4)·(c−5)]·d^f ≈ 7.5·10^12
Size of solution space: (Total permutations per AP)^2 ≈ 5.7·10^25

Table 5.3: Pilot Test’s Solution space size.
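The figures in table 5.3 can be reproduced directly: six ordered, non-repeating Unit Conditions chosen from a set of 12, times 15^6 free choices of Unit Decisions, squared for the two action points. A small verification sketch (class and method names are illustrative):

```java
// Verifying the solution-space arithmetic of table 5.3: six ordered,
// non-repeating Unit Conditions from a set of 12, combined with six
// freely repeatable Unit Decisions from a set of 15, squared for the
// two action points.
class PilotSolutionSpace {
    static double permutationsPerActionPoint() {
        double conditions = 12.0 * 11 * 10 * 9 * 8 * 7; // no repeats
        double decisions = Math.pow(15, 6);             // repeats allowed
        return conditions * decisions;                  // ~7.5e12
    }

    static double solutionSpaceSize() {
        double perAP = permutationsPerActionPoint();
        return perAP * perAP;                           // ~5.7e25
    }
}
```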

Evolutionary Operators Crossover A single-point crossover method was employed for the pilot test. For a string of length S (a candidate solution), an integer value is generated between 2 and S − 3, which provides the index at which two parent candidates will recombine. With the restriction of no repeated characters within an action point, this crossover method could potentially cause undesired solutions to be created (see figure 5.2).

Figure 5.2: Example of crossover producing undesirable offspring

To fix this, the offspring are analyzed after crossover is performed; if a duplicate is found (as can be seen with the character "G" at index 6 of the offspring shown in figure 5.2), then the character found at the same index (index 6) of the parent candidate which did not provide the duplicate character (the character "E" at index 6 of parent 1 in figure 5.2) is chosen instead. It is possible that this character is also undesirable, if it too appeared earlier in the string; in that case, random values are generated until a desirable one is found.
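The single-point recombination step can be sketched as follows; the duplicate-repair step is noted only in comments, as it depends on the per-experiment chromosome layout, and all names here are illustrative.

```java
import java.util.Random;

// Sketch of the single-point crossover described above: a cut index is
// drawn from [2, S - 3] and the two parents exchange their tails. The
// repair step for duplicated Unit Conditions is omitted here, as it
// depends on the chromosome layout of each experiment.
class SinglePointCrossover {
    static String[] crossover(String parent1, String parent2, Random rng) {
        int s = parent1.length();
        int point = 2 + rng.nextInt(s - 4); // integer in [2, S - 3]
        String child1 = parent1.substring(0, point) + parent2.substring(point);
        String child2 = parent2.substring(0, point) + parent1.substring(point);
        // A real implementation would now scan both children for repeated
        // Unit Conditions and substitute characters from the other parent.
        return new String[] { child1, child2 };
    }
}
```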

Mutation The mutation operator was set up such that it could iterate through the entire string and mutate any gene to a value from the set type (Unit Conditions or Decisions) associated with the index currently being operated on. The probability that a mutation would occur at any given index was arbitrarily chosen to be 0.03 (3%). Again, undesirable candidates can be produced by the mutation operator, and the same correction procedure described for the crossover operator was applied.

Selection Roulette selection was used to choose parents for each subsequent generation of candidates. To keep ‘good’ candidates intact from generation to generation, elitism was employed to retain the 10% of candidates with the highest fitness values. As each generation comprised 40 candidates, the number of elite candidates per population was 4.

Fitness Evaluation The aim of the project is to optimize the generation of ‘successful’ candidate AIs, which perform well in their evaluation matches by meeting the winning conditions of the game. To this end, the fitness function (Equation 5.1) returns a value based on the health of the units which compete. If a generated AI wins a match, then all the enemy units have been killed, and fitness is represented as all of the health taken from the opposing team. However, this would mean that all winning AIs have the same fitness value. Therefore, the AI team's remaining health is also used, to create a more representative fitness: the health remaining for all surviving units is added to the fitness value, making both survival and offense components of the computation. Both Team_DamageDone and Team_DamageTaken range from 0 to 24. This process is also used for AI teams which lose their matches, which attain fitness for every point of damage done to the enemy team, providing a way to distinguish between the performance of unsuccessful candidates.

Fitness = Team_DamageDone + (24 − Team_DamageTaken) (5.1)

According to this equation, any fitness value above 24 represents a victory, a score of 48 represents a flawless victory, and a score of 0 represents total defeat.
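The fitness computation can be sketched directly; the team health total of 24 follows from the text, while the class and method names are illustrative.

```java
// Sketch of the fitness function (Equation 5.1): damage dealt to the
// enemy team plus the candidate team's remaining health. Both teams
// start with 24 health in total, per the text.
class FitnessFunction {
    static final int TEAM_TOTAL_HEALTH = 24;

    static int evaluate(int teamDamageDone, int teamDamageTaken) {
        int remainingHealth = TEAM_TOTAL_HEALTH - teamDamageTaken;
        return teamDamageDone + remainingHealth;
    }
}
```

A flawless victory (24 damage done, none taken) scores 48, a total defeat scores 0, and any score above 24 implies all enemy units were eliminated.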

5.1.2 Analysis of Data Overview Despite the simplicity of the decision structure housed within this chromosome, the GA was able to continually produce higher fitness values and more winning AIs for each generation, culminating in a 50% win rate after 4 generations. The average fitness % is the average fitness of all candidates in a generation, represented as a percentage of the total attainable fitness, while the win % is the number of wins represented as a percentage of the population size (see figure 5.3).

Figure 5.3: Average fitness % and win % of candidates per generation of Pilot Test.

Successful Candidates With the way fitness was evaluated, and with only the 4 most elite candidates being retained in a new generation, many candidates which won their matches were not able to play another match, though they still had a high chance of breeding into the next generation. In fact, of the 36 candidates which managed to win a single game, only 10 ever made it through to play another match. Of the 10 candidates who played more than a single game, 5 played enough games to eventually lose a match, and those candidates each won 2 of the 3 games they played in total (see table 5.4). Obviously these are small sample sizes, however there was an observed tendency for candidates to win a game due to a series of favorable dice rolls and specific game situations. For example, as units only had 4 health, they could be eliminated within a single action point when receiving a critical shot. Favorable dice rolls would often have heavy implications for the overall match situation.

Candidate                    Matches Played   Matches Won   Average Fitness
KaIdFhLnBeAbJcKmFcCcHdDh     3                2             20
HeBhCkEeFfKeGlEbIkKeJeDf     3                2             24
eFiJkDjLmEgIhGhIdElLeHkAi    3                2             24
DeCbFcEdHmAkJcKmBmHfLdFa     3                2             42
LbBkIfJjHlDeIkCiJmHeLdFa     3                2             32

Table 5.4: Candidates that won multiple games over the course of the evolution.

Chromosome Despite the simple decision structure of the chromosome and the sets of Unit Conditions and Decisions available to form a candidate solution, the solution space is still large. Nevertheless, the candidate solutions generated showed improvement over the 4 generations. The subsequent studies in this paper will evaluate decision structures which are more complex, and as such consideration will be taken with regard to the size of the sets of Unit Conditions and Decisions.

Unit Conditions

The graph below (figure 5.4) displays the number of times a particular Unit Condition appears within candidate solutions which managed a victory in at least 1 match. Interestingly, the conditions relating to a unit's health (A and B) appear the least often overall - contrary to expectation. It is presumed that these have a weaker presence in the successful candidates because their values within a game state are highly variable, which is not desirable within such a simple, rigid decision structure. This idea is supported by the fact that many of the Unit Conditions which had a strong presence in victorious candidate solutions would return true the majority of the time, leading to their associated Unit Decisions always being called.

Figure 5.4: Graph showing the number of Unit Conditions contained within candidate solutions who won a minimum of one game

As an example, ‘MultipleEnemiesVisible’ ("F") appears 49 times within the 36 candidates which won an evaluation match; this equates to 49 of the 72 action points (roughly 70%). In general, this Unit Condition will return true until the enemy team has been reduced to a single remaining unit, a scenario in which the candidate is still highly likely to win the match.

Unit Decisions The Unit Decisions found in candidate solutions achieving 1 or more victories can be seen in figure 5.5. The fact that the GA was evolving solutions that favored Unit Conditions which would generally return true led to shooting-based Unit Decisions being associated with them. This is not initially clear from the data shown in figure 5.5: although there are shooting-based variants which are well represented (k, m), there are also movement-based variants (b, e) which are well represented within the strings of successful candidate solutions.

Figure 5.5: Graph showing the number of Unit Decisions contained within candidate solutions who won a minimum of one game

However, a closer inspection of the 36 successful candidates shows that the shooting-based Unit Decisions were favored in the early elements of the first AP represented by the string. For example, the best represented of the shooting-based variants ("k") appears in AP1 of successful candidates a total of 25 times, 19 of them within the first two Unit Decision elements of the strings. When Unit Conditions are generally returning true, these slots are more likely to be reached than those appearing later in the string.

5.2 Study 1

The purpose of this study was to investigate whether the evolution environment implemented can produce candidate solutions capable of winning a specified percentage of matches within a certain number of generations. It is expected that changes to the environment - informed by analysis of the pilot test - will produce solutions which are more consistent in their results, and more adaptive to varying match scenarios.

5.2.1 Design The pilot test showed that candidates were evolving towards effective solutions, but that the structure of the chromosome, the selection of elements it could contain, and the influence of chance were guiding these solutions towards answering specific match situations and maximizing the positive impact of RNG. The resulting modifications to the evolution environment, and specific decisions regarding the design of this study, will be described. After 3 generations had been evaluated, a decision was taken to adapt the fitness function and elitism methods. Despite this introducing an amount of unreliability when analyzing the data set as a whole, it was felt that the changes were necessary. The nature of these changes and the rationale behind them will be discussed within the relevant sections.

Chromosome The decision structure housed by the chromosome is based on a binomial structure similar to the one shown in the GA implementation section 4.2. The difference is that it has another level of depth, such that a third Unit Condition is checked along each path, resulting in 8 Unit Decisions being available for each action point (figure 5.6). A restriction was enforced to disallow the repetition of Unit Conditions along any given decision path. This structure, in which each Unit Decision is arrived at by considering 3 consecutive Unit Conditions, is expected to encourage the evolution of candidate solutions which are flexible to changing match scenarios. An important consideration here was to encourage the candidates generated by the GA to make use of both action points available per turn.

Figure 5.6: Chromosome decision structure for Study One

In the pilot test, candidates evolved towards solutions which used Unit Conditions that consistently returned either true or false, and which led to Unit Decisions that depended upon chance to be successful. With the new structure, the candidates generated should be far less likely to arrive at a single Unit Decision, after checking the status of 3 successive Unit Conditions, over a succession of turns. Despite the limitations of the structure used in the pilot test, candidate solutions were produced which were able to win matches after just 4 generations of 40 candidates, suggesting that there was room to increase the size of the solution space. The solution space's dimensions depend on a combination of the structure of the chromosome and the size of the sets of Unit Conditions and Decisions (UC&D). Although this structure affords more depth in the decision process, the actual length of the string representing a candidate solution has only increased from 24, in the pilot test, to 30. However, due to the multiplicative nature of the relationship between the decision structure and the UC&D, upon which the solution space's dimensions are formed, the complexity could still increase by a large amount.

Figure 5.7: Chromosome encoding for the first action point

Unit Conditions and Unit Decisions Many of the Unit Conditions used in the pilot test were constructed using Inverter nodes, which simply modify the return value of existing condition nodes,

flipping true to false and vice versa. With the binomial decision structure, Unit Conditions which were constructed this way were no longer necessary. For example, ‘HasHighHp’ checked if a unit's health was above a threshold, whereas ‘HasWounds’ checked if the health was below the same threshold. Having both of these checks along the same path would provide an AI with no additional information about the game state. Their removal from the set of Unit Conditions available to the GA for this study reduced its size from 12 to 7.

Representative String Character    Behavior tree identifier
"A"    ‘HasHighHP’
"B"    ‘HasKillShot’
"C"    ‘OneEnemyVisible’
"D"    ‘NoOverwatchingTeammates’
"E"    ‘NoAllyIsHunkerDown’
"F"    ‘AllShotPercentagesAtOrAbove50’
"G"    ‘IsFlanked’

Table 5.5: Set of Unit Conditions used in Study 1.

The set of Unit Decisions increased in size from 15 to 20, to give the GA the opportunity to build its own solutions, rather than be overly guided by the Unit Decisions created for this project. The behaviors added each end in a single action node, and are represented by the characters "p", "q", "r", "s", "t" (see table 5.6). The Unit Decisions represented by the characters "m" and "n" (see table 5.6) had a ‘Reload’ ability concatenated to their selector node, in order to reduce the situations in which every node in a selector would return false and a unit would do nothing for an action point.

Representative String Character    Behavior tree identifier
"a"    ’SelectMoveStandard’
"b"    ’SelectMove_Defensive’
"c"    ’SelectMove_Aggressive’
"d"    ’SelectMove_AdvanceCover’
"e"    ’SelectMove_Flanking’
"f"    ’SelectMove_Fanatic’
"g"    ’SelectMove_Hunter’
"h"    ’TryShootOrReload’
"i"    ’ConsiderHunkerDown’
"j"    ’TryOverwatchOrReload’
"k"    ’TryShootIfIdealOrReload’
"l"    ’TryShootIfFavorableOrReload’
"m"    ’TryShootIfFavorableOrOverwatch’
"n"    ’TryHunkerDownOrShootIfFavorable’
"o"    ’ConsiderHunkerDownOrOverwatch’
"p"    ’TryShoot’
"q"    ’TryShootIfIdeal’
"r"    ’TryShootIfFavorable’
"s"    ’TryReload’
"t"    ’TryOverwatch’

Table 5.6: Set of Unit Decisions used in Study 1.

Solution Space

Size of set of Unit Conditions              c = 7
Size of set of Unit Decisions               d = 20
Number of Unit Decisions per AP             f = 8
Tree depth                                  p ∈ {0, 1, 2}
Number of indices at depth p                i_p = 2^p (i_0 = 1, i_1 = 2, i_2 = 4)
Total Unit Condition permutations per AP    Tc = c^i_0 * (c − 1)^i_1 * (c − 2)^i_2 = 7 * 36 * 625 = 157,500
Total Unit Decision permutations per AP     Td = d^f = 2.56 * 10^10
Total permutations per AP                   Tp = Tc * Td = 157,500 * (2.56 * 10^10) = 4.03 * 10^15
Size of solution space                      Ssp = Tp^2 = 1.62 * 10^31

Table 5.7: Solution space size for Study 1.
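The figures in Table 5.7 can be reproduced with a short script. This is an illustrative sketch under our own naming, not part of the thesis implementation:

```python
# Sketch of the solution-space arithmetic (function and variable names are ours).
def solution_space(c, d, f, depth=3):
    """c: Unit Conditions, d: Unit Decisions, f: Unit Decisions per action point.

    The binomial condition tree has 2**p indices at depth p, and along any
    path no Unit Condition may repeat, so depth p draws from (c - p) options.
    """
    t_c = 1
    for p in range(depth):
        t_c *= (c - p) ** (2 ** p)      # c^1 * (c-1)^2 * (c-2)^4
    t_d = d ** f                        # independent Unit Decision choices
    t_p = t_c * t_d                     # permutations per action point
    return t_c, t_d, t_p ** 2           # two action points: square

t_c, t_d, s_sp = solution_space(c=7, d=20, f=8)
print(t_c)   # 157500
print(t_d)   # 25600000000 (2.56 * 10^10)
print(s_sp)  # ~1.62 * 10^31
```

The same function reproduces the later Study 2 table with c = 5, d = 9, where Tc = 6480.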

Evolutionary Operators

Crossover
Uniform crossover replaced the single-point crossover method used by the GA to produce a new generation of candidates. Due to the amount of linkage between elements of the chromosome (paths of Unit Conditions directly relate to specific Unit Decision indices), uniform crossover was chosen to treat each element fairly. As in the pilot test, crossover can produce undesirable candidates by placing the same Unit Condition more than once along a decision path formed by the indices of the string. A similar solution to the problem is implemented for this iteration. If a duplicate Unit Condition is found at an index of an offspring candidate, the values of both parents at that index are checked to see if they are a desirable option, and if not, a new random value is generated until a unique one is found.

Mutation
Mutation closely follows what has been presented before. However, due to the chance of random Unit Conditions being generated when crossover produces undesirable candidates, two probabilities were used to control mutations. The indices containing Unit Conditions have a lower mutation probability than those containing Unit Decisions, as having to generate random Unit Conditions is a similar process to mutation. The probabilities used were 0.01 (1%) and 0.02 (2%).
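The two operators described above can be sketched as follows. This is a simplified illustration of one action point only (7 level-order Unit Condition indices followed by 8 Unit Decision indices); the data layout and function names are assumptions, not the thesis code:

```python
import random

# Illustrative sketch: one action point of the chromosome as a flat list.
CONDITIONS = list("ABCDEFG")
DECISIONS = list("abcdefghijklmnopqrst")
N_COND, N_DEC = 7, 8

def ancestors(i):
    """Indices of the Unit Conditions on the path above condition index i."""
    path = []
    while i > 0:
        i = (i - 1) // 2
        path.append(i)
    return path

def repair(child, i):
    """Re-roll a duplicate Unit Condition until its path is unique again."""
    used = {child[a] for a in ancestors(i)}
    while child[i] in used:
        child[i] = random.choice(CONDITIONS)

def uniform_crossover(p1, p2):
    # Each index is taken from either parent with equal probability.
    child = [random.choice(pair) for pair in zip(p1, p2)]
    for i in range(N_COND):  # top-down, so ancestors are already final
        if child[i] in {child[a] for a in ancestors(i)}:
            # Try the other parent's value first, then random until unique.
            child[i] = p2[i] if child[i] == p1[i] else p1[i]
            repair(child, i)
    return child

def mutate(child, p_cond=0.01, p_dec=0.02):
    # Unit Condition indices mutate less often (1% vs 2%), since crossover
    # repair already injects randomness into them.
    for i in range(N_COND):
        if random.random() < p_cond:
            child[i] = random.choice(CONDITIONS)
            repair(child, i)
    for i in range(N_COND, N_COND + N_DEC):
        if random.random() < p_dec:
            child[i] = random.choice(DECISIONS)
    return child
```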

Selection
Roulette selection was again used and, for the first 3 generations of the study, a 10% elitism (10 candidates) model was employed. After those 3 generations, it was observed that many candidates won matches but were not given the opportunity to try again, something which also happened in the pilot test. Additionally, those candidates who did win and were selected by elitism were not performing as well as expected.

Generation    Number of elite    Elite candidates which won    Winning candidates in
              candidates         their subsequent games        previous generation
0             -                  -                             -
1             10                 6 (60%)                       20
2             10                 6 (60%)                       30
3             10                 4 (40%)                       30

Table 5.8: Success of elite candidates produced by the first 3 generations of Study 1

It seemed as though many of the elite candidates were produced due to the RNG element of XCOM 2, and were unable to repeat their successes over subsequent matches (table 5.8). This could inhibit the evolution of good solutions, as assigning a high fitness value to a candidate which is not a good solution in most cases allows that solution to deposit its UC&Ds throughout the population. If a candidate does win through a series of favorable outcomes, it will have the opportunity to play again, and be assigned another fitness value - again distributing potentially unwanted UC&Ds throughout the population. It was felt that every candidate should have the opportunity to play another match if it is victorious, meaning that the actual number of elite candidates taken from each generation should be variable - this method is termed ‘Dynamic Elitism’ for the rest of the paper. This way each winner will be able to prove its fitness across multiple matches, reducing the impact RNG can have on the evolution process. As the GA converges on a set of solutions, it is possible for this method to produce generations that are completely comprised of elite candidates. As no further evolution is possible in this case, the number of elite candidates should not exceed 50% of the population size.
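A minimal sketch of the ‘Dynamic Elitism’ selection described above, assuming a simple (candidate, fitness, won-last-match) representation; all names here are illustrative, not from the thesis implementation:

```python
# Every match winner is carried into the next generation as an elite,
# capped at 50% of the population size so evolution can continue.
def select_elites(population, pop_size):
    """population: list of (candidate_id, fitness, won_last_match) tuples."""
    winners = [entry for entry in population if entry[2]]
    # Rank winners by fitness so the cap removes the weakest elites first.
    winners.sort(key=lambda entry: entry[1], reverse=True)
    cap = pop_size // 2     # elites must not exceed 50% of the population
    return winners[:cap]

pop = [("c0", 60, True), ("c1", 48, False), ("c2", 72, True), ("c3", 50, True)]
elites = select_elites(pop, pop_size=4)
print([e[0] for e in elites])   # ['c2', 'c0'] -- 3 winners, capped at 2 of 4
```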

Setup
Unit health was adjusted due to the observed impact that the previous value had on the solutions produced during the pilot test. As a result, each unit's health pool was doubled from 4 to 8. All other unit settings remained unchanged.

Fitness Evaluation
The basic fitness evaluation method was essentially unchanged, except that it could now produce larger values due to the increase in unit health pools. This meant that a candidate would receive a score of at least 48 should it win a match, and the maximum possible fitness would be 96. However, the evaluation process went through several changes. When Dynamic Elitism was introduced at generation 3, thought was given to how this would work with the standard fitness evaluation. Firstly, given the variable nature of the fitness values assigned to candidates (the actual fitness values are likely to be quite different, even when winning consecutive matches), the fitness should be averaged across generations for candidates selected via elitism, to lessen the impact of RNG. If a candidate wins 2 or more consecutive matches, it indicates that the solution has a higher probability of achieving those wins by being a good solution rather than by chance, and its fitness value should reflect this. Otherwise, any candidate that wins due to favorable circumstances could have an unrepresentative fitness associated with it, and be favored over candidates with a fitness averaged over multiple matches.

To address this, a modifier was added to the fitness scores of multiple winners. If a given candidate managed to win each of its first 5 games with a score of 48 (the minimum possible score for a win), then its average and representative fitness value would be 48. This means that any candidate winning its first match is likely to have a higher fitness value. It was decided that a candidate winning 5 consecutive games should never have a lower fitness than a candidate winning its first game. This modifier is not intended to represent the fitness of a candidate from an analytical point of view; it is more of a tool to give potentially good solutions an environment in which to thrive. Any candidate which loses a match loses its modifier value, to allow other solutions the chance to evolve. As the fitness value is how the GA determines elite candidates, solutions which won a few games in a row would otherwise always be considered elites. The fitness modifier is calculated by the following formula 5.2:

Fitness_mod = (Number of consecutive wins − 1) * 12        (5.2)

This is applied up to a maximum of 5 consecutive wins, at which point the modifier remains constant; otherwise fitness values could become so large that a single candidate could dominate the selection process. The value 12 was calculated from the maximum number of successive wins considered by the fitness modifier, and the fitness of 48 required to guarantee that a candidate winning its 5th consecutive match would always have a higher fitness than a first-time winner. A record of the unmodified fitness values of candidates will be retained for analysis.
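The modifier in formula 5.2 and its 5-win cap can be sketched directly (the function name is ours):

```python
# Sketch of the fitness modifier from formula 5.2, capped at 5 consecutive wins.
def fitness_modifier(consecutive_wins):
    if consecutive_wins < 1:
        return 0
    return (min(consecutive_wins, 5) - 1) * 12

# A candidate winning its first 5 games with the minimum score of 48 averages
# 48; the modifier lifts it to the level of the best possible first-time win.
print(48 + fitness_modifier(5))   # 96
print(fitness_modifier(1))        # 0
print(fitness_modifier(7))        # 48 (capped at 5 wins)
```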

Procedure
Overall, the procedure for this study follows precisely what has been described previously, except that the population size was increased to 100 for each generation produced, a 150% increase over the number generated for the pilot test. As the chromosome decision structure used here is more complicated than the one used in the pilot test, it was decided that, to allow for more variation between candidates - and thus more potential directions in which to converge - the population size should be larger. The 100 candidates are evaluated over 6 generations in total, resulting in 600 evaluation matches being simulated.

5.2.2 Analysis of Data
The introduction of the Dynamic Elitism method and adjusted fitness evaluation was initially encouraging. There was a jump between generations 3 and 4 in both the number of wins and average fitness - not including the fitness modifier (figure 5.8). For generation 5, however, the number of winning candidates fell from 45 to 37.

Figure 5.8: Average fitness % and win % of candidates per generation for Study One. Fitness average does not include modifier

When Dynamic Elitism was introduced for generation 3, the number of elite candidates increased to 36, of which 19 were successful in the matches simulated during generation 4. Generation 4 then produced 45 winning candidates, meaning 26 of the non-elite pool of 64 candidates won their matches. The fact that of those 45 elite candidates, only 16 were able to win their subsequent match suggests that chance is having an impact here. The Dynamic Elitism model as currently used does allow candidates who win to continue to prove themselves, but it also allows candidates who achieved match victories through favorable dice rolls to breed and take up space in subsequent generations; the lower number of victories in generation 5 could result from the impact of these fortunate candidates.

Generation    Number of elite    Elite candidates which won    Winning candidates in
              candidates         their subsequent games        previous generation
0             -                  -                             -
1             10                 6 (60%)                       20
2             10                 6 (60%)                       30
3             10                 4 (40%)                       30
4             36                 19 (53%)                      36
5             45                 16 (40%)                      45

Table 5.9: Elite candidate performance during Study One.

Given these fortunate candidates are likely to drop out in subsequent generations, the evolutionary process can still evolve towards an optimal solution. However, the process could potentially be made more efficient by adjusting the evaluation procedure to reduce the opportunities for ‘lucky’ candidates to become elite candidates. The GA produced several candidate AIs which were capable of winning multiple matches (table 5.10). 17 candidates won 3 or more matches (7 with 1 loss, 5 undefeated), including 3 candidates that won at least 4 matches while undefeated, indicating that the decision structure allowed the GA to generate more robust solutions than during the pilot test.

Consecutive wins    Number of defeats    Number of candidates
2                   1                    10
3                   1                    7
4                   1                    2
2                   0                    11
3                   0                    5
4                   0                    2
5                   0                    0
6                   0                    1

Table 5.10: Consecutive match wins achieved by candidates during Study One.

Unit Conditions and Unit Decisions
When analyzing the occurrences of Unit Conditions within the strings which represent the decision structure of candidate solutions, their index within the string must also be considered. A Unit Condition at index 0 of a string will be a part of all 8 paths within the decision structure, those at indices 1 and 2 are in 4 paths, and those at indices 3, 4, 5 and 6 are in 2 paths (for action point 1 only). This fact is reflected in the overview of Unit Conditions contained within the 17 candidate solutions which won 3 or more matches (figure 5.9), where an appearance of a Unit Condition in the string was weighted by the number of paths associated with its index divided by 2 (4, 2, 1).

Figure 5.9: Graph showing the number of Unit Conditions contained within candidate solutions who won a minimum of one game at Study One

The Unit Condition "E" (a check to see if any other friendly units are using the ability ‘Hunker Down’) is poorly represented within the dataset. Part of the reason for this could be attributed to the fact that only 2 of the 20 Unit Decisions have the ‘Hunker Down’ action node contained within their behavior, and one of those - "o" - was also poorly represented within the same dataset (figure 5.10). This would result in the Unit Condition "E" returning false the majority of the time, making it less useful. Another Unit Condition with a smaller representation is "B" (returns true if there is a visible enemy unit that can be killed with the normal hit damage AI units can do). This Unit Condition is the 2nd least represented of the Unit Conditions for AP2 and overall. The check that this Unit Condition performs, and its return value, have no real influence on the Unit Decision it results in, as the targeting system used when instructing a unit to shoot is based on a set of weights that includes hit chance above 50% and remaining health at or below the minimal damage of 3, so an AI unit might not specifically shoot at an enemy unit who can be killed with one shot.

Figure 5.10: Graph showing the number of Unit Decisions contained within candidate solutions who won a minimum of one game at Study One

At first glance, figure 5.10 above shows a high-variance distribution of Unit Decisions within the successful candidates, and in many ways it is higher still, as many of the Unit Decisions are based around the same action nodes. The Unit Decisions "p" and "h" both instruct a unit to shoot if it has a visible target, the only difference being that "h" will instruct a unit to reload if no targets are available. The fact that "p" is favored on the 1st AP further indicates that RNG could still be influencing the evolution of some of the candidates. It is expected that AI units will prefer to move into positions on AP1 and use abilities on AP2 - and it can be seen that all of the movement-based Unit Decisions (characters "a" through "e") are better represented for AP1, with the exception of "e" (move to try and flank an enemy unit). Additionally, "l", "m" and "r" are based around telling a unit to shoot if it has a ‘favorable’ shot, each of these being well represented. It appears as though the shooting-based Unit Decisions are heavily favored by the GA, although this could be due to there being more of those than movement-based variants available when forming candidates. The shooting-based Unit Decision "q", which instructs a unit to shoot if it has an ‘ideal’ shot (chance to hit that is greater than 70%), is extremely poorly represented. This is most likely due to the hit chance requirement, as it has no backup action to instruct a unit to perform if no ideal shot is available. The nature of the shooting-based Unit Decisions should also be considered: when a unit tries to shoot, it is because there is a visible target that it has a chance to hit. So if a shooting action node doesn’t fail, there is a chance of a tangible payoff with respect to the fitness evaluation, as fitness is given for each point of damage done to enemy units. The payoff from a movement-based variant is wholly dependent on what follows after that movement. This could slow down the process of identifying optimal partial solutions that should contain movement-based Unit Decisions.

5.3 Study 2

The data gathered in Study 1 showed the evolution environment generally produced more consistent candidate solutions than those produced during the pilot test. This study will investigate a modified dynamic elitism method, and a further refined set of Unit Conditions and Decisions, to see if they can help the GA produce a similar number of candidate solutions capable of winning at least 3 matches in a row, but with a smaller population size, and thus fewer evaluations, than the evolution environment of Study 1 required.

5.3.1 Design
The implementation of the evolution environment closely followed that used during Study 1. The alterations, and why they were made, will be discussed here.

Unit Conditions and Unit Decisions
Two of the 7 Unit Conditions available to the GA in Study 1 were removed. They were the least represented within the successful candidates from that study and, as explained in the analysis of that data, had limited meaningful impact on the decision-making process in the general case.

Representative String Character    Behavior tree identifier
"A"                                ‘HasHighHP’
"B"                                ‘OneEnemyVisible’
"C"                                ‘NoOverwatchingTeammates’
"D"                                ‘AllShotPercentagesAtOrAbove50’
"E"                                ‘IsFlanked’

Table 5.11: Set of Unit Conditions used in Study 2.

The set of Unit Decisions went through a heavy revision based on the analysis of Study 1. Many of the similar variants were condensed, the balance between movement-based and shooting-based variants was evened out, and each variant was constructed to not return a failure in any circumstance.

Representative String Character    Behavior tree identifier
"a"                                ‘SelectMove_Defensive’
"b"                                ‘SelectMove_Aggressive’
"c"                                ‘SelectMoveFlankingOrAggressive’
"d"                                ‘TryShootOrReloadOrOverwatch’
"e"                                ‘TryOverwatchOrReload’
"f"                                ‘TryShootIfFavorableOrReloadOrOverwatch’
"g"                                ‘TryShootIfIdealOrReloadOrOverwatch’
"h"                                ‘ConsiderHunkerDownOrMoveDefensive’
"i"                                ‘TryShootIfIdealOrMoveFlankingOrMoveAggressive’

Table 5.12: Set of Unit Decisions used in Study 2.

The movement-based variants best represented within the successful candidates generated during Study 1 are retained, and represented by the characters "a", "b" and "c". The flanking movement behavior was previously able to fail should no flanking position be available; now, the aggressive movement behavior is called should a flanking movement not be possible. The multiple ‘TryShoot’ and ‘TryShootIfFavorable’ variants available in Study 1 have been condensed into Unit Decisions "d" and "f", given that they were very similar; no condition checks used within the variants were removed when condensing. To help with balancing the types of Unit Decisions, movement behaviors were used to create some variation and choice. For example, "i" instructs a unit to shoot only if it has a hit chance greater than 70%, otherwise to try to take up a flanking position, or to move aggressively. For "h", units are instructed to attempt to use the Hunker Down ability; should a unit not be in cover, and thus not able to use the ability, it moves defensively.

Solution Space

Size of set of Unit Conditions              c = 5
Size of set of Unit Decisions               d = 9
Number of Unit Decisions per AP             f = 8
Tree depth                                  p ∈ {0, 1, 2}
Number of indices at depth p                i_p = 2^p (i_0 = 1, i_1 = 2, i_2 = 4)
Total Unit Condition permutations per AP    Tc = c^i_0 * (c − 1)^i_1 * (c − 2)^i_2 = 5 * 16 * 81 = 6,480
Total Unit Decision permutations per AP     Td = d^f = 4.3 * 10^7
Total permutations per AP                   Tp = Tc * Td = 6,480 * (4.3 * 10^7) = 2.78 * 10^11
Size of solution space                      Ssp = Tp^2 = 7.7 * 10^22

Table 5.13: Solution space size for Study 2.

Selection
Dynamic Elitism was designed to help candidates that win their matches become elite candidates, and continue to contribute to the evolution of an optimal solution. However, the continued impact of RNG, which led to candidates winning with poor or highly situational solutions, could be reduced - improving the efficiency of the dynamic elitism model - by adjusting the previous fitness evaluation model.

Setup
All unit variables have remained constant, with the exception of the health pool. AI and enemy units will now have 10 hit points, up from 8. This change was made due to observations during previous studies that a good turn for either side would essentially determine the outcome of the match. With 8 hit points, 1 critical hit and 1 non-critical hit were required to kill a unit; with the increase to 10, killing a unit in two shots would require both to be critical hits, which is less likely to occur.

Fitness Evaluation
The basics of the fitness evaluation have remained the same: all damage done and all health remaining at the end of a match contribute 1 fitness each. However, due to the variability in performance, fitness is averaged across generations when candidates are carried over due to elitism. There are further changes required for this study, including the minimum baseline fitness for a victory increasing to 60, and the maximum to 120. The fitness modifier - introduced to help candidates who have won multiple matches in succession be more likely to be chosen during selection - also factors in the increase in health, with the modifier values being 20 for 2 wins, 40 for 3 wins, and 60 for 4 wins. As before, should a candidate lose a match at any time, the modifier is lost and its fitness is only represented by the average fitness of all matches played across generations. Any candidate solution winning its first match will now have to be evaluated a second time. A candidate winning both matches will have its base fitness values averaged, and then the modifier value for 2 wins applied. Candidates winning and then losing have their base fitness averaged. This is intended to supplement the Dynamic Elitism selection model by reducing the chance of poor solutions becoming elite candidates. The modifier is also applied immediately, as a candidate could win with a very high fitness and narrowly lose a match, and have a higher average fitness than a candidate who narrowly won 2 matches.
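The evaluation flow just described can be sketched as follows. The function names and data shapes are our assumptions, not the thesis implementation:

```python
# Study 2 fitness sketch: base fitness is averaged over all matches played,
# and modifiers of 20/40/60 apply for 2/3/4+ consecutive wins, lost entirely
# on any defeat.
def modifier(consecutive_wins):
    return min(max(consecutive_wins - 1, 0), 3) * 20

def evaluate(base_scores, results):
    """base_scores: per-match fitness values; results: per-match True = win."""
    streak = 0
    for won in results:
        streak = streak + 1 if won else 0   # a loss resets the streak
    avg = sum(base_scores) / len(base_scores)
    return avg + modifier(streak)

# Win both initial matches (scores 70 and 80): averaged, plus the 2-win bonus.
print(evaluate([70, 80], [True, True]))    # 95.0
# Win then lose on re-evaluation: the modifier is lost, only the average remains.
print(evaluate([70, 55], [True, False]))   # 62.5
```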

Data Logging
In addition to the fitness values and match wins which have been logged during the previous candidate evaluation phases of the GA, the XCOM 2 mod was set up to output additional combat information, to attempt to attain a deeper understanding of candidate performance.

• Damage per hit: This value represents the average amount of damage done on a successful hit, for all units of a team, within a single match. It is calculated by taking the total damage done to the enemy units, and dividing it by the number of shots taken by the AI units. It should be noted that the ‘Overwatch’ ability damage contributes to damage done, but not to shots taken, meaning this value can be inflated by behaviors that make liberal use of ‘Overwatch’.

• Turn count: The number of turns taken before a match ended, with a turn being considered a succession of one AI turn and one AI opponent turn.

• Accuracy: Defined as the number of attacks that resulted in a hit, divided by the number of shots taken. This is calculated from the totals of all units at the end of a match.

• Cover per turn: This value takes each active unit’s cover modifier per turn and calculates an average; these per-turn averages are then averaged over an entire match. As such, this value provides an indication as to the effect of favoring cover in candidate solutions.
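The logged metrics above reduce to simple ratios and averages; a sketch with assumed names (note the ‘Overwatch’ caveat: its damage counts towards damage done but not shots taken):

```python
# Illustrative computation of the logged combat metrics (names are ours).
def damage_per_hit(total_damage, shots_taken):
    # 'Overwatch' damage is included in total_damage but not shots_taken,
    # so behaviors using 'Overwatch' heavily can inflate this value.
    return total_damage / shots_taken if shots_taken else 0.0

def accuracy(hits, shots_taken):
    return hits / shots_taken if shots_taken else 0.0

def cover_per_turn(per_turn_cover):
    """per_turn_cover: average cover modifier of active units for each turn."""
    return sum(per_turn_cover) / len(per_turn_cover)

print(damage_per_hit(total_damage=42, shots_taken=12))   # 3.5
print(accuracy(hits=9, shots_taken=12))                  # 0.75
print(cover_per_turn([1.5, 2.0, 1.0]))                   # 1.5
```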

Procedure
The experiment procedure generally follows the same process. For this study, candidate solutions which win their first match are re-simulated, and their associated average fitness and modifiers are manually recorded. The population size was reduced from 100 to 50, as it is expected that the performance of the GA should improve sufficiently to produce comparable candidates. Additionally, the decision to re-evaluate candidate solutions which win their first match means that the number of evaluations will be dynamic, so any potential optimization improvements will have to be offset against this.

5.3.2 Analysis of Data
The changes made to the evolution environment were designed to produce stable candidates more efficiently than those produced in Study 1, by reducing the impact of poor solutions on the GA. The changes should initially produce fewer winning candidates, and thus lower fitness values (when viewed as a percentage of the total fitness possible, given the increase in unit health), but these values are expected to rise more quickly due to the GA being able to converge on more optimal solutions. The values in figure 5.11 indicate that these values did indeed decrease. In generation 0, only 4% of candidate solutions were considered winners (won their first 2 matches), compared to the 20% achieved in generation 0 during Study 1. Average fitness was less affected, with the average fitness in this study being 25% of the maximum attainable, compared to 29% in Study 1.

Figure 5.11: Average fitness % and win % of candidates per generation for Study 2. Fitness average does not include modifier

The generational change in fitness value and match wins is also more consistent, with both increasing from generation to generation, unlike the performance observed during Study 1, where these values could stagnate or even fall between subsequent generations. An 18% increase in wins and a 19% increase in the average fitness percentage can be seen between generations 0 and 5. This compares favorably with the candidates produced during Study 1, which produced results of 16% and 17% respectively.

Winning Candidates
There were 30 candidates generated which won consecutive matches, and of those, 14 won 3 or more (table 5.14), compared with the 17 of 38 produced during Study 1. Although fewer stable candidates were produced, they were produced by a GA evolving a population 50% smaller.

Consecutive wins    Number of defeats    Number of candidates
2                   1                    9
3                   1                    6
4                   1                    2
2                   0                    6
3                   0                    7

Table 5.14: Consecutive match wins achieved by candidates during Study 2.

The breakdown of Unit Conditions contained within the 14 candidates winning 3 or more matches (figure 5.12) shows each of them to be reasonably evenly distributed, when looking at both the totals and appearances per action point, although "A", "B" and "E" are slightly better represented than "C" and "D". This indicates that each of these Unit Conditions has some value, and should be able to contribute towards producing stable candidate solutions.

Figure 5.12: Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches at Study 2

Unit Decisions went through a large revision for this study, and their representation within the stable candidate solutions produced (figure 5.13) shows that these changes had an immediate effect on how the GA utilized them to form the stable solutions. It can be seen that most of the Unit Decisions are now being favored for a particular action point. Each of the movement-based variants ("a", "b", "c") is now heavily favored for AP1, with the shooting-based variants ("d", "f", "g") favoring AP2. It can also be seen that each of the Unit Decisions is well represented within the sample space, though some appear to have low total occurrences within the stable candidates, due to the tendency to heavily favor one action point over the other ("a", "b", "h"). This indicates that the placement of these Unit Decisions in one action point or the other is what gives them their value.

Figure 5.13: Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches at Study 2

In stark contrast to this is the Unit Decision represented by the character "i". This newly created variant, introduced for this study, instructs a unit to shoot if it has a 70% or greater chance to hit an enemy unit, or move to a flanking position, or if that is not possible, move aggressively. Given this Unit Decision ends in both movement and shooting action nodes, it is perhaps not surprising that it is well represented for both AP1 and AP2, although the condition gating the shooting action node will ensure that the movement action nodes are run more often. The extent to which "i" was favored could also indicate another potential method to help the GA produce better candidate solutions. Perhaps more of these variants could be created when trying to ensure that a given Unit Decision is not able to return false (no action executed). They could offer alternatives to the current set, which attempts to group similar or related action nodes together (‘Shoot’, ‘Overwatch’, ‘Reload’), to create Unit Decisions that are perhaps better able to handle changing game situations.

Re-evaluation
There were 36 candidate solutions which, after successfully winning their initial matches, failed to win their subsequent matches. Table 5.15 displays combat information about how these candidates performed, broken down between the two matches, and compared against the combat information of all matches played.

                      Match 1    Match 2    Difference    Difference    Average of    Average of     Average of
                      (Win)      (Loss)     (absolute)    %             all played    all matches    all matches
                                                                        matches       won            lost
Damage / Hit          3.50       3.30       0.20          -5.8          3.37          3.48           3.31
Turn Count            13.28      12.56      0.72          -5.4          13.55         13.52          13.57
Accuracy              0.86       0.61       0.25          -29.4         0.67          0.76           0.62
Average Cover / Turn  1.64       1.58       0.06          -3.7          1.60          1.64           1.59

Table 5.15: Combat performance information about candidates that failed to win consecutive matches.

As expected, accuracy, damage and the inclination towards favoring cover all fall between matches 1 and 2; however, the biggest difference can be seen in the accuracy of the candidates, where they performed with 29.4% reduced accuracy on average. This is likely due to the fact that the average accuracy attained during the first matches was 86%. This shows a 19% increase over the average accuracy attained by candidates during all matches played, and a 10% increase over the average of all matches candidates won. During match 2, candidates on average had an accuracy that was close to the average for all lost matches, and only 6% lower than the average for all matches played. This data could indicate that favorable RNG impacted the results of matches which were won, rather than poor luck in the subsequent re-evaluation matches. Additionally, it can be seen that the other items of data extracted do not seem to have a strong correlation with the match results, with all average values being similar no matter the outcome.

5.4 Study 3

This section will detail the investigation of a restriction to the structure of the chromosome used by the GA, designed to reduce the size of the solution space. It is expected that this restriction should produce more successful and stable candidates than those found in Study 2.

5.4.1 Design
The only redesigned elements of the evolution environment were the chromosome structure and the associated evolutionary operators.

Chromosome
This iteration of the chromosome focused on further restricting the decision structures housed within it, primarily to reduce the solution space, but also to provide a more rigid structure for the GA to work with, such that a crossover of a Unit Decision has a better chance of still being meaningful at its location within a generated offspring candidate. Within the previous binomial decision structure, the only restriction was that no character representative of a Unit Condition could appear more than once along any path. That restriction still remains, with the added restriction that each tree will be made from exactly 3 Unit Conditions. Using only 3 Unit Conditions means that, no matter which order they are placed in within a binomial tree, if the rule that no Unit Condition appears more than once on any given path is respected, then the ordering of the characters makes no difference to the outcomes of the decision paths (figure 5.14).

Figure 5.14: Example chromosome structures showing the irrelevance of the order of the Unit Conditions

As order is not important, the size of the current set of Unit Conditions (r), and the number of Unit Conditions required to form a decision structure (n), can be used with the equation shown in 5.3 to calculate the total possible decision structure combinations. Using the same set of Unit Conditions as previously (r = 5), the total number of possible combinations of Unit Conditions is 10. Because combinations such as ‘ABC’, ‘CBA’ and ‘BAC’ are considered the same, each potential combination is restricted to alphabetical order when considering them as decision structures, reducing the sample space for this part of the chromosome to 10, from 6480 during Study 2.

C(r, n) = r! / ((r − n)! * n!)        (5.3)
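This is the standard binomial coefficient, available directly in Python's standard library:

```python
from math import comb

# With r = 5 Unit Conditions available and n = 3 per decision structure,
# formula 5.3 gives the number of unordered combinations.
print(comb(5, 3))   # 10 -- down from the 6480 ordered structures of Study 2
```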

Evolutionary Operators

Crossover
The crossover operator for the Unit Decision indices of the chromosome remains unchanged; however, some implementation changes were required to handle the new decision structure. Although there are only 10 possible variations, it was decided to continue with the principle of uniform crossover, rather than simply swapping each parent candidate's entire decision structure. As such, uniform crossover is applied to each of the 3 characters, following the rule of no repeated characters. Similarly to before, if no valid character is available from the associated indices of either parent candidate, a random valid Unit Condition is selected. Once crossover is applied to an entire string, the decision structure sections are sorted into alphabetical order, such that each takes the form of one of the 10 possible decision structures.

Mutation

The implementation of the mutation operator remained unchanged. However, given the limited number of potential decision structures available, and the fact that crossover has a chance to essentially mutate them when it has to generate random Unit Conditions, the mutation probability for the decision structure elements of the chromosome is set to 0. The chance for mutation to occur on Unit Decisions remains unchanged at 2%.
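A per-gene mutation pass consistent with this description might look like the following sketch. The class and method names are invented; the nine lowercase characters mirror the Unit Decision alphabet seen in the chromosome strings later in this chapter:

```java
import java.util.Random;

public class DecisionMutation {
    static final String DECISIONS = "abcdefghi"; // the 9 Unit Decisions

    // Each Unit Decision gene mutates independently with the given rate
    // (2% in Study 3); the decision-structure (Unit Condition) genes are
    // simply never passed through this method, i.e. their rate is 0.
    static String mutateDecisions(String genes, double rate, Random rng) {
        StringBuilder out = new StringBuilder(genes);
        for (int i = 0; i < out.length(); i++) {
            if (rng.nextDouble() < rate) {
                out.setCharAt(i, DECISIONS.charAt(rng.nextInt(DECISIONS.length())));
            }
        }
        return out.toString();
    }
}
```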

Solution Space

Size of set of Unit Conditions:              c = 5
Size of set of Unit Decisions:               d = 9
Number of Unit Decisions per AP:             f = 8
Total Unit Condition combinations per AP:    Tc = 10
Total Unit Decision permutations per AP:     Td = d^f = 9^8 ≈ 4.3×10^7
Total permutations per AP:                   Tp = Tc × Td ≈ 4.3×10^8
Size of solution space:                      Ssp = Tp² ≈ 1.8×10^17

Table 5.16: Study 3's solution space size.
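The figures in table 5.16 can be reproduced with a few lines of arithmetic. This is a sketch; the variable names simply mirror the table:

```java
public class SolutionSpace {
    static final long D = 9;   // size of the Unit Decision set
    static final long F = 8;   // Unit Decisions per AP
    static final long TC = 10; // decision-structure combinations per AP

    static long unitDecisionPermutations() { // Td = d^f
        long td = 1;
        for (int i = 0; i < F; i++) td *= D;
        return td;
    }

    static long permutationsPerAp() { // Tp = Tc * Td
        return TC * unitDecisionPermutations();
    }

    static double solutionSpaceSize() { // Ssp = Tp^2 (two APs per chromosome)
        double tp = permutationsPerAp();
        return tp * tp;
    }

    public static void main(String[] args) {
        System.out.println(unitDecisionPermutations());   // 43046721  (≈ 4.3e7)
        System.out.println(permutationsPerAp());          // 430467210 (≈ 4.3e8)
        System.out.printf("%.2e%n", solutionSpaceSize()); // ≈ 1.85e17
    }
}
```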

5.4.2 Analysis Of Data

The average percentage of maximum fitness and the win percentages per generation (figure 5.15) show a general improvement over those attained during Study 2. In particular, win percentages roughly doubled; for example, the win percentage for generation 5 was 22% in Study 2 and 44% in Study 3. Average fitness percentage improved to a lesser degree, with each generation returning a 5-10% improvement compared with Study 2.

Figure 5.15: Average fitness % and win % of candidates per generation for Study 3. Fitness average does not include modifier

There appears to be a large increase in both metrics between generations 1 and 2, followed by a decrease in average fitness, and no increase in win percentage, between generations 2 and 3. Similar occurrences were observed in the overview of results obtained during Study 1; there, this was attributed to candidates benefiting from favorable RNG, and led to the decision to re-evaluate candidates winning for the first time. Here it appears that the offspring candidates generated from the candidates found in generation 2 were not good solutions in general. 15 winning candidates were found in generation 2: 11 coming from the offspring candidates produced from generation 1, and 4 elite winners. 7 of the 11 candidates won their 3rd matches as elite candidates during generation 3, and 2 of the 4 elites from generation 2 also won. As generation 3 had a total of 15 winning candidates, only 6 of its wins came from newly formed offspring candidate solutions, with 9 coming from elite candidates.

Winning Candidates

There were a total of 46 candidates which won 2 or more consecutive matches, 21 of which did so without losing a match. Of these, 26 candidates were able to win 3 or more matches, with 14 suffering 1 defeat and 12 remaining undefeated (table 5.17). This shows an improvement in the successful candidates produced during this study compared with Study 2.

Consecutive wins    Number of defeats    Number of candidates
2                   1                    11
3                   1                    8
4                   1                    2
5                   1                    2
6                   1                    1
8                   1                    1
2                   0                    9
3                   0                    10
4                   0                    1
7                   0                    1

Table 5.17: Elite candidate performance during Study 3.

In addition, these candidates also appear to show improved stability, as there were candidates winning 4, 5, 6, 7, and 8 consecutive matches, indicating that their behaviors are consistent across various scenarios and less sensitive to random events or chance. The Unit Conditions contained in the successful candidates show a fairly even distribution (figure 5.16) between the representative characters "B", "C", and "D", while those represented by "A" and "E" feature more prominently, favoring AP1 and AP2 respectively. In general, the distribution of Unit Conditions within candidates did not alter much compared with Study 2.

Figure 5.16: Occurrence of Unit Conditions within candidates winning 3 or more consecutive matches at Study 3

The distribution of Unit Decisions (figure 5.17) was also similar to Study 2, with movement-based variants preferred on AP1, and shooting-based variants preferred for AP2. These similarities in distributions between studies could indicate that the restrictions to the chromosome structure were successful in helping to reduce the solution space, and that the reduction did not have an obvious impact on how the solutions were formed.

Figure 5.17: Occurrences of Unit Decisions within candidates winning 3 or more consecutive matches at Study 3

Re-evaluation

The results show that 22 candidate solutions won their first matches and suffered defeat when they were re-evaluated. The combat data obtained during the evaluation of these candidates can be seen in table 5.18. On the surface, 'Damage per Hit' and 'Average Cover' continue to show little variability in relation to the outcome of a given match, although there is a larger decrease in the average turns taken for these candidates than was seen in Study 2.

                    Match 1    Match 2    Difference    Difference    Average of    Average of     Average of
                    (Win)      (Loss)     (absolute)    %             all played    all matches    all matches
                                                                      matches       won            lost
Damage/Hit          3.42       3.53       0.11           3.2          3.56          3.60           3.50
Turn Count          16         14         2             -12.5         11.77         11.45          12.01
Accuracy            0.64       0.55       0.09          -14.1         0.65          0.67           0.62
Average Cover/turn  1.82       1.83       0.01           0.5          1.81          1.80           1.82

Table 5.18: Combat performance information about candidates that failed to win consecutive matches.

During Study 2 it was suggested that the accuracy values obtained from candidates - which won the first and lost the second of their two matches played - indicated that the RNG had been favorable during match 1, and that the accuracy attained during match 2 was what was expected given the population means. Here it can be seen that the average accuracy during match 1 was in line with the population means, while the average accuracy from match 2 fell below even the population mean of all lost matches. This contradicts the assertion from Study 2.

5.4.3 Final Evaluation

The data indicates that Study 3 was able to produce more successful and stable candidates than Study 2. To support this, the 5 fittest candidates from the final generation of each study were further evaluated, up to a total of 10 matches each, against the default AI. The overview of the results of these evaluations in table 5.19 shows that the fittest candidates from Study 3 won more matches in general, with the average number of wins per candidate increasing from 6.8 in Study 2 to 8.2 in Study 3, supporting the suggestion that there is an improvement in candidate performance.

             Study 2                    Study 3
Candidate    Wins    Average Fitness    Wins    Average Fitness
1            8       76.2               7       72.4
2            5       60.8               9       74.1
3            7       68.5               9       77.6
4            6       69.1               8       74.1
5            8       72.7               8       74.9
Average      6.8     69.46              8.2     74.62

Table 5.19: Wins and average fitness of the 5 fittest candidates from the final generation of each study.

To see whether the improvement could be described as statistically significant, a Wilcoxon rank sum test was conducted on the average fitness of each candidate. This test investigates a null hypothesis that the medians of the two sets of samples are equal; the alternative hypothesis states that the median of Study 2 is less than the median of Study 3. The resulting p-value is the probability of observing the given result, or one more extreme, by chance if the null hypothesis is true. To be statistically significant, the p-value should be less than the alpha value of 0.05.

           alpha value    p-value    Null hypothesis
Fitness    0.05           0.0675     True
Wins       0.05           0.0714     True

Table 5.20: Results from the Wilcoxon rank sum test, comparing candidates from Study 3 against those from Study 2.

The rank sum test was conducted in Matlab1 on both the wins candidates achieved and their fitness (see table 5.20). In both cases the null hypothesis - that the medians of the two sample sets are equal - could not be rejected; however, the p-values of 0.0675 and 0.0714 are very close to the significance level defined by the alpha value, indicating that there was an improvement, even if it was not statistically significant.
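As a check on the arithmetic behind this test, the rank-sum statistic W (the quantity on which MATLAB's ranksum is built) can be computed directly from the fitness columns of table 5.19. This sketch handles ties by averaging ranks but deliberately omits the p-value computation:

```java
import java.util.Arrays;

public class RankSum {
    // Rank-sum statistic for the first sample, with average ranks for ties
    // (the statistic underlying the Wilcoxon rank sum / Mann-Whitney test).
    static double rankSum(double[] x, double[] y) {
        double[] all = new double[x.length + y.length];
        System.arraycopy(x, 0, all, 0, x.length);
        System.arraycopy(y, 0, all, x.length, y.length);
        Arrays.sort(all);
        double w = 0;
        for (double v : x) w += averageRank(all, v);
        return w;
    }

    // Average 1-based rank of value v within the sorted combined sample.
    static double averageRank(double[] sorted, double v) {
        double sum = 0;
        int count = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == v) { sum += i + 1; count++; }
        }
        return sum / count;
    }
}
```

For the Study 2 fitness column the statistic comes out at 20, against 35 for Study 3 (the two must sum to 55, the total of ranks 1..10), consistent with Study 3's candidates ranking higher overall.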

5.5 User Testing

Motivation

Having identified that the methodology used during Study 3 produced the most successful and stable candidates, a selection of the most 'evolved' of these were pitted against real players, and their performance analyzed. This was necessary in order to judge the potential of the proposed method as another useful tool for developing game AI.

1Mathworks, http://se.mathworks.com/products/matlab/

Expectations

Based on the observations of the previous studies, and the fact that the AIs were trained against the default AI - which itself is not representative of the challenge posed to players by XCOM 2 - the generated AIs are not expected to be extremely difficult for human players to defeat. However, they should provide superior performance to the default AI. Additionally, the candidate AIs are expected to win at least 1 game against the test participants.

Setup

The 5 candidate AIs with the highest average fitness in generation 5 of Study 3 were selected to play against human opponents - although there were candidates in previous generations which may have won more games before eventually losing. These were chosen as they are most representative of the quality of solutions produced at the most evolved point of the methodology, and thus of any potential applications. To validate the performance of these candidate AIs relative to the default AI, the latter was included as the 6th AI that the players would face.

Since the mod was initially configured to perform simulations between computer controlled opponents, changes had to be made in order to return control of the player units to human players. The effect responsible for running behavior trees on the player units was adjusted to no longer perform this functionality. Along with this change, the fitness evaluation was adjusted to track the AI opponent's units rather than the player units.

In terms of physical setup, it was decided to perform the experiment in a comfortable environment for the players, one that would provide the optimal circumstances for them to concentrate. Inspired by competitive chess tournaments, efforts were made to reduce background noise and to provide sufficient lighting and personal comfort. As such, a well lit and somewhat soundproofed room was utilized. Inside, the two simultaneous participants would be positioned back to back, each at a computer running the experiment game platform, and the test would then begin.

Participant Demographics

The pool of 10 participants featured ages ranging from 23 to 31 and, as such, fits the general target audience for testing video game products. This segment comprises young adults who have had contact with computer technologies in some capacity and are capable of understanding the basic game mechanics required for participation in the experiment. All participants reported playing games among other leisure activities, and could recall at least one turn-based tactics game that they had played in the past.

Figure 5.18: Playtime differences between XCOM 2 and XCOM Enemy Unknown/Enemy Within

Of all participants, 40% had not played XCOM 2 at all, 30% had played somewhere between 1 and 20 hours, 10% had played between 20 and 60 hours, and 20% had more than 60 hours of experience (figure 5.18). Also, because the gameplay is quite similar between XCOM Enemy Unknown/Enemy Within and XCOM 2, it was potentially relevant to ask about experience with the prequel as well. The percentages are mostly the same, with the exception of participant 5, who had more experience with XCOM EU/EW than with XCOM 2.

Methodology

Two participants would play simultaneously, positioned back to back, using two computers capable of running the game. The first step was filling out a questionnaire (see Appendix D), which gathered data about age, past TBT gaming experience, and past XCOM EU/EW and XCOM 2 play times. The first two items are designed to give an impression of a participant's overall experience, while the last two are aimed specifically at potential correlations between experience and competence. Upon completing the questionnaire, the participants would begin their series of 6 games, always starting with the default AI and then randomly switching between the remaining behavior trees until all had been played once. The default AI was encountered first so as not to bias the data in favor of the candidate AIs, as test participants who are less familiar with XCOM 2 could potentially gain increased aptitude as their experience with the game increases. A short break was allowed between games, partly due to the data recording that was performed and partly due to the intensity of the matches, which appeared to have some exhausting effects on test participants, primarily due to match lengths exceeding 30 minutes.

Results and Analysis

Overall, the behavior trees scored fitness values above 0, which suggests that regardless of their performance, all the AIs managed to present some degree of challenge to the players. Additionally, most matches took around 10 turns to reach resolution, with the exception of games in which the AI won, which took around 12 turns.

                         Victory    Average    Average           Average    Average
                         Rate       Fitness    Winner Fitness    Turns      Winner Turns
Default                  0%         29.1       0                 9.6        0
ACEcagcficiBCEdagafdfc   20%        39         78.5              10.4       12
ADEfcccbfbfBCEdbeggffg   10%        38         81                9.5        12
BCDgdcdaihfADEigidfdeg   50%        60.9       85.2              10.2       12
BCEcaccaaidADEdbgdbbge   0%         27.1       0                 9.6        0
ACEiaabficbADEiaggfgfg   0%         27.6       0                 9.3        0

Table 5.21: Results from the User Testing performed on the 5 best BTs evolved and on the Default AI.

Out of the 5 candidate behavior trees tested, 2 lost all games played, while the other 3 managed to win at least 1 match against test participants. The default AI failed to win any match, as expected. The average fitness of the AIs which failed to win a match (the default AI and 2 candidate AIs) was around 30 (50% of the score required to win). Of the 3 candidate AIs with victories, 2 achieved an average of around 40 points, while the most successful candidate AI achieved an average fitness of 60 points; their victory rates were 10% and 20% respectively, with the most successful boasting a 50% victory rate. From the test participants' perspective, there is a strong correlation between the reported play times of XCOM games and victory rates. Players with a reported playtime above 20 hours achieved 100% victories, while players below that mark were the most susceptible to the tactics employed by the candidate AIs, with each of them suffering at least 1 defeat to a candidate AI. A similar correlation can be observed in relation to the average enemy fitness, which shows higher AI fitness values for the inexperienced players and lower values for the experienced ones (figure 5.19).

Figure 5.19: Correlation between participants' play times (in descending order) and combat outcome statistics

A two-tailed Wilcoxon rank sum test was conducted to determine whether the candidate AIs showed a statistically significant increase in fitness value compared to the default AI (table 5.22). The null hypothesis states that the default and candidate AIs provide equivalent performance, and the alternative hypothesis states that the candidate AIs show increased performance. The alpha value was 0.05. The candidate AI represented by the string "BCDgdcdaihfADEigidfdeg" achieved a p-value of 0.0057, rejecting the null hypothesis. This is the only candidate AI which can be said to have shown a statistically significant improvement in performance compared to the default AI.

                         Mean       Variance    Standard     p-Value
                         Fitness                Deviation
Default                  29.1       107.43      8.76         -
ACEcagcficiBCEdagafdfc   39         510.22      17.57        0.4268
ADEfcccbfbfBCEdbeggffg   38         377.11      15.56        0.2408
BCDgdcdaihfADEigidfdeg   60.9       756.98      20.78        0.0057
BCEcaccaaidADEdbgdbbge   27.1       239.43      13.16        0.4715
ACEiaabficbADEiaggfgfg   27.6       174.93      11.88        0.9698

Table 5.22: Wilcoxon rank sum test results from the User Testing, comparing each of the 5 best BTs evolved against the Default AI.

Discussion and Conclusions

Ultimately, the results confirm that there was a general improvement in the performance of the candidate AIs over the default AI when analyzed with respect to their fitness, with 3 of the 5 candidate AIs achieving higher fitness values on average, and the 2 with lower average fitness having quite similar values. Despite this, only 1 candidate could be confirmed to perform significantly better than the default, meaning it is not possible to confirm the expectations of this test completely.

Figure 5.20: Most evolved candidate AI: ’BCDgdcdaihfADEigidfdeg’ UC&D breakdown

It is clear from the breakdown of the most successful candidate AI (figure 5.20) that the success of its behavior may stem from how many of its Unit Decisions are damage-dealing or offensive abilities. From the AI's perspective, the fitness value has always promoted dealing damage as the scoring factor, so this is exactly the result seen here. Over the entire process of optimizing this AI generation technique, the largest optimization criterion was always the fitness calculation, which was based on the AI's capability of defeating its opponent; this can only be done by dealing damage, as dictated by the environment.

Chapter 6

Discussion and Conclusion

6.1 Discussion

The development of the methodology described in this thesis produced a system which showed consistent improvements in the quality of the AIs generated and in the number of evaluations required to generate them. Each iteration highlighted unforeseen issues, or potential directions in which to optimize the process, and generally the implementation was adjusted to address these. Despite the overall increase in the quality of the generated AIs, it is difficult to infer from the data what impact, if any, some of these adjustments made towards their intended goal.

Many decisions which instruct an XCOM 2 unit to perform an action are subject to the outcome of a dice roll. This introduces variability into the evaluation of candidates, from a combat performance perspective, and meant that the true quality of a candidate AI would only reveal itself over a series of evaluations. Typically, in evolutionary algorithms, the fitness value is a fixed evaluation of a candidate's fitness; this application, however, required consideration of the variable nature of fitness.

6.1.1 Dynamic Elitism

The Dynamic Elitism method introduced during Study 1 was designed to allow candidate AIs to play again should they win, even if their fitness value was below what was needed to be considered an elite candidate. Allowing each candidate AI that won a game to be considered an elite candidate created further issues, as either AI might receive favorable dice rolls during an evaluation match, and win when its contained behavior normally wouldn't.

Re-simulating a match each time a generated AI wins its first match, while helping produce promising results over time, proved to be a poorly considered adjustment that could be improved. The assumption made was that winning successive matches indicated that a candidate AI was 'sufficiently likely' to be of better

than average quality. Not only does this solution not account for candidate AIs receiving poor dice rolls, it also, by its nature, reduces the efficiency of the whole methodology. An improvement could be made by setting up a system that reviews the combat data from a match and flags those matches whose data indicates an anomaly, as only those should be re-simulated.
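Such a filter could be as simple as a deviation test against the population statistics already collected (e.g. the accuracy means in table 5.18). The sketch below is purely illustrative; the names and threshold are hypothetical, not from the thesis:

```java
public class AnomalyFlag {
    // Flags a winning match for re-simulation when one of its combat
    // statistics (here: accuracy) deviates from the population mean by
    // more than `threshold` standard deviations. The values passed in
    // the example usage are illustrative, not thesis data.
    static boolean shouldResimulate(double accuracy, double populationMean,
                                    double populationStdDev, double threshold) {
        return Math.abs(accuracy - populationMean) > threshold * populationStdDev;
    }
}
```

Only matches flagged this way would be re-simulated, instead of re-simulating every first-time winner.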

6.1.2 Fitness Function

Another issue was that candidate AIs winning multiple successive matches reduces the potential impact that chance can have on the average fitness attained, making them more representative of the solution's 'quality', but this was not reflected in the evaluation of candidates. The fitness function was adjusted to favor successive winners, but how well this performed was difficult to analyze. Overall, it is a worthwhile consideration and did not noticeably impede the production of higher quality solutions. However, higher values of the modifier applied to winning candidates meant faster algorithmic convergence, so its application is to be considered in the context of the intended effect.

The fitness evaluation method presented other issues. In its basic form, it rewards solutions that instruct units to deal damage, but there is no way to tell whether the fitness gained from doing that damage was the result of a well positioned (within the AI decision structure) movement action, or an effect of volume-of-fire tactics. The fitness function should evaluate with respect to what is desired. If a generic 'best AI behavior' is desired, perhaps the fitness function could reward candidates whose combat data shows particular trends that can be attributed to tactical behavior. For example, aggressive movement could be a desired trait of an ideal solution, so the rate at which a candidate AI moves to flanking positions could be represented somehow within the fitness function. This could in turn lead to a potential simultaneous evolution of different "species" of behaviors that cooperate as a result of their evolution environment, rather than being reliant on other game systems to hint at this emergent group intelligence.
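The shape of a win-streak modifier of the kind discussed above can be sketched as follows. This is not the thesis's actual formula: the linear 0.25-per-win modifier is a hypothetical stand-in chosen only to illustrate the convergence trade-off:

```java
public class Fitness {
    // Illustrative only: base score from damage dealt, scaled by a modifier
    // that favors candidates with consecutive wins. A larger modifierPerWin
    // widens the gap between streak winners and the rest of the population,
    // which is what drives the faster convergence noted in the text.
    static double fitness(double damageScore, int consecutiveWins, double modifierPerWin) {
        return damageScore * (1.0 + modifierPerWin * consecutiveWins);
    }
}
```

With a 0.25 modifier, a candidate on a 2-win streak scores 1.5x its raw damage score, so tuning this constant directly controls how strongly elitism dominates selection.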

6.1.3 Chromosome Structure

The binomial decision structure employed was limited in scope, only having 16 potential decision paths, and was not expected to produce solutions which could out-maneuver an experienced human player, even considering that the solution spaces produced initially were large. Each restriction made to the structure of the chromosome reduced the sample space, but at the same time reduced the maximum potential 'quality' of a candidate solution. The decisions taken to restrict the decision structures were made with logical consideration, so as to always provide as much flexibility as possible.

The 10 potential structures available in Study 3 produced the highest overall number of successful candidates, with some of those being capable of defeating novice human opponents, as the user test showed. It is likely that, should the methods from Studies 2 and 3 be evaluated for further generations, the candidates from Study 2 would begin to show improved performance as well.

6.1.4 Unit Conditions and Decisions

Another contributor to the size of the solution space was the sets of Unit Conditions and Decisions. The initial sets were constructed from what was commonly used in the default AI configuration file, and perhaps a more careful selection and design of additional entries would have led to a higher quality analysis of the methodological aspects discussed above. These sets exist entirely within the context of XCOM 2 and, for an exhaustive genetic algorithm search, the entire set of behaviors in the configuration file could be considered, with a clear chance of achieving a 'quality' solution. In the context of this project, however, that was unrealistic, and the restriction to the more select options used in Studies 2 and 3 should have been identified earlier in the process. Nonetheless, there are no actual limitations imposed by the system in regards to the choice of UC&Ds, so the overall potential for optimization in this area is entirely dependent on the design and implementation of the game.

6.2 Conclusion

Optimization of the methods used to generate AIs for XCOM 2 was a complicated problem. Even in the restricted version of the game created for evaluating candidate AIs, an intimate knowledge of the game's systems was required in order to conceive of ways to streamline the process. The research presented showed that, in its current setup, the system was capable of producing a candidate capable of defeating novice human opponents, and that this candidate was formed after approximately 300 match evaluations. In campaign mode, players of XCOM 2 can expect to encounter around 200 pods of enemies, and as such the system is currently incapable of providing adequate AIs within this context. There are many games in which 300 encounters barely scratches the surface, and it can be said that the research presented here shows that it could be possible to generate AIs during the natural progression of some games. But context is vital to the success of a GA employed to solve this kind of problem, as the evolution environment is shaped by it. The way any given game works impacts everything, and as such it cannot be said that this project was able to provide a solution to its problem statement. However, it is felt that the dynamic elitism and solution space reduction methods showed encouraging results in their respective goals, and could potentially be used in any future work into how to optimize the evolution of AIs with a GA, for a game where outcomes are heavily dependent on RNG.

Chapter 7

Future Directions

7.1 Further Development

In XCOM 2, a complete rendering of its combat scenarios is required because the game's calculations are performed based on physics as well as unit statistics. A game in which the visual layer is just a rendering of actions, with calculations performed in the background, would likely provide a more suitable platform for this type of research. Any further work with this system would require solving the problem of fully automating the evaluation process, so that simulations would continue until a certain termination condition was met. This would provide the freedom to quickly explore and iterate through different strategies, and allow the development of a wide array of potential applications.

It was suggested in the discussion section that the re-simulation of winning candidates could be improved by identifying anomalous combat statistics, and only re-evaluating those matches. An investigation was conducted using the combat data gathered during Studies 2 and 3 of this project, in which pattern classification methods were applied to begin identifying matches which should have been re-evaluated (see Appendix E). It showed some potential; with an automated evolution procedure, training classifiers to identify anomalous data could be possible.

Evolving AIs from a set of completely randomly formed characters is perhaps not going to be useful in a commercial application. The initial generations are likely to offer very little challenge to players, but with a system capable of automating the evolution procedure, there are potential workarounds should they be required. For example, a game could ship with an array of semi-evolved candidate solutions; each time a player begins a campaign, the initial generation is selected from this array, and from there the AIs evolve organically. This could allow developers to determine the initial challenge offered to players, and allow for the production of AIs alongside the natural progression of a game.
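The shipped-pool idea could be seeded as follows. This is an illustrative sketch with invented names; chromosomes are represented as strings, following the encoding used in the studies:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededPopulation {
    // Draws the initial generation from a shipped pool of semi-evolved
    // chromosome strings, instead of generating fully random candidates,
    // so the first generation already offers a tuned level of challenge.
    static List<String> initialGeneration(List<String> shippedPool, int size, Random rng) {
        List<String> pool = new ArrayList<>(shippedPool);
        Collections.shuffle(pool, rng); // random draw per new campaign
        return new ArrayList<>(pool.subList(0, Math.min(size, pool.size())));
    }
}
```

Every new campaign then starts from a different random subset of the shipped pool, and evolution proceeds from there.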
The evolution of AIs for this project required evaluation against a fixed default AI. Though sufficient for the scope of the work, evolving solutions against a single AI is likely to produce solutions which are over-trained in solving the problem presented, and this could result in the generated AIs not being adaptive enough to handle alternative tactics. Thus, any system which aims to produce AIs capable of defeating human beings employing a variety of strategies should consider varying the tactics of the opponents faced by the candidate AIs.

7.2 Alternative Directions

Video game genres that favor cyclic gameplay could incorporate the evolution of BTs as part of the cycle, so that with each new iteration the AI opponents use different strategies based on the same root. Adding in some of the improvements mentioned earlier could lead to interesting, ever-changing gameplay, creating an infinite problem for players to solve. Furthermore, it could be interesting to allow players to interact with this mechanic. This might not sound like a very active game mechanic, but it does not have to be the main one either. Extrapolating from this concept of player generated BTs leads to something resembling "bot" competitions for strategy games, an idea that has seen very few concrete game implementations, likely due to its niche popularity. Perhaps a game that plays similarly to XCOM 2 while the player is engaged, but is about evolving AIs through simulated battles while they are away from the game, could be worth developing within the context of Massively Multiplayer Online games, considering the high computational demands of such a design. The potential of using this mechanic as a way of constantly shifting the meta-strategy of the game presents yet another intriguing possibility for developers to create diversity. This could take advantage of the best of both worlds, allowing for intense tactical combat scenarios and deeply strategic AI evolution, as well as allowing the player to perhaps "teach" tactics to an AI.



Appendices

A. Extra Content

On the attached CD, readers can find additional content relevant to understanding the implementation of this project's test platform. We provide the Java code used for generating behavior trees, as a stand-alone Eclipse IDE1 project, and the modification files used to alter XCOM 2, as a custom Visual Studio IDE2 project. Additionally, we provide the source code of XCOM 2, for any relevant code references that may have been omitted, as well as the unmodified configuration ("*.ini") files.

Unfortunately, because XCOM 2 is a commercial game, actual use of the mod requires ownership of the game and an installation on the target computer via a digital distribution platform. However, as mentioned earlier, the relevant "*.uc" and "*.ini" files can still be viewed in any simple text editor, such as Notepad++3. Given the large number of folders provided, we recommend using any available search functionality to navigate to the files referenced in the following Appendix sections, as well as to find the relevant sections of code within those files.

The bonus content CD also features the raw experiment result sheets, under the folder "Experiment results". These files can be explored with any "Microsoft Office Excel" type software. Furthermore, the introductory audio-video production accompanying the project can be found on the bonus content CD.

1The Eclipse Foundation, https://eclipse.org/
2Microsoft Corporation, https://www.visualstudio.com/
3Notepad++, https://notepad-plus-plus.org/


B. Unit Condition Implementation

The implementation of the conditions can be explored in the file "XGAIBehavior.uc", which is part of the XCOM 2 source code, in the "XCOM 2 source code" folder.

C. Unit Decision Implementation

The implementation of the actions is spread across a number of files, but they all share a naming convention for easy identification - X2Action_*ActionIdentifier*.uc - and are also part of the XCOM 2 source code, in the "XCOM 2 source code" folder.

D. Questionnaire


After answering the following items, we will ask you to play a total of 6 XCOM 2 matches against various computer-controlled opponents. There are no restrictions on how you play the game; however, this is a modified version of XCOM 2 which features only a small selection of the game's mechanics.

1. Candidate Number

2. Age

3. Have you played any turn-based tactics games in the past? If yes, then name one or a few.

4. Please provide us with an estimation of your XCOM Enemy Unknown / Enemy Within play time: Mark only one oval.

None / 1-20 Hours / 20-60 Hours / 60+ Hours

5. Please provide us with an estimation of your XCOM 2 play time: Mark only one oval.

None / 1-20 Hours / 20-60 Hours / 60+ Hours



E. Classifying Evaluation Matches

The decision to re-simulate matches has been shown, in general, to increase the number of 'successful' and 'stable' candidates produced per generation, but it carries a cost in terms of optimizing the procedure which generates evolved AIs. It would be more efficient if it were possible to tell from the combat data of an evaluated candidate whether it needed to be re-simulated, regardless of whether it won or lost. Multivariate classification techniques make it possible to find patterns in high-dimensional data, and although only 4 combat data items are available to analyze, a classification technique could still find useful patterns in the data. The idea is that candidates who lost their re-evaluation matches have essentially been flagged as potentially having produced abnormal evaluation results for their contained behavior, due to favourable or unfavourable randomness (RNG). If a classifier were trained on the data from matches which were not flagged, to learn what makes a candidate likely to win or lose, it could then be used to decide whether any given candidate should be re-evaluated.

To see if this might be possible, 2 sets of combat data were extracted from the entire set of matches played by all candidates. The first set (allNoResims) contained the combat data for all matches in which candidates did not lose their re-evaluation match (the all prefix refers to the set containing data from both studies 2 and 3). The second set (allOnlyResims) contains the combat data for all matches of candidates which lost their second evaluation match. In classification terms, these sets serve as the training and test sets: each match instance is a sample, and its combat data provides the feature vector. The class labels classify each match as either a win or a loss (1 and 0 respectively).

A classifier based on the K-nearest neighbors algorithm was trained on the allNoResims dataset. Using only the 4 combat data items, it was able to find a classification which would correctly label a candidate as having won its match or not 74% of the time (figure 1).

Figure 1: Classification error rates for a trained K-nearest neighbor classifier, evaluating both sets
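As a minimal sketch of the approach described above, the following Python snippet labels a match by majority vote among its k nearest training samples in the combat-data feature space. The feature vectors and labels are purely illustrative placeholders, not data from the studies.

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Label a query vector by majority vote among its k nearest
    training samples (Euclidean distance over the 4 combat features)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical combat-data feature vectors (illustrative only), e.g.
# (shots fired, hits, damage dealt, units lost):
train = [(12, 7, 30, 1), (9, 3, 11, 4), (14, 9, 41, 0), (8, 2, 9, 5)]
labels = [1, 0, 1, 0]  # 1 = win, 0 = loss

print(knn_predict(train, labels, (13, 8, 35, 1), k=3))  # -> 1 (win)
```

The actual classifier was trained on the full allNoResims dataset; this sketch only shows the decision rule.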

Although this value is a little optimistic, given that the classifier was trained on the same dataset it was tested on; using cross-validation shows the value to be closer to 70%. Ideally, additional combat data would be extracted from XCOM 2 to better inform the training of a classifier and provide higher accuracy. When the trained classifier was run on the test dataset (allOnlyResims), the rate at which it was able to correctly assign a class label to samples fell to 57%. The potential begins to be seen here: if this classifier had been trusted during studies one and two, 43% fewer re-evaluations would have been conducted. Obviously, without further development and training of the classifier, this would likely not have improved the overall efficiency of the GA.
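To illustrate why cross-validation gives a less optimistic estimate than testing on the training data, the sketch below computes a leave-one-out estimate with a 1-nearest-neighbour rule: each sample is classified using only the remaining samples. This is one simple cross-validation variant, not necessarily the scheme used in the study, and the data are illustrative placeholders.

```python
import math

def loo_accuracy(samples, outcomes):
    """Leave-one-out estimate: classify each held-out sample by its
    single nearest neighbour (1-NN) among the remaining samples."""
    correct = 0
    for i, x in enumerate(samples):
        rest = [(math.dist(x, y), outcomes[j])
                for j, y in enumerate(samples) if j != i]
        predicted = min(rest)[1]  # label of the closest remaining sample
        correct += int(predicted == outcomes[i])
    return correct / len(samples)

# Hypothetical 4-item combat feature vectors (illustrative only):
samples = [(10, 6, 30, 1), (11, 6, 31, 1), (4, 1, 8, 5), (5, 1, 9, 5)]
outcomes = [1, 1, 0, 0]  # 1 = win, 0 = loss

print(loo_accuracy(samples, outcomes))  # -> 1.0 on this toy data
```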

Figure 2: Confusion matrix of a K-nearest neighbor classifier evaluating the allOnlyResims set

However, the confusion matrix (figure 2) shows that lost matches were misclassified as wins 10 times, and won matches were misclassified 16 times. This indicates that the classification is likely heavily influenced by accuracy. Study 2 had more candidates losing re-simulated matches, and the average accuracy of those candidates was higher than the average of all winning games in the study by 10%. Study 3 saw the opposite, with accuracy being lower than the population mean for the re-evaluation matches.
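A confusion matrix of the kind discussed above is simply a count of (actual, predicted) label pairs. The following sketch shows how the two misclassification counts can be read from such a tally; the label sequences are illustrative placeholders, not the study's data.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally (actual, predicted) label pairs; keys are
    (true_label, predicted_label) tuples with win=1, loss=0."""
    return Counter(zip(actual, predicted))

# Hypothetical match outcomes for a handful of re-evaluated candidates:
actual    = [0, 0, 1, 1, 0, 1, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 0, 1]

m = confusion_counts(actual, predicted)
print(m[(0, 1)])  # losses misclassified as wins -> 2
print(m[(1, 0)])  # wins misclassified as losses -> 2
```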