FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

Development of Artificial Intelligence Systems for Stealth Games based on the Monte Carlo Method

Diogo Albuquerque Valente Silva

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: Eugénio da Costa Oliveira Co-Supervisor: Pedro Gonçalo Ferreira Alves Nogueira

July 23, 2014


Abstract

Stealth elements have been present in the gaming industry since 1981, with the release of the first game that required the player to hide, move between hiding spots and avoid lights in order to achieve a goal. Stealth-based games have evolved throughout the years, always centred on the player. This thesis intends to help change the paradigm of this kind of game by creating an agent that, instead of acting as a reactive opponent, has the ability to plan ahead inside the game world, effectively taking the role of the player in a stealth game. To test this agent, a simulator with procedurally generated, stealth-centred content was created in parallel with the agent's development. This shift in focus will open the way for new mechanics and a new kind of meta-game, hopefully driving the creation of a new type of game.

Acknowledgements

First and foremost, I want to show my gratitude to Prof. Augusto de Sousa for his continual work in keeping the great gears of MIEIC running. I also want to express my gratitude to my supervisors, Prof. Eugénio da Costa Oliveira and MSc. Pedro Gonçalo Ferreira Alves Nogueira. Without their approval and vision, this project and this great opportunity would never have come to be. I'd like to thank my family for all the support given through the tough development times. A special thank you to Álvaro Valente da Silva. I need to thank all my friends for everything, but especially for the late hours of company while working. I'd also like to show my gratitude to MSc. António Sérgio Ferreira, for all the wise words and encouragement, without which the trek would have been much harder.

Diogo Albuquerque Valente da Silva

iii iv "There is an art, or, rather, a knack to flying. The knack lies in learning how to throw yourself at the ground and miss."

The Hitchhiker's Guide to the Galaxy, Douglas Adams

Contents

1 Introduction
  1.1 Context
  1.2 Motivation and Goals
  1.3 Structure of the Thesis

2 State of the Art
  2.1 Intelligent Agents for Video Games
    2.1.1 Planning
    2.1.2 Monte-Carlo Tree Search
    2.1.3 Movement
    2.1.4 Procedurally Generated Content

3 Conceptual Model
  3.1 Agent Architecture
    3.1.1 Goal-Oriented Action Planning
    3.1.2 Goals
    3.1.3 Actions
    3.1.4 Planner
    3.1.5 Sensors
    3.1.6 Knowledge
  3.2 Simulation platform

4 Implementation
  4.1 Simulator
    4.1.1 Map elements
  4.2 Agent
    4.2.1 Sensors
    4.2.2 Knowledge
    4.2.3 Goals
    4.2.4 Actions
    4.2.5 Planner
    4.2.6 Movement

5 Experimental Results
  5.1 Experimental Setup
  5.2 Experimental Results
  5.3 Discussion

6 Conclusions and Future Work
  6.1 Goal Assessment
  6.2 Future Work

References

A Procedurally generated simulator
  A.0.1 Procedural map generation

List of Figures

2.1 Sample agent using FSMs by Fernando Bevilacqua
2.2 Example Hierarchical Task Network
2.3 Example of pathfinding
3.1 Representation of a Goal
3.2 Representation of an Action
3.3 Representation of a Planner
3.4 Agent, represented by a green circle, hiding inside a barrel
4.1 Map generation example
4.2 Backward Search
4.3 Forward Search
4.4 MCTS. From http://mcts.ai/about/index.html
5.1 Exploration Rate (Easy)
5.2 Exploration Rate (Hard)


List of Tables

4.1 Container Types
4.2 List of Actions
5.1 Map Difficulty
5.2 Average number of detections per game
5.3 Average loot per game
5.4 Average number of iterations per game
5.5 Maximum Completion Percentage
5.6 Number of playouts and average duration in milliseconds


Abbreviations

AI    Artificial Intelligence
FSM   Finite-State Machine
HTN   Hierarchical Task Network
PCG   Procedural Content Generation
IDA*  Iterative Deepening A*
MCTS  Monte-Carlo Tree Search
NPC   Non-Player Character
UCT   Upper Confidence Bounds applied to Trees


Chapter 1

Introduction

References to artificial intelligence have been around since ancient Greece, in the form of automata with some semblance of intelligence, used by Hephaestus to help in his work. There are also references to intelligent machines, mostly with humanoid shapes, in different civilizations throughout history. The creation of intelligent machines is a topic that has intrigued mankind for millennia [Neg05], from its presence in ancient legends to its current situation. Research on AI, although officially started in 1956 at a conference at Dartmouth College in Hanover, has its cornerstones in the attempts of the philosophers of antiquity to define intelligence and thought. The research carried out since then has led to the application of AI algorithms to problems from completely different areas with high degrees of success. AI is used today as a response to problems that would be difficult for humans to solve within the same time limit. One of the areas with a prevalent synergy with artificial intelligence is the gaming industry. The first instance of artificial intelligence in this area appears in one of the first games ever created, Pong [MF09]: when the game was played by a single player, the vertical bar controlled by the computer followed a simple equation to move to the height at which the ball was expected to arrive.

1.1 Context

The application of AI in games was simple or non-existent in its infancy, since most games that existed were played by two players, or had opponents whose behaviour was sufficiently simple to be defined as a series of conditions or by a simple minimax algorithm. This simplicity was a requirement given the low computational capacity available and the need for a response from the opponent in real time. AI in games grew with the evolution of processing power and the changing needs of game development: the need for smarter opponents, richer environments created in near real time, and more realistic situations and reactions from the opponents. This culminated in examples of intelligent agents such as the ones in The Sims (Maxis, 2000), where agents have their own personalities and needs on which to base their decisions, in The Elder Scrolls


V: Skyrim (Bethesda Softworks, 2011), where the non-player characters (NPCs) have their own agendas extending over weeks of play, enabling the creation of cities bursting with a life of their own, or in F.E.A.R. (Monolith Productions, 2005), where the agents take the form of opponents that move in a tactical way in order to effectively hinder the player's progress. Stealth games are games that allow, reward or even force the player to overcome obstacles with some degree of stealth. In this type of game, the artificial intelligence agents take the form of opponents that actively seek the player, or that patrol a given area and react if they find proof of the player's presence. The most common reactions of this type of agent are raising an alarm, which lets the other opponents in the field know that there is an intruder, followed by the initiation of an active search for the player, based on different search methods. These opponents then seek the player for a while, attacking if they find him or giving up after some time has passed.

1.2 Motivation and Goals

This project aims to create an agent that is not just reactive but is also able to plan its steps with precision and a high degree of stealth, capable of learning and of adapting in order to respond well to situations it has not yet seen, without wasting too much computing time. The creation of this agent will allow for a new vision of stealth gaming, expanding the paradigm of such games in order to allow for the creation of new game modes, where the system would be able to be the player rather than the guards opposing him. It may also enable different modes of cooperative play, creating the opportunity for a player to be one of the opponents in this type of game, as well as new play testing and game balancing techniques, new obstacles in such games and the possibility of creating procedurally generated content for stealth games. This requires the agent to be able to create counter-strategies in response to external stimuli. The first objective of this thesis is therefore the creation of a high-level simulator for testing and training the agent in different situations and maps, one that procedurally generates maps for stealth games, which in turn requires the creation of abstractions of some key concepts in stealth games. The second and main objective is to create an artificially intelligent agent that is able to:

• Create a plan based on incomplete world information received through its sensors that allows the agent to achieve a goal in a realistic and near-optimal fashion.

• Learn through a memory of past plans that achieved set goals and recall those plans when a similar situation arises.

• Move through any map stealthily, avoiding detection and creating an escape plan if necessary.


1.3 Structure of the Thesis

In the next chapter of this thesis, we explore the area of Artificial Intelligence, with a special focus on agents in video games and on Procedural Content Generation. After that, the conceptual model of the AI system is discussed, detailing the architecture used. Then the agent, the details of its implementation and the testing platform are presented, followed by a chapter where the results are discussed. The document ends with an assessment of the objectives accomplished in this thesis and possibilities for future improvement.


Chapter 2

State of the Art

Intelligent agents have been present in AI since the late 1980s [WJ95]. There are several definitions of agents, but they seem to concur on a few basic properties: an agent must have some measure of autonomy, being able to control its own actions without human intervention; it can have social abilities, being able to communicate and interact with other agents; it must be able to react, that is, to perceive its environment and respond in a way that makes sense according to the changes it senses; and it must be able to be proactive, acting toward a defined goal instead of simply reacting to the environment. There are other attributes considered important to agents, such as the addition of human properties like emotions, knowledge, beliefs and intentions [WJ95].

2.1 Intelligent Agents for Video Games

There are various approaches, or architectures, for constructing agents. Deliberative architectures [Bro91] are defined as architectures that contain an explicit model of the world and in which the agent makes its decisions via logical reasoning, logic which is world dependent [TRR+13]. These architectures create two interesting problems: the representation of the world that is given to the agent must be accurate and created in a timely fashion, and the information inside that world that represents its knowledge and the properties of more complex entities has to be represented in a way that makes it possible for the agent to process it and react in time. A good example of this kind of architecture are planning agents, which receive a symbolic representation of the world and of a goal and attempt to find a sequence of actions that leads to the fulfilment of that goal [FN72]. One of the first examples was the planning system STRIPS. There are also reactive architectures, defined as architectures that do not need or include any symbolic representation of the world. These were the basis for behaviour-based robots [Bro91], in which the key aspects were situatedness, where the robots (agents) did not have to deal with abstract descriptions of the world; embodiment, where the actions of the agents in the world have

instantaneous feedback; intelligence, where the actions the robot takes are influenced by a list of factors and not just by the computational engine; and emergence, where the intelligence comes from the interaction of the robot and the world. The third kind of architecture usable to construct agents are hybrid architectures, which contain parts of both of the architectures described above as subsystems, for instance a deliberative system with a symbolic world model alongside a system capable of making decisions and reacting to events much like a reactive system. This introduced a form of layered architecture, with each layer dealing with information at a different degree of abstraction. The most popular methods to build agents for games nowadays are finite state machines (FSMs) and hierarchical finite state machines (HFSMs). These allow a network of states and transitions to be created that represents the plans the agents might follow given an initial state. One problem in FSM usage is scalability: FSMs grow to a very high complexity in larger problems [HYK10], which is mitigated by the HFSMs' capability of collapsing parts of the machine into superstates. Another is the fact that agents built on these kinds of state machines tend to have a robotic behaviour, which can be predicted by the player and can ruin the immersiveness of games.

Figure 2.1: Sample agent using FSMs by Fernando Bevilacqua

2.1.1 Planning

In 1999, a planner called SHOP (Simple Hierarchical Ordered Planner [NCLMnA99]) was created as an argument against the idea that planning in AI using total-order forward search methods was a bad idea because of the need for excessive backtracking. This method requires a formal representation of the problem domain, in the shape of axioms and methods, from which a plan is extracted. This planner is based on Hierarchical Task Networks (HTNs), which cause the domain

to be represented as a series of tasks of different types, some of which are preconditions for performing others, until a final task is completed. Such planners achieve good results in situations where the domain can be described formally.

Figure 2.2: Example Hierarchical Task Network

In 2007, offline tests with an advanced version of the SHOP planner were made [KBK07] in order to create plans for the daily routines of NPCs in the game The Elder Scrolls IV: Oblivion from Bethesda Softworks. Each NPC has a group of AI packages that are used to control its behaviour. Each package describes behaviour in the form of an HTN and has preconditions that must be met for the package to be activated. Each agent has access at any time to various pieces of information about the state of the game, and these are used for the activation of packages. This approach is more efficient for static game environments than for dynamic ones, even though the latter have the potential to describe a game world more realistically. In the case of dynamic environments, the plan created would have to be sufficiently robust, which would make it more complex, in order to cover all the possibilities or at least be able to react to situations that were not anticipated at the time of planning. As an alternative to this, Jeff Orkin [Ork03] developed a goal-oriented architecture for decision making, Goal-Oriented Action Planning (GOAP). This architecture was developed based on goal-oriented architectures already in use at the time, which lacked a planning component and constantly re-evaluated the existing goals, choosing the most appropriate one for the situation the agents were in. In this kind of architecture each agent has only one goal active at any time and a list of actions that can be taken, and from them it generates a plan composed of a sequence of actions that achieves the chosen goal. Each goal has a condition required for completion, defined as a value within a domain variable. Every action has a pre-condition whose fulfilment is required for it to take place, in the form of a key-value pair or array, an effect that describes what happens after its completion, and a cost which creates a metric through which one action might be selected over another [Ork06]. Two problems arise in the implementation of this system: the first is the need for a good search algorithm to choose the actions that compose the plan, and the second is the representation of the world, whose data is needed by the planner for

the formulation of plans. The solutions presented are the use of A* as the search algorithm to formulate the plan, regressively searching for an action that fulfils the goal, followed by the search for an action that satisfies the pre-condition of the previous action, until all conditions are met. The representation of the world is suggested to be a data structure that contains an attribute, a value and the corresponding world entity. The advantage of this approach is that it limits the knowledge the agent needs to accomplish a given goal, which discards unnecessary data and allows a simpler representation of the world [Ork04]. Later, in 2013, a new framework based on GOAP was created in order to reduce the time needed to compute a plan, even for hundreds of agents [MSD+13]. The framework allows for naive planning, where the NPCs are able to plan with limited knowledge of the world. It uses an informed version of A* in the planning stages, the IDA* algorithm, using heuristics to expand the nodes in the search tree, and uses a form of layered planning, where the planners are nested hierarchically, with more complex actions higher in the chain than simpler actions. This kind of layered planning system drastically reduces the time necessary to compute more complex plans. The framework also allows for a memory system for the NPCs, where they recall plans when they are in similar situations, and for some degree of personalization, making one NPC favour different decisions than another in the same circumstances, due to their preferences.
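
As an illustration of the key-value formulation described above, the following Java sketch shows how an action's preconditions, effects and cost can be represented over a symbolic world state, together with the checks a forward or regressive planner would perform. The class and method names are our own illustrative choices, not taken from [Ork03] or [MSD+13].

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a key-value world representation for GOAP-style planning.
// Names (WorldState, GoapAction, ...) are illustrative, not from the cited works.
class GoapSketch {

    // A world state is simply a set of symbol -> value pairs.
    static class WorldState extends HashMap<String, Boolean> {}

    // An action is defined by its preconditions, its effects and a cost.
    static class GoapAction {
        final String name;
        final Map<String, Boolean> preconditions = new HashMap<>();
        final Map<String, Boolean> effects = new HashMap<>();
        final double cost;

        GoapAction(String name, double cost) {
            this.name = name;
            this.cost = cost;
        }

        // True if every precondition already holds in the given state (forward search test).
        boolean applicable(WorldState state) {
            return preconditions.entrySet().stream()
                    .allMatch(p -> p.getValue().equals(state.get(p.getKey())));
        }

        // True if this action produces at least one condition the goal still needs,
        // which is the test a regressive (backward) search uses to pick candidate actions.
        boolean satisfiesSomeGoalCondition(Map<String, Boolean> unsatisfiedGoal) {
            return effects.entrySet().stream()
                    .anyMatch(e -> e.getValue().equals(unsatisfiedGoal.get(e.getKey())));
        }
    }
}
```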

2.1.2 Monte-Carlo Tree Search

Monte Carlo Tree Search (MCTS) is a variation on tree search algorithms that is easy to use and domain independent, in the sense that it can be used in any game, needing only knowledge of the legal moves available and the end conditions. The method is also able to learn, and its parameters can be fine-tuned for different result requirements [FNO14], since it allows good plays to be kept and reused. These facts make this method a valuable alternative to other search methods. The method is structured in four different stages: Selection, where a node is selected; Expansion, where child nodes of the selected node are created; Simulation, where a playout of the chosen path is simulated; and Backpropagation, where the current move sequence is evaluated and updated. MCTS has been used in several different games, and some improvements were made to lighten the processing load and to reduce the tree size. UCT (Upper Confidence Bounds applied to Trees) was introduced in order to add an informed way of selecting the nodes to be explored [Lor08]. This was used mainly in board games, with the first and most common usage being in the game of Go, but it was proven to be effective in any kind of game without any sort of previous experience or training [MC10]. This method was shown to be a good alternative to standard methods such as A* and IDA* [SWV+08], both in single-player and multiplayer games, achieving especially good results in single-player games with perfect information. There is also a study that applies this method to create intelligent and adaptive opponents, showing that, since the performance of the algorithm depends on computation time, the difficulty of the opponents can be adjusted by tuning the simulation time allowed to the method. Since MCTS is very computationally intensive, it might be complicated

to use it to calculate optimal solutions online, but it is possible to train a neural controller with data obtained from MCTS simulations and achieve near-optimal solutions [XYS+09].

2.1.3 Movement

Movement is a very important part of videogames and can damage the player's immersion if not dealt with properly. Usually pathfinding algorithms are used to calculate a path for an agent to follow, and these algorithms can be categorized under two different types [GMS03]:

• Undirected approach, where the agent does not plan its path ahead of time and instead moves around trying to find the way to its target. Search algorithms such as Depth-first search and Breadth-first search are often used to improve the efficiency of this method.

• Directed approach, where there is an assessment of the path beforehand. Usually, methods that use this approach have some measure of cost or distance to estimate the value of a path between two points. Algorithms such as Dijkstra's or A* are used to find an optimal path.

Figure 2.3: Example of pathfinding

Stealthy movement is, by definition, movement that allows the agent to remain undetected by potential threats. Usually, movement in games is an application of a pathfinding algorithm such as A*, but this kind of algorithm is used to find the shortest path between two or more points, with no regard for threats other than their complete avoidance, treating them as non-traversable areas. An alternative, developed by Ron Coleman, explores the aesthetics of stealth in pathfinding through an alteration of the A* algorithm that creates a stealthy A* [Col09]. This algorithm generates its paths using a stealth effect that is added to the weight calculation of each cell of the map. This leads to paths that favour proximity to corners and walls, leading in turn to stealthier movement on the agent's part. There is also a beautifying treatment that can be applied to the path, producing smoother turns and a more realistic path. Stealthy movement is only needed if there is a chance for the agent to be discovered, so it makes sense to take the movement of an opponent into account when calculating possible paths. Such a problem could be tackled as a zero-sum transit game between two players [VBJP10].


One player would be the agent trying to traverse the region and the other would be responsible for all the patrols in the region. The issue with this approach is that it assumes knowledge of the strategies of both players, which might not be true in a multiplayer game. Another difficulty in pathfinding is that, for larger maps, storing the maps and paths in memory can become a problem. Nathan Sturtevant suggests that an abstraction of the map can be used to lighten the load on memory at the cost of optimality [Stu07]. This approach also encompasses the possibility of using dynamic maps.

2.1.4 Procedurally Generated Content

The need for Procedural Content Generation (PCG) appeared to offset the huge amount of time necessary to create content for a game. There is a lot of documentation on procedurally generated content, but at this moment the author is unaware of any in use specifically for stealth games. Procedural content generation is used for many things in gaming: items, maps and even storyboards and rules can currently be procedurally generated [PCWL08]. Many different approaches are taken in generating content, usually depending on the type of game the content is for. Strategy games usually use search-based methods or evolutionary algorithms to generate maps with a selected or random topology [TPY10], with more or less regard for the aesthetic aspect of the maps [LCF13a] and for map balancing [LCCFL12], using grammars to define what the content of the map should be [vdLLB13], with the necessary parameters to guarantee dynamism in multiplayer games [LCF13b]. Other games, like platformers, might use different methods, such as generating the map with cellular automata [JYT10], or based on the player's performance [SW10] and/or the rhythm intended for a given level [JTSWF10]. In general there are two types of PCG techniques: assisted techniques, which require heavy human intervention to generate content, and non-assisted techniques, which require much less or even no human intervention at all. Both must be able to generate content appropriate to the setting. Some techniques are based on real-life phenomena, such as the use of algorithms to simulate erosion when creating continents [CBPD11]. Other techniques use semantic descriptions in the design of game worlds, where rules are attributed to objects and a solver attempts to shape and place the objects in logical places [TSBdK09]. There are also studies using genetic and evolutionary algorithms for puzzle generation [Ash10] or for map generation based on pieces of content [SP10]. The studies using genetic algorithms generate the maps or levels by stitching together components and pieces of the level and evaluating them through different means (custom heuristics or even the A* algorithm to check for traversability), crossing over the best results. Evolutionary methods generate the maps through sets of rules or grammars, and some take as input the player's past experience on the map to generate new maps. Some recent approaches even use the player's emotions [NS13] to help generate new experiences and new game worlds [NRON13][NARO14].

Chapter 3

Conceptual Model

This thesis was created to bring a new take on AI agents for stealth games in general, creating an agent that would eventually emulate the actions of a player. There was a need to create an agent that could traverse any map in a stealthy way, hidden from sight, aware of its enemies and its goals. Ideally this is represented by an agent with incomplete knowledge of the current map, as knowledgeable as a player would be when first playing through a map in these kinds of games.

3.1 Agent Architecture

The artificial intelligence developed needed to be able to plan, instead of just reacting to external inputs, in order to better simulate the movements of a player inside a stealth game. To that purpose, the agent required an architecture that supported dynamic decision-making on the fly. Planning has been proven to be a very powerful tool in creating agents for games, leading to NPCs with routines that span a 24-hour cycle [KBK07], or to reactive opponents based on FSMs and HFSMs whose perceptions trigger a set of actions [HYK10]. The behaviour we aimed for could potentially be too complex to be defined by simple rule-based systems or FSMs. Therefore, an architecture based on Goal-Oriented Action Planning was chosen, a choice that allowed us to reduce the development time of the agent's system, allows for possible expansion with moderate ease, and allows the Agent to better suit its actions to a dynamically changing world.

3.1.1 Goal-Oriented Action Planning

This type of architecture was chosen over FSMs and HTNs mostly due to the time constraints for the development of both the Agent and the simulation platform. If FSMs were used to create the agent, every possible scenario would have to be accounted for, which could lead to an FSM of very high complexity. If the Agent were to be developed with HTNs, every task defined would have to be separated into sub-tasks, each with its own priority. On one hand this appears to be a

solid choice for the Agent's design, but there was a need for the Agent to be modular in nature, so that new aspects could eventually be added to the system without requiring extensive code changes. GOAP is composed of Goals, Actions, and a Planner. Its modular aspect allows for expansion through the creation of new instances of any of its parts [Ork06]. For instance, if a need arises to create an Agent with different priorities, new Goals can be created and used alongside the already implemented Goals, giving the Agent a totally different approach to the game than before. The Agent developed needed not only the ability to plan, but also a way of gathering information from the world around it and a representation of said Knowledge.

3.1.2 Goals

Goals can be diverse and can be abstract concepts, as long as they can be defined by an alteration in the knowledge or in the state of the agent. The existence of the Goal concept allows the Agent to have a defined objective in the playthrough, which can change dynamically if a situation arises that removes the need for the current Goal or creates the need for another.

Figure 3.1: Representation of a Goal

3.1.3 Actions

Actions can be as diverse as Goals, and usually are simple steps that can be done within one iteration of the simulation, but they can also be more complex, such as walking through a defined path. Actions are a crucial part of GOAP, since they are the elements that compose the plan the Agent is going to follow. Each Action has a precondition that can be used to check whether the Action is feasible and an effect that represents a change in the game world.

Figure 3.2: Representation of an Action

3.1.4 Planner

The agent also incorporates a Planner, which chooses actions in order to achieve the goal. The Planner does this by applying a search-based algorithm to decide which Action in the search tree to explore next. The sequence of Actions chosen is called a Plan.


Figure 3.3: Representation of a Planner
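
To summarise how these components fit together, the sketch below gives one possible (hypothetical) set of Java interfaces for Goals, Actions and the Planner over the Agent's Knowledge; the names are illustrative and do not correspond to the actual implementation described in Chapter 4.

```java
import java.util.List;

// Minimal sketch of the GOAP components described above.
// Interface and method names are illustrative, not the thesis implementation.
interface Knowledge {}                       // the agent's current world knowledge

interface Goal {
    // A goal is defined by an alteration of the agent's knowledge/state.
    boolean isSatisfied(Knowledge knowledge);
}

interface Action {
    // The precondition checks whether the action is feasible in a given state.
    boolean preconditionMet(Knowledge knowledge);
    // The effect returns the knowledge as it would be after executing the action.
    Knowledge effect(Knowledge knowledge);
}

interface Planner {
    // Produces a Plan: an ordered sequence of Actions that accomplishes the Goal.
    List<Action> plan(Goal goal, List<Action> knownActions, Knowledge current);
}
```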

3.1.5 Sensors

The agent possesses sensors that represent a line of sight oriented toward where the agent is facing. Any object in the line of sight of which the agent had no previous knowledge grants the agent knowledge of all the possible actions related to that object.

3.1.6 Knowledge

Knowledge representation is an important area in AI research [DSS93]. The structure of this representation can allow for an increase in the computing capability of a tree search-based algorithm, by simplifying the way the tree is generated or how it is accessed. There is a need to create a symbolic representation [Ork04] that safeguards all the knowledge important to the Agent's needs while keeping the data structure as simple as possible.

3.2 Simulation platform

There was also a need to test the agent created in an environment suited to stealth games. At the time of development there was a tool in Alpha stage called AI Sandbox (http://aisandbox.com, 2014), but since this tool was in a closed-access Alpha, the choice was made to create a platform that could be used to test the Agent and could create environments with stealth elements. For that purpose, the simulation platform was developed with PCG elements based on stealth games. The platform uses a method based loosely on Johnson's [JYT10] take on the usage of cellular automata for level generation. We aimed to create an environment approximating that of a castle or mansion filled with valuables for our Agent to take. The map generator follows a simple, non-assisted

generate-and-test approach to level generation, guaranteeing that the whole level is traversable and that every objective is within reach of the agent. The generation is stochastic, so that every level is different from any before it, and it is performed in an offline fashion during the runtime of the simulator, generating each level before each simulation based on a number of parameters given to the generator at the beginning of the simulation. Certain elements are present throughout stealth games. The most common element in this type of game is the light, or rather, the shadow that comes from its absence. In these games, the player needs to stay away from the light and tread in darkness to avoid detection. Another very common element in stealth games are opponents, usually in the form of guards, patrolling or static, which chase and attack the player if he is detected. Most stealth games also have some kind of hiding mechanic: whether in a bush or under a box, the player can evade opponents by taking advantage of these spots.

Figure 3.4: Agent, represented by a green circle, hiding inside a barrel

Lights are a simple element to create, their main property being the illumination they give to nearby areas. The shadows left by the absence of light are where stealth players will dwell during most of the game, because they usually affect the vision of the opponents in some way, making the player less visible. The opponents in these games can be as complex as any Agent. In our case, we intended to test the performance of a thief on a map, so there was a need to create opponents, in the form of Guards, that would hinder the player's progress on the map: Guards that could see far and would search for the player even after losing sight of him. The last common element, the presence of hiding spots, needs to be present in any stealth game; in this case, our Agents chose to hide inside or under all kinds of Furniture.

Chapter 4

Implementation

As discussed in the previous chapter, the agent was created based on the GOAP architecture and set on a procedurally generated map simulator. In this chapter we review the details of the creation of both the agent and the simulator.

4.1 Simulator

The first step of each simulation is map creation. Each map is 2-dimensional and has a maximum size defined upon creation, a maximum and minimum room size and a maximum number of rooms. Each map creation phase is composed of a cycle of two steps until the map is validated for use.
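
The cycle mentioned above corresponds to a generate-and-test loop. The sketch below illustrates its general shape, assuming a grid-based map; the method names, constants and validation criteria are placeholders for illustration, not the simulator's actual code.

```java
// Illustrative generate-and-test loop for map creation; method names and the
// validation criteria are assumptions for this sketch, not the simulator's code.
class MapGenerationSketch {

    static final int MAX_SIZE = 60;   // hypothetical maximum map size

    // Generate step: place rooms, corridors, lights, furniture and guards (omitted here).
    static int[][] generateCandidate() {
        return new int[MAX_SIZE][MAX_SIZE];
    }

    // Test step: accept the map only if it is fully traversable and every
    // objective (loot, exit) is reachable by the agent.
    static boolean isValid(int[][] candidate) {
        return isTraversable(candidate) && objectivesReachable(candidate);
    }

    static int[][] generateMap() {
        int[][] candidate;
        do {
            candidate = generateCandidate();
        } while (!isValid(candidate));
        return candidate;
    }

    static boolean isTraversable(int[][] m) { return true; }        // stub for the sketch
    static boolean objectivesReachable(int[][] m) { return true; }  // stub for the sketch
}
```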

4.1.1 Map elements

We will now discuss the elements that compose a map. The first element of interest is a Cell. The core map is a matrix of cells.

4.1.1.1 Cell

A Cell is the smallest element of which the map is composed. Each Cell has its own position inside the map in Cartesian coordinates, a reference to every cell adjacent to itself, a Light and a Furniture, each of which may or may not exist on the cell, and a light intensity value.

4.1.1.2 Light

Light is an element that represents a lightbulb or a torch. Each Light element has an intensity that affects nearby cells in a radius of the origin and a status of on or off.


4.1.1.3 Furniture

Furniture is a more complex element. There are several different types of Furniture that may be present on each map. Each Furniture has: an orientation, which is important for Furniture that occupies more than one cell; a type, which defines what kind of Furniture it is and in which Room types it may appear; a hiding property, which defines whether the agent can use the Furniture to hide and how it hides in relation to that Furniture; a height value, which defines whether it may block the line-of-sight of the agent or the guards; a property that defines whether the Furniture is a container; and a value of loot contained in the Furniture itself. Any Furniture of medium height or taller blocks line-of-sight and allows the thief to hide behind it. Table 4.1 lists the different types of Furniture and their properties.

Table 4.1: Container Types

Type     Container   Hiding   Height
Barrel   Yes         Inside   Medium
Bed      No          Under    Short
Chair    No          None     Short
Chest    Yes         Inside   Short
Desk     Yes         Under    Medium
Dummy    No          None     Medium
Hole     No          None     Short
Shelf    Yes         None     High
Table    Yes         Under    Medium
Throne   No          None     High

4.1.1.4 Room

A Room is a section of the map composed of Cells with similar properties. Each Room has a type, which defines what kind of Furniture may appear on the Cells composing the Room, its own Cartesian coordinates, a size value, and a list of Furniture inside the room. The following is a list of the different Room types and their respective Furniture.

• Corridor: represents a corridor to transition between rooms. No Furniture may appear in this room.

• Bedroom: May contain Barrels, Beds, Chests, Chairs and Tables.

• Storage Room: May contain Barrels, Chests, Shelves and Tables.

• Training Room: May contain Dummies, Barrels, Chests and Tables.

• Throne Room: May contain a Throne and Chairs.

• Restroom: May contain Holes.

• Work Room: May contain Desks, Chests, Shelves and Chairs.


Figure 4.1: Map generation example

4.1.1.5 Waypoint

A Waypoint is a point inside a Room that is part of the patrol path of the Guards inside the Map. A Room may have only one Waypoint and it is only created on a Cell that contains no Furniture.

4.1.1.6 Guards

A Map contains one or more Guards. Each Guard is an extension of the class Person. Each Person has its own position on the Map, a unique id code and a HitPoint value that defines whether the Person is alive or dead. Each Guard has a target Waypoint, which is the next waypoint it has to move to on its patrol; a status value, which can be on patrol, inquisitive, on alert or dead; Sensors, such as sight; an orientation, or the direction the Guard is facing; an alert timer, which defines the duration of the Alert mode; and a target Cell, which is the last known location of the Agent. Each Guard is in Patrol mode by default, walking from one waypoint to another via the A* algorithm. If the Guard becomes aware of the Agent, it goes into Alert mode, chasing the Agent until it either catches it or loses sight of it. If it loses sight of the Agent, it goes into Inquisitive mode, searching the vicinity of the Agent's last known position until it finds the Agent again or the Alert timer runs out, in which case it returns to Patrol mode. If a Guard catches the Agent, the simulation ends in failure.
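
The Guard behaviour just described is essentially a small state machine. The following sketch captures its transitions; the enum values, fields and the alert-timer duration are illustrative stand-ins rather than the simulator's actual identifiers.

```java
// Illustrative state machine for the Guard behaviour described above.
// Enum values, fields and the timer duration are stand-ins, not the simulator's identifiers.
class GuardSketch {

    enum Mode { PATROL, ALERT, INQUISITIVE, DEAD }

    Mode mode = Mode.PATROL;   // default mode
    int alertTimer;            // remaining turns of Inquisitive search

    void update(boolean seesAgent, boolean caughtAgent) {
        if (mode == Mode.DEAD) return;
        if (caughtAgent) {
            return; // catching the Agent ends the simulation in failure (handled elsewhere)
        }
        switch (mode) {
            case PATROL:
                if (seesAgent) mode = Mode.ALERT;   // spotted the Agent: start chasing
                break;
            case ALERT:
                if (!seesAgent) {
                    mode = Mode.INQUISITIVE;        // lost sight: search the last known cell
                    alertTimer = 20;                // hypothetical duration of the search
                }
                break;
            case INQUISITIVE:
                if (seesAgent) {
                    mode = Mode.ALERT;              // reacquired the Agent
                } else if (--alertTimer <= 0) {
                    mode = Mode.PATROL;             // gave up: resume the waypoint patrol
                }
                break;
            default:
                break;
        }
    }
}
```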


4.2 Agent

The Agent was built to represent a character akin to a Thief in a stealth game. Like the Guards mentioned above, the Agent is an extension of the class Person. Apart from the attributes inherited from that class, the architecture used imparts other properties to the Agent.

4.2.1 Sensors

The Agent has a sight sensor that simulates a viewpoint oriented toward wherever the Agent is facing. The sight sensor functions as a kind of ray casting algorithm in two-dimensional space. Sight is therefore blocked by walls or by medium to high pieces of Furniture. Anything detected by the sensors that was not previously known is added to the Knowledge of the Agent.
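
One simple way to realise such a sensor is to sample the cells along the line between the Agent and a target cell and stop at the first blocking cell. The sketch below does exactly that; the grid encoding (0 for free, 1 for a wall or tall Furniture) is an assumption made for this example.

```java
// Illustrative 2D line-of-sight check in the spirit of the ray-casting sensor above.
// The grid encoding (0 = free, 1 = wall or medium/high furniture) is an assumption.
class LineOfSightSketch {

    static boolean hasLineOfSight(int[][] grid, int x0, int y0, int x1, int y1) {
        int dx = x1 - x0, dy = y1 - y0;
        int steps = Math.max(Math.abs(dx), Math.abs(dy));
        for (int i = 1; i < steps; i++) {
            // Sample intermediate points along the ray between the two cells.
            int x = x0 + Math.round((float) dx * i / steps);
            int y = y0 + Math.round((float) dy * i / steps);
            if (grid[y][x] == 1) {
                return false; // blocked by a wall or a medium/high piece of Furniture
            }
        }
        return true;
    }
}
```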

4.2.2 Knowledge

The Agent possesses a structure that represents its knowledge. The Knowledge structure starts out blank at the beginning of the simulation. It contains a list of known Cells, a list of known Furniture, a list of possible Actions, and a HashMap recording the cells in which the Agent has seen Guards patrolling, all of which are updated each time the Agent sees a new Cell with its Sensor or sees a Guard. The structure also records the current position of the Agent, its hiding status and its current and last Goals.

4.2.3 Goals

Goals are objectives that the Agent has to accomplish to be successful. Each Goal has a condition for success that depends on an alteration of the Agent's Knowledge. In this case, four different Goals were implemented. By default the agent has the Exploration Goal active, in which it attempts to learn more about its surroundings, adding to its knowledge whatever it finds. There is also the Loot Goal, which becomes active once the agent knows of enough containers to loot; the Hide Goal, in which the agent attempts to hide from current pursuers; and the Escape Goal, which becomes active when the agent has fulfilled its primary objective (to loot the current map) and in which it attempts to escape the current map without being detected.

• Explore: The default Goal, which is accomplished if the agent finishes his turn with more known cells than he had at the beginning of the turn.

• Hide : Goal that becomes active if a Guard triggers the Alarm state on the Map. It is accomplished when the agent is hidden out of the line-of-sight of Guards.

• Loot : Becomes active when the Agent has in its knowledge the whereabouts of more than a third of the total loot of the Map.

• Escape : Becomes active when the Agent has in its possession more than a third of the total loot of the Map and is accomplished if the Agent succeeds in getting to the exit of the Map.


The only way to achieve any of the Goals is through the effects of the different Actions. Goals change dynamically whenever the agent senses something that invalidates the current goal or gives priority to another.
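
The goal arbitration described above can be summarised as a priority check over the Agent's Knowledge. The sketch below is a hypothetical rendition of that logic: the one-third thresholds follow the Loot and Escape Goals above, but the priority order shown and the field names are assumptions, not the actual implementation.

```java
// Illustrative goal arbitration following the activation rules described above.
// Field names and the priority order are assumptions; the one-third thresholds
// come from the Loot and Escape Goals.
class GoalSelectionSketch {

    enum GoalType { EXPLORE, HIDE, LOOT, ESCAPE }

    static GoalType selectGoal(boolean alarmRaised,
                               int knownLoot, int carriedLoot, int totalLoot) {
        if (alarmRaised) {
            return GoalType.HIDE;     // a Guard triggered the Alarm state
        }
        if (carriedLoot * 3 > totalLoot) {
            return GoalType.ESCAPE;   // carrying more than a third of the map's loot
        }
        if (knownLoot * 3 > totalLoot) {
            return GoalType.LOOT;     // knows the whereabouts of more than a third of the loot
        }
        return GoalType.EXPLORE;      // default goal
    }
}
```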

4.2.4 Actions

Actions are single steps that the agent can take and that compose a plan. Every new object the agent detects through its sensors gives it knowledge of the actions it can perform related to that object. The agent can know different types of actions, such as the Hide action, where the agent hides inside, under or behind an object; the Steal action, in which the agent loots a container; the Move action, which defines a path for the agent; the Peek action, where the agent peeks from behind cover; and the Look action, which changes the facing of the agent. Each Action has a precondition and an effect. The precondition defines a state that the Agent has to reach for the Action to be successful. Actions receive Knowledge as a parameter, and the effect function returns altered Knowledge that is tested to see whether it accomplishes a Goal or the precondition of another Action. There are five defined Actions, listed in Table 4.2.

Table 4.2: List of Actions

Action   Precondition                                                Effect
Hide     Adjacency of the Agent to a Furniture that allows hiding    Hides the agent
Look     No precondition                                             Changes the facing of the Agent
Move     No precondition                                             Moves the Agent to the Cell
Steal    Adjacency of the Agent to a container-type Furniture        Loots the container
Peek     Adjacency of the Agent to a Furniture                       Gives line-of-sight from behind Furniture
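
To make the precondition/effect mechanism concrete, the sketch below expresses one of the Actions from Table 4.2 (Steal) over a simplified Knowledge structure. The classes, fields and helper logic are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative Steal action over a simplified Knowledge structure.
// Classes, fields and method names are assumptions made for this sketch.
class StealActionSketch {

    static class Knowledge {
        int agentX, agentY;
        Set<int[]> knownContainers = new HashSet<>(); // positions of known container Furniture
        int lootCarried;
    }

    // Precondition: the Agent is adjacent to a known container-type Furniture.
    static boolean preconditionMet(Knowledge k) {
        return k.knownContainers.stream()
                .anyMatch(p -> Math.abs(p[0] - k.agentX) + Math.abs(p[1] - k.agentY) == 1);
    }

    // Effect: returns altered Knowledge in which the adjacent container has been looted.
    static Knowledge effect(Knowledge k, int lootValue) {
        Knowledge after = new Knowledge();
        after.agentX = k.agentX;
        after.agentY = k.agentY;
        after.knownContainers = new HashSet<>(k.knownContainers);
        after.lootCarried = k.lootCarried + lootValue;
        return after;
    }
}
```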

4.2.5 Planner

The Planner, as the name implies, formulates a Plan. A Plan is a sequence of Actions that, once executed, accomplishes a Goal. The Planner therefore receives a Goal and the known Actions the Agent has access to. The planning process is similar to a navigational pathfinding problem, so it can be tackled by search-based algorithms or even by pathfinding algorithms. Since the objective was to find a way to make the Agent act within time constraints, the Monte-Carlo Tree Search (MCTS) method was selected for the Planner; an algorithm like A* would find the optimal plan for any situation, but at the cost of increased planning time.

4.2.5.1 Planning tree

The planning stage starts with the definition of the Goal to accomplish. Once that is obtained, there are two different methods to expand the search tree: forward search and backward search. Both have been implemented using the MCTS method.


Figure 4.2: Backward Search

Figure 4.3: Forward Search


4.2.5.2 Planning with MCTS

Monte-Carlo Tree Search is an algorithm divided into four steps: Selection, Expansion, Simulation and Backpropagation. It functions aheuristically, that is, it does not need previous domain knowledge to achieve good decisions, which fits the fact that our Agent works with incomplete knowledge. Two versions of MCTS were implemented over the GOAP architecture, one using forward search and one using backward search. Forward search starts from a node that represents the current state of the Agent and expands to an Action from all possible Actions the Agent may take in its current state, that is, those whose precondition is met by the previous state; it then expands to all possible Actions whose precondition is met by the effect of the Action selected in the previous node, until an effect accomplishes the Goal. Backward search starts from a state where the Goal is accomplished and expands into nodes whose Action's effect accomplishes the Goal, expanding afterward to nodes whose Action's effect meets the selected Action's precondition. Each of the four-step iterations of the MCTS method is called a playout.

Figure 4.4: MCTS. From http://mcts.ai/about/index.html

The Selection step starts at a node that represents the current state of the Agent during forward search, or the success state during backward search. A node is selected randomly during the first playouts and afterwards based on the Upper Confidence Bounds formula:

\[ v_i + C \sqrt{\frac{\ln N}{n_i}} \]

where v_i is the value of node i, given by the number of successes achieved when traversing that node divided by the number of times the node has been visited, C is a bias parameter, set at 1, N is the total number of times the parent node has been visited and n_i is the number of times node i has been visited. This step repeats itself until it reaches a node that has not yet been explored. Each node except the starting node is an Action. During forward search the nodes are selected from all the Actions whose precondition is met by the state of the Agent after the effect of the Action in the previous node. During backward search the nodes are selected from all the Actions whose effect leads to accomplishing the precondition of the Action in the previous node. The Expansion step expands the node selected above into a random node in the same fashion. Since the Agent

is supposed to have incomplete knowledge of the world, i.e. not knowing where the Guards are unless it sees them, the Simulation step expands the nodes randomly, until it either reaches an end state or a certain depth has been reached. The last step is Backpropagation, where the values of the Actions in the current sequence are updated, increasing if they reached a success state.
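
As an illustration of the selection rule above, the following sketch computes the UCB value of a child node and picks the child that maximises it. It mirrors the formula with C set to 1; the field and method names are illustrative, not the planner's actual code.

```java
import java.util.Comparator;
import java.util.List;

// Simplified UCB1-based selection, mirroring the formula above.
// Field and method names are illustrative, not the thesis implementation.
class UctSelectionSketch {

    static final double C = 1.0; // bias parameter, set to 1 as in the text

    static class Node {
        double successes;   // successes observed when traversing this node
        int visits;         // n_i: times this node has been visited
        List<Node> children;
    }

    // UCB1 value: v_i + C * sqrt(ln(N) / n_i), where N is the parent's visit count.
    static double ucb(Node child, int parentVisits) {
        if (child.visits == 0) {
            return Double.POSITIVE_INFINITY; // unexplored nodes are selected first
        }
        double value = child.successes / child.visits;
        return value + C * Math.sqrt(Math.log(parentVisits) / child.visits);
    }

    // Selection step: descend by repeatedly picking the child with the highest UCB value.
    static Node selectChild(Node parent) {
        return parent.children.stream()
                .max(Comparator.comparingDouble((Node c) -> ucb(c, parent.visits)))
                .orElse(null);
    }
}
```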

4.2.6 Movement

One of the requirements of the Agent was its ability to traverse the map in a stealthy fashion, so the pathfinding heuristic was changed to include an extra cost for traversing a Cell, based on the lighting and the danger level of that Cell:

\[ h_x = m_x + i_x + d_x \]

where h_x is the heuristic value of Cell x, m_x is the Manhattan distance from it to the ending Cell, i_x is the light intensity on the Cell cubed and d_x is the danger level, i.e. the number of times a Guard has been seen on that Cell divided by the number of times that Cell has been seen. This is not an admissible heuristic [Kor00], since it overestimates the distance cost to the goal and increases the depth of the search, but designing an admissible heuristic for this pathfinding problem would require extensive testing and was not the focus of this thesis. The heuristic nonetheless adds to the distance cost of Cells with a high light intensity and danger level, allowing the Agent to favour Cells with low light intensity and a low danger level, creating a stealthier path.
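
A minimal sketch of this stealth-aware cost follows, combining the three terms exactly as defined above; the Cell fields used to store the light and sighting statistics are hypothetical stand-ins for the simulator's own data.

```java
// Sketch of the stealth-aware heuristic h_x = m_x + i_x + d_x described above.
// Field names are illustrative; the real simulator stores these values in its Cell class.
class StealthHeuristicSketch {

    static class Cell {
        int x, y;
        double lightIntensity;   // light level on this cell
        int guardSightings;      // times a Guard has been seen on this cell
        int timesSeen;           // times this cell has been observed by the Agent
    }

    static double heuristic(Cell current, Cell goal) {
        // m_x: Manhattan distance to the goal cell
        double manhattan = Math.abs(current.x - goal.x) + Math.abs(current.y - goal.y);
        // i_x: light intensity cubed, heavily penalising well-lit cells
        double light = Math.pow(current.lightIntensity, 3);
        // d_x: danger level, the fraction of observations in which a Guard occupied the cell
        double danger = current.timesSeen == 0
                ? 0.0
                : (double) current.guardSightings / current.timesSeen;
        return manhattan + light + danger;
    }
}
```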

Chapter 5

Experimental Results

In this chapter we review the experimental data collected through simulations on the platform developed. We begin by describing the experimental setup, followed by an analysis of the results and their discussion.

5.1 Experimental Setup

The experiments were run on a PC with a 2.5 GHz processor and 4 GB of RAM. Each experiment was a playthrough of a procedurally generated map by an Agent. In each playthrough the Agent would start in a random position on the map, and the Agent and the Guards would act in turns. Each iteration of the program corresponds to one step from the Guards and, for the Agent, one Action or one step of a Move Action. Three agents were tested, all implemented as per Section 4.2.

• Random: an agent that chooses its actions randomly, based only on the ones it can perform at that moment.

• MCTS-F: an agent using the MCTS method in forward search.

• MCTS-B: an agent using the MCTS method in backward search.

Maps were divided into two difficulties based on the number of Guards per area of the map, since a higher concentration of Guards would increase the probability of a Guard finding and catching the agent. The map difficulty formula is:

\[ D_i = \frac{g_i}{A_i} \]

where D_i is the difficulty, g_i is the number of guards in map i and A_i is the effective area of the map, i.e. the number of non-wall cells. Maps with a difficulty under 0.002 were discarded, since this usually signified a low concentration of guards over a large area.


Table 5.1: Map Difficulty

Difficulty           Minimum Area   Maximum Area   Average Guards per map
Easy (D < 0.004)     264            1149           1.9
Hard (D >= 0.004)    50             987            2.6

In each experiment the agent had to go through a map, discovering the layout of the level, avoiding guards and stealing valuables. Over 300 experiments were conducted with each Agent in each difficulty level. Each of these experiments generated a log file with the following information about the playthrough:

• Guards: The number of guards patrolling around the map, useful to test the performance of the Agents in more dangerous environments.

• Actual Map Size: Number of cells on the map that are not walls.

• Total Loot: Quantity of loot present on the map.

• Known Actions: Number of Actions known by the Agent before each planning stage begins.

• Playout time: Time spent on each playout.

• Number of Alarms: number of times the Agent was discovered by guards.

• Looted value: quantity of valuables the Agent looted during the playthrough.

• Score: value of success of the agent on said map.

Each Agent had a maximum time allotment of 100 ms to create a plan. The Random Agent picks Actions randomly, so its planning stage is almost instantaneous. MCTS-F performs a forward tree search using the MCTS method, limited to a maximum expansion depth of 30 levels, and MCTS-B searches backward, with the same limitation.
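
The planning budget described above can be pictured as a simple time-bounded loop around the MCTS playouts: keep running playouts until the 100 ms budget expires and then extract the best plan found so far. The sketch below shows this idea; the interface and method names are assumptions for illustration.

```java
// Sketch of a time-budgeted MCTS planning loop matching the setup described above
// (100 ms per planning call, expansion depth capped at 30). Names are illustrative.
class BudgetedPlannerSketch {

    static final long TIME_BUDGET_MS = 100;
    static final int MAX_DEPTH = 30;

    interface SearchTree {
        void runPlayout(int maxDepth);  // one Selection-Expansion-Simulation-Backpropagation pass
        Object bestPlanSoFar();         // extract the best plan found up to now
    }

    static Object plan(SearchTree tree) {
        long deadline = System.currentTimeMillis() + TIME_BUDGET_MS;
        // Keep running playouts until the time budget is exhausted; the plan extracted
        // afterwards is the best found so far, even if only one playout was completed.
        while (System.currentTimeMillis() < deadline) {
            tree.runPlayout(MAX_DEPTH);
        }
        return tree.bestPlanSoFar();
    }
}
```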

5.2 Experimental Results

The agents were tested in the simulation platform described in Section 4.1, where a map was randomly generated and a planner type chosen to play it. Each map had a minimum of one Guard and more than one Furniture with a loot value, and the Agent failed the playthrough immediately if a Guard caught it (i.e. if the Guard became adjacent to the Agent while knowing its position). The performance of each Agent is rated through several parameters that allow us to perceive the Agent's efficiency in stealth. The first of these parameters is the number of detections per game, shown in Table 5.2. In this experimental setup, due to the guaranteed high concentration of Guards in each map, detection is almost unavoidable, even for the stealthier Agents. MCTS-B outperformed the other Agents in avoiding guards, due to its planning type, which led to considerably shorter plans than its counterpart using the same method. Shorter plans led it to safety faster, away from the Guards' line-of-sight. In the same way, we can see that the MCTS-B Agent triggered fewer alarms than its counterparts. The lower number of alarms in the Hard difficulty is due to the fact that, in that difficulty, an alarm would usually be followed by a failure.


Table 5.2: Average number of detections per game

          Easy   Hard   Alarms (Easy)   Alarms (Hard)
Random    6.51   6.81   180             69
MCTS-F    4.92   5.53   133             63
MCTS-B    4.02   4.26   106             51

Table 5.3: Average loot per game

          Easy    Hard
Random    2.35    5.54
MCTS-F    11.60   12.94
MCTS-B    13.61   13.68

In each playthrough, each Agent had as its primary objective to steal the loot it could find and attempt to escape. Table 5.3 shows the average loot each Agent was able to appropriate before being caught. The Agents were scored by their exploration in each experiment and by the loot they were able to accumulate.

Table 5.4: Average number of iterations per game

          Easy    Hard
Random    35.70   29.59
MCTS-F    13.19   11.17
MCTS-B    19.62   13.40

During each playthrough, the Agents explored the map, and every Cell discovered increased their known Actions, and therefore their planning possibilities. During the Exploration Goal, the Agent's objective is to maximize the number of known Actions. Figures 5.1 and 5.2 show that the exploration rate follows a logarithmic trend, with MCTS-B having a much higher exploration rate on easy maps than its counterpart, and with that difference being much smaller on hard maps. As mentioned above, during each iteration of each experiment the time allotment for the planning step was 100 ms. Of the three Agents, Random was the fastest to finish its playouts. Table 5.6 shows the average duration of a playout in each of the difficulties for each of the Agents. The duration of each playout increases at a near exponential rate with the number of Actions the Agent knows at the time of planning, since the planner has to go through all the known Actions to find which ones are feasible at that step in the plan. During the initial stages, since the Agent has few possible Actions, most of the playouts last under 1 ms. In instances where the Agent knew more than 600 Actions, it is common to see the number of playouts severely reduced when using MCTS-F, with each playout lasting over 50 ms. This happens because the forward search expands the tree until it finds a state where the current Goal is accomplished, and the first playouts are essentially random, since the Actions' v value (as seen in Section 4.2.5.2) is zero at that point. The minimum time for a playout, for any of the methods above, is under 1 ms, and the maximum time is 99 ms, occurring in planning stages where only one playout was made, since the planner stops itself when going over 100 ms. This can be a problem with some search-based methods, but with MCTS-F, if the planner is stopped, the plan created can still be valid, since it starts exploring the tree from the Agent's current state.


Figure 5.1: Exploration Rate (Easy)

Figure 5.2: Exploration Rate (Hard)


Table 5.5: Maximum Completion Percentage

          Easy   Hard
Random    2%     6%
MCTS-F    8%     10%
MCTS-B    13%    23%

5.3 Discussion

The performance of the Agents in these experiments leads us to the conclusion that MCTS used in backward search, even in an environment with incomplete knowledge, is a very powerful search method, and that with some improvement it could be used to create a stealth agent that plans during runtime. The GOAP architecture can function with any search-based algorithm, and it can profit from the usage of MCTS due not only to its ability to achieve optimal plans, as the A* method does, but also to the possibility of interrupting the method at any time and still extracting a working plan. The simulation platform requires some changes in order to be less unforgiving and to allow the Agents some measure of ease during the playthroughs.


Table 5.6: Number of playouts and average duration in milliseconds

          Playouts (Medium)   Avg. duration (Medium)   Playouts (Hard)   Avg. duration (Hard)
Random    9889                0.051                    2989              0.094
MCTS-F    768846              0.704                    336343            1.223
MCTS-B    7031303             5.929                    3107600           7.079

Chapter 6

Conclusions and Future Work

In this thesis we discussed the creation of an AI Agent able to act in a way befitting a player in a stealth-based game. We explored diverse possibilities for its creation and chose a path for its implementation. We also discussed PCG and its potential in creating game environments tailored to different types of games and testing platforms.

6.1 Goal Assessment

The Agent created was able to walk stealthily through a game map, avoiding lights and patrolling guards, being able to hide in order to avoid being caught. It was also able to plan ahead, with a specific goal in mind, be it hiding from an inquisitive guard, or looting an entire map filled with valuables. The Agent is not yet capable of playing like a human player or passing a Turing test, but the framework was created so that it would be possible to improve upon and eventually get to a point where it would be able to act in a realistic way that could be confused with a human playthrough. The testing platform created used PCG concepts to create different game maps every playthrough, allowing for near-infinite game experiences. It was built in a modular fashion, so that more stealth elements can be added with relative ease and without increasing the generating time significantly.

6.2 Future Work

The next step in this work will be improving the Agent: in the pathfinding heuristic, making sure that it finds a stealthy path in optimal time; in the planning method, using optimization techniques to simplify the search tree and applying the layered version of the GOAP architecture; and in the behaviour of the Agent, adding new Actions and Goals and changing some of the existing ones to reflect a more realistic approach to each level. The knowledge of the Agent can also be improved, with personality values, such as a value of fear or courage, to better emulate a player, and by

29 Conclusions and Future Work optimizing the way the Actions are accessed in order to reduce search times. The PCG tool can be improved by adding more stealth elements, such as noisy ground, or vents that the Agent could crawl through, and by balancing the maps in terms of difficulty.


Appendix A

Procedurally generated simulator

A platform was needed to test the developed AI in different scenarios. An existing candidate, AI Sandbox (http://aisandbox.com, 2014), was still in closed-access alpha during the development of this thesis, so a dedicated testing platform had to be created. Since the agent was designed to play as a thief, mirroring several stealth-based games, the simulator needed the content required to emulate the gameplay style of those games: the stealth elements and mechanics that characterize them and the kinds of objectives they include.

A.0.1 Procedural map generation

The generator is based on the map generation of Rogue (Toy and Wichman 1980): it starts with a map of a given size in which every cell is a wall, and the elements of the map are dug out of those walls. The generator runs two phases, repeated until the map is deemed ready for the simulation or until a stopping condition is reached. The first is the generation phase, in which a room is created at a random location inside the map. The second is the test phase, in which the map is checked to see whether it has reached the end-state condition or whether the generation step created any illegal zones (a blocked path or a room outside the map area). If any illegality is detected, the generation step is reversed and attempted again; if the map reaches the end condition, generation stops and the simulation begins.
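A minimal sketch of this generate-and-test loop is shown below; the concrete digging, legality, undo and end-condition routines are passed in as callables since they are specific to this generator, and all names are illustrative assumptions.

def generate_map(new_map, dig_random_room, is_legal, undo_step, done,
                 max_attempts=1000):
    """Two-phase generate-and-test loop (a sketch; the concrete routines are
    supplied by the caller):
      Phase 1, generation: dig a room out of the walls at a random location.
      Phase 2, testing: if the step created an illegal zone (a blocked path
      or a room outside the map area), reverse it; stop once the end
      condition is met or the attempt budget runs out."""
    game_map = new_map()
    attempts = 0
    while not done(game_map) and attempts < max_attempts:
        step = dig_random_room(game_map)   # phase 1: generation
        if not is_legal(game_map):         # phase 2: testing
            undo_step(game_map, step)      # reverse the illegal generation step
        attempts += 1
    return game_map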

A.0.1.1 Step 1: Generation

At the beginning of the first generation phase, the generator receives the following set of parameters (grouped into a single configuration object in the sketch after this list):

• Maximum map size
• Maximum and minimum room size
• Maximum number of rooms
• Valid room types for the map
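Grouped into a single configuration object, these parameters might look like the sketch below; field names and default values are illustrative assumptions, not the generator's actual interface.

from dataclasses import dataclass, field


@dataclass
class GeneratorParams:
    max_map_size: tuple = (64, 64)    # maximum map size, in cells (width, height)
    min_room_size: tuple = (3, 3)     # minimum room size, in cells
    max_room_size: tuple = (10, 10)   # maximum room size, in cells
    max_rooms: int = 20               # maximum number of rooms to dig
    room_types: list = field(default_factory=lambda: ["regular", "corridor"])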

The generator digs rooms out of the walls of the map based on the following elements (modelled as data types in the sketch after this list):


• Cell: smallest element available; can contain other objects.
• Room: element composed of cells; can be a regular room or a corridor.
• Furniture: elements that can appear inside a room; may be containers with loot and may allow the agent to hide within, under or behind them.
• Light: element that affects the lighting of the cell it is contained in and of the cells within its intensity radius.
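One possible way to model these elements as data types, purely to illustrate the relationships described above (the simulator's actual classes are not shown here):

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Furniture:
    """Element placed inside a room; may hold loot and may offer a hiding spot."""
    is_container: bool = False
    loot_value: int = 0
    hiding_spot: bool = False      # agent may hide within, under or behind it


@dataclass
class Light:
    """Lights its own cell and every cell within its intensity radius."""
    intensity_radius: int = 1


@dataclass
class Cell:
    """Smallest map element; may contain other objects."""
    is_wall: bool = True
    furniture: Optional[Furniture] = None
    light: Optional[Light] = None


@dataclass
class Room:
    """Composed of cells; either a regular room or a corridor."""
    kind: str = "regular"          # "regular" or "corridor"
    cells: List[Cell] = field(default_factory=list)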

During this step the generator creates a room composed of cells, where each cell may contain a piece of furniture and a light. The room is generated empty at first, with only the lighting, and is subsequently filled with furniture.

A.0.1.2 Step 2: Testing

The testing step checks the created room for accessibility: every piece of furniture must be reachable, and any doors created must be reachable as well. If the test returns a valid room, the generator selects a random wall connected to a room, creates a door on it, and jumps back to step 1 to create a new room. If not, the room is emptied of furniture, refurnished, and this step is repeated. If several attempts fail, the whole room is discarded and the generator jumps back to step 1 to create a new room with different properties.
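The accessibility requirement can be checked with a simple flood fill over walkable cells, verifying that every piece of furniture and every door is reachable from inside the room; the grid representation and the function below are assumptions for illustration.

from collections import deque


def accessible(grid, start, targets):
    """Breadth-first flood fill over walkable cells. grid is a 2D list where
    True means walkable, start is a walkable (row, col) inside the room, and
    targets are the cells adjacent to furniture and doors that must be
    reachable. Returns True only if every target is reached."""
    rows, cols = len(grid), len(grid[0])
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] \
                    and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return all(t in seen for t in targets)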
