DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Creating Human-like AI Movement in Games Using Imitation Learning

CASPER RENMAN

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Creating Human-like AI Movement in Games Using Imitation Learning

May 31, 2017

CASPER RENMAN

Master’s Thesis in Computer Science
School of Computer Science and Communication (CSC)
Royal Institute of Technology, Stockholm
Swedish Title: Imitation Learning som verktyg för att skapa människolik rörelse för AI-karaktärer i spel
Principal: Kristoffer Benjaminsson, Games
Supervisor: Christopher Peters
Examiner: Olov Engwall

Abstract

The way characters move and behave in computer and video games are important factors in their believability, which has an impact on the player’s experience. This project explores Imitation Learning using limited amounts of data as an approach to creating human-like AI behaviour in games, and through a user study investigates what factors determine if a character is human-like, when observed through the character’s first-person perspective. The idea is to create or shape AI behaviour by recording one’s own actions. The implemented framework uses a Nearest Neighbour algorithm with a KD-tree as the policy which maps a state to an action. Results showed that the chosen approach was able to create human-like AI behaviour while respecting the performance constraints of a modern 3D game.

Sammanfattning

Sättet karaktärer rör sig och beter sig på i dator- och tv-spel är viktiga faktorer i deras trovärdighet, som i sin tur har en inverkan på spelarens upplevelse. Det här projektet utforskar Imitation Learning med begränsad mängd data som ett tillvägagångssätt för att skapa människolik rörelse för AI-karaktärer i spel, och utforskar genom en användarstudie vilka faktorer som avgör om en karaktär är människolik, när karaktären observeras genom dess förstapersonsperspektiv. Idén är att skapa eller forma AI-beteende genom att spela in sina egna handlingar. Det implementerade ramverket använder en Nearest Neighbour-algoritm med ett KD-tree som den policy som kopplar ett tillstånd till en handling. Resultaten visade att det valda tillvägagångssättet lyckades skapa människolikt AI-beteende samtidigt som det respekterar beräkningskomplexitetsrestriktioner som ett modernt 3D-spel har.

Contents

1 Introduction
  1.1 Artificial Intelligence in games
    1.1.1 Imitation Learning
    1.1.2 Human-likeness
  1.2 Objective
  1.3 Limitations
  1.4 Report outline

2 Background
  2.1 Imitation Learning
    2.1.1 Policy
    2.1.2 Demonstration
    2.1.3 State representation
    2.1.4 Policy creation
    2.1.5 Data collection
    2.1.6 Demonstration dataset limitations
  2.2 Related work
    2.2.1 Summary and state of the art
  2.3 Performance in games
  2.4 Measuring believability of AI
    2.4.1 Turing test-approach
    2.4.2 Automated similarity test
  2.5 Conclusion

3 Implementation
  3.1 Setting
  3.2 Method motivation
  3.3 Implementation
    3.3.1 Summary
    3.3.2 Recording movement and state representation
    3.3.3 Playing back movement


    3.3.4 Policy
    3.3.5 Feature extraction
    3.3.6 Avoiding static obstacles
    3.3.7 Avoiding dynamic obstacles
    3.3.8 KD-tree
    3.3.9 Discretizing the environment
    3.3.10 Additional details
    3.3.11 Storing data
    3.3.12 Optimization and measuring performance
  3.4 Overall implementation

4 Evaluation
  4.1 User study
    4.1.1 The set-up
    4.1.2 Participants
    4.1.3 Stimuli
    4.1.4 Procedure
    4.1.5 Hypothesis
  4.2 Results
    4.2.1 User study
    4.2.2 Imitation agent performance
  4.3 Discussion
    4.3.1 The imitation agent
    4.3.2 The user study
    4.3.3 Creating non-human-like behaviour
    4.3.4 Performance in relation to games
    4.3.5 Ethical aspects

5 Conclusions
  5.1 Future work
    5.1.1 Use outside of games

Bibliography

Chapter 1

Introduction

This chapter gives a brief overview of Artificial Intelligence in games, Imitation Learning and human-likeness. It also presents the objective, limitations and the outline of the project.

1.1 Artificial Intelligence in games

Computer and video games produce more and more complex virtual worlds. This introduces new challenges for the characters controlled by Artificial Intelligence (AI), also known as agents [20] or NPCs (Non-Player Characters), meaning characters that are not controlled by a human player. The way characters move and behave in computer and video games are important factors in their believability, which has an impact on the player’s experience. Being able to interact with NPCs in meaningful ways and feel that they belong in the world is important [4]. In Virtual Reality (VR) this is even more important, as the gaming experience is even more immersive. The goal of many games’ AI is more or less the same as attempts to beat the Turing test - to create believable intelligence [12]. A popular genre in computer and video games is the First-person shooter (FPS). In an FPS game the player experiences the game through the eyes of the character the player is controlling, also known as a first-person perspective. Typically a player is at most able to see the hands and arms of the character the player is controlling. The player can however see the whole bodies of other players’ characters and of NPCs. This is visualized in Figure 1.1.


Figure 1.1: An example first-person perspective game scenario, seen from the eyes of the character that the player controls. The blue and red characters are NPC’s.

AI in games is traditionally based on Finite State Machines (FSM), Behaviour Trees (BT) or other hand-coded techniques [27]. In these techniques, a programmer needs to explicitly define rules for what an agent should do in different situations. An example of such a rule could be: "if the character’s health is low and the character sees a hostile character, the character should flee". These techniques work in the sense that the agent is able to execute tasks and adapt its behaviour to its situation, but the result is predictable and static [11]. For example, if a player sees an NPC react to a situation the same way it did in an earlier similar situation, the player can be quite sure that the NPC will always react like that in similar situations. In 2006, Orkin [17] said: “in the early generations of shooters, such as Shogo (1998) players were happy if the A.I. noticed them at all and started attacking. . . . Today, players expect more realism, to complement the realism of the physics and lighting in the environments”. To achieve more realism and unpredictability, and thereby increase the entertainment value for the player, it would perhaps be a good approach for agents to imitate human behaviour.
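As an illustration (not taken from the thesis), such a hand-coded rule could be sketched as follows; the names and the flee threshold are made-up assumptions:

    # Sketch of a hand-coded, FSM-style rule (hypothetical names and values).
    # The behaviour is explicit and therefore predictable: the same state
    # always triggers the same action.
    def choose_action(health: float, sees_hostile: bool) -> str:
        FLEE_THRESHOLD = 25.0  # assumed health threshold
        if health < FLEE_THRESHOLD and sees_hostile:
            return "flee"
        if sees_hostile:
            return "attack"
        return "patrol"

    print(choose_action(health=10.0, sees_hostile=True))  # -> "flee"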

1.1.1 Imitation Learning

Imitation Learning (IL) is a technique where the agent learns from examples, or demonstrations, provided by a teacher [1]. IL is a form of Machine Learning (ML). ML has been defined as the “field of study that gives computers the ability to learn without being explicitly programmed” [14]. Unlike Reinforcement Learning algorithms, IL does not require a reward function to be specified. Instead, an IL algorithm observes a teacher perform a task and learns a policy that imitates the teacher, with the purpose of generalizing to unseen data [28]. IL is regarded as a promising technique for creating human-like artificial agents [3]. Some approaches have been shown to be able to develop agents with good performance in non-trivial tasks using limited amounts of data and computational resources [3]. It is a technique which also can be used to dynamically change game play to adapt to different players based on their play style and skill [7].

1.1.2 Human-likeness

Shaker et al. [24] describe character believability: an agent is believable if someone who observes it believes that the agent is a human being. Player believability, on the other hand, says that the agent is believable if someone observing the agent believes that it is a human controlling it. It is player believability that is meant by human-like in this project.

1.2 Objective

The primary goal of this project is to describe a method for creating human-like agent movement using IL with limited amounts of data. The idea is to create an agent by recording one’s own actions, shaping it with desired behaviours. Most related works in the field of IL in games aim to create competitive AI, meaning AI that is good at beating the game. This is not the case in this project. The goal is to create AI that lets an agent imitate a demonstrating human well, while respecting the performance requirements of a modern 3D game. A hope is that this will lead to a more unpredictable and human-like agent which in turn could lead to better entertainment for a player playing the game. Lee et al. [9] say that human-like agent behaviour leads to a raised emotional involvement of the player, which increases the player’s immersion in the game. Whether it is more fun or not to play with a human-like agent will not be explored. This project aims to answer the following question:

– Q1: How can IL be used to create human-like agent behaviour, using limited amounts of data?

This question is further split up into two sub-questions:

– Q1.1: How to create an agent that imitates demonstrated behaviour, using IL with limited amounts of data?

– Q1.2: What determines if a character is human-like, when observed through the character’s first-person perspective?

The human-likeness of the agent will depend on how human-like the human is when recording itself. This means that behaviour that is non-human-like will also be possible to create. Suppose that it is desired to create a behaviour for a dog in a game. A human would then record itself playing the game, role-playing a dog and behaving like it wants the dog to behave. If the intended behaviour is that the dog should flee when it sees something hostile, then so should the human when recording itself. The outcome should then be an agent that behaves like a dog.

1.3 Limitations

Agent movement here means that the actions the agent can execute are limited to movement, including rotation, i.e. moving from one position to another. In contrast, actions that are not considered movement in this project could for example be shooting, jumping or picking up items. The simulations will be done in a 3D environment but the movement of the implemented agent will be limited to a 2D plane. This means that the agent will not be able to walk up a ramp or climb stairs, for example. The movement behaviour of the agent will be limited by the feature extractors implemented, as described in the implementation chapter. In theory, any behaviour which only requires the agent to be able to move could be implemented, like path-finding and obstacle avoidance for example. The project will use limited amounts of data, meaning that it should be possible to create agent behaviour using the framework created in this project by recording one’s own actions for a couple of minutes. The motivation for this is that if game developers are to be able to design their own agent behaviour for a game, there will typically not exist data for them to use. Some works listed in the related works section perform their experiments in big games such as Quake III1, where there is a lot of saved data available. Quake is a first-person shooter. This allows them to use complex algorithms which perform better with more data. Not requiring a lot of data is also thought to make the contributions of this work more attractive to the gaming industry, as it will require less time and effort to utilize.

1.4 Report outline

The report starts by presenting background information about the areas of Imitation Learning and measuring believability of AI, and related work. Following is the implementation chapter, which motivates the choice of methods and describes the implementation process. The evaluation chapter describes the user study which was conducted in order to evaluate the human-likeness of the resulting imitation agent. It also presents the results of the user study and a brief performance measurement of the imitation agent, as well as summarizes what was done in the project and discusses the results. Finally, conclusions are drawn in the conclusions chapter.

1https://en.wikipedia.org/wiki/Quake_III_Arena/

Chapter 2

Background

This chapter presents background knowledge and related works about Imitation Learning and measuring believability of AI controlled characters. It also presents why heavy computations with long computational times are particularly bad in games.

2.1 Imitation Learning

The work by Argall et al. [1] is frequently cited and is a comprehensive survey of IL. The survey is the main source of background knowledge on IL in this project. They describe IL as a subset of Supervised Learning, where an agent learns an approximation of the function that produced the provided labeled training data, called a policy. The dataset is made up of demonstrations of a given task.

2.1.1 Policy

A policy π is a function that maps a state x to an action u. A policy allows an agent to select an action based on its current state. Developing a policy by hand is often difficult. Therefore machine learning algorithms have been used for policy development [1].
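To make the definition concrete, a minimal sketch (not the thesis code) of a policy backed by demonstrated state-action pairs could look as follows; the 2D state format and the naive nearest-neighbour lookup are illustrative assumptions:

    import math

    # Minimal policy sketch: pi maps a state x to an action u by returning the
    # action of the closest demonstrated state (naive nearest neighbour).
    demonstrations = [
        # (state x, action u)
        ((0.0, 0.0), "move_forward"),
        ((1.0, 0.5), "turn_left"),
        ((2.0, 2.0), "turn_right"),
    ]

    def pi(x):
        _, u = min(demonstrations, key=lambda pair: math.dist(pair[0], x))
        return u

    print(pi((0.9, 0.6)))  # -> "turn_left"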

2.1.2 Demonstration

A demonstration is a sequence of state-action pairs that are recorded at the time of the demonstration of the desired behaviour [1]. This way of learning a policy through examples differs from learning it based on data collected through exploration, such as in Reinforcement Learning [25]. A feature of IL is that it focuses the dataset to areas of the state-space that are actually encountered during the execution of the behaviour [1]. This is a good thing in games where computation time is very limited, as the search space of appropriate solutions is reduced.

5 6 CHAPTER 2. BACKGROUND

2.1.3 State representation

A state can be represented as either discrete, e.g. can see enemy or cannot see enemy, or continuous, e.g. the 3D position and rotation of the agent.

2.1.4 Policy creation

Creating a policy can be done in different ways. A mapping function uses the demonstrated data to directly approximate the function mapping from the agent’s state observations to actions (f : Z → A) [1]. This can be done using either classification, where the output is class labels, or regression, where the output consists of continuous values. A system model uses the demonstrated data to create a model. A policy is then derived from that model [1]. Plans use the demonstrated data together with user intention information to learn rules that associate pre- and post-conditions with each action. A sequence of actions is then planned using that information [1].

2.1.5 Data collection

The correspondence problem [16] has to do with the mapping between the teacher and the learner (see Figure 2.1). For example, a player playing an FPS game using a mouse and keyboard sends inputs which are processed by the game and translated into actions. An NPC in the same game is controlled by AI which sends commands to control the character several times per second, which is not directly equivalent to the keystrokes and mouse movements of a human player.

Figure 2.1: Visualization of the record mapping and embodiment mapping.

The record mapping is the extent to which the exact states/actions experienced by the teacher during demonstration are recorded in the dataset [1]. If there is no record mapping or a direct record mapping, the exact states/actions are recorded in the dataset. Otherwise some encoding function is applied to the data before storing the data. The embodiment mapping is the extent to which the states/actions recorded within the dataset are exactly those that the learner would observe/execute [1]. If there is no embodiment mapping or a direct embodiment mapping, the recorded states/actions are exactly those that the learner will observe/execute. Otherwise there is a function which maps the recorded states/actions to actions to be executed.

Two data collection approaches are demonstration and imitation [1]. In demonstration, the teacher can operate the learner through teleoperation, where the record mapping is direct. There is also shadowing, where the agent tries to mimic the teacher’s motions by using its own sensors. Here the record mapping is non-direct. Within imitation the embodiment mapping is non-direct, and the teacher execution can be recorded either with sensors on the teacher, where the record mapping is direct, or through external observation, where the record mapping is non-direct.

2.1.6 Demonstration dataset limitations

In IL, the performance of an agent is heavily dependent on the demonstration dataset. Low learner performance can be due to areas of the state space that have not been demonstrated. This can be solved either by improving upon the existing demonstrations by generalizing them or through acquisition of new demonstrations [1]. Low performance can also be caused by low quality of the demonstration dataset [1]. Dealing with this involves eliminating parts of the teacher’s executions that are suboptimal. Another solution is to let the learner learn from experience. If feedback is provided on the learner’s actions, this can be used to update the policy [1]. The demonstration dataset limitations are not dealt with in this project, as this is considered out of scope. They are however mentioned as a possible extension in the Future work chapter.

2.2 Related work

This section gives an overview of the related work in the field of Imitation Learning in games, in chronological order.

Thurau et al. [26] in "Imitation In All Levels of Game AI" create bots for the game Quake II1. Different algorithms are presented that learn from human generated data. They create behaviours on different levels: strategic behaviour used to achieve long-term goals, tactical behaviour used for localized situation handling such as anticipating enemy movement, and reactive behaviour like jumping, aiming and shooting. The generated bots are compared to the existing Quake II bots. It is shown that Machine Learning can be applied on different behavioural layers. It is concluded that Imitation Learning is well suited for generating behaviour for artificial game characters. The bots created with Imitation Learning outperformed the Quake II bots. It should however be taken into consideration that these results are thirteen years old at the time of writing this report.

Priesterjahn et al. [20] in "Evolution of Reactive Rules in Multi Player Computer Games Based on Imitation" propose a system in which the behaviour of artificial opponents is created through learning rules by observing human players. The rules are selected using an evolutionary algorithm with the goal of choosing the best and most important rules and optimizing the behaviour of the agent.

1https://en.wikipedia.org/wiki/Quake_II/

The paper shows that limited learning effort is needed to create behaviour which is competitive in reactive situations in the game Quake III. After a few generations of the algorithm, the agent was able to behave in the same way as the original players. In the conducted experiments, the generated agent outperformed the built-in game agents. The world is simplified to a single plane. The plane is divided into cells in a grid, with the agent centered in the grid. The grid moves relative to the agent. Each frame, the agent checks whether each cell is empty or not and scores it accordingly. They limit the commands to moving and attacking or not attacking. A rule is a mapping from a grid to a command. Human players are recorded and a basic rule set is generated by recording the grid-to-command matches every frame of the game. An evolutionary algorithm is then used to learn the best rules and thus the best competitive behaviour.

Saunders et al. [21] in "Teaching Robots by Moulding Behavior and Scaffolding the Environment" teach behaviour to robots by moulding their actions within a scaffolded environment. A scaffolded environment is an environment which is modified to make it easier for the robot to complete a task, when the robot is at a developmental stage. Robot behaviour is created by teaching state-action memory maps in a hierarchical manner, which during execution are polled using a k-Nearest Neighbour based algorithm. Their goal was to reproduce all observable movement behaviours. Their results show that the Bayesian framework leads to human-like behaviour.

Priesterjahn [19] in "Imitation-Based Evolution of Artificial Players in Modern Computer Game", which is based on the paper by [20], proposes the usage of imitation techniques to generate more human-like behaviours in an action game. Players are recorded, and the recordings are used as the basis of an evolutionary learning approach. The approach is motivated by stating that to behave human-like, an agent should base its behaviour on how human players play the game and try to imitate them. This is in contrast to a pure learning approach based on the optimization of behaviour, which only optimizes the raw performance of the game agent. The authors present the result of the conducted experiments and explain that the imitation-based initialization has a big effect on the performance and behaviour of the evolved agents. The generated agents showed a much higher degree of sophistication in their behaviour and appeared much more human-like than the agents evolved using plain evolution, though performing worse.

Cardamone et al. [3] in "Learning Drivers for TORCS through Imitation Using Supervised Methods" develop drivers for The Open Racing Car Simulator (TORCS) using a direct method, meaning the method uses supervised learning to learn driving behaviour from data collected from other drivers. They show that by using high-level information about the environment and high-level actions to be performed, the developed drivers can achieve good performance. High-level actions mean that they learn trajectories and speeds along the track, and let controllers achieve the target values. This is in contrast to predicting/learning low-level actions such as pressing the gas pedal a certain amount or rotating the wheel a certain number of degrees. It is also stated that the performance can be achieved with limited amounts of data and limited computational power. The learning methods used are k-Nearest Neighbour and Neural Networks with Neuroevolution.
The performance is measured in how fast a driver completes a race, which means they want to create an AI that is good at playing the game. It is compared to the best AI driver.

Munoz et al. [15] in "Controller for TORCS Created by Imitation" create a controller for the game TORCS using Imitation Learning. They use three types of drivers to imitate: a human player, an AI controller created with Machine Learning and one hand-coded controller which performs a complete lap. The imitation is done on each of the drivers separately and then a mix of the data is combined into new controllers. The aim of the work is to create competitive NPCs that imitate human behaviour. The learning method is feed-forward Neural Networks with Backpropagation. The performance of the driver is measured by how fast a driver completes a race. It is compared to other AI and human drivers. They conclude that it is difficult to learn from human behaviour, as humans do not always perform the same actions given the same situation. Humans also make mistakes, which is not good behaviour to learn if the goal is to create a driver that is good at playing the game.

Mehta et al. [13] in "Authoring Behaviors for Games using Learning from Demonstration" is similar to [21] in that behaviour is taught by demonstrating actions and annotating the actions with a goal. Here, the learning involves four steps:

– Demonstration: Playing the game.

– Annotation: Specifying the goals the teacher was pursuing for each action.

– Behaviour learning: Using a temporal reasoning framework.

– Behaviour execution: Done through a case-based reasoning (CBR) technique, case-based planning.

The goal of their project was to create a framework in which people without programming skills can create game AI behaviour by demonstration. The authors conclude that by using case-based planning techniques, concrete behaviours demonstrated in concrete game situations can be reused by the system in a range of other game situations, providing an easy way to author general behaviours.

Karpov et al. [8] in "UT2: Believable Bot Navigation via Playback of Human Traces" create the UT2 bot for the BotPrize competition2, a Turing-like test where computer game bots compete by attempting to fool human judges into thinking they are just another human player. UT2 broke the 50% humanness threshold and won the grand prize in 2012. The bot has a component called the Human Trace Controller, which is inspired by the idea of direct imitation. The controller uses a database of recorded human games in order to retrieve and play back segments of human behaviour. The results show that using direct imitation allows the bot to solve navigation problems while moving in a human-like fashion. Two types of data are recorded, pose data and event data. The pose includes position, orientation, velocity and acceleration. An event is for example switching weapons, firing weapons or jumping. All of the pose and event data for a player in a particular game form a sequence. Sequences are stored so that preceding and succeeding event and pose data can be retrieved from any given pose or event. In order to be able to quickly retrieve the relevant human traces, they implemented an efficient indexing scheme of the data. The two most effective indexing schemes used were Octree based indexing and Navigation Graph based indexing using a KD-tree.

2http://botprize.org/

Ortega et al. [18] in "Imitating Human Playing Styles in Super Mario Bros" describe and compare different methods for generating game AI based on Imitation Learning. Three different methods for imitating human behaviour are compared: Backpropagation, Neuroevolution and Dynamic scripting. The game is in 2D. Similarity in playing style is measured through comparing the play trace of one or several human players with the play trace of an AI player. The methods compared are hand-coded, direct (based on supervised learning) or indirect (based on maximizing a similarity measure). The conclusion is that a method based on Neuroevolution performs best both when evaluated by the similarity measure and by human spectators. Inputs were the game state, e.g. enemies, obstacles and distance to gaps, and outputs were actions.

2.2.1 Summary and state of the art

In 2006, Gorman et al. [6] stated that every game is different from the others, and claimed that it thus probably is impossible to suggest an ultimate approach. They said that "Currently, there are no generally preferred knowledge representation data structures and machine learning algorithms for the task of creating believable behaviour". They claim that believable characters should possess certain features that hardly can be achieved without observing and/or simulating human behaviour. Imitation Learning is listed as a proven human behaviour acquisition method.

Few of the works listed here have the sole aim of creating an agent that imitates demonstrated behaviour as well as possible, and no such works could be found. Most have another aim, such as performing as well as a human, or performing well after being inspired by human behaviour. The most popular and successful approach in these works is using Neural Networks with Neuroevolution, which is a form of Machine Learning that uses evolutionary algorithms to train Neural Networks [18]. The Human Trace Controller in the work by Karpov et al. [8], however, is the most recent successful work found which aims to imitate demonstrated behaviour, without doing it in a "beating the game" manner.

2.3 Performance in games

In games it is important to keep computational times low and the frame rate high and stable. The frame rate, usually measured in frames per second (FPS), is the frequency at which frames (images) in a game (or video) are displayed. A high frame rate typically means about 60 FPS for normal computer games and about 90 FPS for VR games, in order to have objects on the screen appear to move smoothly. Games usually contain a function called the update or tick function, which runs once every frame. The game will wait for the update function to finish before processing the next frame. If the calculations made in the update function take longer than the time slot for one frame (in order to keep 90 FPS, one frame has 1/90 s ≈ 11 ms to run its calculations), the game will not be able to stay at its target FPS and will not run as smoothly.
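As a rough sanity check (a sketch, not part of the thesis implementation), the per-frame time budget and an AI update's share of it can be computed directly; the 2 ms and 8 ms costs are assumed example values:

    # Frame budget arithmetic for a target frame rate (illustrative values).
    target_fps = 90
    frame_budget_ms = 1000.0 / target_fps   # ~11.1 ms per frame at 90 FPS

    ai_update_ms = 2.0     # assumed cost of the AI update this frame
    other_work_ms = 8.0    # assumed cost of rendering, physics, etc.

    if ai_update_ms + other_work_ms > frame_budget_ms:
        print("Frame budget exceeded: the game will drop below its target FPS")
    else:
        headroom = frame_budget_ms - ai_update_ms - other_work_ms
        print(f"{headroom:.1f} ms of headroom left this frame")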

2.4 Measuring believability of AI

Umarov and Mozgovoy [27] study current approaches to believability and effectiveness of AI behaviour in virtual worlds and give a good overview of different approaches. They discuss both measuring believability and various implementations for achieving it in games. It is stated that believability is not the only feature that makes AI-controlled characters fun to play with. A game should be challenging, so the agent should also be skilled or effective. However, they explain that the goals of believability and effectiveness are not always the same. A skilled agent is not necessarily believable, and a believable agent might be a weak opponent.

2.4.1 Turing test-approach

To evaluate the believability of an AI controlled character, Umarov and Mozgovoy [27] refer to a Turing test-approach, where a human player (judge) plays a game against two opponents, one controlled by a human and one controlled by an AI. The judge’s task is to determine which one is human. A simplification of this test is also mentioned, where the judge instead watches a game between two players, each of which can be controlled either by a human or an AI. The judge’s task is then to identify the game participants.

Lee et al. [9] learn human-like behaviour via Markov decision processes in the 2D game Super Mario. They evaluate the human-likeness by performing a modified Turing test [22] as well.

Gorman et al. [6] performed an experiment which [27] refers to. Quake II agents were evaluated by showing a number of people a series of video clips as seen by the character’s first-person camera. The task was to identify whether the active character is human. The different characters were controlled by a real human player, a Quake agent and a specifically designed imitation agent that tried to reproduce human behaviour using Bayesian motion modeling. The imitation agent was misidentified as a human 69% of the time and the Quake agent was mistaken for a human 36% of the time. "Sample evaluators’ comments, quoted in (Gorman et al., 2006), indicate that quite simple clues were used to guess human players (’fires gun for no reason, so must be human’, ’stand and wait, AI wouldn’t do this’, ’unnecessary jumping’)".

2.4.2 Automated similarity test

One way to compare human player actions and agent actions is by comparing velocity direction angle changes and frequencies of angles between player direction and velocity direction. Another is to compare pre-recorded trajectories of human players with those of agents [27].

2.5 Conclusion

This chapter presented Imitation Learning and the different challenges that it involves. Then related works were listed and the state of the art was determined. It seems like a direct imitation method, as used by Karpov et al. [8], is a good approach. Since no learning is done, the approach should give a lot of control, which is good as the computational performance of AI in games is important. The choice of method is described in detail in the next chapter. In order to evaluate the believability of the agent, a Turing test-approach is described as an option. The evaluation is described in Chapter 4.

Chapter 3

Implementation

This chapter describes the implementation of the Imitation Learning framework and thereby aims to answer Q1.1. Section 3.3.1 provides a summary of what was implemented. Throughout an iterative implementation process it was determined what to implement, in order to create an agent with behaviour which can be evaluated. The agent created in this process will be referred to as the agent when no other type of AI controlled character is in the same context. Otherwise it will be referred to as the imitation agent.

3.1 Setting

The implementation was carried out in the Unity® Pro game engine1. Unity is a cross-platform game engine developed by Unity Technologies and used to develop video games for PC, consoles, mobile devices and websites.

3.2 Method motivation

To keep the complexity of the framework low, and to allow for quick evaluation and iteration, it was decided to go with a Nearest Neighbour (NN) classification approach as used by Cardamone et al. [3]. Policy creation is thus done through a mapping function. No learning is done, and the collected data represents the model. Argall et al. [1] state that regardless of learning technique "minimal parameter tuning and fast learning times requiring few training examples are desirable". This speaks against more sophisticated algorithms such as Neural Networks, which require a lot of data to perform well. Cardamone et al. [3] claim that it is desirable to have the output of the agent be high-level actions, such as a target position and velocity, as opposed to low-level actions such as a certain key press for a certain amount of time. Other classification techniques may perform as well or better than Nearest Neighbour algorithms, but the focus of the thesis is not to compare or find the best classification algorithm. It is however important that the algorithm is fast, as there is not much time for heavy calculations in a game. Karpov et al. [8] show that using direct imitation, i.e. playing back recorded segments of human gameplay as they were recorded, allows the bot to solve navigation problems while moving in a human-like fashion. Their work passes the test of a structured and recognized competition aimed at measuring human-likeness, which gives the work high credibility. It is also one of the most recent works. This project was therefore inspired by their solution. The implementation used imitation as the data collection approach, where the record mapping is direct and the embodiment mapping is indirect. This is described in more detail in the next section.

1https://unity3d.com/

3.3 Implementation

3.3.1 Summary

An Imitation Learning framework was created which allows a human to create human-like agent behaviour by recording their own actions. Below is a summary of the implementation of the imitation agent. Details are described in the subsections following this summary.

– Recording movement: The human is in control of the agent and the agent’s state is continuously recorded.

– Playing back movement: The agent moves by executing actions. An action is a set of states. An action is chosen by classifying the agent’s state and weighing actions. Classification is done using a Nearest Neighbour algorithm.

– Feature extraction: The agent uses sensors to sense the environment. Reading the sensors results in a feature vector that is a representation of the environment.

– Avoiding static obstacles: If there is recorded data which corresponds to the agent’s current state, the agent will be able to avoid obstacles by executing the nearest neighbour action. If that is not the case, static obstacles are avoided by checking in the Nearest Neighbour algorithm whether an action goes through a static obstacle or not. If it does, the action is not considered a near neighbour and is not chosen.

– Avoiding dynamic obstacles: Dynamic obstacles are avoided like static obstacles, but a different feature extractor is utilized which extracts different features. The dynamic obstacle avoidance was the last part of the implementation process.

– KD-tree: A KD-tree is used to speed up the Nearest Neighbour algorithm.

– Grid: The environment is discretized into a grid of cells. The grid is used in weighing actions. An action is weighted with the score of a cell. The grid can be manipulated to make the agent move to a destination.

3.3.2 Recording movement and state representation

Figure 3.1: Flowchart visualizing the record mode.

The agent can be in either Record or Playback mode. During recording, a human is in control of the agent from the agent’s first-person perspective using a mouse and keyboard. The record mapping was direct, meaning that the exact states/actions were recorded in the dataset. Data was recorded when the direction vector of the agent changed and the distance between the agent’s current position and the last recorded position was bigger than a set threshold. The policy that an IL algorithm is meant to learn maps a state x to an action u. Adopting this terminology, one record of data was structured as a state. Several states make up an action. A state consists of two parts. The first part is the agent’s position, rotation and direction (i.e. the agent’s forward vector), called the pose state. The state representation is thus continuous. The pose state also contains the time passed between the previous state and the current state. The second part is a feature vector of floats, corresponding to a representation of the environment at the current pose state. This second part is called the sensor state. How the sensor state is created is explained in further detail in the section Avoiding static obstacles. Karpov et al. [8] similarly use sequences of states for representing the stored human traces, separating them into a pose state and an event state. The data is stored by writing all recorded states as binary data to a file. When more data is recorded, the data is appended to the existing file. Figure 3.2 shows one environment, or scene, used during development at an early stage of the implementation process. The aim here was to play back recorded data by having the agent move to the closest position in the recorded data.

Figure 3.2: The scene. Recorded trajectory data in black and the agent’s trajectory in blue.
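A minimal sketch of how a recorded state and an action could be structured, following the pose state / sensor state split described above; the field names and types are illustrative assumptions rather than the implemented Unity types:

    from dataclasses import dataclass
    from typing import List, Tuple

    Vector3 = Tuple[float, float, float]

    @dataclass
    class PoseState:
        position: Vector3
        rotation: Tuple[float, float, float, float]  # quaternion (x, y, z, w)
        direction: Vector3                           # the agent's forward vector
        delta_time: float                            # time since the previous state

    @dataclass
    class State:
        pose: PoseState
        features: List[float]  # sensor state: feature vector describing the environment

    @dataclass
    class Action:
        states: List[State]    # an action is a sequence of recorded states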

3.3.3 Playing back movement

During Playback the agent moves on its own by executing actions. Executing an action means moving from one recorded pose state to the next, interpolating between states to achieve a position and rotation that approximate the recorded data. This interpolation/approximation is a form of embodiment mapping, as the agent maps the recorded data into movement. The embodiment mapping was therefore non-direct, meaning that the recorded states/actions were not exactly those that the agent would execute. To find an action to execute, the agent’s sensor state is classified using a NN algorithm. The algorithm returns the nearest recorded action to the agent’s current sensor state. This action is then applied relative to the agent’s current pose state so that the action’s first state is the same as the agent’s current rotation. To create smooth rotation between states the following was done: Suppose that the agent is at the first state a where it has the correct rotation r1, and the next pose state is b containing rotation r2. When moving from a to b, the rotation of the agent is set to be the value of the interpolation between r1 and r2 by the distance traveled from a to b. Upon reaching b the rotation is therefore r2. Slight errors in the imitation occur here, since the human most likely did not rotate at a constant speed when demonstrating. However, making the distance between states short made it hard to tell a difference when observing the agent. When the agent has finished executing an action, meaning it has reached the final pose state position of an action, the process is repeated by classifying the sensor state again.
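A hedged sketch of the rotation interpolation described above: the rotation is interpolated by the fraction of the distance traveled from state a to state b. For simplicity the sketch interpolates a single angle; the actual implementation presumably interpolates full 3D rotations (quaternions):

    # Interpolate the agent's rotation by distance traveled between two recorded
    # states a and b. Angles are in degrees around the vertical axis.
    def lerp_angle(r1: float, r2: float, t: float) -> float:
        # shortest-path interpolation between two angles
        diff = (r2 - r1 + 180.0) % 360.0 - 180.0
        return r1 + diff * t

    def rotation_along_segment(dist_traveled: float, segment_length: float,
                               r1: float, r2: float) -> float:
        t = 0.0 if segment_length <= 0.0 else min(dist_traveled / segment_length, 1.0)
        return lerp_angle(r1, r2, t)

    print(rotation_along_segment(0.5, 2.0, 10.0, 90.0))  # a quarter of the way: 30.0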

3.3.4 Policy

A policy is a function that maps a state to an action. The NN algorithm receives a state as input, efficiently finds the best action with the KD-tree data structure and returns it. Thus the NN algorithm with the KD-tree can be said to be the policy.

3.3.5 Feature extraction

An IL algorithm learns a policy that imitates the teacher, with the purpose of generalizing to unseen data. In order to generalize, the agent had to sense its environment and represent it in a way which allows for recognizing similar states. The feature extraction process uses sensors on the teacher to sense the environment and represents it as a vector of floats, called the feature vector or simply the features. When recording a state or classifying a state, the sensor state is created by extracting features for the agent’s current pose state.

3.3.6 Avoiding static obstacles

In many games, a desirable skill for an agent to have is to be able to avoid obstacles, so-called obstacle avoidance. In order to be able to avoid static (non-moving) obstacles, such as walls, sensors similar to the ones used by the authors of [8] in [23] were implemented. They show a figure similar to Figure 3.3a which represents the sensors they use on their Quake III bot. Their motivation was that there are more sensors near the front so that the agent can better distinguish locations in front of it.


Figure 3.3: Sensors similar to those used by Schrum et al. [23] (a) were added to the agent (b).

The feature extractor creates the sensor state by ray casting in all sensor directions using Unity’s function Physics.Raycast. The function returns information about what was hit, including the distance to the hit obstacle/collider. This results in a feature vector v containing the distances x1, ..., x6 to obstacles in the different directions. Figure 3.4 shows how data could be recorded in one environment (Figure 3.4a) and played back in another (Figure 3.4b), thus showing that the approach generalizes to new environments.


Figure 3.4: Recorded traces in black, the chosen action in green, the chosen action applied to the agent in blue and sensors in white. 3.3. IMPLEMENTATION 19

Figure 3.4b shows the agent in the top right corner. When classifying its state, it is determined that an action should be chosen as if the agent currently was in the lower left corner (the action is highlighted in green). This makes sense, as it is a similar situation. If there is recorded data which corresponds to the agent’s current state, the agent will be able to avoid obstacles by executing the nearest neighbour action. However, that may not always be the case, as there probably will not be recorded data for every possible state. Therefore the NN algorithm checks if actions go through a static obstacle, and if so does not consider them near neighbours, and they will not be chosen.
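A sketch of the static-obstacle feature extraction and the validity check described in this section, assuming a generic raycast(origin, direction, max_dist) helper in place of Unity's Physics.Raycast; the sensor angles and the obstacle test are illustrative:

    import math

    # Cast rays in a few directions and use the hit distances as the feature vector.
    SENSOR_ANGLES_DEG = [-90.0, -45.0, -15.0, 15.0, 45.0, 90.0]  # denser near the front
    MAX_SENSOR_DIST = 20.0

    def extract_features(position, forward_angle_deg, raycast):
        features = []
        for offset in SENSOR_ANGLES_DEG:
            angle = math.radians(forward_angle_deg + offset)
            direction = (math.cos(angle), math.sin(angle))
            features.append(raycast(position, direction, MAX_SENSOR_DIST))
        return features

    def action_is_valid(action_positions, goes_through_obstacle):
        # An action is only considered a near neighbour if no segment of it
        # passes through a static obstacle.
        return not any(goes_through_obstacle(a, b)
                       for a, b in zip(action_positions, action_positions[1:]))

    # Hypothetical usage with a stub raycast that never hits anything:
    print(extract_features((0.0, 0.0), 0.0, lambda o, d, m: m))  # six readings of 20.0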

3.3.7 Avoiding dynamic obstacles

Another common task for game AI is to be able to avoid moving (dynamic) obstacles. A new feature extractor was created which sensed the environment in a different way. The area within a certain radius around the agent was sensed with the purpose of sensing moving obstacles, visualized in Figure 3.5a. To be able to recognize a state correctly, it was necessary to be able to differentiate between obstacles moving in different directions. For example, if an obstacle is close and headed straight towards the agent, the agent should probably dodge the obstacle somehow. If the obstacle is headed away from the agent however, no particular action needs to be taken. Intuitively, when an agent should avoid an obstacle, it would be important to know:

• How close is the obstacle to the agent?

• Is the obstacle moving towards or away from the agent?

• Will the obstacle hit the agent if the agent does not move?

What is important is to be able to distinguish one state from another. The resulting extractor extracts three features per moving obstacle within the sensor. This is described in Algorithm 1 and visualized in Figure 3.5b. The features are explained below.

Algorithm 1 Dynamic obstacle extractor

function ExtractFeatures(agent)
    Sort obstacles in sensor by distance
    for each moving obstacle obstacle at index i in sensor do
        velocitySimilarity ← dot(agent.velocity, obstacle.velocity)
        sqrDist ← sqrDist(agent, obstacle)
        diffVector ← obstacle.position - agent.position
        velPosSimilarity ← dot(diffVector, agent.velocity)

        features[3 * i] ← velocitySimilarity
        features[3 * i + 1] ← sqrDist
        features[3 * i + 2] ← velPosSimilarity


Figure 3.5: The new sensor (a) and visualization of the vectors used in calculating features for the dynamic obstacle extractor (b).

The velocitySimilarity is the dot product of the agent velocity and the obstacle velocity. It will tell whether an obstacle is heading in the same direction as the agent or not. velPosSimilarity is the dot product between the diffVector and the agent’s velocity. This value says whether the obstacle lies in the agent’s current path or not. If the vectors are normalized and this value is 1, it means that the two vectors are in the same direction. This means that the agent is headed straight towards the obstacle. sqrDist could act as a weight for how crucial the situation is. The proposed approach is by no means the correct or the best solution. Different approaches similar to the above were tried, but these values were able to distinguish the agent’s state the best out of the tried values. Using this with recorded data containing around 100 actions demonstrating how to avoid a single obstacle, the agent was able to avoid a single obstacle efficiently. Attempts were also made with more obstacles at the same time. In many situations, the agent would avoid obstacles well, but in some it would not. In theory, like with static obstacle avoidance, if there is data for every situation, the feature extractor separates different situations well and the quality of the data is good, then the agent should be able to always avoid obstacles. Good data is meant in the sense of the current goal behaviour. If the goal behaviour is obstacle avoidance, the data is good if the recorded human performed good/avoiding actions and did not walk into an obstacle while recording.


Figure 3.6: The agent avoiding an obstacle (blue square) moving in the opposite direction. The blue curve is the agent’s chosen action trajectory that it chose at t = 1 when it sensed the obstacle. At t = 2 the agent has moved further along the trajectory and the obstacle has moved further to the right.
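As a small worked example (illustrative 2D values, not thesis data) of the three features from Algorithm 1, consider an obstacle straight ahead of the agent and moving towards it:

    # Worked example of the dynamic-obstacle features for one obstacle.
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]

    agent_pos, agent_vel = (0.0, 0.0), (1.0, 0.0)          # agent moving along +x
    obstacle_pos, obstacle_vel = (5.0, 0.0), (-1.0, 0.0)   # obstacle ahead, moving towards the agent

    velocity_similarity = dot(agent_vel, obstacle_vel)     # -1.0: opposite directions
    sqr_dist = (obstacle_pos[0] - agent_pos[0]) ** 2 + (obstacle_pos[1] - agent_pos[1]) ** 2  # 25.0
    diff_vector = (obstacle_pos[0] - agent_pos[0], obstacle_pos[1] - agent_pos[1])
    vel_pos_similarity = dot(diff_vector, agent_vel)       # 5.0 (positive): the obstacle lies in the agent's path

    print(velocity_similarity, sqr_dist, vel_pos_similarity)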

3.3.8 KD-tree

It was decided to implement a data structure to make the NN algorithm more efficient. Karpov et al. [8] use a KD-tree as one of their approaches to efficiently retrieve recorded data. KD-tree is a common approach to making NN algorithms more efficient. Weber et al. [29] showed that if a nearest neighbour approach is used in a space with more than about ten dimensions, it is better to use a naive exhaustive search. The reason is that the work of partitioning the space becomes more expensive than the similarity measure. The number of features was six (distance to walls in six directions), which is less than ten, so a KD-tree should speed up the NN algorithm. A KD-tree is a space-partitioning data structure for organizing points in k-dimensional space. During construction, as one moves down the tree, one cycles through the axes used to select the splitting planes that divide the space. In the case of a two-dimensional space, this could be the x and y coordinates (Figure 3.7). Points are inserted by selecting the median point from the list of points being inserted, with respect to the coordinates in the axis being used.

Figure 3.7: The points (7, 2), (5, 4), (2, 3), (4, 7), (9, 6), (8, 1), (2, 7) inserted in the KD-tree.

If one starts with the x axis, the points would be divided into the median point with respect to the x coordinate and two sets: the points with an x coordinate less than the median and the points with an x coordinate bigger than the median. Then, recursively the two sets do the same thing, cycling on to the next axis (y). This would correspond to cycling through the features representing distances to walls in different directions. Algorithm 2 describes the construction of the KD-tree.

Algorithm 2 Construction of the KD-tree

function BuildTree(actions, depth = 0)
    dimensions ← numFeatures(actions)
    axis ← depth % dimensions
    sort(actions) by comparing feature[axis] for actions
    median ← median element in sorted actions
    if median is the only element then
        return TreeNode(median, null, null, axis)
    a ← actionsBeforeMedian
    b ← actionsAfterMedian
    return TreeNode(median, BuildTree(a, depth + 1), BuildTree(b, depth + 1), axis)

The nearest neighbour algorithm using the KD-tree is described in Algorithm 3. The search time is on average O(log n).

Algorithm 3 The Nearest Neighbour algorithm

function NN(node, inputState, ref nearestNeighbour, ref nearestDist)
    if node is null then
        return
    searchPointAxisValue ← inputState[node.axis]
    dist ← ∞
    nodeAxisValue, index ← 0

    // Determine how near current action is to input
    for state s at index i in node.action do
        if dist(inputState, s) < dist then
            dist ← dist(inputState, s)
            nodeAxisValue ← node.action.state(i)[node.axis]
            index ← i
    if node.leftChild is null && node.rightChild is null then
        return

    // Applying the action on the current state
    appliedAction ← applyActionOnState(inputState, node.action)

    // Let calling model weigh action (it may i.e. go through an obstacle)
    weight ← weighAction(callingModel, appliedAction)
    dist ← weight

    // Determine the nearest side to search first
    nearestSide, furthestSide ← null
    if searchPointAxisValue < nodeAxisValue then
        nearestSide ← node.leftChild
        furthestSide ← node.rightChild
    else
        nearestSide ← node.rightChild
        furthestSide ← node.leftChild
    NN(nearestSide, inputState, nearestNeighbour, nearestDist)
    if dist < nearestDist then
        // Update nearest neighbour as recursion unwinds
        nearestNeighbour ← node.action
        nearestDist ← dist

    // Check if it is worth searching on the other side
    nearestAxisValue ← nearestNeighbour.state(index)[node.axis]
    splittingPlaneDist ← dist(inputState, splittingPlane)
    nearestNeighbourDist ← dist(inputState, nearestNeighbour)
    if splittingPlaneDist < nearestNeighbourDist then
        NN(furthestSide, inputState, nearestNeighbour, nearestDist)

Following is a short and slightly simplified explanation of the algorithm. An extended description of how the algorithm works can for example be found in the Wikipedia article2. The algorithm recursively moves down the tree, starting from the root. When it reaches a leaf, that leaf is set as the current best. As the recursion unwinds, each node compares its distance to the input to the current best. If the distance is smaller than the current best, then the node is set to the current best. It also checks whether it is possible that a nearer neighbour can be on the other side of a node. If the distance between the current best node and the input search point is bigger than the distance from the input search point to the current node, then there might be a nearer neighbour on the other side of the current node, so that side is searched. When the search reaches the root node, the search is done.
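A compact, runnable sketch of the KD-tree construction and nearest-neighbour search described above, using plain k-dimensional points instead of actions and omitting the action weighting; it is an illustration, not the thesis code:

    # Minimal KD-tree with nearest-neighbour search (mirrors Algorithms 2 and 3
    # for plain points, without the action weighting).
    class Node:
        def __init__(self, point, left, right, axis):
            self.point, self.left, self.right, self.axis = point, left, right, axis

    def build_tree(points, depth=0):
        if not points:
            return None
        axis = depth % len(points[0])                  # cycle through the axes
        points = sorted(points, key=lambda p: p[axis])
        median = len(points) // 2
        return Node(points[median],
                    build_tree(points[:median], depth + 1),
                    build_tree(points[median + 1:], depth + 1),
                    axis)

    def sqr_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def nearest(node, query, best=None):
        if node is None:
            return best
        if best is None or sqr_dist(query, node.point) < sqr_dist(query, best):
            best = node.point
        # Search the side of the splitting plane that the query falls on first
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        best = nearest(near, query, best)
        # Only search the far side if the splitting plane is closer than the best so far
        if diff ** 2 < sqr_dist(query, best):
            best = nearest(far, query, best)
        return best

    tree = build_tree([(7, 2), (5, 4), (2, 3), (4, 7), (9, 6), (8, 1), (2, 7)])
    print(nearest(tree, (9, 2)))  # -> (8, 1)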

3.3.9 Discretizing the environment

In games, it is desirable to be able to tell an AI to go to a position. This diverges from the Imitation Learning approach, as the sensor state is not used to decide what action to execute. Instead an external input says what position to go to. It was decided to implement it anyway, for the sake of practical usability. One could argue that the agent still moves in a human-like fashion, as it executes actions the same way the actions were recorded, and the only way for the agent to move is by executing actions. A first approach to making the agent go to a goal was to weigh the actions by how close an action would take the agent to the goal. This worked to some extent, but the agent did not register where it had been or if it walked into a dead end. This resulted in it sometimes walking around in the same area for a long time, without realising that it did not get closer to the goal. The phenomenon is shown in Figure 3.8. It was therefore concluded that some sort of path finding was needed and that it would help to be able to say if a position on a map was good or bad, or close to the goal or not.

2https://en.wikipedia.org/wiki/K-d_tree/

Figure 3.8: Problem with getting stuck. The blue lines show traces of the agent trying to get to the white goal.

Priesterjahn et al. [20] used a grid to represent a state in their Neuroevolution approach. Inspired by them, the map was discretized into a grid of cells where each cell had a score which represented the distance from the cell to the goal. Actions were then weighted by the score of the cell that the action ended up in. A lower score means closer to the goal (greener in Figure 3.9a). As the agent moved around the map, the scores of the nine cells adjacent to the agent were increased, thus decreasing the chance of picking an action which ended up in one of those cells again. Spending time in a corner would result in those cells getting a higher score, which would lead to the agent not going there again. This is visualized in Figure 3.9.

Figure 3.9: As cells are visited, their scores are increased. (a) The grid. (b) t = 1. (c) t = 2. (d) t = 3.

This approach solved the problem of the agent getting stuck in corners or close to the goal but on the wrong side of a wall. This was however more of an exploring approach, which could be used if the agent does not know where the goal is. Unless the agent is meant to be blind, this strategy would need to be improved by scoring cells which the agent can see. Telling the agent to go to a position means that the agent knows where the goal is. Therefore a better path finding strategy was implemented. Using the classic A* algorithm3, the grid would calculate the shortest path from the agent to the goal, and score each cell the shortest path touches with its path distance to the goal. Other cells were scored with a bad score. This is visualized in Figure 3.10. The grid is the tool a programmer/user would use to influence what the NN algorithm should consider a good action to be. In the NN algorithm, actions are weighted according to the cell score at the action’s last pose state position.

3https://en.wikipedia.org/wiki/A*_search_algorithm/

Figure 3.10: Cells that touch the A* path from the agent to the goal are scored with a low score (green).
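A sketch of how actions could be weighted by the grid, assuming a precomputed cell_score lookup (for example filled in from the A* path distances) and the NN distance as the base similarity; all names and values are illustrative:

    # Weight candidate actions by the grid cell score at the action's final
    # position (lower score = closer to the goal along the A* path).
    CELL_SIZE = 1.0

    def cell_of(position):
        x, z = position
        return (int(x // CELL_SIZE), int(z // CELL_SIZE))

    def weigh_action(similarity_dist, action_end_position, cell_score):
        # Unknown cells get a large penalty so they are rarely chosen.
        score = cell_score.get(cell_of(action_end_position), 1000.0)
        return similarity_dist + score

    # Hypothetical usage: pick the candidate with the lowest combined weight.
    cell_score = {(0, 0): 5.0, (1, 0): 4.0, (2, 0): 3.0}
    candidates = [(0.8, (1.5, 0.5)), (0.6, (0.5, 0.5))]  # (NN distance, end position)
    best = min(candidates, key=lambda c: weigh_action(c[0], c[1], cell_score))
    print(best)  # the action ending in the better-scored cell wins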

3.3.10 Additional details

The length of an action could be chosen, which would split up the recorded data into actions of the given length. States in an action were recorded in sequence after each other, so while executing an action, the agent moves like the human who recorded the data did. Choosing a big action length would result in long actions, and thus longer continuous segments of the agent behaving human-like. The downside of long actions is that they might not be able to get the agent out of certain situations without hitting an obstacle. They may also take the agent to worse locations. If there is no recorded data similar to the agent’s current state, the returned action probably does not suit the situation well. A longer action would then result in a bigger bad investment, whereas a shorter action would be able to re-classify the state sooner and hopefully get a better suiting action. Short actions however would result in shorter continuous segments of the agent behaving human-like. They would also require the state to be classified more often, which has an impact on the performance. Classifying often, however, increases the chance of choosing a correct action for the situation. An action length that was somewhere in between long and short was chosen at first. Later, support for splitting up the data into several action lengths at the same time was implemented. This would help by making long actions available for areas without obstacles and short actions available for trickier situations.

In practice, for an AI to be useful in a game, it should be possible to define different types of behaviour and be able to switch between them depending on the situation. The implementation was structured to allow for several types of actions and models, resulting in a loop described in Algorithm 4. Data was recorded separately for each behaviour.

Algorithm 4 The agent loop
 1: function Update
 2:   if recording then
 3:     // Recording
 4:     features ← featureExtractor.ExtractFeatures(agent)
 5:     recorder.Record(agent, features)
 6:   else
 7:     // Playback
 8:     if action is done executing or was aborted then
 9:       features ← featureExtractor.ExtractFeatures(agent)
10:       action ← model.Classify(agent, features)
11:     else
12:       action.Execute(agent, destination)

The agent used a controller for deciding which feature extractor to use. When a dynamic obstacle would come within a certain distance, the agent would switch to the feature extractor for dynamic obstacle avoidance with the corresponding recorded actions. Otherwise it would use the static obstacle avoidance model.
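The switch between the two behaviours can be expressed as a simple per-frame check. A sketch follows, where the switch distance is an illustrative value; the thesis only states that a "certain distance" triggers the switch.

import math

def select_model(agent_pos, dynamic_obstacle_positions,
                 dynamic_model, static_model, switch_distance=5.0):
    # If any dynamic obstacle is within the switch distance, use the
    # dynamic-obstacle-avoidance feature extractor and its recorded actions.
    for obstacle_pos in dynamic_obstacle_positions:
        if math.dist(agent_pos, obstacle_pos) < switch_distance:
            return dynamic_model
    # Otherwise fall back to the static obstacle avoidance model.
    return static_model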

3.3.11 Storing data

The recorded data was stored as raw binary data. A file containing data for 1000 recorded actions of 50 states each, corresponding to about 25 minutes of recording, has a size of approximately 3 MB. The data stored per state (pose + sensor state) is described in Table 3.1.

Pose state
  Vector3 position       float posx, float posy, float posz
  Quaternion4 rotation   float rotx, float roty, float rotz, float rotw
  Vector3 direction      float dirx, float diry, float dirz
  Delta time5            float time

Sensor state
  Feature vector         float n0, ..., float nnumfeatures

Table 3.1: The data stored for one state.

4https://en.wikipedia.org/wiki/Quaternion
5The time between the previous state and this state.
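As a sanity check of the reported file size, the layout in Table 3.1 can be written out with plain 32-bit float packing. The sketch below assumes a feature vector of four features; that number is an assumption made only to reproduce the roughly 3 MB figure and is not stated by the thesis.

import struct

def pack_state(pos, rot, direction, delta_time, features):
    # Pack one pose + sensor state as consecutive little-endian 32-bit floats:
    # 3 (position) + 4 (rotation) + 3 (direction) + 1 (delta time) + len(features).
    values = list(pos) + list(rot) + list(direction) + [delta_time] + list(features)
    return struct.pack("<%df" % len(values), *values)

# With 4 features: (3 + 4 + 3 + 1 + 4) floats * 4 bytes = 60 bytes per state,
# and 1000 actions * 50 states * 60 bytes = 3,000,000 bytes, i.e. about 3 MB.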

3.3.12 Optimization and measuring performance

For usage in a proper game, the computational time of the AI should be as low as possible. The bottleneck was applying an action to the agent's current state in the NN algorithm, since each traversed action was checked for whether it would go through an obstacle if applied to the agent's current state. This was improved by approximating the check: instead of checking for collision between every pair of consecutive states in an action, collision was only checked between the first state and the middle state, and between the middle state and the last state of the action. To ensure the agent did not get stuck by picking an invalid action, it was forced to update its current action at a certain time interval.

The performance of the imitation agent was measured by measuring the average computational time per game frame for different amounts of data: 100, 200, 500 and 1000 recorded actions with an action length of 50. 1000 recorded actions correspond to about 25 minutes of recording.
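The approximated obstacle check can be sketched as follows. The line_blocked callable stands in for whatever obstacle query the engine provides (in Unity this would typically be a physics line cast), and offset assumes the action's states are applied relative to the agent's current position; both are assumptions made for illustration.

def offset(origin, relative_pos):
    # Translate a recorded (relative) state position to the agent's current frame.
    return tuple(o + r for o, r in zip(origin, relative_pos))

def action_collides(action_state_positions, agent_pos, line_blocked):
    # Approximate collision test for an action applied at the agent's current
    # position: instead of testing every pair of consecutive states, only test
    # first -> middle and middle -> last.
    first = offset(agent_pos, action_state_positions[0])
    middle = offset(agent_pos, action_state_positions[len(action_state_positions) // 2])
    last = offset(agent_pos, action_state_positions[-1])
    return line_blocked(first, middle) or line_blocked(middle, last)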

3.4 Overall implementation

The framework allows a user to create an agent which imitates demonstrated movement behaviour. To create a behaviour, a user creates a feature extractor which defines which environmental features should be classified. The user then chooses when the behaviour should be activated, collects data for the behaviour by recording their own play, and finally plays the behaviour back. An agent can possess several behaviours at once, and it is up to the user to define when each behaviour should be activated. This chapter described how IL can be used to create an agent that imitates human demonstrations using a direct imitation approach and limited amounts of data. In the next chapter, the evaluation of the imitation agent is described.

Chapter 4

Evaluation

This chapter presents the user study that was conducted in order to answer the project’s stated questions. The results of the study are presented thereafter along with a performance measure of the imitation agent. Following that is a discussion section which presents and discusses what was done in the project, what the study found to be important in looking human-like and the performance of the imitation agent in relation to games. Finally some ethical aspects are discussed.

4.1 User study

Recall that the objective of the project (see Section 1.2) is to answer the following:

– Q1.1: How to create an agent that imitates demonstrated behaviour, using IL with limited amounts of data?

– Q1.2: What determines if a character is human-like, when observed through the character’s first-person perspective?

A user study was conducted in order to answer Q1.2, and to contribute to the answer to Q1.1 by asking humans how well the imitation agent imitates the demonstrations. The method chapter describes how IL can be used to create behaviour by imitating recorded human behaviour, but it does not evaluate whether that behaviour is human-like or not. The user study aimed to evaluate the human-likeness of the agent and to evaluate in a qualitative manner how well the agent imitates the recorded human. As a reminder, an agent is said to be human-like if it looks like it is being controlled by a human. The layout of the study was inspired by [27], which as presented in the background chapter describes a simplification of a Turing test approach. It was also inspired by [9], which gave users statements to agree or disagree with.


4.1.1 The set-up

The study consisted of videos of three different character controllers: the imitation agent, a human and Unity's built-in NavMeshAgent. These controllers will be labeled Imitation Controller (IC), Human Controller (HC) and NavMesh Controller (NC) respectively. The human provided the demonstrations for the imitation agent to imitate. The NC was intended to act as a sanity check: a person with a lot of gaming experience would easily be able to tell that the NC was not being controlled by a human, as it moves very statically, makes no unexpected movements and turns with a set speed. Three different settings were set up:

Setting 1

– A simple environment like the one used during development (Figure 4.1). When the character reaches the goal, the goal gets randomly positioned somewhere on the map.

Figure 4.1: Setting 1.

Setting 2

– An even simpler environment but with a single moving obstacle (Figure 4.2).

Figure 4.2: Setting 2 with a moving obstacle (blue) and the goal (white).

Setting 3

– Same concept as Setting 1, but with a different map (Figure 4.3). Here, the goal positions were deterministic, meaning that when the character reaches the goal, the goal gets positioned at the next index in the goal positions list. This means that all characters take the same path.

Figure 4.3: Setting 3 from a top-down view (a) with the corresponding first-person perspective (b).

One video was recorded for each of the settings and for each character controller, resulting in a total of nine videos. The videos were recordings of the controllers moving around in the three different settings, from a first-person perspective (Figure 4.3b). In most games, a player would observe an NPC from a third-person perspective. Using a third-person perspective requires the observed character to be modeled and potentially animated. Whether a user wants it to or not, these things will most likely affect the user's thoughts on how the character should behave. It is also more difficult to spot detailed movement and rotation from a third-person perspective. In first-person perspective, however, a user does not need to know or see what the character looks like, and it is easier to register the character's exact movement and rotation. Most importantly, it is easier to spot differences between different controllers. Figure 4.4 illustrates one trajectory of the IC from a top-down view. This trajectory does not correspond to the one it took in the video in the study.

Figure 4.4: The (black) trajectory of the IC in the third setting. Visited goals in red, current goal in white.

4.1.2 Participants

The user study had 32 participants of varying age and with varying gaming and AI experience. The majority were between 20 and 40 years old, with high gaming experience and a moderately high understanding of what game AI is. 62.5% considered themselves to have a lot of gaming experience, and 46% considered themselves to have a lot of experience with how AI-controlled characters move in games.

4.1.3 Stimuli

The users watched nine 25-second video clips of three different controllers in three different settings, from a first-person perspective. The IC used data consisting of about 150 recorded actions, which corresponds to a couple of minutes of recording. The data was recorded by the author of the project. The clips were shown in a Latin square order1.

1https://en.wikipedia.org/wiki/Latin_square/

4.1.4 Procedure

In the first part, the aim was to understand which factors determine whether the controller looks like it is being controlled by a human or not, and thus to answer Q1.2. In this part, the users were not told which controller they were watching. They were told the following about the characters in the video clips:

– It can either be controlled by AI or by a human.

– There is no requirement of getting to the goal as fast as possible or taking the shortest path.

After each video clip, the users agreed or disagreed with six different statements. The statements were presented on five-point Likert scales2, shown in Table 4.1. The users were asked if anything seemed unclear.

Statement                                    Response (scale 1-5)
Its movement is human-like                   Disagree completely / Disagree / Neutral / Agree / Agree completely
It rotates in a human-like fashion           Disagree completely / Disagree / Neutral / Agree / Agree completely
It looks around in a human-like fashion      Disagree completely / Disagree / Neutral / Agree / Agree completely
Its pathing is human-like                    Disagree completely / Disagree / Neutral / Agree / Agree completely
It avoids walls in a human-like fashion      Disagree completely / Disagree / Neutral / Agree / Agree completely
Overall, the behaviour of the agent is       Human-like / Artificial

Table 4.1: The questionnaire to be filled in by the participants.

The statements will be labeled MOVE, ROTATE, LOOKS, PATH and WALLS respectively. Movement means the forwards, backwards and sideways movement typically performed with the WASD keys on a keyboard. Rotation is done using a mouse. Looking around means that a character could, for example, look up at the sky while walking, or quickly turn to look at a wall behind it. Pathing means the path a character takes from one point to another; the NC, for example, always takes the fastest path.

In the second part, the aim was to determine how well the IC imitated the human that had recorded it, and thus to contribute to the answer to Q1.1. The users were told which character was controlled by which controller. They were then shown the video clips of the HC and the IC and agreed or disagreed with statements similar to those in the first part, shown in Table 4.2, where "it" refers to the IC and "the trainer" refers to the HC.

2https://en.wikipedia.org/wiki/Likert_scale/

Statement                                                   Response (scale 1-5)
Its movement is like its trainers                           Disagree completely / Disagree / Neutral / Agree / Agree completely
It rotates like its trainer                                 Disagree completely / Disagree / Neutral / Agree / Agree completely
It looks around like its trainer                            Disagree completely / Disagree / Neutral / Agree / Agree completely
Its pathing is like its trainers                            Disagree completely / Disagree / Neutral / Agree / Agree completely
It avoids walls like its trainer                            Disagree completely / Disagree / Neutral / Agree / Agree completely
Overall, the behaviour of the agent is like its trainers    Disagree / Agree

Table 4.2: The questionnaire in the comparison.

4.1.5 Hypothesis

The NC makes no effort to look human-like. It takes the shortest path and makes no unnecessary movements or rotations. Therefore it was believed that it would be rated as not looking human-like. The HC was an actual human using a mouse and keyboard, which causes jitter in the rotation. Gorman et al. [6] found that simple clues such as standing and waiting were used to guess human players. The HC and the IC are more likely than the NC to show such clues. The transition from one of the human's movement actions to the next is seamless, and actions come in a natural sequence after each other since a human is in control. This is in contrast to the IC, which picks an action according to its current situation; the IC does, however, imitate the human while executing an action. With this reasoning it was believed that the HC would be rated as human-like and that the IC would be rated somewhere in between the NC and the HC.

4.2 Results

This section presents the results of the user study and a brief performance measurement of the imitation agent. The user study results are averaged over all settings for each question. The standard deviation describes how the number of votes differed between the different settings. Figure 4.5 shows the results of the final question, whether the behaviour of the characters is human-like or artificial. The chart is split up into all people, people who considered themselves to be experts and people who considered themselves to be non-experts. These three groups consisted of 32, 20 and 8 people respectively.

4.2.1 User study

Figure 4.5: "Overall, the behaviour of the agent is human-like" — average human-likeness ratings (percentage) for the IC, NC and HC from the three groups (All, Experts, Non-experts). Experts are people who rated themselves as having a lot of gaming experience, 4 and 5 on the scale 1-5. Non-experts are people who rated themselves as having very little gaming experience, 1 and 2 on the scale 1-5. The vertical bars are the standard deviation, which describes how the number of votes differed between the different settings.

The results show that, averaged over the three different settings, 48% of the people who participated in the user study found the IC to be human-like, 73% found the HC to be human-like and 16.5% thought that the NC was human-like (see Figure 4.5). Of the experts, 68.3% thought that the HC was human-like, with a standard deviation of 5.7. Examples of user comments on the HC are "Good mix of rotation/strafing" and "Felt like it was me playing the game". On the IC, users commented things like "Stared at the floor for some reason, but humans do that", "Some things made it feel like human, some like artificial. Overall perception is that it was more human than artificial" and "Hard to say either. Looked like an inexperienced player but could also be an AI...". Gorman et al. [6] presented similar comments on their imitation agent, like "fires gun for no reason, so must be human" and "stand and wait, AI wouldn't do this". 40% of the experts found the IC to be human-like, with a rather high standard deviation of 20.0, which perhaps reflects the uncertainty in the comments. Some comments on the NC were "Too efficient", "Constant rotation" and "It feels too precise and robot-like". 10% of the experts found it to be human-like, with a standard deviation of 5.0.

Figure 4.6: "Its pathing is human-like" — vote distribution over the five-point scale (1 = disagree completely, 5 = agree completely) for the IC, NC and HC. The standard deviation describes how the votes varied for the different settings.

In the detailed questions, the IC achieved its highest human-likeness score on its pathing (PATH in Figure 4.7), i.e. the path it takes from one point in the environment to another, where 51% (4 and 5 in Figure 4.6) thought it to be human-like.

Figure 4.7: Mean points (mean Likert response, 1-5) for the three different controllers on the statements MOVE, ROTATE, LOOKS, PATH and WALLS, summed over the three settings. The standard deviation shows how the answers differed from the mean value.

Figure 4.7 shows the results of the detailed questions. The variations in the scores were all similar to PATH, with slight differences. Figure 4.6 illustrates why the standard deviation in this figure is quite high.

Figure 4.8: "Comparison: it behaves like its trainer" — vote distribution over the five-point scale (1 = disagree completely, 5 = agree completely) for the second part of the user study, where it was asked how well the IC imitates the demonstrating HC.

On average 75% (4 and 5 in Figure 4.8) of all people agreed that the IC behaves like the demonstrating human.

4.2.2 Imitation agent performance

The Unity engine has a built-in profiler which shows the exact amount of time each function takes each frame. That data is, however, only available during runtime and cannot easily be extracted. Upon manual inspection of the profiler over time, using 100, 200, 500 and 1000 recorded actions with an action length of 50, and using up to five agents at the same time, the computational time stayed below 0.1 ms on average. In cases where multiple agents classified their state on the same frame, the computational time could go up to 1.5 ms that frame. The experiments were made on a computer with 64-bit Windows 10, an Intel Core i5-6500 @ 3.20 GHz (4 CPUs), 8 GB of RAM and a GeForce GTX 1060 6GB.

4.3 Discussion

Research has shown Imitation Learning to be a successful technique for creating agent behaviour [26][3][15] and also human-like agent behaviour [20][19][8][18]. This project described a method for how Imitation Learning can be used to create agent behaviour using limited amounts of data. This was demonstrated through the created framework described in the method chapter. The framework allows for recording human demonstrations and playing back agent behaviour which imitates the demonstrations. A user study was conducted in order to evaluate the human-likeness of the imitation agent and in a qualitative manner determine how well it imitates the demonstrating human.

4.3.1 The imitation agent

The framework, similarly to [13], specifies the target behaviour before recording. It is heavily inspired by Karpov et al. [8], who use their Human Trace Controller to play back recorded sequences of human traces for getting unstuck. They were able to create an agent that solves navigation problems while moving in a human-like fashion and were able to fool 50% of judges in their human-likeness test. Their approach does not generalize to new environments, however, as they store positions of specific environments; when replaying a sequence they look for sequences with stored positions close to the agent's current position. This project ended up achieving something similar to [8]'s controller, but in custom, arbitrary environments with no pre-existing data. About 40% of the users in the user study believed that the imitation agent was controlled by a human.

The framework would be well suited for people who, for example, develop games in Unity and would like to quickly have an NPC with more interesting behaviour than the standard NavMeshAgent, without having to spend time and resources on programming it. The agent's Playback loop can easily be paused if the imitating behaviour is not always required.

4.3.2 The user study

4.3.2.1 Reliability of the user study results

The user study was not a big study and could be improved. Some users said that it is hard to judge whether something moves human-like, or like they would move, without having tried to control an agent or "felt" the controls themselves. The HC was used to and comfortable with the controls and the project. Perhaps the study should have let several different humans record themselves and create their own imitation agents, which they then could watch and evaluate. However, the purpose of the created framework is to be used by an expert, who knows how they want the agent to behave. In that sense the videos were recorded correctly. Munoz et al. [15] imitate a human, an ML AI and a hand-coded AI, but their goal is different: they want to create competitive behaviour by imitating three different good controllers. When measuring what is human-like and what is not in a more general fashion, though, more variation in human behaviour would probably have been better.

With the current length of the video clips, the average time to complete the study was around twenty minutes. It was decided that the study should not take longer, in order to keep a user's focus and to make more people willing to participate. Longer videos would perhaps have given better results, though, as the users would have been able to observe the characters for longer periods of time.

4.3.2.2 What is important in looking human-like?

The question of what is human-like could perhaps be partly answered by determining what is not human-like. Comments from users in the study were most consistent regarding the NC not being human-like, frequently stating that it was too efficient, rotated with a constant speed, was predictable and did nothing unexpected. The HC was rated as being human-like by 70% of the users. A guess without evidence is that this is a result of users being unsure about the IC and not wanting to be fooled, so they "play it safe" and say that it is controlled by AI.

It is also interesting to investigate what it is in the IC that makes it not look as human as the HC does. Many of the comments describe feelings, as in it felt like AI behaviour, rather than specifying exactly what is meant. The short length of the video clips may have been a reason for this, as the impression of a character becomes more of a summarized feeling of what had been observed, rather than exact details about different scenarios. Some people noted that the IC would sometimes get too close to a wall. This is a result of the agent executing an action whose last state ended up being close to a wall. When the state gets classified again, the agent will pick an action which probably makes a sharp turn or in some other way avoids the wall. A human would perhaps have seen the wall coming "several actions" away and planned a different path. Planning a longer path or trajectory is an extension that would potentially make the agent appear more human-like. Other comments similarly pointed out single events as breaking the illusion of the character being human-like, like one too-fast rotation.

Actions without a clear purpose seem to correspond to human behaviour according to the study, like looking down at the floor for no reason. The reason could be that these kinds of behaviour are not typically implemented by an AI programmer. There were also comments on strafing (sideways movement), and that doing it in the correct situations was human-like behaviour. Strafing around a corner to have vision of what is behind the corner is one such example. When using the grid with the IC to specify a destination, an action which involved strafing could sometimes be weighted as the best action, because it got the agent closer to the destination. However, a strafing motion may not have been a human-like thing to do in that situation. This is something users gave as motivation for why the IC sometimes did not look human-like. How actions are weighted is something that can be tweaked, and is a trade-off between getting the agent to a specified destination faster, thereby ignoring the "learning", and behaving similarly to the demonstration given the current state.

4.3.3 Creating non-human-like behaviour

As mentioned in the introduction, there is nothing stopping a user of this framework from creating agent behaviour that is not human-like. The framework is made to imitate the demonstrated behaviour.

4.3.4 Performance in relation to games

The performance of the IC presented in the results chapter shows that it can operate in about 0.1 ms per frame. This does, however, depend on several factors, such as what action length is chosen, how many simultaneous agents are used and how often the agent should check for moving obstacles when using the obstacle avoidance behaviour. There are probably ways of optimizing it further, but the framework created in this project is a prototype which shows that it is possible to use it in a modern game.

4.3.5 Ethical aspects

There is a debate about the increased use of computer and video games3, concerning whether the content of the games changes the behaviour and attitude of a player or not. For example, there are theories that violent games could influence aggression in players. Making agents in games more human-like is not likely to affect this. It is up to the creator of the agent to decide whether the agent should act violently or not. A more human-like agent would perhaps appear more real, though, making the violence more real. Either way, human-like agents will not be more human-like than actual human players, so if this is a problem it existed before human-like agents.

If more human-like agents in games make the games more interesting and fun to play, one could argue that the agents contribute to people's potential addiction to games, which could be considered negative from a societal or social sustainability point of view. It could also decrease the need for multiplayer support in games, since human-like agents could to some extent replace other humans, thus weakening an important social aspect of gaming. It could, however, add an extra social aspect to games that otherwise would have had none. A bigger interest in the games could also lead to better sales for the game-making companies, leading to more and better games.

3https://en.wikipedia.org/wiki/Video_game_controversies/

Chapter 5

Conclusions

This chapter presents the conclusions of the project and future work.

The method chapter describes the implementation of the imitation agent using IL and limited amounts of data. The only way for the agent to move is by executing actions. During the execution of an action, the agent imitates the recorded behaviour. Therefore, if the recorded behaviour is human-like, the agent behaviour will be human-like. Additionally, the comparison made in the user study shows that the majority of the people who participated think that the agent behaves like the human who recorded it did. IL thus proves to be a good technique for creating human-like behaviour, as previous research has stated. This project shows one way of doing it, using a Nearest Neighbour algorithm with a KD-tree as the policy that maps a state to an action. Q1.1: How to create an agent that imitates demonstrated behaviour, using IL with limited amounts of data? can thus be considered answered.

Whether a character appears to be human-like when observed through the character's first-person perspective (Q1.2: What determines if a character is human-like, when observed through the character's first-person perspective?) seems to be determined by several factors. According to the user study, human-like traits are actions without a clear purpose, such as looking down at the floor for no particular reason. Moving with correct usage of timing and motion, like strafing around corners, also appears to be important. Another factor is having varied but consistent movement. If movement is too static, with for example a constant rotation speed or a too straight and precise path, it does not look human-like. On the other hand, if the rotation speed and timing are consistent but occasionally twitch or break the pattern, it also does not look human-like. Combining the answers to Q1.1 and Q1.2 answers Q1: How can IL be used to create human-like agent behaviour, using limited amounts of data?


5.1 Future work

The question of what makes a character human-like would be better answered if the study contained more different recorded humans. Given more time and resources, it would have been interesting to do a bigger user study. As mentioned in the discussion, letting several different humans record themselves and create their own imitation agents, which they then could watch and evaluate, would be one interesting experiment. A question could then be whether the humans feel that they recognize their own play when watching the agent. It would also be interesting to dig deeper into why a person thinks that a character looks human-like or not, and to what extent a person thinks a character looks human-like because it behaves or does not behave like the person themselves would. Some of the users' comments in the user study motivated an agent looking human-like by it moving like they would have moved.

The study mostly investigates what low-level actions make a character look human-like, such as rotation and movement. One extension would be to have a more realistic and complex game scenario with a dynamic environment and more sophisticated behaviour, like in [8], such as shooting, jumping and avoiding enemies. Then a more general behaviour and the agent's decision making could be evaluated. Since the imitation agent seems to show some human-like traits, it would be interesting to investigate whether human players think that it is more entertaining to play with an agent that imitates human behaviour than with an agent that does not. Especially in VR games this could be of interest, as the player is the virtual character, as opposed to controlling the character with a mouse and keyboard.

The framework could be extended by planning paths further ahead, to prevent ending up too close to a wall, and by planning according to what is visible to the agent, in order to make it appear more human-like. The efficiency of the action selection could be improved by improving the data. Common actions could be identified and reduced into fewer actions to reduce the data set. A possible extension is also to modify demonstrations and improve their efficiency, in terms of getting the agent farther in less time. Argall et al. [1] mention eliminating parts of the teacher's executions that are suboptimal as an approach to dealing with low quality of the demonstration dataset. Chen and Zelinsky [5] present a method for identifying and eliminating noise in a demonstration, which could be a useful technique. This would, however, possibly remove some of the human-likeness and make the agent more similar to traditional AI.

It would be interesting to see if implementing actual learning, with for example a Neural Network, would yield the same results in a similar user study. If the results are similar or better with a learning approach, it would be interesting to compare the complexity of the framework and the amount of control in shaping behaviour between the two approaches (learning/not learning). Perhaps the framework created in this project could act as a good tool for prototyping agent behaviours, or for getting a working agent with human-like behaviour up and running with little effort.

5.1.1 Use outside of games

Imitation Learning has been most widely used in the area of robotics [1]. It has been discussed that since robots may one day exist and work together with humans, interaction would be facilitated if the movements of the robot are human-like and look natural [2]; Imitation Learning is one approach to achieving that. There is also an interest in the area of human crowd simulation in making the individual humans in a crowd behave in a human-like or natural fashion [10].

Bibliography

[1] Brenna D Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and autonomous systems, 57(5):469–483, 2009.

[2] Tamim Asfour, Pedram Azad, Florian Gyarfas, and Rüdiger Dillmann. Imitation learning of dual-arm manipulation tasks in humanoid robots. International Journal of Humanoid Robotics, 5(02):183–202, 2008.

[3] Luigi Cardamone, Daniele Loiacono, and Pier Luca Lanzi. Learning drivers for TORCS through imitation using supervised methods. In Computational Intelligence and Games, 2009. CIG 2009. IEEE Symposium on, pages 148–155. IEEE, 2009.

[4] Yu-Han Chang, Rajiv T Maheswaran, Tomer Levinboim, and Vasudev Rajan. Learning and evaluating human-like NPC behaviors in dynamic games. In AIIDE, 2011.

[5] Jason Chen and Alex Zelinsky. Programing by demonstration: Coping with suboptimal teaching actions. The International Journal of Robotics Research, 22(5):299–319, 2003.

[6] Bernard Gorman, Christian Thurau, Christian Bauckhage, and Mark Humphrys. Believability testing and bayesian imitation in interactive computer games. In International Conference on Simulation of Adaptive Behavior, pages 655–666. Springer, 2006.

[7] Robin Hunicke. The case for dynamic difficulty adjustment in games. In Proceedings of the 2005 ACM SIGCHI International Conference on Advances in computer entertainment technology, pages 429–433. ACM, 2005.

[8] Igor V. Karpov, Jacob Schrum, and Risto Miikkulainen. Believable Bot Navigation via Playback of Human Traces, pages 151–170. Springer Berlin Heidelberg, 2012. URL http://nn.cs.utexas.edu/?karpov:believablebots12.

[9] Geoffrey Lee, Min Luo, Fabio Zambetta, and Xiaodong Li. Learning a super mario controller from examples of human play. In Evolutionary Computation (CEC), 2014 IEEE Congress on, pages 1–8. IEEE, 2014.


[10] Alon Lerner, Yiorgos Chrysanthou, Ariel Shamir, and Daniel Cohen-Or. Data driven evaluation of crowds. In International Workshop on Motion in Games, pages 75–83. Springer, 2009.

[11] Mei Yii Lim, João Dias, Ruth Aylett, and Ana Paiva. Creating adaptive affective autonomous NPCs. Autonomous Agents and Multi-Agent Systems, 24(2):287–311, 2012.

[12] Daniel Livingstone. Turing’s test and believable AI in games. Computers in Entertainment (CIE), 4(1):6, 2006.

[13] Manish Mehta, Santiago Ontanón, Tom Amundsen, and Ashwin Ram. Authoring behaviors for games using learning from demonstration. In Proceedings of the Workshop on Case-Based Reasoning for Computer Games, 8th International Conference on Case-Based Reasoning (ICCBR 2009), L. Lamontagne and PG Calero, Eds. AAAI Press, Menlo Park, California, USA, pages 107–116, 2009.

[14] Andres Munoz. Machine learning and optimization. URL: https://www.cims.nyu.edu/~munoz/files/ml_optimization.pdf [accessed 2016-03-02] [WebCite Cache ID 6fiLfZvnG], 2014.

[15] Jorge Munoz, German Gutierrez, and Araceli Sanchis. Controller for TORCS created by imitation. In Computational Intelligence and Games, 2009. CIG 2009. IEEE Symposium on, pages 271–278. IEEE, 2009.

[16] Chrystopher L Nehaniv, Kerstin Dautenhahn, et al. The correspondence problem. Imitation in animals and artifacts, 41, 2002.

[17] Jeff Orkin. Three states and a plan: the AI of FEAR. In Game Developers Conference, volume 2006, page 4, 2006.

[18] Juan Ortega, Noor Shaker, Julian Togelius, and Georgios N Yannakakis. Imitating human playing styles in super mario bros. Entertainment Computing, 4(2):93–104, 2013.

[19] Steffen Priesterjahn. Imitation-based evolution of artificial players in modern computer games. In Proceedings of the 10th annual conference on Genetic and evolutionary computation, pages 1429–1430. ACM, 2008.

[20] Steffen Priesterjahn, Oliver Kramer, Alexander Weimer, and Andreas Goebels. Evolution of reactive rules in multi player computer games based on imitation. In International Conference on Natural Computation, pages 744–755. Springer, 2005.

[21] Joe Saunders, Chrystopher L Nehaniv, and Kerstin Dautenhahn. Teaching robots by moulding behavior and scaffolding the environment. In Proceedings of

the 1st ACM SIGCHI/SIGART conference on Human-robot interaction, pages 118–125. ACM, 2006.

[22] Ayse Pinar Saygin and Ilyas Cicekli. Pragmatics in human-computer conversations. Journal of Pragmatics, 34(3):227–258, 2002.

[23] Jacob Schrum, Igor V Karpov, and Risto Miikkulainen. Human-like combat behaviour via multiobjective neuroevolution. In Believable bots, pages 119–150. Springer, 2013.

[24] Noor Shaker, Julian Togelius, Georgios N Yannakakis, Likith Poovanna, Vinay S Ethiraj, Stefan J Johansson, Robert G Reynolds, Leonard K Heether, Tom Schumann, and Marcus Gallagher. The turing test track of the 2012 mario AI championship: entries and evaluation. In Computational Intelligence in Games (CIG), 2013 IEEE Conference on, pages 1–8. IEEE, 2013.

[25] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT Press, Cambridge, 1998.

[26] Christian Thurau, Christian Bauckhage, and Gerhard Sagerer. Imitation learning at all levels of game-AI. In Proceedings of the international conference on computer games, artificial intelligence, design and education, volume 5, 2004.

[27] Iskander Umarov and Maxim Mozgovoy. Believable and effective AI agents in virtual worlds: Current state and future perspectives. International Journal of Gaming and Computer-Mediated Simulations (IJGCMS), 4(2):37–59, 2012.

[28] Andreas Vlachos. An investigation of imitation learning algorithms for structured prediction. In EWRL, pages 143–154, 2012.

[29] Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, volume 98, pages 194–205, 1998.