University of Nevada, Reno

Neurovisual Control in the Quake II Environment

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science.

by

Matt Parker

Dr. Bobby D. Bryant, Thesis Advisor

August 2009

THE GRADUATE SCHOOL

We recommend that the thesis prepared under our supervision by

MATT PARKER

entitled

Neurovisual Control In The Quake II Environment

be accepted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Bobby D. Bryant, Advisor

Kostas E. Bekris, Committee Member

Jennifer Mahon, Graduate School Representative

Marsha H. Read, Ph. D., Associate Dean, Graduate School

August, 2009


Abstract

An enormous variety of tasks and functions can be performed by humans using only a two-dimensional visual array as input. If an artificial intelligence (AI) controller could adequately harness the great amount of data that humans readily extract from visual input, then a large number of robotics and AI problems could be solved using a single camera as input. First-person shooter computer games that provide realistic-looking graphics can be used to test visual AI controllers that can likely be used for real-life robotics. In this research, the computer game Quake II is used to test and make improvements to visual neural network controllers. Neural networks are promising for visual control because they can parse raw visual data and learn to recognize patterns. Their computational time can also be much faster than that of complex mathematical visual algorithms, which is essential for real-time applications. In the first experiment, two different retinal layouts are connected to the same type of neural network: one retina imitates a human's clear-center/blurred-periphery acuity, and the other uses uniform acuity. In the second experiment, a Lamarckian learning scheme is devised that uses a hand-coded non-visual controller to help train agents with a mixture of backpropagation and neuroevolution. Lastly, the human element is completely removed from the Lamarckian scheme by replacing the hand-coded non-visual controller with an evolved non-visual neural network. The learning techniques in this research are all successful advances in the field of visual control and can be applied beyond Quake II.

Acknowledgements

This work was supported in part by NSF EPSCoR grant EPS-0447416. Quake II is a registered trademark of Id Software, Inc., of Mesquite, Texas.

Contents

Abstract

Acknowledgements

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Overview

2 Background
  2.1 Genetic Algorithms
    2.1.1 Standard Genetic Algorithm
    2.1.2 Queue Genetic Algorithm
  2.2 Neural Networks
    2.2.1 Backpropagation
    2.2.2 Neuroevolution
  2.3 Computer Vision
    2.3.1 Hidden Markov Models and Bayesian networks
    2.3.2 Support Vector Machines
    2.3.3 Neural Networks
  2.4 First-Person-Shooters as an AI Research Platform
  2.5 Synthetic Vision vs. Raw Vision

3 The Quake II Environment
  3.1 Original Game
  3.2 Quake2AI Interface

4 Experiment 1: Graduated vs. Uniform Density Retina
  4.1 Introduction
  4.2 Experimental Setup
  4.3 Neuro-Visual Controller
  4.4 Training
  4.5 Results
  4.6 Conclusion

5 Experiment 2: Lamarckian Neuroevolution

5.1 Introduction ...... 25 5.2 Experiment Setup ...... 25 5.3 Hand-Coded Bot for Backpropagation ...... 26 5.4 Training ...... 28 5.5 Results ...... 29 5.6 Conclusion ...... 31

6 Experiment 3: Lamarckian Neuroevolution Without Human Supervision
  6.1 Introduction
  6.2 Experiment Setup
  6.3 Supervising Bots for Backpropagation
  6.4 Training
  6.5 Results
  6.6 Conclusion

7 Conclusion

List of Figures

2.1 The Queue Genetic Algorithm (QGA). New individuals are bred from parents chosen from the current population by roulette wheel selection. After the new individual is tested, it is placed on the beginning of the queue and the oldest individual is discarded.
2.2 A single-layer perceptron network. Each of the 22 inputs is connected by a weight to each of the 3 outputs.

3.1 An in-game screenshot from the game Quake II.

4.1 A screenshot of the simple room used for this experiment. The ceilings are a dark brown texture, the walls are a gray texture, and the floors are white. The enemy is dark-blue and contrasts with the light-colored walls and floors. The trail of dots indicates the bolts from the learning agent's blaster.
4.2 Left: A scene as rendered by the game engine. Right: The same scene as viewed via the graduated density retina. The black tics above and below the image indicate the retinal block widths.
4.3 A view via the uniform density retina. The enemy's location and distance are similar to the view in figure 4.2. The contrast between the enemy and the walls and floor is much less distinct in the uniform retina than in the graduated density retina, because of the increased area averaged into the visual block where it appears.
4.4 Diagram of the controller network, with 28 visual inputs, 10 recurrent hidden layer neurons, and 4 outputs.
4.5 The population fitness averaged over 24 independent runs, tested using the uniform density retina (figure 4.3) and the graduated density retina (figure 4.2). The dashed bottom line indicates the increased movement of the enemy, where 300 is the maximum possible enemy movement speed.

5.1 An in-game screenshot of the environment used in this experiment. The floor and ceilings are brown, the walls are gray, and the enemy is dark blue. The room is dimly lit with varying shadows. The display of the shooter's own weapon has also been removed. (Cf. figure 4.1.)
5.2 An in-game screenshot of an agent looking at the enemy opponent in the map. The left side of the screen shows the Quake II game screen and the right side shows what the agent actually sees through its retina.
5.3 Average of the average fitnesses of the 24 populations that used plain neuroevolution and Lamarckian neuroevolution. The top dark line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the fitness reached a certain point.

6.1 An in-game screenshot of the environment used in this experiment. The floor and ceilings are brown, the walls are gray, and the enemy is dark blue. The room is dimly lit with varying shadows. A large square pillar is placed in the center of the room. The display of the shooter's own weapon has also been removed.
6.2 An in-game screenshot of an agent looking at the enemy opponent in the map. The left side of the screen shows the Quake II game screen and the right side shows what the agent actually sees through its retina.
6.3 Average of the average fitnesses of the 25 populations that used neuroevolution only and that used Lamarckian neuroevolution. The top dark line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the fitness reached a certain point.
6.4 Average of the average fitnesses of the 25 populations of evolved non-visual controllers. The dark top line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the average fitness reached a certain point.

Chapter 1

Introduction

1.1 Motivation

Humans with normal vision use their visual system to help them live and survive in the world. Even if all other senses, such as taste, touch, hearing, and smell, are removed, a human can still accomplish a large variety of tasks. For example, a human can remotely control a military drone using only a streaming two-dimensional camera image taken from the nose of the airplane; likewise, cars or other vehicles can be controlled with just visual input. Doctors are able to operate remotely on patients using a robotic scalpel system and a camera image. Computer games usually only require that a player can view the graphical data presented on a 2-D screen, yet there are computer games and simulations for almost every interesting human action, all of which can be performed by using the visual screen to gather the game information.

Artificial intelligence systems that utilize visual input can use a single camera for input. Vehicles could be controlled entirely or in part by an AI looking through a camera [28][21]. AI agents in computer games could also be created that would see exactly what a human would see, so that the AI could not cheat by directly reading the game environment information [33]. It is the goal of this research to improve the capabilities of AI in real-time visual control to further realize the benefits of such visual AI.

1.2 Overview

In this research AI controllers are created that are able to play a computer game. Computer games are good test-beds for AI controllers because they provide cheap, robust, and well-tested simulations that are generally non-deterministic in nature [25][24]. They often allow for game customizability by changing models, maps, and game rules. Moreover, AI controllers used in games can easily provide a direct comparison to a human's ability to play the same game, often, in the case of multi-player games, by allowing the AI and the humans to directly compete. For this research the first-person shooter Quake II is used, and AI controllers are trained to hunt and kill an opponent using only visual data from the rendered game screen as input.

Imitating the abilities of the human visual system is difficult because it is very complex and processes visual data rapidly [43][15]. The visual cortex processes data in a massively parallel fashion [18], but an individual computer processor typically computes serially. Writing algorithms to decode visual data is difficult because they must attempt to do sequentially what the human cortex does seemingly instantaneously and in parallel; solving such problems in sequence is hard and computationally intensive [1]. Algorithms such as artificial neural networks are designed to imitate in part the computational processes of the human brain. Their computational abilities for certain visual tasks have been shown to be much faster than conventional techniques [47][13]. The simulator used in this research, Quake II, runs in real-time at 40 frames per second, so any visual computation algorithm that is used must be fast in order to keep up with the gameplay. Because of the speed requirements and the proven abilities of neural networks to process visual data quickly, they are used for the visual controllers in this research. The neural networks are trained with a mix of backpropagation and genetic algorithms (chapter 2).

The neural network controllers in this research read visual data from the rendered screen in the game. The first experiment compares the abilities of different retinal layouts, and trains an AI agent to shoot a moving enemy in a visually simple room. The second experiment trains neural networks to shoot a moving enemy in a visually complex room using Lamarckian neuroevolution with a hand-coded non-visual controller to aid in supervised learning. The third experiment trains an agent to kill a moving enemy in a visually complex room with a large central pillar; again Lamarckian neuroevolution is used, but this time the supervising hand-coded controller is replaced with an evolved non-visual neural network. The results of these experiments are improvements to neurovisual controllers and to the learning methods used to train them. While the final Quake II agents trained in these experiments are not yet robust and enjoyable opponents, hopefully the techniques discovered in this research will be used to make more complicated visual agents in the future.

Chapter 2

Background

2.1 Genetic Algorithms

2.1.1 Standard Genetic Algorithm

Genetic algorithms (GA) are a class of learning algorithms that imitate micro-evolution and breeding as seen in real-world biology. Living organisms all contain chromosomes and genes that specify how an organism grows and whether or not it exhibits certain characteristics and traits common to its species. When animals reproduce there is some form of crossover between the chromosomes of the two parents such that the resulting children contain traits from both parents. In the wild, because the fittest animals are most likely to survive and mate, the genes that produce the fittest animals tend to become dominant in a species in a given environment. Thus, a pack of arctic rabbits will have thick fur coats: the result of the survival of all those past rabbits with thick coats who mated, and the demise of those rabbits with thin coats who did not mate as often. However, if the arctic rabbits were to move to a hotter climate, the thicker-coated rabbits would sooner overheat and would be less likely to survive and mate; over generations, the rabbit offspring would become thinner-coated. This form of evolution is also seen when humans mix together breeds of animals to create breeds with desirable characteristics. For a simple example, suppose a man has a large number of pet rats and would like to breed a "hooded" rat, whose fur is colored white below the shoulders and dark above: from the initial population of rats, this man would find those that most closely matched the required characteristics and arrange for them to mate together, and he would discard the other rats. After several generations of breeding the man would end up with his hooded rats.

Genetic algorithms simulate this sort of evolutionary breeding by changing a population of digital chromosomes until it has evolved one or many chromosomes that produce traits of the desired result. The digital chromosome is in actuality a chain of numbers stored in the computer's memory. This chain of numbers is mapped in some way to some manifestation, in the same way that every rat chromosome is mapped to a real rat organism; this manifestation can be a picture, a computer program, an AI of some sort, or almost anything whose traits can be represented in some way by numbers. In the research presented in this thesis, the chromosome is mapped to the weights of a neural network, which is discussed in more detail in chapter 3.

Selection and Fitness Function

In order to evolve the population of chromosomes toward a desirable solution, the evolution must be guided by some function that specifies how nearly the individuals with those chromosomes achieve that solution. This guiding function is called the fitness function, and it is the main propellant for successful evolution. If we were somehow breeding hooded digital rats, for instance, the fitness function would return some measurement of just how "hooded" a digital rat appeared; this measurement is called the individual's "fitness," and by ranking the fitnesses of all the individuals in the population, we can selectively breed those that are more fit. After several generations, we hopefully achieve our desired result. The research in this thesis uses stochastic "roulette wheel" selection when picking out which chromosomes should mate and perform crossover. The fitness function tests each individual in the population and assigns each a fitness. The chance that an individual is picked for crossover is its fitness divided by the sum of all fitnesses in the population, i.e. fitness_i / sum_j fitness_j. Supposing that all the fitnesses of the individuals were put into a proportional pie chart, one could play roulette with that chart and have the same results, with the more fit having a better chance of selection than the less fit.
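To make the selection step concrete, the listing below sketches fitness-proportionate (roulette wheel) selection in Python. It is an illustrative sketch only; the function and variable names are mine, not those of the thesis code.

import random

def roulette_select(population, fitnesses):
    # Pick one individual with probability proportional to its fitness.
    total = sum(fitnesses)
    spin = random.uniform(0.0, total)        # where the "wheel" stops
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if spin <= running:
            return individual
    return population[-1]                    # guard against rounding error

# Two parents are chosen for crossover:
# mom = roulette_select(pop, fits); dad = roulette_select(pop, fits)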

Crossover

There are several different forms of crossover that can be used to create a child from two parent chromosomes. A basic method is to randomly pick an index within the length of the chromosomes and take everything before it from one parent and everything after it from the other parent; this method is called single-point crossover, and it suffers because the beginning and ending of one chromosome are almost never included together in the child's chromosome. Two-point crossover handles this problem by selecting exactly half of the length of the chromosome in one large chunk from some random location within one of the parents' chromosomes; the unselected portions are taken from the second parent. Two-point crossover, however, creates a new problem in that the beginning and the end of the child's chromosome are almost always from the same parent. Another simple method is called uniform crossover, which iterates through each gene, randomly selecting with an even chance which parent to take the gene from, eliminating both of the problems associated with single- and two-point crossover. Uniform crossover is used in this research because it is simple to implement and mixes the chromosomes well.
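A minimal sketch of uniform crossover, assuming chromosomes are simple lists of numbers (the names are illustrative):

import random

def uniform_crossover(parent_a, parent_b):
    # Build a child by taking each gene from either parent with equal chance.
    return [a if random.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]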

Mutation

Another important part of the evolutionary learning process in genetic algorithms is random mutation, which occurs along with every crossover. This mutation randomly flips a few bits or slightly changes the numbers within the chromosome. The purpose of mutation is to prevent the population from prematurely converging upon a sub-optimal solution. For instance, in the hooded-rat example, a premature convergence might be a population of rats that are all white on the bottom but have only half of their hood on top; if a large number of rats evolved to become like this near the beginning, then without mutation the rats might never become fully hooded. With mutation, however, a rat would eventually evolve that had slightly more than half of a hood, and so on, until the population achieved a full hood. Overall, genetic algorithms are very useful for solving problems where there are far too many variable combinations to try one by one. Many of the chromosomes evolved in this research consist of 434 floating point numbers, roughly in the range of -1.0 to 1.0. Trying out all the different combinations of 434 floating point numbers, even at an imprecise resolution, would require far too many trials. With a GA, however, a good solution can often be found relatively quickly.
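Per-gene mutation of a real-valued chromosome can be sketched as follows. The 10% rate and the sharply peaked delta mirror the operator described later in section 4.4, but the exact form used here is my reading of that description and should be treated as an assumption.

import math
import random

MUTATION_RATE = 0.10          # chance that any given gene is perturbed

def mutate(chromosome):
    # Perturb some genes by a small delta, sharply peaked near zero:
    # 10 ** log(n) (natural log) equals n ** ln(10), so most deltas are tiny.
    child = list(chromosome)
    for i in range(len(child)):
        if random.random() < MUTATION_RATE:
            n = max(random.random(), 1e-12)               # avoid log(0)
            delta = (10.0 ** math.log(n)) * random.choice((-1.0, 1.0))
            child[i] += delta
    return child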

2.1.2 Queue Genetic Algorithm

In order to evolve a population using a genetic algorithm, every individual must be tested and given a fitness. In some cases this testing takes only a few hundred milliseconds, and results are achieved very quickly; but in other cases the testing can take several seconds, which makes the evolution an extremely slow process. It is desirable to evolve quickly with a genetic algorithm because the fitness function, or the mapping between the chromosome and its representation, often needs to be tested and tweaked many times before it produces the desired results. In the research in this thesis, all of the fitness evaluations required at least 24 seconds; because of this, the evolution needed to be distributed across multiple processors and computers. Rather than use a generational GA to do this, I constructed a more distribution-friendly genetic algorithm, called the Queue Genetic Algorithm (QGA) [35].

A generational genetic algorithm will, as the name suggests, evolve over a series of generations. First, the entire population is created with random chromosomes; next, each chromosome is tested with the fitness function. After all the chromosomes are assigned a fitness, they are bred together by the selection mechanism and a new population of children is created. This form of evolution learns well, but is difficult to distribute because of the necessary generational cut-off points. The difficulty arises near the end of one generation and at the beginning of the next. Suppose that the population is of size p, that p − 1 individuals' fitnesses have been evaluated, and that there are d distributed fitness tests in progress. When the pth individual's fitness is returned, the population can be bred and evolved to create the next generation; however, at that point there are still d − 1 fitness tests being evaluated for the old generation. These tests, since they belong to the old generation, must be discarded, and thus computation is wasted. Optimizations can be made to help synchronize the finishing times of the fitness functions to reduce waste, but this requires added complexity and cannot easily handle randomly dropped or added clients.

Figure 2.1: The Queue Genetic Algorithm (QGA). New individuals are bred from parents chosen from the current population by roulette wheel selection. After the new individual is tested, it is placed on the beginning of the queue and the oldest individual is discarded.

My solution to the distribution problem is to set up the population as a steady-state first-in-first-out queue, called a Queue Genetic Algorithm (figure 2.1). Instead of having a generation-by-generation evolution, the population is constantly evolving. The chromosomes in the population are stored in a queue, with the oldest individuals near the end and the youngest at the beginning. When a fitness function is ready to test a new chromosome, one is created by roulette wheel selection of two parents from the individuals in the queue. After the chromosome is tested and a fitness is assigned, both the chromosome and its fitness are sent back to the QGA and added to the beginning of the population; to make room for the new individual, the oldest is removed. This setup is easily distributed because there are no discrete generational cutoff points; as soon as a process is available to run the fitness function it requests a chromosome from the QGA and receives it, and whenever it finishes testing the chromosome it sends the individual back to the QGA, where it is immediately inserted into the population. The QGA is also resistant to errors such as unexpected disconnections; chromosomes that are sent out for testing are not tracked, and if they never return due to some disconnection the QGA can simply create new child chromosomes. Another advantage is that the population matures upon the return of each tested chromosome, unlike a standard GA, whose generational increases in population maturity prevent observation of the newest results until the entire population has been tested. The QGA design can easily be set up as a server that distributes chromosomes to clients over a network protocol. By doing this, it is possible to distribute the fitness evaluations across multiple cores in a single computer, over multiple computers on the network, and even over hundreds of nodes in a cluster. Each of these QGA clients connects to one central QGA server, receives a new chromosome, tests it, and sends it back with an attached fitness. The QGA is used for the evolution in all the tests presented in this thesis.
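A compact sketch of the QGA's server-side bookkeeping, assuming the queue is a fixed-size deque of (chromosome, fitness) pairs; the method names and the stand-in mutation are illustrative rather than the thesis implementation.

import random
from collections import deque

class QueueGA:
    # Steady-state queue of (chromosome, fitness) pairs, newest at the left.

    def __init__(self, chromosome_length, queue_size=128):
        self.queue = deque(
            ([random.uniform(-1.0, 1.0) for _ in range(chromosome_length)], 0.0)
            for _ in range(queue_size))

    def request_chromosome(self):
        # A client is ready to run a fitness test: breed a fresh child.
        mom, dad = self._roulette(), self._roulette()
        child = [a if random.random() < 0.5 else b for a, b in zip(mom, dad)]
        # A small random perturbation stands in for the mutation operator.
        return [g + random.gauss(0.0, 0.05) if random.random() < 0.1 else g
                for g in child]

    def return_chromosome(self, chromosome, fitness):
        # A client finished a test: newest individual in, oldest out.
        self.queue.appendleft((chromosome, fitness))
        self.queue.pop()

    def _roulette(self):
        total = sum(f for _, f in self.queue) or 1.0
        spin, running = random.uniform(0.0, total), 0.0
        for genes, fitness in self.queue:
            running += fitness
            if spin <= running:
                return genes
        return self.queue[0][0]          # fall back to the newest individual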

2.2 Neural Networks

Neural networks are found in animals and are a mesh of interconnected neurons. Neurons accept signals from other neurons and, as determined by those incoming signals, send out signals of their own. This mesh of neurons appears to provide humans and animals with most of their cognitive power. Artificial neural networks are made by man and are designed to simulate in some way the workings of a real neural network, with the hope that the artificial will exhibit the same sort of intelligence as the natural. Neural networks have been used very successfully in visual applications for object identification as well as for real-time control using visual inputs.

An artificial neural network, henceforth referred to simply as a neural network (NN), can be created as a functional algorithm inside of a computer. Like a function, the NN consists of a number of inputs that produce outputs. The inputs can be any sorts of values that are deemed necessary to produce a corresponding output. For instance, if the problem is visual object recognition, the inputs may be the color values of pixels in an image. In a very simple network, such as a single-layer perceptron (figure 2.2), the inputs are connected directly to the outputs. Every input is connected to every output by a single weight, which is usually a floating point value; each input value is multiplied by its weights, and those products are summed together at each output. The summed value is then squashed to within an acceptable range by using a squashing function, such as tanh. By changing the values of the weights, the mapping between inputs and outputs is changed, and thus the functional behavior of the network is changed.

Figure 2.2: A single-layer perceptron network. Each of the 22 inputs is connected by a weight to each of the 3 outputs.

It was found that single-layer perceptron networks are unable to solve exclusive-or logic problems, or any other non-linearly separable problem. However, it was also discovered that such problems could be solved by adding a hidden layer of neurons between the input and output layers. In a hidden-layer network every input is connected to every hidden neuron, and every hidden neuron is connected to every output. The values of the hidden neurons are calculated in the same way as the outputs in the perceptron: by summing the products of the weights and the input values and squashing them. The hidden layer values then act as inputs to the output neurons, where they likewise are multiplied by their weights, summed, then squashed. By adding the hidden layer, it is possible for neural networks to exhibit non-linearly separable behavior.

A regular hidden-layer neural network can solve many problems, but it is limited to instantaneous solutions and cannot look into its past to aid in a current solution; such limited neural networks cannot even learn to produce a sine wave; they can exhibit merely reactive behavior. Many techniques, however, have been created to allow neural networks to compute with regard to time [11]. A very common technique, and the one used in this research, is to recur the values of the hidden layer neurons as inputs in the next time-step of computation. This allows the network to perform simple time-dependent behavior. More complicated time-related neural network techniques that may be useful, such as long short-term memory cells, are saved for future work.
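The kind of network described here, a hidden layer whose activations are fed back as context inputs on the next time-step, can be sketched as follows. The layer sizes in the usage comment match the controller used later (28 inputs, 10 hidden, 4 outputs), but the code itself is an illustrative sketch, not the thesis implementation.

import math
import random

class SimpleRecurrentNet:
    # Fully connected net with one hidden layer whose previous activations are
    # fed back as extra inputs on the next step, with tanh squashing throughout.

    def __init__(self, n_in, n_hidden, n_out):
        self.context = [0.0] * n_hidden   # hidden activations from the last step
        # +1 on each layer for a bias input with constant value 1.0
        self.w_hidden = [[random.uniform(-1, 1)
                          for _ in range(n_in + n_hidden + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[random.uniform(-1, 1) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def step(self, inputs):
        full_in = list(inputs) + self.context + [1.0]    # inputs + context + bias
        hidden = [math.tanh(sum(w * x for w, x in zip(row, full_in)))
                  for row in self.w_hidden]
        self.context = hidden                            # recur for the next frame
        hidden_b = hidden + [1.0]
        return [math.tanh(sum(w * x for w, x in zip(row, hidden_b)))
                for row in self.w_out]

# net = SimpleRecurrentNet(28, 10, 4); actions = net.step(retina_blocks)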

2.2.1 Backpropagation

A very common learning method for training the weights of a neural network is backpropagation. Backpropagation is supervised learning, meaning that the desired outputs must be known for a given set of inputs. A training set of desirable input and output pairs is used to train the neural network, with the hope that the network will learn a generalized solution and will be able to give desirable outputs for other inputs that are not in the training set. For example, if a neural network were somehow configured to drive a car as its output, and had as its input a camera image from the front of the car, then a driver could drive the car for a few hours around some roads to train it; then, if the neural network had adequately learned, the driver should be able to let the neural network drive, and it ought to still be able to drive on roads not part of the training set. Or, a neural network could be trained to look at faces and identify whether the face was of a male or a female; it could be trained on a hundred different pictures, and if trained well, could then accurately identify thousands more.
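As a minimal illustration of the idea, the listing below performs one supervised gradient-descent update for a single-layer tanh network (the simplest case of backpropagation); the full algorithm extends the same error-propagation rule backwards through hidden layers. The names and the learning rate are illustrative.

import math

def train_step(weights, inputs, targets, rate=0.1):
    # weights: one list per output, each of length len(inputs) + 1 (for the bias).
    # Nudge each weight down the gradient of the squared output error.
    inputs = list(inputs) + [1.0]                        # bias input
    for j, row in enumerate(weights):                    # one weight row per output
        net = sum(w * x for w, x in zip(row, inputs))
        out = math.tanh(net)
        # d(error)/d(net) for squared error; tanh'(net) = 1 - tanh(net)**2
        delta = (out - targets[j]) * (1.0 - out * out)
        for i, x in enumerate(inputs):
            row[i] -= rate * delta * x
    return weights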

2.2.2 Neuroevolution

Backpropagation requires that the desired functional behavior of the network be known in advance. In some problems the ideal solution is not yet known, but there still exists a way to measure the success of the behavior. For example, a spaceship controlled by a neural network ought to avoid asteroids for as long as possible, but the optimal behavior for such avoidance might not be known. In this case, it would be impossible to do supervised backpropagation because the supervising behavior cannot be defined. However, the behaviors of different neural networks can be ranked according to how long they survive in the asteroid field; we can use this as our fitness function and then evolve the weights of the neural networks with a genetic algorithm. After several generations the evolution should produce a solution that is effective for avoiding asteroids, without ever needing a human's guidance. Both backpropagation and neuroevolution are commonly used to train the weights of a neural network, and both techniques are used in various portions of this research.
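The following sketch shows the overall shape of neuroevolution: the chromosome is simply the network's weight vector, and the only feedback is a behavioral fitness score. For brevity it uses truncation selection and Gaussian mutation rather than the roulette selection and QGA actually used in this thesis.

import random

def evolve_weights(fitness_fn, n_weights, pop_size=50, generations=100):
    # fitness_fn runs a controller built from a weight vector and returns a score.
    pop = [[random.uniform(-1.0, 1.0) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness_fn, reverse=True)
        parents = ranked[:pop_size // 2]                  # keep the fitter half
        pop = []
        for _ in range(pop_size):
            mom, dad = random.sample(parents, 2)
            child = [a if random.random() < 0.5 else b for a, b in zip(mom, dad)]
            pop.append([g + random.gauss(0.0, 0.05) if random.random() < 0.1 else g
                        for g in child])
    return max(pop, key=fitness_fn)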

2.3 Computer Vision

The field of computer vision has been explored by many researchers since the advent of computers fast enough to handle the large amounts of data typically found in visual images. Some research is focused on recognizing objects or patterns in images, such as faces for facial recognition. Other research focuses on real-time control over a series of images; rather than identifying and classifying an image, it outputs control movements which will influence upcoming images from the visual input. There are many different methods that can be used to train a visual classifier or controller, including Bayesian networks, Support Vector Machines, and neural networks.

2.3.1 Hidden Markov Models and Bayesian networks

Bayesian networks are probabilistic belief graphs that are useful for determining relationships between inputs and outputs. This method can be used with vision by associating the visual array with desired outputs. For example, Moghaddam et al. were able to use Bayesian networks for facial recognition [30]. Moreover, Rehg et al. used Bayesian networks to detect human speakers [39]. A subset of Bayesian networks, Hidden Markov models (HMM), are statistical systems that model Markov processes with hidden state. They are especially useful for temporal pattern recognition problems such as those found in real-time visual control. For example, Starner and Pentland used HMMs to successfully recognize a subset of American Sign Language by observing hand motions through a video camera [42]. HMMs can also be used for non-temporal problems, such as face recognition [31][9].

2.3.2 Support Vector Machines

Support vector machines (SVM) are classification and regression systems which calculate hyperplanes to distinguish between two sets of vectors. SVMs, like backpropagation with neural networks, use supervised learning for training, and they require a training set of desirable input and output matches. Some of the desirable features of SVMs are that they provide a generalized solution for many problems, they require little or no domain knowledge, there are few model parameters to modify, they can learn even with noisy data sets, and the final results are stable [4]. SVMs can perform pattern recognition very well [6] and are therefore very useful for visual identification problems. Pontil and Verri showed that linear SVMs can recognize 3-D objects with high efficiency without using any feature extraction [37]. Guo et al. used binary tree SVMs to successfully perform facial recognition, with results better than those of the more traditional nearest center approach [19]. Later, Michel and El Kaliouby used SVMs to perform facial expression recognition in real-time using a video feed, in order to identify human emotions [29].

2.3.3 Neural Networks

Neural networks are robust controllers that have been successfully used for many visual applications. Much research has used neural networks for recognition problems, such as face recognition [27][20][10] or license plate recognition [8][32]. NNs have also been used for real-time visual processing, such as in the research done by Pomerleau, where a neural network was trained with backpropagation to drive a car along a road using a 30x32 grayscale input [36]. The training was performed in real-time to imitate a human driver. In other research by Baluja, the experiment was modified to replace backpropagation with an evolutionary computation model that allowed for more robust control, but could only be trained with recorded sets of driving data rather than in real-time [2]. Floreano et al. trained a car controller to use active vision to race around a track in a realistic driving simulator [14]. Another virtual car-racing controller was trained by Kohl et al., who used NeuroEvolution of Augmenting Topologies and a 20x14 grayscale input to evolve a vehicle warning system [22]. Furthermore, they used this technique on an actual robot with a mounted camera. These experiments show that neural networks can successfully be used to control an autonomous agent using visual input. The research presented in this thesis might have used SVMs, HMMs, or Bayesian networks, but neural networks were chosen because of their proven capabilities in this area.

2.4 First-Person-Shooters as an AI Research Platform

First Person Shooters (FPS) such as Quake II are popular video games that put a player in control of a gun-wielding character who must survive in a hostile world. The player controls the game character from the viewpoint of the character's eyes and generally always views down the barrel of a gun. FPSs often natively support some form of programmable AI and are often used for research. Bauckhage et al. conducted research that first recorded demos of humans playing the game Quake II, then taught neural networks to imitate them: one network learned to aim during combat situations by changing the yaw and pitch, another learned to change its x and y velocities during combat, and another learned to traverse the map [3]. Zanetti and El Rhalibi trained neural networks with backpropagation for Quake III to control an agent to collect items, engage in combat, and navigate a map. They also used pre-recorded demos of human players, using the player's weapon information and location as inputs [46]. Graham et al. trained controllers using neuroevolution for path-finding strategies and obstacle avoidance in the game Quake II [16]. Thurau et al. used a Neural Gas method to train agents in Quake II to imitate the waypoint-navigation of real players by observing their movements from pre-recorded demos [44]. Behavioral grids were evolved in Quake III in research done by Priesterjahn et al., and the resulting agents were able to handily defeat the hand-coded AI robots supplied with the game [38]. In other research not presented in this thesis, I evolved a cyclic programmatic controller for visual control in Quake II, in which the chromosome instructed the agent where on the screen to look and what actions to perform depending on the darkness of wherever it looked [33].

2.5 Synthetic Vision vs. Raw Vision

Synthetic vision, which is a method of representing extra-visual data within the visual field, is used by many research experiments that take place in a virtual world [12][40]. This method might change the colors of objects to reflect their distance or angle, or to show their specific object type. Enrique et al. re-rendered the visual input into two separate color-coded views, one of which displayed the color of an object according to its velocity and one which displayed color according to entity identification and wall angle. This high-level visual data was used to simplify the creation of a hand-coded robot controller that could navigate a map and pick up health boxes [12]. In other research by Renault et al., an agent used extra-visual pixel data and a hand-coded controller to walk down a hallway. Each pixel included a distance value, an object identification value, and a color value [40]. The extra-visual information used in synthetic vision can be accessed because the agents operate in a virtual world, where such information is easily available; because it relies on this extra-visual information, synthetic vision cannot be easily transferred to real-world robotics.

Because a virtual simulator is used in this research, synthetic vision could have been used. Instead, only the raw visual data from a normally rendered game screen is used, so that the trained controllers do not have any extra information beyond what a human player might see. This forces solutions to be generalized, to the point that the learning techniques in this research ought to work without many changes on other games as well as in real-life camera applications.

Chapter 3

The Quake II Environment

3.1 Original Game

Quake II is a first-person shooter (FPS) by id Software, first released in December of 1997. An FPS is any 3-dimensional game that puts the player's viewpoint behind the in-game character's weapon. The player must usually shoot at enemies and move around in the environment to accomplish some task. In Quake II, the player's character is a space marine from Earth who is a participant in a massive counter-attack on an alien race, the Strogg, who are readying to invade Earth. Unfortunately, almost all of the marines except the player's character die or are imprisoned shortly after they land on the aliens' planet. The player must travel across the landscape, through twisted hallways and burnt-out buildings, battling against hundreds of various Strogg aliens, until he eventually meets their leader, Makron, whom he must slay.

Quake II also features a multiplayer game that allows over 30 players to connect simultaneously to the same server to play a variety of games with one another. The default multiplayer game is deathmatch, which pits each player against every other player in a map, with the goal of killing as many other players as possible. There is also a team-based deathmatch, which puts players of the same color on the same team, so that they must work together to kill the other teams. Another method of gameplay is capture-the-flag, which has two teams, each with a flag in their own territory; the goal is to grab the other team's flag and carry it back to base, while avoiding being killed by the other team.

The graphics in Quake II are dark, gritty, and realistic (figure 3.1). The game's color palette is mostly brown and metallic gray, with some darker primary colors in the mix. The levels are all realistically shaded, though dynamic entities such as enemies and the player do not cast shadows. The game engine cannot render large outdoor environments very well, so most of the levels are complex rooms and passageways, mixed with a few simple outdoor areas.

id Software created Quake II such that it is easy to customize. There are several map editors that users can use to create their own maps. The textures within those maps can be changed by using image editing software. All of the models in the game, such as the enemies and weaponry, can be changed in 3-D modelling software. The behavior of the enemies, the weapons, and the game rules can be changed, even to the extent of transforming the game into a racing game or a flight simulator.

Figure 3.1: An in-game screenshot from the game Quake II.

3.2 Quake2AI Interface

I chose Quake II out of the many available FPSs for a few key reasons. First, the game is open-source: in December of 2001 id Software released the source code for the game engine under the GNU General Public License, which allows anyone to use and modify the code at will. The game needed to be open-source to easily access the screen buffer and to control the player's movements. There are many open-source FPSs available on the Internet, but most of them require a dedicated 3D graphics accelerator card, meaning that only one game could run on each computer, and computers without graphics cards could not run the game at all. Fortunately, Quake II can render either on a graphics card or completely in software; because of this, I am able to run several instances per computer as well as run instances on the nodes in our cluster.

Quake II natively supports server-side AI controlled bots. Server-side bots, as the name suggests, run on the game server. Since there is only one server for each Quake II game, all of the bots need to run on the same processor, which would computationally restrict bot AI whenever multiple bots were in the same game. Server-side bots also do not graphically render their view of the game world, so they are unusable for vision research. Instead of using server-side bots, I use the Quake II client and make client-side bots. The Quake II client allows a player to connect to the game server and play the game; it takes keyboard and mouse input and sends the desired player movements to the server; the server sends back game-world information and the client renders it to the screen. In order to control the player's movements with an AI rather than by a human, I found out how all the movement and control messages were sent to the server, and I made easily accessible functions with intuitive names and put them in the AI interface. The AI interface includes functions to move forward, backward, and sideways, and to turn, look up/down, shoot, jump, and crouch; any other control option can be sent through the console command function. For the visual input, I directly access the 2-D buffer of the rendered 3-D world; functions are available to access individual pixel color values as well as averaged values of large blocks of pixels. The interface also provides access to the location coordinates of any rendered entities in the map, and it provides a tracing function to find out if and where there is an intersection with a wall between any two points in the map.

In order to make the AI functions generally accessible, the source code is compiled into a shared library rather than a binary executable. The "main" function that would normally be called upon execution is renamed and is instead called by a function in the shared library to launch the game. Because it is a shared library it can be accessed easily by linking from a C program, and it can easily be interfaced from other programming languages. For the research in this thesis, I built a module in Chicken Scheme, so that all the AI functions can be called from the Scheme programming language. I have also made a Python module, which is being used in other research.
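The shape of a client-side visual bot built on this interface is sketched below. The module name quake2ai and its functions (read_block_average, move_forward, and so on) are hypothetical stand-ins for the real Scheme/Python bindings, which are not reproduced here; only the overall sense-think-act loop is intended to be accurate.

# Hypothetical sketch of a client-side bot loop; names are placeholders.
import quake2ai as q2

RETINA_LAYOUT = []      # (x, y, width, height) of each averaged block (omitted here)
MAX_TURN_DEG = 10.0     # maximum per-frame turn, as used in chapter 4
controller = ...        # a trained network with a step() method (not shown)

q2.connect("localhost")                 # join a running Quake II server as a client
while q2.in_game():
    # Sense: block-averaged grayscale values from the rendered frame.
    blocks = [q2.read_block_average(x, y, w, h) / 255.0
              for (x, y, w, h) in RETINA_LAYOUT]
    # Think: the neural controller maps retina blocks to four outputs.
    turn, forward, strafe, shoot = controller.step(blocks)
    # Act: send movement and shooting commands back to the server.
    q2.turn(turn * MAX_TURN_DEG)
    q2.move_forward(forward)
    q2.move_sideways(strafe)
    if shoot > 0.0:
        q2.shoot()
    q2.next_frame()                     # advance one rendered game frame (~40 fps)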

Chapter 4

Experiment 1: Graduated vs. Uniform Density Retina

4.1 Introduction

The human retina retrieves more visual data from the middle of the visual field than it does from the periphery; the photoreceptors that provide sharp vision are packed most densely in the center. Because of this, humans can focus their attention wherever their eyes are pointed and see clearly; at the same time, they are able to notice objects in their peripheral vision so that they can choose to shift their attention when necessary. For this experiment I loosely imitate the density of visual acuity found in human retinas and make a digital retina that has denser visual data in the center of the screen and less dense data near the periphery. I test this against a retina which uses a uniform density of visual data across the entire width of the screen. Results show that the graduated density of visual data found in humans works very well as a digital retina design for the task given in this experiment.

4.2 Experimental Setup

As previously stated, the Quake II game world is dark and dreary. Because of this, there is usually not much contrast between enemies and walls, between walls and ceilings, or between walls and floors. While human players are usually able to cope with this environment, I decided the first vision-based controller should start out in a visually simpler environment. I created a simple room map that is square in shape and quite small; relative to humans, about as big as half a basketball court. The ceiling is a dark brown texture, the walls are a gray texture, and the floor is white. The enemy is dark-blue and black, and contrasts very well against the lighter walls and floors, even when converted to grayscale (figure 4.1).

In regular Quake II multiplayer, the players enter the game through spawn portals. I raised the ceilings and put the spawn portals above the play area, so that they would not be in the way of the agents. Whenever an agent or enemy dies, it reappears at a random spawn portal and drops to the floor. In order to lessen the computational burden, I also disabled the display of dead bodies and their scattered body parts that would otherwise litter the floor.

The goal for the learning agent in this experiment is simply to kill an enemy as many times as possible in the allotted time. The enemy does not shoot, but does move around the room in random sequences. The learning agent is equipped with a blaster weapon, which I have removed graphically from the screen so as not to interfere with the visual environment. I modified the blaster weapon from the original game to inflict more damage, so that one shot kills the enemy, and I made it capable of shooting in bursts. Normally, if the learning agent is constantly shooting, the frequency of shots is about one every half second. However, if the agent refrains from shooting for a time, the gun will charge and will be capable of firing up to 5 shots in a quick-succession burst, depending on how long it has charged. I made this modification so that agents would have a reason to avoid shooting constantly, and to shoot only when needed.

Figure 4.1: A screenshot of the simple room used for this experiment. The ceilings are a dark brown texture, the walls are a gray texture, and the floors are white. The enemy is dark-blue and contrasts with the light-colored walls and floors. The trail of dots indicates the bolts from the learning agent's blaster.

4.3 Neuro-Visual Controller

I tested two different visual schemes for the controllers. I could not simply input the entire pixel buffer of Quake II's lowest default screen resolution, 320x240, into a neural network, because the network would need 76,800 weights for each hidden-layer neuron, and it would be too slow to compute and take too long to evolve. Moreover, the agent does not need so much information to perform such a simple task, so I reduce the resolution by averaging pixel values together into larger blocks. Also, since the agent is only fighting one enemy, and the enemy appears much darker than the walls and floor, color is not important, so I use grayscale input. Because the map is entirely flat, the agent never needs to look up or down. The most important visual information for the agent appears in a horizontal band that runs across the center of the screen. Thus, in order to keep the controller simple and pertinent, I read only a 14x2 band of grayscale pixel blocks across the center of the screen (figures 4.2, 4.3).

Figure 4.2: Left: A scene as rendered by the game engine. Right: The same scene as viewed via the graduated density retina. The black tics above and below the image indicate the retinal block widths.

The first type of block input layout is inspired by the human retina, which is able to capture visual data of higher resolution in the center of the retina than in the periphery. To make such a graduated density retina, I used a 14x2 band of blocks and made the blocks 10 pixels in height, as with a uniform retina, but the blocks nearer to the center average together fewer pixels than the blocks nearer to the outer edge; each block away from the center is approximately 1.618 times wider than the previous block (the golden ratio). With this system the center views much finer detail than the blocks near the outer edge (figure 4.2). The second type of input layout that I tested used evenly spaced blocks that averaged a 23x10 grid of grayscale pixel values into one value for each block (figure 4.3). Both the graduated and uniform layouts used 28 blocks, so that each used the same number of inputs.

I rescale the integer grayscale value of the blocks in the 14x2 input array onto floating point numbers in the range [0, 1] for use as inputs to the controller network (figure 4.4). The controller is a simple recurrent network [11] with 28 inputs, 10 hidden/context units, and 4 output neurons. I use a "flat" hidden layer, with no attempt to capture the input geometry in the network architecture. The neurons are squashed by tanh, which scales their outputs to the range (−1, 1). The four output nodes control turning, forward and backward velocity, lateral velocity (strafing), and shooting. For turning, the output is scaled onto the maximum possible per-frame turn rate (10°), and the sign determines the direction of the turn. For the longitudinal and lateral velocity controls, the output activations represent the desired fraction of full speed, with the signs indicating the choices of direction. Each layer has a bias unit with a constant 1.0 value.
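A sketch of how the graduated density retina's 28 block values might be computed: 14 column widths grow by the golden ratio away from the center so that they span the 320-pixel frame, and each block in the 2-row band is averaged to a single grayscale input. The band position and helper names are illustrative assumptions, not the thesis code.

PHI = 1.618
SCREEN_W, BAND_TOP, BLOCK_H = 320, 110, 10     # band position is illustrative

def column_widths(n_cols=14, screen_w=SCREEN_W, ratio=PHI):
    # Half of the columns, innermost first, then mirrored around the center.
    half = [ratio ** i for i in range(n_cols // 2)]
    scale = (screen_w / 2.0) / sum(half)
    half = [w * scale for w in half]
    return list(reversed(half)) + half

def read_retina(pixel_at, widths=None):
    # Average the pixels under each block into one grayscale value in [0, 1].
    # pixel_at(x, y) is assumed to return a grayscale value in 0..255.
    widths = widths or column_widths()
    blocks, x = [], 0.0
    for w in widths:
        for row in range(2):                   # two rows of blocks
            y0 = BAND_TOP + row * BLOCK_H
            xs = range(int(x), int(x + w))
            ys = range(y0, y0 + BLOCK_H)
            total = sum(pixel_at(px, py) for px in xs for py in ys)
            blocks.append(total / (len(xs) * len(ys) * 255.0))
        x += w
    return blocks                              # 28 inputs for the controller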

Figure 4.3: A view via the uniform density retina. The enemy's location and distance are similar to the view in figure 4.2. The contrast between the enemy and the walls and floor is much less distinct in the uniform retina than in the graduated density retina, because of the increased area averaged into the visual block where it appears.

4.4 Training

Figure 4.4: Diagram of the controller network, with 28 visual inputs, 10 recurrent hidden layer neurons, and 4 outputs.

For the evolution, I use a QGA (chapter 2), which is a steady-state first-in-first-out algorithm that allows for easy distribution of fitness evaluations over multiple instances of the simulation [35]. In this experiment I use a queue size of 128 individuals. The 434 weights of the controller network are stored as floating point numbers, which are the genes in a chromosome for each individual. To initialize the population, individuals are created with random genes with values in the range [−1, 1]. Once selection begins, each new individual is formed by roulette wheel selection of two parents from the population, where individuals with higher fitness have a greater chance of being selected than individuals with lower fitness. Crossover is uniform between the two parents, per gene, with equal chance for each parent that its gene will be selected. Mutation occurs with a 10% chance per gene. The mutations are generally very small and are calculated by adding a delta drawn from a sharply peaked distribution, 10^log(n) ∗ random(−1 or 1), to the current value of the gene, where n is a random number in the range [0, 1]. Each individual is tested in the following manner:

1. The agent appears in the room and drops to the floor.

2. The agent is given 24 seconds to kill the enemy as many times as possible.

3. Whenever the agent kills the enemy, the enemy immediately reappears at a random location.

4. At the end of the 24 seconds, the agent is removed, and a new agent with a new chromosome appears for evaluation.

Fitness is awarded solely by the number of times the agent has killed the enemy. The time limit of 24 seconds is long enough to allow skilled individuals to distinguish themselves from lucky individuals, and short enough to complete the evolution in a reasonable time. In order to increase the bias toward the more fit genomes in roulette wheel selection, the number of kills is multiplied by 5 and then squared. I found it too difficult for a random population to learn to shoot a quickly moving enemy, so I increment the difficulty of the enemy as the evolution proceeds. To do this I first evolved a population against a completely stationary enemy and noted the approximate average population fitness at which the agents appeared to be actually aiming at the enemy. I set the fitness bar at that value and, starting from new populations, whenever the average fitness of the population reached the fitness bar, the difficulty was increased by increasing the speed of the enemy. After every sixth individual has been evaluated for fitness the average fitness is calculated again, and the difficulty level is incremented again if that average has risen back above the current fitness bar. Difficulty is increased by adding 0.01 to the enemy's speed multiplier, which begins at 0.0. The difficulty must be increased 100 times before the enemy reaches its maximum possible speed.
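The fitness shaping and difficulty schedule can be summarized in a few lines. The fitness bar value and the bookkeeping names are placeholders; only the multiply-by-5-then-square shaping, the every-sixth-evaluation check, and the 0.01 speed increments come from the description above.

FITNESS_BAR = 25.0          # illustrative value; the thesis sets it empirically

def shaped_fitness(kills):
    # Bias roulette selection toward better shooters: multiply by 5, then square.
    return (kills * 5) ** 2

enemy_speed_multiplier = 0.0

def maybe_raise_difficulty(recent_fitnesses):
    # Called after every sixth evaluation: if the population's average fitness
    # has climbed back above the bar, make the enemy a little faster.
    global enemy_speed_multiplier
    average = sum(recent_fitnesses) / len(recent_fitnesses)
    if average >= FITNESS_BAR and enemy_speed_multiplier < 1.0:
        enemy_speed_multiplier = min(1.0, enemy_speed_multiplier + 0.01)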

4.5 Results

The graduated density retina performed better than the uniform density retina (figure 4.5). The main behavior learned by controllers that used the graduated density retina is to spin in circles, moving forward, until the enemy appears as a dark dot in the center of the retina, then to stop spinning, or spin in the reverse direction, move toward the enemy, and shoot. Some populations learned to hold their fire until the enemy appeared in their center, and then fire a burst of shots. In the first stages of the evolution, the agents appear to pay attention only to the center blocks of the retina. Since the agents compute at most at 40 frames per second, sometimes they turn so quickly that the dark spot of the enemy agent skips over the center of the retina, and the agent does not react. However, as the agent becomes more advanced it learns to use the information in the periphery as well.

The uniform density retina controller did not learn as well as the graduated density retina (figure 4.5), likely due to its inability to fine-tune its aim. When the enemy is far away, the retina's input blocks viewing the enemy turn only slightly darker because of the relatively large visual area being averaged, unlike the blocks of the graduated density retina, which clearly show the enemy as dark when centered, even at a great distance. The uniform density tests produced agents that could kill an enemy at close range, but spun around blindly whenever the enemy was across the room. The graduated density controllers, however, were able to see the enemy, even when it was far away, and shoot at it. The aiming of the graduated density retina could also be more precise at closer range, and especially at longer range, because the darkening of the small center blocks clearly indicated that the enemy was centered; the uniform retina had in the center only large blocks of gray, which it must compare to determine the offset of the enemy.

Figure 4.5: The population fitness averaged over 24 independent runs, tested using the uniform density retina (figure 4.3) and the graduated density retina (figure 4.2). The dashed bottom line indicates the increased movement of the enemy, where 300 is the maximum possible enemy movement speed.

4.6 Conclusion

In this experiment, the first-person shooter Quake II was used to test two different retina layouts as inputs to an evolving neural network performing a simple task. Quake II was modified to appear visually simpler, so that the ceilings, walls, floors, and enemy were all distinct. A genetic algorithm was used to evolve the weights of the neural networks. To test the fitness of the agents, they were placed into a simple single-room map and given 24 seconds to shoot an enemy as many times as possible, receiving fitness for each kill. At the start of the test, the enemy did not move, but after the agent population reached a certain average fitness bar, the enemy's random movement speed began to be incremented, and it continued to be incremented whenever the average fitness rose above the fitness bar. Two different retina layouts were tested, each with the same number of block inputs: first, a graduated density retina that is more focused in the center, with smaller blocks that become wider as they near the periphery; second, a uniform retina which uses equal block widths across the width of the view. The results of the tests showed that the graduated density retina learned faster and with more success on average than did the uniform retina.

This experiment shows that it is beneficial to focus on particular sections of information in the visual field. With the same amount of input information, a focused retina can greatly improve behavior. In my experiment, the focused retina displayed the enemy as a distinctly darker block, even at great distance, and it enhanced the agent's ability to accurately aim at the enemy. Setting the focus to the center of the screen was successful because the agent's gun aims at the center of the screen. However, in vehicle-based road-following experiments, or in other aspects of first-person shooters, it may not be best to focus on the center of the screen, but rather on the side of the road, or on likely pathways. Future work will include evolving where to focus the retina for particular tasks, as well as evolving controllers which dynamically change the focus in real-time.

Chapter 5

Experiment 2: Lamarckian Neuroevolution

5.1 Introduction

In this experiment, I again train a neural network controller to shoot an enemy in a room, but I have drastically increased the difficulty of the problem by adding uneven shadows to the room, as well as a darker floor. This added visual complexity hinders learning to the point that my previously-used graduated density controller [33] is unable to learn satisfactory behavior when trained only with neuroevolution. Instead, I combine neuroevolution with backpropagation to help train the network and am able to achieve the desired behavior. My process is inspired by Lamarckian evolution, an idea proposed by Jean-Baptiste Lamarck in the early nineteenth century which hypothesized that phenotype adaptations learned within an individual's lifetime were passed on to its offspring [26]. This type of evolution can easily be added to neuroevolution by changing the weights of the neural network over an individual's lifetime and returning the modified weight structure to the evolutionary gene pool [17][45][23]; the Lamarckian adaptations can be made by using backpropagation [41]. Lamarckian neuroevolution that uses backpropagation has previously been used in agent-control research by Bryant and Miikkulainen, who trained a controller to play the strategy game Legion II by imitating pre-recorded human players [5]. Instead of backpropagating on pre-recorded human players, I use a hand-coded bot that does not access any visual input, but rather directly accesses game state information. The neural network is corrected against this hand-coded bot and learns to imitate it using only visual inputs.
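Operationally, the Lamarckian step just means that the weights as modified by backpropagation during an individual's evaluation are written back into its chromosome before it re-enters the gene pool. The skeleton below illustrates that flow; the callables passed in (build_net, supervisor, run_frame) are hypothetical stand-ins for the simulator and the supervising bot, not the thesis code.

def evaluate_lamarckian(chromosome, build_net, supervisor, run_frame, n_frames):
    # Decode the genes into a network, then play while learning.
    net = build_net(chromosome)
    kills = 0
    for _ in range(n_frames):
        retina, game_state = run_frame(net)      # one frame of play
        target = supervisor(game_state)          # non-visual bot's outputs, same frame
        net.backprop(retina, target)             # nudge weights toward the supervisor
        kills += game_state.kills_this_frame
    # Lamarckian step: the weights as trained during this lifetime go back
    # into the chromosome that is returned to the gene pool.
    return net.encode_weights(), kills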

5.2 Experiment Setup

In this experiment I use the same game-world setup as in the retinal layout experiment [33], except that I use a more visually complex map. In my previous research the map had a white floor, gray walls, a dark ceiling, and a dark enemy; moreover, the map was fully lit with no shadows (figure 4.1). In my current research, the floor and ceiling of the map are brown, the walls are gray, the enemy is dark, and the room is lit dimly with varying shadows (figure 5.1). This map is much more realistic when compared to real-world rooms, and it also makes good visual controller performance more difficult. The map is a single open room that is about the size of half a basketball court. The spawn portals are also near the ceiling, as in my previous experiment, to prevent spawn-kills. The enemy agent is exactly the same as in the previous experiment, and moves about the room in a random pattern. The blaster also has the same burst-fire capabilities as in the previous experiment.

Figure 5.1: An in-game screenshot of the environment used in this experiment. The floor and ceilings are brown, the walls are gray, and the enemy is dark blue. The room is dimly lit with varying shadows. The display of the shooter’s own weapon has also been removed. (Cf. figure 4.1.)

In this experiment I use the graduated density retina (figure 4.2) from the previous experiment. I also use the same neural network with 28 inputs, 10 hidden/context neurons, and 4 output neurons. As in the retinal layout experiment, I represent the weights of the neural network as a chromosome and evolve it with the QGA [35]. While I probably could have solved the problem of the darker room by increasing the visual acuity of the retina and the number of inputs to the neural network, I wanted to try to find a way to solve it with the limited controller, concentrating on improving the learning technique.
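As a rough illustration of how such a controller maps onto a chromosome, the sketch below flattens and restores the weight matrices of a 28-input, 10-hidden, 4-output simple recurrent network. The layer sizes come from the text; the particular set of weight blocks and their packing order are assumptions about the implementation.

```python
import numpy as np

# Layer sizes from the text: 28 retina inputs, 10 hidden/context neurons, 4 outputs.
N_IN, N_HID, N_OUT = 28, 10, 4

# Weight blocks; the exact set of blocks and their packing order are assumptions.
SHAPES = {
    "w_in":  (N_HID, N_IN),    # retina inputs -> hidden
    "w_ctx": (N_HID, N_HID),   # previous hidden activations (context) -> hidden
    "w_out": (N_OUT, N_HID),   # hidden -> outputs
    "b_hid": (N_HID,),         # hidden bias weights
    "b_out": (N_OUT,),         # output bias weights
}

CHROMOSOME_LENGTH = sum(int(np.prod(s)) for s in SHAPES.values())

def decode(chromosome):
    """Unpack a flat real-valued chromosome into named weight matrices."""
    weights, i = {}, 0
    for name, shape in SHAPES.items():
        n = int(np.prod(shape))
        weights[name] = np.asarray(chromosome[i:i + n]).reshape(shape)
        i += n
    return weights

def encode(weights):
    """Flatten the (possibly backpropagation-modified) weights back into a
    chromosome, as required for returning Lamarckian changes to the QGA."""
    return np.concatenate([weights[name].ravel() for name in SHAPES])
```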

5.3 Hand-Coded Bot for Backpropagation

The backpropagation of the neural network used for Lamarckian adaptation needs to imitate some supervising behavior. I could have attempted to imitate a human player, but humans are very inconsistent in their actions, so learning would be difficult. Instead, I hand-coded a controller that performs the desired behavior; rather than using visual input like the neural network, it directly accesses the location coordinates of the enemy relative to the agent and outputs the actions that the agent should take for every frame. The goal of this experiment is to have a controller that uses only visual input, but I can let the hand-coded controller "cheat" and access the direct coordinates because the neural network will learn to imitate this behavior using only the visual input. In order to hand-code a controller that can be used with backpropagation for a neurovisual network, one must consider the capabilities of the visual field. The hand-coded bot must be designed to react only to situations that are clearly visible in the visual field; if it reacts to non-visible elements, then the visual controller will become confused because duplicate visual situations will require different control outputs. For instance, if my Quake II bot always turned directly toward the enemy, even when the enemy is not currently on screen, then the visual controller would be trained to turn both left and right when it sees no enemy on screen, because sometimes the enemy will be offscreen and closer to the left, and other times offscreen and closer to the right. Because of this, my hand-coded bot assumes that there is no enemy until the enemy is somewhat centered in the view of the agent; only then does it follow its aim directly towards the enemy. When the enemy is not near the center, or not even on screen, the bot always spins to the left, as if searching for the enemy. When the enemy is centered, the hand-coded bot shoots at the enemy, vibrating its turning slightly to scatter the shots. When the enemy is killed it begins its death animation, and by observing the enemy's animation frames I can tell when the enemy has fallen out of view of the visual controller's retina and then proceed to search for the enemy again by turning left. My controller performs no action inconsistent with the visual field, which is essential to successful backpropagation of the visual controller. Like the neural network, the hand-coded controller outputs 4 values, corresponding to turning, moving left or right, moving forward or back, and shooting or not shooting. For the real-valued movement, I can simply output the desired movement values. For the binary shooting output, which will shoot if positive and not if negative, I output the most extreme values of -1.0 and 1.0, which seems to work well for training.
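A minimal sketch of this supervising policy is given below. Only the overall rules come from the text (spin left when the enemy is not roughly centered or not visible, track and fire when it is centered, jitter the aim slightly to scatter shots, and output the extreme values -1.0 and 1.0 for shooting); the numeric threshold, the sign conventions, and the function arguments are hypothetical.

```python
import random

def supervising_bot(enemy_visible_and_alive, angle_to_enemy_deg):
    """Hand-coded, non-visual supervising policy (illustrative reconstruction).
    Returns [turn, strafe, forward, shoot], matching the 4 network outputs.
    angle_to_enemy_deg is the enemy's bearing relative to the agent's aim,
    taken from the game state rather than from vision."""
    CENTER_TOLERANCE = 10.0   # degrees; assumed value for "somewhat centered"

    if not enemy_visible_and_alive or abs(angle_to_enemy_deg) > CENTER_TOLERANCE:
        # Enemy off screen, dead/falling, or far from center: spin left, hold fire.
        return [1.0, 0.0, 0.0, -1.0]

    # Enemy near the center: aim at it, jitter the turn slightly to scatter the
    # shots, and fire (binary output driven to its extreme value, +1.0).
    turn = max(-1.0, min(1.0, angle_to_enemy_deg / CENTER_TOLERANCE))
    turn += random.uniform(-0.1, 0.1)
    return [turn, 0.0, 0.0, 1.0]
```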

Figure 5.2: An in-game screenshot of an agent looking at the enemy opponent in the map. The left side of the screen shows the Quake II game screen and the right side shows what the agent actually sees through its retina.

5.4 Training

I ran two different tests in this experiment: in one test I use only neuroevolution for the learning; in the other, I use Lamarckian neuroevolution with backpropagation for adaptation. The setup of the two tests is exactly the same, except that each individual in the latter test is given 12 seconds to backpropagate against the hand-coded controller before its fitness is tested. During these twelve seconds the hand-coded controller completely controls the agent. The neural network controller, which consists of the weights given by the chromosome as dispensed by the QGA, is given the visual inputs of the agent as the agent performs the hand-coded control. The neural network outputs what it would do to control the agent for each frame of gameplay, 40 per second, and the error is backpropagated with a learning rate of 0.0001. The weights are permanently modified throughout the backpropagation; after the backpropagation the updated chromosome remains static, is tested for 24 seconds, and is returned to the QGA. Each individual chromosome is tested according to this process:

1. The learning agent appears in the room and drops to the floor.

2. If the agent is using backpropagation, it imitates the hand-coded controller for 12 seconds.

3. The agent is given 24 seconds to kill as many enemy 'bots' as it can; kills are counted.

4. Whenever an enemy is killed, it promptly respawns at some random location in the room.

5. After the 24 seconds the current learning agent is removed and the fitness for its chromosome is reported.

The fitness is based on the number of kills achieved in the 24 second testing period; in order to increase selection pressure, I use (5n)^2, where n is the number of kills. Because there is some time delay between the death of the enemy and its reappearance, the maximum number of kills possible in the 24 second period is about 12. In this research, as in my previous research, I shape the enemy's speed, increasing it whenever the average fitness of the agent's population reaches a certain point.
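Putting the procedure and fitness function together, the evaluation of one individual in the Lamarckian tests looks roughly like the sketch below. The frame rate (40 frames per second), learning rate (0.0001), imitation time (12 seconds), test time (24 seconds), and the fitness (5n)^2 are taken from the text; the game, network, and supervisor interfaces are hypothetical placeholders, not the actual Quake2AI calls.

```python
FPS = 40
LEARNING_RATE = 0.0001
IMITATION_SECONDS = 12
TEST_SECONDS = 24

def evaluate_individual(chromosome, game, network, supervisor, use_backprop=True):
    """Evaluate one chromosome: optional Lamarckian imitation phase, then fitness test.
    `game`, `network`, and `supervisor` are hypothetical interfaces."""
    weights = network.decode(chromosome)

    if use_backprop:
        # Lamarckian phase: the supervisor drives the agent while the visual
        # network is corrected toward the supervisor's outputs every frame.
        for _ in range(IMITATION_SECONDS * FPS):
            retina = game.get_retina_inputs()
            target = supervisor.act(game.get_state())   # non-visual "cheating" controller
            game.apply_actions(target)                   # the supervisor controls the agent
            prediction = network.forward(weights, retina)
            network.backprop(weights, retina, prediction, target, LEARNING_RATE)

    # Fitness phase: the now-static network controls the agent on its own.
    kills = 0
    for _ in range(TEST_SECONDS * FPS):
        retina = game.get_retina_inputs()
        game.apply_actions(network.forward(weights, retina))
        kills += game.pop_kill_events()

    fitness = (5 * kills) ** 2                           # selection-pressure shaping
    return network.encode(weights), fitness              # modified weights return to the QGA
```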

5.5 Results

I tested the two learning schemes for 1000 generations each. The tests that learned using backpropagation for Lamarckian neuroevolution were much more successful than the tests that used only neuroevolution. The right of figure 5.3 shows the average fitness of the 24 populations that used backpropagation mixed with neuroevolution. In this graph we see that the fitness jumped up quickly, and by the 75th generation the fitness was high enough that the enemy's speed began to be incrementally raised. By the end, the average enemy speed of the 24 tests is a little over half of the enemy's full speed. Comparatively, the left of figure 5.3 shows the average fitness for the 24 populations that used only neuroevolution. In 21 of the 24 neuroevolution-only tests the average fitnesses of the populations never rose high enough to increase the speed of the enemy past zero, whereas in all 24 of the Lamarckian tests the enemy's speed increased past zero. The dim lighting and varying shadows make this problem extremely difficult for my simple retina controller. The 14x2 retina does not appear to provide adequate information for distinguishing the darkness of the enemy from the darkness of the shadows. Observation of the agents developed by the neuroevolution-only controller shows that the agents converge prematurely on a "sprinkler" solution: they turn around the room, sprinkling it in a pattern that is somewhat likely to hit the enemy; the enemy itself is ignored and no special behavior is exhibited when the enemy appears in view.

Figure 5.3: Average of the average fitnesses of the 24 populations that used plain neuroevolution and Lamarckian neuroevolution. The top dark line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the fitness reached a certain point.

The tests that used backpropagation as a learning aid, however, could not get stuck in this premature local optimum because the agents were always being influenced to imitate the hand-coded controller. After some generations of backpropagation and evolution, the controllers began to learn to react to the enemy's appearance. By the later generations, the agents behaved very similarly to the hand-coded bot: they shot bursts of blaster fire at the enemy and would even track the enemy as it moved; they did not fire their blaster unless the enemy was in view, except at an occasional shadow. The fitnesses of the best individuals from the Lamarckian controllers were comparable to the best fitnesses of the hand-coded bot, ranging between 7 and 9 kills per turn; the average fitness of the visual controller, however, was much lower at about 3.5 kills per turn, whereas the average fitness of the hand-coded controller was about 7 kills per turn. Since luck can play a large role in earning the "best" fitness, the average fitnesses are generally a more accurate representation of the abilities of the Lamarckian visual versus hand-coded controllers. One disadvantage of adding backpropagation to neuroevolution is that it requires extra time for the backpropagation training. In this experiment it added an extra 12 seconds of training to every individual; each took a total of 36 seconds instead of 24 seconds, i.e. 150% of the original evaluation time.

In the time that it would take for the neuroevolution-only test to finish its 1000th generation, the test using backpropagation would only be at about its 666th generation (1000 generations x 24 s is roughly 666 generations x 36 s). We can see by comparing the graphs that the fitness of the Lamarckian test is still much better at the 666th generation than the neuroevolution-only test at the 1000th generation. Another disadvantage of the backpropagation is that the population will only become as successful as the hand-coded controller which it is imitating. Because the controller is designed by hand and is not using the visual field, it may not be an optimal controller for supervision. This problem can likely be solved by decreasing the effect or time of backpropagation as the evolution proceeds. For example, in the beginning longer periods of backpropagation could be used to quickly boost the behavioral fitness; then, as the fitness increases, the time spent on backpropagation could slowly drop until eventually there is no backpropagation, and the evolution can continue to perfect the controller for visual control without any influence from the hand-coded controller. Using backpropagation by itself was an option for this research, but distributing backpropagation across multiple computers on a network is a difficult problem. If we were to match the running computational time of a population from the neuroevolution-only test, running only a single backpropagating client, it would require over 2000 days of computation. There are some techniques to distribute backpropagation, such as that of Chen et al. [7], but the techniques are complicated and do not exactly model pure backpropagation. Using the distribution of the QGA to evolve the controllers in combination proved to be an easy and fast solution that could utilize our large number of computers, and that also had the advantage of the incorporated evolutionary learning.

5.6 Conclusion

In this research I compared two learning methods: one used neuroevolution only and one used neuroevolution with backpropagation for Lamarckian adaptation. The controller used by both tests was the same graduated density retina and neural network combination from the previous experiment. Both tests evolved the weights of the network for 1000 generations, but the Lamarckian neuroevolution test also used backpropagation on every individual to persuade it to imitate a hand-coded controller. The hand-coded controller was programmed using non-visual game data, such as the enemy's exact location, and controlled the agent to shoot at the enemy. During the backpropagation, the hand-coded controller controlled the agent while the neural network controller learned to imitate it. The task learned was to shoot a moving enemy in a visually complex environment. The environment was a square room with dim lighting and varying shadows. The enemy's speed increased throughout the evolution, after the average fitness of a population reached a certain value. Fitness was awarded according to the number of kills during a 24 second test period. Results showed that using Lamarckian neuroevolution was much more successful than using neuroevolution alone. The neuroevolution-only tests learned a premature solution that ignored the appearance of the enemy and merely sprinkled bullets around the room in a semi-deadly pattern. The tests that used Lamarckian neuroevolution learned to imitate the hand-coded controller using only the visual inputs, and their fitnesses were accordingly more successful. This research shows that backpropagation can be used with a hand-coded non-visual controller and mixed with neuroevolution to learn visual-only control in a visually complex environment. Moreover, this research emphasizes the general idea that an effective controller that uses high-level inputs can be used with backpropagation to teach a controller that uses lower-level inputs. This could also be useful for training non-visual agents, such as a robot that learns to use its infrared distance sensors to navigate by imitating a higher-level hand-coded controller which uses some higher-level triangulation of WiFi signals to determine its location; in the final product the controller might not be able to use WiFi to pinpoint its location, but to train the lower-level sensors it can "cheat" and use WiFi triangulation.

Chapter 6

Experiment 3: Lamarckian Neuroevolution Without Human Supervision

6.1 Introduction

In my last experiment, I trained a neurovisual controller by using a combination of neuroevolution and backpropagation in a Lamarckian evolution learning scheme. The controllers were evolved according to a fitness that measured how many times they were able to shoot a moving enemy in a visually complex room. The agent's controller, which used purely raw visual inputs, backpropagated against a hand-coded non-visual controller which performed the desired behavior. I also trained controllers that used pure neuroevolution for comparison. I found that the neuroevolution-only controllers were unable to learn a satisfactory solution to the problem, converging upon a sprinkler-like strategy that paid little or no attention to the location of the enemy. The Lamarckian controller, by backpropagating against the non-visual hand-coded controller, was able to imitate the behavior of the non-visual controller, and learned to attack the enemy whenever he appeared on screen, using only visual inputs. A problem with my previous Lamarckian experiment was that it required a priori knowledge about the solution of the problem in order to program the non-visual controller. In cases where the optimal behavior is unknown, a human's guess will often fall short, and may even hinder the evolution by always limiting it to backpropagate against a sub-optimal solution. In this experiment, I remove that human element by evolving, rather than hand-coding, the non-visual controller; it is then used in Lamarckian evolution to train the visual controller.

6.2 Experiment Setup

For this experiment I have parted from the simple open room used in my previous research [33][34] and have inserted a large square pillar into the midst of it (figure 6.1). The room is the same size as in my previous research, with the same style of shading, and the task is also the same: shoot and kill the enemy as many times as possible. The pillar in the middle of the room occludes the player's view of most of the map, so oftentimes the enemy is not visible and the hallways around the pillar must be traversed in order to find him. Because of this new setup, the agent can no longer use its previous optimal behavior, which was to spin around in circles shooting bursts of shots at the enemy. In this map I place the spawn portals for the agent over the empty hallways surrounding the pillar, so that he drops to the floor and can immediately begin hunting for the enemy. The enemy's spawn portals are over the top of the pillar, and he is programmed to automatically walk to the opposite side of wherever the agent happens to be. Once the enemy drops to the hallway floor, it begins moving about in a random pattern. Other than the pillar placed in the center of the room, the environment is mostly the same as in the previous experiment. The agent's weapon still features the ability to fire bursts of shots. The controller again consists of a 28 input, 10 hidden, and 4 output neural network which feeds the hidden layer activations back as additional inputs on the next frame, and which has a 1.0 bias unit as input to both the hidden and output layers. The neural network again uses the 14x2 graduated density retina (figure 4.2) as used in the previous experiments. The weights of the network are represented as a chromosome and evolved with a QGA [35].
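For concreteness, one frame of this recurrent controller might be computed as in the sketch below. The 28-10-4 topology, the recurrent hidden/context connections, and the 1.0 bias input to both the hidden and output layers are from the text; the tanh activation is an assumption, since the transfer function is not restated here, and the weight names follow the chromosome sketch in chapter 5.

```python
import numpy as np

def forward(weights, retina_inputs, context):
    """One frame of the 28-input, 10-hidden, 4-output simple recurrent network.
    `weights` holds w_in, w_ctx, w_out, b_hid, b_out; `context` is the previous
    frame's hidden activations."""
    # Hidden layer sees the retina, the previous hidden activations, and a 1.0 bias.
    hidden = np.tanh(weights["w_in"] @ retina_inputs
                     + weights["w_ctx"] @ context
                     + weights["b_hid"] * 1.0)
    # Output layer: turn, strafe, forward/back, shoot (also fed the 1.0 bias).
    outputs = np.tanh(weights["w_out"] @ hidden + weights["b_out"] * 1.0)
    return outputs, hidden   # `hidden` becomes the next frame's context
```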

Figure 6.1: An in-game screenshot of the environment used in this experiment. The floor and ceilings are brown, the walls are gray, and the enemy is dark blue. The room is dimly lit with varying shadows. A large square pillar is placed in the center of the room. The display of the shooter’s own weapon has also been removed.

6.3 Supervising Bots for Backpropagation

In order to perform backpropagation learning, the agent must have some supervising behavior to imitate. As explained in the introduction of this thesis, many real-time backpropagation experiments in FPSs involve imitating pre-recorded human behaviors. A problem with using human gameplay is that humans are not very behaviorally consistent, and it is a hassle to build up a large library of recorded human players; moreover, unless they are particularly good, human players perform less than optimally. Instead of using a human player, my first solution was to use a hand-coded controller that would tell the neural network what it should have done for every frame. If I were able to program this supervising controller to do exactly what I wanted using the same inputs as the visual controller, then I would not have much need to evolve a neural network that does the same thing with the same inputs. However, hand-coding visual controllers is particularly difficult, so instead I "cheat" and use non-visual inputs, like the enemy's exact X and Y location, to easily hand-code a controller that does what I want. It is permissible to let the supervising controller use non-visual inputs because in the end I still end up with a visual-only controller, but it will have learned its behavior from a non-visual controller. In my previous Lamarckian research, I hand-coded the non-visual controller, which worked well because I already had a good idea of the optimal solution to the problem, which I learned by observing the solutions evolved by neuroevolution in the retinal layout experiment, which used a room with no shadows. For the pillar room, I did not know the optimal solution, so rather than making a hand-coded controller, I evolved a neural network that used non-visual data for inputs. The non-visual neural network controller is a simple recurrent network with a hidden layer. There are 11 non-visual inputs: 7 are wall distance sensors at 0 degrees and at 10, 25, and 60 degrees on either side; there are 4 enemy inputs which tell the x and y distance and x and y velocities of the enemy, relative to the agent's heading angle. Whenever the enemy is not on screen, is occluded behind the pillar, or is not near the center of the field of view, since some input must be given to the network, fictional inputs are used that indicate an imaginary enemy who is very far away. It is important that the non-visual controller cannot use information that is not somehow accessible to the visual controller, so that the non-visual controller does not attempt to train the visual controller to do the impossible. The non-visual controller has 4 outputs that are exactly the same as the visual controller's, so the outputs can easily be used to teach the visual controller through backpropagation.
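The supervisor's input encoding can be sketched as follows. The seven wall-sensor angles, the four relative enemy values, and the substitution of a fictional far-away enemy when the real one is off screen, occluded, or far from the view center come from the text; the query functions and the placeholder distance value are hypothetical.

```python
import numpy as np

WALL_SENSOR_ANGLES = [-60, -25, -10, 0, 10, 25, 60]   # degrees relative to heading
FAR_AWAY = 10000.0                                     # assumed "fictional enemy" distance

def supervisor_inputs(game, agent):
    """Build the 11 non-visual inputs: 7 wall distances plus 4 relative enemy values."""
    walls = [game.trace_wall_distance(agent, angle) for angle in WALL_SENSOR_ANGLES]

    if game.enemy_visible_and_near_center(agent):
        # Enemy position and velocity, rotated into the agent's heading frame.
        dx, dy = game.enemy_offset_relative_to_heading(agent)
        vx, vy = game.enemy_velocity_relative_to_heading(agent)
    else:
        # Off screen, occluded by the pillar, or far from the view center:
        # feed a fictional, very distant, motionless enemy instead.
        dx, dy, vx, vy = FAR_AWAY, FAR_AWAY, 0.0, 0.0

    return np.array(walls + [dx, dy, vx, vy])
```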

Figure 6.2: An in-game screenshot of an agent looking at the enemy opponent in the map. The left side of the screen shows the Quake II game screen and the right side shows what the agent actually sees through its retina.

6.4 Training

In this experiment I compare the Lamarckian controller to a neuroevolution-only controller. The setup of each test is the same, except that each individual in the Lamarckian tests is given 12 extra seconds to learn through backpropagation to imitate the non-visual supervisory controller before its fitness is tested. During these twelve seconds the non-visual controller completely controls the agent. The visual neural network controller, which consists of the weights given by the chromosome as dispensed by the QGA, is given the visual inputs of the agent as the agent performs the control specified by the non-visual controller. The visual neural network outputs what it would do to control the agent for each frame of gameplay, 40 per second, and the error is backpropagated with a learning rate of 0.0001. The weights are permanently modified throughout the backpropagation; after the backpropagation the updated chromosome remains static, is tested for 24 seconds, and is returned to the QGA. For the supervising controllers, I evolve one non-visual controller for each Lamarckian test, selecting the best individual from the 1000th generation. I then use this supervising non-visual controller for the 12 seconds of backpropagation in the Lamarckian tests, and run those tests for 1000 generations. To accurately compare the neuroevolution-only controller to the Lamarckian controller, we must allow it to evolve for the sum total of the time used to evolve the non-visual supervising controller and the Lamarckian visual controller, as well as the 12 seconds of backpropagation for each individual. This totals 2500 generations of evolution for the neuroevolution-only controller, at 24 seconds per individual (1000 generations at 24 s for the supervisor plus 1000 generations at 36 s for the Lamarckian test is 60,000 s, or 2500 generations at 24 s each). Each individual chromosome is tested as follows:

1. The learning agent appears in the room and drops to the floor.

2. If the agent is using backpropagation, it imitates the supervising controller for 12 seconds.

3. The agent is given 24 seconds to kill as many enemy ’bots’ as it can; kills are counted.

4. Whenever an enemy is killed, it promptly respawns at some random location in the room.

5. After the 24 seconds the current learning agent is removed and the fitness for its chromosome is reported.

As in my other experiments, the fitness function is (5n)^2, where n is the number of kills.

6.5 Results

I tested 25 different populations for each of the two learning schemes. The Lamarckian controller's non-visual supervisor was also evolved 25 separate times, one for each Lamarckian population. The Lamarckian learning tests performed much better than did the neuroevolution-only tests. The fitness chart of the Lamarckian tests (right of figure 6.3) shows that the enemy's movement speed is up to 27% of full speed by the end of the test; comparatively, the enemy's movement speed is only at 10% in the neuroevolution-only test (left of figure 6.3). The average fitness of the neuroevolution tests gradually reaches just under 250. The average fitness of the Lamarckian test exceeds 250 very early in the tests and wavers around there for the remainder of the testing period. The Lamarckian controller not only achieved statistically higher results; its observable behavior is also superior to, and quite different from, that of the neuroevolution-only tests.

Figure 6.3: Average of the average fitnesses of the 25 populations that used neuroevolution only and that used Lamarckian neuroevolution. The top dark line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the fitness reached a certain point.

Both controllers learned to walk around the pillar in one direction, which seemed to be the optimal movement strategy because the enemy always dropped on the opposite side of whatever location the agent happened to occupy. The neuroevolution controller seems to have learned a "sprinkler" pattern for shooting at the enemy, by which the agent merely sprinkles the hallways in a pattern that is likely to hit any enemies that may be in them. The majority of the Lamarckian agents, however, learned to shoot bursts of blaster fire only when the enemy was nearby. The non-visual supervising controllers easily learned that the best strategy is to save the shots until the enemy appears, then shoot a burst at the enemy, thereby increasing the chance that one of the shots will hit the enemy. These supervising agents pushed the visual Lamarckian controllers towards this strategy until they were able to do likewise. The neuroevolution-only tests, however, became stuck in a "sprinkler" strategy local optimum and had no supervisory controller to push them out. Figure 6.4 shows the average fitness of the non-visual supervisory controllers. The non-visual controllers do much better than either of the visual controllers, and seem to be continually improving. Almost immediately the average fitness of the populations is higher than the highest average fitnesses of the visual tests. I arbitrarily chose to pick the best individual from the 1000th generation, though it appears that I could have picked one from a much earlier generation, since anything above 250 fitness would still be outperforming the visual controllers. However, since the best individual is sometimes just a lucky individual in the earlier generations, choosing the best from a later generation provides a greater chance of the best individual really being skillful.

Figure 6.4: Average of the average fitnesses of the 25 populations of evolved non-visual controllers. The dark top line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the average fitness reached a certain point.

6.6 Conclusion

The research in this paper compared a visual controller trained by neuroevolution only and a visual controller that used a combination of neuroevolution and backpropagation in a Lamarckian learning scheme. Each individual in the Lamarckian test was trained briefly with backpropagation before it was tested by the fitness function; the supervising input for the backpropagation came from an evolved neural network controller that used non-visual inputs, such as distance sensors and the exact enemy location, and which was previously trained by neuroevolution. The controllers' task was to learn to shoot and kill an enemy as many times as possible in a 24 second period in a room with a large central pillar. Both the neuroevolution-only and the Lamarckian tests evolved controllers that learned to walk around the pillar, but they differed in that the Lamarckian tests learned to shoot bursts when the enemy was in view, while the neuroevolution-only tests seemed to ignore the enemy and shot in a random "sprinkler" pattern. The neuroevolution-only tests probably were stuck in a "sprinkler" local optimum, and were unable to learn more complex strategies because the problem was too difficult to learn from scratch with visual inputs. The Lamarckian tests, however, were heavily influenced through backpropagation by the strategy that the supervising controller learned, which it was able to learn from scratch because it used simpler non-visual input; these supervising controllers were able to guide the Lamarckian controllers out of local optima like the "sprinkler" strategy. The most important aspect of this research is the idea of using controllers that use a certain set of inputs to help train controllers that use some other set of inputs. This idea may be particularly useful in real-world robotics as well as in simulations. It may be easier to train successful behavior using one set of inputs, usually higher-level inputs, but it may be required that the final controller use a different set of inputs, usually lower-level inputs. For example, suppose that someone needs a robot that can navigate through some hallways using only laser rangefinders for inputs. It may be too difficult to hand-code or to evolve a successful controller for such low-level inputs; instead, a controller that used some extra information such as GPS and an internal map could be hand-coded or trained to perform successfully, and that new controller could then be used to help train the controller that used the low-level laser rangefinders. In this research it was fortunate that the evolved non-visual supervisory neural network learned a strategy that could transfer well to a visual controller. To make sure that the evolved supervisory controller could teach the visual controller I had to make sure that the non-visual inputs did not use any information that could not be derived from the visual input. For this problem it was not too difficult to limit the supervisory inputs, but it may be very difficult to do for other controllers that have more complex inputs. It would be helpful in the future to devise some system to automatically limit the supervisory controller so that it does not attempt to train the learning controller to do something impossible, given its limited set of inputs. My work is now being directed to find such an automated solution, to further remove requirements of human intuition from this Lamarckian-style learning process.

Chapter 7 Conclusion

In this research I explored various techniques for learning visual control in the Quake II environment. Throughout the experiments I used evolutionary learning with a genetic algorithm, which is a technique that breeds together chromosomes that represent the variables of a solution. A GA uses a fitness function to test the chromosomes, and those with higher fitness have a greater probability of being selected for breeding. In my particular case, I used a Queue Genetic Algorithm, which is a steady-state first-in-first-out GA that is very easily distributable over multiple cores and over a network. I mapped the floating point numbers that made up the chromosomes of the GA to the weights of neural network controllers.

In my first experiment, I tested two different retinal layouts that both mapped to the same neurovisual controller. I grabbed grayscale-averaged color values of rectangular block regions of pixels from the screen. The first retinal layout's input blocks were all the same width across the screen. The second layout somewhat imitated a human retina in that the blocks were of a graduated density, narrower in the center and growing wider toward the periphery; likewise, a human sees the most detail at the center of vision, and vision becomes less detailed toward the periphery. Both of these layouts used the same number of blocks; only the arrangement was different. The controllers were evolved with neuroevolution and were given the task of shooting and killing an enemy as many times as possible in a small, visually simple room. Results showed that the graduated density retina outperforms the uniform density retina, most likely because the most important visual information for Quake II agents is at the center of the screen, which is the direction in which their gun always aims.

In the second experiment I increased the visual complexity of the room in which the agents were tested. Tests that used neuroevolution with the same graduated density controller were unable to adequately learn to solve the problem. Instead of learning to find the enemy and shoot a burst of shots at him, they mostly learned a less effective "sprinkler" pattern that involved shooting constantly in a semi-dangerous pattern as they walked around the room, showing no special preference for shooting the enemy when he came into view. I knew from the previous experiment that the optimal behavior in this test is to spin around quickly in circles until the enemy is spotted, then shoot a burst of shots in his general direction. I used this knowledge to hand-code a controller that uses non-visual input, accessing instead the exact enemy coordinates. I then used this non-visual hand-coded controller as a supervisor in backpropagation. I combined both neuroevolution and backpropagation in a Lamarckian learning scheme; every individual in the population was taught with backpropagation against the non-visual hand-coded controller for a short time before being tested with the fitness function. This hand-coded supervisor was able to lead them away from any "sprinkler" strategy local optima, until they learned to shoot at the enemy when he was in front of the agent's gun, and rarely otherwise. This experiment showed that it can be very beneficial to use higher-level inputs to make a hand-coded controller which can guide, through Lamarckian evolution, the training of a controller that uses lower-level inputs. In order to have a hand-coded controller, there must be a hand to code it.
A human must contrive the behavior that the controller should exhibit and then implement it in code. If the desired behavior is not known, or if it is very complicated, then the programmer will have a difficult or impossible task. If, instead of hand-coding the supervising controller, it is evolved, then the human element is removed entirely, and there is no need to know the proper solution beforehand. Instead, the same fitness function that is used for the low-level visual inputs can be used to train the high-level non-visual supervising controller. In the third experiment, I tested the agents in a room with a large central pillar, around which the agent must find the enemy and kill it. The supervising controller used non-visual inputs, such as enemy coordinate locations and wall sensors, and was evolved using neuroevolution. The best of these controllers were then used to train the visual controllers via a combination of backpropagation and neuroevolution in a Lamarckian-style evolution. I compared these Lamarckian tests against visual controllers that learned by neuroevolution only, and found that the Lamarckian controllers learned faster and outperformed the neuroevolution-only controllers. As in the second experiment, the neuroevolution-only controllers became stuck in a "sprinkler" local optimum, whereas the Lamarckian tests learned to directly attack the enemy, often firing only when the enemy was in front of the gun.

The experiments presented in this thesis are foundational discoveries for further research in the area of neurovisual control. The eventual behavior exhibited by my agents is far from that of a competitive and fun AI opponent, but I have kept my controllers and tasks simple in order to isolate the strategic requirements, so that I could see exactly which techniques work better when training the visual agents. It is my hope that this research will be used in more complicated experiments that use larger retinas and train the agent to do more difficult tasks. My Lamarckian experiments will also be useful for the field of robotics, or for any low-level neural network that can be trained by a higher-level controller.

Bibliography

[1] D.H. Ballard, G.E. Hinton, and T.J. Sejnowski. Parallel visual computation. Nature, November 1983.
[2] S. Baluja. Evolution of an artificial neural network based autonomous land vehicle controller. IEEE Transactions on Systems, Man and Cybernetics, Vol. 26, No. 3, 450-463, June 1996.
[3] C. Bauckhage, C. Thurau, and G. Sagerer. Learning human-like opponent behavior for interactive computer games. Pattern Recognition, volume 2781 of LNCS, pages 148-155. Springer-Verlag, 2003.
[4] K. Bennett and C. Campbell. Support vector machines: hype or hallelujah? ACM SIGKDD Explorations Newsletter, December 2000.
[5] B. Bryant and R. Miikkulainen. Acquiring visibly intelligent behavior with example-guided neuroevolution. Proceedings of the Twenty-Second National Conference on Artificial Intelligence, (AAAI-07), pp. 801-808. Menlo Park, CA: AAAI Press.
[6] C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167, 1998.
[7] Q. Chen, Y. Lai, and J. Han. An implementation for distributed backpropagation using CORBA architecture. Proceedings of the 2006 5th International Conference on Cognitive Informatics, (ICCI 2006), Beijing, China, July 2006.
[8] S. Draghici. A neural network based artificial vision system for licence plate recognition. International Journal of Neural Systems, Feb. 1997.
[9] S. Eickeler, S. Müller, and G. Rigoll. High performance face recognition using pseudo 2-D hidden Markov models. European Control Conference (ECC), 1999.
[10] H. El-Bakry. Automatic human face recognition using modular neural networks. Machine Graphics and Vision, Vol. 10, No. 1, pp. 47-73, 2001.
[11] J. L. Elman. Finding structure in time. Cognitive Science, 14:179-211, 1990.
[12] S. Enrique, A. Watt, F. Policarpo, and S. Maddock. Using synthetic vision for autonomous non-player characters in computer games. 4th Argentine Symposium on Artificial Intelligence, Santa Fe, Argentina, 2002.
[13] R. Feraund, O.J. Bernier, J.-E. Viallet, and M. Collobert. A fast and accurate face detector based on neural networks. Pattern Analysis and Machine Intelligence, IEEE Transactions on, Jan 2001.

[14] D. Floreano, T. Kato, D. Marocco, and E. Sauser. Coevolution of active vision and feature selection. Biological Cybernetics, 90(3), 2004, pp. 218-228.
[15] K. Fukushima. A neural network for visual pattern recognition. Computer, March 1988.
[16] R. Graham, H. McCabe, and S. Sheridan. Neural pathways for real time dynamic computer games. Proceedings of the Sixth Eurographics Ireland Chapter Workshop, ITB June 2005, Eurographics Ireland Workshop Series, Volume 4, ISSN 1649-1807, pp. 13-16.
[17] J. Grefenstette. Lamarckian learning in multi-agent environments. Proceedings of the Fourth International Conference on Genetic Algorithms, 303-310, San Mateo, CA, 1991.
[18] K. Grill-Spector and R. Malach. The human visual cortex. Annual Review of Neuroscience, Vol. 27: 649-677, July 2004.
[19] G. Guo, S. Li, and K. Chan. Face recognition by support vector machines. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, March 2000.
[20] N. Intrator, D. Reisfeld, and Y. Yeshurun. Face recognition using a hybrid supervised/unsupervised neural network. Pattern Recognition Letters 17, July 1996.
[21] T.M. Jochem, D.A. Pomerleau, and C.E. Thorpe. Vision-based neural network road and intersection detection and traversal. Intelligent Robots and Systems 95, 'Human Robot Interaction and Cooperative Robots', Proceedings, 1995 IEEE/RSJ International Conference on, Aug 1995.
[22] N. Kohl, K. Stanley, R. Miikkulainen, M. Samples, and R. Sherony. Evolving a real-world vehicle warning system. Proceedings of the Genetic and Evolutionary Computation Conference 2006, pp. 1681-1688, July 2006.
[23] K. Ku, M. Mak, and W. Sui. A study of the Lamarckian evolution of recurrent neural networks. IEEE Transactions on Evolutionary Computation, 4:31-42, 2000.
[24] J.E. Laird. Research in human-level AI using computer games. Communications of the ACM, January 2002.
[25] J.E. Laird and M. Van Lent. Human-level AI's killer application. AI Magazine, Summer 2001.
[26] J.-B. Lamarck. Philosophie Zoologique, 1809.
[27] S. Lawrence, C. Giles, A. Tsoi, and A. Back. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, Special Issue on Neural Networks and Pattern Recognition, Volume 8, Number 1, 1997.
[28] I. Masaki. Vision-based vehicle guidance. Industrial Electronics, Control, Instrumentation, and Automation, 1992. Proceedings of the 1992 International Conference on Power Electronics and Motion Control, Nov 1992.
[29] P. Michel and R. El Kaliouby. Real time facial expression recognition in video using support vector machines. Proceedings of the 5th International Conference on Multimodal Interfaces (ICMI), Vancouver, British Columbia, Canada, November 2003.

[30] B. Moghaddam, T. Jebara, and A. Pentland. Bayesian face recognition. Pattern Recognition, November 2000.
[31] A. Nefian and M. Hayes. Hidden Markov models for face recognition. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing.
[32] S. Park, K. Kim, K. Jung, and H. Kim. Locating car license plates using neural networks. Electronics Letters, Aug 1999.
[33] M. Parker and B. Bryant. Neuro-visual control in the Quake II game engine. Proceedings of the 2008 International Joint Conference on Neural Networks, (IJCNN 2008), Hong Kong, June 2008.
[34] M. Parker and B. Bryant. Lamarckian neuroevolution for visual control in the Quake II environment. Proceedings of the 2009 International Conference on Evolutionary Computation, (CEC 2009), Trondheim, Norway, May 2009.
[35] M. Parker and G. Parker. Using a queue genetic algorithm to evolve xpilot control strategies on a distributed system. Proceedings of the 2006 IEEE Congress on Evolutionary Computation, (CEC 2006), Vancouver, BC, Canada, July 2006.
[36] D. Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation, Vol. 3, No. 1, 1991.
[37] M. Pontil and A. Verri. Support vector machines for 3D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, June 1998.
[38] S. Priesterjahn, O. Kramer, A. Weimer, and A. Goebels. Evolution of human-competitive agents in modern computer games. Proceedings of the 2006 IEEE Congress on Evolutionary Computation, (CEC 2006), Vancouver, BC, Canada, July 2006.
[39] J. Rehg, K. Murphy, and P. Fieguth. Vision-based speaker detection using Bayesian networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999.
[40] O. Renault, N. Magnenat-Thalmann, and D. Thalmann. A vision-based approach to behavioural animation. Journal of Visualization and Computer Animation, Vol. 1, No. 1, 1990, pp. 18-21.
[41] D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, Cambridge, MA: MIT Press, 318-362, 1986.
[42] T. Starner and A. Pentland. Real-time American Sign Language recognition from video using hidden Markov models. Computational Imaging and Vision, 1997.
[43] S. Thorpe, D. Fize, and C. Marlot. Speed of processing in the human visual system. Nature, June 1996.
[44] C. Thurau, C. Bauckhage, and G. Sagerer. Learning human-like movement behavior for computer games. Proc. Int. Conf. on the Simulation of Adaptive Behavior, pages 315-323. MIT Press, 2004.
[45] D. Whitley, S. Dominic, R. Das, and C.W. Anderson. Genetic reinforcement learning for neurocontrol problems. Machine Learning, 13:259-284, 1993.

[46] S. Zanetti and A. El Rhalibi. Machine learning techniques for first person shooter in Quake3. International Conference on Advances in Computer Entertainment Technology, ACE2004, 3-5 June 2004, Singapore.
[47] L. Zhao and C. Thorpe. Stereo- and neural network-based pedestrian detection. IEEE Transactions on Intelligent Transportation Systems, September 2000.