Neurovisual Control in the Quake II Environment
University of Nevada, Reno

Neurovisual Control in the Quake II Environment

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science.

by Matt Parker
Dr. Bobby D. Bryant, Thesis Advisor
August 2009

THE GRADUATE SCHOOL

We recommend that the thesis prepared under our supervision by MATT PARKER entitled Neurovisual Control in the Quake II Environment be accepted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE.

Bobby D. Bryant, Advisor
Kostas E. Bekris, Committee Member
Jennifer Mahon, Graduate School Representative
Marsha H. Read, Ph.D., Associate Dean, Graduate School
August, 2009

Abstract

An enormous variety of tasks and functions can be performed by humans using only a two-dimensional visual array as input. If an artificial intelligence (AI) controller could adequately harness the great amount of data that humans readily extract from visual input, then a large number of robotics and AI problems could be solved using a single camera as input. First-person shooter computer games that provide realistic-looking graphics can be used to test visual AI controllers that can likely be used for real-life robotics. In this research, the computer game Quake II is used to test and make improvements to visual neural network controllers. Neural networks are promising for visual control because they can parse raw visual data and learn to recognize patterns. Their computational time can also be much faster than that of complex mathematical visual algorithms, which is essential for real-time applications. In the first experiment, two different retinal layouts are connected to the same type of neural network: one retina imitates a human's clear-center/blurred-periphery vision, and the other uses uniform acuity.
In the second experiment, a Lamarckian learning scheme is devised that uses a hand-coded non-visual controller to help train agents through a mixture of backpropagation and neuroevolution. Lastly, the human element is completely removed from the Lamarckian scheme by replacing the hand-coded non-visual controller with an evolved non-visual neural network. The learning techniques in this research are all successful advances in the field of visual control and can be applied beyond Quake II.

Acknowledgements

This work was supported in part by NSF EPSCoR grant EPS-0447416. Quake II is a registered trademark of Id Software, Inc., of Mesquite, Texas.

Contents

Abstract
Acknowledgements
List of Figures
1 Introduction
  1.1 Motivation
  1.2 Overview
2 Background
  2.1 Genetic Algorithms
    2.1.1 Standard Genetic Algorithm
    2.1.2 Queue Genetic Algorithm
  2.2 Neural Networks
    2.2.1 Backpropagation
    2.2.2 Neuroevolution
  2.3 Computer Vision
    2.3.1 Hidden Markov Models and Bayesian Networks
    2.3.2 Support Vector Machines
    2.3.3 Neural Networks
  2.4 First-Person Shooters as an AI Research Platform
  2.5 Synthetic Vision vs. Raw Vision
3 The Quake II Environment
  3.1 Original Game
  3.2 Quake2AI Interface
4 Experiment 1: Graduated vs. Uniform Density Retina
  4.1 Introduction
  4.2 Experimental Setup
  4.3 Neuro-Visual Controller
  4.4 Training
  4.5 Results
  4.6 Conclusion
5 Experiment 2: Lamarckian Neuroevolution
  5.1 Introduction
  5.2 Experiment Setup
  5.3 Hand-Coded Bot for Backpropagation
  5.4 Training
  5.5 Results
  5.6 Conclusion
6 Experiment 3: Lamarckian Neuroevolution Without Human Supervision
  6.1 Introduction
  6.2 Experiment Setup
  6.3 Supervising Bots for Backpropagation
  6.4 Training
  6.5 Results
  6.6 Conclusion
7 Conclusion

List of Figures

2.1 The Queue Genetic Algorithm (QGA).
New individuals are bred from parents chosen from the current population by roulette-wheel selection. After a new individual is tested, it is placed at the beginning of the queue and the oldest individual is discarded.

2.2 A single-layer perceptron network. Each of the 22 inputs is connected by a weight to each of the 3 outputs.

3.1 An in-game screenshot from the game Quake II.

4.1 A screenshot of the simple room used for this experiment. The ceilings are a dark brown texture, the walls are a gray texture, and the floors are white. The enemy is dark blue and contrasts with the light-colored walls and floors. The trail of dots indicates the bolts from the learning agent's blaster.

4.2 Left: A scene as rendered by the game engine. Right: The same scene as viewed via the graduated density retina. The black tics above and below the image indicate the retinal block widths.

4.3 A view via the uniform density retina. The enemy's location and distance are similar to the view in figure 4.2. The contrast between the enemy and the walls and floor is much less distinct in the uniform retina than in the graduated density retina, because of the increased area averaged into the visual block where the enemy appears.

4.4 Diagram of the controller network, with 28 visual inputs, 10 recurrent hidden-layer neurons, and 4 outputs.

4.5 The population fitness averaged over 24 independent runs, tested using the uniform density retina (figure 4.3) and the graduated density retina (figure 4.2). The dashed bottom line indicates the increased movement of the enemy, where 300 is the maximum possible enemy movement speed.

5.1 An in-game screenshot of the environment used in this experiment. The floor and ceilings are brown, the walls are gray, and the enemy is dark blue. The room is dimly lit with varying shadows. The display of the shooter's own weapon has also been removed. (Cf. figure 4.1.)

5.2 An in-game screenshot of an agent looking at the enemy opponent in the map.
The left side of the screen shows the Quake II game screen and the right side shows what the agent actually sees through its retina.

5.3 Average of the average fitnesses of the 24 populations that used plain neuroevolution and Lamarckian neuroevolution. The top dark line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the fitness reached a certain point.

6.1 An in-game screenshot of the environment used in this experiment. The floor and ceilings are brown, the walls are gray, and the enemy is dark blue. The room is dimly lit with varying shadows. A large square pillar is placed in the center of the room. The display of the shooter's own weapon has also been removed.

6.2 An in-game screenshot of an agent looking at the enemy opponent in the map. The left side of the screen shows the Quake II game screen and the right side shows what the agent actually sees through its retina.

6.3 Average of the average fitnesses of the 25 populations that used neuroevolution only and that used Lamarckian neuroevolution. The top dark line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the fitness reached a certain point.

6.4 Average of the average fitnesses of the 25 populations that evolved non-visual controllers. The dark top line shows the fitness according to the number of kills, and the dashed line shows the enemy's speed, which increased whenever the average fitness reached a certain point.

Chapter 1

Introduction

1.1 Motivation

Humans with normal vision use their visual system to help them live and survive in the world. Even if all other senses, such as taste, touch, hearing, and smell, are removed, a human can still accomplish a large variety of tasks.
For example, a human can remotely control a military drone using only a streaming two-dimensional camera image taken from the nose of the airplane; likewise, cars or other vehicles can be controlled with just visual input. Doctors are able to operate remotely on patients using a robotic scalpel system and a camera image. Computer games usually only require that a player view the graphical data presented on a 2-D screen, yet there are computer games and simulations for almost every interesting human action, all of which can be performed by using the visual screen to reap the game information.

Artificial intelligence systems that utilize visual input can use a single camera for input. Vehicles could be controlled entirely or in part by an AI looking through a camera [28][21]. AI agents in computer games could also be created that would see exactly what a human would see, so that the AI could not cheat by directly reading the game environment information [33]. It is the goal of this research to improve the capabilities of AI in real-time visual control and to further realize the benefits of such visual AI.

1.2 Overview

In this research, AI controllers are created that are able to play a computer game. Computer games are good test-beds for AI controllers because they provide cheap, robust, and well-tested simulations that are generally non-deterministic in nature [25][24]. They often allow for customization of the game by changing models, maps, and game rules. Moreover, AI controllers used in games can easily provide a direct comparison to a human's ability to play the same game, often, in the case of multi-player games, by allowing the AI and the humans to compete directly.
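The Queue Genetic Algorithm summarized in figure 2.1 (a steady-state GA in which children bred by roulette-wheel selection are pushed onto the front of a queue while the oldest individual is discarded) can be sketched as follows. This is a minimal illustrative sketch, not the thesis implementation; the names `roulette_select`, `qga_step`, `breed`, and `evaluate`, and the toy numeric genome, are all assumptions introduced here.

```python
import random
from collections import deque

def roulette_select(individuals, fitnesses):
    # Fitness-proportionate ("roulette wheel") selection: an individual's
    # chance of being picked is its share of the total fitness.
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(individuals, fitnesses):
        acc += fit
        if acc >= pick:
            return ind
    return individuals[-1]  # guard against floating-point shortfall

def qga_step(queue, breed, evaluate):
    # One steady-state step: breed a child from two roulette-selected
    # parents, test it, place it at the beginning of the queue, and
    # discard the individual at the end (the oldest slot).
    individuals = [ind for ind, _ in queue]
    fitnesses = [fit for _, fit in queue]
    parent_a = roulette_select(individuals, fitnesses)
    parent_b = roulette_select(individuals, fitnesses)
    child = breed(parent_a, parent_b)
    queue.appendleft((child, evaluate(child)))
    queue.pop()

# Toy usage: genomes are single numbers, fitness is the genome value.
random.seed(0)
breed = lambda a, b: (a + b) / 2 + random.gauss(0, 0.05)  # average + mutation
evaluate = lambda g: max(g, 1e-6)  # keep fitness positive for the wheel
queue = deque((g, evaluate(g)) for g in [0.5, 0.6, 0.7, 0.8, 0.9])
for _ in range(100):
    qga_step(queue, breed, evaluate)
print(len(queue))  # the population size is invariant under each step
```

Unlike a generational GA, this scheme never replaces the whole population at once, which suits the thesis setting where individuals are evaluated one at a time in a live game server.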