Using Computer Vision Techniques to Play an Existing Video Game
CALIFORNIA STATE UNIVERSITY SAN MARCOS

PROJECT SIGNATURE PAGE

PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE MASTER OF SCIENCE IN COMPUTER SCIENCE

PROJECT TITLE: Using Computer Vision Techniques to Play an Existing Video Game
AUTHOR: Christopher E. Erdelyi
DATE OF SUCCESSFUL DEFENSE: May 6, 2019

THE PROJECT HAS BEEN ACCEPTED BY THE PROJECT COMMITTEE IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE.

Xin Ye          PROJECT COMMITTEE CHAIR          SIGNATURE          DATE
Shaun-bill Wu          PROJECT COMMITTEE MEMBER          SIGNATURE          DATE

Using Computer Vision Techniques to Play an Existing Video Game

Presented to the faculty of the College of Science and Mathematics at California State University, San Marcos

Submitted in partial fulfillment of the requirements for the degree of Master of Science

Christopher Erdelyi
[email protected]
March 2019

Abstract

Game-playing algorithms are commonly implemented in video games to control non-player characters (hereafter, “NPCs”) in order to provide a richer or more competitive game environment. However, directly programming opponent algorithms into the game can cause the game-controlled NPCs to become predictable to human players over time. This can negatively impact player enjoyment and interest in the game, especially when the algorithm is supposed to compete against human opponents. To extend the revenue-generating lifespan of a game, the developers may wish to continually refine these algorithms, but each update would need to be downloaded to every player’s installed copy of the game. It would therefore be beneficial for the game-playing algorithm to run independently of the game itself, on a server that the developers can easily access and update. Furthermore, the same basic setup could be reused across many of the developer’s games by using computer vision to determine game state, rather than title-specific Application Program Interfaces (hereafter, “APIs”).

In this paper, we propose a method for playing a racing game using computer vision, controlling the game only through the inputs available to human players. Using the Open Source Computer Vision Library (hereafter known by its common name, “OpenCV”) to take screenshots of the game and apply various image processing techniques, we represent the game world in a form that an external driving algorithm can process. The driving algorithm then makes decisions based on the state of the processed image and sends inputs back to the game via keyboard emulation. The driving algorithm created for this project was tuned through more than 50 separate adjustments, and it was run multiple times at each adjustment to measure how far the player’s vehicle could travel before crashing or stalling. These results were then compared against a set of baseline tests in which random input was used to steer the vehicle. The results show that our computer vision-based approach is indeed promising and, if enhanced, could be used to compete successfully against human players.
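As a concrete illustration of the closed-loop pipeline summarized above (capture the emulator display, reduce the screenshot with OpenCV, let a driving algorithm choose an action, and emit that action as a keypress), the sketch below shows one way such a loop could be wired together. It is a minimal sketch under stated assumptions, not this project's implementation: Python with the opencv-python, numpy, mss, and pynput packages, the GAME_REGION coordinates, and the placeholder choose_action policy are all hypothetical choices introduced here for illustration.

```python
# Minimal sketch of the capture -> process -> decide -> act loop described in the
# abstract. The libraries (opencv-python, numpy, mss, pynput), screen region, and
# decision policy are assumptions for illustration, not the project's actual code.
import time

import cv2
import numpy as np
from mss import mss
from pynput.keyboard import Controller, Key

# Hypothetical location of the emulator window on the desktop.
GAME_REGION = {"top": 100, "left": 100, "width": 640, "height": 480}
keyboard = Controller()


def process_frame(frame_bgr):
    """Reduce a raw screenshot to line segments (grayscale -> threshold -> Canny
    -> Gaussian blur -> Hough), mirroring the processing chain shown in Figure 6."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    edges = cv2.Canny(binary, 50, 150)
    blurred = cv2.GaussianBlur(edges, (5, 5), 0)
    return cv2.HoughLinesP(blurred, 1, np.pi / 180, threshold=50,
                           minLineLength=20, maxLineGap=10)


def choose_action(lines):
    """Placeholder policy: steer away from the side with more detected segments.
    The real driving algorithm is the subject of Section 5.5."""
    if lines is None:
        return Key.up  # nothing detected; keep accelerating
    mid = GAME_REGION["width"] / 2
    left = sum(1 for seg in lines if (seg[0][0] + seg[0][2]) / 2 < mid)
    right = len(lines) - left
    return Key.left if right > left else Key.right


with mss() as screen:
    while True:                                              # closed-loop cycle (Figure 5)
        shot = np.array(screen.grab(GAME_REGION))            # capture emulator display
        frame = cv2.cvtColor(shot, cv2.COLOR_BGRA2BGR)       # mss returns BGRA pixels
        action = choose_action(process_frame(frame))         # decide the next input
        keyboard.press(action)                               # emulate a keypress
        time.sleep(0.05)
        keyboard.release(action)
```

In the project itself, the image processing chain (grayscale conversion, thresholding, Canny edge detection, Gaussian blurring, and Hough lines) and the driving decision logic are described in Sections 3.2, 5.3, and 5.5, and those are the components that were tuned in the experiments of Section 6.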
Acknowledgements

Thank you, Dr. Ye, for your suggestions on computer vision and driving algorithm design, and for guiding me throughout the research project. I also thank my family, friends, and coworkers for their patience and support while I completed the Master's program.

Table of Contents

List of Abbreviations and Definitions
1. Introduction and Background
2. Related Work
   2.1 DeepMind: Capture the Flag
   2.2 OpenCV: Grand Theft Auto V
3. Program Flow Explanation and Diagrams
   3.1 Overall Process Flow for Experiment
   3.2 OpenCV Image Manipulation: Functions and Order of Operations
   3.3 Visual Analysis Processing Steps
4. Hardware and Software Utilized
5. Approach and Implementation
   5.1 Approach
   5.2 Capture Emulator Display
   5.3 Overlay Mask on Screen Capture
   5.4 Examine Processed Image
   5.5 Driving Algorithm Chooses Next Input Action
   5.6 Emulate System Keypresses
6. Experimental Results
   6.1 Setup and Baseline Results
   6.2 Driving Algorithm Tuning: Iterative Results
   6.3 Experiment Results on Static Drive Algorithm Configuration
   6.4 Driving Behavior
7. Conclusion and Future Work
References
External Figure References

Table of Figures

Figure 1. Screengrab of a video demo for DeepMind playing Capture the Flag. [External Figure 1]
Figure 2. Canny edge detection on GTA V. [External Figure 2]
Figure 3. Hough lines on a GTA V image. [External Figure 3]
Figure 4. Lane marker overlay in GTA V. [External Figure 4]
Figure 5. Closed-loop process cycle.
Figure 6. OpenCV processing steps for emulator screenshots.
Figure 7. Visual analysis steps.
Figure 8. Grayscale image conversion.
Figure 9. Threshold function generated black and white image.
Figure 10. Processed game image after Canny edge detection is applied.
Figure 11. Processed game image after Gaussian blurring has been applied to Canny edges.
Figure 12. Processed game image with Hough lines.
Figure 13. Processed game image after second application of Hough lines.
Figure 14. Turns navigated vs. algorithm tuning iteration.
Figure 15. Trial results over 30 attempts.

List of Abbreviations and Definitions

API: Application Program Interface. In this paper, it refers to communication definitions or tools that allow one program to interact with another directly.
CPU: Central Processing Unit. The general-purpose computing cores used in personal computers.
GPU: Graphics Processing Unit. Computing cores architected to specialize in computer graphics generation.
NPC: Non-Player Character. An in-game avatar which may act and look similar to a human player's avatar, but is controlled by the game itself.
OpenCV: Open Source Computer Vision Library. An open-source library of functions that allow for real-time computer vision.
OS: Operating System. Software which manages computer hardware, software, and services.
PAL: Phase Alternating Line. An analogue television encoding standard with a resolution of 576 interlaced lines.
RAM: Random Access Memory. Computer memory used for temporary program storage.
ROM: Read-Only Memory. In this paper, it refers to the test game's program file. The name originated with cartridge-based video games, which were stored on solid-state memory chips that could not be written to.

1. Introduction and Background

In today's video games, one common requirement of the main game program is to control a wide variety of non-player characters that interact with the human player. These non-player characters, or "NPCs," can be cooperative characters, enemies, or environmental figures that add decoration and flair to the game's world. Traditionally, computer-controlled enemy players, or "bots," are controlled by hard-coded logic within a game [25]. Games have typically implemented various forms of pathfinding algorithms to control their NPCs. These pathfinding methods require a full understanding of