DEVELOPING COOPERATIVE AGENTS FOR NBA JAM

A Project Presented to the Faculty of California State Polytechnic University, Pomona

In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science

By Charlson So, 2020

SIGNATURE PAGE

Project: DEVELOPING COOPERATIVE AGENTS FOR NBA JAM
Author: Charlson So
Date Submitted: Spring 2020
Department of Computer Science

Dr. Adam Summerville, Project Committee Chair, Computer Science
Dr. Amar Raheja, Computer Science

ACKNOWLEDGEMENTS

I would like to give special thanks to Professor Adam Summerville for his lessons and advice throughout my project. I am extremely grateful to have such a caring and passionate advisor. I would also like to express my gratitude to Professor Amar Raheja; his class, Digital Image Processing, is one I will remember throughout my career. To my dad, Kyong Chin So, my mom, Jae Hyun So, and my sister, Katherine So: it was only through your love and support that I was able to succeed in life. Through all the rough times and struggle, here's to a brighter future.

Charlson So

ABSTRACT

As artificial intelligence development rapidly advances, the goal of creating artificial agents that mimic human behavior is becoming a reality. Artificial agents are becoming capable of reflecting human behavior and decision making, such as drawing creative art pieces and playing video games [10][24]. Therefore, they should be able to mimic one of the greatest human strengths: cooperation. Cooperation is an integral skill that allows humans to achieve feats they cannot accomplish alone. It is also a highly valuable skill to develop for artificial agents as intelligently programmed software becomes integrated into human society. As advanced neural network architectures become widely available, cooperative artificial agents will aid humans in a wide variety of fields. Thus, it becomes vital to discuss the quality of interactions between society and artificial systems and to analyze how these systems should interact with the public. This study emulates past experimentation with cooperative agents and evaluates how well such an agent performs in the context of a cooperative video game, NBA Jam for the Super Nintendo Entertainment System. NBA Jam is a strong testbed for cooperative agents because it is well known in the gaming community for its difficulty [25]. NBA Jam includes a complex set of inputs and input combinations that allow the player to play with a unique style. This experiment explores the results of training an intelligent artificial agent that attempts to maximize cooperation between itself and its teammate. The inclusion of well-programmed cooperative agents should allow players to win more games and have a more enjoyable experience. By adjusting the reward systems of neural networks, this project explores the nuances of developing cooperative agents for video games.
TABLE OF CONTENTS

SIGNATURE PAGE
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: COOPERATIVE ARTIFICIAL AGENTS AND NEURAL NETWORKS
1.1 Introduction to Designing a Cooperative Agent
1.2 History of Artificial Agents Within Video Games
1.3 Machine Learning and Reinforcement Learning
1.4 Markov Decision Process
1.5 Q-Learning and Deep Q-Learning
1.6 Actor Critic
1.7 A2C (Advantage Actor Critic)
1.8 Training Cooperative Agents
CHAPTER 2: EXPERIMENTAL SETUP
2.1 NBA Jam for Super Nintendo Entertainment System and Retro Gym Integration
2.2 Training Loop
CHAPTER 3: ANALYSIS OF DATA AND DISCUSSION OF RESULTS
3.1 Results
3.2 Analysis
3.3 Adjustments for Future Experimentation
CHAPTER 4: CONCLUSION
REFERENCES

LIST OF TABLES

Table 1: NBA Jam Controls
Table 2: Variable and Memory Addresses
Table 3: Description of Files Required for Integration of a ROM for Retro Gym
Table 4: Experiment Reward Functions
Table 5: Results of Deterministic Policy Player Statistics
Table 6: Results of Non-Deterministic Policy Player Statistics
Table 7: Results of Deterministic Model Team Scores
Table 8: Results of Non-Deterministic Model Team Scores

LIST OF FIGURES

Figure 1: Markov Decision Process
Figure 2: Comparison between Q-Learning and Deep Q-Learning
Figure 3: Actor Critic Model
Figure 4: Advantage Actor Critic
Figure 5: Menu Screen for NBA Jam
Figure 6: Instance of a Game of NBA Jam
Figure 7: Gym Retro Integration Application
Figure 8: Command to Train pytorch-a2c-acktr on HPC
Figure 9: Results for Scoring Agents
Figure 10: Agent Blocking a Shot (Reward Function #3, Deterministic Policy)
Figure 11: Mean Reward for Reward Function #1
Figure 12: Mean Reward for Reward Function #2
Figure 13: Mean Reward for Reward Function #3
Figure 14: Mean Reward for Reward Function #4

CHAPTER 1: COOPERATIVE ARTIFICIAL AGENTS AND NEURAL NETWORKS

1.1 Introduction to Designing a Cooperative Agent

Cooperation arises for a multitude of reasons, including but not limited to material trading, culturally instilled behaviors, competition, and the expression of emotion. It is theorized to be a primary characteristic that allowed humans to create an advanced society. Sociologists have therefore studied human cooperation in depth and devised a set of criteria to define it. According to Carl Couch's theory of cooperative action, cooperation is based on "elements of sociation." These traits include acknowledged attentiveness, mutual responsiveness, congruent functional identities, shared focus, and social objective [1]. Reciprocally acknowledged attention is a form of interconnectedness between people that includes a fluid, shared consciousness. Humans who cooperate share an understanding that allows them to focus on the goal and make decisions that ultimately benefit the group. Mutual responsiveness is commonly expressed through verbal statements or visual actions; two friends can establish communication and meaning through a phrase or a nod. Shared focus and social objective describe past-bound or future-oriented connectedness within a group. By having similar goals, cooperation creates discernible outcomes that ultimately tie two agents together. Many of these traits, however, are unique to human-to-human cooperation.

Humans are capable of verbal and nonverbal communication and share cultural backgrounds that may predicate certain actions. Human-to-artificial-intelligence cooperation is more limited, since the artificial agent must include features that allow it to understand these verbal and nonverbal communiqués. Therefore, a new set of criteria based on Couch's theory of cooperative action should be formed in order to define a reward system for high-performing cooperative agents that emulate human behavior. In the context of NBA Jam on the SNES, a highly cooperative agent should set up its teammate to score and generate a high number of assists. The agent should also demonstrate mutual responsiveness by taking advantage of passes from its teammate in order to score. Since the goal of NBA Jam is ultimately to win the game, the cooperative agent should learn the complex moveset programmed into the game and maximize its defensive capabilities in order to increase the chance of winning. This experiment aims to create an artificial agent capable of maximizing a user's playing experience by cooperating with them efficiently based on these criteria.
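These criteria map naturally onto a scalar reward signal. As an illustration, the following is a minimal Python sketch of how such a reward could be composed from in-game statistics read from the emulator's memory; the statistic names and weights here are assumptions for illustration only, not the reward functions actually used in this project (those are listed in Table 4).

# Hypothetical cooperative reward sketch. Statistic names and weights are
# illustrative assumptions, not this project's actual reward functions.
def cooperative_reward(prev, curr):
    """Compute a step reward from the change in game statistics.

    `prev` and `curr` are dicts of values read from the game's RAM, e.g.
    {"assists": 2, "agent_score": 10, "opponent_score": 8, "blocks": 1}.
    """
    reward = 0.0
    # Acknowledged attentiveness: reward assists, i.e., passes that
    # directly set up the teammate to score.
    reward += 1.0 * (curr["assists"] - prev["assists"])
    # Mutual responsiveness: reward the agent scoring off teammate passes.
    reward += 0.5 * (curr["agent_score"] - prev["agent_score"])
    # Shared objective (winning): reward defensive stops and penalize
    # points given up to the opposing team.
    reward += 0.5 * (curr["blocks"] - prev["blocks"])
    reward -= 0.5 * (curr["opponent_score"] - prev["opponent_score"])
    return reward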
1.2 History of Artificial Agents Within Video Games

Video games have been a huge testing ground for developing artificial agents. Since games provide a reliable simulated environment capable of running faster than real time, agents can be trained across a variety of parameters without the need for human intervention. One of the first well-known examples of artificial intelligence algorithms developed for game playing is Deep Blue, developed by IBM [20]. IBM's Deep Blue supercomputer defeated world chess champion Garry Kasparov in 1997, proving that computer calculation could solve problems previously thought solvable only by humans. Deep Blue won a match under regulation time control, meaning it had to produce each move in under roughly three minutes. Its architecture included 480 chess chips, each capable of searching 2 to 2.5 million chess positions per second in parallel. Deep Blue also utilized the minimax algorithm, which minimizes the possible loss in a worst-case scenario by choosing the action that maximizes the value of its future state (sketched below). Deep Blue led to a proving ground where algorithms and computer architectures were tested against humans in order to highlight the effectiveness of computers and algorithms with decision-making capabilities.
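As a minimal sketch of the minimax idea described above, assuming a generic game-state interface with hypothetical methods legal_moves(), apply(), is_terminal(), and evaluate() (this is not Deep Blue's actual implementation):

def minimax(state, depth, maximizing):
    # Depth-limited minimax: the maximizing player picks the move with the
    # highest value, assuming the opponent then picks the lowest.
    if depth == 0 or state.is_terminal():
        return state.evaluate()  # heuristic score from the maximizer's view
    if maximizing:
        return max(minimax(state.apply(m), depth - 1, False)
                   for m in state.legal_moves())
    else:
        return min(minimax(state.apply(m), depth - 1, True)
                   for m in state.legal_moves())

# Choosing an action: take the move whose resulting state has the best
# minimax value, e.g.
# best = max(state.legal_moves(),
#            key=lambda m: minimax(state.apply(m), 3, False))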
Although Deep Blue was proof that computer and software technology had advanced enough to challenge human intelligence, exhaustive search for a game of greater depth was not possible with the hardware of the time. For example, the ancient board game Go is played on a 19x19 board, compared to the 8x8 grid of a chess board. The search space in Go is considerably wider: Go has approximately 250 legal moves per position over games roughly 150 moves long, a search space of about 250^150, while chess has about 35 moves per position over games of about 80 moves, or roughly 35^80 [10]. Therefore, it was no surprise when researchers developed an agent capable of defeating world champions of Go using deep reinforcement learning [10]. AlphaGo, developed by DeepMind, was a computer program that defeated top professional Go player Lee Sedol in 2016. AlphaGo utilized supervised learning on an SL policy network that included convolutional layers, rectifier nonlinearities, and a final softmax output layer that generated a probability distribution over all legal moves. With this neural architecture, AlphaGo showed that artificial intelligence had developed the ability to tackle new and challenging problems. A growth in artificial intelligence, complemented by advances in parallel computing hardware, led to a new age in which scientists began testing different neural networks in new and unique mediums.
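To make the described architecture concrete, the following is a minimal PyTorch sketch of a convolutional policy network with rectifier nonlinearities and a softmax over board positions, in the spirit of the SL policy network; the layer counts and sizes are illustrative assumptions, not DeepMind's actual configuration, and masking of illegal moves is omitted for brevity.

import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    # Illustrative convolutional policy network: board feature planes in,
    # a probability distribution over board positions (moves) out.
    def __init__(self, in_planes=4, board_size=19):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_planes, 64, kernel_size=3, padding=1),
            nn.ReLU(),                        # rectifier nonlinearity
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),  # one logit per board point
        )

    def forward(self, x):
        logits = self.conv(x).flatten(1)      # (batch, 19 * 19)
        return torch.softmax(logits, dim=1)   # distribution over moves

# Usage: probs = PolicyNetwork()(torch.zeros(1, 4, 19, 19))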