Combo-Action: Training Agent for FPS Game with Auxiliary Tasks

Total Page:16

File Type:pdf, Size:1020Kb

Combo-Action: Training Agent for FPS Game with Auxiliary Tasks The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) Combo-Action: Training Agent for FPS Game with Auxiliary Tasks Shiyu Huang, Hang Su, Jun Zhu,∗ Ting Chen∗ Dept. of Comp. Sci. & Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys., Institute for AI, THBI Lab, Tsinghua University, Beijing, 100084, China [email protected]; fsuhangss, dcszj, [email protected] Abstract Deep reinforcement learning (DRL) has achieved surpassing human performance on Atari games, using raw pixels and rewards to learn everything. However, first-person-shooter (FPS) games in 3D environments contain higher levels of hu- man concepts (enemy, weapon, spatial structure, etc.) and a large action space. In this paper, we explore a novel method which can plan on temporally-extended action sequences, which we refer as Combo-Action to compress the action space. We further train a deep recurrent Q-learning network model as a high-level controller, called supervisory network, to manage the Combo-Actions. Our method can be boosted with auxiliary tasks (enemy detection and depth prediction), which enable the agent to extract high-level concepts in the FPS games. Extensive experiments show that our method is efficient in training process and outperforms previous state- of-the-art approaches by a large margin. Ablation study ex- periments also indicate that our method can boost the perfor- mance of the FPS agent in a reasonable way. Introduction Figure 1: An illustration of the Combo-Action in ViZ- Deep reinforcement learning (DRL) has shown great suc- Doom. We define four Combo-Actions (Forward, Turn, Ro- cess in many games, including the computer Go game (Sil- tate, Shoot) in this paper. Each Combo-Action is a tempo- ver et al. 2016), Atari games (Mnih et al. 2013), etc. Be- ral abstraction of the primitive actions. A depth prediction sides the 2D games (e.g., Go and Atari), applying DRL to model and an enemy detection model are used for corre- first-person-shooter (FPS) games in an adversarial 3D envi- sponding Combo-Actions. Instead of using the whole depth ronment (Kempka et al. 2016; Lample and Chaplot 2016) map, we divide it into a 3 × 6 grid and calculate the average has attracted attention, in which a player fights against other depth value for each cell. We further train a DRQN model, computer agents or human players. Compared with the 2D called supervisory network, as a high level controller for the games, FPS games show a multitude of challenges since Combo-Actions. the additional spatial dimension not only introduces no- tions of partial observability and occlusions, but also causes complications due to viewpoint variance and more unpre- egy via data-driven methods since the release of ViZDoom, dictable actions of the enemies. Moreover, this task also in- e.g., Arnold (Lample and Chaplot 2016), F1 (Wu and Tian volves a wide variety of actions and skills, such as navigat- 2017) and IntelAct (Dosovitskiy and Koltun 2016). ing through a map, collecting items, fighting enemies, etc. ViZDoom (Kempka et al. 2016) is a RL research platform Challenges which allows researchers to develop agents to play the Doom Even much work has been done recently, there still remain game with the screen buffer and game variables. Many ef- many problems in building agents for the FPS game. forts have been paid on developing AI bots to learn a strat- Large Action Space: In general, there are numerous prim- ∗J. Zhu and T. Chen are corresponding authors. itive actions in FPS games for an agent to interact with Copyright c 2019, Association for the Advancement of Artificial the environment, which can be categorized into on-off ac- Intelligence (www.aaai.org). All rights reserved. tions and delta actions. On-off actions only contain bi- 954 nary states, while delta buttons have continuous values. dom choosing of Combo-Actions can yield not-bad perfor- Moreover, the combinations of actions increase exponen- mance. tially with time steps. This makes the action space very Supervisory Network: To manage the switch between dif- large, and results in the inefficient training process. Previ- ferent Combo-Actions, a high-level controller should be ous work (Lample and Chaplot 2016; Wu and Tian 2017; applied. In this paper, we use an LSTM (Hochreiter and Dosovitskiy and Koltun 2016) selectively abandons many Schmidhuber 1997) based recurrent neural network for the actions to compress the action space, which results in per- Q-learning model. Our supervisory network can work har- formance loss, e.g., their agents can’t change visual angle monically with other auxiliary networks during test period. vertically, making the agent unable to shoot the enemy on a Contributions: The contributions of our work are as fol- different horizontal plane. lows: (1) Our method can compress the original action space Lack of Prior Knowledge: Humans can learn throughout sharply, which improves the training efficiency and explo- their lives and can utilize prior knowledge to complete new ration ability. (2) Our method can fuse priori knowledge and tasks quickly. However, reinforcement learning algorithms basic concepts into the RL, which reduces the training dif- often learn a new task from scratch, which makes them ficulty and boosts the performance of the trained agent. (3) requiring far more experience than humans during train- Our method can alleviate disharmonious-action problem by ing. Although large amounts of research seeks to improve defining reasonable Combo-Actions for the FPS game. the sample efficiency of reinforcement learning algorithms, there are few studies in incorporating prior knowledge into reinforcement learning. In FPS games, for example, it is vi- Background tal to recognize some basic concepts(enemy, weapon, spatial In this section, we briefly review the deep Q-learning and structure, etc.). But it is hard to extract such information in deep recurrent Q-learning network. We also present some a single end-to-end RL model. work related to our method and the efforts made in the FPS Disharmonious Actions: Disharmonious actions often oc- game AI research field. cur in previous trained agents, i.e., actions are not meaning- ful between step to step. For example, sometimes the agent Deep Q-learning will turn left and right repeatedly and remain where it is. Previous work (Wu and Tian 2017) only tried to relieve this Deep Q-learning can learn a policy by interacting with the problem by manually detecting this situation in test period environment. At each step, the agent obtains current state and could do nothing for the RL model. st of the environment, gives out an action according to its policy, and receives a reward rt. The goal of the Q-learning Our Proposal algorithm is to maximize the expected sum of discounted PT t0−t To address the aforementioned issues, we develop a novel rewards Rt = t0=t γ rt0 , where T is the terminating method that can plan on temporally-extended action se- time, and γ 2 [0; 1] is a discount factor. The action value quences, which we refer as Combo-Action. We trained a function, called Q-function, takes two inputs: state s and ac- deep recurrent Q-learning network(DRQN) as a supervisory tion a, and returns the expected future reward: Qπ(s; a) = network to manage the Combo-Action. Our method enables E[Rtjst = s; at = a]. In Deep Q-learning(Sutton, Barto, the reinforcement learning algorithm to be boosted with aux- and others 1998), a neural network parameterized by θ is iliary tasks and prior knowledge. used as an estimate of the optimal Q-function. To optimize Combo-Action: We propose a kind of micro-action, called the Q-function, the temporal difference error is taken as the Combo-Action, in this paper. The Combo-Action is built on loss function: a series of primitive actions, which can complete a specific sub-task. These action combinations are adopted to RL train- 2 ing, which compresses the action space sharply and allows L(θ) = Es;a;r;s0 [(Qtarget − Qθ(s; a)) ]; (1) us to obtain the optimal value function within a practical 0 0 time and memory limitation. This method also guides the where Qtarget = r + maxa0 Qθ(s ; a ). agent for a better exploration during training. Deep Recurrent Q-learning Network(DRQN): Typi- Auxiliary Tasks: Previous methods prefer to use an end-to- cally, the task for reinforcement learning should be Marko- end neural network to play the FPS game. However, a single vian. However, the observation(e.g. partial field of vision in model is hard to handle a complex task. Our method devel- 3D FPS game) for the agent is not Markovian, and this is ops extra two sub-tasks simultaneously, i.e., enemy detec- considered as a partially observable Markov decision pro- tion task and depth prediction task. This decoupling makes cess(POMDP). To allay this problem, a memory module the debugging process to be more intuitionistic. Moreover, is often required, which can be used to store the history the auxiliary networks extract high-level concepts from the information. (Hausknecht and Stone 2015) introduced the observation, which provides useful information to the exe- Deep Recurrent Q-Networks(DRQN). DRQN applies a re- cuting of Combo-Action. current neural network into DRL, and LSTM (Hochreiter Our method can alleviate disharmonious-action problem and Schmidhuber 1997) is often used on the top of the nor- by defining reasonable Combo-Actions. The priori knowl- mal DQN model. In our project, we use DQRN as our ba- edge in Combo-Actions can emit more reasonable primitive sic reinforcement learning model. We present some related actions. Interestingly, experiment shows that even the ran- work in following sub-sections.
Recommended publications
  • First Person Shooting (FPS) Game
    International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 04 | Apr-2018 www.irjet.net p-ISSN: 2395-0072 Thunder Force - First Person Shooting (FPS) Game Swati Nadkarni1, Panjab Mane2, Prathamesh Raikar3, Saurabh Sawant4, Prasad Sawant5, Nitesh Kuwalekar6 1 Head of Department, Department of Information Technology, Shah & Anchor Kutchhi Engineering College 2 Assistant Professor, Department of Information Technology, Shah & Anchor Kutchhi Engineering College 3,4,5,6 B.E. student, Department of Information Technology, Shah & Anchor Kutchhi Engineering College ----------------------------------------------------------------***----------------------------------------------------------------- Abstract— It has been found in researches that there is an have challenged hardware development, and multiplayer association between playing first-person shooter video games gaming has been integral. First-person shooters are a type of and having superior mental flexibility. It was found that three-dimensional shooter game featuring a first-person people playing such games require a significantly shorter point of view with which the player sees the action through reaction time for switching between complex tasks, mainly the eyes of the player character. They are unlike third- because when playing fps games they require to rapidly react person shooters in which the player can see (usually from to fast moving visuals by developing a more responsive mind behind) the character they are controlling. The primary set and to shift back and forth between different sub-duties. design element is combat, mainly involving firearms. First person-shooter games are also of ten categorized as being The successful design of the FPS game with correct distinct from light gun shooters, a similar genre with a first- direction, attractive graphics and models will give the best person perspective which uses light gun peripherals, in experience to play the game.
    [Show full text]
  • TI£ᄍᆲAL ¬タモ Asynchronous Multiplayer Shooter With
    Entertainment Computing 16 (2016) 81–93 Contents lists available at ScienceDirect Entertainment Computing journal homepage: ees.elsevier.com/entcom TIṬAL – Asynchronous multiplayer shooter with procedurally generated maps q,qq ⇑ Bhojan Anand , Wong Hong Wei School of Computing, National University of Singapore, Singapore article info abstract Article history: Would it be possible to bring the promise of unlimited re-playability typically reserved for Roguelike Received 1 March 2015 games to competitive multiplayer shooters? This paper tries to address this issue by proposing a novel Revised 21 January 2016 method to dynamically generate maps at run-time almost as soon as players press the Play button, while Accepted 19 February 2016 ensuring the features what players would expect from the genre. The procedures are simple and practi- Available online 7 March 2016 cally feasible to be employed in actual computer games. In addition, the work experiments the possibility of incorporating asynchronous game-play element into a multiplayer shooter with human imitating bots Keywords: where the players can let their bot/avatar replace them when they are not around. The algorithms are Game implemented and evaluated with a playable game. The evaluations prove that playable 3D dynamic maps TIṬAL Prodecural content generation can be generated in order of seconds using game context data to initialise the parameters of the algo- Game map generation rithm. The paper also shows that asynchronous game-play element is a possible feature for consideration Bots in next generation multiplayer shooters. Artificial intelligence Ó 2016 Elsevier B.V. All rights reserved. 1. Introduction 1.1. Procedural content generation This paper demonstrates a novel method for procedurally gen- Procedural Content Generation (PCG) is the use of algorithmic erating dynamic maps at run-time at the click of the ‘Play’ button means to create content [2,3] dynamically during run-time.
    [Show full text]
  • Game 232 001
    Online & Mobile Gaming GAME232 Fall 2016 TR 9:00–10:15 am (001) 10:30–11:45 am (002) AB1018 Instructor: Prof. Sang Nam Associate Director, Computer Game Design Email: [email protected] Office: A&D Building 2025 Phone: 703-993-3163/office Twitter: twitter.com/sangumc Office Hours*: TR 11:45 am – 1:30 pm * Other times by appointment. The best way to reach me is via email. MASON MISSION STATEMENT Mission-Who we are and why we do what we do A public, comprehensive research university established by the Commonwealth of Virginia in the National Capital Region, we are an innovative and inclusive academic community committed to creating a more just, free, and prosperous world. MASON GAME DESIGN MISSION STATEMENT The Mission of the Computer Game Design Program at George Mason University is to prepare students for employment and further study in the computer game design and development field, doing so with a curriculum designed to reflect the gaming industry’s demand for an academically rigorous technical program coupled with an understanding of the artistic and creative elements of the evolving field of study. CATALOG DESCRIPTION Class covers the history, practice, and design of online and mobile games. Class will discuss the current state of the smartphone applications, and study the best practices to be successful in the applications market. Students will learn the development process for smartphone applications and develop original and innovative applications in a team-based environment. COURSE OVERVIEW In this course, you will explore the ever-expanding world of mobile, pervasive, and “big” games.
    [Show full text]
  • Regulating Violence in Video Games: Virtually Everything
    Journal of the National Association of Administrative Law Judiciary Volume 31 Issue 1 Article 7 3-15-2011 Regulating Violence in Video Games: Virtually Everything Alan Wilcox Follow this and additional works at: https://digitalcommons.pepperdine.edu/naalj Part of the Administrative Law Commons, Comparative and Foreign Law Commons, and the Entertainment, Arts, and Sports Law Commons Recommended Citation Alan Wilcox, Regulating Violence in Video Games: Virtually Everything, 31 J. Nat’l Ass’n Admin. L. Judiciary Iss. 1 (2011) Available at: https://digitalcommons.pepperdine.edu/naalj/vol31/iss1/7 This Comment is brought to you for free and open access by the Caruso School of Law at Pepperdine Digital Commons. It has been accepted for inclusion in Journal of the National Association of Administrative Law Judiciary by an authorized editor of Pepperdine Digital Commons. For more information, please contact [email protected], [email protected], [email protected]. Regulating Violence in Video Games: Virtually Everything By Alan Wilcox* TABLE OF CONTENTS I. INTRODUCTION ................................. ....... 254 II. PAST AND CURRENT RESTRICTIONS ON VIOLENCE IN VIDEO GAMES ........................................... 256 A. The Origins of Video Game Regulation...............256 B. The ESRB ............................. ..... 263 III. RESTRICTIONS IMPOSED IN OTHER COUNTRIES . ............ 275 A. The European Union ............................... 276 1. PEGI.. ................................... 276 2. The United
    [Show full text]
  • Adaptive Shooting for Bots in First Person Shooter Games Using Reinforcement Learning Frank G
    180 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 7, NO. 2, JUNE 2015 Adaptive Shooting for Bots in First Person Shooter Games Using Reinforcement Learning Frank G. Glavin and Michael G. Madden Abstract—In current state-of-the-art commercial first person late the most kills. Objective based games also exist where the shooter games, computer controlled bots, also known as non- emphasis is no longer on kills and deaths but on specific tasks player characters, can often be easily distinguishable from those in the game which, when successfully completed, result in ac- controlled by humans. Tell-tale signs such as failed navigation, “sixth sense” knowledge of human players' whereabouts and quiring points for your team. Two examples of such games are deterministic, scripted behaviors are some of the causes of this. We ‘Capture the Flag’ and ‘Domination’. The former involves re- propose, however, that one of the biggest indicators of nonhuman- trieving a flag from the enemies' base and returning it to your like behavior in these games can be found in the weapon shooting base without dying. The latter involves keeping control of pre- capability of the bot. Consistently perfect accuracy and “locking defined areas on the map for as long as possible. All of these on” to opponents in their visual field from any distance are in- dicative capabilities of bots that are not found in human players. game types require, first and foremost, that the player is profi- Traditionally, the bot is handicapped in some way with either a cient when it comes to combat.
    [Show full text]
  • A First Person Shooter with Dual Guns Using Multiple Optical Air Mouse Devices
    A First Person Shooter with Dual Guns Using Multiple Optical Air Mouse Devices Young-Bum Kim , Min-Sub Shim, Chang Geun Song and Yu-Seop Kim * Department of Computer Engineering, Hallym University 39 Hallymdaehak-gil, Chuncheon, Gangwon-do, 200-702, Korea {stylemove, connecting , cgsong, yskim01}@hallym.ac.kr Abstract mouse USB filter driver was developed by using the Windows mouse USB driver to monitor mouse event We proposed a first person shooter for both hands with data transferred to USB ports in order to display mouse multiple mouse devices, which are not used in existing pointers on the screen. games. Players control their movement and shooting separately using dual guns for both hands. For two-hand FLTDRIVER usage, we designed a display method for multiple mouse IRP for Dispatch I/O Completion pointers. First, we loaded typical physical driver into the FDO . memory and registered our developed multiple mouse . device driver instead of the existing MS Windows device IoCallDriver return driver. We then executed both message loop procedures for Windows mouse event handling and those for FDODRIVER multiple mouse event handling. We could display Dispatch Dispatch or Dpc Forlsr multiple mouse cursor images and utilize both mouse . FDO . devices separately. After that, a prototype game using . both hands was developed for an experimental study. IOCompleteRequest Fig. 1 A Filter Driver Key words: Multiple mouse devices, Game, First person Shooter The main difference between the filter driver and other layered driver is the device objects generated by the 1. Introduction drivers. The object of the filter driver has no name opposed to the layered drivers which export objects with First-person shooters (FPS), like Doom and Quake [1], unique names.
    [Show full text]
  • Human Factors of Speed-Based Exergame Controllers Taiwoo Park1, Uichin Lee2, I
    Human Factors of Speed-based Exergame Controllers Taiwoo Park1, Uichin Lee2, I. Scott MacKenzie3, Miri Moon4, Inseok Hwang1,5, Junehwa Song1 2Dept. of Knowledge Service 3Dept. of Electrical Engineering 1Dept. of Computer Science, Engineering, and Computer Science, KAIST, Korea KAIST, Korea York University, ON, Canada 4Div. of Web Science Technology, 5Center for Mobile Software Platform, KAIST, Korea KAIST, Korea {twpark, miri.moon, inseok, junesong}@nclab.kaist.ac.kr, [email protected], [email protected] ABSTRACT Exergames – games involving physical activity – have Exergame controllers are intended to add fun to monotonous shown potential to increase the enjoyment of repetitive exercise. However, studies on exergame controllers mostly exercise [24, 29, 39]. They usually involve new technology, focus on designing new controllers and exploring specific such as an exergame controller that adds a gaming interface application domains without analyzing human factors, such to exercise equipment. The transformed device serves as a as performance, comfort, and effort. In this paper, we medium to introduce fun factors such as coordinated examine the characteristics of a speed-based exergame interactions and competition to repetitive solitary exercises. controller that bear on human factors related to body The device also provides an opportunity for game developers movement and exercise. Users performed tasks such as to design interactive exergames similar to commonly changing and maintaining exercise speed for avatar control enjoyed video games. We envision that new exergame while their performance was measured. The exergame controllers will emerge featuring rich interactivity and controller follows Fitts’ law, but requires longer movement immersive game play. time than a gamepad and Wiimote.
    [Show full text]
  • Understanding Game Playing Preferences
    CLEI ELECTRONIC JOURNAL, VOLUME 17, NUMBER 3, PAPER 9, DECEMBER 2014 Understanding Game Playing Preferences Cecilia M. Curlango Rosas, Jorge Eduardo Ibarra Esquer, Gloria Etelbina Chávez Valenzuela, María Luisa González Ramírez, Linda Eugenia Arredondo Acosta and Marcela D. Rodríguez Universidad Autonoma de Baja California, Facultad de Ingeniería Mexicali, Mexico 21280 {curlango, jorge.ibarra}@uabc.edu.mx {gloria_chavez, maria.gonzalez, linda_arredondo, marcerod}@uabc.edu.mx Abstract This paper presents the results of a study made with university students to determine the types of video games that they play, the platforms they use to play, and what motivates them to play or stop playing games. The study results distinguish among genders in order to be able to design appropriate teaching strategies which appeal to both genders. This work was done as part of a larger study whose aim is to both attract and retain students to the computer-engineering program. Keywords: computer science education, engineering students, video game preferences 1 Introduction In 2010, The United Nations Educational, Scientific and Cultural Organization (UNESCO) released a report where global challenges and opportunities for engineering were presented. The report stresses the importance of the various engineering fields in development of society, but also warns about the decreasing public awareness and understanding of engineering, and the declining interest and enrollment of young people in engineering courses [1]. This is a trend that during the first decade of the 21st Century was seen as something critical for the Computing Engineering field [2], being until recent years that some countries have started to show a slight reversal in enrollment [3].
    [Show full text]
  • Design and Implementation of a Single-Player First-Person Shooter Game Using XNA Game Development Studio
    Final Report Design and implementation of a single-player first-person shooter game using XNA game development studio Master of Science Thesis in the Department of Computer Science and Engineering Hatice Ezgi TUGLU Kahraman AKYIL Chalmers University of Technology Department of Computer Science and Engineering Göteborg, Sweden, 2010 The Author grants to Chalmers University of Technology and University of Gothenburg the non-exclusive right to publish the Work electronically and in a non-commercial purpose make it accessible on the Internet. The Author warrants that he/she is the author to the Work, and warrants that the Work does not contain text, pictures or other material that violates copyright law. The Author shall, when transferring the rights of the Work to a third party (for example a publisher or a company), acknowledge the third party about this agreement. If the Author has signed a copyright agreement with a third party regarding the Work, the Author warrants hereby that he/she has obtained any necessary permission from this third party to let Chalmers University of Technology and University of Gothenburg store the Work electronically and make it accessible on the Internet. Design and implementation of a single-player first-person shooter game using XNA game development studio Hatice Ezgi TUGLU Kahraman AKYIL © Hatice Ezgi TUGLU, October 2010. © Kahraman AKYIL, October 2010. Examiner: Per ZARING Chalmers University of Technology University of Gothenburg Department of Computer Science and Engineering SE-412 96 Göteborg Sweden Telephone + 46 (0)31-772 1000 Department of Computer Science and Engineering Göteborg, Sweden October 2010 1 | P a g e HUMANKILLERS Will you keep your promise? 2 | P a g e Abstract “Humankillers” is a name of the game that was developed for Master Thesis in Computer Science Department at Chalmers University of Technology.
    [Show full text]
  • Game Dynamics Preferences Are Connected with Experiences Derived from First-Person Shooters
    Game Dynamics Preferences Are Connected with Experiences Derived from First-Person Shooters Suvi K. Holm Johanna K. Kaakinen Department of Psychology Department of Psychology University of Turku University of Turku Turku, Finland Turku, Finland [email protected] [email protected] Abstract —Individual differences in game dynamics violent video game playing is heightened physiological arousal preferences may affect the way players react to different types of [1], [4], [5], [6], which is often interpreted as a negative stress games. In the present study, 40 inexperienced players played a response that may lead to desensitization to violent content [7]. violent first-person shooter game. The players’ preferences for However, it is likely that the players experience some kind of violent game dynamics were scoped before playing. Moreover, the positive experiences from playing these games, as otherwise players self-reported their sense of curiosity, vitality and self- they would not be attractive to so many people. In fact, video efficacy in life in general and during playing. The results show that games in general have been noted to have potential for players who do not like violent game dynamics experience a lower increasing psychological well-being [8]. Most often, video sense of curiosity, vitality and self-efficacy during playing rather games are seen as a temporary source of joy, with some studies than in real life. Instead, there is no evidence for such difference indicating that players’ main motivations to play are fun and for players who express a neutral or mildly positive preference for violent dynamics, with the exception of a slightly worse sense of entertainment [9], [10].
    [Show full text]
  • Design Patterns and Analysis of Player Behavior in First-Person Shooter Levels
    UNIVERSITY OF CALIFORNIA SANTA CRUZ THE SCIENCE OF LEVEL DESIGN: DESIGN PATTERNS AND ANALYSIS OF PLAYER BEHAVIOR IN FIRST-PERSON SHOOTER LEVELS A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in COMPUTER SCIENCE by Kenneth M. Hullett September 2012 The Dissertation of Kenneth M. Hullett is approved: _______________________________ Professor E. James Whitehead, Chair _______________________________ Associate Professor Travis Seymour _______________________________ Assistant Professor Arnav Jhala _____________________________ Tyrus Miller Vice Provost and Dean of Graduate Studies Copyright © by Kenneth M. Hullett 2012 TABLE OF CONTENTS Table of Contents ......................................................................................................... iii List of Figures ............................................................................................................ xiii List of Tables ............................................................................................................ xvii ABSTRACT ................................................................................................................ xx Dedication .................................................................................................................. xxi Acknowledgments..................................................................................................... xxii Chapter 1: Introduction ...............................................................................................
    [Show full text]
  • Learning to Shoot in First Person Shooter Games by Stabilizing Actions and Clustering Rewards for Reinforcement Learning
    Learning to Shoot in First Person Shooter Games by Stabilizing Actions and Clustering Rewards for Reinforcement Learning Frank G. Glavin Michael G. Madden College of Engineering & Informatics, College of Engineering & Informatics, National University of Ireland, Galway, National University of Ireland, Galway, Ireland. Ireland. Email: [email protected] Email: [email protected] Abstract—While reinforcement learning (RL) has been applied A. Reinforcement Learning to turn-based board games for many years, more complex games The basic underlying principle of reinforcement learning involving decision-making in real-time are beginning to receive more attention. A challenge in such environments is that the time [1] is that an agent interacts with an environment and receives that elapses between deciding to take an action and receiving a feedback in the form of positive or negative rewards based reward based on its outcome can be longer than the interval on the actions it decides to take. The goal of the agent is to between successive decisions. We explore this in the context of maximize the long term reward that it receives. For this, a a non-player character (NPC) in a modern first-person shooter set of states, actions and a reward source must be defined. game. Such games take place in 3D environments where players, both human and computer-controlled, compete by engaging in The state space comprises a (typically finite) set of states, combat and completing task objectives. We investigate the use representing the agent’s view of the world. The action space of RL to enable NPCs to gather experience from game-play and comprises all of the actions that the agent can carry out when improve their shooting skill over time from a reward signal based in a given state.
    [Show full text]