State of the Art Control of Atari Games Using Shallow Reinforcement Learning

Yitao Liang†, Marlos C. Machado‡, Erik Talvitie†, and Michael Bowling‡
†Franklin & Marshall College, Lancaster, PA, USA ({yliang, erik.talvitie}@fandm.edu)
‡University of Alberta, Edmonton, AB, Canada ({machado, mbowling}@ualberta.ca)

Appears in: Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), J. Thangarajah, K. Tuyls, C. Jonker, S. Marsella (eds.), May 9-13, 2016, Singapore. Copyright © 2016, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

ABSTRACT

The recently introduced Deep Q-Networks (DQN) algorithm has gained attention as one of the first successful combinations of deep neural networks and reinforcement learning. Its promise was demonstrated in the Arcade Learning Environment (ALE), a challenging framework composed of dozens of Atari 2600 games used to evaluate general competency in AI. It achieved dramatically better results than earlier approaches, showing that its ability to learn good representations is quite robust and general. This paper attempts to understand the principles that underlie DQN's impressive performance and to better contextualize its success. We systematically evaluate the importance of key representational biases encoded by DQN's network by proposing simple linear representations that make use of these concepts. Incorporating these characteristics, we obtain a computationally practical feature set that achieves competitive performance to DQN in the ALE. Besides offering insight into the strengths and weaknesses of DQN, we provide a generic representation for the ALE, significantly reducing the burden of learning a representation for each game. Moreover, we also provide a simple, reproducible benchmark for the sake of comparison to future work in the ALE.

Keywords

Reinforcement Learning, Function Approximation, DQN, Representation Learning, Arcade Learning Environment

1. INTRODUCTION

In the reinforcement learning (RL) problem an agent autonomously learns a behavior policy from experience in order to maximize a provided reward signal. Most successful RL approaches have relied upon the engineering of problem-specific state representations, leaving the agent less than fully autonomous and reducing its flexibility. The recent Deep Q-Network (DQN) algorithm [20] aims to tackle this problem, presenting one of the first successful combinations of RL and deep convolutional neural networks (CNN) [13, 14], which are proving to be a powerful approach to representation learning in many areas. DQN is based upon the well-known Q-learning algorithm [31] and uses a CNN to simultaneously learn a problem-specific representation and estimate a value function.

Games have always been an important testbed for AI, frequently being used to demonstrate major contributions to the field [5, 6, 8, 23, 28]. DQN follows this tradition, demonstrating its success by achieving human-level performance in the majority of games within the Arcade Learning Environment (ALE) [2]. The ALE is a platform composed of dozens of qualitatively diverse Atari 2600 games. As pictured in Figure 1, the games in this suite include first-person perspective shooting games (e.g. Battle Zone), platforming puzzle games (e.g. Montezuma's Revenge), sports games (e.g. Ice Hockey), and many other genres. Because of this diversity, successful approaches in the ALE necessarily exhibit a degree of robustness and generality. Further, because it is based on problems designed by humans for humans, the ALE inherently encodes some of the biases that allow humans to successfully navigate the world. This makes it a potential stepping-stone to other, more complex decision-making problems, especially those with visual input.

Figure 1: Examples from the ALE (left to right: Battle Zone, Montezuma's Revenge, Ice Hockey).

DQN's success in the ALE rightfully attracted a great deal of attention. For the first time an artificial agent achieved performance comparable to a human player in this challenging domain, achieving by far the best results at the time and demonstrating generality across many diverse problems. It also demonstrated a successful large-scale integration of deep neural networks and RL, surely an important step toward more flexible agents. That said, problematic aspects of DQN's evaluation make it difficult to fully interpret the results.

As will be discussed in more detail in Section 6, the DQN experiments exploit non-standard game-specific prior information and also report only one independent trial per game, making it difficult to reproduce these results or to make principled comparisons to other methods. Independent re-evaluations have already reported different results due to the variance in the single-trial performance of DQN (e.g. [12]). Furthermore, the comparisons that Mnih et al. did present were to benchmark results that used far less training data. Without an adequate baseline for what is achievable using simpler techniques, it is difficult to evaluate the cost-benefit ratios of more complex methods like DQN.

In addition to these methodological concerns, the evaluation of a complicated method such as DQN often leaves open the question of which of its properties were most important to its success. While it is tempting to assume that the neural network must be discovering insightful, tailored representations for each game, there is also a considerable amount of domain knowledge embedded in the very structure of the network. We can identify three key structural biases in DQN's representation. First, CNNs provide a form of spatial invariance not exploited in earlier work using linear function approximation. DQN also made use of multiple frames of input, allowing for the representation of short-range non-Markovian value functions. Finally, small-sized convolutions are quite well suited to detecting the small patterns of pixels that commonly represent objects in early video game graphics. These biases were not fully explored in earlier work in the ALE, so a natural question is whether non-linear deep network representations are key to strong performance in the ALE, or whether the general principles implicit in DQN's network architecture might be captured more simply.

The primary goal of this paper is to systematically investigate the sources of DQN's success in the ALE. Obtaining a deeper understanding of the core principles influencing DQN's performance and placing its impressive results in perspective should aid practitioners aiming to apply or extend it (whether in the ALE or in other domains). It should also reveal representational issues that are key to success in the ALE itself, potentially inspiring new directions of research.

We perform our investigation by elaborating upon a simple linear representation that was one of DQN's main comparison points, progressively incorporating the representational biases identified above and evaluating the impact of each one. This process ultimately yields a fixed, generic feature representation able to obtain performance competitive with DQN in the ALE, suggesting that the general form of the representations learned by DQN may in many cases be more important than the specific features it learns. Further, this feature set offers a simple and computationally practical alternative to DQN as a platform for future research, especially when the focus is not on representation learning. Finally, we are able to provide an alternative benchmark that is more methodologically sound, easing reproducibility and comparison to future work.

2. BACKGROUND

2.1 Reinforcement Learning

The value of a state s under a policy π is defined as

\[ V^\pi(s) \doteq \mathbb{E}_\pi\!\left[\, \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \;\middle|\; s_t = s \right] \quad \text{for all } s \in S, \]

where γ ∈ [0, 1) is known as the discount factor. As a critical step toward improving a given policy π, it is common for reinforcement learning algorithms to learn a state-action value function, denoted

\[ Q^\pi(s, a) \doteq \mathbb{E}_\pi\!\left[\, R(s_t, a_t, s_{t+1}) + \gamma V^\pi(s_{t+1}) \;\middle|\; s_t = s,\, a_t = a \right]. \]

However, in large problems it may be infeasible to learn a value for each state-action pair. To tackle this issue agents often learn an approximate value function, Q^π(s, a; θ) ≈ Q^π(s, a). A common approach uses linear function approximation (LFA), where Q^π(s, a; θ) = θ^⊤ φ(s, a), in which θ denotes the vector of weights and φ(s, a) denotes a static feature representation of the state s when taking action a. However, this can also be done through non-linear function approximation methods, including neural networks.
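To make the linear case concrete, the following sketch (Python with NumPy) computes Q(s, a; θ) = θ^⊤ φ(s, a) for the sparse binary features typically used with LFA; the feature count, action count, and helper names are illustrative placeholders rather than values or code from the paper.

```python
import numpy as np

NUM_FEATURES = 10_000   # illustrative size of phi(s, a); not a value from the paper
NUM_ACTIONS = 18        # the full Atari 2600 action set

# One weight row per action is a common way to realize Q(s, a; theta) = theta^T phi(s, a)
# when phi is computed from the state and the action selects a separate weight block.
theta = np.zeros((NUM_ACTIONS, NUM_FEATURES))

def q_value(active_features, action, theta):
    """Q(s, a; theta) for a sparse binary phi(s): sum of the weights of the active features."""
    return theta[action, active_features].sum()

def greedy_action(active_features, theta):
    """Return the action with the highest approximate value in the current state."""
    return int(np.argmax([q_value(active_features, a, theta) for a in range(NUM_ACTIONS)]))
```

With binary features the dot product reduces to summing the weights of whichever features are active in the current state, which is what makes linear methods computationally practical at this scale.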
One of the most popular reinforcement learning algorithms is Sarsa(λ) [22]. It consists of learning an approximate action-value function while following a continually improving policy π. As states are visited and rewards are observed, the action-value function is updated and the policy is consequently improved, since each update improves the estimate of the agent's expected return when starting from state s, taking action a, and following policy π afterwards, i.e. Q^π(s, a). The Sarsa(λ) update equations, when using function approximation, are:

\[ \delta_t = r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}; \vec{\theta}_t) - Q(s_t, a_t; \vec{\theta}_t), \]
\[ \vec{e}_t = \gamma \lambda \vec{e}_{t-1} + \nabla_{\vec{\theta}} Q(s_t, a_t; \vec{\theta}_t), \]
\[ \vec{\theta}_{t+1} = \vec{\theta}_t + \alpha \delta_t \vec{e}_t, \]

where α denotes the step-size, the elements of e_t are known as the eligibility traces, and δ_t is the temporal-difference error. Theoretical results suggest that, as an on-policy method, Sarsa(λ) may be more stable in the linear function approximation case than off-policy methods such as Q-learning, which are known to risk divergence [1, 10, 18]. These theoretical insights are confirmed in practice; Sarsa(λ) seems to be far less likely to diverge in the ALE than Q-learning and other off-policy methods [7].
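A minimal sketch of these updates, reusing the sparse binary features and per-action weight rows from the previous listing, is given below. The environment interface, step-size, λ, and exploration rate are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

ALPHA, GAMMA, LAM, EPSILON = 0.01, 0.99, 0.9, 0.05   # illustrative hyperparameters

def epsilon_greedy(phi, theta, num_actions):
    """Pick a random action with probability EPSILON, otherwise the greedy one."""
    if np.random.rand() < EPSILON:
        return np.random.randint(num_actions)
    return int(np.argmax([theta[a, phi].sum() for a in range(num_actions)]))

def sarsa_lambda_episode(env, theta):
    """Run one episode of Sarsa(lambda) with linear function approximation.

    `env` is a hypothetical wrapper exposing reset() -> active feature indices and
    step(action) -> (next active feature indices, reward, done); it stands in for
    the ALE plus a feature extractor and is not an interface defined in the paper.
    """
    num_actions = theta.shape[0]
    e = np.zeros_like(theta)              # eligibility traces, one per weight
    phi = env.reset()
    a = epsilon_greedy(phi, theta, num_actions)
    done = False
    while not done:
        phi_next, r, done = env.step(a)
        q = theta[a, phi].sum()
        if done:
            delta = r - q                 # no bootstrapping from a terminal state
        else:
            a_next = epsilon_greedy(phi_next, theta, num_actions)
            delta = r + GAMMA * theta[a_next, phi_next].sum() - q
        e *= GAMMA * LAM                  # decay all traces
        e[a, phi] += 1.0                  # accumulate traces for the active features
        theta += ALPHA * delta * e        # move the weights along the traces
        if not done:
            phi, a = phi_next, a_next
    return theta
```

The trace vector lets a single temporal-difference error update the weights of features that were active several steps earlier, which is the credit assignment behavior controlled by λ.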
2.2 Arcade Learning Environment

In the Arcade Learning Environment agents have access only to sensory information (a screen 160 pixels wide by 210 pixels high).
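As a rough illustration of that interface, the snippet below runs a random agent through one episode with the ale-py Python bindings; the package, method names, and ROM path are assumptions based on the publicly released ALE tooling rather than anything specified in the paper, and should be checked against the installed version.

```python
import numpy as np
from ale_py import ALEInterface   # assumed: the publicly released ALE Python bindings

ale = ALEInterface()
ale.setInt("random_seed", 123)
ale.loadROM("breakout.bin")        # placeholder ROM path

actions = ale.getLegalActionSet()  # the agent sees only pixels and the reward signal
episode_return = 0.0
while not ale.game_over():
    frame = ale.getScreenRGB()     # 210 x 160 x 3 observation, as described above
    action = actions[np.random.randint(len(actions))]
    episode_return += ale.act(action)   # act() returns the one-step reward
ale.reset_game()
print("episode return:", episode_return)
```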