Multi-Agent Reinforcement Learning: a Selective Overview of Theories and Algorithms
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Artificial Intelligence in Health Care: the Hope, the Hype, the Promise, the Peril
Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril Michael Matheny, Sonoo Thadaney Israni, Mahnoor Ahmed, and Danielle Whicher, Editors WASHINGTON, DC NAM.EDU PREPUBLICATION COPY - Uncorrected Proofs NATIONAL ACADEMY OF MEDICINE • 500 Fifth Street, NW • WASHINGTON, DC 20001 NOTICE: This publication has undergone peer review according to procedures established by the National Academy of Medicine (NAM). Publication by the NAM worthy of public attention, but does not constitute endorsement of conclusions and recommendationssignifies that it is the by productthe NAM. of The a carefully views presented considered in processthis publication and is a contributionare those of individual contributors and do not represent formal consensus positions of the authors’ organizations; the NAM; or the National Academies of Sciences, Engineering, and Medicine. Library of Congress Cataloging-in-Publication Data to Come Copyright 2019 by the National Academy of Sciences. All rights reserved. Printed in the United States of America. Suggested citation: Matheny, M., S. Thadaney Israni, M. Ahmed, and D. Whicher, Editors. 2019. Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril. NAM Special Publication. Washington, DC: National Academy of Medicine. PREPUBLICATION COPY - Uncorrected Proofs “Knowing is not enough; we must apply. Willing is not enough; we must do.” --GOETHE PREPUBLICATION COPY - Uncorrected Proofs ABOUT THE NATIONAL ACADEMY OF MEDICINE The National Academy of Medicine is one of three Academies constituting the Nation- al Academies of Sciences, Engineering, and Medicine (the National Academies). The Na- tional Academies provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. -
Learning and Equilibrium
Learning and Equilibrium Drew Fudenberg1 and David K. Levine2 1Department of Economics, Harvard University, Cambridge, Massachusetts; email: [email protected] 2Department of Economics, Washington University of St. Louis, St. Louis, Missouri; email: [email protected] Annu. Rev. Econ. 2009. 1:385–419 Key Words First published online as a Review in Advance on nonequilibrium dynamics, bounded rationality, Nash equilibrium, June 11, 2009 self-confirming equilibrium The Annual Review of Economics is online at by 140.247.212.190 on 09/04/09. For personal use only. econ.annualreviews.org Abstract This article’s doi: The theory of learning in games explores how, which, and what 10.1146/annurev.economics.050708.142930 kind of equilibria might arise as a consequence of a long-run non- Annu. Rev. Econ. 2009.1:385-420. Downloaded from arjournals.annualreviews.org Copyright © 2009 by Annual Reviews. equilibrium process of learning, adaptation, and/or imitation. If All rights reserved agents’ strategies are completely observed at the end of each round 1941-1383/09/0904-0385$20.00 (and agents are randomly matched with a series of anonymous opponents), fairly simple rules perform well in terms of the agent’s worst-case payoffs, and also guarantee that any steady state of the system must correspond to an equilibrium. If players do not ob- serve the strategies chosen by their opponents (as in extensive-form games), then learning is consistent with steady states that are not Nash equilibria because players can maintain incorrect beliefs about off-path play. Beliefs can also be incorrect because of cogni- tive limitations and systematic inferential errors. -
The Poetics of Reflection in Digital Games
© Copyright 2019 Terrence E. Schenold The Poetics of Reflection in Digital Games Terrence E. Schenold A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2019 Reading Committee: Brian M. Reed, Chair Leroy F. Searle Phillip S. Thurtle Program Authorized to Offer Degree: English University of Washington Abstract The Poetics of Reflection in Digital Games Terrence E. Schenold Chair of the Supervisory Committee: Brian Reed, Professor English The Poetics of Reflection in Digital Games explores the complex relationship between digital games and the activity of reflection in the context of the contemporary media ecology. The general aim of the project is to create a critical perspective on digital games that recovers aesthetic concerns for game studies, thereby enabling new discussions of their significance as mediations of thought and perception. The arguments advanced about digital games draw on philosophical aesthetics, media theory, and game studies to develop a critical perspective on gameplay as an aesthetic experience, enabling analysis of how particular games strategically educe and organize reflective modes of thought and perception by design, and do so for the purposes of generating meaning and supporting expressive or artistic goals beyond amusement. The project also provides critical discussion of two important contexts relevant to understanding the significance of this poetic strategy in the field of digital games: the dynamics of the contemporary media ecology, and the technological and cultural forces informing game design thinking in the ludic century. The project begins with a critique of limiting conceptions of gameplay in game studies grounded in a close reading of Bethesda's Morrowind, arguing for a new a "phaneroscopical perspective" that accounts for the significance of a "noematic" layer in the gameplay experience that accounts for dynamics of player reflection on diegetic information and its integral relation to ergodic activity. -
Improving Fictitious Play Reinforcement Learning with Expanding Models
Improving Fictitious Play Reinforcement Learning with Expanding Models Rong-Jun Qin1;2, Jing-Cheng Pang1, Yang Yu1;y 1National Key Laboratory for Novel Software Technology, Nanjing University, China 2Polixir emails: [email protected], [email protected], [email protected]. yTo whom correspondence should be addressed Abstract Fictitious play with reinforcement learning is a general and effective framework for zero- sum games. However, using the current deep neural network models, the implementation of fictitious play faces crucial challenges. Neural network model training employs gradi- ent descent approaches to update all connection weights, and thus is easy to forget the old opponents after training to beat the new opponents. Existing approaches often maintain a pool of historical policy models to avoid the forgetting. However, learning to beat a pool in stochastic games, i.e., a wide distribution over policy models, is either sample-consuming or insufficient to exploit all models with limited amount of samples. In this paper, we pro- pose a learning process with neural fictitious play to alleviate the above issues. We train a single model as our policy model, which consists of sub-models and a selector. Everytime facing a new opponent, the model is expanded by adding a new sub-model, where only the new sub-model is updated instead of the whole model. At the same time, the selector is also updated to mix up the new sub-model with the previous ones at the state-level, so that the model is maintained as a behavior strategy instead of a wide distribution over policy models. -
Approximation Guarantees for Fictitious Play
Approximation Guarantees for Fictitious Play Vincent Conitzer Abstract— Fictitious play is a simple, well-known, and often- significant progress has been made in the computation of used algorithm for playing (and, especially, learning to play) approximate Nash equilibria. It has been shown that for any games. However, in general it does not converge to equilibrium; ǫ, there is an ǫ-approximate equilibrium with support size even when it does, we may not be able to run it to convergence. 2 Still, we may obtain an approximate equilibrium. In this O((log n)/ǫ ) [1], [22], so that we can find approximate paper, we study the approximation properties that fictitious equilibria by searching over all of these supports. More play obtains when it is run for a limited number of rounds. recently, Daskalakis et al. gave a very simple algorithm We show that if both players randomize uniformly over their for finding a 1/2-approximate Nash equilibrium [12]; we actions in the first r rounds of fictitious play, then the result will discuss this algorithm shortly. Feder et al. then showed is an ǫ-equilibrium, where ǫ = (r + 1)/(2r). (Since we are examining only a constant number of pure strategies, we know a lower bound on the size of the supports that must be that ǫ < 1/2 is impossible, due to a result of Feder et al.) We considered to be guaranteed to find an approximate equi- show that this bound is tight in the worst case; however, with an librium [16]; in particular, supports of constant sizes can experiment on random games, we illustrate that fictitious play give at best a 1/2-approximate Nash equilibrium. -
Modeling Human Learning in Games
Modeling Human Learning in Games Thesis by Norah Alghamdi In Partial Fulfillment of the Requirements For the Degree of Masters of Science King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia December, 2020 2 EXAMINATION COMMITTEE PAGE The thesis of Norah Alghamdi is approved by the examination committee Committee Chairperson: Prof. Jeff S. Shamma Committee Members: Prof. Eric Feron, Prof. Meriem T. Laleg 3 ©December, 2020 Norah Alghamdi All Rights Reserved 4 ABSTRACT Modeling Human Learning in Games Norah Alghamdi Human-robot interaction is an important and broad area of study. To achieve success- ful interaction, we have to study human decision making rules. This work investigates human learning rules in games with the presence of intelligent decision makers. Par- ticularly, we analyze human behavior in a congestion game. The game models traffic in a simple scenario where multiple vehicles share two roads. Ten vehicles are con- trolled by the human player, where they decide on how to distribute their vehicles on the two roads. There are hundred simulated players each controlling one vehicle. The game is repeated for many rounds, allowing the players to adapt and formulate a strategy, and after each round, the cost of the roads and visual assistance is shown to the human player. The goal of all players is to minimize the total congestion experienced by the vehicles they control. In order to demonstrate our results, we first built a human player simulator using Fictitious play and Regret Matching algorithms. Then, we showed the passivity property of these algorithms after adjusting the passivity condition to suit discrete time formulation. -
Learning Board Game Rules from an Instruction Manual Chad Mills A
Learning Board Game Rules from an Instruction Manual Chad Mills A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science University of Washington 2013 Committee: Gina-Anne Levow Fei Xia Program Authorized to Offer Degree: Linguistics – Computational Linguistics ©Copyright 2013 Chad Mills University of Washington Abstract Learning Board Game Rules from an Instruction Manual Chad Mills Chair of the Supervisory Committee: Professor Gina-Anne Levow Department of Linguistics Board game rulebooks offer a convenient scenario for extracting a systematic logical structure from a passage of text since the mechanisms by which board game pieces interact must be fully specified in the rulebook and outside world knowledge is irrelevant to gameplay. A representation was proposed for representing a game’s rules with a tree structure of logically-connected rules, and this problem was shown to be one of a generalized class of problems in mapping text to a hierarchical, logical structure. Then a keyword-based entity- and relation-extraction system was proposed for mapping rulebook text into the corresponding logical representation, which achieved an f-measure of 11% with a high recall but very low precision, due in part to many statements in the rulebook offering strategic advice or elaboration and causing spurious rules to be proposed based on keyword matches. The keyword-based approach was compared to a machine learning approach, and the former dominated with nearly twenty times better precision at the same level of recall. This was due to the large number of rule classes to extract and the relatively small data set given this is a new problem area and all data had to be manually annotated. -
Games and Rules
Requirements for a General Game Mechanics Framework Imre Hofmann In this article I will apply a meta-theoretical approach to the question of how a game mechanics framework has to be designed in order to fulfil its task of ade- quately depicting or modeling the reality of game mechanics. I had two aims in mind when starting my research. I studied existing game theories and three game mechanics theories in particular in order to evaluate the state of the art of game mechanics theory. (Descriptive aim) The second aim followed on from these studies. With the results of my research I wanted to define the attributes a com- prehensive and general game mechanics theory might or – rather – needed to have. What are the general properties that such a theory should display? (Norma- tive aim) In my opinion the underlying mechanics of a game may be regarded in some way as its centerpiece. The reasons for this shall become clear over the course of this article. To begin with, I would like to make two terminological clarifications. First, I am not going to examine different game mechanics and typologies or categories of game mechanics. Instead, I will take a theoretical step backwards, so to speak, on the meta-level by analyzing and comparing different theories of game me- chanics (which themselves suggest categories and typologies). Secondly, I will use the expressions “game mechanics theory” “(game mechanics) framework” and “(game mechanics) model” synonymously. In my research I focused mainly on the following three key theories of game mechanics: 1. Carlo Fabricatore’s Gameplay and Game Mechanics Design (2007) 2. -
Transfer Learning Between RTS Combat Scenarios Using Component-Action Deep Reinforcement Learning
Transfer Learning Between RTS Combat Scenarios Using Component-Action Deep Reinforcement Learning Richard Kelly and David Churchill Department of Computer Science Memorial University of Newfoundland St. John’s, NL, Canada [email protected], [email protected] Abstract an enormous financial investment in hardware for training, using over 80000 CPU cores to run simultaneous instances Real-time Strategy (RTS) games provide a challenging en- of StarCraft II, 1200 Tensor Processor Units (TPUs) to train vironment for AI research, due to their large state and ac- the networks, as well as a large amount of infrastructure and tion spaces, hidden information, and real-time gameplay. Star- Craft II has become a new test-bed for deep reinforcement electricity to drive this large-scale computation. While Al- learning systems using the StarCraft II Learning Environment phaStar is estimated to be the strongest existing RTS AI agent (SC2LE). Recently the full game of StarCraft II has been ap- and was capable of beating many players at the Grandmas- proached with a complex multi-agent reinforcement learning ter rank on the StarCraft II ladder, it does not yet play at the (RL) system, however this is currently only possible with ex- level of the world’s best human players (e.g. in a tournament tremely large financial investments out of the reach of most setting). The creation of AlphaStar demonstrated that using researchers. In this paper we show progress on using varia- deep learning to tackle RTS AI is a powerful solution, how- tions of easier to use RL techniques, modified to accommo- ever applying it to the entire game as a whole is not an eco- date actions with multiple components used in the SC2LE. -
Week 3 Day 1: Designing Games with Conditionals
INTRODUCTION | CYBER ARCADE: PROGRAMMING AND MAKING WITH MICRO:BIT 3-1 | ELEMENTARY WEEK 3 DAY 1: DESIGNING GAMES WITH CONDITIONALS INTRODUCTION This week makers continue to use the Micro:bit microcontroller and begin coding and designing interactive games and art. ESSENTIAL QUESTIONS • How can we use code and math to create a fun game? • What is a conditional statement? • How do artists, engineers, and makers solve problems when they’re working? LEARNING OUTCOMES 1. Learn how to use code and conditionals to create a game. 2. Engage in project-based learning through problem-solving and troubleshooting by creating a game using a Micro:bit microcontroller and code. TEACHERLESSON | RESOURCE CYBER ARCADE: | CYBER PROGRAMMING ARCADE: PROGRAMMING AND MAKING AND WITH MAKING MICRO:BIT WITH MICRO:BIT 3-23-2 | | ELEMENTARY ELEMENTARY VOCABULARY Conditional: Set of rules performed if a certain condition is met Game mechanics: Basic actions, processes, visuals, and control mechanisms that are used to make a game Game designer: Person responsible for designing game storylines, plots, objectives, scenarios, the degree of difficulty, and character development Game engineer: Specialized software engineers who design and program video games Troubleshooting: Using resources to solve issues as they arise TEACHERLESSON | RESOURCE CYBER ARCADE: | CYBER PROGRAMMING ARCADE: PROGRAMMING AND MAKING AND WITH MAKING MICRO:BIT WITH MICRO:BIT 3-33-3 | | ELEMENTARY ELEMENTARY MATERIALS LIST EACH PAIR OF MAKERS NEEDS: • Micro:bit microcontroller • Laptop with internet connection • USB to micro-USB cord • USB flash drive • External battery pack • AAA batteries (2) • Notebook TEACHER RESOURCE | CYBER ARCADE: PROGRAMMING AND MAKING WITH MICRO:BIT 3-4 | ELEMENTARY They can also use the Troubleshooting Tips, search the internet for help, TEACHER PREP WORK or use the tutorial page on the MakeCode website. -
Computer Game Flow Design, ACM Computers in Entertainment, 4, 1
LJMU Research Online Taylor, MJ, Baskett, M, Reilly, D and Ravindran, S Game theory for computer games design http://researchonline.ljmu.ac.uk/id/eprint/7332/ Article Citation (please note it is advisable to refer to the publisher’s version if you intend to cite from this work) Taylor, MJ, Baskett, M, Reilly, D and Ravindran, S (2017) Game theory for computer games design. Games and Culture, 14 (7-8). pp. 843-855. ISSN 1555-4139 LJMU has developed LJMU Research Online for users to access the research output of the University more effectively. Copyright © and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Users may download and/or print one copy of any article(s) in LJMU Research Online to facilitate their private study or for non-commercial research. You may not engage in further distribution of the material or use it for any profit-making activities or any commercial gain. The version presented here may differ from the published version or from the version of the record. Please see the repository URL above for details on accessing the published version and note that access may require a subscription. For more information please contact [email protected] http://researchonline.ljmu.ac.uk/ Game theory for computer games design Abstract Designing and developing computer games can be a complex activity that may involve professionals from a variety of disciplines. In this article, we examine the use of game theory for supporting the design of game play within the different sections of a computer game, and demonstrate its application in practice via adapted high-level decision trees for modelling the flow in game play and payoff matrices for modelling skill or challenge levels. -
Games and Rules
Games as a Special Zone Motivation Mechanics of Games René Bauer PLAY AS POSSIBILITY SPACE In games, sheep might speak, mountains fly, spaces move, moons disappear and faces morph; men can shrink, avatars can be moved, time rewound or frogs blown up. What would be regarded as delusional laws, imaginations and crazy ideas in other areas, as described for example in Dr. Daniel Schreber’s 1903 book “Denkwürdigkeiten eines Nervenkranken” (Memories of a Neuropath), is a tangible and interactive reality in many games. It is the special zone in which games operate that makes these divergent rules and laws culturally possible. Huizinga identified a number of playful special zones such as arenas, card tables, “Magic Circles”, stages or temples. They are all playgrounds. “The arena, the card-table, the magic circle, the temple, the stage, the screen, the tennis court, the court of justice, etc, are all in form and function play-grounds, i.e. forbidden spots, isolated, hedged round, hallowed, within which special rules obtain. All are tempo- rary worlds within the ordinary world, dedicated to the performance of an act apart.” (Huizinga 1955: 5) Thus, games (temporarily) position themselves against the ‘analog’ world with its fixed continuous rules that are based on atoms and their properties. The ana- log space surrounding us is bijective and continuous. It comes with three spatial dimensions, a continuous time and various resulting ‘laws’. The same applies to social and cultural conditions, which can also override or subvert games. This analog space and its rules become a complex, special set of rules in the much 36 | René Bauer more powerful possibility space of the games – or in other words: the analog space turns into another one of many game systems.