Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information ∗
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Learning and Equilibrium
Learning and Equilibrium Drew Fudenberg1 and David K. Levine2 1Department of Economics, Harvard University, Cambridge, Massachusetts; email: [email protected] 2Department of Economics, Washington University of St. Louis, St. Louis, Missouri; email: [email protected] Annu. Rev. Econ. 2009. 1:385–419 Key Words First published online as a Review in Advance on nonequilibrium dynamics, bounded rationality, Nash equilibrium, June 11, 2009 self-confirming equilibrium The Annual Review of Economics is online at by 140.247.212.190 on 09/04/09. For personal use only. econ.annualreviews.org Abstract This article’s doi: The theory of learning in games explores how, which, and what 10.1146/annurev.economics.050708.142930 kind of equilibria might arise as a consequence of a long-run non- Annu. Rev. Econ. 2009.1:385-420. Downloaded from arjournals.annualreviews.org Copyright © 2009 by Annual Reviews. equilibrium process of learning, adaptation, and/or imitation. If All rights reserved agents’ strategies are completely observed at the end of each round 1941-1383/09/0904-0385$20.00 (and agents are randomly matched with a series of anonymous opponents), fairly simple rules perform well in terms of the agent’s worst-case payoffs, and also guarantee that any steady state of the system must correspond to an equilibrium. If players do not ob- serve the strategies chosen by their opponents (as in extensive-form games), then learning is consistent with steady states that are not Nash equilibria because players can maintain incorrect beliefs about off-path play. Beliefs can also be incorrect because of cogni- tive limitations and systematic inferential errors. -
Improving Fictitious Play Reinforcement Learning with Expanding Models
Improving Fictitious Play Reinforcement Learning with Expanding Models Rong-Jun Qin1;2, Jing-Cheng Pang1, Yang Yu1;y 1National Key Laboratory for Novel Software Technology, Nanjing University, China 2Polixir emails: [email protected], [email protected], [email protected]. yTo whom correspondence should be addressed Abstract Fictitious play with reinforcement learning is a general and effective framework for zero- sum games. However, using the current deep neural network models, the implementation of fictitious play faces crucial challenges. Neural network model training employs gradi- ent descent approaches to update all connection weights, and thus is easy to forget the old opponents after training to beat the new opponents. Existing approaches often maintain a pool of historical policy models to avoid the forgetting. However, learning to beat a pool in stochastic games, i.e., a wide distribution over policy models, is either sample-consuming or insufficient to exploit all models with limited amount of samples. In this paper, we pro- pose a learning process with neural fictitious play to alleviate the above issues. We train a single model as our policy model, which consists of sub-models and a selector. Everytime facing a new opponent, the model is expanded by adding a new sub-model, where only the new sub-model is updated instead of the whole model. At the same time, the selector is also updated to mix up the new sub-model with the previous ones at the state-level, so that the model is maintained as a behavior strategy instead of a wide distribution over policy models. -
Approximation Guarantees for Fictitious Play
Approximation Guarantees for Fictitious Play Vincent Conitzer Abstract— Fictitious play is a simple, well-known, and often- significant progress has been made in the computation of used algorithm for playing (and, especially, learning to play) approximate Nash equilibria. It has been shown that for any games. However, in general it does not converge to equilibrium; ǫ, there is an ǫ-approximate equilibrium with support size even when it does, we may not be able to run it to convergence. 2 Still, we may obtain an approximate equilibrium. In this O((log n)/ǫ ) [1], [22], so that we can find approximate paper, we study the approximation properties that fictitious equilibria by searching over all of these supports. More play obtains when it is run for a limited number of rounds. recently, Daskalakis et al. gave a very simple algorithm We show that if both players randomize uniformly over their for finding a 1/2-approximate Nash equilibrium [12]; we actions in the first r rounds of fictitious play, then the result will discuss this algorithm shortly. Feder et al. then showed is an ǫ-equilibrium, where ǫ = (r + 1)/(2r). (Since we are examining only a constant number of pure strategies, we know a lower bound on the size of the supports that must be that ǫ < 1/2 is impossible, due to a result of Feder et al.) We considered to be guaranteed to find an approximate equi- show that this bound is tight in the worst case; however, with an librium [16]; in particular, supports of constant sizes can experiment on random games, we illustrate that fictitious play give at best a 1/2-approximate Nash equilibrium. -
Modeling Human Learning in Games
Modeling Human Learning in Games Thesis by Norah Alghamdi In Partial Fulfillment of the Requirements For the Degree of Masters of Science King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia December, 2020 2 EXAMINATION COMMITTEE PAGE The thesis of Norah Alghamdi is approved by the examination committee Committee Chairperson: Prof. Jeff S. Shamma Committee Members: Prof. Eric Feron, Prof. Meriem T. Laleg 3 ©December, 2020 Norah Alghamdi All Rights Reserved 4 ABSTRACT Modeling Human Learning in Games Norah Alghamdi Human-robot interaction is an important and broad area of study. To achieve success- ful interaction, we have to study human decision making rules. This work investigates human learning rules in games with the presence of intelligent decision makers. Par- ticularly, we analyze human behavior in a congestion game. The game models traffic in a simple scenario where multiple vehicles share two roads. Ten vehicles are con- trolled by the human player, where they decide on how to distribute their vehicles on the two roads. There are hundred simulated players each controlling one vehicle. The game is repeated for many rounds, allowing the players to adapt and formulate a strategy, and after each round, the cost of the roads and visual assistance is shown to the human player. The goal of all players is to minimize the total congestion experienced by the vehicles they control. In order to demonstrate our results, we first built a human player simulator using Fictitious play and Regret Matching algorithms. Then, we showed the passivity property of these algorithms after adjusting the passivity condition to suit discrete time formulation. -
Stochastic Game Theory Applications for Power Management in Cognitive Networks
STOCHASTIC GAME THEORY APPLICATIONS FOR POWER MANAGEMENT IN COGNITIVE NETWORKS A thesis submitted to Kent State University in partial fulfillment of the requirements for the degree of Master of Digital Science by Sham Fung Feb 2014 Thesis written by Sham Fung M.D.S, Kent State University, 2014 Approved by Advisor , Director, School of Digital Science ii TABLE OF CONTENTS LIST OF FIGURES . vi LIST OF TABLES . vii Acknowledgments . viii Dedication . ix 1 BACKGROUND OF COGNITIVE NETWORK . 1 1.1 Motivation and Requirements . 1 1.2 Definition of Cognitive Networks . 2 1.3 Key Elements of Cognitive Networks . 3 1.3.1 Learning and Reasoning . 3 1.3.2 Cognitive Network as a Multi-agent System . 4 2 POWER ALLOCATION IN COGNITIVE NETWORKS . 5 2.1 Overview . 5 2.2 Two Communication Paradigms . 6 2.3 Power Allocation Scheme . 8 2.3.1 Interference Constraints for Primary Users . 8 2.3.2 QoS Constraints for Secondary Users . 9 2.4 Related Works on Power Management in Cognitive Networks . 10 iii 3 GAME THEORY|PRELIMINARIES AND RELATED WORKS . 12 3.1 Non-cooperative Game . 14 3.1.1 Nash Equilibrium . 14 3.1.2 Other Equilibriums . 16 3.2 Cooperative Game . 16 3.2.1 Bargaining Game . 17 3.2.2 Coalition Game . 17 3.2.3 Solution Concept . 18 3.3 Stochastic Game . 19 3.4 Other Types of Games . 20 4 GAME THEORETICAL APPROACH ON POWER ALLOCATION . 23 4.1 Two Fundamental Types of Utility Functions . 23 4.1.1 QoS-Based Game . 23 4.1.2 Linear Pricing Game . -
Deterministic and Stochastic Prisoner's Dilemma Games: Experiments in Interdependent Security
NBER TECHNICAL WORKING PAPER SERIES DETERMINISTIC AND STOCHASTIC PRISONER'S DILEMMA GAMES: EXPERIMENTS IN INTERDEPENDENT SECURITY Howard Kunreuther Gabriel Silvasi Eric T. Bradlow Dylan Small Technical Working Paper 341 http://www.nber.org/papers/t0341 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 August 2007 We appreciate helpful discussions in designing the experiments and comments on earlier drafts of this paper by Colin Camerer, Vince Crawford, Rachel Croson, Robyn Dawes, Aureo DePaula, Geoff Heal, Charles Holt, David Krantz, Jack Ochs, Al Roth and Christian Schade. We also benefited from helpful comments from participants at the Workshop on Interdependent Security at the University of Pennsylvania (May 31-June 1 2006) and at the Workshop on Innovation and Coordination at Humboldt University (December 18-20, 2006). We thank George Abraham and Usmann Hassan for their help in organizing the data from the experiments. Support from NSF Grant CMS-0527598 and the Wharton Risk Management and Decision Processes Center is gratefully acknowledged. The views expressed herein are those of the author(s) and do not necessarily reflect the views of the National Bureau of Economic Research. © 2007 by Howard Kunreuther, Gabriel Silvasi, Eric T. Bradlow, and Dylan Small. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Deterministic and Stochastic Prisoner's Dilemma Games: Experiments in Interdependent Security Howard Kunreuther, Gabriel Silvasi, Eric T. Bradlow, and Dylan Small NBER Technical Working Paper No. 341 August 2007 JEL No. C11,C12,C22,C23,C73,C91 ABSTRACT This paper examines experiments on interdependent security prisoner's dilemma games with repeated play. -
Stochastic Games with Hidden States∗
Stochastic Games with Hidden States¤ Yuichi Yamamoto† First Draft: March 29, 2014 This Version: January 14, 2015 Abstract This paper studies infinite-horizon stochastic games in which players ob- serve noisy public information about a hidden state each period. We find that if the game is connected, the limit feasible payoff set exists and is invariant to the initial prior about the state. Building on this invariance result, we pro- vide a recursive characterization of the equilibrium payoff set and establish the folk theorem. We also show that connectedness can be replaced with an even weaker condition, called asymptotic connectedness. Asymptotic con- nectedness is satisfied for generic signal distributions, if the state evolution is irreducible. Journal of Economic Literature Classification Numbers: C72, C73. Keywords: stochastic game, hidden state, connectedness, stochastic self- generation, folk theorem. ¤The author thanks Naoki Aizawa, Drew Fudenberg, Johannes Horner,¨ Atsushi Iwasaki, Michi- hiro Kandori, George Mailath, Takeaki Sunada, and Masatoshi Tsumagari for helpful conversa- tions, and seminar participants at various places. †Department of Economics, University of Pennsylvania. Email: [email protected] 1 1 Introduction When agents have a long-run relationship, underlying economic conditions may change over time. A leading example is a repeated Bertrand competition with stochastic demand shocks. Rotemberg and Saloner (1986) explore optimal col- lusive pricing when random demand shocks are i.i.d. each period. Haltiwanger and Harrington (1991), Kandori (1991), and Bagwell and Staiger (1997) further extend the analysis to the case in which demand fluctuations are cyclic or persis- tent. One of the crucial assumptions of these papers is that demand shocks are publicly observable before firms make their decisions in each period. -
Lecture 10: Learning in Games Ramesh Johari May 9, 2007
MS&E 336 Lecture 10: Learning in games Ramesh Johari May 9, 2007 This lecture introduces our study of learning in games. We first give a conceptual overview of the possible approaches to studying learning in repeated games; in particular, we distinguish be- tween approaches that use a Bayesian model of the opponents, vs. nonparametric or “model-free” approaches to playing against the opponents. Our investigation will focus almost entirely on the second class of models, where the main results are closely tied to the study of regret minimization in online regret. We introduce two notions of regret minimization, and also consider the corre- sponding equilibrium notions. (Note that we focus attention on repeated games primarily because learning results in stochastic games are significantly weaker.) Throughout the lecture we consider a finite N-player game, where each player i has a finite pure action set ; let , and let . We let denote a pure action for Ai A = Qi Ai A−i = Qj=6 i Aj ai player i, and let si ∈ ∆(Ai) denote a mixed action for player i. We will typically view si as a Ai vector in R , with si(ai) equal to the probability that player i places on ai. We let Πi(a) denote the payoff to player i when the composite pure action vector is a, and by an abuse of notation also let Πi(s) denote the expected payoff to player i when the composite mixed action vector is s. More q q a a generally, if is a joint probability distribution on the set A, we let Πi( )= Pa∈A q( )Π( ). -
On the Convergence of Fictitious Play Author(S): Vijay Krishna and Tomas Sjöström Source: Mathematics of Operations Research, Vol
On the Convergence of Fictitious Play Author(s): Vijay Krishna and Tomas Sjöström Source: Mathematics of Operations Research, Vol. 23, No. 2 (May, 1998), pp. 479-511 Published by: INFORMS Stable URL: http://www.jstor.org/stable/3690523 Accessed: 09/12/2010 03:21 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=informs. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. INFORMS is collaborating with JSTOR to digitize, preserve and extend access to Mathematics of Operations Research. http://www.jstor.org MATHEMATICS OF OPERATIONS RESEARCH Vol. 23, No. 2, May 1998 Printed in U.S.A. -
570: Minimax Sample Complexity for Turn-Based Stochastic Game
Minimax Sample Complexity for Turn-based Stochastic Game Qiwen Cui1 Lin F. Yang2 1School of Mathematical Sciences, Peking University 2Electrical and Computer Engineering Department, University of California, Los Angeles Abstract guarantees are rather rare due to complex interaction be- tween agents that makes the problem considerably harder than single agent reinforcement learning. This is also known The empirical success of multi-agent reinforce- as non-stationarity in MARL, which means when multi- ment learning is encouraging, while few theoret- ple agents alter their strategies based on samples collected ical guarantees have been revealed. In this work, from previous strategy, the system becomes non-stationary we prove that the plug-in solver approach, proba- for each agent and the improvement can not be guaranteed. bly the most natural reinforcement learning algo- One fundamental question in MBRL is that how to design rithm, achieves minimax sample complexity for efficient algorithms to overcome non-stationarity. turn-based stochastic game (TBSG). Specifically, we perform planning in an empirical TBSG by Two-players turn-based stochastic game (TBSG) is a two- utilizing a ‘simulator’ that allows sampling from agents generalization of Markov decision process (MDP), arbitrary state-action pair. We show that the em- where two agents choose actions in turn and one agent wants pirical Nash equilibrium strategy is an approxi- to maximize the total reward while the other wants to min- mate Nash equilibrium strategy in the true TBSG imize it. As a zero-sum game, TBSG is known to have and give both problem-dependent and problem- Nash equilibrium strategy [Shapley, 1953], which means independent bound. -
Discovery and Equilibrium in Games with Unawareness∗
Discovery and Equilibrium in Games with Unawareness∗ Burkhard C. Schippery April 20, 2018 Abstract Equilibrium notions for games with unawareness in the literature cannot be interpreted as steady-states of a learning process because players may discover novel actions during play. In this sense, many games with unawareness are \self-destroying" as a player's representa- tion of the game may change after playing it once. We define discovery processes where at each state there is an extensive-form game with unawareness that together with the players' play determines the transition to possibly another extensive-form game with unawareness in which players are now aware of actions that they have discovered. A discovery process is rationalizable if players play extensive-form rationalizable strategies in each game with un- awareness. We show that for any game with unawareness there is a rationalizable discovery process that leads to a self-confirming game that possesses a self-confirming equilibrium in extensive-form rationalizable strategies. This notion of equilibrium can be interpreted as steady-state of both a discovery and learning process. Keywords: Self-confirming equilibrium, conjectural equilibrium, extensive-form rational- izability, unawareness, extensive-form games, equilibrium, learning, discovery. JEL-Classifications: C72, D83. ∗I thank Aviad Heifetz, Byung Soo Lee, and seminar participants at UC Berkeley, the University of Toronto, the Barcelona workshop on Limited Reasoning and Cognition, SAET 2015, CSLI 2016, TARK 2017, and the Virginia Tech Workshop on Advances in Decision Theory 2018 for helpful discussions. An abbreviated earlier version appeared in the online proceedings of TARK 2017 under the title “Self-Confirming Games: Unawareness, Discovery, and Equilibrium". -
Information and Beliefs in a Repeated Normal-Form Game
Beliefs inaRepeated Forschungsgemeinschaft through through Forschungsgemeinschaft SFB 649DiscussionPaper2008-026 Normal-form Game Information and This research was supported by the Deutsche the Deutsche by was supported This research * TechnischeUniversitätBerlin,Germany SFB 649, Humboldt-Universität zu Berlin zu SFB 649,Humboldt-Universität Dorothea Kübler* Spandauer Straße 1,D-10178 Berlin Spandauer Dietmar Fehr* http://sfb649.wiwi.hu-berlin.de http://sfb649.wiwi.hu-berlin.de David Danz* ISSN 1860-5664 the SFB 649 "Economic Risk". "Economic the SFB649 SFB 6 4 9 E C O N O M I C R I S K B E R L I N Information and Beliefs in a Repeated Normal-form Game Dietmar Fehr Dorothea Kübler Technische Universität Berlin Technische Universität Berlin & IZA David Danz Technische Universität Berlin March 29, 2008 Abstract We study beliefs and choices in a repeated normal-form game. In addition to a baseline treatment with common knowledge of the game structure and feedback about choices in the previous period, we run treatments (i) without feedback about previous play, (ii) with no infor- mation about the opponent’spayo¤s and (iii) with random matching. Using Stahl and Wilson’s (1995) model of limited strategic reasoning, we classify behavior with regard to its strategic sophistication and consider its development over time. We use belief statements to track the consistency of subjects’actions and beliefs as well as the accuracy of their beliefs (relative to the opponent’strue choice) over time. In the baseline treatment we observe more sophisticated play as well as more consistent and more accurate beliefs over time. We isolate feedback as the main driving force of such learning.