Modeling Replicator Dynamics in Stochastic Games Using Markov Chain Method
Chuang Deng, Department of Automation, Shanghai Jiao Tong University, [email protected]
Zhihai Rong, School of Computer Science and Engineering, University of Electronic Science and Technology of China, [email protected]
Lin Wang, Department of Automation, Shanghai Jiao Tong University, [email protected]
Xiaofan Wang, School of Mechatronic Engineering and Automation, Shanghai University, [email protected]

ABSTRACT
In stochastic games, individuals need to make decisions in multiple states, and transitions between states significantly influence the dynamics of strategies. In this work, by describing the dynamic process of a stochastic game as a Markov chain and utilizing the transition matrix, we introduce a new method, named state-transition replicator dynamics, to obtain the replicator dynamics of a stochastic game. Based on the proposed model, we gain qualitative and detailed insights into the influence of transition probabilities on the dynamics of strategies. We illustrate that a set of unbalanced transition probabilities can help players overcome social dilemmas and lead to mutual cooperation in the cooperation back state, even if the stochastic game poses the same social dilemma in each state. Moreover, we show that a set of specifically designed transition probabilities can fix the expected payoffs of one player and make him lose the motivation to update his strategies in the stochastic game.

KEYWORDS
Multi-agent Learning; Evolutionary Game Theory; Replicator Dynamics; Stochastic Games

ACM Reference Format:
Chuang Deng, Zhihai Rong, Lin Wang, and Xiaofan Wang. 2021. Modeling Replicator Dynamics in Stochastic Games Using Markov Chain Method. In Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3-7, 2021, IFAAMAS, 9 pages.

1 INTRODUCTION
The tragedy of the commons raises the question of how to drive and reinforce cooperative behaviors in social and economic systems [4, 17]. This question has been extensively explored by analyzing stylized non-cooperative game-theoretic models with feedback mechanisms using evolutionary game theory [14, 16, 18-20]. Besides evolutionary game theory, another method used to study the dynamics of strategies is multi-agent reinforcement learning [1, 3, 12]. These works mainly derive and design multi-agent reinforcement learning algorithms that can lead players' strategies to the Nash equilibria of the game.

To gain qualitative and detailed insights into the learning process in repeated games, many algorithms and models have been proposed to build a bridge between evolutionary game theory and multi-agent reinforcement learning algorithms, such as FAQ-learning [10], regret minimization [11], continuous strategy replicator dynamics [5], IGA [21], IGA-WOLF [2] and so on. Beyond the assumption that individuals always interact in one common state, how to describe strategy dynamics under the interplay of multiple states and separate per-state strategies has also attracted researchers' attention. Vrancx et al. first investigated this problem, combining replicator dynamics and piecewise dynamics to model the learning behavior of agents in stochastic games [25]. Hennes et al. proposed a method called state-coupled replicator dynamics, which couples the multiple states directly [8]. Algorithms derived from this combination of evolutionary game theory and multi-agent reinforcement learning have been applied in multiple research fields, including femtocell power allocation [27], interdomain routing price setting [24], the design of social agents [6], and the design of new multi-agent reinforcement learning algorithms [7].

In this paper, by describing the dynamic process of a stochastic game as a Markov chain [9, 22], we propose a new method, named state-transition replicator dynamics, which derives a set of ordinary differential equations modeling the learning behavior of players in stochastic games. Moreover, based on the transition matrix of the Markov chain and the derived replicator dynamics system, we demonstrate that, although players face the same social dilemma in all states, a set of unbalanced transition probabilities between states can lead to cooperation in the cooperation back state, i.e., the state players are likely to be in after mutual cooperation. Besides, we point out that in stochastic games there exist several sets of specific transition probabilities which can control the expected payoffs of players. For a stateless matrix game, Press et al. [15] prove that if players' decisions depend on the previous actions of the players participating in the game, one player can unilaterally set the payoff of the other player. In stochastic games, transition probabilities can play this role. Utilizing the analysis method for stateless matrix games [13, 15, 23] together with the state-transition replicator dynamics model, we show that a set of specifically derived transition probabilities can fix the expected payoffs of one player, so that this player loses the motivation to update his strategies in the stochastic game.
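To make the core idea concrete before the formal definitions in Section 2, the following minimal Python sketch (ours, not code from the paper) shows how, once both players fix their per-state strategies, the sequence of visited states becomes a Markov chain: its transition matrix mixes the players' action probabilities with the state-transition function, and its stationary distribution gives the long-run weight of each state. The authors' construction may differ in detail, and all names and numerical values below are illustrative placeholders.

```python
# A minimal sketch of the state process as a Markov chain, assuming fixed
# per-state strategies. For each state s, M[s, s'] = sum_a Pr[a | s] * z_{s'}(s, a).
import numpy as np

def state_chain(p, q, z_to_s1):
    """p[s], q[s]: cooperation probabilities of players P and Q in state s.
    z_to_s1[s][aP][aQ]: probability of moving to state s1 from state s under
    joint action (aP, aQ), with aP, aQ in {0: cooperate, 1: defect}."""
    M = np.zeros((2, 2))
    for s in range(2):
        for aP, prP in enumerate((p[s], 1 - p[s])):
            for aQ, prQ in enumerate((q[s], 1 - q[s])):
                to_s1 = z_to_s1[s][aP][aQ]
                M[s, 0] += prP * prQ * to_s1
                M[s, 1] += prP * prQ * (1 - to_s1)
    return M

def stationary(M, iters=1000):
    """Stationary distribution by power iteration (valid here because all
    states are assumed to lie in one ergodic set, as in Section 2.1)."""
    x = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(iters):
        x = x @ M
    return x

# Placeholder numbers: mutual cooperation makes state s1 likely.
z_to_s1 = [[[0.9, 0.3], [0.3, 0.1]], [[0.9, 0.3], [0.3, 0.1]]]
M = state_chain(p=[0.5, 0.5], q=[0.5, 0.5], z_to_s1=z_to_s1)
print(stationary(M))  # long-run fraction of rounds spent in s1 and s2
```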
2 BACKGROUND

2.1 Stochastic games
Stochastic games describe the rewards players obtain based on their strategies in multiple states. The concept of a state in stochastic games refers to the environment in which the players interact. The current state and the joint action the players take determine not only the immediate payoffs the players receive in this round, but also the state they will be in in the next round. To define a stochastic game, we need to specify the set of players, the set of states, the set of actions each player can take in each state, a transition function that describes how states change over time, and the payoff function that describes how players' actions in each state affect their payoffs.

[Figure 1: The framework of a two-state two-player two-action stochastic game. There are two players, $P$ and $Q$. In every round, they are in one of two states, $s^1$ or $s^2$. In each state, each player has a corresponding strategy and payoff matrix. The state in the next round depends on the current state, on the players' joint action and on chance, $z_{s'}(s, \mathbf{a})$. The figure depicts, for each state $s$, the bimatrix of payoffs $(a^s_{a_P a_Q}, b^s_{a_P a_Q})$ over the joint actions, together with the transition probabilities between the two states.]
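As a companion to Figure 1, the following minimal sketch (ours, anticipating the formal notation of Sections 2.1 and 2.2) instantiates a two-state two-player two-action stochastic game and plays one round: sample each player's action from his per-state strategy, read the bimatrix payoffs, and sample the next state from $z_{s'}(s, \mathbf{a})$. The payoff entries (a prisoner's-dilemma-style bimatrix) and transition values are placeholders, not values used by the authors.

```python
# A minimal sketch of one round of the game in Figure 1; all values are
# illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

C, D = 0, 1      # action indices: cooperate, defect
S1, S2 = 0, 1    # state indices

# Bimatrix payoffs per state: payoffs[s][aP][aQ] = (payoff to P, payoff to Q).
# Placeholder prisoner's-dilemma values, identical in both states.
payoffs = {
    S1: {C: {C: (3, 3), D: (0, 5)}, D: {C: (5, 0), D: (1, 1)}},
    S2: {C: {C: (3, 3), D: (0, 5)}, D: {C: (5, 0), D: (1, 1)}},
}

# Transition function: z[s][aP][aQ] = probability of moving to state S1.
# With two states, Pr[next state = S2] is just the complement.
z = {
    S1: {C: {C: 0.9, D: 0.3}, D: {C: 0.3, D: 0.1}},
    S2: {C: {C: 0.9, D: 0.3}, D: {C: 0.3, D: 0.1}},
}

def play_round(state, p, q):
    """One round, given cooperation probabilities p (player P) and q (player Q)
    for each state."""
    aP = C if rng.random() < p[state] else D
    aQ = C if rng.random() < q[state] else D
    rP, rQ = payoffs[state][aP][aQ]
    next_state = S1 if rng.random() < z[state][aP][aQ] else S2
    return rP, rQ, next_state

# Example: both players cooperate with probability 0.5 in each state.
state = S1
rP, rQ, state = play_round(state, p={S1: 0.5, S2: 0.5}, q={S1: 0.5, S2: 0.5})
```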
We follow the definition in [8]. The game $G = \langle n, S, \Omega, z, \tau, \pi_1, \cdots, \pi_n \rangle$ is a stochastic game with $n$ players and $k$ states. In each state $s \in S = (s^1, \cdots, s^k)$, every player $i$ has an action set $\Omega_i(s)$ and a strategy $\pi_i(s)$. In each round, every player $i$ is in one state $s$ and chooses an action $a_i$ from the action set $\Omega_i(s)$ according to the strategy $\pi_i(s)$. The payoff function $\tau(s, \mathbf{a}): \prod_{i=1}^{n} \Omega_i(s) \mapsto \mathbb{R}^n$ maps the joint action $\mathbf{a} = (a_1, \cdots, a_n)$ to an immediate payoff value for each player. The transition function $z(s, \mathbf{a}): \prod_{i=1}^{n} \Omega_i(s) \mapsto \Delta^{k-1}$ determines the probabilistic state transition, where $\Delta^{k-1}$ is the $(k-1)$-simplex and $z_{s'}(s, \mathbf{a})$ is the transition probability from state $s$ to $s'$ under joint action $\mathbf{a}$.

In this work, we also follow the restriction proposed in [8] that all states $s \in S$ belong to an ergodic set. This restriction ensures that the game has no absorbing states.

2.2 Two-state two-player two-action stochastic games
Here we introduce the simplest version of a stochastic game: the two-state two-player two-action stochastic game.

In games containing two players, player $P$ ($Q$) has a strategy $\mathbf{p}$ ($\mathbf{q}$). Each player chooses between only two actions, cooperation ($c$) and defection ($d$). Thus a single probability fully defines one player's strategy: for player $P$, the parameter $p$ is the probability of cooperation and $\mathbf{p} = (p, 1-p)^T$; similarly, for player $Q$, $\mathbf{q} = (q, 1-q)^T$. The payoff function can be represented by a bimatrix $(A^s, B^s)$, where $A^s$ ($B^s$) determines the payoffs for player $P$ ($Q$) in state $s$. Player $P$ receives $\tau_P(s, \mathbf{a}) = a^s_{a_P a_Q}$ while player $Q$ gets $\tau_Q(s, \mathbf{a}) = b^s_{a_P a_Q}$ for a given joint action $\mathbf{a} = (a_P, a_Q)$. Player $P$ has strategies $\mathbf{p}^1 = (p^1, 1-p^1)^T$ and $\mathbf{p}^2 = (p^2, 1-p^2)^T$ in states $s^1$ and $s^2$, respectively. Player $Q$ likewise has two strategies, $\mathbf{q}^1$ and $\mathbf{q}^2$. Since there are only two states, the transition probabilities satisfy $z_{s^1}(s^1, \mathbf{a}) = 1 - z_{s^2}(s^1, \mathbf{a})$ and $z_{s^2}(s^2, \mathbf{a}) = 1 - z_{s^1}(s^2, \mathbf{a})$. The framework of a two-state two-player two-action stochastic game is shown in Figure 1.

2.3 Learning automata
In this section, we describe how a player can update his strategy with a multi-agent reinforcement learning method [26]. A player can be seen as a learning automaton that updates his strategy by monitoring the reinforcement signal resulting from the action he has chosen in the current round. The automaton aims to learn the optimal strategy, i.e., the one maximizing his expected payoff.

In this paper, we focus on finite action-set learning automata (FALA) [8]. At the beginning of a round, every player draws a random action $a(t)$ from his action set according to his strategy $\pi(t)$. Based on the action $a(t)$, the environment responds with a reward $\tau(t)$. The automaton uses this reward to update his strategy to $\pi(t+1)$.
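Section 2.3 leaves the concrete FALA update rule to [26] and [8]. A widely used choice for finite action-set learning automata is the linear reward-inaction ($L_{R-I}$) scheme, sketched below under the assumption that the reward $\tau(t)$ has been normalized to $[0, 1]$; whether the authors use exactly this variant is not stated in this excerpt, and the learning rate (`lam`) is an assumption.

```python
# A minimal sketch of a FALA strategy update, assuming the linear
# reward-inaction (L_{R-I}) scheme; one plausible instantiation, not
# necessarily the authors' exact rule.
import numpy as np

def fala_update(pi, action, reward, lam=0.01):
    """Update strategy pi (a probability vector over actions) after taking
    `action` and receiving `reward`, assumed normalized to [0, 1]."""
    pi = np.asarray(pi, dtype=float).copy()
    # Shrink all probabilities in proportion to the reward, then move the
    # freed mass onto the chosen action; with zero reward the strategy is
    # left unchanged ("inaction"). The result still sums to 1.
    pi -= lam * reward * pi
    pi[action] += lam * reward
    return pi

# Example: player P with p = 0.5 cooperates (action 0) and receives a
# normalized reward of 0.6, so his cooperation probability increases.
p = np.array([0.5, 0.5])   # (prob. cooperate, prob. defect)
p = fala_update(p, action=0, reward=0.6)
```

Under this scheme, the chosen action's probability moves toward 1 at a rate proportional to the received reward, which is the mechanism the replicator-dynamics models in this line of work approximate in the continuous-time limit.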