PERSPECTIVE

Stochastic games

Eilon Solan (a,1) and Nicolas Vieille (b,1)

(a) Department of Statistics and Operations Research, School of Mathematical Sciences, Tel Aviv University, Tel Aviv 6997800, Israel; and (b) Department of Economics and Decision Sciences, HEC Paris, 78351 Jouy-en-Josas, France

Edited by Jose A. Scheinkman, Columbia University, New York, NY, and approved September 22, 2015 (received for review July 9, 2015)

In 1953, Lloyd Shapley contributed his paper "Stochastic games" to PNAS. In this paper, he defined the model of stochastic games, which was the first general dynamic model of a game to be defined, and proved that it admits a stationary equilibrium. In this Perspective, we summarize the historical context and the impact of Shapley's contribution.

game theory | stochastic games

Author contributions: E.S. and N.V. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. This article is part of the special series of PNAS 100th Anniversary articles to commemorate exceptional research published in PNAS over the last century. See the companion article, "Stochastic Games," on page 1095 in issue 10 of volume 39.

(1) To whom correspondence may be addressed. Email: [email protected] or [email protected].

www.pnas.org/cgi/doi/10.1073/pnas.1513508112 | PNAS | November 10, 2015 | vol. 112 | no. 45 | 13743–13746

The 1950s were the decade in which game theory was shaped. After von Neumann and Morgenstern's Theory of Games and Economic Behavior (1) was published in 1944, a group of young and bright researchers started working on game theory, each of whom published papers that opened new areas. In 1950, John Nash published two papers, one on the concept of Nash equilibrium (2) and the other on the bargaining problem (3). In 1953, Lloyd Shapley published two papers, one on stochastic games (4) and the other on the Shapley value for coalitional games (5). In 1957, Harold Kuhn (6) provided a formulation for extensive-form games, and Duncan Luce and Howard Raiffa published their book Games and Decisions: Introduction and Critical Survey (7). These researchers and others, like Kenneth Arrow, Richard Bellman, and David Blackwell, interacted, worked and played together, exchanged ideas, and developed the theory of dynamic games as we know it today.

While in graduate school at Princeton, Shapley befriended Martin Shubik and John Nash. The former became a well-known economist, and the latter defined the concept of Nash equilibrium, the most prevalent solution concept in game theory and economic theory to date. In 2013, Shapley was awarded the Nobel Prize in economics, 8 years after Nash was awarded the prize.

1. Stochastic Games

Stochastic games model dynamic interactions in which the environment changes in response to players' behavior. In Shapley's words, "In a stochastic game the play proceeds by steps from position to position, according to transition probabilities controlled jointly by the two players" (4).

A stochastic game is played by a set of players. In each stage of the game, the play is in a given state (or position, in Shapley's language), taken from a set of states, and every player chooses an action from a set of available actions. The collection of actions that the players choose, together with the current state, determines the stage payoff that each player receives, as well as a probability distribution according to which the new state that the play will visit is drawn.

Stochastic games extend the model of strategic-form games, which is attributable to von Neumann (8), to dynamic situations in which the environment changes in response to the players' choices. They also extend the model of Markov decision problems, which was developed by several researchers at the RAND Corporation in 1949–1952, to competitive situations with more than one decision maker.

The complexity of stochastic games stems from the fact that the choices made by the players have two, sometimes contradictory, effects. First, together with the current state, the players' actions determine the immediate payoff that each player receives. Second, the current state and the players' actions influence the choice of the new state, which determines the potential of future payoffs. In particular, when choosing his actions, each player has to balance these forces, a decision that may often be difficult. Although this dichotomy is also present in one-player sequential decision problems, the presence of additional players who maximize their own goals adds complexity to the analysis of the situation.

Shapley's paper in PNAS introduced the model of stochastic games with positive stopping probabilities, assuming a finite set of states and a finite set of possible actions for each player in each state. In this model, the current state and the players' actions determine the probability that the game will terminate once the current stage is over. His paper deals with two-player zero-sum games, so that the gain of player 1 is always equal to the loss of player 2.

A history of length t in a stochastic game is the sequence of states that the game visited in the first t stages, as well as the actions that the players played in the first t − 1 stages. A strategy of a player is a prescription for how to play the game; that is, a function that assigns to every finite history an action to play should that history occur. A behavior strategy of a player is a function that assigns to every finite history a lottery over the set of available actions.

A zero-sum game is defined to have a "value" v if (i) player 1 has a strategy (which is then said to be "optimal") that ensures that his expected overall payoff over time does not fall below v, no matter what strategy player 2 follows, and (ii) the symmetric property holds when exchanging the roles of the two players. Shapley proved the existence of a value. Because the parameters that define the game are independent of time, the situation that the players face if today the play is in a certain state is the same situation they would face tomorrow if tomorrow the play is in that state. In particular, one expects to have optimal strategies that are stationary Markov, that is, strategies that depend only on the current state of the game. Shapley proved that such optimal strategies indeed exist, and characterized the value as the unique fixed point of a nonlinear functional operator—a two-player version of the dynamic programming principle.
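In modern notation (a sketch only; the symbols below are not those of ref. 4), this characterization can be written as follows. Let S denote the finite set of states, A(s) and B(s) the two players' finite action sets at state s, r(s,a,b) the stage payoff to player 1 (player 2 receives its negative), and p(s' | s,a,b) the probability that the play moves to state s'. The value vector v = (v(s)) is the unique solution of

\[
  v(s) \;=\; \operatorname{val}_{a \in A(s),\; b \in B(s)}
  \Bigl[\, r(s,a,b) \,+\, \sum_{s' \in S} p(s' \mid s,a,b)\, v(s') \,\Bigr]
  \qquad \text{for every } s \in S,
\]

where val[·] denotes the minmax value of the finite zero-sum matrix game in brackets. Because every stage carries a stopping probability of at least some c > 0, the probabilities p(· | s,a,b) have total mass at most 1 − c, so the map defined by the right-hand side is a contraction and its fixed point is unique; optimal mixed actions in each of these per-state matrix games form stationary optimal strategies.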
The existence of stationary Markov optimal strategies implies that, to play well, a player needs to know only the current state. In particular, the value of the game does not change if players receive partial information on each other's actions, and/or if they forget previously visited states.

In the following paragraphs, we briefly describe some of the directions in which the theory has developed following Shapley's groundbreaking paper.

2. Discounted Stochastic Games

Shapley's model is equivalent to one in which players discount their future payoffs according to a discount factor that depends on the current state and on the players' actions. The game is called "discounted" if all stopping probabilities equal the same constant, and one minus this constant is called the "discount factor." Models of discounted stochastic games are prevalent in economics, where the discount factor has a clear economic interpretation.
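As a concrete illustration (a minimal computational sketch of our own, not code from ref. 4), the fixed-point characterization above becomes an algorithm in the discounted case: starting from any payoff vector, repeatedly apply the operator v(s) <- val[ r(s,a,b) + beta * sum over s' of p(s'|s,a,b) v(s') ], solving one finite matrix game per state at every step. Since the operator is a contraction with modulus beta < 1, the iterates converge to the value of the game. The Python sketch below assumes the stage payoffs r, the transitions P, and the discount factor beta are supplied as NumPy arrays, and solves each matrix game by linear programming with SciPy.

# A minimal sketch (illustrative, not from ref. 4): Shapley iteration for a
# two-player zero-sum discounted stochastic game with finitely many
# states and actions.
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(M):
    """Value of the zero-sum matrix game M for the row (maximizing) player,
    computed by linear programming."""
    m, n = M.shape
    # Variables: x_1, ..., x_m (row player's mixed action) and v (the value).
    c = np.zeros(m + 1)
    c[-1] = -1.0                      # maximize v, i.e., minimize -v
    # For every column j:  v - sum_i x_i * M[i, j] <= 0
    A_ub = np.hstack([-M.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)   # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]


def stochastic_game_value(r, P, beta, tol=1e-9):
    """Approximate the value vector of the discounted stochastic game.

    r    -- stage payoffs to player 1, array of shape (S, A, B)
    P    -- transition probabilities, array of shape (S, A, B, S)
    beta -- discount factor, 0 <= beta < 1
    """
    num_states = r.shape[0]
    v = np.zeros(num_states)
    while True:
        # One application of Shapley's operator: solve a matrix game per state.
        new_v = np.array([matrix_game_value(r[s] + beta * P[s] @ v)
                          for s in range(num_states)])
        if np.max(np.abs(new_v - v)) < tol:
            return new_v
        v = new_v

At the (approximate) fixed point, optimal mixed actions of the per-state matrix games yield (approximately) optimal stationary strategies, which is precisely the structure that Shapley's theorem guarantees.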
A large literature followed, relaxing both the zero-sum and the finiteness assumptions of Shapley's paper. In non-zero-sum games, a collection of strategies, one for each player, is a "(Nash) equilibrium" if no player can profit by deviating from his strategy, assuming all other players follow their prescribed strategies.

[...] initial wealth, repeatedly play a zero-sum game that affects their wealth, until one player is ruined and declared the loser. Everett (18) studied the so-called "recursive games," in which termination probabilities are endogenous, and a payoff is received only upon termination of the game, as a function of the terminating state. Gillette (19) introduced a deceptively simple-looking game, inspired by playground practice, whose analysis resisted all attempts until Blackwell and Ferguson (20) provided its solution.

In a different but related framework, Aumann, Maschler, and Stearns (ref. 21, reprinted in 1995; see pp. 130–139) conceptualized a notion of a uniform solution to infinite-horizon games, in which payoffs are received in each round. This approach views the infinite-horizon game as an idealized model for long-run interactions, in which the duration of the game is sufficiently long and/or the discount factor is sufficiently close to 1, and both are not perfectly known. It aims at providing solutions whose optimality properties [...]

[...] on algebraic properties of the value of the discounted game, as a function of the discount factor, proven by Bewley and Kohlberg (22). One can view the monetary policy of the central bank, that of changing the interest rate from time to time, as an implementation of this type of strategy.

The theory of the non-zero-sum case has flourished in the last two decades but has not been settled yet. A collection of strategies, one for each player, is a uniform ε-equilibrium if by deviating no player can profit more than ε, provided the game is sufficiently long or the discount factor is sufficiently close to 1, and if the payoff vector is almost independent of the duration of the game. Existence of uniform ε-equilibria has been established in a few cases: Vieille (25, 26) for two-player games, Solan (27) for three-player games in which the state of the game changes at most once, and Solan and Vieille (28), Flesch et al. (29), and Simon (30) in other classes of stochastic games. The question in its full generality remains open.

A weaker concept than Nash equilibrium [...]