Downloaded by guest on September 27, 2021
www.pnas.org/cgi/doi/10.1073/pnas.1908643116
PNAS | December 26, 2019 | vol. 116 | no. 52 | 26435–26443
APPLIED MATHEMATICS | ECONOMIC SCIENCES

A formula for the value of a stochastic game

Luc Attia^{a,1} and Miquel Oliu-Barton^{b,1,2}

^a Centre de Mathématiques Appliquées, École Polytechnique, 91128 Palaiseau, France; and ^b Centre de Recherches en Mathématiques de la Décision, Université Paris-Dauphine, Paris Sciences et Lettres, 75016 Paris, France

Edited by Robert J. Aumann, The Hebrew University, Jerusalem, Israel, and approved November 12, 2019 (received for review May 21, 2019)

In 1953, Lloyd Shapley defined the model of stochastic games, which were the first general dynamic model of a game to be defined, and proved that competitive stochastic games have a discounted value. In 1982, Jean-François Mertens and Abraham Neyman proved that competitive stochastic games admit a robust solution concept, the value, which is equal to the limit of the discounted values as the discount rate goes to 0. Both contributions were published in PNAS. In the present paper, we provide a tractable formula for the value of competitive stochastic games.

stochastic games | repeated games | dynamic programming

Significance

Stochastic games were introduced by the Nobel Memorial Prize winner Lloyd Shapley in 1953 to model dynamic interactions in which the environment changes in response to the players' behavior. The theory of stochastic games and its applications have been studied in several scientific disciplines, including economics, operations research, evolutionary biology, and computer science. In addition, mathematical tools that were used and developed in the study of stochastic games are used by mathematicians and computer scientists in other fields. This paper contributes to the theory of stochastic games by providing a tractable formula for the value of finite competitive stochastic games. This result settles a major open problem which remained unsolved for nearly 40 y.

Author contributions: L.A. and M.O.-B. wrote the paper.
The authors declare no competing interest.
This article is a PNAS Direct Submission.
Published under the PNAS license.
^1 L.A. and M.O.-B. contributed equally to this work.
^2 To whom correspondence may be addressed. Email: miquel.oliu.barton@normalesup.org.
First published December 16, 2019.

1. Introduction

A. Motivation. Stochastic games are the first general model of dynamic games. Shapley (1) introduced them in 1953. Stochastic games extend the model of strategic-form games, due to von Neumann (2), to dynamic situations in which the environment (henceforth, the state) changes in response to the players' choices. They also extend the model of Markov decision problems to competitive situations with more than one decision maker.

Stochastic games proceed in stages. At each stage, the players choose actions which are available to them at the current state. Their choices have 2 effects: They generate a stage reward for each player, and they determine the probability for the state at the next stage. Consequently, the players typically face a trade-off between getting high rewards in the present and trying to reach states that will ensure high future rewards. Stochastic games and their applications have been studied in several scientific disciplines, including economics, operations research, evolutionary biology, and computer science. In addition, mathematical tools that were used and developed in the study of stochastic games are used by mathematicians and computer scientists in other fields. We refer the readers to Solan and Vieille (3) for a summary of the historical context and the impact of Shapley's seminal contribution.

The present paper deals with finite competitive stochastic games. These are 2-player stochastic games with finitely many states and actions, in which the stage rewards of the players add up to zero. Shapley (1) proved that these games have a discounted value. It represents what playing the game is worth to the players when future rewards are discounted at a constant positive rate. Bewley and Kohlberg (4) proved that the discounted values admit a limit as the discount rate goes to 0. Building on this result, Mertens and Neyman (5, 6) proved that finite competitive stochastic games admit a robust solution concept, the value. It represents what playing the game is worth to the players when they are sufficiently patient.

Finding a tractable formula for the value of finite competitive stochastic games was a major open problem for nearly 40 y, which is settled in the present contribution. Our approach opens the path for faster computations. It may also bring additional quantitative and qualitative insights into the model of stochastic games.

B. Outline of the Paper. This paper is organized as follows. Section 2 states our results on finite competitive stochastic games, namely a formula for the λ-discounted values (proved in Section 3) and a formula for the value (proved in Section 4). Section 5 describes the algorithmic implications and tractability of these 2 formulas. Section 6 concludes with remarks and extensions.

2. Context and Main Results

To state our results precisely, we recall some well-known definitions and results about 2-player zero-sum games (Section 2A) and finite competitive stochastic games (Section 2B). In Section 2C, we give a brief overview of the relevant literature on finite competitive stochastic games. Our results are described in Section 2D.

Notation. Throughout this paper, $\mathbb{N}$ denotes the set of positive integers. For any finite set $E$ we denote the set of probabilities over $E$ by $\Delta(E) = \{f : E \to [0,1] \mid \sum_{e\in E} f(e) = 1\}$ and its cardinality by $|E|$.

A. Preliminaries on Zero-Sum Games. The aim of this section is to recall some well-known definitions and facts about 2-player zero-sum games, henceforth zero-sum games.

Definition. A zero-sum game is described by a triplet $(S, T, \rho)$. Here $S$ and $T$ are the sets of possible strategies for player 1 and player 2, respectively, and $\rho : S \times T \to \mathbb{R}$ is a payoff function. The game is played as follows. Independently and simultaneously, the first player chooses $s \in S$ and the second player chooses $t \in T$. Player 1 receives $\rho(s,t)$ and player 2 receives $-\rho(s,t)$. The zero-sum game $(S, T, \rho)$ has a value whenever

$$\sup_{s\in S}\inf_{t\in T}\rho(s,t) = \inf_{t\in T}\sup_{s\in S}\rho(s,t).$$

In this case, we denote this common quantity by $\operatorname{val}\rho$.

Optimal strategies. Let $(S, T, \rho)$ be a zero-sum game which has a value. An optimal strategy for player 1 is an element $s^* \in S$ so that $\rho(s^*, t) \ge \operatorname{val}\rho$ for all $t \in T$. Similarly, $t^* \in T$ is an optimal strategy for player 2 if $\rho(s, t^*) \le \operatorname{val}\rho$ for all $s \in S$.
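When the strategy sets are finite, the suprema and infima in the definition above become maxima and minima, so the definition can be checked directly. The Python sketch below is our own illustration, and its function names are hypothetical. It compares $\max_s \min_t \rho(s,t)$ with $\min_t \max_s \rho(s,t)$ for a payoff matrix `rho[s][t]`. When the two coincide, it lists the pairs of optimal strategies. Matching pennies, restricted to pure strategies, has no value. This is the gap that mixed strategies and the minmax theorem close.

```python
def pure_value(rho):
    """Return (maxmin, minmax, value or None) for a finite payoff matrix rho[s][t]."""
    maxmin = max(min(row) for row in rho)        # sup_s inf_t rho(s, t)
    minmax = min(max(col) for col in zip(*rho))  # inf_t sup_s rho(s, t)
    return maxmin, minmax, (maxmin if maxmin == minmax else None)


def optimal_pairs(rho):
    """All pairs (s*, t*) of optimal strategies; empty when the game has no value."""
    v = pure_value(rho)[2]
    if v is None:
        return []
    s_opt = [s for s, row in enumerate(rho) if min(row) >= v]        # rho(s*, t) >= val for all t
    t_opt = [t for t, col in enumerate(zip(*rho)) if max(col) <= v]  # rho(s, t*) <= val for all s
    return [(s, t) for s in s_opt for t in t_opt]


# Hypothetical example matrices (not from the paper).
print(pure_value([[1, -1], [-1, 1]]))   # (-1, 1, None): matching pennies has no pure value
print(optimal_pairs([[2, 1], [3, 0]]))  # [(0, 1)]: the value is 1
```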

The value operator. The following properties are well known:

i) Minmax theorem: Let $(S, T, \rho)$ be a zero-sum game. Suppose that $S$ and $T$ are 2 compact subsets of some topological vector space, $\rho$ is a continuous function, the map $s \mapsto \rho(s,t)$ is concave for all $t \in T$, and the map $t \mapsto \rho(s,t)$ is convex for all $s \in S$. Then $(S, T, \rho)$ has a value and both players have optimal strategies.

ii) Monotonicity: Suppose that $(S, T, \rho)$ and $(S, T, \nu)$ have a value, and $\rho(s,t) \le \nu(s,t)$ holds for all $(s,t) \in S \times T$. Then $\operatorname{val}\rho \le \operatorname{val}\nu$.

Matrix games. In the sequel, we identify every real matrix $M = (m_{a,b})$ of size $p \times q$ with the zero-sum game $(S_M, T_M, \rho_M)$. Here $S_M = \Delta(\{1,\dots,p\})$, $T_M = \Delta(\{1,\dots,q\})$, and

$$\rho_M(s,t) = \sum_{a=1}^{p}\sum_{b=1}^{q} s(a)\, m_{a,b}\, t(b) \qquad \forall (s,t) \in S_M \times T_M.$$

The value of the matrix $M$, denoted by $\operatorname{val} M$, is the value of $(S_M, T_M, \rho_M)$. It exists by the minmax theorem. The following properties are well known:

iii) Continuity: Suppose that $M(t)$ is a matrix with entries that depend continuously on some parameter $t \in \mathbb{R}$. Then the map $t \mapsto \operatorname{val} M(t)$ is continuous.

iv) A formula for the value: For any matrix $M$, there exists a square submatrix $\hat M$ of $M$ so that $\operatorname{val} M = \det \hat M / \varphi(\hat M)$. Here $\varphi(\hat M)$ denotes the sum of all of the cofactors of $\hat M$, with the convention that $\varphi(\hat M) = 1$ if $\hat M$ is of size $1 \times 1$.

Comments. Property i) is taken from Sion (7), a generalization of von Neumann's (2) minmax theorem. Property iv) was established by Shapley and Snow (8). The other 2 properties are straightforward.

B. Stochastic Games. We now present the standard model of finite competitive stochastic games, henceforth stochastic games for simplicity. We refer the reader to Sorin's book (ref. 9, chap. 5) and to Renault's notes (10) for a more detailed presentation of stochastic games.

Definition. A stochastic game is described by a tuple $(K, I, J, g, q, k)$, where:
- $K = \{1,\dots,n\}$ is a finite set of states, for some $n \in \mathbb{N}$;
- $I$ and $J$ are the finite action sets of player 1 and player 2, respectively;
- $g : K \times I \times J \to \mathbb{R}$ is a reward function to player 1;
- $q : K \times I \times J \to \Delta(K)$ is a transition function;
- $1 \le k \le n$ is an initial state.

The game proceeds in stages as follows. At each stage $m \ge 1$, both players are informed of the current state $k_m \in K$, where $k_1 = k$. Then, independently and simultaneously, player 1 chooses an action $i_m \in I$ and player 2 chooses an action $j_m \in J$. Both players then observe the pair $(i_m, j_m)$, from which they can infer the stage reward $g(k_m, i_m, j_m)$. A new state $k_{m+1}$ is then drawn according to the probability distribution $q(k_m, i_m, j_m)$, and the game proceeds to stage $m+1$.

Discounted stochastic games. For any discount rate $\lambda \in (0,1]$, we denote by $(K, I, J, g, q, k, \lambda)$ the stochastic game $(K, I, J, g, q, k)$ in which player 1 maximizes, in expectation, the normalized $\lambda$-discounted sum of rewards

$$\sum_{m\ge1} \lambda(1-\lambda)^{m-1} g(k_m, i_m, j_m),$$

while player 2 minimizes this amount. In the following, the discount rate and the initial state are treated as parameters, while $(K, I, J, g, q)$ is fixed.

Strategies. A behavioral strategy, henceforth a strategy, is a decision rule from the set of possible observations of a player to the set of probabilities over the player's actions. Formally, a strategy for player 1 is a sequence of mappings $\sigma = (\sigma_m)_{m\ge1}$, where $\sigma_m : (K \times I \times J)^{m-1} \times K \to \Delta(I)$. Similarly, a strategy for player 2 is a sequence of mappings $\tau = (\tau_m)_{m\ge1}$, where $\tau_m : (K \times I \times J)^{m-1} \times K \to \Delta(J)$. The sets of strategies are denoted, respectively, by $\Sigma$ and $\mathcal{T}$.

The expected payoff. Fix an initial state $k$ and the transition function $q$. By the Kolmogorov extension theorem, any pair of strategies $(\sigma,\tau) \in \Sigma \times \mathcal{T}$ then induces a unique probability $\mathbb{P}^k_{\sigma,\tau}$ over the set of plays $(K \times I \times J)^{\mathbb{N}}$. This probability is defined on the sigma algebra generated by the cylinders. Hence, each pair of strategies $(\sigma,\tau) \in \Sigma \times \mathcal{T}$ yields a unique payoff $\gamma^k_\lambda(\sigma,\tau)$ in the discounted game $(K, I, J, g, q, k, \lambda)$, where

$$\gamma^k_\lambda(\sigma,\tau) := \mathbb{E}^k_{\sigma,\tau}\Big[\sum_{m\ge1}\lambda(1-\lambda)^{m-1} g(k_m, i_m, j_m)\Big]$$

and $\mathbb{E}^k_{\sigma,\tau}$ denotes the expectation with respect to the probability $\mathbb{P}^k_{\sigma,\tau}$.

Stationary strategies. A stationary strategy is a strategy that depends only on the current state. Thus, $x : K \to \Delta(I)$ is a stationary strategy for player 1, while $y : K \to \Delta(J)$ is a stationary strategy for player 2. The sets of stationary strategies are $\Delta(I)^n$ and $\Delta(J)^n$, respectively. A pure stationary strategy is a stationary strategy that is deterministic. The sets of pure stationary strategies are $I^n$ and $J^n$, respectively. We denote pure stationary strategies with boldface symbols, $\mathbf{i} \in I^n$ and $\mathbf{j} \in J^n$.

A useful expression. Suppose that both players use stationary strategies $x$ and $y$ in the discounted stochastic game $(K, I, J, g, q, k, \lambda)$ for some $\lambda \in (0,1]$. The evolution of the state then follows a Markov chain, and the stage rewards depend only on the current state. Let $Q(x,y) \in \mathbb{R}^{n\times n}$ and $g(x,y) \in \mathbb{R}^n$ denote, respectively, the corresponding transition matrix and the vector of expected rewards. Formally, for all $1 \le \ell, \ell' \le n$,

$$Q^{\ell,\ell'}(x,y) = \sum_{(i,j)\in I\times J} x^\ell(i)\, y^\ell(j)\, q(\ell' \mid \ell, i, j), \qquad [2.1]$$

$$g^\ell(x,y) = \sum_{(i,j)\in I\times J} x^\ell(i)\, y^\ell(j)\, g(\ell, i, j). \qquad [2.2]$$

Let $\gamma_\lambda(x,y) = (\gamma^1_\lambda(x,y), \dots, \gamma^n_\lambda(x,y)) \in \mathbb{R}^n$. Then $Q(x,y)$, $g(x,y)$, and $\gamma_\lambda(x,y)$ satisfy the relations

$$\gamma_\lambda(x,y) = \sum_{m\ge1}\lambda(1-\lambda)^{m-1} Q^{m-1}(x,y)\, g(x,y) = \lambda g(x,y) + (1-\lambda) Q(x,y)\,\gamma_\lambda(x,y).$$

Let $\mathrm{Id}$ denote the identity matrix of size $n$. The matrix $\mathrm{Id} - (1-\lambda)Q(x,y)$ is invertible because $Q(x,y)$ is a stochastic matrix. Consequently, $\gamma_\lambda(x,y) = \lambda(\mathrm{Id} - (1-\lambda)Q(x,y))^{-1} g(x,y)$ and, by Cramer's rule,

$$\gamma^k_\lambda(x,y) = \frac{d^k_\lambda(x,y)}{d^0_\lambda(x,y)}, \qquad [2.3]$$

Here $d^0_\lambda(x,y) = \det(\mathrm{Id} - (1-\lambda)Q(x,y))$. The quantity $d^k_\lambda(x,y)$ is the determinant of the $n\times n$ matrix obtained by replacing the $k$th column of $\mathrm{Id} - (1-\lambda)Q(x,y)$ with $\lambda g(x,y)$.

The discounted values. By construction, the discounted stochastic game $(K, I, J, g, q, k, \lambda)$ and the zero-sum game $(\Sigma, \mathcal{T}, \gamma^k_\lambda)$ are equal. Thus, the discounted stochastic game has a value whenever

$$\sup_{\sigma\in\Sigma}\inf_{\tau\in\mathcal{T}}\gamma^k_\lambda(\sigma,\tau) = \inf_{\tau\in\mathcal{T}}\sup_{\sigma\in\Sigma}\gamma^k_\lambda(\sigma,\tau).$$
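Eqs. 2.1–2.3 translate directly into code. The Python sketch below is our own illustration, not code from the paper. It makes the following assumptions:
- the data layout is `g[l][i][j]` for rewards and `q[l][i][j][m]` for the probability of moving from state `l` to state `m` (a hypothetical layout);
- stationary strategies are stored as `x[l][i]` and `y[l][j]`;
- states and actions are indexed from 0;
- all arithmetic uses exact `Fraction` values.

The sketch computes $\gamma^k_\lambda(x,y)$ as the Cramer ratio $d^k_\lambda(x,y)/d^0_\lambda(x,y)$ of Eq. 2.3. It then checks the recursion $\gamma_\lambda = \lambda g + (1-\lambda)Q\gamma_\lambda$.

```python
from fractions import Fraction


def det(A):
    """Exact determinant by Gaussian elimination over Fractions."""
    A = [[Fraction(a) for a in row] for row in A]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d


def induced_chain(g, q, x, y):
    """Q(x, y) and g(x, y) of Eqs. 2.1 and 2.2; x[l][i] (resp. y[l][j]) is the
    probability that player 1 (resp. 2) plays action i (resp. j) in state l."""
    n, nI, nJ = len(g), len(g[0]), len(g[0][0])
    pairs = [(i, j) for i in range(nI) for j in range(nJ)]
    Q = [[sum(x[l][i] * y[l][j] * q[l][i][j][m] for i, j in pairs) for m in range(n)]
         for l in range(n)]
    gv = [sum(x[l][i] * y[l][j] * g[l][i][j] for i, j in pairs) for l in range(n)]
    return Q, gv


def discounted_payoff(g, q, x, y, lam, k):
    """gamma^k_lambda(x, y) = d^k_lambda(x, y) / d^0_lambda(x, y), as in Eq. 2.3."""
    Q, gv = induced_chain(g, q, x, y)
    n = len(gv)
    A = [[(1 if a == b else 0) - (1 - lam) * Q[a][b] for b in range(n)] for a in range(n)]
    Ak = [row[:] for row in A]
    for a in range(n):
        Ak[a][k] = lam * gv[a]  # replace the k-th column by lam * g(x, y)
    return det(Ak) / det(A)


# Hypothetical 2-state game with 2 actions per player; both players mix uniformly.
half = Fraction(1, 2)
g = [[[2, 0], [1, 3]], [[-1, 4], [2, -2]]]
q = [[[[half, half], [1, 0]], [[0, 1], [half, half]]],
     [[[1, 0], [0, 1]], [[half, half], [0, 1]]]]
x = y = [[half, half], [half, half]]
lam = Fraction(1, 4)
gamma = [discounted_payoff(g, q, x, y, lam, k) for k in range(2)]
Q, gv = induced_chain(g, q, x, y)
# The recursion gamma = lam * g(x, y) + (1 - lam) * Q(x, y) gamma holds exactly.
assert all(gamma[l] == lam * gv[l] + (1 - lam) * sum(Q[l][m] * gamma[m] for m in range(2))
           for l in range(2))
print(gamma)
```

Exact rational arithmetic keeps the check sharp: the recursion holds with equality rather than up to rounding.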

When the discounted stochastic game $(K, I, J, g, q, k, \lambda)$ has a value, this value is denoted by $v^k_\lambda$. The following result is due to Shapley (1):

v) Every discounted stochastic game $(K, I, J, g, q, k, \lambda)$ has a value, and both players have optimal stationary strategies. Furthermore, the vector of values $v_\lambda = (v^1_\lambda, \dots, v^n_\lambda)$ is the unique fixed point of an explicit operator, the Shapley operator, defined as follows.

For each $1 \le k \le n$ and $u \in \mathbb{R}^n$, consider the following matrix $G^k_{\lambda,u}$ of size $|I| \times |J|$:

$$G^k_{\lambda,u}[i,j] := \lambda g(k,i,j) + (1-\lambda)\sum_{\ell=1}^{n} q(\ell \mid k, i, j)\, u^\ell.$$

The vector of values $v_\lambda$ is then the unique fixed point of the operator $\Phi(\lambda,\cdot) : \mathbb{R}^n \to \mathbb{R}^n$ defined by $\Phi^k(\lambda, u) := \operatorname{val} G^k_{\lambda,u}$. This operator is often referred to as the Shapley operator.

Remark. In the model of stochastic games, the discount rate $\lambda$ stands for the degree of impatience of the players, in the sense that future rewards are discounted. Alternatively, one can interpret $\lambda$ as the probability that the game stops after every stage. Shapley (1) already noted that the more general case can be handled in a similar way. In that case, the stopping probabilities depend on the current state and on the players' actions.

The value. The stochastic game $(K, I, J, g, q, k)$ has a value if there exists $v^k \in \mathbb{R}$ satisfying the following condition. For any $\varepsilon > 0$, there exist a pair of strategies $(\sigma_\varepsilon, \tau_\varepsilon) \in \Sigma \times \mathcal{T}$ and $\lambda_0 \in (0,1]$ such that, for all $\lambda \in (0, \lambda_0)$:

$$\gamma^k_\lambda(\sigma_\varepsilon, \tau) \ge v^k - \varepsilon \quad \forall \tau \in \mathcal{T}, \qquad \gamma^k_\lambda(\sigma, \tau_\varepsilon) \le v^k + \varepsilon \quad \forall \sigma \in \Sigma.$$

It follows that if the game has a value $v^k$, then $v^k = \lim_{\lambda\to0} v^k_\lambda$. The following result is due to Mertens and Neyman (5):

vi) Every stochastic game $(K, I, J, g, q, k)$ has a value. Moreover, for any $\varepsilon > 0$ there exists $M_0 \in \mathbb{N}$ such that, for all $M \ge M_0$, the following holds. Player 1 can guarantee that the expected average reward per stage in the first $M$ stages of the game is at least $v^k - \varepsilon$. Player 2 can guarantee that this amount is at most $v^k + \varepsilon$.

C. State of the Art. Since Shapley (1) introduced stochastic games, their theory and applications have been studied in several scientific disciplines. We restrict our brief literature survey to the theory of finite competitive stochastic games and related algorithms.

Existence of the value. In 1953, Shapley (1) proved that every discounted stochastic game admits a value. He did so by characterizing the vector of discounted values as the unique fixed point of an explicit operator. Bewley and Kohlberg (4) built on Shapley's characterization and on a deep result from real algebraic geometry, the so-called Tarski–Seidenberg elimination theorem. With these tools they proved in 1976 that the discounted values converge as the discount rate tends to zero. In the early 1980s, Mertens and Neyman (5, 6) strengthened this result. They established that every stochastic game admits a value and that the value coincides with the limit of the discounted values. The existence of the discounted values does not depend on observing past actions. By contrast, the existence of the value relies on the observation of the stage rewards.

Alternative proofs of convergence. In the late 1990s, Szczechla, Connell, Filar, and Vrieze (11) gave an alternative proof that the discounted values converge as the discount rate goes to zero. Their proof uses Shapley's characterization of the discounted values and the geometry of complex analytic varieties. Oliu-Barton (12) recently obtained another proof. It relies on the theory of finite Markov chains and on Motzkin's alternative theorem for linear systems.

Robustness of the value. The years 2010 to 2018 have brought many additional results concerning the value of stochastic games.
- Neyman and Sorin (13) studied stochastic games with a random duration clock. These are stochastic games with a random number of stages. At each stage, the players receive an additional signal which carries information about the number of remaining stages. Suppose that the expected number of remaining stages decreases throughout the game and that the expected number of stages converges to infinity. Then the values of the stochastic games with a random duration clock converge, and the limit is equal to the value of the stochastic game.
- Ziliotto (14) considered weighted-average stochastic games. In these games, player 1 maximizes in expectation a fixed weighted average of the sequence of rewards, namely $\sum_{m\ge1}\theta_m g(k_m, i_m, j_m)$. If $\sum_{m\ge1}|\theta^p_{m+1} - \theta^p_m|$ converges to zero for some $p > 0$, then the values of the weighted-average stochastic games converge. The limit is equal to the value of the stochastic game.
- Neyman (15) considered stochastic games in continuous time. He proved that their value coincides with the value of the discrete model.
- Finally, Oliu-Barton and Ziliotto (16) proved that stochastic games satisfy the constant payoff property, as conjectured by Sorin, Venel, and Vigeral (17). The property concerns any pair of optimal strategies of the discounted game $(K, I, J, g, q, k, \lambda)$ with $\lambda$ sufficiently small. Under such a pair, consider any set of consecutive stages of cardinality of order $1/\lambda$. The expected average of the cumulated $\lambda$-discounted rewards on that set is approximately equal to $v^k$.

Characterization of the value. The first results on the value of stochastic games go back to the mid-1960s. Hoffman and Karp (19) adapted the tools developed by Howard (18) for Markov decision problems. They obtained a characterization of the limit of the λ-discounted values in the irreducible case, that is, when every pair of stationary strategies induces an irreducible Markov chain. Their characterization is in the spirit of the average cost optimality equation. Soon after, Blackwell and Ferguson (20) determined the value of the "Big Match." This is an example of a stochastic game whose value depends on the initial state. In the mid-1970s, Kohlberg (21) introduced absorbing games. This is a class of stochastic games in which there is at most one transition between states, and which includes the Big Match as a particular case. Kohlberg proved that these games have a value and characterized the value using the derivative of Shapley's operator. Laraki (22) and Sorin and Vigeral (23) recently obtained 2 further characterizations of the value of absorbing games.

Algorithms. Whether the value of a finite stochastic game can be computed in polynomial time is a famous open problem in computer science. This problem is intriguing because the class of simple stochastic games is in both NP (nondeterministic polynomial time) and co-NP. Several important problems with this property, such as primality testing and linear programming, have eventually been shown to be solvable in polynomial time. (A simple stochastic game is one where, at each state, the transition function depends on the action of one player only.) The known algorithms fall into 2 categories:
- decision procedures for the first-order theory of the reals, such as refs. 24–26;
- value or strategy iteration methods, such as refs. 27 and 28.

All of them are exponential in the number of states or in the number of actions. Recently, Hansen, Koucký, Lauritzen, Miltersen, and Tsigaridas (29) achieved a remarkable improvement. Their algorithm is polynomial in the number of actions for any fixed number of states. However, its dependence on the number of states is both nonexplicit and doubly exponential. Oliu-Barton (30) built on the characterization of the value obtained in the present paper to improve the algorithm of Hansen et al. (29). The improved algorithm reduces the dependence on the number of states to an explicit polynomial dependence on the number of pure stationary strategies. It is still not polynomial in the number of states, but it is the most efficient algorithm known today.

D. Main Results. As already argued, the value is a very robust solution concept for stochastic games. Its existence was proved nearly 40 y ago, and an explicit characterization has been missing since then. The main contribution of the present paper is a tractable formula for the value of stochastic games.

Our result relies on a different characterization of the discounted values. We reduce a discounted stochastic game with $n$ states to $n$ independent parameterized matrix games, one for each initial state. For the rest of this paper, $1 \le k \le n$ denotes a fixed initial state. The parameterized game that corresponds to $k$ is obtained by linearizing the ratio in Eq. 2.3 for all pairs of pure stationary strategies, as follows:

Definition D.1. For any $z \in \mathbb{R}$, define the matrix $W^k_\lambda(z)$ of size $|I|^n \times |J|^n$ by setting

$$W^k_\lambda(z)[\mathbf{i},\mathbf{j}] := d^k_\lambda(\mathbf{i},\mathbf{j}) - z\, d^0_\lambda(\mathbf{i},\mathbf{j}) \qquad \forall (\mathbf{i},\mathbf{j}) \in I^n \times J^n.$$

Theorem 1 (A Formula for the Discounted Values). For any $\lambda \in (0,1]$, the value of the discounted stochastic game $(K, I, J, g, q, k, \lambda)$ is the unique solution to

$$z \in \mathbb{R}, \qquad \operatorname{val} W^k_\lambda(z) = 0.$$

Theorem 2 (A Formula for the Value). For any $z \in \mathbb{R}$, the limit $F^k(z) := \lim_{\lambda\to0} \operatorname{val} W^k_\lambda(z)/\lambda^n$ exists in $\mathbb{R} \cup \{\pm\infty\}$. The value of the stochastic game $(K, I, J, g, q, k)$ is the unique solution to

$$w \in \mathbb{R}, \qquad \begin{cases} z > w \Rightarrow F^k(z) < 0, \\ z < w \Rightarrow F^k(z) > 0. \end{cases}$$

Comments.
1) Theorem 1 provides an uncoupled characterization of the discounted values: each initial state is considered separately. This property contrasts with Shapley's (1) characterization and is the key to Theorem 2.
2) Theorem 1 can be extended to stochastic games with compact action spaces and continuous payoff and transition functions. Theorem 2 cannot, because the discounted values may fail to converge in this case.
3) Theorem 2 provides a different and elementary proof that the λ-discounted values converge as λ tends to 0.
4) Theorem 2 recovers the characterization of the value for absorbing games obtained by Kohlberg (21).
5) The sign of $F^k(z)$ can be easily computed using linear programming techniques. This is a crucial aspect of the formula of Theorem 2.
6) Theorems 1 and 2 suggest binary search algorithms for computing the discounted values and the value, respectively. These algorithms successively evaluate the sign of $\operatorname{val} W^k_\lambda(z)$ and of $F^k(z)$ for well-chosen $z$. They are polynomial in the number of pure stationary strategies. A separate paper (30) gives the precise description and analysis of these algorithms. For completeness, we provide a brief description in Section 5.

3. A Formula for the Discounted Values

In this section we prove Theorem 1. In the sequel, we consider a fixed discounted stochastic game $(K, I, J, g, q, k, \lambda)$. The proof is based on the following 4 properties:
1) $d^0_\lambda(\mathbf{i},\mathbf{j})$ is positive for all $(\mathbf{i},\mathbf{j}) \in I^n \times J^n$.
2) $(x, y, z) \mapsto d^k_\lambda(x,y) - z\, d^0_\lambda(x,y)$ is a multilinear map.
3) $z \mapsto \operatorname{val} W^k_\lambda(z)$ is a strictly decreasing real map.
4) $\operatorname{val} W^k_\lambda(v^k_\lambda) = 0$.

Theorem 1 clearly follows from the last two. The same argument extends the result to compact-continuous stochastic games, that is, stochastic games with compact metric action spaces and continuous payoff and transition functions. This extension is postponed to Section 6B.

Notation. We use the following notation:
• For any $x = (x^1, \dots, x^n) \in \Delta(I)^n$ we denote by $\hat x \in \Delta(I^n)$ the direct product of the coordinates of $x$. Formally,
$$\hat x(\mathbf{i}) := \prod_{\ell=1}^{n} x^\ell(i^\ell) \qquad \forall\, \mathbf{i} = (i^1, \dots, i^n) \in I^n.$$
The map $x \mapsto \hat x$ is one to one and defines the canonical inclusion $\Delta(I)^n \subset \Delta(I^n)$. The map $y \mapsto \hat y$ is defined similarly and gives the canonical inclusion $\Delta(J)^n \subset \Delta(J^n)$.
• The boldface letters $\mathbf{x}$ and $\mathbf{y}$ refer to elements of $\Delta(I^n)$ and $\Delta(J^n)$, respectively.
• For all $z \in \mathbb{R}$ and all $(\mathbf{x},\mathbf{y}) \in \Delta(I^n) \times \Delta(J^n)$ we set
$$W^k_\lambda(z)[\mathbf{x},\mathbf{y}] := \sum_{(\mathbf{i},\mathbf{j}) \in I^n\times J^n} \mathbf{x}(\mathbf{i})\, W^k_\lambda(z)[\mathbf{i},\mathbf{j}]\, \mathbf{y}(\mathbf{j}).$$

We now prove the 4 properties above. The first one is due to Ostrovski (31), and for completeness we provide a short proof.

Lemma 1. For any stochastic matrix $P$ of size $n \times n$ and any $\lambda \in (0,1]$, $\det(\mathrm{Id} - (1-\lambda)P) \ge \lambda^n$.

Proof: Set $M := \mathrm{Id} - (1-\lambda)P$. Because $P$ is a stochastic matrix, $M^{\ell,\ell} - \sum_{\ell' \ne \ell} |M^{\ell,\ell'}| \ge \lambda$ for all $1 \le \ell \le n$. Hence, $M$ is strictly diagonally dominant.

Take any real $\mu < \lambda$. The matrix $M - \mu\,\mathrm{Id}$ is still strictly diagonally dominant, so in particular it is invertible. Consequently, all real eigenvalues of $M$ are larger than or equal to $\lambda$.

Similarly, take any $\mu = a + bi \in \mathbb{C}$ with $|\mu| := \sqrt{a^2 + b^2} < \lambda$. The matrix $M - \mu\,\mathrm{Id}$ is strictly diagonally dominant, so it is invertible. Consequently, if $a + bi$ is a complex eigenvalue of $M$, then $\lambda \le |a + bi|$, so that $\lambda^2 \le |a+bi|^2 = a^2 + b^2 = (a+bi)(a-bi)$.

Recall that $\det M = \prod_{\ell=1}^{n} \mu_\ell$, where $\mu_1, \dots, \mu_n$ are the eigenvalues of $M$ counted with multiplicities. Each real eigenvalue contributes at least $\lambda$ to the product, and each pair of conjugate eigenvalues contributes at least $\lambda^2$. It follows that $\det M \ge \lambda^n$. □
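Lemma 1 can be sanity-checked with exact rational arithmetic. The Python sketch below is our own illustration. Its sampling scheme and function names are assumptions, not part of the paper. It draws random row-stochastic matrices $P$ with rational entries and verifies $\det(\mathrm{Id}-(1-\lambda)P) \ge \lambda^n$. Equality holds for $P=\mathrm{Id}$, since then $\mathrm{Id}-(1-\lambda)P = \lambda\,\mathrm{Id}$.

```python
import random
from fractions import Fraction


def det(A):
    """Exact determinant by Gaussian elimination over Fractions."""
    A = [[Fraction(a) for a in row] for row in A]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if A[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d


def random_stochastic(n, rng):
    """Random n x n row-stochastic matrix with rational entries (hypothetical sampler)."""
    P = []
    for _ in range(n):
        w = [rng.randint(0, 9) for _ in range(n)]
        w[rng.randrange(n)] += 1  # make sure the row sum is positive
        P.append([Fraction(a, sum(w)) for a in w])
    return P


def d0(P, lam):
    """det(Id - (1 - lam) P), i.e. d^0_lambda for the transition matrix P."""
    n = len(P)
    return det([[(1 if a == b else 0) - (1 - lam) * P[a][b] for b in range(n)]
                for a in range(n)])


def check_lemma1(trials=200, seed=1):
    """Return True if det(Id - (1 - lam) P) >= lam^n on all random trials."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 5)
        lam = Fraction(rng.randint(1, 20), 20)  # lam in (0, 1]
        if d0(random_stochastic(n, rng), lam) < lam ** n:
            return False
    return True


print(check_lemma1())  # True
```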

Lemma 2. For any $(x, \mathbf{j}) \in \Delta(I)^n \times J^n$ and $z \in \mathbb{R}$,
i) $d^0_\lambda(x,\mathbf{j}) = \sum_{\mathbf{i}\in I^n} \hat x(\mathbf{i})\, d^0_\lambda(\mathbf{i},\mathbf{j})$.
ii) $d^k_\lambda(x,\mathbf{j}) = \sum_{\mathbf{i}\in I^n} \hat x(\mathbf{i})\, d^k_\lambda(\mathbf{i},\mathbf{j})$.
iii) $W^k_\lambda(z)[\hat x,\mathbf{j}] = d^k_\lambda(x,\mathbf{j}) - z\, d^0_\lambda(x,\mathbf{j})$.

Proof:
i) Let $\mathbf{j} \in J^n$ be fixed. For any $x \in \Delta(I)^n$ set $M(x,\mathbf{j}) := \mathrm{Id} - (1-\lambda)Q(x,\mathbf{j})$. Then $\det M(x,\mathbf{j}) = d^0_\lambda(x,\mathbf{j})$ and, in particular, $\det M(\mathbf{i},\mathbf{j}) = d^0_\lambda(\mathbf{i},\mathbf{j})$ for all $\mathbf{i} \in I^n$. By Eq. 2.1, the $\ell$th row of $M(x,\mathbf{j})$ depends on $x$ only through $x^\ell$, and the dependence is linear. Let $(i^1, x^2, \dots, x^n)$ denote the stationary strategy that plays $i^1$ in state 1 and coincides with $x$ elsewhere. The determinant is linear in its first row, so
$$\det M(x,\mathbf{j}) = \sum_{i^1\in I} x^1(i^1) \det M((i^1, x^2, \dots, x^n), \mathbf{j}).$$
Using the same argument for the remaining rows, one inductively obtains
$$\det M(x,\mathbf{j}) = \sum_{\mathbf{i}\in I^n} \hat x(\mathbf{i}) \det M(\mathbf{i},\mathbf{j}),$$
which is i).
ii) The proof follows the same lines as i). Replace $M(x,\mathbf{j})$ with the matrix obtained by replacing the $k$th column of $M(x,\mathbf{j})$ with $\lambda g(x,\mathbf{j})$. By Eqs. 2.1 and 2.2, the $\ell$th row of this matrix depends on $x$ only through $x^\ell$, and the dependence is linear.
iii) The result follows directly from i) and ii), and the definition of $W^k_\lambda(z)[\hat x,\mathbf{j}]$. Indeed,
$$W^k_\lambda(z)[\hat x,\mathbf{j}] = \sum_{\mathbf{i}\in I^n}\hat x(\mathbf{i})\, W^k_\lambda(z)[\mathbf{i},\mathbf{j}] = \sum_{\mathbf{i}\in I^n}\hat x(\mathbf{i})\big(d^k_\lambda(\mathbf{i},\mathbf{j}) - z\, d^0_\lambda(\mathbf{i},\mathbf{j})\big) = d^k_\lambda(x,\mathbf{j}) - z\, d^0_\lambda(x,\mathbf{j}). \;\square$$

Remark: Lemma 2 is stated for $(x,\mathbf{j}) \in \Delta(I)^n \times J^n$ for convenience, but it also holds for all $(x,y) \in \Delta(I)^n \times \Delta(J)^n$. For instance, the last property becomes: for all $(x,y) \in \Delta(I)^n \times \Delta(J)^n$,
$$W^k_\lambda(z)[\hat x,\hat y] = d^k_\lambda(x,y) - z\, d^0_\lambda(x,y).$$

Lemma 3. For any $z_1, z_2 \in \mathbb{R}$ with $z_1 < z_2$,
$$\operatorname{val} W^k_\lambda(z_1) \ge \operatorname{val} W^k_\lambda(z_2) + (z_2 - z_1)\lambda^n.$$
In particular, $z \mapsto \operatorname{val} W^k_\lambda(z)$ is a strictly decreasing real map.

Proof: Let $z_1 < z_2$. By definition, for each $(\mathbf{i},\mathbf{j}) \in I^n \times J^n$,
$$W^k_\lambda(z_1)[\mathbf{i},\mathbf{j}] = W^k_\lambda(z_2)[\mathbf{i},\mathbf{j}] + (z_2 - z_1)\, d^0_\lambda(\mathbf{i},\mathbf{j}).$$
The matrix $Q(\mathbf{i},\mathbf{j})$ is stochastic of size $n\times n$. Hence, by Lemma 1, $W^k_\lambda(z_1)[\mathbf{i},\mathbf{j}] \ge W^k_\lambda(z_2)[\mathbf{i},\mathbf{j}] + (z_2 - z_1)\lambda^n$. Adding a constant to every entry of a matrix adds the same constant to its value. The result therefore follows from the monotonicity of the value operator stated in item ii) of Section 2A. □

Lemma 4. $\operatorname{val} W^k_\lambda(v^k_\lambda) = 0$.

Proof: Let $x^* \in \Delta(I)^n$ be an optimal stationary strategy of player 1 in $(K, I, J, g, q, k, \lambda)$. It exists by Shapley (1), as noted in item v) of Section 2B. By optimality and Eq. 2.3, for all $\mathbf{j} \in J^n$,
$$\frac{d^k_\lambda(x^*,\mathbf{j})}{d^0_\lambda(x^*,\mathbf{j})} = \gamma^k_\lambda(x^*,\mathbf{j}) \ge v^k_\lambda. \qquad [3.1]$$
The matrix $Q(x^*,\mathbf{j})$ is stochastic of size $n\times n$, so $d^0_\lambda(x^*,\mathbf{j}) \ge \lambda^n > 0$ by Lemma 1. Consequently, the previous relation is equivalent to
$$d^k_\lambda(x^*,\mathbf{j}) - v^k_\lambda\, d^0_\lambda(x^*,\mathbf{j}) \ge 0 \qquad \forall \mathbf{j}\in J^n. \qquad [3.2]$$
By Lemma 2 iii), $W^k_\lambda(v^k_\lambda)[\hat x^*,\mathbf{j}] \ge 0$ for all $\mathbf{j} \in J^n$. For any matrix $M = (m_{a,b})$ of size $p\times q$ and any $s \in \Delta(\{1,\dots,p\})$, the definition of the value implies $\operatorname{val} M \ge \min_{1\le b\le q}\sum_{1\le a\le p} s(a)\, m_{a,b}$. Hence,
$$\operatorname{val} W^k_\lambda(v^k_\lambda) \ge \min_{\mathbf{j}\in J^n} W^k_\lambda(v^k_\lambda)[\hat x^*,\mathbf{j}] \ge 0.$$
Reversing the roles of the players gives $\operatorname{val} W^k_\lambda(v^k_\lambda) \le 0$ in the same way, which proves the claim. □

Proof of Theorem 1: By Lemma 4, $v^k_\lambda$ is a solution to $\operatorname{val} W^k_\lambda(z) = 0$. By Lemma 3, $z \mapsto \operatorname{val} W^k_\lambda(z)$ is a strictly decreasing real function. Consequently, the set $\{z \in \mathbb{R} \mid \operatorname{val} W^k_\lambda(z) = 0\}$ contains at most one element, and this element is precisely $v^k_\lambda$. □

4. A Formula for the Value

In this section we prove Theorem 2. We must show two things. First, the limit $F^k(z) := \lim_{\lambda\to0}\operatorname{val} W^k_\lambda(z)/\lambda^n$ exists in $\mathbb{R}\cup\{-\infty,+\infty\}$ for all $z \in \mathbb{R}$. Second, the equation
$$w \in \mathbb{R}, \qquad \begin{cases} z > w \Rightarrow F^k(z) < 0, \\ z < w \Rightarrow F^k(z) > 0 \end{cases} \qquad [4.1]$$
admits a unique solution. The following 2 lemmas establish these facts.

Lemma 1. For any $z \in \mathbb{R}$, there exist $\lambda_0 \in (0,1]$ and a rational fraction $R \in \mathbb{R}(\lambda)$ so that $\operatorname{val} W^k_\lambda(z) = R(\lambda)$ for all $\lambda \in (0,\lambda_0)$.

Proof: By construction, the entries of $W^k_\lambda(z)$ are polynomials in $\lambda$. By Shapley and Snow (8), the value of a matrix satisfies the formula stated in item iv) of Section 2A. Consequently, for each $\lambda \in (0,1]$ there exists a rational fraction $R \in \mathbb{R}(\lambda)$ so that $\operatorname{val} W^k_\lambda(z) = R(\lambda)$.

The choice of the square submatrix may vary with $\lambda$, so the corresponding rational fraction may also vary with $\lambda$. However, there are finitely many square submatrices, so only finitely many rational fractions can satisfy this equality. Consequently, there is a finite collection $E$ of rational fractions with the following property. For each $\lambda \in (0,1]$, the point $(\lambda, \operatorname{val} W^k_\lambda(z))$ belongs to the union of the graphs of the functions $R \in E$.

By item iii) of Section 2A, the map $\lambda \mapsto \operatorname{val} W^k_\lambda(z)$ is continuous on $(0,1]$. Consequently, as $\lambda$ varies on $(0,1]$, the curve $\lambda \mapsto (\lambda, \operatorname{val} W^k_\lambda(z))$ can "jump" from the graph of $R$ to the graph of $R'$ only where these 2 graphs intersect. Yet any 2 rational fractions either coincide or intersect finitely many times. Hence, there exists $\lambda_0$ such that, for any $R, R' \in E$, one of the following holds:
- $R(\lambda) = R'(\lambda)$ for all $\lambda \in (0,\lambda_0)$;
- $R(\lambda) \ne R'(\lambda)$ for all $\lambda \in (0,\lambda_0)$.

In particular, there exists $R \in E$ so that $\operatorname{val} W^k_\lambda(z) = R(\lambda)$ for all $\lambda \in (0,\lambda_0)$. □

Hence, as $\lambda$ varies on the interval $(0,1]$, the curve $\lambda \mapsto (\lambda, \operatorname{val} W_\lambda^k(z))$ can “jump” from the graph of $R$ to the graph of $R'$ only at points where these 2 graphs intersect. Yet, any 2 rational fractions either coincide or intersect finitely many times. Consequently, there exists $\lambda_0 > 0$ such that, for any $R, R' \in E$, either $R(\lambda) = R'(\lambda)$ for all $\lambda \in (0, \lambda_0)$ or $R(\lambda) \neq R'(\lambda)$ for all $\lambda \in (0, \lambda_0)$. In particular, there exists $R \in E$ such that $\operatorname{val} W_\lambda^k(z) = R(\lambda)$ for all $\lambda \in (0, \lambda_0)$. Since $R(\lambda)/\lambda^n$ is itself a rational fraction in $\lambda$, its limit as $\lambda \to 0$ exists in $\mathbb{R} \cup \{-\infty, +\infty\}$. ∎

Lemma 2. Eq. 4.1 admits a unique solution.

Proof: By Lemma 1, $F^k(z) = \lim_{\lambda \to 0} \operatorname{val} W_\lambda^k(z)/\lambda^n$ exists for all $z \in \mathbb{R}$. Suppose that Eq. 4.1 admits 2 solutions $w < w'$. Then, for any $z \in (w, w')$, one has both $F^k(z) < 0$ and $F^k(z) > 0$, which is impossible. Therefore, Eq. 4.1 admits at most one solution.

Let $(z_1, z_2) \in \mathbb{R}^2$ satisfy $z_1 < z_2$. Rearranging the terms in Lemma 3, dividing by $\lambda^n$, and taking $\lambda$ to 0 yields
$$ F^k(z_1) \ge F^k(z_2) + z_2 - z_1. \tag{4.2} $$
In particular, the following relations hold:
$$ \begin{aligned} F^k(z) \ge 0 &\implies F^k(z') \ge 0 \quad \forall z' \le z,\\ F^k(z) \le 0 &\implies F^k(z') \le 0 \quad \forall z' \ge z,\\ F^k(z) = 0 &\implies F^k(z') \neq 0 \quad \forall z' \neq z. \end{aligned} \tag{4.3} $$
We now show that $F^k$ is not constant; Eq. 4.2 alone does not exclude $F^k \equiv +\infty$ or $F^k \equiv -\infty$. Let $C^- := \min_{\ell,i,j} g(\ell, i, j)$ and $C^+ := \max_{\ell,i,j} g(\ell, i, j)$. For any $\lambda \in (0,1]$, one clearly has $C^- \le v_\lambda^k \le C^+$. Consequently, by Lemmas 3 and 4,
$$ \operatorname{val} W_\lambda^k(C^+) \le \operatorname{val} W_\lambda^k(v_\lambda^k) = 0 \le \operatorname{val} W_\lambda^k(C^-). $$
Dividing by $\lambda^n$ and taking $\lambda$ to 0, one obtains
$$ F^k(C^+) \le 0 \le F^k(C^-). \tag{4.4} $$
We now define recursively 2 real sequences $(u_m^-)_{m \ge 1}$ and $(u_m^+)_{m \ge 1}$ by setting $u_1^- := C^-$, $u_1^+ := C^+$, and, for all $m \ge 1$,
$$ u_{m+1}^- := \begin{cases} \tfrac12 (u_m^- + u_m^+) & \text{if } F^k\bigl(\tfrac12 (u_m^- + u_m^+)\bigr) \ge 0,\\ u_m^- & \text{otherwise,} \end{cases} \qquad u_{m+1}^+ := \begin{cases} \tfrac12 (u_m^- + u_m^+) & \text{if } F^k\bigl(\tfrac12 (u_m^- + u_m^+)\bigr) \le 0,\\ u_m^+ & \text{otherwise.} \end{cases} $$
By construction and Eq. 4.4, $F^k(u_m^-) \ge 0$ and $F^k(u_m^+) \le 0$ for all $m \ge 1$. Moreover, Eqs. 4.3 and 4.4 imply $C^- \le u_m^- \le u_m^+ \le C^+$ for all $m \ge 1$, so that $(u_m^-)_m$ is nondecreasing and $(u_m^+)_m$ is nonincreasing. Furthermore, $u_{m+1}^+ - u_{m+1}^- \le \tfrac12 (u_m^+ - u_m^-)$ for all $m \ge 1$. Hence, the 2 sequences admit a common limit $\bar u$. For any $\varepsilon > 0$, let $m_\varepsilon$ be such that $u_{m_\varepsilon}^- > \bar u - \varepsilon$. By Eq. 4.2, this implies
$$ F^k(\bar u - \varepsilon) \ge F^k(u_{m_\varepsilon}^-) + u_{m_\varepsilon}^- - (\bar u - \varepsilon) > 0. $$
Similarly, $F^k(\bar u + \varepsilon) < 0$ for any $\varepsilon > 0$. Together with Eq. 4.3, this shows that $\bar u$ is a solution to Eq. 4.1. ∎

We are now ready to prove our main result.

Proof of Theorem 2: Let $w$ be the unique solution of Eq. 4.1 and fix $\varepsilon > 0$. By the choice of $w$, $F^k(w - \varepsilon) > 0$. Consequently, there exists $\lambda_0 > 0$ such that
$$ \operatorname{val} W_\lambda^k(w - \varepsilon) > 0 \quad \forall \lambda \in (0, \lambda_0). \tag{4.5} $$
By Lemma 3, the map $z \mapsto \operatorname{val} W_\lambda^k(z)$ is strictly decreasing. By Lemma 4, $\operatorname{val} W_\lambda^k(v_\lambda^k) = 0$. Therefore, Eq. 4.5 implies
$$ v_\lambda^k > w - \varepsilon \quad \forall \lambda \in (0, \lambda_0). \tag{4.6} $$
Because $\varepsilon$ is arbitrary, $\liminf_{\lambda \to 0} v_\lambda^k \ge w$. By reversing the roles of the players, one obtains in a similar manner $\limsup_{\lambda \to 0} v_\lambda^k \le w$. Hence, the $\lambda$-discounted values converge as $\lambda$ goes to 0, and $\lim_{\lambda \to 0} v_\lambda^k = w$. The result then follows from item vi) of Section 2B, namely the existence of the value $v^k$ and the equality $\lim_{\lambda \to 0} v_\lambda^k = v^k$, due to Mertens and Neyman (5). ∎

5. Algorithms
The formulas obtained in Theorems 1 and 2 suggest binary search methods for approximating the $\lambda$-discounted values and the value of a stochastic game $(K, I, J, g, q, k)$, based on the evaluation of the sign of the real functions $z \mapsto \operatorname{val} W_\lambda^k(z)$ and $z \mapsto F^k(z)$, respectively. In this section we provide a brief description of these algorithms and discuss their complexity using the logarithmic cost model (a model which accounts for the total number of bits which are involved). We refer the reader to ref. 30 for more technical details and for 2 additional algorithms which provide exact expressions for $v_\lambda^k$ and $v^k$ within the same complexity class.

Notation. For any $m \in \mathbb{N}$, let $E_m := \{0, \frac{1}{m}, \frac{2}{m}, \dots, \frac{m}{m}\}$ and $Z_m := \{0, \frac{1}{2^m}, \frac{2}{2^m}, \dots, \frac{2^m}{2^m}\}$.

A. Computing the Discounted Values. The following bisection algorithm, which is directly derived from Theorem 1, inputs a discounted stochastic game with rational data and outputs an arbitrarily close approximation of its value.

Input. A discounted stochastic game $(K, I, J, g, q, k, \lambda)$ such that, for some $(N, L) \in \mathbb{N}^2$, the functions $g$ and $q$ take values in $E_N$ and $\lambda \in E_L$, and a precision level $r \in \mathbb{N}$.
Output. A $2^{-r}$ approximation of $v_\lambda^k$.
Complexity. Polynomial in $n$, $|I|^n$, $|J|^n$, $\log N$, $\log L$ and $r$.

1) Set $w^- := 0$, $w^+ := 1$
2) WHILE $w^+ - w^- > 2^{-r}$ DO
  2.1) $z := \frac12 (w^- + w^+)$
  2.2) $v :=$ sign of $\operatorname{val} W_\lambda^k(z)$
  2.3) IF $v \ge 0$ THEN $w^- := z$
  2.4) IF $v \le 0$ THEN $w^+ := z$
3) RETURN $u := w^-$

Because $g$ takes values in $E_N \subset [0,1]$, one has $v_\lambda^k \in [0,1]$, which justifies the initial bracket. Clearly, the output $u$ satisfies $|u - v_\lambda^k| \le 2^{-r}$, and the number of iterations in step 2, the “while” loop, is bounded by $r$. The complexity of each iteration depends crucially on the complexity of step 2.2. First, one needs to determine the matrix $W_\lambda^k(z)$ for some $z \in Z_r$, and this requires the computation of 2 determinants of size $n \times n$ for each of its $|I|^n \times |J|^n$ entries. Algorithms for computing the determinant of a matrix exist which are polynomial in its size and in the number of bits which are needed to encode this matrix. Second, the choice of $z$ and Hadamard's inequality imply that the number of bits which are needed to encode $W_\lambda^k(z)$ is polynomial in $n$, $|I|^n$, $|J|^n$, $\log N$, $\log L$ and $r$. Third, computing the value of a matrix can be done with linear programming techniques, and algorithms exist [for example, Karmarkar (32)] which are polynomial in its size and in the number of bits which are needed to encode this matrix. Consequently, the complexity of step 2.2 is polynomial in $n$, $|I|^n$, $|J|^n$, $\log N$, $\log L$ and $r$, and the same is true for the entire algorithm.

B. Computing the Value. The following bisection algorithm, which is directly derived from Theorem 2, inputs a stochastic game with rational data and outputs an arbitrarily close approximation of its value.

Input. A stochastic game $(K, I, J, g, q, k)$ such that, for some $N \in \mathbb{N}$, the functions $g$ and $q$ take values in $E_N$, and a precision level $r \in \mathbb{N}$.
Output. A $2^{-r}$ approximation of $v^k$.
Complexity. Polynomial in $n$, $|I|^n$, $|J|^n$, $\log N$ and $r$.

1) Set $w^- := 0$, $w^+ := 1$
2) WHILE $w^+ - w^- > 2^{-r}$ DO
  2.1) $z := \frac12 (w^- + w^+)$
  2.2) $v :=$ sign of $F^k(z)$
  2.3) IF $v \ge 0$ THEN $w^- := z$
  2.4) IF $v \le 0$ THEN $w^+ := z$
3) RETURN $u := w^-$

Like before, the output $u$ satisfies $|u - v^k| \le 2^{-r}$, the number of iterations in step 2 (the “while” loop) is bounded by $r$, and the variable $z$ always takes values in the set $Z_r$. Unlike before, however, each iteration requires computing the sign of $F^k(z)$, which might seem problematic due to the limiting nature of the function $F^k$. This difficulty can be overcome with the help of proposition 4.1 of ref. 30, which shows that, for $z \in Z_r$, the sign of $F^k(z)$ is equal to the sign of $\operatorname{val} W_\lambda^k(z)$ for an explicit, sufficiently small discount factor $\lambda$. Consequently, the computation in step 2.2 can be replaced with the computation of the sign of $\operatorname{val} W_\lambda^k(z)$ for this $\lambda$. By the choice of $\lambda$, the number of bits which are needed to encode $W_\lambda^k(z)$ is polynomial in $n$, $|I|^n$, $|J|^n$, $\log N$ and $r$. Hence, the computation of step 2.2 is polynomial in these variables, and the same is true for the entire algorithm.

6. Remarks and Extensions
First, we provide an alternative definition of the parameterized games $W_\lambda^k(z)$, which is based on the Kronecker product of matrices. Second, we extend Theorem 1 to the more general framework of stochastic games with compact metric action sets and continuous payoff and transition functions, and explain why Theorem 2 fails in this framework. Finally, we show that the formula obtained by Kohlberg (21) for the value of absorbing games is captured by Theorem 2.

A. An Alternative Formulation of the Parameterized Games. We provide an alternative construction of $W_\lambda^k(z)$ which is based on the Kronecker product of matrices. Let $U$ denote a matrix of ones of size $|I| \times |J|$. For each $1 \le \ell, \ell' \le n$, consider the $|I| \times |J|$ matrices
$$ G^\ell := \bigl(g(\ell, i, j)\bigr)_{i,j}, \qquad Q^{\ell,\ell'} := \bigl(q(\ell' \mid \ell, i, j)\bigr)_{i,j}, $$
and use them to form the following $n \times n$ array of matrices:
$$ D_\lambda := \begin{pmatrix} U - (1-\lambda)Q^{1,1} & \cdots & -(1-\lambda)Q^{1,n} \\ \vdots & \ddots & \vdots \\ -(1-\lambda)Q^{n,1} & \cdots & U - (1-\lambda)Q^{n,n} \end{pmatrix}. $$
Denote by $D_\lambda^k(z)$ the array of matrices obtained by replacing the $k$th column of matrices of $D_\lambda$ with $\lambda(G^1 - zU), \dots, \lambda(G^n - zU)$. For any square array $A = (A_{\ell,\ell'})$ of matrices, let $\det_\otimes A := \sum_{\sigma} \operatorname{sgn}(\sigma)\, A_{1,\sigma(1)} \otimes \cdots \otimes A_{n,\sigma(n)}$ denote its determinant, where the products of matrices are replaced with Kronecker products. By construction, one has $W_\lambda^k(z) = \det_\otimes D_\lambda^k(z)$ for all $z \in \mathbb{R}$. The linearity relations established in Section 3 can also be deduced from the properties of the Kronecker product. This formulation is reminiscent of (or, rather, inspired by) the theory of multiparameter eigenvalue problems initiated by Atkinson in the 1960s (ref. 33, chap. 6). The interesting connection which exists between stochastic games and multiparameter eigenvalue problems is developed by L.A. and M.O.-B. in a forthcoming paper (34).

B. Compact-Continuous Stochastic Games. Throughout this section, we consider stochastic games $(K, I, J, g, q, k)$, where $K = \{1, \dots, n\}$ is a finite set of states, $I$ and $J$ are 2 compact metric sets, and $g$ and $q$ are continuous functions. These games are referred to as compact-continuous stochastic games, for short. We denote by $\Delta(I)$ and $\Delta(J)$, respectively, the sets of probability distributions over $I$ and $J$, endowed with the weak* topology; these sets are compact because $I$ and $J$ are. For any pair of measures $(\alpha, \beta) \in \Delta(I) \times \Delta(J)$, their direct product is denoted by $\alpha \otimes \beta$.

Like in the finite case, a pair of stationary strategies $(x, y) \in \Delta(I)^n \times \Delta(J)^n$ induces a Markov chain with state-dependent rewards. Let $Q(x,y)$ and $g(x,y)$ denote the transition matrix of this chain and the vector of expected rewards. Formally, they are defined like in Eqs. 2.1 and 2.2, but replacing sums with the corresponding integrals, that is, for all $1 \le \ell, \ell' \le n$ and $(\alpha, \beta) \in \Delta(I) \times \Delta(J)$,
$$ g(\ell, \alpha, \beta) := \int_{I \times J} g(\ell, i, j)\, d(\alpha \otimes \beta)(i,j), \qquad q(\ell' \mid \ell, \alpha, \beta) := \int_{I \times J} q(\ell' \mid \ell, i, j)\, d(\alpha \otimes \beta)(i,j). $$
By the Kolmogorov extension theorem, each pair of strategies induces a unique probability measure on the sigma algebra generated by the cylinders, so the vector $\gamma_\lambda(x,y)$ of expected $\lambda$-discounted sums of rewards is well defined. For each $u \in \mathbb{R}^n$ and $1 \le \ell \le n$, the auxiliary game with action sets $\Delta(I)$ and $\Delta(J)$ and payoff function
$$ \rho_{\lambda,u}^\ell(\alpha, \beta) := \lambda g(\ell, \alpha, \beta) + (1-\lambda) \sum_{\ell'=1}^n q(\ell' \mid \ell, \alpha, \beta)\, u^{\ell'} $$
has a value by the minmax theorem stated in item i) of Section 2A, so the Shapley operator $\Phi(\lambda, \cdot)$ can be defined in the same manner as in the finite case. The compact-continuous stochastic game $(K, I, J, g, q, k)$ has a $\lambda$-discounted value $v_\lambda^k$, given by the $k$th coordinate of the unique fixed point of $\Phi(\lambda, \cdot)$, and both players have optimal stationary strategies. These results are well known.

Extension of Theorem 1. Theorem 1 can be extended to compact-continuous stochastic games, and the proof goes along the same lines. Like in the finite case, for any $(x, y) \in \Delta(I)^n \times \Delta(J)^n$, set $d_\lambda^0(x,y) := \det(\mathrm{Id} - (1-\lambda)Q(x,y))$ and let $d_\lambda^k(x,y)$ be the determinant of the matrix obtained by replacing the $k$th column of $\mathrm{Id} - (1-\lambda)Q(x,y)$ with $\lambda g(x,y)$. For each $z \in \mathbb{R}$, $W_\lambda^k(z)$ is no longer a matrix, but a mapping from $I^n \times J^n$ to $\mathbb{R}$, namely $(i,j) \mapsto d_\lambda^k(i,j) - z\, d_\lambda^0(i,j)$, which defines a game with action sets $I^n$ and $J^n$. Lemma 2 can be extended word for word, replacing $\hat x$ with $x^1 \otimes \cdots \otimes x^n$ (and similarly for $\hat y$) and sums with the corresponding integrals. Instead of a matrix game, one then considers
the mixed extension of this game; that is, the zero-sum game with action sets $\Delta(I^n)$ and $\Delta(J^n)$ and payoff function
$$ W_\lambda^k(z)[x, y] := \int_{I^n \times J^n} W_\lambda^k(z)[i, j]\, d(x \otimes y)(i, j). $$
By the minmax theorem stated in item i) of Section 2A, this game admits a value, denoted by $\operatorname{val} W_\lambda^k(z)$. Lemmas 3 and 4 can thus be extended word for word as well; it is enough to replace all sums with the corresponding integrals. The extension of Theorem 1 follows directly from these 2 lemmas.

Extension of Theorem 2. Theorem 2 cannot be extended to compact-continuous stochastic games. Indeed, Vigeral (35) provided an example of a stochastic game with compact action sets and continuous payoff and transition functions for which the discounted values do not converge. In this sense, the extension of our result to this framework is not possible. However, we point out that only one point in our proof is problematic. Indeed, the failure occurs in the use of Lemma 1, which relies on the formula stated as property iv) in Section 2A, which holds only in the finite case. For infinite action sets it is no longer true that $\lambda \mapsto \operatorname{val} W_\lambda^k(z)$ is a rational fraction in $\lambda$ in a neighborhood of 0 for all $z \in \mathbb{R}$, which was crucial to prove the existence of the limit $F^k(z) := \lim_{\lambda \to 0} \operatorname{val} W_\lambda^k(z)/\lambda^n$.

Determining necessary and sufficient conditions on $I$, $J$, $g$, and $q$ which ensure the convergence of the discounted values or the existence of the value is an open problem. Bolte, Gaubert, and Vigeral (36) provided sufficient conditions, namely that $g$ and $q$ are separable and definable. Without going into a precise definition of these 2 conditions, they hold in particular when the payoff function $g$ and the transition $q$ are polynomials in the players' actions. However, the case where $I$, $J$, $g$, and $q$ are semialgebraic is still unsolved. (A subset $E$ of $\mathbb{R}^m$ is semialgebraic if it is defined by finitely many polynomial inequalities; a function is semialgebraic if its graph is semialgebraic.)

C. Absorbing Games. We now show that Kohlberg's (21) result on absorbing games is captured in Theorem 2. An absorbing game is a stochastic game $(K, I, J, g, q, k)$ such that, for some fixed state $k_0 \in K$,
$$ q(k \mid k, i, j) = 1 \quad \forall (i, j) \in I \times J,\ \forall k \neq k_0. $$
For any initial state $k \neq k_0$, the state does not evolve during the game and, as a consequence, $v_\lambda^k$ is equal to the value of the matrix $(g(k, i, j))_{(i,j) \in I \times J}$ for all $\lambda \in (0,1]$ and $k \neq k_0$. We use the notation $v^k$ to emphasize that $v_\lambda^k$ does not depend on $\lambda$, for all $k \neq k_0$.

Notation. We assume without loss of generality that $k_0 = 1$ and set $u(z) := (z, v^2, \dots, v^n)$ for all $z \in \mathbb{R}$.

Kohlberg's result. Every absorbing game $(K, I, J, g, q, 1)$ has a value, denoted by $v^1$, which is the unique point where the function $T : \mathbb{R} \to \mathbb{R} \cup \{\pm\infty\}$ changes sign; $T$ is defined using the Shapley operator by
$$ T(z) := \lim_{\lambda \to 0} \frac{\Phi^1(\lambda, u(z)) - z}{\lambda}. $$

Comparison to our result. We claim that $F^1 = T$ in the class of absorbing games. First of all, for all $(i, j) \in I^n \times J^n$,
$$ d_\lambda^0(i, j) = \lambda^{n-1} \bigl(1 - (1-\lambda)\, q(1 \mid 1, i^1, j^1)\bigr), $$
$$ d_\lambda^1(i, j) = \lambda^{n-1} \Bigl(\lambda g(1, i^1, j^1) + (1-\lambda) \sum_{\ell=2}^n q(\ell \mid 1, i^1, j^1)\, v^\ell\Bigr). $$
Thus, for any $z \in \mathbb{R}$, the $(i, j)$th entry of $W_\lambda^1(z)$ is equal to
$$ \lambda^{n-1} \Bigl(\lambda g(1, i^1, j^1) + (1-\lambda) \sum_{\ell=1}^n q(\ell \mid 1, i^1, j^1)\, u^\ell(z) - z\Bigr). $$
In particular, $W_\lambda^1(z)$ depends on $(i, j)$ only through $(i^1, j^1) \in I \times J$. By eliminating the redundant rows and columns of $W_\lambda^1(z)$, one thus obtains the matrix $\lambda^{n-1} (G_{\lambda,u(z)} - zU)$, where $U$ denotes a matrix of ones of appropriate size, and $G_{\lambda,u}$ is the matrix game described in item v) of Section 2B. The affine invariance of the value operator, namely $\operatorname{val}(cM + dU) = c \operatorname{val} M + d$ for any matrix $M$ and any $(c, d) \in (0, +\infty) \times \mathbb{R}$, then gives
$$ \frac{\operatorname{val} W_\lambda^1(z)}{\lambda^n} = \frac{\lambda^{n-1} \operatorname{val}(G_{\lambda,u(z)} - zU)}{\lambda^n} = \frac{\Phi^1(\lambda, u(z)) - z}{\lambda}. $$
Taking $\lambda$ to 0 gives the desired equality.

7. Data Availability
There are no data associated with this paper.

ACKNOWLEDGMENTS. We are greatly indebted to Sylvain Sorin, whose comments on an earlier draft led to significant simplifications of our main proofs. We are also very thankful to for his careful reading and numerous remarks on a previous version of this paper and to Bernhard von Stengel, the Editor, and the anonymous reviewers for their insightful comments and suggestions at a later stage. M.O.-B. gratefully acknowledges the support of the French National Research Agency for the Project CIGNE (Communication and Information in Games on Networks) (ANR-15-CE38-0007-01), and the support of the Cowles Foundation at Yale University.

1. L. Shapley, Stochastic games. Proc. Natl. Acad. Sci. U.S.A. 39, 1095–1100 (1953).
2. J. von Neumann, Zur theorie der gesellschaftsspiele. Math. Ann. 100, 295–320 (1928).
3. E. Solan, N. Vieille, Stochastic games. Proc. Natl. Acad. Sci. U.S.A. 112, 13743–13746 (2015).
4. T. Bewley, E. Kohlberg, The asymptotic theory of stochastic games. Math. Oper. Res. 1, 197–208 (1976).
5. J. F. Mertens, A. Neyman, Stochastic games. Int. J. Game Theory 10, 53–66 (1981).
6. J. F. Mertens, A. Neyman, Stochastic games. Proc. Natl. Acad. Sci. U.S.A. 79, 2145–2146 (1982).
7. M. Sion, On general minimax theorems. Pac. J. Math. 8, 171–176 (1958).
8. L. Shapley, R. Snow, “Basic solutions of discrete games” in Contributions to the Theory of Games, Vol. I, H. Kuhn, A. Tucker, Eds. (Annals of Mathematics Studies, Princeton University Press, Princeton, NJ, 1950), vol. 24, pp. 27–35.
9. S. Sorin, A First Course on Zero-Sum Repeated Games (Springer Science & Business Media, 2002), vol. 37.
10. J. Renault, A tutorial on zero-sum stochastic games. ArXiv:1905.06577 (16 May 2019).
11. W. Szczechla, S. Connell, J. Filar, O. Vrieze, On the Puiseux series expansion of the limit discount equation of stochastic games. SIAM J. Control Optim. 35, 860–875 (1997).
12. M. Oliu-Barton, The asymptotic value in stochastic games. Math. Oper. Res. 39, 712–721 (2014).
13. A. Neyman, S. Sorin, Repeated games with public uncertain duration process. Int. J. Game Theory 39, 29–52 (2010).
14. B. Ziliotto, A Tauberian theorem for nonexpansive operators and applications to zero-sum stochastic games. Math. Oper. Res. 41, 1522–1534 (2016).
15. A. Neyman, Stochastic games with short-stage duration. Dyn. Games Appl. 3, 236–278 (2013).
16. M. Oliu-Barton, B. Ziliotto, Constant payoff in zero-sum stochastic games. ArXiv:1811.04518 (11 November 2018).
17. S. Sorin, X. Venel, G. Vigeral, Asymptotic properties of optimal trajectories in dynamic programming. Sankhya A 72, 237–245 (2010).
18. R. Howard, Dynamic Programming and Markov Processes (John Wiley, New York, NY, 1960).
19. A. Hoffman, R. M. Karp, On nonterminating stochastic games. Manag. Sci. 12, 359–370 (1966).
20. D. Blackwell, T. Ferguson, The big match. Ann. Math. Stat. 39, 159–163 (1968).
21. E. Kohlberg, Repeated games with absorbing states. Ann. Stat. 2, 724–738 (1974).
22. R. Laraki, Explicit formulas for repeated games with absorbing states. Int. J. Game Theory 39, 53–69 (2010).
23. S. Sorin, G. Vigeral, Existence of the limit value of two person zero-sum discounted repeated games via comparison theorems. J. Optim. Theory Appl. 157, 564–576 (2013).

24. K. Chatterjee, R. Majumdar, T. Henzinger, Stochastic limit-average games are in EXPTIME. Int. J. Game Theory 37, 219–234 (2008).
25. K. Etessami, M. Yannakakis, “Recursive concurrent stochastic games” in International Colloquium on Automata, Languages, and Programming (Part II, Lecture Notes in Computer Science, Springer, Berlin, Germany, 2006), vol. 4052, pp. 324–335.
26. E. Solan, N. Vieille, Computing uniformly optimal strategies in two-player stochastic games. Econ. Theor. 42, 237–253 (2010).
27. K. Chatterjee, L. de Alfaro, T. Henzinger, “Strategy improvement for concurrent reachability games” in Third International Conference on the Quantitative Evaluation of Systems (QEST) (IEEE, 2006), pp. 291–300.
28. S. Rao, R. Chandrasekaran, K. Nair, Algorithms for discounted stochastic games. J. Optim. Theory Appl. 11, 627–637 (1973).
29. K. A. Hansen, M. Koucký, N. Lauritzen, P. B. Miltersen, E. P. Tsigaridas, “Exact algorithms for solving stochastic games” in Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (ACM, 2011), pp. 205–214.
30. M. Oliu-Barton, New algorithms for solving stochastic games. ArXiv:1810.13019 (15 May 2019).
31. A. Ostrowski, Sur la détermination des bornes inférieures pour une classe des déterminants. Bull. Sci. Math. 61, 19–32 (1937).
32. N. Karmarkar, “A new polynomial-time algorithm for linear programming” in Proceedings of the 16th Annual ACM Symposium on Theory of Computing (ACM, 1984), pp. 302–311.
33. F. Atkinson, Multiparameter Eigenvalue Problems (Academic Press, New York, NY, 1972), vol. I.
34. L. Attia, M. Oliu-Barton, Shapley-Snow kernels, multiparameter eigenvalue problems and stochastic games. ArXiv:1810.08798 (22 May 2019).
35. G. Vigeral, A zero-sum stochastic game with compact action sets and no asymptotic value. Dyn. Games Appl. 3, 172–186 (2013).
36. J. Bolte, S. Gaubert, G. Vigeral, Definable zero-sum stochastic games. Math. Oper. Res. 40, 171–191 (2014).

APPLIED ECONOMIC MATHEMATICS SCIENCES