Games and Economic Behavior 104 (2017) 92–130

Continuous-time stochastic games ✩

Abraham Neyman

Institute of Mathematics and the Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904, Israel

Article history: Received 2 April 2015; available online 22 March 2017.

JEL classification: C73, C70.

Keywords: Stochastic games; Continuous-time games; Equilibrium; Uniform equilibrium.

Abstract. We study continuous-time stochastic games, with a focus on the existence of their equilibria that are insensitive to a small imprecision in the specification of players' evaluations of streams of payoffs. We show that the stationary, namely, time-independent, discounting game has a stationary equilibrium and that the discounting game and the more general game with time-separable payoffs have an epsilon equilibrium that is an epsilon equilibrium in all games with a sufficiently small perturbation of the players' valuations. A limit point of discounting valuations need not be a discounting valuation as some of the "mass" may be pushed to infinity; it is represented by an average of a discounting valuation and a mass at infinity. We show that for every such limit point there is a profile that is an epsilon equilibrium of all the discounting games with discounting valuations that are sufficiently close to the limit point.

© 2017 Elsevier Inc. All rights reserved.

1. Introduction

1.1. Motivating continuous-time stochastic games

A fundamental shortfall of economic forecasting is its inability to make deterministic forecasts. A notable example is the repeated occurrence of financial crises. Even though each financial crisis was forecasted by many economic experts, many other experts claimed just before each occurrence that "this time will be different," as documented in the book This Time Is Different: Eight Centuries of Financial Folly by Reinhart and Rogoff. On the other hand, many economic experts have forecasted financial crises that never materialized. Finally, even those who correctly predicted a financial crisis could not predict the time of its occurrence, and many of them warned that it was imminent when in fact it occurred many years later.

An analogous observation comes from sports, e.g., soccer. In observing the play of a soccer game, or knowing the qualities of both competing teams, we can recognize a clear advantage of one team over the other. Therefore, a correct forecast is that the stronger team is more likely than the other team to score the next goal. However, it is impossible to be sure which team will be the first to score the next goal, and it is impossible to forecast the time of the scoring.

In both cases, the state of the economy and the score of the game can dramatically change in a split second, and players' actions impact the likelihood of each state change.

✩ This research was supported in part by Israel Science Foundation grant 1596/10. E-mail address: [email protected]. http://dx.doi.org/10.1016/j.geb.2017.02.004

A game-theoretic model that accounts for the change of a state between different stages of the interaction, and where such change is impacted by the players' actions, is a stochastic game. However, no single deterministic-time dynamic game can capture the important common feature of these two examples: namely, the probability of a state change in any short time interval can be positive yet arbitrarily small. This feature can be analyzed by studying the asymptotic behavior in a sequence of discrete-time stochastic games, where each individual stage represents a short time interval that converges to zero and the transition probabilities to a new state also converge to zero. The limit of such a sequence of discrete-time stochastic games is a continuous-time stochastic game; see Neyman (2012, 2013).

The analysis of continuous-time stochastic games enables us to model and analyze the important properties that (1) the state of the interaction can change, (2) the stochastic law of states is impacted by players' actions, and (3) in any short time interval the probability of a discontinuous state change can be positive (depending on the players' actions), but arbitrarily small for sufficiently short time intervals.

Accounting for stochastic state changes – both gradual (continuous) and instantaneous (discontinuous) but with infinitesimal probability in any short time interval – is common in the theory of continuous-time finance; see, e.g., Merton (1992). However, in continuous-time finance theory the stochastic law of states is not impacted by agents' actions. Accounting for (mainly deterministic and more recently also stochastic) continuous state changes that are impacted by agents' actions is common in the theory of differential games; see, e.g., Isaacs (1999).

In the present paper we study the three above-mentioned properties, with a special focus on discontinuous state changes. To do this we study the continuous-time stochastic game with finitely many states, since if there are finitely many states, then any state change is discontinuous.

1.2. Difficulties with continuous-time strategies

A classical game-theoretic analysis of continuous-time games entails a few unavoidable pathologies: for some naturally defined strategies there is no distribution on the space of plays that is compatible with these strategies, and for other naturally defined strategies there are multiple distributions on the space of plays that are compatible with these strategies. In addition, what defines a strategy is questionable. In spite of these pathologies, we will describe unambiguously equilibrium payoffs and equilibrium strategies in continuous-time stochastic games.

Earlier game-theoretic studies overcome the above-mentioned pathologies by restricting the study either to strategies with inertia, or to Markov (memoryless) strategies that select an action as a function of (only) the current time and state. See, e.g., Bergin and MacLeod (1993) and Simon and Stinchcombe (1989) for the study of continuous-time supergames, Perry and Reny (1993) for the study of continuous-time bargaining, Isaacs (1999) for the study of differential games, and Zachrisson (1964), Lai and Tanaka (1982), and Guo and Hernandez-Lerma (2003, 2005) for the study of continuous-time stochastic games, termed in the above-mentioned literature Markov games, or Markov chain games. A more detailed discussion of the relation between these earlier contributions and the present paper appears in Section 7.

The common characteristic of these earlier studies is that each considers only a subset of strategies, so that a profile¹ of strategies selected from the restricted class defines a distribution over plays of the game, and thus optimality and equilibrium are well defined, but only within the restricted class of strategies. Therefore, in an equilibrium, there is no beneficial unilateral deviation within the restricted class of strategies; no optimality or no-beneficial-deviation claim is made with respect to general strategies.

The restriction to Markov (memoryless) strategies (see, e.g., Zachrisson, 1964; Rykov, 1966; Miller, 1968a, 1968b; Kakumanu, 1969, 1971, 1975; Guo and Hernandez-Lerma, 2009) is (essentially) innocuous in the study of continuous-time, discounted or finite-horizon Markov decision processes and two-person zero-sum stochastic games. The two properties of the discounted or finite-horizon payoffs that make this restriction innocuous are: (1) impatience, namely, the contribution to the payoff of the play in the distant future is negligible, and (2) time-separability of the payoff, namely, the payoff of a play on [0, ∞) is, for every s > 0, a sum of a function of the play on the time interval [0, s) and a function of the play on the time interval [s, ∞).

The Markov (memoryless) assumption is restrictive in Markov decision processes with a payoff that is not time-separable and in zero-sum stochastic games with non-impatient payoffs. The Markov (memoryless) assumption is also restrictive in the non-zero-sum game-theoretic framework. It turns out that in discounted stochastic games (with finitely many states and actions) a profile that is an equilibrium in the universe of Markov strategies is also an equilibrium in the universe of history-dependent strategies. However, the set of equilibrium strategies and equilibrium payoffs in the universe of Markov strategies is a proper subset of those in the universe of history-dependent strategies.
A fundamental shortcoming of the restriction to memoryless strategies (and to oblivious strategies, which depend only on the state process) arises in the study of approximate equilibria that are robust to small changes of the discounting valuation. For example, there is a continuous-time (and discrete-time, even two-person zero-sum, e.g., the Big Match, Blackwell and Ferguson, 1968) stochastic game for which there is no Markov strategy profile that is an approximate equilibrium in all the discounted games with a sufficiently small discounting rate.

1 I.e., a list, one for each player.

The present paper proves that every continuous-time stochastic game with finitely many states and actions has such strategy profiles.

1.3. Discrete-time stochastic games

We recall the basic properties of the discrete-time stochastic game model (see, e.g., Shapley, 1953; Neyman and Sorin, 2003; Mertens et al., 2015); this will enable us to point out its similarities and differences with the continuous-time model to be described later.

In a discrete-time stochastic game, play proceeds in stages and the stage state, the stage action profile, the previous stage, and the next stage are well defined. The stage payoff is a function g(z, a) of the stage state z and the stage action profile a, and the transitions to the next state z′ are defined by conditional probabilities p(z′ | z, a) of the next state z′ given the present state z and the stage action profile a. Players' stage-action choices are made simultaneously and are observed by all players following the stage play. Pure, mixed, and behavioral strategies are well and unambiguously defined in the discrete-time model.

If at stage m = 0, 1, ..., the state is z_m and the action profile played is a_m, the stage payoff is g_m := g(z_m, a_m). A play of the discrete-time stochastic game generates a stream of payoffs g_0, g_1, ....

A measure w on the nonnegative integers defines a payoff function u_w on streams of payoffs by u_w(g_0, g_1, ...) = Σ_{m=0}^∞ w(m) g_m. The stationary discounting corresponds to measures w of the form w(m) = cλ^m, where c is a positive constant and 0 ≤ λ < 1. A (not necessarily stationary) discounting corresponds to a measure w with 0 ≤ w(m + 1) ≤ w(m). The common normalization u_w(1, 1, ...) = 1 corresponds to Σ_{m=0}^∞ w(m) = 1 and is useful in comparing outcomes of the same game with different discountings. The measures that define the payoff function of different players need not be identical.
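To make the stationary discounting and its normalization concrete, here is a minimal sketch (our own illustration, not part of the model; the function name is hypothetical) that evaluates a truncated stream of stage payoffs under w(m) = (1 − λ)λ^m, whose weights sum to one.

```python
def discounted_value(stage_payoffs, lam):
    """Evaluate u_w(g_0, g_1, ...) = sum_m w(m) * g_m for the stationary
    discounting measure w(m) = (1 - lam) * lam**m, whose weights sum to 1
    (the common normalization u_w(1, 1, ...) = 1)."""
    return sum((1 - lam) * lam ** m * g for m, g in enumerate(stage_payoffs))

# A constant stream of stage payoffs equal to 1 is valued (up to truncation) at 1.
print(discounted_value([1.0] * 2000, lam=0.95))  # approximately 1.0
```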

1.4. Continuous-time stochastic games: payoffs

In a continuous-time game, the payoff is usually defined as an accumulation of infinitesimal payoffs. If at time t ∈ [s, s + ds) the state is z and the action profile played is a, the accumulation of the payoff in the time interval [s, s + ds) is g(z, a) ds. Therefore, if the function t → (z_t, a_t), where z_t is the state at time t and a_t is the action profile at time t, is measurable, then the accumulation of payoffs in the time interval [0, s) is ∫_0^s g_t dt, where g_t := g(z_t, a_t).

The unnormalized, respectively, the normalized, ρ-discounted payoff, ρ > 0, is ∫_0^∞ e^{−ρt} g_t dt, respectively, ρ ∫_0^∞ e^{−ρt} g_t dt, and the unnormalized, respectively, the normalized, s-horizon payoff, s > 0, is ∫_0^s g_t dt, respectively, (1/s) ∫_0^s g_t dt.

Similarly, the payoffs are also defined on the space of measurable functions t → (z_t, x_t), where z_t is the state at time t and x_t is the mixed action profile (i.e., a mixture of pure action profiles) at time t, by setting g(z, x) to be the linear extension of g(z, a) and g_t = g(z_t, x_t).

In general, the payoff in a stochastic game can be an arbitrary function of the function t → (z_t, x_t). Consider for example a continuous-time stochastic game that models a single quarter-final or semi-final in the UEFA Champions League. The rule that specifies the winner, i.e., the team that qualifies for the next stage, is as follows. A single quarter-final or semi-final is played under the knockout system, on a home-and-away basis (two legs). The team that scores the greater aggregate of goals in the two matches qualifies for the next stage. Otherwise, the team that scores more away goals qualifies for the next stage. If this procedure does not produce a result, i.e., if both teams score the same number of goals at home and away, a 30-minute period of overtime is played at the end of the second leg. If both teams score the same number of goals during overtime, away goals count double (i.e., a visiting team that scored in the overtime qualifies). If no goals are scored during overtime, kicks from the penalty mark determine which team qualifies for the next stage. For simplicity, assume that the outcome of the penalty kicks is random, e.g., each team is equally likely to win this phase. Each match lasts 90 minutes. Therefore, if the state of the stochastic game, z_t, is the cumulative score at the t-th minute, the rules provide us with a function of (z_90, z_180, z_210) that specifies the probability (1, 0, or 1/2) that team 1 wins.

In the present paper we focus on payoffs of the stochastic game that satisfy several conditions.

First, the payoff to a player, say player i, is a function u of the stream g : t → g^i_t := g^i(z_t, x_t) of his "stage payoffs."

Second, we assume that the payoff function u is linear and monotonic and is defined over the ordered linear space of all bounded measurable streams of payoffs f : t → f_t (with the order f ≥ g iff f_t ≥ g_t for all t, and the natural linear structure αf + βg : t → αf_t + βg_t). We allow the payoff functions to differ among players.

The payoff function u is normalized if u(1) = 1, where 1 is the stream f with f_t = 1 for every t ≥ 0. The payoff function u is impatient if u(1_{≥s}) → 0 as s → ∞, where 1_{≥s} is the stream f with f_t = 1 for every t ≥ s and f_t = 0 for every 0 ≤ t < s.

A payoff function u that is defined (on the space of bounded measurable streams of payoffs f) by u(f) := ∫_0^∞ f_t dw(t), where w is a measure on [0, ∞), is called a time-separable payoff and is denoted by u_w. A time-separable payoff function is linear and monotonic.
If, in addition, w is a probability measure, then u_w is normalized.
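The normalized ρ-discounted and s-horizon payoffs can be computed in closed form when the stream of stage payoffs is piecewise constant. The sketch below is our own illustration under that assumption; the function names and the example stream are hypothetical.

```python
import math

def normalized_discounted_payoff(breakpoints, values, rho):
    """rho * integral_0^inf e^{-rho t} g_t dt for a stream g_t that equals
    values[j] on [breakpoints[j], breakpoints[j+1]) and values[-1] after the
    last breakpoint; breakpoints[0] must be 0."""
    total = 0.0
    for j, g in enumerate(values):
        a = breakpoints[j]
        b = breakpoints[j + 1] if j + 1 < len(breakpoints) else math.inf
        # rho * int_a^b e^{-rho t} dt = e^{-rho a} - e^{-rho b}
        weight = math.exp(-rho * a) - (0.0 if b == math.inf else math.exp(-rho * b))
        total += g * weight
    return total

def normalized_horizon_payoff(breakpoints, values, s):
    """(1/s) * integral_0^s g_t dt for the same piecewise-constant stream."""
    total = 0.0
    for j, g in enumerate(values):
        a = breakpoints[j]
        b = breakpoints[j + 1] if j + 1 < len(breakpoints) else math.inf
        total += g * max(0.0, min(b, s) - min(a, s))
    return total / s

# A stream worth 1 on [0, 2) and 0 afterwards.
print(normalized_discounted_payoff([0.0, 2.0], [1.0, 0.0], rho=0.5))  # 1 - e^{-1}
print(normalized_horizon_payoff([0.0, 2.0], [1.0, 0.0], s=10.0))      # 0.2
```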

In the other direction, any linear and monotonic payoff function u that is impatient defines a unique measure w on [0, ∞) such that for any bounded continuous stream of payoffs f we have u(f) = u_w(f) := ∫_0^∞ f_t dw(t).

An important special class of time-separable payoff functions u, called impatient valuations, is defined below.

For a given measurable subset B of R_+ = [0, ∞), we denote by 1_B the indicator function of B (i.e., the stream of payoffs f such that f_t = 1 if t ∈ B and f_t = 0 if t ∉ B). An impatient valuation is a time-separable payoff function u that is normalized and such that for every measurable subset B of R_+ we have u(1_B) ≥ u(1_{c+B}), where c ≥ 0 and c + B := {c + b | b ∈ B}. The last condition reflects the preference for advancing the time of positive payoffs. The time-separable payoff function that is defined by a measure w on R_+ is an impatient valuation if and only if w is a probability measure and w([b, b + c)) is, for every fixed c > 0, nonincreasing in b ≥ 0.

An approximate description of a normalized, linear, monotonic, and impatient payoff function u is obtained by specifying, within a small positive error term, its value on finitely many continuous streams of payoffs with bounded support. Therefore, we say that a sequence u_k of normalized time-separable payoff functions converges if the sequence u_k(f) converges for every continuous function f with bounded support. Examples of converging time-separable payoff functions include: the normalized ρ-discounted payoff as ρ goes to 0, the normalized s-horizon payoff as s goes to infinity, and the sequence of payoff functions u_k that are defined by the Dirac measures on a sequence of points that converge to some fixed point x_0.

Let u_k = u_{w_k} be a converging sequence of normalized time-separable payoff functions. Then the limit u(f) := lim_{k→∞} u_k(f) exists for every continuous function f that is defined on [0, ∞]. (Note that a continuous function f on [0, ∞] is identified with a continuous function f on [0, ∞) for which the limit of f_t, as t goes to infinity, exists.) The limit u, which is defined on the space of all continuous functions on [0, ∞], is linear, monotonic, and normalized. However, it need not coincide with a time-separable payoff function u_w, as some of the corresponding w_k-mass may be pushed to infinity. The restriction of the limit u to the space of continuous functions f on [0, ∞] is represented by a (unique) probability measure w on [0, ∞]:

lim_{k→∞} u_k(f) = lim_{k→∞} ∫_0^∞ f_t dw_k(t) = ∫_{[0,∞]} f_t dw(t).

If the u_k are impatient valuations then the limiting probability measure w, which may have an atom at ∞ and at 0, is absolutely continuous² on (0, ∞) and dw/dt is nonincreasing on (0, ∞); equivalently, w is a probability measure on [0, ∞] with w([b, b + c)) being, for every fixed c > 0, nonincreasing in b ≥ 0.

Therefore, in studying approximate optimization that is insensitive to a small imprecision in the specification of an impatient valuation, it is useful to enlarge the space of measures on [0, ∞) to the space W of all measures on [0, ∞].

For a given measure w ∈ W, the w-payoff is the integral ∫_{[0,∞]} g_t dw(t), where g_∞ = lim_{s→∞} (1/s) ∫_0^s g_t dt if the limit exists. The definition of g_∞ is motivated by the characterization of valuations that follows.

A payoff function u satisfies the time value of money principle if

u(f) ≥ u(g) whenever f_0 ≥ g_0 and ∫_0^s f_t dt ≥ ∫_0^s g_t dt for all s > 0.

The time value of money principle reflects the preference for expediting the time of positive payoffs: the faster the accumulation of payoffs, the better. The principle states that a unit of payoffs in a given time period is preferable to its being spread over future time periods. Note that a payoff function that satisfies the time value of money principle is monotonic.

A valuation is a normalized linear payoff function u that satisfies the time value of money principle. A patient valuation is a valuation u such that u(1_{≥s}) = 1 for all s ≥ 0. One can show that a payoff function u is a patient valuation if and only if it is a linear function over the bounded streams of payoffs such that

liminf_{s→∞} (1/s) ∫_0^s f_t dt ≤ u(f) ≤ limsup_{s→∞} (1/s) ∫_0^s f_t dt,

and that any valuation is a mixture of an impatient valuation and a patient one.

2 A finite measure μ on Borel subsets of the real line is absolutely continuous if for every positive number ε > 0 there is a positive number δ > 0 such that μ(A) < ε for all Borel sets A of Lebesgue measure less than δ. If μ is absolutely continuous then there exists a Lebesgue integrable function g, denoted by dμ/dt, on the real line such that μ(A) = ∫_A g dt for all Borel subsets A of the real line.

1.5. Continuous-time stochastic games: transitions

The transitions of states are described by nonnegative real-valued transition rates μ(z′, z, a), defined for all triples (z′, z, a) of two distinct states z′ ≠ z and an action profile a.

The nonnegative real number μ(z′, z, a) describes the rate of transition from state z to state z′ when action profile a is played at state z. If the state at time t is z_t = z and the action profile at all times t ≤ s < t + δ is a, then the probability that the state z_{t+δ} = z′ ≠ z is approximately δμ(z′, z, a) for small δ; explicitly, for two distinct states z, z′ and a time t, conditional on the history of the play up to time t, z_t = z, and a_s = a for all t ≤ s < t + δ, we have P(z_{t+δ} = z′)/δ → μ(z′, z, a) as δ → 0+.

The actions in a continuous-time game can depend on time. Therefore, one has to define the transitions resulting from time-dependent actions. Given a measurable time-dependent action profile a_t, t ≥ 0, and two distinct states z′ ≠ z, the transitions obey

P(z_{t+δ} = z′ | z_t = z) = ∫_t^{t+δ} μ(z′, z, a_s) ds + o(δ)  as δ → 0+.
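For a fixed action profile a the state process is a continuous-time Markov chain whose generator Q(a) has off-diagonal entries Q[z, z′] = μ(z′, z, a) and diagonal entries Q[z, z] = −Σ_{z′≠z} μ(z′, z, a), and its δ-step transition matrix is exp(δ Q(a)). The sketch below (our own numerical illustration; the rate numbers are invented) checks the first-order approximation P(z_{t+δ} = z′ | z_t = z) ≈ δ μ(z′, z, a).

```python
import numpy as np

def generator(rates):
    """Build the generator Q from nonnegative off-diagonal rates mu(z', z, a):
    Q[z, z'] = mu(z', z, a) for z' != z and Q[z, z] = -sum of the other row entries."""
    Q = np.array(rates, dtype=float)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

def transition_matrix(Q, delta, terms=30):
    """exp(delta * Q) via a truncated power series (adequate for small delta)."""
    P, term = np.eye(len(Q)), np.eye(len(Q))
    for n in range(1, terms):
        term = term @ (delta * Q) / n
        P += term
    return P

# Two states; under the fixed action profile, mu(1, 0, a) = 2 and mu(0, 1, a) = 3.
Q = generator([[0.0, 2.0], [3.0, 0.0]])
delta = 1e-3
print(transition_matrix(Q, delta))   # close to I + delta * Q
print(np.eye(2) + delta * Q)
```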

1.6. Continuous-time stochastic games: strategies

The classical definition of a pure strategy in a discrete-time game is a local definition. It defines, for each stage m, the selected action at stage m as a function of the information derived from (signals from) players' and nature's moves. In a stochastic game with observable states and actions, a strategy specifies the action at the discrete stage m as a function of the sequence of past states and actions of other players and of the current state. An equivalent definition is the global one that specifies the sequence of actions of the player as a function of the entire sequence of states and actions of other players, but with the obvious no-looking-ahead property. The no-looking-ahead property, termed also no anticipation, asserts that the specified action at stage t depends only on players' past observable actions and past observable chance moves.

The equivalence of the two definitions follows from the fact that the set of decision times is well ordered; see Section 2.2.1. The set of times t ≥ 0 is not well ordered. Therefore, there are naturally defined local strategies that do not integrate into a global one; see Section 2.2.1. Therefore, one reverts to the global form of a strategy.

A full pure strategy of player i is a function σ^i from plays – functions h : t → (z_t, a_t), t ≥ 0 – to functions t → σ^i_t(h) (with proper measurability assumptions), with the no-anticipation property. Explicitly, if h = (z_t, a_t)_{t≥0} and h′ = (z′_t, b_t)_{t≥0} are two plays with (z_t, a_t) = (z′_t, b_t) for t ≤ s, then σ^i_t(h) = σ^i_t(h′) on t ≤ s.

Full pure strategies are important in the study of perfect equilibria. Our focus in the present paper is on equilibria. Therefore, we confine attention to reduced strategies, namely, strategies that define the action to be taken only on histories that are compatible with the strategy.

A reduced pure strategy, henceforth a pure strategy, of player i is a function σ^i from plays – functions h : t → (z_t, a_t), t ≥ 0 – to functions t → σ^i_t(h) (with proper measurability assumptions), with the no-anticipation property and independent of its own past actions. Explicitly, if h = (z_t, a_t)_{t≥0} and h′ = (z′_t, b_t)_{t≥0} are two plays with (z_t, a^{−i}_t) = (z′_t, b^{−i}_t) for t ≤ s, then σ^i_t(h) = σ^i_t(h′) on t ≤ s.

The assumption that the (reduced pure) strategy choice of player i is independent of player i's past actions is conceptually innocuous, as the strategy itself and other players' past actions define player i's past actions. However, the independence of σ^i_t(h) of at least player i's very recent past own actions is required for technical reasons, as otherwise, even in the one-person game, there are strategies that do not define a play; see Section 2.2.1.

The reversion to global strategies does not phase out all pathologies, even in the special case of continuous-time supergames. Indeed, there are ("naturally" defined global) pure strategy profiles σ, e.g., in the continuous-time game, for which there is no play h such that σ(h) = h, and there are ("naturally" defined global) pure strategy profiles σ in continuous-time supergames for which there is more than one play h with σ(h) = h (see Proposition 1 in Section 2.2.1). We illustrate the above comment by explicit examples of such strategy profiles. This illustration may serve also, in particular, as examples of general strategies.

Assume that each player has two actions, labeled 0 and 1.
Define the global strategy σ¹ of player 1 by σ¹((a²_t)_{t≥0}) = (a¹_t)_{t≥0}, where a¹_0 = 1 and, for t > 0, a¹_t = 1 if there is a sequence t > t_n ↓ 0 such that a²_{t_n} = 0, and a¹_t = 0 otherwise. Note that there is a sequence t > t_n ↓ 0 such that a²_{t_n} = 0 if and only if there is a sequence t_n ↓ 0 such that a²_{t_n} = 0.

Define the global strategy σ² of player 2 by σ²((a¹_t)_{t≥0}) = (a²_t)_{t≥0}, where a²_0 = 1 and, for t > 0, a²_t = 1 if there is a sequence t > t_n ↓ 0 such that a¹_{t_n} = 1, and a²_t = 0 otherwise.

Assume that σ((a_t)_{t≥0}) = (a_t)_{t≥0}, where σ = (σ¹, σ²). If there is a sequence t_n ↓ 0 such that a²_{t_n} = 0, then, by the definition of σ¹, a¹_t = 1 for every t ≥ 0, and therefore, by the definition of σ², a²_t = 1 for every t ≥ 0, contradicting the existence of a sequence t_n ↓ 0 such that a²_{t_n} = 0. If there is no sequence t_n ↓ 0 such that a²_{t_n} = 0, then, by the definition of σ¹, a¹_t = 0 for every t > 0, and therefore, by the definition of σ², a²_t = 0 for every t > 0, contradicting the nonexistence of a sequence t_n ↓ 0 such that a²_{t_n} = 0. Therefore, there is no play (a_t)_{t≥0} such that σ((a_t)_{t≥0}) = (a_t)_{t≥0}.

Define the global strategy τ² of player 2 by τ²((a¹_t)_{t≥0}) = (a²_t)_{t≥0}, where a²_0 = 1 and, for t > 0, a²_t = 0 if there is a sequence t > t_n ↓ 0 such that a¹_{t_n} = 1, and a²_t = 1 otherwise. Note that the two plays (a_t)_{t≥0} where a_0 = (1, 1) and either a_t = (1, 0) for all t > 0 or a_t = (0, 1) for all t > 0 are compatible with the strategy pair (σ¹, τ²).

Therefore, there is no proper strategic normal form defined over all strategies. However, equilibrium is unambiguously defined as a profile of strategies with no unilateral beneficial deviation, as formally defined below.

1.7. Continuous-time stochastic games: equilibria

A strategy profile σ = (σ^i)_{i∈N} (in a continuous-time stochastic game with player set N) is called admissible if for every player i and every strategy τ^i (recall that τ^i_t is independent of its own past actions), the strategy profile (σ^{−i}, τ^i) (where σ^{−i} = (σ^j)_{j≠i}) defines a unique distribution P_{σ^{−i},τ^i} on plays h = (z_t, x_t)_{t≥0} with right-continuous state function [0, ∞) ∋ t → z_t.

The uniqueness applies only to the space of plays with right-continuous state functions. Therefore we henceforth assume that the state function of a play is right continuous.

An example of an admissible strategy profile in a continuous-time stochastic game is a profile of pure stationary strategies. An example of a strategy profile that is not admissible is (σ¹, 0), where σ¹ is the strategy of player 1 that is defined in the previous section and 0 is the strategy of player 2 that always plays 0. The play with a_t = (1, 0) for every t is the only one that is compatible with (σ¹, 0). However, if player 2 deviates to the strategy σ² (of the previous section) then there is no play that is compatible with (σ¹, σ²). Similarly, for any strategy η¹ of player 1, (η¹, σ²) is not an admissible strategy profile.

An ε-equilibrium is an admissible strategy profile for which no single player can benefit by more than ε from a unilateral deviation. An equilibrium is a 0-equilibrium.

For example, if w⃗ = (w^i)_{i∈N} is a profile of measures on [0, ∞], then a strategy profile σ is an ε-equilibrium of Γ_{w⃗} if σ is admissible and for every player i and every strategy τ^i of player i, we have

γ^i_{w⃗}(z, σ) ≥ γ̄^i_{w⃗}(z, σ^{−i}, τ^i) − ε,    (1)

where

γ^i_{w⃗}(z, σ) := E^z_{P_σ} ∫_{[0,∞]} g^i_t dw^i(t),   γ̄^i_{w⃗}(z, σ^{−i}, τ^i) := E^z_{P_{σ^{−i},τ^i}} ∫_{[0,∞]} ḡ^i_t dw^i(t),

g^i_t = ḡ^i_t := g^i(z_t, x_t) for t < ∞,

g^i_∞ := liminf_{s→∞} (1/s) ∫_0^s g^i_t dt, and ḡ^i_∞ := limsup_{s→∞} (1/s) ∫_0^s g^i_t dt.
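As a numerical illustration of the w-payoff along a single play (our own sketch, not the paper's; the function name, the measure, and the play are hypothetical), the following evaluates ∫_{[0,∞]} g_t dw(t) for a measure that mixes an exponential density with an atom at infinity, using a long but finite horizon as a stand-in for g_∞.

```python
import math

def w_payoff_of_play(g, density, mass_at_infinity, horizon=200.0, n=20000):
    """Numerically evaluate int_{[0,inf]} g_t dw(t) along one play, where w has
    density `density` on [0, inf) and an atom of size `mass_at_infinity` at
    infinity.  The atom is paid the long-run average (1/s) int_0^s g_t dt,
    approximated at s = horizon (this assumes the limit exists, so that the
    liminf and limsup defining g_inf coincide)."""
    dt = horizon / n
    ts = [(k + 0.5) * dt for k in range(n)]
    finite_part = sum(g(t) * density(t) for t in ts) * dt
    long_run_average = sum(g(t) for t in ts) * dt / horizon
    return finite_part + mass_at_infinity * long_run_average

# w = 0.5 * Exp(1) + 0.5 * (atom at infinity); the play pays 1 on [0, 2), then 0.
g = lambda t: 1.0 if t < 2.0 else 0.0
print(w_payoff_of_play(g, density=lambda t: 0.5 * math.exp(-t), mass_at_infinity=0.5))
# approximately 0.5 * (1 - e^{-2}) + 0.5 * 0 = 0.43
```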

Obviously, if w^i is supported on [0, ∞), as in the case of stationary discounting or finite-horizon games, then γ^i_{w⃗}(z, σ) = γ̄^i_{w⃗}(z, σ), and then we write γ^i_{w⃗}(z, σ) for short.

In the remainder of this section we introduce the continuous-time equilibrium conditions, which are analogous to those in the discrete-time model, for three special cases: the time-independent discounting game, the finite-horizon game, and the limiting-average game. We show that each one of these equilibrium conditions corresponds to inequality (1) for a suitable profile of measures w⃗.

Fix a profile ρ = (ρ_i)_{i∈N} of discounting rates. The strategy profile σ is an ε-equilibrium, ε ≥ 0, of the normalized ρ-discounted game if σ is an admissible strategy profile and for every state z, every player i, and every strategy τ^i of player i, we have

E^z_σ ∫_0^∞ g^i_t ρ_i e^{−ρ_i t} dt ≥ E^z_{σ^{−i},τ^i} ∫_0^∞ g^i_t ρ_i e^{−ρ_i t} dt − ε.

This inequality corresponds to inequality (1) with the profile w⃗ = (w^i)_{i∈N} of measures where w^i([t, ∞)) = e^{−ρ_i t} or, equivalently, where w^i is the probability measure on [0, ∞) with density ρ_i e^{−ρ_i t}. Indeed, the left-hand side of the above inequality corresponds to γ^i_{w⃗}(z, σ) and the right-hand side of the above inequality corresponds to γ^i_{w⃗}(z, σ^{−i}, τ^i).

Fix a finite horizon s > 0. The strategy profile σ is an ε-equilibrium, ε ≥ 0, of the normalized s-horizon game if σ is an admissible strategy profile and for every state z, every player i, and every strategy τ^i of player i, we have

γ^i_s(z, σ) := E^z_σ (1/s) ∫_0^s g^i_t dt ≥ E^z_{σ^{−i},τ^i} (1/s) ∫_0^s g^i_t dt − ε.

This inequality corresponds to inequality (1) with the profile w⃗ = (w^i)_{i∈N} of measures where w^i is the uniform probability distribution on [0, s], namely, where w^i is the probability measure on [0, s] with density 1/s. Indeed, for this profile of measures the left-hand side of the above inequality corresponds to γ^i_{w⃗}(z, σ) and the right-hand side of the above inequality corresponds to γ^i_{w⃗}(z, σ^{−i}, τ^i).

The strategy profile σ is an ε-equilibrium, ε ≥ 0, of the limiting-average game if σ is an admissible strategy profile and for every state z, every player i, and every strategy τ^i of player i, we have

E^z_σ liminf_{s→∞} (1/s) ∫_0^s g^i_t dt ≥ E^z_{σ^{−i},τ^i} limsup_{s→∞} (1/s) ∫_0^s g^i_t dt − ε.

This inequality corresponds to inequality (1) with the profile w⃗ = (w^i)_{i∈N} of probability measures on [0, ∞] where w^i(∞) = 1. Indeed, for this profile of probability measures the left-hand side of the above inequality corresponds to γ^i_{w⃗}(z, σ) and the right-hand side of the above inequality corresponds to γ̄^i_{w⃗}(z, σ^{−i}, τ^i).

1.8. Uniform equilibrium

In this section we define a uniform equilibrium payoff of a continuous-time stochastic game. The definition is the natural concept that is analogous to the much-studied uniform equilibrium payoffs of discrete-time stochastic games.

A uniform ε-equilibrium payoff of the non-zero-sum continuous-time stochastic game is a vector u = (u^i(z))_{i∈N,z∈S} ∈ R^{S×N} such that there is an admissible strategy profile σ_ε and a sufficiently long finite horizon s_ε such that for every initial state z, every player i ∈ N, every s ≥ s_ε, and every strategy σ^i of player i, we have

γ^i_s(z, σ_ε^{−i}, σ^i) − ε ≤ u^i(z) ≤ γ^i_s(z, σ_ε) + ε.

A uniform equilibrium payoff of the non-zero-sum continuous-time stochastic game is a vector u = (u^i(z))_{i∈N,z∈S} ∈ R^{S×N} that is a uniform ε-equilibrium payoff for every ε > 0.

A uniform ε-equilibrium strategy profile (or equilibrium strategy, for short) of the non-zero-sum continuous-time stochastic game is an admissible strategy profile σ_ε such that there is a sufficiently long finite horizon s_ε such that for every initial state z, every player i ∈ N, every s ≥ s_ε, and every strategy σ^i of player i, we have

γ^i_s(z, σ_ε^{−i}, σ^i) − ε ≤ u^i(z) ≤ γ^i_s(z, σ_ε) + ε,

where u is a uniform equilibrium payoff.

It is important to note that a strategy profile σ_ε that is an ε-equilibrium strategy in all sufficiently long finite-horizon games – for which there is a sufficiently long finite horizon s_ε such that for every initial state z, every player i ∈ N, every s ≥ s_ε, and every strategy σ^i of player i, we have γ^i_s(z, σ_ε^{−i}, σ^i) − ε ≤ γ^i_s(z, σ_ε) – need not be a uniform 2ε-equilibrium strategy. The reason is that the payoff γ^i_s(z, σ_ε) need not converge as the horizon s goes to ∞.

The uniform equilibrium and the limiting-average equilibrium concepts are both tailored for the long-horizon undiscounted games. However, they are two different concepts. A uniform ε-equilibrium strategy need not be a limiting-average 2ε-equilibrium strategy, and, vice versa, a limiting-average ε-equilibrium strategy need not be a uniform 2ε-equilibrium strategy.

A third equilibrium concept, which extends simultaneously both of the above, is the uniform-limiting-average, or uniform-l-average for short. A uniform-l-average ε-equilibrium payoff of the non-zero-sum continuous-time stochastic game is a vector u = (u^i(z))_{i∈N,z∈S} ∈ R^{S×N} such that there is an admissible strategy profile σ_ε and a sufficiently long finite horizon s_ε such that for every initial state z, every player i ∈ N, every s ≥ s_ε, and every strategy τ^i of player i, we have

γ^i_s(z, σ_ε^{−i}, τ^i) − ε ≤ u^i(z) ≤ γ^i_s(z, σ_ε) + ε and γ̄^i(z, σ_ε^{−i}, τ^i) − ε ≤ u^i(z) ≤ γ^i(z, σ_ε) + ε.

A uniform-l-average ε-equilibrium strategy of the non-zero-sum continuous-time stochastic game is an admissible strategy profile σ_ε for which there is a sufficiently long finite horizon s_ε such that for every initial state z, every player i ∈ N, every s ≥ s_ε, and every strategy τ^i of player i, we have

γ^i_s(z, σ_ε^{−i}, τ^i) − ε ≤ γ^i(z, σ_ε) ≤ γ̄^i(z, σ_ε) ≤ γ^i_s(z, σ_ε) + ε and γ̄^i(z, σ_ε^{−i}, τ^i) − ε ≤ γ^i(z, σ_ε).

A uniform-l-average equilibrium payoff of the non-zero-sum continuous-time stochastic game is a vector u = (u^i(z))_{i∈N,z∈S} ∈ R^{S×N} that is a uniform-l-average ε-equilibrium payoff for every ε > 0. Obviously, a uniform-l-average equilibrium payoff is both a uniform equilibrium payoff and a limiting-average equilibrium payoff.

1.9. Robust equilibria

Our main objective is the study of approximate³ equilibria (of discounted continuous-time stochastic games) that are insensitive to a small imprecision in the specification of players' evaluations of streams of payoffs. The interest in such a study stems from the fact that in most real-life interactions, the exact duration and/or the exact discounting rate is unknown (and definitely not commonly known). Moreover, equilibrium analysis of game models whose duration is known, but not commonly known, to all players leads to results that are completely different from those of the model with a fixed finite duration; see Neyman (1999).

For each continuous-time stochastic game Γ, we wish to identify a large family of sequences of profiles w⃗_k of measures for which the following condition holds. For every ε > 0 there is a pair consisting of a strategy profile σ and a vector payoff v such that for k sufficiently large σ is an ε-equilibrium of Γ_{w⃗_k} with a payoff within ε of v. If the condition holds for a sequence of profiles w⃗_k we say that Γ has (w⃗_k)-robust equilibria.

The sequence of profiles w⃗_k of measures might be interpreted as an attempt to construct "better and better" approximations to a desired profile of measures, or to a small imprecision in the profile of measures that define the time-separable payoffs of the players. The "better and better" approximations are subject to the usual caveats of converging sequences. The convergence can be with respect to a metric or a topology. Note that the smaller the metric, respectively, the weaker the topology, the larger the set of converging sequences of profiles of measures.

Before turning to our definition of converging sequences and the corresponding concept of robust equilibrium, we introduce a simple example of robustness.

Let w⃗_k be a sequence of profiles of measures on [0, ∞) such that w^i_k is, for every i, a Cauchy sequence in the norm distance.⁴ Note that the sequence of payoffs γ^i_{w⃗_k}(z, σ) converges uniformly in σ, and it is easy to prove that for any profile of measures w⃗ on [0, ∞) and ε > 0, a continuous-time stochastic game (with finitely many states and actions) has an ε-equilibrium. Therefore, for sufficiently large k, an ε-equilibrium σ of Γ_{w⃗_k} is, for every l ≥ k, a 2ε-equilibrium of Γ_{w⃗_l} with a payoff within ε of γ_{w⃗_k}(z, σ). Therefore Γ has (w⃗_k)-robust equilibria.

We continue with the introduction of a few examples of sequences of profiles of measures. Let us state upfront that our results prove that a continuous-time stochastic game with finitely many states and actions has (w⃗_k)-robust equilibria for each one of the sequences (w⃗_k) that appear in Examples 1–5 below.

Example 1. Fix a profile w⃗ of probability measures that are supported on [0, ∞). Consider the family of all sequences of profiles of probability measures w⃗_k such that w⃗_k([0, t)) →_{k→∞} w⃗([0, t)) for every point t with w⃗({t}) = 0.

Example 2. The sequence of profiles w⃗_k of measures, where w^i_k is the uniform probability measure on [0, k]. Note that for this sequence of profiles of measures, the statement that Γ has a uniform equilibrium payoff is equivalent to the statement that Γ has (w⃗_k)-robust equilibria.

Example 3. The family of sequences of profiles w⃗_{ρ_k} of measures, with ρ^i_k ↓ 0 for every i, where w^i_{ρ_k}([t, ∞)) = e^{−ρ^i_k t}.

Example 4. The family of sequences of profiles w⃗_k of measures such that, for every i, w^i_k([a + c, b + c)) is nonincreasing in c (for all 0 ≤ a < b < ∞) and w^i_k([0, 1)) →_{k→∞} 0.

Note that each one of the sequences of the profiles of measures in Examples 2 and 3 is in the family of sequences described in Example 4.

3 It is impossible to find a strategy profile (or a vector payoff) that is an exact equilibrium (or an exact equilibrium payoff) in all sufficiently small changes in the payoff valuations.
4 The norm distance between two probability measures P and Q on a measurable space (X, 𝒳) is defined by ‖P − Q‖ := 2 sup_{B∈𝒳} |P(B) − Q(B)|. If X is finite (or countable) and 𝒳 consists of all subsets of X, then ‖P − Q‖ = Σ_{x∈X} |P(x) − Q(x)|.

Example 5. The family of sequences of profiles w⃗_k of measures that are supported on [0, ∞) and such that, for every i, w^i_k([a + c, b + c)) is nonincreasing in c (for all 0 ≤ a < b < ∞) and w^i_k([t, ∞)) →_{k→∞} θ^i e^{−ρ^i t}, where ρ = (ρ^i)_{i∈N} is a fixed profile of nonnegative numbers ρ^i ≥ 0 and θ = (θ^i)_{i∈N} is a fixed profile with 0 ≤ θ^i ≤ 1.

Note that each one of the sequences of the profiles of measures in Example 3 is in the family of sequences described in Example 5.

Our robustness results will require that the sequence w⃗_k w*-converge; i.e., for every continuous function g : [0, ∞] → R the integrals ∫_0^∞ g(t) dw^i_k(t) converge as k → ∞.

In a w*-converging sequence of measures w^i_k on [0, ∞), part of the mass of w^i_k may be pushed to infinity. Therefore, there need not be a profile w⃗ of measures on [0, ∞) that represents the limit. The limit is represented by a profile w⃗ of measures on [0, ∞] such that for every player i and every continuous function g : [0, ∞] → R, we have ∫_0^∞ g(t) dw^i_k(t) →_{k→∞} ∫_{[0,∞]} g(t) dw^i(t).

We equip the space of measures W on [0, ∞] with the minimal topology (called the w*-topology) such that for every continuous function [0, ∞] ∋ t → f_t, the function W ∋ w → ∫_{[0,∞]} f_t dw(t) is continuous. W⃗ := W^N is equipped with the product topology.

Given w⃗ ∈ W⃗, we say that Γ has w⃗-robust equilibria if there is a vector payoff v (called a w⃗-time-separable-robust equilibrium payoff) such that for every ε > 0 there is a strategy profile σ (called a w⃗-robust ε-equilibrium strategy) and a neighborhood U of w⃗, such that for every u⃗ ∈ U, σ is an ε-equilibrium of Γ_{u⃗} with a payoff within ε of v.

A discounting measure is a measure w ∈ W with w([a + c, b + c)) nonincreasing in c ≥ 0 for every 0 ≤ a < b < ∞. A discounting measure w is absolutely continuous on (0, ∞) with dw/dt(t) nonincreasing. It may have atoms at 0 and ∞. Small values of w([0, 1])/w([0, ∞]) correspond to valuations of patient players. The space of all discounting measures is denoted by W_d.

Given w⃗ ∈ W⃗_d := (W_d)^N, we say that Γ has w⃗-discounting-robust equilibria if there is a vector payoff v (called a w⃗-discounting-robust equilibrium payoff) such that for every ε > 0 there is a strategy profile σ (called a w⃗-discounting-robust ε-equilibrium (strategy)) and a neighborhood U of w⃗, such that for every u⃗ ∈ U ∩ W⃗_d, σ is an ε-equilibrium of Γ_{u⃗} with a payoff within ε of v.

The continuous-time stochastic game Γ has discounting-robust equilibria if for every w⃗ ∈ W⃗_d, Γ has w⃗-discounting-robust equilibria.
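A concrete instance of mass being pushed to infinity (a numerical sketch we add for illustration; the test function g and the function name are our own choices) is Example 3 with a single player: for w_ρ([t, ∞)) = e^{−ρt}, the integrals ∫ g(t) dw_ρ(t) converge, as ρ ↓ 0, to g(∞) for every continuous g on [0, ∞], so the w*-limit places all of its mass at infinity.

```python
import math

def exp_discounted_value(g, rho, n=20000, u_max=40.0):
    """int_0^inf g(t) * rho * e^{-rho t} dt, computed after the substitution
    u = rho * t as int_0^inf g(u / rho) * e^{-u} du (midpoint rule on [0, u_max])."""
    du = u_max / n
    return sum(g((k + 0.5) * du / rho) * math.exp(-(k + 0.5) * du) for k in range(n)) * du

# g is continuous on [0, infinity] with g(0) = 0 and limit g(infinity) = 1.
g = lambda t: t / (1.0 + t)
for rho in (1.0, 0.1, 0.01, 0.001):
    print(rho, exp_discounted_value(g, rho))  # tends to g(infinity) = 1 as rho -> 0
```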

1.10. The results

The main result is that every continuous-time stochastic game with finitely many states and actions has discounting-robust equilibria.

The set W⃗_d¹ of all profiles of probability discounting measures is infinite. However, as it is a compact subset of W⃗, the existence of discounting-robust equilibria implies (and is equivalent to) the following finiteness result.

For every ε > 0, there is a finite open cover (U_α)_{α∈A} of W⃗_d¹, a finite list of strategy profiles (σ_α)_{α∈A}, and a finite list of vector payoffs (v_α)_{α∈A}, such that for every profile of discounting measures w⃗ in U_α, σ_α is an ε-equilibrium of Γ_{w⃗} with a payoff within ε of v_α.

The discounting measure can vary among players. Therefore the main result applies also to cases where some of the players are extremely patient (e.g., a government) and other players' patience can vary (e.g., as a function of each player's uncertain life horizon).

The discounting measure of player i, w^i, may have a positive mass at infinity, representing the weight of the discounting measure on the distant future. If both w^i([0, ∞)) and w^i({∞}) are positive, the discounting measure w^i represents a valuation that is an average of a classical (not necessarily stationary) discounting valuation and a valuation of an extremely patient player.

The existence of w⃗-discounting-robust equilibria, where each discounting measure w^i is supported on {∞}, implies the existence of a uniform-l-average equilibrium payoff. The existence of a uniform (or a limiting-average) equilibrium payoff in any continuous-time stochastic game with finitely many⁵ states and actions is in sharp contrast to our knowledge about uniform equilibrium in discrete-time stochastic games, where it is still unknown whether all discrete-time stochastic games with finitely many states and actions have a uniform (or a limiting-average) equilibrium payoff.

A continuous-time stochastic game with finitely many states and actions need not have a w⃗-robust equilibrium payoff for every w⃗ ∈ W⃗. However, if the profile of measures w⃗ ∈ W⃗ is supported on [0, ∞), then it has a w⃗-robust equilibrium.

Earlier studies of continuous-time stochastic games include Zachrisson (1964), Lai and Tanaka (1982), and Guo and Hernandez-Lerma (2003, 2005). An important difference between the continuous-time game model of the present paper and those of the other papers is that in the present paper we consider all strategies in the continuous-time game, while each one of the other papers confines the strategy space to a subclass of all strategies.

5 The assumption of finitely many states is essential even in the case of a single player; see, e.g., Neyman (2003b, Section 1.4). The assumption of finitely many actions is essential even in the two-person zero-sum case with finitely many states; see Vigeral (2012).

On the other hand, the present paper focuses on the model with finitely many states and actions, while the earlier papers include also more general action and state spaces. Below, we briefly mention the contributions of the above-mentioned papers to continuous-time stochastic games with finitely many states and actions.

Zachrisson (1964) studies two-person zero-sum finite-horizon stochastic games with a terminal payoff, where players are confined to Markov strategies, and proves the existence of the value and optimal strategies.

Lai and Tanaka (1982) study non-zero-sum discounted games, where players are confined to Markov strategies, and prove the existence of stationary equilibria.

Guo and Hernandez-Lerma (2003) study a subclass of two-person zero-sum stochastic games with the average payoff criterion, where players are confined to continuous Markov strategies (see Section 2.1 for the definition), and prove the existence of the value and optimal strategies.

Guo and Hernandez-Lerma (2005) study two-person non-zero-sum discounted stochastic games, where players are confined to continuous⁶ Markov strategies, and prove the existence of stationary equilibria.

For completeness, we derive the existence of stationary equilibrium in the continuous-time non-zero-sum discounted stochastic game (without the restriction to continuous Markov strategies and, more generally, without the restriction to (memoryless) Markov strategies).

2. The model

A continuous-time stochastic game (with finitely many states and actions) is defined by: a finite set of players N; a finite set of states S; for each z ∈ S and each player i ∈ N a finite set of actions A^i(z); a (vector-valued) payoff function g : A → R^N, where A = {(z, a) : a ∈ A(z)} and A(z) = ×_{i∈N} A^i(z); and for each z′ ≠ z ∈ S and a ∈ A(z) a real-valued transition rate μ(z′, z, a) ≥ 0. The i-th component of g is denoted by g^i.

For notational convenience we set A^i = ∪_{z∈S} A^i(z) and A = ×_{i∈N} A^i. The i-th coordinate, i ∈ N, of a ∈ A(z) is denoted by a^i. The set A^i(z) represents the set of feasible actions of player i when the state is z. An element a ∈ A(z), a = (a^i)_{i∈N} with a^i ∈ A^i(z), is called an action profile.

The interpretation of the payoff function and of the transition rates is that when the state is z ∈ S and players play the action profile a ∈ A(z) during the infinitesimal time dt, then the payoff to player i is g^i(z, a) dt and the state moves to state z′ ≠ z with probability μ(z′, z, a) dt and stays in state z with probability 1 + μ(z, z, a) dt, where μ(z, z, a) := −Σ_{z′≠z} μ(z′, z, a).

A pure play of the continuous-time stochastic game is a measurable function h : [0, ∞) → S × A, t → h(t) = (z_t, a_t), with a_t ∈ A(z_t), and t → z_t right continuous. A pure play h is identified with a pair of functions h_S : [0, ∞) → S, which is the restriction of h to the first coordinate, and h_A : [0, ∞) → A, which is the restriction of h to the second coordinate. Given a pure play h we define h_t as the restriction of the first coordinate of h to the time interval [0, t] and the restriction of the second coordinate to [0, t); h_t is identified with the pair (h^S_{[0,t]}, h^A_{[0,t)}), where h^S_{[0,t]} is the restriction of h_S to the time interval [0, t] and h^A_{[0,t)} is the restriction of h_A to the time interval [0, t).

The unnormalized, respectively, normalized, ρ-discounted payoff (ρ > 0) of a pure play h is ∫_0^∞ e^{−ρt} g(z_t, a_t) dt, respectively, ρ ∫_0^∞ e^{−ρt} g(z_t, a_t) dt.⁷ The s-horizon normalized payoff of a pure play h is (1/s) ∫_0^s g(z_t, a_t) dt.

A play of the continuous-time stochastic game is a (measurable) function h : [0, ∞) → S × Δ(A), t → h(t) = (z_t, x_t), with x_t ∈ Δ(A(z_t)), where Δ(∗) denotes the set of all probability distributions on the set ∗. Given a play h we define h_t as we did for the pure play, i.e., as the restriction of the first coordinate of h to the time interval [0, t] and the restriction of the second coordinate to [0, t).

The ρ-discounted payoff of a play h is ∫_0^∞ e^{−ρt} g(z_t, x_t) dt, where g(z, x) = Σ_{a∈A(z)} x(a) g(z, a) is the linear extension of g. The s-horizon normalized payoff of a play h is (1/s) ∫_0^s g(z_t, x_t) dt.
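For readers who like to see the primitives as data, the following minimal sketch (our own encoding, not from the paper; the class and field names are hypothetical) represents a game with finitely many states and actions as plain dictionaries, with the actions of a profile listed in the order of the players.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class ContinuousTimeStochasticGame:
    """Finite primitives of the model: players N, states S, feasible action sets
    A^i(z), payoff function g(z, a), and transition rates mu(z', z, a) >= 0."""
    players: Tuple[str, ...]
    states: Tuple[str, ...]
    actions: Dict[Tuple[str, str], Tuple[str, ...]]              # (i, z) -> A^i(z)
    payoff: Dict[Tuple[str, Tuple[str, ...]], Dict[str, float]]  # (z, a) -> {i: g^i(z, a)}
    rates: Dict[Tuple[str, str, Tuple[str, ...]], float]         # (z', z, a) -> mu(z', z, a)

    def own_rate(self, z: str, a: Tuple[str, ...]) -> float:
        """mu(z, z, a) := -sum_{z' != z} mu(z', z, a), so that 1 + mu(z, z, a) * dt
        is the probability of staying in z during an infinitesimal dt."""
        return -sum(self.rates.get((zp, z, a), 0.0) for zp in self.states if zp != z)
```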

2.1. Strategies in continuous-time games

In this section we define and discuss several classes of strategies: stationary strategies, Markov strategies, discretimized strategies, and general strategies. Each one of these classes of strategies is a proper subclass of the next one in the list. In each equilibrium existence result we desire that the equilibrium strategies be as simple as possible.

6 The confinement there is more severe, as it requires the strategy of a player to be a Markov strategy such that for any Markov strategy of the other player the transition rates are continuous. See Guo and Hernandez-Lerma (2005, Definition 3.1). However, unless the game is degenerate, there are no such strategies. Nevertheless, the proofs there hold also for continuous Markov strategies.
7 Much of the theory developed remains intact even if the integral ∫_a^b f(t) dt of a real-valued bounded function f refers to a fixed monotonic linear functional (on the space of real-valued bounded functions), with ∫_a^b C dt = C(b − a) and ∫_a^c f dt = ∫_a^b f dt + ∫_b^c f dt for all 0 ≤ a, b, c. This defines the discounted and s-horizon payoffs over all plays, not necessarily measurable ones.

However, one has to recall that the equilibrium strategy profile should be such that there is no unilateral beneficial deviation, and a player can deviate to any general strategy. Therefore, an equilibrium strategy profile must be such that, in particular, any unilateral deviation to a general strategy yields a strategy profile that defines a probability distribution on plays.

2.1.1. Markov strategies

Let X^i(z), X(z), X^i, and X denote all probability distributions over A^i(z), A(z), A^i, and A, respectively.

A Markov strategy of player i is defined by a measurable function σ^i : S × [0, ∞) → X^i with σ^i(z, t) ∈ X^i(z). A profile σ of Markov strategies (σ^i)_{i∈N} defines a Markov strategy profile σ : S × [0, ∞) → X with σ(z, t) ∈ X(z) by σ(z, t)[a] = Π_{i∈N} σ^i(z, t)[a^i].

A Markov correlated strategy is defined by a measurable function σ : S × [0, ∞) → X with σ(z, t) ∈ X(z). Note that a profile of Markov strategies is a special case of a Markov correlated strategy.

A Markov strategy σ^i of player i is continuous if σ^i(z, t) is continuous in t. Similarly, a Markov correlated strategy σ is continuous if σ(z, t) is continuous in t. A stationary strategy of player i is a Markov strategy σ^i where σ^i(z, t) is independent of t.

A profile σ of Markov strategies, or more generally, a Markov correlated strategy σ, and an initial state z_0 define on the space of right-continuous functions t → z_t ∈ S a unique probability distribution P_σ that for every z ≠ z_t satisfies the equality

P_σ(z_{t+δ} = z | h_t) = ∫_t^{t+δ} μ(z, z_t, σ(z_t, s)) ds + o(δ),    (2)

where μ(z′, z, x) := Σ_{a∈A} μ(z′, z, a) x(a).

This unique distribution is supported on the space H^S of all functions t → z_t that have left limits and are right continuous at every t ≥ 0. (The fact that with P_σ probability 1 the function t → z_t has left and right limits at any t follows from condition (2), and the right continuity follows from our assumption.)

In order to define the probability P_σ on the space of right-continuous functions t → z_t, it suffices to define the distribution of (s_k, z_{s_k})_{k≥0}, where s_k is the time of the k-th state change.

Conditional on (s_j, z_{s_j})_{j≤k}, the distribution of s_{k+1} − s_k is given by

P_σ(s_{k+1} − s_k ≥ a | (s_j, z_{s_j})_{j≤k}) = exp( ∫_{s_k}^{s_k+a} μ(z_{s_k}, z_{s_k}, σ(z_{s_k}, s)) ds )  for a ≥ 0,

and (using the Lebesgue density theorem), conditional on (s_j)_{j≤k+1} and (z_{s_j})_{j≤k}, the distribution of z_{s_{k+1}} is given by

P_σ(z_{s_{k+1}} = z | (s_j)_{j≤k+1}, (z_{s_j})_{j≤k}) = −μ(z, z_{s_k}, σ(z_{s_k}, s_{k+1})) / μ(z_{s_k}, z_{s_k}, σ(z_{s_k}, s_{k+1}))  a.e.

The space of mixed strategies that are mixtures of Markov strategies depends on the measurable structure on the space of Markov strategies. Obviously, there are many measurable structures on the space of Markov strategies. One of them is derived from the minimal topology for which, for every real-valued continuous function f on S × X^i and all 0 ≤ a < b < ∞, the function σ^i → ∫_a^b f(z, σ^i(z, t)) dt is continuous.
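The jump-chain description above translates directly into simulation. The sketch below is our own illustration (the rates and the particular strategy are invented): it draws (s_k, z_{s_k}) under a stationary strategy, for which the conditional holding times are exponential with rate −μ(z, z, x) = Σ_{z′≠z} μ(z′, z, x).

```python
import random

def simulate_jump_chain(z0, states, mu, strategy, horizon):
    """Simulate the jump times and visited states (s_k, z_{s_k}) up to `horizon`
    under a stationary Markov (correlated) strategy.  `mu(z_next, z, x)` is the
    transition rate under the mixed action x = strategy(z) (the linear extension
    of mu(z', z, a) to mixed actions)."""
    t, z, path = 0.0, z0, [(0.0, z0)]
    while True:
        x = strategy(z)
        out_rates = {zp: mu(zp, z, x) for zp in states if zp != z}
        total = sum(out_rates.values())
        if total <= 0.0:            # absorbing state: no further state changes
            return path
        t += random.expovariate(total)              # exponential holding time
        if t > horizon:
            return path
        z = random.choices(list(out_rates), weights=list(out_rates.values()))[0]
        path.append((t, z))

# Two states; the mixed action in state "A" is summarized by the probability x of
# one action, and mu is its linear extension: rate 2x out of "A", rate 3 out of "B".
strategy = lambda z: 0.5 if z == "A" else 1.0
mu = lambda zp, z, x: 2.0 * x if (z, zp) == ("A", "B") else (3.0 if (z, zp) == ("B", "A") else 0.0)
print(simulate_jump_chain("A", ("A", "B"), mu, strategy, horizon=5.0))
```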

2.1.2. Discretimized strategies

Fix a strictly increasing sequence of times T = (t_k)_{k≥0} with t_0 = 0 and t_k → ∞ as k → ∞.

A T-discretimized strategy of player i specifies for each k and h_{t_k} ∈ H_{t_k} a function σ^i_{h_{t_k}} : S × [t_k, t_{k+1}) → X^i with σ^i_{h_{t_k}}(z, t) ∈ X^i(z). The interpretation of this description of σ^i is that at time t_k ≤ t < t_{k+1} the strategy σ^i selects an action σ^i_{h_{t_k}}(z_t, t) ∈ X^i(z_t) as a function of the state z_t at time t and the history of play h_{t_k} ∈ H_{t_k} up to time t_k.

A profile σ = (σ^i)_{i∈N} of T-discretimized strategies defines for each k and h_{t_k} ∈ H_{t_k} a function σ_{h_{t_k}} : S × [t_k, t_{k+1}) → X with σ_{h_{t_k}}(z, t) := ⊗_{i∈N} σ^i_{h_{t_k}}(z, t) ∈ ×_{i∈N} X^i(z) ⊂ X(z) (where ⊗ stands for the product of measures).

A T-discretimized correlated strategy specifies for each k and h_{t_k} ∈ H_{t_k} a function σ_{h_{t_k}} : S × [t_k, t_{k+1}) → X with σ_{h_{t_k}}(z, t) ∈ X(z). Note that a profile of T-discretimized strategies is a special case of a T-discretimized correlated strategy.

A T-discretimized correlated strategy (with suitable measurability assumptions) defines (for each initial state z_0) a (unique) probability distribution P_σ on plays (with right-continuous state function t → z_t) such that for t_k ≤ t < t_{k+1} and z ∈ S,

x_t = σ_{h_{t_k}}(z_t, t), and    (3)

P_σ(z_{t+δ} = z | h_t) = 1_{z=z_t} + ∫_t^{t+δ} μ(z, z_t, σ_{h_{t_k}}(z_t, s)) ds + o(δ),    (4)

where 1_{z=z_t} = 1 if z = z_t, and 1_{z=z_t} = 0 if z ≠ z_t.

In all our results, the profile of equilibrium strategies will be a profile of T-discretimized strategies (and recall that T is a sequence of real numbers).

However, a deviation of a player (even in the single-player case) need not be a T*-discretimized strategy with T* being a sequence of real numbers. For example, the strategy that for t ≥ s_1 plays one action a_t = a, and for t < s_1 plays another action a_t = b ≠ a, is not a T-discretimized strategy w.r.t. a strictly increasing sequence T = (t_k)_{k≥0} of real numbers.

Let us show, informally, that a profile of strategies that is obtained by a unilateral deviation from a profile of T-discretimized strategies defines a unique probability on plays. The derivation is not formal as we have not yet introduced the definition of general strategies. However, aside from the needed suitable measurability assumptions, the general form of the behavior of a strategy of player i when confronted with T-discretimized strategies of the other players should be clear. Following our definition (in the next section) of general strategies, it will be clear that the assumed behavior of a general unilateral deviation holds for any general strategy, and therefore, with the suitable measurability assumptions, the informal arguments below turn into a formal proof.

Let σ be a profile of T-discretimized strategies, where T = (t_k)_{k≥0} is an increasing sequence of real numbers with t_0 = 0 and t_k →_{k→∞} ∞. Let s_0 = 0 and let s_l be the time of the l-th state change. If there are fewer than l state changes then s_l = ∞. Note that s_l is a function of the play.

Define t^l_k = s_l ∨ (s_{l+1} ∧ t_k), where a ∨ b stands for max(a, b) and a ∧ b stands for min(a, b). Note that s_l ≤ t^l_k ≤ s_{l+1} and t^l_k →_{k→∞} s_{l+1}.

Assume that player i is deviating to a strategy τ^i. By induction on l, assume that the distribution of h_{s_l} is uniquely defined by the strategy profile (σ^{−i}, τ^i). We will show that this assumption implies that the distribution of h_{s_{l+1}} is uniquely defined.

The first step shows that for s_l ≤ t < s_{l+1}, the action profile x_t is a well-defined function of h_{s_l} and the strategy profile (σ^{−i}, τ^i).

For any j ≠ i, the strategy σ^j is a T = (t_k)_{k≥0}-discretimized strategy. Therefore, for any t^l_k ≤ t < t^l_{k+1}, x^j_t is well defined as a function of h_{t^l_k} and σ^j. Therefore, for any t^l_k ≤ t < t^l_{k+1}, x^{−i}_t = (x^j_t)_{j≠i} is well defined as a function of h_{t^l_k} and σ^{−i} = (σ^j)_{j≠i}.

As x^{−i}_t is well defined for all t^l_k ≤ t < t^l_{k+1} as a function of h_{t^l_k} and σ^{−i}, player i does not learn any new information from the actions x^{−i}_t, t^l_k ≤ t < t^l_{k+1}. Therefore, (given σ^{−i}) the action of player i at time t^l_k ≤ t < t^l_{k+1}, x^i_t, which is defined by τ^i, is a function of h_{t^l_k}. Therefore, for any t^l_k ≤ t < t^l_{k+1}, the action profile x_t is a well-defined function of h_{t^l_k} and the strategy profile (σ^{−i}, τ^i).

By induction on k, the action profile x_t is defined for every s_l ≤ t < s_{l+1} as a function of h_{s_l} and the strategy profile (σ^{−i}, τ^i).

For s_l ≤ t, we denote by x_{l,t} the action profile defined by the strategy profile (σ^{−i}, τ^i) at time t conditional on s_{l+1} > t. Note that the action profile x_{l,t} is a well-defined function of h_{s_l} and the strategy profile (σ^{−i}, τ^i).

This enables us to define the action profile x_t at time s_l ≤ t < s_{l+1} and the conditional distribution of (s_{l+1}, z_{s_{l+1}}) as a function of h_{s_l} and the strategy profile σ̂ := (σ^{−i}, τ^i), as follows. For s_l ≤ t < s_{l+1},

$$x_t=x_{l,t},\qquad(5)$$
$$P_{\hat\sigma}(s_{l+1}\ge s_l+\theta\mid h_{s_l})=e^{\int_{s_l}^{s_l+\theta}\mu(z_{s_l},z_{s_l},x_{l,t})\,dt},\qquad(6)$$
and (using the Lebesgue density theorem) for $z'\ne z_{s_l}$,
$$P_{\hat\sigma}(z_{s_{l+1}}=z'\mid h_{s_l},s_{l+1})=\frac{\mu(z',z_{s_l},x_{l,s_{l+1}})}{-\mu(z_{s_l},z_{s_l},x_{l,s_{l+1}})}\quad\text{a.e.}\qquad(7)$$
By induction on $l$ we deduce, using Tulcea's extension theorem (see, e.g., Stroock and Varadhan, 2006, Theorem 1.1.9), that the strategy profile $(\sigma^{-i},\tau^i)$ defines a unique probability distribution on plays.
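For intuition, the following minimal sketch (not part of the paper) simulates the state process between decision times according to (5)–(7) by thinning: candidate jump times are proposed at the dominating rate $\|\mu\|$ and accepted with probability $-\mu(z,z,x_{l,t})/\|\mu\|$; upon acceptance the new state is drawn with probabilities proportional to $\mu(\cdot,z,x_{l,t})$, as in (7). The rate function, the action path, and the horizon are illustrative placeholders.

```python
import random

def simulate_jumps(states, mu, action_path, z0, horizon, mu_norm):
    """states: finite state list; mu(z2, z, x): transition rate to z2 from z under
    the action profile x (with mu(z, z, x) = -sum of the off-diagonal rates);
    action_path(l, z, t): the deterministic path t -> x_{l,t} used after the l-th
    state change; mu_norm: a bound on |mu(z, z, x)|.  All inputs are illustrative."""
    t, z, l, path = 0.0, z0, 0, [(0.0, z0)]
    while True:
        # thinning: propose candidate jump times at the dominating rate mu_norm
        t += random.expovariate(mu_norm)
        if t >= horizon:
            return path
        x = action_path(l, z, t)
        if random.random() < -mu(z, z, x) / mu_norm:   # accept with prob (true rate)/mu_norm
            # the new state is drawn with probability mu(z2, z, x)/(-mu(z, z, x)), cf. (7)
            weights = [max(mu(z2, z, x), 0.0) for z2 in states]
            z = random.choices(states, weights=weights)[0]
            l += 1
            path.append((t, z))
```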

2.1.3. General strategies

The space of all plays is denoted by $H$. The space of all (measurable) functions $h:[0,\infty)\to S\times(\times_{j\ne i}A^j)$, $h(t)=(z_t,x^{-i}_t)$, with $x^{-i}_t\in\times_{j\ne i}A^j$ and $z_t\in S$, is denoted by $H^{-i}$.

A strategy $\sigma^i$ of player $i$ is a (measurable) function $\sigma^i:H^{-i}\times[0,\infty)\to X^i$ with $\sigma^i(h,t)\in X^i(z_t)$ and such that for all triples $(h',h,t)\in H^{-i}\times H^{-i}\times[0,\infty)$ with $h'_t=h_t$ we have $\sigma^i(h,t)=\sigma^i(h',t)$. This last condition asserts that a player cannot decide his current mixed action based on future events. Note that a discretimized strategy is a special case of a (general) strategy.

A play $h\in H$ is compatible with the strategy profile $\sigma=(\sigma^i)_{i\in N}$ if $x^i_t=\sigma^i(h^{-i},t)$ for every $t\ge0$.

A correlated strategy $\sigma$ is a (measurable) function $\sigma:H^S\times[0,\infty)\to X$ with $\sigma(h,t)\in X(z_t)$ and such that for all triples $((z_s)_{s\ge0},(z'_s)_{s\ge0},t)\in H^S\times H^S\times[0,\infty)$ with $z_s=z'_s$ $\forall s\le t$ we have $\sigma((z_s)_{s\ge0},t)=\sigma((z'_s)_{s\ge0},t)$. A play $h=((z_s)_{s\ge0},(x_s)_{s\ge0})\in H$ is compatible with the correlated strategy $\sigma$ if $x_t=\sigma((z_s)_{s\ge0},t)$ for every $t\ge0$.

A profile of strategies need not define unambiguously a probability distribution over plays; see, e.g., Proposition 1.

An admissible profile of strategies $\sigma=(\sigma^i)_{i\in N}$ is a strategy profile $\sigma$ such that for every player $i$ and every strategy $\tau^i$ of player $i$ the strategy profile $(\sigma^{-i},\tau^i)$ defines a unique (and thus unambiguous) probability distribution $P_{\sigma^{-i},\tau^i}$ on plays (with a right-continuous state function $t\mapsto z_t$).

Throughout, we have used measurability assumptions in our definitions. The meaning of a play and of a strategy depends on the $\sigma$-algebras that define measurability. However, we skipped spelling out the $\sigma$-algebras for the following reason. The space of measurable functions from a measurable space $(X,\mathcal X)$ to a measurable space $(Y,\mathcal Y)$ is monotonically increasing in $\mathcal X$ (namely, for each fixed space $X$ and measurable space $(Y,\mathcal Y)$, the larger the $\sigma$-algebra $\mathcal X$, the larger the space of measurable functions from $(X,\mathcal X)$ to $(Y,\mathcal Y)$) and monotonically decreasing in $\mathcal Y$.

In constructing a strategy profile in an existence result, it is desirable that this strategy be in a space of strategies whose meaning, implementation, and interpretation are as simple as possible. To this end, we wish to take a minimal reasonable $\sigma$-algebra on the space of plays in the domain of a strategy, and a maximal reasonable $\sigma$-algebra on the image of a strategy. On the other hand, when we wish to demonstrate that there are no unilateral beneficial deviations, it is desirable to demonstrate it for the most general deviation. To this end, we wish to take a maximal reasonable $\sigma$-algebra on the space of plays in the domain of a strategy, and a minimal reasonable $\sigma$-algebra on the image of a strategy.

2.2. Discussion of strategies

The realization of an increasing sequence of real numbers is a well-ordered subset of the reals. This enables us to define unambiguously the distribution on plays that is defined by a profile of strategies that is obtained by a unilateral deviation from a profile of discretimized strategies. In this section we illustrate the difficulties that arise when the set of decision times is not well ordered.

2.2.1. Prelude: the set-theoretic setup

Consider a totally ordered set $T$. For every element $t\in T$ let $P_t:=\{t'\in T:t'<t\}$. Fix a set $A$ with at least two elements. If we interpret the set $T$ as the set of action times, and $A$ as the set of single-stage action profiles, then a play is an element $h\in A^T$, and the value of $h$ at time $t$, $h(t)$, is interpreted as the action profile at time $t$. A history of play up to (and not including) time $t$ is an element of $A^{P_t}$, and given a play $h$ we denote by $h_t$ its restriction to $P_t$. A local pure strategy profile $\sigma$ is a list of functions $\sigma_t:A^{P_t}\to A$, $t\in T$. The integral of a local pure strategy profile $\sigma$ is the set of all plays $h$ such that $h(t)=\sigma_t(h_t)$ for every $t$. A local pure strategy profile is integrable if its integral is nonempty; i.e., there is at least one play $h\in A^T$ such that for every $t\in T$ we have $h(t)=\sigma_t(h_t)$.
The following simple proposition demonstrates that when the set of times is not well ordered, there are local strategies that are not integrable, and there are local strategies whose integral contains more than one element. On the other hand, if the set of times is well ordered, the integral contains a single element.

Proposition 1. The following conditions are equivalent: 1) every local pure strategy profile σ is integrable, 2) the set of times T is well ordered, and 3) the integral of every local pure strategy contains at most one element.

Proof. Assume that $T$ is well ordered. We define for every $t\in T$ the action profile $h(t)$ by transfinite induction: (1) for the first element $t^*$ of $T$, $h_{t^*}=\emptyset$ and therefore we set $h(t^*)=\sigma_{t^*}(\emptyset)$; (2) if $h(t')$ has been defined for every $t'<t$, then $h_t$ has been defined and we set $h(t)=\sigma_t(h_t)$. As $T$ is well ordered, the action profile $h(t)$ is defined uniquely for every $t\in T$. Therefore, the play $h$ is the unique element in the integral of $\sigma$.
If $T$ is not well ordered, there is an infinite decreasing sequence $t_1>t_2>\dots$ of times in $T$. Let $a,b$ be two distinct elements of $A$. Define the local pure strategy profile $\sigma$ by $\sigma_t(\cdot)=a$ if $t\notin\{t_j:j\ge1\}$, $\sigma_{t_i}(h_{t_i})=b$ if $\limsup_{k\to\infty}|\{i<j\le i+k:h(t_j)=a\}|/k\ge1/2$, and $\sigma_{t_i}(h_{t_i})=a$ if $\limsup_{k\to\infty}|\{i<j\le i+k:h(t_j)=a\}|/k<1/2$. Note that for every play $h$, $\limsup_{k\to\infty}|\{i<j\le i+k:h(t_j)=a\}|/k$ is independent of $i$. Therefore, if $h$ is in the integral of $\sigma$, then $h(t_i)$ is a constant and $\sigma_{t_i}(h_{t_i})\ne h(t_i)$, contradicting the local definition of $\sigma$.
Define the local pure strategy $\tau$ by $\tau_{t_i}(h_{t_i})=b$ if $|\{j:i<j\text{ and }h(t_j)=a\}|<\infty$, and $\tau_t(\cdot)=a$ otherwise. The play that plays $a$ everywhere, and the play that plays $b$ at times $t=t_j$, $j\ge1$, and $a$ elsewhere, i.e., at times $t\notin\{t_j:j\ge1\}$, are in the integral of $\tau$. $\Box$
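For intuition, the constructive half of the argument is just forward induction; the following minimal sketch (not from the paper) integrates a local pure strategy profile over a finite (hence well-ordered) set of times and recovers the unique play. All names are illustrative.

```python
# Forward "integration" of a local pure strategy profile over well-ordered times.
def integrate(times, sigma):
    """times: increasing list of decision times; sigma[t]: function mapping the
    history {s: play[s] for s < t} to the action profile at time t."""
    play = {}
    for t in times:                       # transfinite induction collapses to a loop
        history = {s: a for s, a in play.items() if s < t}
        play[t] = sigma[t](history)
    return play

# usage: two times, the second action copies the first one
times = [0.0, 1.0]
sigma = {0.0: lambda h: "a", 1.0: lambda h: h[0.0]}
assert integrate(times, sigma) == {0.0: "a", 1.0: "a"}
```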

A delayed local pure strategy profile is a local strategy $\sigma$ such that there is a well-ordered$^8$ subset $T^*$ of $T$ such that for every $t\in T$ that is not a maximal element of $T^*$ there is $t^*\in T^*$ such that $t^*\le t<\bar t^*$, where $\bar t^*$ is the least element in $T^*$ that is $>t^*$, and for $t^*\le t<\bar t^*$ we have
$$\sigma_t(h_t)=\sigma_t(h'_t)\quad\text{whenever }h_{t^*}=h'_{t^*}.$$

Proposition 2. A delayed local pure strategy profile is integrable and its integral contains a unique element.

Proof. The restriction of the function $h(t)$ to the interval $t^*\le t<\bar t^*$ is defined by transfinite induction on $t^*\in T^*$. $\Box$

Proposition 2 is standard and classical, and is used in the theory of differential games, when the set of decision times T is either a finite interval [0, T ] or the nonnegative real numbers [0, ∞).

2.3. Strategies that observe and select mixed actions

In discrete-time games with perfect monitoring we focus on strategies that observe the pure past actions and select a pure action. Below we discuss our choice of the model, where players observe past mixed actions and strategies select mixed actions as a function of past observed variables. These assumptions are conceptually innocuous in the case where each player is a continuum of agents and the observable variables are the statistics of actions of the continuum of agents. In the continuous-time model, we find these assumptions natural even in the case where each player represents a single decision maker. This is motivated in part by the limitations on the players’ perception. The perception of players is not without limitation. A person watching a sequence of blue and yellow pictures will be unable to tell the exact times when the blue ones were presented if the switching rate crosses some threshold. In fact, he will observe a greenish picture, and the greenness will change as a function of the fraction of time in which the blue pictures were presented. It is therefore desirable to model the observation of the past by the observation of time averages of pure actions, and time averages of pure actions correspond to mixed actions. Given this limitation on perception, it is also natural to model the interaction by assuming that the players select mixed actions. These assumptions – of observing and selecting mixed actions – are also technically convenient. They result in a tractable analytic model. In addition, the analysis of this analytic model leads to analogous results in the asymptotic theory of discrete-time stochastic games with short-stage durations and in the (discretimized approach to) continuous-time stochastic games where players are subject to a short delay in observing other players’ actions; see Neyman (2012, 2013).

3. Non-zero-sum continuous-time stochastic games

3.1. Useful inequalities

Fix a continuous-time stochastic game $\Gamma=\langle N,S,A,g,\mu\rangle$. In this section we introduce several inequalities that are often used in the paper.
Recall that $H^S$ is the space of all functions $t\mapsto z_t\in S$ that are right continuous and have left limits everywhere. Let $\mathcal H^S_t$, respectively $\mathcal H^S$, be the $\sigma$-algebra of subsets of $H^S$ that is generated by all the $S$-valued random variables $z_s$, $s\le t$, respectively $s<\infty$.
The concept of a $\mathcal T$-discretimized strategy with respect to an increasing sequence of real numbers, which was defined earlier, extends naturally to the case where $\mathcal T$ is an increasing sequence of $(\mathcal H^S_t)_{t\ge0}$-stopping times.
Fix a strictly increasing sequence of $(\mathcal H^S_t)_{t\ge0}$-stopping times $\mathcal T=(t_k)_{k\ge0}$ with $t_0=0$ and $t_k\to\infty$ as $k\to\infty$. A special case of interest is $\mathcal T=(s_k)_{k\ge0}$, where $s_0=0$ and $s_k\le\infty$ is the time of the $k$-th state change.
In Section 2.1.2 we showed that if $\sigma$ is a profile of $\mathcal T$-discretimized strategies, where $\mathcal T$ is an increasing sequence of real numbers, and $\tau^i$ is a strategy of player $i$, then, for $s_l\le t<s_{l+1}$, the action profile $x_t$ is a well-defined function of $h_{s_l}$ and the strategy profile $(\sigma^{-i},\tau^i)$. Therefore, there is an $(s_k)_{k\ge0}$-discretimized correlated strategy $\hat\sigma$ such that $\hat\sigma$ and $(\sigma^{-i},\tau^i)$ define the same distribution on plays.
The same property holds also in the case where each strategy $\sigma^j$ is a discretimized strategy with respect to a strictly increasing sequence (or even a well-ordered countable set) of $(\mathcal H^S_t)_{t\ge0}$-stopping times.
The lemmas in this section assume that $\sigma$ is an $(s_k)_{k\ge0}$-discretimized correlated strategy. Therefore, the conclusions of the lemmas hold for any profile that is obtained by a unilateral deviation from a profile of discretimized strategies.
For such a correlated strategy, the mixed action-profile at time $t<s_1$ is well defined as a function of $z_0$ (and $\sigma$) and is denoted by $x_{0,t}$. Note that $x_{0,t}$ is the mixed action-profile that is selected by the correlated strategy $\sigma$ at time $t$ conditional on no state change in the interval $[0,t]$. Note that $x_{0,t}$ is defined for every $t\ge0$ as a function of $z_0$ (and $\sigma$).

$^8$ A well-ordered subset of an ordered set is a subset such that every nonempty subset of it has a minimal element.

Using the notation used in the definition of a general correlated strategy, $x_{0,t}=\sigma(h,t)$, where $h_t$ is the unique play that is compatible with $\sigma$ and has $z_s=z_0$ for every $s\le t$.
Similarly, the mixed action-profile at time $s_1\le t<s_2$ is well defined as a function of $(s_1,z_{s_1})$ (and $\sigma$) and is denoted by $x_{1,t}$. In other words, $x_{1,t}$ is the mixed action-profile that is selected by the correlated strategy $\sigma$ at time $t$ conditional on $t\ge s_1$ and no state change in the interval $[s_1,t]$. Note that $x_{1,t}$ is defined for every $t\ge s_1$ as a function of $(s_1,z_{s_1})$ and $\sigma$. Using the notation used in the definition of a general correlated strategy, for every $(s_1,z_{s_1})$ and $t>s_1$, $x_{1,t}=\sigma(h,t)$, where $h_t$ is the unique play up to time $t$ that is (1) compatible with $\sigma$, (2) has $z_s=z_0$ for every $s<s_1$, and (3) has $z_s=z_{s_1}$ for every $s_1<s\le t$.
Let $\|\mu\|:=\max_{(z,a)\in A}|\mu(z,z,a)|$. Obviously, $\|\mu\|$ is finite as it is the maximum over finitely many real numbers.
The following lemma shows that the probability of at least one state change in a time interval is bounded by a constant ($\|\mu\|$) times the length of the time interval, and that the probability of at least two state changes in a time interval is bounded by a constant ($\|\mu\|^2/2$) times the square of the length of the time interval. The second bound implies that for any (finite or) countable collection of time intervals the probability of at least two state changes in one of these time intervals is bounded by a constant ($\|\mu\|^2/2$) times the sum of the squares of the lengths of the time intervals.

Lemma 1. For every $\theta>0$,
$$P_\sigma(s_1\ge\theta)\ \ge\ e^{-\|\mu\|\theta}\ \ge\ 1-\theta\|\mu\|\qquad(8)$$
and
$$P_\sigma(s_2\le\theta)\ \le\ \theta^2\|\mu\|^2/2.\qquad(9)$$

Proof. As $P^z_\sigma(s_1\ge\theta)=e^{\int_0^\theta\mu(z,z,x_{0,t})\,dt}$ and $\int_0^\theta\mu(z,z,x_{0,t})\,dt\ge-\|\mu\|\theta$, we have $P^z_\sigma(s_1\ge\theta)\ge e^{-\|\mu\|\theta}\ge1-\theta\|\mu\|$, proving inequality (8).
The conditional probability, given $(s_1,z_{s_1})$, that $s_2\le\theta$ is equal, on $s_1\le\theta$, to $1-e^{\int_{s_1}^\theta\mu(z_{s_1},z_{s_1},x_{1,t})\,dt}\le\|\mu\|(\theta-s_1)$.
The density of $s_1$ at $\eta\in\mathbb R_+$ is given by $-\mu(z,z,x_{0,\eta})\,e^{\int_0^\eta\mu(z,z,x_{0,t})\,dt}\le\|\mu\|$.
Therefore, $P_\sigma(s_2\le\theta)\le\int_0^\theta\|\mu\|(\theta-s)\,\|\mu\|\,ds=\|\mu\|^2\theta^2/2$, proving inequality (9). $\Box$
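As a quick numerical illustration (not from the paper) of the bounds (8) and (9), consider the extreme case in which every state change occurs at the maximal rate $\|\mu\|$, so the first two inter-jump times are i.i.d. exponential; the constants $\|\mu\|$, $\theta$, and the sample size below are arbitrary.

```python
import random, math

mu_norm, theta, n = 2.0, 0.1, 200_000
s1 = [random.expovariate(mu_norm) for _ in range(n)]            # first jump times
s2 = [a + random.expovariate(mu_norm) for a in s1]              # second jump times

p1 = sum(t >= theta for t in s1) / n          # estimate of P(s_1 >= theta)
p2 = sum(t <= theta for t in s2) / n          # estimate of P(s_2 <= theta)
assert p1 >= math.exp(-mu_norm * theta) - 0.01        # bound (8), up to MC noise
assert p2 <= theta**2 * mu_norm**2 / 2 + 0.01         # bound (9), up to MC noise
```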

Several proofs in the paper use dynamic programming arguments. In these arguments we will need to consider the expectation of the sum of the accumulation of payoffs in a (short) time interval and a function of the state at the end of the interval. Without loss of generality we can assume that the time interval starts at time $0$ and ends at time $\delta$. Therefore, we are led to estimate the expectation of $\int_{[0,\delta)}g^i_t\,dw(t)+u(z_\delta)$, where $w$ is a nonnegative measure on $[0,\delta)$ and $u$ is a real-valued function that is defined on the state space. The expectation depends (obviously) on the strategy profile, or, more generally, on the correlated strategy $\sigma$.
The next lemma uses the inequalities of Lemma 1 to approximate each of the terms $E_\sigma\int_{[0,\delta)}g^i_t\,dw(t)$ and $E_\sigma u(z_\delta)$, and therefore also their sum, by functions that depend on the choice of the mixed action-profile of the correlated strategy $\sigma$ conditional on no state change in the interval $[0,\delta]$. Recall that this choice is denoted by $x_{0,t}$.
Recall that $\|g\|=\max_{i,z,a}|g^i(z,a)|$.

Lemma 2. Let $\delta>0$, $w$ be a finite nonnegative measure on $[0,\delta)$, and $u:S\to\mathbb R$. Let $\sigma$ be a correlated strategy. Then, for every $z\in S$,
$$\Big|E^z_\sigma\Big(\int_{[0,\delta)}g^i_t\,dw(t)\Big)-\int_{[0,\delta)}g^i(z,x_{0,t})\,dw(t)\Big|\ \le\ 2\|\mu\|\,\delta\,\|g\|\,w([0,\delta)),$$
$$\Big|E^z_\sigma u(z_\delta)-u(z)-\int_{[0,\delta)}\sum_{z'\in S}u(z')\,\mu(z',z,x_{0,t})\,dt\Big|\ \le\ 2\|\mu\|^2\delta^2\|u\|,$$
and, therefore, by setting $\varepsilon=\varepsilon(\delta,\mu,w)=2\|\mu\|\delta\|g\|\,w([0,\delta))+2\|\mu\|^2\delta^2\|u\|$,
$$\varepsilon+E^z_\sigma\Big(\int_{[0,\delta)}g^i_t\,dw(t)+u(z_\delta)\Big)\ \ge\ \int_{[0,\delta)}g^i(z,x_{0,t})\,dw(t)+u(z)+\int_0^\delta\sum_{z'\in S}u(z')\,\mu(z',z,x_{0,t})\,dt\ \ge\ E^z_\sigma\Big(\int_{[0,\delta)}g^i_t\,dw(t)+u(z_\delta)\Big)-\varepsilon.$$

Proof. As $x_t=x_{0,t}$ on $t<t^*:=\delta\wedge s_1$, we have $|g^i(z_t,x_t)-g^i(z,x_{0,t})|\le2\|g\|\,\mathbf I(t\ge s_1)$, where $\mathbf I(t\ge s_1)$ is the indicator of the event $\{t\ge s_1\}$. Therefore,
$$\Big|E^z_\sigma\int_{[0,\delta)}g^i_t\,dw(t)-\int_{[0,\delta)}g^i(z,x_{0,t})\,dw(t)\Big|\ \le\ E^z_\sigma\int_{[0,\delta)}|g^i_t-g^i(z,x_{0,t})|\,dw(t)\ \le\ E^z_\sigma\int_{[0,\delta)}2\|g\|\,\mathbf I(t\ge s_1)\,dw(t)$$
$$\le\ 2\|g\|\,P^z_\sigma(s_1<\delta)\,w([0,\delta))\ \le\ 2\|g\|\,\delta\,\|\mu\|\,w([0,\delta)).$$

The event $\{z_{t^*}\ne z_\delta\}$ is a subset of the event $\{s_2\le\delta\}$. Therefore, by inequality (9), $P^z_\sigma(z_{t^*}\ne z_\delta)\le\delta^2\|\mu\|^2/2$. For every $z'\in S$,
$$P(z_{t^*}=z')=1_{z'=z}+\int_0^\delta\mu(z',z,x_{0,t})\,e^{\int_0^t\mu(z,z,x_{0,s})\,ds}\,dt.$$
Therefore, as $|e^{\int_0^t\mu(z,z,x_{0,s})\,ds}-1|\le t\|\mu\|$ and thus $\sum_{z'\in S}\int_0^\delta|\mu(z',z,x_{0,t})|\,t\|\mu\|\,dt\le\delta^2\|\mu\|^2$,
$$\Big|\sum_{z'\in S}P(z_{t^*}=z')\,u(z')-u(z)-\sum_{z'\in S}\int_0^\delta u(z')\,\mu(z',z,x_{0,t})\,dt\Big|\ \le\ \|u\|\,\delta^2\|\mu\|^2,$$
which together with the inequality $|E^z_\sigma u(z_\delta)-E^z_\sigma u(z_{t^*})|\le2\|u\|\,P^z_\sigma(z_\delta\ne z_{t^*})\le\|\mu\|^2\delta^2\|u\|$ completes the proof of the second part of the lemma. The last part of the lemma is obtained by summing the inequalities of the first two parts. $\Box$

3.2. Stationary discounting games

Fix a profile $\rho=(\rho_i)_{i\in N}$ of discounting rates. We say that the strategy profile $\sigma$ is an $\varepsilon$-equilibrium, $\varepsilon\ge0$, of the $\rho$-discounted continuous-time stochastic game $\Gamma_\rho$, if $\sigma$ is an admissible strategy profile such that for every player $i$ and every strategy $\tau^i$ of player $i$ we have
$$E_\sigma\int_0^\infty e^{-\rho_it}g^i(z_t,x_t)\,dt\ \ge\ E_{\sigma^{-i},\tau^i}\int_0^\infty e^{-\rho_it}g^i(z_t,x_t)\,dt-\varepsilon,$$
where $E_\sigma$ is the expectation with respect to the probability distribution $P_\sigma$ defined by $\sigma$ on plays, and $(\sigma^{-i},\tau^i)$ is the strategy profile whose $i$-th component is $\tau^i$ and whose $j$-th component, for $j\ne i$, is $\sigma^j$. The strategy profile $\sigma$ is an equilibrium if it is a 0-equilibrium.

Theorem 1. Every $\rho$-discounted continuous-time stochastic game $\Gamma_\rho$ (with finitely many states and actions) has a profile of stationary strategies that is an equilibrium of $\Gamma_\rho$.

Proof. Define $\|g^i\|:=\max_{z\in S}\max_{a\in A(z)}|g^i(z,a)|$, $J^i(z)=[-\|g^i\|/\rho_i,\|g^i\|/\rho_i]$, and $J=\times_{(i,z)\in N\times S}J^i(z)$. The $(i,z)$-th coordinate of $v\in\mathbb R^{N\times S}\supset J$ is denoted by $v^i(z)$. Recall that $X^i(z)=\Delta(A^i(z))$. Let $Y^i=\times_{z\in S}X^i(z)$ and $Y=\times_{i\in N}Y^i$.
For every $i\in N$, $z\in S$, $a\in A(z)$, $v\in\mathbb R^{N\times S}$, and $x\in Y$, the real-valued functions $G^i_z[v](x)$ and $f_{z,i}$ defined on $Y\times J$ are as follows:
$$G^i_z[v](a)=\frac1{\|\mu\|+\rho_i}\Big(g^i(z,a)+\sum_{z'\in S}\mu(z',z,a)\,v^i(z')+\|\mu\|\,v^i(z)\Big),$$
where $\|\mu\|=\max_{z,a}|\mu(z,z,a)|$,
$$G^i_z[v](x)=\sum_{a\in A(z)}G^i_z[v](a)\prod_{j\in N}x^j(z)(a^j),\qquad\text{and}\qquad f_{z,i}(x,v)=\max_{y^i\in X^i(z)}G^i_z[v](x^{-i},y^i),$$
where $(x^{-i}(z),y^i)$ is the profile of mixed actions whose $j$-th coordinate for $j\ne i$ is $x^j(z)$ and whose $i$-th coordinate is $y^i$.
Let $F_{z,i}$ be the correspondence from $Y\times J$ to $X^i(z)$ given by
$$F_{z,i}(x,v)=\arg\max_{y^i\in X^i(z)}G^i_z[v](x^{-i},y^i).$$
The cartesian product $Y\times J$ is a product of nonempty convex compact sets and therefore it is nonempty, convex, and compact.
The function $(x,v)\mapsto G^i_z[v](x)$ is a polynomial in the coordinates of $(x,v)$ and therefore continuous, and therefore $f_{z,i}(x,v)$ is also a continuous function of $(x,v)$.
If $\|v^i\|\le\|g^i\|/\rho_i$ then $|G^i_z[v](a)|\le\frac1{\|\mu\|+\rho_i}\big(\|g^i\|+\|\mu\|\,\|g^i\|/\rho_i\big)\le\|g^i\|/\rho_i$. Therefore $|G^i_z[v](x)|\le\|g^i\|/\rho_i$ and therefore also $|f_{z,i}(x,v)|\le\|g^i\|/\rho_i$.
We deduce that $f_{z,i}$ is a continuous function from $Y\times J$ to $J^i(z)$.
The correspondence $F_{z,i}$ from $Y\times J$ to $X^i(z)$ is nonempty, convex-valued, and upper semicontinuous. Therefore the correspondence $F$ defined on $Y\times J$ by $F(x,v)=\times_{z,i}\big(F_{z,i}(x,v)\times\{f_{z,i}(x,v)\}\big)$ is a nonempty-valued, convex-valued, upper semicontinuous correspondence from the nonempty convex compact set $Y\times J$ to itself, and therefore has a fixed point.
Let $(x,V)$ be a fixed point of $F$. We claim that the profile of stationary strategies $\sigma$ with $\sigma^i(h_t)=x^i(z_t)$ is an equilibrium of the $\rho$-discounted game with equilibrium payoff $V$.
Fix a player $i\in N$, a strategy $\tau^i$ of player $i$, and an initial state $z=z_0$. First we prove that
$$E^z_{\sigma^{-i},\tau^i}\int_0^\infty e^{-\rho_it}g^i(z_t,x_t)\,dt\ \le\ V^i(z).\qquad(10)$$
For every correlated strategy, or a strategy profile that defines a unique distribution on plays, $\tau$, state $z\in S$, and $s\ge0$, define $f^z_\tau(s)=E^z_\tau\big(\int_0^s e^{-\rho_it}g^i(z_t,x_t)\,dt+e^{-\rho_is}V^i(z_s)\big)$.
Set $f(s)=f^z(s)=f^z_{\sigma^{-i},\tau^i}(s)$. Note that $f(0)=V^i(z)$ and that $f(s)\to_{s\to\infty}E^z_{\sigma^{-i},\tau^i}\int_0^\infty e^{-\rho_it}g^i(z_t,x_t)\,dt$. Therefore, in order to prove (10), it suffices to prove that $f$ is Lipschitz and that the upper-right derivative of $f$ is nonpositive a.e.
Let $y_t:=x_{0,t}$ of the correlated strategy $(\sigma^{-i},\tau^i)$, namely, $y_t=(\sigma^{-i},\tau^i)(h,t)$, where $h$ is a play that is (1) compatible with $(\sigma^{-i},\tau^i)$ and (2) has $z_s=z$ for every $0\le s<\delta$. By Lemma 2,
$$f(\delta)-f(0)=E^z_{\sigma^{-i},\tau^i}\Big(\int_0^\delta e^{-\rho_it}g^i(z_t,x_t)\,dt+e^{-\rho_i\delta}V^i(z_\delta)\Big)-V^i(z_0)$$
$$\le\ \int_0^\delta e^{-\rho_it}g^i(z_0,y_t)\,dt+e^{-\rho_i\delta}\Big(V^i(z_0)+\sum_{z'\in S}\int_0^\delta V^i(z')\,\mu(z',z_0,y_t)\,dt\Big)-V^i(z_0)+O(\delta^2)$$
$$\le\ \int_0^\delta\Big(g^i(z_0,y_t)+\sum_{z'\in S}V^i(z')\,\mu(z',z_0,y_t)-\rho_iV^i(z_0)\Big)\,dt+O(\delta^2).$$
Therefore, the function $s\mapsto f(s)$ is Lipschitz ($|f(s+\delta)-f(s)|\le\delta e^{-\rho_is}(2\|\mu\|\,\|V^i\|_\infty+\|g\|+O(\delta))$). Therefore, in order to prove (10), it suffices to prove that the upper-right derivative of $f$ is nonpositive a.e.
The choice of $\sigma$ implies that $g^i(z_0,y_t)+\sum_{z'\in S}\mu(z',z_0,y_t)V^i(z')-\rho_iV^i(z_0)\le0$, and therefore the upper-right derivative of $f$ at $0$, $\limsup_{\delta\to0^+}\frac{f(\delta)-f(0)}\delta$, is nonpositive. Similarly, for a.e. $s>0$, the upper-right derivative of $f$ at $s$ is nonpositive. We conclude that $f$ is nonincreasing.
Now setting $\tau^i=\sigma^i$, we can change $\le$ in the above-derived inequalities to $\ge$ (and replace the terms $+O(\delta^2)$ by $-O(\delta^2)$) in order to deduce that $f^z_\sigma(s)$ is nondecreasing and thus
$$E^z_\sigma\int_0^\infty e^{-\rho_it}g^i(z_t,x_t)\,dt\ \ge\ V^i(z),$$
which together with inequality (10) completes the proof of the theorem. $\Box$
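The operator $G^i_z$ "uniformizes" the continuous-time game; in the single-player case the map $v\mapsto\max_aG_z[v](a)$ is a contraction with modulus $\|\mu\|/(\|\mu\|+\rho)$, so value iteration converges to the fixed point. The following is a minimal sketch (not part of the paper) of that computation; the game data are illustrative placeholders.

```python
def value_iteration(states, actions, g, mu, rho, tol=1e-10):
    """Single-player fixed point of v(z) = max_a G_z[v](a).
    g[z][a]: payoff rate; mu[z][a][z2]: transition rate (rows sum to 0)."""
    mu_norm = max(-mu[z][a][z] for z in states for a in actions[z])
    v = {z: 0.0 for z in states}
    while True:
        w = {
            z: max(
                (g[z][a] + sum(mu[z][a][z2] * v[z2] for z2 in states) + mu_norm * v[z])
                / (mu_norm + rho)
                for a in actions[z]
            )
            for z in states
        }
        if max(abs(w[z] - v[z]) for z in states) < tol:
            return w
        v = w

# usage on a two-state example: "risky" pays more but jumps to an absorbing state
states = [0, 1]
actions = {0: ["safe", "risky"], 1: ["stay"]}
g = {0: {"safe": 1.0, "risky": 2.0}, 1: {"stay": 0.0}}
mu = {0: {"safe": {0: 0.0, 1: 0.0}, "risky": {0: -1.0, 1: 1.0}},
      1: {"stay": {0: 0.0, 1: 0.0}}}
print(value_iteration(states, actions, g, mu, rho=0.5))
```

On this example the iteration returns approximately $(2,0)$: playing "safe" forever yields $1/\rho=2$, which dominates the risky action.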

Covariance properties. In the case where $\rho_i=\rho$ for every $i$, an alternative interpretation of the $\rho$-discounted game is as a game with an uncertain duration $d$ in which each player maximizes his expected total payoff. Under this interpretation, $\rho$ is a parameter of the distribution of the random duration $d$, $P(d\ge t)=e^{-\rho t}$. Multiplying $\rho$ by $\alpha$ means that the game terminates faster at a rate of $\alpha$. This can be compensated by multiplying all transitions and payoffs by $\alpha$ as well. Therefore, any stationary $\rho$-discounted equilibrium in the game $\langle N,S,A,\mu,g\rangle$ is also a stationary $\alpha\rho$-discounted equilibrium in the game $\langle N,S,A,\alpha\mu,\alpha g\rangle$. In addition, if $\bar p$ is the transition matrix such that $\bar p(z'\mid z,a)=\frac1{1-\rho}\mu(z',z,a)$ for all $z'\ne z$ and $\bar p(z\mid z,a)=1+\frac1{1-\rho}\mu(z,z,a)$, then the equations defining a fixed point $(x,V)$ of the correspondence $F$ (used in the proof of the theorem) are equivalent$^9$ to the equations defining a stationary equilibrium of the discrete-time stochastic game with stage payoff function $g$ and transition probabilities $\bar p$.
Formally, and more generally, fix $\alpha>0$. Note that the auxiliary functions $G^i_z[v]$ of the $\rho$-discounted game $\Gamma=\langle N,S,A,\mu,g\rangle$ and of the $\alpha\rho$-discounted game $\Gamma'=\langle N,S,A,\alpha\mu,\alpha g\rangle$ are the same. Therefore, a point $(x,V)\in\times_{z\in S,i\in N}\big(X^i(z)\times[-\|g^i\|/\rho_i,\|g^i\|/\rho_i]\big)$ is a stationary equilibrium (strategies and payoffs) of the continuous-time $\rho$-discounted game $\Gamma=\langle N,S,A,\mu,g\rangle$ if and only if $(x,V)$ is a stationary equilibrium of the continuous-time $\alpha\rho$-discounted game $\Gamma'=\langle N,S,A,\alpha\mu,\alpha g\rangle$.
In addition, if $0<\rho<1$ and $\|\mu\|\le1-\rho$, then a point $(x,V)\in\times_{z\in S,i\in N}\big(X^i(z)\times[-\|g^i\|/\rho_i,\|g^i\|/\rho_i]\big)$ is a stationary equilibrium (strategies and payoffs) of the continuous-time $\rho$-discounted game $\Gamma=\langle N,S,A,\mu,g\rangle$ if and only if it is a stationary equilibrium of the discrete-time $\rho$-discounted (where the discount factor is $1-\rho$) game $\bar\Gamma=\langle N,S,A,\bar p,g\rangle$, where $\bar p$ is the transition probability given by $\bar p(z'\mid z,a)=\frac1{1-\rho}\mu(z',z,a)$ for all $z'\ne z$.
The discounting minmax. The next theorem is used in the proof of Theorem 3, which is used (in the punishing phases) in the proof of the main result. It asserts that (1) the minmax $\bar V^i_\rho$ of player $i$ in the unnormalized $\rho$-discounted game exists, (2) the function $\rho\mapsto\bar V^i_\rho$ is semialgebraic,$^{10}$ and (3) there are minimaxing strategies that are stationary.
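As a quick check of the covariance property (a sketch using only the definitions in the proof above), the auxiliary operator is invariant under the rescaling $(\mu,g,\rho_i)\mapsto(\alpha\mu,\alpha g,\alpha\rho_i)$:
$$\frac{\alpha g^i(z,a)+\sum_{z'}\alpha\mu(z',z,a)v^i(z')+\alpha\|\mu\|\,v^i(z)}{\alpha\|\mu\|+\alpha\rho_i}
=\frac{g^i(z,a)+\sum_{z'}\mu(z',z,a)v^i(z')+\|\mu\|\,v^i(z)}{\|\mu\|+\rho_i}=G^i_z[v](a),$$
so the fixed points of $F$, and hence the stationary equilibria, coincide.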

Theorem 2. Fix a $\rho$-discounted continuous-time stochastic game (with finitely many states and actions) $\Gamma_\rho$ and a player $i\in N$.
a) The set of equations in the real variables $v(z)$, $z\in S$,
$$\rho v(z)=\min_{x^{-i}}\max_{x^i}\Big(g^i(z,x^{-i}\otimes x^i)+\sum_{z'\in S}\mu(z',z,x^{-i}\otimes x^i)\,v(z')\Big),\qquad(11)$$
where the minimum is over all $x^{-i}=(x^j)_{j\ne i}\in\times_{j\ne i}\Delta(A^j(z))$ and the maximum is over all $x^i\in\Delta(A^i(z))$, has a unique solution $\bar V^i_\rho$.
b) The function $\rho\mapsto\bar V^i_\rho$ is semialgebraic and $\|\bar V^i_\rho\|\le\|g\|/\rho$.
c) There are stationary strategies $\sigma^j$, $j\ne i$, such that for every strategy $\sigma^i$ of player $i$ and every initial state $z$ we have
$$E^z_{\sigma^{-i},\sigma^i}\Big(\int_0^s e^{-\rho t}g^i(z_t,x_t)\,dt+e^{-\rho s}\bar V^i_\rho(z_s)\Big)\ \le\ \bar V^i_\rho(z)\quad\forall s\ge0,\qquad\text{and}$$
$$E^z_{\sigma^{-i},\sigma^i}\int_0^\infty e^{-\rho t}g^i(z_t,x_t)\,dt\ \le\ \bar V^i_\rho(z).\qquad(12)$$

Proof. Let $Q$ be the map from $\mathbb R^S$ to itself that is given by $Qv(z)=\min_{x^{-i}}\max_{x^i}G^i_z[v](x^{-i},x^i)$, where
$$G^i_z[v](x^{-i},x^i)=\frac1{\|\mu\|+\rho}\Big(g^i(z,x^{-i}\otimes x^i)+\sum_{z'\in S}\mu(z',z,x^{-i}\otimes x^i)\,v(z')+\|\mu\|\,v(z)\Big).$$
$Q$ is a strict contraction and therefore has a unique fixed point $V$.
A vector $V=(\bar V^i_\rho(z))_{z\in S}\in\mathbb R^S$ is a solution of $Qv=v$ if and only if it is a solution of equality (11). Therefore, equality (11) has a unique solution. This completes the proof of part a).
Observe that any solution $(\bar V^i_\rho(z))_{z\in S}$ of equality (11) obeys $|\bar V^i_\rho(z)|\le\|g\|/\rho$, and the semialgebraicity of equality (11) guarantees that the function that maps $\rho$ to the unique solution $(\bar V^i_\rho(z))_{z\in S}$ is semialgebraic. This completes the proof of part b).
Let $x^{-i}(h_t)=(x^j(h_t))_{j\ne i}$ be the minimizer of the right-hand side of (11) for $z=z_t$. For every $j\ne i$ we define the strategy $\sigma^j$ by $\sigma^j_t(h)=x^j(h_t)$. As in the proof of Theorem 1, it follows that for every strategy $\sigma^i$ we have

$^9$ This equivalence can be used to provide an alternative proof of the existence of a fixed point of $F$.
$^{10}$ See, e.g., Neyman (2003a) for the definition of semialgebraic functions, and their applications to stochastic games.

$$E^z_{\sigma^i,\sigma^{-i}}\int_0^\infty e^{-\rho t}g^i(z_t,x_t)\,dt\ \le\ V(z),$$
which completes the proof of the theorem. $\Box$
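For completeness, here is a minimal sketch (not part of the paper) of how $\bar V^i_\rho$ can be computed by iterating the contraction $Q$; it is restricted to two players so that the inner min-max is the value of a matrix game, obtained here by linear programming (SciPy's `linprog` is assumed available). All game data are illustrative placeholders.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum game in which the row player maximizes p^T M q."""
    m, n = M.shape
    c = np.r_[np.zeros(m), -1.0]                     # maximize t
    A_ub = np.c_[-M.T, np.ones(n)]                   # t - sum_i p_i M[i, j] <= 0
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)     # probabilities sum to 1
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[-1]

def minmax(states, Ai, Aj, g_i, mu, rho, iters=500):
    """g_i[z][(a, b)]: payoff rate to player i; mu[z][(a, b)][z2]: transition rate."""
    mu_norm = max(-mu[z][ab][z] for z in states for ab in mu[z])
    v = {z: 0.0 for z in states}
    for _ in range(iters):                           # iterate the contraction Q
        v = {
            z: matrix_game_value(np.array([
                [(g_i[z][(a, b)] + sum(mu[z][(a, b)][z2] * v[z2] for z2 in states)
                  + mu_norm * v[z]) / (mu_norm + rho) for a in Ai[z]]
                for b in Aj[z]
            ]).T)                                    # rows indexed by player i's actions
            for z in states
        }
    return v
```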

Theorem 2 implies that the function $\rho\mapsto v^i_\rho(z):=\rho\bar V^i_\rho(z)$ is a bounded semialgebraic function, and therefore it converges to a limit $v^i(z)$ as $\rho\to0^+$.

3.3. The undiscounted minmax

Fix a continuous-time stochastic game $\Gamma=\langle N,S,A,\mu,g\rangle$ with finitely many states and actions.
A correlated strategy $\sigma$ defines for every state $z\in S$ a unique probability distribution $P^z_\sigma$ on the plays of $\Gamma$. The expectation with respect to the probability distribution $P^z_\sigma$ is denoted by $E^z_\sigma$. For every player $i\in N$, we set
$$\gamma^i_s(z,\sigma):=E^z_\sigma\frac1s\int_0^s g^i_t\,dt,\quad\text{where }g^i_t:=g^i(z_t,x_t),$$
$$\bar\gamma^i(z,\sigma):=E^z_\sigma\limsup_{s\to\infty}\frac1s\int_0^s g^i_t\,dt,\qquad\text{and}\qquad\gamma^i(z,\sigma):=E^z_\sigma\liminf_{s\to\infty}\frac1s\int_0^s g^i_t\,dt.$$
Recall that the profile of strategies that is obtained by a unilateral deviation from a profile of admissible strategies (e.g., from a profile of discretimized strategies) is equivalent, in terms of the probability distribution that it defines on the plays, to a correlated strategy.
In the previous section, we showed that there are stationary minimaxing strategies in the $\rho$-discounted games. There are continuous-time stochastic games (as in the discrete-time case) for which Markov strategies can serve neither as approximate minimaxing strategies in the limiting-average game, nor as approximate minimaxing strategies in all sufficiently long finite-horizon games.
The next result shows that such minimaxing strategies exist in the class of discretimized strategies. Moreover, the discretimization can be done with the sequence of nonnegative integers. An $\mathbb N$-discretimized strategy is a $(t_k)_{k\ge0}$-discretimized strategy where $t_k=k$.

Theorem 3. For every $\varepsilon>0$ and every player $i$ there are $\mathbb N$-discretimized strategies $\sigma^j$, $j\ne i$, and a time $s_\varepsilon$ such that for every strategy $\sigma^i$ of player $i$ and every $s>s_\varepsilon$ we have
$$\gamma^i_s(z,\sigma^{-i},\sigma^i)\ \le\ v^i(z)+\varepsilon\qquad\text{and}\qquad\bar\gamma^i(z,\sigma^{-i},\sigma^i)\ \le\ v^i(z)+\varepsilon.$$

Proof. For every nonnegative integer $k\ge0$, let $r_k=\int_k^{k+1}g^i(z_t,x_t)\,dt$.
Fix a player $i\in N$. For every sequence $\rho_k$, $k\ge0$, where $\rho_{k+1}>0$ is a function of $h_{k+1}$ (e.g., a function of $\rho_k$, $z_{k+1}$, and $r_k$), there is by Theorem 2 a profile of strategies $\sigma^{-i}=(\sigma^j)_{j\ne i}$ of the players $N\setminus\{i\}$, so that for every strategy $\sigma^i$ of player $i$ we have
$$E^{h_k}_\sigma\Big(\rho_k\int_k^{k+1}e^{-\rho_k(t-k)}g^i_t\,dt+e^{-\rho_k}v^i_{\rho_k}(z_{k+1})\Big)\ \le\ v^i_{\rho_k}(z_k),$$
where $\sigma$ is the strategy profile $(\sigma^{-i},\sigma^i)$, and $E^{h_k}_\sigma f$ is the conditional expectation, given $h_k$, of the random variable $f$ w.r.t. the probability on plays defined by the strategy profile $\sigma$.
As $|g^i_t|=|g^i(z_t,x_t)|\le\|g\|$ and $\|v^i_\rho\|:=\max_{z\in S}|v^i_\rho(z)|\le\|g\|$, the inequality $\rho\int_0^1|1-e^{-\rho t}|\,dt+|1-\rho-e^{-\rho}|=2|1-\rho-e^{-\rho}|\le O(\rho^2)$ implies that
$$E^{h_k}_\sigma\big(\rho_kr_k+(1-\rho_k)v^i_{\rho_k}(z_{k+1})\big)\ \le\ E^{h_k}_\sigma\Big(\rho_k\int_k^{k+1}e^{-\rho_k(t-k)}g^i_t\,dt+e^{-\rho_k}v^i_{\rho_k}(z_{k+1})\Big)+O(\rho_k^2).$$
Therefore, for sufficiently small $\theta>0$, the inequality $\rho_k\le\theta$ implies that
$$E^{h_k}_\sigma\big(\rho_kr_k+(1-\rho_k)v^i_{\rho_k}(z_{k+1})\big)\ \le\ v^i_{\rho_k}(z_k)+\varepsilon(\theta)\rho_k,$$
where $\varepsilon(\theta)\to0$ as $\theta\to0^+$.
By the proof in Mertens and Neyman (1981) (or using the statement of Neyman, 2003b, Lemma 1) and the inequality $\frac1s\int_0^s g^i(z_t,x_t)\,dt\le\frac1{[s]}\int_0^{[s]}g^i(z_t,x_t)\,dt+O(\varepsilon)$, we deduce that there is a sequence of history-dependent (hence, random) discount rates $\rho_k$ (where $\rho_0$ is a constant and for a positive integer $k$ the discount rate $\rho_k$ is a function of $z_k$, $\rho_{k-1}$, and $r_{k-1}$) such that the corresponding strategy profile $\sigma^{-i}$ satisfies, for all $s$ sufficiently large and all strategies $\sigma^i$ of player $i$,
$$\gamma^i_s(z,\sigma^{-i},\sigma^i)\ \le\ v^i(z)+O(\varepsilon)\qquad\text{and}\qquad\bar\gamma^i(z,\sigma^{-i},\sigma^i)\ \le\ v^i(z)+O(\varepsilon).\qquad\Box$$

The choice of N in the construction of the N-discretimized minimaxing strategies is for convenience. In fact, for any increasing sequence (tk) of reals with t0 = 0 and, e.g., bounded difference tk+1 − tk, there are undiscounted minimaxing strategies that are (tk)-discretimized.

3.4. Uniform l-average equilibria

One of the central open problems in stochastic games is the existence of a uniform (or a limiting-average) equilibrium payoff in (discrete-time) stochastic games with finitely many states and actions. The present paper proves the existence of uniform and limiting-average equilibrium payoffs in all continuous-time stochastic games with finitely many states and actions.

Theorem 4. Let $\Gamma=\langle N,S,A,g,\mu\rangle$ be a continuous-time stochastic game with finitely many states and actions. There exists a vector payoff $u\in\mathbb R^{S\times N}$ such that for every $\varepsilon>0$ there are (discretimized) strategies $\sigma^i$, $i\in N$, and a positive number $s_0>0$, such that for every $s>s_0$, every player $i$, every state $z\in S$, and every strategy $\tau^i$ of player $i$, we have
$$-\varepsilon+\gamma^i_s(z,\sigma^{-i},\tau^i)\ \le\ u^i(z)\ \le\ \gamma^i_s(z,\sigma)+\varepsilon,\qquad\text{and}\qquad(13)$$
$$-\varepsilon+\bar\gamma^i(z,\sigma^{-i},\tau^i)\ \le\ u^i(z)\ \le\ \gamma^i(z,\sigma)+\varepsilon.\qquad(14)$$

Such a vector payoff u is called a uniform-l-average equilibrium payoff, and the corresponding strategy σ is called a uniform-l-average ε-equilibrium strategy. i A vector payoff u for which ∀ε > 0 ∃σ , s0 s.t. ∀s > s0, i, τ the system of inequalities (13) holds is called a uniform equilibrium payoff, and a vector payoff u for which ∀ε > 0 ∃σ s.t. ∀i, τ i the system of inequalities (14) holds is called a limiting-average equilibrium payoff. The corresponding strategies are called a uniform ε-equilibrium strategy and a limiting-average ε-equilibrium strategy. The proof of Theorem 4 is given in Section 4. The proof follows a similar outline to that used in Solan and Vieille (2002) for the proof of the existence of uniform extensive-form correlated equilibria in discrete-time stochastic games.

3.5. Robust equilibria

Given a correlated strategy profile $\sigma$ (which, as said earlier, defines for every state $z\in S$ a unique probability distribution $P^z_\sigma$ on the plays of the continuous-time stochastic game $\Gamma=\langle N,S,A,\mu,g\rangle$), a profile of discounting measures $w\in W^N$, and a player $i\in N$, we set
$$\gamma^i_w(z,\sigma):=E^z_\sigma\int_{[0,\infty)}g^i_t\,dw^i(t)+w^i(\infty)\,\gamma^i(z,\sigma)=E^z_\sigma\int_{[0,\infty]}g^i_t\,dw^i(t),$$
where $g^i_t=g^i(z_t,x_t)$ if $t<\infty$ and $g^i_\infty=\liminf_{s\to\infty}\frac1s\int_0^s g^i_t\,dt$, and
$$\bar\gamma^i_w(z,\sigma):=E^z_\sigma\int_{[0,\infty)}g^i_t\,dw^i(t)+w^i(\infty)\,\bar\gamma^i(z,\sigma)=E^z_\sigma\int_{[0,\infty]}\bar g^i_t\,dw^i(t),$$
where $\bar g^i_t=g^i_t$ if $t<\infty$ and $\bar g^i_\infty=\limsup_{s\to\infty}\frac1s\int_0^s g^i_t\,dt$.

Theorem 5. Let $\Gamma=\langle N,S,A,g,\mu\rangle$ be a continuous-time stochastic game with finitely many states and actions. For every $\varepsilon>0$, there is a finite set $\mathcal A$ and an open cover $(U_\alpha)_{\alpha\in\mathcal A}$ of $W^1_d$, such that for every $\alpha\in\mathcal A$ there is a vector payoff $u_\alpha\in\mathbb R^{S\times N}$ and a profile of (discretimized) strategies $\sigma_\alpha$, such that for every $u\in U_\alpha$, $\sigma_\alpha$ is an $\varepsilon$-equilibrium of $\Gamma_u$ with a payoff $\varepsilon$-close to $u_\alpha$; alternatively, such that for every player $i$, every state $z\in S$, and every strategy $\tau^i$ of player $i$ we have
$$-\varepsilon+\bar\gamma^i_u(z,\sigma^{-i}_\alpha,\tau^i)\ \le\ u^i_\alpha(z)\ \le\ \gamma^i_u(z,\sigma_\alpha)+\varepsilon.\qquad(15)$$

The concept of robust equilibria depends on the topology defined on $W$ (hence on $W_d$). Note that the coarser the topology is, the stronger the corresponding robustness result is.
A basis for the $w^*$-neighborhoods of $W^N$ are sets of the form $\{w=(w^i)_{i\in N}\in W^N:|\int_{[0,\infty]}f^i_j(t)\,dw^i(t)-c^i_j|<d^i_j\ \forall j\in J,\,i\in N\}$, where $J$ is a finite set, $f^i_j\in C([0,\infty])$ with $0\le f^i_j\le1$, and $c^i_j,d^i_j\in[0,1]$. Therefore, the robustness result is equivalent to the following statement.
For every $\varepsilon>0$, there is $\delta>0$ and a finite set $\mathcal A$, where $\alpha\in\mathcal A$ is a list $(f_\alpha,c_\alpha,v_\alpha,\sigma_\alpha)$ with $f_\alpha=(f^i_\alpha)_{i\in N}$ a vector of $[0,1]$-valued continuous functions on $[0,\infty]$, $\{c_\alpha:\alpha\in\mathcal A\}$ being a finite $\delta$-net of $[0,1]^N$, $v_\alpha\in\mathbb R^{S\times N}$, and $\sigma_\alpha$ an admissible strategy profile, such that for every $w\in W^1_d$ with $|\int_{[0,\infty]}f^i_\alpha(t)\,dw^i(t)-c^i_\alpha|<\delta$, $\sigma_\alpha$ is an $\varepsilon$-equilibrium of $\Gamma_w$ with a payoff $\varepsilon$-close to $v_\alpha$.
A crucial step in the proof of Theorem 5 is Theorem 4, which is of independent interest.

4. Proof of Theorem 4

Theorem 4 is straightforward if $\|\mu\|=0$. Indeed, if $\|\mu\|=0$, there are no state changes in a play; i.e., conditional on the initial state $z$ we have a continuous-time supergame. Therefore, if $\sigma$ is a stationary strategy profile with $\sigma(z)$ an equilibrium of the single-stage game with payoff function $a\mapsto g(z,a)$ and $u(z)=g(z,\sigma(z))$, then inequalities (13) and (14) hold for every $\varepsilon\ge0$. Therefore, we assume that $\|\mu\|>0$.
Without loss of generality we can assume that $0\le g^i\le1$ and $\|\mu\|=1$. Indeed, the continuous-time stochastic game $\Gamma=\langle N,S,A,\mu,g\rangle$ has a uniform-l-average equilibrium payoff if and only if $\bar\Gamma=\langle N,S,A,\bar\mu,\bar g\rangle$ does, where $\bar\mu=\alpha\mu$ and $\bar g^i=\beta_i+\alpha_ig^i$ for some (and therefore for all) $\alpha,\alpha_1,\ldots,\alpha_n>0$ and $\beta_1,\ldots,\beta_n\in\mathbb R$. Therefore, we assume that $0\le g^i(z,a)\le1$ and $\|\mu\|=1$.
In order to prove Theorem 4, it suffices to prove it with the $\varepsilon$-independent vector payoff $u$ replaced by an $\varepsilon$-dependent $u_\varepsilon$. Namely, it suffices to prove that $\forall\varepsilon>0$, $\exists u_\varepsilon,s_\varepsilon,\sigma_\varepsilon$ s.t. $\forall\,s_\varepsilon<s<\infty$, (13) and (14) hold when $u$, $s_0$, and $\sigma$ are replaced there by $u_\varepsilon$, $s_\varepsilon$, and $\sigma_\varepsilon$, respectively. Indeed, let $u$ be a limit point of $u_\varepsilon$; then $\exists\,\varepsilon_k\to_{k\to\infty}0$ s.t. $u_{\varepsilon_k}\to_{k\to\infty}u$. For $\varepsilon>0$, (13) and (14) hold with $s_0=s_{\varepsilon_k}$ and $\sigma=\sigma_{\varepsilon_k}$ and $k$ sufficiently large so that $\varepsilon_k<\varepsilon/2$ and $\|u-u_{\varepsilon_k}\|<\varepsilon/2$.
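The normalization can be checked directly; the following is an assumption-level sketch (not taken from the paper) of the time change $u=\alpha t$ underlying it. If $\bar\mu=\alpha\mu$ and $\bar g^i=\beta_i+\alpha_ig^i$, then along corresponding plays
$$\frac1s\int_0^s\bar g^i(z_{\alpha t},x_{\alpha t})\,dt=\beta_i+\alpha_i\,\frac1{\alpha s}\int_0^{\alpha s}g^i(z_u,x_u)\,du,$$
so long-run averages, and hence the inequalities (13) and (14), are preserved up to the affine map $u^i\mapsto\beta_i+\alpha_iu^i$.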

4.1. An informal outline of the proof

The proof constructs a profile of discretimized pure-action strategies τˆ that depends on ε > 0 and satisfies several properties. The discretimization is with respect to a refinement of a sequence of real numbers T = (t j) j≥0, where t0 = 0 and t j < t j+1 → j→∞ ∞. The approximate equilibrium strategy profile σ consists of following τˆ as long as no player deviates from playing ac- cording to τˆ. If no player deviates before time t j−1 and a single player deviates in the time interval [t j−1, t j), then all the other players switch at time t j to punishing the deviating player in the uniform-l-average game. The constructed strategy profile τˆ satisfies the following property. For any nonnegative integer j and s ≥ t j , the following inequality holds.

$$E^z_{\hat\tau}\Big(\frac1{s-t_j}\int_{t_j}^{s}g^i_t\,dt\;\Big|\;\mathcal H_{t_j}\Big)\ \ge\ E^z_{\hat\tau}\big(v^i(z_{t_j})\big)-O(\varepsilon)-O\Big(\frac1{s-t_j}\Big).\qquad(16)$$

ˆ [ i A deviation from τ in the time interval t j−1, t j) cannot change the expectation of v (zt j ) or the payoff of a player in this time interval by more than O (t j − t j−1). − ≤ i Therefore, if sup j(t j+1 t j) ε, then for any strategy τ of player i,

$$\gamma^i_s(z,\sigma^{-i},\tau^i)=E^z_{\sigma^{-i},\tau^i}\frac1s\int_0^s g^i_t\,dt\ \le\ \gamma^i_s(z,\hat\tau)+O(\varepsilon)+O\Big(\frac1s\Big).\qquad(17)$$

$$\gamma_s(z,\hat\tau)=E^z_{\hat\tau}\frac1s\int_0^s g_t\,dt\quad\text{converges as }s\to\infty.\qquad(18)$$

T = ˆ − The existence, for every ε > 0, of a profile of (t j) j≥0-discretimized pure-action strategies τ (with sup j(t j+1 t j) sufficiently small) that satisfies (16) and (18) guarantees the existence of a uniform equilibrium payoff. In order to prove the existence of a uniform-l-average equilibrium payoff, the constructed strategy τˆ will be such that, in addition to satisfying (16) and (18), it satisfies

$$\frac1s\int_0^s g_t\,dt\quad\text{converges as }s\to\infty\text{ with }P^z_{\hat\tau}\text{-probability }1,\qquad(19)$$
and
$$\lim_{s\to\infty}\gamma_s(z,\hat\tau)=E^z_{\hat\tau}\lim_{s\to\infty}\frac1s\int_0^s g_t\,dt.\qquad(20)$$
The pure-action strategy $\hat\tau$ is derived by a "purification" of a $\mathcal T$-discretimized correlated strategy $\tilde\tau$.
If $\tilde\tau$ satisfies conditions (16)–(20), then, by Lemma 2, if in each time interval $[t_{j-1},t_j)$ the empirical distribution of the actions of the pure Markov strategy $\hat\tau$ is sufficiently close to the average of the mixed action profile of $\tilde\tau$, then $\hat\tau$ will satisfy these conditions with a possible error of $O(\sum_j(t_j-t_{j-1})^2)$. Therefore, the existence of a $\mathcal T$-discretimized correlated strategy $\tilde\tau$ that satisfies conditions (16)–(20) will imply the existence of a uniform-l-average equilibrium.
Let us mention that the existence of a continuous-time strategy that obeys conditions (16)–(20) follows easily from the existence of a stationary correlated strategy in the discrete-time game that obeys the discrete-time analogous conditions (16*)–(20*). Moreover, if $\tilde\tau$ is a discrete-time stationary correlated strategy, then conditions (18*), (19*), and (20*) hold, and condition (16*) holds if and only if condition (17*) holds. However, the existence of a continuous-time correlated Markov strategy that obeys conditions (16)–(20) does not follow easily from the existence of a correlated Markov strategy (in the discrete-time game) that obeys conditions (16*)–(20*). The existence of such a correlated Markov strategy in the discrete-time game is proved in Solan and Vieille (2002). Therefore, we cannot use the result of Solan and Vieille (2002) directly. However, we follow similar ideas to those used in Solan and Vieille (2002).
The strategy $\hat\tau$ is constructed in a sequence of steps. First, one defines a strategy profile $\tau$ that is discretimized w.r.t. the sequence $\mathcal T=(j/\ell)_{j\ge0}$, where $\ell$ is a sufficiently large integer, and a sufficiently large positive integer $N_0$, such that
$$E^z_\tau v^i(z_T)\ge v^i(z)-\varepsilon_1/2\quad\text{for every finite }(\mathcal H_t)_{t\ge0}\text{-stopping time }T\qquad(21)$$
and

$$\gamma^i_{N_0}(z,\tau)\ge v^i(z)-\varepsilon_1/2,\qquad(22)$$
where $\varepsilon_1$ is sufficiently small. See Section 4.4.
The second step defines a correlated strategy $\tilde\tau$ that is discretimized w.r.t. the sequence $\mathcal T=(j/\ell)_{j\ge0}$ and (1) coincides with $\tau$ if the initial state is in some subset $S_2$ of states, where the set $S_2$ depends on $\tau$, (2) coincides with a profile of stationary strategies $y$ if the initial state is in some subset $\bar S$ of states, where $y$ and $\bar S$ depend on the stochastic game (but not on $\tau$), and (3) coincides with a discretimized correlated strategy whose correlated mixed action is close to that of $\tau$ but avoids certain possible transitions if the initial state is in the complement $S_1$ of $\bar S\cup S_2$. See Section 4.7.
The third step "approximates" the correlated strategy $\tilde\tau$ by a pure-action strategy $\bar\tau$. The "approximation" is such that in any time interval of the form $[\frac k\ell,\frac{k+1}\ell)$, where $k$ is a nonnegative integer, the empirical distribution of the pure actions of $\bar\tau$ is within $O(1/\ell)$ of the average of the correlated mixed actions of $\tilde\tau$. The "approximation" enables us to use Lemma 2 in order to derive properties of $\bar\tau$ from the corresponding properties of $\tilde\tau$. See Section 4.8.
Finally, one defines the pure-action strategy $\hat\tau$ that is played in blocks of play-dependent random duration that are $\le N_0$. At the start of each block, $\hat\tau$ restarts with the strategy $\bar\tau$. The periodicity of $\hat\tau$ guarantees that $\hat\tau$ obeys properties (18)–(20). See Section 4.8.
The approximate equilibrium strategy $\sigma$ follows $\hat\tau$ until the first time $t=j/\ell$, with $j$ a positive integer, at which a deviation by a single player was observed, and thereafter all other players punish the deviating player in the uniform-l-average game. See Section 4.10.

4.2. A few auxiliary functions of $\Gamma$

In this subsection we define $v^i_\rho$, $v$, $C_z$, $w$, and $d>0$, as functions of $\Gamma$. In short, $v^i_\rho(z)$ is a payoff level such that the other players can force the expected payoff of player $i$ in the normalized $\rho$-discounted game that starts at state $z$ to be no more than this level; $v^i(z)$ is the limit of $v^i_\rho(z)$ as $\rho\to0$; $C_z$ is the subset of states $z'$ with $v(z')=v(z)$ (namely, $v^i(z')=v^i(z)$ for all $i$); $w\in\mathbb R^S$ is a positive linear combination of the vectors $v^i\in\mathbb R^S$, with $0\le w(z)\le1$ and such that $v(z)\ne v(z')$ implies that $w(z)\ne w(z')$ (and therefore $w(z)=w(z')$ whenever $v(z)=v(z')$); and $d>0$ is a positive constant such that $w(z)\ne w(z')$ implies that $|w(z)-w(z')|>d$.

Definition of $v^i_\rho$ and $v^i$. Let $V^i_\rho=(V^i_\rho(z))_{z\in S}\in\mathbb R^S$ be the (unique) solution $V$ of the system of equations
$$\rho V(z)=\min_{x^{-i}}\max_{x^i}\Big(g^i(z,x^{-i}\otimes x^i)+\sum_{z'\in S}\mu(z',z,x^{-i}\otimes x^i)\,V(z')\Big),\qquad z\in S,$$
where the minimum ranges over all $x^{-i}=(x^j)_{j\ne i}\in X^{-i}(z):=\times_{j\ne i}\Delta(A^j(z))$, and the maximum ranges over all $x^i\in X^i(z)$.
Set $v^i_\rho=\rho V^i_\rho$. The function $\rho\mapsto v^i_\rho(z)$ is a bounded semialgebraic function, and therefore it converges to a limit $v^i(z)$ as $\rho\to0^+$. The assumption $0\le g^i\le1$ implies that $0\le v^i_\rho(z)\le1$, and therefore $0\le v^i(z)\le1$, for every $\rho>0$, $i\in N$, and $z\in S$.

Definition of $C_z$. For every state $z\in S$ let $C_z=\{z'\in S\text{ s.t. }v(z')=v(z)\}$.

Definition of $w$ and $d$. For $i\in N$ with $\max_{z\in S}v^i(z)>\min_{z\in S}v^i(z)$, we set $d_i=\min\{v^i(z)-v^i(z'):z,z'\in S,\text{ and }v^i(z)>v^i(z')\}$. For $i\in N$ with $v^i(z)=v^i(z')$ for every $z,z'\in S$, we set $d_i=0$. By renaming the players, we can assume without loss of generality that $N=\{1,2,\ldots,n\}$ and $d_i\ge d_{i+1}$ for $1\le i<n$. We will see that if $d_1=0$ (and then $v(z)=v(z')$ for every $z,z'\in S$) the theorem follows easily. So assume that $d_1>0$ and let $i_0$ be the maximal positive integer that is less than or equal to $n$ and with $d_{i_0}>0$.
For $z\in S$, we denote by $v(z)$ the vector $(v^i(z))_{i\in N}\in\mathbb R^N$. We define a $[0,1]$-valued function $w$ on $S$ and a positive number $d>0$ such that for every two states $z,z'\in S$ and player $i_1\in N$, we have
$$v(z)\ne v(z')\ \Longrightarrow\ |w(z)-w(z')|\ge d,\qquad\text{and}\qquad(23)$$
$$v^i(z)=v^i(z')\ \forall i<i_1\ \text{and}\ v^{i_1}(z)>v^{i_1}(z')\ \Longrightarrow\ w(z)>w(z').\qquad(24)$$
For example, the function $w(z)=\sum_{i=1}^{i_0}2^{-i}\big(\prod_{j\le i}d_j\big)v^i(z)$ is $[0,1]$-valued and satisfies (24), and if $d=\min\{w(z)-w(z'):w(z)>w(z')\}$, then $d\ge2^{-n}\prod_{j\le i_0}d_j>0$, and the function $w$ and the positive constant $d$ satisfy (23) and (24).

4.3. The auxiliary mixed actions and their basic properties

In this subsection we first select profiles of mixed actions $x(z,\rho)\in X(z)$, $z\in S$, that depend on the vector $\rho=(\rho^i)_{i\in N}$ of individual discount rates, and $X^*(z)\subset X(z)$ consists of all limit points of $x(z,\rho)$ as $\rho\to0$.
The subset of states $\bar S$ is defined as all states $z$ for which there is a stationary strategy $y$, $z\mapsto y(z)\in X^*(z)$, such that $P^z_y(z_1\notin C_z)>0$.
We select a stationary strategy $y:z\mapsto y(z)\in X^*(z)$ and a positive number $0<\delta<1$ that satisfy
$$P^z_y(z_1\notin C_z)>4\delta\qquad\forall z\in\bar S.\qquad(25)$$

Definition of $x(z,\rho)$. For every $\rho=(\rho^i)_{i\in N}\in(0,\infty)^N$, let $x(z,\rho)$, $z\in S$, be an equilibrium of the single-stage game with the payoff function to player $i$
$$a\ \mapsto\ \rho^ig^i(z,a)+\sum_{z'\in S}\mu(z',z,a)\,v^i_{\rho^i}(z').$$
Then, using the definition of $v^i_{\rho^i}$ and the fact that $x(z,\rho)$ is a profile of mixed actions and therefore $x^{-i}(z,\rho)\in X^{-i}(z)$, we have
$$\rho^iv^i_{\rho^i}(z)=\min_{x^{-i}}\max_{x^i}\Big(\rho^ig^i(z,x^{-i}\otimes x^i)+\sum_{z'\in S}\mu(z',z,x^{-i}\otimes x^i)\,v^i_{\rho^i}(z')\Big)$$
$$\le\ \max_{x^i}\Big(\rho^ig^i(z,x^{-i}(z,\rho)\otimes x^i)+\sum_{z'\in S}\mu(z',z,x^{-i}(z,\rho)\otimes x^i)\,v^i_{\rho^i}(z')\Big)$$
$$=\ \rho^ig^i(z,x(z,\rho))+\sum_{z'\in S}\mu(z',z,x(z,\rho))\,v^i_{\rho^i}(z'),\qquad(26)$$
where the minimum is over all $x^{-i}\in X^{-i}(z):=\times_{i\ne j\in N}\Delta(A^j(z))$ and the maximum is over all $x^i\in X^i(z):=\Delta(A^i(z))$.
Definition of $X^*(z)$. For every $\theta>0$ let $X(z,\theta)\subset X(z)$ be the closure of the set $\{x(z,\rho):\max_{j\in N}\rho^j<\theta\}$. Let $X^*(z)\subset X(z)$ be the intersection of the sets $X(z,\theta)$, namely, $X^*(z):=\cap_{\theta>0}X(z,\theta)$.
As the sets $X(z,\theta)$ and $X(z)$ are nonempty and compact, and $X(z,\theta)\subseteq X(z,\theta')$ whenever $0<\theta<\theta'$, the set $X^*(z)$ is nonempty and compact.

Note that if $y\in X^*(z)$ then there is a sequence $(\rho(k))$ s.t. $\|\rho(k)\|\to_{k\to\infty}0$ and $x(z,\rho(k))\to_{k\to\infty}y$. Therefore, by passing to the limit in inequality (26), for every $z\in S$, $y\in X^*(z)$, and $i\in N$, we have
$$\sum_{z'\in S}\mu(z',z,y)\,v^i(z')\ \ge\ 0.\qquad(27)$$
Definition of $\bar S$, $y$, and $\delta$. The set of states $\bar S$, the stationary strategy profile $y:z\mapsto y(z)\in X^*(z)$, and $0<\delta<1$ are such that
$$P^z_y(z_1\notin C_z)>4\delta\quad\forall z\in\bar S,\qquad\mu\big((S\setminus C_z)\cup\bar S,z,x\big)=0\quad\forall z\notin\bar S,\ x\in X^*(z).$$
An explicit construction of $\bar S$, $y$, and $\delta$ is as follows.
The set $\bar S$ is the set of all states $z$ for which there is a sequence $z=z_0,y_1,z_1,\ldots,y_k,z_k$ such that $v(z_k)\ne v(z)$, $y_j\in X^*(z_{j-1})$, and $\mu(z_j,z_{j-1},y_j)>0$, and $y:z\mapsto y(z)\in X^*(z)$ is a stationary strategy such that for every state $z\in\bar S$ there is a sequence $z=z_0,z_1,\ldots,z_k$ such that $v(z_k)\ne v(z)$ and $\mu(z_j,z_{j-1},y(z_{j-1}))>0$.
An alternative, longer but somewhat more explicit, definition can be obtained by defining the sequence of disjoint subsets of states $\bar S_k$ with $z\in\bar S_k$ iff $k$ is the minimal positive integer for which there is a sequence $z=z_0,y_1,z_1,\ldots,y_k,z_k$ such that $v(z_k)\ne v(z)$, $y_j\in X^*(z_{j-1})$, and $\mu(z_j,z_{j-1},y_j)>0$. In that case,
$$\bar S_1:=\{z:\exists y\in X^*(z)\text{ s.t. }\mu(S\setminus C_z,z,y)>0\},$$
and for every $z\in\bar S_1$ we select an element $y(z)\in X^*(z)$ with $\mu(S\setminus C_z,z,y(z))>0$. Inductively, for $k\ge1$,
$$\bar S_{k+1}:=\{z\notin\cup_{j\le k}\bar S_j:\exists y\in X^*(z)\text{ s.t. }\mu(\bar S_k,z,y)>0\},$$
and for every $z\in\bar S_{k+1}$ we select an element $y(z)\in X^*(z)$ with $\mu(\bar S_k,z,y(z))>0$. We set $\bar S=\cup_{k=1}^{|S|}\bar S_k$.
Next, we show that $\bar S$ is a proper subset of $S$. Let $z\in\arg\max_{z'\in S}w(z')$, i.e., $w(z)\ge w(z')$ $\forall z'\in S$. By inequality (27) and implication (24), we deduce that (1) $w(z)<\max_{z'\in S}w(z')$ for every $z\in\bar S_1$, and (2) $w(z)<\max_{z'\in\bar S_k}w(z')$ for every $z\in\bar S_{k+1}$. Therefore, any state $z\in\arg\max_{z'\in S}w(z')$ (i.e., $w(z)\ge w(z')$ $\forall z'\in S$) is not in $\bar S$.
For every $z\in S\setminus\bar S$ we select an (arbitrary) element $y(z)\in X^*(z)$, and define the stationary strategy $y$ by $y:z\mapsto y(z)$.
As the probability $P^z_y$ that the state $z_1$ at time $1$ is not in $C_z$ is $>0$, i.e., $P^z_y(z_1\notin C_z)>0$, for every $z\in\bar S$, we can select $\delta>0$ such that $P^z_y(z_1\notin C_z)>4\delta$ for all $z\in\bar S$.
Note that as $y(z)\in X^*(z)$ for every $z\in S$, inequality (27) implies that $E^z_yv(z_1)\ge v(z)$ $\forall z\in S$.
Definition of $B(z)$. For every $z\in S$ let
$$B(z):=\{a\in A(z)\text{ s.t. }\mu((S\setminus C_z)\cup\bar S,z,a)>0\}.$$
Fix $z\notin\bar S$. By the definition of $\bar S$, for every $x\in X^*(z)$, we have $\mu((S\setminus C_z)\cup\bar S,z,x)=0$. Therefore, for any action profile $a$ in the support of $x\in X^*(z)$, we have $\mu((S\setminus C_z)\cup\bar S,z,a)=0$. As $X^*(z)$ is nonempty, we deduce that for any $z\notin\bar S$, $B(z)$ is a proper subset of $A(z)$.
Therefore, for $z\notin\bar S$, for every $x\in X(z)$ there is a point $x^*\in X(z)$ with $x^*(B(z))=0$ and $x^*(a)\ge x(a)$ for all $a\in A(z)\setminus B(z)$. The norm distance between $x$ and $x^*$, $\|x-x^*\|:=\sum_{a\in A(z)}|x^*(a)-x(a)|$ $\big(=x^*(A(z)\setminus B(z))-x(A(z)\setminus B(z))+\sum_{a\in B(z)}|x^*(a)-x(a)|\big)$, is equal to $2x(B(z))$.
Definition of $\xi$. Let $\xi>0$ be a positive constant such that $\mu(z',z,a)>0$ implies that $\mu(z',z,a)>\xi$.
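The explicit construction of $\bar S$ is a finite reachability computation. The following minimal sketch (not from the paper) implements the layers $\bar S_1,\bar S_2,\ldots$ described above, assuming each $X^*(z)$ is represented by a finite list of mixed action profiles, `mu(z2, z, x)` gives transition rates under a mixed profile, and `C[z]` is the set $\{z':v(z')=v(z)\}$; all names are illustrative.

```python
def build_S_bar(states, Xstar, mu, C):
    S_bar, S_k = set(), set()
    # S̄_1: states that can leave C_z with positive rate under some x in X*(z)
    for z in states:
        if any(any(mu(z2, z, x) > 0 for z2 in states if z2 not in C[z]) for x in Xstar[z]):
            S_k.add(z)
    S_bar |= S_k
    # S̄_{k+1}: states outside previous layers that reach S̄_k with positive rate
    while S_k:
        S_next = set()
        for z in states:
            if z in S_bar:
                continue
            if any(any(mu(z2, z, x) > 0 for z2 in S_k) for x in Xstar[z]):
                S_next.add(z)
        S_bar |= S_next
        S_k = S_next
    return S_bar
```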

4.4. The strategy τ and the time N0

Lemma 3. For every $1>\varepsilon_1>0$, $\alpha_0>0$, and positive integer $\ell>8/\varepsilon_1$, there are functions $(z_{k/\ell})_{k=0}^{k=j\ell}\mapsto\rho_j\in(0,\alpha_0)^N$ (where $k$ and $j$ are nonnegative integers), and a sufficiently large positive integer $N_0$, such that if $\tau$ is the profile of strategies with $\tau(h,t)=x(z_t,\rho_{[t]})$, where $[t]$ is the largest integer that is $\le t$, then for all $z\in S$ and $i\in N$,
$$E^z_\tau v^i(z_T)\ \ge\ v^i(z)-\varepsilon_1/2\quad\text{for every finite }(\mathcal H_t)_{t\ge0}\text{-stopping time }T\qquad(28)$$
and
$$\gamma^i_{N_0}(z,\tau)\ \ge\ v^i(z)-\varepsilon_1/2.\qquad(29)$$

Proof. For every fixed vector $\rho=(\rho^i)_{i\in N}\in(0,1)^N$, the stationary strategy $\alpha(\rho):z\mapsto x(z,\rho)$ satisfies
$$E^z_{\alpha(\rho)}\Big(\rho^i\int_0^te^{-\rho^is}g^i(z_s,x_s)\,ds+e^{-\rho^it}v^i_{\rho^i}(z_t)\Big)\ \ge\ v^i_{\rho^i}(z)\qquad\forall t\ge0.\qquad(30)$$

Let $\varepsilon_2\le\varepsilon_1/2$; e.g., $\varepsilon_2=\varepsilon_1/2$. Following the proof in Mertens and Neyman (1981), we select an integrable function $\psi:(0,1]\to\mathbb R_+$ such that $\big\|\frac{dv_\rho}{d\rho}(x)\big\|\le\psi(x)$ $\forall x\in(0,1]$ (equivalently, $\big|\int_\rho^{\bar\rho}\psi(x)\,dx\big|\ge\|v^i_\rho-v^i_{\bar\rho}\|$ $\forall\,0<\rho,\bar\rho\le1$ and $i\in N$), and define the function $s:(0,1]\to[1,\infty)$ by
$$s(y)=\frac{12}{\varepsilon_2}\int_y^1\frac{\psi(x)}{x}\,dx+y^{-1/2}.$$
The function $s$ is strictly decreasing and onto, and its inverse function is denoted by $\lambda$.
Letting $\tilde z_j=z_{j/\ell}$ and $\tilde{\mathcal H}_j=\mathcal H_{j/\ell}$, we set

• $s^i_0\ge M$ with $M$ sufficiently large,
• $\rho^i_j=\lambda(s^i_j)$,
• $u^i_j=v^i_{\rho^i_j}$ and $y^i_j=E^z_{\alpha(\rho_j)}\big(\ell\int_{j/\ell}^{(j+1)/\ell}g^i(z_t,x_t)\,dt\mid\tilde{\mathcal H}_j\big)$, and
• $s^i_{j+1}=\max\{M,\,s^i_j+y^i_j-u^i_j(\tilde z_{j+1})+4\varepsilon_2\}$.

The inequalities that follow depend on a fixed player $i$ and for ease of notation we suppress the superscript $i$. Inequality (30) implies, as in the proof of Theorem 3, that for every $\theta>0$ there is $\alpha_0$ sufficiently small such that for $\rho^i_j<\alpha_0$ and $\lambda_j=\rho^i_j/\ell$, we have
$$E^z_{\alpha(\rho_j)}\big(\lambda_jy_j+(1-\lambda_j)u_j(\tilde z_{j+1})\mid\tilde{\mathcal H}_j\big)\ \ge\ u_j(\tilde z_j)-\theta\lambda_j.$$
Equivalently,
$$E^z_{\alpha(\rho_j)}\big(u_j(\tilde z_{j+1})-u_j(\tilde z_j)+\lambda_j(y_j-u_j(\tilde z_{j+1}))\mid\tilde{\mathcal H}_j\big)\ \ge\ -\theta\lambda_j.\qquad(31)$$
Note the similarity between this last inequality and inequality (2.1) in Mertens and Neyman (1981). One minor difference, however, is the added term $-\theta\lambda_j$ on the right-hand side of the inequality. Another one is that $y_j$ depends on $\rho_j$ and is thus measurable with respect to $\tilde{\mathcal H}_j$, while in inequality (2.1) in Mertens and Neyman (1981) the corresponding term is not a function of the past. We first observe the following:

$$|s_{j+1}-s_j|\le6\qquad(32)$$
$$y_j-u_j(\tilde z_{j+1})+4\varepsilon_2\ \le\ s_{j+1}-s_j\ \le\ y_j-u_j(\tilde z_{j+1})+4\varepsilon_2+2\cdot\mathbf 1_{s_{j+1}=M}.\qquad(33)$$
Since for every $1>\eta>0$, $s((1-\eta)y)-s(y)\to_{y\to0^+}\infty$, (32) implies that there is $M_0$ such that for every $M\ge M_0$
$$|\rho^i_{j+1}-\rho^i_j|\le\varepsilon_2\rho^i_j/6.\qquad(34)$$
Thus for $M\ge M_0$,
$$6\ \ge\ |s_{j+1}-s_j|\ \ge\ \frac{12}{\varepsilon_2}\Big|\int_{\rho^i_j}^{\rho^i_{j+1}}\frac{\psi(x)}{x}\,dx\Big|\ \ge\ \frac6{\varepsilon_2\rho^i_j}\Big|\int_{\rho^i_j}^{\rho^i_{j+1}}\psi(x)\,dx\Big|\ \ge\ \frac6{\varepsilon_2\rho^i_j}\,\big\|v_{\rho^i_{j+1}}-v_{\rho^i_j}\big\|.$$
Hence,
$$\big\|v^i_{\rho^i_{j+1}}-v^i_{\rho^i_j}\big\|\ \le\ \varepsilon_2\rho^i_j.\qquad(35)$$
By the mean value theorem and inequality (33),
$$\int_{s_j}^{s_{j+1}}\lambda(y)\,dy=\int_{s_j}^\infty\lambda(y)\,dy-\int_{s_{j+1}}^\infty\lambda(y)\,dy\ \ge\ \rho^i_j(s_{j+1}-s_j)-\varepsilon_2\rho^i_j\ \ge\ \rho^i_j\big(y_j-u_j(\tilde z_{j+1})\big)+3\varepsilon_2\rho^i_j.\qquad(36)$$

Replacing, in (31), $u_j(\tilde z_{j+1})$ by the larger number $u_{j+1}(\tilde z_{j+1})+\varepsilon_2\rho^i_j$ (see (35)), $\rho^i_j(y_j-u_j(\tilde z_{j+1}))$ by the larger number $\int_{s_j}^\infty\lambda(y)\,dy-\int_{s_{j+1}}^\infty\lambda(y)\,dy-3\varepsilon_2\rho^i_j$ (see (36)), and letting $Y_j=u_j(\tilde z_j)-\int_{s_j}^\infty\lambda(y)\,dy$, we have (for $\theta<\varepsilon_2$)
$$E_\tau(Y_{j+1}-Y_j\mid\tilde{\mathcal H}_j)\ \ge\ \varepsilon_2\rho^i_j.\qquad(37)$$
$Y_j$ is thus a bounded ($|Y_j|\le1$) submartingale (by (37)), and therefore for any $\tilde{\mathcal H}_j$-stopping time $T$, $E_\tau Y_T\ge Y_0$.
By summing the right-hand side inequalities of (33) for $0\le j<n$, we have
$$\sum_{j<n}\big(y_j-u_j(\tilde z_{j+1})\big)\ \ge\ s_n-s_0-2\sum_{j<n}\mathbf 1_{s_{j+1}=M}-4n\varepsilon_2.\qquad(38)$$
For sufficiently large $M$, we have $E^z_\tau u_j(\tilde z_{j+1})\ge v^i(z)-\varepsilon_2$ and
$$E^z_\tau\sum_{j<n}\mathbf 1_{s_{j+1}=M}\ \le\ \frac1{\varepsilon_2\lambda(M_0)}E^z_\tau\sum_{j<n}\varepsilon_2\rho^i_j\ \le\ \frac1{\varepsilon_2\lambda(M_0)}E^z_\tau(Y_n-Y_0).$$
Therefore, for all sufficiently large $n$, $E^z_\tau\frac1n\sum_{j<n}y^i_j\ge v^i(z)-6\varepsilon_2$, and hence, for all sufficiently large $N_0$,
$$\gamma^i_{N_0}(z,\tau)=E^z_\tau\frac1{N_0}\int_0^{N_0}g^i(z_t,x_t)\,dt\ \ge\ v^i(z)-6\varepsilon_2.$$
This completes the proof of inequality (29).
For every real number $x$, let $\lceil x\rceil$ denote the smallest integer that is $\ge x$.
Note that if $T$ is an $\mathcal H_t$-stopping time, then $\lceil\ell T\rceil$ is an $\tilde{\mathcal H}_j$-stopping time. As $Y_j$ is a submartingale, we have $E^z_\tau Y_{\lceil\ell T\rceil}\ge Y_0$. Therefore, as $|Y_j-v^i(\tilde z_j)|<\varepsilon_2/16$ for $M$ sufficiently large, we deduce that
$$E^z_\tau v^i(\tilde z_{\lceil\ell T\rceil})\ >\ v^i(z)-\varepsilon_2/8.$$
By Lemma 1 and the assumption $\|\mu\|=1$, we have $E\,|v^i(z_T)-v^i(\tilde z_{\lceil\ell T\rceil})|\le1/\ell\le\varepsilon_2/8$. Therefore,
$$E^z_\tau v^i(z_T)\ >\ v^i(z)-\varepsilon_2/4\ \ge\ v^i(z)-\varepsilon_1/2.$$
This completes the proof of inequality (28). $\Box$

4.5. An auxiliary lemma

Lemma 4. Let $C\subset S$ be a subset of states. Let $T=T_C:=\min_{s\ge t}\{s:z_s\in C\}$. Then for every $(s_k)_{k\ge0}$-discretimized correlated strategy $\sigma$ and $n>t$,
$$P_\sigma(t<T\le n\mid\mathcal H_t)=E_\sigma\Big(\int_t^{T\wedge n}\mu(C,z_s,x_s)\,ds\;\Big|\;\mathcal H_t\Big),\qquad(39)$$
where $T\wedge n:=\min(T,n)$.

Proof. Equality (39) holds trivially on $z_t\in C$. Assume that $z_t\notin C$. It suffices to prove the lemma for $t=0$. We provide two proofs. The first one approximates the probability that $T\le n$ by summing the approximations of the probabilities that $s<T\le s+\delta$, where $\delta>0$ is small and $s$ ranges over a grid of $n/\delta$ points. The error term of each approximation is $O(\delta^2)$. The sum of the approximations is the desired formula and the cumulative error term is $O(\delta)$. The second proof derives the formula without an approximation.
The mixed action profile $x_s$ that the $(s_k)_{k\ge0}$-discretimized correlated strategy $\sigma$ selects at time $s$ is a function of the random list $(s_k\wedge s,z_{s_k\wedge s})$, $k\ge0$.
For $s_j\le s<s_{j+1}$ we denote this action profile by $x_{j,s}$. Note that $x_{j,s}$ is measurable w.r.t. $\mathcal H^S_{s_j}$ and is defined for every $s\ge s_j$. It coincides with $x_s$ on $s_j\le s<s_{j+1}$, but may differ from $x_s$ on $s_{j+1}\le s$.
Given $s_j\le s<s_{j+1}$ and $T>s$, the conditional $P^z_\sigma$ probability, given $\mathcal H^S_s$, that $s<T\le s+\delta$ equals
$$\int_s^{s+\delta}\mu(C,z_{s_j},x_{j,t})\,e^{\int_s^t\mu(z_{s_j},z_{s_j},x_{j,u})\,du}\,dt.$$
If there is no state change in the interval $[s,s+\delta]$, then $x_{j,t}=x_t$ for every $s\le t\le s+\delta$. Therefore, given $s_j\le s<s_{j+1}$, the $P^z_\sigma$ probability that $x_t=x_{j,t}$ for every $s\le t\le s+\delta$ is at least $1-\delta\|\mu\|$.
Therefore, the above conditional probability, given $\mathcal H^S_s$, is, within $O(\delta^2)$, the conditional expectation, given $\mathcal H^S_s$, of $\int_s^{s+\delta}\mu(C,z_t,x_t)\,dt$.
Therefore, the $P^z_\sigma$ probability that $T\le n$ is, within $O(\delta)$, the expectation w.r.t. $P^z_\sigma$ of $\int_0^{T\wedge n}\mu(C,z_t,x_t)\,dt$. As this holds for any $\delta>0$, the result follows. $\Box$

Now we present the alternative proof.
Note that on $s_j<n$ the conditional expectation of the indicator of the event $s_j<T=s_{j+1}\le n$, given the history up to time $s_j$, equals $\int_{s_j}^n\mu(C,z_{s_j},x_{j,t})\,e^{\int_{s_j}^t\mu(z_{s_j},z_{s_j},x_{j,u})\,du}\,dt$. In symbols,
$$E_\sigma\big(\mathbf I(s_j<T=s_{j+1}\le n)\mid\mathcal H_{s_j}\big)=\int_{s_j}^n\mu(C,z_{s_j},x_{j,t})\,e^{\int_{s_j}^t\mu(z_{s_j},z_{s_j},x_{j,u})\,du}\,dt,$$
where $\mathbf I(*)$ denotes the indicator of the event $*$.
The term $e^{\int_{s_j}^t\mu(z_{s_j},z_{s_j},x_{j,u})\,du}$ equals the conditional probability of the event $t<s_{j+1}$, given the history up to time $s_j$. Therefore,
$$\int_{s_j}^n\mu(C,z_{s_j},x_{j,t})\,e^{\int_{s_j}^t\mu(z_{s_j},z_{s_j},x_{j,u})\,du}\,dt=\int_{s_j}^n\mu(C,z_{s_j},x_{j,t})\,E_\sigma\big(\mathbf I(t<s_{j+1})\mid\mathcal H_{s_j}\big)\,dt.$$
On $s_j\le t<s_{j+1}$, $z_{s_j}=z_t$ and $x_{j,t}=x_t$. Therefore,
$$\int_{s_j}^n\mu(C,z_{s_j},x_{j,t})\,E_\sigma\big(\mathbf I(t<s_{j+1})\mid\mathcal H_{s_j}\big)\,dt=E_\sigma\Big(\int_{s_j}^{s_{j+1}\wedge n}\mu(C,z_t,x_t)\,dt\;\Big|\;\mathcal H_{s_j}\Big).$$
We conclude that
$$E_\sigma\big(\mathbf I(s_j<T=s_{j+1}\le n)\mid\mathcal H_{s_j}\big)=E_\sigma\Big(\int_{s_j}^{s_{j+1}\wedge n}\mu(C,z_t,x_t)\,dt\;\Big|\;\mathcal H_{s_j}\Big).$$
Taking the expectation on both sides and summing over all nonnegative integers $j$, we have
$$P^z_\sigma(T\le n)=E^z_\sigma\Big(\sum_{j=0}^\infty\mathbf I(s_j<T=s_{j+1}\le n)\Big)=E^z_\sigma\Big(\int_0^{T\wedge n}\mu(C,z_t,x_t)\,dt\Big).\qquad\Box$$

4.6. The partitioning of the state space: the sets $S_1$ and $S_2$

The subset $S_1$ of $S\setminus\bar S$ is defined below so as to guarantee the existence of a correlated strategy $\tilde\tau$ such that for $z\in S_1$ (1) the distance between the $P^z_\tau$-distribution and the $P^z_{\tilde\tau}$-distribution of the state process up to time $N_0$ is small, (2) the difference between the expectation w.r.t. $P^z_\tau$ and the expectation w.r.t. $P^z_{\tilde\tau}$ of the accumulated payoff up to time $N_0$ is small, and (3) the state process remains in $C_z\setminus\bar S$ with $P^z_{\tilde\tau}$-probability 1. See Section 4.7.
Fix $\varepsilon_2>0$ ($\varepsilon_2=O(\varepsilon)$; e.g., $\varepsilon_2=\varepsilon\xi/2$ or $\varepsilon_2=\varepsilon$). The subset $S_1$ of $S\setminus\bar S$ depends on the constant $\varepsilon_2>0$ and the strategy $\tau$ as follows. Define the $(\mathcal H^S_t)_{t\ge0}$-stopping time $T$ by $T=\min\{t:z_t\in(S\setminus C_z)\cup\bar S\}$. Set
$$S_1=\{z\in S:P^z_\tau(T\le N_0)\le2\varepsilon_2\}\qquad\text{and}\qquad S_2=S\setminus(\bar S\cup S_1)=\{z\in S\setminus\bar S:P^z_\tau(T\le N_0)>2\varepsilon_2\}.$$
Note that $T=0$ on $z_0\in\bar S$. Therefore, by the definition of $S_1$, we have $S_1\cap\bar S=\emptyset$.

Lemma 5. For every $z\in S_1$,
$$E^z_\tau\Big(\int_0^{T\wedge N_0}x(z_t,\rho_{[t]})[B(z_t)]\,dt\Big)\ \le\ 2\varepsilon_2/\xi.\qquad(40)$$

Proof. It follows from the selection of $\xi>0$ that
$$\mu\big((S\setminus C_z)\cup\bar S,\,z_t,\,x(z_t,\rho_{[t]})\big)\ \ge\ \xi\,x(z_t,\rho_{[t]})[B(z_t)]\qquad\text{on }t<T.\qquad(41)$$

Therefore (using Lemma 4), if $z\in S_1$, then
$$2\varepsilon_2\ \ge\ P^z_\tau(T\le N_0)\ =\ E^z_\tau\Big(\int_0^{T\wedge N_0}\mu\big((S\setminus C_z)\cup\bar S,z_t,x(z_t,\rho_{[t]})\big)\,dt\Big)\ \ge\ E^z_\tau\Big(\int_0^{T\wedge N_0}\xi\,x(z_t,\rho_{[t]})[B(z_t)]\,dt\Big).\qquad(42)\quad\Box$$

4.7. Definition of the correlated strategy τ˜

We start by defining an auxiliary correlated strategy τ˜. ⎧ ⎪ ¯ ⎨y(zt ) if z0 ∈ S ˜ = ∈ τ (h,t) ⎪τ (h,t) if z0 S2 ⎩ ∗ x (zt , ρ[t]) if z0 ∈ S1, ∗ ∗ where x (zt , ρ[t]) is a point in {x ∈ X(zt ) : x(B(zt )) = 0} with x (zt , ρ[t]) − x(zt , ρ[t]) ≤ 2x(zt , ρ[t])[B(z)] if x(zt , ρ[t])[B(z)] > ∗ ¯ 0, and x (zt , ρ[t]) = x(zt , ρ[t]) otherwise. Recall that for z ∈/ S, B(z) is a proper subset of A(z), and therefore {x ∈ X(z) : x(B(z)) = 0} is nonempty. ∗ Note τ (h, t) = x(zt , ρ[t]) is a profile of mixed actions. However, x (zt , ρ[t]) need not be a profile of mixed actions. There- fore, the correlated strategy τ˜ need not be a profile of strategies. ¯ ¯ ∗ ¯ Recall that by the definition of S it follows that for every z ∈ S \ S and every x ∈ X (z), we have μ(S, z, x) = 0 and   = z∈S μ(z , z, x)v(z ) 0.

Lemma 6. For every z ∈ S , P z z ∈ C \ S¯ ∀0 ≤ t ≤ N = 1 and therefore P z z ∈ C \ S¯ = 1. 1 τ˜ ( t z 0) τ˜ ( N0 z )

Proof. Follows from the definition of τ˜ on z0 ∈ S1. 2

Lemma 7. For every z ∈ S , the norm distance between the P z -distribution and the P z -distribution of the state process z ≤ ≤ ∧ 1 τ τ˜ ( t )0 t T N0 is ≤ 4ε2/ξ .

Proof. By the definition of τ˜, τ˜(h, t) − τ (h, t) ≤ 2x(zt , ρ[t])[B(z)] on z0 = z ∈ S1. Therefore, the result follows from Lem- mas 12 (in Section 6) and 5. 2

Lemma 8. For every z ∈ S1,

N0 N0   z i ≥ z i − + ≥ i − − Eτ˜ gt dt Eτ gt dt 2(2N0 2)ε2/ξ N0 v (z) ε1/2 O (ε2) . 0 0

Proof. In the following sequence of equations, equality (43) follows from the fact that on the plays that are compatible ˜ i = i ∗  with the strategy τ we have gt g (zt , x (zt , ρ[t])). Inequality (44) follows from:

i (i) the inequality 0 ≤ g (zt , x(zt , ρ[t])) ≤ 1, i (ii) ρ[t] and (therefore also) g (zt , x(zt , ρ[t])) depend only on the state process in time s ∈[0, N0], and (iii) half the norm distance between the distributions defined by τ and τ˜ on this state process is (by Lemma 7) less than or equal to 2ε2/ξ .

Inequality (45) follows from the inequality

i ∗ i ∗ g (zt , x (zt , ρ[t])) ≥ g (zt , x(zt , ρ[t])) −x(zt , ρ[t]) − x (zt , ρ[t]). Inequality (46) follows from the equality

N0 T∧N0 N0 z   − ∗   = z   − ∗   + z   − ∗   Eτ x(zt , ρ[t]) x (zt , ρ[t]) dt Eτ x(zt , ρ[t]) x (zt , ρ[t]) dt Eτ x(zt , ρ[t]) x (zt , ρ[t]) dt

0 0 T ∧N0 and the inequality 120 A. Neyman / Games and Economic Behavior 104 (2017) 92–130

N0 z   − ∗   ≤ z Eτ x(zt , ρ[t]) x (zt , ρ[t]) dt Pτ (T < N0)2N0.

T ∧N0 Inequality (47) follows from:

z (i) the inequality P (T < N0) ≤ 2ε2 for z ∈ S1, τ ∗ (ii) the inequality x(zt , ρ[t]) − x (zt , ρ[t]) ≤ 2x(zt , ρ[t])[B(zt )], and ∧ z T N0  [ ] ≤ (iii) the inequality Eτ 0 2x(zt , ρ[t]) B(zt ) dt 4ε2/ξ .

Finally, inequality (48) follows from inequality (29).

N0 N0 z i = z i ∗  Eτ˜ gt dt Eτ˜ g (zt , x (zt , ρ[t])) dt (43) 0 0

N0 ≥ z i ∗  − Eτ g (zt , x (zt , ρ[t])) dt 2N0ε2/ξ (44) 0

N0 ≥ z i  − Eτ g (zt , x(zt , ρ[t])) dt 2N0ε2/ξ (45) 0

N0 − z   − ∗   Eτ x(zt , ρ[t]) x (zt , ρ[t]) dt 0

N0 ≥ z i − − z Eτ gt dt 2N0ε2/ξ Pτ (T < N0)2N0 (46) 0

T∧N0 − z   − ∗   Eτ x(zt , ρt ) x (zt , ρt ) dt 0

N0 ≥ z i − − + Eτ gt dt 4ε2 N0 ε2/ξ(4 2N0) (47) 0  i ≥ N0 v (z) − ε1/2 − O (ε2) . 2 (48)

4.8. The discretimization τ¯ of τ˜ and the strategy τˆ

Recall that conditional on h j , the strategy τ˜ on the time-interval [ j, j + 1) coincides with a stationary strategy, denoted by τ˜[h j] : z → τ˜[h j](z). We discretimize τ˜ into a pure-action strategy τ¯ as follows. Let  be a sufficiently large positive integer, and conditional  on z j := (z0, z1, ..., z j), j a nonnegative integer. The average of τ¯ on time intervals [ j + k/, j + (k + 1)/], k a positive ≤ ˜[ ] ˜ integer with 0 k <, is within O (1/) of τ h j – the average mixed action of τ over that interval. For example, define nonnegative integers z a ≥ ˜[h ] z [a] − 1, where z ∈ S and a ∈ A z , such that z a = . For B ⊂ A z we ( , )  τ j ( ) ( ) a∈A(z) ( , )  ( ) := ¯ ¯[ ] = ¯[ ] + 2 set (z, B) a∈B (z, a). Let τ be a pure-action strategy that conditional on h j , (1) τ h j (z, t) τ h j (z, j i/ ) if ≤ + 2 ≤ + + 2 ≤ + ≤ ∈ ∈ ¯[ ] + + 2 [ ] = j j i/ t < j (i 1)/ j 1, (2) for every 0 k <, z S, and a A(z), 0≤i< τ h j (z, j k/ i/ ) a ¯ [ ] = ∈ ¯  ∈ ⊂ ≤ (z, a), and (3) τ (ht ) B(zt ) 0on zt / S. Therefore, conditional on z j , for every z S, B A(z), and 0 k <, we have j+(k+1)/ ¯[h ](z, t)[B] dt = 1 |{0 ≤ i <: ¯[h ](z, j +k/ + i/2)[B] = 1}| ≥ (z, B)/2 ≥ ˜[h ](z)[B]/ − 1/2. Therefore, for j+k/ τ j 2 τ j τ j z ∈/ S¯ , j+(k+1)/ 1 τ˜[h j](z)[B]−(z, B)/  τ˜[h j](z) − τ¯[h j](z,t) dt=2 max  B⊂A(z)\B(z)  j+k/ ≤ 2|A(z)|/2 = O (1/2). (49) A. Neyman / Games and Economic Behavior 104 (2017) 92–130 121

 For two nonnegative integers k,  we denote by zk the sequence of states z0, z1/, z2/, ..., zk/. By Lemma 12 in Sec- tion 6, conditional on z0 ∈ S1,

|P z (z ) − P z (z )|≤N (4 + 2|A|)/ = O (1/), (50) τ˜ N0 τ¯ N0 0 z N0   where the sum is over all realizations z of the random sequence z = z0, z1 , z2 , ..., zN . N0 N0 / / 0

∈ z ∈ \ ¯ ∀ ≤ ≤ = Lemma 9. For every z S1, Pτ¯ (zt Cz S 0 t N0) 1, and

γ i (z, τ¯) ≥ vi(z) − ε − O (ε ) ∀ sufficiently large. N0 1 2

¯ Proof. The first assertion follows from τ¯ being a pure-action strategy and from the imposition of τ¯(h, t) ∈/ B(zt ) on zt ∈/ S. For every nonnegative integer k (for every strategy profile and every initial state), the conditional probability, given hk/, that zt = zk/ for all k/ ≤ t <(k + 1)/ is greater than or equal to 1 − 1/. On zt = zk/ for all k/ ≤ t <(k + 1)/, we have (by (49))

(k+1)/ (k+1)/ i i 2 g (zt , τ¯(h,t)) dt ≥ g (zt , τ˜(h,t)) dt −|A|/ .

k/ k/ ¯ Therefore, for z ∈ S1 and zk/ ∈ Cz \ S,

(k+1)/ (k+1)/ z i |  ≥ z i |  − | |+ 2 Eτ¯ ( gt dt zk) Eτ˜ ( gt dt zk) ( A 1)/ . (51) k/ k/

Therefore, using (51) and (50),

− (k+1)/ N 0 1 N γ i (z, τ¯) = E z E z ( gi dt | z) 0 N0 τ¯ τ¯ t k = k 0 k/ + N −1 (k 1)/ 0 (|A|+1)N ≥ z z i |  − 0 Eτ¯ Eτ˜ ( gt dt zk) =  k 0 k/

− (k+1)/ N 0 1 | |+ z  z i  ( A 1)N0 = P ¯ (z ) E ˜ ( g dt | z ) − τ N0 τ t k  z k=0 N0 k/ − (k+1)/ N 0 1 | |+ z  z i  ( A 1)N0 z  z  ≥ P ˜ (z ) E ˜ ( g dt | z ) − − |P ˜ (z ) − P ¯ (z )|N0 τ N0 τ t k  τ N0 τ N0 z k=0 z N0 k/ N0 | |+ | |+ 2 i ( A 1)N0 (2 A 4)N0 ≥ N0γ (z, τ˜) − − . N0   Using Lemma 8 in Section 4.7 we conclude that for all sufficiently large , we have γ i (z, τ¯) ≥ vi(z) − ε /2 − O (ε ). 2 N0 1 2

The strategy τˆ plays in blocks of random sizes that are ≤ N0, and in each block it follows τ¯ as if the game were starting at the state of the start of the block. The duration of a block that starts at a state z ∈ S1 is N0. The duration of a block that starts at a state z ∈ S¯ is 1. A block that starts at time t in state z ∈ S2 continues until the first time t + j/, with j being a positive integer and ¯ zs ∈/ Cz \ S for some t < s ≤ t + j/, if this time is ≤ t + N0. Otherwise it continues until time t + N0. Let t¯ be the time at the end of the k-th block (t¯ = 0), and z¯ be the state at the end of the block, i.e., z¯ = z¯ . k 0 k k tk ˆ ˆ ¯ ¯ ¯ ¯ ¯ ¯ The strategy τ is given by τ (h, t) = τ (tk ∗ h, t − tk), whenever tk ≤ t < tk+1, and where tk ∗ h is the left translation of the play h by t¯ ; i.e., if h = (z , x ) ≥ , then t¯ ∗ h = (z¯ , x¯ ) ≥ . k t t t 0 k tk+s tk+s s 0 122 A. Neyman / Games and Economic Behavior 104 (2017) 92–130

Let us formally define the stopping times t¯ inductively. t¯ + is a function of t¯ and z¯ , and of the process that follows k k 1 k tk ¯ ∈ ¯ = ¯ + ∈ ¯ = ∧ { : N ∃ ≥ ¯ ∈ \ ∪ ¯} ∈ ¯ time tk. If zt¯ S1 then tk+1 tk N0. If zt¯ S2 then tk+1 N0 inf j/ j, j/ s > tk s.t. zs (S Cz) S . If zt¯ S ¯ k¯ k k then tk+1 = tk + 1.

¯ ¯ Lemma 10. The discrete-time process z0, z1, ..., with the probability Pτˆ is a homogeneous Markov chain process with a transition matrix Fthat obeys the following properties. 1) F :=  F  = 1 for z ∈ S , z,Cz z ∈Cz z,z 1 − ≥ ∈ ¯ 2) F z,S\Cz > 4δ O (1/) 3δ for z Sand  sufficiently large, 3) F \ + F ¯ ≥ 2ε − O (1/) ≥ ε for z ∈ S and  sufficiently large, and z,S Cz z,S 2 2 2   i ≥ i − I ∈ ∪ ¯ ∈ ∈ 4) z∈S F z,z v (z ) v (z) ε1 (z S2 S) for every z S, i N, and  sufficiently large. ˆ ¯ ¯ ¯ ¯ Proof. By the definition of τ , the Pτˆ conditional distribution of zk+1, given z0, ..., zk, depends only on zk. Therefore, the ¯ ¯ ˆ stochastic z0, ..., zk, ..., defined by τ , is a homogeneous Markov chain. ¯ ∈ z z Equality 1) follows from the definition of τ on z0 S1. Recall that the norm distance between the Pτ¯ and the Pτ˜ ¯ z z z distribution of z , z , ...,z is ≤ O (1/). For z ∈ S, F \ = P (z ∈ S \ C ) = P (z ∈ S \ C ) ≥ P (z ∈ S \ C ) − O (1/) > 0 1 N0 z,S Cz τˆ 1 z τ¯ 1 z τ˜ 1 z − ≥ ∈ \ + ¯ ≥ z ∃ ≤ ∈ \ ∪ ¯ ≥ 4δ O (1/) 3δ for sufficiently large . Therefore, 2) holds. For z S2, F z,S Cz F z,S Pτ¯ ( j N0 s.t. z j/ (S Cz) S) z ∃ ≤ ∈ \ ∪ ¯ − ≥ z ≤ − Pτ˜ ( j N0 s.t. z j/ (S Cz) S) O (1/) Pτ˜ (T N0) O (1/) (where the last inequality uses the fact that for every = ∀ ≤ ≤ + ≥ − z ≤ ≥ stopping time T and every strategy σ , Pσ (zs zT T s T 1/) 1 O (1/). Therefore, as Pτ˜ (T N0) 2ε2, part 3) follows.     ∈  i = i ∈ ¯  i = z = i ≥ By 1), for z S1 we have z∈S F z,z v (z ) v (z); for z S we have z∈S F z,z v (z ) z∈S Pτ¯ (z1 z )v (z )    z = i − ≥ i − ∈  ≥  − z∈S Pτ˜ (z1 z )v (z ) O (1/) v (z) O (1/), and for z S2, using (28), we have z∈S F z,z w(z ) z∈S F z,z w(z) ε1/2 − O (1/). Therefore, 4) follows. 2

4.9. A probabilistic lemma

The following lemma appears implicitly in Solan and Vieille (2002).

2 Lemma 11. Assume that 0 < ε := ε2 ≤ 1 and ε1 <δd ε/4. Then

¯ (a) All ergodic classes of the Markov chain (z j) are subsets of S1, ¯ (b) E(|{0 ≤ j < ∞ : z j ∈/ S1}|) < ∞, and := ¯ → i := i ¯ → i ∀ ∈ (c) w j w(z j ) j→∞ w∞, and therefore v j v (z j ) j→∞ v∞ i N,

ε2d2δ and, if ε1 < 16 , then

(d) E|{0 ≤ j < : z ∈ S¯ ∪ S }| ≤ 16 , and j 2 δd2ε ∀ ∈ i | ≥ i − i | ≥ i − (e) i N, E(v j z0) v0 ε and E(v∞ z0) v0 ε.

¯ ¯ Proof.t Le F j , j ≥ 0, be the algebra generated by z0, ..., z j . Consider the sequences w j and Y j of real-valued random = ¯ = 2 + = ¯ ≥ ≥ ∈ ¯ = variables w j w(z j) and Y j w j δ j , where δ j δ(z j ) and ε δ(z) 0is defined as follows. For z S, we set δ(z) 2 2 δd := δ(0), for z ∈ S2 we set δ(z) = δ(2) = δεd /4 − ε1, and for z ∈ S1 we set δ(z) = 0. We claim that ¯ ¯ ¯ E(Y j+1 | F j) ≥ Y j + δ(z j) ≥ Y j + δ(2)I(z j ∈ S2 ∪ S). (52) ¯ As the process (z j ) j≥0 is a stationary Markov chain, it suffices to prove inequality (52) for j = 0. ¯ ∈ = 2 = ≥ 2 = + ¯ ¯ ∈ On z0 S1 we have Y0 w0, w1 w0 a.e., and Y1 w1 Y0 δ(z0) a.e. Therefore, inequality (52) holds on z0 S1. ¯ ∈ ¯ | = | − | ≥ ≥ 2 | − 2 ≥ 2 On z0 S, we have E(w1 z0) w0 and the probability that w1 w0 d is 3δ. Therefore, E(w1 z0) w0 3δd , | ≥ 2 + 2 = + ≥ + implying that E(Y1 z0) w0 3δd Y0 2δ(0) Y0 δ(0). = := { ∈ : ¯ = ¯ | ¯ = ≥ } ={ ∈ : ≥ } We partition S2 into two subsets. S2( ) z S2 P(w(z1) w(z0) z0 z) ε/2 z S2 F z,S\Cz ε/2 and = := \ = + ¯ ≥ ∈ ∈ \ = ¯ ∈ ¯ | ¯ = ≥ S2( ) S2 S2( ). The inequality F z,S\Cz F z,S ε for z S2 implies that for z S2 S2( ) we have P(z1 S z0 z) ε/2. On z¯0 ∈ S2( =), the (conditional) probability that |w1 − w0| ≥ d is ≥ ε/2, and E(w1 | z¯0) ≥ w0 − ε1. Therefore, on z¯0 ∈ S2( =), 2 | ¯ = − 2 + − 2 | ¯ ≥ 2 + − − 2 E(w1 z0) E((w1 w0) 2w0 w1 w0 z0) d ε/2 2w0(w0 ε1) w0 ≥ 2 − + 2 ≥ 2 + = + w0 2ε1 d ε/2 w0 2δ(2) Y0 δ(2), 2 2 where the last inequality follows from 2δ(2) = δεd /2 − 2ε1 ≤ d ε/2 − 2ε1. A. Neyman / Games and Economic Behavior 104 (2017) 92–130 123

¯ Everywhere, and thus also on z0 ∈ S2 \ S2( =), E(w1 | z0) ≥ w0 − ε1. The conditional probability that z¯1 ∈ S, given 2 z¯0 ∈ S2 \ S2( =), is ≥ ε/2. Therefore, on z¯0 ∈ S2 \ S2( =), using the convexity of x → x , | ¯ ≥ | ¯ 2 + ≥ − 2 − 2 + 2 E(Y1 z0) (E(w1 z0)) δ(0)ε/2 (w0 ε1) ε1 δd ε/2 = 2 − + 2 = 2 + = + w0 2ε1 δd ε/2 w0 2δ(2) Y0 δ(2). This completes the proof of inequality (52). Inequality (52) shows that Y j is a bounded (0 ≤ Y j ≤ 2) submartingale, and therefore it converges a.e. to Y∞. By taking the expectation in inequality (52), and summing over all integers 0 ≤ j <, we have 2 ≥ EY ≥ Y0 + δ(2)E|{0 ≤ j < : z j ∈ ¯ ¯ S ∪ S2}|, implying that E|{0 ≤ j < : z j ∈ S ∪ S2}| is bounded by 2/δ(2). This completes the proof of (b). Claim (b) implies ¯ that with probability 1, |{0 ≤ j < ∞ : z j ∈/ S1}| < ∞, which together with part 1) of Lemma 10 proves part (c). 2 2 By selecting ε < ε d δ we have δ(2) ≥ εd2δ/8. Therefore, 2/δ(2) ≤ 16 , proving part (d). For every i ∈ N and j ≥ 0, by 1 16 εd2δ Lemma 10 (part 4), we have i | F ≥ − I ∈ ∪ ¯ E(v j+1 j) w j ε1 (z j S2 S). Summing these inequalities over all 0 ≤ j <, we have

i | ≥ i − I ∈ ∪ ¯ ≥ i − E(v z0) v (z0) ε1 E (z j S2 S) v (z0) ε, and therefore 0≤ j< i i E(v∞ | z0) ≥ v (z0) − ε. This completes the proof of part (e). 2

4.10. The uniform-l-average equilibrium strategy σ

∈ −i \{ } ≥ For every player i N let σε be an N i strategy profile and sε a positive number such that for every s sε and every ˜ i i −i ˜ i ≤ i + ≥ strategy σ we have γs (z, σε , σ ) v (z) ε. Therefore, for every s 0we have − sε γ i(z, σ i, σ˜ i) ≤ vi(z) + ε + . s ε s The strategy σ follows τˆ until the first time t = j/, with j a positive integer, where a deviation by a single player,  say player i, in the time interval [( j − 1)/, j/) is observed. Thereafter, all players i = i start playing the N \{i} strategy −i profile σε . ≤ ≤ HS ¯ Let s > 0 and 0 T s be a ( t )t≥0-stopping time. Recall that tk denotes the time of the end of the k-th block and that z¯ = z¯ . Let k be the smallest integer k s.t. t¯ ≥ T . k tk T k s−T ¯ Note that kT ≤ T + N0 and if kT ≤ k ≤ kT + − 2then tk + N0 ≤ s. N0 Note that for every T ≤ t ≤ s we have

i i g ≥ 1{¯ ∈ }1 ¯ ¯ g t zk S1 {tk≤t

¯ + s tk N0 z i ≥ z i − ¯ Eσ gt dt Eσ (gt 1{zk∈/S1}) dt ≤ ≤ + s−T − T kT k kT 2 t¯ N0 k z i = E N (γ (z¯ , τ¯) − 1{¯ ∈ }) σ 0 N0 k zk /S1 s−T kT ≤k≤kT + −2 N0 ≥ z i ¯ − ¯ − Eσ N0(v (zk) 1{zk∈/S1} O (ε)) s−T kT ≤k≤kT + −2 N0 124 A. Neyman / Games and Economic Behavior 104 (2017) 92–130

≥ z i ¯ − ¯ − Eσ N0(v (zkT ) 1{zk∈/S1} O (ε)) s−T kT ≤k≤kT + −2 N0 ≥ z − i ¯ − − + Eσ (s T )(v (zkT ) O (ε)) (e 1)N0, (53) ∈ z |{ ∞ : ¯ ∈ }| where e is the maximum over z S of the expectation with respect to Pτˆ of k < zk / S1 . In particular, by setting T = 0, we have s 1 E z gi dt ≥ vi(z) − O (ε) for all s sufficiently large. (54) σ s t 0 i i Let σ˜ be a strategy of player i and let T be the minimum of s and the (random) smallest time j/ such that σ˜ deviates from the play under σ at some time in the interval [( j − 1)/, j/). Let σ˜ denote for short the strategy profile −i i (σ , σ˜ ). Note that the Pσ -distribution and the Pσ˜ -distribution of T (and thus also of s − T) coincide. By the additivity of the expectation,

s T s i −i i z 1 i z 1 i z 1 i γ (z, σ , σ˜ ) = E ˜ g dt = E ˜ g dt + E ˜ g dt. (55) s σ s t σ s t σ s t 0 0 T i = − = The Pσ -distribution and the Pσ˜ -distribution of 1{t

T T z 1 i z 1 i 1 E ˜ g dt ≤ E g dt + . (56) σ s t σ s t s 0 0 By the definition of σ˜ , s z 1 i 1 z i sε E ˜ g dt ≤ E ˜ (s − T)v (zT ) + ε + . (57) σ s t s σ  s T i −μ/ Recall that 0 ≤ v (z) ≤ 1 and that for every stopping time T and strategy profile τ , Pτ (zt = zT ∀T ≤ t ≤ T + 1/) ≥ e ≥ 1 − 1/. Therefore,

1 z i 1 z i 1 E ˜ (s − T)v (zT ) ≤ E (s − T)v (zT ) + . (58) s σ  s σ   By (53) s ˆ 1 z i z 1 i sε E (s − T)v (zT ) ≤ E g dt + ε + . (59) s σ  σ s t s T 1 Let  be sufficiently large so that  < ε. Summing inequalities (55)–(59), we deduce that − γ i(z, σ i, σ˜ i) ≤ γ i(z, σ ) + 4ε for all s sufficiently large. (60) s s s s Similarly, the limit of 1 gi dt as s →∞ exists a.e. with respect to P z , and E z lim →∞ 1 gi dt = ui z ≥−4 + s 0 t σ σ s s 0 t ( ) ε z 1 s i Eσ˜ lim sups→∞ s 0 gt dt. This completes the proof of Theorem 4. 2

5. Proof of Theorem 5

Theorem 5 is straightforward if μ = 0. Therefore, we assume that μ > 0.  ∈ 1 N In order to prove Theorem 5, it suffices to prove that for every continuous-time stochastic game , w (Wd ) , and ∈ RS×N  N ε > 0, there is a vector payoff v , a strategy profile σ , and a neighborhood U of w in Wd , such that for every ∗ w ∈ U , a player i, and a strategy profile τ such that τ j = σ j for all j = i, we have + z i ∗ ≥ i ≥ z ¯ i ∗ − ε Eσ gt dw (t) v (z) Eτ gt dw (t) ε, (61) [0,∞] [0,∞] ∗ ∗ where w stands for (w )i . A. Neyman / Games and Economic Behavior 104 (2017) 92–130 125

The idea of the proof is as follows. We consider an auxiliary k-stage discrete-time game that depends on a uniform-l-average payoff vector u, a sufficiently large t∗, and a partition t0 = 0 < t1 <... ε > 0, w (Wd ) , and let be a continuous-time stochastic game. By the covariance properties, we assume ≤ i ≤   = w.l.o.g. that 0 gt 1 and μ 1/2. S×N Let u ∈ R be a uniform l-average equilibrium payoff of , and let σε be a strategy profile and tε > 1such that for j j every z ∈ S, t ≥ tε, and a strategy profile τ such that τ = σε for every j = i,

t t 1 1 ε + E z gi dt ≥ ui(z) ≥ E z gi dt − ε, and (62) t σε t t τ t 0 0 + z i ≥ i ≥ z ¯ i − ε Eσε g∞ u (z) Eτ g∞ ε. (63) + i [ ∞ Fix t∗ such that 2(tε ε) maxi∈N w ( t∗, )) < ε/3. = = k ≤ Fix an increasing sequence 0 t0 < t1 < ... < tk t∗ and (ε j ) j=0, such that 0≤ j≤k ε j < ε/2 and for 1 j < k, (t + − t )wi + (t + − t )2 < ε , where wi := wi([t , t + )). j 1 j j j 1 j j j j j 1 k k k−1 − i + The existence of such sequences, (t j) j=0 and (ε j ) j=0, follows from the fact that for a fixed t∗, j=0(t j+1 t j)w j 2 (t j+1 − t j) goes to 0 as max0≤ j

→ i i + i + −  i  a w j g (z,a) u j+1(z) (t j+1 ti) μ(z , z,a)u j+1(z ), and ∈ z S i = i i + i + −  i  u j(z) w j g (z, σ j(z)) u j+1(z) (t j+1 t j) μ(z , z, σ j(zt j ))u j+1(z ). z∈S

≤ ≤ ≤ i ≤ i [ ∞] Now we show that for every 0 j k we have 0 u j(z) w ( t j, ). ≤ i ≤ ≤ i ≤ ≤ i ≤ i ∞ ≤ i [ ∞] As 0 g 1, 0 u (z) 1. Therefore 0 uk(z) w ( ) w ( tk, ). ≤ ≤ i ≤ i [ ∞] − ≤ Fix 0 j < k and assume that 0 u j+1 w ( t j+1, ). As t j+1 t j 1 (which follows from the assumption − 2 ≤ + − ≤  ≥ (t j+1 t j) < ε j < 1on the increasing sequence t j ) we have 0 1 (t j+1 t j)μ(z, z, a) 1. In addition, μ(z , z, a) 0for     =  = i +  − i z z, and z ∈S μ(z , z, a) 0. Therefore, the sum u j+1(z) z ∈S (t j+1 t j)μ(z , z, a)u j+1(z ) is a convex combination of i   ∈ ≤ i [ ∞] the nonnegative numbers u j+1(z ), z S. Each one of these nonnegative numbers is w ( t j+1, ) (by assumption) and i therefore also their convex combination is ≤ w ([t j+1, ∞]). i Therefore, by the definition of u j(z), we have

≤ i = i i + i + −  i  0 u j(z) w j g (z, σ j(z)) u j+1(z) (t j+1 t j)μ(z , z, σ j(z))u j+1(z ) z∈S ≤ i + i [ ∞] = i [ ∞] w j w ( t j+1, ) w ( t j, ).

≤ i ≤ i [ ∞] Therefore, 0 u j w ( t j, ). 126 A. Neyman / Games and Economic Behavior 104 (2017) 92–130

∗ Fix a player i. We claim that there is a neighborhood U of wi such that for every w ∈ U and every strategy profile τ such that τ j = σ j for every player j = i, we have + z i i ≥ i ≥ z ¯ i i − 2ε Eσ gt dw (t) u0(z) Eτ gt dw (t) 4ε. (64) [0,∞] [0,∞] ≤ := { ∗ ∈ :| ∗ − i | } ∗ := ∗ [ For 0 j < k, let U j w W w j w j < ε j where w j w ( t j, t j+1)). − i + − 2 ≤ ∗ ∈ By Lemma 2 and the inequality (t j+1 t j)w j (t j+1 t j) < ε j , for any 0 j < k and w U j , ⎛ ⎞ ⎜ ⎟ z ⎝ i ∗ + i | ⎠ Eσ gt dw (t) u j+1(zt j+1 ) ht j

[t ,t + ) ⎛ j j 1 ⎞ ⎜ ⎟ ≥ z ⎝ i i + i | ⎠ −| ∗ − i | Eσ gt dw (t) u j+1(zt j+1 ) ht j w j w j

[t ,t + ) j j 1 ≥ i i + i − i  −  − g (zt j , σ (zt j ))w j u j+1(zt j ) u j+1(z )(t j+1 t j)μ(z , z, σ j(z)) 2ε j z∈S ≥ i − u j(zt j ) 2ε j (65) ¯ ≤ = ≤ ≤ and by denoting by yt the mixed action profile of τ at time t j t < t j+1, conditional on ht j and zs zt j for every t j s t,

i u (zt ) j j = i i + i + − i   w j g (zt j , σ (zt j )) u j+1(zt j ) (t j+1 t j) u j+1(z )μ(z , zt j , σ j(zt j )) ∈ z S ≥ ∗ i ¯ + i + − i   ¯ −| ∗ − i | w j g (zt j , yt ) u j+1(zt j ) (t j+1 t j) u j+1(z )μ(z , zt j , yt ) w j w j  ⎛ z ⎞∈S ⎜ ⎟ ≥ z ⎝ i ∗ + i | ⎠ − Eτ gt dw (t) u j+1(zt j+1 ) ht j 2ε j. (66)

[t j ,t j+1) ∗ Therefore, if w ∈∩0≤ j

[0,tk) [0,tk) ∗ ∗ ∗ ∗ Set t¯ := t + t and U ={w ∈ W :|w [t t¯ ] − wi [t t¯ ] | +|w [t ∞] − wi [t ∞] | . Let w ∈ U and define ∗ k ε k d ( k, ) ( k, ) ( k, ) ( k, ) < εk k := dw ∗ ∞ f (t) dt (t). Note that as w is a discounting measure, f (t) exists for all, but countably many exceptions, 0 < t < , and f is monotonic nonincreasing with f (t) → 0as t →∞. (In what follows, the integrals of the form G(t)df (t) refer ∗ to the Riemann–Stieltjes integral. Note that − (t − tk) df (t) = w ([tk, ∞)).) Then, by integration by part and by using [tk,∞) inequality (62), we have ⎛ ⎞ ⎛ ⎞ t ⎜ ⎟ ⎜ ⎟ z ⎝ i ∗ | ⎠ =− z ⎝ i | ⎠ ≥ Eσ gt dw (t) htk Eσ ( gt dt) df (t) htk [ ∞ [ ∞ t ⎛tk, ) ⎞ tk, ) k t ⎜ ⎟ ≥− z ⎝ i | ⎠ ≥− − i − Eσ ( gt dt) df (t) htk (t tk)(u (ztk ) ε) df (t) [¯ ∞ t [¯ ∞ t, ) k t, ) ≥− − i − − ∗ [ ¯ ¯ − (t tk)(u (ztk ) ε) df (t) 2w ( tk,t ))(t tk)

[tk,∞) = i − ∗ [ ∞ − ∗ [ ¯ ¯ − ≥ i ∗ ∞ − (u (ztk ) ε)w ( tk, ) 2w ( tk,t ))(t tk) u (zt∗ )w (tk, )) ε, and if τ is a strategy profile such that τ j = σ j for every j = i then A. Neyman / Games and Economic Behavior 104 (2017) 92–130 127 ⎛ ⎞ ⎛ ⎞ t ⎜ ⎟ ⎜ ⎟ z ⎝ i ∗ | ⎠ =− z ⎝ i | ⎠ ≤ Eτ gt dw (t) htk Eτ ( gt dt) df (t) htk [ ∞ [ ∞ t ⎛tk, ) ⎞ tk, ) k t ⎜ ⎟ ≤− z ⎝ i | ⎠ + ∗ [ ¯ ¯ − Eτ ( gt dt) df (t) htk w ( tk,t ))(t tk) [¯ ∞ t t, ) k ≤− − i + + ∗ [ ¯ ¯ − (t tk)(u (ztk ) ε) df (t) w ( tk,t ))(t tk) [¯ ∞ t, ) ≤− − i + + ∗ [ ¯ ¯ − (t tk)(u (ztk ) ε) df (t) w ( tk,t ))(t tk)

[tk,∞) = i + ∗ [ ∞ + ∗ [ ¯ ¯ − (u (ztk ) ε)w ( tk, ) w ( tk,t ))(t tk). These inequalities imply that ⎛ ⎞ ⎛ ⎞ ⎜ ⎟ ⎜ ⎟ + z ⎝ i ∗ | ⎠ ≥ ∗ [ ∞ i ≥ z ⎝ i ∗ | ⎠ − 2ε Eσ gt dw (t) htk w ( tk, ))uk(ztk ) Eτ gt dw (t) htk 2ε. (68)

[tk,∞) [tk,∞) ∗ ∗ As w ∈ U (and thus |w ([t , ∞]) − wi([t , ∞])| < ε ), and k  k  k k   + z i ∗ ∞ | ≥ ∗ ∞ i ≥ z ¯ i ∗ ∞ | − ε Eσ g∞ w ( ) htk w ( )uk(ztk ) Eτ g∞ w ( ) htk ε (69) (by inequality (63)), we deduce that ⎛ ⎞ ⎛ ⎞ ⎜ ⎟ ⎜ ⎟ + z ⎝ i ∗ | ⎠ ≥ i [ ∞] i ≥ z ⎝ ¯ i ∗ | ⎠ − 4ε Eσ gt dw (t) htk w ( tk, )uk(ztk ) Eτ gt dw (t) htk 4ε. (70)

[tk,∞] [tk,∞] ∗ k Inequalities (67) and (70) imply that for w ∈ U := ∩ = U j , j 0 + z i ∗ ≥ i ≥ z i ∗ − 2 6ε Eσ gt dw (t) u0(z) Eτ gt dw (t) 6ε. [0,∞] [0,∞] The (first) part of the proof that ends with inequality (67) proves that for every w ∈ (W )N that is supported on [0, ∞), has a w -time-separable robust equilibrium.

6. The continuous-time process

Recall that sk stands for the k-th time of a state change. In this section we study properties of the of states (zt )t≥0 that is defined by a correlated strategy. Recall that the correlated strategy choice at time t is, for sk ≤ t < sk+1, a function of t and the state process up to time sk. A special case of a correlated strategy is a Markov correlated strategy. If σ is a correlated strategy and τ is a Markov strategy such that σ (h, t) = τ (h, t) whenever s1(h) > t, then the Pσ and the Pτ distributions of the play up to time s1 coincide.

Therefore, for a correlated strategy σ , the conditional Pσ distribution of ((zt )sk

(xt )0≤t

11 For a proof in the case where Q (t) is continuous in t, see, e.g., Rosenblatt (1962). Variants of this result are stated in, e.g., Miller (1968a) and Martin-Lof (1967). For the present formulation and its proof, see Neyman (2012). 128 A. Neyman / Games and Economic Behavior 104 (2017) 92–130

In addition, we derive a quantified continuity property of the state stochastic process as a function of the Markov correlated strategy σ .  × =   ∈ × Let S and T be finite sets. An S T matrix Q (Q z,z )(z,z )∈S×T is a transition matrix if for all (z, z ) S T we have  ≥ ∈  = Q z,z 0 and for all z S we have z∈T Q z,z 1. Let M be the space of all S × S matrices Q , let M0 be its subspace of all matrices Q with ∈ Q z,z = 0for every z ∈ S  z S  ≥ = and Q z,z 0for all z z , and let M1 be the subspace of M of all transition matrices. The identity matrix is denoted by I. The space M is a (noncommutative) Banach algebra with the norm Q  = max ∈  |Q  |, and the spaces M and z S z ∈S z,z 0 ∈ j M1 are closed subalgebras of M. For an ordered list F1, ..., F j M we denote by i=1 Fi the matrix (ordered) product F1 F2 ...F j .

η ∗ Theorem 6. Let Q :[0, ∞) → M(respectively, → M0) be measurable, with Q (t) ≤ C, and let Q :[0, ∞) → M converge w Q to Q . Then there is a unique family F (s, t) = F (s, t), t ≥ s ≥ 0, of matrices in M(respectively, in M1) such that for all 0 ≤ s ≤ t1 ≤ t and δ>0 we have

F (s, s) = I (71) t+δ F (t,t + δ) − I − Q (x) dx≤o(δ), (72)

t where o(δ) is a function of δ such that o(δ)/δ → 0 as δ → 0+, and

F (s,t) = F (s,t1)F (t1,t). (73) This unique family satisfies ∂ F (s,t) = F (s,t)Q (t) for almost every t (74) ∂t and for δ ≤ 1/Cwe have

t+δ F (t,t + δ) − I − Q (x) dx≤Cδ2. (75)

t

The above theorem derives the unique transition matrices F (s, t) that are defined by a Markov strategy σ via the tran-  sition rate matrices Q , where Q z,z = μ(z , z, σ (z, t)).  It is easy (and classic) to prove that any state process (zt )t≥0 with law P , such that P(zt = z | zs = z) = F z,z (s, t), has, with probability 1, left and right limits everywhere, and for each fixed t it is, with probability 1, continuous at t. Therefore, as our game payoffs are of the form gdw(t), and w has at most countably many atoms, there is no loss of generality in assuming in our game model that the state process is right continuous. S A correlated strategy σ defines a probability distribution Pσ on the state process h . The mixed-action choice of σ at time t is a function of the state process (up to time t) and the time t, and therefore is denoted by σ (hS , t).

= HS Lemma 12. Let σ and τ be correlated strategies, z z0 an initial state, and Ta bounded ( t )t -stopping time. For any positive integer , let T () := min{ j/ : j/ ≥ T }, and let Ti(), or Ti for short, be defined by Ti = T () ∧ i/ := min(T (), i/). ≥ ≤ ¯ ¯  For i 0 and Ti−1 t < Ti we denote by xt , respectively yt , the mixed-action choice of the correlated strategy σ , respectively τ , at time t conditional on no state change in the time interval [Ti−1, t].  := Then, the norm distance between the Pσ and Pτ distributions of zn z0, z1/, ..., zTn , respectively (zt )t≤T , is bounded by

T n i Tn   2 +   z  ¯ − ¯  ≤   2 +   z ¯ − ¯  4n μ / 2 μ Eσ (xt yt ) dt 4n μ / 2 μ Eσ xt yt dt, i=1 Ti−1 0   z T  S − S  respectively 2 μ Eσ 0 σ (h , t) τ (h , t) dt.

The proof uses Lemma 2 and the following lemma.

Lemma 13. Let {∅, } = F0 ⊂ F1 ...⊂ Fn be an increasing sequence of σ -algebras of subsets of a nonempty set , and let Pand Q be two probability measures on the measurable space (, Fn). Denote by Fi the set of all [−1, 1]-valued Fi -measurable functions on  and by P i and Q i the probability measures Pand Qon (, Fi ). Then, A. Neyman / Games and Economic Behavior 104 (2017) 92–130 129

P i − Q i=sup{E P f − E Q f : f ∈ Fi}, and n P − Q ≤sup{E P (E P ( fi | Fi−1) − E Q ( fi | Fi−1)) : fi ∈ Fi}. (76) i=1

Proof. P i − Q i  := sup{P(C) − Q (C) + Q ( \ C) − P( \ C) : C ∈ Fi} = sup{E P (1C − 1\C ) − E Q (1C − 1\C ) : C ∈ Fi} ≤ sup{E P f − E Q f : f ∈ Fi }. As L∞(P i + Q i ) f → E P f − E Q f is linear and 1C − 1\C , C ∈ Fi , are the extreme points of L∞(P i + Q i ), the last inequality is an equality. This proves the first part of the lemma, and the second part for n = 1. Assume that (76) holds for n = k and let f ∈ Fk+1. As E Q ( f | Fk) ∈ Fk, E P f − E Q f = E P E P ( f | Fk) − E Q E Q ( f | F = | F − | F + | F − | F ≤ | F − | F + −  ≤ k) EP (E P ( f k) E Q ( f k)) E P (E Q ( f k) E Q E Q ( f k) E P (E P ( f k) E Q ( f k)) Pk Q k { k+1 | F − | F : ∈ } = + sup E P i=1 (E P ( fi i−1) E Q ( fi i−1) fi Fi . Therefore, (76) holds for n k 1, and thus by induction for ev- ery n. 2

Proof of Lemma 12. Set in Lemma 13 F = HS , where T := T ∧ i/ = min(T , i/). i Ti i ¯ ¯  ≤ F Note that xt and yt , Ti−1 t < Ti , are measurable with respect to i−1.  T  : →[− ] | z | F − z | F | ≤  2 2 +|  i By Lemma 2, for every fi S 1, 1 , Eσ ( fi(zTi ) Ti−1 ) Eτ ( fi(zTi ) Ti−1 ) 4 μ / z ∈S fi(z ) μ(z , z, Ti−1 2 2 Ti 2 2 Ti x¯t − y¯ t ) dt| ≤ 4μ / + 2μ  (x¯t − y¯ t ) dt ≤ 4μ / + 2μ x¯t − y¯ t  dt. Ti−1 Ti−1 By summing these inequalities over i = 1, ..., n, the first bound in the lemma follows from Lemma 13. ¯ − ¯  →∞  S − S  As xt yt is bounded and with Pσ -probability 1 converges, as  , a.e. to σ (h , t) τ (h , t) , the second part of the lemma follows from the bounded convergence theorem. 2

7. Related literature

7.1. Continuous-time supergames and bargaining

Simon and Stinchcombe (1989) develop a general theory of continuous-time games. The pathologies of continuous-time strategies are resolved there by imposing assumptions (i.e., restrictions) on pure strategies that identify a class of strategies that yield a well-defined outcome. The assumptions are, essentially, that 1) the number of action changes is uniformly bounded on every finite time interval, 2) the strategies are piecewise continuous with respect to time, and 3) there is strong right continuity with respect to histories. Bergin and MacLeod (1993) develop a model of continuous-time supergames of and study the strategic equilibrium behavior of these games. The pathologies of continuous-time strategies are resolved there by consid- ering only strategies that have inertia. A strategy with inertia is essentially a limit, in a proper sense, of strategies that as a function of history and time select a constant action over a short time interval. In addition, the pure strategies choose a pure stage action, in contrast to our main model, which allows a strategy to choose a mixed action as a function of history. Perry and Reny (1993) develop a theory of continuous-time bargaining, where players can make offers whenever they like. The pathologies of continuous-time strategies are resolved there by considering only strategies where upon making an offer players must wait a fixed amount of time before making another offer. The continuous-time conclusions are obtained by studying the results when the waiting times go to zero.

7.2. Continuous-time stochastic games

Zachrisson (1964) introduced the theory of continuous-time two-person zero-sum stochastic games. He proves the exis- tence of a value and optimal strategies in a finite-horizon game where the payoff of a play is the sum of the integral of the payoffs gt = g(t, zt , xt ) and a terminal payoff that is a function of the state at the end of the game. In Zachrisson’s model it is assumed that the space of strategies of each player is the space of Markov strategies. The assumption that all strategies are Markov strategies is restrictive but indeed innocuous in the model of two-person zero-sum games in the sense that Zachrisson’s solution – the value and optimal strategies – is also a solution when one considers all strategies. On the other hand, Zachrisson’s model allows for time-dependent compact action spaces, transition rates, and payoffs. Explicitly, the action space of player i is a compact subset X i (t, z) of a Euclidean space that depends continuously (in the  Hausdorff metric) on t, the transition rates μ(z , z, t, xt ) depend continuously on t and xt , and the payoff function g(t, zt , xt ) T depends also on the time t. The payoff in Zachrisson’s T -horizon game is g t z x dt + f z . A condition on the 0 ( , t , t ) ( T ) → + ∈ RS existence of optimal strategies in a family of auxiliary one-stage games xt g(t, zt , xt ) z μ(z, zt , t, xt )v(z), v is shown to suffice for the existence of a value and optimal strategies. 130 A. Neyman / Games and Economic Behavior 104 (2017) 92–130

Lai and Tanaka (1982) prove that a finite12 continuous-time stochastic game has a stationary equilibrium in the game where the players are restricted to Markov strategies. Jasso-Fuentes (2005) studies the equilibrium conditions for Markov games with Polish state and action spaces when the game is restricted to Markov strategies. However, existence is not addressed there. The relation between the value of the discounted continuous-time and the discounted discrete-time , called the uniformization technique, is discussed also in Howard (1960). Guo and Hernandez-Lerma (2005) study two-person non-zero-sum continuous-time stochastic games with stationary discounted payoff criteria (and Borel action spaces), and give conditions that ensure the existence of in stationary strategies. Yehuda Levy (2013) proved that a non-zero-sum stochastic game of fixed duration has a Markovian correlated equilib- rium.

References

Bergin, J., MacLeod, W.B., 1993. Continuous time repeated games. Int. Econ. Rev. 34, 21–37. Blackwell, D., Ferguson, T.S., 1968. The big match. Ann. Math. Stat. 39, 159–163. Guo, X., Hernandez-Lerma, O., 2003. Zero-sum games for continuous-time Markov chains with unbounded transitions and average payoff rates. J. Appl. Probab. 40, 327–345. Guo, X., Hernandez-Lerma, O., 2005. Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs. J. Appl. Probab. 42, 303–320. Guo, X., Hernandez-Lerma, O., 2009. Continuous-Time Markov Decision Processes. Springer, Berlin. Howard, R., 1960. Dynamic Programming and Markov Processes. Wiley, New York. Isaacs, R., 1999. Differential Games. Dover. Jasso-Fuentes, H., 2005. Noncooperative continuous-time Markov Games. Morfismos 9, 39–54. Kakumanu, P., 1969. Continuous Time Markov Decision Models with Applications to Optimization Problems. Technical Report 63, Dept. of Operations Research, Cornell University. Kakumanu, P., 1971. Continuously discounted Markov decision model with countable state and action space. Ann. Math. Stat. 42, 919–926. Kakumanu, P., 1975. Continuous-time Markovian decision processes with average return criterion. J. Math. Anal. Appl. 52, 177–183. Lai, H.C., Tanaka, K., 1982. Non-cooperative n-person games with stopped set. J. Math. Anal. Appl. 85, 153–171. Levy, Y., 2013. Continuous-time stochastic games of fixed duration. Dyn. Games Appl. 3, 279–312. Martin-Lof, A., 1967. Optimal control of a continuous-time Markov chain with periodic transition probabilities. Oper. Res. 15, 872–881. Mertens, J.-F., Neyman, A., 1981. Stochastic games. Int. J. 10, 53–66. Mertens, J.-F., Sorin, S., Zamir, S., 2015. Repeated Games. Cambridge University Press, Cambridge. Merton, R.C., 1992. Continuous-Time Finance. Basil Blackwell, Cambridge, MA. Miller, B.L., 1968a. Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J. Control 6, 266–280. Miller, B.L., 1968b. Finite state continuous time Markov decision processes with an infinite planning horizon. J. Math. Anal. Appl. 22, 522–569. Neyman, A., 1999. Cooperation in repeated games when the number of stages is not commonly known. Econometrica 67, 45–64. Neyman, A., 2003a. Real algebraic tools in stochastic games. In: Stochastic Games and Applications, pp. 57–75. Neyman, A., 2003b. Stochastic games: existence of the minmax. In: Stochastic Games and Applications, pp. 173–193. Neyman, A., Sorin, S. (Eds.), 2003. Stochastic Games and Applications. NATO ASI Series. Kluwer Academic Publishers. Neyman, A., 2012. Continuous-Time Stochastic Games. DP 616, Center for the Study of Rationality, Hebrew University. Neyman, A., 2013. Stochastic games with short-stage duration. Dyn. Games Appl. 3, 236–278. Perry, M., Reny, P., 1993. A non- model with strategically timed offers. J. Econ. Theory 59, 50–77. Rosenblatt, M., 1962. Random Processes. Oxford University Press, Oxford. Rykov, V.V., 1966. Markov decision processes with finite spaces of states and decisions. Theory Probab. Appl. 11, 343–351. Shapley, L.S., 1953. Stochastic games. Proc. Natl. Acad. Sci. USA 39, 1095–1100. Simon, L.K., Stinchcombe, M.B., 1989. Extensive form games in continuous time: pure strategies. Econometrica 57, 1171–1214. Solan, E., Vieille, N., 2002. in stochastic games. Games Econ. Behav. 38, 362–399. Strook, D.W., Varadhan, S.R., 2006. Multidimensional Diffusion Processes. Springer, Berlin. Vigeral, G., 2012. 
A Zero-Sum Stochastic Game with Compact Action Sets and no Asymptotic Value. CEREMADE, Universite Paris-Dauphine. Preprint. Zachrisson, L.E., 1964. Markov games. In: Advances in Game Theory. Princeton University Press, Princeton, NJ.

12 The existence result in Lai and Tanaka (1982) applies also to a class of games with a countable state space and infinite action spaces.