<<

arXiv:1704.05003v1 [cs.GT] 17 Apr 2017 xeddvrino aeilpeetda IS21.arXiv. 2017. LICS at presented material of version Extended hte h taeiso h lyr a ihu restricti without can players the of strategies the whether predicate. the th satisfies to play corresponds a value payo that payoff general probability expected not the plays, Thus on functions. predicates via defined objectives tt space state eemnc ysoigta n ftepaesms aea have must players stron the . strengthen of winning one we (MD) that deterministic games, memoryless showing branching by finitely in objectives i ae with games tic ae:i tawy h aeta h aiie rteminimiz the that or player, all against other maximizer enforces, the the that one of that i.e., strategy, case winning the a has always it is cases: c < 8,[] Player [7]. [8], branching. finitely is game the if even not determined, nfiiegmswt lotsr alojcie.O h othe the On objectives. re tail that previous -sure show a with we hand games generalizes finite vastly on This determined. strongly nwihpae wstecretstate. depending current round, the every owns in onl player action which turn-based an games) choose on of stochastic to gets simple subclass given player called the one is (also result In games the distribution. stochastic actions pre-defined of action a given combination of by each (out eve for action in and an player games, choose sets) while stochastic each concurrent players plays, both In of round it. minimize the to on tries defined function payoff desra ae ewe w lyr temaximizer (the players minimizer two the and between games adversarial games. Stochastic admyacrigt r-enddsrbto.Stochast called distribution. also pre-defined are a games to according randomly aevle oee,srn eemnc ftrsodobjec event threshold an of by determinacy (given strong However, value. have i tmxmzn/iiiigtepoaiiyo ie eve reachabili as given such a plays), infinite ω of of probability set pl measurable the The (i.e., space. maximizing/minimizing state at infinite aim countably with games stochastic rglro oegnrlobjectives. general more or -regular tnadqetosaewehragm sdtrie,and determined, is game a whether are questions Standard ne Terms Index eso htams-ueojcie (where objectives almost-sure that show We oevr o lotsr ecaiiyadams-ueB¨ almost-sure and reachability almost-sure for Moreover, esuy2pae unbsdpretifrainstochas perfect-information turn-based 2-player study We hs ae r nw ob ekydtrie,ie,they i.e., determined, weakly be to known are games These Abstract )? W td -lyrtr-ae perfect-information turn-based 2-player study —We sohsi ae,srn eemnc,infinite determinacy, strong games, —stochastic ✷ onal infinite countably re omxmz h xetdvleo some of value expected the maximize to tries E ≥ ✸ n threshold a and tfnKiefer Stefan w-lyrsohsi ae 1]are [16] games stochastic Two-player .I I. 1 hr oedcsosaedetermined are decisions some where ) 2 / E 2 2 1 NTRODUCTION pae ae ntetriooyof terminology the in games -player sstse ihprobability with satisfied is c-Buh betvsaentstrongly not are objectives (co-)B¨uchi onal tcatcGames Stochastic Countable nSrn eemnc of Determinacy Strong On ∗ c tt pcs econsider We spaces. state ihr Mayr Richard , ∈ [0 , 1] a pni many in open was ) r CB 4.0. BY CC - org † ‡ nvriyo dnug,UK Edinburgh, of University c nvriyo iepo,UK Liverpool, of University ∗ nvriyo xod UK Oxford, of University 1 = ≥ strategies y B¨uchi,ty, † as Shirmohammadi Mahsa , c (resp. are ) ayers tives uchi sult on ry nt er ✸ ✷ ic ff y g e r - tutrlpoet httesrtge a ecoe fa of chosen be a finite-memory). 
can use or strategies MD often (e.g., the algorithms type particular that such property because winner the structural games, deciding result for stochastic algorithms These on of [6]. influence strong [7], a [10], have (MD) deterministic memoryless strat E.g. the in randomization). and and strategies determinacy requirements their (memory w.r.t. both complexity [23], [8], studied extensively [17], been [11], have spaces (memoryless state finite MD with games. e.g., Infinite-state type, vs. games randomized). particular Finite-state (finite-memory a FR of or be deterministic) to chosen be nnt ae.E.g., do games. infinite games finite from techniques questions below). many further though contributions [5] our [18], (see open spac [4], remained [19], state in infinite presented countably results were with Some games model games. stochastic infinite-state automata techniques of general underlying analysis used the general and works to a not adapted these [12], specially most are [14], However, that [1]. [13], were in fifo-queues in unbounded studied with systems studied on wi games were stochastic automata pushdown stacks) probabilistic unbounded (i.e., probabilist systems infinite-state cursive finit on still games or are Stochastic classes counters, branching. these of unbounded often Most fifo-queues). stacks, are memory unbounded pushdown infinite These use unbounded that well. automata (e.g., as of types considered various by been induced have games state betv.I h aei nt hnteeaeol finitely only are certain there a then w.r.t. finite game a is in game states the the If followi of objective. the values is difference the this Consider underlying reasons the of One oercnl,svrlcasso ntl rsne infinit presented finitely of classes several recently, More tsol entdta aysadr eut n proof and • results standard many that noted be should It • • vni tt a au,a pia taeyne not need strategy [19]. optimal objectives reachability an for value, even has not state exist, a if Even nfiiegmste odee o aiyojcie [8]). objectives parity for even hold (while they [18] games [4], not finite objectives reachability in do for below) even (see not hold, properties determinacy strong Some hr hyeit eur nnt eoy[19]. memory finitely infinite require if objective exist, (even they reachability where for games strategies infinite optimal In branching) countably [8]. ob- deterministic in parity memoryless for contrast, chosen strategies be optimal can dif- games, jectives are finite strategies In optimal ferent. of requirements memory The finite ∗ oii Wojtczak Dominik , tcatcprt ae a echosen be can games parity stochastic not ar vrt countably to over carry ‡ tcatcgames Stochastic cre- ic egy [9], ng. ely on es th e- , s , , Objective > 0 >c ≥ c = 1 Objective > 0 >c ≥ c = 1 X X X X

Reachability (MD) (MD) (¬FR) (MD) Reachability X(MD) × × "(¬FR)

" ° ° " "

B¨uchi (¬FR) (MD) B¨uchi "(¬FR) × × (¬FR)

" ° ° " " Borel (¬FR) (¬FR) Borel "(¬FR) × × (¬FR) (a) Finitely branching games (b) Infinitely branching games TABLE I: Summary of determinacy and memory requirement properties for reachability, B¨uchi and Borel objectives and various probability thresholds. The results for safety and co-B¨uchi are implicit, e.g., > 0 B¨uchi is dual to to =1 co-B¨uchi. Similarly, (Objective,>c) is dual to (¬Objective, ≥ c). The results hold for every constant c ∈ (0, 1). Tables Ia and Ib show the results for finitely branching and infinitely branching countable games, respectively. “X(MD)” stands for “strongly MD-determined”, “X(¬FR)” stands for “strongly determined but not strongly FR-determined” and × stands for “not strongly determined”. New results are in boldface. (All these objectives are weakly determined by [20].) many such values, and in particular there exists some minimal Strong determinacy in infinite games. It was shown in [4], nonzero value (unless all states have value zero). This property [18], [5] that in finitely branching games with countable state does not carry over to infinite games. Here the set of states spaces reachability objectives with any threshold ✄c with is infinite and the infimum over the nonzero values can be c ∈ [0, 1], are strongly determined. However, the player ✷ zero. As a consequence, even for a reachability objective, it is strategy may need infinite memory [19], and thus reachability possible that all states have value > 0, but still the value of objectives are not strongly MD determined. Strong determi- some states is < 1. Such phenomena appear already in infinite- nacy does not hold for infinitely branching reachability games state Markov chains like the classic Gambler’s ruin problem with thresholds ✄c with c ∈ (0, 1); cf. Figure 1 in [4]. with unfair coin tosses in the player’s favor (e.g., 0.6 win and Our contribution to determinacy. We show that almost- 0.4 lose). The value, i.e., the probability of ruin, is always sure Borel objectives are strongly determined for games with > 0, but still < 1 in every state except the ruin state itself; countably infinite state spaces. (In particular this even holds for cf. [15] (Chapt. 14). infinitely branching games; cf. Table I.) This removes both the Weak determinacy. Using Martin’s result [21], Maitra & restriction to finite games and the restriction to tail objectives Sudderth [20] showed that stochastic games with Borel payoffs of [17, Theorem 3.3], and solves an open problem stated are weakly determined, i.e., all states have value. This very there. (To the best of our knowledge, strong determinacy was general result holds even for concurrent games and general open even for almost-sure reachability objectives in infinitely (not necessarily countable) state spaces. They work in the branching countable games.) framework of finitely additive probability theory (under weak On the other hand, we show that, for countable games, ✄c assumptions on measures) and only assume a finitely additive (co-)B¨uchi objectives are not strongly determined for any c ∈ law of motion. Also their payoff functions are general bounded (0, 1), not even if the game graph is finitely branching. Borel measurable functions, not necessarily predicates on Our contribution to strategy complexity. While ✄c reach- plays. ability objectives in finitely branching countable games are Strong determinacy. 
Given a predicate E on plays and a not strongly MD determined in general [19], we show that constant c ∈ [0, 1], strong determinacy of a threshold objective strong MD determinacy holds for many interesting subclasses. (E, ✄c) (where ✄ ∈ {>, ≥}) holds iff either the maximizer In finitely branching games, it holds for strict inequality > c or the minimizer has a winning strategy, i.e., a strategy that reachability, almost-sure reachability, and in all games where ✷ enforces (against any strategy of the other player) that the either player does not have any value-decreasing transitions ✸ predicate E holds with probability ✄c (resp. 6 ✄ c). In the case or player does not have any value-increasing transitions. of (E, = 1), one speaks of an almost-sure E objective. If the Moreover, we show that almost-sure B¨uchi objectives (but winning strategy of the winning player can be chosen MD not almost-sure co-B¨uchi objectives) are strongly MD deter- (memoryless deterministic) then one says that the threshold mined, provided that the game is finitely branching. objective is strongly MD determined. Similarly for other types Table I summarizes all properties of strong determinacy and of strategies, e.g., FR (finite-memory randomized). memory requirements for Borel objectives and subclasses on countably infinite games. Strong determinacy in finite games. Strong determinacy for almost-sure objectives (E, = 1) (and for the dual positive II. PRELIMINARIES probability objectives (E,> 0)) is sometimes called qualitative A probability distribution over a countable (not necessarily determinacy [17]. In [17, Theorem 3.3] it is shown that finite) set S is a function f : S → [0, 1] s.t. s∈S f(s)=1. finite stochastic games with Borel tail (i.e., prefix-independent) We use supp(f)= {s ∈ S | f(s) > 0} to denoteP the support objectives are qualitatively determined. (We’ll show a more of f. Let D(S) be the set of all probability distributions over S. 1 general result for countably infinite games and general objec- We consider 2 2 -player games where players have perfect tives; see below.) In the special case of parity objectives, even information and play in turn for infinitely many rounds. strong MD determinacy holds for any threshold ✄c [8]. Games G = (S, (S✷,S✸,S ), −→, P ) are defined such that

2 the of states is partitioned into the set S✷ of deterministic (D) if πu and πs map to Dirac distributions; it states of player ✷, the set S✸ of states of player ✸ and implies that τ(w) is a Dirac distribution for all partial plays w. random states S . The −→ ⊆ S × S is the transition All combinations of the properties in {M, F, H}×{D, R} are relation. We write s−→s′ if (s,s′) ∈ −→, and we assume possible, e.g., MD stands for memoryless deterministic. HR that each state s has a successor state s′ with s−→s′. The strategies are the most general type. probability function P : S → D(S) assigns to each random state s ∈ S a probability distribution over its successor Probability Measure and Events. To a game G, an initial states. The game G is called finitely branching if each state state s0 and strategies (σ, π) we associate the standard prob- ω has only finitely many successors; otherwise, it is infinitely ability space (s0S , F, PG,s0,σ,π) w.r.t. the induced Markov branching. Let ⊙∈{✷, ✸}. If S⊙ = ∅, we say that player ⊙ chain. First one defines a on the set of infi- ω ω is passive, and the game is a (MDP). nite plays s0S . The cylinder sets are the sets s0s1 ...snS , A Markov chain is an MDP where both players are passive. where s1,...,sn ∈ S and the open sets are arbitrary unions ω ∗ The stochastic game is played by two players ✷ (maximizer) of cylinder sets, i.e., the sets YS with Y ⊆ s0S . The Borel ω s0S and ✸ (minimizer). The game starts in a given initial state s0 σ-algebra F ⊆ 2 is the smallest σ-algebra that contains and evolves for infinitely many rounds. In each round, if all the open sets. the game is in state s ∈ S⊙ then player ⊙ chooses a The probability measure PG,s0,σ,π is obtained by first defin- successor state s′ with s−→s′; otherwise the game is in a ing it on the cylinder sets and then extending it to all sets in ′ random state s ∈ S and proceeds randomly to s with the Borel σ-algebra. If s0s1 ...sn is not a partial play induced ′ ω probability P (s)(s ). by (σ, π) then let PG,s0,σ,π(s0s1 ...snS )=0; otherwise let ω n−1 ω PG,s0,σ,π(s0s1 ...snS )= i=0 τ(s0s1 ...si)(si+1), where Strategies. A play w is an infinite s0s1 ··· ∈ S Q ∗ τ is such that τ(ws) = σ(ws) for all ws ∈ S S✷, τ(ws) = of states such that s −→s +1 for all i ≥ 0; let w(i) = s ∗ i i i π(ws) for all ws ∈ S S✸, and τ(ws) = P (s) for all denote the i-th state along w. A partial play is a finite prefix ∗ ws ∈ S S . By Carath´eodory’s extension theorem [2], this of a play. We say that (partial) play w visits s if s = w(i) defines a unique probability measure PG,s ,σ,π on the Borel for some i, and that w starts in s if s = w(0). A strategy 0 ∗ σ-algebra F. of the player ✷ is a function σ : S S✷ → D(S) that ∗ We will call any set E ∈F an event, i.e., an event is a assigns to partial plays ws ∈ S S✷ a distribution over the ′ ′ ∗ measurable (in the probability space above) set of infinite successors {s ∈ S | s−→s }. Strategies π : S S✸ → D(S) ✸ plays. Equivalently, one may view an event E as a Borel for the player are defined analogously. The set of all ω ✷ ✸ measurable payoff function of the form E : s0S → {0, 1}. strategies of player and player in G is denoted by ΣG ′ ω ′ ω Given E ⊆ S (where potentially E 6⊆ s0S ) we often write and ΠG , respectively (we omit the subscript and write Σ ′ ′ ω PG,s0,σ,π(E ) for PG,s0,σ,π(E ∩ s0S ) to avoid clutter. and Π if G is clear). A (partial) play s0s1 ··· is induced by strategies (σ, π) if si+1 ∈ supp(σ(s0s1 ··· si)) for all si ∈ S✷, Objectives. 
Let G = (S, (S✷,S✸,S ), −→, P ) be a game. and if s +1 ∈ supp(π(s0s1 ··· s )) for all s ∈ S✸. i i i The objectives of the players are determined by events E. We To emphasize the amount of memory required to implement write ¬E for the dual objective defined as ¬E = Sω \E. a strategy, we present an equivalent formulation of strategies. A strategy of player ⊙ can be implemented by a probabilistic Given a target set T ⊆ S, the reachability objective is defined by the event transducer T = (M, m0, πu, πs) where M is a countable set (the memory of the strategy), m0 ∈ M is the initial memory ω mode and S is the input and output alphabet. The probabilistic Reach(T )= {s0s1 ···∈ S | ∃i.si ∈T}. transition function πu : M × S → D(M) updates the memory mode of the transducer. The probabilistic successor Moreover, Reachn(T ) denotes the set of all plays visiting T function πs : M × S⊙ → D(S) outputs the next successor, in the first n steps, i.e., Reachn(T )= {s0s1 ···|∃i ≤ n.si ∈ ′ ′ where s ∈ supp(πs(m,s)) implies s−→s . We extend πu to T}. The safety objective is defined as the dual of reachability: D(M) × S → D(M) and πs to D(M) × S⊙ → D(S), in the Safety(T )= ¬Reach(T ). natural way. Moreover, we extend πu to paths by πu(m,ε)= For a set T ⊆ S of states called Buchi¨ states, the Buchi¨ m and πu(m,s0 ··· sn) = πu(πu(s0 ··· sn−1, m),sn). The ∗ objective is the event strategy τT : S S⊙ → D(S) induced by the transducer T T is given by τ (s0 ··· sn) := πs(sn, πu(s0 ··· sn−1, m0)). ω B¨uchi 0 1 Strategies are in general history dependent (H) and ran- (T )= {s s ···∈ S | ∀i ∃j ≥ i.sj ∈T}. domized (R). An H-strategy τ ∈ {σ, π} is finite mem- ory (F) if there exists some transducer T with memory M The co-Buchi¨ objective is defined as the dual of B¨uchi. such that τT = τ and |M| < ∞; otherwise τ requires Note that the objectives of player ✷ (maximizer) and infinite memory. An F-strategy is memoryless (M) (also called player ✸ (minimizer) are dual to each other. Where player ✷ positional) if |M| = 1. For convenience, we may view M- tries to maximize the probability of some objective E, player ✸ strategies as functions τ : S⊙ → D(S). An R-strategy τ is tries to maximize the probability of ¬E.

3 III. DETERMINACY Theorem 3.3] it is shown that finite stochastic games with A. Optimal and ǫ-Optimal Strategies; Weak and Strong De- tail objectives are qualitatively determined. An objective E ∗ ω terminacy is called tail if for all w0 ∈ S and all w ∈ S we have w0w ∈ E ⇔ w ∈ E, i.e., a tail objective is independent of Given an objective E for player ✷ in a game G, state s has finite prefixes. The authors of [17] express “hope that [their value if qualitative determinacy theorem] may be extended beyond the sup inf PG,s,σ,π(E) = inf sup PG,s,σ,π(E). of finite simple stochastic tail games”. We fulfill this σ∈Σ π∈Π π∈Π σ∈Σ hope by generalizing their theorem from finite to countable If s has value then valG (s) denotes the value of s defined games and from tail objectives to arbitrary objectives: by the above equality. A game with a fixed objective is called Theorem 2. Stochastic games, even infinitely branching ones, weakly determined iff every state has value. with almost-sure objectives are strongly determined. Theorem 1 (follows immediately from [20]). Countable Theorem 2 does not carry over to thresholds other than stochastic games (as defined in Section II) are weakly deter- 0 or 1; cf. Theorem 3. mined. The main ingredients of the proof of Theorem 2 are Theorem 1 is an immediate consequence of a far more gen- transfinite induction, weak determinacy of stochastic games eral result by Maitra & Sudderth [20] on weak determinacy of (Theorem 1), the concept of a “reset” strategy from [17], and (finitely additive) games with general Borel payoff objectives. L´evy’s zero-one law. The principal idea of the proof is to For ǫ ≥ 0 and s ∈ S, we say that construct a transfinite sequence of , by removing ✷ • σ ∈ Σ is ǫ-optimal (maximizing) iff PG,s,σ,π(E) ≥ parts of the game that player cannot risk entering. This valG(s) − ǫ for all π ∈ Π. approach is used later in this paper as well, for Theorems 5 • π ∈ Π is ǫ-optimal (minimizing) iff PG,s,σ,π(E) ≤ and 11. valG(s)+ ǫ for all σ ∈ Σ. Example 1. We explain this approach using the reachability A 0-optimal strategy is called optimal. An optimal strategy game in Figure 1 as an example. Each state has value 1 in this ✷ for the player is almost-surely winning if valG (s)=1. game, except those labeled with 0. However, only the states Unlike in finite-state games, optimal strategies need not exist labeled with ⊥ are almost-surely winning for player ✷. To in countable games, not even for reachability objectives in see this, consider a player ✷ state labeled with 1. In to finitely branching MDPs [3], [4]. reach T , player ✷ eventually needs to take a transition to a 0- However, since our games are weakly determined by The- labeled state, which is not almost-surely winning. This means orem 1, for all ǫ> 0 there exist ǫ-optimal strategies for both that the 1-labeled states are not almost-surely winning either. players. Hence, player ✷ cannot risk entering them if the player wants For an objective E and ✄ ∈ {≥,>} and threshold c ∈ [0, 1], to win almost surely. Continuing this style of reasoning, we ✄ we define threshold objectives (E, c) as follows. infer that the 2-labeled states are not almost-surely winning, ✄c • E is the set of states s for which there exists a strat- and so on. This implies that the ω-labeled states are not  ✷ G egy σ such that, for all π ∈ Π, we have PG,s,σ,π(E) ✄ c. almost-surely winning, and so on. 
The only almost-surely ✄6 c • E is the set of states s for which there exists a strat- winning player ✷ state is the ⊥-labeled state at the bottom of  ✸ G egy π such that, for all σ ∈ Σ, we have PG,s,σ,π(E)6✄c. the figure, and the only winning strategy is to take the direct We omit the subscript G where it is clear from the context. transition to the target in the bottom-left corner. We call a state s almost-surely winning for the player ✷ iff Proof of Theorem 2. The first step of the proof is to transform ≥1 s ∈ E . the game and the objective so that the objective can in some  ✷ By the duality of the players, a (E, ≥ c) objective for respects be treated like a tail objective. Let Gˆ be a stochastic player ✷ corresponds to a (¬E,> 1 − c) objective from game with countable state space Sˆ and objective Eˆ. We convert player ✸’s point of view. E.g., an almost-sure B¨uchi objective the game graph to a forest by encoding the history in the states. for player ✷ corresponds to a positive-probability co-B¨uchi Formally we proceed as follows. The state space, S, of the new objective for player ✸. Thus we can restrict our attention to game, G, consists of the partial plays in Gˆ, i.e., S ⊆ Sˆ∗Sˆ. reachability, B¨uchi and general () objectives, since Observe that S is countable. For any ⊙∈{✷, ✸, } we safety is dual to reachability, and co-B¨uchi is dual to B¨uchi, define S⊙ := {wsˆ ∈ S | sˆ ∈ Sˆ⊙}. A transition is a transition and Borel is self-dual. of G iff it is of the form wsˆ−→wsˆsˆ′ where wsˆ ∈ S and A game G with threshold objective (E, ✄c) is called strongly sˆ−→sˆ′ is a transition in Gˆ. The probabilities in G are defined ✷ ✸ determined iff in every state s either player or player has in the obvious way. For sˆ ∈ Sˆ we define an objective Esˆ so ✄c ✄6 c a winning strategy, i.e., iff S = E ✷ ⊎ E ✸ . that a play in G starting from the sˆ ∈ S satisfies Esˆ Strong determinacy depends on the specified threshold iff the corresponding play from sˆ ∈ Sˆ in Gˆ satisfies Eˆ. Since ✄c. Strong determinacy for almost-sure objectives (E, = 1) strategies in G (for singleton initial states in Sˆ) carry over (and for the dual positive probability objectives (E,> 0)) to strategies in Gˆ, it suffices to prove our determinacy result is sometimes called qualitative determinacy [17]. In [17, for G.

4 ......

⊥ 0 1 ⊥ 1 2 ⊥ 2 3

. . ⊥ 0 1 ⊥ 1 2 ⊥ 2 3 . .

0 1 1 2 2 3 3 4 ···

ω ω ω ω ··· ......

⊥ ω ω+1

. . ⊥ ω ω+1 . .

ω ω+1 ω+1 ω+2 ···

⊥ ⊥ ω·2 ω·2 ···

Fig. 1: A finitely branching reachability game where the states of player ✷ are drawn as squares and the random states as circles. Player ✸ is passive in this game. The states with double borders form the target set T ; those states have self-loops which are not drawn in the figure. For each random state, the distribution over the successors is uniform. Each state is labeled with an ordinal, which indicates the index of the state. In particular, the example shows that transfinite indices are needed.

Let us inductively extend the definition of Es from s = Since the sequence of sets Sα is non-increasing and S0 = S ′ sˆ ∈ Sˆ to arbitrary s ∈ S. For any transition s−→s in G, is countable, it follows that this sequence of games Gα ′ ω define Es′ := {x ∈ s S | sx ∈ Es}. This is well-defined converges (i.e., is ultimately constant) at some ordinal β where as the transition graph of G is a forest. For any s ∈ S, the β ≤ ω1 (the first uncountable ordinal). That is, we have event Es is also measurable. By this construction we obtain the Gβ = Gβ+1. Note in particular that Gβ does not contain any ′ following property: If a play y in G visits states s,s ∈ S then dead ends. (However, its state space Sβ might be empty. In the suffix of y starting from s satisfies Es iff the suffix of y this case it is considered to be losing for player ✷.) ′ starting from s satisfies Es′ . This property is weaker than the We define the index, I(s), of a state s as the smallest tail property (which would stipulate that all Es are equivalent), ordinal α with s ∈ Dα, and as ⊥ if such an ordinal does but it suffices for our purposes. not exist. For all states s ∈ S we have: In the remainder of the proof, when G′ is (a ′ ′ of) G, we write PG ,s,σ,π(E) for PG ,s,σ,π(Es) to avoid clutter. I(s)= ⊥ ⇔ s ∈ Sβ ⇔ valGβ (s)=1 Similarly, when we write valG′ (s) we mean the value with <1 respect to Es. We show that states with O are in , and states s I(s) ∈ E ✸ G s In order to characterize the winning sets of the players, =1   with I(s)= ⊥ are in E ✷ . we construct a transfinite sequence of subgames Gα of G,   G where α ∈ O is an ordinal , by stepwise removing Strategy πˆs: For each s ∈ S with I(s) ∈ O we construct a ✸ certain states that are losing for player ✷, along with their player strategy πˆs such that PG,s,σ,πˆs (E) < 1 holds for all incoming transitions. Thus some subgames Gα may contain player ✷ strategies σ. The strategy πˆs is defined inductively states without any outgoing transitions (i.e., dead ends). Such over the index I(s). dead ends are always considered as losing for player ✷. Let s ∈ S with I(s) = α ∈ O. In game Gα we have

(Formally, one might add a self-loop to such states and remove valGα (s) < 1. So by weak determinacy (Theorem 1) there is from the objective all plays that reach these states.) a strategy πˆs with PGα,s,σ,πˆs (E) < 1 for all σ. (For example, ✸ Let Sα denote the state space of the subgame Gα. We start one may take a (1 − valGα (s))/2-optimal player strategy). with G0 := G. Given Gα, denote by Dα the set of states s ∈ Sα We extend πˆs to a strategy in G as follows. Whenever the play O ′ ′ with valGα (s) < 1. For any α ∈ \{0} we define Sα := enters a state s ∈/ Sα (hence I(s ) < α) then πˆs switches to S \ Dγ. the previously defined strategy πˆs′ . (One could show that only Sγ<α

5 ✷ σˆk+1 player can take a transition leaving Sα, although this is not implying Xi ≥ 2/3. This switch of strategy is referred to needed at the moment.) as a “reset” in [17], where the concept is used similarly. For We show by transfinite induction on the index that any k, strategy σˆk performs at most k such resets. Define σˆ as ✷ PG,s,σ,πˆs (E) < 1 holds for all player strategies σ and for the limit of the σˆk, i.e., the number of resets performed by σˆ all states s ∈ S with I(s) ∈ O. is unbounded. For the induction hypothesis, let α be an ordinal for which In order to show that σˆ is almost surely winning, we first this holds for all states s with I(s) < α. For the inductive argue that σˆ almost surely performs only a finite number of step, let s ∈ S be a state with I(s) = α, and let σ be an resets. Suppose w ∈ Sω and k,i are such that a k-th reset arbitrary player ✷ strategy in G. happens after visiting the i-th state in w. As argued above, σˆk Suppose that the play from s under the strategies σ, πˆs we have Xi (w) ≥ 2/3. Towards a contradiction assume that always remains in Sα, i.e., the probability of ever leaving Sα player ✸ has a strategy π1 to cause yet another reset with under σ, πˆs is zero. Then any play in G under these strategies probability p1 > 1/2, i.e., coincides with a play in Gα, so we have PG,s,σ,πˆs (E) = p1 := PGβ ,s0,σˆk ,π1 (R | Ei(w)) > 1/2 , PGα,s,σ,πˆs (E) < 1, as desired. Now suppose otherwise, i.e., the play from s under σ, πˆs, with positive probability, enters where R denotes the event of another reset after time i. If ′ ′ σˆk a state s ∈/ Sα, hence I(s ) < α. By the induction hypothesis another reset occurs, say at time j, then Xj (w) < 1/3, ′ ′ ′ ✸ we have PG,s ,σ ,πˆs′ (E) < 1 for any σ . Since the probability and then player can switch to a strategy π2 to force ′ of entering s is positive, we conclude PG,s,σ,πˆs (E) < 1, as PGβ ,s0,σˆk ,π2 (E | Ej (w)) ≤ 1/3. Hence: desired. p2 := PGβ ,s0,σˆk,π2 (E | R ∧ Ei(w)) ≤ 1/3 Strategy σˆ: For each s ∈ S with I(s)= ⊥ (and thus s ∈ Sβ) ✸ we construct a player ✷ strategy σˆ such that PG,s,σ,πˆ (E)=1 Let π1,2 denote the player strategy combining π1 and π2. holds for all player ✸ strategies π. We first observe that if Then it follows: s1−→s2 is a transition in G with s1 ∈ S✸∪S and I(s2) 6= ⊥ P ˆ (E ∧ R | E (w)) = p1 · p2 and O Gβ ,s0,σk ,π1,2 i then I(s1) 6= ⊥. Indeed, let I(s2)= α ∈ , thus valGα (s2) < PGβ ,s0,σˆk ,π1,2 (E ∧ ¬R | Ei(w)) ≤ PGβ ,s0,σˆk,π1,2 (¬R | Ei(w)) 1; if s1 ∈ Sα then valGα (s1) < 1 and thus I(s1) = α; if s1 ∈/ Sα then I(s1) < α. It follows that only player ✷ could = 1 − p1 ever leave the state space S , but our player ✷ strategy σˆ β Hence we have: will ensure that the play remains in S forever. Recall that G β β 2 does not contain any dead ends and that val (s)=1 for all Gβ PGβ ,s0,σˆk ,π1,2 (E | Ei(w)) ≤ (p1 · p2)+(1 − p1) ≤ 1 − p1 s ∈ S . For all s ∈ S , by weak determinacy (Theorem 1) we 3 β β 2 1 2 fix a strategy σs with PG ,s,σ ,π(E) ≥ 2/3 for all π. < 1 − · = , β s 3 2 3 Fix an arbitrary state s0 ∈ Sβ as the initial state. For a σ σ ω σˆk player ✷ strategy σ, define mappings X1 ,X2 ,... : s0S → contradicting Xi (w) ≥ 2/3. So at time i, the probability of [0, 1] using conditional probabilities: another reset is bounded by 1/2. Since this holds for every σ reset time i, we conclude that almost surely there will be only X (w) := inf PG ,s ,σ,π(E | Ei(w)) , i Π β 0 finitely many resets under σˆ, regardless of π. π∈ Gβ Now we can show that PGβ ,s0,σ,πˆ (E)=1 holds for all π. 
where denotes the event containing the plays that start Ei(w) Fix π arbitrarily. For k ∈ N define Qk as the event that exactly ω with the length-i prefix of w ∈ s0S . Thanks to our “forest” k resets occur. Let us write Pk = PG ,s ,σˆ ,π to avoid clutter. σ β 0 k construction at the beginning of the proof, Xi (w) depends, By L´evy’s zero-one law (see, e.g., [25, Theorem 14.2]), for in fact, only on the i-th state visited by w. any k, we have Pk-almost surely that either σ For some illustration, a small value of Xi (w) means that ✸ (E ∨ ¬Qk) ∧ lim Pk(E ∨ ¬Qk | Ei(w))=1 considering the length-i prefix of w, player has a strategy i→∞ that makes E unlikely at time i. Similarly, a large value or of Xσ(w) means that at time i (when the length-i prefix i (¬E ∧ Qk) ∧ lim Pk(E ∨ ¬Qk | Ei(w))=0 has been “uncovered”) the probability of E using σ is large, i→∞ regardless of the player ✸ strategy. holds. Let w be a play that satisfies the second option. In σ N σˆk In the following we view Xi as a random variable (taking particular, w ∈ Qk, so there exists i0 ∈ with Xi (w) ≥ 1/3 on a random value depending on a random play). for all i ≥ i0. It follows that Pk(E | Ei(w)) ≥ 1/3 holds for We define our almost-surely winning player ✷ strategy σˆ all i ≥ i0. But that contradicts the fact that limi→∞ Pk(E ∨ as the limit of inductively defined strategies σˆ0, σˆ1,.... Let ¬Qk | Ei(w)) =0. So plays satisfying the second option do σˆ0 σˆ0 := σs0 . Using the definition of σs0 we get X1 ≥ 2/3. For not actually exist. any k ∈ N, define σˆk+1 as follows. Strategy σˆk+1 plays σˆk Hence we conclude Pk(E ∨¬Qk)=1, thus Pk(¬E ∧Qk)= σˆk as long as Xi ≥ 1/3. This could be forever. Otherwise, let 0. Since the strategies σˆ and σˆk agree on all finite prefixes of σˆk i denote the smallest i with Xi < 1/3, and let s be the i-th all plays in Qk, the probability measures PGβ ,s0,σ,πˆ and Pk state of the play. At that time, σˆk+1 switches to strategy σs, agree on all subevents of Qk. It follows PGβ ,s0,σ,πˆ (¬E∧Qk)=

6 0. We have shown previously that the number of resets is some i ∈ N, resulting in a faithful simulation of infinite ′ ′ almost surely finite, i.e., PGβ ,s0,σ,πˆ ( k∈N Qk)=1. Hence we branching from s0 to some state ri, just like in the reachability have: W game in [4]. −i ′ −i From the fact that valG(ri)=1−2 and valG (ri)=2 , P ˆ (¬E) = P ˆ ¬E ∧ Q Gβ ,s0,σ,π Gβ ,s0,σ,π _ k we deduce the following properties of this game: k∈N • valG (s0) = 1, but there exists no optimal strategy ≤ P ˆ (¬E ∧ Q ) X Gβ ,s0,σ,π k starting in s0. The value is witnessed by a family of ǫ- k∈N optimal strategies σi: traversing the ladder s0s1 ··· si and = 0 choosing si−→ri. ′ • val (s )=0, but there exists no optimal minimizing Thus, P ˆ (E) = 1. Since σˆ is defined on G , this G 0 Gβ ,s0,σ,π β ′ strategy starting in s ; however, in analogy with si, there strategy never leaves Sβ. Since only player ✷ might have 0 are ǫ-optimal strategies. transitions that leave Sβ, we conclude PG,s0,σ,πˆ (E)=1. 1 • valG (i)= 2 . We argue below that neither player has an 1 ≥ 2 B. Reachability and Safety optimal strategy starting in i. It follows that i 6∈ E ✷ ⊎ 6> 1   It was shown in [4] and [18] (and also follows as a corollary E 2 for the B¨uchi condition ϕ. So neither player has a  ✸ from [5]) that finitely branching games with reachability winning strategy, neither for (E, ≥1/2) nor for (E,>1/2). objectives with any threshold ✄c with c ∈ [0, 1] are strongly Indeed, consider any player ✷ strategy σ. Following σ, determined. In contrast, strong determinacy does not hold for once the game is in s0, B¨uchi states cannot be visited ✄ 1 infinitely branching reachability games with thresholds c with probability more than 2 · (1 − ǫ) for some fixed ✸ ǫ with c ∈ (0, 1); cf. Figure 1 in [4]. However, by Theorem 2, ǫ > 0 and all strategies π. Player has an 2 -optimal ′ strong determinacy does hold for almost-sure reachability and strategy π starting in s0. Then we have: safety objectives in infinitely branching games. By duality, 1 1 ǫ 1 P (E) ≤ · (1 − ǫ)+ · < , this also holds for reachability and safety objectives with G,i,σ,π 2 2 2 2 threshold >0. (For almost-sure safety (resp. > 0 reachability), so σ is not optimal. One can argue symmetrically that this could also be shown by a reduction to non-stochastic 2- player ✸ does not have an optimal strategy either. player reachability games [26].) In the example in Figure 2, the game branches from state i ′ C. Buchi¨ and co-Buchi¨ to s0 and s0 with probability 1/2 respectively. However, the Let E be the B¨uchi objective (the co-B¨uchi objective is above argument can be adapted to work for probabilities c and dual). Again, Theorem 2 applies to almost-sure and positive- 1 − c for every constant c ∈ (0, 1). probability B¨uchi and co-B¨uchi objectives, so those games are IV. MEMORY REQUIREMENTS strongly determined, even infinitely branching ones. In this section we study how much memory is needed to However, this does not hold for thresholds c ∈ (0, 1), not win objectives (E, ✄c), depending on E and on the constraint even for finitely branching games: ✄c. Theorem 3. Threshold (co-)Buchi¨ objectives (E, ✄c) with We say that an objective (E, ✄c) is strongly MD-determined thresholds c ∈ (0, 1) are not strongly determined, even for iff for every state s either finitely branching games. • there exists an MD-strategy σ such that, for all π ∈ Π, ✄ A fortiori, threshold parity objectives are not strongly de- we have PG,s,σ,π(E) c, or termined, not even for finitely branching games. 
We prove • there exists an MD-strategy π such that, for all σ ∈ Σ, ✄ Theorem 3 using the finitely branching game in Figure 2. It we have PG,s,σ,π(E)6 c. is inspired by an infinitely branching example in [4], where it If a game is strongly MD-determined then it is also strongly was shown that threshold reachability objectives in infinitely determined, but not vice-versa. Strong FR-determinacy is branching games are not strongly determined. defined analogously. Proof sketch of Theorem 3. The game in Figure 2 is finitely A. Reachability and Safety Objectives branching, and we consider the B¨uchi objective. The infinite Let T ⊆ S and (Reach(T ), ✄c) be a threshold reachability choice for player ✸ in the example of [4] is simulated with an objective. (Safety objectives are dual to reachability.) ′ ′ ′ infinite chain s0s1s2 ··· of B¨uchi states in our example. All Let us briefly discuss infinitely branching reachability ′ ′ ′ states s0s1s2 ··· are finitely branching and belong to player ✸. games. If c ∈ (0, 1) then strong determinacy does not hold; ✸ ′ The crucial property is that player can stay in the states si cf. Figure 1 in [4]. Objectives (Reach(T ), ≥ 1) are strongly for arbitrarily long (thus making the probability of reaching determined (Theorem 2), but not strongly FR-determined, ′ ✸ ✷ the state t arbitrarily small) but not forever. Since the states si because player needs infinite memory (even if player are B¨uchi states, plays that stay in them forever satisfy the is passive) [19]. Objectives (Reach(T ),> 0) correspond to B¨uchi objective surely, something that player ✸ needs to avoid. non-stochastic 2-player reachability games, which are strongly ✸ ′ ′ So a player strategy must choose a transition si−→ri for MD-determined [26].

7 1 2 s0 s1 s2 ··· si ···

1 1 1 1 2 2 2 2 r0 r1 r2 ··· ri ···

1 1 1 2 2 2 i t 1 1 1 2 2i 1 4 ′ 2 ′ ′ ··· ′ r0 r1 r2 ri ··· 3 1 4 1 − 2i s′ s′ s′ ··· s′ ··· 1 0 1 2 i 2 Fig. 2: A finitely branching game where the states of players ✷ and ✸ are drawn as squares and diamonds, respectively; ′ random states s ∈ S are drawn as circles. The states si and state t (double borders) are B¨uchi states, all other states are not. ≥ 1 6> 1 The value of the initial state i is 1 , for the B¨uchi objective E. However, i 6∈ E 2 ⊎ E 2 , meaning that neither player has 2  ✷  ✸ a winning strategy, neither for the objective (E, ≥1/2) nor for (E,>1/2).

In the rest of this subsection we consider finitely branching Remark 1. Condition (1) or (2) of Theorem 5 is trivially reachability games. It is shown in [4], [18] that finitely satisfied if the corresponding player is passive, i.e., in MDPs. It branching reachability games are strongly determined, but the was already known that MD strategies are sufficient for safety winning ✷ strategy constructed therein uses infinite memory. and reachability objectives in countable finitely branching Indeed, Kuˇcera [19] showed that infinite memory is necessary MDPs ([22], Section 7.2.7). Theorem 5 generalizes this result. in general: Remark 2. Theorem 5 does not carry over to stochastic reach- Theorem 4 (follows from Proposition 5.7.b in [19]). Finitely ability games with an arbitrary number of players, not even if branching reachability games with (Reach(T ), ≥ c) objec- the game graph is finite. Instead multiplayer games can require tives are not strongly FR-determined for c ∈ (0, 1). infinite memory to win. Proposition 4.13 in [24] constructs an 11-player finite-state stochastic reachability game with a pure The example from [19] that proves Theorem 4 has the subgame-perfect where the first player wins following properties: almost surely by using infinite memory. However, there is no (1) player ✷ has value-decreasing (see below) transitions; finite-state Nash equilibrium (i.e., an equilibrium where all (2) player ✸ has value-increasing (see below) transitions; players are limited to finite memory) where the first player (3) threshold c 6=0 and c 6=1; wins with positive probability. That is, the first player cannot (4) nonstrict inequality: ≥c. win with only finite memory, not even if the other players are Given a game G, we call a transition s−→s′ value-decreasing restricted to finite memory. ′ (resp., value-increasing) if valG (s) > valG(s ) (resp., The rest of the subsection focuses on the proof of Theo- ′ ✷ ✸ valG(s) < valG(s )). If player (resp., player ) controls rem 5. We will need the following result from [4]: ′ a transition s−→s , i.e., s ∈ S✷ (resp., s ∈ S✸), then the transition cannot be value-increasing (resp., value-decreasing). Lemma 6. (Theorem 3.1 in [4]) If G is a finitely branching We write RVI (G) for the game obtained from G by removing reachability game then there is an MD strategy π ∈ Π that ✸ the value-increasing transitions controlled by player ✸. Note is optimal minimizing in every state (i.e., valG(π(s)) = that this does not create any dead ends in finitely valG(s)). branching games, because at least one transition to a successor One challenge in proving Theorem 5 is that an optimal state with the same value will always remain for such games. minimizing player ✸ MD strategy according to Lemma 6 is We show that a reachability game is strongly MD- not necessarily winning for player ✸, even for almost-sure determined if any of the properties listed above is not satisfied: reachability and even if player ✸ has a winning strategy. Indeed, consider the game in Figure 2, and add a new player ✸ Theorem 5. Finitely branching games G with reachability state and transitions and . For the reachability objectives (Reach(T ), ✄c) are strongly MD-determined, pro- u u−→s0 u−→t vided that at least one of the following conditions holds. objective Reach({t}), we then have valG (u) = valG (s0) = valG(t)=1, and the player ✸ MD strategy π with π(u)= t (1) player ✷ does not have value-decreasing transitions, or is optimal minimizing. 
However, ✸ is not winning from u (2) player ✸ does not have value-increasing transitions, or w.r.t. the almost-sure objective (Reach({t}), ≥ 1). Instead the ′ ′ (3) almost-sure objective: ✄ = ≥ and c =1, or winning strategy is π with π (u)= s0. (4) strict inequality: ✄ = >. By the following lemma (from [4]), player ✷ has for every

8 Env ′ state an ǫ-optimal strategy that needs to be defined only on a been removed). If the game enters i then it will reach the Env ′ finite horizon: target from within i with probability ≥ λ. Moreover, if Env ′ the game stays inside i forever then it will almost surely Lemma 7. (Lemma 3.2 in [4]) If G is a finitely branching ∞ Env ′ reach the target, since (1 − λ) =0. Otherwise, it exits i game with reachability objective Reach(T ) then: ′ Env ′ at some state s ∈/ i (strictly speaking, at a distribution of N Env ′ ′ ∀ s ∈ S ∀ ǫ> 0 ∃ σ ∈ Σ ∃ n ∈ ∀ π ∈ Π . such states). If this was the k-th visit to i then, from s , ′ k+1 PG,s,σ,π(Reachn(T )) > valG(s) − ǫ , σ plays an ǫ 2 -optimal strategy w.r.t. Gi (with the same modification as above if it visits Env ′ again). We can now where Reach (T ) denotes the event of reaching T within at i n bound the error of σ′ from s as follows. The set of plays most n steps. Env ′ which visit i infinitely often contribute no error, since Towards a proof of item (1) of Theorem 5, we prove the they almost surely reach the target by (1 − λ)∞ =0. Since all following lemma: transitions are at least value-preserving in G and hence in Gi, the error of the plays which visit Env ′ at most j times is Lemma 8. Let G be a finitely branching game with reacha- i bounded by j ǫ 2k. Therefore, the error of σ′ from s in bility objective Reach(T ). Suppose that player ✷ does not Pk=1  G +1 is bounded by ǫ and thus val (s)= val (s). have any value-decreasing transitions. Then there exists a i Gi+1 Gi Finally, we can construct the player ✷ MD winning strategy player ✷ MD strategy σˆ that is optimal in all states. That σˆ as the limit of the MD strategies σ′, which are all compatible is, for all states s and for all player ✸ strategies π we have i with each other by the construction of the games Gi. We obtain PG,s,σ,πˆ (Reach(T )) ≥ valG(s). −i N PG,si,σ,πˆ (Reach(T )) > valG (si) − 2 for all i ∈ . Let Proof. In order to construct the claimed MD strategy σˆ, we s ∈ S. Since s = si holds for infinitely many i, we conclude define a sequence of modified games Gi in which the strategy Thus PG,s,σ,πˆ (Reach(T )) ≥ valG(s) as required. of player ✷ is already fixed on a finite of the state Towards a proof of items (2) and (3) of Theorem 5, we space. We will show that the value of any state remains the consider the operation RVI (G), defined before the statement same in all the G , i.e., val (s)= val (s) for all s. Fix an i Gi G of Theorem 5. The following lemma shows that in reachability s1,s2,... that includes every state in S infinitely games all value-increasing transitions of player ✸ can be often. Let G0 := G. removed without changing the value of any state (although Given Gi we construct Gi+1 as follows. We use Lemma 7 to N the of the threshold reachability game may change get a strategy σi and ni ∈ s.t. PGi,si,σi,π(Reachni (T )) > −i in general). valGi (si) − 2 . From the finiteness of ni and the assump- tion that G is finitely branching, we obtain that Env i := Lemma 9. Let G be a finitely branching reachability game ≤ni ′ ′ RVI ′ {s | si−→ s} is finite. Consider the subgame Gi with finite and G := (G). Then for all s ∈ S we have valG (s) = ′ ′ state space Env i. In this subgame there exists an optimal valG(s). Thus RVI (G )= G . ′ MD strategy σi that maximizes the reachability probability Proof. Since only ✸ transitions are removed, we trivially have for every state in Env . In particular, σ′ achieves the same i i ′ ′ valG (s) ≥ valG(s). 
For the other inequality observe that approximation in G as σi in Gi, i.e., PG′ ,s ,σ′ ,π(Reach(T )) > i i i i the optimal minimizing strategy of Lemma 6 never takes any −i Env ′ Env valGi (si) − 2 . Let i be the subset of states s in i ′ ′ value-increasing transition and thus also guarantees the value with ′ . Since Env is finite, there exist N valG (s) > 0 i ni ∈ ′ ′ i ′ in G . Thus also valG (s) ≤ valG (s). and λ> 0 with P ′ ′ (Reach ′ (T )) ≥ λ for all s ∈ Env Gi,s,σi,π ni i and all ′ . Lemma 9 is in sharp contrast to Example 1 on page 4, which π ∈ ΠGi We now construct Gi+1 by modifying Gi as follows. For showed that the removal of value-decreasing transitions can ✷ Env ′ change the value of states and can cause further transitions to every player state s ∈ i we fix the transition according ′ ′ become value-decreasing. to σi, i.e., only transition s−→σi(s) remains and all other transitions from s are deleted. Since all moves from ✷ states Similar to the proof of Theorem 2, the proof of the following Env ′ ′ lemma considers a transfinite sequence of subgames, where in i have been fixed according to σi, the bounds above ′ ′ each subgame is obtained by removing the value-decreasing for Gi and σi now hold for Gi+1 and any σ ∈ ΣGi+1 . That −i transitions from the previous subgames. is, we have PGi+1,si,σ,π(Reach(T )) > valGi (si) − 2 and ′ P (Reach ′ (T )) ≥ λ for all s ∈ Env and all σ ∈ Gi+1,s,σ,π ni i Lemma 10. Let G be a finitely branching game with reach- ΣGi+1 and all π ∈ ΠGi+1 . ability objective Reach(T ). Then there exist a player ✷ MD Now we show that the values of all states s in Gi+1 are still strategy σˆ and a player ✸ MD strategy πˆ such that for all the same as in . Since our games are weakly determined, it Gi states s ∈ S, if G = RVI (G) or valG(s) =1, then the suffices to show that player ✷ has an ǫ-optimal strategy from following is true: s in Gi+1 for every ǫ > 0. Let π be an arbitrary ✸ strategy ∀ π ∈ ΠG : PG,s,σ,πˆ (Reach(T )) ≥ valG(s) or from s in Gi+1. Let s be a state and σ be an ǫ/2-optimal ✷ ′ strategy from s in Gi. We now define a ✷ strategy σ from s ∀ σ ∈ ΣG : PG,s,σ,πˆ(Reach(T )) < valG (s). Env ′ ′ in Gi+1. If the game does not enter i then σ plays exactly Proof. We construct a transfinite sequence of subgames Gα, Env ′ O as σ (which is possible since outside i no transitions have where α ∈ is an ordinal number, by stepwise removing

9 certain transitions. Let −→α denote the set of transitions of This exists by the assumption that G is finitely branching the subgame Gα. and the definition of Gα. In particular, since the transition ′ First, let G0 := RVI (G). Since G is assumed to have no s−→s is present in Gα, it is not value-increasing in the dead ends, it follows from the definition of RVI that G0 does game G; otherwise it would have been removed in the not contain any dead ends either. In the following, we only step from G to G0. remove transitions of player ✷. The resulting games Gα with • If I(s)= ⊥, πˆ plays the optimal minimizing MD strategy α> 0 may contain dead ends, but these are always considered on G from Lemma 6, i.e., we have πˆ(s)= s′ where s′ is to be losing for player ✷. (Formally, one might add a dummy an arbitrary but fixed successor of s in G with valG (s)= ′ loop at these states.) For each α ∈ O we define a set Dα as valG (s ). the set of transitions that are controlled by player ✷ and that Considering both cases, it follows that strategy πˆ is optimal O are value-decreasing in Gα. For any α ∈ \{0} we define minimizing in G. −→α := −→\ Dγ. O Sγ<α Let s0 be an arbitrary state with I(s0) ∈ . To show that Since the sequence of sets −→α is non-increasing and we PG,s0,σ,πˆ (Reach(T )) < valG (s0) holds for all σ, let σ be assumed that our game G has only countably many states and any strategy of player ✷. Let α 6= ⊥ be the smallest index transitions, it follows that this sequence of games Gα converges among the states that can be reached with positive probability at some ordinal β where β ≤ ω1 (the first uncountable from s0 under the strategies σ, πˆ. Let s1 be such a state with ordinal). I.e., we have Gβ = Gβ+1. In particular there are index α. In the following we write σ also for the strategy σ ✷ no value-decreasing player transitions in Gβ, i.e., Dβ = ∅. after a partial play leading from s0 to s1 has been played. ✷ The removal of transitions of player can only decrease Suppose that the play from s1 under the strategies σ, πˆ the value of states, and the operation RVI is value preserving always remains in Gα. Strategy πˆ might not be optimal by Lemma 9. Thus for all valGβ (s) ≤ valGα (s) ≤ valG(s) minimizing in Gα in general. However, we show that it is O. We define the index of a state by α ∈ s I(s) := min{α ∈ optimal minimizing in Gα from all states with index ≥ α. Let O ′ | valGα (s) < valG (s)}, and as ⊥ if the set is empty. s be a ✸ state with index I(s)= α ≥ α. By definition of πˆ we ′ ′ Strategy σˆ: Since Gβ does not have value-decreasing tran- have πˆ(s) = s where the transition s−→s is present in Gα′ ✷ ′ ′ ′ sitions, we can invoke Lemma 8 to obtain a player MD with valGα′ (s) = valGα′ (s ) and I(s ) = I(s) = α . In the ′ ′ strategy σˆ with PGβ ,s,σ,πˆ (Reach(T )) ≥ valGβ (s)= valG (s) case where α = α this directly implies that the step s−→s is ′ for all π and for all s with I(s) = ⊥. We show that, if optimal minimizing in Gα. The remaining case is that α > α.

I(s) = ⊥ and either valG (s)=1 or G = RVI (G), then Here, by definition of the index, valG(s) = valGα (s) and ′ ′ ′ also in G we have PG,s,σ,πˆ (Reach(T )) ≥ valG(s). The only valG(s )= valGα (s ). Since the transition s−→s is present potential difference in the game on G is that π could take a in Gα′ , it is also present in G0 and Gα. Since G0 = RVI (G), ′ ′′ ✸ transition, say s −→s , that is present in G but not in Gβ. this transition is not value-increasing in G. Also, it is not Since all ✸ transitions of G0 are kept in Gβ, such a transition value-decreasing in G, because it is a ✸ transition. Therefore ′ ′ would have been removed in the step G0 := RVI (G). We show valG(s) = valG (s ), and thus valGα (s) = valGα (s ). Also ′ that this is impossible. in this case the step s−→s is optimal minimizing in Gα. For the first case suppose that s satisfies I(s) = ⊥ and So the only possible exceptions where strategy πˆ might valG(s) = 1. It follows valGβ (s) = 1. Since Gβ does not be optimal minimizing in Gα are states with index < α. ′ not have value-decreasing transitions, we have valGβ (s ) = Since we have assumed above that such states cannot be ′′ ′ ′′ valGβ (s )=1, hence valG (s ) = valG(s )=1, so the reached under σ, πˆ, it follows that PG,s1,σ,πˆ(Reach(T )) ≤ ′ ′′ transition s −→s is not value-increasing in G. Hence the valGα (s1) < valG(s1). transition is present in G0, hence also in Gβ. Now suppose that the play from s1 under σ, πˆ, with positive For the second case suppose G = RVI (G). Since G does not probability, takes a transition, say s2−→s3, that is not present ′ ′′ contain any value-increasing transitions, the transition s −→s in Gα. Then this transition was value-decreasing for some ′ ′ is not value-increasing in G. So it is present in G0, and thus game Gα with α < α: that is, valGα′ (s2) > valGα′ (s3). ′ also in Gβ . Since the indices of both s2 and s3 are ≥ α > α , we have

It follows that under σˆ the play remains in the states of Gβ valG(s2) = valGα′ (s2) > valGα′ (s3) = valG(s3). Hence and only uses transitions that are present in Gβ, regardless the transition s2−→s3 is value-decreasing in G. Since πˆ is op- of the strategy π. In this sense, all plays under σˆ on G timal minimizing in G, we also have PG,s1,σ,πˆ(Reach(T )) < coincide with plays on Gβ . Hence PG,s,σ,πˆ (Reach(T )) = valG(s1).

PGβ ,s,σ,πˆ (Reach(T )) ≥ valG(s). Since πˆ is optimal minimizing in G, we conclude that we have P ˆ (Reach(T )) < val (s0). Strategy πˆ: It now suffices to define a player ✸ MD strategy πˆ G,s0,σ,π G so that we have PG,s,σ,πˆ(Reach(T )) < valG (s) for all σ and We are now ready to prove Theorem 5. for all s with I(s) ∈ O. This strategy πˆ is defined as follows. ′ ′ • If I(s) = α then πˆ(s) = s where s is an arbitrary but Proof of Theorem 5. Let G be a finitely branching game with ′ fixed successor of s where transition s−→s is present reachability objective (Reach(T ), ✄c). Let s0 ∈ S be an ′ ′ in Gα and valGα (s)= valGα (s ) and I(s )= I(s)= α. arbitrary initial state.

10 Suppose valG (s0) < c. Then player ✸ wins with the MD Hence finitely branching almost-sure Buchi¨ games are strongly strategy from Lemma 6. MD-determined. Suppose val (s0) > c. Let δ := val (s0) − c > 0. G G For the proof we need the following lemmas, which are By Lemma 7 there are a strategy σ ∈ Σ and n ∈ N such + δ variants of Lemmas 6 and 8 for the objective Reach (T ), that P (Reach (T )) > val (s0) − > c holds for G,s0,σ,π n G 2 which is defined as: all π ∈ Π. The strategy σ plays on the subgame G′ with ′ ′ ≤n ′ + ω state space S = {s ∈ S | s−→ s }, which is finite since Reach (T ) := {s0s1 ···∈ S | ∃ i ≥ 1.si ∈T} G is finitely branching. Therefore, there exists an MD strat- + ′ ′ ′ The difference to Reach(T ) is that Reach (T ) requires a path egy σ with PG ,s0,σ ,π(Reach(T )) ≥PG,s0,σ,π(Reachn(T )). Since S′ ⊆ S, the strategy σ′ also applies in G, to T that involves at least one transition. ′ ′ ′ hence PG,s0,σ ,π(Reach(T )) ≥ PG ,s0,σ ,π(Reach(T )). Lemma 12. Let G be a finitely branching game with objective By combining the mentioned inequalities we obtain that Reach+(T ). Then there is an MD strategy π ∈ Π that is ′ PG,s0,σ ,π(Reach(T )) > c holds for all π ∈ Π. So the MD optimal minimizing in every state. strategy σ′ is winning for player ✷. Proof. Outside T , the objectives Reach(T ) and Reach+(T ) It remains to consider the case valG (s0)= c. Let us discuss coincide, so outside T , the MD strategy π from Lemma 6 is the four cases from the statement of Theorem 5 individually. + optimal minimizing for Reach (T ). Any s ∈ T ∩ S✸ with (4) If ✄ = > then player ✸ wins with the MD strategy from ′ ′ valG(s) < 1 must have a transition s−→s with s ∈/ T and Lemma 6. ′ valG(s) = valG(s ), where the value is always meant with So for the remaining cases it suffices to consider the threshold respect to Reach+(T ). Set π(s) := s′. Then π is optimal objective (Reach(T ), ≥ valG(s0)). minimizing in every state, as desired. ✷ (1) If player does not have value-decreasing transitions Lemma 13. Let G be a finitely branching game with ob- ✷ then player wins with the MD strategy from Lemma 8. jective Reach+(T ). Suppose player ✷ does not have value- ✸ (2) If player does not have value-increasing transitions decreasing transitions. Then there is an MD strategy σ ∈ Σ ✷ ✸ then Lemma 10 supplies either player or player with that is optimal maximizing in every state. an MD winning strategy. Proof. Outside , the objectives Reach and Reach+ (3) If c = valG(s0)=1 then, again, Lemma 10 supplies T (T ) (T ) either player ✷ or player ✸ with an MD winning strategy. coincide, so outside T , the MD strategy σ from Lemma 8 is + optimal maximizing for Reach (T ). Any s ∈ T ∩ S✷ must This completes the proof of Theorem 5. ′ ′ ′ have a transition s−→s with s ∈ T or valG(s)= valG(s ), where the value is always meant with respect to Reach+(T ). B. Buchi¨ and co-Buchi¨ Objectives Set σ(s) := s′. Then σ is optimal maximizing in every state, Let E be the B¨uchi objective. (The co-B¨uchi objective is as desired. dual.) Quantitative B¨uchi objectives (E, ✄c) with c ∈ (0, 1) are not strongly determined, not even for finitely branching With this at hand, we prove Theorem 11. games (Theorem 3), but positive probability (E,> 0) and Proof of Theorem 11. We proceed similarly to the proof of almost-sure (E, ≥ 1) B¨uchi objectives are strongly determined Theorem 2. In the present proof, whenever we write valG′ (s) (Theorem 2). 
B. Büchi and co-Büchi Objectives

Let E be the Büchi objective. (The co-Büchi objective is dual.) Quantitative Büchi objectives (E, ✄c) with c ∈ (0, 1) are not strongly determined, not even for finitely branching games (Theorem 3), but positive probability (E, > 0) and almost-sure (E, ≥ 1) Büchi objectives are strongly determined (Theorem 2).

However, (E, > 0) objectives are not strongly FR-determined, even in finitely branching systems. Even in the special case of finitely branching MDPs (where player ✸ is passive and the game is trivially strongly determined), player ✷ may require infinite memory to win [18].

In infinitely branching games, the almost-sure Büchi objective (E, ≥ 1) is not strongly FR-determined, because it subsumes the almost-sure reachability objective; cf. Subsection IV-A.

In contrast, in finitely branching games, the almost-sure Büchi objective (E, ≥ 1) is strongly MD-determined, as the following theorem shows:

Theorem 11. Let G be a finitely branching game with objective Büchi(T). Then there exist a player ✷ MD strategy σ̂ and a player ✸ MD strategy π̂ such that for all states s ∈ S:

∀π ∈ Π_G : P_{G,s,σ̂,π}(Büchi(T)) = 1   or   ∀σ ∈ Σ_G : P_{G,s,σ,π̂}(Büchi(T)) < 1.

Hence finitely branching almost-sure Büchi games are strongly MD-determined.

For the proof we need the following lemmas, which are variants of Lemmas 6 and 8 for the objective Reach+(T), which is defined as

Reach+(T) := {s0 s1 ··· ∈ S^ω | ∃i ≥ 1. s_i ∈ T}.

The difference to Reach(T) is that Reach+(T) requires a path to T that involves at least one transition.

Lemma 12. Let G be a finitely branching game with objective Reach+(T). Then there is an MD strategy π ∈ Π that is optimal minimizing in every state.

Proof. Outside T, the objectives Reach(T) and Reach+(T) coincide, so outside T the MD strategy π from Lemma 6 is optimal minimizing for Reach+(T). Any s ∈ T ∩ S✸ with val_G(s) < 1 must have a transition s→s′ with s′ ∉ T and val_G(s) = val_G(s′), where the value is always meant with respect to Reach+(T). Set π(s) := s′. Then π is optimal minimizing in every state, as desired.

Lemma 13. Let G be a finitely branching game with objective Reach+(T). Suppose player ✷ does not have value-decreasing transitions. Then there is an MD strategy σ ∈ Σ that is optimal maximizing in every state.

Proof. Outside T, the objectives Reach(T) and Reach+(T) coincide, so outside T the MD strategy σ from Lemma 8 is optimal maximizing for Reach+(T). Any s ∈ T ∩ S✷ must have a transition s→s′ with s′ ∈ T or val_G(s) = val_G(s′), where the value is always meant with respect to Reach+(T). Set σ(s) := s′. Then σ is optimal maximizing in every state, as desired.
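The successor claimed in the proofs of Lemmas 12 and 13 can also be found via the one-step characterization of the Reach+ value; the following short argument is ours and is only meant as a reading aid (it uses the standard fact that in finitely branching games the value satisfies the one-step optimality equations). For a player ✸ state s,

val_{Reach+(T)}(s) = min over transitions s→s′ of val_{Reach(T)}(s′),

because after the first transition the remaining objective is plain reachability. If s ∈ T ∩ S✸ has val_{Reach+(T)}(s) < 1, the minimum is attained (by finite branching) at some successor s′ with val_{Reach(T)}(s′) < 1; such an s′ cannot lie in T, since states in T have Reach(T)-value 1, and outside T the Reach(T)- and Reach+(T)-values coincide, so val_{Reach+(T)}(s′) = val_{Reach(T)}(s′) = val_{Reach+(T)}(s), as required in the proof of Lemma 12.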

With this at hand, we prove Theorem 11.

Proof of Theorem 11. We proceed similarly to the proof of Theorem 2. In the present proof, whenever we write val_{G′}(s) for a subgame G′ of G, we mean the value of state s with respect to Reach+(T ∩ S′), where S′ ⊆ S is the state space of G′.

In order to characterize the winning sets of the players with respect to the objective Büchi(T), we construct a transfinite sequence of subgames Gα of G, where α ∈ O is an ordinal number, by stepwise removing certain states, along with their incoming transitions. Let Sα denote the state space of the subgame Gα. We start with G0 := G. Given Gα, define D^0_α as the set of states s ∈ Sα with val_{Gα}(s) < 1, and for any i ≥ 0 define D^{i+1}_α as the set of those states s ∈ Sα \ (D^0_α ∪ ··· ∪ D^i_α) that are random or belong to player ✸ and have a transition s→s′ with s′ ∈ D^i_α. The set ∪_{i∈ℕ} D^i_α can be seen as the backward closure of D^0_α under random transitions and transitions controlled by player ✸. For any α ∈ O \ {0} we define Sα := S \ ∪_{γ<α} ∪_{i∈ℕ} D^i_γ.

Since the number of states never increases and S is countable, it follows that this sequence of games Gα converges at some ordinal β with β ≤ ω1 (the first uncountable ordinal). That is, we have Gβ = Gβ+1.
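For a finite game the transfinite sequence stabilizes after finitely many rounds, and one round of the construction is easy to mechanize. The following Python sketch is ours and only illustrative: all names (closure_levels and the toy game) are invented, and the set D0 of states of the current subgame with value < 1 is taken as given rather than computed. It builds the levels D^0, D^1, ... exactly by the backward-closure rule above.

def closure_levels(D0, states, owner, succ):
    """levels[i] plays the role of D^i: starting from the given level-0 set,
    repeatedly add every not-yet-collected state that is minimizer-owned
    ('min') or random ('rand') and has a transition into the previous level."""
    levels = [set(D0)]
    collected = set(D0)
    while True:
        nxt = {s for s in states
               if s not in collected
               and owner[s] in ('min', 'rand')
               and any(t in levels[-1] for t in succ[s])}
        if not nxt:
            return levels, collected          # collected = union of all D^i
        levels.append(nxt)
        collected |= nxt

# Toy game: 'a' belongs to the minimizer, 'b' is random, 'c' and 'd' belong
# to the maximizer; suppose D0 = {'d'} has been determined to have value < 1.
states = ['a', 'b', 'c', 'd']
owner  = {'a': 'min', 'b': 'rand', 'c': 'max', 'd': 'max'}
succ   = {'a': ['b'], 'b': ['d', 'c'], 'c': ['c'], 'd': ['d']}
levels, removed = closure_levels({'d'}, states, owner, succ)
# levels == [{'d'}, {'b'}, {'a'}]; 'c' is not collected since it belongs to the maximizer

The next subgame is then obtained by deleting the collected states (and their incoming transitions) and re-examining the values of the remaining states.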

As in the proof of Theorem 2, some games Gα may contain dead ends, which are always considered to be losing for player ✷. However, Gβ does not contain dead ends. (If Sβ is empty then player ✷ loses.) We define the index, I(s), of a state s as the ordinal α with s ∈ ∪_{i∈ℕ} D^i_α, and as ⊥ if such an ordinal does not exist. For all states s ∈ S we have:

I(s) = ⊥ ⇔ s ∈ Sβ ⇔ val_{Gβ}(s) = 1

In particular, player ✷ does not have value-decreasing transitions in Gβ. We show that from states s with I(s) ∈ O player ✸ can enforce that Büchi(T) holds with probability < 1 in G, and that from states s with I(s) = ⊥ player ✷ can enforce that Büchi(T) holds with probability 1 in G; in each case we give the claimed witnessing MD strategy.

Strategy π̂: We define the claimed MD strategy π̂ for all s ∈ S✸ with I(s) = α ∈ O as follows. For all s ∈ D^0_α, define π̂(s) as in the MD strategy from Lemma 12 for Gα and Reach+(T ∩ Sα). For all s ∈ D^{i+1}_α ∩ S✸ for some i ∈ ℕ, define π̂(s) := s′ such that s→s′ and s′ ∈ D^i_α.

In each Gα, strategy π̂ coincides with the strategy from Lemma 12, except possibly in states s ∈ Sα with val_{Gα}(s) = 1. It follows that π̂ is optimal minimizing for all Gα with α ∈ O.

We show by transfinite induction on the index that P_{G,s,σ,π̂}(Büchi(T)) < 1 holds for all states s ∈ S with I(s) ∈ O and for all player ✷ strategies σ. For the induction hypothesis, let α be an ordinal for which this holds for all states s with I(s) < α. For the inductive step, let s ∈ S be a state with I(s) = α, and let σ be an arbitrary player ✷ strategy in G.

• Let s ∈ D^0_α. Suppose that the play from s under the strategies σ, π̂ always remains in Sα, i.e., the probability of ever leaving Sα under σ, π̂ is zero. Then any play in G under these strategies coincides with a play in Gα, so we have P_{G,s,σ,π̂}(Reach+(T)) = P_{Gα,s,σ,π̂}(Reach+(T ∩ Sα)). Since π̂ is optimal minimizing in Gα, we have P_{Gα,s,σ,π̂}(Reach+(T ∩ Sα)) ≤ val_{Gα}(s) < 1. Since Büchi(T) ⊆ Reach+(T), we have P_{G,s,σ,π̂}(Büchi(T)) ≤ P_{G,s,σ,π̂}(Reach+(T)). By combining the mentioned equalities and inequalities we get P_{G,s,σ,π̂}(Büchi(T)) < 1, as desired. Now suppose otherwise, i.e., the play from s under σ, π̂, with positive probability, enters a state s′ ∉ Sα, hence I(s′) < α. By the induction hypothesis we have P_{G,s′,σ′,π̂}(Büchi(T)) < 1 for any σ′. Since the probability of entering s′ is positive, we conclude P_{G,s,σ,π̂}(Büchi(T)) < 1, as desired.

• Let s ∈ D^i_α for some i ≥ 1. It follows from the definitions of D^i_α and of π̂ that π̂ induces a partial play of length i + 1 from s to a state s′ ∈ D^0_α (player ✷ does not play on this partial play). We have shown above that P_{G,s′,σ,π̂}(Büchi(T)) < 1. It follows that P_{G,s,σ,π̂}(Büchi(T)) < 1, as desired.

We conclude that we have P_{G,s,σ,π̂}(Büchi(T)) < 1 for all σ and all s ∈ S with I(s) ∈ O.

Strategy σ̂: We define the claimed MD strategy σ̂ for all s ∈ S✷ with I(s) = ⊥ to be the MD strategy from Lemma 13 for Gβ and Reach+(T ∩ Sβ). This definition ensures that player ✷ never takes a transition in G that leaves Sβ. Random transitions and player ✸ transitions in G never leave Sβ either: indeed, if s′ ∈ S with I(s′) = α ∈ O then s′ ∈ D^i_α for some i, hence if s is random or belongs to player ✸ and s→s′ then I(s) ≤ α. We conclude that starting from Sβ all plays in G remain in Sβ, under σ̂ and all player ✸ strategies.

Let s ∈ Sβ, hence val_{Gβ}(s) = 1. Let π be any player ✸ strategy. Since σ̂ is optimal maximizing in Gβ, we have P_{Gβ,s,σ̂,π}(Reach+(T ∩ Sβ)) = 1. As argued above, Sβ is not left even in G, hence P_{G,s,σ̂,π}(Reach+(T ∩ Sβ)) = 1. Therefore P_{G,s,σ̂,π}(Reach+(T ∩ Sβ)) = 1 holds for all s ∈ Sβ and all π. Since Büchi is repeated reachability, we also have P_{G,s,σ̂,π}(Büchi(T)) = 1 for all π and all s ∈ S with I(s) = ⊥.
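In the finite setting, the MD strategy π̂ used above can be read off directly from the closure levels. The following Python fragment is again only illustrative and ours: it reuses the toy encoding and the output of closure_levels from the sketch after the construction of the subgames, and it leaves the Lemma 12 strategy at level-0 states abstract.

def minimizer_choices(levels, owner, succ):
    """For every minimizer-owned state in level i+1, fix one successor in
    level i (such a successor exists by the definition of the levels);
    at level-0 states the strategy of Lemma 12 would be used instead."""
    pi_hat = {}
    for i in range(1, len(levels)):
        for s in levels[i]:
            if owner[s] == 'min':
                pi_hat[s] = next(t for t in succ[s] if t in levels[i - 1])
    return pi_hat

# With the toy game above: minimizer_choices(levels, owner, succ) == {'a': 'b'}

Under these choices, a play started in a level-(i+1) state reaches a level-0 state within i+1 steps with positive probability and without any player ✷ decision on the way, which is the partial play used in the second bullet of the induction.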
Then winning for player ✷ in a co-B¨uchi game, does player ✷ also any play in G under these strategies coincides with have a winning MD strategy? + a play in Gα, so we have PG,s,σ,πˆ (Reach (T )) = The same question is open for infinitely branching almost- + PGα,s,σ,πˆ(Reach (T ∩ Sα)). Since πˆ is optimal mini- sure reachability games (these games are generally not + mizing in Gα, we have PGα,s,σ,πˆ(Reach (T ∩ Sα)) ≤ strongly FR-determined either [19]). In fact, one can show + valGα (s) < 1. Since B¨uchi(T ) ⊆ Reach (T ), we that a positive answer to the former question implies a positive + have PG,s,σ,πˆ(B¨uchi(T )) ≤ PG,s,σ,πˆ(Reach (T )). By answer to the latter question. combining the mentioned equalities and inequalities we Acknowledgements. This work was partially supported by get , as desired. PG,s,σ,πˆ (B¨uchi(T )) < 1 the EPSRC through grants EP/M027287/1, EP/M027651/1, Now suppose otherwise, i.e., the play from s under ′ EP/P020909/1 and EP/M003795/1 and by St. John’s College, σ, πˆ, with positive probability, enters a state s ∈/ Sα, Oxford. hence I(s′) < α. By the induction hypothesis we ′ have PG,s′,σ′,πˆ (B¨uchi(T )) < 1 for any σ . Since REFERENCES ′ the probability of entering s is positive, we conclude [1] P. A. Abdulla, L. Clemente, R. Mayr, and S. Sandberg. Stochastic parity PG,s,σ,πˆ(B¨uchi(T )) < 1, as desired. games on lossy channel systems. Logical Methods in Computer Science, i 10(4:21), 2014. • Let s ∈ Dα for some i ≥ 1. It follows from the i [2] P. Billingsley. Probability and Measure. Wiley, New York, NY, 1995. definitions of Dα and of πˆ that πˆ induces a partial play Third Edition. ′ 0 ✷ of length i +1 from s to a state s ∈ Dα (player [3] T. Br´azdil, V. Broˇzek, K. Etessami, A. Kuˇcera, and D. Wojtczak. One- does not play on this partial play). We have shown counter Markov decision processes. In SODA’10, pages 863–874. SIAM, 2010. above that PG,s′,σ,πˆ (B¨uchi(T )) < 1. It follows that [4] T. Br´azdil, V. Broˇzek, A. Kuˇcera, and J. Obdrz´alek. Qualitative PG,s,σ,πˆ(B¨uchi(T )) < 1, as desired. reachability in stochastic BPA games. Information and Computation, 209, 2011. We conclude that we have PG,s,σ,πˆ(B¨uchi(T )) < 1 for all σ [5] V. Broˇzek. Determinacy and optimal strategies in infinite-state stochastic and all s ∈ S with I(s) ∈ O. reachability games. TCS, 493, 2013.

[6] K. Chatterjee, L. de Alfaro, and T. Henzinger. Strategy improvement for concurrent reachability games. In QEST, pages 291–300. IEEE Computer Society Press, 2006.
[7] K. Chatterjee, M. Jurdziński, and T. Henzinger. Simple stochastic parity games. In CSL'03, volume 2803 of LNCS, pages 100–113. Springer, 2003.
[8] K. Chatterjee, M. Jurdziński, and T. A. Henzinger. Quantitative stochastic parity games. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '04, pages 121–130, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics.
[9] A. Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992.
[10] L. de Alfaro and T. Henzinger. Concurrent omega-regular games. In Proc. of LICS, pages 141–156. IEEE, June 2000.
[11] L. de Alfaro, T. Henzinger, and O. Kupferman. Concurrent reachability games. In FOCS, pages 564–575. IEEE Computer Society Press, 1998.
[12] K. Etessami, D. Wojtczak, and M. Yannakakis. Recursive stochastic games with positive rewards. In ICALP, volume 5125 of LNCS. Springer, 2008.
[13] K. Etessami and M. Yannakakis. Recursive Markov decision processes and recursive stochastic games. In ICALP'05, volume 3580 of LNCS, pages 891–903. Springer, 2005.
[14] K. Etessami and M. Yannakakis. Recursive concurrent stochastic games. LMCS, 4, 2008.
[15] W. Feller. An Introduction to Probability Theory and Its Applications, volume 1. Wiley & Sons, second edition, 1966.
[16] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer, 1997.
[17] H. Gimbert and F. Horn. Solving simple stochastic tail games. In Proceedings of SODA, pages 847–862. SIAM, 2010.
[18] J. Krčál. Determinacy and Optimal Strategies in Stochastic Games. Master's thesis, Masaryk University, School of Informatics, Brno, Czech Republic, 2009.
[19] A. Kučera. Turn-based stochastic games. In K. R. Apt and E. Grädel, editors, Lectures in Game Theory for Computer Scientists. Cambridge University Press, 2011.
[20] A. Maitra and W. Sudderth. Finitely additive stochastic games with Borel-measurable payoffs. International Journal of Game Theory, 27:257–267, 1998.
[21] D. Martin. The determinacy of Blackwell games. The Journal of Symbolic Logic, 63:1565–1581, 1998.
[22] M. L. Puterman. Markov Decision Processes. Wiley, 1994.
[23] L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
[24] M. Ummels and D. Wojtczak. The complexity of Nash equilibria in stochastic multiplayer games. LMCS, 7(3:20), 2011.
[25] D. Williams. Probability with Martingales. Cambridge University Press, 1991.
[26] W. Zielonka. Infinite games on finitely coloured graphs with applications to automata on infinite trees. TCS, 200(1-2):135–183, 1998.
