MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Determinacy and Optimal Strategies in Stochastic Games

MASTER’S THESIS

Jan Krčál

Brno, 2009

Declaration

I declare that this thesis is my own work and has not been submitted in any form for another degree or diploma at any university or other institution of tertiary education. Information derived from the published or unpublished work of others has been acknowledged in the text and a list of references is given.

Advisor: prof. RNDr. Antonín Kučera, Ph.D.

Acknowledgement

I would like to thank my advisor, Antonín Kučera, for offering me such a great topic, for his patience with my repetitive questions, for valuable comments, and for the care he invested in the frequent seminars that helped me with my writing.

Another person who helped me significantly with my thesis is Václav Brožek; I want to thank him for many fruitful discussions helping me to find mistakes as well as better solutions.

Finally, I would like to thank my friends and family for encouraging me, for their understanding, and for allowing me to concentrate on my work.

Abstract

We deal with the determinacy of stochastic turn-based perfect-information games with reachability, safety, and Büchi winning objectives. We separately discuss the situation for finite-state games, finitely-branching infinite-state games, and infinite-state games with unlimited branching. Even though all these games are determined thanks to the general result of Martin [13], we provide simpler proofs for these specific cases that also allow us to reason about the existence of optimal strategies.

Keywords

stochastic games, Markov decision processes, determinacy, optimal strategies, memoryless deterministic strategies, reachability, safety, Büchi objectives, infinite state space, finite branching

Contents

1 Introduction ...... 3
2 Definitions ...... 5
  2.1 Basics ...... 5
  2.2 Markov chain and its probability space ...... 5
  2.3 Games ...... 6
  2.4 Winning objectives ...... 8
  2.5 Determinacy and values ...... 8
3 Finite 1½ player games ...... 11
  3.1 Reachability games ...... 11
  3.2 Safety games ...... 20
4 Finite 2½ player games ...... 23
  4.1 Reachability games ...... 23
  4.2 Safety games ...... 32
  4.3 Büchi games ...... 33
5 Finitely-branching games ...... 37
  5.1 Reachability games ...... 38
  5.2 Büchi games ...... 41
6 Infinitely-branching games ...... 43
  6.1 Reachability games ...... 43
  6.2 Büchi games ...... 48
7 Summary ...... 49

Chapter 1

Introduction

The formalism of stochastic games is an important tool in the area of formal verification. It allows us to describe models with non-determinism, randomness, and a malicious opponent. These three forces move a token in turns along transitions between vertices in a transition system. The path of the token through the system determines the winner: it is either the first player, expressing our non-determinism, or the second player, playing as our opponent. The decisions of both players are resolved by specifying their strategies.

The main concern of this thesis is the probability that the first player wins. Since there are two players on whose decisions the probability depends, we talk about the highest probability that the first player can achieve against any strategy of the opponent. And similarly, we also talk about the lowest probability that the second player can achieve against any strategy of the first player. If these two quantities are equal, we call them the value of the game and say that the game is determined. An optimal strategy of the first player is a strategy that guarantees his winning with a probability greater than or equal to the value. Likewise, an optimal strategy of the second player ensures a probability of losing lower than or equal to the value.

The way to determine the winner is called a winning objective. It is a set of paths through the transition system that we define as winning for the first player. The games are zero-sum: the set of the paths winning for the second player is the complement of the set of the paths winning for the first player. In this text, we consider three types of winning objectives. In a game with a reachability winning objective, the first player tries to reach any vertex from a specified set of target vertices. In a game with a safety winning objective, the first player must never leave a specified set of safe vertices. Finally, in a game with a Büchi winning objective, there is also a set of target vertices, and the first player tries to visit some target vertex infinitely many times.

In this text, we go through several types of games and give a uniform overview answering the following questions: Are the games of this type determined? Are there optimal strategies for both players? Is it possible to compute the optimal strategies or the value of a game in some efficient way?

The types of games covered are as follows: the games on a finite transition system, the games on an infinite transition system with finite branching, and the games with infinite branching. We also distinguish the games with only one player (i.e. with no adversary) from the games with two players. And finally, we separately treat the three winning objectives mentioned earlier.

The question of determinacy has been answered by a general result of Martin [13]. It applies to all the types of games discussed here; therefore, the results in this text are not new. And yet, we provide specific proofs for these types of games that may give more insight into the area. It has to be mentioned that the results we present are in most cases standard; the contribution of our text is mainly the rigorous uniform treatment.

Chapter 2

Definitions

2.1 Basics

By $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{R}^{\geq 0}$, we denote the sets of natural numbers, natural numbers with zero, rational numbers, real numbers, and non-negative real numbers, respectively.

For every finite or countably infinite set $A$, the set of all finite words over $A$ is denoted by $A^*$; the symbol $A^+$ represents the set of all non-empty finite words over $A$. The length of a word $w$ is written as $|w|$, and the letters in $w$ are denoted by $w(0), w(1), \ldots, w(|w|-1)$; the last letter (assuming $|w| > 0$) is denoted by $last(w)$. For two words $u, w \in A^*$, their concatenation is written as $uw$.

Concerning the basics of probability theory, for a finite or countably infinite set $A$, a probability distribution on $A$ is a function $f : A \to \mathbb{R}^{\geq 0}$ such that $\sum_{a \in A} f(a) = 1$. A distribution is called positive if $f(a) > 0$ for each $a \in A$, and Dirac if $f(a) = 1$ for some $a \in A$. The set of all distributions on $A$ is denoted by $\mathcal{D}(A)$.

A $\sigma$-algebra over a set $X$ is a set $\mathcal{F} \subseteq 2^X$ that includes $X$ and is closed under complements and countable unions. A measurable space is a pair $(X, \mathcal{F})$ where $\mathcal{F}$ is the set of all measurable subsets of $X$. A measure on a space $(X, \mathcal{F})$ is a function $\mu : \mathcal{F} \to \mathbb{R}$ with the following properties: (1) $\mu(\emptyset) = 0$, (2) for every $E \in \mathcal{F}$, $\mu(E) \geq 0$, and (3) for every countable collection $\{A_i\}_{i \in I}$ of pairwise disjoint sets from $\mathcal{F}$ it holds $\sum_{i \in I} \mu(A_i) = \mu(\bigcup_{i \in I} A_i)$. A measure $\mu$ is a probability measure if $\mu(X) = 1$. A probability space is a triple $(X, \mathcal{F}, P)$, where $(X, \mathcal{F})$ is a measurable space and $P$ is a probability measure on $(X, \mathcal{F})$. The set $X$ is called a sample space.

2.2 Markov chain and its probability space

A transition system is a pair $\mathcal{S} = (S, \to)$ where $S$ is a set of states and $\to \subseteq S \times S$ is a transition relation such that each state $s \in S$ has some outgoing transition $s \to t$. For a state $s$, we denote by $succ(s)$ the set of successor states $\{s' \mid s \to s'\}$. By the notation "for all $v \to v'$ it holds $\varphi$" we mean "for all $v' \in succ(v)$ it holds $\varphi$".

Definition 2.1. An infinite path (also called a run) in $\mathcal{S}$ is a countable sequence of states $w = s_0 s_1 s_2 \ldots$ such that for each $i \in \mathbb{N}_0$, $s_i \to s_{i+1}$. A finite path (also called a history) is a word $w \in S^+$ such that for all $0 \leq i < |w| - 1$, we have that $w(i) \to w(i+1)$. The symbol $S^\oplus \subseteq S^+$ denotes the set of all finite paths in $\mathcal{S}$, and the symbol $S^\infty$ denotes the set of all infinite paths in $\mathcal{S}$. We say that a state $m$ is reachable from a state $n$ if there is a finite path of the form $nwm$.

Later we define a game as a transition system where some transitions are controlled by the players. For a fixed pair of strategies of the two players, we can convert a game into a Markov chain, which is a transition system where each transition is assigned a fixed positive probability. A Markov chain allows us to reason about probabilities of events such as A = "the run begins with a finite path w" or B = "the run sometime visits a specified state m".

Definition 2.2. A Markov chain is a triple $\mathcal{M} = (M, \to, Prob)$ where $(M, \to)$ is a transition system and $Prob$ is a function that assigns to each $m \in M$ a positive distribution on $succ(m)$. We write $m \xrightarrow{p} n$ if $m \to n$ and $Prob(m)(n) = p$.

Each finite path $w \in M^\oplus$ specifies a set of runs $Run(\mathcal{M}, w) \subseteq M^\infty$ containing all runs $\omega \in M^\infty$ that start with the prefix $w$. We call this set of runs a basic cylinder. A Markov chain $\mathcal{M}$ and a state $s \in M$ determine a probability space $(Run(\mathcal{M}, s), \mathcal{F}, P)$ that captures the probabilities of certain sets of runs initiated in $s$. The sample space $Run(\mathcal{M}, s)$ consists of all the runs in the Markov chain starting with $s$. The collection $\mathcal{F}$ of measurable sets of runs is the $\sigma$-algebra generated by the set $\{Run(\mathcal{M}, w) \mid w \in M^\oplus, w(0) = s\}$. Finally, $P$ is the unique probability measure such that for each $w = s_0 s_1 \ldots s_n \in M^\oplus$, $P(Run(\mathcal{M}, w)) = \prod_{i=0}^{n-1} Prob(s_i)(s_{i+1})$.

For a given Markov chain and a starting state $s$, we can measure the probabilities of all sets in $\mathcal{F}$. In $\mathcal{F}$, there are all basic cylinders that were used as generators and all sets that can be constructed from basic cylinders using a finite number of complementations and countable unions. For example, the event A = "the run begins with a finite path w" is directly the basic cylinder $Run(\mathcal{M}, w) \in \mathcal{F}$. The other example is more complicated: the event B = "the run sometime visits a specified state m" is the union of all basic cylinders that end with m, $\bigcup \{Run(\mathcal{M}, w) \mid w \in M^\oplus, last(w) = m\}$, which again belongs to $\mathcal{F}$.
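To make the last definition concrete, here is a minimal sketch (an illustration added to this text; the three-state chain and its probabilities are hypothetical) computing the probability of a basic cylinder as the product of the transition probabilities along the finite path:

```python
# A sketch, not part of the original thesis: Prob is encoded as nested dicts,
# Prob(m)(n) = prob[m][n]; the chain below is hypothetical.
from math import prod

prob = {
    "a": {"b": 0.5, "c": 0.5},
    "b": {"a": 1.0},
    "c": {"c": 1.0},
}

def cylinder_probability(path):
    # P(Run(M, w)) = product of Prob(s_i)(s_{i+1}) along w = s_0 s_1 ... s_n
    return prod(prob[u][v] for u, v in zip(path, path[1:]))

print(cylinder_probability(["a", "b", "a", "c"]))  # 0.5 * 1.0 * 0.5 = 0.25
```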

2.3 Games

Stochastic 2½ player games are systems with two regular players, □ and ♦ , playing against each other, and one random player ○ .

Definition 2.3. A stochastic 2½ player game G is a tuple $(V, \mapsto, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$, where V is a finite or countably infinite set of vertices, $(V, \mapsto)$ is a transition system where $\mapsto$ is called an edge relation, $(V_\Box, V_\Diamond, V_\bigcirc)$ is a partition of V, and Prob is a function that assigns to each random vertex $v \in V_\bigcirc$ a positive probability distribution on the set of outgoing transitions $succ(v)$. We say that G is finite if the set V is finite, and that G is finitely branching if for each $v \in V$ the set $succ(v)$ is finite. If the set $V_\Box$ or the set $V_\Diamond$ is empty, we say that G is a stochastic 1½ player game of the player ♦ or the player □ , respectively.


A run of the game G starting at a vertex s is obtained by moving a token in the game graph, where the token is initially placed in the vertex s. If the token is in a vertex $v \in V_\Box$, it is moved by the player □ to some vertex in $succ(v)$; if $v \in V_\Diamond$, the token is moved by the player ♦ ; if $v \in V_\bigcirc$, the token is moved randomly according to $Prob(v)$.

The decisions of the players □ and ♦ are formalized by strategies. In general, each player can decide according to the full history of the run. Furthermore, the decisions may be random, i.e. the strategy fixes a probability distribution that determines the following vertex in the run.

Definition 2.4. Let $\odot \in \{\Box, \Diamond\}$. A strategy for the player $\odot$ is a function ν such that for each history $wv \in V^\oplus$, where $v \in V_\odot$ is the current vertex, the function ν returns a probability distribution on the set of outgoing transitions from v, i.e. $\nu : wv \mapsto \mathcal{D}(succ(v))$. We denote the sets of all strategies for the player □ and the player ♦ by Σ and Π, respectively.

A strategy ν is memoryless (M) if for each history wv the decision depends on the current vertex only, ν(wv) = ν(v). A strategy ν is deterministic (D) if for each history wv the distribution ν(wv) is a Dirac distribution, i.e. it assigns probability 1 to some vertex $u \in succ(v)$ and 0 to all remaining vertices. A general strategy, which may depend on the history, is called history-dependent (H), and a general strategy, which may randomize, is called randomized (R). Classes of strategies are denoted by the first letters, e.g. MD denotes the class of memoryless deterministic strategies. They form the following hierarchy: $MD \subseteq MR \subseteq HR$ and $MD \subseteq HD \subseteq HR$. We sometimes use a simpler notation for a memoryless strategy such that its domain is not the set of histories but the set of vertices V representing the current vertex. Similarly, the codomain of a deterministic strategy is only the set of vertices V, not the set of distributions over V. In this simpler notation, the type of an MD strategy ν may then be written as $\nu : V \to V$.

To be able to reason about probabilities in a game, we need to convert it into a Markov chain. From the definition of a game, Prob is defined for the random vertices $V_\bigcirc$. If we take memoryless strategies $\sigma \in \Sigma^{MR}$ and $\pi \in \Pi^{MR}$, we can define Prob for the remaining vertices by $Prob(v) = \sigma(v)$ for $v \in V_\Box$ and $Prob(v) = \pi(v)$ for $v \in V_\Diamond$. Then $(V, \mapsto, Prob)$ is a properly defined Markov chain. But this is not a general solution; because of history-dependent strategies, we must use the unfolding technique – create a Markov chain whose states are the histories of G.

A game $G = (V, \mapsto, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ together with strategies σ and π induces a play.

Definition 2.5. A play is a Markov chain $G(\sigma, \pi) = (V^\oplus, \to, Prob')$ where for each history $wu \in V^\oplus$ and vertex $v \in V$, $wu \to wuv$ if and only if $u \mapsto v$ and $Prob'(wu)(v) > 0$. $Prob'(wv)$ is defined for any $wv \in V^\oplus$ as follows: $Prob'(wv) = Prob(v)$ for $v \in V_\bigcirc$, $Prob'(wv) = \sigma(wv)$ for $v \in V_\Box$, and $Prob'(wv) = \pi(wv)$ for $v \in V_\Diamond$.

For a game G, strategies $\sigma \in \Sigma$ and $\pi \in \Pi$, and a vertex v, $P_v^{\sigma,\pi,G}$ denotes the probability measure on the probability space defined by the induced Markov chain G(σ, π) and the vertex v; if G is clear from the context, we write only $P_v^{\sigma,\pi}$. Furthermore, if G is a 1½ player game of the player □ or ♦ , we omit the trivial strategy of the other player and write $P_v^\sigma$ or $P_v^\pi$, respectively.
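As an aside (a sketch added for illustration; the game, the vertex names, and the strategy below are hypothetical, with $V_\Diamond$ empty), fixing the strategies indeed leaves nothing but a Markov chain to sample from – a history-dependent strategy forces the chain to live on histories, which is exactly the unfolding used in Definition 2.5:

```python
# A sketch, not part of the original thesis: sampling a run of the play
# G(sigma, pi) by always consulting the full history.
import random

succ = {"v": ["a", "b"], "a": ["v"], "b": ["b"]}
owner = {"v": "box", "a": "rand", "b": "rand"}   # V_diamond is empty here
prob = {"a": {"v": 1.0}, "b": {"b": 1.0}}        # Prob for the random vertices

def sigma(history):
    # A history-dependent deterministic strategy of the player Box:
    # alternate between the two successors of v based on the history length.
    return succ[history[-1]][len(history) % 2]

def sample_run(start, steps):
    history = [start]
    for _ in range(steps):
        v = history[-1]
        if owner[v] == "box":
            history.append(sigma(history))
        else:  # a random vertex: move according to Prob(v)
            (nxt,) = random.choices(list(prob[v]), weights=list(prob[v].values()))
            history.append(nxt)
    return history

print(sample_run("v", 6))
```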

2.4 Winning objectives

So far, we have established probabilities for plays starting in a fixed vertex, i.e. every measurable set of runs has a unique probability. Now, we need to decide the winner of a run in a play.

Definition 2.6. For a game G, a winning objective W is a set of runs in the unfolded transition system $(V^\oplus, \to)$ that are winning for the player □ . Runs not in W are winning for the player ♦ .

For a winning objective W, $P_v^{\sigma,\pi}(W)$ is the measure of the winning runs for the player □ in the Markov chain G(σ, π). In other words, it is the probability that the player □ wins by using the strategy σ against the strategy π of the player ♦ if the play starts in the vertex v. In this text we consider three types of winning objectives:

Reachability winning objective. For a set of vertices $T \subseteq V$, the reachability objective $\mathcal{R}_T$ is the set of runs that visit some vertex from T.

$$\mathcal{R}_T = \{ s_0 s_1 \ldots \in (V^\oplus)^\infty \mid \exists n \in \mathbb{N}_0 : last(s_n) \in T \}$$

Safety winning objective. For a set of vertices $S \subseteq V$, the safety objective $\mathcal{S}_S$ is the set of runs that never leave the set S.

$$\mathcal{S}_S = \{ s_0 s_1 \ldots \in (V^\oplus)^\infty \mid \forall n \in \mathbb{N}_0 : last(s_n) \in S \}$$

Büchi winning objective. For a set of vertices $T \subseteq V$, the Büchi objective $\mathcal{B}_T$ is the set of runs that visit some vertex from T infinitely many times. For $v \in V$ and $w = s_0 s_1 \ldots \in (V^\oplus)^\infty$ let $occurrences_v(w) = \{ n \in \mathbb{N}_0 \mid last(s_n) = v \}$.

$$\mathcal{B}_T = \{ w \in (V^\oplus)^\infty \mid \exists v \in T : occurrences_v(w) \text{ is infinite} \}$$

2.5 Determinacy and values

Definition 2.7. We say that a game G with a winning objective W is (weakly) determined if for each vertex s it holds

$$\sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_s^{\sigma,\pi}(W) = \inf_{\pi\in\Pi} \sup_{\sigma\in\Sigma} P_s^{\sigma,\pi}(W)$$


If a game is determined, then we define the value in a vertex $s \in V$ as

$$val_G^W(s) = \sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_s^{\sigma,\pi}(W) = \inf_{\pi\in\Pi} \sup_{\sigma\in\Sigma} P_s^{\sigma,\pi}(W)$$

If G and W are obvious from the context, we write only val(s).

Thanks to the results of Martin [12], [13] and Maitra and Sudderth [11], we know that any stochastic game with a Borel winning objective is determined. For any set X, the winning objectives $\mathcal{R}_X$, $\mathcal{S}_X$, and $\mathcal{B}_X$ are Borel sets. Hence, the games discussed in this text are all determined. But the proofs showing this fact in general are hard to understand and do not provide any insight into the special cases discussed here.

Definition 2.8. Let G be a game with a winning objective W. For $\varepsilon \in \mathbb{R}^{\geq 0}$, we call a strategy σ of the player □ ε-optimal if for all strategies $\pi \in \Pi$ and all vertices $s \in V$ it holds

$$P_s^{\sigma,\pi}(W) \geq val(s) - \varepsilon$$

Similarly, a strategy π of the player ♦ is ε-optimal if for all strategies $\sigma \in \Sigma$ and all vertices $s \in V$ it holds

$$P_s^{\sigma,\pi}(W) \leq val(s) + \varepsilon$$

A strategy is optimal if it is 0-optimal. We call a strategy σ of the player □ strictly optimal if for each $\pi \in \Pi$ and each $s \in V$ it holds $P_s^{\sigma,\pi}(W) > val(s)$. A strategy π of the player ♦ is strictly optimal if for each $\sigma \in \Sigma$ and each $s \in V$ it holds $P_s^{\sigma,\pi}(W) < val(s)$.

Chapter 3

Finite 1½ player games

For this section, let us fix a finite game $G = (V, \mapsto, (V_\Box, \emptyset, V_\bigcirc), Prob)$.

3.1 Reachability games

We define a set of target vertices T; the goal of the player □ is to reach any target vertex $t \in T$.

Example 3.1 (Tossing coins). We start with a simple example of a 1½ player game, which is illustrated in Figure 3.1. The player gradually chooses either not to flip the coin or to flip it once or twice. When the player decides to stop tossing the coin, there are certain given probabilities of winning depending on the count of heads and tails the player has flipped. The probabilities are stated in the figure. The set of target vertices is T = {win} and the starting vertex is s = 0/0.

Is there an optimal strategy, i.e. a strategy that maximizes the probability of winning? If so, what is the maximal probability of winning? Can we compute answers to both of these questions? To answer them, we must compute the value of the game in the starting vertex, i.e. $\sup_{\sigma\in\Sigma} P_s^\sigma(\mathcal{R}_T)$, and construct the optimal strategy if it exists. As we will see later, there is always an optimal strategy for this type of games. Furthermore, there is an optimal memoryless deterministic (MD) strategy. We conclude this example with a claim that the value of the game in the vertex 0/0 is 0.525 and that the optimal strategy is to toss the coin and stop if the result is heads, otherwise to toss it once more. We will return to this example and prove this claim later on.

The proof of the determinacy mainly follows the proof provided by Brázdil et al. [4]. The first observation, leading to the value of a reachability game, is that we can limit the number of steps, which makes the game simpler to reason about. If we take the probability of winning in at most one step, at most two steps, at most three steps, and so on, we get a non-decreasing sequence of probabilities. This sequence has a limit – the actual probability of winning in an unlimited number of steps. First, we define the n-step value, the maximal probability of reaching T in up to n steps. Notice that we define it in a more general way, for 2½ player games, because we will use this definition also later. The set $V_\Diamond$ is empty in the case of 1½ player games.


Figure 3.1: Tossing coins, a 1½ player game. The player □ has square vertices with solid-line transitions, the random player ○ has dotted circle vertices with dotted-line transitions. If the transition probabilities of the player ○ are not uniformly distributed, they are written next to the dotted-line transitions. The goal of the player □ is to reach the vertex win. The game starts at the vertex 0/0. The fraction-like vertex labels give the counts of heads and tails the player has tossed (heads/tails).

Definition 3.1. For every $v \in V$ and $n \in \mathbb{N}_0$ we define $V_n(v)$ as

$$V_0(v) = 1 \quad \text{if } v \in T$$
$$V_0(v) = 0 \quad \text{if } v \notin T$$
$$V_{n+1}(v) = 1 \quad \text{if } v \in T$$
$$V_{n+1}(v) = \sum_{v \xrightarrow{p} v'} p \cdot V_n(v') \quad \text{if } v \in V_\bigcirc \cap \neg T$$
$$V_{n+1}(v) = \max_{v \mapsto v'} V_n(v') \quad \text{if } v \in V_\Box \cap \neg T$$
$$V_{n+1}(v) = \min_{v \mapsto v'} V_n(v') \quad \text{if } v \in V_\Diamond \cap \neg T$$

Then, we define V(v) as the limit: $V(v) = \lim_{n\to\infty} V_n(v)$. The existence of the limit in the definition of V(v) follows from the fact that the sequence $V_0(v), V_1(v), V_2(v), \ldots$ is non-decreasing and bounded. In the next step, we show that V(v) equals the value of the game in the vertex v. Recall that by the value in a vertex v we mean $\sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T)$.
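Definition 3.1 translates directly into a value-iteration procedure. The following is a small sketch (an addition for illustration, with an assumed encoding of the game as dictionaries); V(v) can then be approximated by iterating until the component-wise change becomes negligible:

```python
# A sketch, not from the thesis: compute V_n from Definition 3.1 for a finite
# game; owner maps a vertex to "box", "diamond" or "rand", and prob[v][u]
# gives the edge probabilities of the random vertices.
def n_step_values(vertices, succ, owner, prob, target, n):
    V = {v: 1.0 if v in target else 0.0 for v in vertices}   # V_0
    for _ in range(n):
        W = {}
        for v in vertices:
            if v in target:
                W[v] = 1.0
            elif owner[v] == "box":        # the player Box maximizes
                W[v] = max(V[u] for u in succ[v])
            elif owner[v] == "diamond":    # the player Diamond minimizes
                W[v] = min(V[u] for u in succ[v])
            else:                          # a random vertex takes the average
                W[v] = sum(p * V[u] for u, p in prob[v].items())
        V = W
    return V
```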


By $\mathcal{R}_T^n$ we denote the set of all runs that visit T in at most n steps. Formally, $\mathcal{R}_T^n = \{ s_0 s_1 s_2 \ldots \in (V^\oplus)^\infty \mid \exists x \leq n : last(s_x) \in T \}$. Notice that runs in the induced Markov chain are actually infinite sequences of histories of the game G.

Lemma 3.1. $V_n(v) = \max_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T^n)$.

Proof. By induction. For n = 0, it trivially holds because for any strategy, the measure of runs starting at v and visiting T in zero steps equals 1 for $v \in T$, and 0 for $v \notin T$.

For n = k + 1, we divide it into two cases: if $v \in V_\bigcirc$, by the definition of $P^{G,\sigma,v}$ we have that

$$\max_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T^{k+1}) = \max_{\sigma\in\Sigma} \sum_{v \xrightarrow{p} v'} p \cdot P_{v'}^\sigma(\mathcal{R}_T^k) = \sum_{v \xrightarrow{p} v'} p \cdot \max_{\sigma\in\Sigma} P_{v'}^\sigma(\mathcal{R}_T^k) = \sum_{v \xrightarrow{p} v'} p \cdot V_k(v') = V_{k+1}(v)$$

The first equation may seem problematic when dealing with history-dependent strategies: the optimal strategy may use the vertex we started in for its decision in the next step, but we disregard the history after the first step. We rewrite the probabilities of reaching T from the vertex v as an expression with probabilities of reaching T from some vertex v', omitting that we have visited the vertex v before. In fact, it is correct because we take the maximum over all strategies. For any strategy σ and vertex $v \in V$, there is a strategy $\sigma_v$ that emulates σ with history v, i.e. for any vertex $u \in V$ and history $wu \in V^\oplus$, $\sigma_v(wu) = \sigma(vwu)$. If $v \in V_\Box$:

$$\max_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T^{k+1}) = \max_{\sigma\in\Sigma} \sum_{v\mapsto v'} \sigma(v)(v') \cdot P_{v'}^{\sigma_v}(\mathcal{R}_T^k) = \max_{\sigma\in\Sigma} \max_{v\mapsto v'} P_{v'}^{\sigma_v}(\mathcal{R}_T^k) = \max_{v\mapsto v'} \max_{\sigma\in\Sigma} P_{v'}^{\sigma}(\mathcal{R}_T^k) = \max_{v\mapsto v'} V_k(v') = V_{k+1}(v)$$

The second equality follows from the fact that maximizing a weighted average of a set of values by setting the weights simply means giving weight 1 to the maximal elements of the set.

After characterizing the n-step games, we return to the games with an unlimited number of steps, claiming that V(v) is the value of the game starting at v.

Theorem 3.2. $V(v) = \sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T) = val(v)$.

Proof. The second equality holds by the definition; we need to verify the first one.

• $V(v) \leq \sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T)$

We have that $V(v) = \sup_{i\in\mathbb{N}} V_i(v)$, which means that V(v) is the least upper bound of the set $\{V_i(v) \mid i \in \mathbb{N}\}$. We need to show that $\sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T)$ is also an upper bound of this set; thus, V(v) as the least upper bound is lower than or equal to any other upper bound. For any $i \in \mathbb{N}$:

$$V_i(v) = \max_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T^i) \leq \sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T)$$

The inequality holds because $\mathcal{R}_T^i \subseteq \mathcal{R}_T$; hence, for any strategy $\sigma \in \Sigma$ it holds $P_v^\sigma(\mathcal{R}_T^i) \leq P_v^\sigma(\mathcal{R}_T)$. It is evident that the probability of reaching a target set in up to i steps cannot be greater than the probability of reaching it in an unlimited number of steps.

• $V(v) \geq \sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T)$

Assume that there exists a strategy σ' such that $V(v) < P_v^{\sigma'}(\mathcal{R}_T)$. Then there must be $k \in \mathbb{N}$ such that $V(v) < P_v^{\sigma'}(\mathcal{R}_T^k)$. But $V_k(v) \leq V(v)$, so we have that $V_k(v) < P_v^{\sigma'}(\mathcal{R}_T^k)$, which contradicts Lemma 3.1.

Lemma 3.3. For each $\varepsilon \in \mathbb{R}^{\geq 0}$, there is a strategy which is ε-optimal for starting in any vertex v.

Proof. Due to Lemma 3.1, it follows that for each $n \in \mathbb{N}$ we have an HD (history-dependent deterministic) strategy $\sigma_n$ such that for any vertex v it holds $P_v^{\sigma_n}(\mathcal{R}_T^n) \geq V_n(v)$.

We define $\sigma_n$ as follows: during the first n steps, i.e. for a history wu of length $1 \leq i \leq n$, $\sigma_n$ chooses a successor of u with the highest $V_{n-i}$. After m steps from v, where $m \leq n$, it wins with probability $V_m(v)$. After the first n steps, we can define $\sigma_n$ arbitrarily, and it still wins with probability $V_n(v)$.

Since the sequence $\{V_n(v)\}_{n\in\mathbb{N}}$ is non-decreasing and bounded, for each $\varepsilon \in \mathbb{R}^{\geq 0}$ there is $k \in \mathbb{N}$ such that $V(v) - V_k(v) \leq \varepsilon$. Hence, $\sigma_k$ is ε-optimal because $P_v^{\sigma_k}(\mathcal{R}_T) \geq V_k(v) \geq val(v) - \varepsilon$.

We know that there are ε-optimal HD strategies for 1½ player reachability games. The next step is to show that for any such game,

• there is an optimal strategy, i.e. a strategy that in any vertex guarantees the value of the game.

• Furthermore, there is an MD optimal strategy – an optimal strategy that uses neither randomization nor history. It proves that randomization and history do not help for this type of games.
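As a concrete reading of the proof of Lemma 3.3 (an added sketch under the assumption that the tables $V_0, \ldots, V_n$ of Definition 3.1 are stored, e.g. as a list V of dictionaries, and succ is the successor map), the n-step strategy $\sigma_n$ can be written down directly:

```python
def sigma_n(history, n, V, succ):
    # Hypothetical helper, not from the thesis: with i steps already taken,
    # greedily pick a successor maximizing V_{n-i-1}; after the first n steps
    # the choice is arbitrary (here: greedy with respect to V_0).
    i = len(history) - 1                 # steps taken so far
    level = max(n - i - 1, 0)
    v = history[-1]
    return max(succ[v], key=lambda u: V[level][u])
```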

We will start with a naïve approach to the construction of the MD optimal strategy and show why a strategy constructed according to this approach may not be optimal.

Example 3.2 (The choice among maximal successors does matter). The naïve approach is to always choose the successor with the highest value; if there are several such successors, we choose an arbitrary one. This can lead to a cycle so that we never reach T, even from vertices with a positive value. An example of such a game is shown in Figure 3.2.


Figure 3.2: An example of a game where choosing an arbitrary successor with the maximal value does not have to lead to an MD optimal strategy. (Left: the game with the MD strategy, indicated by the thick arrows; right: the game with the values and the probabilities.)

The main idea of the proof that there is an MD optimal strategy is simple. We show that for any vertex of the player □ , we can remove all outgoing edges except for a single one, and the value of the game in any vertex does not change. Since there are only finitely many vertices, we can repeat this procedure for all vertices in $V_\Box$, and the value of the game is still unchanged. Because the player □ then has only one outgoing edge in every vertex, there is only one strategy $\sigma^*$, which is clearly memoryless deterministic and optimal. Observe that $\sigma^*$ is also MD and optimal in the original game, since adding outgoing edges to vertices of the player □ does not change the probabilities in the Markov chain induced by $\sigma^*$.

Lemma 3.4. For each vertex $v \in V_\Box$ and for any $\varepsilon \in \mathbb{R}^{\geq 0}$, there is a strategy $\sigma_\varepsilon$ that in v uses only one outgoing edge $v \mapsto v'$ regardless of the history, and that is ε-optimal in v, i.e. $P_v^{\sigma_\varepsilon}(\mathcal{R}_T) \geq val(v) - \varepsilon$.

Proof. First, we describe different types of runs, as illustrated in Figure 3.3. For a given vertex v, let us denote the successors of v by $v_1, v_2, \ldots, v_k$. For any successor $v_n$, we can classify the runs starting at $v_n$ into two groups: runs that never return back to v, whose measure we denote by $p_{v_n}^\sigma$; and runs that at least once return to v, with measure $1 - p_{v_n}^\sigma$. The first group, the runs that never return to v, can be further divided into two subgroups: runs that reach the target (winning for the player), and runs that do not reach the target. If $p_{v_n}^\sigma > 0$, the probability of reaching the target set conditioned on not visiting the vertex v along the way is denoted by $r_{v_n}^\sigma$.

For a given ε we have from Lemma 3.3 an ε-optimal strategy σ', but this strategy can use different edges outgoing from v throughout the history of the game. We claim that

$$val(v) = 0 \ \lor\ \left( \text{there exists an edge } v \mapsto v' \text{ such that } p_{v'}^{\sigma'} > 0 \,\land\, r_{v'}^{\sigma'} \geq val(v) - \varepsilon \right) \qquad (3.1)$$

Figure 3.3: Classification of runs starting at the vertex $v_n$ (here $v_3$) used in the proof of Lemma 3.4: the set of all runs starting at $v_3$ splits into the runs that return to v without reaching T, of measure $1 - p_{v_3}^\sigma = P_{v_3}^\sigma(\text{Reach}(v))$, and the runs that never return to v, of measure $p_{v_3}^\sigma = P_{v_3}^\sigma(\neg\text{Reach}(v))$; the latter contain the winning runs reaching T without returning to v, of measure $x_{v_3}^\sigma = P_{v_3}^\sigma(\text{Reach}(T) \cap \neg\text{Reach}(v))$, and the probability of reaching T conditioned on never returning to v is $r_{v_3}^\sigma = x_{v_3}^\sigma / p_{v_3}^\sigma$.

If this claim is true, it actually means that we can build an ε-optimal strategy $\sigma_\varepsilon$ that uses only one edge outgoing from v. If val(v) = 0, any $\sigma_\varepsilon$ using only one arbitrary edge is

clearly ε-optimal. Otherwise, we modify the ε-optimal strategy σ' so that it uses in v only the edge $v \mapsto v'$. We call this modified strategy $\sigma_\varepsilon$ and show that it is still ε-optimal. Notice that $p_{v'}^{\sigma'} = p_{v'}^{\sigma_\varepsilon}$ and $r_{v'}^{\sigma'} = r_{v'}^{\sigma_\varepsilon}$ because these probabilities are not influenced by the decisions in v. The probability of coming back to v is strictly less than 1; therefore, the probability of cycling forever through v is 0. In each cycle from v through v' back to v, the conditioned probability of reaching T is $\geq val(v) - \varepsilon$. The total probability of reaching T from v is then $\geq val(v) - \varepsilon$. Hence, $\sigma_\varepsilon$ is ε-optimal in v.

We have to prove the claim (3.1). We show that if it does not hold, then the strategy σ' cannot be ε-optimal. Assume (3.1) does not hold. Then either for all $v \mapsto u$ the probability of not returning is $p_u^{\sigma'} = 0$; using any edge, we return to v without reaching T, and therefore val(v) = 0, contradicting the assumption that (3.1) does not hold. Or there are some edges $v \mapsto u$ with $p_u^{\sigma'} > 0$, but $r_u^{\sigma'} < val(v) - \varepsilon$ for all of them. It is evident that σ' then cannot reach T from any successor u with probability $\geq val(v) - \varepsilon$.

We can also understand the sequence $V_k$ as an operator transforming $V_k$ into $V_{k+1}$. The following definition makes this idea precise:

Definition 3.2. For a game G, we define an operator $L : [0,1]^V \to [0,1]^V$, where $L(x) = x'$ such that:

$$x'_v = 1 \quad \text{if } v \in T$$
$$x'_v = \max_{v\mapsto v'} x_{v'} \quad \text{if } v \in V_\Box \cap \neg T$$
$$x'_v = \min_{v\mapsto v'} x_{v'} \quad \text{if } v \in V_\Diamond \cap \neg T$$
$$x'_v = \sum_{v \xrightarrow{p} v'} p \cdot x_{v'} \quad \text{if } v \in V_\bigcirc \cap \neg T$$

Figure 3.4: Two fixed points of the operator L. We can appropriately increase the values in the vertices A and B that are connected in a cycle, and we get a greater fixed point. (Left: the game with the values of the least fixed point; right: the game with the values of a greater fixed point.)

Recall that for a vertex v, the value of the game in the vertex v is defined as $val(v) = \sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T)$. By val we denote the vector of the values of the game in the individual vertices.

Lemma 3.5. The vector of values val is the least fixed point of the operator L.

Proof. The set $[0,1]^V$ with the component-wise partial order is clearly a complete partial order, and the operator L is continuous. Since $V_0 = L(\mathbf{0})$, $L(V_k) = V_{k+1}$, and $val = \sup_{n\in\mathbb{N}_0} L^n(\mathbf{0})$, val is by the Kleene fixed-point theorem the least fixed point of L.

Example 3.3 (The fixed points of the operator L are not unique). There may be multiple fixed points of the operator L due to the fact that we can appropriately increase the components of the vector in vertices that lie on a cycle and still get a fixed point. This is illustrated in Figure 3.4.
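The situation of Example 3.3 can be checked mechanically. In the sketch below, the edge structure of Figure 3.4 is reconstructed under assumptions (A and B are □-vertices forming a cycle, and A may also enter a random vertex R that moves to the target T or to U with probability 1/2 each); both vectors are fixed points of L:

```python
# A sketch, not from the thesis: both x and y below satisfy L(x) = x.
succ = {"A": ["B", "R"], "B": ["A"], "R": ["T", "U"], "T": ["T"], "U": ["U"]}
owner = {"A": "box", "B": "box", "R": "rand", "T": "rand", "U": "rand"}
prob = {"R": {"T": 0.5, "U": 0.5}, "T": {"T": 1.0}, "U": {"U": 1.0}}
target = {"T"}

def L(x):
    out = {}
    for v in succ:
        if v in target:
            out[v] = 1.0
        elif owner[v] == "box":
            out[v] = max(x[u] for u in succ[v])
        else:
            out[v] = sum(p * x[u] for u, p in prob[v].items())
    return out

x = {"A": 0.5, "B": 0.5, "R": 0.5, "T": 1.0, "U": 0.0}  # least fixed point (= val)
y = {"A": 0.8, "B": 0.8, "R": 0.5, "T": 1.0, "U": 0.0}  # a greater fixed point
print(L(x) == x, L(y) == y)  # True True
```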

Definition 3.3. For a game $G = (V, \mapsto, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ and vertices $v \in V_\Box$ and $v' \in V$ such that $v \mapsto v'$, we denote by $G_{v\mapsto v'}$ the game obtained from G by removing all outgoing edges from v except for the edge $v \mapsto v'$. Similarly, we denote by $L_{v\mapsto v'}$ the operator L in the altered game.

Lemma 3.6. For each vertex $v \in V_\Box$ with two or more outgoing edges, there is an edge to a vertex v' such that removing all the edges outgoing from v except for this one edge does not change the value of the game in any vertex u, i.e. for each u it holds that $val_{G_{v\mapsto v'}}(u) = val_G(u)$.

Proof. We start by proving that there is an edge $v \mapsto v'$ such that $val_{G_{v\mapsto v'}}(v) \geq val_G(v)$. When we take $\varepsilon \in \{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots\}$, we get by Lemma 3.4 a sequence of strategies and a sequence of edges they use from v. Since there are only finitely many edges, there must be an edge that occurs infinitely many times in this infinite sequence. Hence, there is an edge


$v \mapsto v'$ such that for any ε there is $\varepsilon' \leq \varepsilon$ such that $\sigma_{\varepsilon'}$ uses $v \mapsto v'$. Thus, even if we remove all the edges from v except $v \mapsto v'$, we can get ε-close to the original value for any ε. The value in v cannot decrease: $val_{G_{v\mapsto v'}}(v) \geq val_G(v)$.

On the other hand, by removing edges outgoing from a vertex of the player □ (in other words, by restricting the set of strategies Σ of the original game G), the value of any vertex cannot increase. For all $u \in V$, we get $val_{G_{v\mapsto v'}}(u) \leq val_G(u)$. In particular, for the vertex v we have shown both inequalities; therefore, $val_G(v) = val_{G_{v\mapsto v'}}(v)$. Because there is only one edge outgoing from v, $val_{G_{v\mapsto v'}}(v) = val_{G_{v\mapsto v'}}(v')$. Finally, $val_{G_{v\mapsto v'}}(v') = val_G(v')$; therefore, the values in v and v' in both the original and the altered game all equal the same number.

We know that $val_G$ is the least fixed point of the operator L. Suppose that $val_{G_{v\mapsto v'}}$ is component-wise lower than or equal to $val_G$ and in at least one component strictly lower than $val_G$. Because $val_{G_{v\mapsto v'}}(v') = val_{G_{v\mapsto v'}}(v) = val_G(v)$, $val_{G_{v\mapsto v'}}$ is also a fixed point of L, contradicting $val_G$ being the least fixed point of L. Therefore $val_{G_{v\mapsto v'}} = val_G$.

Theorem 3.7. For any 1½ player game G with a reachability objective, there is an MD strategy optimal in any vertex.

Proof. We subsequently remove edges outgoing from vertices in $V_\Box$ so that there is only one outgoing edge for every vertex $v \in V_\Box$. Let us denote this pruned game by G'. Thanks to Lemma 3.6, the value of the game does not change, i.e. for any vertex v, $val_{G'}(v) = val_G(v)$. In G', there is only one strategy $\sigma^*$, which is MD and optimal in any vertex. $\sigma^*$ is also optimal and MD in G.

Note that the existence of the optimal strategy also means that we can replace the supremum by a maximum in the definition of the value for 1½ player reachability games:

$$val(v) = \max_{\sigma\in\Sigma} P_v^\sigma(\mathcal{R}_T)$$

Another important question is: Are we able to compute the value of a reachability game in an efficient way? For the 1½ player case, the answer is positive; we can compute it using linear programming [2].

Theorem 3.8. For a 1½ player game G with a reachability objective, the value of the game is the optimal solution to the following linear program P1 over the set of variables $\{x_v\}_{v\in V}$:

$$\text{minimize } \sum_{v\in V} x_v \qquad (3.2)$$

satisfying

$$x_v \geq 0 \quad \forall v \in V \qquad (3.3)$$
$$x_v \geq 1 \quad \forall v \in T \qquad (3.4)$$
$$x_v \geq x_u \quad \forall v \in V_\Box \cap \neg T,\ \forall u \in succ(v) \qquad (3.5)$$
$$x_v \geq \sum_{v \xrightarrow{p} u} p \cdot x_u \quad \forall v \in V_\bigcirc \cap \neg T \qquad (3.6)$$
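A small sketch of P1 in practice (an addition assuming SciPy is available, on a hypothetical toy game: the □-vertex s has a single edge to a random vertex r, and r moves to the target t or to a dead end d with probability 1/2 each; t and d have self-loops):

```python
# A sketch, not from the thesis; variable order: [s, r, t, d].
from scipy.optimize import linprog

c = [1, 1, 1, 1]                                 # minimize sum of x_v   (3.2)
A_ub, b_ub = [], []
A_ub.append([-1, 1, 0, 0]); b_ub.append(0)       # x_s >= x_r            (3.5)
A_ub.append([0, -1, 0.5, 0.5]); b_ub.append(0)   # x_r >= (x_t + x_d)/2  (3.6)
# the self-loop constraint x_d >= x_d of the random vertex d is trivial
bounds = [(0, None), (0, None), (1, None), (0, None)]  # (3.3), x_t >= 1 (3.4)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)  # approximately [0.5, 0.5, 1.0, 0.0]
```

The optimum is the least fixed point of L, in accordance with the theorem.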


Figure 3.5: Tossing coins, cont. Compared to Figure 3.1, this figure is extended by the values of the game written below the vertex labels and by the optimal strategy indicated by the thick arrows.


Proof. Because val is the least fixed point of the operator L, it is evident from the definition of L that val is a feasible solution of the program P1. Next, we show that val is the optimal solution of P1.

Let η be another feasible solution of P1. If all the inequalities (3.4), (3.5), and (3.6) in the constraints of P1 are in fact equalities, η is a fixed point of L; clearly $val \leq \eta$ and $\sum_{v\in V} val(v) \leq \sum_{v\in V} \eta_v$. If there is a strict inequality in any of the constraints mentioned above, then $L(\eta) \leq \eta$. Because of the monotonicity of L, also $\lim_{n\to\infty} L^n(\eta) \leq \eta$, so the fixed point generated by η is lower than η. Again, $val \leq \lim_{n\to\infty} L^n(\eta) \leq \eta$. For any feasible solution η, $val \leq \eta$; therefore, val is the optimal solution of P1.

Example 3.4 (Tossing coins, cont.). In the first part (Example 3.1) of this example, we claimed that the value of the game in the vertex 0/0 is 0.525 and that the optimal strategy is to toss the coin and stop if the result is heads; if the result is tails, to toss it once more. We illustrate this fact in Figure 3.5. Since there are no cycles, the longest path through the game has length 6. The values val equal $V_6$ and can be easily computed by hand. From the values, it is trivial to build the optimal strategy because in each vertex of the player □ , there is only one successor with the maximal value (cf. Example 3.2).
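For the record, the hand computation can be replayed in a few lines (the stopping probabilities below are read off Figure 3.1, so treat them as assumptions of this added sketch): stopping at a state yields the given winning probability, and tossing the fair coin averages the two successor values.

```python
# A sketch, not from the thesis: backward induction over the coin game.
stop = {"0/0": 0.4, "1/0": 0.75, "0/1": 0.25, "1/1": 0.5, "2/0": 0.9, "0/2": 0.1}

val_10 = max(stop["1/0"], 0.5 * stop["2/0"] + 0.5 * stop["1/1"])  # 0.75 (stop)
val_01 = max(stop["0/1"], 0.5 * stop["1/1"] + 0.5 * stop["0/2"])  # 0.30 (toss)
val_00 = max(stop["0/0"], 0.5 * val_10 + 0.5 * val_01)            # 0.525 (toss)
print(val_00)  # 0.525
```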


3.2 Safety games

Safety games are actually minimizing reachability games. If we have a set S of safe vertices, maximizing the probability of staying in this set is equivalent to minimizing the probability of reaching the set $V \setminus S$. Therefore, we can replace the vertices of the player □ by vertices of the player ♦ , define the target set $T = V \setminus S$, and consider the winning objective $\mathcal{R}_T$.

Because this case is very similar to the maximizing reachability 1½ player games, we state the dual results without the proofs where the proofs are also dual (just swapping □ for ♦ , Σ for Π, max for min, and sup for inf).

Yet, there is one important difference: the dualized naïve approach to constructing an MD optimal strategy – choosing an arbitrary successor with the minimal value – works for the player ♦ . Intuitively, in the maximizing case, the problem is that by choosing an arbitrary successor, we can form a cycle, never reaching the target T. On the contrary, in the minimizing case, we want to avoid reaching T. Let U denote the set of those vertices u such that T is not reachable from u. We can avoid reaching T by reaching U. But even if we create a cycle in $V \setminus (T \cup U)$ by the choice of successors and never reach U, we still win.

First, we formalize the reduction to the minimizing reachability games. For the rest of this subsection, we assume a finite 1½ player safety game $G = (V, \mapsto, (V_\Box, \emptyset, V_\bigcirc), P)$ with the objective $\mathcal{S}_S$. We define the analogous reachability game $G' = (V, \mapsto, (\emptyset, V_\Diamond, V_\bigcirc), P)$ where $V_\Diamond = V_\Box$, with the winning objective $\mathcal{R}_{V\setminus S}$.

Lemma 3.9. For any strategy σ in the game G and any vertex v it holds

$$P_v^{G,\sigma}(\mathcal{S}_S) = 1 - P_v^{G',\sigma}(\mathcal{R}_T)$$

Proof. The strategy σ is a strategy for the player ♦ in the game G', the induced Markov chain G(σ) equals the chain G'(σ), and the set of runs $\mathcal{S}_S$ is the complement of the set $\mathcal{R}_T$.

Now we state the results:

Theorem 3.10. Let V be defined for G' as in Definition 3.1 with respect to the set $T = V \setminus S$. For any vertex v it holds

$$V(v) = \inf_{\pi\in\Pi} P_v^\pi(\mathcal{R}_T) = 1 - \sup_{\sigma\in\Sigma} P_v^\sigma(\mathcal{S}_S) = 1 - val_G(v)$$

Proof. The proof of the first equation is dual to the proof of Theorem 3.2, the second equation follows easily from Lemma 3.9, and the last equation holds by definition.

Along with the definition of V, we get a sequence of strategies $\pi_n$ satisfying $\pi_{n+1}(uwv) = \pi_n(wv)$ and $\pi_n(v) = v'$ where $V_n(v) = V_{n-1}(v') = \min_{v\mapsto u} V_{n-1}(u)$. In a proof dual to the proof of Lemma 3.1, it can be shown that for any n and v it holds that $P_v^{\pi_n}(\mathcal{R}_T^n) = V_n(v)$.


Theorem 3.11. There is an optimal MD strategy σ∗ for the safety game G.

Proof. It suffices to find an optimal MD strategy π* for the player ♦ in the reachability game G', in other words a strategy that for any vertex v satisfies:

$$P_v^{\pi^*}(\mathcal{R}_T) \leq V(v) = \inf_{\pi\in\Pi} P_v^\pi(\mathcal{R}_T) \qquad (3.7)$$

Let π* be any strategy such that for any vertex v it holds that $\pi^*(v) = v'$ where $V(v') = \min_{v\mapsto u} V(u)$, i.e. π* chooses an arbitrary successor with the minimal value. The idea of the proof is that by choosing successors with minimal values, this strategy π* can keep the value no matter how long we play: if we look at the values in the vertices we reach after k steps, for arbitrary k, and sum the values weighted by the probabilities of the paths leading to them, we get the same value as in the vertex we started in.

To put it formally, we claim that for any k and v, $P_v^{\pi^*}(\mathcal{R}_T^k) \leq V(v)$, from which (3.7) easily follows. We prove it by induction on k. For k = 0, it trivially holds. For k = n + 1, we prove it separately for both types of vertices. If $v \in V_\Diamond$ and $\pi^*(v) = v'$, then

$$P_v^{\pi^*}(\mathcal{R}_T^{n+1}) = P_{v'}^{\pi^*}(\mathcal{R}_T^n) \leq V(v') = V(v)$$

On the other hand, if $v \in V_\bigcirc$, then

$$P_v^{\pi^*}(\mathcal{R}_T^{n+1}) = \sum_{v \xrightarrow{p} v'} p \cdot P_{v'}^{\pi^*}(\mathcal{R}_T^n) \leq \sum_{v \xrightarrow{p} v'} p \cdot V(v') = V(v)$$

Theorem 3.12. For a 1½ player game G' with a minimizing reachability objective, the value of the game is the optimal solution to the following linear program P2 over the set of variables $\{x_v\}_{v\in V}$:

$$\text{maximize } \sum_{v\in V} x_v$$

satisfying

$$x_v \geq 0 \quad \forall v \in V$$
$$x_v \leq 1 \quad \forall v \in T$$
$$x_v \leq x_u \quad \forall v \in V_\Diamond \cap \neg T,\ \forall u \in succ(v)$$
$$x_v \leq \sum_{v \xrightarrow{p} u} p \cdot x_u \quad \forall v \in V_\bigcirc \cap \neg T$$

The value vector of the corresponding safety game G equals $1 - x$, where x is the optimal solution of P2.

Proof. The first part is dual to the proof of Theorem 3.8. The second part is immediate due to Theorem 3.10.

Chapter 4

Finite 2½ player games

When the second player enters the arena, the situation gets more complicated – at least from the computational point of view, because the answers to the theoretical questions are in most cases the same. Again, we start with reachability, where the goal of the player □ is to reach the set of target vertices T and the goal of the player ♦ is to avoid it.

4.1 Reachability games

Concerning the determinacy, the same results hold for the players □ and ♦ as in the 1½ player maximizing and minimizing reachability games, respectively. In particular, both players have optimal MD strategies. In most cases the proofs are similar, only a bit more technical because of the second player. Concerning the algorithms, there has been a lot of interest in this area, and still one major question remains open: Can we decide in polynomial time whether the value of the game in a given vertex is greater than or equal to 1/2?

Determinacy and optimal strategies

We start with the determinacy and optimal strategies, repeating some ideas from Section 3.1 so that this section is self-contained. This proof mainly follows the proof provided by Brázdil et al. [4]. First, we need one auxiliary lemma, completely standard in game theory:

Lemma 4.1. For any sets A and B and a bounded function $f : A \times B \to \mathbb{R}$ it holds

$$\sup_{a\in A} \inf_{b\in B} f(a, b) \leq \inf_{b\in B} \sup_{a\in A} f(a, b)$$

Proof. For all $a \in A$, $b \in B$ it holds

$$f(a, b) \leq \sup_{a\in A} f(a, b)$$


We can apply the infimum and the supremum to both sides in sequence to get the desired result. Notice that a variable is bound by the innermost quantifier.

$$\inf_{b\in B} f(a, b) \leq \inf_{b\in B} \sup_{a\in A} f(a, b)$$
$$\sup_{a\in A} \inf_{b\in B} f(a, b) \leq \sup_{a\in A} \inf_{b\in B} \sup_{a'\in A} f(a', b) = \inf_{b\in B} \sup_{a\in A} f(a, b)$$
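As a side remark (an added illustration, not from the original text), the inequality of Lemma 4.1 can be strict: for $f(a, b) = |a - b|$ on $\{0,1\} \times \{0,1\}$ we get $\sup_a \inf_b f = 0$ while $\inf_b \sup_a f = 1$.

```python
# A tiny numeric check that the inequality in Lemma 4.1 can be strict.
A = B = [0, 1]
f = lambda a, b: abs(a - b)
sup_inf = max(min(f(a, b) for b in B) for a in A)  # = 0
inf_sup = min(max(f(a, b) for a in A) for b in B)  # = 1
print(sup_inf, inf_sup)
```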

Recall that $\mathcal{R}_T^n$ denotes the set of runs that visit the target set T during the first n steps. Recall also Definition 3.1 of $V_n$, which is intuitively the value of the n-step game: $V_0(v)$ is 1 for $v \in T$ and 0 otherwise; $V_{k+1}(v)$ is 1 for $v \in T$, $\max_{v\mapsto v'} V_k(v')$ for $v \in V_\Box \cap \neg T$, $\min_{v\mapsto v'} V_k(v')$ for $v \in V_\Diamond \cap \neg T$, and finally $\sum_{v \xrightarrow{p} v'} p \cdot V_k(v')$ for $v \in V_\bigcirc \cap \neg T$. We define V(v) as the limit of $V_n(v)$. The intuition mentioned earlier is formalized for 2½ player games as follows:

Lemma 4.2. For any vertex $v \in V$ and $n \in \mathbb{N}$ it holds

$$V_n(v) = \max_{\sigma\in\Sigma} \min_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T^n) = \min_{\pi\in\Pi} \max_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T^n) \qquad (4.1)$$

and there are HD optimal n-step strategies $\sigma_n^*$ and $\pi_n^*$ such that

$$V_n(v) = \min_{\pi\in\Pi} P_v^{\sigma_n^*,\pi}(\mathcal{R}_T^n) = \max_{\sigma\in\Sigma} P_v^{\sigma,\pi_n^*}(\mathcal{R}_T^n) \qquad (4.2)$$

Proof. We define the HD strategies $\sigma_n^*$ and $\pi_n^*$ inductively: let $\sigma_0^*$ and $\pi_0^*$ be any HD strategies. Then, we define $\sigma_k^*$ and $\pi_k^*$ for the first step: $\sigma_k^*(v) := v'$ for any $v \in V_\Box$, where v' is an arbitrary successor of v maximizing $V_{k-1}$, i.e. satisfying $V_{k-1}(v') = \max_{v\mapsto v''} V_{k-1}(v'')$. Accordingly, for any $u \in V_\Diamond$: $\pi_k^*(u) = u'$, where u' is an arbitrary successor of u satisfying $V_{k-1}(u') = \min_{u\mapsto u''} V_{k-1}(u'')$. Finally, we define $\sigma_k^*$ and $\pi_k^*$ for a non-empty history $sw \in V^\oplus$ where $s \in V$: from the second step on, $\sigma_k^*$ emulates $\sigma_{k-1}^*$, so for any $v \in V_\Box$, $\sigma_k^*(swv) = \sigma_{k-1}^*(wv)$. In a similar manner, $\pi_k^*(swu) = \pi_{k-1}^*(wu)$ for any $u \in V_\Diamond$.

We prove the lemma by induction; for n = 0, it is immediate because in zero steps, strategies have no impact and we can reach the target only if we are already there. For n = k + 1, we show that:

$$V_{k+1}(v) \leq \max_{\sigma\in\Sigma} \min_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T^{k+1}) \leq \min_{\pi\in\Pi} \max_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T^{k+1}) \leq V_{k+1}(v) \qquad (4.3)$$

from which (4.1) immediately follows for k + 1. Notice that the second inequality in (4.3) follows from Lemma 4.1.

To prove the first inequality in (4.3), we distinguish three cases by the type of the vertex v. First, we notice that $\max_{\sigma\in\Sigma} \min_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T^{k+1})$ is greater than or equal to $\min_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T^{k+1})$ for any specific strategy σ, and for the strategy $\sigma_{k+1}^*$ in particular. Then, we expand

the first step from the vertex v, and by the induction hypothesis we show that the value achieved by $\sigma_{k+1}^*$ in v equals $V_{k+1}(v)$.

$$\max_{\sigma\in\Sigma} \min_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T^{k+1}) \geq \min_{\pi\in\Pi} P_v^{\sigma_{k+1}^*,\pi}(\mathcal{R}_T^{k+1}) \qquad (4.4)$$

and, distinguishing the type of v,

for $v \in V_\Box$:
$$\min_{\pi\in\Pi} P_v^{\sigma_{k+1}^*,\pi}(\mathcal{R}_T^{k+1}) = \min_{\pi\in\Pi} P_{\sigma_{k+1}^*(v)}^{\sigma_k^*,\pi}(\mathcal{R}_T^k) = V_k(\sigma_{k+1}^*(v)) \qquad (4.5)$$
$$= \max_{v\mapsto v'} V_k(v') = V_{k+1}(v) \qquad (4.6)$$

for $v \in V_\Diamond$:
$$\min_{\pi\in\Pi} P_v^{\sigma_{k+1}^*,\pi}(\mathcal{R}_T^{k+1}) = \min_{\pi\in\Pi} \sum_{v\mapsto v'} \pi(v)(v') \cdot P_{v'}^{\sigma_k^*,\pi}(\mathcal{R}_T^k) = \min_{\pi\in\Pi} \min_{v\mapsto v'} P_{v'}^{\sigma_k^*,\pi}(\mathcal{R}_T^k) \qquad (4.7)$$
$$= \min_{v\mapsto v'} \min_{\pi\in\Pi} P_{v'}^{\sigma_k^*,\pi}(\mathcal{R}_T^k) = \min_{v\mapsto v'} V_k(v') = V_{k+1}(v) \qquad (4.8)$$

for $v \in V_\bigcirc$:
$$\min_{\pi\in\Pi} P_v^{\sigma_{k+1}^*,\pi}(\mathcal{R}_T^{k+1}) = \min_{\pi\in\Pi} \sum_{v \xrightarrow{p} v'} p \cdot P_{v'}^{\sigma_k^*,\pi}(\mathcal{R}_T^k) = \sum_{v \xrightarrow{p} v'} p \cdot \min_{\pi\in\Pi} P_{v'}^{\sigma_k^*,\pi}(\mathcal{R}_T^k) \qquad (4.9)$$
$$= \sum_{v \xrightarrow{p} v'} p \cdot V_k(v') = V_{k+1}(v) \qquad (4.10)$$

In the first equality in (4.5), we move from v to $\sigma_{k+1}^*(v)$ because it is the turn of the player □ , and we decrease the number of remaining steps to k; the second equality in (4.5) holds by the definition of $\sigma_{k+1}^*$ and the induction hypothesis. The second equality in (4.7) holds because the value of a convex combination is minimal when all elements that are not minimal have weight 0.

By a completely analogous procedure, it can be shown that:

$$\min_{\pi\in\Pi} \max_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T^{k+1}) \leq \max_{\sigma\in\Sigma} P_v^{\sigma,\pi_{k+1}^*}(\mathcal{R}_T^{k+1}) = V_{k+1}(v)$$

which proves the third inequality in (4.3). As a result, we have (4.1) and (4.2) for k + 1, so the induction is finished.

In the next step, we need to prove a similar statement in the limit case, i.e. for an infinite number of steps.

Theorem 4.3. The game G has a value in any vertex, i.e. for any vertex $v \in V$ it holds

$$V(v) = \sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) = \inf_{\pi\in\Pi} \sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T) = val(v)$$

Proof. The proof is divided into three steps:

• $V(v) \leq \sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T)$

Since $V(v) = \sup_{n\in\mathbb{N}} V_n(v)$, it is the least upper bound of the set $\{V_n(v)\}_{n\in\mathbb{N}}$. If we show that $\sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T)$ is also an upper bound of this set, we are finished.


We need to show for each $n \in \mathbb{N}$ that $V_n(v) \leq \sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T)$. Yet, we have from Lemma 4.2 that $V_n(v) = \sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T^n)$, and the set of all runs reaching T in up to n steps, $\mathcal{R}_T^n$, is by definition a subset of $\mathcal{R}_T$. For each σ and π it holds that $P_v^{\sigma,\pi}(\mathcal{R}_T^n) \leq P_v^{\sigma,\pi}(\mathcal{R}_T)$, and finally: $V_n(v) \leq \sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T)$.

• $\sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \leq \inf_{\pi\in\Pi} \sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T)$

Holds by Lemma 4.1.

• $\inf_{\pi\in\Pi} \sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T) \leq V(v)$

It suffices to show that there is a strategy π* such that

$$\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi^*}(\mathcal{R}_T) \leq V(v) \qquad (4.11)$$

We use the same technique as in the proof of Theorem 3.11. Let π* be an arbitrary MD strategy such that for any vertex $v \in V_\Diamond$, $\pi^*(v) = v'$ where $V(v') = \min_{v\mapsto u} V(u)$. Now, we prove for any $v \in V$ by induction on k that

$$\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi^*}(\mathcal{R}_T^k) \leq V(v) \qquad (4.12)$$

which directly leads to (4.11). For k = 0, it trivially holds. Next, we assume that (4.12) holds for k = n and prove it for k = n + 1. If $v \in T$, then V(v) = 1 and (4.12) trivially holds. If $v \notin T$, we first deal with the case of $v \in V_\Diamond$ and assume that $\pi^*(v) = v'$. It holds that

$$\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi^*}(\mathcal{R}_T^{n+1}) = \sup_{\sigma\in\Sigma} P_{v'}^{\sigma,\pi^*}(\mathcal{R}_T^n) \leq V(v') = V(v)$$

where the first equality holds by the definition of π*, the inequality by the induction hypothesis, and the last equality from the definition of π* and from the fact that $V(v) = \min_{v\mapsto u} V(u)$. Next, for $v \in V_\Box$ we have that

$$\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi^*}(\mathcal{R}_T^{n+1}) = \sup_{\sigma\in\Sigma} \sum_{v\mapsto v'} \sigma(v)(v') \cdot P_{v'}^{\sigma,\pi^*}(\mathcal{R}_T^n) \leq \sup_{\sigma\in\Sigma} \max_{v\mapsto v'} P_{v'}^{\sigma,\pi^*}(\mathcal{R}_T^n) = \max_{v\mapsto v'} \sup_{\sigma\in\Sigma} P_{v'}^{\sigma,\pi^*}(\mathcal{R}_T^n) \leq \max_{v\mapsto v'} V(v') = V(v)$$

At last, for $v \in V_\bigcirc$ it holds

$$\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi^*}(\mathcal{R}_T^{n+1}) = \sup_{\sigma\in\Sigma} \sum_{v \xrightarrow{p} v'} p \cdot P_{v'}^{\sigma,\pi^*}(\mathcal{R}_T^n) = \sum_{v \xrightarrow{p} v'} p \cdot \sup_{\sigma\in\Sigma} P_{v'}^{\sigma,\pi^*}(\mathcal{R}_T^n) \leq \sum_{v \xrightarrow{p} v'} p \cdot V(v') = V(v)$$

which concludes the proof of the whole theorem.


Figure 4.1: Classification of runs starting at the vertex $v_i$: the runs that return to v without reaching T have measure $1 - p_{v_i}^\sigma$, where $p_{v_i}^\sigma = \inf_{\pi\in\Pi} P_{v_i}^{\sigma,\pi}(\neg\text{Reach}(v))$ is the measure of the runs that never return to v; among the latter, the winning runs reach T without returning to v, and $r_{v_i}^\sigma = \inf_{\pi\in\Pi} P_{v_i}^{\sigma,\pi}(\text{Reach}(T) \mid \neg\text{Reach}(v))$ is the probability of reaching T conditioned on never returning to v.

Notice that the strategy π* is MD and, because of Theorem 4.3, also optimal in any vertex, which results in the following corollary.

Corollary 4.4. There is an MD optimal strategy π* for the player ♦ .

In the next step, we show the existence of an MD optimal strategy for the player □ . The proof closely follows the proof of Theorem 3.7, the same claim for the 1½ player games. Therefore, we present it now in a more concise way; for further explanation refer to Section 3.1. The main difference between the two proofs lies in the following lemma.

Lemma 4.5. For each vertex $v \in V_\Box$ and for any $\varepsilon \in \mathbb{R}^{\geq 0}$, there is a strategy $\sigma_\varepsilon$ that in v uses only one outgoing edge $v \mapsto v'$ regardless of the history, and that is ε-optimal in v, i.e. $\inf_{\pi\in\Pi} P_v^{\sigma_\varepsilon,\pi}(\mathcal{R}_T) \geq val(v) - \varepsilon$.

Proof. Let us fix a vertex v and $\varepsilon \in \mathbb{R}^{\geq 0}$. Due to Lemma 4.2, we have an ε-optimal deterministic strategy σ'. The problem is that this strategy is not memoryless; it can take different edges from v depending on the history of the game. We show that there must be a single edge that is sufficient for σ'.

Assume the vertex v has successors $v_1, \ldots, v_k$. Figure 4.1 describes the runs starting at the vertex $v_i$ in the Markov chain induced by a strategy σ and any strategy π. For the further arguments, we need two quantities: first, the measure of runs that never return back to v from $v_i$; the infimum of this measure over all $\pi \in \Pi$ is denoted by $p_{v_i}^\sigma$. Second, if $p_{v_i}^\sigma > 0$, we can talk about the conditional probability of reaching T from $v_i$ assuming we do not return to v; the infimum of this probability over all $\pi \in \Pi$ is denoted by $r_{v_i}^\sigma$.

We claim that

$$val(v) = 0 \ \text{ or } \ \left( \text{there is an edge } v \mapsto v' \text{ such that } p_{v'}^{\sigma'} > 0 \text{ and } r_{v'}^{\sigma'} \geq val(v) - \varepsilon \right) \qquad (4.13)$$

otherwise σ' cannot be ε-optimal. Assume that val(v) > 0 and that there is no such edge. There are two possible reasons why this can happen.

The first case is that for all edges $v \mapsto u$: $p_u^{\sigma'} = 0$. Let us fix an arbitrary δ > 0. Whichever edge σ' takes, the player ♦ has a strategy $\pi_\delta$ such that the probability of escaping (i.e. the probability of not returning to v) is $\leq \delta$. After the next return to v, the player ♦ can use a strategy $\pi_{\delta/2}$, then a strategy $\pi_{\delta/4}$, and so on. We denote this halving strategy of the player ♦ that starts with δ by $\hat{\pi}_\delta$. The probability of reaching T with σ' and $\hat{\pi}_\delta$ satisfies:

$$P_v^{\sigma',\hat{\pi}_\delta}(\mathcal{R}_T) \leq \delta + (1-\delta)\frac{\delta}{2} + (1-\delta)\left(1-\frac{\delta}{2}\right)\frac{\delta}{4} + \ldots \qquad (4.14)$$
$$\leq \delta + \frac{\delta}{2} + \frac{\delta}{4} + \ldots \qquad (4.15)$$
$$= 2\delta \qquad (4.16)$$

If we set δ to $\frac{val(v)-\varepsilon}{4}$, we get $P_v^{\sigma',\hat{\pi}_\delta}(\mathcal{R}_T) \leq \frac{val(v)-\varepsilon}{2}$, which contradicts the ε-optimality of σ'.

The second case is that there are some edges $v \mapsto u$ such that $p_u^{\sigma'} > 0$, but all of them have the conditional probability $r_u^{\sigma'} < val(v) - \varepsilon$. Then, there is a strategy $\hat{\pi}$ such that for all these edges $v \mapsto u$: $P_u^{\sigma',\hat{\pi}}(\mathcal{R}_T) < val(v) - \varepsilon$. Therefore, $P_v^{\sigma',\hat{\pi}}(\mathcal{R}_T) < val(v) - \varepsilon$, which again contradicts the ε-optimality of σ'.

We have shown the claim (4.13), which allows us to create a strategy $\sigma_\varepsilon$ that uses only one edge outgoing from v and that is also ε-optimal. If val(v) = 0, then we can fix any arbitrary edge for $\sigma_\varepsilon$ and the strategy is ε-optimal, i.e. $\inf_{\pi\in\Pi} P_v^{\sigma_\varepsilon,\pi}(\mathcal{R}_T) \geq 0 \geq val(v) - \varepsilon$. Otherwise, we have an edge $v \mapsto v'$ satisfying (4.13). For any history $wu \in V^\oplus$, we define $\sigma_\varepsilon(wu) = \sigma'(wu)$ for $u \neq v$, and $\sigma_\varepsilon(wu) = v'$ for $u = v$. Notice that $p_{v'}^{\sigma_\varepsilon} = p_{v'}^{\sigma'}$ and $r_{v'}^{\sigma_\varepsilon} = r_{v'}^{\sigma'}$ because for any strategy π, the probabilities in the induced Markov chain are not influenced by the decision in v. Then, we can express the probability of reaching T as the probability of reaching T without returning to v plus the probability of reaching T with at least one return:

$$\inf_{\pi\in\Pi} P_v^{\sigma_\varepsilon,\pi}(\mathcal{R}_T) = p_{v'}^{\sigma_\varepsilon} \cdot r_{v'}^{\sigma_\varepsilon} + (1 - p_{v'}^{\sigma_\varepsilon}) \cdot \inf_{\pi\in\Pi} P_v^{\sigma_\varepsilon,\pi}(\mathcal{R}_T) \qquad (4.17)$$
$$0 = p_{v'}^{\sigma_\varepsilon} \cdot \left( r_{v'}^{\sigma_\varepsilon} - \inf_{\pi\in\Pi} P_v^{\sigma_\varepsilon,\pi}(\mathcal{R}_T) \right) \qquad (4.18)$$
$$\inf_{\pi\in\Pi} P_v^{\sigma_\varepsilon,\pi}(\mathcal{R}_T) = r_{v'}^{\sigma_\varepsilon} \geq val(v) - \varepsilon \qquad (4.19)$$

hence, $\sigma_\varepsilon$ is ε-optimal.

Now, we need to recall Definitions 3.2 and 3.3. For a game G, we define the operator $L : [0,1]^V \to [0,1]^V$ in a similar fashion to the vector $V_k$: $L(x) = x'$ such that $x'_v = 1$ if $v \in T$, $x'_v = \max_{v\mapsto v'} x_{v'}$ if $v \in V_\Box \cap \neg T$, $x'_v = \min_{v\mapsto v'} x_{v'}$ if $v \in V_\Diamond \cap \neg T$, and finally $x'_v = \sum_{v \xrightarrow{p} v'} p \cdot x_{v'}$ if $v \in V_\bigcirc \cap \neg T$. Notice that $L(\mathbf{0}) = V_0$ and $V = \lim_{n\to\infty} L^n(\mathbf{0})$. As shown in Lemma 3.5, V is the least fixed point of the operator L.


Furthermore, for a game G and vertices $v \in V_\Box$ and $v' \in V$ such that $v \mapsto v'$, we denote by $G_{v\mapsto v'}$ the game obtained from G by removing all outgoing edges from v except for the edge $v \mapsto v'$. Similarly, $L_{v\mapsto v'}$ denotes the operator L in the altered game.

Lemma 4.6. For each vertex $v \in V_\Box$ with two or more outgoing edges, there is an edge to a vertex v' such that removing all the edges outgoing from v except for this one edge does not change the value of the game in any vertex u, i.e. $val_{G_{v\mapsto v'}}(u) = val_G(u)$.

Proof. Due to Lemma 4.5, we have for each ε a strategy $\sigma_\varepsilon$ which is ε-optimal in v and which uses from v only one edge $v \mapsto v_\varepsilon$. If we take $\varepsilon \in \{\frac{1}{2}, \frac{1}{4}, \ldots\}$, we get an infinite sequence of edges $\{v \mapsto v_{1/2}, v \mapsto v_{1/4}, \ldots\}$. There must be an edge $v \mapsto v'$ occurring infinitely many times in this sequence. Therefore, for any ε there is an ε-optimal strategy using only $v \mapsto v'$, and the value $val_{G_{v\mapsto v'}}(v) \geq val_G(v)$.

By removing transitions of the player □ , we only restrict the set of strategies Σ, and the value of any vertex cannot increase: $val_{G_{v\mapsto v'}} \leq val_G$.

We have $val_G(v) = val_{G_{v\mapsto v'}}(v)$. Furthermore, $val_{G_{v\mapsto v'}}(v) = \max_{u\in\{v'\}} val_{G_{v\mapsto v'}}(u) = val_{G_{v\mapsto v'}}(v')$. Finally, $val_{G_{v\mapsto v'}}(v') \leq val_G(v') \leq val_G(v) = val_{G_{v\mapsto v'}}(v) = val_{G_{v\mapsto v'}}(v')$; the values in v and v' in both the original and the altered game all equal the same number.

Suppose that $val_{G_{v\mapsto v'}} < val_G$ in some component. We know that $val_G$ is the least fixed point of the operator L. Because $val_{G_{v\mapsto v'}}(v') = val_{G_{v\mapsto v'}}(v) = val_G(v)$, $val_{G_{v\mapsto v'}}$ is also a fixed point of L, contradicting that $val_G$ is the least fixed point. Therefore $val_{G_{v\mapsto v'}} = val_G$.

Theorem 4.7. There is an MD optimal strategy σ* for the player □ .

Proof. For all vertices $v \in V_\Box$ we subsequently remove all outgoing edges except for one, resulting in a game G'. Due to Lemma 4.6, $val_{G'} = val_G$. In the game G', the player □ has only one strategy σ*, which is clearly optimal and MD. Because for each strategy π we have $G'(\sigma^*, \pi) = G(\sigma^*, \pi)$, σ* is also optimal in the original game G.

Corollary 4.8. Any 2½ player game G with a reachability objective is strongly determined: both of the players have optimal strategies. Furthermore, they have MD optimal strategies; for any vertex $v \in V$ it holds

$$\max_{\sigma\in\Sigma^{MD}} \inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) = \max_{\sigma\in\Sigma^{MD}} \min_{\pi\in\Pi^{MD}} P_v^{\sigma,\pi}(\mathcal{R}_T) = \min_{\pi\in\Pi^{MD}} \max_{\sigma\in\Sigma^{MD}} P_v^{\sigma,\pi}(\mathcal{R}_T) = \min_{\pi\in\Pi^{MD}} \sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T) = val_G(v)$$

Complexity

We continue with the algorithmic questions. Because it is not the main topic of the thesis, we will omit some technical details and provide only intuition for the proofs. The reader can look up the full details in the references.


There is no known polynomial-time algorithm to compute the values of a game. Condon [8] considered the special case of simple stochastic games and showed that the problem "is the value of the game in the vertex s greater than 1/2?" lies in $NP \cap coNP$. In the following text, we will explain this proof.

Definition 4.1. A simple stochastic game (SSG) is a tuple $(V, E, (V_\Box, V_\Diamond, V_\bigcirc))$, where (V, E) is a directed graph with three special vertices: the □ -sink, the ♦ -sink, and a starting vertex s. Furthermore, all vertices have exactly two outgoing edges, except for the two sink vertices, which have none. For a vertex $v \in V_\bigcirc$, the probabilities of the transitions are uniformly distributed, i.e. 1/2 for each of the two successor vertices. The goal of the player □ is to reach the □ -sink; the goal of the player ♦ is to avoid it. The Markov chain induced by strategies σ and π is defined similarly to our definition. The value of the game is $\sup_{\sigma\in\Sigma} \inf_{\pi\in\Pi} P_s^{\sigma,\pi}(\text{Reach}(\{\Box\text{-sink}\}))$.

First we need to understand the difference between SSGs and our definition of 2½ player games. For any 2½ player game G with rational probabilities on the edges outgoing from ○-vertices, we can build a simple stochastic game G' by adding intermediate vertices and edges in place of every edge in G. This procedure is illustrated in Figure 4.2. Notice that the graph of the game G' may be considerably larger than the graph of the game G. The increase mainly depends on the number $\max_{v\in V_\bigcirc} \{ r \mid \frac{q}{r} = GCD(v), \frac{q}{r} \text{ is irreducible} \}$, where GCD(v) is the greatest common divisor of the probabilities on the edges outgoing from v. If the game G has irrational probabilities, it cannot be transformed into an SSG.

As we can see, this model is weaker than the model used in our text, yet there are good reasons for such a simplification. Namely, it allows the following lemma, which we state without the proof.

Lemma 4.9. The value of a simple stochastic game G with n vertices is of the form $\frac{p}{q}$, where p, q are integers, $0 \leq p, q < 4^{n-1}$.

Corollary 4.10. If the value of an SSG with n vertices is > 1/2, then it is $\geq \frac{1}{2} + \frac{1}{4^n}$.

We will make use of this result later in the proof. How can we show that the discussed problem is in $NP \cap coNP$? We have to create non-deterministic polynomial algorithms for deciding whether the value is > 1/2 and whether the value is ≤ 1/2. These algorithms work in a similar way:

1. guess a vector v of rationals between 0 and 1 with both the numerator and the denominator $< 4^{n-1}$

2. verify that v is the value vector of the game G

3. compare $v_s$ to 1/2 and output the desired result

Steps 1 and 3 are trivial; the problem lies in step 2. We can make use of the operator L (see Definition 3.2). Due to Lemma 3.5, the value vector of the game is a fixed point of L. Yet in general, there may be several fixed points of the operator (see Example 3.3). Therefore, we cannot verify that v is the value vector so easily. At this point, we


Figure 4.2: Transforming a 2½-player game into a simple stochastic game w.r.t. reachability. If a vertex of the player □ or ♦ has more than two successors, we divide the decision of the player into multiple stages by adding vertices of the same player, as shown in (i). In (ii) we present the transformation of a random vertex v that does not satisfy the conditions for an SSG. We build the smallest binary tree of random vertices such that the number of edges outgoing from the leaves in the tree is greater than or equal to the denominator of GCD(v). Then we connect the leaves' edges, proportionally to the probabilities, to the original successors of v. The remaining edges are connected back to v. Notice that we do not allow multi-graphs; instead of two edges from a leaf to b we create one edge and one loop resulting in the same probabilities. In (iii) we show how to deal with vertices with only one successor. In the same way we connect all vertices in T to the new vertex □-sink after we remove all their outgoing edges.


At this point, we use Corollary 4.10 and the concept of a stopping stochastic game introduced by Shapley [14]. We transform the game G into a game G′ that has a unique fixed point of the operator L and whose values, furthermore, do not differ much from the values in G. Due to Corollary 4.10, we then get val_G(s) > 1/2 if and only if val_{G′}(s) > 1/2, which completes the step 2.

For an integer m and an SSG G, we create a stopping game G′ by substituting a chain made of m new random vertices for every single edge v ↦ u in G. We connect the vertex v to the first chain link, the last chain link to the ♦-sink, and every chain link to the vertex u, as shown in Figure 4.3. Thus, instead of moving from v to u as in the original game G, we can end up in the ♦-sink with probability 1/2^m. It is called a stopping game because we reach a sink vertex (and stop) with probability 1. For a parameter m, we call it a 1/2^m-stopping game.
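A minimal sketch of this edge-for-chain substitution follows, assuming the same illustrative encoding as before (the function name and the reserved ♦-sink label are ours):

def make_stopping_game(edges, m, diamond_sink='diamond-sink'):
    """Replace every edge (v, u) by a chain of m fresh random vertices.

    Each chain link moves to u with probability 1/2 and to the next link
    with probability 1/2; the last link escapes to the diamond-sink instead.
    Hence every traversal of an original edge slips into the diamond-sink
    with probability 1/2**m, so some sink is reached with probability 1.
    """
    prob = {}      # random chain vertex -> list of (successor, probability)
    entry = {}     # original edge (v, u) -> first chain link replacing it
    for (v, u) in edges:
        links = [('link', v, u, i) for i in range(m)]
        for i, link in enumerate(links):
            nxt = links[i + 1] if i + 1 < m else diamond_sink
            prob[link] = [(u, 0.5), (nxt, 0.5)]
        entry[(v, u)] = links[0]
    return prob, entry

This matches the bookkeeping in Figure 4.3: each original edge contributes m new vertices and 2m + 1 new edges.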

Shapley [14] proved the following lemma:

Lemma 4.11 (Shapley, [14]). If G′ is the 1/2^m-stopping game corresponding to G, then G′ has a unique fixed point of L.

Figure 4.3: Transformation of an SSG into a stopping game. This procedure is repeated for every edge; there are |E|·m new vertices and |E|·(2m + 1) new edges in the stopping game.


To be able to prove the main theorem, we need one more auxiliary lemma:

Lemma 4.12. There is a constant c > 0 such that if G is an SSG with n vertices, then the value of G is > 1/2 if and only if the value of the corresponding 1/2^{n^c}-stopping game is > 1/2.

Theorem 4.13. Deciding whether an SSG G with n vertices has value > 1/2 is in the class NP ∩ co-NP.

Proof. The non-deterministic Turing machine guesses a vector v, constructs the 1/2^{n^c}-stopping game G′, where c is the constant from Lemma 4.12, and verifies whether v is a fixed point of L_{G′}. Finally, the machine for the SSG value problem accepts the input iff v_s > 1/2; the machine for the complement accepts the input iff v_s ≤ 1/2.

This concludes the presentation of the result of Condon [8]. Notice that the theorem can be proven in a simpler way even for general 2½-player games, because we know that there are MD optimal strategies for both players. In this alternative proof, we only require the probabilities on transitions from the random vertices to be rational.

In a game G, the sets of MD strategies of both players are finite. Therefore, we can non-deterministically guess an optimal MD strategy of the player □. When we fix this strategy in the game G, we get a 1½-player game G′ of the player ♦ by transforming the vertices of the player □ into random vertices. In the game G′, we can compute the values using the linear program P1 from Theorem 3.8. If there is a strategy such that the value of the starting vertex s in the game G′ is greater than 1/2, the value of the vertex s in the original game G is also > 1/2. If there is no such strategy, the value of G in s is ≤ 1/2. The complement of the problem is also in the class NP: we guess an optimal MD strategy of the player ♦, which induces a 1½-player game of the player □. In this game, we can again compute the values in polynomial time and compare them to 1/2.
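In both variants the certificate is small and checkable in polynomial time. A minimal sketch of the verification step of the guess-and-verify procedure, assuming exact rational arithmetic via Python's standard fractions module and the same illustrative game encoding as before:

from fractions import Fraction

def is_fixed_point(x, vertices, kind, succ, prob, targets):
    """Check L(x) = x exactly; rationals avoid any rounding issues."""
    for v in vertices:
        if v in targets:
            expected = Fraction(1)
        elif kind[v] == 'box':
            expected = max(x[u] for u in succ[v])
        elif kind[v] == 'diamond':
            expected = min(x[u] for u in succ[v])
        else:
            expected = sum(p * x[u] for u, p in prob[v])
        if x[v] != expected:
            return False
    return True

# The NP machine guesses x, builds the stopping game G' (where the fixed
# point of L is unique), checks is_fixed_point on G', and accepts iff
# x[s] > Fraction(1, 2); the co-NP machine accepts iff x[s] <= Fraction(1, 2).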

4.2 Safety games

In the safety games, the goal of the player □ is not to leave a set S of safe vertices. Again, the player ♦ works against it and tries to eventually leave the set S. This is exactly the problem dual to reachability.


Recall that \mathcal{S}_S denotes the set of runs visiting only vertices in S. In a safety game G = (V, ↦, (V_□, V_♦, V_◯), Prob) with a winning objective \mathcal{S}_S the goals are:

• □: maximize the prob. of staying in S ⟺ minimize the prob. of reaching V∖S
• ♦: minimize the prob. of staying in S ⟺ maximize the prob. of reaching V∖S

The equivalences hold because \mathcal{S}_S = ¬\mathcal{R}_{V∖S}, i.e. the runs safe with respect to S are exactly the runs that do not reach the complement of S. Therefore, to the safety game G we define a dual reachability game G′ = (V, ↦, (V_♦, V_□, V_◯), Prob) with a winning objective \mathcal{R}_{V∖S} where the goals are:

• □: maximize the prob. of reaching V∖S
• ♦: minimize the prob. of reaching V∖S

Notice that we have swapped the vertices of the players □ and ♦ in G′ compared to G. Because for any strategies σ and π the induced Markov chain G(σ, π) equals the chain G′(π, σ), and \mathcal{S}_S = ¬\mathcal{R}_{V∖S}, we have P_v^{σ,π,G}(\mathcal{S}_S) = 1 − P_v^{π,σ,G′}(\mathcal{R}_{V∖S}). Therefore we have

\[
\sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi,G}(\mathcal{S}_S)
= \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi}\bigl(1 - P_v^{\pi,\sigma,G'}(\mathcal{R}_{V\setminus S})\bigr)
= 1 - \inf_{\sigma\in\Sigma}\sup_{\pi\in\Pi} P_v^{\pi,\sigma,G'}(\mathcal{R}_{V\setminus S}) =
\]

\[
= 1 - \inf_{\pi'\in\Pi'}\sup_{\sigma'\in\Sigma'} P_v^{\sigma',\pi',G'}(\mathcal{R}_{V\setminus S})
= 1 - \mathrm{val}_{G'}(v)
\]

where Σ′ and Π′ are the sets of strategies of the players □ and ♦ in the game G′, respectively. Similarly, inf_{π∈Π} sup_{σ∈Σ} P_v^{σ,π,G}(\mathcal{S}_S) = 1 − val_{G′}(v), which means that any safety game G has the value val_G(v) = 1 − val_{G′}(v).
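In algorithmic terms, the duality means that safety values come for free once reachability values can be computed. A minimal sketch on top of the earlier illustrative value_iteration helper (note that after n_steps rounds this is only an approximation of the dual reachability value):

def safety_values(vertices, kind, succ, prob, safe, n_steps):
    """Approximate val_G(v) for the safety objective via the dual game G'."""
    # Swap the owners of the vertices, exactly as in the construction of G'.
    dual_kind = {v: {'box': 'diamond', 'diamond': 'box'}.get(k, k)
                 for v, k in kind.items()}
    # In G', the target set is the complement V \ S of the safe set.
    unsafe = {v for v in vertices if v not in safe}
    reach = value_iteration(vertices, dual_kind, succ, prob, unsafe, n_steps)
    return {v: 1.0 - reach[v] for v in vertices}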

4.3 Büchi games

We first recall the Büchi winning objective: there is a set of target vertices T, and the goal is to visit some vertex from T infinitely many times. For this section, we draw inspiration from the proofs by Chatterjee et al. [7]. This problem is closely related to reachability. Assume the vertex that the player □ tries to visit infinitely many times is some v ∈ V. If the player □ starts from the vertex v with a strategy σ, this strategy must guarantee revisiting v with probability 1 against any strategy π of the player ♦. Otherwise, the runs visiting v infinitely many times would have measure 0. With such a strategy σ, when starting from some other vertex u, the player □ only needs to reach the vertex v in order to win.

To be able to reason formally about the Büchi games, we need to introduce a slight variation of the reachability games that we call positive reachability games. Let T ⊆ V be a set of vertices. Then, \mathcal{R}^+_T is the set of runs that reach T in at least one step. It means that positively reaching the set {v} from the vertex v is not trivial – the player needs to leave the vertex and return back.


Because of the similarity, we will not go into detail; all the proofs are completely analogous to the case of the reachability games. We point out only a few differences. First, in parallel to Definition 3.1, we define for any vertex v a sequence V_k^+(v) as follows: V_0^+(v) = 0, and V_{k+1}^+(v) is defined as Σ_{v →p v′} p·V_k(v′), or max_{v↦v′} V_k(v′), or min_{v↦v′} V_k(v′) for v ∈ V_◯, v ∈ V_□, or v ∈ V_♦, respectively. Finally, V^+(v) = lim_{n→∞} V_n^+(v). Since the sequence (V_n) is non-decreasing and bounded, (V_n^+) is also non-decreasing and bounded.

Theorem 4.14. For a game G with a positive reachability objective \mathcal{R}^+_T, the value vector of the game equals V^+ and both players have MD optimal strategies. For each v ∈ V it holds that
\[
V^+(v)
= \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}^+_T)
= \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}^+_T)
= \max_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}^+_T)
= \min_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}^+_T)
\]

Furthermore, if val^{\mathcal{R}_T} is the value vector of G with the reachability objective \mathcal{R}_T, then it holds that

\[
\mathrm{val}^{\mathcal{R}^+_T}(v) = \max_{v\mapsto u} \mathrm{val}^{\mathcal{R}_T}(u) \qquad \forall v \in V_\Box
\]
\[
\mathrm{val}^{\mathcal{R}^+_T}(v) = \min_{v\mapsto u} \mathrm{val}^{\mathcal{R}_T}(u) \qquad \forall v \in V_\Diamond
\]
\[
\mathrm{val}^{\mathcal{R}^+_T}(v) = \sum_{v\stackrel{p}{\to}u} p\cdot\mathrm{val}^{\mathcal{R}_T}(u) \qquad \forall v \in V_\bigcirc
\]

We leave out the proof, since it is only a slight variation of the proof for the reachability objective.
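These three equations amount to a single one-step lookahead over the ordinary reachability values. A minimal sketch, reusing the illustrative encoding from before:

def positive_reach_values(vertices, kind, succ, prob, reach_val):
    """val^{R+_T}(v) computed from the reachability value vector reach_val."""
    out = {}
    for v in vertices:
        if kind[v] == 'box':
            out[v] = max(reach_val[u] for u in succ[v])
        elif kind[v] == 'diamond':
            out[v] = min(reach_val[u] for u in succ[v])
        else:
            out[v] = sum(p * reach_val[u] for u, p in prob[v])
    return out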

Now, we can return to the game G with the Büchi objective \mathcal{B}_T. We characterize the value of the game using the reachability and the positive reachability objectives.

Definition 4.2. Let G be a finite 2½-player game with a Büchi winning objective \mathcal{B}_T. The set of the revisiting target vertices A is defined as follows:

\[
A = \{\, v \in T \mid \mathrm{val}_G^{\mathcal{R}^+_{\{v\}}}(v) = 1 \,\}
\]

Theorem 4.15. For a finite 2½-player game G with the Büchi objective \mathcal{B}_T and any v ∈ V we have

\[
\mathrm{val}_G^{\mathcal{R}_A}(v)
= \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{B}_T)
= \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{B}_T)
= \mathrm{val}_G^{\mathcal{B}_T}(v)
\]

Proof. The last equality follows from the definition if we prove the first two equalities. We prove them by the following inequalities:

\[
\mathrm{val}_G^{\mathcal{R}_A}(v)
\le \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{B}_T)
\le \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{B}_T)
\le \mathrm{val}_G^{\mathcal{R}_A}(v)
\]

• sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{B}_T) ≤ inf_{π∈Π} sup_{σ∈Σ} P_v^{σ,π}(\mathcal{B}_T): this follows from Lemma 4.1.

• val_G^{\mathcal{R}_A}(v) ≤ sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{B}_T): We construct a strategy σ* such that val_G^{\mathcal{R}_A}(v) ≤ inf_{π∈Π} P_v^{σ*,π}(\mathcal{B}_T). For a history w, σ* behaves like the optimal strategy for reaching the set A if w has not visited A yet. Otherwise, there is a vertex v′ that is the first visited vertex from A in the history w; then σ* behaves like the optimal strategy for revisiting v′. From the optimality of the strategies σ* is composed of, it is evident that inf_{π∈Π} P_v^{σ*,π}(\mathcal{R}_A) = val_G^{\mathcal{R}_A}(v) and inf_{π∈Π} P_v^{σ*,π}(\mathcal{B}_T | \mathcal{R}_A) = 1. Hence, it holds that inf_{π∈Π} P_v^{σ*,π}(\mathcal{B}_T) ≥ val_G^{\mathcal{R}_A}(v).

• inf_{π∈Π} sup_{σ∈Σ} P_v^{σ,π}(\mathcal{B}_T) ≤ val_G^{\mathcal{R}_A}(v): We construct a strategy π* such that sup_{σ∈Σ} P_v^{σ,π*}(\mathcal{B}_T) ≤ val_G^{\mathcal{R}_A}(v). Let π* be an MD optimal strategy for the winning objective \mathcal{R}_A. We fix an arbitrary strategy σ ∈ Σ and a vertex v ∈ V. Let X ⊆ \mathcal{B}_T be the set of the winning runs that do not reach A. We show that P_v^{σ,π*}(X) = 0. If v ∈ A, then X is empty. Otherwise, let us fix some target vertex u ∈ T ∖ A. By X_u, we denote the set of runs ω ∈ X that visit u infinitely many times. If we show P_v^{σ,π*}(X_u) = 0, we are finished because of the additivity of P_v^{σ,π*}. Assume that P_v^{σ,π*}(X_u) > 0; then σ must be able to force revisiting u with probability 1, otherwise u could not be revisited infinitely many times with non-zero probability. This is a contradiction with u ∉ A.

The optimal strategy π* for the player ♦ is MD. A natural question follows: is there an MD optimal strategy for the player □?

Theorem 4.16. For a finite 2½-player game G with a Büchi objective \mathcal{B}_T, there is an MD optimal strategy for the player □.

Proof. Let σ* be an MD optimal strategy for the positive reachability objective \mathcal{R}^+_A and let π be an arbitrary strategy. From the definition of A and from the optimality of σ*, for any vertex v ∈ A we have P_v^{σ*,π}(\mathcal{R}^+_A) = 1. Once the player □ gets inside the set A, the player can visit A infinitely many times with probability 1. Each run ω visiting A infinitely many times must visit some vertex u ∈ A infinitely many times because the set A is finite, so ω ∈ \mathcal{B}_T. Therefore, P_v^{σ*,π}(\mathcal{B}_T) ≥ P_v^{σ*,π}(\mathcal{R}_A).

Moreover, σ* is also optimal for the reachability objective \mathcal{R}_A. In fact, for each vertex v ∉ A, P_v^{σ*,π}(\mathcal{R}_A) = P_v^{σ*,π}(\mathcal{R}^+_A) and val^{\mathcal{R}_A}(v) = val^{\mathcal{R}^+_A}(v); on the contrary, for v ∈ A, P_v^{σ*,π}(\mathcal{R}_A) = 1 = val^{\mathcal{R}_A}(v). In total, we get P_v^{σ*,π}(\mathcal{B}_T) ≥ P_v^{σ*,π}(\mathcal{R}_A) ≥ val^{\mathcal{R}_A}(v), so the strategy σ* is optimal for the objective \mathcal{B}_T.

As regards the complexity of computing the value vector, Chatterjee et al. [7] generalized the result of Condon [8] explained in Section 4.1 and showed that the quantitative problem for games with Büchi objectives is also in NP ∩ co-NP.
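Theorem 4.15 suggests a direct computation scheme: determine the revisiting set A, then solve an ordinary reachability problem for A. A minimal sketch on top of the illustrative helpers above; reach_values stands for any procedure returning the exact reachability value vector (the test against 1 requires exact arithmetic or a qualitative algorithm, so floating-point value iteration would only approximate it):

def buchi_values(vertices, kind, succ, prob, targets, reach_values):
    """val^{B_T} via the revisiting set A of Definition 4.2."""
    A = set()
    for t in targets:
        # val^{R+_{t}}(t) is the one-step lookahead over the values for {t}.
        rv = reach_values(vertices, kind, succ, prob, {t})
        plus = positive_reach_values(vertices, kind, succ, prob, rv)
        if plus[t] == 1:
            A.add(t)
    # By Theorem 4.15, the Buchi value equals the reachability value of A.
    return reach_values(vertices, kind, succ, prob, A)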

Chapter 5 Finitely-branching games

So far, we have considered only games with a finite state space. In this and also in the next chapter we move on to an infinite state space. At first, we talk about finitely-branching games. These are games in which the set of outgoing transitions from any state is always finite; the game cannot branch in one step into infinitely many states. Afterwards, we address the more general case allowing even infinite branching.

We divide the text into two distinct chapters because there are important differences between the two cases. The main difference lies in the attention they attract: finitely-branching games are widely studied, in contrast to the infinitely-branching games. The reason for this may be the fact that there are natural ways to express infinite-state finitely-branching games by a finite-state model. For infinitely-branching games, no such model has been studied to the best of our knowledge.

The theoretical questions for infinite-state games, such as the determinacy and the existence of optimal strategies, can be asked in this text without any such finite model. Yet, for the practical questions and for the practical applicability of this theory, such as model checking, a finite description of a game is crucial. In this text, we deal with the theoretical questions only; for detailed information about the models, consult the references. A short description of the three main models follows.

Recursive Markov decision processes and Recursive simple stochastic games [9], [10] are finite collections of finite 1½- or 2½-player games that can recursively call each other. In every component, there are specified places where other components may be called. The called component is entered via some call port and left via some return port.

The next case is a model equivalent to the previous one. PDA games [5], [3], [4], [6] are games generated by pushdown automata. The pair of the current control state together with the top stack symbol – called the head of the configuration – determines whose turn it is. For every head, the rules prescribe an allowed set of actions. Each action consists of changing the control state, and of removing the top stack symbol or replacing it by one or two symbols. The set of vertices in the resulting game is then QΓ*, where Q is the finite set of control states and Γ is the finite stack alphabet.

Game Probabilistic Lossy Channel Systems [1] are finite-state games equipped with a finite number of unbounded FIFO channels. In each transition, one fixed letter from the message alphabet may be sent to or received from some specified channel. The states are partitioned between the player □ and the player ♦. After any transition, several letters at arbitrary positions in any of the channels may get lost.

The message loss is independent of the players' decisions and determined by a probability distribution – it plays the role of the random player in the game.
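As a small illustration of the pushdown model above: a configuration is a control state together with the whole stack content, and a rule rewrites the head (control state, top stack symbol) into a new control state plus zero, one, or two stack symbols. The encoding below is purely illustrative and not taken from the cited papers:

from typing import Dict, List, Tuple

State = str
Symbol = str
Head = Tuple[State, Symbol]                 # (control state, top stack symbol)
Config = Tuple[State, Tuple[Symbol, ...]]   # (control state, whole stack)
# A rule maps a head to new control states with replacements of length 0, 1, or 2.
Rules = Dict[Head, List[Tuple[State, Tuple[Symbol, ...]]]]

def successors(cfg: Config, rules: Rules) -> List[Config]:
    """The vertices of the induced game are Q x Gamma*; edges follow the rules."""
    state, stack = cfg
    if not stack:
        return []                           # empty stack: no head, no moves
    head, rest = (state, stack[0]), stack[1:]
    return [(q, repl + rest) for (q, repl) in rules.get(head, [])]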

So much for the overview of the main models used. Now, we move on to the determinacy. As for the finite-state models, we start with the reachability objective. We discuss only the general case of 2½-player games.

5.1 Reachability games

Most of the proofs from Section 4.1 work directly for the infinite-state games with finite branching. The proofs do not assume that the state space is finite; on the other hand, the finite branching is used in most of them.

The only proof that fails for the finitely-branching games is the one showing the existence of an optimal strategy for the player □. In that proof, we show that in any vertex of the player □ we can remove all outgoing edges except for a single one without changing the value of the game. We repeat this procedure for all vertices of the player □ until there is only one strategy of the player □ left – the MD optimal one. But we cannot repeat this procedure for infinitely many vertices, so for the case of infinite-state games the proof does not hold.

In fact, the theorem itself does not hold: the player □ might have no optimal strategy. Example 5.1 shows such a game.

Example 5.1 (A game with no optimal strategy for the player □). The player □ does not necessarily have an optimal strategy in the situation of an infinite state space. There may be an infinite sequence of vertices such that the further the player goes in this sequence, the higher the probability of reaching the target. To reach the target, the player must eventually leave the sequence. But continuing in the sequence for only one more step would guarantee the player an even higher probability of reaching the target. This game is illustrated in Figure 5.1.

Now, we state the results from the finite games that apply directly to the infinite-state games with finite branching. For the rest of this section we fix a finitely-branching game G = (V, ↦, (V_□, V_♦, V_◯), Prob) with a reachability objective \mathcal{R}_T.

Theorem 5.1. The value of the game G is equal to V(v) in each vertex v ∈ V. Furthermore, the player ♦ has an MD optimal strategy. Hence, we have
\[
\sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T)
= \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T)
= V(v)
= \sup_{\sigma\in\Sigma}\min_{\pi\in\Pi^{MD}} P_v^{\sigma,\pi}(\mathcal{R}_T)
= \min_{\pi\in\Pi^{MD}}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T)
\]
Proof. Definition 3.1, Lemma 4.1, Lemma 4.2 and Theorem 4.3 apply directly to the situation of finitely-branching games.

Since the player □ does not have an optimal strategy in general, this leaves space for subtler questions: In what type of games does the player □ have an optimal strategy? If there is an optimal strategy for the player □, is there also an MD optimal strategy? What type of ε-optimal strategies does the player □ have?


Figure 5.1: A game with no optimal strategy for the player □. The set of target vertices is T = {T_1, T_2, ...}; the player starts in the vertex S and moves to the right. In a vertex p, the player may move to the top, from where the target is reached with probability 1 − 1/2^p. The dotted round vertices are the random vertices; the probabilities of transitions from these vertices are uniformly distributed. In the vertices {S, 1, 2, ...} the value of the game is 1, yet it cannot be achieved by any specific strategy.

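To make the non-existence concrete: a strategy that walks n steps to the right and then moves up wins with probability 1 − 1/2^n, so the achievable probabilities approach, but never attain, the value 1. A tiny illustrative computation:

def win_probability(n: int) -> float:
    """Probability of reaching T when leaving the bottom row at vertex n."""
    return 1.0 - 0.5 ** n

# sup over all strategies is 1, but no single n attains it:
print([win_probability(n) for n in (1, 5, 10, 20)])
# [0.5, 0.96875, 0.9990234375, 0.9999990463256836]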

We approach the first question by presenting a result of Brázdil et al. [4]. They showed that for every finitely-branching game with a reachability objective, one of the two following cases is true. Either the player ♦ has a strictly optimal strategy π, i.e. a strategy such that for all strategies σ it holds that P_v^{σ,π}(\mathcal{R}_T) < val(v). Or the player □ has an optimal strategy. First we need one auxiliary lemma:

Lemma 5.2. For each v ∈ V and every ε > 0, we have
\[
\exists\sigma\in\Sigma\ \exists n\in\mathbb{N}\ \forall\pi\in\Pi:\ P_v^{\sigma,\pi}(\mathcal{R}_T^n) > \mathrm{val}(v) - \varepsilon
\]

Proof. Due to Lemma 4.2 we can choose n such that V_n(v) > V(v) − ε. We get a strategy σ_n such that for each π: P_v^{σ_n,π}(\mathcal{R}_T^n) > V(v) − ε = val(v) − ε.

Theorem 5.3. For any v ∈ V exactly one of the following statements holds:
\[
\exists\pi'\in\Pi\ \forall\sigma\in\Sigma:\ P_v^{\sigma,\pi'}(\mathcal{R}_T) < \mathrm{val}(v) \tag{5.1}
\]
or
\[
\exists\sigma'\in\Sigma\ \forall\pi\in\Pi:\ P_v^{\sigma',\pi}(\mathcal{R}_T) \ge \mathrm{val}(v) \tag{5.2}
\]

Proof. It is obvious that at most one of the statements holds; otherwise we would have P_v^{σ′,π′}(\mathcal{R}_T) < val(v) and at the same time P_v^{σ′,π′}(\mathcal{R}_T) ≥ val(v). We prove the theorem by showing ¬(5.1) ⟹ (5.2):
\[
\forall\pi\in\Pi\ \exists\sigma\in\Sigma:\ P_v^{\sigma,\pi}(\mathcal{R}_T) \ge \mathrm{val}(v)
\;\Longrightarrow\;
\exists\sigma^*\in\Sigma\ \forall\pi\in\Pi:\ P_v^{\sigma^*,\pi}(\mathcal{R}_T) \ge \mathrm{val}(v) \tag{5.3}
\]


We start with an overview of the following proof. Let us assume that the left-hand side of (5.3) holds. We restrict the set of edges of the player □ to the edges u ↦ v with val(u) = val(v). We show that the left-hand side of (5.3) holds in this restricted game as well. Due to Lemma 5.2, we have for every vertex v of the restricted game a strategy σ_v and a natural number n_v such that for any π, P_v^{σ_v,π}(\mathcal{R}_T^{n_v}) > val(v)/2. The strategy σ* then emulates these vertex n-step strategies in sequence. Starting in the vertex v, it behaves like the strategy σ_v for the following n_v steps. Then it reaches some vertex u, so for the further n_u steps it behaves like the strategy σ_u. Then it reaches some vertex t, and in the next iteration it behaves like the strategy σ_t, and so on. Because of the restriction of the edges, the expected value of the vertex we reach after each iteration is greater than or equal to the value of the vertex at the start of the iteration, for any strategy π. Hence, in each iteration we decrease the probability of not reaching T at least by a factor of 1/2.

Now we go through the arguments in a more precise manner. Let G′ denote the restricted game where for each v ∈ V_□ there is an edge v ↦ u only if the edge v ↦ u is also in the original game G and val_G(v) = val_G(u). To satisfy the definition of a game, every vertex must have at least one successor. Indeed it has, because val_G(v) = V(v) = max_{v↦u} V(u) = max_{v↦u} val_G(u), and by the finite branching the maximum is attained. Let Σ′ denote the restricted set of strategies in G′. We want to show that

\[
\forall\pi\in\Pi\ \exists\sigma\in\Sigma:\ P_v^{\sigma,\pi}(\mathcal{R}_T) \ge \mathrm{val}_G(v)
\;\Longrightarrow\;
\forall\pi\in\Pi\ \exists\sigma\in\Sigma':\ P_v^{\sigma,\pi}(\mathcal{R}_T) \ge \mathrm{val}_G(v) \tag{5.4}
\]
For the sake of a contradiction, assume the opposite: there is a π such that each σ ∈ Σ′ satisfies P_v^{σ,π}(\mathcal{R}_T) < val_G(v). We denote by Σ^π the set of strategies σ ∈ Σ satisfying, as on the left-hand side of (5.4), that P_v^{σ,π}(\mathcal{R}_T) ≥ val_G(v). Clearly, Σ^π ∩ Σ′ = ∅. It means that any σ ∈ Σ^π must use some removed edge u ↦ t after some history wu that has a non-zero measure (P_v^{σ,π}(Run(G(σ, π), wu)) > 0) and that does not visit T. We modify the strategy π such that after all such histories wut of σ ∈ Σ^π it starts behaving like an optimal minimizing strategy, and we call the result π′. We see that there cannot be any σ ∈ Σ with P_v^{σ,π′}(\mathcal{R}_T) ≥ val(v), contradicting the left-hand side of (5.3) for G.

From (5.4) and from the fact that the value cannot increase by restricting the set of strategies of the player □, we have that val_{G′} = val_G.

Due to Lemma 5.2, for every vertex v we have a strategy σ_v and a natural number n_v such that P_v^{σ_v,π}(\mathcal{R}_T^{n_v}) > val(v)/2 for every π ∈ Π. We denote by A(n) the set of vertices reachable from the vertex v in exactly n steps. For every i ∈ ℕ_0 we inductively define the maximal number of steps m_i that are necessary for performing the first i iterations: m_0 = 0 and m_{k+1} = m_k + max{ n_v | v ∈ A(m_k) }. Using m_i, we define the strategy σ*. Let w be any history, let i ∈ ℕ_0 be the number such that m_i < |w| ≤ m_{i+1}, and let w = xty with |x| = m_i, t ∈ V, and y ∈ V*. Then, σ*(w) = σ_t(y).

Finally, we prove for v by induction on i that for any strategy π it holds that
\[
P_v^{\sigma^*,\pi}(\mathcal{R}_T^{m_i}) > \Bigl(1 - \frac{1}{2^i}\Bigr)\cdot \mathrm{val}(v) \tag{5.5}
\]


As the base of the induction, for i = 0 the right-hand side is 0 and the claim trivially holds. We denote by B(n) the set of finite paths w such that |w| = n and w does not visit T. For i = k + 1, we have:
\[
P_v^{\sigma^*,\pi}(\mathcal{R}_T^{m_{k+1}})
\ge P_v^{\sigma^*,\pi}(\mathcal{R}_T^{m_k}) + \sum_{wu\in B(m_k+1)} \frac{\mathrm{val}(u)}{2}\cdot P_v^{\sigma^*,\pi}(\mathrm{Run}(G(\sigma^*,\pi), wu)) \tag{5.6}
\]
\[
\ge P_v^{\sigma^*,\pi}(\mathcal{R}_T^{m_k}) + \frac{\mathrm{val}(v)}{2}\cdot \sum_{wu\in B(m_k+1)} P_v^{\sigma^*,\pi}(\mathrm{Run}(G(\sigma^*,\pi), wu)) \tag{5.7}
\]
\[
= P_v^{\sigma^*,\pi}(\mathcal{R}_T^{m_k}) + \frac{\mathrm{val}(v)}{2}\cdot\bigl(1 - P_v^{\sigma^*,\pi}(\mathcal{R}_T^{m_k})\bigr) \tag{5.8}
\]
\[
= P_v^{\sigma^*,\pi}(\mathcal{R}_T^{m_k})\cdot\Bigl(1 - \frac{\mathrm{val}(v)}{2}\Bigr) + \frac{\mathrm{val}(v)}{2} \tag{5.9}
\]
\[
> \Bigl(1 - \frac{1}{2^k}\Bigr)\cdot \mathrm{val}(v)\cdot\Bigl(1 - \frac{\mathrm{val}(v)}{2}\Bigr) + \frac{\mathrm{val}(v)}{2} \tag{5.10}
\]
\[
\ge \Bigl(1 - \frac{1}{2^{k+1}}\Bigr)\cdot \mathrm{val}(v) \tag{5.11}
\]
In (5.6), we split the run into the first k iterations and the (k+1)-th iteration; notice that a finite path after n steps has length n + 1, and that during the (k+1)-th iteration the strategy σ* guarantees reaching T from u with probability greater than val(u)/2. Next, (5.7) holds because the strategy σ* in a vertex always chooses a successor with value equal to the value of that vertex; hence, the weighted sum of the values of the vertices we can reach after an iteration that starts at a vertex v′ (weighted by the probabilities of the paths leading to them) is greater than or equal to val(v′). Furthermore, in (5.8) we use the fact that the set of all runs not visiting T during the first m_k steps is the complement of the set of runs that visit T within m_k steps, (5.10) holds by the induction hypothesis, and (5.11) is easy to check.

By taking i in (5.5) to the limit, we get the desired result that for all π ∈ Π it holds that P_v^{σ*,π}(\mathcal{R}_T) ≥ val(v).

5.2 Büchi games

To satisfy the Büchi winning objective, the player □ has to visit some vertex from the target set T infinitely many times. We will not provide the proof of the determinacy because Büchi games are not the core of this text. We only show the differences from the finite Büchi games.

The main difference is that the player □ does not necessarily have an optimal strategy. We cannot build an optimal strategy σ* as in the finite case because there might be no optimal strategies for the reachability objectives. In other words, we cannot guarantee the optimal probability of visiting the target set infinitely many times if we cannot guarantee the optimal probability even for the first visit. Furthermore, we show that the player □ might not have a memoryless ε-optimal strategy in general. A game where no memoryless ε-optimal strategy exists is shown in Figure 5.2.


Figure 5.2: A game with no memoryless ε-optimal strategy for the player □. The set of target vertices is T = {S}; the vertex S is also the initial vertex. Let 1 > ε > 0 be a fixed number; the player tries to build a memoryless ε-optimal strategy σ. Notice that val(S) = 1. We show that randomization does not help. Assume n is the first vertex such that σ(n)(R_n) = μ > 0. If there is no such vertex, σ never reaches the vertex S again. There is a probability μ · 1/2^n > 0 of reaching the vertex U from the vertex n. Assume the player reaches the vertex S infinitely many times; then the player also visits the vertex n infinitely many times. Hence, infinitely many times there is a fixed non-zero probability of reaching U, and the player loses with probability 1.


Chapter 6 Infinitely-branching games

The widest class of games we consider in this text are the games with a countably infinite state space and unlimited branching: any vertex can have up to countably many successors. As in the previous chapter, we mainly talk about the reachability 2½-player games. At the end, we also mention the Büchi winning objective.

6.1 Reachability games

The first question we deal with is the type of determinacy. Are the infinitely-branching games (weakly) determined? Furthermore, do the players have optimal strategies? In Chapter 4 we observed that for finite games both players have optimal strategies, whereas in Chapter 5 we found out that the player □ does not have an optimal strategy in general. Finally, Example 6.1 shows that in the case of infinitely-branching games, neither of the players has an optimal strategy in general.

Example 6.1 (A game with no optimal strategy for the player ♦). There is a close similarity between this example and Example 5.1. The main idea of both examples is that there are countably many vertices the player can move to, and the probabilities of reaching T from these vertices form a sequence converging to 1 or to 0 for the player □ or the player ♦, respectively. Then, there is no optimal choice for the player. In the situation of finitely-branching games this choice was built from an infinite sequence of vertices, dividing the decision into multiple steps. However, this construction allowed one more decision, namely not to choose at all, which was optimal for the player ♦. On the other hand, in the situation of infinitely-branching games we can build such an infinite choice in a single step, as shown in Figure 6.1. Then, the player ♦ must choose one of the options and therefore has no optimal strategy.

For this section, we fix an infinitely-branching game G = (V, ↦, (V_□, V_♦, V_◯), Prob). Even though there are no optimal strategies, we still need to show the weak determinacy, i.e. sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{R}_T) = inf_{π∈Π} sup_{σ∈Σ} P_v^{σ,π}(\mathcal{R}_T). To simplify the arguments in the proof, we adopt the following assumption for this section.

Assumption 6.1. For an infinitely-branching game G with a reachability objective \mathcal{R}_T we assume that once a run enters the set T, it can never leave it, i.e. for all v ∈ T, succ(v) ⊆ T.


Figure 6.1: A game with no optimal strategy for the player ♦. There is one target state T; the player starts in the vertex S and tries to avoid reaching T. From the random vertex R_k (denoted by a dotted circle), the probability of reaching T is 1/2^k. Probabilities on transitions from the random vertices are uniformly distributed and therefore omitted in the figure. There is only one trivial strategy σ in this example. Therefore, we have inf_{π∈Π} P_S^{σ,π}(\mathcal{R}_T) = 0, but for no strategy π it holds that P_S^{σ,π}(\mathcal{R}_T) = 0.


Any game G can easily be altered to satisfy this assumption by removing all transitions leading out of T. If a vertex is left with no outgoing transition, we add a self-loop. Notice that no probabilities regarding reachability get changed. We start with an infinitely-branching analogy of the operator L:

Definition 6.1. For an infinitely-branching game G we define an operator L^∞ : [0,1]^V → [0,1]^V where L^∞(x) = x′ such that
\[
x'_v = 1 \quad \text{if } v \in T,
\]
\[
x'_v = \sum_{v\stackrel{p}{\to}v'} p\cdot x_{v'} \quad \text{if } v \in V_\bigcirc \setminus T,
\]
\[
x'_v = \sup_{v\mapsto v'} x_{v'} \quad \text{if } v \in V_\Box \setminus T,
\]
\[
x'_v = \inf_{v\mapsto v'} x_{v'} \quad \text{if } v \in V_\Diamond \setminus T
\]

The set [0,1]^V with the component-wise partial order is clearly a complete lattice and the operator L^∞ is monotone. Therefore, by the Knaster–Tarski theorem, the least fixed point of the operator L^∞ exists; we denote it by I. In the sequel, we show that I(v) is the value of the game in the vertex v. Notice that we cannot define the vector I in the same way as the vector V for the finitely-branching games: for the infinitely-branching games, the value does not equal the limit of the n-step values in general. An example of such a game is shown in Figure 6.2.
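For reference, the Knaster–Tarski characterization invoked above can be written explicitly; this is a standard fact about monotone operators on complete lattices, not something proved in this thesis:
\[
I \;=\; \bigsqcap\,\{\, x \in [0,1]^V \mid L^\infty(x) \sqsubseteq x \,\},
\]
i.e. the least fixed point is the component-wise infimum of all pre-fixed points of L^∞.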


Figure 6.2: A game where the limit of the n-step values does not equal the value. The value in S equals 1 because in each branch we certainly reach a target vertex. But for each n, the n-step value in S equals 0 in this game: for each n, there is a strategy π_n that chooses R_n in the vertex S. This strategy then avoids reaching the target set in the first n steps; the target T_n is reached only in the step n + 1.


We show that I(v) is the value of the game in the vertex v, i.e. that

\[
I(v) = \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) = \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T) \tag{6.1}
\]
We start with one auxiliary lemma.

Lemma 6.2. The vector (sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{R}_T))_{v∈V} is a fixed point of the operator L^∞.

Proof. We denote the vector (sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{R}_T))_{v∈V} by μ. The proof is divided into three parts by the type of the vertex v. We need to show that:

\[
L^\infty(\mu)_v = \mu_v \tag{6.2}
\]
that is:
\[
\forall v\in V_\bigcirc:\ \sum_{v\stackrel{p}{\to}v'} p\cdot \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_{v'}^{\sigma,\pi}(\mathcal{R}_T) = \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \tag{6.3}
\]
\[
\forall v\in V_\Box:\ \sup_{v\mapsto v'}\, \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_{v'}^{\sigma,\pi}(\mathcal{R}_T) = \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \tag{6.4}
\]
\[
\forall v\in V_\Diamond:\ \inf_{v\mapsto v'}\, \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_{v'}^{\sigma,\pi}(\mathcal{R}_T) = \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \tag{6.5}
\]


We begin with the random vertices, i.e. with the equality (6.3). The proof is immediate because Σ_{v →p v′} p · sup_{σ∈Σ} inf_{π∈Π} P_{v′}^{σ,π}(\mathcal{R}_T) = sup_{σ∈Σ} inf_{π∈Π} Σ_{v →p v′} p · P_{v′}^{σ,π}(\mathcal{R}_T), and for each σ and π it holds that Σ_{v →p v′} p · P_{v′}^{σ,π}(\mathcal{R}_T) = P_v^{σ,π}(\mathcal{R}_T).

If v ∈ V_□, let us assume that in (6.4) the left-hand side is strictly greater than the right-hand side. Because the supremum is strictly greater, there must be some vertex u ∈ succ(v) for which it is strictly greater:

\[
\text{there exists } u \in \mathrm{succ}(v) \text{ such that } \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_u^{\sigma,\pi}(\mathcal{R}_T) > \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \tag{6.6}
\]

Yet, for each strategy σ there is a strategy σ′ that in v chooses u and then simulates σ. Similarly, for each strategy π there is a strategy π′ that from u on simulates π. Then, P_u^{σ,π}(\mathcal{R}_T) = P_v^{σ′,π′}(\mathcal{R}_T). Hence, we have sup_{σ∈Σ} inf_{π∈Π} P_u^{σ,π}(\mathcal{R}_T) ≤ sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{R}_T), a contradiction with (6.6).

Now assume that the left-hand side is strictly less than the right-hand side in (6.4). Then, there is some ε ∈ ℝ_{>0} such that:
\[
\text{for each } u \in \mathrm{succ}(v) \text{ it holds that } \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_u^{\sigma,\pi}(\mathcal{R}_T) + \varepsilon < \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \tag{6.7}
\]

Clearly, for any strategy σ on the right-hand side, there is a strategy π (inspired by the minimizing strategies on the left-hand side) such that

\[
P_v^{\sigma,\pi}(\mathcal{R}_T) + \frac{\varepsilon}{2} < \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \tag{6.8}
\]

which leads to the contradiction sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{R}_T) + ε/2 < sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{R}_T). For v ∈ V_♦, the arguments are dual to the arguments for the situation of v ∈ V_□.

Another step in the proof of the weak determinacy is the existence of ε-optimal strategies for the player ♦. We build the ε-optimal strategy in the following way. It is a strategy that in a vertex v chooses a successor v′ with I(v′) close to inf_{v↦u} I(u). Furthermore, we require this strategy to be twice as precise in each following step as in the previous one, so that the total error over an infinite run converges to ε. For this proof, we need to define another subset of the runs that reach T:

Definition 6.2. Let v ∈ V, T ⊆ V, and let k, i be natural numbers such that 0 ≤ i ≤ k. By \mathcal{R}_T^{k,i,v} we denote the set of runs ω that reach some vertex from T in at most k steps and whose i-th vertex is the vertex v. Recall that the vertices in the induced Markov chain are in fact histories of G, so that \mathcal{R}_T^{k,i,v} = { s_0 s_1 s_2 … ∈ (V^⊕)^∞ | last(s_i) = v ∧ ∃x ≤ k : last(s_x) ∈ T }.

Lemma 6.3. For each ε > 0, there is a strategy π such that for each v ∈ V it holds that

\[
\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T) \le I(v) + \varepsilon
\]


Proof. Let ε be arbitrary but fixed. Let π be any strategy that for each w ∈ V* and v ∈ V satisfies π(wv) = v′ such that I(v′) ≤ inf_{v↦u} I(u) + ε/2^i, where i = |wv|. For better legibility, we write π(i, v) for any π(wv) with i = |wv|.

For any fixed k, any vertices v, u and any strategy σ we prove by induction on i, 0 ≤ i ≤ k, that
\[
P_u^{\sigma,\pi}(\mathcal{R}_T^{k,i,v}) \le I(v) + \sum_{j=i+1}^{k} \frac{\varepsilon}{2^j} \tag{6.9}
\]
From this we get the lemma, since for each k and σ it holds that
\[
P_v^{\sigma,\pi}(\mathcal{R}_T^{k}) = P_v^{\sigma,\pi}(\mathcal{R}_T^{k,0,v}) \le I(v) + \sum_{j=1}^{k} \frac{\varepsilon}{2^j} \le I(v) + \varepsilon \tag{6.10}
\]
We start the induction with i = k. Since there are no further steps left and we know that we are in the vertex v, P_u^{σ,π}(\mathcal{R}_T^{k,k,v}) = 1 if v ∈ T and 0 otherwise; notice that we used Assumption 6.1 that the set T cannot be left after entering it. So clearly, P_u^{σ,π}(\mathcal{R}_T^{k,k,v}) ≤ I(v) + 0.

Assuming (6.9) for i = n + 1, we show it for i = n ≥ 0. For π(n + 1, v) = v′:
\[
v\in V_\Diamond:\quad P_u^{\sigma,\pi}(\mathcal{R}_T^{k,n,v}) = P_u^{\sigma,\pi}(\mathcal{R}_T^{k,n+1,v'}) \le I(v') + \sum_{j=n+2}^{k}\frac{\varepsilon}{2^j} \le I(v) + \frac{\varepsilon}{2^{n+1}} + \sum_{j=n+2}^{k}\frac{\varepsilon}{2^j} = I(v) + \sum_{j=n+1}^{k}\frac{\varepsilon}{2^j}
\]
\[
v\in V_\Box:\quad P_u^{\sigma,\pi}(\mathcal{R}_T^{k,n,v}) \le \sup_{v\mapsto v'} P_u^{\sigma,\pi}(\mathcal{R}_T^{k,n+1,v'}) \le \sup_{v\mapsto v'}\Bigl(I(v') + \sum_{j=n+2}^{k}\frac{\varepsilon}{2^j}\Bigr) = I(v) + \sum_{j=n+2}^{k}\frac{\varepsilon}{2^j} \le I(v) + \sum_{j=n+1}^{k}\frac{\varepsilon}{2^j}
\]
\[
v\in V_\bigcirc:\quad P_u^{\sigma,\pi}(\mathcal{R}_T^{k,n,v}) = \sum_{v\stackrel{p}{\to}v'} p\cdot P_u^{\sigma,\pi}(\mathcal{R}_T^{k,n+1,v'}) \le \sum_{v\stackrel{p}{\to}v'} p\cdot\Bigl(I(v') + \sum_{j=n+2}^{k}\frac{\varepsilon}{2^j}\Bigr) = I(v) + \sum_{j=n+2}^{k}\frac{\varepsilon}{2^j} \le I(v) + \sum_{j=n+1}^{k}\frac{\varepsilon}{2^j}
\]
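The strategy π constructed in this proof has a simple operational reading: at the i-th step it picks any successor whose I-value is within ε/2^i of the infimum. A minimal illustrative sketch (in an infinitely-branching game the infimum need not be attained, so the finite successor lists used here are only a stand-in):

def eps_optimal_choice(v, history_len, I, succ, eps):
    """Pick v' with I(v') <= (inf over successors of I) + eps / 2**history_len."""
    tol = eps / 2 ** history_len
    inf_val = min(I[u] for u in succ[v])    # illustrative: finite succ lists
    for u in succ[v]:
        if I[u] <= inf_val + tol:
            return u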

Now, we prove the weak determinacy of infinitely-branching games:

Theorem 6.4. For each v ∈ V it holds that
\[
I(v) = \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) = \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T)
\]


Figure 6.3: A game with no memoryless optimal strategy for the player ♦. The set of target vertices is T = {T_1, T_2, ...}; the vertex S is the initial vertex. The player ♦ can visit the vertices T_1, T_2, ... one after another, never visiting any target vertex twice; hence, val(S) = 0. Let π ∈ Π be memoryless; then there must be some k such that π(S)(T_k) > 0. The probability of visiting the vertex T_k infinitely many times is then 1, and the strategy π is not optimal.


Proof. We will prove the theorem by showing that

\[
I(v) \le \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} P_v^{\sigma,\pi}(\mathcal{R}_T) \le \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T) \le I(v) \tag{6.11}
\]

The first inequality follows from Lemma 6.2: sup_{σ∈Σ} inf_{π∈Π} P_v^{σ,π}(\mathcal{R}_T), as a component of a fixed point of L^∞, is certainly greater than or equal to the same component of the least fixed point, the vector I. The second inequality results from Lemma 4.1, and the last inequality from Lemma 6.3, because it holds that

\[
\inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi}(\mathcal{R}_T)
\;\le\; \inf_{\varepsilon\in\mathbb{R}_{>0}}\,\sup_{\sigma\in\Sigma} P_v^{\sigma,\pi_\varepsilon}(\mathcal{R}_T)
\;\le\; I(v)
\]
where π_ε denotes the strategy from Lemma 6.3.

6.2 Büchi games

In the Büchi games, the player □ tries to visit some vertex from the target set T infinitely many times. Because there are, in general, no optimal strategies in infinitely-branching reachability games, both players might lack optimal strategies in Büchi games as well. We mention only one further difference from the finitely-branching games. We have already seen that the player ♦ might have no optimal strategy. Furthermore, even if the player ♦ has an optimal strategy, there might not exist a memoryless optimal strategy. An example of a game with no memoryless optimal strategy is shown in Figure 6.3.

Chapter 7 Summary

In this text, we gave a uniform overview of the known results for the following questions for several types of games:

• Are the games determined?
• Are there optimal strategies for both players?
• Are there MD optimal strategies for both players?
• Is it possible to compute the optimal strategies or the value of a game in some efficient way?

We mainly discussed the reachability objective and showed the following statements: The reachability games are determined for the finite games and for the infinite games with both finite and infinite branching. In the finite reachability games, both players have MD optimal strategies. The MD optimal strategies also exist in the 1½-player games of the player □ or of the player ♦.

In the case of the infinite games with finite branching, only the player ♦ is guaranteed to have an optimal strategy, in fact an MD optimal one. If the player ♦ has no strictly optimal strategy, the player □ has an optimal strategy. In the games with infinite branching, neither of the players has an optimal strategy in general.

The safety games are dual to the reachability games, with the roles of the players reversed. As regards the existence of the optimal strategies in the finite Büchi games, the situation is the same as in the case of the reachability games.

There are subtler questions in the scope of this thesis that remain unsolved, such as: If there is an optimal strategy of the player □ in the finitely-branching games, is there an MD optimal strategy? This overview could also be further extended to different types of winning objectives, such as the parity objective or objectives specified by some type of linear-time or branching-time logic.

Bibliography

[1] P. A. Abdulla, N. B. Henda, L. de Alfaro, R. Mayr, and S. Sandberg. Stochastic games with lossy channels. Lecture Notes in Computer Science, 4962:35, 2008.

[2] L. de Alfaro. Formal verification of probabilistic systems. Ph.D. Thesis, Stanford University, 1998.

[3] T. Brázdil, V. Brožek, V. Forejt, and A. Kučera. Reachability in recursive Markov decision processes. Information and Computation, 206(5):520–537, 2008.

[4] T. Brázdil, V. Brožek, A. Kučera, and J. Obdržálek. Qualitative reachability in stochastic BPA games. In STACS 2009, 26th International Symposium on Theoretical Aspects of Computer Science, 2009.

[5] V. Brožek. Decidability and complexity of infinite-state stochastic games. Master's Thesis, Faculty of Informatics, Masaryk University, 2007.

[6] V. Brožek. Basic model checking problems for stochastic games. Ph.D. Thesis, Faculty of Informatics, Masaryk University, 2009.

[7] K. Chatterjee and T. A. Henzinger. Quantitative stochastic parity games. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 121–130. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2004.

[8] Anne Condon. The complexity of stochastic games. Information and Computation, 96:203–224, 1992.

[9] K. Etessami and M. Yannakakis. Recursive Markov decision processes and recursive stochastic games. In Proceedings of ICALP, volume 3580, pages 891–903. Springer, 2005.

[10] K. Etessami and M. Yannakakis. Efficient qualitative analysis of classes of recursive Markov decision processes and simple stochastic games. Lecture Notes in Computer Science, 3884:634–645, 2006.

[11] A. Maitra and W. Sudderth. Finitely additive stochastic games with Borel measurable payoffs. International Journal of Game Theory, 27(2):257–267, 1998.

[12] Donald A. Martin. Borel determinacy. Annals of Mathematics, 102:363–371, 1975.


[13] Donald A. Martin. The determinacy of Blackwell games. Journal of Symbolic Logic, 63(4):1565–1581, 1998.

[14] L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100, 1953.
