Determinacy and Optimal Strategies in Stochastic Games

MASARYK UNIVERSITY FACULTY}w¡¢£¤¥¦§¨ OF I !"#$%&'()+,-./012345<yA|NFORMATICS Determinacy and Optimal Strategies in Stochastic Games MASTER’S THESIS Jan Krˇcál Brno, 2009 Declaration I declare that this thesis is my own work and has not been submitted in any form for another degree or diploma at any university or other institution of tertiary education. Infor- mation derived from the published or unpublished work of others has been acknowledged in the text and a list of references is given. Advisor: prof. RNDr. Anton´ın Kucera,ˇ Ph.D. ii Acknowledgement I would like to thank my advisor – Anton´ın Kuceraˇ for offering me such a great topic, for his patience with my repetitive questions, for valuable comments, and for the care he invested in the frequent seminars that helped me with my writing. Another person who helped me significantly with my thesis is Vaclav´ Brozek;ˇ I want to thank him for many fruitful discussions helping me to find mistakes as well as better solutions. Finally, I would like to thank my friends and family for encouraging me, for their un- derstanding, and for allowing me to concentrate on my work. iii Abstract We deal with the determinacy of stochastic turn-based perfect-information games with reachability, safety, and Buchi¨ winning objectives. We separately discuss the situation for finite-state games, finitely-branching infinite-state games and infinite-state games with un- limited branching. Even though all these games are determined thanks to the general result of Martin [13], we provide simpler proofs for these specific cases that also allow us to reason about the existence of the optimal strategies. iv Keywords stochastic games, Markov decision processes, determinacy, optimal strategies, memory- less deterministic strategies, reachability, safety, Buchi¨ objectives, infinite state space, finite branching v Contents 1 Introduction ........................................3 2 Definitions .........................................5 2.1 Basics .........................................5 2.2 Markov chain and its probability space .....................5 2.3 Games ........................................6 2.4 Winning objectives .................................8 2.5 Determinacy and values ..............................8 1 3 Finite 1 2 player games .................................. 11 3.1 Reachability games ................................. 11 3.2 Safety games ..................................... 20 1 4 Finite 2 2 player games .................................. 23 4.1 Reachability games ................................. 23 4.2 Safety games ..................................... 32 4.3 Buchi¨ games ..................................... 33 5 Finitely-branching games ................................ 37 5.1 Reachability games ................................. 38 5.2 Buchi¨ games ..................................... 41 6 Infinitely-branching games ............................... 43 6.1 Reachability games ................................. 43 6.2 Buchi¨ games ..................................... 48 7 Summary .......................................... 49 1 Chapter 1 Introduction The formalism of stochastic games is an important tool in the area of formal verification. It allows us to describe models with non-determinism, randomness, and a malicious opponent. These three forces move in turns a token along transitions between vertices in a transition system. The path of the token through the system determines the winner. It is either the first player, expressing our non-determinism or the second player playing as our opponent. The decisions of both players are resolved by specifying their strategies. The main concern of this thesis is the probability that the first player wins. Since there are two players on whose decisions the probability depends, we talk about the highest probability that the first player can achieve against any strategy of the opponent. And similarly, we also talk about the lowest probability that the second player can achieve against any strategy of the first player. If these two quantities equal, we call them the value of the game and say that the game is determined. An optimal strategy of the first player is a strategy that guarantees his winning with a probability greater or equal to the value. Likewise, an optimal strategy of the second player ensures a probability of loosing lower or equal to the value. The way to determine the winner is called a winning objective. It is a set of paths through the transition system that we define as winning for the first player. The games are zero-sum, the set of the paths winning for the second player is a complement to the set of the paths winning for the first player. In this text, we consider three types of winning objectives. In a game with a reachability winning objective, the first player tries to reach any vertex from a specified set of target vertices. In a game with a safety winning objective, the first player must never leave a specified set of safe vertices. Finally, in a game with a Buchi¨ winning objective, there is also a set of target vertices, and the first player tries to visit some target vertex infinitely many times. In this text, we go through several types of games and give a uniform overview an- swering the following question: Are the games of this type determined? Are there optimal strategies for both players? Is it possible to compute the optimal strategies or the value of a game in some efficient way? The types of games covered are as follows: the games on a finite transition system, the games on an infinite transition system with finite branching, and the games with infinite branching. We also distinguish the games with only one player (i.e. with no adversary) from the games with two players. And finally, we separately treat the three winning objectives mentioned earlier. 2 1. INTRODUCTION The question of determinacy has been answered by a general result of Martin [13]. It applies to all the types of games discussed here, therefore the results in this text are not new. And yet, we provide specific proofs for these types of games that may give more insight into the area. It has to be mentioned that the results we present are in most cases standard, the contribution of our text is mainly the rigorous uniform treatment. 3 Chapter 2 Definitions 2.1 Basics ≥0 By N, N0, Q, R, R , we denote the sets of natural numbers, natural numbers with zero, rational numbers, real numbers and non-negative real numbers, respectively. For every finite or countably infinite set A, the set of all finite words over A is denoted by A∗; the symbol A+ represents the set of all non-empty finite words over A. The length of a word w is written as w , and the letters in w are denoted by w(0); w(1); : : : ; w( w 1); j j j j − the last letter (assuming w > 0) is denoted by last(w). For two words u; w A∗, their j j 2 concatenation is written as uw. Concerning the basics of the probability theory, for a finite or countably infinite set A, ≥0 P a probability distribution on A is a function f : A R such that f(a) = 1.A ! a2A distribution is called positive if f(a) > 0 for each a A and Dirac if f(a) = 1 for some 2 a A. The set of all distributions on A is denoted by (A). 2 D A σ-algebra over a set X is a set 2X that includes X and is closed under comple- F ⊆ ments and countable unions. A measurable space is a pair (X; ) where is the set of all F F measurable subsets of X.A measure on space (X; ) is a function µ : R with the fol- F F! lowing properties: (1) µ( ) = 0, (2) for every E , µ(E) 0, and (3) for every countable ; 2 F ≥ P S collection Ai i2I of pairwise disjoint subsets of it holds µ(Ai) = µ( Ai).A f g F i2I i2I measure µ is a probability measure if µ(X) = 1.A probability space is a triple (X; ;P ), F where (X; ) is a measurable space and P is a probability measure on (X; ). The set X is F F called a sample space. 2.2 Markov chain and its probability space A transition system is a pair = (S; ) where S is a set of states and S S is S ! ! ⊆ × a transition relation such that each state s S has some outgoing transition s t. For 2 ! a state s, we denote by succ(s) the set of successor states s0 s s0 . By a notation v v0 f j 7! g 8 7! it holds ', we mean v0 succ(v) it holds '. 8 2 Definition 2.1. An infinite path (also called run) in is a countable sequence of states S w = s0s1s2 ::: such that for each i N0, si si+1.A finite path (also called history) is 2 ! a word w S+ such that for all 0 i < w 1, we have that w(i) w(i + 1). The symbol 2 ≤ j j − ! 4 2. DEFINITIONS S⊕ S+ denotes the set of all finite paths in , and the symbol S1 denotes the set of all ⊆ S infinite paths in . We say that a state m is reachable from a state n if there is a finite path S of the form mwn. Later we define a game as a transition system where some transitions are controlled by the players. For a fixed pair of strategies of the two players, we can convert a game into a Markov chain, which is a transition system where each transition is assigned a fixed positive probability. A Markov chain allows us to reason about probabilities of events such as A = the run begins with a finite path w or B = the run visits sometime a specified state m. Definition 2.2. A Markov chain is a triple = (M; ; P rob) where (M; ) is a transi- M ! ! tion system and P rob is a function that assigns to each m M a positive distribution on p 2 succ(m).

Load more