Imperial College London Department of Computing

Solving Parity Games Through Fictitious Play

Huaxin Wang

April 2013

Supervised by Michael Huth

Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy in Computing of Imperial College London and the Diploma of Imperial College London.

I hereby certify that I am the sole author of this thesis and that, to the best of my knowledge, my thesis does not infringe upon anyone's copyright nor violate any proprietary rights, and that any ideas, techniques, quotations, or any other material from the work of other people included in my thesis, published or otherwise, are fully acknowledged in accordance with the standard referencing practices.

The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.

Abstract

The thesis aims to find an efficient algorithm for solving parity games. Parity games are graph-based, 0-sum, 2-person games with infinite plays. It is known that these games are determined: all nodes in these games are won by exactly one player. Solving parity games is equivalent to the model checking problem of the modal mu-calculus; an efficient solution has important implications for program verification and controller synthesis.

Although the decision problem of which player wins a given node is generally believed to be in PTIME, all known algorithms so far have been shown to run in (sub)exponential time. The design of existing algorithms either derives from the determinacy proof of parity games or takes a purely graph-theoretical perspective, using certain rank functions to iteratively search for an optimal solution. Since parity games are 2-person, 0-sum games, in this thesis I borrow ideas from game theory and investigate the viability of using fictitious play to solve them. Fictitious play is a method where two players choose strategies in strict alternation, and where these choices are "best responses" against the last k strategies (so-called bounded recall length) or against all strategies (unbounded recall length) chosen by the other player so far.

I use this method to design an algorithm that can solve parity games and study its theoretical and experimental properties. For example, I prove that the basic algorithm solves fully connected games in polynomial time, through a number of iterations that is bounded by a small constant. Although the proof is not extended to the general case in the thesis, the basic algorithm performs demonstrably well against existing solvers in experiments over a large number and variety of games. In particular, the empirically observed number of iterations that our basic algorithm requires appears to increase polynomially with game size for all the games tested. Furthermore, the algorithm is conjectured to have a runtime complexity bounded by O(n⁴ log²(n)), and I provide a discussion of graphs and their empirically observed properties that motivates this conjecture.

One caveat of fictitious play with bounded recall length is that the algorithm may fail to converge to the optimal solution due to the presence of non-optimal strategy cycles of length greater than 2. In this thesis, I observe that in practice such cases account for less than 0.01% of the games tested. Different cycle resolution methods are explored in the thesis to address this. One particular method combines our basic algorithm with the discrete strategy improvement solver such that the resulting algorithm is guaranteed to terminate with the optimal solution, while still sharing the runtime performance of fictitious play.

Contents

1 Introduction
  1.1 Model Checking and Parity Games
  1.2 Motivation and Objectives
  1.3 Contributions
  1.4 Structure of This Thesis

2 Background to Parity Games
  2.1 Parity Games
    2.1.1 How a Parity Game is Played
  2.2 Known Complexity Results and Some Solvers
    2.2.1 Zielonka's Algorithm
    2.2.2 Discrete Strategy Improvement

3 Background to Game Theory
  3.1 Equilibria
  3.2 Pure vs. Mixed Strategies
  3.3 Forms of Games
    3.3.1 Normal Form
    3.3.2 Extensive Form
  3.4 Normal Form and Fictitious Play
    3.4.1 The Fictitious Play method FP*
    3.4.2 Total vs. Fixed Length Recalls
  3.5 Extensive Form and …

4 Experimental Infrastructure
  4.1 Tools, Softwares and Environment
  4.2 Test Data
    4.2.1 Potential Bias in Random Games
  4.3 A Query Language for Parity Games
    4.3.1 Algebra of preprocessors
  4.4 The workbench as a webservice
    4.4.1 Software architecture
    4.4.2 Data model
    4.4.3 User session
    4.4.4 Parser and query optimization
    4.4.5 Using the workbench

5 FP Algorithms for Parity Games
  5.1 FP* for Parity Games
    5.1.1 Translation into 0-sum Games
    5.1.2 0-sum game on average payoff
  5.2 FP* shadowed by FP1
    5.2.1 Rate of Convergence
  5.3 FP1 and Efficient Best Replies
    5.3.1 0-sum on Payoff Vectors
    5.3.2 Selection Criteria for BR
  5.4 Implementation
    5.4.1 Attracting to Self Reachable Dominant Priorities
    5.4.2 Degeneracy and Valuation Refinement
    5.4.3 Payoff Interval Augmented Valuation
  5.5 Experiments

6 Strategy Graph Implications on FP1
  6.1 Strategy Graph
    6.1.1 Levels of Optimality
  6.2 Depth of Strategy Graph
    6.2.1 Fully Connected Games
    6.2.2 Non Fully Connected Games

7 Cycle Resolution with FP1
  7.1 Cycle Detection
  7.2 Randomization
  7.3 Static Analysis
    7.3.1 Strategy enforcement on local convergence
    7.3.2 Prune strictly dominated strategies
  7.4 Combining best reply and strategy improvement
    7.4.1 Context switching on interval update
  7.5 Experiment and Findings
    7.5.1 Context switching on detection of cycles

8 Conclusion and Future Work
  8.1 Conclusions
    8.1.1 The Workbench
    8.1.2 Design of New Algorithms and Key Results
  8.2 Future Work

Bibliography

List of Figures

1.1 Expressiveness of LTL, CTL, CTL* and modal mu-calculus; a good discussion of the relationship of LTL, CTL and CTL* can be found in the paper [11] by E.M. Clarke and A. Draghicescu.

2.1 In this sample game, vertices are identified by v0 through v5. Circles are player 0's vertices, boxes are player 1's. The priority of each vertex is shown in the square brackets. Winning region and winning strategy of player 0 are highlighted in green; winning region and winning strategy of player 1 are highlighted in red.
2.2 Zielonka's algorithm
2.3 Illustration of pre-order over player 0 strategies.
2.4 Generic Strategy Improvement Algorithm [26]
2.5 Discrete Strategy Improvement Algorithm [26]
2.6 Subvaluation procedure [26]

3.1 Games presented in normal form
3.2 Game Examples in Extensive Form
3.3 The method of Fictitious Play for game matrix M
3.4 Strategy graphs of FP1 for Example 3.1.2 and Example 3.2.3
3.5 Strategy graphs of FP2 for Example 3.2.3; equilibrium strategies of player 0 are coloured in green, no equilibrium strategy of player 1 is detected
3.6 Strategy graphs of FP4 for Example 3.2.3; equilibrium strategies of player 0 are coloured in green, and equilibrium strategies of player 1 are coloured in red.
3.7 FP modified for termination with bounded recall lengths
3.8 A game of pure equilibrium but with non-converging cycles when running in FP1
3.9 Search path of backward induction working with the extensive form of the game in Example 2.1 starting from vertex V0

4.1 Comparing 'hardness' of games with 1024 vertices on different out-degree using the fictitious play algorithm.
4.2 An 8-vertices parity game with 9 arcs, and its solution. All vertices are won by player 1, and so the solution consists only of her strategy, indicated by boldface arcs.
4.3 Example query patterns, instantiable with preprocessors p and q.
4.4 Subset (relevant for this thesis) of query language implemented in our workbench.
4.5 Overall architecture of the query engine for our workbench, and the typical sequence of interactions between user and tool.
4.6 A typical user session on the query server: (1) the user enters a query; (2) the database of games is searched for a witness; (3) the interface then displays a game that refutes a universally quantified query or verifies an existentially quantified query (if applicable).
4.7 Specific 8-vertices games found using the query server. Witness (A) is solved by PROBE[0](A2;A3) but not by PROBE[2](A3); witness (B) is solved by A2;A3 but not by A1;A2;A3; witness (C) is solved by A2;A3 or by L(A2;A3) but not by L(L(A2;A3)); witness (D) is resilient to PROBE[3](A3).

5.1 A simple parity game. Winning strategy and winning regions are coloured respectively.
5.2 A 2-person 0-sum game in normal form generated by the game in Example 5.1 at vertex v0.
5.3 Parity game in Example 2.1 converted to normal form using payoff averaged over all vertices.
5.4 FP* shadowed by FP1, where the shadow operates in line 11 and line 14.
5.5 Number of games solved in log scale vs. number of iterations required at which the games are solved in linear scale, using Algorithm 5.4
5.6 Number of games solved in log scale vs. number of iterations required at which the games are solved in linear scale, when using uniform discount (blue), geometric discount (green), and Fibonacci discount (red).
5.7 Trace of fictitious play on game GAME-8-24-7-683
5.8 GAME-8-24-7-683, fictitious play FP* backtracks to previously discarded strategy
5.9 Main body of FP1
5.10 A game with which the naive partial order ≥0 defined over payoff vectors would not work for fictitious play; the query used to extract this game from the workbench is 'ALL G (SOLUTION(G, EXP, X)=SOLUTION(G,FP1,X)) AND (SOLUTION(G, EXP, Y)=SOLUTION(G,FP1,Y))'
5.11 A game for which the second version ⊒0 of the partial order would still fail for fictitious play; the query used to extract this game from the workbench is 'ALL G (SOLUTION(G, EXP, X)=SOLUTION(G,FP1,X)) AND (SOLUTION(G, EXP, Y) = SOLUTION(G,FP1,Y))'
5.12 First attempt of best reply implementation, by attracting to most favourable dominant cycles.
5.13 Game-6-12-5-1176 for which FP1 will not terminate with version 1 of the best reply implementation; the query used to extract this game from the workbench is 'ALL G COUNT(VERTICES(G)) <=6 AND CL(G,FP1)=2'
5.14 Game-10-30-9-2754 for which FP1 does not terminate with (t,p,d) valuation of the best reply implementation for some initial strategies; the query used to extract this game from the workbench is 'ALL G COUNT(VERTICES(G)) <=10 AND CL(G,FP1)=2'
5.15 Game-8-27-7-1694 for which FP1 will not terminate with (t, p, d) valuation of the best reply implementation; the query used to extract this game from the workbench is 'ALL G COUNT(VERTICES(G)) <=8 AND CL(G,FP1)=2'
5.16 Payoff interval at each vertex through iterations of fictitious play with valuation scheme (t, p, d); initial strategy is σ0; each entry in the table shows the lower and upper bound on the payoff interval
5.17 Performance of FP1 with different valuation schemes on 8 vertices random games.
5.18 Frequency of non-optimal strategy cycles of different lengths for FP1 on various data sets when fictitious play is used to solve the games. Each data set is labeled by the number of vertices in the games of the data set. Each data set contains around 200,000 games. The total column sums up the frequency of non-optimal strategy cycles of all lengths; this can also be interpreted as the frequency of games not solved by fictitious play in each data set.
5.19 Performance of FP1, DSI, and Zielonka's algorithm over 5 different data sets. The data set of 4 vertices contains all non-isomorphic games of 4 vertices; the data sets of 8, 32, 1024 vertices contain the standard random games each with over 200,000 instances. The last data set, clique games, contains fully connected games of various sizes.
5.20 Performance of FP1, DSI, and Zielonka's algorithm over the data set of ladder games defined in PGSolver. All games in the data set are solved by FP1.
5.21 Performance of FP1, DSI, and Zielonka's algorithm over the data set of ladder games defined in PGSolver. All games in the data set are solved by FP1.
5.22 Performance of FP1, DSI, and Zielonka's algorithm over the data set of worst case games for the discrete strategy improvement algorithm. All games in the data set are solved by FP1.

6.1 Game-8-27-7-1694 for which FP1 will not terminate with (t, p, d) valuation of the best reply implementation; the query used to extract this game from the workbench is 'ALL G COUNT(VERTICES(G)) <=8 AND CL(G,FP1)=2'
6.2 Strategy graph of Game-8-27-7-1694
6.3 Strategy graph of Game-8-27-7-1694, with unreachable strategies filtered out
6.4 Strategy graph of Game-8-24-7-494; 2 sinks are found in the graph.
6.5 Strategy graph of a juxtaposition of two copies of Game-8-24-7-494 run through the algorithm as a single game; 5 sinks are found in the graph
6.6 Strategy graph of an 8 vertices game, Game-202-1916401639; because the graph is filtered for nodes with in-degree 0, the longest path is 13, starting from an unknown node preceding node 301 in the complete graph.
6.7 Graph of Game-202-1916401639, with strategy graph depth 13; the query used to extract this game from the workbench is 'ALL G HEIGHT(SG(G, FP1)) <=12 AND COUNT(VERTICES(G)) <=8'
6.8 Buchi Automaton accepting pattern strings generated by all runs of fictitious play on fully connected games. The automaton is an over-approximation; there are certain paths in it for which we cannot construct real parity games.
6.9 The exact Buchi Automaton accepting pattern strings generated by all runs of fictitious play on fully connected games.
6.10 Comparison of average number of iterations of FP1 against the average number of recursive calls of Zielonka's algorithm on fully connected games.
6.11 Comparison of average runtime of Zielonka's algorithm and FP1 on fully connected games, measured in ms.
6.12 Strategy graph of a 12 vertices game, with over 20,000 strategies.
6.13 Average number of strategies in strategy graphs, and filtered strategy graphs up to 3 times. Column S0 gives the average number of strategies in original strategy graphs, Column S1 gives the average number of strategies when unreferenced strategies are removed from S0, etc.
6.14 Maximum number of strategies seen in strategy graphs and filtered strategy graphs up to 3 times. Column S0 gives the maximum number of strategies in original strategy graphs, Column S1 gives the maximum number of strategies seen when unreferenced strategies are removed from S0, etc.

7.1 Cycle detection for FP1, triggered at the end of each iteration of FP1
7.2 Percentage of strategies attracted to the optimal strategies in strategy graphs, showing the average, max, min and median percentages observed from roughly 1000 games with strategy cycles.
7.3 Parity game GAME-08-35-7-2052, with cycle length 6 for FP1; the query used to extract this game from the workbench is 'ALL G COUNT(VERTICES(G)) <=8 AND CL(G,FP1)<6'
7.4 Filtered strategy graph of GAME-08-35-7-2052
7.5 Pseudocode for enforcing strategies on local convergence in static analysis
7.6 Comparing the number of games unsolved each by FP1 and FP1 with static analysis on various data sets of random games. Each data set contains roughly 200,000 games.
7.7 Context switching on interval update using FP1 and DSI
7.8 Experimental results of the context switching on interval update algorithm, run on hundreds of millions of games.
7.9 Comparing the number of iterations between the context switching algorithm and pure DSI on 'worst case' games for the context switching on interval update algorithm, extracted from the underlying data behind experiments in Table 7.8
7.10 Context switching on cycle detection using FP1 and DSI
7.11 Comparing the number of iterations between the context switching algorithm and pure DSI on 'worst case' games extracted from experiments in Table 7.8 for the context switching on cycle detection algorithm
7.12 Comparing the average number of iterations between the context switching algorithm and pure DSI on games not solved by pure FP1. Each data set contains witness games not solved by FP1 from the random game data set of the respective game size. All non-optimal strategy cycles in FP1 are resolved by a single DSI iteration.
7.13 Comparing the runtime and number of iterations of FP1, DSI and Zielonka's algorithm on parity games generated from the model checking problem of an elevator system. The runtime is shown in seconds and the number of iterations is shown in brackets.

1 Introduction

1.1 Model Checking and Parity Games

Research in model checking started in the late 70s and early 80s; it was originally proposed for concurrent program verification. Since then several important discoveries and breakthroughs have been made. Symbolic [7] and bounded [5] model checking, abstraction [13], reduction techniques [12], etc. made model checking feasible for industrial use. It is becoming an indispensable tool in building and validating transition systems, hardware, software, and policies.

Model checking gives us a tool to verify the behavior of systems against properties and specifications. The problem of model checking can be stated as follows: given a desired property, expressed as a temporal logic formula p, and a structure M with initial state s, decide if M, s ⊨ p, i.e., whether model M satisfies property p in state s. A number of research efforts have been made to study the difficulty and differences of model checking in different types of temporal logics: LTL, CTL, CTL*, etc. [21]. Basic modal logic has limited expressive power; global features of systems such as safety and liveness properties cannot be expressed in basic modal logic because of missing recursive mechanisms in the language. Later the modal mu-calculus, originally developed by Scott and de Bakker at IBM in the late 60s and further developed by Kozen in [27], was identified as one of the strongest candidates for expressing properties to be checked in model-checking problems. The modal mu-calculus introduces a pair of complementary fixpoint operators to form a more expressive language, strong enough for checking such properties. The expressiveness of LTL, CTL, CTL* and the modal mu-calculus is shown in Figure 1.1.

It is known that parity games, two-player games played on directed graphs, are equivalent, via polynomial time reduction, to the problem of modal mu-calculus model checking.


Figure 1.1: Expressiveness of LTL, CTL, CTL* and modal mu-calculus, a good discussion of the relationship of LTL, CTL and CTL* can be found in the paper [11] by E.M. Clarke and A. Draghicescu.

Construction of parity games from mu-calculus formulas can be found in [41]. For CTL* formula model checking, Buchi game solvers often provide better performance than parity game solvers, but outside CTL*, parity games provide a strong representation of mu-calculus properties and their model-checking instances. Recent development in the field also shows that model checking problems, via reduction to satisfiability problems, are often solved more efficiently in practice by SAT solvers than by traditional model checkers and parity game solvers, due to continued refinement and optimization of SAT solvers over the last decade [28]. However, it is known that this technique does not always work efficiently when it encounters sequentially deep bugs [8].

Although there are competitive approaches to model checking in LTL and CTL, parity games provide an intuitive tool that can capture all model checking problems of the modal mu-calculus. This makes them the natural target for studying mu-calculus model checking. They also allow us to use results from graph theory, and there is hope that parity game solvers will in the future reach the level of maturity of SAT solvers. Another reason for the community to invest in parity games is the shift of emphasis from qualitative model checking to quantitative model checking [19]. Probability and multi-valued states do not fit well with the formalism of many SAT-based solutions; parity games, on the other hand, can be easily and intuitively extended into stochastic parity games to cope with this change [10]. In addition, parity games also play an important role in automata determinization and program synthesis.

Parity games are also an interesting research area in their own right. Deciding which player wins a node in a parity game is one of the rare problems in the complexity class UP ∩ co-UP [23]. UP is the class of problems decided by non-deterministic Turing machines that, for each input, have exactly one accepting run if there is an accepting run at all. Problems in the complexity class UP ∩ co-UP [23] form an interesting research area theoretically because efficient solutions are believed to exist and these problems are reducible to each other in polynomial time. At the time of writing, it is still unknown whether parity games are solvable in PTIME.

1.2 Motivation and Objectives

Even though the exact complexity of solving parity games is unknown, it is widely believed to be in PTIME. My motivation in the research is to find a PTIME algorithm for parity games in general. To achieve this goal, we set the following objectives:

1. To build a workbench for implementing, validating and evaluating both preprocessors and solvers for parity games.

2. To build and incorporate into that workbench a collection of a great variety of parity games.

3. To design and implement a meta language to support querying known games for certain descriptive and algorithmic properties.

4. To find classes of parity games universally hard for all known solvers, and characterize the hardness properties in those games.

5. To use the workbench to design new PTIME solvers.

Preprocessors are partial solvers of parity games. They reduce the complexity of games in a number of ways, such as priority reduction and approximation: they solve the portion of a game that is within their power and return the more difficult residual game for other solvers to process. With the workbench, an entire spectrum of such preprocessors, including solvers, can be expressed, implemented, and evaluated against a large pool of interesting games, including fully connected games that provide a relevant test case for algorithms and their performance. This workbench is meant to support the generation of a database of games and their meta-information, so that users or automated search processes can submit queries that may return games of interest, and may validate preprocessors and solvers as well as their optimizations. The workbench should include a solver package, an on-line parity game server hosting millions of games generated and collected, and a query language which allows us to query games with interesting features.
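As an illustration of how preprocessors are meant to compose, the following is a minimal sketch of one possible shape of such an interface. The names and types are invented for this illustration and are not the actual interface of the workbench described in Chapter 4.

    from typing import Callable, Dict, Tuple

    Game = Dict             # stand-in type for a parity game representation
    Solution = Dict         # vertices already decided, per player
    Preprocessor = Callable[[Game], Tuple[Solution, Game]]

    def compose(p: Preprocessor, q: Preprocessor) -> Preprocessor:
        # Run p first, then run q on the residual game that p could not solve;
        # the combined partial solutions are returned with the final residual game.
        def pq(game: Game) -> Tuple[Solution, Game]:
            solved_p, residual = p(game)
            solved_q, residual = q(residual)
            return {**solved_p, **solved_q}, residual
        return pq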

1.3 Contributions

There are two main contributions in my research reported in this thesis. Firstly, we have built a generic workbench for implementing, testing and evaluating solvers for parity games. The workbench is extensible and allows interesting algorithmic queries over a large database of parity games, facilitating the design of new solvers and preprocessors. Secondly, we brought ideas from game theory into the solving of parity games and implemented an algorithm using fictitious play with a conjectured runtime complexity of O(n⁴ log²(n)). Although we are not able to provide a proof for this worst-case time bound, we have given our intuition and outlined a direction along which it might be proved. Additionally, we have shown that fully connected games of all sizes can be solved by the fictitious play algorithm for parity games within 6 iterations. The constant number of iterations for fully connected games gave us a bearing on the complexity of parity games in general, because we have found in our research that semi-fully connected games (games in which all vertices owned by one of the players can reach all other vertices in one arc step) are also solved in a constant number of iterations. An arbitrary game can be obtained from its semi-fully connected form by sequentially removing from the semi-fully connected game arcs not present in the target game, and the transformation steps show further evidence that the fictitious play algorithm we developed may indeed terminate in a polynomial number of iterations. However, since this direction of research has not yet brought about genuinely fruitful results, we do not report it in detail in this thesis and only discuss this issue again in Chapter 8 when we describe future work.

1.4 Structure of This Thesis

The rest of the thesis elaborates on the research conducted toward realizing the objectives. Chapter 2 reviews existing literature on parity games; we pay particular attention to Zielonka's algorithm [43] and Jurdzinski's discrete strategy improvement algorithm [26]. Chapter 3 reviews literature in game theory, with a focus on the fictitious play algorithm on normal form games. Chapter 4 introduces the workbench, the query language and the selection of test games.

Chapter 5 experimentally explores different possibilities of fitting the fictitious play algorithm to parity games, and eventually settles on a variant of the algorithm with recall length 1. This chapter also compares the performance of this algorithm against other well known algorithms experimentally. Chapter 6 discusses the complexity implications of the fictitious play algorithm we designed using strategy graphs. Chapter 7 discusses the choices in resolving non-optimal strategy cycles inherent in fictitious play with recall length 1 by using static analysis and/or combining with other algorithms. Chapter 8 concludes the thesis and suggests future work applicable to the research.

Part of the thesis, Chapter 4 and a portion of Chapter 7, is published in Electronic Communications of the EASST in 2009 [20].

2 Background to Parity Games

This chapter reviews the background of parity games. In section 2.1 we define parity games, how these games are played, and what winning regions and winning strategies are. In section 2.2 we look at known complexity results, briefly discuss some related work, and discuss Zielonka's algorithm and Jurdzinski's discrete strategy improvement algorithm in detail. Both algorithms will play a major role in the design of our own algorithms.

2.1 Parity Games

Parity games were first introduced in [15] as infinite games with parity winning conditions. A parity game is a two-player game played on a directed graph. Vertices in the graph are owned by either player 0 or player 1. Each vertex has a priority (a natural number) in the game, and a number of incoming and outgoing arcs.

2.1.1 How a Parity Game is Played

The intuition of how a parity game is played can be explained by placing an imaginary token on an arbitrary starting vertex in a directed graph with the two players taking turns pushing the token along arcs defined in the game. Later it can be shown that neither the token nor the arbitrary starting vertex are necessary in the formal definition of a parity game, for the time being they are informally used to help the readers grasp the concept of a play and its winning conditions.

Starting from an arbitrary vertex, at every step, the owner of the vertex currently holding the token is responsible for deciding to which subsequent vertex to push the token, subject to the availability of a direct arc linking the current vertex and the next. During the play, each time the token is

moved onto a new vertex, the priority of the vertex is recorded. At a certain stage the token may be pushed into a dead end vertex from which there are no outgoing arcs. When this happens, the owner of the dead end vertex loses the game. Otherwise, if the token is pushed around the game graph in cycles infinitely, the winner of the play is determined by the maximum priority occurring infinitely often in the play. If that maximum priority is even, player 0 wins the play; otherwise player 1 wins the play. The winner of a play wins all the vertices encountered in the play.

There is a subtle difference between winning a vertex in a play and winning a vertex in the game. For this we need an informal intuition of strategies. The set of out-going arcs chosen by a player at all the vertices owned by that player encountered in a play is called a strategy of the player. A play is fixed by the pair of chosen strategies of both players. A player winning a play with his chosen strategy against a particular opponent strategy may not be able to win another play with the same chosen strategy against a different opponent strategy. If a player, with his chosen strategy, always wins a vertex against all possible opponent strategies, he is said to win that vertex in the game, and that chosen strategy is called a winning strategy.

Before moving on to formally define parity games, play, strategies and winning regions, we should settle conventions used in this thesis.

• All games discussed are max-parity games: both players seek the parity winning condition for the maximum priority occurring infinitely often in the play. Min-parity games are also often encountered in the literature; the two forms are equivalent via linear time reduction to each other.

• It is also assumed that all games discussed in this thesis are without dead ends and self loops (i.e. edges of the form (v, v) in E) in order to simplify our discussion and focus on more interesting aspects of the games and algorithms in general. Games with self loops and dead ends can be trivially converted to an equivalent game having no such features in linear time, too. A detailed treatment of this can be found in the paper [2].

We now formally introduce the definition of parity games, winning strategies, and winning regions.

Definition 2.1.1 (Parity Game). A parity game G is defined as G = (V₀, V₁, E, ρ), where the set of vertices V is the disjoint union of V₀ and V₁; V₀ is the set of vertices owned by player 0, and V₁ is the set of vertices of player 1. The relation E ⊆ V × V is the edge relation, and the priority function ρ maps each vertex from V to a natural number in the set {0, 1, 2, ..., d − 1}, where d is the index of the game, i.e., ρ(v) = d − 1 for some v in V.

Figure 2.1: In this sample game, vertices are identified by v0 through v5. Circles are player 0's vertices, boxes are player 1's. The priority of each vertex is shown in the square brackets. Winning region and winning strategy of player 0 are highlighted in green; winning region and winning strategy of player 1 are highlighted in red.

A sample game is shown in Figure 2.1. In this game and throughout the thesis, the following convention is adopted to encode the graphs. Box is used to represent vertices owned by player 1 and circle is used to represent vertices owned by player 0. The identifier of a vertex and its priority written in square brackets are labels in the centre of each vertex. Green colour is used to represent vertices won by player 0 and when used on edges, it represents the corresponding winning strategy selected by player 0. Red colour is used to represent vertices won by player 1 and her corresponding winning strategy.
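To make Definition 2.1.1 concrete, the following is a minimal sketch of a data structure for a parity game. The class, the field names and the two-vertex example game are our own illustration (the example is not the game of Figure 2.1, and this is not the workbench's actual implementation).

    from dataclasses import dataclass

    @dataclass
    class ParityGame:
        v0: set          # vertices owned by player 0
        v1: set          # vertices owned by player 1
        edges: set       # set of (source, target) pairs, E ⊆ V × V
        rho: dict        # priority function: vertex -> natural number

        def vertices(self):
            return self.v0 | self.v1

        def index(self):
            # d, the index of the game: one more than the largest priority in use
            return max(self.rho.values()) + 1

    # A tiny hypothetical two-vertex game:
    g = ParityGame(v0={"a"}, v1={"b"},
                   edges={("a", "b"), ("b", "a")},
                   rho={"a": 2, "b": 1})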

Definition 2.1.2 (Play). A play is a sequence of vertices v₀, v₁, v₂, ..., such that vᵢ → vᵢ₊₁ ∈ E for all i ≥ 0.

A play started from a vertex v₀ can be recursively constructed by the owner of the last vertex selecting a successor of the current vertex in sequence. A player can adopt a pure and memoryless strategy in the play, which may or may not help him win the game, depending on the strategy chosen by the opponent and the priority sequence extracted from the play. Such a strategy is a total function defined on the vertices owned by him.

Definition 2.1.3 (Strategy).

1. A strategy σ for player 0 is a total function σ : V₀ → V such that (v, σ(v)) is in E for all v in V₀;
2. A strategy π for player 1 is a total function π : V₁ → V such that (v, π(v)) is in E for all v in V₁.

Definition 2.1.4 (Induced Play). A play v₀, v₁, v₂, ... is consistent with a chosen player 0 strategy σ if for each vertex vᵢ owned by player 0, vᵢ₊₁ = σ(vᵢ); similarly, the play is consistent with π if for each vertex vⱼ owned by player 1, vⱼ₊₁ = π(vⱼ). When a play starting at v is consistent with both σ and π, the play is said to be induced by the pair of strategies and is denoted by v(σ, π).¹

Example 2.1.5. In Fig. 2.1, one such strategy σ for player 0 could be {v₀ → v₁, v₂ → v₁, v₄ → v₂}; one such strategy π for player 1 could be {v₁ → v₂, v₃ → v₁, v₅ → v₂}. The play induced by σ and π starting at vertex v₁ is {v₁, v₂, v₁, v₂, ...}.

The winner of the play is determined by the priority sequence extracted from the play. In the previous example since the induced priority sequence is (5, 4, 5, 4, 5,...), the play is won by player 1 since 5 is odd and is the largest priority occurring infinitely often in it. We now define the winning strategy and winning region of a player.
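The following sketch makes the mechanics of induced plays and their winners concrete: follow the two positional strategies from a start vertex until a vertex repeats; the largest priority on the resulting cycle is exactly the largest priority occurring infinitely often. The helper name and the tiny example game are our own illustration, not the game of Figure 2.1.

    def winner_of_play(start, sigma, pi, owner, rho):
        """owner maps a vertex to 0 or 1; sigma/pi are positional strategies."""
        visited, order, v = {}, [], start
        while v not in visited:
            visited[v] = len(order)
            order.append(v)
            v = sigma[v] if owner[v] == 0 else pi[v]
        cycle = order[visited[v]:]            # the part of the play repeated forever
        top = max(rho[u] for u in cycle)      # largest priority occurring infinitely often
        return 0 if top % 2 == 0 else 1

    # Hypothetical game: vertex "a" owned by player 0 with priority 2,
    # vertex "b" owned by player 1 with priority 1.
    owner, rho = {"a": 0, "b": 1}, {"a": 2, "b": 1}
    print(winner_of_play("a", {"a": "b"}, {"b": "a"}, owner, rho))  # cycle a,b: top is 2, so player 0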

Definition 2.1.6 (Winning Strategy). The sets Σ, Π are used to denote the entire set of strategies of player 0 and player 1 respectively. We add subscripts G to Π and Σ only when the game G is not clear from context. A strategy σ is a winning strategy for player 0 at v in game G if and only if ∀π ∈ Π: v(σ, π) is won by player 0. Similarly, a strategy π is a winning strategy for player 1 at v in game G if and only if ∀σ ∈ Σ: v(σ, π) is won by player 1.

¹In this thesis we often use (σ, π) for an induced play and drop the starting vertex when there is no ambiguity.

Definition 2.1.7 (Winning Region). The winning region for player 0 in G is win₀(G) = {v | ∃σ : σ is winning at v}.

Definition 2.1.6 says that a strategy σ is a winning strategy for player 0 at vertex v if it can win all the induced plays starting at v against all player 1 strategies. The winning strategy is defined dually for player 1. Definition 2.1.7 says that the winning region of a player in a game is the set of vertices at which the player has a winning strategy.

The strategies defined and used in the induced play are memoryless, so-called positional strategies, because the decision at each vertex is made locally at the vertex: the history of choices made by both players in a play is forgotten and irrelevant. It turns out that solving a parity game only needs memoryless strategies [15]. In addition, there is no harm for the winner to reveal his strategy to his opponent. This is the memoryless determinacy property of parity games.

Theorem 2.1.8 (Memoryless Determinacy).

1. ∃σ such that ∀v ∈ win₀(G), σ is winning at v;
2. ∃π such that ∀v ∈ win₁(G), π is winning at v.

The first non-constructive proof of Theorem 2.1.8 was sketched in [14]; later another version of the proof using a different technique was published in [15]. The first constructive proof was given by [43], which will be discussed later in this thesis. We note that one of the winning regions may well be empty, in which case the other player wins all nodes in the game.

Example 2.1.9. In Fig. 2.1, the winning regions and some winning strategies are

• win₀(G) = {v₂, v₄, v₅}
• win₁(G) = {v₀, v₁, v₃}
• σ = {v₀ → v₁, v₂ → v₅, v₄ → v₂}
• π = {v₁ → v₀, v₃ → v₁, v₅ → v₂}

Although winning regions are unique and determined, winning strategies are generally not unique; there could be many strategies securing the same winning region for a player. This often happens when there are, in a sense, redundancies in the game for a player in terms of having more out-going edges than necessary at vertices owned by him. For example, in Figure 2.1, if there is an extra edge v₁ → v₃, then both strategies {v₁ → v₀, v₃ → v₁, v₅ → v₂} and {v₁ → v₃, v₃ → v₁, v₅ → v₂} are winning strategies for player 1.

In the thesis we are often more interested in the function (σ, π), which maps a vertex v to a play v(σ, π), rather than in a particular play. The function of the play encapsulates the set of plays for all vertices which share the same unique pair of player 1 and player 0 strategies. In a way the overall outcome of the play is determined by the strategies adopted by the two players, not by where the play is started. In the future, when we refer to induced plays, we will drop the starting vertex from the notation for succinctness whenever we can do so without compromising clarity.

2.2 Known Complexity Results and Some Solvers

Solving a parity game means finding the winning region and a corresponding winning strategy of each player. While the existence of memoryless winning strategies was first shown by Emerson and Jutla, non-constructively, in [15], Zielonka has given an EXPTIME algorithm for solving parity games with his constructive proof of the memoryless determinacy result [43]. Later Jurdzinski showed that the problem is in UP ∩ co-UP [23], by reducing parity games, in polynomial time, to mean payoff games, which are known to be in this class. UP is the class of problems having an unambiguous, unique solution in NP: an input is accepted by a non-deterministic Turing machine if and only if there is at most one run of that machine accepting it. Oliver Friedmann, in [17], proved that recursive solving of parity games in general takes EXPTIME in the worst case, and Zielonka's algorithm, being recursive in nature, falls into this category.

The parity condition in parity games, intuitively speaking, captures the qualitative notion of fairness conditions in a system; there are, however, often quantitative constraints in a system which also need to be addressed. In [9] Krishnendu Chatterjee discussed combining mean payoff games with parity games and showed that optimal strategies exist in mean-payoff parity games. Jurdzinski has also given a sub-exponential algorithm for solving parity games using small progress measures in [25] and [24]. Of particular interest to the work reported in this thesis is an algorithm given by Jurdzinski in [26] using strategy improvement techniques. This is a discrete algorithm and therefore does not require high precision arithmetic.

In [26] Jurdzinski and Voge have the similar goals of finding a polynomial time solution to parity games using strategy improvement techniques and, in the absence of a proof of the upper bound, of finding a class of games requiring an exponential number of steps for computing their solutions. The runtime complexity of the algorithm was left as an open question in the paper; in practice, and through the experiments they report, most games are solved within a linear number of improvement steps. Also, since the algorithm is a framework into which different update/improvement policies can be plugged, they did note that for each particular game there is always a deterministic local update policy which solves the game in a polynomial number of improvement steps; it is just open whether a universal update policy can be found that works for all games. They also mentioned that certain classes of games that are difficult for other algorithms are solved efficiently with the discrete strategy improvement algorithm. But eventually, in [17], Friedmann gives an exponential lower bound for this algorithm.

2.2.1 Zielonka’s Algorithm

The first algorithm we want to study is Zielonka's algorithm. Its worst case time bound is EXPTIME according to [17], but it performs well in practice, both in our own experiments and through results reported in [18]. The pseudo-code of this algorithm is shown in Figure 2.2. For simplicity we removed the post-processing in the original algorithm which reconstructs winning strategies after winning regions are found. This is thus a decision version of the algorithm.

Before going into details of the algorithm, we need to introduce two concepts: sub-game and attractor. Both concepts relate to the structure of parity

13 games and ignore the priorities assigned to each vertex.

Attractor

The concept of an attractor allows us to compute, for a given target set of nodes X, those nodes from which a player p can force the other player to reach this set of nodes X. This is a central concept and building block of algorithms that solve parity games. It measures a player’s relative control of

the game arena (V₀, V₁, E, ρ) in terms of confining the opponent's movement, and exercising her own freedom to navigate around the arena. Given a set of vertices X as a basis, if player p can force the play into X from vertex v, then v is said to be attracted to X by player p. The set of all such vertices T is said to be an attractor of player p with basis X; in particular X ⊆ T.

Definition 2.2.1 (Attractor). Let X ⊆ V in game G. The attractor T for X of player p in G is the smallest set of vertices T that contains X and satisfies two closure properties:

1. if v₁ ∈ V_p and (v₁, v₂) ∈ E for some v₂ ∈ T, then v₁ ∈ T;
2. if v₁ ∈ V_p̄ and v₂ ∈ T for all (v₁, v₂) ∈ E, then v₁ ∈ T.

We write attr_p(G, X) for this smallest T.

Example 2.2.2. We now look at some example attractors from Fig. 2.1:

1. attr₁(G, {}) = {}. The attractor of an empty set is the empty set. This is so for both players, and is true for all games. The way to understand this is to imagine the empty set containing a phantom vertex which does not exist in the original game; it is then clear that any play in the game would never reach that phantom vertex.
2. attr₀(G, {v₄}) = {v₄}. The attractor set of any basis contains at least all the vertices from the basis, and here no other vertices are attracted to it.
3. attr₁(G, {v₁}) = {v₁, v₀, v₃}. Initially v₃ is attracted to v₁. After {v₁, v₃} forms the new basis, it subsequently attracts v₀, since v₀ can only move to one of these two vertices and will not be able to escape.
4. attr₀(G, {v₂, v₅}) = {v₂, v₅, v₄}.
5. attr₀(G, {v₁}) = {v₀, v₁, v₂, v₃, v₄, v₅}.
6. attr₁(G, {v₄}) = {v₀, v₁, v₃, v₄}.
7. attr₁(G, {v₀, v₁, v₂, v₃, v₅}) = {v₀, v₁, v₂, v₃, v₄, v₅}.

²We use p̄ to represent the opponent of player p.

An attractor expresses its player’s control over a particular set of vertices. If the set of vertices T is 0-attracted to U, then player 0 can force player 1 into U for all plays started from any vertex in T .

Proposition 2.2.3. If the vertices in set U are won by player x in G, then all vertices from attr_x(G, U) are won by player x.

It is easy to see that each edge is visited only once during the construction of an attractor; this computation requires linear time with respect to |E|, the size of E. Attractor strategies are winning strategies for attractor games and are also memorylessly determined for any given basis. Winning strategies for parity games are always attractor strategies; however, the converse is not true: attractor strategies do not always imply winning strategies in the same parity game. This can be seen with case 5 and case 6 from Example 2.2.2, where the computed attractors intersect the winning regions of both players at the same time.
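A sketch of the linear-time computation just described is given below, using a simple dictionary-based representation of the arena; the function and variable names are ours, not the workbench's. Each opponent vertex keeps a counter of successors not yet attracted, so every edge is examined at most once.

    from collections import deque

    def attractor(v0, v1, edges, p, basis):
        """attr_p(G, basis): vertices from which player p can force the play into basis.
        v0/v1 are the vertex sets of players 0 and 1; edges is a set of (source, target) pairs."""
        succ = {v: set() for v in v0 | v1}
        pred = {v: set() for v in v0 | v1}
        for (u, w) in edges:
            succ[u].add(w)
            pred[w].add(u)

        attr = set(basis)
        # For opponent vertices, count successors not yet attracted;
        # a vertex joins the attractor once this counter reaches 0.
        remaining = {v: len(succ[v]) for v in (v1 if p == 0 else v0)}
        queue = deque(attr)
        while queue:
            w = queue.popleft()
            for u in pred[w]:
                if u in attr:
                    continue
                owned_by_p = (u in v0) if p == 0 else (u in v1)
                if owned_by_p:
                    # player p can simply move from u into the attractor
                    attr.add(u)
                    queue.append(u)
                else:
                    remaining[u] -= 1
                    if remaining[u] == 0:
                        # every move of the opponent from u leads into the attractor
                        attr.add(u)
                        queue.append(u)
        return attr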

Sub-game and Safe Sub-game

Given parity game G = (V₀, V₁, E, ρ) and a subset U of vertices in V, the sub-game of G, denoted by G[U], is (V₀ ∩ U, V₁ ∩ U, E ∩ (U × U), ρ/U). This general notion of sub-game, however, is not very useful, since solutions to the sub-game might be wrong in the original game. Take the sub-game of Example 2.1 generated by vertices {v₁, v₂}: vertex v₂ is won by player 1 in the sub-game but is won by player 0 in the original game.

Ideally we want to transfer knowledge of winning regions from the sub-game to the original game. However, there is also a computational cost associated with generating sub-games. Generating sub-games which have correct winning regions for both players would be computationally as difficult as solving the original game, so a safe sub-game in our sense (unnamed in Zielonka's algorithm, but used by his construction anyway) only requires at least one player's winning region to be compatible with the original game.

Definition 2.2.4 (Safe Sub-game). G[U] is a safe sub-game of G for player p if and only if v ∈ win_p(G[U]) implies v ∈ win_p(G), for all v ∈ U.

Since a winning region always coincides with its attractor, and the winning region can never attract into itself vertices attracted by attractors outside the winning region, attractors give us a natural way to generate usable safe sub-games efficiently.

Proposition 2.2.5. G[U] is a safe sub-game of G for player p̄ when G \ U is a player p attractor in G.

Proof. Take any vertex v won by player p̄ in G[U]; let S = win_p̄(G[U]), and v ∈ S. We need to show that v ∈ T = win_p̄(G). Since v ∈ U, by the definition of safe sub-game, v ∉ attr_p(G, V \ U). We need to analyze two cases:

1. v ∈ V_p̄: since v ∈ S, it is p̄-attracted to S in G[U]; it is still p̄-attracted to S in G, since all edges in the sub-game are also available in the original game.
2. v ∈ V_p: since v ∈ S, it is p̄-attracted to S in G[U]. Due to the determinacy property of winning regions in parity game G[U], v is not attracted to U \ S, meaning that vertex v cannot reach vertices outside S in G[U]; since v cannot reach V \ U in game G, and because S ∪ (U \ S) ∪ (V \ U) = V, v cannot reach outside S in game G.

From these two cases we can conclude that any vertex within S is p̄-attracted to S in the original game G; therefore, these vertices are still won by player p̄ in the original game.

No matter who owns v, if v is won by player p̄ in the sub-game generated through the complement of a p-attractor, it is also won by player p̄ in the original game. Another implication of the constructive definition of safe sub-games is that the procedure of generating safe sub-games will not generate any dead-end vertex in the sub-games if the vertex is not a dead end in the original game.

This is because a dead-end vertex is only created when all its out-bound edges are simultaneously removed from the game. However, when this happens, it necessarily implies that this 'phantom' dead-end vertex is actually attracted to V \ U by the definition of attractor and hence should not exist in the sub-game in the first place. Proposition 2.2.5 forms a major recursive component in Zielonka's algorithm for incrementally computing winning regions of parity games.

Example 2.2.6. We now look at some sub-games from Fig. 2.1:

1. G[{}]. A trivial sub-game and a safe one, since the full game is both a 0-attractor and a 1-attractor; however, nothing useful can be extracted. As a side note, evaluation of such an empty sub-game often happens at the leaf nodes of the tree of recursions in Zielonka's algorithm, when in the previous recursive step the full game is solved and the subsequent procedure of the algorithm is left with an empty residual game to evaluate.
2. G[{v₄}]. A sub-game but not a safe one, because the rest of the game is neither a valid 0-attractor nor a valid 1-attractor; actually both attractors would need to include v₄ to render safety.
3. G[{v₀, v₁, v₃}]. Safe, since {v₂, v₄, v₅} is a valid 0-attractor; a winning region of player 1 in G can be safely extracted from the sub-game.
4. G[{v₂, v₅}]. Safe, since the rest of the game is a valid 1-attractor; a winning region of player 0 in G can be safely extracted from this sub-game.
5. G[{v₀, v₁, v₂, v₃, v₅}]. Safe, since {v₄} is a valid 0-attractor; a winning region of player 1 in G can be safely extracted from the sub-game.

The Algorithm

Now we are ready to present Zielonka's algorithm in Figure 2.2. Procedure WIN-REGION computes the winning regions recursively. When the maximum priority in the game is 0, player 0 wins everything (lines 4, 5); otherwise it computes the parity and the owner of the highest priority in the game (line 7) and uses WIN-OPPONENT to find the other player's winning region (line 8). Whatever is not won by the other player is won by the player with the highest priority (line 9).

Procedure WIN-OPPONENT incrementally pushes attractors of priorities onto a stack, starting from the highest priority down to 1. At each level, the attractor is removed from the game and put onto the stack (lines 7, 8); the residual game is another safe sub-game of a smaller size (by Definition 2.2.4). The winning region of the opposite player in the residual game is extracted from the recursive call to WIN-REGION (line 9, by Proposition 2.2.5). When each attractor is popped off the stack, the opposite player is given another chance to see if she can win more vertices from the attractors removed before the recursive call (line 5, by Proposition 2.2.3). This process repeats until saturation (line 11).

1:  WIN-REGION(G)
2:    n ← MAX-PRIORITY(G)
3:    V ← VERTICES(G)
4:    if n = 0 then
5:      (W0, W1) ← (V, {})
6:    else
7:      x ← n mod 2
8:      Wx̄ ← WIN-OPPONENT(G, n, x)
9:      Wx ← V \ Wx̄
10:   end if
11:   return (W0, W1)

1:  WIN-OPPONENT(G, n, x)
2:    W ← {}
3:    repeat
4:      M ← W
5:      X ← ATTRACTOR(x̄, G, W)
6:      Y ← V \ X
7:      N ← {v ∈ Y | ρG(v) = n}
8:      Z ← Y \ ATTRACTOR(x, G[Y], N)
9:      (Z0, Z1) ← WIN-REGION(G[Z])
10:     W ← X ∪ Zx̄
11:   until M = W
12:   return W

Figure 2.2: Zielonka’s algorithm

Example 2.2.7. We now work through Example 2.1 with this algorithm:

1. Highest priority is 5, call WIN-OPPONENT(G,5,1)

18 2. First iteration, winning region of player 0 is unknown, both W and X are empty

3. N, the set of vertices with priority 5, is {v₁, v₃}
4. ATTRACTOR(1, G, N) = {v₀, v₁, v₃}, so Z = {v₂, v₄, v₅}
5. Call WIN-REGION with sub-game G[Z]
6. Highest priority is 4; call WIN-OPPONENT(G[{v₂, v₄, v₅}], 4, 0)
7. The winning region of player 1 is unknown; both W and X are empty
8. N is {v₂, v₄, v₅}, which is everything in this sub-game, so Z = {}
9. W = M saturates and the call returns: player 0 wins everything in the sub-game on {v₂, v₄, v₅}, and player 1 wins the rest of the game.
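For concreteness, the following is a compact, runnable sketch of the same recursion in Python over a set-based representation. The helper names are ours, and the attractor here is a simple fixed-point formulation (the linear-time version sketched earlier could be substituted); it follows the pseudocode of Figure 2.2 rather than any released implementation.

    def attractor(game, p, basis):
        """attr_p over game = (v0, v1, succ), with succ: vertex -> set of successors."""
        v0, v1, succ = game
        mine = v0 if p == 0 else v1
        attr = set(basis)
        changed = True
        while changed:                       # fixed-point iteration, O(V*E)
            changed = False
            for v in (v0 | v1) - attr:
                s = succ[v]
                if (v in mine and s & attr) or (v not in mine and s and s <= attr):
                    attr.add(v)
                    changed = True
        return attr

    def subgame(game, prio, keep):
        v0, v1, succ = game
        restricted = {v: succ[v] & keep for v in keep}
        return (v0 & keep, v1 & keep, restricted), {v: prio[v] for v in keep}

    def win_region(game, prio):
        """Decision version of Zielonka's algorithm: returns (W0, W1)."""
        v0, v1, _ = game
        vertices = v0 | v1
        if not vertices:
            return set(), set()
        n = max(prio.values())
        if n == 0:
            return set(vertices), set()
        x = n % 2                            # owner of the highest priority
        w_opp = win_opponent(game, prio, n, x)
        return (vertices - w_opp, w_opp) if x == 0 else (w_opp, vertices - w_opp)

    def win_opponent(game, prio, n, x):
        v0, v1, _ = game
        vertices = v0 | v1
        w = set()
        while True:
            m = set(w)
            xattr = attractor(game, 1 - x, w)            # line 5
            y = vertices - xattr                          # line 6
            n_set = {v for v in y if prio[v] == n}        # line 7
            gy, prio_y = subgame(game, prio, y)
            z = y - attractor(gy, x, n_set)               # line 8
            gz, prio_z = subgame(game, prio, z)
            z0, z1 = win_region(gz, prio_z)               # line 9
            w = xattr | (z1 if x == 0 else z0)            # line 10: add opponent's region
            if w == m:                                    # line 11: saturation
                return w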

2.2.2 Discrete Strategy Improvement

The algorithm based on discrete strategy improvement was first introduced in [26]. It looks at parity games from a different perspective, cycle formation and domination, than the one used in Zielonka's algorithm. With strategies σ, π chosen for both players in a parity game with no dead end, we know that the play started from any vertex will eventually run into a cycle. From the perspective of strategy improvement algorithms, the play terminates as soon as a cycle is closed. The cycle is said to be dominated by player 0 if the maximum priority in the cycle is even; otherwise it is dominated by player 1.

This is a finite reformulation of the infinite plays of parity games and it is easy to see that finding winning regions and strategies in parity games can be reduced to finding winning regions and strategies in such cycle domination games since it is known that strategies are positional (memoryless) in these games.

Definition 2.2.8 (Prefix and Cycle of Play). Given player 0 with strategy σ and player 1 with strategy π, the play (σ, π) started from each vertex v can be represented as a tuple of vertices ⟨p₁, ..., pᵢ; c₁, ..., cⱼ⟩, in which p₁ up to pᵢ is called the prefix of the play and c₁ to cⱼ the cycle of the play. In addition, and consistent with σ and π, each vertex in the tuple has an edge to the next vertex after it³, and cⱼ has an edge going to c₁, completing the cycle.

Example 2.2.9. Some recorded plays from Example 2.1, with player 0 taking strategy {v₀ → v₁, v₂ → v₀, v₄ → v₂} and player 1 taking strategy {v₁ → v₂, v₃ → v₅, v₅ → v₂}:

1. At v₀, the tuple recorded will be ⟨−; v₀, v₁, v₂⟩; here '−' is used to denote that the prefix of the play is empty, which is so because v₀ itself is in the cycle of the play.
2. At v₁, the tuple recorded will be ⟨−; v₁, v₂, v₀⟩.
3. At v₂, the tuple recorded will be ⟨−; v₂, v₀, v₁⟩.
4. At v₃, the tuple recorded will be ⟨v₃, v₅; v₂, v₀, v₁⟩.
5. At v₄, the tuple recorded will be ⟨v₄; v₂, v₀, v₁⟩.
6. At v₅, the tuple recorded will be ⟨v₅; v₂, v₀, v₁⟩.

There are many ways to interpret this play information as will be seen with other algorithms we will design in later chapters. In [26] the authors of this algorithm extracted three components from the recorded play at each vertex:

Definition 2.2.10 (Play Target). Given play (σ, π) = ⟨p₁, ..., pᵢ; c₁, ..., cⱼ⟩, the cycle target of the play λ(σ, π) is defined as c, where c is a maximal element among the vertices of the play cycle c₁, ..., cⱼ by natural number ordering.

Definition 2.2.11 (Prefix Trap). Given play (σ, π) and the cycle target c, the prefix trap ψ(σ, π) is defined as {w | w ∈ prefix(σ, π), ρ(c) < ρ(w)}.

Definition 2.2.12 (Play Length). Given play (σ, π) and cycle target c, the play length |(σ, π)| is defined as the distance from p₁ to c in the play.

Using these three components, play profiles are defined in [26], ultimately used to compare the relative strength of player 0 strategies.

Definition 2.2.13 (Play Profile). Given play P = (σ, π) started from vertex v, the play profile ⟨σ, π⟩ᵥ at vertex v is defined by the tuple (λ(P), ψ(P), |P|).

³The starting vertex v and vertex p₁ are the same vertex.

Example 2.2.14. With the following strategies selected, for player 0 {v₀ → v₁, v₂ → v₀, v₄ → v₂} and for player 1 {v₁ → v₂, v₃ → v₅, v₅ → v₂}, the play profiles at each vertex are:

1. At v₀, (v₁, {}, 1).
2. At v₁, (v₁, {}, 0).
3. At v₂, (v₁, {}, 2).
4. At v₃, (v₁, {v₃}, 4).
5. At v₄, (v₁, {}, 3).
6. At v₅, (v₁, {}, 3).
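A small sketch of how these three components can be computed from a pair of positional strategies is shown below. The helper names are ours, and unique priorities are assumed, as in the surrounding text, so that ties on the cycle and in the prefix trap do not arise.

    def play_profile(start, strategy0, strategy1, rho):
        """Return (cycle target, prefix trap, play length) for the induced play."""
        succ = {**strategy0, **strategy1}          # every vertex has exactly one successor
        order, seen = [], {}
        v = start
        while v not in seen:                       # walk until a vertex repeats
            seen[v] = len(order)
            order.append(v)
            v = succ[v]
        prefix, cycle = order[:seen[v]], order[seen[v]:]
        target = max(cycle, key=lambda u: rho[u])  # lambda: vertex of maximal priority on the cycle
        trap = {w for w in prefix if rho[w] > rho[target]}   # psi: prefix vertices above the target
        length = order.index(target)               # distance from the start vertex to the target
        return target, trap, length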

In Example 2.2.14, the entry for vertex v₀ shows that v₀ is 'attracted' to a cycle dominated by cycle target v₁, and hence secures payoff for the priority carried by vertex v₀; along the path from v₀ to v₁ no vertex with a priority greater than or equal to that of v₁ is encountered, therefore the second component in the tuple is an empty set. The distance from v₀ to v₁ as implied by the given strategies is 1. The entry for vertex v₃ shows that v₃ is also 'attracted' to v₁; the prefix trap includes v₃ in the set since ρ(v₃) ≥ ρ(v₁), and the distance from v₃ to v₁ is 4. In [26] the authors went on to define a linear order on the play profiles based on a linear order on vertices:

Definition 2.2.15 (Relevance Order). The relevance order ≤₀ is a total ordering on the vertices V defined from the perspective of player 0 using the priorities. Formally, vᵢ ≤₀ vⱼ if:

• ρ(vᵢ) ≤ ρ(vⱼ) and ρ(vᵢ) and ρ(vⱼ) are both even, or
• ρ(vⱼ) ≤ ρ(vᵢ) and ρ(vᵢ) and ρ(vⱼ) are both odd, or
• ρ(vⱼ) is even and ρ(vᵢ) is odd.

It is assumed that all games considered have unique priorities on each vertex; therefore this definition defines a unique order. The relevance order treats even priorities consistently with the natural number ordering; for odd numbers it is the reverse. All vertices with even priorities are ranked higher than those with odd priorities in the relevance ordering. Effectively, given the set of priorities {0, 1, 2, 3, 4, 5}, this preference ordering becomes 4, 2, 0, 1, 3, 5 when sorted in descending order. In both the relevance ordering and the natural number ordering, when two or more vertices share the same priority, an arbitrary but consistent order among them is defined and fixed throughout the computation. This can be achieved simply by mapping a game with duplicate priorities into one with unique priorities satisfying the constraints:

• ∀vᵢ, vⱼ : ρ(vᵢ) > ρ(vⱼ) in G ⟹ ρ(vᵢ) > ρ(vⱼ) in G′.
• ∀vᵢ, vⱼ : ρ(vᵢ) >₀ ρ(vⱼ) in G ⟹ ρ(vᵢ) >₀ ρ(vⱼ) in G′.

The translated game is equivalent to the original game in the sense that it has the same winning regions with the original game and its winning strategies are also valid winning strategies in the original game. The translation can be done in linear time. Again we will assume all games considered have unique priorities in this thesis unless specified otherwise.
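The relevance order can be realized directly as a comparator on priorities. The sketch below is an illustration only, under the assumption of unique priorities; the class and field names are not from the thesis workbench.

import java.util.Arrays;
import java.util.Comparator;

// Relevance order on priorities from player 0's perspective:
// even priorities beat odd ones; among evens larger is better,
// among odds smaller is better.
final class RelevanceOrder {

    static final Comparator<Integer> LE0 = (p, q) -> {
        boolean pEven = p % 2 == 0, qEven = q % 2 == 0;
        if (pEven != qEven) return pEven ? 1 : -1;       // even ranks above odd
        return pEven ? Integer.compare(p, q)             // evens: natural order
                     : Integer.compare(q, p);            // odds: reversed
    };

    public static void main(String[] args) {
        Integer[] priorities = {0, 1, 2, 3, 4, 5};
        Arrays.sort(priorities, LE0.reversed());         // descending relevance
        System.out.println(Arrays.toString(priorities)); // [4, 2, 0, 1, 3, 5]
    }
}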

Example 2.2.16. Vertices from Example 2.1 listed in descending relevance order: v2 ≥0 v4 ≥0 v5 ≥0 v0 ≥0 v1 ≥0 v3.

Definition 2.2.17 (Relevance Order on Play Profiles [26]). Given play profiles P generated by play (σ1, π1) and Q generated by play (σ2, π2), the total ordering of play profiles is given by the following rules using the relevance ordering:

• λ(P) ≤0 λ(Q) ⟹ P ≤0 Q
• λ(P) = λ(Q) ⟹
  – maxdiff(ψ(P), ψ(Q)) ∈ (ψ(Q) \ ψ(P)) ∩ V0 ⟹ P ≤0 Q
  – maxdiff(ψ(P), ψ(Q)) ∈ (ψ(Q) \ ψ(P)) ∩ V1 ⟹ Q ≤0 P
• λ(P) = λ(Q) ∧ maxdiff(ψ(P), ψ(Q)) = null ⟹
  – |P| ≥ |Q| ⟹ P ≤0 Q
• otherwise P ≡0 Q

where

• maxdiff(M, N) = w iff ρ(w) = max{ρ(v) | v ∈ M \ N or v ∈ N \ M}
• maxdiff(M, N) = null if M = N

Example 2.2.18. An example of maxdiff on Game 2.1:

1. maxdiff({v2}, {v2}) = null
2. maxdiff({v2}, {}) = v2
3. maxdiff({v0, v1, v2}, {v0, v1, v3}) = v3
4. maxdiff({v4, v2}, {v0, v2}) = v4
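The maxdiff operation only inspects the symmetric difference of the two sets. A small sketch is given below; the names are illustrative and the priority array in main is the same hypothetical assignment used earlier, not data stated by the thesis.

import java.util.HashSet;
import java.util.Set;

// maxdiff(M, N): the vertex of maximal priority in the symmetric
// difference of M and N, or null when M = N.
final class MaxDiff {

    static Integer maxdiff(Set<Integer> m, Set<Integer> n, int[] rho) {
        Set<Integer> symDiff = new HashSet<>(m);
        symDiff.addAll(n);
        Set<Integer> common = new HashSet<>(m);
        common.retainAll(n);
        symDiff.removeAll(common);                       // (M \ N) ∪ (N \ M)
        Integer best = null;
        for (int v : symDiff) {
            if (best == null || rho[v] > rho[best]) best = v;
        }
        return best;                                     // null when M = N
    }

    public static void main(String[] args) {
        int[] rho = {1, 5, 4, 7, 2, 0};                  // hypothetical priorities v0..v5
        System.out.println(maxdiff(Set.of(0, 1, 2), Set.of(0, 1, 3), rho)); // 3
        System.out.println(maxdiff(Set.of(2), Set.of(2), rho));             // null
    }
}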

The play profile of a vertex is a lexicographic composition of three components; the comparison above is therefore a linear order defined locally at each vertex. This linear order can be extended into a partial order defined globally on the full set of vertices of the game. The set of play profiles on all vertices of a game, given a play, can be represented by a vector of play profiles indexed by the vertices of the game. Thus,

P ≤0 Q if and only if for all v ∈ G, P(v) ≤0 Q(v),

where P and Q are each a vector of play profiles on all vertices of the same game and P(v) denotes the profile at vertex v. Play profiles in the same vector are all generated by the same play and share the same pair of opposing strategies selected by the players.

Example 2.2.19. In Game 2.1, when the σ selected is {v0 → v3, v2 → v0, v4 → v2}, one strategy π of player 1 which gives the minimal play profile vector min_π^≤0(σ, π) is {v1 → v0, v3 → v1, v5 → v2}, and the elements of the corresponding play profile vector are listed below for each vertex:

• for v0, (v1, {}, 2)
• for v1, (v1, {}, 0)
• for v2, (v1, {}, 3)
• for v3, (v1, {}, 1)
• for v4, (v1, {}, 4)
• for v5, (v1, {}, 4)

The partial order of play profile vectors allows the definition of a pre-order over strategies of player 0.

Definition 2.2.20 (Pre-Order on Strategies). σ ≤0 σ′ if and only if min_π^≤0(σ, π) ≤0 min_π^≤0(σ′, π).

This defines a pre-order since ≤0 is a pre-order over play profiles. It is not a partial order on player 0 strategies simply because there are more distinct strategies than there are distinct play profiles, so multiple strategies are bound to share the same play profile. In the case of winning strategies, this is also reflected in the fact that optimal winning strategies are often not unique, which we will discuss later in this thesis.

Example 2.2.21. We can illustrate this point with a simple example. In the game shown in Figure 2.3, player 0 has two distinct strategies: σ = {v0 → v2, v2 → v1} and σ′ = {v0 → v2, v2 → v3}. Since player 1 only has a single strategy {v1 → v0, v3 → v0}, it is the best response against both player 0 strategies σ and σ′. It is easy to check that min_π^≤0(σ, π) = min_π^≤0(σ′, π), therefore both σ ≤0 σ′ and σ′ ≤0 σ hold.

[Figure 2.3 shows a four-vertex game with priorities V0[4], V1[3], V2[0], V3[2].]

Figure 2.3: Illustration of pre-order over player 0 strategies.

Figure 2.4 shows the framework of a generic strategy improvement algorithm. These algorithms always look at plays from the perspective of one player: they pick a random strategy σ for that player, in this case player 0, and successively improve it until no further improvement can be made.

1: STRATEGY-IMPROVE(G)
2:   σ ← random player i strategy
3:   while σ != IMPROVE(σ) do
4:     σ ← IMPROVE(σ)
5:   end while

Figure 2.4: Generic Strategy Improvement Algorithm [26]

Strategy improvement algorithms in general rely on two things to work: a

suitable pre-order defined over the strategies of the chosen player such that there is no infinite ascending sequence of strategies, and an improvement operator which will always find a stronger strategy whenever the current strategy is not maximal and which, in the case of a maximal strategy, will correctly terminate.

Figure 2.5 shows the pseudocode of the strategy improvement algorithm expanded in the context of parity games. It includes a main loop (lines 3-10) improving the chosen player 0 strategy (line 2), and upon saturation (line 11) a post-processing fragment to find the corresponding player 1 strategy and the winning regions (lines 12-16). Inside the loop, it first computes the play profiles (line 4) in favour of player 1, which minimizes the play profiles for the given σ. In a way, this is the bottom line of what player 0 could achieve with the current strategy σ. It then improves σ (lines 6-10) in an attempt to push the bottom line up for the next iteration. In line 8, arg max≤0 extracts a witness successor of vertex v that has the maximum play profile among all successors from the perspective of player 0.

Procedures VALUATION and SUBVALUATION minimize the play profiles in the relevance ordering. VALUATION establishes the first component of the play profile of each vertex not yet addressed (lines 5, 6). It attempts the first component (cycle priority) in ascending order of ≤0 (line 5), which ensures a minimal (≤0) cycle priority when a play cycle consistent with the current σ is formed (line 8). After the cycle is selected (line 9), SUBVALUATION (line 10) attempts to determine a prefix, preferring paths that encounter an odd priority larger than the cycle priority over paths which encounter an even priority larger than the cycle priority (lines 6-20). Lastly, SUBVALUATION fixes a longest path if the cycle priority is even, and a shortest path if the cycle priority is odd (lines 21-25). In line 8 of procedure VALUATION, REACH tests reachability of vertex t to itself in the sub-game G[U]. In line 11 of procedure VALUATION, DROP-EDGES removes all edges in the game going from R to the rest of the game.

Example 2.2.22. We again work through this algorithm using the game in Example 2.1

• We first note that priorities, also called colours, in the game are not unique; 5 occurs twice in particular. We fix an arbitrary order between v1 and v3 by setting the priority of v3 to be 7 instead of 5, so that v1 and v3 are now ordered uniquely.

• A cycle through v3 is formed first. All vertices participate in this cycle, so v3 is the first component of all profiles in the set, and the algorithm proceeds to SUBVALUATION to find the second and third components.

• Obviously Pσ(v3) = ⟨v3, {}, 0⟩. Since v0 is directly connected to v3 and this is also its only out-going edge in the sub-game, Pσ(v0) = ⟨v3, {}, 1⟩; similarly, Pσ(v2) = ⟨v3, {}, 2⟩ and Pσ(v4) = ⟨v3, {}, 3⟩. As v5 only has a single out-going edge to v2, its profile is directly determined by v2: Pσ(v5) = ⟨v3, {}, 3⟩. On the other hand, v1 has two choices, going to either v0 or v2; because the cycle priority is 7 and v1 is owned by player 1, he chooses the shorter path to v3 given by the edge v1 → v0. The corresponding play profile is Pσ(v1) = ⟨v3, {}, 2⟩.

• Player 0 now updates strategy σ based on the play profile information computed in the previous step. Vertex v4 has only one edge in the original game, so nothing can be done at this vertex. Vertex v2 has three choices, going to v0, v1 or v5; vertex v5 has the highest play profile in the ≤0 ordering, therefore σ is updated at v2 to go to v5 instead. Similarly, σ is also updated at v0 to go to v1. Since σ is updated, there will be another iteration with the new σ = {v0 → v1, v2 → v5, v4 → v2}. In the pseudo-code, the play profiles computed are stored in the variable ϕ, and subscripts 0, 1, 2 are used to retrieve the first, second and third component of the play profile.

• First, VALUATION attempts to form a cycle with priority 7 in Gσ; no such cycle can be found.

• VALUATION then attempts to form a cycle with priority 5 in Gσ; the only cycle possible is a cycle of v1, v0, and the vertices participating in this cycle are v0, v1, v3.

• Since there are only three vertices in the sub-game, their profiles are straightforward to compute: Pσ(v1) = ⟨v1, {}, 0⟩, Pσ(v0) = ⟨v1, {}, 1⟩, Pσ(v3) = ⟨v1, {v3}, 1⟩. These vertices are marked as processed in the sub-procedures and the edges between this region and the rest of the game are removed.

• The algorithm proceeds to compute the play profiles for the rest of the game. No cycles can be found with priority 1, 0 and 2. A cycle of priority 4 is found and the corresponding play profiles are: Pσ(v2) = ⟨v2, {}, 0⟩, Pσ(v5) = ⟨v2, {}, 1⟩, Pσ(v4) = ⟨v2, {}, 1⟩.

• Player 0 updates strategy σ. Again nothing is done at v4. At v2, because only v5 has an even priority in the first component of its profile out of the successor vertices v5, v0, v1, it is already the best choice. At v0, there are two choices, v1 and v3; both vertices have the same target in their profiles, but v3 has a worse second component, encountering an odd priority larger than the cycle priority, so v0 again stays with the edge v0 → v1.

• Now σ no longer changes and the algorithm terminates after post-processing. The winning strategy of player 0 is σ = {v0 → v1, v2 → v5, v4 → v2}.

1: STRATEGY-IMPROVE(G)
2:   σ ← random player 0 strategy
3:   repeat
4:     ϕ ← VALUATION(Gσ)
5:     σ′ ← σ
6:     for all v ∈ V0 do
7:       if ϕ(σ(v)) ≤0 max≤0 {d ∈ D | ∃v′ ∈ V : (v, v′) ∈ E, d = ϕ(v′)} then
8:         σ(v) ← arg max≤0 {d ∈ D | ∃v′ ∈ V : (v, v′) ∈ E, d = ϕ(v′)}
9:       end if
10:     end for
11:   until σ = σ′
12:   for all v ∈ V1 do
13:     π(v) ← arg min≤0 {d ∈ D | ∃v′ ∈ V : (v, v′) ∈ E, d = ϕ(v′)}
14:   end for
15:   W0 ← {v ∈ V | ρ(ϕ0(v)) mod 2 = 0}
16:   W1 ← {v ∈ V | ρ(ϕ0(v)) mod 2 = 1}
17:   return (W0, W1)

1: VALUATION(G)
2:   for all v ∈ V do
3:     ϕ(v) ← null
4:   end for
5:   for all t ∈ V {in ascending order of ≤0} do
6:     if ϕ(t) = null then
7:       U ← {v | v ∈ V, ρ(v) ≤ ρ(t)}
8:       if REACH(G[U], t, t) then
9:         R ← REACH(G, t)
10:        ϕ(R) = SUBVALUATION(G[R], t)
11:        DROP-EDGES(R × (V \ R))
12:      end if
13:    end if
14:  end for
15:  return ϕ

Figure 2.5: Discrete Strategy Improvement Algorithm [26]

1: SUBVALUATION(G, t)
2:   for all v ∈ V do
3:     ϕ0(v) ← t
4:     ϕ1(v) ← ∅
5:   end for
6:   for all v ∈ {v ∈ V | ρ(v) > ρ(t)} {in descending order of ≤} do
7:     if ρ(v) mod 2 = 0 then
8:       U ← REACH(G[V \ {v}], t)
9:       for all u ∈ V \ U do
10:        ϕ1(u) ← ϕ1(u) ∪ {v}
11:      end for
12:      DROP-EDGES((U ∪ {v}) × (V \ U))
13:    else
14:      U ← REACH(G[V \ {t}], v)
15:      for all u ∈ U do
16:        ϕ1(u) ← ϕ1(u) ∪ {v}
17:      end for
18:      DROP-EDGES((U \ {v}) × (V \ U))
19:    end if
20:  end for
21:  if ρ(t) mod 2 = 0 then
22:    ϕ2(v) ← MAX-DISTANCE(G, v)
23:  else
24:    ϕ2(v) ← MIN-DISTANCE(G, v)
25:  end if
26:  return ϕ

Figure 2.6: Subvaluation procedure [26]

3 Background to Game Theory

This chapter reviews the literature in game theory with a focus on 0-sum 2-player games. In sections 1 and 2, we review the definitions of Nash equilibria and of pure and mixed strategies. In section 3 we look at the two representation forms of games in game theory, namely the normal form and the extensive form. Sections 4 and 5 are devoted to the discussion of two algorithms, fictitious play and backward induction, each used with its respective representation choice mentioned in section 3.

Game theory is widely used in the social sciences to study strategic situations where a player's decision and payoff are affected by the actions of the other players. Applications include a wide array of economic phenomena and approaches, such as auctions, bargaining, mergers and acquisitions pricing [1], [32], and voting systems [3]. There is a finite number of players in the game, each with a finite set of strategies Si to play against the others. The goal of a player is to maximize his own payoff in a play. A play ⟨s0, s1, ..., sn⟩, also called a strategy profile, is collectively determined by each player i's chosen strategy si ∈ Si. The payoff rewarded to each player i in a play ⟨s0, s1, ..., sn⟩ is given by his payoff function fi : ⟨S0, S1, ..., Sn⟩ → R. Game theory applies to multiplayer games with generic payoff functions; for the purpose of this thesis we focus exclusively on 2-player 0-sum games. 0-sum is the class of games where the sum of the payoffs of all players is always exactly 0, Σi fi(P) = 0, for all plays P. Formally:

Definition 3.0.23 (2-player 0-sum Game). A 2-player 0-sum game G is defined by G = (S0, S1, f0 : ⟨S0, S1⟩ → R), where S0 is the set of pure strategies of player 0, S1 is the set of pure strategies of player 1, and f0 is the payoff function for player 0.

In 2-player 0-sum games, only a single payoff function for one of the players is needed, since the payoff function f1 for the other player is determined by f1(s0, s1) = −f0(s0, s1) for all (s0, s1) ∈ ⟨S0, S1⟩.
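In code, such a game only needs player 0's payoff function. The sketch below is an illustration of this convention (illustrative class name, matrix representation assumed), not the thesis workbench code: it stores f0 as a matrix and derives f1 by negation.

// A finite 2-player 0-sum game: rows are player 0's pure strategies,
// columns are player 1's pure strategies, entries are player 0's payoffs.
final class ZeroSumGame {
    final double[][] payoff0;

    ZeroSumGame(double[][] payoff0) {
        this.payoff0 = payoff0;
    }

    double f0(int s0, int s1) { return payoff0[s0][s1]; }

    // Player 1's payoff is fully determined: f1 = -f0.
    double f1(int s0, int s1) { return -payoff0[s0][s1]; }

    public static void main(String[] args) {
        // The game of Example 3.1.2 below: UP/DOWN vs LEFT/RIGHT.
        ZeroSumGame g = new ZeroSumGame(new double[][]{{4, 3}, {5, 2}});
        System.out.println(g.f0(0, 1) + " " + g.f1(0, 1)); // 3.0 -3.0
    }
}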

3.1 Equilibria

A change to a player’s strategy in a play often results in an update to the payoff to some players. Often a player will play against other players to maximize his own payoff. However there are often certain stable points in the payoff function where no player has any incentive to update his strategy. More precisely, when any update would not lead to any improvements if other players do not update their own strategies. When this happens, the strategy profile and the resulting payoff is called an equilibrium. Whether payoff functions have equilibria in general is an interesting question, because when equilibria do exist players and their strategies tend to converge to an equilibrium limit in the long run over repeated plays.

Understanding the nature of equilibria and where to expect them will help players anticipate actions of other players in the game and make informed decisions. It also helps system engineers and policy makers to design better games to encourage desirable behaviors in plays and achieve other goals such as efficiency and fairness etc. in a system.

Much of the effort in game theory is devoted to defining different kinds of equilibria and finding efficient algorithms to compute them. The most well-known equilibrium is the Nash equilibrium [31]. Informally, in a multiplayer game, a play determined by each player fixing a strategy is a Nash equilibrium if no player can profit by unilaterally changing his strategy.

Definition 3.1.1 (Nash Equilibrium). A pair of strategies (s0, s1) for a 2-player game is in Nash equilibrium if and only if:

• f0(s0′, s1) ≤ f0(s0, s1) for all s0′ ∈ S0, and
• f1(s0, s1′) ≤ f1(s0, s1) for all s1′ ∈ S1.

Example 3.1.2. We show an example of a Nash equilibrium in the following simple 2-player 0-sum game. Player 0 has strategies UP and DOWN, player 1 has strategies LEFT and RIGHT. Their payoff functions are given by the mappings below; the player a mapping belongs to is denoted by the subscript attached to the strategy profile.

• (UP, LEFT)0 → 4            • (UP, LEFT)1 → −4
• (UP, RIGHT)0 → 3           • (UP, RIGHT)1 → −3
• (DOWN, LEFT)0 → 5          • (DOWN, LEFT)1 → −5
• (DOWN, RIGHT)0 → 2         • (DOWN, RIGHT)1 → −2

By trial and error it is easy to tell that (UP, RIGHT) is a Nash equilibrium. In this situation, if player 0 chooses to move DOWN, she will receive 2 instead of 3, and if player 1 chooses to move LEFT, he will receive −4 instead of −3; therefore neither player has any incentive to leave the current situation.
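The "trial and error" check can be written down directly from Definition 3.1.1. The sketch below (illustrative names, using the matrix convention of player 0's payoffs) tests every pure strategy pair of a small 0-sum game; it is not part of the thesis implementation.

// Brute-force search for pure Nash equilibria in a 2-player 0-sum game
// given by player 0's payoff matrix m (player 1 receives -m[i][j]).
final class PureNash {

    static boolean isEquilibrium(double[][] m, int s0, int s1) {
        for (int a = 0; a < m.length; a++)          // player 0 deviations
            if (m[a][s1] > m[s0][s1]) return false;
        for (int b = 0; b < m[0].length; b++)       // player 1 deviations
            if (-m[s0][b] > -m[s0][s1]) return false;
        return true;
    }

    public static void main(String[] args) {
        double[][] m = {{4, 3}, {5, 2}};            // Example 3.1.2 (UP/DOWN x LEFT/RIGHT)
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                if (isEquilibrium(m, i, j))
                    System.out.println("equilibrium at row " + i + ", column " + j);
        // Prints: equilibrium at row 0, column 1, i.e. (UP, RIGHT).
    }
}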

Game theory has many applications in the social sciences; a well-known example is the study of the prisoner's dilemma. When two criminals working together are caught by the police and individually questioned, without knowing how his partner would respond during the interrogation, each of them may choose to cooperate or betray the other. The number of years in their final sentence will depend on the responses of both of them. A particular setting of the dilemma is shown in Example 3.1.3.

Example 3.1.3. We show an example of the prisoners’ dilemma. Both criminals have strategies COOP and BETRAY. For each play written as a tuple, take (COOP, BETRAY) for example, the first strategy belongs to player 0 and the second strategy belongs to player 1, so the play is interpreted as: player 0 chooses to cooperate and player 1 chooses to betray. Their payoff functions are given by the mappings below.

• (COOP, COOP)0 → −2           • (COOP, COOP)1 → −2
• (COOP, BETRAY)0 → −10        • (COOP, BETRAY)1 → 0
• (BETRAY, COOP)0 → 0          • (BETRAY, COOP)1 → −10
• (BETRAY, BETRAY)0 → −3       • (BETRAY, BETRAY)1 → −3

Notice that in this setting a prisoner will always achieve a better payoff by betraying his partner no matter what his partner chooses to do. Such a setting is possible because the payoff system rewards competitive play, that is to say, if a player chooses to tell the truth, he will always end up serving fewer years in prison. The reward system in this example seems morally justified for the setting; however, in other areas of the social sciences, when cooperation between players is desired more than competition, a different reward system should be used. The study of these games gives social scientists a tool to understand the behaviour of people under different settings so that policies can be formulated to maximize the total payoff of society as a whole.

3.2 Pure vs. Mixed Strategies

What we have seen so far in Example 3.1.2 only involves pure strategies. Pure strategies can be generalized to mixed strategies, which will be introduced shortly. In Example 3.1.2 each of the two named strategies of each player is a pure strategy. It is easy to discover the equilibrium at (UP, RIGHT) in Example 3.1.2 because some strategies in the game are strictly dominated.

Definition 3.2.1 (Dominated Strategy). A strategy si is a dominated strategy for player i if and only if there exists a different strategy si′ for player i such that fi(si′, t) > fi(si, t) for all opponent strategies t.

A game with dominated strategies can often be simplified: because a player is never going to play a dominated strategy in any situation, the dominated strategies can be eliminated from the strategy space of the player, resulting in a smaller game. Such elimination of dominated strategies often reveals more dominated strategies of other players that can be further eliminated.

Example 3.2.2. We demonstrate the elimination of dominated strategies with Example 3.1.2.

• First we notice that for player 1 the strategy RIGHT is always better than LEFT, because RIGHT achieves a better payoff for player 1 whether player 0 chooses UP or DOWN; therefore strategy LEFT is eliminated from the game.

• We then look at the payoff function of player 0. Since LEFT is eliminated, there are only 2 points left in player 0's function, and in this case strategy UP is always better than DOWN, hence DOWN is eliminated.

• Since only a single strategy is left for each player in the game, this is exactly an equilibrium.¹
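The elimination just illustrated can be mechanized. The sketch below is an illustration for the matrix convention above (illustrative names, not the thesis code): it repeatedly removes strictly dominated rows and strictly dominated columns until nothing changes.

import java.util.ArrayList;
import java.util.List;

// Iterated elimination of strictly dominated pure strategies for a
// 2-player 0-sum game given by player 0's payoff matrix.
final class DominanceElimination {

    // Is row r strictly dominated by some other surviving row (for the maximizer)?
    static boolean rowDominated(double[][] m, List<Integer> rows, List<Integer> cols, int r) {
        for (int other : rows) {
            if (other == r) continue;
            boolean dominates = true;
            for (int c : cols)
                if (m[other][c] <= m[r][c]) { dominates = false; break; }
            if (dominates) return true;
        }
        return false;
    }

    // Is column c strictly dominated by some other surviving column (for the minimizer)?
    static boolean colDominated(double[][] m, List<Integer> rows, List<Integer> cols, int c) {
        for (int other : cols) {
            if (other == c) continue;
            boolean dominates = true;
            for (int r : rows)
                if (m[r][other] >= m[r][c]) { dominates = false; break; }
            if (dominates) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] m = {{4, 3}, {5, 2}};                      // Example 3.1.2
        List<Integer> rows = new ArrayList<>(List.of(0, 1));
        List<Integer> cols = new ArrayList<>(List.of(0, 1));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int r : new ArrayList<>(rows))
                if (rows.size() > 1 && rowDominated(m, rows, cols, r)) { rows.remove((Integer) r); changed = true; }
            for (int c : new ArrayList<>(cols))
                if (cols.size() > 1 && colDominated(m, rows, cols, c)) { cols.remove((Integer) c); changed = true; }
        }
        System.out.println("surviving rows " + rows + ", columns " + cols); // [0], [1]: (UP, RIGHT)
    }
}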

However dominated strategies do not always exist in games. Such a game is shown in Example 3.2.3.

Example 3.2.3. In this game, player 0 still has two strategies UP and DOWN, player 1 has strategies LEFT and RIGHT. Their payoff functions are slightly different. We only give the payoff mapping for player 0 this time.

• (UP, LEFT) → 4
• (UP, RIGHT) → 3
• (DOWN, LEFT) → 2
• (DOWN, RIGHT) → 5

Since only 2-player 0-sum games are discussed in this thesis, there is no ambiguity in using the payoff set of a single player. We adopt the convention of using player 0’s payoff in functions and mappings dealing with payoffs, such that, while player 0’s goal in the game remains the same, player 1 will, instead of maximizing his own payoff, now equivalently minimizes player 0’s payoff. Player 0 is defined as the max player and player 1 is defined as the min player.

In Example 3.2.3, looking at (UP, LEFT), player 1 will want to switch his strategy to RIGHT because (UP, RIGHT) has a smaller payoff of 3. However if (UP, RIGHT) is selected, player 0 can get a larger payoff by choosing to go DOWN. It can be verified that this game does not converge to any pair of pure strategies shown in the table. This is when mixed strategies are required.

Definition 3.2.4 (Mixed Strategy). A mixed strategy σm for player i is a tuple (α0 × σ0, α1 × σ1, ..., αn × σn), where each σj is a pure strategy for player i, each αj is a real number between 0 and 1, and Σ_{j=0}^{n} αj = 1.

¹ Nash equilibria always exist for 0-sum 2-player games such as this one.

A mixed strategy is a convex combination of pure strategies chosen by a player. Mixed strategies may be derived from the frequency mix of pure strategies, and the payoff achieved by the owner of a mixed strategy against an opponent strategy is the weighted average of the payoffs achieved by each pure strategy against the opponent strategy. We will study this manner of constructing mixed strategies in a later chapter of this thesis. The set of pure strategies involved in a mixed strategy is called the support of the mixed strategy.

Example 3.2.5. We give an example of a mixed strategy for each of the two players in game 3.2.3

• For player 0: [UP ∗ 0.75, DOWN ∗ 0.25].
• For player 1: [LEFT ∗ 0.5, RIGHT ∗ 0.5].

The pair of mixed strategies above, when played against each other, yields payoff 3.5 for player 0. It can be verified that this is actually an equilibrium. Nash equilibria always exist for finite games with mixed strategies [31]. In 2-player 0-sum games, Nash equilibria are not only stable but also optimal. This coincides with the following theorem.

Theorem 3.2.6 (Minimax theorem by von Neumann). For every 2-player 0-sum game, there exists a value p and a pair of mixed strategies (σ, π) of the two players such that:

1. given π, the best payoff possible for player 0 is p, and
2. given σ, the best payoff possible for player 1 is −p.

Mixed strategy is a controversial subject in game theory and there are many competing interpretations of it; we take the perspective of repeated plays in this thesis. For a mixed strategy equilibrium, this means that when the game in Example 3.2.3 is played four times in a row, player 0 will play UP exactly three times and DOWN exactly once to maximize her own payoff, and player 1 will play each of LEFT and RIGHT twice to minimize player 0's payoff.
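Under this repeated-play reading, the expected payoff of a pair of mixed strategies is a probability-weighted average of the matrix entries. The short sketch below (illustrative only, not the thesis code) reproduces the value 3.5 claimed for Example 3.2.5.

// Expected payoff for player 0 when both players use mixed strategies
// over the rows/columns of player 0's payoff matrix.
final class MixedPayoff {

    static double expectedPayoff(double[][] m, double[] rowMix, double[] colMix) {
        double total = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                total += rowMix[i] * colMix[j] * m[i][j];   // weight each pure play
        return total;
    }

    public static void main(String[] args) {
        double[][] m = {{4, 3}, {2, 5}};                    // Example 3.2.3
        double[] player0 = {0.75, 0.25};                    // UP 0.75, DOWN 0.25
        double[] player1 = {0.5, 0.5};                      // LEFT 0.5, RIGHT 0.5
        System.out.println(expectedPayoff(m, player0, player1)); // 3.5
    }
}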

3.3 Forms of Games

There are two notable representation forms for games in game theory: normal form and extensive form.

3.3.1 Normal Form

Normal form is also called standard form or strategic form. Both Example 3.1.2 and Example 3.2.3 use this form. It states the payoff of each strategy profile for each player. For 2-player 0-sum games, the function mapping is often shown in a payoff matrix. In the matrix each row is a strategy of player 0 and each column is a strategy of player 1, the payoff of a play given both the row and the column is read off the corresponding entry in the matrix. The equivalent payoff matrices in Example 3.1.2 and Example 3.2.3 are shown in Figure 3.1. Following our convention, only player 0’s payoffs are shown.

(a) Game from Example 3.1.2:
        LEFT  RIGHT
UP       4     3
DOWN     5     2

(b) Game from Example 3.2.3:
        LEFT  RIGHT
UP       4     3
DOWN     2     5

Figure 3.1: Games presented in normal form

In practice simple games like those in the examples rarely occur. In the natural domain where a game is defined, strategies almost always have structures. For example in parity games, a strategy of player 0 is a combination of out-going edges leaving from each vertex owned by player 0. Because of this there are topological relationships amongst strategies such as distance, how different is one strategy from another, and dependency, the existence of two strategies may imply existence of a third strategy with certain payoff constraint etc.

Normal form abstracts these structures away and treats each strategy as an independent, structureless object in a combinatorial collection. Each strategy is identified by a row/column index, so the combinatorial sense is no longer observable in the matrix. But the outcome of these relationships is not lost with abstraction: it remanifests itself through the interactions of strategies in algorithms working with normal form.

Furthermore, when a payoff function of a player maps two of his distinct strategies to identical vectors of payoffs, these two strategies, differentiable by their structures in the game's natural setting, are no longer differentiable in normal form. This subtle mismatch between the representation of strategies and their effects creates a bit of havoc in algorithms dealing with normal form. For example, in the case of Lemke and Howson's algorithm [29], the pivoting step requires the game to be non-degenerate, so the algorithm will not work for normal form games with duplicate rows/columns, although this can be addressed by introducing tiny payoff perturbations into the game. In the case of fictitious play, a strategy is selected on the merit of its payoff. The payoffs of two strategies may be identical, but the sub-game structure which the chosen strategy locks down to certainly is not, and it has an impact on the execution path of subsequent iterations: one may lead to termination, the other may drift into cycles. We will see more of this in detail when we discuss fictitious play over parity games in later chapters.

Lastly, the number of strategies is combinatorially related to the size of the game; for example, in a densely connected network, the number of routes from one node to another is combinatorially related to the number of nodes. Any algorithm working with normal form will at best be an algorithm with exponential worst-case runtime, since it is expected to traverse each strategy at least once. If any generic algorithm on normal form is sought as an efficient solver, it will be essential to find a way to import structural information about strategies from the domain where the game is naturally defined, such that, with a bit of luck, an exponential reduction of the strategy space can be achieved.

3.3.2 Extensive Form

The extensive form of games is also called graphical form in the literature. It interprets a game as a decision tree. Each node is owned by one of the players, and the owner of a node may decide which subtree to visit. Every possible play is represented by a unique sequence of actions from the root node to one of the terminal nodes. Each terminal node is associated with a payoff for each player. A strategy of a player is a unique combination of actions taken by the player at the nodes owned by him.

Figure 3.2 shows in extensive form the two games discussed previously. Extensive form is usually a more faithful representation of games which are sequential in nature, because it preserves the structure of strategies and the sequence of actions in strategies. However when the strategy of a highly abstract game does not have those structures, like in the case of the two examples in Figure 3.2, extensive form creates a bit of a problem because it enforces an artificial order onto actions, in this case player 0 gets to move first followed by player 1. This is not desirable when the intended interpretation is for both players to select their strategies simultaneously. The artificial order assigned by extensive form grants more information to one player than to the other.

To correct this conceptual mismatch in extensive form, the notion of information set is introduced. The nodes linked by the vertical dotted line in Figure 3.2 are said to be in the same information set. These nodes are owned by the same player and have the same set of actions. Being in the same information set means that the owner of the nodes cannot distinguish them within the information set, hence he cannot rely on knowledge of decisions made in the past. Because of this, the player has to assume that any path leading to any instance of a node in the information set is possible, which forces the player to play a mixed strategy.

Extensive form is good at revealing dominant and memoryless strategies, it is also more intuitive than normal form when verifying the correctness of a solution. However it suffers from an exponential blowup in space because not only all strategies, but also all interactions of strategies are represented explicitly.

3.4 Normal Form and Fictitious Play

There are several algorithms designed and studied to work with normal form; they all encapsulate learning through histories of repeated plays. In [6] Brown suggested an iterative method called fictitious play for computing mixed-strategy Nash equilibria in 2-person 0-sum games. The method starts from a pure strategy of a player as an initial approximation and returns one Nash equilibrium of the game.

[Figure: extensive-form game trees omitted]
(a) Game from Example 3.1.2 in extensive form. (b) Game from Example 3.2.3 in extensive form.

Figure 3.2: Game Examples in Extensive Form


Here we summarize the main points of this method. Two players with opposing payoff objectives take turns finding the next best pure counter-strategy against the frequency mix of strategies employed by their opponent in the past. We write down the two sequences of strategies as H0 for the history of player 0's strategies and H1 for the history of player 1's strategies. Each strategy generated is indexed by the iteration number. The superscript * indicates that these strategies are generated from a fictitious play process with unbounded recall length. Other variations of the method are also possible with bounded lengths of H0 and H1.

H0 = σ0*, σ1*, σ2*, ..., σt*, ...        (3.1)
H1 = π0*, π1*, π2*, ..., πt*, ...        (3.2)

Except for σ0*, which is randomly chosen at the beginning, any other strategy (i.e. best reply) generated in the two sequences at any iteration t ≥ 0 is given by the following selection criteria:

πt* = arg min_π Σ_{i=0}^{t} ⟨σi*, π⟩        σt* = arg max_σ Σ_{i=0}^{t−1} ⟨σ, πi*⟩        (3.3)

In Equation 3.3, the 'arg' function selects and returns the argument strategy π or σ which generates the minimum or maximum total payoff for the player doing the best response against all strategies of his opponent seen in the history of the play. Correspondingly, the payoff obtained by the strategy selected in Equation 3.3 against the frequency mix in Equations 3.1 and 3.2 is given by the equations:

Eσt* = (1/t) Σ_{i=0}^{t−1} ⟨σt*, πi*⟩        Eπt* = (1/(t+1)) Σ_{i=0}^{t} ⟨σi*, πt*⟩        (3.4)

Since payoff is a concept similar to the play profiles defined in Definition 2.2.13 in Chapter 2, we treat the notation ⟨σ, π⟩ polymorphically and borrow it here and in subsequent chapters to represent the payoff received by player 0 given the play (σ, π).

In Equation 3.3, the notation ⟨σ, πi*⟩ is player 0's payoff in the play consistent with strategies σ and πi*, and arg max extracts the strategy σ of player 0 as the best response σt* which gives the maximal cumulative payoff against all recorded player 1 strategies seen in the history so far (π0*, π1*, π2*, ..., π(t−1)*). This frequency mix simply takes each pure strategy into account in the same manner, and therefore reflects the frequency with which a pure strategy occurs in that sequence. The equilibrium game value E, if there is any, always lies between the upper bound Eσt* and the lower bound Eπt* at any iteration t. As t approaches infinity, the method was observed empirically to converge to optimal strategies in Brown's notes [6]; this fact was later proved formally by Julia Robinson in [35] for 2-player 0-sum games.

In Julia Robinson's proof, an arbitrary ǫ is used to facilitate the proof of convergence. Mathematically, ǫ is not needed if we are willing to forego termination and allow the number of iterations t to drift to infinity. However, for an implementation a sensible ǫ should be selected for termination; this often requires domain knowledge of the problem, and such choices will be discussed in Chapter 5 when we formally translate parity games into 2-person 0-sum games in normal form.

3.4.1 The Fictitious Play method FP*

The outline of the fictitious play method is extracted from Robinson's proof [35] and is shown in Figure 3.3. The best payoffs each player could respectively aim for in the game are first initialized as max = +∞ for player 0 (line 2) and min = −∞ for player 1 (line 3). The histories of strategies chosen by the players are recorded in H0 for player 0 and H1 for player 1 in the form of lists; each subsequent strategy generated is appended to the end of the corresponding list (lines 10, 13). Player 0 first chooses a pure strategy σ (line 4); then player 1 finds a pure counter-strategy π which gives the best overall payoff for player 1 against σ (line 8) and records it (line 10) along with the best payoff seen for player 1 so far (line 9). Line 8 of the method, the best reply BR, satisfies the first equation in Equations 3.3; line 9 of the method, the payoff obtained by the strategy generated in line 8, satisfies the second equation in Equations 3.4.

Similarly, in lines 11 to 13, the players switch roles. From there onwards each player takes turns finding and recording a pure strategy with the best payoff against the frequency mix of the opponent strategies they have seen so far, until the gap between max and min is closed. When this method terminates, it returns the histories of strategies selected by the two players, which is equivalent to the pair of mixed equilibrium strategies.
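The loop of Figure 3.3 translates almost literally into code. The sketch below is a plain-matrix illustration with unbounded recall: BR is the exhaustive best response against the opponent's recorded history, ties go to the first best strategy encountered (the convention mentioned below), and ǫ is a caller-supplied tolerance. None of this is the parity-game-specific BR of Chapter 6; all names are illustrative.

import java.util.ArrayList;
import java.util.List;

// Fictitious play with unbounded recall on player 0's payoff matrix m.
// Player 0 (rows) maximizes, player 1 (columns) minimizes.
final class FictitiousPlay {

    // Best row against the recorded column history (maximizing average payoff).
    static int bestRow(double[][] m, List<Integer> colHistory) {
        int best = 0;
        double bestAvg = Double.NEGATIVE_INFINITY;
        for (int r = 0; r < m.length; r++) {
            double avg = averageAgainstCols(m, r, colHistory);
            if (avg > bestAvg) { bestAvg = avg; best = r; }   // first best wins ties
        }
        return best;
    }

    // Best column against the recorded row history (minimizing average payoff).
    static int bestCol(double[][] m, List<Integer> rowHistory) {
        int best = 0;
        double bestAvg = Double.POSITIVE_INFINITY;
        for (int c = 0; c < m[0].length; c++) {
            double avg = averageAgainstRows(m, rowHistory, c);
            if (avg < bestAvg) { bestAvg = avg; best = c; }
        }
        return best;
    }

    static double averageAgainstCols(double[][] m, int row, List<Integer> cols) {
        double sum = 0;
        for (int c : cols) sum += m[row][c];
        return sum / cols.size();
    }

    static double averageAgainstRows(double[][] m, List<Integer> rows, int col) {
        double sum = 0;
        for (int r : rows) sum += m[r][col];
        return sum / rows.size();
    }

    static void run(double[][] m, double epsilon, int maxIterations) {
        List<Integer> h0 = new ArrayList<>(List.of(0));   // initial pure strategy of player 0
        List<Integer> h1 = new ArrayList<>();
        double max = Double.POSITIVE_INFINITY, min = Double.NEGATIVE_INFINITY;
        int t = 0;
        while (max - min > epsilon && t++ < maxIterations) {
            int pi = bestCol(m, h0);
            min = averageAgainstRows(m, h0, pi);          // what player 1 can hold player 0 to
            h1.add(pi);
            int sigma = bestRow(m, h1);
            max = averageAgainstCols(m, sigma, h1);       // what player 0 can secure
            h0.add(sigma);
            System.out.printf("t=%d min=%.2f max=%.2f%n", t, min, max);
        }
    }

    public static void main(String[] args) {
        run(new double[][]{{4, 3}, {5, 2}}, 0.0, 10);     // Example 3.1.2: the gap closes in one iteration
    }
}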

Example 3.4.1. Illustration of FP* (Figure 3.3) on the game from Example 3.1.2.

• Before the while loop, max and min are set to positive and negative infinity, and a random strategy is assigned to σ for player 0; in this case UP is selected. σ is recorded in the history H0 of player 0. Since ǫ is normally used to bound the accuracy of a solution and it does not matter for this simple game, we use 0 for ǫ.

• In the first iteration, given the payoff matrix for the game from Example 3.1.2 and the strategy history of player 0, the best response π is RIGHT and the new min is 3. π is recorded in H1. Given that the strategy history of player 1 is [RIGHT], the best response σ is again UP and the new max is also 3.

• The interval between max and min is closed in the first iteration and the algorithm terminates, returning ⟨UP, RIGHT⟩.

1: FP(M)
2:   max = +∞
3:   min = −∞
4:   σ = random player 0 strategy
5:   H0 = [σ]
6:   H1 = []
7:   while (max − min > ǫ) do
8:     π = BR(M, H0)
9:     min = (1/|H0|) × Σ_{σ∈H0} ⟨σ, π⟩
10:    H1 = H1 ++ [π]
11:    σ = BR(M, H1)
12:    max = (1/|H1|) × Σ_{π∈H1} ⟨σ, π⟩
13:    H0 = H0 ++ [σ]
14:  end while
15:  return (H0, H1)

Figure 3.3: The method of Fictitious Play for game matrix M

Although the method terminates and converges to pure strategies for this example, it is worth pointing out that if a different initial strategy is selected for player 0, it will never terminate.

Example 3.4.2. Illustration of FP* (Figure 3.3) on the game from Example 3.2.3.

• The same setup is used for this game as in the previous example.

• In iteration 1, π is still RIGHT and min is still 3, but σ is updated to DOWN and max is updated to 5.

• In iteration 2, the pure best response against the mix of [UP, DOWN] is LEFT, with min = (4+2)/2 = 3. Given the history of player 1 [RIGHT, LEFT], both UP and DOWN of player 0 give the same average payoff 3.5; the convention adopted in Robinson's proof is to select the first strategy encountered which gives the best payoff in tie-breaking situations. Hence UP is selected, and max is updated to 3.5.

• In iteration 3, the best response against the mix of [UP, DOWN, UP] is LEFT and min = (4+2+4)/3 ≈ 3.3. The best response against the mix of [RIGHT, LEFT, LEFT] is UP and max = (3+4+4)/3 ≈ 3.7.

• In iteration 4, the best response against the mix of [UP, DOWN, UP, UP] is LEFT by convention and min = (4+2+4+4)/4 = 3.5. The best response against the mix of [RIGHT, LEFT, LEFT, LEFT] is UP and max = (3+4+4+4)/4 ≈ 3.8.

• The method continues forever here when ǫ is set to 0.

The first thing to recognize about this method is that the runtime complexity of each BR call is typically exponential with respect to the size of a parity game (number of vertices), because it traverses the full set of strategies of one player, and the number of strategies of each player is combinatorially related to the number of arcs (vertices) in a parity game.

However, as we have mentioned, the normal form is less refined than the natural form, and the structure of strategies from the natural form can be selectively imported into the method through BR. This creates a lot of room for variety and improvement in concrete implementations of BR and gives us a chance to reduce the runtime complexity of BR evaluations from exponential to polynomial, if not in theory then at least in practice. This technique is indeed exploited by our implementation of BR for parity games, which will be seen in Chapter 6.

It is an interesting idea that games, with their natural game rules and strategy structure hidden away, can be solved by a simple and natural process through the interaction of strategies and learning. However, it is also quite clear that this method is a continuous approximation to convergence and almost never terminates exactly. Robinson's proof and fictitious play rely on the definition and a choice of ǫ to function. Since computer scientists do not have the luxury of working with infinity, one cannot predict what value should be chosen for ǫ. Even if an appropriate value could be selected, convergence is not strictly monotonic, so setting ǫ to any value other than 0 would run the risk of terminating prematurely.

There is also the problem of the residual effect of sub-optimal strategies from past iterations. For termination, fictitious play requires not only that all strategies appearing in the history come from the support of the equilibrium strategies of both players, but also that both players discover their equilibrium strategy composition synchronously in the same iteration (lines 7, 9, 12). However, in some games this may never happen in a finite number of iterations. As a result, fictitious play would never terminate exactly except at infinity, or only with an approximation after a huge number of iterations.

These might be the reasons that the method was widely referenced but not further developed after its conception back in the 1950s. But researchers did want to find out whether this generic method applies to other classes of games. This is known as the fictitious play property (FPP) [4], FPP is said to hold for a game if fictitious play converges on the game starting from any strategy. This property holds for 2-player 0-sum games due to Robinson’s afore mentioned result.

Throughout the 70s to the 90s, several other classes of games were identified to share the property FPP. Some of them include games with symmetric or identical payoff functions, ordinal potential games, non-degenerate games, etc. These seemingly unrelated classes need a common interpretation. It has been shown by Monderer and Sela in [30] that FPP does not hold in general for all games, and in particular not for degenerate games.

However, Monderer and Sela's proof is quite specific and relates to a subtle point in Robinson's proof about resolving tie-breaking situations. For non-convergence to happen, not only do the games have to be degenerate; in tie-breaking situations, the behaviour of the best reply of at least one player also has to be non-deterministic or constantly changing for no good reason (i.e. such changes are not driven by better payoffs), so that the learning of the other player in the past becomes irrelevant.

In practice, games which do not have FPP can often be made to converge by using some simple techniques. Any deterministic and lazy evaluation algorithm will normally converge for any game, since both players will behave consistently in tie-breaking situations. The original paper adopted this technique implicitly; other techniques may make use of explicit tie-breaking rules, introduce lexicographically deterministic or stochastic perturbations, and so on, as discussed in section 3.6 of [33]. These methods all work by refining the payoff from numerals to objects with more complex structures; this is similar to what we will do with play profile refinement on parity games in Chapter 5.

3.4.2 Total vs. Fixed Length Recalls

We discussed fictitious play in its original flavour and problems associated with it. We refer to the original version of fictitious play with the notation FP*, where both players record the entire history of the opponent’s strategies. Variations of fictitious play with other recall lengths can and have been studied [38], [34]. For example FP1 allows each player to remember only the last strategy played by the opponent. Fictitious play of other fixed recall lengths k > 1 are defined analogously.

We will study fictitious play of fixed recall length and compare them with fictitious play of unbounded recall lengths using strategy graphs. Nodes in the strategy graph are strategies of players, each node/strategy is identified by the label marked on it. Each directed edge between two nodes represents best response computed by fictitious play of fixed length given any strategy whether it is pure or mixed. The numbers decorating the edge represent the best payoff secured by the strategy at the source node for its owner. The min player secures the upper bound of the interval, and the max player secures the lower bound of the interval. We start by looking at the strategy graph of FP1 on some of the examples studied.


Figure 3.4: Strategy graphs of FP1 for Example 3.1.2 and Example 3.2.3

In Figure 3.4, the strategy graph of Example 3.1.2 is shown on the left and that of Example 3.2.3 is shown on the right. There is a converging cycle of length 2, U and R, in the strategy graph of the game in Example 3.1.2: both players detect and confirm each other's best payoff and best strategy as optimal. Another feature in the graph is that a play always flows into a sink no matter where it is started. This is expected because for any fictitious play with fixed recall length there are only finitely many distinct mixed strategies and therefore finitely many nodes in the strategy graph. Since there is an out-going edge from each node, by the pigeonhole principle there is always going to be at least one sink cycle in the graph. There can be more sinks in strategy graphs of larger games, which we will encounter later.

Cycles in a strategy graph can be converging in the case of game in Example 3.1.2, in which case the algorithm terminates returning the witness strategy pair, or it can be non-converging as in the case of the game in Example 3.2.3, and FP1 will continue for ever in a non-converging cycle of length 4. We go on to look at the strategy graph of this game using FP2 in Figure 3.5.


Figure 3.5: Strategy graphs of FP2 for Example 3.2.3, equilibrium strategies of player 0 are coloured in green, no equilibrium strategy of player 1 is detected

From previous examples we already know the mixed strategy of R and L is optimal for player 1, they are shaded in a different colour in the strategy graph. Computation of FP2 passes through two occurrences of this combination unaware of them being the optimal strategy because the best payoff 3.5 secured by player 1 is not confirmed by player 0 from the other side of the interval. Again no matter where the play is started, it continues for ever in a cycle of length 8 and FP2 again fails to terminate for this game.

Another observation in the strategy graph with FP2 is that the mixed strategy UU appears twice in the graph, whereas the combination of pure strategies DD never appears. UU occurs twice because the play is not memoryless: depending on the mixed strategy seen in the previous step, UU will move to a different node. Fictitious play together with a fixed recall length and the payoff matrix can be seen as a set of grammars which generate well-formed strings for plays. Not all strings are well formed, as in the case of DD. When the recall length gets longer, even more strings will be rejected by the language. However, since the set of strings rejected is much smaller than the set accepted, it will be interesting to see whether the set of excluded strings could be used to classify games such that they could be solved more efficiently. Because FP3 is similar to FP2 and does not generate any progress for this particular game, we will skip it and look at FP4 directly in Figure 3.6.


Figure 3.6: Strategy graphs of FP4 for Example 3.2.3, equilibrium strategies of player 0 are coloured in green, and equilibrium strategies of player 1 are coloured in red.

At this juncture, since we are more interested in the sink stage of a strategy graph, in order to reduce the size of the graph and make it more readable I have omitted the play developments from strategy histories of length 1 up to the first occurrences of strategy histories of length 4; these developments are not essential to our discussion, are deterministic, and can easily be worked out by hand. Again, the optimal compositions of strategies are marked in different colours for each player. This time, depending on where the game is started, FP4 may or may not terminate. If the game is started from L, the method will terminate by the "second" iteration because both players synchronously discover and confirm each other's optimal strategy and payoff. However, if the game is started from any of the nodes U, D or R, FP4 will fail to terminate because within the sink cycle of length 16 around the border of the graph the two players' best responses never get the chance to be synchronized. As a matter of fact, it won't converge even if we increase the recall length further.

1: FP(M)
2:   maxg = +∞
3:   ming = −∞
4:   σ = random player 0 strategy
5:   σg = σ
6:   πg = null
7:   H0 = [σ]
8:   H1 = []
9:   while (maxg − ming > 0) do
10:    π = BR(M, H0)
11:    min = (1/|H0|) × Σ_{σ∈H0} ⟨σ, π⟩
12:    if min > ming then
13:      ming = min
14:      σg = H0
15:    end if
16:    H1 = H1 ++ [π]
17:    σ = BR(M, H1)
18:    max = (1/|H1|) × Σ_{π∈H1} ⟨σ, π⟩
19:    if max < maxg then
20:      maxg = max
21:      πg = H1
22:    end if
23:    H0 = H0 ++ [σ]
24:  end while
25:  return (σg, πg)

Figure 3.7: FP modified for termination with bounded recall lengths

This observation immediately suggests a small improvement to the fictitious play method to give us a terminating algorithm when used with fixed recall lengths. The modified algorithm is given in Figure 3.7. The trick is to allow each player to find and cache their globally best response so far at each stage, so that as soon as their global best payoffs confirm each other, the algorithm can terminate with the cached pair of optimal strategies. The resulting algorithm is no longer an approximation, and ǫ is replaced by exactly 0.

Another observation from the strategy graphs of the game in Example 3.2.3 is that for any FPi, the length of a non-converging cycle is at least 4i. This observation can be generalized to any recall lengths with any number of strategies. We will touch this topic again when we discuss the interesting distribution of parity games not solved by fictitious plays of a particular recall length in Chapter 5.

Naturally we want to ask what is the minimal recall length required to solve a game in general, because the algorithm partly permutes through all combinations of pure strategies of a player up to the size of the recall length, and the runtime of the algorithm grows exponentially with the recall length. For this question, the recall length obviously has to be at least as large as the size of the support for each player. For example, if the equilibrium strategy of player 0 requires 4 distinct strategies in its support, we are not going to detect this equilibrium strategy with the algorithm using recall length 3.

On the other hand, finding an equilibrium mixed strategy is about finding the proportions of the mix. The recall length required to solve a game is therefore also related to the sum of the integer coefficients of the minimal discrete representation of the mixed equilibrium strategy. If the recall length is smaller than the minimal sum of integer coefficients over all support strategies, the algorithm will never be able to reach the optimal composition discretely.

To make this intuition concrete, take an abstract game in which the equilibrium strategy of player 0 has two pure strategies σ1 and σ2 in its support. The known equilibrium mixed strategy requires 2 portions of σ1 and 7 portions of σ2. For such a game we are not going to detect the equilibrium with fictitious play of recall length smaller than 9, simply because any smaller recall length will not allow the algorithm to permute through enough combinations of pure strategies to detect this proportion.

For example, fictitious play of recall length 3 will not solve the game in Example 3.2.3, since the equilibrium mix requires UP and DOWN in the proportion 3 to 1 and the corresponding sum of integer coefficients is 4. However, we can only take this as a lower bound on the recall length required and not as an upper bound, because the presence of a non-converging cycle at a particular recall length does not necessarily imply convergence to mixed strategies. This is illustrated by the game in Figure 3.8.

        LEFT   RIGHT   Y
UP      4      3       3.5
DOWN    2      5       2.5
X       3.8    4       3.6

Figure 3.8: A game of pure equilibrium but with non-converging cycles when running in FP1

In this game, the only equilibrium is at ⟨X, Y⟩, and it is a pure strategy equilibrium. However, when FP1 is used to compute the equilibrium, there will be a non-converging cycle of length 4, exactly like in the game in Example 3.2.3, when the play is started from any of the nodes UP, DOWN, LEFT, RIGHT. The game is immediately solved by FP2, when player 0 discovers a better pure strategy with the longer memory and is able to lift herself from the non-converging sink of a lower rank.

With these ideas in mind, a complete and exact algorithm can be designed by successively chaining fictitious plays of increasing recall lengths. The incremental algorithm will be at least as efficient as any approximating fictitious play with total recall; this will be demonstrated experimentally in later chapters when the algorithm is adapted for solving parity games. Fictitious play of bounded recall, when chained together in this way, guarantees discrete termination, even though it may sometimes take a detour to longer recalls, as in the case of the game in Figure 3.8.

This algorithm is in principle similar to other well-known algorithms such as support enumeration [33] (chapter 3) and Lemke and Howson's algorithm [29], which will not be discussed here. However, successive chaining of fictitious play should perform better in general because the search is selective and directional: not all combinations of strategies are well formed and searched with this method.
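A driver for the chaining idea can be as simple as trying increasing recall lengths until one of them terminates. The sketch below is only an outline: fpWithRecall is a hypothetical bounded-recall variant in the spirit of Figure 3.7, assumed to detect non-converging strategy cycles within an iteration budget; it is stubbed out here and is not the thesis implementation.

import java.util.Optional;

// Outline of successive chaining of fictitious play with growing recall length k.
final class ChainedFictitiousPlay {

    // Result of one bounded-recall run: empty if a non-converging cycle was detected.
    static Optional<double[]> fpWithRecall(double[][] payoff, int recallLength, int budget) {
        // ... bounded-recall fictitious play with cycle detection would go here ...
        return Optional.empty();                      // stub: pretend no convergence
    }

    static Optional<double[]> solve(double[][] payoff, int maxRecall, int budget) {
        for (int k = 1; k <= maxRecall; k++) {        // try recall lengths 1, 2, 3, ...
            Optional<double[]> result = fpWithRecall(payoff, k, budget);
            if (result.isPresent()) return result;    // first converging recall length wins
        }
        return Optional.empty();                      // give up beyond maxRecall
    }

    public static void main(String[] args) {
        double[][] m = {{4, 3, 3.5}, {2, 5, 2.5}, {3.8, 4, 3.6}};  // the game of Figure 3.8
        System.out.println(solve(m, 4, 1000).isPresent());         // false with the stub above
    }
}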

FP converges to pure strategies for a game with payoff matrix M if and only if such pure equilibrium strategies exist in the matrix. In particular, with reference to games like the one in Figure 3.8, it can be said that FP is not a particularly efficient algorithm when it is known beforehand that pure optimal strategies exist for the problem. On the other hand, it is known that Nash equilibria for 2-person 0-sum games can be computed efficiently with linear programming. In each iteration of the algorithm, FP traverses the entire row or column of the matrix for all columns/rows registered in the history, so the algorithm will access each element of the matrix at least once, in addition to the cumulative effect of the initial sub-optimal strategies. Linear programming techniques, however, only need to access each entry of the matrix once when a pure strategy equilibrium exists.

The advantage of FP is in finding mixed strategies, more specifically in finding the composition of mixed strategies. So fictitious play is a good candidate when the problem is not known to converge only to pure strategy equilibria, or when it is impossible to enumerate the strategies, such as in a learning environment where the strategy space grows over time, or when, due to the constraints of the environment, the full set of strategies is unknown at any stage. This being said, fictitious play chaining is still an interesting albeit inefficient algorithm, and it is worth exploring as a practical solution because we will see in Chapter 5 that the distribution of games unsolved² by fictitious play up to certain recall lengths favours quick termination in general.

² A game not solved by fictitious play due to the presence of strategy cycles does not imply non-termination. These strategy cycles can be detected by keeping a list of strategies seen.

3.5 Extensive Form and Backward Induction

Games in extensive form can be solved using backward induction. This works by the players taking turns to evaluate their decisions and payoffs from the terminal nodes back to the root node. Each player will choose an optimal decision based on the payoffs already evaluated at all of his current successor nodes, and this action and its outcome are propagated backward to its predecessor in the chain until the action at the root node is also determined.
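Backward induction itself is a short recursion once the tree is built. The sketch below is an illustration over numeric payoffs as in the normal-form examples (player 0 maximizes, player 1 minimizes); it is not the parity-priority comparison used later in Figure 3.9, and the types and names are invented for this illustration.

import java.util.List;

// Backward induction over an explicit game tree: leaves carry payoffs,
// internal nodes are owned by player 0 (maximizer) or player 1 (minimizer).
final class BackwardInduction {

    static final class Node {
        final int owner;            // 0 or 1; ignored for leaves
        final double payoff;        // used only when children is empty
        final List<Node> children;

        Node(int owner, double payoff, List<Node> children) {
            this.owner = owner;
            this.payoff = payoff;
            this.children = children;
        }
    }

    static double evaluate(Node n) {
        if (n.children.isEmpty()) return n.payoff;             // terminal node
        double best = n.owner == 0 ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (Node child : n.children) {
            double v = evaluate(child);                        // value propagated upward
            best = n.owner == 0 ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        // Example 3.1.2 unfolded: player 0 moves first, then player 1.
        Node up = new Node(1, 0, List.of(new Node(0, 4, List.of()), new Node(0, 3, List.of())));
        Node down = new Node(1, 0, List.of(new Node(0, 5, List.of()), new Node(0, 2, List.of())));
        Node root = new Node(0, 0, List.of(up, down));
        System.out.println(evaluate(root));                    // 3.0
    }
}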

Since a play starting from any vertex in a parity game can be treated exactly as a sequential game, it fits the model of extensive form perfectly. We will study backward induction using the parity game given in Example 2.1. But before doing that, the game needs to be translated into extensive form.

The transformation is accomplished by unfolding the game graph into a tree using depth-first search. Starting from the initial vertex of the play, each out-going edge is probed in turn; the first vertex encountered twice along the same path marks the leaf node of a branch, and the search then backtracks to find alternative branches and paths until all branches are explored. The payoff associated with each leaf node is the highest priority occurring between the earlier occurrence of that vertex and the leaf node on the same path. Using Example 2.1 for our study, its extensive form is shown in Figure 3.9.
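The unfolding step can be sketched as a depth-first traversal that closes a branch as soon as a vertex repeats on the current path and records the highest priority seen since that vertex's first occurrence. The class below is an illustration with adjacency lists and a priority array, with invented names and a toy game in main; it is not the thesis workbench code.

import java.util.ArrayList;
import java.util.List;

// Unfold a parity game graph into a finite tree by depth-first search.
// A branch is closed when a vertex reappears on the current path; the
// branch's payoff is the maximal priority on the induced cycle.
final class Unfolding {

    static final class TreeNode {
        final int vertex;
        final List<TreeNode> children = new ArrayList<>();
        Integer cyclePayoff;                      // set only for leaves

        TreeNode(int vertex) { this.vertex = vertex; }
    }

    static TreeNode unfold(List<List<Integer>> successors, int[] rho, List<Integer> path, int v) {
        TreeNode node = new TreeNode(v);
        int firstOccurrence = path.indexOf(v);
        if (firstOccurrence >= 0) {
            // Cycle closed: payoff is the highest priority from the first
            // occurrence of v on the path up to this repeated occurrence.
            int max = rho[v];
            for (int i = firstOccurrence; i < path.size(); i++)
                max = Math.max(max, rho[path.get(i)]);
            node.cyclePayoff = max;
            return node;                          // leaf node
        }
        path.add(v);
        for (int w : successors.get(v))
            node.children.add(unfold(successors, rho, path, w));
        path.remove(path.size() - 1);             // backtrack
        return node;
    }

    public static void main(String[] args) {
        // A two-vertex toy game: v0 -> v1 -> v0, with priorities 1 and 2.
        List<List<Integer>> succ = List.of(List.of(1), List.of(0));
        int[] rho = {1, 2};
        TreeNode root = unfold(succ, rho, new ArrayList<>(), 0);
        System.out.println(root.children.get(0).children.get(0).cyclePayoff); // 2
    }
}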

After the extensive form of the game is obtained, backward induction starts by examining each leaf node in turn.

We look at a concrete branch to illustrate the operation of backward induction. The branch we want to examine is v0 → v1 → v2 → v5 → v2. The branch terminates at v2, after 4 arcs expanding from v0, because a cycle v2 → v5 → v2 is encountered. In this cycle the maximal priority (among v5 and v2) is 4. Hence the arcs v2 → v5 and v5 → v2 are labeled with 4.

However, at the first occurrence of v2 in this branch, there are two more out-going arcs at v2 available for expansion, one of them into v0 and the other into v1. Since vertices v0 and v1 both induce a cycle of maximum priority 5, the arcs v2 → v1 and v2 → v0 are labeled with 5.

Because vertex v2 is owned by player 0, and 4 is more favourable than 5 to player 0, the sub-branch v2 → v5 → v2 is selected and highlighted (with darkened edges). Accordingly, priority 4 is propagated back and labeled onto the preceding arc v1 → v2. At vertex v1 in this branch, there is also another out-going arc v1 → v0 with payoff 5 competing with the current branch. As vertex v1 is owned by player 1, and priority 5 is more favourable than priority 4 to player 1, the sub-branch v1 → v0 is selected and highlighted (with darkened edge). This priority 5 is then propagated back to the preceding arc v0 → v1.

Other branches in the graph are evaluated in a similar manner. Every time a branch node is evaluated, the chosen branch is highlighted in boldface, and the priority induced by the branch is propagated back towards the root node in turn.

When tie-breaking situations appear in extensive form, an arbitrary choice is made, and when this happens the selected arc is highlighted in blue. After all the nodes are determined, we can read the equilibrium strategy prescribed in extensive form by following the chosen arcs. This results in the readings v0 → v1, v1 → v0 being selected. However, if we want to determine the actions at other vertices of the game, we have to translate the game graph again starting from other initial vertices, although this time the game graph is smaller since two vertices are already determined.

Extensive form often induces memory-dependent pure strategies on nodes in the same information set. This means that games with a mixed equilibrium will see inconsistent pure strategies selected at different occurrences of nodes in the same information set. However, if it is known that a pure memoryless strategy exists, a unique consistent decision can be made for all information sets. This can be used as a principle for resolving tie-breaking situations. In the case of v1, the edge v1 → v0 can be selected for all its occurrences throughout the graph.


Figure 3.9: Search path of backward induction working with extensive form of the game in Example 2.1 starting from vertex V0

4 Experimental Infrastructure

The research carried out in this thesis is largely driven by experiments, so we devote this chapter to setting up the infrastructure behind that research. The workbench is not in a state in which it can be made publicly available as an open-source project at the time of writing. The first section of this chapter briefly surveys the programming environment, third-party tools and hardware involved in carrying out the research and presenting results, which may be relevant for evaluating results from experiments. The second section discusses the data sets implemented, collected and used for testing and comparing algorithms as well as for validating queries and conjectures. The third section introduces a query language we have implemented to extract algorithmic information from parity games in the data sets. The fourth section outlines the workbench we built in conjunction with the query language, which facilitates streamlined test data creation, retrieval and analysis. The rest of the chapter describes how this workbench is used in practice.

4.1 Tools, Software and Environment

Implementation work in this research is written entirely in Java. Unless otherwise stated, most of the experiments are conducted on a 64-bit Intel i5 machine with 8GB of memory. All algorithms are implemented single-threaded, so one core of the processor is used at a time for computation. For communicating parity games and solutions visually, we came across the open-source project GraphViz during the early stage of the research. GraphViz was initially an AT&T project and had long been deprecated at the point when this research started; therefore we inherited only the DOT description format from the project and rolled out our own graph layout algorithm in Java.

An example of the description format can be found in Figure 4.6, where the description is shown to the left of the actual game graph. Even though our implementation supports several different description formats, for legacy reasons games from all the data sets in our research are translated into and recorded in the DOT format, each in a standalone text file.
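For illustration, a small parity game could be serialised to the DOT format along the lines of the Java sketch below. The attribute conventions shown (owner encoded by node shape, priority in the label) are a hypothetical encoding and not necessarily the exact layout in which our data sets are recorded.

    // Sketch: write a parity game as a DOT digraph. Owners and priorities are
    // encoded here via node shape and label, purely as an illustration.
    final class DotWriterSketch {

        static String toDot(int[] priority, int[] owner, int[][] successors) {
            StringBuilder sb = new StringBuilder("digraph G {\n");
            for (int v = 0; v < priority.length; v++) {
                String shape = owner[v] == 0 ? "circle" : "box";   // player 0 / player 1
                sb.append(String.format("  v%d [shape=%s, label=\"v%d:%d\"];%n",
                                        v, shape, v, priority[v]));
            }
            for (int v = 0; v < successors.length; v++)
                for (int w : successors[v])
                    sb.append(String.format("  v%d -> v%d;%n", v, w));
            return sb.append("}\n").toString();
        }
    }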

4.2 Test Data

Parity games in our data sets come mainly from three distinct sources. The first source is a collection of families of games defined in PGSolver1 [18] by Oliver Friedmann and Martin Lange. Around 10,000 games are generated using their tool and imported into our database; these games range from dozens of vertices to a few thousand vertices. They are often used at the last stage of a query or experiment to verify that a solver is efficient and robust on larger games known to exhibit worst-case runtime for well-studied algorithms.

The second source is the full set of unique 4 vertices and 6 vertices games. These games are generated as a Cartesian product of all unique non-isomorphic graph instances and all unique permutations of vertex priority assignments. All algorithmic queries are always tested against this source for correctness. The 4 vertices data set contains over 403,000 game instances and occupies about 20GB of disk space. The 6 vertices data set contains around 7,000,000 game instances; for practical reasons, they are not recorded as static files but are generated on the fly when used. Experiments using the 6 vertices data set only record interesting cases when traversing the entire set. These games are often used first to test the correctness of prototype algorithms as they are predictable, small and easy to verify by hand. A portion of the games are built into the source code as part of the fitness tests of our implementations.

The third source brings together several classes of randomly generated games. These games are generated in a spirit similar to the Fisher-Yates algorithm for generating random permutations of a finite set.

The most important class from this source consists of 11 groups of games with 6, 8, 10, 12, 24, 32, 64, 128, 256, 512, and 1024 vertices. In each vertex group, around 200,000 games are generated, and in each game both players own the same number of vertices. The generation procedure goes as follows. At the

1Publicly available via http://wwwtcs.ifi.lmu.de/pgsolver.

beginning, n unique priorities are elected, where n is the number of vertices in the game; these priorities are each assigned randomly to one of the vertices in the game. Afterwards, for each number k from log n to log n², we generate a set of games with k outgoing arcs at each vertex while avoiding self-loops and dead ends; random arcs are added to the game until the out-degree is saturated at each vertex. The size of each set is chosen so that the total number of game instances in each vertex group is roughly 200,000.
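A sketch of this generation procedure, under the stated constraints (no self-loops, no dead ends, k outgoing arcs per vertex, with k at most n − 1), might look as follows in Java. The names are hypothetical and this is not the exact generator used to produce the data sets; it only illustrates the two steps of shuffling priorities onto vertices and drawing random arcs.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // Sketch of the random game generator: n distinct priorities are shuffled
    // onto the n vertices, and every vertex receives k distinct random targets
    // other than itself, so the game has out-degree k and no dead ends.
    final class RandomGameSketch {

        static int[][] randomGame(int n, int k, long seed, int[] priorityOut) {
            Random rng = new Random(seed);

            // Fisher-Yates style shuffle of the priorities 0..n-1 onto vertices.
            List<Integer> prios = new ArrayList<>();
            for (int p = 0; p < n; p++) prios.add(p);
            Collections.shuffle(prios, rng);
            for (int v = 0; v < n; v++) priorityOut[v] = prios.get(v);

            // k random outgoing arcs per vertex, avoiding self-loops and duplicates.
            int[][] arcs = new int[n][k];
            for (int v = 0; v < n; v++) {
                List<Integer> targets = new ArrayList<>();
                for (int w = 0; w < n; w++) if (w != v) targets.add(w);
                Collections.shuffle(targets, rng);
                for (int j = 0; j < k; j++) arcs[v][j] = targets.get(j);
            }
            return arcs;
        }
    }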

This particular range of out-degrees is used throughout games of all sizes because we believe it is a sweet spot for 'hard' games: games that are too sparsely or too heavily connected are often easy to solve. This conclusion is drawn from observing, when our new algorithms are applied, the run-length patterns of games with varying out-degrees and varying sizes. Typically, most of the 'harder' instances with longer run lengths are found in games whose out-degree is a constant multiple of log n, where n is the size of the game. To illustrate this point, we show one of the comparisons we conducted on games with 1024 vertices in Figure 4.1. There are two data sets in this comparison, each having 100,000 game instances. The first data set has out-degree 20, and the second data set has out-degree 100.

Out-Degree              20       100
Average Runtime         4.176s   9.331s
Average Run-length      5.0309   3.7383
Max Run-length          12       6
No. of Unsolved Games   12       0

Figure 4.1: Comparing the 'hardness' of games with 1024 vertices and different out-degrees, using the fictitious play algorithm.

To make sense of the information in the table, we first need a basic understanding of the time complexity of the fictitious play algorithm applied to parity games. Since we have not yet introduced the algorithm on parity games formally, we only give a high-level account of its time complexity here. Fictitious play on parity games is an iterative algorithm. The inner loop of the algorithm solves single-player parity games and is reducible to a reachability problem, hence the worst-case time complexity of the inner loop is O(n³ log n), where n is the number of vertices in the game. The exact number of iterations it takes to solve a game is not known at this stage. Part of the aim and focus of this research is to establish an efficient worst-case bound on the number of iterations without increasing the time complexity of the inner loop. Hence we are more interested in the number of iterations than in the run time for solving a game.

In the table, games with out-degree 20 take 4 seconds to solve on average, whereas games with out-degree 100 take about 9 seconds. The difference is fully explained by the difference in the number of arcs. However, the games with out-degree 20 have both a larger average number of iterations and a larger maximum number of iterations than their counterparts with out-degree 100. Furthermore, there are 12 games in the out-degree 20 data set left unsolved due to the presence of strategy cycles, whereas the other data set is solved completely. This observation extends to games of all sizes and all out-degrees. Since our aim is to improve the number of iterations and to eliminate strategy cycles, we fix the parameters and variables in the experiments in such a way that runtime across data sets with different game sizes reflects only the complexity in the number of iterations. For this class of data sets, the out-degree range of log n to log n² is selected so that we have a better chance of finding 'harder' games.

Other games from the third source are generated bespoke for queries and experiments with special needs. As an example, not finding a witness for an existential query (which asks for a game on which an algorithm exhibits a specified behaviour) does not necessarily mean such a witness cannot be found in general. When this happens, a brute-force game generator is used to search for test games systematically over an extended period of time, and when such witness games are found they are also added to the database. Another example is games with specific structural properties, such as fully connected games, or games with fixed in-degree and out-degree. We will discuss them in more detail when they arise.

4.2.1 Potential Bias in Random Games

The first caveat when working with random games generated this way is that the quality of randomness we get deteriorates as the game size grows. The default pseudorandom number generator in Java can only express 2^64 permutation possibilities. This is fine for games with 12 vertices or fewer. But for games with 24 vertices or more, only a tiny fraction of all possible game permutations can ever be generated, as 24! ≈ 6.2 × 10^23 ≫ 2^64 ≈ 1.8 × 10^19. This affects both the assignment of priorities to vertices and the generation of random arcs. Conclusions we draw from larger random games should therefore be taken with a pinch of salt.

Another problem with random games that is hard to deal with is structural isomorphism among the games generated. Games can be isomorphic to each other even when they are treated as different permutations within the algorithm generating priority and arc assignments. Game generators have no efficient way of foreseeing such duplicates, since this problem has no known polynomial-time solution. These isomorphic games do not introduce any bias by themselves, because graph isomorphism applies uniformly to all permutations in the game space. However, it does amplify the problem we see with pseudorandom number generators when such isomorphic games occur. On the other hand, in practice, as the game size gets larger it becomes extremely unlikely to see isomorphic games at all, since our data set is so small relative to the space of games.

These problems do not mean that the data sets we generate for larger games are worthless for finding interesting games. If interesting games appear uniformly in the space of all games up to a certain size, they should equally find a place in the generated data set even though the data set is biased. Otherwise, it would suggest a cheap way of isolating difficult games using nothing more than random game generators, in which case the bias introduced, however unlikely that is, would be a good thing to have. These games are still useful for queries that are qualitative in nature, such as finding a witness game with a certain run length for an algorithm. It is only when we want to draw quantitative conclusions about the distribution of some property in larger games that precautions should be exercised.

4.3 A Query Language for Parity Games

Experimental work on solvers for parity games by and large focuses on the optimization of the underlying algorithms and their data structures [37], [36]. Such optimizations improve performance but make fair comparisons between solvers harder. It is widely recognized that no meaningful set of benchmarks for parity games is presently available. On the other hand, some games which are otherwise difficult to solve may be solved efficiently by the same solver after being processed by a few preprocessors [2]. These preprocessors generally do not solve a game by themselves, but seek to simplify games in polynomial time without affecting the game value. They may remove vertices and arcs or change priorities in ways favoured by the solvers trying to solve the game. When designing new algorithms and preprocessors for parity games, we typically ask questions like these:

• Is a certain class of games completely solved by a preprocessor?

• Is there a game not solved by a preprocessor?

• If a game is not solved by a preprocessor, what is the residual game?

• Would certain combinations of several preprocessors be able to solve a class of games completely which they would not be able to solve individually?

• What is the maximum number of iterations/depth of recursion we could expect from a recursive preprocessor given a class of games?

In [18] the authors propose to use a generic solver that first does some preprocessing, then uses optimized solvers on special residual games (e.g. one-player games and decomposition into strongly connected components), and then only uses an input solver on residual games that cannot be further optimized. Experimental results show that this approach can produce vast speed-ups and that Zielonka’s algorithm performs surprisingly well within that generic solver scheme.

Alloy [22] is a language and analysis tool in which abstractions of systems can be expressed and validated. These abstractions, specified in a style reminiscent of Z [39], can be queried with formulae of first-order logic plus transitive closure, asking whether there is a model of the abstraction that satisfies the query. This analysis puts bounds on the model size and is thus incomplete in general. Our approach also realizes such a “model finder”: a database of games serves as an abstraction of all games and has implicit size restrictions, queries are first-order logic expressions, and our method is also incomplete since databases are finite. This problem is in part alleviated by the brute-force random game generator, which runs continually in the background to help produce a witness if there is one.

4.3.1 Algebra of preprocessors

We now provide a simple specification language for preprocessors that abstracts away low-level programming details and focuses on how to compose preprocessors out of more basic ones.

The preprocessors we consider are generated as regular expressions

p ::= a | p; p | p+ | f(p)

where a ranges over a set of atomic preprocessors (which can thus accommodate any externally supplied preprocessors), p1; p2 denotes the sequential composition of preprocessors p1 and p2 in that order, p+ denotes the iteration of the preprocessor p, and f(p) is the “lifting” of preprocessor p by a function f. Atomicity and sequential composition are natural concepts for constructing preprocessors. The meaning of the other clauses is best explained by means of examples, where we write res(G,p) to denote the game output by preprocessor p on input game G, also called the residual game of G under p.
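The algebra maps naturally onto a small combinator library. The following Java sketch is an illustration only, under the assumption of an immutable Game type whose equals method implements structural identity; it is not the preprocessor interface of our workbench.

    // Sketch of the preprocessor algebra: atomic preprocessors, sequential
    // composition p1; p2, iteration p+ to a fixed point, and lifting f(p).
    interface Game { }   // assumed immutable, with structural equality

    interface Preprocessor {
        Game apply(Game g);   // returns res(g, this)

        // p1; p2 - run this preprocessor, then the next one on the residual game.
        default Preprocessor then(Preprocessor next) {
            return g -> next.apply(this.apply(g));
        }

        // p+ - keep applying the preprocessor until the game no longer changes.
        // Termination relies on a well-founded ordering such as the rank
        // function discussed later in this section.
        default Preprocessor iterate() {
            return g -> {
                Game current = g;
                while (true) {
                    Game next = this.apply(current);
                    if (next.equals(current)) return current;
                    current = next;
                }
            };
        }

        // f(p) - lift a preprocessor by a transformation on preprocessors.
        static Preprocessor lift(java.util.function.UnaryOperator<Preprocessor> f,
                                 Preprocessor p) {
            return f.apply(p);
        }
    }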

Two priority-simplifying atomic preprocessors.

Let a1 be an atomic preprocessor that checks on game G, only once for each vertex v with ρ(v) ≥ 2, whether there is any cycle in the game graph through v and through some vertex w with ρ(w) = ρG(v) − 1. If there is no such cycle (in particular, if there is no cycle through v at all), a1 updates ρ(v) by subtracting 2 from it. In the game in Figure 4.2, e.g., a1 could decrement priority 5 at vertex v2 to 3, since there is no vertex with priority 4 in any cycle through v2. Then a1 could decrement priority 2 at vertex v5 to 0, since v2 isn't on any cycle, etc.

Preprocessor a2 similarly explores each vertex v with ρ(v) > 0 once. If all cycles through v have a vertex w with ρ(v) < ρ(w), then update ρ(v) to 0. In the game in Figure 4.2, e.g., this could reset the priority of vertex v6 from 2 to 0, since all cycles through v6 also go through v2 which has priority 5.
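The cycle test that a1 relies on reduces to two reachability checks: there is a cycle through v and through some w with ρ(w) = ρ(v) − 1 exactly when such a w is reachable from v and v is reachable from w. A Java sketch, assuming an adjacency-list representation (hypothetical names, not the workbench code), is given below.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Sketch of the test used by preprocessor a1: may the priority of v be
    // decremented by 2, i.e. is there no cycle through v that also visits a
    // vertex of priority priority(v) - 1?
    final class CycleTestSketch {

        static boolean blocksDecrement(List<List<Integer>> succ, int[] priority, int v) {
            for (int w = 0; w < priority.length; w++)
                if (priority[w] == priority[v] - 1
                        && reachable(succ, v, w) && reachable(succ, w, v))
                    return true;    // a cycle through both v and w exists
            return false;           // no such cycle: a1 may subtract 2 from priority(v)
        }

        // Is 'to' reachable from 'from' along at least one arc?
        private static boolean reachable(List<List<Integer>> succ, int from, int to) {
            boolean[] seen = new boolean[succ.size()];
            Deque<Integer> work = new ArrayDeque<>();
            work.push(from);
            while (!work.isEmpty()) {
                int u = work.pop();
                for (int x : succ.get(u)) {
                    if (x == to) return true;
                    if (!seen[x]) { seen[x] = true; work.push(x); }
                }
            }
            return false;
        }
    }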

Figure 4.2: An 8-vertices parity game with 9 arcs, and its solution. All vertices are won by player 1, and so the solution consists only of her strategy, indicated by boldface arcs.

Preprocessor a1 is not idempotent: running it twice may produce a simpler game than running it once (e.g., the first run may change a 5 into a 3, which then allows a 6 that was “blocked” by that 5 to change to 4). For input game G, preprocessor a1+ keeps applying a1 until reaching a fixed point. Preprocessor a1+ preserves the initial game graph and terminates on all games, as seen through the well-founded ordering

G ≺_a1 G′   iff   Σ_{v∈VG} ρG(v) < Σ_{v∈VG′} ρG′(v).

For any preprocessor p, the iteration p+ is well defined iff there is a well-founded ordering ≺_p on games such that for all games G we have res(G,p) ≺_p G. A preprocessor p is idempotent iff p; p and p have the same effect on all games G. Then p+ is well defined with the discrete well-founded ordering. Generally, all well defined p+ are idempotent preprocessors.

Preprocessor using index-3 abstraction

Atomic preprocessor a3 operates on game G as follows. It generates a sequence of index-3 games that have the same game graph as G and whose winning regions for one player σ are also won for that player in G. Any such winning regions are deleted from G (technically, closed up under σ-attractors), and a new such sequence of index-3 games is generated on the resulting game until the game no longer simplifies. For example, if G initially has index 5, one such index-3 abstraction turns priority 4 into priority 2, priorities 3 and 5 into 1, and all other priorities into 0. From any vertex v won by player 1 in this modified game, player 1 can ensure that any play from v in G has infinitely many priorities 3 or 5, and only finitely many priorities 4. Any such vertex is thus certain to be won by player 1 in G.

The game in Figure 4.2, e.g., is solved completely by the composed preprocessor a1; a3.

A preprocessor transformation

The lifting clause f(p) has as its intuition that f is a device that lifts the effectiveness of preprocessor p. We give an example, lft, such that lft(a3) acts on G as follows. It considers each vertex v of G with at least two outgoing arcs in turn: for all pairs of such outgoing arcs, it creates two subgames (each of which implements only one of these arcs and removes all other outgoing arcs of v), and runs a3 on these subgames. If a3 decides for some vertex z in VG a different winner in each subgame, vertex v is won in G by the player who owns it (since a3 correctly classifies winners of deleted vertices in input games and since vertices not won by their owner cannot display such observable differences), and no further pairs of subgames for v need to be considered. Thus lft(a3) also correctly classifies winners of the vertices it deletes.

The residual game res(G, lft(a3)) is obtained from G by removing all vertices v (and their arcs) whose winners are decided in this manner. By induction, this is also sound for higher-order lifts lft^k(p) with k ≥ 1, where f^1(p) is defined as f(p) and f^{n+1}(p) as f(f^n(p)). We note that f^k(p) generally does not have the same effect as the k-fold sequential composition of f(p) with itself.

Requirements on preprocessors.

Although our algebra for preprocessors is very general, we impose four requirements on all preprocessors implementable in our workbench:

1. the game graph of res(G,p) is a sub-graph of the game graph of G

2. the preprocessor p decides (correctly) the winners of all vertices of G that are no longer vertices in res(G,p)

3. for each vertex v on the game graph of res(G,p), its winner is the same in both games G and res(G,p) and

4. for each vertex of res(G,p), its priority in res(G,p) is no larger than its priority in G.

Preprocessors p meeting these four requirements have a well defined p+, since they all have the well-founded order G ≺ G′, defined as rank(G) < rank(G′) for the rank function

rank(G) = |VG| + |EG| + Σ_{v∈VG} ρ(v).

The first requirement limits manipulations of the game graph to deletions of vertices and arcs. The remaining requirements ensure that preprocessing determines the winners of deleted vertices, does not change the winners of vertices that remain after preprocessing, and does not increase the priorities of vertices.

A composition pattern.

We illustrate the utility of our algebra for composing preprocessors. Let p1,...,pn be preprocessors for which each pi+ is well defined, and let π be a permutation of {1,...,n}. Then ⟨p1,...,pn⟩_π is defined to be (pπ1+; ...; pπn+)+. This is well defined since each pi has a well-founded ordering ≺_pi, and so their lexicographical ordering is a well-founded ordering for ⟨p1,...,pn⟩_π. For example, for n = 3, for pi being ai, and for π being (2, 3, 1), this yields the preprocessor (a2+; a3+; a1+)+.

Query language.

The query language for our workbench is a fragment of first-order logic where formulae are closed and contain only a single and top-most quantification.

The grammar for queries is given by

q ::= ∀G: b | ∃G: b

where G is a fixed variable that ranges over all games in a specified set of games D (a database), and b is the yet unspecified body that can only mention the variable G, which binds to games in D. The grammar for b is extendable. For now, we will freely use relational and functional symbols within b in examples.

Figure 4.3 depicts examples of queries. Query (4.1) asks whether there is a game in the database that is resilient to preprocessor p, since the equality G = res(G,p) means that p cannot simplify anything in game G. If p happens to be a very powerful preprocessor, a witness game G for the truth of this query may then be a good benchmark for solvers.

Query (4.2) asks whether preprocessors p and q have the same effect on all games of the database. If so, this does of course not necessarily imply that they have the same effect in general. This pattern has many uses; we mention two. Firstly, for q being p; p, e.g., we can test whether p is idempotent on games from our database. Secondly, if q is an optimization of p, we can test whether this optimization is correct for games in our database (a form of regression testing).

For an example of the second kind, let p be a; lft(a) and q be lft(a). Query (4.2) then tests on our database whether lft might be monotone in that it also does all the simplifications done by its argument a. This is not generally true as lft(a) only uses a conditionally, to probe whether certain vertices are won by certain players; it does not use a directly on the input game.

In query (4.3), Sol(G,p, 0) denotes those vertices, if any, that preprocessor p classifies as being won by player 0 in game G. If p is a solver or a preprocessor that does decide the winners of some vertices, this query therefore checks whether p is correctly implemented (relative to the trusted implementation of some solver).

A subset of the syntax

The implemented query language is shown in Figure 4.4.

∃G: G = res(G,p)                                        (4.1)
∀G: res(G,p) = res(G,q)                                 (4.2)
∀G: Sol(G,p,0) ⊆ Sol(G,TrustedSolver,0)                 (4.3)

Figure 4.3: Example query patterns, instantiable with preprocessors p and q.

QueryLanguage := QueryType GVar : Constraints
QueryType     := ALL | SOME
Constraints   := (Constraint) | (NOT Constraints)
               | (Constraints AND Constraints) | (Constraints OR Constraints)
Constraint    := Fragment == Fragment | Number >= Number | Number <= Number
               | Number > Number | Number < Number | Vertices SUBSET Vertices
               | Arcs SUBSET Arcs | Priorities SUBSET Priorities
Fragment      := Game | Vertices | Arcs | Number | Priorities
Game          := GVar | RESIDUAL(Game, Prep) | SG(Game, Prep)
Vertices      := SOLUTION(Game, Prep, Player) | VERTICES(Game)
Arcs          := ARCS(Game)
Number        := COUNT(Vertices) | COUNT(Arcs) | COUNT(Priorities)
               | RT(Game, Prep) | RL(Game, Prep) | CL(Game, Prep) | HEIGHT(Game)
Priorities    := Priorities(Game)
Player        := X | Y
Prep          := Atom | Prep; Prep | L(Prep) | P(Prep) | (Prep)+
Atom          := A1 | A2 | A3 | EXP | FP1 | DSI | FP* | FP*FP1

Figure 4.4: Subset (relevant for this thesis) of query language implemented in our workbench.

Over time, as the queries we needed to work with became ever more complex, many more features have been patched into the language.

There are three groups in the query language. The first group specifies that formulae have a single quantification and a body built from an adequate set of propositional connectives: NOT, AND, OR. The second group lists the supported predicates, which allow reasoning about a game's solutions, vertices, arcs, strategies, and priorities. For example, SOLUTION(G, A, X) returns those vertices of the game that preprocessor A decides to be won by player X – our keyboard encoding for player 0; player 1 is encoded by Y. SG(G, Prep) returns the strategy graph of a parity game G when a preprocessor such as fictitious play is applied to it, and HEIGHT(G) returns the longest cycle-free path in a graph G.

The third group specifies preprocessors and functions on them. A1 implements a1+, A2 implements a2+, and A3 implements a3+. Preprocessor composition p; q is implemented as first running p on G and then running q on res(G,p). The iteration p+ is implemented as a repeat-statement that initially runs p on G and then keeps executing p on the resulting game until a fixed point is reached. Two types of functions are currently implemented. Function L is a more complex version of the function lft described already on page 63. Function P is a similar, less efficient transformation of predecessors that is based on our existing work in [2, 42]. In the sequel, we write PROBE[n](a) for the preprocessor that initially runs a, then runs P(a), etc., and stops after it has run the n-th nesting of P on a. For example, PROBE[2](A1;A2;A3) expands in this manner to the term A1;A2;A3; P(A1;A2;A3); P(P(A1;A2;A3)). Apart from the three aforementioned preprocessors, EXP stands for an implementation of Zielonka's algorithm, FP1 stands for fictitious play with recall length 1, an algorithm we have designed to solve parity games, and FP* and FP*FP1 are fictitious play of infinite recall length and fictitious play shadowed by FP1, respectively.

Functions RT, RL and CL provide summary statistics on the run time, run length and cycle length of an algorithm. As the language is built up on an ad hoc basis, some functions are specific to certain preprocessors and algorithms; for example, the cycle length function only works with the FP1 algorithm. Run length, when applied to iterative algorithms such as FP1 and DSI, counts the number of iterations of the outer loop; when applied to recursive algorithms such as Zielonka's algorithm, it counts the number of recursive calls made by the algorithm.

Example 4.3.1. We give two further examples of how to write queries in this language:

ALL G : (RESIDUAL(G, L(A1;A2;A3)) SUBSET RESIDUAL(G, L(A2;A3)))

stipulates that all vertices, in all games, that are solved by L with preprocessor A1;A2;A3 are also solved by the same lifting function with preprocessor A2;A3.

ALL G : (RESIDUAL(G, C(L(L(A2;A3)))) SUBSET RESIDUAL(G, EXP))

states that all games are fully solved by iterating two nestings of L when applied to preprocessor A2;A3. This is so since res(G, EXP) is the “empty” game for each complete solver EXP.

Query semantics.

We explain the semantics of query evaluation informally. A model is a database D. Evaluating a query ∃G: b on D either returns an empty list (saying that no game satisfying b is in the database) or returns a game G from D satisfying b. Dually, the evaluation of ∀G: b either returns the empty list (saying that all games in the database satisfy b) or a game G from D that does not satisfy b. Both of these evaluations require the evaluation of b on a game G in D, returning a Boolean truth value. That evaluation uses the interpretations/implementations of relational and functional symbols in b and the standard semantics of propositional logic to determine whether G satisfies b. In particular, we interpret equality G1 = G2 between games as structural identity: both games have the same game graph and priorities of vertices.
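Operationally, evaluating a quantified query is a linear scan of the database. The Java sketch below (hypothetical types and names) illustrates how a SOME query returns a witness and an ALL query returns a counterexample, if one exists.

    import java.util.List;
    import java.util.Optional;
    import java.util.function.Predicate;

    // Sketch of query evaluation: the database is a finite list of games, the
    // body b is a predicate on games, and evaluation returns an optional
    // witness (for SOME) or an optional counterexample (for ALL).
    final class QueryEvaluationSketch {

        static <G> Optional<G> some(List<G> database, Predicate<G> body) {
            return database.stream().filter(body).findFirst();          // witness, if any
        }

        static <G> Optional<G> all(List<G> database, Predicate<G> body) {
            return database.stream().filter(body.negate()).findFirst(); // counterexample, if any
        }
    }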

Example 4.3.2. Let index(G) evaluate to the index of game G. A game G satisfies (index(G) > 5) ∧ ¬(G = res(G, a1+)) iff G has index greater than 5 and is not resilient to the preprocessor a1+. Similarly, the query ∃G: (index(G) > 5) ∧ ¬(G = res(G, a1+)) might return the game in Figure 4.2 from our database; its index is 6 and it can be simplified by a1 as already discussed.

4.4 The workbench as a webservice

The workbench is a fusion of a parity game solver component, a distributed online storage facility for parity games supporting simple interfaces, a client component which talks to the data servers, and a query component for analyzing results stored on the servers and those derived from further computations on games. Since the workflow of a query on the workbench generally involves running the same query against millions of games to generate a yes/no answer, it makes sense to split the data set into multiple partitions so that these partitions can be evaluated in parallel when multiple computers are available over the network.

4.4.1 Software architecture.

The distributed and highly extendable architecture of our platform for query execution is shown in Figure 4.5. It comprises three parts:

• At its center, directly interfacing with users, the query server actively maintains the registration of game servers and query execution processes, manages parsing and interpreting user queries at runtime, and merges computation results after query executions return from the query execution group.

• On the right, we have a collection of game servers. Each game server stores a particular class of parity games and other computed results related to each game instance. The game server provides a uniform interface for access to generic information. By default, queries are directed to computations about games in all game servers, but users can specify in the query to only look at results from a particular game server. Users can also easily inspect the data recorded in each server through a web browser.

• The query execution group contains a collection of parallel processes responsible for processing a submitted query. Usually each parallel process is hosted on its own machine. The query execution group, implemented using JGroup, is self-balancing: when a process node is shut down for whatever reason, or when a new process node becomes available in the group, the group rebalances itself evenly. Process nodes in the group can therefore, in principle, reside on different machines and so facilitate parallelized query execution. When a query is received by the process group from a query agent, the query is executed via remote procedure call on each process node, and the results are merged back at the query agent.

4.4.2 Data model.

Games and their meta-data are distributed over multiple game servers. Each game stored online is encapsulated in a single resource point. Each resource point is associated with many scratchpad objects, given in the form of typed key/value pairs. For example, for a scratchpad associated with a game, a key may be the name of the game and the value the description of the game. Games may also have associated scratchpads that record solutions, solver statistics, etc. Types on the scratchpads are defined by the query agent to help them access only the data they are interested in.

Figure 4.5: Overall architecture of the query engine for our workbench, and the typical sequence of interactions between user and tool.

The system is agnostic toward how scratchpads are manipulated. Information submitted and accessed through a registered user account on the system is therefore interpreted by users or agents at the client side. This data model allows our workbench to be smoothly extended to work with other types of games, e.g. stochastic parity games [10], with similar workflow requirements.

4.4.3 User session.

Figure 4.6 shows a typical user session in our workbench. This session takes place on the query server; by that time all the games are already loaded into the process nodes. The user first enters a query as specified by the implemented query language, and the server then sends the interpreted query to all query process nodes for local processing. Local results are merged back at the query server. If the result is positive for a universally quantified query, the server just returns true and no witness is provided; if a contradiction is found, the server returns false and the associated witness. An existentially quantified query returns true and the witness if a positive example is found, and returns only false otherwise. The witness is shown both in the DOT description format and as a graph.

Figure 4.6: A typical user session on the query server: (1) the user enters a query; (2) the database of games is searched for a witness; (3) the interface then displays a game that refutes a universally quantified query or verifies an existentially quantified query (if applicable).

In this particular session the query asked whether, for all games, all vertices that are deleted by A1;A2;A3; P(A1;A2;A3) are also deleted by A2;A3; P(A2;A3). This is not true, and the witness produced is displayed.

4.4.4 Parser and query optimization.

The query parser and processor are rapid prototypes written in Java. There are many issues with this choice.

It currently does not share common sub-expressions, so the meaning of such shared expressions is re-computed for each game. The first problem is that when the same query fragment appears several times in a query, it is evaluated multiple times. This is not desirable, as some of the more complex query fragments can be expensive to compute. For example, if the same RESIDUAL(RESIDUAL(G, P1), P2) appears multiple times in a query, the inner residual function is cached on the game server and the query processes, but the result of the outer RESIDUAL function is not: because there are infinitely many ways such a query can be generated, it does not make sense to cache derivative results, so these have to be computed on the fly, and this can take a long time if the game happens to be resilient to the preprocessor. If parsing and interpretation of the query were done in Prolog, however, it would evaluate the query predicates up to their base cases, and from there, when it calls back to Java to evaluate one instance of the same string pattern, the result would be bound once and used universally throughout the query.

It is cumbersome to define pattern matching rules and query optimization paths in Java. This could be implemented more easily in Prolog. It would also allow us to guide the search, so that less expensive sub-expressions get evaluated first and expensive sub-expressions may not have to be evaluated at all. Due to limited capacity, a Prolog implementation is left for future work. The current implementation evaluates conjunctions from left to right. For example, a conjunctive query may say that the residual of a residual game is empty, and that the index of the game is 3; evaluating the latter is much faster than evaluating the first conjunct. When the query constraints come in the form (A AND B) where B is much cheaper to compute than A, it would be worthwhile to compute B first; currently, however, the constraints are evaluated in left-to-right order.

Because all data about all parity games are loaded into memory in uncompressed format, the combined memory required across all process nodes can be huge. Assuming each game occupies only 10KB in memory, a data set of 10 million games requires a memory footprint of around 100GB. The ability to distribute process nodes over machines helps, but won't achieve scalability in and of itself.

Two additional solutions suggest themselves. Firstly, we might store Boolean matrices that record values of atomic query expressions for games. The complete witness information could then be recomputed for the chosen witness. Secondly, we might generate games on a hierarchy of game server arrays that would act like a sieve so that games pass through to higher level servers only if they “survive” specified queries. Our experimental evidence suggests that this second approach can eliminate up to 95 percent of randomly generated games.

4.4.5 Using the workbench

We now illustrate how one can use the current workbench prototype to evaluate and validate preprocessors and solvers. In doing so, we also generate some games that may serve as a first generation of benchmarks for solvers. Figure 4.7 shows four interesting games, found in a database populated with more than 100,000 random 8-vertices games.

Witness (A) is fully solved by PROBE[0](A2;A3) but not by PROBE[2](A3). That is to say, the game is fully solved by A2;A3 but not by A3; P(A3); P(P(A3)). This is perhaps surprising since the latter incrementally nests a lifting function whereas the former does not lift at all. But the latter uses a slightly weaker preprocessor and this weakness is not being compensated for in this witness game.

Our experimental results suggest that increasing the nesting of lifting functions, or strengthening the lifting function with an even simpler nested preprocessor, can immediately solve a game that resists a more complex preprocessor nested at the same level inside the lifting function. This case shows that the features captured by the lifting functions and by the preprocessors are to some extent different, and inserting an appropriate preprocessor can indeed reduce the power required to solve a game. However, it will be interesting to find out by how much this extra power extends compared to the nesting of lifting functions. Does it stop at a fixed number of extra liftings, or does it really extend up to N, the total number of vertices? It will also be useful to find out exactly which features in the game influence this extension.

Witness (B) is solved by A2;A3 but not by A1;A2;A3. This seems counterintuitive, since the initial application of a priority reduction preprocessor appears to harm the effectiveness of subsequent preprocessing. But A1 may close some “priority gaps” in the game, and those very gaps may be what enables A2 to reduce some priority to 0.

This is a little counterintuitive at first look: by applying another priority reduction preprocessor up front, we can make the game harder to solve. This is actually observed much more often than case (A), because it happens quite easily. Preprocessor A1 reduces priority gaps in the game, whereas preprocessor A2 checks whether a vertex has a non-dominant priority in all cycles it participates in and, if this is the case, updates its priority to 0. It can happen that, due to A1's modification, a vertex that would be a lax vertex for A2 can no longer be marked as such, while A3 in this particular configuration depends on that vertex being marked lax to solve the game completely. Such interference between preprocessors does happen when different algorithms are composed together, and it can give us unexpected results.

Witness (C) is solved by A2;A3 or by L(A2;A3) but not by L(L(A2;A3)). This means that the non-idempotent lifting function L is not always more powerful than its previous nesting level. Witness (C) can in fact not be solved by any further nestings of L applied to A2;A3 (we refrain from sketching the argument here). Applying C to each function call of L would make higher nestings more powerful than lower ones.

Actually this game cannot be solved by any further nestings of the lifting functions. The reason is simply that the lifting functions only capture partial information from the computed subgames; if the subgame is computed by a nested lifting function, the information returned by the nested lifting function consists only of those features the function cares about, and when the outer lifting function tries to compose this information, it does not try to solve the game using the most basic preprocessors. It just deduces the facts it cares about on some of the vertices and propagates this information to the next level. Applying closures to the lifting functions, however, would make the nestings transitive in general.

Finally, witness (D) shows a game with 8 vertices that is resilient to PROBE[3](A3), i.e. the preprocessor leaves the game unchanged.

These examples demonstrate how we can quickly craft queries against various data sets to prove or disprove our conjectures about arbitrary algorithms in relation to properties of games; because of this, our design cycle is sealed efficiently in a tight loop of trial and error.

Figure 4.7: Specific 8-vertices games found using the query server. Witness (A) is solved by PROBE[0](A2;A3) but not by PROBE[2](A3); witness (B) is solved by A2;A3 but not by A1;A2;A3; witness (C) is solved by A2;A3 or by L(A2;A3) but not by L(L(A2;A3)); witness (D) is resilient to PROBE[3](A3).

5 FP Algorithms for Parity Games

Since Brown's fictitious play converges for perfect-information 2-player 0-sum games, it should in principle also solve parity games. In this chapter we look at a few algorithms we have created for parity games, inspired by fictitious play. They are FP*, fictitious play with unbounded recall length; FP*/FP1, normal fictitious play shadowed by fictitious play of recall length 1; and finally FP1, fictitious play of recall length 1.

5.1 FP* for Parity Games

A few adjustments need to be made before the algorithm can be adapted to parity games. Players in parity games pursue parity objectives; these need to be translated into the min/max objectives of game theory, which is done by switching each odd priority to its negative value. The 0-sum property also needs to be redefined for parity games. A first attempt is to treat the game starting from a fixed vertex as a 0-sum game. Since parity games are determined and there is exactly one winner for each vertex, the sum of the payoffs obtained by both players on the fixed vertex is always 0.

5.1.1 Translation into 0-sum Games

In the model checking literature it is taken as common sense that parity games are 0-sum 2-person games, but to the best of our knowledge we are the first to look at parity games from a game-theoretic perspective and to formally define possible translations of parity games into 0-sum games in normal form. The first and most basic translation of a parity game considers a 0-sum game on a single vertex of the parity game.

Definition 5.1.1 (0-sum Game at a Vertex). At vertex v in parity game G, for each player 0 strategy σi ∈ Σ and for each player 1 strategy πj ∈ Π, the payoff received by player 0, mi,j, in the payoff matrix M is given by

mi,j = ⟨σi, πj⟩v · (−1)^(⟨σi,πj⟩v % 2).

To recap the notation used in the definition, for the play induced by strategies σi and πj, the payoff of the game at vertex v is given by ⟨σi, πj⟩v (see Definition 2.2.13 in Chapter 2 for the play profile notation and Equation 3.4 in Chapter 3 for its extension into payoffs). The factor (−1)^(⟨σi,πj⟩v % 2) in the definition gives the sign of mi,j: mi,j is positive when ⟨σi, πj⟩v is even and negative when it is odd.

Figure 5.1: A simple parity game. Winning strategy and winning regions are coloured respectively.

Example 5.1.2. Using the game in Figure 5.1, we give an example of how a parity game is translated into a 0-sum game for fictitious play. We first index all the strategies below:

• σ0 = {v0 → v1, v2 → v0}
• σ1 = {v0 → v1, v2 → v5}
• σ2 = {v0 → v3, v2 → v0}
• σ3 = {v0 → v3, v2 → v5}
• π0 = {v1 → v0, v3 → v1}
• π1 = {v1 → v0, v3 → v4}
• π2 = {v1 → v0, v3 → v5}
• π3 = {v1 → v2, v3 → v1}
• π4 = {v1 → v2, v3 → v4}
• π5 = {v1 → v2, v3 → v5}

In the list of strategies, we omitted the outgoing edges at vertices v4 and v5 to save space, since it is clear that there is only one outgoing edge from each of these two vertices. With the game translated into normal form, the payoff matrix for player 0 at vertex v0 is given in Figure 5.2.

Take the entry corresponding to row σ0 and column π0 in the matrix, for example: the value −5 is derived from the priority of the cycle target for the play (σ0, π0) starting from vertex v0. The play starts from vertex v0; according to σ0, the edge v0 → v1 is selected, and according to π0, the edge v1 → v0 is selected. Thus the play cycles through v0 and v1, and the highest priority is 5, carried on vertex v1. Since 5 is odd, the final value of the entry is −5.

        π0    π1    π2    π3    π4    π5
σ0      -5    -5    -5    -5    -5    -5
σ1      -5    -5    -5     4     4     4
σ2      -5    -5    -5    -5    -5    -5
σ3      -5     4     4     4     4     4

Figure 5.2: A 2-person 0-sum game in normal form generated by the game in Example 5.1 at vertex v0.

With the translation defined above, fictitious play can be used directly on this structure to solve each vertex of the game in isolation. Ultimately we are interested in the solution of the whole parity game rather than of a single vertex. This can be obtained by running the algorithm once on each vertex sequentially; each time a vertex is solved, all outgoing edges from that vertex, except the one consistent with the equilibrium strategy, are removed from the game, so that subsequent calls to fictitious play run on a smaller game.

5.1.2 0-sum game on average payoff

A more convenient and efficient approach is to treat the whole game as a 0-sum game. The sum of payoffs obtained by a player in a game with |V| vertices and maximum priority w is a unique value between −w × |V| and w × |V|. The sum of the payoffs of both players is still 0. We now define the payoff matrix for this:

Definition 5.1.3 (0-sum Game on Average Payoff). In parity game G, for each player 0 strategy σi ∈ Σ and for each player 1 strategy πj ∈ Π, the average payoff received by player 0 over all vertices, ni,j, in the payoff matrix N is given by

ni,j = (1 / |V|) × Σ_{v∈V} ⟨σi, πj⟩v · (−1)^(⟨σi,πj⟩v % 2).

        π0    π1    π2    π3    π4    π5
σ0      -5    -5    -5    -5    -5    -5
σ1      -0.5   1     1     4     4     4
σ2      -5    -5    -5    -5    -5    -5
σ3      -0.5   4     4     4     4     4

Figure 5.3: Parity game in Example 2.1 converted to normal form using payoff averaged over all vertices.
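Assembling either payoff matrix from per-vertex play values ⟨σi, πj⟩v is mechanical. The following Java sketch assumes a hypothetical precomputed table value[i][j][v] holding the dominant cycle priority of the play (σi, πj) from vertex v; it is an illustration of Definitions 5.1.1 and 5.1.3, not the workbench implementation.

    // Sketch: build the single-vertex matrix M (Definition 5.1.1) and the
    // averaged matrix N (Definition 5.1.3) from the table value[i][j][v].
    final class PayoffMatrixSketch {

        // Even priorities count positively for player 0, odd ones negatively.
        private static double signed(int priority) {
            return priority % 2 == 0 ? priority : -priority;
        }

        // m[i][j] for the 0-sum game at a single vertex v.
        static double[][] atVertex(int[][][] value, int v) {
            double[][] m = new double[value.length][value[0].length];
            for (int i = 0; i < m.length; i++)
                for (int j = 0; j < m[i].length; j++)
                    m[i][j] = signed(value[i][j][v]);
            return m;
        }

        // n[i][j] averaged over all vertices.
        static double[][] averaged(int[][][] value) {
            int vertices = value[0][0].length;
            double[][] n = new double[value.length][value[0].length];
            for (int i = 0; i < n.length; i++)
                for (int j = 0; j < n[i].length; j++) {
                    double sum = 0;
                    for (int v = 0; v < vertices; v++) sum += signed(value[i][j][v]);
                    n[i][j] = sum / vertices;
                }
            return n;
        }
    }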

We have seen from Example 3.4.2 in Chapter 3 that after fictitious play finds the equilibrium strategy, it does not terminate but continues in saturation runs, in which the pair of equilibrium strategies is repeatedly played against histories of increasing length in order to eliminate the residual effect of the sub-optimal strategies elected at the beginning of the play, until the difference between min and max is smaller than some ε. A larger value for ε significantly reduces the number of iterations required, but too large a value can result in premature termination on wrong 'optimal' strategies.

After electing a conversion scheme for the payoff matrix, we next need to establish a suitable ε for parity games. When we work with a 0-sum game on a single vertex, a safe choice for ε is 1. This is because the game converges to whole numbers on each vertex, and the smallest priority gap in parity games is the gap between 0 and 1; all other gaps come in increments of 2. Any open interval of length less than 1 contains at most a single whole number. Therefore, when the difference between the upper and lower bound of the expected payoff is less than 1, it is safe to assume the game will converge on the only whole number between the two boundaries.

Knowing that ε is 1 when we work with a 0-sum game on a single vertex, when we work with a 0-sum game on average payoff the corresponding ε can be obtained by computing the Euclidean distance of two points in n-space, where n is the number of vertices in the game. Since the distance between the two points is 1 in each dimension (the ε of the single-vertex 0-sum game), the ε for average payoff is essentially √n.

In practice the choice of ε for either scheme of translation into 0-sum games is not very important, because we can get rid of the saturation runs and of ε altogether by making smart use of the fact that parity games always converge to pure strategies. This is examined in the next section.

5.2 FP* shadowed by FP1

There is a way to get rid of the saturation runs completely, at the first moment the optimal strategies are found. The method we invented is to have another copy of fictitious play, of recall length one, piggybacking on top of the normal FP* runs. During each iteration the shadow FP1 checks the new strategy generated by FP* for optimality, and the algorithm terminates promptly if the last strategy generated is optimal. This is sound in the case of parity games: optimal strategies in parity games are pure and the FP* process does converge to a pair of pure strategies, so it does not hurt to terminate as soon as the first occurrences of optimal strategies found by FP* are detected and verified by FP1. In this manner the saturation runs are truncated. This also gives us a more meaningful way to measure the effective run lengths of the FP* algorithm in experiments. The modified algorithm is shown in Figure 5.4.

The only changes in the algorithm compared to the algorithm in Figure 3.3 are in line 11 and line 14. The min and max are discrete values, each computed from a pair of pure strategies. So when max and min become equal, we know that the last σ and π computed are both optimal.
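For exposition, the loop of Figure 5.4 can be rendered directly over an explicit payoff matrix, as in the Java sketch below. The names are hypothetical, and materialising the matrix is of course infeasible for parity games of interesting size (it has one row and column per pure strategy); BR is realised here as the best pure response against the empirical average of the recorded opponent strategies, and the shadow FP1 check corresponds to lines 11 and 14.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Sketch of FP* shadowed by FP1 on an explicit payoff matrix m
    // (rows = player 0 strategies, columns = player 1 strategies).
    final class FictitiousPlaySketch {

        static int[] solve(double[][] m, Random rng) {
            List<Integer> h0 = new ArrayList<>();   // history of player 0 strategies
            List<Integer> h1 = new ArrayList<>();   // history of player 1 strategies
            int sigma = rng.nextInt(m.length);      // random initial player 0 strategy
            int pi = -1;
            h0.add(sigma);
            double max = Double.POSITIVE_INFINITY, min = Double.NEGATIVE_INFINITY;

            while (max - min > 0) {
                pi = bestColumn(m, h0);             // line 10: BR against H0
                min = rowValue(m, sigma);           // line 11: shadow FP1 lower bound
                h1.add(pi);
                sigma = bestRow(m, h1);             // line 13: BR against H1
                max = columnValue(m, pi);           // line 14: shadow FP1 upper bound
                h0.add(sigma);
            }
            return new int[] { sigma, pi };         // a pair of optimal pure strategies
        }

        // Player 1's best response: the column minimising the average payoff over H0.
        private static int bestColumn(double[][] m, List<Integer> h0) {
            int best = 0; double bestAvg = Double.POSITIVE_INFINITY;
            for (int j = 0; j < m[0].length; j++) {
                double avg = 0;
                for (int i : h0) avg += m[i][j];
                avg /= h0.size();
                if (avg < bestAvg) { bestAvg = avg; best = j; }
            }
            return best;
        }

        // Player 0's best response: the row maximising the average payoff over H1.
        private static int bestRow(double[][] m, List<Integer> h1) {
            int best = 0; double bestAvg = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < m.length; i++) {
                double avg = 0;
                for (int j : h1) avg += m[i][j];
                avg /= h1.size();
                if (avg > bestAvg) { bestAvg = avg; best = i; }
            }
            return best;
        }

        // Payoff guaranteed by row sigma: its minimum over all columns (line 11).
        private static double rowValue(double[][] m, int sigma) {
            double v = Double.POSITIVE_INFINITY;
            for (double x : m[sigma]) v = Math.min(v, x);
            return v;
        }

        // Payoff conceded by column pi: the maximum over all rows (line 14).
        private static double columnValue(double[][] m, int pi) {
            double v = Double.NEGATIVE_INFINITY;
            for (double[] row : m) v = Math.max(v, row[pi]);
            return v;
        }
    }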

5.2.1 Rate of Convergence

To understand the rate of convergence of the discrete version of fictitious play over parity games, we first take a look at the number of iterations required to solve any parity game. Figure 5.5 shows plots of the number of game instances solved in log scale against the number of iterations required at which a game is solved on linear scale. There are four experiments conducted.

1:  FP*/FP1(M)
2:    max = +∞
3:    min = −∞
4:    σ = random player 0 strategy
5:    σg = σ
6:    πg = null
7:    H0 = [σ]
8:    H1 = []
9:    while (max − min > 0) do
10:     π = BR(M, H0)
11:     min = (σ, BR(M, σ))
12:     H1 = H1 ++ [π]
13:     σ = BR(M, H1)
14:     max = (BR(M, π), π)
15:     H0 = H0 ++ [σ]
16:   end while
17:   return (σ, π)

Figure 5.4: FP* shadowed by FP1, where the shadow operates in line 11 and line 14.

The games used in the first experiment are the complete set of all unique non-isomorphic games with 4 vertices, a collection of slightly more than 403,000 games. The other plots are generated from random games with 6, 8 and 10 vertices and together contain about 250,000 games. To read the graph, take the data point (6, 10²) in Graph (a) as an example: it means that out of the collection of 400,000 games, about 100 games are solved by the algorithm at exactly iteration 6.

The plots roughly follow a geometric distribution. We can see that the four graphs, for games of different sizes, have roughly the same shape. Near the tail of each graph, the distribution of games becomes much more turbulent than in the main body of the graph. We suspect these features scale up with the size of the game. If so, it would mean that the maximum number of iterations is proportional to the number of non-isomorphic instances of games in a class, which implies that fictitious play in general is an exponential-time algorithm in the worst case for parity games. However, because within each class of games of a fixed size the distribution of games against the number of iterations follows a geometric distribution, in practice we can expect, for most games, fast termination of the algorithm with far fewer iterations than the maximum predicted for the class.

Figure 5.5: Number of games solved (log scale) vs. the number of iterations at which the games are solved (linear scale), using Algorithm 5.4. Panels: (a) all non-isomorphic games of 4 vertices; (b) randomly generated 6 vertices games; (c) randomly generated 8 vertices games; (d) randomly generated 10 vertices games.

Now that the general distribution of run lengths is clear, we may want to look at a few games which have the maximum run lengths in a class. One of the 10 vertices games generated, GAME-10-44-9-2158, runs for 483 iterations. However, if we look into the history of the strategies recorded, the number of distinct strategies visited is quite small: in the case of GAME-10-44-9-2158, only 7 distinct strategies are involved for each player. For that game we observe repeated strategies during intermediate steps, in a way which is similar to the final saturation runs in FP*. The only difference is that in the final saturation runs of FP*, the crossing threshold is set by the equilibrium

payoff, whereas in the intermediate saturation runs of FP* shadowed by FP1, the crossing threshold is set by the payoff obtained by the alternative strategy during intermediate steps.

Since strategies are repeated many times during intermediate saturation runs in order to average out the residual effect of previous strategies, would amplifying the weight of recent strategies speed up convergence? Since the frequencies of all strategies still add up to 1, this is equivalent to discounting older strategies faster. In that case the selection criterion of Equation 3.3 in Chapter 3 is updated to the following:

π*_t = argmin_π Σ_{i=0}^{t} 2^i × ⟨σ*_i, π⟩        σ*_t = argmax_σ Σ_{i=0}^{t−1} 2^i × ⟨σ, π*_i⟩        (5.1)

Correspondingly, the payoff obtained by each new strategy is given by:

E_{σ*_t} = (1 / Σ_{i=0}^{t−1} 2^i) × Σ_{i=0}^{t−1} 2^i × ⟨σ*_t, π*_i⟩        E_{π*_t} = (1 / Σ_{i=0}^{t} 2^i) × Σ_{i=0}^{t} 2^i × ⟨σ*_i, π*_t⟩        (5.2)
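In an implementation, the discounted selection criterion only changes the weights used inside the best-response computation. The Java sketch below (hypothetical names; weights 2^i as in Equation 5.1, so recent opponent strategies dominate) shows the player 0 side; the player 1 side is symmetric with a minimisation, and the weights stay small in practice because the recorded histories remain short.

    import java.util.List;

    // Sketch: best response of player 0 against a geometrically discounted
    // history h1 of player 1 strategies (weight 2^i for the i-th entry).
    final class DiscountedBestResponseSketch {

        static int bestRow(double[][] m, List<Integer> h1) {
            int best = 0;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < m.length; i++) {
                double score = 0, weight = 1;        // weight doubles with each entry
                for (int piK : h1) {
                    score += weight * m[i][piK];
                    weight *= 2;
                }
                if (score > bestScore) { bestScore = score; best = i; }
            }
            return best;
        }
    }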

Experimental results of this technique applied to the same collections of 6 and 8 vertices games are shown in Figure 5.6. The algorithm with the modified selection criterion for geometric discounting of histories (Equation 5.1 and Equation 5.2), shown as the green trace, terminates significantly faster. In particular, the maximum number of iterations seen with the modified selection criterion is only 7 for 6 vertices games and 13 for 8 vertices games, whereas those for the uniformly discounted history are around 70 and 180 respectively. The average number of iterations taken to solve the games in the experiments also dropped; in the case of 8 vertices games the average decreased from 3.9281 to 2.4141 using the geometric discount.

Geometric series with bases larger than 2 were also tested; larger bases only improve performance marginally. However, at base 10 fictitious play no longer converges for some games. This is because the discounting series converges so fast that it completely overshadows the payoffs being weighted. As a result the FP* algorithm effectively becomes equivalent to a fictitious play with a fixed recall length, and it is known that any fictitious play of fixed recall length may not terminate on some games.

Figure 5.6: Number of games solved (log scale) vs. the number of iterations at which the games are solved (linear scale), using uniform discount (blue), geometric discount (green), and Fibonacci discount (red). Panels: (a) random 6 vertices games; (b) random 8 vertices games.

We then tested discounting factors based on Fibonacci numbers. The result is included in the same figure as the red trace. This version does not converge as fast as geometric discounting. It remains an open question whether there is any other general series which can be applied for even faster convergence, and what the exact threshold series is that causes non-termination.

Accelerated discounting techniques fix the problem for saturation runs; ideally we would wish the following property to hold for fictitious play:

Property 5.2.1. For any i < j, the strategies elected at iterations i and j differ, i.e. σ*_i ≠ σ*_j and π*_i ≠ π*_j.

In that case, fictitious play would visit each strategy only once; unfortunately, this is not true in general. This is so because infinite memory is used in FP*, and the algorithm may backtrack to previously discarded strategies after some new opponent strategies trigger the old strategy again. This phenomenon can be observed in the execution trace in Figure 5.7. The original parity game is shown in Figure 5.8.

In this game, player 1 selected strategy a throughout iterations 2 to 4; during iteration 5, strategy a is discarded, but in iteration 6, a is selected again.

Since strategy selection is not monotonic in fictitious play, taking a step back it is only natural to ask whether the interval used to track convergence closes strictly monotonically. For the pure FP* algorithm, this interval is the weighted average payoff obtained with each newly elected pure strategy against the frequency mix of recorded opponent strategies for the two players. For the algorithm FP* shadowed by FP1 it is the discrete interval tracked by the shadow program FP1. The convergence of both intervals, however, is not monotonic. Non-monotonicity of FP*-managed intervals often happens at the end of each intermediate saturation run. On the other hand, the interval managed by the FP1 shadow does not converge monotonically simply because the algorithm often backtracks to previously discarded strategies.

Termination and correctness of FP* on parity games follow directly from Robinson's convergence proof; this is because, in using the algorithm for

GAME-8-24-7-683
Initial strategy: X {V2=V0, V4=V7, V0=V7, V6=V7}
Y Response generated: {V3=V7, V5=V0, V1=V2, V7=V4}
X Response generated: {V2=V6, V4=V7, V0=V6, V6=V5}
Y Response generated: {V3=V6, V5=V3, V1=V2, V7=V4}
X Response generated: {V2=V6, V4=V7, V0=V6, V6=V5}
Y Response generated: {V3=V6, V5=V3, V1=V2, V7=V4}
X Response generated: {V2=V6, V4=V7, V0=V6, V6=V5}
Y Response generated: {V3=V6, V5=V3, V1=V2, V7=V4} // a
X Response generated: {V2=V6, V4=V7, V0=V6, V6=V5}
Y Response generated: {V3=V7, V5=V3, V1=V2, V7=V1} // b
X Response generated: {V2=V1, V4=V7, V0=V1, V6=V2}
Y Response generated: {V3=V6, V5=V3, V1=V2, V7=V4} // a
X Response generated: {V2=V1, V4=V7, V0=V1, V6=V2}
Y Response generated: {V3=V6, V5=V3, V1=V2, V7=V4}
X Response generated: {V2=V1, V4=V7, V0=V1, V6=V2}
...

Figure 5.7: Trace of fictitious play on game GAME-8-24-7-683

Figure 5.8: GAME-8-24-7-683; fictitious play FP* backtracks to a previously discarded strategy

parity games, we translated parity games into their numerical form, with each player's payoffs against all strategy combinations represented in a matrix, and did not modify any part of the algorithm. If the solution produced by FP* in its numerical form were not a valid winning strategy candidate of a player in the parity game, it would mean the strategy returned by FP* in numerical form is strictly dominated by a different strategy in the corresponding parity game. That winning strategy in the parity game would in turn translate into a corresponding strategy in the numerical form which produces a strictly better payoff than the one elected by FP*. That would contradict the result in Robinson's proof stating that the strategy the algorithm converges to is already optimal. Correctness of FP*/FP1 follows from correctness of FP*. Since the sequence of strategies generated by FP* and FP*/FP1 is exactly the same before termination of the second algorithm, if FP* were to terminate with the correct solution for the parity game, that strategy would be traversed by FP*/FP1 eventually. Since the loop condition in FP*/FP1 verifies whether the last strategy generated in the sequence is optimal for the game, correctness of FP*/FP1 is already implied by its termination.

FP* and FP*/FP1 are correct terminating algorithms for solving parity games; however, they are extremely inefficient due to an exponential time component of BR in the inner loop, because each BR response in the inner loop of the original FP* and FP*/FP1 algorithms involves a full traversal of a column or a row in the matrix, which has exponentially many entries with respect to the size of the parity game. We address this problem in the next section of this chapter.

5.3 FP1 and Efficient Best Replies

In the previous sections we looked at solving parity games with fictitious play by translating a game into its normal form. The performance bottleneck of the algorithm is in the implementation of best response. Because the payoff matrix is exponential in size compared with the original game, each best response is expected to access an exponential number of matrix entries with respect to the game size. It can be easily noted and shown that the best response to a single strategy can be computed efficiently in parity games without explicit evaluation of every strategy. The reason is the same one for

why 1-player parity games have efficient solutions.

This section addresses the performance issue of fictitious play on parity games by using FP1 and generating the best response efficiently using knowledge of the structure of parity games. It is already observed in game theory that fictitious play of any fixed recall length may not terminate. In adopting recall length 1, termination of fictitious play is therefore traded away for efficiency; discussion of FP1's termination is left for later chapters of this thesis. Our discussion of FP1 begins by introducing a more general definition of the 0-sum game and selection criteria for generating best responses.

5.3.1 0-sum on Payoff Vectors

Each entry in the payoff matrix can be redefined as a vector of rational numbers so that the payoffs of all vertices in the game are represented in the vector at the same time. This requires the vertices in the game to be ordered in a certain way so that the payoff of each vertex can be retrieved and updated consistently from the payoff vector using this order. We use lexicographical ordering for this purpose. For instance, vertex v0's payoff occupies position 0 of a payoff vector, that of v1 occupies position 1 in the vector, etc. The new payoff matrix is defined below.

Definition 5.3.1 (0-sum Game on Payoff Vectors). In parity game G, for each strategy σi ∈ Σ of player 0 and each strategy πj ∈ Π of player 1, the entry ~m_{i,j} in the payoff vector matrix ~M is defined pointwise at each vertex v by m_{i,j,v} = ⟨σi, πj⟩_v for all vertices v in G.

Albeit a bit unnatural in the domain of game theory, the new definition of the payoff matrix does yield a 0-sum game, since there is exactly one winner at each vertex and the payoff vectors of the two players across all vertices of a game always add up exactly to the zero vector ~0. However, this new definition seems to create more problems because quite obviously there will be pairs of payoff vectors which are not comparable against each other; in such cases the old algorithm, which evaluates each strategy choice and the corresponding payoff to select the best response, will no longer work.

Fortunately avoiding explicit evaluation of every strategy in generating the best reply is exactly what this chapter aims to achieve and Robinson’s proof

does not hinge on all entries in the payoff matrix being directly comparable with each other, but it does require payoff vectors seen by player 0 to form a sup-semilattice, and payoff vectors seen by player 1 to form an inf-semilattice, to ensure that BR is well defined. In simple terms, it only requires the existence of a maximal row choice for some combinations of columns, and the existence of a minimal column for some combination of rows, in fulfillment of Equation 3.3. This requirement is captured in the postulates below.

Postulate 5.3.2. For each well formed linear combination (c0, c1, ..., ci) of rows of the payoff matrix ~M, where ci is the number of occurrences of player 0 strategy σi, there exists a column j such that Σ_i ci × ~m_{i,j} ≥0 Σ_i ci × ~m_{i,k} for all k.

Postulate 5.3.3. For each well formed linear combination (c0, c1, ..., cj) of columns of the payoff matrix ~M, where cj is the number of occurrences of player 1 strategy πj, there exists a row i such that Σ_j cj × ~m_{i,j} ≥1 Σ_j cj × ~m_{k,j} for all k.

The linear combination of rows or columns used in the postulates refers to the frequency mix of the recorded strategies. In the context of FP1 being discussed in this chapter, that linear combination really means exactly one most recent strategy of either player because the recall length of both players is exactly 1.

We now give the main body of FP1 pseudocode in Figure 5.9.

This version of fictitious play with recall length 1, customized for parity games, deviates from the original fictitious play design in two ways. First of all, in each iteration only one best reply is computed; this produces a more succinct implementation, and it also means that only one of the lower or upper bounds of the interval is verified and updated during each iteration. This means that, in order to maintain the correctness of the algorithm, we would have to keep track of the convergence interval for the last two iterations instead of only the last iteration. Only when the convergence interval has been closed in both of the last two iterations can we be sure that the last two strategies seen are truly optimal. However, the second change in the algorithm lifts us from the chore of keeping track of convergence intervals for two iterations by remembering the witness strategies for the upper and lower bound of the

1: FP1(G)
2: lbMap ← (v → int)
3: ubMap ← (v → int)
4: σ ← (optimal strategy of player 0 securing lower bound)
5: π ← (optimal strategy of player 1 securing upper bound)
6: for all v ∈ G do
7:   lbMap.put(v, max odd priority)
8:   ubMap.put(v, max even priority)
9: end for
10: defender = player 0
11: attacker = defender.opponent
12: ds = random player 0 strategy
13: as = BR(ds)
14: while lbMap != ubMap do
15:   if player 0 defending then
16:     for all v ∈ G do
17:       if lbMap.get(v) <0 ⟨ds, as⟩_v then
18:         lbMap.put(v, ⟨ds, as⟩_v)
19:         σ.put(v, ds.get(v))
20:       end if
21:     end for
22:   else
23:     for all v ∈ G do
24:       if ubMap.get(v) <1 ⟨ds, as⟩_v then
25:         ubMap.put(v, ⟨ds, as⟩_v)
26:         π.put(v, ds.get(v))
27:       end if
28:     end for
29:   end if
30:   defender = attacker
31:   attacker = defender.opponent
32:   ds = as
33:   as = BR(ds)
34: end while
35: return (σ, π)

Figure 5.9: Main body of FP1

convergence interval. So when the convergence interval is closed, instead of returning the last two strategies seen, we return the witness strategies recorded for each of the lower and upper bound. Additionally this change actually reduces the run lengths of fictitious play quite significantly because it no longer requires optimal strategies of both players to appear together in sequence next to each other. This can be observed in concrete examples when we look at strategy graphs derived from parity games and FP1 in the next chapter.

5.3.2 Selection Criteria for BR

In order to meet the requirements in the postulates, we need to introduce a partial order to the payoff vectors. Here is our first attempt:

Definition 5.3.4 (Relevance Ordering 1 on Payoff Vectors). For payoff vectors ~p, ~q,

• ~p ≥0 ~q if and only if ~p_v ≥0 ~q_v for every vertex v in game G, and
• ~p ≥1 ~q if and only if ~q ≥0 ~p.

In the definition, the two orders ≥0 and ≥1 are dual to each other. With this definition, a payoff vector ~p is better than a different payoff vector ~q from the perspective of some player if it is better (from that perspective) than the other at every component in the vector. Otherwise the two vectors are incomparable.

Example 5.3.5. Here we show some examples of payoff vector comparisons. The payoff vectors come from a 4 vertices game; the payoff on each vertex in the vector is recorded in the order v0, v1, v2, v3.

• (2, 2, 4, 4)^T ≥0 (−3, −3, −1, −1)^T
• (2, 2, 4, 4)^T ≥0 (2, 2, 2, 2)^T
• (2, 2, 4, 4)^T, (4, 4, 2, 2)^T are incomparable
• (2, 2, −3, −3)^T, (4, 4, −5, −5)^T are incomparable
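A minimal sketch of this pointwise order (Python, with an illustrative function name; payoffs use the signed encoding of the example above, where larger numbers are better for player 0 and a positive component means player 0 wins that vertex):

def compare_pointwise(p, q):
    # Relevance Ordering 1: p >=0 q iff p is at least as good for player 0 at
    # every component; the dual >=1 is obtained by swapping the arguments.
    if all(a >= b for a, b in zip(p, q)):
        return ">=0"
    if all(a <= b for a, b in zip(p, q)):
        return "<=0"
    return "incomparable"

For instance, compare_pointwise((2, 2, 4, 4), (4, 4, 2, 2)) returns "incomparable", matching the third item of Example 5.3.5.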

Obviously the payoff vectors in the third item of Example 5.3.5 are not comparable. In cases like this we wish to find an upper bound or a lower bound of these two vectors at another vector in the payoff matrix, such as (2, 2, 2, 2)^T if player 1 is doing the best response (lower bound), or (4, 4, 4, 4)^T if player 0 is doing the best response (upper bound). Unfortunately such a bound cannot always be found for the order defined by Definition 5.3.4 with some games; one such game is shown in Figure 5.10.

[Game graph for Figure 5.10 omitted: vertices V0[7], V1[6], V2[1], V3[5], V4[2], V5[0].]

Figure 5.10: A game with which the naive partial order ≥0 defined over payoff vectors would not work for fictitious play; the query used to extract this game from the workbench is ‘ALL G (SOLUTION(G, EXP, X)=SOLUTION(G,FP1,X)) AND (SOLUTION(G, EXP, Y)=SOLUTION(G,FP1,Y))’

In Figure 5.10, player 0 selects strategy σ0 = {v0 → v1, v2 → v1, v4 → v2} as the initial strategy. If FP* uses the above partial order, we will see the following strategies played out:

• σ0 = {v0 → v1, v2 → v1, v4 → v2}
• π0 = {v1 → v0, v3 → v5, v5 → v3}, Eπ0 = (−7, −7, −7, −5, −7, −5)
• σ1 = {v0 → v4, v2 → v4, v4 → v2}, Eσ1 = (2, 2, 2, −5, 2, −5)

When π1 is generated against σ0 and σ1, there are only two strategies and

they only differ at vertex v1.

• v1 → v0 gives payoff vector (−2.5, −2.5, −2.5, −5, −2.5, −5)
• v1 → v3 gives payoff vector (−1.5, −5, −1.5, −5, −1.5, −5)

From player 1’s perspective (the min player), v1 → v0 receives better payoff in the first component (−2.5 < −1.5), but v1 → v3 receives better payoff in the second component (−5 < −2.5). These two payoff vectors are not comparable and they are not bounded from below at any third vector. If the algorithm, in the spirit of expecting a lower bound, consistently picks v1 → v0 and ignores the other strategy, the same choice after 10 iterations will give payoff vectors:

• v1 → v0 gives payoff vector (1.1, 1.1, 1.1, −5, 1.1, −5)
• v1 → v3 gives payoff vector (1.3, −5, 1.3, −5, 1.3, −5)

As t approaches infinity, the payoff vector converges to the wrong value (2, 2, 2, −5, 2, −5). This happens because the initial strategy σ0 selected is a very bad choice for player 0. Player 1 is able to find a response that achieves a better payoff (−7) than he would normally get in an equilibrium situation. For player 1 to keep that advantage, player 0 would have to keep playing a sub-optimal

strategy. But because its sub-optimality is not reflected locally at vertex v0, the other strategy, which contributes genuine progress in other components, is effectively blocked.

If we take another look at the payoff vectors after 10 iterations, the strategy involving v1 → v3 wins more vertices for player 1. This indicates another way to design the partial order. That is, before comparing them on the payoff at each component, they can be compared on the number of vertices won first. This gives us the second version of the partial order.

Definition 5.3.6 (Relevance Ordering 2 on Payoff Vectors). For payoff vectors ~p, ~q,

• if the number of positive elements in ~p is greater than the number of positive elements in ~q, then ~p ⊒0 ~q; otherwise
• ~p ⊒0 ~q if and only if ~p_v ≥0 ~q_v.
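A minimal sketch of this second ordering (Python, same signed encoding as before, where a positive component means player 0 wins that vertex; the function name is illustrative):

def compare_ordering2(p, q):
    # Definition 5.3.6: first compare the number of vertices won by player 0
    # (positive components), then fall back to the pointwise comparison of
    # Definition 5.3.4.
    wins_p = sum(1 for a in p if a > 0)
    wins_q = sum(1 for a in q if a > 0)
    if wins_p > wins_q:
        return ">=0"
    if wins_q > wins_p:
        return "<=0"
    if all(a >= b for a, b in zip(p, q)):
        return ">=0"
    if all(a <= b for a, b in zip(p, q)):
        return "<=0"
    return "incomparable"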

As each player prioritises winning more vertices in the plays, the fictitious play process will divide the vertices into their correct winning regions. But again, not all cases of incomparable vectors without upper/lower bounds

[Game graph for Figure 5.11 omitted: vertices V0[7], V1[2], V2[1], V3[5], V4[3], V5[0].]

Figure 5.11: A game for which the second version ⊒0 of the partial order would still fail for fictitious play; the query used to extract this game from the workbench is ‘ALL G (SOLUTION(G, EXP, X)=SOLUTION(G,FP1,X)) AND (SOLUTION(G, EXP, Y) = SOLUTION(G,FP1,Y))’

are eliminated. This can be seen with a tiny modification to the previous example.

In Figure 5.11 priorities of vertices are updated in such a way that in all cycles of the game, the maximum priority is an odd number, so that the new mechanism introduced to track the number of winning vertices is rendered useless. This time, after 10 iterations we will be seeing incomparable vectors like:

• v1 → v0 gives payoff vector (−3.4, −3.4, −3.4, −5, −3.4, −5)
• v1 → v3 gives payoff vector (−3.2, −5, −3.2, −5, −3.2, −5)

Again there is no lower bound for these two incomparable vectors using Definition 5.3.6. We can look at the 0-sum game defined over the payoff vector as parallel 0-sum games over each vertex. Each vertex's 0-sum game can converge to the correct payoff and some winning strategies locally, but the winning strategy locally selected may not be correct globally. What needs to be done is to synchronize these strategies converged on each vertex, so they agree on a single strategy which is correct both locally and globally.

The second version of the partial order, although wrong, showed us a hint on how to do the synchronization. The first item in the definition of ⊒0

prioritises segmenting the game into different winning regions when the payoff vectors are not comparable with each other. The algorithm returns the correct winning region but not always the correct winning strategies. So the correct partial order not only needs to segment the winning regions, but really has to be able to stratify components of payoff vectors into the correct attractors with optimal payoff.

This intuition is formulated in the third and final version of the partial order. We only present the partial order for player 0. The partial order from player 1’s perspective is the dual.

Definition 5.3.7 (Relevance Ordering 3 on Payoff Vectors). For payoff vectors ~p, ~q, collect the payoffs appearing in both vectors into a single set S. For each number ϕ ∈ S in descending order:

• if at any component i, ~p_i = ϕ and ~q_i <0 ϕ, then ~p ⊇0 ~q,
• else if at any component i, ~q_i = ϕ and ~p_i <0 ϕ, then ~q ⊇0 ~p.

This time when we look at the game in Figure 5.11 and apply Definition 5.3.7, the correct vector (−3.2, −5, −3.2, −5, −3.2, −5) will emerge unambiguously as the winner (components with payoff −5 are checked first). We now look at some more examples.

Example 5.3.8. Here we show some examples of payoff vector comparisons. The payoff vectors come from a 4 vertices game; the payoff on each vertex in the vector is recorded in the order v0, v1, v2, v3.

• (2, 2, 4, 4)^T ⊇0 (−3, −3, −1, −1)^T
• (2, 2, 4, 4)^T ⊇0 (2, 2, 2, 2)^T
• (4, 4, −5, −5)^T ⊇0 (2, 2, −3, −3)^T
• (2, 2, 4, 4)^T, (4, 4, 2, 2)^T are incomparable
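The final ordering can be sketched as a scan over the payoffs present in the two vectors, taken in descending order of player 0's preference (Python, same signed encoding as before; the dual order for player 1 scans in the opposite direction):

def compare_ordering3(p, q):
    # Definition 5.3.7: scan the payoffs present in the two vectors from best
    # to worst for player 0; the first vector that holds such a payoff at a
    # component where the other vector is strictly worse wins the comparison.
    for phi in sorted(set(p) | set(q), reverse=True):
        p_wins = any(a == phi and b < phi for a, b in zip(p, q))
        q_wins = any(b == phi and a < phi for a, b in zip(p, q))
        if p_wins and not q_wins:
            return "p"            # p ⊇0 q
        if q_wins and not p_wins:
            return "q"            # q ⊇0 p
        if p_wins and q_wins:
            return "incomparable"
    return "equal"

On item 4 of Example 5.3.8 the scan stops at payoff 4, which each vector holds where the other has 2, so the vectors remain incomparable; Proposition 5.3.9 below then guarantees a bounding third strategy.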

Since Definition 5.3.7 is still only providing a partial order, there are incomparable payoff vectors. However this time the two vectors can only be incomparable if the highest unmatched payoff value appears in both vectors at different components and the corresponding components in the other

vector have different values, like the case in item 4 of Example 5.3.8. We can prove in this case that an upper bound and a lower bound do exist for player 0 and 1 respectively.

Proposition 5.3.9. The partial orders ⊇0 and ⊇1 introduced in Definition 5.3.7 satisfy Postulates 5.3.2 and 5.3.3 respectively for FP1.

Proof. We will prove this for FP1 for the upper bound only; the conclusion for the lower bound can be drawn analogously.

In parity game G with strategies Σ, Π and payoff matrix ~M, given player 1 strategy π, for any two player 0 strategies σr and σs against π, the corresponding payoff vectors from the payoff matrix are ~r and ~s. We will show that either ~r and ~s are directly comparable with each other, meaning either ~r ≥0 ~s or ~s ≥0 ~r, or that there exists a strategy σt for player 0 against π with payoff vector ~t such that ~t >0 ~r and ~t >0 ~s. All priorities occurring in the two vectors are extracted into a set and the two vectors are evaluated at each component in descending order of the payoff carried. Either the two vectors have matching values at each component, in which case the two vectors are the same and are clearly comparable, or there is a largest unmatched priority u such that u occurs in some components of one of the vectors and a different priority is in the same component of the other vector.

Without loss of generality, we assume ~r_v = u and ~s_v <0 u. There are only two different cases to consider.

1. There is no component v′ in the two vectors such that ~s_{v′} = u and ~r_{v′} <0 u; in this case, by Definition 5.3.7, ~r clearly dominates ~s for player 0.

2. Suppose there is indeed a component v′ in the two vectors such that ~s_{v′} = u and ~r_{v′} <0 u. By definition of attractor, both vertices u (we assume unique priorities in the game, hence the priority can be abused to refer to the vertex unambiguously) and v′ are attracted to u in the sub-game Gπ. In particular u is reachable from itself on one such path where u is the maximum priority; we fix this path as T. Vertex v′ is 0-attracted to T in Gπ. However, because ~r_v = u and ~s_v <0 u, vertex v is also 0-attracted to T in the same sub-game Gπ. This means both v and v′ are attracted to T in the sub-game Gπ. By the memoryless property of attractors, there must be a strategy σt in the game Gπ such that both v and v′ occur in the attractor set of T at the same time.

The rationale behind the partial order of Definition 5.3.7 is for a player to push as many vertices as possible into the attractors of the most favorable priorities given the opponent strategy. In the sub-game where the strategy of one player is fixed, the objective of the other player can be reduced to reachability problems.

Much of the effort in this section is devoted to finding a suitable definition of the partial orders so that the fictitious play mechanism and part of Robinson's proof apply to the new type of 0-sum game introduced, and to showing, independently of the actual algorithm computing the best reply, that as long as the best reply procedure produces best replies satisfying the partial order of Definition 5.3.7, when the algorithm terminates, it does terminate with the correct equilibrium. It should be clear that when the process does not terminate, loosely speaking, this has nothing to do with the definition of the 0-sum game nor the definition of the partial order, but is solely due to the very nature of fixed recall lengths.

5.4 Best Response Implementation

Best response is a symmetric procedure for player 0 and player 1, in the sense that the best responses for either player follow the same algorithmic steps. However, the two players use their own relevance ordering in the algorithm. For clarity and simplicity, we will describe the algorithm only from the perspective of player 0. All pseudocode given in this section will only describe how player 0 generates a best reply against player 1 strategies. Player 1's best reply against player 0 strategies can be easily inferred from the pseudocode given by substituting all occurrences of player 0's relevance ordering with that of player 1's.

5.4.1 Attracting to Self Reachable Dominant Priorities

Given a strategy of one player, a best response of the other player will always result in a cycle, since a play is terminated upon seeing a cycle. The goal of generating the best response against a strategy therefore reduces to finding an optimal cycle, and paths leading to the cycle, against the given opponent strategy. The first implementation of the best reply is a direct translation of the partial order of Definition 5.3.7.

1: BR(G, π)
2: Gπ ← G
3: for all v ∈ V do
4:   σ(v) ← null
5:   DROP-EDGE (v, w) in Gπ where v ∈ V1, (v, w) ∈ E, (v, w) ∉ π
6: end for
7: for all v ∈ V in descending order of ≤0 do
8:   U ← {u | u ∈ V, ρ(u) ≤ ρ(v)}
9:   (U, ΣU) ← attr0(Gπ[U], v)
10:  if REACH(Gπ[U], v, U) then
11:    (U, ΣU) ← attr0(Gπ, U)
12:    for all u ∈ U ∩ V0 do
13:      if σ(u) = null then
14:        σ(u) = ΣU(u)
15:      end if
16:    end for
17:    DROP-EDGES(U × (V \ U)) in Gπ
18:  end if
19: end for
20: return σ

Figure 5.12: First attempt of best reply implementation, by attracting to most favourable dominant cycles.

As shown in the pseudocode listed in Figure 5.12, the algorithm first derives

the single player parity game Gπ by removing all player 1 edges not defined in the opponent strategy π, while at the same time initializing player 0 strategy σ to be returned later. In the second part of the algorithm, it goes through each priority in descending order of player 0’s preference, collects vertices with smaller priority than the vertex of interest and labels them as U (line

8). It then tries to find a dominant cycle in the sub-game Gπ[U] carried by the current and most favourable priority of player 0 (line 9); afterwards, in Gπ it finds a viable path prefix for joining the dominant cycle for each vertex not in the cycle (line 11). Finally, σ is updated accordingly for all player 0 vertices affected by the cycle, and the cycle together with the attracted vertices is truncated from the residual game. The process is repeated for the next priority until all the priorities in the game are processed.
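The attractor computation attr0 used in lines 9 and 11 of Figure 5.12 is the standard player 0 attractor for parity games; a minimal sketch (Python, assuming an adjacency-list representation with illustrative parameter names, not the thesis's own implementation) is:

def attr0(vertices, owner, succ, target):
    # Player 0 attractor of the set `target`: repeatedly add a vertex when
    # player 0 owns it and has some edge into the set, or when player 1 owns it
    # and all of its edges lead into the set; also record a witness player 0
    # strategy on the newly attracted player 0 vertices.
    attractor, strategy = set(target), {}
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v in attractor:
                continue
            if owner[v] == 0 and any(w in attractor for w in succ[v]):
                strategy[v] = next(w for w in succ[v] if w in attractor)
                attractor.add(v)
                changed = True
            elif owner[v] == 1 and succ[v] and all(w in attractor for w in succ[v]):
                attractor.add(v)
                changed = True
    return attractor, strategy

This runs in polynomial time, which is what makes a single best reply efficient even though the payoff matrix of the normal form game is exponential.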

Even though FP1 gives correct solutions when it terminates with this version of best reply, in experiments it often results in undesirable and unnecessary best response strategy loops of length greater than 2. A loop case of length 4 is shown in the example below, and the game is given in Figure 5.13.

Example 5.4.1. For the game in Figure 5.13, FP1 will loop with the following non-optimal strategy cycles when payoff valuation of best reply is used. Each strategy in the list is a best reply to the previous opponent strategy and in

particular, σ0 is a best reply to π1.

• σ0 = {v0 → v2, v2 → v0, v4 → v3}
• π0 = {v1 → v2, v3 → v0, v5 → v2}
• σ1 = {v0 → v2, v2 → v5, v4 → v2}
• π1 = {v1 → v2, v3 → v4, v5 → v1}

5.4.2 Degeneracy and Valuation Refinement

The fictitious play process is similar in spirit to the alternating path-following algorithm by Lemke and Howson [29], which finds an equilibrium point in generic (not necessarily 0-sum) bimatrix games. That algorithm can also be used to solve parity games, but the pivoting steps in each iteration of the algorithm are prohibitively expensive in the context of parity games. Lemke and Howson's algorithm is proved to require an exponential number of pivoting steps in the worst case in comparison to the description size of the normal form game [29].

Lemke and Howson’s algorithm requires the games to be nondegenerate to ensure termination. On the other hand, degenerate games can be modified by introducing tiny perturbations to the payoffs in the matrix to make the

Figure 5.13: Game-6-12-5-1176 for which FP1 will not terminate with version 1 of the best reply implementation; the query used to extract this game from the workbench is ‘ALL G COUNT(VERTICES(G)) <=6 AND CL(G,FP1)=2’

game nondegenerate and compute the same equilibrium. However, the extra perturbation steps are required only for the proofs; with a lexicographical interpretation, hardly any extra work has to be done in practice regarding the perturbations. There are many ways to define nondegeneracy of games, and they are all equivalent; here we borrow from [40] a relatively more intuitive definition.

Definition 5.4.2 (Nondegenerate Game). A 2-player game is called nondegenerate if the number of pure best responses to a mixed strategy K never exceeds the size of the support of K.

For normal form games derived from parity games, this condition is clearly not met since for any chosen strategy of a player, there are often several different best response candidates which could give the same best payoff. Later Ulrich Berger showed [4] that fictitious play converges to a pure Nash equilibrium for nondegenerate ordinal potential bimatrix games (not necessarily 0-sum).

Parity games, being 0-sum games with known pure equilibrium solutions, can certainly be classified as ordinal potential games; however, these games are obviously degenerate when converted into normal form. Fortunately there are properties built into parity games, but not encoded in the payoff matrix, which we could use to make the game ‘less degenerate’. This relates to refining the payoff entries in the payoff matrix from natural numbers (the actual game value) into structures which behave like numerical perturbations when interpreted lexicographically. We still compute the exact equilibrium because the convergence criteria are captured on only the first component of the valuation, while other components in the valuation help to maintain a more refined partial order for the purpose of the perturbation.

Coming back to the previous loop case example in the game from Figure 5.13,

for player 1 to join the dominant cycle carried by 7 from vertex v5, there

are two potential paths: (v1, v2, v0) and (v2, v0). Both paths yield the same payoff 7, so the partial order in Definition 5.3.7 alone cannot differentiate the two strategies, each going through a different path, in terms of effectiveness. However we can intuitively tell that player 1 is better off going through the

path involving v1 because v1 has a large odd priority. Similarly in strategy

π1, in order to join the dominant cycle carried by payoff 9, player 1 is better

off with the path through vertex v0 than with the path through v4 from v3,

because v0’s priority is better in player 1’s relevance ordering. A potential refinement for the valuation is to include all vertices traversed in the path as an extension to the valuation when the expected payoffs t are the same for two strategies. This gives us the valuation tuple (t, p) where t is the target priority of the dominant cycle and p is the path prefix. Since the whole tuple is evaluated on a lexicographical order, we only need to define an order for the p-component. For this purpose, two sets of natural numbers can be compared on the largest unmatched element. If the unmatched element is even, the set containing that element is considered a better path prefix from player 0’s perspective; otherwise that set is considered a better path prefix from player 1’s perspective.
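A minimal sketch of this p-component comparison (Python, with an illustrative function name), comparing two sets of priorities on their largest unmatched element:

def better_path_prefix(p1, p2):
    # Compare two sets of priorities on the largest element occurring in
    # exactly one of them; an even unmatched element favours the set that
    # contains it from player 0's perspective, an odd one favours that set
    # from player 1's perspective.
    unmatched = set(p1) ^ set(p2)
    if not unmatched:
        return None                      # the prefixes are indistinguishable
    top = max(unmatched)
    holder = "p1" if top in set(p1) else "p2"
    favoured = "player 0" if top % 2 == 0 else "player 1"
    return holder, favoured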

The valuation of the p-component here, for valuation scheme (t, p), looks similar to Jurdzinski’s maxdiff evaluation of the p-component in the play profile (t, p, d) of his strategy improvement algorithm [26], except that here all vertices in the path prefix are recorded, whereas in the strategy improvement algorithm only vertices with priorities larger than t are recorded in the p-component. The play profile (t, p, d) from the strategy improvement algorithm is actually more refined than (t, p). This is because smaller priorities have less impact in the formation of dominant cycles. Strategy formation in both algorithms is essentially a path finding procedure. When the priority of a vertex encountered in the path is smaller than the payoff of the dominant cycle, other structural properties of a vertex, rather than its priority, begin to assert more influence on the outcome of a play. Such structural properties could be connectedness with other parts of the game, distance to the cycle target, etc. Evaluating priorities in a path when the largest unmatched value is already too small will introduce noise rather than progress to convergence. Therefore, for evaluating the path, Jurdzinski’s path prefix selection (which filters out vertices with priorities smaller than the cycle payoff) is intuitively better than including all vertices encountered in the path.

When we move on to use the play profile valuation (t, p, d) in best reply, the problem in the previous example is fixed. To fit the new valuation scheme (t, p, d) into the current algorithm, line 11 of the algorithm in Figure 5.12 is changed to the sub-valuation procedure of strategy improvement. However, the resulting games from adopting the new valuation scheme (t, p, d) are still degenerate in general. Being a summative abstraction of plays in games, it can never ensure nondegeneracy; eventually, at some point in the process, some plays will have to share the same abstraction.

Example 5.4.3. For the game in Figure 5.14, FP1 will loop with the following non-optimal strategy cycles when the (t,p,d) valuation of best reply is used.

• σ0 = {v0 → v6, v2 → v7, v4 → v1, v6 → v1}
• π0 = {v1 → v7, v3 → v0, v5 → v3, v7 → v0}
• σ1 = {v0 → v5, v2 → v5, v4 → v0, v6 → v0}
• π1 = {v1 → v4, v3 → v5, v5 → v2, v7 → v0}

In this example, the optimal equilibrium strategy for player 0 according to the valuation scheme (t, p, d) is σ = {v0 → v5, v2 → v7, v4 → v1, v6 → v1}, and for player 1, π = {v1 → v7, v3 → v5, v5 → v2, v7 → v0}. However this pair of optimal strategies is not reachable from the initial strategy selected when this strategy loop of length 4 is encountered.

In Example 5.4.3, when π1 is generated, it differs from the optimal strategy π only at vertex v1. To join the dominant cycle (v5, v2) carried by 13 from vertex v1, both paths, through v7 and through v4, yield the same valuation (13, {}, 4). We observe another point of degeneracy, and unfortunately the algorithm defaults to π1 instead of the real optimal strategy.

5.4.3 Payoff Interval Augmented Valuation

All the perturbations we have seen so far are created using local static information from the derived single player parity game Gπ. Each time a best response is generated, the active player takes into account only the opponent strategy currently being evaluated. Construction of the valuation scheme (t, p, d) also reflects this observation. However, since the fictitious play process tracks the payoff interval of each vertex in the game throughout the process, the interval information can be used to help further refine the valuation scheme. We will use an example to motivate the design of this extension.

Example 5.4.4. For the game in Figure 5.15, FP1 will loop with the following non-optimal strategy cycles when the (t, p, d) valuation of best reply is used.

Figure 5.14: Game-10-30-9-2754 for which FP1 does not terminate with (t,p,d) valuation of the best reply implementation for some initial strategies; the query used to extract this game from the workbench is ‘ALL G COUNT(VERTICES(G)) <=10 AND CL(G,FP1)=2’

• σ0 = {v0 → v3, v2 → v3, v4 → v2, v6 → v4}
• π0 = {v1 → v0, v3 → v6, v5 → v6, v7 → v6}
• σ1 = {v0 → v4, v2 → v1, v4 → v0, v6 → v4}
• π1 = {v1 → v2, v3 → v2, v5 → v3, v7 → v1}

In this game, the optimal strategy for player 0 implied by the valuation scheme (t, p, d) is σ = {v0 → v4, v2 → v6, v4 → v0, v6 → v4} and the optimal strategy for player 1 is π = {v1 → v2, v3 → v6, v5 → v6, v7 → v6}. Notice that π0 in the example differs from the optimal strategy only at vertex v1. The dominant cycle formed by the pair of best replies σ0 and π0 is (v6, v4, v2, v3), carried by vertex v6 with payoff 11. From vertex v1 to vertex v6, both paths, through v0 and through v2, yield the same valuation profile (11, {}, 3). The valuation scheme is not specific enough to suggest the optimal strategy over π0 unless further perturbation can be applied to put these strategies in order.

In order to show how the payoff interval helps refine the existing valuation scheme, we track the changes in the payoff intervals. Assume σ0 is selected as the initial strategy.

The game begins with the interval (11, 8) at every vertex. In the first iteration with σ0 and π0, no improvement to the intervals is made. During the second iteration, π0 is found to be securing at least 5 for the upper bound of the intervals at every vertex. During the third iteration, σ1 is found to be securing 5 for the lower bound on some vertices and 9 on the others. No further improvement can be made, and from the fifth iteration onwards the play loops through strategy pairs already seen.

For valuation scheme (t, p, d, i) where i is the payoff interval recorded so far at the vertex, we define a linear order over the intervals:

Definition 5.4.5 (Interval Order). For payoff intervals (l1, u1) and (l2, u2), (l1, u1) ≤0 (l2, u2) if and only if

• u1 <0 u2, or
• u1 = u2 and l1 ≤0 l2.

Figure 5.15: Game-8-27-7-1694 for which FP1 will not terminate with (t, p, d) valuation of the best reply implementation; the query used to extract this game from the workbench is ‘ALL G COUNT(VERTICES(G)) <=8 AND CL(G,FP1)=2’

             v0      v1      v2      v3      v4      v5      v6      v7
(σ0, π0)    11, 8   11, 8   11, 8   11, 8   11, 8   11, 8   11, 8   11, 8
(σ1, π0)    11, 5   11, 5   11, 5   11, 5   11, 5   11, 5   11, 5   11, 5
(σ1, π1)     5, 5    9, 5    9, 5    9, 5    5, 5    9, 5    5, 5    9, 5
(σ0, π1)     5, 5    9, 5    9, 5    9, 5    5, 5    9, 5    5, 5    9, 5
(σ0, π0)     5, 5    9, 5    9, 5    9, 5    5, 5    9, 5    5, 5    9, 5

Figure 5.16: Payoff interval at each vertex through iterations of fictitious play with valuation scheme (t, p, d); the initial strategy is σ0, and each entry in the table shows the lower and upper bound of the payoff interval.

This linear order over payoff intervals is then used to extend Jurdzinski’s valuation scheme given in Definition 2.2.17.

Definition 5.4.6 (Play Profile Order extended by Payoff Interval). For play profiles P1 = (t1, p1, d1, i1), P2 = (t2, p2, d2, i2), P1 ≥0 P2 if and only if

• (t1, p1, d1) >0 (t2, p2, d2), or
• (t1, p1, d1) = (t2, p2, d2) and i1 ≥0 i2.
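A minimal sketch of Definitions 5.4.5 and 5.4.6 (Python). The reward function below encodes one reading of the order on single priorities (even beats odd, a larger even priority is better, a smaller odd priority is better), which is an assumption about how the bounds are compared; the comparators tpd_gt0 and tpd_eq for the underlying (t, p, d) play profiles are assumed to be given and are not reproduced here.

def reward0(x):
    # Player 0's preference over single priorities.
    return x if x % 2 == 0 else -x

def interval_geq0(i1, i2):
    # Definition 5.4.5 read as >=0: compare upper bounds first and break ties
    # with the lower bounds.
    (l1, u1), (l2, u2) = i1, i2
    if u1 != u2:
        return reward0(u1) > reward0(u2)
    return reward0(l1) >= reward0(l2)

def extended_profile_geq0(P1, P2, tpd_gt0, tpd_eq):
    # Definition 5.4.6: the payoff interval only breaks ties between profiles
    # whose (t, p, d) parts are equal.
    (t1, i1), (t2, i2) = (P1[:3], P1[3]), (P2[:3], P2[3])
    return tpd_gt0(t1, t2) or (tpd_eq(t1, t2) and interval_geq0(i1, i2))

Under these assumptions the profile (11, {}, 2, (5, 5)) compares as ≥0 the profile (11, {}, 2, (9, 5)), which is exactly the comparison made in the example that follows.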

To continue with our previous example using the new valuation scheme given in Definition 5.4.6, we review the payoff interval generated at each iteration. During iteration 5 when the next player 1 strategy is to be generated, if we

look at the path from vertex v1 to vertex v6 again, through v0 and through v2, the payoff interval augmented play profile at v0 will be (11, {}, 2, (5, 5)) and the play profile at v2 will be (11, {}, 2, (9, 5)). Because (11, {}, 2, (5, 5)) ≥0 (11, {}, 2, (9, 5)), player 1 has better upwards potential at v2, since v2 has the same upper bound as v0 but a better lower bound. The payoff interval augmented valuation will choose the optimal strategy during iteration 5 instead of π0. The algorithm will then terminate after iteration 6. Refinement steps of valuation schemes from (t) to (t, p) and (t, p, d) are strict extensions over their shorter counterparts. Components are evaluated in lexicographical order. Subsequent components are only evaluated when previous components are ambiguous in suggesting a unique optimal strategy over other strategies. The extension from (t, p, d) to (t, p, d, i) is also a strict refinement of the play profile before any payoff interval update. It is different from the other local extensions since the play is no longer memoryless: best responses against the same strategy may change over time when the payoff interval is updated.

Using the data set of random games with 8 vertices, we compare the performance of fictitious play using different valuation schemes.

We can see from Figure 5.17 that in terms of run time performance and the number of iterations it takes the algorithm to solve the games, there is not much difference among the three valuation schemes. More refined schemes only marginally improve the average number of iterations. The only significant difference is in the number of games fictitious play fails to solve. These are the

scheme      avg. runtime   avg. run-length   max run-length   unsolved
(t)         58.7ms         3.3399            12               485
(t,p,d)     57.3ms         3.3360            12               80
(t,p,d,i)   63.3ms         3.3357            12               59

Figure 5.17: Performance of FP1 with different valuation schemes on 8 vertices random games.

games with which FP1 would not terminate due to the presence of non-optimal strategy cycles. However, in our experimental framework such cycles are detected by hooking a proxy into the algorithm using reflection, and once a cycle is detected, the algorithm is forced to terminate, leaving the game unsolved.

Generally speaking, more refined valuation schemes help FP1 terminate better. However, one observation not reflected in the table, but detected using the tools in our workbench, is that the games not solved by the more refined valuation scheme are occasionally solved by their shorter counterparts. That is, there are games for which FP1 would terminate with scheme (t,p,d) but not with scheme (t,p,d,i). The same holds between schemes (t) and (t,p,d). Even though the more refined scheme (t,p,d,i) is systematically better at helping FP1 terminate, it does not stop the shorter scheme from occasionally getting a lucky pick on strategies giving the same (t,p,d) valuation.

In subsequent chapters, we will focus on valuation scheme (t,p,d) instead of the more refined ones. The first and most important reason is that FP1 would no longer be memoryless when scheme (t,p,d,i) is used: best response against the same strategy may change over time when the payoff interval is updated. This would make it extremely difficult for anyone trying to work with the trace of strategies by hand. The second reason is that we want to delegate cycle resolution to a different method outside the core algorithm of fictitious play, so that both fictitious play and the cycle resolution algorithms are easier to understand.

5.5 Experiments

Before we end this chapter, we take a look at a few sets of experiments. The first experiment in Figure 5.18 shows the frequency of non-optimal strategy cycles when FP1 is running on various benchmark data sets. These are the games for which FP1 would not terminate.

Data sets in Figure 5.18 are the standard random games with 8, 10, 12, 24, 32, and 1024 vertices respectively. We have already discussed these data sets in Chapter 4. For the convenience of readers, they all have out-degree ranging from log(n) to log(n^2) and each data set contains around 200,000 games. When FP1 encounters strategy cycles of length greater than 2, the algorithm will not be able to terminate on its own. With the help of a cycle detection proxy, the algorithm will leave these games unsolved. The maximal cycle length we have seen across all the data sets is 10, and it occurred in a 32 vertices game.

data set   4-cycle    6-cycle    8-cycle    10 or more   total
8          0.03450%   0.00550%   0%         0%           0.04000%
10         0.11756%   0.01290%   0.00072%   0%           0.13118%
12         0.22322%   0.01752%   0.00100%   0%           0.24174%
24         0.29854%   0.05226%   0.00859%   0%           0.35939%
32         0.18861%   0.02616%   0.00253%   0.00127%     0.21857%
1024       0.01172%   0%         0%         0%           0.01172%

Figure 5.18: Frequency of non-optimal strategy cycles of different lengths for FP1 on various data sets when fictitious play is used to solve the games. Each data set is labeled by the number of vertices in the games of the data set. Each data set contains around 200,000 games. The total column sums up the frequency of non-optimal strategy cycles of all lengths, this can also be interpreted as the frequency of games not solved by fictitious play in each data set.

One peculiar feature in Figure 5.18 is the trend of occurrences of unsolved games across different data sets. It increases initially against game size but then falls back down when the game size is greater than 32. There are three potential causes we could think of. The first potential cause could be the bias we discussed in Chapter 4: when the size of random games gets larger, the pseudorandom number generator is no longer capable of generating

all permutations in the data space. Using a better pseudorandom number generator does not necessarily improve the result, for two reasons. On the one hand, no realistic pseudorandom generator can ever remotely match the population space for even small parity games with a few thousand vertices. On the other hand, the sample space we have is too small to really feel the glass ceiling imposed by the pseudorandom number generator. Another potential cause could be that our choice of out-degree for games across all data sets is flawed and the scheme is not picking equivalent classes of games at different game sizes. The third potential cause could be that the probability of a game being solved by fictitious play tends to 1 as the game size goes to infinity, so harder games are indeed more difficult to find at larger game sizes.

We do not have a strong conviction about the true reason behind this observation and will leave the discussion of this topic to later chapters on strategy graphs and cycle resolution. Regardless of what the theory may be behind that trend, the bottom line is that the vast majority of games are solved by FP1: less than 0.5% of the games are not solved by FP1 across all data sets. This suggests that FP1 could be a good candidate for solving parity games in practice if it has good runtime performance.

The next experiment in Figure 5.19 looks at performance of FP1 on random games of various game sizes compared to Zielonka’s algorithm and the discrete strategy improvement algorithm. The first data set of 4 vertices games includes all permutations of non-isomorphic games of size 4. There are more than 400,000 games in this set. The next three data sets with 8, 32 and 1024 vertices are random games each with more than 200,000 game instances. The last data set contains clique games from PGSolver, it is a set of randomly generated fully connected games of various sizes. Regardless of the sizes of the games, these games appeared to be solved by all the referenced algorithms within a fixed number of iterations or recursive calls.

As we can see from the table, barring a few games not solved by FP1 at larger sizes, this algorithm is outperforming the other two algorithms both in terms of average run time and the growth in the required number of iterations or recursive calls. For clique games, because the run times of each algorithm do not show significant variation over different game sizes ranging from 8 vertices to 1024 vertices, we use a single data set to represent all of them.

data set        measures          FP1        DSI        Zielonka
4 Vertices      Avg. runtime      53.7ms     53.7ms     53.5ms
                Avg. run-length   1.9770     1.4976     2.9220
                Max. run-length   4          4          8
                Unsolved          0%         0%         0%
8 Vertices      Avg. runtime      57.3ms     60.0ms     58.4ms
                Avg. run-length   3.3360     2.8503     4.9721
                Max. run-length   12         6          30
                Unsolved          0.04%      0%         0%
32 Vertices     Avg. runtime      60.6ms     64.4ms     62.3ms
                Avg. run-length   4.1994     4.6285     9.8709
                Max. run-length   20         14         114
                Unsolved          0.21857%   0%         0%
1024 Vertices   Avg. runtime      4.50s      8.69s      6.06s
                Avg. run-length   5.0309     7.5324     9.4445
                Max. run-length   12         15         76
                Unsolved          0.01172%   0%         0%
Clique Games    Avg. runtime      82.03ms    82.13ms    81.97ms
                Avg. run-length   2.9672     2.0492     4
                Max. run-length   3          3          4
                Unsolved          0%         0%         0%

Figure 5.19: Performance of FP1, DSI, and Zielonka's algorithm over 5 different data sets. The data set of 4 vertices contains all non-isomorphic games of 4 vertices; the data sets of 8, 32, and 1024 vertices contain the standard random games, each with over 200,000 instances. The last data set, clique games, contains fully connected games of various sizes.

We then compare the performance of these algorithms on ladder games defined in PGSolver. The result is shown in Figure 5.20. All games in this data set are fully solved by all the algorithms. Looking at the run times, Zielonka's algorithm clearly wins over the other two algorithms for this data set. The sharp difference in performance can be explained by how these games are constructed.

Games in this data set only have two distinct priority values, 0 and 1: all player 0 vertices carry priority 0 and all player 1 vertices carry priority 1. As a result, Zielonka's algorithm only needs to do two, and exactly two, attractor computations, one for each player, before it solves the game. On the other hand, because FP1 is computing optimal strategies instead of just winning strategies, it requires each vertex to have a unique priority in the game. These games, even though they only have two priorities, are nonetheless converted into games with full priorities matching the number of vertices in the game. The discrete strategy improvement algorithm is similar to FP1 in the sense that it fixes an imaginary order for vertices having the same priorities and implicitly computes optimal strategies. The number of iterations taken by FP1 to solve these ladder games remains constant at 3 over different game sizes, whereas that of discrete strategy improvement increases linearly against the size of the game.

Ladder Games    measures          FP1       DSI        Zielonka
<128 Vertices   Avg. runtime      0.156s    0.563s     0.083s
                Avg. run-length   3         31.84      3
                Max. run-length   3         62         3
128 - 256       Avg. runtime      0.813s    13.781s    0.083s
                Avg. run-length   3         94.5       3
                Max. run-length   3         126        3
257 - 384       Avg. runtime      5.012s    92.250s    0.0823s
                Avg. run-length   3         158.5      3
                Max. run-length   3         190        3
385 - 512       Avg. runtime      20.469s   351.203s   0.0824s
                Avg. run-length   3         222.5      3
                Max. run-length   3         254        3

Figure 5.20: Performance of FP1, DSI, and Zielonka’s algorithm over the data set of ladder games defined in PGSolver. All games in the data set are solved by FP1.

Recursive ladder games are variations of ladder games for which recursive algorithms have exponential run time. Performance comparisons for this data set are shown in Figure 5.21. As expected, the run time and number of recursive calls made by Zielonka's algorithm increase exponentially against the size of the games. The discrete strategy improvement algorithm and FP1 solved all games in this data set. Zielonka's algorithm fails when the game size reaches 192: the algorithm runs out of memory after running for over 10 hours on a game with 192 vertices. Performance of DSI and FP1 are in line with each other and they both display polynomial growth in both run time and the number of iterations taken to solve a game.

Recursive Ladder   measures          FP1         DSI         Zielonka
<64 Vertices       Avg. runtime      0.200s      0.200s      0.083s
                   Avg. run-length   7           4.09        507.55
                   Max. run-length   13          11          2166
64 - 127           Avg. runtime      0.308s      0.231s      16.462s
                   Avg. run-length   17.2        11.08       227554
                   Max. run-length   27          25          1132133
128 - 191          Avg. runtime      0.857s      0.571s      1482s
                   Avg. run-length   28.14       13.57       11870501
                   Max. run-length   33          31          32871046
192 - 256          Avg. runtime      3.333s      2.500s      N/A
                   Avg. run-length   34.5        21.50       N/A
                   Max. run-length   51          49          N/A
256 - 511          Avg. runtime      48.628s     49.411s     N/A
                   Avg. run-length   52.51       39.73       N/A
                   Max. run-length   103         101         N/A
512 - 767          Avg. runtime      429.481s    509.512s    N/A
                   Avg. run-length   78          65          N/A
                   Max. run-length   155         153         N/A
768 - 1024         Avg. runtime      1433.114s   1807.114s   N/A
                   Avg. run-length   97.4        84.09       N/A
                   Max. run-length   189         187         N/A

Figure 5.21: Performance of FP1, DSI, and Zielonka's algorithm over the data set of recursive ladder games defined in PGSolver. All games in the data set are solved by FP1.

Lastly, we show the performance comparison of these three algorithms in Figure 5.22 for a data set known to cause exponential run time for the DSI algorithm². Computations which do not terminate after 60 hours are marked with N/A in the corresponding entries. As expected, the run time and number of iterations taken by the DSI algorithm increase exponentially against the size of the games. Zielonka's algorithm performs better than the DSI algorithm, but it is easy to see that Zielonka's algorithm also has exponential run time for this data set, as reflected by the growth rate in both the number of recursive calls and the run time. For the FP1 algorithm, the growth rate of the number of iterations is sublinear in the size of the games, and the algorithm has polynomial run time for this data set.

            FP1                    DSI                      Zielonka
Vertices    run time  run length   run time    run length   run time   run length
60          0.1s      8            1s          65           0.1s       30
85          1s        8            9s          144          0.1s       42
110         1s        6            25s         151          0.2s       87
135         1s        6            103s        311          1s         111
160         3s        3            394s        631          1s         204
185         7s        4            2714s       2545         2s         252
210         10s       4            7975s       5105         4s         441
235         16s       4            23978s      10225        9s         537
260         25s       4            71445s      20465        21s        918
285         35s       4            214335s     40945        42s        1110
310         81s       8            N/A         N/A          91s        1875
335         108s      8            N/A         N/A          167s       2259
360         144s      8            N/A         N/A          351s       3792
385         186s      8            N/A         N/A          688s       4560
410         236s      8            N/A         N/A          1441s      7629
435         297s      8            N/A         N/A          2822s      9165
460         363s      8            N/A         N/A          5927s      15306
485         444s      8            N/A         N/A          11536s     18378
510         540s      8            N/A         N/A          23491s     30663
535         648s      8            N/A         N/A          43640s     36807
560         770s      8            N/A         N/A          87964s     61380
585         911s      8            N/A         N/A          N/A        N/A
610         1067s     8            N/A         N/A          N/A        N/A
1010        1756s     10           N/A         N/A          N/A        N/A
1210        2047s     12           N/A         N/A          N/A        N/A

Figure 5.22: Performance of FP1, DSI, and Zielonka’s algorithm over the data set of worst case games for the discrete strategy improvement algorithm. All games in the data set are solved by FP1.

²This data set is known as the Exponential Strategy Improvement Games in PGSolver.

In summary, these experiments show that FP1 performs well empirically against Zielonka's algorithm and the discrete strategy improvement algorithm. The percentage of games not solved by FP1 is small. There is no known data set for which FP1 displays exponential run time; furthermore, for the data sets known to cause exponential run time for the other two algorithms, FP1 can solve all the games in what appears to be polynomial time. For generic data sets, the performance of FP1 is at least on par with the other two algorithms. The other two algorithms occasionally perform better than FP1 on smaller games. As game size gets larger, FP1 appears to dominate the other two algorithms in run time. Since the time complexity of a single best response is well understood and is in PTIME, the time complexity of FP1 is determined by the growth of run lengths of the algorithm in the worst cases. In the next chapter we will discuss whether the run length of the algorithm has a polynomial bound.

6 Strategy Graph Implications on FP1

FP1 generates best responses sequentially; the sequence of strategies generated by best responses can be collected in a graph to help us analyze and understand the run time characteristics of the algorithm. This chapter formally introduces the strategy graph as a tool to help us analyze properties of FP1; in particular, it helps us understand the run lengths of fictitious play, the nature and the size of strategy cycles, and when we could have efficient terminations.

6.1 Strategy Graph

Definition 6.1.1 (Strategy Graph). A strategy graph S[G] of a parity game G is a singly connected bipartite graph: each node in the strategy graph represents a strategy of one of the players in game G, and each edge in the strategy graph represents the best response, by the other player, to a particular strategy of a player.

Once the play profile scheme used in best response is fixed, each parity game has a unique and fixed strategy graph. In this chapter, all strategy graphs considered are generated using the standard play profile (t,p,d) unless stated otherwise.
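One way to picture how these graphs are produced is to follow best replies from an initial strategy until a strategy repeats. The sketch below (Python) assumes a single function best_reply that maps a strategy of either player, represented as a dict from vertices to chosen successors, to the opponent's best reply under the chosen play profile scheme; the function and representation are illustrative assumptions, not the workbench implementation.

def follow_strategy_path(initial, best_reply):
    # Walk the unique outgoing edge of each node in the strategy graph until a
    # strategy re-occurs; the tail of the walk is a cycle of the graph, and the
    # walk's length bounds the run length of FP1 from this starting strategy.
    key = lambda s: tuple(sorted(s.items()))
    seen, path, current = {}, [], initial
    while key(current) not in seen:
        seen[key(current)] = len(path)
        path.append(current)
        current = best_reply(current)
    start = seen[key(current)]
    return path, path[start:]

A cycle of length 2 in such a walk corresponds to a pair of profile-optimal strategies; longer cycles are the non-optimal strategy cycles discussed in Chapter 5.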

For the parity game in Figure 6.1, its strategy graph is shown in Figure 6.2. The strategy graph is barely readable even though there are only 8 vertices in the game. In order to work with the strategy graph efficiently, we can filter out all the nodes with no incoming edges. These nodes in a strategy graph are strategies which are never a best response to any opponent strategy. Filtering out unreachable nodes reduces the depth of the strategy graph by one, while at the same time reducing the size of the strategy graph significantly.

Figure 6.1: Game-8-27-7-1694 for which FP1 will not terminate with (t, p, d) valuation of the best reply implementation; the query used to extract this game from the workbench is ‘ALL G COUNT(VERTICES(G)) <=8 AND CL(G,FP1)=2’

For the parity game in Figure 6.1, the strategy graph with leaf nodes filtered is shown in Figure 6.3.

Figure 6.2: Strategy graph of Game-8-27-7-1694

In the strategy graph, green nodes represent player 0 strategies and red nodes represent player 1 strategies. The strategies are also numbered so that each strategy can be referred to by a unique identifier, with player 0 strategies taking the positive numbers and player 1 strategies taking the negative ones. For example, in the top left hand corner of the filtered strategy graph, player 1 strategy number -81 is a best response to the player 0 strategy number 76.

All strategy graphs are singly connected bipartite graphs. There is exactly one outgoing edge from any node in the strategy graph; therefore, all the paths in a strategy graph eventually run into a cycle. When a cycle of a strategy

Figure 6.3: Strategy graph of Game-8-27-7-1694, with unreachable strategies filtered out

graph has length 2, it represents the pair of optimal strategies in the game. Because parity games are determined, every strategy graph must have a cycle of length 2. Because profile optimal strategies are unique due to how play profiles are used, every strategy graph has exactly one cycle of length 2¹. Typically this is the only cycle in most of the strategy graphs. However, in the filtered strategy graph in Figure 6.3, there are two cycles, one at the bottom with strategies -86 and 62, and another at the top right hand corner with strategies -36, 42, -59, 16. Strategy graphs with more than 2 cycles are extremely rare in naturally occurring games. In about 2 million games in all the data sets referenced in Chapter 4, the maximum number of strategy cycles observed in each strategy graph is 2.

However strategy graphs with many cycles (Figure 6.5) can be artificially created simply by duplicating another copy of the game (Figure 6.4) and feeding the juxtaposition of games as a single game into the algorithm. In fact, combining any two games each with multiple sinks will generate a game with even more sinks. This happens because the portion of strategy from one component (one copy of the game) makes no impact on the other portion of strategy on the other component (the other copy of the game). Such a juxtaposition may increase the number of strategy cycles in a graph but it will not increase the length of cycles.

This suggests that strategy graphs with multiple cycles do not necessarily correspond to harder games. The set of all strategies may be partitioned into two or more independent sets of strategies whose Cartesian product recreates the original set of strategies. There might be ways to decompose the game soundly into smaller components, joined by a common set of vertices and edges but otherwise unconnected. The game could then be solved by merging piecewise solutions to each component.

The depth of a strategy graph is defined as the length of the longest path from a strategy node to the first re-occurrence of a strategy. The longest path recorded over randomly generated 8-vertex games has length 13; the corresponding strategy graph is shown in Figure 6.6. In the figure, the longest path observed starts from node 301 and runs into the cycle formed by node 293 and node -107.

[1] Optimal strategies here mean profile optimal strategies; the concept will be introduced in the next sub-section.

Figure 6.4: Strategy graph of Game-8-24-7-494; 2 sinks are found in the graph.

Figure 6.5: Strategy graph of the juxtaposition of two copies of Game-8-24-7-494, run through the algorithm as a single game; 5 sinks are found in the graph

The length of the path shown is only 12 because the graph is filtered, meaning that any strategy which is not a best response to any opponent strategy has been removed from the strategy graph.

Figure 6.6: Strategy graph of an 8-vertex game, Game-202-1916401639. Because the graph is filtered for nodes with in-degree 0, the longest path of length 13 starts from an unknown node preceding node 301 in the complete graph.

The depth of the strategy graph provides an upper bound on the run-length of FP1. Run-lengths of the algorithm up to cycles never exceed the maximum depth of the strategy graph; indeed FP1 can terminate before reaching the cycles in the strategy graph.
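Because every node of a strategy graph has exactly one outgoing edge, the depth can be computed by following best-response edges from each node until some strategy re-occurs. A minimal sketch, reusing the successor-map representation assumed earlier:

    def strategy_graph_depth(succ):
        """Length of the longest path from any strategy to the first
        re-occurrence of a strategy, following the unique best-response edge."""
        best = 0
        for start in succ:
            seen, cur, steps = set(), start, 0
            while cur not in seen:
                seen.add(cur)
                cur = succ[cur]
                steps += 1
            best = max(best, steps)
        return best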

6.1.1 Levels of Optimality

The algorithm terminates on convergence of the optimal game value: as long as the payoff (t) secured by both players is equal, the algorithm will terminate. Paths in a strategy graph, on the other hand, only run into a cycle when the full play profiles (t, p, d) secured by both players are equal. We call strategies securing the optimal values in the game value-optimal strategies, and strategies securing the optimal play profiles profile-optimal strategies. A value-optimal strategy only ensures optimality in the first component t of the play profile valuation scheme (t, p, d), whereas a profile-optimal strategy has to ensure optimality in all three components; profile optimality is therefore the stronger notion. Formally:

Definition 6.1.2 (Value Optimal Strategy). Given parity game G with optimal payoff integer vector ~p, we say strategy σ of player 0 is value optimal for player 0 if and only if ∀π ∈ Π and ∀v ∈ G, ⟨σ, π⟩_v ≥_0 ~p_v.

Definition 6.1.3 (Profile Optimal Strategy). Given parity game G and profile valuation scheme S, we say strategy σ of player 0 is profile optimal for player 0 if and only if σ is value optimal and σ = BR(BR(σ)), where BR is the best response implementing S.

The definitions of value and profile optimal strategies for player 1 can be inferred from those of player 0 analogously.
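Stated operationally, Definition 6.1.3 can be checked directly from the best-response operator. The one-line sketch below assumes two helpers, best_response and value_optimal, standing in for the solver's own routines; it is not the thesis' implementation.

    def profile_optimal(game, sigma, best_response, value_optimal):
        """sigma is profile optimal iff it is value optimal and a fixed point of BR o BR."""
        return value_optimal(game, sigma) and \
               best_response(game, best_response(game, sigma)) == sigma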

In order to show the distinction between value optimal and profile optimal strategies, and also to show that the actual run length of FP1 is often shorter than the depth of the strategy graph, we refer back to Game-202-1916401639, displayed in Figure 6.7, as an example. The strategies appearing in the longest path are listed below:

• 301: v0 → v6, v6 → v0, v4 → v7, v2 → v6
• -50: v1 → v4, v7 → v3, v5 → v1, v3 → v2
• 204: v0 → v4, v6 → v4, v4 → v5, v2 → v5
• -101: v1 → v5, v7 → v3, v5 → v3, v3 → v2
• 289: v0 → v6, v6 → v0, v4 → v1, v2 → v6

Figure 6.7: Graph of Game-202-1916401639, with strategy graph depth 13. The query used to extract this game from the workbench is 'ALL G HEIGHT(SG(G, FP1)) <=12 AND COUNT(VERTICES(G)) <=8'

• -62: v1 → v4, v7 → v3, v5 → v1, v3 → v4
• 167: v0 → v4, v6 → v3, v4 → v3, v2 → v3
• -53: v1 → v4, v7 → v3, v5 → v3, v3 → v2
• 297: v0 → v6, v6 → v0, v4 → v5, v2 → v6
• -59: v1 → v4, v7 → v3, v5 → v7, v3 → v2
• 293: v0 → v6, v6 → v0, v4 → v3, v2 → v6
• -107: v1 → v5, v7 → v3, v5 → v7, v3 → v2

The game values generated by each pair of best responses are in the list:

• ⟨301, -50⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}
• ⟨-50, 204⟩ = {v0:4, v1:4, v2:4, v3:4, v4:4, v5:4, v6:4, v7:4}
• ⟨204, -101⟩ = {v0:7, v1:7, v2:7, v3:7, v4:7, v5:7, v6:7, v7:7}
• ⟨-101, 289⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}
• ⟨289, -62⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}
• ⟨-62, 167⟩ = {v0:6, v1:6, v2:6, v3:6, v4:6, v5:6, v6:6, v7:6}
• ⟨167, -53⟩ = {v0:7, v1:7, v2:7, v3:7, v4:7, v5:7, v6:7, v7:7}
• ⟨-53, 297⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}
• ⟨297, -59⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}
• ⟨-59, 293⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}
• ⟨293, -107⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}
• ⟨-107, 293⟩ = {v0:1, v1:1, v2:1, v3:1, v4:1, v5:1, v6:1, v7:1}

In the first iteration, player 0 with strategy 301 secures payoff 1 as the lower bound at all vertices; in the second iteration player 1 secures payoff 4 at all vertices as the upper bound; in the third iteration player 0 secures payoff 7 using strategy 204; but in the fourth iteration, player 1 can secure payoff

1 for the upper bound at all vertices, hence value convergence is already detected by the algorithm if fictitious play starts with initial strategy 301.

Similarly, if fictitious play starts at strategy -53, it only takes two iterations for the algorithm to terminate. A simple inspection of all possible starting strategies reveals that the longest run length possible with FP1 is just 4, while the depth of the strategy graph is known to be 13. All strategies which secure payoff 1 for the respective player are value optimal strategies; only strategies 293 and -107 are profile optimal.
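The value-convergence test just described can be sketched as follows. Here best_response and value_of are assumed helper functions standing in for the solver's own best-reply and play-evaluation routines; the code illustrates the termination condition rather than the actual FP1 implementation.

    def fp1(game, sigma0, best_response, value_of, max_iters=10_000):
        """Recall-length-1 fictitious play: stop when the payoff vector secured by
        player 0 (lower bound) meets the payoff vector conceded by player 1 (upper bound)."""
        sigma = sigma0
        for _ in range(max_iters):
            pi = best_response(game, sigma, player=1)        # player 1 replies to sigma
            lower = value_of(game, sigma, pi)                # what sigma guarantees
            sigma_next = best_response(game, pi, player=0)   # player 0 replies to pi
            upper = value_of(game, sigma_next, pi)           # what pi concedes
            if lower == upper:                               # value convergence detected
                return sigma_next, pi, lower
            sigma = sigma_next
        raise RuntimeError("no value convergence; a longer strategy cycle was hit")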

In summary, the value optimal strategies are 301, -101, 289, -53, 297, -59, 293 and -107; the profile optimal strategies are just 293 and -107.

One key difference between value optimal and profile optimal strategies is that the best response to a value optimal strategy may not itself be value optimal, whereas the best response to a profile optimal strategy is always profile optimal. This is a subtle point that is easily missed, and it cost me a lot of time during implementation. To make it explicit, take strategy 289 from the previous example: this strategy is a valid value optimal strategy for player 0, but its best response -62 is not. Even though -62 yields the correct equilibrium payoff against strategy 289 alone, it does not yield the equilibrium payoff against strategies of player 0 in general.

Algorithm FP1 is really only interested in value optimal strategies; the use of the full play profile (instead of a single payoff value) in generating the best response serves to reduce the number of strategy cycles of length greater than 2. This also means that FP1 with a different best response implementation, using only the payoff value rather than the full (t, p, d) valuation scheme, will generate strategy graphs with much shorter depths. However, whether this variation would result in an exponential blow-up in the run length of the algorithm and the depth of the strategy graph remains an open question; the question is answered theoretically for fully connected games and experimentally for games in general in the next section.

6.2 Depth of Strategy Graph

Run lengths of FP1 up to strategy cycles are bounded by the depth of a strategy graph. Studying the depths of strategy graphs allows us to see whether any

terminating runs of FP1 (without cycle detection) terminate in polynomial time. If depths of strategy graphs are indeed bounded polynomially in the size of the game, this also implies that cycle detection can be implemented for FP1 to detect such strategy cycles in polynomial time.

We will first look at the strategy graphs of fully connected games and show that the depths of all fully connected games' strategy graphs are bounded by a constant.

6.2.1 Fully Connected Games

During each iteration of fictitious play, the player whose strategy is fixed from the previous iteration is called the initiator, and the player currently generating a response against the initiator is called the responder. Since at each iteration the initiator has only a single strategy, the responder is effectively playing a single-player parity game, and his response always generates a singly connected graph in which the paths starting from all vertices eventually run into a cycle.
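Concretely, fixing the initiator's strategy can be seen as deleting all of his unchosen arcs, which leaves the responder with a one-player game. A minimal sketch, assuming a game is a dict from vertex to (owner, priority, successors); the representation is illustrative, not the workbench's.

    def restrict_to_initiator_strategy(game, pi, initiator):
        """Keep only the arc chosen by pi at each initiator vertex; responder
        vertices keep all their arcs, giving a single-player parity game."""
        restricted = {}
        for v, (owner, priority, succs) in game.items():
            if owner == initiator:
                restricted[v] = (owner, priority, [pi[v]])
            else:
                restricted[v] = (owner, priority, list(succs))
        return restricted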

Because the two players in parity games are symmetric, it suffices to present our lemmas and proofs for only the case when player 1 is the initiator.

Lemma 6.2.1. For fully connected games (with unique priorities), given the initiator's strategy π, all vertices owned by the responder are attracted to a single cycle under the responder's best response σ.

Proof. Consider player 0 being the responder. The play profile of each vertex v_i owned by player 0 is represented by (t_vi, p_vi, d_vi). Since the game priorities are unique, to show that all vertices in V0 are attracted to the same cycle in a best response, we just need to show t_vi = t_vj for all v_i, v_j ∈ V0. Suppose otherwise that t_vi <_0 t_vj for some v_i, v_j. Since the game is fully connected, player 0 can strictly improve his payoff by adopting v_i → v_j in his current strategy with everything else unchanged. This contradicts the fact that the current strategy selected by player 0 is already optimal. Hence t_vi = t_vj for all v_i and v_j owned by player 0.

Strategy Patterns

The lemma below defines 3 general patterns of strategies: C1, C2 and C3. R represents the vertices owned by the responder; when the responder is player 0, R = V0. R → r means that all vertices owned by the responder point to vertex r as part of the best response strategy selected; this is shorthand for stating that ∀v ∈ R, the arc v → r is selected as part of the strategy. r → r′ means that r points to another vertex r′ owned by the responder, overriding the statement R → r on vertex r. I is used to represent all vertices owned by the initiator, and i represents a particular vertex of the initiator.

Lemma 6.2.2. In fully connected games, strategy σ (best response) played by the responder against any strategy of the opponent always matches one of the patterns:

• C1: R → r → r′
• C2: R → i
• C3: R → r → i

Proof. By Lemma 6.2.1 we know that all vertices owned by the responder are attracted to a single cycle with each best response. If we partition the game into vertices owned by the responder and vertices owned by the initiator, there are 3 cases in which the cycle can be formed:

1. the cycle involves only responder’s vertices

2. the cycle involves only initiator’s vertices

3. the cycle involves both responder and initiator’s vertices

This lemma can be proven by discussing each of the cases where the cycle can be found. Again we take player 0 to be the responder.

Case 1. The cycle involves purely vertices owned by the responder (player 0). In this case only pattern R → r → r′ applies.

Suppose the best payoff secured by player 0 is carried by vertex a; since priorities are unique, we use |a| to unambiguously refer to the payoff. Because the cycle involves only vertices owned by player 0, we can also deduce that there exists a vertex a′ in R such that |a| > |a′| and a → a′ ∈ σ (a → a′ is part of the best response strategy σ). It remains to show that for all vertices v ∈ R other than a and a′, either v → a ∈ σ or v → a′ ∈ σ.

If |a| is even, |a| must also be the maximal even priority found in R, by definition of a best response. As a → a′, a′ is part of the cycle; therefore the second component of the play profile of a′, p_a′, is an empty set by definition of path prefix. When this happens, the best response favours the shortest distance to the target, and the shortest distance is 1, so the cycle a → a′ → a is closed. Once this cycle is established, we need to show that all other vertices in R also point to a directly. We look at an arbitrary remaining player 0 vertex v. If the second component of the play profile of vertex v satisfies p_v >_0 {}, it means vertex v encounters a vertex k before reaching a, such that |k| > |a| is the maximal priority encountered in the path and |k| is even. This implies k is owned by the initiator, since we have already shown that |a| is the maximal even priority found in R, and k is attracted to a. But since a reaches all vertices directly, by adopting a → k in its strategy player 0 can achieve a strictly better payoff, contradicting the definition of best response. Therefore ∀v ∈ R, p_v ≤_0 {}. Then obviously every vertex v will point to vertex a directly, since only the distance matters in the play profiles. By replacing a with r, and a′ with r′, we have shown that R → r → r′ in this case when |a| is even.

If |a| is odd, then again as a → a′, a′ is part of the cycle; therefore the second component of the play profile of a′, p_a′, is an empty set. However, this time, since the payoff is odd, the best response favours the maximal distance to the target. But it is obvious that ∀v ∈ R other than a′ and a, |v| > |a| and |v| is odd. Therefore the maximal distance from a′ to a without leaving R, and without encountering a larger odd value, is 1. Again, to establish the cycle a → a′ → a, it remains to show that all other vertices in R point to a′ directly. To show this we look at an arbitrary vertex v in R owned by player 0 other than a and a′. Since we already know all vertices in R other than a and a′ carry a larger odd priority than |a|, any path from v involving a third vertex v′ ∈ R will make the second component of the play profile of v worse. So player 0 can only improve the play profile at v (maximize the distance from v to a) by reaching a vertex k owned by player 1 such that the second component of the play profile of k satisfies p_k ≥_0 {}. However, it is easy to see that no such vertex k exists, because otherwise, since a′ reaches all vertices including k, player 0 could achieve a strictly better play profile by adopting a′ → k in its strategy, contradicting the current strategy being a best response. Therefore the longest path from any v ∈ R other than a and a′ back to a is just 2, which means the rest of the vertices in R all point to a′. By replacing a′ with r, and a with r′, we have shown that R → r → r′ in this case when |a| is odd.

Case 2. The cycle involves purely vertices of the initiator (player 1). In this case only pattern R → i or pattern R → r → i will ever be selected for the best response.

Suppose the best payoff secured by player 0 in this case is carried by vertex b. Since player 1 has only one strategy in the game, the play profiles of all vertices owned by player 1 and directly attracted to the b-cycle can be statically determined. Among these directly attracted vertices, there will be a maximal vertex i such that t_i = b and p_i ≥_0 {}. Such a vertex i always exists. (On the other hand, it can be shown easily that any vertex of player 1 indirectly attracted to the cycle through a passage of player 0 vertices either has a worse play profile than that of vertex i, or its existence would generate a contradiction. Therefore such vertices of player 1 can be safely ignored by player 0 in generating the best response.)

There is at most one vertex a ∈ R owned by player 0 such that |a| >_0 |b|. Suppose otherwise that there exists a′ other than a in R such that |a′| >_0 |b|; this would imply that a strategy involving the cycle a → a′ → a generates a strictly better payoff for player 0 than the strategy currently selected, which contradicts the current strategy being optimal.

When such a vertex a does exist, (t_a, p_a, d_a) >_0 (t_i, p_i, d_i). Because |a| >_0 |b| implies p_a >_0 p_i, it is easy to see that ∀v ∈ R other than a, (t_v, p_v, d_v) <_0 (t_i, p_i, d_i); therefore player 0 will choose to go to vertex i from vertex a and to go to vertex a from everywhere else. By replacing a with r, pattern R → r → i applies.

Similarly, when such a vertex a does not exist, ∀v ∈ R, (t_v, p_v, d_v) <_0 (t_i, p_i, d_i); player 0 will choose to go to vertex i directly from all his vertices, and so pattern R → i applies.

Case 3. The cycle involves vertices of both players. In this case only pattern R → i or pattern R → r → i will ever be selected for the best response.

We partition the vertices involved in the cycle into two groups based on the owner of each vertex. We first need to show that the cycle leaves each partition exactly once; in other words, there is only a single continuous segment of player 0 vertices involved in the cycle.

We use the shape J → K → J′ → K′ → ... → J to represent the cycle in the most general case, in which each J represents a continuous path of player 0 vertices and each K represents a continuous path of player 1 vertices; the cycle starts from the first J segment and eventually comes back to J. If the payoff produced by the cycle is even, and if it is carried by a J-segment, then because the J-segment can reach all vertices in the game, by going to the K′ segment directly from J, J → K′ can eliminate the K and J′ segments from the cycle, producing a smaller distance and hence a better play profile. This contradicts the condition that the current strategy selected by player 0 is already optimal. Similarly, when a K-segment carries the payoff, K′ and the J-segment can be eliminated, because player 0 will close the cycle by J′ → K, producing a smaller distance and a better play profile. By rotational symmetry on each of these J-segments, the only viable shape of the cycle is J → K → J when the payoff is even.

If the payoff produced by the cycle is odd, and if it is carried by a J-segment, that J-segment will be eliminated, because the J′-segment with the other K-segments generates a strictly better payoff than including the J-segment in the cycle. Similarly, when a K-segment carries the payoff, the J and J′-segments can achieve a better payoff by eliminating K-segments from the cycle. By rotational symmetry again, the only viable shape of the cycle is J → K → J when the payoff is odd. Therefore, in both cases, there is always only a single continuous segment of player 0 vertices in the cycle.

We proceed to expand the K-segment into the sequence of concrete vertices b → b′ → b′′ → .... Whether or not the K-segment is a singleton b, it is easy to see that for all b′ in the K-segment, (t_b, p_b, d_b) >_0 (t_b′, p_b′, d_b′), where b is the first vertex into the K-segment; otherwise the J-segment would point to b′ instead of b.

We proceed to expand the J-segment into a → a′ → a′′ → ....

If the payoff produced by the cycle is even, J must be a singleton. Otherwise, if the payoff is carried by the J-segment, player 0 can close the cycle within the J-segment to achieve a strictly better play profile, contradicting the current strategy selected by player 0 being a best response. If the payoff is carried by the K-segment, all vertices in the J-segment have smaller priority than the payoff; therefore the first vertex in the J-segment would skip the rest of the J-segment and point to b directly. So the shape of the cycle reduces to a → K → a. In this case, for all a′ in R other than a, (t_a, p_a, d_a) >_0 (t_a′, p_a′, d_a′). If (t_a, p_a, d_a) >_0 (t_b, p_b, d_b), pattern R → r → i applies; otherwise pattern R → i applies.

If the payoff produced by the cycle is odd and is carried by the K-segment, J must also be a singleton, for otherwise a → a′ → a would generate a strictly better payoff than the current best response, contradicting the current strategy, involving vertices of both player 0 and player 1, being a best response. So the shape of the cycle reduces to a → K → a. Clearly (t_a, p_a, d_a) ≥_0 (t_b, p_b, d_b), and also (t_a, p_a, d_a) ≥_0 (t_a′, p_a′, d_a′); therefore pattern R → r → i applies.

If the payoff produced by the cycle is odd and is carried by the J-segment, the J-segment reduces to a or a → a′. Otherwise, if there are more than 2 vertices in the J-segment, the vertex carrying the payoff can always be eliminated from the cycle, resulting in a shorter cycle and a better payoff for player 0. It can also be shown that vertex a carries the payoff, for otherwise, since K → a, and a has a smaller but better priority than a′, a would point to K directly instead of a′. If the J-segment is a singleton, pattern R → i applies; otherwise pattern R → r → i applies.

By analyzing each case of where in the game a cycle could be formed during a best response, we have shown that the best response generated in fully connected games always matches one of the patterns in Lemma 6.2.2.
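For experimentation it is convenient to check which of the three shapes a concrete responder strategy has. The sketch below does this for a strategy given as a dict from responder vertex to chosen successor, with R and I the vertex sets of the responder and initiator; the function name and representation are assumptions, not part of the thesis' tooling.

    def classify_pattern(sigma, R, I):
        """Return 'C1', 'C2' or 'C3' if sigma matches a shape from Lemma 6.2.2, else None."""
        targets = {sigma[v] for v in R}
        if len(targets) == 1:
            (t,) = targets
            return "C2" if t in I else None                      # C2: R -> i
        if len(targets) == 2:
            for r in targets:
                if r in R and all(sigma[v] == r for v in R if v != r):
                    return "C1" if sigma[r] in R else "C3"       # R -> r -> r' / R -> r -> i
        return None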

Interactions of Patterns

By now we have shown that FP1 using (t, p, d) as the play profile will generate best responses from only 3 patterns of strategies for both players; we only need to study the interactions of these 3 patterns of strategies from the two

players to deduce the maximum depth of the strategy graph.

It turns out that the strings formed by the names of the response patterns of fictitious play over fully connected games can be described by a regular language. FP1 gives rise to infinite strings of such names even though it may actually terminate earlier. The language can be accepted by a Büchi automaton, shown in Figure 6.8. In the figure, the initial

state is T0, and the accepting states are T1, T2 and T3. Before we go on to discuss the automaton, we introduce another lemma to help with the proof later.

Lemma 6.2.3. Given a run of fictitious play on fully connected games

starting from initial strategy σ0, [σ0, π0, ..., σi, πi, ..., σj, πj, ...], if both σi and σj are of pattern C1, then σi = σj.

This lemma means that there can only be a single instantiation of C1 for a player throughout the entire run of fictitious play on fully connected games. The proof is straightforward and is omitted here. The intuition is simple: in the set of all strategies of pattern C1 of the responder, there is always a greatest element giving the best play profile. The strategies giving the best play profile may not be unique, but any deterministic algorithm enumerating the set of such strategies always selects the same one for the best response, no matter what initiator strategy the opponent has. This is because, when a strategy of pattern C1 is invoked at all, the strategy of the opponent has no impact on the play profiles of the responder's vertices, and hence no influence on the strategy chosen by the responder.

We show that all sequences of names of patterns generated by FP1 are accepted by this Büchi automaton. Formally:

Lemma 6.2.4. The automaton in Figure 6.8 accepts all pattern strings generated by FP1 on fully connected games.

Proof. We prove this lemma by reconstructing the automaton at initial state T0 from scratch. Since the two players are symmetric, we only prove the case for when player 0 is the responder at the current state being evaluated.

T0: The responder can play all 3 patterns C1, C2 and C3, each of them reaching a new state, we name these states T7, T11, and T13 accordingly.


Figure 6.8: Büchi automaton accepting pattern strings generated by all runs of fictitious play on fully connected games. The automaton is an over-approximation; there are certain paths in it for which we cannot construct real parity games.

T7: The responder again can play all 3 patterns C1, C2 and C3, reaching T4, T5, and T6.

T4: The pattern string seen so far ends with C1C1. With player 0 being the responder, we name the current best response to be selected by player 0 σ0, the previous best response by player 1 π−1, and the best response before that σ−1. In iteration (σ−1, π−1), since both σ−1 and π−1 are of pattern C1, vertices owned by the same player receive the same payoff. We name the payoff value x for player 0 and y for player 1. Since all priorities are unique and π−1 is generated as a best response against σ−1, it can be deduced that x >_0 y. In iteration (π−1, σ0), if σ0 assumes pattern C2 or C3, it will join the cycle formed by player 1 and receive payoff y; σ−1 clearly dominates any other strategy of the form C2 or C3. Player 0 will only play pattern C1 in this case, and by Lemma 6.2.3, σ0 = σ−1, leading to accepting state T1. Further iterations only repeat the last two strategies seen, due to the memoryless property of best response.

T5: The pattern string seen so far ends with C1C2. In the iteration (σ−1, π−1), since σ−1 is of the pattern R → r → r′ and π−1 is of the pattern I → r′′, we can deduce that all vertices in the game receive the same payoff x in that iteration.

• If x is even, then |r| = x. By σ−1 being C1 in nature, x is the largest even priority in V0. By π−1 being C2 in nature, π−1 >_σ−1 π where π has pattern C1; in other words, no instantiation of C1 will generate a better payoff than π−1 for player 1 against σ−1. This implies that there is at most one vertex v ∈ V1 such that |v| <_0 x. However, by π−1 >_σ−1 π where π has pattern C3, we know that no such v ∈ V1 with |v| <_0 x exists. This implies all vertices owned by player 1 hold even priorities which are also all larger than x. Obviously the largest even priority in the game, |vk|, is carried by a vertex vk ∈ V1. In iteration (π−2, σ−1), since player 0 was not able to secure |vk|, in the play profile (t_vk, p_vk, d_vk) of vertex vk, max(p_vk) is odd. Together this implies there exists at least one vertex vj ∈ V0 such that |vj| > |vk| and |vj| is odd. Therefore the highest priority in the game is odd and found in V0. We can therefore deduce that when π−1 is generated with pattern I → r′′, |r′′| is odd and is the highest priority in the game. Clearly, in iteration (π−1, σ0), the best player 0 can do is to play pattern C1 again to avoid forming a cycle involving vertex r′′; again by Lemma 6.2.3, σ0 = σ−1, leading to accepting state T2.

• If x is odd, then |r′| = x and |r| < |r′|. By σ−1 being C1 in nature, we can deduce that ∀v ∈ V0 other than r and r′, |v| > |r′| and |v| is odd. In particular |r′′| is odd and is also the maximal priority found in V0. By π−1 being C2 in nature, π−1 >_σ−1 π where π has pattern C3, which implies ∀v ∈ V1, |v| <_1 |r′|. Therefore |r′′| is the maximal odd priority in the game. Suppose |r′′| is not the maximal priority in the game. Then there must exist a vertex vk ∈ V1 such that |vk| is the maximal priority in the game and |vk| is even. In iteration (π−2, σ−1), since player 0 was not able to secure |vk|, in the play profile (t_vk, p_vk, d_vk) of vertex vk, max(p_vk) is odd. Since |vk| is even, this contradicts |vk| being the maximum priority in the game. Hence the supposition, that |r′′| is not the maximal priority in the game, is false. With |r′′| being the maximal priority in the game, clearly in iteration (π−1, σ0) the best player 0 can do is to play pattern C1 again to avoid forming a cycle involving vertex r′′; again by Lemma 6.2.3, σ0 = σ−1, leading to accepting state T2.

T6: The pattern string seen so far ends with C1C3. In the iteration (σ−1, π−1), since σ−1 is of the pattern R → r → r′ and π−1 is of the pattern I → i → r′′, we can deduce that all vertices in the game receive the same payoff x in that iteration. Similar to the case in T5:

• If x is even, then |r| = x. By σ−1 being C1 in nature, x is the largest even priority in V0. By π−1 being C3 in nature, π−1 >_σ−1 π where π has pattern C1. This implies that there is at most one vertex v ∈ V1 such that |v| <_0 x, which further implies that ∀v′ ∈ V1 other than v, |v′| > |v|, |v′| > x and |v′| is even. Obviously the largest even priority in the game, |vk|, is carried by a vertex vk ∈ V1. In iteration (π−2, σ−1), since player 0 was not able to secure |vk|, in the play profile (t_vk, p_vk, d_vk) of vertex vk, max(p_vk) is odd. Together this implies there exists at least one vertex vj ∈ V0 such that |vj| > |vk| and |vj| is odd. Therefore the highest priority in the game is odd and found in V0. We can therefore deduce that when π−1 is generated with pattern I → i → r′′, |r′′| is odd and is the highest priority in the game. Clearly, in iteration (π−1, σ0), the best player 0 can do is to play pattern C1 again to avoid forming a cycle involving vertex r′′; again by Lemma 6.2.3, σ0 = σ−1, leading to accepting state T3.

• If x is odd, then |r′| = x and |r| < |r′|. By σ−1 being C1 in nature, we can deduce that ∀v ∈ V0 other than r and r′, |v| > |r′| and |v| is odd. In particular |r′′| is odd and is also the maximal priority found in V0. Suppose |r′′| is not the maximal priority in the game. Then there must exist a vertex vk ∈ V1 such that |vk| is the maximal priority in the game. |vk| cannot be odd, since if |vk| were odd and the maximal priority in the game, player 1's best response would have to be of pattern C1 rather than the pattern observed. So |vk| is even. In iteration (π−2, σ−1), since player 0 was not able to secure |vk|, in the play profile (t_vk, p_vk, d_vk) of vertex vk, max(p_vk) is odd; since |vk| is even, this contradicts |vk| being the maximum priority in the game. Hence the supposition, that |r′′| is not the maximal priority in the game, is false. With |r′′| being the maximal priority in the game, clearly in iteration (π−1, σ0) the best player 0 can do is to play pattern C1 again to avoid forming a cycle involving vertex r′′; again by Lemma 6.2.3, σ0 = σ−1, leading to accepting state T3.

T11: The pattern string seen so far ends with C2. From this state, action C1 will join state T7. For actions C2 and C3, we name new states T9 and T10.

T9: The pattern string seen so far ends with C2C2. From this state action C1 will join state T7. For action C2, we name a new state T8. Action C3 cannot be made at this state.

Suppose otherwise that σ0 is of pattern C3: R → r → i.

In iteration (π−1, σ0), since π−1 is of the pattern I → r′ and σ0 is of the pattern R → r → i, we infer that a cycle i → r′ → r → i is formed, in which r and r′ might be the same vertex.

Suppose the payoff in the cycle is even. In this case vertex r and vertex r′ must be the same, otherwise player 0 can always shorten the length of the cycle (which improves the play profile through a shorter distance to the target of favourable priority) by eliminating r or r′ from the cycle. The cycle reduces to i → r → i. By σ0 being of pattern R → r → i and by the definition of best response, we can infer that the payoff of the cycle is carried by r.

• σ0 >_π−1 R → r → r′ implies there exists no vertex v ∈ V0 such that |v| >_0 |r|

• σ0 >_π−1 R → i′′ implies there exists no vertex v ∈ V1 such that |v| >_0 |r|

This implies that in the previous iteration, any instantiation of pattern C1 of player 1 would secure a strictly better payoff for him than playing π−1 of pattern I → r′ (r and r′ being the same vertex). This contradicts π−1 being a best response against σ−1. So the payoff of the cycle cannot be even.

Now suppose the payoff of the cycle is odd. Suppose it is carried by r; then player 0 in iteration (π−1, σ0) would have achieved a better play profile by playing pattern R → i, since |r′| < |r| and |i| < |r|. Therefore the payoff of the cycle is not carried by r; similarly it is not carried by i; hence it is carried by r′. We have |r′| > |r|, |r′| > |i| and |r′| is odd.

• σ0 >_π−1 R → r → r′′ implies ∀v ∈ V0 other than r and r′, |v| is odd and |v| > |r′|

• σ0 >_π−1 R → i′′ implies there exists no vertex v ∈ V1 such that |v| is even and |v| > |r|

Clearly the highest priority in the game is odd. In iteration (σ−1, π−1), since σ−1 is of pattern C2: R → i′, player 1 is able to secure the highest priority in the game, so we can infer that |r′| is the maximal priority in the game and |r′| is odd. However, in iteration (π−1, σ0) the cycle i → r′ → r → i is formed, which involves r′, which is odd and the highest priority in the game. It is clear that any strategy of pattern C1: R → r → r′′ by player 0, not involving r′, would yield a strictly better payoff for player 0; this contradicts a strategy of pattern C3 being optimal. So the payoff of the cycle cannot be odd.

Since the payoff of the cycle can be neither odd nor even, we obtain a contradiction, which shows the supposition at the beginning is false; hence pattern C3 cannot be played at state T9.

T8: From this state a play of pattern C1 will join state T7. This is the state from which only pattern C1 can ever be played going forward. There are possibly many different prefixes which can reach this state; for the time being we only need to show that any prefix ending in C2C2C2 will only see pattern C1 in the next iteration.

In iteration (σ−1, π−1), since σ−1 is of the pattern R → i and π−1 is of the pattern I → r, we infer that a cycle i → r → i is formed.

Suppose the cycle payoff is even. By π−1 being of pattern C2, the cycle payoff is carried by vertex i. We have |i| > |r| and |i| is even.

• π−1 >_σ−1 I → i → i′ implies there exists at most one vertex v ∈ V1, other than i, such that |v| <_0 |i|

• π−1 >_σ−1 I → i → r implies no such v exists

This implies ∀v ∈ V1, |v| is even and |v| > |i|. On the other hand, the selection of target vertex r by π−1 and the fact |r| < |i| imply that there exists no vertex v ∈ V0 such that |v| is odd and |v| > |i|. Together these imply that the highest priority in the game is even. Since in iteration (π−2, σ−1), π−2 is of pattern I → r, player 0 clearly can secure the highest priority in the game (which is even); by definition of best response, |i| is the highest priority in the game and |i| is even. This contradicts the first statement deduced in this paragraph, that |i| is the smallest even priority in all of V1. The contradiction implies that the cycle payoff cannot be even.

Now, by π−1 being of pattern C2: I → r, the cycle payoff is carried by vertex r, with |r| > |i| and |r| odd.

• π−1 >_σ−1 I → i → i′ implies ∀v ∈ V1 other than i, |v| <_1 |i|

• By the selection of target vertex r by π−1 and the fact |r| > |i|, there exists no vertex v ∈ V0 such that |v| is odd and |v| > |r|

This implies |r| is the highest odd priority in the game. Suppose |r| is not the highest priority in the game; then the highest priority in the game must be even. In iteration (π−2, σ−1), since π−2 is of pattern I → r, player 0 clearly can secure the maximal priority in the game (which is even); by definition of best response, |i| is the highest priority in the game and |i| is even. This contradicts |r| > |i|; therefore the supposition, that |r| is not the highest priority in the game, is false. Since |r| is odd and the maximal priority in the game, for σ0 the best player 0 can do is to play pattern C1, since any other pattern will invariably encounter vertex r.

T10: From this state a play of pattern C1 will join state T7. We need to show two things. First, pattern C3 cannot be played at this state. Second, a play of pattern C2 joins T8; in other words, we need to show that any prefix ending in C2C3C2 will only see pattern C1 in the next iteration.

We now show pattern C3 cannot be played at state T10 by contradiction.

Suppose otherwise. In iteration (π−1, σ0), since π−1 is of the pattern I → i → r and σ0 is of the pattern R → r′ → i′, we infer that a cycle i → r → r′ → i′ → i is formed, in which vertices i and i′ may be the same vertex, and vertices r and r′ may be the same vertex.

Suppose the payoff in the cycle is even. By σ0 being R → r′ → i′ and by the definition of best response, the payoff of the cycle must be carried by r′; we can also infer that vertex r and vertex r′ must be the same and vertex i and vertex i′ must be the same. The cycle reduces to i → r → i, with |r| > |i| and |r| even.

• σ0 >_π−1 R → r → r′ implies there exists no vertex v ∈ V0, other than r, such that |v| >_0 |r|

• σ0 >_π−1 R → i implies there exists no vertex v ∈ V1 such that |v| >_0 |r|

This implies |r| is the maximal even priority in the game. Suppose |r| is not the maximal priority in the game; the maximal priority in the game must then be odd. In iteration (σ−1, π−1), since σ−1 is of pattern C2: R → i, player 1 can secure the highest priority in the game (which is odd); by definition of best response and π−1 being I → i → r, we can infer that |i| is the highest priority in the game and |i| is odd. However, this contradicts |r| > |i|, so |r| has to be the maximal priority in the game. However, in iteration (σ−1, π−1), r is clearly part of the cycle formed given π−1. Therefore player 1 can clearly secure a better payoff by playing a different strategy of pattern C1. This contradicts π−1 being a best response; hence the cycle payoff cannot be even.

So the payoff of the cycle must be odd. By σ0 being of pattern R → r′ → i′ and the definition of best response, the payoff of the cycle is carried by r; otherwise, if it were carried by any other vertex, that vertex could be short-circuited out of the cycle. We have |r| > |i|, |r| > |r′|, |r| > |i′| and |r| is odd.

• σ0 >_π−1 R → r → r′ implies ∀v ∈ V0 other than r and r′, |v| > |r| and |v| is odd

• σ0 >_π−1 R → i implies there exists no vertex v ∈ V1 such that |v| > |r| and |v| is even

This means the maximal priority in the game is odd. In iteration (σ−1, π−1), since σ−1 is of pattern C2: R → i, player 1 can secure the highest priority in the game (which is odd). By definition of best response and π−1 being of pattern I → i → r, we can infer that |i| is the highest priority in the game and |i| is odd. This contradicts |r| > |i|, so the payoff of the cycle cannot be odd either.

Since the payoff of the cycle can be neither odd nor even, the contradiction falsifies the viability of C3 at state T10.

We now show that any prefix ending in C2C3C2 will see pattern C1 only in the next iteration.

In iteration (σ−1, π−1), since σ−1 is of the pattern R → r → i and π−1 is of the pattern I → r′, we infer that a cycle r → i → r′ → r is formed, in which vertex r and vertex r′ may be the same vertex.

If the payoff of the cycle is even, then by π−1 being of pattern I → r′ and the definition of best response, the payoff of the cycle is carried by vertex i; we have |i| > |r′|, |i| > |r| and |i| is even.

• π−1 >_σ−1 I → i → i′ implies there is at most one vertex v ∈ V1, other than i, such that |v| <_0 |i|

• π−1 >_σ−1 I → i → r implies no such v exists

This means ∀v ∈ V1 other than i, |v| is even and |v| > |i|. On the other hand, π−1 being of pattern I → r′, together with |r′| < |i|, implies there is no vertex v ∈ V0 such that |v| is odd and |v| > |i|. Together they imply that the highest priority in the game is even. Since in iteration (π−2, σ−1) strategy π−2 is of pattern I → r, player 0 clearly can secure the highest priority in the game (which is even). By definition of best response, |i| is the highest priority in the game and |i| is even. This contradicts |i| being the least priority found in all of V1. So the payoff of the cycle cannot be even.

Now the payoff of the cycle is odd. By π−1 being of pattern I → r′ and the definition of best response, the payoff of the cycle is carried by r. We can further infer that vertex r and vertex r′ are the same vertex. The cycle reduces to r → i → r, where |r| is odd and |r| > |i|.

• π−1 >_σ−1 I → i → i′ implies ∀v ∈ V1 other than i, |v| <_1 |i|; otherwise player 1 could form a cycle through i and v with a better payoff

• By the choice of target vertex r′ selected in π−1 by player 1, the fact that |r′| > |i|, and all player 0 vertices converging to i in σ−1, we can infer there is no vertex v ∈ V0 other than r (r and r′ being the same) such that |v| is odd and |v| > |r|

This means |r| is the highest odd priority in the game. Suppose |r| is not the highest priority in the game; then the highest priority in the game must be even. In iteration (π−2, σ−1), since π−2 is of pattern I → r, player 0 clearly can secure the maximal priority in the game (which is even). By definition of best response, |i| is the highest priority in the game and |i| is even. This contradicts |r| > |i|; therefore the supposition, that |r| is not the highest priority in the game, is false. So |r| has to be the maximal priority in the game and |r| is odd; therefore, for σ0, the best player 0 can do is to play pattern C1, since any other pattern will invariably encounter r.

T13: From this state action C1 will join state T7 and action C2 will join state T11; for action C3, we have state T12.

T12: From this state action C1 will join state T7 and action C2 will join state T11. We need to show that action C3 cannot be played from this state; however, this part of the proof is exactly the same as the part where C3 cannot be played at state T10 (following a prefix ending in C2C3). The exact proof can be reproduced by replacing pattern string C2 with C3 wherever it is referenced by σ−1. These two parts of the proof are almost identical because patterns C2 and C3 are no different in allowing the opponent to secure the highest priority in the game, since each gives control to the other player by moving into vertices owned by the opponent.

Even though the automaton in Figure 6.8 is correct in accepting all runs of fictitious play on fully connected games, it is an over-approximation: there are certain paths in the automaton for which we cannot construct any real games. These are the paths of length 7 or 8. One example would be a path of length 7 beginning with the pattern string C3C2C2C2.

This happens because, during the proof for state T13, when pattern C2 is played, instead of doing a case-by-case analysis of what could happen later, we simply assigned it to state T11, because T11 allows more variations in string patterns than the other states which also accept C2 as an incoming action. We can make the automaton tighter if we can show that certain paths prescribed by T13 → T11 are invalid, so that action C2 from T13 has to be assigned to other states accepting C2. Similarly, when pattern C3 is played at T13, instead of doing a case analysis we assigned it to state T12. By re-examining the transitions T13 → T11, T13 → T12, and T12 → T11, we can eliminate all pattern strings beginning with the following prefixes:

• C3C3C2C3
• C3C3C2C2
• C3C2C2C2
• C3C2C2C3

The first two items in the list eliminate state T12 and redirect action C3 from T13 to T10. The last two items redirect action C2 from T13 to T9. The resulting exact automaton is shown in Figure 6.9.

Lemma 6.2.5 (Invalid Best Reply Patterns). The following list of best response string patterns cannot occur for fully connected games:

• C3C3C2C3
• C3C3C2C2
• C3C2C2C2
• C3C2C2C3

Proof. We now prove that the four best response string patterns in Lemma 6.2.5 will never occur for fully connected games with FP1.

Each of these patterns can be refuted by establishing a contradiction on the parity of the maximum priority in the game.


Figure 6.9: The exact Büchi automaton accepting pattern strings generated by all runs of fictitious play on fully connected games.

The proof procedure is exactly the same for all four string patterns. We will establish the contradiction for the first pattern in detail and leave the rest of the string patterns to the reader as an exercise.

Suppose C3C3C2C3 is not an invalid string prefix; we label the best responses in the string sequentially as π−2, σ−1, π−1, σ0.

Suppose the maximum priority in the game is odd and that priority resides on a vertex owned by player 1. We consider iteration (σ−1, π−1). Since the maximum priority in the game is odd and is found on some i ∈ I, a viable best response π−1 can only be of pattern C1 or C3; however, π−1 is of pattern C2. Thus the maximum priority in the game cannot be odd while being carried by a vertex in I.

Suppose the maximum priority in the game is odd and that priority resides on a vertex owned by player 0. Again we consider iteration (σ−1, π−1) first: σ−1 is of the pattern R → r → i, and π−1 is of the pattern I → r′. We can infer a cycle of the form r′ → r → i → r′. Since all vertices of player 0, including the vertex with the maximum priority in the game, reach i, player 1 can obviously secure a cycle with that maximum priority (an odd value); we can further infer that |r′| is the maximum priority in the game and |r′| is odd. With this fact in mind, we look at the next iteration, (π−1, σ0). Since π−1 requires all vertices of player 1 to converge to r′, the maximum (odd) priority in the game, the only viable best response pattern for σ0 which can help player 0 escape the fate of receiving |r′| as the payoff at all his vertices would be C1. However, it is known that σ0 is of the form C3; therefore the maximum priority in the game cannot be odd while being carried by a vertex in R.

The maximum priority in the game cannot be odd, because we have established a contradiction: when the maximum priority in the game is odd, it can appear on a vertex of neither player.

Suppose the maximum priority in the game is even and that priority resides on a vertex owned by player 0. We consider iteration (π−2, σ−1) first: σ−1 is of the pattern R → r → i, and π−2 is of the pattern I → i′ → r′. We can infer a cycle of the form r → i → r; the cycle cannot be of length 3 or 4, because otherwise pattern C1 would yield a shorter cycle and a better play profile than C3. It is easy to see that vertex r carries the maximum priority in the game, |r| is even and R → r. With this in mind we consider iteration (σ−1, π−1): the only viable best response pattern which can help player 1 escape the fate of receiving |r| as the payoff at all his vertices would be C1; however, it is known that π−1 is of pattern C2, not C1. Therefore the maximum priority in the game cannot be even while being carried by a vertex in R.

Suppose the maximum priority in the game is even and that priority resides on a vertex i owned by player 1. We consider iteration (π−2, σ−1). Since π−2 is of pattern C3, all vertices of player 1 reach into vertices owned by player 0, so player 0 can secure the maximum priority |i| (an even value) by forming a cycle through that vertex. Because there is no vertex with a higher even priority in the game, the best response pattern should be C2, which contradicts σ−1 being of pattern C3. The maximum priority therefore cannot be both even and carried by a vertex owned by player 1.

The maximum priority in the game cannot be even, because we have established a contradiction: when the maximum priority in the game is even, it can appear on a vertex of neither player.

When string pattern C3C3C2C3 occurs, we have established in the proof that the maximum priority in the game can be neither even nor odd. Since the maximum priority in the game, as a natural number, must be either even or odd, the supposition is false: C3C3C2C3 cannot be a valid string prefix.

Theorem 6.2.6 (Constant Termination of FP1). FP1 always terminates on fully connected games of any size within 6 iterations.

The theorem is obviously true by simple inspection of the automaton in Figure 6.9. In our experiments on 200,000 randomly generated fully connected games of up to 1024 vertices, all of the games were solved within 6 iterations.
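For reference, the transition relation of the exact automaton can be transcribed from the case analysis in the proof of Lemma 6.2.4 together with Lemma 6.2.5. The sketch below is such a transcription in Python; the state names follow the figures, the loops at the accepting states T1, T2 and T3 are an assumption read off Figure 6.9, and the whole table should be treated as an unverified convenience rather than part of the proof. A small search over it reproduces the 6-iteration bound of Theorem 6.2.6.

    # Rough transcription of the exact automaton in Figure 6.9 (assumed, not verified)
    EXACT = {
        "T0":  {"C1": "T7", "C2": "T11", "C3": "T13"},
        "T7":  {"C1": "T4", "C2": "T5",  "C3": "T6"},
        "T4":  {"C1": "T1"},  "T5": {"C1": "T2"},  "T6": {"C1": "T3"},
        "T1":  {"C1": "T4"},  "T2": {"C2": "T5"},  "T3": {"C3": "T6"},   # accepting loops
        "T11": {"C1": "T7", "C2": "T9",  "C3": "T10"},
        "T13": {"C1": "T7", "C2": "T9",  "C3": "T10"},
        "T9":  {"C1": "T7", "C2": "T8"},
        "T10": {"C1": "T7", "C2": "T8"},
        "T8":  {"C1": "T7"},
    }

    def longest_simple_path(state, visited=frozenset()):
        """Number of transitions on the longest path that never revisits a state."""
        visited = visited | {state}
        nxt = [t for t in EXACT[state].values() if t not in visited]
        return 0 if not nxt else 1 + max(longest_simple_path(t, visited) for t in nxt)

    print(longest_simple_path("T0"))   # prints 6, matching Theorem 6.2.6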

We close this section by looking at some experimental results from running Fictitious Play and Zielonka's algorithm on fully connected games. The data set is the complete set of all non-isomorphic fully connected games with an equal number of vertices for both players, for games of 4, 6, 8, 10 and 12 vertices, together with a random sample of 10,000 games of 1024 vertices.

No. of Vertices    4      6      8      10     12     1024
FP1                2.50   3.15   3.14   3.09   4.22    3.0
Zielonka           2.83   5.91   8.13   9.25   9.72   10.19

Figure 6.10: Comparison of average number of iterations of FP1 against the average number of recursive calls of Zielonka’s algorithm on fully connected games.

No. of Vertices    4       6       8       10      12      1024
FP1                87.30   87.91   87.76   88.29   88.75    73000
Zielonka           88.29   86.91   86.74   88.29   89.83   245000

Figure 6.11: Comparison of average runtime of Zielonka’s algorithm and FP1 on fully connected games measured in ms.

Figure 6.10 compares the average number of iterations of FP1 against the average number of recursive calls made by Zielonka's algorithm on these games. Each recursive call of Zielonka's algorithm incurs roughly the same amount of computation as each iteration of FP1. FP1 maintains a low iteration count for large games, as expected. For Zielonka's algorithm the number of recursive calls also seems to be bounded by some constant, as the average number of recursive calls increases only from about 9 to 10 for games from 12 vertices to 1024 vertices. The peculiar drop in the number of iterations required going from 6 vertices to 10 vertices with FP1 is probably due to a feature of fictitious play in general: the initial strategy selected has an impact on the run length of the algorithm. This effect becomes negligible for large games with a huge number of strategies in a large sample space.

The runtime comparison in Figure 6.11 also confirms our belief that the two algorithms are roughly equivalent computationally on fully connected games. The difference in time cost is hardly distinguishable for games of smaller size. However, for games of much larger size, fictitious play clearly outperforms Zielonka's algorithm.

6.2.2 Non Fully Connected Games

In the previous section we showed that the maximum run length of fictitious play is bounded by a constant on fully connected games. This is interesting but hardly useful in practice; we want to infer similar results for parity games in general.

We take a look at a typical strategy graph in Figure 6.12. This strategy graph for a 12-vertex game has over 20,000 strategies. There are a few observations about this graph which apply to most other games of all sizes. The vast majority of the nodes (strategies) in the graph have no incoming edges, which means most strategies never occur as a best response during fictitious play iterations. On the other hand, most edges are attracted to about a dozen nodes in the graph; these nodes have a darkened area around them due to the high volume of edge traffic.

This is one of the more interesting strategy graphs we could find for 12-vertex games. The game's out-degree is between log(n) and 2 log(n), where n is 12 here. Games with more arcs usually result in much flatter strategy graphs. Recall from the previous section that fully connected games have depth at most 6. The depth of strategy graphs in general will first increase and then decrease as the arc coverage in the game increases from 0 to fully connected. The exact out-degree at which the turning point happens is unknown, but it is shown empirically to lie somewhere between log(n) and 2 log(n). This justifies our choice of out-degree in experiments for finding interesting features in games of all sizes (in number of vertices).

When we look at fictitious play within the context of a payoff matrix M_{i,j}, with i rows and j columns, it is easy to see that for fictitious play with recall length 1 it takes at most 2 · min(i, j) iterations before running into a strategy cycle. Therefore the maximum run length (depth of the strategy graph) is linearly bounded by the number of strategies in the payoff matrix (hence also in the strategy graph). When fictitious play works directly with the payoff matrix, each iteration also takes i or j operations, where i and j are the total numbers of strategies of player 0 and player 1 respectively, each measured in the range of n^n, where n is the number of vertices in the game. However, when we compute the best response in a parity game against a given strategy, it takes at most n^3 log(n) operations, because solving a single-player parity game reduces to a reachability problem and is solved within n^3 log(n) steps.

Figure 6.12: Strategy graph of a 12-vertex game, with over 20,000 strategies.

Unlike strategies in generic numerical games in normal form, which are completely independent of each other, strategies in parity games are combinatorial objects, and the true description complexity of the payoff matrix is therefore much smaller than the total number of strategies in the payoff matrix.
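To make the recall-length-1 bound concrete, the toy below runs fictitious play on an explicit payoff matrix (row player maximising, column player minimising) and stops at the first repeated strategy pair; because each best reply is a deterministic function of the opponent's last move, the pair must repeat within the 2 · min(i, j) bound mentioned above. The matrix representation and the tie-breaking by lowest index are illustrative assumptions.

    def fictitious_play_recall1(matrix):
        """Recall-length-1 fictitious play on a payoff matrix; returns the first
        repeated (row, column) pair, i.e. the strategy cycle that is reached."""
        rows, cols = len(matrix), len(matrix[0])
        row, seen = 0, set()
        while True:
            col = min(range(cols), key=lambda c: matrix[row][c])    # column best reply
            if (row, col) in seen:
                return row, col                                     # cycle reached
            seen.add((row, col))
            row = max(range(rows), key=lambda r: matrix[r][col])    # row best reply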

If we take a leap of faith and assume that the inner loop complexity of best response in fictitious play for parity games somehow reflects the true descriptive complexity of its payoff matrix, we then have the following conjecture:

Conjecture 6.2.7. Given a parity game with n vertices and the set of all strategies S0 = Π ∪ Σ of both player 0 and player 1, we define the set of strategies S1 = {s | s = BR(t) where t ∈ S0}. Then we conjecture that |S1| is bounded by O(n^3 log n).

In this conjecture, S0 is the set of all strategies, and S1 is the set of strategies that are best responses to some strategy in S0. Strategies which are present in S0 but not in S1 are called unreferenced strategies; they are strictly dominated and cannot be best responses to any of the strategies in S0. Based on our experiments, the conjecture holds for games with up to 12 vertices. Figure 6.13 shows the average number of strategies in S0 and S1, and Figure 6.14 shows the maximum number of strategies in S0 and S1. In both tables S2 and S3 are defined analogously to S1. Unfortunately we do not have the capacity to check much larger games thoroughly, since a game with 20 vertices may already have billions of strategies. For this reason the results presented here should be taken with a grain of salt, as any initial set of data can be fitted to a polynomial with large enough constants. However, at least the maximum iteration counts recorded in other experiments for much larger games so far do not contradict this conjecture.

If we continue to take away unreferenced strategies from S1 to generate S2, from S2 to S3, and so on up to Sk, where k is the maximum depth of a strategy graph, then k is bounded by the inequality a^k ≤ n^n, where n is the number of vertices in the game and a is the geometric mean of the ratios |Si|/|Si+1| for i = 0 to k. It is believed that a has a limit at the base of the natural logarithm, e. We then have the following conjecture regarding the depth of a strategy graph:

Conjecture 6.2.8. Given a parity game with n vertices, the depth of its strategy graph generated by best responses of recall length 1 is bounded by O(n log n).
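The step from the inequality above to this bound is a one-line calculation, under the stated assumption that the geometric mean a stays bounded below by a constant such as e:

    a^k ≤ n^n   implies   k · log a ≤ n · log n,   hence   k ≤ (n log n) / log a = O(n log n).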

Vertices    S0          S1       S2      S3
4               5.878    2.406   2.01    2
6              65.254    4.943   2.306   2.048
8            1118.209    7.692   2.759   2.177
10           5598.585   12.228   3.354   2.401
12          22052.29    20.069   4.167   2.671

Figure 6.13: Average number of strategies in strategy graphs, and filtered strategy graphs up to 3 times. Column S0 gives the average number of strategies in original strategy graphs, Column S1 gives the average number of strategies when unreferenced strategies are removed from S0 etc.

Vertices    S0        S1     S2    S3
4               18      6     3     2
6              152     16     6     4
8             3001     38    10     9
10           26368     87    11     7
12          127296    232    17     8

Figure 6.14: Maximum number of strategies seen in strategy graphs and filtered strategy graphs up to 3 times. Column S0 gives the maximum number of strategies in original strategy graphs, Column S1 gives the maximum number of strategies seen when unreferenced strategies are removed from S0 etc.

Again, for all the parity games we have come across in experiments, no contradictions to this conjecture have been found. Readers are invited to review the iteration statistics of FP1 on various data sets in Figure 5.19, Figure 5.20, Figure 5.21, and Figure 5.22 from Chapter 5.

Unfortunately I will not be able to provide a proof for either of the two conjectures at this stage. However we will discuss the intuition for why

fictitious play may have a polynomial run length for games in general. To drive the discussion we refer back to Figure 6.12 and ask the following question: for strategies lying in the periphery of the strategy graph and sharing the same counter strategy, what do they have in common?

We treat a strategy as a map of arc selections. At each map index there is an arc selection corresponding to the vertex owned by the player.

Definition 6.2.9 (Abstract Strategy). We call a map of arc selections an abstract strategy if the vertices in the map are all owned by the same player and the map does not contain all vertices owned by that player.

For example {v0 → v2} is an abstract strategy.

Definition 6.2.10 (Instantiation). A strategy is said to be an instantiation of an abstract strategy if it inherits all the mappings from the abstract strategy.

Definition 6.2.11 (Refinement). An abstract strategy is a refinement of another abstract strategy if it is itself abstract and inherits all the mappings from the second abstract strategy.

Definition 6.2.12 (Counter Strategy). If strategy σ is a counter strategy to strategy π in the strategy graph, we say that π is consumed by σ.
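As an illustration of Definitions 6.2.10 and 6.2.11, the following minimal sketch checks the two relations, assuming that both abstract and concrete strategies are represented as maps from a vertex to the chosen successor; this representation is an assumption for illustration only.

import java.util.Map;

final class AbstractStrategies {
    // A strategy instantiates an abstract strategy iff it inherits all its mappings.
    static boolean isInstantiation(Map<Integer, Integer> abstractStrategy,
                                   Map<Integer, Integer> strategy) {
        return strategy.entrySet().containsAll(abstractStrategy.entrySet());
    }

    // A refinement inherits all mappings of the original and must itself stay abstract,
    // i.e. it must not yet fix a choice at every vertex owned by the player.
    static boolean isRefinement(Map<Integer, Integer> refined,
                                Map<Integer, Integer> original,
                                int verticesOwnedByPlayer) {
        return refined.entrySet().containsAll(original.entrySet())
                && refined.size() < verticesOwnedByPlayer;
    }
}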

For a parity game with out-degree n, if player 1 owns n vertices in the game, there are n^n strategies of player 1. Furthermore, for an abstract strategy {vi → vj} of player 1, there will be exactly n^{n−1} instantiations. With this understanding we use the following procedure to 'assign' strategies of player 1 to their consumers (strategies of player 0) in a strategy graph. Starting from the highest even number and following the relevance ordering, find the abstract strategy π of minimal length that generates the counter strategy σ; then all instantiations of π not yet assigned in the strategy graph will be assigned to σ.

In this procedure, we do not need to worry about the case that other refinements of π give player 0 the chance to generate even better payoff, because instantiations of the more refined abstract strategy which yield better payoff for player 0 would have been dealt with from earlier steps in the procedure.

Now if the maximum of all minimal lengths of abstract strategies of player 1 seen is k, then for each such abstract strategy there will be n^{n−k} instantiations, which means their unique counter strategy of player 0 consumes n^{n−k} player 1 strategies. This means there will be at most n^{n−(n−k)} = n^k distinct consumers, or player 0 strategies ever referenced by player 1 strategies in the strategy graph. The existence of abstract strategies of player 1 with length shorter than k will only reduce the number of player 0 strategies referenced.
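A small worked example of this counting, with illustrative numbers rather than data from our experiments:

% Suppose n = 8 vertices owned by player 1, each of out-degree 8, and the maximum
% of all minimal abstract-strategy lengths is k = 2. Then
n^{n-k} = 8^{6} = 262144 \quad \text{instantiations per abstract strategy},
\qquad
n^{k} = 8^{2} = 64 \quad \text{distinct consumers at most}.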

If the most succinct abstract value-optimal strategy of player 0 has length k in a game with n vertices owned by player 0, then there are at least n^{n−k} value-optimal strategies, each of which is a termination marker for fictitious play, because the algorithm will terminate as long as it encounters a value-optimal strategy once for each player. Typically in a parity game, a strategy can be value-optimal only when most of the vertices owned by the player have the correct arc selected in the strategy.

On the other hand, a strategy can be sub-optimal as long as a few vertices have the wrong arcs selected. This means that an abstract strategy which characterizes strictly dominated strategies has a much shorter length than an abstract strategy which characterizes best responses in the strategy graph. As the length of abstract strategies goes from 1 to n, in principle, the number of instantiations corresponding to the abstract strategies of each length goes from n^{n−1} to 1, following a geometric progression. Because of this, the number of bad strategies (unreferenced strategies) may be exponentially related to the number of good strategies (counter strategies referenced) in a strategy graph, and from this we might be able to infer that the number of strategies ever referenced in a strategy graph is polynomially related to the number of vertices in the game.

7 Cycle Resolution with FP1

In the previous chapter we looked at strategy graphs and their implications for the run time of FP1 on parity games. Even though we do not have a proof of polynomial time termination of FP1 for parity games in general, the algorithm is conjectured to 'terminate' within O(n log(n)) iterations and its overall run time is conjectured to be bounded by O(n^4 log^2(n)).¹ The conjectures stand well empirically against test games of up to 2000 vertices from our various experiments. The algorithm also performs better in run time than Zielonka's algorithm and the strategy improvement algorithm, especially when the game size gets much larger. However, FP1 in its plain vanilla form may fail to produce a solution when non-optimal strategy cycles (cycles of length greater than 2) are encountered. Even though such cases account for less than 0.5% of the random games, these non-optimal strategy cycles have to be resolved. This chapter explores various possibilities for resolving such cycles.

7.1 Cycle Detection

Cycle resolution begins with its detection. We want to remain as faithful to the original fictitious play algorithm as possible when redesigning it for parity games. The pseudocode given in previous chapters for FP1 did not mention anything about cycle detection; it had been taken for granted. As a matter of fact, we do not need to make any changes to the pseudocode or the implementations of FP1, since cycle detection (as well as cycle resolution, as we will see later in this chapter) can be implemented as a proxy to be injected during each iteration through aspect oriented design.² This

¹ By termination we mean either to have terminated gracefully with the correct solution or to have detected strategy cycles of lengths up to O(n log(n)).
² More information about aspect oriented programming can be found on the homepage of AspectJ at http://www.eclipse.org/aspectj.

mechanism remains transparent to the algorithm and everything we have proved or shown regarding FP1 will still stand in its presence. We now give the pseudocode for cycle detection in Figure 7.1.

strategies = []
cycleDetected = false
iteration
  strategy = retrieve last best response in current iteration
  if strategy ∈ strategies then
    if strategies.length − strategy@strategies > 2 then
      cycleDetected = true
    end if
  end if
  strategies += strategy

Figure 7.1: Cycle detection for FP1, triggered at the end of each iteration of FP1

In the pseudocode, the variables strategies and cycleDetected are initialized once when FP1 is invoked on a game. The variable strategies is a list and is updated in each iteration. For games of larger size it might not be space efficient to store the actual strategies in the list; in fact, in our implementation each strategy is assigned a unique natural number, and only the numbers which encode actual strategies are stored. The time for encoding a strategy is bounded by O(n) and the time for decoding a strategy is bounded by O(n^2), where n is the number of vertices in the game. Therefore the time complexity of each iteration of FP1 remains unchanged at O(n^3 log(n)).
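The following minimal sketch (in Java, the language of our workbench) illustrates how such a detection proxy might encode strategies as natural numbers and keep the history. The mixed-radix encoding, the names and the signatures are illustrative assumptions, not the workbench's actual API.

import java.util.ArrayList;
import java.util.List;

final class CycleDetector {
    private final List<Long> history = new ArrayList<>();
    private boolean cycleDetected = false;

    // O(n) mixed-radix encoding of a strategy, assuming choice[i] is the index of the
    // successor picked at the i-th vertex owned by the player and degree[i] is that
    // vertex's out-degree. (Assumes the code fits into a long; larger games would need
    // a BigInteger.)
    static long encode(int[] choice, int[] degree) {
        long code = 0;
        for (int i = 0; i < choice.length; i++) {
            code = code * degree[i] + choice[i];
        }
        return code;
    }

    /** Called at the end of each FP1 iteration with the latest best response. */
    void onIteration(int[] choice, int[] degree) {
        long code = encode(choice, degree);
        int firstSeen = history.indexOf(code);
        // A repeat more than two steps back is a non-optimal strategy cycle.
        if (firstSeen >= 0 && history.size() - firstSeen > 2) {
            cycleDetected = true;
        }
        history.add(code);
    }

    boolean cycleDetected() { return cycleDetected; }
}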

7.2 Randomization

Since optimal strategies always exist, any strategies that can reach the optimal ones in a strategy graph, if selected as initial strategies, will lead to the optimal strategies in fictitious play. One seemingly obvious solution to non-optimal strategy cycles upon detecting them is to restart the computation and begin with a different random initial strategy, hoping that the optimal strategies are reachable by the new initial strategy selected.

This method will only work if most, or at least over 50%, of the strategies in the strategy graph are attracted to the optimal strategies. This is, however, not true most of the time. In Figure 7.2 we show the average, maximum, minimum and median percentage of strategies attracted to the optimal strategies in test games. The data is collected from roughly 1000 games of up to 12 vertices with strategy cycles detected. For these games with known strategy cycles, a strategy graph is generated for each game. Each node in the strategy graph has only one out-going edge, hence each node is deterministically attracted to a set of sink strategies. We can check for each strategy whether it is attracted into the sink of equilibrium strategies, and count the total number of strategies attracted to the sink versus the total number of strategies in the strategy graph. We are not able to check against games with more than 12 vertices, but the results from these small games already suggest that randomization will not be effective for resolving strategy cycles.
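The per-game check just described can be sketched as follows; this is a minimal illustration assuming the strategy graph is available as a map from each strategy to its unique best response, which is an assumption about the representation rather than the workbench's interface.

import java.util.*;

final class AttractionToOptimal {
    // Counts how many strategies eventually reach the optimal sink by following
    // best responses, and returns the attracted fraction.
    static double fractionAttracted(Map<Integer, Integer> bestResponse, Set<Integer> optimalSink) {
        int attracted = 0;
        for (int start : bestResponse.keySet()) {
            Set<Integer> visited = new HashSet<>();
            int current = start;
            // follow best responses until we hit the sink or revisit a strategy (a cycle)
            while (!optimalSink.contains(current) && visited.add(current)) {
                current = bestResponse.get(current);
            }
            if (optimalSink.contains(current)) attracted++;
        }
        return (double) attracted / bestResponse.size();
    }
}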

As we can see from the table, on average only about 10% of the strategies in a strategy graph can reach optimal strategies. Even though the maximum can go up to as high as 82% occasionally, as the median is around 2% to 3%, it suggests that randomization will only work very rarely. In certain games the strategies attracted to optimal ones can be as low as 0.03%, which means virtually all of the strategies are trapped in the non-optimal strategy cycle.

No. of Vertices   Average %   Max %    Min %    Median %
8                 9.1%        52.9%    0.3%     2.9%
10                10.1%       65.5%    0.07%    2.6%
12                10.8%       82.3%    0.03%    2.2%

Figure 7.2: Percentage of strategies attracted to the optimal strategies in strategy graphs, showing the average, max, min and median percentages observed from roughly 1000 games with strategy cycles.

7.3 Static Analysis

Recall that fictitious play maintains a convergence interval [l,u] at each vertex; these intervals are updated during each iteration of fictitious play when a better upper or lower bound is found. Termination of the algorithm relies on the convergence intervals being closed at each vertex.

As each best response is generated with a particular opponent strategy in mind, when a better upper or lower bound is found, it is only updated locally for the vertices relevant to the context of the pair of opposing strategies. However, the information about the new boundaries on some local vertices within an iteration of fictitious play may have an impact elsewhere, if the local information can be extracted and analyzed at a higher level outside the iterations. This section looks at such methods, which help fictitious play resolve cycles and speed up convergence. We will give two examples of static analysis.

7.3.1 Strategy enforcement on local convergence

Sometimes fictitious play may have detected a local convergence on a subset of vertices even though it could not find a convergence globally for the whole game. Take for example the case where we detect l = u on a vertex v with interval [l,u]: we can retrieve the σ and π which generate this local convergence, and for all such vertices we can drop all edges on them other than those chosen by σ and π; this enforces the optimal strategies for the region related to the convergence. After the game is simplified in this manner, FP1 can often escape the non-optimal strategy cycles. This process simulates incrementally generating residual games and throwing away computed (converged) regions. We look at an example of this static analysis on GAME-08-35-7-2052 in Figure 7.3; the corresponding filtered strategy graph is in Figure 7.4. When FP1 is executed on this game, the non-optimal strategy cycle 371, -42, 240, -103, 173, -1 is observed. The list of strategies in the cycle is shown below:

• 371 : v2 → v6, v4 → v7, v0 → v2, v6 → v4
• -42 : v3 → v1, v5 → v2, v1 → v3, v7 → v1
• 240 : v2 → v5, v4 → v2, v0 → v5, v6 → v5
• -103 : v3 → v4, v5 → v6, v1 → v4, v7 → v1
• 173 : v2 → v3, v4 → v7, v0 → v1, v6 → v4
• -1 : v3 → v0, v5 → v2, v1 → v0, v7 → v1

The payoff received in each iteration up to a full cycle is listed below:

Figure 7.3: Parity game GAME-08-35-7-2052, with cycle length 6 for FP1. The query used to extract this game from the workbench is 'ALL G COUNT(VERTICES(G)) <=8 AND CL(G,FP1)<6'.

Figure 7.4: Filtered strategy graph of GAME-08-35-7-2052

• ⟨371, -42⟩ = 4 at all vertices
• ⟨-42, 240⟩ = {v0: 6, v1: 4, v2: 6, v3: 4, v4: 6, v5: 6, v6: 6, v7: 4}
• ⟨240, -103⟩ = 9 at all vertices
• ⟨-103, 173⟩ = 12 at all vertices
• ⟨173, -1⟩ = 7 at all vertices
• ⟨-1, 371⟩ = 12 at all vertices

After iteration (-42, 240), the intervals for vertices v1, v3, v7 are found to have converged. Since strategy 371 and strategy -42 are already optimal for the sub-game formed by v1, v3, v7, we can enforce the part of strategy -42 relevant to these three vertices to be played by player 1 in all subsequent iterations. This can be achieved by removing all other out-going edges from these vertices. After the game is simplified, strategies -103 and -1 are eliminated from the strategy graph and the best response to strategy 240 will be strategy -62 : v3 → v1, v5 → v6, v1 → v3, v7 → v1. As we can see from the filtered strategy graph in Figure 7.4, strategy -62 is one of the profile optimal strategies and its best response is strategy 287 : v2 → v5, v4 → v7, v0 → v5, v6 → v4. The non-optimal strategy cycle is therefore resolved and the computation terminates two iterations later. The game values for the next two iterations are:

• ⟨240, -62⟩ = {v0: 9, v1: 4, v2: 9, v3: 4, v4: 9, v5: 9, v6: 9, v7: 4}
• ⟨-62, 287⟩ = 4 at all vertices

The equilibrium payoff is 4 for all vertices and the optimal strategies returned by the algorithm are 371 and -62. The pseudocode for this method is listed in Figure 7.5. In the actual implementation, we do not need to manage the best strategy maps (lbBsMap, ubBsMap) and the optimal payoffs (lbMap, ubMap) explicitly, as they are already managed by FP1's main iteration; they are included here only for ease of understanding. Lines 14 and 25 retain out-going edges in the optimal strategies and drop all other edges on vertices with closed intervals.

1: lbBsMap (v → σ)
2: ubBsMap (v → π)
3: lbMap (v → int)
4: ubMap (v → int)
5: iteration
6:   payoffMap = new payoff for each vertex this iteration
7:   if best response by player 0 then
8:     π = previous best response
9:     for all v ∈ V do
10:      if ubMap.get(v) <_0 payoffMap.get(v) then
11:        ubMap.put(v, payoffMap.get(v))
12:        ubBsMap.put(v, π)
13:        if lbMap.get(v) == ubMap.get(v) then
14:          enforce ubBsMap.get(v) and lbBsMap.get(v) on v
15:        end if
16:      end if
17:    end for
18:  else
19:    σ = previous best response
20:    for all v ∈ V do
21:      if lbMap.get(v) >_0 payoffMap.get(v) then
22:        lbMap.put(v, payoffMap.get(v))
23:        lbBsMap.put(v, σ)
24:        if lbMap.get(v) == ubMap.get(v) then
25:          enforce ubBsMap.get(v) and lbBsMap.get(v) on v
26:        end if
27:      end if
28:    end for
29:  end if

Figure 7.5: Pseudocode for enforcing strategies on local convergence in static analysis
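The 'enforce' step referenced in lines 14 and 25 can be sketched as follows; this is a minimal illustration in Java, assuming the game is an adjacency map and σ, π are maps from vertices to chosen successors. These representations are assumptions, not the workbench's actual data structures.

import java.util.*;

final class LocalConvergence {
    // Once l(v) = u(v), keep only the arc chosen by the locally optimal strategy of
    // v's owner and drop every other out-going arc of v.
    static void enforce(Map<Integer, Set<Integer>> successors, Set<Integer> ownedBy0,
                        Map<Integer, Integer> sigma, Map<Integer, Integer> pi, int v) {
        int keep = ownedBy0.contains(v) ? sigma.get(v) : pi.get(v);
        successors.get(v).retainAll(Collections.singleton(keep)); // drop all other out-going arcs
    }
}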

7.3.2 Prune strictly dominated strategies

The second static update we discuss looks at a single vertex and its immediate successors. The idea is that, e.g., at a player 0 vertex v, if there are two successors p and q such that the upper bound of p is less than the lower bound of q in the relevance ordering from the perspective of player 0, then we can safely remove v → p from the original game. There are two rules, applying to vertices owned by each of the two players:

• (R0) while (∃ v ∈ V_0 with successors p ≠ q s.t. u(p) ≤_0 l(q)) drop arc v → p

• (R1) while (∃ v ∈ V_1 with successors p ≠ q s.t. u(q) ≤_0 l(p)) drop arc v → p

In R0 and R1, vertex p and vertex q are distinct successors of vertex v. The rules will keep removing arcs from the game so long as the upper bound of one successor is lower than the lower bound of another successor from the perspective of the owner of v. These rules will examine each vertex v in a game exactly once, and for each vertex v there are at most n − 2 arcs that can be removed. Since the two rules are self-explanatory, and are applied after each iteration in a similar fashion to strategy enforcement on local convergence, we will skip their pseudocode.
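Although we skip the pseudocode here, the following minimal sketch illustrates one possible reading of R0 and R1 in Java, with an adjacency-map representation of the game and an externally supplied predicate for the ≤_0 ordering; all names are illustrative assumptions, not the workbench's implementation.

import java.util.*;
import java.util.function.BiPredicate;

final class DominatedArcPruning {
    // lb/ub map each vertex to the current bounds of its convergence interval,
    // atMostFor0.test(a, b) decides a <=_0 b in the reward ordering of player 0.
    static void prune(Map<Integer, Set<Integer>> successors, Set<Integer> ownedBy0,
                      Map<Integer, Integer> lb, Map<Integer, Integer> ub,
                      BiPredicate<Integer, Integer> atMostFor0) {
        for (int v : successors.keySet()) {
            Set<Integer> succ = successors.get(v);
            Set<Integer> dominated = new HashSet<>();
            for (int p : succ) {
                for (int q : succ) {
                    if (p == q || dominated.contains(q)) continue; // q must survive to dominate p
                    boolean drop = ownedBy0.contains(v)
                            ? atMostFor0.test(ub.get(p), lb.get(q))   // R0: u(p) <=_0 l(q)
                            : atMostFor0.test(ub.get(q), lb.get(p));  // R1: u(q) <=_0 l(p)
                    if (drop) { dominated.add(p); break; }
                }
            }
            succ.removeAll(dominated); // drop the dominated arcs from v
        }
    }
}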

Experimentally, pruning strictly dominated strategies appears to be equivalent to strategy enforcement on local convergence. Theoretically, any non-optimal strategy cycle resolved by strategy enforcement on local convergence will also be resolved by pruning strictly dominated strategies. This is so because, as long as there is more than one out-going arc from any of the vertices with a converged interval, one of the two rules R0 and R1 will apply to these vertices. Eventually all the 'redundant' arcs will be pruned and the end result is the same as if strategy enforcement had been applied. This can be confirmed by inspecting the example we give in Figure 7.3 using the second static analysis introduced.

It remains to find out whether FP1 may theoretically generate non-overlapping intervals on successors of a vertex without encountering local convergence anywhere in the game. The answer to this question is most likely no. This is because, when two intervals are completely separate, one of the two intervals must be fully even or fully odd, since all such intervals are each a line segment in the relevance ordering. This means the potential equilibrium payoffs which this interval may converge to are all owned by the same player, and the vertex carrying this interval is effectively won by that player too.

We revisit the data sets of random games and compare the number of unsolved games using plain FP1 and FP1 enhanced by static analysis. The data is shown in Figure 7.6. We can see from the table that over 97% of the games not solved by FP1 are now solved by FP1 with static analysis.

No. of vertices           8    10    12    24    32    1024
FP1                       80   183   483   502   518   24
FP1 w/ static analysis    0    7     13    9     13    0

Figure 7.6: Comparing the number of games unsolved each by FP1 and FP1 with static analysis on various data sets of random games. Each data set contains roughly 200,000 games.

7.4 Combining best reply and strategy improvement

From our experiments on random games, we know that FP1 encounters non-optimal strategy cycles in less than 0.5% of all the random games generated in the random game data set defined in Chapter 4, and static updates eliminate a further 98% of these cases; therefore the percentage of games that remain difficult for the enhanced FP1 is less than 0.01% of the random games generated. But this is not good enough. In this section we will look at ways to completely resolve the non-optimal strategy cycles by making use of the known DSI algorithm [26].

7.4.1 Context switching on interval update

FP1 has a conjectured polynomial runtime of O(n^4 log^2(n)), but it has a small chance of leaving a game unsolved when non-optimal strategy cycles are encountered. The discrete strategy improvement algorithm is guaranteed to terminate with an optimal strategy of player 0, but has been shown to have exponential worst-case running time. Because both these algorithms are iterative processes using a similar valuation scheme, they can be combined to give us possibly the best of the two worlds.

For this purpose, we design a context switching algorithm using FP1 and DSI algorithms as components. It remains to figure out what criteria should be used for switching the context.

Progress in FP1 is measured by the distance of the convergence interval [l,u]. The interval is updated by BR in FP1 whenever a better boundary is discovered. An obvious choice for switching the context out of FP1 into DSI is to do so when the interval [l,u] remained stationary after an iteration of FP1, and for switching out of DSI into FP1, as soon as [l,u] is improved by DSI. The algorithm capturing this intuition is shown in Figure 7.7.

The context switching algorithm follows the same pattern as FP1, which computes a single best response in each iteration. Variables σ and π are placeholders for the optimal strategies of player 0 and player 1. Variable ds in line 3 is the defending strategy. The defending and attacking player in each iteration can be inferred from the current defending strategy in question. When the algorithm stays in the fictitious play context, the two players take turns attacking their opponent's strategy. If the defender can secure a better payoff, they will update the boundary in line 9 or 12, correspondingly for the lower and upper bound, as well as update their best strategy seen globally so far. The algorithm stays in the FP1 context as long as one of the two boundaries of the convergence interval is updated after each iteration. When the lower and upper bound meet, the game is completely solved and the algorithm will return σ and π. When both boundaries remain unchanged after an iteration, the algorithm will switch to the DSI mode.

When the algorithm is in the DSI mode, it will first pick up the defending strategy from the previous iteration in line 22 and improve it using strategy improvement; the payoff secured by the new defending strategy is computed in lines 24 and 26. If the payoff is indeed better than the current boundary, it will switch out of the DSI context without making any update, because the new defending strategy ds will be picked up in FP1, and FP1, after computing the best response to the current ds, will produce exactly the same payoff and proceed with the update. The DSI component implemented here is slightly different from the standalone DSI algorithm in the sense that the DSI algorithm only improves player 0 strategies, whereas the algorithm

1: l = update lower bound to min priority in ≤_0 in G
2: u = update upper bound to max priority in ≤_0 in G
3: ds = choose some player 0 strategy
4: σ, π
5: while there is gap between l and u do
6:   //Fictitious play
7:   repeat
8:     if player 0 defending then
9:       l′ = ⟨ds, BR(ds)⟩
10:      update l if l′ >_0 l
11:      σ = ds
12:    else
13:      u′ = ⟨ds, BR(ds)⟩
14:      update u if u′ >_1 u
15:      π = ds
16:    end if
17:    oldDs = ds
18:    ds = BR(ds)
19:  until (neither l, u strictly improve) || (l, u gap closed)
20:  if l, u gap closed then
21:    return (σ, π)
22:  end if
23:  //Strategy improvement
24:  repeat
25:    ds = improve(oldDs)
26:    if player 0 defending then
27:      l′ = ⟨ds, BR(ds)⟩
28:    else
29:      u′ = ⟨ds, BR(ds)⟩
30:    end if
31:  until (l′ >_0 l) || (u′ >_1 u) || (ds no longer changes)
32:  if ds no longer changes then
33:    return (ds, BR(ds))
34:  end if
35: end while

Figure 7.7: Context switching on interval update using FP1 and DSI

here may improve the strategy of either player, depending on who is the current defender in question.

It should be clear that the algorithm always terminates: either the distance between the boundaries of convergence intervals eventually reaches 0, and then l = u and we stop; or execution stays within the strategy-improvement mode and there termination is guaranteed as it was proved in [26] that, eventually, σ = σ′. This terminal σ is then stronger than any other σ that was computed in previous strategy-improvement iterations.

7.5 Experiment and Findings

Our experiments for the algorithm in Figure 7.7 were conducted on hundreds of millions of randomly generated parity games whose number of vertices ranges from 4 to 128. The results for games with up to 32 vertices are shown in Figure 7.8. We have decided to include the data for games of up to 32 vertices for two reasons. Firstly, the statistics collected scale proportionately for games up to 128 vertices, and secondly, the data shown here is already good enough to help us spot and fix a weakness in the algorithm without running even more experiments.

In the figure, "BR" stands for the best-reply process whereas "DSI" stands for the discrete strategy improvement process. "Max" (respectively, "Mean") means the maximal (respectively, average) number of occurrences of the respective measure on games of the specified number of vertices. For each number of vertices on the left, we have three results: "Number of Entries" says how often the respective mode has been entered; "Max Run ... Per Entry" is the maximal number of iterations within the respective repeat-statement; and "Total Iterations ... Entries" is the sum of all iterations occurring within the respective mode.

The first measure shows that the majority of games is solved by being in the best-reply mode only. Also, on average most games spent most of their time in that mode. For each game, the first measure differs by at most one for BR and DSI. The second measure demonstrates that it does grow with the size of the game but the growth appears to be moderate for the range of games we investigated. The mean is similar to the mean of the first measure.

The third measure reflects the running time of our algorithm in practice. It seems to suggest that difficult games affect strategy improvement more than best reply, as the highest number of iterations for DSI is consistently bigger than the corresponding number for BR. The mean suggests that, on average, the running time grows moderately in the size of the game. Moreover, high numbers of iterations in DSI correlate strongly with low such numbers for BR in the same games. This seems to say that minor improvements happen close to local minima.

We also ran the worst-case examples for strategy improvement in [16]. Our algorithm solves these games by entering the best-reply mode only once and never entering the strategy-improvement mode. This illustrates that the use of play profiles for both players, and the dialogue and information flow that this enables, are beneficial in solving parity games.

When we compare the combined algorithm with pure DSI, the result is bad for the context switching algorithm. We pick out the 'most difficult' games in each vertex group from the previous experiment and give a breakdown of the number of iterations in Figure 7.9. In some games, the combined algorithm takes three times as many iterations as the pure DSI algorithm.

The key to understanding the disappointing performance of the combined algorithm is to realize that the interval update in FP1 is not monotonic; in fact it is non-monotonic in most of the iterations. When the interval stands stationary after an iteration, it does not mean that FP1 is not making any progress: the two players exchange information about each other's strategies through best responses, and these strategies with off-boundary payoffs are important for making progress towards the optimal strategies. The information is encoded in the path through the strategy graph and represents progress until a strategy cycle is encountered.

We already know that the run time performance of FP1 is better than that of DSI; what we need from DSI is just the resolution of non-optimal strategy cycles. We also know that, as we go deeper into the strategy graph with FP1, the number of strategies left in the core of the graph with unreferenced strategies removed drops very fast. Therefore we do not want DSI to improve those unreferenced strategies at each strategy graph depth level when such peripheral strategies could be skipped by FP1 directly.

Vertices | Number of Entries                   | Max Run Length Per Entry            | Total Iterations Over All Entries
         | BR Max  BR Mean      DSI Max  DSI Mean | BR Max  BR Mean      DSI Max  DSI Mean | BR Max  BR Mean      DSI Max  DSI Mean
4        | 2   1.000616118  1   0.138626468 | 2   1.21506613   2   0.138975602 | 2   1.21566171   2   0.138975602
5        | 3   1.004195084  3   0.155390575 | 3   1.299225547  3   0.157176744 | 3   1.303420613  4   0.157193687
6        | 3   1.009164103  3   0.081009829 | 4   1.548788462  4   0.084334615 | 4   1.557952137  4   0.084385043
8        | 4   1.026443333  3   0.111810889 | 5   1.818394222  6   0.114219667 | 6   1.844852444  13  0.114544
10       | 4   1.036490753  4   0.114284659 | 6   1.898490824  7   0.11731828  | 7   1.935097778  15  0.118123154
12       | 5   1.046965165  5   0.121067768 | 6   1.972630581  8   0.124736186 | 8   2.019965566  20  0.126177678
14       | 3   1.055112588  2   0.127796235 | 5   2.041159099  5   0.13163529  | 6   2.097305279  6   0.133074935
16       | 3   1.053798077  3   0.137660256 | 6   2.038782051  5   0.139038462 | 7   2.093269231  7   0.140336538
18       | 4   1.054654997  3   0.120446474 | 6   2.071144089  6   0.122716895 | 7   2.127054795  13  0.124670218
20       | 5   1.06037037   5   0.127047325 | 6   2.097962963  7   0.128888889 | 9   2.159876543  17  0.131450617
22       | 5   1.066130405  4   0.128472932 | 6   2.12051413   8   0.130200885 | 7   2.188585291  10  0.132728975
24       | 6   1.066294387  5   0.123059851 | 6   2.152455613  10  0.125307847 | 8   2.221398912  21  0.128042669
26       | 7   1.072545788  6   0.127570208 | 7   2.177496947  9   0.12959707  | 9   2.252930403  17  0.133107448
28       | 5   1.077170249  5   0.131605563 | 6   2.199546987  10  0.133854825 | 10  2.280156974  22  0.137352507
30       | 7   1.082690542  6   0.136965106 | 7   2.222075298  9   0.138755739 | 10  2.308719008  19  0.142644628
32       | 5   1.069453934  4   0.140305383 | 7   2.126754658  9   0.140530538 | 9   2.197968427  15  0.142132505

Figure 7.8: Experimental results of the context switching on interval update algorithm, run on hundreds of millions of games.

7.5.1 Context switching on detection of cycles

With this intuition, we modify the previous context switching algorithm to obtain the algorithm in Figure 7.10. The only difference is in line 19, where the criterion for context switching is updated to the detection of strategy cycles. With this change, the algorithm behaves exactly like the FP1 algorithm before encountering any non-optimal strategy cycles; DSI will only be invoked when such a cycle occurs. We revisit the previous experiment comparing the number of iterations of the mixed algorithm against the pure DSI implementation. The result is shown in Figure 7.11; as expected, the combined algorithm performs better for all the witness games, since they are all solved within FP1.

The context switching upon cycle detection algorithm does not have any overhead compared to the pure FP1 implementation when no strategy cycles are involved, because cycle detection is always invoked in FP1 anyway to help the algorithm terminate when cycles are detected. Therefore the runtime performance of the context switching algorithm is exactly the same as that of FP1 for all the experiments conducted in Chapter 5; these results can be reviewed in Figure 5.19, Figure 5.21, and Figure 5.22.

For games not solved by FP1, we now compare the performance of the mixed algorithm against pure DSI on these games. The experiment is shown in Figure 7.12. Entries in the table are the average numbers of iterations in each mode. For these games, the number of iterations taken by the mixed algorithm appears to be twice as many as that of a pure DSI implementation; however, we note two things: first, it only takes 1 iteration of DSI to resolve the strategy cycles, and second, the difference between the iterations of the mixed algorithm and pure DSI appears to be falling as the game size gets larger. The result is not conclusive until we can find some witness games for which FP1 encounters strategy cycles and at the same time pure DSI exhibits exponential run lengths. This is because, even though pure DSI may have exponentially many iterations for solving this hypothesized class of games, the DSI iterations required to resolve the cycles in FP1 for this same class of games may not be exponential.

After all, the likelihood of encountering a game with non-optimal strategy cycles by pure FP1 is around 0.01% in practice, so even if the hypothesized

Vertices   Mixed FP1+DSI       Pure DSI
           DSI       FP1
4          2         1         2
5          4         3         2
6          4         3         3
8          13        3         5
10         15        4         7
12         20        5         8
14         6         4         4
16         7         4         9
18         13        2         6
20         17        5         7
22         10        6         5
24         21        3         6
26         17        6         11
28         22        7         10
30         19        9         9
32         15        4         7

Figure 7.9: Comparing the number of iterations between the context switching algorithm and pure DSI on 'worst case' games for the context switching on interval update algorithm, extracted from the underlying data behind the experiments in Figure 7.8

1: l = update lower bound to min priority in ≤_0 in G
2: u = update upper bound to max priority in ≤_0 in G
3: ds = choose some player 0 strategy
4: σ, π
5: while there is gap between l and u do
6:   //Fictitious play
7:   repeat
8:     if player 0 defending then
9:       l′ = ⟨ds, BR(ds)⟩
10:      update l if l′ >_0 l
11:      σ = ds
12:    else
13:      u′ = ⟨ds, BR(ds)⟩
14:      update u if u′ >_1 u
15:      π = ds
16:    end if
17:    oldDs = ds
18:    ds = BR(ds)
19:  until (strategy cycle detected)
20:  if l, u gap closed then
21:    return (σ, π)
22:  end if
23:  //Strategy improvement
24:  repeat
25:    ds = improve(oldDs)
26:    if player 0 defending then
27:      l′ = ⟨ds, BR(ds)⟩
28:    else
29:      u′ = ⟨ds, BR(ds)⟩
30:    end if
31:  until (l′ >_0 l) || (u′ >_1 u) || (ds no longer changes)
32:  if ds no longer changes then
33:    return (ds, BR(ds))
34:  end if
35: end while

Figure 7.10: Context switching on cycle detection using FP1 and DSI

Vertices   Mixed FP1+DSI       Pure DSI
           DSI       FP1
4          0         1         2
5          0         1         2
6          0         3         3
8          0         4         5
10         0         4         7
12         0         4         8
14         0         4         4
16         0         3         9
18         0         3         6
20         0         3         7
22         0         4         5
24         0         4         6
26         0         3         11
28         0         3         10
30         0         3         9
32         0         3         7

Figure 7.11: Comparing the number of iterations between the context switching algorithm and pure DSI on 'worst case' games extracted from the experiments in Figure 7.8, for the context switching on cycle detection algorithm

Vertices   Mixed FP1+DSI       Pure DSI
           DSI       FP1
10         1         6.428     3.143
12         1         6.923     3.462
24         1         7.556     3.667
32         1         7.923     4.769

Figure 7.12: Comparing the average number of iterations between the context switching algorithm and pure DSI on games not solved by pure FP1. Each data set contains witness games not solved by FP1 from the random game data set of the respective game size. All non-optimal strategy cycles in FP1 are resolved by a single DSI iteration.

Vertices   FP1           DSI            Zielonka
144        1.0s (14)     <1s (11)       <1s (11)
564        60.5s (14)    212.1s (56)    1.3s (11)
2688       7159s (14)    >6h (?)        35.6s (11)

Figure 7.13: Comparing the runtime and number of iterations of FP1, DSI and Zielonka’s algorithm on parity games generated from the model checking problem of an elevator system. The runtime is shown in seconds and the number of iterations are shown in brackets.

class of games does exist, and the pure DSI algorithm has a better performance than the mixed algorithm for this class, it may still be a good trade-off to use the mixed algorithm in practice.

For the last experiment we compare the performance of FP1 against Zielonka’s algorithm and DSI on games generated from a simple model checking problem. Since the games are solved by FP1 alone, the runtime of FP1 is also the expected run time of the mixed algorithm. Results of the experiment are shown in Figure 7.13. There are exactly 3 games with different number of vertices used in the experiment. These games are derived from a simple fairness model-checking problem of an elevator system. A detailed description of the problem can be found on page 50 of PGSolver’s documentation. For the purpose of our discussion the formula encoded by the parity games requires all runs of the elevator to satisfy the following fairness property: if the top floor is requested infinitely often then it is being served infinitely often.

In this experiment, Zielonka's algorithm clearly out-performs the other two algorithms. For the largest game in the experiment, FP1, taking slightly less than 2 hours, is almost 200 times slower than Zielonka's algorithm, whereas DSI does not terminate after 6 hours. This is due to the fact that FP1 and DSI are actually solving a different game from the one solved by Zielonka's algorithm. All the games generated from the verification problem are index-3 games, which means they only have 3 distinct priorities; therefore these games are inherently easy for Zielonka's algorithm, since the exponent of the algorithm's complexity is given by the number of priorities in the game. When FP1 and DSI work on these games, the games are first converted into 'equivalent' games with unique priorities.

On the other hand, when we run Zielonka's algorithm on the converted games with unique priorities, it takes an even longer time than FP1: Zielonka's algorithm will not even solve the first game of 144 vertices within 6 hours. Even though FP1 appears to be abysmally slow compared to Zielonka's algorithm for large games with small index, on the flip side, the number of iterations of both algorithms remained constant while that for DSI appears to be increasing. This is promising for FP1, since there is room for improvement in the implementation of BR and the inner loop of FP1, which may bring its performance on par with that of Zielonka's algorithm.

8 Conclusion and Future Work

The research reported in this thesis was driven by two things: hypothesis formulation and experimentation. Instead of taking the top-down approach of starting with high level theories and a broad view of the field, we started with simple parity games. Instead of trying to rigorously prove theorems and then use the theorems to design the algorithms, we tackled the problems head on with hypotheses and left it to the experiments to sort out the flaws in our ideas. Hypotheses generate algorithmic ideas, and these ideas are in turn tested by experiments. Apart from identifying flaws in our hypotheses, experiments also reveal interesting features in parity games which allowed us to generate successively better and stronger hypotheses. Such a methodology helped us to find valuable questions that might also be interesting to other researchers working in the same field. Some of these questions are answered within the thesis and most of the rest will be left to future work. Our experience also demonstrated once again the workings of the small scope hypothesis, as a lot of our questions and hypotheses were answered promptly by experiments on a large number of games of very small size. Since these games can be manipulated by hand, they are extremely valuable for building up our understanding of the underlying theories.

8.1 Conclusions

In this thesis, we reviewed concepts in parity games and game theory, and discussed a few well known algorithms in both areas. By adopting an experiment driven approach, we brought about an interesting synthesis of ideas in parity games and game theory through the use of fictitious play on parity games.

We initially experimented with using fictitious play FP* to find the optimal game value for a single vertex in a game, and then used the same process to

find the optimal game value for the whole game by taking the average payoff over all vertices in a parity game. This showed that the approach worked in principle. Eventually the synthesis was made possible through the recognition of a more generalized definition of "0-sum" over payoff vectors rather than the numerical payoff values for which the fictitious play process was originally intended. We went back to Julia Robinson's convergence proof on fictitious play [35] to ensure that the new "0-sum" over payoff vectors we introduced still fits soundly into the proof.

The rest of the research focused on constructing and experimenting with different valuation schemes that do not violate the best response selection criteria extracted from Julia Robinson's proof [35]. Through repeated trial and error, we came up with several new algorithms and discussed the behavior and performance of these algorithms through the use of strategy graphs and millions of test games from various sources.

We now revisit our objectives in the research and highlight our contributions.

8.1.1 The Workbench

We have produced the workbench for implementing, validating, and evaluating both preprocessors and solvers for parity games. Additionally we implemented Zielonka’s algorithm, discrete strategy improvement algorithm for solving parity games, the Under-approximation framework [2], and several new algorithms.

The workbench implemented draws test games from several different sources. The data sources include all unique games of 4 and 6 vertices, millions of random games of various sizes up to 2000 vertices, test games imported from PGSolver, as well as games generated through brute force search for satisfying certain interesting queries.

The query language implemented is discussed in detail in Chapter 4. The query language allows flexible construction of arbitrary compositions and nestings of preprocessors and solvers through our processor algebra. The query service is implemented as a distributed system; queries are distributed to agents and executed concurrently over different datasets. The solvers are implemented using aspect oriented programming, together with the agent interface. We have built a code instrumentation framework around the solver and preprocessor interface, so that complex query evaluation and computation can be quickly instantiated against the database, and generic information collected by the agents is delivered to us as reports in real time through emails and text messages.

8.1.2 Design of New Algorithms and Key Results

We used the workbench to guide our implementation of several new algorithms for solving parity games. We explored different approaches to using fictitious play on parity games. We initially attempted to solve parity games using fictitious play by translating a parity game into its numerical normal form and solving it with the original FP* algorithm. We then solved the problems with saturation runs in FP* by shadowing another copy of fictitious play with recall length 1 to capture convergence discretely. Eventually we implemented a discrete FP1 algorithm for parity games, and experimentally evaluated its run time performance against some well known solvers. These experiments were conducted for several different classes of data set over millions of games.

The details are presented in Chapter 5, and the key results we have suggest that in terms of run time performance, FP1 is slightly faster than Zielonka's algorithm and the discrete strategy improvement algorithm on random games. The FP1 algorithm pulls further ahead of the other two algorithms as the game size gets larger. For example in Figure 5.19, for random games at 1024 vertices, FP1 is about 25% faster than Zielonka's algorithm and about 45% faster than the discrete strategy improvement algorithm. For two other important data sources presented in Figure 5.21 and Figure 5.22, the FP1 algorithm appears to maintain polynomial growth in both the runtime and the number of iterations, while the other two algorithms grow exponentially in their number of iterations/recursive calls.

When interpreting the performance data presented in our experiments, the runtime performance of one algorithm implemented in our framework should not be compared to the runtime performance of a different algorithm implemented elsewhere by other people working in the field. Our framework is designed to accommodate the runtime queries mentioned in Chapter 4, so that we can poke into the execution of an algorithm and collect interesting statistics which help us design or improve an algorithm. The purpose of the framework is experimentation, design and fast prototyping, not performance. Hence it makes heavy use of reflection and other language features such as aspect oriented programming which slow down the execution of the algorithms. However, runtime comparisons across algorithms within our framework are sound, because the framework uses the same data structures and a common code base for basic algorithmic components across algorithms, so that the performance impact is 'felt' uniformly by all algorithms implemented. For example, the attractor computations used by the FP1 algorithm and by Zielonka's algorithm are the same. On the other hand, statistics on the number of iterations or recursive calls (sometimes called run lengths in our experiments) made by the algorithms carry more weight for interpreting our experiments, because this measure is implementation neutral and stays the same regardless of how the algorithms may be implemented.

On the theoretical runtime complexity of FP1, we have formally shown in Chapter 6 that the algorithm terminates within 6 iterations for fully connected games of all sizes. For parity games in general, we conjectured that the runtime complexity is bounded by O(n^4 log^2(n)). Even though we are not able to prove the conjecture formally, we have given an informal intuition for why this conjecture might be true by studying the strategy graphs associated with parity games.

We are also not able to find games which appear to be hard for this algorithm. The evidence seen so far with our large pool of test games is in favour of polynomial time termination. FP1 solves all the classes of games from PGSolver efficiently, and for random games, less than 0.01% of the games remain unsolved.

As the FP1 algorithm is not guaranteed to terminate with a solution, we experimented with different valuation schemes and static analysis to help with resolving non-optimal strategy cycles. And as a last resort we built a context switching algorithm on top of FP1 and discrete strategy improvement to retain the runtime performance of FP1 while ensuring termination.

For the games unsolved by a pure FP1 algorithm, the combined algorithm seems to be able to solve them efficiently. In conclusion, the context switching algorithm combining FP1 and strategy improvement algorithm can be a good candidate for solving parity games in practice. It has exactly the same

runtime performance as FP1 when no non-optimal strategy cycles are encountered. These cycles may be computationally expensive to resolve, but the chance of encountering them is slim. We should also bear in mind that a game which induces non-optimal strategy cycles for FP1 and exponential iterations in the discrete strategy improvement algorithm would not necessarily take exponentially many iterations to resolve the cycle in the combined algorithm.

8.2 Future Work

At the end of Section 4 of Chapter 5, we briefly discussed a valuation scheme refinement to be used in conjunction with FP1 from (t, p, d) to (t, p, d, i) where i is the convergence interval. Such a valuation scheme allows the algorithm to switch to better alternative best responses as the convergence interval is improved over the iterations, hence the algorithm is no longer memoryless. Even though we already implemented this refinement and we know that it is correct, it is not yet extensively tested for performance against the other valuation scheme on much larger games. Experiments in our framework involving all data sets would take weeks to run.

The refinement of the valuation scheme is a particularly interesting implementation for us because of two more static analysis implementations we did not include in the thesis. In Chapter 7, when we discussed static analysis, we did not mention that there are actually two different approaches to designing the static updates. One way is to simplify the game by removing arcs from it; the other way is to improve the convergence intervals statically by propagating globally those local improvements in intervals. Both static updates already mentioned in Chapter 7 belong to the first approach. We did not include the static analysis methods of the second approach because these methods are not constructive when the valuation scheme used by fictitious play does not take into account the convergence interval, in the sense that they do not guide the fictitious play process to find better witness strategies. The two static updates we want to investigate in conjunction with the more refined valuation scheme are:

• Boundary Attractor
• Loop Consistency

Boundary attractor is similar to the normal attractor. The idea is that, taking a vertex v with convergence interval [l(v), u(v)] for example, if vertex v is 0-attracted to another vertex v′ such that l(v′) >_0 l(v), we can safely update the lower boundary of vertex v from l(v) to l(v′). Dually, if vertex v is 1-attracted to another vertex v′ such that u(v′) >_1 u(v), we can also safely update the upper boundary of vertex v from u(v) to u(v′).
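For reference, a compact restatement of the two boundary-attractor updates, written as a sketch in the thesis' notation where Attr_i(G, X) denotes the i-attractor of X in G:

% Hedged restatement of the boundary-attractor rule described above:
v \in \mathrm{Attr}_0(G, \{v'\}) \;\wedge\; l(v') >_0 l(v) \;\Longrightarrow\; l(v) := l(v'),
\qquad
v \in \mathrm{Attr}_1(G, \{v'\}) \;\wedge\; u(v') >_1 u(v) \;\Longrightarrow\; u(v) := u(v').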

For loop consistency, the idea is as follows. Again take a vertex v with convergence interval [l(v), u(v)] as an example. Since priorities are unique for all input games, l(v) identifies some vertex. If vertex v is to achieve the lower-bound payoff l(v) at all, the vertex l(v) must also be able to achieve this payoff, since such a payoff at vertex v implies a cycle formed with l(v) being the highest priority. We can therefore check the convergence interval of vertex l(v), which is [l(l(v)), u(l(v))]: if l(v) is out of the range of its own convergence interval, we can update the convergence interval of vertex v to [l(v)++, u(v)]. Similarly, if u(v) is not in the range of [l(u(v)), u(u(v))], we can improve the convergence interval of vertex v to [l(v), u(v)−−]. Here, for a priority p, we denote by p++ the next priority according to the order <_0, and similarly for p−−.

It is easy to see that both these update rules maintain the invariant that l(v) ≤_0 µ(v) ≤_0 u(v) for all vertices v. We do know independently that both static updates find intervals they can improve on when working with (t, p, d). It will be interesting to see how they affect the best responses when fictitious play uses the valuation scheme (t, p, d, i), and whether they can significantly reduce the occurrences of non-optimal strategy cycles. Another related improvement concerns the context switching algorithm mentioned in Chapter 7, which combines the fictitious play algorithm with the discrete strategy improvement algorithm. The discrete strategy improvement component of that algorithm can also take advantage of the static updates when the valuation scheme (t, p, d, i) is used.

At the end of Chapter 7, we mentioned that parity games with a large number of vertices but low index (in particular, the parity games derived from the model-checking problem of a simple elevator) are a weak spot of the FP1 algorithm. Zielonka's algorithm is able to solve these games much faster than FP1 because the current implementation of FP1, much like DSI, requires unique priorities and converts games into games with unique priorities before solving them. There are indeed several refinements that can be built into FP1.

The first refinement involves lifting the requirement of unique priorities for FP1. With respect to the last experiment in Chapter 7, the bottleneck of FP1 is in line 7 of the pseudocode of best reply in Figure 5.12 from Chapter 5. For each iteration of FP1 (each invocation of BR), potentially 2|V| attractors will be computed. This can be significantly reduced if the algorithm, instead of inspecting each vertex in descending order, sorts vertices into bins where each bin contains vertices of the same priority, and inspects each bin for attractor computation. However, we cannot simply use the bin as the base for attractor computations, as this would not be sound for the play profile valuations later. We need another step of fixing a maximal subset K of the bin M which satisfies the following property when computing a best response for player 0:

∀v ∈ K, ∃u ∈ attr_0(G_π, K), v → u ∈ E    (8.1)

The property merely specifies that all the vertices in the base should have an arc directly into the set of vertices computed by the attractor. Such a subset K of each bin with the same priority can be found by repeated application of attractor computations, starting with base K′ = M, removing from K′ the vertices not directly reaching the attractor, and recomputing the attractor with the smaller base until a fixed point is reached. This effectively fixes an implicit order among vertices with the same priority, so that the algorithm as a whole does not have to require unique priorities for games.
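This fixed-point computation can be sketched as follows; this is a minimal illustration in Java, assuming an adjacency-map representation of the single-player game G_π. In G_π player 1's choices are already fixed, so the 0-attractor is plain backward reachability; all names are illustrative assumptions rather than the workbench's implementation.

import java.util.*;

final class BinBase {
    // Computes the maximal subset K of the bin M satisfying property (8.1).
    static Set<Integer> maximalBase(Map<Integer, Set<Integer>> successors, Set<Integer> bin) {
        Set<Integer> k = new HashSet<>(bin);                 // start with K' = M, the whole bin
        boolean changed = true;
        while (changed) {
            changed = false;
            Set<Integer> attr = attractor0(successors, k);
            for (Iterator<Integer> it = k.iterator(); it.hasNext(); ) {
                int v = it.next();
                // drop vertices with no arc leading directly into the attractor
                if (Collections.disjoint(successors.get(v), attr)) {
                    it.remove();
                    changed = true;
                }
            }
        }
        return k;                                            // fixed point: property (8.1) holds
    }

    // In G_pi the 0-attractor of 'base' is backward reachability: vertices with a path into 'base'.
    static Set<Integer> attractor0(Map<Integer, Set<Integer>> successors, Set<Integer> base) {
        Set<Integer> attr = new HashSet<>(base);
        boolean grew = true;
        while (grew) {
            grew = false;
            for (Map.Entry<Integer, Set<Integer>> e : successors.entrySet()) {
                if (!attr.contains(e.getKey()) && !Collections.disjoint(e.getValue(), attr)) {
                    attr.add(e.getKey());
                    grew = true;
                }
            }
        }
        return attr;
    }
}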

Another refinement of the algorithm is in the direction of working with an abstract form of parity games instead of concrete games. Ideally the abstract game generated from a concrete game should have the following properties:

• The abstract game should achieve a logarithmic reduction of the measured game size
• The abstraction procedure should be efficient
• The abstraction is player based and active for one player; player 0 and player 1 generate different abstractions of a game
• Winning strategies of the abstract game should translate soundly into winning strategies of the original game for the active player

The impact of this idea on the implementation of fictitious play is that, at the beginning of each iteration, an abstract one-player parity game is generated; then an abstract strategy is computed as a best response against the concrete strategy of the opponent; and at the end of the iteration the abstract strategy is translated into a concrete strategy which prepares for the next iteration.

There are potentially several different ways of achieving such an abstraction. One method we experimented with uses attractors. Often a game with thousands of vertices can be reduced to an abstract game with only a few vertices, and the technique is particularly effective for games with a large structure and small index. There are indeed games which cannot be reduced by this procedure, but the procedure does play a similar role to the preprocessors which simplify a game for Zielonka's algorithm.

In the future we wish to construct a formal proof or disproof of the conjectures regarding the time complexity of FP1. We have shown that a constant number of iterations is required for fully connected games of any size. In fact, it can be shown that as long as the vertices of one player can reach all vertices in the game in one arc transition, the game can be solved by FP1 in a constant number of iterations. Parity games in general can be obtained from such half fully connected games through a series of transitions, removing one arc at a time for all the arcs not present in the original game. By studying the transitions of the strategy graph, we might be able to gain more insights into the depth of strategy graphs in general.

For resolving non-optimal strategy cycles completely, the current approach of combining FP1 with the discrete strategy improvement algorithm is not guaranteed to terminate within polynomial time. In the future we may want to implement a different cycle resolution procedure, possibly by decomposing a game into smaller games, such that PTIME termination of the FP1 algorithm may be achieved.

Bibliography

[1] N. Agarwal and P. Zeephongsekul. Psychological pricing in mergers and acquisitions using game theory. School of Mathematics and Geospatial Sciences, RMIT University, Melbourne, 2011.

[2] Adam Antonik, Nathaniel Charlton, and Michael Huth. Polynomial-time under-approximations of winning regions in parity games. Technical Report 2006/12. Department of Computing, Imperial College London, 2006.

[3] R. Aumann and S. Hart. Handbook of Game Theory with Economic Applications. Elsevier, 1994.

[4] Ulrich Berger. Brown’s original fictitious play. Journal of Economic Theory, vol. 135, pages 572–578, 2007.

[5] Armin Biere, Alessandro Cimatti, E. M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. Lecture Notes in Computer Science, vol. 1579, pages 193–207, 1999.

[6] George Brown. Some notes on computation of games solutions. Rand Report, 1949.

[7] J. R. Burch, E. M. Clarke, and K. L. McMillan. Symbolic model checking: 10^20 states and beyond. Information and Computation, vol. 98, pages 142–170, 1992.

[8] Gianpiero Cabodi, Paolo Camurati, and Stefano Quer. Can bdds compete with sat solvers on bounded model checking? DAC ’02 Proceedings of the 39th annual Design Automation Conference, pages 117–122, 2002.

[9] Krishnendu Chatterjee, Thomas A. Henzinger, and Marcin Jurdzinski. Mean-payoff parity games. Proceedings of the 20th Annual Symposium on Logic in Computer Science, pages 178–187, 2005.

[10] Krishnendu Chatterjee, Marcin Jurdzinski, and Thomas A. Henzinger. Quantitative stochastic parity games. In Proc. of SODA'04, pages 121–130, 2004.

[11] E. M. Clarke and I. A. Draghicescu. Expressibility results for linear-time and branching-time logics. Lecture Notes in Computer Science Volume 354, pages 428–437, 1989.

[12] E. M. Clarke, E. Emerson, S. Jha, and A. Sistla. Symmetry reductions in model-checking. Lecture Notes in Computer Science, vol. 1427, pages 147–158, 1998.

[13] E. M. Clarke, Orna Grumberg, and David E. Long. Model checking and abstraction. ACM Transactions on Programming Languages and Systems, vol. 16, pages 1512–1542, 1992.

[14] E.A. Emerson. Automata, tableaux and temporal logics. Proc. Workshop on Logic of Programs, Lecture Notes in Computer Science, Springer, vol. 193, pages 79–87, 1985.

[15] E.A. Emerson and C. Jutla. Tree automata, mu-calculus and determinacy. Proc. Of the 32nd IEEE Symp. On Foundations of Computer Science, vol. 32, pages 368–377, 1991.

[16] O. Friedmann. An exponential lower bound for the parity game strategy improvement algorithm as we know it. Proc. 24th Logic in Computer Science, pages 145–156, 2009.

[17] Oliver Friedmann. Recursive solving of parity games requires exponential time. Theoretical Informatics and Applications, vol. 45, pages 449–457, 2011.

[18] Oliver Friedmann and Martin Lange. Solving parity games in practice. Automated Technology for Verification and Analysis, LNCS, vol. 5799, pages 182–196, 2009.

[19] R. Grosu and S. A. Smolka. Quantitative model checking. Proc. of ISoLA’04, the 1st Int. Symposium on Leveraging Applications of Formal Methods, pages 165–174, 2004.

[20] Michael Huth, Nir Piterman, and Huaxin Wang. A workbench for preprocessor design and evaluation: toward benchmarks for parity games. Electronic Communications of the EASST, European Association of Software Science and Technology, 2009.

[21] Michael Huth and Mark Ryan. Logic in Computer Science: Modelling and Reasoning about Systems. Cambridge University Press, 2004.

[22] Daniel Jackson. Software Abstractions: Logic, Language, and Analysis. MIT Press, 2006.

[23] Marcin Jurdzinski. Deciding the winner in parity games is in UP ∩ co-UP. Information Processing Letters, Elsevier Science B. V., pages 119–124, 1998.

[24] Marcin Jurdzinski, Mike Paterson, and Uri Zwick. Small progress measures for solving parity games. 17th Annual Symposium on Theoretical Aspects of Computer Science, Proceedings, Lille, France, vol. 1770 of LNCS, Springer-Verlag, pages 290–301, 2000.

[25] Marcin Jurdzinski, Mike Paterson, and Uri Zwick. A deterministic subexponential algorithm for solving parity games. In SODA, pages 117–123, 2006.

[26] Marcin Jurdzinski and Jens Voge. A discrete strategy improvement algorithm for solving parity games. Computer Aided Verification, 12th Conference, CAV2000, Proceedings, vol. 1855 of LNCS, pages 202–215, 2000.

[27] Dexter Kozen. Results on the propositional mu-calculus. Theoretical Computer Science, vol. 27, pages 333–354, 1983.

[28] Martin Lange. Solving parity games by a reduction to SAT. In Proc. of the Workshop on Games in Design and Verification, GDV’05, Edinburgh, Scotland, UK, 2005.

[29] C. E. Lemke and J. T. Howson. Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics, vol. 12, pages 413–423, 1964.

[30] D. Monderer and A. Sela. A 2x2 game without the fictitious play property. Games and Economic Behavior, vol. 14, pages 144–148, 1996.

[31] John Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, vol. 36, pages 48–49, 1950.

[32] Noam Nisan and Amir Ronen. Algorithmic mechanism design. Games and Economic Behavior, vol. 35, pages 166–196, 2001.

[33] Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani. Algorithmic Game Theory. Cambridge University Press, 2007.

[34] R. Powers and Y. Shoham. Learning against opponents with bounded memory. International Joint Conference on Artificial Intelligence, pages 817–822, 2005.

[35] Julia Robinson. An iterative method of solving a game. Annals of Mathematics, vol. 54, pages 296–301, 1951.

[36] Sven Schewe. Solving parity games in big steps. Foundations of Software Technology and Theoretical Computer Science, vol. 4855 of LNCS, pages 449–460, 2007.

[37] Sven Schewe. An optimal strategy improvement algorithm for solving parity and payoff games. Computer Science Logic (CSL 2008), vol. 5213 of LNCS, pages 369–384, 2008.

[38] A. Sela and D. K. Herreiner. Fictitious play in coordination games. International Journal of Game Theory, vol. 28, pages 189–197, 1997.

[39] J. M. Spivey. The Z Notation: a reference manual. Prentice Hall, 1988.

[40] Bernhard von Stengel. Computing equilibria for two-person games. Handbook of Game Theory with Economic Applications, vol. 3, pages 1723–1759, 2002.

[41] Colin Stirling. Modal and Temporal Properties of Processes, Chapter 6 Verifying Temporal Properties. Springer Verlag, 2001.

[42] Huaxin Wang. Framework for Under-Approximating Solutions of Parity Games in Polynomial Time. Master’s thesis, Department of Computing, Imperial College London, June 2007.

[43] Wieslaw Zielonka. Infinite games on finitely coloured graphs with applications to automata on infinite trees. Theoretical Computer Science, vol. 200, pages 135–183, 1998.
