Equilibria in Finite Games

Thesis submitted in accordance with the requirements of the University of Liverpool for the degree of Doctor in Philosophy by

Anshul Gupta

Department of Computer Science

November 2015 Supervisory team:

Dr. Sven Schewe (Primary) and Prof. Piotr Krysta (Secondary)

Examiner committee:

Prof. Thomas Brihaye and Prof. Karl Tuyls

Contents

Notations
Abstract
Acknowledgements
Preface

1 Introduction
  1.1 Bi-matrix Games
  1.2 Finite games of infinite duration
  1.3 Equilibrium concepts in game theory
    1.3.1 Solution concepts
  1.4 Contribution
  1.5 Related work
  1.6 Outline of this thesis

2 Definitions
  2.1 Leader profiles
  2.2 Incentive strategy profiles

3 Bi-Matrix Games
  3.1 Abstract
  3.2 Introduction
  3.3 Motivational examples
    3.3.1 Related Work
  3.4 Definitions
  3.5 Incentive equilibria in bi-matrix games
  3.6 Incentive Equilibria
    3.6.1 Existence of bribery stable strategy profiles
    3.6.2 Optimality of simple bribery stable strategy profiles
    3.6.3 Description of simple bribery stable strategy profiles
    3.6.4 Computing incentive equilibria
    3.6.5 Friendly incentive equilibria
    3.6.6 Friendly incentive equilibria in zero-sum games
    3.6.7 Monotonicity and relative social optimality
  3.7 Secure incentive strategy profiles
    3.7.1 ε-optimal secure incentive strategy profiles
  3.8 Secure incentive equilibria
    3.8.1 Constructing secure incentive equilibria – outline
    3.8.2 Existence of secure incentive equilibria
    3.8.3 Construction of secure incentive equilibria – given a strategy j and a set Jloss
      Extended constraint system LAPG_{j,Jloss}(A,B)
    3.8.4 For an unknown set Jloss
      Estimating the value of κ
      Computing a suitable constant K
  3.9 Evaluation
  3.10 Discussion

4 Mean-payoff Games
  4.1 Abstract
  4.2 Introduction
    4.2.1 Motivational Examples
    4.2.2 Related Work
  4.3 Preliminaries
  4.4 Leader equilibria
    4.4.1 Superiority of leader equilibria
    4.4.2 Reward and punish strategy profiles for leader equilibria
      Linear programs for well behaved reward and punish strategy profiles
      From Q, S, and a solution to the linear programs to a well behaved reward and punish strategy profile
      Decision & optimisation procedures
    4.4.3 Reduction to two-player mean-payoff games
  4.5 Incentive Equilibria
    4.5.1 Canonical incentive equilibria
    4.5.2 Existence and construction of incentive equilibria
    4.5.3 Secure ε incentive strategy profiles
  4.6 NP-hardness
      Zero-sum games
  4.7 Implementation
    4.7.1 Strategy Improvement Algorithm
    4.7.2 Quantitative evaluation of mean-payoff games
      Solving 2MPGs
      Solving multi-player mean-payoff games
    4.7.3 Linear Programming problem
      Constraints on SCCs
    4.7.4 Experimental Results
  4.8 Discussion

5 Discounted sum games
  5.1 Abstract
  5.2 Introduction
    5.2.1 Related Work
    5.2.2 Contributions
  5.3 Preliminaries
  5.4 Leader and Nash equilibria
  5.5 Reward and punish strategy profiles in discounted sum games
  5.6 Constraints for finite pure reward and punish strategy profiles
  5.7 Equilibria with extended observations
  5.8 Discussion

6 Summary and conclusions
  6.1 Summary
  6.2 Conclusions and Future work

A Appendix
  A.1 Leader equilibria
    A.1.1 Computing simple leader equilibria
    A.1.2 Friendly leader equilibria

Bibliography

Illustrations

List of Figures

1.1 A multi-player mean-payoff game.
1.2 A discounted-payoff game with discount factor 1/2.

3.1 Prisoner's Dilemma in extensive form.
3.2 Incentive strategy profiles ⊇ Leader strategy profiles ⊇ Nash strategy profiles.

4.1 σ1 = ¬(ra ∨ rb), σ2 = ¬(ra ∨ rb ∨ ga ∨ gb).
4.2 g = ga ∨ gb ∨ g′a ∨ g′b, r = ra ∨ rb ∨ r′a ∨ r′b ∨ ε.
4.3 σ1 = r′a ∨ r′b ∨ g′a ∨ g′b ∨ ε, σ2 = σ1 ∨ ga ∨ gb.
4.4 The rational environments (Figure 4.3) and the system (Figure 4.4), shown as automata that coordinate on joint actions.
4.5 The multi-player mean-payoff game from the properties from Figures 4.1, 4.2 and 4.3, 4.4.
4.6 Incentive equilibrium beats leader equilibrium beats Nash equilibrium.
4.7 Incentive equilibrium gives much better system utilisation.
4.8 An MMPG where the leader equilibrium is strictly better than all Nash equilibria.
4.9 Secure equilibria.
4.10 Token-ring example.
4.11 Results for randomly generated MMPGs.

5.1 A discounted sum game with no memoryless Nash or leader equilibrium.
5.2 A discounted sum game with discount factor 1/2.
5.3 General strategy profiles ⊇ Leader strategy profiles ⊇ Nash strategy profiles.
5.4 Increasing the memory helps.
5.5 Leader benefits from infinite memory.
5.6 Leader benefits from infinite memory in Nash equilibria.
5.7 More memory states ⇒ more strategies.
5.8 C1, C2, ..., Cm are m conjuncts, each with n variables, and there are intermediate leader 'L' nodes. A path through the satisfying assignment is shown here.
5.9 Unobservability of deviation in mixed strategy with discount factor λ.
5.10 Leader benefits from memory in mixed strategies.
5.11 Use of incentives in discounted sum game.

List of Tables

1.1 A bi-matrix example.
1.2 Summary of the complexity results for different equilibria.

3.1 Equilibria in a bi-matrix game.
3.2 Prisoners' Dilemma.
3.3 An example where the follower does not benefit from an incentive equilibrium.
3.4 Leader behaves friendly when her follower is also friendly.
3.5 An unfriendly follower suffers in a secure equilibrium.
3.6 A variant of the prisoner's dilemma.
3.7 A Battle-of-Sexes game.
3.8 Increasing the payoff matrix for the follower by ε.
3.9 A simple bi-matrix game without a secure incentive equilibrium.
3.10 A variant of the Battle-of-Sexes game.
3.11 Values using continuous payoffs in the range 0 to 1.
3.12 Values using integer payoffs in the range -10 to 10.
3.13 Average leader return and follower return in different equilibria.
3.14 "Prisoners dilemma" payoff matrix.
3.15 Loss of an inconsiderate follower.

Notations

The following abbreviations and notations are found throughout this thesis:

MPG Mean-payoff game

2MPG Two-player mean-payoff games

MMPG Multi-player mean-payoff games

DSG Discounted sum game

2DSG Two-player discounted sum games

MDSG Multi-player discounted sum game

DBA Deterministic Büchi automata

SP Strategy profiles

LSP Leader strategy profiles

ISP Incentive strategy profiles

PISP Perfectly incentivised strategy profiles

SCC Strongly Connected Component

Nash SP Nash strategy profiles

NE Nash equilibria

LE Leader equilibria

IE Incentive equilibria

FIE Friendly incentive equilibria

SIE Secure incentive equilibria

fpayoff Follower payoff in a strategy profile

lpayoff Leader payoff in a strategy profile

Abstract

This thesis studies various equilibrium concepts in the context of finite games of infinite duration and in the context of bi-matrix games. We consider game settings where a special player – the leader – assigns the strategy profile to herself and to every other player in the game alike. The leader is given the leeway to benefit from deviation from a strategy profile, whereas no other player is allowed to do so. These leader strategy profiles are asymmetric but stable, as the stability of a strategy profile is considered w.r.t. all other players. The leader can further incentivise the strategy choices of the other players by transferring a share of her own payoff to them, which results in incentive strategy profiles. Among each of these classes of strategy profiles, an 'optimal' leader resp. incentive strategy profile gives maximal reward to the leader and is a leader resp. incentive equilibrium. We note that computing leader and incentive equilibria is no more expensive than computing Nash equilibria. For multi-player non-terminating games, the related decision problems are NP-complete in general, and the complexity equals that of solving two-player games when the number of players is kept fixed. We establish the use of memory and study the effect of increasing the memory size in leader strategy profiles in the context of discounted sum games. We discuss various follower behavioural models in bi-matrix games, assuming both a friendly and an adversarial follower. This leads to friendly incentive equilibria and secure incentive equilibria for the resp. follower behaviour. While the construction of friendly incentive equilibria is tractable and straightforward, secure incentive equilibria need a constructive approach to establish their existence and tractability. Our overall observation is that the leader return in an incentive equilibrium is always higher than or equal to her return in a leader equilibrium, which in turn is higher than or equal to her return from a Nash equilibrium. Optimal strategy profiles assigned this way therefore prove beneficial for the leader.


Acknowledgements

With all due respect, I thank God for giving me the opportunity to pursue my Ph.D. at the University of Liverpool. I sincerely and deeply thank my supervisor, Sven Schewe, for his guidance, patience, and motivation throughout my years of study. He has been very supportive in many different ways and has constantly encouraged me during all these years. His immense knowledge has always helped me to learn a lot from him. His friendly behaviour and positive attitude are exemplary, and I regard it as my honour to have done my Ph.D. under his excellent supervision. I would like to thank Alexei Lisitsa, Martin Gairing, and Dominik Wojtczak for being my academic advisors, and Piotr Krysta for being my second supervisor. I thank them for all their advice and useful ideas. My thanks to Dominik for taking the time to read my papers; his suggestions and feedback have always been valuable. I want to thank Ashutosh Trivedi for numerous discussions related to mean-payoff games. It was really nice to work on a paper together with him. I would like to thank Thomas Brihaye and Karl Tuyls for kindly agreeing to be my examiners and for their insightful comments on my thesis. It was an enriching discussion session with both of them, and their feedback has really helped me to improve my thesis. My thanks go to the Department of Computer Science for supporting my Ph.D. and for providing a wonderful research environment. Finally, I thank my family for all their care and support. My special thanks go to my parents for always inspiring me to aim high and for showering upon me their unconditional love and affection. Last, but most importantly, I wish to thank my husband Vivek for always being there through thick and thin and for being the most supportive person. He has always been extremely caring and loving. I cannot recall a single moment when he was not willing to cooperate with me. I thank him for doing all those little things, like cooking a meal or making a cup of tea, and for taking time from his work just to accompany me on my conference trips. All these small gestures are indeed the things that matter most.


Preface

I declare that this thesis is composed by myself, that the work contained herein is my own except where explicitly stated otherwise, and that this work was undertaken by me during my period of study at the University of Liverpool, United Kingdom. This thesis has not been submitted for any other degree or qualification except as specified here. The main results of this thesis are contained in Chapters 3, 4 and 5 and have already been published. Most parts of Chapter 3 have been published in AAMAS 2015 [GS15]. Chapter 4 is based on two conference publications: parts of it are published in TIME 2014 [GS14], and other parts have been accepted for publication at SEFM 2016 [GST+16]. Chapter 5 has been published in GandALF 2015 [GSW15]. A journal article, titled 'Buying Optimal Payoffs in Bi-Matrix Games', based on the conference paper [GS15], is presently under submission.


Chapter 1

Introduction

We study leader and incentive equilibria in the context of bi-matrix games and multi-player infinite duration games. In the studied game settings, a designated player (the leader) is in a position to assign the strategy profile to herself and to every other player in the game alike. All other players, who merely follow the strategy profile as assigned by the leader, are called followers. This is in contrast to Nash equilibria, which are defined symmetrically: here, the symmetry condition on a strategy profile is relaxed for the leader. We say a strategy profile is stable if no one except the leader has an incentive to deviate. Stable strategy profiles assigned this way are called leader strategy profiles. A leader strategy profile is considered 'optimal' if it provides maximal reward to the leader. An optimal leader strategy profile is called a leader equilibrium. We further propose a natural generalisation of leader strategy profiles – incentive strategy profiles – in multi-player games and in bi-matrix games. In an incentive strategy profile, the leader is further allowed to influence the behaviour of the other players in the game. The leader can incentivise her followers to follow an assigned strategy profile: for this, she transfers part of her own payoff to them. We can say that the leader gives these non-negative incentives to others in order to make them comply with the assigned strategy profile. An incentive strategy profile that gives maximal reward to the leader is considered 'optimal' and is called an incentive equilibrium. Stability in an incentive equilibrium is considered in the same way as in a leader equilibrium, but now incentives are also taken into account. As a general convention followed in this thesis, we refer to the leader as 'she' and to her follower(s) as 'he'. We also make the common assumption on the behaviour of players that they are purely 'rational' individuals in that they tend to maximise their own payoff. The game settings we study are leader centric, and the focus is therefore on maximising the leader's reward. We first discuss the context in which this thesis is studied in the following sections, and then briefly discuss the equilibrium concepts outlined above in Section 1.3. We give the main contributions of our work in Section 1.4 and discuss related work in Section 1.5. We give an outline of this thesis in Section 1.6.


1.1 Bi-matrix Games

We study incentive equilibria (and their variants) in bi-matrix games. A bi-matrix game is a finite strategic form (or: normal form) two-player game in which both players have a finite set of available actions. It is a strategic game in that it involves strategic interaction among the participating entities, known as players; the strategic form describes, by means of a matrix, a game in which players make simultaneous choices. The payoff or utility given to each player is represented in a matrix – hence the name bi-matrix game. There is one row player and one column player. The row player's actions are identified with the rows and the column player's actions are identified with the columns of a bi-matrix. The strategy of the row player is to select a row, while the strategy of the column player is to select a column. If a player selects an action deterministically, then the selected strategy is called pure. By playing a pure strategy, a player selects which action to choose from a finite set of available actions. A bi-matrix game with m pure strategies of the row player and n pure strategies of the column player can be represented by payoff matrices of size m × n. The payoff matrices of the two players can be combined into one payoff bi-matrix of the same dimension. The objective of both players in a bi-matrix game is to maximise their resp. payoff (or: utility) from a selected strategy profile. An example of a bi-matrix game is shown in Table 1.1. The set of available actions for the row player is (S, T) and the set of available actions for the column player is (P, R). Once pure strategies are selected, each player receives the payoff value determined by the entry in the selected (row and column) cell of the bi-matrix. The first value in the selected cell is the payoff of the row player; the second value is the payoff of the column player. For example, if the row player selects S and the column player selects P, then the strategy profile is (S, P). The payoffs of the row player and the column player from this strategy profile are 0 and 2, respectively. If each pair of entries in the payoff matrix adds up to zero, then the game is a zero-sum game; otherwise, it is a non-zero-sum game.

      P      R
S   0, 2   4, 1
T   1, 4   1, 1

Table 1.1: A bi-matrix example.

The players are allowed to make randomised decisions by playing a probability distribution over the rows or columns. Such randomised strategies are called mixed. A mixed strategy can therefore be viewed as a probability distribution over a player's pure strategy set. The combined strategy selection of both players is termed a strategy profile. For a mixed strategy of a player, the sum of all probabilities defined over the pure strategies should be equal to 1. A pure strategy can therefore be viewed as a special case of a mixed strategy where one action is played with probability 1. When using mixed strategies, the payoff to a player is the expected payoff induced by the payoff matrix and the probability distributions given by the mixed strategies. Referring to Table 1.1, an example of a mixed strategy profile is one where player 1 plays S with a 75% chance and T with a 25% chance, while player 2 plays P with a 75% chance and R with a 25% chance. The payoffs of the row player and the column player from this mixed strategy profile are 1 and 2.125, respectively.
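To make this payoff computation concrete, the following minimal sketch (ours, not part of the thesis; it uses Python with NumPy, and the helper name expected_payoffs is an illustrative choice) evaluates the bilinear forms x·A·y and x·B·y for the matrices of Table 1.1 and reproduces the numbers above:

import numpy as np

# Payoff matrices of Table 1.1: rows are the row player's strategies (S, T),
# columns are the column player's strategies (P, R).
A = np.array([[0, 4],
              [1, 1]])   # row player's payoffs
B = np.array([[2, 1],
              [4, 1]])   # column player's payoffs

def expected_payoffs(x, y):
    """Expected payoffs (row player, column player) under mixed strategies x, y."""
    return float(x @ A @ y), float(x @ B @ y)

# Pure strategy profile (S, P): both actions are played with probability 1.
print(expected_payoffs(np.array([1.0, 0.0]), np.array([1.0, 0.0])))     # (0.0, 2.0)

# Mixed strategy profile from the text: S/T with 75%/25% and P/R with 75%/25%.
print(expected_payoffs(np.array([0.75, 0.25]), np.array([0.75, 0.25])))  # (1.0, 2.125)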

1.2 Finite games of infinite duration

We study leader and incentive equilibria in multi-player finite games of infinite duration. These are turn-based games played on a finite directed graph, called the game arena. The game arena consists of a finite set of players, a finite set of vertices, and a finite set of directed edges. The vertex set is partitioned into subsets such that every subset belongs to exactly one of the players. An integer reward function assigns an integer value to every edge. There is a designated start vertex, on which a token is initially placed. Players take turns in creating a play: at every vertex, the player who owns the vertex selects an outgoing edge and pushes the token forward along this edge. Whenever an edge is visited, every player receives the payoff value given on that edge transition. Moving the token along the edges of the graph, the players successively create an infinite path, called a play. Every player then receives a payoff value based on how the infinite path is evaluated (see below). The objective of each player is to maximise their aggregated reward, i.e., the aggregated payoff value of the infinite path. An infinite sequence of rewards is aggregated into a single value, the value of a play. For our study, we focus on the following two mechanisms for aggregating an infinite sequence: the first is to consider the mean-payoff value (or: limit-average value), and the second is to consider the discounted-payoff value. The former is used in mean-payoff games and the latter in discounted-payoff games. Mean-payoff games and discounted-payoff games are examples of quantitative games, in that players have quantitative objectives. They are similar in how they are played, but they differ in how an infinite path is evaluated. For mean-payoff games, we study leader equilibria and extend our study to incentive equilibria as well. For discounted-payoff games, we study leader equilibria in detail. More specifically, we study bounded memory strategy profiles in these games and establish the use of memory. We now introduce each of these game types in detail.

Mean-payoff Games. Mean-payoff games were first introduced by Ehrenfeucht & Mycielski in [EM79]. These are quantitative games where the players have the objective of maximising their mean-payoff value of an infinite path. Classically, these are two-player zero-sum games: the rewards of the two players on every edge add up to zero. In two-player games, the vertex set is bipartite and partitioned among the two players. The game is played as follows. A token is placed initially on a designated start vertex. The two players – Maximiser (Max) and Minimiser (Min) – take turns in moving the token, depending on whether the token is placed at a 'Max' vertex or at a 'Min' vertex. At every vertex, the player who owns the vertex selects an outgoing edge to move the token to the next vertex. This way, the players jointly construct an infinite path called a play. The value of a play is evaluated using the limit-average or mean-payoff function: each player receives a value that is the limit average of the rewards on the edges visited in the infinite path. For any play, player 'Max' wins the value of the play and player 'Min' loses this value. Two-player mean-payoff games were shown to be positionally determined [EM79], i.e., for every vertex v, there is a value val(v) such that players 'Max' and 'Min' can guarantee payoff values ≥ val(v) and ≤ val(v), respectively. Mean-payoff games therefore admit optimal positional strategies, and, given that the players follow positional strategies, the path followed leads to a simple loop, where it stays forever.

For a mean-payoff game G, if v_0 is an initial vertex and an infinite path π = ⟨v_0, v_1, ...⟩ is generated, then the value of the play π is computed as follows:

G(π) = lim inf_{n→∞} (1/n) · ∑_{i=0}^{n−1} r(e_i)

Here, e_i is the i-th edge transition of π and r(e_i) is the reward incurred upon taking this edge transition. In multi-player mean-payoff games, there is a finite set of players, and every player follows the objective of maximising their limit-average reward from an infinite play. Every player receives a payoff value that is the limit average of the individual rewards accumulated over the infinite path. Note that multi-player games, and even games with only two players, are not necessarily zero-sum games; two-player zero-sum games are the games where the players have completely antagonistic objectives. As an example, we refer to the game graph shown in Figure 1.1. It represents a multi-player mean-payoff game with three players – player 1, player 2 and player 3. Vertices 1 and 4 belong to player 1, vertices 2 and 5 belong to player 2, and vertex 3 belongs to player 3. Vertex 1 is the initial vertex and is marked with an incoming arrow. The edges in the game graph are annotated with reward vectors that show the rewards of player 1, player 2, and player 3, in that order. Rewards on edges that are not part of an infinite path are not shown, as they have no significance for a player's overall reward from a play. At vertex 1, if player 1 moves the token to vertex 4, then this results in the infinite path ⟨1 · 4^ω⟩ with an overall reward of 1 for player 1 and a reward of 0 for both player 2 and player 3. However, if player 1 always moves the token from vertex 1 to vertex 2, and player 2 always moves the token back to vertex 1, then the resulting infinite path is ⟨1 · 2⟩^ω with an overall reward of 0.5 for both player 1 and player 2, and a reward of 1 for player 3.

Figure 1.1: A multi-player mean-payoff game.

Mean-payoff games are interesting to study because of their rare complexity status and for the number of applications that they enjoy. Mean-payoff games have their place in the synthesis and analysis of infinite systems, in logics and games, and in the quantitative modelling and verification of reactive systems [Hen13]. Their limit-average criterion makes them suitable to model the distributed development of systems where several individual components interact among themselves. The objective of an individual component is to maximise its limit-average share of the frequency with which it uses a shared resource.
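Since the limit inferior of an ultimately periodic play – a finite prefix followed by a cycle that repeats forever – is just the average reward of the cycle, such plays are easy to evaluate. The following minimal sketch (ours, not part of the thesis; the concrete edge reward vectors are assumptions consistent with the overall rewards stated above) evaluates the two plays discussed for Figure 1.1:

def mean_payoff(prefix_rewards, cycle_rewards):
    """Limit-average value of the play that follows prefix_rewards once and then
    repeats cycle_rewards forever; the finite prefix does not contribute to the
    limit inferior, so only the cycle average matters."""
    players = len(cycle_rewards[0])
    return tuple(sum(r[p] for r in cycle_rewards) / len(cycle_rewards)
                 for p in range(players))

# Play <1 . 4^w>: one edge 1 -> 4 (its reward is irrelevant in the limit), then
# the self-loop at vertex 4 with assumed reward vector (1, 0, 0) forever.
print(mean_payoff([(0, 0, 0)], [(1, 0, 0)]))        # (1.0, 0.0, 0.0)

# Play <1 . 2>^w: the cycle 1 -> 2 -> 1 with assumed reward vectors (1, 0, 1), (0, 1, 1).
print(mean_payoff([], [(1, 0, 1), (0, 1, 1)]))      # (0.5, 0.5, 1.0)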

Discounted-payoff Games. Discounted-payoff games (or: discounted sum games) form another important class of infinite duration games with quantitative objectives. Intuitively, they are played in a similar manner to mean-payoff games, but the function used to evaluate the resulting plays is different. The game is played on a finite directed graph where each vertex is owned by exactly one of the players. The game arena consists of a finite set of players, a finite set of vertices, and a finite set of directed edges. Initially, a token is placed on a start vertex. Whenever the token is on a vertex, the player who owns this vertex selects an outgoing edge and moves the token along this edge. This way, the players take turns to jointly construct an infinite play. An integer reward function assigns an integer value to every edge: every edge in the game graph is annotated with a reward vector, and each value in the reward vector is the reward given to a player whenever that edge transition is taken. In addition, there is a real-valued discount factor λ with 0 < λ < 1, and the payoff given to the players at every edge transition is discounted by λ. For a play π, every player receives a reward that is the aggregated sum of their individual discounted rewards over the edge transitions taken in π. The objective of each player is therefore to maximise their individual discounted sum of rewards.

Figure 1.2: A discounted-payoff game with discount factor 1/2.

Thus, for a discounted-payoff game G, we can aggregate the reward over an infinite path π = ⟨v_0, v_1, ...⟩ as

G(π) = ∑_{i=0}^{∞} λ^i r(e_i)

As an example, we consider the discounted-payoff game from Figure 1.2 with discount factor 1/2. In this game, the vertex labelled 1 is owned by player 1 and the vertex labelled 2 is owned by player 2. Assume that, whenever the token is at vertex 1, player 1 moves the token to vertex 2, and whenever the token is at vertex 2, player 2 moves the token to vertex 1. This results in the infinite path ⟨1 · 2⟩^ω. For this path, the discounted payoff of player 1 is computed as follows:

−1 × (0.5)^0 + 2 × (0.5)^1 − 1 × (0.5)^2 + 2 × (0.5)^3 − ··· = 0

Similarly, we compute the discounted payoff of player 2 as follows:

3 × (0.5)^0 + 3 × (0.5)^2 + 3 × (0.5)^4 + ··· = 4
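Both computations are instances of the formula G(π) = ∑ λ^i r(e_i). A minimal sketch (ours, not part of the thesis; the helper name discounted_sum is an illustrative choice) that approximates such a discounted sum numerically by truncating the series, using the edge rewards of Figure 1.2 as they appear in the calculation above:

def discounted_sum(cycle_rewards, lam, steps=200):
    """Approximate discounted sum of the play that repeats cycle_rewards forever.
    The truncation error vanishes geometrically for 0 < lam < 1."""
    players = len(cycle_rewards[0])
    totals = [0.0] * players
    for i in range(steps):
        reward = cycle_rewards[i % len(cycle_rewards)]
        for p in range(players):
            totals[p] += (lam ** i) * reward[p]
    return totals

# Edge 1 -> 2 carries reward (-1, 3) and edge 2 -> 1 carries reward (2, 0).
print(discounted_sum([(-1, 3), (2, 0)], 0.5))   # approximately [0.0, 4.0]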

Discounted-payoff games were introduced by Shapley in 1953 [Sha53]. He showed that every two-player discounted zero-sum game has a value and that these games are positionally determined. This implies that, starting at any vertex v, optimal positional strategies exist for both players. Gimbert and Zielonka [GZ04] considered infinite two-player antagonistic games with popular payoff mechanisms like mean-payoff and discounted-payoff. They gave sufficient conditions to ensure that both players have positional (memoryless) optimal strategies. It is known that mean-payoff games are polynomial time reducible to discounted-payoff games [Con93, ZP96]; the complexity of discounted-payoff games therefore falls in the same complexity class as that of mean-payoff games. Discounted-payoff games have applications in temporal logics where important events are discounted in accordance with how late they occur. This aspect of discounted-payoff games has been studied in [dAFH+04], and the importance of discounting has been discussed in detail in [dAHM03].

1.3 Equilibrium concepts in game theory

Game theory [OR94] is all about the strategic interaction of multiple rational players and has found applications in fields as diverse as economics, political science, biology, and, more recently, computer science. It gives a formal approach to model many real-world situations. It provides a natural framework to study the behaviour of rational players when they interact strategically. Players are commonly referred to as 'rational' under the assumption that their moves are motivated by maximising their own payoff; this is a common assumption made in classical game theory. A central concept in game theory is to find an equilibrium in these games; that is, there should be some mechanism that defines the solution of a strategic game. In any finite strategic game, every player has a finite set of actions to select from. These finite sets of available actions comprise the sets of pure strategies. If a player selects an action with probability 1, then the resulting strategy is termed a pure strategy. However, players are also allowed to mix their strategy choices: if a player chooses a probability distribution over his set of pure strategies, then the resulting strategy is a mixed strategy. The combined strategy selection – one strategy for each player – is termed a strategy profile. An obvious solution approach would ask for a stable strategy profile to which every player in the game agrees. This stability of a strategy profile is defined in terms of an equilibrium, i.e., a situation where players are prepared to follow the strategy profile as it is. The most widespread concept of equilibria was given by John Nash in 1950. A strategy profile is a Nash equilibrium if no player can do better by changing his or her strategy alone. Nash showed that in any finite strategic game, at least one mixed strategy equilibrium always exists [Nas50]; the concept became popular as the Nash equilibrium. In a Nash equilibrium, every player agrees to play their intended strategy because no player has an incentive to deviate from the strategy profile. It depicts the rational behaviour of players at its minimum, in that players only have to satisfy the minimal criterion of stability that there is no incentive for unilateral deviation. It was shown in [DGP09] that the complexity of computing a mixed strategy Nash equilibrium is PPAD-complete. As an example of a Nash equilibrium, we refer to Table 1.1. A pure Nash equilibrium in this bi-matrix game is the strategy profile where the row player plays T and the column player plays P, receiving payoffs of 1 and 4, respectively. Note that no Nash equilibrium in properly mixed strategies exists in this example. Von Stackelberg in 1934 gave the concept of leadership, or Stackelberg, models [vS34]. They are also known under the term commitment models. In a Stackelberg model, one player acts as a leader and the other acts as a follower. The model – unlike the Nash model – is a sequential model: the leader commits to a strategy first, and, after observing the action chosen by the leader, the follower takes the next turn. The main objective of both players is to maximise their own return. The Stackelberg model makes some basic assumptions, in particular that the leader knows ex ante that her follower is observing her action, and that both players are rational in that they try to maximise their own return. The solution concept became popular under the term Stackelberg equilibrium or leader equilibrium. While players select their strategies simultaneously in a Nash equilibrium, they move sequentially in a leader equilibrium. Computing leader equilibria is computationally cheap, as their construction is known to be tractable [CS06].
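Returning to the Nash example on Table 1.1: the claim that (T, P) is the only pure Nash equilibrium can be verified mechanically by a best-response check. A minimal sketch of this check (ours, not part of the thesis; the helper name is an illustrative choice):

import numpy as np

A = np.array([[0, 4], [1, 1]])   # row player's payoffs (strategies S, T)
B = np.array([[2, 1], [4, 1]])   # column player's payoffs (strategies P, R)
rows, cols = ["S", "T"], ["P", "R"]

def pure_nash_equilibria(A, B):
    """All pure profiles where neither player gains by a unilateral deviation."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            row_ok = A[i, j] >= A[:, j].max()   # no better row against column j
            col_ok = B[i, j] >= B[i, :].max()   # no better column against row i
            if row_ok and col_ok:
                equilibria.append((i, j))
    return equilibria

for i, j in pure_nash_equilibria(A, B):
    print((rows[i], cols[j]), (int(A[i, j]), int(B[i, j])))   # ('T', 'P') (1, 4)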

1.3.1 Solution concepts

We have already introduced the game types that we study in this thesis. Although we study a variety of games – bi-matrix games, mean-payoff games, and discounted-payoff games – our game settings remain essentially the same.

Our Game Settings. In our game settings, the leader assigns a strategy profile, and it is therefore natural to assume that she may not want to stay within the Nash restriction.

The leader may therefore select a strategy profile where no follower has an incentive to deviate, while it is explicitly allowed that the leader herself may have an incentive to deviate. The Nash requirement of a stable strategy profile thus does not apply to the leader. Given that the leader holds this extra power over the game, she can also incentivise various strategy choices of her followers. We therefore allow the leader to transfer part of her own payoff to her followers.

Approach. Our approach focuses on leader centric situations. We intend to compute optimal strategy profiles in these games using different solution approaches, such as allowing the leader to benefit from deviation. Our primary objective is to maximise the leader's payoff. We aim at computing strategy profiles that are optimal w.r.t. the payoff of the leader. For this, we allow the leader to assign strategy profiles in the game. As said above, the leader is allowed to break the Nash symmetry in a strategy profile such that the Nash condition does not hold for her. Thus, the set of 'leader strategy profiles' forms a broader class, giving her more leeway in selecting an optimal strategy profile. Although the leader might benefit from deviation, note that no other player is allowed to do so. In an incentive strategy profile, the leader additionally pays a non-negative incentive amount, say ι ≥ 0, to each of her followers. This incentive amount is added to the overall payoff of the resp. follower and deducted from the payoff of the leader. Thus, the overall payoff of the resp. follower is increased by the amount ι, but only if he follows the strategy profile. Otherwise, i.e., if the follower deviates from the assigned strategy profile, his payoff is not affected by the incentive. Similar to leader strategy profiles, the leader can only assign strategies such that no follower benefits from deviation. The leader here has more power compared to the traditional leader equilibrium: in particular, any leader strategy profile can be viewed as an incentive strategy profile (with zero incentives).

Leader equilibrium. An optimal strategy profile among the class of leader strategy profiles is a leader equilibrium. As an example, we refer to the multi-player mean-payoff game from Figure 1.1. We assume that player 1 owns vertex 1, the leader owns vertex 2, and player 3 owns vertex 3. The reward vectors shown on the edges depict the rewards of the players in this order: player 1, leader, and player 3. Initially, at vertex 1, player 1 can either move the token to vertex 4 or to vertex 2. At vertex 2, the leader has an incentive to move the token to vertex 3, because this would give her a maximal payoff of 4. However, player 1 receives a lower payoff at vertex 3 than at vertex 4. Therefore, player 1 would prefer to move the token initially to vertex 4, and not to vertex 2. This results in the strategy profile ⟨1 · 4^ω⟩. Note that this is also the only Nash equilibrium in this game. It gives a payoff of 1 to player 1, whereas both the leader and player 3 receive a payoff of 0. However, the leader has a better strategy: she can select a strategy profile where she might benefit from deviation. In an optimal leader strategy profile, the leader would move the token from vertex 2 to vertex 5. Therefore, a leader equilibrium whose outcome is ⟨1 · 2 · 5^ω⟩ provides an overall payoff of 1 for both the leader and player 1.

Note that the leader receives a better reward in the leader equilibrium than in the only Nash equilibrium.

Incentive equilibrium. An optimal strategy profile among the class of incentive strategy profiles is an incentive equilibrium. As an example, we refer to the bi-matrix game from Table 1.1. The row player is the leader and the column player is her follower. In this example, the only Nash equilibrium is the strategy profile (T, P), with a payoff of 1 for the leader and a payoff of 4 for the follower. Note that, in this example, both the Nash and the leader equilibrium provide the same leader return. However, the leader can improve her payoff in an incentive equilibrium. If the leader commits to the pure strategy S, then she can incentivise her follower to play the pure strategy R by giving him an incentive of amount ι = 1. The strategy profile (S, R) then becomes an incentive equilibrium. The payoffs of the leader and the follower in this equilibrium are 3 and 2, respectively. Note that the leader pays only the minimal incentive that is needed to make the strategy profile stable: the follower reward in the strategy profile (S, R) then equals the follower reward in the strategy profile (S, P), such that the follower has no incentive to deviate from the incentive equilibrium. This also shows that the reward of the leader from an incentive equilibrium is at least as high as her reward from a Nash or leader equilibrium – in this example, it is strictly higher. Note that this is a general observation and holds in all cases; a small sketch of this computation is given below.

Related equilibria. For bi-matrix games, we consider different assumptions on the behaviour of the follower. One assumption is that the follower is friendly towards his leader in that he chooses, ex aequo, the strategy assigned by the leader. This behaviour puts an obligation on the leader to respond in a friendly manner as well: the leader would select, among her optimal strategy profiles, one that maximises the follower return. We discuss the resulting equilibria under the term friendly incentive equilibria. In a friendly incentive equilibrium, the leader follows the secondary objective of maximising the follower return: among the strategy profiles that give an equal return to the leader, she uses the follower return as a tie-breaker. Another assumption is that the follower acts adversarially in that he may try to harm the leader. Here, the conditions from incentive equilibria need to be strengthened further. We introduce secure incentive equilibria for this case (the conditions here are comparable to those of secure Nash equilibria). We also introduce ε-optimal incentive equilibria for the general case where no secure incentive equilibrium exists. Note that, in a secure strategy profile, the leader can only assign a strategy profile where every deviation of the follower either leads to a strict decrease of the follower's return, or does not affect the leader return adversely. These are defined on a similar basis as secure Nash equilibria [CHJ06]; we therefore use the term secure incentive equilibria. We note that the construction of all of these incentive equilibria is tractable, and the solution concept is therefore computationally cheap.
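The incentive computation on Table 1.1 can be sketched as follows (ours, not the thesis's algorithm; for simplicity the leader is restricted to pure commitments, which suffices in this example, whereas Chapter 3 also treats mixed leader strategies). For each leader row i and follower column j, the minimal incentive that makes j a best response is the gap to the follower's best alternative in row i; the leader keeps her own payoff minus that incentive:

import numpy as np

A = np.array([[0, 4], [1, 1]])   # leader (row player) payoffs
B = np.array([[2, 1], [4, 1]])   # follower (column player) payoffs
rows, cols = ["S", "T"], ["P", "R"]

best = None
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        incentive = max(0.0, float(B[i, :].max() - B[i, j]))  # make column j stable
        leader_net = float(A[i, j]) - incentive
        if best is None or leader_net > best[0]:
            best = (leader_net, rows[i], cols[j], incentive)

print(best)   # (3.0, 'S', 'R', 1.0): the leader secures 3 by paying an incentive of 1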

Reward and Punish strategy profiles. We use reward and punish strategy profiles as a means to construct optimal leader strategy profiles and incentive strategy profiles.

We note that reward and punish strategy profiles can be used as a means to maximise the payoff of the leader from a given strategy profile. They can be used by the leader to dictate the play in a game. In a reward and punish strategy profile [Fri77], the leader promises an optimal reward to every player, but, if a player deviates from the assigned strategy profile, then the leader forms a coalition with all other players to act against the deviating player. That is, the leader initially cooperates with all players to produce a strategy profile; but, if a player deviates, then all other players cooperate to harm this player and neglect their own interests. Thus, while the objective of the deviating player is still to maximise his reward from the strategy profile, the objective of all other players (including the leader) changes to minimising the payoff of the deviating player. This results in a two-player game where the players have antagonistic objectives. An essential reward criterion on these strategy profiles is as follows: if a player deviates at some vertex, then the overall reward of the player in the resulting two-player game that starts at the point of deviation is not higher than the player's reward from the assigned strategy profile. We use reward and punish strategy profiles as a tool to construct strategy profiles that are optimal w.r.t. the leader.
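The deterrence criterion above can be phrased as a simple check, provided the value of the punishment game at every vertex is known. The following minimal sketch (ours, not part of the thesis; the punishment values, owner map, and names are assumed toy data in the spirit of Figure 1.1, since computing the actual values requires solving two-player mean-payoff games) illustrates the check:

def is_deterred(assigned_payoff, punishment_value, play_vertices, owner, player):
    """True if player has no profitable deviation at any vertex he owns on the
    assigned play: the value he can secure in the punishment game starting at
    that vertex must not exceed his payoff from the assigned strategy profile."""
    for v in play_vertices:
        if owner[v] != player:
            continue                      # a player can only deviate at his own vertices
        if punishment_value[(player, v)] > assigned_payoff[player]:
            return False                  # deviating at v would pay off
    return True

# Toy data (assumptions, for illustration only): on the play 1 -> 2 -> 5^w of
# Figure 1.1, player 1 receives 1; deviating at vertex 1 leads to a punishment
# game in which player 1 can still secure at most 1, so he is deterred.
owner = {1: "player1", 2: "leader", 5: "leader"}
assigned = {"player1": 1.0, "leader": 1.0, "player3": 0.0}
punishment = {("player1", 1): 1.0}
print(is_deterred(assigned, punishment, [1, 2, 5], owner, "player1"))   # True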

Motivation. The model we study allows us to consider quantitative specifications where studying a leader of this type has natural justifications. As an example, one could consider a distributed component system where several rational components interact with each other and with a rational controller. The components are considered 'rational' as they try to maximise their individual utilities, and a 'rational' controller would try to maximise the overall system utility. These properties can be reflected by automata where the individual processes try to maximise the amount of time they spend in an accepting state. Using mean-payoff objectives, one could encode this as maximising the limit-average time a process spends in an accepting state. The rational controller would try to maximise the limit-average time a system's critical resource is being used. As the objectives are not completely antagonistic, the techniques discussed here can be applied to arrive at an optimal solution.

1.4 Contribution

Most of our results concern the existence of leader and incentive equilibria in bi-matrix (two-player) games and in multi-player turn-taking games (with quantitative objectives), and their complexity. For bi-matrix games, we introduce incentive equilibria as a generalisation of leader equilibria (Chapter 3 and [GS15]). We show that the leader can improve her reward in incentive equilibria without adding to the computational cost – incentive equilibria are computationally tractable, just like leader equilibria. We also contribute conceptually by discussing behavioural assumptions on the follower in incentive equilibria. We discuss the implications of both a friendly follower and an adversarial follower. These different implications lead to friendly incentive equilibria and secure incentive equilibria, respectively. We give an algorithm for the computation of friendly incentive equilibria in bi-matrix games. We also report experimental results: we evaluate our solution approach on randomly generated bi-matrix games, considering 100,000 data-sets each for games with continuous payoff values and games with integer payoff values for the evaluation of friendly incentive equilibria. Our results show that incentive equilibria are superior to leader equilibria, and leader equilibria are superior to Nash equilibria. In multi-player non-terminating games, we contribute by establishing various results. We introduce the concept of leader strategy profiles and leader equilibria in multi-player mean-payoff games (Chapter 4 and [GS14]). Leader strategy profiles are based on the traditional reward and punish strategy profiles [Fri71, BDS13]: the leader assigns a strategy profile, and the first player who deviates from the strategy profile is punished. We show that solving multi-player mean-payoff games is polynomial time reducible to solving two-player mean-payoff games. We establish the existence of leader equilibria and show that no Nash equilibrium is superior (note that this is an obvious implication of the fact that each Nash equilibrium is, in particular, a leader equilibrium). We give a constraint system that can be used to construct an optimal leader strategy profile that provides the maximal leader return. We establish the NP-completeness of the related decision problem 'is there a leader equilibrium with payoff greater or equal to a threshold' (which equals the bound for Nash equilibria [UW11]). We show that the NP-hardness depends on the number of players: for a bounded number of players, we give a polynomial time reduction to solving two-player mean-payoff games. The complexity of finding leader and Nash equilibria for a bounded number of players therefore directly relates to the complexity of solving two-player games. There are algorithms for solving two-player games in pseudo-polynomial time [BCD+11], in smoothed polynomial time [BEF+11], and in PPAD [EY10]. Further, there are fast randomised [BV07] and deterministic [Sch08] strategy improvement algorithms, and the decision problem is in UP∩coUP [Jur98, ZP96]. We further contribute by introducing incentive equilibria in multi-player mean-payoff games (Chapter 4 and [GST+16]). One fundamental result is the existence of incentive equilibria in these games. The decision problem related to constructing incentive equilibria is shown to be NP-complete.
When the number of players is kept fixed, the complexity of the problem falls in the same class as that of two-player mean-payoff games. We give results from a tool that evaluates multi-player mean-payoff games. The co-authors Maram Sai Krishna Deepak and Bharath Kumar Padarthi from [GST+16] have worked on the implementation of the tool; however, all the technical details throughout the paper are ours. We implement the strategy improvement algorithm from [Sch08] for finding the mean partitions and extend it to evaluate two-player mean-payoff games. We extend the constraint system from [GS14] by adding incentives to the overall reward of the followers, and we construct incentive equilibria by solving these constraint systems. Our results show that an incentive equilibrium can only improve the leader's return over a leader equilibrium. We show that the complexity of finding incentive equilibria and leader equilibria in multi-player mean-payoff games is the same.

We establish the existence of optimal bounded memory leader strategy profiles in multi-player discounted-payoff games (Chapter 5 and [GSW15]). Here, we extend the use of leader equilibria to multi-player discounted-payoff games and discuss the existence of optimal bounded memory leader strategy profiles. We show that in discounted-payoff games the leader can benefit from more memory and that there are cases where infinite memory is needed. We mainly discuss the construction of strategies that use only bounded memory. We give a simple non-deterministic polynomial time approach for assigning reward and punish strategies that meet or exceed a given payoff bound for the leader and use memory only within a given bound. We show that the decision problem whether there exists a pure strategy profile with bounded memory that gives a reward greater than or equal to some threshold value is NP-complete. We summarise the important results in Table 1.2; in this table, 'Friendly IE' refers to 'friendly incentive equilibria' and 'Secure IE' to 'secure incentive equilibria'.

Nash equilibria:
• NP-complete for multi-player non-terminating games [UW11]
• PPAD-complete for bi-matrix games [DGP09]

Leader equilibria:
• NP-complete for multi-player non-terminating games [GS14, GSW15]
• For a fixed number of players, complexity equals that of solving two-player games [GS14, GSW15]
• Tractable for bi-matrix games [CS06]

Incentive equilibria:
• Complexity equals that of computing leader equilibria [GST+16]
• Tractable for bi-matrix games [GS15]

Friendly IE:
• Tractable for bi-matrix games [GS15]

Secure IE:
• Tractable for bi-matrix games (a constructive proof is required to establish this) [Chapter 3]

Table 1.2: Summary of the complexity results for different equilibria.

1.5 Related work

Our results concern the existence of leader equilibria and incentive equilibria. Important notions of equilibria in game theory [OR94] include Nash equilibria and leader equilibria. John Nash introduced the concept of Nash equilibrium in 1950 [Nas50] and showed that at least one mixed strategy Nash equilibrium always exists in finite strategic games. It was shown in [DGP09] that computing a mixed strategy Nash equilibrium in strategic games is PPAD-complete. Von Stackelberg introduced leader equilibria, also known as Stackelberg equilibria, in [vS34]; they have been studied in depth in oligopoly theory [Fri77]. Conitzer and Sandholm [CS06] studied the computation of Stackelberg strategies in bi-matrix games. Von Stengel and Zamir [vSZ04, vSZ10] studied leadership games with mixed strategies and showed that the possibility to commit to a strategy in bi-matrix games is always beneficial for the committing player. An endogenous game model where both players can offer side payments to each other has been studied in [MOJ05]; there, contracts are determined simultaneously and the game is played in stages. With regard to complexity, leader and incentive equilibria are computationally cheap solutions, as their construction is known to be tractable [CS06, GS15]. Various results pertaining to the existence of Nash equilibria in infinite duration games are known. Ummels and Wojtczak [UW11] studied the complexity of determining Nash equilibria in limit-average games. They showed that the decision problem of finding a Nash equilibrium is NP-complete for pure (not allowing randomisation) strategy profiles, while the problem is undecidable for arbitrary randomised strategies. The undecidability result of [UW11] for Nash equilibria in arbitrary randomised strategies can easily be extended to leader equilibria. Ummels [Umm08] analysed the complexity of Nash equilibria in infinite multi-player games with co-Büchi or parity winning conditions and established the NP-completeness of Nash equilibria in infinite games with these objectives. Ummels studied the concept of subgame perfect equilibrium for the case of infinite games in [Umm06]. Brihaye et al. [BDS13] have studied the existence of simple Nash equilibria in non-terminating games with various mixed reward functions. The reward and punish strategies that we study in Chapters 4 and 5 are inspired by similar strategies introduced in [Fri71]. We show in Chapter 4 that, keeping the number of players fixed, the problem of finding equilibria in multi-player mean-payoff games is polynomial time reducible to solving two-player mean-payoff games. Although mean-payoff games [ZP96] are positionally determined [EM79], it remains an open question whether there exists a polynomial time algorithm to solve these games. We use the strategy improvement algorithm from [Sch08] as the underlying algorithm to evaluate two-player mean-payoff games. Shapley showed that every two-player discounted zero-sum game has a value and that optimal positional strategies exist for both players [Sha53]. Fink [Fin64] generalised this work by showing that every discounted sum game has a Nash equilibrium. Berg and Kitti [BK13] studied subgame perfect pure strategy equilibria in discounted sum games; they analysed subgame perfect equilibria in games with perfect information. We study incentive equilibria in bi-matrix games and mean-payoff games. Related work includes a mechanism designer modelled as a player in the game who has the opportunity to modify the game [ASA10].
Their focus is on pure strategies only. Another related work is [MT04], where an external party, who has no control over the rules of the game, can influence the outcome of the game by committing to non-negative monetary transfers for the different strategy profiles. Stark [Sta89] studied how the introduction of altruism into non-cooperative game settings can lead to an improved outcome for both agents, and [Sta85] further discussed a special form of altruism – 'mutual altruism' – and its role in various contexts. The secure incentive equilibria that we study in Chapter 3 are defined on a similar basis as secure Nash equilibria. Secure Nash equilibria were studied for multi-player games with quantitative objectives in [CHJ06] and [DPFK+14]; these results, however, pertain to secure Nash equilibria only. Chatterjee et al. [CHJ06] have studied secure Nash equilibria in two-player non-zero-sum games. De Pril et al. [DPFK+14] establish the existence of secure equilibria in multi-player perfect information turn-based games, both for games with probabilistic transitions and for games with deterministic transitions.

1.6 Outline of this thesis

We give formal definitions of the equilibrium concepts introduced here in Chapter 2. We start with the discussion of incentive equilibria in bi-matrix games in Chapter 3, where we consider various follower behavioural models, assuming a friendly and a non-friendly follower; these lead to friendly incentive equilibria and secure incentive equilibria, respectively. We then discuss the existence of leader and incentive equilibria in multi-player mean-payoff games in Chapter 4. We discuss the existence of optimal bounded memory leader strategy profiles in multi-player discounted sum games in Chapter 5. We conclude and summarise the results of this thesis in Chapter 6.

Chapter 2

Definitions

We define here the various equilibrium concepts that we study in this thesis. We give these definitions for a general n-player strategic form (or: normal form) game and formally define leader strategy profiles, leader equilibria, incentive strategy profiles, and incentive equilibria.

Definition 2.1. A game G in normal form consists of

• a set P of players {1, ..., n};

• a set S_p of pure actions corresponding to each player p ∈ {1, ..., n}; a strategy profile σ for the game G is σ = (s_1, s_2, ..., s_n), where s_p ∈ S_p for p = 1, ..., n;

• a reward function u_p : S → ℝ that gives a reward r_p to player p whenever a strategy profile σ ∈ S is chosen; the set S consists of the pure strategy profiles and is defined as the Cartesian product of the individual strategy sets S_p.

The way that player p chooses an action is defined by a strategy σ_p. A family of strategies σ = {σ_p | p ∈ P} is called a strategy profile. Given a strategy profile σ, we write σ_p for the strategy of player p ∈ P in σ.

A strategy profile σ is pure if each player p ∈ P plays an action s_p deterministically, and it is a mixed strategy profile if a player has chosen a probability distribution over the set of the player's available actions. A mixed strategy for player p is defined as a distribution over S_p, written σ = (p_1.s_1, ..., p_k.s_k), i.e., ∑_{i=1}^{k} p_i = 1 (the sum of the weights is 1) and p_i ≥ 0 for 1 ≤ i ≤ k, if there are k pure strategies. We define the support of a mixed strategy σ as the positions with non-zero probability, and denote it by support(σ) = {j ≤ k | σ(j) > 0}. We write Σ_p^G for the set of strategies (pure and mixed) of player p ∈ P and Π^G for the set of strategy profiles in a game arena G. There is a distinguished leader player l ∈ P. For a strategy profile σ, a player p ∈ P, and a strategy σ′ of p, we write σ_{p,σ′} for the strategy profile that assigns σ′ to p and σ(p′) to every other player p′ ∈ P \ {p}. We are now in a position to formally define Nash and leader (or: Stackelberg) equilibria.


Definition 2.2. A strategy profile σ is a Nash equilibrium if no player would gain from unilateral deviation, i.e., for all players p ∈ P we have that r_p(σ) ≥ r_p(σ_{p,σ′}) holds for all σ′ ∈ Σ_p^G.

2.1 Leader strategy profiles

In a leader strategy profile no player but the leader may have an incentive to deviate.

Definition 2.3. A strategy profile σ is a leader strategy profile if no player except the leader would gain from unilateral deviation, i.e., for all p ∈ P \ {l} we have r_p(σ) ≥ r_p(σ_{p,σ′}) for all σ′ ∈ Σ_p^G.

An optimal leader strategy profile w.r.t. the leader return is a leader equilibrium.

Definition 2.4. A leader strategy profile σ is a leader equilibrium if the leader reward r_l in σ is maximal among the class of leader strategy profiles, i.e., r_l(σ) ≥ r_l(σ′) for any other leader strategy profile σ′.
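The conditions of Definitions 2.2 and 2.3 are straightforward to check for pure strategy profiles of a finite game. The following minimal sketch (ours, not part of the thesis; the helper names are illustrative choices) does so and instantiates the bi-matrix game of Table 1.1 with the row player as leader:

from itertools import product

def is_nash(reward, strategy_sets, profile):
    """Definition 2.2 for pure profiles: no player gains by a unilateral deviation."""
    for p in range(len(profile)):
        for s in strategy_sets[p]:
            deviation = profile[:p] + (s,) + profile[p + 1:]
            if reward(p, deviation) > reward(p, profile):
                return False
    return True

def is_leader_profile(reward, strategy_sets, profile, leader):
    """Definition 2.3 for pure profiles: only the leader may benefit from deviation."""
    for p in range(len(profile)):
        if p == leader:
            continue
        for s in strategy_sets[p]:
            deviation = profile[:p] + (s,) + profile[p + 1:]
            if reward(p, deviation) > reward(p, profile):
                return False
    return True

# The bi-matrix game of Table 1.1, with the row player as the leader (l = 0).
payoffs = {("S", "P"): (0, 2), ("S", "R"): (4, 1),
           ("T", "P"): (1, 4), ("T", "R"): (1, 1)}
reward = lambda p, prof: payoffs[prof][p]
sets = (("S", "T"), ("P", "R"))

for prof in product(*sets):
    print(prof, is_nash(reward, sets, prof), is_leader_profile(reward, sets, prof, 0))
# Only (T, P) is a Nash equilibrium; (S, P) and (T, P) are leader strategy profiles.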

2.2 Incentive strategy profiles

Before defining incentive strategy profiles and incentive equilibria, we first introduce the concepts of incentives, bribery vectors, incentive profiles, and bribery stable strategy profiles.

Definition 2.5. An incentive or bribery amount is defined as a non-negative value ι ≥ 0 that the leader pays to her followers to follow a strategy profile.

Definition 2.6. We define a bribery vector for a player p as β^p = (β1, . . . , βn) with non-negative incentives βj ≥ 0 for all j = 1, . . . , n, corresponding to each pure strategy of player p ∈ P \ {l}, and where the leader pays to player p an incentive ιp = βj ≥ 0 to follow pure strategy j. We denote with β the family of bribery vectors β^p for all p ∈ P \ {l} for the strategy profile σ.

Definition 2.7. A strategy profile (σ, β) with bribery vector β is an incentive strategy profile where we denote the leader payoff by rl(σ) − ∑_{p∈P\{l}} σβ^p, the payoff for player p by rp(σ) + σβ^p, and no player p benefits from deviation.

We call a bribery vector β^p = (β1, . . . , βk) for player p a j-bribery and denote it with β^p_j if it incentivises player p for only playing pure strategy j, that is, if β^p_i = 0 for all i ≠ j. A strategy profile is called bribery stable if it is stable under bribery or incentives. That is, for a given bribery vector and a strategy profile, no follower can improve his payoff by changing his strategy for the given leader strategy. A player p has therefore no incentive to deviate from a bribery stable strategy profile.

Definition 2.8. We define a bribery stable strategy profile as an incentive strategy profile (σ, β) that is stable in that no follower benefits from deviation, i.e., for all p ∈ P \ {l} and for all σ′ = (σ′_q)_{q∈P\{l}} with σ′_q = σq for all q ≠ p, we have that rp(σ, β) ≥ rp(σ′, β′) holds for all other bribery strategy profiles (σ′, β′).

Definition 2.9. A bribery stable strategy profile (σ, β) is an incentive equilibrium if rl(σ, β) ≥ rl(σ′, β′) holds for all other bribery stable strategy profiles (σ′, β′).

Definition 2.10. We next define an incentive profile as ι = (ιp)_{p∈P\{l}}, where ιp is the value that the leader pays to player p for following a pure strategy.

We use an incentive profile in place of a bribery vector in the definition of an incentive strategy profile and an incentive equilibrium when the players (including the leader) play pure strategies only. We denote a strategy profile in the presence of an incentive profile ι as a pair (σ, ι), where σ is a strategy profile assigned by the leader, in which the leader pays incentives to her followers as given by ι. We write ιp for the incentive for player p ∈ P \ {l} in an incentive profile ι. We write ιp(σ) for the incentive to player p for the strategy profile σ under incentive profile ι. In any incentive strategy profile (σ, ι), no player but the leader may benefit from deviation.

Definition 2.11. For an incentive profile ι, a strategy profile σ is an incentive strategy profile (σ, ι), if no follower can improve his overall payoff from a unilateral deviation.

I.e., for all players p ∈ P \ {l} we have that rp(σ) + ιp(σ) ≥ rp(σ_{p,σ′}) + ιp(σ_{p,σ′}) for all σ′ ∈ Σ_p^G.

An optimal strategy profile among this class that provides maximal payoff to the leader is an incentive equilibrium.

Definition 2.12. For an incentive profile ι, an incentive strategy profile (σ, ι) is an incentive equilibrium if the leader's total payoff for this profile is maximal among all incentive strategy profiles. I.e., for all (σ′, ι′) we have that rl(σ) − ∑_{p∈P\{l}} ιp(σ) ≥ rl(σ′) − ∑_{p∈P\{l}} ι′p(σ′).

For a given ε > 0 we call an incentive strategy profile (σ, ι) an ε-incentive equilibrium if the leader's payoff is at most ε worse than that of any other profile. I.e., for all profiles (σ′, ι′) we have that

rl(σ) − ∑_{p∈P\{l}} ιp(σ) ≥ rl(σ′) − ∑_{p∈P\{l}} ι′p(σ′) − ε.
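The following Python sketch illustrates Definitions 2.10–2.12 under the simplifying assumption that all players, including the leader, use pure strategies. For every pure strategy profile it determines the cheapest incentives that remove all profitable follower deviations, and then maximises the leader's reward minus the incentives paid. The game data are made up for the example.

from itertools import product

reward = {(0, 0): (-1, -1), (0, 1): (-10, 0),   # a prisoner's-dilemma-like example
          (1, 0): (0, -10), (1, 1): (-8, -8)}
actions = [2, 2]
LEADER = 0

def cheapest_incentives(profile):
    # smallest payment per follower making the assigned action weakly optimal for him
    total = 0.0
    for p in range(len(actions)):
        if p == LEADER:
            continue
        best_dev = max(reward[profile[:p] + (a,) + profile[p + 1:]][p] for a in range(actions[p]))
        total += max(0.0, best_dev - reward[profile][p])
    return total

best = max(product(*(range(n) for n in actions)),
           key=lambda s: reward[s][LEADER] - cheapest_incentives(s))
print(best, reward[best][LEADER] - cheapest_incentives(best))

For the prisoner's-dilemma-style rewards above this returns the co-operative profile with a leader value of −2 (reward −1 minus an incentive of 1), matching the discussion of incentive equilibria in Chapter 3.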

Note that the way incentives are computed depends upon the type of game. As an example, for mean-payoff games the incentives are computed in a similar manner as mean-payoffs are computed, i.e., they are aggregated over an infinite play. Thus, for a mean-payoff game, incentives are extended to an infinite play π = ⟨v0, v1, . . .⟩ in the usual mean-payoff fashion:

ιp(π) := lim inf_{n→∞} (1/n) ∑_{i=0}^{n−1} ιp(vi).
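As an aside, for an ultimately periodic play (a finite prefix followed by a cycle repeated forever) this lim inf is simply the average incentive on the cycle. A minimal Python sketch, with an illustrative per-vertex incentive map:

def mean_payoff_incentive(prefix, cycle, iota):
    # iota maps a vertex to the incentive paid to the player there;
    # on a lasso-shaped play the prefix does not matter in the limit
    return sum(iota[v] for v in cycle) / len(cycle)

iota = {'a': 0.0, 'b': 1.0, 'c': 0.5}                     # illustrative values
print(mean_payoff_incentive(['a'], ['b', 'c'], iota))     # 0.75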

Incentive strategy profiles vs. leader strategy profiles. We call an incentive strategy profile a leader strategy profile if all incentives are constant 0 functions, and a Nash strategy profile if, in addition, σ is also a Nash equilibrium. We write SP, ISP, LSP, and Nash SP for the set of strategy profiles, incentive strategy profiles, leader strategy profiles, and Nash strategy profiles. The following is a straightforward observation.

Nash SP ⊆ LSP ⊆ ISP ⊆ SP. (2.1)

Chapter 3

Bi-Matrix Games

This chapter is mainly based on the results from [GS15]. In addition to these results, we discuss here the properties and existence of secure incentive equilibria. Note that we have already introduced bi-matrix games in general in Chapter 1 (cf. Section 1.1). Here, we start with the discussion of our game setting in particular and then give our solution approach. We first give a few motivational examples in Section 3.3 that exemplify the main strength of the techniques we introduce in this chapter. Section 3.4 formally introduces bi-matrix games and also discusses the terminology and notation used throughout this chapter. Section 3.5 discusses incentive equilibria in bi-matrix games and properties of incentive equilibria like tractability, purity, and friendliness. We give technical details of the techniques we develop to compute various types of equilibria in Section 3.6. We show that the construction of both friendly incentive equilibria and ε-optimal incentive equilibria is tractable. We present our experience with an implementation of an algorithm for friendly incentive equilibria on randomly generated bi-matrix games in Section 3.9. We recall that, as a general convention, we refer to the leader as she and her follower(s) as he.

3.1 Abstract

We consider non-zero sum bi-matrix games where one player assumes the role of a leader in the Stackelberg model and the other player is her follower. In a leader (or: Stackelberg) equilibrium, it suffices for the leader to commit to an optimal mixed strategy and to assign a pure strategy to her follower. This approach provides better solutions for the leader when compared to Nash equilibria, where the decisions are made independently and simultaneously. We show that the leader can improve her reward further when she is allowed to incentivise the strategy selection of her follower: we discuss a setting where the leader can pay some of her own utility to her follower for assigning a particular strategy profile. We call the resulting strategy profile ‘stable under bribery conditions’ if the follower would not benefit from deviation. Among these strategy profiles, we call those with an optimal return for the leader ‘incentive equilibria’. Clearly, leader – and

even Nash – equilibria are stable under bribery conditions (for no incentive). Leader and Nash equilibria therefore cannot be superior (w.r.t. leader return) to incentive equilibria. Like for leader equilibria, this basic model of incentive equilibria makes assumptions on the behaviour of the follower. One assumption that we take for granted is that both players play rationally in that their main objective is to maximise their own payoff. The second assumption is that followers are friendly to their leader in that they choose, ex aequo, the strategy suggested by her. This implies that they implicitly take the welfare of the leader into account, at least as long as it does not affect their expected payoff. This second assumption is disputable, and one could as well assume that, ex aequo, the follower acts adversarially towards his leader. We discuss the implications of these different behavioural models. In a nutshell, we argue that the optimistic assumption leads to an obligation for the leader to return this kindness: she should choose, ex aequo, an assignment that provides the highest follower return. We call the resulting equilibria ‘friendly incentive equilibria’. This obligation is at least a moral one, but it also has an economical side, as a follower who observes the opposite is not likely to keep making leader friendly choices. The pessimistic assumption leads to a situation where the ‘stable under bribery conditions’ requirement needs to be strengthened to a condition comparable to the one known from secure Nash equilibria. In many cases, no optimal incentive equilibrium for this condition exists. We therefore introduce ε-optimal incentive equilibria for this case. We show that the construction of all of these incentive equilibria is tractable.

3.2 Introduction

Stackelberg models [vS34] have been studied in detail in Oligopoly Theory [Fri77]. In a Stackelberg model, one player or firm acts as a leader and the other is a follower. The model is a sequential move game, where the leader takes the first move and the follower moves afterwards. The main objective of both players is to maximise their own return. The Stackelberg model makes some basic assumptions, in particular that the leader knows ex-ante that the follower observes her action and that both players are rational in that they try to maximise their own return. We consider a game setting where one player acts as the leader and assigns a strategy to her follower, which is optimal for herself. This is viewed as a Stackelberg or leadership model, where a leader is able to commit to a strategy profile first before her follower moves. We consider Stackelberg models for non-zero sum bi-matrix games, that is, games in normal form. In our game setting the row player is the leader and the column player is her follower. In a stable strategy profile, players select their individual strategies in such a way that no player can do better by changing their strategy alone. This requirement is justified in Nash's setting as the players have equal power. In a leader equilibrium [vS34], this is no longer the case. As the leader can communicate her move (pure or mixed) up front, it is quite natural to assume that she can also communicate a suggestion for the move of her follower. Yet, the leader cannot freely assign strategies. Her follower will only follow her suggestion when he is happy with it in the Nash sense of not benefiting from changing his strategy. We refer to strategy profiles with this property as leader strategy profiles. A leader equilibrium is simply a leader strategy profile, which is optimal w.r.t. the leader return. We argue that a leader with the power to communicate can also communicate her strategy and the response she would like to see from her follower and that – and how much – she is willing to pay for compliance. This allows her to incentivise various strategy choices of her follower by paying him a small bribery value. Incentivising the choice of her follower by paying an incentive ι ≥ 0 would intuitively change the payoff matrices for both players accordingly: it would decrease the leader's payoff by an amount ι, and increase the follower's payoff by the same amount. In this setting, the leader has more power compared to Stackelberg's traditional setting: the moves there can be viewed as special cases with an incentive of ι = 0. Similar to the classic case, the leader is restricted to strategies that the follower is prepared to follow, but for determining this, the incentive he would gain (when following) or lose (when deviating) is taken into account. We refer to strategy profiles (including the bribery value) that satisfy this constraint as incentive strategy profiles, and to the optimal choices of the leader among them as incentive equilibria. We finally turn to the assumptions that we make on the behaviour of the follower. In all models we study, the main characteristic of the players is that they play rationally in that their main goal is to maximise their own payoff. This raises the question of how they select among strategies that provide the same return for themselves. It is common to assume that the follower follows the leader's suggestion as long as he does not suffer from it. That is, the follower will, ex aequo, do what the leader asks him to do.
We argue that this assumption puts an obligation on the leader to be considerate towards the follower return in the same way. Thus, if the leader wants to benefit from a friendly ex aequo choice of her follower, then she should be friendly ex aequo, too. We therefore introduce friendly incentive equilibria. An incentive equilibrium is friendly, if it provides the highest follower return among all incentive equilibria. When always selecting friendly incentive equilibria, the leader returns this kindness of her follower by using his payoff as a tie-breaking criterion ex aequo. In the same way, we define friendly leader equilibria. However, if we drop the assumption that the follower selects, ex aequo, the strategy suggested by the leader, then we should conservatively assume that the follower has a secondary objective to harm the leader. With this behavioural model of the follower, the leader can only put forward strategy profiles where every deviation of the follower must either lead to a strict decrease of the follower return, or does not decrease the leader return. A similar property has been studied for Nash equilibria as secure Nash equilibria [CHJ05a]; we therefore use the term secure incentive equilibria. Incentive strategy profiles that meet these requirements are called secure incentive strategy profiles. A secure incentive strategy profile with maximal payoff for the leader is called a secure incentive equilibrium. Secure leader strategy profiles and secure leader equilibria can be defined accordingly. We will see that secure incentive equilibria do not always exist. Moreover, when they exist, then there is at least one among them that has a bribery value 0, and is therefore a secure leader equilibrium, too. To cover the general case where secure incentive equilibria may not exist, we show that incentive equilibria are a very stable concept: we show that ε-optimal secure incentive strategy profiles always exist. In fact, they can be derived from any incentive equilibrium by raising the incentive by an ε amount. We therefore portray two different approaches of a leader in a Stackelberg equilibrium – in the first approach, the follower acts friendly towards his leader and plays the strategy as assigned to him by the leader. Here, the leader is obliged to select a strategy profile that would, ex aequo, maximise the follower return. The second approach considers an adversarial follower. The leader then has to select strategy profiles that are secure, although they might not be optimal. (However, the relative loss of the leader is arbitrarily small, such that we skip over this detail in the remaining introductory part.) In a Stackelberg game where one player is in a position to commit before the other player takes his move, an extensive form representation of the game is a common assumption. An extensive form of the game is a finite tree representation where each node of the tree is labelled with a player who will choose an action at that node. Each action corresponds to an edge going deeper into the tree, and a payoff vector is given at the terminal nodes that depicts the payoff for each player. At every node players have perfect information about their earlier moves, and this representation therefore makes the temporal aspect of the game explicit. For example, an extensive form representation of the popular prisoner's dilemma game is shown in Figure 3.1.

Figure 3.1: Prisoner's Dilemma in Extensive-form. (Game tree: Prisoner I (P 1) chooses C or D at the root; Prisoner II (P 2) then chooses C or D at his node; the terminal payoff vectors are (−1, −1), (−10, 0), (0, −10), and (−8, −8).)

In this figure, Prisoner I at the root node moves first and chooses an action that corresponds to a branch. Prisoner II would then choose his action depending upon the move Prisoner I has taken. Payoffs for the different strategy profiles are shown at the terminal nodes. If the setting were to be modelled as an extensive-form game, backward induction can be applied on the game tree to compute a (subgame perfect) equilibrium. A subgame perfect equilibrium is a refinement of Nash equilibrium with the property that the equilibrium computed in every subgame is also an equilibrium. As in an incentive equilibrium the leader needs to communicate her strategy and also the bribery to be given to her follower, this information needs to be communicated along the tree branch where a decision has to be made. Assuming that Prisoner I is the leader in the above example (Figure 3.1), she needs to communicate an incentive she would give to Prisoner II (the follower) while taking a move. To compute an incentive equilibrium using backward induction, one would start at the bottom of the game tree and compute the optimal action for the player at each decision node. One would then move a layer up until the root node is reached and an optimal strategy profile is computed. The strategy profile thus obtained is subgame perfect relative to the leader, as the leader is allowed to benefit from deviation in our setting. Although it is natural to assume a tree representation of a leadership game, the size of the tree could become large for an increasing number of strategies. Moreover, if a player is in a position to commit to a mixed strategy (the leader in our case), this size could be even larger, as every pure strategy in the support of the mixed strategy would correspond to a different (pure strategy) branch in the tree. This size would increase further for an increasing number of strategies in the original game. The set of strategy profiles in a Nash equilibrium of an extensive form game also contains unreasonable strategy profiles, as it allows for strategies that constitute non-credible threats that the players would not carry out in reality. Therefore, to stay simple and convenient, we use the simple and original representation for two-player strategic games, i.e., the bi-matrix form.
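To illustrate backward induction on a tree of this shape, here is a minimal Python sketch. It is an illustration only: the tree encoding and the assumption that each player simply maximises his own payoff at his node are ours, and leader-side incentives are not modelled here.

# leaves carry payoff vectors (Prisoner I, Prisoner II); inner nodes name the player to move
tree = ('P1', {'C': ('P2', {'C': (-1, -1), 'D': (-10, 0)}),
               'D': ('P2', {'C': (0, -10), 'D': (-8, -8)})})

def backward_induction(node, player_index={'P1': 0, 'P2': 1}):
    if isinstance(node, tuple) and len(node) == 2 and isinstance(node[1], dict):
        player, children = node
        results = {a: backward_induction(c) for a, c in children.items()}
        # the player to move picks the action maximising his own payoff component
        best = max(results, key=lambda a: results[a][player_index[player]])
        return results[best]
    return node   # a leaf: the payoff vector itself

print(backward_induction(tree))   # (-8, -8): both defect, the subgame perfect outcome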

3.3 Motivational examples

As a first motivational example for a friendly incentive equilibrium, we consider a travel agent, who sells flight tickets to a customer. The travel agent receives a concession fee from the companies on every deal. These fees, however, differ for different airlines. Customers who wish to travel may value flights differently. For example, they might have preferences over flight timings, connections in between the flights, choice of airport, etc. E.g., they might prefer arriving in Heathrow over arriving in Stansted, or prefer leaving at 10am over leaving at 5am. They value their benefit from buying the various connections on offer differently, e.g., as follows:

Airline         agent-value   customer-value
Air India:      £100          £170
German Wings:   £50           £100
Lufthansa:      £120          £80
Ryan Air:       £5            £200
US Airways:     £100          £150

The agent-value reflects the concession fee the agent would get from the respective flight. The customer-value reflects the value the customer considers an offer to have (the value of the flight to him minus the cost of the ticket). When the travel agent learns the priorities and valuations of the customer, she can make an offer to suit his needs. The agent cannot increase the offers made by the companies, but she can discount them at her own expense. When the agent does not discount any offer, the customer would choose the Ryan Air flight, but the travel agent can simply offer the customer an incentive in form of a £30 discount on the Air India flight, thus increasing her gain from £5 to £70. When the leader announces her action, she effectively simplifies the bi-matrix game to a vector, like the one from this example. Here, by offering a discount of £30, the leader can considerably increase her return, as the customer now values both Ryan Air and Air India equally. Customer relationship models of this type usually lead to only the customer making choices. However, this still leaves some scope for deviation, if the customer receives an equal value on more than one option. By paying an ε > 0 extra amount, the leader can guarantee that the follower will not deviate from the assigned strategy, and the strategy profile then becomes an ε-optimal secure strategy profile. Thus, the travel agent can offer a discount of £30 + ε (£30.01) to the customer on the Air India flight. The example can be viewed as a simple bi-vector game where one player (the leader in our case) has only one choice and therefore has no direct influence on the strategy selection. As opposed to leader or Nash equilibria, the leader can benefit here from announcing incentives on the various options that the other player has. Effectively, this is the only thing that the leader can do in this case – she can announce incentives in order to encourage the selection of a strategy in her favour by her follower. We come back to this later in this chapter (Section 3.9) and discuss how it gives a better social return and how the leader can also improve her own payoff. Interestingly, the payoff for the follower remains unaffected in these cases. The example therefore translates to simple scenarios where the leader can only select the incentives to be given to her follower, but has only one option. This simplified setting is included for two reasons: it shows that the concept is useful even in simpler scenarios where incentivising is the only way the leader can influence the game, and it separates the concerns and shows the influence of incentives in isolation.
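A quick Python sketch of this single-row case (the structure of the computation is ours; the figures are those from the table above): for each option the leader checks the smallest discount that makes it at least as attractive to the customer as his current favourite, and keeps the option maximising her fee minus that discount.

agent_value    = {'Air India': 100, 'German Wings': 50, 'Lufthansa': 120, 'Ryan Air': 5, 'US Airways': 100}
customer_value = {'Air India': 170, 'German Wings': 100, 'Lufthansa': 80, 'Ryan Air': 200, 'US Airways': 150}

best_for_customer = max(customer_value.values())
# smallest discount making an option as attractive as the customer's current favourite
discount = {f: max(0, best_for_customer - v) for f, v in customer_value.items()}
flight = max(agent_value, key=lambda f: agent_value[f] - discount[f])
print(flight, discount[flight], agent_value[flight] - discount[flight])   # Air India 30 70

Paying £30 + ε instead of exactly £30 makes the assigned choice strictly preferred, i.e., an ε-optimal secure strategy profile.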

Arms race game. A class of games that derives benefit from friendly incentive equilibria is the class of ‘arms race’ games. Consider two countries, say the more powerful country and the less powerful country, who, because of a mutual threat, have to spend a considerable amount of their resources on the production of arms and their military. The countries can save a considerable amount if they both agree on not entering an arms race. To achieve this, the more powerful country may pledge not to enter an arms race, and at the same time offer the less powerful country (the follower) an advantage, or promise support in obtaining public events like the Olympic Games or any other major event, in return for the less powerful country also not entering into an arms race. If the pledge is believed and the incentive is sufficiently high such that the follower would have no incentive to enter the arms race, then the dilemma is overcome. This also exemplifies that the incentive is not the main obstacle, but the trust in the pledge not to enter into the arms race.

The examples discussed above show how a friendly incentive equilibrium gives better results than the leader equilibrium or Nash equilibrium. When compared to leader or Nash equilibria, a friendly incentive equilibrium would always give better results for the leader in these games. This also reflects why it is beneficial for the leader to behave friendly towards her follower ex aequo when the follower is also friendly. Another important class of games to derive benefit from incentive equilibria is the Prisoner's dilemma [Tuc50]. In games of this class, players are somehow hesitant to co-operate even if it lies in their best interest. The leader's ability to incentivise her follower then helps the players to arrive at an optimal solution. We discuss this game in detail in Section 3.5.

3.3.1 Related Work

Leader equilibria, introduced by von Stackelberg [vS34] and therefore sometimes referred to as Stackelberg equilibria, have been studied in depth in Oligopoly theory [Fri77]. The main contribution of our work is conceptual, and the closest relation of our equilibrium concept is to Stackelberg leader equilibria [CS06, vSZ10]. Ehtamo and Hamalainen in [EH86] considered the construction of optimal incentive strategies in two-player dynamic game problems that are described by integral convex cost criteria. They considered the problem of finding an incentive strategy for the leader by looking at the rational response from the follower, such that, in an optimal strategy, the cost functional of the leader could be minimised. They further considered [EH89] analytical methods for constructing memory incentive strategies for continuous time decision problems. The strategies they study are time-consistent, which means the continuation of an equilibrium solution remains an equilibrium. Stark, in [Sta89], studied how the introduction of altruism into non co-operative game settings can lead to an improved quality for both agents. Stark [Sta85] also discussed a special form of altruism – ‘mutual altruism’ – and its role in various contexts. A realistic altruistic behaviour of players is also considered in fairness equilibria [Rab93], where a player behaves nicely if the other player is also nice to him, and a player acts hostile if the other player is also hostile. The set of fairness equilibria is, however, not necessarily a superset or subset of the set of Nash equilibria. This is altruism considered at a different level: the same people who are altruistic to other altruistic people are motivated to hurt those who hurt them. A player's payoff therefore does not depend only on his choice of actions but also on his beliefs about the other player's actions. The payoff profile therefore depends largely on the players' motives. A game model where players make binding offers of side payments is studied in [MOJ05]. The game is played in stages, where in a first stage players engage in side contracting and the payoff functions are altered accordingly. The altered game is then played in the second stage. The game that is eventually played is thus endogenous. The game is also not a leadership game, and both players are in a position to offer side payments.

Von Stengel and Zamir [vSZ10] studied a commitment model in a leadership game with mixed extensions of bi-matrix games. They show that the possibility to commit to a strategy profile in bi-matrix games is always beneficial for the committing player. Von Stengel and Zamir gave further results in this regard in [vSZ04]. They show that the set of the leader's payoffs in a leadership game lies in an interval which is at least as good as that player's Nash and correlated equilibrium payoffs. They discussed the importance of commitment as a means of coordination for considering correlated equilibria. Conitzer and Sandholm [CS06] gave a first insight into the computation of Stackelberg strategies in normal-form games. [ASA10] considered a mechanism designer modelled as a player in the game who has the opportunity to modify the game. In their setting, the effect on the players' utility and on social welfare is counter-intuitive, e.g., social welfare may become arbitrarily worse, and they focus completely on pure strategies, whereas our solution approach tries to increase the payoff of both players, which often leads to a good social outcome, and the equilibrium we study is mixed only for the leader (while the strategies assigned to the follower are pure). Another related work is [MT04], where an external party, who has no control over the rules of the game, can influence the outcome of the game by committing to non-negative monetary transfers for the different strategy profiles that may be selected by the agents in a multi-agent interaction. Secure Nash equilibria have been studied for multi-player graph based games with quantitative objectives in [CHJ06] and [DPFK+14]. These results, however, pertain to secure Nash equilibria only. Chatterjee et al. [CHJ06] have studied secure Nash equilibria in two-player non zero-sum games, where they considered lexicographic objectives of the two players in order to make an equilibrium secure – players first follow their objective of maximising their payoff and then they have a secondary objective of minimising the other player's payoff. If the two players select their strategy profiles in this order, they would form a unique secure Nash equilibrium – i.e., a strategy profile, which is in equilibrium and is also secure. De Pril et al. [DPFK+14] establish the existence of secure equilibria in multi-player perfect information turn-based games, both for games with probabilistic transitions and for games with deterministic transitions. When comparing Nash with leader or incentive equilibria, the latter are computationally cheap solutions, as the construction of a leader equilibrium is known to be tractable [CS06]. Similarly, we show in this chapter that computing incentive equilibria is also tractable. For Nash equilibria, theoretical computer science has contributed much towards their complexity and how easy it is to find one. Especially, how soon can we expect players to really reach an equilibrium? Roughgarden [Rou10] has considered a concrete ‘dynamic’ model to learn these behavioural properties, especially how quickly we can expect players in multi-player games to arrive at an equilibrium. They use a straightforward procedure known as best-response dynamics [Rou10].

In any finite potential game [MS96], starting from an arbitrary initial strategy, best-response dynamics finally converges to a pure Nash equilibrium (PNE), if there exists one. It cycles in games without any PNE. However, in the worst case, it might take an exponential number of iterations. This is a method of recursively searching for a PNE, and it halts only at a point when it has found one. Best-response dynamics are best applicable in a situation where a large number of agents collectively interact to produce a solution to a problem, with every agent trying to pull the solution in a direction that is favourable to him. Here, each agent would try to optimise their individual objective function.
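A minimal sketch of best-response dynamics for a finite game in normal form (illustrative Python code, not taken from [Rou10]): it cycles through the players and lets each switch to a best response until no player wants to move or an iteration bound is hit.

def best_response_dynamics(reward, actions, profile, max_rounds=100):
    # reward[profile] = tuple of payoffs; actions[p] = number of pure actions of player p
    for _ in range(max_rounds):
        stable = True
        for p in range(len(actions)):
            best = max(range(actions[p]),
                       key=lambda a: reward[profile[:p] + (a,) + profile[p + 1:]][p])
            if reward[profile[:p] + (best,) + profile[p + 1:]][p] > reward[profile][p]:
                profile = profile[:p] + (best,) + profile[p + 1:]
                stable = False
        if stable:
            return profile          # a pure Nash equilibrium
    return None                     # no convergence within the bound (e.g., no PNE)

reward = {(0, 0): (2, 1), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (1, 2)}
print(best_response_dynamics(reward, [2, 2], (0, 1)))   # converges to a PNE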

3.4 Definitions

We define a bi-matrix game as G(A, B), where A and B are the real valued m × n payoff matrices for the leader and the follower, respectively. In our setting, the leader is the row player and the follower is the column player. We refer to the number of rows by m and to the number of columns by n. We also refer to the entry in row i and column j of A and B by aij and bij, respectively. Since we have only two players, we represent the leader's strategy with σ and the follower's strategy with δ, and represent a strategy profile as a pair of strategies ⟨σ, δ⟩. A (mixed) strategy of the leader is a probability vector σ = (p1, . . . , pm), i.e., ∑_{i=1}^{m} pi = 1 (the sum of the weights is 1) and pi ≥ 0 for 1 ≤ i ≤ m. Likewise, a (mixed) strategy of the follower is a probability vector δ^T = (q1, . . . , qn), i.e., ∑_{j=1}^{n} qj = 1 and qj ≥ 0 for 1 ≤ j ≤ n. Where convenient, we read δ^T as a function and refer to the jth column of δ by δ^T(j). For a given follower strategy δ^T = (q1, . . . , qn), we define its support as the positions with non-zero probability, and denote it as support(δ^T) = {j ≤ n | δ^T(j) > 0}. Note that we have formally introduced equilibrium concepts in Chapter 2. We only define here concepts that were not introduced before. As there is one follower, we denote with β the bribery vector for the follower and represent it as β = (β1, . . . , βn)^T, with non-negative entries βj ≥ 0 for all j = 1, . . . , n, corresponding to each decision 1 through n of the follower, where the leader pays her follower the bribery βj when he plays j. For an incentive strategy profile (⟨σ, δ⟩, β), we denote the leader payoff by lpayoff(A; ⟨σ, δ⟩, β) = σAδ − δ^T β and the follower payoff by fpayoff(B; ⟨σ, δ⟩, β) = σBδ + δ^T β. We call strategies with singleton support pure. We are particularly interested in pure follower strategies, and abbreviate the strategy with δ^T(j) = 1 by j, or by j⃗ to emphasise that j refers to a strategy. We call a bribery vector β = (β1, . . . , βn)^T a j-bribery if it incentivises the follower for only playing j, that is, if βi = 0 for all i ≠ j. An incentive equilibrium (⟨σ, δ⟩, β) is called simple, if δ is pure. For simple incentive equilibria, we refer to the bribery vector β as a j-bribery β. We refer to the value of this j-bribery by ι = βj as the incentive that the leader gives to her follower to solicit him to play j.
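As a concrete reading of these payoff definitions, the following Python lines (a sketch; the mapping of the notation onto numpy arrays is our own) compute lpayoff and fpayoff for a leader strategy σ, a follower strategy δ, and a bribery vector β:

import numpy as np

def lpayoff(A, sigma, delta, beta):
    # leader payoff: sigma A delta minus the expected bribery delta^T beta
    return sigma @ A @ delta - delta @ beta

def fpayoff(B, sigma, delta, beta):
    # follower payoff: sigma B delta plus the expected bribery delta^T beta
    return sigma @ B @ delta + delta @ beta

A = np.array([[-1, -10], [0, -8]])   # leader payoffs of the Prisoner's Dilemma (cf. Figure 3.1)
B = np.array([[-1, 0], [-10, -8]])   # follower payoffs
sigma = np.array([1.0, 0.0])         # leader co-operates
delta = np.array([1.0, 0.0])         # follower is asked to co-operate
beta  = np.array([1.0, 0.0])         # a 1-bribery of value 1
print(lpayoff(A, sigma, delta, beta), fpayoff(B, sigma, delta, beta))   # -2.0 0.0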

An incentive equilibrium is friendly, if, among all incentive equilibria, the follower return is maximal. A friendly incentive equilibrium is thus defined as follows.

Definition 3.1. An incentive equilibrium (⟨σ, δ⟩, β) is called friendly, if fpayoff(B; ⟨σ, δ⟩, β) ≥ fpayoff(B; ⟨σ′, δ′⟩, β′) holds for all incentive equilibria (⟨σ′, δ′⟩, β′).

Note that in a friendly incentive equilibrium we are not bothered about the follower's deviations (as the outcome is also in favour of the follower). This is unlike a secure incentive equilibrium, where for every follower deviation, either the leader must not lose or the follower loses strictly. We now give the requirements on incentive strategy profiles that should be met for an incentive strategy profile to be secure.

Definition 3.2. An incentive strategy profile (⟨σ, δ⟩, β) is called a secure incentive strategy profile, if, for every follower deviation δ′,

1. the follower loses strictly, i.e., fpayoff(B; ⟨σ, δ⟩, β) > fpayoff(B; ⟨σ, δ′⟩, β) or,

2. the leader does not lose, i.e., lpayoff(A; ⟨σ, δ′⟩, β) ≥ lpayoff(A; ⟨σ, δ⟩, β).

As discussed in Section 3.2, secure incentive strategy profiles may not always exist. We establish this later in Section 3.8.3 (cf. Theorem 3.35). Therefore, to cover the broader case where secure incentive strategy profiles do not exist, we define an ε-optimal incentive strategy profile for an ε > 0 as follows.

Definition 3.3. An incentive strategy profile (⟨σ, δ⟩, β) is, for an ε > 0, called an ε-optimal incentive strategy profile if the leader payoff in (⟨σ, δ⟩, β) is at most ε worse than in any other incentive strategy profile. Thus, for any other simple incentive strategy profile (⟨σ′, δ′⟩, β′), it holds that lpayoff(A; ⟨σ, δ⟩, β) ≥ lpayoff(A; ⟨σ′, δ′⟩, β′) − ε.

3.5 Incentive equilibria in bi-matrix games

We discuss here incentive equilibria in bi-matrix games. As an example, we refer to the bi-matrix game shown in Table 3.1.

              Player II
              I         II
Player I  I   5, 0      0, 1
          II  0, 0      −1, 0

Table 3.1: Equilibria in a bi-matrix game.

In this example, the only Nash equilibrium is the strategy profile (I,II). This strategy profile is also the only leader equilibrium. The leader payoff and the follower payoff in this strategy profile are 0 and 1 respectively. However, if we allow the leader to incentivise her follower for following a particular strategy profile, then she can incentivise her follower to play the pure strategy I by paying him a bribery value of 1. The strategy profile (I,I) with bribery value 1 is then an incentive equilibrium. The payoffs for the leader and the follower for this equilibrium are 4 and 1, respectively.

Like in leader equilibria, the leader can select a strategy profile where she benefits from deviation. We will discuss such a situation on the example of the Prisoner's dilemma [Tuc50]. As the entities who interact strategically are often called players, we use the term ‘players’ here instead of ‘prisoners’. The game has the famous antinomy that both players do better if they both co-operate (C) with each other, while both of them have the dominating strategy to defect (D). Consequently, (D,D) is the only Nash equilibrium in this game. We refer to the prisoner's dilemma payoff matrix from Table 3.2, which is a normal form representation of the game from Figure 3.1. The only Nash equilibrium here is the strategy profile (D,D) with a joint return of −16. Another observation is that leader equilibria are not powerful enough to overcome this antinomy. This is because the Nash or leader equilibrium in this game is largely based on a dominant strategy (‘defect’ in this example), that would remain dominant and always overpowers the other strategy. The dilemma here is that players get much better payoffs if both of them choose to co-operate. Mutual co-operation, however, is prevented by the presence of the dominant strategy to defect. Thus, the Nash or leader equilibrium here does not allow us to reach the social optimum. We observe that incentive equilibria help to overcome this shortcoming. The leader can assign a strategy profile (C,C) that gives an overall utility of −2. This is because, with incentive equilibria, the leader can make use of her power to incentivise. She can bribe Player II into co-operation by offering a bribery value 1. As a result, co-operating becomes an optimal choice for her follower. In this example (C,C) with bribery value 1 is the only incentive equilibrium. Note that the friendly incentive equilibrium is also Pareto optimal while the Nash equilibrium is not. A strategy profile is Pareto optimal when no player in the game can be better off without making at least one player worse off. The strategy profile (C,C) is therefore Pareto optimal, while the Nash equilibrium (D,D) does not meet this criterion here. Interestingly, the follower benefits more than the leader from the additional power of the leader to incentivise follower behaviour: after bribery, the leader return is −2, while the follower return is 0 in this incentive equilibrium.

              Player II
              C         D
Player I  C   −1, −1    −10, 0
          D   0, −10    −8, −8

Table 3.2: Prisoners' Dilemma.

However, this is not always the case. Table 3.3, for example, shows a bi-matrix game where the follower return is higher for leader and Nash equilibria: the strategy profile (II,II) is an incentive equilibrium with bribery value 1 and the strategy profile (I,I) is a leader equilibrium. The follower return in this leader equilibrium is 2, while it is 1 in the only incentive equilibrium.

              Player II
              I         II
Player I  I   1, 2      0, 0
          II  −10, 1    10, 0

Table 3.3: An example where the follower does not benefit from an incentive equilibrium.

Figure 3.2: Incentive strategy profiles ⊇ Leader strategy profiles ⊇ Nash strategy profiles. (Nested sets: Nash SPs ⊆ LSPs ⊆ Incentive SPs.)

From the leader's point of view, it is easy to see that leader equilibria cannot be superior to incentive equilibria, and Nash equilibria cannot be superior to leader equilibria, not even if the leader can choose a Nash equilibrium. The reason is that the set of incentive strategy profiles includes the set of leader strategy profiles (as these are the incentive strategy profiles with bribery value 0), which in turn includes the pool for Nash equilibria (as these are the leader strategy profiles for which the leader has no incentive to deviate). The incentive strategy profiles can thus be selected from a wider base of choices, as shown in Figure 3.2. The figure shows that incentive strategy profiles have fewer constraints than leader strategy profiles, which in turn have fewer constraints than Nash strategy profiles. The optimum over smaller sets can, naturally, not be superior to the optimum over larger sets. We now discuss the assumptions on the follower behaviour. Our first assumption is that of a friendly follower – a follower who follows the strategy profile assigned by the leader as it is. We argue that for a friendly follower, the leader has to be friendly ex aequo too. This can be exemplified by the bi-matrix game from Table 3.4.

              Follower
              I         II
Leader    I   1, 0      0, 0
          II  1, 1      1, 1

Table 3.4: Leader behaves friendly when her follower is also friendly.

Here, a leader (and an incentive) equilibrium is (I,I). The technical definition seems fine: the follower has no incentive to deviate, as he has nothing to gain from playing II instead. However, it still seems pretty unlikely that this behaviour reflects a true behaviour in any game played by human players. It would assume that the pointless harm the leader inflicts on her follower (pointless in the sense that it does not incur an advantage for the leader) by playing I instead of II would not be met by a retaliation, which in turn would cost the follower nothing. In fact, this behaviour is not rational at all, as it invites non-consideration. Thus, the leader has to make follower friendly choices for a friendly follower.

Our next assumption is an adversarial follower who has a secondary objective to harm the leader. In the example from Table 3.4, when the leader plays I, the follower would play II: his own payoff is not affected by the choice, but his secondary objective is to minimise the leader return, which leads to playing II. The leader is now not obliged to make follower friendly choices and would assign a secure incentive strategy profile. An interesting observation is that it is in fact the unfriendly follower's loss to act adversarially towards the leader. As an example, we now refer to the bi-matrix game from Table 3.5. It shows a variant of the battle-of-sexes game where the follower (Player 2) now has three choices whereas the leader (Player 1) has only two. In a friendly incentive equilibrium, the leader would play the strategy I and assign the strategy III to her follower. The leader would pay an incentive of 1 for this, and the return of both leader and follower for the friendly incentive equilibrium (I,III) is therefore 3. However, the strategy profile is not secure and therefore the leader cannot assign this strategy to an unfriendly follower. For an adversarial follower, the leader would in fact assign a strategy profile (II,II) that is also secure. It provides the same return to the leader while the follower return now drops to 1. The friendly incentive equilibria are therefore in the interest of both the leader and her follower. On the other hand, secure incentive equilibria lie against the interest of the follower, while costing the leader nothing.

              Player 2
              I         II        III
Player 1  I   1, 3      0, 0      4, 2
          II  0, 0      3, 1      0, 0

Table 3.5: An unfriendly follower suffers in a secure equilibrium.

For the general case when secure incentive strategy profiles do not exist, we show that ε-optimal secure incentive strategy profiles exist. As an example, we again refer to the bi-matrix game from the prisoner's dilemma (Table 3.2). The only incentive equilibrium, where the leader co-operates and incentivises her follower to co-operate, too, by paying him a bribery value 1, is not secure: the follower could defect without affecting his payoff, while reducing the leader payoff from −2 to −10. The strategy profile (C,C) is an incentive equilibrium with bribery value 1, but this is not a secure strategy profile. However, the leader has an ε-optimal secure incentive strategy profile by suggesting (C,C) and offering a bribery value of 1 + ε. We now establish some of the nice properties of incentive equilibria in general to exemplify their simplicity, tractability and friendliness. We first discuss how much improvement one can obtain through incentive equilibria as compared to Nash or leader equilibria. If we norm the entries of the bi-matrix to be in [0, 1], the improvement that the leader (and the follower) can obtain through incentive equilibria is ε close to 1 compared to leader (and Nash) equilibria. For this, we refer to Table 3.6, a variant of the prisoner's dilemma with payoffs between 0 and 1. The social return in the incentive equilibrium is 2 − 2ε, while in leader and Nash equilibria the social return is 2ε. Note that, for any δ > 0, we can choose an ε, e.g., ε = min{0.1, δ/3}, to obtain an improvement greater than 1 − δ, for the leader's return and the follower's return at the same time, using only values in [0, 1] for the payoffs.

              Prisoner II
              C             D
Prisoner I C  1−ε, 1−ε      0, 1
           D  1, 0          ε, ε

Table 3.6: A variant of the prisoner's dilemma.

              Player 2
              I         II
Player 1  I   1, 2      0, 0
          II  0, 0      2, 1

Table 3.7: A Battle-of-Sexes game.

Another well studied class of games is the class of battle-of-sexes games. We do not expand on these games as leader equilibria (and even Nash equilibria) are sufficient to obtain any optimal solution in such games. Incentives would therefore play no role in them. Note that the example Battle-of-Sexes game from Table 3.7 has only one leader/incentive equilibrium, but three Nash equilibria: the pure strategies where both players play I or both play II, and a third mixed strategy equilibrium where Player 1 plays I with probability 1/3, and Player 2 plays I with probability 2/3. The outcomes of these strategy profiles are (1, 2), (2, 1), and (2/3, 2/3), respectively. Note that, even when one restricts the focus to those strategy profiles only, it needs to be negotiated which of them is taken. The complexity of this negotiation is outside of the complexity to determine Nash equilibria in the first place, but it is yet another level of complexity that is reduced by using incentive (or leader) equilibria. The only leader or incentive equilibrium (with zero incentive) here is the strategy profile (II,II), which is a secure strategy profile, too. In a leader resp. incentive equilibrium, it suffices for the leader to commit to a mixed strategy profile while the follower plays pure. A related equilibrium approach is the correlated equilibrium [Aum74], another relaxed form of Nash equilibrium that does not require explicit randomisation on any player's part. By the use of correlated strategies one can achieve a payoff (for all players) that is not less than their Nash payoff. The correlated strategies [Aum74] are selected by random events rather than a mixed strategy. A randomised strategy in a correlated equilibrium is therefore viewed as a random variable with values in the pure strategy space, rather than as a distribution over the pure strategy space. In a correlated equilibrium a probability distribution is drawn (by a random event or a mediator) on the strategies of the players. For a bi-matrix game, the probability distribution is given by an m × n matrix p that contains only non-negative entries of the type pij, such that a strategy pair (i, j) is selected with probability pij and ∑_{i=1}^{m} ∑_{j=1}^{n} pij = 1. The distribution pij is a correlation strategy if the payoff of both players on playing the recommended strategy is no worse than playing any other strategy. Thus, a correlation

strategy has to satisfy the constraints that ∑_{j=1}^{n} aij pij ≥ ∑_{j=1}^{n} akj pij and ∑_{i=1}^{m} bij pij ≥ ∑_{i=1}^{m} bil pij for each 1 ≤ k ≤ m and 1 ≤ l ≤ n. These inequalities result in a linear programme; an objective function to maximise the social welfare is added, and this linear programme needs to be solved to compute a correlated equilibrium. The computational difference with incentive equilibria is that, while in an incentive equilibrium the leader strategy is mixed, the strategies are randomised for both players in a correlated equilibrium. The objective in an incentive equilibrium is to maximise the leader's return, while in a correlated equilibrium it is to maximise the social welfare. Although a payoff vector (for both players) that is not achievable with a Nash equilibrium is achievable with a correlated equilibrium, note that the payoff for the leader in a correlated equilibrium cannot be higher than her payoff in an incentive equilibrium. For example, if we look at Table 3.7, a payoff vector (3/2, 3/2) is achievable with correlated strategies. As a random event, a coin is tossed; if it is heads, both players play strategy I and, if it is tails, they both play strategy II. This allows them to secure a payoff of (3/2, 3/2) that is not achievable with mixed Nash strategies. Note that the strategy profile is also an equilibrium as no player benefits from unilateral deviation. However, an incentive or leader equilibrium provides a higher payoff of 2 to the leader. The use of correlated equilibria in a leadership game (with two players), where one player (the leader) is in a position to commit to a strategy profile, is studied in [vSZ04]. They show that the lowest leader payoff in a (subgame perfect) leader equilibrium is at least as large as any correlated payoff, as any correlated payoff is as large as a Nash payoff. The highest leader payoff in a leader equilibrium is greater than or equal to the highest payoff of the row player in a correlated equilibrium ([vSZ04]). These observations, together with our observation that the leader payoff in a (friendly) incentive equilibrium can only be greater or equal to her payoff in a leader equilibrium, imply that an incentive equilibrium can only provide a greater or equal payoff than any correlated payoff (for the leader).
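To make the correlated equilibrium LP concrete, here is a sketch using scipy.optimize.linprog. The encoding is our own, under the assumption that the welfare-maximising objective described above is used; linprog minimises, so the objective is negated.

import numpy as np
from scipy.optimize import linprog

A = np.array([[1, 0], [0, 2]])   # row player payoffs of Table 3.7
B = np.array([[2, 0], [0, 1]])   # column player payoffs
m, n = A.shape

idx = lambda i, j: i * n + j            # flatten p_ij into a vector of length m*n
c = -(A + B).flatten()                  # maximise social welfare
A_ub, b_ub = [], []
for i in range(m):                      # row player: recommended row i beats any row k
    for k in range(m):
        if k != i:
            row = np.zeros(m * n)
            for j in range(n):
                row[idx(i, j)] = A[k, j] - A[i, j]
            A_ub.append(row); b_ub.append(0.0)
for j in range(n):                      # column player: recommended column j beats any column l
    for l in range(n):
        if l != j:
            row = np.zeros(m * n)
            for i in range(m):
                row[idx(i, j)] = B[i, l] - B[i, j]
            A_ub.append(row); b_ub.append(0.0)
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.ones((1, m * n)), b_eq=[1.0], bounds=[(0, 1)] * (m * n))
print(res.x.reshape(m, n))              # a welfare-maximising correlated equilibrium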

Friendly incentive equilibria. Intuitively, it is clear from Figure 3.2 that incentive equilibria improve over leader or Nash equilibria w.r.t. the payoff of the leader. The reason is simple. The set of incentive strategy profiles forms a broader set over the set of leader or Nash strategy profiles. The optimum selected over a larger set cannot be smaller than the optimum selected over smaller sets. Therefore, in any friendly incentive equilibrium, the leader can choose from strategies that only satisfy the side-constraint that the follower cannot improve over it by unilateral deviation. While selecting a strategy profile, the leader would consider to maximise the follower's payoff as well. For incentive equilibria, the leader can use incentives, whereas leader equilibria would optimise only over strategies without incentives (or: with zero incentives). Thus, incentive equilibria are again an optimum over a larger base than leader equilibria. This optimisation may further leave a plateau of jointly optimal strategies, strategies with the same optimal payoff (after bribery) for the leader. We have argued in Section 3.2 that – and why – the leader can and should move a step ahead and choose an equilibrium that is optimal for her follower among these otherwise equivalent solutions. This is another advantage of a clear outcome for the leader – the option to choose ex aequo an incentive equilibrium, which is good for the follower. Thus, a friendly incentive equilibrium is optimal for the leader, and assigns, among this class of equilibria, the highest payoff to the follower. Friendly incentive equilibria are therefore in favour of the follower as well. When the leader assigns strategies to herself and the other player, her primary objective is to maximise her own benefit and her secondary objective is that the follower receives a high gain, too. Thus, they refer to a situation where both the leader and her follower benefit, increasing the social quality of the result. We give an algorithm for the computation of friendly incentive equilibria in Section 3.6. The algorithm first constructs a set of constraint systems, one for each pure follower strategy that can occur in an incentive equilibrium. The constraint system additionally requires the leader payoff to be optimal and maximises the follower payoff. As output it gives an optimal strategy profile, the bribery value, and the leader's and the follower's payoff. We empirically evaluate the technique on randomly generated bi-matrix games. We considered 100,000 data-sets with continuous payoff values and integer payoff values for the evaluation of friendly incentive equilibria.

Tractability and purity of incentive equilibria. Another appealing property of general and friendly incentive equilibria is their simplicity: it suffices for the follower to consider pure strategies, or, similarly, it suffices for the leader to assign pure strategies to her follower. Another strong argument in favour of general and friendly incentive equilibria is that they are tractable, whereas the complexity of finding mixed strategy Nash equilibria is known to be PPAD-complete [DGP09].

3.6 Incentive Equilibria

We start with the discussion of friendly incentive equilibria in this section, followed by the discussion of ε-optimal incentive equilibria in the next section. We first show that ordinary and friendly incentive equilibria always exist. Moreover, there is always a simple friendly incentive equilibrium.

3.6.1 Existence of bribery stable strategy profiles

The existence of bribery stable strategy profiles is implied by the existence of Nash equilibria [Nas50, LH64], as Nash equilibria are special cases of bribery stable strategy profiles with the zero bribery vector.

Theorem 3.4 (See [Nas50, LH64]). Every bi-matrix game G(A, B) has a Nash equilibrium.

Corollary 3.5. Every bi-matrix game G(A, B) has a bribery stable strategy profile.

3.6.2 Optimality of simple bribery stable strategy profiles

Different to Nash equilibria, there are always simple incentive equilibria. In order to show this, we first show that there is always a simple bribery stable strategy profile. This is because, for a given bribery stable strategy profile (⟨σ, δ⟩, β) and a j in the support of δ^T, (⟨σ, j⟩, β) is a bribery stable strategy profile, too.

Theorem 3.6. For every bi-matrix game G(A, B) and every bribery stable strategy profile (⟨σ, δ⟩, β) with lpayoff(A; ⟨σ, δ⟩, β) = v for the leader, there is always a simple bribery stable strategy profile (⟨σ, j⟩, β) with lpayoff(A; ⟨σ, j⟩, β) ≥ v for the leader.

Proof. Let S = support(δ^T) and let s = fpayoff(B; ⟨σ, δ⟩, β) be the payoff for the follower in this bribery stable strategy profile. We first argue that, for all j ∈ S, (⟨σ, j⟩, β) is a simple bribery stable strategy profile with the same follower payoff as (⟨σ, δ⟩, β). First, the follower payoff cannot be higher, as (⟨σ, δ⟩, β) would otherwise not be a bribery stable strategy profile (the follower could improve his payoff by changing his strategy to j). Assuming for contradiction that there is a j ∈ S with fpayoff(B; ⟨σ, j⟩, β) < s implies, together with the previous observation that fpayoff(B; ⟨σ, j⟩, β) ≤ s holds for all j ∈ S, that ∑_{j∈S} δ^T(j) · fpayoff(B; ⟨σ, j⟩, β) < s, which contradicts s = fpayoff(B; ⟨σ, δ⟩, β). Taking into account that the leader payoff v = ∑_{j∈S} δ^T(j) · lpayoff(A; ⟨σ, j⟩, β) is an affine combination of the leader payoffs for these simple bribery stable strategy profiles, there is some j ∈ S with lpayoff(A; ⟨σ, j⟩, β) ≥ v.

Note that the above theorem reflects the ‘best response’ condition [NRTV07] in a bi-matrix game. The ‘best response’ condition states that all pure strategies in the support of a mixed strategy must get maximal and also equal payoff. For a mixed strategy Nash equilibrium, any pure strategy in the support of a mixed strategy is a ‘best response’ to the strategy chosen by the other player. The above theorem asserts this condition for an incentive equilibrium. That is, for a mixed strategy of the leader, any pure strategy of the follower that is in the support of the follower's mixed strategy is a best response. It therefore suffices for the leader to assign a pure strategy to her follower.

3.6.3 Description of simple bribery stable strategy profiles

Theorem 3.6 allows us to seek incentive equilibria only among simple bribery stable strategy profiles. Intuitively, it implies that it suffices for the leader to assign a pure strategy to her follower while she herself can play a mixed strategy. Note that this is in contrast to general Nash equilibria, cf. the rock-paper-scissors game. Simple bribery stable strategy profiles are defined by a set of linear inequations.

Theorem 3.7. For a bi-matrix game G(A, B), (⟨σ, j⟩, β) is a simple bribery stable strategy profile if, and only if, σB j⃗ + β j⃗ ≥ σB i⃗ + β i⃗ holds for all pure strategies i = 1, . . . , n of the follower.

Proof. If (⟨σ, j⟩, β) is a simple bribery stable strategy profile, then in particular changing the strategy to a different pure strategy i cannot be beneficial for the follower. Consequently, it holds for all i = 1, . . . , n that σB j⃗ + β j⃗ ≥ σB i⃗ + β i⃗. If it holds for all i = 1, . . . , n that σB j⃗ + β j⃗ ≥ σB i⃗ + β i⃗, then we note that, for any follower strategy δ, the payoff under σ is an affine combination fpayoff(B; ⟨σ, δ⟩, β) = ∑_{i∈S} δ(i) · (σB i⃗ + β i⃗) of the payoffs for the individual pure strategies. Using σB j⃗ + β j⃗ ≥ σB i⃗ + β i⃗, we get fpayoff(B; ⟨σ, δ⟩, β) = ∑_{i∈S} δ(i) · (σB i⃗ + β i⃗) ≤ ∑_{i∈S} δ(i) · (σB j⃗ + β j⃗) = σB j⃗ + β j⃗ = fpayoff(B; ⟨σ, j⟩, β).

Theorem 3.8. For a bi-matrix game G(A, B) with a bribery vector β′ = (β′1, . . . , β′n)^T, and for a simple bribery stable strategy profile (⟨σ, j⟩, β′), we can replace β′ by a j-bribery vector β = (β1, . . . , βn)^T with βj = β′j.

Proof. According to Theorem 3.6, for a bi-matrix game G(A, B) and a bribery vector β′ = (β′1, . . . , β′n)^T, there is always a simple bribery stable strategy profile with δ^T = j⃗ and optimal payoff for the leader. We now choose δ^T = j⃗ and β = (β1, . . . , βn)^T, with βj = β′j and βi = 0 for all i ≠ j. We first observe that (⟨σ, j⟩, β) provides the same leader and follower payoff as (⟨σ, j⟩, β′). Next we observe that if (⟨σ, j⟩, β′) is bribery stable, then so is (⟨σ, j⟩, β) by Theorem 3.7. In our equation systems, we can therefore focus on simple incentive equilibria (⟨σ, j⟩, β) with j-bribery β. The value of this j-bribery β can then be described by the value of the incentive ι = βj that the leader gives to her follower to solicit him to play j.

3.6.4 Computing incentive equilibria

Computing incentive equilibria invites the definition of a constraint system C_j^{G(A,B)} for each pure follower strategy j, which describes the leader strategies σ = (p_1, . . . , p_m). Theorem 3.7 and Theorem 3.8 allow us to reflect, through a constraint system, the fact that (⟨σ, j⟩, β) can be assumed to be a simple bribery stable strategy profile with j-bribery vector β. This constraint system C_j^{G(A,B)} consists of m + n + 1 constraints, where m + 1 constraints describe that σ is a strategy,

• Σ_{i=1}^{m} p_i = 1 (the sum of the weights is 1) and

• the m non-negativity requirements p_i ≥ 0 for 1 ≤ i ≤ m,

and n − 1 constraints reflect the conditions from Theorem 3.7 on an incentive equilibrium. That is, C_j^{G(A,B)} contains n − 1 constraints of the form

• Σ_{k=1}^{m} (b_{kj} − b_{ki}) p_k + ι ≥ 0, one for each i ≠ j with 1 ≤ i ≤ n,

and a non-negativity constraint on the bribery value ι,

• ι ≥ 0.

As this reflects the conditions from Theorems 3.7 and 3.8, we first get the following corollary.

Corollary 3.9. The solutions of C_j^{G(A,B)} describe the set of leader strategy and j-bribery vector pairs (σ, β) such that (⟨σ, j⟩, β) is bribery stable.

Example 3.1. If we consider the first strategy of Prisoner II from the Prisoner's Dilemma from Table 3.2, then C_1^{G(A,B)} consists of the constraints

• p_1 + p_2 = 1,

• ι, p_1, p_2 ≥ 0, and

• ι − p_1 − 2p_2 ≥ 0.

Note that the constraints do not depend on the payoff matrix of the leader. What depends on the payoff matrix of the leader is the formalisation of the objective. We denote with LP_j^{G(A,B)} the linear programming problem that consists of the constraint system C_j^{G(A,B)} and the objective function

max Σ_{k=1}^{m} a_{kj} p_k − ι

where Σ_{k=1}^{m} a_{kj} p_k − ι = lpayoff(A; ⟨σ, j⟩, β) is the payoff the leader obtains for such a simple bribery stable strategy profile (⟨σ, j⟩, β) with σ = (p_1, . . . , p_m), and β_j = ι ≥ 0 is the bribery value of the j-bribery vector β.

Corollary 3.10. The solutions to LP_j^{G(A,B)} describe the set of leader strategy and j-bribery vector pairs (σ, β) such that (⟨σ, j⟩, β) is bribery stable and the leader return lpayoff(A; ⟨σ, j⟩, β) is maximal among simple bribery stable strategy profiles with follower strategy j and a j-bribery vector.

Example 3.2. If we consider the first strategy of Prisoner II from the Prisoner's Dilemma from Table 3.2, then LP_1^{G(A,B)} consists of the constraints from C_1^{G(A,B)} and the objective

max −p1 − ι.

This provides us with a simple algorithm for determining (simple) incentive equilibria.

Corollary 3.11. To find an incentive equilibrium for a game G(A, B), it suffices to solve the linear programming problems LP_j^{G(A,B)} for all 1 ≤ j ≤ n, to select an i with maximal solution among them, and to use a solution (⟨σ_i, i⟩, β), where β is an i-bribery vector with β_i = ι from the solution of LP_i^{G(A,B)}. This solution (⟨σ_i, i⟩, β) is an incentive equilibrium.

Proof. Obviously, LP_j^{G(A,B)} has some solution iff C_j^{G(A,B)} is satisfiable¹, and thus, by Corollary 3.9, if there is a leader strategy σ and a j-bribery vector β such that (⟨σ, j⟩, β)

¹ Although it has no relevance for the proof, we would like to remark here that they all have a solution, as choosing a sufficiently high ι makes all strategy profiles (⟨σ, j⟩, β) with a j-bribery vector β bribery stable.

is a simple bribery stable strategy profile. In this case, a solution to LP_j^{G(A,B)} reflects an optimal simple bribery stable strategy profile among those in which the follower plays the pure strategy j; in particular, such an optimum exists. (Note that the leader return is bounded from above by the highest entry a_{ij} in her payoff matrix.) Thus, the maximal return from some LP_i^{G(A,B)} is the optimal solution among all simple bribery stable strategy profiles. Assuming that a better non-simple bribery stable strategy profile exists implies by Theorem 3.6 that a better simple incentive equilibrium exists as well, and thus leads to a contradiction. Note that the existence of some simple bribery stable strategy profile is established by Corollary 3.5 and Theorem 3.6.

As linear programming problems can be solved in polynomial time [Kar84, Kha79], finding an incentive equilibrium is tractable.

Corollary 3.12. An optimal incentive equilibrium can be constructed in polynomial time.
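To make the procedure of Corollaries 3.11 and 3.12 concrete, the following sketch builds and solves LP_j^{G(A,B)} for every pure follower strategy j with an off-the-shelf LP solver and keeps the best solution. It is only an illustration under stated assumptions – it is not the implementation used in this thesis (which is written in C on top of lp_solve); the function name incentive_equilibrium, the use of numpy/scipy, and the m × n matrix layout (rows = leader strategies, columns = follower strategies) are choices made for this example.

    # Sketch: incentive equilibria of a bi-matrix game G(A, B) via one LP per pure
    # follower strategy j (Corollary 3.11); assumes numpy and scipy are available.
    import numpy as np
    from scipy.optimize import linprog

    def incentive_equilibrium(A, B):
        A, B = np.asarray(A, float), np.asarray(B, float)
        m, n = A.shape
        best = None
        for j in range(n):
            # variables x = (p_1, ..., p_m, iota)
            c = np.append(-A[:, j], 1.0)           # minimise -(sum_k a_kj p_k - iota)
            A_ub, b_ub = [], []
            for i in range(n):
                if i == j:
                    continue
                # bribery stability: sum_k (b_kj - b_ki) p_k + iota >= 0
                A_ub.append(np.append(B[:, i] - B[:, j], -1.0))
                b_ub.append(0.0)
            A_eq = [np.append(np.ones(m), 0.0)]    # sum_k p_k = 1
            res = linprog(c, A_ub=A_ub or None, b_ub=b_ub or None,
                          A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (m + 1))
            if res.success and (best is None or -res.fun > best[0]):
                best = (-res.fun, j, res.x[:m], res.x[m])
        return best                                # (leader payoff, j, sigma, iota)

On the game of Table 3.10 below, for instance, this sketch should report the leader payoff 3 that also appears in Example 3.5.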

3.6.5 Friendly incentive equilibria

As discussed in Section 3.2, the leader should follow a secondary objective of being benign to the follower. We have seen that it is cheap and simple to determine the value v^l_{max} that the leader can at most acquire in an incentive equilibrium. It is therefore an interesting follow-up question to determine the highest payoff v^f_{max} for her follower in an incentive equilibrium with leader payoff v^l_{max}, i.e., to construct and evaluate friendly incentive equilibria. We first observe that friendliness does not come at the cost of simplicity.

Theorem 3.13. For every bi-matrix game G(A, B) and every incentive equilibrium (hσ, δi, β) with payoff v for the follower, it holds for all j ∈ support(δT ) that (hσ, ji, β) is an incentive equilibrium with payoff v for the follower.

Proof. Let S = support(δ^T) be the support of δ^T and let v^l_{max} = lpayoff(A; ⟨σ, δ⟩, β) be the leader payoff of (⟨σ, δ⟩, β). In the proof of Theorem 3.6 we have shown that, for all j ∈ S, (⟨σ, j⟩, β) is a simple bribery stable strategy profile with the same payoff fpayoff(B; ⟨σ, j⟩, β) = fpayoff(B; ⟨σ, δ⟩, β) for the follower. To establish that (⟨σ, j⟩, β) is also an incentive equilibrium, we first note that the leader payoff cannot be higher than v^l_{max} in an incentive equilibrium, such that lpayoff(A; ⟨σ, j⟩, β) ≤ v^l_{max} holds for all j ∈ S. Assuming for contradiction that there is a j ∈ S with lpayoff(A; ⟨σ, j⟩, β) < v^l_{max} would, together with the previous observation that lpayoff(A; ⟨σ, j⟩, β) ≤ v^l_{max} holds for all j ∈ S, imply that the affine combination Σ_{j∈S} δ(j) · lpayoff(A; ⟨σ, j⟩, β) of these values defined by δ is strictly smaller than v^l_{max}, and would therefore lead to a contradiction.

Theorem 3.14. For a bi-matrix game G(A, B) and every incentive equilibrium (⟨σ, δ⟩, β) with payoff v for the follower, there is always a pure follower strategy j with a j-bribery vector β′ such that (⟨σ, j⟩, β′) is an incentive equilibrium with follower payoff ≥ v.

Proof. First, according to Theorem 3.13, we can observe that there is always a pure follower strategy j such that (⟨σ, j⟩, β) is an incentive equilibrium that returns the maximal gain for the follower. Following the same argument as in the proof of Theorem 3.8, we can amend β by only incentivising the follower to play j, replacing all other entries β_i, i ≠ j, by β′_i = 0 and setting β′_j = β_j. The payoffs of the leader and the follower are unaffected, and if (⟨σ, j⟩, β) is bribery stable, so is (⟨σ, j⟩, β′): the follower return for playing j is unaffected, while the follower return for all other strategies is not increased.

Like in the quest for ordinary incentive equilibria, we can therefore focus on pure follower strategies j and the respective j-bribery vectors when seeking friendly incentive equilibria. Recall that each constraint system C_j^{G(A,B)} describes the set of leader strategies σ and j-bribery vectors β such that (⟨σ, j⟩, β) is a simple bribery stable strategy profile. In order to be an incentive equilibrium, it also has to satisfy the optimality constraint

Σ_{k=1}^{m} a_{kj} p_k − ι ≥ v^l_{max},

where v^l_{max} denotes the leader return for incentive equilibria. We refer to the extended constraint system by E_j^{G(A,B)}. By Corollary 3.9, the set of solutions to this constraint system is non-empty iff there is an incentive equilibrium of the form (⟨σ, j⟩, β).

Corollary 3.15. The solutions to E_j^{G(A,B)} describe the set of leader strategies σ and bribery values ι such that, for the j-bribery vector β with β_j = ι, (⟨σ, j⟩, β) is an incentive equilibrium for G(A, B).

Example 3.3. Considering again the Prisoner's Dilemma from Table 3.2, E_1^{G(A,B)} consists of the constraints from C_1^{G(A,B)} plus the optimality constraint

−p1 − ι ≥ −2

We now extend the constraint system E_j^{G(A,B)} to an extended linear programming problem ELP_j^{G(A,B)} by adding the objective

max Σ_{k=1}^{m} b_{kj} p_k + ι

Corollary 3.16. The solutions to ELP_j^{G(A,B)} describe the set of leader strategy and j-bribery vector pairs (σ, β) such that (⟨σ, j⟩, β) is an incentive equilibrium and, if such a solution exists, the follower return fpayoff(B; ⟨σ, j⟩, β) is maximal among these simple bribery stable strategy profiles with follower strategy j and j-bribery vector β.

Example 3.4. If we consider the Prisoner's Dilemma from Table 3.2, then ELP_1^{G(A,B)} consists of the constraints from E_1^{G(A,B)} and the objective

max −p1 − 10p2 + ι

Together with the observation of Theorem 3.13, Corollary 3.16 provides an algorithm for finding a friendly incentive equilibrium.

Corollary 3.17. To find a friendly incentive equilibrium for a game G(A, B), it suffices to solve the linear programming problems ELP_j^{G(A,B)} for all 1 ≤ j ≤ n, to select an i with maximal solution among them, and to use a solution (⟨σ_i, i⟩, β), where β is an i-bribery vector with β_i = ι from the solution of ELP_i^{G(A,B)}. This solution (⟨σ_i, i⟩, β) is a friendly incentive equilibrium.

Proof. ELP_j^{G(A,B)} has some solution iff E_j^{G(A,B)} is satisfiable, and thus, by Corollary 3.15, iff there is a leader strategy σ such that (⟨σ, j⟩, β) is an incentive equilibrium. In this case, a solution to ELP_j^{G(A,B)} reflects an incentive equilibrium with the maximal follower payoff among the incentive equilibria of the form (⟨σ, j⟩, β); in particular, such an optimum exists. Thus, the returned result is the optimal solution among all simple incentive equilibria. Assuming that a better non-simple friendly incentive equilibrium exists implies by Theorem 3.13 that a better simple friendly incentive equilibrium exists as well, and thus leads to a contradiction. Note that the existence of some simple optimal incentive equilibrium is implied by Corollary 3.11.

As linear programming problems can be solved in polynomial time [Kar84, Kha79], finding friendly incentive equilibria is tractable.

Corollary 3.18. A simple friendly incentive equilibrium can be constructed in polyno- mial time.
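A second LP stage then yields a friendly incentive equilibrium as in Corollary 3.17. The sketch below is again only an illustration, not the thesis's C implementation: it assumes the value v_l_max has already been computed (e.g., by the previous sketch), and the small slack eps is merely a numerical safeguard rather than part of the construction.

    # Sketch: friendly incentive equilibria via ELP_j (Corollary 3.17); v_l_max is
    # the optimal leader value of an incentive equilibrium, computed beforehand.
    import numpy as np
    from scipy.optimize import linprog

    def friendly_incentive_equilibrium(A, B, v_l_max, eps=1e-9):
        A, B = np.asarray(A, float), np.asarray(B, float)
        m, n = A.shape
        best = None
        for j in range(n):
            c = np.append(-B[:, j], -1.0)          # minimise -(sum_k b_kj p_k + iota)
            A_ub = [np.append(B[:, i] - B[:, j], -1.0) for i in range(n) if i != j]
            b_ub = [0.0] * len(A_ub)
            # optimality constraint: sum_k a_kj p_k - iota >= v_l_max
            A_ub.append(np.append(-A[:, j], 1.0))
            b_ub.append(-v_l_max + eps)            # tiny slack against rounding errors
            res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                          A_eq=[np.append(np.ones(m), 0.0)], b_eq=[1.0],
                          bounds=[(0, None)] * (m + 1))
            if res.success and (best is None or -res.fun > best[0]):
                best = (-res.fun, j, res.x[:m], res.x[m])
        return best                                # (follower payoff, j, sigma, iota)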

3.6.6 Friendly incentive equilibria in zero-sum games

Here, we establish the friendliness of incentive equilibria in zero-sum games and show that the leader cannot gain anything by paying a bribery in zero-sum games. Zero-sum games are bi-matrix games where the gain of the leader is the loss of the follower and vice versa; they satisfy a_{ij} = −b_{ij} for all 1 ≤ i ≤ m and all 1 ≤ j ≤ n. Different from general bi-matrix games, zero-sum games are determined in that both players have determined expected payoffs when both play rationally, as the payoff of player 1 is completely determined by the payoff of player 2 and vice versa. Rational behaviour in zero-sum games is therefore an acid (conclusive) test for new concepts: they do not have different levels of reasoning, and the opponent is predictable. We would like to make a few simple observations to show that incentive equilibria pass this acid test.

Theorem 3.19. If ⟨σ, δ⟩ is a Nash equilibrium in a zero-sum game and j ∈ support(δ^T), then ⟨σ, j⟩ is a friendly incentive equilibrium with zero bribery vector β_0 and with

lpayoff(A; ⟨σ, j⟩, β_0) = lpayoff(A; ⟨σ, δ⟩, β_0).

Proof. We first establish that (⟨σ, j⟩, β_0) is a simple bribery stable strategy profile. To see this, we use that (⟨σ, δ⟩, β_0) is a Nash equilibrium, and therefore fpayoff(B; ⟨σ, j⟩, β_0) ≤ fpayoff(B; ⟨σ, δ⟩, β_0) holds for all j ≤ n. Assuming that this inequation is strict for any j ∈ support(δ^T) violates fpayoff(B; ⟨σ, δ⟩, β_0) = Σ_{j=1}^{n} δ(j) · fpayoff(B; ⟨σ, j⟩, β_0). Second, we observe that playing δ offers a return fpayoff(B; ⟨σ′, δ⟩, β_0) ≥ fpayoff(B; ⟨σ, δ⟩, β_0) against every leader strategy σ′, as otherwise the leader could benefit from deviating from (⟨σ, δ⟩, β_0). Now, to establish that this holds for all j ≤ n, such that the strategy profile is simple, we use a similar argument as above. That is, using a similar "≤, but not <, as it would contradict ≥ from the affine combination" argument, we can lead the assumption that fpayoff(B; ⟨σ′, δ⟩, β_0) > fpayoff(B; ⟨σ′, j⟩, β_0) holds for all j ∈ support(δ^T) to a contradiction. Consequently there is, for all leader strategies σ′, a j ∈ support(δ^T) such that fpayoff(B; ⟨σ′, j⟩, β_0) ≥ fpayoff(B; ⟨σ, δ⟩, β_0). The zero-sum property then provides the claim.

Note that all incentive equilibria are, for this reason (that these games are symmetric), friendly in zero-sum games. The definition of a simple bribery stable strategy profile now shows that the leader return lpayoff(A; ⟨σ, j⟩, β_0) can only improve when the follower changes his strategy. (Note that this does not generally hold for non-zero-sum games.) Consequently, an incentive equilibrium provides the leader with the same guarantee as her rational strategy in zero-sum games. Her follower might be left exploitable, but in the selected strategy profile he receives the same payoff as with a rational strategy, without having to resort to randomisation. As these games are symmetric, this in particular implies that both players can play leader strategies in zero-sum games, and the leader therefore cannot gain anything by paying a bribery (as bribery is useful only when there is an asymmetry).

Corollary 3.20. If (⟨σ, j⟩, β_0) is an incentive equilibrium in a zero-sum game G(A, B) and (⟨δ, i⟩, β′_0) is an incentive equilibrium in G(B, A), where β_0 and β′_0 are the zero vectors, then ⟨σ, δ⟩ is a Nash equilibrium in G(A, B).

Naturally, this does not extend to general bi-matrix games.

3.6.7 Monotonicity and relative social optimality

As an interesting property of incentive equilibria, we observe that the payoff of the leader in incentive equilibria grows monotonously in A. A function grows monotonously in a matrix if, for two matrices A and A′ where each entry of A′ is greater than or equal to the corresponding entry of A (intuitively: a′_{ij} ≥ a_{ij}), the value of the function for A′ is greater than or equal to its value for A.

                Follower
                I            II
  Leader  I     −5, 1 + ε    −4, −1
          II     5, 1         4, −1
          III    0, 0         1, 0

Table 3.8: Increasing the payoff matrix for the follower by ε.

We show that the leader return in leader and incentive equilibria grows monotonously with A. Moreover, if each entry of A grows strictly, then the leader return grows strictly. Note that monotonicity does not hold for the follower in either case. Table 3.8 shows the effect of increasing the payoff matrix of the follower by a value ε. In Table 3.8, for example, the friendly equilibrium changes from the pure leader strategy II and the pure follower strategy I for ε = 0 to the pure leader strategy III and the pure follower strategy II for ε > 0. As a result, the follower payoff is reduced from 1 to 0.

Theorem 3.21. The leader payoff in incentive equilibria grows monotonously in the payoff matrix of the leader. If all entries grow strictly, so does the leader payoff.

Proof. As observed earlier, the individual constraint systems do not depend on the payoff matrix of the leader, and the set of simple bribery stable strategy profiles is not affected by replacing a payoff matrix A by an entry-wise greater payoff matrix A′. An incentive equilibrium for A is thus a bribery stable strategy profile for A′, too, and the payoff of this strategy profile for A′ has the required properties. Consequently, an incentive equilibrium for A′ has them as well.

Theorem 3.22. If (hσ, ji, β) is a friendly incentive equilibrium with j-bribery vector β, then j is the socially optimal response to σ.

Proof. First, there is always a pure socially optimal response. For a pure follower response j to be socially optimal, the joint return of both players in the strategy profile (⟨σ, j⟩, β) must be at least as high as for any other pure follower response i. Let us assume for contradiction that there is a socially strictly better pure response i. For i to be socially strictly better than j, it must hold that σA\vec{i} + σB\vec{i} > σA\vec{j} + σB\vec{j}. (Note that bribery is socially neutral.) This is equivalent to σA\vec{j} + σB\vec{j} − σA\vec{i} < σB\vec{i}. As (⟨σ, j⟩, β) is a friendly incentive equilibrium and β = (β_1, . . . , β_n)^T is a j-bribery vector, we know that σB\vec{j} + β_j ≥ σB\vec{i}. In order to incentivise the follower to play i, it suffices to choose an i-bribery vector β′ = (β′_1, . . . , β′_n)^T such that σB\vec{i} + β′_i ≥ σB\vec{j} + β_j, for which we can choose β′_i = β_j + σB\vec{j} − σB\vec{i}, which is non-negative as (⟨σ, j⟩, β) is bribery stable. Note that this immediately provides all inequalities from Theorem 3.7, such that (⟨σ, i⟩, β′) is bribery stable. The leader payoff, however, would be lpayoff(A; ⟨σ, i⟩, β′) = σA\vec{i} − β′_i > (σA\vec{j} + σB\vec{j} − σB\vec{i}) − β′_i = σA\vec{j} − β_j = lpayoff(A; ⟨σ, j⟩, β), which contradicts the optimality requirement of incentive equilibria.

3.7 Secure incentive strategy profiles

As discussed in Section 3.2, friendly incentive equilibria constitute mutually considerate behaviour of the leader and the follower. The extra computational effort for enforcing friendliness can be viewed as the price the leader has to pay for the consideration that she asks from her follower: to follow her suggestion unless it harms him. One can view this setting as a situation where a rational follower has the main objective to maximise his return and a secondary objective to maximise the return of the leader. In this section, we discuss the impact of changing the follower model to one where the follower is rational in that his main objective remains to maximise his return, but his secondary objective is reversed to harming the leader. Under this assumption, the leader no longer needs to be considerate to her follower, but she now has to secure her return against deviation. An interesting observation is that a secure incentive equilibrium may not always exist (cf. Table 3.9), and when one does, there is one that is also a leader equilibrium. We establish these results in Section 3.8. For the general case, when they do not exist, we construct near-optimal secure strategy profiles. Unsurprisingly, a near-optimal strategy profile is easy to construct: starting from a simple incentive equilibrium, the leader can secure it by raising the bribe by an arbitrarily small amount ε > 0. From a theoretical point of view, it is also interesting to determine whether or not there is an optimal secure strategy profile. Besides formalising the simple way of finding a secure ε-incentive equilibrium, we show that it suffices to seek secure incentive equilibria among leader equilibria: no incentive can be required for them. This provides a simple necessary criterion (equal leader return of incentive and leader equilibria) and a simple sufficient criterion (checking one such leader equilibrium for being secure). We show that checking (constructively) whether a secure incentive equilibrium exists is tractable, though the algorithm is more intricate.

3.7.1 ε-optimal secure incentive strategy profiles

We start with the simple observation that there is, for all ε > 0, a secure incentive strategy profile that provides the leader with a return that is at most ε worse than the return she obtains from an incentive equilibrium. This is because it suffices to increase the bribery value by ε to make an incentive strategy profile secure. For ease of notation, we denote, for a given bribery vector β = (β_1, . . . , β_n), by β_j^ε = (β′_1, . . . , β′_n) the bribery vector whose j-th entry is increased by ε (β′_j = β_j + ε) and whose other entries are not altered (β′_i = β_i for all i ≠ j).

Lemma 3.23. For a simple incentive equilibrium (⟨σ, j⟩, β) and ε > 0, (⟨σ, j⟩, β_j^ε) is a simple secure incentive strategy profile.

Proof. As (hσ, ji, β) is a simple incentive equilibrium, the follower has no incentive to deviate from j. In particular, the follower’s return upon playing any other pure strategy is not better than pure strategy j. Chapter 3. Bi-Matrix Games 44

Consequently, in (⟨σ, j⟩, β_j^ε), the follower loses at least ε when deviating to any other pure strategy j′ ≠ j. He therefore loses strictly upon any deviation from j.

Simple secure ε-equilibria are therefore simple to construct.

Theorem 3.24. For a simple incentive equilibrium (⟨σ, j⟩, β) and ε > 0, (⟨σ, j⟩, β_j^ε) is a simple secure ε-incentive equilibrium.

Proof. Lemma 3.23 shows that (⟨σ, j⟩, β_j^ε) is a secure incentive strategy profile, and it is obvious that the leader's return is exactly ε lower than for the simple incentive equilibrium (⟨σ, j⟩, β). Let us assume for contradiction that there is a secure incentive strategy profile (⟨σ′, τ′⟩, β′) whose leader return exceeds the leader return of (⟨σ, j⟩, β_j^ε) by more than ε. Then the leader return of (⟨σ′, τ′⟩, β′) exceeds the leader return of (⟨σ, j⟩, β). But as secure incentive strategy profiles are in particular bribery stable strategy profiles, this contradicts the assumption that (⟨σ, j⟩, β) is an incentive equilibrium.

It is therefore enough to focus on simple secure ε-incentive equilibria. Note that the above lemma also provides a recipe to construct near optimal secure strategy profiles: they can be obtained from simple incentive equilibria by increasing the bribery value slightly. By Lemma 3.23 and Corollary 3.18, their construction is therefore tractable.

Corollary 3.25. A simple secure ε-incentive equilibrium can be constructed in polyno- mial time.
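As a minimal illustration of this recipe – under the assumption that a simple incentive equilibrium is represented by the leader strategy sigma, the assigned pure follower strategy j, and the incentive iota = β_j – securing it amounts to nothing more than raising the incentive:

    def secure_eps_profile(sigma, j, iota, eps):
        # Raising the incentive for j by eps > 0 makes every deviation from j lose
        # at least eps for the follower; the leader's return drops by exactly eps.
        assert eps > 0
        return sigma, j, iota + eps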

Another corollary of Theorem 3.24 is that the value of ordinary and secure incentive equilibria – when they exist – cannot be different, as they cannot differ by more than ε for all ε > 0.

Corollary 3.26. When a secure incentive equilibrium exists, then it is also an incentive equilibrium.

3.8 Secure incentive equilibria

The easy way to construct near optimal simple secure incentive strategy profiles raises the immediate question if there always exists an optimal one. This is, unfortunately, not the case. Consider, for example, the bi-matrix game below.

                Follower
                I       II
  Leader  I     1, 0    0, 0

Table 3.9: A simple bi-matrix game without a secure incentive equilibrium.

In leader and ordinary incentive equilibria, the leader can influence the game by suggesting that the follower play I. Indeed, suggesting to play I while paying no incentive for doing so is the only incentive (and leader) equilibrium.

In a setting where the follower has a secondary objective to harm the leader, this suggestion is of no avail. The only way for the leader to secure a payoff > 0 is to incentivise her follower to play strategy I by offering him a small bribery value ε > 0. It is apparent that any value ε > 0 would do, while 0 itself is insufficient. Consequently, the leader can obtain any payoff < 1, but not 1, such that no optimal secure incentive strategy profile exists.

Lemma 3.27. Secure incentive equilibria do not always exist.

We will now show that, when a secure incentive equilibrium exists, there is one that is also a simple leader equilibrium.

Theorem 3.28. If a secure incentive equilibrium exists, then there exists one, which is also a simple leader equilibrium.

Proof. Let (⟨σ, τ⟩, β) be a secure incentive equilibrium. We first observe that there is no pure strategy j such that the payoff of the follower would increase strictly when he plays j, and that, for all j where his payoff remains the same as for τ, the payoff of the leader does not decrease. Let us denote by J′ the set of pure strategies of the follower for which his payoff remains the same, and by J the set of pure strategies of the follower for which both his payoff and the payoff of the leader remain the same. Note that J must include the support of τ. We now distinguish three cases.

1. There is a strategy j ∈ J with β_j = 0. As, for a pure follower strategy j ∈ J, the payoffs of both the leader and the follower remain the same, and as β_j = 0, (⟨σ, j⟩, 0) is a simple secure incentive equilibrium (with zero bribery) and a simple leader equilibrium.

2. There is no strategy j ∈ J with β_j = 0, and J = J′. As the follower loses on all pure strategies not in J, there is a minimal amount ε he loses on any of them. Let ε′ = min{β_j | j ∈ J} and δ = ½ · min{ε, ε′}. We then derive β′ from β by setting β′_j = β_j − δ for all j ∈ J and β′_j = β_j otherwise. (⟨σ, τ⟩, β′) is then a secure incentive strategy profile with a higher payoff (by δ > 0) for the leader compared to (⟨σ, τ⟩, β). As a secure incentive equilibrium is in particular an incentive equilibrium (Corollary 3.26), this contradicts the assumption that (⟨σ, τ⟩, β) is an incentive equilibrium.

3. There is no strategy j ∈ J with β_j = 0, and J ≠ J′. As the follower loses on all pure strategies not in J′, there is a minimal amount ε he loses on any of them. As the leader gains strictly more on all pure strategies in J′ \ J, there is a minimal amount ε′ by which her gain increases among them. Let j′ ∈ J′ \ J be a pure strategy of the follower for which the leader would gain this minimal ε′.

Let ε′′ = min{β_j | j ∈ J} and δ = ½ · min{ε, ε′, ε′′}. We then derive β′ from β by setting β′_j = β_j − δ for all j ∈ J and β′_j = β_j otherwise. (⟨σ, j′⟩, β′) with j′ ∈ J′ \ J is then a secure incentive strategy profile with a higher payoff (by ε′ > 0) for the leader compared to (⟨σ, τ⟩, β). As a secure incentive equilibrium is in particular an incentive equilibrium (Corollary 3.26), this contradicts the assumption that (⟨σ, τ⟩, β) is an incentive equilibrium.

Note, however, that the bi-matrix game from Table 3.9 shows that the existence of a leader equilibrium with the same leader return as the incentive equilibria is not a sufficient criterion.

Corollary 3.29. For the existence of secure incentive equilibria, the existence of leader equilibria with the same leader return as incentive equilibria is a necessary, but not a sufficient criterion.

3.8.1 Constructing secure incentive equilibria – outline

We give here an outline of a tractable and constructive test for the existence of secure incentive equilibria. Theorem 3.28 establishes that secure incentive equilibria can be sought among simple leader equilibria. It therefore suffices to consider leader strategy profiles when testing the existence of secure incentive equilibria. We develop our test by making increasingly weaker assumptions on the knowledge we have. We start with knowing a solution: a secure leader strategy profile, which is also a secure incentive strategy profile (with zero incentive). We continue by assuming only abstract information about such a solution, namely on which deviations of the follower the leader would lose. We then close by assuming only knowledge of the pure follower strategy the leader advises. Assume we already know a strategy profile ⟨σ, j⟩ that is a secure leader strategy profile. For the strategy profile ⟨σ, j⟩ to be a secure incentive equilibrium, we first identify the following important criteria that it needs to satisfy:

• The strategy profile ⟨σ, j⟩ is a leader strategy profile.

• lpayoff(⟨σ, j⟩) ≥ v^L_{IE}, i.e., no ordinary incentive equilibrium exists with a greater return.

• The strategy profile ⟨σ, j⟩ is secure. Thus, for any follower deviation on which the leader loses strictly, the follower also loses strictly.

As we already know the strategy profile ⟨σ, j⟩, we define the sets J_noLoss and J_loss as follows.

Definition 3.30. The set J_noLoss = { i ≠ j | lpayoff(A; (σ, i)) ≥ lpayoff(A; (σ, j)) } is the set of indices where the leader does not lose when the follower deviates from the advised strategy j to i.

Definition 3.31. The set J_loss = { i ≠ j | lpayoff(A; (σ, j)) > lpayoff(A; (σ, i)) } is the set of indices where the leader loses strictly when the follower deviates from the advised strategy j to i.

For any follower deviation from the strategy j to a different pure strategy i in the set J_loss, the follower should also lose strictly. We denote by κ the minimal amount that the follower loses on any deviation to a pure strategy in J_loss. We observe that the value of κ is strictly greater than 0: for all strategies on which the leader loses strictly, the follower is bound to lose strictly as well, and the set J_loss of candidate strategies is finite. We now assume that we only have abstract information about the strategy. That is, we know a pure follower strategy j and the set J_loss. For all follower deviations to a strategy in J_loss, we again require that the follower loses strictly, i.e., that κ > 0. However, a strict constraint on a value cannot be formulated in a standard linear programming problem. We therefore encode it by an objective to maximise the minimal follower loss. Finally, assume that we do not know a simple secure leader equilibrium and only know the pure follower strategy j. Note that there are only n pure candidate strategies for the follower, and we can check them all. As opposed to that, for n pure follower strategies, there are a total of n · 2^{n−1} combinations of j and J_loss, and solving one linear programme for each of them is not tractable. We can, however, estimate the value of κ rather than computing it. We know that we are only interested in solutions with a positive κ. For our estimation, we use the smallest positive κ that can be written by Karmarkar's algorithm in the running time it needs to solve the linear programmes. This estimation is good enough, as it is written in polynomial time and thus has polynomial length.

3.8.2 Existence of secure incentive equilibria

Based on the above discussion, we will now describe a more involved, but tractable, technique for checking whether or not secure incentive equilibria exist. The test is constructive and provides a simple secure leader equilibrium (which is also a secure incentive equilibrium) in case secure incentive equilibria exist. As Theorem 3.28 allows for seeking the secure incentive equilibria only among simple leader equilibria, we adjust the constraint system needed for the computation of leader equilibria accordingly. The constraints required for the computation of a leader equilibrium are given in Appendix A.1.1.

Exploiting Theorem 3.28, a natural first step when checking the existence of secure incentive equilibria is to compare the values of leader and incentive equilibria. This can be done using the algorithms from Section 3.6 (for incentive equilibria) and the techniques given in Appendix A.1 (for leader equilibria). If they differ, we do not have to look further. In case the leader returns from friendly incentive equilibria and leader equilibria are equal, it is worth checking if there are finitely many such strategy profiles. In this case, we can simply check if one of them is secure.

We start with the assumption that we already know a secure leader strategy profile and, therefore, a simple secure incentive equilibrium (with zero incentive). We then also get the sets J_loss and J_noLoss. Before we study the adjusted constraint system, we observe that, if we already know a simple secure leader equilibrium ⟨σ, j⟩, then this leader equilibrium also satisfies an additional side constraint².

Lemma 3.32. Let G be a bi-matrix game with a simple secure leader equilibrium ⟨σ, j⟩ and, therefore, with the simple secure incentive equilibrium (⟨σ, j⟩, 0). Then there is a K_0 ≥ 0 such that, for all i ≠ j and for all K ≥ K_0, lpayoff(A; (σ, i)) + K · fpayoff(B; (σ, j)) ≥ lpayoff(A; (σ, j)) + K · fpayoff(B; (σ, i)) holds.

Proof. The proof almost follows from the definition of secure leader strategy profiles. For the individual i ≠ j, we have the following.

1. For a given strategy j and the set JnoLoss, we observe that (σ, j) is a simple leader strategy profile. Therefore fpayoff(B;(σ, j)) ≥ fpayoff(B;(σ, i)) holds.

With the definition of JnoLoss, we get that lpayoff(A;(σ, i))+K ·fpayoff(B;(σ, j)) ≥ lpayoff(A;(σ, j)) + K · fpayoff(B;(σ, i)) holds for all K ≥ 0.

2. Note that the secure leader equilibrium condition states that, if the leader loses strictly from any deviation, then the follower should also lose strictly. We therefore find the follower loss upon deviation from the pure strategy j to any other strategy

in the set Jloss. We first note that there is nothing to show when Jloss is empty. Otherwise, we denote by κ the minimal loss that the follower might incur from all

the possible deviations to Jloss. That is,

κ = min{ fpayoff(B; (σ, j)) − fpayoff(B; (σ, i)) | i ∈ J_loss }

and observe that κ > 0, because the definition of Jloss requires for simple secure leader strategy profiles that the follower loses strictly on deviation to an index

in J_loss. When we select K_0 = ‖A‖/κ, where ‖A‖ denotes the maximal absolute difference between two entries of G, then the inequation holds for all K ≥ K_0.

We now give the constraint system for computing secure leader equilibria using a constant K ≥ 0. Note that the construction of the linear programme is based on the knowledge of a suitable constant K, e.g., the one given above. We will, however, see that, irrespective of the K ≥ 0 used, all solutions to the constraint system are secure leader equilibria. Only the guarantee that there is a solution (provided that a secure leader equilibrium exists) depends on choosing a sufficiently large K.

² Note that, technically, we do not need to establish these properties when we already have such a strategy profile ⟨σ, j⟩, but we need these properties later in our proofs.

Linear programming problems for constructing secure leader equilibria. We give here a constraint system SLE_j^{G(A,B)} to compute secure leader equilibria using such a constant K. For this, we extend the constraint system for leader equilibria by the value of K. This gives us a secure leader equilibrium that is also a secure incentive equilibrium. The constraint system SLE_j^{G(A,B)} consists of m + n + 1 constraints, where m + 1 constraints describe that σ is a strategy,

• Σ_{i=1}^{m} p_i = 1 (the sum of the weights is 1) and

• the m non-negativity requirements p_i ≥ 0 for 1 ≤ i ≤ m,

and n − 1 constraints reflect the conditions on a secure leader equilibrium using the constant K. That is, for each i ≠ j with 1 ≤ i ≤ n, we add the following constraint; it ensures that a deviation to a pure strategy i on which the leader loses strictly also costs the follower strictly (cf. the value κ from Lemma 3.32, for which fpayoff(B; (σ, j)) ≥ fpayoff(B; (σ, i)) + κ holds):

Σ_{k=1}^{m} (a_{ki} − a_{kj}) p_k + K · Σ_{k=1}^{m} (b_{kj} − b_{ki}) p_k ≥ 0.

Additionally, we have the constraint that the leader's reward is not less than her reward from an incentive equilibrium,

Σ_{k=1}^{m} a_{kj} p_k ≥ v^L_{IE},

where v^L_{IE} is the leader's reward from an incentive equilibrium. We denote by SLP_j^{G(A,B)} the linear programming problem that consists of the constraint system SLE_j^{G(A,B)} and the objective function of maximising the leader's return, that is, max Σ_{k=1}^{m} a_{kj} p_k.
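The following sketch shows how SLP_j^{G(A,B)} can be set up for a given constant K and a given incentive-equilibrium value v_IE. It is an illustration only: the follower-preference constraints of the underlying leader-equilibrium system are assumed here to be fpayoff(B; (σ, j)) ≥ fpayoff(B; (σ, i)) for all i ≠ j (the precise system is in Appendix A.1.1), and the helper name and scipy usage are choices made for this example.

    # Sketch: the linear programme SLP_j for a fixed K >= 0 (Theorem 3.33).
    import numpy as np
    from scipy.optimize import linprog

    def secure_leader_lp(A, B, j, K, v_IE):
        A, B = np.asarray(A, float), np.asarray(B, float)
        m, n = A.shape
        c = -A[:, j]                               # maximise leader return sum_k a_kj p_k
        A_ub, b_ub = [], []
        for i in range(n):
            if i == j:
                continue
            # leader strategy profile: fpayoff(sigma, j) >= fpayoff(sigma, i)
            A_ub.append(B[:, i] - B[:, j]); b_ub.append(0.0)
            # K-inequality: lpayoff(sigma,i) + K*fpayoff(sigma,j)
            #               >= lpayoff(sigma,j) + K*fpayoff(sigma,i)
            A_ub.append(-(A[:, i] - A[:, j]) - K * (B[:, j] - B[:, i])); b_ub.append(0.0)
        A_ub.append(-A[:, j]); b_ub.append(-v_IE)  # leader return >= v_IE
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=[np.ones(m)], b_eq=[1.0],
                      bounds=[(0, None)] * m)
        return (res.x, -res.fun) if res.success else None  # (sigma, leader value)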

Theorem 3.33. For any given K ≥ 0, any solution to the above constraints is a proper solution: (σ, j) is a secure leader strategy profile, and (⟨σ, j⟩, 0) is also a secure incentive strategy profile.

Proof. We start by observing that any such strategy profile (σ, j) is a leader strategy profile with a proper return value for the leader. That is, the leader return from (σ, j) is no less than her return from any incentive equilibrium. What remains to be shown is that the strategy profile is also secure. To show this, we have the following observations. For all i in JnoLoss, there is nothing to show.

For all i ∈ J_loss, the leader loses strictly. That is, the strict inequation lpayoff(A; (σ, i)) < lpayoff(A; (σ, j)) for the leader payoff is satisfied. But at the same time, for these strategies, lpayoff(A; (σ, i)) + K · fpayoff(B; (σ, j)) ≥ lpayoff(A; (σ, j)) + K · fpayoff(B; (σ, i)) has to be satisfied, too.

This implies that K · fpayoff(B; (σ, j)) > K · fpayoff(B; (σ, i)) holds, which in particular requires K > 0, and therefore fpayoff(B; (σ, j)) > fpayoff(B; (σ, i)).

Corollary 3.34. For all bi-matrix games G, there is a K ≥ 0 such that the following holds: G has a secure incentive equilibrium if, and only if, it has a secure incentive equilibrium (⟨σ, j⟩, 0) that is also a simple leader equilibrium and satisfies, for all i ≠ j, lpayoff(A; (σ, i)) + K · fpayoff(B; (σ, j)) ≥ lpayoff(A; (σ, j)) + K · fpayoff(B; (σ, i)).

Example 3.5. We consider the bi-matrix game from Table 3.10. The game is a variant of the battle-of-sexes game where the follower now has three options to select from, while the leader has only two. If we consider the strategy profile (II, II) from Table 3.10 and assume it to be a secure strategy profile, then the sets J_loss and J_noLoss consist of the following pure follower strategies:

• J_loss = {I}

• J_noLoss = {III}

                 Player 2
                 I        II       III
  Player 1  I    1, 3     0, 0     4, 2
            II   0, 0     3, 1     3, −3

Table 3.10: A variant of the Battle-of-Sexes game.

We first check that the leader reward from the strategy profile (II, II) is not less than her reward from an incentive equilibrium. The strategy profile (I, III) is a (friendly) incentive equilibrium with leader reward 3, so this constraint is satisfied. We then check the other necessary constraints. Note that the minimal loss of the follower upon deviation from strategy II to any strategy in the set J_loss is 1, and therefore the values of κ and K for this strategy profile are 1 and 7, respectively. For these values of κ and K, the following constraint from Corollary 3.34 is satisfied for all i ≠ j.

lpayoff(A;(σ, i)) + K · fpayoff(B;(σ, j)) ≥ lpayoff(A;(σ, j)) + K · fpayoff(B;(σ, i))
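The numbers of Example 3.5 can be reproduced with a few lines of arithmetic. The only assumption made here is that ‖A‖, the maximal absolute difference between two payoff entries of G, is taken over the entries of both payoff matrices, which reproduces the value K = 7 stated above.

    # Checking kappa and K for the strategy profile (II, II) of Table 3.10.
    A = [[1, 0, 4], [0, 3, 3]]            # leader payoffs, rows I and II
    B = [[3, 0, 2], [0, 1, -3]]           # follower payoffs
    leader_row, j = 1, 1                  # strategy profile (II, II), 0-indexed
    J_loss = [0]                          # deviations on which the leader loses: {I}
    kappa = min(B[leader_row][j] - B[leader_row][i] for i in J_loss)   # = 1
    entries = [x for M in (A, B) for row in M for x in row]
    norm = max(entries) - min(entries)                                 # = 7
    K = norm / kappa                                                   # = 7.0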

Example 3.6. If we consider the strategy profile (I,III) from the bi-matrix game from

Table 3.10, then we note that the set JnoLoss is empty while the set Jloss consists of the following pure follower strategies

• J_loss = {I, II}

The value of κ is −1 and therefore the strategy profile is not secure. Note that the strategy profile (I, III), which is a friendly incentive equilibrium, is not secure, as the follower might deviate to the pure strategy I, causing the leader to lose. We have to restrict ourselves to strategy profiles for which κ > 0 holds.

3.8.3 Construction of secure incentive equilibria – Given a strategy j and a set J_loss

We give here the detailed techniques for the construction of a secure incentive equilibrium, if one exists. This subsection is therefore concerned with estimating a constant from Corollary 3.34 and proving it to be big enough. The estimation of such a constant is oriented at the proof of Lemma 3.32. We start by lifting the assumption that we know a simple secure leader equilibrium; we only know that a simple secure leader equilibrium ⟨σ, j⟩ with a set J_loss exists. For this, we also need to estimate κ from the second case of the proof of Lemma 3.32. We are now equipped with the following information: the strategy j that we assign, the leader payoff (as obtained from any incentive equilibrium), J_loss, and J_noLoss. We write a constraint system using all this information and set the objective to maximise κ; that is, we compute a maximal κ. For this, we simply expand the constraint system for simple leader equilibria that assign the strategy j in three ways. We give here an adjusted constraint system LA_{j,J_loss}^{G(A,B)}, which is an adjustment of the constraint system for simple leader equilibria (cf. Appendix A.1.1).

1. We add a constraint that the payoff of the leader is the same as for the leader equilibria (cf., Appendix A.1.1 for the constraint system for computing the leader equilibria) and for the incentive equilibria (cf., Section 3.6 for the constraint system for computing the incentive equilibria).

2. We add, for all i ∈ JnoLoss, a constraint lpayoff(A;(σ, i)) ≥ lpayoff(A;(σ, j))

3. We adjust, for all i ∈ Jloss, the constraints for the follower to fpayoff(B;(σ, j)) ≥ fpayoff(B;(σ, i))+κ (i.e., we require that deviation to i costs the follower at least κ).

Extended constraint system LAP_{j,J_loss}^{G(A,B)}. For a known set J_loss, we now turn to the linear programming problem LAP_{j,J_loss}^{G(A,B)}, which consists of the constraint system LA_{j,J_loss}^{G(A,B)} and the objective of maximising the value of κ. We define a constraint system LA_{j,J_loss}^{G(A,B)} for each pure follower strategy j and set J_loss. The constraint system is an extension of LP_j^{G(A,B)} (given in Appendix A.1) and assigns the strategy j as described above. We give the extended constraint system as follows. LAP_{j,J_loss}^{G(A,B)} first contains a constraint on the reward of the leader. We denote by v^L_{IE} the leader's reward from an incentive equilibrium, which we can obtain from Algorithm 1. We therefore add the constraint

Σ_{k=1}^{m} a_{kj} p_k ≥ v^L_{IE},

where Σ_{k=1}^{m} a_{kj} p_k is the leader's reward from the leader equilibrium. We then add the following constraints.

For all pure strategies i ∈ J_noLoss, we add a constraint

Σ_{k=1}^{m} (a_{ki} − a_{kj}) p_k ≥ 0.

For all pure strategies i ∈ Jloss, we add a constraint

Σ_{k=1}^{m} (b_{kj} − b_{ki}) p_k ≥ κ.

For the linear programming problem LAP_{j,J_loss}^{G(A,B)}, we now use the objective function

max κ.
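A sketch of this linear programme is given below. It assumes, as above, that the base leader-equilibrium constraints (Appendix A.1.1) are the follower-preference constraints, that v_IE is the leader reward of an incentive equilibrium computed beforehand, and that κ is carried as an additional LP variable; all names are illustrative.

    # Sketch: LAP_{j, J_loss} -- maximise kappa for a fixed j and set J_loss.
    import numpy as np
    from scipy.optimize import linprog

    def max_kappa_lp(A, B, j, J_loss, v_IE):
        A, B = np.asarray(A, float), np.asarray(B, float)
        m, n = A.shape
        J_noLoss = [i for i in range(n) if i != j and i not in J_loss]
        c = np.append(np.zeros(m), -1.0)           # variables (p_1,...,p_m, kappa); maximise kappa
        A_ub, b_ub = [], []
        A_ub.append(np.append(-A[:, j], 0.0)); b_ub.append(-v_IE)   # leader reward >= v_IE
        for i in J_noLoss:
            # leader does not lose: lpayoff(sigma, i) >= lpayoff(sigma, j)
            A_ub.append(np.append(A[:, j] - A[:, i], 0.0)); b_ub.append(0.0)
            # follower preference kept: fpayoff(sigma, j) >= fpayoff(sigma, i)
            A_ub.append(np.append(B[:, i] - B[:, j], 0.0)); b_ub.append(0.0)
        for i in J_loss:
            # follower loses at least kappa: sum_k (b_kj - b_ki) p_k >= kappa
            A_ub.append(np.append(B[:, i] - B[:, j], 1.0)); b_ub.append(0.0)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      A_eq=[np.append(np.ones(m), 0.0)], b_eq=[1.0],
                      bounds=[(0, None)] * m + [(None, None)])
        return (res.x[:m], res.x[m]) if res.success else None       # (sigma, kappa)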

We recall Example 3.5 here to note that the value of κ is strictly greater than 0 for a secure incentive equilibrium. For all other strategy profiles (cf. Example 3.6), where the value of κ ≤ 0, the constraint system does not provide a solution with κ > 0, that is, a secure incentive equilibrium does not exist.

Theorem 3.35. If no solution with κ > 0 exists, then there exists no secure leader strategy profile (σ, j), and no secure incentive strategy profile (⟨σ, j⟩, 0), with the given set J_loss of deviations on which the leader loses strictly. If a solution with κ > 0 exists, then there exists a secure leader strategy profile (σ, j) that satisfies the constraints from above; (⟨σ, j⟩, 0) is then also a secure incentive strategy profile.

Note that the set of strategies J_loss^{(σ,j)} obtained from the strategy profile (σ, j) is not necessarily the set J_loss used when defining the constraint system, as there is no guarantee that the leader loses when the follower deviates to a strategy i ∈ J_loss. Note, however, that J_loss^{(σ,j)} ⊆ J_loss holds.

Corollary 3.36. For all such solutions with a value κ > 0, there exists a constant K = ‖A‖/κ that is suitable for use in Lemma 3.32.

3.8.4 For an unknown set Jloss

As can be observed, computing such a maximal κ requires solving n · 2^{n−1} linear programmes: over all pure follower strategies j and sets J_loss, there are a total of n · 2^{n−1} combinations, and therefore as many constraint systems. At first glance, this seems of limited use, as solving all of these linear programmes is not tractable; the complexity would be exponential in the size of the bi-matrix. We therefore estimate a smallest value κ > 0 that serves as an estimation for all of them.

Lemma 3.37. The value of κ is only estimated, and we therefore obtain an approximate solution to the linear programme LAP_{j,J_loss}^{G(A,B)}.

We first observe from Examples 3.5 and 3.6 that any value of κ with κ′ ≥ κ > 0 would serve our purpose, where κ′ is one such value greater than 0, provided a solution exists. We therefore do not need to compute a maximal κ and thus do not need to solve all of these linear programmes. For our estimation, we simply use a κ that is greater than 0 and smaller than any of the optimal solutions to these linear programmes.

Estimating the value of κ. We use an estimated value of κ to solve the linear programmes from above. We note that the estimation is good enough, as it has a representation of polynomial length in the input. The running time of Karmarkar's algorithm is O(m^{3.5} L^2) [Kar84], where L is the number of bits of the input to the algorithm and m is the number of variables. The variables in our case are the leader strategies plus one extra variable for the value of κ; L and m are thus polynomial in the bi-matrix and easy to estimate. We only need an estimation of the running time for the largest constraint system. Once we have a polynomial estimate of this running time, we also have a polynomial estimate of the length needed to write κ in any of the n · 2^{n−1} linear programmes. Therefore, the technique remains tractable when using an estimated value of κ instead of computing κ by solving all of these linear programmes. We observe that we are only interested in values κ > 0. Reconsidering Lemma 3.37, we can simply ignore all other κ, such that, for our purpose, we may use any value of κ that satisfies κ′ ≥ κ > 0. Here, κ′ is one such κ > 0 for which a solution to one of the linear programmes exists³. Using such an estimation of κ, we can therefore solve the linear programmes LAP_{j,J_loss}^{G(A,B)}. These polynomial time algorithms [Kar84, Kha79] need to write κ, such that we only use the smallest κ > 0 that can be written within the running time of Karmarkar's algorithm. This also gives us an estimation for the running time of the largest constraint system. The resulting κ then provides a sufficiently large K = ‖A‖/κ, which is big enough for the constant K in Corollary 3.34.

Remark 3.38. We only estimate the value of κ and use such an estimation to determine the value of a suitable constant K.

Computing a suitable constant K. The existence of a simple secure leader equilibrium implies that, unless J_loss is empty (in which case we can choose K = 0), we can estimate a suitable minimal κ, and therefore a suitable K. This estimation gives us a value of κ such that, if any of the linear programmes LAP_{j,J_loss}^{G(A,B)} has a solution κ′ > 0, then it is fine to use a value of κ that satisfies κ′ ≥ κ > 0, and to use this value to determine a suitable K = ‖A‖/κ. Note that, by using such an estimation of κ that can be written within the running time of Karmarkar's algorithm, we also get an estimation of the size of the resulting constraint system. Note that, even for the small bi-matrix from Example 3.5, the value of the constant K would have hundreds of digits. Using this largest value of K, the size of the linear programme can be determined. This provides us with the following theorem.

³ Note that κ′ refers to any value of κ > 0, and we get this value only if a solution to one of the linear programmes LAP_{j,J_loss}^{G(A,B)} exists. Once we have a polynomial estimate of the running time of Karmarkar's algorithm, we have a polynomial estimate of the length needed to write κ in any of the n · 2^{n−1} linear programmes. We can then estimate κ = min{κ > 0, κ′}. For the cases where no such κ′ exists, we may use the smallest positive number that Karmarkar's algorithm can write within its running time for any of the LAP_{j,J_loss}^{G(A,B)}.

Theorem 3.39. For a bi-matrix game G and a given pure follower strategy j, we can construct a K ≥ 0 in polynomial time such that the following holds. If G has a simple leader equilibrium ⟨σ, j⟩, which is also a secure incentive equilibrium, then it has a simple leader equilibrium ⟨σ′, j⟩, which is also a secure incentive equilibrium and satisfies lpayoff(A; (σ′, i)) + K · fpayoff(B; (σ′, j)) ≥ lpayoff(A; (σ′, j)) + K · fpayoff(B; (σ′, i)) for all i ≤ n.

The resulting K is big enough for use in Corollary 3.34. We can therefore use such a K as determined from the estimated smallest κ for extending the constraint system from Appendix A.1.1 by the inequations from Lemma 3.32.

Theorem 3.40. Let G be a bi-matrix game in which incentive and leader equilibria have the same leader return. Then we can construct a K in polynomial time such that the extended linear programme described above has a solution if, and only if, G has a secure incentive equilibrium for some pure follower strategy j. A solution defines a strategy profile ⟨σ, j⟩, which is a simple leader equilibrium, such that (⟨σ, j⟩, 0) is a secure incentive equilibrium for G. Testing the existence of such a solution, and determining one in case one exists, can be done in time polynomial in G.

3.9 Evaluation

In this section, we give details of our proof-of-concept implementation of friendly incentive equilibria and the results obtained. We have randomly generated two sets of benchmarks with uniformly distributed entries in the bi-matrices. One set of benchmarks uses continuous payoff values in the range from 0 to 1 (Table 3.11), and the other set uses integer payoff values in the range from -10 to 10 (Table 3.12). They contain samples of 100,000 games for each matrix form covered. An incentive equilibrium (IE) of a bi-matrix game G(A, B) can be computed by solving the linear programming problems from Corollaries 3.10 and 3.16. The result is a simple strategy profile (⟨σ, j⟩, β), where j is a pure strategy of the follower, σ is given as a tuple of probabilities that describe the likelihood with which the leader chooses her individual strategies, and β is a j-bribery vector. We have implemented Algorithm 1 for computing friendly incentive equilibria, using the LP solver [MKP] taken from http://lpsolve.sourceforge.net/5.5/. The algorithm returns a friendly IE in the form of a strategy profile (⟨σ, j⟩, β), as well as the payoffs obtained by the follower and the leader in the friendly IE under the j-bribery vector returned for the given bi-matrix game. We have implemented our algorithm in C, and our implementation is available at http://cgi.csc.liv.ac.uk/~anshul/BIMatrix. We have used GAMBIT [MMT13] to compute the Nash equilibria (NE). The data size is given in terms of the number of follower strategies (#FSt) and the number of leader strategies (#LSt).

Experimental results. We analysed the outcome of the random games along the parameters 'average optimal return value of the leader (LeadR)', 'average optimal return value of the follower (FollR)', 'average bribery value', and the confidence interval radius of both the leader and the follower return for a 95% confidence interval. We also give the execution time for 100,000 games. The highest average execution time observed, which is obtained for friendly incentive equilibria, is 21.5 milliseconds.

Algorithm 1: The algorithm outputs a (pure) friendly incentive equilibrium (⟨σ, j⟩, β), a j-bribery vector β, the leader payoff lpayoff(A; ⟨σ, j⟩, β) and the follower payoff fpayoff(B; ⟨σ, j⟩, β)

Input: A bi-matrix game G(A, B)
Output: A friendly incentive equilibrium ⟨σ, j⟩, a j-bribery vector β (represented by the value ι = β_j), lpayoff(A; ⟨σ, j⟩, β), and fpayoff(B; ⟨σ, j⟩, β)

 1  max ← min{a_ij | i ≤ m, j ≤ n}          // initialise the leader payoff to the minimal entry of A
 2  opt ← ∅                                  // initialise the set of optimal pure follower strategies
 3  for j ← 1 to n do                        // for each pure follower strategy
 4      write linear programme LP_j^{G(A,B)}; call LPsolver(); get objective value obj_val
 5      if obj_val > max then
 6          max ← obj_val
 7          opt ← {j}
 8      if obj_val = max then
 9          opt ← opt ∪ {j}
10  max′ ← min{b_ij | i ≤ m, j ≤ n} − 1
11  for j ∈ opt do                           // for pure follower strategies that admit an IE
12      write linear programme ELP_j^{G(A,B)}; call LPsolver(); get objective value obj_val and solution σ
13      if obj_val > max′ then
14          max′ ← obj_val
15          IE ← (⟨σ, j⟩, β)
16  return IE, max′, max, and ι

We summarise the results for friendly incentive equilibria (IE) and leader equilibria (LE), respectively, in Table 3.11 for continuous payoffs in the 0 to 1 range and in Table 3.12 for integer payoffs in the -10 to 10 range. The respective table shows the leader return (avg), follower return (avg), bribery value (avg), confidence interval radius (leader), confidence interval radius (follower), and total execution time for 100,000 samples, for incentive equilibria (left) and leader equilibria (right). The results indicate that the bribery value falls with the number of leader strategies and, less pronounced, with the number of follower strategies.

                         Incentive Equilibria                                         Leader Equilibria
#FSt #LSt  LeadR     FollR     Bribery   confI     confI     ET (min)   LeadR     FollR     confI     confI     ET (min)
           average   average   average   leader    follower             average   average   leader    follower
 2    2    0.726498  0.621347  0.031252  0.001188  0.001705   4.2175    0.698048  0.612999  0.001332  0.001774   2.7368
 2    3    0.798458  0.585959  0.020400  0.000926  0.002014   4.6354    0.785525  0.577285  0.001009  0.002068   3.8177
 2    5    0.868985  0.529380  0.010426  0.000642  0.002537   4.5459    0.865601  0.523492  0.000667  0.002555   4.0808
 2   10    0.929643  0.434452  0.003310  0.000365  0.003249   6.4351    0.929367  0.432039  0.000367  0.003250   4.5798
 3    2    0.752211  0.697569  0.043221  0.001051  0.001526   5.0878    0.710645  0.683939  0.001274  0.001623   4.5323
 3    3    0.818569  0.662023  0.027266  0.000809  0.001893   6.5401    0.800711  0.647836  0.000926  0.001970   4.9813
 5    3    0.856830  0.558777  0.026226  0.000646  0.003482  11.1242    0.817277  0.725141  0.000843  0.001852   8.0363
10   10    0.967437  0.184819  0.003880  0.000160  0.005209  35.8585    0.954231  0.684151  0.000207  0.003089  34.0453

Table 3.11: Values using continuous payoffs in the range 0 to 1.

                         Incentive Equilibria                                         Leader Equilibria
#FSt #LSt  LeadR      FollR     Bribery   confI     confI     ET (min)   LeadR     FollR     confI     confI     ET (min)
           average    average   average   leader    follower             average   average   leader    follower
 2    2    4.748595   3.030555  0.661109  0.024802  0.030519   3.5780    4.227947  2.982509  0.027691  0.030597   3.5780
 2    3    6.272594   2.880576  0.432742  0.019294  0.030389   3.9091    6.041986  2.797888  0.020913  0.030337   3.3044
 2    5    8.723898   2.894013  0.217244  0.013183  0.029920   4.1927    7.666887  2.831469  0.013692  0.029740   3.3426
 2   10    8.982101   3.039720  0.068128  0.006878  0.029555   4.9849    8.977945  3.013178  0.006930  0.029482   4.1737
 3    2    5.295267   4.623187  0.901498  0.021894  0.025617   4.9897    4.552556  4.473518  0.026218  0.025743   3.6521
 3    3    6.680038   4.444846  0.575029  0.016841  0.025512   5.3623    6.356434  4.286942  0.019193  0.025418   3.9804
 5    3    7.4977325  6.740712  0.548419  0.013251  0.032891   7.6684    6.728477  5.887561  0.017346  0.019969   5.7481
10   10    9.727398   7.82520   0.065963  0.002783  0.046590  37.3025    9.41755   7.587172  0.003934  0.013824  24.9537

Table 3.12: Values using integer payoffs in the range -10 to 10.

This is not very surprising: the limit value of the bribery for infinitely many strategies is 0, as there is, with limit probability 1, an entry arbitrarily close⁴ to the social return 2 in the continuous case, and to the social return 20 in the integer case. For the same reason, it is not surprising that the leader benefits from an increase in her own strategies and in the strategies of the follower alike, whereas the follower does not seem to benefit from an increase in the number of leader strategies. Table 3.13 compares incentive, leader, and Nash equilibria with respect to the leader return and the follower return on the same datasets, where the payoff entries are uniformly distributed integers in the range from -10 to 10. We have used Gambit [MMT13] to compute Nash equilibria. If there is more than one NE in a dataset, we have considered the optimal one with the maximum payoff for the leader, using the follower payoff as a tie-breaker. Our results show that, for both the leader and the follower, the return is always higher in IE as compared to LE, and even more so when compared to NE. Table 3.13 also shows the gain for the leader in IE as compared to NE. Its value is given by

i gain = (LeadRet_bribery − LeadRet_Nash) / (MaxLeadValue − LeadRet_Nash),

which describes the improvement obtained. The dataset used there is tiny, 10 samples each, because it is expensive to compute optimal Nash equilibria. The unsurprisingly large differences to the values from Tables 3.11 and 3.12 confirm that these values have to be read with caution. They suffice, however, to give an impression of the advantage obtained over Nash equilibria, as shown in Table 3.13. The improvement gain i gain shown in Table 3.13 is a measure of how much of the potential improvement has been realised. The improvement obtained (numerator) is the difference between the leader payoff in the IE and in the best NE, while the maximum improvement possible (denominator) is the difference between the maximal entry in the payoff matrix of the leader and her payoff in her best NE. The value thus norms the leader's gain from paying a bribery to the follower: the higher the value, the larger the leader's gain in IE, compared to NE, from paying a bribery.

⁴ ε-close for an arbitrarily small, but fixed, ε > 0.

no gain, while an i gain of value 1 would refer to freely choosing a strategy profile that does not have to comply with any stability requirement. For the follower, his gain with bribery is also always higher than or equal to his gain without bribery, or in an NE. Thus, the friendly IE guarantees a local social optimum in the form of a socially optimal follower return. Note that the execution time given for each data size is the total time required for the complete data set, i.e., 100,000 games. The execution time rises faster with an increasing number of follower strategies than with an increasing number of leader strategies. This was to be expected, as the number of follower strategies determines the number of linear programmes that need to be solved to find a friendly incentive equilibrium. Our expectation for randomly drawn examples was to find roughly a quadratic growth in the number of follower strategies and a linear growth in the number of leader strategies. The actual growth seems to be a bit lower, but this may well be due to noise and random effects. The execution time is tiny in all instances.

                 Leader Return                              Follower Return
#FSt  #LSt   Incentive   Leader   Nash   i gain         Incentive   Leader   Nash
  2     2       5.96       4.28   1.34    0.56             5.24       4.05   3.53
  2     3       6.13       4.78   2.45    0.54             3.77       2.61   1.9
  2     5       5.33       5.33   2.81    0.48             2.87       0.94   0.14
  2    10       8.81       8.81   7.1     0.66             5.69       5.26   1.89
  3     2       4.93       3.43   1.23    0.54             5.31       5.3    5.3
  3     3       5.94       4.71   0.67    0.57             5.56       4.3    4.3
  5     3       5.95       4.34   1.12    0.52             6.24       6.12   5.2
 10    10       9.13       9.01   6.72    0.78             7.27       6.7    1.96

Table 3.13: Average leader return and follower return in different equilibria.

Symbolic analysis of relevant classes. As discussed in Section 3.2, prisoners' dilemma / arms race games are a standard class of problems for which our technique provides very good results. We give a short overview of their symbolic solution. The general bi-matrix games of this class are the games in the form of Table 3.14 that satisfy the following constraints:

• d > b > h > f and e > a > g > c, and

• min{b − g, a − g} > max{d − b, h − f, e − a, g − c}.

                        Player II
                         1         2
Player I      1        a, b      c, d
              2        e, f      g, h

Table 3.14: "Prisoners' dilemma" payoff matrix.

It is easy to see that the only Nash and leader equilibrium is to play (2,2), with leader return g and follower return h. In an incentive equilibrium, however, the leader can incentivise her follower to play 1 when she pledges to play 1 herself and promises to pay him a bribery of d − b for playing 1. The follower return then increases to d, while the leader return increases to b + a − d. As mentioned in Section 3.2, the class of battle of sexes games – these are the games satisfying g > a > max{c, e} and b > h > max{d, f} – is a class of games for which incentive equilibria provide the optimal result for the leader, namely the strategy (2,2) with bribery 0. This is, however, also a leader equilibrium. Finally, when we consider games where the leader has no choice except for the selection of the bribery value, we note that an incentive equilibrium provides the social optimum, without affecting the outcome for the follower. If, e.g., the individual values for the leader and follower are N(0, 1) normally distributed, then the follower's expected outcome for n columns is the expected maximal value of n independent N(0, 1) distributed variables, v_n. The social outcome for each pair is N(0, 2) normally distributed, such that the expected social return increases from v_n to √2 · v_n, while the expected leader return increases from 0 (as for leader and Nash equilibria) to (√2 − 1) · v_n.
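To make the symbolic solution of the prisoners' dilemma class concrete, the following small Python sketch (the function name, variable names, and the example payoffs are ours, chosen only to satisfy the constraints above) computes the leader/Nash outcome and the incentive equilibrium in the convention of Table 3.14.

from fractions import Fraction

def prisoners_dilemma_incentive_equilibrium(a, b, c, d, e, f, g, h):
    """Symbolic incentive equilibrium for a prisoners'-dilemma-style game in the
    convention of Table 3.14: entry (i, j) is the pair (leader, follower) payoff,
    with (1, 1) -> (a, b), (1, 2) -> (c, d), (2, 1) -> (e, f), (2, 2) -> (g, h)."""
    # Dominance constraints of the class: strategy 2 dominates for both players,
    # yet (1, 1) is better than (2, 2) for both.
    assert d > b > h > f and e > a > g > c
    # Nash / leader equilibrium: both play 2.
    nash = {"profile": (2, 2), "leader": g, "follower": h}
    # Incentive equilibrium: the leader pledges to play 1 and pays the follower
    # a bribery of d - b for playing 1, which makes his reply 1 weakly optimal.
    bribe = d - b
    incentive = {"profile": (1, 1), "bribery": bribe,
                 "leader": a - bribe, "follower": b + bribe}
    return nash, incentive

# Worked instance with hypothetical payoffs that satisfy the constraints:
nash, ie = prisoners_dilemma_incentive_equilibrium(
    a=Fraction(3), b=Fraction(3), c=Fraction(0), d=Fraction(4),
    e=Fraction(4), f=Fraction(0), g=Fraction(1), h=Fraction(1))
print(nash)  # leader 1, follower 1 at (2, 2)
print(ie)    # bribery 1, leader 2, follower 4 at (1, 1)

In this instance the leader's return rises from g = 1 to a + b − d = 2, and the follower's return from h = 1 to d = 4, in line with the symbolic discussion above.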

3.10 Discussion

We have introduced incentive equilibria as a consequential generalisation of leader equilibria. The general setting used in leader equilibria is extended by allowing the leader to use her communication with her follower (she needs to communicate at least her strategy in all leader models) to announce an incentive she would pay for a favourable response of her follower. The incentive used there is modelled as a payment that reduces the payoff of the leader and increases the payoff of her follower. The results are manifold. One outcome is the philosophical discussion on the implications of different assumptions on the behaviour of the follower. We distinguish the classical optimistic assumption that the follower is, ex aequo, friendly to the leader, and the conservative assumption that he is not. We also discussed the implications that these different follower behaviours have on the behaviour of the leader. It is quite interesting to review those differences and their implications. The cost inconsiderate followers incur for the leader is – if any – arbitrarily small, as it suffices to increase the incentive ever so slightly to secure her own return. The only 'cost' for the leader attached to considerate followers is to take their requirements on board as a secondary optimisation criterion. An interesting observation is that the follower would suffer from being inconsiderate, and not the leader. The occasional increase by an arbitrarily small amount needs to be set against potential losses when the leader has choice, alongside guaranteed losses when secure incentive equilibria exist, but are not friendly, as shown in the example below. In this example, the leader does not need to incentivise her follower. She plays I in the only friendly incentive equilibrium, but II in the only secure incentive equilibrium, while her follower plays I in both of them. Naturally, the outcome of the leader is unaffected, while the payoff of the follower drops from 1 to 0.

                        Follower
                         I          II
Leader       I         1, 1       0, 1
             II        1, 0      −1, −1

Table 3.15: Loss of inconsiderate follower.

As observed for the prisoners' dilemma (cf. Table 3.2), incentivising seems to be individually better for both players, and the follower may benefit more from friendly incentive equilibria than the leader. From the leader's perspective, the unsurprising observation that incentive equilibria beat leader equilibria, which beat Nash equilibria, is confirmed. While these (not necessarily strict) advantages for the leader always hold, the case for the follower is slightly weaker, as examples exist where the gain of the leader is accompanied by a loss for her follower. It is good news that our experimental results suggest that the follower would benefit on average from friendly incentive equilibria. The results are also interesting in that all problems discussed were shown to be tractable. Tractability is arguably a prerequisite for applicability, as the strategies need to be computed to be of use, and the tractability of all occurring problems – calculating incentive, friendly incentive, leader, and friendly leader equilibria as well as deciding the existence of secure incentive equilibria and calculating them (if they exist) or secure ε-incentive equilibria (otherwise) – suggests that their implementation is not hindered by excessive computation costs. Incentive equilibria therefore provide a natural explanation for bribery in asymmetric non-zero-sum bi-matrix games.

Chapter 4

Mean-payoff Games

This chapter is mainly based on the results from [GS14] and [GST+16]. In this chapter, we establish the existence of optimal leader and incentive strategy profiles in multi-player mean-payoff games. Note that we have introduced mean-payoff games in general in Chapter 1 (cf. Section 1.2). This chapter is organised as follows. We start with the discussion of a few motivational examples in Section 4.2.1 to show how multi-player mean-payoff games form a natural choice to study the model in quantitative settings. Section 4.3 formally introduces multi-player mean-payoff games and other terminology used throughout this chapter. Section 4.4 introduces leader equilibria in these games and states important results in this context. We extend the technical details from Section 4.4 to establish the existence of incentive equilibria in Section 4.5. We establish NP-hardness of the related decision problem in Section 4.6. We state implementation details and results obtained from a tool in Section 4.7. We finally summarise this chapter with a discussion in Section 4.8.

4.1 Abstract

We study optimal equilibria in turn-based multi-player mean-payoff games under game settings where we have a distinguished player, called the leader, who can assign strategies to all other players, referred to as her followers. Note that Nash equilibria are a standard way to define the rational behaviour of different players in multi-player games; they treat all players equally. Since in our settings the leader has additional power over the game and assigns strategies to all participating players (including herself), we introduce leader and incentive equilibria for these game settings. A strategy profile is a leader strategy profile if no player, except for the leader, can improve his payoff by changing his strategy unilaterally. An optimal leader strategy profile with the maximal return for the leader is called a leader equilibrium. We further allow the leader to additionally influence the behaviour of her followers by transferring parts of her payoff to her followers. The ability to incentivise her followers provides the leader with more freedom in selecting strategy profiles, and we show that this can indeed improve the leader's payoff in such games. We call these strategy profiles incentive strategy profiles and the optimal strategy profiles incentive equilibria. Incentive equilibria are therefore a natural generalisation of leader and Nash equilibria.

We establish the existence of leader and incentive equilibria in multi-player mean-payoff games. We show that the leader always has an optimal strategy in this setting and that no Nash equilibrium can be superior to it. We further show that the decision problem related to constructing incentive equilibria is NP-complete. However, we show that, when the number of players is fixed, the complexity of the problem falls in the same class as that of two-player mean-payoff games. We discuss algorithms for the computation of these optimal strategy profiles and report our experience with implementing these algorithmic techniques.

4.2 Introduction

This chapter studies the existence of optimal strategy profiles in multi-player mean-payoff games. Mean-payoff games [ZP96, CHJ05b] have been widely studied as a class of perfect information games. Mean-payoff games are in particular interesting to study because of the diversity of their application fields. They enjoy a special status in verification, since µ-calculus model checking and parity games can be reduced in polynomial time to solving mean-payoff games. Mean-payoff objectives also occur when temporal objectives of the players are represented as deterministic Büchi automata (DBA) with their modern quantitative semantics [Hen13], where we are interested in the limit-average share of occurrences of accepting states rather than merely in whether or not infinitely many accepting states occur. Finally, another remarkable application of mean-payoff games is optimal controller synthesis in the framework of Ramadge and Wonham [RW89, PR89], where the goal of the game is to find a control strategy that maximises the average reward earned during the evolution of the system. We consider non-zero-sum multi-player mean-payoff games here. In these games, a finite number of players control various vertices and move a token along the edges to collectively produce an infinite run. There is a player-specific reward function that, for every edge of the graph, gives an immediate reward to each player. The payoff to a player associated with a play is the limit average of the rewards in the individual moves. Mean-payoff games are positionally determined, i.e., starting at any vertex, optimal positional strategies exist for both players. The way each player plays can be captured by a strategy. A set of strategies, one for each player, is called a strategy profile. A strategy profile is a Nash equilibrium if no player has an incentive for unilateral deviation, i.e., if, given that all other players adhere to their strategies, a player cannot increase her payoff by changing her strategy. Nash equilibria [Nas50, Leh90, Umm08, OR94] are a common way to describe stable strategies, with the intuition that a strategy profile will only be maintained if no player gains from changing her strategy unilaterally.

However, we consider game settings where we allow one player to assign strategies to herself and to all other players in the game alike. We refer to this special player as the leader, and to the other players, who merely follow the strategy profile assigned by the leader, as her followers. These game settings are asymmetric, as one player is given more power in defining the rules of the game: she can define the strategy profile. Given that we allow the leader to select the complete strategy profile, a natural question that arises is whether achieving a Nash equilibrium is the right target for her. This additional power over the game naturally allows the leader to 'discriminate' against herself. The constraints on the strategies of the other players are clearly a pre-requisite for a stable strategy profile, but not necessarily the constraints on her own strategy. We therefore allow the leader to select strategies that she could improve upon. This gives her more leeway when selecting a strategy profile. We show later, in Section 4.4.1, that the leader may actually suffer from restricting her strategy in the Nash sense. These strategy profiles are asymmetric in that the leader may now select a strategy profile where no player but the leader may have an incentive to deviate. These strategy profiles need to be stable in that no other player would benefit from deviation (as this player would then not follow the suggestion put forward by the leader). Among these stable strategy profiles, the leader can then choose an optimal one. In this chapter, we discuss techniques to compute optimal strategy profiles that provide an optimal return to this special player. We study the existence of optimal Nash and leader equilibria in multi-player mean-payoff games, and show the existence of optimal leader strategy profiles in these games. An optimal strategy profile in this class, i.e., one that gives the maximal return to the leader, is known as a leader equilibrium. We further show that the leader can also use her power to define more stable and better strategy profiles. We therefore extend this setting to one where we also allow the leader to incentivise her followers to follow a particular strategy profile. The leader then has even more powerful strategies, where she not only puts forward strategies that describe how the players move, but also gives non-negative incentives to the followers for compliance. These incentives are added to the overall rewards the respective follower would receive in each move of the play, and deducted from the overall reward of the leader. Like for leader equilibria, a strategy profile is stable if no follower has an incentive to deviate. Similar to leader strategy profiles, incentive strategy profiles are also assigned by the leader, where the leader additionally pays an incentive to her followers. These incentives are added to the overall payoff of a player in a given play. We call these strategy profiles incentive strategy profiles, and one with optimal return for the leader an incentive equilibrium. An incentive equilibrium is a stable strategy profile with maximal reward for the leader. Note that Nash equilibria cannot beat leader equilibria, as the leader can choose from a wider range of strategy profiles. Likewise, leader equilibria cannot beat incentive equilibria, as the leader can, again, choose from a wider range of strategy profiles (a leader equilibrium can be viewed as an incentive equilibrium with incentive 0).
In this chapter, we therefore study the potential that a rational leader has if she is allowed to incentivise her followers for a particular strategy profile.

4.2.1 Motivational Examples

We first discuss how multi-player mean-payoff games [ZP96, Jur98, CHJ05b, BV07, Sch08, BEF+11, BCD+11] form a natural choice to study the model in quantitative settings. First, there has been a recent trend to replace traditional model checking by quantitative model checking (see [Hen13] for a recent survey). In traditional model checking, a qualitative property such as 'a system is always eventually granted access to a resource', (□♦ access), is checked. In quantitative model checking, the qualitative measure, like whether a DBA would accept an infinite play, is replaced by a quantitative measure, where the quality of a path is measured by the limit average share of accepting states occurring in a run of the DBA. This naturally defines a mean-payoff condition. In qualitative model checking, however, a DBA would accept an infinite play if an accepting state is visited infinitely many times, and all of these paths would then be of equal quality. Qualitative Nash equilibria have been used to refine a worst-case analysis. In [Hen13], for example, the distributed development of a system is considered, where teams develop components that try to establish individual specifications. This is a symmetric setting where a component can safely be assumed not to be malicious to the extent that, for harming others, it would sacrifice compliance with its main objective. Within this constraint, however, it is conservatively considered to be adversarial. When we view the leader as adversarial, we can use the same techniques to determine how she can coordinate an attack on the system to minimise the payoff. To reflect the traditional qualitative properties, we refer to the deterministic Büchi automata (DBA) from Figures 4.1 and 4.2. The first automaton, A, shown in Figure 4.1, is in an accepting state whenever the process requests (and is granted) access to a resource, while the second automaton, B, shown in Figure 4.2, is in an accepting state whenever the player is granted access. The conditions from Figures 4.1 and 4.2 are thus reflected by two property automata. The quantitative counterpart is to measure the quality of a path by the limit average of accepting states occurring in a run of the DBA. Note that it also defines a mean-payoff condition. In this setting, B refers to the limit average share of the time that a system's critical resource is used, while A refers to the limit average frequency with which a process asks for (and receives) access on an infinite path. This inspired us to consider a setting where different selfish players follow different objectives defined by such DBAs. In our example from Figures 4.3 and 4.4, the environment consists of two selfish players who want to maximise the frequency with which they are granted access to a critical resource (using A for their respective objective), while the control objective of the system is to maximise the utilisation of the system (using B for this objective).

In some states of this model the players have choices. In our example, the two players have the choice to make two different kinds of requests: ra (resp. ra′ for the second player), which shall trigger an access for just one time unit, represented by ga (resp. ga′ for the second player), or a request rb (resp. rb′), which shall trigger an access for three time units, represented by three gb (resp. gb′). They can also use a local ε move. In order to keep the model simple, we focus on models where, in each state, there is one player who resolves the choice. The states in which a player resolves the choice are depicted as squares. Slightly more generally, we consider mean-payoff games (MPGs). Figure 4.5 depicts the multi-player mean-payoff game defined by the players from Figures 4.3 and 4.4 with their respective properties. The nodes are labelled by the players who own them, e and e′ for the rational environment players, and r for the system player. The payoff is shown in the order payoff for e, e′, r.

Figure 4.1: σ1 = ¬(ra ∨ rb), σ2 = ¬(ra ∨ rb ∨ ga ∨ gb).

In mean-payoff games based on quantitative specifications, studying a leader of this type has numerous natural justifications. For example, we might seek optimal control of a system that is used by external players, who are not under our control, but to whom we can communicate the rules of its use. It is natural to assume that the rules will only be complied with if the external players have no incentive to deviate, while the controller can take a higher perspective and take the indirect effect (in the form of non-compliance by the external players) of her deviation into account when setting the rules. Similarly, an adversarial leader has to take the rationality of the external players involved into account, but she can herself resolve the remaining non-determinism in the system in any way that complies with this constraint. In our example, the leader will seek to maximise (in the controller model) resp. minimise (in the attacker model) the time any process is using the critical resource, while the individual processes attempt to maximise their own access to it. In software engineering, the environment is often regarded as antagonistic. This relates to a two-player zero-sum game, where the environment forms a monolithic block

Figure 4.2: g = ga ∨ gb ∨ ga′ ∨ gb′, r = ra ∨ rb ∨ ra′ ∨ rb′ ∨ ε.

Figure 4.3: σ1 = ra′ ∨ rb′ ∨ ga′ ∨ gb′ ∨ ε, σ2 = σ1 ∨ ga ∨ gb.

whose gains and losses are the losses and gains, respectively, of the system, represented by the leader in our setting. These settings therefore reflect how an interested party, called a leader, can capitalise on setting an equilibrium strategy. The question can therefore be phrased as:

How should a reflective leader control a system if given a chance to do so?

Note that the advantages that the leader can get do not occur in the qualitative setting, where such an equilibrium would also be a Nash equilibrium. We next show that a rational leader can further improve her payoff if, along with the power to define strategy profiles, she is also allowed to incentivise her followers. The leader can do so by paying some of her own utility to her followers. At first glance, this seems not to be in the interest of the leader, but a careful observation shows that it allows her to define even more powerful and stable strategy profiles. We give a few examples to exemplify the role that incentives can play in achieving good stable solutions of multi-player mean-payoff games. In these examples, incentive equilibria are strictly better than Nash and leader equilibria.

Example 4.1. Consider the multi-player mean-payoff game shown in Figure 4.6. Here we have three players: Player 1, Player 2 (the leader), and Player 3. The vertex labelled 1 is controlled by Player 1, while the vertex labelled 2 is controlled by Player 2. All other vertices are controlled by Player 3. We further annotate the rewards of the players on the edges of the graph by giving a triple, where the rewards of Players 1, 2, and 3 are shown in that order. We omit the labels when the rewards of all players are 0. An incentive equilibrium would be given by (a strategy profile leading to) the play ⟨1, 2, 3ω⟩,

Figure 4.4: The rational environments (Figure 4.3) and the system (Figure 4.4), shown as automata that coordinate on joint actions.

where the leader pays an incentive of 1 to Player 1 for each step and 0 to Player 3. By doing this, she secures a payoff of 8 for herself. The rewards of Players 1 and 3 in this incentive equilibrium are 1 and −9, respectively. A leader equilibrium would result in the play ⟨1, 2, 5ω⟩, with rewards of 1 for Player 1 and the leader, and −2 for Player 3: when the leader cannot pay any incentive to Player 1, then the move from Vertex 2 to Vertex 3 will not be part of a stable strategy. The only Nash equilibrium in this game would result in the play ⟨1, 4ω⟩, with rewards of 1 for Player 1, 0 for the leader, and −1 for Player 3. This example therefore shows how the leader can benefit from her additional choices in leader and incentive equilibria.

Figure 4.5: The multi-player mean-payoff game from the properties from Figures 4.1, 4.2 and 4.3, 4.4.

Example 4.2. Consider the multi-player mean-payoff game shown in Figure 4.7, with five players: Player 0 (the leader) and Players 1 to 4 (followers). For i ∈ {1, 2, 3, 4}, Player i controls the vertex labelled i in the game and gets a reward of 1 whenever the token is at vertex i. (To keep the rewards on the edges, one could encode this by giving this reward whenever vertex i is entered.) Player 0 gets a reward of 1 in all of these vertices. The payoff of all other players is 0 in all other cases. Notice that the only play defined by Nash or leader equilibria in this example is ⟨(1, 5, 6)ω⟩, which provides a payoff of 1/3 to Player 0 and Player 1, and a payoff of 0 to all other players. For incentive equilibria, however, the leader can give an incentive of 1/12 to all followers when they follow the play ⟨(1, 2, 3, 4)ω⟩. It is easy to see that such a strategy profile is stable under incentives. The leader will then receive a payoff of 2/3, i.e., her payoff from the cycle, 1, minus the incentives given to the other players, 4 · 1/12. All other players receive a payoff of 1/3, consisting of the payoff from the cycle, 1/4, plus the incentive they receive from the

Figure 4.6: Incentive equilibrium beats leader equilibrium beats Nash equilibrium.

Figure 4.7: Incentive equilibrium gives much better system utilisation.

leader, 1/12. Notice that this payoff is not only better from the leader's point of view; the other players are also better off in this equilibrium.
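A quick sanity check of this arithmetic with exact fractions (a small illustrative sketch; the variable names are ours, the numbers are those of Example 4.2):

from fractions import Fraction

# Example 4.2 (Figure 4.7): on the cycle (1, 2, 3, 4)^ω, each follower i earns 1
# in a quarter of the steps, while the leader earns 1 in every step.  The leader
# pays each of the four followers an incentive of 1/12 per step.
incentive = Fraction(1, 12)
leader_payoff = 1 - 4 * incentive            # raw payoff 1 minus four incentives
follower_payoff = Fraction(1, 4) + incentive
print(leader_payoff, follower_payoff)        # 2/3 and 1/3, as claimed above

# For comparison, the Nash / leader equilibrium play (1, 5, 6)^ω gives the
# leader and Player 1 a payoff of 1/3 each, and 0 to the other followers.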

4.2.2 Related Work

The existence of Nash equilibria in multi-player mean-payoff games has been established in [TR97]. Ummels and Wojtczak [UW11] studied the complexity of determining the existence of Nash equilibria in multi-player mean-payoff games where each reward falls into a given closed interval. Both sides of the NP-completeness proofs are closely related to ours. They considered Nash equilibria for mean-payoff games and showed that the decision problem of finding a Nash equilibrium is NP-complete for pure (not allowing randomisation) strategy profiles, while the problem is undecidable for arbitrary randomised strategies. The undecidability result of [UW11] for Nash equilibria in arbitrary randomised strategies can easily be extended to leader equilibria. For this reason, we focus on non-randomised strategies throughout this chapter. Ummels [Umm08] analysed the complexity of Nash equilibria in infinite multi-player games with co-Büchi or parity winning conditions. He established the NP-completeness of Nash equilibria in infinite games with these objectives. In [Umm06], Ummels has studied the concept of subgame perfect equilibrium for the case of infinite games. He has given simple examples to show that subgame perfect equilibria, where the choice of strategy must be optimal after every history of the game and not just from the initial vertex, exist in the case of infinite games. The reward and punish strategy profiles that we study for mean-payoff games are inspired by [UW11, BDS13] and similar strategies in stateless games [Fri77]. Leader equilibria were introduced by von Stackelberg [vS34] and were further studied in [Fri77]. Incentive equilibria have recently been introduced for bi-matrix games [GS15], but have, to the best of our knowledge, not been used in infinite games. Two-player mean-payoff games can be solved in pseudo-polynomial time [ZP96, BCD+11], smoothed polynomial time [BEF+11], PPAD [EY10], and randomised subexponential [BV07] time. Their decision problem is also known to be in UP∩co-UP [Jur98, ZP96]. Their tractability is still an open problem. Our tool builds on the optimal strategy improvement algorithm for finding the mean partition from [Sch08].

4.3 Preliminaries

A multi-player mean-payoff game (MMPG) is a tuple ⟨P, V, {Vp | p ∈ P}, v0, E, {rp : E → Q | p ∈ P}⟩, where

• P is a set of players with a distinguished leader player l ∈ P,

• V is a set of vertices with a designated initial vertex v0 ∈ V,

• {Vp | p ∈ P} is a partition of the vertices V, characterising the vertices controlled by the respective players,

• E ⊆ V × V is a set of edges, such that each vertex has a successor (∀v ∈ V ∃v′ ∈ V. (v, v′) ∈ E), and

• {rp | p ∈ P} is a family of reward functions rp : E → Q that assign, for each player p ∈ P, a reward to each transition. (A minimal data-structure sketch of this definition is given below.)
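The following is a minimal sketch of how such a game could be represented in code (in Python; all identifiers are ours and are not part of the thesis's notation):

from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List, Set, Tuple

Vertex = str
Player = str
Edge = Tuple[Vertex, Vertex]

@dataclass
class MMPG:
    """A multi-player mean-payoff game in the sense of the definition above."""
    players: Set[Player]
    leader: Player                                # the distinguished player l
    vertices: Set[Vertex]
    initial: Vertex                               # v0
    owner: Dict[Vertex, Player]                   # encodes the partition {V_p | p in P}
    edges: Set[Edge]
    reward: Dict[Player, Dict[Edge, Fraction]]    # r_p : E -> Q

    def successors(self, v: Vertex) -> List[Vertex]:
        return [t for (s, t) in self.edges if s == v]

    def check(self) -> None:
        assert self.leader in self.players
        assert self.initial in self.vertices
        assert all(self.successors(v) for v in self.vertices), \
            "each vertex needs at least one successor"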

We use two-player zero-sum mean-payoff games (2MPGs) to determine the outcome of MMPGs when, from some point onwards, one player, say p, is playing against all others, where the objective of p is inherited from the multi-player MPG, while the objective of the remaining players is to minimise her reward. As the objective of the remaining players is defined by the objective of p, we use only rp to describe the objective of the game. The 2MPG for p of an MMPG M = ⟨P, V, {Vp | p ∈ P}, v0, E, {rp : E → Q | p ∈ P}⟩, denoted 2mpg(M, p), is therefore the game ⟨P, V, {Vp, V ∖ Vp}, v0, E, rp⟩. 2MPGs have optimal memoryless strategies for both players, and the outcome is, when starting in any vertex v ∈ V, determined [ZP96]. By abuse of notation, we denote this

value for a vertex v by rp(v).

A finite play π = ⟨v0, v1, . . . , vn⟩ of the game G is a sequence of vertices such that v0 is the initial vertex and, for every 0 ≤ i < n, we have (vi, vi+1) ∈ E. An infinite play is defined in an analogous manner. To select an initial vertex, we could allow a probability distribution over the vertices; we do this for discounted sum games in Chapter 5. Throughout this chapter, however, we select the initial vertex with probability 1 and denote it with an incoming arrow. A multi-player mean-payoff game is played on a game arena G among various players by moving a token along the edges of the arena. The game begins by placing a token on the initial vertex. Each time the token is on a vertex controlled by a player p ∈ P, the player p chooses an outgoing edge and moves the token along this edge. The game continues in this fashion forever, and the players thus construct an infinite play of the game. The (raw) payoff rp(π) of a player p ∈ P associated with a play π = ⟨v0, v1, . . .⟩ is the limit average reward of the path, given as

$$ r_p(\pi) \;\overset{\text{def}}{=}\; \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} r_p\bigl((v_i, v_{i+1})\bigr). $$

We refer to this value as the raw payoff of the player p to distinguish it from the payoff for the player that also includes the incentive given to the player by the leader.
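For ultimately periodic plays, such as the lasso-shaped plays in the examples of this chapter, the limit inferior is a proper limit and can be read off the cycle; this standard fact is what underlies the repeated observation that a finite prefix has no bearing on the payoff:

$$ \text{if } \pi = \rho\, c^{\omega} \text{ for a finite prefix } \rho \text{ and a cycle } c = \langle u_0, \ldots, u_{k-1}\rangle, \text{ then } r_p(\pi) \;=\; \frac{1}{k}\sum_{i=0}^{k-1} r_p\bigl((u_i, u_{(i+1)\bmod k})\bigr). $$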

If the reward functions sum up to 0, i.e., if $\sum_{p \in P} r_p(e) = 0$ holds for all edges e ∈ E, then we call the MMPG a zero-sum game. Note that we have formally introduced equilibrium concepts in Chapter 2 and refer the reader there for those definitions. We only define here concepts that have not been introduced before. A strategy of a player is a recipe for the player to choose the successor vertex. It is given as a function σp : V*Vp → V such that σp(π) is defined for a finite play ⟨v0, . . . , vn⟩ when vn ∈ Vp, and it is such that (vn, σp(π)) ∈ E.

A strategy profile σ defines a unique play πσ, and therefore a raw payoff rp(σ) = rp(πσ) for each player p. We recall that a leader equilibrium is a maximal leader strategy profile (w.r.t. the raw payoff of the leader). Thus, a leader strategy profile allows for solutions where the leader could improve upon her reward by changing her strategy. An incentive strategy profile satisfies the stability requirements of leader equilibria and additionally allows the leader to give incentives to her followers. An optimal strategy profile in this class of strategy profiles, i.e., one that provides the maximal reward to the leader, is called an incentive equilibrium. The way incentives are given to the players in an infinite play is as follows.

Definition 4.1 (Incentives and Overall Payoffs). An incentive to a player p is a function ιp : V*Vp → R≥0 from the set of histories to incentives. Incentives can be extended to an infinite play π = ⟨v0, v1, . . .⟩ in the usual mean-payoff fashion:

$$ \iota_p(\pi) \;\overset{\text{def}}{=}\; \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \iota_p(v_0 \ldots v_i). $$

The overall payoff ρp(π) to a follower in a run π is the raw payoff plus all incentives, $\rho_p(\pi) \overset{\text{def}}{=} r_p(\pi) + \iota_p(\pi)$, while the overall payoff of the leader ρl(π) is her raw payoff after deducting all incentives, $\rho_l(\pi) \overset{\text{def}}{=} r_l(\pi) - \sum_{p \in P \smallsetminus \{l\}} \iota_p(\pi)$.
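As a small illustration of these definitions (a sketch, not the thesis's tool; all identifiers are ours), the following computes overall payoffs for a play that ends in a simple cycle with constant per-step incentives, and reproduces the numbers of Example 4.1:

from fractions import Fraction

def cycle_payoffs(cycle_rewards):
    """Raw mean payoffs of an ultimately periodic play: the finite prefix has no
    influence on the limit average, so the payoff vector is the average of the
    per-edge reward vectors along the cycle."""
    n = len(cycle_rewards)
    k = len(cycle_rewards[0])
    return tuple(sum(Fraction(r[i]) for r in cycle_rewards) / n for i in range(k))

def overall_payoffs(raw, incentives, leader):
    """Overall payoffs in the sense of Definition 4.1 for constant per-step
    incentives: follower p gets raw[p] + incentives[p]; the leader pays them all."""
    return tuple(raw[p] - sum(incentives) if p == leader
                 else raw[p] + incentives[p]
                 for p in range(len(raw)))

# Example 4.1 (Figure 4.6): the loop at vertex 3 has rewards (0, 9, -9) in the
# order (Player 1, leader, Player 3); the leader pays Player 1 an incentive of 1
# per step and pays Player 3 nothing.
raw = cycle_payoffs([(0, 9, -9)])
print(overall_payoffs(raw, incentives=(1, 0, 0), leader=1))
# (Fraction(1, 1), Fraction(8, 1), Fraction(-9, 1)), as stated in Example 4.1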

We write ιp(σ) for the incentive to player p for the unique run πσ under incentive profile ι. Our observation that incentive strategy profiles form a broader class than leader strategy profiles, which in turn form a broader class than Nash strategy profiles, together with Example 4.1, where we showed a game in which incentive equilibria (IE) are superior to leader equilibria (LE) and LE are superior to Nash equilibria (NE) for the leader, yields the following result.

Theorem 4.2 (IE ≥ LE ≥ NE). Incentive equilibria do not provide smaller return than leader equilibria, and leader equilibria do not provide smaller return than Nash equilibria. Moreover, there are games, for which these three values are different.

Here, we refer to the leader's return (or: payoff) in the different equilibria. The above theorem therefore establishes that rl(IE) ≥ rl(LE) ≥ rl(NE). Revisiting the example from Figure 4.6 illustrates this relation between the various sets of strategy profiles. The example established that an optimum selected over a smaller set cannot be better for the leader than an optimum selected over a larger set. For this reason, the leader's return from an incentive equilibrium is at least as good as her return from a leader equilibrium, which in turn is at least as good as her return from a Nash equilibrium. We now discuss in detail the existence and construction of leader equilibria in Section 4.4 and give an extension of these techniques in Section 4.5 to discuss the construction of incentive equilibria.

4.4 Leader equilibria

A leader strategy profile that provides the maximal reward for the leader among all leader strategy profiles is called a leader equilibrium. In the remainder, we show that

1. leader equilibria are generally better (for the leader) than Nash equilibria (Theorem 4.4),

2. determining if there is a strategy profile σ with rl(σ) = 1, such that the strategy profile σ is a Nash equilibrium or leader strategy profile, is NP-hard even for zero-sum MMPGs with reward functions whose values are in {−1, 1} (Theorem 4.29), and the optimal reward of the leader cannot be approximated efficiently (Corollary 4.30), and

3. leader equilibria (and optimal Nash equilibria) always exist, and, for a fixed number of players, finding them in MMPGs is polynomial-time reducible to solving two-player MPGs (Corollary 4.17).

For social optima, it suffices to add a player with a social reward function, e.g., the sum of the individual reward functions, without letting this player own any vertex. The technique introduced in this chapter can then be used to optimise the social payoff. We start with a trivial inference of the existence of leader strategy profiles.

Lemma 4.3. Leader strategy profiles exist for all multi-player mean-payoff games.

Proof. It is shown in [BDS13] that Nash equilibria for multi-player mean-payoff games exist. By definition, any strategy profile in a Nash equilibrium is also a leader strategy profile.

What remains to be shown is that optimal leader strategy profiles (i.e., leader equilibria) exist; that the optimum is indeed attained is implied by the construction from Section 4.4.2 (cf. Corollary 4.12).

4.4.1 Superiority of leader equilibria

In this subsection, we show that leader equilibria are superior to Nash equilibria: a benign leader who assigns strategies in such a way that she only makes sure that no other player has an incentive to deviate, while allowing for the use of 'modest' strategies that she can improve upon when the other players stick to their strategies, is more

successful than a leader who follows the short-sighted egoistic approach of choosing only among strategies she cannot improve upon herself. On first glance, it may not seem to be in the interest of the leader to be benign. To the contrary, it would seem that the leader could improve upon such strategy profiles by simply adjusting her strategy. A second look, however, reveals that she has to comply with fewer constraints and can, consequently, choose from a larger pool of strategy profiles. In particular, this implies that the optimum cannot be worse. The MMPG from Figure 4.8 exemplifies why this may lead to an increased payoff. It shows a simple MMPG with five vertices, 1 through 5, where the leader owns vertex 2, and a player called first owns the initial vertex, vertex 1. The other vertices have exactly one successor (themselves), such that it does not matter who owns them. We assume that they are assigned to a third player, player passive. The rewards are depicted in the order first player, leader, passive player. The rewards on the edges (1, 2), (1, 4), (2, 3), and (2, 5) are not shown, because these edges can only be taken once in a play. Their rewards therefore have no impact on the payoff of any player. Initially, player first can either play to vertex 4 or to vertex 2. When playing to vertex 4, every player will receive a payoff of 0. When he plays to vertex 2, the leader can either move on to vertex 3, securing herself a payoff of 2, to the cost of the first and the passive player, who both receive a payoff of −1. Alternatively, she could play to vertex 5, where both the leader and the first player receive a payoff of 1, to the cost of the passive player, whose payoff is −2. Clearly, the leader has an incentive to move to vertex 3, but she chooses to go to vertex 5 in a leader strategy profile, such that it is preferable for the first player to move to vertex 2 and not to vertex 4. The only play produced by a Nash equilibrium in this game is the play 1 · 4ω, and we note that this is the only Nash equilibrium in this example. However, the leader has a better leader equilibrium: she can benignly waive her option to move to vertex 3, and instead move to vertex 5. Then, it becomes preferable for the first player to move to vertex 2. This results in an improved reward for the leader.

Figure 4.8: An MMPG where the leader equilibrium is strictly better than all Nash equilibria.

Theorem 4.4. Nash equilibria cannot be superior to leader equilibria, and leader equilibria can be strictly better than all Nash equilibria, w.r.t. the leader's payoff.

4.4.2 Reward and punish strategy profiles for leader equilibria

Let us consider a leader strategy profile σ. We first show that we can obtain a leader strategy profile with a similar payoff for the leader by applying a punishment to the first player who deviates from σ. The power to define the equilibrium allows the leader to use the power of all remaining players to punish this deviator.

That is, we use a strategy profile where all players co-operate to produce πσ. Note that the leader solicits co-operation from every player who owns some vertex in the game. Further, the strategy profile σ offers the reward rp(πσ) to a player p, which is at least as good as the reward that player p would have received in the two-player game

2mpg(M, p) from any vertex in ver(πσ). But, if a player deviates from σ, all other players co-operate to harm this player, throwing their own interests to the wind.

Thus, not complying with the requirement to produce πσ will lead to a payoff for the deviating player which equals the payoff of this player in a two-player game that starts at the point of her deviation, i.e., at the vertex owned by her where she was supposed to play in accordance with σ. We denote by rp(v) the reward player p would receive from the two-player game that starts at a vertex v. We call any such strategy profile σ a reward and punish strategy profile and define it as follows.

Definition 4.5. A strategy profile σ is a reward and punish strategy profile if it offers a reward rp(πσ) to every player p, and any deviation from σ by a player p will eventually lead to her getting a payoff rp(v), for the vertex v at which she deviates, that is lower (not necessarily strictly) than rp(πσ).

Note that, for a reward and punish strategy profile, πσ essentially defines σ.

Lemma 4.6. If σ is a leader strategy profile, then there is a reward and punish strategy profile σ′, which is also a leader strategy profile and defines the same play πσ = πσ′. If σ is a Nash equilibrium, so is σ′.

Proof. We first observe that πσ alone defines the rewards of all players for the strategy profile σ and thus, due to πσ = πσ′, for σ′. Let us assume for contradiction that a player p ∈ P for Nash equilibria, resp. p ∈ P ∖ {l} for leader strategy profiles, has an incentive to deviate from her strategy in σ′. Then her payoff in σ′ will be determined by the result of the two-player zero-sum MPG 'her against the rest', as defined by the reward and punish strategy profiles. Note that the initial play up to this point has no impact on the limit reward. But she can deviate from her strategy in σ at the same position with at least the same reward, by simply assuming that she plays against all other players in the same game. Consequently, she has an incentive to deviate in the strategy profile σ, too, which contradicts the assumption that σ is a Nash equilibrium resp. leader strategy profile.

This observation allows us to concentrate on reward and punish strategy profiles only. Let ver(π) be the set of vertices that occur in a play, and let own(S) = {p ∈ P | S ∩ Vp ≠ ∅} be the set of players that own some vertex in S. With these terms, it is simple to characterise reward and punish strategy profiles.

Lemma 4.7. For an MMPG M, a play πσ is the outcome of a reward and punish strategy profile σ, which is a Nash equilibrium resp. leader strategy profile, if, and only if, for all vertices v ∈ ver(πσ) and all players p ∈ own(ver(πσ)), resp. p ∈ own(ver(πσ)) ∖ {l}, that control a vertex that occurs in the play, it holds that rp(πσ) ≥ rp(v).

Proof. To show the 'only if' direction, we assume for contradiction that rp(πσ) < rp(v) holds for some vertex v ∈ ver(πσ) which is owned by p (resp. owned by p ≠ l). Then player p can improve on her strategy by following her strategy until v is reached, and henceforth following the strategy from 2mpg(M, p). As the initial play does not influence the limit inferior, her payoff would be at least rp(v), which is strictly greater than rp(πσ) (contradiction).

To show the 'if' direction, we assume for contradiction that rp(πσ) ≥ rp(v) holds, but no reward and punish strategy profile defines πσ. Assume that a player p deviates at a vertex v from πσ. Then the other players will join to diminish her payoff henceforth. Taking into account that the initial sequence up to this point has no influence on the payoffs, they can follow the optimal strategy of the opponents of p from 2mpg(M, p), restricting the payoff of player p to rp(v) (contradiction).

The above lemma explains the use of reward and punish strategy profiles in forming an optimal play and has the flavour of the popular folk theorem for infinitely repeated games. This theorem, discussed thoroughly in [Fri71], explains that, for an infinitely repeated game, a strategy profile σ is a Nash equilibrium if each player receives strictly more than his minmax payoff (the minimum payoff that the other players can enforce upon him). All players therefore continue to follow σ if no deviation occurs. However, if a player i deviates from σ at some point, all other players attempt to minimise the payoff of the deviating player i, such that it is not beneficial for player i to deviate. The reward and punish strategy profiles are inspired by this. In the next step, we show that we can determine the existence of a well behaved reward and punish strategy profile that satisfies the constraints on the players' rewards from the above lemma and where every edge (that forms part of a strongly connected component) has a limit share of the run. A strategy profile is well behaved if the ratio in which every edge occurs has a limit, that is, if, for all edges (s, t) ∈ E, there is a limit $p_{(s,t)} = \lim_{n\to\infty} \frac{\#^{(s,t)}_n(\pi_\sigma)}{n}$, where $\#^{(s,t)}_n(v_0, v_1, v_2, \ldots) = \bigl|\{i < n \mid (v_i, v_{i+1}) = (s, t)\}\bigr|$ is the number of occurrences of the edge (s, t) among the first n edges of the play v0 v1 v2 . . . . (This limit does not necessarily exist for general strategy profiles.) Note that well behaved reward and punish strategy profiles suffice for optimising the leader's reward, such that it suffices to refer to this class of strategy profiles only. This is because the supremum over general strategy profiles that are not well behaved cannot be higher than the supremum over well behaved strategy profiles. We now discuss the constraint system needed for well behaved reward and punish strategy profiles in detail.

Linear programs for well behaved reward and punish strategy profiles. We give here a constraint system $C^\sigma_S$ that defines a strategy profile σ to be well behaved and that utilises the information on the strongly connected set S of vertices visited infinitely often. The first central observation is that, if we already know

• the set of vertices Q visited in πσ and

• a set S ⊆ Q of vertices that contains all vertices that are visited infinitely often (and is therefore strongly connected),

then we can infer a constraint system $C^\sigma_S$ by Lemma 4.7, which characterises necessary and sufficient conditions for a strategy profile σ to be a well behaved reward and punish strategy profile. The constraint system consists of two parts. One part is the ratios,

where we use the p(s,t) from above for edges (s, t) ∈ E ∩ (S × S), and similarly pv for the limit ratio of each vertex v in S. The limit ratio pv of a vertex should be equal to the sum of the limit ratios of its incoming edges and equal to the sum of the limit ratios of all outgoing edges

from that vertex (see below). I.e., we derive information pv for a vertex v once we have

information pe for all its incoming and outgoing edges. Obviously, the limit ratio pv of each vertex not in S and the limit ratio pe of each edge not in S × S must be 0. This provides a first part of a constraint system, namely

• the ratio of vertices and edges that are not in S resp. S × S is 0,

  – pv = 0 for all v ∈ V ∖ S and
  – pe = 0 for all e ∈ E ∖ (S × S),

• the ratio of vertices and edges that are in S resp. S × S is ≥ 0,

  – pv ≥ 0 for all v ∈ S and
  – pe ≥ 0 for all e ∈ E ∩ (S × S),

• the sum of the ratios of the vertices is 1, $\sum_{v \in V} p_v = 1$,

• the ratio of a vertex is the sum of the ratios of its incoming edges, and the ratio of a vertex is the sum of the ratios of its outgoing edges,

  – $p_s = \sum_{t.\,(s,t)\in E} p_{(s,t)}$ for all s ∈ S and
  – $p_t = \sum_{s.\,(s,t)\in E} p_{(s,t)}$ for all t ∈ S.

Note that, for the limit ratio of a vertex, the sums of the ratios of its incoming and of its outgoing edges must be equal. The second part of the constraint system stems from Lemma 4.7. For a well behaved strategy profile σ, $r_p(\pi_\sigma) = \sum_{e \in E} p_e\, r_p(e)$ is simply the weighted sum of the rewards of the individual edges. This provides us with the constraint

$$ \sum_{e \in E} p_e\, r_p(e) \;\geq\; \max_{v \in Q}\, r_p(v) $$

for all p ∈ own(Q) for Nash equilibria, and for all p ∈ own(Q) ∖ {l} for a leader strategy profile. Note that we make use of the pe from above to form the weighted sum of the rewards of the edges. Before we define the objective function, we state a simple corollary of Lemma 4.7.

Corollary 4.8. Every well behaved reward and punish strategy profile satisfies the constraints $C^\sigma_S$, and every well behaved strategy profile σ whose play πσ satisfies the constraints $C^\sigma_S$ defines a reward and punish strategy profile.

The objective of the leader is obviously to maximise $r_l(\pi_\sigma) = \sum_{e \in E} p_e\, r_l(e)$. Once we have the objective function, we can define the linear programming problem $LP^\sigma_S$ using the constraint system $C^\sigma_S$, and it is simple to determine a solution in polynomial time [Kar84, Kha79]. The relevant points are, first, to establish that a well behaved reward and punish strategy profile exists for each such solution, and, second, to show that non-well behaved reward and punish strategy profiles cannot be preferable for the leader.
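The following is a sketch of how $LP^\sigma_S$ could be set up for given sets Q and S with an off-the-shelf LP solver (here scipy's linprog); the thresholds $\max_{v \in Q} r_p(v)$ are assumed to have been precomputed from the two-player games 2mpg(M, p), and all identifiers are ours, not the thesis's implementation.

from scipy.optimize import linprog

def solve_lp(edges_in_S, rewards, leader, thresholds):
    """edges_in_S : the edges (s, t) with s, t in S (all other ratios are 0)
    rewards    : dict player -> dict edge -> reward r_p(e)
    thresholds : dict player -> max_{v in Q} r_p(v), for the constrained players
                 (own(Q) for Nash, own(Q) without the leader for leader profiles)
    Returns the optimal edge ratios p_e, or None if no solution exists."""
    idx = {e: i for i, e in enumerate(edges_in_S)}
    n = len(edges_in_S)
    vertices = {v for e in edges_in_S for v in e}

    A_eq, b_eq = [], []
    # Flow conservation: for every vertex, outgoing mass equals incoming mass.
    for v in vertices:
        row = [0.0] * n
        for (s, t), i in idx.items():
            row[i] += (1.0 if s == v else 0.0) - (1.0 if t == v else 0.0)
        A_eq.append(row)
        b_eq.append(0.0)
    # The edge ratios form a distribution: sum_e p_e = 1.
    A_eq.append([1.0] * n)
    b_eq.append(1.0)

    # Payoff constraints sum_e p_e r_p(e) >= threshold_p, written as "<=".
    A_ub = [[-float(rewards[p][e]) for e in edges_in_S] for p in thresholds]
    b_ub = [-float(thresholds[p]) for p in thresholds]

    # Objective: maximise the leader's payoff sum_e p_e r_l(e).
    c = [-float(rewards[leader][e]) for e in edges_in_S]
    res = linprog(c, A_ub=A_ub or None, b_ub=b_ub or None,
                  A_eq=A_eq, b_eq=b_eq)        # p_e >= 0 is the default bound
    return dict(zip(edges_in_S, res.x)) if res.success else None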

From Q, S, and a solution to the linear programs to a well behaved reward and punish strategy profile. We start with the simple case that the vertices and edges with non-0 ratio are strongly connected.

We design πσ as follows. We first go from the initial vertex v0 through states in Q to some state in S. (Note that this initial path has no bearing on the limit inferior that defines the payoff of the individual players.) Once we have reached S, we intuitively keep a list for each vertex in S. In this list, we keep the number of times each outgoing edge with non-0 ratio has been taken. We also apply an arbitrary (but fixed) order on the outgoing edges. Each time we are at this vertex, we choose the first edge (according to this order) that has been taken less often (from this vertex) than the ratio pe/pv (the ratio pe of the edge divided by the ratio pv of this vertex) suggests. If no such edge exists, we take the first edge.
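The following is a small sketch of this counter-based rule for a single strongly connected component (the class, method names, and the toy ratios are ours); exact fractions are used so that the counters stay bounded:

from fractions import Fraction

class RatioScheduler:
    def __init__(self, out_ratios):
        """out_ratios: dict vertex -> list of (successor, Fraction p_e / p_v),
        the outgoing edges with non-0 ratio in a fixed (arbitrary) order."""
        self.ratios = out_ratios
        self.taken = {v: [0] * len(edges) for v, edges in out_ratios.items()}
        self.visits = {v: 0 for v in out_ratios}

    def next_vertex(self, v):
        self.visits[v] += 1
        edges, taken = self.ratios[v], self.taken[v]
        # First edge whose count lags behind what its target ratio suggests.
        for i, (succ, share) in enumerate(edges):
            if Fraction(taken[i]) < share * self.visits[v]:
                taken[i] += 1
                return succ
        taken[0] += 1            # otherwise, take the first edge
        return edges[0][0]

# Tiny check on two vertices, where the self-loop at 'a' should be taken in
# half of the visits to 'a':
sched = RatioScheduler({"a": [("a", Fraction(1, 2)), ("b", Fraction(1, 2))],
                        "b": [("a", Fraction(1, 1))]})
v, play = "a", []
for _ in range(12):
    play.append(v)
    v = sched.next_vertex(v)
print("".join(play))   # prints "aabaabaabaab": at 'a', the two choices alternate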

Remark 4.9. An implementation of such a list is finite: let re be the ratio of an outgoing edge e of a vertex v divided by the ratio of the vertex v it emerges from, and let d be the least common denominator of these ratios for a vertex v. Then we can reset the counters for the outgoing edges of v to 0 after every d visits to v. The result is obviously a well behaved strategy profile, and the first part of the constraint system is clearly satisfied. It therefore suffices to convince ourselves that the second part is satisfied as well.

Now assume for contradiction that this is not the case. Let qv and qe be the real ratios of the vertices and edges, respectively. Note that our simple rule for the selection of edges implies that the ratio pe/pv is realised correctly for all edges e = (v, v′) ∈ E ∩ (S × S). Then there must be a vertex v ∈ S which has the highest factor qv/pv. As it is the highest factor, none of its predecessors in E ∩ (S × S) can have a higher ratio; consequently, they must have the same ratio. By a simple inductive argument, this expands to the complete strongly connected set of non-0 vertices. As $\sum_{v\in S} p_v = 1 = \sum_{v\in S} q_v$ holds, this implies pv = qv for all v ∈ S. To extend this argument to the general case, we first observe that the non-0 vertices and edges form islands of (maximal) strongly connected parts C1 through Ck. We use this observation to compose a play as follows.

We start with an initial part, a transfer from v0 to C1, as in the simple case. We then continue by playing a $C^1_1$ part, a transfer, a $C^1_2$ part, a transfer, ..., a $C^1_k$ part, a transfer to $C^2_1$, and so forth. To achieve a well behaved strategy profile we do the following.

1. We fix the ratio $\sum_i C^i_1 : \sum_i C^i_2 : \ldots : \sum_i C^i_k$ according to the sum of the pv for the vertices v in the respective component. This ratio never changes, and it is given

by natural numbers c1, c2, . . . , ck, such that c1 : c2 : ... : ck satisfies this ratio.

2. We let $C^i_j$ grow slowly with i. We can, for example, use i · cj. Note that the transfer parts have constant length, bounded by |S|. Thus the limit ratio of the transfers is 0.

3. We let the transfer to $C^i_{j+1}$ go to the vertex in which $C^i_j$ was left. Note that the transfer may contain vertices of various components, but as the overall ratio of the transfers is 0, this does not affect the limit probability. Consequently, we can use the controller from the simple case of one SCC for the sequence $C^1_i, C^2_i, C^3_i, \ldots$, which only focuses on the relevant part of the i-th component.

In effect, we have simple controllers for the individual components, and a single counting controller that manages the transfer between the components. It is easy to see that the resulting controller inherits the correct ratios from the simple individual controllers. Together with Corollary 4.8 we get:

Theorem 4.10. If the linear program $LP^\sigma_S$ for sets Q of reachable states and S of states visited infinitely often has a solution, then there is a well behaved reward and punish strategy profile that meets this solution.

Finally, we show that non-well behaved reward and punish strategy profiles cannot provide a better solution than the one provided by the previous theorem.

Theorem 4.11. For given sets Q and S, non-well behaved reward and punish strategy profiles cannot provide better rewards for the leader than the reward rl for the leader obtained by the well behaved reward and punish strategy profiles described above.

Proof. We have shown in Lemma 4.7 that there exists a well defined constraint system obeyed by all reward and punish strategy profiles with set Q of reachable states, for all p ∈ own(Q) for Nash equilibria, and for all p ∈ own(Q) ∖ {l} for leader strategy profiles. Let us assume for contradiction that there is a reward and punish strategy profile σ that defines a play πσ with a strictly better reward rl(πσ) = rl + ε for some ε > 0.

Let k be some position in πσ such that, for all i ≥ k, only positions in the infinity set

S of πσ occur. Let π be the tail vk vk+1 vk+2 . . . of πσ that starts at position k. Obviously rp(π) = rp(πσ) holds for all players p ∈ P. We observe that, for all δ > 0, there is an l ∈ ℕ such that, for all m ≥ l, $\frac{1}{m}\sum_{i=0}^{m-1} r_p\bigl((v_i, v_{i+1})\bigr) > r_p(\pi) - \delta$ holds for all p ∈ P, as otherwise the limit inferior property would be violated.

We now fix, for all a ∈ ℕ, a sequence $\pi_a = v_k v_{k+1} v_{k+2} \ldots v_{k+m_a}$, such that $v_{k+m_a+1} = v_k$ and $\frac{1}{m_a}\sum_{i=0}^{m_a-1} r_p\bigl((v_i, v_{i+1})\bigr) > r_p(\pi) - \frac{1}{a}$ holds for all p ∈ P.

Let $\pi_0 = v_0 v_1 \ldots v_{k-1}$. We now select $\pi' = \pi_0\, \pi_1^{b_1}\, \pi_2^{b_2}\, \pi_3^{b_3} \ldots$, where the $b_i$ are natural numbers big enough to guarantee that $\frac{b_i \cdot |\pi_i|}{|\pi_{i+1}| + |\pi_0| + \sum_{j=1}^{i} b_j \cdot |\pi_j|} \geq 1 - \frac{1}{i}$ holds. Letting $b_i$ grow this fast ensures that the payoff, which is at least $r_p(\pi) - \frac{1}{i}$ for all players p ∈ P, dominates till the end of the first iteration of $\pi_{i+1}$.¹ The resulting play belongs to a well behaved (as the limit exists) strategy profile, and can thus be obtained by a well behaved reward and punish strategy profile by Lemma 4.7. It thus provides a solution to the linear program from above, which contradicts our assumption.

In particular, Theorems 4.10 and 4.11 imply together with Lemma 4.3 the existence of a leader equilibrium.

Corollary 4.12. Leader equilibria exist for all multi-player mean-payoff games.

Decision & optimisation procedures. The decision problem related to the construction of optimal equilibria asks whether or not, for a given threshold rthld, there exists a strategy profile σ which is a Nash equilibrium resp. leader strategy profile and provides a reward rl(πσ) ≥ rthld for the leader. In Lemma 4.7 and Theorem 4.11 we have established that it is enough to consider well behaved reward and punish strategy profiles. The relevant behaviour of these strategy profiles is captured by the set Q of reachable vertices, the set S of infinitely visited vertices, and the ratios of the edges in E ∩ (S × S). We use this observation in various algorithms, starting with a non-deterministic one.

Theorem 4.13. For an MMPG M and a threshold rthld, the respective decision problem for leader strategy profiles and Nash equilibria is NP-complete, both in the general case and when restricted to zero-sum games with payoffs in {−1, 1}.

Proof. We use non-determinism to first guess a set Q of visited vertices, a set S of vertices visited infinitely often, and then the linear program defined by them and a solution thereof. Note that the linear program is polynomial in M and, consequently, has a solution of polynomial size. After having a closer look at the sets Q and S, we can check that there is a path from the initial vertex to S, that S is strongly connected, that Q and S define the guessed linear program, that its constraint system is satisfied by the solution, and that the reward of the leader is at least the given threshold rthld. All of these tests can obviously be performed in polynomial time. The respective hardness results are established in Theorem 4.28.

¹ Including the first iteration of π_{i+1} is a technical necessity: a complete iteration of π_{i+1} provides better guarantees, but without the inclusion of this guarantee, the π_j's might grow too fast, preventing the existence of a limit.

Although there is no perfectly fitting lemma or theorem to cite, the inclusion in NP could have been inferred from [UW11], as the techniques used there are quite similar to ours. We re-proved it because we need the intermediate results below. The hardness result uses a polynomial number of players. This raises the question of whether the complexity is better for a bounded number of up to k players. We first assume that we are already provided with solutions to the 2MPGs associated with M. To devise a decision procedure, we start with a simple observation:

Lemma 4.14. For a given MMPG M with k players and n vertices, there are at most (n + 1)k many different thresholds in the related linear programs.

Proof. For each player p, there is either the threshold rp(v) for some vertex v of M, or no restriction on the threshold at all in Part II of the constraint system of a linear program.

Consequently, we only have to consider the most liberal constraint systems.

Lemma 4.15. For a given MMPG M with k players and n vertices and a threshold as referred to in the proof of Lemma 4.14, it suffices to refer to up to n first parts of the constraint system of the linear programming problem.

Proof. For each Part II of the constraint system as referred to in the proof of Lemma 4.14, it is easy to determine the maximal set Q of nodes that can be visited. For this maximal Q, we can determine the strongly connected components S_1, S_2, ... of the game graph (V, E) restricted to Q that are reachable from the initial vertex v_0. Obviously, there are at most n of them. It is now easy to see that, for all Q', S' that define Part II of the constraint system, Q' is contained in Q and S' is contained in one SCC S_i from above. Now Q and S_i define a more liberal Part I of a constraint system than Q' and S'. Thus, every solution for Q' and S' is a solution for Q and S_i, too.

Thus, for a given k, there are only polynomially many linear programming problems to consider, and they are easy to construct. Solving linear programming problems requires only polynomial time [Kha79, Kar84]. We thus obtain the following theorem.

Theorem 4.16. If we are provided with the solutions to the 2MPGs defined by an MMPG with a fixed number k of players, then we can determine an optimal solution in polynomial time.

Corollary 4.17. MMPGs with a fixed number of players can be solved in polynomial time by a machine with an oracle for solving two-player zero-sum MPGs. If 2MPGs are solvable in polynomial time, so are MMPGs with a fixed number of players.

4.4.3 Reduction to two-player mean-payoff games

Thus, optimal strategy profiles in MMPGs with a fixed number of players can be derived from solutions to 2MPGs. Various works have been published on solving 2MPGs.

In [BCD+11], the authors give an improved pseudopolynomial procedure to solve two-player mean-payoff games. [BV07] provides a randomised strongly subexponential and pseudopolynomial algorithm, [EY10] shows that solving 2MPGs is in PPAD, and [Jur98, ZP96] provide a UP∩coUP algorithm for the respective decision problems. There are further reductions, such as one to symbolic linear programming [Sch09], and a smoothed polynomial time complexity analysis [BEF+11]. Corollary 4.17 therefore provides the following:

Corollary 4.18. MMPGs with a fixed number of players can be solved in UP∩coUP, in pseudo polynomial time, in smoothed polynomial time, in PPAD, and in randomised subexponential time.

4.5 Incentive Equilibria

We are now ready to extend the technical details from Section 4.4 to establish the existence of incentive equilibria in multi-player mean-payoff games. We first introduce a canonical class of incentive strategy profiles, the perfectly incentivised strategy profiles (PISPs), that corresponds to the Stackelberg version of classic subgame perfection. Note that not all PISPs are valid incentive strategy profiles (ISPs). On the other hand, we show that every ISP has a corresponding PISP (which is also an ISP) with the same leader reward, such that it suffices to consider this class to construct incentive equilibria. In the following subsection, we show that, for PISPs that are ISPs, it suffices to find a maximum in a well behaved class of strategy profiles, namely strategy profiles where every edge has a limit share of the run, by showing that the supremum over general strategies cannot be higher than the supremum over these well behaved ones. We then show how to construct well behaved PISPs that are ISPs based on a family of constraint systems that depend on the occurring and recurring vertices of the play. At the same time, we show that no general ISP that defines a play with this set of occurring and recurrent vertices can have a higher value. The set of occurring and recurrent vertices can be guessed, and the respective constraint system can be built and solved in polynomial time, which also provides inclusion of the related decision problem in NP. Finally, we use another stability criterion of our canonical form (they are secure in each subgame after deviation) to turn incentive equilibria into secure and subgame perfect ε incentive equilibria.

4.5.1 Canonical incentive equilibria

We define a canonical form of incentive equilibria that we call perfectly incentivised strategy profiles (PISPs). In a PISP, a deviator (a deviating follower) is punished, and the leader incentivises all other followers to collude against the deviator. While the larger set of strategies and plays that define them (when compared to Nash and leader equilibria) leads to a better value, this incentive scheme also leads to higher stability: the games are subgame perfect relative to the leader. Before defining subgame perfect incentive strategy profiles, we first define reachable subgames.

For a play π that starts at the initial vertex v_0, every history hv along π that ends in a reachable vertex v defines a reachable subgame, which starts at vertex v and defines the same play.

Definition 4.19 (Subgame Perfect). A strategy profile (σ, ι) is a subgame perfect incentive strategy profile if every reachable subgame is also an incentive strategy profile.

This term adjusts the classic notion of subgame perfect equilibria to our setting. Subgame perfection refers to believable threats: broadly speaking, when a player threatens to play an action that harms herself, then it may happen that the other players do not believe this player and therefore deviate. In a subgame perfect Nash equilibrium, it is therefore required that the subgame started on each history also forms a Nash equilibrium. Note that the leader is allowed to benefit from deviation in our setting.

The means to obtain subgame perfection after deviation is to make all players harm the most recent deviator. Thus, we essentially resort to a two-player game. For a multi-player mean-payoff game G, we define, for each follower p, the two-player mean-payoff game where p keeps his reward function, while all other players have the same antagonistic reward −r_p. Two-player mean-payoff games are memoryless determined, such that every vertex v has a value, which we denote by r_p(v). This value clearly defines a minimal payoff of a follower: when he passes by a vertex in a play, then he cannot expect an outcome below r_p(v), as he would otherwise deviate.

PISP strategy profiles are in the tradition of reward and punish strategy profiles (cf. Section 4.4.2). In any 'reward and punish' strategy profile, the leader uses the power of all remaining followers to punish a deviator. If a player p chooses to deviate from the strategy profile at history h, the game turns into a two-player game, where all the other followers and the leader forsake their own interests and jointly try to 'punish' p. That is, player p may still try to maximise his reward and his objective remains exactly the same, but the objective of the rest of the players changes to minus the reward of player p. As they form a coalition with the joint objective to harm p, this is an ordinary two-player mean-payoff game that starts at the vertex last(h). For a strategy profile σ and a history h, we call h a deviating history if it is not a prefix of π_σ. We denote by dev(h, σ) the last player p who has deviated from his or her strategy σ_p on a deviating history h.

Definition 4.20 (Perfectly Incentivised Strategy Profile). A perfectly incentivised strategy profile (PISP) is a strategy profile (σ, ι) with the following properties. For all prefixes h and h' of π_σ and for all followers p, it holds that ι_p(h) = ι_p(h'). We also refer to this value by ι_p. For deviating histories h', the incentive ι_p(h') is 0 except for the cases explicitly referred to below. On every deviating history h with deviating player p = dev(h, σ), the player p' who owns the vertex v = last(h) follows the strategy from the 2MPG G_p. If, under this strategy, player p' selects the successor v' at a vertex v in the 2MPG G_p (and thus σ_{p'}(h) = v'), p' is a follower, and p' ≠ p, then player p' receives an incentive such that r_{p'}((v, v')) + ι_{p'}(h · v') = r_max + 1.

Intuitively, these are the special kind of incentive strategy profiles that are subgame perfect. PISPs are perfectly incentivised, correspond to reward and punish strategies, and are subgame perfect. A PISP in a situation where no follower has deviated is subgame perfect, as it gives every follower at least the maximal value he could secure in the respective 2MPG (otherwise, the follower would deviate). For the case where a follower has deviated, the leader can make the strategy profile subgame perfect as follows: she awards an incentive to all the remaining players she wants to incentivise at that moment, to an amount such that they receive their maximal reward from the 2MPG (plus one) in each step. Therefore only PISPs, and not all incentive strategy profiles, are subgame perfect. However, not all PISPs are necessarily ISPs, as a PISP does not guarantee an optimal play. Note that, technically, the leader punishes herself in the definition of a PISP. This is only to keep the definitions simple; she is allowed to have an incentive to deviate, and subgame perfection does not impose a criterion upon her. Note also that a PISP is not necessarily an incentive strategy profile, as it does not guarantee anything about π_σ. A PISP satisfies the constraints of reward and punish strategy profiles and has a corresponding ISP, such that it suffices to consider PISPs only (cf. Theorem 4.21). PISPs essentially define the constant values (the incentives and the rewards) that the leader should pay to her followers for the profile to be an ISP. The deviating histories of a PISP (cf. Lemma 4.22) are therefore not part of an incentive strategy profile, as they do not guarantee that the leader receives a maximal reward. To summarise, a PISP is subgame perfect (relative to the leader), and it is an ISP if it results in an optimal play and is well behaved. We now give the following theorem that states the importance of PISPs in constructing incentive equilibria.

Theorem 4.21. Let (σ, ι) be an ISP that defines a play πσ. Then we can define a PISP (σ, ι), which is also an ISP, with the same reward that defines the same play.

The proof of this theorem follows from Lemma 4.22 and Lemma 4.23.

Lemma 4.22. Let (σ', ι') be a strategy profile that defines a play π_σ', which contains precisely the vertices Q. Let (σ', ι') satisfy that, for all followers p and all vertices v ∈ Q owned by p, ι'_p(σ') + r_p(σ') ≥ r_p(v). Then we can define a PISP (σ, ι) with the same reward, which defines the same play.

Proof. We note that a PISP (σ, ι) is fully defined by the play π_σ and by ι restricted to the prefixes of π_σ. We now define the PISP (σ, ι) with the following property: π_σ = π_σ', that is, the play of the PISP equals the play defined by the strategy profile we started with. For all followers p and all prefixes h of π_σ, we have ι_p(h) = ι'_p(σ'). It is obvious that (σ', ι') and (σ, ι) yield the same reward for all followers and the same reward for the leader. We now assume for contradiction that the resulting PISP is not an incentive strategy profile. If this is the case, then a follower p must benefit from deviation at some history h. Let us start with the case that h is a deviating history. In this case, the reward for

p upon not deviating is r_max + 1, while it is the outcome of some game upon deviation, which is clearly bounded by r_max. We now turn to the case that h is not a deviating history, and therefore a prefix of π_σ. Let p be the owner of v = last(h). If p is the leader, we have nothing to show. If p is a follower and does not have an incentive to deviate in (σ, ι), we have nothing to show. If p is a follower and has an incentive to deviate in (σ, ι), we note that his payoff after deviation would be bounded from above by r_p(v). Thus, he does not have an incentive to deviate (contradiction).

Lemma 4.23. Let (σ, ι) be an ISP that defines a play π_σ, which contains precisely the vertices Q. Then, for all followers p and all vertices v ∈ Q owned by p, ι_p(σ) + r_p(σ) ≥ r_p(v) holds.

Proof. Assume that this is not the case for a follower p and a vertex v ∈ Q owned by p. Then p would benefit upon deviating when visiting v.

4.5.2 Existence and construction of incentive equilibria

A second observation is that we can resort to a simple type of strategy profiles which are well behaved in that each edge occurs with a limit probability (the limit inferior and limit superior of the share of its occurrence on π_σ are equal). This is similar to the case of leader equilibria (cf. Section 4.4.2). We first discuss how to find optimal ISPs among well behaved PISPs, and then show that no superior ISPs exist.

Linear programs for well behaved PISPs that are also ISPs. The first part of the constraint system is similar to the constraints given in Section 4.4.2 (cf. constraint system CS_σ), i.e., for every vertex and every edge that occurs in π_σ, we put a constraint that its vertex resp. edge ratio has a limit. The first part of the constraint system for the strategy profile that defines π_σ is thus akin to the first part of the constraint system from CS_σ. The second part of the constraint system gives constraints over the reward functions of the players, i.e., for the play π_σ and for all followers p, we now use the raw reward plus the incentive given to each follower. The second part of the constraint system therefore stems from the proof of Lemma 4.22: the reward for every follower p, given by r_p(σ), is simply $\sum_{e\in E} p_e\, r_p(e)$ plus ι_p, that is, it is the weighted sum of the raw rewards of the individual edges plus the incentive given to the follower. We therefore get the following constraints:

$$\iota_p + \sum_{e\in E} p_e\, r_p(e) \;\ge\; \max_{v\in Q} r_p(v)$$
for all followers p for an ISP. Before we define the objective function, we state a simple corollary from the proof of Lemma 4.22.

Corollary 4.24. Every well behaved PISP that is an ISP satisfies these constraints, and every well behaved strategy profile (σ, ι), whose play πσ satisfies these constraints, defines a PISP, which is then an ISP.

Note that the resulting PISP is an ISP even if (σ, ι) is not. This is because the satisfaction of the constraints is enough for the final contradiction in the proof of Lemma 4.22. The objective of the leader is obviously to maximise $r_l(\sigma) - \sum_{p\in P\setminus\{l\}} \iota_p = \sum_{e\in E} p_e\, r_l(e) - \sum_{p\in P\setminus\{l\}} \iota_p$. Once we have this linear programming problem, it is simple to determine a solution in polynomial time [Kar84, Kha79]. Using the similar description from Section 4.4.2 (cf. constraint system CS_σ), we first observe that it is standard to construct a play defining a PISP from a solution. Another key observation is that, if the linear program detailed above for sets Q of reachable states and S of states visited infinitely often has a solution, then there is a well behaved reward and punish strategy profile that meets this solution.
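Putting the two parts together, the linear program for a guessed pair (Q, S) can be summarised as follows. This is only a compact restatement of the constraints and objective above, with the vertex- and edge-ratio constraints of Section 4.4.2 abbreviated as CS_σ:

\begin{align*}
\text{maximise}\quad & \sum_{e\in E} p_e\, r_l(e) \;-\; \sum_{p\in P\setminus\{l\}} \iota_p\\
\text{subject to}\quad & \iota_p + \sum_{e\in E} p_e\, r_p(e) \;\ge\; \max_{v\in Q} r_p(v) \qquad \text{for all followers } p,\\
& \iota_p \ge 0 \text{ for all followers } p,\quad p_e \ge 0 \text{ for all edges } e,\\
& \text{and the ratio constraints of } \mathit{CS}_\sigma \text{ on } Q \text{ and } S.
\end{align*}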

Theorem 4.25. Non-well behaved PISPs that are also ISPs cannot provide better rewards for the leader than those from well behaved PISPs that are also ISPs.

Proof. The proof is closely related to the proof of Theorem 4.11. Corollary 4.24 shows that there exists a well defined constraint system obeyed by all well behaved PISPs that are also ISPs with a set Q of reachable states and a set S of recurrent states. Let us assume for contradiction that there is a reward and punish strategy profile (σ, ι) that defines a play π_σ with the same sets Q and S of reachable and recurrent states, respectively, and that provides a strictly better reward $r_l(\sigma) - \sum_{p\in P\setminus\{l\}} \iota_p$, which exceeds the maximal reward obtained by the leader in well behaved PISPs that are also ISPs by some ε > 0. We now construct a well behaved PISP that is also an ISP and that provides a better return. First, we take ι' with ι'_p = ι_p for all followers p. This allows us to focus on the raw rewards only.

Let k be some position in π_σ such that, for all i ≥ k, only positions in the infinity set S of π_σ occur. Let π be the tail v_k v_{k+1} v_{k+2} ... of π_σ that starts in position k. Obviously r_p(π) = r_p(σ) holds for all players p ∈ P. We observe that, for all δ > 0, there is an l ∈ ℕ such that, for all m ≥ l, $\frac{1}{m}\sum_{i=0}^{m-1} r_p\big((v_i, v_{i+1})\big) > r_p(\pi) - \delta$ holds for all p ∈ P, as otherwise the limit inferior property would be violated.

We now fix, for all a ∈ ℕ, a sequence π_a = v_k v_{k+1} v_{k+2} ... v_{k+m_a}, such that v_{k+m_a+1} = v_k and $\frac{1}{m_a}\sum_{i=0}^{m_a-1} r_p\big((v_i, v_{i+1})\big) > r_p(\pi) - \frac{1}{a}$ holds for all p ∈ P. Let π_0 = v_0 v_1 ... v_{k-1}. We now select $\pi' = \pi_0 \pi_1^{b_1} \pi_2^{b_2} \pi_3^{b_3} \ldots$, where the b_i are natural numbers big enough to guarantee that
$$\frac{b_i \cdot |\pi_i|}{|\pi_{i+1}| + |\pi_0| + \sum_{j=1}^{i} b_j \cdot |\pi_j|} \;\ge\; 1 - \frac{1}{i}$$
holds.

Figure 4.9: Secure equilibria. (A two-vertex game with rewards (0, 0) and (0, 1), shown in the order (follower, leader).)

Letting b_i grow this fast ensures that the payoff, which is at least $r_p(\pi) - \frac{1}{i}$ for all players p ∈ P, dominates until the end of the first iteration of π_{i+1}.² The resulting play belongs to a well behaved (as the limit exists) strategy profile, and can thus be obtained by a well behaved PISP by Corollary 4.8. It thus provides a solution to the linear program from above, which contradicts our assumption.

Consequently, it suffices to guess the optimal sets Q of vertices that occur and S of vertices that occur infinitely often to obtain a constraint system that describes an incentive equilibrium, which is well behaved and a PISP—and therefore subgame perfect.

Corollary 4.26. The decision problems 'is there an incentive equilibrium with leader reward ≥ r' and 'is there a subgame perfect incentive equilibrium with leader reward ≥ r' are in NP, and the answers to these two questions coincide.

Recall that a (subgame perfect) incentive equilibrium refers to those strategy profiles that are subgame perfect relative to the leader (this criterion does not apply to the leader herself, as she is allowed to deviate). Note that, if we have a fixed number of players, the number of possible constraint systems is polynomial. As for leader equilibria, there are only polynomially many (for n vertices and k followers, O(n^k) many) second parts (the constraints on the follower rewards) of the constraint systems. For them, it suffices to consider the most liberal set Q (which is unique) and the sets S (the SCCs in the game restricted to Q, at most n of them). For a fixed number of players, finding incentive equilibria is therefore in the same class as solving 2MPGs.

4.5.3 Secure ε incentive strategy profiles

We take a short detour to another class of equilibria that make a solution stable: secure equilibria. Secure equilibria [CHJ06] have been defined as Nash equilibria with the additional property that, upon unilateral deviation of any player, this player either strictly loses, or no other player loses. Naturally, we have to adjust this definition appropriately. We say that a strategy profile (σ, ι) is a secure incentive strategy profile if, upon unilateral deviation, every follower either receives a strictly lower reward, or an equal reward. In the latter case, all other players have to receive at least the same reward as before. We show that we can obtain subgame perfect secure ε incentive equilibria (i.e., every subgame is a secure incentive strategy profile) by simply increasing the individual

² Including the first iteration of π_{i+1} is a technical necessity: a complete iteration of π_{i+1} provides better guarantees, but without the inclusion of this guarantee, the π_j's might grow too fast, preventing the existence of a limit.

incentives from the strategy we have constructed by ε/|P|. The payoff difference between secure ε incentive equilibria and general incentive equilibria is therefore arbitrarily small. This is in contrast to leader and Nash equilibria, where security can come at a high cost. In the simple example from Figure 4.9 (rewards are shown in the order (follower, leader)), where the left vertex is owned by the follower, the leader can incentivise the follower to move to the right vertex by an arbitrarily small incentive ε, resulting in a secure incentive strategy profile and payoffs of 1 − ε and ε for the leader and her follower, respectively. A secure leader (and Nash) equilibrium would require the follower to stay forever in the left vertex, resulting in a payoff of 0 for the leader and her follower alike. This is in contrast to 'normal' leader (or Nash) equilibria, which would allow for the follower moving the token to the right, resulting in a payoff of 1 and 0 for the leader and her follower, respectively.

Theorem 4.27. We can obtain a secure subgame perfect ε incentive equilibrium (σ, ι).

Proof. Using Theorem 4.25, we can produce a well behaved incentive equilibrium, which is also a PISP. Re-visiting the proof of Lemma 4.22, this PISP satisfies the requirements of a secure incentive equilibrium in every subgame that starts in a deviating history. For non-deviating histories, however, we only have that no follower benefits from deviation, but the profile will normally lack the security property (e.g., in the example from Figure 4.6). We now produce a new PISP (σ, ι'), such that ι' is obtained from ι by selecting ι'_p = ι_p + ε/|P| for all followers. The subgames that start in a deviating history are not affected by this change, such that the resulting PISP also satisfies the requirements of a secure incentive equilibrium from these positions. For non-deviating histories, however, we have now increased the value for following slightly, such that the prerequisite of secure equilibria is satisfied here, too. (A deviating follower would strictly decrease his reward.)

4.6 NP-hardness

In this section, we establish that the related decision problem of whether there exists a leader resp. incentive equilibrium with a payoff for the leader greater than or equal to a given threshold is NP-complete.

Theorem 4.28. The problem of deciding whether a leader resp. incentive equilibrium σ with leader reward $r_l(\sigma) \ge 1$ resp. $r_l(\sigma) \ge 1 - \frac{1}{2n}$ exists in games with rewards in {0, 1} is NP-complete.

Proof. The proof is closely related to the NP-hardness proof from [UW11]. To establish NP-hardness, we reduce the satisfiability problem for 3SAT formulas over n atomic propositions with m conjuncts to solving an MMPG. The game graph has 2n + 1 players and 5m + 4n + 2 vertices, with payoffs 0 and 1 only. The 2n + 1 players consist of 2n players for the 2n literals corresponding to the n variables, and the leader, who intuitively tries to validate the formula. The number of variables n therefore relates to the number of players (which is 2n + 1). As a running example for the reduction, we consider the

3SAT formula C_1 ∧ C_2 ∧ C_3 with C_1 = p ∨ q ∨ ¬r, C_2 = p ∨ ¬q ∨ ¬r, and C_3 = ¬p ∨ q ∨ r. We have 2n players for the 2n literals that correspond to the n variables, and there is one leader player. The game consists of three phases: an initial assignment phase, in which the leader assigns either true or false to the literal players; a validation phase, in which the leader intuitively tries to validate the 3SAT formula according to the assignment made in the assignment phase; and a third, evaluation phase, in which payoffs are rewarded to every player including the leader. The assignment phase consists of 2n literal vertices and m leader vertices, as the leader chooses literals corresponding to the n variables for the assignment. In the validation phase, there are 3 vertices for every conjunct of the 3SAT formula, i.e., 3m vertices. In the evaluation phase, we again have 2n literal vertices and a leader vertex, and the game goes round in a cycle of length n. At every point in this cycle, payoffs are given to the players and the leader. There is additionally one sink vertex in the game graph, which has only one outgoing edge, to itself. The sink vertex has a payoff of 1 for all literal players but a payoff of 0 for the leader; the game effectively terminates at the sink vertex. It is in the evaluation phase that, by choosing the payoffs for the players, it can be decided whether there exists a leader resp. incentive equilibrium in the game with a payoff of ≥ 1 for the leader.

Starting in the assignment phase, there are m leader vertices, one for each of the m conjuncts. For each variable 'Z', the leader chooses either the assignment 'z' or '¬z'. The selected literal vertex then has two choices: to continue the game by going to the next leader vertex, or to terminate the game by going to the sink vertex. If the literal vertex chooses to go to the next leader vertex, the leader continues with further assignments in the assignment phase.

In the validation phase, the leader intuitively tries to validate whether the 3SAT formula is satisfied by the assignments chosen in the assignment phase. For each conjunct and each variable 'Z', the leader either goes to 'z', where the literal player '¬z' receives a payoff of 0 while every other player and the leader receive a payoff of 1, or to '¬z', where symmetrically 'z' receives a payoff of 0. Here, too, at every literal vertex, the player may opt to continue the game by going to the next leader vertex, or to terminate the game by going to the sink vertex. In our example, the formula is satisfiable, and the leader can select the literals p, ¬q and r in the assignment phase. The leader resp. incentive equilibrium would then follow a path (p, ¬q, r)^ω with an optimal leader return of 1. If the formula is not satisfiable, any run would have to pass through both 'z' and '¬z' for some variable, and a literal player who receives a payoff of 0 at some point might terminate the game by going to the sink vertex, which results in the leader receiving a payoff of 0. Thus, for all strategy profiles that end up in the sink vertex, and for unsatisfiable formulas, a leader resp. incentive equilibrium has a payoff of 0 for the leader. In the validation phase, if a literal player deviates to the sink vertex, all other players receive a payoff of 1. The leader therefore incentivises all remaining players to form a coalition and act against the deviating player.
The leader can achieve this by promising to pay a small incentive of $1 - \frac{1}{2n}$ to every other non-deviating follower in the game. If the formula is satisfiable, the game instead proceeds to the evaluation phase, where the nodes are owned by the leader. Here, for a variable 'Z', the leader either moves to 'z', where a payoff of 1 is given to every player but 0 is given to '¬z', or the leader goes to '¬z', where a payoff of 1 is given to '¬z' and all other players, while a payoff of 0 is given to 'z'. Additionally, an incentive of $\frac{1}{2n}$ is given to all other players. The leader's payoff in any leader resp. incentive equilibrium is therefore greater than or equal to 1 resp. $1 - \frac{1}{2n}$. The leader's return is therefore better than her payoff of 0 at the sink vertex. The proof is now complete.

A simple (polynomial-time) reduction from SAT that establishes NP-hardness of Nash equilibria in games with qualitative objectives, where payoffs are just 0 and 1 and each player either loses or wins the game, is given by Ummels [Umm05]. That setting is simple, as it allows only one player to make non-trivial choices, and the reduction establishes that there exists a Nash (subgame perfect) equilibrium where all players win. To obtain NP-hardness for leader resp. incentive equilibria from [Umm05], we first assign edge weights for each player. Player 0 (whom we now take to be the leader) controls the intermediate vertices, and for each clause there are literal vertices X_i and X_{¬i} that belong to player i. The vertex reached after the m clauses now contains an edge back to the start vertex. The game arena is further extended by adding a sink vertex that contains an edge to itself, and a player i can terminate the game from a literal vertex X_i or X_{¬i} owned by him by taking an edge to the sink vertex.

From the intermediate leader vertices, the leader can move to X_i or X_{¬i}, which are owned by player i. At X_i, player i receives a reward of 1 if the literal occurs in the formula, and all other players (including the leader) receive 0. At X_{¬i}, the leader receives a reward of 1 while all other players (including player i) receive 0. At the sink vertex, every literal player receives a payoff of 1 while the leader receives a payoff of 0. The leader receives a payoff of 1 if the formula is satisfiable. Clearly, the leader can construct an equilibrium path if a satisfying assignment exists, as the leader at the intermediate vertices would only take an edge to a literal vertex that occurs in the formula, such that no literal player has an incentive to deviate to the sink vertex. Thus a satisfiable formula implies the existence of an incentive resp. leader equilibrium with payoff ≥ 1 for the leader.

Zero-sum games To progress from the proof of Theorem 4.28 to zero-sum games, we can simply replace the rewards of 0 by −1 and add (n − 1) additional players who own no vertex and always have a reward of −1. The result that these games cannot be approximated clearly carries over.

Theorem 4.29. The decision problem of whether or not a leader resp. incentive equilibrium σ with leader reward r_l(σ) ≥ 1 exists in games with rewards in {0, 1} resp. zero-sum games with rewards in {−1, 1}, such that the reward of the leader is always in {0, 1} resp. {−1, 1}, is NP-complete.

The NP-hardness proof from Theorem 4.28, together with Corollary 4.26, which gives the inclusion in NP, yields NP-completeness.

Corollary 4.30. Unless P=NP, no tractable algorithm can approximate the optimal reward of the leader closer than 0.5.

4.7 Implementation

In this section, we give details about the algorithmic techniques that we used in the implementation of a tool to evaluate multi-player mean-payoff games (MMPGs). We have implemented a tool [mpg] in C++ to evaluate the performance of the proposed algorithms for MMPGs with a small number of players. For an efficient implementation, we restricted the rewards to 0 and 1. This class of MMPGs is sufficient to solve quantitative Büchi game problems, where the goal of each player is to maximise the limit share of time spent in accepting states. Our main algorithm consists of two main steps: one is finding the base values by evaluating the underlying two-player mean-payoff games (2MPGs), and the other is to infer (and solve) a small number of linear programming problems. The key step to solving MMPGs with few players is the reduction to solving the underlying 2MPGs. We first give the algorithmic details and then discuss our experimental results. For solving the 2MPGs, we have implemented a strategy improvement algorithm for identifying 0-mean partitions and coupled it with a logarithmic search that finds the individual mean partitions and thus determines the value at the vertices. During the logarithmic search, the vertices are successively cut into mean partitions of decreasing size. The number of different solutions to these 2MPGs is usually small, and, consequently, the number of linear programming problems to solve is small, too. There are many algorithms for solving 2MPGs; the most promising candidates in our view are the efficient algorithms from [BV07]. An α-mean partition of a 2MPG is the subset of vertices for which the return is ≥ α. We first describe the optimal strategy improvement algorithm from [Sch08] for finding 0-mean partitions, and then we describe how we expand it to evaluate 2MPGs.

4.7.1 Strategy Improvement Algorithm

This section summarises the strategy improvement algorithm from [Sch08] for mean-payoff games, which gives an optimal improvement in every step by selecting the best combination of updates from the set of all profitable changes. For full proofs of the results related to this algorithm we refer the reader to [Sch08]. We have implemented this algorithm for solving two-player games and give its quantitative expansion in Section 4.7.2. The algorithm allows us to take the global effects of different local modifications to the strategy into account; it is primarily based on computing estimations at every step. We give below the technical definitions of escape games and of the game arena from [Sch08] that are used in the algorithm.

Escape Games Escape games are total reward games that are tailored for the optimal improvement method. They allow player 0 to terminate every play immediately on each of her positions. Technically this is done by extending the arena with a fresh escape position, which forms a sink of the extended arena, and can be reached from every position of player 0. Every play of an escape game either eventually reaches the escape position and then terminates, or it is an infinite play in the non-extended arena.

Bipartite Arena The arena is assumed to be bipartite for technical convenience.

Extended Arena In an escape game, the finite arena A = (V_0, V_1, E), where V_0 is the set of player 0 vertices, V_1 is the set of player 1 vertices, and E is the set of edges, is extended to the directed graph A' = (V_0, V_1', E'), which extends the arena A by a fresh position ⊥ of player 1 (V_1' = V_1 ⊎ {⊥}) that is reachable from every position of player 0 (E' = E ∪ V_0 × {⊥}). The escape position is a sink in A'.

Finite Plays Since the escape position is a sink, every play terminates when reaching ⊥. The set of plays is therefore extended by the finite plays π = p_0 p_1 p_2 p_3 ... ⊥ ∈ (V_0 ⊎ V_1)* {⊥}.

Escape Games An escape game is a game E = (V_0, V_1, E, w), where A = (V_0, V_1, E) is a finite arena and w : E → ℤ is a weight function that assigns an integer value to every edge. An escape game is played on the extended arena A' = (V_0, V_1', E'). For the extended arena, the weight function is extended to w' : E' → ℤ with w'(e) = w(e) for all e ∈ E and w'((v_0, ⊥)) = 0 for all v_0 ∈ V_0. An infinite play π = p_0 p_1 p_2 ... ∈ V^ω of an escape game is evaluated to ∞ if $\liminf_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} w'\big((p_i, p_{i+1})\big) = \infty$ and to −∞ if $\liminf_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} w'\big((p_i, p_{i+1})\big) = -\infty$. A finite play π = p_0 p_1 p_2 ... p_n ⊥ is evaluated to $v(\pi) = \frac{1}{n}\sum_{i=0}^{n-1} w'\big((p_i, p_{i+1})\big)$. While the objective of player 0 is to maximise this value, the objective of player 1 is to minimise it.

Estimations An estimation for an escape game E = (V_0, V_1, E, w) is defined as a valuation function v : V_0 ∪ V_1' → ℕ that witnesses the existence of a memoryless strategy f of player 0 which guarantees that every f-conform play π starting in some position p is evaluated to ρ(π) ≥ v(p). Formally, an estimation v has to satisfy the following side conditions:

• v(⊥) = 0 (recall that w' maps all outgoing edges from positions v_0 ∈ V_0 to ⊥ to 0),

• for every position p ∈ V_0 of player 0, there is an edge e = (p, p') ∈ E' such that v(p) ≤ v(p') + w'(e) holds,

• for every position q ∈ V_1 of player 1 and every edge e = (q, q') ∈ E', v(q) ≤ v(q') + w'(e) holds, and

• player 0 has a strategy f_∞ that maps every position p ∈ V_0 of player 0 with v(p) = ∞ to a position p' = f_∞(p) with v(p') = ∞, and which guarantees that every f_∞-conform play π starting in p is evaluated to ρ(π) = ∞.

A trivial estimation is simple to construct: we denote by v_0 the estimation that maps the escape position to v_0(⊥) = 0, every position p ∈ V_0 of player 0 to v_0(p) = 0, and every position q ∈ V_1 to v_0(q) = min{0 + w'((q, q')) | (q, q') ∈ E'}.
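As an illustration, the trivial estimation can be computed in a few lines. The data layout (owner array, successor lists on the extended bipartite arena) is an assumption of this sketch, not the interface of the thesis tool:

```cpp
#include <vector>
#include <algorithm>
#include <limits>

// Sketch: compute the trivial estimation v0 on the extended (bipartite) arena.
// Player-0 positions and the escape position get value 0; every player-1
// position gets the cheapest immediate step min{ 0 + w'(q, q') }.
using Val = long long;
struct Edge { int to; Val w; };   // an edge of the extended arena with weight w'(e)

std::vector<Val> trivial_estimation(const std::vector<int>& owner,            // 0 or 1 per position
                                    const std::vector<std::vector<Edge>>& succ,
                                    int bot) {                                 // index of the escape position
  std::vector<Val> v0(owner.size(), 0);        // v0(bot) = 0 and v0(p) = 0 for p in V0
  for (std::size_t q = 0; q < owner.size(); ++q) {
    if (owner[q] != 1 || static_cast<int>(q) == bot) continue;
    Val best = std::numeric_limits<Val>::max();
    for (const Edge& e : succ[q])              // successors of player-1 positions are
      best = std::min(best, v0[e.to] + e.w);   // player-0 positions (value 0) in a bipartite arena
    v0[q] = best;
  }
  return v0;
}
```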

Remark The simple construction of the trivial estimation is the only position where the restriction to bipartite games is used.

Solving Escape Games Escape games are tailored for the stepwise optimal improvement method. The strategy improvement algorithm is stepwise optimal in that it computes an optimal improvement of the estimation in every step, until a fixed point is reached.

Every estimation (for example, the trivial estimation v0) can be used as a starting point for the algorithm.

Optimal Improvement The estimations intuitively refer to strategies of player 0 for the extended arena. (Although estimations are a more general concept; not all estimations refer to a strategy.) The edges of the improvement arena A_v of an escape game E = (V_0, V_1, E, w) with an estimation v that originate in positions of player 0 refer to all promising strategy updates, that is, all strategy modifications that locally lead to a (not necessarily strict) improvement (profitable and stale modifications). We call an improvement v' of v optimal if it dominates all other estimations v̂ that refer to (memoryless) strategies of player 0 that contain only improvement edges. Finding this optimal improvement thus relates to solving an update game, which deviates from the full escape game E only by restricting the choices of player 0 to her improvement edges.

Basic Update Step Instead of computing the optimal improvement v' of an estimation v directly, the optimal update u = v' − v is computed.

For a given escape game E = (V_0, V_1, E, w) with estimation v, we define the improvement potential of an edge e = (p, p') ∈ E_v in the improvement arena A_v as the value P(e) = v(p') + w'(e) − v(p) ≥ 0 by which the estimation would locally be improved when the respective player chooses to turn to p' (disregarding the positive global effect that this improvement may have). To construct the optimal update, the improvement arena is constructed, and the optimal update of the escape position is evaluated to u(⊥) = 0.

The improvement of the remaining positions is then evaluated successively by applying the following evaluation rule:

1. if there is a position p ∈ V_1 of player 1 that has only evaluated successors, we evaluate the improvement of p to u(p) = min{u(p') + P(p, p') | (p, p') ∈ E},

2. else if there is a position p ∈ V_1 of player 1 that has an evaluated successor p' with u(p') = P(p, p') = 0, we evaluate the improvement of p to u(p) = 0,

3. else if there is a position p ∈ V_0 of player 0 that has only evaluated successors, we evaluate its improvement to u(p) = max{u(p') + P(p, p') | (p, p') ∈ E_v},³

4. else we choose a position p ∈ V_1 of player 1 with minimal intermediate improvement u'(p) = min{u(p') + P(p, p') | p' is evaluated and (p, p') ∈ E} and evaluate the improvement of p to u(p) = u'(p). (Note that min ∅ = ∞.)

Correctness The basic intuition for the stepwise optimal improvement algorithm is to re-estimate the value of a position only after all its successors have been re-estimated. In this situation, it is easy to determine the optimal decision for the respective player. In a situation where all unevaluated positions do have a successor, we note that every cycle in A_v has non-negative weight (weight ≥ 0), and every infinite play in A_v is evaluated to ∞. An optimal strategy of player 1 will thus turn, for some position of player 1, to an evaluated successor. It is safe to choose a transition such that the minimality criterion on the potential improvement u' is satisfied, because, independent of the choice of player 1, no better potential improvement can arise at any later time during this update step. Following these evaluation rules therefore provides an optimal improvement.
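The following is a naive, illustrative realisation of the four evaluation rules above in C++. It is a sketch under assumed types and data layout (owner array, successor lists restricted to improvement edges for player 0), not the tool's implementation, and it scans for an applicable rule in every round instead of using the priority-queue machinery discussed under 'Complexity' below:

```cpp
#include <vector>
#include <limits>
#include <algorithm>

using Val = long long;
static const Val INF = std::numeric_limits<Val>::max() / 4;   // encodes infinity

struct Edge { int to; Val w; };                                // weight w'(e) on the extended arena

static Val add(Val a, Val b) { return (a >= INF || b >= INF) ? INF : a + b; }

// Compute the optimal update u (so that v' = v + u) by applying rules 1-4 literally.
std::vector<Val> basic_update_step(
    const std::vector<int>& owner,                 // 0 or 1 per position
    const std::vector<std::vector<Edge>>& succ,    // player 0: improvement edges only; player 1: all edges
    const std::vector<Val>& v,                     // current estimation (INF = infinity)
    int bot)                                       // the escape position
{
  const int n = static_cast<int>(owner.size());
  std::vector<Val> u(n, 0);
  std::vector<bool> done(n, false);
  auto pot = [&](int p, const Edge& e) {           // improvement potential P(e) = v(p') + w'(e) - v(p)
    return (v[e.to] >= INF || v[p] >= INF) ? INF : v[e.to] + e.w - v[p];
  };
  auto all_done = [&](int p) {
    for (const Edge& e : succ[p]) if (!done[e.to]) return false;
    return true;
  };
  u[bot] = 0; done[bot] = true;                    // u(bot) = 0
  for (int left = n - 1; left > 0; --left) {
    int pick = -1; Val val = 0;
    for (int p = 0; p < n && pick < 0; ++p)        // rule 1: player-1 position, all successors evaluated
      if (!done[p] && owner[p] == 1 && all_done(p)) {
        Val m = INF;
        for (const Edge& e : succ[p]) m = std::min(m, add(u[e.to], pot(p, e)));
        pick = p; val = m;
      }
    for (int p = 0; p < n && pick < 0; ++p)        // rule 2: player-1 position with a zero-improvement successor
      if (!done[p] && owner[p] == 1)
        for (const Edge& e : succ[p])
          if (done[e.to] && u[e.to] == 0 && pot(p, e) == 0) { pick = p; val = 0; break; }
    for (int p = 0; p < n && pick < 0; ++p)        // rule 3: player-0 position, all improvement successors evaluated
      if (!done[p] && owner[p] == 0 && all_done(p)) {
        Val m = 0;
        for (const Edge& e : succ[p]) m = std::max(m, add(u[e.to], pot(p, e)));
        pick = p; val = m;
      }
    if (pick < 0) {                                // rule 4: minimal intermediate improvement (min over empty set is INF)
      Val best = INF + 1;
      for (int p = 0; p < n; ++p) {
        if (done[p] || owner[p] != 1) continue;
        Val m = INF;
        for (const Edge& e : succ[p])
          if (done[e.to]) m = std::min(m, add(u[e.to], pot(p, e)));
        if (m < best) { best = m; pick = p; val = m; }
      }
    }
    if (pick < 0) break;                           // should not occur on bipartite escape games
    u[pick] = val; done[pick] = true;
  }
  return u;
}
```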

Complexity In spite of the wide variety of strategies that are considered simultaneously, the update complexity of the algorithm is surprisingly low. The stepwise optimal improvement algorithm generalises Dijkstra's single source shortest path algorithm to two-player games. The critical part of the algorithm is to keep track of the intermediate update u', and the complexity of the algorithm depends on the data structure used. The default choice is to use binary trees, resulting in an update complexity of O(m log n). However, using advanced data structures like 2-3 heaps reduces this complexity slightly to O(m + n log n).

4.7.2 Quantitative evaluation of mean-payoff games

For finding the individual mean partitions, we first use the algorithm outlined in Section 4.7.1 to find 0-mean partitions; in this section we describe in detail how to expand it to find the values of 2MPGs.

³ We could also choose u(p) = max{u(p') + P(p, p') | p' is evaluated and (p, p') ∈ E}.

Solving 2MPGs

Recall that in two-player mean-payoff games, both players have optimal memoryless strategies. Under such strategies, the game follows a 'lasso path' from every starting vertex: a finite (and possibly empty) path, followed by a cycle, which is repeated infinitely many times. The value of a game position is defined by the average of the edge weights on this cycle. In our context, the edge weights are either 0 or 1. As both players have memoryless optimal strategies in 2MPGs, a play in our setting with payoffs 0 and 1 eventually follows a cycle of some length l. The values of the vertices are therefore fractions a/l with 0 ≤ a ≤ l ≤ n, where n is the total number of vertices in the game graph, and a is the number of 'accepting' events of the DBA that refers to the objective of the respective player, i.e., the number of edges with value 1 occurring on this cycle.

Note that such an algorithm only needs to use integer values. Conceptually, to find the a/l-mean partition, one would simply subtract a/l from the weight of every edge and look for the 0-mean partition. However, to stay with integers, it is better to use integer values on the edges, i.e., to replace the 0s by −a and the 1s by l − a, and then determine the 0-mean partition. For games with n vertices, there are only O(n²) values for the fraction a/l to consider, as optimal memoryless strategies always lead to lasso paths and only the cycle at the end of the lasso determines the values for a and l, where 0 < a < l ≤ n.

We start by narrowing down the set of values by classifying the mean partitions in a logarithmic search. After determining the 1/2-mean partition, we know which values are < 0.5 and which are ≥ 0.5, respectively. The two parts of the game can then be analysed further, determining the 1/4- and 3/4-mean partitions, respectively. After s such bisections, all values in a partition of the game are either known to be in an interval [k · 2^−s, (k + 1) · 2^−s[ for some k < 2^s − 1, or in the interval [1 − 2^−s, 1]. We stop bisecting when the size p of a partition is at most 2^s. In this case, the respective interval contains f ≤ p fractions with a denominator ≤ p. We determine them, store them in a balanced tree, and use it to determine the correct value of all vertices of the partition in ⌈log₂ f⌉ further steps.
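A minimal sketch of this reweighting step, assuming that a 0-mean-partition solver is available (for instance the strategy improvement algorithm above, passed in as a parameter here) and that the original edge weights are 0 or 1:

```cpp
#include <vector>
#include <functional>

struct WEdge { int from, to; long long w; };

// Sketch (assumed interface): decide the a/l-mean partition of a game whose
// original edge weights are 0 or 1 by shifting the weights to integers
// (0 -> -a, 1 -> l - a) and handing the result to a 0-mean-partition solver.
std::vector<bool> mean_partition(
    std::vector<WEdge> edges, int n, long long a, long long l,
    const std::function<std::vector<bool>(const std::vector<WEdge>&, int)>& zero_mean_partition) {
  for (WEdge& e : edges)
    e.w = (e.w == 1) ? (l - a) : -a;      // average >= a/l  iff  shifted average >= 0
  return zero_mean_partition(edges, n);   // marks the vertices with value >= a/l
}
```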

Solving multi-player mean-payoff games

We now determine the relevant mean partitions in a binary search manner, as follows.

1. To start with, we find the 1/2-mean partition. For the ≥ part of the game, we then continue with the 3/4-mean partition, and so forth. After s = ⌈log₂ n⌉ steps, we have narrowed the area down to an interval of length 2^−s, and we know that the value of the vertex lies in this interval.

2. For each denominator, there is at most one numerator in this interval.⁴ Thus, going through all possible denominators, we can then sort the resulting fractions

⁴ Exception: n = 2^s. But then we can simply look for the 1-mean partition and are done.

in a balanced tree. It suffices to take those that are relatively prime. (The value is otherwise already in the balanced tree.)

3. We then try the values in the balanced tree, starting with the root. At most height-of-the-tree many further iterations are needed (O(log n) many).

The number of different vertex values in a 2MPG is usually small, and in particular much smaller than the number of vertices in the game. Consequently, the number of constraint systems is also small for a small number of players.
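The candidate values mentioned in step 2 can be collected as in the following sketch. The names are illustrative and floating-point interval bounds are used for simplicity; for each denominator at most one numerator falls into the interval, and keeping only reduced fractions avoids duplicates:

```cpp
#include <vector>
#include <numeric>     // std::gcd
#include <algorithm>
#include <cmath>

struct Frac { long long num, den; };

// Collect all candidate values a/l with denominator l <= n in the half-open
// interval [lo, hi) obtained from the bisection, sorted in increasing order.
std::vector<Frac> candidates_in_interval(long long n, double lo, double hi) {
  std::vector<Frac> out;
  for (long long l = 1; l <= n; ++l) {
    long long a = static_cast<long long>(std::ceil(lo * l));   // smallest numerator with a/l >= lo
    if (a < hi * l && std::gcd(a, l) == 1)                     // inside [lo, hi) and reduced
      out.push_back({a, l});
  }
  std::sort(out.begin(), out.end(),
            [](const Frac& x, const Frac& y) { return x.num * y.den < y.num * x.den; });
  return out;                                                  // feed into a balanced search tree
}
```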

4.7.3 Linear Programming problem

We used the techniques from Section 4.7.2 to evaluate a number of randomly created three-player MPGs, where the players take turns. We consider three players (player 1, player 2, and the leader) and two different evaluations on the same game graph. We first see how each player fares when trying to maximise her return against a coalition of all other players, including the leader. In the first evaluation, the leader sides with player 1, and together they form a coalition (the minimiser) against player 2 (the maximiser) with respect to the payoffs of player 2. We find the different possible mean values in this evaluation, using the algorithm from above. In the second evaluation, the leader sides with player 2, and together they form a coalition (the minimiser) against player 1 (the maximiser) with respect to the payoffs of player 1. We also note the different possible mean values in this evaluation, using the algorithm from above. The results of these 2MPGs provide the constraints for the linear programming problems: the different values form the different thresholds that we have to consider. We now consider all possible combinations of these values for the followers and determine the vertices that comply with them. We then do the following for each combination:

• recursively remove the nodes that have no successor,

• remove the nodes that are not reachable from the initial state, i.e., determine the set of reachable vertices (these two pruning steps are sketched below),

• for the set of reachable vertices, determine the strongly connected components (SCCs), and

• for each SCC, build and solve the respective linear program.
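The following sketch shows the first two steps (pruning dead ends and restricting to reachable vertices) under an assumed adjacency-list representation; the SCC decomposition and the linear programs are then computed on the surviving vertices:

```cpp
#include <vector>
#include <queue>

// Sketch: `keep` marks the vertices that comply with the chosen threshold
// combination; the result marks the vertices that survive both pruning steps.
std::vector<bool> prune_and_reach(const std::vector<std::vector<int>>& succ,
                                  std::vector<bool> keep, int init) {
  const int n = static_cast<int>(succ.size());
  bool changed = true;
  while (changed) {                          // recursively remove vertices without successors
    changed = false;
    for (int v = 0; v < n; ++v) {
      if (!keep[v]) continue;
      bool has_succ = false;
      for (int w : succ[v]) if (keep[w]) { has_succ = true; break; }
      if (!has_succ) { keep[v] = false; changed = true; }
    }
  }
  std::vector<bool> reach(n, false);         // restrict to vertices reachable from the initial vertex
  if (keep[init]) {
    std::queue<int> q; q.push(init); reach[init] = true;
    while (!q.empty()) {
      int v = q.front(); q.pop();
      for (int w : succ[v])
        if (keep[w] && !reach[w]) { reach[w] = true; q.push(w); }
    }
  }
  return reach;
}
```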

Constraints on SCCs

To construct the linear programs over the SCCs obtained above, we have side constraints on the edge ratios and vertex ratios (to comply with the limit behaviour of nodes and edges), and we have constraints over the rewards of the players. The constraints over the ratios of edges and vertices for leader and incentive equilibria follow from Section 4.4.2. For the constraints over the rewards of the players, we refer to Section 4.5 for the construction of an incentive equilibrium and to Section 4.4.2 for the construction of a leader equilibrium. For readability, we reiterate the constraints here. First, we have constraints on the nodes and edges that form part of the strongly connected component S:

• the ratio of vertices and edges that are not part of S is 0,

• the ratio of vertices and edges that are in S is ≥ 0, and

• the sum of the vertex ratios is 1.

The second part of the constraint system consists of constraints over the rewards:

• for every player p other than the leader who owns some vertex in S, we have a constraint over her reward,
$$\iota_p + \sum_{e\in E} p_e\, r_p(e) \;\ge\; \max_{v\in Q} r_p(v),$$
where p_e is the ratio by which edge e is taken, r_p(e) is the edge weight for player p at edge e, r_p(v) is the mean value of the game for player p in the set S, and ι_p ≥ 0 is the incentive given to player p, and

• the objective of the constraint system is to maximise the leader's reward, i.e., to maximise the reward at the leader's nodes in S. This gives us the objective function: maximise $\sum_{e\in E} p_e\, r_l(e) - \sum_{p\in P} \iota_p$.

For a set Q, we may have a number of SCCs, and there is a constraint system for every such component. In this case, we take the one that maximises the leader's reward. For leader equilibria, there are no incentives in the second part of the constraint system.
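As an illustration, such a linear program for one SCC could be assembled and solved with the GNU Linear Programming Kit as sketched below. Whether the solver [MKP] used by the tool is GLPK is not stated here, so the GLPK calls, the data layout, and all names are assumptions of this sketch rather than the tool's actual code. Only the constraints spelled out above are encoded; the remaining ratio-consistency constraints from Section 4.4.2 would be added as further rows in the same way (the vertex ratio is the sum of its outgoing edge ratios, so the sum of all edge ratios equals 1 here):

```cpp
#include <glpk.h>
#include <vector>

struct SccEdge { int from, to; };

// Variables: one ratio p_e per edge of S (columns 1..ne) and one incentive
// iota_p per follower (columns ne+1..ne+followers).  r[p][e] is the edge
// reward of player p on edge e; r[leader] is the leader's row; threshold[p]
// is max_{v in S} r_p(v) for follower p.  All of this layout is assumed.
double solve_scc_lp(const std::vector<SccEdge>& edges,
                    const std::vector<std::vector<double>>& r,
                    const std::vector<double>& threshold,
                    int leader, int followers) {
  const int ne = static_cast<int>(edges.size());
  glp_prob* lp = glp_create_prob();
  glp_set_obj_dir(lp, GLP_MAX);
  glp_add_cols(lp, ne + followers);
  for (int j = 1; j <= ne + followers; ++j)
    glp_set_col_bnds(lp, j, GLP_LO, 0.0, 0.0);               // p_e >= 0, iota_p >= 0
  for (int e = 0; e < ne; ++e)
    glp_set_obj_coef(lp, e + 1, r[leader][e]);               // + sum_e p_e r_l(e)
  for (int p = 0; p < followers; ++p)
    glp_set_obj_coef(lp, ne + p + 1, -1.0);                  // - sum_p iota_p
  glp_add_rows(lp, 1 + followers);
  glp_set_row_bnds(lp, 1, GLP_FX, 1.0, 1.0);                 // sum of edge ratios = 1
  for (int p = 0; p < followers; ++p)
    glp_set_row_bnds(lp, 2 + p, GLP_LO, threshold[p], 0.0);  // reward constraint per follower
  std::vector<int> ia(1, 0), ja(1, 0);                       // GLPK arrays are 1-based
  std::vector<double> ar(1, 0.0);
  for (int e = 0; e < ne; ++e) { ia.push_back(1); ja.push_back(e + 1); ar.push_back(1.0); }
  for (int p = 0; p < followers; ++p) {
    ia.push_back(2 + p); ja.push_back(ne + p + 1); ar.push_back(1.0);        // iota_p
    for (int e = 0; e < ne; ++e)
      if (r[p][e] != 0.0) {
        ia.push_back(2 + p); ja.push_back(e + 1); ar.push_back(r[p][e]);     // + sum_e p_e r_p(e)
      }
  }
  glp_load_matrix(lp, static_cast<int>(ia.size()) - 1, ia.data(), ja.data(), ar.data());
  glp_simplex(lp, nullptr);
  double reward = glp_get_obj_val(lp);                       // leader reward for this SCC
  glp_delete_prob(lp);
  return reward;
}
```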

4.7.4 Experimental Results

We implemented our tool [mpg] in C++ and carried out experiments on a Linux machine. Our tool is available for download and requires a C++11 standard compiler for compilation. It calls the linear programming solver [MKP] for solving the underlying linear programs. Although tools are available for solving two-player mean-payoff games, we implemented our own solver for the following reasons. Firstly, we implemented the powerful strategy improvement algorithm [Sch08], whereas the other known solvers ([CPR14], [CHJS11]) implement the original approach of Zwick and Paterson [ZP96]. Secondly, the update complexity of the strategy improvement algorithm is low compared to the standard algorithm [ZP96]. Also, the strategy improvement algorithm allows us to take weights in binary, which is useful for our purpose. Our experiments indicate that our implementation of the algorithm can solve examples with 100 nodes and 10 players within 30 minutes. The algorithm is, of course, much faster for games with two or three players. Our current implementation accepts mean-payoff games with three players where the weights are given in binary (0 or 1). The algorithm we implemented solves two-player mean-payoff games (2MPGs). The test cases we currently use stem from parity games, and we had to use some method to convert a parity game into a bipartite mean-payoff game. Also, we have payoffs on vertices and not on edges. We assign a weight of (−n)^c to a vertex in the mean-payoff game if it has parity c in a parity game with n vertices. We take a parity game as input and construct a three-player mean-payoff game where, for every vertex, a coin is tossed to decide the owner of the vertex, the reward for every player, and the initial vertex. We have generated several sets of examples from parity games generated by the PGSolver suite [PGS], and obtained results for leader equilibria as well as incentive equilibria for mean-payoff games. Our tool [mpg] currently solves games of 40-50 nodes fairly quickly. However, for a large number of these examples, the answer for both equilibria is the same. Figures 4.10 and 4.11 show the experimental results for the following two problem classes.

• Figure 4.10 shows results for a generalisation of the example from Figure 4.7 to multiple players, with n nodes in the inner cycle and n − 1 nodes in the outer cycles, where n is the number of players. Recall the example from Figure 4.7. We generalise this example to a token-ring graph parameterised by two variables, n and d. It has n nodes on the inner cycle, each of which corresponds to one of the n different players, and each of these n nodes is also present on another cycle of length d. The weights are set such that all players except the leader get 1/n if they choose the inner ring and 1/d if they choose their respective outer ring. The leader's reward is 1 in the inner ring and 1/d in all the other rings. The data supports the pen-and-paper analysis that incentives are useful iff n > d > n(n − 1)/(2n − 1) holds. Figure 4.10 shows the leader reward for this example and the running time of our

Figure 4.10: Token-ring example.

Figure 4.11: Results for randomly generated MMPGs.

tool to compute it. The y-axis shows the time taken (in seconds) to compute an incentive equilibrium, and the x-axis shows the total number of players in the game. The data in the figure shows the leader's reward in an incentive equilibrium for the given number of players.

• Figure 4.11 draws a comparison between leader equilibria and incentive equilibria with respect to the leader's reward. The plots show the leader's reward for the same game graph under the two equilibria (leader and incentive equilibrium). The left plot shows the difference between incentive equilibrium and leader equilibrium for randomly generated three-player mean-payoff games. The number of nodes is varied from 10 to 15; for each number, 10 cases have been generated, and a total of 150 cases have been generated. The right plot in the figure shows similar results on random graphs, with the number of players increasing from 3 to 10. The number of nodes is fixed at around 20, and a total of 150 cases have been generated.

The evaluation results confirm that the leader's reward increases significantly in incentive equilibria when compared to leader equilibria.

4.8 Discussion

The two main contributions of this chapter are the introduction of leader and incentive equilibria in multi-player mean-payoff games and the concept of well behaved reward and punish strategy profiles as a technical foundation for them. Well behaved reward and punish strategy profiles are general instruments for optimising the payoff of one player, while projecting away problems like the potential non-existence of limit average values. It is our belief that they will be useful in many related optimisation problems. The introduction of leader equilibria is a conceptual change to Nash equilibria, where an interested party overcomes the antinomy of Nash equilibria exemplified in Figure 4.8: the interested party (which we christened the leader) might improve her payoff by choosing a strategy which is not stable for herself in the

Nash sense of not being able to improve the payoff by unilaterally deviating from her strategy. The concept extends the set of rational control objectives. It allows, for example, for mixing environments that are rationally following their own objectives with a hostile environment. For such environments, it provides worst case rational results, where rational refers to the way the rational players behave: we assume that they follow a strategy that they do not have an incentive to deviate from. We extend this setting by showing that the rational leader can further improve over her outcome by paying small incentives to her followers. At first, this may not seem to be a rational move of the leader, but closer inspection shows how a leader might improve her reward in this way. Incentive equilibria are therefore seen as an extension of leader equilibria, where a rational leader, by giving an incentive to every other player in the game, can derive an optimal strategy profile. The solutions one obtains can be used to create stable rules that optimise various outcomes, including social optima as well as egoistic solutions. We believe that these techniques are helpful for the leader when maximising the return for a single player, and that they would also be instrumental in defining stable rules and optimising various outcomes. We implement these techniques in a tool, and our evaluation results from Section 4.7 show that the results are significantly better for the leader in an incentive equilibrium as compared to her return in a leader equilibrium.

Chapter 5

Discounted sum games

This chapter is mainly based on the results from [GSW15]. It extends the use of leader equilibria to multi-player discounted sum games. In this chapter, we establish the existence of optimal bounded memory leader strategy profiles in multi-player discounted sum games. Note that we have given a general introduction to discounted sum games in Chapter 1 (cf. Section 1.2). This chapter is organised as follows. We give the main contributions of this chapter in Section 5.2.2. In Section 5.3, we give a formal introduction to multi-player discounted sum games and give the notations used throughout this chapter. In Section 5.4, we show that leader equilibria are superior to Nash equilibria in discounted sum games. We also establish the usefulness of memory by giving relevant examples. We show that more memory helps and that there are cases where the leader might benefit from having infinite memory. We give the construction of optimal leader strategy profiles with a discussion of reward and punish strategies in Section 5.5. We establish the NP-completeness of the related decision problem. We discuss why pure strategies are a natural choice to study reward and punish strategies in Section 5.7 and give conclusions in Section 5.8.

5.1 Abstract

In this chapter, we establish the existence of optimal bounded memory strategy profiles in multi-player discounted sum games. We introduce a non-deterministic approach to compute optimal strategy profiles with bounded memory. Our approach can be used to obtain optimal rewards in a setting where a powerful player selects the strategies of all players, for Nash and leader equilibria, where in leader equilibria the Nash condition is waived for the strategy of this powerful player. The resulting strategy profiles are optimal for this player among all strategy profiles that respect the given memory bound, and the related decision problem is NP-complete. We also provide simple examples, which show that having more memory can improve the optimal strategy profile, and that the amount of memory sufficient to obtain optimal strategy profiles cannot be inferred from the structure of the game.


[Figure 5.1 shows a cycle over three vertices 1, 2 and 3, each selected as the start vertex with probability 1/3; the edge rewards include the vectors (2, 1, 9), (1, 9, 2) and (9, 2, 1), and the remaining edges have reward 0.]

Figure 5.1: A discounted sum game with no memoryless Nash or leader equilibrium.

5.2 Introduction

Discounted sum games [Sha53, Sen94] are stochastic games with quantitative objectives that were introduced by Shapley [Sha53]. They are played on a finite directed graph without sinks, where each vertex is owned by one of the players. Intuitively, they are played by placing a token on the graph, which is moved forward by the players. We consider an initial probability distribution over all vertices to select the start vertex. As an example, we refer to Figure 5.1, where vertex 1, vertex 2 or vertex 3 each can be taken as a start (or: initial) vertex with probability 1/3. Initially, the token is placed on a start vertex. Whenever the token is on a vertex, the player who owns this vertex selects an outgoing edge and moves the token along this edge. This way, the players construct an infinite play.

Quantitative games [BDS13] are good models for studying non-terminating programs with multiple components that interact in a non-cooperative mode. In quantitative games, players have goals defined by the payoffs on the edges (sometimes on the vertices). For these payoffs, the players have quantitative targets, such as maximising their individual limit average or the discounted sum of their individual rewards, where the value of a play is computed under a discount factor. Solutions to these games are strategy profiles that consist of strategies – recipes for how to play – for each player. However, in a realistic situation, these solutions need to be implementable, and thus players have to cope with limited resources such as limited memory.

Strategy profiles should also satisfy some basic consistency constraints. The reason for this is that the players are assumed to be rational. The lowest level of rationality for a player is to take a look at her strategy in the profile and to check if she would gain by changing her own strategy. Strategy profiles where all strategies pass this test are stable in the sense of Nash equilibria [Leh90, Nas50, OR94]. Thus, in a Nash equilibrium, no player benefits from changing her strategy unilaterally.

The second eminent class of equilibria goes back to von Stackelberg and is referred to as Stackelberg equilibria or leader equilibria [vS34]. In economic game theory, leader equilibria refer to a setting where a powerful player can move first, or announce her move first, rather than moving at the same time as the remaining players. This 'right of the first move' provides her with an advantage over the other players. Broadly speaking, the Nash requirement of having no incentive to deviate only affects the remaining players, but not the leader herself. The leader can assign the strategies to all players, including herself. While we still require the strategy profile to be stable in that the other players do not have an incentive to deviate, the leader herself may be in a position to improve over her current strategy by deviating unilaterally. Thus, every Nash equilibrium is a leader strategy profile, but not every leader strategy profile is Nash. We call strategy profiles that are optimal for the leader leader equilibria (LE). The more relaxed condition of a leader strategy profile implies that leader equilibria can be selected from a larger base (cf. Figure 5.3). The leader's payoff can therefore improve as compared to Nash equilibria. In this chapter, we study leader equilibria and Nash equilibria for the leader in discounted sum games (DSGs) that use bounded memory.

5.2.1 Related Work

The theory of stochastic games was introduced by Shapley in [Sha53]. He showed that every two-player discounted zero-sum game has a value and that optimal positional strategies exist for both players. This idea was further extended in [Fin64] to establish the existence of stationary equilibria in stochastic multi-player games. Bewley and Kohlberg [BK78] have shown that, in two-player zero-sum undiscounted stochastic games where both the action sets and the state spaces are finite, stationary optimal strategies exist for both players. Gimbert and Zielonka [GZ04] have studied infinite two-player antagonistic games with more general reward functions. They have given sufficient conditions that ensure that both players have positional (memoryless) optimal strategies. Letchford et al. [LMC+12] have considered computing optimal Stackelberg strategies in stochastic games. They studied this in the context of correlated equilibria and discuss the value of correlation and commitment in stochastic games. The importance of discounting has been discussed in detail in [dAHM03]. The authors studied it in system theory and established that discounting has a natural place in non-terminating systems, in probabilistic systems, as well as in multi-component systems. They established discounted versions of non-terminating system properties that correspond to ω-regular properties. Discounting the payoff values is an important criterion in settings where the near future is considered more important than the far-away future. Besides, it is considered an important factor in Markov decision processes, in economic applications, and in game theory.

The complexity of solving discounted sum games is the same as that of mean-payoff games: the problem lies in NP ∩ co-NP [EM79] and in UP ∩ co-UP [ZP96]. Computing an optimal value in these games can be done in pseudo-polynomial [ZP96, BCD+11], smoothed polynomial [BEF+11], PPAD [EY10], and randomised subexponential [BV07] time. Berg and Kitti [BK13] have studied subgame perfect pure strategy equilibria in discounted sum games. They analyse subgame perfect equilibria in games with perfect information. Brihaye et al. [BDS13] have studied the existence of simple Nash equilibria in non-terminating games with various mixed reward functions. The strategies used in this chapter are inspired by the strategies introduced in [Fri71]. Gupta and Schewe [GS14] have studied optimal leader strategy profiles in the context of multi-player mean-payoff games.

5.2.2 Contributions

The main contributions of this chapter are as follows. We show in this chapter that, in multi-player discounted sum games, the leader can benefit from more memory (Lemma 5.3), and that there are actually cases where infinite memory is needed for leader equilibria (Theorem 5.4 and Theorem 5.5) and Nash equilibria (Theorem 5.6). We do not hold strategies that require infinite memory to be realistic, and therefore discuss the construction of strategies that use only bounded memory.

We first show that memoryless leader equilibria do not always exist, a simple corollary from the existence of games without memoryless subgame perfect equilibrium [KFSV09]. The example from Figure 5.1, inspired by [KFSV09], has no memoryless Nash equilibria. It shows a discounted sum game with discount factor λ = 1/2 and with no memoryless Nash (nor leader) equilibrium. There are three players and the start vertex is picked uniformly at random out of the vertices 1, 2 and 3. These vertices are controlled by players 1, 2 and 3, respectively. Each edge is labelled with a reward vector (r_1, r_2, r_3), where r_i is the reward player i gets for traversing that edge. Therefore, when the leader is not among the three players who own the three central vertices, there is no memoryless leader equilibrium for this game. There even exists a game with a fixed starting position where no pure Nash equilibria exist [GO14].

This problem, however, seems artificial when reviewing traditional classes of Nash equilibria. They often use the traditional form of 'reward and punish' strategy profiles [Fri71, BDS13, GS14]. Strategy profiles define a play, the play that ensues when all players follow the strategies assigned to them. Reward and punish strategy profiles broadly consist of this play, and an agreement that the first player who deviates is punished: all other players collude henceforth, following the new goal to harm the deviator. Upon deviation, reward and punish strategy profiles therefore turn into two-player games, and thus enjoy the usual memoryless determinacy. The memory needed for this is tiny: one only needs to store who has deviated. We therefore argue that the resource bounds should refer to the construction of the main play, i.e., the main path before deviation. We give a simple non-deterministic polynomial time approach for assigning reward and punish strategies that meet or exceed a given payoff bound for the leader and use memory only within a given bound. In Section 5.6, we show that the decision problem whether there exists a pure strategy with bounded memory that gives a reward greater than or equal to some threshold value is NP-complete.

5.3 Preliminaries

A multi-player discounted sum game (MDSG) is a game played on a finite directed weighted graph G, defined as a tuple ⟨P, V, {V_p | p ∈ P}, ∆, A, T, {t_p : V × A → Q | p ∈ P}⟩, where P is a finite set of players, V is a finite set of vertices, ∆ : V → [0, 1] is a probability distribution over V, which for each v ∈ V specifies the probability of selecting v as the start vertex, {V_p | p ∈ P} is a partition of the vertices V into the sets V_p of vertices owned by player p, A is a finite set of actions, T : V × A → V is a transition function that maps vertices and actions to vertices, and {t_p | p ∈ P} is a family of reward functions t_p : V × A → Q that assigns, for each respective player p, a reward for each action a that is taken from a vertex v (or, likewise, for the transition taken).

The game is played by moving a token along the edges of the graph, starting from the start vertex as given by the probability distribution ∆. We use this initial probability distribution to select a start vertex. Each vertex v belongs to exactly one player p. At vertex v, the player who owns v selects the next action a. The token is then moved forward to the vertex given by the transition T(v, a). This results in an infinite path, called a play. We denote the reward for player p at a transition T(v, a) by t_p(v, a). An MDSG is called a zero-sum game if, for all vertices v ∈ V and for all actions a ∈ A, Σ_{p∈P} t_p(v, a) = 0 holds. The payoff at every transition is discounted by a discount factor λ, where 0 < λ < 1. In DSGs, the payoff (or: reward) for player p at the i-th transition is given by t_p(v_i, a_i) · λ^i. For an infinite play π = v_0, a_0, v_1, ..., we denote the reward for player p by r_p(π) = Σ_{i=0}^{∞} t_p(v_i, a_i) · λ^i.

The way that the respective player p chooses the successor vertex is defined by a strategy σ_p. We consider pure strategies, which are functions σ_p : (V A)* V_p → A from initial sequences of plays to actions. We focus on two types of pure strategies, memoryless and bounded memory strategies. A pure memoryless strategy (or: a positional strategy) is a strategy in which the choice of the next vertex depends only on the current position, whereas a pure bounded memory strategy is a strategy where the choice of the next vertex depends on a finite memory. For a bounded memory M (where M is simply a finite set of fixed size, the memory bound, with a dedicated initial value m_0), we define two functions: the memory update function, and the memory usage function that provides us with the action that is to be selected. The memory usage function is a mapping U : M × V → A that maps a memory state and a vertex to an action. In the classic memory model, the memory update function M : M × (V × A) → M defines how the memory is updated; it maps a memory state, a vertex, and an action to a new memory state. Thus, the memory works as a Moore machine without output, where M is the memory and M is the transition function.

As discussed in the introduction, the example from Figure 5.1 shows that this memory model does not always lead to an equilibrium, at least not for arbitrary M. We therefore define a memory model for reasoning with bounded resources (cf. Corollary 5.11). We refer to this model as compliance memory, as it only refers to the histories where all players have complied with their strategies.

[Figure 5.2 shows a game with three vertices 1, 2 and 3; the edge reward vectors include (1, 1, −2) and (−3, 3, 0), and the remaining edges have reward 0.]

Figure 5.2: A discounted sum game with discount factor 1/2.

This justifies a partial memory update function M : M × (V × A) → M, where M(m; v, a) is defined if, and only if, a = U(m, v). When the action a differs from the action defined by the memory usage function, the system remembers only who caused the deviation, and then switches into a different mode, where it uses a memoryless strategy (cf. Theorem 5.9 and Corollary 5.10). The input alphabet V × A consists of the last vertex and the action selected there; the vertex reached is determined by the transition function T.

A strategy profile σ defines an expected reward, denoted E_p(σ), for each player p. In this chapter, we focus on the reward of positional and bounded memory strategy profiles. For a positional strategy profile σ, the payoff from every vertex is well defined. By abuse of notation, we use E_p(σ, v) = t_p(v, σ(v)) + λ·E_p(σ, T(v, σ(v))) to denote the payoff for player p when starting in a vertex v. Note that this implies E_p(σ) = Σ_{v∈V} ∆(v)·E_p(σ, v). Note that we have formally introduced equilibrium concepts in Chapter 2. However, we now use the expected reward E_p(σ) for each player p. We only define here concepts that have not been introduced before.

In two-player DSGs, the set of vertices in G is partitioned into two sets, where each vertex belongs to exactly one of the players and the player who owns the vertex decides the next move. For an MDSG G = ⟨P, V, {V_p | p ∈ P}, ∆, A, T, {t_p : V × A → Q | p ∈ P}⟩, we define the two-player zero-sum DSG G = ⟨P, V, {V_p, V_o}, ∆, A, T, {t_p, t_o}⟩ played between player p and an opponent o, where the vertices of p and o partition V into two sets (V_o = V ∖ V_p) and their goals are antagonistic (t_o(v, a) ↦ −t_p(v, a)). Note that not all MDSGs with two players are two-player games in this sense (two-player games need to be antagonistic zero-sum games). We denote the expected outcome for player p in a two-player game that starts at a vertex v by r_p(v). A game is called memoryless determined if all players have optimal memoryless strategies. Two-player DSGs are memoryless determined [ZP96]: both players have an optimal positional strategy.

Theorem 5.1. [ZP96] Two-player DSGs are memoryless determined.
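To make the payoff definitions above concrete, the following minimal sketch evaluates E_p(σ, v) for a positional strategy profile by iterating the recursion E_p(σ, v) = t_p(v, σ(v)) + λ·E_p(σ, T(v, σ(v))); the iteration converges because λ < 1. The Python encoding, the toy game, and the helper name evaluate_profile are illustrative assumptions, not part of the thesis.

    def evaluate_profile(vertices, T, reward, sigma, lam, players, iters=2000):
        """Approximate E_p(sigma, v) by value iteration on
        E_p(sigma, v) = t_p(v, sigma(v)) + lam * E_p(sigma, T(v, sigma(v)))."""
        E = {p: {v: 0.0 for v in vertices} for p in players}
        for _ in range(iters):
            E = {p: {v: reward[p][(v, sigma[v])] + lam * E[p][T[(v, sigma[v])]]
                     for v in vertices}
                 for p in players}
        return E

    # A toy two-vertex game (purely illustrative): player 1 moves from u to w,
    # and the leader collects a reward of 2 on the self-loop at w forever.
    vertices = ["u", "w"]
    players = ["p1", "leader"]
    T = {("u", "go"): "w", ("u", "stay"): "u", ("w", "loop"): "w"}
    reward = {"p1":     {("u", "go"): 1, ("u", "stay"): 0, ("w", "loop"): 0},
              "leader": {("u", "go"): 0, ("u", "stay"): 0, ("w", "loop"): 2}}
    sigma = {"u": "go", "w": "loop"}
    print(evaluate_profile(vertices, T, reward, sigma, 0.5, players))
    # E_p1(sigma, u) = 1, E_leader(sigma, u) = 2 (= 2*(0.5 + 0.25 + ...)).

For positional profiles, the same values could also be obtained exactly by solving the induced linear equation system, since every play under such a profile is ultimately periodic.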

5.4 Leader and Nash equilibria

In this section, we show that leader equilibria are superior to Nash equilibria already in simple zero-sum DSGs. For this, consider the three-player game from Figure 5.2. One of the players, player 2, acts as the leader. The game is played on a simple graph with three vertices, named 1, 2, and 3, owned by the respective player with the same name. Note that we denote the vertices owned by the leader (resp. the other players) by square (resp. circle) vertices. We use the same notation throughout this chapter. In all remaining examples, we select an initial vertex with probability 1, and therefore mark the initial vertex with an incoming arrow. The game graph with the payoff vectors of each transition is shown in Figure 5.2, and we use a discount factor of λ = 1/2. The payoff vectors represent the payoff of player 1, the leader, and player 3, in this order.

Initially, player 1 can choose to play to vertex 2 or she can choose to remain in vertex 1. She plays to vertex 2 only if the leader, in her strategy profile, chooses to remain in vertex 2 for a while. At vertex 2, the leader has different options. She can choose to play to vertex 3 (this is the option where she maximises her reward), she can choose to remain in 2 for a while before continuing to vertex 3, or she can stay in vertex 2 forever. It is easy to notice that, when in vertex 2, the leader will immediately continue to vertex 3 in all Nash equilibria. Consequently, player 1 would never play to vertex 2 from vertex 1: staying in vertex 1 forever will yield a payoff of 0, while moving to vertex 2 in round i would, for λ = 1/2, result in a payoff of −3/2^i. Thus, the only play that can result from a Nash equilibrium is the play 1^ω, where the overall reward for all participating players is 0. In a leader equilibrium, however, the leader stays twice in vertex 2 and then progresses to vertex 3. In this case, the leader can assign player 1 the strategy to immediately progress to vertex 2, resulting in the play 1, 2, 2, 2, 3^ω. This provides an overall payoff of 0 for player 1, 1.5 for the leader, and −1.5 for player 3.

Theorem 5.2. Compared to Nash equilibria, leader equilibria may result in higher, but never in smaller, rewards for the leader.

Proof. While the example has proven the 'higher' part, note for the 'not smaller' part that all Nash equilibria are leader strategy profiles, such that a leader equilibrium cannot be inferior to a Nash equilibrium. They can, of course, be equal when a leader equilibrium is Nash. This is, for example, the case when the leader owns no vertex. Thus, leader equilibria give more leeway to the leader for the selection of optimal strategy profiles and draw on a larger base of strategy profiles to choose from, as shown in Figure 5.3.


Figure 5.3: General strategy profiles ⊇ Leader strategy profiles ⊇ Nash strategy profiles.

[Figure 5.4 shows a game with vertices 4, 1, 2 and 3; the depicted edge rewards include (1 − ε, 0), (1, 0) and (0, 50), and the remaining edges have reward 0.]

Figure 5.4: Increasing the memory helps.

Note that the game from Figure 5.2 can be used to argue that having memory helps, and that having more memory helps more. Among the positional strategies of the leader, staying in vertex 2 forever (with an overall payoff of 1 for player 1 and the leader, and −2 for player 3, respectively) is superior to continuing immediately to vertex 3 (because in the latter case player 1 will stay in vertex 1, see above). So, while still superior to the only Nash equilibrium, it is inferior to the strategy described above, which uses a tiny amount of memory. To observe that, in general, more memory helps more, consider the situation where one lets λ grow towards one. It is easy to see that, the closer λ gets to one, the longer the leader would stay in vertex 2 in the leader equilibrium for the respective discount factor. The optimal memory bounded strategy for the leader therefore improves with the memory we allow for.

Lemma 5.3. The optimal reward for the leader in a Nash or leader equilibrium improves with the increase of the available memory.

It now becomes tempting to assume that we could use this observation to identify a situation where an optimal leader strategy profile is reached. That is, given a fixed discount factor, is there a k ∈ N such that an optimal leader strategy profile for memory k is also optimal when unbounded memory is allowed? The answer to this question is negative.

Theorem 5.4. For any fixed discount factor λ, there is no memory bound k such that an optimal leader strategy profile with memory bound k is an optimal leader strategy profile.

Proof. For this, we refer to the example from Figure 5.4, where the leader acts as player 2. Here, we argue that having a finite memory at the vertices is sufficient for a leader equilibrium, but the effect of increasing the memory is different than in our first example. Irrespective of the discount factor, it is apparent that the leader needs to promise sufficiently many, say s, loops in vertex 2 so that Σ_{i=0}^{s−1} λ^i ≥ (1−ε)/(1−λ). Consequently, the number of repetitions grows to infinity, for all λ ∈ ]0, 1[, as ε falls to 0. If the memory is smaller than the minimal such s, then the leader would receive an overall reward of 0, either because she promises to stay for more than the memory bound many steps (and thus for ever) in vertex 2, or by not promising to do so and hence tempting the first player to move to vertex 4. If, on the other hand, the memory size is at least s, then the leader has enough memory to play the optimal pure strategy that moves to vertex 3 after s loops in vertex 2.

Finally, so far, for a fixed discount factor and a fixed game graph with weights, bounded memory was sufficient to guarantee an optimal reward for the leader. We now show that infinite memory is sometimes needed in a leader equilibrium.
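As a brief aside on the proof of Theorem 5.4: the condition Σ_{i=0}^{s−1} λ^i ≥ (1−ε)/(1−λ) is equivalent to λ^s ≤ ε, so the minimal number of promised loops is s = ⌈log ε / log λ⌉. The following sketch merely illustrates this growth; the helper name min_loops and the Python encoding are illustrative assumptions, not part of the thesis.

    import math

    def min_loops(lam: float, eps: float) -> int:
        """Smallest s with sum_{i<s} lam**i >= (1 - eps)/(1 - lam),
        i.e. the smallest s with lam**s <= eps."""
        assert 0 < lam < 1 and 0 < eps < 1
        return math.ceil(math.log(eps) / math.log(lam))

    for eps in (0.1, 0.01, 0.001):
        print(eps, min_loops(0.5, eps))   # 4, 7, 10: s grows without bound as eps -> 0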

Theorem 5.5. An optimal leader strategy profile may require an infinite amount of memory, even for a fixed two-player game with a fixed discount factor.

Proof. We show this for the two-player game with three vertices depicted in Figure 5.5, where the leader is player 2. Vertices 1 and 3 belong to player 1 and vertex 2 belongs to the leader. The rewards are depicted in the order (player 1, player 2) and we set λ = 2/3. Notice that player 1 will move to vertex 3 unless the leader can guarantee him a reward ≥ −1/λ = −3/2 from vertex 2, because only then would his total reward be ≥ 1 + λ·(−1/λ) = 0. On the other hand, in the optimal leader strategy, the leader will try to give him exactly that much, because only then would her payoff be equal to λ·(3/2) = 1. Proposition 1 in [CFW13] shows that the leader can achieve this value with a pure strategy, but only if she has an infinite amount of memory.

[Figure 5.5 shows vertices 3, 1 and 2; the depicted edge rewards include (1, 0) and (−1, 1), and the remaining edges have reward 0.]

Figure 5.5: Leader benefits from infinite memory.

We can show the same for Nash equilibria, but with three players. We also show that the optimal payoff of a player cannot be approximated by considering strategies with bounded memory only.

Theorem 5.6. An optimal Nash equilibrium may require an infinite amount of memory, even for a fixed 3-player game with a fixed discount factor. Moreover, the leader's optimal payoff can be arbitrarily far away from her optimal payoff for bounded memory strategies.

Proof. To show this, we refer to Figure 5.6. We have three players here – player 1, player 2, and the leader. Vertices 1, 2 and 3 are owned by player 1, player 2 and the leader, respectively. Rewards are given on the edges and are shown in the order (player 1, player 2, leader). We set the discount factor to λ = 2/3. Starting from the initial vertex (vertex 1), player 1 can either go to the terminal state that has a reward of 0 for all three players, or move to vertex 2. Similarly, at vertex 2, player 2 can either go to the terminal state or move to the leader vertex. For an optimal strategy profile, the leader has to promise both player 1 and player 2 a reward of at least 3/2 at vertex 3, as otherwise at least one of them would prefer to terminate the game at their respective vertex. On the other hand, no matter what the leader does, their rewards at vertex 3 sum up to 3, because the sum of their payoffs on the edges from vertex 3 is constant and equal to 1. Therefore, the leader has to promise both player 1 and player 2 a reward of exactly 3/2. Proposition 1 in [CFW13] shows that the leader can achieve this value with a pure strategy, but only if she has an infinite amount of memory. The overall rewards of player 1 and player 2 from such a play 1·2·3^ω would be 0. Note that this strategy profile would be a Nash equilibrium where the leader's payoff is 2. Finally, if the leader has only a bounded amount of memory, then one of the other players has to receive less than 0 from a play 1·2·3^ω and would prefer to terminate the game before it reaches vertex 3. This implies that the optimal payoff of the leader for bounded memory strategies is 0, while for general strategies it is 2. The difference between these two can be made arbitrarily large by scaling the payoffs on the edges in this game.

[Figure 5.6 shows vertices 1, 2 and 3, terminal states with self-loops of reward 0, and edge reward vectors including (1, 0, 1), (−1, −1, 1) and (0, 1, 1).]

Figure 5.6: Leader benefits from infinite memory in Nash equilibria.


Figure 5.7: More memory states ⇒ more strategies.

Thus, an optimal strategy profile for a given player can be formed from memoryless strategies, finite memory strategies, or infinite memory strategies. More memory therefore gives the leader more leeway to select an optimal strategy profile (cf. Figure 5.7).

5.5 Reward and punish strategy profiles in discounted sum games

In this section, we show that, for a play π, we can establish whether there exists a leader (or Nash) strategy profile σ with π = π_σ, and, moreover, that its extension to such a strategy profile is simple. For this, we first introduce reward and punish strategy profiles. In reward and punish strategy profiles [GS14], the leader assigns a strategy to each player and each of them co-operates to produce a play π while playing in accordance with the assigned strategies. As soon as one player deviates, the remaining players team up with the leader and co-operate against the deviating player i. That is, they henceforth follow the goal to minimise the payoff of player i, and act jointly as the antagonist of i in the underlying two-player DSG. Thus, in the resultant two-player game, while the objective of player i is still the same, the objective of all other players (including the leader) is changed: it becomes to minimise the payoff of player i. Assuming that optimal positional strategies in this two-player DSG are fixed, π thus defines a reward and punish strategy profile, which we denote by rps(π). We now argue that

1. every leader resp. Nash strategy profile σ can be transformed into a leader resp. Nash strategy profile σ′ with π_σ = π_σ′, and thus with the same rewards for all players, and

2. we can give necessary and sufficient conditions for a play π to be defined by some leader resp. Nash strategy profile.

We first discuss necessary conditions for a path to be the outcome of a Nash (resp. leader) equilibrium, and then show that these conditions are sufficient for a path to be the outcome of a Nash (resp. leader) reward and punish strategy profile.

Lemma 5.7. If π = v_0, a_0, v_1, ... is the outcome of a Nash (resp. leader) equilibrium, then, for all j ∈ N and all players p (resp. all players p ≠ l), r_p(v_j) ≤ Σ_{i=0}^{∞} t_p(v_{j+i}, a_{j+i}) · λ^i holds.

Proof. We assume for contradiction that the condition is violated. We therefore select a j ∈ N and a player p (for leader equilibria a player p ≠ l) such that r_p(v_j) > Σ_{i=0}^{∞} t_p(v_{j+i}, a_{j+i}) · λ^i. We then change the strategy of player p to follow her optimal strategy from the two-player discounted sum game from position j onwards. The resulting play π′ = v′_0, a′_0, v′_1, ..., with v′_i = v_i for all i ≤ j and a′_i = a_i for all i < j, satisfies

r_p(π′) = Σ_{i=0}^{∞} t_p(v′_i, a′_i) · λ^i = Σ_{i=0}^{j−1} t_p(v′_i, a′_i) · λ^i + λ^j · Σ_{i=0}^{∞} t_p(v′_{j+i}, a′_{j+i}) · λ^i
        ≥ Σ_{i=0}^{j−1} t_p(v_i, a_i) · λ^i + λ^j · r_p(v_j)
        > Σ_{i=0}^{j−1} t_p(v_i, a_i) · λ^i + λ^j · Σ_{i=0}^{∞} t_p(v_{j+i}, a_{j+i}) · λ^i
        = Σ_{i=0}^{∞} t_p(v_i, a_i) · λ^i = r_p(π).

Thus, player p improves her payoff by deviating unilaterally, which contradicts the assumption that π is the outcome of a Nash (resp. leader) equilibrium.

Lemma 5.8. If π = v_0, a_0, v_1, ... satisfies r_p(v_j) ≤ Σ_{i=0}^{∞} t_p(v_{j+i}, a_{j+i}) · λ^i for all j ∈ N and all players p (resp. all players p ≠ l), then rps(π) is a Nash (resp. leader) equilibrium.

Proof. We assume for contradiction that a player p (for leader equilibria a player p ≠ l) has an incentive to deviate, and that the first position where player p selects a different action is j ∈ N. Let π′ = v′_0, a′_0, v′_1, ..., where v′_i = v_i for all i ≤ j and a′_i = a_i for all i < j, be the resulting play. We have

r_p(π′) = Σ_{i=0}^{∞} t_p(v′_i, a′_i) · λ^i = Σ_{i=0}^{j−1} t_p(v′_i, a′_i) · λ^i + λ^j · Σ_{i=0}^{∞} t_p(v′_{j+i}, a′_{j+i}) · λ^i
        ≤ Σ_{i=0}^{j−1} t_p(v_i, a_i) · λ^i + λ^j · r_p(v_j)
        ≤ Σ_{i=0}^{j−1} t_p(v_i, a_i) · λ^i + λ^j · Σ_{i=0}^{∞} t_p(v_{j+i}, a_{j+i}) · λ^i
        = Σ_{i=0}^{∞} t_p(v_i, a_i) · λ^i = r_p(π).

The first '≤' is implied by the definition of rps, as the remaining players will play antagonistically to p, such that p cannot obtain a better result than r_p(v_j) when starting from v_j; the second '≤' is the assumption of the lemma. Hence the deviation does not improve the payoff of player p, a contradiction.

Together with the observation that pure Nash equilibria always exist [BDS13] (leader equilibria can likewise be formed by all players playing as if they played their respective two-player discounted sum game), these lemmas provide the following theorem.

Theorem 5.9. Pure Nash and leader strategy profiles always exist in MDSGs, and for finding optimal ones, it suffices to consider reward and punish strategies.

This is particularly interesting when we focus on implementable strategy profiles. A strategy is implementable if it is realisable with finite memory. We are particularly interested in finite memory strategies with a given small bound b on the memory used. Note that, for reward and punish strategy profiles, we do not have to record the reaction upon deviation, as it is implicitly described by the punishment part. Thus, we do not want to reason about this trivial part of the strategy, and therefore do not count the tiny bit of memory required for the punishment part. This part does not need much memory: it suffices to memorise which player is responsible for the deviation and at which vertex.

When we allow for finite memory M, this effectively defines a larger game, on which a memoryless strategy is used. For a game G = ⟨P, V, {V_p | p ∈ P}, ∆, A, T, {t_p : V × A → Q | p ∈ P}⟩ and finite memory M with initial memory m_0 ∈ M, we can simply define G^M = ⟨P, V′, {V′_p | p ∈ P}, ∆′, A, T′, {t′_p : V′ × A → Q | p ∈ P}⟩ with V′ = V × M, V′_p = V_p × M, T′ : V′ × A → V′ the transition function that maps vertices and actions to vertices, ∆′(v, m) = ∆(v) if m = m_0 and ∆′(v, m) = 0 otherwise, and t′_p : ((v, m), a) ↦ t_p(v, a).

Corollary 5.10. Pure memoryless, and, for a given memory bound b, pure bounded memory Nash and leader strategy profiles always exist in MDSGs, and for finding the optimal ones, it suffices to consider reward and punish strategies.

Corollary 5.11. For optimal reward and punish strategy profiles, it suffices to consider the memory needed before deviation, i.e., compliance memory plus k additional memory states for the k followers, rather than considering an arbitrary memory M.
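The product construction G^M above is mechanical; the following minimal sketch spells it out for the classic memory model with a total memory update function (the compliance-memory refinement would make the update partial). The plain dictionary encoding, the helper name product_game, and the representation of the update function as a dict keyed by (m, v, a) are illustrative assumptions, not notation from this thesis.

    def product_game(V, owner, T, rewards, Delta, M, mem_update, m0):
        """Build the extended game G^M: vertices are pairs (v, m), transitions
        update the vertex via T and the memory via mem_update, and the initial
        distribution places all weight on memory state m0."""
        Vm = [(v, m) for v in V for m in M]
        owner_m = {(v, m): owner[v] for (v, m) in Vm}
        actions_at = {v: [a for (u, a) in T if u == v] for v in V}
        T_m = {((v, m), a): (T[(v, a)], mem_update[(m, v, a)])
               for (v, m) in Vm for a in actions_at[v]}
        rewards_m = {p: {((v, m), a): rewards[p][(v, a)] for ((v, m), a) in T_m}
                     for p in rewards}
        Delta_m = {(v, m): (Delta[v] if m == m0 else 0.0) for (v, m) in Vm}
        return Vm, owner_m, T_m, rewards_m, Delta_m

A positional strategy on G^M then corresponds to a bounded memory strategy on G, so the search for optimal bounded memory profiles reduces to the positional case discussed in the next section.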

5.6 Constraints for finite pure reward and punish strategy profiles

We first state that optimal strategies exist for all memory bounds. This is a simple implication of Theorem 5.9 and the finite space of candidate strategy profiles.

Lemma 5.12. For all MDSGs and for all memory bounds, optimal strategy profiles exist among the Nash and leader equilibria.

We infer a necessary and sufficient constraint system for the strategy profiles in Nash and leader equilibria in MDSGs. Theorem 5.9 implies that, whenever a player deviates at some vertex v, the remainder of the game resembles a two-player game that starts at v. The player who owns vertex v therefore has an incentive to deviate if, and only if, her payoff from now onwards would be less than the payoff she receives in this underlying two-player game. This provides us with a first necessary constraint, namely

• at any history h that ends in a vertex v owned by player p ∈ P, E_p(σ, h) ≥ r_p(v).

For positional (or: memoryless) reward and punish strategies σ, the subtrees in all histories h that end in v coincide, such that one can write E_p(σ, v) instead of E_p(σ, h). For pure strategies, we require for every vertex v that


Figure 5.8: C_1, C_2, ..., C_m are the m conjuncts over the n variables, and there are intermediate leader 'L' nodes. A path through a satisfying assignment is shown here.

• for all players p ∈ P, E_p(σ, v) = t_p(v, σ(v)) + λ·E_p(σ, T(v, σ(v))).

The action σ(v) in these constraints refers to the action selected at vertex v by the player who owns v in the strategy profile σ. Once these actions are fixed, we therefore have a simple linear equation system with a unique solution that can easily be solved. To determine whether the resulting system is in equilibrium, we can simply check whether the first set of constraints holds for all players (Nash equilibrium) or for all players but the leader (leader equilibrium). Whether there is a pure strategy profile of a predefined quality can therefore be checked in non-deterministic polynomial time.
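The following minimal sketch illustrates this check for a guessed positional profile: it solves the payoff recursion by iteration (exploiting λ < 1) and then tests the deviation constraints against the two-player punishment values r_p(v). The Python encoding, the helper name is_equilibrium, and the assumption that the punishment values are supplied in two_player_value are illustrative and not part of the thesis.

    def is_equilibrium(vertices, owner, T, reward, sigma, lam, players,
                       two_player_value, leader=None, iters=2000):
        # Solve E_p(sigma, v) = t_p(v, sigma(v)) + lam * E_p(sigma, T(v, sigma(v)))
        # by value iteration; the fixed point is unique since 0 < lam < 1.
        E = {p: {v: 0.0 for v in vertices} for p in players}
        for _ in range(iters):
            E = {p: {v: reward[p][(v, sigma[v])] + lam * E[p][T[(v, sigma[v])]]
                     for v in vertices}
                 for p in players}
        # Deviation constraints: the owner of v must not gain by deviating at v;
        # the constraint is waived for the leader in leader equilibria.
        for v in vertices:
            p = owner[v]
            if p == leader:
                continue
            if E[p][v] < two_player_value[p][v] - 1e-9:
                return False
        return True

The threshold test of the lemma below then amounts to additionally checking Σ_{v∈V} ∆(v)·E_leader(σ, v) ≥ t for the guessed profile.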

Lemma 5.13. We can check in non-deterministic polynomial time whether there is a positional strategy profile that meets or exceeds a given threshold t for the leader reward and is a leader or Nash equilibrium.

For strategy profiles with bounded memory, we can simply use the extended memory game instead. We can also prove NP-hardness of this problem using a standard reduction from 3SAT as in [UW11, Umm08]. Putting these two together, we obtain the following theorem.

Theorem 5.14. Checking whether there is a pure positional or bounded memory strategy profile with a fixed memory bound b that meets or exceeds a given threshold t for the leader and is a leader or Nash equilibrium is NP-complete.

Proof. In order to establish NP-completeness, we reduce the satisfiability of a 3SAT formula ϕ over n atomic propositions with m conjuncts to solving a multi-player discounted sum game with 2n + 1 players and 4m + 5n + 2 vertices that uses only the payoffs −1 and 0. Note that the argument does not depend on the discount factor. We give a standard reduction, which is similar to the reduction for mean-payoff games [GS14], save for the weights. We consider the reduction for the example of the 3SAT formula (p ∨ ¬q ∨ ¬r) ∧ (¬p ∨ q ∨ ¬r) ∧ (¬p ∨ ¬q ∨ ¬r).

The 2n + 1 players consist of 2n players for the 2n literals corresponding to the n variables, and the leader, who intuitively tries to validate the formula. The vertices are labelled by their owner. A transition that goes from a vertex owned by a literal player l to a vertex different from the absorbing state 'abs' has a payoff of −1 for the player ¬l, and of 0 for every other player. The self-loop at 'abs' has a payoff of −1 for the leader, and of 0 for every other player. The remaining transitions have payoffs of 0 for every player.

If ϕ is satisfiable, the leader can use a satisfying assignment to determine a cycle through the game graph that does not pass by two vertices owned by opposing literal players p and ¬p. All players that make a decision on the unfolding infinite path have a reward of 0, which is the optimal reward obtainable in any play, as there are no positive rewards on any edge. In this case, the leader's reward is 0. Let us assume that ϕ is unsatisfiable, and that the play defined by the leader in a leader strategy profile does not end in the absorbing state. Then there is a first literal l on the play whose negation ¬l occurs later. The player who owns l will receive a negative return when complying, and hence deviates by moving to the absorbing state. This way, the player receives a reward of 0. Hence, every play in a leader equilibrium for an unsatisfiable formula must end in the absorbing state, which implies that the leader receives a negative reward.

The example is depicted in Figure 5.8. There are m conjuncts in total, over n propositions, and thus 2n literals. We refer to the leader nodes as 'L', to the leader's starting node as 'Ls', and there is one absorbing state 'abs'. 'Ls' is taken as the start node with probability 1. The two depicted copies of the vertices 'abs' and 'Ls' each refer to one vertex. As inclusion in NP has been shown in Lemma 5.13 for positional strategies, and for bounded memory we can simply use the extended memory game instead, we infer NP-completeness.

5.7 Equilibria with extended observations

We first argue why we have focused on pure strategies only in the previous sections, although mixed strategies are a more general choice. In principle, all arguments from the previous sections also extend to randomised strategies and strategy profiles, such that one might argue for using the randomised model. The reason why we refrained from doing so is that reward and punish strategies rely on the observability of deviation. For pure strategies, a deviation by a player can be observed immediately: s/he simply plays a different action than the action defined by the strategy profile assigned by the leader.

Let us now consider a reward and punish strategy for the simple game depicted in Figure 5.9. In this example, player 1 owns vertices 1, 2, 3 and 4. The leader owns the vertices denoted by l1 and l2. Rewards are given on the edges and they are in the order (leader, player 1). When extending the concepts from the previous sections to mixed strategies, the optimal leader strategy profile would be to ask player 1 to play to vertex l1 with a 10% chance, and to l2 with a 90% chance. When player 1 follows his strategy, the leader pledges to take the edge from l1 to vertex 2. If, however, player 1 deviates at vertex 1, the leader would harm him by taking the edge from l1 to vertex 3. The expected reward for the leader would be 8.9λ, while the expected reward of player 1 would be 0. Player 1 does not benefit from deviation, as, upon deviation, the leader would start to harm him. In particular, she plays to vertex 3 from l1.

The catch in this concept is that, with normal observational power (where the players can only observe vertices and actions), the leader (and the other players in a multi-player game) would only be able to observe which action has been taken, but not why. The leader (and the other players) cannot distinguish whether player 1 has moved to l1 because he conducted a fair experiment with a 10% chance to move to l1, whose outcome was to move there, or because he simply moved there (with probability 1) in deviation from the assigned strategy to improve his payoff. To be able to distinguish compliance from deviation in mixed strategies, we would therefore need a stronger observation model, where the randomised decision (in our example, the decision to play to l1 with a 10% chance) or the random experiment itself can be observed. Under such an extended observation model, deviation can be observed, and we briefly discuss why the results from the previous sections extend to mixed strategies when we assume this observational power.

Also, these temporal dependencies are not common in the definition of Nash equilibria. This is unsurprising given their origin in normal-form games [Nas50], where only a single move is played and the concept of history and temporal order of cause and effect does not apply. For us, the concept of observability of deviation by a player outweighs the generality of randomised strategies.

[Figure 5.9 shows vertices 1, 2, 3 and 4 and the leader vertices l1 and l2; the edge from l1 to vertex 2 has reward (−1, 9), the edge from l2 to vertex 4 has reward (10, −1), and the remaining edges have reward 0.]

Figure 5.9: Unobservability of deviation in mixed strategies, with discount factor λ.

The above argument, driven by the unobservability of deviation in mixed strategies, made us focus only on pure strategies. However, an alternative to this restriction is to lift the restriction on our observational power: instead of observing the outcome of a decision, we observe the decision itself. Note that this implies an uncountable set of possible actions, as encoded in the different selectable probability distributions over actions that are possible at every vertex. To justify making this observable, one might think of externalising how to resolve the probabilities, say, by a highly trusted third party.

Also, note that allowing for mixed strategies does not remove the usefulness of memory. In the example from Figure 5.10 (player 1 owns vertex 1, the leader owns vertex l, and rewards are given in the order leader, player 1), when in vertex 1, the leader can only assign an equilibrium strategy to player 1 that is not worse for player 1 than staying in vertex 1. Initially (that is, on the empty history), however, she does not have to take the interest of player 1 into account and can progress to vertex 1 with probability 1.

[Figure 5.10 shows the two vertices l and 1; the depicted edge rewards include (10, 10), (50, 2) and (10, 0).]

Figure 5.10: Leader benefits from memory in mixed strategies.

With this motivation in mind, we define mixed strategies, which are functions σ_p : (V A)* V_p → dist(A) from initial sequences of plays that end in some vertex of player p to distributions over the actions in A. This implies re-writing the expected reward for player p as follows. We use E_p(σ, v) = Σ_{a∈A} σ(v)(a) · (t_p(v, a) + λ·E_p(σ, T(v, a))) to denote the payoff for player p when starting at vertex v. We again have E_p(σ) = Σ_{v∈V} ∆(v)·E_p(σ, v).

Corollary 5.10 establishes that it suffices to focus only on reward and punish strategy profiles. This implies a simple constraint system for extended memory games: no player (except for the leader in leader equilibria) may reach a position where she would benefit from changing her strategy in a reward and punish strategy profile (that is assigned by the leader). Thus, at every vertex v of the extended memory game (with memory m), it must hold that E_p(v, m) ≥ E_{p,2}(v). Here, E_p(v, m) is the expected reward for player p at vertex v in the extended memory game and E_{p,2}(v) is the expected reward for player p at vertex v in the two-player game that would result if player p chose to deviate at vertex v. We can again use a non-deterministic approach to solve the related decision problem.

We can start by guessing a probability distribution at each vertex v_i on all its outgoing actions, and guess, for each action, a target memory value. Once these distributions are fixed, we can again solve the resulting linear equation system, and simply check that it satisfies the constraints from above and meets the required threshold value. Unlike the pure case, where the existence of an optimal solution is implied by the existence of a finite set of possible strategies, we have to provide an argument for the existence of an optimal strategy profile with a given memory bound in this setting. According to the constraint system from above, the leader assigns probabilities to the actions and selects the memory updates. If the resulting system complies with the first set of constraints, then it is a Nash (resp. leader) equilibrium. Technically, the converse (only if) does not hold, as these constraints only need to be satisfied by the reachable vertices. We could, however, require the same for unreachable vertices without excluding relevant solutions.

Theorem 5.15. For multi-player DSGs with perfect observation and predefined memory an optimal leader strategy profile exists.

Proof. First, we know that a strategy profile that satisfies the constraints exists (cf. Section 5.7). Further, to see that an optimal strategy profile exists, we look at the reward obtained at the different probabilistic transitions. That is, we consider the reward obtained for the different probabilities assigned to the different transitions. We regard the payoff vector as a direct function of the probabilities assigned to the transitions, and identify the strategy profiles with the set D (for decisions) of probability vectors over actions, a finite dimensional closed subset of [0, 1]^n for some n ∈ N. Each such vector of probability distributions over the possible actions gives the expected payoff for all players at all positions of the extended memory game graph (the game graph with memory of the pre-defined size m), which consists of the memory copies of all vertices. The resulting payoff for all players at all vertices of the extended game graph is thus again an element of a finite dimensional product of closed and bounded intervals, referred to as P (for payoffs). The intervals are bounded because, if p denotes the maximal absolute value of any of the individual payoffs in the discounted sum game, then every payoff must be in the interval [−p/(1−λ), p/(1−λ)].

Given a strategy profile, represented by a vector d ∈ D, we can compute the payoffs, represented by a vector in P. We represent this by a valuation function val : D → P that maps each probability vector to a payoff vector. The valuation function is continuous: if the decision vector changes only marginally, then the payoff vector changes only marginally, too. Thus, if we fix an ε > 0, then we can first choose a natural number l such that Σ_{i=l}^{∞} λ^i · p < ε, and then choose a δ ∈ ]0, 1[ such that 1 − (1 − δ)^l < ε/(p·l). Then, if the absolute sum of changes over all probabilities is below δ, we can estimate the difference by Σ_{i=0}^{∞} 2·λ^i·p·(1 − (1 − δ)^i). For the estimation of this difference, assume that we start with the probability vector d_m, which is the point-wise minimum of d and d′. Then the difference can be estimated by choosing the joint actions with the probability described in d_m, and simply marking the positions with the missing probability (the difference between the sum of the probabilities reflected in d_m and 1 at every position in the extended game) as deviation. This difference is bounded by δ. The likelihood of being in a state where no difference has occurred so far is, after i rounds, ≥ (1 − δ)^i. The likelihood that a difference has occurred so far can therefore be estimated by 1 − (1 − δ)^i. Using this estimation, we can bound the difference by

Σ_{i=0}^{l−1} 2·λ^i·p·(1 − (1 − δ)^i) + Σ_{i=l}^{∞} 2·λ^i·p·(1 − (1 − δ)^i) < Σ_{i=0}^{l−1} 2·p·(1 − (1 − δ)^l) + 2ε < 4ε,

where the first inequality uses the definition of l, λ^i ≤ 1, and 1 − (1 − δ)^i < 1 − (1 − δ)^l, while the second estimation uses the definition of δ. Thus, for all ε > 0 there is a δ > 0 such that ‖d − d′‖ < δ implies ‖val(d) − val(d′)‖ < 4ε.

The subset C ⊆ P of the set of payoffs that comply with the constraint system is obviously still closed, as it is still a product of finitely many closed intervals. (Only the lower bounds of these intervals may have changed.) As val is continuous, the preimage D′ of the closed and bounded set C is closed and bounded. When val is restricted to D′, the maximum w.r.t. the value of the leader in the initial state exists. That is, the supremum is attained for some value.

5.8 Discussion

We have established the importance of memory in forming an optimal strategy profile with maximal reward for the leader. We have discussed a memory model for reasoning with bounded resources and have shown that it suffices to consider the memory needed before deviation, i.e., memory for the main path only. We have further shown that we need only k additional memory states for k followers. We have given a non-deterministic approach for the construction of optimal strategy profiles (which are Nash or leader equilibria and meet a given threshold for the overall payoff of the leader) and an NP-hardness proof. Possible future work could be to implement our non-deterministic approach for solving these games using SMT solvers like Yices [dMDS07, yic] and to see how well they perform on small examples.

We have further observed that the techniques from incentive equilibria in multi-player non-terminating games (as discussed in Chapter 4) can also be applied to discounted sum games. The gain that the leader would receive from incentive equilibria is natural in these games as well. Incentives can help to improve the leader's return in discounted sum games just as they do in mean-payoff games. For a strategy profile, the leader might like to wait initially (paying an incentive of 0) until the payoff is discounted enough to allow her to incentivise with maximal power: if we denote by ‖A‖ the maximal absolute difference between any two payoff values of G, then the leader can wait until she can incentivise with the value ‖A‖.

We now refer to the example from Figure 5.11. It shows a discounted sum game with two players – player 1 (or: the follower) and the leader. We assume the discount factor to be 1/2. Vertex 1 and vertex 2 are owned by player 1. Vertex 3 and vertex 4 have only one self-loop each, so it does not matter who owns them. The rewards to the players are given on the edges in the order (player 1, leader). Initially, player 1 moves the token to vertex 2. At vertex 2, player 1 has two choices – either to move the token to vertex 3 or to move it to vertex 4. Note that the leader gets the maximal reward if player 1 moves from vertex 2 to vertex 4. The leader can therefore incentivise player 1 for the strategy profile ⟨1 · 2 · 4^ω⟩.

We note that the leader can distribute the incentive amount to be given to her follower in different possible ways. For example, she can incentivise her follower on every edge with an incentive of 1/4. The follower's reward from this incentive strategy profile would be 0.125, while his reward from a Nash or leader strategy profile is 0. Note that a rational leader would not distribute the incentive this way, as the follower's reward is more than 0 here. Another way to incentivise the follower for the strategy profile ⟨1 · 2 · 4^ω⟩ is as follows. The leader chooses to give an incentive of 0 on the edge (1, 2) and an incentive of 1 on the edge (2, 4), and gives no incentive thereafter. Although the follower's return is 0 here and the leader's return is 0.5, we note that a better solution exists, wherein the leader can incentivise her follower with the maximal bribe possible. To incentivise with the maximal possible reward, the leader would pay nothing on the edges (1, 2) and (2, 4) and also nothing for the first iteration at vertex 4. From the second iteration onwards, however, the leader can incentivise her follower fully by giving an incentive of 2 at every step. The follower's and the leader's return from this incentive strategy profile are 0 and 0.5, respectively.
The order in which incentives are paid therefore matters in discounted sum games.

[Figure 5.11 shows vertices 1, 2, 3 and 4; the edge from vertex 2 to vertex 4 has reward (−1, 1), the self-loop at vertex 4 has reward (0, 1), and the remaining edges have reward 0.]

Figure 5.11: Use of incentives in discounted sum game.

Chapter 6

Summary and conclusions

We summarise this thesis in this chapter and give conclusions.

6.1 Summary

We have studied the construction of optimal strategy profiles in various game settings where we aim at maximising the leader's reward. We have introduced leader equilibria in multi-player mean-payoff games and in multi-player discounted sum games. We have introduced incentive equilibria as an extension of leader equilibria in bi-matrix games and in multi-player mean-payoff games. Our simple examples (cf. Figure 4.6 and Figure 5.2) show that, even in multi-player non-terminating games, no Nash equilibrium can be superior to a leader equilibrium. Thus, the leader's reward from any leader equilibrium can only be greater than or equal to her reward from any Nash equilibrium. The concept of incentive equilibria can be applied to different types of games, including bi-matrix and non-terminating multi-player games. In Chapter 1, we have introduced the general concepts used in game theory, like equilibrium notions and their complexity, and we have given an informal introduction to the various game types considered in this thesis.

We have started with the discussion of incentive and other related equilibria in bi-matrix games in Chapter 3. We have considered various follower models, assuming both a considerate follower and a non-considerate or adversarial follower. The leader behaves ex aequo, i.e., she breaks ties in favour of the follower or against the follower in the respective model. We have made some notable observations regarding the impact of choosing a hostile over a friendly follower model. If the follower chooses to behave non-friendly, then it is his loss only and not the leader's. That is, in a non-friendly follower model, the cost that the leader has to pay is either arbitrarily small or nothing, whereas the follower either misses out on an arbitrarily small increase of his incentive or suffers seriously. If secure incentive equilibria do not exist, then the leader only needs to increase the incentive amount by some ε > 0 in order to make the incentive equilibrium secure for her. If secure incentive equilibria do exist, then there is no change in the leader's payoff at all; the follower, however, suffers severely from his own non-friendly choice.

The worst case for the follower is when secure incentive equilibria exist that are not friendly. The leader would then break ties against the follower, potentially making him suffer (cf. the example from Table 3.15). We have shown that the construction of incentive equilibria, friendly incentive equilibria, leader equilibria, and friendly leader equilibria is tractable. We have discussed a tractable technique to determine whether a secure incentive equilibrium exists (and to compute its value when it does). Computing ε-incentive equilibria for the general case, when secure incentive equilibria do not exist, is also tractable.

We have implemented an algorithm to compute friendly incentive equilibria and friendly leader equilibria. The algorithm computes the leader's reward in friendly incentive and friendly leader equilibria. Our results are based on different sets of benchmarks, considering both integer payoffs and continuous payoffs. The results have confirmed that incentive equilibria are superior to leader equilibria, which in turn are superior to Nash equilibria. This observation holds in all cases w.r.t. the leader's reward. For the follower's reward, however, the observation holds only on average, as there exist examples where the follower loses in incentive equilibria as compared to leader or Nash equilibria.

We moved on to multi-player non-terminating games and established the existence of leader equilibria and incentive equilibria in multi-player mean-payoff games in Chapter 4. Unsurprisingly, the complexity of finding leader equilibria equals that of finding incentive equilibria in these games: both problems are NP-complete. We have given a non-deterministic approach to compute the optimal strategy profiles. That is, to form an optimal leader resp. incentive strategy profile, we guess the set of reachable vertices and the strongly connected sets of vertices, together with the constraint system defined by them. We then solve the linear programmes formed in this way with the objective of maximising the leader's reward. The constraint system that gives the maximal reward to the leader forms an optimal solution. Note that the linear programmes are polynomial in the size of the game and thus have polynomial time solutions. The solution is therefore verifiable in polynomial time. We have discussed the parametrised complexity for the cases where the number of players is bounded. We show for these cases that the problem lies in the same complexity class as solving two-player mean-payoff games, by giving a polynomial reduction to solving two-player games. For a game graph with n vertices and k players, there are at most n^k many thresholds to consider. These different thresholds refer to the different values from the underlying two-player games that may start at any vertex. We therefore needed an oracle to solve two-player games. We have used an efficient algorithm from [Sch08] to determine the mean values at the vertices and thus to solve the two-player mean-payoff games. The algorithm is used first for the quantitative analysis of mean-payoff games and then to solve them. Our results (cf. Figures 4.10 and 4.11) have confirmed that the leader's reward is greater in incentive equilibria when compared to leader equilibria.

We then established the existence of optimal bounded memory leader strategy profiles in discounted sum games in Chapter 5. We have observed that positional leader equilibria exist, and we also infer the existence of optimal positional and pure memory bounded strategies. We have argued that (and why) the detectability of deviation makes the restriction to pure strategies a natural choice. We show that the related decision problem (is there a Nash resp. leader equilibrium that provides a payoff that meets or exceeds a given threshold?) is NP-complete. We have also discussed the extension to mixed strategies and the extension of the observation model that is needed to make such strategies reasonable. Further, we have established the usefulness of memory in forming optimal leader strategy profiles. Unsurprisingly, more memory always helps. Our simple example from Figure 5.4 shows that there is no upper bound that can be inferred from the structure of the game such that one could know that memory of this size suffices whenever unbounded memory does. Moreover, we have observed that in some cases infinite memory is needed (cf. Theorem 5.4, Theorem 5.5 and Theorem 5.6).

6.2 Conclusions and Future work

In this thesis, we have introduced incentive equilibria, which form a natural extension of Stackelberg or leader equilibria. In a leader equilibrium, the leader is able to commit to a strategy profile beforehand, and her follower only needs to observe the leader's actions and then move accordingly. The model therefore rests on the basic assumptions that the leader can communicate her strategy effectively to the follower and that there is mutual trust between the players. While the communication is always from the leader's side, note that trust is usually required from both sides. That is, not only the leader but also the follower should be trustworthy: the follower needs to trust that the leader will commit to a strategy profile, and the follower should in turn be trustworthy in that he follows the strategy profile assigned by the leader. The leader who can communicate her strategy is further capable of offering more to her follower: she can also announce an incentive to be given to her follower for an assigned strategy profile. The follower who can trust the leader on a strategy profile can also trust the leader for the incentives that are announced. That is, it is not the incentives that raise the question of trust, but the commitment to a strategy profile. Giving incentives to follow a strategy profile creates a binding commitment between the players. The leader offers a better return to her follower and in return only expects her follower to be friendly and trustworthy. The leader is obliged to make follower-friendly choices for a friendly follower. Thus, among all strategy profiles where the leader gets an equal return, the leader selects one that also gives maximal reward to the follower. On the other hand, if the follower is not trustworthy and acts as an adversary, then the prime concern for the leader becomes to secure her payoff. The leader is no longer able to make follower-friendly choices and would assign a strategy profile that is secure for herself. The amount that the leader has to pay to assign a secure equilibrium is negligible, and sometimes nothing. However, it might affect the payoff of the hostile follower adversely: the leader would now break ties that are not in the favour of the follower. It is therefore an interesting observation that it is really the follower who benefits significantly from being friendly towards the leader. Although bi-matrix games form a natural model to study these equilibria, note that the use of leader equilibria and incentive equilibria is not limited to these games. These concepts apply equally to multi-player games of infinite duration. We show in the case of multi-player mean-payoff games how incentivising the followers results in a strategy profile that is beneficial for the leader and is also more stable. The leader here needs to incentivise every non-deviating follower who complies with the assigned strategy profile. This also accounts for higher stability in that the resulting strategy profile is subgame perfect relative to the leader; the leader herself is allowed to benefit from deviation, so the subgame perfect criterion does not apply to her. Thus, the leader might improve her payoff by giving small incentives to her followers while at the same time obtaining a more stable strategy profile. Incentive equilibria are therefore a generalisation of leader equilibria, where we aim at maximising the payoff of one player.
A rational leader might use these techniques in order to assign an optimal and stable strategy profile with maximal reward for herself. They allow one to mix a rational environment – several rational components following their own objectives – with a rational controller. The results are also rational in that, at the least, the followers are assigned a strategy profile from which they do not benefit by deviating. The techniques that we have proposed and discussed in this thesis are hence instrumental in maximising the payoff of a central player. They also ward off possible side effects, such as harming the utility of other players or the stability of a strategy profile. The return to every other player in an incentive or leader equilibrium is at least as good as their return from a Nash equilibrium. For multi-player games, perfectly incentivised strategy profiles are also subgame perfect and therefore more stable. Security also comes very cheap here as compared to secure Nash or leader equilibria, because in incentive equilibria the leader can simply increase the incentive by a marginal amount in order to obtain a secure strategy profile. We have therefore established the use of incentives in varied game settings. We have, however, considered only pure strategies in non-terminating multi-player games, as we rely on the use of reward and punish strategies. Although mixed strategies are more general, it becomes difficult to distinguish whether a follower is deviating or complying with the assigned strategy. We therefore need a richer observation model to make this observable, and for this one might need to consider how to resolve the probabilities (for example, by externalising this to a trusted party). Our example from Figure 5.9 shows this unobservability of deviation in mixed strategies: it is not immediately clear to the leader whether the follower is actually following the assigned strategy profile or deviating from it. Nevertheless, our observation that the leader benefits from memory in an optimal leader strategy profile holds for mixed strategies as well (cf. example from Figure 5.10). One would need a more precise observation model, as discussed briefly in Chapter 5. There is much scope for future research in which we can consider mild variants of the proposed game settings. For example, we could introduce formal contracting in the game, where the leader formally enters into a contract with other players and asks them to follow it. While incentivising a strategy profile is straightforward, the introduction of formal contracts would bring other aspects to consider. We may explore the side conditions associated with formal contracting and how the players behave in such cases. In the proposed model, incentivising is only unidirectional, so it would be interesting to see what happens if more than one player is in a position to offer side payments and how such players achieve an equilibrium. The game model could be studied in stages where players can offer side payments to each other and their payoff matrices are altered over time. Other interesting aspects that invite future research concern the complexity of our techniques and algorithms. Is there a way to improve upon the complexity of the techniques we study? Can our techniques do better? Is there a more efficient way in which the leader can assign strategies?
It would also be interesting to see how the leader derives benefit from optimal strategy profiles in qualitative game settings.

Appendix A

Appendix

A.1 Leader equilibria

In this appendix, we give in detail the constraint system needed to compute leader equilibria and friendly leader equilibria. The existence of leader equilibria is well known in the literature, and their computation is also known: Conitzer and Sandholm [CS06] have shown how to compute optimal strategies to commit to in Stackelberg models. We adjust the constraint system given here to the one given in Section 3.8; the adjusted constraint system is used to find a secure leader equilibrium, in case it exists. Similar to incentive equilibria, leader equilibria are also defined only among the simple leader strategy profiles. We define them by a set of linear inequalities.

Theorem A.1. For a bi-matrix game $G(A, B)$, $\langle\sigma, j\rangle$ is a simple leader strategy profile if, and only if, $\sigma B \vec{j} \geq \sigma B \vec{i}$ holds for all pure strategies $i = 1, \ldots, n$ of the follower.

Proof. If $\langle\sigma, j\rangle$ is a simple leader strategy profile, then in particular changing the strategy to a different pure strategy $i$ cannot be beneficial for the follower. Consequently, it holds for all $i = 1, \ldots, n$ that $\sigma B \vec{j} \geq \sigma B \vec{i}$.

If it holds for all $i = 1, \ldots, n$ that $\sigma B \vec{j} \geq \sigma B \vec{i}$, then we note that, for any follower strategy $\delta$ with support $S$, the payoff under $\sigma$ is an affine combination $\mathrm{fpayoff}(B; \langle\sigma, \delta\rangle) = \sum_{i \in S} \delta(i) \cdot \sigma B \vec{i}$ of the payoffs for the individual pure strategies. Using $\sigma B \vec{j} \geq \sigma B \vec{i}$, we get
$$\mathrm{fpayoff}(B; \langle\sigma, \delta\rangle) = \sum_{i \in S} \delta(i) \cdot \sigma B \vec{i} \leq \sum_{i \in S} \delta(i) \cdot \sigma B \vec{j} = \sigma B \vec{j} = \mathrm{fpayoff}(B; \langle\sigma, j\rangle).$$
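To make the criterion of Theorem A.1 concrete, the following minimal sketch (in Python, with a hypothetical helper name; it is not part of the thesis tool chain) checks whether $\langle\sigma, j\rangle$ is a simple leader strategy profile by testing whether the $j$-th entry of the vector $\sigma B$ attains its maximum.

```python
# A minimal check of the condition in Theorem A.1; sigma is the leader's mixed
# strategy (length m), B the follower's m x n payoff matrix, j a pure strategy.
import numpy as np


def is_simple_leader_profile(sigma, B, j, tol=1e-9):
    # sigma B vec(i) for every pure follower strategy i.
    follower_payoffs = np.asarray(sigma, dtype=float) @ np.asarray(B, dtype=float)
    # sigma B vec(j) >= sigma B vec(i) for all i  <=>  entry j attains the maximum.
    return follower_payoffs[j] >= follower_payoffs.max() - tol
```

For instance, with $\sigma = (0.5, 0.5)$ and $B$ the $2 \times 2$ identity matrix, both pure follower strategies yield $0.5$, so the check succeeds for either column.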

A.1.1 Computing simple leader equilibria

We give here the constraint system for the computation of leader equilibria. We use the definitions of bi-matrix games and leader strategy profiles from Section 3.4. We next define a constraint system $L_j^{G(A,B)}$ for each pure follower strategy $j$. For a simple leader strategy profile $\langle\sigma, j\rangle$, the strategy $\sigma$ is described by the vector $\sigma^T = (p_1, \ldots, p_m)$. The constraint system $L_j^{G(A,B)}$ consists of $m + n$ constraints, where $m + 1$ constraints describe that $\sigma$ is a strategy,

- $\sum_{i=1}^{m} p_i = 1$ (the sum of the weights is 1) and

- the $m$ non-negativity requirements $p_i \geq 0$ for $1 \leq i \leq m$,

and $n - 1$ constraints reflect the conditions on a leader equilibrium. That is, $L_j^{G(A,B)}$ contains $n - 1$ constraints of the form

- $\sum_{k=1}^{m} (b_{kj} - b_{ki}) p_k \geq 0$, one for each $i \neq j$ with $1 \leq i \leq n$.

This gives us the following corollary.

Corollary A.2. The solutions to $L_j^{G(A,B)}$ describe the set of leader strategies $\sigma$ such that $\langle\sigma, j\rangle$ is a simple leader strategy profile for $G(A, B)$.

Note that the constraints do not depend on the payoff matrix of the leader. The formalisation of the objective function, however, depends on the payoff matrix of the leader. We denote by $LP_j^{G(A,B)}$ the linear programming problem that consists of the constraint system $L_j^{G(A,B)}$ and the objective function $\sum_{k=1}^{m} a_{kj} p_k \mapsto \max$, where $\sum_{k=1}^{m} a_{kj} p_k = \mathrm{lpayoff}(A; \langle\sigma, j\rangle)$ is the payoff the leader obtains for such a simple leader strategy profile $\langle\sigma, j\rangle$ with $\sigma^T = (p_1, \ldots, p_m)$.

Corollary A.3. The solutions to $LP_j^{G(A,B)}$ are the leader strategies $\sigma$ such that $\langle\sigma, j\rangle$ is a simple leader strategy profile for $G(A, B)$ that is optimal among the leader strategy profiles with the pure strategy $j$ for the follower.

Corollary A.4. To find a leader equilibrium for a game $G(A, B)$, it suffices to solve the linear programming problems $LP_j^{G(A,B)}$ for all $1 \leq j \leq n$, to select a $j$ with maximal solution among them, and to use a solution $\sigma_j$ of $LP_j^{G(A,B)}$. For the leader strategy $\sigma$ described by this solution, $\langle\sigma, j\rangle$ is a leader equilibrium.

Proof. Obviously, $LP_j^{G(A,B)}$ has some solution iff $L_j^{G(A,B)}$ is satisfiable, and thus, by Corollary A.2, iff there is a leader strategy $\sigma$ such that $\langle\sigma, j\rangle$ is a simple leader strategy profile. In this case, a solution to $LP_j^{G(A,B)}$ reflects an optimal simple leader strategy profile among the simple leader strategy profiles of the form $\langle\sigma, j\rangle$; in particular, such an optimum exists. Thus, the returned result is the optimal solution among all simple leader strategy profiles.

As linear programming problems can be solved in polynomial time [Kha79, Kar84], finding leader equilibria is tractable.
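As an illustration of Corollary A.4, the following sketch solves $LP_j^{G(A,B)}$ for every pure follower strategy $j$ and keeps a best one. It assumes SciPy's linear programming routine and uses a hypothetical function name; it is not the implementation used in the thesis.

```python
# A minimal sketch, assuming SciPy is available: compute a leader equilibrium of a
# bi-matrix game G(A, B) by solving LP_j^{G(A,B)} for every pure follower strategy j
# (cf. Corollary A.4). The leader is the row player; A and B are m x n matrices.
import numpy as np
from scipy.optimize import linprog


def leader_equilibrium(A, B):
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    m, n = A.shape
    best = None  # (sigma, j, leader payoff)
    for j in range(n):
        # n - 1 constraints sum_k (b_kj - b_ki) p_k >= 0, rewritten as
        # (B[:, i] - B[:, j]) @ p <= 0 to match linprog's A_ub @ x <= b_ub form.
        rows = [B[:, i] - B[:, j] for i in range(n) if i != j]
        A_ub = np.array(rows) if rows else None
        b_ub = np.zeros(len(rows)) if rows else None
        # the weights sum to 1; the bounds encode the non-negativity requirements.
        A_eq, b_eq = np.ones((1, m)), np.array([1.0])
        # objective sum_k a_kj p_k -> max, i.e. minimise its negation.
        res = linprog(-A[:, j], A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * m)
        if res.success and (best is None or -res.fun > best[2]):
            best = (res.x, j, -res.fun)
    return best
```

Each $LP_j^{G(A,B)}$ has $m$ variables and $m + n$ constraints, so the procedure solves $n$ linear programmes whose size is polynomial in the game.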

A.1.2 Friendly leader equilibria

Here, the leader follows a secondary objective of being benign to her follower. We note that it is cheap and simple to determine the value $v_{\max}^l$ a leader can at most acquire in a leader equilibrium. We next determine the highest payoff $v_{\max}^f$ for her follower in a leader equilibrium with leader payoff $v_{\max}^l$, i.e., we construct friendly leader equilibria. We note that friendliness does not come at the cost of simplicity.

Theorem A.5. For every bi-matrix game $G(A, B)$ and every leader equilibrium $\langle\sigma, \delta\rangle$ with payoff $v$ for the follower, it holds for all pure strategies $j \in \mathrm{support}(\delta)$ that $\langle\sigma, j\rangle$ is a leader equilibrium with payoff $v$ for the follower.

Proof. Let $S = \mathrm{support}(\delta)$ be the support of $\delta$ and let $v_{\max}^l = \mathrm{lpayoff}(A; \langle\sigma, \delta\rangle)$ be the leader payoff for $\langle\sigma, \delta\rangle$. It holds for all $j \in S$ that $\langle\sigma, j\rangle$ is a simple leader strategy profile with the same payoff $\mathrm{fpayoff}(B; \langle\sigma, j\rangle) = \mathrm{fpayoff}(B; \langle\sigma, \delta\rangle)$ for the follower. To establish that $\langle\sigma, j\rangle$ is also a leader equilibrium, we first note that the leader payoff cannot be higher than in a leader equilibrium, such that $\mathrm{lpayoff}(A; \langle\sigma, j\rangle) \leq v_{\max}^l$ holds for all $j \in S$. Assuming for contradiction that there is a $j \in S$ with $\mathrm{lpayoff}(A; \langle\sigma, j\rangle) < v_{\max}^l$ would, together with the previous observation that $\mathrm{lpayoff}(A; \langle\sigma, j\rangle) \leq v_{\max}^l$ holds for all $j \in S$, imply that the affine combination $\sum_{j \in S} \delta(j) \cdot \mathrm{lpayoff}(A; \langle\sigma, j\rangle) = \mathrm{lpayoff}(A; \langle\sigma, \delta\rangle)$ of these values defined by $\delta$ is strictly smaller than $v_{\max}^l$, and would therefore lead to a contradiction.

Similar to ordinary leader equilibria, we therefore focus on pure follower strategies when seeking friendly leader equilibria. Recall that each constraint system $L_j^{G(A,B)}$ describes the set of leader strategies $\sigma$ such that $\langle\sigma, j\rangle$ is a simple leader strategy profile. In order to be a leader equilibrium, it also has to satisfy the optimality constraint

$$\sum_{k=1}^{m} a_{kj} p_k \geq v_{\max}^l.$$

We refer to the extended constraint system as $LE_j^{G(A,B)}$. By Corollary A.2, the set of solutions to this constraint system is non-empty iff there is a leader equilibrium of the form $\langle\sigma, j\rangle$.

Corollary A.6. The solutions to $LE_j^{G(A,B)}$ describe the set of leader strategies $\sigma$ such that $\langle\sigma, j\rangle$ is a leader equilibrium for $G(A, B)$.

We now extend the constraint system $LE_j^{G(A,B)}$ to an extended linear programming problem $LEP_j^{G(A,B)}$ by adding an objective to maximise the follower return, i.e.,

$$\sum_{k=1}^{m} b_{kj} p_k \mapsto \max.$$

Corollary A.7. The solutions to $LEP_j^{G(A,B)}$ describe the set of leader strategies $\sigma$ such that $\langle\sigma, j\rangle$ is a leader equilibrium for $G(A, B)$ and, if such a solution exists, the return for the follower is maximal among them.

Corollary A.8. To find a friendly leader equilibrium for a game $G(A, B)$, it suffices to solve the linear programming problems $LEP_j^{G(A,B)}$ for all $1 \leq j \leq n$, to select a $j$ with maximal solution among them, and to use a solution of $LEP_j^{G(A,B)}$. For the leader strategy $\sigma$ described by this solution, $\langle\sigma, j\rangle$ is a friendly leader equilibrium.

Proof. $LEP_j^{G(A,B)}$ has some solution iff $LE_j^{G(A,B)}$ is satisfiable, and thus, by Corollary A.6, iff there is a leader strategy $\sigma$ such that $\langle\sigma, j\rangle$ is a leader equilibrium. In this case, a solution to $LEP_j^{G(A,B)}$ reflects a leader equilibrium with the maximal follower payoff among the leader equilibria of the form $\langle\sigma, j\rangle$; in particular, such an optimum exists. Thus, the returned result is the optimal solution among all simple leader equilibria. Assuming that a better non-simple friendly leader equilibrium exists would imply, by Theorem A.5, that a better simple friendly leader equilibrium exists as well, and thus leads to a contradiction. Note that the existence of some simple optimal leader equilibrium is implied by Corollary A.4.

As linear programming problems can be solved in polynomial time [Kha79, Kar84], finding friendly leader equilibria is tractable.
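A sketch of the resulting two-stage procedure, under the same assumptions as the previous snippet and reusing the hypothetical `leader_equilibrium` helper defined there: first determine $v_{\max}^l$ via Corollary A.4, then solve $LEP_j^{G(A,B)}$ for every $j$ and keep a $j$ with maximal follower return.

```python
# A sketch of Corollary A.8; assumes numpy, scipy.optimize.linprog and the
# leader_equilibrium() helper from the previous sketch are in scope.
def friendly_leader_equilibrium(A, B, eps=1e-9):
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    m, n = A.shape
    _, _, v_leader = leader_equilibrium(A, B)  # v_max^l from Corollary A.4
    best = None  # (sigma, j, follower payoff)
    for j in range(n):
        # constraints of L_j^{G(A,B)} ...
        rows = [B[:, i] - B[:, j] for i in range(n) if i != j]
        # ... plus the optimality constraint sum_k a_kj p_k >= v_max^l,
        # written as -A[:, j] @ p <= -(v_max^l - eps).
        rows.append(-A[:, j])
        A_ub = np.array(rows)
        b_ub = np.append(np.zeros(len(rows) - 1), -(v_leader - eps))
        # objective: maximise the follower return sum_k b_kj p_k.
        res = linprog(-B[:, j], A_ub=A_ub, b_ub=b_ub,
                      A_eq=np.ones((1, m)), b_eq=np.array([1.0]),
                      bounds=[(0, None)] * m)
        if res.success and (best is None or -res.fun > best[2]):
            best = (res.x, j, -res.fun)
    return best
```

The small `eps` slack is only a pragmatic guard against floating-point rounding in this sketch; an exact rational LP solver would not need it.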

Corollary A.9. A simple friendly leader equilibrium can be constructed in polynomial time.

Bibliography

[ASA10] Ashton Anderson, Yoav Shoham, and Alon Altman. Internal implementation. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, volume 1, pages 191–198. International Foundation for Autonomous Agents and Multiagent Systems, 2010.

[Aum74] Robert J. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1(1):67–96, March 1974.

[BCD+11] Lubos Brim, Jakub Chaloupka, Laurent Doyen, Raffaella Gentilini, and Jean-François Raskin. Faster algorithms for mean-payoff games. Formal Methods in System Design, 38(2):97–118, 2011.

[BDS13] Thomas Brihaye, Julie De Pril, and Sven Schewe. Multiplayer cost games with simple Nash equilibria. In Proceedings of the Symposium on Logical Foundations of Computer Science (LFCS 2013), January 6–8, San Diego, California, USA, volume 7734 of Lecture Notes in Computer Science, pages 59–73. Springer-Verlag, 2013.

[BEF+11] Endre Boros, Khaled M. Elbassioni, Mahmoud Fouz, Vladimir Gurvich, Kazuhisa Makino, and Bodo Manthey. Stochastic mean payoff games: Smoothed analysis and approximation schemes. In Proceedings of the 38th International Colloquium on Automata, Languages and Programming (ICALP 2011), July 4–8, Zürich, Switzerland, volume 6755 of Lecture Notes in Computer Science, pages 147–158. Springer, 2011.

[BK78] Truman Bewley and Elon Kohlberg. On stochastic games with stationary optimal strategies. Mathematics of Operations Research, 3(2):104–125, 1978.

[BK13] Kimmo Berg and Mitri Kitti. Computing equilibria in Discounted 2×2 supergames. Comput. Econ., 41(1):71–88, January 2013.

[BV07] Henrik Björklund and Sergei Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics, 155(2):210–229, 2007.


[CFW13] Krishnendu Chatterjee, Vojtech Forejt, and Dominik Wojtczak. Multi-objective discounted reward verification in graphs and MDPs. In LPAR, volume 8312 of Lecture Notes in Computer Science, pages 228–242, 2013.

[CHJ05a] Krishnendu Chatterjee, Thomas A. Henzinger, and Marcin Jurdzinski. Games with secure equilibria. In Frank S. de Boer, Marcello M. Bonsangue, Susanne Graf, and Willem-Paul de Roever, editors, Formal Methods for Components and Objects, volume 3657 of Lecture Notes in Computer Science, pages 141–161. Springer Berlin Heidelberg, 2005.

[CHJ05b] Krishnendu Chatterjee, Thomas A. Henzinger, and Marcin Jurdzinski. Mean-payoff parity games. In Proceedings of the 20th IEEE Symposium on Logic in Computer Science (LICS 2005), 26-29 June 2005, Chicago, IL, USA, pages 178–187, 2005.

[CHJ06] Krishnendu Chatterjee, Thomas A. Henzinger, and Marcin Jurdzinski. Games with secure equilibria. Theoretical Computer Science, 365(1–2):67–82, 2006. Formal Methods for Components and Objects.

[CHJS11] Krishnendu Chatterjee, Thomas A. Henzinger, Barbara Jobstmann, and Rohit Singh. Quasy: Quantitative synthesis tool. In Tools and Algorithms for the Construction and Analysis of Systems, pages 267–271. Springer, 2011.

[Con93] Anne Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993.

[CPR14] C. Comin, R. Posenato, and R. Rizzi. A tractable generalization of simple temporal networks and its relation to mean payoff games. In 2014 21st International Symposium on Temporal Representation and Reasoning, pages 7–16, September 2014.

[CS06] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM Conference on Electronic Commerce, EC ’06, pages 82–90, New York, NY, USA, 2006. ACM.

[dAFH+04] Luca de Alfaro, Marco Faella, Thomas A. Henzinger, Rupak Majumdar, and Mariëlle Stoelinga. Model checking discounted temporal properties. In Kurt Jensen and Andreas Podelski, editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 2988 of Lecture Notes in Computer Science, pages 77–92. Springer Berlin Heidelberg, 2004.

[dAHM03] Luca de Alfaro, Tom Henzinger, and Rupak Majumdar. Discounting the future in systems theory. In Proc. of the 30th International Colloquium on Automata, Languages, and Programming (ICALP), 2003.

[DGP09] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. Commun. ACM, 52(2):89–97, 2009.

[dMDS07] Leonardo Mendonça de Moura, Bruno Dutertre, and Natarajan Shankar. A tutorial on satisfiability modulo theories. In Proceedings of the 19th International Conference on Computer Aided Verification (CAV 2007), July 3–7, Berlin, Germany, volume 4590 of Lecture Notes in Computer Science, pages 20–36. Springer, 2007.

[DPFK+14] Julie De Pril, János Flesch, Jeroen Kuipers, Gijs Schoenmakers, and Koos Vrieze. Existence of secure equilibrium in multi-player games with perfect information. In Erzsébet Csuhaj-Varjú, Martin Dietzfelbinger, and Zoltán Ésik, editors, Mathematical Foundations of Computer Science 2014, volume 8635 of Lecture Notes in Computer Science, pages 213–225. Springer Berlin Heidelberg, 2014.

[EH86] Harri Ehtamo and Raimo P. Hämäläinen. On affine incentives for dynamic decision problems. In Dynamic Games and Applications in Economics, volume 265 of Lecture Notes in Economics and Mathematical Systems, pages 47–63. Springer Berlin Heidelberg, 1986.

[EH89] Harri Ehtamo and Raimo P. Hämäläinen. Incentive strategies and equilibria for dynamic games with delayed information. Journal of Optimization Theory and Applications, 63(3):355–369, 1989.

[EM79] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8(2):109–113, 1979.

[EY10] Kousha Etessami and Mihalis Yannakakis. On the complexity of Nash equilibria and other fixed points. SIAM J. Comput., 39(6):2531–2597, 2010.

[Fin64] A. M. Fink. Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, Series A-I, 28:89–93, 1964.

[Fri71] James W. Friedman. A Non-cooperative Equilibrium for Supergames. The Review of Economic Studies, 38(1):1–12, 1971.

[Fri77] James W. Friedman. Oligopoly and the theory of games. Advanced textbooks in economics. North-Holland Pub. Co., 1977.

[GO14] Vladimir Gurvich and Vladimir Oudalov. On Nash-solvability in pure stationary strategies of the deterministic n-person games with perfect information and mean or total effective cost. Discrete Applied Mathematics, pages 131–143, 2014.

[GS14] Anshul Gupta and Sven Schewe. Quantitative verification in rational environments. In Proceedings of the 21st International Symposium on Temporal Representation and Reasoning (TIME 2014), 8–10 September, Verona, Italy, pages 123–131. IEEE Computer Society Press, 2014.

[GS15] Anshul Gupta and Sven Schewe. It pays to pay in bi-matrix games: a rational explanation for bribery. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), May 4–8, Istanbul, Turkey, pages 1361–1369. ACM, 2015.

[GST+16] Anshul Gupta, Sven Schewe, Ashutosh Trivedi, Maram Sai Krishna Deepak, and Bharath Kumar Padarthi. Incentive Stackelberg mean-payoff games. In Proceedings of the 14th International Conference on Software Engineering and Formal Methods, 2016.

[GSW15] Anshul Gupta, Sven Schewe, and Dominik Wojtczak. Making the best of limited memory in multi-player discounted sum games. In Javier Esparza and Enrico Tronci, editors, Proceedings Sixth International Symposium on Games, Automata, Logics and Formal Verification, Genoa, Italy, 21–22nd September 2015, volume 193 of Electronic Proceedings in Theoretical Computer Science, pages 16–30. Open Publishing Association, 2015.

[GZ04] Hugo Gimbert and Wieslaw Zielonka. When can you play positionally? In Jiří Fiala, Václav Koubek, and Jan Kratochvíl, editors, MFCS, volume 3153 of Lecture Notes in Computer Science, pages 686–697. Springer, 2004.

[Hen13] Thomas A. Henzinger. Quantitative reactive modeling and verification. Computer Science - Research and Development, 28(4):331–344, 2013.

[Jur98] Marcin Jurdziński. Deciding the winner in parity games is in UP ∩ co-UP. Information Processing Letters, 68(3):119–124, November 1998.

[Kar84] N. Karmarkar. A new polynomial-time algorithm for linear programming. In Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC 1984), 30 April – 2 May, Washington, DC, USA, pages 302–311. ACM Press, 1984.

[KFSV09] J. Kuipers, J. Flesch, G. Schoenmakers, and K. Vrieze. Pure subgame-perfect equilibria in free transition games. European Journal of Operational Research, 199(2):442–447, 2009.

[Kha79] L. G. Khachian. A polynomial algorithm in linear programming. Doklady Akademii Nauk SSSR, 244:1093–1096, 1979.

[Leh90] E. Lehrer. Nash equilibria of n-player repeated games with semi-standard information. International Journal of Game Theory, 19(2):191–217, 1990.

[LH64] Carlton E. Lemke and Joseph T. Howson, Jr. Equilibrium points of bimatrix games. Journal of the Society for Industrial & Applied Mathematics, 12(2):413–423, 1964.

[LMC+12] Joshua Letchford, Liam MacDermed, Vincent Conitzer, Ronald Parr, and Charles L. Isbell. Computing Stackelberg strategies in stochastic games. SIGecom Exch., 11(2):36–40, December 2012.

[MKP] M. Berkelaar, K. Eikland, and P. Notebaert. lp_solve, a Mixed Integer Linear Programming (MILP) solver. Website.

[MMT13] Richard D. McKelvey, Andrew M. McLennan, and Theodore L. Turocy. Gambit: Software Tools for Game Theory, http://www.gambit-project.org. Technical report, version 13.1.0, 2013.

[MOJ05] Matthew O. Jackson and Simon Wilkie. Endogenous games and mechanisms: Side payments among players. The Review of Economic Studies, 72(2):543–566, 2005.

[mpg] MMPG solver: http://www.cse.iitb.ac.in/~trivedi/mmpgsolver.

[MS96] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.

[MT04] Dov Monderer and Moshe Tennenholtz. K-implementation. Journal of Artificial Intelligence Research, 21:37–62, 2004.

[Nas50] John Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.

[NRTV07] Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay V. Vazirani. Algorithmic Game Theory. Cambridge University Press, Cambridge, 2007.

[OR94] Martin J. Osborne and Ariel Rubinstein. A course in game theory. The MIT Press, Cambridge, USA, 1994. Electronic edition.

[PGS] PGSolver: https://github.com/tcsprojects/pgsolver/.

[PR89] Amir Pnueli and Roni Rosner. On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 179–190. ACM, 1989.

[Rab93] Matthew Rabin. Incorporating Fairness into Game Theory and Economics. American Economic Review, 83(5):1281–1302, December 1993.

[Rou10] Tim Roughgarden. Algorithmic game theory. Communications of the ACM, 53(7):78–86, 2010.

[RW89] Peter J. G. Ramadge and W. Murray Wonham. The control of discrete event systems. Proceedings of the IEEE, 77(1):81–98, 1989.

[Sch08] Sven Schewe. An optimal strategy improvement algorithm for solving parity and payoff games. In Proceedings of the 17th Annual Conference of the European Association for Computer Science Logic (CSL 2008), 15–19 September, Bertinoro, Italy, volume 5213 of Lecture Notes in Computer Science, pages 368–383. Springer-Verlag, 2008.

[Sch09] Sven Schewe. From parity and payoff games to linear programming. In Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science (MFCS 2009), 24–28 August, Novy Smokovec, Slovakia, volume 5734 of Lecture Notes in Computer Science, pages 675–686. Springer-Verlag, 2009.

[Sen94] Linn I. Sennott. Zero-sum stochastic games with unbounded costs: Discounted and average cost cases. Math. Meth. of OR, 39(2):209–225, 1994.

[Sha53] L. S. Shapley. Stochastic Games. In Proceedings of the National Academy of Sciences of the United States of America, volume 39, pages 1095–1100, Washington, DC, October 1953. National Academy of Sciences.

[Sta85] Oded Stark. On private charity and altruism. Public Choice, 46(3):325–332, 1985.

[Sta89] Oded Stark. Altruism and the quality of life. The American Economic Review, pages 86–90, 1989.

[TR97] Frank Thuijsman and Thirukkannamangai E. S. Raghavan. Perfect information stochastic games and related classes. International Journal of Game Theory, 26(3):403–408, 1997.

[Tuc50] A.W. Tucker. A two-person dilemma. Stanford University, 1950.

[Umm05] Michael Ummels. Rational Behaviour and Strategy Construction in Infinite Multiplayer Games. Diploma thesis, RWTH Aachen, 2005.

[Umm06] Michael Ummels. Rational behaviour and strategy construction in infinite multiplayer games. In FSTTCS, pages 212–223, 2006.

[Umm08] Michael Ummels. The complexity of Nash equilibria in infinite multiplayer games. In Proceedings of the 11th International Conference on Foundations of Software Science and Computational Structures (FoSSaCS 2008), March 29 - April 6, Budapest, Hungary, volume 4962 of Lecture Notes in Computer Science, pages 20–34. Springer, 2008.

[UW11] Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in limit-average games. In Proceedings of the 22nd International Conference on Concurrency Theory, CONCUR'11, pages 482–496, Berlin, Heidelberg, 2011. Springer-Verlag.

[vS34] H. von Stackelberg. Marktform und Gleichgewicht. J. Springer, 1934.

[vSZ04] Bernhard von Stengel and Shmuel Zamir. Leadership with commitment to mixed strategies. Technical report, 2004.

[vSZ10] Bernhard von Stengel and Shmuel Zamir. Leadership games with convex strategy sets. Games and Economic Behavior, 69(2):446–457, 2010.

[yic] Yices website: http://yices.csl.sri.com/.

[ZP96] Uri Zwick and Mike S. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1–2):343–359, 1996.