Markovian Competitive and Cooperative Games with Applications to Communications
Total Page:16
File Type:pdf, Size:1020Kb
DISSERTATION in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy from University of Nice Sophia Antipolis Specialization: Communication and Electronics Lorenzo Maggi Markovian Competitive and Cooperative Games with Applications to Communications Thesis defended on the 9th of October, 2012 before a committee composed of: Reporters Prof. Jean-Jacques Herings (Maastricht University, The Netherlands) Prof. Roberto Lucchetti, (Politecnico of Milano, Italy) Examiners Prof. Pierre Bernhard, (INRIA Sophia Antipolis, France) Prof. Petros Elia, (EURECOM, France) Prof. Matteo Sereno (University of Torino, Italy) Prof. Bruno Tuffin (INRIA Rennes Bretagne-Atlantique, France) Thesis Director Prof. Laura Cottatellucci (EURECOM, France) Thesis Co-Director Prof. Konstantin Avrachenkov (INRIA Sophia Antipolis, France) THESE pr´esent´ee pour obtenir le grade de Docteur en Sciences de l’Universit´ede Nice-Sophia Antipolis Sp´ecialit´e: Automatique, Traitement du Signal et des Images Lorenzo Maggi Jeux Markoviens, Comp´etitifs et Coop´eratifs, avec Applications aux Communications Th`ese soutenue le 9 Octobre 2012 devant le jury compos´ede : Rapporteurs Prof. Jean-Jacques Herings (Maastricht University, Pays Bas) Prof. Roberto Lucchetti, (Politecnico of Milano, Italie) Examinateurs Prof. Pierre Bernhard, (INRIA Sophia Antipolis, France) Prof. Petros Elia, (EURECOM, France) Prof. Matteo Sereno (University of Torino, Italie) Prof. Bruno Tuffin (INRIA Rennes Bretagne-Atlantique, France) Directrice de Th`ese Prof. Laura Cottatellucci (EURECOM, France) Co-Directeur de Th`ese Prof. Konstantin Avrachenkov (INRIA Sophia Antipolis, France) Abstract In this dissertation we deal with the design of strategies for agents interacting in a dynamic environment. The mathematical tool of Game Theory (GT) on Markov Decision Processes (MDPs) is adopted. The agents’ strategies control both the transition probabilities among the states and the rewards earned by each agent. Rewards are geometrically discounted over time. We first study the competitive case, in which two agents act selfishly. The game is zero-sum and the agents control disjoint sets of states. We devise two algorithms to compute the Nash equilibrium for all discount factors close enough to 1. Then we consider the long-run cooperative case, in which agents can coordinate their strategies. We utilize our two algorithms to compute the value of the coalitions in a routing game, in which several providers share the same network and control the routing in disjoint sets of nodes. Next we deal with dynamic cooperative GT on MDPs, in which coalitions can form throughout the game. We show how to enforce a common agreement for which the pay-off is distributed at each state, in a global optimum way and such that no coalition is ever enticed to break the agreement. We apply these concepts to a wireless multiple access channel, in which the channel is quasi-static. We assign the rate to users in each channel state in a fair and satisfactory manner. Next we provide three methods to compute a confidence interval for Shapley value on Markovian games. Such methods have polynomial complexity in the number of agents, while the complexity of the exact computation is exponential. Two methods are still valid when the values in each state are learned during the game. Finally we assess the performance of two strategies to dynamically select the frequency band to communicate on. We exploit an MDP formulation with uncountable state space. i ii R´esum´e R´esum´e Dans cette th`ese, nous ´etudions la th´eorie des jeux sur les Processus de D´ecision de Markov (PDM). Des agents interagissent dans un environnement dynamique, mod´elis´ee par une chaˆıne de Markov. Les strat´egies des agents contrˆolent les probabilit´es de transition entre les ´etats et les r´ecompenses gagn´ees par chaque agent. Les r´ecompenses sont g´eom´etriquement pond´er´ees avec le temps. Nous ´etudions d’abord le cas comp´etitif. Deux agents agissent ´ego¨ıstement, le jeu est `asomme nulle et les agents contrˆolent des ensembles disjoints d’´etats. Nous ´elaborons deux algorithmes pour calculer l’´equilibre de Nash pour tous les facteurs de pond´eration suffisamment proche de 1. Puis nous consid´erons le cas statique coop´eratif, dans lequel les joueurs peuvent coordonner leurs strat´egies. Nous utilisons ces deux algorithmes pour calculer la valeur des coalitions dans un jeu de routage. Plusieurs fournisseurs partagent le mˆeme r´eseau. Ensuite, nous traitons des jeux dy- namiques et coop´eratifs sur PDM. Des coalitions peuvent se former tout au long du jeu. Nous montrons comment distribuer la r´ecompense dans chaque ´etat, de sorte que la solution est optimale pour la communaut´eet tous les agents sont satisfaits pendant le jeu. Nous appliquons ces concepts `aun canal `aacc`es multiple par fil avec un canal quasi-statique. Nous assignons le d´ebit du codage dans chaque ´etat du canal d’une mani`ere ´equitable. En- fin, nous proposons trois m´ethodes afin de calculer un intervalle de confiance pour la valeur de Shapley sur les jeux de Markov. Ces m´ethodes ont une complexit´epolynomiale en le nombre d’agents, tandis que la complexit´edu calcul exact est exponentielle. Enfin, nous ´evaluons la performance de deux strat´egies pour s´electionner dynamiquement la bande de fr´equence de com- munication. Nous exploitons une formulation PDM avec un espace d’´etats d´enombrable. iii iv R´esum´e Contents Abstract................................. i R´esum´e ................................. i Contents................................. v Acronyms ................................ ix Notations ................................ xi Acknowledgements xiii Introduction 1 Introduction (en fran¸cais) 11 1 Markov Decision Processes (MDPs) and Game Theory: a brief overview 21 1.1 Discrete time Markov chains . 21 1.2 Linearprogramming . 22 1.3 Markov Decision Processes . 23 1.3.1 Optimality equation and linear programming . 24 1.3.2 Value iteration . 25 1.3.3 Blackwell optimality . 26 1.4 GameTheory........................... 27 1.4.1 Competitive Game Theory with two players . 27 1.4.2 Static Cooperative games . 27 2 Competitive and Long-run Cooperative MDPs 31 2.1 Algorithms for uniform optimal strategies in two-player zero- sum Competitive MDPs with perfect information . 31 2.1.1 Introduction ....................... 31 2.1.2 Themodel ........................ 33 2.1.3 The ordered field of rational functions with real coef- ficients........................... 35 2.1.4 Computation of Blackwell optimum policy in MDPs . 37 2.1.5 Uniform optimality in perfect information games . 39 2.1.6 Algorithm description . 41 2.1.7 Anexample........................ 45 v vi Contents 2.1.8 A lower complexity algorithm . 46 2.2 Transientgames ......................... 48 2.3 Conclusions ............................ 49 2.4 Stochastic Games for Cooperative Network RoutingandEpidemicSpread. 54 2.4.1 Introduction ....................... 54 2.4.2 Routingmodel ...................... 55 2.4.3 Routing Long-run Cooperative game . 55 2.4.4 Network design . 60 2.4.5 Hacker-Provider routing game . 60 2.4.6 Naturaldisaster . 61 2.4.7 Epidemicnetwork . 62 2.4.8 Conclusions........................ 62 2.4.9 Appendix ......................... 63 3 Dynamic Cooperative MDPs 65 3.1 Cooperative MDPs: Time Consistency, Greedy Players Sat- isfaction, and Cooperation Maintenance . 65 3.1.1 Introduction ....................... 65 3.1.2 Discounted Cooperative MDPs . 67 3.1.3 Cooperative Payoff Distribution Procedure . 72 3.1.4 TerminalFairness . 73 3.1.5 TimeConsistency . 75 3.1.6 Greedy Players Satisfaction . 77 3.1.7 Cooperation Maintenance . 81 3.1.8 Strictly convex static games . 87 3.1.9 Transition probabilities not depending on the actions . 88 3.1.10 Conclusions . .. .. .. .. .. .. .. 89 3.2 Dynamic Rate Allocation in Markovian Quasi-Static Multiple AccessChannels ......................... 91 3.2.1 Introduction ....................... 91 3.2.2 SystemModel....................... 93 3.2.3 Optimal and fair rate allocation design . 95 3.2.4 Optimal and Satisfactory allocations: A game-theoretical approach . 105 3.2.5 Conclusions . .. .. .. .. .. .. .. 111 3.3 Confidence intervals for the Shapley-Shubik power index in Markoviangames. .. .. .. .. .. .. .. 112 3.3.1 Introduction . .. .. .. .. .. .. .. 112 3.3.2 Markovian Model and Background results . 115 3.3.3 Motivations of the Markovian model . 117 3.3.4 Complexity of deterministic algorithms . 118 3.3.5 Randomized static approach . 119 3.3.6 Randomized dynamic approaches . 122 Contents vii 3.3.7 Comparison among the proposed approaches . 126 3.3.8 Complexity of confidence intervals . 129 3.3.9 Conclusions .. .. .. .. .. .. .. 130 3.3.10 Appendix ......................... 131 4 Restless Bandits for Dynamic Channel Section: An MDP formulation 137 4.0.11 Introduction . 137 4.0.12 Model ...........................139 4.0.13 Single user: MDP formulation . 139 4.0.14 Multi user: Competitive MDP formulation . 144 4.0.15 Conclusions . 147 List of publications 151 Conclusions and future perspectives 153 viii Contents List of Acronyms AR(p) : Auto-Regressive model of order p AWGN : Additive White Gaussian Noise Co : Core set solution CM : Cooperation Maintaining set solution DCGT : Dynamic Cooperative Game Theory HMC : Homogeneous Markov Chain MAB : Multi Armed Bandit MAC : Mutiple Access Channel MDP : Markov Decision Process MDP-CPDP : Cooperative Pay-off Distribution Procedure on MDP NTU : Non-Transferable Utility SCGT : Static Cooperative Game Theory Sh : Shapley value