DISSERTATION

in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy from the University of Nice Sophia Antipolis

Specialization: Communication and Electronics

Lorenzo Maggi

Markovian Competitive and Cooperative Games with Applications to Communications

Thesis defended on the 9th of October, 2012 before a committee composed of:

Reporters:
Prof. Jean-Jacques Herings (Maastricht University, The Netherlands)
Prof. Roberto Lucchetti (Politecnico di Milano, Italy)
Examiners:
Prof. Pierre Bernhard (INRIA Sophia Antipolis, France)
Prof. Petros Elia (EURECOM, France)
Prof. Matteo Sereno (University of Torino, Italy)
Prof. Bruno Tuffin (INRIA Rennes Bretagne-Atlantique, France)
Thesis Director:
Prof. Laura Cottatellucci (EURECOM, France)
Thesis Co-Director:
Prof. Konstantin Avrachenkov (INRIA Sophia Antipolis, France)

THÈSE

présentée pour obtenir le grade de

Docteur en Sciences de l'Université de Nice-Sophia Antipolis

Spécialité : Automatique, Traitement du Signal et des Images

Lorenzo Maggi

Jeux Markoviens, Compétitifs et Coopératifs, avec Applications aux Communications

Thèse soutenue le 9 octobre 2012 devant le jury composé de :

Rapporteurs :
Prof. Jean-Jacques Herings (Maastricht University, Pays-Bas)
Prof. Roberto Lucchetti (Politecnico di Milano, Italie)
Examinateurs :
Prof. Pierre Bernhard (INRIA Sophia Antipolis, France)
Prof. Petros Elia (EURECOM, France)
Prof. Matteo Sereno (University of Torino, Italie)
Prof. Bruno Tuffin (INRIA Rennes Bretagne-Atlantique, France)
Directrice de Thèse :
Prof. Laura Cottatellucci (EURECOM, France)
Co-Directeur de Thèse :
Prof. Konstantin Avrachenkov (INRIA Sophia Antipolis, France)

Abstract

In this dissertation we deal with the design of strategies for agents interacting in a dynamic environment. The mathematical tool of Game Theory (GT) on Markov Decision Processes (MDPs) is adopted. The agents' strategies control both the transitions among the states and the rewards earned by each agent. Rewards are geometrically discounted over time. We first study the competitive case, in which two agents act selfishly. The game is zero-sum and the agents control disjoint sets of states. We devise two algorithms to compute the Nash equilibrium for all discount factors close enough to 1. Then we consider the long-run cooperative case, in which agents can coordinate their strategies. We utilize our two algorithms to compute the value of the coalitions in a routing game, in which several providers share the same network and control the routing in disjoint sets of nodes. Next we deal with dynamic cooperative GT on MDPs, in which coalitions can form throughout the game. We show how to enforce a common agreement under which the pay-off is distributed in each state in a globally optimal way and such that no coalition is ever enticed to break the agreement. We apply these concepts to a wireless multiple access channel, in which the channel is quasi-static. We assign the rate to users in each channel state in a fair and satisfactory manner. Next we provide three methods to compute a confidence interval for the Shapley value in Markovian games. Such methods have polynomial complexity in the number of agents, while the complexity of the exact computation is exponential. Two of the methods remain valid when the values in each state are learned during the game. Finally we assess the performance of two strategies to dynamically select the frequency band to communicate on. We exploit an MDP formulation with an uncountable state space.

Résumé

Dans cette thèse, nous étudions la théorie des jeux sur les Processus de Décision de Markov (PDM). Des agents interagissent dans un environnement dynamique, modélisé par une chaîne de Markov. Les stratégies des agents contrôlent les probabilités de transition entre les états et les récompenses gagnées par chaque agent. Les récompenses sont géométriquement pondérées avec le temps. Nous étudions d'abord le cas compétitif. Deux agents agissent égoïstement, le jeu est à somme nulle et les agents contrôlent des ensembles disjoints d'états. Nous élaborons deux algorithmes pour calculer l'équilibre de Nash pour tous les facteurs de pondération suffisamment proches de 1. Puis nous considérons le cas statique coopératif, dans lequel les joueurs peuvent coordonner leurs stratégies. Nous utilisons ces deux algorithmes pour calculer la valeur des coalitions dans un jeu de routage. Plusieurs fournisseurs partagent le même réseau. Ensuite, nous traitons des jeux dynamiques et coopératifs sur PDM. Des coalitions peuvent se former tout au long du jeu. Nous montrons comment distribuer la récompense dans chaque état, de sorte que la solution est optimale pour la communauté et tous les agents sont satisfaits pendant le jeu. Nous appliquons ces concepts à un canal à accès multiple sans fil avec un canal quasi-statique. Nous assignons le débit du codage dans chaque état du canal d'une manière équitable. Enfin, nous proposons trois méthodes afin de calculer un intervalle de confiance pour la valeur de Shapley sur les jeux de Markov. Ces méthodes ont une complexité polynomiale en le nombre d'agents, tandis que la complexité du calcul exact est exponentielle. Enfin, nous évaluons la performance de deux stratégies pour sélectionner dynamiquement la bande de fréquence de communication. Nous exploitons une formulation PDM avec un espace d'états non dénombrable.

Contents

Abstract . . . i
Résumé . . . i
Contents . . . v
Acronyms . . . ix
Notations . . . xi

Acknowledgements xiii

Introduction 1

Introduction (en français) 11

1 Markov Decision Processes (MDPs) and Game Theory: a brief overview 21
1.1 Discrete time Markov chains . . . 21
1.2 Linear programming . . . 22
1.3 Markov Decision Processes . . . 23
1.3.1 Optimality equation and . . . 24
1.3.2 Value iteration . . . 25
1.3.3 Blackwell optimality . . . 26
1.4 Game Theory . . . 27
1.4.1 Competitive Game Theory with two players . . . 27
1.4.2 Static Cooperative games . . . 27

2 Competitive and Long-run Cooperative MDPs 31
2.1 Algorithms for uniform optimal strategies in two-player zero-sum Competitive MDPs with perfect information . . . 31
2.1.1 Introduction . . . 31
2.1.2 The model . . . 33
2.1.3 The ordered field of rational functions with real coefficients . . . 35
2.1.4 Computation of Blackwell optimum policy in MDPs . . . 37
2.1.5 Uniform optimality in perfect information games . . . 39
2.1.6 description . . . 41
2.1.7 An example . . . 45


2.1.8 A lower complexity algorithm . . . 46
2.2 Transient games . . . 48
2.3 Conclusions . . . 49
2.4 Games for Cooperative Network Routing and Epidemic Spread . . . 54
2.4.1 Introduction . . . 54
2.4.2 Routing model . . . 55
2.4.3 Routing Long-run Cooperative game . . . 55
2.4.4 Network design . . . 60
2.4.5 Hacker-Provider routing game . . . 60
2.4.6 Natural disaster . . . 61
2.4.7 Epidemic network . . . 62
2.4.8 Conclusions . . . 62
2.4.9 Appendix . . . 63

3 Dynamic Cooperative MDPs 65
3.1 Cooperative MDPs: Time Consistency, Greedy Players Satisfaction, and Cooperation Maintenance . . . 65
3.1.1 Introduction . . . 65
3.1.2 Discounted Cooperative MDPs . . . 67
3.1.3 Cooperative Payoff Distribution Procedure . . . 72
3.1.4 Terminal Fairness . . . 73
3.1.5 Time Consistency . . . 75
3.1.6 Greedy Players Satisfaction . . . 77
3.1.7 Cooperation Maintenance . . . 81
3.1.8 Strictly convex static games . . . 87
3.1.9 Transition probabilities not depending on the actions . . . 88
3.1.10 Conclusions . . . 89
3.2 Dynamic Rate Allocation in Markovian Quasi-Static Multiple Access Channels . . . 91
3.2.1 Introduction . . . 91
3.2.2 System Model . . . 93
3.2.3 Optimal and fair rate allocation design . . . 95
3.2.4 Optimal and Satisfactory allocations: A game-theoretical approach . . . 105
3.2.5 Conclusions . . . 111
3.3 Confidence intervals for the Shapley-Shubik power index in Markovian games . . . 112
3.3.1 Introduction . . . 112
3.3.2 Markovian Model and Background results . . . 115
3.3.3 Motivations of the Markovian model . . . 117
3.3.4 Complexity of deterministic algorithms . . . 118
3.3.5 Randomized static approach . . . 119
3.3.6 Randomized dynamic approaches . . . 122

3.3.7 Comparison among the proposed approaches . . . 126
3.3.8 Complexity of confidence intervals . . . 129
3.3.9 Conclusions . . . 130
3.3.10 Appendix . . . 131

4 Restless Bandits for Dynamic Channel Selection: An MDP formulation 137
4.0.11 Introduction . . . 137
4.0.12 Model . . . 139
4.0.13 Single user: MDP formulation . . . 139
4.0.14 Multi user: Competitive MDP formulation . . . 144
4.0.15 Conclusions . . . 147

List of publications 151

Conclusions and future perspectives 153

List of Acronyms

AR(p) : Auto-Regressive model of order p
AWGN : Additive White Gaussian Noise
Co : Core set solution
CM : Cooperation Maintaining set solution
DCGT : Dynamic Cooperative Game Theory
HMC : Homogeneous Markov Chain
MAB : Multi Armed Bandit
MAC : Multiple Access Channel
MDP : Markov Decision Process
MDP-CPDP : Cooperative Pay-off Distribution Procedure on MDP
NTU : Non-Transferable Utility
SCGT : Static Cooperative Game Theory
Sh : Shapley value
ShM : Shapley value in Markovian game
SINR : Signal to Interference plus Noise Ratio
SNR : Signal to Noise Ratio
SSM : Shapley-Shubik power index in Markovian game
TU : Transferable Utility

List of Notations

R_0 : R ∪ {0}
N_0 : N ∪ {0}
:= : defined as
p(E) : probability of event E
A_{i,j} : element (i, j) of matrix A
A^T : transpose of matrix A
I_N : N-by-N identity matrix
|A| : cardinality of set A
1I : indicator function
E : expected value operator
[a; b) : set of x : a ≤ x < b
S : finite set of states of a Markov chain
N : |S|
S_t : state reached at time t by the Markov process
P : transition probability matrix of a discrete-time Markov chain
P_{i,j} : p(S_{t+1} = s_j | S_t = s_i), s_i, s_j ∈ S, for all t ∈ N_0
Γ_s : long-run cooperative game starting from state s
Ω_s : static cooperative game in state s
Φ(β) : expected long-run β-discounted reward
P : grand coalition of all players
v(Λ) : transferable value of players' coalition Λ ⊆ P
P(R) : ring of the polynomials with real coefficients
F(R) : non-Archimedean ordered field of fractions of polynomials with coefficients in R
=_l : equality in F(R)
>_l : inequality in F(R)

Acknowledgements

This research was supported by “Agence Nationale de la Recherche”, with reference ANR-09-VERS-001.

First and foremost, I would like to express my gratitude to my supervisors, Professors Konstantin Avrachenkov and Laura Cottatellucci. Their passion and enthusiasm for research inspired me throughout these three years. Thanks to them, I realized that doing research is what really fulfils me.

I thank my fellows at Eurecom, Turgut, Xiao Lei, Francesco, Carmelo, Emre, Aymen, Arun, Rajeev, Miltos, Samir, Anthony, Erhan, Umer, Erick, Axel. From each of them I learnt a piece of their culture, and they made this Ph.D. experience unforgettable.

I am deeply indebted to my parents for their unwavering moral support. Last but not least, I thank Federica for having been such a wonderful life companion.

Introduction

Markov Decision Processes (MDP's) offer an elegant theoretical framework for a number of practical decision-making problems in which the evolution of the system is stochastic, but still depends on the decisions taken at each time step by some controller agents, or players. More specifically, there exists a set of states, and in each of them the players have a set of actions at their disposal. The set of actions chosen jointly by the players in a state determines both an instantaneous reward for each of them and a probability distribution on the next state. Typically, the rewards are then either plainly summed over time, or geometrically discounted and then summed, or averaged. The fields in which MDP's have found successful applications span from Computer Science, Engineering, Economics, and Medicine to Biology (see White, 1993 [99] for a general survey and Altman, 2002 [3] for applications to communication networks). MDP's have been extensively studied over the last decades, mostly in the single-agent case. Their origins trace back to Bellman (1957, [18]) and Howard (1960, [42]). Typically, the main goal in single-agent MDP's is to find the optimal decision strategy that the controller has to implement to maximize the long-run sum of rewards (see Puterman, 1994 [77] for a survey). Interestingly, this model can also provide insightful results about the optimality of the controller's decisions as the horizon length varies, obtained by tuning the discount factor within [0; 1) (see e.g. [41]). Of particular importance is the so-called Blackwell optimality (Blackwell, 1962 [21]), which is the property of those strategies that are optimal for all discount factors sufficiently close to 1. Such strategies happen to be optimal under the average criterion as well [77].
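As a minimal illustration of the discounted single-agent setting just described, the following sketch runs value iteration on a tiny two-state, two-action MDP; the transition probabilities, rewards and discount factor are invented for the example and are not taken from this dissertation.

```python
import numpy as np

# A tiny hypothetical MDP: 2 states, 2 actions.
# P[a, s, s'] = transition probability, r[a, s] = instantaneous reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def value_iteration(P, r, beta, tol=1e-9):
    """Compute the optimal beta-discounted values and a greedy optimal policy."""
    _, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        # Q[a, s] = r(s, a) + beta * sum_{s'} P(s' | s, a) * v(s')
        q = r + beta * (P @ v)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)
        v = v_new

values, policy = value_iteration(P, r, beta=0.95)
print("optimal values:", values, "optimal policy:", policy)
```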

In this dissertation we will focus mainly on multi-agent MDP's, except for Chapter 4, in which an application of single-agent MDP's will be considered. The bulk of the literature on multi-agent MDP's focuses on the competitive case, in which the agents, or players, are contenders and act selfishly, with the aim of maximizing the long-run sum of their individual rewards. Oddly enough, the origin of multi-agent MDP's predates the single-agent case, owing to Shapley's visionary 1953 paper [81]. Under the competitive assumption, there is no hope that the players can coordinate their actions in

order to achieve the social optimum, which coincides with the maximization of the sum of the long-run rewards of all the players. Instead, in this scenario the celebrated Nash equilibrium (NE) (Nash, 1950 [64]) is typically utilized to predict the behaviour of conflicting players. It is defined as a set of strategies from which no player can unilaterally deviate and obtain a benefit. Many of the results available in the literature on competitive MDP's deal with the two-player case (see Filar and Vrieze, 1996 [32] for a survey).
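As a toy illustration of the Nash equilibrium definition above, the sketch below checks, for a hypothetical two-player game in strategic form, that a candidate pair of pure strategies admits no profitable unilateral deviation; the pay-off matrices are invented (a prisoner's-dilemma-like example) and are unrelated to the games studied in this dissertation.

```python
# Hypothetical two-player game: payoff[i][a1][a2] is the reward of player i
# when player 1 plays action a1 and player 2 plays action a2 (values invented).
payoff = [
    [[3, 0], [5, 1]],   # player 1 (action 0 = cooperate, 1 = defect)
    [[3, 5], [0, 1]],   # player 2
]

def is_pure_nash(a1, a2):
    """True iff (a1, a2) is a Nash equilibrium: no profitable unilateral deviation."""
    best1 = max(payoff[0][d][a2] for d in range(len(payoff[0])))      # player 1 deviations
    best2 = max(payoff[1][a1][d] for d in range(len(payoff[1][a1])))  # player 2 deviations
    return payoff[0][a1][a2] >= best1 and payoff[1][a1][a2] >= best2

print([(a1, a2) for a1 in range(2) for a2 in range(2) if is_pure_nash(a1, a2)])
# -> [(1, 1)]: mutual defection is the only pure Nash equilibrium here
```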

In the same spirit, in Chapter 2 we will deal with two-player Competitive MDP's, also called Stochastic Games, in which the game played in each state is zero-sum, i.e. for each pair of actions of the two players, the rewards earned by the players sum to zero. Hence, the game is purely antagonistic: no common agreement on any strategy solution can be reached by the players, since any benefit for one player is a loss for the other. In Section 2.1 we will focus on the computation of a pair of strategies at the Nash equilibrium in a two-player zero-sum Competitive MDP. The reward for each player is β-discounted over time, with β ∈ [0; 1). We will assume that, in each state, at most one player has an effective control over it, i.e. the Competitive MDP has perfect information. In essence, we will consider an MDP in which the control over rewards and transition probabilities switches from one player to the other, depending on the state in which the process finds itself at each time step. In this scenario, we will devise two algorithms which compute the strategies for both players at the NE, for all discount factors sufficiently close to 1. One of them is proved to converge in finite time. To do so, we basically combine two techniques. The former is by Raghavan and Syed (2003, [78]), who provided an algorithm to compute the strategies at the NE in the same scenario described above, for a fixed discount factor. The latter is by Hordijk, Dekker, and Kallenberg (1985, [41]), who first utilized linear programming on the field of rational functions with real coefficients to compute Blackwell optimal strategies in single-agent MDP's. Moreover, our algorithms produce the interval [β∗; 1) in which the strategies are optimal, i.e. at the NE. Thanks to the special structure of the Competitive MDP, such strategies are also optimal under the average criterion.
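The key technical device here is the comparison of values that are rational functions of the discount factor β, for all β sufficiently close to 1, in the ordered field of rational functions. The sketch below, based on sympy, shows one way such a comparison can be carried out; the two value functions are invented for illustration, and the routine is only a sketch of the ordering idea, not the algorithm of Section 2.1.

```python
import sympy as sp

beta, eps = sp.symbols('beta eps', positive=True)

def dominates_near_one(f, g):
    """True iff f(beta) >= g(beta) for all beta sufficiently close to 1 (beta < 1).

    f and g are rational functions of beta. Substituting beta = 1 - eps, the sign
    of f - g for all small eps > 0 is the sign of the leading term of its
    expansion as eps -> 0+, which is how the ordered field of rational functions
    ranks its elements "uniformly near 1".
    """
    diff = sp.cancel((f - g).subs(beta, 1 - eps))
    if diff == 0:
        return True
    lead = diff.as_leading_term(eps)          # e.g. c * eps**k for some integer k
    return sp.limit(sp.sign(lead), eps, 0, '+') > 0

# Hypothetical beta-discounted values of two stationary strategies in one state:
v1 = beta / (1 - beta) + 1
v2 = beta / (1 - beta) + 3 * beta - 2
print(dominates_near_one(v1, v2))  # True: v1 - v2 = 3 * (1 - beta) > 0 for beta < 1
```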

In Section 2.4 we show a possible application of the algorithms described in Section 2.1 to the study of a routing scenario by using Static Cooperative Game Theory (SCGT) under the transferable utility (TU) assumption. In SCGT, the players can make binding agreements both on the adopted strategy and on the sharing procedure of the resulting pay-off, which, under the TU assumption, can be shared in any manner among the players. Potentially, each subset of players can form a coalition. The objective of Static Cooperative Game Theory is two-fold: a social optimum solution has to be found, and the maximum pay-off has to be shared in a way that is satisfactory for all the players. Hence, all the players need to agree on a common contract, which has to be profitable, or at least fair, for all of them. Several pay-off allocation criteria have been studied over the years, such as the Core, the Shapley value, the Nucleolus, the τ-value, etc. (see [69] for a survey). Typically, in SCGT the pay-off allocation is a function of the value of each coalition, which is the pay-off that a coalition can earn on its own, without coordinating its actions with the other players. Such a value can be computed à la Morgenstern-von Neumann [96], i.e. as the minimum cost (or maximum pay-off) that a (sub-)coalition can guarantee if the anti-coalition punishes it by adopting an adverse behaviour. Therefore, the game between a coalition and the anti-coalition is still a zero-sum game.
Now, let us describe the routing game of Section 2.4. We consider a system where several providers share the same network and control the routing in disjoint sets of nodes; each link has a different cost for each provider. They provide connection towards a unique server (destination) to their customers. In order to carry out a successful transmission, they need to cooperate and coordinate their routing strategies so that the global transmission cost is minimized. Hence, in this case we assume that the players, i.e. the service providers, do not have a conflicting behaviour. Indeed, a selfish behaviour would backfire on each service provider, who needs the help of the others to deliver its packets. In this scenario, the transmission costs need to be shared among the service providers. If we translate the scenario into MDP jargon, the states of the associated MDP are the nodes, the actions of each player are the routing decisions in each node, and the reward associated with an action is the cost of the selected link. We study this situation by utilizing SCGT, and we would like to be able to allocate to each provider a cost that depends on its ability to cooperate with the others and carry out a successful transmission. For this purpose, we need to compute the value of each coalition of providers, and we opt for a max-min approach. Hence, the same model introduced in Section 2.1 emerges, and we show how to adapt the algorithms described in Section 2.1 to compute the value of each coalition of providers.
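To make the max-min definition of a coalition's value concrete, here is a toy sketch for a deterministic, discounted version of the routing game: at nodes controlled by the coalition the outgoing link minimizing the expected discounted cost is chosen, while at the remaining nodes the link is chosen adversarially. The small graph, the costs, the node ownership and the discount factor are all invented; the actual computation in Section 2.4 relies on the algorithms of Section 2.1.

```python
# Max-min ("à la von Neumann") value of a coalition in a toy routing game:
# the coalition routes so as to minimize the cost, the anti-coalition so as to maximize it.
links = {                       # node -> list of (next node, link cost for the coalition)
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("dest", 2.0), ("c", 1.0)],
    "c": [("dest", 1.0), ("a", 0.5)],
    "dest": [],                 # destination: absorbing, no further cost
}
coalition_nodes = {"a", "c"}    # nodes whose routing decisions the coalition controls

def coalition_cost(source, beta=0.99, n_iter=2000):
    """Value iteration on the zero-sum routing game: min at coalition nodes, max elsewhere."""
    cost = {node: 0.0 for node in links}
    for _ in range(n_iter):
        new_cost = {}
        for node, out in links.items():
            if not out:
                new_cost[node] = 0.0
                continue
            candidates = [c + beta * cost[nxt] for nxt, c in out]
            new_cost[node] = min(candidates) if node in coalition_nodes else max(candidates)
        cost = new_cost
    return cost[source]

print(coalition_cost("a"))   # discounted transmission cost the coalition can guarantee from node "a"
```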

We remark that, in the cooperative model described in Section 2.4, the interaction among the players is one-shot: the value of each coalition is computed off-line, once and for all, the transmission costs are shared among the players indifferently at the beginning or at the end of the game, and coalitions cannot form once the transmission has started. This is the reason why we call this approach a Long-Run Cooperative Game. A different scenario is depicted in Chapter 3, in which the interaction among the players continues over time, at each step of a Markov Decision Process, and coalitions may form throughout the game. We dub this scenario a Cooperative MDP. Our work on Cooperative MDP's fits in the more general context of

Dynamic Cooperative Game Theory (DCGT), which has been one of the most innovative and interesting topics in the field of Game Theory in the last few years. The bulk of the research on Cooperative Games, spurred by Morgenstern and von Neumann (1953, [96]), has focused ever since on the study of Static Games, modelling one-shot interactions among players. Nevertheless, most interaction situations among players, e.g. countries, firms, or users in a communication system, are not one-shot but continue over time, and the environment in which they take place is dynamic. This motivation stimulated the research on DCGT. The agreement is stipulated by the players at the beginning of the game, once and for all, mainly for two reasons. Firstly, renegotiating a contract at each step is costly, both in time and in money. Secondly, revising an agreement at each time step may result in a myopic policy that does not take into account the social optimum in the long run. As an example, curbing the emission of pollutants requires investments in cleaner technologies, which can be made only if the economic agents commit themselves to a far-sighted investment strategy.
In this dynamic context, reaching a common agreement which endures over time constitutes the real challenge, since coalitions are allowed to form throughout the game. One of the most important notions in dynamic games is the Time Consistency of a pay-off sharing (Petrosyan, 1977 [72]). According to it, at each intermediate step of the interaction process, the pay-off distribution in the subgame from that instant onwards must respect the same fairness criterion under which the agreement was stipulated in the first place. A more specific property for a dynamic pay-off allocation is the sequential Core, for which no party should be enticed to breach the agreement at any time step by adopting a more profitable non-cooperative mode of play from that instant onward. Kranich, Perea, and Peters (2005, [48]) introduced the weak sequential Core, restricting the focus to credible deviations, i.e. deviations that cannot be counter-blocked by any subcoalition. Another interesting concept in DCGT is Cooperation Maintenance, under which, at each time step, each coalition should be persuaded to postpone the decision of breaching the agreement. By induction, the social agreement is then stable over time. Mazalov and Rettieva (2010, [59]) first introduced this property in a fish-war setting.
Over the last decade, the research on DCGT has ramified into different branches. The first one is on Repeated Cooperative Games, in which the same game is played repeatedly over time. The papers by Oviedo (2000, [66]) and by Kranich, Perea, and Peters (2000, [47]) are the two independent pioneering works in this field. A second research branch deals with different states that succeed each other, and in each of them a different static game is played. The state transitions account for the dynamics of the environment in which the interaction takes place. Kranich, Perea, and Peters (2005, [48]) studied the Core solution concept for these kinds of games, in which the state succession is pre-determined. Predtetchinski (2007, [75]) considered the case in which an endogenous Markov chain determines the probability of transition among the states.
A third major research thread in DCGT is on Differential Games (see Zaccour 2008, [102] for a survey), in which the state of the game evolves continuously over time according to a deterministic differential equation, controlled by the players' strategies.

DCGT finds many applications in Economics and, not surprisingly, the connections between the two fields are interwoven. DCGT has capitalized on some concepts already existing in the vast Economics literature. For example, a concept similar to the Time Consistency property was elaborated by the 2004 Nobel Prize winners Kydland and Prescott (1977, [49]), about the strategy that a policymaker has to implement in order to trigger the desired response from the economic agents. Gale (1978, [35]) was the first to introduce the idea of the strong sequential Core, in a monetary economy model. Moreover, the notion of Core in an exchange economy with incomplete information has been thoroughly studied over the last few decades (see Forges et al. 2002, [34] for a survey), long before the theoretical foundations of DCGT were laid down.
On the other hand, the research on DCGT has been spurred by real economic issues, and in the last few years the advances in DCGT have provided a valuable tool for economists. Predtetchinski, Herings, and Peters (2002, [76]) studied the strong sequential Core in two-stage economies in which the trade in assets takes place at period zero and the trade in commodities occurs at period one. Herings, Predtetchinski, and Perea (2006, [38]) studied the weak sequential Core in the same context. We apply DCGT for the first time to a communication network model, in Section 3.2.

Now, let us describe our TU Cooperative MDP model. It is essentially a multi-agent MDP model in which the agents, or players, are allowed to form coalitions throughout the game. Let us first analyse the long-run game on the MDP from a Static Cooperative Game perspective. If we look at the game as a whole, we can still compute off-line the globally optimal strategies for the grand coalition of players via classic optimization techniques for single-agent MDP's, in which the grand coalition is considered to be the single agent. In the long-run game, the value of each coalition is computed as the long-run sum of rewards that the coalition can attain on its own, and a cooperative solution is assigned in the long-run game. So far, this formulation still relates to SCGT. Nevertheless, two issues arise. Firstly, the long-run Cooperative solution is actually the expected value of a random variable; hence, pragmatically, it is not clear how to allocate it to the players. Secondly, since the horizon of the game is infinite, or finite but of unknown duration, players may demand to be rewarded at each stage of the game, i.e. in each step of the controlled Markov chain. Therefore, the challenge becomes to devise a Cooperative Pay-off Distribution Procedure (CPDP) which distributes the long-run Cooperative solution throughout the game, in each of its states. From this seemingly harmless compromise, a number of issues arise. Since we assume that coalitions may form throughout the game, the CPDP has to satisfy all the players throughout the game. Indeed, if a long-run agreement has been stipulated by the players at the beginning of the game according to some (e.g. fairness) criterion, it is not clear whether the agreement can be sustained over time, i.e. whether such a criterion keeps holding after some steps or a renegotiation is needed. Real-life situations abound with examples in which a long-term contract signed in the first place needs to be renegotiated over time, e.g. the current economic situation in the European Union. The property that a CPDP needs to possess in order to avoid a renegotiation is Time Consistency (Petrosjan 1977, [72]). A second major issue concerns the stability of the grand coalition throughout the game. A (sub-)coalition might be enticed to breach the agreement at some point of the dynamic game, since from that instant onwards it can guarantee a better allocation on its own. In order to avoid this, the CPDP needs to belong to the sequential Core (Kranich, Perea, and Peters 2005, [48]). Intuitively, this property claims that, whenever a coalition faces the dilemma "do we break the agreement now or do we cooperate forever?", it should always opt for the second option, since it is more profitable. Moreover, we demand that the CPDP satisfy the Cooperation Maintenance property (Mazalov and Rettieva 2010, [59]). Intuitively, this property suggests that, at each time step, if any coalition faces the dilemma "do we break the agreement now or in one step?", then it should choose the second option. Finally, we consider the presence of greedy players, having a myopic perspective of the game. In this case, the CPDP needs to belong to the Core of each static game played in each state of the MDP, in order to satisfy the greedy players as well. Then, we devise a CPDP for MDP's, dubbed MDP-CPDP, and we find conditions under which all the properties listed above are fulfilled.
Remarkably, we find that the Cooperation Maintenance property is a proper refinement of the concept of sequential Core on Cooperative MDP’s.
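As a toy illustration of the "greedy players" requirement above (the distributed pay-off must lie in the Core of the static game played in the current state), the following sketch checks Core membership for a small three-player TU game; the coalition values are invented and do not come from this dissertation.

```python
from itertools import combinations

players = ("1", "2", "3")
# Hypothetical transferable-utility values of the coalitions in one state of the MDP.
v = {
    (): 0.0,
    ("1",): 1.0, ("2",): 1.0, ("3",): 1.0,
    ("1", "2"): 3.0, ("1", "3"): 3.0, ("2", "3"): 3.0,
    ("1", "2", "3"): 6.0,
}

def in_core(allocation, tol=1e-9):
    """Check efficiency and coalitional rationality of a pay-off allocation."""
    if abs(sum(allocation.values()) - v[players]) > tol:
        return False                                    # not efficient
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            if sum(allocation[p] for p in coalition) < v[coalition] - tol:
                return False                            # this coalition would deviate
    return True

print(in_core({"1": 2.0, "2": 2.0, "3": 2.0}))   # True
print(in_core({"1": 4.0, "2": 1.0, "3": 1.0}))   # False: coalition ("2", "3") gets 2 < 3
```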

In Section 3.2 we apply some of the concepts developed in Section 3.1 to a wireless communication scenario. We consider a Gaussian Multiple Access Channel (MAC) in which the channel is quasi-static, i.e. it varies slowly enough to be assumed constant for the whole duration of a codeword. Moreover, we assume that it follows an endogenous finite-state Homogeneous Markov Chain (HMC) in which the state transitions occur at the end of each coherence period. We allocate a rate to each user in each state of the Markov chain. We stress that, in this scenario, the transition probabilities among the channel states of the system do not depend on the users' transmission strategies. In this sense, the scenario is simplified with respect to the one in Section 3.1. On the other hand, a new complication is brought on by the introduction of the Non-Transferable Utility (NTU) assumption, under which the rate cannot be shared in an arbitrary manner among the users, but only within the Shannon capacity region.
In this scenario we tackle the issue of allocating the rate in each state in a globally optimal way, i.e. such that the sum-rate is maximum both in each state and in the long-run process. We call M this globally optimal set. We investigate two procedures to select an allocation in M: the former, called bottom-up, prescribes to fix first the static allocations, while the latter, dubbed top-down, suggests to select first the long-run allocation and then to derive the associated static allocations. The latter procedure would be more useful, since it permits adhering to a selection criterion in the long-run process, which is the one really concerning the users, who are endowed with a long-term perspective of the game. Unfortunately, this procedure does not always lead to feasible allocations, and we offer a remedy for it. Then we address the issue of allocating a fair rate to the users in the dynamic process. In the static case, the criteria of max-min, proportional, and α-fairness are always well defined [5]. In our dynamic case, the scenario is complicated by the fact that we demand that the fairness criterion hold throughout the process, i.e. it has to be time consistent. It is not always possible to find a time-consistent fair allocation; nevertheless, we give a sufficient condition for its existence. Then, we utilize some tools developed in Section 3.1 to measure the users' satisfaction with the assigned rate. We find that M coincides with the set of rate allocations belonging to the sequential Core. Moreover, M is exactly the set of Cooperation Maintaining solutions.
In Section 3.3 we deal with Cooperative MDP's under the TU assumption, in which the transition probabilities among the states do not depend on the players' actions. Hence, as in Section 3.2, an endogenous Markov chain governs the stochastic evolution of the state of the system, so we dub this scenario a Markovian TU cooperative game. We deal with this model from a perspective different from Sections 3.1 and 3.2. In fact, we do not study the pay-off allocation in each state, since, by a linearity argument, this is a trivial issue. Instead, we tackle a complexity issue relative to the computation of the Shapley value (Shapley 1953, [82]) in this kind of game. More specifically, we provide three methods to compute a confidence interval for the Shapley-Shubik index in Markovian games (SSM). We extend the approach of Bachrach et al. in 2010 [14] for static games to Markovian games.
The Shapley-Shubik index (Shapley and Shubik, 1954 [85]) is the Shapley value applied to a simple game, i.e. a game in which the coalition values are binary (0 or 1). The Shapley-Shubik index proves to be particularly suitable to assess a priori the power of the members of a legislative committee, and has many applications in politics (see Taylor and Pacelli 2008, [91] for an overview). In our case, the game in each state is simple. We prove that an exponential number of queries is necessary for any deterministic algorithm even to approximate SSM with polynomial accuracy. Motivated by this, we propose three different randomized approaches to compute a confidence interval for SSM. Their complexity does not even depend on the number of players. Such approaches also hold for the classic Shapley value of any cooperative Markovian game. The first confidence interval, SCI, relies on the static assumption that the estimator agent has access to the coalition values in all the states at the same time, even before the Markov process initiates. Although SCI relies on an impractical assumption, it is still a valid benchmark for the performance of the approaches yielding the other two proposed confidence intervals, dubbed DCI1 and DCI2. DCI1 and DCI2 both also hold under the more realistic dynamic assumption that the estimator agent learns the value of the coalitions along the course of the game. We propose a straightforward way to optimize the tightness of DCI1, and we compare the three proposed approaches in terms of the tightness of the confidence interval. Finally, we provide a complexity/accuracy trade-off for our randomized algorithms, which holds for any cooperative Markovian game.
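To illustrate the flavour of the randomized approaches above, the following sketch estimates a player's Shapley value by sampling random permutations and builds a Hoeffding-style confidence interval; the three-player game is invented, and the sketch is not the SCI/DCI1/DCI2 estimators of Section 3.3.

```python
import math
import random

# Hypothetical static TU game on three players, with marginal contributions in [0, 1].
players = ("1", "2", "3")
v = {
    frozenset(): 0.0,
    frozenset({"1"}): 0.2, frozenset({"2"}): 0.0, frozenset({"3"}): 0.1,
    frozenset({"1", "2"}): 0.7, frozenset({"1", "3"}): 0.6, frozenset({"2", "3"}): 0.3,
    frozenset({"1", "2", "3"}): 1.0,
}

def shapley_confidence_interval(player, n_samples=20000, delta=0.05):
    """Monte Carlo estimate of a player's Shapley value with a Hoeffding interval.

    Each sample draws a uniformly random ordering of the players and records the
    marginal contribution of `player`; since contributions lie in [0, 1],
    Hoeffding's inequality gives a (1 - delta) confidence radius that shrinks
    like 1/sqrt(n_samples), independently of the number of players.
    """
    total = 0.0
    for _ in range(n_samples):
        order = list(players)
        random.shuffle(order)
        predecessors = frozenset(order[: order.index(player)])
        total += v[predecessors | {player}] - v[predecessors]
    estimate = total / n_samples
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))
    return estimate - radius, estimate + radius

print(shapley_confidence_interval("1"))   # brackets the exact Shapley value of player 1
```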

Finally, in Chapter 4 we utilize an MDP formulation with an uncountable state space to solve a dynamic selection problem among different transmission channels. In the learning algorithms literature, this approach goes under the label of Multi Armed Bandit (MAB), in which there exists a pool of several random processes, or arms, that can be observed one at a time. The total reward is the (discounted or averaged) sum of the instantaneous values of the observed arms. Based on the past observations and on the a priori statistical knowledge of each arm, the goal is to maximize the expected total reward. The so-called Rested MAB model assumes that the state of an arm which has not been pulled does not evolve in the next step. In this case, a brilliant optimization solution was found by Gittins (1989, [37]). He observed that the curse of state dimensionality in the number of arms caused by an MDP formulation can be overcome by computing an index for each arm. The optimal solution consists in simply selecting, in each decision step, the arm with the highest index. Thus, the complexity of the solution drops from exponential to linear in the number of arms.
We deal instead with Restless MAB's, in which the arms evolve even when not observed, which is of course the case for the attenuation coefficients of a transmission channel. In this case, an efficient solution has not been found yet. Whittle (1988, [100]) proposed an index whose optimality, though, is not guaranteed. Papadimitriou and Tsitsiklis proved in [67] that Restless MAB's are PSPACE-hard in general. Hence, in order to deal with Restless MAB's, one needs to resort to an MDP formulation with an uncountable state space, or equivalently to a Partially Observable MDP model. Typically, the state of the decision problem at each time step is the instantaneous value of each arm at the last time step. Since the state of the unobserved arms is unknown, the decision agent has at its disposal only their probability distribution, conditioned on the last observation.
In our case, the processes to be selected at each time are independent autoregressive processes of order 1, i.e. AR(1). They represent the slow-fading channel attenuations on the different frequency bands of a multi-access wireless network that a transmitter can utilize to communicate on. The goal is to maximize the long-run average Signal-to-Noise Ratio (SNR) for the user. Note that the user is assumed to know all the parameters of the AR(1) processes, hence also their (unconditioned) expected values. A naïve solution would prescribe to invariably select the channel with the highest expected value. Roughly speaking, this procedure is highly suboptimal if the expected values are similar to each other. Indeed, it is possible to exploit the autocorrelation of the channels to select the one which is instantaneously the best. We propose two heuristic algorithms with linear complexity in the number of arms to solve the Restless MAB problem. The former, myopic, selects at each time step the channel with the highest conditional expected reward. The latter, randomized, selects the seemingly sub-optimal arms with a certain probability. We find that the myopic strategy performs close to the optimal solution when all the arms are almost statistically equivalent. When one channel is characterized by a much higher autocorrelation, the randomized approach outperforms the myopic approach and is quasi-optimal.
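A minimal sketch of the myopic heuristic just described is given below, under invented AR(1) parameters: the predicted gain of each channel is its conditional expectation given the last observation (the gap to the mean shrinks by the autocorrelation coefficient at every step since that observation), and the channel with the highest prediction is sensed at each step. It is only an illustration of the idea, not the evaluation framework of Chapter 4.

```python
import random

class Channel:
    """Hypothetical AR(1) model of a slow-fading channel gain (illustrative parameters)."""
    def __init__(self, mean, rho, sigma):
        self.mean, self.rho, self.sigma = mean, rho, sigma
        self.gain = mean                       # true current gain (hidden between observations)
        self.last_obs, self.age = mean, 0      # last observed gain and its age in steps

    def evolve(self):
        self.gain = self.mean + self.rho * (self.gain - self.mean) + random.gauss(0.0, self.sigma)
        self.age += 1

    def predicted_gain(self):
        # Conditional expectation of the current gain given the last observation.
        return self.mean + self.rho ** self.age * (self.last_obs - self.mean)

    def observe(self):
        self.last_obs, self.age = self.gain, 0
        return self.gain

def myopic_policy(channels, horizon=10000):
    """Sense, at each step, the channel with the highest predicted gain; return the average sensed gain."""
    total = 0.0
    for _ in range(horizon):
        for ch in channels:
            ch.evolve()
        best = max(channels, key=lambda ch: ch.predicted_gain())
        total += best.observe()
    return total / horizon

channels = [Channel(mean=1.0, rho=0.9, sigma=0.1), Channel(mean=1.05, rho=0.3, sigma=0.1)]
print(myopic_policy(channels))
```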
We then propose a Competitive MDP formulation, in which different users access the same pool of channels. We adapt the two algorithms above to approximate the best response of a user against the strategy of a second user who is oblivious to the presence of the first.

Introduction (en français)

Les Processus de D´ecision Markoviens (MDP) offrent un cadre th´eorique tr`es utile pour un certain nombre de probl`emes d´ecisionnel pratiques, dans lesquelles l’´evolution du syst`eme stochastique d´epend cependant de la d´ecision prise par certains agents contrˆoleurs (ou joueurs). Plus pr´ecis´ement, il ex- iste un ensemble d’´etats, et dans chacun d’eux les joueurs ont un ensemble d’actions `aleur disposition. L’ensemble des actions choisies conjointement par les joueurs dans un ´etat d´etermine `ala fois une r´ecompense instantan´ee pour chacun d’eux et une distribution de probabilit´esur les ´etats suivants. En r`egle g´en´erale, les r´ecompenses sont soit additionn´ees dans le temps, ou actualis´ees g´eom´etriquement et ensuite additionn´ees, ou autrement on en prends la moyenne. Les domaines dans lesquels MDP a trouv´e des ap- plications r´eussies vont de l’informatique, de l’ing´enierie, de l’´economie, la m´edecine, la biologie (voir White, 1993 [99] pour une vue d’ensemble et Altman, 2002 [3] pour des applications aux r´eseaux de communication). Les MDP ont ´et´elargement ´etudi´es au cours des derni`eres d´ecennies, surtout dans le cas mono-agent. Ses origines remontent `aBellman (1957, [18]) et Howard (1960, [42]). En r`egle g´en´erale, l’objectif principal des MDP avec mono-agent est de trouver la strat´egie optimale de d´ecision que le contrˆoleur doit mettre en oeuvre afin de maximiser la somme `a long terme de r´ecompenses (voir Puterman, 1994 [77] pour une vue d’ensemble). Il est int´eressant de noter que ce mod`ele peut ´egalement fournir des r´esultats pertinentes sur l’optimalit´ede la d´ecision des contrˆoleurs quand la longueur de l’horizon varie, obtenus en ajustant le facteur d’actualisation entre [0; 1) (voir par exemple [41]). L’optimalit´ede Blackwell (Blackwell, 1962 [21]) est particuli`erement important. Elle est la propri´et´ede ces strat´egies qui sont optimales pour tous les facteurs d’actualisation suffisamment proche de 1. Aussi, ces strat´egies se r´ev`elent ˆetre optimale selon le crit`ere de la moyenne [77].

Dans cette th`ese, nous nous concentrerons principalement sur les syst`emes MDP avec multi-agents, sauf pour le Chapitre 4, dans lesquels nous `etudions une application des MDP avec mono-agent. La majeure partie de la litt´erature sur les syst`emes MDP avec multi-agents se concentre sur la situation con- currentielle dans laquelle les agents ou les joueurs agissent ´ego¨ıstement, dans

11 12 Introduction le but de maximiser la somme `along terme des r´ecompenses individuelles. Curieusement, l’origine des syst`emes MDP avec multi-agents est ant´erieure le cas mono-agent, en raison de papier visionnaire de Shapley en 1953 [81]. Dans l’hypoth`ese competitive, il n’ya aucun espoir que les joueurs puissent coordonner leurs actions afin d’atteindre l’optimum social, qui co¨ıncide avec la maximisation de la somme des r´ecompenses `along terme pour chaque joueur. Au lieu de cela, dans ce sc´enario, le c´el`ebre ´equilibre de Nash (NE) (Nash, 1950 [64]) est g´en´eralement utilis´epour pr´edire le comportement des acteurs en conflit. Aucun jouer ne peut modifier seul sa strat´egie de Nash sans affaiblir sa position personnelle. La plupart des r´esultats disponibles dans la litt´erature sur les MDP comp´etitives traitent avec le cas de deux joueurs (voir Filar et Vrieze, 1996 [32] pour une vue d’ensemble).

Dans le mˆeme esprit, dans le Chapitre 2 nous allons ´etudier les MDPs comp´etitifs avec deux joueurs, aussi appel´es “jeux stochastiques”, dans lesquels le jeu pratiqu´edans chaque ´etat est `asomme nulle. C¸a signifie que pour chaque paire d’actions pour le deux joueurs, la somme des r´ecompenses gagn´ees par les joueurs est nulle. Par cons´equent, le jeu est purement antag- oniste: pas de consensus sur les strat´egies peut ˆetre atteint par les joueurs, puisque un profit pour un joueur est une perte pour l’autre. Dans la Sec- tion 2.1 nous allons nous concentrer sur le calcul d’une paire de strat´egies `a l’´equilibre de Nash dans un MDP comp´etitif avec deux joueurs et `asomme nulle. La r´ecompense pour chaque joueur est β-actualis´ee au fil du temps, avec β [0, 1). Nous supposons que, dans chaque ´etat, au plus un joueur ∈ a un contrˆole effectif, i.e. le MDP est ´ainformation parfaite. Nous allons consid´erer un MDP dans lequelle le contrˆole sur les r´ecompenses et les prob- abilit´es de transition passe d’un joueur `al’autre, en fonction de l’´etat dans lequel se trouve le processus. Dans ce sc´enario, nous allons mettre au point deux algorithmes qui calculent les strat´egies pour les deux joueurs `ala NE, pour tous les facteurs d’actualisation suffisamment proche de 1. Nous prou- vons que l’un d’eux converge en un temps fini, en combinant essentiellement deux techniques. Le premier est par Raghavan et Syed (2003, [78]), qui four- nit un algorithme pour calculer les strat´egies au NE dans le mˆeme sc´enario d´ecrit ci-dessus, pour un facteur d’actualisation fixe. Ce dernier est d´efini par Hordijk, Dekker et Kallenberg (1985, [41]), qui ont utilis´ela programma- tion lin´eaire sur le corps des fonctions rationnelles `acoefficients r´eels pour calculer le strat´egies optimales de Blackwell pour MDP avec mono-agent. En outre, nos algorithmes calculent l’intervalle [β∗; 1) dans lequelle les strat´egies sont Nash optimales. Grace `ala structure particuli`ere des MDP comp´etitifs, ces strat´egies sont ´egalement optimal selon le crit`ere de la moyenne.

Dans la Section 2.4, nous montrons une ´eventuelle application des al- gorithmes d´ecrits dans la Section 2.1 pour l’´etude d’un sc´enario de routage statique en utilisant la th´eorie des jeux coop´erative (SCGT) sous l’hypoth`ese Introduction 13 de utilit´etransf´erable (TU). En SCGT, les joueurs peuvent faire des accords contractuels `ala fois sur la strat´egie adopt´ee et sur la proc´edure de partage du r´esultant payoff, ce qui, dans l’hypoth`ese TU, peut ˆetre partag´een au- cune mani`ere parmi les joueurs. Potentiellement, chaque sous-ensemble de joueurs peut former une coalition. L’objectif de la th´eorie des jeux statiques coop´eratifs est double: une solution optimum social est `a d´eterminer, et le maximum pay-off doit ˆetre partag´ee de mani`ere satisfaisante pour tous les joueurs. Par cons´equent, tous les joueurs doivent se mettre d’accord sur un contrat commun, qui doit d’ˆetre avantageux, ou du moins juste, pour chacun d’eux. Plusieurs crit`eres d’attribution du payoff qui ont ´et´e´etudi´es au cours des ann´ees, comme le Core, la valeur de Shapley, le nucl´eolus, la τ-valeur, etc. (voir [69] pour une vue d’ensemble). En r`egle g´en´erale, en SCGT le pay-off est une fonction de la valeur de chaque coalition, c’eat `adire la r´ecompense qu’une coalition peut gagner toute seule, sans la possibilit´ede coordonner ses actions avec les autres joueurs. Cette valeur peut ˆetre calcul´ee `ala Morgenstern-von Neumann [96], i.e. le coˆut minimum (ou le payoff maximum) qu’une coalition peut garan- tir si l’anti-coalition la punit en adoptant un comportement antagoniste. Par cons´equent, le jeu entre une coalition et la anti-coalition est toujours `a somme nulle. Maintenant, nous allons d´ecrire notre jeu de routage de la Section 2.4. Nous consid´erons un syst`eme o`uplusieurs fournisseurs qui se partagent le mˆeme r´eseau et contrˆolent le routage dans des ensembles disjoints de noeuds et chaque lien a un coˆut diff´erent pour chaque fournisseur. Ils fournissent une connexion vers un serveur unique (destination) `aleurs clients. Afin de r´ealiser une transmission r´eussie, ils ont besoin de coop´erer et de coordonner leurs strat´egies de routage de mani`ere que le coˆut global de transmission est r´eduit au minimum. Par cons´equent, dans ce cas, nous supposons que les joueurs, i.e. les fournisseurs de services, n’ont pas un comportement antag- oniste. En effet, un comportement ´ego¨ıste aurait un effet contre-productif pour chacun des fournisseurs de services, qui ont besoin de l’aide des autres pour livrer ses paquets. Dans ce sc´enario, les coˆuts de transmission doivent ˆetre partag´es entre les prestataires de services. Si nous traduisons le sc´enario dans le jargon MDP, les ´etats du MDP associ´esont les noeuds et les ac- tions de chaque joueur sont les d´ecisions de routage dans chaque noeud, et la r´ecompense associ´ee `aune action est le coˆut du lien s´electionn´e. Nous ´etudions cette situation en utilisant SCGT pour attribuer un coˆut `achaque fournisseur, qui d´epend de leur capacit´e`acoop´erer avec les autres et de r´ealiser une transmission r´eussie. Pour cette raison, nous avons besoin de calculer la valeur de chaque coalition de fournisseurs, et nous optons pour une approche max-min. Par cons´equent, le mˆeme mod´ele pr´esent´edans la Section 2.1 apparait, et nous montrons comment adapter les algorithmes d´ecrits dans la Section 2.1 pour calculer la valeur de chaque coalition de fournisseurs. 14 Introduction

On remarque que, dans le mod`ele coop´eratif d´ecrit dans la Section 2.4, l’interaction entre les joueurs est ponctuelle: la valeur de chaque coalition est calcul´ee hors-ligne, une fois pour toutes, les coˆuts du transport sont partag´es entre les joueurs au d´ebut ou `ala fin du jeu, indiff´eremment, et des coalitions ne peuvent pas former quand la transmission a commenc´ee. C’est la raison pour laquelle nous appelons cette approche “jeu coop´eratif `along terme”. Un sc´enario diff´erent est d´ecrit dans le Chapitre 3, dans lequel l’interaction entre les joueurs se poursuit au fil du temps, `achaque ´etape d’un processus d´ecisionnel de Markov, et des coalitions peuvent se former tout au long du jeu. Nous app´elons ce sc´enario “MDP coop´eratif”. Notre travail sur les MDP coop´eratifs s’inscrit dans le contexte plus g´en´eral de la th´eorie des jeux coop´eratifs dynamiques (DCGT), qui a ´et´el’un des sujets les plus novatrices et int´eressantes dans le domaine de la th´eorie des jeux dans les derni`eres ann´ees. La majeure partie de la recherche sur les jeux coop´eratifs, stimul´ee par Morgenstern et von Neumann (1953, [96]), a mis l’accent sur l’´etude des jeux statiques, qui mod´elisent les interactions ponctuelles entre les joueurs. N´eanmoins, la plupart des situations d’interaction entre les joueurs, e.g. pays, entreprises ou utilisateurs dans un syst`eme de communication, ne sont pas ponctuelles, mais continuent au fil du temps dans un environnement dynamique. Cette motivation a stimul´ela recherche sur DCGT. L’accord est ´etabli par les joueurs au d´ebut du match, une fois pour toutes, princi- palement pour deux raisons. Tout d’abord, la ren´egociation d’un contrat `achaque ´etape est coˆuteuse, en temps et en argent. Deuxi`emement, la r´evision d’un accord `achaque ´etape temporelle peut amener `aune politique myope, qui ne tient pas compte de l’optimum social `along terme. A titre d’exemple, dans la lutte contre les ´emissions de polluants, il faut investir dans les technologies propres, qui ne peut ˆetre faite que si les agents ´economiques eux-mˆemes s’engagent dans une strat´egie d’investissement clairvoyante.

Dans ce contexte dynamique, parvenir `aun accord commun qui est durable dans le temps constitue le v´eritable d´efi, puisque les coalitions sont autoris´ees `aformer tout au long du jeu. L’une des notions les plus impor- tantes dans les jeux dynamiques est la coh´erence dans le temps d’un partage du payoff (Petrosyan, 1977 [72]). Selon lui, `achaque ´etape interm´ediaire du processus d’interaction, la distribution du payoff dans le sous-jeu `apartir de cet instant doit respecter le crit`ere de l’´equit´emˆeme en vertu du quel le contrat a ´et´estipul´een premier lieu. Une propri´et´e plus sp´ecifique pour une allocation dynamique du payoff est le Core s´equentiel, pour laquelle aucun parti ne devrait pas ˆetre tent´ede violer l’accord `a n’importe quel instant temporel, `apartir de ce moment pr´ef´erant adopter un approche non coop´eratif plus rentable. Kranich, Perea, et Peters (2005, [48]) ont introduit Introduction 15 le Core faible s´equentielle, qui restreint la recherche`a des ´ecarts cr´edibles, i.e. les ´ecarts qui ne peuvent pas ˆetre contre-bloqu´ee par une sous-coalition. Un autre concept int´eressant pour DCGT est celui de la maintenance de la coop´eration, pour lesquel, `achaque ´etape temporelle, chaque coalition de- vrait ˆetre persuad´ee `adiff´erer la d´ecision de violer l’accord. Par induction, l’accord social est stable dans le temps. Mazalov et Rettieva (2010, [59]) ont introduit pour la pr´emi`ere fois cette propri´et´edans un cadre de “fish-war”. Au cours de la derni`ere d´ecennie, la recherche sur DCGT s’est ramifi´ee en diff´erentes branches. Le premier est sur les jeux coop´eratifs r´ep´et´es, dans lesquels le mˆeme jeu se joue `aplusieurs reprises au fil du temps. Les articles par Oviedo (2000, [66]) et par Kranich, Perea, et Peters (2000, [47]) sont les deux travaux pionniers ind´ependants dans ce domaine. Une autre branche de recherche s’occupe de diff´erents ´etats qui se succ`edent, et dans chacun d’eux un jeu statique diff´erent est jou´e. Les transition d’´etats r´epr´esent la dynamique de l’environnement dans lequel l’interaction a lieu. Kranich, Perea, et Peters (2005, [48]) ont ´etudi´ele concept de solution du Core pour ces jeux, dans lesquels la succession des ´etats est pr´ed´etermin´ee. Predtetchinski (2007, [75]) a consid´er´equ’une chaˆıne de Markov endog`ene d´etermine la probabilit´ede transition entre les ´etats. Un troisi`eme axe ma- jeur de recherche en DCGT est r´epresent´epar les jeux diff´erentiels (voir Zaccour 2008, [102] pour une vue d’ensemble), dans lesquels l’´etat du jeu ´evolue continuellement au fil du temps selon une ´equation diff´erentielle d´eterministe, contrˆol´ee par les strat´egies des joueurs.

DCGT s’applique `anombreuses applications en ´economie, et sans sur- prise, les connexions entre les deux champs sont intimement li´ees. DCGT a capitalis´esur certains concepts d´ej`aexistants dans la litt´erature ´economique. Par exemple, un concept similaire `ala propri´et´ede coh´erence dans le temps a ´et´e´elabor´epar le prix Nobel de 2004 Kydland et Prescott (1977, [49]), `apro- pos de la strat´egie qu’un d´ecideur doit mettre en oeuvre afin de d´eclencher la r´eponse souhait´ee des agents ´economiques. Gale (1978, [35]) fut le premier `aintroduire l’id´ee de “Core s´equentielle forte”, dans un mod`ele d’´economie mon´etaire. En outre, la notion de Core dans une ´economie d’´echange d’informations incompl`etes a ´et´esoigneusement ´etudi´ee au cours des derni`eres d´ecennies (voir Forges et al. 2002, [34] pour une vue d’ensemble), bien avant que les fondements th´eoriques de DCGT ont ´et´ed´efinies. D’autre part, la recherche sur DCGT a ´et´e stimul´ee par des questions ´economiques r´eels, et dans les derni`eres ann´ees, les progr`es sur DCGT ont repr´esent´eun outil valable pour les ´economistes. Predtetchinski, Herings, et Peters (2002, [76]) ont ´etudi´ele Core forte s´equentielle en ´economies `adeux niveaux dans lesquelles le commerce des biens a lieu dans la pr´emi´ere p´eriode et le commerce des produits se pr´esente dans la p´eriod successive. Herings, Predtetchinski et Perea (2006, [38]) ont ´etudi´ele Core faible s´equentiel dans le mˆeme contexte. 16 Introduction

Nous appliquons DCGT pour la première fois à un modèle de réseau de communication, dans la Section 3.2.

Maintenant, nous allons d´ecrire notre mod`ele coop´erative MDP sous l’hypoth`ese TU. Il s’agit essentiellement d’un mod`ele multi-agent MDP dans lequel les joueurs sont autoris´es `aformer des coalitions tout au long du jeu. Nous analysons d’abord le jeu `along terme sur le MDP dans une perspective d’un jeu coop´eratif statique. Si on regarde le jeu dans son en- semble, nous pouvons toujours calculer hors-ligne la strat´egie optimale pour la grande coalition de joueurs, par des techniques d’optimisation classiques pour MDPs mono-agent, dans lequel la grande coalition est consid´er´ecomme le seul agent. Dans le jeu `along terme, la valeur de chaque coalition est calcul´ee comme la somme de long terme de r´ecompenses que chaque coali- tion peut atteindre tout seul, et une solution coop´erative est attribu´ee dans le jeu `along terme. Jusqu’`apr´esent, cette formulation se rapporte encore `aSCGT. N´eanmoins, deux questions se posent. Tout d’abord, la solution coop´erative `along terme est en fait la valeur pr´evue d’une variable al´eatoire, et par cons´equent, de faon pragmatique, il n’est pas clair comment la r´epartir entre les joueurs. Deuxi`emement, quand l’horizon du jeu est infini, ou fini mais avec une dur´ee inconnue, les joueurs peuvent exiger d’ˆetre r´ecompens´e au chaque ´etape du jeu, c’est `adire `achaque ´etape de la chaˆıne de Markov. Par cons´equent, le d´efi est de mettre au point une proc´edure de distribu- tion coop´erative du payoff (CMDP) qui distribue la solution coop´erative `a long terme au long du jeu, dans chacun de ses ´etats. De ce compromis ap- paremment inoffensif, un certain nombre de questions se posent. Puisque nous supposons que les coalitions peuvent se former tout au long du jeu, le CMDP doit satisfaire tous les joueurs, tout au long du match. En effet, si un accord `along terme a ´et´estipul´epar les joueurs au d´ebut du jeu en fonc- tion de certains crit`eres (par exemple, l’´equit´e), il n’est pas clair si l’accord peut ˆetre maintenu au fil du temps, i.e. si un tel crit`ere se garde apr`es certaines ´etapes, ou une ren´egociation de l’accord est n´ecessaire. Des situ- ations r´eelles abondent en exemples dans lesquels un contrat `along terme conclu en premier lieu doit ˆetre ren´egoci´eau cours du temps, e.g. la situation ´economique actuelle dans l’Union Europ´eenne. La propri´et´eque CMDP doit poss´eder afin d’´eviter une ren´egociation est la coh´erence temporelle (Petros- jan 1977, [72]). Un deuxi`eme point important est la stabilit´ede la grande coalition, tout au long du match. Une coalition pourraient ˆetre tent´ee de violer l’accord `aune certaine `etape du jeu dynamique, car elle ne peut garan- tir une meilleure allocation elle-mˆeme `apartir de cet instant. Afin d’´eviter cela, le CMDP a besoin d’appartenir au Core s´equentiel (Kranich, Perea, et Peters 2005, [48]). Intuitivement, cette propri´et´edeclare que, chaque fois qu’une coalition est confront´ee au dilemme “On romp le contrat maintenant ou on coop´ere pour toujours?”, alors elle devrait opter pour la deuxi`eme option puisqu’elle est plus rentable. De plus, nous exigeons que le CMDP Introduction 17 satisfait la propri´et´ede la maintenance de la coop´eration (Mazalov et Ret- tieva 2010, [59]). Intuitivement, cette propri´et´esugg`ere que, `achaque ´etape temporelle, si une coalition est confront´ee au dilemme “On romp le contrat maintenant ou dans une ´etape temporelle?”, alors il faut choisir la deuxi`eme option. 
Enfin, nous consid´erons la pr´esence de joueurs avides, ayant une per- spective myope du jeu. Dans ce cas, le CMDP a besoin d’appartenir au Core de chaque jeu statique jou´edans chaque ´etat du MDP, afin de contenter les joueurs avides aussi. Ensuite, nous ´elaborons un CMDP pour le MDP, bap- tis´eMDP-CMDP, et nous trouvons des conditions pour lesquelles toutes les propri´et´es enrˆol´es ci-dessus sont satisfaites. Fait remarquable, nous consta- tons que la propri´et´ede la maintenance de la coop´eration est un raffinement du concept du Core s´equentiel pour les MDPs coop´eratifs.

In Section 3.2 we apply some of the concepts developed in Section 3.1 to a wireless communication scenario. We consider a Gaussian multiple access channel (MAC) in which the channel is quasi-static, i.e. it varies slowly enough to be assumed constant for the whole duration of a codeword. Moreover, we assume that it follows an endogenous homogeneous Markov chain (HMC) with a finite set of states, in which the state transitions occur at the end of each coherence period. We assign a coding rate to each user in each state of the Markov chain. We stress that, in this scenario, the transition probabilities among the channel states of the system do not depend on the users' transmission strategies. In this sense, the scenario is simpler than the one of Section 3.1. On the other hand, a new complication is caused by the introduction of the non-transferable utility (NTU) assumption, under which the coding rate cannot be shared arbitrarily among the users, but only within the Shannon capacity region. In this scenario we address the issue of allotting the coding rate in each state in a globally optimum way, i.e. such that the sum of the coding rates is maximal both in each state and in the long-run process. We call M this set of optimal coding rate allocations. We study two procedures for selecting an allocation in M: the first one, called "bottom-up", prescribes to fix the static allocations first, while the second one, dubbed "top-down", suggests to select the long-run allocations first and then to compute the associated static allocations. The latter procedure would be the more useful one, since it allows one to respect the selection criterion in the long-run process, which is the one that matters to the users, who are endowed with a long-run perspective of the game. Unfortunately, this procedure does not always generate feasible allocations, and we offer a workaround for this. Next, we address the issue of allocating a fair coding rate to the users in the dynamic process. In the static case, the max-min, proportional fairness, and alpha-fairness criteria are always well defined [5]. In our dynamic case, the scenario is complicated by the fact that we require the fairness criterion to be preserved throughout the process. It is not always possible to find a constantly fair allocation; nevertheless, we give a sufficient condition for its existence. Then, we use tools developed in Section 3.1 to measure the satisfaction of the users with the assigned coding rate. We find that M coincides with the set of coding rate allocations belonging to the Sequential Core. Moreover, M is exactly the set of coding rate allocations satisfying the cooperation maintenance property.

In Section 3.3 we deal with cooperative MDPs under the TU assumption, in which the transition probabilities among the states do not depend on the players' actions. As in Section 3.2, an endogenous Markov chain governs the stochastic evolution of the system state, hence we call this scenario a Markovian cooperative game. We deal with this model from a perspective different from Sections 3.1 and 3.2. In fact, we do not study the distribution of the payoff in each state, since, by a linearity argument, it is a trivial issue. Instead, we address a complexity problem related to the computation of the Shapley value (Shapley 1953, [82]) in this kind of games. More precisely, we propose three methods to compute a confidence interval for the Shapley-Shubik index in Markov games (SSM). We extend the approach of Bachrach et al. in 2010 [14] for static games to Markov games. The Shapley-Shubik index (Shapley and Shubik, 1954 [85]) is the Shapley value applied to a simple game, i.e. a game in which the coalition values are binary, i.e. 1 or 0. The Shapley-Shubik index turns out to be particularly suitable for assessing a priori the power of the members of a legislative committee, and it has numerous applications to politics (see Taylor and Pacelli 2008, [91] for an overview). In our case, the game in each state is simple. We show that an exponential number of queries is necessary for any deterministic algorithm even to approximate SSM with polynomial accuracy. Motivated by this, we propose three different randomized approaches to compute a confidence interval for SSM. Their complexity does not depend at all on the number of players. These approaches are also valid for the classical Shapley value of any Markovian cooperative game. The first confidence interval, SCI, is valid under the static hypothesis that the estimating agent has access to the coalition values in all states at the same time, even before the Markov process starts. Although SCI relies on an unrealistic assumption, it is still a valid benchmark for the computation of the two other proposed confidence intervals, dubbed DCI1 and DCI2. DCI1 and DCI2 are also valid under the more realistic dynamic hypothesis: the estimating agent learns the coalition values during the game. We propose a simple method to minimize the length of DCI1 and we compare the three proposed approaches in terms of the length of the confidence interval. Finally, we propose a trade-off between the complexity and the accuracy of our randomized algorithm, valid for any Markovian cooperative game.

Finally, in Chapter 4, we utilize an MDP formulation with an uncountable state space to solve a problem of dynamic selection among different transmission channels. In the learning literature, this approach goes under the label of Multi-Armed Bandit (MAB), in which there is a pool of several random processes, or arms, that can be observed successively. The total reward is the (discounted or averaged) sum of the instantaneous values of the observed arms. Based on the past observations and on the a priori statistical knowledge of each arm, the objective is to maximize the expected total reward. The so-called Rested MAB assumes that the state of an arm that has not been pulled does not evolve at the next time step. In this case, a brilliant optimization solution was found by Gittins (1989, [37]). He observed that the curse of dimensionality of the state space in the number of arms can be overcome by computing an index for each arm. The optimal solution simply consists in selecting, at each decision step, the arm with the highest index. Thus, the complexity of computing the solution drops from exponential to linear in the number of arms. We deal instead with Restless MABs, in which the arms evolve even when they are not observed, which is obviously the case for the fading coefficients of a transmission channel. In this case, an efficient solution has not yet been found. Whittle (1988, [100]) proposed an index whose optimality is not guaranteed. It was proved by Papadimitriou and Tsitsiklis [67] that restless MABs are PSPACE-hard in general. Therefore, in order to cope with Restless MABs, one needs to resort to an MDP formulation with an uncountable state space, or equivalently to a partially observable MDP model. Typically, the state of the MDP at each time step is the collection of instantaneous values of the arms. Since the state of the unobserved arms is unknown, the single decision agent only has at its disposal their probability distributions, conditioned on the latest observations. In our case, the processes among which we choose at each time are independent autoregressive processes of order 1, i.e. AR(1). They represent the slow-fading channel attenuations on the different frequency bands of a multiple access wireless network. The objective is to maximize the long-run average signal-to-noise ratio (SNR) for the user. The user is assumed to know all the parameters of the AR(1) processes, hence also their (unconditional) expected values. A naive solution suggests to always select the channel with the largest expected value. This rough procedure is very suboptimal if the expected values are similar to one another. Indeed, it is possible to exploit the autocorrelation of the channels to select the one that is instantaneously the best. We propose two heuristic algorithms, with complexity linear in the number of arms, to solve the Restless MAB problem. The first one, myopic, suggests to select at each time step the channel with the highest conditional expected SNR. The second one, randomized, selects the apparently suboptimal arms with a certain probability.
We find that the myopic strategy behaves almost optimally when all the arms are nearly statistically equivalent. When one channel is characterized by a much higher autocorrelation, the randomized approach outperforms the myopic one and is quasi-optimal. We then propose a formulation based on competitive MDPs, in which different users access the same pool of channels. We adapt the two algorithms above, approximating the "best response" of a user against the strategy of a second user who is oblivious to the presence of the first one.

Chapter 1

Markov Decision Processes (MDPs) and Game Theory: a brief overview

In this preliminary chapter we provide a short overview of Markov Decision Processes (MDPs), linear programming, Competitive and Cooperative Game Theory. It provides the reader with the essential tools to understand the main results in this dissertation.

1.1 Discrete time Markov chains

Let P be an N-by-N stochastic matrix, i.e. non-negative and with all its rows summing up to 1. Let $\{S_n\}_{n\in\mathbb N\cup\{0\}}$ be a discrete-time stochastic process on a finite state space $\mathcal S = \{s_1,\dots,s_N\}$. If for all integers $k\ge 0$:
$$p\big(S_{k+1}=s_{i_{k+1}} \mid S_k=s_{i_k}, S_{k-1}=s_{i_{k-1}},\dots,S_0=s_{i_0}\big) = p\big(S_{k+1}=s_{i_{k+1}} \mid S_k=s_{i_k}\big) = P_{i_k,i_{k+1}},$$
then the stochastic process is a discrete time Homogeneous Markov chain (HMC) with transition probability matrix P. The matrix P is irreducible when each state is reachable from any state with positive probability, i.e. for all $s_i,s_j$ there exists n such that $(P^n)_{i,j} > 0$. The stationary distribution of P is the column vector $q\in\mathbb R^N$ such that $q^T P = q^T$, $\sum_i q_i = 1$. If the HMC is irreducible, then q is the unique stationary distribution (if the HMC is countably infinite, the positive recurrence assumption is needed to guarantee the uniqueness of the stationary distribution; if the HMC is finite, instead, irreducibility implies positive recurrence). It can be interpreted as the probability distribution on the

states that is invariant over time. If P is irreducible, then it is also Cesàro summable and
$$\lim_{T\uparrow\infty} \frac{1}{T}\sum_{t=0}^{T} P^t = Q,$$
where each row of Q is the stationary distribution $q^T$.
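For concreteness, the following minimal Python sketch computes the stationary distribution q of an assumed 3-state irreducible matrix (the numerical values are illustrative only, not taken from the text) and checks that the Cesàro averages of $P^t$ approach the matrix Q whose rows all equal $q^T$.

\begin{verbatim}
# Minimal numerical check of the facts above, on an assumed 3-state chain.
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Stationary distribution: solve q^T P = q^T together with sum(q) = 1.
N = P.shape[0]
A = np.vstack([P.T - np.eye(N), np.ones(N)])
b = np.concatenate([np.zeros(N), [1.0]])
q, *_ = np.linalg.lstsq(A, b, rcond=None)

# Cesaro average of P^t converges to Q, whose rows are all equal to q^T.
T = 2000
Q = sum(np.linalg.matrix_power(P, t) for t in range(T)) / T

print("stationary distribution q:", q)
print("Cesaro limit rows close to q:", np.allclose(Q, np.tile(q, (N, 1)), atol=1e-3))
\end{verbatim}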

A good reference for the fundamentals of Markov chains is (Brémaud 1999, [24]).

1.2 Linear programming

Linear programs are optimization problems that can be expressed in canonical primal form

$$\max_x\ c^T x \qquad \text{s.t.}\quad Ax \le b, \quad x \ge 0.$$
The dual formulation is

$$\min_y\ b^T y \qquad \text{s.t.}\quad y^T A \ge c^T, \quad y \ge 0.$$
Another primal form is

$$\max_x\ c^T x \qquad \text{s.t.}\quad Ax \le b,$$
whose asymmetric dual form is

$$\min_y\ b^T y \qquad \text{s.t.}\quad y^T A = c^T, \quad y \ge 0.$$
The strong duality theorem states that the optimal values of the primal and dual problems are equal. Classically, the linear programming problem is solved with the aid of the celebrated simplex algorithm, invented by Dantzig in 1947 (see also Dantzig 1998, [28]). It relies on the fact that an optimal solution is always found at an extreme point of the feasibility region, which is a convex polytope. The simplex algorithm is an efficient method to walk along the edges of the feasibility region through extreme points in which the objective function assumes values closer and closer to the optimum.
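As a small numerical illustration of the primal/dual pair and of strong duality, the following sketch solves a toy instance with scipy.optimize.linprog; the data c, A, b are arbitrary assumptions made only for this example.

\begin{verbatim}
# Primal/dual pair on an illustrative toy instance; strong duality is verified.
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 0.5]])
b = np.array([4.0, 5.0])

# Primal: max c^T x  s.t. Ax <= b, x >= 0  (linprog minimizes, so negate c).
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# Dual: min b^T y  s.t. A^T y >= c, y >= 0, rewritten as -A^T y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print("primal optimum:", -primal.fun)   # strong duality: the two optima coincide
print("dual   optimum:", dual.fun)
\end{verbatim}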

A good reference for linear programming is Boyd and Vandenberghe (2004, [23]).

1.3 Markov Decision Processes

Markov Decision Processes (MDPs) are essentially controlled Markov chains, in which the rewards gained by the agent and the transition probabilities depend on the agent's actions. In this dissertation we will mainly deal with MDPs with finite state and finite action spaces. In this case, $\mathcal S = \{s_1,\dots,s_N\}$ is the set of states. In each state $s\in\mathcal S$, the agent has at its disposal the set of actions $A(s)$. Let us assume that in state s the agent selects an action $a\in A(s)$. Then, an instantaneous reward $r(s,a)$ is earned. The process evolves stochastically on a discrete time grid $t = 0,1,\dots$, according to a stochastic kernel controlled by the agent's strategy and defined by the transition probability $p(s'|s,a)$, where $s,s'\in\mathcal S$ and $a\in A(s)$. Such kernel is stationary, i.e. the probability that the state at time t+1 is s', given that $S_t = s$ and the action chosen in s is a, is independent of time and of previous actions. We call f a strategy for the agent, which determines the probability to take an action at each time T, given the whole history h(T) of previous actions and state succession up to time T. If we consider the β-discounted criterion, $\beta\in[0;1)$, the total reward for the agent, given that s is the initial state of the process, can be expressed as

$$\Phi^{(\beta)}(s,f) = \sum_{t=0}^{\infty} \beta^t\, \mathbb E_f\big(R_t \mid S_0 = s\big), \qquad (1.1)$$
where $R_t$ is the instantaneous reward earned at time t. In this dissertation we will also utilize the equivalent notation $\Phi^{(\beta)}$. We call β-discount optimal strategy $f^{*(\beta)}$ the strategy maximizing the following problem:

$$\max_f\ \Phi^{(\beta)}(s,f), \qquad \forall s\in\mathcal S. \qquad (1.2)$$

It is possible to prove that (1.2) has a solution, and the optimal strategy $f^{*(\beta)}$ exists. Moreover, an optimal solution can be found amongst stationary strategies $F_S$, for which the action taken at time t depends only on the state of the process at time t, $S_t$. Hence, we will restrict our focus to stationary strategies. Under a stationary strategy, the stochastic process on the set of states $\mathcal S$ is a HMC. The strategy $f\in F_S$ determines a probability distribution $f(s)$ on $A(s)$, such that $f(s,a)$ is the probability that the agent chooses action $a\in A(s)$ in state s. Moreover, if $f\in F_S$, then we can rewrite (1.1) as
$$\Phi^{(\beta)}(s,f) = \sum_{t=0}^{\infty} \beta^t \sum_{s'\in\mathcal S}\sum_{a\in A(s')} p_t(s'|s,f)\, f(s',a)\, r(s',a), \qquad (1.3)$$
where $p_t(s'|s)$ is the probability that the process is in state s', t steps after being in state s. It is convenient to write (1.3) in matrix form, as

$$\Phi^{(\beta)}(f) = \sum_{t=0}^{\infty} \beta^t\, P^t(f)\, r(f) = \big(I - \beta P(f)\big)^{-1} r(f),$$
where P(f) is the transition probability matrix associated to the strategy f, i.e. $P_{i,j}(f) = p(s_j|s_i,f)$, $\Phi^{(\beta)}(f) := [\Phi^{(\beta)}(s,f)]_{s\in\mathcal S}$, and r(f) is an N-by-1 vector whose j-th component is $r(s_j,f) = \sum_{a\in A(s_j)} f(s_j,a)\, r(s_j,a)$.
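As an illustration of the matrix form above, the following minimal Python sketch evaluates an assumed fixed stationary strategy f through its induced P(f) and r(f) (both illustrative data) by solving the linear system behind $(I-\beta P(f))^{-1} r(f)$.

\begin{verbatim}
# Policy evaluation for a fixed stationary strategy f (illustrative data).
import numpy as np

beta = 0.9
P_f = np.array([[0.7, 0.3],
                [0.4, 0.6]])   # transition matrix induced by the strategy f
r_f = np.array([1.0, 2.0])     # expected one-step reward in each state under f

Phi = np.linalg.solve(np.eye(2) - beta * P_f, r_f)
print("beta-discounted value of f per initial state:", Phi)
\end{verbatim}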

$$\Phi^{av}(s,f) = \limsup_{T\uparrow\infty}\ \frac{1}{T}\sum_{t=0}^{T}\sum_{s'\in\mathcal S}\sum_{a\in A(s')} p_t(s'|s,f)\, f(s',a)\, r(s',a). \qquad (1.4)$$
Let us assume that P(f) is irreducible for each $f\in F$. Then, P(f) is Cesàro summable, and we can write

$$\Phi^{av}(f) = \sum_{i=1}^{N} q_i(f)\, r(s_i,f),$$
where q(f) is the stationary distribution of P(f).

1.3.1 Optimality equation and linear programming

Let us now focus on the β-discounted criterion. The maximal total discounted reward $\Phi^{*(\beta)} := \Phi^{(\beta)}(f^{*(\beta)})$ solves the following optimality equation:

$$\Phi^{*(\beta)}(s) = \max_{a\in A(s)} \Big\{ r(s,a) + \beta \sum_{s'\in\mathcal S} p(s'|s,a)\, \Phi^{*(\beta)}(s') \Big\}, \qquad \forall s\in\mathcal S. \qquad (1.5)$$
Let us call $a^*(s)$ an optimal action achieving the maximum in (1.5). Then,

$$f^{*(\beta)}(s,a) = 1 \ \text{ for } a = a^*(s), \qquad f^{*(\beta)}(s,a) = 0 \ \text{ for } a \neq a^*(s).$$
Hence, the optimal strategy $f^{*(\beta)}(s,a)$ can be found in the set of stationary pure (or deterministic) strategies. Also, the maximum reward $\Phi^{*(\beta)}$ can be computed as the optimal value of the following linear programming optimization problem:

$$\min_{u\in\mathbb R^N}\ \sum_{s\in\mathcal S} u(s) \qquad (1.6)$$
$$\text{s.t.}\quad u(s) \ge r(s,a) + \beta \sum_{s'\in\mathcal S} p(s'|s,a)\, u(s'), \qquad \forall a\in A(s),\ s\in\mathcal S.$$
The (asymmetric) dual problem of (1.6) is the following linear programming problem:

$$\max_x\ \sum_{s\in\mathcal S}\sum_{a\in A(s)} x(s,a)\, r(s,a) \qquad (1.7)$$
$$\text{s.t.}\quad \sum_{s\in\mathcal S}\sum_{a\in A(s)} \big[\delta(s,s') - \beta\, p(s'|s,a)\big]\, x(s,a) = 1, \qquad \forall s'\in\mathcal S,$$
$$x(s,a) \ge 0, \qquad \forall a\in A(s),\ s\in\mathcal S,$$
where the dual variable is x. The optimal strategy $f^{*(\beta)}$ can be computed as
$$f^{*(\beta)}(s,a) = \frac{x^*(s,a)}{\sum_{a'\in A(s)} x^*(s,a')}, \qquad (1.8)$$
where $x^*$ is the optimal solution of (1.7). In the case of the average criterion, we can formulate the following linear programming problem:

$$\max_x\ \sum_{s\in\mathcal S}\sum_{a\in A(s)} x(s,a)\, r(s,a)$$
$$\text{s.t.}\quad \sum_{s\in\mathcal S}\sum_{a\in A(s)} \big[\delta(s,s') - p(s'|s,a)\big]\, x(s,a) = 0, \qquad \forall s'\in\mathcal S,$$
$$\sum_{s\in\mathcal S}\sum_{a\in A(s)} x(s,a) = 1,$$
$$x(s,a) \ge 0, \qquad \forall a\in A(s),\ s\in\mathcal S.$$
The optimal average strategy is computed analogously to (1.8).
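The following minimal Python sketch (on an assumed two-state, two-action MDP with illustrative numerical data) solves the discounted dual program (1.7) with a generic LP solver and recovers the optimal strategy through (1.8).

\begin{verbatim}
# Solve the discounted occupation-measure LP (1.7) and recover the policy via (1.8).
import numpy as np
from scipy.optimize import linprog

beta = 0.9
N, A = 2, 2
r = np.array([[1.0, 0.0],            # r[s, a]
              [0.0, 2.0]])
p = np.zeros((N, A, N))              # p[s, a, s'] = p(s'|s, a)
p[0, 0] = [0.9, 0.1]; p[0, 1] = [0.1, 0.9]
p[1, 0] = [0.8, 0.2]; p[1, 1] = [0.2, 0.8]

# Equality constraints: sum_{s,a} [delta(s,s') - beta p(s'|s,a)] x(s,a) = 1, for all s'.
A_eq = np.zeros((N, N * A))
for s in range(N):
    for a in range(A):
        for sp in range(N):
            A_eq[sp, s * A + a] = (1.0 if s == sp else 0.0) - beta * p[s, a, sp]
b_eq = np.ones(N)

res = linprog(-r.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (N * A))
x = res.x.reshape(N, A)
f = x / x.sum(axis=1, keepdims=True)   # equation (1.8)
print("optimal value of the LP:", -res.fun)
print("optimal (pure) strategy, rows = states:", f)
\end{verbatim}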

1.3.2 Value iteration

Let us consider the β-discounted criterion. We are going to introduce the approximation technique called value iteration, to solve an MDP optimization problem. When the state space is large, but still finite, the use of value iteration is to be preferred to a linear programming formulation.

Algorithm 1.3.1. Value Iteration for finite MDPs.

1. Initialize $u_0(s)\in\mathbb R$ for all $s\in\mathcal S$, $n = 0$, $\epsilon > 0$.
2. For all $s\in\mathcal S$, compute $u_{n+1}(s)$ as

$$u_{n+1}(s) = \max_{a\in A(s)} \Big\{ r(s,a) + \beta \sum_{s'\in\mathcal S} p(s'|s,a)\, u_n(s') \Big\} := L\, u_n(s).$$

3. If
$$\| u_{n+1} - u_n \| < \epsilon (1-\beta)/2\beta, \qquad (1.9)$$
then go to step 4, otherwise set n := n + 1 and return to step 2.
4. For each $s\in\mathcal S$, define

$$a^{(\epsilon)}(s) \in \operatorname*{argmax}_{a\in A(s)} \Big\{ r(s,a) + \beta \sum_{s'\in\mathcal S} p(s'|s,a)\, u_n(s') \Big\}.$$
The value $u_n$ converges to the optimal $\Phi^{*(\beta)}$ for $n\uparrow\infty$ and the stationary policy obtained from $a^{(\epsilon)}$ is ε-optimal, i.e. $\| u_n - \Phi^{*(\beta)} \| < \epsilon/2$ whenever (1.9) holds.
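The following short Python sketch implements Algorithm 1.3.1 on an assumed two-state, two-action MDP (the same illustrative data as in the earlier snippet); it is only meant to make the steps above concrete.

\begin{verbatim}
# Value iteration (Algorithm 1.3.1) on an illustrative two-state MDP.
import numpy as np

beta, eps = 0.9, 1e-6
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])                 # r[s, a]
p = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])   # p[s, a, s']

u = np.zeros(2)                            # step 1: arbitrary initialisation
while True:
    q = r + beta * p @ u                   # q[s,a] = r(s,a) + beta sum_s' p(s'|s,a) u(s')
    u_next = q.max(axis=1)                 # step 2: Bellman backup u_{n+1} = L u_n
    if np.max(np.abs(u_next - u)) < eps * (1 - beta) / (2 * beta):   # step 3, eq. (1.9)
        u = u_next
        break
    u = u_next

policy = q.argmax(axis=1)                  # step 4: eps-optimal stationary pure policy
print("approximate optimal value:", u)
print("greedy actions per state:", policy)
\end{verbatim}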

There exists a variant of value iteration for MDPs with continuous state space. The state space is discretized and step 2 of Algorithm 1.3.1 is applied to the state grid $\mathcal S_G$, while in other points the value is interpolated linearly. In other words, the operator L is redefined as
$$L' u_n(s) = \begin{cases} L\, u_n(s) & \text{if } s\in\mathcal S_G, \\ \sum_k \lambda_k\, u_n(s_k),\ s_k\in\mathcal S_G, & \text{if } s\notin\mathcal S_G, \end{cases}$$
for some convex coefficients $\{\lambda_k\}$ such that $s = \sum_k \lambda_k s_k$. It is possible to prove (Bäuerle and Rieder, 2011 [17]) that, if the mesh size is sufficiently small, then the algorithm converges and gives an approximation of the optimal value.
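A minimal sketch of this discretize-and-interpolate variant is given below for an assumed one-dimensional state space with deterministic dynamics (a stochastic kernel would simply add an expectation inside the backup); the grid, action set, dynamics and reward are illustrative assumptions of the snippet.

\begin{verbatim}
# Value iteration on a grid with linear interpolation off the grid (1-D sketch).
import numpy as np

beta = 0.9
grid = np.linspace(0.0, 1.0, 51)                 # the grid S_G
actions = np.array([-0.1, 0.0, 0.1])             # assumed action set

def step(s, a):                                  # assumed next-state map, clipped to [0, 1]
    return np.clip(s + a, 0.0, 1.0)

def reward(s, a):                                # assumed reward: stay near 0.5, penalise effort
    return -(s - 0.5) ** 2 - 0.1 * abs(a)

u = np.zeros_like(grid)
for _ in range(500):
    # L' u: apply the Bellman operator on grid points, interpolating u off the grid
    q = np.array([[reward(s, a) + beta * np.interp(step(s, a), grid, u)
                   for a in actions] for s in grid])
    u = q.max(axis=1)

print("approximate value at the grid midpoint:", u[len(grid) // 2])
\end{verbatim}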

1.3.3 Blackwell optimality

The average criterion is under-selective, since it only takes into account the arbitrarily distant tail of rewards. The Blackwell optimal strategy (Blackwell, 1962 [21]) is a refinement of the average optimal strategy. It coincides with the β-discounted optimal strategy for all β sufficiently close to 1. Hordijk, Dekker, and Kallenberg proposed in [41] to compute the Blackwell optimal strategy by utilizing the simplex algorithm in the non-Archimedean field of rational functions with real coefficients.

A good reference for the fundamentals of single-agent MDPs is (Puterman, 1994 [77]).

1.4 Game Theory

1.4.1 Competitive Game Theory with two players

Let us assume the presence of two players, who interact with each other. Player i = 1, 2 has at its disposal a set of actions $A_i$. When both players select an action, each of them obtains a reward depending also on the other player's action. Let $r_i(a_1,a_2)$ be the reward for player i when players 1 and 2 select the actions $a_1\in A_1$ and $a_2\in A_2$, respectively. We say that the game is zero-sum when, for each couple $(a_1,a_2)$, $r_1(a_1,a_2) = -r_2(a_1,a_2) := r(a_1,a_2)$. In this case, both players adopt an antagonistic behaviour, since a benefit for one is a loss for the other. To examine this situation, the (mixed) Nash equilibrium (Nash, 1950 [64]) is typically utilized. It is defined as the couple of probability distributions $(f_1^*, f_2^*)$ on the action sets $A_1$, $A_2$ respectively, that no player has interest in deviating from, i.e.

$$r(f_1, f_2^*) \le r(f_1^*, f_2^*) \le r(f_1^*, f_2), \qquad \forall f_1\in F_1,\ f_2\in F_2,$$
where $F_i$ is the set of mixed strategies for player i and $r(f_1,f_2)$ is the expected value of r under the strategies $f_1, f_2$. Thanks to the minimax Theorem (Von Neumann, 1928 [95]) the optimal couple $(f_1^*, f_2^*)$ exists and the reward at the equilibrium can also be written as a max-min formulation

$$r(f_1^*, f_2^*) = \max_{f_1\in F_1}\ \min_{f_2\in F_2}\ \sum_{a_1\in A_1}\sum_{a_2\in A_2} f_1(a_1)\, f_2(a_2)\, r(a_1,a_2) \qquad (1.10)$$
$$= \min_{f_2\in F_2}\ \max_{f_1\in F_1}\ \sum_{a_1\in A_1}\sum_{a_2\in A_2} f_1(a_1)\, f_2(a_2)\, r(a_1,a_2) \qquad (1.11)$$
$$:= \mathrm{val}(R),$$
where R is the $|A_1|$-by-$|A_2|$ matrix of the rewards. The strategy $f_1^*$ ($f_2^*$) maximizes (minimizes) expression (1.10) ((1.11)).
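For concreteness, val(R) and a maximin mixed strategy can be computed by the standard linear-programming reformulation of (1.10); the following sketch does so for an assumed 2-by-2 reward matrix (illustrative data only).

\begin{verbatim}
# Value and maximin strategy of a zero-sum matrix game via linear programming.
import numpy as np
from scipy.optimize import linprog

R = np.array([[ 3.0, -1.0],
              [-2.0,  4.0]])        # rows: actions of player 1, columns: player 2
m, n = R.shape

# Variables (f1_1, ..., f1_m, v). Maximise v s.t. R^T f1 >= v*1, sum(f1) = 1, f1 >= 0.
c = np.concatenate([np.zeros(m), [-1.0]])           # minimise -v
A_ub = np.hstack([-R.T, np.ones((n, 1))])           # v - (R^T f1)_j <= 0 for every column j
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
f1_star, value = res.x[:m], res.x[-1]
print("val(R) =", value, " maximin strategy f1* =", f1_star)
\end{verbatim}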

1.4.2 Static Cooperative games

In static cooperative games we consider the presence of P players which can cooperate and coordinate their actions. We call $\mathcal P = \{1,\dots,P\}$ the grand coalition. Again, let us assume that each player i has at its disposal the set of actions $A_i$ and that the reward $r_i$ depends on the players' actions $a_1,\dots,a_P$. To every coalition $\Lambda\subseteq\mathcal P$ the set of feasible pay-offs $\mathcal V(\Lambda)$ is assigned. Under the Transferable Utility (TU) assumption, there exists $v(\Lambda)\in\mathbb R$ such that $\mathcal V(\Lambda)$ is the half-space defined as:
$$\mathcal V(\Lambda) = \Big\{ x\in\mathbb R^{|\Lambda|} : \sum_{i=1}^{|\Lambda|} x_i \le v(\Lambda) \Big\}.$$

We consider $v(\emptyset) = 0$. Under the Non-Transferable Utility (NTU) assumption, $\mathcal V(\Lambda)$ may assume any shape; for example, in Section 3.2 $\mathcal V(\Lambda)$ is a polymatroid. The feasible region $\mathcal V(\Lambda)$ can also be interpreted as the set of pay-offs that Λ can obtain when it is not capable of coordinating its strategies with the anti-coalition $\mathcal P\setminus\Lambda$. In the TU case, Morgenstern and von Neumann (1953, [96]) suggested to assume that the anti-coalition adopts an antagonistic behaviour, hence to compute $v(\Lambda)$ as the value of the zero-sum game of Λ against $\mathcal P\setminus\Lambda$, i.e.

$$v(\Lambda) = \max_{f_\Lambda\in F_\Lambda}\ \min_{f_{\mathcal P\setminus\Lambda}\in F_{\mathcal P\setminus\Lambda}}\ \sum_{i\in\Lambda} r_i(f_\Lambda, f_{\mathcal P\setminus\Lambda}),$$
where $F_\Lambda$ is the set of mixed strategies available to Λ. The coalition value $v(\Lambda)$ can also be interpreted as the maximum sum of pay-offs that Λ can guarantee, whatever the strategy of the anti-coalition is. Therefore, the antagonistic behaviour of $\mathcal P\setminus\Lambda$ is to be considered a worst-case scenario.

Let us discuss how to allocate the pay-off, or reward, to the players. Let us assume that the grand coalition $\mathcal P$ is formed.
Let us introduce the notion of Core. Let $x\in\mathcal V(\mathcal P)$. We say that x is blocked by coalition $\Lambda\subseteq\mathcal P$ if there exists $y\in\mathcal V(\Lambda)$ such that $y_i > x_i$ for all $i\in\Lambda$, i.e. coalition Λ cannot accept the allocation x since it can achieve a better allocation on its own. Then, the Core Co is defined as the set of all unblocked allocations in $\mathcal V(\mathcal P)$. In the TU case, Co has a nice formulation, as the set of all $x\in\mathbb R^P$ such that
$$\sum_{i\in\mathcal P} x_i = v(\mathcal P), \qquad \sum_{i\in\Lambda} x_i \ge v(\Lambda), \quad \forall\ \Lambda\subset\mathcal P.$$
The Bondareva-Shapley Theorem (Bondareva, 1963 [22], Shapley, 1967 [83]) provides a necessary and sufficient condition for the Core to be non-empty, i.e. for all functions $\alpha : 2^{\mathcal P}\to[0;1]$ such that

$$\sum_{\Lambda\ni i} \alpha(\Lambda) = 1, \qquad \forall i\in\mathcal P,$$
then
$$\sum_{\Lambda\subseteq\mathcal P} \alpha(\Lambda)\, v(\Lambda) \le v(\mathcal P).$$
The proof stems directly from the feasibility conditions of the dual linear programming problem associated to the Core.
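Computationally, Core non-emptiness can be checked through the linear program that underlies the Bondareva-Shapley theorem; the sketch below does so for an assumed three-player TU game (the characteristic function v is an illustrative assumption).

\begin{verbatim}
# Core non-emptiness check: the Core is non-empty iff the minimal total payoff
# satisfying every proper coalition's demand does not exceed v(grand coalition).
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

P = 3
v = {(): 0.0, (0,): 1.0, (1,): 1.0, (2,): 1.0,
     (0, 1): 3.0, (0, 2): 3.0, (1, 2): 3.0, (0, 1, 2): 6.0}

coalitions = [L for k in range(1, P) for L in combinations(range(P), k)]
A_ub = np.array([[-1.0 if i in L else 0.0 for i in range(P)] for L in coalitions])
b_ub = np.array([-v[L] for L in coalitions])

res = linprog(np.ones(P), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * P)
print("minimal total payoff meeting all coalition demands:", res.fun)
print("Core non-empty:", res.fun <= v[(0, 1, 2)] + 1e-9)
\end{verbatim}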

The Shapley value $h_S\in\mathbb R^P$ (Shapley, 1953 [82]) is another way to distribute the pay-off among the players, which we illustrate in the TU case:
$$h_{S,i} = \frac{1}{P!}\sum_{\pi\in\Pi(P)} \big[ v(\Lambda_i(\pi)\cup\{i\}) - v(\Lambda_i(\pi)) \big], \qquad \forall i\in\mathcal P$$
$$= \sum_{\Lambda\subseteq\mathcal P\setminus\{i\}} \frac{(P-|\Lambda|-1)!\,|\Lambda|!}{P!}\, \big[ v(\Lambda\cup\{i\}) - v(\Lambda) \big],$$
where Π(P) is the set of P! permutations of $\{1,\dots,P\}$ and $\Lambda_i(\pi)$ is the set of players preceding i in the permutation π. The Shapley value is the only allocation $x\in\mathbb R^P$ for which these four properties are jointly satisfied:
• Efficiency: $\sum_i x_i = v(\mathcal P)$;
• Dummy: if i is a dummy player, i.e. $v(\Lambda\cup\{i\}) - v(\Lambda) = 0$ for all $\Lambda\subseteq\mathcal P\setminus\{i\}$, then $x_i = 0$;
• Symmetry: if two players i and j are symmetric with respect to the game, i.e. $v(\Lambda\cup\{i\}) = v(\Lambda\cup\{j\})$ for all $\Lambda\subseteq\mathcal P\setminus\{i,j\}$, then $x_i = x_j$;
• Linearity: $x(v_1+v_2) = x(v_1) + x(v_2)$, where $v_1+v_2$ is the game with coalition values $v(\Lambda) = v_1(\Lambda) + v_2(\Lambda)$, for all coalitions Λ.
Moreover, if the game is superadditive, i.e.

$$v(\Lambda_1\cup\Lambda_2) \ge v(\Lambda_1) + v(\Lambda_2), \qquad \forall\ \Lambda_1\cap\Lambda_2 = \emptyset,$$
then the Shapley value is also individually rational, i.e. $h_{S,i} \ge v(\{i\})$ for each player i. We remark that superadditivity is a common assumption, since it is a necessary condition to enforce cooperation among players.
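The following short sketch computes the Shapley value through the subset form of the formula above, for the same assumed three-player game used in the previous snippet.

\begin{verbatim}
# Shapley value by enumerating the subsets not containing player i.
from itertools import combinations
from math import factorial

P = 3
v = {(): 0.0, (0,): 1.0, (1,): 1.0, (2,): 1.0,
     (0, 1): 3.0, (0, 2): 3.0, (1, 2): 3.0, (0, 1, 2): 6.0}

def shapley(i):
    others = [j for j in range(P) if j != i]
    total = 0.0
    for k in range(len(others) + 1):
        for L in combinations(others, k):
            weight = factorial(P - len(L) - 1) * factorial(len(L)) / factorial(P)
            total += weight * (v[tuple(sorted(L + (i,)))] - v[L])
    return total

print([shapley(i) for i in range(P)])   # symmetric game: each player gets v(P)/3 = 2
\end{verbatim}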

A good reference for Competitive Game Theory is (Myerson 1997 [63]). For the cooperative case, we suggest (Peleg and Sudhölter, 2007 [69]).

Chapter 2

Competitive and Long-run Cooperative MDPs

2.1 Algorithms for uniform optimal strategies in two-player zero-sum Competitive MDPs with perfect information

In Competitive Markov Decision Processes (MDPs) with perfect information, in each state at most one player has more than one action available. We deal with zero-sum two-player Competitive MDPs with perfect information. We propose two algorithms to find the uniform optimal strategies and one method to compute the optimality range of discount factors. We prove the convergence in finite time for one algorithm. The uniform optimal strategies are also optimal for the long run average criterion and, in transient games, for the undiscounted criterion as well.

2.1.1 Introduction

Competitive Markov Decision Processes (MDPs) are multi-stage interactions among several participants in an environment whose conditions change stochastically, influenced by the decisions of the players. Such games were introduced by Shapley (1953, [81]), who proved the existence of the discounted value and of the stationary discounted optimal strategies in two-player zero-sum games with finite state and action spaces. The problem of long term average reward games was addressed first by Gillette (1957, [36]). Bewley and Kohlberg (1976, [19]) proved that the field of real Puiseux series is an appropriate class to study the asymptotic behavior of discounted Competitive

MDPs when the discount factor tends to one. Mertens and Neyman (1981, [61]) showed the existence of the long term average value of Competitive MDPs. Then, Parthasarathy and Raghavan (1981, [68]) first introduced the notion of order field property. This property implies that the solution of a game lies in the same ordered field as the game data. Solan and Vieille (2009, [88]) presented an algorithm to find the ε-optimal uniform discounted strategies in two-player zero-sum Competitive MDPs, where ε > 0.

Perfect information games were addressed by several researchers (e.g. see Thuijsman and Raghavan, 1997 [93], Altman, Feinberg, and Shwartz, 2000 [7]), since they are the most elementary form of Competitive MDPs: the reward and the transition probabilities in each state are controlled by at most one player. Recently, Raghavan and Syed (2002, [78]) provided an algorithm which finds the optimal strategies for two-player zero-sum perfect information games under the discounted criterion for a fixed discount factor. Markov Decision Processes (MDPs) can be seen as Competitive MDPs in which only one player can possess more than one action in each state. It is well known (see e.g. Filar and Vrieze, 1997 [32]) that the optimal strategy in an MDP can be computed with the help of a linear programming formulation. Hordijk, Dekker and Kallenberg (1985, [41]) proposed to find the Blackwell optimal strategies (uniform optimal discount strategies) for MDPs by using the simplex method in the ordered field of rational functions with real coefficients. Altman, Avrachenkov and Filar (1999, [6]) analysed singularly perturbed MDPs using the simplex method in the ordered field of rational functions. More generally, Eaves and Rothblum (1994, [29]) studied how to solve a vast class of linear problems, including linear programming, in any ordered field.

In this section we propose two algorithms to determine the uniform optimal discount strategies in two-player zero-sum games with perfect information. Such strategies are optimal under the long run average criterion as well. The proposed approaches generalize the works by Hordijk, Dekker, Kallenberg (1985, [41]) and Raghavan, Syed (2002, [78]) to the game model in F(R), the non-Archimedean ordered field of rational functions with coefficients in R.

Let Γ be a two-player zero-sum Competitive MDP with perfect information and $\Gamma_i(h)$, i = 1, 2, be the MDP that player i faces when the other player fixes his own strategy h. Our first algorithm can be summed up in the following 3 steps:

1. Choose a stationary pure strategy g for player 2.

2. Find the uniform optimal strategy f for player 1 in the MDP $\Gamma_1(g)$.

3. Find the first state controlled by player 2 in which a change of strategy g′ is a benefit for player 2 for all the discount factors close enough to 1. If no such state exists, then (f, g) are uniform optimal, otherwise set g := g′ and go to step 2.

It is evident that player 1 is left totally free to optimize the MDP that he faces at each iteration of the algorithm in the most efficient way. Our second algorithm is a best response approach, in which the two players alternately find their own uniform optimal strategies:

1. Choose a stationary pure strategy g for player 2.

2. Find the uniform optimal strategy f for player 1 in the MDP Γ1(g).

3. If g is uniform optimal for player 2 in the MDP $\Gamma_2(f)$, then (f, g) are uniform optimal. Otherwise, find the uniform optimal strategy g′ in $\Gamma_2(f)$, set g := g′ and go to step 2.

The convergence in finite time of the first algorithm is proven, while for the second we provide a numerical analysis. We also show that the second algorithm has a lower complexity.

This section is organized as follows. In Section 2.1.2 we formally introduce the properties of Competitive MDPs, Section 2.1.3 is dedicated to the description of the field of rational functions with real coefficients, while in Section 2.1.4 we recall the linear programming procedures in the field F(R) used to find a Blackwell optimal policy for MDPs. We present some new useful results on perfect information games in Section 2.1.5, and Section 2.1.6 is dedicated to the description and to the validation of our first algorithm. In Section 2.1.7 we provide a numerical example. In Section 2.1.8 we introduce an algorithm whose convergence is only conjectured, and we report some considerations and numerical results about the complexity of our algorithms. In Section 2.2 we finally prove that, for transient stochastic games, (f*, g*) are optimal under the undiscounted criterion as well.

Some notation remarks: the ordering relation between vectors of the same length $a \ge (\le)\ b$ means that for every component i, $a(i) \ge (\le)\ b(i)$. The discount factor and the interest rate are barred ($\bar\beta$, $\bar\rho$) if they are fixed values; the symbols β, ρ represent the related variables.

2.1.2 The model

In a two-player Competitive MDP Γ we have a set of states $\mathcal S = \{s_1, s_2, \dots, s_N\}$, and for each state s the set of actions available to the i-th player is called $A^{(i)}(s) = \{a^{(i)}_1(s), \dots, a^{(i)}_{m_i(s)}(s)\}$, i = 1, 2. Each triple $(s, a_1, a_2)$ with $a_1\in A^{(1)}(s)$, $a_2\in A^{(2)}(s)$ is assigned an immediate reward $r(s, a_1, a_2)$ for player 1,

$-r(s, a_1, a_2)$ for player 2, and a transition probability distribution $p(\cdot|s, a_1, a_2)$ on $\mathcal S$. A stationary strategy $u\in U_S$ for the i-th player determines the probability $u(a|s)$ that in state s player i chooses the action $a\in\{a^{(i)}_1(s), \dots, a^{(i)}_{m_i(s)}(s)\}$. We assume that both the number of states and the overall number of available actions are finite.

It is evident that a couple of strategies $f\in F_S$, $g\in G_S$ for players 1 and 2, respectively, sets up a Markov chain in which the transition probability equals

$$p(s'|s,f,g) = \sum_{p=1}^{m_1(s)}\sum_{q=1}^{m_2(s)} p(s'|s, a^{(1)}_p, a^{(2)}_q)\, f(a^{(1)}_p|s)\, g(a^{(2)}_q|s), \qquad \forall s,s'\in\mathcal S,$$
while the average immediate reward r(s, f, g) equals

$$r(s,f,g) = \sum_{p=1}^{m_1(s)}\sum_{q=1}^{m_2(s)} r(s, a^{(1)}_p, a^{(2)}_q)\, f(a^{(1)}_p|s)\, g(a^{(2)}_q|s).$$
Let $\beta\in[0;1)$ be the discount factor and ρ the interest rate such that $\beta(1+\rho) = 1$. Note that when $\beta\uparrow 1$, then $\rho\downarrow 0$. We define $\Phi_{(\bar\beta)}(f,g)$ as a column vector of length N such that its i-th component equals the expected $\bar\beta$-discounted reward when the initial state of the Competitive MDP is $s_i$:

$$\Phi_{(\bar\beta)}(f,g) = \sum_{t=0}^{\infty} \bar\beta^{\,t}\, P^t(f,g)\, r(f,g),$$
where P(f, g) and r(f, g) are the N-by-N transition probability matrix and the N-by-1 average reward vector associated to the couple of strategies (f, g), respectively.

Definition 1. The β-discounted value of the game Γ is such that

$$\Phi_{(\bar\beta)}(\Gamma) = \sup_f\ \inf_g\ \Phi_{(\bar\beta)}(f,g) = \inf_g\ \sup_f\ \Phi_{(\bar\beta)}(f,g). \qquad (2.1)$$

Definition 2. An optimal strategy $f^*_{(\bar\beta)}$ for player 1 assures him a reward of at least $\Phi_{(\bar\beta)}(\Gamma)$:

$$\Phi_{(\bar\beta)}(f^*_{(\bar\beta)}, g) \ge \Phi_{(\bar\beta)}(\Gamma) \qquad \forall g\in G,$$
while $g^*_{(\bar\beta)}$ is optimal for player 2 iff

$$\Phi_{(\bar\beta)}(f, g^*_{(\bar\beta)}) \le \Phi_{(\bar\beta)}(\Gamma) \qquad \forall f\in F.$$

Let Φ(f, g) be the long term average value of the game Γ associated to the couple of strategies (f, g):

$$\Phi(f,g) = \lim_{T\to\infty}\ \frac{1}{T+1}\sum_{t=0}^{T} P^t(f,g)\, r(f,g),$$
and Φ(Γ) be the value vector for the long term average criterion of the game Γ, defined in an analogous way to expression (2.1).

The existence of optimal strategies in discounted Competitive MDPs is guaranteed by the following theorem (Filar and Vrieze, 1997 [32]):

Theorem 2.1.1. Under the hypothesis of discounted pay-off, Competitive MDPs possess a value, the optimal strategies $(f^*_{(\bar\beta)}, g^*_{(\bar\beta)})$ exist among stationary strategies and moreover $\Phi_{(\bar\beta)}(\Gamma) = \Phi_{(\bar\beta)}(f^*_{(\bar\beta)}, g^*_{(\bar\beta)})$.

Definition 3. A stationary strategy h is said to be uniformly discount optimal for a player if h is optimal for every β close enough to 1 (or, equivalently, for all ρ close enough to 0).

In the present Chapter we deal with perfect information Competitive MDPs.

Definition 4. Under the hypothesis of perfect information, in each state at most one player has more than one action available.

Let $\mathcal S_1 = \{s_1,\dots,s_{t_1}\}$ be the set of states controlled by player 1 and $\mathcal S_2 = \{s_{t_1+1},\dots,s_{t_1+t_2}\}$ be the set controlled by player 2, with $t_1 + t_2 \le N$.

2.1.3 The ordered field of rational functions with real coefficients

Let P(R) be the ring of all the polynomials with real coefficients.

Definition 5. The dominating coefficient of a polynomial $f = a_0 + a_1 x + \cdots + a_n x^n$ is the coefficient $a_k$, where $k = \min\{i : a_i \neq 0\}$, and we denote it by $\mathcal D(f)$.

Let F(R) be the non-Archimedean ordered field of fractions of polynomials with coefficients in R:
$$f(x) = \frac{c_0 + c_1 x + \cdots + c_n x^n}{d_0 + d_1 x + \cdots + d_m x^m}, \qquad f\in F(\mathbb R),$$
where the operations of sum and product are defined in the usual way (see Hordijk, Dekker and Kallenberg, 1985 [41]). Two rational functions h/g, p/q are identical (and we say $h/g =_l p/q$) if and only if $h(x)\,q(x) = p(x)\,g(x)$ for all $x\in\mathbb R$.

The following lemma (Hordijk et al., 1985 [41]) introduces the ordering in the field F (R):

Lemma 2.1.2. A complete ordering in F(R) is obtained by the rule
$$\frac{p}{q} >_l 0 \iff \mathcal D(p)\,\mathcal D(q) > 0, \qquad p,q\in P(\mathbb R).$$

In the same way, we can also define the operations of maximum ($\max_l$) and minimum ($\min_l$) in F(R).

The ordering law defined above is useful when one wants to compare the behavior of rational functions whose independent variable is positive and approaches 0 (see Hordijk et al., 1985 [41]).
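As a small illustration of this ordering, the following sketch decides the sign of p/q near 0+ from the dominating coefficients D(p), D(q); the coefficient-list representation and the example polynomials are assumptions of the snippet.

\begin{verbatim}
# Sign of a rational function for small positive x, via dominating coefficients.
def dominating(coeffs, tol=0.0):
    """First coefficient a_k with |a_k| > tol, i.e. D(f)."""
    for a in coeffs:
        if abs(a) > tol:
            return a
    return 0.0

def is_positive(p, q):
    """True iff p/q >_l 0, i.e. p(x)/q(x) > 0 for every small enough x > 0."""
    return dominating(p) * dominating(q) > 0

# Example: (x - 2x^2)/(3x) is positive near 0+, while (-x)/(2 + x) is not.
print(is_positive([0.0, 1.0, -2.0], [0.0, 3.0]))   # True
print(is_positive([0.0, -1.0], [2.0, 1.0]))        # False
\end{verbatim}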

Lemma 2.1.3. The rational function p/q is positive ($p/q >_l 0$) if and only if there exists $x_0 > 0$ such that $p(x)/q(x) > 0$ for every $x\in(0;x_0]$.

Application to Competitive MDPs

From the next theorems the reader will start perceiving the importance of dealing with the field F(R) in Competitive MDPs.

Theorem 2.1.4. Let f, g be two stationary strategies respectively for players 1 and 2 and $\Phi_{(\rho)}(f,g) : \mathbb R\to\mathbb R^N$ be the discounted reward associated to the couple of strategies (f, g), expressed as a function of the variable ρ. Then, $\Phi_{(\rho)}(f,g)\in F(\mathbb R)$.

Proof. For any couple of stationary strategies (f, g), we can write

$$\sum_{s'=1}^{N} \big[(1+\rho)\,\delta_{s,s'} - p(s'|s,f,g)\big]\,\Phi_{(\rho)}(f,g,s') = (1+\rho)\, r(s,f,g), \qquad s\in[1;N], \qquad (2.2)$$
where ρ is a variable. By solving the above system of equations in the unknown $\Phi_{(\rho)}$ by Cramer's rule, it is evident that $\Phi_{(\rho)}(f,g)\in F(\mathbb R)$.

Generally, the discounted value of a Competitive MDP for all the interest rates close enough to 0 belongs to the field of real Puiseux series (see Filar and Vrieze, 1997 [32]). From Theorems 2.1.1 and 2.1.4 it is straightforward to obtain the following important Lemma.

Lemma 2.1.5. Let Γ be a zero-sum Competitive MDP which possesses uniform discount optimal strategies for both players. Then, there exist $\rho^* > 0$ and $\Phi^*_{(\rho)}(\Gamma)\in F(\mathbb R)$ such that $\Phi^*_{(\bar\rho)}(\Gamma)$ is the discounted optimal value for all the interest rates $\bar\rho\in(0;\rho^*]$.

Proof. Let (f*, g*) be a couple of uniformly discount optimal strategies for players 1 and 2 respectively. Then, by definition, there exists $\rho^* > 0$ such that (f*, g*) are discounted optimal for all the interest rates $\bar\rho\in(0;\rho^*]$. From Theorem 2.1.4 we know that $\Phi_{(\rho)}(f^*,g^*)\in F(\mathbb R)$ and, from Theorem 2.1.1, the optimum uniform discounted value $\Phi_{(\bar\rho)}(\Gamma) = \Phi_{(\bar\rho)}(f^*,g^*)$ for all $\bar\rho\in(0;\rho^*]$. So, $\Phi^*_{(\rho)}(\Gamma)\in F(\mathbb R)$ represents the discounted value of Γ for all the interest rates sufficiently close to 0.

Lemma 2.1.6. Let Γ be a zero-sum Competitive MDP which possesses uniform discount optimal strategies f*, g* for players 1 and 2 respectively. Then,

$$\Phi_{(\rho)}(f, g^*) \le_l \Phi_{(\rho)}(f^*, g^*) =_l \Phi^*_{(\rho)}(\Gamma) \le_l \Phi_{(\rho)}(f^*, g), \qquad \forall f, g, \qquad (2.3)$$
where

$$\Phi^*_{(\rho)}(\Gamma) =_l \max_{f}{}_{l}\ \min_{g}{}_{l}\ \Phi_{(\rho)}(f,g) =_l \min_{g}{}_{l}\ \max_{f}{}_{l}\ \Phi_{(\rho)}(f,g). \qquad (2.4)$$
Proof. From Theorem 2.1.1 and by the definition of uniform discount optimal strategy, we assert that

$$\exists\, \rho^* > 0 :\ \forall \bar\rho\in(0;\rho^*]\ \Rightarrow\ \Phi_{(\bar\rho)}(f, g^*) \le \Phi_{(\bar\rho)}(f^*, g^*) \le \Phi_{(\bar\rho)}(f^*, g), \qquad \forall f, g,$$
which coincides with (2.3) by Lemma 2.1.3. Equation (2.4) is a direct consequence of (2.3).

Definition 6. $\Phi^*_{(\rho)}(\Gamma)$, defined as in (2.4), is the uniform discount value of the Competitive MDP Γ.

2.1.4 Computation of Blackwell optimum policy in MDPs

In this section we discuss some concepts of linear programming, which can be easily found in any book on linear optimization (e.g. see Luenberger and Ye, 2008 [51]).

Let Ψ be a Markov Decision Process, which can be seen as a two-player Competitive MDP in which one of the two players either fixes his own strategy or has only one available action in each state. We call $\Phi_{(\rho)}(f)$ the value of the discounted MDP associated to the strategy f with interest rate variable ρ.

It is known (Puterman, 1994 [77]) that the interval of interest rates $(0;\infty)$ can be broken into a finite number n of subintervals, say $(0\equiv\alpha_0;\alpha_1], (\alpha_1;\alpha_2], \dots, (\alpha_{n-1};\infty)$, in such a way that for each one there exists an optimal pure strategy. A Blackwell optimal policy is an optimal strategy associated to the first sub-interval.

Definition 7. We say that the strategy f* is Blackwell optimal iff there exists $\bar\rho^* > 0$ such that f* is optimal in the $(1+\bar\rho)^{-1}$-discounted MDP for all the interest rates $\bar\rho\in(0;\bar\rho^*]$.

Since by Theorem 2.1.4 $\Phi_{(\rho)}(f)\in F(\mathbb R)$ for any $f\in F_S$, we can say
$$\Phi_{(\rho)}(f^*) \ge_l \Phi_{(\rho)}(f) \qquad \forall f\in F,$$
where F is the set of all possible strategies. Hordijk, Dekker, and Kallenberg (1985, [41]) provided a useful algorithm to compute the Blackwell optimum policy in MDPs. It consists in solving the following parametric linear programming problem:

$$\max_{x}{}_{l}\ \sum_{s=1}^{N}\sum_{a=1}^{m(s)} x_{s,a}(\rho)\, r(s,a) \qquad (2.5)$$
$$\text{s.t.}\quad \sum_{s=1}^{N}\sum_{a=1}^{m(s)} \big[(1+\rho)\,\delta_{s,s'} - p(s'|s,a)\big]\, x_{s,a}(\rho) =_l 1, \qquad \forall s'\in\mathcal S,$$
$$x_{s,a}(\rho) \ge_l 0, \qquad \forall s\in\mathcal S,\ a\in A(s),$$
in the ordered field of rational functions with real coefficients F(R). This means that

i) ρ is the variable of the polynomials;

ii) all the elements of the related simplex tableau belong to F (R);

iii) all the algebraic and ordering operations required by the simplex method are carried out in the field F (R).

The practical technique to solve the linear optimization problem (2.5) proposed by Hordijk et al. (1985, [41]) is the so-called two-phase method. In the first phase the artificial variables $z_1,\dots,z_N$ are introduced as basic variables and the tableau of the following linear programming problem

$$\max_{x}{}_{l}\ \sum_{s=1}^{N}\sum_{a=1}^{m(s)} x_{s,a}(\rho)\, r(s,a) \qquad (2.6)$$
$$\text{s.t.}\quad \sum_{s=1}^{N}\sum_{a=1}^{m(s)} \big[(1+\rho)\,\delta_{s,s'} - p(s'|s,a)\big]\, x_{s,a}(\rho) + z_{s'}(\rho) =_l 1, \qquad \forall s'\in\mathcal S,$$
$$x_{s,a}(\rho) \ge_l 0, \qquad \forall s\in\mathcal S,\ a\in A(s),$$
is built. Then, N successive pivot operations on all the artificial variables are carried out so that the feasibility of the solution is preserved. We call entering variables the basic variables of the tableau at the end of the first phase. In the second phase the columns of the tableau associated to the artificial variables $z_1,\dots,z_N$ (which are now all non-basic) are removed and the simplex method is performed in the ordered field F(R) on the obtained tableau. We note that another approach for the solution of the parametric linear program (2.5) is given by the simplex method in the field of Laurent series (see Filar, Altman and Avrachenkov, 2002 [33]).

The optimal Blackwell stationary pure strategy f ∗ is computed as:

$$f^*(a|s) = \frac{x^*_{s,a}(\rho)}{\sum_{a'=1}^{m(s)} x^*_{s,a'}(\rho)}, \qquad \forall s\in\mathcal S,\ a\in A(s), \qquad (2.7)$$
where $\{x^*_{s,a}(\rho),\ \forall s,a\}$ is the solution of the optimization problem. The simplex method guarantees that the optimal strategy f* is well-defined and pure (see Filar and Vrieze 1997 [32]).

2.1.5 Uniform optimality in perfect information games

As we said before, in a perfect information game in each state at most one player has more than one action available. A stationary strategy for player i = 1, 2 is a function $f_i : \mathcal S\to\bigcup_{k=1}^{N} A_i(s_k)$ with $f_i(\cdot|s_t)\in A_i(s_t)$.

Theorem 2.1.7. For a Competitive MDP with perfect information, both players possess uniform discount optimal pure stationary strategies, which are optimal for the average criterion as well.

Theorem 2.1.7 (see Filar and Vrieze, 1997 [32]) guarantees the existence of the optimal strategies for both players under the average criterion in games with perfect information. Moreover, it suggests that in order to find the optimal strategies for the average criterion one has to find the optimal strategies in the discounted criterion for a discount factor sufficiently close to 1.

Definition 8. We call two pure stationary strategies adjacent if and only if they differ only in one state.

Then the following property holds, whose proof is analogous to the one in the field of real numbers.

Lemma 2.1.8. Let g be a strategy for player 2 and f, $f_1$ be two adjacent strategies for player 1. Then either $\Phi_{(\rho)}(f_1,g) \ge_l \Phi_{(\rho)}(f,g)$ or $\Phi_{(\rho)}(f_1,g) \le_l \Phi_{(\rho)}(f,g)$, which means that the two vectors are partially ordered.

The property above allows us to give the following definition.

Definition 9. Let (f, g) be a pair of pure stationary strategies respectively for players 1 and 2. We call $f_1$ ($g_1$) a uniform adjacent improvement for player 1 (2) in state $s_t$ if and only if $f_1$ ($g_1$) is a pure stationary strategy which differs from f (g) only in state $s_t$ and $\Phi_{(\rho)}(f_1,g) \ge_l \Phi_{(\rho)}(f,g)$ ($\Phi_{(\rho)}(f,g_1) \le_l \Phi_{(\rho)}(f,g)$), where the strict inequality holds in at least one component.

As in the case in which the discount interest rate is fixed, we achieve the following results.

Lemma 2.1.9. Let Γ be a perfect information Competitive MDP. A couple of pure stationary strategies (f ∗, g∗) is uniform discount optimal if and only if no uniform adjacent improvement is possible for both players.

Proof. The only if implication is obvious. If the strategies (f*, g*) are such that no uniform adjacent improvements are possible for either player, then no improvements are possible for the first stage of the game either, that is

$$f^*(s) = \operatorname*{argmax}_{a\in A_1(s)}{}_{l}\ \Big\{ r(s,a) + (1+\rho)^{-1}\sum_{s'=1}^{N} p(s'|s,a)\,\Phi_{(\rho)}(s',f^*,g^*) \Big\}, \qquad s\in\mathcal S_1,$$
$$g^*(s) = \operatorname*{argmin}_{a\in A_2(s)}{}_{l}\ \Big\{ r(s,a) + (1+\rho)^{-1}\sum_{s'=1}^{N} p(s'|s,a)\,\Phi_{(\rho)}(s',f^*,g^*) \Big\}, \qquad s\in\mathcal S_2.$$
It is known (see Filar and Vrieze, 1997 [32]) that if the strategies (f*, g*) satisfy such equations then they are uniform discount optimal.

In perfect information games, the following result (see Raghavan and Syed, 2002 [78]) holds

Lemma 2.1.10. In a zero-sum, perfect information, two-player discounted Competitive MDP Γ with interest rate $\bar\rho > 0$, a pair of pure stationary strategies (f*, g*) is optimal if and only if $\Phi_{(\bar\rho)}(f^*,g^*) = \Phi_{(\bar\rho)}(\Gamma)$, the value of the discounted Competitive MDP Γ.

From the above result we can easily derive the analogous property in the ordered field F (R).

Lemma 2.1.11. In a zero-sum, two-player Competitive MDP Γ with perfect information, a pair of pure stationary strategies (f*, g*) is uniform discount optimal if and only if $\Phi_{(\rho)}(f^*,g^*) =_l \Phi^*_{(\rho)}(\Gamma)\in F(\mathbb R)$, where $\Phi^*_{(\rho)}(\Gamma)$ is the uniform discount value of Γ.

Proof. The only if statement coincides with the assertion of Theorem 2.1.1. The if condition is less obvious. If a pair of strategies (f*, g*) has the property $\Phi_{(\rho)}(f^*,g^*) =_l \Phi^*_{(\rho)}(\Gamma)$, then there exists $\rho^* > 0$ such that, for all $\bar\rho\in(0;\rho^*]$, $\Phi_{(\bar\rho)}(f^*,g^*)$ coincides with the value of the game Γ. Then, thanks to Lemma 2.1.10, we can say that for all $\bar\rho\in(0;\rho^*]$ the strategies f*, g* are optimal in the discounted game Γ, which means that they are uniform discount optimal.

Let $s_t$ be a state controlled by player i (i = 1, 2) and $X\subset A_i(s_t)$. Let us call $\Gamma^t_X$ the Competitive MDP which is equivalent to Γ except in state $s_t$, where player i has only the actions in X available. Analogously to the result of Raghavan and Syed (2002, [78]), we propose the following Lemma.

Lemma 2.1.12. Let i = 1, 2 and $s_t\in\mathcal S_i$, $X\subset A_i(s_t)$, $Y\subset A_i(s_t)$, $X\cap Y = \emptyset$. Then $\Phi^*_{(\rho)}(\Gamma^t_{X\cup Y})\in F(\mathbb R)$, which is the uniform value of the game $\Gamma^t_{X\cup Y}$, equals

$$\Phi^*_{(\rho)}(\Gamma^t_{X\cup Y}) =_l \max{}_{l}\,\{\Phi^*_{(\rho)}(\Gamma^t_X),\ \Phi^*_{(\rho)}(\Gamma^t_Y)\} \quad \text{if } i = 1,$$
$$\Phi^*_{(\rho)}(\Gamma^t_{X\cup Y}) =_l \min{}_{l}\,\{\Phi^*_{(\rho)}(\Gamma^t_X),\ \Phi^*_{(\rho)}(\Gamma^t_Y)\} \quad \text{if } i = 2.$$

Proof. Let us suppose that the state $s_t$ is controlled by player 2. We indicate with $G^t_X$ the set of pure stationary strategies in which the choice in state $s_t$ is restricted to the set X. We note that the restriction in state $s_t$ does not affect player 1. Thus, $F^t_X = F$.
If it is possible to find optimal strategies for player 2 both in $G^t_X$ and in $G^t_Y$, then $\Phi^*_{(\rho)}(\Gamma^t_X) =_l \Phi^*_{(\rho)}(\Gamma^t_Y) =_l \Phi^*_{(\rho)}(\Gamma^t_{X\cup Y})$ by Lemma 2.1.11.
Otherwise, the uniform discount pure strategy of game $\Gamma^t_{X\cup Y}$ for player 2 belongs either to $G^t_X$ or to $G^t_Y$. For example, let us suppose that the optimal discount strategy in the Competitive MDP $\Gamma^t_{X\cup Y}$ for player 2 is found in Y. Then we have

$$\Phi^*_{(\rho)}(\Gamma^t_Y) =_l \Phi^*_{(\rho)}(\Gamma^t_{X\cup Y}) =_l \min_{g\in G}{}_{l}\ \max_{f\in F}{}_{l}\ \Phi_{(\rho)}(f,g)$$
$$\le_l \min_{g\in G^t_X}{}_{l}\ \max_{f\in F}{}_{l}\ \Phi_{(\rho)}(f,g) =_l \Phi^*_{(\rho)}(\Gamma^t_X).$$

The proof for the situation in which $s_t\in\mathcal S_1$ is analogous.

2.1.6 Algorithm description

Our task is to devise an algorithm which finds the uniform discount optimal strategies for both players in a perfect information Competitive MDP Γ; by Theorem 2.1.7 these coincide with the optimal strategies for the long term average criterion. Following the lines of the algorithm of Raghavan and Syed (2002, [78]) for optimal discounted strategies, we propose an algorithm suited to the ordered field F(R).

Let Γ be a zero-sum two-player Competitive MDP with perfect information.

Algorithm 2.1.13.

Step 1 Choose randomly a stationary deterministic pure strategy g for player 2.

Step 2 Find the Blackwell optimal strategy for player 1 in the MDP Γ1(g) by solving within the field F (R) the following linear programming:

$$\max_{x}{}_{l}\ \sum_{s=1}^{N}\sum_{a=1}^{m_1(s)} x_{s,a}(\rho)\, r(s,a,g) \qquad (2.8)$$
$$\text{s.t.}\quad \sum_{s=1}^{N}\sum_{a=1}^{m_1(s)} \big[(1+\rho)\,\delta_{s,s'} - p(s'|s,a,g)\big]\, x_{s,a}(\rho) =_l 1, \qquad \forall s'\in\mathcal S,$$
$$x_{s,a}(\rho) \ge_l 0, \qquad \forall s\in\mathcal S,\ a\in A_1(s),$$
and compute the pure strategy f as

$$f(a|s) = \frac{x^*_{s,a}(\rho)}{\sum_{a'=1}^{m_1(s)} x^*_{s,a'}(\rho)}, \qquad \forall s\in\mathcal S,\ a\in A_1(s), \qquad (2.9)$$

where $\{x^*_{s,a}(\rho),\ \forall s,a\}$ is the solution of (2.8).

Step 3 Find the minimum k such that in $s_{t_1+k}\in\{s_{t_1+1},\dots,s_{t_1+t_2}\}$ there exists an adjacent improvement g′ for player 2, with the help of the simplex tableau associated to the following linear programming problem:

$$\max_{x}{}_{l}\ \sum_{s=1}^{N}\sum_{a=1}^{m_2(s)} -\,x_{s,a}(\rho)\, r(s,f,a) \qquad (2.10)$$
$$\text{s.t.}\quad \sum_{s=1}^{N}\sum_{a=1}^{m_2(s)} \big[(1+\rho)\,\delta_{s,s'} - p(s'|s,f,a)\big]\, x_{s,a}(\rho) =_l 1, \qquad \forall s'\in\mathcal S,$$
$$x_{s,a}(\rho) \ge_l 0, \qquad \forall s\in\mathcal S,\ a\in A_2(s),$$
where the entering variables are $\{x_{s,a} : g(a|s) = 1,\ \forall s\}$. If no such improvement for player 2 is possible then go to step 4, otherwise set g := g′ and go to step 2.

Step 4 Set (f ∗, g∗) := (f, g) and stop. The strategies (f ∗, g∗) are uniform discount and long term average optimal in the Competitive MDP Γ respectively for player 1 and player 2.



Note that all the algebraic operations and the order signs (<, >) are to be interpreted in the field F(R).

Remark 1. Unlike Raghavan and Syed’s solution, the algorithm 2.1.13 does not require the strategy search for player 1 to be lexicographic. Player 1, in fact, faces in step 2 a classic Blackwell optimization.

Remark 2. Obviously, the roles of players 1 and 2 can be swapped in Algorithm 2.1.13. For simplicity, player 1 is assigned to step 2.

Remark 3. In step 3, once the state $s_{t_1+k}$ is found, the adjacent improvement involves the pivoting of any non-basic variable $x_{s_{t_1+k},a}$ to which corresponds a reduced cost $c_{s_{t_1+k},a} \le_l 0$.

Now, we prove the appropriateness of the algorithm 2.1.13. The proof is analogous to the one by Raghavan and Syed (2002, [78]).

Theorem 2.1.14. The algorithm stops in a finite time and the couple of strategies (f*, g*) is uniform discount optimal in the Competitive MDP Γ.

Proof. We assume that the overall number of actions

$$\mu = \sum_{k=1}^{t_1} m_1(s_k) + \sum_{k=1}^{t_2} m_2(s_{k+t_1})$$
is finite. Without loss of generality, let us reorder the states so that in the first $t_1$ states player 1 has more than one action and the following $t_2$ states are controlled by player 2. Of course, $t_1 + t_2 \le N$.
We can proceed by induction on μ. Trivially $\mu \ge 2N$, because $\mu = 2N$ is equivalent to the situation $t_1 = t_2 = 0$. In this case the algorithm finds the average optimal couple of strategies because it is the only one existing. Now we suppose by induction that the algorithm finds without cycling (that is, all pure stationary strategies are visited at most once) the couple of uniform optimal strategies when the number of actions is $\mu \ge 2N$. We have to prove that the thesis is valid when the number of actions equals μ + 1.
If $t_2 = 0$, then again there is nothing to prove, because, as we showed in Section 2.1.4, step 2 of our algorithm finds the Blackwell optimal policy f* for player 1 in the MDP $\Gamma_1(g)$.
If $t_2 > 0$, then we focus on the state $s_{t_1+t_2} = s_\tau$, which is the last one examined by our algorithm. The actions available in state $s_\tau$ are $A_2(s_\tau) \equiv X\cup\{a_i\}$, where $X = \{a_1,\dots,a_{i-1},a_{i+1},\dots,a_n\}$ and $n \ge 2$ by hypothesis. By the induction hypothesis, we suppose that the algorithm finds the uniform discount optimal strategies for both players in the game $\Gamma^\tau_X$ without cycling. Since no uniform improvements are possible in $\Gamma^\tau_X$ by definition of uniform optimal strategies, the algorithm then looks for a uniform adjacent improvement g′, where $g'(a_i|s_\tau) = 1$. There are now two possibilities.
If the uniform optimal strategy g for player 2 found in $\Gamma^\tau_X$ is also optimal in Γ, then the algorithm terminates because still no adjacent improvements are possible for player 2 in $s_\tau$.
Otherwise, any uniform optimal strategy g* for player 2 in Γ includes the action $a_i$, the algorithm necessarily finds an adjacent improvement in state $s_\tau$ by Lemma 2.1.9, and by the induction hypothesis it finds the uniform discount optimal strategies in the game $\Gamma^\tau_{\{a_i\}}$. So we have
$$\Phi_{(\rho)}(\Gamma) =_l \min{}_{l}\,\{\Phi_{(\rho)}(\Gamma^\tau_X),\ \Phi_{(\rho)}(\Gamma^\tau_{\{a_i\}})\} =_l \Phi_{(\rho)}(\Gamma^\tau_{\{a_i\}}),$$
where the second equality holds because otherwise the optimal strategies of $\Gamma^\tau_X$ would be uniform optimal in the game Γ by Lemma 2.1.11. Again thanks to Lemma 2.1.11, we can assert that the uniform discount optimal strategies (f*, g*) found in $\Gamma^\tau_{\{a_i\}}$ are optimal also for Γ, because $\Phi_{(\rho)}(f^*,g^*) = \Phi_{(\rho)}(\Gamma)$, which is the uniform discount value of the game. Moreover, the algorithm terminates because by Lemma 2.1.9 no improvements are available to either player.

We gave a constructive proof of the fact that the algorithm passes through a path of pure strategies, never cycles, and finds the uniform discount optimal strategies for both players. Since the overall number of actions is finite, the cardinality of pure strategies is also finite; hence, the algorithm must terminate in a finite time and the strategies (f*, g*) are uniform discount optimal, and by Theorem 2.1.7 they are long term average optimal as well.

Computing the optimality range of discount factors

The algorithm presented in Section 2.1.6 suggests a way to determine the range of discount factors in which the long term average optimal strategies (f*, g*) are also optimal in the discounted game. Before that, we report the result analogous to Lemma 2.1.9 when the discount factor is fixed (see Raghavan and Syed, 2002 [78]).

Lemma 2.1.15. Let Γ be a perfect information Competitive MDP and β ∈ [0; 1). The pure stationary strategies (f ∗, g∗) are β-discount optimal if and only if no uniform adjacent improvements are possible for both players in the β-discounted Competitive MDP Γ.

Let us denote by $\zeta(f_{(\rho)})$, where $f_{(\rho)}\in F(\mathbb R)$, the set of positive roots u of $f_{(\rho)}$ such that $\frac{d f_{(\rho)}}{d\rho}\big|_{\rho=u} < 0$, $\forall u\in\zeta(f_{(\rho)})$. Now we are ready to state the following Lemma.

Lemma 2.1.16. Let C be the set of the reduced costs associated to the two optimal tableaux obtained at steps 2 and 3 of the last iteration of Algorithm 2.1.13 and $\rho^* = \min_{c\in C} \zeta(c)$. Then, $\beta^* = (1+\rho^*)^{-1}$ is the smallest value such that the strategies (f*, g*) are β-discount optimal in the game Γ, $\forall \beta\in[\beta^*;1)$.

Proof. The existence of such a ρ* is guaranteed by Theorem 2.1.7. For all values of the interest rate $\rho\in(0;\rho^*]$, the reduced costs are positive, hence no adjacent improvements are possible for either player. So, by Lemma 2.1.15 they are discounted optimal. If $\rho > \rho^*$ and $\rho^* < \infty$, then at least one reduced cost is negative, hence at least one adjacent improvement is possible and (f*, g*) are not β-discount optimal, where $\beta = (1+\rho)^{-1}$.
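As an illustration of Lemma 2.1.16, the following sketch extracts ρ* from the numerators of the reduced costs (represented, as an assumption of the snippet, by coefficient lists in increasing powers of ρ) and converts it into β*; the two example polynomials are illustrative.

\begin{verbatim}
# Compute rho* = smallest positive root of a reduced cost at which it becomes
# negative (negative derivative), then beta* = (1 + rho*)^{-1}.
import numpy as np

def smallest_decreasing_positive_root(coeffs):
    """Smallest u > 0 with c(u) = 0 and c'(u) < 0, or +inf if there is none."""
    poly = np.polynomial.Polynomial(coeffs)
    deriv = poly.deriv()
    roots = [r.real for r in poly.roots() if abs(r.imag) < 1e-9 and r.real > 1e-12]
    roots = [u for u in roots if deriv(u) < 0]
    return min(roots) if roots else np.inf

reduced_cost_numerators = [
    [0.0, 2.0, -1.0],     # 2*rho - rho^2: positive on (0, 2), root at rho = 2
    [1.0, -0.5],          # 1 - 0.5*rho: root at rho = 2
]

rho_star = min(smallest_decreasing_positive_root(c) for c in reduced_cost_numerators)
beta_star = 1.0 / (1.0 + rho_star)
print("rho* =", rho_star, " beta* =", beta_star)
\end{verbatim}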

Round-off errors sensitivity

The role of the first non-null coefficients of the polynomials (numerator and denominator) of the tableaux obtained throughout the algorithm unfolding is essential: they determine the positivity of the elements of the tableaux themselves in the field F(R). This knowledge is fundamental to choose the most suitable pivot elements. The reader can easily understand that the algorithm is highly sensitive to the round-off errors that affect the null coefficients. If the data of the problem (rewards and transition probabilities for each strategy) are rational, then it is possible to work in exact arithmetic and such inconveniences are completely avoided. In fact, if all the input data are rational, they stay rational throughout the algorithm execution. Instead, if the data are irrational, a simple way to circumvent the round-off errors is to fix a tolerance value ε and set to 0 all the polynomial coefficients of the tableaux obtained throughout the algorithm whose absolute value is smaller than ε.

2.1.7 An example

Here we present a run of our algorithm 2.1.13, where the input data are taken from Raghavan and Syed (2002, [78]). There are 5 states, the first two are controlled by player 1 and states 3 and 4 are for player 2; in the final state both players have no action choice. The immediate rewards and the probability transitions for every couple (state,action) for both players are shown in table 2.1.

We choose the initial strategy $(g(a_2|s_3) = 1,\ g(a_3|s_4) = 1)$ for player 2. We report the optimal tableau obtained by player 1 at the end of step 2 of the first iteration of our algorithm (Table 2.4) and the tableau of player 2 after the first improvement at step 3 (Table 2.5). Analogously, Tables 2.6 and 2.7 are associated to the second and last iteration of our algorithm. It is known (see Hordijk et al. 1985, [41]) that all the elements of the simplex tableaux have a common denominator, stored in the top left-hand box. The last column of each tableau contains the numerators of the values of the basic variables, which are listed in the first column. The last row indicates the numerators of the reduced costs.

The optimal long term average strategy for player 1 is $f^*(a_1|s_1) = 1$, $f^*(a_2|s_2) = 1$, and for player 2 it is $g^*(a_2|s_3) = 1$, $g^*(a_1|s_4) = 1$. By computing the first positive root of the reduced costs of the two last optimal tableaux we find that the strategies $(f^*, g^*)$ are also $\beta$-discount optimal for all discount factors $\beta \in [\beta^*; 1)$, where $\beta^* \approx 0.74458$.
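As a quick consistency check (derived only from the reported value of $\beta^*$), the corresponding interest factor of Lemma 2.1.16 is
$$\rho^* = \frac{1-\beta^*}{\beta^*} \approx \frac{1 - 0.74458}{0.74458} \approx 0.3430, \qquad \beta^* = (1+\rho^*)^{-1} \approx 0.74458.$$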

Table 2.1: Immediate rewards and transition probabilities for each player, state and strategy.

        (s,a)   r    p(s1|s)  p(s2|s)  p(s3|s)  p(s4|s)  p(s5|s)
pl. 1   (1,1)   5    0        0        0        0        1
        (1,2)   4    0        0        0.2      0        0.8
        (1,3)   3    0        0        0.6      0        0.4
        (2,1)   6    0        0        0        0        0.1
        (2,2)   1    1        0        0        0        0
        (2,3)   0    0        0        0.1      0        0
pl. 2   (3,1)   4    0        0        0        0.9      0.1
        (3,2)   2    0.1      0        0        0        0
        (3,3)   0    0.3      0        0.2      0.5      0
        (4,1)   2    0        0.1      0.6      0.3      0
        (4,2)   2    0.2      0        0.4      0.4      0
        (4,3)   3    0        0        0        0.9      0.1
        (5,-)   0    0        0.1      0.2      0.3      0.4

Note that the optimal strategies differ from the ones of Raghavan and Syed (2002, [78]), in which the discount factor is set to 0.999. We suspect that this is due to some clerical errors.

2.1.8 A lower complexity algorithm

Let Γ be a zero-sum two-player Competitive MDP with perfect information. Consider the following algorithm:

Algorithm 2.1.17.

Step 1 Choose a stationary pure strategy g0 for player 2. Set k :=0.

Step 2 Find the Blackwell optimal strategy fk for player 1 in the MDP Γ1(gk).

Step 3 If $g_k$ is Blackwell optimal in $\Gamma_2(f_k)$, then set $(f^*, g^*) := (f_k, g_k)$ and stop. Otherwise, find the Blackwell optimal strategy $g_{k+1}$ for player 2 in the MDP $\Gamma_2(f_k)$, set $k := k+1$ and go to step 2.

This is essentially a best response algorithm, in which at each step each player alternately looks for his own Blackwell optimal strategy. Obviously, if the above algorithm stops, $(f^*, g^*)$ forms a couple of uniform discount and long term average optimal strategies, since they are both Blackwell optimal in the respective MDPs $\Gamma_1(g^*)$ and $\Gamma_2(f^*)$. The proof that Algorithm 2.1.17 never cycles is still an open problem. It is quite natural to try to prove that $\Phi_{(\rho)}(f_{k+1}, g_{k+1}) \le_l \Phi_{(\rho)}(f_k, g_k)$, but it is not difficult to find a counterexample. Raghavan and Syed (2002, [78]) conjecture as follows:

Conjecture 2.1.1. Let $\Gamma$ be a two-player zero-sum Competitive MDP with perfect information and $\alpha = (f, g)$ a couple of pure stationary strategies for the two players. For every discount factor $\bar\beta \in [0; 1)$, there is no sequence $\alpha_0, \alpha_1, \dots, \alpha_k$ such that $\Phi_{(\bar\beta)}(\alpha_k) = \Phi_{(\bar\beta)}(\alpha_0)$, where $\alpha_i$ is an adjacent improvement with respect to $\alpha_{i-1}$ for only one player in the $\bar\beta$-discounted Competitive MDP $\Gamma$, for every $i > 0$.

If Conjecture 2.1.1 were valid, then we could conclude that the algorithm 2.1.17 terminates in finite time.
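As a purely illustrative aid, the alternating structure of Algorithm 2.1.17 can be sketched as follows (Python). The helper blackwell_optimal is an assumed, user-supplied solver (for instance the simplex method in F(R) of Section 2.1), strategies are assumed hashable, and the cycle check merely guards against the behaviour that Conjecture 2.1.1 would rule out.

```python
def best_response_algorithm(mdp, g0, blackwell_optimal, max_iter=1000):
    """Alternating Blackwell best responses in the spirit of Algorithm 2.1.17.

    blackwell_optimal(mdp, fixed_strategy, player) is assumed to return a
    Blackwell optimal pure stationary strategy for `player` in the MDP
    obtained by fixing the opponent's strategy.
    """
    g = g0
    seen = set()
    for _ in range(max_iter):
        f = blackwell_optimal(mdp, g, player=1)       # step 2
        g_next = blackwell_optimal(mdp, f, player=2)  # step 3
        if g_next == g:
            # fixed point of the best-response map: both strategies are
            # Blackwell optimal in the respective MDPs, so stop
            return f, g
        if (f, g_next) in seen:
            raise RuntimeError("cycle detected (would contradict Conjecture 2.1.1)")
        seen.add((f, g_next))
        g = g_next
    raise RuntimeError("no convergence within max_iter iterations")
```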

Complexity

In our first algorithm 2.1.13, player 1 faces at each step an MDP optimization problem in the field of rational functions with real coefficients, which is solvable in polynomial time. Player 2, instead, is involved in a lexicographic search throughout the algorithm unfolding, whose complexity is at worst exponential in time. Player 2 lexicographically expands his search of his optimum strategy, and at the k-th iteration the two players find the solution of a subgame Γk which monotonically tends to the entire Competitive MDP Γ.

Analogously to what Raghavan and Syed (2002, [78]) remark, we can assert that the efficiency of our algorithm 2.1.13 is mostly due to the fact that most of the actions totally dominate other actions. In other words, it occurs very often that the optimum action $a^* \in A(s)$, $s \in \mathcal{S}$, found in an iteration $k$ such that $A(s) \subset \Gamma_k$, is optimum also in $\Gamma$, and consequently remains the same in all the remaining iterations. This exponentially reduces the policy space in which the algorithm needs to search.

Remark 4. As discussed in section 2.1.6, in the algorithm 2.1.13 the players' roles are interchangeable. Since most of the actions totally dominate other actions, we suggest assigning step 2 of the algorithm to the player whose total number of available actions is greater.

Differently from Raghavan and Syed (2002, [78]), the search for player 1 does not need to be lexicographic, and player 1 is left totally free to opti- mize the MDP that he faces at each iteration of the algorithm in the most efficient way.

Let us compare, in terms of number of pivoting operations, the following three algorithms:

M1: Algorithm 2.1.13, in which in step 2 player 1 pivots with respect to the variable with the minimum reduced cost until he finds his own Blackwell optimal strategy.

M2: Algorithm 2.1.13, in which in step 2 player 1 pursues a lexicographic search, pivoting iteratively with respect to the first non-basic variable with a negative (in the field F (R)) reduced cost. This method is analogous to the one shown by Raghavan and Syed (2002, [78]), but in the field F (R).

M3: Algorithm 2.1.17.

The results are shown in tables 2.2 and 2.3. The simulations were carried out on 10000 randomly generated Competitive MDPs with 4 states, 2 for player 1 and 2 for player 2. In each state 5 actions are available for the controlling player.

Table 2.2: Average number of pivoting operations for the 3 methods.

                  M1      M2      M3
  n. pivotings    40.59   41.87   24.93

Table 2.3: Percentage of games in which Mi > Mj, i.e. in which, for a fixed game, the number of pivoting operations of Mi is strictly smaller than that of Mj.

  > (%)   M1      M2      M3
  M1      -       52.85   18.57
  M2      42.18   -       15.26
  M3      80.05   82.75   -

It is evident that the algorithm M3 is much faster than the other two, but unfortunately its convergence is not proven yet. However, in our numerical experiment with 10000 randomly generated Competitive MDPs, it never cycles. The difference between M1 and M2 is due to the more efficient simplex method used by player 1 in M1.

2.2 Transient games

Let $p_t(s' \mid s)$ be the probability that the process is in state $s'$ at time $t$ given that $s$ is the initial state. Let us give the definition of transient games.

Definition 1. A game is transient if and only if $\sum_{t=0}^{\infty} \sum_{s' \in \mathcal{S}} p_t(s' \mid s, f, g)$ is finite for all $s \in \mathcal{S}$ and for any pair of stationary strategies $(f, g)$.
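As a side remark, for a fixed pair of pure stationary strategies the condition of Definition 1 can be verified numerically from the induced (sub)stochastic transition matrix $P$: the series $\sum_{t \ge 0} P^t$ is finite exactly when the spectral radius of $P$ is strictly smaller than 1, in which case it equals $(I - P)^{-1}$. A minimal sketch, assuming numpy:

```python
import numpy as np

def is_transient(P, tol=1e-12):
    """Check the transience condition for the chain induced by a fixed (f, g).

    P is the (sub)stochastic one-step transition matrix; sum_t P^t is finite
    iff the spectral radius of P is < 1.  Returns (transient, expected_visits),
    where expected_visits[s] = sum_t sum_{s'} p_t(s'|s) when finite, else None.
    """
    P = np.asarray(P, dtype=float)
    if np.max(np.abs(np.linalg.eigvals(P))) >= 1.0 - tol:
        return False, None
    visits = np.linalg.solve(np.eye(len(P)) - P, np.ones(len(P)))
    return True, visits

# toy example: from each state the process eventually leaves the system
print(is_transient([[0.5, 0.0], [0.3, 0.2]]))
```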

We now present the main result of this section.

Theorem 2.2.1. The uniform optimal strategies (f ∗, g∗) for a transient stochastic game with perfect information are optimal in the undiscounted criterion, i.e. β =1, as well.

Proof. The uniform optimal strategies are still optimal when $\rho \downarrow 0$, since the reduced costs of the tableaux built at the end of Algorithm 2.1.13 are non-negative when $\rho \downarrow 0$. We know from [32] that, for transient stochastic games, the reward associated to each pair of stationary strategies $(f, g)$ is finite. By invoking Abel's Theorem on power series [46], we claim that the reward associated to any stationary pair $(f, g)$ tends to the undiscounted reward when $\rho \downarrow 0$. Hence, the saddle-point relation (2.3) is still valid when $\rho = 0$ and $(f^*, g^*)$ are optimal under the undiscounted criterion as well.

2.3 Conclusions

In this section we dealt with zero-sum Competitive Markov Decision Processes with two players and perfect information, i.e. each state is either of nature or controlled by only one player. A finite set of states and actions is considered. There exist strategies for both players which are uniform optimal, i.e. at the Nash equilibrium for all $\beta$ sufficiently close to 1. Hence, the optimal value belongs to the non-Archimedean ordered field $F(\mathbb{R})$ of rational functions with real coefficients. According to this ordering, a rational function $f \in F(\mathbb{R})$ is said to be not smaller than $f'$ when $f(x) \ge f'(x)$ for all $x$ sufficiently close to 0. We proposed Algorithm 2.1.13 to compute the uniform optimal strategies for both players by extending an approach by Raghavan and Syed (2002, [78]) for a fixed discount factor to the field $F(\mathbb{R})$. This new method exploits the linear programming techniques for MDPs developed by Hordijk, Dekker, and Kallenberg (1985, [41]). Algorithm 2.1.13 is proven to converge to the uniform optimal strategies in finite time. We then developed Algorithm 2.1.17, which is a best response algorithm. According to extensive simulations, Algorithm 2.1.17 requires a smaller number of pivoting operations to reach the optimal solution than Algorithm 2.1.13 does. Nevertheless, we could only conjecture that Algorithm 2.1.17 does not cycle. As a by-product, both proposed algorithms also produce the range of optimality of the uniform optimal strategies, i.e. the interval of discount factors $\beta$ in which such strategies are at Nash equilibrium. The uniform optimal strategies are also optimal under the average criterion. In transient games, they are optimal under the undiscounted criterion as well, i.e. when $\beta = 1$.

Table 2.4: Optimum tableau for player 1 at the first iteration.

0.018+0.658ρ+ x1,2 x1,3 x2,1 x2,3 3.07ρ2+ 5.13ρ3+3.7ρ4+ ρ5 x1,1 0.0198+ 0.0234+ 0.0288+ 0.0297+ 0.087+1.707ρ+ 0.6698ρ+ 0.6934ρ+ 0.7468ρ+ 0.7527ρ+ 4.42ρ2+3.8ρ3+ 3.06ρ2+ 3.04ρ2+ 2.418ρ2+ 2.413ρ2+ ρ4 5.11ρ3+3.7ρ4+ 5.07ρ3+3.7ρ4+ 2.7ρ3+ρ4 2.69ρ3+ρ4 ρ5 ρ5 x2,2 0.0018+ 0.0054+ 0.027+0.756ρ+ 0.0279+ 0.059+ρ+ 0.022ρ+ 0.066ρ+ 3.149ρ2+ 0.767ρ+ 2.75ρ2+2.8ρ3+ 0.042ρ2+ 0.126ρ2+ 5.12ρ3+3.7ρ4+ 3.17ρ2+ ρ4 0.02ρ3 0.06ρ3 1ρ5 5.13ρ3+3.7ρ4+ 1ρ5 x3,1 −0.084ρ− −0.252ρ− 0.018+0.196ρ+ 0.018+0.154ρ− 0.1+1.36ρ+ 0.402ρ2− 1.206ρ2− 0.158ρ2− 0.043ρ2− 3.07ρ2+2.9ρ3+ 0.5ρ3−0.2ρ4 1.5ρ3−0.6ρ4 0.02ρ3 0.27ρ3−0.1ρ4 ρ4 x4,1 0.054+0.174ρ+ 0.162+0.522ρ+ 0.27+0.51ρ+ 0.297+0.597ρ+ 1.41+4.51ρ+ 0.18ρ2+0.06ρ3 0.54ρ2+0.18ρ3 0.21ρ2−0.03ρ3 0.3ρ2 6ρ2+3.9ρ3+ 1ρ4 x5,1 0.018+0.238ρ+ 0.054+0.714ρ+ 0.09+1.07ρ+ 0.099+1.189ρ+ 0.41+4.01ρ+ 0.64ρ2+ 1.92ρ2+ 1.77ρ2+ 2.09ρ2+ρ3 6.8ρ2+4.2ρ3+ 0.62ρ3+0.2ρ4 1.86ρ3+0.6ρ4 0.69ρ3−0.1ρ4 ρ4 0.1908+ 0.5544+ 0.909+3.373ρ+ 1.1034+ 4.924+ 1.2838ρ+ 3.1754ρ+ 0.229ρ2− 7.7329ρ+ 30.709ρ+ 3.891ρ2+ 7.945ρ2+ 14.525ρ3− 22.6785ρ2+ 74.775ρ2+ 7.028ρ3+ 12.884ρ3+ 25.79ρ4− 34.089ρ3+ 88.29ρ3+ 7.53ρ4+4.3ρ5+ 13.76ρ4+ 18.5ρ5−5ρ6 26.54ρ4+ 50.3ρ4+11ρ5 1ρ6 8.2ρ5+2ρ6 9.5ρ5+1ρ6 Uniform optimality in Competitive MDPs with perfect information 51

Table 2.5: Optimum tableau for player 2 at the first iteration.

0.288+2.308ρ+ x3,1 x3,3 x4,3 x4,2 6.04ρ2+ 7.32ρ3+4.3ρ4+ ρ5 x1,1 −0.0576− −0.1404− −0.0036+ −0.0432− 1.11+4.54ρ+ 0.0416ρ+ 0.5904ρ− 0.0964ρ+ 0.2472ρ− 6.83ρ2+4.4ρ3+ 0.246ρ2+ 0.93ρ2− 0.26ρ2+0.16ρ3 0.544ρ2− 1ρ4 0.33ρ3+0.1ρ4 0.68ρ3−0.2ρ4 0.54ρ3−0.2ρ4 x2,1 −0.0576− −0.054− −0.0036+ 0.0144+ 0.662+2.85ρ+ 0.208ρ− 0.152ρ− 0.056ρ+ 0.152ρ+ 4.68ρ2+3.5ρ3+ 0.254ρ2−0.1ρ3 0.15ρ2−0.05ρ3 0.216ρ2+ 0.356ρ2+ ρ4 0.26ρ3+0.1ρ4 0.32ρ3+0.1ρ4 x3,2 1.088ρ+ 1.136ρ+ 0.368ρ+ 0.192ρ+ 1.6+4.92ρ+ 4.584ρ2+ 4.416ρ2+ 1.404ρ2+ 0.608ρ2+ 6.5ρ2+4.1ρ3+ 6.76ρ3+4.3ρ4+ 6.36ρ3+4.1ρ4+ 1.6ρ3+0.6ρ4 0.6ρ3+0.2ρ4 ρ4 1ρ5 1ρ5 x4,1 −0.432− −0.306− 0.018+0.658ρ+ 0.216+1.956ρ+ 1.41+4.51ρ+ 2.442ρ− 1.466ρ− 3.07ρ2+ 5.5ρ2+6.96ρ3+ 6ρ2+3.9ρ3+ρ4 4.38ρ2− 2.46ρ2−1.8ρ3− 5.13ρ3+3.7ρ4+ 4.2ρ4+ρ5 3.27ρ3−0.9ρ4 0.5ρ4 ρ5 x5,1 −0.144− −0.234− −0.054− −0.072− 2.33+7.53ρ+ 0.214ρ− 0.594ρ− 0.134ρ− 0.292ρ− 8.9ρ2+4.7ρ3+ 0.24ρ2− 0.56ρ2−0.2ρ3 0.35ρ2− 0.42ρ2−0.2ρ3 1ρ4 0.27ρ3−0.1ρ4 0.37ρ3−0.1ρ4 2.3616+ 1.368+5.132ρ+ 0.8496+ 0.3456+ −12.232− 14.7176ρ+ 4.652ρ2− 5.1836ρ+ 1.7496ρ+ 56.642ρ− 35.132ρ2+ 4.782ρ3− 11.99ρ2+ 3.632ρ2+ 108.24ρ2− 43.526ρ3+ 11.87ρ4− 15.096ρ3+ 4.128ρ3+ 105.33ρ3− 30.65ρ4+ 8.2ρ5−2ρ6 11.64ρ4+ 2.6ρ4+0.7ρ5+ 51.5ρ4−10ρ5 11.9ρ5+2ρ6 5.2ρ5+1ρ6 2.3008e−006ρ6 52 Chapter 2: Competitive and Static Cooperative MDPs

Table 2.6: Optimum tableau for player 1 at the second iteration.

0.288+2.308ρ+ x1,2 x1,3 x2,1 x2,3 6.04ρ2+ 7.32ρ3+4.3ρ4+ ρ5 x1,1 0.306+2.324ρ+ 0.342+2.356ρ+ 0.4068+ 0.4158+ 1.11+4.54ρ+ 6.018ρ2+ 5.974ρ2+ 2.1148ρ+ 2.1228ρ+ 6.83ρ2+4.4ρ3+ 7.3ρ3+4.3ρ4+ 7.26ρ3+4.3ρ4+ 4.008ρ2+ 3.997ρ2+ ρ4 ρ5 ρ5 3.3ρ3+ρ4 3.29ρ3+ρ4 x2,2 0.018+0.058ρ+ 0.054+0.174ρ+ 0.378+2.478ρ+ 0.387+2.507ρ+ 0.662+2.85ρ+ 0.06ρ2+0.02ρ3 0.18ρ2+0.06ρ3 6.11ρ2+ 6.14ρ2+ 4.68ρ2+3.5ρ3+ 7.31ρ3+4.3ρ4+ 7.32ρ3+4.3ρ4+ ρ4 ρ5 ρ5 x3,1 −0.24ρ− −0.72ρ− 0.288+0.436ρ+ 0.288+0.316ρ− 1.6+4.92ρ+ 0.66ρ2− 1.98ρ2− 0.128ρ2− 0.202ρ2− 6.5ρ2+4.1ρ3+ 0.62ρ3−0.2ρ4 1.86ρ3−0.6ρ4 0.02ρ3 0.33ρ3−0.1ρ4 ρ4 x4,1 0.054+0.174ρ+ 0.162+0.522ρ+ 0.27+0.51ρ+ 0.297+0.597ρ+ 1.41+4.51ρ+ 0.18ρ2+0.06ρ3 0.54ρ2+0.18ρ3 0.21ρ2−0.03ρ3 0.3ρ2 6ρ2+3.9ρ3+ρ4 x5,1 0.126+0.586ρ+ 0.378+1.758ρ+ 0.63+2.09ρ+ 0.693+2.383ρ+ 2.33+7.53ρ+ ρ2+0.74ρ3+ 3ρ2+2.22ρ3+ 2.19ρ2+ 2.69ρ2+ρ3 8.9ρ2+4.7ρ3+ 0.2ρ4 0.6ρ4 0.63ρ3−0.1ρ4 ρ4 0.504+2.818ρ+ 1.224+5.858ρ+ 1.8+2.896ρ− 3.636+ 12.232+ 7.344ρ2+ 13.684ρ2+ 8.318ρ2− 18.583ρ+ 56.642ρ+ 11.15ρ3+ 20.09ρ3+ 29.624ρ3− 41.268ρ2+ 108.24ρ2+ 10.02ρ4+ 18.44ρ4+ 36.71ρ4− 49.431ρ3+ 105.33ρ3+ 4.9ρ5+ρ6 9.4ρ5+2ρ6 21.5ρ5−5ρ6 32.21ρ4+ 51.5ρ4+10ρ5 10.1ρ5+ρ6 Uniform optimality in Competitive MDPs with perfect information 53

Table 2.7: Optimum tableau for player 2 at the second iteration.

0.288+2.308ρ+ x3,1 x3,3 x4,2 x4,3 6.04ρ2+ 7.32ρ3+4.3ρ4+ ρ5 x1,1 −0.0576− −0.1404− −0.0432− −0.0036+ 1.11+4.54ρ+ 0.0416ρ+ 0.5904ρ− 0.2472ρ− 0.0964ρ+ 6.83ρ2+4.4ρ3+ 0.246ρ2+ 0.93ρ2− 0.544ρ2− 0.26ρ2+0.16ρ3 ρ4 0.33ρ3+0.1ρ4 0.68ρ3−0.2ρ4 0.54ρ3−0.2ρ4 x2,1 −0.0576− −0.054− 0.0144+ −0.0036+ 0.662+2.85ρ+ 0.208ρ− 0.152ρ− 0.152ρ+ 0.056ρ+ 4.68ρ2+3.5ρ3+ 0.254ρ2−0.1ρ3 0.15ρ2−0.05ρ3 0.356ρ2+ 0.216ρ2+ ρ4 0.32ρ3+0.1ρ4 0.26ρ3+0.1ρ4 x3,2 1.088ρ+ 1.136ρ+ 0.192ρ+ 0.368ρ+ 1.6+4.92ρ+ 4.584ρ2+ 4.416ρ2+ 0.608ρ2+ 1.404ρ2+ 6.5ρ2+4.1ρ3+ 6.76ρ3+4.3ρ4+ 6.36ρ3+4.1ρ4+ 0.6ρ3+0.2ρ4 1.6ρ3+0.6ρ4 ρ4 ρ5 ρ5 x4,1 −0.432− −0.306− 0.216+1.956ρ+ 0.018+0.658ρ+ 1.41+4.51ρ+ 2.442ρ− 1.466ρ− 5.5ρ2+6.96ρ3+ 3.07ρ2+ 6ρ2+3.9ρ3+ρ4 4.38ρ2− 2.46ρ2−1.8ρ3− 4.2ρ4+ρ5 5.13ρ3+3.7ρ4+ 3.27ρ3−0.9ρ4 0.5ρ4 ρ5 x5,1 −0.144− −0.234− −0.072− −0.054− 2.33+7.53ρ+ 0.214ρ− 0.594ρ− 0.292ρ− 0.134ρ− 8.9ρ2+4.7ρ3+ 0.24ρ2− 0.56ρ2−0.2ρ3 0.42ρ2−0.2ρ3 0.35ρ2− ρ4 0.27ρ3−0.1ρ4 0.37ρ3−0.1ρ4 2.3616+ 1.368+5.132ρ+ 0.3456+ 0.8496+ −12.232− 14.7176ρ+ 4.652ρ2− 1.7496ρ+ 5.1836ρ+ 56.642ρ− 35.132ρ2+ 4.782ρ3− 3.632ρ2+ 11.99ρ2+ 108.24ρ2− 43.526ρ3+ 11.87ρ4− 4.128ρ3+ 15.096ρ3+ 105.33ρ3− 30.65ρ4+ 8.2ρ5−2ρ6 2.6ρ4+0.7ρ5 11.64ρ4+ 51.5ρ4−10ρ5 11.9ρ5+2ρ6 5.2ρ5+ρ6 54 Chapter 2: Competitive and Static Cooperative MDPs

2.4 Stochastic Games for Cooperative Network Routing and Epidemic Spread

We consider a system where several providers share the same network and control the routing in disjoint sets of nodes. They provide connection toward a unique server (destination) to their customers. Our objective is to facilitate the design of the available network links and their costs, such that all the network providers are interested in cooperating and none of them withdraws from the coalition. More specifically, we establish the framework of a coalition game and we apply Algorithm 2.1.13, proposed in Section 2.1, to compute the transferable coalition values. As a by-product, we apply the proposed algorithm to two-player games both in networks subject to hacker attacks and in epidemic networks.

2.4.1 Introduction

Sharing resources among competitive operators is a fundamental issue in 4G wireless systems. Cooperation enables a better exploitation of the resources and promises higher revenues to network providers. However, cooperation among competitive entities is complicated by the sensitive issue of conflicting interests. Thus, it becomes imperative to motivate and guarantee a fair cooperation among these entities. This can be achieved by a careful distribution of the costs, or of the incremental revenues, obtained by cooperating. Coalition games offer a suitable theoretical framework to address this problem. Several providers share a network to provide connection towards a unique common destination to their customers. We provide a coalition game framework to facilitate the design of the available network links and their costs, such that there exists an optimum routing strategy and a cost sharing satisfying all the subsets of providers. More specifically, we provide algorithms to compute the coalition values, i.e. the minimum costs that each coalition can ensure for itself. The proposed algorithm is based on the results for two-player zero-sum Competitive MDPs with perfect information of Section 2.1. It is worth noticing that the analysed problem differs substantially from the noncooperative routing games thoroughly studied in the literature (for additional details see e.g. Nisan et al. 2007, [65] and references therein). To the best of the authors' knowledge, this work is the first one applying coalition games to determine an optimum routing solution and cost allocation in a shared network.

2.4.2 Routing model

We consider a network consisting of a set of nodes $V = \{1, \dots, N\}$. $P$ service providers (SPs) share the network to offer their customers connection toward a single destination node $N$. The customers' traffic is injected into the network at $n \le N-1$ nodes, called sources, located in $\mathcal{T} = \{v_1, \dots, v_n\} \subseteq V \setminus \{N\}$. There is only one destination, in node $N$. We assume that all the sources transmit the packets of a provider $k$ at the same rate, for all possible $k$. Let $c_k(i, j) > 0$ represent the cost per unit time that provider $k$ has to sustain to convey its own packets, sent by any of the sources in $\mathcal{T}$, through the link $i \to j$.

The $k$-th SP controls the routing, i.e. the activation of outgoing links, in the set of nodes $V_k$. We suppose that a node is controlled by at most one SP, i.e. $V_i \cap V_j = \emptyset$ for all $i \neq j$, and $\bigcup_i V_i \subseteq V$. Each node $i$ is assigned a subset $\alpha_i \subseteq V$, such that the directed link $i \to j$ can be activated if and only if $j \in \alpha_i$. In a generic node $i \in V_k$, SP $k$ can assign a probability distribution $f_k$ to the nodes $j \in \alpha_i$, such that the probability that the network link $(i, j)$ is utilized for routing is $f_k(i, j)$ at any routing decision moment. The destination node is a "sink", and it does not route the incoming packets to any of the other nodes. We remark that all the nodes $\{1, \dots, N-1\}$, the sources included, serve as routing nodes.

Let $\Phi_k^{(\beta)}$, with $\beta \in [0; 1]$, be an $N$-by-$1$ vector whose $i$-th component is the expected $\beta$-discounted sum of costs for a path originating in node $i$, i.e.

$$\Phi_k^{(\beta)}(i) = \mathrm{E}_{f_1, \dots, f_P}\!\left[ \sum_{t \ge 0} \beta^t\, c_k(V_t, V_{t+1}) \,\Big|\, V_0 = i \right],$$
where $V_t$ is the $t$-th node crossed by the packets. It is worth noticing that, for $\beta = 1$, $\Phi_k^{(1)}$ is a plain sum of costs and $\Phi_k^{(1)}(v_i)$ is the cost per unit time that SP $k$ incurs for the stream of packets going from the $i$-th source, $v_i$, to the destination.
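For a fixed stationary routing policy, $\Phi_k^{(\beta)}$ admits the closed form $(I - \beta P_f)^{-1} c_f$, where $P_f$ is the $N$-by-$N$ transition matrix induced by the policy (the destination row being all zeros, since the sink does not forward packets) and $c_f(i) = \sum_j f(i,j)\, c_k(i,j)$ is the expected one-step cost. A minimal numerical sketch with made-up data (the function name and the toy network are ours):

```python
import numpy as np

def discounted_path_cost(P_f, c_f, beta):
    """Phi_k^{(beta)} = (I - beta * P_f)^{-1} c_f for a fixed routing policy.

    P_f[i, j]: probability that link i -> j is used (destination row is zero).
    c_f[i]:    expected cost paid by provider k when leaving node i.
    Works for beta = 1 as well, provided the induced chain is transient.
    """
    P_f = np.asarray(P_f, dtype=float)
    c_f = np.asarray(c_f, dtype=float)
    return np.linalg.solve(np.eye(len(P_f)) - beta * P_f, c_f)

# toy 3-node network: nodes 0 and 1 route deterministically to node 2 (destination)
P_f = [[0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0]]
c_f = [2.0, 3.0, 0.0]
print(discounted_path_cost(P_f, c_f, beta=1.0))   # -> [2., 3., 0.]
```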

2.4.3 Routing Long-run Cooperative game

In this section we tackle the problem of cost distribution among the SPs by adopting a cooperative game theory approach. First, let us provide some preliminary notions and notation. Let $\mathcal{P} = \{1, \dots, P\}$ be the grand coalition of SPs. Let $\mathcal{C} \subseteq \mathcal{P}$ be a coalition of players. Let us define, for any $\beta \in [0; 1]$, the expected $\beta$-discounted sum
$$\Phi_{\mathcal{C}}^{(\beta)}(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}) = \sum_{\{k\} \in \mathcal{C}} \Phi_k^{(\beta)}(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}).$$
Here we are interested in the case $\beta = 1$, since the quantity $\sum_{i=1}^n \Phi_{\mathcal{C}}^{(\beta=1)}(v_i)$ is the total cost per unit time that $\mathcal{C}$ incurs to sustain its $n\,|\mathcal{C}|$ information streams.

[Figure 2.1 content: the routing policies $f_1, f_2, f_3$ of providers 1, 2, 3 yield the per-provider cost vectors $\Phi^{(1)}(f_1, f_2, f_3) = [8\ 9\ 12\ \dots]^T$, $\Phi^{(2)}(f_1, f_2, f_3) = [6\ 9\ 17\ \dots]^T$, $\Phi^{(3)}(f_1, f_2, f_3) = [8\ 9\ 16\ \dots]^T$; the triple of link costs $[c_1(i,j), c_2(i,j), c_3(i,j)]$ is specified next to each link, e.g. $[c_1(4,8), c_2(4,8), c_3(4,8)] = [3, 1, 1]$.]

Figure 2.1: Example of routing policy with 3 service providers, 3 transmitters, 13 nodes.

We actually assume that all providers cooperate with each other and coordinate their routing decisions in order to minimize the overall costs for the network. We define the value for coalition $\mathcal{P}$, $v(\mathcal{P})$, as the minimum global transmission cost:

$$v(\mathcal{P}) = \sum_{i=1}^{n} \Phi_{\mathcal{P}}^{(1)}(v_i, F^o) = \min_{f_{\mathcal{P}} \in F_{\mathcal{P}}} \sum_{i=1}^{n} \Phi_{\mathcal{P}}^{(1)}(v_i, f_{\mathcal{P}}),$$
where $F_{\mathcal{P}}$ is the set of strategies available to the grand coalition $\mathcal{P}$ and $F^o$ is the optimum global routing strategy. Under the Transferable Utility (TU) assumption, the total cost $v(\mathcal{P})$ can be shared in any manner among the SPs, thanks to a binding agreement among the members of $\mathcal{P}$. A Cooperative Game Theory approach (see Peleg and Sudhölter, 2007 [69]) suggests to first assign a value $v(\mathcal{C})$ to each coalition $\mathcal{C} \subset \mathcal{P}$ of SPs. All the cooperative solutions, like Shapley value, Core, Nucleolus etc. (see Chapter 1), which share $v(\mathcal{P})$ among the SPs according to different criteria, are indeed functions of $\{v(\mathcal{C})\}_{\mathcal{C} \subseteq \mathcal{P}}$.

In this section we focus on the computation of the coalition values. In the literature, there are several ways to compute the value of a coalition. One of the most utilized is arguably the minimax one, by von Neumann and Morgenstern (1944, [96]), suggesting that the value of a coalition $\mathcal{C}$ of SPs should be computed as the minimum cost that $\mathcal{C}$ can guarantee against any routing strategy of the anti-coalition $\mathcal{P}\setminus\mathcal{C}$, i.e.
$$v(\mathcal{C}) = \min_{f_{\mathcal{C}} \in F_{\mathcal{C}}} \max_{f_{\mathcal{P}/\mathcal{C}} \in F_{\mathcal{P}/\mathcal{C}}} \sum_{i=1}^{n} \Phi_{\mathcal{C}}^{(1)}(v_i, f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}), \qquad \forall\, \mathcal{C} \subset \mathcal{P}. \tag{2.11}$$
Hence, $v(\mathcal{C})$ can be interpreted as the value of the zero-sum game between $\mathcal{C}$ and the remaining SPs $\mathcal{P}\setminus\mathcal{C}$, which adopt an antagonistic behaviour towards $\mathcal{C}$. The coalition value $v(\mathcal{C})$ is a measure of the power of a coalition of SPs, rather than a depiction of a real antagonistic scenario. We point out that the minimax approach is important because it ensures the superadditivity property of the characteristic function $v$:

$$v(\mathcal{C}_1) + v(\mathcal{C}_2) \ge v(\mathcal{C}_1 \cup \mathcal{C}_2), \qquad \forall\, \mathcal{C}_1, \mathcal{C}_2 \subset \mathcal{P},\ \mathcal{C}_1 \cap \mathcal{C}_2 = \emptyset.$$
In the Appendix we show that Algorithm 2.1.13, developed in Section 2.1, can be utilized to compute the minimax coalition values $\{v(\mathcal{C})\}_{\mathcal{C} \subseteq \mathcal{P}}$. In Section 2.4.3 we show its adaptation to the transient case, in which no loops in the network can occur.

Remark 5. We dub the Cooperative game described above long-run since the interaction among the SPs is to be meant as one-shot: the costs are shared among the SPs at the beginning of the transmission, once and for all, and any chance of renegotiation is ruled out. In contrast, in Chapter 3 we will deal with a more dynamic situation, in which the payoffs are distributed to the players along the game, and coalitions are allowed to form throughout the game.

Algorithm for computing coalition values

The coalition values $v(\mathcal{C})$ may be infinite. If $v(\mathcal{C}) = +\infty$, then the optimal strategies for the players, i.e. the strategies at Nash equilibrium, impede at least one source-destination path by causing a loop in the network. In practice, $v(\mathcal{C}) = +\infty$ is not the cost that coalition $\mathcal{C}$ has to bear; anyway, it indicates that no service provider can accept to lose its own packets. In order to avoid infinities in the computation of coalition values, the idea is to compute the optimal strategies $(f^*_{\mathcal{P}/\mathcal{C}}, f^*_{\mathcal{C}})$, for coalitions $\mathcal{P}/\mathcal{C}$ and $\mathcal{C}$ respectively, for all the discount factors sufficiently close to 1. Then, we adopt the strategy that is still optimal in the limit $\beta \to 1$ to compute the coalition value.

In the Appendix, Lemma 2.4.7, we show that we are legitimated to utilize Algorithm 2.1.13 to compute $v(\mathcal{C})$, for all $\mathcal{C} \subseteq \mathcal{P}$, as the limit for $\beta \uparrow 1$ of the uniform value of the zero-sum game between $\mathcal{C}$ and $\mathcal{P}\setminus\mathcal{C}$. Before showing the algorithm, let us refresh the reader's memory with some useful definitions. Let $f_{\mathcal{P}/\mathcal{C}}$ be a pure strategy for coalition $\mathcal{P}/\mathcal{C}$. We say that the pure strategy $f'_{\mathcal{C}}$ is an improvement for coalition $\mathcal{C}$ with respect to $f_{\mathcal{C}}$ for the discount factor $\beta$ if and only if

$$\Phi_{\mathcal{C}}^{(\beta)}(f_{\mathcal{P}/\mathcal{C}}, f'_{\mathcal{C}}) \le \Phi_{\mathcal{C}}^{(\beta)}(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}),$$
where the relation $\le$ is component-wise and $<$ holds for at least one node in $V$. Let $\Gamma_{\mathcal{P}/\mathcal{C}}(f_{\mathcal{C}})$ be the optimization problem that $\mathcal{P}/\mathcal{C}$ faces when $\mathcal{C}$ fixes its own strategy $f_{\mathcal{C}}$. Then, the optimum strategy for $\mathcal{P}/\mathcal{C}$ in $\Gamma_{\mathcal{P}/\mathcal{C}}(f_{\mathcal{C}})$ maximizes $\Phi_{\mathcal{C}}^{(\beta)}(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})$ component-wise.

Algorithm 2.4.1. Set $\mathcal{C} \subseteq \mathcal{P}$. Consider only pure routing strategies, i.e. only strategies $f_k$ such that, for all $i \in V_k$, $\exists\, j : f_k(i, j) = 1$.

1. Pick a pure routing strategy $f_{\mathcal{C}}$ for coalition $\mathcal{C}$.

2. Find the best strategy $f_{\mathcal{P}/\mathcal{C}}$ for coalition $\mathcal{P}/\mathcal{C}$ in the optimization problem $\Gamma_{\mathcal{P}/\mathcal{C}}(f_{\mathcal{C}})$, for all the discount factors close enough to 1.

3. Find the first node controlled by coalition $\mathcal{C}$ in which a change of strategy $f'_{\mathcal{C}}$ is an improvement for coalition $\mathcal{C}$ for all the discount factors close enough to 1. If it does not exist, then set $(f^*_{\mathcal{P}/\mathcal{C}}, f^*_{\mathcal{C}}) := (f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})$ and go to step 4. Otherwise, set $f_{\mathcal{C}} := f'_{\mathcal{C}}$ and go to step 2.

4. Compute the coalition value

$$v(\mathcal{C}) = \lim_{\beta \to 1} \sum_{i=1}^{n} \Phi_{\mathcal{C}}^{(\beta)}(v_i, f^*_{\mathcal{P}/\mathcal{C}}, f^*_{\mathcal{C}}).$$
We remark that the optimal strategy in step 2 and the strategy refinement in step 3 are found with the help of simplex tableaux in the non-Archimedean ordered field $F(\mathbb{R})$ of rational functions with real coefficients (for any detail, see Section 2.1).
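For illustration only, the outer loop of Algorithm 2.4.1 can be sketched as follows; best_response, improvement and long_run_cost are assumed oracles standing in for the F(R) simplex computations of Section 2.1, and all the names are ours.

```python
def coalition_value(pure_strategies_C, best_response, improvement, long_run_cost):
    """Schematic outer loop of Algorithm 2.4.1 for one coalition C.

    best_response(f_C)         -> optimal pure strategy of P\\C against f_C
                                  (uniformly, for all beta close enough to 1);
    improvement(f_anti, f_C)   -> an improved pure strategy for C changing the
                                  action in a single node, or None;
    long_run_cost(f_anti, f_C) -> sum_i lim_{beta->1} Phi_C^{(beta)}(v_i, .).
    """
    f_C = next(iter(pure_strategies_C))          # step 1: any pure strategy of C
    while True:
        f_anti = best_response(f_C)              # step 2: best reply of P \ C
        f_C_new = improvement(f_anti, f_C)       # step 3: single-node improvement
        if f_C_new is None:
            return long_run_cost(f_anti, f_C)    # step 4: v(C)
        f_C = f_C_new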

Transient case

Under the assumption that no loops can ever be present in the network, Algorithm 2.4.1 can be simplified by considering directly $\beta = 1$. In other words, in this section we suppose that the following assumption holds.

Assumption 1. For any couple of pure strategies $(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})$ for $\mathcal{P}/\mathcal{C}$ and $\mathcal{C}$ respectively, and for all $i \in V$, there exists a path¹ $\tau_i(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})$ of finite length² $L_i(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})$ and without loops linking node $i$ to the destination node $N$.

The following result shows that the assumption above ensures $\Phi_{\mathcal{C}}^{(1)}$ to be finite, for any couple of strategies.

¹ A path is a sequence of connected nodes.
² The length of a path is the number of edges it is composed of.

[Figure 2.2 content: singleton coalition $\mathcal{C} = \{1\}$, anti-coalition $\mathcal{P}\setminus\mathcal{C} = \{2, 3\}$, coalition value $v(\mathcal{C}) = 12$.]

Figure 2.2: Value of the singleton coalition $\{1\}$ in the routing model of Figure 2.1. The continuous arrows are the activated links. The costs are specified next to each arrow.

Proposition 2.4.2. Suppose that Assumption 1 holds. Then, for all pure strategies $f_{\mathcal{P}/\mathcal{C}} \in F_{\mathcal{P}/\mathcal{C}}$, $f_{\mathcal{C}} \in F_{\mathcal{C}}$:

(i) the path τi(fP/C , fC ) is unique;

(ii) $\Phi_{\mathcal{C}}^{(1)}(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}) < +\infty$.

Proof. Let $\tau_i(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}) = \{V_0 = i, V_1, \dots, V_{L_i} = N\}$ be the nodes crossed by the path $\tau_i$ when $f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}$ are fixed. If there existed more than one path linking two nodes, then there would exist at least one node out of which more than one arc goes. This is impossible since the strategies are pure. Then, (i) is proved. Therefore, we can say that

$$\begin{cases} p_t(j \mid V_0 = i, f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}) = \mathbb{1}(j = V_t), & \forall\, t \in [1; L_i(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})] \\ p_t(j \mid V_0 = i, f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}) = 0, & \forall\, t > L_i(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}}) \end{cases}$$
where $p_t(j \mid V_0)$ is the probability that the $t$-th node crossed by the packets starting in node $V_0$ is $j$. Thus, for all $i \in V$, the $i$-th component of $\Phi_{\mathcal{C}}^{(1)}(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})$ is bounded by
$$L_i(f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})\, |\mathcal{C}| \max_{k, i, j} c_k(i, j) < +\infty.$$

If Assumption 1 holds, then Algorithm 2.4.1 can be adapted as follows (see Lemma 2.4.8).

Algorithm 2.4.3. Set $\mathcal{C} \subseteq \mathcal{P}$.

1. Pick a pure routing strategy $f_{\mathcal{C}}$ for coalition $\mathcal{C}$.

2. Find the best strategy $f_{\mathcal{P}/\mathcal{C}}$ for coalition $\mathcal{P}/\mathcal{C}$ in the optimization problem $\Gamma_{\mathcal{P}/\mathcal{C}}(f_{\mathcal{C}})$, for $\beta = 1$.

3. Find the first node controlled by coalition $\mathcal{C}$ in which a change of strategy $f'_{\mathcal{C}}$ is an improvement for coalition $\mathcal{C}$, for $\beta = 1$. If it does not exist, then set $(f^*_{\mathcal{P}/\mathcal{C}}, f^*_{\mathcal{C}}) := (f_{\mathcal{P}/\mathcal{C}}, f_{\mathcal{C}})$ and go to step 4. Otherwise, set $f_{\mathcal{C}} := f'_{\mathcal{C}}$ and go to step 2.

4. Set $v(\mathcal{C}) = \sum_{i=1}^{n} \Phi_{\mathcal{C}}^{(1)}(v_i, f^*_{\mathcal{P}/\mathcal{C}}, f^*_{\mathcal{C}})$.

We remark that Algorithm 2.4.3 is analogous to the one described by Raghavan and Syed (2002, [78]) when $\beta = 1$ and restricted to the transient case, with the difference that in step 2 the search of coalition $\mathcal{P}/\mathcal{C}$ is not necessarily lexicographic. Indeed, at each iteration $\mathcal{P}/\mathcal{C}$ is allowed to find its own temporarily optimal strategy with any Markov Decision Process solving method.

2.4.4 Network design

Although network design is not our purpose, we suggest which steps could be followed in this direction. A network designer should aim at devising both the routing decisions $\alpha_i$ available to each provider in each node $i \in V$ and the costs of the links $c_k(i, j)$, in order to ensure that each coalition of providers has an interest in not deviating from the global optimum policy $F^o$. Formally, a network designer should ensure the non-emptiness of the Core of the TU (transferable utility) coalition game $(\mathcal{P}, v)$, i.e. of the set $Co$ of cost allocations $g$ that the providers can share among themselves through binding agreements, such that

$$\begin{cases} \sum_{k=1}^{P} g_k = v(\mathcal{P}) \\ \sum_{\{k\} \in \mathcal{C}} g_k \le v(\mathcal{C}), & \forall\, \mathcal{C} \subset \mathcal{P}. \end{cases}$$
We see from the former equation that the Core is globally efficient for the network, and from the latter that it is also stable with respect to the formation of greedy coalitions.
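Whether such a cost allocation g exists, i.e. whether the Core of the cost game (P, v) is nonempty, is a linear feasibility problem. A minimal sketch, assuming scipy is available and that the coalition values have already been computed (e.g. with Algorithm 2.4.1); the function name and the toy values are ours:

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def core_is_nonempty(values, n_players):
    """Feasibility of: sum_k g_k = v(P) and sum_{k in C} g_k <= v(C), C strictly in P.

    `values` maps frozensets of players (1..n_players) to coalition costs v(C).
    """
    players = list(range(1, n_players + 1))
    A_ub, b_ub = [], []
    for size in range(1, n_players):
        for C in combinations(players, size):
            A_ub.append([1.0 if k in C else 0.0 for k in players])
            b_ub.append(values[frozenset(C)])
    A_eq = [[1.0] * n_players]
    b_eq = [values[frozenset(players)]]
    res = linprog(c=np.zeros(n_players), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=[(None, None)] * n_players)
    return res.success

# toy 2-provider example with v({1}) = 5, v({2}) = 6, v({1,2}) = 8
print(core_is_nonempty({frozenset({1}): 5, frozenset({2}): 6,
                        frozenset({1, 2}): 8}, 2))
```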

In the following sections we adapt the competitive game between coalition $\mathcal{C}$ and the anti-coalition $\mathcal{P}\setminus\mathcal{C}$, examined so far only to compute the coalition value $v(\mathcal{C})$, to three other scenarios.

2.4.5 Hacker-Provider routing game

The competitive routing game between two conflicting coalitions described in Section 2.4.2 can also be re-interpreted in the framework of the conflict between one service provider and one hacker. There is a set $V_1 \subseteq V$ of vulnerable nodes, in which the routing control may be seized by a hacker. $V_0$ is the set of nodes in which the routing is handled by a service provider. The set $V_2 = V_0 / V_1$ is the set of unattackable nodes among the ones controlled by the service provider. Each link $i \to j$ is assigned a cost $c(i, j) > 0$, which in this case can also be interpreted as a delay, i.e. the time that a packet spends to go from node $i$ to node $j$. In such a case, let us assume that the nodes are capable of re-directing all the incoming packets as soon as they receive them, without any additional delay due to buffering. The service provider here wants to find the routing rule that jointly minimizes the packet delay $\Phi^{(1)}$ for all the sources; conversely, the hacker wants to slow down the network.

As in Section 2.4.2, there may be some couple of strategies for the two players for which there exist loops in the network, that cause the packet delay from some sources to be infinite. Note that the hacker can also disrupt some nodes, by forcing a loop on them. Hence, here we deal with the general case of undiscounted Competitive MDPs described in the Appendix. The undiscounted optimal strategies can be computed by the algorithm 2.4.1, in which player 1 is now the hacker which controls nodes V1, and player 2 is the service provider, which controls nodes V2. Note that in this case, in contrast with the coalition game, we are more interested in the computation of the optimal strategies, than in the value of the game at the Nash equilibrium. Indeed, the optimal strategy for the ser- vice provider is the pure routing policy it should adopt in order to minimize the source-wise packet delay in the worst case scenario. The worst situation for the provider is when the hacker is able to control all the vulnerable nodes V1 and has at its disposal as many routing policies as possible. Note that the optimal strategies for both players are pure, i.e. the routing policy is deterministic in each node.

2.4.6 Natural disaster

Let us reformulate the model described in Section 2.4.5, where player 1 is now a natural agent that can put out of order some nodes $V_1 \subset V$ of the network, independently of the routing action taken by the service provider in such nodes. This addresses the practical situation in which the nodes $V_1$ are located in areas subject to catastrophic natural phenomena. It is straightforward to see that the computation of the optimal strategies for the service provider boils down to the calculation of the uniform optimal solution of a Markov Decision Process (see Hordijk et al., 1985 [41]), in which the set of nodes of interest is reduced to $V_2$, that is, the collection of nodes controlled by the service provider.

2.4.7 Epidemic network

In this section we describe our third and last competitive game scenario, inspired by the routing game between conflicting coalitions. We model an epidemic network with $N$ nodes; $N-1$ possibly infected individuals are located in nodes $\{1, \dots, N-1\}$, respectively. Each individual can infect, with some probability, only one among a subset of other individuals in its neighbourhood. There is a probability $\mu_i$ that the infection process starts from the $i$-th individual. The infection spread terminates when the virus reaches the healer, located in node $N$. Hence, there is a probability $\mu_N$ that the epidemic spread is averted. There are two players: player 2, the "good" one, wants to design and force the connections among the individuals such that the lowest expected number of individuals is infected, while player 1 has the opposite goal. The assumption of perfect information still holds, i.e. the sets of nodes in which player 1 and player 2 have more than one action available are disjoint. The formulation of the problem is analogous to the two-player game described in Section 2.4.2, in which the cost of the link $(i, j)$ is 1 for all nodes $i, j$. The nodes are substituted by the individuals, the destination by the healer, the sources become the first infected entities, and the packet routing is replaced by the virus transmission. In this context, we wish $\sum_{i=1}^{N} \mu_i \Phi^{(1)}(i)$ to represent the average number of infected individuals. Therefore, for each couple of routing strategies, no loops in the network are allowed, i.e. we suppose that Assumption 1 holds. Hence, thanks to Proposition 2.4.2, for every couple of pure stationary strategies $(f, g)$,

$$\sum_{i=1}^{N} \mu_i\, \Phi^{(1)}(i, f, g) \tag{2.12}$$
is actually the expected number of infected individuals.

We can use the algorithm 2.4.3 to find the optimal strategy for the “good” player, who is interested in minimizing the objective function (2.12). If (f ∗, g∗) are the undiscounted optimal strategies, then the value

$$\sum_{i=1}^{N} \mu_i\, \Phi^{(1)}(i, f^*, g^*)$$
is the worst-case estimate for player 2 of the expected number of infected individuals.

2.4.8 Conclusions

Several providers share the same network and control the routing in disjoint sets of nodes. There are several information sources and one destination.

By using the framework of Competitive MDPs, we provided algorithms to compute the minimum costs that each coalition of providers can ensure for itself. This helps the optimum design of a network, which should guarantee the existence of an efficient and stable costs partition among the providers. We also modelled situations in which there are two players with conflict- ing interests, like a hacker against a service provider, or in which a service provider wants to reduce the damages to the network caused by a natural disaster. An epidemic spread network model was shown as well. From a the- oretical perspective, we extended some results on uniform optimal strategies in Competitive MDP to the case of undiscounted criterion.

2.4.9 Appendix

Minimax routing game as a Competitive MDP

Indeed, our routing game between $\mathcal{C} \subset \mathcal{P}$ and $\mathcal{P}\setminus\mathcal{C}$ can be interpreted as a zero-sum Competitive MDP with two players and perfect information. For this purpose we adopt the same notation utilized in Section 2.1. Player 2 is the coalition $\mathcal{C} \subset \mathcal{P}$, while player 1 is the rest of the providers $\mathcal{P}/\mathcal{C}$. There exists a bijective association between the network nodes $V$ and the states $\mathcal{S}$. Let $\mathcal{S}_1$ and $\mathcal{S}_2$ be the sets of states associated to the sets of nodes $\bigcup_{\{k\}\in\mathcal{P}/\mathcal{C}} V_k$ and $\bigcup_{\{k\}\in\mathcal{C}} V_k$, respectively. The network link $i \to j$ is activated if and only if player $k$ selects the action $a_j^{(k)}(s_i)$, where $j \in \alpha_i$ and $k : s_i \in \mathcal{S}_k$. The instantaneous reward is $r(s_i, a_j^{(k)}(s_i)) = \sum_{\{p\}\in\mathcal{C}} c_p(i, j)$, where $k$ is the player that controls the node $i$. The transition probability is $p(s_w \mid s_i, a_j^{(k)}(s_i)) = \mathbb{1}(w = j)$, where $\mathbb{1}$ is the indicator function. Note that $\sum_{s' \in \mathcal{S}} p(s' \mid s, f, g) = 1$ for all $s \in \mathcal{S} \setminus \{s_N\}$ and for each couple of stationary strategies $(f, g)$. The destination node is a "sink", i.e. $p(s_i \mid s_N) = 0$ for all $i \in [1; N]$, and no actions are available in it for either player.

Undiscounted criterion with positive rewards

We want to prove that Algorithm 2.4.1 actually computes the value of any coalition. Before doing so, let us state two important theorems.

Theorem 2.4.4 (Abel's Theorem on power series). Let the power series $h(x) = \sum_{n=0}^{\infty} a_n x^n$ have radius of convergence $r$ and still converge for $x = r$. Then $\lim_{x \uparrow r} h(x) = h(r)$.

Theorem 2.4.5 ([46]). Let $\sum_{k \ge 0} c_k$ be a divergent series of positive terms. Then
$$\lim_{x \uparrow 1} \sum_{k \ge 0} x^k c_k = +\infty.$$
Now we can state the following.

Corollary 2.4.6. Let $\sum_{k \ge 0} c_k$ be a series of positive terms and $\xi \in \mathbb{R}$. Then
$$\lim_{x \uparrow 1} \sum_{k \ge 0} x^k c_k = \xi \iff \sum_{k \ge 0} c_k = \xi, \qquad \lim_{x \uparrow 1} \sum_{k \ge 0} x^k c_k = +\infty \iff \sum_{k \ge 0} c_k = +\infty.$$

Proof. For the "if" implications, see Theorems 2.4.4 and 2.4.5. About the "only if" implications, we know (see Knopp, 1990 [46]) that a positive term series either converges or diverges to $+\infty$. If $\sum_{k \ge 0} c_k = \xi_1 \neq \xi$, then $\lim_{x \uparrow 1} \sum_{k \ge 0} x^k c_k = \xi_1$ by Theorem 2.4.4. Hence, both the ($\Leftarrow$) relations are proved by contradiction.

Now we are ready to state the following result.

Lemma 2.4.7. Suppose that all the instantaneous rewards are non-negative. Let us utilize the extended real line, i.e. treat $\pm\infty$ as a number ($-\infty < a < +\infty$ for all $a \in \mathbb{R}$). Then the uniform optimal strategies are optimal under the undiscounted criterion as well, i.e.
$$\Phi^{(1)}(f, g^*) \le \Phi^{(1)}(f^*, g^*) \le \Phi^{(1)}(f^*, g), \qquad \forall\, f, g. \tag{2.13}$$
Hence, Algorithm 2.4.1 actually computes the value of any coalition.

Proof. By definition, the saddle point relation (2.13) is valid for all $\beta$ sufficiently close to 1, and hence also in the limit $\beta \uparrow 1$. Then, it is still valid for $\beta = 1$ by Corollary 2.4.6.

Transient games

Definition 2. Let $p_t(\cdot \mid s)$ be the transition probability from state $s$ after $t$ steps. A Competitive MDP is transient if
$$\sum_{t=0}^{\infty} \sum_{s' \in \mathcal{S}} p_t(s' \mid s, f, g) < +\infty \tag{2.14}$$
for each $s \in \mathcal{S}$ and all pure stationary strategies $f$ and $g$.

Note that the Competitive MDP in Section 2.4.3 is transient.

Lemma 2.4.8. Algorithm 2.4.3 provides the undiscounted optimal strategies for transient Competitive MDPs.

Proof. In transient Competitive MDPs with bounded instantaneous payoffs, the undiscounted reward is also bounded, for each couple of stationary strategies (see Filar and Vrieze, 1997 [32]). Furthermore, under the transient condition, the uniform optimal strategies are optimal under the undiscounted criterion as well (see Section 2.1). It is straightforward to prove that all the elements, belonging to $F(\mathbb{R})$, of the simplex tableaux built throughout Algorithm 2.4.1 are right continuous in $\rho = 0$ (or, equivalently, left continuous in $\beta = 1$). Therefore, we are allowed to shift the ordered field on which the algorithm works from $F(\mathbb{R})$ to $\mathbb{R}$, with $\beta = 1$.

Chapter 3

Dynamic Cooperative MDPs

3.1 Cooperative MDPs: Time Consistency, Greedy Players Satisfaction, and Cooperation Mainte- nance

We deal with multi-agent Markov Decision Processes (MDPs) in which co- operation among players is allowed. Unlike Section 2.4, coalitions may form throughout the game, and a payoff needs to be assigned to the players during the game. We find a Cooperative Payoff Distribution Procedure on MDPs (MDP-CPDP) that distributes in the course of the game the payoff that players would earn in the long run game. We show under which conditions such a MDP-CPDP fulfills a time consistency property, contents greedy players, and strengthen the coalition cohesiveness throughout the game. Fi- nally we refine the concept of Core for Cooperative MDPs, by utilizing the notion of Cooperation Maintenance.

3.1.1 Introduction

In static cooperative game theory, in which only one static game is played, the main challenge is to devise a procedure that shares the total reward earned by the whole community of players among the players themselves, and that complies with an agreeable definition of "fairness" (e.g. Peleg and Sudhölter 2007, [69]). When the interaction among the players is reiterated over time, it is reasonable to assume that the players demand to be rewarded in the course of the game, and the issue of designing such an allocation procedure

has drawn much attention in the last few decades, especially in the field of cooperative differential games. Such games address the realistic situation in which the interaction among several players (e.g. countries, firms, business partners, etc.) spans a certain period of time, and the environment in which the players operate (commonly called "state") changes according to a differential equation. A contract signed by all the players dictates how to share a certain payoff among the participants during the game. The bulk of the literature on cooperative differential games deals with the design of a payoff distribution procedure fulfilling a sensible time consistency property, under which no coalition of players is enticed to breach the agreement at any stage of the game (see Zaccour, 2008 [102] and references therein). A different situation is considered by repeated cooperative games, which model situations in which the same game is repeatedly played over time and players can cooperate and form coalitions throughout the duration of the game. The papers by Oviedo (2000, [66]) and by Kranich, Perea, and Peters (2001, [47]) are the two independent pioneering works in this field.

While the theory of competitive Markov Decision Processes (MDPs), otherwise called non-cooperative stochastic games, has been thoroughly studied (Filar and Vrieze, 1997 [32] for an extensive survey), to the best of the authors’ knowledge, there is very little work on cooperative MDPs in the literature. Unlike classic repeated games, in which the same game is played repeatedly over time, in cooperative MDPs several different static games follow one another. Unlike differential games, in our model the static games follow a discrete-time Markov chain, whose transition probabilities depend on the players’ actions in each state. Players can decide whether to join the grand coalition or, throughout the game, to form coalitions. The payoff earned by a coalition is, under the transferable utility (TU) assump- tion, shared among its participants. Once a group of players has withdrawn from the grand coalition, it cannot rejoin it later on.

Petrosjan (2006, [73]), in his pioneering work, proposed a time consistent cooperative payoff distribution procedure (CPDP) in cooperative games on finite trees. In this paper we deal with discount cooperative MDPs, in which the payoffs at each stage are multiplied by a discount factor and summed up over time. Our game model is in fact more general than the one by Petros- jan (2006, [73]), since we allow for cycles on the state space and we do not impose the finiteness of the game horizon. We also point out that our model is different from the one proposed by Predtetchinski (2007, [75]), since we assume that the utility of the coalitions is transferable and the probability transitions among the static games does depend on the players’ actions in each stage.

The paper is organized as follows. Section 3.1.2 is a short survey on non-cooperative and cooperative multi-agent MDPs. Following the lines of Petrosjan's work, in Section 3.1.3 we propose a stationary stage-wise CPDP for cooperative discounted MDPs (MDP-CPDP). In Section 3.1.4 we prove that the MDP-CPDP satisfies what we call the "terminal fairness property", i.e. the expected discounted sum of payoff allocations belongs to a cooperative solution (i.e. Shapley Value, Core, etc.) of the whole discounted game. In Section 3.1.5 we show that the MDP-CPDP fulfills the time consistency property, which is a crucial one in repeated games theory (e.g. Filar and Petrosjan, 2000 [31]): it suggests that a payoff distribution procedure should respect the terminal fairness property in a sub-game starting from any state, at any time step. In Section 3.1.6 we show that, under some conditions, for all discount factors small enough, also the greedy players having a myopic perspective of the game are satisfied with the MDP-CPDP. In Section 3.1.7 we deal with perhaps the most meaningful attribute for a CPDP, which is the n-tuple step cooperation maintenance property. It claims that, at each stage of the game, the long run reward that each group of players expects to gain by withdrawing from the grand coalition after n steps should be less than what it would earn by sticking to the grand coalition forever. In some sense, if such a condition is fulfilled for all integers n, then no players are enticed to withdraw from the grand coalition. We find that the single step cooperation maintenance property, first introduced in a deterministic setting by Mazalov and Rettieva (2010, [59]), is the strongest one among all n. Furthermore, we give a necessary and sufficient condition, inspired by the celebrated Bondareva-Shapley Theorem (Bondareva, 1963 [22]; Shapley 1967, [83]), for the existence of an MDP-CPDP satisfying the n-tuple step cooperation maintenance property, for any integer n. Inspired by this property, we propose a refinement of the Core solution concept for cooperative MDPs, dubbed the "Cooperation Maintaining solution". Finally, Section 3.1.9 deals with a special case of our model, entailing that the transition probabilities among the states do not depend on the players' strategies.

A lexical remark. We define the "stage" of the game at time $t$ as the random state that the game finds itself in at time $t$.

Some notation remarks. The ordering relations $<, >$, if referred to vectors, are component-wise, as well as the max and min operators. The entry that lies in the $i$-th row and in the $j$-th column of matrix $A$ is written as $A_{i,j}$. An equivalent notation for the $n$-by-$m$ matrix $A$ is $[A_{i,j}]_{i=1,j=1}^{n,m}$. The $i$-th element of column vector $a$ is denoted by $a_i$. The expression $\mathrm{val}(A)$ stands for the value (e.g. Filar and Vrieze, 1997 [32]) of the matrix $A$. Let $\{C_i\}_i$ be a collection of sets; we define the sum set $\sum_i C_i$ as $\{\sum_i c_i : c_i \in C_i, \forall i\}$.

3.1.2 Discounted Cooperative MDPs

In a multi-agent Markov Decision Process (MDP) $\Gamma$ with $P > 1$ players there is a finite set of states $\mathcal{S} := \{s_1, s_2, \dots, s_N\}$, and for each state $s$ the set of
, a ) on the state space are assigned. | 1 P S Hence, in each state s the static game Ω ( , A (s), r (s, .)) is played, and s ≡ P i i the states succeed one another following a Markov chain controlled by the players’ actions. Let := 1,...,P be the grand coalition. We assume that any subset P { } of players Λ can withdraw from the grand coalition and form a coalition ⊆ P at stage of the game, and all the players are compelled to play throughout the whole duration of the game. Moreover, once a coalition is formed, it can no longer rejoin the grand coalition in the future.

Let AΛ(s) := i∈Λ Ai(s) be the set of actions available to coalition Λ in state s, for all s . A stationary strategy f for the coalition Λ is Q ∈ S Λ a probability distribution on A (s), such that f (a s) is the probability Λ Λ | that the coalition Λ chooses the action a A (s) in state s. We define ∈ Λ F as the set of stationary strategies for coalition Λ . Let Λ ,Λ two Λ ⊆ P 1 2 disjoint nonempty coalitions. Then, F F F . If for every s Λ1 ∪ Λ2 ⊂ Λ1∪Λ2 ∈ S there exists a(s) such that f (a(s) s) = 1, then the stationary strategy f is Λ | Λ dubbed “pure”. Let us define the transition probability distribution on the state space , S given the independent strategies f F , f F , as Λ ∈ Λ P\Λ ∈ P\Λ p(s′ s, f , f ) := p(s′ s, a , a ) f (a s) f (a s), | Λ P\Λ | Λ P\Λ Λ Λ| P\Λ P\Λ| aΛ∈XAΛ(s) aP\Λ∈XAP\Λ(s) for all s,s′ . Analogously, let r (s, f , f ) be the expected instanta- ∈ S i Λ P\Λ neous reward for player i in state s. Let

rΛ(s, fΛ, fP\Λ) := ri(s, fΛ, fP\Λ) Xi∈Λ be the bounded and deterministic reward gained by the coalition Λ in state s. We assume that the rewards are geometrically discounted over time, and β [0; 1) is the discount factor. We define Φ(β)(s, .) as the expected β- ∈ Λ discounted long run reward for coalition Λ when the initial state of the ⊆ P game is sk:

∞ Φ(β)(s, f , f ) := E βtr (S , f , f ) S = s s , Λ Λ P\Λ Λ t Λ P\Λ 0 ∀ ∈ S t=0 ! X where St is the stage of the game at time t. Hence, we can write the vector Cooperative MDPs 69

(β) T ΦΛ (.) := [ΦΛ(s1, .),..., ΦΛ(sN , .)] as ∞ (β) t t ΦΛ (fΛ, fP\Λ)= β P (fΛ, fP\Λ) rΛ(fΛ, fP\Λ) t=0 X −1 = I βPt(f , f ) r (f , f ), (3.1) − Λ P\Λ Λ Λ P\Λ where P(fΛ, fP\Λ) is the N -by- N transition probability matrix and rΛ(.) := T (β)∗ [rΛ(s1, .), . . . , rΛ(sN , .)] . Let fP be the global optimum strategy for the grand coalition , i.e. P (β)∗ (β) fP = argmax ΦP (fP ), β [0; 1), (3.2) fP ∈FP ∀ ∈ where the maximization is component-wise. For simplicity of notation, we ∗(β) (β)∗ will denote P := P(fP ), which is the transition probability matrix as- (β)∗ sociated to the global optimal stationary strategy fP , whose (i, j) element is p(s s , f (β)∗). j| i P Let Γ be the long run game Γ starting in state s . For any β [0; 1) s ∈ S ∈ and for every state s, we assign to each coalition Λ a value v(β)(Λ, Γ ) R. s ∈ Under the transferable utility (TU) condition, the value of a coalition can be shared in any manner among the members of the coalition itself. Hence, the set of feasible allocations for coalition Λ in the game Γ is (β)(Λ, Γ ), ⊆ P s V s where (β) |Λ| (β) (Λ, Γs) := x R : xi v (Λ, Γs) . V ( ∈ ≤ ) Xi∈Λ It is widely accepted to assign to the empty coalition a null utility, i.e. v(β)( , Γ ) = 0. {∅} s Throughout the paper, if not specified, we always consider nonempty coali- tions. We consider the value associated to the grand coalition v(β)( , Γ ) to P s be the biggest achievable discounted sum of reward in the game Γs: v(β)( , Γ )= Φ(β)(s, f (β)∗). P s Λ P (β) In many applications it makes sense to define the coalition value v (Λ, Γs) as the maximum total reward that coalition Λ can ensure for itself in the β-discounted long run game Γs (von Neumann and Morgenstern, 1944 [96]), i.e.

(β) (β) v (Λ, Γs) := max min ΦΛ (s, fΛ, fP\Λ) (3.3) fΛ∈FΛ fP\Λ∈FP\Λ Nevertheless, we will consider the specific value formulation in (3.3) solely in Sections 3.1.6 and 3.1.8. Next we provide some useful definitions and preliminary results. 70 Chapter 3: Dynamic Cooperative MDPs

Definition 3 (Linear combination of games). Let (∆ , Λ) be the set of V i feasible allocations for the coalition Λ in the game ∆ , for i = 1,...,N. ⊆ P i The linear combination i bi∆i is a game in which the set of feasible alloca- tions for the coalition Λ, ( b ∆ , Λ), equals the Minkowski sum b (∆ , Λ). PV i i i i iV i Proposition 3.1.1. Let ∆P1,..., ∆N be N games with transferableP utilities. Let v(Λ, ∆ ) be the value of coalition Λ in the game ∆ . Let b 0, i ⊆ P i i ≥ for all i = 1,...,N. Then, i bi∆i is a TU game in which the value of the coalition Λ is ⊆ P P N v Λ, bi∆i = biv(Λ, ∆i). ! Xi=1 Xi Proof. Let

(Λ) := x RP : x b v(Λ, ∆ ) . V  ∈ i ≤ i i   i:{Xi}∈Λ Xi  e We have to prove that, for all Λ , ( b ∆ , Λ) =  b (∆ , Λ) = ⊆ P V i i i i iV i (Λ). Let the real Λ -tuple c(i) (∆ , Λ), for all i. It is straightforward V | | ∈V i P P to see that b c(i) (Λ). Then, b (∆ , Λ) (Λ). Let us fix the i i ∈ V i iV i ⊆ V reale P -tuple c (Λ). We define I := i : b > 0 . We need to find P ∈ V P { i } c′(i) (∆ , Λ) suche that b c′(i) = c. Lete c′ (i) = c /( I b ) { ∈ V i }i∈I i∈I i j j | | i for all j suche thate j / Λ. To determine the remaining I Λ elements { } ∈ P | || | c′ (i), i I, j : j Λ , we introduce the following set of inequalities: { j ∀ ∈ { }∈ } e e b c′ (i)= c j : j Λ i∈I i j j ∀ { }∈ (3.4) c′ (i) v(Λ, ∆ ) i I Pj:{j}∈Λ j ≤ i ∀ ∈ e Let us prove that (3.4)P admits a solution. Let ǫ 0, for all i I, be such i ≥ ∈ that ǫ = b v(Λ, ∆ ) c 0 (3.5) i i i − j ≥ Xi∈I Xi∈I j:{Xj}∈Λ We write the following linear system e

b c′ (i)= c j : j Λ i∈I i j j ∀ { }∈ (3.6) b c′ (i)= b v(Λ, ∆ ) ǫ i I Pi j:{j}∈Λ j i i − i ∀ ∈ e Evidently, any solutionP to (3.6) is also a solution to (3.4). Thanks to (3.5), the sum of the first Λ equations of (3.6) equals the sum of the remaining I | | | | equations. By discarding the last equation of (3.6) we obtain a linear system with Λ + I 1 linearly independent equations in Λ I > Λ + I 1 | | | |− | || | | | | |− unknowns. Hence, a solution to (3.6) exists and b (∆ , Λ) V (Λ). i iV i ⊇ Then, i bi (∆i, Λ) = (Λ) and the thesis is proven. V V P e P e Cooperative MDPs 71

Still, we could consider the long run game Γs as a classic static coop- erative game, solely characterized by the set of players and the coalition P values v(β). Therefore we can still assign to it a classic solution concept.

Definition 4 (Terminal cooperative solution). Set β [0; 1). The termi- (β) ∈ nal cooperative solution T (Γs) is a set-valued function which represents a static cooperative solution (e.g. Shapley value, Core, etc.) of the long run game Γs starting in state s, i.e.

T(β)(Γ ) : v(β)(Λ, Γ ) RP , s . s { s }Λ⊆P → ∀ ∈ S (β) Analogously, we define T ( i biΓsi ) as the terminal cooperative solu- tion of the cooperative game with coalition values v(β)(Λ, b Γ ) . P { i i si }Λ⊆P P The terminal cooperative solution T(β) can represent any of the classical cooperative solutions. For example, T Co represents the Core of the ≡ β-discounted game Γs, that is the set, possibly empty, of the real P -tuples x satisfying x = v(β)( , Γ ) i∈P i P s (3.7) x v(β)(Λ, Γ ), Λ . Pi∈Λ i ≥ s ∀ ⊂ P A game with nonemptyP Core is said to be balanced. The strict Core (β) sCo (Γs) is defined as in (3.7), but with the strict inequality signs. The terminal cooperative solution T h(β)(Γ ) stands for the Shapley ≡ S s value of the β-discounted game Γs, i.e. for all i = 1,...,P , Λ ! (P Λ 1)! h(β)(Γ )= | | −| |− v(β)(Λ i , Γ ) v(β)(Λ, Γ ) . S i s P ! ∪ { } s − s Λ⊆PX/{i} h i We finally present a linearity property of the Core and Shapley value.

Proposition 3.1.2. Let ∆1,..., ∆N be games with transferable utilities with non empty Cores Co(∆1),..., Co(∆N ), respectively. Let b1, . . . , bN be non negative coefficients. Then, N b Co(∆ ) Co( N b ∆ ). i=1 i i ⊆ i=1 i i Proof. Let x1(i),...,xP (i) beP an allocation belongingP to the Core Co(∆i). Thanks to the linearity property of coalition values shown in Proposition 3.1.1, we can write

N N N bixk(i)= biv( , ∆i)= v , bi∆i P P ! Xi=1 kX∈P Xi=1 Xi=1 N N N bixk(i) biv(Λ, ∆i)= v Λ, bi∆i , Λ . ≥ ! ∀ ⊂ P Xi=1 kX∈Λ Xi=1 Xi=1 Hence, the thesis is proven. 72 Chapter 3: Dynamic Cooperative MDPs

Corollary 3.1.3. For all β [0; 1), N b h(β)(Γ )= h(β)( N b Γ ), ∈ i=1 iS si S i=1 i si where b 0, i. i ≥ ∀ P P Proof. The proof follows straightforward from Proposition 3.1.1 and from the linearity property of the Shapley value.
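As a small illustrative aside, the Shapley value of a TU game with given coalition values can be computed directly from its definition; the sketch below (Python, with hypothetical toy values) applies verbatim to the terminal solution once the values $v^{(\beta)}(\Lambda, \Gamma_s)$ of the long run game are available.

```python
from itertools import combinations
from math import factorial

def shapley_value(values, n_players):
    """Shapley value from coalition values.

    `values` maps frozensets of players to v(Lambda), with values[frozenset()] = 0.
    """
    players = range(1, n_players + 1)
    phi = []
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            for L in combinations(others, size):
                w = factorial(size) * factorial(n_players - size - 1) / factorial(n_players)
                total += w * (values[frozenset(L) | {i}] - values[frozenset(L)])
        phi.append(total)
    return phi

# toy 2-player game: v({1}) = 1, v({2}) = 2, v({1,2}) = 6
v = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 2, frozenset({1, 2}): 6}
print(shapley_value(v, 2))   # -> [2.5, 3.5]
```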

3.1.3 Cooperative Payoff Distribution Procedure In cooperative MDPs, different static games follow one another in time. If we conceive the dynamic game as a whole, the payoff allocation issue boils (β) down to the computation of the terminal cooperative solution T (Γs), and (β) the players are rewarded a certain amount T (Γ ) T(β)(Γ ) at the end s ∈ s the game. Of course, if the length of game is not finite, the players need to be rewarded throughout the game. Even if the game has a limited duration, though, the players may not be willing to wait until its conclusion before receiving a payoff (e.g. wage earners). Our goal is then to build a connection between static and dynamic cooperative game theory on Markov Decision Processes, by devising a procedure which distributes the terminal solution (β) T (Γs) throughout the game, in each of its stages. With respect to static cooperative game theory, an additional complication here lies in satisfying, or at least being fair with, all the players at each stage of the game, since coalitions are allowed to form throughout the game unfolding. Moreover we assume that, once a coalition has formed, it cannot rejoin the grand coalition later on.

Remark 6. All the results presented in the current section, as well as the ones in Sections 3.1.4, 3.1.5, 3.1.7, 3.1.9 can be easily extended to undis- counted transient MDPs, i.e. games for which β = 1 and

∞ p (s′ s, f ) < , s , f F . (3.8) t | P ∞ ∀ ∈ S P ∈ P t=0 ′ X sX∈S where p (s′ s) = p(S = s′ S = s) is the probability of being in state s′ t | t | 0 at the t-th step, knowing that the starting state was s. In fact the reader should notice that, mathematically speaking, introducing a discount factor β [0; 1) is equivalent to multiplying each transition probability by β, which ∈ automatically ensures the transient condition (3.8).

Let us now define the concept of cooperative payoff distribution proce- dure, which is crucial in this paper.

Definition 5 (CPDP). The cooperative payoff distribution procedure (CPDP) (β) (β) (β) g := [g1 ,...,gP ] is a recursive function that, for each time step t 0, (β) (β) ≥ associates a real P -tuple g (ht) to the past history ht = [S0, g (h0),S1,..., (β) g (ht−1),St] of states succession and stage-wise allocations up to time t. Cooperative MDPs 73

(β) The following are two alternative interpretations for gi :

i) βtg(β)(h ) is the payoff that player i gains at the stage t of the i t ∈ P game, when ht is the history of the process;

(β) ii) gi (ht) is the payoff that player i obtains at time t when the transition probabilities are discounted by a factor β, i.e. we consider a new distribution p′(s′ s, f (β)∗) = βp(s′ s, f (β)∗), for all s,s′ . Hence, | P | P ∈ S 1 β is the stopping probability in each state. − Next we provide a definition of stationary CPDP’s. Let the class of state Ht and allocation histories up to time t.

Definition 6 (Stationarity). Set β [0; 1).ACPDP g(β) is stationary ∈ whenever g(β)(h )= g(β)(S =s) := g(β)(s), for all t 0 and h . t t ≥ t ∈Ht Hence, a stationary CPDP g(β) : RP is a stage-wise payoff distri- S → bution law that does not depend on the whole history, but only on the last observable state of the process.

In his pioneering work, Petrosjan (2006, [73]) introduced a CPDP for games on finite trees. Following his lines, we now propose a stationary CPDP for cooperative MDPs (MDP-CPDP) with β-discounted criterion, with β ∈ [0; 1) fixed a priori.

Definition 7 (MDP-CPDP). Set β [0; 1). Select the a terminal coopera- (β) (β) ∈ tive solution T (Γs) T (Γs), s . The cooperative payoff distribu- (β) ∈ ∀ ∈ S (β) tion procedure γ on MDP (MDP-CPDP) associated to T (Γs) is defined as

(β) ′ (β)∗ (β) γ (s, T) := δ ′ βp(s s, f ) T (Γ ′ ), s . (3.9) s,s − | P s ∀ ∈ S s′∈S X   Throughout the paper, we will not specify the dependence of γ(β) on (β) T (Γs) when this is clear from the context. In Section 3.1.4 it will be clear to the reader that not all the stationary CPDP are MDP-CPDP, but only those whose expected β-discounted long run summation is actually a terminal cooperative solution. In the next sections we will study some appealing properties of the MDP-CPDP, defined as in (3.9).

3.1.4 Terminal Fairness In this section, we let the terminal cooperative solution T be any of the classic cooperative solution (Core, Shapley value, Nucleolus, etc.). In the following we will propose two desirable properties for a CPDP and we prove 74 Chapter 3: Dynamic Cooperative MDPs that the MDP-CPDP defined in (3.9) fulfills both of them.

Firstly, we wish to guarantee a natural continuity between static coop- erative game theory and dynamic payoff allocation. Hence, we require the expected discounted sum of the stage-wise allocations to equal the terminal cooperative solution of the game, as formalized in the following.

Property 1 (Terminal fairness). Set β [0; 1). The CPDP g(β) is said ∈ (β) to be terminal fair w.r.t. the terminal cooperative solution T whenever (β) T (Γs) is stage-wisely distributed in the course of the game, i.e.

E βtg(β)(h ) S = s T(β)(Γ ), s . t | 0 ∈ s ∀ ∈ S h Xt≥0 i Now we show that the proposed MDP-CPDP can be defined axiomat- ically, as the only stationary allocation that fulfills the terminal fairness property. Hence, γ(β)(., T) establishes a bijective relation between a termi- nal cooperative solution T and a stage-wise allocation procedure γ(β).

Theorem 3.1.4. The MDP-CPDP γ(β)(s, T) RP , defined in (3.9) is the ∈ unique stationary CPDP that satisfies the terminal fairness property w.r.t. (β) T , for all β [0; 1). ∈ Proof. We know that, for all i , ∈ P E t (β) (β) [ t≥0 β γi (St) S0 = s1] γ (s1) −1 i . | ∗(β) .  .  = I βP . . P . −  .  t (β) (β) E[ β γ (St) S0 = sN ] h i γ (sN )  t≥0 i |   i      P (β) If we substitute (3.9) in the equation above, we find that γi defined in (3.9) satisfies the relation:

(β) E βtγ(β)(S ) S = s = T (Γ ), s , i . t | 0 s ∀ ∈ S ∈ P h Xt≥0 i Since the matrix [βP∗(β)]t = [I βP∗(β)]−1 is invertible, then such γ(β) t≥0 − is also unique. Hence, the thesis is proven. P In each state s of the game, the grand coalition receives a total payoff (β)∗ rP (s, fP ). In principle, only a portion of it could be shared among the players, and accordingly the remaining part is allocated in the following stages of the game. We point out that this procedure would require the presence of an external “regulator” agent, managing the payoff stream. In this work we want to rule out this possibility, thus we demand that, in each (β)∗ state s, the whole amount rP (s, fP ) is shared among the players. We call Cooperative MDPs 75 this property stage-wise efficiency. In order to ensure such a property surely, we also have to ensure that the instantaneous rewards are deterministic. (β)∗ This is straightforward to obtain, since fP can be found in the class of pure policies.

Property 2 (Stage-wise efficiency). Set β [0; 1). The CPDP g(β) is stage- (β) ∈ (β)∗ wise efficient whenever i∈P gi (s)= i∈P ri(s, fP ) for all s , where (β)∗ ∈ S fP is the global optimumP pure stationaryP strategy. Theorem 3.1.5. The MDP-CPDP γ(β), defined in (3.9), fulfills the stage- wise efficiency property, for all β [0; 1). ∈ (β)∗ Proof. The global optimum strategy fP is pure, since the optimization problem (3.2) that it solves can be formulated as a Markov Decision Process (Puterman, 1994 [77]). Hence, r (s, f (β)∗) is also deterministic, for all i . i P ∈ P Let us sum (3.9) over all possible i , for all s , and we obtain: ∈ P ∈ S (β) (β) ′ (β)∗ (β) v ( , Γs)= γi (s)+ β p(s s, fP ) v ( , Γs′ ). P ′ | P Xi∈P sX∈S Since the following is also valid for all s from the definition of v(β): ∈ S (β) (β)∗ ′ (β)∗ (β) v ( , Γs)= ri(s, fP )+ β p(s s, fP ) v ( , Γs′ ), P ′ | P Xi∈P sX∈S (β) (β)∗ then, i∈P γi (s)= i∈P ri(s, fP ), surely.

ItP is straightforwardP to verify that the MDP-CPDP γ(β) defined in (3.9) also fulfills a terminal efficiency property, i.e.

E βtγ(β)(S S = s) = v(β)( , Γ ), s . i t| 0 P s ∀ ∈ S Xi∈P h Xt≥0 i 3.1.5 Time Consistency Time consistency is a well known concept in dynamic cooperative theory (Filar and Petrosjan, 2000 [31], Zaccour, 2008 [102] and references therein). It captures the idea that the stage-wise allocation must respect the terminal fairness property even from a later starting time of the game, for any possible trajectory of the game up to that instant. In other words, players are never enticed to renegotiate the agreement on CPDP at any intermediate time step, because even if they did, assuming that cooperation has prevailed from the initial date until that instant, then the payoff distribution procedure would remain the same. Let us adopt the convention h = . The time −1 ∅ consistency property can be formalized as follows. 76 Chapter 3: Dynamic Cooperative MDPs

Property 3 (Time consistency). Set β [0; 1).ACPDP g(β) is time ∈ consistent w.r.t. a terminal cooperative solution T(β) whenever, for all t 0 ≥ and for all possible allocation/state histories h , t ∈Ht ∞ k (β) t (β) E β g (Sk, hk−1) ht β T (Γs¯), (3.10) " # ∈ k=t X where s¯ is the state at time t of history ht. Note that the time consistency property boils down to the terminal fair- ness property when t = 0. In particular, if we choose T Co, then the time ≡ consistency properties entails that, if a coalition forms at time t, then the expected long run payoff that it receives from time t onwards is not larger than the one it would earn by cooperating, for any t. Formally, for all t 0, ≥ ∞ E k (β) t (β) β gi (Sk, hk−1) ht β v (Λ, Γs¯), Λ , ht t. " # ≥ ∀ ⊂ P ∈H i∈Λ k=t X X In other words, when T Co, the time consistency property clears up any ≡ coalition’s dilemma “Shall we stick to the grand coalition forever or withdraw now?” in favor of the first alternative. We will extend further this concept in Section 3.1.7. Next we extend the definition of time consistency by suggesting that, at any instant t, the expected payoff obtained by the players from time t + n onwards should belong to the terminal solution associated to the stage of the game at time t + n. Property 4 (n-tuple step time consistency). Set β [0; 1) and let n N . ∈ ∈ 0 ACPDP g(β) is n-tuple step time consistent w.r.t. a terminal cooperative solution T(β) whenever, for all t 0, h , ≥ t ∈Ht ∞ E k (β) t+n (β) ′ (β)∗ β g (Sk, hk−1) ht β T pn(s St =s, ¯ fP )Γs′ , " # ∈ ′ | ! k=t+n s ∈S X X where pn is the n-step transition probability and s¯ is the state at time t of history ht. The reader should notice that Property 4 reduces to Property 3 when n = 0. Now we are ready to show that the MDP-CPDP fulfills the n-tuple step time consistency property for any value of n. The proof follows from the stationarity of the allocation, the terminal fairness property, and two linearity properties of the Core and of the Shapley value, respectively. Theorem 3.1.6. Let T represent the Shapley Value, or the Core if we suppose that Co(β)(Γ ) is nonempty for any s . The stationary MDP- s ∈ S CPDP γ(β)(., T) is time consistent w.r.t. T(β) for all β [0; 1). Moreover, ∈ it satisfies the n-tuple step time consistency property for all n N and ∈ 0 β [0; 1). ∈ Cooperative MDPs 77

Proof. Since γ(β) is stationary, we can rewrite (3.10) as

∞ k (β) (β) E β γ (St+k) St =s ¯ T (Γs¯). (3.11) " # ∈ k=0 X

Thanks to Theorem 3.1.4, (3.11) holds, hence γ(β) is time consistent. It is easy to verify that

∞ E k (β) t+n ′ (β)∗ (β) β g (Sk, hk−1) ht = β pn(s St =s, ¯ fP )T (Γs′ ). " # ′ | k=t+n s ∈S X X Therefore, from Proposition 3.1.2 we claim that, if T Co, then ≡ ∞ E k (β) t+n (β) ′ (β)∗ β g (Sk, hk−1) ht β Co pn(s St =s, ¯ fP )Γs′ . " # ∈ ′ | ! k=t+n s ∈S X X Moreover, for Corollary 3.1.3 we claim that, if T h, then ≡ S ∞ E k (β) t+n (β) ′ (β)∗ β g (Sk, hk−1) ht = β h pn(s St =s, ¯ fP )Γs′ . " # S ′ | ! k=t+n s ∈S X X Thus (3.11) is verified for T Co and T h, and the thesis is proven. ≡ ≡ S 3.1.6 Greedy Players Satisfaction In this section we allow for the presence of greedy players, i.e. players having a myopic perspective of the game and who only look to receive the highest reward in the static game played in the current state. From an allocation procedure design point of view, the most conservative approach is to expect that all the players might manifest a greedy behavior, and to construct a CPDP that contents all of them. The most natural way to formalize this property is requiring that the payoff allocation in each state belongs to the Core of its respective static game.

Property 5 (Greedy players satisfaction). Set β [0; 1). For all s , ∈ ∈ S the CPDP g(β)(s) belongs to Core of the stage-wise game Ω , i.e. g(β)(s) s ∈ Co(Ωs). By demanding that MDP-CPDP should fulfill Property 5, we seek to accommodate two apparently contrasting needs. On the one hand, we are trying to allocate a payoff which is globally optimum and in some sense “fair” in the long run game. On the other hand, we need to satisfy potential greedy players, hence the allocation needs to be globally optimum and stable in each static game Ω. The theory of MDPs claims that, in general, our goal cannot be reached for any value of β [0; 1), since the myopic strategy for ∈ 78 Chapter 3: Dynamic Cooperative MDPs the grand coalition is not in general global optimum when β is sufficiently P close to 1. Nevertheless, by letting the discount factor β be sufficiently close to 0, we will show a sufficient condition under which Property 5 holds. For this purpose, in the current section we consider the Shapley value as terminal fair solution, i.e. T h. ≡ S Let us assume in the current section that the static game in state s,Ωs, is a cooperative TU game, for all s . Moreover, in this section we suppose (β) ∈ S (β) that the coalition values v (Λ, Γs), v (Λ, Ωs) are the β-discounted values of the two player zero-sum game of coalition Λ against Λ in the games Γ P\ s and Ωs respectively. This classic formulation was originally devised by von (0) Neumann and Morgenstern (1944, [96]). Of course, v (Λ, Γs)= v(Λ, Ωs).

(β) Condition 1 (max-min coalition values). The coalition value v (Λ, Γs) is computed as the max-min expression in (3.3), for all Λ , s . The ⊆ P ∈ S analogous expression holds for v(Λ, Ωs). Lemma 3.1.7. There exists a pure strategy f ∗ F and β∗ > 0 such that P ∈ P f ∗ is optimal for all β [0; β∗). P ∈ Proof. The global optimization problem is a Markov Decision Process (MDP) having Φ(β) as discounted reward. Take a strictly decreasing sequence β P { k} such that limk→∞ βk = 0. Since both the actions and the states have a finite ∗ cardinality, then there exists a pure strategy f P and an infinite subsequence of β , namely β , with n 0 such (β) ∗ ∗ ∗ that y (s, f ) 0, for all β (0; β (s, f )). Take β = min f β (s, f ) > P ≥ ∈ P s, P P 0. (β) ∗ ∗ Since ΦP (s, f P ) is also right-continuous in β at β = 0, then f P is also optimal for β = 0. Hence the thesis is proven.

Let us define Θs as the affine space:

RP ∗ Θs : x : xi = ri(s, f P ) , (3.13) ( ∈ ) Xi∈P Xi∈P ∗ where f P is the global optimal strategy for all discount factors sufficiently close to 0, i.e.

∗ ∗ (β) ∗ β > 0 : f P = argmax ΦP (fP ) β [0; β ). (3.14) ∃ fP ∈FP ∀ ∈ Cooperative MDPs 79

Corollary 3.1.8. For any s , γ(β)(s) belongs to the affine space Θ , for ∈ S s all β sufficiently close to 0.

Proof. The proof follows straightforward from Theorem 3.1.5 and from Lemma 3.1.7.

Next we present a useful result.

Lemma 3.1.9. Let T h. Under Condition 1, lim γ(β)(s)= h(0)(Γ ) ≡ S β↓0 S s ≡ h(Ω ). S s Proof. Let us rewrite (3.9) as

(β) ′ (β)∗ (β) γ (s, h)= δs,s′ βp(s s, fP ) h (Γs′ ), s . S ′ − | S ∀ ∈ S sX∈S h i It is sufficient to prove that lim h(β)(Γ ) = h(0)(Γ ), s . Since β↓0 S s S s ∀ ∈ S each component of the vector h(β)(Γ ) is a linear combination of the dis- S s counted values v(β)(Λ, Γ ) , then we only need to show that { s }Λ⊆P (β) (0) lim v (Λ, Γs)= v (Λ, Γs)= v(Λ, Ωs), s , Λ . β↓0 ∀ ∈ S ⊆ P

Firstly, let us recall the relation (Filar and Vrieze, 1997 [32])

val(B) val(C) max Bi,j Ci,j (3.15) | − |≤ i,j | − | where B, C are matrices with the same size. We know from Filar and Vrieze (1997, [32]) that

(β) v (Λ, Γs) = val ri(s, aΛ, aP\Λ) + . . .

h Xi∈Λ m (s),m (s) ′ (β) Λ P\Λ + β p(s s, aΛ, aP\Λ) v (Λ, Γs′ ) , (3.16) a =1,a =1 ′ | Λ P\Λ ! sX∈S i where a A (s) and a A (s). Thus, from (3.15,3.16) we can say Λ ∈ Λ P\Λ ∈ P\Λ that, for all Λ , ⊆ P (β) (0) ′ (β) v (Λ, Γs) v (Λ, Γs) max β p(s s, aΛ, aP\Λ) v (Λ, Γs′ ) aΛ,a | − |≤ P\Λ ′ | sX∈S β M ≤ 1 β − where M = max r (s, a , a ) . Fix ǫ > 0. Set δ = ǫ/(M + ǫ). s,aΛ,aP\Λ | Λ Λ P\Λ | Then, for all β [0; δ) we have v(β)(Λ, Γ ) v(0)(Λ, Γ ) < ǫ. Hence, ∈ | s − s | v(β)(Λ, Γ ) is right-continuous in β at β = 0 for all s , Λ . s ∈ S ⊆ P 80 Chapter 3: Dynamic Cooperative MDPs

Let us formulate an additional condition, on the strict convexity of static games, which holds only in the current section. Condition 2 (Stage-wise strict convexity). The static games Ω are { s}s∈S strictly convex, i.e. v(Λ Λ , Ω )+ v(Λ Λ , Ω ) > v(Λ , Ω )+ v(Λ , Ω ), 1 ∪ 2 s 1 ∩ 2 s 1 s 2 s for all Λ , Λ , s . 1 2 ⊆ P ∈ S We know from Shapley (1971, [84]) that, if Condition 2 holds, then the Core of Ω is (P 1)-dimensional for any s , i.e. the affine hull of Co(Ω ) s − ∈ S s coincides with Θs in (3.13). Note that, in general, the affine hull of Co(Ωs) could be a proper subset of Θs. Corollary 3.1.10. Suppose that the stage-wise strict convexity Condition 2 holds. Then, for all s , ∈ S i) the Shapley value of Ωs lies in the relative interior of Co(Ωs);

ii) the interior of Co(Ωs) relative to Θs coincides with the strict Core sCo(Ωs). Proof. For the proof of i), see Shapley (1971, [84]). The proof of ii) is straightforward.

Player 3

Θs

γ(β)(s)

β↓0

Co(Ωs) Sh(Ωs)

Player 1

Player 2

Figure 3.1: γ(β)(s) when β 0, for a 3-player stochastic game in which Ω ↓ s is strictly convex. Note that, in order to ensure Property 5, the affine hull of Co(Ωs) must coincide with Θs.

Finally, we are ready to show under which conditions the MDP-CPDP fulfills the greedy players satisfaction property. Cooperative MDPs 81

Theorem 3.1.11. Under Conditions 1 and 2, the greedy players satisfaction property is verified by γ(β)( h(β)) for all discount factors β sufficiently close S to 0. Proof. Fix s . We know from Corollary 3.1.10 that h(Ω ) lies in the ∈ S S s relative interior of Co(Ωs). The affine hull of Co(Ωs) coincides with the hyperplane Θs for Condition 2. Moreover, from Corollary 3.1.8 we know that, for all s , γ(β)(s) belongs to the affine space Θ for all β [0, β∗), ∈ S s ∈ where β∗ is defined as in (3.14). Hence, for Lemma 3.1.9 we can say that for all ǫ> 0 there exists δ (0, β∗) such that s ∈ β [0; δ ), γ(β)(s) [B Θ ] Co(Ω ), ∀ ∈ s ∈ δs ∩ s ⊆ s RP where Bδs is the ball belonging to having radius of δs. Take δ = mins∈S δs. The thesis is proven. Hence, under Condition 2, for all β [0; δ), all the greedy players are ∈ content with payoff allocation procedure, since the MDP-CPDP belongs to the Core of each static game Ω , for all s . s ∈ S 3.1.7 Cooperation Maintenance The (single step) cooperation maintenance property was first introduced by Mazalov and Rettieva (2010, [59]), who employed it in a deterministic fish war setting. Such a property is very desirable, since it helps to preserve the cooperation agreement throughout the game. Indeed it suggests that the long run payoff that each coalition expects to earn by deviating in the next stage of the game should be not smaller than the payoff that the coalition receives by deviating in the current stage. In this section we will adapt and apply this property to our cooperative MDP model. For simplicity, we restrict the following definitions to stationary CPDP’s. Property 6 (Single step cooperation maintenance). Set β [0; 1). The ∈ stationary CPDP g(β) satisfies, in any state s and for each coalition ∈ S Λ , ⊂ P (β) (β) ′ (β)∗ (β) gi (s)+ βv Λ, p(s s, fP )Γs′ v (Λ, Γs). (3.17) ′ | ! ≥ Xi∈Λ sX∈S In other words, Property 6 claims that each coalition has always an in- centive to postpone the moment in which it will withdraw from the grand coalition, under the condition that, once a coalition Λ is formed, it ⊂ P can no longer rejoin the grand coalition in the future. By induction, we can say that the cooperation maintenance property enforces the grand coalition agreement throughout the game. We point out that the transition probabilities in (3.17) are invariant with re- spect to a change of strategy by Λ, which can only withdraw at the following time step. 82 Chapter 3: Dynamic Cooperative MDPs n-tuple step cooperation maintenance Intuitively, Property 6 sorts out a coalition’s dilemma “Shall we withdraw from the grand coalition in one time step or now?” in favor of the first option, at any stage of the game. It is natural to extend this property to a setting in which a coalition investigates the benefit of withdrawing in a later stage of the game. In other words, if a coalition faces the dilemma “Shall we withdraw from the grand coalition in n time steps or now?”, we suggest that a CPDP should always persuade the coalition to defer the decision of defecting, for any integer n.

Property 7 (n-tuple step cooperation maintenance). Set β [0; 1). Let ∈ n N . The stationary CPDP g(β) satisfies the n-tuple step cooperation ∈ 0 maintenance property whenever, for any initial state s and for each ∈ S coalition Λ , ⊂ P n−1 βtp (s′ s,f (β)∗) g(β)(s′) + . . . t | P i t=0 X Xi∈Λ n (β) ′ (β)∗ (β) β v Λ, pn(s s, fP )Γs′ v (Λ, Γs). ′ | ! ≥ sX∈S Next we show a necessary and sufficient condition for the existence of an MDP-CPDP γ(β) satisfying the n-tuple step cooperation maintenance prop- erty, for n 1. Before this, a notation remark. We denote v(β)(Λ, Γ) as ≥ T v(β)(Λ, Γ) := v(β)(Λ, Γ ), . . . , v(β)(Λ, Γ ) , Λ . s1 sN ∀ ⊆ P h i Theorem 3.1.12. Let n N , β [0; 1). The set of MDP-CPDP’s satis- ∈ 0 ∈ fying the n-tuple step cooperation maintenance property is nonempty if and only if the vectors

I βP∗(β) n v(β)(Λ, Γ) := v(β,n)(Λ, Γ), Λ (3.18) − ⊆ P h   i are component-wisely balanced, i.e. fore every function α : 2P / [0; 1] {∅} → such that: i : α(Λ) = 1, ∀ ∈ P Λ⊆P: XΛ∋i the following condition holds:

α(Λ)v(β,n)(Λ, Γ) v(β,n)( , Γ), 1 k N, k ≤ k P ≤ ≤ ΛX⊆P e e (β,n) (β,n) where vk (Λ, Γ) is the k-th component of v (Λ, Γ).

e e Cooperative MDPs 83

Proof. Let us rewrite (3.9) as:

(β) γ(β)(T)= I βP∗(β) T , i (3.19) i − i ∀ ∈ P h i (β) (β) (β) T (β) (β) (β) T where γi (.) = [γi (s1, .), . . . , γi (sN , .)] and Ti = [Ti (Γs1 ),..., Ti (ΓsN )] . Thanks to Proposition 3.1.1, by applying twice the well known formula for matrix geometric series:

n−1 −1 βP∗(β) k = I βP∗(β) I βP∗(β) n , − − k=0 X   h i h   i we can reformulate Property 7 as

(β) I βP∗(β) n T I βP∗(β) n v(β)(Λ, Γ), Λ − i∈Λ i ≥ − ∀ ⊂ P h (β) i(β) h i ( i∈PTi = v P( , Γ).   P (3.20) SinceP the matrix I [βP∗(β)]n is invertible for any n N, then we can − ∈ equivalently rewrite (3.20) as

(β,n) (β,n) i∈Λ Ti v (Λ, Γ), Λ (β,n) ≥ ∀ ⊂ P (3.21)  (β,n) Pi∈P Ti = v ( , Γ) e e P P where e e (β,n) (β) T = I βP∗(β) n T . i − i h i Since the relations in thee systems of inequalities in (3.21) are component- wise, for the Bondareva-Shapley Theorem (Bondareva, 1963 [22]; Shapley 1967, [83]) the thesis is proven.

The reader should notice that, in the limit for n , the result of → ∞ Theorem 3.1.12 coincides (component-wisely) with the Bondareva-Shapley Theorem for static cooperative games.

Next we show an intuitive result which reinforces the importance of the single step cooperation maintenance property. If an MDP-CPDP satisfies the n-tuple step property for n = 1, then it also fulfills it for all integers n. In this case, for any coalition, the worst decision between defecting at the current stage and at any future stage happens to be the former one.

Theorem 3.1.13. Let β [0; 1). If the MDP-CPDP γ(β)(., T) satisfies the ∈ single step cooperation maintenance property, then it satisfies the n-tuple step cooperation maintenance property, for all n> 1. 84 Chapter 3: Dynamic Cooperative MDPs

Proof. Since γ(β)(., T) satisfies the single step cooperation maintenance prop- erty, then we can write

(β) (β) βP∗(β) T v(β)(Λ, Γ) T v(β)(Λ, Γ), Λ i∈Λ i − ≥ i∈Λ i − ∀ ⊂ P h(β) (β) i ( i∈P TiP = v ( , Γ). P P (3.22) ByP iteratively left multiplying by the nonnegative matrix βP∗(β) both sides of the first expression in (3.22), then we obtain for each coalition Λ : ⊂ P

(β) (β) ∗(β) (β) (β) Ti v (Λ, Γ) βP Ti v (Λ, Γ) − ≤ " − # ≤ Xi∈Λ Xi∈Λ (β) βP∗(β) 2 T v(β)(Λ, Γ) . . . i − ≤ "i∈Λ #   X Hence, the thesis is proven.

Cooperation Maintaining solution In the following we prove that if an MDP-CPDP γ(β) fulfills the single step cooperation maintenance property, then the discounted sum of allocations for each player, when s is the initial state, belongs to the Core of the game (β) Γ , i.e. T (Γ ) Co(β)(Γ ), for all s . s s ∈ s ∈ S Corollary 3.1.14. Set β [0; 1). If an MDP-CPDP γ(β)(., T) satisfies the ∈ single step cooperation maintenance property, then

E βtγ(β)(S ) S = s Co(β)(Γ ), s . (3.23)  t 0  ∈ s ∀ ∈ S t≥0 X   Proof. Since γ(β) satisfies Property 6, then (3.20) is verified, with n = 1. By left multiplying each set of inequalities in (3.20) by the nonnegative matrix (I βP∗(β))−1, we obtain the following expressions: − (β) (β) i∈Λ Ti v (Λ, Γ), Λ , (β) ≤ ∀ ⊂ P (3.24) ( T = v(β)( , Γ). Pi∈P i P Thanks to TheoremP 3.1.4, we can say that the relations in (3.24) are equiv- alent to (3.23), hence the thesis is proven.

Interestingly, Corollary 3.1.14 suggests that the cooperation maintenance property might be considered as a refinement of the concept of the Core of a long run game. In Section 3.1.7 we will show that it is actually a proper refinement. Therefore, it is worth coining a new terminal cooperative solu- tion for cooperative MDPs, that we dub Cooperation Maintaining solution, grounded on the cooperation maintenance property. Cooperative MDPs 85

Definition 8 (Cooperation Maintaining solution). Let β [0; 1). The Co- ∈ operation Maintaining solution CM(β)(Γ) is the set of long run allocations x RN such that { i ∈ }i=1,...,P I βP∗(β) x I βP∗(β) v(β)(Λ, Γ), Λ − i∈Λ i ≥ − ∀ ⊂ P x(β) = v(β)( , Γ). ( i∈P i  P P   We pointP out that a classic terminal cooperative solution, such as Core, Shapley value etc., can be defined just for a specific a long run game Γs, for some s , by computing the coalition values v(β)(., Γ ). Therefore, a classic ∈ S s terminal solution is a vector in RP . Instead, the Cooperation Maintaining (β) solution involves the computation of all coalition values v (., Γs), for all s , and a solution point is a collection of P vectors belonging to RN . ∈ S Of course, a Cooperation Maintaining solution point can be expressed as a collection of N vectors in RP , and either of the two definitions can be used, at one’s convenience. By collecting the results of this section, we enumerate the properties of the Cooperation Maintaining solution in the following Corollaries.

Corollary 3.1.15. The Cooperation Maintaining solution CM(β)(Γ) is a nonempty set if and only if the modified coalition values v(β,1)(Λ, Γ) , { k }Λ⊆P defined as in (3.18), are balanced, for k = 1,...,N.

e (β) Corollary 3.1.16. Let us assume that CM(β)(Γ) is nonempty. Then T (Γ ) ∪s∈S s ∈ CM(β)(Γ) if and only if the MDP-CPDP γ(β)(., T) satisfies the n-tuple step cooperation maintenance property, for all n N. ∈ Corollary 3.1.17. For all β [0, 1), CM(β)(Γ) Co(β)(Γ ). ∈ ⊆∪s∈S s The Cooperation Maintaining solution is a proper refinement of the Core It is natural to ask whether the converse of Corollary 3.1.17 is true, i.e. whether trivially CM(β)(Γ) Co(β)(Γ ) or the Cooperation maintain- ≡∪s∈S s ing concept is a proper refinement of the Core. In this section we will show (β) (β) (β) that CM (Γ) = s∈SCo (Γs), by finding an allocation T such that (β) 6 ∪ (β) T (Γ ) Co(β)(Γ ) for all s , but T (Γ ) / CM(β)(Γ ). Hence, s ∈ s ∈ S ∪s∈S s ∈ s the Cooperation maintaining solution is a proper refinement of the Core solution concept for cooperative MDPs.

Let us devise the counterexample. We consider a cooperative MDP with two players (P = 2), four states (N = 4), and with perfect information, i.e. in each state at most one player has more than one action available. Player 1 controls states (s1,s2), and the remaining states (s3,s4) are controlled by player 2. Let the discount factor β = 0.8. The immediate rewards for each 86 Chapter 3: Dynamic Cooperative MDPs player and the transition probabilities for each state/action pair are shown in Table 3.1.7.

(s, a) r r p(s s, a) p(s s, a) p(s s, a) p(s s, a) 1 2 1| 2| 3| 4| (s1,a1) 1 3 0.1 0.4 0.1 0.4 (s1,a2) 2 1 0.4 0.1 0.1 0.3 (s ,a ) 1 0 0.4 0.2 0.4 0.1 pl. 1 1 3 (s2,a4) 2 1 0.1 0 0.4 0.4 (s2,a5) 3 1 0.2 0.2 0.2 0.5 (s2,a6) 4 3 0.2 0 0.2 0.3 (s3,a7) 5 1 0.3 0.6 0.4 0.1 (s3,a8) 1 3 0.3 0.4 0.2 0 (s ,a ) 2 6 0.3 0.3 0.1 0 pl. 2 3 9 (s4,a10) 0 1 0.5 0 0.1 0.1 (s4,a11) 2 2 0.1 0.3 0.5 0.2 (s4,a12) 3 0 0.1 0.5 0.3 0.6 Table 3.1: Immediate rewards and transition probabilities for each player, state, and strategy.

In this case, the vector values of the coalitions 1 , 2 and = 1, 2 , { } { } P { } rounded off to the second decimal, are 8.73 9.57 33.08 10.03 8.65 30.78 v(0.8)( 1 )   , v(0.8)( 2 )   , v(0.8)( 1, 2 )   . { } ≈ 7.34 { } ≈ 10.93 { } ≈ 33.77  7.16  11.23 30.83             where for simplicity of notation we write v(β)(.) instead of v(β)(., Γ). Since the coalition values are component-wisely superadditive by construction, then Co(0.8)(Γ ) for the two-player case always exists, for all s . Let us s ∈ S select: 0.70 00 (0.8) 0 0.4 0 0 T = v(0.8)( 1 )+   v(0.8)( 1, 2 ) [v(0.8)( 1 )+ v(0.8)( 2 )] 1 { } 0 0 0.2 0 { } − { } { }  0 0 01 h i   0.30 00 (0.8) 0 0.6 0 0 T = v(0.8)( 2 )+   v(0.8)( 1, 2 ) [v(0.8)( 1 )+ v(0.8)( 2 )] . 2 { } 0 0 0.8 0 { } − { } { }  0 0 00 h i     Thus, we obtain (0.8) T 19.07 14.87 10.44 19.60 T 1 ≈ (0.8) T T 14.01 15.91 23.32 11.23 . 2 ≈   Cooperative MDPs 87

We find that: (0.8) T (s ) 2.92 < v(0.8,1)( 1 ) 3.65 1 2 ≈ 2 { } ≈ (0.8) (0.8,1) Te (s ) 0.75 < v ( 1 ) 0.51 1 3 ≈− e3 { } ≈ (0.8) (0.8,1) eT (s ) 0.48 < v ( 2 ) 1.61 2 1 ≈ e1 { } ≈ (0.8) (0.8,1) Te (s ) 0.90 < v ( 2 ) 3.00. 2 4 ≈ e4 { } ≈ Therefore, the converse of Corollary 3.1.17 is not true, CM(β)(Γ) = Co(β)(Γ ), e e 6 ∪s∈S s and the Cooperation Maintaining solution is a proper refinement of the Core. On the other hand, it is interesting to observe that in this example, by ran- (0.8) (0.8) domly generating vectors T (Γs) Co (Γs), in about the 99.45% of (0.8) ∈ the cases T (Γ ) CM(0.8)(Γ ) as well, for all s . s ∈ s ∈ S 3.1.8 Strictly convex static games In the same spirit of Section 3.1.6, we now show that the sole strict convexity Condition 2 on the static games ensures the existence of an MDP-CPDP satisfying the cooperation maintenance property for all discount factors β small enough. As in Section 3.1.6, we assume that Condition 1 holds, i.e. the coalition values are computed ‘a la von Neumann and Morgenstern. Theorem 3.1.18. Suppose that Conditions 1,2 hold. Then, γ(β)(., h) sat- S isfies the single step cooperation maintenance property for all β close enough to 0. Proof. Thanks to the linearity property of coalition values (see Proposition 3.1.1) we can reformulate Property 6 as

(β) ′ (β)∗ (β) γi (s, h) δs,s′ βp(s s, fP ) v (Λ, Γs′ ), Λ , s . S ≥ ′ − | ∀ ⊂ P ∈ S Xi∈Λ sX∈S h i From (3.9), considering T h, ≡ S (β) ′ (β)∗ (β) γi (s, h)= δs,s′ βp(s s, fP ) hi (Γs′ ). S ′ − | S Xi∈Λ sX∈S h i Xi∈Λ By hypothesis, for all s the Shapley value h(Ω )= h(0)(Γ ) belongs ∈ S S s S s to the strict Core sCo(Ωs) for all β sufficiently close to 0. Hence, by right continuity of the Shapley value and of coalition values in β = 0 (see proof of Lemma 3.1.9), we conclude that, for all β sufficiently close to 0,

′ ∗ (β) (β) δs,s′ βp(s s, f P ) hi (Γs′ ) v (Λ, Γs′ ) 0, s , ′ − | " S − # ≥ ∀ ∈ S sX∈S h i Xi∈Λ ∗ where f P is the optimal strategy for grand coalition for all β sufficiently small, as in (3.14). Hence, the thesis is proven. 88 Chapter 3: Dynamic Cooperative MDPs

3.1.9 Transition probabilities not depending on the actions In this final section we deal with a special case of our model, entailing that the Markov process among the states is endogenous, i.e. players’ strate- gies do not influence the transition probabilities among the states. This is formalized as follows.

Condition 3. The probabilities of transition among the states do not depend on the players’ actions, i.e. p(s′ s, a , . . . , a ) = p(s′ s), for all a A (s) | 1 P | i ∈ i and for each s,s′ . ∈ S As in Sections 3.1.6 and 3.1.8, we consider the static games Ω to { s}s possess transferable utilities v(Λ, Ω ) . Nevertheless, we no longer { s }s∈S,Λ⊆P impose the max-min Condition 1 on the coalition values. This model is equivalent to the one of Predtetchinski (2007, [75]), except for the TU as- sumption. Now we show that, under Condition 3, the allocation problem simplifies considerably. In fact, the balancedness of each static game is a sufficient condition to ensure the existence of an MDP-CPDP satisfying Properties 5, 6, and 7.

Theorem 3.1.19. Suppose that the static games Ω are balanced. { s}s∈S Then, for all β [0; 1), there exists an MDP-CPDP γ(β)(., T) such that ∈ the following properties are jointly met:

(β) i) T (Γ ) Co(β)(Γ ), for all s ; s ∈ s ∈ S ii) γ(β)(., T) fulfills the greedy player satisfaction property;

(β) iii) T (Γ ) CM(β)(Γ), i.e. γ(β)(., T) fulfills the n-tuple step co- ∪s∈S s ∈ operation maintenance property, for n N. ∈ Proof. From the hypothesis, there exists γ(β) RN such that { i ∈ }i=1,...,P (β) i∈Λ γi v(Λ, Ω) Λ (β) ≥ ∀ ⊂ P (3.25) ( γ = v( , Ω). Pi∈P i P From the linearity propertyP of coalition values (see Proposition 3.1.1) we claim that v(β)(Λ, Γ) = I βP −1 v(Λ, Ω) Λ , (3.26) − ∀ ⊆ P   T where v(Λ, Ω) := [v(Λ, Ωs1 ), . . . , v(Λ, ΩsN )] . Thus, by left multiplying the expressions in (3.25) by the nonnegative matrix (I βP)−1 we obtain − T v(β)(Λ, Γ) Λ i∈Λ i ≥ ∀ ⊂ P T = v(β)( , Γ) Pi∈P i P P Cooperative MDPs 89

Hence, i) and ii) are proven by the construction of γ(β)(., T). By plugging (3.26) in (3.25), we can write

(β) (β) i∈Λ γi I βP v (Λ, Γ) Λ (β) ≥ − ∀ ⊂ P ( γ = I βP v(β)( , Γ). Pi∈P i  −  P which coincides withP the definition of the single step cooperation mainte- nance property. For Theorem 3.1.13, iii) is proven. Thus the thesis fol- lows.

Not surprisingly, Condition 3 simplifies considerably the allocation pro- cedure issue at hand. Indeed, it is sufficient to prove the balancedness of the static games to ensure both the cooperation maintenance property and the greedy players satisfaction property. We recall that, in the general case in which the transition probabilities do depend on the players’ actions, the hypothesis of stage-wise balancedness does not even imply property ii) of Theorem 3.1.19 for β sufficiently high, as pointed out in Section 3.1.6. Moreover, Theorem 3.1.19 suggests that, under Condition 3 and if the static games are balanced, it is convenient to devise a stage-wise allocation in a (β) bottom-up fashion, i.e. by first allocating γ (s) Co(Ωs) in each state s, (β) ∈ and then computing the terminal solution T , which turns out to belong (β) to Co (Γs), in all states.

We also remark that the converse of property i) of Theorem 3.1.19 is not (β) true. Indeed, it is possible to find a terminal cooperative solution T be- longing to the Core of the long run games Γs, for all s , whose associated (β) ∈ S MDP-CPDP γ (., T) lies outside the Core of at least one static games Ωs.

We conclude by providing a result for the Shapley value allocation pro- cedure. The proof follows straightforward from Corollary 3.1.3 and equation (3.26).

(β) P Corollary 3.1.20. Let β [0; 1). Let T (Γs) R be a terminal cooper- ∈ ∈ (β) ative solution, for all s . Under Condition 3, γ (s, T) = h(Ωs), for ∈( Sβ) S all s , if and only if T (Γ )= h(β)(Γ ), for all s . ∈ S s S s ∈ S 3.1.10 Conclusions This paper deals with Cooperative Markov Decision Processes (MDPs), in which sub-coalition of players may form throughout the game. Thus it is crucial to enforce at the beginning of the game an agreement that no player has interest to breach at any time step. Hence we proposed a payoff allo- cation procedure, called MDP-CPDP, distributing a cooperative solution, associated with the long run game, in each state of the MDP. Such an 90 Chapter 3: Dynamic Cooperative MDPs

MDP-CPDP is the only stationary allocation fulfilling a terminal fairness property, it is stage-wise efficient, and it is time consistent, i.e. the agree- ment stipulated at the beginning of the game holds throughout the game. We found sufficient conditions under which the MDP-CPDP also contents greedy players, having a myopic perspective of the game, for all discount factors sufficiently small. We studied a cooperation maintenance property, which is crucial since it enforces the cohesiveness of the grand coalition throughout the game. This property allowed us to define a new cooperative solution, dubbed Cooperation Maintaining solution, which is a refinement of the concept of Core for MDPs. We finally considered a simpler model with an endogenous Markov chain, in which the MDP-CPDP satisfies all the cited properties under more relaxed constraints. Dynamic Games on Markovian Quasi-Static MAC’s 91

3.2 Dynamic Rate Allocation in Markovian Quasi- Static Multiple Access Channels

We deal with multiple access channels in which the channel coefficients follow a quasi-static Markov process on a finite set of states. In the corresponding MDP model, the transition probabilities among the (channel) states do not depend on the players (users). Hence, under this perspective, the model is simpler than the one in Section 3.1. On the other hand, the rewards (rates) cannot be distributed in any manner in each state, but only within a feasibility (capacity) region. In other words, we drop the TU assumption of Section 3.1 and we deal with Non Transferable Utility (NTU) games. We first show how to allocate the rates in a global optimal fashion, in each state of the process. We give a sufficient condition under which the optimal rates adhere to some fairness criteria, holding in a time consistent fashion. Then, we borrow two concepts from dynamic cooperative game theory, i.e. the Time Consistent Core and the Cooperation Maintaining sets, which measure users’ satisfaction with the assigned rate. We show that the sets of rates fulfilling these properties coincide, and they also coincide with the set of global optimal rate allocations.

3.2.1 Introduction The concepts of user fairness and satisfaction have received significant at- tention in previous years. These notions will play an increasingly crucial role in future networks, due to the paradigm shift that we are witnessing, from fully centralized with dumb terminals to distributed networks with rational users able to pool resources with each other. In the literature, the notion of fair and satisfactory rate allocation has been dealt with under manifold perspectives in static Gaussian or ergodically fad- ing Multiple Access Channels (MAC). In (Shum and Sung, 2006 [86]), the fairness of a rate allocation in a Gaussian MAC is related to the economical concept of Lorenz order, used for measuring disparity in income distribu- tions. Such fair allocation always exists, it is Pareto optimal, and also solution of a Nash bargaining problem with zero disagreement pay-off allo- cation. In the following (Shum and Sung, 2006 [87]), the authors show the existence of a unique rate allocation which is max-min and proportional fair. The results in [86, 87] are extended to the general framework of α-fairness (Mo and Walrand, 2000 [62]) in (Altman et al., 2009 [5]). For MAC’s with polymatroid regions, all α-fair rate allocations collapse into a single point, which is max-min and proportional fair, too. An analysis of rate allocations in the context of constrained games points out that the normal Nash equi- librium (Rosen, 1965 [79]) also coincides with the α-fair and Pareto optimal allocations. On the other hand, the issue of users satisfaction is addressed by Cooper- 92 Chapter 3: Dynamic Cooperative MDPs ative Game Theory (CGT) with non-transferable utility (NTU) (see Peleg and Sudh¨olter, 2007 [69] for an overview), which provides powerful tools to derive efficient and stable allocations in a setting in which the users can coop- erate to reach a common goal. In (Madiman, 2008 [54]), a Gaussian MAC is studied with an approach not strictly game-theoretical, but some tools bor- rowed from CGT are utilized to characterize the capacity region. In (La and Anantharam, 2004 [50]), the authors expressed the rate allocation problem in static Gaussian MAC with jamming in a cooperative game-theoretical setting.They found a satisfactory rate allocation fulfilling the newly intro- duced concept of envy-free. The envy-free allocation exists, is unique and Pareto optimal, but in general it does not coincide with the α-fair solution. In this contribution we study and extend for the first time the concepts of optimal, fair, and satisfactory rate allocations to a dynamic scenario, described by a Gaussian MAC where the channel evolves quasi-statically, according to a Homogeneous Markov Chain (HMC) on a finite state space. The structure of the paper ramifies into two main sections. The former is Section 3.2.3, in which we discuss the design of optimal and fair allocations in the dynamic process. The latter is Section 3.2.4, in which we find the rate allocations which are satisfactory throughout the process, according to a Dynamic Cooperative Game Theory (DCGT) formulation. We study a bottom-up (Section 3.2.3) and a top-down procedure (Section 3.2.3) to allocate a global optimal rate in each state of the HMC. The former pre- scribes to allocate first the static allocations and derive next the long-run ones; conversely, the latter suggests to select first the long-run rate alloca- tions. 
Though the top-down procedure would be more useful since the user have a long-run perspective, it is not always feasible, and we offer a remedy to this. In Section 3.2.4 we provide a sufficient condition under which it there exists a rate allocation which is fair, i.e. max-min, proportional, and α-fair, both state-wisely and in the long-run process. Most importantly, the fairness property of such allocation is Time Consistent, i.e. it is fair through- out the process, from each of its intermediate steps onwards. Conversely, a fair allocation always exists in the static case (Altman et al., 2009 [5]), (Maddah-Ali, 2009 [53]). In Section 3.2.4 we introduce a game formulation with jamming users similar to the one in (La and Anantharam, 2004 [50]), but in a dynamic scenario. We then characterize the set of global optimal allocations as satisfactory too, since it coincides with the set of rates for which two crucial DCGT properties hold. These properties are the (Time Consistent) Core, introduced in (Petrosjan, 1977 [72]), and the Cooperation Maintenance property (Mazalov and Rettieva, 2010 [59]). Such concepts formulate in two different, but equally appealing, manners the concept for which all users should find the allocation, in a sense, acceptable throughout the game. Dynamic Games on Markovian Quasi-Static MAC’s 93

3.2.2 System Model We consider a wireless system in which K terminals attempt to send infor- mation to a single receiver or base station. Let = 1,...,K be the set P { } of all users. Each user k has a power constraint Pk. We assume a quasi- static channel, i.e. the channel coefficients can be considered constant for the whole duration of a codeword. Thus, the t-th signal block received by the unique receiver, for t N , can be written as ∈ 0 K y[t]= h(k)[t] x(k)[t]+ w[t] Xk=1 where x(k)[t] is the codeword of user k, h(k)[t] is the complex channel co- efficient for user k at time step t, and w[t] is zero mean white Gaussian noise with variance N0. We assume that the set of channel coefficients h(1), h(2),...,h(K) is finite and it follows a discrete time Homogeneous { } Markov chain (HMC), which can change state at every new codeword. In other words, if St is the channel state at time step t, where

(1) (K) St := h [t],...,h [t] , h i then the random process S , t 0 is a HMC. We define as the set { t ≥ } S of all the N possible states of the HMC. Let P be its N-by-N transition probability matrix, such that

P = prob (S = s S = s ) , t 0, s ,s . i,j t+1 j | t i ∀ ≥ i j ∈ S We point out that the codeword length is supposed to be very long, such that the conditions of applicability of the Shannon Capacity (i.e. infinite codeword) are practically satisfied. This assumption is widely applied in quasi-static channels (see e.g. Katz and Shamai, 2005 [44]).

Markovian feasibility region In each channel state, we consider a Gaussian MAC scenario, in which K users communicate with a single receiver. By relying on the classic quasi- static approximation assumption (see e.g. Katz and Shamai, 2005 [44]), we can compute the capacity rate region for all users in state s as the polymatroid ( ,s) with rank function g (Tse and Viswanath, 2004 [94]): R P (P)

K ( ,s)= r R : rk g(P)( ,s), R P ( ∈ ≤ T ∀T ⊆P) kX∈T g ( ,s) := C h(k)(s) 2P ,N , , (P) T | | k 0 ∀T ⊆P  kX∈T  94 Chapter 3: Dynamic Cooperative MDPs

where C(a, b) = log2(1 + a/b). When considering the channel dynamics, an HMC evolves on a finite set of channel states = s ,...,s . Since S { 1 N } we consider the channel to be constant during a codeword, the transition among states occurs at the end of each coherence period of the channel. We allocate a rate to each user in each of the state of the Markov chain. We assume that the rate assigned in state at time t, S , depends only on t ∈ S the value of St, and not on the past history of state/allocations up to time t. In this sense, we say that the dynamic allocation is stationary, and we call rk(s) the rate assigned to user k in state s. We also assume that the length of the communication is finite, but of unknown duration. Typically (e.g. Section 3.1), this situation is dealt with by considering a probability 1 β that, at any time step, the communication terminates. Then, the − expected sum of the rates assigned to user k over the sequence of channel states equals ∞ t rk(Γs)= E β rk(St) , (3.27) ! Xt=0 where Γs is the Markov process starting at time 0 in state s. We anticipate that the choice of β is irrelevant to all our results. By recalling the relation βtPt = (I βP)−1, we can write (3.27) in the following matricial form: t≥0 − P r(Γs1 ) r(s1) . = (I βP)−1 . , (3.28)  .  −  .  r(Γ ) r(s )  sN   N      where r(s) := [r1(s), r2(s), . . . , rK (s)] and r(Γs) is defined similarly. By defining Ψ := (I βP)−1 and utilizing a compact matrix notation, we rewrite − (3.28) as

[r(Γs)]s∈S = Ψ [r(s)]s∈S (3.29)

Remark 7. Expression (3.29) defines an application from the set of sta- tionary state-wise rate allocations to the set of feasible long-run rates. In Section 3.2.3 we will show that, in general, the application is not invert- ible, since multiplying a set of long-run allocations by Ψ−1 does not always produce feasible state-wise allocations. 

It is natural to define the long-run rate region ( , Γ ) as the set of all R P s rates r(Γs) that can be written as the long-run expected sum of stationary state-wise rate allocations, as in (3.28). We now give a convenient expres- sion for ( , Γ ), which follows from (Herzog and Hibi, 2010 [39]), p. 241, R P s Theorem 12.1.5, claiming that the sum of polymatroids is still a polyma- troids whose rank function is the sum of the rank functions of the summands. Dynamic Games on Markovian Quasi-Static MAC’s 95

Lemma 3.2.1. For any s , the long-run rate feasibility region ( , Γ ) j ∈ S R P sj is a polymatroid with rank function:

N g ( , Γ )= ν (s ) g ( ,s ), , (P) T sj n j (P) T n ∀T ⊆P nX=1 where ν(sj) is the j-th row of the matrix Ψ. 

Relevance to LTE systems

In LTE systems, the statistics of the channel are estimated at regular in- tervals and used for resource allocation. Under the common assumption of fast fading Gaussian channel in additive Gaussian noise, in each period t the state of the HMC is given by the channel distribution, completely characterized by its second-order statistics. The rate region in absence of instantaneous knowledge of the channel at the transmitter is still a polyma- troid, with rank function E [g ( ,s)], as shown in [80]. Since the results h (P) T presented in the following strongly rely on the polymatroid structure of the rate region in each state of the HMC, then they also hold for LTE systems. Hence, our general results in particular address the issue of allocating the rate to users in a MAC LTE system at each feed-back time interval, so that optimality, fairness, and the users’ satisfaction is preserved throughout the communication.

3.2.3 Optimal and fair rate allocation design In this section we address the issue of allocating the rate to all users during the transmission process, in each state of the channel Markov chain. For a classic result on polymatroids (see e.g. Herzog and Hibi, 2010 [39]), we know that the dominant facet, or simply facet, ( ( ,s)) of the rate region M R P ( ,s) is maximum sum-rate, i.e. R P

( ,s) := ( ( ,s)) = argmax rk. (3.30) M P M R P r∈R(P,s) kX∈P Similarly, the facet ( , Γ ) is maximum sum-rate in the long-run process M P s Γs. Hence, the global optimum rate design solution would be that both the state-wise and the long-run rate allocations belong to the facets ( ,s) M P and ( , Γ ), for all s . So, we will restrict our focus on the allocations M P s ∈ S inside M, defined in the following.

Definition 10 ( ). M is the set of stationary state-wise allocations belong- M ing to the dominant facets of both state-wise and long-run feasibility regions, 96 Chapter 3: Dynamic Cooperative MDPs i.e.

M := r(s) : r(s) ( ,s), { }s∈S ∈ M P n r(Γ ) ( , Γ ), s , s ∈ M P s ∀ ∈ S o where [r(Γs)]s∈S = Ψ [r(s)]s∈S . 

Now, we will investigate two different approaches to select an allocation in M. The first, called bottom-up procedure (Section 3.2.3), is the most nat- ural one, and it prescribes to select a set of state-wise allocations in ( ,s), M P for all s , and then to derive the set of associated long-run allocations ∈ S via multiplication by Ψ. Conversely, the second approach, dubbed top-down (Section 3.2.3), would be more useful, but unfortunately it is not always fea- sible. It suggests to select first the long-run allocations, in ( , Γ ), for M P s all s , and then to multiply by Ψ−1 to obtain the state-wise allocations. ∈ S Clearly, the choice over the adopted procedure depends on the priority that the designer gives to the state-wise/long-run allocation. By adopting the top-down procedure, one embraces a long-run perspective of the process, by preferring to adhere to a specific fairness selection criterion in the long-run process, rather than in the state-wise one. Clearly, the best scenario would consist in being fair in each state, in the long-run process, and from each intermediate step onwards. A sufficient condition to attain this will be pro- vided in Section 3.2.3.

BOTTOM-UP DESIGN: From single-stage to long-run allocations In this section we investigate the feasibility of our first procedure to select an allocation in M. It is called bottom-up rate allocation approach, and it con- sists in selecting a set of stage-wise allocations belonging to the dominant facet of each state-wise feasibility region. Then, we need to compute the respective long-run allocations and check whether they belong to the domi- nant facets of the feasibility region of the respective long-run processes. By a linearity argument, it is easy to see that the facet ( , Γ ) is obtained M P s as the Minkowski sum N ν (s) ( ,s ). Therefore, if the state-wise n=1 n M P n allocations all belong to the dominant facet in the respective states, then P their expected long-run sum also lies in the dominant facet of the long-run process. Then, the bottom-up procedure always produces stationary alloca- tions belonging to M.

Proposition 3.2.2 (Bottom-up allocation procedure). Select s set of state- wise rate allocations r(s) ( ,s) . Then, their associated long- { ∈ M P }s∈S run allocations [r(Γs)]s∈S = Ψ[r(s)]s∈S belong to the respective long-run dominant facets, i.e. r(Γ ) ( , Γ ), for all s .  s ∈ M P s ∈ S Dynamic Games on Markovian Quasi-Static MAC’s 97

Then, the first positive result of Proposition 3.2.2 is that there exist allocations belonging to the dominant facet of both state-wise and long-run processes, jointly, i.e. M is non-empty. Secondly, it is easy to find them, since it suffices to select a rate allocation on the dominant facet of ( ,s), R P for all s . Finally, as a by-product of Proposition 3.2.2, we are allowed ∈ S to simplify the definition of M as:

M r(s) s.t. r(s) ( ,s), s . ≡ { }s∈S ∈ M P ∀ ∈ S n o TOP-DOWN DESIGN: From long-run to single-stage allocations The bottom-up procedure always produces feasible allocations, but it is not what really concerns us. Indeed, the users are endowed with a long-term perspective of the communication process, hence one may wish to select first a set of long-run allocations in ( , Γ ) which adhere to a certain cri- {M P s }s∈S terion in the respective long-run processes (e.g. a fairness criterion, as in Section 3.2.3). Then, the state-wise rate allocations r(s) are obtained { }s∈S via multiplication by Ψ−1. Unfortunately this method, dubbed top-down, does not always produces feasible stationary state-wise allocations. We in- terpret this fact by saying that the linear application defined by Ψ in (3.29) is not always invertible in the space of feasible stationary allocations. In Example 3.2.1 we show an instance of the described scenario.

Example 3.2.1. Set β = 0.8, N0 = 0.1 W . Consider two users, with power constraints P = P = 2 W . Consider two states. In s , h(1)(s ) 2 = 0.1, 1 2 1 | 1 | h(2)(s ) 2 = 0.2. In s , h(1)(s ) 2 = 0.15, h(2)(s ) 2 = 0.15. The transition | 1 | 2 | 2 | | 2 | probability matrix is P = [0.8 0.2; 0.3 0.7]. Choose the optimal allocations in the long-run process

r(Γ ) = [0.5843; 1.1109] ( , Γ ) bits/s/Hz s1 ∈ M P s1 r(Γ ) = [0.8270; 0.8682] ( , Γ ) bits/s/Hz. s2 ∈ M P s2 The corresponding state-wise allocations, through Ψ−1, are both not feasible, because

r(s ) = [0.0780; 0.2610] / ( ,s ) 1 ∼ ∈ R P 1 r(s ) = [0.2236; 0.1154] / ( ,s )  2 ∼ ∈ R P 2 Remark 8. One may argue that there is no need to select the whole set of long-run allocations r(Γ ) , but only the one corresponding to the actual { s }s∈S initial state. Indeed, since the channel state S0 at time 0 is known, one could select r(ΓS0 ) according to the desired criterion and then compute the state- wise allocations by choosing one solutions among the infinite possible of the 98 Chapter 3: Dynamic Cooperative MDPs equation N

r(ΓS0 )= νn(S0) r(sn). nX=1 Finally, the remaining long-run allocations are automatically computed by re-inverting the relation, as Ψ[r(s)]s∈S . Of course, in this way there is no control over the long-run allocations r(Γ ), with s = S . s 6 0 On the other hand, thanks to the stationarity of the pay-off allocation, the long-run sub-process starting at time T > 0 is precisely the βT -scaled version of ΓST , i.e. ∞ E t T β r(St) h(T ) = β r(ΓST ), ! t=T X where h(T ) is the history of state/allocations from time 0 up to time T . Therefore, jointly choosing the long-run allocations r(Γ ) for all states s s ∈ S is equivalent to assign the long-run allocations that each user obtains in each sub-process from any intermediate time step T 0 onwards.  ≥

1.5 ( ,s ) R P 1 ( ,s ) R P 2 r(Γs1 ) ( , Γ ) R P s1 ( , Γs2 ) 1 R P

r(Γs2 ) bits/s/Hz

0.5

r(s1)

r(s2)

0 0 0.5 1 1.5 bits/s/Hz

Figure 3.2: Example 3.2.1. r(Γ ) ( , Γ ), for s = s ,s , but r(s) / s ∈ M P s 1 2 ∈ ( ,s), for s = s ,s , where [r(s)] =Ψ−1[r(Γ )] . R P 1 2 s∈S s s∈S Example 3.2.1 seems to discourage a top-down allocation procedure. In- deed in general, if one chooses a set of long-run allocations, there is no guar- antee that the allocation is actually feasible, since the associated stationary Dynamic Games on Markovian Quasi-Static MAC’s 99 state-wise allocation might be not feasible. Of course, this does not rule out the possibility to carry out a top-down allocation procedure successfully. In- deed, in Theorem 3.2.4 we will present a top-down procedure guaranteeing the feasibility of the associated state-wise rate allocations. Before, let us introduce a classic result on polymatroids (see Edmonds, 2003 [30]). Let R be a polymatroid on the ground set 1,...,K , with rank function g. Let { } Π(K) be the set of permutations of 1,...,K . The facet ( ) has at most { } M R K! extreme points, and each of them has an explicit characterization as a function of the rank function g. Indeed, w is a vertex of ( ) if and only M R if there exists a permutation π of 1,...,K such that, for all k = 1,...,K, { } w = g( π ,...,π ,π ) g( π ,...,π ) := w (π). k { 1 k−1 k} − { 1 k−1} k Proposition 3.2.3. Let a 0, for n = 1,...,N. Let ,..., be n ≥ R1 RN N polymatroids on the ground set 1,...,K . Let = N a . Let { } R n=1 nRn w(π)(n) be the vertex of the facet ( ) associated to the permutation M Rn P π Π(K). Let w(π) be a vertex of ( ). Then, ∈ M R N w(π)= a w(π)(n), π Π(N).  n ∀ ∈ n=1 X Proposition 3.2.3 claims that the vertex of the facet ( ) associated to M R the permutation π can be decomposed into the sum of the vertices associ- ated to the same π of each facet ( ), n = 1,...,N. Then, our idea is to M Rn choose one set of convex coefficients, valid for any s , and to define the ∈ S set of long-run allocations r(Γ ) ( , Γ ) as the same convex com- { s ∈ M P s }s∈S bination of the vertices of the respective dominant facets. The associated state-wise allocations are then obtained as the same convex combination of the vertices of the respective state-wise dominant facets, hence they are feasible and optimal.

Theorem 3.2.4 (Top-down allocation procedure). Choose a set of convex coefficients $\{c(\pi)\}_{\pi\in\Pi(K)}$, such that $c(\pi)\ge 0$ and $\sum_{\pi\in\Pi(K)} c(\pi) = 1$. Let $w(\pi)(\Gamma_s)$ be the vertex of $\mathcal{M}(\mathcal{P},\Gamma_s)$ associated to the permutation $\pi$. Compute the set of long-run allocations as
$$r(\Gamma_s) = \sum_{\pi\in\Pi(K)} c(\pi)\, w(\pi)(\Gamma_s), \qquad \forall\, s\in\mathcal{S}.$$
Then,
$$[r(s)]_{s\in\mathcal{S}} = \Psi^{-1}[r(\Gamma_s)]_{s\in\mathcal{S}}$$
is a set of feasible state-wise rate allocations, and moreover $r(s)\in\mathcal{M}(\mathcal{P},s)$ for all $s\in\mathcal{S}$.

Proof. Let us write
$$\begin{bmatrix} r(s_1) \\ \vdots \\ r(s_N) \end{bmatrix} = \Psi^{-1}\begin{bmatrix} \sum_{\pi\in\Pi(K)} c(\pi)\, w(\pi)(\Gamma_{s_1}) \\ \vdots \\ \sum_{\pi\in\Pi(K)} c(\pi)\, w(\pi)(\Gamma_{s_N}) \end{bmatrix} = \sum_{\pi\in\Pi(K)} c(\pi)\, \Psi^{-1}\begin{bmatrix} w(\pi)(\Gamma_{s_1}) \\ \vdots \\ w(\pi)(\Gamma_{s_N}) \end{bmatrix}.$$
By Proposition 3.2.3, we can say that
$$\begin{bmatrix} r(s_1) \\ \vdots \\ r(s_N) \end{bmatrix} = \sum_{\pi\in\Pi(K)} c(\pi) \begin{bmatrix} w(\pi)(s_1) \\ \vdots \\ w(\pi)(s_N) \end{bmatrix}.$$
Hence, the thesis is proven.

The top-down allocation procedure provided in Theorem 3.2.4 is of course not the only possible one, but it leads to an intuitive remark. Each vertex $w(\pi)(s)$ can be achieved by letting the receiver decode sequentially, in the reverse order of $\pi$, the signals coming from each user in channel state $s\in\mathcal{S}$, and by treating the signals not yet decoded as Gaussian noise (e.g. see Tse and Viswanath, 2004 [94]). Therefore, any rate allocation on $\mathcal{M}(\mathcal{P},s)$ can be achieved by time sharing such decoding configurations, and the time-sharing procedure is independent of the state $s$. An interesting direction for future research is the optimization of the convex coefficients $c(\pi)$, so as to make the resulting long-run allocations globally close to the set of long-run allocations fulfilling a certain criterion, e.g. the fairness criterion that we will present in the next section.
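A minimal sketch of the construction behind Theorem 3.2.4 follows. It assumes, for illustration, state-wise rank functions of the Gaussian-MAC form $g(\mathcal{T},s) = \log_2\bigl(1 + \sum_{k\in\mathcal{T}} |h_k(s)|^2 P_k / N_0\bigr)$ (the capacity expression is an assumption, not restated from the text); by the proof of the theorem, applying the same convex weights $c(\pi)$ to the state-wise facet vertices yields exactly the state-wise allocations associated with the long-run allocations of the theorem. All numerical values and names are hypothetical.

\begin{verbatim}
import numpy as np
from itertools import permutations

def facet_vertex(perm, g):
    """Edmonds' greedy vertex of the dominant facet for permutation `perm`:
    w_k = g({pi_1..pi_k}) - g({pi_1..pi_{k-1}}); each vertex corresponds to a
    successive-decoding order (the reverse of perm)."""
    w, prefix = {}, frozenset()
    for k in perm:
        w[k] = g(prefix | {k}) - g(prefix)
        prefix = prefix | {k}
    return w

def statewise_topdown(users, rank_fns, c):
    """Apply the same convex combination c(pi) of the facet vertices in every
    state (Theorem 3.2.4 / Proposition 3.2.3)."""
    out = {}
    for s, g in rank_fns.items():
        r = {k: 0.0 for k in users}
        for perm, weight in c.items():
            w = facet_vertex(perm, g)
            for k in users:
                r[k] += weight * w[k]
        out[s] = r
    return out

# Hypothetical two-user, two-state example.
users = (1, 2)
h = {"s1": {1: 1.0, 2: 0.6}, "s2": {1: 0.4, 2: 1.1}}   # channel gains
Pk, N0 = {1: 1.0, 2: 1.0}, 1.0
rank_fns = {s: (lambda T, s=s: np.log2(1.0 + sum(abs(h[s][k])**2 * Pk[k] for k in T) / N0))
            for s in h}
c = {perm: 0.5 for perm in permutations(users)}         # uniform time sharing
print(statewise_topdown(users, rank_fns, c))
\end{verbatim}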

FAIR ALLOCATION DESIGN: being fair throughout the process
In this section we deal with a fairness criterion to select a rate allocation inside M. In the static channel case, it is possible to find rate allocations which are fair under plenty of different criteria (see the Introduction). In the dynamic case, the definition of fairness is much more demanding, and allocations fulfilling it do not always exist. Firstly, we demand that an allocation be fair in the long-run process, since users are endowed with a long-term perspective of the transmission process. To this end, the top-down procedure would be best, because it would guarantee that the rate allocations are fair in the long run. On the other hand, in Section 3.2.3 we showed that this approach does not always produce feasible stationary rate allocations. Secondly, we demand that an allocation respect the fairness criterion not only from the beginning of the transmission onwards, but throughout it, i.e. it should be Time Consistent. Thirdly, we wish the rate allocation to be fair in each state of the HMC as well. We will see that these three conditions are not satisfied in general; however, we provide a sufficient condition for them to hold.

Fairness criteria: a review
Let us first introduce the fairness criteria that we will utilize in the next section. In the literature, three fair allocations have been extensively studied: α-fair, max-min fair, and proportional fair allocations. We now provide their general definitions, by considering a general rate feasibility region $\mathcal{R}$.

Definition 11 (max-min fairness). An allocation $r^{(\mathrm{MM})}$ is max-min fair whenever no user $j$ with rate $r_j^{(\mathrm{MM})}$ can yield resources to a user $i$ with $r_i^{(\mathrm{MM})} < r_j^{(\mathrm{MM})}$ without violating feasibility in $\mathcal{R}$.

Definition 12 (α-fairness). Let us assume that each user $k$ possesses a utility function on the rate, $u_k^{(\alpha)}(r_k) = r_k^{1-\alpha}/[1-\alpha]$. The α-fair allocation $r^{(\alpha\mathrm{F})}$, with $\alpha\ge 0$, is defined as
$$r^{(\alpha\mathrm{F})} = \arg\max_{r\in\mathcal{R}} \sum_{k=1}^{K} u^{(\alpha)}(r_k).$$

Definition 13 (proportional fairness). The proportional fair allocation $r^{(\mathrm{PF})}$ coincides with the α-fair allocation when $\alpha\to 1$, i.e.
$$r^{(\mathrm{PF})} = \arg\max_{r\in\mathcal{R}} \prod_{k=1}^{K} r_k.$$

We point out that, in general, the α-fair allocation is also max-min fair for $\alpha\uparrow\infty$ and proportional fair for $\alpha\to 1$. If we consider the long-run process $\Gamma_s$, then in Definitions 11, 12, and 13 we should interpret $\mathcal{R}\equiv\mathcal{R}(\mathcal{P},\Gamma_s)$, while in channel state $s$, $\mathcal{R}\equiv\mathcal{R}(\mathcal{P},s)$. In the special case in which the feasibility region is a polymatroid, which is our case both in $\Gamma_s$ and in state $s$, for all $s\in\mathcal{S}$, the three fair allocations coincide.

Theorem 3.2.5 ([5]). If the feasibility region $\mathcal{R}$ is a polymatroid, then max-min, proportional, and α-fair allocations coincide for all $\alpha\ge 0$, and moreover belong to the facet $\mathcal{M}(\mathcal{R})$, i.e.
$$r^{(\mathrm{MM})} = r^{(\mathrm{PF})} = r^{(\alpha\mathrm{F})} := r^{(\mathrm{F})} \in \mathcal{M}(\mathcal{R}).$$

By Theorem 3.2.5, the three mentioned fair solutions coincide both in the long-run process $\Gamma_s$ and in state $s$, for all $s\in\mathcal{S}$. Therefore, we can generically refer to them as fair allocations, and we call $r^{(\mathrm{F})}(\Gamma_s)$ the fair allocation in the long-run process $\Gamma_s$, and $r^{(\mathrm{F})}(s)$ the fair allocation in state $s$. Moreover, a fair allocation belongs to the dominant facet of the associated feasibility region, hence it is a proper criterion to select a set of allocations in M.

Fair allocation design
Finally, we are ready to deal with the design of fair rate allocations on quasi-static channels. We will show under which conditions it is possible to allocate rates which are fair (i.e. max-min, proportional, and α-fair at the same time) both in each state and in the long-run process, and which remain fair throughout the game, from each intermediate step onwards, i.e. which are Time Consistent. More formally, we look for a sufficient condition under which the following holds:

$$\Psi^{-1}\,[r^{(\mathrm{F})}(\Gamma_s)]_{s\in\mathcal{S}} = [r^{(\mathrm{F})}(s)]_{s\in\mathcal{S}}, \qquad \Psi\,[r^{(\mathrm{F})}(s)]_{s\in\mathcal{S}} = [r^{(\mathrm{F})}(\Gamma_s)]_{s\in\mathcal{S}}. \qquad (3.31)$$
We stress that property (3.31) is crucial, mainly for the three reasons that we list below.
• The top-down procedure may fail; hence, if we choose $\{r^{(\mathrm{F})}(\Gamma_s)\}_{s\in\mathcal{S}}$, it is not necessarily feasible among the stationary allocations, i.e. in general it may happen that

$$\exists\, s\in\mathcal{S} :\ r(s)\notin\mathcal{R}(\mathcal{P},s), \quad \text{with } [r(s)]_{s\in\mathcal{S}} = \Psi^{-1}[r^{(\mathrm{F})}(\Gamma_s)]_{s\in\mathcal{S}}.$$

• Though the bottom-up procedure always produces feasible allocations, if the allocation is fair in each state, then it is not necessarily fair in the long-run processes as well. Indeed, it may happen that

$$\exists\, s\in\mathcal{S} :\ r(\Gamma_s)\neq r^{(\mathrm{F})}(\Gamma_s), \quad \text{with } [r(\Gamma_s)]_{s\in\mathcal{S}} = \Psi\,[r^{(\mathrm{F})}(s)]_{s\in\mathcal{S}}. \qquad (3.32)$$

As an example, in Figure 3.3 we show an instance in which (3.32) is verified.

• Most importantly, if relation (3.31) holds, then the fairness property of the rate allocation is Time Consistent (see Theorem 3.2.6). The Time Consistency of fair allocations requires that the fairness criterion inducing the enforcement of a certain rate allocation at time 0 remain consistent over time, at steps $T>0$ as well. More formally, at each time step $T$, the

$\beta$-discounted sum of allocations that each user obtains from time $T$ onwards should be fair in the long-run process $\Gamma_{S_T}$.

Theorem 3.2.6. If condition (3.31) holds, then the fairness of the stationary rate allocation $\{r^{(\mathrm{F})}(s)\}_{s\in\mathcal{S}}$ is Time Consistent, i.e. for all $T\in\mathbb{N}_0$,
$$\mathrm{E}\!\left[\left.\sum_{t=T}^{\infty}\beta^t r^{(\mathrm{F})}(S_t)\,\right|\,h(T)\right] = \beta^T r^{(\mathrm{F})}(\Gamma_{S_T}),$$
where $h(T)$ is the history of states/rate allocations up to time $T$.

Proof. Thanks to the stationarity of the rate allocations, we claim

$$\mathrm{E}\!\left[\left.\sum_{t=T}^{\infty}\beta^t r^{(\mathrm{F})}(S_t)\,\right|\,h(T)\right] = \mathrm{E}\!\left[\left.\sum_{t=T}^{\infty}\beta^t r^{(\mathrm{F})}(S_t)\,\right|\,S_T\right] = \beta^T\,\mathrm{E}\!\left[\left.\sum_{t=0}^{\infty}\beta^t r^{(\mathrm{F})}(S_{t+T})\,\right|\,S_T\right] = \beta^T r^{(\mathrm{F})}(\Gamma_{S_T}), \qquad (3.33)$$
where the last equality in (3.33) comes from condition (3.31). Hence, the thesis is proven.

Having presented the appealing properties of condition (3.31), we now wish to find a sufficient condition for (3.31) to hold. For this purpose, it is useful to first present an algorithm, originally studied in (Maddah-Ali, 2009 [53]), that produces the fair allocation in a general polymatroid $\mathcal{R}$ with rank function $g$. Of course, it can be utilized to compute the fair allocation in any state-wise and long-run process.

Algorithm 3.2.7 ([53]). Set $q := 1$. Set $\mathcal{P}' := \mathcal{P}$, $g' := g$.
1) Compute
$$\mathcal{T}^*_{(q)} = \arg\min_{\mathcal{T}\subseteq\mathcal{P}'} \frac{g'(\mathcal{T})}{|\mathcal{T}|}, \qquad r_k^{(\mathrm{F})} = \frac{g'(\mathcal{T}^*_{(q)})}{|\mathcal{T}^*_{(q)}|}, \quad \forall\, k\in\mathcal{T}^*_{(q)}.$$
2) If $\mathcal{T}^*_{(q)} = \mathcal{P}'$, then stop: the rate allocation $r^{\mathrm{F}}$ is fair for $\mathcal{R}$. Otherwise, set $q := q+1$, $\mathcal{P}' := \mathcal{P}'\setminus\mathcal{T}^*_{(q)}$,
$$g'(\mathcal{T}) := g'(\mathcal{T}\cup\mathcal{T}^*_{(q)}) - g'(\mathcal{T}^*_{(q)}), \qquad \forall\,\mathcal{T}\subseteq\mathcal{P}',$$
and return to step 1).
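The following is a brute-force sketch of Algorithm 3.2.7 for a small number of users. The rank function $g$ is assumed to be a monotone submodular set function with $g(\emptyset)=0$ (e.g. a state-wise or long-run rank function), and the contracted rank function of step 2 is expanded by telescoping as $g'(\mathcal{T}) = g(\mathcal{T}\cup A) - g(A)$, where $A$ is the union of the subsets removed so far. The example rank function at the end is hypothetical.

\begin{verbatim}
from itertools import combinations
from math import log2

def fair_allocation(users, g):
    """Fair (max-min = proportional = alpha-fair) allocation on a polymatroid
    with rank function g, following the iterations of Algorithm 3.2.7."""
    r, remaining, removed = {}, set(users), frozenset()
    while remaining:
        # contracted rank function of step 2, by telescoping:
        # g'(T) = g(T U removed) - g(removed)
        gp = lambda T: g(frozenset(T) | removed) - g(removed)
        # step 1: nonempty subset of the remaining users minimizing g'(T)/|T|
        best_T, best_val = None, float("inf")
        for size in range(1, len(remaining) + 1):
            for T in combinations(sorted(remaining), size):
                val = gp(T) / len(T)
                if val < best_val:
                    best_T, best_val = set(T), val
        for k in best_T:
            r[k] = best_val            # every user in T* gets the same rate
        removed |= frozenset(best_T)   # step 2: contract and iterate
        remaining -= best_T
    return r

# Hypothetical single-state Gaussian MAC rank function (rates in bits/s/Hz).
snr = {1: 3.0, 2: 1.0, 3: 0.5}         # |h_k|^2 P_k / N0 per user
g = lambda T: log2(1.0 + sum(snr[k] for k in T))
print(fair_allocation({1, 2, 3}, g))
\end{verbatim}

The subset search is exponential in the number of users; it is intended only to make the mechanics of steps 1 and 2 explicit.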

Figure 3.3: Example of the situation in (3.32) with two users and two states, in which the state-wise allocations are fair in the respective channel states but the related long-run allocations are not fair in the respective long-run processes. The allocations marked with an asterisk are fair, while the circles indicate the allocations actually computed. Rates are expressed in bits/s/Hz.

Finally, we are ready to show a condition that ensures the existence of a rate allocation design which is fair both in each state and in every long-run process, as described in (3.31), and for which the fairness criterion is Time Consistent, as shown in Theorem 3.2.6.

Theorem 3.2.8 (Sufficient condition for the existence of fair allocations). Let $\mathcal{T}(s) = [\mathcal{T}^*_{(1)}(s),\dots,\mathcal{T}^*_{(q(s))}(s)]$ be the sequence computed in the iterations of step 1 of Algorithm 3.2.7, applied to channel state $s$. Suppose that
$$\exists\,\mathcal{T} = \mathcal{T}(s), \qquad \forall\, s\in\mathcal{S},$$
i.e. $\mathcal{T}(s)$ does not depend on $s$. Then, condition (3.31) holds.

Proof. At step 1 of the first iteration of Algorithm 3.2.7 applied to the process $\Gamma_s$, we obtain
$$\mathcal{T}^*_{(1)}(\Gamma_s) = \arg\min_{\mathcal{T}\subseteq\mathcal{P}} \frac{\sum_{n=1}^{N}\nu_n(s)\, g_{(\mathcal{P})}(\mathcal{T},s_n)}{|\mathcal{T}|} = \mathcal{T}^*_{(1)}.$$
Hence, we can compute the fair allocation for the set of users $\mathcal{T}^*_{(1)}$ as $r_k^{\mathrm{F}}(\Gamma_s) = \sum_{n=1}^{N}\nu_n(s)\, r_k^{\mathrm{F}}(s_n)$, for all $k\in\mathcal{T}^*_{(1)}$. Then, at step 2, the update of the rank function
$$g'_{(\mathcal{P})}(\mathcal{T},\Gamma_s) = \sum_{n=1}^{N}\nu_n(s)\, g'_{(\mathcal{P})}(\mathcal{T},s_n), \qquad \forall\,\mathcal{T}\subseteq\mathcal{P}\setminus\mathcal{T}^*_{(1)},$$
preserves the linearity property of the rank function also in the next iteration. Hence, by induction, the thesis is proven.

3.2.4 Optimal and Satisfactory allocations: A game-theoretical approach
In (3.30) we have defined the global optimum rate region M as the set of stationary state-wise allocations belonging to the dominant facets of both the state-wise and the long-run feasible rate regions. In this section we provide two further characterizations of M in cooperative game-theoretical terms. We will show that M, besides being globally optimal, also "satisfies" all the users throughout the game. Hence, in our case, CGT is utilized as a mathematical tool which allows us to define and quantify the users' satisfaction with the assigned rate allocation.

CORE characterization of M
Generally speaking, Static Cooperative Game Theory (SCGT) with non-transferable utility (NTU) studies one-shot interactions among different players who can collaborate with each other by coordinating their respective strategies. It is assumed that the grand coalition $\mathcal{P}$, composed of all the players, is formed, and the main challenge consists in devising a pay-off allocation for each player according to some pre-defined criteria. To this aim, the typical procedure in SCGT consists in investigating the potential scenario in which a sub-coalition (or simply, coalition) of players $\mathcal{C}\subset\mathcal{P}$ withdraws from the grand coalition and no longer coordinates its actions with the excluded players; then, the set of pay-off allocations that $\mathcal{C}$ can earn on its own is computed (see Peleg and Sudhölter, 2007 [69] for a thorough survey).
Let us then translate these few preliminary concepts into our scenario. We first consider the static process in state $s$, which we call the static game. For the static game case we adopt the same model as in (La and Anantharam, 2004 [50]). In our situation, the players are the users, and the grand coalition is the set of transmitting users $\mathcal{P}$. We say that a coalition of users $\mathcal{C}_{\mathcal{J}} := \mathcal{P}\setminus\mathcal{J} \subset \mathcal{P}$ forms when its members share their respective codes with the receiver, which can then decode the signals transmitted by $\mathcal{C}_{\mathcal{J}}$. For us, the pay-off of a player is the assigned transmission rate. The SCGT literature provides several ways to compute the set of rate allocations achievable by each subset of users $\mathcal{C}_{\mathcal{J}}$. One of the most utilized is the max-min method, originally introduced by von Neumann and Morgenstern (1944) in [96], suggesting that the set of feasible allocations $\mathcal{R}(\mathcal{C}_{\mathcal{J}},s)$ should be defined as the set of rate allocations that $\mathcal{C}_{\mathcal{J}}$ can achieve whatever the transmission strategy employed by the remaining users $\mathcal{J}$. Then, we need to take into account the worst possible scenario for $\mathcal{C}_{\mathcal{J}}$, i.e. when the users in $\mathcal{J}$ do not allow joint decoding and jam the network, and investigate the set of rates $\mathcal{R}(\mathcal{C}_{\mathcal{J}},s)$ that the users in $\mathcal{C}_{\mathcal{J}}$ can achieve in this hypothetical worst-case scenario. When the users in $\mathcal{J}$ jam, they sum their respective signals coherently and transmit with an overall power:

$$\Lambda(\mathcal{J},s) = \Bigl(\sum_{k\in\mathcal{J}} |h^{(k)}(s)|\sqrt{P_k}\Bigr)^{2}.$$
In this worst-case scenario, it is shown in (La and Anantharam, 2004 [50]) that, among $\mathcal{C}_{\mathcal{J}}$, only the users whose associated received power level is high enough to overwhelm the jamming signal can communicate, i.e.
$$\widehat{\mathcal{C}}_{\mathcal{J}}(s) := \bigl\{ k\in\mathcal{C}_{\mathcal{J}} :\ |h^{(k)}(s)|^2 P_k > \Lambda(\mathcal{J},s) \bigr\}.$$
Then, $\mathcal{R}(\mathcal{C}_{\mathcal{J}},s)$ is a polymatroid with rank function (La and Anantharam, 2004 [50]):
$$g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{T},s) := C\Bigl(\sum_{k\in\mathcal{T}} |h^{(k)}(s)|^2 \widetilde{P}_k,\ \Lambda(\mathcal{J},s) + N_0\Bigr), \qquad \forall\,\mathcal{T}\subseteq\mathcal{C}_{\mathcal{J}},$$
where $\widetilde{P}_k = P_k$ for $k\in\widehat{\mathcal{C}}_{\mathcal{J}}(s)$ and $\widetilde{P}_k = 0$ for all $k\in\mathcal{C}_{\mathcal{J}}\setminus\widehat{\mathcal{C}}_{\mathcal{J}}(s)$.
Now, let us consider the feasibility region $\mathcal{R}(\mathcal{C}_{\mathcal{J}},\Gamma_s)$ for a coalition $\mathcal{C}_{\mathcal{J}}$ in the long-run process (or game) $\Gamma_s$. Similarly to the static case, it is still defined in the max-min fashion, as the set of long-run rate allocations that the users $\mathcal{C}_{\mathcal{J}}$ can guarantee, whatever the transmission strategy adopted by $\mathcal{J}$ throughout the process. Therefore, we have to consider the worst-case scenario in which $\mathcal{J}$ jams during the whole process $\Gamma_s$ and, analogously to Lemma 3.2.1, we claim that $\mathcal{R}(\mathcal{C}_{\mathcal{J}},\Gamma_s)$ is a polymatroid with rank function:
$$g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{T},\Gamma_{s_j}) = \sum_{n=1}^{N}\nu_n(s_j)\, g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{T},s_n), \qquad \forall\,\mathcal{T}\subseteq\mathcal{C}_{\mathcal{J}}.$$
Our goal is now to further characterize M, and we achieve this via the definition of the Core set for NTU cooperative games. The Core is the set of rate allocations that no coalition $\mathcal{C}_{\mathcal{J}}\subset\mathcal{P}$ can improve upon when the remaining users $\mathcal{J}$ jam. Let us define the Core formally in the static game in state $s$. We say that a rate allocation $r\in\mathcal{R}(\mathcal{P},s)$ for the grand coalition is blocked by the coalition $\mathcal{C}_{\mathcal{J}}\subseteq\mathcal{P}$ whenever there exists $r'\in\mathcal{R}(\mathcal{C}_{\mathcal{J}},s)$ such that $r'_k > r_k$ for all $k\in\mathcal{C}_{\mathcal{J}}$. In other words, the rate allocation $r$ is unacceptable to the set of users in $\mathcal{C}_{\mathcal{J}}$.
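For concreteness, the worst-case coalition value introduced above can be sketched as follows. The form of the capacity function $C(\cdot,\cdot)$ is not restated here, so the snippet assumes the usual Gaussian expression $C(a,b) = \log_2(1+a/b)$; all variable names and numbers are illustrative.

\begin{verbatim}
import numpy as np

def coalition_rank(T, coalition, h, Pw, N0):
    """Sketch of g_{(C_J)}(T, s) for one channel state s.
    `coalition` is C_J; the users outside it (the jammers J) add their signals
    coherently, and only members of C_J whose received power exceeds the
    jamming power Lambda(J, s) keep a nonzero effective power."""
    jammers = set(h) - set(coalition)
    lam = sum(abs(h[k]) * np.sqrt(Pw[k]) for k in jammers) ** 2   # Lambda(J, s)
    P_eff = {k: (Pw[k] if abs(h[k]) ** 2 * Pw[k] > lam else 0.0)
             for k in coalition}
    sig = sum(abs(h[k]) ** 2 * P_eff[k] for k in T)
    return np.log2(1.0 + sig / (lam + N0))   # assumed C(a, b) = log2(1 + a/b)

# Hypothetical state: three users, coalition {1, 2}, user 3 jams.
h = {1: 1.0, 2: 0.8, 3: 0.5}
Pw = {1: 1.0, 2: 1.0, 3: 1.0}
print(coalition_rank({1, 2}, {1, 2}, h, Pw, N0=1.0))
\end{verbatim}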

Definition 14. The Core Co(s) is the set of unblocked rate allocations in $\mathcal{R}(\mathcal{P},s)$.

Remark 9. We can intuitively define the Core as the set of all "acceptable" rates for all users: indeed, if an allocation does not belong to the Core, at least one subset of users is dissatisfied with it, because its members can all attain a better rate allocation even when the remaining users do not participate in the transmission and jam instead.

Additionally, note that an allocation in Co(s) is also not blocked by the grand coalition $\mathcal{P}$; since $\mathcal{R}(\mathcal{P},s)$ is a polymatroid, it follows that such an allocation has maximum sum-rate, i.e. $\mathrm{Co}(s)\subseteq\mathcal{M}(\mathcal{P},s)$, for all $s\in\mathcal{S}$. The Core $\mathrm{Co}(\Gamma_s)$ of the long-run game $\Gamma_s$ is defined analogously to the static case. We remark that it coincides with the set of long-run allocations that are acceptable to each subset of users at the beginning of the long-run game. This definition of $\mathrm{Co}(\Gamma_s)$ relates to SCGT, in which the coalition structure holds steady throughout the game and players do not change their preference over the rate allocations over time. This is a naïve perspective though, since the channel is dynamic. Hence, we demand that a stationary rate allocation be not only "acceptable" to each coalition at the beginning of the game, but also throughout the game. This property is called, in dynamic CGT, Time Consistency of the Core (Petrosjan, 1977 [72]). The philosophy behind this definition is analogous to the Time Consistency of fair allocations in Theorem 3.2.6. Hence, if the Core property of an allocation is Time Consistent, then at each time step, if any coalition faces the dilemma "do we withdraw now or do we cooperate forever?", it always prefers the second option. Therefore, we will focus our attention on the allocations in Co, defined as follows, and we will prove that Co = M.

Definition 15 (Co). Co is the set of stationary state-wise allocations belonging to the Core of each static game, and that belong to the Core of the long-run games in a Time Consistent fashion throughout the game, i.e.

$$\mathrm{Co} := \Bigl\{ \{r(s)\}_{s\in\mathcal{S}} :\ r(s)\in\mathrm{Co}(s),\ \ \mathrm{E}\Bigl[\sum_{t=T}^{\infty}\beta^t r(S_t)\,\Big|\,h(T)\Bigr] \in \beta^T\,\mathrm{Co}(\Gamma_{S_T}),\ \forall\, T\in\mathbb{N}_0 \Bigr\},$$
where $h(T)$ is the history of states/rate allocations up to time $T$.

Hence, Co is the set of stationary allocations that have maximum sum-rate, and are therefore optimal for the global network, and that are "acceptable" to each subset of users, in both the static and the long-run games, throughout the game. Hence, we can already claim that $\mathrm{Co}\subseteq\mathrm{M}$. Let us now show that $\mathrm{Co} = \mathrm{M}$. In [50], La and Anantharam computed the Core of the static game by relying on SCGT with transferable utilities (TU). Their approach is not completely rigorous, since the rate cannot be shared in an arbitrary manner among the users, but only within the capacity region. Nevertheless, NTU cooperative game theory yields the same result as (La and Anantharam, 2004 [50]), as we show next.

Theorem 3.2.9. The Core Co(s) coincides with the facet $\mathcal{M}(\mathcal{P},s)$ of the feasibility region $\mathcal{R}(\mathcal{P},s)$ for the grand coalition.

Proof. It is known (e.g. Edmonds, 2003 [30]) that all the points in $\mathcal{M}(\mathcal{P},s)$ solve the linear program $\max_{r\in\mathcal{R}(\mathcal{P},s)} \sum_{k\in\mathcal{P}} r_k$. Hence, all the points in $\mathcal{M}(\mathcal{P},s)$ are efficient for $\mathcal{P}$. Moreover, in (La and Anantharam, 2004 [50]) it is shown that, for all $r\in\mathcal{M}(\mathcal{P},s)$,
$$\sum_{k\in\mathcal{C}_{\mathcal{J}}} r_k \ \ge\ g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{C}_{\mathcal{J}},s), \qquad \forall\,\mathcal{C}_{\mathcal{J}}\subset\mathcal{P}.$$
Hence, we can say that, for all $r\in\mathcal{M}(\mathcal{P},s)$, there exists no allocation belonging to $\mathcal{M}(\mathcal{C}_{\mathcal{J}},s)$ that dominates $r$ for coalition $\mathcal{C}_{\mathcal{J}}$. Since any rate allocation belonging to $\mathcal{R}(\mathcal{C}_{\mathcal{J}},s)$ is dominated by a rate allocation in $\mathcal{M}(\mathcal{C}_{\mathcal{J}},s)$, then $\mathcal{M}(\mathcal{P},s)\subseteq\mathrm{Co}(s)$. If $r\notin\mathcal{M}(\mathcal{P},s)$, then either it is not feasible or it is not efficient for $\mathcal{P}$. Then, $\mathcal{M}(\mathcal{P},s) = \mathrm{Co}(s)$.

In the light of Theorem 3.2.9 and Lemma 3.2.1, we can easily provide an expression for $\mathrm{Co}(\Gamma_s)$ as well.

Corollary 3.2.10. The Core $\mathrm{Co}(\Gamma_s)$ of the long-run game $\Gamma_s$ coincides with the facet $\mathcal{M}(\mathcal{P},\Gamma_s)$.

Now, we are ready to prove that M = Co.

Theorem 3.2.11. The set of stationary state-wise rate allocations M coincides with Co, i.e. M = Co.

Proof. We know that $\mathrm{Co}\subseteq\mathrm{M}$; we have to prove that $\mathrm{M}\subseteq\mathrm{Co}$. By Theorem 3.2.9, if $\{r(s)\}_{s\in\mathcal{S}}\in\mathrm{M}$, then $r(s)\in\mathrm{Co}(s)$ for all $s\in\mathcal{S}$. Then, we just need to prove that, if $\{r(s)\}_{s\in\mathcal{S}}\in\mathrm{M}$, then the Core property is Time Consistent in the long-run game. Similarly to the proof of Theorem 3.2.6, we claim that, for all $T\in\mathbb{N}_0$,
$$\mathrm{E}\Bigl[\sum_{t=T}^{\infty}\beta^t r(S_t)\,\Big|\,h(T)\Bigr] = \beta^T r(\Gamma_{S_T}),$$

where $[r(\Gamma_s)]_{s\in\mathcal{S}} = \Psi[r(s)]_{s\in\mathcal{S}}$ is the set of the associated long-run allocations. Thanks to Proposition 3.2.2, $r(\Gamma_{S_T})\in\mathrm{Co}(\Gamma_{S_T})$. Hence, the thesis is proven.

Thanks to Theorem 3.2.11, the set of stationary state-wise rate allocations M gains further significance. Not only is M the maximum sum-rate region, but it also coincides with the set of rates which are "acceptable" both in the long-run and in the static games, in the sense of the Core. Moreover, the Core criterion is Time Consistent, hence such rates remain acceptable throughout the game. In the next section we provide a second characterization of M, based on a Cooperation Maintenance property.

COOPERATION MAINTENANCE characterization of M

In this section we show that, by exploiting a crucial concept in DCGT, called the Cooperation Maintenance property, we are able to provide a further characterization of the set M of maximum sum-rate stationary state-wise allocations. The property that we are going to define is an adaptation to our NTU scenario of the Cooperation Maintenance property defined in (Mazalov and Rettieva, 2010 [59]) and in Section 3.1. It requires that, at each time step, the maximum sum-rate that coalition $\mathcal{C}_{\mathcal{J}}$ expects to obtain if it withdraws from the grand coalition in one step (without any chance of joining back) be not smaller than what $\mathcal{C}_{\mathcal{J}}$ obtains if it withdraws (again, without a second thought) at the current step.

Remark 10. When we say, in game-theoretical jargon, that a coalition $\mathcal{C}_{\mathcal{J}}$ is enticed to withdraw from the grand coalition, we actually mean that it is dissatisfied with its assigned rate because, even in the worst-case scenario in which $\mathcal{J}$ jams, $\mathcal{C}_{\mathcal{J}}$ could achieve a better allocation. Hence, as in Section 3.2.4, we will utilize Game Theory as a tool to measure the users' satisfaction with the assigned rate.

The set of allocations for which the Cooperation Maintenance property holds is called CM.

Definition 16 (CM). The set of (first step) Cooperation Maintaining allocations CM is the set of stationary state-wise rate allocations $\{r(s)\in\mathcal{M}(\mathcal{P},s)\}_{s\in\mathcal{S}}$ such that, for all coalitions $\mathcal{C}_{\mathcal{J}}\subseteq\mathcal{P}$ and at each time step $T\in\mathbb{N}_0$,
$$\sum_{k\in\mathcal{C}_{\mathcal{J}}} r_k(S_T) + \beta \sum_{s'\in\mathcal{S}} p(s'\,|\,S_T) \Bigl[\max_{r(\Gamma_{s'})\in\mathcal{R}(\mathcal{C}_{\mathcal{J}},\Gamma_{s'})} \sum_{k\in\mathcal{C}_{\mathcal{J}}} r_k(\Gamma_{s'})\Bigr] \ \ge\ \max_{r(\Gamma_{S_T})\in\mathcal{R}(\mathcal{C}_{\mathcal{J}},\Gamma_{S_T})} \sum_{k\in\mathcal{C}_{\mathcal{J}}} r_k(\Gamma_{S_T}). \qquad (3.34)$$
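Inequality (3.34) can be checked numerically for a given coalition once its worst-case values are available. The sketch below assumes the discounted criterion, with g_coal[s] holding $g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{C}_{\mathcal{J}},s)$ for each state and r_coal[s] the total rate assigned to the members of $\mathcal{C}_{\mathcal{J}}$ in state $s$; the long-run values are obtained from the rows of $(\mathbf{I}-\beta\mathbf{P})^{-1}$, as in Lemma 3.2.1. All inputs are placeholders.

\begin{verbatim}
import numpy as np

def cooperation_maintenance_holds(r_coal, g_coal, P, beta, tol=1e-9):
    """Check inequality (3.34) for one coalition C_J in every state (sketch).
    r_coal[s] : sum of the stationary rates of the members of C_J in state s
    g_coal[s] : worst-case value g_{(C_J)}(C_J, s) of the coalition in state s"""
    P = np.asarray(P, dtype=float)
    S = P.shape[0]
    nu = np.linalg.inv(np.eye(S) - beta * P)          # rows nu(s)
    g_long = nu @ np.asarray(g_coal, dtype=float)     # g_{(C_J)}(C_J, Gamma_s)
    lhs = np.asarray(r_coal, dtype=float) + beta * P @ g_long
    return bool(np.all(lhs >= g_long - tol))

# Placeholder two-state example.
print(cooperation_maintenance_holds(r_coal=[1.1, 0.9], g_coal=[1.0, 0.8],
                                    P=[[0.7, 0.3], [0.4, 0.6]], beta=0.9))
\end{verbatim}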

The intuition behind the definition of CM is that, if a coalition faces the dilemma "do we withdraw now or in one step?", it should prefer the second option, at any instant. In this way, by induction, no coalition is ever enticed to withdraw and the grand coalition stays cohesive throughout the game. It follows from Definition 16 that $\mathrm{CM}\subseteq\mathrm{Co}$. Also, it is not difficult to show that, if the (first step) Cooperation Maintenance property holds, then the $n$-tuple step Cooperation Maintenance property also holds (see Section 3.1 for a more general case), i.e. if a coalition faces the dilemma "do we withdraw now or in $n$ steps?", it prefers the second option. For $n\uparrow\infty$, this property suggests that, whenever a coalition faces the dilemma "do we withdraw now or cooperate forever?", it prefers to stick with the grand coalition forever. Not surprisingly, this notion coincides with the Time Consistency property of the Core that any allocation in M possesses, as illustrated in Theorem 3.2.11. We remark that, in more general settings, CM is smaller than the set of stationary distributions belonging to the Core of the long-run games (see Section 3.1). So, the definition of CM requires a "higher level of satisfaction" of the players than the Core. We now state that, in our scenario, M = CM actually holds. Through this result, we provide a second dynamic characterization of the set M.

Theorem 3.2.12. The maximum sum-rate set of stationary state-wise allocations M coincides with the Cooperation Maintaining set CM, i.e. M = CM.

Proof. By Proposition 3.2.2, $\mathrm{CM}\subseteq\mathrm{M}$. Conversely, if an allocation $\{r(s)\}_{s\in\mathcal{S}}\in\mathrm{M}$, then it also belongs to Co. So, $\sum_{k\in\mathcal{C}_{\mathcal{J}}} r_k(s) \ge g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{C}_{\mathcal{J}},s)$, for all $\mathcal{C}_{\mathcal{J}}\subseteq\mathcal{P}$, $s\in\mathcal{S}$. Then, thanks to Lemma 3.2.1, we can say that, for all $\mathcal{C}_{\mathcal{J}}\subseteq\mathcal{P}$,
$$\sum_{k\in\mathcal{C}_{\mathcal{J}}} \begin{bmatrix} r_k(s_1) \\ \vdots \\ r_k(s_N) \end{bmatrix} \ \ge\ \Psi^{-1} \begin{bmatrix} g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{C}_{\mathcal{J}},\Gamma_{s_1}) \\ \vdots \\ g_{(\mathcal{C}_{\mathcal{J}})}(\mathcal{C}_{\mathcal{J}},\Gamma_{s_N}) \end{bmatrix},$$
which is an expression equivalent to (3.34). Hence, $\mathrm{M}\subseteq\mathrm{CM}$ and the thesis is proven.

Therefore, in this section we have provided two game-theoretical characterizations of the globally optimal set of allocations M, i.e.

M = Co = CM.

Hence, M coincides with the set of rates Co which are acceptable for all coalitions throughout the game, and with the set of rates CM that make the grand coalition cohesive at every step of the game.

3.2.5 Conclusions
In this paper we considered a quasi-static Markovian multiple access channel, in which a rate is allocated to each user in each channel state. We focus on the set M of allocations which are maximum sum-rate, both in each state and in the long-run process. In Section 3.2.3 we investigate two allocation procedures, namely bottom-up and top-down. Though the latter is more useful under a long-run perspective, it does not always produce feasible allocations; Theorem 3.2.4 offers a remedy for this. In Section 3.2.3 we also demand the existence of an allocation which is fair both in each state and in the long-run process, and moreover ask that the fairness property be Time Consistent; Theorem 3.2.8 provides a sufficient condition for this. In Section 3.2.4 we provide two further characterizations of M, by utilizing two different concepts in dynamic cooperative game theory which can be considered as measures of the users' satisfaction. Firstly, in Theorem 3.2.11 we show that M coincides with the Core set Co of allocations which are, in a sense, "acceptable" to all the users, both in the static and in the long-run game, in a Time Consistent fashion. Secondly, in Theorem 3.2.12 we show that M also coincides with the set of Cooperation Maintaining allocations CM that make the coalition of all players cohesive throughout the game. Therefore, all allocations in CM are both globally optimal and satisfy the users throughout the process, according to the criteria defined by Co and CM.

3.3 Confidence intervals for the Shapley-Shubik power index in Markovian games

We consider a simple Markovian game, in which several states succeed each other over time, following an exogenous discrete-time Markov chain. In each state, a different simple static game is played by the same set of players. We investigate the approximation of the Shapley-Shubik power index in the Markovian game (SSM). We prove that an exponential number of queries on coalition values is necessary for any deterministic algorithm even to approximate SSM with polynomial accuracy. Motivated by this, we propose and study three randomized approaches to compute a confidence interval for SSM. They rest upon two different assumptions, static and dynamic, about the process through which the estimator agent learns the coalition values. Such approaches can also be utilized to compute confidence intervals for the Shapley value in any Markovian game. The proposed methods require a number of queries which is polynomial in the number of players in order to achieve a polynomial accuracy.

3.3.1 Introduction
Cooperative game theory is a powerful tool to analyse, predict, and influence the interactions among several players capable of stipulating deals and forming subcoalitions in order to pursue a common interest. Under the assumption that the grand coalition, comprising all the players, is formed, it is a delicate issue to share the payoff earned by the grand coalition among its participants. Introduced by Lloyd S. Shapley in his seminal paper [82], the Shapley value is one of the best known payoff allocation rules in cooperative games with transferable utility (TU). It is the only allocation procedure fulfilling three reasonable conditions of symmetry, additivity and dummy player compensation (see [82] for details), under a superadditivity assumption on the coalition values. The significance of the Shapley value is witnessed by the breadth of its applications, spanning from pure economics [9] to Internet economics [13,52,89], politics [10], and telecommunications [43]. The concept of the Shapley value has been successfully applied to simple games [85], in which the coalition values are binary. In this case, the Shapley value is commonly referred to as the Shapley-Shubik power index. A specific instance of simple games are weighted voting games, in which each player possesses a different amount of resources and a coalition is effective, i.e. its value is 1, whenever the sum of the resources shared by its participants is higher than a certain quota; otherwise, its value is 0. The Shapley-Shubik index proves to be particularly suitable to assess a priori the power of the members of a legislative committee, and has many applications in politics (see [91] for an overview).

The computation of the Shapley value for each player $j = 1,\dots,P$ involves the assessment of the increment in the value of a coalition brought about by the presence of player $j$, over all $2^{P-1}$ possible coalitions. Hence, it is clear that the complexity of the Shapley value in the number of players $P$ is a crucial issue. Mann and Shapley himself [56] were the first to suggest adopting a Monte-Carlo procedure to approximate the Shapley-Shubik index. They first proposed a very simple algorithm, randomly generating a succession of players' permutations and evaluating the incremental value of player $j$ with respect to the coalition formed by its preceding players in each permutation. The Shapley-Shubik index is approximated as the average of such increments. They then empirically showed that the "cycling scheme" described below is characterized by a smaller variance. First, a target player is singled out, and the remaining players are placed in a random order. Then, this order is put through all of its cyclic permutations, and the target player is inserted in each position in each permutation. Thus, $P(P-1)$ permutations are generated, and for each of them the incremental value of player $j$ with respect to the coalition formed by its preceding players is assessed. For this cycling approach, deriving a confidence interval for the Shapley-Shubik index seems to be a hard task. Hence, Bachrach et al. adopted in [14] the first Monte-Carlo procedure described above to compute a confidence interval. This approximation method, presented for simple games, can be easily generalized to any game.
The bulk of the literature on cooperative games focuses on static games. However, politics or economics is more like a process of continuing negotiation and bargaining. This motivates the introduction of dynamic cooperative game theory (see e.g. [31], [48]). In this work we consider that the game is not played one-shot but rather over an infinite horizon: there exists a finite set of static cooperative games that come one after the other, following a discrete-time homogeneous Markov process. We call this interaction model, repeated over time, a Markovian game. Our Markovian game model arises naturally in all situations in which several individuals keep interacting and cooperating over time, while an exogenous Markov process influences the value of each coalition, and consequently also the power of each player within coalitions. A very similar model, but with non-transferable utilities, was considered in [75]. Our model can also be viewed as a particular case of the cooperative Markov decision processes described in [11] or in [73], in which the transition probabilities among the states do not depend on the players' actions. We take into account the average and the discount criterion to compute the payoff earned by each player in the long-run Markovian game. In this article we extend the approach by Bachrach et al. [14] to compute a confidence interval for the Shapley value in Markovian games. In [14], the authors considered a simple static game and proved that any deterministic algorithm which approximates one component of the Banzhaf index with accuracy better than $c/\sqrt{P}$, where $c>0$ and $P$ is the number of players, needs $\Omega(2^P/\sqrt{P})$ queries. Hence, when $P$ grows large, it is crucial to find a suitable way to approximate the power index with a manageable number of queries.
Hence, in [14] a confidence interval for the Banzhaf index and the Shapley-Shubik power index in simple games was developed, based on Hoeffding's inequality. In this article we assume that the estimator agent knows the transition probabilities among the states. We first show that it is still beneficial to utilize a randomized approach to approximate the Shapley-Shubik index in simple Markovian games (SSM) when the number of players $P$ is sufficiently large. Then, we propose three methods to compute a confidence interval for SSM, which also apply to the Shapley value of any Markovian game. We essentially demonstrate that, asymptotically in the number of steps of the Markov chain and by exploiting Hoeffding's inequality, the estimator agent does not need to have access to the coalition values in all the states at the same time. Indeed, it suffices for the estimator agent to learn the coalition values in each state along the course of the game in order to approximate SSM "well".

Let us overview the content of this article. We provide some useful definitions, background results, and motivations for our dynamic model in Section 3.3.2. In Section 3.3.3 we motivate the significance of our Markovian model. In Section 3.3.4 we study the trade-off between complexity and accuracy of deterministic algorithms approximating SSM: an exponential number of queries is necessary for any deterministic algorithm even to approximate SSM with polynomial accuracy. Motivated by this, we propose three different randomized approaches to compute a confidence interval for SSM, whose complexity does not even depend on the number of players. Such approaches also hold for the classic Shapley value of any cooperative Markovian game (ShM). In Section 3.3.5 we provide the expression of our first confidence interval, SCI, which relies on the static assumption that the estimator agent has access to the coalition values in all the states at the same time, even before the Markov process initiates. Although SCI relies on an impractical assumption, it is still a valid benchmark for the performance of the approaches yielding the confidence intervals described in Section 3.3.6, dubbed DCI1 and DCI2 respectively. DCI1 and DCI2 also hold under the more realistic dynamic assumption that the estimator agent learns the value of coalitions along the course of the game. In Section 3.3.6 we also propose a straightforward way to optimize the tightness of DCI1. In Section 3.3.7 we compare the three proposed approaches in terms of tightness of the confidence interval. Finally, in Section 3.3.8 we provide a trade-off between complexity and accuracy of our randomized algorithm, holding for any cooperative Markovian game. We remark that the extension of our approaches to the Banzhaf index [16] is straightforward.

Some notation remarks. If $a$ is a vector, then $a_i$ is its $i$-th component. If $A$ is a random variable (r.v.), then $A_t$ is its $t$-th realization. Given a set $S$, $|S|$ is its cardinality. The expression $b^{(s)}$ indicates that the quantity $b$, standing possibly for the Shapley value, the Shapley-Shubik index, a coalition value, a feasibility region, etc., is related to the static game played in state $s$. The expression $\Pr(B)$ stands for the probability of event $B$. The indicator function is written as $1\!\mathrm{I}(\cdot)$. With some abuse of terminology, we will refer to a confidence interval and to the approach utilized to compute it without distinction.

3.3.2 Markovian Model and Background results
In this article we consider cooperative Markovian games with transferable utility (TU). Let $P$ be the number of players and let $\mathcal{P} = \{1,\dots,P\}$ be the grand coalition of all players. We have a finite set of states $S = \{s_1,\dots,s_{|S|}\}$. In state $s$, each coalition $\Lambda\subseteq\mathcal{P}$ can ensure for itself the value $v^{(s)}(\Lambda)$, which can be shared in any manner among the players under the TU assumption. Hence, in each state $s\in S$ the game $\Psi^{(s)}\equiv(\mathcal{P},v^{(s)})$ is played. Let $\mathcal{V}^{(s)}(\Lambda)$ be the half-space of all feasible allocations for coalition $\Lambda$ in the TU game $\Psi^{(s)}$, i.e. the set of real $|\Lambda|$-tuples $a\in\mathbb{R}^{|\Lambda|}$ such that $\sum_{i=1}^{|\Lambda|} a_i \le v^{(s)}(\Lambda)$. We suppose that the coalition values are superadditive, i.e.
$$v^{(s)}(\Lambda_1\cup\Lambda_2) \ \ge\ v^{(s)}(\Lambda_1) + v^{(s)}(\Lambda_2), \qquad \forall\,\Lambda_1,\Lambda_2\subseteq\mathcal{P},\ \Lambda_1\cap\Lambda_2 = \emptyset.$$
The succession of the states follows a discrete-time homogeneous Markov chain, whose transition probability matrix is $\mathbf{P}$. Let $x^{(s)}\in\mathbb{R}^P$ be a pay-off allocation among the players in the single stage game $\Psi^{(s)}$. Under the $\beta$-discounted criterion, where $\beta\in[0;1)$, the discounted allocation in the Markovian dynamic game $\Gamma_{s_k}$, starting from state $s_k$, can be expressed as

$$\sum_{t=0}^{\infty} \beta^t\, \mathrm{E}\bigl[x^{(S_t)}\bigr] = \sum_{i=1}^{|S|} \nu_i^{(\beta)}(s_k)\, x^{(s_i)},$$
where $S_t$ is the state of the Markov chain at time $t$ and $\nu^{(\beta)}(s_k)$ is the $k$-th row of the nonnegative matrix $(\mathbf{I}-\beta\mathbf{P})^{-1}$. We stress that $1-\beta$ can be interpreted as the probability that the game terminates at any given step. Under the average criterion, if the transition probability matrix $\mathbf{P}$ is irreducible, then the allocation in the long-run game $\Gamma_{s_k}$ can be written as

$$\limsup_{T\to\infty} \frac{1}{T+1} \sum_{t=0}^{T} \mathrm{E}\bigl[x^{(S_t)}\bigr] = \sum_{i=1}^{|S|} \pi_i\, x^{(s_i)},$$
where $\pi$ is the stationary distribution of the matrix $\mathbf{P}$.
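The weights $\sigma_i(s)$ used below can be computed directly from $\mathbf{P}$; a small sketch (with a hypothetical matrix) for both criteria follows.

\begin{verbatim}
import numpy as np

P = np.array([[0.8, 0.2],
              [0.5, 0.5]])          # hypothetical transition matrix
beta = 0.9

# Discounted criterion: sigma(s_k) = nu^(beta)(s_k), k-th row of (I - beta P)^{-1}
sigma_discounted = np.linalg.inv(np.eye(len(P)) - beta * P)

# Average criterion: sigma_i = pi_i, stationary distribution (pi P = pi, sum pi = 1)
A = np.vstack([P.T - np.eye(len(P)), np.ones(len(P))])
b = np.zeros(len(P) + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(sigma_discounted[0], pi)      # weights for initial state s_1
\end{verbatim}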

We define $\mathcal{V}(\Lambda,\Gamma_s)$ as the set of feasible allocations in the long-run game $\Gamma_s$ for coalition $\Lambda$, coinciding with the Minkowski sum:

$$\mathcal{V}(\Lambda,\Gamma_s) \ \equiv\ \sum_{i=1}^{|S|} \sigma_i(s)\, \mathcal{V}^{(s_i)}(\Lambda),$$
where $\sigma_i(s)\equiv\nu_i^{(\beta)}(s)$ if the $\beta$-discounted criterion is adopted, and $\sigma_i(s)\equiv\pi_i$ under the average criterion.

Proposition 3.3.1 ([11]). $\mathcal{V}(\Lambda,\Gamma_s)$ is equivalent to the set of real $|\Lambda|$-tuples $a$ such that $\sum_{i=1}^{|\Lambda|} a_i \le v(\Lambda,\Gamma_s)$, where $v(\Lambda,\Gamma_s) = \sum_{i=1}^{|S|}\sigma_i(s)\, v^{(s_i)}(\Lambda)$, for all $s\in S$, $\Lambda\subseteq\mathcal{P}$.

Thanks to Proposition 3.3.1, it is legitimate to define $v(\Lambda,\Gamma_s)$ as the value of coalition $\Lambda\subseteq\mathcal{P}$ in the long-run game $\Gamma_s$. Let us now define the Shapley value in static games [82].

Definition 17. The Shapley value $h_{\mathrm{S}}^{(s)}$ in the static game played in state $s\in S$ is a real $P$-tuple whose $j$-th component is the payoff allocation to player $j$:

$$h_{\mathrm{S},j}^{(s)} = \sum_{\Lambda\subseteq\mathcal{P}\setminus\{j\}} \frac{|\Lambda|!\,(P-|\Lambda|-1)!}{P!} \Bigl[ v^{(s)}(\Lambda\cup\{j\}) - v^{(s)}(\Lambda) \Bigr].$$
Now we are ready to define the Shapley value in the Markovian game $\Gamma_s$, $\mathrm{ShM}(\Gamma_s)$, which can be expressed, thanks to Proposition 3.3.1 and to the standard linearity property of the Shapley value, as

$$\mathrm{ShM}_j(\Gamma_s) = \sum_{i=1}^{|S|} \sigma_i(s)\, h_{\mathrm{S},j}^{(s_i)}, \qquad \forall\, s\in S,\ 1\le j\le P. \qquad (3.35)$$
In the next sections we will exploit Hoeffding's inequality [40] to derive basic confidence intervals for the Shapley value of Markovian games.
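Before moving to Hoeffding-based intervals, expression (3.35) can be evaluated exactly for very small $P$ by enumerating all $P!$ permutations per state and weighting the per-state Shapley values by $\sigma(s)$. The characteristic functions and parameters below are placeholders, and $\sigma$ is passed in as computed under the chosen criterion.

\begin{verbatim}
import numpy as np
from itertools import permutations

def shapley_static(n_players, v):
    """Exact Shapley value of one static TU game; v maps frozensets to values."""
    sh = np.zeros(n_players)
    perms = list(permutations(range(n_players)))
    for perm in perms:
        prefix = frozenset()
        for j in perm:
            sh[j] += v(prefix | {j}) - v(prefix)   # marginal contribution of j
            prefix = prefix | {j}
    return sh / len(perms)

def shapley_markovian(char_fns, n_players, sigma):
    """ShM(Gamma_s) via (3.35): sigma(s)-weighted sum of the per-state values."""
    per_state = np.array([shapley_static(n_players, v) for v in char_fns])
    return np.asarray(sigma) @ per_state

# Placeholder simple Markovian game: 2 states, 3 players.
v1 = lambda C: 1.0 if len(C) >= 2 else 0.0               # majority game
v2 = lambda C: 1.0 if 0 in C and len(C) >= 2 else 0.0    # player 0 has veto power
print(shapley_markovian([v1, v2], 3, sigma=[0.6, 0.4]))
\end{verbatim}

The exact evaluation is exponential in $P$, which is precisely what motivates the randomized approaches that follow.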

Theorem 3.3.2 (Hoeffding's inequality). Let $A_1,\dots,A_n$ be $n$ independent random variables, where $A_i\in[a_i,b_i]$ almost surely. Then, for all $\epsilon>0$,
$$\Pr\Biggl( \Bigl| \sum_{i=1}^{n} A_i - \sum_{i=1}^{n} \mathrm{E}[A_i] \Bigr| \ge n\epsilon \Biggr) \ \le\ 2\exp\Biggl( - \frac{2 n^2 \epsilon^2}{\sum_{i=1}^{n}(b_i-a_i)^2} \Biggr).$$
In this work, several results are shown for the case of simple Markovian games. These are Markovian games with transferable utility in which the value of each coalition in each state can only take on binary values, i.e. 0 and 1. Simple games model winning/losing situations, in which winning coalitions have unitary value. The Shapley value applied to simple static games is commonly referred to as the Shapley-Shubik power index (SS). We define SSM as the Shapley value in simple Markovian games. Of course, the relation between SS and SSM is analogous to expression (3.35). We say that player $i$ is critical for coalition $\Lambda\subseteq\mathcal{P}\setminus\{i\}$ in state $s$ if $v^{(s)}(\Lambda\cup\{i\}) - v^{(s)}(\Lambda) = 1$.

3.3.3 Motivations of the Markovian model
Many interaction situations among different individuals are not one-shot, but continue over time. Moreover, the environment in which the interactions take place is dynamic, and this may influence the negotiation power of each individual. Under these assumptions, the value of each coalition varies over time. In economics, clear examples of this situation are the continuing bargaining among countries, among firms, or between management and unions. This pragmatic reasoning has spurred the research on dynamic cooperative games in the last decade (see e.g. [31], [48]). Our Markovian model is a specific instance of a dynamic cooperative game, in which the evolution of the coalition values over time follows an exogenous Markov chain on a finite state space. A concrete example of our model, in which the coalition values are not bound to be binary though, can be found in [12], where a wireless multiple access channel is considered and several users attempt to transmit to a single receiver. The value of a coalition of users is computed as the maximum sum-rate achievable by the coalition when the remaining players threaten to jam the network. The state of the system is represented by the channel coefficients, whose evolution over time follows a Markov chain, a classic assumption in wireless communications. Our Markovian scenario can also be seen as a natural extension to a dynamic context of static situations with some uncertainty in the model. For example, let us consider games with agent failure (see e.g. [60, 70, 71]), in which each player may withdraw from the game with a certain probability. The dynamic version of this game can be modelled via a very simple Markov chain, where the probability of reaching a state in which a certain subset $\Lambda$ of players survives only depends on $\Lambda$ and not on the current state. Interestingly, the approach utilized in [14] to approximate the Shapley value in static games has been adapted to a cooperative game with failures in [15]. In [25], a coalition formation scenario with uncertainty is considered, in which the state of the system accounts for the stochastic outcome of the collaboration among agents. Though our model does not consider coalition formation, a simple Markov chain can still be used to extend the scenario in [25] to a dynamic context, in which the transition probabilities still do not depend on the starting state.
It is also worth clarifying the meaning of the Shapley value ShM on Markovian games, defined as in (3.35). Classically, in static games the Shapley value has a two-fold interpretation: it can be thought of either as a measure of the agents' power or as a binding agreement the agents make regarding the sharing of the revenue earned by the grand coalition. The first interpretation still holds in Markovian games, where $\mathrm{ShM}_j(\Gamma_s)$ is the expected power of agent $j$ in the long-run game $\Gamma_s$. The second interpretation is sensible only when the value of the grand coalition is deterministic; since $v(\mathcal{P},\Gamma_s)$ is an expected revenue, this second interpretation fails to hold in the Markovian game. Nevertheless, we can still view ShM from a revenue sharing perspective. Suppose indeed that the rewards in each state are deterministic. We see from (3.35) that $\mathrm{ShM}_j(\Gamma_s)$ equals the long-run expected payoff of player $j$ if, in each state $s$, the deterministic revenue $h_{\mathrm{S},j}^{(s)}$ is assigned to player $j$.
Therefore, $\{h_{\mathrm{S},j}^{(s)}\}_{s\in S}$ can be seen as the deterministic distribution of $\mathrm{ShM}_j(\Gamma_s)$ along the course of the dynamic game, for any initial state $s\in S$. Moreover, it is straightforward to see that such a distribution procedure is time consistent, i.e. if the state at time $t\ge 0$ is $S_t$, then $\beta^t\,\mathrm{ShM}_j(\Gamma_{S_t})$ is the long-run expected revenue for player $j$ from time $t$ onwards. For a detailed discussion on this topic, in a more complex model in which the transition probabilities depend on the players' actions, we refer to [11].

3.3.4 Complexity of deterministic algorithms
Since the exact computation of the Shapley value - or, equivalently, of the Shapley-Shubik index - involves the calculation of the incremental asset brought by a player to each coalition, its complexity is proportional to the number of such coalitions, i.e. $2^{P-1}$, under oracle access to the characteristic function. In this section we evaluate the complexity of any deterministic algorithm which approximates the Shapley-Shubik index in a simple Markovian game. Before starting our analysis, let us introduce some ancillary concepts. By a game instance we mean a specific Markovian game. In this paper, we implicitly assume that all the algorithms considered - deterministic or randomized - aim at approximating the Shapley value for player $j$, without loss of generality. Let us clarify our notion of "query".

Definition 18. A query of an algorithm - deterministic or randomized - consists in the evaluation of the marginal contribution of player $j$ to a coalition $\Lambda\subseteq\mathcal{P}\setminus\{j\}$, i.e. $v(\Lambda\cup\{j\}) - v(\Lambda)$.

Now we define the accuracy of a deterministic algorithm.

Definition 19. Let us assume that the Shapley-Shubik index for player $j$ in the simple Markovian game $\Gamma_s$ is $\mathrm{SSM}_j(\Gamma_s) = a$. Let ALG be a deterministic algorithm employing $q$ queries. We say that ALG has an accuracy of at least $d>0$ with $q$ queries whenever, for all game instances, ALG always answers with a value $\mathrm{SSM}_j(\Gamma_s)\in[a-d;\,a+d]$.

We will first show that an exponential number of queries is necessary in order to achieve polynomial accuracy for any deterministic algorithm aiming to approximate the Shapley-Shubik index in the static case. This is an extension of Theorem 3 in [14] to the Shapley-Shubik index, and its proof is in Appendix 3.3.10.

Theorem 3.3.3. Any deterministic algorithm computing one component of the Shapley-Shubik index in a simple static game in state $s$ requires $\Omega(2^P/\sqrt{P})$ queries to achieve an accuracy of at least $1/(2P)$, for all $s\in S$.

We remark that Theorem 3.3.3 does not apply to weighted voting games, for which it is possible to exploit the weight/quota structure to achieve a lower complexity (e.g. see [20,45,57,58,74,90]).

Finally, we are ready to derive a trade-off between the accuracy and the complexity of a deterministic algorithm approximating the Shapley-Shubik index in a simple Markovian game, as a function of the number of players P .

Corollary 3.3.4. There exists $c>0$ such that any deterministic algorithm approximating one component of the Shapley-Shubik index in the simple Markovian game $\Gamma_s$ requires $\Omega(2^P/\sqrt{P})$ queries to achieve an accuracy of at least $c/P$, for all $s\in S$.

The results of the current section clearly discourage computing exactly, or even approximating, SSM with a deterministic algorithm when the number of players $P$ is large. Motivated by this, in the next sections we direct our attention towards randomized approaches to construct confidence intervals for SSM, whose complexity does not even depend on $P$.

3.3.5 Randomized static approach
In this section we will propose our first approach to compute a confidence interval for the Shapley value in Markovian games. The expression of the confidence interval that we will propose holds for the Shapley value of any Markovian game (ShM). Nevertheless, in the following sections we will provide some results holding specifically for the Shapley-Shubik index in the particular case of simple Markovian games (SSM). Let us first define our performance evaluator for a randomized algorithm.

Definition 20. Let $1-\delta$ be the probability of confidence. The accuracy of a randomized algorithm is the length of the confidence interval produced by the randomized algorithm to approximate SSM.

The corresponding notion of accuracy for a deterministic algorithm was given in Definition 19. Throughout the paper we suppose that the transition probability matrix $\mathbf{P}$ is known to the estimator agent. In this section we also assume that the values of all coalitions in each single stage game are available off-line to the estimator agent.

Assumption 2. The estimator agent has access to all the coalition values in each state,
$$\bigl\{ v^{(s)}(\Lambda),\ \forall\,\Lambda\subseteq\mathcal{P},\ s\in S \bigr\},$$
at the same time, before the Markovian game starts.

It is clear that, under Assumption 2, the estimator agent can perform an off-line randomized algorithm to approximate ShM.

Remark 11. Assumption 2 seems to be impractical for the intrinsic dynamics of the model we consider. Nevertheless, the randomized approach based on Assumption 2 that we propose next (SCI) will prove to be an insightful performance benchmark for two methods (DCI1 and DCI2) described in Section 3.3.6, based on a more realistic dynamic assumption.

First, let us find a formulation of the Shapley value in the Markovian game which is suitable for our purpose. Let $X$ be the set of all permutations of $\{1,\dots,P\}$. Let $\mathcal{C}_\chi(j)$ be the coalition of all the players whose index precedes $j$ in the permutation $\chi\in X$, i.e.
$$\mathcal{C}_\chi(j) \ \equiv\ \{\, i : \chi(i) < \chi(j) \,\}. \qquad (3.36)$$

We can write the Shapley value of the Markovian game $\Gamma_s$, both for the discount and for the average criterion, as

$$\mathrm{ShM}_j(\Gamma_s) = \sum_{i=1}^{|S|} \sigma_i(s)\, h_{\mathrm{S},j}^{(s_i)} = \frac{1}{P!} \sum_{\chi\in X} \sum_{i=1}^{|S|} \sigma_i(s) \Bigl[ v^{(s_i)}(\mathcal{C}_\chi(j)\cup\{j\}) - v^{(s_i)}(\mathcal{C}_\chi(j)) \Bigr] = \mathrm{E}_\chi\Biggl[ \sum_{i=1}^{|S|} \sigma_i(s) \Bigl( v^{(s_i)}(\mathcal{C}_\chi(j)\cup\{j\}) - v^{(s_i)}(\mathcal{C}_\chi(j)) \Bigr) \Biggr],$$
where $\mathrm{E}_\chi$ is the expectation over all the permutations $\chi\in X$, each having the same probability $1/P!$.

We now propose our first algorithm to compute a confidence interval for $\mathrm{ShM}_j(\Gamma_s)$, for each player $j$ and initial state $s$. For each query, labeled by the index $k = 1,\dots,m$, let us select independently, uniformly at random on $X$, a permutation $\chi_k$ of $\{1,\dots,P\}$. Let us define $Z(j)$ as the random (over $\chi\in X$) variable
$$Z(j) \ \equiv\ \sum_{i=1}^{|S|} \sigma_i(s) \Bigl[ v^{(s_i)}(\mathcal{C}_\chi(j)\cup\{j\}) - v^{(s_i)}(\mathcal{C}_\chi(j)) \Bigr] = v(\mathcal{C}_\chi(j)\cup\{j\},\Gamma_s) - v(\mathcal{C}_\chi(j),\Gamma_s), \qquad (3.37)$$
and let $Z_k(j)$ be the $k$-th realization of $Z(j)$. We remark that each realization of $Z(j)$ entails the computation of $|S|$ queries, one in each state. Thanks to Hoeffding's inequality, we can write that, for all $\epsilon>0$,

$$\Pr\Biggl( \Bigl| \frac{1}{m}\sum_{k=1}^{m} Z_k(j) - \mathrm{ShM}_j(\Gamma_s) \Bigr| \ge \epsilon \Biggr) \ \le\ 2\exp\Biggl( -\frac{2 m \epsilon^2}{[\overline{y}-\underline{y}]^2} \Biggr),$$
where

$$\overline{y} = \max_{\mathcal{C}\subseteq\mathcal{P}} \sum_{i=1}^{|S|} \sigma_i(s) \Bigl[ v^{(s_i)}(\mathcal{C}\cup\{j\}) - v^{(s_i)}(\mathcal{C}) \Bigr], \qquad \underline{y} = \min_{\mathcal{C}\subseteq\mathcal{P}} \sum_{i=1}^{|S|} \sigma_i(s) \Bigl[ v^{(s_i)}(\mathcal{C}\cup\{j\}) - v^{(s_i)}(\mathcal{C}) \Bigr].$$
We remark that, in the case of simple games, $[\overline{y}-\underline{y}]^2 \le \bigl[\sum_{i=1}^{|S|}\sigma_i(s)\bigr]^2$. Now we are ready to propose our first confidence interval, based on Assumption 2.

Static Confidence Interval 1 (SCI). Let $1\le j\le P$, $s\in S$. Fix an integer $m$ and set $\delta\in(0;1)$. Then, with probability of confidence $1-\delta$, $\mathrm{ShM}_j(\Gamma_s)$ belongs to the confidence interval

$$\Biggl[ \frac{1}{m}\sum_{k=1}^{m} Z_k(j) - \epsilon(m,\delta)\ ;\ \frac{1}{m}\sum_{k=1}^{m} Z_k(j) + \epsilon(m,\delta) \Biggr],$$
where
$$\epsilon(m,\delta) = \sqrt{\frac{[\overline{y}-\underline{y}]^2 \log(2/\delta)}{2m}}. \qquad (3.38)$$
In the case of simple games, (3.38) becomes

$$\epsilon(m,\delta) = \sqrt{\frac{\bigl[\sum_{i=1}^{|S|}\sigma_i(s)\bigr]^2 \log(2/\delta)}{2m}}. \qquad (3.39)$$
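A sketch of the SCI construction for a simple Markovian game under Assumption 2 follows: $m$ permutations are drawn uniformly, $Z_k(j)$ is formed as in (3.37), and the half-width is taken from (3.39) (for a general game the width $\sum_i\sigma_i(s)$ would be replaced by $\overline{y}-\underline{y}$ as in (3.38)). The characteristic functions and parameters are hypothetical.

\begin{verbatim}
import numpy as np

def sci_interval(char_fns, sigma, n_players, j, m, delta, seed=0):
    """Static confidence interval (SCI) for ShM_j(Gamma_s) in a simple
    Markovian game; char_fns[i] is v^{(s_i)} on frozensets, sigma[i] = sigma_i(s)."""
    rng = np.random.default_rng(seed)
    z = np.empty(m)
    for k in range(m):
        perm = rng.permutation(n_players)
        pos = int(np.where(perm == j)[0][0])
        C = frozenset(perm[:pos].tolist())            # C_chi(j): players before j
        z[k] = sum(sig * (v(C | {j}) - v(C))          # Z_k(j), eq. (3.37)
                   for sig, v in zip(sigma, char_fns))
    eps = sum(sigma) * np.sqrt(np.log(2.0 / delta) / (2.0 * m))   # eq. (3.39)
    return z.mean() - eps, z.mean() + eps

# Placeholder: 2 states, 4 players, majority-style games, estimate player 0.
v1 = lambda C: 1.0 if len(C) >= 3 else 0.0
v2 = lambda C: 1.0 if len(C) >= 2 else 0.0
print(sci_interval([v1, v2], sigma=[0.5, 0.5], n_players=4, j=0, m=2000, delta=0.05))
\end{verbatim}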

Under the average criterion, (3.39) can be written as $\epsilon(m,\delta) = \sqrt{\log(2/\delta)/[2m]}$.
Not surprisingly, the confidence interval SCI is analogous to the one found in [14] for static games. Indeed, the intrinsic dynamics of the game is bypassed by Assumption 2, under which the estimator has global knowledge of all the coalition values, even before the Markov process initiates. Therefore, from the estimator agent's point of view, there exists no conceptual difference between the approach in [14] and SCI, except for the complexity, which increases by a factor $|S|$ in the dynamic game.

3.3.6 Randomized dynamic approaches
In this section we will propose two methods to compute a confidence interval for SSM, for which Assumption 2 on global knowledge of the coalition values is no longer necessary. Indeed, the reader will notice that their conception naturally arises from the assumption that the estimator agent learns the coalition values in each single stage game while the Markov chain process unfolds, as formalized below.

Assumption 3. The state in which the estimator agent finds itself at each time step follows the same Markov chain process as the Markovian game itself. The estimator agent has local knowledge of the game being played, i.e. at step $t\ge 0$ the estimator agent has access only to the coalition values associated to the static game in the current state $S_t$.

Remark 12. The approaches described in this section can also be employed under Assumption 2. Indeed, any algorithm requiring queries on coalition values separately in each state can also be run under the static assumption.

In the following we still assume that the transition probability matrix P is known by the estimator agent. As in Section 3.3.5, the randomized approaches that we are going to introduce hold for the Shapley value of any Markovian game.

First dynamic approach
We propose our first randomized approach to compute a confidence interval for ShM, holding both under the static Assumption 2 and under the dynamic Assumption 3. Let $\chi\in X$ be, as in Section 3.3.5, a random permutation uniformly distributed on the set of permutations of $\{1,\dots,P\}$. Let us define $Y^{(s_i)}(j)$ as the random (over $\chi\in X$) variable associated to state $s_i$:
$$Y^{(s_i)}(j) \ \equiv\ v^{(s_i)}(\mathcal{C}_\chi(j)\cup\{j\}) - v^{(s_i)}(\mathcal{C}_\chi(j)). \qquad (3.40)$$
Our dynamic approach suggests sampling the r.v. $Y^{(s_i)}(j)$ $n_i$ times in state $s_i$. Let $n = \sum_{i=1}^{|S|} n_i$ be the total number of queries. We can still exploit

Hoeffding's inequality to say that, for all $\epsilon' > 0$,

$$\Pr\Biggl( \Bigl| \sum_{i=1}^{|S|} \frac{\sigma_i(s)}{n_i} \sum_{t=1}^{n_i} Y_t^{(s_i)}(j) - \mathrm{ShM}_j(\Gamma_s) \Bigr| \ge n\epsilon' \Biggr) \ \le\ 2\exp\Biggl( - \frac{2 [n\epsilon']^2}{\sum_{i=1}^{|S|} \sigma_i^2(s)\,[\overline{x}(i)-\underline{x}(i)]^2 / n_i} \Biggr),$$
where, for all $i = 1,\dots,|S|$,
$$\overline{x}(i) = \max_{\mathcal{C}\subseteq\mathcal{P}}\ v^{(s_i)}(\mathcal{C}\cup\{j\}) - v^{(s_i)}(\mathcal{C}), \qquad \underline{x}(i) = \min_{\mathcal{C}\subseteq\mathcal{P}}\ v^{(s_i)}(\mathcal{C}\cup\{j\}) - v^{(s_i)}(\mathcal{C}).$$
We notice that, in the case of simple games, $\overline{x}(i) = 1$ and $\underline{x}(i) = 0$ for all $i = 1,\dots,|S|$. Now set $\widetilde{\epsilon} = n\epsilon'$. We are ready to propose our second confidence interval for $\mathrm{ShM}_j(\Gamma_s)$, and the first one holding under Assumption 3.

Dynamic Confidence Interval 1 (DCI1). Let $1\le j\le P$, $s\in S$. Fix the number of queries $n$ and set $\delta\in(0;1)$. Then, with probability of confidence $1-\delta$, $\mathrm{ShM}_j(\Gamma_s)$ belongs to the confidence interval

$$\Biggl[ \sum_{i=1}^{|S|} \frac{\sigma_i(s)}{n_i} \sum_{t=1}^{n_i} Y_t^{(s_i)}(j) - \widetilde{\epsilon}(n,\delta)\ ;\ \sum_{i=1}^{|S|} \frac{\sigma_i(s)}{n_i} \sum_{t=1}^{n_i} Y_t^{(s_i)}(j) + \widetilde{\epsilon}(n,\delta) \Biggr],$$
where
$$\widetilde{\epsilon}(n,\delta) = \sqrt{\frac{\log(2/\delta)}{2} \sum_{i=1}^{|S|} \frac{\sigma_i^2(s)}{n_i}\,[\overline{x}(i)-\underline{x}(i)]^2 }. \qquad (3.41)$$
In the case of simple games, (3.41) becomes

$$\widetilde{\epsilon}(n,\delta) = \sqrt{\frac{\log(2/\delta)}{2} \sum_{i=1}^{|S|} \frac{\sigma_i^2(s)}{n_i} }. \qquad (3.42)$$

Optimal sampling strategy
In this section we focus exclusively on simple Markovian games. It is interesting to investigate the optimal number of times $n_i^*$ that the variable $Y^{(s_i)}(j)$ should be sampled in each state $s_i$, in order to minimize the length of the confidence interval DCI1 while keeping the confidence probability fixed. We notice that, for fixed $1-\delta$, we can find the optimal values of $n_1,\dots,n_{|S|}$ by setting up the following integer programming problem:

$$\min_{n_1,\dots,n_{|S|}} \ \sum_{i=1}^{|S|} \frac{\sigma_i^2(s)\,[\overline{x}(i)-\underline{x}(i)]^2}{n_i} \qquad \text{s.t.} \ \sum_{i=1}^{|S|} n_i = n, \quad n_i\in\mathbb{N}. \qquad (3.43)$$
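A sketch of DCI1 for simple Markovian games follows, together with a simple heuristic for (3.43): relaxing the integrality constraint and using Lagrange multipliers suggests taking $n_i$ proportional to $\sigma_i(s)[\overline{x}(i)-\underline{x}(i)]$, hence proportional to $\sigma_i(s)$ for simple games, which is consistent with the proportional-to-visits sampling discussed next. The heuristic and the data below are illustrative, not part of the original text.

\begin{verbatim}
import numpy as np

def dci1_interval(samples, sigma, delta):
    """DCI1 (simple Markovian game): combine the per-state sample means of
    Y^{(s_i)}(j) with weights sigma_i(s); half-width from eq. (3.42)."""
    est = sum(s * np.mean(y) for s, y in zip(sigma, samples))
    eps = np.sqrt(0.5 * np.log(2.0 / delta) *
                  sum(s ** 2 / len(y) for s, y in zip(sigma, samples)))
    return est - eps, est + eps

def split_queries(sigma, n):
    """Heuristic for (3.43) in simple games: n_i proportional to sigma_i(s),
    rounded so the counts sum to n (largest fractional parts first); every
    state keeps at least one query (which may slightly exceed n)."""
    w = np.asarray(sigma, dtype=float)
    raw = n * w / w.sum()
    n_i = np.floor(raw).astype(int)
    for idx in np.argsort(raw - n_i)[::-1][: n - n_i.sum()]:
        n_i[idx] += 1
    return np.maximum(n_i, 1)

# Placeholder: 2 states, samples of Y in {0, 1}, sigma under the average criterion.
counts = split_queries([0.7, 0.3], n=100)
samples = [np.random.default_rng(1).integers(0, 2, c) for c in counts]
print(counts, dci1_interval(samples, sigma=[0.7, 0.3], delta=0.05))
\end{verbatim}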

Remark 13. If the static Assumption 2 holds, then the computation of the optimal values $n_1^*,\dots,n_{|S|}^*$ in (3.43) is the only information we need to maximize the accuracy of DCI1, since the sampling is done off-line. Otherwise, if Assumption 3 holds, the estimator does not know in advance the succession of states hit by the process, hence it is crucial to plan a sampling strategy of the variable $Y^{(s_i)}(j)$ along the Markov chain. Of course, a possible strategy would be, once $n$ is fixed, to sample the variable $Y^{(s_i)}(j)$ $n_i^*$ times only the first time the state $s_i$ is hit, until all the states have been hit. Nevertheless, this approach is clearly not efficient, since in several time steps the estimator is forced to remain idle.

Motivated by Remark 13, we now devise an efficient and straightforward sampling strategy, consisting in sampling $Y^{(s_i)}(j)$, each time the state $s_i$ is hit, an equal number of times over all $i = 1,\dots,|S|$. Let us first recall a useful classical result for Markov chains. Let $\eta$ be the number of steps performed by the Markov chain. Let $\eta_i$ be the number of visits to state $s_i$, i.e.
$$\eta_i = \sum_{t=0}^{\eta-1} 1\!\mathrm{I}(S_t = s_i).$$

Theorem 3.3.5 ([8]). Let $\{S_t,\ t\ge 1\}$ be an ergodic Markov chain. Let $\hat{\pi}_i(\eta)\equiv\eta_i/\eta$. Then, for any distribution on the initial state and for all $i = 1,\dots,|S|$,
$$\hat{\pi}_i(\eta) \ \xrightarrow{\ \eta\uparrow\infty\ }\ \pi_i \qquad \text{with probability } 1,$$
where $\pi$ is the stationary distribution of the Markov chain.

It is evident from (3.41) that $\widetilde{\epsilon}(n,\delta)\in\Theta(n^{-1/2})$. We now show under which conditions the straightforward sampling strategy described above achieves asymptotically, for $n\uparrow\infty$, the best rate of convergence of $\widetilde{\epsilon}(n,\delta)$, for fixed $\delta$. The reader can find the proof of the next Theorem in Appendix 3.3.10.

Theorem 3.3.6. Suppose that Assumption 3 holds. Let the Markov chain of the simple Markovian game be ergodic. Fix the confidence probability $1-\delta$. Under the average criterion, if each time the state $s_i$ is hit the estimator agent samples the r.v. $Y^{(s_i)}(j)$ a constant number of times not depending on $i$ (e.g. 1), then with probability 1:

$$\sqrt{n}\ \widetilde{\epsilon}(n,\delta)\ \xrightarrow{\ n\uparrow\infty\ }\ \inf_{n\in\mathbb{N}}\ \min_{n_1,\dots,n_{|S|}:\ \sum_i n_i = n} \sqrt{n}\ \widetilde{\epsilon}(n,\delta) = \sqrt{\frac{\log(2/\delta)}{2}}.$$

Second dynamic approach
Since Hoeffding's inequality has very general applicability and does not refer to any particular probability distribution of the random variables at issue, it is natural to look for confidence intervals especially suited to particular instances of games. In this section we show a third confidence interval for the Shapley value of the Markovian game $\Gamma_s$, which is tighter (i) the higher the confidence probability $1-\delta$ is and (ii) the tighter the per-state confidence intervals $[l_i; r_i]$ are. As an example, in Section 3.3.6 we show a tight confidence interval for simple Markovian games.

We still assume that the estimator agent samples the r.v. $Y^{(s_i)}(j)$ $n_i$ times in each state $s_i$. Here we suppose that it is known beforehand that $h_j^{(s_i)}$ lies in the confidence interval $[l_i; r_i]$ with probability at least $1-\delta_i$. In general, the extrema $l_i$ and $r_i$ may depend on $n_i$, $\sum_{t=1}^{n_i} Y_t^{(s_i)}(j)$, and $\delta_i$. As in the case of DCI1, the randomized approach proposed in this section also holds both under the static Assumption 2 and under the dynamic Assumption 3. It is based on the following Lemma, whose proof is in Appendix 3.3.10.

Lemma 3.3.7. Let $A_1,\dots,A_k$ be $k$ random variables such that $\Pr(A_i \in [l_i; r_i]) \ge 1-\delta_i$. Let $c_i \ge 0$, for $i = 1,\dots,k$. Then,
\[
\Pr\!\left( \sum_{i=1}^{k} c_i A_i \in \left[\;\sum_{i=1}^{k} c_i l_i\;;\;\sum_{i=1}^{k} c_i r_i\;\right] \right) \ \ge\ \prod_{i=1}^{k} [1-\delta_i].
\]

The reader should keep in mind that the smaller the single confidence levels $\delta_1,\dots,\delta_k$ are, the tighter the lower bound $\prod_{i=1}^{k}(1-\delta_i)$ on the confidence probability is. Now we are ready to present our second dynamic approach. Let the r.v. $Y^{(s_i)}(j)$ be defined as in (3.40).

Dynamic Confidence Interval 2 (DCI2). Set $\delta_i \in (0;1)$, for all $i = 1,\dots,|S|$. Let
\[
\left[\; l^{(s_i)}\!\left(n_i, \sum_{t=1}^{n_i} Y_t^{(s_i)}(j), \delta_i\right) \;;\; r^{(s_i)}\!\left(n_i, \sum_{t=1}^{n_i} Y_t^{(s_i)}(j), \delta_i\right) \right] \qquad (3.44)
\]
be the confidence interval for $h_j^{(s_i)}$, with confidence probability $1-\delta_i$, for all $i = 1,\dots,|S|$. Let $1 \le j \le P$, $s \in S$. Then, with confidence probability $\prod_{i=1}^{|S|}(1-\delta_i)$, $\mathrm{Sh}^M_j(\Gamma_s)$ belongs to the confidence interval

\[
\left[\;\sum_{i=1}^{|S|} \sigma_i(s)\, l^{(s_i)}\!\left(n_i, \sum_{t=1}^{n_i} Y_t^{(s_i)}(j), \delta_i\right) \;;\;\sum_{i=1}^{|S|} \sigma_i(s)\, r^{(s_i)}\!\left(n_i, \sum_{t=1}^{n_i} Y_t^{(s_i)}(j), \delta_i\right) \right].
\]

We notice that the confidence interval DCI2 reveals the most natural connection between the issue of computing confidence intervals for the Shapley value in static games, already addressed in [14], and in Markovian games under the dynamic Assumption 3.
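As a minimal illustration of how DCI2 is assembled, the sketch below combines per-state intervals $[l_i; r_i]$ (obtained by any of the binomial methods discussed in the next subsection) with the weights $\sigma_i(s)$, following Lemma 3.3.7; all numbers in the example are arbitrary.

```python
import numpy as np

def dci2_combine(sigma, lowers, uppers, deltas):
    """Combine per-state confidence intervals [l_i, r_i] (each holding with
    probability >= 1 - delta_i) into the DCI2 interval of the long-run index,
    following Lemma 3.3.7 with weights c_i = sigma_i(s)."""
    sigma = np.asarray(sigma, dtype=float)
    lo = float(np.dot(sigma, lowers))
    hi = float(np.dot(sigma, uppers))
    confidence = float(np.prod(1.0 - np.asarray(deltas, dtype=float)))
    return (lo, hi), confidence

# Example with arbitrary per-state intervals and confidence levels.
interval, conf = dci2_combine(sigma=[0.5, 0.3, 0.2],
                              lowers=[0.10, 0.40, 0.00],
                              uppers=[0.30, 0.70, 0.20],
                              deltas=[0.02, 0.02, 0.02])
print(interval, conf)
```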

We already saw in Section 3.3.6 that the accuracy of DCI1 can be maximized by adjusting the number of queries $n_1,\dots,n_{|S|}$ in each state. Here, in addition, we could optimize DCI2 also over the set of confidence levels $\delta_1,\dots,\delta_{|S|}$, under the nonlinear constraint:

\[
\prod_{i=1}^{|S|} [1-\delta_i] = 1-\delta.
\]

Simple Markovian games

The aim of this section is twofold. Firstly, we suggest methods to compute a confidence interval for the Shapley-Shubik index in simple static games, as a complement to the study in [14]. Secondly, we stress that such methods can be utilized to compute efficiently the confidence interval DCI2 for SSM, as is clear from the definition of DCI2 itself. In [14], the authors derived a confidence interval for the Shapley value of a single stage game, based on Hoeffding's inequality. Nevertheless, for simple static games, a tighter confidence interval can be obtained by applying the following approach. Let $\chi \in X$ be a random permutation of $\{1,\dots,P\}$. Let us assume that $\{\chi_k \in X\}$, $k \ge 1$, are uniform and independent. Let us define the Bernoulli variable $Y^{(s)}(j)$ as in (3.40). As pointed out in [14], we can interpret the Shapley-Shubik index $SS_j^{(s)}$ as

\[
SS_j^{(s)} = \Pr\!\left( Y^{(s)}(j) = 1 \right).
\]
Let $Y_1^{(s)}(j),\dots,Y_n^{(s)}(j)$ be independent realizations of $Y^{(s)}(j)$. It is evident that
\[
\sum_{k=1}^{n} Y_k^{(s)}(j) \ \sim\ \mathcal{B}\!\left(n, SS_j^{(s)}\right),
\]
where $\mathcal{B}(a,b)$ is the binomial distribution with parameters $a, b$. Hence, computing a confidence interval for $SS_j^{(s)}$ boils down to the computation of a confidence interval for the probability of success of the Bernoulli variable $Y^{(s)}(j)$ given the proportion of successes $\sum_{k=1}^{n} Y_k^{(s)}(j)/n$, which is a well known problem in the literature. Of course, this might be accomplished by using the general Hoeffding's inequality as in [14], but over the last decades some more efficient methods have been proposed, like the Chernoff bound [26], the Wilson score interval [101], the Wald interval [97], the adjusted Wald interval [1], and the "exact" Clopper-Pearson interval [27].
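For instance, the Clopper-Pearson interval for the success probability of a Bernoulli variable can be obtained from Beta quantiles; the sketch below uses SciPy with arbitrary counts, and its output can serve as the per-state interval $[l_i; r_i]$ entering DCI2.

```python
from scipy.stats import beta

def clopper_pearson(successes, n, delta):
    """Exact (Clopper-Pearson) two-sided confidence interval, at level
    1 - delta, for a Bernoulli success probability, given `successes`
    out of `n` independent trials."""
    if successes == 0:
        lower = 0.0
    else:
        lower = beta.ppf(delta / 2.0, successes, n - successes + 1)
    if successes == n:
        upper = 1.0
    else:
        upper = beta.ppf(1.0 - delta / 2.0, successes + 1, n - successes)
    return lower, upper

# Example: player j was critical in 130 out of 1000 sampled permutations.
print(clopper_pearson(130, 1000, delta=0.05))
```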

3.3.7 Comparison among the proposed approaches

In this section we focus on simple Markovian games, and we compare the accuracy of the proposed randomized approaches. We know that, under the static Assumption 2, we are allowed to use any of the three methods presented in this article, SCI, DCI1, and DCI2, to compute a confidence interval for the Shapley-Shubik index in simple Markovian games. In fact, DCI1 and DCI2 involve independent queries over the different states, and this can also be done under Assumption 2. Therefore, it makes sense to compare the tightness of the two confidence intervals SCI and DCI1.

Lemma 3.3.8. Consider simple Markovian games. Let $2\epsilon(n,\delta)$ be the accuracy of SCI (see eq. (3.39)). Let $2\tilde\epsilon(n,\delta)$ be the accuracy of DCI1 (see eq. (3.42)). Then, for any integer $n$ and for any confidence probability $1-\delta$,
\[
\epsilon(n,\delta) \le \tilde\epsilon(n,\delta).
\]
The interested reader can find the proof of Lemma 3.3.8 in Appendix 3.3.10.

Remark 14. The reader should not be misled by the result in Lemma 3.3.8. In fact, $n$ being equal in the two cases, the number of queries needed for the confidence interval SCI is $|S|$ times bigger than for DCI1, since each sampling of the variable $Z(j)$, defined in (3.37), requires $|S|$ queries, one per state. The comparison between the two confidence intervals would be fair only if the estimator agent knew beforehand the coalition values of the long-run game $\{v(\Lambda,\Gamma_s)\}_{s,\Lambda}$.

According to Remark 14, we should compare the length of the confidence interval for the static case, $2\epsilon(n,\delta)$, with the one for the dynamic case, $2\tilde\epsilon(|S|n,\delta)$, calculated with $|S|$ times as many queries. Intriguingly, the relation between the tightness of SCI and DCI1 is now, for a suitable query strategy, reversed, as we show next.

Theorem 3.3.9. In the case of simple Markovian games, for any integer $n$,
\[
\min_{\substack{n'_1,\dots,n'_{|S|}:\\ \sum_i n'_i = |S| n}} \tilde\epsilon(|S| n, \delta) \ \le\ \epsilon(n,\delta).
\]

Proof. We can write

\[
\min_{\substack{n'_1,\dots,n'_{|S|}:\\ \sum_i n'_i = |S| n}} \sum_{i=1}^{|S|} \frac{\sigma_i^2(s)}{n'_i} \ \le\ \sum_{i=1}^{|S|} \frac{\sigma_i^2(s)}{\sum_{k=1}^{|S|} n'_k/|S|} \ =\ \sum_{i=1}^{|S|} \frac{\sigma_i^2(s)}{n} \ \le\ \frac{\left[\sum_{i=1}^{|S|}\sigma_i(s)\right]^2}{n}, \qquad (3.45)
\]
where the last inequality holds since $\sigma_i(s) \ge 0$. Hence, by inspection of the expressions (3.39) and (3.42), the thesis is proved.

Theorem 3.3.9 clarifies the relation between the confidence intervals SCI and DCI1, under the condition of simple Markovian games. We highlight its significance in the next two remarks.

Remark 15. Theorem 3.3.9 claims that the approach DCI1 is more accurate than SCI for a suitable choice of $n'_1,\dots,n'_{|S|}$, when the number of queries is equal for the two methods. In essence, this occurs because the dynamic approach allows us to tune the number of queries on the coalition values according to the weight $\sigma_i(s)$ of each state $s_i$ in the long-run game. Moreover, the queries on coalition values are independent among the states, hence providing more diversity to the statistics.

Remark 16. As we already remarked, the dynamic Assumption 3 is more pragmatic and less restrictive than the static Assumption 2. Let us now give some insights on the accuracy that can be achieved by the approaches SCI and DCI1 under Assumptions 2 and 3. The approach DCI1 can also be utilized under the static Assumption 2, and in finite time DCI1 is more accurate under Assumption 2 than under Assumption 3. Indeed, for a fixed $n$ and under the static Assumption 2, the values of $n'_1,\dots,n'_{|S|}$ in (3.45) can always be set to the optimum, since the algorithm DCI1 is run off-line. Instead, under the dynamic Assumption 3, the sequence of states over time $S_0, S_1, S_2,\dots$ is unknown a priori by the estimator agent, hence $n'_1,\dots,n'_{|S|}$ cannot be optimized for a finite $n$. Hence, in finite time, the static Assumption 2 still has an edge over the dynamic Assumption 3 for the implementation of DCI1. Nevertheless, we know from Theorem 3.3.6 that, for the average criterion in ergodic Markov chains, there exists a query strategy achieving an optimum rate of convergence for DCI1's accuracy. Therefore we can conclude with the following consideration. Under the average criterion, DCI1, when employed under the dynamic Assumption 3, can be asymptotically as accurate as DCI1 itself and more accurate than SCI, when both these approaches are employed under the stronger static Assumption 2.

In addition to what has just been discussed, simulations showed that, when the number of queries $n$ and the confidence level $\delta$ are equal for the two methods, the effective confidence probability for SCI is generally higher than for DCI1, i.e. the lower bound $1-\delta$ is loose. We explain this by recalling that the centers of the confidence intervals SCI and DCI1, respectively
\[
\frac{1}{m}\sum_{k=1}^{m} Z_k(j), \qquad \sum_{i=1}^{|S|} \sigma_i(s) \sum_{t=1}^{n_i} \frac{Y_t^{(s_i)}(j)}{n_i},
\]
are already two estimators of $SSM_j(\Gamma_s)$, and the former possesses a smaller variance than the latter.

About the performance of the confidence interval DCI2, the simulations confirmed our intuitions. We utilized the Clopper-Pearson interval to compute a confidence interval for the Shapley-Shubik index in simple static games, and we saw that the tightness of DCI2 increases when the confidence probability approaches 1. Let $a_{2\succ 1}$ be the percentage of simple Markovian game instances, generated randomly, in which the confidence interval DCI2 is narrower than the confidence interval DCI1. In Table 3.2 we show, for each value of the confidence probability $1-\delta$, the values of $a_{2\succ 1}$ obtained from simulations. We see that, for $1-\delta < 0.8$, the two confidence intervals have comparable length. For $1-\delta \ge 0.8$, the confidence interval DCI2 is apparently tighter than DCI1 under these settings.

1 - delta    a_{2>1} (%)
0.97         100
0.95         99.9
0.90         87.5
0.80         57.7

Table 3.2: Percentage $a_{2\succ 1}$ of cases in which the confidence interval DCI2 is narrower than the confidence interval DCI1, at different confidence probabilities. The Clopper-Pearson interval is considered for DCI2.
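For intuition, the following sketch compares, on a single state and with arbitrary counts, the width of the Hoeffding interval that enters DCI1 with the Clopper-Pearson width that can enter DCI2; it is only meant to reproduce qualitatively the kind of comparison behind Table 3.2, not the exact simulation setup of the thesis.

```python
import numpy as np
from scipy.stats import beta

def hoeffding_width(n, delta):
    """Length of the Hoeffding interval for the mean of n i.i.d. [0, 1] samples."""
    return 2.0 * np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def clopper_pearson_width(successes, n, delta):
    lo = 0.0 if successes == 0 else beta.ppf(delta / 2, successes, n - successes + 1)
    hi = 1.0 if successes == n else beta.ppf(1 - delta / 2, successes + 1, n - successes)
    return hi - lo

# 1000 permutation samples in one state, player j critical 10% of the time.
for delta in (0.03, 0.05, 0.1, 0.2):
    print(1 - delta,
          round(hoeffding_width(1000, delta), 4),
          round(clopper_pearson_width(100, 1000, delta), 4))
```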

3.3.8 Complexity of confidence intervals

In Section 3.3.4 we motivated the importance of devising an algorithm that approximates SSM with a polynomial accuracy in the number of players $P$ without the need of an exponential number of queries. In this section we show that the proposed randomized approaches SCI and DCI1 fulfill this requirement, since they only require a polynomial number of queries to reach an accuracy which is polynomial in $P$. Interestingly, the number of queries required by SCI and DCI1 does not even depend on the number of players $P$.

Proposition 3.3.10. Fix the confidence level $\delta$ and the length of the confidence interval $2\epsilon$. Then $n$ queries are required to compute the confidence interval SCI, where
\[
n = \left\lceil \frac{(\overline{y}-\underline{y})^2 \log(2/\delta)}{2\,\epsilon^2} \right\rceil.
\]
Proof. The proof follows straightforwardly from the expression of the confidence interval SCI.

Proposition 3.3.11. Fix the confidence level $\delta$ and the length of the confidence interval $2\epsilon$. Then, there exist values of $n_1,\dots,n_{|S|}$, with $\sum_i n_i = n$, such that $n$ queries are required to compute the confidence interval DCI1, where
\[
n \le \left\lceil \frac{|S|\,(\overline{y}-\underline{y})^2 \log(2/\delta)}{2\,\epsilon^2} \right\rceil.
\]
Proof. The proof follows straightforwardly from Theorem 3.3.9.

From Propositions 3.3.10 and 3.3.11 we derive the following fundamental result on the complexity of SCI and DCI1.

Theorem 3.3.12. Let $p(P)$ be a polynomial in the variable $P$. The number of queries required to achieve an accuracy of $1/p(P)$ is $O(p^2(P))$, for both the confidence intervals SCI and DCI1.

Since we did not provide an explicit expression for the confidence interval DCI2, we cannot provide a result analogous to Theorem 3.3.12 for DCI2. Anyway, we notice that the expression (3.44) of the confidence interval DCI2 does not depend on the number of players $P$. Moreover, if Hoeffding's inequality is used to compute the confidence interval for the Shapley value in the static games, then a result similar to Theorem 3.3.12 can be derived for DCI2.

Remark 17. Corollary 3.3.4 and Theorem 3.3.12 explain in what sense the proposed randomized approaches SCI and DCI1 are better than any deterministic approach, according to our Definitions 19 and 20 of "accuracy". E.g., in order to achieve an accuracy in the order of $P^{-1}$, for a number of players $P$ sufficiently high, the number of queries needed by SCI and DCI1 is always smaller than the number of queries employed by any deterministic algorithm.
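As a rough numerical reading of Remark 17 (using the query count reconstructed in Proposition 3.3.10 and treating the target accuracy $1/(2P)$ as the interval half-width, which is our own convention here), one can compare the randomized budget with the deterministic lower bound of Theorem 3.3.3:

```python
import math

def sci_queries(y_range, eps, delta):
    """Queries used by SCI for half-width eps (cf. Proposition 3.3.10)."""
    return math.ceil(y_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))

def deterministic_lower_bound(P):
    """Order of the lower bound of Theorem 3.3.3: Omega(2^P / sqrt(P))."""
    return 2 ** P / math.sqrt(P)

for P in (10, 20, 40, 80):
    eps = 1.0 / (2 * P)          # target accuracy 1/(2P)
    print(P, sci_queries(1.0, eps, 0.05), round(deterministic_lower_bound(P)))
```

With these numbers the randomized budget becomes smaller than the deterministic bound already around twenty players, and the gap then grows very quickly.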

3.3.9 Conclusions

In Section 3.3.4 we proved that an exponential number of queries is necessary for any deterministic algorithm even to approximate SSM with polynomial accuracy. Hence, we directed our attention to randomized algorithms and we proposed three different methods to compute a confidence interval for SSM. The first one, described in Section 3.3.5 and called SCI, assumes that the coalition values in each state are available off-line to the estimator agent. SCI can be seen as a benchmark for the performance of the other two methods, DCI1 and DCI2, both described in Section 3.3.6. The last two methods can be utilized also if we pragmatically assume that the estimator learns the coalition values in each static game while the Markov chain process unfolds. DCI2 reveals the most natural connection between confidence intervals for the Shapley value in static games, presented in [14], and in Markovian games. As a by-product of the study of DCI2, we provided confidence intervals for the Shapley-Shubik index in static games which are tighter than the one proposed in [14]. In Section 3.3.6 we proposed a straightforward way to optimize the tightness of DCI1. In Section 3.3.7 we compared the three proposed approaches in terms of tightness of the confidence interval. We proved that DCI1 is tighter than SCI, with an equal number of queries and for a suitable choice of the number of queries on coalition values in each state. This occurs essentially because DCI1 allows us to tune the number of samples according to the weight of the state. Hence we showed that, asymptotically, the dynamic Assumption 3 is not restrictive with respect to the much stronger static Assumption 2, under the average criterion and for what concerns SCI and DCI1. The simulations confirmed that DCI2 is more accurate than SCI and DCI1 when both the confidence probability is close to 1 and a tight confidence interval for the Shapley-Shubik index of static games is available, like the Clopper-Pearson interval. Finally, in Section 3.3.8 we showed that a polynomial number of queries is sufficient to achieve a polynomial accuracy for the proposed algorithms. Hence, in order to compute SSM, the proposed randomized approaches are more accurate than any deterministic approach for a number of players sufficiently high. The three proposed randomized approaches can be utilized to compute confidence intervals for the Shapley value in any cooperative Markovian game, too. In Table 3.3 we summarize the features of the three proposed confidence intervals, SCI, DCI1, and DCI2.

3.3.10 Appendix

Proof of Theorem 3.3.3

Proof. We will prove that there exists a class $\mathcal{F}$ of game instances for which any deterministic algorithm computing $SS_j^{(s)}$ with accuracy of at least $1/(2P)$ must utilize $\Omega(2^P/\sqrt{P})$ queries. Similarly to [14], let us construct $\mathcal{F}$ when $P$ is odd. Let $\Lambda \subseteq \mathcal{P}\setminus\{j\}$. There exists a set $D_o$ of $\frac{1}{2}\binom{P-1}{[P-1]/2}$ coalitions of cardinality $[P-1]/2$ such that player $j$ is critical only for $D_o$. In particular, for $|\Lambda| \le [P-1]/2$, $v^{(s)}(\Lambda) = 0$; if $|\Lambda| = [P-1]/2$, then, if $\Lambda \in D_o$, $v^{(s)}(\Lambda \cup \{j\}) = 1$, otherwise $v^{(s)}(\Lambda \cup \{j\}) = 0$. The values of the remaining coalitions are 1 if and only if they contain a winning coalition among the ones constructed so far. The Shapley value for player $j$ is thus:

\[
SS_j^{(s)} = \frac{([P-1]/2)!\,([P-1]/2)!}{2\,P!}\binom{P-1}{[P-1]/2} = \frac{1}{2P}.
\]
Hence, for any deterministic algorithm $\mathrm{ALG}_o$ employing a number of queries smaller than $\mu_o(P)$, where
\[
\mu_o(P) = \frac{1}{2}\binom{P-1}{[P-1]/2},
\]
there always exists an instance belonging to $\mathcal{F}$ for which $\mathrm{ALG}_o$ would answer $SS_j^{(s)} = 0$. By Stirling's approximation, we can say that $\mu_o(P) \in \Omega(2^P/\sqrt{P})$.
Let us now construct the class of instances $\mathcal{F}$ when $P$ is even and $P > 2$. Let $D_e$ be a set of $\binom{P-2}{[P-2]/2}$ coalitions of cardinality $[P-2]/2$, belonging to $\mathcal{C}\setminus\{j\}$, such that player $j$ is critical only for $D_e$. Then,
\[
SS_j^{(s)} = \frac{(P/2-1)!\,(P/2)!}{P!}\binom{P-2}{[P-2]/2} = \frac{1}{2[P-1]} > \frac{1}{2P}.
\]

Similarly to before, for any deterministic algorithm $\mathrm{ALG}_e$ using a number of queries smaller than

\[
\mu_e(P) = \binom{P-1}{[P-2]/2} - \binom{P-2}{[P-2]/2} = \frac{P-2}{P}\binom{P-2}{[P-2]/2},
\]
there always exists an instance belonging to $\mathcal{F}$ for which $\mathrm{ALG}_e$ would answer $SS_j^{(s)} = 0$. By Stirling's approximation, we can say that $\mu_e(P) \in \Omega(2^P/\sqrt{P})$. Hence, a number of samples $\mu \in \Omega(2^P/\sqrt{P})$ is needed to achieve an accuracy of at least $1/(2P)$. Hence, the thesis is proved.

Proof of Corollary 3.3.4

Proof. Any deterministic algorithm employs a certain number of queries in each state $s_i$ in order to compute $SSM_j(\Gamma_s) = \sum_{i=1}^{|S|} \sigma_i(s)\,SS_j^{(s_i)}$. Let $I_0$ be a game instance in which player $j$ is a dummy player in all the single stage games $\{v^{(s)}\}_{s\in S}$, i.e. $SS_j^{(s)} = 0$ for all $s \in S$. Let $I_1$ be a game instance such that $SS_j^{(s)} = 0$ for all $s$ except for $s_k$, for which $\sigma_k(s) \ne 0$, and such that the game $\Psi(s_k)$ belongs to the class $\mathcal{F}$ of instances described in the proof of Theorem 3.3.3. Therefore,

\[
SSM_j(\Gamma_s) = \frac{\sigma_k(s)}{2P}
\]
in the case that $P$ is odd and

\[
SSM_j(\Gamma_s) = \frac{\sigma_k(s)}{2[P-1]}
\]
if $P$ is even. Hence, any deterministic algorithm needs $\Omega(2^P/\sqrt{P})$ queries in state $s_k$ to achieve an accuracy better than $\sigma_k(s)/(2P)$. Set $c = \sigma_k(s)/2$. Hence, the thesis is proved.

Proof of Lemma 3.3.7

Proof. We will provide the proof for continuous random variables; the proof for the discrete case is totally similar. By induction, it is sufficient to prove that, if $\Pr(A_1 \in [l_1; r_1]) \ge 1-\delta_1$ and $\Pr(A_2 \in [l_2; r_2]) \ge 1-\delta_2$, then

\[
\Pr\big(A_1 + A_2 \in [\,l_1+l_2\;;\;r_1+r_2\,]\big) \ \ge\ (1-\delta_1)(1-\delta_2).
\]

Let $f_A$ be the probability density function of the r.v. $A$. Let $\underline{f}_{A_i}(x) = f_{A_i}(x)\,1\!\!1(x\in[l_i;r_i])$, $i = 1,2$. Then,

\[
\begin{aligned}
\Pr\big(A_1+A_2 \in [\,l_1+l_2\,;\,r_1+r_2\,]\big)
&= \int_{l_1+l_2}^{r_1+r_2} f_{A_1+A_2}(x)\,dx \\
&= \int_{l_1+l_2}^{r_1+r_2} \int_{\mathbb{R}} f_{A_1}(x-\tau)\,f_{A_2}(\tau)\,d\tau\,dx \\
&\ge \int_{l_1+l_2}^{r_1+r_2} \int_{\mathbb{R}} \underline{f}_{A_1}(x-\tau)\,\underline{f}_{A_2}(\tau)\,d\tau\,dx \\
&= \int_{\mathbb{R}} \int_{\mathbb{R}} \underline{f}_{A_1}(x-\tau)\,\underline{f}_{A_2}(\tau)\,d\tau\,dx \\
&= \int_{\mathbb{R}} \underline{f}_{A_1}(x)\,dx\ \int_{\mathbb{R}} \underline{f}_{A_2}(x)\,dx \\
&= \Pr(A_1\in[l_1;r_1])\ \Pr(A_2\in[l_2;r_2]) \\
&\ge (1-\delta_1)(1-\delta_2).
\end{aligned}
\]

Hence, the thesis is proved.

Proof of Theorem 3.3.6

Proof. Let us consider the following constrained minimization problem over the reals:
\[
\begin{aligned}
\min_{\omega_1,\dots,\omega_{|S|}}\ & \sum_{i=1}^{|S|} \sigma_i^2(s)/\omega_i \\
\text{s.t. } & \sum_{i=1}^{|S|} \omega_i = n, \quad \omega_i \in \mathbb{R}.
\end{aligned} \qquad (3.46)
\]
By using, e.g., the Lagrangian multiplier technique, it is easy to see that the optimum value for $\omega_i$ is

\[
\omega_i^* = \frac{\sigma_i(s)\, n}{\sum_{k=1}^{|S|} \sigma_k(s)}
\]
and that the minimum value of the objective function is

\[
\xi^* = \frac{\left[\sum_{i=1}^{|S|} \sigma_i(s)\right]^2}{n}. \qquad (3.47)
\]

The value $\xi^*$ clearly represents a lower bound for the optimization problem over the integers in the case of simple games. Since we deal with the average criterion, let $\sigma_i(s) \equiv \pi_i$. Now we can find a lower bound for $\sqrt{n}\,\tilde\epsilon(n,\delta)$ over $n$ that does not depend on the number of queries $n$:

\[
\begin{aligned}
\inf_{n\in\mathbb{N}}\ \min_{\substack{n_1,\dots,n_{|S|}:\\ \sum_i n_i = n}} \sqrt{n}\,\tilde\epsilon(n,\delta)
&= \inf_{n\in\mathbb{N}}\ \min_{\substack{n_1,\dots,n_{|S|}\in\mathbb{N}:\\ \sum_i n_i = n}} \sqrt{\frac{\log(2/\delta)}{2} \sum_{i=1}^{|S|} \frac{n\,\pi_i^2}{n_i}} \\
&= \inf_{\substack{q_1,\dots,q_{|S|}\in\mathbb{Q}^+:\\ \sum_i q_i = 1}} \sqrt{\frac{\log(2/\delta)}{2} \sum_{i=1}^{|S|} \frac{\pi_i^2}{q_i}} \\
&= \min_{\substack{x_1,\dots,x_{|S|}\in\mathbb{R}^+:\\ \sum_i x_i = 1}} \sqrt{\frac{\log(2/\delta)}{2} \sum_{i=1}^{|S|} \frac{\pi_i^2}{x_i}} \qquad (3.48) \\
&= \sqrt{\frac{\log(2/\delta)}{2}},
\end{aligned}
\]
and the optimum value of $x_i$ in (3.48) is
\[
x_i^* = \frac{\pi_i}{\sum_{k=1}^{|S|} \pi_k} = \pi_i.
\]
By Theorem 3.3.5,

\[
n_i/n \ \xrightarrow{n\uparrow\infty}\ \pi_i \quad \text{with probability } 1.
\]
Hence, $n_i/n$ converges with probability 1 to the optimum value $x_i^*$ and, by continuity, the thesis is proved.
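A quick numerical check of the mechanism behind this proof (Theorem 3.3.5), on an arbitrary ergodic 3-state chain of our own choosing: under the one-query-per-visit rule, the per-state query proportions $n_i/n$ equal the empirical visit frequencies, which converge to the stationary distribution $\pi$.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary ergodic transition matrix on 3 states.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

# Empirical visit frequencies hat(pi)_i(eta) = eta_i / eta.
state, visits, eta = 0, np.zeros(3), 100_000
for _ in range(eta):
    visits[state] += 1
    state = rng.choice(3, p=P[state])

print(np.round(visits / eta, 4), np.round(pi, 4))   # the two should agree
```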

Proof of Lemma 3.3.8

Proof. In the case of simple Markovian games, the optimization problem (3.43) turns into
\[
\begin{aligned}
\min_{n_1,\dots,n_{|S|}}\ & \sum_{i=1}^{|S|} \sigma_i^2(s)/n_i \\
\text{s.t. } & \sum_{i=1}^{|S|} n_i = n, \quad n_i \in \mathbb{N}.
\end{aligned} \qquad (3.49)
\]
Let us consider the constrained minimization problem over the reals in (3.46). Since evidently $\xi^*$, defined in (3.47), is not greater than the minimum value of the objective function in (3.49), then by straightforward inspection of the expressions (3.39) and (3.42) the thesis is proved.

SCI: Confidence interval based on Hoeffding's inequality. Valid under the static Assumption 2. Its general formulation in (3.38) holds for the Shapley value in any Markovian game, as well. Polynomial number of queries required to obtain a polynomial accuracy (Theorem 3.3.12).

DCI1: Confidence interval based on Hoeffding's inequality. Valid under both the static Assumption 2 and the dynamic Assumption 3. Its general formulation in (3.41) holds for the Shapley value in any Markovian game, as well. Theorem 3.3.6 provides a sampling strategy maximizing its accuracy, applicable under the dynamic Assumption 3. Polynomial number of queries required to obtain a polynomial accuracy (Theorem 3.3.12).

DCI2: Confidence interval valid under both the static Assumption 2 and the dynamic Assumption 3. Its formulation holds for the Shapley value in any Markovian game, as well.

SCI vs. DCI1: Under the static Assumption 2, there exists a sampling strategy for which DCI1 is at least as tight as SCI, for any number of samplings $n$ (see Theorem 3.3.9). Under the dynamic Assumption 3 and the average criterion, DCI1 can be made at least as tight as SCI asymptotically, for $n \uparrow \infty$ (see Theorem 3.3.6).

DCI1 vs. DCI2: By utilizing Clopper-Pearson intervals, DCI2 is tighter than DCI1 for all $1-\delta > 0.8$, simulations suggest (see Table 3.2).

Table 3.3: Summary of results for the three proposed approaches to compute confidence intervals for the Shapley-Shubik power index in Markovian games, i.e. SCI, DCI1, and DCI2.

Chapter 4

Restless Bandits for Dynamic Channel Selection: An MDP formulation

In this chapter we utilize an MDP formulation with uncountable state space to model a Restless Multi-Armed Bandit problem. We deal with a multi-access wireless network in which transmitters dynamically select a frequency band to communicate on. The slow fading channel attenuations follow an autoregressive model. In the single user case, we formulate this selection problem as a restless multi-armed bandit problem and we propose two strategies to dynamically select a band at each time slot. Our objective is to maximize the SNR in the long run. Each of these strategies is close to the optimal strategy in different regimes. In the general case with several users, we formulate the problem as a Competitive MDP with uncountable state space, where the objective is the SINR. Then we propose two strategies to approximate the best response policy for one user when the other users' strategy is fixed. We remark that our approach can be applied to any Restless Multi-Armed Bandit with autoregressive arms of order 1.

4.0.11 Introduction

The next generation of wireless networks is expected to be characterized by a high decentralization/distribution of control functions among nodes to support self-organizing and self-healing capabilities. Network devices shall be able to monitor and sense the surroundings, learn from their monitoring

and smartly and dynamically allocate resources. This prospective scenario is attracting a considerable amount of research efforts to develop learning techniques able to optimize the trade-off between exploration and exploitation of environment and resources. A relevant class of learning algorithms is the Multi-Armed Bandit (MAB) one. In the classic MAB problem there exist several "arms" that offer a reward when pulled (in analogy with gambling on bandits in casinos). Each arm is associated with a Markov process, and the reward of an arm is a function of its state. Gittins provided a dynamic allocation procedure (see Gittins et al., 1989 [37]), later dubbed the Gittins index, which is optimal if the arms that are not pulled do not evolve over time. The more general case, in which the arms that are not pulled keep evolving in time, is known as the Restless MAB. It was proven by Papadimitriou and Tsitsiklis (1999, [67]) that restless MABs are PSPACE-hard in general. Whittle (1989, [100]) proposed to adopt a heuristic Lagrangian relaxation to extend the Gittins index to the restless case, which is asymptotically optimal under a certain limiting regime (Weber and Weiss, 1990 [98]).

In this work, we consider a wireless network where transmitters can select a frequency band from a shared pool to communicate on. The evolution of the slow fading channel attenuation associated with each frequency band and each transmitter is a random process that can be well approximated by an autoregressive process (Aguero et al., [2]). We assume that all such random processes are independent of each other. The goal of each transmitter is to maximize its average Signal to Interference and Noise Ratio (SINR) in the long run.

To get insight into this problem, first we focus on a single transmitter system to investigate the exploration-exploitation trade-off introduced in the system by the autoregressive channel attenuations. Then, we consider the multi-transmitter case, where the problem is further complicated by the randomness introduced by the autonomous band selections of multiple transmitters. For the single terminal case, the problem of dynamic frequency allocation for SNR maximization can be modeled as a restless Multi Armed Bandit (MAB), since the transmitter only knows the instantaneous attenuations on the bands utilized in the past and these evolve also when not utilized. To the best of the authors' knowledge, there are no available results on the MAB problem for autoregressive processes. We propose two heuristic frequency allocation strategies, one called "myopic" and the other "randomized". When the AR processes possess similar autocorrelation functions, we suggest using the myopic strategy. Instead, when there is one AR process having a much higher autocorrelation, we suggest using the randomized strategy. In the scenario with multiple transmitters, the problem is formulated as a Competitive MDP with uncountable state space. We focus on a two-user system and we assume that user 1 is oblivious of the presence of user 2 and follows a plain single-user myopic approach. Then we propose two strategies for user 2 to approximate its best response against user 1's strategy. Again, one strategy is myopic and the other is randomized, with respect to the SINR objective function. A lexical remark:
We say that we “sample” a frequency band when we utilize it for the communication in a certain time slot.

4.0.12 Model

In Section 4.0.13 we consider one transmitter and one receiver, while in Section 4.0.14 we deal with a model with two transmitters. Time is divided into slots and, at the beginning of a time slot, each transmitter (or user) selects a frequency band, out of a pool of $M$ different ones, to transmit on. At the receiver, a single-user decoder per transmitter is deployed. In the two-transmitter case, when both users access the same frequency band $i$ at time slot $t$, they interfere with each other, and the SINR (Signal to Interference plus Noise Ratio) for user $j = 1,2$ at time $t$ is
\[
\mathrm{SINR}_{i,j}[t] = \frac{P_j\,|h_{i,j}[t]|^2}{N_0 + \sum_{q\ne j} P_q\,|h_{i,q}[t]|^2},
\]
where $P_j$ is the transmit power of user $j$, $h_{i,j}[t] \in \mathbb{C}$ is the $i$-th channel coefficient of user $j$ at time $t$, and $N_0$ is the variance of the additive white Gaussian noise at the receiver. When only one user is present, the SINR definition boils down to the classic SNR. For simplicity of notation, henceforth we will denote the channel attenuation coefficient $|h_{i,j}[t]|^2$ by $g_{i,j}[t]$.

Let us now describe our channel model. In (Aguero et al., 2007 [2]) it is shown that, under slow fading conditions, the SNR (Signal to Noise Ratio) of indoor wireless channels can be well approximated by an autoregressive (AR) model. This means that, under such conditions, we can model the channel attenuations as
\[
g_{i,j}[t] = \sum_{k=1}^{p_{i,j}} a_{i,j}^{(k)}\, g_{i,j}[t-k] + c_{i,j} + \epsilon_{i,j}[t],
\]
where $a_{i,j}^{(k)} \in \mathbb{R}$, $\{\epsilon_{i,j}[t]\}_t$ is an i.i.d. Gaussian process with zero mean and variance $\sigma_{i,j}^2$, $c_{i,j} > 0$, and $p_{i,j}$ is the order of the model. Moreover, all the channels considered are independent of each other, i.e. $\epsilon_{i_1,j_1}[t]$ is independent of $\epsilon_{i_2,j_2}[t]$ when either $i_1 \ne i_2$ or $j_1 \ne j_2$. We assume the AR process to be wide sense stationary (WSS), i.e. the roots of the polynomial $z^p - \sum_{k=1}^{p} a_{i,j}^{(k)} z^{p-k}$ must lie inside the unit circle.

4.0.13 Single user: MDP formulation

In this section we consider the single user case. In order to simplify the notation, we drop the user index. In our study we consider an AR(1) channel attenuation model, i.e.
\[
g_i[t] = a_i\, g_i[t-1] + c_i + \epsilon_i[t].
\]
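Purely to illustrate the channel model just introduced (function names and parameter values are ours), an AR(1) attenuation can be simulated and plugged into the SINR expression as follows; negative draws, which the Gaussian model allows, are left untreated in this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1_attenuation(a, c, sigma, horizon):
    """Simulate an AR(1) attenuation g[t] = a*g[t-1] + c + eps[t], with
    eps[t] ~ N(0, sigma^2), started at the unconditional mean c/(1-a)."""
    g = np.empty(horizon)
    g[0] = c / (1.0 - a)
    for t in range(1, horizon):
        g[t] = a * g[t - 1] + c + sigma * rng.standard_normal()
    return g

def snr(P, g, N0, interference=0.0):
    """SNR (or SINR, if `interference` collects the other users' P_q * g_q)."""
    return P * g / (N0 + interference)

# Two bands with the same unconditional mean 10 but different correlation.
g1 = simulate_ar1_attenuation(a=0.9, c=1.0, sigma=1.0, horizon=10_000)
g2 = simulate_ar1_attenuation(a=0.3, c=7.0, sigma=1.0, horizon=10_000)
print(snr(1.0, g1, 1.0).mean(), snr(1.0, g2, 1.0).mean())
```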

For $|a_i| < 1$, the process is WSS, and the (unconditioned) expected value of the channel attenuation $g_i[t]$ at any time instant $t$ can be expressed as
\[
m_i = E(g_i[t]) = \frac{c_i}{1-a_i} \quad \forall t.
\]
Therefore we can say that $E(\mathrm{SNR}_i[t]) = P m_i/N_0$, for all $t$. Straightforward computations show that the autocovariance function of the channel attenuation can be written as
\[
E\big[(g_i[t]-m_i)(g_i[t-n]-m_i)\big] = a_i^{|n|}\, \frac{\sigma_i^2}{1-a_i^2}. \qquad (4.1)
\]
We now illustrate the two fundamental assumptions of this paper. First, the coefficients $a_i$ and $\sigma_i$ are known by the transmitter, which might have estimated them during a training phase. Second, the transmitter, at time $t$, only knows the instantaneous attenuations of the frequency bands utilized up to time $t-1$. Indeed, we assume that the receiver estimates $g_{i,j}$ and broadcasts this information on the channel along with an identifier for the transmitter and the frequency band. The goal of the user is to dynamically switch among the channels at each time slot in order to maximize the expected average SNR over an infinite horizon. Equivalently, it wants to maximize the expected average over time of the channel attenuations, denoted by $\Phi^{(av)}(f)$:
\[
\max_f\ \Phi^{(av)}(f) = \lim_{T\to\infty} \frac{1}{T}\, E_f\!\left\{ \sum_{t=0}^{T-1} g_{f(t)}[t] \right\}, \qquad (4.2)
\]
where $f$ is a dynamic sampling strategy over the channels $1,\dots,M$. The reader should notice that a channel sampling strategy $f$ at time $t$ may depend on the whole history of the observed channels and of the sampling decisions up to time $t$. This class also includes static strategies, which choose one channel once and for all. Intuitively, when there exists a channel $i$ with much higher unconditioned expected attenuation, i.e. $m_i \gg m_k$ for all $k \ne i$, a static selection of channel $i$ is the nearly optimal strategy, since with high probability $g_i[t] > g_k[t]$ for all $k \ne i$ for almost all $t$.

In this section, we want to study how to dynamically select the band on which to transmit when, a priori, all of them are nearly equivalent, i.e. there exists $m$ such that $m_k \approx m$ for all $k$. At each time slot, there is always one channel better than the others, hence we wish to track dynamically the evolution over time of the best channel. Intuitively, the sampling choice at each instant has to be a trade-off between exploration and exploitation. To give a hint, the most natural policy, which we will call myopic, at each time step $t$ aims at maximizing the expected value of $\mathrm{SNR}[t]$, given all the previous channel observations. On the other hand, the statistical information about channels that are not used becomes more and more obsolete, therefore in some cases it might be better to explore different channels with a randomized strategy. We can formulate the optimization problem (4.2) as a restless Multi Armed Bandit problem (MAB for short), in which a user at each time instant $t$ selects an arm (here, a frequency band) which gives a reward (here, the SNR), and all the arms, including the ones that have not been selected, evolve according to a certain stochastic process (here, an autoregressive process). More specifically, we can describe the decision problem at hand as a Markov Decision Process (MDP) with an uncountable set of states $\mathcal{S}$ or, equivalently, as a Partially Observable MDP. Let us describe it in detail. At time $t$, we call $n_i(t)$ the number of steps since channel $i$ was last used.
The attenuation of channel $i$ at time $t$ conditioned on its last observation is a Gaussian r.v., and we denote its mean and variance as $\mu_i(t)$ and $\nu_i(t)$, respectively:
\[
\mu_i(t) = E\big(g_i[t]\,\big|\,g_i[t-n_i(t)]\big) = a_i^{n_i(t)}\, g_i[t-n_i(t)] + c_i\,\frac{1-a_i^{n_i(t)}}{1-a_i}, \qquad (4.3)
\]
\[
\nu_i(t) = \mathrm{Var}\big(g_i[t]\,\big|\,g_i[t-n_i(t)]\big) = \sigma_i^2\,\frac{1-a_i^{2 n_i(t)}}{1-a_i^2}, \qquad (4.4)
\]
where $g_i[t-n_i(t)]$ is the attenuation of channel $i$ during its last utilization. At time step $t$, thanks to the Markov property of the AR(1) process, the whole statistical information about channel $i$ is hence contained in $(\mu_i(t),\nu_i(t))$. We observe that $\mu_i \in \mathbb{R}$, while $\nu_i$ is bounded within $[\sigma_i^2;\ \sigma_i^2/(1-a_i^2)]$. The decision on which channel to utilize at time $t$ hinges on the set $S_t$:
\[
S_t = \big\{ \mu_1(t), \nu_1(t),\ \mu_2(t), \nu_2(t),\ \dots,\ \mu_M(t), \nu_M(t) \big\}. \qquad (4.5)
\]
Using the MDP jargon, we call $S_t$ the state of the decision problem at time $t$. The state space $\mathcal{S}$ is the uncountable collection of all the possible states. In each state $S \in \mathcal{S}$, a set of actions $\mathcal{A} = \{1,2,\dots,M\}$ is available to the transmitter, which represents the collection of channels that can be selected at time slot $t$. If channel $i$ is selected, then we map the "reward" for the user in state $S_t$ to the expected channel attenuation at time $t$ conditioned on the last observation of channel $i$ itself, i.e. $\mu_i(t)$. The state of the system at time $t+1$ evolves stochastically, according to the following Markovian rule. If channel $i$ is selected at time $t$, then at time $t+1$,
\[
\mu_i(t+1) = a_i Y + c_i, \quad \text{where } Y \sim \mathcal{N}\big(\mu_i(t), \nu_i(t)\big), \qquad \nu_i(t+1) = \sigma_i^2.
\]
Instead, if channel $i$ is not selected at time $t$,

\[
\mu_i(t+1) = a_i\,\mu_i(t) + c_i, \qquad \nu_i(t+1) = a_i^2\,\nu_i(t) + \sigma_i^2.
\]
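A minimal sketch of this belief update (arrays mu, nu, a, c, sigma2 are indexed by channel; `observation` is the attenuation just measured on the selected channel; all names are ours):

```python
import numpy as np

def update_beliefs(mu, nu, a, c, sigma2, selected, observation):
    """One-step update of the belief state (mu_i, nu_i) of every channel:
    the selected channel is re-anchored on the fresh observation g_i[t],
    the others keep drifting towards their unconditional statistics."""
    mu, nu = np.array(mu, dtype=float), np.array(nu, dtype=float)
    for i in range(len(mu)):
        if i == selected:
            mu[i] = a[i] * observation + c[i]
            nu[i] = sigma2[i]
        else:
            mu[i] = a[i] * mu[i] + c[i]
            nu[i] = a[i] ** 2 * nu[i] + sigma2[i]
    return mu, nu
```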

Heuristic algorithms

The theory of MDPs allows us to claim that there exists an optimal stationary strategy $f^O$ for the problem (4.2). Unfortunately, the computation of $f^O$ turns out to be a difficult task. Indeed, the solution to a Markov Decision Problem with uncountable state space can only be approximated by means of discretization algorithms (Bäuerle and Rieder, 2011 [17], see also Section 1.3.2), and even in this case the curse of dimensionality entails that the size of the discretized state space increases exponentially with the number of arms. A different approach would be to compute the Whittle index (Whittle, 1989 [100]) of each channel, but this approach is not guaranteed to be optimal. Hence, it becomes crucial to devise a simple policy whose performance is reasonably close to the optimal $\Phi^{(av)}(f^O)$. In the following we propose the most natural stationary strategy one can think of, i.e. the myopic policy $f^M$, which aims at maximizing the instantaneous expected SNR in each state. Such a policy does not take into account that the statistics of the channels that have not been selected for a long period might become too stale. First, we need to initialize the algorithm, and we choose to sample the coefficient of each channel once.

Algorithm 4.0.13. Myopic policy $f^M$. For $0 \le t \le M-1$ select channel $t$, i.e. $f^M(S_t) = t+1$. For $t \ge M$,
\[
f^M(S_t) = \arg\max_{i\in\{1,\dots,M\}} \mu_i(t).
\]

We intend to compare the performance of the myopic policy with a more sophisticated one, which we call the randomized strategy, inspired by the Thompson sampling strategy for Bayesian Multi Armed Bandit problems (Thompson, 1933 [92]). We suggest drawing, in each state $S_t$, one realization of the random variable $\xi_i = g_i[t]\,|\,g_i[t-n_i(t)]$, for each channel $i = 1,\dots,M$. Then, the arm corresponding to the highest realization of $\xi$ is chosen. This procedure does not always follow the myopic rule, but with a certain probability explores the arms that, though possessing a lower $\mu$, might be optimal since their last observation is too stale.

Algorithm 4.0.14. Randomized policy $f^R$. For $0 \le t \le M-1$ select channel $t$, i.e. $f^R(S_t) = t+1$. For $t \ge M$, draw a realization of the Gaussian variable $\xi_i \sim \mathcal{N}(\mu_i(t), \nu_i(t))$ for all $i = 1,\dots,M$. Select
\[
f^R(S_t) = \arg\max_{i=1,\dots,M} \xi_i.
\]
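The two policies then act on the belief state maintained by a routine such as `update_beliefs` above; a compact sketch, with arbitrary numbers in the example:

```python
import numpy as np

rng = np.random.default_rng(2)

def myopic_choice(mu):
    """Myopic policy f^M: pick the channel with the largest conditional mean."""
    return int(np.argmax(mu))

def randomized_choice(mu, nu):
    """Randomized policy f^R (Thompson-like): draw xi_i ~ N(mu_i, nu_i) for
    every channel and pick the largest realization."""
    xi = rng.normal(loc=np.asarray(mu, float), scale=np.sqrt(np.asarray(nu, float)))
    return int(np.argmax(xi))

# Example on an arbitrary belief state of M = 3 channels.
mu = [8.3, 7.9, 8.1]
nu = [0.5, 2.0, 1.0]
print(myopic_choice(mu), randomized_choice(mu, nu))
```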

Simulations

In this section we show the results of some simulations, giving a hint about the performance of the myopic and the randomized policies, described respectively in Algorithms 4.0.13 and 4.0.14. Given a stationary policy $f$, we want to assess its average reward $\Phi^{(av)}(f)$. We compare the myopic and randomized policies with i) the optimal policy $f^O$, approximated by means of a state discretization technique (see Section 1.3.2), with ii) the upper bound on the performance of any strategy, computed by selecting the channel with the highest coefficient $g$ at each time step:

\[
f^U(t) = \arg\max_{i=1,\dots,M} g_i[t], \quad \forall t \ge 0, \qquad (4.6)
\]
and with iii) the static policy $f^S$, which selects off-line the arm with the highest expected value and no longer switches to other channels, i.e.

\[
f^S_t = \arg\max_{i=1,\dots,M} m_i, \quad \forall t \ge 0.
\]

Of course, the strategy $f^U$ is not applicable, since it is not causal. In theory, its performance is achievable only when the channels are deterministic, hence perfectly predictable, i.e. $\sigma_i = 0$ for all $i = 1,\dots,M$. We now show the performance of the five policies under scrutiny, the myopic $f^M$, the randomized $f^R$, the static $f^S$, the optimal $f^O$, and the upper bound policy $f^U$, under different channel conditions.

First, we consider 3 arms, where arms 2 and 3 are statistically equivalent, with $a_2 = a_3 = 0.3$, $\sigma_2^2 = \sigma_3^2 = 1$, and $m_2 = m_3 = 8$. Arm 1 has the same coefficients $a_1 = 0.3$, $\sigma_1^2 = 1$ as arms 2 and 3. In Figure 4.1 we show the performance of the five policies when $m_1$ varies within $[7; 9]$. We see that, under these conditions, the myopic policy outperforms the randomized one, since the latter wastes too much time in exploring arms that are not optimal. As intuition confirms, the static policy $f^S$ performs as well as the myopic $f^M$ when arm 1 has the highest expected value $m_1 > m_2 = m_3$. Instead, for $m_1 < m_2 = m_3$, dynamically switching between arms 2 and 3 is beneficial with respect to statically selecting one of the two. As we see in Figure 4.1, when all the arms are characterized by the same unconditioned expectation, i.e. $m_i = 8$ for $i = 1,2,3$, the static policy $f^S$ is outperformed by both the myopic and the randomized strategies. It is indeed better to switch among the channels to attempt to track the best instantaneous channel at each time instant, based on the previous observations. Remarkably, the performance of the myopic policy $f^M$ is close to the optimal $f^O$.

Figure 4.1: Performance of the myopic and randomized algorithms with 3 arms (channels). Arms 2 and 3 are statistically equivalent, with $a_2 = a_3 = 0.3$, $\sigma_2^2 = \sigma_3^2 = 1$, and $m_2 = m_3 = 8$. Arm 1 has the same $a_1 = 0.3$, $\sigma_1^2 = 1$ as arms 2 and 3, while the performance of the proposed algorithms is assessed when $m_1$ varies within $[7; 9]$.

Hence, we evaluate our algorithms in a different scenario, in which the values $m_i$ are the same for all the channels, but there exists one channel (say, channel 1) whose autocovariance function (4.1) decays considerably more slowly than the others. It is clear from Figure 4.2 that there are lapses in which channel 1 is by far the best, and some others in which its channel coefficient $g_1$ plummets below the others. From Figure 4.2 we observe that the myopic strategy often fails to track channel 1 when it is the best. The reason is quite intuitive: during the lapses in which channel 1 is the worst one, the myopic strategy does not choose it, then its last observation becomes obsolete, and consequently the prediction $\mu_1(t)$ tends to $m_1 = 10$. Thus, it is highly probable (and this probability increases with $M$) that one of the other, suboptimal, channels, having a fresher observation, offers a higher prediction. It easily follows that, for its inherent features, the randomized policy is more suitable to this kind of situation, as the results in Figure 4.3 confirm. We considered 3 arms (frequency bands). Arms 2 and 3 are statistically equivalent, with $a_2 = a_3 = 0.3$, $\sigma_2^2 = \sigma_3^2 = 1$, $c_2 = c_3 = 10$. Arm 1 has the same coefficients $c_1 = 10$, $\sigma_1^2 = 1$ as arms 2 and 3, while the performance of the proposed algorithms is assessed when the coefficient $a_1$ varies within $[0.3; 0.98]$. As we intuitively explained before, when the coefficient $a_1$ is sufficiently high, i.e. $a_1 > 0.85$, the randomized strategy outperforms the myopic one. Notably, the myopic policy is quasi-optimal for $a_1 < 0.6$, while the randomized one is nearly optimal for $a_1 > 0.9$.

4.0.14 Multi user: Competitive MDP formulation

In this section we discuss the more general scenario described in Section 4.0.12, in which two transmitters dynamically select one among $M$ channels at each time slot. If some users choose the same channel in one time slot, they interfere with each other. Therefore, the objective function for each user is its SINR, and no longer its SNR.

Figure 4.2: Channel (or arm) selection when $c_i = 10$ for all $i$, $a_1 = 0.9$, $a_2 = a_3 = 0.3$, $\sigma_1^2 = 1.5$, $\sigma_2^2 = \sigma_3^2 = 0.5$. The randomized strategy succeeds in tracking the first channel, with the higher autocorrelation, when it is the best one.

Since in the single user case the decision process can be described as an MDP, the scenario with two users can be formalized as a Competitive MDP (Filar and Vrieze, 1997 [32]) with uncountable state space. In our case, the set of channels $h_{1,j},\dots,h_{M,j}$ for player $j$ evolves independently from the ones available to any other player $k \ne j$, and the action space for each player is still $\mathcal{A} = \{1,\dots,M\}$, i.e. the channel indices to be selected at each slot. Therefore, we are allowed to formulate the game as a Competitive MDP in which each user $j$ controls its own Markov chain on the state space $\mathcal{S}_j$. As in the single user case, $\mathcal{S}_j$ is the set of all the possible states (4.5). Formally, the state space of the Competitive MDP at hand is the Cartesian product $\mathcal{S}^* = \mathcal{S}_1 \times \mathcal{S}_2$. Let us denote by $f_j$ a sampling strategy for user $j$ and by $f_{-j}$ the one for the other users. Possibly, $f_j$ and $f_{-j}$ are randomized policies. We define the instantaneous reward for user $j$ in state $S_t^* \in \mathcal{S}^*$ as the expected reward
\[
E\big( \mathrm{SINR}_j[t] \,\big|\, S_t^*, f_j, f_{-j} \big).
\]
Thus, the interaction between the players occurs only through the instantaneous rewards gained in each state, via the SINR expression. Hence we can say that our model is a reward-coupled Competitive MDP. This model is very similar to the one dealt with in (Altman et al., 2008 [4]), except that here the state space is uncountable and there are no constraints on the rewards.

Heuristic Best Response

We now propose a heuristic best response policy for user 2. Suppose that user 1 is oblivious of the presence of user 2 and performs a myopic policy $f_1^M$ to maximize the expected average of its channel attenuations over time, as in the single user case. On the other hand, user 2 knows the parameters of the channels, the current state, and the strategy of user 1. Thus, user 2 still faces an MDP with uncountable state space, which is equivalent to the Competitive MDP described before, when user 1 fixes its own stationary strategy. Let us give an insight into a possible strategy for user 2. Assume that, for user 2, channel $i_1$ presents at time $t$ the highest coefficient $g_{i_1,2}[t]$, but the expected SINR guaranteed by channel $i_2$, with suboptimal attenuation, is higher, since the interference is much weaker. Then, it is in general not clear what user 2 should do. A myopic solution would suggest to switch to the free channel $i_2$, but, in such a way, the information about channel $i_1$ becomes stale; moreover, channel $i_1$ itself might become free in the near future. Then, in analogy with the single player case, we propose two strategies, one myopic and one randomized, to approximate the best response for user 2 against a myopic policy $f_1^M$ that user 1 implements regardless of user 2's behaviour. We suppose the algorithms are initialized by sampling each channel once.

Algorithm 4.0.15. SINR myopic policy $f_2^{MS}$ for user 2, against the myopic policy $f_1^M$ for user 1.
\[
f_2^{MS}(S_t^*, f_1^M) = \arg\max_{i\in\{1,\dots,M\}} E\big( \mathrm{SINR}_{i,2}[t] \,\big|\, S_t^*, f_1^M, i \big).
\]

Algorithm 4.0.16. Randomized policy $f_2^{RS}$ for user 2, against the myopic policy $f_1^M$ for user 1.
Draw a realization of the random variable $\xi_i = \mathrm{SINR}_{i,2}[t]\,\big|\,(S_t^*, f_1^M, i)$, for all $i = 1,\dots,M$. Select
\[
f_2^{RS}(S_t^*, f_1^M) = \arg\max_{i=1,\dots,M} \xi_i.
\]
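A rough sketch of the SINR-myopic selection of Algorithm 4.0.15, where the conditional expectation is approximated by Monte Carlo over the Gaussian beliefs of the two users (user 1 is assumed to pick the band with the largest conditional mean, as its own myopic policy prescribes); all parameter values are illustrative, and the exact expectation used in the thesis' simulations may be computed differently.

```python
import numpy as np

rng = np.random.default_rng(3)

def sinr_myopic_choice(mu2, nu2, mu1, nu1, P1=1.0, P2=1.0, N0=1.0, n_draws=2000):
    """Approximate best response of user 2 against user 1's myopic policy:
    estimate E[SINR_{i,2} | S_t*, f_1^M, i] for each band i by sampling the
    Gaussian beliefs, then pick the band with the largest estimate."""
    band_user1 = int(np.argmax(mu1))          # user 1's myopic choice
    expected_sinr = np.empty(len(mu2))
    for i in range(len(mu2)):
        # attenuations are nonnegative, so negative Gaussian draws are clipped
        g2 = np.maximum(rng.normal(mu2[i], np.sqrt(nu2[i]), size=n_draws), 0.0)
        if i == band_user1:                    # interference only on a collision
            g1 = np.maximum(rng.normal(mu1[i], np.sqrt(nu1[i]), size=n_draws), 0.0)
            expected_sinr[i] = np.mean(P2 * g2 / (N0 + P1 * g1))
        else:
            expected_sinr[i] = np.mean(P2 * g2 / N0)
    return int(np.argmax(expected_sinr)), expected_sinr

# Example with arbitrary belief states over two bands.
choice, vals = sinr_myopic_choice(mu2=[8.0, 3.0], nu2=[0.8, 0.4],
                                  mu1=[2.0, 0.5], nu1=[0.1, 0.1])
print(choice, np.round(vals, 2))
```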

As to the performance of the policies $f^{MS}$ and $f^{RS}$, considerations similar to those made for the myopic and randomized algorithms in the single user case apply. Let us explain the results illustrated in Figure 4.4. We considered 2 users and 2 channels. The noise variance is $N_0 = 1$ and $P_1 = P_2 = 1$. The channels for user 1 are almost deterministic, i.e. $\sigma_{1,1}^2 = \sigma_{2,1}^2 = 0.1$ and $a_{1,1} = a_{2,1} = 0.3$, $m_{1,1} = 2$, $m_{2,1} = 0.5$. Thus user 1, which is unaware of the presence of user 2 and adopts a myopic policy $f_1^M$, selects channel 1 almost always. For user 2, $\sigma_{1,2}^2 = 0.8$, $\sigma_{2,2}^2 = 0.4$, $m_{1,2} = 8$, $m_{2,2} = 3$, $a_{2,2} = 0.3$.


Figure 4.3: Performance of the myopic and randomized algorithms with 3 arms (frequency bands). Arms 2 and 3 are statistically equivalent, with $a_2 = a_3 = 0.3$, $\sigma_2^2 = \sigma_3^2 = 1$, $c_2 = c_3 = 10$. Arm 1 has the same coefficients $c_1 = 10$, $\sigma_1^2 = 1$ as arms 2 and 3. $a_1$ varies within $[0.3; 0.98]$.

Hence, a static strategy for user 2 would suggest not to collide and to select channel 2. Anyway, sometimes it is beneficial for user 2 to select channel 1, when this is good enough. Indeed, for values of $a_{1,2}$ approaching 1, the autocorrelation of channel 1 for user 2 increases, and the randomized policy $f^{RS}$ succeeds in tracking channel 1 in the time slots in which its coefficient $g$ is large enough to overwhelm the interference caused by user 1.

4.0.15 Conclusions

We proposed two strategies to dynamically select one out of a pool of $M$ slow fading channels, modelled as autoregressive processes of order 1. The decision process is modelled as a restless bandit or, equivalently, as a Markov Decision Process. The myopic channel selection strategy is nearly optimal when the channels are similarly correlated. Instead, we suggest adopting a randomized strategy when one channel shows a higher autocorrelation. When two users are present, they interfere with each other, and we model the competitive learning process as a Competitive MDP. We finally propose two ways to approximate a best response selection strategy for the transmitters.

We remark that, of course, our approach can be applied to any Restless Multi-Armed Bandit process with independent AR(1) arms. This is seemingly a promising research field. We believe indeed that AR(1) processes are easy enough to provide nice analytical formulations, and at the same time they are a widely used paradigm for correlated time series.

Figure 4.4: Best response strategy of user 2 against a myopic policy for user 1. For user 1, $\sigma_{1,1}^2 = \sigma_{2,1}^2 = 0.1$ and $a_{1,1} = a_{2,1} = 0.3$, $m_{1,1} = 2$, $m_{2,1} = 0.5$. For user 2, $\sigma_{1,2}^2 = 0.8$, $\sigma_{2,2}^2 = 0.4$, $m_{1,2} = 8$, $m_{2,2} = 3$, $a_{2,2} = 0.3$. $a_{1,2}$ varies within $[0.8; 0.98]$. $\Phi_2^{(av)}(f_1^M, f_2)$ is the expected long-run average SINR for user 2 when user 1 adopts strategy $f_1^M$.

List of publications

We specify the correspondence between the parts of this manuscript and the publications produced during the three years of the Ph.D.

Section 2.1 K. Avrachenkov, L. Cottatellucci, and L. Maggi, "Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information," Operations Research Letters, vol. 40, pp. 56–60, 2011.

Section 2.4 K. Avrachenkov, L. Cottatellucci, and L. Maggi, “Stochastic games for cooperative network routing and epidemic spread”, Communications Workshops (ICC), 2011 IEEE International Conference on, 2011, pp. 1–5.

Section 3.1 K. Avrachenkov, L. Cottatellucci, and L. Maggi, "Cooperative Markov Decision Processes: Time Consistency, Greedy Players Satisfaction, and Cooperation Maintenance", International Journal of Game Theory, Volume 42, Issue 1, pages 239-262, 2013.

Section 3.2 K. Avrachenkov, L. Cottatellucci, and L. Maggi, "Dynamic Rate Allocation in Markovian Quasi-Static Multiple Access Channels: a Game Theoretic Approach", 11th Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), 2013.

Section 3.3 K. Avrachenkov, L. Cottatellucci, and L. Maggi, “Confidence intervals for Shapley value in Markovian dynamic games,” EURECOM, Tech. Rep. RR-12-264, 2012, to appear in Dynamic Games and Applications, DOI: 10.1007/s13235-012-0060-9.

Chapter 4 K. Avrachenkov, L. Cottatellucci, and L. Maggi, "Slow fading channel selection: A Restless Multi-Armed Bandit formulation", Ninth International Symposium on Wireless Communication Systems, ISWCS, 2012.

Two publications have been left out of this manuscript, since they do not fall within the main scope of the thesis:


• L. Maggi and L. Cottatellucci, "Retrospective interference alignment for interference channels with delayed feedback," WCNC 2012, IEEE Wireless Communications and Networking Conference, 2012.

• K. Avrachenkov, L. Cottatellucci, L. Maggi, and Y. Mao, "Maximum entropy mixing time of circulant Markov chains," Statistics & Probability Letters, Volume 83, Issue 3, March 2013, Pages 768–773.

Conclusions and future perspectives

In this dissertation we mainly studied Competitive and Cooperative Game Theory on Markov Decision Processes (MDPs). MDPs are controlled Markov processes. The actions taken by a set of players influence both the rewards for each player in each state and the transition probabilities among the states. In Chapter 2, Section 2.1, we investigated zero-sum Competitive MDPs with two players and perfect information, i.e. in each state at most one player has more than one action available. We considered the discounted criterion and we provided two algorithms to compute the optimal strategies at the Nash equilibrium, for both players, for all discount factors sufficiently close to 1. We adopted a linear programming technique in the field of rational functions with real coefficients. We proved the convergence in finite time of one algorithm. Our algorithms also produce the range of discount factors in which the strategies are optimal. In Section 2.4 we utilized the techniques developed in Section 2.1 to analyze a routing game in which several service providers (SPs) share the same network and provide connection toward a unique server (destination) to their customers. In each node of the network, one SP controls the routing of the incoming packets. Each link between adjacent nodes has a different cost for each SP. SPs must cooperate to carry out the transmission of the packets successfully. We utilized a long-run cooperative game approach to distribute the costs of the transmission among the SPs. We proposed to adopt the algorithms developed in Section 2.1 to compute the value of each coalition of SPs, by following a max-min methodology.

In Chapter 3 we dealt with Dynamic Cooperative Games on MDPs. The main difference with respect to Section 2.4 lies in the fact that coalitions are allowed to form throughout the game, and it is necessary to allocate a payoff to each player in each state. In Section 3.1 a Transferable Utility (TU) MDP cooperative game is considered. We devised a Cooperative Payoff Distribution Procedure on MDPs (MDP-CPDP) which satisfies a time consistency property. We then studied under which conditions the MDP-CPDP satisfies the greedy players, who have a myopic perspective of the game.

Most importantly, we investigated a Cooperation Maintenance property, and we found that it is a refinement of the concept of Core on MDPs. Such a property strengthens the cohesiveness of the grand coalition throughout the game. Indeed, at each time step, each coalition is always enticed to postpone the decision of withdrawing from the grand coalition, which is cohesive by induction.

In Section 3.2 we applied some concepts developed in Section 3.1 to a multiple access channel with Markovian quasi-static channel coefficients. We allocated the rate to each user in each channel state. In the corresponding MDP model, the transition probabilities among the (channel) states do not depend on the players (users). Hence, under this perspective, the model is simpler than the one in Section 3.1. On the other hand, the rewards (rates) cannot be distributed in an arbitrary manner in each state, but only within a feasibility (capacity) region. We first studied how to obtain a maximum sum-rate allocation both in each state and in the long-run process. Then, we found a sufficient condition ensuring the existence of an allocation which is fair throughout the game, from each intermediate step onwards. Finally, we found that the set of global optimum rates coincides with both the time consistent Core set and the Cooperation Maintaining set.

In Section 3.3 we dealt with Cooperative MDPs under the TU assumption and by considering an endogenous Markov chain on the state space. We call them Markovian games. We proposed three procedures to compute a confidence interval for the Shapley value in the long-run game. One of them relies on the assumption that all coalition values in all states are known off-line, while the other two are also valid under the more realistic assumption that the coalition values are learned during the game. Afterwards we provided some results on the accuracy of the proposed methods in the specific case of simple games, i.e. games in which the coalition values in each state take on the binary values 0 and 1. These games are appropriate to assess the power of members within a committee. We proposed these three randomized approaches to overcome complexity issues. Indeed, an exponential number of queries on the coalition values is needed by any deterministic algorithm even to approximate the Shapley value with an accuracy which is polynomial in the number of players. Instead, our methods only require a polynomial number of queries to achieve a polynomial accuracy.

Finally, in Chapter 4 we utilized an MDP formulation with uncountable state space to tackle a problem of optimal dynamic selection of slow fading channels. Our approach can be generalized to any Restless Multi-Armed Bandit (MAB) problem with independent autoregressive AR(1) arms. Only one frequency can be utilized at a time, and the instantaneous value of the associated attenuation coefficient is revealed. We proposed two strategies to maximize the average SNR in the long run. The first strategy is myopic, i.e. it prescribes choosing the channel with the highest expected attenuation, conditioned on the previous observations. The second one is randomized and suggests exploring with a certain probability the seemingly worse channels, still based on the previous observations. Almost counter-intuitively, when one channel shows much higher correlation than the others, the randomized strategy outperforms the myopic one and is quasi-optimal.
Indeed, under the myopic policy, the highly correlated channel is rarely utilized and its statistics tend to become stale. Hence, in the periods in which it is better than the other channels, it is not detected.

Future perspectives

I feel that the Cooperative MDP model, described in Section 3.1, can be extended in several directions.

First of all, in the Cooperative MDP model the probability of transition from a state s to a state s' only depends on the strategy adopted by each player, or agent, in state s, hence disregarding the fashion in which the pay-off, earned by the grand coalition as a "reward" for their joint strategies, is shared among the players in state s. More formally, the stochastic kernel $T$ of the process, governing the transition law between the states at times $t$ and $t+1$, $S_t$ and $S_{t+1}$ respectively, is described by the following relation:

S_{t+1} = T(S_t, f_P(S_t)),

where f_P(S_t) is the strategy of the grand coalition P in state S_t. On the other hand, each state of the Markov process can be considered as a “snapshot” of the relative power of each player inside a coalition, which reasonably also depends on the previous allocations. Hence, it would be interesting to introduce a dependency of the transition probabilities among the states on the pay-off allocation in each state. In order to make the model even more versatile, one could also assume that each player consumes a certain portion δ of the assigned pay-off and invests the remainder in order to boost its power inside the coalition at the next step. More formally, one could define the stochastic kernel T of the process as

S_{t+1} = T(S_t, f_P(S_t), γ(S_t), δ(S_t)),

where γ(S_t) is the CPDP and δ(S_t) is the investment policy of all players in state S_t. The objective is to devise a pay-off allocation in each state which enforces an agreement among all the players that is both stable over time and socially optimal. I expect this task to be considerably more challenging than the one dealt with in Section 3.1, due to the further complications introduced into the model.
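As a purely illustrative sketch of how one step of such an extended kernel might be simulated, the fragment below takes the joint strategy f_P, the CPDP gamma and the consumption/investment split delta as black-box functions; the toy instantiation and the particular way the invested amounts tilt the transition law are arbitrary assumptions, not part of the proposed model.

import numpy as np

rng = np.random.default_rng(0)

def extended_step(state, f_P, gamma, delta, kernel):
    # One transition of the extended model sketched above (hypothetical construction).
    # f_P(state)    -> joint action of the grand coalition
    # gamma(state)  -> pay-off share of each player in this state (the CPDP)
    # delta(state)  -> portion of the pay-off that each player consumes
    # kernel(state, action, invested) -> probability vector over next states
    action = f_P(state)
    allocation = np.asarray(gamma(state))
    consumed = np.asarray(delta(state)) * allocation
    invested = allocation - consumed          # the re-invested amounts influence the dynamics
    probs = kernel(state, action, invested)
    next_state = rng.choice(len(probs), p=probs)
    return next_state, consumed

# Tiny two-state, two-player toy instantiation; all numbers are arbitrary.
f_P    = lambda s: 0
gamma  = lambda s: [0.6, 0.4]
delta  = lambda s: [0.5, 0.5]
kernel = lambda s, a, inv: np.array([0.5 - 0.4 * inv.sum(), 0.5 + 0.4 * inv.sum()])

print(extended_step(0, f_P, gamma, delta, kernel))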

In my opinion, another promising research direction consists in examining the case of imperfect monitoring of players’ strategies in an MDP game, which so far has been studied only in the case of repeated games (see Mailath and Samuelson, 2006 [55], for a survey). The grand coalition is seen as a meta-agent, i.e. it does not simply coincide with the union of all the single agents, but it is rather an external entity whose duty is to impose the strategy that each agent should implement - but not necessarily actually does - in order to attain a social optimum solution. As a naïve example, one can identify the grand coalition with a Union of countries, and the players with the single countries belonging to it. The Union imposes restrictions on the policy of each country, which sometimes are not respected. In classic Cooperative Game Theory, the only way an unsatisfied coalition can stand up to the grand coalition is to breach the agreement and stop cooperating, or even not to sign the cooperation contract in the first place. Here one could study an intermediate situation, in which each unsatisfied subset of agents can “take the law into their own hands” and deviate from the strategy recommended by the grand coalition. Under the imperfect monitoring condition, one may further assume that the grand coalition does not have direct monitoring of the implemented strategies, but only indirect monitoring, through the observation of the sequence of visited states S_1, S_2, ... . Indeed, the sequence of states visited by the stochastic process depends on the implemented strategy, and the grand coalition can acquire some statistical knowledge by inference from its observations. Nevertheless, this is a “noisy” observation, since the transition rule is not deterministic. The optimal CPDP for the grand coalition should take into account some form of punishment for those agents which deviated from the recommended strategy with a sufficiently high probability. This procedure should act as a deterrent against future deviations.
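As an elementary illustration of the inference step just described, the following sketch compares the likelihood of an observed state sequence under the transition law induced by the recommended strategy and under the law induced by a candidate deviation; the transition matrices and the threshold are made-up numbers, and this is only one naive way such noisy monitoring could be formalized, not a procedure from the thesis.

import numpy as np

def log_likelihood(traj, P):
    # Log-likelihood of an observed state trajectory under transition matrix P.
    return sum(np.log(P[s, s_next]) for s, s_next in zip(traj[:-1], traj[1:]))

def flag_deviation(traj, P_recommended, P_deviation, threshold=5.0):
    # Flag a deviation when the trajectory is much more likely under P_deviation.
    llr = log_likelihood(traj, P_deviation) - log_likelihood(traj, P_recommended)
    return llr > threshold

# Made-up transition laws induced by the recommended and by a deviating strategy.
P_rec = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
P_dev = np.array([[0.5, 0.5],
                  [0.5, 0.5]])

rng = np.random.default_rng(0)
traj = [0]
for _ in range(500):                      # simulate a trajectory under the deviation
    traj.append(int(rng.choice(2, p=P_dev[traj[-1]])))
print(flag_deviation(traj, P_rec, P_dev))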

As to the work on uniform optimal strategies, developed in Section 2.1, I would like to extend the linear programming approach on the field F(R) to the value iteration algorithm, which works for a fixed discount factor.
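For reference, the value iteration recursion for a fixed discount factor, which such an extension would take as its starting point, is sketched below on a toy MDP. This is the textbook algorithm (see, e.g., Puterman [77]), not its counterpart on the field F(R); the instance data are illustrative.

import numpy as np

def value_iteration(P, r, beta, tol=1e-8):
    # Standard value iteration for a discounted MDP with fixed discount factor beta.
    # P[a] is the transition matrix under action a, r[s, a] the immediate reward.
    # Returns the (approximately) optimal value vector and a greedy stationary policy.
    n_states, n_actions = r.shape
    v = np.zeros(n_states)
    while True:
        q = r + beta * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new

# Toy 2-state, 2-action instance (illustrative numbers only).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(value_iteration(P, r, beta=0.9))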

Finally, I believe that the work on Multi-Armed Bandits with autoregressive arms, dealt with in Chapter 4, addresses a very insightful and interesting model. In order to be able to provide some analytical results, I am considering discretizing the state space and assuming a Markov chain on it. Note that the original AR model can be approximated when the discretization is fine enough and the transition probabilities are chosen ad hoc. I am confident that the dynamic allocation problem can be well approximated through an MDP formulation on a reduced state space, whose dimension is polynomial (and no longer exponential, as in the exact case) in the number of players.
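One standard way of carrying out such a discretization is a Tauchen-like construction: truncate the AR(1) support to a uniform grid and derive the transition probabilities from the Gaussian innovation. The sketch below is a generic implementation under that assumption; the grid size and truncation width are arbitrary choices, and it is not the specific construction envisaged above.

import numpy as np
from scipy.stats import norm

def discretize_ar1(rho, sigma, n_states=15, width=3.0):
    # Tauchen-style discretization of x' = rho*x + eps, eps ~ N(0, sigma^2).
    # Returns the state grid and the transition probability matrix.
    std_x = sigma / np.sqrt(1.0 - rho**2)          # stationary standard deviation
    grid = np.linspace(-width * std_x, width * std_x, n_states)
    step = grid[1] - grid[0]
    P = np.zeros((n_states, n_states))
    for i, x in enumerate(grid):
        mean = rho * x
        upper = norm.cdf((grid + step / 2 - mean) / sigma)
        lower = norm.cdf((grid - step / 2 - mean) / sigma)
        P[i] = upper - lower                        # mass of the innovation in each cell
        P[i, 0] += lower[0]                         # truncated tails go to the edge states
        P[i, -1] += 1.0 - upper[-1]
    return grid, P

grid, P = discretize_ar1(rho=0.9, sigma=0.2)
assert np.allclose(P.sum(axis=1), 1.0)              # each row is a probability distribution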

Bibliography

[1] A. Agresti and B. Coull. Approximate is better than “exact” for interval estimation of binomial proportions. American Statistician, pages 119–126, 1998.

[2] R. Aguero, M. Garcia, and L. Muñoz. BEAR: A bursty error auto-regressive model for indoor wireless environments. In Personal, Indoor and Mobile Radio Communications, 2007. PIMRC 2007. IEEE 18th International Symposium on, pages 1–5. IEEE, 2007.

[3] E. Altman. Applications of Markov decision processes in communication networks. International Series in Operations Research and Management Science, pages 489–536, 2002.

[4] E. Altman, K. Avrachenkov, N. Bonneau, M. Debbah, R. El-Azouzi, and D. Sadoc Menasche. Constrained cost-coupled stochastic games with independent state processes. Operations Research Letters, 36(2):160–164, 2008.

[5] E. Altman, K. Avrachenkov, L. Cottatellucci, M. Debbah, G. He, and A. Suarez. Operating point selection in multiple access rate regions. In Teletraffic Congress, ITC 21st International, pages 1–8. IEEE, 2009.

[6] E. Altman, K. Avrachenkov, and J. A. Filar. Asymptotic linear programming and policy improvement for singularly perturbed Markov decision processes. Mathematical Methods of Operations Research, 49(1):97–109, 1999.

[7] E. Altman, E. A. Feinberg, and A. Shwartz. Weighted discounted stochastic games with perfect information. Annals of the International Society of Dynamic Games, 5:303–324, 2000.

[8] K. B. Athreya and C. D. Fuh. Bootstrapping Markov chains: Countable case. Journal of Statistical Planning and Inference, 33:311–331, 1992.

[9] R. J. Aumann. Economic Applications of the Shapley Value. In J.-F. Mertens and S. Sorin, editors, Game-Theoretic Methods in General Equilibrium Analysis, pages 121–133. Kluwer Academic Publisher, 1994.

[10] R. J. Aumann and S. Hart. Handbook of Game Theory with Economic Applications, volume 2. Elsevier, 1994.

[11] K. Avrachenkov, L. Cottatellucci, and L. Maggi. Cooperative Markov decision processes: Time consistency, greedy players satisfaction, and cooperation maintenance. International Journal of Game Theory, 2012.

[12] K. Avrachenkov, L. Cottatellucci, and L. Maggi. Dynamic rate allo- cation in Markovian quasi-static multiple access channels. Technical Report RR-12-269, EURECOM, 2012.

[13] K. Avrachenkov, J. Elias, F. Martignon, G. Neglia, and L. Petrosyan. A Nash bargaining solution for Cooperative Network Formation Games. NETWORKING 2011, pages 307–318, 2011.

[14] Y. Bachrach, E. Markakis, E. Resnick, A. Procaccia, J. Rosenschein, and A. Saberi. Approximating power indices: theoretical and empirical analysis. Autonomous Agents and Multi-Agent Systems, 20(2):105– 122, 2010.

[15] Y. Bachrach, R. Meir, M. Feldman, and M. Tennenholtz. Solving cooperative reliability games. Arxiv preprint arXiv:1202.3700, 2012.

[16] J. F. Banzhaf III. Weighted voting doesn’t work: A mathematical analysis. Rutgers L. Rev., 19:317–343, 1964.

[17] N. Bäuerle and U. Rieder. Markov Decision Processes with applications to finance. Springer Verlag, 2011.

[18] R. Bellman. A Markovian decision process. J. Math. Mech., 6:679–684, 1957.

[19] T. Bewley and E. Kohlberg. The asymptotic theory of stochastic games. Mathematics of Operations Research, pages 197–208, 1976.

[20] J. Bilbao, J. Fernandez, A. J. Losada, and J. Lopez. Generating functions for computing power indices efficiently. Top, 8(2):191–213, 2000.

[21] D. Blackwell. Discrete dynamic programming. The Annals of Mathematical Statistics, 33(2):719–726, 1962.

[22] O. N. Bondareva. Some applications of linear programming methods to the theory of cooperative games. Problemy kibernetiki, 10:119–139, 1963.

[23] S. P. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.

[24] P. Brémaud. Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, 1999.

[25] G. Chalkiadakis, E. Markakis, and C. Boutilier. Coalition formation under uncertainty: Bargaining equilibria and the Bayesian core stability concept. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, page 64. ACM, 2007.

[26] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493–507, 1952.

[27] C. J. Clopper and E. S. Pearson. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika, 26(4):404– 413, 1934.

[28] G. B. Dantzig. Linear programming and extensions. Princeton University Press, 1998.

[29] B. C. Eaves and U. G. Rothblum. Formulation of linear problems and solution by a universal machine. Mathematical Programming, 65(1):263–309, 1994.

[30] J. Edmonds. Submodular functions, matroids, and certain polyhedra. Combinatorial Optimization - Eureka, You Shrink!, pages 11–26, 2003.

[31] J. Filar and L. A. Petrosjan. Dynamic cooperative games. International Game Theory Review, 2(1):47–65, 2000.

[32] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer Verlag, 1996.

[33] J. A. Filar, E. Altman, and K. Avrachenkov. An asymptotic simplex method for singularly perturbed linear programs. Operations Research Letters, 30(5):295–307, 2002.

[34] F. Forges, E. Minelli, and R. Vohra. Incentives and the core of an exchange economy: a survey. Journal of Mathematical Economics, 38(1):1–41, 2002.

[35] D. Gale. The core of a monetary economy without trust. Journal of Economic Theory, 19(2):456–491, 1978.

[36] D. Gillette. Stochastic games with zero stop probabilities. Contributions to the Theory of Games, 3:179–187, 1957.

[37] J. C. Gittins, R. Weber, and K. D. Glazebrook. Multi-armed bandit allocation indices, volume 25. Wiley Online Library, 1989.

[38] P. J. J. Herings, A. Predtetchinski, and A. Perea. The weak sequential core for two-period economies. International Journal of Game Theory, 34(1):55–65, 2006.

[39] J. Herzog and T. Hibi. Monomial Ideals. Springer, 2010.

[40] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.

[41] A. Hordijk, R. Dekker, and L. Kallenberg. Sensitivity-analysis in discounted Markovian decision problems. OR Spectrum, 7(3):143–151, 1985.

[42] R. A. Howard. Dynamic programming and Markov processes. The M.I.T. Press, 1960.

[43] F. Javadi, M. R. Kibria, and A. Jamalipour. Bilateral Shapley Value Based Cooperative Gateway Selection in Congested Wireless Mesh Networks. In IEEE GLOBECOM 2008, pages 1–5. IEEE, 2008.

[44] M. Katz and S. Shamai. Transmitting to colocated users in wireless ad hoc and sensor networks. Information Theory, IEEE Transactions on, 51(10):3540–3563, 2005.

[45] B. Klinz and G. J. Woeginger. Faster algorithms for computing power indices in weighted voting games. Mathematical Social Sciences, 49(1):111–116, 2005.

[46] K. Knopp. Theory and application of infinite series. Dover Publications, 1990.

[47] L. Kranich, A. Perea, and H. Peters. Dynamic cooperative games. Discussion Papers, 2000.

[48] L. Kranich, A. Perea, and H. Peters. Core concepts for dynamic TU games. International Game Theory Review, 7(1):43–61, 2005.

[49] F. E. Kydland and E. C. Prescott. Rules rather than discretion: The inconsistency of optimal plans. The Journal of Political Economy, pages 473–491, 1977.

[50] R. J. La and V. Anantharam. A game-theoretic look at the Gaussian multiaccess channel. In Advances in network information theory: DIMACS Workshop Network Information Theory, volume 66, pages 87–105. American Mathematical Society, 2004.

[51] D. G. Luenberger and Y. Ye. Linear and nonlinear programming, volume 116. Springer Verlag, 2008.

[52] R. T. B. Ma, D. Chiu, J. Lui, V. Misra, and D. Rubenstein. Internet Economics: The use of Shapley value for ISP settlement. In Proceedings of CoNEXT 2007, Columbia University, New York, December 2007.

[53] M. A. Maddah-Ali, A. Mobasher, and A. K. Khandani. Fairness in multiuser systems with polymatroid capacity region. Information Theory, IEEE Transactions on, 55(5):2128–2138, 2009.

[54] M. Madiman. Cores of cooperative games in information theory. EURASIP Journal on Wireless Communications and Networking, (318704), 2008.

[55] G. J. Mailath and L. Samuelson. Repeated games and reputations: long-run relationships. Oxford University Press, USA, 2006.

[56] I. Mann and L. S. Shapley. Values of large games, IV. RAND Corporation, Memorandum RM-2651, 1960.

[57] T. Matsui and Y. Matsui. A survey of algorithms for calculating power indices of weighted majority games. Journal of the Operations Research Society of Japan, 43:71–86, 2000.

[58] Y. Matsui and T. Matsui. NP-completeness for calculating power indices of weighted majority games. Theoretical Computer Science, 263(1):305–310, 2001.

[59] V. V. Mazalov and A. N. Rettieva. Fish wars and cooperation maintenance. Ecological Modelling, 221(12):1545–1553, 2010.

[60] R. Meir, M. Tennenholtz, Y. Bachrach, and P. Key. Congestion games with agent failures. In Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.

[61] J. F. Mertens and A. Neyman. Stochastic games. International Journal of Game Theory, 10(2):53–66, 1981.

[62] J. Mo and J. Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Transactions on Networking, 8(5):556–567, 2000.

[63] R. B. Myerson. Game theory: analysis of conflict. Harvard University Press, 1997.

[64] J. F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the U.S.A., 36(1):48–49, 1950.

[65] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani. Algorithmic Game Theory, chapter 18, pages 461–486. Cambridge University Press, 2007.

[66] J. Oviedo. The core of a repeated n-person cooperative game. European Journal of Operational Research, 127(3):519–524, 2000.

[67] C. H. Papadimitriou and J. N. Tsitsiklis. The complexity of optimal queueing network control. Mathematics of Operations Research, 24, 1999.

[68] T. Parthasarathy and T. E. S. Raghavan. An orderfield property for stochastic games when one player controls transition probabilities. Journal of Optimization Theory and Applications, 33(3):375–392, 1981.

[69] B. Peleg and P. Sudhölter. Introduction to the Theory of Cooperative Games. Springer Verlag, 2007.

[70] M. Penn, M. Polukarov, and M. Tennenholtz. Congestion games with failures. In Proceedings of the 6th ACM conference on Electronic commerce, pages 259–268. ACM, 2005.

[71] M. Penn, M. Polukarov, and M. Tennenholtz. Congestion games with load-dependent failures: identical resources. Games and Economic Behavior, 67(1):156–173, 2009.

[72] L. A. Petrosjan. Stable solutions of differential games with many participants. Viestnik of Leningrad University, 19:46–52, 1977.

[73] L. A. Petrosjan. Cooperative stochastic games. Advances in Dynamic Games, pages 139–145, 2006.

[74] K. Prasad and J. S. Kelly. NP-completeness of some problems concerning voting games. International Journal of Game Theory, 19(1):1–9, 1990.

[75] A. Predtetchinski. The strong sequential core for stationary cooperative games. Games and Economic Behavior, 61(1):50–66, 2007.

[76] A. Predtetchinski, P. Herings, and H. Peters. The strong sequential core for two-period economies. Journal of mathematical economics, 38(4):465–482, 2002.

[77] M. L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, Inc., 1994.

[78] T. E. S. Raghavan and Z. Syed. A policy-improvement type algorithm for solving zero-sum two-person stochastic games of perfect information. Mathematical Programming, 95(3):513–532, 2003.

[79] J. B. Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.

[80] S. Shamai and A. D. Wyner. Information-theoretic considerations for symmetric, cellular, multiple-access fading channels. I. Information Theory, IEEE Transactions on, 43(6):1877–1894, 1997.

[81] L. S. Shapley. Stochastic games. Proceedings of the National Academy of Sciences of the U.S.A., 39(10):1095, 1953.

[82] L. S. Shapley. A value for n-person games. Contributions to the Theory of Games, 2:31–40, 1953.

[83] L. S. Shapley. On balanced sets and cores. Naval research logistics quarterly, 14(4):453–460, 1967.

[84] L. S. Shapley. Cores of convex games. International journal of game theory, 1(1):11–26, 1971.

[85] L. S. Shapley and M. Shubik. A method for evaluating the distribution of power in a committee system. The American Political Science Review, 48(3):787–792, 1954.

[86] K. W. Shum and C. W. Sung. Fair rate allocation in some Gaussian multiaccess channels. In Information Theory, 2006 IEEE International Symposium on, pages 163–167. IEEE, 2006.

[87] K. W. Shum and C. W. Sung. On the fairness of rate allocation in Gaussian multiple access channel and broadcast channel. Arxiv preprint cs/0611015, 2006.

[88] E. Solan and N. Vieille. Computing uniformly optimal strategies in two-player stochastic games. Economic Theory, 42(1):237–253, 2010.

[89] R. Stanojevic, N. Laoutaris, and P. Rodriguez. On economic heavy hitters: Shapley value analysis of 95th-percentile pricing. In Proceedings of the 10th annual conference on Internet measurement, pages 75–80. ACM, 2010.

[90] P. Tannenbaum. Power in weighted voting systems. Mathematica Journal, 7(1):58–63, 1997.

[91] A. Taylor and A. Pacelli. Mathematics and Politics: Strategy, Voting, Power, and Proof. Springer Verlag, 2008.

[92] W. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933.

[93] F. Thuijsman and T. E. S. Raghavan. Perfect information stochastic games and related classes. International Journal of Game Theory, 26(3):403–408, 1997.

[94] D. Tse and P. Viswanath. Fundamentals of Wireless Communications. Cambridge University Press, 2005.

[95] J. von Neumann. Zur theorie der gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, 1928.

[96] J. Von Neumann and O. Morgenstern. Theory of games and economic behavior. Princeton University Press, 1944.

[97] A. Wald and J. Wolfowitz. Confidence limits for continuous distribution functions. Annals of Mathematical Statistics, 10:105–118, 1939.

[98] R. Weber and G. Weiss. On an index policy for restless bandits. Journal of Applied Probability, pages 637–648, 1990.

[99] D. J. White. A survey of applications of Markov decision processes. Journal of the Operational Research Society, pages 1073–1096, 1993.

[100] P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of applied probability, pages 287–298, 1988.

[101] E. B. Wilson. Probable inference, the law of succession, and statistical inference. JASA, pages 209–212, 1927.

[102] G. Zaccour. Time consistency in cooperative differential games: A tutorial. INFOR: Information Systems and Operational Research, 46(1):81–92, 2008.
