



Game Theory Approach in Multi-agent Resources Sharing

Tangui Le Gléau (Orange Labs, Lannion, France), Xavier Marjou (Orange Labs, Lannion, France), Tayeb Lemlouma (IRISA, Lannion, France), Benoît Radier (Orange Labs, Lannion, France)

Abstract—In multiple real-life situations involving several agents, cooperation can be beneficial for all. For example, some telecommunication or electricity providers may cooperate in order to address occasional resource needs by giving to coopetitors some quantities of their own surplus while expecting a similar service in return. However, since agents are a priori egoist, the risk of being exploited is high.

In this work, we propose to model this kind of situation as a social dilemma (a situation where the Nash equilibrium is non-optimal) in which each agent knows only its own state. We design an algorithm modelling the agents, whose goal is to make transactions in order to augment their own utility. The algorithm needs to be robust to defection and to encourage cooperation.

Our framework modelling each agent consists in iterations divided into four major steps: the communication of demands/needs, the detection of opponent cooperation, the cooperation response policy and finally the allocation of resources. In this paper, we focus on the cooperation response policy. We propose a new version of tit-for-tat and we evaluate it with metrics such as safety and incentive-compatibility. Several experiments are performed and confirm the relevance of our improvement.

Index Terms—Multi-Agent System, Game Theory, Social Dilemma

I. INTRODUCTION

Sharing resources or services between multiple coopetitors with autonomous agents is very common in industrial use cases. For example, in telecommunications, sharing resources between operators has been suggested, in particular with the arrival of the next generation of telecom networks (5G), in order to extend and improve the capacity and coverage of operators' connectivity.

A well-suited model for exchanging connectivity resources is the framework called Licensed Shared Access (LSA) [1], which aims at optimising spectrum utilisation. Several previous works address the issue of LSA spectrum sharing, generally with auction mechanisms [2]–[5]. In particular, some of these works study truthful mechanisms (such as the Vickrey-Clarke-Groves mechanism), which have good properties (fairness, incentive-compatibility). This kind of mechanism involves financial transactions. We instead focus on addressing the issue without money and follow a utilitarian way, in particular to be able to deal with services which cannot be traded financially. In this paper, we assume that network operators can exchange resources to reach an optimal situation, and we consider that their personal interests are driven only by their personal utility (which can, for example, be considered as the quality of experience in the telecommunications area). We introduce three major assumptions in this context. First, there is no main regulator: it is impossible for a controller to compute optimal transactions. Secondly, due to strategic issues, agents do not share all of their personal state (i.e. the quantities of their under/over-used resources); they communicate only a partial state to other agents. Finally, each agent is assumed to be selfish, unlike, for example, consensus optimization problems where each node is encouraged to cooperate. This last assumption is very important, as being exploited by a defector has a utility cost, and reaching a consensus can take longer than with other consensus algorithms due to the risk of exploitation.

Games with partial information are generally solved with frameworks involving consensus optimisation issues, which have been well studied in the optimisation [6]–[10] and synchronisation [11], [12] literature. In our work, we tackle the issue of self-interested agents, which means that agents have to reach a consensus while taking into account the fact that the other agents are a priori selfish.

We propose to formulate the problem as a non-cooperative Markov game and show that it can be viewed as a social dilemma, i.e. a situation where Nash equilibria are non-optimal: agents have no incentive to cooperate despite the fact that mutual cooperation is the optimal global outcome. This typical game theory case has been well studied recently [13]–[17], involving in particular multi-agent reinforcement learning systems.

In section II, we define the problem as a multi-agent system with different items to be shared whose marginal utility is assumed to be a decreasing function (i.e. the utility function is concave). The game theory issues naturally raised by the problem are described in section III. To address this sharing problem, we propose an algorithm based on Tit-for-Tat (TFT), known to be robust in iterated dilemmas. We present the architecture of our model in section IV. Some numerical simulations are performed in section V to evaluate some key properties of our algorithm such as efficiency, speed, incentive-compatibility and safety.
II. PROBLEM FORMULATION

We consider a game with N agents A_1, ..., A_N and M items B_1, ..., B_M. To simplify the notation, we consider that the index sets I and K are identified with [0, ..., N-1] and [0, ..., M-1]. S is the state set, which corresponds to the agents' resources: at each time t, the state is defined by s(t) = { s_{i,k}(t), for all i in I and k in K }, where s_{i,k}(t) corresponds to the quantity of item B_k owned by agent A_i. We assume that the utility function of each agent A_i is f^(i)(s(t)), defined by:

$$f^{(i)}(s(t)) = \sum_{k=0}^{M-1} f_k^{(i)}(s_{i,k}(t)) \qquad (1)$$

where each f_k^(i) is considered monotonically increasing and concave, due to the assumption of decreasing marginal utility of resources.

At each time t, each agent A_i executes an action a^(i)(t), which is a set of transactions:

$$a^{(i)}(t) = (u^{(1)}, \dots, u^{(m)})$$

where each u^(k) in K x I x R is a transaction (give one quantity of one item to one agent). To simplify, we consider that actions are gathered in a joint action X = (X_1, ..., X_M) where each X_k in M_N(R) sums up the transactions of item B_k and is then anti-symmetric (giving to another agent means receiving from him).

The transition function T maps states and actions to next states by executing the transactions:

$$s(t+1) = \mathcal{T}\big(s(t),\, X = (X_1, \dots, X_M)\big) \qquad (2a)$$

$$s_{i,k}(t+1) = s_{i,k}(t) + \sum_{j=0}^{N-1} (X_k)_{i,j} \qquad (2b)$$

Finally, we assume a game with partial observation: at each time t, agent A_i observes only a part of the total state, namely its own resources and the past transactions involving it:

$$\mathcal{O}(s(t), i) = \big\{ s_{i,k}(t), \forall k \in \mathcal{K} \big\} \cup \big\{ (X_k)_{i,j}(t'), \forall j, \forall t' < t \big\} \qquad (3)$$
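To make the formulation concrete, the following minimal Python sketch (our own illustration, not code from the paper) encodes the state, the concave per-item utility of Eq. (1), here the ln(x + 2) function used later in Section V, and the transition of Eq. (2b) with anti-symmetric transaction matrices.

```python
import numpy as np

N, M = 3, 3  # number of agents and items (illustrative)

# State s[i, k]: quantity of item B_k owned by agent A_i.
s = np.array([[3.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 3.0]])

def f(x):
    # Concave, monotonically increasing per-item utility (Eq. 1); ln(x + 2) as in Section V.
    return np.log(x + 2.0)

def utility(s):
    # f^(i)(s) = sum_k f_k^(i)(s_{i,k}) for every agent i (Eq. 1).
    return f(s).sum(axis=1)

def step(s, X):
    # Transition (Eq. 2b): s_{i,k}(t+1) = s_{i,k}(t) + sum_j (X_k)_{i,j}.
    # X has shape (M, N, N); each X[k] is anti-symmetric (what one agent gives, another receives).
    assert all(np.allclose(Xk, -Xk.T) for Xk in X)
    return s + np.stack([Xk.sum(axis=1) for Xk in X], axis=1)

# Example joint action: agent A_0 gives one unit of item B_0 to agent A_1.
X = np.zeros((M, N, N))
X[0, 1, 0], X[0, 0, 1] = 1.0, -1.0

print(utility(s))           # utilities before the transaction
print(utility(step(s, X)))  # utilities after the transaction
```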
III. GAME THEORY ISSUES

A. Nash Equilibrium

Let us begin by defining a strategy: it is a function that maps personal states s^(i) to actions (quantities allocated to the other agents):

$$\pi_i : s^{(i)} \mapsto (X_k)_{i,j} \qquad (4)$$

If G^(i)(pi_i) is the payoff of agent A_i, a joint strategy (pi_i*)_i is said to be a Nash Equilibrium if [18]:

$$\forall i \in \mathcal{I}, \forall \pi_i, \quad G^{(i)}(\pi_i^*, \pi_{-i}^*) \ge G^{(i)}(\pi_i, \pi_{-i}^*) \qquad (5)$$

$$\text{with } \pi_{-i} = [\pi_0, \dots, \pi_{i-1}, \pi_{i+1}, \dots, \pi_{N-1}]$$

B. Social Dilemmas

In the following, we model our exchange problem as a sequential social dilemma [13]. Before defining the multi-agent continuous version, let us consider the case with N = 2 agents and with simply two possible actions: cooperation and defection. We can establish the payoff matrix of the game as follows:

               B: Cooperate   B: Defect
  A: Cooperate     R, R          S, T
  A: Defect        T, S          P, P

Table I: Payoffs of agent A (left) and B (top) in a social dilemma.

Such a game is called a social dilemma if the following inequalities are verified:
• R > P (1)
• R > S (2)
• at least one of these two inequalities:
  – T > R: greed (3a)
  – P > S: fear (3b)

(1) means that mutual cooperation is better than mutual defection, and (2) that mutual cooperation is better than being exploited. If such a game is iterated, it is called a Sequential Social Dilemma (SSD), and it is then relevant to add a condition:
• R > (S + T)/2: mutual cooperation is better than an equiprobable alternation of the two different choices (4)
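As a small illustration (ours, with the classic prisoner's dilemma payoffs as example values), the conditions above translate directly into a few comparisons:

```python
def is_social_dilemma(R, P, S, T):
    # Conditions of Section III-B: (1) R > P, (2) R > S, and greed (3a) T > R or fear (3b) P > S.
    return (R > P) and (R > S) and (T > R or P > S)

def is_sequential_social_dilemma(R, P, S, T):
    # Iterated case: add condition (4), R > (S + T) / 2.
    return is_social_dilemma(R, P, S, T) and R > (S + T) / 2

# Classic prisoner's dilemma payoffs (illustrative): T = 5 > R = 3 > P = 1 > S = 0.
print(is_sequential_social_dilemma(R=3, P=1, S=0, T=5))  # True
```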

Figure 1: A simple discrete dilemma where 3 agents have 3 different kinds of items whose marginal utility is 4 − k. (a) Optimal transactions. (b) Schelling diagram. Figure 1a shows the optimal sharing, where each personal utility goes from 6 to 9, and Figure 1b shows the corresponding Schelling diagram (for a better understanding, see [13] or [19]).

Figure 2: Architecture of our agent model.

C. Continuous multi-agent dilemmas

In the continuous setting, the payoff G^(i) of agent A_i depends on the cooperation degrees chosen by A_i towards the other agents (c_{i,j}, for all j ≠ i) and from the other agents towards him (c_{k,i}, for all k ≠ i). Let us extend the inequalities of III-B to define social dilemmas in a multi-agent continuous context. The situation is said to be a dilemma if:
• for all j, G^(i) is decreasing w.r.t. c_{i,j}
• there exists c_{i,j} ≠ 0 such that G^(i)(c_{i,j}, c_{j,i}) > G^(i)(0, 0)

The first inequality states that, independently, each agent is not interested in cooperating. This can be considered either as fear (for sufficiently small values of c_{j,i}) or greed (for sufficiently large values of c_{j,i}). The second inequality corresponds to the fact that the Nash Equilibrium (c_{j,i} = 0 for all i, j, due to the first inequality) is not optimal, hence the presence of a dilemma.

For clarity, let us briefly detail the example with n = 2 agents. A_1 and A_2 earn respectively G^(1)(c_{1,2}, c_{2,1}) and G^(2)(c_{2,1}, c_{1,2}). We can define particular values such that the problem becomes the classic social dilemma seen in III-B:

$$G^{(1)}(0, 0) = G^{(2)}(0, 0) = P \qquad (6a)$$
$$G^{(1)}(1, 1) = G^{(2)}(1, 1) = R \qquad (6b)$$
$$G^{(1)}(0, 1) = G^{(2)}(0, 1) = S \qquad (6c)$$
$$G^{(1)}(1, 0) = G^{(2)}(1, 0) = T \qquad (6d)$$

Note that for n > 2, it is difficult to show the payoffs with a table. However, it is possible to represent them thanks to a Schelling diagram, which plots the payoffs as a function of the choice of cooperation and the number of other cooperators. An example is shown in Fig 1b.

D. Dilemmas due to concavity of utility functions

In our problem formulation, we can assume that f^(i)(s_{i,k}(t) + sum_{j=0}^{N-1} (X_k)_{i,j}) ~ G^(i)(c_{i,j}, c_{j,i}), where c_{i,j} can be estimated as an increasing function with respect to (X_k)_{i,j}. Moreover, we can show that, for well-balanced initial resources and due to the concavity of the utility functions, there exists an optimal sum of utility functions which does not correspond to null transactions. These two statements correspond to the two inequalities in III-C which define continuous multi-agent dilemmas.

IV. MODEL

The goal is to model an agent able to choose transactions so as to reach the optimal situation, while taking into account the selfishness of the other agents.

A. Architecture

As mentioned in the introduction, our agent model consists of four-step iterations:
1) Offer Policy: compute optimal demands and offers. Only the demands are communicated to the other agents (section IV-B).
2) Cooperation Degree Estimator: according to the previous allocations from the other agents, each agent estimates a cooperation degree for every agent (section IV-C).
3) Cooperation Response Policy: according to the estimated cooperation degrees, the policy computes response cooperation degrees (section IV-D).
4) Allocation: according to the cooperation degrees, the demands from the other agents and the personal offers, an allocation of resources for each agent and item is computed.

B. Offers/Demands

At each iteration of the algorithm, agents have to communicate their needs to the other agents and learn their offers (what they are willing to give). Though it is an important part, we do not focus on it in this paper. Nevertheless, one legitimate approach for an agent could be to demand a cumulative quantity equal to what he could give (with the idea to cooperate). Therefore, we simply propose that each agent optimise an expected "virtual" gain of utility:

$$\max_{Y^{(i)}} \sum_{k=1}^{M} f_k^{(i)}\big(s_{i,k} + Y_k^{(i)}\big) \qquad (7)$$

where Y^(i) is the vector of demands/offers for agent A_i: a positive (resp. negative) component corresponds to a demand (resp. offer).
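A possible reading of the offer/demand step (Eq. 7) is sketched below. The balancing constraint sum_k Y_k = 0 (an agent offers as much in total as it demands) is our interpretation of the "demand what he could give" idea above, not a constraint written explicitly in the paper, and the function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def offer_demand_vector(s_i, f=lambda x: np.log(x + 2.0)):
    # Sketch of Eq. (7): choose Y^(i) maximising sum_k f(s_{i,k} + Y_k).
    # Assumed extra constraints: sum_k Y_k = 0 (balanced exchange) and Y_k >= -s_{i,k}
    # (an agent cannot offer more than it owns).
    M = len(s_i)
    objective = lambda Y: -np.sum(f(s_i + Y))                  # maximise => minimise the negative
    constraints = [{"type": "eq", "fun": lambda Y: Y.sum()}]   # balanced demands/offers
    bounds = [(-q, None) for q in s_i]
    res = minimize(objective, np.zeros(M), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x  # positive components = demands, negative components = offers

print(np.round(offer_demand_vector(np.array([3.0, 0.0, 0.0])), 2))  # roughly [-2.  1.  1.]
```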

C. Cooperation degree detection

When collaborating, a major problem in sharing resources is to detect whether the opponents are cooperative. There is no distinction between the capacity and the willingness to cooperate, since collaboration between agents means exchanging equivalent quantities of resources: if an agent can no longer cooperate with another agent, there is no point in continuing to cooperate with him.

At each round, each agent A_i computes a cooperation degree c_{i,j}(t) towards every other agent A_j according to his own interest, simply by comparing what A_j gave with what he (A_i) was willing to give at the previous round t−1. We introduce the notation of the maximal offer Ω_k^(i) = [Y_k^(i)]^− (the offered quantity) that A_i can propose for item B_k (computed by A_i in section IV-B). We can then define c_{i,j}(t) as:

$$c_{i,j}(t) = \frac{(N-1) \sum_{k=0}^{M-1} (X_k)_{j,i}}{\sum_{k=0}^{M-1} \Omega_k^{(i)}} \qquad (8)$$
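Equation (8) can be transcribed directly; the code below is our sketch, in which the (N−1) factor is read as a rescaling because A_i's offer is spread over N−1 partners, and the numerator is taken, as in the text, to be what A_j actually gave to A_i at the previous round.

```python
import numpy as np

def cooperation_degree(received_from_j, max_offer_i, N):
    # Eq. (8): c_{i,j}(t) = (N - 1) * sum_k (amount A_j gave to A_i for item B_k)
    #                       / sum_k Omega_k^(i).
    # received_from_j[k]: quantity of item B_k that A_j transferred to A_i at round t-1.
    # max_offer_i[k]:     maximal offer Omega_k^(i) of A_i for item B_k.
    total_offer = float(np.sum(max_offer_i))
    if total_offer == 0.0:
        return 0.0  # A_i offered nothing, so there is nothing to compare against
    return (N - 1) * float(np.sum(received_from_j)) / total_offer

# Illustrative: 3 agents, A_i offered up to 1.0 in total, A_j returned 0.4 of it.
print(cooperation_degree(np.array([0.4, 0.0]), np.array([0.6, 0.4]), N=3))  # 0.8
```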
D. Cooperation response policy

To reach a safe negotiation among agents with personal interests, we introduce a cooperation response policy. It maps incoming cooperation degrees to outgoing degrees.

In [20], a robust algorithm called tit-for-tat (TFT) is introduced to solve the iterated prisoner's dilemma. The principle is simple: begin by cooperating and then copy the previous opponent action. For the continuous version of the iterated prisoner's dilemma, [21] proposes several adaptations of continuous TFT. We propose a new version of TFT with an inertia component and an adaptive incentive rate. At step t, agent A (facing agent B with cooperation degree b_{t−1}) chooses a cooperation rate a_t as follows:

$$a_0 = 0 \qquad (9a)$$
$$a_t = \gamma\, a_{t-1} + (1 - \gamma)\big(r_{t-1} + (1 - r_{t-1})\, b_{t-1}\big) \qquad (9b)$$
$$r_0 = r \qquad (9c)$$
$$r_t = \big[r_{t-1} + \alpha\, \delta_{t-1}\big]^+ \qquad (9d)$$
$$\text{with } \delta_{t-1} = b_{t-1} - a_{t-1}$$

where γ is an inertia coefficient and r_t is a coefficient to incentivise cooperation, which is considered constant in [21]. However, to reinforce the robustness to defectors, we turn it into an adaptive incentive coefficient with a rate α. The benefit of this modification is to adapt the incentive to cooperate according to the last response of the opponent: decrease it if the opponent does not answer positively, and keep it constant (or even increase it) if the answer is positive. Then, pure defectors cannot exploit this incentive behaviour for long.

To conclude, we denote the generic algorithm as TFT(γ, r, α). Let us note that a pure defector and a pure cooperator correspond respectively to TFT(γ = 1, r, α) and TFT(γ = 0, r = 1, α = 0).
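The response policy of Eqs. (9a)–(9d) amounts to a small stateful update rule; the sketch below is a direct transcription of the equations above (class and variable names are ours, and gamma stands for the inertia coefficient).

```python
class AdaptiveTFT:
    # TFT(gamma, r, alpha), Section IV-D:
    #   a_t = gamma * a_{t-1} + (1 - gamma) * (r_{t-1} + (1 - r_{t-1}) * b_{t-1})   (9b)
    #   r_t = max(0, r_{t-1} + alpha * (b_{t-1} - a_{t-1}))                         (9d)
    # where b_{t-1} is the opponent's cooperation degree observed at the previous round.

    def __init__(self, gamma, r, alpha):
        self.gamma, self.alpha = gamma, alpha
        self.r = r      # incentive coefficient r_t, initialised to r (9c)
        self.a = 0.0    # own cooperation degree a_t, initialised to 0 (9a)

    def respond(self, b_prev):
        # Return the cooperation degree a_t chosen in response to the opponent's b_{t-1}.
        delta = b_prev - self.a                                 # delta_{t-1} = b_{t-1} - a_{t-1}
        self.a = self.gamma * self.a + (1.0 - self.gamma) * (self.r + (1.0 - self.r) * b_prev)
        self.r = max(0.0, self.r + self.alpha * delta)          # adaptive incentive, clipped at 0
        return self.a

# Against a pure defector (b = 0), the adaptive incentive decays and the offered cooperation fades out.
agent = AdaptiveTFT(gamma=0.1, r=0.1, alpha=0.3)
print([round(agent.respond(0.0), 3) for _ in range(5)])
```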

E. Allocation

The main idea of the allocation is that agent A_i computes, for each item k, the maximal part of spectrum he can offer, Ω_k^(i) = [Y_k^(i)]^−, and allocates to each agent A_j a part of this offer proportional to the clipped demand of agent A_j, D̃_k^(j) = min(Y_k^(j), Ω_k^(i)), and to c_{i,j}, the cooperation degree between A_i and A_j:

$$(X_k)_{(i,j)} = \frac{c_{i,j}\, \tilde{D}_k^{(j)}}{\sum_{j'=0,\, j' \ne i}^{N-1} \tilde{D}_k^{(j')}}\, \Omega_k^{(i)} \qquad (10)$$

With this allocation, we ensure that (X_k)_{(i,j)} ≤ Ω_k^(i) and that it is increasing w.r.t. c_{i,j} and D̃_k^(j).
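For one item and one giving agent, Eq. (10) is a proportional split of the maximal offer, weighted by the response cooperation degrees. The sketch below is ours (variable names are assumptions):

```python
import numpy as np

def allocate_item(i, demands, max_offer, coop_degrees):
    # Sketch of Eq. (10) for a single item B_k and a single giver A_i.
    # demands[j]      : demand Y_k^(j) of agent A_j for item B_k
    # max_offer       : maximal offer Omega_k^(i) of A_i for item B_k
    # coop_degrees[j] : response cooperation degree c_{i,j} of A_i towards A_j
    # Returns the quantity allocated by A_i to every agent A_j for this item.
    clipped = np.minimum(demands, max_offer)  # clipped demands D~_k^(j) = min(Y_k^(j), Omega_k^(i))
    clipped[i] = 0.0                          # an agent does not allocate to itself
    total = clipped.sum()
    if total == 0.0:
        return np.zeros(len(demands))
    return coop_degrees * clipped / total * max_offer  # each share is bounded by Omega_k^(i)

# A_0 can offer 2.0 of the item; A_1 and A_2 demand 1.5 and 0.5; A_0 cooperates more with A_1.
print(allocate_item(0, np.array([0.0, 1.5, 0.5]), 2.0, np.array([0.0, 0.8, 0.4])))  # [0.  1.2 0.2]
```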

V. EXPERIMENTS

Different algorithms for sharing resources are evaluated¹. We introduce several metrics to study the safety, the incentive-compatibility and the speed of convergence of our algorithm, and we test its parameters on a fixed test case presented in Fig 3. For the simulations, we use a simple and unique concave function for every utility: for all k and i, f_k^(i) : x ↦ ln(x + 2).

¹The source code of our framework is available on GitHub: https://github.com/tlgleo/sharing resources

In Figures 4, 5 and 6, we show a simulation of our model. In Fig 4, all agents use TFT and we observe that social welfare and cooperation converge. After convergence, the cooperation detection, which is linked to previous transactions that are then very low, causes an unstable cooperation response. In Fig 5, we introduce a selfish agent (defector) and we observe that the other agents, TFT(γ = 0.1, r = 0.1, α = 0), are exploited since the defector earns more. At last, in Fig 6, we evaluate the relevance of the adaptive incentive rate α in the previous situation, and we observe that this rate is relevant since the TFT agents are not exploited anymore.

In what follows, we evaluate the risk of being exploited and the incentive one agent has to cooperate. For this purpose, we define some metrics (in V-A) and use them to evaluate the parameters (γ, r, α) of our TFT model.

A. Metrics

1) Efficiency: First, let us define the global social welfare, which corresponds to the sum of all agents' utilities:

$$SW(t) = \sum_{i=0}^{N-1} f^{(i)}(s(t)) = \sum_{i=0}^{N-1} \sum_{k=0}^{M-1} f_k^{(i)}(s_{i,k}(t)) \qquad (11)$$

Secondly, we compute the optimal social welfare SW_opt with a PSO algorithm [22]. Finally, we define the efficiency E as the ratio:

$$E(t) = \frac{SW(t) - SW_0}{SW_{opt} - SW_0} \qquad (12)$$

2) Speed: We define a metric measuring the speed of convergence to maximal efficiency:

$$Sp = \frac{1}{T\, E_{max}} \int_0^T E(t)\, dt \qquad (13)$$

Then, we have Sp in [0, 1], with Sp = 1 corresponding to the case where all optimal transactions are made at the first step.
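The efficiency and speed metrics of Eqs. (11)–(13) can be computed directly from a simulation trace; this is our sketch, with the integral of Eq. (13) approximated by a mean over discrete rounds.

```python
import numpy as np

def social_welfare(s, f=lambda x: np.log(x + 2.0)):
    # Eq. (11): SW(t) = sum_i sum_k f_k^(i)(s_{i,k}(t)), with the ln(x + 2) utility of Section V.
    return float(f(s).sum())

def efficiency(sw_t, sw_0, sw_opt):
    # Eq. (12): E(t) = (SW(t) - SW_0) / (SW_opt - SW_0).
    return (sw_t - sw_0) / (sw_opt - sw_0)

def speed(efficiency_trace, E_max=1.0):
    # Eq. (13): Sp = 1 / (T * E_max) * integral of E(t) over [0, T],
    # approximated here by the mean over the recorded rounds (a discretisation choice of ours).
    return float(np.mean(efficiency_trace)) / E_max

# Illustrative efficiency trace rising from 0 to 1 over five rounds.
print(round(speed([0.0, 0.5, 0.9, 1.0, 1.0]), 2))  # 0.68
```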

Figure 3: Simple example of sharing resources between three agents with three items of concave utilities. (a) Initial state. (b) Optimal transactions. (c) Exploited by one egoist. (d) With our model. Fig 3a shows the initial state and Figures 3b, 3c and 3d show three possible outcomes. Three simple TFT agents are able to reach the optimal situation (3b and Fig 4), but two TFT agents can be exploited by a pure defector (3c and Fig 5) due to the constant incentive coefficient. At last, with our adaptive incentive rate, our TFT agents are no longer exploited as much (3d and Fig 6).

3) Incentive-Compatibility: We adopt a metric proposed in [14] to measure incentive-compatibility, i.e. how much agents are encouraged to cooperate. We define IC(π_i) as the difference:

$$IC(\pi_i) = G^{(i)}(\pi_i, \pi_j) - G^{(i)}(D, \pi_j) \qquad (14)$$

where D is the pure defection policy. In other words, IC(π_i) computes the difference in payoff for an agent between following the policy π_i and defecting, when he faces agents following the same policy π_j.

Figure 4: Simulation with the example in Fig 3 with three agents TFT(γ = 0.5, r = 0.02, α = 0). (a) Evolution of social welfare and individual utilities. (b) Evolution of incoming and outgoing cooperation degrees.

4) Safety: In the same vein, we measure the risk an agent takes by cooperating. We define the safety Sf(π_i) as the difference:

$$Sf(\pi_i) = G^{(i)}(\pi_i, D) - G^{(i)}(D, D) \qquad (15)$$

Simply put, Sf(π_i) measures what an agent loses if it is exploited.
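Both metrics compare payoffs of policies played against each other. The toy sketch below (ours) evaluates Eqs. (14) and (15) on a single-round payoff function built from the R, S, T, P values of Section III-B, whereas in the paper G^(i) is the payoff obtained over a whole simulation.

```python
def incentive_compatibility(G_i, policy, defect):
    # Eq. (14): IC(pi_i) = G^(i)(pi_i, pi_j) - G^(i)(D, pi_j), with pi_j the same policy as pi_i.
    return G_i(policy, policy) - G_i(defect, policy)

def safety(G_i, policy, defect):
    # Eq. (15): Sf(pi_i) = G^(i)(pi_i, D) - G^(i)(D, D): what the policy loses when everyone else defects.
    return G_i(policy, defect) - G_i(defect, defect)

# One-shot illustration with classic payoffs R = 3, S = 0, T = 5, P = 1.
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
G = lambda mine, other: payoff[(mine, other)]
print(incentive_compatibility(G, "C", "D"), safety(G, "C", "D"))  # -2 -1: cooperation is neither IC nor safe here
```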

B. Evaluation

We evaluate several parameters of our proposed algorithm TFT(γ, r, α) thanks to the metrics described in V-A.

Figure 5: Simulation with the example in Figure 3 with two TFT(γ = 0.5, r = 0.2, α = 0) agents and one pure defector. (a) Evolution of social welfare and individual utilities. (b) Evolution of incoming and outgoing cooperation degrees.

Figure 6: Same simulation as in Fig 5 but with an adaptive incentive rate α = 0.3 to prevent exploitation by the selfish agent: TFT(γ = 0.5, r = 0.2, α = 0.3). (a) Evolution of social welfare and individual utilities. (b) Evolution of incoming and outgoing cooperation degrees.

   γ     r     α     Speed    IC     Safety
  0.1   0.2    0     0.77    -0.73   -1.21
  0.1   0.1    0     0.69    -0.5    -0.64
  0.1   0.05   0     0.597   -0.21   -0.33
  0.1   0.02   0     0.41     0.12   -0.13
  0.3   0.1    0     0.66    -0.49   -0.63
  0.5   0.1    0     0.6     -0.47   -0.6
  0.1   0.1    0.1   0.7     -0.26   -0.36
  0.1   0.1    0.3   0.69     0.06   -0.16
  0.1   0.1    0.5   0.69     0.21   -0.09
  0.1   0.1    0.7   0.7      0.28   -0.07

Table II: Results of the experiment showing the influence of the parameters γ, r and α.

Figure 7: Influence of the parameters r and α evaluated by the speed of convergence, safety and incentive-compatibility (the two latter metrics are normalised). (a) Influence of r. (b) Influence of α.

C. Discussion

The principle of the classic discrete TFT is to copy the previous action of the opponent after beginning with pure cooperation. However, when cooperation is continuous, the discrete actions (cooperate/defect) are replaced by a cooperation degree, and the algorithm must numerically compute a cooperation degree in response. The stakes are to incentivise cooperation, which corresponds to the coefficient r, and to be robust to defection, i.e. the algorithm does not need to be too fast. In Figure 7a, we notice that increasing r speeds up convergence, but safety and incentive-compatibility fall, meaning that the risk of being exploited becomes very high. In our test (Table II), increasing the inertia coefficient γ does not bring improvement; this is probably due to the stability of the agents' behaviours, but we are convinced that it would be suited to dynamic behaviours (which could be studied in further works). In contrast, our modification of the incentive coefficient r is rather relevant (Figure 7b), since increasing α allows us to reach a very satisfying safety and incentive-compatibility without altering the convergence speed.
VI. CONCLUSION AND PERSPECTIVES

We presented a formalisation of a non-cooperative game for exchanging resources with decreasing marginal utilities. We view the problem as a social dilemma: nobody is incentivised to cooperate (exchange resources) since the utility functions are monotonically increasing, but doing nothing is non-optimal due to the concavity of the agents' utility functions. We solved this problem with tit-for-tat algorithms combined with dedicated algorithms for the allocation of resources. We first formulated the problem of exchanging resources with concave utility functions as a sequential social dilemma. We then proposed extensions of the definitions of social dilemmas to a Markov game with more than two agents and with continuous cooperation. We proposed an adaptation of a continuous version of tit-for-tat with an adaptive incentive coefficient. We designed a simple algorithm for allocating resources to other agents without pricing, only with negotiation through tit-for-tat algorithms. We finally adopted some metrics to study key properties such as efficiency, speed, safety and incentive-compatibility. The results show that our algorithm is efficient: the transactions reach the optimal consensus, and the TFT agent is robust to defection and incentivised to cooperate. Our model solves iterated tit-for-tat between multiple agents in homogeneous, well-balanced initial states. Future work should strengthen it with a finer control of the offers and demands and a better cooperation response policy, possibly with techniques like self-play reinforcement learning.
REFERENCES

[1] Karsten Buckwitz, Jan Engelberg, and Gernot Rausch. Licensed shared access (LSA)—Regulatory background and view of administrations. In 9th International Conference on Cognitive Radio Oriented Wireless Networks and Communications, pages 413–416. IEEE, 2014.
[2] Ayman Chouayakh, Aurélien Bechler, Isabel Amigo, et al. Auction mechanisms for Licensed Shared Access: reserve prices and revenue-fairness trade offs. ACM SIGMETRICS Performance Evaluation Review, 46(3):43–48, 2019.
[3] Xia Zhou and Haitao Zheng. TRUST: A general framework for truthful double spectrum auctions. In IEEE INFOCOM, pages 999–1007, 2009.
[4] Yanjiao Chen, Jin Zhang, Kaishun Wu, and Qian Zhang. TAMES: A truthful auction mechanism for heterogeneous spectrum allocation. In Proceedings IEEE INFOCOM, pages 180–184, 2013.
[5] Huiyang Wang, Eryk Dutkiewicz, Gengfa Fang, et al. Spectrum sharing based on truthful auction in licensed shared access systems. In IEEE 82nd Vehicular Technology Conference, pages 1–5, 2015.
[6] Wei Ren, Randal W. Beard, and Ella M. Atkins. A survey of consensus problems in multi-agent coordination. In Proceedings of the American Control Conference, pages 1859–1864. IEEE, 2005.
[7] Wei Ren, Randal W. Beard, and Ella M. Atkins. Information consensus in multivehicle cooperative control. IEEE Control Systems Magazine, 27(2):71–82, 2007.
[8] Ruiliang Zhang and James Kwok. Asynchronous distributed ADMM for consensus optimization. In International Conference on Machine Learning, pages 1701–1709, 2014.
[9] Reza Olfati-Saber and Richard M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004.
[10] Anup Menon and John S. Baras. Collaborative extremum seeking for welfare optimization. In 53rd IEEE Conference on Decision and Control, pages 346–351, 2014.
[11] Yi Dong and Jie Huang. A leader-following rendezvous problem of double integrator multi-agent systems. Automatica, 49(5):1386–1391, 2013.
[12] Gary Cheng, Kabir Chandrasekher, and Jean Walrand. Static & dynamic appointment scheduling with stochastic gradient descent. In American Control Conference (ACC), pages 2092–2099. IEEE, 2019.
[13] Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, et al. Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 464–473. International Foundation for Autonomous Agents and Multiagent Systems, 2017.
[14] Adam Lerer and Alexander Peysakhovich. Maintaining cooperation in complex social dilemmas using deep reinforcement learning. 2017.
[15] Natasha Jaques, Angeliki Lazaridou, Edward Hughes, et al. Intrinsic social motivation via causal influence in multi-agent RL. 2018.
[16] Edward Hughes, Joel Z. Leibo, Matthew Phillips, et al. Inequity aversion improves cooperation in intertemporal social dilemmas. In Advances in Neural Information Processing Systems, pages 3326–3336, 2018.
[17] Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, et al. Learning to teach in cooperative multiagent reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6128–6136, 2019.
[18] John Nash. Non-cooperative games. Annals of Mathematics, pages 286–295, 1951.
[19] Thomas C. Schelling. Micromotives and Macrobehavior. W. W. Norton & Company, 2006.
[20] Robert Axelrod and William Donald Hamilton. The evolution of cooperation. Science, 211(4489):1390–1396, 1981.
[21] Tom Verhoeff. The trader's dilemma: A continuous version of the prisoner's dilemma. Computing Science Notes, 93(02), 1998.
[22] Russell Eberhart and James Kennedy. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, volume 4, pages 1942–1948. Citeseer, 1995.