Dynamics of Profit-Sharing Games
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Dynamics of Profit-Sharing Games John Augustine, Ning Chen, Edith Elkind, Angelo Fanelli, Nick Gravin, Dmitry Shiryaev School of Physical and Mathematical Sciences Nanyang Technological University, Singapore Abstract information about other players. Such agents may simply re- spond to their current environment without worrying about An important task in the analysis of multiagent sys- the subsequent reaction of other agents; such behavior is said tems is to understand how groups of selfish players to be myopic. Now, coalition formation by computationally can form coalitions, i.e., work together in teams. In limited agents has been studied by a number of researchers in this paper, we study the dynamics of coalition for- multi-agent systems, starting with the work of [Shehory and mation under bounded rationality. We consider set- Kraus, 1999] and [Sandholm and Lesser, 1997]. However, tings where each team’s profit is given by a concave myopic behavior in coalition formation received relatively lit- function, and propose three profit-sharing schemes, tle attention in the literature (for some exceptions, see [Dieck- each of which is based on the concept of marginal mann and Schwalbe, 2002; Chalkiadakis and Boutilier, 2004; utility. The agents are assumed to be myopic, i.e., Airiau and Sen, 2009]). In contrast, myopic dynamics of they keep changing teams as long as they can in- non-cooperative games is the subject of a growing body of crease their payoff by doing so. We study the prop- research (see, e.g. [Fabrikant et al., 2004; Awerbuch et al., erties (such as closeness to Nash equilibrium or to- 2008; Fanelli et al., 2008]). tal profit) of the states that result after a polyno- mial number of such moves, and prove bounds on In this paper, we merge these streams of research and ap- the price of anarchy and the price of stability of the ply techniques developed in the context of analyzing the dy- corresponding games. namics of non-cooperative games to coalition formation set- tings. In doing so, we depart from the standard model of games with transferable utility, which allows the players in 1 Introduction a team to share the payoff arbitrarily: indeed, such flexibility Cooperation and collaborative task execution are fundamen- will necessitate a complicated negotiation process whenever a tally important both for human societies and for multiagent player wants to switch teams. Instead, we consider three pay- systems. Indeed, many tasks are too complicated or resource- off models that are based on the concept of marginal utility, consuming to be executed by a single agent, and a collective i.e., the contribution that the player makes to his current team. effort is needed. Such settings are usually modeled using the Each of the payoff schemes, when combined with a coopera- framework of cooperative games, which specify the amount tive game, induces a non-cooperative game, whose dynamics of payoff that each subset of agents can achieve: when the can then be studied using the rich set of tools developed for game is played the agents split into teams (coalitions), and such games in recent years. the payoff of each team is divided among its members. We will now describe our payment schemes in more detail. The standard framework of cooperative game theory is We assume that we are given a concave cooperative game, static, i.e., it does not explain how the players arrive at a i.e., the values of the teams are given by a submodular func- particular set of teams and a payoff distribution. However, tion; the submodularity property means that a player is more understanding the dynamics of coalition formation is impor- useful when he joins a smaller team, and plays an important tant for many applications of cooperative games, and game role in our analysis. In our first scheme, the payment to each theorists have studied bargaining and coalition formation in agent is given by his marginal utility for his current team; by cooperative environments (see, e.g. [Chatterjee et al., 1993; submodularity, the total payment to the team members never Moldovanu and Winter, 1995; Yan, 2003]). Much of this exceeds the team’s value. This payment scheme rewards each research assumes that the agents are fully rational, i.e., can agent according to the value he creates; we will therefore call predict the consequences of their actions and maximize their these games Fair Value games. Our second scheme takes into (expected) utility based on these predictions. However, full account the history of the interaction: we keep track of the or- rationality is a strong assumption that is unlikely to hold in der in which the players have joined their teams, and pay each many real-life scenarios: first, the agents may not have the agent his marginal contribution to the coalition formed by the computational resources to infer their optimal strategies, and players who joined his current team before him. This ensures second, they may not be sophisticated enough to do so, or lack that the entire payoff of each team is fully distributed among 37 its members. Moreover, due to the submodularity property S by changing the strategy of player i from si to si, i.e., a player’s payoff never goes down as long as he stays with (S−i,si)=(s1,s2,...,si−1,si,si+1,...,sn). the same team. This payoff scheme is somewhat reminiscent Nash Equilibria and Dynamics Given a strategy profile of the reward schemes employed in industries with strong la- S =(s1,s2,...,sn), a strategy si ∈ Σi is an improvement bor unions; we will therefore refer to these games as Labor move for player i if ui(S−i,si) >ui(S); further, si is called Union games. Our third scheme can be viewed as a hybrid α i ui(S−i,s ) > (1 + α)ui(S) of the first two: it distributes the team’s payoff according to an -improvement move for if i , α> sb ∈ the players’ Shapley values, i.e., it pays each player his ex- where 0. A strategy i Σi is a best response for player i S pected marginal contribution to a coalition formed by its pre- in state if it yields the maximum possible payoff given u S ,sb ≥ decessors when players are reordered randomly; the resulting the strategy choices of the other players, i.e., i( −i i ) u S ,s s ∈ α games are called Shapley games. i( −i i) for any i Σi.An -best response move is both an α-improvement and a best response move. Our Contributions We study the equilibria and dynamics A (pure) Nash equilibrium is a strategy profile in which of the three games described above. We establish that all our every player plays her best response. Formally, S = games admit a Nash equilibrium. Further, we argue that in (s1,s2,...,sn) is a Nash equilibrium if for all i ∈ N and all these games the total profit of the system in a Nash equi- for any strategy s ∈ Σi we have ui(S) ≥ ui(S−i,s ).We librium, i.e., the sum of teams’ values, is within a factor of 2 i i denote the set of all (pure) Nash equilibria of a game G by from optimal. In addition, for the first two classes of games, NE(G). A profile S =(s ,...,sn) is called an α-Nash equi- a natural dynamic process quickly converges to a state that is 1 librium if no player can improve his payoff by more than a almost as good as a Nash equilibrium, and the best Nash equi- factor of (1+α) by deviating, i.e., (1+α)ui(S) ≥ ui(S−i,si) librium is in fact an optimal state. We also show that Labor for any i ∈ N and any ui ∈ Σi. The set of all α-Nash equilib- Union games have other desirable properties. α ria of G is denoted by NE (G).Inastrong Nash equilibrium, Related Work The games studied in this paper belong to no group of players can improve their payoffs by deviating, the class of potential games, introduced by Monderer and i.e., S =(s1,...,sn) is a strong Nash equilibrium if for all Shapley [1996]. In potential games, any sequence of im- I ⊆ N S s ,...,s and any strategy vector =(1 n) such that provements by players converges to a pure Nash equilibrium. si = si for i ∈ N \ I,ifui(S ) >ui(S) for some i ∈ I, then However, the number of steps can be exponential in the de- uj(S ) <uj(S) for some j ∈ I. scription of the game. The complexity of computing (ap- Let Δi(S) be the improvement in the player’s payoff if proximate) Nash equilibrium in various subclasses of poten- b he performs his best response, i.e., Δi(S)=ui(S−i,si ) − tial games such as congestion games, cut games, or party b ui(S), where s is the best response of player i in state affiliation games has received a lot of attention in recent i S. For any Z ⊆ N let ΔZ (S)= Δi(S), and let years [Fabrikant et al., 2004; Skopalik and Vocking,¨ 2008; i∈Z Δ(S)=ΔN (S).ANash dynamic (respectively, α-Nash Bhalgat et al., 2010]. Related issues are how long it takes dynamic) is any sequence of best response (respectively, α- for some form of best response dynamics to reach an equilib- best response) moves. A basic Nash dynamic (respectively, rium [Chien and Sinclair, 2007; Ackermann et al., 2008],or basic α-Nash dynamic) is any Nash dynamic (respectively, how good are the states reached after a polynomial number of α-Nash dynamic) such that at each state S the player i that steps [Awerbuch et al., 2008; Fanelli et al., 2008].