Best-Response Mechanisms

Best-Response Mechanisms Noam Nisan1 Michael Schapira2 Gregory Valiant3 Aviv Zohar4 1School of Eng. and Computer Science, The Hebrew University of Jerusalem 2Computer Science Dept., Princeton University 3Dept. of Computer Science, UC Berkeley 4Microsoft Research, Silicon Valley [email protected] [email protected] [email protected] [email protected] Abstract: Under many protocols—in computerized settings and in economics settings—participants repeatedly “best respond” to each others’ actions until the system “converges” to an equilibrium point. We ask when does such myopic “local rationality” imply “global rationality”, i.e., when is it best for a player, given that the others are repeatedly best-responding, to also repeatedly best-respond? We exhibit a class of games where this is indeed the case. We identify several environments of interest that fall within our class: models of the Border Gateway Protocol (BGP) [7], that handles routing on the Internet, and of the Transmission Control Protocol (TCP) [5], and also stable-roommates [3] and cost-sharing [9, 10], that have been extensively studied in economic theory. Keywords: Best Response; Mechanism Design; Incentive Compatible Dynamics 1 Introduction rational” dynamics, e.g., fictitious play or regret 1.1 Motivation: When is it Best to Best- minimization. Respond? Our focus in this paper is on a different ques- The basic object of study in game theory and tion that has received little attention so far: “Is in economics is the equilibrium: a “stable” state such locally rational behavior really rational?”. from which none of the players wish to deviate. Specifically, we consider games in which repeated Equilibrium is a static concept that often abstracts best-response dynamics do converge to an equilib- away the question of how it is reached. Once we rium and study the incentive properties of this pro- start looking at dynamics, or at algorithms for find- cess: Is it rational for players to repeatedly best- ing equilibria, we cannot escape questions of the respond? Can a long-sighted player improve, in form “How is an equilibrium reached?”. While the long run, over this repeated myopic optimiza- there can be different formalizations of this question? tion, in most cases, a truly satisfactory answer These questions about incentives are best ex- would have each player performing only simple plored in the context of games with incomplete “locally rational” actions and yet, mysteriously, information. Switching our attention from games the system would reach a global equilibrium. The with complete information to games with uncou- simplest example of such phenomena is repeated pled incomplete information, we see that repeated best-response dynamics: each player selects the best-response exhibits another attractive trait: to best (locally optimal) response to what others are best-respond each player need only know his own currently doing, and this process goes on “for a utility function (“type”), as his best response does while” until it “converges” to what must be a (pure not depend on other players’ utility functions, but Nash) equilibrium. Convergence of repeated best- only on their actions. Thus, we can view best- response is, unfortunately, not guaranteed in gen- response dynamics as a natural protocol for grad- eral, and is the subject of much research, as is ual and limited sharing of information in an effort the convergence of more sophisticated “locally- to reach an equilibrium. Indeed, in many real- 1 life contexts the interaction between decision mak- and, in fact, even in more general settings.1 We ers with incomplete information takes the form of discuss tie-breaking rules below. best-response dynamics (e.g., Internet routing [7]). Goal: Our general aim is to identify interesting When regarding best-response dynamics from this classes of (base) games for which best-response perspective, it is an indirect mechanism in the mechanisms are incentive-compatible. Intu- private-information mechanism-design sense. We itively, a best-response mechanism is incentive- wish to understand when such a mechanism, that compatible if, when all other players are repeat- dictates that all players repeatedly best-respond, is edly best-responding, then a player is incentivized incentive compatible. to do the same. Defining incentive compatibility in our setting involves many intricacies. We opt 1.2 The Setting to focus here on a very general notion of incentive compatibility that, we believe, captures essentially Let us begin by laying out our setting for study- any variant that the reader may desire; in a com- ing and formalizing incentives for repeated best- panion paper [11], we present several more games response. In our framework, each player holds (auctions) where only strictly weaker notions of a private utility function, and all players’ utility incentive compatibility can be obtained. Our no- functions, when put together, determine a full- tion of incentive compatibility here captures the information base game with some commonly- two following distinct but complementary points known strategy spaces. We desire that the out- of view: a mechanism design perspective and a come of the dynamics be an equilibrium of this learning equilibrium [1, 2] perspective. base game. Mechanism design perspective (in a prior-free Base game: We are given an n-player (one-shot) non-Bayesian setting): This point of view is nat- base game G, with players 1; : : : ; n, in which each ural when analyzing finite-time protocols in com- × × player i has strategy space Si, and S = S1 ::: puterized and economic settings. We are given a Sn. Each player i has a utility function ui such game with incomplete information G, where each 2 ⊆ × · · · × that (u1; : : : ; un) U U1 Un, where player’s utility function is private, and we wish ⊆ <jSj Ui is player i’s utility space. Each player to implement a pure Nash equilibrium (PNE) of knows only his own utility function, i.e., we view G. We point out that this uncommon objective— ui itself as player i’s type. implementing an equilibrium—proves to be a nat- Best-response mechanisms: We study a class ural implementation goal in many contexts (see of indirect mechanisms, that we term “repeated- Section 3, where we show that desirable outcomes response mechanisms”: players take turns select- can be regarded as “stable states”). Best-response ing strategies; at each (discrete) time step t, some mechanisms are incentive compatible, from this t 2 perspective, if the desired outcomes are imple- player it selects and announces strategy si mented in the ex-post Nash sense2. Importantly, Sit . Observe that one course of action avail- able to each player in a repeated-response mech- from this point of view, no actual play happens anism is to always choose a best-response to the during the process of best-response dynamics and most recently announced strategies of the oth- players merely announce strategies as their com- ers, that is, repeated-best-response. We call a munication with the mechanism; each player only repeated-response mechanism in which the pre- cares about maximizing his benefit from the final scribed behavior for each player is to repeatedly outcome of the mechanism, that is expected to ter- best-respond a “best-response mechanism”. To 1 fully-specify a best-response mechanism we must Our results actually hold even for (1) asynchronous player activation orders in which multiple players can best-respond specify (1) the starting state; (2) the order of player simultaneously or based on outdated information (as studied activations (which player is “active” when); and in [12]); (2) adaptive player activation orders that can change (3) for each player, a rule for breaking ties among based on the history of play; and also when (3) the mechanism multiple best responses. All of our results hold re- terminates as soon as all players “pass”, that is, each player repeats his last strategy. gardless of the initial state and of the order of play- 2The Revelation Principle then implies that the direct reve- ers’ activations (so long as it is “long enough”), lation mechanism is truthful (in the ex-post-nash sense). 2 minate after some finite predetermined number of rules, to the point that one may desire an intuitive time steps. justification for these choices. Roughly speaking, Learning equilibrium perspective: This point of there are two main, conflicting, intuitions: in some view is natural when analyzing environments such cases we simply ask players to break ties so as to as Internet protocols and global financial transac- be “nice” to others; in other cases we break ties actions, where players repeatedly interact with each cording to some “iterated-trembling-hand” logic. other and there is no “final turn”. Now, the players are actually involved in infinite repeated play of the 1.3 Games with Incentive-Compatible Best- Response Mechanisms incomplete-information game G and each player has a rule for selecting his next strategy based on Our main results are identifying a class of games the history of play. We are interested in the nat- for which best-response mechanisms are incen- ural rule that dictates that a player simply always tive compatible, and exhibiting several interesting best-respond to others’ most-recent strategies. In games that fall within this class (and thus have this context, each player wishes to maximize his incentive-compatible best-response mechanisms). long-term payoff, that we model to be the lim sup While at first glance, it might seem that the ex- of his stage utilities in this infinitely-played game3. istence of a unique PNE to which best-response Best-response mechanisms are incentive compati- dynamics are guaranteed to converge implies the ble, from this perspective, if the “best-response” incentive-compatibility of best-response mecha- rules are themselves in equilibrium in this infinite nisms, this intuition is false.

Best-Response Mechanisms

Evolutionary Stable Strategy Application of Nash Equilibrium in Biology

Game Theory]: Basics of Game Theory

Collusion Constrained Equilibrium

Contemporaneous Perfect Epsilon-Equilibria

The Computational Complexity of Evolutionarily Stable Strategies

Success Rates in Simplified Threshold Public Goods Games - a Theoretical Model

Monetary Policy Obeying the Taylor Principle Turns Prices Into Strategic Substitutes

Problem Set #8 Solutions: Introduction to Game Theory

Real-Time Monitoring in a Public Goods Game*

Monetary Policy Obeying the Taylor Principle Turns Prices Into Strategic Substitutes Camille Cornand, Frank Heinemann

Section Note 3

Quantum Tic-Tac-Toe: a Teaching Metaphor for Superposition In