POPL: G: Reasoning about Incentive Compatibility
Suguman Bansal
Rice University
[email protected]

Abstract

We study equilibrium computation in regular repeated games, a model of infinitely repeated, discounted, general-sum games where agent strategies are given by a finite automaton. Our goal is to compute, for subsequent querying and analysis, the set of all Nash equilibria in such a game. Our technical contribution is an algorithm for computing all Nash equilibria in a regular repeated game. We use our algorithm for automated analysis of properties such as incentive compatibility on complex multi-agent systems with reward-maximizing agents, such as the Bitcoin protocol.

1. Introduction

Games [26], defined as multi-agent systems with reward-maximizing agents, find applications in modeling and understanding properties in a wide variety of fields, such as economics [11, 13, 20, 25, 32], social sciences [8, 21, 22], and biology [4, 9, 18, 33]. With the emergence of massive online protocols such as auctions, adopted by cyber giants such as Google Auctions and eBay, games have generated much interest for utilitarian research within the computer science community. Bugs in these protocols have a tremendous impact on both the protocol providers and their large number of users. Hence, the analysis of properties of such systems is crucial.

One concrete instance of the analysis of such protocols arises in the context of the Bitcoin protocol. Bitcoin is one of the most widely used online currencies today. The protocol by which bitcoins are mined (minted) is called the Bitcoin protocol. Agents (miners) of the protocol receive rewards by mining more bitcoins. Earlier it was believed that miners receive the most reward when they mine following the honest policies of the protocol. Since the protocol rewarded miners in proportion to the number of bitcoins they mined, it was considered to be a fair protocol. However, a surprising result by Eyal and Sirer disproved this misconception [12]. They proved, via rigorous manual analysis, that miners can receive greater rewards by following dishonest policies. A game is said to be incentive compatible if it ensures that reward-maximizing agents behave according to the honest policies of the game. Hence, Eyal and Sirer proved that the Bitcoin protocol is not incentive compatible.

Incentive compatibility is a desirable property in games, since it ensures that all agents behave honestly in the game. However, erroneous system design is common. Therefore, analysis of properties such as incentive compatibility is critical for good game design. Unfortunately, as demonstrated in the Bitcoin protocol case, even today most of this analysis is performed manually. One cannot simply port results on verification of traditional systems to the context of games, since their verification differs due to the presence of reward-maximizing agents. More specifically, unlike in traditional systems, not all behaviors in a game need to be analyzed. In the analysis of games, one can safely ignore behaviors that do not conform with the reward-maximizing objectives of agents. Behaviors that conform with agent objectives are called the solution concepts of the game.

Algorithmic analysis of solution concepts in games has been widely studied in algorithmic game theory [24]. In this field, complex games are studied as repeated games [1, 14, 17]. An important solution concept is that of Nash equilibria [23]. Intuitively, a Nash equilibrium is a state of a game in which no agent can increase its reward by changing only its own behavior.

There is a wide literature on modeling repeated games as finite state machines [19, 27, 28]. The problem of computing Nash equilibria in repeated games has received a lot of interest. The seminal work by Abreu, Pearce, and Stacchetti [2], and its follow-ups [10], considered the problem of computing all or some equilibrium rewards rather than equilibria. The problem of efficiently computing a single equilibrium strategy in infinite games has been studied before [3, 15, 16]. There are also a few approaches to computing representations of approximations to the set of equilibria [5–7]. Except for work on finding one equilibrium, most other approaches do not guarantee crisp complexity bounds.

For our purpose of analyzing repeated games, we are interested in the problem of computing all Nash equilibria in a game. In this paper, we combine ideas from game theory with tools and techniques developed in programming languages and formal methods to realize this goal. First, we define a finite-state-machine-based framework to model repeated games, which we call regular repeated games. Next, we develop an algorithm, ComputeNash, to compute and enumerate all Nash equilibria in a regular repeated game. Unlike previous approaches, our algorithm has a crisp complexity bound and can determine all Nash equilibria in a regular repeated game. Lastly, as a proof of concept, we perform case studies on the Bitcoin protocol and repeated auctions, and corroborate previously known results on them via analysis by ComputeNash. The merit of our approach over earlier approaches to the analysis of games is that it is automated, and hence quicker and not prone to errors. For the Bitcoin protocol case, we first argue that the Bitcoin protocol can be viewed as a repeated game, and hence that we can apply ComputeNash to its analysis.

This paper is organized as follows. Section 2 provides the necessary preliminary background from game theory and programming languages. Section 3 introduces and defines regular repeated games as a model for repeated games. Section 4 gives a high-level description of ComputeNash. Lastly, Section 5 demonstrates how ComputeNash can be used for automatic analysis of incentive compatibility in the Bitcoin protocol and repeated auctions.

2. Preliminaries

2.1 Repeated Games

Repeated games [17] are a model for multi-agent systems in which agents interact with each other infinitely often. In each round of interaction, each agent picks an action. These actions are picked synchronously by all agents, i.e., in each round of interaction, agents are not aware of what actions the other agents will take in that round. However, agents become aware of the actions taken by the other agents in all previous rounds. Agents receive a payoff (or reward) based on the actions taken by each agent in an interaction. The payoff of an agent from a repeated game is computed by aggregating the rewards received by the agent in each round using the discounted sum. The discounted sum of a sequence of payoffs A = {a_0, a_1, a_2, ...} with discount factor d > 1, denoted by DS(A, d), is given by

    DS(A, d) = Σ_{i=0}^{∞} a_i / d^i.
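For concreteness, here is a minimal Python sketch (our illustration, not code from the paper) that approximates DS(A, d) by truncating the infinite sum; the helper name `discounted_sum` and the example payoff sequence are assumptions made only for this example.

```python
def discounted_sum(payoff, d, rounds=1000):
    """Approximate DS(A, d) = sum_{i >= 0} a_i / d^i by truncating the
    series after `rounds` terms. `payoff` maps a round index i to a_i,
    and d > 1 is the discount factor."""
    return sum(payoff(i) / d**i for i in range(rounds))

# Example: the periodic payoff sequence 1, 0, 1, 0, ... with d = 2 has
# discounted sum 1 + 1/4 + 1/16 + ... = 4/3.
print(discounted_sum(lambda i: 1 if i % 2 == 0 else 0, d=2))  # ~1.3333
```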
Each agent in a repeated game has its own strategies. Intuitively, a strategy can be thought of as a guideline for the agent on how to pick an action in a round based on the actions taken by all agents in previous rounds. For example, in an auction one strategy for an agent is to bid truthfully, and another strategy is to never bid truthfully. All agents pick their strategy for the entire game only once, at the beginning, and they pick their strategies synchronously. A strategy profile, denoted by M, is a k-tuple of strategies (M_1, ..., M_k), where M_i is the strategy of the i-th agent.

Various solution concepts have been defined as a means to characterize which strategy an agent should pick in order to receive greater rewards from a game. This is important, since we assume that agents in multi-agent systems are reward-maximizing by nature. Two such solution concepts are Nash equilibria and best-response strategies.

Before defining these concepts, we introduce some necessary notation. For each agent P_i and a strategy M' of P_i, we define the strategy profile M[i := M'] obtained by starting with M and then replacing its i-th strategy with M'.
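To make the M[i := M'] notation concrete, the following is a minimal Python sketch (our illustration only; the function name `deviate` and the string-named strategies are assumptions for this example, and agents are 0-indexed here rather than 1-indexed as in the text):

```python
def deviate(profile, i, new_strategy):
    """Return M[i := M']: the strategy profile `profile` with the
    strategy of the i-th agent replaced by `new_strategy`."""
    return profile[:i] + (new_strategy,) + profile[i + 1:]

# A strategy profile for k = 3 agents, with strategies named by strings.
M = ("truthful", "truthful", "never-truthful")
print(deviate(M, 1, "never-truthful"))
# -> ('truthful', 'never-truthful', 'never-truthful')
```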
3. Regular repeated games (RRG)

We define regular repeated games as a formal model for repeated games. Regular repeated games build on top of the well-established Rubinstein model for repeated games [27].

Formally, a regular repeated game G is given by a set of k agents, where the i-th agent P_i consists of a finite set Action(i) of actions and a finite set Strategy(i) of strategies. A strategy M_i for P_i is a Büchi automaton with weights. Figure 1 shows a strategy for the agent P_i. Each transition of the automaton represents one round of the repeated game. Transitions occur over the alphabet of pairs (a, (a)_{-i}), where a ∈ Action(i) is the action of the agent P_i and (a)_{-i} ∈ Π_{j ≠ i} Action(j) is the environment action. Agent P_i receives a reward along each transition. In Figure 1, the transition q_0 --(C,D),0--> q_1 denotes that when the agent acts C and the environment acts D, the agent moves to state q_1 and receives a reward of 0.

Intuitively, the accepting runs of a strategy M_i offer an agent-level view of executions of the game. Specifically, consider a transition (q, (a)_{-i}, a_i, p, q') that appears at the j-th position of an accepting run of M_i. This means that there is a possible execution of the game in which: (i) the i-th agent is at state q immediately before the j-th round of the game; (ii) in this round, the i-th agent and its environment synchronously perform the actions a_i and (a)_{-i}, respectively; (iii) the concurrent action leads agent P_i to transition to state q' at the end of this round and to receive payoff p.

Strategy profiles.  Now we define the semantics of interactions between agents P_1, ..., P_k that constitute a game G.
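As an illustrative sketch of how a weighted strategy automaton of this kind might be represented (our own encoding, not the paper's; the state names, the second transition, and the acceptance set are assumptions made for this example), consider:

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

State = str
AgentAction = str
EnvAction = str

@dataclass
class StrategyAutomaton:
    """A strategy for agent P_i as a Buchi automaton with weights: each
    transition reads a pair (agent action, environment action), moves to
    a successor state, and yields the reward for that round."""
    initial: State
    accepting: Set[State]
    # (state, agent action, environment action) -> (successor state, reward)
    delta: Dict[Tuple[State, AgentAction, EnvAction], Tuple[State, float]] = field(default_factory=dict)

    def step(self, state: State, a: AgentAction, env: EnvAction) -> Tuple[State, float]:
        """One round of the game taken from `state`."""
        return self.delta[(state, a, env)]

# Fragment built from the Figure 1 transition q_0 --(C,D),0--> q_1, plus a
# hypothetical self-loop on q_1 (not from the paper) so the example is closed.
M_i = StrategyAutomaton(
    initial="q0",
    accepting={"q1"},
    delta={
        ("q0", "C", "D"): ("q1", 0.0),  # agent plays C, environment plays D, reward 0
        ("q1", "C", "C"): ("q1", 1.0),  # assumed transition, for illustration only
    },
)
print(M_i.step("q0", "C", "D"))  # -> ('q1', 0.0)
```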