Different Modes of Discounting in Repeated Games


Different Modes of Discounting in Repeated Games

Stephen Wolff*, under the guidance of Jérôme Renault and Thomas Mariotti
Toulouse School of Economics
September 3, 2011

* Contact Stephen.Wolff@rice.edu with comments, corrections, or suggestions.

Abstract

In this paper we study different modes of discounting in infinitely repeated games with perfect monitoring. In particular, we explore the equilibrium payoff set of the infinitely repeated symmetric two-player prisoner's dilemma under geometric and quasi-hyperbolic discounting. For each mode of discounting, we analyze the settings in which players have identical and different discount factors, and we provide an explicit characterization of a payoff profile on the Pareto frontier.

1 Introduction

Repeated games have become a centerpiece in economic theory, describing repeated interactions among strategically thinking individuals. In most studies involving repeated games, two assumptions are made. The first is that players discount geometrically. Geometric discounting is one of the simplest and most mathematically rich modes of discounting. However, research suggests that humans discount hyperbolically (see Ainslie (1992)), assigning more weight to payoffs that occur closer to the present, a phenomenon known as "present bias". Laibson (1997) popularized a family of discount functions that approximate hyperbolic discounting and that capture its key features while also allowing for tractable mathematical analysis. This family of functions, called "quasi-hyperbolic" discount functions, is the form usually encountered in the literature, and the form we will use in this paper.

The second assumption commonly made in game-theoretic literature is that players use identical discount factors. While valid for certain settings, this assumption is by no means universally appropriate, and in fact it hides a time-dynamic aspect of the more general case. When players have different discount factors, the possibility of intertemporal trade becomes available. In particular, a relatively patient player can agree to grant a relatively impatient player higher rewards in earlier periods (which the impatient player weights more heavily) in exchange for higher rewards to the patient player in later periods (which the patient player weights more heavily). Such intertemporal trade can yield discounted total payoffs for the repeated game that lie outside the set of feasible rewards of the stage game.

With hyperbolic discounting another novel aspect arises, namely, time inconsistency of players' planned actions. A player exhibiting present bias may plan to follow one strategy to maximize her total repeated-game payoff; but because of her present bias, when she actually finds herself in later periods, she may prefer to play a different strategy. This time inconsistency means that subgame-perfect equilibria are no longer stable across time a priori, and we must refine our equilibrium concept, using instead the idea of Strotz-Pollak equilibria.
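To fix ideas before the formal analysis, the short sketch below (not part of the original paper; the values β = 0.6 and δ = 0.9 are purely illustrative) tabulates the weight that geometric and quasi-hyperbolic (β-δ) discounting place on a reward received t periods from now.

```python
# A minimal, illustrative sketch (not from the paper): tabulate the weight that
# geometric and quasi-hyperbolic (beta-delta) discounting place on a reward
# received t periods from now. The parameter values below are hypothetical.

def geometric_weight(t: int, delta: float = 0.9) -> float:
    """Geometric weight delta**t on a reward t periods in the future."""
    return delta ** t

def quasi_hyperbolic_weight(t: int, beta: float = 0.6, delta: float = 0.9) -> float:
    """Laibson-style beta-delta weight: 1 today, beta * delta**t for t >= 1."""
    return 1.0 if t == 0 else beta * delta ** t

if __name__ == "__main__":
    print(f"{'t':>2}  {'geometric':>9}  {'quasi-hyperbolic':>16}")
    for t in range(6):
        print(f"{t:>2}  {geometric_weight(t):>9.3f}  {quasi_hyperbolic_weight(t):>16.3f}")
    # The quasi-hyperbolic weights drop sharply between t = 0 and t = 1 and then
    # decline geometrically -- the "present bias" described in the text.
```

Under geometric discounting the ratio of weights between consecutive periods is a constant δ, whereas under the β-δ weights the ratio between t = 0 and t = 1 is βδ < δ; this sharp initial drop is the source of the present bias and of the time-inconsistency issues discussed above.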
This paper is organized as follows. We start by detailing the model we will use in our study. In section 3 we then analyze the model under geometric discounting, first laying out the well-studied baseline case of identical discount factors before moving to the case of different discount factors. Next, in section 4 we turn to hyperbolic discounting, again looking at the cases of identical and different discount factors. In section 5 we summarize our discussion and give some concluding remarks.

2 The Model

Consider a stage game $G = (N, (A_i)_{i \in N}, (u_i)_{i \in N})$, where $N = \{1, \dots, n\}$ is the (finite) set of players, $A_i$ is the (finite) set of actions for player i, and $u_i : A \to \mathbb{R}$ is the utility function for player i, describing the utility player i receives from each pure action profile in $A = A_1 \times \cdots \times A_n$. We allow players to use mixed actions in the stage game, so that the relevant stage game is the mixed extension of G, in which each player i chooses an action $\alpha_i \in \Delta A_i$ and receives the expected stage-game utility, or reward, given by

$$u_i(\alpha) = \sum_{a \in A} \Big( \prod_{j=1,\dots,n} \alpha_j(a_j) \Big) u_i(a),$$

where $\alpha_j(a_j)$ denotes the probability assigned by the mixed action $\alpha_j$ to the pure action $a_j$.

The particular stage game that we will use throughout this study is the symmetric two-player prisoner's dilemma shown in Figure 1.

          L        R
   T    1, 1    −1, 2
   B    2, −1    0, 0

Figure 1: The Prisoner's Dilemma.

We will assume the existence of a public randomization device; this ensures that the set of stage-game rewards (and later in the analysis, the set of continuation payoffs and the set of equilibrium payoffs) is convex. Let P denote the set of outcomes of the public randomization device. Given a public randomization device, the set V of feasible stage-game rewards is the convex hull of the set of pure-action rewards:

$$V = \operatorname{conv}\{u(a) \mid a \in A\}.$$

Let $\underline{v}_i$ denote the minimax reward for player i in the stage game G:

$$\underline{v}_i = \min_{\alpha_{-i} \in \Delta A_{-i}} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i}).$$

Let $IR_i = \{v \in V \mid v_i \ge \underline{v}_i\}$ denote the half-space of individually rational rewards for player i, and let IR denote the intersection of all such half-spaces for all players:

$$IR = \bigcap_{i \in N} IR_i = \{v \in V \mid v_i \ge \underline{v}_i \ \forall i \in N\}.$$

Similarly, let $IR_i^* = \{v \in V \mid v_i > \underline{v}_i\}$ denote the half-space of strictly individually rational rewards for player i, and let $IR^*$ denote the intersection of all such half-spaces for all players:

$$IR^* = \bigcap_{i \in N} IR_i^* = \{v \in V \mid v_i > \underline{v}_i \ \forall i \in N\}.$$

For the prisoner's dilemma in Figure 1, the set V of feasible stage-game rewards is the solid parallelogram with its four vertices at the four pure reward profiles. The minimax reward for each player i is $\underline{v}_i = 0$. The set IR of individually rational reward profiles is thus this parallelogram V intersected with the nonnegative quadrant, and the set $IR^*$ of strictly individually rational rewards is V intersected with the strictly positive quadrant.

The stage game G is repeated infinitely many times, starting at period t = 0. We will assume perfect monitoring; that is, at the end of each stage, all players can perfectly observe the actions chosen in that stage (as well as the outcomes of all previous stages) and can condition their future actions on these outcomes. The stage-t history of the repeated game is a vector $h^t$ that includes all past actions of all players, i.e. $(a^0, \dots, a^{t-1}) \in A^t$, and all past realized values of the public randomization device, i.e. $(p^0, \dots, p^{t-1}) \in P^t$. We denote the set of all possible stage-t histories by $H^t = A^t \times P^t$. Players observe the realization of the public randomization device in period t before choosing their actions, so that a stage-t pure strategy for player i is a map $s_i : H^t \times P \to A_i$.
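The definitions above can be checked numerically for the prisoner's dilemma of Figure 1. The sketch below is illustrative only (it is not part of the paper); it evaluates the expected reward $u_1(\alpha)$ for a mixed-action profile and approximates player 1's minimax reward by searching over a grid of player 2's mixed actions, recovering the value 0 stated above.

```python
import numpy as np

# Illustrative sketch (not from the paper). Player 1's payoffs in the
# prisoner's dilemma of Figure 1: rows are her actions (T, B), columns are
# player 2's actions (L, R).
U1 = np.array([[1.0, -1.0],
               [2.0,  0.0]])

def expected_reward(alpha1, alpha2, payoff=U1):
    """u_i(alpha): sum over pure profiles of prod_j alpha_j(a_j) * u_i(a)."""
    return float(alpha1 @ payoff @ alpha2)

def minimax_reward(payoff=U1, grid=1001):
    """Approximate min over alpha_2 of max over pure a_1 of u_1(a_1, alpha_2)
    by gridding player 2's mixed actions; a grid suffices for this 2x2 example."""
    values = []
    for q in np.linspace(0.0, 1.0, grid):      # q = probability player 2 plays L
        alpha2 = np.array([q, 1.0 - q])
        values.append(max(payoff @ alpha2))    # player 1 best-responds with a pure action
    return min(values)

if __name__ == "__main__":
    alpha1 = np.array([0.5, 0.5])              # player 1 mixes 50/50 over T and B
    alpha2 = np.array([0.5, 0.5])              # player 2 mixes 50/50 over L and R
    print("u_1(alpha) =", expected_reward(alpha1, alpha2))    # 0.5
    print("minimax reward for player 1 =", minimax_reward())  # 0.0
```

For this game the minimizing opponent action is the pure action R, against which player 1's best response B yields 0, matching the grid search.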
2.1 A Remark on Discounting

In what follows we consider cases in which players use various modes of discounting to evaluate their payoff streams. However, we should note that discounting is not the only method for representing players' preferences over time. Osborne and Rubinstein (1994) detail two other methods. Under the limit of means criterion, player i strictly prefers the stream of rewards $(v_i^t)_{t=0}^{+\infty}$ to the stream of rewards $(w_i^t)_{t=0}^{+\infty}$ if and only if

$$\liminf_{T \to +\infty} \frac{1}{T} \sum_{t=0}^{T} (v_i^t - w_i^t) > 0.$$

Under the overtaking criterion, player i strictly prefers the stream of rewards $(v_i^t)_{t=0}^{+\infty}$ to $(w_i^t)_{t=0}^{+\infty}$ if and only if

$$\liminf_{T \to +\infty} \sum_{t=0}^{T} (v_i^t - w_i^t) > 0.$$

Whereas discounting weights a given reward less and less the further away in time the reward is received (for discount factors less than one), the limit of means and overtaking criteria weight all time periods equally. There are convincing economic arguments for using discounting, and it is by far the most commonly encountered method of representing time-dependent preferences in the literature. Therefore, we will restrict our attention to discounting in the remainder of the paper.

3 Geometric Discounting

Geometric discounting is the most widely used representation of time-dependent preferences. Under geometric discounting, player i has a discount factor $\delta_i \in (0, 1)$; a reward obtained t periods after the current period is multiplied by the weight $\delta_i^t$. Note that, as $t \to +\infty$, the weight $\delta_i^t \to 0$, so that if the set $V_i$ of player i's feasible stage-game rewards is bounded, then the discounted infinite sum of any reward stream $(v_i^t)_{t=0}^{+\infty}$ is well-defined, where $v_i^t \in V_i$ for all t. In practice, we usually define player i's payoff in the infinitely repeated game to be

$$\pi_i(\sigma) = (1 - \delta_i)\, \mathbb{E}_\sigma\!\left[\sum_{t=0}^{+\infty} \delta_i^t\, u_i(a^t)\right], \qquad (1)$$

where $\sigma$ is any behavior-strategy profile of the infinitely repeated game and where the expectation is taken with respect to the probability distribution generated by $\sigma$ over the (infinite) terminal histories. The normalization factor $1 - \delta_i$ is included to render the payoffs of the infinitely repeated game directly comparable to the rewards of the stage game.

Under geometric discounting, the repeated game assumes a symmetric structure through time: the set of player i's continuation payoffs $C_i^k$ in any future period k is identical to her set of feasible payoffs:

$$C_i^k(\sigma) = (1 - \delta_i)\, \mathbb{E}_\sigma\!\left[\sum_{t=k+1}^{+\infty} \delta_i^{t-(k+1)}\, u_i(a^t)\right]. \qquad (2)$$

The equivalence between the continuation payoff (2) and the feasible payoff (1) is easily seen by reindexing the sum in (2) with $t' = t - (k+1)$.
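A small numerical sketch (again not from the paper; the constant reward stream and the 200-period horizon are illustrative stand-ins for an infinite stream) makes the normalization in (1) and the reindexing argument behind (2) concrete: evaluating (2) at period k gives the same number as evaluating (1) on the stream restarted at period k + 1.

```python
# Illustrative sketch (not from the paper): the normalized payoff (1) and the
# continuation payoff (2) for a truncated reward stream, showing that (2) is
# just (1) applied to the stream restarted at period k + 1.

def normalized_payoff(rewards, delta):
    """(1 - delta) * sum_t delta**t * rewards[t], cf. equation (1)."""
    return (1.0 - delta) * sum(delta ** t * r for t, r in enumerate(rewards))

def continuation_payoff(rewards, delta, k):
    """(1 - delta) * sum_{t >= k+1} delta**(t - (k+1)) * rewards[t], cf. equation (2)."""
    return (1.0 - delta) * sum(delta ** (t - (k + 1)) * rewards[t]
                               for t in range(k + 1, len(rewards)))

if __name__ == "__main__":
    delta = 0.9
    # A finite stand-in for an infinite stream: mutual cooperation in the
    # Figure 1 game would give the constant stream 1, 1, 1, ...
    rewards = [1.0] * 200
    print(normalized_payoff(rewards, delta))            # close to 1.0
    k = 5
    # Reindexing with t' = t - (k+1) turns (2) into (1) for the shifted stream:
    print(continuation_payoff(rewards, delta, k))
    print(normalized_payoff(rewards[k + 1:], delta))    # same value
```

With the constant cooperation stream of 1's, both expressions approach 1, the stage-game reward, which is exactly what the $(1 - \delta_i)$ normalization is meant to deliver.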
Recommended publications
  • Lecture Notes
    GRADUATE GAME THEORY LECTURE NOTES BY OMER TAMUZ California Institute of Technology 2018 Acknowledgments These lecture notes are partially adapted from Osborne and Rubinstein [29], Maschler, Solan and Zamir [23], lecture notes by Federico Echenique, and slides by Daron Acemoglu and Asu Ozdaglar. I am indebted to Seo Young (Silvia) Kim and Zhuofang Li for their help in finding and correcting many errors. Any comments or suggestions are welcome. 2 Contents 1 Extensive form games with perfect information 7 1.1 Tic-Tac-Toe ........................................ 7 1.2 The Sweet Fifteen Game ................................ 7 1.3 Chess ............................................ 7 1.4 Definition of extensive form games with perfect information ........... 10 1.5 The ultimatum game .................................. 10 1.6 Equilibria ......................................... 11 1.7 The centipede game ................................... 11 1.8 Subgames and subgame perfect equilibria ...................... 13 1.9 The dollar auction .................................... 14 1.10 Backward induction, Kuhn’s Theorem and a proof of Zermelo’s Theorem ... 15 2 Strategic form games 17 2.1 Definition ......................................... 17 2.2 Nash equilibria ...................................... 17 2.3 Classical examples .................................... 17 2.4 Dominated strategies .................................. 22 2.5 Repeated elimination of dominated strategies ................... 22 2.6 Dominant strategies ..................................
  • Cooperative Strategies in Groups of Strangers: an Experiment
    KRANNERT SCHOOL OF MANAGEMENT Purdue University West Lafayette, Indiana Cooperative Strategies in Groups of Strangers: An Experiment By Gabriele Camera Marco Casari Maria Bigoni Paper No. 1237 Date: June, 2010 Institute for Research in the Behavioral, Economic, and Management Sciences Cooperative strategies in groups of strangers: an experiment Gabriele Camera, Marco Casari, and Maria Bigoni* 15 June 2010 * Camera: Purdue University; Casari: University of Bologna, Piazza Scaravilli 2, 40126 Bologna, Italy, Phone: +39- 051-209-8662, Fax: +39-051-209-8493, [email protected]; Bigoni: University of Bologna, Piazza Scaravilli 2, 40126 Bologna, Italy, Phone: +39-051-2098890, Fax: +39-051-0544522, [email protected]. We thank Jim Engle-Warnick, Tim Cason, Guillaume Fréchette, Hans Theo Normann, and seminar participants at the ESA in Innsbruck, IMEBE in Bilbao, University of Frankfurt, University of Bologna, University of Siena, and NYU for comments on earlier versions of the paper. This paper was completed while G. Camera was visiting the University of Siena as a Fulbright scholar. Financial support for the experiments was partially provided by Purdue’s CIBER and by the Einaudi Institute for Economics and Finance. Abstract We study cooperation in four-person economies of indefinite duration. Subjects interact anonymously playing a prisoner’s dilemma. We identify and characterize the strategies employed at the aggregate and at the individual level. We find that (i) grim trigger well describes aggregate play, but not individual play; (ii) individual behavior is persistently heterogeneous; (iii) coordination on cooperative strategies does not improve with experience; (iv) systematic defection does not crowd-out systematic cooperation.
  • Games with Delays. a Frankenstein Approach
    Games with Delays – A Frankenstein Approach Dietmar Berwanger and Marie van den Bogaard LSV, CNRS & ENS Cachan, France Abstract We investigate infinite games on finite graphs where the information flow is perturbed by non- deterministic signalling delays. It is known that such perturbations make synthesis problems virtually unsolvable, in the general case. On the classical model where signals are attached to states, tractable cases are rare and difficult to identify. Here, we propose a model where signals are detached from control states, and we identify a subclass on which equilibrium outcomes can be preserved, even if signals are delivered with a delay that is finitely bounded. To offset the perturbation, our solution procedure combines responses from a collection of virtual plays following an equilibrium strategy in the instant- signalling game to synthesise, in a Frankenstein manner, an equivalent equilibrium strategy for the delayed-signalling game. 1998 ACM Subject Classification F.1.2 Modes of Computation; D.4.7 Organization and Design Keywords and phrases infinite games on graphs, imperfect information, delayed monitoring, distributed synthesis 1 Introduction Appropriate behaviour of an interactive system component often depends on events gener- ated by other components. The ideal situation, in which perfect information is available across components, occurs rarely in practice – typically a component only receives signals more or less correlated with the actual events. Apart of imperfect signals generated by the system components, there are multiple other sources of uncertainty, due to actions of the system environment or to unreliable behaviour of the infrastructure connecting the compo- nents: For instance, communication channels may delay or lose signals, or deliver them in a different order than they were emitted.
  • Repeated Games
    Repeated games Felix Munoz-Garcia Strategy and Game Theory - Washington State University Repeated games are very usual in real life: 1 Treasury bill auctions (some of them are organized monthly, but some are even weekly), 2 Cournot competition is repeated over time by the same group of firms (firms simultaneously and independently decide how much to produce in every period). 3 OPEC cartel is also repeated over time. In addition, players' interaction in a repeated game can help us rationalize cooperation... in settings where such cooperation could not be sustained should players interact only once. We will therefore show that, when the game is repeated, we can sustain: 1 Players' cooperation in the Prisoner's Dilemma game, 2 Firms' collusion: 1 Setting high prices in the Bertrand game, or 2 Reducing individual production in the Cournot game. 3 But let's start with a more "unusual" example in which cooperation also emerged: Trench warfare in World War I (Harrington, Ch. 13). Despite all the killing during that war, peace would occasionally flare up as the soldiers in opposing trenches would achieve a truce. Examples: The hour of 8:00-9:00am was regarded as consecrated to "private business," No shooting during meals, No firing artillery at the enemy's supply lines. One account in Harrington: After some shooting a German soldier shouted out "We are very sorry about that; we hope no one was hurt. It is not our fault, it is that dammed Prussian artillery" But... how was that cooperation achieved? We can assume that each soldier values killing the enemy, but places a greater value on not getting killed.
  • Role Assignment for Game-Theoretic Cooperation ⋆
    Role Assignment for Game-Theoretic Cooperation ? Catherine Moon and Vincent Conitzer Duke University, Durham, NC 27708, USA Abstract. In multiagent systems, often agents need to be assigned to different roles. Multiple aspects should be taken into account for this, such as agents' skills and constraints posed by existing assignments. In this paper, we focus on another aspect: when the agents are self- interested, careful role assignment is necessary to make cooperative be- havior an equilibrium of the repeated game. We formalize this problem and provide an easy-to-check necessary and sufficient condition for a given role assignment to induce cooperation. However, we show that finding whether such a role assignment exists is in general NP-hard. Nevertheless, we give two algorithms for solving the problem. The first is based on a mixed-integer linear program formulation. The second is based on a dynamic program, and runs in pseudopolynomial time if the number of agents is constant. However, the first algorithm is much, much faster in our experiments. 1 Introduction Role assignment is an important problem in the design of multiagent systems. When multiple agents come together to execute a plan, there is generally a natural set of roles in the plan to which the agents need to be assigned. There are, of course, many aspects to take into account in such role assignment. It may be impossible to assign certain combinations of roles to the same agent, for example due to resource constraints. Some agents may be more skilled at a given role than others. In this paper, we assume agents are interchangeable and instead consider another aspect: if the agents are self-interested, then the assignment of roles has certain game-theoretic ramifications.
  • Interactively Solving Repeated Games. a Toolbox
    Interactively Solving Repeated Games: A Toolbox Version 0.2 Sebastian Kranz* March 27, 2012 University of Cologne Abstract This paper shows how to use my free software toolbox repgames for R to analyze infinitely repeated games. Based on the results of Gold- luecke & Kranz (2012), the toolbox allows to compute the set of pure strategy public perfect equilibrium payoffs and to find optimal equilib- rium strategies for all discount factors in repeated games with perfect or imperfect public monitoring and monetary transfers. This paper explores various examples, including variants of repeated public goods games and different variants of repeated oligopolies, like oligopolies with many firms, multi-market contact or imperfect monitoring of sales. The paper also explores repeated games with infrequent external auditing and games were players can secretly manipulate signals. On the one hand the examples shall illustrate how the toolbox can be used and how our algorithms work in practice. On the other hand, the exam- ples may in themselves provide some interesting economic and game theoretic insights. *[email protected]. I want to thank the Deutsche Forschungsgemeinschaft (DFG) through SFB-TR 15 for financial support. 1 CONTENTS 2 Contents 1 Overview 4 2 Installation of the software 5 2.1 Installing R . .5 2.2 Installing the package under Windows 32 bit . .6 2.3 Installing the package for a different operating system . .6 2.4 Running Examples . .6 2.5 Installing a text editor for R files . .7 2.6 About RStudio . .7 I Perfect Monitoring 8 3 A Simple Cournot Game 8 3.1 Initializing the game .
  • New Perspectives on Time Inconsistency
    Behavioural Macroeconomic Policy: New perspectives on time inconsistency Michelle Baddeley July 19, 2019 arXiv:1907.07858v1 [econ.TH] 18 Jul 2019 Abstract This paper brings together divergent approaches to time inconsistency from macroeconomic policy and behavioural economics. Developing Barro and Gordon's models of rules, discretion and reputation, behavioural discount functions from behavioural microeconomics are embedded into Barro and Gordon's game-theoretic analysis of temptation versus enforcement to construct an encompassing model, nesting combinations of time consistent and time inconsistent preferences. The analysis presented in this paper shows that, with hyperbolic/quasi-hyperbolic discounting, the enforceable range of inflation targets is narrowed. This suggests limits to the effectiveness of monetary targets, under certain conditions. The paper concludes with a discussion of monetary policy implications, explored specifically in the light of current macroeconomic policy debates. JEL codes: D03 E03 E52 E61 Keywords: time-inconsistency behavioural macroeconomics macroeconomic policy 1 Introduction Contrasting analyses of time inconsistency are presented in macroeconomic policy and behavioural economics - the first from a rational expectations perspective [12, 6, 7], and the second building on insights from behavioural economics about bounded rationality and its implications in terms of present bias and hyperbolic/quasi-hyperbolic discounting [13, 11, 8, 9, 18]. Given the divergent assumptions about rational choice across these two approaches, it is not so surprising that there have been few attempts to reconcile these different analyses of time inconsistency. This paper fills the gap by exploring the distinction between inter-personal time inconsistency (rational expectations models) and intra-personal time inconsistency (behavioural economic models).
  • Cooperation in the Finitely Repeated Prisoner's Dilemma
    Cooperation in the Finitely Repeated Prisoner's Dilemma Matthew Embrey (U. of Sussex), Guillaume R. Fréchette (NYU), Sevgi Yuksel (UCSB) June, 2017 Abstract More than half a century after the first experiment on the finitely repeated prisoner's dilemma, evidence on whether cooperation decreases with experience–as suggested by backward induction–remains inconclusive. This paper provides a meta-analysis of prior experimental research and reports the results of a new experiment to elucidate how cooperation varies with the environment in this canonical game. We describe forces that affect initial play (formation of cooperation) and unraveling (breakdown of cooperation). First, contrary to the backward induction prediction, the parameters of the repeated game have a significant effect on initial cooperation. We identify how these parameters impact the value of cooperation–as captured by the size of the basin of attraction of Always Defect–to account for an important part of this effect. Second, despite these initial differences, the evolution of behavior is consistent with the unraveling logic of backward induction for all parameter combinations. Importantly, despite the seemingly contradictory results across studies, this paper establishes a systematic pattern of behavior: subjects converge to use threshold strategies that conditionally cooperate until a threshold round; and conditional on establishing cooperation, the first defection round moves earlier with experience. Simulation results generated from a learning model estimated at the subject level provide insights into the long-term dynamics and the forces that slow down the unraveling of cooperation. We wish to thank the editor, anonymous referees, James Andreoni, Yoella Bereby-Meyer, Russell Cooper, Pedro Dal Bó, Douglas DeJong, Robert Forsythe, Drew Fudenberg, Daniel Friedman, P.J.
  • Lecture 19: Infinitely Repeated Games
    Lecture 19: Infinitely Repeated Games Mauricio Romero Introduction to Infinitely Repeated Games One of the features of finitely repeated games was that if the stage game had a unique Nash equilibrium, then the only subgame perfect Nash equilibrium was the repetition of that unique stage game Nash equilibrium. This happened because there was a last period from which we could induct backwards (and there was a domino effect!). When the game is instead infinitely repeated, this argument no longer applies since there is no such thing as a last period. Let's first define what an infinitely repeated game is. We start with a stage game whose utilities are given by u1, u2, ..., un.
  • Advanced Theory Field Exam, August 2013
    Theory Field Examination Behavioral economics (219A) Aug. 2013 Problem 1. Suppose that Jay ("J") will live for T = 3 periods (periods 1, 2, and 3). In each of these three periods he chooses whether to consume sugary drinks, at = 1, or not to consume sugary drinks, at = 0. Sugary drinks are addictive, and J (who can get such drinks for free and has no other source of pleasure or displeasure in life) has preferences in each of periods 1, 2, and 3 of the form: ut(at = 0 at 1 = 0) = 0 j ut(at = 1 at 1 = 0) = 1 j ut(at = 0 at 1 = 1) = j 1 ut(at = 1 at 1 = 1) = 1 j These preferences capture the notion that consuming in the previous period makes J addicted: it means he gets less utility from continuing to consume sugary drinks than he does from consuming for the first time – he no longer gets the burst of pleasure and even gets a "hangover" – but withdrawing is very unpleasant. J dies at the beginning of Period 4, and his utility from that point on doesn't depend on what he does in his lifetime. Importantly, assume that a0 = 0 in all parts below: J is born unaddicted. Suppose that J has (quasi-)hyperbolic discounting preferences with β < 1 and δ = 1. J might be either sophisticated or naive. For all questions below, don't worry about specifying behavior for any knife-edge values of β that make J indifferent among choices in any contingency. a) What will J do in Period 3, as a function of β, whether he is sophisticated or naive, and whether he enters the period having chosen a2 = 0 or a2 = 1 in the previous period? Briefly give an intuition for your answer.
  • Evolutionary Game Theory: a Renaissance
    games Review Evolutionary Game Theory: A Renaissance Jonathan Newton ID Institute of Economic Research, Kyoto University, Kyoto 606-8501, Japan; [email protected] Received: 23 April 2018; Accepted: 15 May 2018; Published: 24 May 2018 Abstract: Economic agents are not always rational or farsighted and can make decisions according to simple behavioral rules that vary according to situation and can be studied using the tools of evolutionary game theory. Furthermore, such behavioral rules are themselves subject to evolutionary forces. Paying particular attention to the work of young researchers, this essay surveys the progress made over the last decade towards understanding these phenomena, and discusses open research topics of importance to economics and the broader social sciences. Keywords: evolution; game theory; dynamics; agency; assortativity; culture; distributed systems JEL Classification: C73 Table of contents 1 Introduction p. 2 Shadow of Nash Equilibrium Renaissance Structure of the Survey · · 2 Agency—Who Makes Decisions? p. 3 Methodology Implications Evolution of Collective Agency Links between Individual & · · · Collective Agency 3 Assortativity—With Whom Does Interaction Occur? p. 10 Assortativity & Preferences Evolution of Assortativity Generalized Matching Conditional · · · Dissociation Network Formation · 4 Evolution of Behavior p. 17 Traits Conventions—Culture in Society Culture in Individuals Culture in Individuals · · · & Society 5 Economic Applications p. 29 Macroeconomics, Market Selection & Finance Industrial Organization · 6 The Evolutionary Nash Program p. 30 Recontracting & Nash Demand Games TU Matching NTU Matching Bargaining Solutions & · · · Coordination Games 7 Behavioral Dynamics p. 36 Reinforcement Learning Imitation Best Experienced Payoff Dynamics Best & Better Response · · · Continuous Strategy Sets Completely Uncoupled · · 8 General Methodology p. 44 Perturbed Dynamics Further Stability Results Further Convergence Results Distributed · · · control Software and Simulations · 9 Empirics p.
  • An Introduction to Game Theory
    Publicly-available solutions for AN INTRODUCTION TO GAME THEORY MARTIN J. OSBORNE University of Toronto Copyright © 2012 by Martin J. Osborne All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Martin J. Osborne. This manual was typeset by the author, who is greatly indebted to Donald Knuth (TeX), Leslie Lamport (LaTeX), Diego Puga (mathpazo), Christian Schenk (MiKTeX), Ed Sznyter (ppctr), Timothy van Zandt (PSTricks), and others, for generously making superlative software freely available. The main font is 10pt Palatino. Version 6: 2012-4-7 Contents Preface xi 1 Introduction 1 Exercise 5.3 (Altruistic preferences) 1 Exercise 6.1 (Alternative representations of preferences) 1 2 Nash Equilibrium 3 Exercise 16.1 (Working on a joint project) 3 Exercise 17.1 (Games equivalent to the Prisoner’s Dilemma) 3 Exercise 20.1 (Games without conflict) 3 Exercise 31.1 (Extension of the Stag Hunt) 4 Exercise 34.1 (Guessing two-thirds of the average) 4 Exercise 34.3 (Choosing a route) 5 Exercise 37.1 (Finding Nash equilibria using best response functions) 6 Exercise 38.1 (Constructing best response functions) 6 Exercise 38.2 (Dividing money) 7 Exercise 41.1 (Strict and nonstrict Nash equilibria) 7 Exercise 47.1 (Strict equilibria and dominated actions) 8 Exercise 47.2 (Nash equilibrium and weakly dominated