Lecture Notes

School of Economics, Shandong University

Contents

1 INTRODUCTION
  1.1 What is Game Theory?
  1.2 Elements of a 'Game'
  1.3 Types of Games: The Time Aspect
  1.4 Cooperative and Non-Cooperative Games
  1.5 Reading

2 STATIC GAMES OF COMPLETE INFORMATION
  2.1 THE NORMAL FORM AND TWO EQUILIBRIUM CONCEPTS
    2.1.1 Example: The Prisoners' Dilemma
    2.1.2 Equilibrium Concept 1: Iterated Elimination of Dominated Strategies
    2.1.3 Equilibrium Concept 2: The Nash Equilibrium
    2.1.4 Mixed Strategies
    2.1.5 Examples of Prisoners' Dilemmas in Economics
  2.2 APPLICATIONS TO OLIGOPOLY
    2.2.1 The Cournot Duopoly Game
    2.2.2 The Cournot-Nash Equilibrium to the Duopoly Game
    2.2.3 Reaction Functions
    2.2.4 The Bertrand Model of Duopoly
    2.2.5 Collusion
  2.3 MACROECONOMIC POLICY GAMES
    2.3.1 The Expectations Augmented Phillips Curve (EAPC)
    2.3.2 The Barro-Gordon (1983) Monetary Policy Game
    2.3.3 The Macroeconomic Policy Coordination Game

3 DYNAMIC GAMES OF COMPLETE INFORMATION
  3.1 Introduction
  3.2 A General Two-Stage Game with Observed Actions
    3.2.1 Solution by Backwards Induction
    3.2.2 Subgames and Credibility
  3.3 The Stackelberg Duopoly Model
  3.4 The Monopoly Union Model
  3.5 Bargaining Theory: Rubinstein's Sequential Bargaining Model
  3.6 Repeated Games
    3.6.1 The General Idea
    3.6.2 The Infinitely Repeated Prisoners' Dilemma
    3.6.3 Collusion Between Two Duopolists
    3.6.4 The Barro-Gordon (1983) Monetary Policy Game

4 GAMES OF INCOMPLETE INFORMATION
  4.1 Static Games of Incomplete Information
  4.2 The Principal-Agent Problem and the Market for Lemons
    4.2.1 Adverse Selection and the Market for 'Lemons'
  4.3 Bayesian Learning using Bayes' Rule
  4.4 Dynamic Games of Incomplete Information with Bayesian Learning
    4.4.1 Introduction to Perfect Bayesian Equilibria
    4.4.2 Spence's Model of Job-Market Signalling
    4.4.3 A Reputational Model of Monetary Policy

5 BOUNDED RATIONALITY: THEORY AND EXPERIMENTS
  5.1 Introduction
  5.2 Less than Rational Adjustment Processes
    5.2.1 The Cournot Adjustment Process
    5.2.2 A More Rational Adjustment Process
  5.3 The Real World: Experimental Game Theory

Chapter 1

INTRODUCTION

1.1 What is Game Theory?

Traditional Game Theory is concerned with the strategic interaction of rational individuals or economic agents such as firms, governments, central banks and trade unions. Chapters 1 to 3 of the course are traditional in this sense and emphasize applications taken from both microeconomics and macroeconomics. This part is largely structured around the four main chapters of Gibbons. Rasmusen is an important additional text for this part of the course.

More recently Game Theory has taken a more critical look at the rationality assumption. Casual observation suggests that people are not rational in the sense used in game theory, and this is confirmed by experiments. Chapter 4 looks at recent developments in Game Theory which assume that players are boundedly rational and learn by trial and error. The books particularly useful for this part of the course (but also relevant for the first four topics) are: Rasmusen, Ch 4; Binmore; and Hargreaves Heap and Varoufakis.

In addition to these books you will be referred to important papers which apply game theory to different areas of economics. If as a result of this course (or in spite of it) you become 'hooked' on Game Theory, the book that you need to turn to, sooner or later, is Game Theory by Fudenberg and Tirole, MIT Press.

1.2 Elements of a ‘Game’

The essential elements of a game are: rationality, players, actions (or moves), strategies, information, payoffs, equilibria and outcomes. Consider these in turn:

• Rational agents do the best they can given constraints and available information. In economic theory this is formalised as utility maximisation; under uncertainty it becomes expected utility maximisation. In addition the 'rationality' assumption includes the assumption of common knowledge of rationality: all players know that all players are rational, all players know that all players know that all players are rational, and so on.

Experimental game theory provides only limited support for the assumption that people are generally 'rational' in the economists' sense. A 'minimalist defence' of the assumption goes as follows: studying rational behaviour is a useful 'thought experiment' even if we believe that people are not rational. As Myerson (1999) argues, the goal of social science is not just to predict human behaviour, but to analyze social institutions (the market economy, private firms, government, the world trading system, central banks, etc.) and evaluate proposals for their reform. It is useful to analyze such institutions under the assumption that agents are rational, so as to identify flaws in the design of the institutions rather than in the people within them. If institutions perform badly when agents are rational then this is an argument for better design. If well-designed institutions still perform badly because of irrational people, then this is an argument for informing and educating people better. For example, general equilibrium theory tells us that criticisms of the market economy as an institution should concentrate on its distributional consequences, market failure arising from small numbers, externalities, the existence of public goods and departures from complete information. Markets are not fundamentally anarchistic or inefficient.

• The players are individuals or other economic agents who make decisions. They choose actions (or moves) to maximise their utility.

• Information available to each player is central to game theory. The information set is the knowledge available to each player at each point in time. This may include observations of key variables affecting their utility such as price, the history of moves of other players up to that point and the nature of other players' utilities.

• A player's strategy is a rule that tells her which move to choose at each instant of the game, given her information set. Let Si be the set of strategies available to player i (or i's strategy space) and let si ∈ Si denote a member of this set. Strategies can be mixed strategies, in which a player randomizes over pure strategies si ∈ Si; for example, if Si has 2 elements si1, si2, choosing si1 with probability p and si2 with probability 1 − p.

Consider an n-player game. Let (s1, . . . , sn) be a vector denoting an arbitrary combination of strategies, one for each player. In general each player's utility will depend on what all players are doing. Denote player i's utility by ui. The payoff corresponding to each combination of strategies by all players is then given by a payoff function ui(s1, . . . , sn).

• The normal form of a game is defined as G = {S1, S2, · · · , Sn; u1, u2, · · · , un}.

• An equilibrium s∗ = (s1∗, . . . , sn∗) is a strategy combination consisting of the 'best' strategy for each player. Different types of games have their own equilibrium concept.

• Corresponding to each equilibrium is an equilibrium outcome for the model, describing all the variables of the model.

1.3 Types of Games: The Time Aspect

First there is the time aspect of games. Games can be played once, with all moves made simultaneously ('one-shot' or static games), or played over time, with moves made sequentially and possibly many times (dynamic games). Dynamic games which continue for ever are infinite games.

The Information Aspect

Second, games are defined according to the information structure assumed. In the course we distinguish games of complete or incomplete information, and games of perfect or imperfect information. In games of complete information each player's payoff function is common knowledge among all the players. In games of perfect information the player with the move knows the full history of the game up to that point. Using these definitions and concepts the structure of the 'traditional' part of the course (and of Gibbons' book) can be summarised as follows:

Information Structure    Time Aspect    Equilibrium Concept
Complete                 Static         Nash
Complete                 Dynamic        Subgame-Perfect Nash
Incomplete               Static         Bayesian Nash
Incomplete               Dynamic        Perfect Bayesian

1.4 Cooperative and Non-Cooperative Games

We can distinguish between cooperative and non-cooperative games. Cooperative games require some 'precommitment mechanism' that forces players to precommit to a particular course of action over time. Then the issue is how the gains from cooperation can be split among the players. Non-cooperative games lack such a mechanism, and the focus of the course is on this type of game. Note that cooperation is still studied, but the emphasis is on how cooperation can be enforced without an external precommitment mechanism. In effect cooperation is then a possible non-cooperative equilibrium.

Zero-Sum Games Our final category is the idea of a zero-sum game. For such a game the sum of the payoffs is zero whatever the strategy combination of the players. Then there are clearly no gains from cooperation.

1.5 Reading

Gibbons (G), Preface.
Hargreaves Heap and Varoufakis (HHV), Chs 1, 2.
Myerson, R. B. (1999), 'Nash Equilibrium and the History of Economic Theory', Journal of Economic Literature, 37(3), 1067-1082.
Nasar, S. (2001), 'A Beautiful Mind: The Life of Mathematical Genius and Nobel Laureate John Nash', Simon & Schuster.
Rasmusen (R), Preface, Introduction, Ch 1 part 1.1.

Chapter 2

STATIC GAMES OF COMPLETE INFORMATION

2.1 THE NORMAL FORM AND TWO EQUILIBRIUM CONCEPTS

2.1.1 Example: The Prisoners' Dilemma

The following offer is made to two prisoners held separately:

IF BOTH CONFESS: EACH RECEIVES 10 YEARS.

IF NEITHER CONFESSES: EACH RECEIVES 3 YEARS.

IF ONLY ONE CONFESSES: HE/SHE RECEIVES 1 YEAR AND HIS/HER PARTNER RECEIVES THE MAXIMUM SENTENCE OF 20 YEARS.

We can construct a payoff matrix as follows: take a prisoner's utility to be minus the sentence in years. Then the payoff matrix takes the form:

                        Prisoner 2
                        SILENT        CONFESS
Prisoner 1   SILENT     (−3, −3)      (−20, −1)
             CONFESS    (−1, −20)     (−10, −10)

For this game the elements referred to in the introduction are:

• n = 2 (there are 2 rational players).

• Moves are equivalent to strategies in a static or 'one-shot' game.

• The strategy set for each player is Si = {Silence, Confess}, i = 1, 2, or {S, C} for short. Each player chooses a strategy (or move) si ∈ Si, i.e. si = S or C, giving 4 possible outcomes for (s1, s2): (C, C), (C, S), (S, C), (S, S).

• There is complete information.

• The payoffs (u1(s1, s2), u2(s1, s2)) associated with these outcomes: (u1(C,C), u2(C,C)), (u1(S, C), u2(S, C)), (u1(C,S), u2(C,S)) and (u1(S, S), u2(S, S)) are shown in the payoff matrix. This describes the game in normal form.

• To solve this non-cooperative game we need an equilibrium concept; i.e. a definition of a strategy (or, for this game, an action) combination that is consistent with utility maximization by both players.

2.1.2 Equilibrium Concept 1: Iterated Elimination of Dominated Strategies

Before proposing our first equilibrium concept we first define:

Dominated Strategies and their Elimination

A strictly dominated strategy is always worse than some other strategy irrespective of the strategies the other players might pick. For example, in the Prisoners' Dilemma, Silence is strictly worse than Confess (i.e., it yields a strictly lower utility, not just a utility that is lower or equal) irrespective of whether the other prisoner confesses or remains silent. Now if we postulate that rational players never play strictly dominated strategies, then this assumption alone solves the Prisoners' Dilemma: neither player will play Silence, which is strictly dominated by Confess. It follows that (C, C) is the equilibrium of this game.

We can use this postulate to solve some games, but not others. Consider the following game in normal form, which can be solved by the iterated elimination of strictly dominated strategies (IESDS). A sketch of the procedure in code follows the payoff matrix.

Worked Example 1

                     Player 2
                     Left      Middle     Right
Player 1   Up        (1, 0)    (1, 2)     (0, 1)
           Down      (0, 3)    (0, 1)     (2, 0)
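The elimination logic is mechanical enough to automate. The following minimal Python sketch (the function name `iesds` and the dictionary encoding of payoffs are my own illustration, not from the notes) applies IESDS to Worked Example 1:

```python
# Iterated elimination of strictly dominated pure strategies (IESDS)
# applied to Worked Example 1. Rows belong to Player 1, columns to Player 2.

def iesds(u1, u2, rows, cols):
    """Repeatedly delete strictly dominated pure strategies."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        # A row r is strictly dominated by r2 if it does worse against every column.
        for r in rows[:]:
            if any(all(u1[r2, c] > u1[r, c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # A column c is strictly dominated by c2 if it does worse against every row.
        for c in cols[:]:
            if any(all(u2[r, c2] > u2[r, c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

payoffs = {('Up', 'Left'): (1, 0), ('Up', 'Middle'): (1, 2), ('Up', 'Right'): (0, 1),
           ('Down', 'Left'): (0, 3), ('Down', 'Middle'): (0, 1), ('Down', 'Right'): (2, 0)}
u1 = {k: v[0] for k, v in payoffs.items()}
u2 = {k: v[1] for k, v in payoffs.items()}

print(iesds(u1, u2, ['Up', 'Down'], ['Left', 'Middle', 'Right']))
# -> (['Up'], ['Middle']): Right, then Down, then Left are eliminated.
```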

However IESDS does not always solve the game, e.g. consider the following:

Worked Example 2

                     Player 2
                     Left      Middle     Right
Player 1   Up        (1, 0)    (1, 2)     (0, 1)
           Down      (0, 3)    (0, 1)     (2, 2)

Worked Example 3

                     Player 2
                     Left      Middle     Right
Player 1   Up        (1, 0)    (1, 2)     (0, 1)
           Down      (0, 3)    (0, 1)     (2, 1)

When IESDS fails we need a stronger equilibrium concept in order to solve games such as these. This leads to our next equilibrium concept:

2.1.3 Equilibrium Concept 2: The Nash Equilibrium

The strategies of a game are a Nash Equilibrium (NE) if for each player the equilibrium strategy is the best (utility maximising) response to the equilibrium strategies of all other players. To state this formally, first introduce some notation: for any vector s = (s1, s2, . . . , sn), denote by s−i the vector (s1, s2, . . . , si−1, si+1, . . . , sn), the portion of s not associated with player i. Then, rearranging, we can write s = (si, s−i) and the utility for player i as:

ui(s) = ui(si, s−i)

Definition: s∗ is a Nash Equilibrium iff for each player i ∈ {1, 2, . . . , n} we have

ui(si∗, s−i∗) ≥ ui(si, s−i∗) for all si ∈ Si.

i.e. si∗ is the best response to the strategies specified by the other n − 1 players. To find the NE (there may be more than one) of a game, set it out in normal form and then examine all the possible outcomes in turn. Thus in the Prisoners' Dilemma (C, C) is the only NE: for each of the other 3 outcomes at least one prisoner would prefer a different choice given the choice of the other. A small search of this kind is sketched below.
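A minimal sketch of the outcome-by-outcome inspection (function and variable names are my own, not from the notes):

```python
# Find all pure-strategy Nash equilibria of a two-player game by checking,
# for every cell, that neither player gains from a unilateral deviation.

def pure_nash(payoffs, rows, cols):
    equilibria = []
    for r in rows:
        for c in cols:
            u1, u2 = payoffs[r, c]
            best_r = all(payoffs[r2, c][0] <= u1 for r2 in rows)   # row player content
            best_c = all(payoffs[r, c2][1] <= u2 for c2 in cols)   # column player content
            if best_r and best_c:
                equilibria.append((r, c))
    return equilibria

# The Prisoners' Dilemma (utility = minus the sentence in years):
pd = {('S', 'S'): (-3, -3), ('S', 'C'): (-20, -1),
      ('C', 'S'): (-1, -20), ('C', 'C'): (-10, -10)}
print(pure_nash(pd, ['S', 'C'], ['S', 'C']))   # -> [('C', 'C')]
```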

Worked Example 4. In examples 1 to 3 show that (Up, Middle) is the only NE.

Now consider the following battle of the sexes. A couple, Pat and Chris, have the following preferences for a night's entertainment: both prefer to be together rather than apart, irrespective of the choice of outing; Chris prefers opera to football and Pat prefers football to opera. These preferences can be represented as payoffs and the game in normal form is as follows:

                       Pat
                       Opera      Football
Chris   Opera          (2, 1)     (0, 0)
        Football       (0, 0)     (1, 2)

There are no dominated strategies, so IESDS does not get us anywhere. However inspection of the payoff matrix reveals that there are two NE: (Opera, Opera) and (Football, Football) (why?). There are now multiple NE.

Worked Example 5. Some games have no Nash equilibrium in pure strategies. One such game is known as Matching Pennies. Each player's strategy space is {Heads, Tails}. Each player displays her choice of Heads or Tails. If the two pennies match, Player 2 wins Player 1's penny; if they do not match, Player 1 wins Player 2's penny. The payoff matrix is as follows:

                     Player 2
                     Heads        Tails
Player 1   Heads     (−1, 1)      (1, −1)
           Tails     (1, −1)      (−1, 1)

Show there is no Nash equilibrium in pure strategies.

To summarise, we have introduced two equilibrium concepts: strategies that survive IESDS, and Nash equilibria. Neither of these solution concepts necessarily solves a particular game in the sense of predicting a unique outcome. Of the two concepts the NE is the stronger: it can be shown that NE strategies always survive IESDS, but the converse is not true (see Gibbons, Appendix 1.1.2). Some games have no Nash equilibrium in pure strategies.

2.1.4 Mixed Strategies

In the Matching Pennies game a mixed strategy for Player 1 is to play the pure strategy Heads with probability r ∈ [0, 1] and Tails with probability 1 − r. We should interpret this as capturing Player 2's uncertainty about what Player 1 is going to do (an interpretation proposed by Harsanyi). Similarly a mixed strategy for Player 2 is to play Heads with probability q ∈ [0, 1] and Tails with probability 1 − q, which captures Player 1's uncertainty regarding Player 2.

Generally a mixed strategy can be defined as follows. Suppose Player i has K pure strategies, Si = {si1, · · · , siK}. Then a mixed strategy for Player i is a probability distribution (pi1, · · · , piK) where pik ∈ [0, 1] for k = 1, · · · , K and pi1 + · · · + piK = 1.

Returning to the Matching Pennies game we can now solve for a mixed-strategy Nash equilibrium. The game is now played under uncertainty and each player must maximize her expected utility. Player 1's expected utility (payoff) from playing the mixed strategy (r, 1 − r) when Player 2 plays (q, 1 − q) is given by

r[q × (−1) + (1 − q) × 1] + (1 − r)[q × 1 + (1 − q) × (−1)] = (2q − 1) + r(2 − 4q)   (2.1)

It follows that if q < 1/2 then Heads (r = 1) is the best strategy for Player 1, and if q > 1/2 then Tails (r = 0) is the best strategy for Player 1. If q = 1/2 then Player 1 is indifferent between Heads and Tails. We can express this result in terms of a best response function r(q), giving the best choice of r by Player 1 given the choice q by Player 2. We have found that

r(q) = 1 if q < 1/2
     = 0 if q > 1/2

Worked Example 6. In the Matching Pennies game show that the best response function q(r) of Player 2, given the choice r by Player 1, is

q(r) = 0 if r < 1/2
     = 1 if r > 1/2

Hence find the mixed-strategy Nash equilibrium of this game. A numerical check is sketched below.
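A hedged numerical check of the answer (the candidate r = q = 1/2 is what the algebra above delivers; the grid and function names are my own illustration):

```python
# Verify that r = q = 1/2 survives every unilateral deviation in
# Matching Pennies, using the expected payoff (2.1).

def eu1(r, q):
    # Player 1's expected payoff when she plays Heads with prob. r
    # and Player 2 plays Heads with prob. q
    return r*(q*(-1) + (1 - q)*1) + (1 - r)*(q*1 + (1 - q)*(-1))

def eu2(r, q):
    return -eu1(r, q)          # the game is zero-sum

grid = [i / 100 for i in range(101)]
r_star = q_star = 0.5
assert all(eu1(r, q_star) <= eu1(r_star, q_star) + 1e-12 for r in grid)
assert all(eu2(r_star, q) <= eu2(r_star, q_star) + 1e-12 for q in grid)
print("(r, q) = (1/2, 1/2) is a mixed-strategy Nash equilibrium")
```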

2.1.5 Examples of Prisoners' Dilemmas in Economics

Exchange Rate Policy

Suppose a government has the following preferences: Unilateral devaluation ≻ Do nothing by all ≻ An attempt by all countries to devalue ≻ Revaluation.1 Assign utilities 20, 10, 0, −10 to these outcomes. Then for a two-country world the payoff matrix becomes:

                            Country 1
                            DEVALUE        DO NOTHING
Country 2   DEVALUE         (0, 0)         (20, −10)
            DO NOTHING      (−10, 20)      (10, 10)

Note that one country's devaluation is another country's revaluation: the beggar-thy-neighbour aspect of exchange rate policy. The Nash Equilibrium is for both to attempt to devalue. This is inefficient.

Fiscal Policy

'Free-riding' on another country's fiscal expansion ≻ Joint expansion ≻ Do nothing ≻ Unilateral expansion. Assign utilities 20, 10, 0, −10. Then the two-country payoff matrix is:

                                  Country 1
                                  FISCAL EXPANSION    DO NOTHING
Country 2   FISCAL EXPANSION      (10, 10)            (−10, 20)
            DO NOTHING            (20, −10)           (0, 0)

1Note that ≻ means “is preferred to”.

Do nothing is a dominant strategy. Hence both countries doing nothing is a Nash Equilibrium. This outcome is also inefficient. Examples of the prisoners' dilemma such as these abound in the social sciences. All have the following common features:

1. The existence of spillovers between players in the form of positive or negative externalities

2. Benefits from cooperation that can achieve Pareto-improving outcomes.

3. The cooperative outcome is not a Nash Equilibrium.

Here are a few examples: oligopoly, environmental policy, international trade, arms races, the arms trade.

Worked Example 7. Identify the 3 features above in these examples. What does feature 3 imply for the sustainability of cooperation?

2.2 APPLICATIONS TO OLIGOPOLY

2.2.1 The Cournot Duopoly Game

This originated with Cournot, 'Recherches sur les Principes Mathématiques de la Théorie des Richesses' (1838), which anticipated the Nash equilibrium concept discovered by Nash (1950). Consider the following model: two firms produce amounts q1 and q2 of a homogeneous output. Let total industry output be given by

Q = q1 + q2 (2.2) and let the industry demand curve be given by a linear relationship

P (Q) = a − bQ = a − b(q1 + q2) (2.3)

For each firm assume constant returns to scale, so that the cost of producing a given level of output at given factor prices is:

Ci(qi) = ciqi; i = 1, 2 (2.4)

Thus if c1 < c2 then firm 1 is more efficient than firm 2. This completes the model. We now specify the details of the Cournot Game.

• The players: The two oligopolists.

• Strategies: The strategy or move (for a one-shot game) is the choice of output. The strategy space for each firm appears to be any positive output, i.e. qi ∈ Si = [0, ∞). However this is not quite right, because we must ensure a positive price, i.e. P = a − bQ > 0, which implies Q < a/b. The strategy space is therefore qi ∈ Si = [0, a/b).

• Payoffs: firm 1 maximises profits

π1 = π1(q1, q2) = P q1 − c1q1 = q1(P − c1) = q1(a − c1 − b(q1 + q2)) (2.5)

substituting for P from (2.3). Similarly for firm 2.

• Type of Game: We assume complete information. Furthermore we consider only a static game played once (a 'one-shot' game) with both firms choosing output simultaneously. The two firms could cooperate in a cartel and agree on a level of total output and its distribution between the firms; this is a cooperative game. Alternatively they could act independently in a non-cooperative game.

We then need an equilibrium concept and we choose a NE. Since Cournot anticipated the NE we refer to this as:

2.2.2 The Cournot-Nash Equilibrium to the Duopoly Game

Recall the definition of a NE: the strategies of a game are a NE if for each player the equilibrium strategy is the utility maximising response to the equilibrium strategies of all other players. For our duopoly denote the NE by (q1∗, q2∗). Then this is a NE if

π1(q1∗, q2∗) ≥ π1(q1, q2∗) for all q1 ∈ S1   (2.6)

for firm 1, and similarly

π2(q1∗, q2∗) ≥ π2(q1∗, q2) for all q2 ∈ S2   (2.7)

for firm 2; i.e. in the NE neither firm can improve its position given that the other firm is playing its NE strategy. To calculate the NE consider firm 1's profit-maximising choice of output. Using (2.5), in a NE firm 1 maximises with respect to q1:

π1(q1, q2∗) = q1(a − c1 − b(q1 + q2∗))   (2.8)

given q2∗. Differentiating with respect to q1, the first order condition is

∂π1/∂q1 = a − c1 − 2bq1 − bq2∗ = 0   (2.9)

In the NE we must therefore have

q1∗ = (a − c1 − bq2∗)/2b   (2.10)

Performing a similar calculation for firm 2 gives

q2∗ = (a − c2 − bq1∗)/2b   (2.11)

Equations (2.10) and (2.11) give two equations in the two unknowns q1∗, q2∗, which can be solved to yield the NE. If firms are identical (c1 = c2 = c, say) the solution is easily found: we then know that q1∗ = q2∗ = q∗ say, and using (2.10) or (2.11) we arrive at the NE level of output

q∗ = (a − c)/3b   (2.12)

The price and profits are given by

P = a − b(q1∗ + q2∗) = a − 2bq∗ = a − (2/3)(a − c) = (1/3)(a + 2c)   (2.13)

π1∗ = π2∗ = (a − c)²/9b   (2.14)

Note that the price can be written as P = c + bq∗, and hence price > marginal cost (c) as long as output in equilibrium is positive. Duopoly therefore results in Pareto-inefficiency. Under complete information both firms can calculate this NE. They will therefore proceed straight to producing at this level of output. The same NE can be arrived at using the idea of:

2.2.3 Reaction Functions

Equations (2.10) and (2.11) above give each firm's best response to the other firm in equilibrium. Suppose that firm 2 produces at an arbitrary level q2 ∈ S2. The reaction function of firm 1 is the profit-maximizing response to q2. We denote this by R1 = R1(q2), and R2 = R2(q1) is defined similarly for firm 2. By reasoning similar to that which led to (2.10) and (2.11) we have

R1(q2) = (a − c1 − bq2)/2b   (2.15)

R2(q1) = (a − c2 − bq1)/2b   (2.16)

Note that these reaction functions represent out-of-equilibrium behaviour everywhere except at the intersection of the two curves, which gives the NE. This is shown below for the case c1 = c2 = c. AB is the reaction function for firm 2 and CD for firm 1. At the intersection we have q1∗ = q2∗ = q∗ = (a − c)/3b as before. A numerical sketch of this fixed point follows the figure.

[Figure: the two reaction functions in (q2, q1) space: q1 = R1(q2) with slope −1/2 and q2 = R2(q1) with slope −2, intersecting at the NE.]

Figure 2.1: The NE as the Intersection of Reaction Functions
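A minimal numerical sketch of this intersection (the parameter values a = 10, b = 1, c = 1 are illustrative assumptions, not from the notes):

```python
# Cournot duopoly: iterate the reaction functions (2.15)-(2.16) for
# identical firms and compare with the closed form q* = (a - c)/3b.

a, b, c = 10.0, 1.0, 1.0

def R(q_other):
    # Either (identical) firm's best response to its rival's output
    return max(0.0, (a - c - b * q_other) / (2 * b))

q1 = q2 = 0.0
for _ in range(100):               # a contraction here, so this converges
    q1, q2 = R(q2), R(q1)

print(q1, q2, (a - c) / (3 * b))   # all approximately 3.0
P = a - b * (q1 + q2)
print(P, c + b * q1)               # price = c + b q* > marginal cost
```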

2.2.4 The Bertrand Model of Duopoly

Suppose firms choose prices rather than output, i.e. their moves consist of a choice of price pi ∈ Si, where Si = [0, ∞), i = 1, 2. Will this make any difference? This situation is only possible if we allow for some departure from the 'law of one price'. Suppose that the goods produced by the two firms are not perfect substitutes but are differentiated goods. Write the demand for good 1 as the following function of both prices p1 and p2:

q1 = q1(p1, p2) = a − bp1 + d(p2 − p1) = a − (b + d)p1 + dp2 (2.17)

Equation (2.17) says that demand for good 1 rises if its own price falls and/or if the difference between its rival's price and its own rises.

Worked Example 8 Find the NE to this game. Show that price > marginal cost. What happens as d becomes large? Interpret your result.

2.2.5 Collusion

Up to now we have considered two non-cooperative equilibria: Cournot and Bertrand. We complete the study of duopoly by comparing these equilibria with a cooperative outcome. Suppose that the firms are identical, i.e. c1 = c2 = c. Joint profits are therefore

π1 + π2 = (P − c)q1 + (P − c)q2 = (P − c)(q1 + q2) = (a − bQ − c)Q   (2.18)

which is a function of total output Q alone. A Pareto-efficient outcome from the viewpoint of the two firms (but not society) is then obtained by maximizing total profits (otherwise one firm could be made better off without making the other worse off) - see the figure below. The firms would then agree to maximize total profits or, in other words, act as a monopoly and maximize (2.18) with respect to Q. This leads to total output and profits

Q = (a − c)/2b   (2.19)

π1 + π2 = (a − c)²/4b   (2.20)

There still remains the question of where to produce the output: any distribution ranging from q1 = 0, q2 = Q = (a − c)/2b, π1 = 0, π2 = (a − c)²/4b at A in the diagram, to q2 = 0, q1 = Q = (a − c)/2b, π2 = 0, π1 = (a − c)²/4b at B, will satisfy (2.19). As in the exchange economy with the Edgeworth box, the outcome will depend on bargaining. Since both firms always have the option of abandoning the cartel and producing independently at the Cournot-Nash equilibrium (assuming that output is the choice of move), both firms are guaranteed at least π1 = π2 = (a − c)²/9b in this equilibrium (see exercises). Therefore the bargaining zone for cooperation is reduced to the shaded region, which is Pareto-superior to the NE. The outcome is somewhere along CD depending on the bargaining power of the two firms. Exactly where is examined using bargaining theory, a branch of game theory we consider in topic 2. If the firms are identical a natural choice for the outcome is equal shares, in which case

[Figure: the profit frontier π1 + π2 = (a − c)²/4b in (π2, π1) space, with endpoints A and B; the NE profits πNE = (a − c)²/9b for each firm; the equal-shares monopoly point (πM, πM) with πM = (a − c)²/8b > πNE on the 45° line; and the bargaining zone CD.]

Figure 2.2: Pareto-Efficient Outcomes

q1∗ = q2∗ = (a − c)/4b   (2.21)

π1∗ = π2∗ = (a − c)²/8b   (2.22)

in the cooperative equilibrium. However this intuitive choice of cooperative outcome is not supported by the bargaining model of the next topic. The main problem with (2.21) as it stands is that it is not really an equilibrium. Given that one firm, say firm 1, has agreed to produce at the agreed level q1 = (a − c)/4b, firm 2 would find it optimal to renege on the agreement and produce along its reaction function given by (2.16); i.e., at

R2(q1) = (a − c − bq1)/2b = 3(a − c)/8b > (a − c)/4b   (2.23)

a higher level of output. This is precisely the Prisoners' Dilemma revisited. To sustain the cartel either some external commitment mechanism or a tit-for-tat punishment in a repeated game is required. The enforcement of silence in the Prisoners' Dilemma, or of cooperation in an oligopoly game, can be achieved in a repeated game if players threaten to revert to the NE in the event of the other players reneging. This is a subject pursued in the next topic. The deviation incentive can be checked numerically, as in the sketch below.
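The temptation to renege can be made concrete in a few lines of code (the numbers a = 10, b = 1, c = 1 are illustrative assumptions):

```python
# Cartel output versus Cournot and versus a unilateral deviation, in the
# symmetric linear model.

a, b, c = 10.0, 1.0, 1.0

def profit(qi, qj):
    return qi * (a - b * (qi + qj) - c)

q_cartel  = (a - c) / (4 * b)                 # each firm's cartel share, (2.21)
q_cournot = (a - c) / (3 * b)                 # Cournot-Nash output, (2.12)
q_cheat   = (a - c - b * q_cartel) / (2 * b)  # best response to a loyal partner, (2.23)

print(profit(q_cartel, q_cartel))    # 10.125    = (a - c)^2 / 8b
print(profit(q_cheat, q_cartel))     # 11.390625 : cheating pays ...
print(profit(q_cournot, q_cournot))  # 9.0       = (a - c)^2 / 9b : ... until both cheat
```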

2.3 MACROECONOMIC POLICY GAMES

2.3.1 The Expectations Augmented Phillips Curve (EAPC)

Our starting point is the well-known Expectations Augmented Phillips Curve (EAPC). Lucas (1972) provides early micro-foundations for this aggregate supply relationship, based on the idea that producers observe only their own prices and not the aggregate price level. Here we provide a more 'European' perspective in which labour market distortions take centre stage. Consider the determination of nominal wage contracts by wage-setters, who can be regarded either as monopoly unions or as minimum wage legislators. They set one-period nominal wage contracts to minimise an expected welfare loss Et−1(Ut), where

Ut = (wt − pt − ŵ)²   (2.24)

All variables in (2.24) are expressed in logarithms; wt is the nominal wage and pt is the domestic price level. (2.24) says that wage-setters have a bliss point ŵ for the real wage. Performing this optimization we obtain

wt = Et−1(pt) + ŵ   (2.25)

Hence the ex post real wage is

wt − pt = −[pt − Et−1(pt)] + ŵ   (2.26)

The interpretation of this equation is that a price or inflation surprise, [pt − Et−1(pt)], lowers the real wage. Now consider the determination of employment and output. We assume a Cobb-Douglas production function with constant capital stock:

Yt = AKt^α Lt^(1−α) = BLt^(1−α)   (2.27)

say, where B is a constant depending on the fixed stock of capital, and Yt, Kt and Lt denote output, capital stock (assumed fixed throughout) and employment respectively, in levels. Then equating the real wage with the marginal product of labour we have

the real wage = Wt/Pt = MPL = ∂Yt/∂Lt = (1 − α)BLt^(−α)   (2.28)

Then taking logs:

log(Wt/Pt) = log Wt − log Pt = log((1 − α)B) − α log Lt   (2.29)

Putting wt = log Wt etc as in (2.24) we then have:

wt − pt = c − αlt   (2.30)

where c = log((1 − α)B). This model of employment, together with the real wage equation (2.26) and the Cobb-Douglas production function, completes the supply side of the model for a given capital stock. Combining (2.26) and (2.30) we have

lt = l̄ + (1/α)[pt − Et−1(pt)] = l̄ + (1/α)[πt − Et−1(πt)]   (2.31)

where l̄ is the equilibrium employment level defined as

l̄ = (c − ŵ)/α   (2.32)

and we have expressed the relationship in terms of inflation πt = pt − pt−1. (2.31) is one form of the short-run EAPC, giving the familiar positive effect of surprise inflation on employment. The long-run EAPC is of course lt = l̄, where expectations are realised. In terms of output, taking logs of (2.27) we have

yt = b + (1 − α)lt = ȳ + β[πt − πt^e]   (2.33)

where b = log B, β = (1 − α)/α, ȳ = b + (1 − α)l̄ is the equilibrium output level and πt^e = Et−1(πt). (2.33) is the EAPC expressed in terms of output. A numerical check of this supply side is sketched below.
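A quick check of the mechanics (the parameter values α = 0.3, B = 1, ŵ = 0.1 are illustrative assumptions):

```python
# With the wage contract (2.25) and labour demand (2.30), a price
# surprise raises employment by surprise/alpha, as in (2.31).
import math

alpha, B, w_hat = 0.3, 1.0, 0.1
c = math.log((1 - alpha) * B)
l_bar = (c - w_hat) / alpha              # equilibrium employment, (2.32)

def employment(p, p_expected):
    w = p_expected + w_hat               # nominal wage contract, (2.25)
    return (c - (w - p)) / alpha         # labour demand, from (2.30)

print(employment(0.0, 0.0) - l_bar)      # 0.0: no surprise, l = l_bar
surprise = 0.02
print(employment(surprise, 0.0) - l_bar) # 0.0666... = surprise / alpha
print(surprise / alpha)
```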

2.3.2 The Barro-Gordon (1983) Monetary Policy Game

This is a game played between the monetary authority (i.e., central bank) and the private sector in an economic environment described by the EAPC. To summarise the previous section: we have derived the short-run EAPC for the closed economy as

y = ȳ + β(π − πe)   (2.33′)

where y is output (in logs), ȳ is the 'natural rate' of output, and π, πe are the inflation rate and expected inflation rate respectively. The payoff for the central bank is as follows: suppose that the authorities have an ambitious output target ŷ > ȳ and a social welfare function (i.e., payoff) per period of the form

[Figure: inflation π against output y. The long-run EAPC (π = πe) is vertical at y = ȳ; short-run EAPCs (π ≠ πe), one for each value of πe (e.g. πe = 0, πe > 0), slope upwards through it.]

Figure 2.3: The Expectations Augmented Phillips Curve (EAPC)

U = −a(y − ŷ)² − π²   (2.34)

which is a utility function with bliss points at y = ŷ and π = 0. We can interpret ŷ as full-employment output.2 The single-shot game passes through the following sequence of events:

1. The private sector forms an expectation πt^e = Et−1(πt) in period t − 1.

2. The monetary authority chooses an inflation target πt in period t, which is assumed to be reached using actual monetary instruments; i.e., the central bank (CB) acts as if its monetary instrument is the inflation rate. The subscript t can be dropped in what follows.

The payoff to the private sector (PS) is simply −(π − πe)², which is a formal way of saying that the private sector wants to forecast correctly. The game in normal form is then summarised in Table 2.1:

2This is the same setup as in Gibbons except that his notation is different. He puts ȳ = bŷ, b < 1, and calls ŷ, y∗. What I call β he calls 'd'. Finally he minimises (1/a)U = −(y − ŷ)² − (1/a)π² and calls '1/a', 'c'. You should be able to translate my results into those obtained in Gibbons and vice versa.

Player      CB                         PS
MOVES       π                          πe
PAYOFFS     U = −a(y − ŷ)² − π²        −(π − πe)²

Table 2.1: The Monetary Policy Game

In Barro and Gordon (BG) a slightly different utility function is adopted. They put

U = ay − (1/2)π²   (2.35)

which removes the assumption of a bliss point for output. (Is this reasonable?) We will look at both utility functions in what follows.3 The Nash equilibrium of the single-shot game is found as follows. Using the CB payoff function (2.34) and substituting from (2.33) we can write

U = −[a(ȳ − ŷ + β(π − πe))² + π²] = U(π, πe)   (2.36)

In stage 2 of the monetary policy game the CB maximises U w.r.t. π given expectations πe. The first order condition is:

∂U/∂π = −[2aβ(ȳ − ŷ + β(π − πe)) + 2π] = 0   (2.37)

Solving for π given πe leads to the following reaction function of the CB:

π = [aβ(ŷ − ȳ) + aβ²πe] / (1 + aβ²)   (2.38)

By contrast the reaction function of the private sector is very straightforward: they simply maximise −(π − πe)² w.r.t. πe given π, i.e., they set πe = π and forecast accurately. Another way of expressing this idea is as a rational expectations equilibrium which, in the absence of uncertainty, is a perfect foresight equilibrium. The Nash equilibrium is at the intersection of these reaction functions (see figure 2.4). Putting πe = π in (2.38), the Nash equilibrium inflation rate and output level are obtained as:

π = πe = aβ(ŷ − ȳ);  y = ȳ   (2.39)

A small sketch of this fixed point follows.
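A minimal sketch of the equilibrium (the parameter values a = 0.5, β = 2 and the output target are illustrative assumptions):

```python
# The one-shot Barro-Gordon game: iterate 'CB best response, then the
# private sector catches up (pi_e = pi)' to the fixed point (2.39).

a, beta = 0.5, 2.0
y_bar, y_hat = 1.0, 1.2        # natural rate and the CB's ambitious target

def cb_reaction(pi_e):
    # maximiser of U = -a(y_bar + beta*(pi - pi_e) - y_hat)^2 - pi^2, eq (2.38)
    return (a * beta * (y_hat - y_bar) + a * beta**2 * pi_e) / (1 + a * beta**2)

pi_e = 0.0
for _ in range(200):
    pi_e = cb_reaction(pi_e)

print(pi_e)                          # -> 0.2
print(a * beta * (y_hat - y_bar))    # the inflationary bias (2.39): 0.2
```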

Worked Example 9 Interpret the result (2.39).

3Note again the different notation in BG. To go from BG to our formulation put ‘b’ in BG equal to βa in our notation and ‘a’ in BG equal to 1. Ignore stochastic changes to b.

[Figure: in (πe, π) space, the CB reaction function (2.38) intersects the 45° line π = πe at the Nash equilibrium.]

Figure 2.4: The Nash Equilibrium Inflation Rate given by (2.39)

Worked Example 10. Suppose that the CB adopts the payoff function (2.35). This case is in fact more straightforward. Show that the CB reaction function is π = aβ (independent of πe), that the private sector reaction function is πe = π as before, and that the Nash equilibrium is π = πe = aβ; y = ȳ.

For both utility functions we have a Nash equilibrium that is clearly Pareto-inefficient. Both players, the CB and the private sector, can be made better off if the CB sets inflation at zero and the private sector forecasts correctly. This is the social optimum (SO):

π = πe = 0;  y = ȳ   (2.40)

Compared with the social optimum, the Nash equilibrium gives an 'inflationary bias' without any corresponding output gain. The CB could announce and intend to follow the SO (i.e., precommit); but given πe = 0 it is optimal for the CB to set inflation along its reaction function and not implement π = 0. In a complete information Nash equilibrium the private sector anticipates all this, so πe is given by the Nash equilibrium, in which case it is optimal for the CB also to set π at the Nash equilibrium value. This is the credibility problem facing the CB.

The credibility problem is illustrated in figure 2.5. The social optimum consistent with rational expectations (πt = πt^e on the LRPC) is zero inflation at point P (for precommitment). But this requires precommitment to enforce. In the absence of such precommitment, at time t expectations πt^e formed in the previous period are given, and the policymaker can choose a point on the SRPC to reach a utility curve closer to the bliss point B, at point C. This is a 'cheating' policy in which zero inflation is promised and believed, but non-zero inflation is delivered. In a rational expectations equilibrium, however, the private sector can anticipate the calculations of the policymaker. High inflation is anticipated and we end up on the LRPC with output back at its natural rate and high inflation, at point D. PD is the inflationary bias.

[Figure: inflation against output. The vertical LRPC at y = ȳ; SRPCs for πe = 0, 0 < πe < aβ(ŷ − ȳ), and πe = aβ(ŷ − ȳ); the points P, C, D and the bliss point B.]

Figure 2.5: The Inflationary Bias.

B is the bliss point (full-employment output with πt = 0). PC is the SRPC with πt^e = 0. PD = the inflationary bias.

How can the credibility problem be solved? This is formally equivalent to the problem facing the prisoners in the prisoners' dilemma and oligopolists trying to enforce collusion. The solution is through the mechanism of a trigger strategy in a repeated game, examined in topic 2.

2.3.3 The Macroeconomic Policy Coordination Game

The need for macroeconomic policy coordination arises from (a) a role for government fiscal or monetary policy intervention (i.e., real effects of policy) and (b) the existence of international policy spillovers from such interventions. Spillovers can exist for both monetary and fiscal policy, but here we concentrate on the former.

Consider a two-country trading bloc. Our first change is to the EAPC. In open economies monetary spillovers exist, i.e., a monetary expansion in one country affects employment and output in the interrelated economies with which it trades. To pursue this further, consider a two-country model representing two countries trading with each other but closed to the outside world. Suppose that the monetary authority in one country engages in a bout of surprise inflation. This will increase employment in that country as in the closed economy considered previously; but, in addition, the country will experience a real exchange rate depreciation. For the second country this corresponds to an appreciation. As before we are assuming that monetary policy is conducted in terms of an inflation target; but now inflation includes an imported component. An exchange rate depreciation in the first country implies an appreciation in the second. This will enable the CB in the second country to accommodate an increase in the price of domestic output without missing its inflation target. Thus the real product wage will fall in that country, boosting employment and output. In short, surprise inflation in one country will increase employment in both countries, though the spillover effect will be smaller than the domestic effect. This may be formalised in terms of the following open economy EAPC:

yt = ȳ + β[ϕ(πt − πt^e) + (1 − ϕ)(πt∗ − πt∗^e)]   (2.41)

where πt∗ is the inflation rate in country two and unstarred variables now refer to country one. (2.41) generalises (2.33) to the two-country case. The parameter ϕ ≤ 1 captures the size of the spillover effect of monetary policy: ϕ = 1 for the closed economy, and for the domestic effect of a monetary expansion to exceed the imported effect of a foreign expansion we must have, in addition, ϕ > 1/2. An analogous relationship applies to the second country:

yt∗ = ȳ + β[ϕ(πt∗ − πt∗^e) + (1 − ϕ)(πt − πt^e)]   (2.42)

Now let ŷ be the policymakers' desired value ('bliss point') for output in both countries. This is interpreted as before as the full-employment level of output. We assume payoffs:

U = −a(yt − ŷ)² − πt²   (2.43)

U∗ = −a(yt∗ − ŷ)² − (πt∗)²   (2.44)

for the two countries. There are now two aspects to the problem: credibility and coordination. We focus here on the latter by not assuming rational expectations as before. Instead we simply assume expectations of inflation are exogenous. For convenience put πt^e = πt∗^e = 0, in which case (2.41) and (2.42) become

yt = ȳ + β[ϕπt + (1 − ϕ)πt∗] = ȳ + γ1πt + γ2πt∗;  γ2 < γ1   (2.45)

yt∗ = ȳ + β[ϕπt∗ + (1 − ϕ)πt] = ȳ + γ1πt∗ + γ2πt   (2.46)

say.

[Figure: the Hamada diagram in (π, π∗) space. Country 1's bliss point B is at (π, π∗) = (0, (ŷ − ȳ)/γ2) and country 2's bliss point B∗ is at ((ŷ − ȳ)/γ2, 0); the contract curve joins B and B∗ and crosses the 45° line at the cooperative point C, while the non-cooperative point NE lies off the contract curve.]

Figure 2.6: The Hamada Diagram.

NE=non-cooperative Nash equilibrium; C=cooperative equilibrium.

Now distinguish between coordinated and uncoordinated outcomes (i.e., cooperative and non-cooperative equilibria). Without cooperation each policymaker chooses its own instrument to minimise its own welfare loss subject to the constraint of the model and given the policy setting of the other bloc. This gives rise to reaction functions and, at their intersection (point NE on the Hamada Diagram, figure 2.6), a Nash Equilibrium. If policymakers cooperate then they must first choose a global welfare loss function. If the two blocs are identical (which we assume) then a natural choice is simply the sum of U and U∗. This gives outcomes on the contract curve. In equilibrium πt = πt∗ (by symmetry for identical blocs). Using the Hamada Diagram, or doing the algebra (see exercise and the sketch below), it can be shown that cooperation increases utility compared with non-cooperation, i.e., there are welfare gains from cooperation. Note that for the 'domestic' country to achieve its bliss point at yt = ŷ and πt = 0, from (2.45) it requires the 'foreign' country to obligingly set its inflation at πt∗ = (ŷ − ȳ)/γ2. Hence its bliss point is at πt = 0, πt∗ = (ŷ − ȳ)/γ2; similarly for the 'foreign' country.
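The welfare gain from cooperation can be verified numerically. A sketch under illustrative parameter assumptions (the values of a, β, ϕ and the output target are my own choices):

```python
# Nash versus cooperative monetary policy in the two-country game
# (2.43)-(2.46), with exogenous (zero) inflation expectations.

a, beta, phi = 0.5, 2.0, 0.75
y_bar, y_hat = 1.0, 1.2
g1, g2 = beta * phi, beta * (1 - phi)      # gamma1 > gamma2 since phi > 1/2

def U(pi, pi_star):                        # country 1's payoff, (2.43) with (2.45)
    y = y_bar + g1 * pi + g2 * pi_star
    return -a * (y - y_hat)**2 - pi**2

# Nash: iterate each country's best response (the FOC of U w.r.t. its own pi)
pi = pi_s = 0.0
for _ in range(500):
    pi, pi_s = (a * g1 * (y_hat - y_bar - g2 * pi_s) / (1 + a * g1**2),
                a * g1 * (y_hat - y_bar - g2 * pi) / (1 + a * g1**2))

# Cooperation: maximise U + U*; by symmetry both set the same inflation rate
pi_c = a * (g1 + g2) * (y_hat - y_bar) / (1 + a * (g1 + g2)**2)

print(U(pi, pi_s), U(pi_c, pi_c))   # approx -0.0068 vs -0.00667: cooperation wins
```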

Chapter 3

DYNAMIC GAMES OF COMPLETE INFORMATION

3.1 Introduction

We now turn from one-shot or static games to dynamic or multi-stage games which are played over time. We first assume that information is complete and perfect; i.e. each player's payoff function is common knowledge, and the player with the move knows the full history of the game, i.e. all the moves made, up to that point. Fudenberg and Tirole refer to a dynamic game with perfect information as a multi-stage game with observed actions. Section 2 sets out a general two-stage game of this form. Sections 3 and 4 study applications of this form of game to oligopoly and to union-firm wage and employment decisions respectively.

The rest of this topic concerns cooperative games. There are two issues here. First, in order to cooperate players must agree on how to "share the cake". This is a bargaining problem and is examined in section 5. The second issue is that, having agreed how to share the gains from cooperation, a mechanism is required to stop players from reneging on the agreement. Section 6 shows that in repetitions of one-shot games such as those studied in topic 1, a trigger strategy that in effect penalizes departure by either player from the cooperative agreement may be effective in sustaining that agreement.

3.2 A General Two-Stage Game with Observed Actions

In this section we examine a 2-stage game with complete and perfect information. We introduce two key concepts: backwards induction and subgame-perfection. The game is as follows:

Stage 1. Player 1 chooses a1 ∈ A1.

Stage 2. Player 2 observes a1 and chooses a2 ∈ A2.

The game then ends and payoffs are u1(a1, a2) and u2(a1, a2) for players 1 and 2 respectively. This is a very general class of game; two specific examples follow. The key features are (i) sequential moves, (ii) all previous moves are observed, and (iii) payoff functions and the rationality assumption are common knowledge. (i) and (ii) imply perfect information; (iii) implies complete information.

3.2.1 Solution by Backwards Induction

To solve this game we start at the end; i.e. we proceed by backwards induction. Player 2 then:

Chooses a2 to max u2(a1, a2) w.r.t. a2 ∈ A2, given a1.

Assume that this has a unique solution a2 = R2(a1). This is player 2's best response or reaction function, which can be thought of as a strategy that conditions the choice a2 on the observed move a1, whatever that turns out to be. Proceeding to stage 1, player 1 can anticipate this response (by the assumption of complete information and common knowledge) and therefore:

Chooses a1 to max u1(a1, R2(a1)) w.r.t. a1 ∈ A1.

Assume this optimization problem has a unique solution, denoted by a1∗. The strategy pair (a1∗, R2(a1)), where R2(a1) is a reaction function conditional on any value of a1, is the backwards-induction equilibrium (BIE) of the game. Since R2(a1) is the best response of player 2 to any a1, player 2 chooses a2∗ = R2(a1∗). The pair of moves (a1∗, a2∗) is called the backwards-induction outcome of the game.

Specific Example. Let A1 = A2 = {L, R}. Corresponding to the normal-form representation of a static one-shot game, dynamic games have an extensive-form representation which, in addition to specifying the players of the game, their strategy spaces and payoffs at each stage of the game, also specifies when each player has the move and the information set for each player when she makes the move. The extensive form of the present example can be represented by the following game tree. The game tree begins with a decision node for player 1, where 1 chooses between L and R. Then one of two decision nodes for player 2 is reached, depending on whether 1 has chosen L or R. Following player 2's choice a terminal node is reached and the payoffs indicated are received. Solving the game by backwards induction, player 2 plays the following strategy, s2∗ say:

28 1

L R

2 2

L RL R

(3,1) (1,2) (2,1) (0,0)

Figure 3.1: The Game Tree

If player 1 chooses R, choose L. If player 1 chooses L, choose R.

Given that player 2 follows s2∗, player 1 follows the strategy s1∗: Play R.

The backwards-induction equilibrium (BIE) of the game is the strategy pair (s1∗, s2∗). The backwards-induction outcome of the game is then: at stage 1 player 1 plays R; at stage 2 player 2 plays L. The payoff is (2, 1). The procedure is sketched in code below.
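The same backwards induction can be written as a short recursive sketch (the tree encoding is my own illustration, not from the notes):

```python
# Backwards induction on the game tree of figure 3.1. A terminal node is
# a payoff tuple; a decision node is (player, {move: subtree}).

tree = (1, {'L': (2, {'L': (3, 1), 'R': (1, 2)}),
            'R': (2, {'L': (2, 1), 'R': (0, 0)})})

def solve(node):
    """Return (payoffs, equilibrium path) of the subgame rooted at node."""
    if not isinstance(node[1], dict):     # terminal node: a payoff tuple
        return node, []
    player, moves = node
    results = {m: solve(sub) for m, sub in moves.items()}
    best = max(results, key=lambda m: results[m][0][player - 1])
    payoffs, path = results[best]
    return payoffs, [(player, best)] + path

print(solve(tree))   # -> ((2, 1), [(1, 'R'), (2, 'L')])
```

Note that the sketch reports only the equilibrium path; the full BIE strategy also records player 2's optimal move at the node off the path (choose R after L).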

3.2.2 Subgames and Credibility

For the repeated games considered later in this topic we need the concept of a subgame. For games of perfect information we define a subgame (loosely) as: the piece of the game that remains to be played starting at any node. Thus the subgames of the game above are the entire game and the two games shown in figure 3.2. Then, following Selten (1965) - cited in Gibbons - we require equilibrium strategies to be optimal in every subgame.

In our example s2∗ is the best response by player 2 to either L or R played by player 1. Therefore (s1∗, s2∗) is a BIE in both stage-2 subgames. Moreover s1∗ is the best response

29 2 2

(3,1) (1,2) (2,1) (0,0)

Figure 3.2: Subgames at Stage 2 of the Game

of player 1 to s2∗. Therefore (s1∗, s2∗) is a BIE for the entire game (which counts as a subgame). Not all strategy pairs are BIE. For example the strategy pair:

Player 1: s1′ = play L
Player 2: s2′ = play R (irrespective of what player 1 does)

constitutes a Nash equilibrium with payoffs (1, 2). But 'play R' is not a best response in the subgame on the right-hand side of figure 3.2 above. Therefore the strategy pair is not a BIE. One way of seeing this is that although playing R in all circumstances yields a better outcome for player 2, it lacks credibility: if player 1 plays R, then R is not the best response for player 2. Player 1 knows this (the assumption of common knowledge and complete information). Therefore player 2's strategy to play R even if 1 plays R is an empty threat and will not be believed by player 1. Thus the BIE concept ensures the credibility of the response at stage 2.

3.3 The Stackelberg Duopoly Model

This is a simple example of the general two-stage dynamic game of complete and perfect information set out in the previous section.

Stage 1: A dominant or leader firm 1 chooses output q1

Stage 2: A subordinate or follower firm 2 chooses output q2 knowing the history of the game, i.e. the choice of output q1 by firm 1.

To solve this game we proceed as before by backwards induction: we start with stage 2 and work out firm 2's optimal strategy given its observation of q1. Then, given firm 2's optimal strategy R2(q1), we proceed to stage 1 and calculate firm 1's optimal move. The resulting equilibrium, in which firm 2 plays the strategy R2(q1) at stage 2 and firm 1 plays q1∗ at stage 1, is a BIE because it is optimal in every part of the game that is left, i.e. in every subgame. To calculate this equilibrium we start by noting that R2(q1) is the reaction function derived previously in topic 1, i.e.,

R2(q1) = (a − c − bq1)/2b   (3.1)

where we assume identical firms, i.e. c1 = c2 = c. This solves stage 2 of the game. Proceeding to stage 1, in a complete information game firm 1 can anticipate (3.1) and therefore maximizes

π1 = (P − c)q1 = (a − b[q1 + q2] − c)q1 (3.2)

Substituting for q2 from (3.1) this becomes

π1 = (a − b[q1 + (a − c − bq1)/2b] − c) q1 = (q1/2)(a − bq1 − c)   (3.3)

which firm 1 maximizes with respect to q1. Differentiating and solving the first order condition it is straightforward to show:

Worked Example 1. Show that the resulting equilibrium is given by

q1∗ = (a − c)/2b;  q2∗ = R2(q1∗) = (a − c)/4b   (3.4)

i.e., firm 1 produces more and firm 2 less than in the Cournot equilibrium; total output is now Q = 3(a − c)/4b, which is greater than the Cournot total Q = 2(a − c)/3b. Hence Stackelberg leadership leads to a lower price and an outcome closer to the Pareto-efficient competitive outcome. A numerical version of this backwards induction is sketched below.
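A numerical sketch of the two stages (the parameter values a = 10, b = 1, c = 1 are illustrative assumptions):

```python
# Stackelberg by backwards induction: the leader maximises profit along
# the follower's reaction function (3.1).

a, b, c = 10.0, 1.0, 1.0

def R2(q1):                         # stage 2: follower's reaction function
    return (a - c - b * q1) / (2 * b)

def leader_profit(q1):              # stage 1 objective, eq (3.3)
    return q1 * (a - b * (q1 + R2(q1)) - c)

# a crude grid search stands in for the first-order condition
grid = [i * (a / b) / 10000 for i in range(10000)]
q1 = max(grid, key=leader_profit)
q2 = R2(q1)

print(q1, q2)                          # approx 4.5 and 2.25: (a-c)/2b and (a-c)/4b
print(q1 + q2, 2 * (a - c) / (3 * b))  # 6.75 > 6.0, the Cournot total
```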

Worked Example 2. Compare the profits of the leader and follower in the Stackelberg game with those in the Cournot duopoly game. Show that the outcomes in all these equilibria are inefficient.

Note that the leader, firm 1, benefits from being able to move first, i.e. from being able to commit itself to q1∗ before firm 2 chooses its output. This commitment is essential because, given q2∗, q1∗ is not the best response; i.e., in the absence of commitment, ex post firm 1 would want to do something else. There must be some mechanism that makes this commitment credible; e.g., firm 1 must plan ahead because of long-term legally enforced contracts with workers and suppliers.

31 3.4 The Monopoly Union Model

This is a model of union-firm relationships due to Leontief (1946) in which the union can set the wage1 but the firm is then free to choose employment. Gibbons provides a very thorough treatment, of which this is an outline. Assume that the firm operates in a competitive market facing a market price P. The sequencing of this game is:

Stage 1. The union sets the wage W .

Stage 2. The firm chooses employment L.

The payoffs are as follows. The union has a utility function U = U(W, L) with ∂U/∂W > 0 and ∂U/∂L > 0 (e.g. U = WL). The firm's utility is profits Π = PY(K, L) − WL, where the price P is given and we hold capital K fixed. Thus we can write Π(W, L) = R(L) − WL in terms of the moves W and L, where R stands for revenue.

Solving by backward induction:

Stage 2: The firm maximizes profits w.r.t. L given the wage. The first order condition is

W = R′(L)   (3.5)

i.e. the wage equals the marginal revenue product of labour (with P given, R′(L) = P × MPL).

Solving for L we arrive at the labour demand curve: L = g(W), say.

Stage 1: The union solves

max U(W, L) = U(W, g(W )) w.r.t W

The solution is illustrated graphically below, where we can also see that the outcome is inefficient. Point X is the BIE of the monopoly union model. Points in the shaded area are Pareto-superior to X. AB is the contract curve, i.e. the points at which the iso-profit curves and the union indifference curves are tangential (have common tangents). Example 2 of the tutorial exercises works out the BIE and the contract curve for particular functional forms for union utility and the production function; a numerical sketch with one such choice of functional forms follows the figure.

1There is no wage bargaining in this model, hence the term monopoly union; though because the firm has total control over employment it could equally be seen as a monopoly employer model.

[Figure: in (L, W) space, the labour demand curve W = R′(L), union indifference curves and iso-profit curves; the BIE at point X on the demand curve, and the contract curve AB.]

Figure 3.3: Solution of the Monopoly Union Game
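A minimal numerical version, for one illustrative choice of functional forms (these are my own assumptions, not necessarily those of the tutorial exercise): revenue R(L) = L^(1/2) and union utility U = (W − w0)L, where w0 is an outside wage.

```python
# Monopoly union solved by backwards induction for concrete functional
# forms: R(L) = L**0.5 and U = (W - w0) * L.

w0 = 0.5

def labour_demand(W):
    # Stage 2: the firm sets R'(L) = 0.5 * L**(-0.5) = W  =>  L = 1/(4 W^2)
    return 1.0 / (4.0 * W**2)

def union_utility(W):
    # Stage 1 objective: the union anticipates the firm's labour demand
    return (W - w0) * labour_demand(W)

grid = [0.51 + i / 1000 for i in range(2000)]
W = max(grid, key=union_utility)
print(W, labour_demand(W))   # W = 1.0 (= 2 * w0 from the FOC), L = 0.25
```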

3.5 Bargaining Theory: Rubinstein's Sequential Bargaining Model

Throughout this course we come across examples where agents benefit from cooperation2, provided that they can agree in advance on how to 'divide the cake' and provided that the incentive to cheat on the agreement can be overcome. The latter problem is addressed in the context of repeated games in section 6 below. Here we consider the former problem, the bargaining game. The solution we discuss is due to Rubinstein (1982; see G, page 68).

In Rubinstein (1982) two players must agree on how to share a fixed cake. In periods 1, 3, 5, ... player 1 proposes shares (s1, 1 − s1), (s3, 1 − s3), etc., which player 2 can accept or reject. If player 2 accepts any offer the game ends. Otherwise, if player 2 rejects the offer (s2k−1, 1 − s2k−1) in period 2k − 1, k = 1, 2, ..., then player 2 proposes an alternative share (s2k, 1 − s2k) in period 2k. If player 1 accepts, the game ends; and so on.

2Examples of where it is necessary to agree first on the distribution of the gains from cooperation are a cartel of oligopolists (how much should each produce?) and cooperation between the union and the firm in the previous game (where on the contract curve should the agreed (W, L) combination be?). Note that in the prisoners' dilemma there is a unique cooperative solution (S, S), so there is no need to bargain.

This is a dynamic game of perfect information. We now show that this game has a unique solution if the players are impatient and prefer to receive a given share now rather than in the next period. Each player then has to weigh the consequences of waiting for a possibly better offer against accepting the existing offer now. Suppose each player has a discount factor δ ∈ (0, 1), so that an offer δs for player 1 now is equivalent to an offer s next period.

First consider a 3-period bargaining game which ends in the third period with an imposed settlement (s, 1 − s). Proceed by backward induction.

Period 2: Player 2 offers a settlement (s2, 1 − s2). For player 1, s2 is acceptable iff s2 ≥ δs. Player 2 should then offer the lowest acceptable share, s2 = δs, iff 1 − δs ≥ δ(1 − s). Since 1 > δ this is certainly true. We conclude that the BIE settlement for period 2 is

(s2∗, 1 − s2∗) = (δs, 1 − δs)

Period 1: Player 1 offers a settlement (s1, 1 − s1). For player 2, 1 − s1 is acceptable iff 1 − s1 ≥ δ(1 − s2∗) = δ(1 − δs). The lowest offer player 1 can make is 1 − s1 = δ(1 − δs), which is acceptable for player 1 as well iff 1 − δ(1 − δs) ≥ δs2∗ = δ²s. This condition holds, so we conclude that

(s1∗, 1 − s1∗) = (1 − δ(1 − δs), δ(1 − δs))

Thus for our 3-period model we have:

s1∗ = 1 − δ + δ²s

We can generalize this result to any (odd) number of periods n. For n periods it is apparent that

s1∗ = 1 − δ + δ² − δ³ + · · · − δ^(n−2) + δ^(n−1)s

Let n → ∞; then for δ ∈ (0, 1)

s1∗ = 1/(1 + δ);  1 − s1∗ = δ/(1 + δ)

Further Reading. This reasoning is valid for large n, but cannot be strictly applied to the infinite-horizon case because there is no last period at which to start. G, page 71, and HHV describe Rubinstein's rigorous proof for the infinite-period case, which does not depend on this limiting argument. HHV also discuss other solutions to the bargaining game, including Nash's bargaining solution (not to be confused with a Nash equilibrium), and show the Nash and Rubinstein solutions to be the same. HHV also provide an interesting discussion of the strengths and weaknesses of Rubinstein's solution. The finite-horizon recursion can be checked numerically, as sketched below.
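The recursion s1∗(n) = 1 − δ + δ² s1∗(n − 2), with the imposed settlement as base case, is easy to verify in code (the values of δ and s below are illustrative assumptions):

```python
# Player 1's share in the n-period game (n odd), converging to 1/(1+delta).

delta, s_imposed = 0.9, 0.5

def s1(n):
    if n == 1:                    # only the imposed settlement (s, 1-s) remains
        return s_imposed
    return 1 - delta + delta**2 * s1(n - 2)

for n in (3, 11, 51, 201):
    print(n, s1(n))               # 0.505, then rapidly approaching ...
print(1 / (1 + delta))            # ... the limit 0.5263...
```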

34 3.6 Repeated Games.

3.6.1 The General Idea

A subset of dynamic games is that of repeated games, in which players face the same 'stage game' or 'constituent game' in every period. In particular the 'model' or physical environment is the same in each period; this rules out games involving the build-up of stocks over time or any learning process. The repeated games we study have the following features:

1. The moves made by players in each stage game are simultaneous.

2. The game is played indefinitely; ie the game is an infinitely repeated game.

3. Each player’s overall payoff from a sequence of moves with stage game payoffs π1, π2, π3, ··· , is measured as the weighted average:

U = (1 − δ)(π1 + δπ2 + δ²π3 + · · ·) = (1 − δ) Σ_{t=1}^∞ δ^(t−1) πt   (3.6)

where δ is the discount factor. The payoff defined by (3.6) has the convenient property that if πt = π for all t then U = π. Since this game is identical to one with sequential moves within each stage game where players do not observe each other's moves, it is referred to as a game of imperfect information. The assumption of infinitely repeated games is not as restrictive as it sounds. Consider the following exercise:

Worked Example 3 Suppose that the rate of discount of each player is r and the game ends with a probability p. If utility after the game has ended is 0, derive an effective discount factor δ to represent the average payoff in equation (3.6)

The general idea that runs through this topic is whether threats or promises about future behavior can influence current behavior in repeated games. In partic- ular can efficient outcomes such as (silence, silence) in the prisoners’ dilemma be enforced by such threats or promises, and can the latter be made ‘credible’ in a rigorous sense?

3.6.2 The Infinitely Repeated Prisoners' Dilemma

Consider the stage game in normal form:

                        Prisoner 2
                        SILENT        CONFESS
Prisoner 1   SILENT     (−3, −3)      (−20, −1)
             CONFESS    (−1, −20)     (−10, −10)

where we recall that the utility in the stage game is minus the sentence. As the game stands, the overall payoff from infinite repetitions is

U = (1 − δ)(1 + δ + δ² + · · ·) × (−10) = −10

since (Confess, Confess) = (C, C), say, is the equilibrium in strategies and the equilibrium outcome in every stage game. The central problem facing this game, and all the other games studied in this topic, is: can cooperation be enforced as a non-cooperative equilibrium of the infinitely repeated game?

Enforcing Cooperation as an Equilibrium

The key to the solution is that of a trigger strategy. Denote the efficient outcome of the stage game, (Silence, Silence), by (S,S). Consider the following strategy:

Play S in the first stage of the game. In the tth stage, if the outcome of all previous t − 1 stages was (S, S), play S. Otherwise play C in the tth stage and in all subsequent stages.

This strategy is called a trigger or ‘punishment’ strategy because player i cooperates (plays S) until the other player deviates from its cooperative move S. This triggers a switch to noncooperation for ever (’capital punishment’). Punishment strategies which last for a given finite number of periods are also possible, and these will be studied in the context of the monetary policy game. We need to examine:

1. The outcome of the game.

2. Whether the pair of strategies constitutes a Nash equilibrium of the game as a whole.

3. Whether the punishment threats are credible. For this we need a new equilibrium concept: a subgame perfect equilibrium (SPE).

Definition of Subgame Perfection

A Nash Equilibrium is subgame-perfect if the players’ strategies constitute a Nash equilibrium in every subgame.

[Game tree: at each stage players 1 and 2 choose between C and S; the path along which both always play S is marked as the equilibrium path.]

Figure 3.4: Extensive Form of the Repeated Prisoners’ Dilemma.

Consider the Prisoners' Dilemma in extensive form. The first two stages of the game are shown in extensive form in figure 3.4. Note that we can represent the game as sequential, where the simultaneity of moves introduces imperfect information. Decision nodes within the same information set are connected by a dotted line. Information sets with only one decision node are referred to as singleton information sets.

(1) The outcome. Clearly, since the players start the game by playing (S,S), this cooperative outcome will be played thereafter.

(2) To show the trigger strategy is a Nash equilibrium. We need to show that if player i adopts the strategy then it is player j’s best response to also adopt the strategy.

(a) Suppose player i has deviated from S at some previous stage. Since the trigger strategy calls for player i to play C forever, player j also plays C because (C,C) is a Nash equilibrium of the stage game and therefore C is the best response by j. (b) Suppose i has not deviated from S. Player j then may either play S and receive a payoff:

(1 − δ)(−3 − 3δ − 3δ² − 3δ³ + · · ·) = −3 (A)

or player j may deviate to C in the current period with player i still playing S, and play C thereafter along with player i. Since (C,C) is a Nash equilibrium of the stage game both players are responding to each other in an optimal fashion. The payoff is then:

(1 − δ)(−1 − 10δ − 10δ² + · · ·) = −(1 − δ) − 10δ (B)

Then S is a best response by j iff payoff (A) > payoff (B) ie

−3 > −(1 − δ) − 10δ, i.e. δ > 2/9

By symmetry S is also a best response by i. We conclude that for a sufficiently high discount factor, δ > 2/9, the strategies constitute a Nash equilibrium and the outcome is (S, S) at all times (including the first move) along the equilibrium path shown in figure 3.4.

(3) To show the strategies constitute a SPE. We have examined the incentive to deviate along the equilibrium path at subgames beginning at nodes a, a, . . .. The other class of subgames begin at nodes b, b, . . . after at least one deviation from the overall trigger strategy. Since (C,C) is a Nash equilibrium of the stage game, the trigger strategy is a Nash equilibrium in every one of these subgames. Hence 'play S along a, a, a, . . .; otherwise play C' constitutes a SPE. In short, the threat to revert to C if deviation has occurred previously, which we have shown deters any deviation (if δ > 2/9), is credible.
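As a numerical check of the 2/9 threshold, the following sketch (my own illustration; the payoffs are those of the stage game above) compares the average payoff (A) from cooperating with the deviation payoff (B) over a grid of discount factors:

# Compare cooperation payoff (A) with deviation payoff (B)
# in the infinitely repeated Prisoners' Dilemma.

def payoff_cooperate():
    return -3.0                       # (A): -3 in every stage

def payoff_deviate(delta):
    return -(1 - delta) - 10 * delta  # (B): -1 once, then -10 forever

for delta in [0.1, 0.2, 2/9, 0.3, 0.9]:
    better = payoff_cooperate() > payoff_deviate(delta)
    print(f"delta={delta:.3f}  (A)={payoff_cooperate():.2f}  "
          f"(B)={payoff_deviate(delta):.2f}  cooperate? {better}")
# Cooperation is a best response exactly when delta > 2/9.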

3.6.3 Collusion Between Two Duopolists

Recall the duopoly game: firms produce amounts of a homogeneous output q1 and q2. Total industry output is given by

Q = q1 + q2 (3.7) and the industry demand curve is given by

P (Q) = a − bQ = a − b(q1 + q2) (3.8)

For each firm assume a constant returns to scale cost function:

Ci(qi) = ciqi; i = 1, 2 (3.9)

This completes the model. For equal firms the Cournot-Nash Equilibrium is

qNE = (a − c)/(3b) (3.10)

whereas the 'equal shares' monopoly output³ is

qM = (a − c)/(4b) (3.11)

for each firm. The corresponding profit levels are:

πNE = (a − c)²/(9b);   πM = (a − c)²/(8b) (3.12)

We now examine the possibility of enforcing the efficient output level (from the firms' point of view) by means of the following trigger strategy:

Produce qM in the first period. In the tth period produce qM if both firms have produced qM in each of the previous (t − 1) periods; otherwise produce qNE.

We can show that if the discount factor is sufficiently high then this trigger strategy will enforce collusion as a SPE of a non-cooperative game (see tutorial and Gibbons, page 102).

Worked Example 4 If firm i plays qM show that the profit-maximizing output of firm j is given by qD = 3(a − c)/(8b), with deviation profits given by πD = 9(a − c)²/(64b) > πM.
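The outputs and profits quoted in Worked Example 4 can be verified numerically. The sketch below uses arbitrary illustrative values for a, b and c of my own choosing; the deviation output is computed from the reaction function qj = (a − c − bqi)/(2b) of the constant-cost duopoly:

# Verify the outputs and profits in the collusion example (illustration).
a, b, c = 10.0, 1.0, 2.0

q_ne = (a - c) / (3 * b)            # Cournot-Nash output (3.10)
q_m  = (a - c) / (4 * b)            # 'equal shares' monopoly output (3.11)
q_d  = (a - c - b * q_m) / (2 * b)  # best response to q_m: 3(a-c)/(8b)

def profit(qi, qj):
    return (a - b * (qi + qj) - c) * qi

pi_ne = profit(q_ne, q_ne)          # (a-c)^2/(9b)
pi_m  = profit(q_m, q_m)            # (a-c)^2/(8b)
pi_d  = profit(q_d, q_m)            # 9(a-c)^2/(64b)

print(q_d, 3 * (a - c) / (8 * b))          # both 3.0
print(pi_d, 9 * (a - c) ** 2 / (64 * b))   # both 9.0
print(pi_d > pi_m > pi_ne)                 # True: deviating pays for one period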

3.6.4 The Barro-Gordon (1983) Monetary Policy Game

We have studied the stage game played between the monetary authority (i.e., central bank) and the private sector in an economic environment described by the expectations augmented Phillips Curve (EAPC):

y = ȳ + β(π − πe) (3.13)

where y is output (in logs), ȳ is the 'natural rate' of output, and π, πe are the inflation rate and expectations of the inflation rate respectively. The authorities have an

³Strictly speaking this would not be agreed as a result of Rubinstein bargaining. For δ close to unity, however, the latter gets very close to equal shares.

ambitious target for output ŷ > ȳ and a social welfare function (i.e., payoff) per period of the form

U = −a(y − ŷ)² − π² (3.14)

which is a utility function with bliss points at y = ŷ and π = 0. The Nash equilibrium is obtained as:

π = πe = aβ(ŷ − ȳ); y = ȳ (3.15)

which is clearly Pareto-inefficient. Both players, the CB and the private sector, can be made better off if the CB sets inflation at zero and the private sector forecasts correctly. This is the social optimum (SO):

π = πe = 0; y = ȳ (3.16)

Compared with the social optimum, the Nash equilibrium gives an 'inflationary bias' without any corresponding output gain. The CB could announce and intend to follow the SO; but given πe = 0 it is optimal for the CB to set inflation along its reaction function and not implement π = 0. In a complete information Nash equilibrium the private sector anticipates all this, so πe is given by the Nash equilibrium, in which case it is optimal for the CB also to set π at the Nash equilibrium value. This is the credibility problem facing the CB. How can the credibility problem be solved? It is formally equivalent to the problem facing prisoners in the prisoners' dilemma, duopolists trying to enforce collusion, and employers facing the shirking problem. The solution is through the mechanism of the following trigger strategy in a repeated game. Let π = πNE be the Nash equilibrium of the game. The trigger strategy that will enforce the efficient outcome (zero inflation) is then:

The CB: Play π = 0 in the first period and thereafter if the history of play is π = πe = 0. Otherwise play πNE.

The Private Sector: Play πe = 0 in the first period and thereafter if the history of play is π = πe = 0. Otherwise play πe = πNE.

Does this strategy constitute a subgame perfect Nash equilibrium? To show this we proceed as before to prove that (1) the strategies constitute a Nash equilibrium; and (2) they are subgame perfect. The crucial part is the proof of (1) and the 'no deviation' condition for the CB. Consider the game with CB payoff (3.14), which we write in the form U(π, πe). Suppose that the private sector chooses to play its trigger strategy. Is it optimal for the CB to do likewise? If it does, the discounted payoff over time is

U(0, 0)[1 + δ + δ² + . . .] = U(0, 0)/(1 − δ) (3.17)

The alternative, given πe = 0 (since the private sector is playing its trigger strategy), is to deviate by playing π = πd. The optimal deviation, found along the CB's reaction function, is

πd = βa(ŷ − ȳ)/(1 + β²a) (3.18)

which leads to a payoff U(πd, 0) for one period. Thereafter the private sector plays its trigger strategy and we move into the punishment phase of the game. With πe = πNE the optimal policy for the CB is to play π = πNE too. The discounted utility following deviation is therefore

U(πd, 0) + U(πNE, πNE)[δ + δ² + . . .] = U(πd, 0) + U(πNE, πNE)[δ/(1 − δ)] (3.19)

The crucial no-deviation condition for a SPE is then that (3.17) ≥ (3.19) (see (2.3.9) in Gibbons). Tutorial example 6 asks you to complete the details of this no-deviation condition. The course assignment then requires you to show that both conditions (1) and (2) above for a SPE are met.
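Without working through the tutorial algebra, the two discounted payoffs (3.17) and (3.19) can at least be evaluated numerically. The sketch below uses illustrative parameter values of my own choosing, with U(π, πe) given by (3.14) and y by (3.13):

# Evaluate (3.17) and (3.19) numerically for trial parameters (illustration).
a, beta, ybar, yhat, delta = 1.0, 1.0, 0.0, 1.0, 0.6

def U(pi, pi_e):
    y = ybar + beta * (pi - pi_e)        # the EAPC (3.13)
    return -a * (y - yhat) ** 2 - pi ** 2

pi_ne = a * beta * (yhat - ybar)                        # Nash inflation (3.15)
pi_d  = beta * a * (yhat - ybar) / (1 + beta ** 2 * a)  # optimal deviation (3.18)

stick   = U(0, 0) / (1 - delta)                               # (3.17)
deviate = U(pi_d, 0) + U(pi_ne, pi_ne) * delta / (1 - delta)  # (3.19)
print(stick, deviate, stick >= deviate)   # -2.5 -3.5 True: no deviation here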

Chapter 4

GAMES OF INCOMPLETE INFORMATION

4.1 Static Games of Incomplete Information

We now turn to games of incomplete information or Bayesian games. We first restrict ourselves to static games. The idea is best illustrated by the following example:

Cournot Duopoly under Asymmetric Information

Consider example 2 of the first tutorial sheet: two duopolists produce a homogeneous good and choose outputs q1 and q2. The market-clearing price is given by P = a − b(q1 + q2). The firms are not identical in that their cost functions differ and are given by Ci(qi) = ciqi, i = 1, 2. Let b = 1 (as in Gibbons) to simplify things. Suppose there are many 'types' of firm distinguished by their efficiency. Efficient firms produce at low cost with low marginal cost c. Inefficient firms have a high marginal cost. Each firm's type is known to itself but not necessarily to other firms. In our duopoly example suppose that firm 1's type is known to firm 2, i.e. c1 = c where c is common knowledge, but firm 2's type is not known to firm 1. In general firm 2 could be one of many types, but we follow Gibbons and simplify the problem to one where firm 2 is one of only two types, with either c2 = cL or c2 = cH (low or high marginal cost). Firm 2 knows its type but firm 1 has 'beliefs' that c2 = cH with probability θ and c2 = cL with probability 1 − θ. Now consider the formal definitions of a static Bayesian game and a Bayesian Nash Equilibrium:

Definitions: The normal-form representation of an n-player static Bayesian game specifies the players':

• action spaces A1,A2,...,An

• their type spaces T1,T2,...,Tn. Player i’s type tiϵTi is known by player i.

• beliefs of player i, pi1, pi2, . . . , pin, about the other players' types.

• their payoffs u1, u2, . . . , un where ui = ui(a1, a2, . . . , an; ti); aiϵAi

Then the strategies (s1*, s2*, . . . , sn*) are a pure-strategy Bayesian Nash Equilibrium (BNE) if, for each player i and for each of its types, the strategy maximizes that player's expected payoff, given the strategies of the other players, where expectations are formed over all other players' types based on player i's beliefs.

In our example aiϵAi is the choice of output qiϵ[0, ∞), T1 = {c}, T2 = {cL, cH}, and p12 is the discrete probability distribution Pr(t2 = low-cost type) = 1 − θ, Pr(t2 = high-cost type) = θ. Payoffs are π1(q1, q2; c) for firm 1, and π2(q1, q2; cL) for firm 2 if it is of type t2 = cL and π2(q1, q2; cH) if it is of type t2 = cH.

Now let us solve the asymmetric duopoly game for a Bayesian Nash equilibrium. We seek an output q1* for firm 1, and outputs q2*(cL) for firm 2 of the low-cost type and q2*(cH) of the high-cost type. Neither of firm 2's types faces uncertainty regarding firm 1: the high-cost type chooses q2 to solve

max w.r.t. q2:  π2 = [(a − q1* − q2) − cH]q2 (4.1)

given q1*, and the low-cost type chooses q2 to solve

max w.r.t. q2:  π2 = [(a − q1* − q2) − cL]q2 (4.2)

given q1*. Firm 1 does not know its competitor's type and must therefore maximize its expected payoff over the two possible types of firm 2; i.e. it solves

max w.r.t. q1:  E(π1) = θ[(a − q1 − q2*(cH)) − c]q1 + (1 − θ)[(a − q1 − q2*(cL)) − c]q1 (4.3)

given q2*(cH) and q2*(cL). The first-order conditions for these three optimization problems are:

q2*(cH) = (a − q1* − cH)/2 (4.4)

q2*(cL) = (a − q1* − cL)/2 (4.5)

q1* = [θ(a − q2*(cH) − c) + (1 − θ)(a − q2*(cL) − c)]/2 (4.6)

This gives three equations for q2*(cH), q2*(cL) and q1*. Solving gives the following Bayesian Nash Equilibrium:

q2*(cH) = (a − 2cH + c)/3 + (1 − θ)(cH − cL)/6 (4.7)

q2*(cL) = (a − 2cL + c)/3 − θ(cH − cL)/6 (4.8)

q1* = [a − 2c + θcH + (1 − θ)cL]/3 (4.9)
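The closed forms (4.7)-(4.9) can be checked by solving the three linear first-order conditions (4.4)-(4.6) directly. A small sketch, with made-up parameter values, iterates on the FOCs and compares the result with the formulas:

# Solve the FOCs (4.4)-(4.6) by fixed-point iteration and compare
# with the closed-form BNE (4.7)-(4.9). Parameter values are invented.
a, c, cL, cH, theta = 10.0, 2.0, 1.5, 3.0, 0.4

q1 = q2H = q2L = 1.0
for _ in range(200):    # the linear system converges quickly
    q2H = (a - q1 - cH) / 2
    q2L = (a - q1 - cL) / 2
    q1  = (theta * (a - q2H - c) + (1 - theta) * (a - q2L - c)) / 2

print(q2H, (a - 2 * cH + c) / 3 + (1 - theta) * (cH - cL) / 6)   # 2.15 2.15
print(q2L, (a - 2 * cL + c) / 3 - theta * (cH - cL) / 6)         # 2.9  2.9
print(q1,  (a - 2 * c + theta * cH + (1 - theta) * cL) / 3)      # 2.7  2.7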

Worked Example 1 Compare this with the complete information Nash Equilibrium:

qi* = (a − 2ci + cj)/3; i, j = 1, 2; i ≠ j (4.10)

What do you notice? Work out the conditions for each firm to produce positive output.

4.2 The Principal-Agent Problem and the Market for Lemons

The general framework economists use for analyzing asymmetric information is the principal-agent model. This is a game where the two players are the principal and the agent, usually considered as representative of some group. In the archetypal principal-agent model the principal hires an agent to perform some task, but this is generalised to cover any relationship between players with different objectives. The crucial feature is the possible existence of asymmetric information: the agent acquires an informational advantage about, for example, his 'type' (eg, his payoff), or his actions, or the outside world (eg, the 'moves by nature'). The principal is the relatively uninformed player and the agent is the relatively informed player. Now distinguish between two departures from complete and perfect information. Players may not observe all the moves of other players: this is a game of imperfect information and poses a hidden action problem. Players may not know the type of player they are facing: this is a game of incomplete information and poses a hidden information problem. Moral Hazard and Adverse Selection refer to principal-agent problems. If the principal cannot observe the actions of the agent, ie there is hidden action, this is a moral hazard problem. If the principal does not know the agent's type, ie there is hidden information, this is an adverse selection problem. Insurance involves both problems: the agent is the policyholder who insures against theft. The hidden action is the care taken to avoid theft, which is not observed by the insurance company (the principal). Health insurance involves adverse selection: both healthy and unhealthy people may take out insurance, but the insurance company may not be able to observe their type. Note that these definitions are not consistently used in the literature; for example Rasmusen, section 6.1, adopts a more complicated classification.

4.2.1 Adverse Selection and the Market for 'Lemons'

Consider a second-hand car market in which for simplicity we assume there are only two types of cars: poor quality cars ('lemons') and good quality cars ('plums'). Under complete information cars sell at their average cost: c1 for plums and c2 for lemons, where high quality plums cost more than low quality lemons; i.e., c1 > c2.

Now assume incomplete information in which the quality of the car is not observed. Formally the buyer is the principal, who does not know whether the seller, the agent, is a type who sells lemons or plums. Suppose that a proportion 0 < b < 1 of cars are lemons and (1 − b) are plums, and that this is common knowledge. Assume that a car produced at cost c, which also represents its value to the consumer, is offered at a price p. The typical buyer then has a utility U = c − p if she makes the purchase and a utility of zero otherwise. The problem is that the cost c is not observed. Following the pioneers of game theory, the buyer adopts a von Neumann-Morgenstern expected utility function:

E(U) = E(c) − p = bc2 + (1 − b)c1 − p (4.11)

for a given observed price. Then the price must satisfy p ≤ E(c) for the buyer to benefit from the exchange. But since E(c) < c1, at any such price p < c1 and only lemons will be on offer! Sellers will not be willing to part with plums for less than c1 and only lemons will get sold. The game tree below illustrates the game: the dotted line indicates a decision point where the player has incomplete information; the double lines indicate two possible equilibria. The equilibria are (i) 'nature' chooses the lemon with probability b; the buyer offers the maximum price p = E(c), which is accepted by the seller; (ii) 'nature' chooses the plum with probability 1 − b; the buyer offers the maximum price p = E(c), which is rejected by the seller. The source of this market failure is incomplete information and an externality between the sellers of good and bad cars. This is our first example of adverse selection.
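A quick numerical illustration of this unravelling (the numbers are invented): with c1 = 10, c2 = 4 and b = 0.5 we have E(c) = 7 < c1, so no offer acceptable to the buyer can tempt a plum owner to sell:

# Adverse selection in the lemons market (illustrative numbers).
c1, c2 = 10.0, 4.0      # value/cost of a plum and a lemon
b = 0.5                 # proportion of lemons

E_c = b * c2 + (1 - b) * c1      # buyer's expected value = 7
p = E_c                          # highest price the buyer will offer

print(p >= c2)   # True: lemon owners are happy to sell
print(p >= c1)   # False: plum owners withdraw, so only lemons trade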

4.3 Bayesian Learning using Bayes' Rule

In the market for lemons, the buyer (the less informed player, or principal) makes the offer first and there is no scope for learning about the seller's type. If the seller (the relatively informed player) made an offer, this would be a signal that the buyer could use to draw inferences about the seller's type. This leads to signalling games, the subject of the rest of this topic. Let us now consider beliefs in more detail. Player i's belief describes i's uncertainty about the n − 1 other players' types. In our familiar notation this is written: t−i = (t1, t2, . . . , ti−1, ti+1, . . . , tn). The Bayesian Nash Equilibrium is Bayesian only in the sense that players form beliefs of other players' types conditional on their own type, pi(t−i|ti), according to Bayes' rule. Bayes' rule is a rational way to update beliefs. Suppose that a player starts with a particular prior belief of an event A, p(A) say. She then observes another event, B say, which provides more information enabling her to update to the posterior belief p(A|B), i.e., the probability of A conditional on observing B. To do this she uses p(B|A), which is the likelihood of observing B given that A actually occurred, and

[Game tree: Nature chooses a lemon (probability b) or a plum (probability 1 − b); the buyer B offers p = E(c); the seller S accepts or rejects, accepting a lemon offer since p ≥ c2 and rejecting a plum offer since p < c1.]

Figure 4.1: Game Tree for the Market for Lemons

p(B|Ac), which is the probability of observing B given that A has not occurred (note that Ac is the complement of A, i.e., 'not A'). Now consider the probability of both A and B occurring, p(A, B) (sometimes written p(A ∩ B)). We know that

Pr(A, B) = Pr(A|B) Pr(B) = Pr(B|A) Pr(A) (4.12)

Note that (4.12) is no more than the definitions of the conditional probabilities p(A|B), p(B|A). From (4.12) we have that

Pr(A|B) = Pr(B|A) Pr(A)/Pr(B) (4.13)

which is the so-called Bayes' Rule. The denominator p(B) is found by noting that

Pr(B) = Pr(B|A) Pr(A) + Pr(B|Ac) Pr(Ac) (4.14)

The player then uses her prior p(A) and likelihoods p(B|A), p(B|Ac) to form an update p(A|B).

To see the usefulness of Bayes' Theorem consider an example which I take from Binmore, 'Fun and Games'. In a multiple choice test a candidate chooses from m possibilities for a particular answer. Each candidate is one of two types: either he is completely ignorant and chooses at random, or he is a 'good' candidate and knows the correct answer. The examiner has a belief that the proportion of good candidates is g. (We take this as given and don't go into the reasons for this belief). Suppose a particular candidate got the correct answer. What is the probability that he is in fact an ignorant candidate and was guessing? The prior that a particular candidate is ignorant is p(ignorant) = 1 − g. The examiner can now use Bayes' rule to update her prior:

p(ignorant|correct) = p(correct|ignorant)p(ignorant)/p(correct)

The examiner knows p(correct|ignorant) = 1/m. The denominator is found from

p(correct) = p(correct|ignorant)p(ignorant) + p(correct|good)p(good) = (1/m)(1 − g) + 1 · g

noting that p(correct|good) = 1. Hence

p(ignorant|correct) = (1 − g)/(1 − g + gm) < 1 − g

which shows that Bayesian updating will decrease the prior probability of ignorance if the candidate answers correctly. (To get some feel for this result calculate it with numbers; see the sketch below.) For most static Bayesian games we assume that the players' types are independent, in which case pi(t−i|ti) = pi(t−i) and Bayes' rule in its full form is not required. This is also true of the market for lemons with the buyer offering first. But for the dynamic games of incomplete information examined in the next section this is no longer the case.
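Taking up the invitation to put in numbers, the sketch below evaluates the posterior for, say, m = 4 choices per question and a proportion g = 0.5 of good candidates (values chosen purely for illustration):

# Posterior probability of ignorance after a correct answer.
def p_ignorant_given_correct(m, g):
    prior = 1 - g                  # p(ignorant)
    likelihood = 1 / m             # p(correct | ignorant)
    p_correct = likelihood * prior + 1.0 * g   # law of total probability
    return likelihood * prior / p_correct      # Bayes' rule

m, g = 4, 0.5
print(p_ignorant_given_correct(m, g))
# 0.2 = (1-g)/(1-g+gm): down from the prior of 0.5, as the formula predicts.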

4.4 Dynamic Games of Incomplete Information with Bayesian Learning

This topic introduces our final equilibrium concept: a perfect Bayesian equilibrium (PBE). The equilibrium concept in the regulation game was a simple PBE without Bayesian learning. We now consider the general form of this equilibrium. Gibbons provides an excellent introduction to this concept which you should read carefully. Also go back to my introduction to this course to see how the four equilibrium

concepts (Nash, subgame-perfect Nash, Bayesian Nash and perfect Bayesian Nash) apply to different classes of games. The table from this introduction is reproduced below.

Information Structure   Time Aspect   Equilibrium Concept
Complete                Static        Nash
Complete                Dynamic       Subgame-Perfect
Incomplete              Static       Bayesian Nash
Incomplete              Dynamic       Perfect Bayesian

4.4.1 Introduction to Perfect Bayesian Equilibria

Consider the example in Gibbons, shown in extensive form (figure 4.2) and normal form (table 4.1). This is a game in which player 1 chooses from an action space A1 = {L, M, R}. If R is chosen the game ends with payoffs (1, 3); otherwise player 2 learns that R was not chosen, but not whether L or M was chosen. Player 2 then chooses from an action space A2 = {L′, R′} and the game ends. Note that the dotted line indicates that player 2 does not know which node she is at.

[Tree: player 1 chooses L, M or R; after L or M, player 2 chooses L′ or R′ at a single information set; payoffs as in table 4.1.]

Figure 4.2: The Game in Extensive Form

      L′        R′
L     (2, 1)    (0, 0)
M     (0, 2)    (0, 1)
R     (1, 3)    (1, 3)

Table 4.1: The Game in Normal Form

There are two pure-strategy Nash equilibria for this game: (L, L′) and (R, R′). However if player 2 gets a move she will choose L′ because, given that player 1 has not chosen R, L′ dominates R′. Therefore player 1 should choose L to ensure a payoff of 2 at (L, L′), greater than 1 at (R, R′). The equilibrium (R, R′) can only be reached if player 2 makes the non-credible (in fact incredible) threat to always play R′. Can we then rule out this implausible equilibrium by an appeal to subgame perfectness? The answer is no, because a subgame must begin at a singleton information set (i.e., one with only one decision node) by definition. Since a subgame cannot begin at the game's first decision node (also by definition) it follows that the game has no subgames. The requirement that the players' strategies constitute a Nash equilibrium on every subgame is therefore satisfied trivially: both Nash equilibria are trivially subgame perfect equilibria of the dynamic game. Thus a 'bad equilibrium' escapes elimination by what amounts to a technicality. To get around this problem we introduce beliefs regarding which node we have reached. Formally, the concept of a perfect Bayesian equilibrium (PBE) imposes the following three requirements:

1. At each information set the player with the move must have a belief regarding which node has been reached.

2. Given their beliefs at each information set the current move and subsequent strategies must be optimal given the beliefs and subsequent strategies of the other players.

3. Beliefs are determined by Bayes’ rule and players’ equilibrium strategies.

Note that Gibbons introduces a 4th requirement that distinguishes between information sets on the equilibrium path (reached with positive probability if the game is played according to the equilibrium strategies) and information sets off the equilibrium path (certain not to be reached if equilibrium strategies are played). However requirements 1 to 3 capture the essence of the perfect Bayesian equilibrium concept. Returning to the game above, do these requirements eliminate the bad equilibrium? The answer is that requirements 1 and 2 alone, which insist that players have beliefs and act optimally given these beliefs, are sufficient to rule out (R, R′). Suppose that, following a move by player 1 which is not R, player 2 believes 1 has played L

with probability p and M with probability 1 − p. Then her expected payoff from playing L′ is p × 1 + (1 − p) × 2 = 2 − p. Her expected payoff from playing R′ is p × 0 + (1 − p) × 1 = 1 − p. Since 2 − p > 1 − p for all p, player 2 will never play R′. Hence player 1 will play L and the only perfect Bayesian equilibrium (PBE) is (L, L′). Requirement 3 insists that these beliefs are consistent with the equilibrium strategies, which means they must be updated using Bayes' rule. To illustrate this aspect suppose that there were a mixed-strategy equilibrium in which player 1 plays L with probability q1, M with probability q2 and R with probability 1 − q1 − q2. (In fact (L, L′) is the only PBE so it turns out that q1 = 1, q2 = 0). Given this equilibrium strategy, player 2 would form a belief following a move other than R by player 1 given by

p(L|not R) = p(not R|L)p(L)/p(not R) = (1 × q1)/(q1 + q2) = q1/(q1 + q2) > q1 (because q1 + q2 < 1)

To really see the 'Bayesian' aspect of a PBE we need to turn to signalling games. These are principal-agent games where the agent (the well-informed player) makes moves which act as signals for the less-informed player, the principal. Gibbons provides a general description of this class of game (section 4.2A) and works through three applications: Spence's (1973) model of job-market signalling, Myers and Majluf's (1984) model of corporate investment and capital structure, and Vickers' (1986) and Barro's (1986) model of monetary policy. We focus on the first and last of these applications.

4.4.2 Spence's Model of Job-Market Signalling

Consider the following education game with adverse selection. Firms employ workers of which there are two types: high ability types with output per worker aH > 0, and low ability types with output per worker 0 < aL < aH. Workers know their own ability but employers do not have this information. However employers do observe education levels e, which we assume to take the discrete values e = 0 or e = 1. The sequence of moves in this game (the game tree, or extensive form, is shown below) is as follows:

1. Nature chooses the ability of the worker: a = aH with probability ν0 and a = aL with probability 1 − ν0.

2. The worker chooses an education level e = 0 or 1.

3. The employer observes education e, but not ability a. The prior ν0 is updated to ν1 using Bayes' Rule. The employer then offers a contract contingent on education, w = w(e): namely wL = w(0) for the low education worker and wH = w(1) for the high education worker.

4. The worker accepts or rejects the contract.

5. Payoffs are profits πfirm = a − w for the firm, and the wage minus the cost of education, πworker = w − αe/a, for the worker, if the contract is accepted, and zero otherwise.

[Game tree: Nature chooses a = aL (probability 1 − ν0) or a = aH (probability ν0); the worker W chooses e = 0 or e = 1; the employer E offers wL = w(0) or wH = w(1); the worker accepts, giving payoffs (a − w, w − αe/a), or rejects, giving (0, 0).]

Figure 4.3: Game Tree for the Spence Education Game

The crucial feature of this setup is that the cost of achieving a particular level of education e for the high type of worker, αe/aH, is less than αe/aL, the corresponding cost for the low type. This means that education sends a signal to the employer which, we now show, can enable the latter to distinguish high ability from low ability workers. To complete the description of the game we need to add two further features: participation constraints (PC) for the firm and worker, and the incentive compatibility constraint (IC) for the worker. For the firm we assume that competition forces expected profits to zero (there is no capital in the model), so that at stage 3 above, when the firm offers the wage, we have the participation constraint of the firm (PC-firm) as

PC-firm: E(πfirm) ≥ 0 (4.15)

If the industry in question is also competitive this constraint must bind. The worker knows her type and will only participate if the actual payoff is not negative; ie we have

PC-worker: πworker = w − αe/a ≥ 0 (4.16)

This completes the description of the game. The appropriate equilibrium concept for this game is the Perfect Bayesian Equilibrium (PBE). Let us consider the Bayesian aspect first. For a particular observed education level e, Bayes' Rule says

ν1(e) = Pr(high ability type|e)
      = Pr(e|high ability type) Pr(high ability type)/Pr(e)
      = Pr(e|high ability type)ν0/[Pr(e|high ability type)ν0 + Pr(e|low ability type)(1 − ν0)] (4.17)

Depending on values for aL, aH and α, we will show that a separating perfect Bayesian equilibrium can exist where high ability workers are educated and low ability workers are not. In such an equilibrium if the employer observes an educated worker e = 1 it can deduce that the worker is of the high ability type. If the employer observes an uneducated worker e = 0 it can deduce that the worker is of the low ability type. Formally we can obtain this using (4.17). If the employer observes a highly educated worker (e = 1) it will substitute Pr(e = 1|high type) = Pr(e = 0|low type) = 1 and Pr(e = 1|low type) = Pr(e = 0|high type) = 0 into (4.17) to give

ν1(1) = (1 × ν0)/(1 × ν0 + 0) = 1 (4.18)

and if it observes a lowly educated worker it will put e = 0 and obtain

ν1(0) = (0 × ν0)/(0 × ν0 + 1 × (1 − ν0)) = 0 (4.19)

Again depending on values for aL, aH and α, we can show that a pooling perfect Bayesian equilibrium can exist where no workers are educated. In this case the employer will substitute Pr(e = 0|high type) = Pr(e = 0|low type) = 1 into (4.17) to give

ν1(0) = (1 × ν0)/(1 × ν0 + 1 × (1 − ν0)) = ν0 (4.20)

i.e., no learning will occur. To investigate which of these PBE exist we need to consider the incentives for the worker to invest in education or not, in the form of her

(IC) constraint. In a separating equilibrium, low ability workers will choose e = 0 and the contract designed for that type, wL = w(0), rather than the high education contract, e = 1 and wH = w(1), iff the following IC holds:

IC-low type: wL − α × 0/aL > wH − α × 1/aL (4.21) and will actually accept wL if the following participation constraint holds

PC-low type: wL > α × 0/aL = 0 (4.22)

Similarly the high ability workers will choose e = 1 and the contract designed for that type, wH = w(1), rather than the low education contract, e = 0 and wL = w(0), iff the following IC holds:

IC-high type: wH − α × 1/aH > wL − α × 0/aH = wL (4.23)

and will actually accept wH if the following participation constraint holds

PC-high type: wH > α × 1/aH (4.24)

From the firm's PC we must have wH = aH and wL = aL. Hence the conditions for a separating equilibrium become:

IC-low type: aL > aH − α/aL (4.25)

PC-low type: aL > 0 (4.26)

IC-high type: aH − α/aH > aL (4.27)

PC-high type: aH > α/aH (4.28) The IC conditions combine to give

IC Conditions: α/aL > aH − aL > α/aH (4.29)

which implies (4.28). Condition (4.26) of course always holds. Hence condition (4.29) is necessary and sufficient for a separating PBE to exist. As the parameter α determining the cost to the worker of acquiring education increases, condition (4.29) becomes less likely to hold and the separating PBE then does not exist. What happens then? A Pooling Equilibrium in which both types choose zero education exists iff

IC-low type: wL − α × 0/aL = wL > wH − α × 1/aL (4.21′)

IC-high type: wL − α × 0/aH = wL > wH − α × 1/aH (4.30)

We have seen that in a pooling equilibrium ν1 = ν0 and no learning takes place. The firm's PC constraint then implies that the firm offers the average marginal product to both types of worker:

w = wH = wL = ν0aH + (1 − ν0)aL (4.31)

This offer clearly satisfies the IC conditions (4.21′) and (4.30). The workers' PC condition (4.16) is also satisfied in this proposed equilibrium with e = 0. Both types of worker will choose e = 0 in preference to e = 1, and accept the offer (4.31). A pooling equilibrium in which workers choose zero education and receive the same wage therefore always exists, in addition to a possible separating equilibrium. The latter equilibrium is conditional on parameters satisfying (4.29).
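Condition (4.29) is easy to check for given parameters. The sketch below (with invented numbers) classifies the equilibria and reports the associated wages, wL = aL and wH = aH in the separating case and (4.31) in the pooling case:

# Classify the Spence signalling equilibria for given parameters (illustration).
def classify(aL, aH, alpha, nu0):
    separating = alpha / aL > aH - aL > alpha / aH    # condition (4.29)
    pooled_wage = nu0 * aH + (1 - nu0) * aL           # pooling wage (4.31)
    if separating:
        return f"separating PBE exists: wL={aL}, wH={aH}; pooling wage={pooled_wage}"
    return f"only the pooling PBE: w={pooled_wage}"

print(classify(aL=1.0, aH=2.0, alpha=1.5, nu0=0.3))   # 1.5 > 1 > 0.75: separating
print(classify(aL=1.0, aH=2.0, alpha=4.0, nu0=0.3))   # education too costly: pooling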

Worked Example 2 Work out the payoffs for the two players for the separating and pooling equilibria and comment on these results.

We have investigated only equilibria where workers adopt pure strategies. In games of incomplete information hybrid equilibria may exist where workers choose mixed strategies. Low ability workers may then randomize, choosing the low education wage contract w(0) with probability x and the high education wage contract with probability 1 − x. Similarly the high ability type may choose the high education contract with probability y and the low education contract with probability 1 − y. Hybrid strategies feature in our next example of a dynamic game with incomplete information.

4.4.3 A Reputational Model of Monetary Policy

Let us return to the Barro-Gordon policy game. We use the simpler of the two central bank payoff functions considered. The game is then summarized by

The model: y = ȳ + β(π − πe) (4.32)

where y is output (in logs), ȳ is the 'natural rate' of output, and π, πe are the inflation rate and expectations of the inflation rate respectively. The moves and payoffs are given by:

Player     CB                    PS
Moves      π                     πe
Payoffs    U = ay − (1/2)π²      −(π − πe)²

Table 4.2: The Monetary Policy Game

The Nash equilibrium is then π = πe = βa; y = ȳ. Suppose now that there are two types of government: a 'tough' government who only cares about inflation and has a payoff with a = 0, and a 'weak' government who has a > 0. In a Nash equilibrium of the single-shot game a tough government sets inflation at zero and a weak government sets non-zero inflation as before. The game is now one of incomplete

information and we examine the possibility that in repetitions of the one-shot game zero inflation can be a perfect Bayesian outcome. Let pt be the probability assigned by the private sector to the event that the government is of the strong type. Regard pt as a measure of reputation. Consider the following hybrid strategy by the weak government: act as a weak government with probability qt but mimic a strong government with probability 1 − qt. For a strong government πt = 0 whilst for a weak government π = βa. Hence private sector expectations of inflation, given pt and qt, are:

πte = pt × 0 + (1 − pt)(qtβa + (1 − qt) × 0) = βa(1 − pt)qt (4.33)

The probability pt is updated by Bayes rule in period t + 1 as follows.

• Suppose that the government plays πt > 0 in period t. Since a strong gov- ernment always plays zero inflation the private sector then knows that the government must be of the weak type and therefore sets pt+1 = 0.

• Suppose that the government plays πt = 0 in period t. The private sector then infers that the government might be strong or it might be a weak government mimicking a strong government with probability 1 − qt . By Bayes rule the first possibility is given by

Pr(tough|πt = 0) = Pr(πt = 0|tough) Pr(tough)/Pr(πt = 0) (4.34)

On the rhs of (4.34) we know that Pr(tough) = pt (the prior at time t) and that Pr(πt = 0|tough) = 1. The denominator is given by

Pr(πt = 0) = Pr(πt = 0|tough) Pr(tough) + Pr(πt = 0|weak) Pr(weak) = 1 × pt + (1 − qt)(1 − pt) (4.35)

Hence if πt = 0 we have the updated posterior probability given by

pt+1 = pt/[1 − (1 − pt)qt] > pt if qt ≠ 0 (4.36)

Otherwise if πt > 0 , pt+1 = 0. What this means is that a government that plays zero inflation builds up a reputation for being tough irrespective of its type as long as there is a non-zero probability that weak governments do not play zero inflation. Now consider the following strategy profile.

• A strong government always plays πt = 0 .

• In period t a weak government acts as a strong government and plays πt = 0 with probability 1−qt or reveals its type with probability qt and plays πt = βa.

• At the beginning of period 0 the private sector chooses its prior p0 > 0. In period t the private sector receives the 'signal' consisting of inflation from the government. At the end of the period it updates pt, the probability it assigns to the government being tough, using Bayes rule as set out above, and forms expectations of the next period's inflation using (4.33).

In principle there are three types of equilibria to these games. If both strong and weak governments send the same message (i.e. play zero inflation) we have a pooling equilibrium. If they send different messages this gives a separating equilibrium. If one or more players randomizes with a mixed strategy we have a hybrid equilibrium. Thus in the above game, qt = 0 gives a pooling equilibrium, qt = 1 a separating equilibrium and 0 < qt < 1 a hybrid equilibrium.

Worked Example 3 Set out the first two periods of the game in extensive form.

If qt = 0 is a PBE to this game we have solved the credibility problem. Then the weak government always mimics the tough government and plays zero inflation. Zero inflation can then be sustained even by a weak government. As with our repeated games under complete information it is sufficient to show that given these beliefs by the private sector there is no incentive for a weak government to ever deviate from acting strong. The no deviation condition is derived as follows. Write the payoff of the weak government as

U(π, πe) = aβ(π − πe) − (1/2)π² (4.37)

Then U(0, 0) = 0, U(βa, 0) = (1/2)β²a² and U(βa, βa) = −(1/2)β²a². Then the discounted payoff from playing strong is

Us = U(0, 0)(1 + δ + δ² + . . .) = U(0, 0)/(1 − δ) (4.38)

Ud = U(βa, 0) + U(βa, βa)(δ + δ² + δ³ + . . .) = U(βa, 0) + U(βa, βa)δ/(1 − δ) (4.39)

A sufficient condition for no deviation and a PBE is then Us > Ud (see tutorial question below). If this condition is satisfied, then the private sector knows that qt = 0 (ie there is no incentive for a weak government to reveal its type) and pt+1 = pt = p0 > 0. Reputation does not change, but as long as p0 > 0 the above profile with qt = 0 is a PBE in which a weak government mimics a strong government and plays zero inflation. We have examined games with an infinite time horizon. For finite time horizons the PBE is more complicated. Then a weak government always deviates to

high inflation in the last period and reveals its type. In the periods just before the last period it will randomize, and at the beginning of the game it mimics a tough government and plays zero inflation with probability one (see Barro (1986) cited in Gibbons).
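The following sketch (illustrative parameter values of my own) evaluates Us and Ud from (4.38)-(4.39) and traces the Bayesian updating (4.36) of reputation while zero inflation is observed; it illustrates the mechanics and is not a substitute for the tutorial question:

# Reputation dynamics and the no-deviation check (illustrative values only).
beta, a, delta = 1.0, 1.0, 0.6

def U(pi, pi_e):
    return a * beta * (pi - pi_e) - 0.5 * pi ** 2     # payoff (4.37)

Us = U(0, 0) / (1 - delta)                                         # (4.38)
Ud = U(beta * a, 0) + U(beta * a, beta * a) * delta / (1 - delta)  # (4.39)
print(Us, Ud, Us > Ud)       # 0.0 -0.25 True: no incentive to deviate here

p, q = 0.2, 0.3              # invented prior reputation and revelation probability
for t in range(5):           # zero inflation observed each period
    p = p / (1 - (1 - p) * q)        # Bayes update (4.36): reputation rises
    print(f"period {t}: p = {p:.3f}")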

Chapter 5

BOUNDED RATIONALITY: THEORY AND EXPERIMENTS

5.1 Introduction

This section addresses a number of related issues. First there is the alleged Achilles heel of game theory (and indeed economic theory in general), which is the fundamental assumption of rationality and common knowledge. (But recall my defence of the assumption). We now modify this assumption with one of 'bounded rationality' which assumes that economic agents learn by trial-and-error. We study adaptation or adjustment or tatonnement or 'learning-by-doing' processes. Binmore uses the term libration as a generic term for these equilibrating processes. Second, we discuss the experimental examination of the rationality postulate.

5.2 Less than Rational Adjustment Processes

5.2.1 The Cournot Adjustment Process

Recall the Cournot duopoly model: 2 firms produce amounts of a homogeneous output q1 and q2. Let total industry output be given by

Q = q1 + q2 (5.1) and let the industry demand curve be given by a linear relationship

P (Q) = a − bQ = a − b(q1 + q2) (5.2)

For each firm assume an identical cost function for producing a given level of output at given factor prices:

Ci(qi) = cqi + dqi²; i = 1, 2 (5.3)

If d > 0 we have decreasing returns to scale; if d < 0 we have increasing returns to scale. This leads to the following reaction functions:

R1(q2) = (a − c − bq2)/(2(b + d)) (5.4)

R2(q1) = (a − c − bq1)/(2(b + d)) (5.5)

for firms 1 and 2 respectively, and to a symmetric Nash equilibrium

q* = (a − c)/(3b + 2d) (5.6)

Under the assumption of rationality and common knowledge both duopolists will produce on their reaction functions, and know the other is producing on her reaction function. Both will therefore produce at the symmetric Nash equilibrium, if this is the only equilibrium. In fact we will show that if there are increasing returns to scale there can be other Nash equilibria. Now drop the assumption of common knowledge. Then the duopolists are not fully rational in the sense that they fail to take into account the rationality of the other producer. However they continue to be profit-maximizers given their expectations of the other producer's output; i.e., they produce on their reaction functions. An example, taken from Cournot (1838), is the following ad hoc forecasting rule for the other duopolist's output: in period t duopolist i thinks that duopolist j will produce at the level observed in the previous period. Let qit be the output of producer i at time t. Then with this assumption the dynamic reaction functions become:

q1t = R1(q2,t−1) = (a − c − bq2,t−1)/(2(b + d)) (5.7)

q2t = R2(q1,t−1) = (a − c − bq1,t−1)/(2(b + d)) (5.8)

These two equations describe a dynamic process with trajectories q11, q12, q13, · · · for the output of firm 1 and q21, q22, q23, · · · for the output of firm 2. The question is: do these trajectories converge? First consider the possibility that the process converges to the symmetric Nash Equilibrium q*. The Nash equilibrium is at the intersection of the reaction functions and therefore satisfies:

q* = (a − c − bq*)/(2(b + d)) (5.9)

Now subtract (5.9) from (5.7) and (5.8) to get

q1t − q* = −[b/(2(b + d))](q2,t−1 − q*) (5.10)

q2t − q* = −[b/(2(b + d))](q1,t−1 − q*) (5.11)

Hence lagging (5.11) by one period and substituting into (5.10) we arrive at a difference equation for q1t

q1t − q* = [b²/(4(b + d)²)](q1,t−2 − q*) (5.12)

Let xt = q1t − q*. Then (5.12) is of the form:

xt = αxt−2 (5.13)

where α = b²/(4(b + d)²). This has the solution at time t = 2T:

x2T = α^T x0 (5.14)

where x0 is the initial point of the process. As T → ∞, x2T → 0 and q1,2T → q*, the symmetric Nash equilibrium, iff |α| < 1, ie iff b/(2(b + d)) < 1. For the case of decreasing returns to scale (d > 0) this will always be true. For increasing returns (d < 0), if 2d < −b then α > 1 and the process does not converge to the symmetric Nash equilibrium. The figures below show these two cases.

Figure 5.1 shows the case 2d > −b, which is certainly true under decreasing returns to scale (d > 0). The reaction functions intersect at the Nash equilibrium q* only, and starting at any initial output the Cournot adjustment process converges to this equilibrium. In other words players learn to act as if they had common knowledge in a complete information game. In figure 5.2, 2d < −b, and increasing returns apply to the extent that the reaction curves 'swap over' and intersect at points where one firm produces nothing, at the points r* and p*. There are now three Nash equilibria: two asymmetric Nash equilibria r*, p*, as well as the symmetric Nash equilibrium q*. At points like r0 the process converges to r*; at points like p0 to p*; and at points like s0 it oscillates. The lessons to be drawn from this model are:

1. Players can 'learn to be rational' in the sense that even with simple and even stupid forecasting rules they eventually end up at a complete information Nash equilibrium.

[Phase diagram in (q1t, q2t) space: the two reaction functions intersect once at q*; starting from q0 = (q10, q20) the adjustment path q1, q2, . . . converges to q*.]

Figure 5.1: The Case 2d > −b; q1 = (q11, q21), q2 = (q12, q22).

2. When selecting an equilibrium from possible multiple equilibria, one must look at the learning process, the question of stability and at initial conditions.
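The adjustment process (5.7)-(5.8) is easy to simulate. The sketch below (arbitrary parameter values of my own, with d > 0 so that 2d > −b) shows convergence to the symmetric Nash equilibrium (5.6) from an arbitrary starting point:

# Simulating the Cournot adjustment process (5.7)-(5.8) (illustration).
a, b, c, d = 10.0, 1.0, 2.0, 0.5     # d > 0: decreasing returns to scale

def react(q_other):
    return (a - c - b * q_other) / (2 * (b + d))

q1, q2 = 0.5, 6.0                    # arbitrary starting outputs
for _ in range(25):
    q1, q2 = react(q2), react(q1)    # both react to last period's output

q_star = (a - c) / (3 * b + 2 * d)   # symmetric Nash equilibrium (5.6)
print(q1, q2, q_star)                # all approximately 2.0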

5.2.2 A More Rational Adjustment Process

Consider the following simultaneous move game in normal form:

Strategies   t1        t2
s1           (0, 2)    (3, 0)
s2           (2, 1)    (1, 3)

It is clear that there are no pure-strategy Nash equilibria. We will now show that there is a Nash equilibrium in mixed strategies in which players randomize. (See also Gibbons, chapter 1, section 1.3). Consider mixed strategies in which player 1 plays s1 with probability p and s2 with probability 1 − p. Player 2 plays t1 with probability q and t2 with probability 1 − q. The expected payoff for player 1 is then:

q(p × 0 + (1 − p) × 2) + (1 − q)(p × 3 + (1 − p) × 1) = p(2 − 4q) + 1 + q

[Phase diagram for the case 2d < −b: the reaction functions intersect at the asymmetric equilibria r* and p* (where one firm produces nothing) as well as at the symmetric equilibrium q*; paths from r0 converge to r*, paths from p0 converge to p*, and paths from s0 oscillate.]

Figure 5.2: The Case 2d < −b.

Maximizing this with respect to p leads to player 1 choosing p = 1 if 2 > 4q (ie q < 1/2) and p = 0 if 2 < 4q (ie q > 1/2). For player 2 the expected payoff is

p(q × 2 + (1 − q) × 0) + (1 − p)(q × 1 + (1 − q) × 3) = q(4p − 2) + 3(1 − p)

Maximizing this with respect to q leads to player 2 choosing q = 1 if 4p > 2 (ie p > 1/2) and q = 0 if 4p < 2 (ie p < 1/2). Figure 5.3 shows these mixed-strategy reaction curves in (p, q) space. They intersect at (1/2, 1/2), which is the unique Nash equilibrium in mixed strategies. Suppose now that the game is played repeatedly by players with bounded rationality in the sense that common knowledge of rationality no longer holds. Instead players observe the choices of their opponent and estimate the probabilities p and q (for players 2 and 1 respectively) using the observed frequencies with which these strategies occur. This way of guessing what an opponent will do is certainly not fully rational, but neither is it naive as in the Cournot adjustment process.

[Diagram in (p, q) space: player 1's reaction function jumps from p = 0 to p = 1 at q = 1/2, and player 2's jumps from q = 0 to q = 1 at p = 1/2; they intersect at the mixed-strategy NE (1/2, 1/2). The shaded region A is the quadrant p < 1/2, q < 1/2.]

Figure 5.3: Mixed Strategy Reaction Functions and the Nash Equilibrium

Consider the shaded area A in figure 5.3. Everywhere in region A player 1 plays s1 with probability p = 1 and player 2 plays t1 with probability q = 0 (ie, plays t2). Let p(t) be the frequency with which player 1 has played s1 up to time t. Let q(t) be the frequency with which player 2 has played t1 up to time t. Consider the small interval of time from t to t + τ. If (p(t), q(t)) lies within A then for τ sufficiently small (p(t + τ), q(t + τ)) also lies within A. Let λ be the number of repetitions of the game per unit of time. Up to time t player 1 will then have played s1 λtp(t) times. Between t and t + τ he will always play s1 in region A, λτ times. Hence the revised frequency with which s1 is played is

p(t + τ) = (λtp(t) + λτ)/(λ(t + τ)) = (tp(t) + τ)/(t + τ) (5.15)

Hence

[p(t + τ) − p(t)]/τ = [tp(t) + τ − p(t)(t + τ)]/(τ(t + τ)) = (1 − p(t))/(t + τ) (5.16)

Letting τ → 0 we have:

dp/dt = (1 − p(t))/t (5.17)

By a similar argument, since player 2 plays t2 in A we have

q(t + τ) = (λtq(t) + λτ × 0)/(λ(t + τ)) = tq(t)/(t + τ) (5.18)

Hence

[q(t + τ) − q(t)]/τ = [tq(t) − q(t)(t + τ)]/(τ(t + τ)) = −q(t)/(t + τ) (5.19)

Letting τ → 0 we have:

dq/dt = −q(t)/t (5.20)

giving us two differential equations in p and q. Fortunately they are straightforward to solve: (5.17) can be written

d(tp)/dt = 1 (5.21)

which by integrating has the solution

tp = t − a; ie, 1 − p = a/t (5.22)

where a is a constant of integration. Similarly (5.20) can be written

d(tq)/dt = 0 (5.23)

which by integrating has the solution

tq = b; ie, q = b/t (5.24)

where b is a constant of integration. Then dividing (5.22) by (5.24) we arrive at:

(1 − p)/q = a/b (5.25)

For given a and b, the points (p, q) that satisfy this equation lie on a straight line that passes through the point (1, 0). This describes the learning process for (p, q) in the region A. In region B, by a similar argument, (p, q) must lie on a straight line passing through (1, 1). Then in region C the trajectory must head for (0, 1) and in D for (0, 0), etc. (see tutorial example 1). It is clear that this process converges to the Nash equilibrium in mixed strategies; again, using a more sophisticated learning process than before, players eventually act as if they had common knowledge in a complete information game.
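This frequency-based learning rule is essentially fictitious play, and it can be simulated directly. In the sketch below (my own illustration) each player best-responds to the opponent's empirical frequency; the frequencies approach the mixed-strategy NE (1/2, 1/2):

# Fictitious-play-style simulation of the frequency learning process.
# Payoffs of the 2x2 game above: rows are s1, s2; columns are t1, t2.
U1 = [[0, 3], [2, 1]]     # player 1's payoffs
U2 = [[2, 0], [1, 3]]     # player 2's payoffs

n1 = [1, 1]               # counts of player 1's past plays of (s1, s2)
n2 = [1, 1]               # counts of player 2's past plays of (t1, t2)

def best_reply_1(q):      # best response against frequency q of t1
    e = [q * U1[i][0] + (1 - q) * U1[i][1] for i in range(2)]
    return 0 if e[0] > e[1] else 1

def best_reply_2(p):      # best response against frequency p of s1
    e = [p * U2[0][j] + (1 - p) * U2[1][j] for j in range(2)]
    return 0 if e[0] > e[1] else 1

for _ in range(100000):
    p, q = n1[0] / sum(n1), n2[0] / sum(n2)
    m1, m2 = best_reply_1(q), best_reply_2(p)
    n1[m1] += 1
    n2[m2] += 1

print(n1[0] / sum(n1), n2[0] / sum(n2))   # both approach 1/2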

[Unit square in (p, q) space divided into quadrants A (bottom left), B (bottom right), C (top right) and D (top left); the learning trajectory passes through the regions towards the mixed-strategy NE.]

Figure 5.4: Adjustment towards the Nash Equilibrium

5.3 The Real World: Experimental Game Theory

See Hargreaves Heap and Varoufakis, chs 7,8. Binmore summarizes the results of experimental game theory by asserting that rationality is a reasonable empirical postulate about human behavior only if

1. The game is simple.

2. The game has been played many times before, creating the possibility of trial-and-error learning.

3. The incentives for playing well are adequate.

Note, however, that these conditions are necessary but not sufficient, so even when all three criteria are satisfied, 'game-theoretic predictions can only realistically be applied with great caution' (Binmore, page 51). Andreoni and Miller (1993) present experiments designed to examine PBE in the finitely repeated Prisoners' Dilemma with incomplete information. The single-shot game with complete information is set out in tutorial example 2 below. They

suppose that the altruism parameter in this set-up is private information. Their results support the reputation-building predictions of a PBE (and hence indirectly the rationality postulate) and also suggest that altruism does exist.
