Multi-agent learning

Gerard Vreeswijk, Intelligent Software Systems, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Wednesday 19th May, 2021

Fictitious play: motivation


■ One of the most important, if not the most important, representatives of a … .
■ Rather than considering your own payoffs, … .
■ Project the behaviour of an opponent onto a (mixed) strategy, s.
■ Then issue a best response to s.
■ Brown (1951): explanation for Nash equilibrium play. In terms of current use, the name is actually a bit of a misnomer, since play actually occurs (Berger, 2005).
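In symbols (a standard formulation of this update rule; the count notation w_t is ours, not from the slide): if player i has observed the opponent profile a^{-i} being played w_t(a^{-i}) times up to round t, then the opponent model s_t^{-i} and the next action are

\[
  s_t^{-i}(a^{-i}) \;=\; \frac{w_t(a^{-i})}{\sum_{b^{-i}} w_t(b^{-i})},
  \qquad
  a_{t+1}^{i} \;\in\; \operatorname*{arg\,max}_{a^{i}} \, u^{i}\bigl(a^{i}, s_t^{-i}\bigr).
\]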

Plan for today

Part I. Best reply strategy
1. Pure fictitious play.
2. Results that connect pure fictitious play to Nash equilibria.

Part II. Extensions and approximations of fictitious play
1. Smoothed fictitious play.
2. Exponential regret matching.
3. No-regret property of smoothed fictitious play (Fudenberg et al., 1995).
4. Convergence when players have limited resources (Young, 1998).

Shoham et al. (2009), Multi-agent Systems, Ch. 7: "Learning and Teaching".
H. Young (2004), Strategic Learning and its Limits, Oxford UP.
D. Fudenberg and D.K. Levine (1998), The Theory of Learning in Games, MIT Press.

Part I: Pure fictitious play

Repeated coordination game


Players receive a positive payoff iff they coordinate. This game possesses three Nash equilibria, viz. (0,0), (0.5, 0.5), and (1,1). * = chosen randomly.

Round  A's action  B's action  A's beliefs  B's beliefs
0.     —           —           (0, 0)       (0, 0)
1.     L*          R*          (0, 1)       (1, 0)
2.     R           L           (1, 1)       (1, 1)
3.     L*          R*          (1, 2)       (2, 1)
4.     R           L           (2, 2)       (2, 2)
5.     R*          R*          (2, 3)       (2, 3)
6.     R           R           (2, 4)       (2, 4)
7.     R           R           (2, 5)       (2, 5)
...    ...         ...         ...          ...
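The belief updates and best responses in this table can be reproduced with a short simulation. The sketch below is a minimal illustration written for these notes (not part of the original slides): beliefs are stored as raw counts of the opponent's past actions, and ties are broken uniformly at random, matching the '*' entries above.

    import random

    # 2x2 coordination game: both players get 1 iff they choose the same action.
    ACTIONS = ["L", "R"]

    def payoff(a, b):
        return 1 if a == b else 0

    def best_response(beliefs):
        """Best response against the opponent's empirical counts; ties broken randomly."""
        expected = {a: sum(payoff(a, b) * beliefs[b] for b in ACTIONS) for a in ACTIONS}
        best = max(expected.values())
        return random.choice([a for a in ACTIONS if expected[a] == best])

    # Beliefs = counts of the opponent's past actions, initialised to (0, 0) as in round 0.
    beliefs_A = {"L": 0, "R": 0}   # A's model of B
    beliefs_B = {"L": 0, "R": 0}   # B's model of A

    for t in range(1, 8):
        a = best_response(beliefs_A)
        b = best_response(beliefs_B)
        beliefs_A[b] += 1
        beliefs_B[a] += 1
        print(t, a, b, (beliefs_A["L"], beliefs_A["R"]), (beliefs_B["L"], beliefs_B["R"]))

After some rounds of possible miscoordination, the run locks into one of the pure equilibria (L, L) or (R, R) with probability 1 over the tie-breaks, just as the table locks into R from round 5 onwards.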


Some types of Nash equilibria


■ Nash equilibrium. No party wins by unilateral deviation.
■ Strict. Every party loses by unilateral deviation, i.e. B(s) = {s}. Opposite: weak.
■ Pure. Every party maintains a pure strategy. Opposite: mixed (one, more, or all may be pure), and completely mixed (all mixed).
■ Strong. No coalition wins by unilateral deviation. Implies Pareto-optimality. Opposite: ? (Not weak: a Nash equilibrium can be both strong and weak, either, or neither.)
■ Stable. No party wins by a small unilateral deviation; moreover, the one who deviates then loses. Opposite: unstable.

There are many more equilibrium types (semi-strict, locally stable, ε-equilibrium, correlated, coarse correlated, ...). Which equilibrium types imply which other equilibrium types is an interesting question.

Steady states are pure (but possibly weak) NE

Definition (Steady state). Let S be a dynamic process. A state s* is said to be steady (or stationary, or absorbing) if it is the case that whenever S is in state s*, it remains so.

Theorem. Suppose a pure strategy profile is a steady state of fictitious play. Then it must be a (possibly weak) pure Nash equilibrium in the stage game.

Proof. Suppose a = (a^1, ..., a^n) is a steady state. Let 1 ≤ i ≤ n be the index of an arbitrary player. At every round, i's opponent model is a^{-i}. By definition of fictitious play, player i must have played a best response, i.e., a^i ∈ BR(a^{-i}). Since i was arbitrary, this holds for every player i. The action profile a is therefore a Nash equilibrium. □


Pure strict Nash equilibria are steady states


Theorem. Suppose a pure strategy profile is a strict Nash equilibrium of a stage game. Then it must be a steady state of fictitious play.

Notice the use of terminology:
■ "pure strategy profile" for Nash equilibria.
■ "steady state" for fictitious play.

Proof. Suppose the pure action profile a is a strict Nash equilibrium. Suppose a is played at round t. Because a is strict, a^i is the unique best response to a^{-i}, for each i. Therefore, the action profile a will be played again in round t + 1. □

Summary of the two theorems:

Strict and pure Nash ⇒ Steady state ⇒ Pure Nash.

What if all equilibria are mixed?

Example. Matching Pennies. This is a zero-sum game. A's goal is to have the pennies matched; B's goal is the opposite.

Round  A's action  B's action  A's beliefs  B's beliefs
0.     —           —           (1.5, 2.0)   (2.0, 1.5)
1.     T           T           (1.5, 3.0)   (2.0, 2.5)
2.     T           H           (2.5, 3.0)   (2.0, 3.5)
3.     T           H           (3.5, 3.0)   (2.0, 4.5)
4.     H           H           (4.5, 3.0)   (3.0, 4.5)
5.     H           H           (5.5, 3.0)   (4.0, 4.5)
6.     H           H           (6.5, 3.0)   (5.0, 4.5)
7.     H           T           (6.5, 4.0)   (6.0, 4.5)
8.     H           T           (6.5, 5.0)   (7.0, 4.5)
...    ...         ...         ...          ...
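To read the belief columns as mixed strategies, normalise the counts. The helper below is illustrative (not from the slides); applied to A's model of B after round 8 it gives roughly (0.57, 0.43), i.e. the empirical frequencies drift towards the mixed equilibrium (0.5, 0.5) even though the realised actions move in long runs.

    def to_mixed_strategy(counts):
        """Normalise fictitious-play belief counts, e.g. (#Heads, #Tails), into frequencies."""
        total = sum(counts)
        return tuple(c / total for c in counts)

    print(to_mixed_strategy((6.5, 5.0)))   # A's model of B after round 8: ~(0.565, 0.435)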

Frequencies of fictitious play

Convergent empirical distribution of strategies


Theorem. If the empirical distribution of strategies converges in fictitious play, then it converges to a Nash equilibrium.

Proof. Let i be arbitrary. If the empirical distribution converges to q, then i's opponent model converges to q^{-i}. By definition of fictitious play, q^i ∈ BR(q^{-i}). Because of convergence, all such (mixed) best replies converge along. By definition q is a Nash equilibrium. □

Remarks:
1. The q^i may be mixed.
2. It actually suffices that the q^{-i} converge asymptotically to the actual distribution (Fudenberg & Levine, 1998).
3. If the empirical distributions converge (hence, converge to a Nash equilibrium), the actually played responses per stage need not be Nash equilibria of the stage game.

Empirical distributions converge to Nash ⇏ stage Nash


Coordination game. Players receive payoff 1 iff they coordinate, else 0.

Round  A's action  B's action  A's beliefs  B's beliefs
0.     —           —           (0.5, 1.0)   (1.0, 0.5)
1.     B           A           (1.5, 1.0)   (1.0, 1.5)
2.     A           B           (1.5, 2.0)   (2.0, 1.5)
3.     B           A           (2.5, 2.0)   (2.0, 2.5)
4.     A           B           (2.5, 3.0)   (3.0, 2.5)
...    ...         ...         ...          ...

■ This game possesses three equilibria, viz. (0,0), (0.5, 0.5), and (1,1), with expected payoffs 1, 0.5, and 1, respectively.
■ The empirical distribution of play converges to (0.5, 0.5), but with payoff 0, rather than 0.5.
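The payoff gap in the last bullet is easy to check. The snippet below (illustrative only) compares the average realised payoff of the alternating play in the table with the expected payoff of the mixed equilibrium (0.5, 0.5).

    # Rounds 1-4 of the table: the players miscoordinate every round, so each realised payoff is 0.
    realized = [0, 0, 0, 0]
    print(sum(realized) / len(realized))   # 0.0

    # Expected payoff of the mixed equilibrium (0.5, 0.5): coordinate on the first or the second action.
    p = 0.5
    print(p * p + (1 - p) * (1 - p))       # 0.5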


Empirical distr. of play does not need to converge


Rock-paper-scissors. Winner receives payoff 1, else 0.

■ Rock-paper-scissors with these payoffs is known as . ■ Shapley’s game possesses one equilibrium, viz. (1/3,1/3,1/3), with expected payoff 1/3.

Round A’s action B’s action A’s beliefs B’s beliefs 0. (0.0, 0.0, 0.5)(0.0, 0.5,0.0) 1. Rock Scissors (0.0, 0.0, 1.5)(1.0,0.5,0.0) 2. Rock Paper (0.0, 1.0,1.5)(2.0,0.5,0.0) 3. Rock Paper (0.0, 2.0,1.5)(3.0,0.5,0.0) 4. Scissors Paper

Author: Gerard Vreeswijk. Slides last modified on May 19th,2021at16:50 Multi-agentlearning:FictitiousPlay,slide 13 Empirical distr. of play does not need to converge

Rock-paper-scissors. Winner receives payoff 1, else 0.

■ Rock-paper-scissors with these payoffs is known as . ■ Shapley’s game possesses one equilibrium, viz. (1/3,1/3,1/3), with expected payoff 1/3.

Round A’s action B’s action A’s beliefs B’s beliefs 0. (0.0, 0.0, 0.5)(0.0, 0.5,0.0) 1. Rock Scissors (0.0, 0.0, 1.5)(1.0,0.5,0.0) 2. Rock Paper (0.0, 1.0,1.5)(2.0,0.5,0.0) 3. Rock Paper (0.0, 2.0,1.5)(3.0,0.5,0.0) 4. Scissors Paper (0.0, 3.0,1.5)(3.0, 0.5, 1.0)

Author: Gerard Vreeswijk. Slides last modified on May 19th,2021at16:50 Multi-agentlearning:FictitiousPlay,slide 13 Empirical distr. of play does not need to converge

Rock-paper-scissors. Winner receives payoff 1, else 0.

■ Rock-paper-scissors with these payoffs is known as . ■ Shapley’s game possesses one equilibrium, viz. (1/3,1/3,1/3), with expected payoff 1/3.

Round A’s action B’s action A’s beliefs B’s beliefs 0. (0.0, 0.0, 0.5)(0.0, 0.5,0.0) 1. Rock Scissors (0.0, 0.0, 1.5)(1.0,0.5,0.0) 2. Rock Paper (0.0, 1.0,1.5)(2.0,0.5,0.0) 3. Rock Paper (0.0, 2.0,1.5)(3.0,0.5,0.0) 4. Scissors Paper (0.0, 3.0,1.5)(3.0, 0.5, 1.0) 5. Scissors Paper

Author: Gerard Vreeswijk. Slides last modified on May 19th,2021at16:50 Multi-agentlearning:FictitiousPlay,slide 13 Empirical distr. of play does not need to converge

Rock-paper-scissors. Winner receives payoff 1, else 0.

■ Rock-paper-scissors with these payoffs is known as . ■ Shapley’s game possesses one equilibrium, viz. (1/3,1/3,1/3), with expected payoff 1/3.

Round A’s action B’s action A’s beliefs B’s beliefs 0. (0.0, 0.0, 0.5)(0.0, 0.5,0.0) 1. Rock Scissors (0.0, 0.0, 1.5)(1.0,0.5,0.0) 2. Rock Paper (0.0, 1.0,1.5)(2.0,0.5,0.0) 3. Rock Paper (0.0, 2.0,1.5)(3.0,0.5,0.0) 4. Scissors Paper (0.0, 3.0,1.5)(3.0, 0.5, 1.0) 5. Scissors Paper (0.0, 4.0,1.5)(3.0, 0.5, 2.0)

Author: Gerard Vreeswijk. Slides last modified on May 19th,2021at16:50 Multi-agentlearning:FictitiousPlay,slide 13 Empirical distr. of play does not need to converge

Rock-paper-scissors. Winner receives payoff 1, else 0.

■ Rock-paper-scissors with these payoffs is known as Shapley's game.
■ Shapley's game possesses one equilibrium, viz. (1/3, 1/3, 1/3), with expected payoff 1/3.

Round   A's action   B's action   A's beliefs       B's beliefs
0.      -            -            (0.0, 0.0, 0.5)   (0.0, 0.5, 0.0)
1.      Rock         Scissors     (0.0, 0.0, 1.5)   (1.0, 0.5, 0.0)
2.      Rock         Paper        (0.0, 1.0, 1.5)   (2.0, 0.5, 0.0)
3.      Rock         Paper        (0.0, 2.0, 1.5)   (3.0, 0.5, 0.0)
4.      Scissors     Paper        (0.0, 3.0, 1.5)   (3.0, 0.5, 1.0)
5.      Scissors     Paper        (0.0, 4.0, 1.5)   (3.0, 0.5, 2.0)
...

Repeated Shapley game: phase diagram

[Figure: phase diagram of repeated fictitious play on Shapley's game, drawn on the Rock-Paper-Scissors simplex.]

FP on Shapley's game; strategy profiles in a simplex

The figure shows many player pairs; each pair is connected by a gray line (yellow is the row player, green the column player). A player's location is the mixed strategy it projects on its opponent, i.e., the normalised action count of its opponent. Each player starts with a biased action count, for example [100,0,0] (lower left), [0,100,0] (lower right), or [33,33,33] (center). The initial action counts within a pair are unrelated.

FP on a 3x3 game; strategy profiles in a simplex

The setup is the same as in the previous figure: many player pairs connected by gray lines, each player located at the mixed strategy it projects on its opponent, with biased initial action counts that are unrelated within a pair.

Part II: Extensions and approximations of fictitious play

Proposed extensions to fictitious play

1. Build forecasts not on the complete history, but on
■ recent data, say the m most recent rounds;
■ discounted data, say with discount factor γ;
■ perturbed data, say with error ǫ on individual observations;
■ random samples of historical data, say of size m.

2. Give not necessarily best responses, but respond
■ ǫ-greedily;
■ perturbed throughout, with small random shocks;
■ randomly, proportional to expected payoff.
(A minimal sketch of a discounted, ǫ-greedy variant is given below.)

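As an illustration of two of these variations, the sketch below (helper names are mine; this is not an algorithm from the slides) combines a discounted forecast with an ǫ-greedy response. With γ = 1 and ǫ = 0 it reduces to pure fictitious play.

```python
import numpy as np

rng = np.random.default_rng(0)

def discounted_eps_greedy_fp(payoff_row, payoff_col, gamma=0.9, eps=0.05, rounds=500):
    """Fictitious-play variant: discounted opponent counts plus epsilon-greedy responses."""
    n, m = payoff_row.shape
    counts_row = np.ones(m)   # row player's discounted counts of the column player's actions
    counts_col = np.ones(n)   # column player's discounted counts of the row player's actions
    for _ in range(rounds):
        # Forecasts: normalised discounted counts.
        p_col = counts_row / counts_row.sum()
        p_row = counts_col / counts_col.sum()
        # Epsilon-greedy responses to the forecasts.
        a_row = rng.integers(n) if rng.random() < eps else int(np.argmax(payoff_row @ p_col))
        a_col = rng.integers(m) if rng.random() < eps else int(np.argmax(p_row @ payoff_col))
        # Discount the old observations, then record the new ones.
        counts_row *= gamma
        counts_row[a_col] += 1.0
        counts_col *= gamma
        counts_col[a_row] += 1.0
    return counts_row / counts_row.sum(), counts_col / counts_col.sum()

# Matching pennies, used here only to exercise the variant.
pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(discounted_eps_greedy_fp(pennies, -pennies))
```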
Jordan's framework for FP


Framework for predictive learning (e.g. fictitious play)

A forecasting rule for player i is a function that maps a history to a probability distribution over the opponents’ actions in the next round:

f_i : H → ∆(X_{-i}).

A response rule for player i is a function that maps a history to a probability distribution over i’s own actions in the next round:

g_i : H → ∆(X_i).

A learning rule for player i is the combination of a forecasting rule and a response rule. This is typically written as (f_i, g_i).
■ This framework can be attributed to J.S. Jordan (1993).
■ Forecasting and response functions are deterministic.
■ Reinforcement and regret learning do not fit. (They are not involved with prediction.)


Forecasting and response rules for fictitious play

Let h^t ∈ H^t be a history of play up to and including round t, and let
φ^{jt} =Def the empirical distribution of j’s actions up to and including round t.

Then the fictitious play forecasting rule is given by

f_i(h^t) =Def ∏_{j≠i} φ^{jt}.

Let f_i be a fictitious play forecasting rule. Then g_i is said to be a fictitious play response rule if all of its values are best responses to the corresponding values of f_i.

Remarks:
1. Player i attributes a mixed strategy φ^{jt} to player j. This strategy reflects the number of times each action has been played by j.
2. The opponents’ mixed strategies are assumed to be independent.
3. Both (1) and (2) are simplifying assumptions.

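To make the definitions concrete, here is a small sketch (notation and helper names are mine) of the fictitious play forecasting and response rules for a player facing two opponents: each opponent j is assigned the empirical marginal φ^{jt} of its past actions, the joint forecast is the product of these marginals (the independence assumption of remark 2), and the response rule best-responds to that forecast. The three-player payoff function at the end is purely illustrative.

```python
import itertools
import numpy as np

def empirical_marginal(actions, n_actions):
    """phi^{jt}: the empirical distribution of one opponent's past actions."""
    counts = np.bincount(actions, minlength=n_actions).astype(float)
    return counts / counts.sum()

def fp_forecast(opponent_histories, n_actions):
    """f_i(h^t): the product of the opponents' empirical marginals (independence assumed)."""
    marginals = [empirical_marginal(h, n_actions) for h in opponent_histories]
    return {profile: float(np.prod([m[a] for m, a in zip(marginals, profile)]))
            for profile in itertools.product(range(n_actions), repeat=len(marginals))}

def fp_best_response(utility, forecast, n_actions):
    """A fictitious play response: an action maximising expected utility against the forecast."""
    expected = [sum(p * utility(x_i, profile) for profile, p in forecast.items())
                for x_i in range(n_actions)]
    return int(np.argmax(expected))

# Illustration only: player i gets 1 iff it matches the actions of both opponents.
u_i = lambda x_i, profile: 1.0 if all(a == x_i for a in profile) else 0.0
histories = [[0, 0, 1], [0, 0, 1]]        # observed actions of the two opponents
forecast = fp_forecast(histories, n_actions=2)
print(forecast)                            # joint forecast over the opponents' profiles
print(fp_best_response(u_i, forecast, n_actions=2))
```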
Smoothed fictitious play

Notation:
p^{-i} : strategy profile of the opponents, as predicted by f_i in round t.
u_i(x_i, p^{-i}) : expected utility of action x_i, given p^{-i}.
q^i : strategy profile of player i in round t + 1, i.e., g_i(h).

Task: define q^i, given p^{-i} and u_i(x_i, p^{-i}).

Idea:

Respond randomly, but (somehow) proportional to expected payoff.

Elaborations of this idea:

a) Strictly proportional:

q^i(x_i | p^{-i}) =Def u_i(x_i, p^{-i}) / Σ_{x'_i ∈ X_i} u_i(x'_i, p^{-i}).

b) Through soft max (a.k.a. mixed logit, a.k.a. quantal response):

q^i(x_i | p^{-i}) =Def e^{u_i(x_i, p^{-i})/γ} / Σ_{x'_i ∈ X_i} e^{u_i(x'_i, p^{-i})/γ}.

Soft max a.k.a. mixed logit a.k.a. quantal response

Soft max:

q^i(x_i | p^{-i}) =Def e^{u_i(x_i, p^{-i})/γ} / Σ_{x'_i ∈ X_i} e^{u_i(x'_i, p^{-i})/γ}.

Soft max respects best replies, but leaves room for experimentation:

γ → ∞ : play becomes (nearly) uniformly random.
γ = 1 : play is proportional to e^{u_i(x_i, p^{-i})}.
γ ↓ 0 : play approaches the best reply, i.e., pure fictitious play.

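A minimal soft-max sketch (plain numpy, not from the slides) makes the three regimes visible: a large γ yields nearly uniform play, γ = 1 weights actions by e^{u_i}, and γ ↓ 0 concentrates the mass on the best reply.

```python
import numpy as np

def softmax_response(expected_utilities, gamma):
    """q^i(x_i | p^{-i}) proportional to exp(u_i(x_i, p^{-i}) / gamma)."""
    u = np.asarray(expected_utilities, dtype=float)
    w = np.exp((u - u.max()) / gamma)     # subtract the maximum for numerical stability
    return w / w.sum()

u = [1.0, 0.8, 0.1]                        # expected utilities of three actions against p^{-i}
for gamma in (100.0, 1.0, 0.01):
    print(gamma, softmax_response(u, gamma).round(3))
# gamma = 100  -> close to uniform randomisation
# gamma = 1    -> probabilities proportional to exp(u_i)
# gamma = 0.01 -> nearly all mass on the best reply
```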
Smoothed fictitious play limits regret

Theorem (Fudenberg & Levine, 1995). Let G be a finite game and let ǫ > 0. If a given player uses smoothed fictitious play with a sufficiently small smoothing parameter γ, then with probability one his regrets are bounded above by ǫ.
■ Young does not reproduce the proof of Fudenberg et al., but shows that in this case ǫ-regret can be derived from a later and more general result of Hart and Mas-Colell (2001).
■ This later result identifies a large family of rules that eliminate regret, based on an extension of Blackwell’s approachability theorem. (Blackwell’s approachability theorem generalises maxmin reasoning to vector-valued payoffs.)

Fudenberg & Levine, 1995. “Consistency and cautious fictitious play,” Journal of Economic Dynamics and Control, Vol. 19(5-7), pp. 1065-1089.

Hart & Mas-Colell, 2001. “A General Class of Adaptive Strategies,” Journal of Economic Theory, Vol. 98(1), pp. 26-54.

Smoothed fictitious play converges to ǫ-CCE

Definition. Let X be the set of action profiles, and let q ∈ ∆(X). Then q is a coarse correlated equilibrium (CCE) if no player wants to opt out prior to a realisation of q in the form of an action profile.

In a coarse correlated ǫ-equilibrium (ǫ-CCE), no player can gain more than ǫ in expected utility by opting out.

Theorem (Fudenberg & Levine, 1995). Let G be a finite game and let ǫ > 0. If all players use smoothed fictitious play with sufficiently small smoothing parameters γ_i, then empirical play converges to the set of coarse correlated ǫ-equilibria a.s.
Summary of the two theorems:
■ Smoothed fictitious play limits regret.
■ Smoothed fictitious play converges to ǫ-CCE.
But there is another learning algorithm with no regret and convergence to zero-CCE!

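The opt-out condition can be checked mechanically. The sketch below (helper name is mine) tests whether a distribution q over the joint action profiles of a two-player game is a coarse correlated ǫ-equilibrium: no player should gain more than ǫ by committing in advance to a fixed action instead of following q.

```python
import numpy as np

def is_eps_cce(q, u1, u2, eps=0.0):
    """Check whether q, a distribution over joint action profiles of a two-player game,
    is a coarse correlated eps-equilibrium.

    q, u1, u2 : arrays of shape (n1, n2); q[a, b] is the probability of profile (a, b).
    Condition : no player gains more than eps by committing to a fixed action in
                advance ('opting out') instead of letting the profile be drawn from q.
    """
    v1 = float(np.sum(q * u1))            # player 1's expected payoff under q
    v2 = float(np.sum(q * u2))
    marg2 = q.sum(axis=0)                 # marginal of player 2's action under q
    marg1 = q.sum(axis=1)                 # marginal of player 1's action under q
    best_dev1 = max(float(u1[a, :] @ marg2) for a in range(u1.shape[0]))
    best_dev2 = max(float(marg1 @ u2[:, b]) for b in range(u2.shape[1]))
    return best_dev1 <= v1 + eps and best_dev2 <= v2 + eps

# Uniform weight on the two coordinated profiles of the coordination game: opting out
# yields at most 0.5, while following q yields 1, so this is a coarse correlated equilibrium.
u = np.eye(2)
q = np.array([[0.5, 0.0], [0.0, 0.5]])
print(is_eps_cce(q, u, u, eps=0.0))       # True
```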
Exponentiated regret matching

Let
j : an action, where 1 ≤ j ≤ k;
ū^t_i : i’s (realised) average payoff up to and including round t;
φ^{-it} : the (realised) joint empirical distribution of i’s opponents;
ū_i(j, φ^{-it}) : i’s hypothetical mean payoff for playing j against φ^{-it};
r̄^{it} : player i’s regret vector in round t, i.e., r̄^{it}_j = ū_i(j, φ^{-it}) − ū^t_i.

Exponentiated regret matching (PY, p. 59) is defined as

q^{i(t+1)}_j ∝ [ r̄^{it}_j ]_+^a ,

where a > 0. (For a = 1 we have ordinary regret matching.)
A theorem on exponentiated regret matching (Hart & Mas-Colell, 2001) ensures that individual players have no regret with probability one, and that the empirical distribution of play converges to the set of coarse correlated equilibria (PY, p. 37 for RM, p. 60 for ERM).

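A small sketch (plain numpy, names mine) of the probabilities prescribed by exponentiated regret matching: take the positive part of the regret vector, raise it to the power a, and normalise. For a = 1 this is ordinary regret matching; the no-positive-regret case is handled here by uniform randomisation, which is one common convention and an assumption on my part rather than something stated on the slides.

```python
import numpy as np

def exponentiated_regret_matching(regret_vector, a=2.0):
    """Probabilities proportional to the positive part of the regret vector, raised to the power a."""
    pos = np.maximum(np.asarray(regret_vector, dtype=float), 0.0) ** a
    if pos.sum() == 0.0:                  # no positive regret: randomise uniformly (assumed convention)
        return np.full(len(regret_vector), 1.0 / len(regret_vector))
    return pos / pos.sum()

r = [0.4, 0.1, -0.3]                      # regrets of three actions w.r.t. the realised average payoff
for a in (1.0, 2.0, 8.0):
    print(a, exponentiated_regret_matching(r, a).round(3))
# a = 1 -> ordinary regret matching; larger a concentrates mass on the largest positive regret.
```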
FP vs. Smoothed FP vs. Exponentiated regret matching
Best reply vs. better reply

■ Fictitious play is a best reply strategy: it plays a best reply to sampled past play.

■ Regret matching may be called a better reply strategy: it plays a reply for which the expected payoff is better than the past average payoff.
■ Smoothed fictitious play is neither a best reply strategy nor a better reply strategy.

● It is not a best reply strategy because it puts positive probabilities on all actions at every period.

● It is not a better reply strategy because it is not concerned with past payoffs.

Hart, S., and Mas-Colell, A. (2000). “A simple adaptive procedure leading to correlated equilibrium”. Econometrica, 68, pp. 1127-1150.


FP vs. Smoothed FP vs. Exponentiated RM

Fictitious play: plays best responses.
■ Does depend on the past play of the opponent(s).
■ Puts zero probabilities on sub-optimal responses.

Smoothed fictitious play: may play sub-optimal responses, e.g., proportional to their estimated payoffs.
■ Does depend on the past play of the opponent(s).
■ Puts non-zero probabilities on sub-optimal responses.
■ Approaches fictitious play when γ ↓ 0 (PY, p. 84).

Exponentiated regret matching: plays sub-optimally, i.e., proportional to a power of positive regret.
■ Does depend on its own past payoffs.
■ Puts non-zero probabilities on sub-optimal responses.
■ Approaches fictitious play when the exponent a → ∞ (PY, p. 84).

                                      FP     Smoothed FP                 Exponentiated RM
Depends on past play of opponents     √      √                           −
Depends on own past payoffs           −      −                           √
Puts zero probabilities on
sub-optimal responses                 √      −                           −
Best response                         √      when the smoothing          when the exponent
                                             parameter γ ↓ 0             a → ∞
Individual no-regret                  −      within ǫ > 0, almost        exact, almost
                                             always (PY, p. 82)          always (PY, p. 60)
Collective convergence to
coarse correlated equilibria          −      within ǫ > 0, almost        exact, almost
                                             always (PY, p. 83)          always (PY, p. 60)

Fictitious play compared to other algorithms

Part III: Finite memory and inertia


Finite memory: motivation

■ In their basic version, most learning rules rely on the entire history of play.
■ People, as well as computers, have a finite memory. (On the other hand, for average or discounted payoffs this is no problem.)
■ Nevertheless: experiences in the distant past are apt to be less relevant than more recent ones.

■ Proposal: let players have a finite memory of m rounds.


Inertia: motivation

■ When players’ strategies are constantly re-evaluated, discontinuities in behaviour are likely to occur. Example: the asymmetric coordination game. ■ Discontinuities in behaviour are less likely to lead to equilibria of any sort.

■ : let players play the same action as in the previous round with probability λ.

Author: Gerard Vreeswijk. Slides last modified on May 19th,2021at16:50 Multi-agentlearning:FictitiousPlay,slide 36
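A minimal sketch (mine, not from the slides) of how inertia can wrap an arbitrary response rule; lam plays the role of λ and the function name is illustrative.

import random

def with_inertia(previous_action, response_action, lam, rng=random):
    """Inertia: repeat the previous action with probability lam, otherwise
    play the action proposed by the underlying response rule."""
    if previous_action is not None and rng.random() < lam:
        return previous_action
    return response_action

# Example: with lam = 0.8 the player mostly sticks to 'T' even if the
# underlying rule currently proposes 'B'.
random.seed(0)
print([with_inertia("T", "B", lam=0.8) for _ in range(10)])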

Weakly acyclic games

■ Game G with action space X.
■ Define the graph G′ = (V, E) (the better-reply graph), where V = X and

  E = { (x, y) | for some i : y_{−i} = x_{−i} and u_i(y_i, y_{−i}) > u_i(x_i, x_{−i}) }.

■ For all x ∈ X: x is a sink iff x is a pure Nash equilibrium.
■ G is said to be weakly acyclic if every node is connected to a sink.
■ Weakly acyclic ⇒ ∃ pure Nash equilibrium.

The slide illustrates the definitions with the following 3 × 3 game (entries are (row payoff, column payoff)):

  (4,2)  (2,4)  (1,1)
  (2,4)  (4,2)  (1,1)
  (1,1)  (1,1)  (3,3)
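As an illustration (my own, not from the slides), the following Python sketch builds the better-reply graph of the 3 × 3 game above and lists its sinks; by the slide, the sinks are exactly the pure Nash equilibria. Variable names are mine.

from itertools import product

# Payoff matrix of the example game: entry U[row][col] = (u_row, u_col).
U = [[(4, 2), (2, 4), (1, 1)],
     [(2, 4), (4, 2), (1, 1)],
     [(1, 1), (1, 1), (3, 3)]]

nodes = list(product(range(3), range(3)))          # joint action profiles (r, c)

def better_reply_edges(U):
    edges = []
    for (r, c) in nodes:
        for r2 in range(3):                        # row player deviates alone
            if r2 != r and U[r2][c][0] > U[r][c][0]:
                edges.append(((r, c), (r2, c)))
        for c2 in range(3):                        # column player deviates alone
            if c2 != c and U[r][c2][1] > U[r][c][1]:
                edges.append(((r, c), (r, c2)))
    return edges

edges = better_reply_edges(U)
sinks = [x for x in nodes if all(e[0] != x for e in edges)]
print("sinks (pure Nash equilibria):", sinks)      # expected: [(2, 2)], i.e. (B, R) with payoffs (3, 3)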

Examples of weakly acyclic games

Coordination games Two-person games with identical actions for all players, where best responses are formed by the diagonal of the joint action space.

Potential games (Monderer and Shapley, 1996). There is a function ρ : X → R, called the potential function, such that for every player i and every pair of action profiles x, y ∈ X:

  y_{−i} = x_{−i}  ⇒  u_i(y_i, y_{−i}) − u_i(x_i, x_{−i}) = ρ(y) − ρ(x).

Example: .

Why true: the potential function increases along every better-reply path
⇒ paths cannot cycle
⇒ in finite graphs, paths must end (in a sink, i.e. a pure Nash equilibrium).
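A quick check (my own sketch, not from the slides) of the potential-game condition for a small common-payoff coordination game, taking ρ equal to the common payoff; all names are illustrative.

from itertools import product

# A 2 x 2 common-payoff coordination game: u_row = u_col = rho.
payoff = {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}
u = [lambda x: payoff[x], lambda x: payoff[x]]      # u_i for i = 0 (row), 1 (column)
rho = lambda x: payoff[x]                           # candidate potential function

def is_potential(u, rho, profiles):
    """Check: for every unilateral deviation x -> y of player i,
    u_i(y) - u_i(x) equals rho(y) - rho(x)."""
    for x, y in product(profiles, repeat=2):
        for i in (0, 1):
            if y[1 - i] == x[1 - i] and y[i] != x[i]:
                if u[i](y) - u[i](x) != rho(y) - rho(x):
                    return False
    return True

profiles = list(product((0, 1), repeat=2))
print(is_potential(u, rho, profiles))               # True: the game is a potential game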

Weakly acyclic games under finite memory and inertia

Theorem. Let G be a finite weakly acyclic n-person game. Every better-reply process with finite memory and inertia converges to a pure NE of G.

Proof (outline).

1. Let the state space Z be X^m (the last m joint action profiles).
2. A state x̄ ∈ X^m is called homogeneous if it consists of m identical action profiles x. Such a state is denoted by ⟨x⟩. Let Z∗ =Def { homogeneous states }.
3. It can be shown that the process hits Z∗ infinitely often.
4. It can be shown that the overall probability to play any action is bounded away from zero.
5. It can be shown that Absorbing = Z∗ ∩ Pure Nash (both inclusions hold): the absorbing states are exactly the homogeneous states ⟨x⟩ with x a pure Nash equilibrium.
6. It can be shown that, due to weak acyclicity, inertia, and (4), the process eventually lands in an absorbing state which, due to (5), is a repeated pure Nash equilibrium. □
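To see the theorem at work, here is a hedged Python simulation (my own construction; the slides do not fix the exact response rule) of a better-reply process with finite memory M and inertia LAMBDA on a 2 × 2 common-payoff coordination game, which is weakly acyclic. Parameter names and the tie-breaking rule are mine; runs typically lock into one of the two pure Nash equilibria.

import random
from collections import deque

random.seed(1)

# 2 x 2 coordination game: both players get 2 on (0, 0), 1 on (1, 1), 0 otherwise.
def u(i, a):
    return {(0, 0): 2, (1, 1): 1}.get(a, 0)

LAMBDA, M, ROUNDS = 0.5, 2, 2000         # inertia, memory length, horizon
history = deque(maxlen=M)                # joint actions of the last M rounds
actions = [random.randrange(2), random.randrange(2)]

def better_reply(i, own, history):
    """A better reply of player i against the opponent's empirical play
    over the finite memory; keep `own` if nothing is strictly better."""
    opp = [a[1 - i] for a in history]
    def expected(x):
        prof = lambda o: (x, o) if i == 0 else (o, x)
        return sum(u(i, prof(o)) for o in opp) / len(opp)
    better = [x for x in (0, 1) if expected(x) > expected(own)]
    return random.choice(better) if better else own

for t in range(ROUNDS):
    history.append(tuple(actions))
    new = []
    for i in (0, 1):
        if random.random() < LAMBDA:     # inertia: repeat the previous action
            new.append(actions[i])
        else:
            new.append(better_reply(i, actions[i], history))
    actions = new

print("final joint action:", tuple(actions))   # typically a pure NE: (0, 0) or (1, 1)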

First claim: process hits Z∗ infinitely often

Let inertia be determined by λ > 0. Then

  Pr(all players play their previous action) ≥ λ^n.

Hence,

  Pr(all players play their previous action during m subsequent rounds) ≥ λ^{nm}.

If all players play their previous action during m subsequent rounds, then the process arrives at a homogeneous state. Therefore, for all t,

  Pr(process arrives at a homogeneous state in round t + m) ≥ λ^{nm}.

Infinitely many disjoint histories of length m occur, hence infinitely many independent events "homogeneous at t + m" occur. Apply the second Borel–Cantelli lemma: if {E_n}_n are independent events, and ∑_{n=1}^{∞} Pr(E_n) is unbounded, then Pr(an infinite number of the E_n occur) = 1. □
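For a feel of the orders of magnitude (numbers mine, not from the slides): with n = 2 players, memory m = 5 and inertia λ = 0.9, we get λ^{nm} = 0.9^10 ≈ 0.35, so each disjoint block of m rounds turns the state homogeneous with probability at least 0.35; by the second Borel–Cantelli lemma this happens infinitely often with probability 1.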

Second claim: probability to play any action is positive

A response function from states (finite histories) to strategies (probability distributions on actions)

  γ_i : Z → ∆(X_i)

possesses two important properties:

1. The formula itself is deterministic.
2. Every action is played with positive probability.

Therefore, if

  γ_i =Def inf { γ_i(z)(x_i) | z ∈ Z, x_i ∈ X_i },

then γ_i > 0, since all γ_i(z)(x_i) > 0 and Z and X_i are finite. So if

  γ =Def min { γ_i | 1 ≤ i ≤ n },

then γ > 0. □

Final claim: probability to reach a sink from Z∗ > 0

Suppose the process is in ⟨x⟩.

1. If x is pure Nash, we are done, because response functions are deterministic better replies.
2. If x is not pure Nash, there must be an edge x → y in the better-reply graph. Suppose this edge concerns a deviation of player i, to action y_i. We now know that y_i is played with probability at least γ, irrespective of player and state.

Further probabilities:

■ All other players j ≠ i keep playing the same action: λ^{n−1}.
■ Edge x → y is actually traversed: γ λ^{n−1}.
■ Profile y is maintained for another m − 1 rounds, so as to arrive at state ⟨y⟩: λ^{n(m−1)}.
■ To traverse from ⟨x⟩ to ⟨y⟩: γ λ^{n−1} · λ^{n(m−1)} = γ λ^{nm−1}.
■ The "image" ⟨x^(1)⟩, …, ⟨x^(l)⟩ of a better-reply path x^(1), …, x^(l) is followed to a sink: ≥ (γ λ^{nm−1})^L, where L is the length of a longest better-reply path.

Since Z∗ is encountered infinitely often, the result follows. □
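Continuing the illustrative numbers from before (mine, not from the slides): with n = 2, m = 5, λ = 0.9, a uniform lower bound γ = 0.05 on action probabilities, and a longest better-reply path of length L = 3, the probability of walking from a homogeneous non-equilibrium state all the way into a sink is at least (γ λ^{nm−1})^L = (0.05 · 0.9^9)^3 ≈ 7 · 10^−6: small, but bounded away from zero, which is all the argument needs.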

Summary

■ With fictitious play, the behaviour of the opponent is modelled by (or represented by, or projected on) a mixed strategy.
■ Fictitious play ignores sub-optimal actions.
■ There is a family of so-called smoothed fictitious play strategies that i) play sub-optimal actions, and ii) can be brought arbitrarily close to fictitious play.
■ In weakly acyclic n-person games, every better-reply process with finite memory and inertia converges to a pure Nash equilibrium.

What next?

Bayesian play:
■ With fictitious play, the behaviour of opponents is modelled by a single mixed strategy.
■ With Bayesian play, opponents are modelled by a probability distribution over (mixed) strategies.

Gradient dynamics:
■ Like fictitious play, players model (or assess) each other through mixed strategies.
■ Strategies are not played, only maintained.
■ Due to CKR (common knowledge of rationality, cf. Hargreaves Heap & Varoufakis, 2004), all models of mixed strategies are correct. (I.e., q^{−i} = s^{−i}, for all i.)
■ Players gradually adapt their mixed strategies through hill-climbing in the payoff space.

Exam problems

Empirical frequencies of fictitious play

Problem. Consider the empirical frequencies in fictitious play in a normal form game with mixed equilibria only.
a) These do not converge.
b) These converge to an equilibrium.
c) These converge, but not necessarily to an equilibrium.
d) Another answer.

Answer. d): For some but not all games with mixed equilibria only, the empirical frequencies converge. For example, for matching pennies they converge, but for the Shapley game they do not. It is a theorem that, if the empirical frequencies converge in FP, they converge to a NE. The actually played responses per stage need not be NE of the stage game, however, as is the case with, for instance, matching pennies. □
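To see the matching-pennies part of the answer concretely, here is a small Python sketch (mine, not from the slides) of two pure fictitious players; the empirical frequencies drift towards the mixed equilibrium (1/2, 1/2) even though every stage action is pure. Names and the tie-breaking rule are mine.

# Matching pennies: actions 0 = heads, 1 = tails.
# The row player wins (+1) on a match, the column player wins on a mismatch.
def u_row(r, c):
    return 1 if r == c else -1                    # u_col = -u_row (zero-sum)

def best_reply(payoff, p_opp_heads):
    """Best reply against an opponent believed to play heads with probability p."""
    e0 = payoff(0, 0) * p_opp_heads + payoff(0, 1) * (1 - p_opp_heads)
    e1 = payoff(1, 0) * p_opp_heads + payoff(1, 1) * (1 - p_opp_heads)
    return 0 if e0 >= e1 else 1

counts = [[1, 1], [1, 1]]                         # action counts of row and column
for t in range(10000):
    p_col = counts[1][0] / sum(counts[1])         # row's belief about column
    p_row = counts[0][0] / sum(counts[0])         # column's belief about row
    r = best_reply(u_row, p_col)
    c = best_reply(lambda a, b: -u_row(b, a), p_row)
    counts[0][r] += 1
    counts[1][c] += 1

print(counts[0][0] / sum(counts[0]), counts[1][0] / sum(counts[1]))
# Both empirical frequencies approach 0.5, the unique (mixed) Nash equilibrium,
# even though every single stage action is pure.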

Smoothed fictitious play

Problem. Consider smoothed fictitious play with smoothing parameter γ > 0 vs. exponentiated regret matching with exponent a ≥ 0. They
i) behave identically when γ ↓ 0 and a → ∞;
ii) behave identically when a = 0 and γ → ∞.
a) None.
b) Only i).
c) Only ii).
d) Both.

Answer. d): When γ ↓ 0 and a → ∞, both converge to fictitious play. When a = 0 and γ → ∞, exponentiated regret matching plays randomly, and smoothed fictitious play converges to random play. See also the pie charts on Slide 24 and the table on Slide 32. □
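A hedged sketch of the two limits (my own; the slides' exact definition of smoothed fictitious play may differ): here smoothed fictitious play is taken to be a logit (softmax) response to the expected payoffs, with temperature γ. Function and variable names are mine.

import math

def smoothed_fp(expected_payoffs, gamma):
    """Logit (softmax) response with temperature gamma: one common way to
    define smoothed fictitious play. Small gamma ~ best reply, large gamma
    ~ uniform randomisation."""
    weights = [math.exp(v / gamma) for v in expected_payoffs]
    s = sum(weights)
    return [w / s for w in weights]

payoffs = [10 / 7, 13 / 7, 13 / 7]        # the expected payoffs from the noisy-play exercise below
for gamma in (0.01, 1.0, 100.0):
    print(gamma, [round(p, 3) for p in smoothed_fp(payoffs, gamma)])
# gamma -> 0: (almost) all mass on the maximisers, i.e. (pure) fictitious play;
# gamma -> infinity: (almost) uniform play, matching the slide's claim that
# exponentiated regret matching with a = 0 plays randomly.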

Explanation of conditioning on an event

If E and F are events, then

  Pr{E} = Pr{(E ∩ F) ∪ (E ∩ ¬F)}
        = Pr{E ∩ F} + Pr{E ∩ ¬F}
        = Pr{E | F} Pr{F} + Pr{E | ¬F} Pr{¬F}
        = Pr{E | F} Pr{F} + Pr{E | ¬F} [1 − Pr{F}].

Problem. A: policy holder is accident prone; A1: policy holder will have an accident within one year. Suppose Pr{A1 | A} = 0.4, Pr{A1 | ¬A} = 0.2 and Pr{A} = 0.3. Compute Pr{A1}.

Answer.

  Pr{A1} = Pr{A1 | A} Pr{A} + Pr{A1 | ¬A} [1 − Pr{A}] = 0.4 · 0.3 + 0.2 · 0.7 = 0.26.

Similarly,

  Pr{T | h} = Pr{T | F, h} Pr{F | h} + Pr{T | ¬F, h} Pr{¬F | h}.
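A quick numerical check of the answer (my own; the helper name is illustrative):

def total_probability(p_e_given_f, p_e_given_not_f, p_f):
    """Law of total probability: Pr{E} = Pr{E|F} Pr{F} + Pr{E|not F} (1 - Pr{F})."""
    return p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)

print(total_probability(0.4, 0.2, 0.3))   # 0.26 (up to floating-point rounding)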

Noisy fictitious play

Problem. Row is a 10% noise fictitious player (plays a random action 10% of the time, at random moments) in a normal form game with payoffs

        L     C     R
  T    0,2   1,0   4,0
  M    1,1   0,1   5,2
  B    3,1   2,3   0,1

Determine the strategy of Row at round eight if the history of play is

  h = TC, BC, TL, BL, TR, TR, TL.

Answer. Projected mixed strategy of Col after seven rounds: L: 3/7; C: 2/7; R: 2/7.

Expected payoffs:

  E[T] = (3/7)·0 + (2/7)·1 + (2/7)·4 = 10/7,
  E[M] = (3/7)·1 + (2/7)·0 + (2/7)·5 = 13/7,
  E[B] = (3/7)·3 + (2/7)·2 + (2/7)·0 = 13/7.

Author: Gerard Vreeswijk. Slides last modified on May 19th,2021at16:50 Multi-agentlearning:FictitiousPlay,slide 49 Noisy fictitious play (ctd)

If Row were a pure fictitious player, it would play either M or B, each with probability 1/2. However, Row isn't a pure fictitious player; instead, it randomizes 10% of the time. If F represents the event that Row does not randomize and plays according to fictitious play, then

  Pr{T | h} = Pr{T | F, h}·Pr{F | h} + Pr{T | ¬F, h}·Pr{¬F | h}
            = 0 · 9/10 + 1/3 · 1/10 = 1/30 (= 2/60).

  Pr{M | h} = Pr{M | F, h}·Pr{F | h} + Pr{M | ¬F, h}·Pr{¬F | h}
            = 1/2 · 9/10 + 1/3 · 1/10 = 29/60.

  Pr{B | h} = ... = 29/60.
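The same mixture is a one-line convex combination of the pure fictitious-play mix and the uniform distribution. A minimal sketch, assuming (as in the slide) that the 10% noise means a uniformly random action over T, M, B and that the fictitious-play tie between M and B is broken uniformly; the names noise and fp_mix are mine:

```python
from fractions import Fraction

actions = ("T", "M", "B")
noise = Fraction(1, 10)              # Row randomizes 10% of the time
uniform = Fraction(1, len(actions))  # a random action is uniform over T, M, B (assumption)

# Pure fictitious-play mix given h: tie between M and B broken uniformly (assumption).
fp_mix = {"T": Fraction(0), "M": Fraction(1, 2), "B": Fraction(1, 2)}

# Pr{a | h} = Pr{a | F, h}·Pr{F | h} + Pr{a | ¬F, h}·Pr{¬F | h}
noisy_mix = {a: fp_mix[a] * (1 - noise) + uniform * noise for a in actions}

print(noisy_mix)  # T: 1/30, M: 29/60, B: 29/60
```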
