This is a record of most of the different games we have tested with RSRS. In all cases, the prediction algorithm used is fictitious play, and games are repeated 100 times. In each figure the top graph shows performance against a random opponent, the center graph shows performance against an omniscient opponent, and the bottom graph shows performance against a worst-case opponent. Note that we do not include games with a single dominant , such as Prisoner’s Dilemma. This is because RSRS is not an algorithm for cooperation - it will always choose Defect, because Defect is always the best response.

0.1 Battle of the Sexes Player 2 Player 1 Opera Boxing Opera 2 1 0 0 Boxing 0 0 1 2 Battle of the Sexes is a frequently used example in . Performance is shown in figure 1. RSRS is slightly slower to adapt to the random opponent, possibly because it is preparing for a best responding opponent. Against a best responding opponent all agents except RNR manage to find the advantageous equilibrium. In the worst case only RSRS and SPS are able to adopt the neces- sary randomized strategy to get some chance of a payoff.

0.2 Chicken Player 2 Player 1 Don’t Dare Dare Don’t Dare 9 9 8 10 Dare 10 8 0 0 Chicken is another popular game from game theory. Performance is shown in figure 2. Most agents are quick to adapt to the worst case opponent, although there is some exploration. Against an omniscient opponent RNR, RSR, and RSRS all manage to find the favored equilibrium. RSRS is still a bit slow to find the best response to a random opponent.

0.3 Player 2 Player 1 A B C D a 1 1 0 0 0 0 0 0 b 0 0 2 2 0 0 0 0 c 0 0 0 0 3 3 0 0 d 0 0 0 0 0 0 4 4 This is a strict coordination game we have constructed, with 4 equilibria, which both players rank identically. Performance is shown in figure 3. All players are able to eventually find the best response to a random opponent, and all except RNR are able to find the best equilibrium with an omniscient

1 opponent. Only RSRS and SPS are able to find the best response to the worst case opponent, and it takes them a while to find it. That could probably be improved by better parameter selection for calculating the risk factor to adopt.

0.4 Cuban Missile Crisis Soviets US Withdraw Maintain Blockade 3 3 2 4 Air strike 4 2 1 1 Cuban missile crisis is another example from game theory. It is the label given to a class of 2 player games with that payoff structure. The Soviets choose between withdrawing from Cuba and staying in Cuba, and the US chooses between a blockade of Cuba, and airstrikes on Cuba (which could lead to nuclear war - the 1,1 option). Performance is shown in figures 4 and 5. When playing as the US RNR, RSR, and RSRS where all slightly worse against a random opponent, but significantly better against an omniscient or a worst-case opponent. When playing as the Soviets, all strategies performed well against a random opponent. Performance against an omniscient opponent varied widely, with RSRS and RSR having slightly higher peak performance. In the worst-case scenario all agents were able to figure out they should Withdraw.

0.5 Pursuer/Evader Evader Pursuer Location 1 Location 2 Location 3 Location 4 Location 1 10 0 0 11 0 12 0 14 Location 2 1 10 11 1 1 12 1 14 Location 3 2 10 2 11 12 2 2 14 Location 4 4 10 4 11 4 12 14 4 This is a game we constructed for testing. The pursuer receives 10 points for chosing the same location as the evader, otherwise the evader receives those 10 points. Players also value ending up in particular locations, with location 4 being most preferred. Performance is shown in figures 6 and 7. As the pursuer RSRS and RSR are a bit slow to find the optimal response against a random strategy, but they perform significantly better against an omniscient opponent. In the worst-case scenario, only RSRS and SPS find a good strategy. As the evader RNR and SPS perform comparatively better against the omniscient opponent, but again only RSRS and SPS perform well in the worst case.

2 0.6 Inspector/Worker Worker Inspector Don’t Work Work Don’t inspect 1 4 4 2 Inspect 2 1 3 3 This is another example from game theory. We have a worker who would prefer not to work, unless the inspector is watching, and an inspector who wants the worker to work, but would rather not have to inspect. Performance is shown in figures 8 and 9. As the inspector all agents do ok against the random opponent and the worst-case opponent. Against an omniscient opponent it’s very noisy, due to fluctuations in the predictor. As the worker RSRS RSR, and RNR are slightly slower to find a response to the random opponent, but they all do better against an omniscient opponent. In the worst case only RSRS and SPS find the appropriate mix of working and slacking off.

0.7 Hero Weak player Strong Player Weak option Strong option Weak option 3 4 1 1 Strong option 2 2 4 3 Hero is a variant of Battle of the Sexes in which both players prefer to play their own strategy, if they don’t succeed in choosing the same strategy. Performance is shown in figure 10. In this game RNR is slightly slower to find the best response to a random strategy. All agents do well against an omniscient opponent, but only SPS and RSRS find a good response to the worst-case opponent.

0.8 Player 2 Player 1 Heads Tails Heads 4 1 1 4 Tails 2 3 3 2 This is an asymetric version of matching pennies, in which both players would prefer to win while Player 2 chooses Heads. Performance is shown in figures 11 and 12. Because it is zero-sum, performance is the same against an omniscient opponent and in the worst case. Despite the asymmetry of the game performance is similar as both the first and second player. RSRS, RSR, and RNR are all slower to find a best-response to a random opponent, but all do better against the omniscient/worst-case opponent.

3 0.9 Pursuit of the Israelites Pharoah Moses Pursue Don’t Pursue Help Israelites 4 2 2 3 Don’t Help 1 4 3 1 Pursuit of the Israelites is another name for a category of games. The general idea is that Pharoah would prefer to pursue the Israelites, but only if Moses doesn’t pray to God to help them. Moses would prefer to pray, but only if the Pharoah is pursuing. Performance is shown in figures 13 and 14. RSRS, RSR, and RNR don’t perform quite as well against a random opponent, but do better against an omniscient opponent. In the worst case, only RSRS and SPS find the best strategy.

0.10 Player 2 Player 1 Stag Rabbit Stag 2 2 0 1 Rabbit 1 0 1 1 Stag Hunt is another classic example from game theory. Both players choose whether to hunt stag or rabbits. If both players choose stag, they receive a payoff of 2, but it’s safer to choose rabbit, which doesn’t require help from the other player. Performance is shown in figure 15. RSR and RNR perform slightly worse against random opponents, but all are able to hunt stag with the omniscient opponent. In the worst case, RSRS and SPS figure out that they need to hunt rabbit.

0.11 Travellers Dilemma In the Traveller’s Dilemma both players choose a value between 1 and 10. Each player receives the value chosen by the lower player, but the player chosing the lowest value gains a point from the other player (nothing happens in the case of a tie). Performance is shown in figure 16. All players perform roughly the same against random opponents. Against an omniscient opponent the Stackelberg based strategies are able to find the best strategy while other approaches move towards the worst outcome. In the worst case all agents other than RSR are able to find the best response.

4 Outcomes vsRandom Opponent inbos

2 Best Responder Safe Policy Selection 1.8 Restricted Nash Response Restricted Stackelberg Response 1.6 Restricted Stackelberg Response with Safety

1.4

1.2

1 Score 0.8

0.6

0.4

0.2

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inbos

2 Best Responder Safe Policy Selection 1.8 Restricted Nash Response Restricted Stackelberg Response 1.6 Restricted Stackelberg Response with Safety

1.4

1.2

1 Score 0.8

0.6

0.4

0.2

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inbos

2 Best Responder Safe Policy Selection 1.8 Restricted Nash Response Restricted Stackelberg Response 1.6 Restricted Stackelberg Response with Safety

1.4

1.2

1 Score 0.8

0.6

0.4

0.2

0

10 20 30 40 50 60 70 80 90 100 Round

Figure 1: Performance in the Battle of the Sexes game.

5 Outcomes vsRandom Opponent inchicken 10 Best Responder 9 Safe Policy Selection Restricted Nash Response Restricted Stackelberg Response 8 Restricted Stackelberg Response with Safety

7

6

5 Score 4

3

2

1

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inchicken 10 Best Responder 9 Safe Policy Selection Restricted Nash Response Restricted Stackelberg Response 8 Restricted Stackelberg Response with Safety

7

6

5 Score 4

3

2

1

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inchicken 10 Best Responder 9 Safe Policy Selection Restricted Nash Response Restricted Stackelberg Response 8 Restricted Stackelberg Response with Safety

7

6

5 Score 4

3

2

1

0 10 20 30 40 50 60 70 80 90 100 Round

Figure 2: Performance in Chicken.

6 Outcomes vsRandom Opponent incoordination4

4 Best Responder Safe Policy Selection 3.5 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 3

2.5

2 Score

1.5

1

0.5

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent incoordination4

4 Best Responder Safe Policy Selection 3.5 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 3

2.5

2 Score

1.5

1

0.5

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent incoordination4

4 Best Responder Safe Policy Selection 3.5 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 3

2.5

2 Score

1.5

1

0.5

0

10 20 30 40 50 60 70 80 90 100 Round

Figure 3: Performance in a coordination game.

7 Outcomes vsRandom Opponent incubanmcus

4 Best Responder Safe Policy Selection 3.5 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 3

2.5

2 Score

1.5

1

0.5

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent incubanmcus

4 Best Responder Safe Policy Selection 3.5 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 3

2.5

2 Score

1.5

1

0.5

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent incubanmcus

4 Best Responder Safe Policy Selection 3.5 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 3

2.5

2 Score

1.5

1

0.5

0

10 20 30 40 50 60 70 80 90 100 Round

Figure 4: Performance in the Cuban Missile Crisis game as the Soviet player.

8 Outcomes vsRandom Opponent incubanmcsov

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent incubanmcsov

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent incubanmcsov

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 5: Performance in the Cuban Missile Crisis game as the US player.

9 Outcomes vsRandom Opponent inpursuer4 14 Best Responder Safe Policy Selection 12 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 10

8 Score 6

4

2

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inpursuer4 14 Best Responder Safe Policy Selection 12 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 10

8 Score 6

4

2

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inpursuer4 14 Best Responder Safe Policy Selection 12 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 10

8 Score 6

4

2

0 10 20 30 40 50 60 70 80 90 100 Round

Figure 6: Performance in the Pursuer/Evader game as the evader.

10 Outcomes vsRandom Opponent inevader4 14 Best Responder Safe Policy Selection 12 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 10

8 Score 6

4

2

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inevader4 14 Best Responder Safe Policy Selection 12 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 10

8 Score 6

4

2

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inevader4 14 Best Responder Safe Policy Selection 12 Restricted Nash Response Restricted Stackelberg Response Restricted Stackelberg Response with Safety 10

8 Score 6

4

2

0 10 20 30 40 50 60 70 80 90 100 Round

Figure 7: Performance in the Pursuer/Evader game as the pursuer.

11 Outcomes vsRandom Opponent ininspecterevader

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent ininspecterevader

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent ininspecterevader

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 8: Performance in the Inspector/Worker game as the worker.

12 Outcomes vsRandom Opponent inevaderinspecter

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inevaderinspecter

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inevaderinspecter

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 9: Performance in the Inspector/Worker game as the inspector.

13 Outcomes vsRandom Opponent inherostrong

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inherostrong

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inherostrong

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 10: Performance in the Hero game.

14 Outcomes vsRandom Opponent inmatchingpenniescyclea

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inmatchingpenniescyclea

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inmatchingpenniescyclea

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 11: Performance in the custom matching pennies game as player 2.

15 Outcomes vsRandom Opponent inmatchingpenniescycleb

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inmatchingpenniescycleb

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inmatchingpenniescycleb

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 12: Performance in the custom matching pennies game as player 1.

16 Outcomes vsRandom Opponent inpursuitisraelitesmoses

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inpursuitisraelitesmoses

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inpursuitisraelitesmoses

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 13: Performance in the Pursuit of the Israelites game as Pharoah.

17 Outcomes vsRandom Opponent inpursuitisraelitespharoah

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent inpursuitisraelitespharoah

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent inpursuitisraelitespharoah

4 Best Responder Safe Policy Selection Restricted Nash Response 3.5 Restricted Stackelberg Response Restricted Stackelberg Response with Safety

3

2.5 Score

2

1.5

1

10 20 30 40 50 60 70 80 90 100 Round

Figure 14: Performance in the Pursuit of the Israelites game as Moses.

18 Outcomes vsRandom Opponent instag

2 Best Responder Safe Policy Selection 1.8 Restricted Nash Response Restricted Stackelberg Response 1.6 Restricted Stackelberg Response with Safety

1.4

1.2

1 Score 0.8

0.6

0.4

0.2

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent instag

2 Best Responder Safe Policy Selection 1.8 Restricted Nash Response Restricted Stackelberg Response 1.6 Restricted Stackelberg Response with Safety

1.4

1.2

1 Score 0.8

0.6

0.4

0.2

0

10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent instag

2 Best Responder Safe Policy Selection 1.8 Restricted Nash Response Restricted Stackelberg Response 1.6 Restricted Stackelberg Response with Safety

1.4

1.2

1 Score 0.8

0.6

0.4

0.2

0

10 20 30 40 50 60 70 80 90 100 Round

Figure 15: Performance in the Stag Hunt game.

19 Outcomes vsRandom Opponent intravellersdilemma 10 Best Responder Safe Policy Selection 9 Restricted Nash Response Restricted Stackelberg Response 8 Restricted Stackelberg Response with Safety

7

6

5 Score 4

3

2

1

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsOmniscient Opponent intravellersdilemma 10 Best Responder Safe Policy Selection 9 Restricted Nash Response Restricted Stackelberg Response 8 Restricted Stackelberg Response with Safety

7

6

5 Score 4

3

2

1

0 10 20 30 40 50 60 70 80 90 100 Round

Outcomes vsWorstCase Opponent intravellersdilemma 10 Best Responder Safe Policy Selection 9 Restricted Nash Response Restricted Stackelberg Response 8 Restricted Stackelberg Response with Safety

7

6

5 Score 4

3

2

1

0 10 20 30 40 50 60 70 80 90 100 Round

Figure 16: Performance in the Travellers Dilemma game.

20