arXiv:1911.08418v3 [cs.GT] 15 Nov 2020

Fast Convergence of Fictitious Play for Diagonal Payoff Matrices

Jacob Abernethy∗ Kevin A. Lai† Andre Wibisono‡

∗School of Computer Science, Georgia Institute of Technology.
†School of Computer Science, Georgia Institute of Technology.
‡School of Computer Science, Georgia Institute of Technology.
Copyright © 2021 by SIAM. Unauthorized reproduction of this article is prohibited.

Abstract

Fictitious Play (FP) is a simple and natural dynamic for repeated play in zero-sum games. Proposed by Brown in 1949, FP was shown to converge to a Nash Equilibrium by Robinson in 1951, albeit at a slow rate that may depend on the dimension of the problem. In 1959, Karlin conjectured that FP converges at the more natural rate of $O(1/\sqrt{t})$. However, Daskalakis and Pan disproved a version of this conjecture in 2014, showing that a slow rate can occur, although their result relies on adversarial tie-breaking. In this paper, we show that Karlin's conjecture is indeed correct for the class of diagonal payoff matrices, as long as ties are broken lexicographically. Specifically, we show that FP converges at an $O(1/\sqrt{t})$ rate when the payoff matrix is diagonal. We also prove this bound is tight by showing a matching lower bound in the identity payoff case under the lexicographic tie-breaking assumption.

1 Introduction

In a two-player zero-sum game, we are given a payoff matrix $A \in \mathbb{R}^{n \times m}$, whose $ij$-th entry denotes how much the row player pays the column player when the two players play actions $i$ and $j$, respectively. When each player selects their actions randomly, with the row player sampling $i$ from some distribution $x \in \Delta_n$ and the column player sampling $j$ from some distribution $y \in \Delta_m$ (where $\Delta_d$ is the $(d-1)$-dimensional simplex in $\mathbb{R}^d$), the expected gain to the column player (or, equivalently, the expected loss to the row player) is exactly $x^\top A y$. Following the work of von Neumann and Nash, we say that a pair of distributions $(x^*, y^*)$ is a minimax point, or Nash Equilibrium, if

$$(x^*)^\top A y \le (x^*)^\top A y^* \le x^\top A y^* \quad \text{for all } (x, y) \in \Delta_n \times \Delta_m.$$

In what might be considered the fundamental theorem of game theory, von Neumann proved [30] that every zero-sum game admits an equilibrium pair; Nash later showed the same holds for non-zero-sum games [20]. Von Neumann's theorem is often stated in terms of the equivalence of a min-max and a max-min:

$$\min_{x \in \Delta_n} \max_{y \in \Delta_m} x^\top A y = \max_{y \in \Delta_m} \min_{x \in \Delta_n} x^\top A y.$$

It is easy to check that the minimizer of the left-hand side and the maximizer of the right-hand side together form the desired equilibrium pair.

One of the earliest methods for computing Nash Equilibria in zero-sum games is fictitious play (FP), proposed by Brown [7, 8]. FP is perhaps the simplest dynamic one might envision for repeated play in a game: in each round, each player considers the empirical distribution of the other player's actions and selects their action as the best response to this statistic. Formally, we can define state variables $x^{(t)}, y^{(t)}$ at each iteration $t$ and update according to the rule

$$
\begin{aligned}
x^{(t+1)} &= x^{(t)} + e_{i(t)}, & i(t) &= \arg\min_{i \in [n]} e_i^\top A y^{(t)}, \\
y^{(t+1)} &= y^{(t)} + e_{j(t)}, & j(t) &= \arg\max_{j \in [m]} (x^{(t)})^\top A e_j,
\end{aligned}
\tag{1.1}
$$

where $e_\ell$ is the $\ell$-th standard unit basis vector. Brown conjectured that the scaled state variables $(\hat{x}^{(t)}, \hat{y}^{(t)}) = (\tfrac{1}{t} x^{(t)}, \tfrac{1}{t} y^{(t)})$ would converge to a minimax point, and in 1951, Robinson showed that FP converges asymptotically to a minimax point [25].

In addition to having an intuitive game-theoretic interpretation, FP has several other strengths. The update itself is simple, having no step-size parameter to tune. Since the row and column players only need $O(n)$ or $O(m)$ memory, respectively, the dynamic is also amenable to distributed computation. These qualities have made FP an appealing object of study, and many works have sought to prove convergence of FP in more general settings [18, 26, 19, 6]. FP has also inspired many algorithms, such as the Follow-the-Perturbed-Leader algorithm [15] and other online learning algorithms. More recently, DeepMind used an algorithm called Prioritized Fictitious Self-Play as part of the training for their AlphaStar program for playing competitive StarCraft [29].

Despite the extensive work on FP, much of it has focused on asymptotic convergence, leaving significant questions about the convergence rate of the dynamic. While not stated as such, Robinson's 1951 proof actually implies that FP converges to within $O(t^{-1/(m+n-2)})$ of the equilibrium pair $(x^*, y^*)$ after $t$ rounds of play [25]. Robinson's result utilized a recursive argument that successively eliminates actions of the players, and she did not address whether this rate was tight.

In what is often known as Karlin's Conjecture, Karlin [17] suggested in 1959 that the true rate may be significantly faster, perhaps on the order of $O(t^{-1/2})$. This remained an open question for decades, but was seemingly put to rest in 2014 by Daskalakis and Pan [10], who produced an instance of a game and an FP dynamic for which the convergence rate was $\Omega(t^{-1/n})$; in particular, the rate is slow and depends on the number of actions, similar to the bound of Robinson. Their lower bound construction follows along the same lines as the upper bound of Robinson, recursively generating harder instances as more actions are given to the players.

The goal of our work is to show that Karlin's conjecture may have only been ostensibly resolved, and we argue that a slightly more precise version of the conjecture is likely to be true, namely that a particular form of FP admits a rate of $O(t^{-1/2})$. The imprecise aspect of Karlin's conjecture is that the $\arg\min$ and $\arg\max$ in (1.1) are not well defined: many solutions can exist in the event of ties. Daskalakis and Pan distinguish between the model in which ties arising in (1.1) can be broken in an arbitrary (adversarial) fashion and the model in which ties are broken lexicographically; they acknowledge that their lower bound holds only in the former case. Their lower bound construction heavily exploits the ill-defined nature of (1.1), employing carefully constructed tie-making and adversarial tie-breaking to obtain the slow rate. We emphasize that one of the appealing properties of FP is that it is a natural game dynamic, yet the dynamic proposed by Daskalakis and Pan, while technically satisfying a definition of fictitious play, is by no means natural.

We consider the convergence of a well-defined version of FP with lexicographic tie-breaking, where the $\arg\min$ and $\arg\max$ functions break ties by selecting the winner with the smallest index. Lexicographic tie-breaking is one of the simplest tie-breaking methods, being the default when writing a min or max in code. We show that, under lexicographic tie-breaking, FP converges at a rate of $O(t^{-1/2})$ when the payoff matrix is diagonal; this is the first positive result showing any improvement over Robinson's result for matrices of size $3 \times 3$ or larger. We further provide a lower bound of $\Omega(t^{-1/2})$ under lexicographic tie-breaking, showing that our iteration complexity bound is tight.

Our analysis gives a tight characterization of how the FP dynamic evolves in the diagonal case. We show how lexicographic tie-breaking causes the dynamic to behave in a specific way, which allows us to prove our upper and lower bounds. To our knowledge, this is the first work that leverages lexicographic tie-breaking to prove fast convergence, and we hope our work lays the groundwork for proving the $O(t^{-1/2})$ upper bound for arbitrary payoff matrices. We conclude by discussing some ways to extend our analysis and related open questions.

1.1 Related work

We give a brief overview of prior work on fictitious play and related game dynamics.

Fictitious Play. The original formulation of FP was by Brown [7, 8], where he mentions both discrete-time and continuous-time dynamics. Since then, FP has been studied extensively: many works have explored the asymptotic convergence of fictitious play in various game settings [18, 26, 19, 6], while another notable line of work examines properties of the continuous-time version of FP [13, 21, 22, 27]. The first convergence rate for FP was shown by Robinson [25], who proves that FP achieves a rate of $O(t^{-1/(m+n-2)})$ under arbitrary tie-breaking. Karlin [17] later conjectured that the convergence rate is $O(t^{-1/2})$. This matches the convergence rate of several related dynamics based on no-regret algorithms, as described below. Moreover, FP appears to always achieve this rate empirically, as we illustrate in Figure 2. Daskalakis and Pan [10] construct a counterexample to Karlin's strong conjecture using carefully designed adversarial tie-breaking rules, showing that FP for a zero-sum game on the $n \times n$ identity matrix has a worst-case convergence rate of $\Omega(t^{-1/n})$.

No-regret dynamics. The literature on so-called online learning [9] considers the family of problems in which an algorithm must make a decision on each of a sequence of $T$ rounds (this could be a discrete choice among $n$ alternatives, for example, or a real-valued parameter vector $\theta$), and then the decision is evaluated according to some loss which provides appropriate feedback.

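The update rule (1.1) with lexicographic tie-breaking can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the zero initialization of the count vectors, the function names, and the use of the duality gap $\max_j \hat{x}^\top A e_j - \min_i e_i^\top A \hat{y}$ as the distance-to-equilibrium measure are our own assumptions. The gap is nonnegative and vanishes exactly when $(\hat{x}, \hat{y})$ satisfies the equilibrium condition of the Introduction, since both sides of that condition are linear and only pure deviations need to be checked. Python's built-in `min` and `max` return the first optimal element they encounter, which is precisely the smallest-index rule.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def lex_argmin(values: Sequence[float]) -> int:
    """Smallest-index minimizer: built-in min() keeps the first optimum it sees."""
    return min(range(len(values)), key=values.__getitem__)


def lex_argmax(values: Sequence[float]) -> int:
    """Smallest-index maximizer: built-in max() keeps the first optimum it sees."""
    return max(range(len(values)), key=values.__getitem__)


def fictitious_play(A: Matrix, T: int) -> Tuple[List[float], List[float]]:
    """Run T >= 1 rounds of (1.1); return the scaled iterates (x/T, y/T).

    Assumption (not from the text): both count vectors start at zero, so the
    very first round is decided by tie-breaking alone.
    """
    n, m = len(A), len(A[0])
    x = [0] * n   # x[i] = number of times the row player chose action i
    y = [0] * m   # y[j] = number of times the column player chose action j
    # Integer payoffs keep these running sums exact, so ties are detected exactly.
    Ay = [0] * n  # Ay[i] = e_i^T A y, the row player's loss for each action
    xA = [0] * m  # xA[j] = x^T A e_j, the column player's gain for each action
    for _ in range(T):
        # Both best responses use the round-t counts (simultaneous play).
        i_t = lex_argmin(Ay)
        j_t = lex_argmax(xA)
        x[i_t] += 1
        y[j_t] += 1
        # Incremental updates: y += e_{j_t} adds column j_t of A to Ay,
        # and x += e_{i_t} adds row i_t of A to xA.
        for i in range(n):
            Ay[i] += A[i][j_t]
        for j in range(m):
            xA[j] += A[i_t][j]
    return [c / T for c in x], [c / T for c in y]


def duality_gap(A: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """max_j x^T A e_j - min_i e_i^T A y: >= 0, and 0 exactly at a minimax point."""
    n, m = len(A), len(A[0])
    best_col = max(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))
    best_row = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_col - best_row
```

For example, with `I2 = [[1, 0], [0, 1]]`, one would expect `[T ** 0.5 * duality_gap(I2, *fictitious_play(I2, T)) for T in (100, 1000, 10000)]` to stay roughly constant rather than shrink to zero, consistent with the $O(t^{-1/2})$ upper bound and $\Omega(t^{-1/2})$ lower bound for the identity case discussed above. Maintaining $Ay$ and $x^\top A$ incrementally keeps each round at $O(n + m)$ work, in line with the memory remark in the Introduction.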

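To see how far apart the rates discussed in this section are, here is some back-of-the-envelope arithmetic (our own illustration) that ignores the constants hidden by $O(\cdot)$ and $\Omega(\cdot)$: take a $10 \times 10$ game ($n = m = 10$) after $t = 10^6$ rounds, and then ask how many rounds each rate needs to reach accuracy $\varepsilon = 10^{-2}$.

```latex
% Illustrative only: constants in O(.) and Omega(.) are dropped.
n = m = 10,\; t = 10^6:\quad
t^{-1/(m+n-2)} = 10^{-6/18} = 10^{-1/3} \approx 0.464,\qquad
t^{-1/n} = 10^{-6/10} \approx 0.251,\qquad
t^{-1/2} = 10^{-3}.
% Rounds needed to reach accuracy eps = 10^{-2}:
\varepsilon^{-(m+n-2)} = 10^{36}
\quad\text{versus}\quad
\varepsilon^{-2} = 10^{4}.
```

Under these simplifications, Robinson's bound and the Daskalakis and Pan rate barely improve after a million rounds, whereas an $O(t^{-1/2})$ rate is already at the $10^{-3}$ scale.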