Learning and Convergence to Nash in Games with Continuous Action Sets∗

Sebastian Bervoets† Mario Bravo‡ and Mathieu Faure§

November 9, 2016

Abstract

We analyze convergence to Nash equilibria in games that have continuous action sets, when players use a simple learning process. We assume that agents are unsophisticated and know nothing about the game. They only observe their own payoff after actions have been played and choose their next action according to how their last payoffs have varied. We also assume that players update their actions simultaneously, and we allow for heterogeneous patterns of interactions. We show that despite such limited information, in all games with concave payoff functions, convergence to Nash occurs much more easily than in discrete games. Further, we show that in ordinal potential games, convergence to Nash occurs with probability one. We also show that in games with strategic complements, the process converges with probability one to a Nash equilibrium that is stable. Finally we focus our attention on network games, where we get even stronger results. We believe that these results argue strongly for the use of the Nash equilibrium concept in games with continuous action sets.

Keywords: Learning, Nash equilibrium, Potential games, Strategic complements ∗We wish to thank Y. Bramoullé, P. Mertikopoulos, N. Allouch, F. Deroïan, M. Belhaj, and participants of the IHP seminar in Paris and participants of the Conference on Network Science in Economics in Stanford for useful conversations. The authors thank the French National Research Agency (ANR) for their financial support through the program ANR 13 JSH1 0009 01. Part of this work was completed while the authors were hosted at the Institute for Mathematical Sciences, National University of Singapore, during the Workshop on Stochastic Methods in Game Theory in 2015. †Aix-Marseille University (Aix-Marseille School of Economics), CNRS and EHESS ‡Universidad de Santiago de Chile, Departamento de Matemática y Ciencia de la Computación §Aix-Marseille University (Aix-Marseille School of Economics), CNRS and EHESS

1 Introduction

The concept of Nash equilibrium is central to economics. However, computing Nash equilibria requires agents to possess an amount of information and sophistication that may seem unrealistic. This has motivated a large literature on learning, which aims at investigating whether agents with limited information and sophistication, playing a repeated game, will eventually converge to a Nash equilibrium. Attention has mainly been devoted to games with finite action sets. However, many variables appearing in economic models, such as prices, effort or time allocation, give rise to games with continuous action sets, about which little is known in terms of learning.

We tackle this issue by proposing a natural learning process for continuous games in contexts where agents have as little information as possible on the game they are playing, and almost no sophistication. We analyze the convergence of this learning process, which yields some general insights. Contrary to discrete games, where convergence to Nash is generally difficult to obtain even for two- or three-player games, we show that convergence is easier to obtain with continuous action sets, for very general classes of games and with arbitrary numbers of players. Actually, convergence is possible in any game, provided the payoff functions are strictly concave in one's own action. In addition, while convergence to Nash occurs in many cases in ordinal potential games, it is guaranteed in games with strategic complements. In that case we also prove that the process will converge to Nash equilibria that are stable.

In the learning process we consider, a continuous game is repeated across periods that are divided into two stages. First, all players simultaneously move a small distance away from their current action, in a direction chosen at random. This distance decreases over time and goes to zero as the periods unfold. We call this the exploration stage. Second, players observe the payoff associated with their exploration, and compare it with the previous payoff. If the payoff has increased (resp. decreased), players move in the same (resp. opposite) direction, with an amplitude proportionate to the variation in payoff. We call this the updating stage. Players receive a new payoff and a new exploration stage begins.

This learning process is very simple: it requires no sophistication from the agents, who only need to compare their own payoff in the last and the current periods. Nor does it require any information to be made available to agents, as they do not know what mechanism produced these payoffs. They know nothing about their payoff function, who they are playing against or what actions their opponents have chosen. They may actually not even know that they are playing a game. There is a stochastic part to the process in the exploration stage. We analyze its (random)

set of accumulation points, called the limit set,1 by resorting to stochastic approximation theory. This theory tells us that the long-run behavior of the stochastic process is driven by some underlying deterministic dynamic system. We thus start by identifying the deterministic dynamic system that underlies our specific stochastic learning process, and use its asymptotic behavior to establish the properties of the limit set of the learning process. In fact, to understand the behavior of the limit set of our learning process, we only need to relate it to the set of stationary points of the deterministic dynamic system. We show that Nash equilibria are always stationary points, although other points may also be stationary.

Before detailing our results, we should point out that two cases may arise. While in many economic applications, the set of Nash equilibria consists of isolated points, in some cases continua of equilibria may appear. Here, to be as general as possible, we treat both cases. When equilibria are isolated we will talk about convergence in the usual sense of convergence to a point; when they are in a continuum we will talk about convergence to sets. Likewise, when discussing stability we will refer to linear stability in the case of isolated equilibria and to the notion of attractors in the case of continua. Naturally, results for the general case are less precise than results for the specific case of isolated equilibria. Therefore in each section we will state general results in the form of propositions, and results for isolated points in the form of theorems.

Our first set of results (section 3) holds for any game that is strictly concave. We prove that, although some stationary points are non-Nash, the limit set of our process is never contained in this set. Further, we show that if some set of Nash equilibria is stable, then the process will converge to it with positive probability. To be able to make such a statement at this level of generality is surprising. In the case of isolated points these results get more precise: the process will never converge to stationary points that are non-Nash, and it will converge to a Nash equilibrium that is stable with positive probability. Thus, agents playing any (strictly concave) game that they know nothing about, and following a simple updating process, can end up playing a stable Nash equilibrium. Given how difficult it is to obtain convergence to Nash in discrete games, we believe that these findings are striking.

In sections 4 and 5, we provide more precise convergence results by focusing on two broad classes of games. We start with ordinal potential games, for which we obtain a striking improvement. First, the limit set of our learning process is always contained in the set of stationary points of the dynamics. When equilibria are isolated, this implies that the process converges to a Nash equilibrium with probability one. This is a very positive result.

1By abuse of language, we will say that the process converges to its limit set, whether it is a point or a non-trivial set. It is non-convergent when the process goes to infinity.

Second, although convergence to unstable equilibria cannot be ruled out, it is still possible to qualitatively distinguish between equilibria that are stable and those that are not, using the properties of the potential function.

Finally, we focus on games with strategic complements, a class of games that has also been extensively studied for their ability to model wide-ranging economic situations. However, unlike for potential games, the literature on games with strategic complements does not provide any general method that can help determine whether our stochastic process converges to Nash. This is why convergence results for standard learning processes with discrete actions are hard to obtain in these games (see Benaïm and Faure (2012)). However, for our learning process we are able to prove a very tight result: under some weak restrictions on the pattern of interactions, our process converges to a Nash equilibrium with probability one in games with strategic complements. Moreover, we prove that this equilibrium is necessarily stable. Thus, learning not only leads to Nash, it also serves as a selection device between stable and unstable equilibria.

Last, we look into games played on networks (section 6). These games have become popular in economics over the last twenty years, as they consider heterogeneous interactions that are described by an underlying network. As we show, this framework allows us to interpret our different validity conditions as a simple condition on the bipartiteness of the network. It also enables us to obtain slightly tighter results.

Related Literature Our paper employs a method based on stochastic approximation theory, initially introduced in Ljung (1977) and often referred to as the ODE method, which consists in approximating a random process by a mean ordinary differential equation. This method has been extended by many authors (see for instance Kushner and Clark (1978), Benveniste et al. (1990), Duflo (1996) or Kushner and Yin (2003)) to simple dynamics (typically linear or gradient-like systems). In a series of papers, Benaïm and coauthors introduced a more general approach describing the asymptotic behavior of the random process, beyond gradients and linear dynamics. Most of the ensuing results can be found in Benaïm (1999), on which our methodology heavily relies. It is worth noting that stochastic approximation theory can be partially extended to differential inclusions (see e.g. Benaïm et al. (2005) and Faure and Roth (2010)). As mentioned earlier, the learning literature has essentially focused on finite action games. Fictitious play (introduced in Brown (1951)) is possibly the most widely-explored adaptive process in game theory. At each step, players play a best response to the opponent's empirical average moves. Under this learning rule, players' average actions are shown to converge for 2-player zero-sum games (Robinson (1951)), for 2 × N games (Miyazawa (1961) and Berger

(2005)), and for potential games (Monderer and Shapley (1996a)). Most of these convergence results also hold for stochastic fictitious play, introduced by Fudenberg and Kreps (1993), where payoff functions are perturbed by random variables: convergence for this procedure holds in 2 × 2 games (see Benaïm and Hirsch (1999)), zero-sum and potential games (see Hofbauer and Sandholm (2002)) and supermodular games (see Benaïm and Faure (2012)). Apart from these specific cases however, examples can be found where this class of learning procedures does not converge to Nash once there are at least 3 actions per player (see Shapley et al. (1964)). Our paper differs from all these contributions in assuming that agents do not observe their opponents' actions. Moreover we consider continuous action games.2

When agents do not observe their opponents' actions, Reinforcement Learning processes are the most popular (see Börgers and Sarin (1997) or Erev and Roth (1998) for pioneering work). In these procedures, a state variable that aggregates all the available information is updated at every stage, and agents choose their action based on these variables and on some of the previous moves. The state variable may or may not include information about opponents' actions and payoffs. Reinforcement learning processes have been studied in very specific finite games: 2 × 2 games in Posch (1997), 2-player games with positive payoffs in Börgers and Sarin (1997), Beggs (2005) or Hopkins and Posch (2005). On the same topic, see also Leslie and Collins (2005), Cominetti et al. (2010), Bravo and Faure (2015) and Bravo (2016). However, it is not known whether these procedures converge to Nash in more general games. Once again, we differ from the literature in focusing on continuous games, to which it would be difficult to adapt reinforcement learning.

In the context of continuous action spaces and limited information, Dindos and Mezzetti (2006) consider a stochastic adjustment process called the better-reply process. At each step, agents are sequentially picked to play a strategy chosen at random, while other players do not move. The agent then observes the hypothetical payoff that this action would yield, and decides whether to stick to this new strategy or to go back to the previous one, depending on the payoff. This process converges to Nash when actions are either substitutes or complements in what are called aggregative games. Although the authors, like us, assume almost no knowledge of the game, their contribution differs in that agents revise their strategy sequentially. The driving force for convergence is that with positive probability, every player will be randomly drawn as many times as necessary to approximate a best response. The proof techniques therefore consist in showing that best responses are well-enough approximated to guarantee convergence. However, it is easy to construct a simple game where simultaneous updating leads to non-convergence. In our paper, we assume that all players move simultaneously.

2Note that stochastic fictitious play has been adapted by Perkins and Leslie (2014) to continuous action games; however this adapted process is not only complex, but it still assumes that agents observe their opponents’ actions.

Last, a recent paper by Mertikopoulos (2016) analyzes learning in concave games with continuous action spaces and shows that convergence to Nash is possible. Yet, contrary to our model, it is assumed that players have enough information on the game they are playing to compute an approximation of their payoff gradient in every direction they can take.

The rest of the paper is organized as follows: we introduce the model in section 2 and show that our learning process can be written as a discrete-time stochastic process. In section 3, we give general results on convergence, before turning to the case of ordinal potential games in section 4 and games with strategic complements in section 5. All our proofs are in the appendix.

2 The model

We consider a repeated game G = (N , X, u), where N = {1,...,N} is the set of players,

$X_i \subset \mathbb{R}$ is a continuous action space and $X = \times_{i=1,\ldots,N} X_i$. An action $x_i \in X_i$ can be thought of as an effort level chosen by individuals, a price set by a firm, a monetary contribution to a public good, etc. Because in all these examples actions take positive values, we will generally consider $X_i = [0, +\infty[$. Finally, $u = (u_i)_{i=1,\ldots,N}$ is the vector of payoff functions, where $u_i$ will have various properties in different sections of the paper. However, we will always make the following standing assumption:

Hypothesis 1 $u_i$ is assumed to be twice-differentiable on $\mathbb{R}_+^N$ and strictly concave in $x_i$: $\frac{\partial^2 u_i}{\partial x_i^2}(x) < 0$.

Hypothesis 1 implies that once opponents' actions are fixed, individuals' payoff functions have a unique maximum.

Interactions

We allow for interactions, measured by $\frac{\partial u_i}{\partial x_j}$, to be as general as possible: they can be heterogeneous across players, they can be of any sign, and they may depend on the action profile x. While for most of our results no assumptions on the patterns of interactions are needed, we will occasionally use two restrictions. The first is when they are symmetric in sign:

Definition 1 (Symmetric interactions) A game G is said to have symmetric interactions if $\forall i \neq j$ and $\forall x$, we have
$$\operatorname{sgn}\left(\frac{\partial u_i}{\partial x_j}(x)\right) = \operatorname{sgn}\left(\frac{\partial u_j}{\partial x_i}(x)\right)$$
where $\operatorname{sgn}(a) = 0$ if $a = 0$.

Most of the continuous games in the economic literature fall into this class. Note that a game with symmetric interactions does not require them to be of equal intensity. Also, symmetric interactions allow for patterns where individual i exerts a positive externality on individual j and a negative externality on individual k. Note finally that games with symmetric interactions do not imply that $\operatorname{sgn}\left(\frac{\partial u_i}{\partial x_j}(x)\right) = \operatorname{sgn}\left(\frac{\partial u_i}{\partial x_j}(x')\right)$. Thus signs may change when x changes.

The second restriction is essentially for illustration purposes: it concerns interactions captured by an underlying graph, as in the popular literature on games in networks. This graph is induced by the interaction matrix G(x) defined as: $g_{ii}(x) = 0$ and, for $i \neq j$, $g_{ij}(x) = 0$ whenever there exists a neighborhood U of x such that $\frac{\partial u_i}{\partial x_j} = 0$ on U; otherwise, $g_{ij}(x) = 1$. Note that the interaction matrix is defined in the most general way, as it depends on the action profile x. Thus it can change as actions change. However, in the economic models we have in mind, the interaction graph is independent of the action profile, in which case we will say that our game is a network game:

Definition 2 (Network game) A game G is a network game if, for any $x, x' \in X$ and any $i \neq j$, we have $g_{ij}(x) = g_{ij}(x')$.

Hence, in a network game, we can drop the x from the notation and write $g_{ij} = 0$ or $g_{ij} = 1$, as is standard in the network literature. Note that, as usual, interactions can flow through the network: even though $g_{ij} = 0$, individual j's action will affect i's payoff at equilibrium when the network is connected, through the path of individuals that connects i to j. In section 6, we focus on network games since, in addition to being of interest for the economic literature, these games allow us to express some of our conditions in the form of topological conditions that are easy to check. Network games also provide a convenient way of illustrating many of the concepts that we refer to in the next sections. We present a very simple running economic example, namely Cournot competition on a networked market.

Example 1 (Networked Cournot competition) Consider a set of firms, all producing the same good and competing à la Cournot. Suppose that there exists an underlying undirected graph whose nodes are the firms and whose edges represent interactions, so that firm i only competes against her neighbors. Let $x_i \in \mathbb{R}^+$ denote the quantity produced by firm i at a constant marginal cost $c > 0$, which they will sell at price $p_i$, and let α and γ be positive real numbers. Denote $r \equiv \frac{\alpha - c}{\gamma}$ and assume that the parameters are such that $r > 0$. Finally, let $\bar{x}_i$ be the aggregate quantity produced by i's neighbors: $\bar{x}_i = \sum_{j : g_{ij} = 1} x_j$. The market price

is given by:
$$p_i(x) = \alpha - \frac{\gamma}{2} x_i - \gamma \bar{x}_i$$
and firm i's profits are given by:
$$u_i(x) = p_i(x)\, x_i - c\, x_i$$

This game is a network game that satisfies hypothesis 1 and exhibits symmetric interactions.
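To fix ideas, here is a minimal numerical sketch of Example 1 (not part of the paper): it evaluates the price and profit functions on a hypothetical 4-firm ring network, with illustrative parameter values for α, γ and c.

```python
import numpy as np

# Illustrative parameters (assumed values, not taken from the paper): alpha, gamma, c > 0.
ALPHA, GAMMA, C = 2.0, 1.0, 0.5
R = (ALPHA - C) / GAMMA            # r = (alpha - c) / gamma

# Hypothetical interaction graph: 4 firms on a ring (the "square" network used later).
G = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

def price(i, x):
    """Market price faced by firm i: p_i(x) = alpha - (gamma/2) x_i - gamma * xbar_i."""
    xbar_i = G[i] @ x              # aggregate quantity produced by i's neighbors
    return ALPHA - 0.5 * GAMMA * x[i] - GAMMA * xbar_i

def payoff(i, x):
    """Profit of firm i: u_i(x) = p_i(x) x_i - c x_i."""
    return (price(i, x) - C) * x[i]

# The first-order condition x_i = max{r - xbar_i, 0} is satisfied by the symmetric
# profile x = (r/3, ..., r/3) on the ring, where every firm has two neighbors.
x_star = np.full(4, R / 3)
print([round(payoff(i, x_star), 4) for i in range(4)])
```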

The Learning Process

Agents aim at maximizing their payoff by choosing an action, but they do not know the payoff functions (ui)i, nor the set of players. They observe the realization of their payoffs after every period and choose some new action, according to a rule that we describe in words before presenting it formally. We detail what agent i does, bearing in mind that every agent simultaneously uses the same rule.

• At the beginning of a round, agent i is playing action xi and is enjoying the associated payoff ui(xi, x−i). Every round of the game is subdivided into two periods.

1. In the first period, player i plays a new action $x_i'$ that is very close to his current action $x_i$. This deviation $|x_i' - x_i|$ is decreasing and tends to zero as rounds of the game unfold. Whether $x_i' > x_i$ or $x_i' < x_i$ is drawn at random, each with probability one half. Agent i then discovers the new payoff $u_i(x_i', x_{-i})$, and this ends the first period. We call this the exploration stage.

2. The second period is the updating stage, where agent i chooses an action $x_i''$ as follows: if $u_i(x_i', x_{-i})$ is higher than $u_i(x_i, x_{-i})$, then agent i moves in the same direction as in the first period. Conversely, if $u_i(x_i', x_{-i})$ is lower than $u_i(x_i, x_{-i})$, then agent i moves in the opposite direction. The magnitude by which agent i changes his action depends on the variation in payoff: if increasing his action from $x_i$ to $x_i'$ has resulted in a small increase (resp. decrease) of his payoff, agent i will choose an action $x_i''$ that is slightly higher (resp. lower) than $x_i$. If on the contrary, it has resulted in a large increase (resp. decrease) in his payoff, agent i will choose an action $x_i''$ that is far higher (resp. lower) than $x_i$.

• At the end of the second period, agent i gets $u_i(x_i'', x_{-i}'')$ and the round ends. A new round starts with $x''$ as the new action profile.

Formally, let $(\epsilon_n^i)_n$ be a sequence of i.i.d. random variables such that $\mathbb{P}(\epsilon_n^i = 1) = \mathbb{P}(\epsilon_n^i = -1) = 1/2$. Initially agent i chooses an action $e_0^i$. Then at each stage $n \geq 0$, player i selects his actions at stages $2n+1$ and $2n+2$ in the following way:

(i) $\epsilon_n^i$ is drawn and player i chooses $e_{2n+1}^i := e_{2n}^i + \frac{1}{n+1}\epsilon_n^i$; (exploration stage)

(ii) agent i plays $e_{2n+2}^i := e_{2n}^i\left(1 + \epsilon_n^i \Delta u_{n+1}^i\right)$, where (updating stage)
$$\Delta u_{n+1}^i := u_i(e_{2n+1}^i, e_{2n+1}^{-i}) - u_i(e_{2n}^i, e_{2n}^{-i})$$

Let $x_n = e_{2n}$ and $\mathcal{F}_n$ be the history generated by $\{e_1, \ldots, e_{2n+1}\}$. Studying the asymptotic behavior of the random sequence $(e_n)_n$ amounts to studying $(x_n)_n$. Hence the main focus of this paper is on the convergence of the random process $(x_n)_n$.

Remark 1 Notice that in the updating stage, we multiply $\Delta u_{n+1}^i$ by $e_{2n}^i$. This is because at each step of the process, we need to make sure that actions stay within the bounds (positive, that is). To remain positive, agents' actions therefore need to change by smaller and smaller amounts as they approach 0. One natural way of doing that is to multiply by a factor that takes smaller and smaller values as actions get closer to 0, as we did. In the appendix we show that our process is well-defined.
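The two-stage rule above is easy to simulate. The sketch below (not part of the paper) runs it on the networked Cournot game of Example 1 on an assumed 4-firm ring with toy parameters. The payoffs are rescaled so that the normalization invoked in Remark 1 holds, and, as allowed by the discussion following Lemma 1, the step sequence 1/(n+1) is replaced by a more slowly decaying admissible sequence so that the iterates settle within a modest number of rounds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Networked Cournot of Example 1 on an assumed 4-firm ring, with toy parameters.
GAMMA, R = 1.0, 1.0                       # r = (alpha - c) / gamma
G = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
SCALE = 0.1                               # payoff rescaling, cf. the normalization in Remark 1

def payoffs(x):
    """Rescaled profits u_i(x) = SCALE * gamma * (r - x_i/2 - xbar_i) * x_i, for all i at once."""
    return SCALE * GAMMA * (R - 0.5 * x - G @ x) * x

x = np.array([1.3, 1.1, 1.2, 1.05])       # initial actions x_0^i > 1 (cf. Remark 1)
for n in range(200_000):
    step = (n + 1) ** -0.7                # admissible step: the sum diverges and the term -> 0
    eps = rng.choice([-1.0, 1.0], size=4) # directions drawn independently for every player
    du = payoffs(x + step * eps) - payoffs(x)   # exploration stage: observed payoff variation
    x = x * (1.0 + eps * du)                    # updating stage: move proportionally to du

print(np.round(x, 3))  # the iterate drifts toward the zeroes of F; run longer for tighter convergence
```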

The next lemma is fundamental for the rest of our analysis. It provides the first step in relating the behavior of the random process (xn)n to the behavior of a mean deterministic system.

Lemma 1 The iterative process can be written as
$$x_{n+1} = x_n + \frac{1}{n+1}\left(F(x_n) + U_{n+1} + \xi_{n+1}\right) \qquad (1)$$
where

(i) $F(x) = (F_i(x))_i$ with $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x_i, x_{-i})$

(ii) $U_{n+1}$ is a bounded martingale difference (i.e. $\mathbb{E}(U_{n+1} \mid \mathcal{F}_n) = 0$)

(iii) $\xi_n = O(1/n)$.

The iterative process (1) is a discrete-time stochastic process with step $\frac{1}{n+1}$, where $\sum_n \frac{1}{n+1} = \infty$ and $\lim_{n\to\infty} \frac{1}{n+1} = 0$.3 If there were no stochastic term, the process (1) would read
$$x_{n+1} = x_n + \frac{1}{n+1} F(x_n)$$

3These two observations are crucial. It is important that the sum diverges, to guarantee that the process does not get "stuck" anywhere, unless agents want to stay where they are. Further, it is important that the terms go to zero, so that the process can "settle" when agents want to. In fact, the term $\frac{1}{n+1}$ can be replaced by any step of the form $\alpha_{n+1}$, where $\sum_n \alpha_{n+1} = \infty$ and $\lim_{n\to\infty} \alpha_{n+1} = 0$, without affecting the results.

which corresponds to the well-known Euler method, a numerical procedure for approximating the solutions of the deterministic ordinary differential equation (ODE)

$$\dot{x} = F(x) \qquad (2)$$
Let $\varphi(x, t)$ denote the flow of $F(\cdot)$, i.e. the position of the solution of (2) with initial condition x, at time t.4 Of course, the (stochastic) process (1) differs from the (deterministic) process (2), because of the random noise. However, properties (ii) and (iii) of lemma 1 guarantee that, on average, the noise will be negligible. This allows us to use stochastic approximation theory to relate the behavior of $(x_n)_n$ to the flow of the ODE (2). For this purpose, we will consider the restriction of the flow $\varphi$ to $X (= \mathbb{R}_+^N)$, although the dynamical system is defined over $\mathbb{R}^N$. We can do this since $X = \mathbb{R}_+^N$ is invariant for its flow,5 and our random process (1) always remains in the positive orthant. In the remainder of this section, we define the concepts that we will be using when we study the underlying deterministic system.
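As a rough illustration of the mean dynamics (again, not part of the paper), one can integrate ẋ = F(x) directly with a plain Euler scheme for the Cournot example on the assumed ring network; the trajectory settles at a zero of F.

```python
import numpy as np

GAMMA, R = 1.0, 1.0
G = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)

def F(x):
    """Mean field of the learning process: F_i(x) = x_i * du_i/dx_i = gamma * x_i * (r - x_i - xbar_i)."""
    return GAMMA * x * (R - x - G @ x)

# Plain Euler integration of xdot = F(x): an approximation of the flow phi(x0, t).
x, h = np.array([1.3, 1.1, 1.2, 1.05]), 0.01
for _ in range(20_000):                   # integrate up to time t = 200
    x = x + h * F(x)

print(np.round(x, 4), np.round(F(x), 6))  # the end point is (approximately) a stationary point of F
```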

Convergence concept and stationary points

Any solution curve of $\dot{x} = F(x)$ can either diverge, converge to a stationary point of the dynamics, be a periodic orbit or even exhibit more complex behavior. More generally, a point z is an ω-limit point if some solution curve gets arbitrarily close to z, at arbitrarily large times.

Definition 3 (ω-limit sets of x˙ = F (x)) Let x ∈ X. The omega limit set of x is

$$\omega(x) := \{z \in X : \lim_{k\to\infty} \varphi(x, t_k) = z \text{ for some } t_k \to \infty\}.$$

By abuse of language, we will say that the system converges to a set S when ω(x) ⊆ S. Among the ω-limit sets, the stationary points of the dynamical system will be of particular interest to us. A point x is a stationary point of F if F (x) = 0. The set of stationary points, denoted Z(F ), will be called the zeroes of F : Z(F ) := {x ∈ X; F (x) = 0}. Let NE denote the set of Nash Equilibria of the game G.

Proposition 1 x ∈ NE =⇒ x ∈ Z(F )

4Notice that by the regularity assumption on $u(\cdot)$, F satisfies the Cauchy-Lipschitz condition that guarantees that, for all $x \in X$, this flow is well-defined and unique. 5Let S be a compact subset of $\mathbb{R}^N$. Then S is invariant for the flow ϕ if i) $\forall x \in S, \forall t \in \mathbb{R}, \varphi(x, t) \in S$ and ii) $\forall y \in S, \forall t \in \mathbb{R}$, there exists $x \in S$ such that $\varphi(x, t) = y$. Condition i) states that once inside the invariant set, the solution should stay inside. Condition ii) states that the solution can "visit" every point within the set.

To see why Proposition 1 holds, observe that $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x)$. Thus,
$$x \in Z(F) \iff \forall i \in N, \quad x_i = 0 \ \text{ or } \ \frac{\partial u_i}{\partial x_i}(x_i, x_{-i}) = 0$$
while
$$x \in NE \iff \forall i \in N, \quad \frac{\partial u_i}{\partial x_i}(x_i, x_{-i}) = 0, \ \text{ or } \ x_i = 0 \text{ and } \frac{\partial u_i}{\partial x_i}(x_i, x_{-i}) \leq 0$$
Unfortunately, the set of stationary points of the ODE contains more than the set of Nash equilibria:
$$Z(F) = NE \cup \left\{x : F(x) = 0 \text{ and } \exists i \text{ s.t. } x_i = 0, \ \frac{\partial u_i}{\partial x_i}(x) > 0\right\}.$$

We call x ∈ Z(F ) \ NE an other zero (OZ) of the dynamical system: Z(F ) = NE ∪ OZ. In many economic applications, the stationary points of the dynamical system, and in particular Nash equilibria, would consist of isolated points. However, in some cases they can be part of a continuum of zeroes. Because we wish to be as general as possible, we will consider connected components of Z(F ):

Definition 4 Let Λ be a compact connected subset of Z(F) and let $N^\delta(\Lambda) := \{y \in X : d(y, \Lambda) < \delta\}$. We say that Λ is a connected component of Z(F) if there exists $\delta > 0$ such that $N^\delta(\Lambda) \cap Z(F) = \Lambda$.

When a connected component of Z(F ) is degenerate (i.e. a singleton), we say that this zero is isolated. A non-degenerate component can contain both Nash equilibria and other zeroes. When it only contains Nash equilibria, we will say that it is a Nash component. We illustrate these definitions on our running example.

Back to Example 1 Take the Cournot competition model presented in example 1. The first order conditions imply that, at equilibrium:

$$x_i^* = \max\{r - \bar{x}_i^*, 0\}$$

The set Z(F ) is given by

$$Z(F) = \{x \in X : x_i > 0 \Rightarrow x_i = r - \bar{x}_i\}.$$

As we now show, depending on the network structure, many different connected components of zeroes can appear in this simple game.

Figure 1: Different types of Nash equilibria (black dots correspond to effort r, white dots to null effort). (a) Nash component, α ∈ [0, r]. (b) Isolated Nash, connected to other zeroes.

• First consider the network with 4 agents featured in Figure 1. Then there is a Nash component $\{(r, 0, \alpha, r - \alpha) : \alpha \in [0, r]\}$. Another connected component of zeroes is given by $\{(0, r - \alpha - \beta, \alpha, \beta) : \alpha + \beta \leq r\}$; it contains exactly one Nash equilibrium, $(0, r, 0, 0)$, and other zeroes. This Nash equilibrium is isolated in NE, but obviously not in Z(F).

• Now consider the network with 4 agents featured in Figure 2. Then $(r, 0, r, 0)$ is a Nash equilibrium of the game, and is isolated in Z(F), as shown in panel 2a. Also the set
$$\Lambda := \left\{x : \sum_{i=1}^4 x_i = r, \ x_1 x_3 = 0\right\}$$
is a component of Z(F). It can be decomposed as
$$\Lambda = \{(0, \alpha, 0, r - \alpha) : \alpha \in [0, r]\} \ \cup \ \{(0, \alpha, \beta, r - \alpha - \beta) : \beta > 0, \ \alpha + \beta \leq r\} \qquad (3)$$
$$\cup \ \{(\beta, \alpha, 0, r - \alpha - \beta) : \beta > 0, \ \alpha + \beta \leq r\}, \qquad (4)$$
where the first element is a continuum of NE and the two others are continua of OZ. Notice that, even though the first element is a continuum of equilibria, it is not a Nash component, since it is connected to other zeroes.

• Finally, in the square (Figure 3), there are three isolated Nash equilibria: $(r, 0, r, 0)$, $(0, r, 0, r)$ and $(r/3, r/3, r/3, r/3)$ (which is interior). Also, the set
$$\bigcup_{k=1}^4 \left\{x : x_k = x_{k+1} = 0, \ \sum_{i=1}^4 x_i = r\right\}$$
(indices taken modulo 4) is a component of Z(F), included in OZ (it does not contain any NE). These membership conditions can be checked mechanically, as in the sketch following this list.
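A minimal sketch (not in the paper) of such a check for the square, using the characterizations of Z(F) and NE given after Proposition 1, with γ = 1 and r = 1 as assumed values:

```python
import numpy as np

R = 1.0
G = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)  # the square

def marginal(x):
    """du_i/dx_i(x) = gamma * (r - x_i - xbar_i), here with gamma = 1."""
    return R - x - G @ x

def in_Z(x, tol=1e-9):
    # x in Z(F): for every i, either x_i = 0 or du_i/dx_i(x) = 0
    m = marginal(x)
    return all(x[i] <= tol or abs(m[i]) <= tol for i in range(len(x)))

def in_NE(x, tol=1e-9):
    # x in NE: du_i/dx_i(x) = 0, or x_i = 0 and du_i/dx_i(x) <= 0
    m = marginal(x)
    return all(abs(m[i]) <= tol or (x[i] <= tol and m[i] <= tol) for i in range(len(x)))

profiles = [np.array([R, 0.0, R, 0.0]),        # specialized Nash equilibrium
            np.full(4, R / 3),                 # interior Nash equilibrium
            np.array([0.4, 0.6, 0.0, 0.0])]    # on the OZ component (x_3 = x_4 = 0, sum = r)
for x in profiles:
    print(x, "in Z(F):", in_Z(x), "in NE:", in_NE(x))
```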

Figure 2: Different types of Nash equilibria (black dots correspond to effort r, white dots to null effort). (a) Isolated Nash. (b) Continuum of Nash, α ∈ [0, r], contained in a larger connected component.

Stability concepts

We now introduce the stability concept we will use for invariant sets of the dynamics. The usual notion of stability for stationary points of an ODE is linear stability.6 However, this concept is suitable for analyzing the stability of isolated zeroes of F, not of sets (i.e. components of Z(F)). The main reason is that the Jacobian matrix at an equilibrium lying in a continuum has at least one eigenvalue with null real part. We will say that a component of Z(F) is stable if it is an attractor for the dynamical system (see Ruelle (1981)):

Definition 5 Let S ⊂ X be invariant for the flow ϕ. Then a set A ⊂ S is an attractor for x˙ = F (x) if (i) A is compact and invariant; (ii) there exists an open neighborhood U of A with the following property:

$$\forall \varepsilon > 0, \ \exists T > 0 \text{ such that } \forall x \in U, \ \forall t \geq T, \ d(\varphi(x, t), A) < \varepsilon.$$

An attractor for a dynamical system is a set with strong properties: it uniformly attracts a neighborhood of itself. Of course, the concept of attractor is more general than linear stability:

Remark 2 Let $\hat{x} \in Z(F)$. Then $\hat{x}$ is an attractor for $\dot{x} = F(x)$ if and only if $\hat{x}$ is linearly stable.

6Given $x \in Z(F)$, let DF(x) be the Jacobian matrix of F evaluated at x. Then x is linearly unstable if $\exists \lambda \in Sp(DF(x))$ s.t. $Re(\lambda) > 0$, while it is linearly stable if $Re(\lambda) < 0$ for all $\lambda \in Sp(DF(x))$.
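For isolated zeroes, linear stability can thus be checked directly from DF. As an illustration (not in the paper), the sketch below evaluates the Jacobian of $F_i(x) = x_i(r - x_i - \bar{x}_i)$ (with γ = 1) for the Cournot example on the square, at the interior equilibrium and at a specialized one, assuming r = 1.

```python
import numpy as np

R = 1.0
G = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)  # the square

def DF(x):
    """Jacobian of F_i(x) = x_i * (r - x_i - xbar_i):
       dF_i/dx_i = r - 2 x_i - xbar_i,   dF_i/dx_j = -x_i * g_ij for j != i."""
    J = -np.outer(x, np.ones(len(x))) * G
    np.fill_diagonal(J, R - 2 * x - G @ x)
    return J

for x in [np.full(4, R / 3), np.array([R, 0.0, R, 0.0])]:
    eig = np.sort(np.linalg.eigvals(DF(x)).real)
    print(np.round(x, 3), "eigenvalues of DF:", np.round(eig, 3))

# A positive eigenvalue signals linear instability (cf. footnote 6): here the interior
# equilibrium has one positive eigenvalue, while all eigenvalues at (r, 0, r, 0) are negative.
```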

Figure 3: Two isolated NE and a point in OZ. (a) Isolated specialized Nash. (b) Isolated specialized Nash. (c) A stationary point in OZ, of the form (α, r − α, 0, 0).

3 General results using stochastic approximation theory

Our main focus is on describing the asymptotic behavior of our process (xn)n. Hence the object we are interested in is its limit set:

Definition 6 (Limit set of (xn)n) Given a realization of the random process, we denote by:

$$L((x_n)_n) := \{x \in X : \exists \text{ a subsequence } (x_{n_k})_k \text{ such that } x_{n_k} \to x\} \qquad (5)$$

Note that the limit set of the learning process is a random object, because the asymptotic behavior of the sequence (xn)n depends on the realization of the random sequence (n)n, drawn at every exploration stage.

Now let us say something about the relationship between the asymptotic behavior of our stochastic process and its deterministic counterpart. For illustration purposes, let us introduce the continuous-time piecewise-linear version of $(x_n)_n$, which we call $(X(t))_{t\geq 0}$.7 Standard results in stochastic approximation theory (see Benaïm (1996) or Benaïm (1999) for instance) tell us that, as periods unfold, the random process $(X(t))_{t\geq 0}$ gets arbitrarily close to the solution curve of the dynamical system: for any $T > 0$, $\lim_{t\to+\infty} \sup_{h\in[0,T]} \|X(t+h) - \varphi(X(t), h)\| = 0$. In other words, given a time horizon $T > 0$ - however large it might be - the process shadows the trajectory of some solution curve between times t and t + T with arbitrary accuracy, provided t is large enough. Therefore it is reasonable to expect that

$L((x_n)_n)$ is closely related to the ω-limit sets of $\dot{x} = F(x)$. As mentioned earlier, stationary points are of special interest, but they contain both Nash equilibria and other zeroes. Points that are in OZ are "unrealistic" outcomes from a game theory point of view, because they include agents who would like to deviate. The next proposition is particularly interesting in this regard.

Proposition 2 We have that

$$\mathbb{P}\left(L((x_n)_n) \subset OZ\right) = 0.$$

Although this proposition does not guarantee that the process converges to the set of Nash equilibria, it is good news that it rules out convergence to the set of other zeroes. Our second proposition concerns stability.

Proposition 3 Let A be an attractor of the ODE (2), with basin of attraction B(A). Then

$$\mathbb{P}\left(L((x_n)_n) \subset A\right) > 0 \quad \text{on the event } \{x_0 \in B(A)\}.$$

That is, the process will converge to attractors of the ODE (2) with positive probability. This is again good news, as it shows that agents with no sophistication whatsoever can eventually settle to stable situations (if these stable situations exist). If the stationary points are isolated, more precise statements can be made:

Theorem 1 Assume xˆ ∈ Z(F ) is isolated. Then:

a) If $\hat{x} \in OZ$, then $\mathbb{P}(\lim_n x_n = \hat{x}) = 0$.

b) If $\hat{x} \in NE$ is linearly stable, then $\mathbb{P}(\lim_n x_n = \hat{x}) > 0$.

7Let $\tau_n := \sum_{k=1}^n \frac{1}{k}$. We then define $(X(t))_{t\geq 0}$ by $X(t) = x_n + \frac{t - \tau_n}{\tau_{n+1} - \tau_n}(x_{n+1} - x_n)$ for $t \in [\tau_n, \tau_{n+1}[$.

The previous propositions tell us that when components of zeroes are not degenerate, the limit set of the process may include some points in OZ. For instance, the process could come arbitrarily close to a continuum of NE that is connected to a continuum of OZ and oscillate between the two. The main refinement of theorem 1 is that when stationary points in OZ are isolated, this cannot happen and we can guarantee that the process will not converge to them. At this stage of generality, it is difficult to say more than theorem 1. Yet, it is surprising that so much can already be said. Agents playing any (strictly concave) game, about which they know nothing, and following a simple updating process, will never end up in "undesirable" stationary points, and can end up playing a stable Nash equilibrium. Given how difficult it is to obtain convergence to Nash in discrete games, we believe that what we obtain here is striking. In order to better understand the behavior of unsophisticated agents, we now analyze specific, yet large, classes of games.

4 Ordinal Potential Games

Definition 7 A game G is said to be a potential game if there is a function $P : X \to \mathbb{R}$ such that for all $x_{-i} \in X_{-i}$, for all $x_i, x_i' \in X_i$, we have
$$u_i(x_i, x_{-i}) - u_i(x_i', x_{-i}) = P(x_i, x_{-i}) - P(x_i', x_{-i})$$
It is an ordinal potential game (OPG) if P is differentiable and we have8
$$\operatorname{sgn}\left(\frac{\partial u_i}{\partial x_i}(x)\right) = \operatorname{sgn}\left(\frac{\partial P}{\partial x_i}(x)\right) \qquad (6)$$
Heuristically, this means that variations in terms of payoffs for any agent are captured through a single function, the same for all players. In their paper, Monderer and Shapley (1996b) show that a continuous game with twice-differentiable payoffs is a potential game if and only if $\frac{\partial^2 u_i}{\partial x_i \partial x_j} = \frac{\partial^2 u_j}{\partial x_j \partial x_i}$ for all pairs $i, j \in N$, with $i \neq j$. We are not aware of any similar characterization for OPGs, but the class is larger, essentially because it allows for heterogeneity in interactions. Many games studied in the economic literature fall into this class: for instance, our running example with Cournot competition. In fact the following function can be shown to be an ordinal potential for the Cournot competition played on a network:
$$P(x) = \sum_i \left( r x_i - \frac{x_i^2}{2} - \frac{1}{2} \sum_{j : g_{ij}=1} x_i x_j \right) \qquad (7)$$

8The original definition of an OPG is $u_i(x_i, x_{-i}) - u_i(x_i', x_{-i}) > 0 \iff P(x_i, x_{-i}) - P(x_i', x_{-i}) > 0$, which is more restrictive when functions are differentiable.
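As a quick sanity check (not in the paper): with the potential (7), $\partial P/\partial x_i = r - x_i - \bar{x}_i$ while $\partial u_i/\partial x_i = \gamma(r - x_i - \bar{x}_i)$, so the two partial derivatives always share the same sign, which is condition (6). The sketch below verifies this numerically by finite differences on the assumed ring network.

```python
import numpy as np

rng = np.random.default_rng(2)
GAMMA, R = 1.0, 1.0
G = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)

def u(i, x):
    """Cournot profit u_i(x) = gamma * (r - x_i/2 - xbar_i) * x_i."""
    return GAMMA * (R - 0.5 * x[i] - G[i] @ x) * x[i]

def P(x):
    """Candidate ordinal potential (7): sum_i (r x_i - x_i^2/2 - (1/2) sum_{j~i} x_i x_j)."""
    return np.sum(R * x - 0.5 * x ** 2) - 0.5 * x @ G @ x

def partial(f, x, i, h=1e-6):
    e = np.zeros_like(x)
    e[i] = h
    return (f(x + e) - f(x - e)) / (2 * h)

ok = True
for _ in range(1000):                           # random profiles in the positive orthant
    x = rng.uniform(0.0, 2.0, size=4)
    for i in range(4):
        ok &= np.sign(partial(lambda y: u(i, y), x, i)) == np.sign(partial(P, x, i))
print("sign condition (6) holds on the sample:", bool(ok))
```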

Assume that G is an OPG with continuously differentiable potential P. Then P is a strict Lyapunov function for the dynamical system $\dot{x} = F(x)$ (see Lemma 4 in appendix). In terms of convergence of our random process, the main feature of ordinal potential games is the following.

Proposition 4 Assume that G is an OPG. If P is sufficiently regular, then

$$\mathbb{P}\left(L((x_n)_n) \subset Z(F)\right) = 1.$$

Theorem 2 If Nash equilibria are isolated, then

$$\mathbb{P}\left(\exists\, x^* \in NE : \lim_n x_n = x^*\right) = 1.$$
Theorem 2 directly follows from Proposition 4 and Proposition 2. These results are great improvements on previous results: for ordinal potential games, the only set to which our stochastic process can converge is the set of zeroes of F. Complex ω-limit sets of the dynamical system, which are non-zeroes, can be discarded. As a consequence, when Nash equilibria are isolated, convergence to a Nash equilibrium is guaranteed. In terms of stability too, we can go further than we did earlier: we can distinguish between stable and unstable sets, by relating attractors of the dynamics to the potential function P, and to the Best-Response Dynamics (BRD).

Definition 8 Let F be a smooth map and let Λ be a component of Z(F). We say that Λ is a local maximum of P if

(i) P is constant on Λ: P (x) = v, ∀x ∈ Λ;

(ii) there exists an open neighborhood U of Λ such that P (y) ≤ v ∀y ∈ U

Obviously, attractors are included in Z(F ) by Proposition 4. We can actually go even further:

Lemma 2 If a set Λ is an attractor for x˙ = F (x), then Λ is a union of components of Z(F ).

This allows us to state the following:

Theorem 3 Assume G is an OPG and let Λ be a component of Z(F ). Then the following statements are equivalent

(i)Λ is an attractor for x˙ = F (x)

(ii)Λ is a local maximum of P

(iii) Λ ⊂ NE and Λ is an attractor for the best-response dynamics $\dot{x} = -x + BR(x)$.

In terms of stability, this result is again very informative and positive. First, it tells us that attractors are necessarily included in the set of Nash equilibria. Thus, although the process might converge to other zeroes when stationary points are non-degenerate components, by virtue of theorem 2, these points are unstable. If we focus only on stable outcomes, then this result guarantees that unsophisticated agents will always end up in a Nash equilibrium. Second, theorem 3 provides two methods to find the attractors: one way is to look for local maxima of the potential function, and the other is to look for attractors for another dynamics, possibly simpler to analyze, the BRD. Note that this second method establishes a relation between two dynamics that are conceptually unrelated. Indeed, the BRD assumes that agents are very sophisticated, as they know precisely their payoff function, they observe their opponents' play and they perform potentially complex computations. Solution curves may be very different, but both dynamics share the same set of attractors.

We illustrate theorem 3 with our Cournot example, on the network of figure 2. For instance, we wish to check whether $\Lambda_1 = \{(0, \alpha, 0, r - \alpha) : \alpha \in [0, r]\}$, which is a connected set of Nash equilibria, is an attractor. Note first that $\Lambda_1$ is not a component: it is contained in the component Λ (see (3) above). However, $\Lambda \not\subset NE$, and point (iii) tells us that it is not an attractor. The theorem gives us another method: checking that the corresponding component (3) is not a local maximum of P. The potential (7) is constant and equal to $r^2/2$ along $\Lambda_1$, of which the profile $(0, 0, 0, r)$ is a member. However, for small enough $\varepsilon > 0$, we have $P(\varepsilon, 0, \varepsilon, r - 2\varepsilon) = \frac{r^2}{2} + \varepsilon^2$, with $(\varepsilon, 0, \varepsilon, r - 2\varepsilon) \notin \Lambda$.9

5 Games with strategic complements

The previous class contains some games with strategic complements (i.e. games with payoff functions such that $\frac{\partial^2 u_i}{\partial x_i \partial x_j} \geq 0$ for all $i \neq j$), but obviously not all of them. Under hypothesis 1, it can be verified that the best-response functions are differentiable and strictly increasing. In that case, Vives (1990) proves in Theorem 5.1 and Remark 5.2 that, except for a specific set of initial conditions, the BRD, whether in discrete or in continuous time, monotonically converges to an equilibrium point. This could be useful when analyzing the deterministic system $\dot{x} = F(x)$ underlying our stochastic process which, as we have seen, shares some properties with the BRD. Unfortunately, in our case the set of problematic initial conditions cannot be discarded, in particular because the process is stochastic. In fact, it could be that

9In a companion paper, Bervoets and Faure (2016) characterize the attractors for the BRD in a class of games that include our Cournot example. It can be verified that the necessary and sufficient conditions for being an attractor for the BRD are not met by Λ1 and hence, Λ1 is not an attractor for our learning process.

the stochastic process very often passes through these points, in which case the above result is ineffective. Actually, the dynamical system, once it passes through one of these points, is known to be able to converge to very complicated sets.10 Hence, we cannot rely on any of these existing results. In this section, we impose two mild conditions on the game in addition to hypothesis 1:

Hypothesis 2 For any agent i, $\frac{\partial u_i}{\partial x_i}(0, 0) > 0$.

Hypothesis 2 simply guarantees that the boundary is repulsive and thus contains no Nash equilibrium. However, the boundary still contains the other zeroes and we need to be sure that the process $(x_n)_n$ does not get there. We show that this is the case.

Lemma 3 Under Hypothesis 2, there exists $a > 0$ such that $L((x_n)_n) \subset [a, +\infty[^N$ almost surely.

This lemma guarantees that all the action takes place in the interior of the state space, where we impose our second condition:

Hypothesis 3 The matrix DF (x) is irreducible for any x in the interior of X

This hypothesis amounts to saying that there is no point in the interior where the complementarities between all agents of a subgroup and all agents of another subgroup are equal to zero. Furthermore, an immediate consequence of the fact that the game exhibits strategic complements is that the system $\dot{x} = F(x)$ is cooperative, that is, all non-diagonal entries of DF(x) are nonnegative. Together with Hypothesis 3, this implies that the dynamical system is cooperative and irreducible in Int(X), which guarantees useful features (see for instance Benaïm (2000)).11 The result we are about to state is very general, but it comes at the price of an additional assumption on the random noise $(U_n)_n$:

Definition 9 Let $Q(x) := \mathbb{E}(U_{n+1} U_{n+1}^T \mid x_n = x)$. We say that the noise is uniformly exciting if Q(x) is positive definite for any $x \in Int(X) = (\mathbb{R}_+^*)^N$.

10See Hirsch (1999). 11Among others, the associated flow has positive derivatives: $\frac{\partial \varphi(x,t)}{\partial x} \gg 0$ for $t > 0$. This implies that it is strongly monotonic: $\varphi(x, t) \gg \varphi(y, t)$ for all $x > y$ and $t > 0$.

Let us recall that a matrix is positive definite if all its eigenvalues are strictly positive. This condition then implies that the noise can push the process in every direction. Let $Z^U(F)$ denote the set of linearly unstable zeroes and $Z^S(F) = Z(F) \setminus Z^U(F)$. Note that $Z^S(F) \subset NE$, thus we will call these points stable equilibria. Note also that $Z^S(F)$ may include equilibria that are not linearly stable. We have the following:

Theorem 4 Consider a game with strategic complements and smooth payoff functions that exhibit symmetric interactions. Assume that Hypotheses 2 and 3 hold, and that the noise is uniformly exciting. Then our learning process $(x_n)$ almost surely converges to a stable Nash equilibrium:
$$\mathbb{P}\left(\exists\, x^* \in Z^S(F) : \lim_n x_n = x^*\right) = 1$$
on the event $\{\limsup_n \|x_n\| < +\infty\}$.

This result is very tight. In the preceding sections, we obtained separate results on convergence and on stability. Here we have one result tackling both at the same time. Furthermore, hypotheses 2 and 3 are verified for most common economic models. As a consequence, it seems that in a very general class of continuous games, learning leads to Nash and additionally, serves as a selection device between stable and unstable equilibria. Our result is general but there is a downside: unfortunately, a condition on the noise needs to be imposed: if the noise is zero, for instance, the process would remain trapped in unstable equilibria. Showing that the noise is uniformly exciting is generally not easy. Nevertheless, in the next section we give a very simple sufficient condition guaranteeing that the noise is uniformly exciting in network games.

Remark 3 If we are to relax the assumption on the noise, then we still get a positive result that we prove in the appendix: consider a game that exhibits strategic complements and symmetric interactions and let $\hat{x}$ be a linearly unstable interior zero of F. Then
$$\mathbb{P}\left(\lim_n x_n = \hat{x}\right) = 0.$$
This means that, without the noise condition, we cannot guarantee that the process will not converge to general unstable sets.12 However, we can still exclude convergence to linearly unstable equilibria.

12Linearly unstable equilibria are unstable sets, but unstable sets also include much more complex structures.

6 Network Games

Many of our results can be made more precise within the framework of network games. Additionally, some conditions previously imposed become very simple conditions in terms of the topology of the network.

Definition 10 A network is said to be bipartite if the set N of players can be partitioned into N1 and N2 such that for any pair of players i and j we have

gij = 1 =⇒ (i ∈ N1 and j ∈ N2) or (i ∈ N2 and j ∈ N1)
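Bipartiteness is straightforward to test by attempting a two-coloring of the graph. A small sketch (not in the paper), applied to the square of Figure 3 (bipartite) and to a graph containing a triangle (non-bipartite):

```python
from collections import deque

def is_bipartite(adj):
    """BFS two-coloring: returns True iff the undirected graph (adjacency lists) is bipartite."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if j not in color:
                    color[j] = 1 - color[i]
                    queue.append(j)
                elif color[j] == color[i]:
                    return False           # an odd cycle was found: not bipartite
    return True

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}           # the square: bipartite
with_triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}    # contains a triangle: not bipartite
print(is_bipartite(square), is_bipartite(with_triangle))         # True False
```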

Here we state a theorem applicable to network games that mirrors theorems 1, 2 and 4.

Theorem 5 Assume the game is a network game with symmetric interactions, played on a non-bipartite graph. Then

• Assume $\hat{x} \in Z(F)$ is isolated. If $\hat{x} \in NE$ is interior and linearly unstable, then
$$\mathbb{P}\left(\lim_n x_n = \hat{x}\right) = 0$$

• Assume the game is an OPG where NE consists of interior isolated points. Then
$$\mathbb{P}\left(\exists\, x^* \in Z^S(F) : \lim_n x_n = x^*\right) = 1.$$

• Assume the game has strategic complements and hypotheses 2 and 3 hold. Then
$$\mathbb{P}\left(\exists\, x^* \in Z^S(F) : \lim_n x_n = x^*\right) = 1$$
on the event $\{\limsup_n \|x_n\| < +\infty\}$.

As we can see, the simple non-bipartiteness condition of the network helps strengthen all the preceding results.

The first result complements theorem 1 and rules out convergence to unstable equilibria that are interior. Again, this result is very general because the symmetric interactions assumption is satisfied by most of the games commonly analyzed in the social network literature. The only case with isolated stationary points that is not covered is convergence to unstable points that are not interior. However, we conjecture that this cannot happen. Although the proof is detailed in the appendix, this is the intuition behind it. Assume that at some period, the process goes near a stable equilibrium of the game. Then the deterministic dynamics (2) underlying the stochastic process (1) will drive it towards that stable equilibrium. When getting close to the equilibrium the deterministic drift vanishes. The

stochastic noise might then push the process away. However, as the process moves away, the deterministic drift becomes once again the main driving force. The process will thus go back to the stable equilibrium. Now, assume that, at some stage, the process is close to an unstable equilibrium. Because this equilibrium is unstable, there is a direction along which the deterministic dynamics is driven away from the equilibrium. If the random process follows this direction, it will go away and escape. The condition on non-bipartiteness guarantees that the noise can push the system in any direction, including the unstable one. Although non-bipartiteness and symmetric interactions are only sufficient conditions, we illustrate through two examples in the appendix why it is difficult to drop either of the two conditions (see examples 2 and 3).

The second result strengthens theorem 2 for OPGs, and follows as a direct consequence of theorem 2 and the first point of theorem 5. Without the network condition we were able to guarantee that the process converges to a Nash equilibrium; now we can also prove that this equilibrium is stable.

The third result is just a simpler statement of theorem 4, because the non-bipartiteness condition guarantees that the noise is uniformly exciting. Note that, here again, non-bipartiteness is only a sufficient condition. Actually, a non-convergence result could hardly rely on a necessary and sufficient condition. However this condition is not unreasonably strong since, given a bipartite graph, payoff functions can be found such that the noise is not uniformly exciting. In the appendix, after the proof of theorem 4, we illustrate this point with a specific game.

Remark 4 Most of the games considered in the flourishing network literature are ordinal potential games. This literature has largely focused on games where the payoff of an agent depends on his own action and on the (possibly discounted) sum of the actions of his neighbors. We call these games local aggregate games; they include the game of linear complementarities (Ballester et al. (2006)), the public good game (Bramoullé and Kranton (2007) and Bramoullé et al. (2014)), the public good with private provision game (Allouch (2015)) and games including the Cournot competition presented in example 1 (Bimpikis et al. (2014), König et al. (2014)). In all of these games, Nash equilibria have been studied and depend on the structure of the network, inducing another dimension of complexity for unsophisticated agents.13 They exhibit very different equilibrium structures. For instance, the game in Ballester-Calvó-Zenou (2006) has strategic complements and either a unique isolated

13For instance, in the game introduced by Ballester, Calvó-Armengol and Zenou, the Nash equilibrium is related to the Bonacich centrality of agents, which is very difficult to compute, being an infinite sum of discounted paths of the network.

equilibrium that is interior or no equilibrium, while the public good game in Bramoullé and Kranton (2007) has strategic substitutes and, in many networks, multiple components of equilibria where each component is itself a continuum. However, a direct adaptation of Dubey et al. (2006) shows that they all share the same ordinal potential. This potential is given by
$$P(x) = \sum_{i \in N} \left( H_i(x_i) - \frac{1}{2} \sum_{j \neq i} \delta_{ij} x_i x_j \right)$$
with
$$H_i(x_i) = \int_0^{x_0} \min\{b_i(x), x_i\}\, dx$$
where $b_i(x)$ denotes the best response of player i to x, and $x_0$ is $b^{-1}(0)$ for a game with substitutes and $+\infty$ for a game with complements. For instance, the Cournot competition model of example 1 is a local aggregate game, with $\delta_{ij} = \delta_{ji} = 1$ if $g_{ij} = 1$, and $\delta_{ij} = \delta_{ji} = 0$ otherwise.

7 Conclusion

In this paper we addressed a simple question. Given how difficult it is to get convergence to Nash equilibria in standard games with discrete strategy spaces, is there any chance of getting convergence in standard economic games where actions are continuous? Allowing for a very wide range of games, including games played on networks where interactions are asymmetric, we consider the least sophisticated possible agents: they do not know anything. They only observe their payoff at the end of each round and make their next choice according to how their last payoff varied.

With this simple framework, we find many positive results showing that the Nash equilibrium concept is very appropriate in the context of continuous games. Nash equilibria are candidates for convergence in any concave game, and they will often be the only possible converging points in OPGs and in games with complementarities. Furthermore, in OPGs, we find that stable Nash equilibria are likely to emerge, and that in games with strategic complements, it is only stable Nash equilibria that will emerge.

This paper constitutes a first positive step in the analysis of learning in continuous games. In future research, it would be interesting to study learning processes where individuals have more information, for instance observing the actions of their neighbors, and/or some limited sophistication. We conjecture that convergence to Nash would occur even more frequently.

Appendix

7.1 Proof of results of Section 2

Proof of Lemma 1. We have, for any i ∈ N,

$$e_{2n+2}^i - e_{2n}^i = e_{2n}^i \, \epsilon_n^i \, \Delta u_{n+1}^i$$
A first-order development gives
$$\epsilon_n^i \Delta u_{n+1}^i = \epsilon_n^i \left( u_i\!\left(e_{2n}^i + \frac{1}{n+1}\epsilon_n^i,\; e_{2n}^{-i} + \frac{1}{n+1}\epsilon_n^{-i}\right) - u_i(e_{2n}^i, e_{2n}^{-i}) \right) = \frac{1}{n+1}(\epsilon_n^i)^2 \frac{\partial u_i}{\partial x_i}(e_{2n}) + \frac{1}{n+1}\sum_{j\neq i} \epsilon_n^i \epsilon_n^j \frac{\partial u_i}{\partial x_j}(e_{2n}) + O\!\left(\frac{1}{n^2}\right)$$
Because $(\epsilon_n^i)^2 = 1$ and $x_n = e_{2n}$, we have
$$x_{n+1}^i - x_n^i = \frac{1}{n+1}\, x_n^i \frac{\partial u_i}{\partial x_i}(x_n) + \frac{1}{n+1}\, \epsilon_n^i x_n^i \sum_{j\neq i} \epsilon_n^j \frac{\partial u_i}{\partial x_j}(x_n) + O\!\left(\frac{1}{n^2}\right)$$
By setting $U_{n+1}^i = \epsilon_n^i x_n^i \sum_{j\neq i} \epsilon_n^j \frac{\partial u_i}{\partial x_j}(x_n)$, we get equation (1). Finally, note that $\mathbb{E}(\epsilon_n^j) = 0$ for all j, and that $\epsilon_n^i$ and $\epsilon_n^j$ are independent, so that

$$\mathbb{E}(U_{n+1} \mid \mathcal{F}_n) = 0. \qquad \square$$

Proof of Remark 1. We prove the following statement: for $n \geq 0$, we have $x_n^i > \frac{1}{n+1}$ and $e_{2n+1}^i > 0$, provided $x_0^i > 1$.

Notice that hypothesis 1 implies that $Du_i$ is bounded everywhere. For simplicity and without loss of generality, we will assume that $|u_i(x) - u_i(x')| < \|x - x'\|_\infty$. This is just meant to guarantee that our learning process is well behaved, although it can easily be adapted when it does not hold. Let $n \geq 0$. By assumption, $|u_i(x) - u_i(x')| \leq \|x - x'\|_\infty$ for all i. Thus,
$$\frac{x_{n+1}^i}{x_n^i} \geq 1 - \|e_{2n+1} - x_n\|_\infty$$
and $|e_{2n+1}^i - x_n^i| \leq \frac{1}{n+2}$ for all i. As a consequence
$$\frac{x_{n+1}^i}{x_n^i} \geq 1 - \frac{1}{n+2}$$
and
$$x_n^i \geq x_0^i \prod_{k=0}^{n-1}\left(1 - \frac{1}{k+2}\right) = x_0^i\, \frac{1}{n+1} > \frac{1}{n+1}.$$
Moreover $e_{2n+1}^i \geq x_n^i - \frac{1}{n+2} > 0$. $\square$

7.2 Proof of results of Section 3

Proof of Proposition 2. Observe first that we can work on the event $\left\{\sup_n \|x_n\| < +\infty\right\}$ since, otherwise, there is nothing to prove. For $\eta > 0$, let $K_\eta \subset Z(F) \setminus NE$ be given by
$$K_\eta = \left\{ x \in X : x_j \frac{\partial u_j}{\partial x_j}(x) = 0 \ \forall j \in \{1,\ldots,N\} \ \text{ and } \ \exists i \text{ with } x_i = 0 \text{ and } \frac{\partial u_i}{\partial x_i}(x) \geq 2\eta \right\},$$

Clearly, $(K_\eta)_{\eta>0}$ is a collection of compact sets that is increasing as η decreases to zero, and

$Z(F) \setminus NE = \cup_{\eta > 0} K_\eta$. Thus the family of events $E_\eta = \{L((x_n)_n) \subset K_\eta\}$ is increasing, as η decreases to zero, and $\cup_{\eta>0} E_\eta = \{L((x_n)_n) \subset Z(F) \setminus NE\}$, since $L((x_n)_n)$ is compact. Assume by contradiction that $\mathbb{P}(L((x_n)_n) \subset Z(F) \setminus NE) > 0$. Then necessarily there exists $\eta^* > 0$ such that $\mathbb{P}(E_{\eta^*}) > 0$. On this event, there exists $n^* \in \mathbb{N}$ (random) such that $\frac{\partial u_i}{\partial x_i}(x_n) \geq \eta^*$, $\forall n \geq n^*$.

Let $\tilde{U}_{n+1}^i = \epsilon_n^i \sum_{j\neq i} \epsilon_n^j \frac{\partial u_i}{\partial x_j}(x_n)$, so that $x_{n+1}^i = x_n^i\left(1 + \frac{1}{n+1}\left(\frac{\partial u_i}{\partial x_i}(x_n) + \tilde{U}_{n+1}^i + \frac{\xi_{n+1}^i}{x_n^i}\right)\right)$. Define the random variable $y_n = \prod_{i=1}^N x_n^i \geq 0$. We then have that
$$y_{n+1} = \prod_{i=1}^N x_n^i \left(1 + \frac{1}{n+1}\left(\frac{\partial u_i}{\partial x_i}(x_n) + \tilde{U}_{n+1}^i + \frac{\xi_{n+1}^i}{x_n^i}\right)\right) = y_n \left(1 + \sum_{i=1}^N \frac{1}{n+1}\left(\frac{\partial u_i}{\partial x_i}(x_n) + \tilde{U}_{n+1}^i + \frac{\xi_{n+1}^i}{x_n^i}\right)\right)$$
As a consequence, since $\xi_n^i = O(1/n^2)$ and $x_n^i \geq 1/(n+1)$, we have
$$\frac{1}{y_{n+1}} = \frac{1}{y_n}\left(1 - \sum_{i=1}^N \frac{1}{n+1}\left(\frac{\partial u_i}{\partial x_i}(x_n) + \tilde{U}_{n+1}^i\right) + o\!\left(\frac{1}{n}\right)\right).$$
For $n \geq n^*$, we have $\sum_{i=1}^N \frac{1}{n+1}\frac{\partial u_i}{\partial x_i}(x_n) \geq \frac{1}{n+1}\eta^*$, which implies that
$$\frac{1}{y_{n+1}} \leq \frac{1}{y_n} - \frac{1}{y_n}\,\frac{\eta^*}{2}\,\frac{1}{n+1} - \frac{1}{y_n}\sum_{i=1}^N \tilde{U}_{n+1}^i.$$
As a consequence
$$\mathbb{E}\left(\frac{1}{y_{n+1}} \,\Big|\, \mathcal{F}_n\right) \leq \frac{1}{y_n} - \frac{1}{y_n}\,\frac{\eta^*}{2}\,\frac{1}{n+1},$$
that is, the random sequence $(1/y_n)_n$ is a positive supermartingale. It then converges almost surely to some random variable Y. However, on the event $\{L((x_n)_n) \subset Z(F) \setminus NE\}$, we

have almost surely $y_n \to_n 0$. These two convergence properties are in contradiction. Thus $\mathbb{P}(L((x_n)_n) \subset Z(F) \setminus NE) = 0$. $\square$

Proof of Proposition 3. This result can be found in Benaïm (1999).

Proof of Theorem 1. Both points are direct consequences of Propositions 2 and 3.

7.3 Proof of results of Section 4

Before proving Proposition 4 and Theorem 2, let us define the following dynamical concept:

Definition 11 Let P : X → R be continuously differentiable. We say that P is a strict14 Lyapunov function for x˙ = F (x) if

• for x ∈ Z(F ) the map t 7→ P (ϕ(x, t)) is constant;

• for x∈ / Z(F ) the map t 7→ P (ϕ(x, t)) is strictly increasing.

Lemma 4 Assume that G is an OPG with continuously differentiable potential P . Then

(i) P is a Lyapunov function for x˙ = −x + BR(x), with respect to NE.

(ii) P is a Lyapunov function for $\dot{x} = F(x)$ (where $F_i(x) = x_i \frac{\partial u_i}{\partial x_i}(x)$) with respect to Z(F).

Proof. By assumption,
$$\forall x, \forall i, \quad \frac{\partial u_i}{\partial x_i}(x) > 0 \Rightarrow \frac{\partial P}{\partial x_i}(x) > 0 \quad \text{and} \quad \frac{\partial u_i}{\partial x_i}(x) < 0 \Rightarrow \frac{\partial P}{\partial x_i}(x) < 0.$$
(i) We have
$$\langle DP(x), -x + BR(x) \rangle = \sum_i \frac{\partial P}{\partial x_i}(x)\left(-x_i + BR_i(x)\right).$$
We need to check that, if $x \notin NE$, then this quantity is positive. Let i be such that $x_i \neq BR_i(x)$, say $x_i < BR_i(x_{-i})$. Then by strict concavity of $u_i$ we have $\frac{\partial u_i}{\partial x_i}(x) > 0$. Thus $\frac{\partial P}{\partial x_i}(x) > 0$ and $\langle DP(x), -x + BR(x) \rangle > 0$.
(ii) We have
$$\langle DP(x), F(x) \rangle = \sum_i x_i \frac{\partial u_i}{\partial x_i}(x)\frac{\partial P}{\partial x_i}(x).$$

14Generally, P is a Lyapunov function for $\dot{x} = F(x)$ with respect to Λ if $t \mapsto P(\varphi(x, t))$ is constant on Λ and strictly increasing for $x \notin \Lambda$; when the component Λ coincides with the set of stationary points of the flow, then we say that P is strict.

We need to check that, if $x \notin Z(F)$, then this quantity is positive. Let i be such that

∂ui Fi(x) 6= 0. Then xi > 0 and (x) 6= 0, which implies that ∂xi

∂ui ∂P xi (x) (x) > 0. ∂xi ∂xi and the proof is complete. 
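To illustrate point (ii) numerically, here is a small sketch for a hypothetical two-player game with identical interests, so that the payoff function itself serves as an exact potential; the particular $P$ below and the grid are illustrative assumptions only.

```python
import numpy as np

# Hypothetical exact potential (both players' payoff equals P), strictly
# concave in each player's own action on X = [0, 2]^2.
def P(x1, x2):
    return x1 + x2 - 0.5 * (x1**2 + x2**2) - 0.25 * x1 * x2

def dP(x1, x2):
    # Gradient of P; here du_i/dx_i = dP/dx_i since u_i = P.
    return np.array([1 - x1 - 0.25 * x2, 1 - x2 - 0.25 * x1])

def F(x):
    # Mean dynamics F_i(x) = x_i * du_i/dx_i(x).
    return x * dP(*x)

# Check <DP(x), F(x)> = sum_i x_i (dP/dx_i)^2 >= 0 on a grid,
# with equality only where F vanishes.
grid = np.linspace(0.0, 2.0, 41)
for x1 in grid:
    for x2 in grid:
        x = np.array([x1, x2])
        val = dP(*x) @ F(x)
        assert val >= -1e-12
        if val < 1e-12:
            assert np.allclose(F(x), 0.0, atol=1e-5)
print("<DP, F> is nonnegative on the grid and vanishes only on Z(F).")
```

The same inequality, $\langle DP(x), F(x)\rangle = \sum_i x_i\, \frac{\partial u_i}{\partial x_i}(x)\, \frac{\partial P}{\partial x_i}(x) \geq 0$, is what the lemma establishes for general ordinal potential games.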

Proof of Proposition 4. This is a consequence of Proposition 6.4 in Benaïm (1999). To be able to apply this result, we need to show that $P(Z(F))$ has an empty interior:

Lemma 5 Assume G is an OPG. Then, if P is sufficiently regular, P (Z(F )) has an empty interior.

Proof of Lemma 5. The sketch of this proof is as follows: we decompose the set of zeroes of F as a finite union of sets on which we can use Sard’s Theorem.

Let $A$ be any subset of agents and $Z_A$ be the set
$$\Big\{x \in Z(F) : x_i = 0 \ \forall\, i \notin A, \ \frac{\partial u_i}{\partial x_i}(x) = 0 \ \forall\, i \in A\Big\}.$$

It is not hard to see that $Z_A$ is closed. Moreover $Z(F) = \cup_{A \in \mathcal{P}(\{1,\ldots,N\})} Z_A$. We now prove that $P$ is constant on $Z_A$. Let $P^A : [0,1]^A \to \mathbb{R}$ be defined as

$$P^A(z) := P(z, 0).$$

For $x \in Z_A$, denote by $x^A = (x_i)_{i \in A}$. We then have $P^A(x^A) = P(x)$. Moreover, for $i \in A$,

$$\frac{\partial P^A}{\partial x_i} = 0$$
by definition of $Z_A$ and the additional assumption we made on $P$. Hence

$$\{x^A : x \in Z_A\} \subset \{z \in [0,1]^A : \nabla_z P^A = 0\}.$$

Now $P$ is sufficiently differentiable, so is $P^A$, and by Sard's Theorem, $P^A(\{x^A : x \in Z_A\})$ has empty interior in $\mathbb{R}$. As an immediate consequence, $P^A$ is constant on $\{x^A : x \in Z_A\}$, which directly implies that $P(Z_A)$ has empty interior. Since $Z(F)$ is a finite union of such sets, $P(Z(F))$ has empty interior. □

Proof of Theorem 2. It follows directly from Proposition 6.6 in Benaïm (1999) and Theorem 2. □

Proof of Theorem 3. First we prove that (i) implies (ii). Since $G$ is an OPG, $P(Z(F))$ has empty interior (see Lemma 5 above). Moreover, by virtue of Theorem 2, we have $\Lambda \subset Z(F)$. Then $P$ is constant on $\Lambda$. Let $v := P(\Lambda)$. If $\Lambda$ is not a local maximum of $P$, then there exists a sequence $x_n$ such that $d(x_n, \Lambda) \to_n 0$ and $P(x_n) > v$. Since $\Lambda$ is isolated, we have $x_n \in X \setminus Z(F)$ and $P(\varphi(x_n, t)) > P(x_n) > v$ for any $t > 0$; hence $d(\varphi(x_n, t), \Lambda) \nrightarrow 0$ and $\Lambda$ is not an attractor.
Let us now prove that (ii) implies (iii). First we show that $\Lambda$ is contained in $NE$. Suppose that there exists $\hat x \in \Lambda \setminus NE$. Without loss of generality, we suppose that

$$\hat x_1 = 0, \qquad \frac{\partial u_1}{\partial x_1}(\hat x) > 0.$$

Since $\frac{\partial u_1}{\partial x_1}(\hat x) > 0$, we also have $\frac{\partial P}{\partial x_1}(\hat x) > 0$, by definition of an OPG. As a consequence, $\hat x$ is not a local maximum of $P$, which contradicts (ii).
We now prove that $\Lambda$ is an attractor for the best-response dynamics. $P$ is a strict Lyapunov function for the best-response dynamics$^{15}$ and $\Lambda \subset NE$. The statement we want to prove is then a consequence of Proposition 3.25 in Benaïm et al. (2005). We adapt the proof to our context for convenience. First of all, observe that $\Lambda$ is actually a strict local maximum of $P$: there exists an open (isolating) neighborhood $U$ of $\Lambda$ such that $P(x) < v = P(\Lambda)$, $\forall\, x \in U \setminus \Lambda$. This is a simple consequence of the fact that $P$ is strictly increasing along any solution curve with initial condition in $U \setminus \Lambda$. Now let $V_r := \{x \in U : P(x) > v - r\}$. Clearly $\cap_r V_r = \Lambda$. Also $\varphi(V_r, t) \subset V_r$, for $t > 0$ and $r$ small enough$^{16}$. This implies that $\Lambda = \cap_{r>0} V_r$ contains an attractor $A$. The potential being constant on $\Lambda$, $A$ cannot be strictly contained in $\Lambda$ and therefore $\Lambda$ is an attractor.

Now clearly (iii) implies (i): $\Lambda = \omega_{Br}(U)$ for some open neighborhood $U$ of $\Lambda$. Since $U \cap Z(F) \subset NE$, $\omega_F(U) = \omega_{Br}(U)$ and the proof is complete. □

7.4 Proof of results of Section 5

Proof of Lemma 3. The proof is in the same spirit as the proof of Proposition 2. Under Assumption 2, for any $i$, there exist $\bar x_i > 0$ and $\alpha_i > 0$ such that

$$\frac{\partial u_i}{\partial x_i}(x_i, 0) > \alpha_i > 0, \quad \forall\, x_i < \bar x_i.$$
Since the game has strategic complements,

$$\frac{\partial u_i}{\partial x_i}(x_i, x_{-i}) > \alpha_i > 0, \quad \forall\, x_i < \bar x_i, \ \forall\, x_{-i} \in X_{-i}.$$
$^{15}$Keep in mind that this means that it is a Lyapunov function with respect to $NE$.
$^{16}$We need to make sure that $r$ is small enough so that $V_r = P^{-1}([v - r, v]) \subset U$.

As a consequence, any solution trajectory with initial condition in the set $\{x \in X : x_i \in\, ]0, \bar x_i[\,\}$ is in the set $\{x \in X : x_i > \bar x_i\}$ after some time $t > 0$. Let $a = \max_i \bar x_i$. Then either $L((x_n)_n) \subset [a, \infty[^N$ or $L((x_n)_n) \subset \partial X$. Let $\eta := \min_{i=1,\ldots,N} \frac{\partial u_i}{\partial x_i}(0, 0) > 0$. Clearly

$$\partial X \subset \Big\{x \in X : \exists\, i \ \text{with} \ x_i = 0 \ \text{and} \ \frac{\partial u_i}{\partial x_i}(x) \geq \eta\Big\}.$$

The end of the proof follows the same argument as the proof of Proposition 2. 

As mentioned earlier, by a well-known result of Benaïm (1999), on the event $\{\limsup_n \|x_n\| <$

$+\infty\}$, the limit set of $(x_n)$ is always compact, invariant and attractor-free. This class of sets is called internally chain transitive (ICT). Zeroes of $F$, periodic orbits and, more generally, the omega limit set of any point $x$ (if not empty) are ICT sets. However, some more complicated sets are also ICT: for instance, any connected set of zeroes of $F$. In order to be able to claim that $(x_n)_n$ converges to a Nash equilibrium, we need additional information on the game. The good news is that, because of the nice properties of our dynamics ($\dot x = F(x)$ is cooperative and irreducible), we know more about the internally chain transitive sets, and a restatement of one of the main results of Benaïm and Faure (2012), adapted to our setting, is the following:

Theorem 6 (Benaïm and Faure, 2012) Let $(x_n)_n \in X$ be a random process that can be written as
$$x_{n+1} = x_n + \frac{1}{n+1}\big(F(x_n) + U_{n+1} + \xi_{n+1}\big) \qquad (8)$$
where

(i) F : X → X is a smooth map, that is cooperative and irreducible in Int(X);

(ii) Un+1 is a bounded martingale difference and is uniformly exciting;

(iii) ξn = O(1/n);

(iv) there exists $a > 0$ such that $L((x_n)_n) \subset [a, +\infty[^N$ almost surely.

Then
$$P\Big(\exists\, x^* \in Z^S(F) : \lim_n x_n = x^*\Big) = 1$$
on the event $\{\limsup_n \|x_n\| < +\infty\}$.
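To fix ideas, here is a minimal simulation sketch of the recursion (8); the two-player game with strategic complements, the form of the noise term and all numerical values below are illustrative assumptions rather than part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-player game with strategic complements:
# u_i(x) = x_i - x_i^2/2 + delta * x_i * x_j, 0 < delta < 1,
# whose interior Nash equilibrium is x_i = 1/(1 - delta).
delta = 0.4

def F(x):
    # Mean dynamics F_i(x) = x_i * du_i/dx_i(x).
    return x * (1.0 - x + delta * x[::-1])

x = np.array([0.2, 1.5])                    # arbitrary initial actions
for n in range(1, 200_000):
    eps = rng.choice([-1.0, 1.0], size=2)   # exploration directions
    # Illustrative bounded martingale-difference noise (stands in for U_{n+1}).
    U = delta * x**2 * eps[0] * eps[1]
    x = x + (F(x) + U) / (n + 1)            # recursion (8), with xi_{n+1} = 0

print("iterate:", x, "  interior Nash:", 1.0 / (1.0 - delta))
```

Under these illustrative assumptions the iterate settles near the interior zero of $F$, consistent with the conclusion of Theorem 6.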

Proof of Theorem 4. We apply the previous theorem. Points (i), (iii) and (iv) follow from Lemmas 1 and 3, while point (ii) is a consequence of the assumptions. □

Proof of Remark 3. We assume without loss of generality that the unstable space at $\hat x$ is one-dimensional and generated by the unitary vector $v$, corresponding to the positive eigenvalue $\mu$, and that the interaction graph is connected and bipartite (the other case is covered by Theorem 1). Then there exists a partition $(A, B)$ of $N$ such that if $a \in A$ and $\frac{\partial u_a}{\partial x_b}(\hat x)\frac{\partial u_b}{\partial x_a}(\hat x) > 0$, then $b \in B$. Using the computations developed in the proof of Theorem 1, we need to show that
$$\sum_{a \in A,\, b \in B}\Big(v_a\, \hat x_a\, \frac{\partial u_a}{\partial x_b}(\hat x) + v_b\, \hat x_b\, \frac{\partial u_b}{\partial x_a}(\hat x)\Big)^2 \neq 0.$$
Assume, by contradiction, that this quantity is zero. Then every term of the sum vanishes; since the graph is connected and $v \neq 0$, the components of $v$ must alternate in sign across the partition, so that, up to relabeling, $v_a > 0$ $\forall\, a \in A$ and $v_b < 0$ $\forall\, b \in B$. By a simple rearrangement of the indexes, the Jacobian matrix at $\hat x$ can be written as follows:
$$DF(\hat x) = \begin{pmatrix} D_A & M \\ N & D_B \end{pmatrix},$$

where $D_A$ is diagonal with diagonal terms equal to $\hat x_a\, \partial^2 u_a/\partial (x_a)^2 \leq 0$, $a \in A$, and similarly for $D_B$; $M$ and $N$ are non-negative matrices, as $x_i\, \partial^2 u_i/\partial x_i \partial x_j \geq 0$ for all $i \neq j$. Because $\mu$ is strictly positive and $v$ is the corresponding normalized eigenvector, we should have $\langle DF(\hat x)\, v, v\rangle = \mu \sum_i v_i^2 > 0$, since $v \neq 0$. However,

$$\langle DF(\hat x)\, v, v\rangle = \sum_i v_i^2\, \hat x_i\, \frac{\partial^2 u_i}{\partial (x_i)^2}(\hat x) + \sum_{a \in A,\, b \in B} v_a v_b \Big(\hat x_a \frac{\partial^2 u_a}{\partial x_a \partial x_b}(\hat x) + \hat x_b \frac{\partial^2 u_b}{\partial x_a \partial x_b}(\hat x)\Big) \leq 0,$$
a contradiction. □

7.5 Proof of results of Section 6

Proof of the first point of Theorem 5. Assume without loss of generality that the unstable space at $\hat x$ is one-dimensional, that is, $DF(\hat x)$ has only one strictly positive eigenvalue $\mu$, and call $v$ the associated normalized eigenvector. We use a result from Pemantle (1990), more precisely in the setting of Brandière and Duflo (1996), which states that a sufficient condition for non-convergence to $\hat x$ is that the noise is exciting in the unstable direction:

$$\liminf_{n \to +\infty} \mathbb{E}\big(\langle U_{n+1}, v\rangle^2 \mid \mathcal{F}_n\big) > 0 \qquad (9)$$
on the event $\{\lim_n x_n = \hat x\}$. We show that, under our assumptions, this condition is met for any $v \neq 0$. We have

$$\langle U_{n+1}, v\rangle^2 = \Big(\sum_{i<j} \epsilon_n^i\, \epsilon_n^j\Big(v_i\, x_n^i\, \frac{\partial u_i}{\partial x_j}(x_n) + v_j\, x_n^j\, \frac{\partial u_j}{\partial x_i}(x_n)\Big)\Big)^2.$$

Hence, using $\mathbb{E}(\epsilon_n^i \epsilon_n^j) = 0$ if $i \neq j$ and $(\epsilon_n^i)^2 = 1$, we get
$$\mathbb{E}\big(\langle U_{n+1}, v\rangle^2 \mid \mathcal{F}_n\big) = \sum_{i<j}\Big(v_i\, x_n^i\, \frac{\partial u_i}{\partial x_j}(x_n) + v_j\, x_n^j\, \frac{\partial u_j}{\partial x_i}(x_n)\Big)^2,$$
which converges, on the event $\{\lim_n x_n = \hat x\}$, to
$$\sum_{i<j}\Big(v_i\, \hat x_i\, \frac{\partial u_i}{\partial x_j}(\hat x) + v_j\, \hat x_j\, \frac{\partial u_j}{\partial x_i}(\hat x)\Big)^2.$$
This limit is positive unless every term of the sum vanishes. Assume by contradiction that it does vanish. Since the network is not bipartite, we can assume without loss of generality that there exist agents $i$, $j$, $k$ such that
$$\frac{\partial u_i}{\partial x_j}(\hat x)\frac{\partial u_j}{\partial x_i}(\hat x) > 0, \qquad \frac{\partial u_j}{\partial x_k}(\hat x)\frac{\partial u_k}{\partial x_j}(\hat x) > 0, \qquad \frac{\partial u_i}{\partial x_k}(\hat x)\frac{\partial u_k}{\partial x_i}(\hat x) > 0.$$

We thus have $\operatorname{sgn}(v_i) = -\operatorname{sgn}(v_j) = \operatorname{sgn}(v_k) = -\operatorname{sgn}(v_i)$, which implies, since $\hat x$ is interior, that $v_i = v_j = v_k = 0$. As a consequence, for every agent $l$ linked to $i$, $j$ or $k$, we have $v_l = 0$. Recursively, since the interaction graph is connected, we have $v = 0$, which is a contradiction and concludes the proof. □

Proof of the second point of Theorem 5. This is a consequence of the first point and Theorem 2. □
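The parity argument above can be checked mechanically on small graphs: with unit weights standing in for the (symmetric, positive) quantities $\hat x_i\, \partial u_i/\partial x_j(\hat x)$, the edge conditions $v_i + v_j = 0$ admit only the trivial solution on an odd cycle, but not on an even one. A small sketch (the graphs are illustrative):

```python
import numpy as np

def edge_condition_kernel(edges, n):
    # One row per edge (i, j): the condition v_i + v_j = 0
    # (unit weights stand in for x_i * du_i/dx_j(x_hat) > 0, symmetric case).
    A = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        A[r, i] = 1.0
        A[r, j] = 1.0
    # Dimension of the null space = n - rank(A).
    return n - np.linalg.matrix_rank(A)

triangle = [(0, 1), (1, 2), (2, 0)]            # odd cycle: only v = 0 works
square   = [(0, 1), (1, 2), (2, 3), (3, 0)]    # even cycle: v = (1,-1,1,-1) works
print(edge_condition_kernel(triangle, 3))      # -> 0
print(edge_condition_kernel(square, 4))        # -> 1
```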

Proof of the third point of Theorem 5. It is sufficient to prove that if a network is non-bipartite and the game exhibits symmetric interactions, then the noise is uniformly exciting. We wish to show that $\langle v, Q(x)\cdot v\rangle = 0$ if and only if $v = 0$. But
$$\langle v, Q(x)\cdot v\rangle = \mathbb{E}\big(\langle U_{n+1}, v\rangle^2 \mid x_n = x\big),$$
so the proof was established in the first point of Theorem 5. □

At this point, some comments are in order. The key argument needed to rule out convergence to linearly unstable equilibria is the noise condition (9). When $\hat x$ is interior and linearly unstable, it is clearly not necessary to assume non-bipartiteness of the network

and/or symmetric interactions for this condition to hold. However, as we shall see in the next example, we may run into trouble on bipartite graphs if the game does not have strategic complements. We provide an example with strategic substitutes, with 4 agents on a square.

Example 2 Consider the following 4-player public good game:

$$u_1(x) = -c x_1 + b(x_1 + x_2 + x_4), \qquad u_2(x) = -c x_2 + b(x_2 + x_1 + x_3),$$

$$u_3(x) = -c x_3 + b(x_3 + x_2 + x_4), \qquad u_4(x) = -c x_4 + b(x_4 + x_1 + x_3),$$
with $b$ strictly concave and such that $b'(1) = c$. Then $\hat x = (1/3, 1/3, 1/3, 1/3)$ is a Nash equilibrium. Choosing $b$ such that $b''(1) = -3$ for simplicity, the Jacobian matrix associated to $\hat x$ is
$$DF(\hat x) = \begin{pmatrix} -1 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \\ 0 & -1 & -1 & -1 \\ -1 & 0 & -1 & -1 \end{pmatrix},$$
which corresponds to a saddle point, as the eigenvalues are $-3$, $-1$, $-1$, $1$. The eigenspace associated to the positive eigenvalue is generated by $v = (1, -1, 1, -1)$, so that, on the event

$\{\lim_n x_n = \hat x\}$, we have
$$\lim_{n\to+\infty} \mathbb{E}\big(\langle U_{n+1}, v\rangle^2 \mid \mathcal{F}_n\big) = \Big(\frac{\partial u_1}{\partial x_2}(\hat x) - \frac{\partial u_2}{\partial x_1}(\hat x)\Big)^2 + \Big(-\frac{\partial u_2}{\partial x_3}(\hat x) + \frac{\partial u_3}{\partial x_2}(\hat x)\Big)^2 + \Big(\frac{\partial u_3}{\partial x_4}(\hat x) - \frac{\partial u_4}{\partial x_3}(\hat x)\Big)^2 = 0$$
and the noise condition (9) does not hold.
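The linear-algebra claims of Example 2 are easy to verify numerically; a short sketch (the matrix is the one displayed above, and the edge list encodes the square):

```python
import numpy as np

# Jacobian DF(x_hat) of Example 2 (4 players on a square, b''(1) = -3).
DF = np.array([
    [-1, -1,  0, -1],
    [-1, -1, -1,  0],
    [ 0, -1, -1, -1],
    [-1,  0, -1, -1],
], dtype=float)

print(np.round(np.linalg.eigvalsh(DF), 6))   # -> [-3. -1. -1.  1.]

# Unstable direction: v = (1, -1, 1, -1) is an eigenvector for eigenvalue +1.
v = np.array([1.0, -1.0, 1.0, -1.0])
print(np.allclose(DF @ v, v))                # -> True

# Noise along v: at x_hat all cross partials du_i/dx_j on neighboring pairs
# equal b'(1) = c, so each edge term is proportional to (v_i + v_j) = 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(all(v[i] + v[j] == 0 for i, j in edges))  # -> True: condition (9) fails
```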

Example 3 If interactions are not symmetric at $\hat x$ then, even if the game exhibits strategic complements, the noise can vanish in the unstable direction. Consider the 2-player game with
$$u_1(x_1, x_2) = -\frac{x_1^2}{2} + 2 x_1 - x_1 (2 - x_2)^2; \qquad u_2(x_1, x_2) = -\frac{x_2^2}{2} - x_1^2 (2 - x_2).$$
It is not hard to see that $(1, 1)$ is a Nash equilibrium, and that

$$\frac{\partial^2 u_i}{\partial x_i \partial x_j}(\hat x) = 2, \qquad i = 1, 2.$$
As a consequence, the Jacobian matrix associated to the dynamics $F$ is simply
$$DF(\hat x) = \begin{pmatrix} -1 & 2 \\ 2 & -1 \end{pmatrix},$$

which corresponds to a saddle point, as the eigenvalues are $-3$ and $1$. The eigenspace associated to the positive eigenvalue is generated by $v = (1, 1)$. Now it is not hard to see that the

interactions are antisymmetric: $\frac{\partial u_2}{\partial x_1}(x) = -\frac{\partial u_1}{\partial x_2}(x)$. Thus, on the event $\{\lim_n x_n = \hat x\}$, we have
$$\lim_{n\to+\infty} \mathbb{E}\big(\langle U_{n+1}, v\rangle^2 \mid \mathcal{F}_n\big) = \Big(\frac{\partial u_1}{\partial x_2}(\hat x) + \frac{\partial u_2}{\partial x_1}(\hat x)\Big)^2 = 0,$$
and the noise condition (9) does not hold.
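Similarly, a short numerical sketch of Example 3 (the closed-form cross partials below were derived by hand from the stated payoffs and are not quoted from the text):

```python
import numpy as np

x_hat = (1.0, 1.0)

# Cross partial derivatives of the payoffs of Example 3:
# u1 = -x1^2/2 + 2*x1 - x1*(2 - x2)^2,  u2 = -x2^2/2 - x1^2*(2 - x2).
def du1_dx2(x1, x2):
    return 2.0 * x1 * (2.0 - x2)

def du2_dx1(x1, x2):
    return -2.0 * x1 * (2.0 - x2)

# Antisymmetric interactions: du2/dx1 = -du1/dx2, so the noise vanishes along v.
print(du1_dx2(*x_hat) + du2_dx1(*x_hat))   # -> 0.0

# Jacobian of the dynamics at (1, 1): a saddle with eigenvalues -3 and 1,
# and unstable direction v = (1, 1).
DF = np.array([[-1.0, 2.0], [2.0, -1.0]])
print(np.linalg.eigvalsh(DF))              # -> [-3.  1.]
print(np.allclose(DF @ np.array([1.0, 1.0]), np.array([1.0, 1.0])))  # -> True
```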

References

Allouch, N. (2015). On the private provision of public goods on networks. Journal of Economic Theory, 157:527–552.

Ballester, C., Calvó-Armengol, A., and Zenou, Y. (2006). Who's who in networks. Wanted: the key player. Econometrica, 74(5):1403–1417.

Beggs, A. (2005). On the convergence of reinforcement learning. Journal of Economic Theory, 122(1):1–36.

Benaïm, M. (1996). A Dynamical System Approach to Stochastic Approximations. SIAM Journal on Control and Optimization, 34:437.

Benaïm, M. (1999). Dynamics of stochastic approximation algorithms. Séminaire de probabilités de Strasbourg, 33:1–68.

Benaïm, M. (2000). Convergence with probability one of stochastic approximation algorithms whose average is cooperative. Nonlinearity, 13(3):601–616.

Benaïm, M. and Faure, M. (2012). Stochastic Approximations, Cooperative Dynamics and Supermodular Games. Annals of Applied Probability, 22(5):2133–2164.

Benaïm, M. and Hirsch, M. (1999). Mixed Equilibria and Dynamical Systems Arising from Fictitious Play in Perturbed Games. Games and Economic Behavior, 29(1-2):36–72.

Benaïm, M., Hofbauer, J., and Sorin, S. (2005). Stochastic approximations and differential inclusions. I. SIAM Journal on Control and Optimization, 44:328–348.

Benveniste, A., Métivier, M., and Priouret, P. (1990). Stochastic Approximations and Adaptive Algorithms. Springer-Verlag.

Berger, U. (2005). Fictitious play in 2 × n games. Journal of Economic Theory, 120(2):139–154.

Bimpikis, K., Ehsani, S., and Ilkilic, R. (2014). Cournot competition in networked markets. In Proceedings of the 15th ACM Conference on Economics and Computation.

Börgers, T. and Sarin, R. (1997). Learning through reinforcement and replicator dynamics. Journal of Economic Theory, 77(1):1–14.

Bramoullé, Y. and Kranton, R. (2007). Public goods in networks. Journal of Economic Theory, 135(1):478–494.

Bramoullé, Y., Kranton, R., and D'Amours, M. (2014). Strategic interaction and networks. The American Economic Review, 104(3):898–930.

Brandière, O. and Duflo, M. (1996). Les algorithmes stochastiques contournent-ils les pièges? Annales de l'I.H.P. Probabilités et statistiques, 32(3):395–427.

Bravo, M. (2016). An adjusted payoff-based procedure for normal form games. Mathematics of Operations Research, 41(4):1469–1483.

Bravo, M. and Faure, M. (2015). Reinforcement learning with restrictions on the action set. SIAM Journal on Control and Optimization, 53(1):287–312.

Brown, G. W. (1951). Iterative solution of games by fictitious play. Activity analysis of production and allocation, 13(1):374–376.

Cominetti, R., Melo, E., and Sorin, S. (2010). A payoff-based learning procedure and its application to traffic games. Games and Economic Behavior, 70(1):71–83.

Dubey, P., Haimanko, O., and Zapechelnyuk, A. (2006). Strategic complements and substitutes, and potential games. Games and Economic Behavior, 54(1):77–94.

Duflo, M. (1996). Algorithmes stochastiques.

Erev, I. and Roth, A. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review, 88(4):848–881.

Faure, M. and Roth, G. (2010). Stochastic approximations of set-valued dynamical systems: Convergence with positive probability to an attractor. Mathematics of Operations Research, 35(3):624–640.

Fudenberg, D. and Kreps, D. (1993). Learning mixed equilibria. Games and Economic Behavior, 5(3):320–367.

Hirsch, M. (1999). Chain transitive sets for smooth strongly monotone dynamical systems. Dynamics of Continuous, Discrete and Impulsive Systems, 5:529–544.

Hofbauer, J. and Sandholm, W. (2002). On the Global Convergence of Stochastic Fictitious Play. Econometrica, 70(6):2265–2294.

Hopkins, E. and Posch, M. (2005). Attainability of boundary points under reinforcement learning. Games and Economic Behavior, 53(1):110–125.

Kushner, H. and Clark, D. (1978). Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag.

Kushner, H. and Yin, G. (2003). Stochastic Approximation and Recursive Algorithms and Applications. Springer.

Leslie, D. S. and Collins, E. J. (2005). Individual q-learning in normal form games. SIAM Journal on Control and Optimization, 44(2):495–514.

Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, 22(4):551–575.

Miyazawa, K. (1961). On the convergence of the learning process in a 2 × 2 non-zero-sum two-person game. Econometric Research Program, Princeton University. Research Memorandum.

Monderer, D. and Shapley, L. (1996a). Fictitious Play Property for Games with Identical Interests. Journal of Economic Theory, 68(1):258–265.

Monderer, D. and Shapley, L. S. (1996b). Potential games. Games and economic behavior, 14(1):124–143.

Pemantle, R. (1990). Nonconvergence to unstable points in urn models and stochastic approximations. Annals of Probability, 18(2):698–712.

Perkins, S. and Leslie, D. S. (2014). Stochastic fictitious play with continuous action sets. Journal of Economic Theory, 152:179–213.

Posch, M. (1997). Cycling in a stochastic learning algorithm for normal form games. Journal of Evolutionary Economics, 7(2):193–207.

Robinson, J. (1951). An iterative method of solving a game. Annals of mathematics, pages 296–301.

Ruelle, D. (1981). Small random perturbations of dynamical systems and the definition of attractors. Communications in Mathematical Physics, 82(1):137–151.

Shapley, L. et al. (1964). Some Topics in Two-Person Games. Advances in Game Theory, 52:1–28.

Vives, X. (1990). Nash equilibrium with strategic complementarities. Journal of Mathematical Economics, 19(3):305–321.
