arXiv:1501.06558v1 [physics.soc-ph] 26 Jan 2015

Directional learning and the provisioning of public goods

Heinrich H. Nax^1,∗ and Matjaž Perc^2,3,†

1 Department of Social Sciences, ETH Zürich, Clausiusstrasse 37-C3, 8092 Zurich, Switzerland
2 Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, SI-2000 Maribor, Slovenia
3 Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia

∗ Electronic address: [email protected]
† Electronic address: [email protected]

We consider an environment where players are involved in a public goods game and must decide repeatedly whether to make an individual contribution or not. However, players lack strategically relevant information about the game and about the other players in the population. The resulting behavior of players is completely uncoupled from such information, and the individual strategy adjustment dynamics are driven only by reinforcement feedbacks from each player's own past. We show that the resulting "directional learning" is sufficient to explain deviations away from the Nash equilibrium. We introduce the concept of k−strong equilibria, which nest both the Nash equilibrium and the Aumann-strong equilibrium as two special cases, and we show that, together with the parameters of the learning model, the maximal k−strength of equilibrium determines the stationary distribution. The provisioning of public goods can be secured even under adverse conditions, as long as players are sufficiently responsive to the changes in their own payoffs and adjust their actions accordingly. Substantial levels of public cooperation can thus be explained without arguments involving selflessness or social preferences, solely on the basis of uncoordinated directional (mis)learning.

Cooperation in sizable groups has been identified as one of the pillars of our remarkable evolutionary success. While between-group conflicts and the necessity for alloparental care are often cited as the likely sources of the other-regarding abilities of the genus Homo [1, 2], it is still debated what made us the "supercooperators" [3, 4] that we are today. Research in the realm of evolutionary game theory [5–10] has identified a number of different mechanisms by means of which cooperation might be promoted, ranging from different types of reciprocity [11, 12] and group selection to positive interactions [13], risk of collective failure [14], and static network structure [15, 16].

The public goods game [17], in particular, is established as an archetypical context that succinctly captures the social dilemma that may result from a conflict between group interest and individual interests [18, 19]. In its simplest form, the game requires that players decide whether or not to contribute to a common pool. Regardless of the chosen strategy, each player receives an equal share of the public good, which results from the total of contributions being multiplied by a fixed rate of return. The typical situation is that, while the individual temptation is to free-ride on the contributions of the other players, for the collective it is in the interest of everyone to contribute. Without additional mechanisms such as punishment [20], the contribution decisions in such situations [18, 19] approach the Nash equilibrium [21] of free-riding over time and thus lead to a "tragedy of the commons" [22]. Nevertheless, there is rich experimental evidence that contributions are sensitive to the rate of return and to positive interactions [13], that there is evidence in favor of social preferences [23], and that beliefs about other players' decisions are at the heart of individual decisions in public goods environments [24].

In this paper, however, we shall consider an environment where players have no strategically relevant information about the game and/or about the other players, and hence explanations in terms of social preferences and beliefs are not germane. Instead, we shall propose a simple learning model, where players may mutually reinforce learning off the equilibrium path. As we will show, this phenomenon provides a simple alternative explanation for why contributions rise with the rate of return, as well as why cooperation may still prevail even under adverse conditions. Previous explanations of this experimental regularity [18] are based on individual-level costs of "error" [25, 26].

Suppose each player knows neither who the other players are, nor what they do, nor what they did, nor what they earn, nor how many players there are, nor whether there is an underlying public goods game, nor what the underlying rate of return is. Players do not even know whether the rate of return stays constant over time (even though in reality it does). Without any such knowledge, players are unable to determine ex ante whether contributing or not contributing is the better strategy in a given period, because their payoffs change over time due to the strategy adjustments of the other players, about which they also have no information. As a result, players have no strategically relevant information about how best to respond, their behavior is completely uncoupled [27, 28], and their strategy adjustment dynamics are likely to follow a form of reinforcement [29, 30] feedback or, as we shall call it, directional learning [31, 32]. We note that, in our model, due to the one-dimensionality of the strategy space, directional learning and reinforcement are both adequate terminologies for our learning model. Since reinforcement also applies to more general strategy spaces and is therefore the more general terminology, we will prefer the terminology of directional learning. Indeed, such directional learning behavior has been observed in recent public goods experiments [33, 34].

The important question is how well the population will learn to play the public goods game despite the lack of strategically relevant information. "Well" here has two meanings due to the conflict between private and collective interests: on the one hand, how close will the population get to playing the Nash equilibrium, and, on the other hand, how close will the population get to playing the socially desirable outcome?
The learning model considered in this paper is based on a particularly simple "directional learning" algorithm which we shall now explain. Suppose each player plays both cooperation (contributing to the common pool) and defection (not contributing) with a mixed strategy and updates the weights for the two strategies based on their relative performances in previous rounds of the game. In particular, a player will increase its weight on contributing if a previous-round switch from not contributing to contributing led to a higher realized payoff, or if a previous-round switch from contributing to not contributing led to a lower realized payoff. Similarly, a player will decrease its weight on contributing if a previous-round switch from contributing to not contributing led to a higher realized payoff, or if a previous-round switch from not contributing to contributing led to a lower realized payoff. For simplicity, we assume that players make these adjustments at a fixed incremental step size δ, even though this could easily be generalized. In essence, each player adjusts its mixed strategy directionally depending on a Markovian performance assessment of whether a previous-round contribution increase/decrease led to a higher/lower payoff.

Since the mixed strategy weights represent a well-ordered strategy set, the resulting model is related to the directional learning/aspiration adjustment models [31, 32, 35], and similar models have previously been proposed for bid adjustments in assignment games [36], as well as in two-player games [37]. In [36] the dynamic leads to stable cooperative outcomes that maximize total payoffs, while Nash equilibria are reached in [37]. The crucial difference between these previous studies and our present study is that our model involves more than two players in a voluntary contributions setting, and, as a result, that there can be interdependent directional adjustments of groups of players including more than one but not all the players. This can lead to uncoordinated (mis)learning of subpopulations in the game.

Consider the following example. Suppose all players in a large standard public goods game do not contribute to start with. Then suppose that the players in some subpopulation, uncoordinatedly but by chance, simultaneously decide to contribute. If this group is sufficiently large (the size of which depends on the rate of return), then this will result in higher payoffs for all players including the contributors, despite the fact that not contributing is the dominant strategy in terms of unilateral replies. In our model, if indeed this generates higher payoffs for all players including the freshly-turned contributors, then the freshly-turned contributors would continue to contribute (the higher the rate of return, the smaller the minimum group size needed for a payoff-improving joint contribution). This regularity has previously been explained only at an individual level, namely that "errors" are less costly (and therefore more likely) the higher the rate of return, following quantal-response equilibrium arguments [25, 26]. By contrast, we provide a group-dynamic argument. Note that the alternative explanation in terms of individual costs is not germane in our setting, because we have assumed that players have no information to make such assessments. It is in this sense that our explanation perfectly complements the explanation in terms of costs.

In what follows, we present the results, where we first set up the model and then deliver our main conclusions. We discuss the implications of our results in the Discussion. Further details about the applied methodology are provided in the Methods section.

Results

Public goods game with directional learning

In the public goods game, each player i in the population N = {1, 2, ..., n} chooses whether to contribute (c_i = 1) or not to contribute (c_i = 0) to the common pool. Given a fixed rate of return r > 0, the resulting payoff of player i is then u_i = (1 − c_i) + (r/n) Σ_{j∈N} c_j. We shall call r/n the game's marginal per-capita rate of return and denote it as R. Note that for simplicity, but without loss of generality, we have assumed that the group is the whole population. In the absence of restrictions on the interaction range of players [38], i.e., in well-mixed populations, the size of the groups and their formation can be shown to be of no relevance in our case, as long as R rather than r is considered as the effective rate of return.
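To make the payoff structure concrete, here is a minimal sketch in Python (our own illustration, not the authors' code; the function name payoffs is ours):

```python
import numpy as np

def payoffs(c, r):
    """Public goods payoffs u_i = (1 - c_i) + (r/n) * sum_j c_j,
    where c is a 0/1 array of contributions and r is the rate of return."""
    n = len(c)
    return (1 - c) + (r / n) * c.sum()

# Example with n = 4 and r = 2 (so R = r/n = 0.5): if exactly one player
# contributes, the contributor earns 0.5 while each free-rider earns 1.5.
print(payoffs(np.array([1, 0, 0, 0]), 2.0))   # -> [0.5 1.5 1.5 1.5]
```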
The directional learning dynamics is implemented as follows. Suppose the above game is infinitely repeated at time steps t = 0, 1, 2, ..., and suppose further that player i, at time t, plays c_i^t = 1 with probability p_i^t ∈ [δ, 1 − δ] and c_i^t = 0 with probability (1 − p_i^t). Let the vector of contribution probabilities p^t describe the state of the game at time t. We initiate the game with all p_i^0 lying on the δ-grid between 0 and 1, while subsequently individual mixed strategies evolve randomly subject to the following three "directional bias" rules:

upward: if u_i(c_i^t) > u_i(c_i^{t−1}) and c_i^t > c_i^{t−1}, or if u_i(c_i^t) < u_i(c_i^{t−1}) and c_i^t < c_i^{t−1}, then p_i^{t+1} = p_i^t + δ if p_i^t < 1; otherwise, p_i^{t+1} = p_i^t.

neutral: if u_i(c_i^t) = u_i(c_i^{t−1}) and/or c_i^t = c_i^{t−1}, then p_i^{t+1} = p_i^t, p_i^t + δ, or p_i^t − δ with equal probability if 0 < p_i^t < 1; otherwise, p_i^{t+1} = p_i^t.

downward: if u_i(c_i^t) > u_i(c_i^{t−1}) and c_i^t < c_i^{t−1}, or if u_i(c_i^t) < u_i(c_i^{t−1}) and c_i^t > c_i^{t−1}, then p_i^{t+1} = p_i^t − δ if p_i^t > 0; otherwise, p_i^{t+1} = p_i^t.
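Continuing the sketch above, the three rules translate into the following per-round update. This is our reading of the rules (with the neutral drift restricted to interior values of p_i, as stated above, and δ-grid bookkeeping simplified by clipping to [0, 1]), not the authors' implementation:

```python
def directional_step(p, c_prev, u_prev, r, delta, rng):
    """One round of the unperturbed directional learning dynamics."""
    c = (rng.random(len(p)) < p).astype(int)       # realized contributions
    u = payoffs(c, r)
    p_next = p.copy()
    for i in range(len(p)):
        dc, du = c[i] - c_prev[i], u[i] - u_prev[i]
        if dc == 0 or du == 0:                     # neutral rule:
            if 0 < p[i] < 1:                       # random drift, interior only
                p_next[i] += delta * rng.choice([-1.0, 0.0, 1.0])
        elif (du > 0) == (dc > 0):                 # upward rule: switch paid off
            p_next[i] += delta
        else:                                      # downward rule: switch backfired
            p_next[i] -= delta
    return np.clip(p_next, 0.0, 1.0), c, u
```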

Finally, we add perturbations to the directional learning dynamics and study the resulting stationary states. In particular, we consider perturbations of order ǫ such that, with probability 1 − ǫ, the dynamics is governed by the original three "directional bias" rules. However, with probability ǫ, either p_i^{t+1} = p_i^t, p_i^{t+1} = p_i^t − δ, or p_i^{t+1} = p_i^t + δ happens, each equally likely (with probability ǫ/3), but of course obeying the p_i^{t+1} ∈ [0, 1] restriction.
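As a sketch under the same assumptions as the code above, the perturbation can be layered on top of the unperturbed update, with the tremble applied independently per player and per round:

```python
def perturbed_step(p, c_prev, u_prev, r, delta, eps, rng):
    """With probability 1 - eps a player follows the directional-bias rules;
    with probability eps her update is instead drawn uniformly from
    {p_i, p_i - delta, p_i + delta} (probability eps/3 each), clipped to [0, 1]."""
    p_next, c, u = directional_step(p, c_prev, u_prev, r, delta, rng)
    tremble = rng.random(len(p)) < eps
    steps = rng.choice([-1.0, 0.0, 1.0], size=len(p))
    p_next[tremble] = np.clip(p[tremble] + delta * steps[tremble], 0.0, 1.0)
    return p_next, c, u
```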

Provisioning of public goods

We begin with a formal definition of the k−strong equilibrium. In particular, a pure strategy imputation s* is a k-strong equilibrium of our (symmetric) public goods game if, for all C ⊆ N with |C| ≤ k, u_i(s*_C; s*_{N\C}) ≥ u_i(s'_C; s*_{N\C}) for all i ∈ C and for any alternative pure strategy set s'_C for C.
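For the all-defect profile of our symmetric game, this definition can be checked mechanically. A brute-force sketch (our illustration; the function name is hypothetical) exploits symmetry: the binding coalition deviation is the one in which all m ≤ k members switch to contributing, turning a deviator's payoff from 1 into m·R:

```python
def max_k_strength_all_defect(n, r):
    """Maximal k for which c_i = 0 for all i is k-strong: the largest k such
    that no coalition of size m <= k makes its members strictly better off
    by jointly contributing, i.e. such that m * (r/n) <= 1 for all m <= k."""
    R = r / n
    k = 0
    for m in range(1, n + 1):
        if m * R > 1.0:        # a deviating coalition of size m would gain
            break
        k = m
    return k

# max_k_strength_all_defect(16, 1) -> 16  (public bad: Aumann-strong)
# max_k_strength_all_defect(16, 8) -> 2   (conflict region: barely above Nash)
```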

As noted in the previous section, this definition bridges, on the one hand, the concept of the Nash equilibrium in pure strategies [21], in the sense that any k−strong equilibrium with k > 0 is also a Nash equilibrium, and, on the other hand, that of the (Aumann-)strong equilibrium [39, 40], in the sense that any k−strong equilibrium with k = n is Aumann strong. Equilibria in between (for 1 < k < n) measure intermediate degrees of robustness against joint deviations by groups of up to k players.

The maximal k-strengths of the equilibria in our public goods game as a function of r are depicted in Fig. 1 for n = 16. The cyan-shaded region indicates the "public bad" game (r < 1), where individual and public motives in terms of the Nash equilibrium of the game are aligned towards defection. Here c_i = 0 for all i is the unique Aumann-strong equilibrium, or in terms of the definition of the k−strong equilibrium, c_i = 0 for all i is k−strong for all k ∈ [1, n]. The magenta-shaded region indicates the typical public goods game for 1 < r < n (1/n < R < 1), where individual and public motives are in conflict. Here c_i = 0 for all i remains the unique Nash equilibrium, but its maximal k−strength decreases as the rate of return increases. Finally, for r > n (R > 1), individual and public motives are again aligned, but this time towards cooperation. Here c_i = 1 for all i abruptly becomes the unique Nash and Aumann-strong equilibrium, or equivalently the unique k−strong equilibrium for all k ∈ [1, n].

[Figure 1: the maximal k for which c_i = 0 for all i (respectively c_i = 1 for all i) is at most k-strong, plotted over r from 0 to 32 (equivalently R from 0.0 to 2.0) for n = 16, with regions labeled "public bad, aligned", "conflicting", and "public good, aligned".]

FIG. 1: The maximal k-strength of equilibria in the studied public goods game with directional learning. As an example, we consider the population size being n = 16. As the rate of return r increases, the maximal k-strength of the c_i = 0 for all i (full free-riding) equilibrium decreases (red line). As the rate of return increases further above r = n (R > 1), the c_i = 1 for all i (full cooperation) equilibrium suddenly becomes Aumann-strong (n−strong). Shaded regions denote the public bad game (r < 1), and the public goods games with conflicting (1 < r < n) and aligned (r > n) individual and public motives in terms of the Nash equilibrium of the game (see main text for details). We note that results for other population and/or group sizes are the same over R, while r and the slope of the red line of course scale accordingly.

If we add perturbations of order ǫ to the unperturbed public goods game with directional learning that we have introduced above, there exist stationary distributions of p_i, and the following proposition can be proven. In the following, we denote by k the maximal k−strength of an equilibrium.

Proposition: As t → ∞, starting at any p^0, the expectation with respect to the stationary distribution is E[p^t] > 1/2 if R ≥ 1 and E[p^t] < 1/2 if R < 1. ∂E[p^t]/∂ǫ < 0 if R ≥ 1, and ∂E[p^t]/∂ǫ > 0 if R < 1. Moreover, ∂E[p^t]/∂δ > 0 if R < 1, and ∂E[p^t]/∂δ < 0 if R ≥ 1. Finally, ∂E[p^t]/∂k < 0 if R < 1.

The argument rests on identifying whether, for player i, the state p_i = 0 or p_i = 1, given that p_j^t = 0 or 1 for all j ≠ i, has a larger attraction given the model's underlying parameters.

If R ≥ 1, the probability path for any player to move from p_i^t = 0 to p_i^{t+T} = 1 in some T = 1/δ steps requires a single perturbation for that player and is therefore of the order of a single ǫ. By contrast, the probability for any player to move from p_i^t = 1 to p_i^{t+T} = 0 in T steps is of the order ǫ^3, because at least two other players must increase their contribution in order for that player to experience a payoff increase from his non-contribution. Along any other path, or if p^t is such that there are not two players j with p_j^t = 0 to make this move, the move from p_i^t = 1 to p_i^{t+T} = 0 in T steps requires even more perturbations and is of higher order. Notice that, for any one player to move from p_i^t = 1 to p_i^{t+T} = 0, we need at least two players to move away from p_j^t = 0 along the least-resistance paths.

Because contributing 1 is a best reply for all R ≥ 1, those two players will also continue to increase their contribution probabilities if continuing to contribute 1. Notice that the length of the path is T = 1/δ steps, and that the path requires no perturbations along the way, which is less likely the smaller δ.

If R < 1, the probability for any player to move from p_i^t = 1 to p_i^{t+T} = 0 in some T = 1/δ steps requires a single perturbation for that player and is therefore of the order of a single ǫ. By contrast, the probability for any player to move from p_i^t = 0 to p_i^{t+T} = 1 in some T steps is at least of the order ǫ^k, because at least k players (corresponding to the maximal k-strength of the equilibrium) must contribute in order for all of these players to experience a payoff increase. Notice that k decreases in R. Again, the length of the path is T = 1/δ steps, and that path requires no perturbations along the way, which is less likely the smaller δ. With this, we conclude the proof of the proposition. However, it is also worth noting a direct corollary of the proposition; namely, as ǫ → 0, E[p^t] → 1 if R ≥ 1, and E[p^t] → 0 if R < 1.

Lastly, we simulate the perturbed public goods game with directional learning and determine the actual average contribution levels in the stationary state. Color-encoded results in dependence on the normalized rate of return R and the responsiveness of players to the success of their past actions δ (alternatively, the sensitivity of the individual learning process) are presented in Fig. 2 for ǫ = 0.1. Small values of δ lead to a close convergence to the respective Nash equilibrium of the game, regardless of the value of R. As the value of δ increases, the pure Nash equilibria erode and give way to a mixed outcome. It is important to emphasize that this is in agreement with, or rather is in fact a consequence of, the low k−strengths of the non-contribution pure equilibria (see Fig. 1). Within intermediate to large δ values the Nash equilibria are implemented in a zonal rather than pinpoint way. When the Nash equilibrium is such that all players contribute (R > 1), then small values of δ lead to more efficient aggregate play (recall that any such equilibrium is n−strong). Conversely, by the same logic, when the Nash equilibrium is characterized by universal free-riding, then larger values of δ lead to more efficient aggregate play. Moreover, the precision of implementation also depends on the rate of return, in the sense that uncoordinated deviations of groups of players lead to more efficient outcomes the higher the rate of return. In other words, the free-riding problem is mitigated if group deviations lead to higher payoffs for every member of an uncoordinated deviation group, the minimum size of which (that in turn is related to the maximal k−strength of equilibrium) is decreasing with the rate of return.

[Figure 2: color map of the average contribution level (color scale 0 to 1) over the learning sensitivity δ (0.01 to 1, logarithmic axis) and the normalized rate of return R (0 to 2).]

FIG. 2: Color-encoded average contribution levels in the perturbed public goods game with directional learning. Simulations confirm that, with little directional learning sensitivity (i.e., when δ is zero or very small), for the marginal per-capita rate of return R > 1 the outcome c_i = 1 for all i is the unique Nash and Aumann-strong equilibrium. For R = 1 (dashed horizontal line), any outcome is a Nash equilibrium, but only c_i = 1 for all i is Aumann-strong, while all other outcomes are only Nash equilibria. For R < 1, c_i = 0 for all i is the unique Nash equilibrium, and its maximal k−strength depends on the population size. This is in agreement with results presented in Fig. 1. Importantly, however, as the responsiveness of players increases, contributions to the common pool become significant even in the defection-prone R < 1 region. In effect, individuals (mis)learn what is best for them and end up contributing even though this would not be a unilateral best reply. Similarly, in the R > 1 region free-riding starts to spread despite the fact that it is obviously better to cooperate. For both these rather surprising and counterintuitive outcomes to emerge, the only thing needed is directional learning.

Simulations also confirm that the evolutionary outcome is qualitatively invariant to: i) the value of ǫ as long as the latter is bounded away from zero, although longer convergence times are an inevitable consequence of very small ǫ values (see Fig. 3); ii) the replication of the population (i.e., making the whole population a group) and the random remixing between groups; and iii) the population size, although here again the convergence times are the shorter the smaller the population size. While both ii) and iii) are a direct consequence of the fact that we have considered the public goods game in a well-mixed rather than a structured population (where players would have a limited interaction range and where thus pattern formation could play a decisive role [38]), the qualitative invariance to the value of ǫ is elucidated further in Fig. 3. We would like to note that by "qualitative invariance" it is meant that, regardless of the value of ǫ > 0, the population always diverges away from the Nash equilibrium towards a stable mixed stationary state. But as can be observed in Fig. 3, the average contribution level and its variance both increase slightly as ǫ increases. This is reasonable if one considers ǫ as an exploration or mutation rate. More precisely, it can be observed that, the lower the value of ǫ, the longer it takes for the population to move away from the Nash equilibrium where everybody contributes zero in the case that 1/n < R < 1 (which was also the initial condition for clarity). However, as soon as initial deviations (from p_i = 0 in this case) emerge (with probability proportional to ǫ), the neutral rule in the original learning dynamics takes over, and this drives the population towards a stable mixed stationary state. Importantly, even if the value of ǫ is extremely small, the random drift sooner or later gains momentum and eventually yields similar contribution levels as those attainable with larger values of ǫ. Most importantly, note that there is a discontinuous jump towards staying in the Nash equilibrium, which occurs only if ǫ is exactly zero. If ǫ is bounded away from zero, then the free-riding Nash equilibrium erodes unless it is n−strong (for very low values of R ≤ 1/n).

[Figure 3: average contribution level (0 to 0.7) versus time (10^0 to 10^7, logarithmic axis) for values of ǫ between 10^-8 and 10^-1.]

FIG. 3: Time evolution of average contribution levels, as obtained for R = 0.7, δ = 0.1 and different values of ǫ (see legend). If only ǫ > 0, the Nash equilibrium erodes to a stationary state where at least some members of the population always contribute to the common pool. There is a discontinuous transition to complete free-riding (defection) as ǫ → 0. Understandably, the lower the value of ǫ (the smaller the probability for the perturbation), the longer it may take for the drift to gain momentum and for the initial deviation to evolve towards the mixed stationary state. Note that the time axis is logarithmic.
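Putting the sketches above together, a minimal driver along these lines reproduces the qualitative behavior just described. It is our illustration only; the parameters mirror the Fig. 3 example (n = 16, R = 0.7, δ = 0.1), and the step count is an arbitrary choice:

```python
def run(n=16, r=0.7 * 16, delta=0.1, eps=1e-2, steps=200_000, seed=0):
    """Simulate the perturbed dynamics from the all-defect initial condition
    and return the time-averaged contribution level."""
    rng = np.random.default_rng(seed)
    p = np.zeros(n)                     # everybody starts as a sure free-rider
    c_prev = np.zeros(n, dtype=int)
    u_prev = payoffs(c_prev, r)
    total = 0.0
    for _ in range(steps):
        p, c_prev, u_prev = perturbed_step(p, c_prev, u_prev, r, delta, eps, rng)
        total += c_prev.mean()
    return total / steps

# For eps > 0 the average drifts up to a mixed level (cf. Fig. 3);
# for eps = 0 the all-defect state is absorbing and the average stays at 0.
```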

Discussion

We have introduced a public goods game with directional learning, and we have studied how the level of contributions to the common pool depends on the rate of return and the responsiveness of individuals to the successes and failures of their own past actions. We have shown that directional learning alone suffices to explain deviations from the Nash equilibrium in the stationary state of the public goods game. Even though players have no strategically relevant information about the game and/or about each others' actions, the population could still end up in a mixed stationary state where some players contributed at least part of the time although the Nash equilibrium would be full free-riding. Vice versa, defectors emerged where cooperation was clearly the best strategy to play. We have explained these evolutionary outcomes by introducing the concept of k−strong equilibria, which bridge the gap between Nash and Aumann-strong equilibria. We have demonstrated that the lower the maximal k−strength and the higher the responsiveness of individuals to the consequences of their own past strategy choices, the more likely it is for the population to (mis)learn what is the objectively optimal unilateral (Nash) play.

These results have some rather exciting implications. Foremost, the fact that the provisioning of public goods even under adverse conditions can be explained without any sophisticated and often lengthy arguments involving selflessness or social preferences holds promise of significant simplifications of the rationale behind seemingly irrational individual behavior in sizable groups. It is simply enough for a critical number (depending on the size of the group and the rate of return) of individuals to make a "wrong choice" at the same time once, and if only the learning process is sufficiently fast or naive, the whole subpopulation is likely to adopt this wrong choice as their own at least part of the time. In many real-world situations, where the rationality of decision making is often compromised due to stress or peer pressure, such "wrong choices" are likely to proliferate. As we have shown in the context of public goods games, sometimes this means more prosocial behavior, but it can also mean more free-riding, depending only on the rate of return.

The power of directional (mis)learning to stabilize unilaterally suboptimal game play of course takes nothing away from the more traditional and established explanations, but it does bring to the table an interesting option that might be appealing in many real-world situations, also those that extend beyond the provisioning of public goods. Fashion trends or viral tweets and videos might all share a component of directional learning before acquiring mainstream success and recognition. We hope that our study will be inspirational for further research in this direction. The consideration of directional learning in structured populations [41, 42], for example, appears to be a particularly exciting future venture.

Methods

For the characterization of the stationary states, we introduce the concept of k−strong equilibria, which nests both the Nash equilibrium [21] and the Aumann-strong equilibrium [39, 40] as two special cases. While the Nash equilibrium describes the robustness of an outcome against unilateral (1-person) deviations, the Aumann-strong equilibrium describes the robustness of an outcome against the deviations of any subgroup of the population. An equilibrium is said to be (Aumann-)strong if it is robust against deviations by the whole population or indeed by any conceivable subgroup of the population, which is indeed rare. Our definition of the k−strong equilibrium bridges the two extreme cases, measuring the size of the group k ≥ 1 (at or above Nash) and hence the degree to which an equilibrium is stable. We note that our concept is related to coalition-proof equilibrium [43, 44]. In the public goods game, the free-riding Nash equilibrium is typically also more than 1−strong but never n−strong. As we will show, the maximal strength k of an equilibrium translates directly to the level of contributions in the stationary distribution of our process, which is additionally determined by the normalized rate of return R and the responsiveness of players to the success of their past actions δ, i.e., the sensitivity of the individual learning process.
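In display form, the definition used throughout (restated from the Results section; the typesetting is ours) reads:

```latex
% s* is a k-strong equilibrium if no coalition of size at most k has a
% joint deviation that benefits any of its members:
\[
u_i\bigl(s^*_C;\, s^*_{N\setminus C}\bigr) \;\ge\;
u_i\bigl(s'_C;\, s^*_{N\setminus C}\bigr)
\quad\text{for all } C \subseteq N,\ |C| \le k,\ \text{all } i \in C,
\ \text{and all } s'_C,
\]
% with k = 1 recovering the Nash equilibrium [21] and k = n the
% (Aumann-)strong equilibrium [39, 40].
```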

Acknowledgments

This research was supported by the European Commission through the ERC Advanced Investigator Grant "Momentum" (Grant 324247), by the Slovenian Research Agency (Grant P5-0027), and by the Deanship of Scientific Research, King Abdulaziz University (Grant 76-130-35-HiCi).

[1] Bowles, S. and Gintis, H. A Cooperative Species: Human Reciprocity and Its Evolution (Princeton University Press, Princeton, NJ, 2011).
[2] Hrdy, S. B. Mothers and Others: The Evolutionary Origins of Mutual Understanding (Harvard University Press, Cambridge, MA, 2011).
[3] Nowak, M. A. and Highfield, R. SuperCooperators: Altruism, Evolution, and Why We Need Each Other to Succeed (Free Press, New York, 2011).
[4] Rand, D. G. and Nowak, M. A. Human cooperation. Trends in Cognitive Sciences 17, 413–425 (2013).
[5] Maynard Smith, J. Evolution and the Theory of Games (Cambridge University Press, Cambridge, U.K., 1982).
[6] Weibull, J. W. Evolutionary Game Theory (MIT Press, Cambridge, MA, 1995).
[7] Hofbauer, J. and Sigmund, K. Evolutionary Games and Population Dynamics (Cambridge University Press, Cambridge, U.K., 1998).
[8] Mesterton-Gibbons, M. An Introduction to Game-Theoretic Modelling, 2nd Edition (American Mathematical Society, Providence, RI, 2001).
[9] Nowak, M. A. Evolutionary Dynamics (Harvard University Press, Cambridge, MA, 2006).
[10] Myatt, D. P. and Wallace, C. When Does One Bad Apple Spoil the Barrel? An Evolutionary Analysis of Collective Action. Review of Economic Studies 75, 499–527 (2008).
[11] Mesterton-Gibbons, M. and Dugatkin, L. A. Cooperation among unrelated individuals: Evolutionary factors. The Quarterly Review of Biology 67, 267–281 (1992).
[12] Nowak, M. A. Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006).
[13] Rand, D. G., Dreber, A., Ellingsen, T., Fudenberg, D., and Nowak, M. A. Positive interactions promote public cooperation. Science 325, 1272–1275 (2009).
[14] Santos, F. C. and Pacheco, J. M. Risk of collective failure provides an escape from the tragedy of the commons. Proc. Natl. Acad. Sci. USA 108, 10421–10425 (2011).
[15] Santos, F. C., Santos, M. D., and Pacheco, J. M. Social diversity promotes the emergence of cooperation in public goods games. Nature 454, 213–216 (2008).
[16] Rand, D. G., Nowak, M. A., Fowler, J. H., and Christakis, N. A. Static network structure can stabilize human cooperation. Proc. Natl. Acad. Sci. USA 111, 17093–17098 (2014).
[17] Isaac, M. R., McCue, K. F., and Plott, C. R. Public goods provision in an experimental environment. Journal of Public Economics 26, 51–74 (1985).
[18] Ledyard, J. O. Public goods: A survey of experimental research. In The Handbook of Experimental Economics, Kagel, J. H. and Roth, A. E., editors, 111–194 (Princeton University Press, Princeton, NJ, 1997).
[19] Chaudhuri, A. Sustaining cooperation in laboratory public goods experiments: a selective survey of the literature. Experimental Economics 14, 47–83 (2011).
[20] Fehr, E. and Gächter, S. Cooperation and punishment in public goods experiments. Am. Econ. Rev. 90, 980–994 (2000).
[21] Nash, J. Equilibrium points in n-person games. Proc. Natl. Acad. Sci. USA 36, 48–49 (1950).
[22] Hardin, G. The tragedy of the commons. Science 162, 1243–1248 (1968).
[23] Fischbacher, U., Gächter, S., and Fehr, E. Are people conditionally cooperative? Evidence from a public goods experiment. Econ. Lett. 71, 397–404 (2001).
[24] Fischbacher, U. and Gächter, S. Social preferences, beliefs, and the dynamics of free riding in public goods experiments. The American Economic Review 100, 541–556 (2010).
[25] Palfrey, T. R. and Prisbey, J. E. Anomalous Behavior in Public Goods Experiments: How Much and Why? The American Economic Review 87, 829–846 (1997).
[26] Goeree, J. K. and Holt, C. A. An Explanation of Anomalous Behavior in Models of Political Participation. American Political Science Review 99, 201–213 (2005).
[27] Foster, D. P. and Young, H. P. Regret testing: learning to play Nash equilibrium without knowing you have an opponent. Theoretical Economics 1, 341–367 (2006).
[28] Young, H. P. Learning by trial and error. Games and Economic Behavior 65, 626–643 (2009).
[29] Roth, A. E. and Erev, I. Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games and Economic Behavior 8, 164–212 (1995).
[30] Erev, I. and Roth, A. E. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review 88, 848–881 (1998).
[31] Selten, R. and Stoecker, R. End behavior in sequences of finite Prisoner's Dilemma supergames: A learning theory approach. Journal of Economic Behavior & Organization 7, 47–70 (1986).
[32] Selten, R. and Buchta, J. Experimental Sealed Bid First Price Auctions with Directly Observed Bid Functions. Discussion Paper Series B (1994).
[33] Bayer, R.-C., Renner, E., and Sausgruber, R. Confusion and learning in the voluntary contributions game. Experimental Economics 16, 478–496 (2013).
[34] Young, H. P., Nax, H., Burton-Chellew, M., and West, S. Learning in a Black Box. Economics Series Working Papers (2013).
[35] Sauermann, H. and Selten, R. Anspruchsanpassungstheorie der Unternehmung. Journal of Institutional and Theoretical Economics 118, 577–597 (1962).
[36] Nax, H. H., Pradelski, B. S. R., and Young, H. P. Decentralized dynamics to optimal and stable states in the assignment game. Proc. IEEE 52, 2398–2405 (2013).
[37] Laslier, J.-F. and Walliser, B. Stubborn learning. Theory and Decision (2014).
[38] Perc, M., Gómez-Gardeñes, J., Szolnoki, A., Floría, L. M., and Moreno, Y. Evolutionary dynamics of group interactions on structured populations: a review. J. R. Soc. Interface 10, 20120997 (2013).
[39] Aumann, R. J. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics 1, 67–96 (1974).
[40] Aumann, R. J. Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55, 1–18 (1987).
[41] Szabó, G. and Fáth, G. Evolutionary games on graphs. Phys. Rep. 446, 97–216 (2007).
[42] Perc, M. and Szolnoki, A. Coevolutionary games – a mini review. BioSystems 99, 109–125 (2010).
[43] Bernheim, B. D., Peleg, B., and Whinston, M. D. Coalition-proof Nash equilibria I. Concepts. Journal of Economic Theory 42, 1–12 (1987).
[44] Moreno, D. and Wooders, J. Coalition-proof equilibrium. Games and Economic Behavior 17, 82–112 (1996).