THE EVOLUTIONARY RATIONALITY OF SOCIAL LEARNING

RICHARD MCELREATH, BARBARA FASOLO, AND ANNIKA WALLIN

Introduction

A common premise surrounding magic is that words themselves have power. Speaking the right words in the right context is believed to create fantastic effects. Everything from Old Norse runes to magic in the Harry Potter books requires activation with words. This kind of belief is a feature not only of western myth and magic, but also of African (famously of Azande oracles: Evans-Pritchard 1937) and Asian (Daoist) traditions. Some healers in the Muslim world write verses from the Koran in ink, and then wash the ink into a potion to be consumed. In Swahili, one can use the same word, dawa, to refer to both magical spells and the influence a charismatic speaker has over a crowd. Why do so many peoples believe that words themselves are magical? These beliefs are not necessarily irrational. Every one of us, by speaking, can alter the minds of those within earshot. With the word snake, one can conjure a potentially terrifying image in the minds of others. Statements like these reveal both how hard it is to really control our thoughts and the power mere utterances have over us. Of course people are savvy and do not robotically obey every suggestion or command. However, spoken opinion and advice is highly valued almost everywhere. The words of others, carrying information, can have powerful effects on our own behavior. The mere suggestion that something—like a measles vaccine—is dangerous can have huge effects on behavior. People and governments intuit this power and as a result attempt to control the words they themselves and others are exposed to. Words really are a kind of mind control, or at least mind influence. Their power can travel through the empty air and affect the behavior of masses of other people. They are like magic.

Date: July 20, 2010. Cite as: McElreath, R., Fasolo, B., & Wallin, A. (in press). The Evolutionary Rationality of Social Learning. In R. Hertwig, U. Hoffrage & the ABC Research Group (Eds.), Simple Heuristics in a Social World. Oxford University Press.

The psychology of humans is uniquely elaborated for this kind of "magical" influence. The capacity for language is only one way that social influence on behavior is truly baked into our biology. Observational learning of various kinds is equally powerful, as people combine theory and social information to arrive at inferences about the reasons for and consequences of behavior. But animals other than humans also use social information (e.g. Bonner 1980, Galef 1992, 1996, Price et al. 2009, Giraldeau & Caraco 2000, Fragaszy & Perry 2003, Laland & Galef 2009). While the psychological mechanisms and diversity of social learning heuristics among, for example, baboons is not the same as that among humans, the savvy monkey too uses information from its social environment. As a result, the field of evolutionary ecology has long been interested in the design and diversity of social learning heuristics, simple strategies that animals use to extract useful information from their social environments. In this chapter, we review a slice of this literature, as well as explicitly analyze the evolution of social learning heuristics. We outline a family of social learning heuristics and analyze their evolutionary performance—ability to persist and replace other heuristics—under two broadly different kinds of environmental variation. As each social learning heuristic also shapes a social environment as individuals use it, we consider the population feedbacks of each heuristic, as well. Feedbacks occur when the behavior generated by a heuristic in turn changes the success rate of that heuristic, a phenomenon sometimes called frequency-dependence. The analyses in this chapter are ecological—the performance of each heuristic is always in the context of a specific set of assumptions about the population structure and environment. They are also game theoretic—social learning heuristics use but also modify the social environment, inducing strong frequency-dependence.
Our analyses are also explicitly evolutionary—heuristics succeed or fail depending upon long-term survival and reproduction in a population, not atomistic one-shot payoffs. As a result, some of our conclusions reflect an evolutionary rationality that is sometimes counter-intuitive. For example, heuristics that randomize their behavior can succeed where those that are consistent fail. Overall, however, the approach we review here supports the general conclusion that social learning heuristics are likely to be multiple and subtly adapted to different physical, statistical, and social environments.

Social learning heuristics

In parallel to the literature in bounded rationality, evolutionary ecologists and anthropologists studying social learning have proposed that there exists a toolbox of contextually deployed heuristics that are suited to different ecological and social environments (reviews in Henrich & McElreath 2003, Richerson & Boyd 2005). The basic premise is that information about the world is costly to acquire and process (Boyd & Richerson 1985). So as a method of reducing information requirements and processing costs, natural selection favors strategies that leverage the specific correlations of specific environments in order to make locally-adaptive choices. Each heuristic in the toolbox is best deployed in a different circumstance, and some heuristics are more domain-general than others. Thus the expectation is an ecology of inferential strategies that individuals can use to choose behavior. While some of these strategies are more cognitively demanding and information hungry than others, all are quite "bounded," compared to Bayesian justifications for social learning (Boyd & Richerson 2005, 2001, Bikhchandani et al. 1992). Like other hypothesized heuristics, these social learning heuristics can be compared to laboratory behavior. In recent years, there has been a small industry of testing these models against dynamic learning data (Efferson et al. 2008, McElreath et al. 2008, 2005, Mesoudi & O'Brien 2008, Mesoudi & Whiten 2008, Mesoudi 2008). In this section, we outline and begin to analyze a toolbox of social learning heuristics that evolutionary ecologists and evolutionary anthropologists have studied. The collection of heuristics we review is not complete. Many other heuristics could be nominated, and each heuristic we do nominate is in reality a family of heuristics.
However, by constraining our discussion to the most commonly-discussed strategies, we have space to derive each from first (or at least basic) principles and, later, analyze the performance of several in different ecological circumstances. Theory leads us to expect that people (and perhaps other animals) possess a toolbox of social learning heuristics. Our goal is to study the conditions, both in terms of physical and social environment, that favor different heuristics. In Table 1, we list several social learning heuristics from the literature, also listing aliases and a sample of relevant citations to previous work. In the remainder of this chapter we demonstrate the analysis of a few of these. We also present a new analysis of the evolution of social heuristics in time varying environments.

Box 1: Psychologist's guide to theoretical evolutionary ecology

We provide here short definitions of some of the key evolutionary ecology concepts in the chapter. A complete introduction can be found in McElreath & Boyd (2007).

Population: All organisms of the same species who are linked by gene-flow, the possible exchange of genes across generations. Populations can be sub-divided into smaller groups, in which case not all individuals will be able to interbreed in a given generation. Nevertheless, as long as sub-populations are linked by migration across generations, all individuals in the total population can in principle be linked by gene-flow. The population is the natural unit of evolution, as the frequencies of genes and behavior change over time among the individuals within it.

Life cycle: The sequence of events that happen between birth and death. These events, aggregated over many individuals in a population, induce selection on specific genetic or cultural variants.

Strategies: Heritable aspects of contingent behavior. Heuristics are strategies. Behavior is distinct from strategy, as the same strategy can produce different behavior in different contexts. In evolutionary models, strategies are what evolve, and the frequencies of different strategies, or the alleles that code for them, describe the state of the population.

Fitness: Typically, the expected number of copies of a specific strategy per individual in the next generation. Fitness depends upon survival and reproduction. Fitness concepts do however vary among models of evolutionary processes, because the goal is to define a quantity that will allow us to predict the population dynamics. Evolutionary ecologists attempt to understand what will evolve, and fitness is a tool in such analyses.

Dynamics: Time evolution in a physical system. In evolutionary models, the dynamics are the time trends of the frequencies of different heritable strategies and behaviors. The frequencies at any time in the future depend upon the frequencies in the past. Evolutionary analysis is a branch of dynamical systems theory.

Equilibrium: A combination of strategies at the population level at which the dynamics of the population result in no change. Equilibria can be stable or unstable. When the frequencies are changed slightly, the dynamics return the population to a stable equilibrium; in contrast, the dynamics lead the population away from an unstable equilibrium. Stable equilibria are candidate end states of the evolutionary process.

Invasion: When a rare strategy can increase in numbers in a population, it can invade that population. A strategy that, once common, can repel rare invaders of all other strategies is an evolutionarily stable strategy.

Geometric mean fitness: If we define fitness as the product of survival probability and mean reproduction rate, then geometric mean fitness is the geometric mean of different probable fitness values. Natural selection in many models maximizes geometric mean fitness, rather than average fitness, because natural selection is affected by both the mean and variance in fitness across generations.

Table 1. Major social learning heuristics from the literature, with other names for the same strategies and citations to a sample of previous evolutionary analyses.

Unbiased social learning. Other names: linear social learning, random copying, imitation. Citations: Boyd & Richerson (1995, 1985), Cavalli-Sforza & Feldman (1981), Rogers (1988), Mesoudi & Lycett (2009), Aoki et al. (2005), Wakano et al. (2004).

Consensus learning. Other names: conformity, conformist transmission, positive frequency dependent imitation, majority rule imitation. Citations: Boyd & Richerson (1985), Mesoudi & Lycett (2009), Lehmann & Feldman (2008), Henrich & Boyd (2001, 1998), Wakano & Aoki (2007).

Payoff bias. Other names: success bias, indirect bias. Citations: Boyd & Richerson (1985), Henrich (2001), Schlag (1998, 1999).

Prestige bias. Other names: indirect bias. Citations: Boyd & Richerson (1985), Henrich & Gil-White (2001).

Kin bias. Other names: vertical transmission, parent-child transmission. Citations: McElreath & Strimling (2008).

The dynamical systems approach common in evolutionary analysis may be unfamiliar to many readers, so we provide a quick guide in Box 1.

The environmental challenge. In order to make progress defining and analyzing the performance of different learning heuristics, we have to define the challenge the organism faces. Here we use an evolutionary framing of the common multi-armed bandit problem. Assume that each individual at some point in its life has to choose among a very large number of distinct behavioral options. These options could be timing of reproduction, patterns of paternal care, or any other set of mutually exclusive options. Only one of these options is optimal, producing a higher fitness benefit than all the others. We will assume that a single optimal behavior increases an individual's fitness by a factor 1 + b > 1. All other behavior leaves fitness unchanged. In particular, let w0 be an individual's fitness before behaving. Since there are a very large number of alternative choices, randomly guessing will not yield a fitness payoff much greater than w0. Then those who choose optimally have fitness w0(1 + b), while those who do not have fitness w0.
Because fitness does not depend upon how many other individuals also choose the same option, these payoffs are not directly frequency dependent.

Individual updating. The foil for all the social learning heuristics we consider here is a gloss individual updating heuristic. However the mechanism works in detail, we assume that individuals have the option of relying exclusively on their own experience when deciding how to behave. We assume that individual updating requires sampling and processing effort, as well as potential trial-and-error. As a result, an organism that uses individual updating to learn optimal behavior pays a fitness cost by having its survival multiplied by c ∈ [0, 1]. This means the fitness of an individual updater is always w0(1 + b)c, which exceeds w0 whenever (1 + b)c > 1. We assume that individual updating is always successful at identifying optimal behavior. We have analyzed the same set of heuristics assuming that individual updating is successful only a fraction s of the time. This assumption, while biologically satisfying, adds little in terms of understanding: it changes none of the qualitative results we will describe in later sections, while adding mathematical complexity.

"Unbiased" social learning. Probably the simplest kind of social learning is a strategy that randomly selects a single target to learn from. Much of the earliest evolutionary work on social learning considered this strategy (Cavalli-Sforza & Feldman 1981, Boyd & Richerson 1985), and even more recent work continues to study its properties (Aoki et al. 2005, Wakano et al. 2004). To formalize this heuristic, consider a strategy that, instead of trying to update individually, copies a random member of the previous generation. Such a strategy avoids the costs of learning, c. Social learning may entail costs, but they are assumed to be lower than those of individual updating.
The unavoidable cost of social learning is that the payoff from such a heuristic depends upon the quality of available social information. We will refer to such a strategy as "unbiased" social learning. We use the word "unbiased" to describe this kind of social learning, although the word "bias" is problematic. We use the term to refer only to deviations from random, not from normative standards. The word "bias" has been used in this way for some time in the evolutionary study of social learning (Boyd and Richerson 1985, for example).

Let q (the "quality" of social information) be the proportion of optimal behavior among possible targets of social learning. Then the expected fitness of an unbiased social learner is w0(1 + qb), wherein b is discounted by the probability that the unbiased social learner acquires optimal behavior. Like other social learning heuristics, unbiased social learning actively shapes social environments itself—a heuristic that uses behavior as a cue and produces behavior will necessarily create feedbacks in the population of learners. As a result, a satisfactory analysis must be dynamic. We consider such an analysis in a later section.

Note that we assume no explicit sampling cost of social learning. Indeed, several of the social learning heuristics we consider in this chapter use the behavior of more than one target, and we have not considered explicit costs of sampling these targets either. Consensus social learning (below), in our simple model of it, uses three targets, and payoff biased social learning (later in this section) uses two targets. Does this mean that consensus is worse than payoff bias, when both are equal in all other regards? We think the answer to this question will depend upon details we have not modeled.
Do other activities provide ample opportunity to sample targets for social learning, or must individuals instead search them out and spend time observing their behavior? If the behavior in question is highly complex and requires time and practice to successfully transmit, like how to make an arrow, then consensus learning may entail higher behavioral costs than, say, payoff bias. This is because a consensus learner needs to observe the detailed technique of three (or more) individuals, while the payoff biased learner must only observe payoffs and then invest time observing one target. We could invent stories that favor consensus, as well. And while constructing and formalizing such stories is likely productive, it is a sufficiently detailed project that we have not undertaken it in this chapter. But we do not wish to send the message that sampling costs and sampling strategy—how many to sample and when to stop, for example—are uninteresting or unimportant questions. They are simply beyond the scope of this introduction.

Consensus social learning. An often discussed category of social learning heuristics are those that use the commonality of a behavior as a cue (Boyd & Richerson 1985, Henrich & Boyd 1998, Mesoudi & Lycett 2009, Wakano & Aoki 2007). When an individual can sample more than two targets, it is possible to use the frequency of observed behavior among the targets as a cue to guide choice. This kind of strategy has been called positive frequency dependence and conformist transmission. We adopt the label "consensus" social learning here, because "conformity" is a vague term that many people associate with social learning of any kind (as it is often used in psychology), while the alternative "positive frequency dependence" is unwieldy. Consensus social learning can be most easily modeled by assuming that an individual samples three targets at random and preferentially

adopts the most common behavior among them. In Box 2, we show how to use this definition to derive the expected probability that an individual using consensus social learning will acquire optimal behavior:

(1) Pr(1) = q + q(1 − q)(2q − 1).

Box 2: Deriving consensus social learning

We use a simple table to derive probabilities of acquiring optimal (1) and non-optimal (0) behavior, using a consensus social learning heuristic.

Observed behavior   Pr(obs)        Pr(1)      Pr(0)
1 1 1               q^3            1          0
1 1 0               3q^2(1 − q)    2/3 + D    1/3 − D
1 0 0               3q(1 − q)^2    1/3 − D    2/3 + D
0 0 0               (1 − q)^3      0          1

Here 0 < D ≤ 1/3 is the strength of the preference for consensus. The columns are, in order from left to right: the vector of observed behavior from a sample of three targets, where 1 indicates optimal behavior and 0 any non-optimal behavior; the probability of sampling that vector; the probability of acquiring optimal behavior under the heuristic, given that sample; and the probability of acquiring non-optimal behavior. First, multiply each probability of the observed vector of behavior by the probability of acquiring optimal behavior, Pr(1). Then, add together all the products from each row. In this case, q^3 × 1 + 3q^2(1 − q) × (2/3 + D) + 3q(1 − q)^2 × (1/3 − D) + (1 − q)^3 × 0 simplifies to Expression 1 in the main text, assuming for simplicity that D = 1/3. We are making the simplifying assumption in this derivation that all non-optimal behavior is categorized together. As long as most immigrants come from a single or few neighboring patches, this will not be a bad approximation. Thus we consider these results to hold for structured populations with nearest-neighbor migration. When it is a bad approximation, however, it is a conservative estimate that biases our analysis against consensus learning, not in favor of it.
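The Box 2 derivation can be checked mechanically. The sketch below (our own illustration, not part of the chapter) enumerates all eight ordered samples of three targets and confirms that, for D = 1/3, the acquisition probability reduces to Expression 1.

```python
from itertools import product

def consensus_pr_optimal(q, D=1/3):
    """Probability that a consensus learner acquires optimal behavior,
    found by enumerating all ordered samples of three targets."""
    total = 0.0
    for sample in product([1, 0], repeat=3):
        # probability of observing this ordered sample
        p_sample = 1.0
        for s in sample:
            p_sample *= q if s == 1 else (1 - q)
        k = sum(sample)  # number of optimal targets in the sample
        # probability of adopting optimal behavior, given the sample
        pr1 = {3: 1.0, 2: 2/3 + D, 1: 1/3 - D, 0: 0.0}[k]
        total += p_sample * pr1
    return total

def expression_1(q):
    # Expression 1 in the main text (the D = 1/3 case)
    return q + q * (1 - q) * (2 * q - 1)

for q in (0.1, 0.5, 0.9):
    assert abs(consensus_pr_optimal(q) - expression_1(q)) < 1e-12
```

With D = 1/3 the heuristic reduces to simple majority rule among the three targets, which is why the k = 2 row adopts optimal behavior with certainty.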
Boyd and Richerson (1985) have considered a number of generalizations of this heuristic, including different weights given to each target, as well as correlations among the behavior of the targets. We will ignore these complications in this chapter, as our goal is to motivate a mode of analysis and to emphasize the differences among quite different social learning heuristics, rather than among variants of the same heuristic.

Payoff biased social learning. Another often analyzed category of social learning heuristic is payoff guided learning (Boyd & Richerson 1985, Schlag 1998, 1999, Stahl 2000). By comparing observable payoffs—health, surviving offspring, or even more domain-specific measures of success—among targets, an individual can use differences in

Box 3: Deriving payoff biased social learning

Again we use a table to derive probabilities of acquiring optimal (1) and non-optimal (0) behavior, this time using payoff biased social learning.

Actual behavior   Pr(actual)   Observed payoffs   Pr(obs)      Pr(1)   Pr(0)
1 1               q^2                                           1       0
1 0               2q(1 − q)    1 0                (1 − x)^2    1       0
                               0 0                x(1 − x)     1/2     1/2
                               1 1                x(1 − x)     1/2     1/2
                               0 1                x^2          0       1
0 0               (1 − q)^2                                     0       1

In this table, q is the frequency of optimal behavior among targets and x is the chance of incorrectly judging the payoff of a target (or similarly, 1 − x is the correlation between behavior and observed payoffs). The individual using payoff biased social learning samples two targets at random and assesses their payoffs. The individual copies the behavior of the target with the higher observed payoff, unless both observed payoffs are the same, in which case one target is copied at random.

payoff as a guide for learning. This kind of heuristic generates a dynamic often called "the replicator dynamic" in evolutionary game theory (Gintis 2000). This dynamic is very similar to that of natural selection, and is often used as a boundedly-rational assumption in social evolutionary models (e.g. McElreath et al. 2003) and even epidemiology (Bauch 2005).

A simple model of payoff biased social learning assumes that individuals sample two targets and preferentially adopt the behavior of the target with the higher observed payoff. We assume that there is a chance x that the individual incorrectly judges the payoff of a target to be high or low. Another interpretation is that x is the chance that a target's observable payoff is uncorrelated with behavior. Using these assumptions, we show in Box 3 that this heuristic leads to a chance:

Pr(1) = q + q(1 − q)(1 − x)

of acquiring optimal behavior. A more general model of payoff bias allows the aspect of the target that is judged as "success" to itself be socially transmitted. When this is the case, unanticipated social processes become possible, such as the run-away exaggeration of preferences for traits that are judged as successful (Boyd and Richerson 1985).
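As a quick check on this expression, the sketch below (ours, not part of the chapter) computes the acquisition probability exactly. It treats x as the chance that the payoff comparison as a whole is uninformative, in which case one target is copied at random; this per-comparison reading of the error process is an assumption of the sketch, and under it the enumeration reproduces the expression above.

```python
def payoff_bias_pr_optimal(q, x):
    """Exact chance of acquiring optimal behavior when copying the
    higher-payoff of two random targets. Assumption of this sketch:
    with probability x the payoff comparison is uninformative and one
    target is copied at random; otherwise the optimal target wins."""
    both_optimal = q * q               # copy optimal behavior regardless
    mixed = 2 * q * (1 - q)            # one optimal target, one not
    pr1_mixed = (1 - x) + x * 0.5      # comparison works, or coin flip
    return both_optimal + mixed * pr1_mixed

def main_text_expression(q, x):
    return q + q * (1 - q) * (1 - x)

for q in (0.2, 0.5, 0.8):
    for x in (0.0, 0.3, 1.0):
        assert abs(payoff_bias_pr_optimal(q, x) - main_text_expression(q, x)) < 1e-12
```

At x = 1 the payoff cue carries no information and the heuristic collapses to unbiased copying, Pr(1) = q; at x = 0 it becomes Pr(1) = 1 − (1 − q)^2, the chance that at least one of the two targets is optimal.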

Figure 1. Abstract forms of environmental variation. Each square represents an overhead view of environmental variation. With purely spatial variation (left column), different locales favor different optimal behavior, represented by the shading levels, but these differences remain static through time (moving from top to bottom, Time 1 to Time 2). With purely temporal variation (middle column), all locales favor the same behavior, but the optimal behavior varies through time. With both spatial and temporal variation (right column), locales may differ both from other locales and from themselves, through time.

Ecological variation and social learning

Given the definitions of heuristics in the previous section, we now turn to analyzing the evolutionary dynamics of these four strategies—individual updating, unbiased social learning, consensus social learning, and payoff biased social learning—both alone and in competition. We will assume that each heuristic is a heritable strategy and study their population dynamics. We consider how these heuristics perform in two statistical environments: (1) a spatially variable environment, in which different behavior is optimal in different places, and (2) a temporally variable environment, in which different behavior is optimal at different times. We also consider the interaction of these two kinds of variation (Figure 1).

The reason for focusing on environmental variation, the rates at which the environment changes spatially and temporally, is that the long tradition of studying "learning" in evolutionary ecology has identified ecological variation as a prime selection pressure favoring both individual learning (phenotypic plasticity, as it is often called) and social learning (Levins 1968, Boyd & Richerson 1988). In a perfectly stationary environment, genetic adaptation (canalization) does a fine job of adapting the organism, without any of the cognitive overhead and potential for error that arises from using information during development to alter behavior. Thus evolutionary ecologists still consider the nature of environmental variation to be a key factor in the evolution, maintenance, and design of learning (see e.g. Dunlap & Stephens 2009, Lande 2009).

Our goal in this section is to describe some conditions under which each social learning heuristic is well-adapted. No single heuristic can succeed in all circumstances. To some extent, all social learning heuristics depend upon some kind of individual updating, for example.
Additionally, the differences among social learning strategies generate different dynamics for the quality of the social environment. Because our analysis is explicitly evolutionary, it will turn out that good heuristics are those which can live well with themselves. Social heuristics tend to shape the social environments that they rely upon for information. Thus the precise ways in which the physical and social environments interact play a large role in determining the long-term evolutionary success of a heuristic.

Spatial variation in the environment. In this section we consider the first of two stereotyped forms of environmental variation, spatial variation in optimal behavior. We assume that the environment is sub-divided into a large number of distinct patches, each with a unique optimal behavior. Optimal behavior within each patch is forever the same; however, different patches never have the same optimal behavior. Within each patch, a large sub-population of organisms follows the life-cycle: (1) birth, (2) learning, (3) behavior, (4) migration, (5) reproduction, (6) death. Individuals are born naive and must use some strategy to acquire behavior. If behavior is optimal for the local patch, then the individual's fitness is multiplied by the factor 1 + b > 1. Otherwise, fitness is unchanged. A proportion m of the local population emigrates to other patches and an equally sized group immigrates from other patches. Generations overlap only long enough for newly born naive individuals to possibly learn from the previous generation of adults. Because of migration, some of the adults available to learn from are immigrants, all of whom possess non-optimal behavior for their new patch. While fitness is assigned in natal patches, we assume that adults continue to display their behavior after migration, and so naive individuals run the risk of learning from immigrants. Additionally, we assume that naive individuals cannot tell who is and is not a native of their local patch. While such cues might be available in many circumstances, they are certainly not always available.

Individual updating. The expected fitness of an individual updater is:

w(I) = w0(1 + b)c,

where 0 < c < 1 is a multiplicative cost to survival or reproduction. Provided that (1 + b)c > 1, individual updating will be the best-adapted heuristic whenever the quality of social information in the local patch, q, is equal to zero. However, since individual updating quickly increases the frequency of optimal behavior in the local patch, this heuristic quickly generates a social environment favorable to one social learning heuristic or another.

Unbiased social learning. Précis: While individual updaters generate locally adaptive behavior that social learners can exploit, mixing among patches erodes this information. Therefore, unbiased social learning can invade a population using individual updating, provided mixing among patches is not too strong. Unbiased social learning can never completely replace individual updating, however. Thus when unbiased social learning can invade, there will exist a stable mix of individual updating and unbiased social learning in the population.

We now consider when unbiased social learning (U) can out-perform individual updating. In generation t, the expected fitness of an individual using unbiased social learning is:

w(U)_t = w0(1 + q_t b),

where q_t is the frequency of optimal behavior among targets in the current generation t. To compute the expected fitness across generations, we need an expression for the average amount of optimal behavior in the population. In Box 4, we show how to estimate this expression. We use this expression to prove how selection increases and decreases frequencies of these two heuristics. As has been shown many times (see Rogers 1988 for a clear example), neither individual updating nor unbiased social learning can exclude one another under all circumstances, and so models of this kind predict that both will co-exist, in the absence of other heuristics. A stable proportion of individual updaters, p̂, is found where:

w(I) = w(U)|_{p = p̂},

which yields:

(2) p̂ = m((1 + b)c − 1) / ((1 − m)(1 + b)(1 − c)).
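Expression 2 can be verified numerically. The sketch below (ours, for illustration) confirms that at p = p̂ the fitness of individual updating equals the fitness of unbiased social learning, using the steady-state quality of social information q̂ derived in Box 4, and checks the comparative statics discussed in the text.

```python
def p_hat(m, b, c):
    """Equilibrium frequency of individual updaters (Expression 2)."""
    return m * ((1 + b) * c - 1) / ((1 - m) * (1 + b) * (1 - c))

def q_hat(p, m):
    """Steady-state frequency of optimal behavior (Expression 3, Box 4)."""
    return (1 - m) * p / (1 - (1 - m) * (1 - p))

m, b, c = 0.1, 0.5, 0.8      # example values; note (1 + b)c = 1.2 > 1
p = p_hat(m, b, c)

# at the equilibrium mix, both heuristics earn the same expected fitness
assert abs((1 + b) * c - (1 + q_hat(p, m) * b)) < 1e-9

# comparative statics: more migration, a larger b, and cheaper individual
# updating (a larger c means a smaller cost) all raise the updater share
assert p_hat(m + 0.01, b, c) > p
assert p_hat(m, b + 0.1, c) > p
assert p_hat(m, b, c + 0.01) > p
```

The example parameter values (m = 0.1, b = 0.5, c = 0.8) are our own choices, picked only to satisfy the invasion condition (1 + b)c > 1.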

Box 4: The steady state amount of optimal behavior under a mix of individual updating and unbiased social learning

To compute the expected fitness across generations, we need to study the dynamics of q. The frequency of optimal behavior among targets at time t is defined by the recursion:

q_t = (1 − m)[p_{t−1} + (1 − p_{t−1}) q_{t−1}] + m(0),

where p_{t−1} is the proportion of the local population comprising individual updaters in the previous generation. To understand this recursion, first consider that a proportion p_{t−1} of targets are individual updaters. If a social learner targets one of these, then it is certain to acquire optimal behavior (before migration). If instead a social learner targets another social learner, which happens 1 − p_{t−1} of the time, there is a chance q_{t−1} of acquiring optimal behavior, because that is the chance each social learner in the previous generation had of acquiring optimal behavior. Finally, only a proportion 1 − m of the local group remains to be potential targets of learning. The proportion m that immigrates possesses only non-optimal behavior, however it was learned. If we assume that natural selection of the frequencies of learning heuristics is slow relative to the dynamics of q, then we can treat p_t as a constant p in the expression above, set q_t = q_{t−1} = q̂, and solve for the expected proportion of optimal behavior among targets:

(3) q̂ = (1 − m)p / (1 − (1 − m)(1 − p)).

Numerical work shows that this fast-slow dynamics approximation is very accurate, unless selection (proportional to b) is very strong.
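The fast-slow approximation in Box 4 is easy to confirm by iteration. This sketch (ours, not from the chapter) runs the recursion with p held fixed and checks that it converges to the closed-form q̂ of Expression 3.

```python
def iterate_q(p, m, q0=0.0, steps=500):
    """Iterate the Box 4 recursion q_t = (1 - m)(p + (1 - p) q_{t-1}),
    holding the frequency of individual updaters p fixed."""
    q = q0
    for _ in range(steps):
        q = (1 - m) * (p + (1 - p) * q)
    return q

def q_hat(p, m):
    """Closed-form steady state (Expression 3)."""
    return (1 - m) * p / (1 - (1 - m) * (1 - p))

for p, m in [(0.1, 0.05), (0.5, 0.2), (0.9, 0.01)]:
    assert abs(iterate_q(p, m) - q_hat(p, m)) < 1e-9
```

Convergence is fast because the recursion is a linear contraction with factor (1 − m)(1 − p) < 1, so the starting value q0 does not matter.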

Inspecting the partial derivatives of the right-hand side shows that increasing migration, increasing the value of optimal behavior, and decreasing the cost of individual updating all increase the equilibrium frequency of individual updating (∂p̂/∂m > 0, ∂p̂/∂b > 0, and ∂p̂/∂c > 0, for all b > 0, c ∈ [0, 1], and m ∈ [0, 1] such that (1 + b)c > 1; recall that survival is multiplied by c, so a larger c means a smaller cost). These results tell us that, if migration is too common, then unbiased social learning cannot invade a population of individual updaters, because too often the behavior available to copy is appropriate for a different patch. However, the amount of migration unbiased social learning can tolerate depends upon the costs and benefits of learning.

Consensus social learning. Précis: Consensus social learning yields higher fitness and replaces unbiased social learning, provided mixing between patches is not so strong as to make the expected local proportion of optimal behavior fall below one-half. If

Box 5: Condition for consensus social learning to invade a mixed population of individual updating and unbiased social learning. The expected fitness of a rare consensus learner (C) in generation $t$ is:

$w(C)_t = w_0\bigl(1 + b\,[\,q_t + q_t(1-q_t)(2q_t-1)\,]\bigr)$,

where the factor $q_t + q_t(1-q_t)(2q_t-1)$ was derived in Box 2. The invader faces a value of $q_t = \hat{q}$, reached under the joint action of both individual updaters and unbiased social learners. But regardless of the value of $q$, for consensus learning to do better than either common heuristic, all that is required is that:

$w(C)_t > w_0(1 + q_t b) \implies q_t > \tfrac{1}{2}$.

Consensus social learning is favored in any generation in which the expected proportion of optimal behavior among targets is greater than one-half. Substituting in the expression for $\hat{q}$, this condition simplifies to $m < \hat{p}/(1+\hat{p})$. So as long as migration isn't so strong as to flood local adaptive learning, which happens at a rate $\hat{p}$, consensus learning can invade a mix of individual updating and social learning. Since $\hat{p}$ is a function of $m$, $b$, and $c$, we can substitute in the expression for $\hat{p}$ derived in the previous section. Doing so yields condition 4 in the main text. If consensus learning can invade, it will always exclude unbiased social learning. Sometimes consensus learning can also exclude individual updating. If consensus learning is common, then the expected proportion of locally optimal behavior solves:

$\hat{q} = (1-m)\bigl[\hat{q} + \hat{q}(1-\hat{q})(2\hat{q}-1)\bigr] \implies \hat{q} = \dfrac{3}{4} + \dfrac{\sqrt{(1-m)(1-9m)}}{4(1-m)}$.

This expression is hard to interpret directly, but for small $m$ (such that $m^2 \approx 0$), it is approximately $1 - m$, which shows that migration tends to reduce the proportion of locally optimal behavior, as you might expect. Using the exact expression, consensus learning can exclude individual updating when $w_0(1 + b\hat{q}) > w_0(1+b)c$ and $c > (1 + b/2)/(1+b)$, which is satisfied when both $m \le 1/9$ and $1/2 < c < 3/4$.
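The fixed point reported in Box 5 can be verified by iterating the consensus recursion directly. A sketch (ours; the parameter value is arbitrary, and $m \le 1/9$ is required for the square root to be real):

```python
import math

# Iterate the consensus recursion q' = (1 - m)(q + q(1-q)(2q-1)),
# equivalently (1 - m)(3q^2 - 2q^3), and compare with the closed form
# in Box 5. Names are ours; m <= 1/9 keeps the square root real.

def consensus_q_hat(m):
    return 3 / 4 + math.sqrt((1 - m) * (1 - 9 * m)) / (4 * (1 - m))

def iterate_consensus(m, q0=0.9, steps=10000):
    q = q0
    for _ in range(steps):
        q = (1 - m) * (q + q * (1 - q) * (2 * q - 1))
    return q

m = 0.05
assert abs(iterate_consensus(m) - consensus_q_hat(m)) < 1e-9
```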

mixing is sufficiently weak and individual updating sufficiently costly, then consensus social learning can actually out-compete both individual updating and unbiased social learning.

When can a consensus learning heuristic invade a population of individual updaters and social learners? We derived above that, when the environment varies spatially, the population will approach a stationary proportion of individual updaters, unless migration is very powerful relative to the value of optimal behavior, in which case individual updating will dominate. At the stationary mix of both heuristics, the expected fitness of both individual updating and unbiased social learning is $w_0(1+b)c$. For consensus learning to invade, it only has to achieve greater fitness than this.

In Box 5, we prove that the condition for consensus social learning to invade a population of individual updating and unbiased social learning is:

(4) $c > \dfrac{1 + b/2}{1 + b}$.

This is easier to satisfy as $c$ increases. This means that consensus learning can invade provided that individual updating is sufficiently cheap (remember: high $c$ means cheap updating). If $c$ is too small (updating too costly), then there won't be enough individual updating at equilibrium to keep the average frequency of optimal behavior ($q$) above one-half. Whenever consensus learning can invade in this environment, it will exclude and replace simple social learning. Perhaps counter-intuitively, if the rate of mixing is low enough, consensus learning can also exclude even individual updating, which simple social learning can never do. We prove this also in Box 5. Provided migration is weak enough and individual updating is costly enough, but not too costly, consensus learning can dominate the population entirely: there is an intermediate range of individual updating costs that allows consensus to dominate a population.

The exact result here depends critically on the precise model of consensus learning. However, the qualitative result is likely quite general. Consensus learning is a non-linear form of social learning. As a consequence, it can actively transform the frequency of behavior from one generation to the next. It is a form of "inference," to speak casually. When mixing is weak, this inferential process can substitute for costly individual updating, because the increase in locally optimal behavior that consensus learning generates each generation will balance the loss from immigration.
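Condition 4 can be checked against the equilibrium logic directly. At the equilibrium mix of individual updating and unbiased social learning, fitness equality $w_0(1 + b\hat{q}) = w_0(1+b)c$ pins down $\hat{q} = ((1+b)c - 1)/b$, and $\hat{q} > 1/2$ rearranges exactly to condition 4. A sketch (ours; parameter grids arbitrary):

```python
# At the individual/unbiased-learning equilibrium, fitness equality
# w0(1 + b*q) = w0(1 + b)c pins down q = ((1+b)c - 1)/b. Condition (4)
# says this q exceeds 1/2 exactly when c > (1 + b/2)/(1 + b).

def q_at_equilibrium(b, c):
    return ((1 + b) * c - 1) / b

def condition_4(b, c):
    return c > (1 + b / 2) / (1 + b)

for b in (0.1, 0.5, 2.0):
    for c in (0.55, 0.7, 0.8, 0.95):
        assert (q_at_equilibrium(b, c) > 0.5) == condition_4(b, c)
```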

Payoff bias. Précis. Payoff biased social learning relies upon the observable consequences of previous choices. As a result, the lower the correlation between observable success and optimal behavior in the relevant domain, the lower the benefit payoff biased learning provides. Payoff biased social learning can, like consensus social learning, replace both unbiased social learning and individual updating, under the right conditions. If migration is weak enough and error in judging payoffs great enough, then consensus social learning can out-compete payoff biased learning.

Payoff bias can always invade and replace unbiased social learning. The condition for payoff bias to invade a population of individual updaters and unbiased social learners is:

$w_0\bigl(1 + b\,[\,q_t + q_t(1-q_t)(1-x)\,]\bigr) > w_0(1 + b\,q_t)$.

The above simplifies to $x < 1$ for all $q_t \in (0, 1)$, and so payoff biased learning dominates unbiased social learning whenever there is any correlation between observable success and the behavior of interest. Like consensus learning, payoff bias is non-linear and actively changes the frequency of adaptive behavior from one generation to the next. Also like consensus learning, this means payoff bias can sometimes exclude individual updating, in this case provided:

(5) $x < 1 - \dfrac{m}{1-m} \cdot \dfrac{b}{b - (1-m)\bigl((1+b)c - 1\bigr)}$.

So as long as migration is not too strong and cues of payoffs are sufficiently accurate, it is possible for payoff bias to completely exclude individual updating. Finally, consensus social learning can sometimes invade and replace payoff biased social learning. Consensus can invade a pure population of payoff biased social learners and replace it, provided:

$m < \dfrac{x(1-x)}{x(1-x) + 2}$.

When $x$ is very small, payoff biased learning is highly accurate, and therefore unless migration is also very weak, consensus learning lacks a sufficient advantage.

Summary. All of the above results explore the properties of unbiased, consensus, and payoff biased social learning when the environment varies spatially. We showed that unbiased social learning can never completely replace individual updating of some kind, because unbiased learning does not transform the frequency of optimal behavior. As a result, it does nothing to modify its social environment for its own good. In contrast, both consensus bias and payoff bias actively transform the frequency of optimal behavior across generations, increasing it slightly above its previous value (assuming $q > 1/2$, $x < 1$). As a result, both consensus and payoff bias can completely exclude individual updating, provided the force of mixing, $m$, isn't too strong and individual updating is sufficiently costly.
We think such conditions are actually quite rare (and as we show in the next sections, completely absent under our model of temporal environmental variation), but they do reveal a fundamental property of these non-linear social learning heuristics: they actively modify their social environment, and this process can substitute for individual updating, in the right kinds of environments. But each heuristic modifies the social environment using different cues, and therefore they behave differently in different environments. A symptom of this fact is that either consensus or payoff bias can dominate the other, depending upon the amount of mixing among patches ($m$) and the amount of error in judging payoffs ($x$).
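The two non-linear transformations discussed above can be compared side by side. The sketch below (ours; the one-step maps are taken from the acquisition probabilities in the text, ignoring migration) shows that payoff bias raises the frequency of optimal behavior at any interior $q$ when $x < 1$, while consensus raises it only above one-half and lowers it below:

```python
# One-step transformations (before migration): consensus maps q to
# q + q(1-q)(2q-1); payoff bias maps q to q + q(1-q)(1-x). Payoff bias
# raises q for any 0 < q < 1 when x < 1; consensus raises q only when
# q > 1/2 and lowers it when q < 1/2.

def consensus_step(q):
    return q + q * (1 - q) * (2 * q - 1)

def payoff_bias_step(q, x):
    return q + q * (1 - q) * (1 - x)

x = 0.3
for q in (0.1, 0.3, 0.49):
    assert consensus_step(q) < q and payoff_bias_step(q, x) > q
for q in (0.51, 0.7, 0.9):
    assert consensus_step(q) > q and payoff_bias_step(q, x) > q
```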

Temporal variation. When the environment varies through time, instead of across space, many of the same principles that we discovered above hold true. However, there are important differences between temporal variation and spatial variation. Under purely spatial variation in optimal behavior, individuals do well to avoid learning from immigrants to their local patch. However, since the locally optimal behavior does not change over time, a reliable store of locally adaptive culture can accumulate (as long as mixing isn't too strong). Indeed, we showed that both consensus and payoff bias can even exclude individual updating, maintaining optimal behavior at high frequency, even though neither uses individual experience with the environment.

When the environment varies through time, the nature of the problem is subtly different. Now the optimal behavior in each patch will eventually change. When it does, all previously learned behavior is rendered non-optimal. As a result, all social learning heuristics are at a disadvantage just after a change in the environment. Specifically, we will assume that the environment no longer varies spatially: all patches favor the same behavior. However, there is a chance $u$ in each generation that all patches switch to favoring a new behavior. Since there is a very large number of alternative behaviors, all previously learned behavior is then non-optimal.

This kind of environmental variation also forces us to contend with what evolutionary ecologists call geometric mean fitness (see Orr 2009 for a recent review and comparison of different evolutionary concepts of fitness). When environments vary through time, even a rare catastrophe can mean the end of a lineage. As a result, selection may favor risk-averse strategies that are adapted to statistical environments, instead of current environments (Gillespie 1974, Levins 1968).

Temporal variation may favor randomized or mixed strategies. Précis. A mixed heuristic is one in which individuals use two or more heuristics different proportions of the time. When the environment varies purely across space,

selection does not clearly favor either pure social learning heuristics or mixed social learning heuristics. When the environment varies through time, however, selection favors mixed heuristics over pure ones. In the case of unbiased social learning and individual updating, selection favors the mixed heuristic over both pure strategies.

One important result of temporal variation is that a strategy that mixes individual updating and social learning will often out-compete both pure strategies. In the spatial variation case, it makes no obvious difference whether individuals randomly update for themselves or learn socially. But when the environment varies through time, natural selection tends to favor "bet hedging" strategies that engage in adaptive randomization of behavior. The mathematics can be opaque at first, but grasping the cause is easy: survival and reproduction are multiplicative processes. As a result, if a lineage is ever reduced to a very small number of individuals, then it will take a long time to recover. Therefore a strategy has to both do well and avoid bottlenecks in order to grow quickly and sustain its numbers. When the environment varies temporally, selection favors heuristics that attend to both mean fitness and variance in fitness.

To apply this idea to our social learning heuristics, consider again the basic model of individual updating versus random-target social learning from before. We showed that there is a stable mix of individual updaters and social learners in this model, lying at:

(6) $\hat{p} = \dfrac{m\bigl((1+b)c - 1\bigr)}{(1-m)(1+b)(1-c)}$,

where $\hat{p}$ is the stable fraction of individual updaters. The average fitness of both individual updaters and social learners at this proportion is the same, $w_0(1+b)c$.

Another way to combine these heuristics is internally, within individuals. Suppose that there is a mixed strategy, IU, that uses individual updating a proportion $f$ of the time and unbiased social learning $1 - f$ of the time.
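Expression 6 can be verified numerically: at $p = \hat{p}$, the fitness of an unbiased social learner, computed from Expression 3, equals that of an individual updater. A sketch (ours; $w_0 = 1$ without loss of generality):

```python
# At p = p_hat the expected fitness of unbiased social learning,
# w0(1 + b * q_hat(p, m)), equals that of individual updating,
# w0(1 + b)c. We set w0 = 1 without loss of generality.

def q_hat(p, m):
    return (1 - m) * p / (1 - (1 - m) * (1 - p))

def p_hat(b, c, m):
    return m * ((1 + b) * c - 1) / ((1 - m) * (1 + b) * (1 - c))

b, c, m = 0.5, 0.8, 0.05
p = p_hat(b, c, m)
assert 0 < p < 1
assert abs((1 + b * q_hat(p, m)) - (1 + b) * c) < 1e-12
```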
We prove in Box 6 that natural selection does not distinguish between mixed and pure heuristics in this model. In general, when environmental variation is purely spatial, selection does not clearly distinguish between pure and mixed heuristics. (Although, in very small populations, even this isn't true; see Bergstrom & Godfrey-Smith 1998.) However, when the environment varies temporally, the answer changes. Now, because of the impact of temporal fluctuations in fitness, the environment ends up favoring the mixed heuristic that randomizes its use

Box 6: Spatial variation does not favor either mixed or pure heuristics. Consider an alternative "mixed" strategy that internally randomizes between individual updating (I) and unbiased social learning (U). This heuristic's expected fitness is:

$w(IU) = f\,w_0(1+b)c + (1-f)\,w_0(1 + b\hat{q})$,

where $f$ is the fraction of the time the individual randomly uses individual updating. To deduce what value of $f$ natural selection would favor, we find the value of $f$ that maximizes the fitness of the strategy, by solving $\partial w(IU)/\partial f\,|_{f=f^*} = 0$ for $f^*$. This yields:

$f^* = \dfrac{m\bigl((1+b)c - 1\bigr)}{(1-m)(1+b)(1-c)}$,

which is the same expression as (6), and therefore expected fitness at this optimal value of $f$ is also $w_0(1+b)c$, the same as either pure strategy at equilibrium.

of individual and social updating. This is a result of selection in temporally fluctuating environments depending upon geometric mean fitness, rather than arithmetic mean fitness (see Cooper & Kaplan 1982, Philippi & Seger 1989, for reviews). In general, if the environment varies temporally between two states, occurring with probabilities $p_1$ and $p_2$ respectively, then the long-term growth rate is:

$\bar{r} = w_1^{\,p_1} w_2^{\,p_2}$.

This is in fact the geometric mean fitness of the strategy. Evolutionary ecologists usually work with the natural logarithm of this average, $\log \bar{r} = p_1 \log w_1 + p_2 \log w_2$. In the case of analyzing social learning, the state of the environment is the time since the environment last changed, and this could be anything from one generation ago to an infinity of generations ago.

This might seem daunting at first, but it is really just an application of the logic above, extrapolating from two states of the environment to an infinity of states. The kind of fitness expression we seek is $\log \bar{r} = \sum_{i=1}^{n} p_i \log w_i$, for $n$ environmental states, where $\sum_{i=1}^{n} p_i = 1$. Let's take the geometric logic above and apply it to our models of social learning. We will assume again that there are a large number of alternative behaviors, but instead of pure spatial variation, we will now assume purely temporal variation. Each generation, there is a chance $u$ that the environment changes and makes another random behavior optimal, rendering all previously learned behavior non-optimal. Let $p$ again be the frequency of individual updating in the population. In Box 7, we demonstrate how to derive long-run growth rates under temporal environmental variation, in this case using the example of the pure unbiased social learning strategy, U.
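A two-state toy example (ours; the numbers are purely illustrative) makes the distinction concrete: a steady strategy can have the lower arithmetic mean fitness yet the higher geometric mean, and it is the geometric mean that governs long-run growth:

```python
# Toy illustration of geometric mean fitness with two environmental
# states (probabilities p1, p2). A "safe" strategy with steady fitness
# can beat a "risky" one with the higher arithmetic mean, because
# long-run growth is w1**p1 * w2**p2. Numbers are illustrative only.

p1, p2 = 0.5, 0.5
risky = (2.0, 0.4)   # boom-or-bust fitness in the two states
safe = (1.1, 1.1)    # constant fitness

def arithmetic_mean(w):
    return p1 * w[0] + p2 * w[1]

def geometric_mean(w):
    return w[0] ** p1 * w[1] ** p2

assert arithmetic_mean(risky) > arithmetic_mean(safe)
assert geometric_mean(risky) < geometric_mean(safe)
```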

Box 7: Log-geometric growth rate of unbiased social learning. Let $p$ again be the frequency of individual updating in the population. One generation after a change in the environment, the proportion of adaptive behavior is $q(1) = p$, because individual updaters have had one generation to pump new knowledge into the society. After one more generation without another change, $q(2) = p + (1-p)q(1) = p + (1-p)p = 1 - (1-p)^2$. Then $q(3) = p + (1-p)q(2) = 1 - (1-p)^3$. This series continues and implies that, if the environment changed $t > 0$ generations ago, the expected chance of acquiring optimal behavior via social learning is:

(7) $q(t) = \sum_{i=1}^{t} p(1-p)^{i-1} = 1 - (1-p)^t$.

We can compute the expected value of $q(t)$ for any generation $t$, if we are again willing to assume that $p$ changes slowly, relative to $q$. In any given generation, there is a chance $u$ that the environment changed in the most recent generation ($t = 0$), and therefore only those who updated individually have optimal behavior. There is a chance $u(1-u)$ that the environment changed one generation ago ($t = 1$). In general, there is a chance $u(1-u)^t$ that the environment last changed $t$ generations ago. Using the definition of log geometric mean fitness, we build the growth rate of an unbiased social learner (U) by multiplying each probability of each environmental state by the log-fitness in that state:

$r(U) = u \log w_0 + \sum_{t=1}^{\infty} u(1-u)^t \log\bigl[w_0(1 + b\,q(t))\bigr]$.

The above can be motivated in the following way. Expected fitness is the product of each fitness raised to the probability of its occurrence, so the expected log-fitness is the sum of each log-fitness multiplied by the probability it occurs. If the environment has just changed, which happens $u$ of the time, then the individual receives $w_0$.
The other possibility is that the environment has not changed in the last $t$ generations, yielding a chance $q(t)$ of fitness $w_0(1+b)$ and a chance $1 - q(t)$ of fitness $w_0$, for each individual using this heuristic. That is, every social learner experiences the same $q(t)$ in any generation $t$, and the value of $q(t)$ is determined by the probabilities $u(1-u)^t$. But a proportion $q(t)$ of social learners will get lucky and choose a target with optimal behavior, while the rest will not. Thus the probability $q(t)$ belongs inside the logarithm.
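The closed form in Expression 7 can be confirmed by running the recursion $q(t) = p + (1-p)q(t-1)$ directly (our sketch; the parameter value is arbitrary):

```python
# Check Expression 7: under the recursion q(t) = p + (1 - p) q(t-1)
# with q(0) = 0 (just after a change), the proportion of optimal
# behavior is q(t) = 1 - (1 - p)**t.

def q_recursive(p, t):
    q = 0.0
    for _ in range(t):
        q = p + (1 - p) * q
    return q

p = 0.2
for t in range(1, 30):
    assert abs(q_recursive(p, t) - (1 - (1 - p) ** t)) < 1e-12
```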

Once we have a geometric fitness expression for a heuristic, we can analyze its evolutionary dynamics. Unfortunately, the form of the expression in this case makes the mathematics intractable. There are no algebraic methods for closing such an infinite sum, in which the power $t$ appears both outside and inside the logarithm. We can make progress, however, by constructing an approximation that is valid for weak selection. In Box 8, we demonstrate how to construct weak selection approximations for this model.

Box 8: Weak selection approximation. Weak selection applies when $b$ and $1-c$ are small, such that terms of order $b^2$ and $(1-c)^2$ and greater are approximately zero. The use of weak selection approximations is common in evolutionary ecology, because it often makes otherwise intractable problems analytically solvable. One must keep in mind, however, that our conclusions from here on will only be exactly valid for choices that have modest effects on fitness. Numerical work, as well as the simulations we show in a later section, confirms that the qualitative conclusions we reach here generalize to strong selection, however. To apply the weak selection approximation, we use a Taylor series expansion of $r(U)$ around $b = 0$, $c = 1$ and keep the linear terms in $b$ and $1-c$. This gives us:

$r(U) \approx \log w_0 + \dfrac{(1-u)bp}{1-(1-p)(1-u)}$.

We wish to compare this expression to the weak-selection approximation of the growth rate of individual updating, which by the same method is $r(I) \approx \log w_0 + b - (1-c)$. Selection will adjust $p$ until $r(U) = r(I)$, which implies an expected long-run value of $p$:

(8) $\bar{p} = \dfrac{u\bigl(b - (1-c)\bigr)}{(1-u)(1-c)}$.

This is the same as Expression 6, once we apply the weak selection approximation and let $m = u$. We will need this result in order to compare the two pure heuristics to the mixed heuristic in the temporal variation context.
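The quality of the weak selection approximation is easy to probe numerically. This sketch (ours; truncation depth and parameter values arbitrary) truncates the exact series for $r(U)$ at a large $t$ and compares it with the Box 8 approximation for a small $b$:

```python
import math

# Compare the exact series for r(U) (truncated at large t) with the
# weak-selection approximation in Box 8. With small b the two should
# agree closely; w0 = 1 throughout.

def r_U_exact(p, u, b, terms=5000):
    total = u * math.log(1.0)  # t = 0 term: fitness w0 = 1
    for t in range(1, terms):
        q_t = 1 - (1 - p) ** t
        total += u * (1 - u) ** t * math.log(1 + b * q_t)
    return total

def r_U_weak(p, u, b):
    return (1 - u) * b * p / (1 - (1 - p) * (1 - u))

p, u, b = 0.3, 0.1, 0.01
assert abs(r_U_exact(p, u, b) - r_U_weak(p, u, b)) < 1e-4
```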

Finally we are ready to consider again, now in the context of temporal variation, the family of alternative heuristics that randomize their updating, using individual updating (I) a proportion $f$ of the time and unbiased social updating (U) a proportion $1-f$ of the time. Using the same logic that allows writing the expression $r(U)$ in Box 7, the long-run growth of this mixed heuristic is:

$r(IU) = \sum_{t=0}^{\infty} u(1-u)^t \log\bigl[f\,w_0(1+b)c + (1-f)\,w_0(1 + b\,q(t))\bigr]$.

Using this expression, we prove in Box 9 that the mixed heuristic that randomly uses individual and social updating both invades and replaces a population of pure individual and social updaters. This was not the case for purely spatial environmental variation. The different result owes to bet-hedging (Philippi & Seger 1989) against the small payoff to social updaters soon after a change in the environment. Because the mixed heuristic spreads its bets over two different portfolio options, individual updating and social updating, it experiences a reduced risk of ruin, like pure individual updating, but also reaps higher rewards when the quality of socially learned behavior is high, like unbiased social learning. This is mathematically homologous to

Box 9: Evolutionary dynamics of IU under weak selection. Assuming the IU heuristic is rare, that $q(t)$ is therefore determined only by the proportion of pure individual updaters, $p$, and again that selection is weak, the expression for $r(IU)$ closes to:

$r(IU) \approx \log w_0 + \dfrac{bp(1-u) + uf\bigl(b - (1-c)(1-f)\bigr)}{1-(1-p)(1-u)} - (1-c)f$.

By comparing the above to the growth rates for the pure strategies, it turns out that this mixed strategy can invade the stable mix of pure strategies, for any value of $f$. The condition for a rare IU individual to invade the mix of I and U is $r(IU)\,|_{p=\bar{p}} > r(I)$, where $\bar{p}$ is defined by Expression 8 in Box 8. This condition is satisfied for all $b > 0$, $0 < c < 1$, $0 < u < 1$, which means the mixed strategy can always invade a population of pure heuristics. This kind of result is typical of game theoretic solutions of this kind. The value of $f$ does not matter for invasion, because whatever the value of $f$, the first mixed-strategy individual will behaviorally simulate either a pure I or a pure U individual. Since both I and U have the same fitness at $p = \bar{p}$, it makes no difference which of these heuristics is realized. The value of $f$ will matter, however, as IU increases in frequency. Once common, it turns out that the mixed heuristic IU is also always stable against invasion by pure I and U. To prove this, we need to calculate the optimal value of $f = f^*$ that no other value of $f$ can invade. When IU is common, the proportion of optimal behavior is now given by $q(t) = \sum_{i=1}^{t} f^*(1-f^*)^{i-1}$, where $f^*$ is the common chance an IU individual updates individually. The evolutionarily stable value of $f^*$ is found where $\partial r(IU)/\partial f\,|_{f=f^*} = 0$. Again using a weak selection approximation and solving for $f^*$ yields:

(9) $f^* = \dfrac{u\bigl((1+b)c - 1 + u((1-b)(1-c) - b)\bigr)}{(1-c)(1-u)^2}$.

By substituting the value of $f^*$ back into $r(IU)$, one can derive the growth rate of the mixed strategy when it is common and using the optimal value of $f$.
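The invasion claim in Box 9 can also be checked without the weak-selection machinery. The sketch below (ours; truncation depth and parameter values arbitrary) locates the pure-strategy equilibrium $p^*$ where $r(U) = r(I)$ by bisection, then confirms that $r(IU) > r(I)$ there for interior values of $f$. The advantage is guaranteed by the strict concavity of the logarithm (Jensen's inequality), which is exactly the bet-hedging effect:

```python
import math

# Find the mix p* of pure heuristics where r(U) = r(I) by bisection,
# then check that the mixed heuristic IU beats r(I) there for any
# 0 < f < 1. w0 = 1; series truncated at large t.

TERMS = 4000

def r_I(b, c):
    return math.log((1 + b) * c)

def r_U(p, u, b, terms=TERMS):
    return sum(u * (1 - u) ** t * math.log(1 + b * (1 - (1 - p) ** t))
               for t in range(terms))

def r_IU(f, p, u, b, c, terms=TERMS):
    return sum(u * (1 - u) ** t *
               math.log(f * (1 + b) * c +
                        (1 - f) * (1 + b * (1 - (1 - p) ** t)))
               for t in range(terms))

u, b, c = 0.1, 0.5, 0.9
lo, hi = 1e-6, 1 - 1e-6
for _ in range(60):  # bisection: r_U(p) increases in p
    mid = (lo + hi) / 2
    if r_U(mid, u, b) < r_I(b, c):
        lo = mid
    else:
        hi = mid
p_star = (lo + hi) / 2
for f in (0.1, 0.5, 0.9):
    assert r_IU(f, p_star, u, b, c) > r_I(b, c)
```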
We ask when $r(IU)\,|_{f=p=f^*} > r(I)$ and when $r(IU)\,|_{f=p=f^*} > r(U)\,|_{p=f^*}$. Both of these conditions hold for all $b > 0$, $0 < c < 1$, $0 < u < 1$, and so the mixed heuristic can both invade a population of pure heuristics and resist invasion by either pure heuristic.

human investors spreading risk over multiple stocks, because even if the chance of any individual stock losing most of its value is low, if all assets are placed in a single stock, eventually all of one's assets will lose most of their value. Similarly, the mixed heuristic out-performs both pure heuristics, because this strategy is never entirely ruined by a change in the environment, but neither does it entirely forgo the gains from social learning that accrue when the environment remains stable. Both pure heuristics instead take risks by betting entirely on one kind of event or the other.

The important lesson here is that temporal variation strongly favors a fundamentally different way of combining heuristics, mixing them within individuals rather than among individuals. In this way, the physical environment, whether variation is spatial or temporal, favors different social learning strategies and combinations of those strategies. This in turn leads us to make different predictions about the kinds of heuristics that will be adapted to different domains of behavior, depending in part upon the relative strengths of spatial and temporal variation.

It should be noted, however, that the model in this section is still just a model: it is limited to fairly particular assumptions about the environment and the heuristics. The temporal variation here is not auto-correlated; if the environment has just changed, it is no more or less likely to change again. Real environments, ecological measurements suggest, tend to include a good amount of autocorrelation, as evidenced by "red" noise in their time series (Whitehead & Richerson 2009).
While we see no reason that the conclusions here should not extend, qualitatively, to autocorrelated environments, we do believe that it is a problem worth modeling.

Consensus learning less favored under temporal variation. Précis. When the environment varies through time, consensus social learning does much worse than it does under spatial variation. It can never exclude individual updating, because every time the environment changes, $q < 1/2$ for at least one generation. As a result, consensus learners do badly compared to unbiased learners. However, adding in some spatial variation as well helps consensus learning recover.

In the case of purely spatial variation, we already demonstrated that consensus social learning can in fact exclude both simple social learning and individual updating, provided the rate of mixing between locales is sufficiently low. Under purely temporal variation, consensus learning can never exclude the other learning heuristics. Just after a change in the optimal behavior, all previously-learned behavior is non-optimal. Therefore inferring behavior from the majority will tend to stabilize non-adaptive behavior. As a result, consensus learning depends upon some other heuristic, or mix of heuristics, to increase the frequency of newly-optimal behavior after a change in the environment.

The mathematics of this case are complex, because accounting for geometric fitness effects and the non-linearities of consensus learning is analytically difficult. But the results are easy to visualize in simulation from the fitness definitions. (The short simulation code is contained in the appendix at the end of this chapter.) Figure 2 plots the


Figure 2. Stochastic simulations of the evolution of consensus learning, under either spatial or temporal environmental variation. The thin solid trend in each panel is the proportion of the population that uses individual updating. The thick dotted and thick solid curves are unbiased social learning and consensus learning, respectively. In both panels, $b = 0.5$, $c = 0.8$. Initial proportions for individual, unbiased, and consensus: 0.8, 0.1, 0.1. Top panel: purely spatial variation, $m = 0.05$, $u = 0$. Consensus learning evolves to exclude both individual and unbiased social learning. Bottom panel: purely temporal variation, $m = 0$, $u = 0.05$. Consensus learning now does much worse, owing to its inability to increase the frequency of rare, novel adaptive behavior.

proportions of individual updating (thin solid trend), unbiased social learning (thick dotted trend), and consensus learning (thick solid trend) through time, for both pure spatial and pure temporal environmental variation. In the absence of temporal variation in optimal behavior, consensus learning can actually exclude both individual updating and


Figure 3. Stochastic simulations of the evolution of consensus learning, under simultaneous spatial and temporal environmental variation. The thin solid trend is the proportion of the population that uses individual updating. The thick dotted and thick solid curves are unbiased social learning and consensus learning, respectively. Same parameter values and initial proportions as in Figure 2. Consensus social learning can out-compete unbiased social learning, even in the presence of temporal variation, provided there is enough mixing among spatially variable patches. Here, $u = m = 0.05$.

unbiased social learning (top panel). However, under temporal variation in the environment, consensus learning does quite poorly, owing to its drop in frequency each time the environment shifts from one optimal behavior to another. A mix of individual updating and unbiased social learning evolves instead.

A small amount of spatial variation and mixing can go a long way towards helping consensus social learning, however (Figure 3). While temporal variation hurts consensus learning much more than it hurts unbiased social learning, spatial variation and mixing hurt unbiased learning more than they hurt consensus social learning. After a change in the environment, consensus social learners suffer reduced fitness, declining in frequency as individual updating increases in frequency (see the time series in Figure 3). But once the local frequency of optimal behavior has increased, unbiased social learners have no particular advantage over consensus social learners. Meanwhile, consensus social learners avoid learning from immigrants with behavior adapted to other patches, while unbiased social learners do not. Reintroducing mixing among spatially variable patches provides a constant environmental challenge that partially resuscitates consensus learning.
Therefore it is not valid to conclude that consensus learning is poorly adapted whenever environments vary through time. Instead, we should conclude that temporal variation works against consensus learning, while mixing and spatial variation work for it. If either force is strong enough, it can eclipse the other.

We caution the reader that the same principle holds for consensus social learning in temporally varying environments as holds for unbiased social learning: a mixed individual-updating-and-consensus strategy will do better than a mix of pure strategies. We do not belabor this point here, because we know of no additional intuitions to be acquired from the analysis. But we do not want readers to conclude that mixed randomizing heuristics would not be favored for consensus learning as they would be for unbiased social learning. Indeed, the bet-hedging will arguably be stronger in the case of consensus learning, because the effects of a recent change in the environment are harsher for consensus learning than they are for unbiased social learning. At the same time, since consensus learning can drive the proportion of optimal behavior both down (when $q < 1/2$) and up (when $q > 1/2$), the dynamics may be much more complex and interesting.
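A compact replicator-style simulation conveys the contrast between the two panels of Figure 2. The sketch below is our own minimal reconstruction, not the chapter's appendix code: it tracks, for each heuristic, the probability that a carrier holds optimal behavior, and the update order and per-strategy bookkeeping are our assumptions. The spatial case reproduces the qualitative outcome in the top panel (consensus fixes); how closely the temporal case tracks the bottom panel depends on these guessed details:

```python
import random

# Minimal reconstruction (ours) of the dynamics behind Figures 2 and 3.
# An infinite population carries three heuristics: individual updating
# (I), unbiased social learning (U), and consensus learning (C). Each
# strategy class tracks its probability of holding optimal behavior;
# proportions evolve by discrete replicator dynamics. Migration (m)
# injects non-optimal behavior each generation; with probability u the
# environment shifts and erases all learned behavior. Update order and
# initialization are our assumptions; w0 = 1.

def simulate(m, u, b=0.5, c=0.8, gens=2000, seed=1):
    random.seed(seed)
    p = {"I": 0.8, "U": 0.1, "C": 0.1}   # strategy proportions
    q = {"I": 1.0, "U": 0.0, "C": 0.0}   # chance of optimal behavior
    for _ in range(gens):
        if random.random() < u:          # environment shifts
            q = {"I": 1.0, "U": 0.0, "C": 0.0}
        # proportion of optimal behavior among potential targets,
        # after a fraction m of non-optimal immigrants arrives
        Q = (1 - m) * sum(p[s] * q[s] for s in p)
        q = {"I": 1.0,
             "U": Q,
             "C": Q + Q * (1 - Q) * (2 * Q - 1)}
        w = {"I": (1 + b) * c,
             "U": 1 + b * q["U"],
             "C": 1 + b * q["C"]}
        wbar = sum(p[s] * w[s] for s in p)
        p = {s: p[s] * w[s] / wbar for s in p}
    return p

spatial = simulate(m=0.05, u=0.0)   # consensus should approach fixation
temporal = simulate(m=0.0, u=0.05)
```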

Summary. In this section, we analyzed the effects of temporal environmental variation on unbiased and consensus social learning heuristics. Temporal variation requires a different approach to calculating evolutionarily-relevant payoffs, because if a strategy is reduced to zero numbers in any generation, then the strategy is dead forever. This "bottleneck" effect can have important consequences for the evolutionary rationality of heuristics. This principle led us to two main results. First, temporal variation can favor internally mixed heuristics, when purely spatial variation does not. The reason is that temporal variation favors bet-hedging heuristics that spread risk across alternative behavioral strategies. In this case, a mixed strategy that randomly deploys individual updating and unbiased social updating always replaces a population of pure individual updating and unbiased social learning strategies, when there is purely temporal variation in the environment.

Second, consensus social learning is disadvantaged under temporal variation. The reason is that, just after a change in the environment, all learned behavior is non-optimal. As a consequence, the majority behavior provides an invalid cue to optimality in this context. Once some other heuristic or set of heuristics has again increased optimal behavior in the population, consensus can do well, but the fitness lost during the transition can cause it to be out-competed by other learning heuristics. This does not happen under purely spatial variation, because a constant stream of immigrants actually provides an environmental challenge that consensus learning is well-suited to, provided that mixing is not so strong as to make the majority of local behavior non-optimal.
Simultaneously combining spatial and temporal variation shows that consensus learning can be profitable when temporal variation is present, provided that there is enough spatial mixing and spatial variation.

Conclusion

We have tried to show how the way in which optimal behavior varies across space or through time favors different social learning heuristics. We are skeptical that there will be any learning strategy, social or not, that is best in all contexts. Instead, the type of analysis in this chapter suggests that, over either evolutionary or developmental time, individuals acquire strategies that exploit patterns in specific environments. In this way, the tradition in evolutionary ecology of studying cognitive adaptation via social learning is quite similar to the tradition in bounded rationality. And like some analyses in bounded rationality, the environments in this chapter are statistical. Instead of adapting to a single state of the world, the theoretical organisms in our thought experiments adapt to a statistical world in which randomness and variation present survival challenges. Successful heuristics are the ones that out-reproduce competitors over many generations of learning and choice, sometimes hedging their bets against unpredictable bad times. In any particular generation, a social learning heuristic can appear nonsensical. It is in the long run, across the full distribution of environmental dynamics, that the evolutionary rationality of each heuristic appears. The breadth of issues relevant to the evolution of social learning is huge. We have focused on the nature of environmental variation, because this topic has long been central to the study of learning, social or not, in evolutionary ecology (Levins 1968). Indeed, fluctuating selection has turned out to be central to broad debates that touch upon most corners of evolutionary biology (Gillespie 1994). Organisms are not adapted to a static world of stationary challenges, but rather to a mosaic world that varies across space and fluctuates through time.
A satisfactory account of the design of heuristics will include consideration of this fact, even if analyzing static decision problems is often a necessary step. The precise kind of variation affected our conclusions. This result reinforces the message that consideration of a stochastic world will have an important role to play in the study of heuristics, social or otherwise. In some cases, even scholars studying the evolution of social learning in fluctuating environments have missed the importance of the precise assumptions about the nature of the statistical environment. Wakano & Aoki (2007) analyzed a model of the evolution of consensus social learning in a temporally varying environment and reached different conclusions than Henrich & Boyd (1998), who studied the evolution of consensus learning under simultaneous spatial and temporal variation. As we showed in this chapter, temporal variation selects against consensus social learning in a way that spatial variation does not. Although Wakano & Aoki (2007) note the different assumptions about the nature of the environment, they decide without analysis that the divergent assumptions have no role in explaining their divergent results. They instead speculate that Henrich and Boyd did not run their simulations to convergence. Explicitly testing the different performance of consensus learning under both models of environmental variation would have shed more light on the issue. Whitehead & Richerson (2009) use simulations to demonstrate that some kinds of temporal variation are indeed worse for consensus learning than others, serving to re-emphasize the importance of exactly what we assume in the statistical model of the environment. More broadly, the analysis of simple social learning strategies strongly suggests that some kind of social learning will be adaptive, unless environments are extremely unpredictable.
While the things people say and do are not always locally adaptive, the very action of a toolbox of social and individual updating heuristics can help construct social environments in which it is worth attending to the beliefs of others. These thought experiments therefore suggest one reason that people are so powerfully influenced by mere words.

Appendix: Simulation code

This code runs in Mathematica version 7, although it should run without modification in earlier versions as well. By changing the values of the parameters u and m, one can replicate the results in the figures in the main text.

p = 0.8; pc = 0.1; q = 0; b = 0.5; c = 0.8; s = 1; u = 0.0; m = 0.05;
hp = {}; hpc = {}; hq = {};
w0 = 1;
For[t = 0, t < 1000, t++,
 r = RandomReal[];
 q = (1 - m)*If[r < u, 0,
    p s + (1 - p - pc) q + pc (q + q (1 - q) (2 q - 1))];
 wi = w0 (1 + s b) c;
 wu = w0 (1 + b q);
 wc = w0 (1 + b (q + q (1 - q) (2 q - 1)));
 wbar = p wi + (1 - p - pc) wu + pc wc;
 pp = N[p wi/wbar];
 ppc = N[pc wc/wbar];
 AppendTo[hp, pp];
 AppendTo[hpc, ppc];
 AppendTo[hq, q];
 p = pp;
 pc = ppc;
 ];
Clear[pp, p, q, b, c, s, u, w0, f, pc, ppc, m];
ListPlot[{hp, hpc, 1 - hp - hpc}, Joined -> True,
 PlotStyle -> {{Black}, {Black, Thick}, {Black, Thick, Dotted}},
 AxesLabel -> {time, frequency}]
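For readers without Mathematica, the recursion is straightforward to port. The sketch below is our own Python translation of the code above; the interpretive comments are our reading of the model, not the authors' annotations. With u = 0, as here, the run is deterministic.

```python
import random

# Parameters as in the Mathematica code above
p, pc, q = 0.8, 0.1, 0.0   # freq. of individual updaters, consensus learners, optimal behavior
b, c, s = 0.5, 0.8, 1.0    # benefit, multiplicative learning cost, success of individual updating
u, m, w0 = 0.0, 0.05, 1.0  # rate of environmental change, mixing rate, baseline fitness

hp, hpc, hq = [], [], []
for t in range(1000):
    if random.random() < u:
        q = 0.0            # environment changed: all learned behavior now non-optimal
    else:
        q = p * s + (1 - p - pc) * q + pc * (q + q * (1 - q) * (2 * q - 1))
    q *= 1 - m             # immigration dilutes locally optimal behavior
    wi = w0 * (1 + s * b) * c                            # individual updating
    wu = w0 * (1 + b * q)                                # unbiased social learning
    wc = w0 * (1 + b * (q + q * (1 - q) * (2 * q - 1)))  # consensus learning
    wbar = p * wi + (1 - p - pc) * wu + pc * wc
    p, pc = p * wi / wbar, pc * wc / wbar
    hp.append(p); hpc.append(pc); hq.append(q)
```

Plotting hp, hpc, and 1 − hp − hpc against t reproduces the three strategy trajectories, as the ListPlot call does above.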

References

Aoki, K., Wakano, J., & Feldman, M. W. (2005). The emergence of social learning in a temporally changing environment: A theoretical model. Current Anthropology, 46:334–340.
Bauch, C. T. (2005). Imitation dynamics predict vaccinating behaviour. Proc. R. Soc. B, 272:1669–1675.
Bergstrom, C. T. & Godfrey-Smith, P. (1998). Pure versus mixed strategists: the evolution of behavioral heterogeneity in individuals and populations. Biology and Philosophy, 13:205–231.
Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100:992–1026.
Bonner, J. T. (1980). The Evolution of Culture in Animals. Princeton University Press, Princeton.
Boyd, R. & Richerson, P. J. (1985). Culture and the Evolutionary Process. Univ Chicago Press, Chicago.
Boyd, R. & Richerson, P. J. (1988). An evolutionary model of social learning: the effects of spatial and temporal variation. In Zentall, T. R. & Galef, B. G., Jr., editors, Social Learning: Psychological and Biological Approaches, pages 29–47. Erlbaum, Hillsdale, New Jersey.
Boyd, R. & Richerson, P. J. (1995). Why does culture increase human adaptability? Ethology and Sociobiology, 16:125–143.
Boyd, R. & Richerson, P. J. (2001). Norms and bounded rationality. In Gigerenzer, G. & Selten, R., editors, The Adaptive Tool Box, pages 281–296. MIT Press, Cambridge MA.
Boyd, R. & Richerson, P. J. (2005). Rationality, imitation, and tradition. In The Origin and Evolution of Cultures, chapter 18, pages 379–396. Oxford University Press, Oxford.
Cavalli-Sforza, L. L. & Feldman, M. W. (1981). Cultural Transmission and Evolution: A Quantitative Approach. Princeton University Press, Princeton.
Cooper, W. S. & Kaplan, R. H. (1982). Adaptive “coin-flipping”: a decision-theoretic examination of natural selection for random individual variation. Journal of Theoretical Biology, 94:135–151.
Dunlap, A. S. & Stephens, D. W. (2009). Components of change in the evolution of learning and unlearned preference. Proc. R. Soc. B, doi:10.1098/rspb.2009.0602.
Efferson, C., Lalive, R., Richerson, P. J., McElreath, R., & Lubell, M. (2008). Conformists and mavericks: The empirics of frequency-dependent cultural transmission. Evolution and Human Behavior, 29:56–64.
Evans-Pritchard, E. E. (1937). Witchcraft, Oracles and Magic Among the Azande. Oxford University Press, Oxford.
Fragaszy, D. & Perry, S., editors (2003). The Biology of Traditions: Models and Evidence. Cambridge University Press.
Galef, B. G. (1992). The question of animal culture. Human Nature, 3(2):157–178.
Galef, B. G., Jr. (1996). Social enhancement of food preferences in Norway rats: A brief review. In Social Learning and Imitation: The Roots of Culture.
Gillespie, J. H. (1974). Natural selection for within-generation variance in offspring number. Genetics, 76:601–606.
Gillespie, J. H. (1994). The Causes of Molecular Evolution. Oxford University Press, Oxford.
Gintis, H. (2000). Game Theory Evolving. Princeton University Press, Princeton.
Giraldeau, L.-A. & Caraco, T. (2000). Social Foraging Theory. Princeton University Press, Princeton.
Henrich, J. (2001). Cultural transmission and the diffusion of innovations: Adoption dynamics indicate that biased cultural transmission is the predominate force in behavioral change and much of sociocultural evolution. American Anthropologist, 103:992–1013.
Henrich, J. & Boyd, R. (1998). The evolution of conformist transmission and the emergence of between-group differences. Evolution and Human Behavior, 19:215–242.
Henrich, J. & Boyd, R. (2001). Why people punish defectors: Weak conformist transmission can stabilize costly enforcement of norms in cooperative dilemmas. Journal of Theoretical Biology, 208:79–89.
Henrich, J. & Gil-White, F. (2001). The evolution of prestige: Freely conferred status as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior, 22:165–196.
Henrich, J. & McElreath, R. (2003). The evolution of cultural evolution. Evolutionary Anthropology, 12:123–135.
Laland, K. N. & Galef, B. G., editors (2009). The Question of Animal Culture. Harvard University Press, Cambridge, MA.
Lande, R. (2009). Adaptation to an extraordinary environment by evolution of phenotypic plasticity and genetic assimilation. Journal of Evolutionary Biology, 22:1435.
Lehmann, L. & Feldman, M. W. (2008). The co-evolution of culturally inherited altruistic helping and cultural transmission under random group formation. Theoretical Population Biology, 73:506–516.
Levins, R. (1968). Evolution in Changing Environments. Princeton University Press, Princeton.
McElreath, R., Bell, A. V., Efferson, C., Lubell, M., Richerson, P. J., & Waring, T. (2008). Beyond existence and aiming outside the laboratory: Estimating frequency-dependent and payoff-biased social learning strategies. Philosophical Transactions of the Royal Society B, 363:3515–3528.
McElreath, R. & Boyd, R. (2007). Mathematical Models of Social Evolution: A Guide for the Perplexed. Univ of Chicago Press, Chicago.
McElreath, R., Boyd, R., & Richerson, P. J. (2003). Shared norms and the evolution of ethnic markers. Current Anthropology, 44:122–129.
McElreath, R., Lubell, M., Richerson, P. J., Waring, T., Baum, W., Edsten, E., Efferson, C., & Paciotti, B. (2005). Applying evolutionary models to the laboratory study of social learning. Evolution and Human Behavior, 26:483–508.
McElreath, R. & Strimling, P. (2008). When natural selection favors imitation of parents. Current Anthropology, 49:307–316.
Mesoudi, A. (2008). An experimental simulation of the “copy-successful-individuals” cultural learning strategy: Adaptive landscapes, producer-scrounger dynamics and informational access costs. Evolution and Human Behavior, 29:350–363.

Mesoudi, A. & Lycett, S. J. (2009). Random copying, frequency-dependent copying and culture change. Evolution and Human Behavior, 30:41–48.
Mesoudi, A. & O’Brien, M. (2008). The cultural transmission of Great Basin projectile point technology I: An experimental simulation. American Antiquity, 73:3–28.
Mesoudi, A. & Whiten, A. (2008). The multiple roles of cultural transmission experiments in understanding human cultural evolution. Philosophical Transactions of the Royal Society B, 363:3489–3501.
Orr, H. A. (2009). Fitness and its role in evolutionary genetics. Nature Reviews Genetics, doi:10.1038/nrg2603.
Philippi, T. & Seger, J. (1989). Hedging one’s evolutionary bets, revisited. Trends in Ecology and Evolution, 4(2):41–44.
Price, E. E., Lambeth, S. P., Schapiro, S. J., & Whiten, A. (2009). A potent effect of observational learning on chimpanzee tool construction. Proc. R. Soc. B, doi:10.1098/rspb.2009.0640.
Richerson, P. J. & Boyd, R. (2005). Not by Genes Alone: How Culture Transformed Human Evolution. University of Chicago Press, Chicago.
Rogers, A. R. (1988). Does biology constrain culture? American Anthropologist, 90:819–831.
Schlag, K. H. (1998). Why imitate, and if so, how? Journal of Economic Theory, 78(1):130–156.
Schlag, K. H. (1999). Which one should I imitate? Journal of Mathematical Economics, 31(4):493–522.
Stahl, D. O. (2000). Rule learning in symmetric normal-form games: Theory and evidence. Games and Economic Behavior, 32:105–138.
Wakano, J. & Aoki, K. (2007). Do social learning and conformist bias coevolve? Henrich and Boyd revisited. Theoretical Population Biology, 72:504–512.
Wakano, J., Aoki, K., & Feldman, M. (2004). Evolution of social learning: a mathematical analysis. Theoretical Population Biology, 66:249–258.
Whitehead, H. & Richerson, P. (2009). The evolution of conformist social learning can cause population collapse in realistically variable environments. Evolution and Human Behavior, 30:261–273.

McElreath: [email protected]. Department of Anthropology and Graduate Groups in Ecology, Population Biology and Animal Behavior, University of California, Davis CA 95616, USA.

Fasolo: Department of Management (Operational Research Group), London School of Economics and Political Science, G313 Houghton Street, London WC2A 2AE UK.

Wallin: Department of Philosophy, Lund University, Kungshuset, Lundagård, 222 22 Lund SE.

Phil. Trans. R. Soc. B (2008) 363, 3515–3528. doi:10.1098/rstb.2008.0131. Published online 17 September 2008.

Beyond existence and aiming outside the laboratory: estimating frequency-dependent and pay-off-biased social learning strategies

Richard McElreath1,*, Adrian V. Bell2, Charles Efferson3, Mark Lubell2, Peter J. Richerson2 and Timothy Waring2

1Department of Anthropology, and 2Department of Environmental Science and Policy, University of California Davis, Davis, CA 95616, USA
3Institute for Empirical Research in Economics, Blümlisalpstrasse 10, 8006 Zürich, Switzerland

The existence of social learning has been confirmed in diverse taxa, from apes to guppies. In order to advance our understanding of the consequences of social transmission and evolution of behaviour, however, we require statistical tools that can distinguish among diverse social learning strategies. In this paper, we advance two main ideas. First, social learning is diverse, in the sense that individuals can take advantage of different kinds of information and combine them in different ways. Examining learning strategies for different information conditions illuminates the more detailed design of social learning. We construct and analyse an evolutionary model of diverse social learning heuristics, in order to generate predictions and illustrate the impact of design differences on an organism’s fitness. Second, in order to eventually escape the laboratory and apply social learning models to natural behaviour, we require statistical methods that do not depend upon tight experimental control. Therefore, we examine strategic social learning in an experimental setting in which the social information itself is endogenous to the experimental group, as it is in natural settings. We develop statistical models for distinguishing among different strategic uses of social information.
The experimental data strongly suggest that most participants employ a hierarchical strategy that uses both average observed pay-offs of options as well as frequency information, the same model predicted by our evolutionary analysis to dominate a wide range of conditions. Keywords: cultural evolution; social learning; quantitative methods

1. INTRODUCTION

Under a broad definition, social learning is common in nature. The behaviour of conspecifics influences individual behaviour through modification of the environment, emulation of goals and imitation of patterns (cf. Whiten & Ham 1992). This psychological set of distinctions has directed years of research in animal behaviour, especially the study of social learning in non-human apes. Distinguishing between emulation and imitation, and the interaction of the two (Horner & Whiten 2005), has generated a literature testifying to the breadth and diversity of social learning in nature (Fragaszy & Perry 2003). More recent high-profile experiments with chimpanzees (Whiten et al. 2005, 2007) have demonstrated that short-lived traditions can evolve in chimpanzee social groups, backing up studies that claim that behavioural variation among wild populations of chimpanzees is ‘cultural’ (Boesch & Tomasello 1998; Whiten et al. 1999; Boesch 2003). While the finding of short-lived socially transmitted traditions may not be surprising to students of Galef’s rat experiments (Galef & Whiskin 1997), the findings suggest that the time may be right to attempt a more serious exchange between the evolutionary anthropology literature on social learning—which emphasizes a toolbox of social learning strategies, such as majority rule conformity and pay-off-biased learning (Boyd & Richerson 1985; Henrich & McElreath 2003)—and the animal literature—which tends to emphasize the existence or not of culture.

There are at least two good reasons to try. First, non-human animals may also have special-purpose social learning strategies that combine and recombine different kinds of social information, yet usually no effort is made to look for these (Laland 2004). Finding such cases of analogy (or possibly homology, in the case of other apes) would potentiate advances in the general understanding of the evolution of adaptations for processing social information. Second, many biologists and anthropologists remain sceptical of the evidence of animal, and especially great ape, culture (Laland & Janik 2006). This is partly a result of the difficulty of inferring patterns of learning from cross sections of behavioural variation. However, statistical tools developed to study dynamic learning in human groups can be leveraged to study diverse social learning strategies in other animals, as well.

In this paper, we illustrate an approach for analysing different strategies for combining social cues from multiple conspecifics, in less poorly controlled settings.

* Author for correspondence ([email protected]).
One contribution of 11 to a Theme Issue ‘Cultural transmission and the evolution of human behaviour’.
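The kind of likelihood model the authors have in mind can be sketched in a few lines. The example below is our own hedged construction, not the paper's code or experiment: we simulate choices from a conformist learner who saw k of three demonstrators pick option 1, then score an unbiased model (copy a random demonstrator) against a conformist model by AIC. Neither toy model has free parameters here, so AIC reduces to −2 × log-likelihood.

```python
import math
import random

def p_unbiased(k):
    # copy one random demonstrator: P(choose option 1) = sample frequency
    return k / 3

def p_conformist(k):
    # conformist transform of the sample frequency: disproportionately
    # favors the majority option
    f = k / 3
    return f + f * (1 - f) * (2 * f - 1)

def log_lik(model, data):
    ll = 0.0
    for k, chose1 in data:
        p = model(k)
        ll += math.log(p if chose1 else 1 - p)
    return ll

# Simulate 5000 choices from a true conformist; k is restricted to 1 or 2
# so both models assign probabilities strictly between 0 and 1.
random.seed(1)
data = [(k, random.random() < p_conformist(k))
        for k in (random.choice((1, 2)) for _ in range(5000))]

aic = {name: -2 * log_lik(model, data)      # zero fitted parameters here
       for name, model in (("unbiased", p_unbiased),
                           ("conformist", p_conformist))}
# The generating (conformist) model should win, i.e. have the lower AIC.
```

With fitted parameters one would add 2 × (number of parameters) to each model's score; the comparison logic is unchanged.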

This journal is © 2008 The Royal Society.

We use a stylized evolutionary model to generate broad predictions for which of several candidate strategies we expect to find in nature and under what conditions. We then apply these stylized predictions to a laboratory experiment that allows participants great flexibility in from whom and how they learn. Instead of asking if social learning occurs, we develop likelihood models that allow us to ask how participants socially learn. While the precise example we present uses very detailed information, the same approach can be applied to more naturalistic contexts, in which incomplete time series or purely cross-sectional data are all that are available.

While neither the appreciation of strategic diversity nor our model-based approach is particularly new in itself, we think the combination is of value. The key insight is that each social learning strategy implies different outcomes, under at least some sets of available information, both for each individual and entire groups of individuals. This is not a new point (Cavalli-Sforza & Feldman 1981; Boyd & Richerson 1985; Galef & Whiskin 1997), but statistical approaches are usually not up to the task of exploring it adequately. Those who do study distinctions among strategies may be inclined to rely upon highly controlled and artificial experiments. Even when an experimenter is clever enough to design a series of treatments that can carefully distinguish among diverse strategies in the laboratory, scientists will still debate the lessons of behaviour in the wild. In order to resolve animal culture debates and gain a more detailed behavioural understanding of social learning, whether in humans or other animals, we will need analytical approaches that do not require precise experimental control of social information.

Another reason to develop statistical methods for less controlled contexts is that part of the action in social learning is evolution of behaviour, and experiments that control social information do not allow us to study these population-level effects nor how strategies are adapted to them.

The general approach we suggest is to (i) nominate a series of candidate social learning strategies, (ii) translate each of these into an expression for the conditional probability of behaviour, given an informational context for an individual animal, (iii) use these expressions to generate likelihoods of observing field or laboratory data, and (iv) compare the fits of these strategies to the data with information theoretic criteria, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Approaching the problem as a task of discriminating among a toolbox of potential strategies, rather than a task of demonstrating the existence of social learning, may allow all of us to squeeze more from both our experiments and field studies than we previously imagined.

2. MANY WAYS TO LEARN SOCIALLY

There was a time when biology wondered if natural selection occurred. Now no one—within evolutionary biology—seriously questions the existence of natural selection as an evolutionary force. Instead, we debate its relative strength and character in different environmental and biological contexts. Both sexual selection (Kokko et al. 2006) and social selection (Frank 2006) have generated special literatures of theory and evidence that testify to the subtle diversity of the action of natural selection. One could seriously say that there are many natural selections.

In a similar sense, there are many social learnings. Psychologists and animal behaviourists have long recognized taxonomic distinctions between, for example, social facilitation and imitation (Zajonc 1965). But many of the highest profile publications still address basic existence questions, asking if other animals have human-like social learning and human-like traditions or culture (Whiten et al. 2005). These publications are probably taking the right rhetorical approach. Many anthropologists remain unconvinced that chimpanzee or crow culture is much like human culture (Boesch 2003).

However, many scientists have enough interest in the details of social learning in humans, as well as other animals, to step aside the ‘is it human enough?’ debate. As social learning is diverse, it has diverse effects. Some mechanisms generate rather short-lived traditions, if any at all (Galef & Whiskin 1997). Human cultural traditions can be both ephemeral and demonstrate tremendous inertia (Richerson & Boyd 2005), depending in part upon the strategic diversity of social learning and the details of the social context (Cavalli-Sforza & Feldman 1981; Boyd & Richerson 1992). Studying the mechanistic and algorithmic diversity of social learning will be just as important as arguing that it exists, and our hunch is that most researchers in both anthropology and animal behaviour are prepared to move in this direction.

In this section, we briefly review evolutionary work on structurally different social learning strategies. Most of this literature has been concerned with human social and cultural learning (Boyd & Richerson 1985; Henrich & McElreath 2003), but there is no reason these models cannot apply to other organisms (Laland 2004). Before moving on to apply these different strategies to experimental data, we hope to convince the reader that it is worth asking, for example, if chimpanzees also use majority rule social learning or are guided by observed cues of others’ success. While no single strategy is imagined to dominate at all times nor to exist in the absence of individual learning, the dynamic consequences of each strategy can be appreciated most easily by first examining them in isolation.

(a) Unbiased social learning
One of the simplest social learning strategies is to select a random target individual and copy his or her behaviour. We call this kind of social learning ‘unbiased’, as it tends to maintain the frequencies of different behaviour (Cavalli-Sforza & Feldman 1981; Boyd & Richerson 1985). One adaptive advantage of unbiased social learning is economizing on individual learning costs (Boyd & Richerson 1985).

(b) Frequency-dependent social learning
When individuals can sample more than one conspecific, a large family of frequency-dependent strategies become possible. The most commonly


[Figure 1 appeared here. Caption: Figure 1. (a,c) Instantaneous and (b,d) evolutionary dynamics of frequency-dependent and pay-off-biased social learning. Panels (a,b): positive frequency dependence; panels (c,d): pay-off bias and pay-off bias with frequency dependence. Vertical axes: frequency of trait after imitation, or expected frequency of trait; horizontal axes: frequency of trait before imitation (a,c) and time (b,d).]

studied of these is positive frequency dependence, which preferentially copies the most common behaviour variants in the sample. Such a strategy has very deep intellectual roots, being studied formally at least as far back as 1785, in Condorcet’s jury theorem (see Estlund 1994). Evolutionary treatments of positive frequency dependence, ‘conformity’, emphasize its adaptive value for individuals (Boyd & Richerson 1985; Henrich & Boyd 1998).

Figure 1a,b plots the instantaneous and evolutionary dynamics of positive frequency dependence. In figure 1a, for any frequency of one of two alternative learned behaviours on the horizontal axis, the solid curve gives the expected frequency (or probability of adoption) after social learning. If p is the value on the horizontal axis, then p + p(1 − p)(2p − 1) is the value on the vertical (Boyd & Richerson 1985—we re-derive this function in §2e). The dashed line illustrates the expected frequency under unbiased social learning. In figure 1b, the evolution of behaviour within a population of learners who practice positive frequency dependence depends on whether the initial frequency of behaviour is below or above one-half. Positive frequency dependence tends to increase the more common variants and decrease the others.

(c) Pay-off-biased social learning
When individuals have information about the pay-offs of others, it is possible to use these cues of success to adaptively bias social learning. Such pay-off-, success-, or prestige-biased social learning can be very individually adaptive, provided cues are reliable, leading to evolutionary dynamics that can be very similar to natural selection (Boyd & Richerson 1985; Schlag 1998, 1999; Henrich & Gil-White 2001). A key property of these strategies may be their tendency to lead to the copying of neutral or mildly maladaptive behaviour that was initially associated with successful individuals (Boyd & Richerson 1985), but recombination is also a possibility (Boyd & Richerson 2002).

Figure 1c,d plots the instantaneous and evolutionary dynamics of simple pay-off-biased learning. In figure 1c, frequency of trait after social learning as a function of the frequency before social learning is shown. If p is the value on the horizontal axis, then p + p(1 − p)b is the value on the vertical axis (derived in McElreath & Boyd 2007, ch. 1). The parameter b determines the strength of pay-off bias and is analogous to a selection coefficient, in genetic evolutionary theory. We plot b = 1/2 here. The dashed line is again the expectation under unbiased social learning. In figure 1d, the evolutionary dynamics produce a classic logistic growth curve (solid curve). Pay-off-biased social learning tends to increase the frequency of adaptive behaviour, but at the cost of greater information demands.

(d) Integrated social learning
Many mixes of the above kinds of social learning are possible (Laland 2004; Whiten et al. 2004). Aside from the likely possibility that individual asymmetries—age, sex, skill, position in social network—will make some strategies more common among some individuals, strategies can be hierarchically ranked within each individual. Mixes of strategies produce their own
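The two imitation maps plotted in figure 1 are easy to check numerically. This small sketch (ours, using the expressions as given in the text) confirms the qualitative claims: conformity amplifies whichever variant is already in the majority and leaves 0, 1/2 and 1 unmoved, while pay-off bias pushes any interior frequency upward, producing the logistic-looking growth of figure 1d.

```python
def conformity(p):
    # expected frequency after conformist imitation (figure 1a)
    return p + p * (1 - p) * (2 * p - 1)

def payoff_bias(p, b=0.5):
    # expected frequency after pay-off-biased imitation (figure 1c), b = 1/2 as plotted
    return p + p * (1 - p) * b

# conformity: fixed points at 0, 1/2, 1; majority amplified, minority eroded
fixed = [conformity(p) == p for p in (0.0, 0.5, 1.0)]
majority_up = conformity(0.75) > 0.75
minority_down = conformity(0.25) < 0.25

# pay-off bias: iterating from a rare adaptive variant gives logistic growth
traj = [0.01]
for _ in range(24):
    traj.append(payoff_bias(traj[-1]))
# traj rises monotonically toward 1
```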

Phil. Trans. R. Soc. B (2008) 3518 R. McElreath et al. Analysis on social learning strategies

evolutionary trajectories, as well (Henrich 2001). The dashed curve in figure 1d is the dynamics of a mix of pay-off bias and positive frequency dependence. For different mixes of these and other strategies, different evolutionary dynamics are expected.

(e) Modelling integrated pay-off-biased and frequency-dependent social learning
While there has been modelling effort devoted to studying linear, unbiased social learning, frequency-dependent social learning and pay-off-biased social learning, to our knowledge no theoretical study has simultaneously examined these options in the same context. Therefore, we finish this section by presenting an extension of existing evolutionary theory that includes frequency-dependent bias, pay-off bias and a hierarchical integration of the two. We construct recursions for the dynamics of genes controlling these different learning strategies, as well as for the frequency of adaptive learned behaviour. We then analyse this gene-culture system in order to understand what environments favour different strategies.

Table 1. Probabilities of acquiring optimal behaviour via social learning, for the three nonlinear strategies: positive frequency dependence (C), pay-off bias (S) and pay-off conformity (SC). The sample column gives all possible samples of three adults. Uppercase letters indicate that the individual sampled received a large (B) pay-off from their behaviour, whereas lowercase indicates the opposite. A or a indicates optimal behaviour and B or b indicates non-optimal behaviour. Each strategy column gives the probability that a learner using that strategy acquires optimal behaviour from the sample. By multiplying the probability of each specific sample occurring by chance by the entries in a particular strategy column, one can sum these products to compute an expected probability of acquiring optimal behaviour via a given strategy. In the Pr(sample) column, q is the frequency of optimal behaviour in the entire adult population, and a and b are the probabilities of optimal and non-optimal behaviours, respectively, returning large pay-offs.

sample   Pr(sample)                   C     S     SC
AAA      q^3 · a^3                    1     1     1
AAa      q^3 · 3a^2(1−a)              1     1     1
Aaa      q^3 · 3a(1−a)^2              1     1     1
aaa      q^3 · (1−a)^3                1     1     1
AAB      3q^2(1−q) · a^2 b            1     2/3   1
AAb      3q^2(1−q) · a^2(1−b)         1     1     1
AaB      3q^2(1−q) · 2a(1−a)b         1     0     0
Aab      3q^2(1−q) · 2a(1−a)(1−b)     1     1     1
aaB      3q^2(1−q) · (1−a)^2 b        1     0     0
aab      3q^2(1−q) · (1−a)^2(1−b)     1     2/3   1
ABB      3q(1−q)^2 · a b^2            0     1/3   0
ABb      3q(1−q)^2 · 2ab(1−b)         0     1     1
Abb      3q(1−q)^2 · a(1−b)^2         0     1     1
abb      3q(1−q)^2 · (1−a)(1−b)^2     0     1/3   0
aBb      3q(1−q)^2 · 2(1−a)b(1−b)     0     0     0
aBB      3q(1−q)^2 · (1−a)b^2         0     0     0
BBB      (1−q)^3 · b^3                0     0     0
BBb      (1−q)^3 · 3b^2(1−b)          0     0     0
Bbb      (1−q)^3 · 3b(1−b)^2          0     0     0
bbb      (1−q)^3 · (1−b)^3            0     0     0

Consider a large population living in a uniform but temporally varying environment. Each individual faces a choice of two discrete behaviours. One of these choices yields a fitness benefit B, a proportion a of the time, yielding an average of aB. The other yields an average bB < aB. For each generation, there is a chance u that the better behaviour switches to the other option. These changes cannot be observed by individuals.

Behaviour is acquired via learning, either individually or socially. Individual learning (I) pays an average learning cost in order to determine which option is better. This makes the fitness of an individual learner:

W(I) = w_0 + aB − c,

where w_0 is baseline fitness from other behaviour and c is the average cost of learning.

Social learning can be unbiased (linear, L), frequency dependent (conformist, C), pay-off biased (S) or pay-off conformist (SC). Linear social learning copies a random adult from the previous generation, resulting in average fitness:

W(L) = w_0 + aBq + bB(1−q) = w_0 + B(aq + b(1−q)).

The frequency of currently optimal behaviour, q, has its own dynamics, which we define below. The important point here is that linear social learning does not transform this proportion in any direct way. On average, it replicates the frequency of optimal behaviour across generations.

Positive frequency dependence, conformity (C), does however transform q. We assume perhaps the simplest conformity heuristic. The learner samples three random adults from the previous generation and then adopts the most common behaviour among these three models. Since the chance that any one model has optimal behaviour is q, the binomial distribution (table 1) allows us to compute the probability of any combination, and therefore the probability of a conformist learner acquiring optimal behaviour:

q_C = q + q(1−q)(2q − 1).

Using this expression gives us a mean fitness for C,

W(C) = w_0 + B(a q_C + b(1 − q_C)).

Pay-off-biased social learning (S) samples three individuals and adopts the behaviour with the highest observed pay-off. We compute the expected probability of acquiring optimal behaviour through this heuristic in the same fashion as for conformity: each of the three models sampled has a chance q of having optimal behaviour, and each model then has a chance either a or b of displaying a pay-off of B. Thus, the probability of any combination of underlying behaviour and displayed pay-offs can be computed from the binomial distribution (table 1). This results in a chance of acquiring optimal behaviour:

q_S = q + q(1−q)[a(2 + b(2 − 3b(1−q) − 4q)) + a^2(3b − 1)q + b(b(1−q) − 2)].
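The logic of table 1 can be checked mechanically. The sketch below (Python; the function names are ours, and the tie-breaking rules, S copying a random model when observed mean pay-offs are tied and SC falling back on the majority, are read off the table's entries) enumerates every joint state of three models, weights each acquisition probability by the state's probability, and recovers the closed-form expressions for q_C and q_S above, as well as the expression for q_SC given later in this section:

```python
import itertools

def q_C(q):
    # conformity: q_C = q + q(1-q)(2q - 1)
    return q + q * (1 - q) * (2 * q - 1)

def q_S(q, a, b):
    # pay-off bias closed form from the text
    return q + q * (1 - q) * (a * (2 + b * (2 - 3 * b * (1 - q) - 4 * q))
                              + a * a * (3 * b - 1) * q
                              + b * (b * (1 - q) - 2))

def q_SC(q, a, b):
    # pay-off conformity closed form from the text
    return q + q * (1 - q) * (3 * a * (1 - b * b) * (1 - q)
                              + 3 * a * a * b * q
                              - q * (3 * b - 2) - 1)

def acquire(strategy, sample):
    """Pr(acquire optimal behaviour | sample of three (optimal?, big pay-off?) models)."""
    k = sum(1 for opt, _ in sample if opt)     # models displaying optimal behaviour
    if strategy == "C":
        return float(k >= 2)                   # majority rule, pay-offs ignored
    if k == 0 or k == 3:
        return float(k == 3)                   # only one behaviour observed: copy it
    mean_opt = sum(big for opt, big in sample if opt) / k
    mean_non = sum(big for opt, big in sample if not opt) / (3 - k)
    if mean_opt != mean_non:
        return float(mean_opt > mean_non)      # pay-off bias decides
    if strategy == "S":
        return k / 3                           # tie: copy a random model (per table 1)
    return float(k >= 2)                       # SC tie: fall back on conformity

def q_next(strategy, q, a, b):
    # states: (optimal behaviour?, large pay-off as 1/0, probability)
    states = [(True, 1, q * a), (True, 0, q * (1 - a)),
              (False, 1, (1 - q) * b), (False, 0, (1 - q) * (1 - b))]
    total = 0.0
    for combo in itertools.product(states, repeat=3):
        pr = combo[0][2] * combo[1][2] * combo[2][2]
        total += pr * acquire(strategy, [(opt, big) for opt, big, _ in combo])
    return total
```

Summing over the 64 joint states reproduces the three closed forms to floating-point precision, which is a useful guard against transcription errors in the table.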


The fitness of S is therefore

W(S) = w_0 + B(a q_S + b(1 − q_S)).

Finally, we consider the integrated strategy pay-off conformity (SC). This strategy attempts pay-off-biased social learning just as S, but falls back on positive frequency dependence whenever observed pay-offs are tied. Just as before, it is possible to compute the expected chance of acquiring optimal behaviour through this heuristic, by using the binomial distribution (table 1). This gives us

q_SC = q + q(1−q)[3a(1 − b^2)(1−q) + 3a^2 bq − q(3b − 2) − 1].

Again, this implies mean fitness:

W(SC) = w_0 + B(a q_SC + b(1 − q_SC)).

The dynamics of q are governed by the proportions of each strategy in the population. For proportions f_I, f_L, f_C, f_S, f_SC, the frequency of optimal behaviour in the next generation, in the absence of environmental change, is given by

q′ = f_I + f_L q + f_C q_C + f_S q_S + f_SC q_SC.

Now accounting for environmental change, we arrive at the recursion for the frequency of optimal behaviour in the next generation:

q″ = (1 − u_t)q′ + u_t(1 − q′),

where u_t ∈ {0, 1} is a random variable indicating whether the environment changed in generation t. This random variable has chance u of being 1, as u is the long-run rate of environmental change.

The complete evolutionary system is very difficult to analyse, because the recursion for q is highly nonlinear. This means there is no guarantee that q even reaches a stationary distribution, and so the fast–slow dynamics approach often employed in these situations (see McElreath & Boyd 2007, ch. 6) is risky. Even if we adopt the fast–slow approach, the implied equilibrium of q is itself the solution to a cubic in q and very difficult to analyse.

Therefore, we adopt a simple simulation approach to analysing this system. We conduct simulations for a large number of parameter combinations in order to map out the conditions that favour different strategies. The fitness expressions and the recursion for q allow us to define a set of difference equations that define the evolutionary dynamics of the system. For any initial frequencies of the strategies and values for w_0, B, c, a, b, u, simulating this system amounts to generating a random variable u_t and recursively computing the frequencies of each strategy after selection. After 5000 simulated generations at each parameter combination, we record the frequency of each strategy. While frequencies could in principle be highly stochastic, fluctuating as selection fluctuates, the results show that taking the final frequency delivers the correct inferences. It also turns out that initial frequencies have no effect on the long-run evolution of the system, allowing us to present simulation results for uniform initial conditions in which all strategies had initially equal frequency.

Figure 2. Sensitivity analysis for the evolutionary model of social learning strategies in the main text. Each row plots the frequencies of five different strategies (individual learning, unbiased social learning, positive frequency dependence, pay-off bias and pay-off conformity) for two-dimensional combinations of parameters. Each individual plot is the frequency of a single strategy after 5000 generations of simulations at all combinations of the two parameters labelled on each axis. (a(i)–(v)) u varied from 0 to 0.6, b varied from 0 to 0.5, while a = 0.5 + b. (b(i)–(v)) a − b varied from 0 to 0.5, b again from 0 to 0.5. (c(i)–(v)) u again varied from 0 to 0.6, B from 2 to 10. All other parameters not on axes were fixed at B/c = 6, u = 0.1, a = 3/4, b = 1/4, w_0 = 2. The most powerful inference from these simulations is that either pay-off bias (S) or pay-off conformity (SC) dominates the population, unless the environment is very unstable and individual learning is not too costly, relative to fitness benefits of optimal behaviour.

Figure 2 plots the frequencies of each strategy at simulation end, for two-dimensional sensitivity analyses. Black indicates a frequency of 1, white indicates a frequency of 0 and grey indicates

intermediate frequencies, on a linear gradient. Baseline parameter values in these simulations were B/c = 6, u = 0.1, a = 3/4, b = 1/4, w_0 = 2. In figure 2a, the horizontal axis takes b, the rate of good pay-offs from the non-optimal choice, from 0 to 0.5, holding the value of a = 0.5 + b. Thus, the degree to which the optimal choice is better remains constant, but the absolute level of profitability of both options increases, as one moves left to right on the horizontal axis. The vertical axis takes u, the rate of environmental change, from 0 to 0.6, moving top to bottom.

When u is large, the environment changes rapidly, and individual learning excludes the other strategies (figure 2a(i)). When the environment is sufficiently stable, however, either pay-off-biased social learning (S) or pay-off-conformity social learning (SC) excludes the other strategies. When b is small, S excludes SC.

The second row varies the difference between the optimal and non-optimal option, a − b, from 0 to 0.5, on the vertical axis. The difference in profitability between the two options interacts only very weakly with the absolute level of profitability, shown again on the horizontal axis. At the extreme limit of a − b = 0, learning does not pay at all, and so all strategies remain at their initial frequencies (the grey line at the top of the plots in figure 2b(ii)–(v)), except for individual learning (I), which is eliminated for trying to learn and paying a direct cost to do so.

The third row of simulations interacts environmental uncertainty, u, with the magnitude of pay-offs, B. The vertical axis is identical to that of the first row, but the horizontal varies B from 2 to 10 (centred on the value B = 6 that generated the other rows). We can see now that, when B is sufficiently small, individual learning is always excluded, even when the environment is highly unstable. Pay-off-biased social learning, however, excludes the other strategies for these parameter combinations. Pay-off conformity only dominates as the environment becomes more stable. This stands to reason, as conformity—combined with pay-off bias or not—suffers more from changes in the environment than does pure pay-off bias. To understand this, consider what happens to a conformist just after a change in the environment. Chances are, majority behaviour is suboptimal, and therefore conformity tends to reduce the frequency of optimal behaviour even more. Pay-off bias, however, can still use pay-offs as a cue to optimality.

(f) Analysis summary
The most obvious result of this analysis is to emphasize the adaptive significance of pay-off-biased social learning, whether combined with frequency dependence or not. Provided pay-offs can be observed with sufficient accuracy, adopting behavioural options with higher observed average pay-offs excludes other strategies under a wide range of conditions. Unless the environment is extremely stochastic (in which case individual learning dominates) or almost perfectly stable (in which case pure conformity dominates), some kind of pay-off-biased learning is an evolutionarily stable strategy, in our simulations.

The integrated social learning strategy, pay-off conformity, excludes pure pay-off bias when the environment is not too unstable. Being partly frequency dependent, it needs the optimal behaviour to be the more common behaviour, at least long enough to realize fitness gains. Otherwise, ignoring frequency information is more adaptive. The other factor affecting whether pay-off conformity dominates pure pay-off bias appears to be the magnitudes of a and b, the chances that optimal and non-optimal behaviour yield large pay-offs. In the simulations, when a > 1/2, the integrated pay-off-conformity strategy outperforms pay-off bias alone, holding the difference a − b constant. We are unsure what is causing this advantage. The expression q_SC > q_S can be reduced, but it yields a complicated expression that is difficult to interpret. It is also not the whole story, because the average value of q is not described by this condition, and a and b will have large effects on this value.

An interesting feature of pay-off-biased strategies is that they can eliminate individual learning, because any variation among individuals in choice can be used to discriminate good and bad options by pay-offs. All of the nonlinear social learning strategies—positive frequency dependence, pay-off bias and pay-off conformity—can in fact do this, because their nonlinear effects can, under the right conditions, accomplish the same thing as individual learning.

In §3, we present an experimental design that allows for a large number of different and integrated social learning strategies. In light of these simulations, we expect a heavy reliance on pay-off bias. Also, because the environment is quite stable in the experiment (changing every 15 periods, or a rate of 0.07), the integrated pay-off-conformity strategy should exclude pure pay-off bias. We do not think these exact predictions will describe the results—even simple experiments are much more complex than the theory that motivates them. However, if the theory we have presented here gets at the right economic considerations, then the qualitative results should show a much stronger reliance on pay-off bias than frequency bias.

3. EXPERIMENTAL DESIGN
In order to study the diversity of social learning strategies, we require a decision context complex enough to make both frequency dependence and pay-off bias simultaneously possible. Our social learning experiments create social contexts in which groups of individuals can evolve behavioural traditions, through a combination of their own experience and the available social information. These 'microsociety' (Schotter & Sopher 2003; Baum et al. 2004) experiments are highly controlled, relative to field studies of social learning, and as a result, we know which social and individual information each participant examines at each time step. Unlike most experiments, however, our experimental groups generate all social information endogenously, without any experimenter deception. This both allows us to examine the emergent properties of social learning and to develop statistical methods that can address less controlled natural sources of data.

The experiment allows participants to access both the frequencies of different choices and associated pay-offs, within their own social groups. Over a series of rounds, they may or may not use this information to learn, and we use the complete time series of decisions and records of which

participants access which information in order to test the different models of social learning, pay-off biased or frequency biased.

We have used a similar social decision environment in previous work (McElreath et al. 2005), and the environment itself is a social-learning extension of the familiar multi-arm bandits used in diverse fields to study individual learning. By using a well-studied decision environment, we can begin with good candidate individual learning models and study the effects of adding different kinds of social information. Our previous experimental studies have omitted pay-off information, and so we could not consider pay-off-biased strategies. And while we have used the statistical approach in our previous papers, we have not previously emphasized the methodological value of the statistics themselves, for analysing data collected in 'wild' contexts.

(a) Participants
One hundred and sixty-three participants, students at the University of California at Davis, interacted with one another via a computer network. We recruited participants through an advertisement in the campus newspaper. Participants received between $5 and $20 for their participation, based upon their performance. We used no deception in this experiment. Participants read a complete set of instructions and successfully completed a set of test questions about their knowledge of the experiment, before beginning.

(b) Group structure
Participants were sorted into random, anonymous groups of four to seven individuals, in sessions of between 8 and 20 participants. Each session was a single experiment on a single date. While participants in the same session made choices in the same room, these participants did not know which of the other participants they were sorted into a group with. Groups were constrained to be always greater than three individuals, in order for frequency bias to be effective, as three neighbours are the required minimum for positive frequency dependence. Depending upon the total number of participants showing up for a given session, group sizes were arranged to create as many groups of four as possible. All remaining participants in that session were placed in a single larger group.

(c) Decision
Over a series of 60 periods, 'seasons', each participant made a series of 60 crop choice decisions. These 60 periods were divided into four 'farms' of 15 periods each. These farms served to signal to participants that conditions might have changed. On any given farm, one of two crops, 'wheat' or 'potatoes', had a higher average yield than the other. Across farms, which crop was optimal was determined at random. Thus in each period, each participant chose a single crop to plant and receive a yield from. Yields were summed across all periods, and participants received cash payment so that they earned between $5 and $20, depending upon performance. The vast majority of participants earned between $15 and $18.

The number of farms and periods in each finesses the trade-offs of (i) having only limited time to keep participants before they grow bored and unmotivated and (ii) desiring the most varied data on learning. Thus the total number of periods, 4 × 15, is set by the time constraint. The number of periods per farm is set to maximize information about learning dynamics. If we had a single farm of 60 periods, most of the later periods would add little to nothing to the analysis, because all participants would be sure of the best option by then, as we have learned from previous experiments (McElreath et al. 2005). If a farm is too short, however, we never witness the full dynamics of any learning process. Therefore, guided by pilot experiments and our simulation studies, we decided on 15 periods per farm, as this is the approximate value that maximized our ability to correctly distinguish simulated strategies.

(d) Social information
On the first period of each farm, no social information was available. However, on each period after the first, participants could access social information from the most recent period. Participants could examine their own most recent crop choices and resulting yields. Each participant could also examine the most recent crop choices and yields of each member of their own group. This information was displayed on screen in boxes labelled by the type of information. When a participant moused over a box, the information in it was displayed. The experiment software tracked millisecond access to this information, resulting in a time series of information access. This kind of 'mouse-tracking' experiment has been used to great effect in judgement and decision-making research (Payne et al. 1993). The order of the rows, yield and crop, was randomized for each participant, each period, and the order of neighbours was also randomized. The order of the crop choices at the bottom was also randomized within each participant and period.

(e) Pay-offs
Both crops generated pay-offs from normal distributions with the same variance, while the better crop had a mean pay-off of 13 units and the worse 10 units (set from previous experience and simulation study). Participants knew that one crop had a constant higher mean than the other, but had no prior information that would allow them to determine which of the two was better. The variance of yields was constant within farms but could be either 1/2 or 4, determined randomly but in a way to ensure two farms with a variance of 1/2 and two farms with a variance of 4. The different variances comprise a learning difficulty treatment that we have used in previous experiments (McElreath et al. 2005).

(f) Simulating the experiment
While there is not space here to describe our simulations in detail, we used the statistical models we will present later to produce simulated experimental play, under a variety of group sizes and other experiment parameters. These simulations simply use the probability models to produce stochastic learning and choice. We then run the data produced through the exact statistical analysis we use on the real data. These simulations allowed us to (i) choose good experimental design parameters and (ii) verify that our statistical analysis works (i.e. recovers true simulated strategies).

4. RESULTS
Like our previous experiments (McElreath et al. 2005; Efferson et al. 2007), participants learn the optimal crop for each farm, over time. Figure 3 shows the proportion of participants making optimal choices, as a function of period within each farm. The rate of improvement is much faster than in previous experiments, which omitted pay-off information for

neighbours (McElreath et al. 2005). The increase between periods 2 and 15 is much smaller than the increase between periods 1 and 2.

Figure 3. (a) Proportion of optimal crop decisions, by round within farm. Vertical lines show 95% profile-likelihood confidence intervals. (b) Proportion of neighbours' crop decisions (circles) and yields (curve) inspected, by round within farm.

Perhaps as a result of the marginal gains in optimality declining after the second period, rates of inspecting the choices (which crop was planted) and yields (how much profit was made last period) of neighbours decline from the second period onward (figure 3). The average rate never falls below a majority of neighbours, however. Note that rates of inspecting yields slightly exceed those for inspecting crop choices. This implies that some participants were using something similar to an elimination-by-aspects strategy, in which one important cue is used to first narrow down the number of cases one will consider (see Payne et al. 1993). In this case, some participants may have first eliminated neighbours to examine crop choices from, by first scanning the yields from the previous period. This would result in the kind of pattern seen in figure 3b. Our statistical analyses in §5 use only the yields and crops actually inspected by each participant, and so take the search strategy as a given. We think the design of the search strategy is a worthwhile question, however. But we doubt such details—truly observing information search—will often be possible in natural settings.

5. ANALYSIS
We adopt a statistical approach that allows us to (i) directly use mathematical models of social learning strategies as statistical models and (ii) evaluate several plausible, non-null statistical models simultaneously. The question is not whether social information is used—few would expect a complete absence of social learning in such a context—but rather how social information is used.

(a) Strategies
We translate each hypothetical learning strategy into an expression that yields the conditional probability of an individual choosing any behavioural option i in any period t, given private information and the social information the individual accessed. Each strategy consists of two parts. The first part is the definition of a recursion for updating the attraction scores of all behavioural options. The second part is a convex combination of individual choice and the influence of social information.

A large number of meaningfully different strategies can be constructed by varying these two components (Camerer & Ho 1999; Stahl 2000; Camerer 2003). As our purpose in this paper is to illustrate the approach in the simplest manner, we do not explore a large strategy space, but instead restrict ourselves to those nominated by the basic research question and the existing evolutionary literature: how do people use frequency-dependent and pay-off-biased social learning, when both are possible?

We examine five different models that combine elements of frequency dependence and/or pay-off bias. First, we define (i) individual learning, (ii) frequency-dependent social learning and (iii) pay-off-biased social learning. We then define hierarchical strategies that combine pay-off-biased learning with frequency dependence or individual learning: (iv) hierarchical compare means and individual learning and (v) hierarchical compare means and frequency dependence. We do not present analyses of strategies that reverse the hierarchical order of information use—frequency dependence and then compare means, for example. These strategies fit very poorly to our data, as will become clear when we examine the fits of each basic model, and so we omit them for simplicity of presentation.

(i) Individual learning
We use a standard, successful reinforcement learning model as the basis of individual updating (Camerer 2003, ch. 6). The attraction score of option i in period t+1 is given by

A_{i,t+1} = (1 − φ)A_{i,t} + φ p_{i,t},

where φ is a parameter determining the weight given to new experience and p_{i,t} is the pay-off observed for option i in period t. When option i was not sampled in period t, p_{i,t} = 0. Since there is no reason to expect participants to have strong priors favouring either behavioural option, we set A_{1,0} = A_{2,0} = 0.

The attraction scores are transformed into probabilistic choice with a 'softmax' choice rule, again typical of the learning-in-games literature. The probability of choosing option i in period t+1 is given by


Pr(i | A_t, Θ)_{t+1} = exp(λ A_{i,t}) / (exp(λ A_{1,t}) + exp(λ A_{2,t})),

where Θ indicates a vector of all parameters and λ is a parameter that measures the influence of differences between attraction scores on choice. When λ = 0, choice is random with respect to attraction scores. As λ → ∞, choice becomes deterministic, in favour of the option with the higher attraction score.

(ii) Frequency-dependent social learning
To model the family of strategies that use the frequency of behaviour among group members, we modify the learning model above to cue choice by the frequency of options seen. Attractions are updated as before, but choice is given by the rule

Pr(i | A_t, Θ)_{t+1} = (1 − γ) exp(λ A_{i,t}) / (exp(λ A_{1,t}) + exp(λ A_{2,t})) + γ n_{i,t}^f / (n_{1,t}^f + n_{2,t}^f),

where n_{i,t} is the count of neighbours observed to have chosen option i in period t; γ measures the weight of social information on choice; and f determines how nonlinear frequency dependence is. When f = 1, imitation is unbiased. When f > 1, however, more common options have exaggerated chances of being copied, resulting in positive frequency dependence, such as majority-rule conformity. When f < 1, frequency dependence is negative, and more commonly observed options are less likely to be copied.

Since changes in choice feed back to changes in attraction scores, even though this strategy has the same attraction updating recursion as individual learning, reinforcement patterns may be quite different.

(iii) Compare means
This pay-off-biased strategy attends to neighbours' yields and chooses the option with the highest observed mean. It uses the choice rule

Pr(i | A_t, Θ)_{t+1} = (1 − γ) exp(λ A_{i,t}) / (exp(λ A_{1,t}) + exp(λ A_{2,t})) + γ p̄_{i,t}^{100} / (p̄_{1,t}^{100} + p̄_{2,t}^{100}),

where p̄_{i,t} = Σ_j p_{i,j,t} / n_{i,t} is the mean pay-off observed for option i in period t over all group members j, including oneself. Raising these average pay-offs to a large power creates an approximate step function, so that one or the other option is favoured by the social component of choice. When one or both options are unobserved in period t, this strategy behaves as individual learning. We fix the exponent at 100 in order to force the model to match our theory, i.e. a threshold behaviour.

(iv) Hierarchical compare means/individual learning
This strategy uses the comparison of choice means and individual updating, but in a manner different from the pure compare means model. Using the distance between estimated means as a cue of uncertainty, the strategy falls back on individual learning (attraction updating) when the means are similar. We use a symmetrical logistic function to model the change in reliance on pay-offs as the distance between the observed means changes. Let Y(d, p̄_{1,t}, p̄_{2,t}) be the proportion of choice that is driven by individual updating, where d is a new parameter that determines how quickly reliance on pay-offs grows as the difference in observed means increases:

Y(d, p̄_{1,t}, p̄_{2,t}) = 2 / (1 + exp(d(p̄_{1,t} − p̄_{2,t})^2)).

Figure 4. The function that determines the reliance on pay-off-biased learning, as a function of the observed difference in means, p̄_{1,t} − p̄_{2,t}. See the description of the hierarchical means/conformity strategy in the text. Solid curve, d = 1/10; dashed curve, d = 2.

Figure 4 plots this function for two values of d. The probability of choosing i under the hierarchical compare means/individual strategy is

Pr(i | A_t, Θ)_{t+1} = (1 − γ) exp(λ A_{i,t}) / (exp(λ A_{1,t}) + exp(λ A_{2,t})) + γ[(1 − Y) p̄_{i,t}^{100} / (p̄_{1,t}^{100} + p̄_{2,t}^{100}) + Y exp(λ A_{i,t}) / (exp(λ A_{1,t}) + exp(λ A_{2,t}))].

For similar observed means, the individual learning component will dominate the social learning term. Otherwise, the individual will mainly attend to differences in observed means. However, if d is a very large number, then only a very narrow range of very similar observed means will lead to falling back on individual updating.

(v) Hierarchical compare means/frequency-dependent social learning
This model is like the previous one, but falls back on frequency-dependent social learning when the observed means are similar:

Pr(i | A_t, Θ)_{t+1} = (1 − γ) exp(λ A_{i,t}) / (exp(λ A_{1,t}) + exp(λ A_{2,t})) + γ[(1 − Y) p̄_{i,t}^{100} / (p̄_{1,t}^{100} + p̄_{2,t}^{100}) + Y n_{i,t}^f / (n_{1,t}^f + n_{2,t}^f)].
Phil. Trans. R. Soc. B (2008) 3524 R. McElreath et al. Analysis on social learning strategies

(b) Fitting strategies to data
The 19 experiment sessions involving 163 participants provided 7900 decisions, under full-information conditions that might allow us to distinguish between frequency-dependent and pay-off-biased social learning. We fit the above models to these decisions, producing for each model a negative log likelihood of observing the true data, given the assumption that the model is true: −log L(D|x, Θ) for a model x with parameter set Θ, where D is the data, a vector of 'crop' choices. The likelihood is defined as

  L(D|x, Θ) = ∏_t Pr(D_t | A_{t−1}, Θ),

where ∏_t indicates the product over all rows t. The usual practice in likelihood estimation, and the practice we follow here, is to take the natural log of each conditional probability and then sum these to find log L(D|x, Θ):

  log L(D|x, Θ) = ∑_t log Pr(D_t | A_{t−1}, Θ).

Taking logarithms first results in greater precision, owing to the way most computers handle floating point values. The parameters Θ are fit via maximum likelihood, and therefore the fitting exercise also yields information on the best estimates of flexible components of the learning rules.
We conducted this fitting exercise, as well as the validating simulations, in R, using the package bbmle (Bolker and based on stats4 by the R Development Core Team 2008; R Development Core Team 2008). All analysis code is available from the corresponding author. We confirmed via simulation that our analysis could recover true parameter values and strategies, when the true strategy was among the set of strategies considered. This validation exercise is helpful, because not all distinct models can be distinguished by all kinds of data (a problem that has plagued the individual learning literature; see Camerer 2003, ch. 6).

(c) Comparing models
We compare the fit of the social learning models using Akaike information criteria (Akaike 1974; Burnham & Anderson 2002). Unlike null hypothesis testing, comparing models with Akaike information criteria (AIC, called by Akaike himself simply 'An information criterion' but subsequently renamed by the scientific community), or another information criterion, allows a researcher to assess the relative explanatory power of any number of different competing and plausible models, without favouring any 'null' model. AIC is an estimate of the information lost by using any particular model to estimate reality.
The advantages of the information-theoretic approach over customary null hypothesis testing have been discussed for several decades (see citations in Cohen 1994; Anderson et al. 2000), so we will not repeat them here. Readers should note, however, that there will be no p values in our presentation. Like many statisticians, we do not find much inferential value in p values, especially when multiple plausible models are under consideration. AIC and related approaches are becoming increasingly popular in the evolutionary sciences, because they permit more nuanced questions and are not plagued by the same sample-size biases as null hypothesis testing (Johnson & Omland 2003). They also allow for more powerful analysis of observational data, collected without precise experimental control.
In order to compare the models, each negative log likelihood from the fitting exercise is transformed into an AIC:

  AIC_x = −2 log L(D|x, Θ̂) + 2k,

where k is the number of free parameters in model x. We use the common sample-size-adjusted version of the above, AICc (Burnham & Anderson 2002), and this is what we display in our results:

  AICc,x = −2 log L(D|x, Θ̂) + 2k + 2k(k + 1)/(n − k − 1),

where n is the number of observations to be predicted by the model. The penalty for the number of parameters is not arbitrary: it adjusts precisely for the expected overfitting that arises whenever free parameters are added.
AIC can be used to select a single 'best' model, if an analyst desires. However, since the 'true' model, in all its detail, is certainly not contained in the set of models fit to data, it is perhaps more productive to treat AIC as a continuous measure of the degree to which each model estimates 'truth' (Forster & Sober 1994). AIC estimates the out-of-sample predictive accuracy of each model, and one easy way of ranking these estimates relative to the models in the analysis is by using Akaike weights (Burnham & Anderson 2002). The weight of any model x is given by

  w_x = exp(−Δ_x/2) / ∑_j exp(−Δ_j/2),

where Δ_x = AIC_x − AIC_min, the difference between the AIC of model x and the smallest AIC in the set of compared models. For the best-fitting model with the smallest AIC, Δ = 0. These weights are numbers between 0 and 1 that estimate the relative likelihoods of each model being the best model in the set.
A useful way we have found to explain this approach is to consider a horse race. There are many horses in each race, and while the fastest horse will not always win, it usually will. If the best horse loses, it should not usually lose by much. Thus both the rank of finishes (which horse was first, second, etc.) and the time differences between finishes are informative. In the same way, the true model may not always fit the data best (just as 'significant' p values do not always identify important effects), but it will usually have a high Akaike weight, even if not the highest. So, just as a photo finish tells you that it is difficult to say, without another race, which of two horses is faster, when two models have very similar Akaike weights there is uncertainty as to which would make the best out-of-sample predictions. When one model has an Akaike weight much larger than the others, however, we can be confident that it is the best of the models considered.
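The likelihood and information-criterion calculations above reduce to a few lines of code. The following is a minimal sketch in Python, rather than the R/bbmle toolchain used for the actual analysis; the function names are ours.

```python
import math

def neg_log_likelihood(probs):
    """-log L(D|x, Theta): sum the logs of the per-row conditional
    probabilities Pr(D_t | A_{t-1}, Theta), rather than taking the log
    of their product, to avoid floating-point underflow."""
    return -sum(math.log(p) for p in probs)

def aic(nll, k):
    """AIC = -2 log L + 2k, where k is the number of free parameters."""
    return 2.0 * nll + 2 * k

def aicc(nll, k, n):
    """Sample-size-adjusted AICc = AIC + 2k(k + 1)/(n - k - 1), where n
    is the number of observations predicted (Burnham & Anderson 2002)."""
    return aic(nll, k) + (2 * k * (k + 1)) / (n - k - 1)
```

For example, a random-choice model over two options assigns probability 0.5 to every row, so its negative log likelihood over n rows is simply n log 2.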

Phil. Trans. R. Soc. B (2008) Analysis on social learning strategies R. McElreath et al. 3525

Table 2. Comparison of social learning models fit to experimental data. AICc indicates the adjusted fit of each model to the entire sequence of choices, for each subject, after accounting for model complexity (number of parameters). Models are ordered from best fit to worst. The Akaike weights estimate the proportion of evidence in favour of each model. (Meanings of parameters: φ ∈ [0, 1], strength of attraction updating; λ > 0, influence of attraction differences on choice; γ ∈ [0, 1], weight given to social information; δ > 0, decline in probability of pay-off-biased imitation as the difference in observed means increases; f, strength of frequency dependence. HCMFD, hierarchical compare means/frequency dependence; HCMINDIV, hierarchical compare means/individual learning.)

                                              parameter estimates
  model                  AICc      Akaike weight   φ        λ        γ        δ       f
  HCMFD                  6519.59   ≈1              0.6605   0.1645   0.3365   3.210   1.953
  HCMINDIV               6918.89   <0.001          0.4620   0.1917   0.1400   4.982   n.a.
  compare means          6924.23   <0.001          0.4611   0.1921   0.1239   n.a.    n.a.
  frequency dependent    6929.34   <0.001          0.4998   0.1814   0.1349   n.a.    3.396
  individual             7004.25   <0.001          0.4382   0.1866   n.a.     n.a.    n.a.
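The Akaike-weight transformation applied to the AICc column above reproduces the reported weights, and the same arithmetic checks the random-choice benchmark and variance-explained analogue discussed later in this section. A short script, assuming Python (the original analysis was done in R):

```python
import math

# AICc values from Table 2, best to worst
aicc_values = {
    'HCMFD':               6519.59,
    'HCMINDIV':            6918.89,
    'compare means':       6924.23,
    'frequency dependent': 6929.34,
    'individual':          7004.25,
}

# Akaike weights: w_x = exp(-delta_x / 2) / sum_j exp(-delta_j / 2)
best = min(aicc_values.values())
raw = {m: math.exp(-0.5 * (v - best)) for m, v in aicc_values.items()}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}
# The AICc gap to the second-best model is ~399 units, so the weight of
# HCMFD is effectively 1 and every other weight is far below 0.001.

# Variance-explained analogue: compare a model's negative log likelihood
# to a random-choice benchmark over 7900 binary choices
nll_random = -7900 * math.log(0.5)        # ~5475.863
pseudo_r2 = 1 - 3259.792 / nll_random     # best-fitting model, ~0.4047
```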

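The frequency-dependence exponent f reported in Table 2 governs how the probability of copying a choice scales with that choice's frequency. A common parameterization in this literature, for two options, is freq^f / (freq^f + (1 − freq)^f); this exact functional form is an assumption of the sketch below, not quoted from the paper.

```python
def copy_probability(freq, f):
    """Probability of copying a choice at frequency `freq` (0..1) among
    two options, under frequency-dependent social learning with
    exponent f. f = 1 gives unbiased copying in proportion to
    frequency; f > 1 amplifies the majority (conformity).
    This functional form is a standard one in the social learning
    literature and is assumed here for illustration."""
    return freq ** f / (freq ** f + (1.0 - freq) ** f)
```

With the fitted f = 1.953, an option used by 70% of the group is copied with probability above 0.7, i.e. mildly conformist amplification of the majority.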
Table 2 presents the AICc, Akaike weight and parameter estimates for each model, sorted from best- to worst-fitting. The bulk of evidence favours model 5, hierarchical compare means/frequency dependence. While there is no doubt about heterogeneity among participants, the strength of this result leaves little room for any of the simpler strategies to account for a sizeable fraction of participants. The maximum-likelihood estimate for f, the degree of positive or negative frequency dependence, is just under 2, indicating mild positive frequency dependence or conformity (figure 5). The maximum-likelihood estimate of δ (not shown in figure) produces a steep fall-off in reliance on pay-off bias for a distance above approximately 1 unit. We caution that there is uncertainty in these estimates, but emphasize that a model with δ fixed to a large value, say 100, does not produce a better fit, even accounting for the reduction of one parameter.

Figure 5. Maximum-likelihood estimate of strength of positive frequency dependence for the best-fitting model, hierarchical compare means/frequency dependence. Solid curve indicates the estimated probability of copying a choice (vertical axis), given its frequency in the group (horizontal axis). Dashed line indicates the same probability under f = 1, unbiased social learning.

Many readers may wonder what proportion of variance in choice is explained by the best-fitting model. As is usual with binomial models, there is no true equivalent of R², the proportion of variation explained by the fit model. However, it is possible to construct an analogue that compares the raw likelihoods of each model to a random choice model. A random choice model just chooses randomly at each time t. Over 7900 choices, this model will always have a negative log likelihood of −7900 × log(1/2) = 5475.863. This is a reasonable benchmark for the worst any model can do at predicting the data. The negative log likelihood of the best-fitting model is 3259.792. Therefore, an analogous calculation of the variance explained by any model x is 1 − log L(D|x)/log L(D|random). In our case, 1 − 3259.792/5475.863 = 0.4047. For the second-best model, 1 − 3459.442/5475.863 = 0.3682. These measures do not account for model complexity, but they do provide a rough guide to the additional raw variance explained by the best model. We caution, however, that substantial components of choice may be truly random, and therefore any behavioural model will fail to achieve a negative log likelihood of zero. In cases in which measurement error is possible, as in field studies or data coded from video, measurement error will also make it impossible for even the true model to achieve a negative log likelihood near zero.

6. DISCUSSION
We have analysed an interdependent time series of profit-oriented choice behaviour in humans. Our experiment did not precisely control the social information available to each participant. Instead, we allowed all social information to arise endogenously, through the actual behaviour and information seeking of participants. While one major tradition in laboratory experiments frowns upon such a design, we consider it an asset, for two reasons.
First, if the study of social learning is ever to link the psychological to the population level, statistical techniques that can accommodate observational and noisy data are needed. The model comparison approach we adopt in this paper is general to any set of strategies a researcher might imagine. Caution is needed to ensure that the kind of data available can discriminate among the possible models. But provided the different models are identifiable in this way, the likelihood-based information criteria can quantify the relative explanatory power of different hypotheses. These dynamic models can then be reasonably asked to produce out-of-system predictions that provide

another avenue of disconfirmation. By contrast, effect sizes from ANOVA cannot reasonably be expected to predict out-of-experiment effects, because no genuine model of learning is present.
Second, the emergent population-level consequences of social learning can only be studied where the experimenter allows them to occur: in settings in which social information itself is not controlled experimentally. This advantage is twofold. Being able to study population-level effects, such as the emergence of traditions or rates of diffusion, is important. But in a cultural species, such as humans and possibly other species, social learning strategies themselves are probably adapted to a cultural environment (Henrich & McElreath 2007). Thus, it will eventually be difficult to study the functional design of strategic social learning without appreciating the cultural environment it is adapted to. This will be true even (especially) if learning strategies themselves are culturally transmitted, because the population will exert downward causation on individuals' learning strategies.
The major scientific finding of our analysis of the experiment is that our human participants relied heavily on pay-off-biased social learning, as predicted by the evolutionary model. We think an economic, rather than evolutionary, model would generate similar predictions, provided social information was endogenous to the model. When there is no additional cost to access pay-offs and the information is subject to no error, as in this experiment, then it is perhaps no surprise that a successful strategy will attend to pay-offs. What might be more counter-intuitive is the hierarchical combination of pay-off and frequency biases. The evidence strongly suggests that our participants used a strategy akin to: (i) Are the two choices' pay-offs similar on average? (ii) If yes, which is more common? (iii) If no, which has the higher average pay-off?
It is also worth noting that participants did not require any training time to learn to attend strongly to pay-offs; they did so from the first period when social information was present. We make no strong claims about the source of these strategies. Social learning strategies may of course themselves be learned socially, and we have wondered about the effects of this in previous experiments (McElreath et al. 2005). Indeed, there is likely hidden strategic variation among participants. Our analysis approach, fitting a single model at a time to the entire set of data, is a common one, because rarely do we have enough data on each participant to reliably distinguish differences in strategy. However, in principle, the statistical methods here do not require one to conduct the analysis this way. Each participant can be analysed separately, or a series of fixed-effects parameters can be used to statistically model individual differences. In the analysis here, the overwhelming support of the best model implies little strategic variation that could be detected by the considered models. However, we do not think this means all participants used the same strategy, merely that we have not modelled the kind of differences that exist.
A common reaction, both by ourselves and our colleagues, to experiments with students is to be sceptical of the generality of the results. True, university students are a special population that is likely not typical of the human species. However, no single population will likely be representative of the human species. That is, every culture and subculture may be a special case. We think there are serious limits to how much we can generalize from experiments with students. But we also think that being able to explain learning in any case is an advance. Just as studying the evolution of beetle larvae in the laboratory does not tell us exactly how evolution works in any other species (or even in wild beetles), the clarity of the results does generate insights that can transfer across cases. Our feeling is that no one should conclude the human species is just like university students, any more than one should conclude all insects have the evolutionary dynamics of flour beetles. But nor should one ignore flour beetles, as if their evolution is not worth explaining. University students are real people with real learning strategies, and being able to model this learning is worthwhile.
It is always possible that another, unconsidered, strategy is a better description of the social learning process. The same weakness is common to all analytical approaches, however, and we caution readers not to consider this a flaw special to the information criterion and model comparison approach. But despite the strong weight of evidence for this strategy here, we think there is no substitute for replication and the variation of experimental design in order to test the robustness of a result. Both our experiments and theoretical analysis are special, like all experiments and models. Whatever the source of social learning strategies (cultural or genetic or, likely, both), the strategies we find in our experiments certainly did not evolve in the laboratory. And however useful simple evolutionary models are for exploring the logic of population dynamics, they cannot and do not attempt to replicate reality. We have emphasized the generality of the statistical approach, as it is not tied to any particular experiment or set of predictions, but it is worth noting key assumptions of both the experiments and models.
First, the experimental environment we have used provides highly accurate (noise-free) pay-off information, whereas real social environments certainly do not. In addition, real social environments may provide cues of success, but these cues will often be integrations of the contributions of many separate behaviours. For example, if someone in your town is healthy, is it a result of her diet, her religion or her close bonds with kin? This integrated nature of cues of success means that people may copy many traits from successful or prestigious individuals, with potentially important effects on cultural dynamics (Boyd & Richerson 1985; Henrich & Gil-White 2001). Relevant to our experimental results and the prediction of the model that pay-off bias would dominate, it may be that the clear advantage of pay-off bias depends upon the ability to know that any cue of success arises from a particular behaviour. If not, other forms of social learning may be

more competitive. Some of our ongoing experiments explore this consideration.
Second, our evolutionary analysis is built upon a number of existing models of the evolution of social learning (Boyd & Richerson 1988, 1995, 1996; Rogers 1988; McElreath & Strimling 2008). By doing so, it is comparable to these models, but also considers a fairly special life history. In all of these models, generations are barely overlapping: adults survive only long enough to be imitated. Grandparents never survive to be imitated. There is no population structure, including within the biological family, and therefore any effects of gene-culture covariation are ignored (see however McElreath & Strimling 2008). While this kind of model provides perhaps the purest evaluation of the logic and economics of social learning strategies, actual strategies may have evolved (culturally or genetically) under rather special conditions or in order to exploit overlapping generations. If so, the inferences derived from these models will be misleading. How they will be misleading is hard to say, until more social learning theory exploring population structure and overlapping generations appears.
This lacuna of theory aside, the existing evolutionary literature is sufficient to motivate the search for positive frequency dependence and kinds of pay-off bias in other apes, if not crows, whales and rats. In the search for the psychological differences that make human cultural evolution qualitatively different from that of other animals, the existence of frequency-dependent and refined pay-off bias is often ignored. For example, experiments in which apes see three ape demonstrators access food through a two-action problem, with two demonstrators performing one action and the third another, will produce data that can estimate the magnitude of positive frequency dependence.

This research was funded by the National Science Foundation.

REFERENCES
Akaike, H. 1974 A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723. (doi:10.1109/TAC.1974.1100705)
Anderson, D. R., Burnham, K. P. & Thompson, W. L. 2000 Null hypothesis testing: problems, prevalence, and an alternative. J. Wildl. Manage. 64, 912–923. (doi:10.2307/3803199)
Baum, W. M., Richerson, P. J., Efferson, C. M. & Paciotti, B. M. 2004 Cultural evolution in laboratory microsocieties including traditions of rule giving and rule following. Evol. Hum. Behav. 25, 305–326. (doi:10.1016/j.evolhumbehav.2004.05.003)
Boesch, C. 2003 Is culture a golden barrier between human and chimpanzee? Evol. Anthropol. 12, 82–91. (doi:10.1002/evan.10106)
Boesch, C. & Tomasello, M. 1998 Chimpanzee and human culture. Curr. Anthropol. 39, 591–604. (doi:10.1086/204785)
Bolker, B. and based on stats4 by the R Development Core Team 2008 bbmle: tools for general maximum likelihood estimation. R package v. 0.8.5.
Boyd, R. & Richerson, P. J. 1985 Culture and the evolutionary process. Chicago, IL: University of Chicago Press.
Boyd, R. & Richerson, P. J. 1988 An evolutionary model of social learning: the effects of spatial and temporal variation. In Social learning: a psychological and biological approach (eds T. Zentall & B. G. Galef), pp. 29–48. Hillsdale, NJ: Lawrence Erlbaum Associates.
Boyd, R. & Richerson, P. J. 1992 How microevolutionary processes give rise to history. In Evolution and history (ed. M. Niteki), pp. 149–178. Chicago, IL: University of Chicago Press.
Boyd, R. & Richerson, P. J. 1995 Why does culture increase human adaptability? Ethol. Sociobiol. 16, 125–143. (doi:10.1016/0162-3095(94)00073-G)
Boyd, R. & Richerson, P. J. 1996 Why culture is common, but cultural evolution is rare. In Evolution of social behaviour patterns in primates and man (eds W. G. Runciman, J. Maynard Smith & R. I. M. Dunbar), Proc. British Academy, pp. 77–93. Oxford, UK: Oxford University Press.
Boyd, R. & Richerson, P. J. 2002 Group beneficial norms spread rapidly in a structured population. J. Theor. Biol. 215, 287–296. (doi:10.1006/jtbi.2001.2515)
Burnham, K. & Anderson, D. 2002 Model selection and multimodel inference: a practical information-theoretic approach. Berlin, Germany: Springer.
Camerer, C. 2003 Behavioral game theory: experiments in strategic interaction. The Roundtable Series in Behavioral Economics. Princeton, NJ: Princeton University Press.
Camerer, C. F. & Ho, T. 1999 Experience-weighted attraction learning in normal form games. Econometrica 67, 827–874. (doi:10.1111/1468-0262.00054)
Cavalli-Sforza, L. L. & Feldman, M. W. 1981 Cultural transmission and evolution: a quantitative approach. Monographs in Population Biology, vol. 16. Princeton, NJ: Princeton University Press.
Cohen, J. 1994 The earth is round (p < 0.05). Am. Psychol. 49, 997–1000. (doi:10.1037/0003-066X.49.12.997)
Efferson, C., Richerson, P. J., McElreath, R., Lubell, M., Edsten, E., Waring, T. M., Paciotti, B. & Baum, W. 2007 Learning, productivity, and noise: an experimental study of cultural transmission on the Bolivian altiplano. Evol. Hum. Behav. 28, 11–17. (doi:10.1016/j.evolhumbehav.2006.05.005)
Estlund, D. M. 1994 Opinion leaders, independence, and Condorcet's jury theorem. Theory Decision 36, 131–162. (doi:10.1007/BF01079210)
Forster, M. R. & Sober, E. 1994 How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. Br. J. Phil. Sci. 45, 1–35.
Fragaszy, D. & Perry, S. (eds) 2003 The biology of traditions: models and evidence. Cambridge, UK: Cambridge University Press.
Frank, S. A. 2006 Social selection. In Evolutionary genetics: concepts and case studies (eds C. W. Fox & J. B. Wolf), pp. 350–363. Oxford, UK: Oxford University Press.
Galef, B. G. & Whiskin, E. 1997 Effects of social and asocial learning on longevity of food-preference traditions. Anim. Behav. 53, 1313–1322. (doi:10.1006/anbe.1996.0366)
Henrich, J. 2001 Cultural transmission and the diffusion of innovations: adoption dynamics indicate that biased cultural transmission is the predominate force in behavioral change. Am. Anthropol. 103, 992–1013. (doi:10.1525/aa.2001.103.4.992)
Henrich, J. & Boyd, R. 1998 The evolution of conformist transmission and between-group differences. Evol. Hum. Behav. 19, 215–242. (doi:10.1016/S1090-5138(98)00018-X)
Henrich, J. & Gil-White, F. 2001 The evolution of prestige: freely conferred status as a mechanism for enhancing the benefits of cultural transmission. Evol. Hum. Behav. 22, 165–196. (doi:10.1016/S1090-5138(00)00071-4)
Henrich, J. & McElreath, R. 2003 The evolution of cultural evolution. Evol. Anthropol. 12, 123–135. (doi:10.1002/evan.10110)
Henrich, J. & McElreath, R. 2007 Dual inheritance theory: the evolution of human cultural capacities and cultural evolution. In Oxford handbook of evolutionary psychology (eds R. Dunbar & L. Barrett), pp. 555–570. Oxford, UK: Oxford University Press.
Horner, V. & Whiten, A. 2005 Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens). Anim. Cognit. 8, 164–181. (doi:10.1007/s10071-004-0239-6)
Johnson, J. B. & Omland, K. S. 2003 Model selection in ecology and evolution. Trends Ecol. Evol. 19, 101–108. (doi:10.1016/j.tree.2003.10.013)
Kokko, H., Jennions, M. & Brooks, R. 2006 Unifying and testing models of sexual selection. Annu. Rev. Ecol. Evol. Syst. 37, 43–66. (doi:10.1146/annurev.ecolsys.37.091305.110259)
Laland, K. N. 2004 Social learning strategies. Learn. Behav. 32, 4–14.
Laland, K. N. & Janik, V. M. 2006 The animal cultures debate. Trends Ecol. Evol. 21, 542–547. (doi:10.1016/j.tree.2006.06.005)
McElreath, R. & Boyd, R. 2007 Mathematical models of social evolution: a guide for the perplexed. Chicago, IL: University of Chicago Press.
McElreath, R. & Strimling, P. 2008 When natural selection favors imitation of parents. Curr. Anthropol. 49, 307–316. (doi:10.1086/524364)
McElreath, R., Lubell, M., Richerson, P. J., Waring, T. M., Baum, W., Edsten, E., Efferson, C. & Paciotti, B. 2005 Applying evolutionary models to the laboratory study of social learning. Evol. Hum. Behav. 26, 483–508. (doi:10.1016/j.evolhumbehav.2005.04.003)
Payne, J. W., Bettman, J. R. & Johnson, E. J. 1993 The adaptive decision maker. Cambridge, UK: Cambridge University Press.
R Development Core Team 2008 R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Richerson, P. J. & Boyd, R. 2005 Not by genes alone: how culture transformed human evolution. Chicago, IL: University of Chicago Press.
Rogers, A. R. 1988 Does biology constrain culture? Am. Anthropol. 90, 819–831. (doi:10.1525/aa.1988.90.4.02a00030)
Schlag, K. H. 1998 Why imitate, and if so, how? J. Econ. Theory 78, 130–156. (doi:10.1006/jeth.1997.2347)
Schlag, K. H. 1999 Which one should I imitate? J. Math. Econ. 31, 493–522. (doi:10.1016/S0304-4068(97)00068-2)
Schotter, A. & Sopher, B. 2003 Social learning and coordination conventions in intergenerational games: an experimental study. J. Polit. Econ. 111, 498–529. (doi:10.1086/374187)
Stahl, D. O. 2000 Rule learning in symmetric normal-form games: theory and evidence. Games Econ. Behav. 32, 105–138. (doi:10.1006/game.1999.0754)
Whiten, A. & Ham, R. 1992 On the nature and evolution of imitation in the animal kingdom: reappraisal of a century of research. In Advances in the study of behavior, vol. 21 (eds P. Slater, J. Rosenblatt, C. Beer & M. Milinski), pp. 239–283. New York, NY: Academic Press.
Whiten, A., Goodall, J., McGrew, W., Nishida, T., Reynolds, V., Sugiyama, Y., Tutin, C., Wrangham, R. & Boesch, C. 1999 Cultures in chimpanzees. Nature 399, 682–685. (doi:10.1038/21415)
Whiten, A., Horner, V., Litchfield, C. & Marshall-Pescini, S. 2004 How do apes ape? Learn. Behav. 32, 36–52.
Whiten, A., Horner, V. & de Waal, F. 2005 Conformity to cultural norms of tool use in chimpanzees. Nature 437, 737–740. (doi:10.1038/nature04047)
Whiten, A., Spiteri, A., Horner, V., Bonnie, K. E., Lambeth, S. P., Schapiro, S. J. & de Waal, F. B. M. 2007 Transmission of multiple traditions within and between chimpanzee groups. Curr. Biol. 17, 1038–1043. (doi:10.1016/j.cub.2007.05.031)
Zajonc, R. B. 1965 Social facilitation. Science 149, 269–274. (doi:10.1126/science.149.3681.269)
