Arxiv:1305.4354V1 [Q-Bio.PE] 19 May 2013

Natural selection. VII. History and interpretation of kin selection theory

Steven A. Frank∗ Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697–2525 USA

Kin selection theory is a kind of causal analysis. The initial form of kin selection ascribed cause to costs, benefits, and genetic relatedness. The theory then slowly developed a deeper and more sophisticated approach to partitioning the causes of social evolution. Controversy followed because causal analysis inevitably attracts opposing views. It is always possible to separate total effects into different component causes. Alternative causal schemes emphasize different aspects of a problem, reflecting the distinct goals, interests, and biases of different perspectives. For example, group selection is a particular causal scheme with certain advantages and significant limitations. Ultimately, to use kin selection theory to analyze natural patterns and to understand the history of debates over different approaches, one must follow the underlying history of causal analysis. This article describes the history of kin selection theory, with emphasis on how the causal perspective improved through the study of key patterns of natural history, such as dispersal and sex ratio, and through a unified approach to demographic and social processes. Independent historical developments in the multivariate analysis of quantitative traits merged with the causal analysis of social evolution by kin selectionabc.

As is often the case, once a topic has become (see Box 2). I describe the history from my personal per- in vogue, its name ceases to have meaning ... spective. Because I worked actively on the subject over [1]. several decades, I perceive the history by the ways in which my own understanding changed over time. Box 3 highlights other perspectives and key citations. INTRODUCTION

Ideas are embedded in their history and language. EARLY HISTORY Hamilton’s [2] theories of inclusive fitness and kin selection are good examples. As understanding deepened, According to Darwin [3], natural selection favors traits the original ideas transformed into broader concepts of that enhance individual reproduction. Puzzles arise selection and evolutionary process. With that general- when traits reduce individual reproduction while provid- ization, the initial language that remains associated with ing aid to others. In this context, Fisher [4] discussed the the topic has become distorted. The confused language problem of warning coloration in mimicry: and haphazard use of incorrect historical context have [D]istastefulness ... is obviously capable of led to significant misunderstanding and meaningless ar- giving protection to the species as a whole, gument. through its effect upon the instinctive or ac- Current understanding transcends the initial interpre- quired responses of predators, yet since any tation of ‘kin selection,’ which attaches to some notion individual tasted would seem almost bound of similarity by descent from a recent common ancestor. to perish, it is difficult to perceive how indi- The other candidate phrases, such as ‘inclusive fitness’ vidual increments of the distasteful quality, or ‘group selection,’ also have problems. We are left with beyond the average level of the species, could a topic that derives from those antecedent notions and confer any individual advantage. clearly has useful application to those biological puzzles.

arXiv:1305.4354v1 [q-bio.PE] 19 May 2013 At the same time, the modern understanding of altruism Fisher then gave one possible solution: connects to the analysis of selection on multiple charac- [W]ith gregarious larvae the eﬀect will cer- ters, to interactions between diﬀerent species, and to the tainly be to give the increased protection es- broadest generalizations of the theory of natural selec- pecially to one particular group of larvae, tion. probably brothers and sisters of the individ- A good scholarly history of kin selection and its descen- ual attacked. The selective potency of the dants has yet to be written. Here, I give a rough histor- avoidance of brothers will of course be only ical outline in the form of a nonmathematical narrative half as great as if the individual itself were protected; against this is to be set the fact that it applies to the whole of a possibly numerous brood. ... The ideal of heroism has ∗ email: [email protected]; homepage: http://stevefrank.org a doi: 10.1111/jeb.12131 in J. Evolutionary Biology been developed among such peoples consider- b Reprint at http://stevefrank.org/reprints-pdf/13NS07.pdf ably beyond the optimum of personal advan- c Part of the Topics in Natural Selection series. See Box 1. tage, and its evolution is only to be explained, 2

in terms of known causes, by the advantage which it confers, by repute and prestige, upon Box 1. Topics in the theory of natural selection the kindred of the hero. This article is part of a series on natural selection. Although Haldane [5] restated Fisher’s argument. Williams and the theory of natural selection is simple, it remains endlessly Williams [6] presented a specific and limited analysis, contentious and difficult to apply. My goal is to make more ac- which they applied to altruism in social insects. Some cessible the concepts that are so important, yet either mostly rather vague models considered more diffuse forms of unknown or widely misunderstood. I write in a nontechni- cal style, showing the key equations and results rather than genetic similarity created by population structuring of providing full derivations or discussions of mathematical prob- groups [7, 8]. These analyses hinted at a more general lems. Boxes list technical issues and brief summaries of the concept. However, none of them expressed the deeper literature. principle in a clear and convincing way.

Box 2. Scope Hamilton’s rule

Most research in biology is empirical, yet em- Hamilton [9, 10, 11] initiated modern approaches by pirical studies rely fundamentally on theoretical expressing a rule for the increase in an altruistic behavior work for generating testable predictions and interpreting observations. Despite this interdepen- rB − C > 0. (1) dence, many empirical studies build largely on other empirical studies with little direct reference Here, C is the cost to an actor for performing the al- to relevant theory, suggesting a failure of commu- truistic behavior, B is the benefit gained by a recipient nication that may hinder scientific progress. ... of the altruistic act, and r measures the relatedness be- The density of equations in an article has a sig- tween actor and recipient. The idea is that the actor pays nificant negative impact on citation rates, with a personal cost in reduced reproduction for helping an- papers receiving 28% fewer citations overall for other individual, the recipient gains increased reproduc- each additional equation per page in the main tion from the altruistic act, and the relatedness translates text. Long, equation-dense papers tend to be the recipient’s enhanced reproduction back to the actor’s more frequently cited by other theoretical papers, valuation of that extra reproduction. When the benefit but this increase is outweighed by a sharp drop in citations from nontheoretical papers [12]. back to the actor, rB, is greater than the actor’s direct cost, C, then the behavior is favored by natural selection. This article contains no equations beyond a few summary Hamilton figured out how to measure r by analyzing expressions. I do not write for other theoreticians. I do not population genetic models. With the methods Hamilton attempt to be comprehensive. Rather, I try to evoke some used in those early papers, he was only able to give a lines of thought that I believe will be helpful to scientists who rough description for the proper measure for relatedness. want to know about the theory. A rough idea about the theory aids empirical study. It also His initial expression in terms of the genetic correlation helps to cope with the onslaught of theoretical articles. Those between actor and recipient was in the right direction theoretical articles often claim to shift the proper framing of and was later refined by Hamilton [2]. fundamental issues. The literature never seems to come to a Hamilton also defined a new and more general notion consensus. of fitness, which he called ‘inclusive fitness.’ Instead of Here, I attempt to translate a few of the key points into counting the number of offspring by an individual, in- nonmathematical summaries. Such translation necessarily clusive fitness makes a more extensive calculation of how loses essential components of understanding. Yet it seems phenotypes influence the transmission of genes from one worthwhile to express the main issues in way that can be un- generation to the next. In the inclusive fitness interpre- derstood by a wider audience. Refer to my earlier work for tation, eqn 1 describes the changes in direct and indirect mathematical aspects of the theory and for citations to the technical literature [13–19]. reproduction associated with each change in behavior. Thus, the inclusive fitness of a particular behavioral act is the indirect reproductive gain through the recipient, B, multiplied by the relatedness, r, minus the loss in di- of all genetic consequences to the original causal behavior rect reproduction, C. The relatedness, r, measures the as the key to his approach. The proper way of doing the genetic discount of substituting the reproduction of the calculation was the major theoretical advance. Hamilton recipient in place of the reproduction of the actor. [2] emphasized this causal perspective: All of the direct and indirect fitness changes are Considerations of genetical kinship can give a assigned to the actor. This assignment of the different statistical reassociation of the [fitness] effects pathways of genetic transmission to the actor associates with the individuals that cause them. all fitness changes with the behavior that caused those changes. Hamilton viewed this inclusive assignment Hamilton’s 1970 paper also greatly advanced the analysis by using Price’s equation [20]. With this method, Hamil- 3 ton established the correct measure for genetic similarity, r, between actor and recipient. That measure turned out Box 3. Literature to be the regression coefficient of the recipient’s geno- For each topic related to kin selection theory, I list a small type in relation to the actor’s genotype. That regression sample of key articles and reviews. This limited space does is the value needed to establish a proper measure of in- not allow comprehensive coverage or commentary on the par- clusive fitness. Hamilton noted that the regression was ticular articles, but should provide an entry into the extensive the exact measure only in his particular models. He ar- literature and the range of opinions. gued that the regression measure would extend, at least Several reviews follow Hamilton’s perspective [21, 23–30]. approximately, to a wide variety of other assumptions. An associated literature emphasizes the problem of sociality Roughly speaking, r translates the actor’s deviation in and sterile castes in insects, with additional commentary on gene frequency from the population average to the recip- general aspects of the theory [31–39]. ient’s deviation in gene frequency from the population Kin selection theory has been applied to a wide range average. Thus, recipient’s reproduction has consequence of biological problems. Here, I can list only a few general overviews. Those overviews give a sense of the scope but do for gene frequency change that is r multiplied by what not include many significant applications [40–47]. the same fitness increment in the actor would cause to The strongest criticisms arose from population genetics. gene frequency change [15, 21, p. 49]. The main issues concern how the specifics of genetics can vary from case to case and alter the outcome of selection, and how the full analysis of dynamics may provide an essential, deeper Comparison of different approaches perspective on evolutionary process [48–52]. Kin selection theory has a long association with debates about units and levels of selection. I give a very short listing, Already by 1975, different lines of thought on altru- because that topic is beyond my scope [53–56]. The related ism had developed. Hamilton [22, pp. 336–337] gave his topic concerning group selection does fall within my scope opinion: [57–71]. The merging of kin selection theory with quantitative ge- The usefulness of the ‘inclusive fitness’ ap- netics and multivariate analyses of selection follows various proach to social behaviour (i.e. an approach lines of development [72–78]. Advanced aspects of the the- using criteria like [rB − C > 0]) is that it is ory and new directions of theoretical development continue more general than the ‘group selection’, ‘kin to appear [79–83]. selection’, or ‘reciprocal altruism’ approaches and so provides an overview even where regression coefficients [r] and fitness effects [B In essence, I think there is almost no disagreement and C] are not easy to estimate or specify. about how evolutionary process shapes biological characters. No matter the perspective, when faced with the For Hamilton, his rule was a way to clarify biological un- same biological problem, all of the different approaches derstanding and to develop qualitative hypotheses about usually arrive at roughly the same predictions about the causes of adaptation. Hamilton did not think of his how evolution shapes characteristics. Yet, in spite of rule as replacing the way in which one did calculations that agreement, the arguments persist about whether one for models of population genetics. Indeed, he always con- should call the underlying process ‘group selection’ or sidered the classical population genetics theory as the ‘kin selection’ or ‘inclusive fitness’ or ‘population genet- primary truth. He then evaluated his own methods in ics’ or whatever else is being promoted. light of how well they could capture, in a simple way, the Clearly we need to understand more than just the pre- complexities of the underlying genetic models. Following dicted outcomes: we must also understand the underly- that historical line of thought, the subject subsequently ing causal processes. So something is at stake here. But split into two lineages. what exactly? The best way to understand that question On the biological side, Hamilton’s perspective com- is through the historical development of the subject. So pletely changed the way people approach a great variety let us continue with Hamilton’s 1975 article and subse- of key problems, ranging from social insects to parasite quent work. virulence to bacterial competition to the evolutionary his- Almost everything that one would reasonably want to tory of “individuals” to the historical tendency for an in- say about group selection in relation to inclusive fitness crease in biological complexity. Almost everyone agrees or kin selection is in the following quotes from Hamil- that this was a revolutionary change in biological thought ton [22]. I quote in full because there has been much and that it derived from Hamilton’s work. controversy and misunderstanding about these issues. It At the same time, heated debates arose about theoret- helps to read Hamilton’s perspective, given long before ical interpretations and mathematical details. We see in the current participants in the debates fully developed Hamilton’s 1975 quote the different competing phrases their views on the subject. and associated theoretical perspectives. The ongoing debates have grown ever more fierce rather than settling As against ‘group selection’ it [inclusive fit- out to a common perspective. ness] provides a useful conceptual tool where 4

no grouping is apparent—for example, it can ..., ‘relatedness’ or ‘low migration’ (which is deal with an ungrouped viscous population often the cause of relatedness in groups), or where, owing to restricted migration, an indi- else ‘assortation’, as appropriate. The term vidual’s normal neighbours and interactants ‘kin selection’ appeals most where pedigrees tend to be his genetical kindred. tend to be unbounded and interwoven, as is so often the case with humans. In other words, inclusive fitness is more general. Group selection is just a case in which the positive association, The point is that the different labels serve only to help r, arises from clearly defined aspects of groups. In cases identify the cause of association, r. The underlying evo- for which groups are not easily delineated, the same un- lutionary process should be understood with respect to derlying inclusive fitness approach still holds. Continuing rB − C > 0, even when it is difficult in practice to calculate directly the different terms in Hamilton’s inequality. Because of the way it was first explained, Hamilton favored analyzing complex biological problems the approach using inclusive fitness has of- with population genetic models, then interpreting those ten been identified with ‘kin selection’ and models in terms of the simple causal framework captured presented strictly as an alternative to ‘group by the r, B and C components of his rule. selection’ as a way of establishing altruistic social behaviour by natural selection [53, 84]. But the foregoing discussion shows that kin- USING HAMILTON’S THEORY TO SOLVE ship should be considered just one way of PROBLEMS getting positive regression [r] of genotype in the recipient, and that it is this positive regression that is vitally necessary for altru- Limitation of Hamilton’s theory in practical applications ism. Thus the inclusive-fitness concept is more general than ‘kin selection’. Hamilton [85] used his rule to develop various quali- Hamilton preferred to reserve ‘kin selection’ for cases in tative hypotheses about social insect evolution. When which the positive regression, r, comes from interactions Hamilton analyzed kin interactions in quantitative mod- between individuals that we would commonly describe els of sex ratios [86] and dispersal [87], he first made by terms of kinship, such as cousins. I will later argue his mathematical calculations with genetic or game the- against Hamilton’s use of ‘inclusive fitness’ as the ulti- ory models, then made post hoc interpretations of the mate causal view. I will end up using ‘kin selection’ as quantitative results in terms of interactions between kin. the label for a wide variety of processes, because of the Hamilton never used inclusive fitness theory or Hamil- lack of a better alternative. But in 1975, Hamilton’s view ton’s rule to solve for the quantitative phenotype favored made sense. For now, it is useful to read what Hamilton by selection. Hamilton [22, p. 337] noted that said to understand his perspective on the different fram- ings for causal process. Continuing Although correlation between interactants is necessary if altruism is to receive positive se- Haldane’s [[7]] suggestion about tribe- lection, it may well be that trying to find re- splitting can be seen in one light as a way of gression coefficients is not the best analytical increasing intergroup variance and in another approach to a particular model. Indeed, the as a way of getting positive regression in problem of formulating them exactly for sex- the population as a whole by having the ual models proves difficult (Chapter 2). One groups which happen to have most altruists recent model that makes more frequent group divide most frequently. In this case the extinction the penalty for selfishness (or lack altruists are helping true relatives. But in of altruism) has achieved rigorous and strik- the assortative-settling model it obviously ing conclusions without reference to regres- makes no difference if altruists settle with sion or relatedness [88]. But reassuringly the altruists because they are related (perhaps conclusions of both this and another similar never having parted from them) or because model [89] are of the general kind that consid- they recognize fellow altruists as such, or eration of regression leads us to expect. The settle together because of some pleiotropic regression is due to relatedness in these cases, effect of the gene on habitat preference. If we but classified by approach these were the first insist that group selection is different from working models of group selection. kin selection the term should be restricted to situations of assortation definitely not involv- In 1979, I took Hamilton’s graduate seminar at the ing kin. But it seems on the whole preferable University of Michigan. I inherited Hamilton’s interest to retain a more flexible use of terms; to use in fig wasp sex ratios and the idea that one could develop group selection where groups are clearly in models of kin interactions and sex ratios using the Price evidence and to qualify with mention of ‘kin’ equation. At that time, Hamilton was losing interest in 5 working on such problems, and I was left with the last search for mates in other figs. Hamilton guessed that seeds of his insight on this subject. As I pursued my the dispersers must often die before finding another fig empirical studies of fig wasp sex ratios, I also tried to containing potential mates. With such an extreme cost learn how one could develop more realistic models of sex of dispersal, why would an individual develop to disperse ratios with complex kin interactions. rather than stay and try to mate locally? Hamilton and May [87] realized that competition between kin may explain dispersal even when the cost is Using kin selection theory to analyze models of kin very high. A male that stays and outcompetes brothers interactions replaces the brother’s sperm with his genetically similar sperm, causing little gain in the net success of their At first, I had Hamilton’s doubts about using kin se- shared genotype. By contrast, any success of a male dis- lection theory directly to solve problems. Instead, the perser against nonrelated males provides full benefit to method in those days was to solve the problem with pop- the fitness gain of his genotype. ulation genetics, and then try to interpret the resulting Although Hamilton and May [87] recognized the kin se- predictions in terms of kin interactions. The biological in- lection processes involved, they did not use kin selection teractions from my empirical studies led to horrendously theory as a method to analyze the problem. Instead, they complex population genetic analyses [90]. With great ef- formulated a simple ecological model and then solved for fort, I could solve some of the problems. I repeatedly the evolutionarily stable strategy (ESS). After obtaining found that the post hoc interpretations in terms of kin the result, they then gave a post hoc interpretation in selection were simple and easy to understand. For ex- terms of kin interactions. ample, the value to a mother of an extra son is devalued In 1986, I solved this problem in a much simpler and by the mother’s genetic relatedness to the competitors of more general way by using kin selection theory as a her sons [91–93]. method of analysis [95]. This history of the kin selection This underlying simplicity in the exact results of popu- approach sets the context for the modern understanding lation genetics led me to try Hamilton’s suggestion about of the theory. The following is a slightly modified sum- using the Price equation to model sex ratios. Eventually, mary from Frank [15, Section 7.2]. I found a simple Price equation method to get the same results as the complex genetic approach, when analyzing commonly used assumptions. The Price equation method Hamilton & May’s ESS model also gave results that were much more general than the population genetic methods. In a Price equation analy- Hamilton and May [87] assumed that a habitat has a sis, it was easy to follow the causal processes during the large number of discrete sites. In each year, the parents derivation, and it was easy to interpret the final results die after producing offspring. Each offspring has a trait in terms of the biology. The method was like reading that determines the probability, d, that it disperses from sentences from a book in which the biological processes its natal patch. Those that stay at home, with probabil- of competition, cooperation, and kin interactions were ity 1 − d, compete for one of N available breeding sites. written in the clearest and most direct manner. Dispersers die with probability c, and with probability The generality of my solutions, derived directly in 1 − c they find a patch in which to compete for breed- terms of kin interactions, improved through a series of ing. All sites are occupied in the simple model discussed papers on sex ratios and dispersal [91–96]. At first, it here. Hamilton & May analyzed the case in which one did not make sense to twist the results for those biolog- breeding site (N = 1) is available in each patch. In an ical applications into a form that looked like Hamilton’s asexual model, Hamilton & May found the ESS dispersal rule, rB − C > 0. The results did not appear in terms fraction, d, to be of simple costs and benefits. Consequently, I gave little thought to Hamilton’s rule, and instead thought only of 1 how kin interactions shaped the evolution of interesting d∗ = , (2) 1 + c biological characters. in which c, the cost of dispersal, varies from just above zero on up to nearly one. Interestingly, even if the cost of SOLVING PROBLEMS: DISPERSAL EXAMPLE dispersal is high and the chance of surviving the dispersal phase is low, nearly half of the offspring still disperse. A model of dispersal illustrates how the application In a second model, Hamilton & May analyzed dispersal of kin selection theory changed through the 1970s and in a sexual species. Sexual reproduction raised analyti- 1980s. In the 1970s, Hamilton [97, 98] became inter- cal difficulties, because some inbreeding is likely if mating ested in wing polymorphisms among insects. For exam- takes place within the local patch. Their method of anal- ple, some of the parasitic wasps that live in figs have two ysis did not easily handle inbreeding, so they assumed male morphs. A wingless male stays within its natal fig that all males disperse before mating, and a fraction d of to compete for nearby mates. A winged male leaves to females disperse. With that setup, they found that the 6

ESS dispersal fraction for females is the phenotype is small, that local maximum is an ESS. 1 − 2c d∗ = , (3) 1 − 2c2 Box 4. Fitness expression for dispersal in which the dispersal rate is zero when c ≥ 1/2. Hamil- ton & May understood the role of kin selection. The In the dispersal model, fitness w depends on the dispersal dispersal rate is lower in the sexual compared with the phenotypes at the three different scales: a focal individual, asexual model because the genetic relatedness of com- d; the average dispersal probability in the focal individual’s ¯ petitors within the natal patch was less in the sexual patch, dp; and the population average, d, yielding model. Less kin competition reduces the benefit of dis- w d, d , d¯ = (1 − d)p (d ) + d(1 − c)p d¯ . (5) persing to avoid competition with kin, and so lowers the p p dispersal rate favored by selection. Beyond that intu- When an individual remains at home, with probability 1 − d, itive post hoc interpretation, kin selection theory played its expected success, p (dp), depends on the average dispersal no role in the analysis. fraction in its patch, dp. When an individual disperses, with probability d, its probability of landing in a new patch is 1−c, and its expected success in the new patch, p d¯, depends on Motro’s population genetic analysis the average dispersal probability in the population, d¯. The expected success expressions, p (dp) and p d¯ , can be written as Motro [99, 100, 101] analyzed a fully dynamic popula- 1 p(α) = , tion genetic model for the same problem. His complex 1 − α + d¯(1 − c) analysis, spread over three articles, covered the same bio- in which one can use either α ≡ dp or α ≡ d¯. The denominator logical assumptions as Hamilton & May, with a few minor is proportional to the number of competitors for breeding in extensions. The length and complexity of Motro’s work a patch, and so the reciprocal is proportional to the expected arose from the detailed population genetic analysis, as success per individual in that patch. opposed Hamilton & May’s relatively simple ESS meth- When analyzing the fitness maximum with respect to phe- ods. notype, one takes the derivative of w in eqn 5 with respect to Motro found that, in an asexual model, his result d. That differentiation leads to terms in which one has the agreed with Hamilton & May’s expression given in eqn 2. derivative of dp with respect d. Such a term would require Motro obtained two additional results. First, in a sexual specifying how the average phenotype of neighbors in a natal model in which the mother controls the dispersal trait of patch, dp, changes with respect to the phenotype of a focal individual, d. With kin interactions, that relationship could be offspring, the same result arises as for the asexual model, complex, because the focal individual’s phenotype, d, would as in eqn 2. By contrast, when Motro tried to match the be correlated with the average phenotype of its neighbors, dp, assumptions of Hamilton & May for offspring control of through the genetic similarity among neighbors. phenotype, he obtained The lack of clarity about the slope of neighbor (or group) phenotype with respect to the focal individual’s phenotype 1 − 4c d∗ = , (4) initially led to the abandonment of the simple maximization 1 − 4c2 method for ESS analysis when genetic relatives interacted. In 1986, when I first analyzed this dispersal problem, I pub- which differs from Hamilton & May’s result in eqn 3. lished a Price equation method of analysis. I also began to see Motro drew two conclusions. First, in sexual models, the that the simple ESS maximization method worked when one equilibrium depends on whether offspring phenotype is simply replaced the derivative of neighbor phenotype on actor controlled by the mother or by the offspring. Second, phenotype by the coefficient of relatedness. One could see the under offspring control, explicit population genetic mod- equivalence by matching up terms in a Price equation analysis els failed to confirm Hamilton & May’s result. Motro at- with what one obtained by differentiating fitness with respect tributed the mismatch to a failure of the simplified ESS to phenotype and expanding out the terms. However, in the method when compared with his exact and rigorous pop- 1980s, I was not certain about how to justify using the simple maximization approach, so I published only Price equation ulation genetic techniques. analyses. Later, in 1996, with the help of Peter Taylor’s deep understanding and elegant analysis, we published the general maximization method [103]. Analysis by kin selection

I analyzed this dispersal problem by kin selection in At the time, it was generally thought that this ESS max- order to understand the different results of Hamilton & imization method would not work with kin interactions, May and Motro [95]. Box 4 shows the expression for and so was not used to analyze problems of kin selection fitness and some technical details. (Box 4). Following Maynard Smith [102], the standard ESS approach is to find a local maximum of fitness with respect to phenotype. When the amount of genetic variation for I compared my Price equation analysis for the change 7 in fitness with the terms obtained by an ESS maximiza- With the kin selection model, we are not limited to tion approach. The potentially difficult term in the ESS one breeding site per patch, or to an outbreeding sys- analysis is the slope (derivative) of average group phe- tem. Rather, we can treat r as a parameter and express notype with respect to individual phenotype. That slope the ESS dispersal fraction in terms of the coefficient of from the ESS analysis always matched a regression coeffi- relatedness. Higher relatedness increases dispersal. The cient in the Price equation. Under common assumptions, reason is that a genotype competing with close relatives that regression coefficient is exactly the coefficient of re- gains little by winning locally against its relatives. Even latedness of an individual to its neighbors. Thus, one can a small chance of successful migration and competition often replace the slope of group phenotype with respect against nonrelatives can be favored. to individual phenotype by the coefficient of relatedness from kin selection theory, r (Box 4). All of that may sound a bit complicated, but in practice Discussion of the new kin selection methods of it is quite easy. Take the derivative of fitness with respect analysis to the individual phenotype; set to r the slope of average group phenotype with respect to individual phenotype; In retrospect, the kin selection analysis of dispersal and solve for a local maximum to obtain seems simple and obvious. Yet, at the time, Hamilton r − c had not been able to use kin selection theory to analyze d∗ = , (6) r − c2 the problems of sex ratio and dispersal that interested him. In the theoretical analysis of phenotypes, kin selec- which is the result reported in Frank [95] and discussed tion was only a post hoc method of interpreting results in more detail in Frank [15, Section 7.2]. obtained by other means. The understanding of process was sufficiently confused that Motro could write three articles criticizing the Hamilton & May model, arguing Kin selection simplifies and generalizes prior results that only a formal population genetic analysis could give the correct results. In fact, Motro simply made differ- Motro and Hamilton & May assumed that only one ent assumptions about whether the dispersal phenotype female could breed in each patch. They made that as- was controlled by the parent or the offspring and about sumption because their methods did not allow them to whether females mated once or many times. Those dis- analyze the more complicated situation in which multi- tinctions become obvious from the simpler and more gen- ple females bred in each patch. To compare eqn 6 to the eral perspective of the kin selection analysis of pheno- prior models, let us first follow that earlier assumption type. of one breeding female per patch. The solution in eqn 6 did not have any obvious con- If the organism is asexual, then in each patch the can- nection to Hamilton’s rule. Similar kin selection analyses didates for dispersal—the offspring of the single asexual of sex ratio models also did not connect in any clear way mother—are related by a coefficient of one, r = 1, and to Hamilton’s rule [91–96]. At that time, I had con- eqn 6 reduces to Hamilton & May’s result in eqn 2. If the cluded that Hamilton’s rule was not useful for solving re- organism is sexual, and offspring phenotype is controlled alistic problems. In application, nothing like Hamilton’s by the mother, then r = 1, and we again have eqn 2. rule appeared. It turned out that I was wrong about The reason r = 1 with maternal control is that it is the the generality of Hamilton’s rule, but it would take an- relatedness of the individual that controls phenotype to other ten years after 1986 to find the hidden connections. the average phenotype of its patch that matters. With Meanwhile, throughout the 1980s and early 1990s, debate only one breeding female, the mother’s phenotype and about Hamilton’s rule continued. the average phenotype in the patch are the same, and r = 1. If, by contrast, offspring control their own phenotype, PROBLEMS WITH HAMILTON’S RULE BEFORE then relatedness among competitors is r = 1/2, because 1996 competitors on a patch are outbred full siblings. With r = 1/2 in eqn 6, we recover the second result of Hamilton Hamilton’s rule became a widely used standard. The & May in eqn 3. simplicity of rB − C > 0 allowed empiricists to think Hamilton & May assumed that the mother mated only through how particular natural histories might influence once, so that siblings are related by r = 1/2. By contrast, the evolution of phenotypes and to formulate testable Motro implicitly assumed that the mother mated several hypotheses—the essential attribute for a successful the- times and that offspring in a patch were only half sibs, ory. Most biologists continue to abide by some notion so in his model r = 1/4. Using that value of r in the along the lines of Hamilton’s rule, based on its per- general solution of eqn 6 yields Motro’s result in eqn 4. ceived success in explaining empirical patterns (citations Thus, the single kin selection model of eqn 6 explains in Box 3). the parent-offspring conflict and the difference between Yet many theoreticians vigorously attacked the sim- Motro’s analysis and Hamilton & May’s model. plicity of Hamilton’s rule. It appeared easy to set up 8 scenarios in which the rule failed. Those theoreticians in the pair behaves. For example, to achieve a task, it who favored the rule replied with ever more sophisticated may be that both individuals have to contribute coop- theoretical analyses. Anyone with primarily biological eratively to that task, otherwise no gain is achieved. In interests, or lacking in years of specialized mathemati- many cases, one can think of the term y as the average cal training, gave up following the details. Clearly, the partnership phenotype of a focal individual paired with broader notions of kin selection uniquely explained di- a randomly chosen partner from the group. If we include verse aspects of natural history. Hamilton’s rule seemed this synergistic effect to the increment for fitness we have to capture the right idea, if not every possible assumption that one could conceive. w = Bxp − Cx + Dy, (7) In this section, I discuss some of the criticisms of Hamilton’s rule. I focus primarily on issues that arose be- in which D is the fitness contribution of the partnership fore 1996. In that year, my own understanding changed phenotype. Putting this expression in the Price equation with the publication of Taylor and Frank [103]. With the yields the condition for the altruistic behavior to increase help of Peter Taylor’s elegant insights, I came to see the as rB − C + ρD > 0. The term ρ is the slope of y with proper generalization of Hamilton’s rule. That general- respect to x, which measures the association between the ization united the simplicity of the original rule with a individual’s tendency to be altruistic and the tendency of new and broader scope. With the broader scope, what that individual’s partnerships to behave in concert when previously seemed like a long list of exceptions to the faced with a task that requires joint action. rule could be seen as part of an expanded way of framing This analysis suggests that Hamilton’s rule fails when problems of social evolution. Later sections discuss the there are multiplicative interactions between phenotypes, changes in understanding from 1996. This section sets because factors beyond additive costs and benefits arise the necessary background by focusing on issues before [104]. To anticipate later discussion, note that we may that time. think of y in eqn 7 as any characteristic other than the focal individual’s value for the particular behavior under study, x, and the average of that particular behavior in Extra terms in Hamilton’s rule neighbors, xp. If ρ is the slope of y on x and ρ 6= 0, then the analysis will yield a condition for the increase of the Hamilton’s rule is not sufficient if the direction of altruistic character x as rB−C+ρD > 0. It does not take change favored by selection depends on some term in much imagination to think of many different attributes, addition to rB and C. Queller [104] cited several ear- y, that could be associated with x and therefore cause lier examples in which an extra term is required. He Hamilton’s rule to fail, if one chose to think of the subject then showed the general way in which such terms arise. in this way. Before developing that notion, let us continue Suppose that the phenotype, x, is the level of an altruis- with some additional issues. tic behavior that is costly to an individual but beneficial to its neighbors. Among all the neighbors that interact within a local patch, the average level of the altruistic phenotype is xp. Social interactions, through the effects Ecological context and density dependence of these behaviors, increment fitness by w = Bx − Cx, Hamilton [10, 11] emphasized that limited migration p would tend to keep genetically related individuals near in which C is the cost to the individual for the behavior, each other. Such population viscosity could favor altru- x, and B is the benefit received by the individual from ism through the increased relatedness of neighbors. A neighbors that, on average, behave as xp. Putting that popular series of papers in the 1990s raised a problem expression into the Price equation, one finds that the al- with Hamilton’s view of population viscosity [15, 105– truistic behavior increases when rB − C > 0, which illus- 108, Section 7.1]. trates Hamilton’s rule. Here, r is the slope (regression) In a viscous population, neighbors may be related and of xp with respect to x, which measures how strongly an therefore candidates for altruism. However, those same individual’s behavior is associated with the behavior of neighbors may also be the primary competitors of a po- its neighbors. tential altruist. Two potentially offsetting effects may Following a variety of earlier studies, Queller [104] occur. First, altruism may increase the vigor and success pointed out that fitness might depend on synergistic in- of a neighbor, which provides a benefit to the actor in teractions between an individual and its neighbors with proportion to the relatedness between actor and recipi- regard to altruistic behaviors. If the benefit only accrues ent. Second, the more vigorous neighbor may take more when both the focal individual and its neighbor act in of the local resources, which imposes a cost on the orig- concert, then we need to consider a multiplicative term, inal altruistic actor. In some cases, the two factors may y = xxp. One can think of this term as describing the cancel each other. If so, altruism cannot evolve in vis- phenotype of pairs of individuals, in which the pheno- cous populations, even though neighbors may be closely typic value of the pair depends on how each individual related. 9

An early analysis by Alexander relative scales of altruism, competition, and relatedness. Clark [109] argued that “Competition between female kin The problem was expressed beautifully in a much ear- for local limiting resources may explain a male-biased sec- lier article that is rarely cited in this context [23, pp. 353, ondary sex ratio ...” To a mother, the benefit of making 376] an extra daughter is offset by the competitive effect that extra daughter may have on the mother’s other daugh- Hamilton’s development of the concept of in- ters. By contrast, sons may disperse before competing clusive fitness began with the argument that and therefore not reduce the fitness of other sons. the reproductive success of an individual or- The development of sex ratio theory in the 1980s ac- ganism cannot be measured by alone consid- counted for the interactions between dispersal, related- ering the effects on the number and quality of ness, the scale of competition, and the scale of resource direct descendants. Also involved are effects limitation [? ]reviewed in¿frank98foundations. That the- on the reproduction of genetic relatives. But, ory made clear that altruism and kin interactions could since both of these effects can only be mea- only be understood in the full ecological and demographic sured in a comparative sense, there are al- context of the behaviors under study. As Alexander [23] ways other individuals involved, and they are emphasized, the scales of altruism and relatedness must the reproductive competitors of the individ- be evaluated in relation to the scale of competition. uals and genetic elements being considered. In Hamilton’s equations they [the competitors] are the population at large, an average of Ecology and demography in relation to Hamilton’s rule the rest of the species. Hamilton’s arguments thus seem only to consider the detriments of The terms r, B and C of Hamilton’s rule depend on altruism in terms of energy expenditure and the ecological and demographic context. Any consider- risk-taking in the act itself, and to omit or ation of natural history makes that clear. Yet the sub- at least not specify the problem of subse- ject is full of “discoveries” that Hamilton’s rule fails be- quent detriment to the altruist (or its descen- cause those terms are not constants, and that ecology and dants) owing to the presence of the recipient demography matter. The tension arises because simple (or its descendants). But all of the mem- population genetic models tend to take costs and bene- bers of the species, or population, will not fits as fixed parameters rather than ecologically derived compete equally directly with any given in- variables that depend on context. dividual. Nearby individuals are more direct Similarly, relatedness turns out to be part of a much competitors. This would not affect Hamil- broader problem of how to measure costs and benefits ton’s calculations unless nearby individuals in common units. The traditional view of relatedness also have a greater likelihood of being closer translates gene frequency deviations in an actor with re- genetic relatives. That such a correlation gen- spect to gene frequency deviations in a recipient, putting erally exists is obvious, and is acknowledged all terms on the common scale of consequences for gene by Hamilton [85]. I believe this factor mod- frequency change. That makes sense. However, much ifies every consideration of whether or not, confusion arose because terms that appeared to be equiv- and how, nepotism will actually evolve. ... alent to relatedness often popped up in analyses, yet had The significance of this problem can scarcely a variety of meanings. As we continue on through the be exemplified better than by a point made history and the generalization of Hamilton’s rule, we will earlier—that if degrees of relatedness and in- see that Hamilton’s rule can only be understood within tensity of competition among individuals di- a broader approach of partitioning the causes of fitness minish together in certain, not unlikely fash- into meaningful components. Before turning to that gen- ions with distance from any given individual eralization, I continue with the criticisms of Hamilton’s in a population, then nepotism cannot evolve. rule that dominated discussion in the 1980s.

Dispersal and sex ratio FURTHER PROBLEMS: DYNAMICS

Earlier, I discussed a model of dispersal by Hamilton It is better to be vaguely right than exactly and May [87]. In that model, selection favored dispersal wrong [110]. to reduce competition against neighboring relatives and increase competition against distant, unrelated individu- The basic principles of kin selection theory and its de- als. That process of uncoupling the scales of relatedness scendant ideas always hold. Those principles are: costs and competition can be understood in light of Alexan- and benefits of phenotypes matter; statistical associa- der’s analysis. tions between actors and recipients of behaviors matter; Another line of thought independently developed the and heritability traced from the expression of phenotypes 10 to representation among descendants matters. To most near an equilibrium is sometimes called ‘statics.’ As an biologists, kin selection theory is understood as a concise exact analytical tool, kin selection theory is primarily a summary of those basic principles. tool for statics. The story differs in the theoretical literature. Once The problems of statics are widely known. Real bio- one loses sight of the biology, all that is left concerns logical systems are unlikely to be near an equilibrium. mathematical aspects of the theory. Can a particular Thus, exact analysis must consider dynamics—the full approach be used to make an exact calculation about processes of change. Worse, many problems have dif- predicted outcome? If another approach is simpler but ferent points at which the system could come to rest— limited in the scope over which it is exactly right, should alternative equilibria. If one only analyzes what happens it be discarded entirely? near an equilibrium, as in statics, one has no idea which These questions primarily concern mathematical of the alternative equilibria the system will end up near. rather than biological issues. Why should you care about Only a full dynamical analysis of change can indicate those questions if you are interested in the biology? You which of the equilibria one would expect the system to should care because the theoretical literature has not evolve toward. done you the favor of sorting out the parts that mat- Given all of the benefits of dynamics and the limita- ter to you versus the parts that do not. Instead, each tions of statics, why would anyone ever consider a theory theoretical article seemingly replies to another theoreti- based on statics? Because, to make a dynamic analysis, cal article. The real conceptual progress that does matter one has to make a lot of exact and very specific assump- remains buried under the weight of much that does not tions, otherwise one cannot do the analysis. For exam- really matter to someone interested in biological prob- ple, one usually has to specify exactly how the genetics lems. of a phenotype is controlled in order to make a complete Ultimately, sorting all of this out can only be done model of population genetics. The problem is that we do at the technical level. There is no way to argue that not know the proper assumptions to fill out the required certain technical details do not matter without showing, list for a dynamic analysis. So, in making all of the nec- technically, why they do not matter. I expressed my essary detailed assumptions for a dynamic analysis, one views about the technical issues in my book [15]. Written is left with an exact calculation that applies exactly to 15 years ago, that book still gives a good overview of the nothing. issues. Here, I continue to avoid technical discussion, and By contrast, the static analysis requires few assump- simply evoke my perspective on the main points. tions. In kin selection arguments, usually one needs to specify how phenotypes and various environmental factors influence fitness. One obtains a static analysis that Statics versus dynamics translates the biological assumptions about fitness into a prediction about how natural selection influences the Often in the writings of economists the words evolution of phenotypes. Clearly, this is a vague sort of “dynamic” and “static” are used as nothing analysis, but it is not exactly wrong, as is the full dynam- more than synonyms for good and bad, re- ical analysis. Put another way, a static analysis does not alistic and unrealistic, simple and complex. suffer the pretense of exactness. Instead, a static analysis We damn another man’s theory by terming accepts the limitations and calculates the qualitative pre- it static, and advertise our own by calling it dictions about what one expects to see in various natural dynamic. Examples of this are too plentiful settings. to require citation [111].

It would be helpful to calculate exactly how selection Comparative statics influences phenotypes. What is the predicted sex ratio under certain patterns of mating and competition? Now, an observer fresh from Mars might ex- When will individuals band together to form cooperative cusably think that the human mind, inspired groups? When will groups split apart or fail because of by experience, would start analysis with the internal conflict? The generalizations of Hamilton’s rule relatively concrete and then, as more subtle and the broader theories of kin selection only provide relations reveal themselves, proceed to the exact calculations under certain assumptions [15]. relatively abstract, that is to say, to start Roughly, the theory becomes exact when the variation from dynamic relations and then proceed to in fitness is small. However, significant amounts of varia- the working out of the static ones. But this tion are nearly universal in biology. To use kin selection has not been so in any field of scientific en- as an analytical tool, when can we assume that there is deavor whatsoever: always static theory has little variation? Lack of variation primarily arises when historically preceded dynamic theory and the a population approaches an equilibrium. Near an equi- reasons for this seem to be as obvious as librium, little further change occurs, and the population they are sound—static theory is much sim- comes nearly to rest—the condition of stasis. Analysis pler to work out; its propositions are easier 11

to prove; and it seems closer to (logical) es- A PRACTICAL METHOD FOR SOLVING sentials [112]. PROBLEMS AND RECOVERY OF HAMILTON’S RULE

Hamilton did not use kin selection theory to analyze A static analysis summarizes the major forces that po- models of phenotypes. He did pass on to his students tentially influence phenotype. However, that sort of sim- unpublished notes about how the Price equation might plified theory cannot predict the actual phenotypic value be used to study sex ratios. I obtained those notes in that one expects to observe. Instead, one should think of 1979 while attending Hamilton’s graduate seminar. I statics in terms of comparison. The dispersal model dis- modified Hamilton’s Price equation method to solve a cussed earlier provides a good example. In that model, wide variety of problems with kin interactions, includ- the predicted dispersal probability from a static kin se- ing the dispersal model that I presented earlier. Hamil- lection analysis was given in eqn 6, repeated here ton’s class notes are posted as supporting information for this article in the files WDH notes part1 SuppInfo.pdf and WDH notes part2 SuppInfo.pdf. r − c d∗ = , The Price equation method was a bit tedious. Eventu- r − c2 ally, I found the match between the standard ESS analysis of phenotypes and the Price equation analysis of kin interactions. Earlier, I briefly mentioned the enhanced in which r is the relatedness between competitors on a ESS approach for kin interactions. This section describes ∗ patch, c is the cost of dispersing, and d is the predicted that enhanced approach in more detail and considers the equilibrium dispersal rate. Certainly, no one believes that broader consequences for understanding the theory. this model will predict the actual dispersal rate in real cases. Too many factors are left out. Instead, the whole idea of the analysis is to isolate a few key processes. First steps The relatedness term, r, raises another problem. In an actual population, the relatedness in a patch would de- The Price equation is a general expression for the pend on the dispersal rate. The theory given above does change in average phenotype. To calculate the change not tell us how to analyze this dependence between relat- in phenotype, one needs to specify the relation between edness and dispersal. To simplify the analysis, we could phenotype and fitness. For example, Box 4 shows the set relatedness, r, as a given parameter. The model then relation between dispersal and fitness. When one puts predicts that as the relatedness among competitors on that expression for fitness into the Price equation, one a patch increases, the observed dispersal rate increases. obtains a variety of terms. Each term describes the con- That sort of prediction is the method of comparative stat- tribution of a component of fitness to the overall change ics. in phenotype [95]. In a typical problem with kin interactions, different Comparative statics begins with the assumption that a components of fitness oppose each other. Some favor the dynamic analysis following the joint dependence between increase in the phenotype, others favor the decrease in dispersal and relatedness would, ideally, be preferable. the phenotype. For example, dispersal imposes a direct However, one cannot easily achieve that ideal, because cost on an individual, because the increased risk during dynamical analysis requires specific assumptions about dispersal increases the chance of death before reproduc- a variety of processes for which we do not have information. By contrast, dispersal benefits the reproduction tion. Instead, one admits that one does not know enough of neighbors by reducing the competition experienced by to predict dynamics, and so the analysis should empha- those neighbors. Together, the various terms in the Price size statics. A static analysis is based on fewer, simpler equation analysis define the different components of fit- assumptions. ness. The comparative static analysis isolates key causal pro- I found that I could avoid using the Price equation by cesses in a direct way. For example, the dispersal model directly calculating the value of the phenotype that max- makes an interesting comparative prediction: as the re- imized fitness. The maximization approach came from latedness in a patch increases, the predicted dispersal analyses of ESS phenotypes [102]. The idea is simply rate increases. The ideal empirical test identifies natural that, at equilibrium with regard to selection, the favored or laboratory settings in which relatedness changes and phenotype must have fitness at least as great as any one can measure the associated change in the amount of slightly different phenotype. If that were not the case, dispersal. The structure is: measure the change in the then the nearby phenotypes with higher fitness would in- putative cause and the change in the outcome. If the crease, and the previous candidate phenotype was in fact direction of change in the outcome repeatedly tends to not an equilibrium with respect to nearby alternatives. follow the predicted direction of change, then one is on To find a local maximum, one uses the standard calcu- to something. lus approach. Write a function that relates phenotypes 12 to fitness. Take the derivative (change) of fitness with calculus method provided important advantages in un- respect to phenotype. That derivative describes whether derstanding the evolution of phenotypes in social inter- fitness increases or decreases with respect to a change in actions. Thus, I needed to develop the calculus approach phenotype. One assumes that all members of the popu- into a publishable form. lation have the same phenotype, and then analyzes the fitness change for a small fraction of the population that has a slightly deviant phenotype. In other words, little Generalized Hamilton’s rule as a marginal value variation exists. expression A local maximum can only occur when the fitness neither increases nor decreases for the deviant phenotype. To develop the calculus method, I approached Peter If the derivative were increasing, then larger phenotypes Taylor in 1995. Taylor found a way to connect the would be favored. If the derivative were decreasing, then calculus method to Hamilton’s rule. I had abandoned smaller phenotypes would be favored. When the deriva- Hamilton’s rule, because nothing like Hamilton’s rule tive is zero, then a balance in forces has been achieved, had appeared in my numerous studies of different pheno- and no change is favored. When at a balance, one checks types. Suddenly, all of my prior analyses could be under- larger phenotypic deviations to make certain that fitness stood more deeply by their connection to the generalized would indeed decrease if the phenotypic value changed. Hamilton’s rule that came from our work [103]. If so, then the balance point, where the derivative is zero, The calculus method automatically separates out the is a local maximum and an ESS. Maximization is just a various causes of fitness into the three aspects of Hamil- mathematical trick for finding an equilibrium. An equi- ton’s rule. First, all of the focal individual’s phenotypic librium is a point at which stasis occurs, leading to a effects on its own fitness combine into one term. In the static analysis. calculus analysis, that term is the small change in direct When applying the maximization method, one ends up fitness for a given small change in the focal individual’s with terms that are the change in partner phenotype with phenotype, holding constant the phenotype of other in- respect to the change in the focal individual’s phenotype. dividuals. As I mentioned earlier, that kind of term was initially The small changes are usually described as ‘marginal thought to be difficult to interpret when kin interactions changes,’ matching the classical usage of marginal values occur. When partners are genetically related, then the in economic analysis. The first term is thus the marginal association between partner phenotype and focal individ- effect of an individual’s phenotype on its own fitness, ual phenotype depends on the degree of genetic similar- holding constant all other effects. This marginal effect ity. Early analysts recognized that relatedness matters matches exactly the cost term in Hamilton’s rule. One [86, 87], but abandoned the method because it was not traditionally defines this effect as a cost in such models, clear exactly how genetic similarity would translate into because in the standard case of altruism, one analyzes the relation between partner phenotype and focal indi- a phenotype that directly lowers the actor’s fitness—the vidual phenotype that arises in the calculus method of cost. Mathematically, there is no need for this direct differentiation. effect to be negative and costly, and in some models it is I matched the Price equation terms to the ESS maxi- not. But we retain the traditional usage and label this mization method. I could see that the difficult term for term ‘the cost.’ Because the calculus approach analyzes the change in partner phenotype with respect to individ- small changes, that method automatically gives us the ual phenotype was like the regression term that Hamilton marginal cost. [2] had come to use as the coefficient of relatedness in his The second component is the marginal effect of small theory. That equivalence meant that one could use the changes in an individual’s phenotype on the fitness of much simpler maximization trick and the power of cal- social partners. Traditionally, the effect of the actor on culus to analyze fitness and find ESS phenotypes. One recipients is called the ‘benefit.’ Thus, this second effect just had to replace the change in partner phenotype with is the marginal benefit component. respect to focal individual phenotype by the coefficient The third component measures the association be- of relatedness between them. tween the actor’s phenotype and the phenotype or genotype of social partners. The exact measure depends on Before 1995, I only published Price equation analy- various issues [15]. Here, simply note that the calcu- ses. That method, although a bit tedious, easily gave lus approach automatically weights any marginal benefit solutions to many problems of dispersal, sex ratio, and components by the association between the actor and tragedy of the commons models for sociality. I published recipient. That association matches the coefficient of re- a series of articles between 1985 and 1994 on those top- latedness from Hamilton’s rule, although in a generalized ics, as summarized in Frank [15]. I did not publish the form. maximization method without use of the Price equation, An equilibrium can occur only when selection does not because the Price equation had a prior history in the liter- favor a change in phenotype. Thus, at equilibrium, the ature and seemed like a more defensible approach. How- marginal Hamilton’s rule equals zero and has the form ever, in Frank [113], I analyzed a more complex problem, for which avoiding the Price equation and using only the rBm − Cm = 0, (8) 13 in which I use the m subscripts to emphasize that the in which the marginal cost of increased dispersal, Cm, is benefit and cost terms are marginal values [103]. The the cost of dispersal, c, divided by the level of competition marginal values will change with changing phenotypic on a patch for a breeding spot, 1 − cd (Box 5). The values and with changing ecological and demographic marginal benefit is context. Thus, this analysis makes clear that Hamil- ton’s rule arises from context-dependent benefit and cost terms. Others had noted the context dependence of those Box 5. Analysis of dispersal model terms [21, 64, 73]. But Taylor and Frank [103] was the first approach that easily found the ESS phenotype and For the marginal cost in eqn 9, we obtain the competition term at the same time showed the underlying conceptual ba- in the denominator by counting the number of individuals on sis by expressing a generalized Hamilton’s rule. I use a patch competing for each breeding slot. That number is proportional to the fraction of individuals that do not disperse the term ‘generalized’ because the analysis extended the and stay at home to compete, 1 − d, plus the fraction of kinds and complexity of social interaction that could be individuals that disperse and become immigrants into each studied and the interpretation of relatedness. It also be- patch. The immigrant fraction is the fraction that disperse, d, came clear how to deal with multiple social processes si- multiplied by the probability of surviving the dispersal phase, multaneously, leading to multiple marginal cost and ben- 1 − c. Putting the pieces together for the denominator, which efit terms. is an expression proportional to the number of competitors The Taylor & Frank method leaves out much mathe- on a patch, we obtain 1 − d + d(1 − c) = 1 − cd. matical detail. That simplicity allows one to start with For the marginal benefit in eqn 10, the denominator mea- an expression for how different phenotypes and other fac- sures the intensity of competition between pairs of individuals in the patch for a given level of dispersal and cost of dispersal. tors influence fitness, take a standard type of derivative Roughly speaking, intensity of competition can be measured from calculus, and evaluate an equilibrium ESS outcome by the probability that two individuals will compete for the favored by selection. This method of analysis automati- same breeding spot. From the previous paragraph, the in- cally gives the form for the equilibrium ESS condition as tensity of competition for a breeding spot is proportional to the marginal Hamilton’s rule in eqn 8. The ESS provides 1 − cd. Thus, the pairwise competition for a spot is propor- the basis for comparative statics, in which one can see tional to (1 − cd)2. clearly how the the predicted phenotype changes with respect to various social, ecological and demographic causes. To achieve that simplicity, one ignores dynamics, cer- 1 − d tain details of genetics, and the developmental complex- Bm = , (10) ities that connect genotype to phenotype. Most of those (1 − cd)2 complications do not alter the mathematical results with in which the numerator, 1 − d, describes how increased respect to searching for equilibrium values. That magi- dispersal reduces the competition experienced by neigh- cal simplification arises because, whenever the amount of bors. The denominator adjusts the benefit of reduced variation in the population is small, most of the complex- competition by the intensity of competition between pairs ities become negligible in size, and the method correctly of individuals for each breeding spot, (1 − cd)2 (Box 5). identifies the outcome of selection. Although it is pos- Putting these terms in the marginal Hamilton’s rule of sible to include additional complexity, the method sim- eqn 8 leads to the general solution for dispersal in eqn 6. plifies by design, following the precepts of comparative The marginal cost and benefit expressions are essen- statics. tially impossible to obtain either intuitively, by thinking about the dispersal problem, or by inspecting the mathematical expression for fitness given in Box 4. It is Marginal Hamilton’s rule analysis of dispersal only with the maximization technique that the separate marginal cost and benefit expressions can be found. The marginal Hamilton’s rule in eqn 8 evaluates a Once one has the marginal expressions, one can study problem by the marginal costs and benefits of a pheno- them to learn how the simple biological assumptions type. Consider the dispersal model of Hamilton and May translate into cost and benefit effects on different com- [87] described earlier. If we start with the fitness expres- ponents of fitness. As the phenotype of interest changes, sion in Box 4, take the derivative of fitness with respect the costs and benefits change through complex social, to the variant phenotype of a focal individual, and collect demographic and ecological interactions. There will es- together the terms, we end up with the cost, benefit, and sentially never be fixed costs and benefits that can be relatedness components of the marginal Hamilton’s rule plugged into some equation. Instead, one must extract [15, Section 7.2]. those costs and benefits from the biological assumptions Following that method and the analysis of the problem. Historically, people have tended to take Hamilton’s c rule as an expression based on fixed costs and bene- C = , (9) m 1 − cd fits. That history arose because of the population genetic 14 modeling that was associated with the early evaluation of must be discounted by the low expected reproductive the theory. In a population genetic model, the tendency value of old age when compared with the greater repro- has always been to set costs and benefits as parameters ductive value associated with altruistic benefits given to associated with different genotypes. Frequency and den- a younger individual. sity dependence, and other biological interactions, were The problem of reproductive value had always been a considered to be distinct from the costs and and benefits, part of kin selection theory. Hamilton [85] clearly noted as if those costs and benefits were not an outcome of the that different individuals in a social interaction may con- biology. That approach misled people to think of Hamil- tribute differently to the future of the population. In ton’s rule as an expression that translated fixed costs and Hamilton’s analysis, when calculating the benefit of al- benefits into a conclusion about evolutionary change. truism toward different individuals, one must weight each The Taylor and Frank [103] method helps because it individual by its genetic relatedness to the actor and by gives a way to translate biological assumptions about its relative reproductive value. For example, helping a phenotypes and fitness into the separate marginal cost young cousin in the prime of life provides greater net and benefit terms needed to evaluate Hamilton’s rule. benefit than the same help given to a much older sibling One must keep in mind that Hamilton’s rule is, in prac- near the end of life. Relatedness alone is not sufficient. tice, an expression that derives particular meaning from context. Without context, the marginal cost and benefit terms are abstract. That abstraction is the primary Prior status of the theory strength of Hamilton’s rule, allowing it to apply univer- sally. At the same time, abstraction often causes confu- Although the importance of reproductive value was un- sion, because the power of abstraction requires a certain derstood, the theory was in an odd state before 1996. The degree of consideration to understand and apply properly general approach used by Hamilton and most followers [17]. was simply to attach a reproductive value weighting to each component of fitness. If an actor and recipient had different reproductive values, then an extended Hamil- TAYLOR’S INSIGHT INTO REPRODUCTIVE ton’s rule might be rBv − Cv > 0, in which v and v VALUE, DEMOGRAPHY AND LIFE HISTORY r a r a are the reproductive value weightings for recipient and actor. If, for example, the recipient is a young individual In the recent literature, the methods of Taylor and near prime reproductive age, and the actor is an old in- Frank [103] are primarily used to analyze specific models dividual near the end of life, then vr is much larger than by the marginal version of Hamilton’s rule (eqn 8). As va, and the old individual may be favored to express a long as one accepts the approach of comparative statics, costly altruistic act toward the young individual even if generating testable hypotheses follows a simple proce- the recipient is only distantly related to the actor. dure. First, express the different phenotypes and ecologi- Simply attaching reproductive value terms to fitness cal factors that affect fitness in an equation that describes components is correct but often not helpful. It is not the natural history assumptions. Second, maximize the helpful, because it provides no guidance about how to fitness expression to obtain the predictions of compara- find the right valuations in realistic problems. tive statics. Separately, a complete theory of life history had devel- That mathematical method solves what had previ- oped [114]. That theory provided clear guidelines for how ously been complex and often beyond analysis. It also to compare different classes of individuals and different provided a conceptual advance by developing my ear- components of fitness with respect to their reproductive lier Price equation and maximization approaches into values. The relative reproductive value weightings pre- the marginal Hamilton’s rule expression in eqn 8. The dict how life history characters, such as relative invest- marginal Hamilton’s rule clarified understanding about ment in reproduction and survival, may change with age, how selection works and how to interpret a wide variety condition, local ecological factors, and the demographic of problems in natural history [15]. structure of the population. That well developed theory of life history had many successes in explaining major aspects of organismal physiology and behavior. The importance of reproductive value Actual problems of natural history often required combining the relative reproductive valuations with the role From my point of view, however, the great advance in of kin interactions. However, connecting life history Taylor and Frank [103] came from Peter Taylor’s insight theory to kin selection analysis had not been achieved about reproductive value in models of kin selection. Re- in a generally useful way. When studying the the- productive value concerns the relative contribution of an ory of such problems, one could come up with vari- individual to the future of the population. For example, ous special approaches or complicated analyses [? ]e.g., older individuals may have a lower expectation of future ¿frank87demography. However, if one cannot apply a reproduction before death than do younger individuals. theory in a clear and simple way, one does not truly un- The benefit of altruism provided to an older individual derstand the theory at a deep level. In this case, the basic 15 issue was combining reproductive value with the various [103], I suddenly realized the deep connections between aspects of phenotypic evolution that arose in social set- the understanding of marginal valuation and reproduc- tings. But it was not known how to do that in an easy tive valuation in sociality and the solving of actual prob- way. lems of natural history. That new perspective caused me to take up the study of the general theory. I wanted to evaluate the status of the deeper principles in relation to Combining kin selection and life history the way in which one analyzed problems.

Peter Taylor saw all of that. He figured out how to attach our general kin selection approach with life his- Unresolved issues tory analysis. The extended method accounted for the full context of ecological and demographic factors that I soon found that Taylor and Frank [103] led to many interact with social phenotypes [103]. The method easily key points that remained vague or unresolved. Two is- translates natural history into analysis. As before, one sues stood out. First, the meaning of the relatedness specified a problem by how phenotypes affect fitness. In coefficient seemed to be confused. Variations in interpre- addition, one now also specified the different classes of tation arose for different kinds of problems. The original individuals involved and the distinct components of fit- simplicity of Hamilton’s genetic kinship theory had fi- ness. For example, there could be older individuals and nally sunk. There were many earlier hints of problems, younger individuals. The age groups would be embedded but with the expanded scope of the theory, the contra- in the demographic structure of birth and death rates, dictions became too numerous to ignore. which could depend on the social phenotypes. Each class The second issue concerned the connections between could be either actor or recipient or both, in the sense the models of phenotype in sociality and the developing that the phenotype of interest could be expressed by dif- theories of multivariate selection in quantitative genetics. ferent kinds of individuals and could affect different kinds Those theories of multivariate selection had advanced of individuals. Many realistic problems of sociality have greatly since the classic article by Lande and Arnold this sort of class structure. [115]. Following Lande & Arnold, many studies consid- All of this may sound complicated. But the beauty of ered how to partition the causes of selection among the Taylor’s insight is that the crude maximization method different characters that affected fitness and how to ana- that I had been using was transformed into a simple lyze expanded notions of heritability. Those advances in method that combined analysis of sociality, generalized quantitative genetics seemed to be closely related to the notions of kin relations through correlations between phe- problems of analyzing social evolution. Some steps had notypes, and the full power of life history analysis. been made to connect those advances to sociality, yet the deeper relations remained unclear. Following up on what I learned by working with Pe- Opportunity for synthesis ter Taylor, I set out to understand those two issues: the generalization of relatedness and the causal analysis of Up to 1996, I had relatively little interest in theories fitness and heritability. I eventually came across a sur- of kin selection, inclusive fitness, and group selection as prising number of problems that I had never understood separate subjects worthy of study. Instead, I had thought or had not even realized that I was ignoring in my many of those theories as tools that one used to solve problems prior analyses of kin interactions. of natural history, in the sense of comparative statics. I had developed those solutions into testable hypotheses about interesting phenotypes. My collaboration with QUELLER’S INSIGHT AND THE TRUE Peter Taylor opened my eyes to the deep problems of MEANING OF HAMILTON’S RULE selection that had been latent in the subject. In particular, our new approach brought out the proper In pursuit of unresolved issues, I soon came to the formulations for marginal valuation and reproductive val- key articles by Queller [64, 73]. Queller developed the uation. Of course, that was, in a sense, not new, because idea that Hamilton’s analysis had always been about the Hamilton had all the right ideas. But, just as Hamilton causal interpretation of fitness components. Hamilton could not use kin selection theory to solve the problems separated the total fitness effect of a social act into costs, that most interested him, such as dispersal and sex ra- benefits, and relatedness so that one could reason more tio, he also could not use life history and reproductive clearly about how selection shaped behaviors. Hamilton value to solve problems of sociality embedded in the nat- never presented the theory as an alternative to classical ural complexities of demography and multiple classes of methods. Instead, he was after causal decomposition. social interactants. Much of the literature through the 1980s lost sight of I had always believed that if one could not use a theory this primary emphasis on causal decomposition. Con- to solve problems, then one had only a superficial un- troversies about population genetics versus kin selection derstanding of deeper concepts. With Taylor and Frank were ultimately about the tension between dynamics and 16 comparative statics. Full dynamical analyses with de- regression expressions. That complexity led me to aban- tailed assumptions about genetics provide exact theories don the direct use of such regressions in my earlier Price that perhaps apply to no real cases. Statics applies ap- equation models and instead to develop the maximiza- proximately to all situations, but perhaps not exactly to tion technique. The maximization technique requires the any particular case. additional assumption of limited variation. In return for In my own comparative statics analyses of dispersal, that assumption, one obtains a simple and powerful com- sex ratios, and various social traits, I focused on the parative statics tool. In practice, one trades the concep- solutions for phenotypes in terms of biological assump- tually powerful regression modeling and causal analysis tions. I did not try to analyze the problems with respect for the analytically powerful maximization method and to the sort of causal decomposition into costs and ben- comparative statics. efits that Hamilton had emphasized. Later I came to Once I had the first formal expression of the maxi- understand the power of the marginal Hamilton’s rule. mization technique from Taylor and Frank [103], I eval- I then began to understand my past comparative stat- uated the advantages and disadvantages of the various ics models in terms of Hamilton’s partition into causal approaches. I could see that one had to unite Queller’s components. In particular, the marginal Hamilton’s rule causal approach through regression and path analy- automatically separated fitness components into direct sis with the analytical power of the maximization and effects (costs) and indirect effects (benefits) weighted by marginal value techniques. Causal analysis ties back to relatedness [103]. Hamilton’s original goal of separating fitness and trans- Queller [64, 73] had also derived an expression similar mission into components in order to reason more clearly to the marginal Hamilton’s rule. Queller first partitioned about how selection and genetics shape social traits. fitness into components according to a regression model. Marginal values and comparative statics are also neces- His regression terms included the phenotype of the actor sary, to provide the tools to analyze actual problems. Put and the phenotype of the recipient—or the genotypes de- another way, causal analysis provides the foundations for pending on the analysis. Queller considered the different reasoning about complex problems, and marginal value phenotypes within the general context of quantitative ge- analysis provides the techniques for applying that rea- netic analyses of multivariate selection [115]. That con- soning to particular cases. nection to multivariate quantitative genetics eventually Taylor and Frank [103] came close to uniting the opened the way to a broader interpretation of the com- quantitative genetic and causal models given by Queller ponents of fitness and the components of heritability. [64, 73] with the calculus techniques for comparative stat- Queller put his multivariate regression expression for ics. However, once that loose connection was made, I fitness into the Price equation to obtain a multiple regres- soon began to see the next level of unsolved problems. In sion form of Hamilton’s rule. One term is the effect of an essence, the overly simple notions of an actor and a re- actor’s phenotype on its own fitness, holding constant the cipient and of a cost and benefit were too limiting. Those phenotype of its neighbors (cost). The other term is the restrictive assumptions limited general understanding of effect of the neighbors’ phenotype on the actor’s fitness, causes and the analysis of particular cases. To move holding constant the actor’s phenotype (benefit). These ahead, one had to think through various types of nat- cost and benefit terms arise as partial regression coeffi- ural history and to work out how to separate the many cients in the partitioning of fitness into components. In causes into distinct components. That causal decomposi- the Price equation, one automatically obtains a weighting tion was Hamilton’s original goal. But one no longer had of the benefit term by the regression of neighbor pheno- to be confined to the oversimplified abstractions in which type on actor phenotype (or genotype). That weighting Hamilton originally worked to show how causal decom- provides a measure of relatedness. Thus, the condition position might be done. Instead, the time had come for for the increase of an altruistic behavior is rB − C > 0, a more general approach. where each term directly corresponds to a regression co- The class structured modeling for social interactions efficient. introduced in Taylor and Frank [103] suggested the next Queller’s partial regression terms for cost and benefit steps. Taylor & Frank gave examples with multiple are similar to the cost and benefit terms of the marginal classes in social interactions. Those sorts of multiclass Hamilton’s rule. However, two differences are important. problems lead to more general causal decompositions First, Queller’s approach emphasizes the causal analy- with multiple actors and recipients and, more generally, sis of components. He explicitly related the regression with a broader way to reason about the causal compo- model of fitness to a path analysis model, which high- nents of realistic problems. Those multiclass problems lights a causal interpretation of the different terms in the also brought out the challenge of tracing the distinct regression. pathways of transmission and heritability that determine Second, the regression terms, although more general what fraction of the change by selection transmits to the and potentially interpreted by causal analysis, provide a future population. poor method for analysis of actual problems in terms of The way forward was to work carefully with causal comparative statics. Even simple models of dispersal and decompositions of fitness and the transmission of char- sex ratio lead to complex and essentially uninterpretable acters, to use the Price equation as a formal tool to 17 keep track of everything in an exact evolutionary anal- the direction of change in the altruistic phenotype x of ysis, and then to simplify as needed to obtain practical the first species. tools for comparative statics. Although that may sound The first species cannot accrue inclusive fitness benefits complicated, it turns out to be a natural extension of by helping the second species. An inclusive fitness ben- rB − C > 0. One simply needs additional causal terms efit is the indirect transmission of the actor’s genotype to match realistic problems, and to interpret the various through the recipient of the actor’s altruistic behavior. terms more broadly than Hamilton did. Members of another species cannot carry genotypic differences that influence the evolution of traits in the focal species. Inclusive fitness has no meaning in relation to THE PROPER GENERALIZATION OF KIN altruism between species. Nonetheless, we end up with SELECTION THEORY Hamilton’s rule, as follows. Focus on an individual in the first species with altru- Hamilton’s theory is ultimately about causal interpre- istic phenotype, x. That altruism reduces fitness by Cx. tation. The proper generalization arises from a clearer The focal individual may receive benefits from the al- understanding of causal decomposition. With regard to truism of the other species. Suppose that the particular the causal analysis of relatedness, Hamilton [2, 22] had social partners from the other species for the focal indi- already given up on the limited interpretation of the the- vidual have phenotype y, which adds By to the fitness of ory with regard to pedigree relations and classic notions the focal individual. The combination of gains and losses of kinship. Instead, he realized that the theory could be for these effects causes an increase in fitness to the focal extended to deal with genetic similarities no matter how individual when those similarities arise. Causally, selection must be in- different to the process that generates genetic similarity. By − Cx > 0. Selection can only act on the current patterns of genetic variation and the current processes that influence fitness. Now comes the key step: the association between the al- Subsequent generalizations continued to refine the truistic behaviors of partners from the two species. Sup- causal interpretation of selection. The theory naturally pose the slope (regression) of y relative to x is r. Then, transformed from its initial emphasis on identity by de- in evaluating fitness changes, we can use y = rx, because scent and lineal kin relations to statistical associations the altruistic level of the focal individual, x, predicts the of genotypes and then to broader aspects of correlated associated value of y. The translation between x and y characters in social interactions. is the regression coefficient, r. Using y = rx, the fitness change is positive when

An example: interspeciﬁc altruism Brx − Cx > 0,

My study of altruism between species taught me that which is the same as kin selection must be thought of as part of a wider set of problems [116]. I had asked the typical altruism question: when is an individual favored to help another at a cost rB − C > 0. to itself? In this case, the problem concerned whether an individual of one species could be favored to help an in- Is that Hamilton’s rule? If one abides by inclusive fit- dividual of another species. Clearly, the traditional view ness and the traditional view established by Hamilton, of genetic kinship could not be involved. Members of dif- then the answer is no. If one recognizes that the tra- ferent species that do not interbreed cannot be cousins ditional view came into being before we understood the or other types of related kin. broader analysis of characters and the general role of cor- I set up the interaction between species as a variant relations, then the answer is maybe. In the latter case, of Hamilton’s standard model. In this case, I evaluated we must figure out the broader context and its relations whether altruistic behavior by one species toward a sec- to traditional models of social evolution, and then make a ond species can increase by selection. decision about how to understand the full range of social Individuals of the first species have phenotype x, the characters. level of help they provide to individuals of the other After Taylor and Frank [103], I followed up by trying species. The altruistic phenotype directly reduces the fit- to apply the new theory to various problems of natu- ness of actors from the first species by Cx. Individuals of ral history. Several conceptual limitations became clear. the other species have phenotype y, the level of help they The most important problem concerned the meaning of provide to individuals of the first species. That altruis- relatedness. A second associated issue concerned the in- tic phenotype of the second species directly enhances the terpretation of inclusive fitness. The remainder of this fitness of recipients of the first species by By. The goal is section discusses relatedness. The following section takes to evaluate how these behavioral interactions determine up inclusive fitness. 18

Two types of relatedness: social partners and respect to the phenotype, x, is proportional to rB − C, transmission as shown in the previous section. Evolutionary change depends on the heritability of the phenotype, x. This fol- Queller’s [64, 73] quantitative genetic approach linked lows the typical combination in genetics: selection deter- kin selection to Lande & Arnold’s [115] general analysis mines relative reproduction, and heritability determines of multivariate selection. In the traditional multivariate the fraction of selective change that is transmitted to the approach, one usually thinks of two different phenotypes future. as traits present in each individual. That same concep- The heritability is not particularly important in this tual approach to multivariate analysis can also evaluate case. Suppose, for example, that the heritability is τ = social situations, in which one phenotype is present in one Rh. Here, R is the genetic kinship between the focal type of individual, and the other phenotype is present in individual and its descendants, and h is the fraction of the the social partners of the first type [14]. phenotypic variability attributed to genes. In a typical The two interacting types of individuals may be play- parent-offspring example, R = 1/2. If we compared an ers in a game, members of different species, members of individual to a niece through a full sibling, then typically a family, or any other combination. For any of those in- R = 1/4. The condition for x to increase must include teractions, we can often evaluate the consequences in the the heritability. Thus, the condition is same way that I described for the interaction between two different species. The focal individual has a pheno- (rB − C) τ > 0, type with direct cost Cx. That focal individual also has social partners that influence the focal individual’s fit- which describes the transmitted fraction, τ = Rh, of the ness by a factor By, where y is the average phenotype of selective gain, rB − C. If the trait is heritable, τ > 0, the social partners. These two phenotypes, x and y, lead then this condition is the same as rB − C > 0. Here, to a multivariate analysis of selection that depends on distinguishing the similarity between social partners, r, the correlation between characters. In this case, the two and transmission in proportion to R, did not change the characters happen to be in different individuals. But the result. analysis is essentially the same as a standard multivariate selection problem. The correlation between the focal individual’s pheno- Distinguishing the types of relatedness type, x, and its partners’ phenotype, y, is best expressed as a regression coefficient r of y on x. If partners are In many cases, one must distinguish the role of social genetic kin, the r will be a kind of genetic kinship co- partners from the role of transmission. Consider the fol- efficient. If the partners are genetically similar but not lowing example [14, Fig. 9]. There are two classes of by traditional lines of kinship, the r is still similar to the individuals. Individuals of the active class express an al- form of the relatedness coefficient that Hamilton [2] in- truistic phenotype, x, with cost C. A focal individual of troduced as the key to his theory, which did not depend the active class has social partners from the active class on traditional kinship ties. that express an average level of altruism, y, with benefit The partners may have correlated phenotypes, but be B to the focal individual. The fitness increment for a genetically uncorrelated. If so, our focal individual still focal individual in this active class with respect to the gains by the beneficial social phenotype of its partners. altruistic behaviors is w1 = By − Cx. The magnitude of that gain is in proportion to the same Each individual from the active class also has a sin- regression coefficient, r, in which the association is purely gle partner from a second, inactive class. Members of phenotypic. But we must separate two aspects: selection that second class do not express an altruistic phenotype. and transmission. That separation forms a standard part They may, for example, be relatives that receive care but of multivariate quantitative genetics. Selection is the dif- do not give care. The inactive partner gains a benefit, Bˆ, ferential success within a period, such as a behavioral from its active partner’s altruism, x, so its fitness incre- episode or a generation. Transmission is the fidelity by ment from altruism is w2 = Bxˆ . To simplify, we ignore which selected traits are transmitted to the future, the the potentially different reproductive values of the two heritability. classes. An inactive partner might, for example, be an In this model, we can think of two distinct classes of offspring or a nondescendant kin. individuals. The focal individual is both an actor and a From the previous section, the direct fitness effect on recipient for its own phenotype, x. It is an actor because class one is rB − C, where r is the regression of y on x. it expresses the phenotype; it is a recipient because its If the phenotypic association between an actor and its success is influenced by the same phenotype, x. The part- social partners, r, arises from genetic similarity, then r is ners are actors, because they express the phenotype, y. a classic kin selection coefficient of relatedness. However, But they are not recipients, because we have not specified nothing in the model requires genetic similarity. Here, r that either trait, x or y, affects the partners’ own success. only has to do with phenotypic similarity, because rB−C The focal individual is a recipient of the phenotype, y. is the fitness effect separated from aspects of transmission The total fitness increment on the focal individual with to the future. 19

The class one individuals transmit their phenotypes to Kin selection versus correlated selection the future in proportion to τ1, the heritability of their altruistic trait. Thus, the total direct transmitted com- I have emphasized a causal analysis of selection rather ponent by class one is proportional to (rB − C)τ1, which than a purely kinship analysis of selection. In the broader matches the result of the prior section. causal perspective, the two key factors are transmission ˆ In this case, we also have the beneficial effect, B, of the and selection. Transmission of social characters always class one individuals on their inactive partners from class depends on aspects of shared genotype, or at least on ˆ two. The benefit, B, describes the fitness effect separate shared heritable traits. For selection, correlated social from aspects of transmission to the future. For exam- phenotypes play the key role. Such correlations may arise ple, if the class two individuals are genetically unrelated by kinship, by shared genotype through processes other to their actor partners, then the enhanced fitness of the than kinship, or by associations through processes other class two individuals has no effect on the evolution of al- than shared genotype. truism, because those class two recipients are genetically With regard to correlated social phenotypes, it may unrelated to the actors. seem tempting to define the genetic associations as the In general, we may specify the heritability of a class one proper limited domain for kin selection theory. Hamilton actor’s phenotype, x, through its beneficial effect on its developed his theory by first analyzing classical pedigree class two partner, as τ2. For example, the class two part- kinship. He then broadened his scope to include shared ner may produce nieces and nephews of the actor, and genotype through processes other than kinship. But he τ2 would be the relatedness of the actor to its nieces and did not expand his theory to the general analysis of cor- nephews. But the more general interpretation is impor- related traits by processes other than shared genotype. tant: τ2 is the heritability, or transmitted information, There can never be a final resolution with regard to of the class one phenotype through its beneficial fitness the proper domain for kin selection theory. Ultimately, effect on its partner of class two. The net direct effect of ˆ subjective factors determine how different people choose selection and transmission on class two is Bτ2. to split domains and attach labels. If someone chooses Combining the class one and class two effects, the total to associate genetic correlations with ‘kin selection’ and transmitted consequence of the actor’s phenotype favors nongenetic associations with ‘correlated selection,’ that an increase in altruism when is fine as long as the choice is expressed clearly and understood as a subjective choice. (rB − C)τ1 + Bτˆ 2 > 0. (11) In the absence of analyzing particular problems, I This expression combines the role of correlated pheno- would be inclined to separate kin selection and corre- types between social partners, r, and the pathways for lated selection. Those processes seem different, and so it the fidelity of transmission to the future, τ. makes sense to differentiate between them. However, I This example illustrates how to combine the two dif- have repeatedly found that separating in that way is very ferent aspects of similarity, or relatedness, that arise in unnatural when actually analyzing particular problems models of social evolution. The general approach requires [15]. Neither the mathematics nor selection distinguish separating classes of individuals according to their role in the way in which phenotypic correlations between social the social process, following the direct fitness effects on partners arise. For example, in problems that follow the each class, weighting each class by the fidelity of trans- structure that led to eqn 11, the causal effect captured mission of the phenotype under study, and also weighting by the phenotypic correlation r depends only on the phe- each class by its reproductive value [13–15]. In practice, notypic association. The genetic aspects of transmission one typically uses the maximization technique of Taylor are handled independently by the τ terms. and Frank [103], as updated in Frank [13, 15]. For the phenotypic associations, one could choose to All of this follows the kind of causal decomposition at separate the causes of association into shared genotype the heart of Hamilton’s approach to kin selection theory. and other factors. That separation would distinguish But one has to accept several generalizations to the the- between a narrow interpretation of kin selection and a ory, otherwise the problem is beyond understanding by residual component of correlated selection. That separa- kin selection analysis. In particular, one must separate tion can certainly be useful. But repeatedly, in analyzing the correlation between phenotypes that influences fit- particular problems and in developing the underlying ab- ness from the correlation between an actor and various stract theory, the mathematics unambiguously leads one descendants that determines heritability. to blur the distinction when focusing on the causal anal- The original theories of kin selection and inclusive fit- ysis of how selection shapes phenotypes. For the partic- ness blurred the distinction between these different kinds ular causal component that concerns differential success of correlation. To understand the issues more broadly, separated from transmission, it is only the phenotypic one must accept a theory that follows different causes correlations that matter. of fitness, including correlations between social partners, Intuition often runs against the lessons urged by logical and different causes of transmission, including direct and and mathematical analyses. That discord is perhaps the indirect pathways by which phenotypes pass to future most interesting aspect of mathematics. People tend to generations [13–15]. split over that discord. Most trust their intuition above 20 all else. Some, having felt the failure of their intuition are direct effects on the individual expressing the behav- too many times in the face of unambiguous logic, give in ior, and some of the fitness changes are indirect effects to the mathematics. I follow the latter course, in which on other individuals receiving the behavior. This assign- it is much better to adjust intuition to mathematics and ment of all fitness effects back to the behavior that caused logic than to try and bend mathematics and logic to fit them provides a clearer sense of cause and effect. Clear intuition. causal analysis aids in reasoning about the evolution of In my view, the mathematics of selection has led in- complex social behaviors. For example, inclusive fitness evitably to certain developments in the theory. Over emphasizes that the effects of a behavior on the repro- time, the theory came to subsume the early ideas of kin duction of passive recipients can play a key role in deter- selection into a broader causal perspective. That broader mining whether genes associated with the behavior tend perspective is much more powerful when trying to an- to increase in frequency. alyze particular problems, and much simpler and con- Hamilton understood that direct fitness was the ulti- ceptually deeper when trying to grasp the fundamental mate measure for evolutionary analysis. His mathemati- principles of evolutionary change. However, tastes vary. cal studies primarily had to do with showing that inclu- Others will prefer to separate and label differently. If sive fitness was equivalent to direct fitness under many one properly understands the underlying theory, differ- conditions. Hamilton emphasized inclusive fitness as his ent labeling causes few problems and ultimately is not a primary contribution to understanding social evolution. particularly interesting issue. He discussed how inclusive fitness should be regarded as the fundamental process that encompasses kin selection, group selection, and other approaches to social interac- DIRECT AND INCLUSIVE FITNESS tions between genetically similar individuals [22]. Almost all debates about the costs and benefits of Hamilton’s ap- Consider two alternative ways to calculate fitness. The proach and descendant ideas focus on inclusive fitness. direct fitness method counts only the direct reproduction of individuals. If an individual behaves altruistically, we count only the negative effect of that behavior on the Development of the theory and failure of inclusive individual. If that individual’s social partners behave al- fitness truistically, then we add to the direct reproduction of our focal individual the benefit received from the altruism of Since Hamilton’s initial work, the study of social evo- neighbors. To calculate the total effect over the whole lution expanded to analyze a broader and more realistic population, we sum up all of the positive and negative range of evolutionary problems. In my view, inclusive effects on the direct fitness of each individual, based on fitness has become as much a hindrance as an aid to the individual’s own phenotype, the phenotype of each in- understanding. I am not saying that inclusive fitness is dividual’s social partners, and the heritabilities through wrong. Inclusive fitness does provide significant insight different pathways of transmission. into a wide variety of problems. But one must know ex- Inclusive fitness alters the assignment of fitness com- actly its limitations, otherwise trouble is inevitable. Re- ponents. If an individual behaves altruistically, we assign alistic biological scenarios arise for which inclusive fit- two components of fitness to that individual. The nega- ness is important but not sufficient. When one does not tive cost of altruism reduces the individual’s own repro- clearly recognize the boundaries then, when faced with duction. The benefit of altruism to neighbors increases a solution for which inclusive fitness is not sufficient, it the neighbors’ reproduction. We assign that increase in becomes too common to conclude that inclusive fitness neighbors’ fitness to the original altruistic individual that and all broader approaches to kin selection analysis fail caused that increase, rather than directly to the neigh- entirely, and one must discard the whole theory. bors themselves. We discount the neighbors’ fitness com- The issues are somewhat technical in nature. I pro- ponent by their genetic similarity to the altruistic indi- vided a full analysis and discussion in Frank [13, 15]. vidual. Thus, the individual that expressed the behavior Here, I give a sense of the problem and why it mat- is assigned both the direct effect on its own fitness and ters. I begin by briefly summarizing the main points the indirect effect on neighbors’ fitness discounted by ge- from the previous section, which distinguished alterna- netic similarity. tive measures of association between individuals. Sepa- rating those different kinds of association must be done clearly in order to understand the distinction between Hamilton’s approach direct fitness and inclusive fitness. In the previous analysis leading to eqn 11, two different Hamilton’s mathematical analysis showed that, under classes of individuals interacted. Consider first the direct some conditions, inclusive fitness provides the same cal- fitness of class one individuals. They lose the cost C for culation as direct fitness. Hamilton preferred inclusive their altruistic behavior. Their social partners from the fitness, because it assigns all fitness changes to the behav- same class express an altruistic behavior that provides an ior that causes the changes. Some of the fitness changes average benefit rB to a member of the class. The B is the 21 beneficial trait of partners per unit of costly phenotype sociation, no matter the underlying cause. It may be expressed by each individual, and r is the phenotypic that social partners in class one are genetically unrelated association between the costly behavior of an individual but phenotypically associated. Nonetheless, rB − C > 0 and the beneficial phenotype of partners. Thus, the to- is still the proper condition, although it is certainly not tal direct fitness effect on each individual of class one is an inclusive fitness expression in the manner usually un- proportional to rB − C. The heritability of the altruistic derstood by that theory. One can adjust definitions so phenotype for class one individuals is τ1, thus the heri- that inclusive fitness still works. But the clearest under- table increase in altruism from the direct effect of class standing comes from analyzing direct fitness, so that r one individuals is (rB − C)τ1. carries its natural interpretation as a phenotypic asso- A second class of individuals does not express the ciation that may be caused by shared genes or may be altruistic phenotype, but may carry genes for that caused by some other shared nongenetic process. phenotype—for example, genetic relatives that receive Alternatively, suppose that the phenotypic association care but do not give care. The net beneficial effect of between social partners in class one is zero, r = 0. Then altruism from class one on the direct fitness of class two the condition for the increase in altruism is individuals is Bˆ. The heritability for class two individu- ˆ als of the altruistic trait expressed by class one individ- −Cτ1 + Bτ2 > 0. uals is τ2. Thus, the total heritable increase in altruism through the direct reproduction of class two individuals Now consider the interpretation of the τ coefficients in terms of transmission and heritability. The ratio of in- is Bτˆ 2. Putting the direct fitnesses of the two classes together and weighting them equally leads to eqn 11 from direct heritability to direct heritability is R = τ2/τ1. the previous section, repeated here for convenience That coefficient, R, is the form of genetic relatedness commonly used in inclusive fitness theory. For inclusive (rB − C)τ1 + Bτˆ 2 > 0. fitness, one measures the relative transmission of causal genes through indirect compared with direct pathways Note the two different kinds of association, r and τ. of reproduction, which is the ratio of heritabilities. If, in The r coefficient measures the phenotypic association be- the prior expression, we divide by τ1, and use R = τ2/τ1, tween the altruistic expression in social partners from then we have class one. It does not matter how that phenotypic association arises. It may be caused by shared genotype, RB − C > 0, in which case it is a common type of genetic relatedness coefficient. Or it may be caused by shared environment, in which R is the inclusive fitness coefficient of related- such as sunlight or temperature, that is independent of ness. This form is the classic expression of Hamilton’s genotype. No matter the cause of the phenotypic asso- rule, which we may interpret with respect to inclusive ciation, the direct fitness of class one individuals is pro- fitness. portional to rB −C. The actual value of r is a regression The direct fitness approach gives the correct analysis in coefficient, and is sometimes called a coefficient of relat- all cases, with proper interpretation of r as a phenotypic edness. However, it is more general than a coefficient of association between social partners and τ as transmission relatedness, because many different kinds of causes may to the future through heritability. Inclusive fitness arises be involved. With regard to immediate evolutionary con- as a special case. By contrast, if one begins with an sequences, the cause of the association does not matter. inclusive fitness perspective, one has to struggle to get By contrast, the τ coefficients measure heritability, and the right interpretation, and confusion will often arise so can reasonably be understood as a measure of geno- with regard to both the analysis and the interpretation. typic contribution to the expression of the altruistic char- The actual distinctions between direct and inclusive acter. In the case of τ1, the measure is the heritability fitness are more extensive and more subtle [15, Chapter through the direct reproduction by an individual that 4]. Direct fitness typically provides a clear and complete expresses the altruistic behavior. The τ2 coefficient mea- analysis, and subsumes inclusive fitness as a special case. sures the direct contribution of class two individuals to Inclusive fitness does have the benefit of an intuitively the increase in the altruistic character, even though those appealing causal perspective. However, inclusive fitness individuals do not express the character. Because Bˆ rep- is more limited and more likely to cause confusion. As resents an increment in fitness caused by the behavior understanding of a subject develops, it is natural for yes- of class one individuals, the heritability of the altruis- terday’s general understanding to become today’s special tic phenotype expressed by class one individuals through case. the increment of fitness in class two individuals is proportional to the shared genotype between the class one actors and the class two recipients. UNDERSTANDING HOW SELECTION SHAPES If class two recipients are genetically unrelated to class PHENOTYPES one actors, then τ2 = 0, and the condition for the increase in altruism is rB − C > 0. That has the form Hamilton [2] originally set out to develop a causal de- of Hamilton’s rule. However, r measures phenotypic as- composition of social evolution into components. His de- 22 composition by inclusive fitness had two steps: the sep- to look like a kin selection analysis, and classical prob- aration of fitness into components and the analysis of lems of kin selection have come to look like a Lande & heritability. With regard to fitness, Hamilton’s approach Arnold type of analysis of multivariate selection, with the partitioned the total effect of a phenotype into the direct addition of a more complex analysis of heritability. consequence on the actor and the indirect consequence on social partners. With regard to heritability, Hamil- ton weighted the different fitness components by their Statics and the three measures of value fidelity of transmission relative to the phenotype in the focal individual. His coefficient of relatedness measured With regard to studying particular biological prob- the ratio of the heritability through the indirect fitness lems, I continue to favor comparative statics for its prag- component of social partners relative to the heritability matic approach. In analyses of comparative statics, kin through the direct reproduction by the actor. selection problems are transformed into the analysis of three measures of value: marginal value, reproductive value, and the valuations of relative transmission [15]. Multivariate selection and heritability Marginal values transform different phenotypic components into common units. Suppose, for example, that we Independently of Hamilton’s work, the theory of natu- analyze the marginal costs of a behavior associated with ral selection developed during the 1980s and 1990s. That the direct reproduction of an actor and the marginal ben- development primarily followed the influential paper by efits of that behavior associated with the indirect effect Lande and Arnold [115], which built on two earlier lines on the fitness of a social partner. The relative marginal of thought. First, Pearson [117] had established the par- valuations provide a substitution, or translation, mea- titioning of fitness into distinct components. Second, sure. That measure tells us, for each small change in phe- Fisher [118] had established the modern principles of her- notype, how much the marginal benefits change relative itability and the conceptual foundations of quantitative to how much the marginal costs change. For example, genetics. does a small change in the costs for direct reproduction The development of these two lines—the compo- translate into a small or a large change in the benefits nents of fitness and the components of heritability— for social partners? It is the relative marginal valuations independently paralleled the two lines in Hamilton’s that give us that translation. thought. In retrospect, the parallel development is not Marginal valuation only applies to the analysis of surprising. Anyone attempting a causal analysis of selec- small changes. More generally, when one analyzes large tion and evolutionary change would ultimately be led to changes, the regression coefficients from multivariate those same two essential parts of the problem. analysis arise. In that context, the regression coefficients In the early 1980s, I began to develop my own meth- serve as translations for the relative scaling between dif- ods for analyzing phenotypes influenced by kin selection. ferent phenotypes and components of fitness [19]. I started with the Price equation methods that I inher- Reproductive value provides a weighting for different ited from Hamilton’s graduate seminar in 1979. As I kinds of individuals with respect to their contribution to refined my methods of analysis and then eventually gen- the future of the population. Reproductive value is a eralized the approach with the help of Peter Taylor, I component of the transmission of phenotypes. However, was inevitably up against the two problems of partition- we separate reproductive value from heritability, because ing fitness into components and tracing pathways of her- reproductive value usually differs by demographic rather itability. However, until my work with Taylor in 1996, I than genetic aspects. For example, age is a demographic had not given much thought to the underlying structure property, and individuals of different ages have different of the problem. reproductive values, although they may have the same Following 1996, when I found Queller’s papers that heritability in transmission of their phenotypes. Ecolog- merged Hamilton’s kin selection theory with the Lande ical factors, such as available resources or the tendency and Arnold method of multivariate selection and quanti- for local extinctions of groups, also influence reproduc- tative genetics, I began work on developing that connec- tive valuation. In terms of classical demography, resource tion identified by Queller. The result is that kin selection availability may affect birth rates, and local extinctions and inclusive fitness became part of the broader approach of groups may affect death rates. to the study of natural selection [14, 15, 17–19]. With Valuations of relative transmission, or heritability, ob- that advance, it is no longer possible to separate cleanly viously play an essential role in tracing the causes of evo- between the initial view of kin selection as a special kind lutionary change by natural selection. The coefficients of of social problem among genetically similar individuals relatedness in the initial theories of kin selection had to and the broader approach of causal analysis for pheno- do with relative heritabilities through different pathways types, fitness, and heritability. of transmission. Those coefficients of relatedness are just From the merging of kin selection theory and the special cases of the broader analysis of heritabilities in broader aspects of selection and heritability, problems the general study of natural selection. like the analysis of altruism between species have come Social evolution and traditional kin selection problems 23 raise particular issues with regard to the three measures multigeneration groups of insects and other arthropods of value. However, the analysis of those social aspects that lived in isolated rotting logs [97]. As always, Hamil- falls within the broader framework of natural selection, ton combined his natural history observations with math- which applies to all problems of selection and the trans- ematical models to analyze natural selection. For these mission of phenotypes to future generations. This merg- group structured problems, he followed the hierarchical ing of kin selection and inclusive fitness into the broader multilevel methods of Price [120], as described in Hamil- framework for the study of selection has led to deeper un- ton [22]. derstanding and more powerful analytical approaches. At Interestingly, a Price equation analysis of group struc- the same time, the separate initial history of kin selection tured populations is similar to a Lande & Arnold analysis compared with the analysis of nonsocial problems some- of multivariate selection. In the case of group structure, times leads to confusion about the current understanding fitness depends on an individual’s phenotype and on the of the theory of social evolution. average phenotype of social partners in the group. That decomposition of fitness into individual and group components, when used in the Price equation, gives a causal THE FAILURE OF GROUP SELECTION decomposition that ascribes effects to the individual and group phenotypes. The causal component attributed to Properly understood, then, the origins of an groups may be interpreted as group selection. There is idea can help to show what its real content is; nothing special about using individual and group pheno- what the degree of understanding was before types in a Price equation analysis of fitness. If one used the idea came along and how unity and clarity an individual’s phenotype, the phenotype of the individ- have been attained. But to attain such un- ual’s mother, and temperature, one would get a decom- derstanding we must trace the actual course position in terms of those variables. The analysis works of discovery, not some course which we feel for any choice of variables that affect fitness. discovery should or could have taken, and we Lande and Arnold [115] also used the Price equation for must see problems (if we can) as the men of their analysis of multivariate selection. Lande & Arnold’s the past saw them, not as we see them today. approach was in fact very similar to the unpublished In looking for the origin of communication Price equation method Hamilton used to analyze the sex theory one is apt to fall into an almost track- ratios of fig wasps. However, Hamilton did not interpret less morass. I would gladly avoid this entirely his Price equation method broadly as the multivariate but cannot, for others continually urge their analysis of selection, but instead followed Price’s limited readers to enter it. I only hope that they will interpretation of partitioning individual and group com- emerge unharmed with the help of the follow- ponents of success. ing grudgingly given guidance [119, pp. 20– Following Hamilton, I began my own studies of dis- 21]. persal and sex ratios by thinking in terms of group structured natural history. The mathematical models A causal analysis of selection begins by expressing how partitioned individual and group components of fitness phenotypes and other variables influence the fitness of [22, 120]. Hamilton was not a committed group selec- individuals. In social problems, the characteristics of an tionist in the sense that began to develop in the 1980s. individual’s local group sometimes enter the expression Instead, Hamilton interpreted group structure as one way for individual fitness. If group characters influence fit- to get a positive genetic association between individuals, ness, then a causal component of selection is attributed as emphasized very clearly in the quotes from Hamil- to the group. That causal component attributed to the ton [22] that I presented in an earlier section. To some group is one common way in which group selection arises. extent, Hamilton’s strong focus on group structure arose Hamilton developed models of group selection by this from his inability to analyze phenotypes such as dispersal partition into individual and group characters. I used and sex ratios in terms of kin selection and inclusive fit- Hamilton’s group selection methods in my own early ness. He understood that those processes were the key, studies. Later, I came to understand the limitations and but he could not write down mathematical analyses in ultimate failure of the group selection perspective. I then terms of kin selection. He had access to Price’s methods merged kin selection theory with the general causal anal- for group structuring and so used that method instead. ysis of selection and transmission. Hamilton was not fully satisfied with his group level analysis of sex ratios as given in his 1979 notes from his graduate course at the University of Michigan. He Hamilton’s group selection models never published that analysis, perhaps because it showed only that greater genetic similarity within groups led to In the 1970s, Hamilton studied social phenotypes in a stronger kin selection effect. That vague point was group structured populations. He analyzed sex ratios and already obvious, as he had emphasized in his 1975 ar- dispersal polymorphisms of wasps that live in figs [98]. ticle. Simply to show that vague conclusion again for Each fig formed a clearly defined group. He also studied sex ratios did not add any real insight. Hamilton’s 24 class notes are posted as supporting information for this Pathways of causation replace group selection article in the files WDH notes part1 SuppInfo.pdf and WDH notes part2 SuppInfo.pdf (see published version in In the group structured sex ratio models, one can sep- Journal of Evolutionary Biology, DOI given on first page arate several distinct causes with respect to kin interac- of this article). tions. For example, the value of an additional son depends on the mother’s genetic relatedness to the males that her sons compete with for mates. Greater relatedness reduces the transmission benefit to a mother for an additional son. We can express the effect as the marginal My early group selection models gain in mating success through an additional son multiplied by the relative heritability of the mother’s sex ratio trait through sons minus the marginal loss in mating suc- I took up the empirical study of fig wasp sex ratios cess among competing males multiplied by the heritabil- in 1981. At that time, I also began to study Hamilton’s ity of the mother’s sex ratio trait through those compet- notes and to learn how to extend Price’s hierarchical mul- ing males. tilevel selection analysis to apply to my empirical work. The value of an additional daughter depends on the My initial success with that method was limited to seeing mother’s genetic relatedness to the females that her that greater group structuring and more limited migra- daughters compete with for access to resources. Greater tion increased the genetic similarity of individuals within relatedness reduces the transmission benefit to a mother groups. The more closely related group members are, the for an additional daughter. We can express the effect as stronger the kin selection effects. the marginal gain in reproductive success for an addi- By the reasoning from Hamilton’s 1975 paper and his tional daughter multiplied by the relative heritability of teaching, I understood the equivalence between the kin the mother’s sex ratio trait through daughters minus the selection perspective and the conclusion that the greater marginal loss in reproductive success among competing genetic variance among groups, the stronger the tendency females multiplied by the heritability of the mother’s sex for biased sex ratios. So one could call that a group ratio trait through those competing females. In addition selection perspective. But there was always the clear to the direct contributions through each sex, there are understanding that the ultimate causal basis arose from also effects of one sex on the other. For example, an kin selection or inclusive fitness perspectives. I summa- extra daughter may provide additional mating opportu- rized my early understanding of hierarchical multilevel nities for sons. selection and related group selection analyses in my first The full analysis showed how various causal pathways article, A hierarchical view of sex ratio patterns [121]. influence the predicted sex ratio [91–93]. Those pathways often include the genetic associations between competi- In my early work, I focused only on group structured tors, measured by coefficients of relatedness. Typically, populations. I maintained an unbiased dual perspective a coefficient of relatedness can be expressed either as the between kin and group selection. However, in that early genetic variance between groups divided by the total ge- work, I made limited progress toward teasing apart how netic variance in the population, or as a regression co- selection influenced sex ratio evolution. I was stuck at efficient that measures the genetic correlation between the same vague group selection perspective that stopped interacting individuals. The two interpretations are sim- Hamilton. I slowly figured out how to move ahead. Par- ply alternative expressions for the same measure. The ticular sex ratio models played a key role—the synergism first, group based expression for the measure suggests a between application and abstraction. group selection interpretation, whereas the second, indi- In a particular study, I analyzed the case in which vidual based expression suggests a kin-centric interpre- males competed for mates locally within groups, females tation. However, the measure is the same in both cases competed for resources against neighboring females, and [15, 92]. the males and females migrated varying distances before The problem with the group-based interpretation is mating. I then traced the causal processes that deter- that different causal pathways may be associated with mined the evolution of the sex ratio. Assuming that the different patterns of grouping. Or there may not be any mother controlled the sex ratio of her progeny, one could natural grouping. The pairwise correlations of kin se- adopt the mother’s perspective with regard to pathways lection theory do not require group structure. If there is of causation. This new work grew from Price and Hamil- no group structure, kin selection works perfectly whereas ton’s multilevel selection analyses, reflected in the title group selection fails. of a key article, Hierarchical selection theory and sex ra- Group selection, which initially provided a nice intu- tios. I. General solutions for structured populations [92]. itive way to think about group structured populations, Although that article emphasized hierarchical multilevel ultimately proved to be the limitation in understanding selection, it also placed the group structured perspective the evolution of phenotypes. Inevitably, I had to return into its proper role: a special case within the broader to the fundamental causal level, in which the correlations analysis of phenotypes by kin selection theory. between individual phenotypes and the different path- 25 ways of heritability were made clear. relations between phenotypes and pathways of genetic Once at the proper level of causation, one could see transmission. Group structuring is just one limited way that emphasis on groups hindered analysis. The proper in which phenotypic correlations and genetic transmis- view always derives from the causes of fitness and the sion pathways may be influenced. Second, insisting on pathways of transmission. The different causes of fitness a group perspective greatly limits the practical appli- rarely follow along a single pattern of grouping. For ex- cation of the theory to natural history. Most natural ample, males may interact over one spatial scale, females history problems do not have a single rigid group struc- may interact over another spatial scale, and mating be- ture shared by all causal processes. If one starts with tween males and females may occur on a third scale. It is a group selection perspective, solving problems becomes relatively easy to trace cause by the correlations between extremely difficult or impossible. No gain in understand- phenotypes, the ecological context of resource distribu- ing offsets the loss in analysis. tion, and the pathways of genetic transmission. Those Although group selection has problems that limit its causal components do not naturally follow the sort of scope, it also has attractive features. There is a natural rigid grouping needed for a group selection analysis to intuitive simplicity in group structured analysis. Total work. selection arises from the balance between the dynamics The last paragraph of Frank [92] emphasized how anal- of selection within groups and the dynamics of selection ysis of particular biological problems led to a deeper un- between groups. Altruistic characters often tend to lose derstanding of causal process: out during selection within groups and often tend to increase by selection between groups. The problem is that In summary, all three theories—inbreeding, once people gain such intuition, they do not easily give it within-sex competition among relatives, and up in the face of the inevitable conceptual and practical group selection truly describe causal mecha- limitations. Like the growth of any kind of understand- nisms of biased sex ratios in structured pop- ing, one must allow the first general insights to become ulations. Through the study of a variety of the special cases of broader conceptual and analytical scenarios with hierarchical selection theory, approaches. Pinning a topic to the first simple illustra- I draw the following conclusions. First, in- tive model limits progress. Concepts and their associated breeding biases the sex ratio since producing language naturally develop and transform over time. a daughter that inbreeds ... passes on twice Given this history, the idea of a group selection con- as many parental genes as producing a son troversy seems to me to be a logical absurdity. I do un- would. Second, as the amount of within- derstand the intuitive appeal of group selection. I was sex competition among related individuals trained by Hamilton to think about the interesting prop- increases, the relative genetic valuation of erties of groups and about the dynamical processes of that sex decreases. Third, genetic differentia- within-group and between-group selection. I also learned tion among groups ... and genetic correlation from Hamilton the mathematical techniques to analyze within groups ... are related descriptions for multilevel selection. My early conversion, however, did the same phenomenon. Some recent papers not last. Both the conceptual and practical limitations [122, 123] have stressed the group selection became apparent as I tried to make progress in under- aspect of this phenomenon without clarifying standing various problems of natural history and the its similarity to genetic relatedness. Using mathematical models needed to evaluate those problems. group selection for describing causal mech- In summary, when groups are the cause of genetic as- anisms is particularly slippery, since, as in sociations, then group structuring is the causal basis for the various scenarios presented in this paper, the associations that drive kin selection. When group the differentiation among groups may refer to structuring is less clear, the principles of kin selection groups of competing males, groups of compet- still hold, as they must. ing females, or groups that contain inbreeding pairs. While hierarchical selection theory, which is a group selection sort of analy- However, the controversy continues sis, has proved a powerful analytical tool, it seems that, for describing causal mechanisms, The Los Angeles Times newspaper published an inter- it is often useful to apply the genetic regres- view with E. O. Wilson on September 19, 2012. With sions [kin selection coefficients] considered in regard to kin and group selection, the interviewer began: the discussion. The biologist J.B.S. Haldane explained “kin selection” when he was asked whether he Logically, there cannot be a group selection would lay down his life for his brother. No, controversy he said, but he would for two brothers, or eight cousins. In the journal “Nature” in 2010 Two conclusions emphasize the failure of group selec- [52], you challenged kin selection and created tion. First, the ultimate causal processes concern cor- a stir, to say the least. 26

Wilson replied (the following is an exact unmodified ty- of the formula to groups but I think it was pographic transcription of the published article) less explicit and general than mine, indeed almost as if he was trying still to conceal his I was one of the main promoters of kin selec- formula’s full significance [120]. For myself, I tion back when it looked good. By the ‘90s consider the format of analysis [for multilevel I thought I heard the whine of wheels spin- selection] I was able to achieve through his ning. Willie Hamilton’s [British evolutionary idea brilliantly illuminating. biologist W.D. Hamilton] generalized rule [of kin selection] was that if you’ve got enough people looking after [relatives], society could Kin versus group selection in human sociality become very advanced. It wasn’t working. By 2010 I had published peer-reviewed ar- Alexander [125, 126] built his comprehensive evolu- ticles on what was thoroughly wrong [with tionary analysis of human sociality on the importance kin selection]. I said we’ve got to go back of group against group competition. In developing the to “multilevel selection.” Groups form, com- theoretical foundations for his analysis, Alexander had peting with one another for their share. It’s thoroughly reviewed issues of group selection. He ex- paramount in human behavior. The spoils pressed his thinking on this topic in an article with the tend to go to groups that do things better— title Group selection, altruism, and the levels of organi- in business, development, war and so on. zation of life [127]. I knew the biology. I saw that multiple-level I took my first undergraduate course in evolution from selection works, but in different ways in dif- Alexander at the University of Michigan in 1978. His ferent cases, and [with] my mathematical col- views on multilevel selection in Alexander and Borgia leagues, said [in the Nature article], kin selec- [127] were particularly important in shaping how I un- tion cannot work. We knew that was going derstood the subject. Interestingly, Alexander was not to be a paradigm changer. We published it much influenced by Hamilton’s multilevel selection anal- and the storm broke. ysis, which derived from the mathematical theories of Price. Instead, Alexander was an entomologist with a I agree with the three key points emphasized by Wilson deep interest in human behavior. He developed his think- in his article [52]: that multilevel selection is important; ing from broad consideration of natural history. that one should think about group selection in human so- A comment on the first page of Alexander and Borgia ciality; and that kin selection in relation to haplodiploidy [127] provides historical context is not sufficient to explain insect sociality. However, I do have a different perspective on some of the conceptual [E]volution by differential extinction of issues and the history of the subject. groups has recently been modelled or discussed anew by several authors ... E. O. Wil- son [128], for example, has argued that “In Multilevel selection the past several years a real theory of interpopulation selection has begun to be forged, I have emphasized Hamilton’s own interest in multi- with both enriched premises and rigorous level selection. Thus, Wilson’s way of opposing kin selec- model building. ... Insofar as the new theory tion versus multilevel selection does not make any sense considers the results of counteraction between to me. To expand briefly on this point with regard to group and individual selection, it will produce human sociality, note that Hamilton’s [22] primary pub- complex, nonobvious results that constitute lication on multilevel selection had the title Innate social testable alternatives to the hypothesis of in- aptitudes of man: an approach from evolutionary genet- dividual selection. My own intuitive feeling ics. In that article, Hamilton first developed his theoret- is that interpopulation selection is important ical perspective on human evolution by extending Price’s in special cases.” [120] hierarchical selection methods. Hamilton then de- voted approximately ten pages to group level perspec- Clearly, Wilson has long given thought to the potential tives on human sociality. importance of group structuring in nature. The point In Hamilton’s [124] collected works, he gave the here is that this line of thought goes back over 40 years, reprinted version of this article [22] the secondary title with much debate about conceptual issues and the rela- Friends, Romans, Groups, and wrote in the preface for tive importance for understanding natural history. How- the article: ever, most theories conclude that differential extinction of groups is usually a relatively weak force [129]. Only [I] am proud to have included the first pre- Hamilton’s milder version of group structuring in rela- sentation of Price’s natural selection formal- tion to differential reproduction and genetic differentia- ism as applied to group-level processes. ... tion between groups seems to be on solid ground as an He [Price] himself published one application explanation for common patterns of natural history. 27

I have argued throughout that Hamilton’s type of mul- never argued for a distinction between multilevel analy- tilevel selection is clearly a special case within the broader sis and kin selection. Instead, he saw multilevel analy- theory of kin selection. Although group extinctions are sis as one of the most powerful approaches to thinking usually thought to be outside the scope of kin selection about the general problems that arise in applications of theory, the merging of demography and kin selection by his theory. My own view follows Hamilton. In addition, Taylor and Frank [103] and Frank [15] brings those differ- the mathematics do not allow a logical distinction. Any ent processes within a single coherent theoretical frame- distinction must be injected by a particular bias with work. Bringing together those different processes is more respect to how one uses the words and interprets the his- than just a theoretical convenience. tory. However, as long as the historical and conceptual Suppose, for example, that a particular problem re- issues are made clear, it does not matter to me how one quires analyzing how an altruistic phenotype evolves. chooses to use the words. Indeed, given how clearly we That phenotype may affect the probability of extinction understand the theory, it puzzles me why so much atten- for its group. Its group may be composed of genetically tion and argument continues to focus on this issue. similar individuals, perhaps kin in the traditional sense. Returning to Alexander and Borgia [127], their main The phenotype may also have other costs and benefits point concerned how we should think about human evo- with regard to interacting with social partners. It is too lution. In their concluding remarks about humans, they hard to figure out what to expect by separately analyzing state extinctions, group variations in genetics, social processes of costs and benefits, and other processes. Nowak et al. Human social groups represent an almost [52] say that one must make a specific model for a specific ideal model for potent selection at the group case, and then one gets the right answer. True, but the level [23, 135–137]. First, the human species history of science shows unambiguously that one gains a is composed of competing and essentially hos- lot by understanding the abstract causal principles that tile groups that have not only behaved to- join different cases and different models within a common ward one another in the manner of differ- framework [17]. That truth leaves only the sort of causal ent species but have been able quickly to perspective I have emphasized as a reasonable candidate develop enormous differences in reproductive for how to think about such problems. and competitive ability because of cultural Certainly, not everyone agrees with me. Alexander innovation and its cumulative effects. Sec- and Borgia [127] understood Hamilton’s [22] claim that ond, human groups are uniquely able to plan group selection was just a special case of kin selection. and act as units, to look ahead, and to carry However, they rejected that point of view out purposely actions designed to sustain the group and improve its competitive position, That groups are often composed of kin does whether through restricting disruptive behav- not mean that kin selection and group selec- ior from within the group or through direct tion are in any sense synonymous [32, 130– collective action against competing groups. 132]. As West-Eberhard [133] points out, “In the same trivial sense that kin selection Alexander and Borgia [127] certainly understood the is group selection, all of natural selection is broader implications of multilevel selection analysis with group selection, since even ‘individual’ se- regard to a variety of biological problems. The first para- lection really concerns the summed genetic graph of their summary is contribution of a group—the individual’s offspring.” Moreover, although kin selection [T]here may be few problems in biology more can occur in continuously distributed pop- basic or vital than understanding the back- ulations, group selection cannot. For rea- ground and the potency of selection at dif- sons elaborated later, we agree with May- ferent levels in the hierarchies of organization nard Smith [134] that it is more appropriate of living matter. The approaches currently to distinguish kin selection and group selec- being used by evolutionary ecologists and be- tion than to blur their differences by consid- haviorists in assessing the likelihood of effec- ering them together. tive selection at the level of groups or populations of individuals may also be used to This quote shows the clear historical precedent for Wil- advantage by those concerned with function son’s argument that multilevel selection is distinct from at intragenomic levels. The kind of selection- kin selection. As often happens with historical analyses ist techniques used recently to analyze the of ideas in science, one can find significant antecedents behavior of nonhuman organisms may in the that support a variety of different positions. Because near future be widely applied toward under- a controversy about kin and group selection will always standing not only human social phenomena, come down to how one chooses to interpret words, there but a variety of phenomena of classical biol- can be no final resolution. ogy such as mitosis, meiosis, sex determina- What we can say, unambiguously, is that Hamilton tion, segregation distortion, linkage, cancer, 28

immune reactions, and essentially all prob- p. 415] stated that “the key to Hymenopteran lems in gene function and in ontogeny. success is haplodiploidy” and that “nothing but kin selection seems to explain the sta- The year 1978 was a long time ago. I do not understand tistical dominance of eusociality by the Hy- why these ideas are still thought of as novel or controver- menoptera” [132, p. 418]. A long list of simi- sial. lar evaluations of the 3/4 relatedness hypothesis by other authors could be cited. Insect sociality The main empirical evidence in favor of the 3/4 hypothesis is that eusociality seems to Wilson particularly emphasized the failure of kin selec- have arisen many more times in the hap- tion theory in explaining the evolution of advanced so- lodiploid Hymenoptera than in other insects ciality in insects [52]. Once again, we must consider what [31, 35]. This evidence initially appeared im- is meant by the scope of kin selection theory. I tend to pressive, but several recent findings indicate think of kin selection as a particular causal perspective that haplodiploidy and 3/4 relatedness be- within the broader theory of natural selection. Although tween sisters may have been of limited impor- it is tempting to limit the scope of kin selection theory tance for the evolution of eusociality. Other to certain simple scenarios and predictions, such limits factors have clearly been involved, and it never make sense in the logical or mathematical analysis seems possible that haplodiploidy has even of the subject. However, to the extent that others choose been insignificant compared to these factors. to distinguish more finely, that does not bother me as At least five lines of evidence cast doubt on long as the concepts and history are made clear. the overwhelming importance sometimes as- For those who view kin selection narrowly, the poten- cribed to haplodiploidy [and the narrowly de- tial problems of that narrow theoretical view for under- fined kin selection hypothesis]. standing the origin of complex sociality (eusociality) in insects was a lively topic in the 1970s and 1980s. An- The further details do not concern us here. The main dersson [34] introduces his excellent review of the topic point is that by 1984, the problems with a narrow inter- by noting that eusociality has arisen multiple times in pretation of kin selection for explaining eusociality had insects. He then turns to an early theory based on kin been widely discussed. selection to explain why eusociality is particularly common in bees, ants, and wasps (Hymenoptera). Summary Hamilton’s [10, 11] celebrated explanation is that haplodiploid sex determination in Hy- menoptera makes sisters share three quar- Inevitably, the debates about kin selection and group ters of their genes, whereas a daughter only selection will continue, because the ultimate problem receives half her genome from her mother. concerns different usage of words. People vary in whether Hymenopteran females may therefore prop- they prefer to emphasize differences or similarities be- agate their genes better by helping to raise tween components of a broader problem. Those who like reproductive sisters than by raising daugh- differences emphasize distinctions between kinship inter- ters of their own. Haplodiploidy therefore actions and group structuring of populations. Those who should make the evolution of nonreproduc- like similarities see kin and group selection as part of a tive female workers particularly likely among broader theory of natural selection. So what? Perhaps the Hymenoptera. This and other stimulat- the debate can advance a bit by a more nuanced consid- ing ideas of Hamilton’s started a revolution eration of the underlying concepts and history. in the study of social behavior, particularly of the role of kin selection [25, 84]. DISCUSSION Several entomologists have warned against overemphasis on the 3/4 relatedness hypothesis, and they have pointed to other factors im- Separation into component causes portant in the evolution of eusociality [23, 32, 35, 138–145]. Hamilton [10, 11, 85] and Wil- Kin selection theory analyzes the evolutionary causes son [31] also noted that haplodiploidy alone of social phenotypes. Causal analysis is not an alter- cannot explain eusociality in Hymenoptera. native to other analyses, such as population genetics. Such reservations were often forgotten, how- Rather, causal analysis brings out the factors that one ever, and the 3/4 hypothesis came to dom- must emphasize to understand pattern. Why do pheno- inate many textbook and popular accounts. types vary in the way that they do? What matters most? For example, in his comprehensive review What factors should one focus on to make testable pre- of social behavior in animals, Wilson [132, dictions? 29

Hamilton separated evolutionary change into three components of fitness may occur. Costs and benefits causes. A direct component affects the individual that are understood to depend on context. Relatedness co- expresses a phenotype—the cost. An indirect compo- efficients and their generalization by multiple regression nent affects social partners influenced by the focal indi- coefficients translate all fitness components into com- vidual’s phenotype—the benefit. To combine those two mon units of total change. Separation between selection components into the total evolutionary effect on the phe- and transmission clarifies the distinctions between causal notype, one must adjust the indirect component to have components. the same units as the direct component. The adjust- ment translates the indirect component of change into Methods of analysis for solving problems have been an equivalent amount of direct change. That translation developed to complement the causal decomposition. The is often a genetic measure of similarity between the in- limitations of the analytical methods and the causal de- dividuals affected directly and indirectly—the coefficient compositions are reasonably well understood. Causal of relatedness. decomposition and simplified analysis provide tools to Note the structure of causal analysis. The total change enhance understanding rather than alternatives to more is what matters. To understand that total change, we complex and detailed mathematical analyses of particu- separate it into parts that help to reason about the prob- lar problems. lem. Once we have a set of distinct parts, we have to combine those parts back into a common measure for total change. To combine properly, each part is weighted by a factor that translates into a consequence for total change. Separation into distinct causal processes leads to testable predictions about which causal components ex- Limitations of inclusive fitness plain patterns of variation. Separation also highlights the common causal basis that unites previously unconnected problems within a common conceptual framework. Hamilton introduced inclusive fitness as a particular Causal analysis by kin selection is never an alternative type of causal partition. Inclusive fitness assigns an in- to other analyses of total change. Rather, it is a pow- direct fitness effect through a social partner back to the erful complement to other approaches. In practice, kin behavior that caused the fitness effect. For example, if selection can be so powerful in analysis and so helpful an individual saves its sibling’s life, that fitness benefit is in the conceptual framing of problems, that one does not attached to the individual who saved the life rather than need other complementary methods. However, determin- the individual whose life was saved. That fitness benefit ing the best methods always depends on the particular is discounted by the genetic relatedness of the savior to goals. For example, in the study of alternative genetic as- the sibling. sumptions and complex aspects of dynamics, population genetic models provide superior methods. Inclusive fitness has the advantage of assigning changes in components of fitness to the phenotype that caused those changes. That causal decomposition can provide Hamilton’s rule much insight into evolutionary process. The problem arises because that very particular form of causal partitioning is often equated with the entire theory of kin Confusion over Hamilton’s rule arises when it is not selection. Instead, it is much better to view kin selec- properly understood as a partitioning of causes. The rule tion as a general approach to the causal analysis of social is the partition of total change for a social phenotype into processes. Inclusive fitness is a particular causal decom- direct and indirect components. It does not make sense position that helps in some cases and not in others. to consider whether the rule is true or false. Rather, following Hamilton, one thinks of the rule in two ways. For example, phenotypic associations between social First, is there a simple form for the partition of causes partners that do not share a common genotype can have that matches the ultimate measure of total change, at a very powerful effect on social evolution. Inclusive fit- least approximately and under particular conditions? If ness fails as a complete analysis of correlated phenotypes so, what are the proper definitions for the components? between social partners. That failure does not mean we Second, how should we expand Hamilton’s original causal should give up on trying to understanding the causes partition for more complex problems? of social evolution in such cases, or that we should con- Roughly speaking, Hamilton’s original expression in clude that kin selection theory fails as a general approach. terms of costs, benefits, and genetic relatedness provided Instead, we must understand the broader approach of a useful partition that works for simple problems. How- causal analysis, and how different aspects of natural his- ever, as the theory was applied to more realistic prob- tory should be understood from a broader causal per- lems, the associated causal analysis had to be extended. spective. That broader perspective was developed many The modern theory of kin selection provides a more com- years ago and has proved to be a powerful tool for ana- prehensive causal analysis. Multiple direct and indirect lyzing complex social interactions. 30

Correlated social phenotypes versus genetic ton pulled out the first clear abstraction that united vari- relatedness ous simple applications. Yet he could not use his abstract theory to move on to new applications. In particular, he When correlated social phenotypes do not arise from could not solve the problems of dispersal and sex ratios shared genotype, how should we think of the relatedness that arose from kin interactions. coefficients of kin selection theory? Proper causal anal- As the applied theory eventually developed for dis- ysis solves the problem. Changes in phenotypes cause persal, sex ratios and more complex social phenotypes, changes in fitness. Those fitness changes must be trans- deeper abstract principles emerged. For example, the lated into changes in the transmission of phenotypes to distinction between selection and transmission became the future population. In analyzing the causes of fitness clear, and relatedness coefficients became a part of trans- and the causes of transmission, we must put all the com- lating causal components into common units. The im- ponents together into a common measure of total change. provements in abstract theory enhanced the scope of ap- The weighting of the different components leads to dif- plication to complex social phenotypes. ferent types of regression coefficients. Those regression The necessary synergism between abstraction and ap- coefficients are the translations of different causal com- plication showed the ultimate failure of group selection. ponents into a common scale. In particular, group selection is a useful abstraction for a limited set of applications. When faced with a variety The fact that Hamilton’s original theory considered of applications, such as sex ratio evolution with multiple only a very particular aspect of transmission and genetic male and female interactions, group selection fails. In- relatedness has led to confusion. Hamilton’s original re- stead of the limited perspective of group selection, the gression coefficient of relatedness is not the single defining deeper abstract principles dominate. Those principles relatedness and regression coefficient of kin selection the- include a clear causal analysis of distinct fitness compo- ory. Rather, it is the particular coefficient that arises in nents, separation of selection and transmission, and the the special inclusive fitness analysis that Hamilton con- proper weighting of the distinct causal components to sidered in developing his theory. attain an overall analysis of total change. When different people focus exclusively on either abstraction or application, deep tension and fruitless debate arise. When the two modes come together, great progress Synergism between abstraction and application follows.

Abstraction arises by recognizing the common processes that recur in diﬀerent cases. Application demands ACKNOWLEDGMENTS analysis of particular phenotypes under particular cir- cumstances. Kin selection theory grew naturally by the National Science Foundation grant EF-0822399 sup- synergism between abstraction and application. Hamil- ports my research.

[1] M. A. Lazar and M. J. Birnbaum, “De-meaning of [10] W. D. Hamilton, “The genetical evolution of social metabolism,” Science 336, 1651–1652 (2012). behaviour. I.” Journal of Theoretical Biology 7, 1–16 [2] W. D. Hamilton, “Selﬁsh and spiteful behaviour in an (1964). evolutionary model,” Nature 228, 1218–1220 (1970). [11] W. D. Hamilton, “The genetical evolution of social be- [3] C. Darwin, On the Origin of Species by Means of Nat- haviour. II,” Journal of Theoretical Biology 7, 17–52 ural Selection (John Murray, London, 1859). (1964). [4] R. A. Fisher, The Genetical Theory of Natural Selection [12] T. W. Fawcett and A. D. Higginson, “Heavy use of equa- (Clarendon, Oxford, 1930). tions impedes communication among biologists,” Pro- [5] J. B. S. Haldane, “Population genetics,” New Biology ceedings of the National Academy of Sciences USA 109, 18, 34–51 (1955). 11735–11739 (2012). [6] G. C. Williams and D. C. Williams, “Natural selection [13] S. A. Frank, “Multivariate analysis of correlated se- of individually harmful social adaptations among sibs lection and kin selection, with an ESS maximization with special reference to social insects,” Evolution 11, method,” Journal of Theoretical Biology 189, 307–316 32–39 (1957). (1997). [7] J. B. S. Haldane, The Causes of Evolution (Cornell Uni- [14] S. A. Frank, “The Price equation, Fisher’s fundamental versity Press, Ithaca, NY, 1932). theorem, kin selection, and causal analysis,” Evolution [8] S. Wright, “Tempo and mode in evolution: a critical 51, 1712–1729 (1997). review,” Ecology 26, 415–419 (1945). [15] S. A. Frank, Foundations of Social Evolution (Princeton [9] W. D. Hamilton, “The evolution of altruistic behavior,” University Press, Princeton, New Jersey, 1998). American Naturalist 97, 354–356 (1963). [16] S. A. Frank, “Natural selection. III. Selection versus transmission and the levels of selection,” Journal of Evo- 31

lutionary Biology 25, 227–243 (2012). [38] D. C. Queller and J. E. Strassmann, “Kin selection and [17] S. A. Frank, “Natural selection. IV. The Price equa- social insects,” Bioscience 48, 165–175 (1998). tion,” Journal of Evolutionary Biology 25, 1002–1019 [39] K. R. Foster, T. Wenseleers, and F. L. W. Ratnieks, (2012). “Kin selection is the key to altruism,” Trends in Ecology [18] S. A. Frank, “Natural selection. V. How to read the & Evolution 21, 57–60 (2006). fundamental equations of evolutionary change in terms [40] R. Trivers, Social Evolution (Benjamin/Cummings, of information theory,” Journal of Evolutionary Biology Menlo Park, CA, 1985). 25, 2377–2396 (2012). [41] J. Maynard Smith and E. Szathmáry, The Major Tran- [19] S. A. Frank, “Natural selection. VI. Partitioning the sitions in Evolution (Freeman, San Francisco, 1995). information in fitness and characters by path analysis,” [42] B. J. Crespi, “The evolution of social behavior in mi- Journal of Evolutionary Biology (in press) (2013). croorganisms,” Trends in Ecology & Evolution 16, 178– [20] G. R. Price, “Selection and covariance,” Nature 227, 183 (2001). 520–521 (1970). [43] R. E. Michod and D. Roze, “Cooperation and conflict [21] A. Grafen, “A geometric view of relatedness,” Oxford in the evolution of multicellularity,” Heredity 86, 1–7 Surveys in Evolutionary Biology 2, 28–89 (1985). (2001). [22] W. D. Hamilton, “Innate social aptitudes of man: an [44] S. A. West, S. P. Diggle, A. Buckling, A. Gardner, and approach from evolutionary genetics,” in Biosocial An- A. S. Griffin, “The social lives of microbes,” Annu. Rev. thropology, edited by R. Fox (Wiley, New York, 1975) Ecol. Syst. 38, 53–77 (2007). pp. 133–155. [45] Stuart A. West, Sex Allocation (Princeton University [23] R. D. Alexander, “The evolution of social behavior,” Press, Princeton, NJ, 2009). Annual Review of Ecology and Systematics 5, 325–383 [46] A. Burt and R. Trivers, Genes in Conflict: The Biology (1974). of Selfish Genetic Elements (Belknap Press, Cambridge, [24] R. Dawkins, “Twelve misunderstandings of kin se- MA, 2008). lection,” Zeitschrift für Tierpsychologie 51, 184–200 [47] N. B. Davies, J. R. Krebs, and S. A. West, An Introduc- (1979). tion to Behavioural Ecology, 4th ed. (Wiley-Blackwell, [25] R. E. Michod, “The theory of kin selection,” Annual Oxford, UK, 2012). Review of Ecology and Systematics 13, 23–55 (1982). [48] M. K. Uyenoyama, M. Feldman, and L. D. Mueller, [26] L. Lehmann and L. Keller, “The evolution of coopera- “Population genetic theory of kin selection: multi- tion and altruism—a general framework and a classifi- ple alleles at one locus,” Proceedings of the National cation of models,” Journal of Evolutionary Biology 19, Academy of Sciences USA 78, 5036–5040 (1981). 1365–1376 (2006). [49] M. K. Uyenoyama and M. Feldman, “Population genetic [27] T. Wenseleers, “Modelling social evolution: the rela- theory of kin selection. II. The multiplicative model,” tive merits and limitations of a hamilton’s rule-based American Naturalist 120, 614–627 (1982). approach,” Journal of Eevolutionary Biology 19, 1419– [50] S. Karlin and C. Matessi, “Kin selection and altruism,” 1422 (2006). Proceedings of the Royal Society B 219, 327–353 (1983). [28] L. A. Dugatkin, “Inclusive fitness theory from Darwin [51] B. Kerr, P. Godfrey-Smith, and M. W. Feldman, “What to Hamilton,” Genetics 176, 1375–1380 (2007). is altruism?” Trends in Ecology & Evolution 19, 135– [29] A. F. G. Bourke, Principles of Social Evolution (Oxford 140 (2004). University Press, Oxford, UK, 2011). [52] M. A. Nowak, C. E. Tarnita, and E. O. Wilson, “The [30] A. Gardner, S. A. West, and G. Wild, “The genetical evolution of eusociality,” Nature 466, 1057–1062 (2010). theory of kin selection,” Journal of Evolutionary Biology [53] R. C. Lewontin, “The units of selection,” Annual Re- 24, 1020–1043 (2011). view of Ecology and Systematics 1, 1–18 (1970). [31] E. O. Wilson, The Insect Societies (Harvard University [54] R. Dawkins, The Extended Phenotype (Freeman, San Press, Cambridge, Massachusetts, 1971). Francisco, 1982). [32] M. J. West-Eberhard, “The evolution of social behavior [55] L. Keller, ed., Levels of Selection in Evolution (Prince- by kin selection,” Quarterly Review of Biology 50, 1–34 ton University Press, Princeton, NJ, 1999). (1975). [56] S. Okasha, Evolution and the Levels of Selection (Oxford [33] R. L. Trivers and H. Hare, “Haplodiploidy and the University Press, New York, 2006). evolution of the social insects,” Science 191, 249–263 [57] M. J. Wade, “A critical review of the models of group (1976). selection,” Quarterly Review of Biology 53, 101–114 [34] M. Andersson, “The evolution of eusociality,” Annual (1978). Review of Ecology and Systematics 15, 165–189 (1984). [58] M. Uyenoyama and M. W. Feldman, “Theories of kin [35] H. J. Brockmann, “The evolution of social behavior in and group selection: a population genetics perspective.” insects,” in Behavioural Ecology: An Evolutionary Ap- Theoretical Ppopulation Biology 17, 380–414 (1980). proach, edited by J. R. Krebs and N. B. Davies (Black- [59] M. J. Wade, “Soft selection, hard selection, kin selec- well, Oxford, 1984) 2nd ed., pp. 340–361. tion, and group selection,” American Naturalist 125, [36] R. D. Alexander, K. M. Noonan, and B. J. Crespi, 61–73 (1985). “The evolution of eusociality,” in The Biology of the [60] D. S. Wilson, “The group selection controversy: history Naked Mole Rat, edited by P. W. Sherman, J. U. M. and current status,” Annual Review of Ecology and Sys- Jarvis, and R. D. Alexander (Princeton University tematics 14, 159–187 (1983). Press, Princeton, New Jersey, 1991) pp. 3–44. [61] A. Grafen, “Natural selection, kin selection and group [37] A. F. G. Bourke and N. R. Franks, Social Evolution selection,” in Behavioural Ecology, edited by J. R. Krebs in Ants (Princeton University Press, Princeton, New and N. B. Davies (Blackwell Scientific Publications, Ox- Jersey, 1995). ford, 1984) pp. 62–84. 32

[62] L. Nunney, “Group selection, altruism, and structured- Journal of Evolutionary Biology 20, 301–309 (2006). deme models,” American Naturalist , 212–230 (1985). [82] A. Gardner, S. A. West, and N. H. Barton, “The rela- [63] I. L. Heisler and J. Damuth, “A method for analyz- tion between multilocus population genetics and social ing selection in hierarchically structured populations,” evolution theory,” American Naturalist 169, 207–226 American Naturalist 130, 582–602 (1987). (2007). [64] D. C. Queller, “A general model for kin selection,” Evo- [83] J. A. Fletcher and M. Doebeli, “A simple and general lution 46, 376–380 (1992). explanation for the evolution of altruism,” Proceedings [65] L. A. Dugatkin and H. K. Reeve, “Behavioral ecology of the Royal Society B 276, 13–19 (2009). and levels of selection: dissolving the group selection [84] J. Maynard Smith, “Group selection and kin selection,” controversy,” Advances in the Study of Behavior 23, Nature 201, 1145–1147 (1964). 101–133 (1994). [85] W. D. Hamilton, “Altruism and related phenomena, [66] J. Soltis, R. Boyd, and P. J. Richerson, “Can group- mainly in social insects,” Annual Review of Ecology and functional behaviors evolve by cultural group selection? Systematics 3, 193–232 (1972). an empirical test,” Current Anthropology 36, 473–494 [86] W. D. Hamilton, “Extraordinary sex ratios,” Science (1995). 156, 477–488 (1967). [67] E. Sober and D. S. Wilson, Unto Oothers: The Evo- [87] W. D. Hamilton and R. M. May, “Dispersal in stable lution and Psychology of Unselfish Behavior (Harvard habitats,” Nature 269, 578–581 (1977). University Press, Cambridge, MA, 1998). [88] I. Eshel, “On the neighbor effect and the evolution of al- [68] J. Henrich, “Cultural group selection, coevolutionary truistic traits,” Theoretical Population Biology 3, 258– processes and large-scale cooperation,” Journal of Eco- 277 (1972). nomic Behavior & Organization 53, 3–35 (2004). [89] R. Levins, “Extinction,” in Some Mathematical Ques- [69] A. Traulsen and M. A. Nowak, “Evolution of cooper- tions in Biology, Vol. 2, edited by M. Gerstenhaber ation by multilevel selection,” Proceedings of the Na- (American Mathematical Society, Providence, RI, 1970) tional Academy of Sciences USA 103, 10952–10955 pp. 77–107. (2006). [90] S. A. Frank, Theoretical and empirical studies of sex [70] S. A. West, A. S. Griffin, and A. Gardner, “Social se- ratios, mainly in fig wasps (M.S. Thesis, University of mantics: how useful has group selection been?” Journal Florida, Gainesville, 1983). of Evolutionary Biology 21, 374–385 (2008). [91] S. A. Frank, “Hierarchical selection theory and sex ra- [71] E. G. Leigh, Jr., “The group selection controversy,” tios. II. On applying the theory, and a test with fig Journal of Evolutionary Biology 23, 6–19 (2010). wasps,” Evolution 39, 949–964 (1985). [72] J. M. Cheverud, “Evolution by kin selection: a quantita- [92] S. A. Frank, “Hierarchical selection theory and sex ra- tive genetic model illustrated by maternal performance tios I. General solutions for structured populations,” in mice,” Evolution 38, 766–777 (1984). Theoretical Population Biology 29, 312–342 (1986). [73] D. C. Queller, “Quantitative genetics, inclusive fitness, [93] S. A. Frank, “The genetic value of sons and daughters,” and group selection,” American Naturalist 139, 540–558 Heredity 56, 351–354 (1986). (1992). [94] S. A. Frank, “Are mating and mate competition by [74] J. B. Wolf, E. D. Brodie, J. M. Cheverud, A. J. Moore, the fig wasp Pegoscapus assuetus (Agaonidae) random and M. J. Wade, “Evolutionary consequences of indirect within a fig?” Biotropica 17, 170–172 (1985). genetic effects,” Trends in Ecology and Evolution 13, [95] S. A. Frank, “Dispersal polymorphisms in subdivided 64–69 (1998). populations,” Journal of Theoretical Biology 122, 303– [75] J. B. Wolf, E. D. Brodie III, and A. J. Moore, “In- 309 (1986). teracting phenotypes and the evolutionary process. II. [96] S. A. Frank, “Demography and sex ratio in social spi- Selection resulting from social interactions,” American ders,” Evolution 41, 1267–1281 (1987). Naturalist 153, 254–266 (1999). [97] W. D. Hamilton, “Evolution and diversity under bark,” [76] P. Bijma and M. J. Wade, “The joint effects of kin, mul- in Diversity of Insect Faunas, edited by L. A. Mound tilevel selection and indirect genetic effects on response and N. Waloff (Blackwell Scientific Publications, Lon- to genetic selection,” Journal of Evolutionary Biology don, 1978) pp. 154–175. 21, 1175–1188 (2008). [98] W. D. Hamilton, “Wingless and fighting males in fig [77] J. W. McGlothlin, A. J. Moore, J. B. Wolf, and E. D. wasps and other insects,” in Reproductive Competition Brodie III, “Interacting phenotypes and the evolution- and Sexual Selection in Insects, edited by M. S. Blum ary process. III. Social evolution,” Evolution 64, 2558– and N. A. Blum (Academic Press, New York, 1979) pp. 2574 (2010). 167–220. [78] J. B. Wolf and A. J. Moore, “Interacting phenotypes [99] U. Motro, “Optimal rates of dispersal I. Haploid pop- and indirect genetic effects: a genetic perspective on the ulations,” Theoretical Population Biology 21, 394–411 evolution of social behavior,” in Evolutionary Behav- (1982). ioral Ecology, edited by C. W. Fox and D. F. Westneat [100] U. Motro, “Optimal rates of dispersal II. Diploid pop- (Oxford University Press, New York, 2010) pp. 225–245. ulations,” Theoretical Population Biology 21, 412–429 [79] F. Rousset, Genetic Structure and Selection in Subdi- (1982). vided Populations (Princeton University Press, Prince- [101] U. Motro, “Optimal rates of dispersal III. Parent- ton, NJ, 2004). offspring conflict,” Theoretical Population Biology 23, [80] A. Grafen, “Optimization of inclusive fitness,” Journal 159–168 (1983). of Theoretical Biology 238, 541–563 (2006). [102] J. Maynard Smith, Evolution and the Theory of Games [81] P. D. Taylor, G. Wild, and A. Gardner, “Direct fitness (Cambridge University Press, Cambridge, 1982). or inclusive fitness: how shall we model kin selection?” 33

[103] P. D. Taylor and S. A. Frank, “How to make a kin selec- (1981). tion model,” Journal of Theoretical Biology 180, 27–37 [124] W. D. Hamilton, Narrow Roads of Gene Land (Free- (1996). man, San Francisco, 1996). [104] D. C. Queller, “Kinship, reciprocity and synergism in [125] R. D. Alexander, Darwinism and Human Affairs (Uni- the evolution of social behavior,” Nature 308, 366–367 versity of Washington Press, Seattle, 1979). (1985). [126] R. D. Alexander, The Biology of Moral Systems (Aldine [105] D. S. Wilson, G. B. Pollock, and L. A. Dugatkin, “Can de Gruyter, New York, 1987). altruism evolve in purely viscous populations?” Evolu- [127] R. D. Alexander and G. Borgia, “Group selection, al- tionary Ecology 6, 331–341 (1992). truism, and the levels of organization of life,” Annual [106] P. D. Taylor, “Altruism in viscous populations—an in- Review of Ecology and Systematics 9, 449–474 (1978). clusive fitness approach,” Evolutionary Ecology 6, 352– [128] E. O. Wilson, “The origin of sex,” Science 188, 139–140 356 (1992). (1975). [107] P. D. Taylor, “Inclusive fitness in a heterogeneous en- [129] J. Maynard Smith, “Group selection,” Quarterly Re- vironment,” Proceedings of the Royal Society B 249, view of Biology 51, 277–283 (1976). 299–302 (1992). [130] G. C. Williams, Adaptation and Natural Selection [108] D. C. Queller, “Genetic relatedness in viscous popula- (Princeton University Press, Princeton, NJ, 1966). tions,” Evolutionary Ecology 8, 70–73 (1994). [131] D. S. Wilson, “A theory of group selection,” Proceedings [109] A. B. Clark, “Sex ratio and local resource competition of the National Academy of Sciences USA 72, 143–146 in a prosiminian primate,” Science 201, 163–165 (1978). (1975). [110] C. Read, Logic, Deductive and Inductive (A. Moring, [132] E. O. Wilson, Sociobiology: The New Synthesis (Belk- London, 1909). nap Press, Harvard, Cambridge, MA, 1975). [111] P. A. Samuelson, Foundations of Economic Analy- [133] M. J. West-Eberhard, “Born: Sociobiology,” Quarterly sis, enlarged ed (Harvard University Press, Cambridge, Review of Biology 51, 89–92 (1976). Massachusetts, 1983). [134] J. Maynard Smith, “The origin and maintenance of [112] J. A. Schumpeter, History of Economic Analysis (Ox- sex,” in Group Selection, edited by G. C. Williams ford University Press, New York, 1954). (Aldine-Atherton, Chicago, 1971) pp. 163–175. [113] S. A. Frank, “Mutual policing and repression of compe- [135] R. D. Alexander, “The search for an evolutionary phi- tition in the evolution of cooperative groups,” Nature losophy of man,” Proceedings of the Royal Society of 377, 520–522 (1995). Victoria 84, 99–120 (1971). [114] B. Charlesworth, Evolution in Age–Structured Popula- [136] R. D. Alexander, “The search for a general theory of tions, 2nd ed. (Cambridge University Press, Cambridge, behavior,” Behavioral Science 20, 77–100 (1975). 1994). [137] E. O. Wilson, “Group selection and its significance for [115] R. Lande and S. J. Arnold, “The measurement of selec- ecology,” Bioscience 23, 631–638 (1973). tion on correlated characters,” Evolution 37, 1212–1226 [138] J. S. Kennedy, “Some outstanding questions in insect (1983). behaviour,” in Symposium of the Royal Entomological [116] S. A. Frank, “Genetics of mutualism: the evolution of Society of London, Vol. 3, edited by P. T. Haskell (1966) altruism between species,” Journal of Theoretical Biol- pp. 97–112. ogy 170, 393–400 (1994). [139] N. Lin and C. D. Michener, “Evolution of sociality in in- [117] K. Pearson, “Mathematical contributions to the theory sects,” Quarterly Review of Biology 47, 131–159 (1972). of evolution. XI. On the influence of natural selection on [140] C. D. Michener, The Social Behavior of Bees: A Com- the variability and correlation of organs,” Philosophical parative Study (Harvard University Press, Cambridge, Transactions of the Royal Society of London. Series A MA, 1974). 200, 1–66 (1903). [141] C. D. Michener and D. J. Brothers, “Were workers of [118] R. A. Fisher, “The correlation between relatives on the eusocial Hymenoptera initially altruistic or oppressed?” supposition of Mendelian inheritance,” Transactions of Proceedings of the National Academy of Sciences USA the Royal Society of Edinburgh 52, 399–433 (1918). 71, 671–674 (1974). [119] J. R. Pierce, An Introduction to Information Theory: [142] M. J. West-Eberhard, “Polygyny and the evolution of Symbols, Signals and Noise (Dover Publications, New social behavior in wasps,” Journal of the Kansas Ento- York, 1980). mological Society 51, 832–856 (1978). [120] G. R. Price, “Extension of covariance selection mathe- [143] H. E. Evans, “Extrinsic versus intrinsic factors in the matics,” Annals of Human Genetics 35, 485–490 (1972). evolution of insect sociality,” BioScience 27, 613–617 [121] S. A. Frank, “A hierarchical view of sex-ratio patterns,” (1977). The Florida Entomologist 66, 42–75 (1983). [144] R. H. Crozier, “Genetics of sociality,” in Social Insects, [122] R. K. Colwell, “Group selection is implicated in the evo- Vol. 1, edited by H. R. Hermann (Academic Press, New lution of female-biased sex ratios,” Nature 290, 401–404 York, 1979) pp. 223–286. (1981). [145] G. C. Eickwort, “Presocial insects,” in Social Insects, [123] D. S. Wilson and R. K. Colwell, “The evolution of Vol. 2, edited by H. R. Hermann (Academic Press, New sex ratio in structured demes.” Evolution 35, 882–897 York, 1981) pp. 199–280.