Adaptation in natural populations: integrating phenotypic and genetic perspectives

Timothy James Thurman

Redpath Museum and Department of Biology, McGill University, Montréal, Québec, Canada

October 2019

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy

© Timothy James Thurman, 2019

Contents

Abstract
Résumé
Acknowledgements
Contributions to Original Knowledge
Thesis Format
Contribution of Authors
List of Figures
List of Tables
General Introduction
1 The genetic consequences of selection in natural populations
    1.1 Abstract
    1.2 Introduction
    1.3 Materials and methods
    1.4 Results
    1.5 Discussion
    1.6 Conclusions
    1.7 Boxes
    1.8 Figures and Tables
    Bibliography
Linking Statement 1
2 Movement of a Heliconius hybrid zone over 30 years: a Bayesian approach
    2.1 Abstract
    2.2 Introduction
    2.3 Materials and Methods
    2.4 Results
    2.5 Discussion
    2.6 Figures and Tables
    Bibliography
Linking Statement 2
3 Predicting evolutionary change from ecological interactions: a field experiment with Anolis
    3.1 Abstract
    3.2 Introduction
    3.3 Materials and Methods
    3.4 Results
    3.5 Discussion
    3.6 Figures
    Bibliography
General Discussion & Conclusion
A Appendix
    A.1 Supplemental Material for Chapter 1
    A.2 Theory Appendix to Chapter 1
    A.3 Supplemental Material for Chapter 2
    A.4 Supplemental Material for Chapter 3
Bibliography

Abstract

A central goal of evolutionary biology is to understand how organisms adapt to their environment. Though much progress has been made in answering this question, many aspects of the process of adaptation remain mysterious, particularly at the genetic level. This is especially true for biologists’ understanding of the genetic basis of adaptation in natural populations of organisms, as technological advances in sequencing have only recently made it possible to perform genomic research on non-model organisms. My dissertation integrates phenotypic and genetic perspectives to advance our understanding of selection and adaptation in natural populations of organisms. I take multiple approaches to this question, combining meta-analysis, population surveys, and manipulative experiments in the field. In my first chapter, I explore the consequences of natural selection on genetic variants. In many population genetic models, selection is parameterized as the selection coefficient, s. Despite the selection coefficient’s central importance to models of evolution, until recently we have known very little about plausible values of s for genetic variants in natural populations. Through a meta-analysis of over 3000 selection coefficients from 79 studies, I reveal generalities about how natural selection operates at the genetic level. I relate these results to population genetic theory and studies of phenotypic selection, and provide recommendations for the calculation, interpretation, and reporting of selection coefficients. In my second chapter, I consider natural selection and adaptation within a rapidly moving hybrid zone. I study a hybrid zone between two races of the butterfly Heliconius erato that differ in colour pattern. Because the genetic loci responsible for variation in colour pattern in H. erato are well characterized, I consider selection at the phenotypic and genetic levels simultaneously. I develop new statistical methods for quantifying hybrid zone position and shape and apply these to historical data and new collections to show that over the last 15 years the H. erato hybrid zone has grown wider while its movement has slowed. I show that this is due to a decrease in the strength of selection on mimetic colour pattern and the underlying colour-pattern allele. I then use remotely sensed data on forest loss and productivity to test hypotheses about the ecological forces that influence hybrid zone dynamics. In my final chapter, I examine whether phenotypic and genetic change are predictable. I take an experimental approach, using a large-scale, long-term, eco-evolutionary field study with Anolis sagrei lizards. Anoles are an exemplar of parallel evolution across an adaptive radiation, and their interactions with competitors and predators have been well studied in within-generation experiments. This provides clear predictions for how these ecological interactions might drive adaptive evolution over multiple generations. I test these predictions by manipulating the presence and absence of predator and competitor species in a factorial design across 16 small islands in the Bahamas. I measure changes in a suite of morphological traits relevant to habitat use and performance, and use next-generation sequencing to characterize changes in allele frequency across the genome. Despite strong and consistent effects of predators and competitors on behavior, diet, and population size in A. sagrei, I found that phenotypic and genetic change were difficult to predict in advance. Phenotypic change, though not stochastic, was related to variation in vegetation structure and lizard densities across islands, making a priori prediction challenging. Genetic change, on the other hand, was unpredictable and unrelated to our experimental manipulations, phenotypic change, or environmental differences. My work reveals the necessity of ecological data and knowledge of natural history for predicting natural selection, and shows how field experiments can be used to test and clarify hypotheses about how natural selection operates. Overall, my dissertation demonstrates that integrating phenotypic and genetic perspectives can help biologists understand how natural selection operates in the wild. In particular, it shows the value of combining these perspectives with detailed ecological data, novel statistical techniques, and experimentation to directly test hypotheses about evolution in natural populations.

Résumé

A central goal of evolutionary biology is to understand how organisms adapt to their environment. Although much progress has been made on this question, many aspects of the process of adaptation remain mysterious, particularly at the genomic level. This is especially true of our understanding of the genetic basis of adaptation in natural populations, since technological advances in sequencing have only recently made it possible to conduct genomic research on "non-model" organisms, that is, organisms whose genetics are little known or little studied. My thesis integrates phenotypic and genetic perspectives to advance our understanding of selection and adaptation in natural populations. I adopt several research approaches in my dissertation, combining meta-analysis, population surveys, and experimental manipulations in the field. In my first chapter, I explore the consequences of natural selection on genetic variants. In many population genetic models, selection is described by the selection coefficient, s. Despite the central importance of the selection coefficient in models of evolution, until recently we knew little about plausible values of s for genetic variants in natural populations. Through a meta-analysis of more than 3000 selection coefficients from 79 studies, I reveal generalities about how natural selection operates at the genetic level. I relate these results to population genetic theory and to studies of phenotypic selection, and offer recommendations for the calculation, interpretation, and presentation of selection coefficients. In my second chapter, I consider natural selection and adaptation within a dynamic, rapidly moving hybrid zone. I study a hybrid zone between two races of the butterfly Heliconius erato that differ in colour. Because the genetic loci responsible for variation in colour pattern in H. erato are well characterized, I consider selection at the phenotypic and genetic levels simultaneously. I develop new statistical methods for quantifying the position and shape of the hybrid zone and apply them to historical data and to new data to show that the H. erato hybrid zone has widened over the last 15 years while its movement has slowed. I show that this is due to a decrease in selection on the mimetic wing colour pattern of this butterfly and on the allele underlying the colour pattern. I then use remotely sensed data on forest loss and productivity to test hypotheses about the ecological factors that influence the movement of hybrid zones. In my final chapter, I examine whether phenotypic and genetic changes are predictable. I take an experimental approach, using a large-scale, long-term, eco-evolutionary field study of Anolis sagrei lizards. Anoles (family: , genus: Anolis) are an example of parallel evolution across an adaptive radiation, and their interactions with competitor and predator species have been well studied in within-generation experiments. This provides clear predictions for how these ecological interactions might drive adaptive evolution over multiple generations. I test these predictions by manipulating the presence and absence of predator and competitor species in a factorial design across 16 small islands in the Bahamas. I measure changes in a suite of morphological traits relevant to habitat use and performance, and use next-generation sequencing to characterize changes in allele frequency across the genome. Despite strong and consistent effects of predators and competitors on behavior, diet, and population size in A. sagrei, I found that phenotypic and genetic changes were difficult to predict. Phenotypic change, though not stochastic, was related to variation in vegetation structure and lizard densities across islands, making prediction difficult. Genetic changes, on the other hand, were unpredictable and unrelated to our experimental manipulations, to phenotypic changes, or to environmental differences. My work reveals the need for ecological data and knowledge of natural history to predict the effects of natural selection, and shows how experiments in natural settings can be used to test and clarify hypotheses about how natural selection operates. Overall, my thesis demonstrates that integrating phenotypic and genetic perspectives can help biologists understand how natural selection operates in the wild. In particular, my studies show the value of combining these perspectives with detailed ecological data, novel statistical techniques, and experimentation to directly test hypotheses about the evolution of natural populations.

vii Acknowledgements

You may have heard that doing a PhD is hard. It is, or at least it was for me. But it would have been much harder without the many people who have supported me over the last few years. First and foremost, I want to thank my PhD advisors, Rowan Barrett and Owen McMillan. I’m sure my dissertation has not turned out as they (or I!) expected: my original grand plans of field and laboratory experiments with Heliconius butterflies failed, as many grand plans do. Figuring out what to do instead has been a rocky journey, and Rowan and Owen were with me every step of the way, providing advice and guidance while also letting my own interests and talents guide the process. Owen has believed in and supported me as a scientist since before I even started my PhD, when I turned up to Panamá as a naive field assistant with a backpack full of REI clothes and little idea of what to do next. For his unflagging support, I am incredibly grateful. Prospective PhD students are, rightly, very concerned with picking a good advisor. I could not have done better than picking Rowan. For his ideas, attentiveness, guidance, support, friendship, and superhuman speed at returning edits on a manuscript, I am thankful. Rowan, I will miss being part of your lab. Thanks also to the members of my supervisory committee, Gregor Fussmann and Thomas Bureau, for providing helpful feedback and motivation toward finishing my dissertation. For the first two years of my PhD, I split my time between Montréal and the Smithsonian Tropical Research Institute in Panamá. I thank Raineldo Urriola, Vilma Fernandez, and the STRI administrative staff who helped me with obtaining permits, vehicles, laboratory space, and housing. An extra-special thanks to Adriana Bilgray, who cut through STRI’s bureaucracy many times to help me get what I need, and who always made sure I got paid. Rearing caterpillars for hours every day can be tedious. 
Elizabeth Evans, Emily Brodie, Laura Southcott, Joe Hanly, and the other denizens of the Maripozone made it less tedious. And thanks especially to Oscar Paneso, who made everything in the insectaries possible. Folks at STRI often say that Gamboa is a sticky place, both for the obvious reason (tropical humidity is no joke), and because people can’t help but keep coming back. Thank you, in no particular order, to May Dixon, Victoria Flores, Adriana Tapia, Michael Le Chevallier, Ioana Chiver, Salvatore Anzaldo, Justin Touchon, Valerie McMillan, Michael Logan, Rachel Crisp, Andre Szejner-Sigal, Ummatt Somjee, Kelly and Saul Beckett, Richard Merrill, and the many, many friends who kept me coming back. At McGill, thanks are due to Carole Smith, Caroline LeBlond, Sonal Patel, and Ancil Git- tens for solving many accounting and administrative problems along the way. A special thanks to Joe Iantomasi for cheerfully taking delivery of what must have been hundreds of boxes of reagents and lab supplies. I have benefited enormously from the intellectual community of graduate students and post- docs at McGill, and in particular in the Redpath Museum. A special thanks to Antoine Paccard for teaching me everything I know about work in the lab, I could not have done this without your sup-

viii port and friendship. Thanks to Marc-Olivier Beausoleil, Charles Xu, Juntao Hu, Madlen Stange, Alan Garcia-Elfring, Dieta Hanson, and the other members of the Barrett lab for being willing to keep talking to me about evolutionary biology even though we should really all be going home now. Thanks to Andrew Hendry and the whole DRYBAR lab group for creating such a welcoming work environment. And thanks to Marc-Olivier Beausoleil for translating my abstract into French. Besides their intellectual contributions, thanks to the many great friends at McGill who were always willing to go to Thomson House, have a beer, and celebrate (or commiserate) life as a graduate student. To Logan James, Beth Nyboer, Max Farrell, Vincent Fugère, and many others: thanks. And thanks especially to Dustin Raab and Trina Du, who were always willing to have a beer somewhere where the beer is better. In Montréal, thanks to Neven Leddy and Gaelle Hortop for being great housemates and friends. Thanks to Rebekkah Hyams, Simcha Samuel, Erick Provost, Stephanie Robins, Normand Daigneault, Abshishek Vadnerkar, Jamie Woods, Alina Geampana, Golshan Golriz, Chris Somos, and Sarah Mittermeier for giving me roots in Montréal outside of McGill. Thank you to my parents and my family, who have been incredibly supportive of my choice to pursue a PhD and the uncertain career path it entails. Without their love, support, and encourage- ment, I would never have become the person I am today. Mom and Dad, thank you for coming to visit me in foreign countries, reading my papers, making sure I take care of myself, and for always keeping me in your prayers. Thank you to my sister, Katie, who was my very first teacher. I’m sure when you started you didn’t expect I’d be in school for this long. Thanks to my brothers, Mark and Joe, who have been with me since day one. And thank you to my in-laws, Erica and Jason, for their support. To Skye Miner, I feel extremely lucky to have shared my PhD journey with you. 
Thank you for swimming, biking, and running with me, for putting up with me when I all I can think about is work, and for making every part of life in graduate school better. I also feel lucky to have the support of Skye’s family: Kevin Miner, Brenda Miner, and Samantha Miner. For chapter 1, I thank J. Anderson, W. Eanes, P. Gerbault, Z. Gompert, J. Prunier and S. Taylor for providing data, K. Gotanda for access to the preliminary database of Siepielski et al. 2013, and J. Hadfield and S. Diamond for answering questions about MCMCglmm. S. Otto, P. Nosil and W.O. McMillan provided helpful comments on an earlier version of this manuscript. S. Otto also co-authored appendix A.2. I thank L. Bernatchez, A. Eyre-Walker and two anonymous reviewers for helpful comments. This work was supported by a STRI-McGill-NEO fellowship to T.J.T. and funding from a Canada Research Chair and a Natural Science and Engineering Research Council Discovery grant to R.D.H.B. For chapter 2, I thank the members of the McMillan and Jiggins labs who helped catch butterflies, and S. Anzaldo for assistance in the field. J. Thurman, G. LaRouque, and M. Farrell provided mathematical and statistical advice. S. Van Belleghem provided the butterfly art for Figure 2.1, and J. Mallet provided helpful comments. I thank the government of Panamá for research and collecting permits, and gratefully acknowledge the Smithsonian Tropical Research Institute and McGill University for financial support. For chapter 3, thanks go first to Rob Pringle and Todd Palmer, who have let me take such a large role in the Anolis experiment. I am incredibly thankful. Many people contributed to field- work: Arash Askary, Kiyoko M. Gotanda, Jason J. Kolbe, Oriol LaPiedra, Jonathan B. Losos, Liam Revell, Hanna Wegener, Todd M. Palmer, Charles Xu, Johan Pansu, Julie Montenoise, Tyler Kartzinel, Naomi man in’t Veld, Tyler Coverdale, Kena Fox-Dobbs, Josh Daskin, and Dave Spiller.

ix Thanks to John and Dreko Chamberlain for their extensive help with logistics, and to Stacy Lubin- Gray, Luceta Hanna, and the government of the Bahamas for research and sampling permits. Back in Montréal, thanks to Sam Wunderlich, Ana Catarina Avila Vitorino, and Denice Liu for help in the lab and with measuring lizards. Thanks to Rodney Brown for help with the liquid-handling robot. A huge thank you to Anthony Geneva and Jonathan Losos for providing early access to the preliminary A. sagrei genome, and to Victor Soria-Carrasco for sharing GEMMA scripts. I am thankful for the funding support that I received from McGill University, the McGill Biology Department, the Smithsonian Tropical Research Institute, and the Québec Center for Bio- diversity Sciences during the course of my studies. It is hard to put into words the full extent of my gratitude for all the expertise, help, encour- agement, insight, friendship, and love that has been given to me by these people. And it is hard to know that I do not have the space (or, to my embarrassment, the memory) to name everyone who deserves to be included here. Nevertheless, to all of you: thank you.

Contributions to Original Knowledge

All chapters in this thesis constitute original scholarship written for the partial fulfillment of the degree of Doctor of Philosophy.

Chapter 1 is a literature review and meta-analysis of published estimates of selection coefficients, s, in natural populations. To my knowledge, it is the first and only meta-analysis of genetic selection coefficients. Thus, it provides the first answers to important questions about the strength, form, and variability of selection at the genetic level. Our study also reviews a wealth of literature on the genetics of adaptation, develops new theory to explain why the distribution of selection may be similar across phenotypic and genetic scales, and provides guidance to other biologists about how best to calculate and report selection coefficients so that they are more easily analyzed in future meta-analyses.

Chapter 2 is a study of a moving hybrid zone in Heliconius butterflies. Though individual studies of hybrid zones are common, very few studies, if any, have examined similarly long timescales (30 years), with a clear understanding of the link between genotype and phenotype, and with a direct hypothesis test of a proposed ecological driver of hybrid zone dynamics. This chapter is thus a particularly detailed study of a moving hybrid zone. Beyond this empirical value, I also develop a novel Bayesian implementation of classic cline models which improves on previous models in a number of ways and has the potential to be widely used in studies of hybrid zones and allele frequency clines more generally.

Chapter 3 reports the results of a bold eco-evolutionary experiment with Anolis lizards. Experimental tests of natural selection in the wild are rare, and most that do occur last only a generation or two. Our well-replicated, multi-generation experiment combines ecological, phenotypic, and genomic data and is among the most comprehensive experimental tests of a selective hypothesis to date. Importantly, our detailed ecological data allow us to contextualize our phenotypic and genetic results, as we show how selection on morphology in Anolis sagrei is mediated by a complex interplay between ecological interactions, individual behavior, and the structural environment.

Thesis Format

This thesis is in manuscript style. It opens with a general introduction which situates my work in the field, and attempts to do so in accessible language. The body of my thesis consists of three manuscripts for which I am the lead author. Between these chapters, I have written short linking statements that highlight conceptual and thematic connections between the preceding chapter and the next. Finally, I summarize my work in a general discussion. Two of the included manuscripts have been published in peer-reviewed journals and are reprinted here with the permission of John Wiley and Sons, their common publisher. The third chapter is in preparation for submission. This thesis uses the Chicago citation style.

Chapter 1: Thurman, T.J., Barrett, R.D.H. 2016. The genetic consequences of selection in natural populations. Molecular Ecology, 27(5), 1429-1448.

Chapter 2: Thurman, T.J., Szejner-Sigal, A., McMillan, W.O. Movement of a Heliconius hybrid zone over 30 years: a Bayesian approach. Journal of Evolutionary Biology, 32(9), 974-983.

Chapter 3: Thurman, T.J., Barrett, R.D.H. Predicting evolutionary change from ecological interactions: a field experiment with Anolis lizards. Prospective journal: Proceedings of the Royal Society B: Biological Sciences.

Contribution of Authors

I (TJT) am the first author for all chapters of this thesis, and for all sections of the appendix except for A.2, which was co-authored with Sarah P. Otto. I am the sole author of all non-manuscript parts of the thesis. My dissertation abstract was translated to French by Marc-Olivier Beausoleil. While writing these manuscripts I received assistance, insights, and support from a number of co-authors:

Chapter 1: R.D.H.B. and T.J.T. designed the research. T.J.T. performed the literature review and analysis. T.J.T. and R.D.H.B. wrote the manuscript. Appendix section A.2 was co-authored with Sarah P. Otto.

Chapter 2: W.O.M. conceived of revisiting the hybrid zone. T.J.T. and A.S-S. led the fieldwork with support from W.O.M. T.J.T. developed models, analyzed data, and wrote the manuscript. All authors edited and provided comments on the manuscript.

Chapter 3: R.M.P. designed the overall field experiment. R.M.P. and R.D.H.B. coordinated the study. T.J.T., R.M.P., and R.D.H.B. performed field work. T.J.T. performed phenotypic measurements and prepared sequencing libraries. T.J.T. performed all statistical and genomic analysis with input from R.D.H.B. T.J.T. wrote the manuscript with input from R.M.P. and R.D.H.B.

List of Figures

1.1 The distribution of directional selection coefficients, s
1.2 The distribution of directional selection coefficients across studied categories
1.3 Summary of mean selection coefficients across studied categories
1.4 Effect of accounting for autocorrelation and measurement error
1.5 The distribution of overdominant selection coefficients

2.1 Sampling of Heliconius erato across Panamá
2.2 Movement of cline through time
2.3 Relationship between change in forest cover and change in allele frequency

3.1 Outcomes of predicted change in univariate traits
3.2 Parallelism in multivariate phenotypic change
3.3 Correlations in allele frequency change across the experiment
3.4 Parallelism across scales

A.1 Distribution of selection coefficients by direction
A.2 Histogram of selection coefficients, binned at 0.01
A.3 Sample size vs. strength of selection
A.4 Sample size vs. precision
A.5 Precision vs. strength of selection
A.6 The distribution of selection coefficients for Mendelian phenotypes
A.7 Distribution of s with uniform genetic architecture and µ = 0.1
A.8 Distribution of s with uniform genetic architecture and µ = 1
A.9 Distribution of s with exponential genetic architecture and µ = 0.1
A.10 MCMC diagnostics for the best fit model for the 1982 cline
A.11 MCMC diagnostics for the best fit model for the 1999 cline
A.12 MCMC diagnostics for the best fit model for the 2015 cline
A.13 Estimates of inbreeding across the hybrid zone
A.14 Posterior predictive check, 1982 cline
A.15 Posterior predictive check, 1999 cline
A.16 Posterior predictive check, 2015 cline
A.17 Variation in forest metrics across the cline
A.18 Map of the experimental islands
A.19 Skeletal measurements taken
A.20 Histogram of all pairwise comparisons of ∆D
A.21 Correlations in allele frequency change across the experiment, top 5% SNPs
A.22 Parallelism across scales, top 5% SNPs

List of Tables

1.1 Summary of database
1.2 Mean selection coefficients across categories
1.3 Results of the generalized linear mixed models

2.1 Parameter estimates for best-fit cline model for each year

A.1 Description of the database
A.2 Results of sensitivity analysis
A.3 Files downloaded from Hansen et al. 2013
A.4 Butterfly collections from 2015
A.5 Comparison of model performance on simulated data
A.6 Estimates of the three Cr allele frequencies in the 1982 samples
A.7 Estimates of the three Cr allele frequencies in the 2015 samples
A.8 Cline parameter estimates for all models for 1982 cline
A.9 Cline parameter estimates for all models for 1999 cline
A.10 Cline parameter estimates for all models for 2015 cline
A.11 Model comparison results for 1982 cline
A.12 Model comparison results for 1999 cline
A.13 Model comparison results for 2015 cline
A.14 Island information
A.15 Perch height analysis results
A.16 Perch height analysis results, zero-inflated term
A.17 Perch diameter analysis results
A.18 Repeatability of phenotypic measurements
A.19 Results of univariate trait analysis
A.20 Effective population size of each island

xvii General Introduction

Consider your favourite . You can probably think of many things that make that animal extraordinary. As an example, my favourite animal is the emperor penguin. They are in- credible, with an impressive array of characteristics that make them well-suited to living in the harsh environment of Antarctica. They have dense feathers and thick layers of fat to guard against the frigid water in which they swim. They are agile hunters with counter-shaded camouflage which makes it harder for their predators and prey to see them. Even their blood is suited to their envi- ronment: their hemoglobin has an especially high affinity for oxygen, which allows them to dive for over 20 minutes while going hundreds of meters underwater to find food (Meir and Ponga- nis, 2009). Why is the penguin so well-suited its environment? Or, more generally, how does any organism become well suited to its environment? This question is at the heart of evolutionary biology, and my dissertation. In the ultimate sense, it has perhaps already been answered. Indeed, the answering of this question marks the founding of the field of evolutionary biology in its current form. As Darwin and Wallace first proposed, organisms (more precisely, populations of organisms) become well-suited to their envi- ronment through the process of adaptation driven by natural selection (Darwin and Wallace, 1858). To explain this more clearly, it will be helpful to define some terms1. The first is fitness. This word is occasionally misunderstood, perhaps because of the (near- tautological) phrase often used to define natural selection in layperson’s terms: "the survival of the fittest". Broadly, biologists define fitness as the ability of an organism to survive and reproduce.

1If you are already a practicing evolutionary biologist, please bear with being condescended to for a few paragraphs. I wanted the introduction of my dissertation, at least, to be accessible to laypeople (really, my family).

1 There are many formal and mathematical definitions of fitness and how best to measure it, but they all generally encapsulate this idea (Orr, 2009). More precisely, fitness is not only a property of organisms, but of any "replicating biological unit" (e.g., a cell, a genotype, an organism, a population, etc., Hendry et al., 2018). Finally, it is important to note that fitness is determined by the interplay between an organism and its environment: an organism with high fitness in one environment may have low fitness in another environment. This leads us to our next term, natural selection. Natural selection refers to variation in fitness across organisms. For ease of discussion, it is often characterized as a force, e.g., "selection favoured individuals with big ears." This is useful2, though not technically true. Natural selection is not an external force, but simply the necessary outcome of variation in fitness. Strictly speaking, natural selection occurs at the level of the phenotype, the term biologists use to refer to the physical characteristics of an organism. In other words, variation in fitness is due to variation in phenotypes: some phenotypes are better suited to an environment (that is, more fit) than other phenotypes3. Again, it is important to recognize that natural selection is dependent on ecological context: traits that are favoured in one environment may be at a disadvantage in a different one. Variation in any phenotype could lead to natural selection, though evolutionary biologists are most concerned with selection on heritable traits. These are traits that have a genetic basis and can be inherited by offspring from their parents. This is most easily explained with a counterex- ample. Among elite tennis players, for example, the arm used for holding the racquet has often grown larger and stronger than the non-dominant arm, a result of the physical forces placed on the muscles and bones during play (Lucki and Nicolay, 2007). 
Though this is a physical characteristic, the children of tennis players do not inherit their parents’ asymmetric arms. Selection on heritable phenotypes leads to changes in the distribution of those phenotypes in the next generation. If parents with large ears are more fit (i.e., they survive better and produce more offspring) and ear size is heritable, then the average ear size will increase in the next generation. This is adaptation: the process by which populations evolve increased fitness via natural selection on heritable phenotypes.

² I will do so throughout this thesis.
³ Indeed, this is one mathematical definition of natural selection: COV(z, ŵ). That is, selection is the covariance between phenotype, z, and expected relative fitness, ŵ (Brodie et al., 1995).

So, why do organisms appear so well-suited to their environment? We have an answer. Over generations, natural selection on heritable phenotypic traits has led organisms to evolve into the well-adapted forms we see today⁴. Though we can summarize adaptation in a single sentence, that elides the fact that we still have many, many things to learn about how exactly this process occurs. To wit, one word in our definition of adaptation is doing some heavy lifting: evolve.

In common usage, to evolve is to change. In biology, evolution has a more specific meaning. It is not simply any change through time, but rather change in the hereditary material (i.e., the genes) of a population. Just as phenotypes are the physical characteristics of an organism, genotypes are the genetic characteristics of the organism. In population genetics and genomics⁵, we characterize individual genotypes in terms of alleles: the different versions of a given gene or other genetic unit. Evolution, then, is a change in the frequencies of different alleles within a population⁶.

Allele frequencies in a population can change for a variety of reasons. Because population sizes are not infinite, allele frequencies may randomly change each generation, a process called genetic drift. Or, individuals from another population may immigrate and change allele frequencies (gene flow). The ultimate source of all genetic variation is mutation: errors made when copying DNA, which alter allele frequencies by creating alleles that didn’t exist before. These types of evolution are distinct from adaptive evolution, which is the result of changes in allele frequency due to natural selection. This occurs when individuals with certain genotypes are more fit, on average, than individuals with other genotypes.
Ultimately, this selection is mediated through the phenotype: some genotypes are more fit than others because they produce phenotypes that are more fit⁷.

⁴ And again, remember that fitness depends on the environment. If the environment changes, the well-adapted form of today could become the poorly-adapted form of tomorrow.
⁵ The distinction between genetics and genomics is one of scale. In genetics, we study one or a few genes. In genomics, we study the whole genome, that is, the complete set of genetic material for an organism.
⁶ There are other definitions besides this population genetic one.
⁷ Even selection that appears to act solely on the genotype, like codon bias, is mediated through phenotypes (translational efficiency, in the case of codon bias).
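To make the distinction between genetic drift and natural selection concrete, the following is a minimal sketch of a haploid Wright-Fisher simulation. It is an illustration under stated assumptions, not a method used in this thesis: the function names, the two-allele haploid model, and the parameter values are all hypothetical, and s here denotes the fitness advantage of one allele relative to the other (relative fitness 1 + s versus 1).

```python
import random


def frequency_after_selection(p, s):
    """Expected frequency of allele A after selection in a haploid model.

    Assumes allele A has relative fitness 1 + s and the alternative allele
    has relative fitness 1 (an illustrative parameterization).
    """
    return p * (1 + s) / (p * (1 + s) + (1 - p))


def next_generation(p, pop_size, s, rng):
    """Apply selection, then drift (binomial sampling of pop_size offspring)."""
    p_selected = frequency_after_selection(p, s)
    copies = sum(1 for _ in range(pop_size) if rng.random() < p_selected)
    return copies / pop_size


def simulate(p0, pop_size, generations, s=0.0, seed=1):
    """Return the allele-frequency trajectory, starting from p0."""
    rng = random.Random(seed)
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_generation(trajectory[-1], pop_size, s, rng))
    return trajectory


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    drift_only = simulate(p0=0.2, pop_size=200, generations=50, s=0.0)
    with_selection = simulate(p0=0.2, pop_size=200, generations=50, s=0.1)
    print("final frequency, drift only:    ", round(drift_only[-1], 3))
    print("final frequency, with selection:", round(with_selection[-1], 3))
```

With s = 0 the frequency wanders at random from one generation to the next. With s > 0 it tends to rise, although in a small population drift can still cause a favoured allele to be lost.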

Thus when we dig into the meaning of "evolve", we arrive at a major question: how is natural selection at the phenotypic level related to selection and heritable change at the genetic level? This is a crucial question for our understanding of adaptation, and understanding adaptation is not simply an academic concern. Many pressing public policy and conservation concerns are, fundamentally, issues of adaptation. In order to understand whether species can adapt to environments which are rapidly changing due to climate change (Barrett and Hendry, 2012; Bell, 2013); to predict whether invasive species may colonize new habitats (Prentis et al., 2008) and how native species may evolve in response (Carroll, 2007); or to halt the spread of insecticide-resistant pests or antibiotic-resistant pathogens (Baquero and Blázquez, 1997; Labbé et al., 2007), we must improve our understanding of adaptation and incorporate this knowledge into policy making (Carroll et al., 2014; Urban et al., 2016). However, we still know surprisingly little about how adaptive evolutionary change at the genetic level is related to natural selection on phenotypes. This is perhaps partly because these two topics have often been studied separately, and across separate subfields⁸.

Field biologists have long been interested in understanding natural selection on phenotypic traits. This became much more feasible after the development of straightforward statistical methods for detecting and quantifying phenotypic selection (Lande and Arnold, 1983). Since then, hundreds of studies have quantified natural selection on phenotypes. These studies revolutionized our understanding of natural selection as a contemporary process, not just a historical one (Reznick et al., 2018). They also raised the standards of evidence required for invoking natural selection and adaptation, dampening some biologists’ tendency to see every trait as a product of adaptation (Gould and Lewontin, 1979; Ellstrand, 1983).
Meta-analyses have revealed the strength and form of phenotypic selection (Endler, 1986; Hoekstra et al., 2001; Kingsolver et al., 2001; Hereford et al., 2004), shown that it creates phenotypic diversity (Rieseberg et al., 2002), and examined how it varies through time and space (Siepielski et al., 2009; Kingsolver and Diamond, 2011; Siepielski et al., 2011; Morrissey and Hadfield, 2012; Siepielski et al., 2013). However, most of this work has been done with relatively little knowledge of the genetics of the traits in question (Comeault et al., 2014). Thus, in many cases where phenotypic natural selection has been quantified, it is unclear whether the trait is heritable and whether selection will lead to genetic change and adaptation.

⁸ A very notable exception is the large and successful field of quantitative genetics. Quantitative genetics is excellent for predicting genetic evolutionary change in quantitative phenotypes, especially in pedigreed populations as in plant and animal breeding. Although quantitative genetics involves genetics (as the name implies), it does so at the level of mathematical abstraction and statistical summarization. Quantitative genetics generally does not, or did not, involve empirical estimates of allele frequency or genetic sequence. However, that is changing as DNA sequencing technology becomes more widely available.

Studies of natural selection and adaptation at the genetic level, conversely, often lack a phenotypic perspective. This is likely caused, at least in part, by data availability. The study of the genetics of adaptation was once a data-starved, mostly theoretical discipline (Orr, 2005; Rockman, 2012). Technological advances over the last 20 years have, to repeat the cliché, revolutionized the field. The cost of DNA sequencing has plummeted (Goodwin et al., 2016). In many ways, obtaining genetic data is cheaper and easier than obtaining phenotypic or environmental data (Houle et al., 2010). This is, of course, a good thing. However, it does mean that genetic data often lack phenotypic and ecological context. For example, population geneticists often use "genome scans" to find parts of the genome that show a signature of having been influenced by natural selection in the past (Vitti et al., 2013). However, it can be quite difficult to determine why that particular region may have been under selection. Often, geneticists look at the putative functions of any nearby genes (if there are any) to get some clue about what might have caused selection. This approach can be effective, but there is also a great risk of over-interpreting false positives (Pavlidis et al., 2012). As with phenotypic selection, biologists must take care to avoid invoking selection as the explanation for genetic patterns when it is not warranted (Barrett and Hoekstra, 2011).
For this reason, studies of experimental evolution are especially useful when considering the process of adaptation at the genetic level, as biologists can manipulate the proposed agent of selection either in the lab (e.g., Good et al., 2017) or in the field (e.g., Gompert et al., 2014) and thus directly test for selective effects.

One approach to bridging the gap between genotype and phenotype has been through genetic mapping of traits: using statistics to find associations between phenotypic variation and genetic variation (Santure and Garant, 2018). Such studies find the "gene(s) for" a particular trait, often called the genetic architecture of a trait, and have been successful in finding the parts of the genome underlying variation in traits ranging from abdominal colouration in Drosophila fruit flies (Bastide et al., 2013) to salt tolerance in monkeyflowers (reviews in Slate, 2004; Schielzeth and Husby, 2014). However, there is much debate about what exactly these studies can tell us about the genetic basis of adaptation (Rockman, 2012; Travisano and Shaw, 2013). One concern is that association mapping is most powerful when a phenotypic trait has a simple genetic basis, i.e., it is "controlled by" only one or a few genes with large effects on the trait⁹. In this case, underpowered association tests can lead to an upward bias when estimating the phenotypic effect of a gene (called the Beavis effect, Xu, 2003). This issue of power is exacerbated when association mapping is done outside of laboratory studies, as a number of the features of natural populations (e.g., difficulty in obtaining large sample sizes, increased amounts of population structure¹⁰) further reduce the power of association mapping to detect the genes associated with a trait (Santure and Garant, 2018). If most traits have a polygenic architecture in which many genes with relatively small effects are responsible for trait variation, association studies may be missing these effects and misleading us (Rockman, 2012). Another issue is that the genetic architecture of a trait can itself evolve, such that the results from association mapping in one population may not apply to another (Hansen, 2006; Santure et al., 2015; Schielzeth et al., 2018). The debate over the value of association mapping is far from settled.
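The upward bias described above can be illustrated with a small simulation. The sketch below is a deliberately simplified stand-in for the Beavis effect, not the analysis of Xu (2003): it assumes each hypothetical study yields a normally distributed estimate of the same true effect, and that only estimates passing a two-sided significance threshold are reported. All names and numbers are illustrative assumptions.

```python
import random
import statistics


def simulate_estimates(true_effect, standard_error, n_studies, seed=1):
    """Draw one noisy effect-size estimate per hypothetical study."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, standard_error) for _ in range(n_studies)]


def significant_only(estimates, standard_error, z_threshold=1.96):
    """Keep only estimates that pass a two-sided significance threshold."""
    return [b for b in estimates if abs(b) / standard_error > z_threshold]


if __name__ == "__main__":
    # Low power by construction: the true effect is no larger than its SE.
    true_effect, standard_error = 0.1, 0.1
    estimates = simulate_estimates(true_effect, standard_error, n_studies=20000)
    detected = significant_only(estimates, standard_error)
    print("true effect:                 ", true_effect)
    print("mean of all estimates:       ", round(statistics.mean(estimates), 3))
    print("mean of significant estimates:", round(statistics.mean(detected), 3))
    print("fraction significant:        ", round(len(detected) / len(estimates), 3))
```

Averaged over all studies, the estimates centre on the true effect. Conditioning on significance keeps mostly the estimates that happened to overshoot, so the average reported effect is inflated, and the inflation grows as power declines.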
However, both advocates and detractors agree that one way forward is to directly study the evolutionary process: instead of (or in addition to) association mapping, biologists should use observations and experiments to link the ecological forces which drive natural selection to the resulting phenotypic and genetic evolutionary changes that are the basis of adaptive evolution (Travisano and Shaw, 2013; Lee et al., 2014).

⁹ Of course, this is generally true of all statistical methods: large effects are always easier to detect than small effects.
¹⁰ Population structure refers to differences in the frequencies of alleles between populations, or between subgroups in a population. To see why this complicates association studies, imagine two populations which differ in height. These populations have five genes, four of which do not influence height and one of which does. The populations will have different allele frequencies at the one gene that affects height, but they will likely also differ at the four other genes due to genetic drift and mutation. Now, how can we tell which of the differing genes influences height?

I take this approach in my dissertation, which combines meta-analysis, observational study, and experimentation in the field to advance our understanding of the phenotypic and genetic basis of adaptation. By integrating phenotypic, genetic, and ecological data, I hope to contextualize evolutionary change in terms of the environmental and ecological forces which may be causing selection. This improves our understanding of why evolution is occurring and helps us understand the mechanisms behind the evolutionary process. An important feature of my approach is that I study natural populations of organisms. Laboratory studies have been crucial for developing and testing theories of adaptation, but we cannot know whether their results are applicable in the complex, changing environments of natural populations unless we go to the field and study evolution as it happens in the wild.

In Chapter 1, I perform a meta-analysis of published estimates of selection at the genetic level in natural populations of organisms. In population genetic models, the effect of selection on genetic variation is often encapsulated in a single parameter, s. I analyze over 3000 published estimates of s to discover generalities about how selection operates at the genetic level. Though the data in this chapter are genetic, I discuss my findings in the context of similar meta-analyses of selection on phenotypes to show how the effects of natural selection are similar across these levels and use theory to show why this might be so. I also give recommendations on how to calculate and report selection coefficients in future studies.

In Chapter 2, I study a rapidly moving hybrid zone of Heliconius butterflies. In Heliconius, wing colour patterns are under strong selection: they signal to predators that Heliconius are toxic. Multiple species have converged on the same colour pattern to more effectively educate predators, and individuals with different patterns are more likely to be eaten. I study a hybrid zone between races of Heliconius that differ in colour pattern.
The single gene responsible for this difference is known, allowing me to simultaneously track changes in phenotype and genotype as the hybrid zone moves. I develop a new method for quantifying the shape and position of hybrid zones and apply it to historical data and new collections from across the hybrid zone. I use population genetic theory to relate changes in the movement and shape of the hybrid zone to changes in natural selection, and investigate deforestation as a possible ecological cause of changing selection.

In Chapter 3, I test the predictability of evolution. I track changes in phenotype and genotype during a large-scale, multi-generation field experiment with Anolis sagrei lizards. The experiment, set up across 16 small islands in the Bahamas, tests how A. sagrei respond to the introduction of predator and competitor species. These novel interactions are expected to impose natural selection on A. sagrei that should lead to predictable evolutionary change. Over five generations, I test whether these expectations come true by measuring changes in phenotypes associated with sprint speed, agility, and feeding and by using genomics to quantify changes in allele frequency. I then correlate phenotypic and genetic change with ecological data on the density of each species and the structural characteristics of the environment to test and clarify hypotheses about how these ecological factors might influence adaptive evolution.

Finally, I conclude by summarizing my findings from these three chapters, highlighting how they advance our understanding of adaptation, and proposing future research which could build upon my results.

Bibliography

Baquero, F. and J. Blázquez (1997). Evolution of antibiotic resistance. Trends in Ecology & Evolution 12(12), 482–487.

Barrett, R. D. H. and A. P. Hendry (2012). Evolutionary rescue under environmental change? In U. Candolin and B. B. Wong (Eds.), Behavioural Responses to a Changing World, pp. 216–233. Oxford: Oxford University Press.

Barrett, R. D. H. and H. E. Hoekstra (2011). Molecular spandrels: tests of adaptation at the genetic level. Nature Reviews Genetics 12(11), 767–780.

Bastide, H., A. Betancourt, V. Nolte, R. Tobler, P. Stöbe, A. Futschik, and C. Schlotterer (2013). A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genetics 9(6), e1003534.

Bell, G. (2013). Evolutionary rescue and the limits of adaptation. Philosophical Transactions of the Royal Society B: Biological Sciences 368(1610), 20120080.

Brodie, III, E. D., A. J. Moore, and F. J. Janzen (1995). Visualizing and quantifying natural selection. Trends in Ecology & Evolution 10(8), 313–318.

Carroll, S. P. (2007). Natives adapting to invasive species: ecology, genes, and the sustainability of conservation. Ecological Research 22(6), 892–901.

Carroll, S. P., P. S. Jørgensen, M. T. Kinnison, C. T. Bergstrom, R. F. Denison, P. Gluckman, T. B. Smith, S. Y. Strauss, and B. E. Tabashnik (2014). Applying evolutionary biology to address global challenges. Science 346(6207), 1245993.

Comeault, A. A., V. Soria-Carrasco, Z. Gompert, T. E. Farkas, C. A. Buerkle, T. L. Parchman, and P. Nosil (2014). Genome-wide association mapping of phenotypic traits subject to a range of intensities of natural selection in Timema cristinae. The American Naturalist 183(5), 711–727.

Darwin, C. and A. Wallace (1858). On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection. Zoological Journal of the Linnean Society 3(9), 45–62.

Ellstrand, N. C. (1983). Why are juveniles smaller than their parents? Evolution 37(5), 1091–1094.

Endler, J. A. (1986). Natural Selection in the Wild. Princeton University Press.

Gompert, Z., A. A. Comeault, T. E. Farkas, J. L. Feder, T. L. Parchman, C. A. Buerkle, and P. Nosil (2014). Experimental evidence for ecological selection on genome variation in the wild. Ecology Letters 17(3), 369–379.

Good, B. H., M. J. McDonald, J. E. Barrick, R. E. Lenski, and M. M. Desai (2017). The dynamics of molecular evolution over 60,000 generations. Nature 551, 45–50.

Goodwin, S., J. D. McPherson, and W. R. McCombie (2016). Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics 17(6), 333–351.

Gould, S. J. and R. C. Lewontin (1979). The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proceedings of the Royal Society B 205, 581–598.

Hansen, T. F. (2006). The evolution of genetic architecture. Annual Review of Ecology, Evolution, and Systematics 37(1), 123–157.

Hendry, A. P., D. J. Schoen, M. E. Wolak, and J. M. Reid (2018). The contemporary evolution of fitness. Annual Review of Ecology, Evolution, and Systematics 49(1), 457–476.

Hereford, J., T. F. Hansen, and D. Houle (2004). Comparing strengths of directional selection: how strong is strong? Evolution 58(10), 2133–2143.

Hoekstra, H. E., J. M. Hoekstra, D. Berrigan, S. N. Vignieri, A. Hoang, C. E. Hill, P. Beerli, and J. G. Kingsolver (2001). Strength and tempo of directional selection in the wild. Proceedings of the National Academy of Sciences 98(16), 9157–9160.

Houle, D., D. R. Govindaraju, and S. Omholt (2010). Phenomics: the next challenge. Nature Reviews Genetics 11(12), 855–866.

Kingsolver, J. G. and S. E. Diamond (2011). Phenotypic selection in natural populations: what limits directional selection? The American Naturalist 177(3), 346–357.

Kingsolver, J. G., H. E. Hoekstra, J. M. Hoekstra, D. Berrigan, S. N. Vignieri, C. E. Hill, A. Hoang, P. Gibert, and P. Beerli (2001). The strength of phenotypic selection in natural populations. The American Naturalist 157(3), 245–261.

Labbé, P., C. Berticat, A. Berthomieu, S. Unal, C. Bernard, M. Weill, and T. Lenormand (2007). Forty years of erratic insecticide resistance evolution in the mosquito Culex pipiens. PLoS Genetics 3(11), e205.

Lande, R. and S. J. Arnold (1983). The measurement of selection on correlated characters. Evolution 37(6), 1210–1226.

Lee, Y. W., B. A. Gould, and J. R. Stinchcombe (2014). Identifying the genes underlying quantitative traits: a rationale for the QTN programme. AoB PLANTS 6, plu004.

Lucki, N. C. and C. W. Nicolay (2007). Phenotypic plasticity and functional asymmetry in response to grip forces exerted by intercollegiate tennis players. American Journal of Human Biology 19(4), 566–577.

Meir, J. U. and P. J. Ponganis (2009). High-affinity hemoglobin and blood oxygen saturation in diving emperor penguins. Journal of Experimental Biology 212(20), 3330–3338.

Morrissey, M. B. and J. D. Hadfield (2012). Directional selection in temporally replicated studies is remarkably consistent. Evolution 66(2), 435–442.

Orr, H. A. (2005). The genetic theory of adaptation: a brief history. Nature Reviews Genetics 6(2), 119–127.

Orr, H. A. (2009). Fitness and its role in evolutionary genetics. Nature Reviews Genetics 10(8), 531–539.

Pavlidis, P., J. D. Jensen, W. Stephan, and A. Stamatakis (2012). A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Molecular Biology and Evolution 29(10), 3237–3248.

Prentis, P. J., J. R. U. Wilson, E. E. Dormontt, D. M. Richardson, and A. J. Lowe (2008). Adaptive evolution in invasive species. Trends in Plant Science 13(6), 288–294.

Reznick, D. N., J. Losos, and J. Travis (2018). From low to high gear: there has been a paradigm shift in our understanding of evolution. Ecology Letters 51, 1742–12.

Rieseberg, L. H., A. Widmer, A. M. Arntz, and J. M. Burke (2002). Directional selection is the primary cause of phenotypic diversification. Proceedings of the National Academy of Sciences 99(19), 12242–12245.

Rockman, M. V. (2012). The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution 66(1), 1–17.

Santure, A. W. and D. Garant (2018). Wild GWAS — association mapping in natural populations. Molecular Ecology Resources 18(4), 729–738.

Santure, A. W., J. Poissant, I. De Cauwer, K. van Oers, M. R. Robinson, J. L. Quinn, M. A. M. Groenen, M. E. Visser, B. C. Sheldon, and J. Slate (2015). Replicated analysis of the genetic architecture of quantitative traits in two wild great tit populations. Molecular Ecology 24(24), 6148–6162.

Schielzeth, H. and A. Husby (2014). Challenges and prospects in genome-wide quantitative trait loci mapping of standing genetic variation in natural populations. Annals of the New York Academy of Sciences 1320(1), 35–57.

Schielzeth, H., A. Rios Villamil, and R. Burri (2018). Success and failure in replication of genotype-phenotype associations: How does replication help in understanding the genetic basis of phenotypic variation in outbred populations? Molecular Ecology Resources 18(4), 739–754.

Siepielski, A. M., J. D. DiBattista, and S. M. Carlson (2009). It’s about time: the temporal dynamics of phenotypic selection in the wild. Ecology Letters 12(11), 1261–1276.

Siepielski, A. M., J. D. DiBattista, J. A. Evans, and S. M. Carlson (2011). Differences in the temporal dynamics of phenotypic selection among fitness components in the wild. Proceedings of the Royal Society B: Biological Sciences 278(1711), 1572–1580.

Siepielski, A. M., K. M. Gotanda, M. B. Morrissey, S. E. Diamond, J. D. DiBattista, and S. M. Carlson (2013). The spatial patterns of directional phenotypic selection. Ecology Letters 16(11), 1382–1392.

Slate, J. (2004). Quantitative trait locus mapping in natural populations: progress, caveats and future directions. Molecular Ecology 14(2), 363–379.

Travisano, M. and R. G. Shaw (2013). Lost in the map. Evolution 67(2), 305–314.

Urban, M. C., G. Bocedi, A. P. Hendry, J. B. Mihoub, G. Peer, A. Singer, J. R. Bridle, L. G. Crozier, L. De Meester, W. Godsoe, A. Gonzalez, J. J. Hellmann, R. D. Holt, A. Huth, K. Johst, C. B. Krug, P. W. Leadley, S. C. F. Palmer, J. H. Pantel, A. Schmitz, P. A. Zollner, and J. M. J. Travis (2016). Improving the forecast for biodiversity under climate change. Science 353(6304), aad8466.

Vitti, J. J., S. R. Grossman, and P. C. Sabeti (2013). Detecting natural selection in genomic data. Annual Review of Genetics 47(1), 97–120.

Xu, S. (2003). Theoretical basis of the Beavis effect. Genetics 165(4), 2259–2268.

Chapter 1

The genetic consequences of selection in natural populations

Timothy J. Thurman¹,², Rowan D. H. Barrett¹

Author affiliations:
¹ Redpath Museum and Department of Biology, McGill University, Montréal, QC, Canada
² Smithsonian Tropical Research Institute, Panamá, República de Panamá

This chapter is published in Molecular Ecology 25 (7), 1429–1448.

1.1 Abstract

The selection coefficient, s, quantifies the strength of selection acting on a genetic variant. Despite this parameter’s central importance to population genetic models, until recently we have known relatively little about the value of s in natural populations. With the development of molecular genetic techniques in the late 20th century and the sequencing technologies that followed, biologists are now able to identify genetic variants and directly relate them to organismal fitness. We reviewed the literature for published estimates of natural selection acting at the genetic level and found over 3000 estimates of selection coefficients from 79 studies. Selection coefficients were roughly exponentially distributed, suggesting that the impact of selection at the genetic level is generally weak but can occasionally be quite strong. We used both nonparametric statistics and formal random-effects meta-analysis to determine how selection varies across biological and methodological categories. Selection was stronger when measured over shorter timescales, with the mean magnitude of s greatest for studies that measured selection within a single generation. Our analyses found conflicting trends when considering how selection varies with the genetic scale (e.g., SNPs or haplotypes) at which it is measured, suggesting a need for further research. Besides these quantitative conclusions, we highlight key issues in the calculation, interpretation, and reporting of selection coefficients and provide recommendations for future research.

1.2 Introduction

Since the publication of Lande and Arnold’s landmark methods for calculating selection on quantitative phenotypic traits (Lande and Arnold, 1983), the study of selection in natural populations has exploded. Hundreds of studies have generated thousands of estimates of selection on phenotypic traits, and the last 15 years have seen a number of influential reviews and meta-analyses of these data on phenotypic selection. These studies have improved our understanding of the strength and form of phenotypic selection in natural populations (Hoekstra et al., 2001; Kingsolver et al., 2001; Hereford et al., 2004), demonstrated its role in creating phenotypic diversity (Rieseberg et al., 2002), and shown how selection varies through time and space (Siepielski et al., 2009; Kingsolver and Diamond, 2011; Siepielski et al., 2011; Morrissey and Hadfield, 2012; Siepielski et al., 2013).

Of course, biologists have long recognized that natural selection must be transmitted to the genetic level for adaptive evolutionary change to occur. Population genetic models explicitly account for natural selection’s role in changing allele frequencies with the parameter s, the selection coefficient (Hartl and Clark, 1997). Although s can have slightly different meanings in different models (section 1.7.1), it generally describes the relative fitness advantage or disadvantage of an allele at a genetic locus. The genetic selection coefficient is thus similar to phenotypic selection gradients and differentials and quantifies the magnitude of natural selection acting on genetic variants. Compared to measures of phenotypic selection, however, we know relatively little about the values of s in natural populations of organisms. The questions that have been considered in reviews of phenotypic selection remain unanswered at the genetic level: How strong is selection at the genetic level? Is selection most often directional, overdominant, or frequency dependent? What is the distribution of selection coefficients in natural populations, and does that distribution change according to the temporal or genetic scale at which selection is measured? These are important questions in evolutionary biology, but it is only recently that biologists have had sufficient genetic data to address them empirically.

Theoretical models have examined these issues, but their results are difficult to apply to natural populations for a variety of reasons. The first difficulty is the conceptual division between theories of positive selection and theories of negative selection. The designation of selection as positive or negative is determined by the choice of the allele used as the reference for calculating relative fitnesses and is thus somewhat arbitrary (section 1.7.1). Nevertheless, theoretical models often consider only one mode of selection, and this difference in focus can lead to different results. For example, theoretical models of the fitness effects of new mutations find that beneficial mutations fixed during adaptation are likely exponentially distributed, while the distribution of deleterious mutations can be complex and multimodal (see reviews by Orr, 2005a; Eyre-Walker and Keightley, 2007; Rockman, 2012).

Although it is easy to delimit positive and negative selection in theoretical models, drawing this distinction is more difficult in natural systems, where there is considerable debate about whether most populations are at fitness optima (and thus likely to experience mostly negative selection) or maladapted (and thus allowing an opportunity for positive selection). Reciprocal transplant experiments find frequent but not ubiquitous local adaptation (reviewed in Kawecki and Ebert, 2004; Leimu and Fischer, 2008; Hereford, 2009) and published estimates of phenotypic selection indicate that mean trait values for the majority of traits are within two standard deviations of the fitness optimum (Estes and Arnold, 2007). Whether these patterns indicate widespread adaptation or maladaptation is open to interpretation, so it is difficult to know a priori which body of theory to apply (Hendry and Gonzalez, 2008).

16 Second, within the broad fields of positive and negative selection, theoretical predictions vary greatly based on the assumptions and parameters of specific models. Consider, for example, theories of adaptation that predict the distribution of fitness factors fixed during an adaptive bout (Orr, 2005a,b). Models that assume a stationary fitness optimum (e.g., Orr, 1998, 2003; Kryazhim- skiy et al., 2009) predict a different distribution of selection coefficients than models with a mov- ing optimum (e.g., Collins et al., 2007; Kopp and Hermisson, 2007, 2009a,b). Other factors that can influence this distribution include correlations between traits (Martin and Lenormand, 2006), migration between populations (Yeaman and Whitlock, 2011), the use of novel versus standing genetic variation (Hermisson and Pennings, 2005; Barrett and Schluter, 2008), the distance to the fitness optimum (Barrett et al., 2006; Seetharaman and Jain, 2013), and the number of fit- ness optima (Martin and Lenormand, 2015). Once again, applying this theory requires knowledge of parameters (e.g., amount of migration between locally adapted populations, current level of (mal)adaptation in the population, movement of fitness optima) that can be difficult to estimate for natural populations. Finally, theoretical models usually examine selection at a scale that can be difficult for empiricists to access in natural populations. Most of the models mentioned above consider the fitness effects of single point mutations. Often, biologists must measure selection on different alleles of a gene or QTL; selection acting on these larger genomic intervals might have different properties from selection on SNPs. In summary, this array of theory is informative but difficult to apply. Until recently, obtain- ing the data necessary to address these questions empirically was challenging. 
Although population geneticists have inferred selection at the genetic level by observing changes in Mendelian phenotypes for many years (see section A.1.9), direct estimation of selection on genetic variation was only made possible by the revolution in molecular genetic techniques that occurred in the 1970s and 1980s. These methods, and the next-generation sequencing technologies that followed, have allowed researchers to detect natural selection at the genetic level using a variety of methods. We briefly discuss these methods below; see Linnen and Hoekstra (2009) and Hohenlohe and Cresko (2010) for more thorough reviews.

Many observational approaches to quantifying selection rely on measuring changes in allele frequency, which can be detected directly with molecular genetic techniques. Allele frequency changes can occur over time (e.g., an increase in frequency over multiple generations, Nsanzabana et al., 2010), across an environmental gradient (e.g., a decrease in frequency across a gradient of insecticide application, Lenormand et al., 1999), or between contrasting environments (e.g., frequency differences between two locally adapted populations, Hoekstra et al., 2004). Another important observational approach is the detection of selection from features of DNA sequence variation. These features can include (but are not limited to) haplotype structure (e.g., Quesada et al., 2003), patterns of linkage disequilibrium (LD) around a selected locus (e.g., Ohashi et al., 2011), and reduction in variation around a selected locus (e.g., Orengo and Aguade, 2007). The limitation of these approaches is that they alone cannot explicitly determine the process that led to the observed patterns of allele frequency or nucleotide variation (Barrett and Hoekstra, 2011). Thus, observational approaches often use population genetic modelling, simulations, and statistical analysis to rule out the possibility that only genetic drift or other neutral forces (e.g., demographic changes) could have produced the observed pattern (Excoffier et al., 2009; Li et al., 2012; Vitti et al., 2013). Nevertheless, all estimates of selection likely contain some imprecision due to drift. This problem of determining causality can sometimes be mitigated with experimental approaches. By tracking changes in allele frequency during experimentally controlled selection in the field, researchers can accurately measure selection and identify the agent imposing it (Linnen and Hoekstra, 2009).
Over the past few decades, biologists have made use of all of these techniques, and others (e.g., Robinson et al., 2012), to quantify selection acting at the genetic level in natural populations. In this study, we gathered those estimates of selection to address a number of key questions. Given the difficulty of applying population genetic theory to predict the distribution of selection coefficients, we first plotted this distribution to see how it differs between biological and methodological categories. Next, we used nonparametric statistics and generalized linear mixed models to quantify how the magnitude of selection varies across temporal and genetic scales. Meta-analyses of phenotypic selection and evolutionary rates indicate that strong phenotypic selection is rarely maintained for long and that long-term estimates of selection or rates of evolutionary change tend to be weaker than short-term estimates (Gingerich, 1983; Hoekstra et al., 2001; Kinnison and Hendry, 2001; Siepielski et al., 2009). We predicted that this inverse relationship between strength of selection and temporal scale would be true of selection at the genetic level as well. We hypothesized that selection would also vary based on the genetic unit at which it was measured. Specifically, we assumed that strength of selection on a locus would be proportional to the amount of phenotypic variance that the genetic unit can explain. We reasoned that, with some exceptions, larger genetic units (e.g., allelic variants of a gene or QTL) would tend to have larger phenotypic effects than SNPs. Thus, we predicted that the strength of selection would increase with genetic scale. Finally, we highlight a number of important issues regarding the calculation and interpretation of selection coefficients and make recommendations for further research that will improve our understanding of this important evolutionary parameter.

1.3 Materials and methods

1.3.1 Literature search

To assemble our database, we searched for journal articles reporting selection coefficients in a number of ways. First, we searched the Web of Science database system using three different search terms: ‘selection coefficient*’, ‘genotyp* selection’ and ‘adapt* gene’. We excluded books and search results from scientific fields outside of ecology and evolution (see section A.1.1). Second, we searched the preliminary literature database of Siepielski et al. (2013), a meta-analysis of spatial variation in phenotypic selection, for journal articles that were excluded from their analysis for studying genotypes instead of phenotypes. Third, we searched the weekly tables of contents from a number of journals that focus on evolutionary biology and genetics (see section A.1.1). Finally, while determining which studies met our inclusion criteria, we noted references to papers that might have reported selection coefficients and added those to our database. In total, we examined approximately 2200 papers for estimates of natural selection at the genetic level.

1.3.2 Inclusion criteria

To be included in our quantitative analysis, studies needed to satisfy three criteria. First, the study had to report a selection coefficient on a genetic unit (s). Estimates of s that were equal to zero were not included, as they did not detect selection acting on a locus (see section A.1.6). Selection coefficients can have different meanings under different population genetic models, but in most cases they quantify the difference in mean relative fitness between the most- and least-fit homozygotes (section 1.7.1). We excluded a small number of studies that reported selection coefficients that did not follow this model and thus had different properties from the rest of the calculated estimates. We also analysed directional selection separately from over- and underdominance. Selection coefficients scaled by effective population size (e.g., γ or δ) were excluded, as were estimates of s that reported a range of possible values without specifying a median or point estimate. A number of studies reported relative fitnesses for genotypes without explicitly calculating a selection coefficient. In those cases we used the relative fitnesses to calculate selection against the less-fit homozygote (s = 1 − waa).

Second, selection coefficients needed to be calculated for a specific genetic unit. For our analysis, we categorized these units as either ‘SNP’, which includes point mutations and single nucleotide polymorphisms, or ‘haplotype’, which includes all larger genetic units (e.g., insertions or deletions of more than one base pair, allelic variants of genes, allozymes, microsatellite loci, and quantitative trait loci). A number of studies used DNA sequence data to estimate the distribution of selection coefficients or average strength of selection acting on a set of genetic loci or type of mutation but did not calculate locus-specific selection coefficients. For example, Turchin et al.
(2012) estimated the average strength of selection on about 1400 individual SNPs associated with increased height in Europeans, but did not report estimates of s for each SNP. These average selection coefficients were excluded from our analysis.

Finally, studies needed to measure selection operating in natural populations. Thus, we excluded measures of selection in laboratory populations or in domesticated plants and animals. Estimates of selection in humans were included, as were estimates of selection from experimentally manipulated natural populations or organisms introduced into suitable habitat.

For each study that satisfied these criteria, we recorded the absolute value of s, whether selection was positive or negative, any measures of error, the statistical significance of the coefficient, the number of generations over which selection was measured, the genetic unit at which selection was measured, the method used to calculate the selection coefficient, whether the estimate of selection came from observation or a manipulative experiment, and other information (Table A.1). We modified this raw database in three ways to prepare it for quantitative analysis. First, to avoid pseudoreplication, we removed estimates of selection that were calculated from the same data as other selection coefficients. In most cases this occurred when one study reported alternative estimates of selection for the same genetic unit under different evolutionary parameters (e.g., generation times or degrees of dominance). When the authors deemed one set of parameters most biologically plausible, we included the selection coefficient from that model. Otherwise, we flipped a coin to randomly select one selection coefficient to include. Some studies calculated selection coefficients from data previously reported in other studies. If the original study also reported selection coefficients, we included whichever study reported more selection coefficients. If the studies reported equal numbers of selection coefficients, we included the original study.

Second, some studies reported selection coefficients from the same data at different temporal scales or for different fitness components. In such cases, we included the selection estimates from the shorter timescales or more subdivided fitness components in our analysis, as including only the overall component might obscure relevant selection and result in pseudoreplication.
For example, Bérénos and colleagues calculated selection coefficients based on selection for survival, reproductive success, and overall lifetime fitness (Bérénos et al., 2015). We included the measures of selection on survival and reproductive success, but did not include the lifetime fitness selection coefficients in our quantitative analysis.

Finally, we standardized all estimates of selection as the magnitude of selection against the disfavoured allele. Because positive selection for beneficial alleles and negative selection against deleterious alleles are calculated under slightly different models, they are not directly comparable (see section 1.7.1). Fortunately, a given estimate of positive selection on an allele can be easily converted into the estimate of negative selection against the corresponding disfavoured allele, assuming a diallelic system (see section A.1.2). Hereafter, references to the distribution of selection coefficients or the mean magnitude of selection coefficients refer to these converted estimates.
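As a worked illustration of the two calculations described in this section, the following Python sketch computes selection against the less-fit homozygote from relative fitnesses (s = 1 − waa) and converts a coefficient of positive selection into a coefficient against the disfavoured allele. The conversion used here, s_neg = s_pos / (1 + s_pos), is an assumption: it follows from rescaling fitnesses of 1 and 1 + s_pos so that the fitter homozygote has fitness 1, and it may differ in detail from the procedure given in section A.1.2. The function names are hypothetical and were not part of the original analysis, which was carried out in R.

```python
def selection_against_less_fit(w_less_fit, w_most_fit=1.0):
    """Return s = 1 - w_aa, where w_aa is the fitness of the less-fit
    homozygote scaled relative to the most-fit homozygote."""
    if w_most_fit <= 0 or not 0 <= w_less_fit <= w_most_fit:
        raise ValueError("need 0 <= w_less_fit <= w_most_fit and w_most_fit > 0")
    return 1.0 - w_less_fit / w_most_fit


def positive_to_negative(s_pos):
    """Convert selection FOR a favoured allele (fitnesses 1 and 1 + s_pos)
    into selection AGAINST the disfavoured allele (fitnesses 1 - s_neg and 1).

    Assumed conversion: (1 - s_neg) = 1 / (1 + s_pos).
    """
    if s_pos < 0:
        raise ValueError("s_pos must be non-negative")
    return s_pos / (1.0 + s_pos)


# Hypothetical values, for illustration only.
s_from_fitness = selection_against_less_fit(0.9)  # relative fitness 0.9
s_converted = positive_to_negative(0.25)          # favoured allele, s = 0.25
```

Under this assumed conversion the standardized coefficient always lies between 0 and 1, which matches the range stated for selection against a disfavoured allele.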

1.3.3 Quantitative Analysis

Selection on phenotypes can be measured using standardized, regression-based methods that allow straightforward comparison in a meta-analysis (Kingsolver et al., 2012; Morrissey and Hadfield, 2012; Siepielski et al., 2013). Selection at the genetic level, on the other hand, can be measured with many different methods, and this diversity complicates formal meta-analysis. We therefore analysed the database using a variety of statistical techniques. All analysis was performed in R version 3.0.1 (R Core Team, 2013).

First, we followed the example of early syntheses of phenotypic selection coefficients by plotting the distribution of selection coefficients, observing how this distribution differs between biological and methodological categories, and using nonparametric statistics to evaluate differences in the mean magnitude of selection between categories (Endler, 1986; Kingsolver et al., 2001). Because two studies accounted for over 90% of selection estimates (see section 1.4.1), we performed all nonparametric analysis on both the full dataset and the subset of estimates excluding these two studies, hereafter referred to as the reduced dataset.

Some studies reported multiple selection coefficients, and failing to correct for autocorrelation within studies could influence our conclusions (Gurevitch and Hedges, 1999). To account for this, we implemented generalized linear mixed models (GLMMs) in a Bayesian framework using the R package MCMCglmm (Hadfield, 2010). We included study as a random factor and used the exponential distribution to model our response variable, the selection coefficient. For the fixed effects, we specified independent normal distributions with mean = 0 and large variance (10^9). For the random effects, we used parameter expansion, which results in scaled F priors, to improve convergence. We used flat inverse-Wishart priors for the residual variance (a full specification of the models and priors, including the function calls in MCMCglmm, can be found in section A.1.3). We first modelled the distribution of selection coefficients without any predictor variables to see how accounting for autocorrelation within studies influenced our results. We then ran separate models specifying the direction of selection, type of study, time period of selection and genetic unit as explanatory variables to understand whether the strength of selection differed between these categories.

Measurement error can have a significant effect on the conclusions drawn from meta-analyses of selection (Morrissey and Hadfield, 2012). Unfortunately, relatively few studies reported measures of error around their estimates of selection, and those that did often used different methods to calculate error bounds. For this reason, we were unable to account for measurement error in our analysis of all reported selection coefficients. To gain some understanding of how measurement error might influence our results, we performed three GLMMs on the subset of our data for which standard errors were reported or could be calculated and compared their estimates of the mean selection coefficient. We used the same normal priors for the fixed effects, but did not use parameter expansion and instead used flat inverse-Wishart priors for both the random effects and residual variance. The first model included study as a random factor, the second incorporated measurement error and the third incorporated both terms.
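To illustrate why autocorrelation within studies matters when a few studies contribute most of the estimates, the following Python sketch compares a pooled mean (every estimate weighted equally) with an unweighted mean of per-study means. This is only a crude analogue of including study as a random factor; it is not the Bayesian GLMM fitted with MCMCglmm, and the function name and example values are hypothetical.

```python
from statistics import fmean


def pooled_and_study_means(estimates_by_study):
    """Return (pooled mean over all estimates, mean of per-study means).

    estimates_by_study maps a study label to its list of selection
    coefficients. The pooled mean lets large studies dominate; the mean of
    study means gives every study the same weight.
    """
    all_values = [s for values in estimates_by_study.values() for s in values]
    study_means = [fmean(values) for values in estimates_by_study.values()]
    return fmean(all_values), fmean(study_means)


# Hypothetical data: one large study with strong selection and two small
# studies with weak selection.
example = {
    "large_study": [0.20] * 90,
    "small_study_a": [0.05] * 5,
    "small_study_b": [0.02] * 5,
}
pooled, per_study = pooled_and_study_means(example)
```

In this made-up example the pooled mean sits close to the large study's value, whereas the mean of study means is much lower. A mixed model falls between these extremes by partially pooling information across studies.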

1.4 Results

1.4.1 Database results

Of the more than 2200 studies we examined, only 79 (~3.5%) met all the inclusion criteria. After accounting for pseudoreplication and multiple temporal scales within a study, the database contained 3416 directional selection coefficients and 70 instances of heterozygote advantage. Most of the directional selection coefficients came from two studies. Anderson et al. (2014) reported 2793 estimates of selection, and Gompert et al. (2014) contained 300 selection coefficients (see section 1.7.2). All of the methodological and biological categories were well represented (see Table 1.1). Of the 79 studies, 15 reported selection coefficients for overdominant selection (see section 1.7.3).

1.4.2 Distributions and nonparametric analysis

Overall, directional selection coefficients were roughly exponentially distributed (coefficient of variation = 1.05, CV = 1 for exponential distributions). Estimates of the strength of selection ranged from extremely weak (s = 9.9 × 10^-5) to extremely strong (maximum s = 1 for two lethal mutations, otherwise maximum s = 0.891) (Figure 1.1a). The mean selection coefficient of the full dataset was 0.135 (95% CI: 0.131–0.140, determined by 10 000 bootstrap replicates), while the mean of the reduced dataset was significantly smaller at 0.093 (95% CI: 0.078–0.110; Wilcoxon rank sum test, W = 697656, P = 3.45 × 10^-15). The distribution of the reduced dataset was also roughly exponential (Figure 1.1b, dark gray bars).

In the full dataset, there was a significant difference in mean strength of selection across categories of statistical significance (Kruskal–Wallis rank sum test, χ² = 325, d.f. = 2, P = 2.2 × 10^-16), with significant estimates of selection being much greater than estimates that were not significant or of unknown significance (Figures 1.2a and 1.3a, Table 1.2). In the reduced dataset, there was no difference among statistical categories (Kruskal–Wallis rank sum test, χ² = 1.79, d.f. = 2, P = 0.4; Figure 1.3a, Table 1.2). Estimates of negative selection had larger mean selection coefficients than estimates of positive selection in both the full and reduced datasets (Figure 1.3b, Table 1.2). The mean strength of selection was greater for manipulative experiments than for observational estimates in both the full and reduced datasets (Table 1.2).

The distribution of selection coefficients varied based on the time period over which selection was measured (Figures 1.2b and 1.3c, Table 1.2). When studies did not report the number of generations over which selection was measured, we searched the literature for estimates of generation time for the studied organism and used these to coarsely estimate the number of generations over which selection was measured.
We grouped estimates of selection into four categories: selection within a generation, short-term selection operating over <200 generations, long-term selection operating over 200 or more generations, and estimates for which the time period was unclear or unspecified. The mean magnitude of s was significantly different across categories (full dataset: Kruskal–Wallis rank sum test: χ² = 122, d.f. = 3, P = 2.2 × 10^-16; reduced dataset: Kruskal–Wallis rank sum test: χ² = 48, d.f. = 3, P = 2.1 × 10^-10; Figure 1.3b). In both datasets, the mean strength of selection decreased as the timescale over which selection was measured increased. The distribution of selection coefficients also varied with the genetic scale at which selection was measured (Figures 1.2c and 1.3d). In the full dataset, the mean strength of selection was greater for SNPs than for haplotypes, although this difference was marginally nonsignificant. In the reduced dataset, however, selection was significantly stronger on haplotypes (Table 1.2).
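Two of the summary statistics reported in this section, the coefficient of variation (which equals 1 for an exponential distribution) and a bootstrap confidence interval for the mean, can be sketched in Python as follows. This is a minimal illustration using a percentile bootstrap; the text does not specify which bootstrap interval was used, and the function names, default arguments and simulated data are assumptions rather than the original R analysis.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean.

    Resamples the data with replacement n_boot times and returns
    (sample mean, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lower, upper


# Simulated (not real) selection coefficients drawn from an exponential
# distribution whose mean is set to 0.135 purely for illustration.
rng = random.Random(42)
simulated_s = [rng.expovariate(1 / 0.135) for _ in range(5000)]
cv = coefficient_of_variation(simulated_s)  # expected to be close to 1
```

A coefficient of variation near 1 is consistent with, but does not by itself demonstrate, an exponential distribution, which is why the text describes the distribution only as roughly exponential.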

1.4.3 GLMM results

The results of our GLMMs were qualitatively similar to the results obtained using nonparametric statistics. First, we modelled the mean selection coefficient of the full dataset while accounting for autocorrelation within studies by including study as a random factor. This GLMM estimated a mean overall selection coefficient of 0.095 (posterior mode, 95% HPD interval: 0.066–0.124). These confidence intervals do not overlap with those of the uncorrected mean selection coefficient of the full dataset (0.135, 95% CI: 0.131–0.140). However, the GLMM estimate is very similar to the mean of the reduced dataset (0.093, 95% CI: 0.078–0.110), albeit with less precision. The GLMMs that incorporated predictor variables found results similar to the nonparametric analyses, but with weaker estimates of the strength of selection and wider confidence intervals, such that differences between categories were not always statistically significant (see Table 1.3 for posterior modes and 95% HPD interval estimates for all models). Negative selection was slightly stronger but not significantly different from positive selection. Selection estimates from experimental studies were nearly equal to estimates from observational studies, in contrast to the nonparametric results. Selection over long timescales was significantly weaker than both selection over short timescales and selection within a generation. The GLMM that included genetic scale as a predictor estimated that selection was stronger on haplotypes than on SNPs, although this difference was not significant.

The GLMMs we performed to evaluate the effects of measurement error indicated that autocorrelation had a much greater effect on our dataset than imprecise estimation of selection coefficients (Figure 1.4, Table 1.3). Compared to the uncorrected mean s estimated by bootstrapping, the GLMMs that incorporated measurement error had smaller estimates of mean s and wider confidence intervals, as might be expected. However, incorporating measurement error had much less effect than accounting for autocorrelation within a study, which greatly reduced the estimate of mean s. This analysis could only be performed on the approximately 10% of estimates for which we could calculate standard errors, so generalizing these results to the full dataset requires caution. However, these models indicate that the results of the GLMM on the full dataset, which accounts for autocorrelation, are probably robust to measurement error.
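One simple way to see how measurement error can shift an estimate of mean s is an inverse-variance weighted mean, in which estimates with large standard errors receive little weight. The Python sketch below shows that calculation. It is a standard fixed-effect weighting used here only as an assumed, simplified analogue; it is not the measurement-error GLMM fitted in MCMCglmm, and the function name and input values are hypothetical.

```python
def inverse_variance_mean(estimates, standard_errors):
    """Return (weighted mean, standard error of the weighted mean).

    Each estimate is weighted by 1 / SE^2, so imprecise estimates
    contribute less to the mean.
    """
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need equal-length, non-empty inputs")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    mean = sum(w * s for w, s in zip(weights, estimates)) / total_weight
    return mean, (1.0 / total_weight) ** 0.5


# Hypothetical values: a precise weak estimate and an imprecise strong one.
weighted_mean, weighted_se = inverse_variance_mean([0.1, 0.3], [0.01, 0.1])
```

With these made-up inputs the weighted mean lies very close to the precisely estimated coefficient of 0.1, whereas the unweighted mean would be 0.2.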

1.5 Discussion

In this study, we report the results of the first meta-analysis of published estimates of selection coefficients in natural populations. Our search through the literature has uncovered a dynamic and growing field, with researchers using a wide variety of methodological and analytical techniques to understand how genetic variation influences fitness across a diverse set of taxa. Together, these estimates allow us to take the first steps towards answering fundamental questions about how natural selection operates at the genetic level. The vast majority of selection coefficients reported were for directional selection, with heterozygote advantage rarely detected (see section 1.7.3). We found that directional selection coefficients were roughly exponentially distributed, a pattern similar to estimates of selection on phenotypes. Although most estimates of s were small, some studies detected very strong selection (s > 0.5), especially on short timescales. Selection varied as predicted with temporal scale, as selection measured over long time periods was significantly weaker than selection measured over shorter periods. Selection also varied with the size of the genetic unit at which it was measured, although our different analyses found conflicting trends.

Before discussing these conclusions in more detail, it is important to note some limitations of this dataset. As with most meta-analyses, our study likely contains some biases, a number of which could tend to inflate estimates of selection. First, researchers may have chosen to study genetic loci that have an a priori expectation of being under strong selection (‘research bias’, Gurevitch and Hedges, 1999). For example, a number of candidate gene studies examined insecticide resistance alleles (e.g., Lenormand et al., 1998) or drug resistance alleles (e.g., Roper et al., 2003), which are expected to be under strong selection. Even studies that started without a priori candidates and took a genomewide approach to detecting selection (e.g., Anderson et al., 2014; Gompert et al., 2014) studied populations that could be expected to be under strong selection for local adaptation. Similarly, there may be publication bias against reporting insignificant or weak estimates of selection (the well-known ‘file drawer problem’, Rosenthal, 1979). In our dataset there appears to be some bias against weak estimates of selection (see section A.1.7 and Figures A.2-A.5), but there was clearly bias against statistically insignificant estimates. Nearly all insignificant estimates of selection came from the Anderson et al. (2014) and Gompert et al. (2014) studies. In the reduced dataset, there were only 21 insignificant selection coefficients, compared to 106 significant estimates and 196 with unreported statistical significance. Insignificant selection was thus rarely reported outside of the context of genomewide studies of selection, in which many insignificant estimates are expected. Perhaps this is not surprising, given the preeminence of neutral theory and the desire to avoid adaptationism (Gould and Lewontin, 1979; Nielsen, 2009; Barrett and Hoekstra, 2011). However, we agree with other authors who have urged researchers to think of selection coefficients as continuous variables and not to overemphasize categorical distinctions between ‘significant’ and ‘insignificant’ selection coefficients (Gompert, 2016).
Failing to report selection coefficients because they are insignificant puts too much emphasis on P-values, too little on effect sizes and confidence intervals, and leads to publication bias (Halsey et al., 2015).

Finally, the full database of selection coefficients is largely made up of estimates from two studies that combined large-scale field experiments with genome-wide sampling to generate hundreds of estimates of selection (Anderson et al., 2014; Gompert et al., 2014, see section 1.7.2). Experimental evolution studies have important advantages over other methods of detecting selection, as researchers can track evolution in real time and control or mitigate some of the demographic and ecological factors that complicate the detection and quantification of selection. However, these methods also have limitations, especially for detecting weak selection (see section 1.7.2). While more studies of this type will surely follow, for now they complicate the analysis of this dataset. We have sought to account for this with a variety of statistical techniques, but the accumulation of more estimates of selection at the genetic level will ensure that future meta-analyses of natural selection at the genetic level are not unduly influenced by a few studies. Despite these limitations, this dataset is our best source of information for both preliminary conclusions about selection at the genetic level and for informing future research.

1.5.1 Quantitative results

Our analysis found a number of important quantitative results. First, selection coefficients could be quite large. The uncorrected mean and median of the full dataset were 0.135 and 0.082, respectively, and there were 112 estimates of selection coefficients >0.5. Selection at the genetic level is often assumed to be rather weak. For example, some studies in this database that used simulations to quantify selection considered coefficients only within a narrow range (e.g., 0-0.1 in Ohashi et al. 2004; 0-0.03 in Gerbault et al. 2009). While many published estimates of selection coefficients are indeed small, our results show that researchers cannot discount the possibility of large selection coefficients for genetic variants, especially over short timescales. Of course, whether a given coefficient represents ‘significant’ or ‘strong’ selection is a matter of perspective. All alleles are affected by genetic drift, and where to draw the line between ‘selected’ and ‘neutral’ alleles is a matter of debate. Multiple definitions have been proposed, and most rely on an understanding of the effective population size (Ne), recognizing that selection will be less efficient in smaller populations (Nei et al., 2010). When estimates of effective population size are unavailable, as in most of the studies in our database, Nei suggests a threshold of approximately |s| = 0.001 for vertebrates (Nei, 2005). Under this relaxed definition of neutrality, nearly all (3411 of 3416) of the selection coefficients in our database are not neutral.

Second, the exponential distribution of s is very similar to the distribution of phenotypic selection coefficients reported in other studies (Hoekstra et al., 2001; Kingsolver et al., 2001). This is not necessarily expected, as genetic selection coefficients are fundamentally different from phenotypic selection differentials and gradients. While selection coefficients against a disfavoured allele range from 0 to 1 (see section 1.7.1), selection differentials and gradients are calculated via linear regression and their range is thus unrestricted in theory, although in practice the absolute values of most estimates fall between 0 and 1 (Kingsolver et al., 2001). There is no clear theoretical expectation that both phenotypic and genetic selection coefficients should be exponentially distributed. Kingsolver et al. (2001) note that, if most organisms are well adapted to their environments, phenotypic directional selection should be normally distributed around a mean of 0. Indeed, more recent meta-analyses of phenotypic selection have used a folded-normal distribution to model the absolute values of selection gradients (Hereford et al., 2004; Kingsolver et al., 2012; Morrissey and Hadfield, 2012). Multiple genetic models of adaptation predict that the fitness effects of adaptive mutations during a single adaptive walk will be exponentially distributed (Orr, 2005b). However, the assumptions that underlie those predictions (i.e., a single population adapting to a relatively close, stationary fitness peak solely through the fixation of new mutations) do not apply to our broad dataset, and other models make different predictions about how selection coefficients might be distributed (e.g., Kopp and Hermisson, 2009b, who model adaptation to a moving fitness optimum and predict a unimodal distribution with mutations of intermediate effect dominating).

Instead of referring to disparate phenotype- or genotype-level theories, another way to explain the similarity in these distributions could come from understanding how selection at these levels is linked.
Selection does not act directly at the genetic level; rather it acts on phenotypes and is then transmitted to the genetic level based on the genetic architecture of the trait(s) under selection. Assuming that the phenotypic effects of an allele are proportional to its fitness effects, it may be possible to work downward from the empirical, roughly exponential distribution of phenotypic selection coefficients to derive an expected distribution for genetic selection coefficients. To do so properly would require some understanding of the number and phenotypic effect sizes of the loci underlying the selected phenotypic traits, as well as the degree of pleiotropy. The general genetic architecture of traits subject to selection is a topic of much debate (Rockman, 2012; Lee et al., 2014). The two opposing views could be characterized as ‘exponential-like’ (i.e., traits are controlled by one or a few loci with large phenotypic effects and many loci with small phenotypic effects) and ‘infinitesimal’ (i.e., traits are controlled by hundreds to thousands of loci of extremely small effect). Interestingly, the observed exponential form of selection coefficients acting on phenotypes may be transmitted to the genetic level to produce an L-shaped (exponential-like) distribution of selection on alleles, regardless of whether the allelic effects on a phenotype are drawn from an exponential or a uniform distribution, assuming that the strength of selection acting on a trait does not influence its genetic architecture and there is no pleiotropy (see appendix A.2, co-authored with S. Otto, for theory). Although some genotype–phenotype maps transmit the exponential-like distribution of phenotypic selection unchanged to the genetic level, not all maps will do so. It remains an open theoretical question to determine which genotype–phenotype maps are most consistent with our observations.

The impacts of natural selection at the genetic level varied across a number of biological and methodological categories. Statistically significant estimates of selection tended to be stronger than insignificant ones, which is unsurprising given that stronger selection is easier to distinguish from drift than weak selection. The mean value of selection coefficients that did not have estimates of error or significance was similar to the mean of insignificant selection coefficients (Figure 1.3), which may suggest that many of these estimates are statistically insignificant. Of course, the statistical significance of an estimate of selection is dependent on the power of the procedures used to estimate it.
Unfortunately, analysing the power of each study in our database was not feasi- ble. Statistical significance will only be indicative of the biological relevance of a variant’s fitness effect with sufficient power: underpowered studies may be unable to distinguish selection from drift. Conversely, significant estimates should not be misunderstood to mean that only selection is driving allele frequency change. All alleles in finite populations are influenced by drift; signif- icant estimates of s simply indicate that drift alone could not cause the observed change. Again, we emphasize that selection coefficients are continuous variables; it is preferable to interpret their

30 statistical and biological significance by considering their confidence intervals, not their P -values alone. And, absent knowledge of experimental power for each study, we cannot distinguish esti- mates of s that are insignificant due to neutrality from those that are insignificant due to insufficient power. Thus, we caution against overinterpreting the differences we observe across statistical cat- egories. Estimates of negative selection were of greater magnitude than estimates of positive selec- tion in both the full and reduced datasets, although this difference was not significant in the GLMM. Selection coefficients for both forms of selection were roughly exponentially distributed (Figure A.1). In some sense, comparing the magnitude and distributions of these categories is not biolog- ically informative, as the designation of selection as positive or negative is relative (see section 1.7.1). The difference in magnitude between these categories perhaps suggests research bias, with researchers who focus on negative selection choosing to study populations that experience slightly stronger selection. It is reasonable to expect that experimental manipulations may be associated with selection that is stronger than selection that is simply observed. While the nonparametric statistics indicated that this was the case, the estimates of mean s for experimental and observa- tional studies were nearly equal in the GLMM. The vast majority of estimates of selection from experiments came from the Anderson et al. (2014) and Gompert et al. (2014) studies, only eight other studies contributed 31 total estimates, so there is little statistical power for firm conclusions. Natural selection on shorter timescales tended to be stronger than selection on longer timescales, as we predicted. This was true in both the full and reduced datasets, and the GLMM corroborated the trend, although differences between some categories were insignificant. 
The absolute differences in magnitude between categories were fairly small: mean s for long-term estimates was 0.044 in both datasets, while mean s within a generation was roughly 3.2× greater (0.141) in the full dataset and roughly 5.3× greater (0.232) in the reduced dataset. This overall trend may be partially due to the fact that studies over shorter time periods, and especially within generations, are often unable to distinguish between direct and indirect selection on a locus, which could lead to larger estimates of s (see section 1.7.1, section 1.7.2). However, the patterns we see in the strength of genetic selection coefficients are consistent with those observed in measures of evolutionary rates and phenotypic selection. Short-term rates of phenotypic change are often orders of magnitude greater than long-term rates (Gingerich, 1983), phenotypic selection on viability is stronger when measured over shorter time periods (Hoekstra et al., 2001), and long-term rates of phenotypic evolution are often slower than one would expect when extrapolating from short-term estimates of phenotypic selection (Kinnison and Hendry, 2001). This tendency for evolutionary rates, phenotypic selection coefficients and genetic selection coefficients to be of smaller magnitude when measured over longer periods of time is likely to be partially a mathematical artefact of averaging that is inherent to all measures that compare differences between initial and final states (Gingerich, 1983). Such measures must assume that the rate of change (in the case of s, change in allele frequencies) is constant between measured instances and will thus average out the instantaneous rates into a less extreme long-term rate. However, the effects of averaging almost certainly reflect biological reality. Meta-analysis of phenotypic selection shows that selection may fluctuate through time such that short-term estimates of selection are not indicative of long-term trends (Siepielski et al. 2009; but see Morrissey and Hadfield 2012). This effect is illustrated in the few studies that examined selection on the same locus or loci through time. For example, Barrett et al. (2008) found opposing patterns of strong selection at different life stages on an allele for reduced armour plating in threespine sticklebacks, such that the lifetime s was much weaker than the per-life-stage estimates of selection coefficients. Anderson et al. (2014) also found negative correlations between selection coefficients across some (but not all) episodes of selection, indicative of fitness trade-offs within a generation (see table 4 in Anderson et al. 2014 and section A.1.8). Those trade-offs did not necessarily lead to estimates of weak lifetime selection. For example, plants in Montana experienced trade-offs between flowering/fruiting and overwinter survival. However, selection on survival was relatively weak and selection on both fruiting and flowering was quite strong, leading to large estimates of lifetime s.

Perhaps the best example of how temporal variation can affect the magnitude of selection estimates comes from a study on drug resistance alleles in the malaria parasite, Plasmodium falciparum (Taylor et al., 2012). The authors calculated both annual selection coefficients and overall selection coefficients on mutations at individual codons across a nine-year study. Annual selection coefficients varied in magnitude and direction and were often statistically insignificant. The selection coefficients calculated across all nine years, however, were smaller in magnitude, statistically significant, and favoured resistance alleles. This study could not be included in our quantitative analyses, as the regression-based selection coefficients they calculated were not comparable to the other estimates in our dataset. However, it clearly demonstrates that long-term patterns of selection are the result of fluctuating moment-by-moment forces of selection.

The magnitude of s also varied based on the genetic unit at which selection was measured, but interpreting those trends is more complicated. We predicted that selection would be stronger on haplotypes than on SNPs, as allelic variants for haplotypes should, in general, have larger phenotypic effects than allelic variants of SNPs. In both the reduced dataset and the GLMM this prediction was supported, although the difference in mean s between these categories was not significant in the GLMM. In the reduced dataset, the difference in mean s was large (0.052 for SNPs, 0.121 for haplotypes). In the GLMM, the difference was much smaller (posterior mode of s = 0.086 for SNPs, s = 0.093 for haplotypes). In the full dataset, however, selection was stronger on SNPs, although this difference was marginally not significant. Some of this inconsistency across analyses is due to the outsized effects of the Anderson et al. (2014) and Gompert et al. (2014) studies. Gompert et al. (2014), in particular, measured 300 instances of selection on SNPs, and their methods were biased towards the detection of very strong selection (see section 1.7.2). This may have inflated the mean s for SNPs in the full dataset.
While the GLMM accounts for autocorrelation within studies, the confidence intervals around the estimate of mean s for both categories are quite broad, indicating little statistical support for either interpretation. There are also other possible hypotheses for how selection might vary with genetic scale. For example, larger genetic units could contain multiple loci with contrasting fitness effects, so that they experience weaker selection that is the average effect of the individual loci contained within them. In that case haplotypes would tend to experience weaker selection than SNPs, as we see in the full dataset. Given the conflicting trends among the different datasets and methods of analysis, it seems that we need further data before we can determine whether, how, and why the strength of selection varies with genetic scale.
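The averaging effect discussed above for timescale can be illustrated with a minimal numerical sketch. It is not the method used in any study in our database: it assumes a deterministic haploid model in which allele A has relative fitness 1 + s in a given generation, and the per-generation values of s, the starting frequency, and the function names are all hypothetical. The sketch applies selection that is strong but alternates in direction, then asks which constant s would explain the net change between the initial and final allele frequencies.

```python
import math

def next_freq(p, s):
    """One generation of haploid selection: allele A has relative fitness 1 + s, allele a has 1."""
    return p * (1 + s) / (1 + p * s)

def constant_s_equivalent(p_start, p_end, generations):
    """Constant per-generation s that would move p_start to p_end in the given number of generations.

    Under this haploid model the odds p / (1 - p) are multiplied by (1 + s) each generation,
    so the equivalent constant s is the geometric mean of (1 + s_t) minus 1.
    """
    odds_ratio = (p_end / (1 - p_end)) / (p_start / (1 - p_start))
    return odds_ratio ** (1 / generations) - 1

# Hypothetical fluctuating selection: strong, alternating in direction each generation.
per_generation_s = [0.3, -0.2] * 10
p_start = 0.1
p = p_start
for s in per_generation_s:
    p = next_freq(p, s)

mean_abs_short_term = sum(abs(s) for s in per_generation_s) / len(per_generation_s)
long_term_s = constant_s_equivalent(p_start, p, len(per_generation_s))

print(f"mean |s| per generation: {mean_abs_short_term:.3f}")
print(f"s inferred from start and end frequencies: {long_term_s:.3f}")
```

In this toy case the mean per-generation magnitude is 0.25, whereas the value inferred from the endpoints is about 0.02, because episodes of opposing selection largely cancel. The sketch ignores drift, dominance, and linkage, so it shows only the arithmetic of averaging, not the relative importance of this effect in real data.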

1.5.2 Recommendations for future research

In addition to our observations about the distribution and variation of selection coefficients, our review of the literature uncovered a number of important issues to consider when studying natural selection at the genetic level. First, consider that the acceptance rate for inclusion in our dataset was extremely low (~3.5%). Of course, this low rate is partly due to our inclusion criteria, as we excluded some studies that quantified selection in ways that were incompatible with our analysis. Another possible reason might have been our Web of Science search terms. They seemed to be simultaneously too broad (our search results included many studies on agricultural plants, purely theoretical models, and phenotypic selection coefficients) and ineffective at locating studies (we found almost as many studies that reported selection coefficients by searching references as we did in our Web of Science searches).

While these reasons are certainly part of the explanation, we suspect that the discrepancy between the number of plausible studies and the number of studies that report estimates of s exposes a larger issue: natural selection is frequently invoked or detected, but very rarely quantified, even in studies which contain raw data from which selection coefficients could be calculated. Of course, not all biologists are interested in quantifying natural selection, and it is understandable that many researchers do not take this final step. However, we hope that our analysis has shown that quantifying selection can lead us toward answers for important questions in evolutionary biology. We therefore encourage researchers to endeavour not only to detect selection, but also to quantify its strength.

What, then, are the best practices for calculating, reporting and interpreting selection coefficients? Methods for calculating selection coefficients will depend on the type of data available to a researcher.
An extensive review of methods is beyond the scope of this work; for specifics, we direct readers to previous reviews of methods for the detection and quantification of selection (Linnen and Hoekstra, 2009; Hohenlohe and Cresko, 2010; Vitti et al., 2013), to the examples cited in our introduction, and to the papers within our literature database. We also note that new methods are frequently being developed, especially methods which estimate selection coefficients based on sequence data (Charlesworth and Wright, 2004; Slatkin, 2008; Messer and Neher, 2012; Chen and Slatkin, 2013; Vitalis et al., 2014; Foll et al., 2015). Whatever method is used, researchers should take careful note of the mathematical model used to calculate s so that its biological meaning is clear. Of particular importance is understanding whether models calculate positive selection or negative selection, as these quantities are not directly comparable without a conversion (see section 1.7.1). Further, researchers should calculate statistical significance, ideally from some form of confidence interval, and be cognizant of the specific statistical issues particular to their data (e.g., considerations of multiple testing, linkage between sites, and/or population structure). When feasible, researchers should also seek to calculate or determine other parameters that will aid in the interpretation of selection coefficients. These include experimental power (to establish the minimum s that could be reliably detected), source of genetic variation (i.e., standing genetic variation or new mutations), effective population size, and the ancestral allelic state of the locus under selection. At minimum, researchers should clearly report (i) the model used to calculate s, (ii) some form of confidence interval for the estimate of s, and (iii) the data necessary to understand the time period over which selection was measured (ideally in generations). Researchers should report both significant and insignificant estimates to reduce publication bias.
As genomic data become more available, the question of whether to calculate selection coefficients on all loci versus only those that show evidence of selection will become more important. This decision will depend, at least in part, on the interests and computational resources of the researcher. When calculating selection for all loci is not feasible, we recommend researchers follow the example of Gompert et al. (2014) by clearly stating the selection criteria for quantified loci.

We also strongly recommend that researchers report estimates of effective population size, which aids in interpreting the strength of selection. Information about the source of genetic variation and levels of linkage disequilibrium in the population tested is also valuable, as levels of LD can determine the extent to which researchers can partition genetic selection as direct or indirect. This complication arises in the application of one-locus models of selection, which assume that s represents direct selection, to natural populations in which allelic variants are also influenced by selection on linked loci and s should properly be interpreted as quantifying both direct and indirect selection. Models used to study genomewide selection often have more parameters (genetic variants) than statistical replicates (individuals), inhibiting the ability to measure direct selection (Gompert et al., 2014). Linkage disequilibrium, epistasis and pleiotropy can all complicate the simple goal of measuring the direct fitness effects of an allele and muddle the distinction between direct and indirect selection (Barton and Servedio, 2015). Further theoretical work to address these issues will be especially welcome. Nevertheless, we note that, in many cases, quantification of direct selection is not necessarily the ultimate goal. Understanding direct selection is crucial for elucidating the genetic and phenotypic mechanisms that drive adaptive evolutionary change. However, the total amount of selection (both direct and indirect) that impacts a locus is what drives allele frequency change each generation, and understanding it is more important for predicting the trajectory of evolution.

1.6 Conclusions

Our analysis has taken important first steps towards improving our understanding of the impacts of selection at the genetic level. Where should researchers direct their attention with future studies of selection at the genetic level? Keeping in mind our methodological guidelines above, simply accumulating more estimates of selection will be extremely useful. Our conclusions are necessarily limited by the data that have been published so far, and the practice of estimating genetic selection coefficients is still rather young. More estimates of selection from a wider variety of taxa are needed for a fuller understanding of how natural selection shapes genetic variation. Fortunately, technological advances in collecting and analysing genetic data make it possible to quantify selection without requiring a priori knowledge of selection, and to do so in the context of manipulative field experiments. And, as with phenotypic selection, it will be informative to consider how selection coefficients vary with space, time, and across sexes and life-history stages. Such studies will give insight into fundamental questions about local adaptation, developmental trade-offs, and sexual conflict. We expect that, in the coming years, the number and scope of studies that quantify selection at the genetic level will rapidly increase. With larger datasets, future researchers will be able to more conclusively answer the questions we have begun to consider here.

1.7 Boxes

1.7.1 Box 1. The meaning(s) of s

The selection coefficient, s, can have slightly different meanings in different evolutionary models. In most models s represents the difference in mean relative fitness between a reference genotype and another genotype. By definition, the reference genotype has a relative fitness of one. The choice of reference genotype, however, leads to subtle differences in the properties of s. First, consider directional selection at a locus with two alleles, A and a, with allele A having higher fitness. Researchers studying mutation have tended to focus on selection against new, deleterious alleles. The homozygote of the most-fit allele is used as the reference genotype, such that $s = 1 - w_{aa}$ and $w_{aa} = 1 - s$. In this case, s quantifies the strength of selection against the deleterious allele and has a range from 0 to 1. Studies of adaptation, however, typically focus on selection in favour of beneficial alleles and thus set the homozygote of the less-fit allele as the reference: $s = w_{AA} - 1$, or $w_{AA} = 1 + s$. Here, s measures the strength of selection acting in favour of the beneficial allele and has a range from 0 to infinity, as genotypes can have a >100% fitness advantage, at least in theory. It should be noted that under this scenario $s_{\mathrm{for\_AA}}$ is not equal in magnitude to $s_{\mathrm{against\_aa}}$ (see section A.1.2). When the magnitude of s is small the difference between $s_{\mathrm{for\_AA}}$ and $s_{\mathrm{against\_aa}}$ will be small, but as the strength of selection increases the difference grows. When $s_{\mathrm{against\_aa}}$ equals 1 (a lethal allele), $s_{\mathrm{for\_AA}}$ equals infinity.

So far this model has ignored dominance, which has important implications for the calculation of s. In population genetic models of directional selection, dominance is most often accounted for with the dominance coefficient, h. In the single locus, two-allele model described above, the fitness of each genotype would be $w_{AA} = 1$, $w_{Aa} = 1 - hs$, $w_{aa} = 1 - s$. When $h = 0$, A is completely dominant and $w_{AA} = w_{Aa}$. When $h = 1$, A is completely recessive and $w_{aa} = w_{Aa}$. Although the definition of s remains the same, the calculated value of s could change significantly depending on the assumed or known level of dominance and the method used to estimate selection. While methods that estimate s by directly measuring fitness differences between homozygotes are robust to changes in h, methods that track changes in allele frequency or that measure fitness in heterozygotes are sensitive to assumptions about dominance. The dominance coefficient was rarely empirically estimated in the studies included in our database. Most studies assumed additive fitness effects ($h = 0.5$) or calculated multiple possible s values under different assumptions of dominance. In the case of over- or underdominance, slightly different genetic models are used. The heterozygote is defined as the reference, and selection coefficients for or against each homozygote are calculated. Selection may be assumed to be symmetric such that s for each homozygote is equal, but other models allow s to vary, and might use $s_1$ and $s_2$ or s and t to denote the two selection coefficients.

In the simple, one-locus models described above, s quantifies the direct fitness effects of the genetic variant. In real organisms, of course, allelic variants do not occur in isolation. Each generation, the fate of an allele is determined by both the direct effects of that locus on fitness and by the indirect effects of selection operating on other sites that are in linkage disequilibrium (LD) with the focal locus (Smith and Haigh, 1974; Charlesworth et al., 1993). The situation is analogous to correlated selection on phenotypic traits (Lande and Arnold, 1983). At the phenotypic level, biologists can use multiple regression to distinguish between direct selection and total (direct and correlated) selection on a specific phenotype (selection gradients and differentials, respectively; Lande and Arnold, 1983; Brodie et al., 1995). At the genetic level, isolating the direct effects of an individual locus on fitness is quite difficult (Barrett and Hoekstra, 2011). Accounting for the effects of linked sites requires either (i) sufficient recombination to break apart associations with other alleles, (ii) complex, multigeneration crosses such as near-isogenic lines, (iii) replicate populations subject to the same experimental treatment, (iv) sufficient sample sizes and genetic variation such that selected alleles are present in multiple genetic backgrounds, or (v) transgenics. In most other cases, genetic selection coefficients should be interpreted as being analogous to phenotypic selection differentials, not gradients.
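The two parameterizations of directional selection described at the start of this box can be related with a short sketch. The conversion below is derived here only from the two definitions given above, by requiring that both describe the same ratio of homozygote fitnesses, so that 1 + s_for = 1 / (1 - s_against); the presentation in section A.1.2 may differ. The function names are hypothetical, and the last function simply restates the one-locus fitness scheme with the dominance coefficient h.

```python
import math

def s_for_from_s_against(s_against):
    """Convert selection against aa (reference AA = 1) into selection for AA (reference aa = 1)."""
    if not 0 <= s_against <= 1:
        raise ValueError("s_against must lie between 0 and 1")
    if s_against == 1:
        return math.inf  # a lethal allele corresponds to an unbounded advantage
    return s_against / (1 - s_against)

def s_against_from_s_for(s_for):
    """Inverse conversion: selection for AA back into selection against aa."""
    if s_for < 0:
        raise ValueError("s_for must be non-negative")
    if math.isinf(s_for):
        return 1.0
    return s_for / (1 + s_for)

def genotype_fitnesses(s, h):
    """Relative fitnesses with AA as the reference: w_AA = 1, w_Aa = 1 - h*s, w_aa = 1 - s."""
    return {"AA": 1.0, "Aa": 1 - h * s, "aa": 1 - s}

# The two scales nearly agree for weak selection and diverge as selection strengthens.
for s_against in (0.01, 0.1, 0.5, 0.9):
    print(s_against, round(s_for_from_s_against(s_against), 4))
```

For example, a selection coefficient of 0.5 against aa corresponds to a coefficient of 1.0 in favour of AA, which is why estimates reported on different scales need to be standardized before they are compared.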

1.7.2 Box 2. Field Studies of Selection: Anderson et al. 2014 and Gompert et al. 2014

Two studies reported a large portion of the selection coefficients in our database. Both studies tracked changes in allele frequency on hundreds to thousands of loci in large-scale field experiments, and there was no a priori understanding of whether these markers would influence fitness. This is in contrast to many of the other papers in our database, and in principle, such field studies could give a more unbiased view into how selection operates across the genome. However, details of the experimental design and analytical procedures for these studies can also influence the selection coefficients they report, so it is useful to discuss each paper in more detail.

Anderson et al. 2014

Anderson and colleagues used multiyear field transplants to study local adaptation and fitness trade-offs in Boechera stricta, a perennial mustard native to the Rocky Mountains. Anderson et al. crossed plants from two potentially locally adapted populations in Colorado and Montana to create 172 $F_6$ recombinant inbred lines (RILs), and genotyped each RIL at 62 microsatellite loci and 102 SNPs. They planted two cohorts containing replicates of each RIL and parental line into two common gardens near the source populations, and tracked each cohort for multiple years, measuring survival, flowering success, and fecundity for each individual. From this individual-level data on fitness components, they calculated relative fitnesses for the different genotypes at each locus and used permutation to estimate selection coefficients and significance thresholds for each genotyped locus (Anderson et al., 2013, 2014). This permutation procedure does not calculate error bounds, so the precision of each estimate is unknown. They calculated s at both experimental sites for multiple within-generation episodes of selection and multiyear selection coefficients based on lifetime flowering probability and fruit production. For our quantitative analysis, we included all within-generation estimates of selection, but not the lifetime selection coefficients (see section 1.3.2). We also used the more conservative genomewide threshold when classifying estimates of s as significant or insignificant. Thus, most estimates of s were insignificant, and this might tend to increase the mean of the significant category. However, Anderson et al. calculated and reported a selection coefficient for every genetic marker at every instance of selection, regardless of strength or significance. There is therefore no within-study publication bias, and Anderson et al. present an objective report of the impact of selection in their experiments. Their study is also unusual in that it calculates selection coefficients for each locus in two locations across multiple time periods, providing some insight into how selection at the genetic level can vary through space and time.

Gompert et al. 2014

Gompert and colleagues studied two ecotypes of Timema cristinae stick insects that are differentially adapted to live on the host plants Adenostoma fasciculatum and Ceanothus spinosus. Visual predation by birds drives phenotypic divergence in T. cristinae: insects with a white dorsal stripe are cryptic on Adenostoma and conspicuous on Ceanothus, while the opposite is true of unstriped morphs (Sandoval, 1994; Nosil, 2004; Nosil and Crespi, 2006). Gompert et al. collected 500 total T. cristinae from a mostly Adenostoma-adapted population that receives some gene flow from other populations with different host plants (Nosil et al., 2012). They cut off a portion of leg from each individual for tissue sampling and transplanted groups of insects onto individual Adenostoma and Ceanothus plants in experimental blocks at a nearby site. After 8 days they resampled the experimental plants and recaptured 140 insects, from which they took a postselection tissue sample. Using a genotyping-by-sequencing approach, they determined the pre- and postselection allele frequencies of almost 200000 SNPs. They developed statistical models to identify loci that showed parallel changes in allele frequency across experimental blocks that were unlikely to occur due to drift alone and then used MCMC to calculate the mean selection coefficient and 95% credible intervals for these loci. Thus, for quantifying selection, Gompert et al. take a different approach from Anderson et al. (2014). Although they have the data, in principle, to calculate selection coefficients for all loci, they calculate selection coefficients only for loci that demonstrated large, parallel allele frequency changes across experimental blocks. Weak selection is unlikely to drive such changes, and the Gompert et al. method is thus biased against the quantification of weak selection. Indeed, the distribution of selection coefficients reported in Gompert et al. (2014) is quite different from the distribution of both Anderson et al. (2014) and all other estimates of s (Figure 1.1b).

1.7.3 Box 3. Heterozygote Advantage

Overdominant selection was rarely detected, with only 140 estimates of s from 15 studies (70 instances of heterozygote advantage, two selection coefficients per instance). With so few estimates, it is difficult to draw firm conclusions about the strength of overdominant selection, especially because most estimates were insignificant or did not report statistical significance (Figure 1.5). Overall, selection ranged from very weak (s = 0.0003) to very strong (s = 1 for homozygote lethal alleles). The distribution of overdominant selection coefficients was significantly different from that of directional selection coefficients (Kolmogorov–Smirnov test, $D = 0.34$, $P = 1.14 \times 10^{-14}$) and was more uniformly distributed, although weak estimates of selection were still most common. Multiple studies reported selection coefficients for HLA loci in humans or MHC loci in other vertebrates. These immune system genes are classic examples of heterozygote advantage (Hedrick, 2012). Heterozygote advantage was also detected at a number of allozyme loci in various species of plants, although determining phenotypic effects and agents of selection on these loci is more difficult. The prevalence of heterozygote advantage and its importance for the maintenance of genetic variation has long been a topic of debate (Lewontin and Hubby, 1966; Garrigan and Hedrick, 2003; Mitchell-Olds et al., 2007; Hedrick, 2012; Fijarczyk and Babik, 2015). There are few cases in which heterozygote advantage has been suggested in natural populations (Hedrick, 2012), and, as we find in this study, even fewer cases in which selection has been quantified. This may be due to the inherent difficulties in detecting heterozygote advantage (Garrigan and Hedrick, 2003). For example, genome scans can be used to detect a signature of balancing selection in nucleotide polymorphism data, which may be indicative of heterozygote advantage. However, other processes can also lead to a signature of balancing selection, including spatial or temporal variation in selection and frequency-dependent selection (Fijarczyk and Babik, 2015). Distinguishing between these possibilities is often not possible using DNA sequence data alone (Hedrick, 2012). Alternatively, heterozygote advantage may be rarely detected because it is, in fact, rare. Resolving the debate over whether heterozygote advantage is truly rare or simply difficult to detect is beyond the scope of our study.
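To make concrete why each instance of heterozygote advantage carries two selection coefficients, the following minimal sketch uses the standard one-locus overdominance model, with the heterozygote as the reference genotype. It is an illustration only and not an analysis from any study in our database: the values of s1 and s2, the starting frequency, and the function names are hypothetical, and the model assumes random mating, an infinite population, and no linkage. Under those assumptions the frequency of allele A settles at s2 / (s1 + s2), where s1 is selection against AA and s2 is selection against aa.

```python
def next_freq_overdominance(p, s1, s2):
    """One generation of selection with w_AA = 1 - s1, w_Aa = 1, w_aa = 1 - s2 (random mating)."""
    q = 1 - p
    w_AA, w_Aa, w_aa = 1 - s1, 1.0, 1 - s2
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness

def equilibrium_freq(s1, s2):
    """Analytical equilibrium frequency of allele A under heterozygote advantage."""
    return s2 / (s1 + s2)

# Hypothetical coefficients: weak selection against AA, stronger selection against aa.
s1, s2 = 0.1, 0.3
p = 0.1
for _ in range(2000):
    p = next_freq_overdominance(p, s1, s2)

print(f"simulated frequency of A: {p:.4f}")
print(f"analytical equilibrium:   {equilibrium_freq(s1, s2):.4f}")
```

Because the equilibrium depends on the ratio of the two coefficients, reporting only one of them, or assuming symmetric selection when it is not, would change the allele frequency that the model predicts will be maintained.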

1.8 Figures and Tables

[Figure 1.1: two histogram panels, A and B. x-axis: selection coefficient (0 to 1); y-axis: # of estimates. Panel B legend (Study): Anderson, Gompert, Other.]

Figure 1.1: The distribution of directional selection coefficients, s. (a) The distribution of directional selection coefficients included in the quantitative analysis. All selection coefficients are represented as selection against the less-fit allele. (b) Directional selection coefficients, coloured by the study in which they were reported. Anderson et al. 2014, in light blue, reported 2793 selection coefficients. Gompert et al. 2014, in orange, contained 300 selection coefficients. All other studies, in dark gray, contained 323 estimates of selection.

[Figure 1.2: three columns of histograms. Panel A, statistical significance: Significant, Not significant, Not reported. Panel B, time period: Within generation, Short (< 200), Long (>= 200), Time period unspecified. Panel C, genetic unit: SNP, Haplotype. x-axis: selection coefficient (0 to 1); y-axis: # of estimates.]

Figure 1.2: The distribution of directional selection coefficients in the full dataset for different biological and methodological categories. Coefficients were categorized by (a) statistical significance, (b) time period over which selection was measured and (c) genetic scale at which selection was measured. The vertical line in each histogram marks the uncorrected mean of selection coefficients in that category.

[Figure 1.3: four panels. Panel A, statistical significance: Significant, Not significant, Not reported. Panel B, form of selection: Negative, Positive. Panel C, time period: Within, Short, Long, Unspecified. Panel D, genetic unit: SNP, Haplotype. Each category is shown for the full and reduced datasets. y-axis: mean selection coefficient.]

Figure 1.3: Summary of mean selection coefficients across different biological and methodological categories. Diamonds and error bars represent the mean and 95% confidence intervals based on 10000 bootstrap replicates. Unfilled diamonds represent the reduced dataset and filled diamonds represent the full dataset. Selection coefficients were categorized by (a) statistical significance, (b) form of selection, (c) timescale and (d) genetic scale. Note that estimates of selection for beneficial alleles were converted into selection against the less favoured allele. The means and confidence intervals presented here are from those standardized estimates.

[Figure 1.4: single panel. x-axis: model terms (Uncorrected, Error, Study ID, Error and study ID); y-axis: mean selection coefficient.]

Figure 1.4: Effect of accounting for autocorrelation and measurement error, in the subset of data for which standard errors were reported or could be calculated. The uncorrected estimate shows the mean and 95% confidence interval of the selection coefficient, based on 10000 bootstrap replicates. The other estimates are the posterior mode of the estimate of mean s from the three GLMMs that incorporated measurement error (error), study as a random factor (study ID), or both (error and study ID). Error bars show the upper and lower bounds of the 95% highest posterior density interval.

[Figure 1.5: two histogram panels, A and B. Panel B is split by statistical significance: Significant (n = 38), Insignificant (n = 48), Not reported (n = 54). x-axis: selection coefficient (0 to 1); y-axis: # of estimates.]

Figure 1.5: The distribution of overdominant selection coefficients. (a) The distribution of all overdominant selection coefficients. (b) The distribution of overdominant selection coefficients across different categories of statistical significance.

Full dataset
  Studies: 79
  Taxa: 30
  Total # s: 3556
  Positive: 1596 (224)
  Negative: 1820 (99)
  Overdominant: 140*

Directional selection
  Taxon type: Vertebrates 202; Invertebrates 350 (50); Plants 2844 (51); Microbes 20
  Unit of selection: SNP 2160 (131); Haplotype 1256 (192)
  Time period: Within generations 3131 (38); Short-term 125; Long-term 141; Unspecified 19
  Statistical significance: Significant 398 (106); Not significant 2822 (21); Not reported 196

Table 1.1: Summary of database and directional selection coefficients. Numbers in parentheses indicate the number of selection coefficients in the reduced dataset. *Estimates of overdominant selection report two selection coefficients per locus, one for the selective advantage over each of the two homozygotes.

                                      (A) Full dataset         (B) Reduced dataset
                                      mean    95% CI           mean    95% CI
Overall mean selection coefficient    0.135   (0.131-0.140)    0.093   (0.078-0.110)
Statistical significance
  significant                         0.279   (0.260-0.298)    0.106   (0.076-0.141)
  not significant                     0.118   (0.114-0.123)    0.074   (0.031-0.129)
  not reported                        0.088   (0.070-0.108)    0.088   (0.070-0.108)
Form of selection
  positive selection                  0.121   (0.116-0.127)    0.063   (0.055-0.072)
  negative selection                  0.147   (0.140-0.155)    0.16    (0.115-0.208)
Type of study
  experimental                        0.14    (0.135-0.144)    0.122   (0.084-0.167)
  observational                       0.09    (0.074-0.108)    0.09    (0.073-0.108)
Time period
  within generation                   0.141   (0.136-0.146)    0.232   (0.141-0.333)
  short-term (< 200 gens.)            0.114   (0.094-0.137)    0.114   (0.094-0.137)
  long-term (>= 200 gens.)            0.044   (0.036-0.053)    0.044   (0.036-0.053)
  not specified                       0.04    (0.024-0.062)    0.04    (0.024-0.062)
Genetic unit
  haplotypes                          0.128   (0.120-0.135)    0.121   (0.097-0.147)
  SNPs                                0.14    (0.134-0.146)    0.052   (0.039-0.067)

Table 1.2: Mean s and 95% confidence intervals (determined by 10000 bootstrap replicates) of various methodological and biological categories, for both the (A) full dataset and (B) reduced dataset.

                                              Posterior mode   95% HPD interval
(A) Full dataset
Selection coefficient ~ 1                     0.095            (0.066-0.124)
Selection coefficient ~ form of selection
  positive selection                          0.086            (0.065-0.117)
  negative selection                          0.096            (0.074-0.133)
Selection coefficient ~ type of study
  experimental                                0.097            (0.057-0.163)
  observational                               0.095            (0.067-0.124)
Selection coefficient ~ time period
  within generation                           0.201            (0.123-0.351)
  short-term (< 200 gens.)                    0.111            (0.077-0.174)
  long-term (>= 200 gens.)                    0.036            (0.023-0.065)
  not specified                               0.032            (0.017-0.079)
Selection coefficient ~ genetic unit
  haplotypes                                  0.093            (0.067-0.124)
  SNPs                                        0.086            (0.065-0.123)
(B) Subset with standard errors
Selection coefficient ~ 1
  study ID                                    0.141            (0.055-0.231)
  measurement error                           0.186            (0.174-0.202)
  study ID + measurement error                0.113            (0.050-0.217)

Table 1.3: Results of the generalized linear mixed models. Estimates are the posterior mode and lower and upper bounds of the 95% highest posterior density interval. (A) Results of GLMMs performed on the full dataset. Bolded text shows the form of the fixed effect model specification, and normal text shows each fixed factor within that analysis. All models incorporated study ID as a random factor. (B) Results of GLMMs performed on the subset of data for which standard errors could be calculated. Bolded text shows the form of the fixed effect model specification, and normal text lists the random factors included in the three models: study ID, measurement error or both.

Bibliography

Anderson, J. T., C.-R. Lee, and T. Mitchell-Olds (2014). Strong selection genome-wide enhances fitness trade-offs across environments and episodes of selection. Evolution 68(1), 16–31.

Anderson, J. T., C.-R. Lee, C. A. Rushworth, R. I. Colautti, and T. Mitchell-Olds (2013). Genetic trade-offs and conditional neutrality contribute to local adaptation. Molecular Ecology 22(3), 699–708.

Barrett, R. D. H. and H. E. Hoekstra (2011). Molecular spandrels: tests of adaptation at the genetic level. Nature Reviews Genetics 12(11), 767–780.

Barrett, R. D. H., L. K. M’Gonigle, and S. P. Otto (2006). The distribution of beneficial mutant effects under strong selection. Genetics 174(4), 2071–2079.

Barrett, R. D. H., S. M. Rogers, and D. Schluter (2008). Natural selection on a major armor gene in threespine stickleback. Science 322(5899), 255–257.

Barrett, R. D. H. and D. Schluter (2008). Adaptation from standing genetic variation. Trends in Ecology & Evolution 23(1), 38–44.

Barton, N. H. and M. R. Servedio (2015). The interpretation of selection coefficients. Evolution 69(5), 1101–1112.

Bérénos, C., P. A. Ellis, J. G. Pilkington, S. H. Lee, J. Gratten, and J. M. Pemberton (2015). Heterogeneity of genetic architecture of body size traits in a free-living population. Molecular Ecology 24(8), 1810–1830.

Brodie, III, E. D., A. J. Moore, and F. J. Janzen (1995). Visualizing and quantifying natural selection. Trends in Ecology & Evolution 10(8), 313–318.

Charlesworth, B., M. T. Morgan, and D. Charlesworth (1993). The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303.

Charlesworth, B. and S. I. Wright (2004). The HKA test revisited: a maximum-likelihood-ratio test of the standard neutral model. Genetics 168(2), 1071–1076.

Chen, H. and M. Slatkin (2013). Inferring selection intensity and allele age from multilocus haplotype structure. G3 3(8), 1429–1442.

Collins, S., J. De Meaux, and C. Acquisti (2007). Adaptive walks toward a moving optimum. Genetics 176(2), 1089–1099.

Endler, J. A. (1986). Natural Selection in the Wild. Princeton University Press.

Estes, S. and S. J. Arnold (2007). Resolving the paradox of stasis: models with stabilizing selection explain evolutionary divergence on all timescales. The American Naturalist 169(2), 227–244.

Excoffier, L., T. Hofer, and M. Foll (2009). Detecting loci under selection in a hierarchically structured population. Heredity 103(4), 285–298.

Eyre-Walker, A. and P. D. Keightley (2007). The distribution of fitness effects of new mutations. Nature Reviews Genetics 8(8), 610–618.

Fijarczyk, A. and W. Babik (2015). Detecting balancing selection in genomes: limits and prospects. Molecular Ecology 24(14), 3529–3545.

Foll, M., H. Shim, and J. D. Jensen (2015). WFABC: a Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Molecular Ecology Resources 15(1), 87–98.

Garrigan, D. and P. W. Hedrick (2003). Perspective: detecting adaptive molecular polymorphism: lessons from the MHC. Evolution 57(8), 1707–1722.

Gerbault, P., C. Moret, M. Currat, and A. Sanchez-Mazas (2009). Impact of selection and demography on the diffusion of lactase persistence. PLoS ONE 4(7), e6369.

Gingerich, P. D. (1983). Rates of evolution: effects of time and temporal scaling. Science 222(4620), 159–161.

Gompert, Z. (2016). Bayesian inference of selection in a heterogeneous environment from genetic time-series data. Molecular Ecology 25(1), 121–134.

Gompert, Z., A. A. Comeault, T. E. Farkas, J. L. Feder, T. L. Parchman, C. A. Buerkle, and P. Nosil (2014). Experimental evidence for ecological selection on genome variation in the wild. Ecology Letters 17(3), 369–379.

Gould, S. J. and R. C. Lewontin (1979). The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proceedings of the Royal Society B 205, 581–598.

Gurevitch, J. and L. V. Hedges (1999). Statistical issues in ecological meta-analyses. Ecology 80(4), 1142–1149.

Hadfield, J. D. (2010). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software 33(2), 1–22.

Halsey, L. G., D. Curran-Everett, S. L. Vowler, and G. B. Drummond (2015). The fickle P value generates irreproducible results. Nature Methods 12(3), 179–185.

Hartl, D. L. and A. G. Clark (1997). Principles of Population Genetics (3rd ed.). Sunderland, MA: Sinauer Associates, Inc.

Hedrick, P. W. (2012). What is the evidence for heterozygote advantage selection? Trends in Ecology & Evolution 27(12), 698–704.

Hendry, A. P. and A. Gonzalez (2008). Whither adaptation? Biology & Philosophy 23(5), 673–699.

Hereford, J. (2009). A quantitative survey of local adaptation and fitness trade-offs. The American Naturalist 173(5), 579–588.

Hereford, J., T. F. Hansen, and D. Houle (2004). Comparing strengths of directional selection: how strong is strong? Evolution 58(10), 2133–2143.

Hermisson, J. and P. S. Pennings (2005). Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169(4), 2335–2352.

Hoekstra, H. E., K. E. Drumm, and M. W. Nachman (2004). Ecological genetics of adaptive color polymorphism in pocket mice: geographic variation in selected and neutral genes. Evolution 58(6), 1329–1341.

Hoekstra, H. E., J. M. Hoekstra, D. Berrigan, S. N. Vignieri, A. Hoang, C. E. Hill, P. Beerli, and J. G. Kingsolver (2001). Strength and tempo of directional selection in the wild. Proceedings of the National Academy of Sciences 98(16), 9157–9160.

Hohenlohe, P. A. and W. A. Cresko (2010). Using population genomics to detect selection in natural populations: key concepts and methodological considerations. International Journal of Plant Sciences 171(9), 1059–1071.

Kawecki, T. J. and D. Ebert (2004). Conceptual issues in local adaptation. Ecology Letters 7(12), 1225–1241.

Kingsolver, J. G. and S. E. Diamond (2011). Phenotypic selection in natural populations: what limits directional selection? The American Naturalist 177(3), 346–357.

Kingsolver, J. G., S. E. Diamond, A. M. Siepielski, and S. M. Carlson (2012). Synthetic analyses of phenotypic selection in natural populations: lessons, limitations and future directions. Evolutionary Ecology 26(5), 1101–1118.

Kingsolver, J. G., H. E. Hoekstra, J. M. Hoekstra, D. Berrigan, S. N. Vignieri, C. E. Hill, A. Hoang, P. Gibert, and P. Beerli (2001). The strength of phenotypic selection in natural populations. The American Naturalist 157(3), 245–261.

Kinnison, M. T. and A. P. Hendry (2001). The pace of modern life II: from rates of contemporary microevolution to pattern and process. Genetica 112-113, 145–164.

Kopp, M. and J. Hermisson (2007). Adaptation of a quantitative trait to a moving optimum. Genetics 176(1), 715–719.

Kopp, M. and J. Hermisson (2009a). The genetic basis of phenotypic adaptation I: fixation of beneficial mutations in the moving optimum model. Genetics 182(1), 233–249.

Kopp, M. and J. Hermisson (2009b). The genetic basis of phenotypic adaptation II: the distribution of adaptive substitutions in the moving optimum model. Genetics 183(4), 1453–1476.

Kryazhimskiy, S., G. Tkacik, and J. B. Plotkin (2009). The dynamics of adaptation on correlated fitness landscapes. Proceedings of the National Academy of Sciences of the United States of America 106(44), 18638–18643.

Lande, R. and S. J. Arnold (1983). The measurement of selection on correlated characters. Evolution 37(6), 1210–1226.

Lee, Y. W., B. A. Gould, and J. R. Stinchcombe (2014). Identifying the genes underlying quantitative traits: a rationale for the QTN programme. AoB PLANTS 6, plu004.

Leimu, R. and M. Fischer (2008). A meta-analysis of local adaptation in plants. PLoS ONE 3(12), e4010.

Lenormand, T., D. Bourguet, T. Guillemaud, and M. Raymond (1999). Tracking the evolution of insecticide resistance in the mosquito Culex pipiens. Nature 400(6747), 861–864.

Lenormand, T., T. Guillemaud, D. Bourguet, and M. Raymond (1998). Evaluating gene flow using selected markers: a case study. Genetics 149(3), 1383–1392.

Lewontin, R. C. and J. L. Hubby (1966). A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54, 595–609.

Li, J., H. Li, M. Jakobsson, S. Li, P. Sjodin, and M. Lascoux (2012). Joint analysis of demography and selection in population genetics: where do we stand and where could we go? Molecular Ecology 21(1), 28–44.

Linnen, C. R. and H. E. Hoekstra (2009). Measuring natural selection on genotypes and phenotypes in the wild. Cold Spring Harbor Symposia on Quantitative Biology 74, 155–168.

Martin, G. and T. Lenormand (2006). A general multivariate extension of Fisher’s geometrical model and the distribution of mutation fitness effects across species. Evolution 60(5), 893–907.

Martin, G. and T. Lenormand (2015). The fitness effect of mutations across environments: Fisher’s geometrical model with multiple optima. Evolution 69(6), 1433–1447.

Messer, P. W. and R. A. Neher (2012). Estimating the strength of selective sweeps from deep population diversity data. Genetics 191(2), 593–605.

Mitchell-Olds, T., J. H. Willis, and D. B. Goldstein (2007). Which evolutionary processes influence natural genetic variation for phenotypic traits? Nature Reviews Genetics 8(11), 845–856.

Morrissey, M. B. and J. D. Hadfield (2012). Directional selection in temporally replicated studies is remarkably consistent. Evolution 66(2), 435–442.

Nei, M. (2005). Selectionism and neutralism in molecular evolution. Molecular Biology and Evolution 22(12), 2318–2342.

Nei, M., Y. Suzuki, and M. Nozawa (2010). The neutral theory of molecular evolution in the genomic era. Annual Review of Genomics and Human Genetics 11(1), 265–289.

Nielsen, R. (2009). Adaptionism—30 years after Gould and Lewontin. Evolution 63(10), 2487–2490.

Nosil, P. (2004). Reproductive isolation caused by visual predation on migrants between divergent environments. Proceedings of the Royal Society B: Biological Sciences 271(1547), 1521–1528.

Nosil, P. and B. J. Crespi (2006). Experimental evidence that predation promotes divergence in adaptive radiation. Proceedings of the National Academy of Sciences 103(24), 9090–9095.

Nosil, P., Z. Gompert, T. E. Farkas, A. A. Comeault, J. L. Feder, C. A. Buerkle, and T. L. Parchman (2012). Genomic consequences of multiple speciation processes in a stick insect. Proceedings of the Royal Society B: Biological Sciences 279(1749), 5058–5065.

Nsanzabana, C., I. M. Hastings, J. Marfurt, I. Müller, K. Baea, L. Rare, A. Schapira, I. Felger, B. Betschart, T. A. Smith, H. P. Beck, and B. Genton (2010). Quantifying the evolution and impact of antimalarial drug resistance: drug use, spread of resistance, and drug failure over a 12-year period in Papua New Guinea. The Journal of Infectious Diseases 201(3), 435–443.

Ohashi, J., I. Naka, J. Patarapotikul, H. Hananantachai, G. Brittenham, S. Looareesuwan, A. G. Clark, and K. Tokunaga (2004). Extended linkage disequilibrium surrounding the hemoglobin E variant due to malarial selection. The American Journal of Human Genetics 74(6), 1198–1208.

Ohashi, J., I. Naka, and N. Tsuchiya (2011). The impact of natural selection on an ABCC11 SNP determining earwax type. Molecular Biology and Evolution 28(1), 849–857.

Orengo, D. J. and M. Aguade (2007). Genome scans of variation and adaptive change: extended analysis of a candidate locus close to the phantom gene region in Drosophila melanogaster. Molecular Biology and Evolution 24(5), 1122–1129.

Orr, H. A. (1998). The population genetics of adaptation: the distribution of factors fixed during adaptive evolution. Evolution 52(4), 935–949.

Orr, H. A. (2003). The distribution of fitness effects among beneficial mutations. Genetics 163(4), 1519–1526.

Orr, H. A. (2005a). The genetic theory of adaptation: a brief history. Nature Reviews Genetics 6(2), 119–127.

Orr, H. A. (2005b). Theories of adaptation: what they do and don’t say. Genetica 123(1-2), 3–13.

Quesada, H., U. E. M. Ramírez, J. Rozas, and M. Aguadé (2003). Large-scale adaptive hitchhiking upon high recombination in Drosophila simulans. Genetics 165(2), 895–900.

R Core Team (2013). R: A language and environment for statistical computing. https://www.R-project.org/.

Rieseberg, L. H., A. Widmer, A. M. Arntz, and J. M. Burke (2002). Directional selection is the primary cause of phenotypic diversification. Proceedings of the National Academy of Sciences 99(19), 12242–12245.

Robinson, S. J., M. D. Samuel, C. J. Johnson, M. Adams, and D. I. McKenzie (2012). Emerging prion disease drives host selection in a wildlife population. Ecological Applications 22(3), 1050–1059.

Rockman, M. V. (2012). The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution 66(1), 1–17.

Roper, C., R. Pearce, B. Bredenkamp, J. Gumede, C. Drakeley, F. Mosha, D. Chandramohan, and B. Sharp (2003). Antifolate antimalarial resistance in southeast Africa: a population-based analysis. The Lancet 361(9364), 1174–1181.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin 86(3), 638–641.

Sandoval, C. P. (1994). The effects of the relative geographic scales of gene flow and selection on morph frequencies in the walking-stick Timema cristinae. Evolution 48(6), 1866–1879.

Seetharaman, S. and K. Jain (2013). Adaptive walks and distribution of beneficial fitness effects. Evolution 68(4), 965–975.

Siepielski, A. M., J. D. DiBattista, and S. M. Carlson (2009). It’s about time: the temporal dynamics of phenotypic selection in the wild. Ecology Letters 12(11), 1261–1276.

Siepielski, A. M., J. D. DiBattista, J. A. Evans, and S. M. Carlson (2011). Differences in the temporal dynamics of phenotypic selection among fitness components in the wild. Proceedings of the Royal Society B: Biological Sciences 278(1711), 1572–1580.

Siepielski, A. M., K. M. Gotanda, M. B. Morrissey, S. E. Diamond, J. D. DiBattista, and S. M. Carlson (2013). The spatial patterns of directional phenotypic selection. Ecology Letters 16(11), 1382–1392.

Slatkin, M. (2008). A Bayesian method for jointly estimating allele age and selection intensity. Genetics Research 90(1), 129–137.

Smith, J. M. and J. Haigh (1974). The hitch-hiking effect of a favourable gene. Genetical Research 23, 23–35.

Taylor, S. M., A. Antonia, G. Feng, V. Mwapasa, E. Chaluluka, M. Molyneux, F. O. ter Kuile, S. J. Rogerson, and S. R. Meshnick (2012). Adaptive evolution and fixation of drug-resistant Plasmodium falciparum genotypes in pregnancy-associated malaria: 9-year results from the QuEERPAM study. Infection, Genetics and Evolution 12(2), 282–290.

Turchin, M. C., C. W. Chiang, C. D. Palmer, S. Sankararaman, D. Reich, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, and J. N. Hirschhorn (2012). Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics 44(9), 1015–1019.

Vitalis, R., M. Gautier, K. J. Dawson, and M. A. Beaumont (2014). Detecting and measuring selection from gene frequency data. Genetics 196(3), 799–817.

Vitti, J. J., S. R. Grossman, and P. C. Sabeti (2013). Detecting natural selection in genomic data. Annual Review of Genetics 47(1), 97–120.

Yeaman, S. and M. C. Whitlock (2011). The genetic architecture of adaptation under migration–selection balance. Evolution 65(7), 1897–1911.

Linking Statement 1

Chapter 1 reviews published estimates of selection coefficients, s, estimated in natural populations. I find that selection at the genetic level is roughly exponentially distributed and that it varies with the biological scale and methodology with which it is measured. Beyond these empirical findings, I also review the state of the field and make recommendations for future research. In particular, I note that (1) more researchers need to quantify natural selection in the wild, instead of simply invoking it as a possible explanation, and (2) authors need to be clearer in their mathematical and statistical reporting of selection coefficients.

In Chapter 2 I follow my own advice and quantify natural selection at the genetic level in a moving hybrid zone between two races of Heliconius butterflies. Heliconius have bright wing colour patterns which are aposematic and under strong natural selection to mimic other toxic butterfly species. Importantly, the genetic basis for these colours is known, so we can study phenotype frequencies and allele frequencies simultaneously. We track changes in the position and shape of the hybrid zone through time using a novel Bayesian implementation of cline models. From these changes, we quantify how natural selection on mimetic alleles has decreased over time, and use remotely-sensed data on forest cover and productivity to test hypotheses about how environmental disturbance can shape natural selection and evolution in natural populations.

Chapter 2

Movement of a Heliconius hybrid zone over 30 years: a Bayesian approach

Timothy J. Thurman (1,2), Andre Szejner-Sigal (1,3), W. Owen McMillan (1)

Author affiliations:
(1) Smithsonian Tropical Research Institute. Panamá, República de Panamá
(2) Redpath Museum and Department of Biology, McGill University. Montréal, QC. Canada
(3) Department of Integrative Biology, University of California, Berkeley. Berkeley, CA. USA

This chapter is published in Journal of Evolutionary Biology 32 (9), 974-983.

2.1 Abstract

Hybrid zones have long been of interest to biologists as natural laboratories where we can gain insight into the processes of adaptation and speciation. Repeated sampling of individual hybrid zones has been particularly useful in elucidating the dynamic balance between selection and dispersal that maintains most hybrid zones. Here, we revisit a hybrid zone between Heliconius erato butterflies in Panamá for a third time over more than 30 years. We combine a novel Bayesian extension of stepped-cline hybrid zone models with environmental data to understand the genetic and environmental causes of cline dynamics in this species. The cline has continued to move west, likely due to dominance drive, but has slowed and broadened. Environmental analyses suggest that widespread deforestation in Panamá could be leading to decreased avian predation and relaxed selection, causing the observed changes in cline dynamics.

2.2 Introduction

Hybrid zones are areas where distinct taxa meet, interbreed, and produce hybrids (Barton and Hewitt, 1985). This broad definition encompasses biological processes at many levels of organization. Hybrid zones can be inferred from morphology (e.g., Delahaie et al., 2017) or genetic markers (e.g., Porter et al., 1997). They may occur between populations of the same species (e.g., Wilson et al., 2016), or closely related species (e.g., Miller et al., 2014); and they can be created by both neutral and selective forces. Whatever the type, scale, or cause, hybrid zones have long been of interest to biologists as “natural laboratories” where we can gain insight into the processes of adaptation and speciation (Haldane, 1948; Harrison, 1990; Buggs, 2007; Gompert et al., 2017).

Many hybrid zones are maintained by a balance between dispersal and selection. Gene flow widens and homogenizes hybrid zones; selection against migrants or hybrids narrows hybrid zones. For some hybrid zones, this balance is relatively stable and clines remain unchanged for decades (e.g., Mettler and Spellman, 2009; Rosser et al., 2014). Many hybrid zones, however, have moved or changed over relatively short timespans (reviewed in Buggs, 2007; more recent examples include Roy et al., 2012; Leaché et al., 2017; Hunter et al., 2017). Studies of hybrid zone dynamics are thus an excellent way to watch evolution in action and understand how and whether natural selection and gene flow vary through time. This is especially true when researchers incorporate data on the biotic or abiotic factors that are thought to be important for fitness and dispersal within particular hybrid zones.

Here, we revisit a hybrid zone between races of Heliconius erato butterflies. The hybrid zone between H. e. hydara and H. e. demophoon in Panamá was first characterized by Mallet (1986), who predicted that the cline would move west as the genetically dominant H. e. hydara colour pattern replaced the H. e. demophoon pattern, a phenomenon he termed “dominance drive”. Blum (2002) resampled the cline and found that it had moved as predicted, but hypothesized that other factors besides dominance, particularly deforestation, could be causing hybrid zone movement. Using new collections data from 2015, we examine whether the cline movement predicted by Mallet (1986) and characterized by Blum (2002) has continued.

To do this, we develop novel Bayesian variants of the classic stepped-cline models of Szymura and Barton (1986). These models are frequently used to describe the position and shape of the geographic clines in allele frequency which are characteristic of hybrid zones (Barton and Gale, 1993). However, they pose some statistical challenges. First, they assume statistical independence between sampled alleles. This assumption may be violated in natural populations. This causes models to underestimate statistical uncertainty, leading to overly precise and possibly incorrect parameter estimates (Szymura and Barton, 1986).

Various corrections have been developed to account for this statistical dependence (e.g., Szymura and Barton, 1986; Phillips et al., 2004; Alexandrino et al., 2005; Macholán et al., 2008). However, these adjustments make use of maximum-likelihood point estimates of population genetic parameters (e.g., FIS). Such methods do not account for the uncertainty in estimating those parameters and thus continue to underestimate uncertainty. Our novel Bayesian approach uses genotype information to simultaneously estimate inbreeding coefficients, allele frequencies, and cline parameters while accounting for uncertainty in all parameters.

Second, the likelihood equations describing the stepped-cline models are not analytically tractable or easily approximated. To solve this, cline researchers have generally used a variant of the Metropolis-Hastings Markov chain Monte Carlo (MH-MCMC) algorithm to fit models (Szymura and Barton, 1986; Porter et al., 1997; Gay et al., 2008; Derryberry et al., 2014). This method is effective, but recent advances in statistical computing have led to novel algorithms that improve upon MH-MCMC. We use a version of Hamiltonian Monte Carlo (HMC), an MCMC algorithm that is faster and more efficient for high-dimensional models than the Metropolis-Hastings algorithm (Betancourt, 2017).

We first validate our models on simulated data and by re-analyzing the data collected by Mallet (1986) and Blum (2002). Then, we apply them to our new collections data from 2015 to see whether and how the hybrid zone has changed since 2000. Finally, we combine our cline analysis with data on forest loss to evaluate the hypothesis that changes in the environment could drive the observed hybrid zone movement. Our work shows how repeated sampling of a hybrid zone can be combined with novel methods and environmental data to better understand the evolutionary and ecological forces which influence hybrid zone dynamics.

2.3 Materials and Methods

2.3.1 Study species

Heliconius are well known for their brightly-coloured wings, which are aposematic signals of distastefulness to avian predators (Supple et al., 2014). These colour patterns are under positive frequency-dependent selection within local mimicry rings (Merrill et al., 2012; Chouteau et al., 2016), such that hybrid zones between races that differ in colour pattern are maintained by strong selection against novel colour patterns (Benson, 1972; Mallet and Barton, 1989b; Kapan, 2001). Three forms of Heliconius erato with similar colour patterns can be found in Panamá. H. e. demophoon is broadly distributed across Costa Rica and Panamá and displays the “postman” colour pattern commonly found in Heliconius, with red forewing bands and yellow hindwing bands on a black background (Figure 2.1). H. e. hydara, found in northern Colombia and eastern Panamá, lacks the yellow hindwing bar (Figure 2.1). The western Colombian race H. e. venus can also be found, though more rarely, in eastern Panamá. It has the postman pattern, but with the yellow hindwing band only on the ventral side (Figure 2.1).

2.3.2 Sample collection

We extracted collection data from Mallet (1986) and Blum (2002) so that we could re-estimate the previous clines under a common framework. We included 15 of Mallet’s sites and 22 of Blum’s sites in our analysis (see section A.3.1). For our collections, we sampled H. erato from June to November of 2015 at 17 sites along a transect running from west to east across Panamá (Figure 2.1, Table A.4). When possible, we collected butterflies at the same GPS coordinates as Mallet (1986) and Blum (2002). We captured butterflies with aerial nets.

2.3.3 Phenotyping and Genotyping

Following Mallet (1986), we classified H. erato phenotypes into four categories: (A) the north Colombian race, H. e. hydara; (B) heterozygote; (C) the west Colombian race, H. e. venus; and (D) the Central American race, H. e. demophoon. The races are easy to identify by colour pattern (Figure 2.1). Heterozygotes are distinguished from pure-type H. e. hydara by the presence of a faint band of yellow scales on the hindwing, most often visible as a shadow of the yellow band on the ventral hindwing (Mallet, 1986, Figure 2.1).

In H. erato the presence or absence of the yellow hindwing band is controlled by alternative alleles at one locus, the Cr locus, with known dominance relationships (Sheppard et al., 1985; Mallet, 1986; Nadeau et al., 2016; Van Belleghem et al., 2017). In Panamá there are three alleles at this locus, which we designate: (1) CrHYD, the dominant, black-hindwing allele found in H. e. hydara; (2) CrWC, the recessive, ventral-only yellow allele found in the west Colombian H. e. venus; and (3) CrCA, the recessive, yellow allele found in the Central American H. e. demophoon. We can use the known phenotype-genotype relationships for this colour pattern to determine genotype frequencies for each site (see section A.3.2 for details). For this study, we focus on the dominant CrHYD allele, which can be directly observed, and pool the rare CrWC yellow allele with the more common CrCA yellow allele, as they cannot be visually distinguished in heterozygotes.

We placed all 54 sites from the three sampling periods onto a common one-dimensional transect by fitting a cubic regression to the GPS points of all collection sites. For each site, we found the point on the cubic transect closest to the true location. We used those points to calculate the one-dimensional distance along the transect relative to the westernmost site, accounting for both the curve of the cubic transect and the curvature of the earth (Figure 2.1, see section A.3.3).
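The projection of sites onto a one-dimensional transect can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of section A.3.3: it assumes site coordinates have already been converted to planar easting and northing values in kilometres (so it ignores the curvature of the earth, which the analysis in this chapter accounts for), it approximates the closest point on the curve with a grid search, and the function names `fit_cubic` and `transect_distances` and the example coordinates are hypothetical.

```python
import math


def fit_cubic(xs, ys):
    """Least-squares coefficients a0..a3 of y = a0 + a1*x + a2*x^2 + a3*x^3.

    Needs at least four distinct x values.
    """
    # Augmented 4x4 normal equations: (X^T X) a = X^T y.
    rows = [[sum(x ** (i + j) for x in xs) for j in range(4)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(4)]
    # Gaussian elimination with partial pivoting.
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 4):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    # Back-substitution.
    coefs = [0.0] * 4
    for i in reversed(range(4)):
        tail = sum(rows[i][j] * coefs[j] for j in range(i + 1, 4))
        coefs[i] = (rows[i][4] - tail) / rows[i][i]
    return coefs


def transect_distances(sites, n_grid=5001):
    """Distance along the fitted cubic, relative to the westernmost site.

    `sites` is a list of (easting, northing) pairs in planar units (assumed).
    """
    xs, ys = zip(*sites)
    lo, hi = min(xs), max(xs)
    # Rescale easting to [0, 1] so the normal equations stay well conditioned.
    a = fit_cubic([(x - lo) / (hi - lo) for x in xs], ys)

    def curve(x):
        u = (x - lo) / (hi - lo)
        return a[0] + a[1] * u + a[2] * u ** 2 + a[3] * u ** 3

    step = (hi - lo) / (n_grid - 1)
    gx = [lo + k * step for k in range(n_grid)]
    gy = [curve(x) for x in gx]
    # Cumulative arc length along the curve at each grid point.
    arc = [0.0]
    for k in range(1, n_grid):
        arc.append(arc[-1] + math.hypot(gx[k] - gx[k - 1], gy[k] - gy[k - 1]))
    # Each site is assigned the arc length of its nearest grid point.
    along = []
    for x, y in sites:
        k = min(range(n_grid), key=lambda i: (gx[i] - x) ** 2 + (gy[i] - y) ** 2)
        along.append(arc[k])
    west = along[xs.index(lo)]
    return [d - west for d in along]


# Hypothetical site coordinates in km (not data from this study).
example_sites = [(0.0, 1.0), (20.0, 4.0), (45.0, 3.0), (70.0, 9.0), (90.0, 15.0)]
for site, d in zip(example_sites, transect_distances(example_sites)):
    print(site, round(d, 1))
```

Measuring distance along the fitted curve, rather than as straight-line easting, keeps the spacing between sites faithful to the path of the transect. A version working directly from GPS coordinates would replace the planar `math.hypot` step with a great-circle distance.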

2.3.4 Cline models

Most previous maximum-likelihood-based models fit cline parameters to allele frequency data, with the binomial errors for allele frequency estimates determined by the number of alleles sampled. This procedure assumes that each sampled allele is statistically independent from the other sampled alleles: in other words, the sampled alleles are independent, identically distributed (iid) random variables. In natural populations, this assumption may be violated. For example, inbreeding could lead to statistical dependence between the two alleles sampled from a diploid individual. If these dependencies are not accounted for, cline inference can be affected: undue weight is given to statistically dependent samples and the amount of uncertainty is underestimated (Szymura and Barton, 1986).

The most common solution to this problem has been to adjust the binomial error with an effective sample size correction (Szymura and Barton, 1986; Phillips et al., 2004; Alexandrino et al., 2005; Macholán et al., 2008). These methods use maximum-likelihood estimates of FIS to correct for statistical dependence due to inbreeding within a population. In a population with no inbreeding, for example, a sample of 30 diploid individuals would yield 60 independent alleles. In a completely inbred population (FIS = 1), however, 30 diploid individuals would yield an effective sample size of only 30 independent alleles (one from each individual). However, corrections which use maximum-likelihood point estimates of FIS do not account for the fact that inbreeding is itself estimated with uncertainty. When the true level of inbreeding differs from the estimate, the effective sample size adjustment will over- or under-correct.

Our model avoids effective sample size corrections. Instead, it works directly from genotype information to account for both statistical dependence between samples due to inbreeding and uncertainty in estimating the level of inbreeding. We use a multinomial likelihood in which the observed data are the number of sampled individuals of each genotype. With two alleles, A and a, the likelihood is:

$$
(AA_i,\, Aa_i,\, aa_i) = \mathrm{Multinomial}\left(N_i;\ p_{AA_i},\, p_{Aa_i},\, p_{aa_i}\right) \tag{2.1}
$$

where (AA, Aa, aa) is the number of individuals of each genotype collected, N is the total number of individuals, (pAA, pAa, paa) is the expected frequency of each genotype, and i indexes collecting sites. The expected frequency of each genotype is a function of p, the frequency of the A allele, and the inbreeding coefficient, FIS, for each site (Hartl and Clark, 1997):

$$
\begin{aligned}
p_{AA_i} &= p_i^2 + p_i(1 - p_i)F_{IS_i} \\
p_{Aa_i} &= 2p_i(1 - p_i)(1 - F_{IS_i}) \\
p_{aa_i} &= (1 - p_i)^2 + p_i(1 - p_i)F_{IS_i}
\end{aligned}
\tag{2.2}
$$
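As a minimal sketch of equations 2.1 and 2.2 (not the Stan implementation used in this chapter), the expected genotype frequencies and the multinomial log-likelihood for one site can be written in plain Python. The function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def genotype_freqs(p, f_is):
    """Expected frequencies of AA, Aa, aa for allele frequency p and F_IS (eq. 2.2)."""
    shift = p * (1.0 - p) * f_is  # homozygote excess caused by inbreeding
    p_AA = p ** 2 + shift
    p_Aa = 2.0 * p * (1.0 - p) * (1.0 - f_is)
    p_aa = (1.0 - p) ** 2 + shift
    return p_AA, p_Aa, p_aa


def multinomial_loglik(counts, probs):
    """Log-probability of the genotype counts under a multinomial (eq. 2.1)."""
    loglik = math.lgamma(sum(counts) + 1)
    for k, q in zip(counts, probs):
        loglik -= math.lgamma(k + 1)
        if k > 0:
            if q <= 0.0:
                return float("-inf")  # observed a genotype with zero probability
            loglik += k * math.log(q)
    return loglik


# Hypothetical site: 30 individuals with a clear heterozygote deficit.
counts = (12, 6, 12)
ll_no_inbreeding = multinomial_loglik(counts, genotype_freqs(0.5, 0.0))
ll_inbred = multinomial_loglik(counts, genotype_freqs(0.5, 0.6))
```

In this made-up example the counts are better explained with FIS = 0.6 than with FIS = 0, which is the kind of information the genotype-based likelihood uses and an allele-count likelihood discards.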

In our Bayesian framework, each site’s FIS is not a single unchanging value. Instead it is a parameter to be estimated, with a corresponding posterior distribution.

The expected allele frequency p for each site is determined by the cline equation. We fit five variants of the classic stepped cline equations from Szymura and Barton (1986). All share the same sigmoid form in the center of the cline, but differ in their inclusion of exponential introgression tails (no tails, left tail only, right tail only, mirrored tails, or independent left and right tails, as in Derryberry et al., 2014). Introgression tails are used to model the differential strength of selection that can occur across the hybrid zone due to linkage disequilibrium (LD) (Barton, 1983). When multiple loci are under selection, LD in the center of the zone causes loci to experience stronger selection (both direct selection and indirect selection on linked loci). This stronger barrier to gene flow causes a steeper (narrower) cline in the center of the hybrid zone. At the tails of the hybrid zone where LD is lessened, loci experience weaker selection (direct selection only). Stepped-cline models use separate equations for the center and tails of the cline to account for this variation in selection (Szymura and Barton, 1986). Below we present the cline equation for the simplest model (without tails); see section A.3.4 for the equations for more complex models:

$$
p_i = p_{min} + (p_{max} - p_{min})\,\frac{e^{4(x_i - c)/w}}{1 + e^{4(x_i - c)/w}} \tag{2.3}
$$

A cline is thus described by at least four parameters: c is the center of the cline, w is the width of the cline (such that 1/w is the maximal slope of the cline), and pmin and pmax are the minimum and maximum values of p in the cline tails. x is the distance along the transect. Introgression tails are modeled with two parameters, δ and τ (Gay et al., 2008). δ is the distance from the cline center at which the introgression tail starts. τ is the ratio (constrained between 0 and 1) of the slope of the introgression tail to the slope of the sigmoid center at distance c ± δ along the transect (Gay et al. 2008, see section A.3.4 for details).

The observed data in our model are the numbers of individuals of each genotype collected at each site, (AAi, Aai, aai), and the distances of each site along the transect, xi. All other variables (cline parameters, per-site inbreeding coefficients, and per-site allele and genotype frequencies) are estimated as parameters in our model. In our analysis of the H. erato hybrid zone, we consider the CrHYD allele as the focal allele, A, and pool the two yellow-band alleles together as a.

We fit these cline models in a Bayesian framework using Stan v2.17.0 and RStan v2.17.3 (Carpenter et al., 2017; Stan Development Team, 2018). Stan uses an extension of Hamiltonian Monte Carlo, called the No-U-Turn Sampler (NUTS). NUTS-HMC is well-suited for our model, as it is generally faster and more efficient than other MCMC algorithms at sampling from high-dimensional posterior distributions and it does not require prior distributions to be conjugate to the likelihood (Hoffman and Gelman, 2011; Betancourt, 2017).
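The no-tails cline of equation 2.3 can be sketched as a short Python function. This is an illustration only, not the Stan code fitted in this chapter; the function name is hypothetical, and the example values are the 1982 posterior means reported in Table 2.1.

```python
import math


def cline_no_tails(x, center, width, p_min, p_max):
    """Expected allele frequency at transect position x (eq. 2.3, no tails)."""
    z = 4.0 * (x - center) / width
    # Evaluate e^z / (1 + e^z) in a form that cannot overflow for large |z|.
    if z >= 0.0:
        sigmoid = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        sigmoid = ez / (1.0 + ez)
    return p_min + (p_max - p_min) * sigmoid


# 1982 posterior means from Table 2.1 (best-fit model without tails).
center, width, p_min, p_max = 516.0, 53.0, 0.0684, 0.9114
p_at_center = cline_no_tails(center, center, width, p_min, p_max)

# Central-difference slope at the cline center.
h = 1e-3
slope_at_center = (
    cline_no_tails(center + h, center, width, p_min, p_max)
    - cline_no_tails(center - h, center, width, p_min, p_max)
) / (2.0 * h)
```

At the center the frequency is midway between pmin and pmax, and the slope of the unscaled sigmoid is 1/w; once the sigmoid is rescaled to run from pmin to pmax, the slope in frequency units becomes (pmax − pmin)/w.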

2.3.5 Cline analysis - simulated data

We tested our model against simulated data to confirm its accuracy and compare it to similar, maximum-likelihood-based models. We simulated datasets using a range of cline parameters, all without introgression tails: cline center constant at 200km; width of 20km and 80km, pmin of 0.04 and 0.15, pmax of 0.85 and 0.97, and FIS of 0, 0.1, 0.25, 0.5, 0.75, and 1. For each of the 48 possible combinations, we simulated 15 datasets. For each of these 720 datasets, we fit cline models without introgression tails using both our novel Bayesian model and a maximum-likelihood model implemented in the R package HZAR (Derryberry et al., 2014), using an effective sample size correction for inbreeding (Alexandrino et al., 2005). Results were comparable between the Bayesian and maximum-likelihood models, though our approach was slightly more accurate in terms of both point estimates of parameters and quantification of uncertainty (see section A.3.5 for full details of simulation results).
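The simulation design above can be sketched in Python as a parameter grid plus a genotype sampler. The parameter values come from the text; the site positions (every 20 km from 0 to 400 km) and the sample size of 30 individuals per site are assumptions made only for this illustration, and the actual simulation settings are those described in section A.3.5.

```python
import itertools
import math
import random

# Parameter values listed in the text (cline center fixed at 200 km).
CENTER_KM = 200.0
WIDTHS_KM = (20.0, 80.0)
P_MINS = (0.04, 0.15)
P_MAXS = (0.85, 0.97)
F_IS_VALUES = (0.0, 0.1, 0.25, 0.5, 0.75, 1.0)
REPLICATES = 15

# ASSUMPTIONS (not given in this section): site spacing and sample size.
SITE_POSITIONS_KM = [float(x) for x in range(0, 401, 20)]
N_PER_SITE = 30

# Every combination of width, pmin, pmax and FIS.
grid = list(itertools.product(WIDTHS_KM, P_MINS, P_MAXS, F_IS_VALUES))


def allele_freq(x, center, width, p_min, p_max):
    """No-tails cline (eq. 2.3)."""
    sigmoid = 1.0 / (1.0 + math.exp(-4.0 * (x - center) / width))
    return p_min + (p_max - p_min) * sigmoid


def genotype_freqs(p, f_is):
    """Expected AA, Aa, aa frequencies (eq. 2.2)."""
    shift = p * (1.0 - p) * f_is
    return (p * p + shift, 2.0 * p * (1.0 - p) * (1.0 - f_is), (1.0 - p) ** 2 + shift)


def simulate_dataset(rng, width, p_min, p_max, f_is):
    """Return one simulated dataset: a list of (x, n_AA, n_Aa, n_aa) per site."""
    data = []
    for x in SITE_POSITIONS_KM:
        probs = genotype_freqs(allele_freq(x, CENTER_KM, width, p_min, p_max), f_is)
        draws = rng.choices(("AA", "Aa", "aa"), weights=probs, k=N_PER_SITE)
        data.append((x, draws.count("AA"), draws.count("Aa"), draws.count("aa")))
    return data


rng = random.Random(1)  # fixed seed so the sketch is reproducible
example = simulate_dataset(rng, *grid[0])
n_datasets = len(grid) * REPLICATES
```

The grid has 2 × 2 × 2 × 6 = 48 combinations, and 15 replicates of each give the 720 datasets mentioned above.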

2.3.6 Cline analysis - Heliconius

For our analysis of the H. erato hybrid zone, we fit models separately for each sampling year. We placed weak normal priors on the center N(350, 100) and width N(50, 100), both constrained to be positive. For pmin and pmax, we used uniform priors of U(0, 0.2) and U(0.8, 1), respectively. Priors for δ and τ were Exponential(0.05) and U(0, 1), and the prior for FIS was U(0, 1). For all models, we fit four independent chains with 3000 iterations of warm-up and 7000 iterations of sampling, for a total of 28000 samples from the posterior distribution. We assessed convergence using the Gelman-Rubin R̂ statistic (Gelman and Rubin, 1992) and by visually inspecting trace plots. We also examined the number of effective samples from the posterior distribution. For each year, we fit all 5 possible tail models, then we selected a single model for inference. We chose the model with the largest Akaike weight, using WAIC as our deviance statistic (Watanabe, 2010; McElreath, 2015). For the three best-fit models (one per year), we further assessed model fit using posterior predictive checks. Finally, we generated point estimates and credible intervals for each parameter using the mean and 95% highest posterior density interval (HPDI) of the marginal posterior distribution of each parameter.
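Two of the summaries described above, Akaike weights computed from WAIC and the highest posterior density interval, can be sketched in a few lines of Python. This is an illustration rather than the code used for the analysis: the function names are hypothetical, the weights use the standard form exp(−ΔWAIC/2) normalized to sum to one, and the HPDI is taken as the narrowest interval containing the requested share of the posterior samples.

```python
import math


def akaike_weights(waic_values):
    """Relative model weights from a list of WAIC values (lower WAIC is better)."""
    best = min(waic_values)
    rel = [math.exp(-0.5 * (w - best)) for w in waic_values]
    total = sum(rel)
    return [r / total for r in rel]


def hpdi(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    best_lo, best_hi = ordered[0], ordered[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = ordered[i], ordered[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Hypothetical WAIC values for two candidate tail models.
weights = akaike_weights([100.0, 102.0])

# Hypothetical posterior draws with one extreme value.
interval = hpdi(list(range(1, 21)) + [1000], mass=0.95)
```

Unlike an equal-tailed interval, the HPDI excludes the single extreme draw in the second example because doing so gives the narrowest interval with the required coverage.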

2.3.7 Forest change analysis

Blum (2002) noted that other factors besides dominance drive could be influencing cline movement. Specifically, Blum suggested that H. e. hydara may have a selective advantage in open, savannah-like habitats (Blum, 2002, 2008). Widespread deforestation in eastern Panamá may have led to the creation of habitat that favours H. e. hydara, shifting the hybrid zone west (Blum, 2002). To test this hypothesis, we extracted data on forest loss and vegetation for Panamá from v1.5 of the Global Forest Change dataset of Hansen et al. (2013), which provides high-resolution (1 arcsecond, ~30 meter), global Landsat data for the years 2000 and 2017. Unfortunately, equivalent data are unavailable for the time of the sampling reported by Mallet (1986).

We generated 47 circles of radius 5km at 15km intervals along our transect. Within each circle we calculated four environmental measures from the Hansen et al. (2013) dataset: (1) the proportion of forest cover lost from the year 2000 to 2017, (2) the mean normalized difference vegetation index (NDVI) for the year 2000, (3) the mean NDVI for the year 2017, and (4) the mean difference in NDVI (∆NDVI) between 2000 and 2017 (i.e., ∆NDVI = NDVI2017 - NDVI2000) (see section A.3.6). NDVI is an index, ranging from -1 to 1, which uses the ratio of the reflectances of near-infrared and red light to characterize vegetation (Pettorelli et al., 2005). NDVI is correlated with fractional vegetation cover, vegetation biomass, and productivity: values closer to 1 indicate higher amounts of vegetation cover, biomass, and productivity (Carlson and Ripley, 1997; Pettorelli et al., 2005). When comparing differences in NDVI across years, negative values of ∆NDVI indicate decreases in forest cover, biomass, or productivity, while positive values of ∆NDVI indicate increases.

Using these data, we first looked for clinal variation in forest cover loss, NDVI, and change in NDVI across Panamá which might correspond to clinal variation in allele frequencies. Then, using the best-fit clines for 1999-2000 and 2015, we calculated the predicted change in allele frequency for each of these 47 points along the transect. To directly examine whether deforestation is driving hybrid zone movement, we tested for a correlation between (1) proportion of forest lost and change in allele frequency, and (2) change in NDVI and change in allele frequency. If hybrid zone movement is being driven by the creation of deforested habitat which favours H. e. hydara, we would expect a positive correlation between forest loss and change in CrHYD frequency. Similarly, we would expect a negative correlation between ∆NDVI and change in CrHYD frequency.
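The two quantities at the core of this analysis, NDVI and a rank correlation, can be sketched in plain Python. NDVI is computed with its standard definition, (NIR − red)/(NIR + red); the Spearman coefficient is computed as the Pearson correlation of average ranks. The function names are hypothetical, and the code is an illustration rather than the pipeline used to process the Global Forest Change rasters.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from two reflectances."""
    return (nir - red) / (nir + red)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Change in NDVI for one hypothetical circle (2017 minus 2000).
delta_ndvi = ndvi(0.40, 0.10) - ndvi(0.50, 0.08)
```

Under the deforestation hypothesis, applying `spearman_rho` to the 47 per-circle values would be expected to give a positive coefficient for forest loss against change in CrHYD frequency and a negative one for ∆NDVI.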

2.4 Results

In total, we analyzed 1460 butterflies (N1982 = 517, N1999 = 434, N2015 = 509). Our diagnostics indicated that all models effectively sampled the posterior distribution: chains were well-mixed, R̂ = 1 for all parameters, and effective sample sizes were sufficient for inference (Figures A.10-A.12). For each year, we used WAIC to select a single best-fit model. The best-fit model for the 1982 cline had no introgression tails, while the best-fit models for the 1999-2000 and 2015 clines had introgression tails on the right side of the cline only. Posterior predictive checks indicated no serious discrepancies between model fit and our observed data (Figures A.14-A.16). The difference in Akaike weights between the best-fit and second- or third-ranked model was small (Tables A.11-A.13). Likewise, the estimates of cline position and width within a year were similar regardless of the inclusion of introgression tails (Tables A.8-A.10). Levels of inbreeding, as quantified with FIS, varied across the hybrid zone (Figure A.13). At most sites the 95% credible intervals for FIS overlapped 0. This may be indicative of low levels of inbreeding, as seen in previous studies of Heliconius and other butterflies in Panamá (Mallet, 1986; Dasmahapatra et al., 2002). However, it may simply reflect the difficulty of precisely estimating inbreeding: most of the 95% CIs were quite large (Figure A.13). This underscores the need to use methods which model uncertainty in FIS instead of using fixed point estimates.

2.4.1 Cline results

Our estimates of the cline parameters for the 1982 and 1999 clines are consistent with the maximum likelihood estimates of the same clines from Blum (2002), further confirming the suitability of our model (Table 2.1). The numeric values for cline center are not directly comparable across these studies, as Blum measured all cline distances in relation to Panama City, while we measured distances in relation to the westernmost sampling site. However, the cline centers we find for 1982 (516 km along the transect, just west of Cañazas) and 1999-2000 (467 km along the transect, on the eastern edge of Lake Bayano) are the same as those estimated by Blum (2002). Similarly, our estimates of cline width for 1982 (53km, 95% CI 35-71) and 1999-2000 (60km, 95% CI 36-85) are similar to Blum’s estimates (66km and 58km, respectively) (Table 2.1).

From our sampling in 2015, we find that the hybrid zone has continued to move west (Figure 2.2). The cline center is now 451km (95% CI 442-460) from the start of the transect, near the western edge of Lake Bayano. This change in position is statistically significant, though the rate of cline movement has slowed. While the cline moved 48km (95% CI 41-56) from 1982-1999 (~2.8 km/year), it has moved only 16km (95% CI 5-27) from 1999-2015 (~1 km/year). This slowdown of cline movement has coincided with an increase in cline width (Table 2.1, Figure 2.2). Cline width stayed relatively constant from 1982 to 1999 (53km and 60km), but the hybrid zone was 93 km wide in 2015. The 2015 cline is significantly wider than the 1982 cline (95% CI of the posterior distribution of difference in width; 6-74), and wider than the 1999 cline, though this difference is marginally not significant (95% CI of posterior distribution of difference in width; -5-71, P(width2015 − width1999 > 0) = 0.956).

Motivated by empirical studies of Heliconius, Mallet and Barton (1989a) developed theory to relate cline width and velocity to dispersal and strength of selection. They showed that for single-locus clines maintained by frequency-dependent selection at a dominant gene, cline width is determined by the balance between gene flow and selection such that:

$$
w \approx \sqrt{\frac{8\sigma^2}{s}} \tag{2.4}
$$

where σ is the mean dispersal distance per generation and s is the strength of selection against novel colour patterns on either side of the cline. This analytical approximation requires some simplifying assumptions (weak selection, Hardy-Weinberg equilibrium), but computer simulations show it is relatively accurate even when those assumptions do not hold, as is likely the case here (Mallet and Barton, 1989a). If we further assume that cline movement is caused solely by the effect of dominance, we can also relate cline velocity (in km per generation, assuming 4 generations per year) to selection and dispersal as (Mallet and Barton, 1989a; Blum, 2002):

$$
v = 0.1\sqrt{2\sigma^2 s} \tag{2.5}
$$

Following Blum (2002), we use eqs. 2.4 and 2.5 in a system of equations to estimate σ and s from the full posterior distributions of cline velocity from 1999-2015 and cline width in 2015. Dispersal distance, σ, is 7.5km (95% CI 4.8-10.1). This is within the range of plausible values for H. erato (Mallet, 1986; Mallet et al., 1990) and slightly less than the estimates from Blum (2002) for the previous clines (9.7-10.4). The best estimate for s is 0.05 (95% CI 0.01-0.10), much smaller than Blum’s estimates from the earlier clines (0.2-0.22). This estimate of s is also much lower than selection coefficients calculated for other hybrid zones between Heliconius races that differ more dramatically in colour, as we would expect (Mallet et al., 1990).
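As a worked illustration of this system of equations, substituting σ² = w²s/8 from eq. 2.4 into eq. 2.5 gives v = 0.05ws, so that s = 20v/w and σ = w√(s/8). The Python sketch below applies this closed form to point estimates only; it is not the posterior-based calculation reported above. The function name is hypothetical, and the 16-year interval between the 1999-2000 and 2015 samples is taken from the dates in the text.

```python
import math

GENERATIONS_PER_YEAR = 4  # as assumed in the text


def selection_and_dispersal(width_km, velocity_km_per_gen):
    """Solve eqs. 2.4 and 2.5 for s and sigma given cline width and velocity."""
    s = 20.0 * velocity_km_per_gen / width_km  # from v = 0.05 * w * s
    sigma = width_km * math.sqrt(s / 8.0)      # from w = sqrt(8 * sigma^2 / s)
    return s, sigma


# Point estimates: 16 km of movement over 16 years, width of 93 km in 2015.
velocity = 16.0 / (16 * GENERATIONS_PER_YEAR)  # km per generation
s_hat, sigma_hat = selection_and_dispersal(93.0, velocity)

# Applying the same function to each posterior draw propagates uncertainty.
# These (width, velocity) pairs are hypothetical, not draws from our model.
hypothetical_draws = [(85.0, 0.20), (93.0, 0.25), (101.0, 0.30)]
derived = [selection_and_dispersal(w, v) for w, v in hypothetical_draws]
```

With these point estimates the closed form gives s of roughly 0.05 and σ of roughly 7.6 km, close to the posterior summaries reported above; small differences are expected because those summaries were computed from the full posterior distributions rather than from point estimates.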

2.4.2 Forest change and hybrid zone movement

All four environmental measures (proportion of forest lost, NDVI in 2000, NDVI in 2017, and change in NDVI) varied across Panamá, though none displayed clinal variation that coincided with the H. erato hybrid zone (Figure A.17). NDVI, a proxy for forest biomass and productivity, was higher in eastern Panamá where H. e. hydara is the dominant form. This is perhaps contrary to the idea that H. e. hydara is at an advantage in open habitats. Similarly, we did not find a positive correlation between proportion of forest loss and change in CrHYD, but rather a significant negative correlation (Spearman’s ρ = -0.31, p = 0.04, Figure 2.3a). We did observe the expected negative correlation between ∆NDVI and change in CrHYD frequency, but it was weaker and not statistically significant (Spearman’s ρ = -0.16, p = 0.27, Figure 2.3b). Taken together, these results are inconsistent with the hypothesis that cline movement is being driven by deforestation as opposed to dominance drive.

2.5 Discussion

In this study, we have revisited a Heliconius erato hybrid zone in Panamá. We extended the popular stepped-cline hybrid zone models of Szymura and Barton (1986) within a Bayesian framework, then applied this method to three rounds of sampling of the H. erato hybrid zone over more than 30 years. We successfully re-estimate cline parameters from older samples and show that the hybrid zone has continued to move west, albeit more slowly. We find little evidence that deforestation is directly causing hybrid zone movement. However, the decrease in cline velocity and increase in width suggest that the strength of selection against novel colour patterns has decreased dramatically over the last 15 years, possibly due to environmental changes.

Our Bayesian cline fitting method has some advantages over previous maximum-likelihood based methods. First, it avoids using point estimates of FIS to correct for statistical dependence due to inbreeding. When point estimates are inaccurate, corrections based on them will be inaccurate as well. Our method treats inbreeding as a parameter to be estimated, not a fixed quantity with a known value. We can thus properly account for the error inherent in estimating inbreeding itself, and propagate this uncertainty to the estimation of the other parameters in our model (i.e., the cline parameters). Within the parameter space we simulated, this proper accounting for uncertainty leads to a slight increase in accuracy (see section A.3.5). Second, our method takes advantage of the advances in numerical computing and MCMC algorithm design provided by the Stan statistical programming language. This makes it much faster than previous approaches: on our simulated data, for example, each model fit took ~10X longer using the MH-MCMC algorithm of HZAR (Derryberry et al. 2014, see section A.3.5). Finally, as a Bayesian approach, our method provides the full posterior distribution for each parameter, not simply point estimates and confidence intervals. This is very useful for propagating uncertainty to further analysis. For example, we related cline velocity and width to the evolutionary parameters of s and σ. Because we could use the posterior distributions for cline velocity and width in our equations 2.4 and 2.5 (instead of only point estimates), we could in turn derive full posterior distributions for s and σ, providing us with accurate estimates of uncertainty.

One limitation of our model is that it corrects for statistical dependence due to inbreeding only, using data from a single locus. This is appropriate for our analysis of this H. erato hybrid zone, for which only single-locus data are available. Our model could also be applied to single SNPs in the context of a genomic study, but it would not take full advantage of the extra information contained in multilocus genotypes. In particular, data from multiple loci can be useful in determining if individuals within a population are closely related. Such relatedness can also lead to non-independence between sampled alleles. In the context of cline analysis, there are currently effective sample size corrections that seek to account for relatedness (e.g., Macholán et al., 2008). These corrections use maximum-likelihood point estimates of FST, which suffer from the same problems as point estimates of FIS. Extending our Bayesian framework to incorporate multilocus genotype data and simultaneously correct for inbreeding and relatedness would be a promising avenue for future work.

Applying our model to the H. erato hybrid zone in Panamá, we see two phases of hybrid zone dynamics: from 1982-1999 the zone moved rapidly west while staying roughly the same width, while from 2000-2015 the zone slowed down considerably and became much wider (Table 2.1, Figure 2.2). Our models suggest that the shape of the hybrid zone may have also changed with regard to the presence of introgression tails: the best fit model for 1982 had no tails, while the models for 1999 and 2015 had tails on the right (eastern) side of the cline. Introgression tails on the eastern side of the cline make sense: as a recessive allele, the CrCA allele will only be selected against when it is present in a homozygote and thus can more easily move across the hybrid zone. However, the small differences in WAIC between models and relatively even Akaike weights preclude any definitive conclusions about the presence or absence of introgression tails. This is consistent with previous literature, which has cautioned that it is difficult to distinguish between alternative cline models without very thorough sampling (Barton and Gale, 1993).

Deforestation has been proposed as a factor that could drive the cline to the west (Blum, 2002). However, our forest change analyses, particularly the negative correlation between forest cover loss and allele frequency change, suggest that deforestation is not a main driver of cline movement (Figure 2.3). This is not to say that forest cover has no effect on cline dynamics. Forest cover and NDVI decreased all across our transect (Figure A.17). Avian predators are the primary drivers of mimicry selection in Heliconius (Pinheiro, 1996, 2011).
If forest loss leads to habitat loss and decreased population sizes of avian predators, we would expect to see a decrease in mimicry selection across the hybrid zone. Consistent with this, our estimates of selection and dispersal from 2000-2015 suggest that while dispersal has decreased only slightly (from ~10 km to ~7.5 km), selection has dramatically decreased by 75% (s2000 ≈ 0.2, s2015 ≈ 0.05). To explain the observed changes in hybrid zone dynamics, we propose a scenario in which dominance drive is the primary force moving the cline to the west, but environmental factors such as forest cover and productivity moderate the speed and width of the zone.

Of course, we cannot rule out more complex scenarios in which dominance drive, deforestation, and other, unmeasured factors combine to determine cline dynamics in this system. For example, hybrid zones may slow down and become trapped in regions of low population density. This has been suggested as the cause of the stability of Heliconius hybrid zones in northern Peru, where hybrid zones coincide with a local peak in rainfall (Rosser et al., 2014). We have little evidence that the hybrid zone in Panamá is approaching a region of low population density. We did not have more difficulty collecting butterflies near the center of the 2015 cline, though this is merely anecdotal, as our sampling procedure was not meant to estimate population densities.

Although individual studies of hybrid zones are certainly useful, our work shows how repeated sampling of a single hybrid zone elicits a deeper understanding of evolutionary dynamics. By combining long-term sampling with novel statistical techniques and environmental data, we track evolution in action and learn more about how environmental changes affect H. erato hybrid zones. Long-term datasets like these are essential for determining how ecological and evolutionary forces change through time. Ours is the third study over roughly 30 years to examine this hybrid zone. We hope it will not be the last. By our estimate, the center of the hybrid zone is now ~85 km east of the center of the city of Panamá. It will be interesting to see whether and how the more urban, developed landscape around the city influences hybrid zone dynamics in the future.

2.6 Figures and Tables

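[Figure image not reproduced: map of Panamá with pie charts at collection sites, a north arrow, and a scale bar from 0 to 120 kilometers. Legend panels A-D show the hydara, heterozygote, venus, and demophoon phenotypes, each in dorsal (D) and ventral (V) view.]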

Figure 2.1: Sampling of Heliconius erato across Panamá. Collection sites from 2015 are shown in yellow, while the sites from Mallet (1986) and Blum (2002) are shown as grey triangles and squares, respectively. The pie charts show the proportion of each phenotype sampled at each site in 2015, with colours matching the cartoon representations of the four phenotypic classes of H. erato in our study. Dorsal pattern is shown on the left with ventral pattern on the right. The dashed line shows the fitted cubic transect used for calculating distances.

[Figure image not reproduced: y-axis, frequency of CrHYD (0.00 to 1.00); x-axis, distance along transect (km), 300 to 600; legend, 1982, 1999, 2015.]

Figure 2.2: Results of cline analysis. The solid orange line shows the best-fit cline for 2015, while the transparent grey lines show the full posterior distribution of the best-fit cline model for 2015: each grey line is a cline where the values for each parameter are the values from one sample of the posterior distribution. The dotted and dashed lines show the estimated best-fit clines for 1982 and 1999-2000, respectively. The points and vertical lines are not derived from the cline model, and instead show the per-site frequency (with 95% credible intervals) of the CrHYD allele at each site along the transect in 2015, as estimated directly from the sampling data.

[Figure image not reproduced: two panels, A and B. y-axis in both panels, ∆CrHYD, 1999−2015 (0.00 to 0.20). x-axis of panel A, proportion of forest lost, 2000−2017; inset ρ = −0.31, p = 0.04. x-axis of panel B, ∆NDVI, 2000−2017; inset ρ = −0.16, p = 0.27.]

Figure 2.3: The relationship between change in CrHYD allele frequency and (A) proportion of forest lost from 2000 to 2017, and (B) mean ∆NDVI from 2000 to 2017. Each data point represents one circle of radius 5km spaced at 15km intervals along our transect. Inset in each panel are the correlation coefficient (Spearman’s ρ) and P-value of a test for correlation between the two variables.

Cline parameters

| Year | Best model | center (km) | width (km) | pmin | pmax | δR (km) | τR |
|---|---|---|---|---|---|---|---|
| 1982 | no tails | 516 (511, 521) | 53 (35, 71) | 0.0684 (0.0375, 0.1009) | 0.9114 (0.8734, 0.9476) | NA | NA |
| 1999-2000 | right tail | 467 (462, 473) | 60 (36, 85) | 0.0376 (0.015, 0.0631) | 0.9478 (0.8976, 1) | 21 (0.0015, 53) | 0.57 (0.1758, 1) |
| 2015 | right tail | 451 (442, 460) | 93 (65, 123) | 0.0397 (0.01, 0.0717) | 0.9432 (0.8978, 0.9999) | 25 (5e-04, 59) | 0.6556 (0.2599, 1) |

Table 2.1: Parameter estimates for the best-fit cline model for each year. Estimates are given as the posterior mean, and 95% credible intervals are the 95% highest posterior density interval of the marginal distribution for each parameter. Parameters: center is the location (in km along the transect) of the cline center; width is the width of the cline (in km) such that 1/width is the maximal slope of the cline; pmin and pmax are the minimum and maximum frequencies of the CrHYD allele in the cline tails, δR is the distance from the cline center at which the right introgression tail starts, and τR is the ratio of the slope of the introgression tail to the slope of the sigmoid center. The best-fit cline from 1982 was a model without introgression tails, so it does not have δR and τR parameters.

Bibliography

Alexandrino, J., S. J. E. Baird, L. Lawson, J. R. Macey, C. Moritz, and D. B. Wake (2005). Strong selection against hybrids at a hybrid zone in the Ensatina ring species complex and its evolutionary implications. Evolution 59(6), 1334–1347.

Barton, N. H. (1983). Multilocus clines. Evolution 37(3), 454–471.

Barton, N. H. and K. S. Gale (1993). Genetic analysis of hybrid zones. In R. G. Harrison (Ed.), Hybrid zones and the evolutionary process, pp. 13–45. New York: Oxford University Press.

Barton, N. H. and G. M. Hewitt (1985). Analysis of hybrid zones. Annual Review of Ecology and Systematics 16(1), 113–148.

Benson, W. W. (1972). Natural selection for Mullerian mimicry in Heliconius erato in Costa Rica. Science 176(4037), 936–939.

Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arxiv, 1–60.

Blum, M. J. (2002). Rapid movement of a Heliconius hybrid zone: evidence for phase III of Wright’s shifting balance theory? Evolution 56(10), 1992–1998.

Blum, M. J. (2008). Ecological and genetic associations across a Heliconius hybrid zone. Journal of Evolutionary Biology 21(1), 330–341.

Buggs, R. J. A. (2007). Empirical study of hybrid zone movement. Heredity 99(3), 301–312.

Carlson, T. N. and D. A. Ripley (1997). On the relation between NDVI, fractional vegetation cover, and leaf area index. Remote Sensing of Environment 62(3), 241–252.

Carpenter, B., A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell (2017). Stan: a probabilistic programming language. Journal of Statistical Software 76(1), 1–32.

Chouteau, M., M. Arias, and M. Joron (2016). Warning signals are under positive frequency-dependent selection in nature. Proceedings of the National Academy of Sciences 113, 2164–2169.

Dasmahapatra, K. K., M. J. Blum, A. Aiello, S. Hackwell, N. Davies, E. P. Bermingham, and J. Mallet (2002). Inferences from a rapidly moving hybrid zone. Evolution 56(4), 741–753.

Delahaie, B., J. Cornuault, C. Masson, J. A. M. Bertrand, Y. X. C. Bourgeois, B. Milá, and C. Thébaud (2017). Narrow hybrid zones in spite of very low population differentiation in neutral markers in an island bird species complex. Journal of Evolutionary Biology 30(12), 2132–2145.

Derryberry, E. P., G. E. Derryberry, J. M. Maley, and R. T. Brumfield (2014). HZAR: hybrid zone analysis using an R software package. Molecular Ecology Resources 14(3), 652–663.

Gay, L., P. A. Crochet, D. A. Bell, and T. Lenormand (2008). Comparing clines on molecular and phenotypic traits in hybrid zones: a window on tension zone models. Evolution 62(11), 2789–2806.

Gelman, A. and D. B. Rubin (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7(4), 457–472.

Gompert, Z., E. G. Mandeville, and C. A. Buerkle (2017). Analysis of population genomic data from hybrid zones. Annual Review of Ecology, Evolution, and Systematics 48(1), 207–229.

Haldane, J. B. S. (1948). The theory of a cline. Journal of Genetics 48(3), 277–284.

Hansen, M. C., P. V. Potapov, R. Moore, M. Hancher, S. A. Turubanova, A. Tyukavina, D. Thau, S. V. Stehman, S. J. Goetz, T. R. Loveland, A. Kommareddy, A. Egorov, L. Chini, C. O. Justice, and J. R. G. Townshend (2013). High-resolution global maps of 21st-century forest cover change. Science 342(6160), 850–853.

Harrison, R. G. (1990). Hybrid zones: windows on evolutionary process. In D. J. Futuyma and J. Antonovics (Eds.), Oxford Surveys in Evolutionary Biology, pp. 69–128. New York: Oxford University Press.

Hartl, D. L. and A. G. Clark (1997). Principles of Population Genetics (3rd ed.). Sunderland, MA: Sinauer Associates, Inc.

Hoffman, M. D. and A. Gelman (2011). The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. arXiv.org.

Hunter, E. A., M. D. Matocq, P. J. Murphy, and K. T. Shoemaker (2017). Differential effects of climate on survival rates drive hybrid zone movement. Current Biology 27(24), 3898–3903.

Kapan, D. D. (2001). Three-butterfly system provides a field test of müllerian mimicry. Nature 409(6818), 338–340.

Leaché, A. D., J. A. Grummer, R. B. Harris, and I. K. Breckheimer (2017). Evidence for concerted movement of nuclear and mitochondrial clines in a lizard hybrid zone. Molecular Ecology 26(8), 2306–2316.

Macholán, M., S. J. Baird, P. Munclinger, P. Dufková, B. Bímová, and J. Piálek (2008). Genetic conflict outweighs heterogametic incompatibility in the mouse hybrid zone? BMC Evolutionary Biology 8(1), 271–14.

Mallet, J. (1986). Hybrid zones of Heliconius butterflies in Panama and the stability and movement of warning colour clines. Heredity 56(2), 191–202.

Mallet, J. and N. Barton (1989a). Inference from clines stabilized by frequency-dependent selection. Genetics 122(4), 967–976.

Mallet, J., N. Barton, G. Lamas, J. Santisteban, M. Muedas, and H. Eeley (1990). Estimates of selection and gene flow from measures of cline width and linkage disequilibrium in Heliconius hybrid zones. Genetics 124(4), 921–936.

Mallet, J. and N. H. Barton (1989b). Strong natural selection in a warning-color hybrid zone. Evolution 43(2), 421–431.

McElreath, R. (2015). Statistical rethinking: a Bayesian course with examples in R and Stan. Boca Raton, FL: CRC Press.

Merrill, R. M., R. W. R. Wallbank, V. Bull, P. C. A. Salazar, J. Mallet, M. Stevens, and C. D. Jiggins (2012). Disruptive ecological selection on a mating cue. Proceedings of the Royal Society B: Biological Sciences 279(1749), 4907–4913.

Mettler, R. D. and G. M. Spellman (2009). A hybrid zone revisited: molecular and morphological analysis of the maintenance, movement, and evolution of a Great Plains avian (Cardinalidae: Pheucticus) hybrid zone. Molecular Ecology 18(15), 3256–3267.

Miller, M. J., S. E. Lipshutz, N. G. Smith, and E. Bermingham (2014). Genetic and phenotypic characterization of a hybrid zone between polyandrous Northern and Wattled Jacanas in western Panama. BMC Evolutionary Biology 14(1), 227.

Nadeau, N. J., C. Pardo-Diaz, A. Whibley, M. A. Supple, S. V. Saenko, R. W. R. Wallbank, G. C. Wu, L. Maroja, L. Ferguson, J. J. Hanly, H. Hines, C. Salazar, R. M. Merrill, A. J. Dowling, R. H. ffrench Constant, V. Llaurens, M. Joron, W. O. McMillan, and C. D. Jiggins (2016). The gene cortex controls mimicry and crypsis in butterflies and moths. Nature 534(7605), 106–110.

Pettorelli, N., J. O. Vik, A. Mysterud, J.-M. Gaillard, C. J. Tucker, and N. C. Stenseth (2005). Using the satellite-derived NDVI to assess ecological responses to environmental change. Trends in Ecology & Evolution 20(9), 503–510.

Phillips, B. L., S. Baird, and C. Moritz (2004). When vicars meet: a narrow contact zone between morphologically cryptic phylogeographic lineages of the rainforest skink, Carlia rubrigularis. Evolution 58(7), 1536–1548.

Pinheiro, C. E. G. (1996). Palatablility and escaping ability in Neotropical butterflies: tests with wild kingbirds (Tyrannus melancholicus, Tyrannidae). Biological Journal of the Linnean Society 59(4), 351–365.

Pinheiro, C. E. G. (2011). On the evolution of warning coloration, Batesian and Müllerian mimicry in Neotropical butterflies: the role of jacamars (Galbulidae) and tyrant-flycatchers (Tyrannidae). Journal of Avian Biology 42(4), 277–281.

Porter, A. H., R. Wenger, H. Geiger, A. Scholl, and A. M. Shapiro (1997). The Pontia daplidice-edusa hybrid zone in northwestern Italy. Evolution 51(5), 1561–1573.

Rosser, N., K. K. Dasmahapatra, and J. Mallet (2014). Stable Heliconius butterfly hybrid zones are correlated with a local rainfall peak at the edge of the Amazon basin. Evolution 68(12), 3470–3484.

Roy, J.-S., D. O’Connor, and D. M. Green (2012). Oscillation of an anuran hybrid zone: morphological evidence spanning 50 years. PLoS ONE 7(12), e52819.

Sheppard, P. M., J. R. G. Turner, K. S. Brown, W. W. Benson, and M. C. Singer (1985). Genetics and the evolution of Muellerian mimicry in Heliconius butterflies. Philosophical Transactions of the Royal Society B: Biological Sciences 308(1137), 433–610.

Stan Development Team (2018). RStan: the R interface to Stan.

Supple, M., R. Papa, B. Counterman, and W. O. McMillan (2014). The genomics of an adaptive radiation: insights across the Heliconius speciation continuum. In Ecological Genomics: Ecology and the Evolution of Genes and Genomes, pp. 249–271. Dordrecht: Springer Netherlands.

Szymura, J. M. and N. H. Barton (1986). Genetic analysis of a hybrid zone between the Fire-Bellied Toads, Bombina bombina and B. variegata, near Cracow in Southern Poland. Evolution 40(6), 1141–1159.

Van Belleghem, S. M., P. Rastas, A. Papanicolaou, S. H. Martin, C. F. Arias, M. A. Supple, J. J. Hanly, J. Mallet, J. J. Lewis, H. M. Hines, M. Ruiz, C. Salazar, M. Linares, G. R. P. Moreira, C. D. Jiggins, B. A. Counterman, W. O. McMillan, and R. Papa (2017). Complex modular architecture around a simple toolkit of wing pattern genes. Nature Ecology & Evolution 1, 0052.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and Widely Applicable Information Criterion in singular learning theory. Journal of Machine Learning Research 11, 3571–3594.

Wilson, J. D., D. J. Schmidt, and J. M. Hughes (2016). Movement of a hybrid zone between lineages of the Australian glass shrimp (Paratya australiensis). Journal of Heredity 107(5), 413–422.

Linking Statement 2

In Chapter 2, I take an observational approach to studying selection in natural populations by combining historical data with new collections from a moving hybrid zone of Heliconius butterflies. Because we know the alleles responsible for colour pattern in this species, I link changes in phenotype frequencies of populations with changes in the underlying allele frequencies. Through developing new statistical models and incorporating data on environmental change, I show how a decrease in the strength of selection, possibly driven by deforestation, has led to changes in hybrid zone dynamics over the last 15 years.

Chapter 3 takes an experimental approach to studying selection and adaptation. I study Anolis sagrei lizards in the Bahamas as part of a long-term field experiment. As with Heliconius, the ecological forces that influence natural selection on Anolis lizards are well-studied, though often only within a single generation. Unlike Heliconius, little is known about the genetic basis of putatively selected traits. I combine genomic sequencing with manipulations of the presence and absence of competitors and predators to test hypotheses about the predictability of evolutionary change over multiple generations in A. sagrei. Though the experimental manipulations had consistent effects on the behavior, diet, and population size of A. sagrei, evolutionary change in phenotypes and genotypes was difficult to predict at the treatment level. I find that this is due, in part, to environmental and ecological differences between islands within the same treatment. This work highlights the importance of understanding ecology when predicting evolution, and shows how field experiments can be used to clarify hypotheses about natural selection and adaptation.

Chapter 3

Predicting evolutionary change from ecological interactions: a field experiment with Anolis lizards

Timothy J. Thurman1,2, Robert M. Pringle3, Rowan D. H. Barrett1

Author affiliations:
1 Redpath Museum and Department of Biology, McGill University, Montréal, QC, Canada
2 Smithsonian Tropical Research Institute, Panamá, República de Panamá
3 Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ, USA

3.1 Abstract

Determining whether and how evolution can be predictable is an important goal, particularly as anthropogenic disturbances lead to an increasingly uncertain future for Earth’s biodiversity. Here, we use a multi-generation, replicated field experiment with Anolis sagrei lizards to test the predictability of evolution. We manipulated the presence and absence of a terrestrial predator, Leiocephalus carinatus, and a congeneric competitor, A. smaragdinus, in a factorial design across 16 islands in the Bahamas with existing A. sagrei populations. We sampled lizards before our experimental manipulations and again after roughly five generations, measuring functional traits related to locomotor performance and habitat use and using ddRAD sequencing to estimate genome-wide changes in allele frequency. Despite strong and consistent effects of predators and competitors on the behavior, diet, and population size of A. sagrei, we find that evolutionary change at both the phenotypic and genomic levels was difficult to predict in advance. Phenotypic change, though not stochastic, was related to variation in vegetation structure and lizard densities across islands, making a priori prediction challenging but ultimately possible given sufficient ecological data. Genetic change, on the other hand, was unpredictable and unrelated to our experimental manipulations, phenotypic change, or environmental differences. Our work shows how small changes in ecological context can alter evolutionary outcomes over short timescales and demonstrates the importance of using field experiments to test and clarify hypotheses about how natural selection operates.

3.2 Introduction

In a famous (at least, famous to biologists) thought experiment, Stephen J. Gould asked us to imagine “replaying life’s tape”: if we could rewind the history of life and re-start evolution from an earlier point, would we get the same outcome (Gould, 1989)? This metaphor neatly encapsulates a fundamental debate in biology about the relative role of deterministic versus stochastic factors in evolution (Blount et al., 2018). This debate has played out across many scales, from the role of selection versus drift in structuring genetic variation within populations (Kreitman and Akashi, 1995; Nei et al., 2010) to macroevolutionary studies of phenotypic convergence across distantly related lineages (Losos, 2011). If deterministic factors (e.g., natural selection) predominate, replaying the tape of life would lead to a similar outcome. If stochastic factors rule, the outcome would be unpredictable. Of course, the ability to predict evolution is not simply a thought experiment or academic concern. With anthropogenic disturbances including climate change, pollution, and invasive species increasingly threatening biodiversity, predicting how species and ecosystems will respond is crucial for effective conservation and mitigation of harm (Urban et al., 2016).

Biologists have often taken a retrospective approach when studying the predictability of evolution. A popular method is to examine (and ideally quantify) the similarities and differences across replicate populations or lineages which have evolved in similar habitats or faced similar adaptive challenges. Repeated evolution of the same phenotypes or genotypes during these “natural experiments” is evidence that evolution can be consistent and thus predictable, though the extent to which this parallel evolution is common remains an open question (Bolnick et al., 2018).

Evolution is increasingly recognized as a contemporary process, not only a historical one (Reznick et al., 2018). With this paradigm shift, biologists have realized that they do not have to rely on natural experiments to study the repeatability of evolution; they can perform their own. Much of this work has been done in the laboratory, where studies of experimental evolution with microbes (e.g., Lenski et al., 1991; Travisano et al., 1995), Drosophila (e.g., Burke et al., 2010), and other organisms have shown that evolution can indeed be predictable, to an extent (reviewed by Kawecki et al., 2012; Blount et al., 2018).

Evolutionary experiments outside the lab are rarer, perhaps because of the logistical, ethical, legal, and conservation concerns with performing experiments in the wild. Nevertheless, pioneering studies with guppies (Reznick et al., 1990), Anolis lizards (Losos et al., 1997; Kolbe et al., 2012), and Daphnia (Ebert et al., 2002) show that evolution can still be predictable in the wild if natural selection is strong. With advances in genetic sequencing, biologists can now combine field experiments with genomics to understand the predictability of evolutionary change at the genetic level (e.g., Gompert et al., 2014; Thurman and Barrett, 2016; Barrett et al., 2019; Exposito-Alonso et al., 2019). Many of these genomic field studies, however, last less than a generation or two. Though they show that selection may be predictable, whether this translates to parallel evolution across multiple generations remains to be seen.

We present results from a large-scale, replicated, multi-generational evolutionary ecology experiment with Anolis lizards. Anoles are an oft-cited example of parallel evolution. Similar ecomorphs with adaptations to different microhabitats have repeatedly evolved during the adaptive radiation of Anolis across the Caribbean (Losos, 2009). On 16 small islands in the Bahamas already occupied by brown anoles (Anolis sagrei), we introduced a congeneric competitor (the green anole, A. smaragdinus) and a terrestrial predator (the curly-tailed lizard, Leiocephalus carinatus) in a factorial design. We have previously shown that these introductions have strong ecological effects (Pringle et al., 2019). Introduced predators suppress population growth in brown anoles and lead to significant behavioral changes: the usually semi-terrestrial brown anoles perch higher in vegetation to avoid predation. This change in habitat use leads to intensified competition with the more arboreal green anoles, as revealed by increased spatial and dietary overlap between these species. On half of our experimental islands with all three species, this increased competition for predator-free space led to the extirpation of the green anole population. Here, we combine phenotypic and genetic data to ask whether these novel ecological interactions led to predictable evolutionary responses in A. sagrei.

Our experiment builds on a large body of comparative, laboratory, and field studies in Anolis that show how different phenotypic traits affect performance and fitness (reviewed in Losos, 2009). From this work, we can derive predictions about what may happen in our experiment. Two morphological traits in particular, hindlimb length and number of toepad scales, have been the subject of previous studies which give us clear predictions for how they might evolve in our experiment. In anoles, limb length, especially hindlimb length, is related to locomotor performance and perch use. Lizards with relatively longer limbs are faster sprinters on broad surfaces, while lizards with shorter limbs are less likely to fall when moving on narrow perches (Losos and Sinervo, 1989; Losos and Irschick, 1996). In a single-generation study, Losos et al. (2004, 2006) introduced terrestrial curly-tailed lizards on islands with A. sagrei. A. sagrei began perching on higher, narrower perches and experienced strong natural selection on hindlimb length. Thus, Losos and colleagues predicted that, over multiple generations, A. sagrei should evolve shorter hindlimbs more suited to moving on the high, thin perches that are a refuge from predators (Losos et al., 2006). We therefore expect relative hindlimb length of A. sagrei to decrease on islands with predators relative to islands without predators.

Another trait related to an arboreal lifestyle is the number of subdigital scales on the toepad (lamellae). Arboreal species tend to have more lamellae (Glossip and Losos, 1997), and individuals with larger toepads have better clinging ability (Elstrott and Irschick, 2004). Stuart et al. (2014) found that native A. carolinensis, a close relative of the green anoles in our study, perch higher in response to the introduction of brown anoles and adaptively evolve larger toepads. Again, given the changes in perch height in our experiment, we expect number of lamellae in A. sagrei to increase on islands with predators relative to controls.

For other phenotypic traits, and for the effect of the introduced competitors, we do not have such clear a priori expectations. The same is true at the genetic level, where we have very little information about the genetic architecture of Anolis traits which could be adaptively important. Nevertheless, we can still examine evolutionary predictability in the broader sense of identifying consistencies across replicate populations. Importantly, our experimental setup avoids some drawbacks of historical approaches. We know the initial conditions of our experimental populations and can directly observe changes, so we do not have to compare extant lineages and make assumptions about the state of ancestral populations (Bolnick et al., 2018).

Our study examines two types of predictability across multiple scales. First, we consider it in the truest sense: will our a priori predictions come true? Next, we quantify whether our experimental manipulations had consistent, and thus predictable, phenotypic effects on both univariate and multivariate phenotypes. In doing so we examine the relationship between phenotypic change and ecological data on habitat structure and population density, to test whether these factors influence predictability. Then, we determine whether genome-wide allele frequency changes are consistent across our treatments. Finally, we consider phenotypic and genetic change jointly to see if there are any similarities in parallelism across these two scales.

3.3 Materials and Methods

3.3.1 Description of field experiment

Our work is one facet of a large-scale, long-term field experiment on Anolis sagrei lizards in the Bahamas. Established in 2011, this experiment seeks to understand the impacts of a competitor species (the green anole, Anolis smaragdinus) and a predator species (the curly-tailed lizard, Leiocephalus carinatus) on populations of A. sagrei (see section A.4.2). We briefly describe the experimental design here; more details can be found in Pringle et al. (2019) and section A.4.3.

We established our experiment across 16 small islands, transplanting 10-11 green anoles and 5-7 curly-tailed lizards in a factorial design to create 4 treatments: control (CON) islands with no introduced species, competitor (COMP) islands with added A. smaragdinus, predator (PRED) islands with added L. carinatus, and islands with all 3 species (ALL). We randomly assigned 4 islands to each treatment. Transplanted green anoles were captured from Staniel Cay, a large island with A. sagrei but no curly-tailed lizards. The introduced curly-tailed lizards were captured on Thomas Cay, a large island where all three species are present. Thus, the curly-tailed lizards were accustomed to the anole species and the green anoles were accustomed to brown anoles. However, A. sagrei were naive to the introduced competitor, and both anoles were ecologically naive to the predator.

3.3.2 Annual census and perching behavior

Each year from 2011 to 2016, we conducted a census in late April or early May to estimate the population sizes for each species of lizard. In 2011, this census was performed for A. sagrei only, before the experimental transplants. Each year, 3-6 researchers searched islands for all lizard species over three consecutive days, using a mark-resight procedure developed for Caribbean Anolis to estimate population size (Heckel et al., 1979; Pringle et al., 2015, 2019; see section A.4.4). During each census, we also recorded data on habitat use for each lizard spotted, including perch height and diameter of substrate (see section A.4.4). Lizards were not marked in an individually identifiable way.

Given the important relationship between morphology and habitat in anoles, we expand on the analysis of perch use presented in Pringle et al. (2019). Based on 5464 perch observations of A. sagrei from 2011 and 2016, we used generalized linear mixed models (GLMMs) to test for changes in perch height and diameter of A. sagrei associated with introduced predators, competitors, and their interaction. For perch height, we used a zero-inflated negative binomial likelihood (to account for the many individuals seen on the ground at height 0), and for perch diameter we used a negative binomial likelihood (section A.4.6).
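To make the zero-inflated likelihood concrete, the following is a minimal sketch of its probability mass function, not the model fitted in this chapter. It assumes perch heights are recorded as non-negative integers, and the parameter names (`mu` for the mean, `phi` for the shape, `pi_zero` for the extra probability of a ground observation) and all numeric values are hypothetical.

```python
import math

def nb_log_pmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu and shape phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def zinb_pmf(y, mu, phi, pi_zero):
    """Zero-inflated negative binomial: a point mass at 0 mixed with an ordinary NB."""
    nb = math.exp(nb_log_pmf(y, mu, phi))
    if y == 0:
        # A zero can come from the extra-zero component or from the NB itself.
        return pi_zero + (1 - pi_zero) * nb
    return (1 - pi_zero) * nb

# Hypothetical values: mean perch height 50, shape 2, 30% excess ground observations.
p_ground = zinb_pmf(0, mu=50, phi=2, pi_zero=0.3)
p_ground_plain_nb = math.exp(nb_log_pmf(0, mu=50, phi=2))
```

With these made-up values the zero-inflated model assigns far more probability to height 0 than the plain negative binomial does, which is the behaviour the extra component is meant to capture.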

3.3.3 Phenotypic and genetic sampling

We made two separate trips to collect phenotypic and genetic data from A. sagrei. In May 2011, before the introduction of the green anoles and curly-tailed lizards, we sampled A. sagrei from all 16 experimental islands. In January 2016, after roughly 5 generations, we returned to take a post-treatment sample from all 16 experimental islands. We used a portable X-ray to capture a full-body X-ray image for measuring skeletal traits, scanned the underside of the lizard for measurements of toepad traits, weighed each lizard to the nearest 0.1g, and took a small piece of tail tip as a tissue sample for genetic analysis (see section A.4.7). Because phenotypic sampling was time-intensive, on some islands we also collected tissue samples in the field from A. sagrei from which we did not collect any phenotypic data. After sampling, all lizards were marked with a small dot of water-soluble nail polish to prevent recapture and returned to the location where they were captured.

3.3.4 Skeletal and toepad measurements

We measured 13 skeletal traits and 2 toepad traits from our X-rays and scans (Figure A.19). We measured skeletal traits associated with ecomorph type, locomotor performance, and bite force in Anolis: snout-vent length (SVL, a measure of body size), head width and length, pectoral and pelvic width, forelimb length (total, and broken down into humerus, ulna, and toe length), and hindlimb length (total, and broken down into femur, tibia, and toe length). For toepads, we counted the number of lamellae (scales) on the third toe of the fore foot and the fourth toe of the hind foot. All measurements for a given trait were made by a single person who was blind to treatment.

Skeletal traits were measured from X-ray images in ImageJ v1.49v (Rasband, 2018; Schneider et al., 2012) using the ObjectJ plugin (Vischer and Nastase, 2015). We counted the number of toepad scales directly from scans. We attempted to measure all traits for all lizards, but we did not take measurements in cases where (1) X-rays or scans were too unclear or contained anomalies such that we could not locate landmarks, (2) individuals had bones that appeared to have healed after a fracture, or (3) some part of the trait of interest was missing or damaged such that it could not be measured. After measuring all lizards, we excluded juvenile lizards from further phenotypic analysis (see section A.4.8).

We measured a subset of X-rays and scans multiple times to estimate repeatability. All skeletal measurements were highly repeatable (repeatability > 0.99), and scale measures were also highly repeatable though less so (> 0.84, see section A.4.9 and Table A.18). For X-rays and scans measured multiple times, we took the mean of the repeated measurements. For bilateral traits, we measured both the left and right sides, when possible, and took the mean of the two measurements. If only one side could be measured, we used that value. After removing juveniles from our dataset, we analyzed phenotypes from 967 A. sagrei (mean N = 15 per island/sex/year, range 3-35).

3.3.5 Phenotypic change- univariate

We first analyzed the consistency of change for univariate traits of A. sagrei using linear mixed models (LMMs). We included presence/absence of the predator, presence/absence of the competitor, sampling year, sex, and all possible interactions as fixed effects. For all models, we allowed the variance in each trait to differ between sexes and included island as a random effect crossed with sex. We tested for the effect of predators, competitors, and their interaction on phenotypic change by examining the two- and three-way interactions of these effects with year.

In anoles, most skeletal traits are strongly correlated with overall body size (Beuttell and Losos, 1999). To account for this, we included SVL and an SVL*sex interaction as covariates. These relationships may differ across A. sagrei populations. Our model used partial pooling to estimate both per-island and experiment-wide coefficients for these covariates, which allows these parameters to vary across islands while sharing information across the experiment to avoid overfitting. We did not correct for body size when analyzing number of toepad lamellae, which is fixed early in development (Losos, 2009) and was uncorrelated with body size in our data. Thus, for toepad traits and SVL our modelling approach was analogous to repeated-measures ANOVA, while our models for skeletal traits were analogous to repeated-measures ANCOVAs.

We implemented these LMMs in a Bayesian framework with the R package brms (Bürkner, 2017, 2018). We used the Student’s t-distribution as the likelihood, which is similar to a Gaussian likelihood but more robust to possible outliers (Gelman and Hill, 2007). We used weakly informative priors and fit four independent chains with 1000 iterations of warmup and 1000 iterations of sampling each, using the brms default control parameters (see section A.4.10 for full details of priors and models). We assessed statistical significance for the interactions of interest by determining whether the 95% highest posterior density interval (HPDI) for that parameter contained 0.
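The HPDI criterion can be illustrated with a short sketch. This is not the code used for the analyses (which were run in R with brms); it is a generic, assumed implementation that takes a list of posterior draws for one parameter and returns the shortest interval containing at least the requested fraction of draws. The function names and example draws are hypothetical.

```python
import math

def hpdi(samples, prob=0.95):
    """Shortest interval containing at least `prob` of the posterior draws."""
    s = sorted(samples)
    n = len(s)
    m = max(1, math.ceil(prob * n))  # number of draws the interval must cover
    best_lo, best_hi = s[0], s[m - 1]
    # Slide a window of m consecutive sorted draws and keep the narrowest one.
    for i in range(1, n - m + 1):
        lo, hi = s[i], s[i + m - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi

def excludes_zero(interval):
    """True when 0 lies strictly outside the interval."""
    lo, hi = interval
    return lo > 0 or hi < 0

# Hypothetical, strongly skewed draws: the densest half sits between 0 and 3.
draws = [0, 1, 2, 3, 10, 20, 30, 40]
interval_50 = hpdi(draws, prob=0.5)
```

Under the decision rule described above, an interaction would be treated as significant when `excludes_zero(hpdi(draws, 0.95))` is true for its posterior draws.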

3.3.6 Phenotypic change- multivariate

To examine parallelism in multivariate phenotypic change, we used the phenotypic change vector analysis developed by Collyer and Adams (2007) and Adams and Collyer (2009). This approach summarizes a population’s multivariate change across N trait dimensions with a phenotypic change vector (PCV): a vector of length N where each element of the vector is the mean change in a single trait. Importantly, the PCV quantifies both the magnitude of change (the length of the vector, D) and the direction of change relative to another vector (θ), making it an especially useful method of quantifying parallelism (Stuart et al., 2017; Bolnick et al., 2018).

We used 13 traits for this analysis: the two toepad measures, and all the skeletal measures except total forelimb length and total hindlimb length (which we excluded to avoid redundancy). To remove the effect of body size on skeletal traits besides SVL, we calculated residuals from a linear mixed model using the same approach as above, but without the year, predator, and competitor effects (see section A.4.11 for details). We used these residuals as size-independent measures of each trait and standardized all trait measures to have a mean of 0 and a standard deviation of 1 to remove possible scaling effects across traits. Finally, we calculated the change in average value for each of the 13 phenotypes on each island. Thus, for each island we have a single PCV which describes how the population has evolved through 13-dimensional morphospace. To quantify parallelism, we calculated the length of each PCV (D), all pairwise differences in length (∆D), and all pairwise differences in direction (θ), and used ANOVAs to determine whether these quantities varied across our experimental treatments.
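The three quantities used to describe parallelism (D, ∆D, and θ) follow directly from vector arithmetic. The sketch below is illustrative only: it uses three hypothetical standardized traits rather than the 13 analyzed here, and the island values and function names are invented for the example.

```python
import math

def pcv(before_means, after_means):
    """Phenotypic change vector: per-trait change in the standardized mean."""
    return [after - before for before, after in zip(before_means, after_means)]

def magnitude(vector):
    """Length D of a change vector (Euclidean norm)."""
    return math.sqrt(sum(x * x for x in vector))

def angle_degrees(v1, v2):
    """Angle theta between two change vectors, in degrees."""
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_theta = dot / (magnitude(v1) * magnitude(v2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))

# Hypothetical trait means (before, after) for two islands and three traits.
island_a = pcv([0.0, 0.0, 0.0], [3.0, 4.0, 0.0])
island_b = pcv([0.0, 0.0, 0.0], [0.0, 0.0, 2.0])

d_a = magnitude(island_a)                  # magnitude of change on island A
d_b = magnitude(island_b)                  # magnitude of change on island B
delta_d = abs(d_a - d_b)                   # pairwise difference in magnitude
theta = angle_degrees(island_a, island_b)  # pairwise difference in direction
```

In this toy case the two islands change by different amounts along orthogonal trait axes, so θ is 90 degrees: the populations are not evolving in parallel. Values of θ near 0 would indicate change in the same direction.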

3.3.7 Ecological and environmental causes of parallelism

To test whether environmental and ecological variation across islands promotes or inhibits parallelism, we quantified pairwise environmental differences between islands, ∆env (section A.4.5). Briefly, we combined data on island size, vegetation structure, and the density of each lizard species across years, and used these to calculate the Euclidean distance matrix of pairwise comparisons between islands. We used Mantel tests with 10000 permutations to examine whether environmental similarity was related to parallelism in the magnitude of phenotypic evolution (i.e., ∆env vs D) or parallelism in the direction of phenotypic evolution (i.e., ∆env vs θ).
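For readers unfamiliar with the Mantel test, the following is a minimal sketch of the permutation logic, assuming a two-sided test based on Pearson's correlation between the off-diagonal elements of two symmetric matrices. It is not the implementation used in this chapter, and the matrices, the seed, and the function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(matrix, order):
    """Off-diagonal elements after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(m1, m2, permutations=10000, seed=1):
    """Return (observed r, two-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(m1)))
    x = upper_triangle(m1, identity)
    observed = pearson(x, upper_triangle(m2, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute island labels of the second matrix only
        if abs(pearson(x, upper_triangle(m2, order))) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical example: six islands placed along one environmental axis.
positions = [0, 1, 3, 6, 10, 15]
env = [[abs(a - b) for b in positions] for a in positions]
pheno = [[2 * d for d in row] for row in env]  # perfectly related to env
r_obs, p_value = mantel(env, pheno, permutations=999)
```

Permuting whole rows and columns together, rather than shuffling individual matrix cells, is what respects the non-independence of pairwise comparisons that share an island.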

3.3.8 Library preparation and sequencing

We extracted DNA from individual lizard tail tips using a standard phenol-chloroform based method (section A.4.12), then cleaned the genomic DNA using NucleoSpin® gDNA cleanup kits (Takara Bio). We then prepared sequencing libraries for double-digest, restriction enzyme-associated DNA sequencing (ddRAD, Peterson et al., 2012).

Briefly, we used the restriction enzymes NlaIII and MluCI to digest genomic DNA. We normalized individual samples to 90ng using a Biomek FXP liquid handling robot (Beckman Coulter), then ligated adaptors to the digested DNA. We used adaptors with 48 individual barcodes and 16 library barcodes, allowing us to multiplex up to 768 samples within a single lane of sequencing. As much as possible, we made library composition even in terms of population and year of sampling to avoid batch effects. In the P2 adaptor, we included a small Degenerate Base Region (DBR; Schweyen et al., 2014), slightly modified from the design of Vendrami et al. (2017). These random bases in the adaptor allow us to bioinformatically filter out potential PCR duplicates, which is not otherwise possible with ddRAD sequencing. After digestion and adaptor ligation, we pooled individual samples into libraries, size-selected fragments of 400-450bp using a PippinPrep (Sage Science), and finally PCR amplified each library (see section A.4.12 for full details).

In total, we assembled 29 libraries, which we sent to Génome Québec (Montréal, Canada) for sequencing. We sequenced one library on a half lane of an Illumina HiSeq 2500 using paired-end, 125bp sequencing. All other libraries were sequenced across 2 lanes (16 libraries in one lane, 12 on the other) of an Illumina NovaSeq 6000 S4 flowcell using paired-end 150bp reads.

3.3.9 Bioinformatic pipeline

We briefly summarize our bioinformatic pipeline here; see section A.4.13 for the full details of the programs and options we used. We demultiplexed raw reads and filtered out PCR duplicates using the process_radtags and clone_filter tools in Stacks v1.46 (Catchen et al., 2011, 2013). We used fastx_trimmer and cutadapt v2.1 to remove extra bases and adaptor contamination, and then mapped reads to the preliminary Anolis sagrei genome using the default settings of the BWA-MEM algorithm in bwa v0.7.15, keeping only alignments with a mapping quality of at least 20 (Martin, 2011; Li and Durbin, 2009). From here, our analysis pipeline diverged based on the two methods we used to quantify genetic parallelism.

Our first approach measures correlations between changes in allele frequency (see below). We calculated allele frequencies directly from the mapped reads using ANGSD v0.918 (Korneliussen et al., 2014). Our second analysis takes a finer-scale approach and uses the Cochran-Mantel-Haenszel (CMH) test to quantify parallel allele frequency change. Currently, this test cannot be performed directly on mapped reads, but instead needs called genotypes. Thus, we used the multiallelic SNP caller in the mpileup/call pipeline of samtools v1.5 and bcftools v1.5 to generate a VCF file of called genotypes (Li, 2011; Danecek et al., 2014). We used vcftools v0.1.14 (Danecek et al., 2011) to filter these raw variants. We retained only biallelic SNPs with an experiment-wide minor allele frequency greater than 5%, a minimum genotype quality score of 25, a maximum of 50% missing genotypes across the experiment, a minimum mean depth per sample of 5, and a maximum total depth of 42. After filtering, we retained 85888 SNPs.

3.3.10 Parallelism of genetic change

We first examined parallel genetic change on a broad, genome-wide scale. To do this, we estimated allele frequencies for each population at each of the two time points. We made these estimates directly from the sequencing read data using the maximum-likelihood model implemented in ANGSD (Kim et al., 2011). This model accounts for some of the special features of low-coverage, next-generation sequencing data, including read misalignment, base-calling errors, variation in read depth, and genotype uncertainty (Kim et al., 2011; Korneliussen et al., 2014). Importantly, this allows us to accurately make use of read and SNP data that would be discarded in more traditional, quality-filtered approaches (as in our filtering approach for the CMH test).

We discarded SNPs with a minor allele frequency less than 5% at either timepoint and then calculated the change in allele frequency, ∆p, for all SNPs within a given island. For each island, we estimated the average variance effective population size, Ne, during the experiment (Jorde and Ryman, 2007, see section A.4.14). Then, for all possible pairwise comparisons of islands, we calculated the correlation (Pearson’s r) between ∆p for variants shared between the two populations. We would expect that many variants may be changing due to genetic drift alone, though this becomes less likely as allele frequency changes become more extreme. Thus, we calculated the correlation for both all shared variants (rall) and for variants that were in the most extreme 5% tails of allele frequency change within either population (r5%). These outlier SNPs are more likely to be (or be linked to) variants which are under selection. We used ANOVAs to test whether correlations in allele frequency varied significantly across treatments. We also tested for significant correlations between environmental similarity and correlations in allele frequency change (i.e., ∆env vs rall, and ∆env vs r5%).

We also looked at parallel changes in allele frequency at a finer scale using the CMH test. The CMH test is analogous to Fisher’s exact test across replicate samples or populations: it tests for independence across 2x2xK contingency tables, where K is the number of replicates and the 2x2 refers to allele counts for a biallelic SNP across two groups (Orozco terWengel et al., 2012). It has become popular in evolve-and-resequence studies to test whether allele frequencies at a single locus have changed significantly across multiple replicate populations (Schlotterer et al., 2015). We used PLINK v1.9 to perform the CMH test on the four islands in each treatment (Chang et al., 2015; Purcell and Chang, 2017). That is, we performed the CMH test four times: once as a 2x2x4 test on the CON islands, once as a 2x2x4 test on the PRED islands, etc. At the individual SNP level, we used the Benjamini-Hochberg method to correct for multiple testing, controlling the false-discovery rate at α = 0.05. We also tested for significant parallelism across larger genomic windows (genome-wide, and in 100kb, 50kb, and 10kb windows) using the harmonic mean p-value method of Wilson (2019) for combining dependent tests.
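The logic of the per-SNP CMH test and the subsequent Benjamini-Hochberg correction can be sketched as follows. This is a textbook version written for illustration, not a description of PLINK's internals: it assumes the standard 1-degree-of-freedom CMH chi-square without a continuity correction, and the allele counts and function names are hypothetical.

```python
import math

def cmh_test(tables):
    """CMH chi-square (1 df, no continuity correction) across K 2x2 tables.

    Each table is ((a, b), (c, d)): rows are the two time points and
    columns are counts of the two alleles in one replicate island.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a table this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n  # observed minus expected count
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        return 0.0, 1.0
    stat = diff_sum ** 2 / var_sum
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival, 1 df
    return stat, p_value

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is significant at FDR alpha."""
    m = len(p_values)
    ranked = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(ranked, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that passes
    significant = [False] * m
    for i in ranked[:cutoff_rank]:
        significant[i] = True
    return significant

# Hypothetical SNP whose frequency shifts the same way on all four islands.
parallel = [((30, 10), (20, 20))] * 4
# Hypothetical SNP that shifts in opposite directions on two islands.
opposing = [((30, 10), (20, 20)), ((20, 20), (30, 10))]
stat_parallel, p_parallel = cmh_test(parallel)
stat_opposing, p_opposing = cmh_test(opposing)
```

Because deviations from expectation are summed before squaring, shifts in opposite directions cancel, so the test is sensitive specifically to consistent (parallel) change across replicates.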

3.3.11 Association mapping of traits

To test whether parallel changes in univariate phenotypes led to parallel changes in allele frequency, we used the same set of filtered SNPs as for the CMH test to perform genome-wide association studies (GWAS) of the genetic architecture of phenotypic traits. For our focal traits (hindlimb length and toepad number), and for the traits which showed evidence of consistent change due to our experimental manipulations (see section 3.4.1), we used the Bayesian Sparse Linear Mixed Model (BSLMM) approach of the program GEMMA v0.97 to perform association mapping (Zhou and Stephens, 2012; Zhou et al., 2013, see section A.4.15 for full details).

3.3.12 Parallelism across scales

Finally, we examined whether parallelism was related across the scales at which we measured it. For all 120 possible pairwise comparisons of islands, we have a measure of (1) parallelism in the magnitude of phenotypic change (D), (2) parallelism in the direction of phenotypic change (θ), and (3) parallelism at the genetic level (rall and r5%). We used Mantel tests with 10000 permutations to examine the correlation between parallelism in the magnitude and direction of phenotypic evolution (i.e., D vs θ), and we tested for significant correlations between phenotypic and genetic parallelism (i.e., D vs rall, θ vs rall, D vs r5%, and θ vs r5%).
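For readers unfamiliar with the Mantel test, a minimal Python sketch of the permutation procedure follows. It assumes two symmetric island-by-island matrices (for 16 islands, each has 120 unique pairs), a one-sided test, and a Pearson correlation on the upper triangles. The function names, the seed, and the toy coordinates are hypothetical, and the software actually used for the analysis is not reproduced here.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling rows and columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(m1, m2, permutations=10000, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    x = upper_triangle(m1)
    r_obs = pearson(x, upper_triangle(m2))
    n = len(m1)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel the islands of the second matrix
        if pearson(x, upper_triangle(m2, order)) >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: six hypothetical islands placed along a line.
coords = [0, 1, 3, 6, 10, 15]
dist = [[abs(a - b) for b in coords] for a in coords]
r_toy, p_toy = mantel_test(dist, dist, permutations=999)
```

Permuting island labels, rather than individual matrix cells, is what lets the test respect the non-independence of pairwise comparisons that share an island.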

3.4 Results

3.4.1 Phenotypic change- univariate traits

We tested for consistent effects of introduced competitors and predators on 13 skeletal traits and two toepad traits linked to ecomorph type, performance, and habitat use. In our LMM framework, we quantified predator:year, competitor:year, and predator:competitor:year interactions to determine whether our experimental manipulations caused significant differences in phenotypic evolution relative to controls. For brevity, we drop the “year” when discussing the effect of each manipulation below. We first discuss the results from phenotypes for which we had clear a priori predictions for change in response to introduced predators. We then examine whether any other traits showed signs of consistent change across treatments, presenting results only for models with significant effects. Full results from all traits can be found in Table A.19.

We expected predators to select for shorter hindlimbs in A. sagrei. We found the opposite: introduced predators had a positive (though not statistically significant) effect on change in hindlimb length (Figure 3.1A). This effect was stronger in males than females (males: posterior mean of βpred:year = 0.36, 95% CI -0.09 to 0.85; females: βpred:year = 0.09, 95% CI -0.3 to 0.54). Among the traits that make up the hindlimb (femur, tibia, and length of the 4th toe), all three had positive effects of predators for males, though for females the effects were mixed. Of these models, the only statistically significant effect was the βpred:comp:year interaction for male femurs.

Our predictions for toepad evolution were also not supported. We predicted predators would select for more toepad scales, but they had a negative effect on change in lamellae number for toes on both the fore- and hindlimb (Figure 3.1B). This effect was significant for the hindlimb in males. Unexpectedly, competitors had a significant negative effect on number of toepad lamellae in males, and there was a strong positive predator:competitor interaction (Figure 3.1B).

Four other traits showed signs of being consistently affected by introduced predators or competitors. Predators had a significant negative effect on body size (SVL) in males (βpred:year = -2.07, 95% CI -3.35 to -0.78). We also observed effects on head shape. For head length in males, there

was a significant positive predator:competitor interaction, though neither main effect (positive for predators, negative for competitors) was significant on its own. Predators also had a significant negative effect on head width in males, though all head shape effects in females were nonsignificant. Finally, competitors had a significant negative effect on width of the pectoral girdle in males.

3.4.2 Change in perch behavior

Given these unexpected results, we more closely examined perch behavior of A. sagrei, as the selective effects of predators are thought to be mediated by changes in habitat use (Losos et al., 2006). As we have previously reported (Pringle et al., 2019), A. sagrei significantly changed their perch height in response to predators: they perched higher and were less likely to be on the ground (Figure 3.1C, see Tables A.15-A.16). However, we find that this large increase in perch height did not lead to the expected decrease in average perch diameter (Losos et al., 2006). Qualitatively, we see little difference in distribution of perch diameters across treatments (Figure 3.1D). Quantitatively, the effect of predators on change in perch diameter was not statistically significant (males: βpred:year = -0.14, 95% CI -0.37 to 0.11; females: βpred:year = -0.2, 95% CI -0.41 to 0.02). Introduced competitors also had no significant effect, nor was there a significant predator:competitor interaction (Table A.17).

3.4.3 Phenotypic change- multivariate traits

We found little evidence that multivariate phenotypic change in response to introduced predators and competitors was parallel at the treatment level. The magnitude of phenotypic change for each island, D, did not vary significantly across treatments (F3,12 = 0.034, p = 0.99). Differences in the magnitude (∆D) and direction (θ) of trait change also did not vary with treatments. First, considering only the pairwise comparisons done within treatments (6 comparisons within each, 24 total), we see that differences in direction were generally large (mean θ = 56.5), though with considerable variation across treatments (Figure 3.2B). In general, predator and competitor islands actually showed less parallelism in direction (larger θs) than control islands, though these differences were not significant (F3,20 = 3.078, p = 0.05). Variation in ∆D across treatments was likewise not statistically significant (F3,20 = 1.302, p = 0.3, Figure 3.2A).

As further evidence of the lack of parallelism at the treatment level, we also examined all 120 possible pairwise comparisons between the 16 islands. We tested whether ∆D and θ varied based on whether the islands being compared were in the same or different treatments. They did not (∆D: F1,118 = 0.524, p = 0.47; θ: F1,118 = 0.01, p = 0.92, see Figure 3.2C for θ and A.20 for ∆D).

When we consider environmental variation, however, we see that the lack of parallelism at the treatment level is related to differences in the vegetation structure and densities of lizards between islands. There was a positive association between environmental dissimilarity and the differences in the direction of phenotypic change (∆env x θ, Mantel test, r = 0.25, p = 0.06), though there was no association between environmental dissimilarity and the magnitude of change (∆env x ∆D, Mantel test, r = -0.1, p = 0.69). Thus, the lack of parallelism at the treatment level is not stochastic, but rather is due in part to environmental and ecological differences across islands.
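To make the quantities ∆D and θ concrete, the sketch below shows one common way such pairwise measures are computed from vectors of multivariate trait change. It assumes the standard phenotypic change vector definitions (D as vector length, θ as the angle between two vectors, in degrees); the exact procedure used in this chapter is described in the methods and appendix rather than here. The function names and the trait values are hypothetical.

```python
import math


def change_vector(before, after):
    """Per-trait change in the island mean phenotype between two samples."""
    return [a - b for b, a in zip(before, after)]


def magnitude(vector):
    """Length of a change vector, the quantity written D in the text."""
    return math.sqrt(sum(x * x for x in vector))


def angle_degrees(v1, v2):
    """Angle between two change vectors in degrees, the quantity written theta."""
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_theta = dot / (magnitude(v1) * magnitude(v2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def compare_islands(v1, v2):
    """Return (difference in magnitude, difference in direction) for two islands."""
    return abs(magnitude(v1) - magnitude(v2)), angle_degrees(v1, v2)


# Hypothetical standardized trait means (two traits) on two islands.
island_1 = change_vector(before=[0.0, 0.0], after=[1.0, 0.0])
island_2 = change_vector(before=[0.5, 0.5], after=[0.5, 2.5])
delta_d, theta = compare_islands(island_1, island_2)
```

Under these definitions, two islands can change by the same amount (∆D near 0) while moving in very different directions (large θ), which is why the two measures are reported separately.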

3.4.4 Association mapping of traits

We performed genome-wide association tests for our focal traits (hindlimb length and toepad number), and for the traits which showed evidence of consistent change due to our experimental manipulations (i.e., SVL, head length, head width, and pectoral width). Across all traits, we find no SNPs significantly associated with variation in phenotype (all posterior inclusion probabilities < 0.95), and thus could not test for parallel allele frequency changes at individual SNPs associated with any phenotype.

3.4.5 Parallelism of genetic changes

As at the phenotypic level, we again found little evidence of parallel genetic change within treatments. Effective population sizes were generally small (range Ne = 43-145, Table A.20). We could only calculate rall and r5% for SNPs that were shared across islands. The number of shared SNPs varied across pairwise comparisons (for all SNPs, mean = 6792 sites, range 2204-14801; for top 5% outliers, mean = 1721 sites, range 610-3724). Across all pairwise comparisons, rall ranged from mild positive correlation (rall = 0.303) to mild negative correlation (rall = -0.296), but there were few clear patterns related to our experimental manipulations (Figure 3.3). Results were similar for r5%, so we present only results for rall here; see section A.4.16 for analysis of genetic parallelism among these outliers. Considering only comparisons between islands of the same treatment (the boxes along the diagonal of Figure 3.3), correlations appeared weakest amongst the ALL islands and most strongly positive among the COMP islands. However, these trends were mild and not statistically significant (ANOVA, F3,20 = 2.4, p = 0.1). When we consider all pairwise tests, rall did not vary according to whether the islands being compared were in the same or different treatments (ANOVA, F1,118 = 0.45, p = 0.5). Environmental differences were not significantly correlated with genetic parallelism (∆env x rall, r = 0.08, p = 0.41).

We also tested for parallel allele frequency changes at smaller scales using the CMH test. We performed the CMH test separately for the 4 islands within each treatment, using the filtered variants (85888 SNPs). Within each treatment, the number of SNPs with p < 0.05 was slightly more than we would expect due to chance (6-7%, expect 5%). However, none of these SNPs across any treatment remained significant after correcting p-values for multiple testing. Using the harmonic mean p-value approach to test for signs of parallelism across larger genomic regions, we found a genome-wide signature of parallelism in the 4 competitor islands (HMP = 0.04), though we could not narrow down this signal (all HMP > 0.05 across smaller windows). The genome-wide and window-level tests across the other treatments showed no significant parallelism (all HMP > 0.05).
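The windowed analysis above combines per-SNP p-values into a single value per genomic region. A minimal Python sketch of the raw harmonic mean p-value is given below, assuming equal weights within a window and windows defined by integer division of SNP position. The function names, window rule, and example values are assumptions for illustration. The raw statistic computed here is not itself a calibrated p-value, and this sketch does not attempt the adjustment described by Wilson (2019) or whichever variant was applied in the analysis.

```python
def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean of p-values (equal weights by default)."""
    if any(p <= 0 for p in p_values):
        raise ValueError("p-values must be strictly positive")
    n = len(p_values)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


def windowed_hmp(positions, p_values, window_size):
    """Group SNPs into fixed-size windows by position and combine each window."""
    windows = {}
    for position, p in zip(positions, p_values):
        windows.setdefault(position // window_size, []).append(p)
    return {index: harmonic_mean_p(ps) for index, ps in sorted(windows.items())}


# Hypothetical SNP positions (bp) and CMH p-values on one scaffold.
positions = [100, 5000, 12000]
p_values = [0.02, 0.04, 0.5]
genome_wide = harmonic_mean_p(p_values)
by_window = windowed_hmp(positions, p_values, window_size=10_000)
```

Because the harmonic mean is dominated by the smallest p-values, a region can reach a low combined value even when no single SNP survives multiple-testing correction, which matches the pattern reported for the competitor islands.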

3.4.6 Parallelism across scales

We found no significant relationships between measures of phenotypic and genetic parallelism. Degree of parallelism in the magnitude of phenotypic change, ∆D, was unrelated to parallelism in the direction of phenotypic change, θ (Mantel test, r = -0.12, p = 0.74). Similarly, neither of these measures was significantly correlated with rall (∆D x rall, r = 0.1, p = 0.3; θ x rall, r = -0.01, p = 0.88; Figure 3.4).

3.5 Discussion

Our study reports the evolutionary outcomes of a field experiment testing the responses of A. sagrei lizards to introduced predators and competitors. We have previously shown that these introductions cause strong and consistent ecological effects on the perch behavior, diet, and population size of A. sagrei (Pringle et al., 2019). Our novel evolutionary results reveal that these introductions cause some significant univariate trait changes, but not necessarily in the direction we predicted. Introduced predators and competitors did not lead to consistent changes in multivariate phenotype or genotype across treatments, and there was no correlation between parallelism (or lack thereof) across these levels. There was a correlation between phenotypic parallelism and environmental differences between islands, indicating that the inconsistent phenotypic evolution at the treatment level is partly due to island-level differences. Taken together, what do these results tell us about the predictability of evolution?

We first consider our a priori predictions about evolution in hindlimb length and number of toepad lamellae. When we began our experiment, we had clear predictions derived from decades of research in Anolis. Studies of comparative morphology (Losos and Irschick, 1996; Glossip and Losos, 1997; Beuttell and Losos, 1999), performance trials with lizards (Losos and Sinervo, 1989; Losos, 1990a,b; Losos and Irschick, 1996; Elstrott and Irschick, 2004), and field experiments (Losos et al., 1997, 2004, 2006; Kolbe et al., 2012; Stuart et al., 2014) all point to relative hindlimb length and number of toepad lamellae as targets of natural selection related to habitat use. Recently, two studies have found a lack of significant effects of L. carinatus on A. sagrei hindlimbs (Schoener et al., 2017; Lapiedra et al., 2018), but when we started our experiment these studies were not yet published. We thus predicted that introduced predators would cause A. sagrei to perch higher, which happened (Figure 3.1C). We also predicted that this increased perch height would favour traits associated with arboreality: shorter relative hindlimbs and more toepad lamellae. Instead, we found the opposite (Figures 3.1A and 3.1B).

Our analysis of perch behavior reveals a possible reason for these unexpected results. Although lizards perched higher after the introduction of predators, they did not use significantly narrower perches (Figure 3.1D, Table A.17). Laboratory trials show that it is not perch height per se that impedes the performance of long-legged lizards, but rather perch diameter (Losos and Sinervo, 1989; Losos and Irschick, 1996). It is intuitive to think of perch height and diameter as negatively correlated: trunks and branches are smaller higher up in trees. However, among all A. sagrei perch observations in our experiment, the relationship between perch height and diameter was weak (r = -0.04). Our islands were chosen, in part, because they were large enough to support the tall vegetation that is necessary for A. smaragdinus populations (Losos and Spiller, 1999). This is in contrast to the islands used in previous studies, which were smaller (e.g., 104-324 m² in Losos et al., 2004, 487-3320 m² in our study, Table A.14) and had shorter vegetation with few high, broad perches.

In an experimental introduction of L. carinatus which partly inspired our work, Losos et al. (2006) found that, immediately following the introduction, selection briefly favoured long hindlimbs in A. sagrei, presumably because they confer increased escape ability in the naive prey. It was only after a few months, once A. sagrei had shifted to more arboreal behavior, that selection favoured short limbs. The net effect was of selection for shorter limbs, which led Losos et al. (2006), and us, to predict that arboreal traits would evolve over multiple generations. However, it seems the larger vegetation of our islands allowed A. sagrei to move higher and still find wide perches that neither inhibited their performance nor selected for shorter legs or more toepad lamellae.
In fact, our result of a positive, though not significant, effect of predators on hindlimb length indicates that longer hindlimbs may continue to be favoured even once prey are no longer naive. Indeed, on the eight islands with predators, average predator density was positively and significantly correlated with change in hindlimb length (r = 0.76, p = 0.03). This is consistent with the idea that predators are selecting for increased sprint performance, not agility on narrow perches. This could be due to a landscape of fear, favouring individuals which can dart quickly down to the ground to forage before rapidly returning to a safe perch.

Ability to forage may also be related to the changes in head shape that we observed. Terrestrial ecomorphs tend to have shorter and wider heads than arboreal ecomorphs (Beuttell and Losos, 1999), which lead to stronger bite forces (Herrel et al., 2006). In Anolis, bite force is related to feeding performance through the size and hardness of prey (Herrel et al., 2006). From our diet data, we know both predators and competitors cause A. sagrei to change their diet composition (Pringle et al., 2019). Though we do not have measures of prey hardness to directly link diet change and morphology, the effects of predators and competitors on head length in males are in the direction we would expect based on the general trends in head shape across ecomorphs.

Predators had a significant negative effect on male A. sagrei body size (SVL). One can imagine a selective reason for this, as smaller lizards may be less conspicuous. But this seems doubtful: other studies find that selection favours larger lizards, both in experiments with predators (which are gape-limited, Lapiedra et al., 2018) and in anole populations more generally (Calsbeek, 2009). Two other possibilities seem more plausible. Though body size is partly heritable (Calsbeek and Smith, 2007), it is also related to food intake: lizards that eat less food do not grow as large (Bonneaud et al., 2015). Another possibility is a change in the age structure of populations with predators. Most have indeterminate growth, so a decrease in average body size could indicate a decrease in the proportion of old lizards in a population. These possible mechanisms for changes in body size are consistent with the ecological effects of predators that we have previously reported, namely changes in diet composition and suppressed population growth.

Considering the results from the univariate data, whether we deem these changes as “predictable” or not depends on exactly what we mean.
Some traits changed in consistent ways in response to the introduction of predators and competitors, and our ecological data uncover clear possible mechanisms for those changes. Change in these phenotypes does not appear to be stochastic: it could, in theory, be predicted. But, as our predictions about evolution in hindlimb length and lamellae number show, successfully making these forecasts may require information that we do not have or did not think to consider. In retrospect, it seems clear that natural selection on hindlimb length might be different on islands with taller vegetation. But, when our experiment began, it was difficult to predict the complex interplay between predator density, vegetation structure, and prey

behavior which led to our results.

The results on multivariate phenotypic change reveal a similar story. At first glance, the introduction of predators and competitors did not seem to lead to predictable change in multivariate phenotype: differences in the magnitude (∆D) and direction (θ) of phenotypic change did not vary with treatment. When we considered between-island variation, however, we found that differences in the direction of phenotypic change were correlated with environmental differences between islands. This is a clear case of cryptic environmental variation leading to non-parallel evolution (Stuart et al., 2017). Given our factorial experimental design, an analysis of parallelism at the treatment level makes sense. But the broad, discrete treatment categories obscure differences in other ecological factors, and differences in the densities of predators and competitors themselves, which are driving non-parallel evolution. It is increasingly recognized that non-parallel deterministic effects like these are an important cause of non-parallel evolution, and that not all non-parallelism is indicative of stochasticity or unpredictability (Bolnick et al., 2018).

The univariate results show another clear cause of non-parallel selection in our experiment: sexual dimorphism. A. sagrei are dimorphic in body size, shape, and behavior (Butler and Losos, 2002), and our univariate results show that the effects of predators and competitors vary across sex. Such sex-specific selection could dampen population- and treatment-level signs of parallelism. Again, accounting for these non-parallel sources of selection may be difficult when making true predictions: it may be hard to know in advance which environmental factors are important, or to predict sex-specific responses. Nevertheless, this type of unpredictability is a solvable problem. We simply require more data and a clearer understanding of ecological interactions.
At the genetic level, however, we see little evidence of non-parallel deterministic forces. Allele frequency changes were not correlated within treatments, not associated with phenotypic change, and not related to environmental differences. These results suggest that drift may be the main driver of the genetic changes we observed. This is not surprising, given the estimates of effective population size for these islands (Table A.20). Most previous field experiments with A. sagrei have not had a genetic component, and the exceptions (e.g., Kolbe et al., 2012) have not

calculated effective population size, so we do not know if our populations have unusually small Ne for island populations of A. sagrei.

A complementary explanation is that genetic adaptation is occurring, but through modes we do not have the statistical power to detect in single-SNP tests. When traits under selection have a diffuse genetic architecture (hundreds to thousands of loci affect the trait in small, difficult-to-detect ways), genetic adaptation would occur through subtle, coordinated shifts in the frequencies of multiple alleles. There is some evidence for this among the islands in the competitor treatment, based on the genome-wide significance of the harmonic mean p-value from the CMH tests. The results of our association tests are also consistent with a polygenic architecture of ecologically important traits: we found no individually significant SNPs. Of course, performing association mapping in natural populations is challenging, and we may simply lack sufficient statistical power to find associated SNPs (Santure and Garant, 2018). Polygenic selection is harder to detect than large, parallel changes in allele frequency at a few loci, though much progress in this direction is being made (e.g., Bourret et al., 2014; Gompert et al., 2017). A promising approach for detecting polygenic selection is to use time series data (e.g., Terhorst et al., 2015; Gompert, 2016; Buffalo and Coop, 2019). As our experiment continues and we accumulate more sampling points for allele frequencies, we may be able to uncover signatures of selection at the genetic level which we do not yet have the power to detect, if such selection is occurring.

Overall, our experiment shows that predicting outcomes of evolution can be challenging, even in well-studied taxa for which there is a wealth of natural history knowledge. Our phenotypic results demonstrate that even when evolution is not stochastic, it may be difficult to make accurate predictions.
Seemingly small changes in ecological context, both between experiments and within the replicates of our own experiment, can drastically alter natural selection and evolutionary outcomes. This lesson surely applies in species besides anoles: selection on many traits is driven by a complex interplay between interspecific interactions, behavior, and the environment. Detailed knowledge of all three is necessary to effectively predict evolution. At the genetic level, our results were unpredictable. Of course, it remains to be seen whether this is because genetic evolution is

truly random in our experiment, or whether future genetic sampling will reveal predictable outcomes that we do not yet have the ability to uncover.

In an analysis of Gould’s thought experiment, Beatty (2006) notes that Gould, perhaps inadvertently, proposed two quite different versions. In the first, the tape of life is rewound to an earlier point and restarted. In the second, the tape is rewound, some small change is made, and then life proceeds again. Testing the first version may be impossible: who can say that any two points in the tape of life are exactly the same? However, our experiment shows that there is great value in using large scale, replicated, manipulative field experiments to test the second. In doing so, we have found that the limits to predictability can be surprisingly near. Nevertheless, finding these limits is useful for clarifying our understanding about how natural selection operates in natural populations.

3.6 Figures

[Figure 3.1, four panels. A: Relative hindlimb length, β by interaction term (pred:year, comp:year, pred:comp:year), shown for females (F) and males (M). B: Lamellae number for forelimb and hindlimb, β by interaction term. C: Perch height, mean perch height (cm) by year (2011-2016) for the CON, PRED, COMP, and ALL treatments. D: Perch diameter, perch diameter (cm) by treatment.]

Figure 3.1: The outcomes of our predictions for univariate trait change. A and B, Estimated model parameters for the effects of predators, competitors, and their interaction on (A) hindlimb length, and (B) number of toepad scales (posterior mean ± 95% HPDI). C, Change in perch height through time (treatment mean ± 2SE, pooling across islands). D, The distribution of observed perch diameters across treatments, pooling across islands and years (excluding 2011, before the experimental transplants). Females are in lighter colours, with males in darker colours. For visual clarity we exclude rare perch diameters > 15 cm for plotting, but those observations were included in our statistical analysis.

[Figure 3.2, three panels. A: Difference in magnitude (∆D) by treatment (CON, PRED, COMP, ALL). B: Difference in direction (θ) by treatment. C: Histogram (count) of difference in direction (θ), coloured by whether the islands compared were in the same or in different treatments.]

Figure 3.2: Parallelism, or lack thereof, in the magnitude and direction of multivariate phenotypic change. A and B, The differences in (A) magnitude and (B) direction of evolution for the 6 possible pairwise comparisons within each treatment. Jittered small points show individual comparisons, while the diamonds show the treatment mean ± 2SE. C, Histogram of difference in direction, θ, for all possible pairwise comparisons. Colours indicate whether the comparison was between islands within the same treatment or between islands in different treatments.

Figure 3.3: Heatmap of the correlation in allele frequency change for all SNPs across islands. Shaded colours indicate the strength and direction of correlation for each pairwise comparison (orange if positive, blue if negative, white if 0). The Pearson correlation coefficient for all SNPs, rALL, is shown in each cell, printed in black if statistically significant and in grey if non-significant. Islands are grouped by treatment (shown with dotted lines), and the within-treatment comparisons are in the solid boxes along the diagonal.

[Figure 3.4, two panels. A: Genetic parallelism, rall, against phenotypic parallelism in magnitude (∆D). B: Genetic parallelism, rall, against phenotypic parallelism in direction (θ).]

Figure 3.4: Correlations between measures of parallelism at the phenotypic and genetic levels. A, Correlation between parallelism in magnitude of phenotypic evolution, ∆D, and parallelism in allele frequency change, rALL. B, Correlation between parallelism in direction of phenotypic evolution, θ, and parallelism in allele frequency change, rALL. Each point is a single pairwise comparison between islands. Orange lines representing linear trends are for visualization purposes only; our statistical analysis examined correlations.

Bibliography

Adams, D. C. and M. L. Collyer (2009). A general framework for the analysis of phenotypic trajectories in evolutionary studies. Evolution 63(5), 1143–1154.

Barrett, R. D. H., S. Laurent, R. Mallarino, S. P. Pfeifer, C. C. Y. Xu, M. Foll, K. Wakamatsu, J. S. Duke-Cohan, J. D. Jensen, and H. E. Hoekstra (2019). Linking a mutation to survival in wild mice. Science 363(6426), 499–504.

Beatty, J. (2006). Replaying life’s tape. Journal of Philosophy 103(7), 336–362.

Beuttell, K. and J. B. Losos (1999). Ecological morphology of Caribbean anoles. Herpetological Monographs 13, 1–28.

Blount, Z. D., R. E. Lenski, and J. B. Losos (2018). Contingency and determinism in evolution: Replaying life’s tape. Science 362(6415), eaam5979.

Bolnick, D. I., R. D. H. Barrett, K. B. Oke, D. J. Rennison, and Y. E. Stuart (2018). (Non)parallel evolution. Annual Review of Ecology, Evolution, and Systematics 49(1), 303–330.

Bonneaud, C., E. Marnocha, A. Herrel, B. Vanhooydonck, D. J. Irschick, and T. B. Smith (2015). Developmental plasticity affects sexual size dimorphism in an anole lizard. Functional Ecology 30(2), 235–243.

Bourret, V., M. Dionne, and L. Bernatchez (2014). Detecting genotypic changes associated with selective mortality at sea in Atlantic salmon: polygenic multilocus analysis surpasses genome scan. Molecular Ecology 23(18), 4444–4457.

Buffalo, V. and G. Coop (2019). The linked selection signature of rapid adaptation in temporal genomic data. bioRxiv, 1–58.

Burke, M. K., J. P. Dunham, P. Shahrestani, K. R. Thornton, M. R. Rose, and A. D. Long (2010). Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467(7315), 587–590.

Butler, M. A. and J. B. Losos (2002). Multivariate sexual dimorphism, sexual selection, and adaptation in Greater Antillean Anolis lizards. Ecological Monographs 72(4), 541–559.

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software 80(1), 1–28.

Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal 10(1), 395–411.

Calsbeek, R. (2009). Experimental evidence that competition and habitat use shape the individual fitness surface. Journal of Evolutionary Biology 22(1), 97–108.

Calsbeek, R. and T. B. Smith (2007). Probing the adaptive landscape using experimental islands: density-dependent natural selection on lizard body size. Evolution 61(5), 1052–1061.

Catchen, J., P. A. Hohenlohe, S. Bassham, A. Amores, and W. A. Cresko (2013). Stacks: an analysis tool set for population genomics. Molecular Ecology 22(11), 3124–3140.

Catchen, J. M., A. Amores, P. Hohenlohe, W. Cresko, and J. H. Postlethwait (2011). Stacks: building and genotyping loci de novo from short-read sequences. G3 1(3), 171–182.

Chang, C. C., C. C. Chow, L. C. Tellier, S. Vattikuti, S. M. Purcell, and J. J. Lee (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7.

Collyer, M. L. and D. C. Adams (2007). Analysis of two-state multivariate phenotypic change in ecological studies. Ecology 88(3), 683–692.

Danecek, P., A. Auton, G. R. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, and 1000 Genomes Project Analysis Group (2011). The variant call format and VCFtools. Bioinformatics 27(15), 2156–2158.

Danecek, P., S. S, and R. Durbin (2014). Multiallelic calling model in bcftools (-m).

Ebert, D., C. Haag, M. Kirkpatrick, M. Riek, J. W. Hottinger, and V. I. Pajunen (2002). A selective advantage to immigrant genes in a Daphnia metapopulation. Science 295(5554), 485–488.

Elstrott, J. and D. J. Irschick (2004). Evolutionary correlations among morphology, habitat use and clinging performance in Caribbean Anolis lizards. Biological Journal of the Linnean Society 83(3), 389–398.

Exposito-Alonso, M., 500 Genomes Field Experiment Team, H. A. Burbano, O. Bossdorf, R. Nielsen, and D. Weigel (2019). Natural selection on the Arabidopsis thaliana genome in present and future climates. Nature 348, 571–575.

Gelman, A. and J. Hill (2007). Data analysis using regression and multilevel/hierarchical models. New York, USA: Cambridge University Press.

Glossip, D. and J. B. Losos (1997). Ecological correlates of number of subdigital lamellae in Anoles. Herpetologica 53(2), 192–199.

Gompert, Z. (2016). Bayesian inference of selection in a heterogeneous environment from genetic time-series data. Molecular Ecology 25(1), 121–134.

Gompert, Z., A. A. Comeault, T. E. Farkas, J. L. Feder, T. L. Parchman, C. A. Buerkle, and P. Nosil (2014). Experimental evidence for ecological selection on genome variation in the wild. Ecology Letters 17(3), 369–379.

Gompert, Z., S. P. Egan, R. D. H. Barrett, J. L. Feder, and P. Nosil (2017). Multilocus approaches for the measurement of selection on correlated genetic loci. Molecular Ecology 26(1), 365–382.

Gould, S. J. (1989). Wonderful life: the Burgess Shale and the nature of history. New York, USA: W.W. Norton & Company.

Heckel, D. G. and J. Roughgarden (1979). A technique for estimating the size of lizard populations. Ecology 60(5), 966–975.

Herrel, A., R. Joachim, B. Vanhooydonck, and D. J. Irschick (2006). Ecological consequences of ontogenetic changes in head shape and bite performance in the Jamaican lizard Anolis lineatopus. Biological Journal of the Linnean Society 89(3), 443–454.

Jorde, P. E. and N. Ryman (2007). Unbiased estimator for genetic drift and effective population size. Genetics 177(2), 927–935.

Kawecki, T. J., R. E. Lenski, D. Ebert, B. Hollis, I. Olivieri, and M. C. Whitlock (2012). Experimental evolution. Trends in Ecology & Evolution 27(10), 547–560.

Kim, S. Y., K. E. Lohmueller, A. Albrechtsen, Y. Li, T. Korneliussen, G. Tian, N. Grarup, T. Jiang, G. Andersen, D. Witte, T. Jørgensen, T. Hansen, O. Pedersen, J. Wang, and R. Nielsen (2011). Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics 12(1), 231.

Kolbe, J. J., M. Leal, T. W. Schoener, D. A. Spiller, and J. B. Losos (2012). Founder effects persist despite adaptive differentiation: a field experiment with lizards. Science 335(6072), 1086–1089.

Korneliussen, T. S., A. Albrechtsen, and R. Nielsen (2014). ANGSD: Analysis of next generation sequencing data. BMC Bioinformatics 15(1), 356.

Kreitman, M. and H. Akashi (1995). Molecular evidence for natural selection. Annual Review of Ecology and Systematics 26(1), 403–422.

Lapiedra, O., T. W. Schoener, M. Leal, J. B. Losos, and J. J. Kolbe (2018). Predator-driven natural selection on risk-taking behavior in anole lizards. Science 360, 1017–1020.

Lenski, R. E., M. R. Rose, S. C. Simpson, and S. C. Tadler (1991). Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. The American Naturalist 138(6), 1315–1341.

Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27(21), 2987–2993.

Li, H. and R. Durbin (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14), 1754–1760.

Losos, J. B. (1990a). Ecomorphology, performance capability, and scaling of West Indian Anolis lizards: an evolutionary analysis. Ecological Monographs 60(3), 369–388.

Losos, J. B. (1990b). The evolution of form and function: morphology and locomotor performance in West Indian Anolis lizards. Evolution 44(5), 1189–1203.

Losos, J. B. (2009). Lizards in an evolutionary tree. Ecology and adaptive radiation of anoles. Berkeley and Los Angeles, California: University of California Press.

Losos, J. B. (2011). Convergence, adaptation, and constraint. Evolution 65(7), 1827–1840.

Losos, J. B. and D. J. Irschick (1996). The effect of perch diameter on escape behaviour of Anolis lizards: laboratory predictions and field tests. Animal Behaviour 51, 593–602.

Losos, J. B., T. W. Schoener, R. B. Langerhans, and D. A. Spiller (2006). Rapid temporal reversal in predator-driven natural selection. Science 314(5802), 1111.

Losos, J. B., T. W. Schoener, and D. A. Spiller (2004). Predator-induced behaviour shifts and natural selection in field-experimental lizard populations. Nature 432(7016), 505–508.

Losos, J. B. and B. Sinervo (1989). The effects of morphology and perch diameter on sprint performance of Anolis lizards. Journal of Experimental Biology 145(1), 23–30.

Losos, J. B. and D. A. Spiller (1999). Differential colonization success and asymmetrical interactions between two lizard species. Ecology 80(1), 252–258.

Losos, J. B., K. I. Warheit, and T. W. Schoener (1997). Adaptive differentiation following experimental island colonization in Anolis lizards. Nature 387, 70–73.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17(1), 10–12.

Nei, M., Y. Suzuki, and M. Nozawa (2010). The neutral theory of molecular evolution in the genomic era. Annual Review of Genomics and Human Genetics 11(1), 265–289.

Orozco-terWengel, P., M. Kapun, V. Nolte, R. Kofler, T. Flatt, and C. Schlotterer (2012). Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Molecular Ecology 21(20), 4931–4941.

Peterson, B. K., J. N. Weber, E. H. Kay, H. S. Fisher, and H. E. Hoekstra (2012). Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7(5), e37135.

Pringle, R. M., T. R. Kartzinel, T. M. Palmer, T. J. Thurman, K. Fox-Dobbs, C. C. Y. Xu, M. C. Hutchinson, T. C. Coverdale, J. H. Daskin, D. A. Evangelista, K. M. Gotanda, N. A. Man in 't Veld, J. E. Wegener, J. J. Kolbe, T. W. Schoener, D. A. Spiller, J. B. Losos, and R. D. H. Barrett (2019). Predator-induced collapse of niche structure and species coexistence. Nature 570(7759), 58–64.

Pringle, R. M., D. M. Kimuyu, R. L. Sensenig, T. M. Palmer, C. Riginos, K. E. Veblen, and T. P. Young (2015). Synergistic effects of fire and elephants on arboreal animals in an African savanna. Journal of Animal Ecology 84(6), 1637–1645.

Purcell, S. M. and C. C. Chang (2017). Plink v1.90b4.6. www.cog-genomics.org/plink/1.9/.

Rasband, W. (1997-2018). Imagej. https://imagej.nih.gov/ij/.

Reznick, D., H. Bryga, and J. A. Endler (1990). Experimentally induced life-history evolution in a natural population. Nature 346, 357–359.

Reznick, D. N., J. Losos, and J. Travis (2018). From low to high gear: there has been a paradigm shift in our understanding of evolution. Ecology Letters 51, 1742–12.

Santure, A. W. and D. Garant (2018). Wild GWAS — association mapping in natural populations. Molecular Ecology Resources 18(4), 729–738.

Schlotterer, C., R. Kofler, E. Versace, R. Tobler, and S. U. Franssen (2015). Combining experimental evolution with next-generation sequencing: a powerful tool to study adaptation from standing genetic variation. Heredity 114(5), 431–440.

Schneider, C. A., W. S. Rasband, and K. W. Eliceiri (2012). NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9(7), 671–675.

Schoener, T. W., J. J. Kolbe, M. Leal, J. B. Losos, and D. A. Spiller (2017). A multigenerational field experiment on eco-evolutionary dynamics of the influential lizard Anolis sagrei: a mid-term report. Copeia 105(3), 543–549.

Schweyen, H., A. Rozenberg, and F. Leese (2014). Detection and removal of PCR duplicates in population genomic ddRAD studies by addition of a degenerate base region (DBR) in sequencing adapters. The Biological Bulletin 227(2), 146–160.

Stuart, Y. E., T. S. Campbell, P. A. Hohenlohe, R. G. Reynolds, L. J. Revell, and J. B. Losos (2014). Rapid evolution of a native species following invasion by a congener. Science 346(6208), 463–466.

Stuart, Y. E., T. Veen, J. N. Weber, D. Hanson, M. Ravinet, B. K. Lohman, C. J. Thompson, T. Tasneem, A. Doggett, R. Izen, N. Ahmed, R. D. H. Barrett, A. P. Hendry, C. L. Peichel, and D. I. Bolnick (2017). Contrasting effects of environment and genetics generate a continuum of parallel evolution. Nature Ecology & Evolution 1(6), 0158.

Terhorst, J., C. Schlotterer, and Y. S. Song (2015). Multi-locus analysis of genomic time series data from experimental evolution. PLoS Genetics 11(4), e1005069.

Thurman, T. J. and R. D. H. Barrett (2016). The genetic consequences of selection in natural populations. Molecular Ecology 25(7), 1429–1488.

Travisano, M., J. A. Mongold, A. F. Bennett, and R. E. Lenski (1995). Experimental tests of the roles of adaptation, chance, and history in evolution. Science 267(5194), 87–90.

Urban, M. C., G. Bocedi, A. P. Hendry, J. B. Mihoub, G. Peer, A. Singer, J. R. Bridle, L. G. Crozier, L. De Meester, W. Godsoe, A. Gonzalez, J. J. Hellmann, R. D. Holt, A. Huth, K. Johst, C. B. Krug, P. W. Leadley, S. C. F. Palmer, J. H. Pantel, A. Schmitz, P. A. Zollner, and J. M. J. Travis (2016). Improving the forecast for biodiversity under climate change. Science 353(6304), aad8466.

Vendrami, D. L. J., L. Telesca, H. Weigand, M. Weiss, K. Fawcett, K. Lehman, M. S. Clark, F. Leese, C. McMinn, H. Moore, and J. I. Hoffman (2017). RAD sequencing resolves fine-scale population structure in a benthic invertebrate: implications for understanding phenotypic plasticity. Royal Society Open Science 4(2), 160548–17.

Vischer, N. and S. Nastase (2015). objectj. https://sils.fnwi.uva.nl/bcb/objectj/index.html.

Wilson, D. J. (2019). The harmonic mean p-value for combining dependent tests. Proceedings of the National Academy of Sciences 104, 201814092.

Zhou, X., P. Carbonetto, and M. Stephens (2013). Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genetics 9(2), e1003264.

Zhou, X. and M. Stephens (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics 44(7), 821–824.

General Discussion & Conclusion

In my thesis, I have examined the process of adaptation from both phenotypic and genetic perspectives. In Chapter 1, I used a meta-analysis of published selection coefficients to uncover the strength and form of natural selection at the genetic level, show how it varies across biological and methodological categories, and provide guidance on how best to study selection at the genetic level. As the first meta-analysis of selection coefficients at the genetic level, it begins to answer some fundamental questions about the genetic consequences of natural selection. In Chapter 2, I used novel statistical methods to show that the Heliconius erato hybrid zone in Panamá has continued to move west since the year 2000, but that it is slowing down and growing wider. I showed that deforestation, a proposed ecological factor influencing cline dynamics, is unlikely to be causing cline movement, though it may be related to the decrease in selection that is causing the cline to grow wider. By simultaneously developing new methods and applying them to a well-studied hybrid zone, my work shows the value of integrating novel techniques and multiple sources of data to better understand how evolution occurs in the wild. In Chapter 3, I tested whether Anolis sagrei would evolve predictably in response to introduced predators and competitors. Phenotypic change, though not stochastic, was related to differences in vegetation structure and population densities across islands, making advance forecasting difficult. At the genetic level, however, evolution in A. sagrei was unpredictable, which may be due to a diffuse genetic architecture for selected traits, or genetic drift in these small populations. This work shows how field experiments can be used to test and refine hypotheses of how natural selection operates on ecologically relevant traits.

Future Directions

There are a number of ways to build on these results to continue advancing our understanding of adaptation. Our meta-analysis of selection coefficients was inspired by the analysis of phenotypic selection coefficients (e.g., Kingsolver et al., 2001). Meta-analysis of phenotypic selection coefficients is now something of a field unto itself, with ever larger databases being used to answer a wide array of questions, such as whether precipitation or human influences affect the strength of selection (Siepielski et al., 2017; Fugère and Hendry, 2018). Our analysis only scratched the surface of some of these possible questions. However, future meta-analyses of genetic selection will face some serious hurdles. The meta-analysis of selection coefficients presented in Chapter 1 was published in 2016, when the calculation of selection coefficients at the genetic level was just reaching a tipping point. Though we found that roughly 80 studies had calculated individual selection coefficients, the majority of estimates came from two studies which combined field experiments and genomics to calculate selection for hundreds of alleles (Anderson et al., 2014; Gompert et al., 2014). We predicted that the scale and scope of this type of study would increase. That turned out to be an understatement. A single study can now estimate more than 100 times as many selection coefficients as we analyzed in 2016 (e.g., Exposito-Alonso et al., 2019, which calculated statistically significant estimates of s for over 400k SNPs). Incorporating such vast amounts of data will be a significant challenge for future meta-analyses, and may require the development of new statistical techniques for incorporating things like linkage disequilibrium between SNPs into the mathematical infrastructure of the random-effects meta-analysis. A (perhaps) less challenging approach would be to build on the theory we developed in section A.2. We took a phenotype-down approach to predicting the distribution of genetic selection coefficients to see how different genetic architectures for traits would transmit selection down to the underlying alleles. We made a number of simplifying assumptions (e.g., no epistasis or pleiotropy); it would be fruitful to know how these assumptions might influence our conclusion that the distribution of selection coefficients could be relatively robust to differences in genetic architecture.

In Chapter 2, the Bayesian cline model I developed has great potential to be widely used and extended. Already, I have extended my Bayesian method to allow estimation of phenotypic clines in quantitative traits, and released all these models in an open-source R package, bahz (Thurman, 2019). Other extensions can be readily imagined. My genetic model accounts for statistical dependence between sampled alleles due to inbreeding. With multi-locus genetic data, it could be extended to account for relatedness between individuals within a population. Other Bayesian models of allele frequencies and relatedness have used the beta-binomial distribution to account for this relatedness, which could be a promising approach for cline models as well (Bradburd et al., 2013). As genomic data become more readily available, another possibility would be to move beyond the estimation of single-locus clines and calculate genome-wide geographic clines. There are currently methods available to estimate genomic clines of ancestry, that is, clines where the x-axis is the hybrid index of an individual instead of the distance along a transect (Gompert and Buerkle, 2012). Again, these methods could serve as a model for calculating genome-wide geographic clines.

My thesis also demonstrates the value of long-term studies (Chapters 2 and 3). Chapter 2 was the third study of this hybrid zone over the last 30 years, and this long time series allowed us to make inferences about hybrid zone dynamics that would not have been possible within a single study, or even with only two timepoints. We have tissue samples from the majority of butterflies sampled for this study, and for some of the butterflies from Blum’s sampling in 2002. This hybrid zone would thus be a great system for studying allele frequency clines in a genome-wide context, particularly as a way to develop the theory and statistics of genome-wide clines (see above). Along a simpler route, I hope that I, or someone else, will be able to revisit the Heliconius hybrid zone in the coming decades: as the hybrid zone nears the city of Panamá and its suburbs, it will be interesting to see whether this urbanized environment alters selection or migration in a way that affects the hybrid zone. Finally, although the five-year, five-generation time period of our experiment in Chapter 3 may seem short, it is quite long for a replicated field experiment of genetic selection, as most last only a generation (e.g., Gompert et al., 2014; Exposito-Alonso et al., 2019). Multi-generation experiments like this allow biologists to test if short-term selection leads to long-term evolutionary change, and these experiments only become more useful as they continue over more generations. The experiment in the Bahamas is still ongoing. Future sampling of these populations would be valuable for further advancing our understanding of selection over contemporary timescales. Additional sampling points will be especially useful for genetic analysis, as the last few years have seen a great increase in the number of methods that can detect selection at the genetic level using time series of allele frequencies (Mathieson and McVean, 2013; Rego et al., 2019; Buffalo and Coop, 2019).

Conclusion

To return to the questions posed in the introduction, what does this dissertation tell us about the process of adaptation? First, we have learned that, when averaging across many studies, the overall effects of natural selection are surprisingly similar at the phenotypic and genetic level. In both cases, the strength of selection is roughly exponentially distributed. Also, the strength of selection varies with the length of time over which it is measured: at both levels, it is stronger when measured over shorter timescales. These results both point to generalities about natural selection itself. Returning to the idea of natural selection as a consequence of variance in fitness, our results suggest that the variance in fitness in natural populations tends to be small, though in some cases it can be quite large (and, indeed, be larger than biologists often expect, e.g., Gerbault et al., 2009). This suggests that many populations, or at least the populations that biologists choose to study, are already relatively well-adapted to their environment. The fact that weak selection is more common may, as the results of selection across time suggest, be because large variances in fitness (strong selection) cannot be maintained for long. This is another example of the way that selection "erases its traces" (Haller and Hendry, 2014).

Second, we have shown how information on genetic evolution can be used to contextualize phenotypic evolution, and vice versa, to better understand adaptation. In Chapter 2, for example, we could link changes in phenotype frequencies directly to changes in allele frequencies. By doing so, we could make use of the large theoretical and statistical literature on allele frequency clines to better describe and characterize our hybrid zone. Even in Chapter 3, where the genetic results are less clear, they provide insight into how phenotypic evolution may be occurring. The lack of genetic parallelism and of significant SNPs in the GWAS suggests that the phenotypic change we see in our experiment may have a complex genetic basis, and that there may be multiple genetic routes to the same phenotypic outcome (e.g., Feldman et al., 2016; Gould and Stinchcombe, 2017).

Third, we have shown that an understanding of the ecology and natural history of the organism or population being studied is essential to understanding why, in the proximate sense, natural selection and adaptation occur. In Chapter 2, we use data on forest cover to distinguish between possible reasons for cline movement, settling on dominance drive as the more likely explanation. And, as Chapter 3 shows, predicting evolution requires detailed knowledge of how ecology, phenotype, and genotype interact. Even small changes in ecological context can alter the outcomes of natural selection. This is an important fact to consider when trying to predict how and whether organisms may be able to adapt to the changing environments caused by anthropogenic disturbances.

Finally, the work presented in this thesis adds to the ever-growing number of examples showing that evolution and adaptation can be experimentally studied. To those with their finger on the pulse of evolutionary biology, this is not a surprise. The understanding that biologists can observe, quantify, and experimentally study natural selection and adaptation on contemporary timescales is now the dominant paradigm in the field. Nevertheless, I think it is important to reinforce and advance this paradigm through ambitious, comprehensive field experiments like the one presented in Chapter 3. Even when the results do not turn out as we expect, we learn a lot about the process of adaptation. Perhaps more importantly, the idea that natural selection and adaptation can be rapid, contemporary processes has yet to take hold outside of academic biology. Nearly every non-biologist I have ever told about my research is surprised that it is possible to study evolution as it happens over the course of a 6-year PhD. This is a problem: the world faces many biological challenges, and many, perhaps most, people have little understanding of the fundamental biological processes that are at the heart of these challenges. Hopefully, more studies like the ones presented in this dissertation will change that.

Bibliography

Anderson, J. T., C.-R. Lee, and T. Mitchell-Olds (2014). Strong selection genome-wide enhances fitness trade-offs across environments and episodes of selection. Evolution 68(1), 16–31.

Blum, M. J. (2002). Rapid movement of a Heliconius hybrid zone: evidence for phase III of Wright’s shifting balance theory? Evolution 56(10), 1992–1998.

Bradburd, G. S., P. L. Ralph, and G. M. Coop (2013). Disentangling the effects of geographic and ecological isolation on genetic differentiation. Evolution 67(11), 3258–3273.

Buffalo, V. and G. Coop (2019). The linked selection signature of rapid adaptation in temporal genomic data. bioRxiv, 1–58.

Exposito-Alonso, M., 500 Genomes Field Experiment Team, H. A. Burbano, O. Bossdorf, R. Nielsen, and D. Weigel (2019). Natural selection on the Arabidopsis thaliana genome in present and future climates. Nature 348, 571–575.

Feldman, C. R., A. M. Durso, C. T. Hanifin, M. E. Pfrender, P. K. Ducey, A. N. Stokes, K. E. Barnett, E. D. Brodie III, and E. D. Brodie Jr (2016). Is there more than one way to skin a newt? Convergent toxin resistance in snakes is not due to a common genetic mechanism. Heredity 116(1), 84–91.

Fugère, V. and A. P. Hendry (2018). Human influences on the strength of phenotypic selection. Proceedings of the National Academy of Sciences 16, 201806013.

Gerbault, P., C. Moret, M. Currat, and A. Sanchez-Mazas (2009). Impact of selection and demography on the diffusion of lactase persistence. PLoS ONE 4(7), e6369.

Gompert, Z. and C. A. Buerkle (2012). bgc: Software for Bayesian estimation of genomic clines. Molecular Ecology Resources 12(6), 1168–1176.

Gompert, Z., A. A. Comeault, T. E. Farkas, J. L. Feder, T. L. Parchman, C. A. Buerkle, and P. Nosil (2014). Experimental evidence for ecological selection on genome variation in the wild. Ecology Letters 17(3), 369–379.

Gould, B. A. and J. R. Stinchcombe (2017). Population genomic scans suggest novel genes underlie convergent flowering time evolution in the introduced range of Arabidopsis thaliana. Molecular Ecology 26(1), 92–106.

Haller, B. C. and A. P. Hendry (2014). Solving the paradox of stasis: squashed stabilizing selection and the limits of detection. Evolution 68(2), 483–500.

Kingsolver, J. G., H. E. Hoekstra, J. M. Hoekstra, D. Berrigan, S. N. Vignieri, C. E. Hill, A. Hoang, P. Gibert, and P. Beerli (2001). The strength of phenotypic selection in natural populations. The American Naturalist 157(3), 245–261.

Mathieson, I. and G. McVean (2013). Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193, 973–984.

Rego, A., F. J. Messina, and Z. Gompert (2019). Dynamics of genomic change during evolutionary rescue in the seed beetle Callosobruchus maculatus. Molecular Ecology 28, 2136–2154.

Siepielski, A. M., M. B. Morrissey, M. Buoro, S. M. Carlson, C. M. Caruso, S. M. Clegg, T. Coulson, J. DiBattista, K. M. Gotanda, C. D. Francis, J. Hereford, J. G. Kingsolver, K. E. Augustine, L. E. B. Kruuk, R. A. Martin, B. C. Sheldon, N. Sletvold, E. I. Svensson, M. J. Wade, and A. D. C. MacColl (2017). Precipitation drives global variation in natural selection. Science 355(6328), 959–962.

Thurman, T. (2019). bahz: Bayesian analysis of hybrid zones. https://github.com/tjthurman/BAHZ. R package version 0.0.0.9011.

Appendix

A.1 Supplemental Material for Chapter 1

Supplemental material for: The genetic consequences of selection in natural populations

A.1.1 Details of literature review

Web of Science searches

To find studies that reported selection coefficients at the genetic level, we searched the Thomson Reuters Web of Science database using the keywords “selection coefficient*”, “genotyp* selection”, and “adapt* gene”. We performed two sets of searches. In the first, during the week of October 29, 2013, we searched the full Web of Science database. The second round of searching, performed on March 25, 2015, was limited to the period of January 2014-March 2015. For each round of searching, we excluded some research areas that were not pertinent to evolutionary biology. The details of exact search terms and research area exclusions for each search are given below:

Topic=("selection coefficient*")
Refined by: [excluding] Research Areas=(CHEMISTRY OR DENTISTRY ORAL SURGERY MEDICINE OR ENERGY FUELS OR PHYSICS OR HISTORY PHILOSOPHY OF SCIENCE OR MATHEMATICS OR INSTRUMENTS INSTRUMENTATION OR PHARMACOLOGY PHARMACY OR POLYMER SCIENCE OR PSYCHIATRY OR OPTICS OR PSYCHOLOGY OR TELECOMMUNICATIONS OR AUTOMATION CONTROL SYSTEMS)
Timespan=All years. Databases=SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH.
Search performed: Oct. 29, 2013

Topic=("genotyp* selection")
Refined by: [excluding] Web of Science Categories=(ENGINEERING CHEMICAL OR ENGINEERING INDUSTRIAL OR ENGINEERING MECHANICAL OR ENGINEERING MULTIDISCIPLINARY OR GEOCHEMISTRY GEOPHYSICS OR GERIATRICS GERONTOLOGY OR HEMATOLOGY OR MECHANICS OR CHEMISTRY APPLIED OR MEDICINE LEGAL OR COMPUTER SCIENCE INFORMATION SYSTEMS OR METEOROLOGY ATMOSPHERIC SCIENCES OR GASTROENTEROLOGY HEPATOLOGY OR OBSTETRICS GYNECOLOGY OR MEDICINE RESEARCH EXPERIMENTAL OR PUBLIC ENVIRONMENTAL OCCUPATIONAL HEALTH OR PHYSICS CONDENSED MATTER OR PSYCHIATRY OR PSYCHOLOGY MULTIDISCIPLINARY OR RADIOLOGY NUCLEAR MEDICINE MEDICAL IMAGING OR CHEMISTRY ANALYTICAL OR SURGERY OR COMPUTER SCIENCE THEORY METHODS OR WATER RESOURCES OR ENERGY FUELS)
Timespan=All years. Databases=SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH.
Search performed Nov. 5, 2013

Topic=("adapt* gene")
Refined by: [excluding] Research Areas=(SURGERY OR AUTOMATION CONTROL SYSTEMS OR FOOD SCIENCE TECHNOLOGY OR MATERIALS SCIENCE OR OPHTHALMOLOGY OR PSYCHIATRY OR RESPIRATORY SYSTEM OR ANTHROPOLOGY OR BUSINESS ECONOMICS OR PHARMACOLOGY PHARMACY OR GENERAL INTERNAL MEDICINE OR ETHNIC STUDIES OR GERIATRICS GERONTOLOGY OR NUTRITION DIETETICS OR HISTORY OR IMAGING SCIENCE PHOTOGRAPHIC TECHNOLOGY OR PHYSICS OR MECHANICS OR RESEARCH EXPERIMENTAL MEDICINE OR METEOROLOGY ATMOSPHERIC SCIENCES OR OBSTETRICS GYNECOLOGY OR ORTHOPEDICS OR MEDICAL LABORATORY TECHNOLOGY OR RHEUMATOLOGY OR PSYCHOLOGY OR SOCIAL ISSUES OR PUBLIC ENVIRONMENTAL OCCUPATIONAL HEALTH OR SOCIOLOGY OR URBAN STUDIES OR UROLOGY NEPHROLOGY OR PEDIATRICS)
Timespan=All years. Search language=Auto
Search performed Nov. 5, 2013

TOPIC: (selection coefficient*)
Refined by: [excluding] RESEARCH AREAS: (ENGINEERING OR SPORT SCIENCES OR NUTRITION DIETETICS OR PSYCHOLOGY OR NUCLEAR SCIENCE TECHNOLOGY OR ACOUSTICS OR DENTISTRY ORAL SURGERY MEDICINE OR CHEMISTRY OR METEOROLOGY ATMOSPHERIC SCIENCES OR PUBLIC ENVIRONMENTAL OCCUPATIONAL HEALTH OR RHEUMATOLOGY OR PHYSICS OR TELECOMMUNICATIONS OR FOOD SCIENCE TECHNOLOGY OR ELECTROCHEMISTRY OR CRYSTALLOGRAPHY OR RADIOLOGY NUCLEAR MEDICINE MEDICAL IMAGING OR EDUCATION EDUCATIONAL RESEARCH OR ASTRONOMY ASTROPHYSICS OR MATERIALS SCIENCE OR ROBOTICS OR SCIENCE TECHNOLOGY OTHER TOPICS OR SPECTROSCOPY OR SURGERY OR OTORHINOLARYNGOLOGY OR PHARMACOLOGY PHARMACY OR METALLURGY METALLURGICAL ENGINEERING OR MINERALOGY OR BUSINESS ECONOMICS OR MATHEMATICAL METHODS IN SOCIAL SCIENCES OR CONSTRUCTION BUILDING TECHNOLOGY OR NEUROSCIENCES NEUROLOGY OR IMAGING SCIENCE PHOTOGRAPHIC TECHNOLOGY OR GEOCHEMISTRY GEOPHYSICS OR OPTICS OR POLYMER SCIENCE OR MECHANICS OR ORTHOPEDICS OR ENERGY FUELS OR GENERAL INTERNAL MEDICINE OR MINING MINERAL PROCESSING OR CARDIOVASCULAR SYSTEM CARDIOLOGY OR AUTOMATION CONTROL SYSTEMS OR UROLOGY NEPHROLOGY OR REHABILITATION OR INSTRUMENTS INSTRUMENTATION OR PSYCHIATRY OR THERMODYNAMICS OR TRANSPORTATION OR HISTORY PHILOSOPHY OF SCIENCE OR GEOLOGY OR GEOGRAPHY OR PHYSICAL GEOGRAPHY OR WATER RESOURCES OR OPERATIONS RESEARCH MANAGEMENT SCIENCE)
AND [excluding] WEB OF SCIENCE CATEGORIES: (STATISTICS PROBABILITY OR CRIMINOLOGY PENOLOGY OR COMPUTER SCIENCE ARTIFICIAL INTELLIGENCE OR COMPUTER SCIENCE CYBERNETICS OR URBAN STUDIES OR SOCIOLOGY OR SOCIAL SCIENCES INTERDISCIPLINARY OR SOCIAL SCIENCES BIOMEDICAL OR SOCIAL ISSUES OR PUBLIC ADMINISTRATION OR POLITICAL SCIENCE OR OPHTHALMOLOGY OR NURSING OR MATHEMATICS APPLIED OR COMPUTER SCIENCE HARDWARE ARCHITECTURE OR AREA STUDIES OR ANESTHESIOLOGY OR MATHEMATICS)
Timespan: 2014-2015. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH.
Search performed March 15, 2015.

TOPIC: ("genotyp* selection")
Timespan: 2014-2015. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH. Search language=Auto.
Search performed March 15, 2015.

TOPIC: ("adapt* gene")
Timespan: 2014-2015. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH. Search language=Auto.
Search performed March 15, 2015.

Journal Tables of Contents

We also searched the weekly tables of contents of the following journals for studies that might report selection coefficients: Nature; Science; Journal of Heredity; Evolution; Ecology Letters; Methods in Ecology and Evolution; Proceedings of the Royal Society B; Philosophical Transactions of the Royal Society; Biological Journal of the Linnean Society; Genome Research; Molecular Ecology; Nature Genetics; Heredity; Proceedings of the National Academy of Sciences of the United States of America; Journal of Evolutionary Biology.

A.1.2 Standardization of selection coefficients

Selection coefficients can be calculated under a number of population genetic models, which can complicate the comparison of selection coefficients across studies. When measuring directional selection, the selection coefficient s is equal to the difference in relative fitness between the most fit and least fit homozygote. One important but often overlooked issue is that selection co- efficients quantifying the fitness disadvantage of deleterious alleles (negative selection) are slightly different from selection coefficients quantifying the fitness advantage of beneficial alleles (positive selection). Consider a single-locus model with two alleles, A and a. Homozygotes of AA are more

130 fit than homozygotes of aa (w = fitness, wAA > waa).

Selection against a (snegative) is equal to the difference in relative fitnesses of the two ho- mozygotes, with the more-fit homozygote used as the reference (equations A.1 and A.2).

wAA waa snegative = − (A.1) wAA wAA w = 1 − aa (A.2) wAA

Because waa/wAA < 1, snegative ranges from 0 to 1. When calculating selection for A

(spositive), however, the fitness of the less-fit homozygote is used as the reference (equations A.3 and A.4).

wAA waa spositive = − (A.3) waa waa w = AA − 1 (A.4) waa

If the AA homozygote is more than twice as fit as the aa homozygote (if wAA/waa > 2), spositive will be greater than 1. Indeed, spositive could be equal to infinity, in the case of a lethal allele where waa = 0. Because spositive and snegative use different genotypes as the reference genotype, they are not directly comparable and spositive does not equal snegative. This is easily seen when one genotype is more than twice as fit as another. For example, if wAA = 10, and waa = 3, snegative = 1 − (3/10) = 0.7 and spositive = (10/3) − 1 = 2.33. The difference between snegative and spositive decreases when the magnitude of selection is smaller. For example, if wAA = 10 and waa = 9, snegative = 1 − (9/10) = 0.1 and spositive = (10/9) − 1 = 0.11. Fortunately, only simple algebraic re-arrangement is necessary to derive a conversion be- tween spositive and snegative. One can rearrange equations A.2 and A.4 so that they are equal both equal to wAA/waa, yielding equation A.5.

$$\frac{1}{1 - s_{\text{negative}}} = s_{\text{positive}} + 1 \tag{A.5}$$

Further algebraic rearrangement yields two simple equations that can be used to convert between positive and negative selection:

$$s_{\text{positive}} = \frac{s_{\text{negative}}}{1 - s_{\text{negative}}} \tag{A.6}$$

$$s_{\text{negative}} = \frac{s_{\text{positive}}}{1 + s_{\text{positive}}} \tag{A.7}$$
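The conversions in equations A.2, A.4, A.6, and A.7 can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the analysis; the function names are hypothetical, and the handling of a lethal allele (infinite positive selection mapping to a negative selection coefficient of 1) is an assumption that follows from taking the limit of equation A.7.

```python
import math


def s_negative(w_AA, w_aa):
    """Selection against a, with the fitter homozygote AA as reference (eq. A.2)."""
    return 1.0 - w_aa / w_AA


def s_positive(w_AA, w_aa):
    """Selection for A, with the less-fit homozygote aa as reference (eq. A.4)."""
    if w_aa == 0:
        return math.inf  # lethal allele: positive selection is unbounded
    return w_AA / w_aa - 1.0


def positive_to_negative(s_pos):
    """Convert positive selection to negative selection (eq. A.7)."""
    if math.isinf(s_pos):
        return 1.0  # assumed limit of s / (1 + s) as s grows without bound
    return s_pos / (1.0 + s_pos)


def negative_to_positive(s_neg):
    """Convert negative selection to positive selection (eq. A.6)."""
    if s_neg == 1.0:
        return math.inf
    return s_neg / (1.0 - s_neg)


if __name__ == "__main__":
    # Worked example from the text: w_AA = 10, w_aa = 3.
    s_pos = s_positive(10, 3)
    print(round(s_pos, 2), round(positive_to_negative(s_pos), 2))
```

Converting the positive coefficient from the worked example back with `positive_to_negative` recovers the negative coefficient calculated directly from the fitnesses, which is the property that makes the standardized estimates comparable.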

Clearly, classifying selection as either positive or negative depends on the interest of the researcher. In studies of adaptation, positive selection for new, fitter alleles is measured relative to the wild type. Studies of deleterious mutations quantify selection against the new, less fit mutant relative to the fitter wild type. Using the wild type as the reference makes biological sense, but in either case the perspective could be flipped (e.g., studies of adaptation could calculate negative selection against the now less-fit wild-type allele).

For our analysis, we used equation A.7 to convert all estimates of positive selection for more-fit alleles into estimates of negative selection against the less-fit allele so that all selection coefficients are directly comparable. We could have standardized estimates as positive selection, though this would complicate analysis: there are two lethal alleles in our dataset (s_positive = ∞), which would need to be excluded lest the mean s equal infinity. Excluding those estimates, the value of s_positive in our dataset would range from 0 to 8.3, instead of the more intuitive range of negative selection coefficients (0 to 1).

A.1.3 Details of MCMCglmm models

Presented below are the priors and model specifications for each of the generalized linear mixed models we fitted using the MCMCglmm package (Hadfield, 2010). Models 1-4 and 8 were fitted to the full dataset with selection coefficients modeled under an exponential distribution. For the fixed effect priors, we specified independent normal distributions with mean = 0 and large variance (10^9). For the random effects, we used parameter expansion to generate F priors on the working parameters of our flat inverse-Wishart priors. We used flat inverse-Wishart priors with V = 1 and confidence parameter = 0 for the residual variance. Each model included study ID as a random factor.

Models 5-7 were fitted to the subset of the dataset for which measures of error were reported, with selection coefficients modeled under a Gaussian distribution. Again, the prior distributions for the fixed effects were independent normal distributions with mean = 0 and variance = 10^9. The random effects and residual variances were both given flat inverse-Wishart priors with V = 1 and confidence parameter = 0.

To examine the effects of different priors, we also ran all models with inverse-Wishart priors (V = 1, confidence parameter = 0.002) for the random effects and residual variance, which is equivalent to an inverse gamma distribution with shape and scale equal to 0.001. For models 1-4 and 8, we did this for models with both normal and parameter-expanded priors. Though changing the priors marginally changed the point estimates of s in some cases, the 95% HPD intervals were similar such that our conclusions were supported regardless of the specifics of the prior distributions.

Model 1: Intercept-only model to determine the mean selection coefficient, no predictor variables.

    # Flat residual prior (R) and parameter-expanded prior on study.ID (G1)
    prior.1.pe.000 <- list(R = list(V = 1, n = 0),
                           G = list(G1 = list(V = 1, n = 0,
                                              alpha.mu = 0, alpha.V = 1000)))
    model.1.pe.000 <- MCMCglmm(corr.s ~ 1,
                               family = "exponential",
                               data = directional,
                               random = ~ study.ID,
                               nitt = 5100000,
                               prior = prior.1.pe.000,
                               burnin = 100000, thin = 10000,
                               verbose = T, pr = T)

Model 2: Modeling relationship between selection coefficient and time period.

    prior.2.pe.000 <- list(R = list(V = 1, n = 0),
                           G = list(G1 = list(V = 1, n = 0,
                                              alpha.mu = 0, alpha.V = 1000)))
    model.2.pe.000 <- MCMCglmm(corr.s ~ time.period - 1,
                               family = "exponential",
                               data = directional,
                               prior = prior.2.pe.000,
                               random = ~ study.ID,
                               nitt = 2100000,
                               burnin = 100000, thin = 1000,
                               verbose = T, pr = T)

Model 3: Modeling relationship between selection coefficient and unit of selection.

    prior.3.pe.000 <- list(R = list(V = 1, n = 0),
                           G = list(G1 = list(V = 1, n = 0,
                                              alpha.mu = 0, alpha.V = 1000)))
    model.3.pe.000 <- MCMCglmm(corr.s ~ unit.selection - 1,
                               family = "exponential",
                               data = directional,
                               prior = prior.3.pe.000,
                               random = ~ study.ID,
                               nitt = 500000,
                               burnin = 100000, thin = 200,
                               verbose = T, pr = T)

Model 4: Modeling relationship between selection coefficient and form of selection.

    prior.4.pe.000 <- list(R = list(V = 1, n = 0),
                           G = list(G1 = list(V = 1, n = 0,
                                              alpha.mu = 0, alpha.V = 1000)))
    model.4.pe.000 <- MCMCglmm(corr.s ~ form.of.selection - 1,
                               family = "exponential",
                               data = directional,
                               prior = prior.4.pe.000,
                               random = ~ study.ID,
                               nitt = 1100000,
                               burnin = 100000, thin = 500,
                               verbose = T, pr = T)

Model 5: Intercept-only model to determine mean selection coefficient in subset of data with measures of error. Includes study ID as a random factor and incorporates measurement error.

    model.5.default <- MCMCglmm(fixed = corr.s ~ 1,
                                family = "gaussian",
                                mev = error.term,
                                data = error.measures,
                                random = ~ study.ID,
                                nitt = 500000,
                                burnin = 10000, thin = 200,
                                pr = T)

Model 6: Intercept-only model to determine mean selection coefficient in subset of data with measures of error. Incorporates measurement error.

    model.6.default <- MCMCglmm(fixed = corr.s ~ 1,
                                family = "gaussian",
                                mev = error.term,
                                data = error.measures,
                                nitt = 500000,
                                burnin = 100000, thin = 200,
                                pr = T)

Model 7: Intercept-only model to determine mean selection coefficient in subset of data with measures of error. Includes study ID as a random factor.

    model.7.default <- MCMCglmm(fixed = corr.s ~ 1,
                                family = "gaussian",
                                random = ~ study.ID,
                                data = error.measures,
                                nitt = 500000,
                                burnin = 100000, thin = 200,
                                pr = T)

Model 8: Modeling relationship between selection coefficient and study type (experimental or observational).

    prior.8.pe.000 <- list(R = list(V = 1, n = 0),
                           G = list(G1 = list(V = 1, n = 0,
                                              alpha.mu = 0, alpha.V = 1000)))
    model.8.pe.000 <- MCMCglmm(corr.s ~ exp.obs - 1,
                               family = "exponential",
                               data = directional,
                               prior = prior.8.pe.000,
                               random = ~ study.ID,
                               nitt = 500000,
                               burnin = 100000, thin = 200,
                               verbose = T, pr = T)
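The nitt, burnin, and thin settings above determine how many posterior samples each model retains. As a rough guide, and assuming the usual relationship (iterations after burn-in divided by the thinning interval), the counts can be tabulated with a short Python sketch. The helper name is hypothetical, and the exact number stored by MCMCglmm may differ slightly when the division is not exact.

```python
def retained_samples(nitt, burnin, thin):
    """Approximate number of stored posterior samples: (nitt - burnin) / thin."""
    return (nitt - burnin) // thin


# (nitt, burnin, thin) as listed in the model specifications above
settings = {
    "model 1": (5100000, 100000, 10000),
    "model 2": (2100000, 100000, 1000),
    "model 3": (500000, 100000, 200),
    "model 4": (1100000, 100000, 500),
    "model 5": (500000, 10000, 200),
    "model 6": (500000, 100000, 200),
    "model 7": (500000, 100000, 200),
    "model 8": (500000, 100000, 200),
}

if __name__ == "__main__":
    for name, (nitt, burnin, thin) in settings.items():
        print(name, retained_samples(nitt, burnin, thin))
```

Under this assumption, most models retain about 2000 samples; model 1 uses a much longer thinning interval and so retains fewer samples from a longer chain.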

A.1.4 Determining the distribution of selection coefficients

Visually, selection coefficients roughly follow an exponential distribution. To evaluate this more rigorously, we used the fitdistrplus package in R to fit our data to exponential, gamma, log-normal, normal, and Weibull distributions (R Core Team, 2017; Delignette-Muller and Dutang, 2015). Recent meta-analyses of phenotypic selection have used a folded-normal distribution to model the distribution of phenotypic selection gradients (Hereford et al., 2004; Kingsolver et al., 2012; Morrissey and Hadfield, 2012). The folded-normal distribution is not part of the fitdistrplus package. Instead, we wrote a custom R function to determine if there was a folded-normal distribution that was not statistically different (according to a Kolmogorov-Smirnov test) from the empirical distribution of selection coefficients we observed. However, no folded-normal distributions passed this test.

For the other distributions, we compared AIC values to determine which had the best fit. A Weibull distribution with shape parameter of 0.97 and scale parameter of 0.1339 provided the best fit (AIC = -6809.824), while an exponential distribution with rate parameter 7.36 was the next-best fit (AIC = -6806.68). However, the empirical distribution of s is significantly different from both of these idealized distributions (Kolmogorov-Smirnov test, Weibull: D = 0.0685, p = 2.31 × 10^-14; exponential: D = 0.0753, p < 2.2 × 10^-16). The Weibull distribution can take many shapes, but a Weibull with a shape parameter of 1 is equivalent to an exponential distribution (Forbes et al., 2011, p. 196). Though the distribution of selection coefficients does not perfectly fit an exponential distribution, it is clearly of an exponential form.
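The relationship between the two best-fitting distributions can be checked numerically. The following Python sketch (not the fitdistrplus code used for the analysis; all function names are hypothetical) writes out the Weibull and exponential densities, shows that a Weibull with shape 1 and scale b matches an exponential with rate 1/b, and compares the means implied by the fitted parameters reported above. The AIC helper uses the standard definition, 2k − 2 ln L.

```python
import math


def weibull_pdf(x, shape, scale):
    """Weibull probability density for x > 0."""
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def exponential_pdf(x, rate):
    """Exponential probability density for x >= 0."""
    return rate * math.exp(-rate * x)


def weibull_mean(shape, scale):
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


if __name__ == "__main__":
    # Shape = 1 reduces the Weibull to an exponential with rate = 1 / scale.
    print(weibull_pdf(0.2, 1.0, 0.5), exponential_pdf(0.2, 2.0))
    # Means implied by the fitted parameters reported in the text.
    print(weibull_mean(0.97, 0.1339), 1.0 / 7.36)
```

Because the fitted shape parameter (0.97) is close to 1, the two fitted distributions imply very similar means, which is consistent with the small difference in AIC between them.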

A.1.5 Additional non-parametric statistics

In addition to the nonparametric statistics that we report in the main text, we used Wilcoxon rank-sum tests to determine whether the various biological and methodological categories had significantly different median magnitudes. In most cases, these statistical tests returned the same results. Once again, negative selection was significantly greater than positive selection in the full dataset, though this difference was not significant in the reduced dataset (full dataset: W = 1356930, p = 9.03 × 10^-4; reduced dataset: W = 9776, p = 0.09). Selection in experimental studies was of greater magnitude than selection in observational studies (full dataset: W = 603415.5, p < 2.2 × 10^-16; reduced dataset: W = 6141.5, p = 0.001). This is in slight contrast to the results of the main text; in the reduced dataset, while selection in experiments is greater than selection in observational studies, the 95% confidence intervals for these categories overlap.

When considering how selection varies with time, the Wilcoxon tests concur with the results presented in the main text: selection within a generation is not significantly different from selection over the short term (W = 217418, p = 0.10), though it is significantly stronger than long-term selection (W = 333158, p < 2.2 × 10^-16). Selection over short timescales is also stronger than selection over long timescales (W = 5070.5, p = 6.88 × 10^-9). In the reduced dataset, a Wilcoxon rank-sum test finds that selection within a generation is not significantly different from short-term selection (W = 2641.5, p = 0.88). However, the 95% confidence intervals for these categories do not overlap. Otherwise, the Wilcoxon tests concur with the main text. Selection within a generation is significantly greater than selection over long time periods (W = 4009, p = 8.19 × 10^-6), and selection over short timescales is greater than selection over long timescales (W = 5070.5, p = 6.88 × 10^-9).

In the full dataset, selection on SNPs was significantly greater than selection on haplotypes (W = 1459080, p = 2.2 × 10^-4), though this difference is marginally not significant according to the 95% confidence intervals, which barely overlap. In the reduced dataset, selection on haplotypes was significantly greater than selection on SNPs (W = 8181.5, p = 9.69 × 10^-8), in agreement with the result obtained by comparing bootstrapped means.
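For readers unfamiliar with the W statistic reported above, the following Python sketch shows one common way it is defined for a two-sample rank-sum test: the number of pairs in which a value from the first sample exceeds a value from the second, with ties counted as one half. This is an illustration with made-up numbers, not the code used for the analysis, and it does not compute p-values.

```python
def wilcoxon_w(x, y):
    """Rank-sum W statistic for sample x versus sample y.

    Counts the pairs (x_i, y_j) with x_i > y_j; tied pairs contribute 0.5.
    """
    w = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                w += 1.0
            elif xi == yj:
                w += 0.5
    return w


if __name__ == "__main__":
    # Hypothetical selection coefficients for two categories.
    category_a = [0.30, 0.50, 0.90]
    category_b = [0.10, 0.20, 0.40]
    print(wilcoxon_w(category_a, category_b))
```

W ranges from 0 to the product of the two sample sizes, so the very large values reported above reflect the large number of pairwise comparisons in the full dataset rather than the size of the difference between categories.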

A.1.6 Effect of null estimates (s = 0)

Our main empirical goals were to understand the magnitude and distribution of selection when it is operating, as opposed to the overall distribution of selection coefficients for all mutations or genetic variants. Thus, we did not include estimates of s = 0 in our analysis. However, we were curious if our results were robust to the inclusion of estimates of no selection. To explore this, we performed sensitivity analyses to determine how many estimates of s = 0 would need to be included to change our conclusions about the distribution of selection and how it changes with temporal and genetic scale. We added estimates of s = 0 to only one category (e.g., selection over short timescales) until the 95% confidence intervals for that category (determined by 10,000 bootstrap replicates) were no longer consistent with our previous results (e.g., until selection over short timescales was equal to or less than selection over long timescales). This method is conservative, as it is unlikely that all published estimates of s = 0 would fall into one category.

The results of these analyses are presented in Table A.2. In most cases, hundreds of estimates of s = 0 would need to be added to change our conclusions. Some results are particularly robust. For example, in the full dataset 9744 estimates of s = 0 for within-generation selection would need to be added before selection within a generation is significantly weaker than selection over long timescales. Other results are less robust. In the reduced dataset, selection within a generation was significantly greater than selection over the short term, but only just. Adding one estimate of s = 0 to the within-generation category would cause these confidence intervals to overlap (though this is mostly due to the far wider confidence intervals in the reduced dataset: the difference in means would still be quite large).

We did not systematically record the number of estimates of s = 0, but we noticed relatively few of these estimates outside of Anderson et al. (2014), which reported 159 (107 SNPs, 52 haplotypes) estimates of s = 0 after accounting for pseudoreplication. Adding these estimates to our database does not significantly alter our conclusions (e.g., mean s for SNPs = 0.133, 95% CI: 0.127-0.139; mean s for haplotypes = 0.122, 95% CI: 0.115-0.130). Given this, and the results of the sensitivity analysis, we expect that our overall conclusions would change very little if we included all published estimates of s = 0.
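The logic of the sensitivity analysis can be sketched as follows. This Python sketch is an illustration under stated assumptions rather than the code used for Table A.2: it uses a percentile bootstrap for the mean, it treats the conclusion "focal > comparison" as overturned once the lower confidence bound of the focal category no longer exceeds the upper bound of the comparison category (the "equal" criterion), and the function names, example data, and small number of bootstrap replicates are all hypothetical choices made to keep the example fast.

```python
import random


def bootstrap_ci(values, n_boot, rng, level=0.95):
    """Percentile bootstrap confidence interval for the mean of values."""
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((1.0 - level) / 2.0 * (n_boot - 1))]
    upper = means[int((1.0 + level) / 2.0 * (n_boot - 1))]
    return lower, upper


def zeros_needed(focal, comparison, n_boot=1000, seed=1, max_zeros=100000):
    """Count how many s = 0 estimates must be added to the focal category
    before its bootstrap CI overlaps (or falls below) the comparison CI."""
    rng = random.Random(seed)
    _, comparison_upper = bootstrap_ci(comparison, n_boot, rng)
    padded = list(focal)
    for added in range(1, max_zeros + 1):
        padded.append(0.0)  # add one more estimate of no selection
        focal_lower, _ = bootstrap_ci(padded, n_boot, rng)
        if focal_lower <= comparison_upper:
            return added
    return None  # conclusion was not overturned within max_zeros additions


if __name__ == "__main__":
    # Hypothetical data: a strongly selected category versus a weakly selected one.
    strong = [0.5] * 20
    weak = [0.1] * 20
    print(zeros_needed(strong, weak, n_boot=200))
```

A stricter criterion (the focal category becoming significantly weaker than the comparison) would compare the focal upper bound against the comparison lower bound instead, and would require more added zeros, as the paired rows of Table A.2 illustrate.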

A.1.7 Testing for Publication Bias

We performed some additional tests for publication bias. Visual diagnosis of publication bias against estimates of weak selection may be easier if selection coefficients are binned at smaller intervals. We therefore re-created Figure 1.1 using bin sizes of 0.01 (Figure A.2). These graphs show more clearly that there are many fewer estimates of selection in the smallest category (0 to 0.01) than in the second-smallest category (0.01 to 0.02).

Often, publication bias in meta-analyses can be diagnosed by observing a correlation between sample size (a proxy for experimental power) and the effect size being analyzed. Studies with smaller sample sizes are likely to have extreme values for effect sizes, as only extreme values will be significant with small sample sizes. Higher-powered studies, on the other hand, are more able to significantly detect weak effects and will have more moderate effect sizes (Palmer, 1999). We used linear regression to determine whether there was a significant relationship between sample size and estimated selection coefficient in both the full and reduced datasets (Figure A.3). In both cases the linear regressions were statistically significant, but sample size explained very little of the variation in selection coefficients (adjusted r^2 of 0.0034 in the full dataset, 0.0282 in the reduced dataset). In the full dataset, the regression could be considered weak evidence for publication bias: strength of selection tended to decrease with increasing sample size. In the reduced dataset, however, strength of selection increased with increasing sample size.

Because selection coefficients can be calculated with many different methods, sample sizes may not be an accurate proxy for experimental power across our dataset. To examine this, we used linear models to examine the relationship between sample size and the precision of each estimate (size of confidence intervals) for estimates of selection in which both sample sizes and measures of error were reported (Figure A.4). Increased sample size was associated with a small but significant decrease in the size of the error bounds, though this relationship disappeared when we excluded a small number of estimates that had extremely large sample sizes (N > 5000).

Given this, we performed a final test for publication bias by using a linear model to examine the relationship between size of error bounds and the magnitude of s (Figure A.5). The magnitude of the estimate of s increased significantly with the size of error bounds, and the effect was quite strong (slope = 0.638, adjusted r^2 = 0.85, p < 2.2 × 10^-16). This relationship is indicative of publication bias: less-powerful studies tended to report more extreme values of s than more-powerful studies. However, we could only perform this analysis on the roughly 12% (412 out of 3416) of estimates of selection for which error bounds were reported.
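The slope and adjusted r^2 values quoted above come from ordinary least-squares regression with a single predictor. A minimal Python sketch of that calculation is given below; it is not the code used for the analysis, the function name is hypothetical, and the example data are made up. It also illustrates why the adjusted r^2 can be slightly negative when the predictor explains almost none of the variation.

```python
def simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r2, adjusted_r2) for a single predictor.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    # Penalize for the one fitted predictor: n - 2 residual degrees of freedom.
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adjusted_r2


if __name__ == "__main__":
    # Hypothetical data with a weak negative trend.
    sample_size = [0, 1, 2, 3]
    s_estimate = [1, 0, 1, 0]
    print(simple_regression(sample_size, s_estimate))
```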

A.1.8 Temporal variation in selection in Anderson et al. 2014

Anderson et al. (2014) examined trade-offs in fitness across space and time by calculating correlation coefficients for selection coefficients across various episodes of selection (see Table 4 in Anderson et al., 2014). However, they focused on spatial patterns in trade-offs and did not calculate correlation coefficients for all possible comparisons within a cohort.

We calculated correlation coefficients between corrected selection coefficients for all pairwise comparisons within a cohort. For this analysis, we standardized the magnitude of selection coefficients as before, but made estimates of s positive or negative to indicate directionality (as opposed to our main analysis, in which directionality was indicated by a factor). Similar to Anderson et al. (2014), we generated 1000 null correlational matrices by permuting the selection coefficients within each episode of selection, calculating all pairwise correlations across episodes, and selecting the correlation with the greatest absolute value from each. We used these values to construct a null distribution of correlation coefficients within each cohort and set our significance threshold as the 95% quantile of that distribution. Our null model provides some estimate of statistical significance, but because we did not have access to the individual phenotype- and genotype-level data, our model does not account for linkage between sites, as in the full CNAP analysis in Anderson et al. (2014).

There were no significant within-generation tradeoffs in the Colorado 2008 or Montana 2009 cohorts. There is a significant tradeoff (r = -0.40) between overwinter survival and fall survival in the Colorado 2009 cohort, but this is the only comparison for which data are available. The Montana 2008 cohort also displayed significant tradeoffs between flowering in 2009 and overwinter survival in 2009-2010 (r = -0.55) and between fruiting in 2009 and overwinter survival in 2009-2010 (r = -0.60). However, these fitness tradeoffs in the Montana 2008 cohort did not lead to lifetime selection coefficients that were of lesser magnitude than episodic selection coefficients. This is likely because selection on flowering and fruiting was quite strong (mean magnitudes of 0.11 and 0.17, respectively), but selection on overwinter survival, though in opposition, was generally too weak (mean magnitude of 0.05) to counteract selection at the earlier, reproductive stage. In summary, some cohorts in the Anderson et al. (2014) study experienced variation in selection across life stages, but these variations did not attenuate lifetime selection coefficients.
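The permutation procedure described above can be sketched in Python. This is a simplified illustration, not the code used for the analysis: it assumes each episode of selection is a list of signed selection coefficients measured on the same loci in the same order, uses Pearson correlations, and the function names and example values are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_correlation(episodes):
    """Largest absolute pairwise correlation across episodes of selection."""
    return max(abs(pearson(a, b)) for a, b in combinations(episodes, 2))


def null_threshold(episodes, n_perm=1000, quantile=0.95, seed=1):
    """Significance threshold from permuting coefficients within each episode."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        # Shuffle each episode independently to break any association across loci.
        shuffled = [rng.sample(ep, len(ep)) for ep in episodes]
        null.append(max_abs_correlation(shuffled))
    null.sort()
    return null[int(quantile * (n_perm - 1))]


if __name__ == "__main__":
    # Hypothetical signed selection coefficients for three episodes, eight loci.
    flowering = [0.10, -0.05, 0.20, 0.15, -0.12, 0.08, 0.03, -0.02]
    fruiting = [0.12, -0.02, 0.25, 0.10, -0.15, 0.05, 0.01, -0.04]
    overwinter = [-0.04, 0.03, -0.08, -0.05, 0.06, -0.01, 0.02, 0.01]
    episodes = [flowering, fruiting, overwinter]
    print(max_abs_correlation(episodes), null_threshold(episodes, n_perm=200))
```

Taking the maximum absolute correlation from each permuted matrix, rather than every pairwise value, makes the threshold account for the number of comparisons made within a cohort.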

A.1.9 Selection on Mendelian phenotypes

Prior to the development of molecular genetic techniques, biologists were unable to directly observe an organism’s genotype. Instead, population genetic studies were restricted to phenotypes that displayed Mendelian inheritance such that underlying allele frequencies could be estimated from phenotype frequencies. Studies of the change in frequency of Mendelian phenotypes laid the foundation for the empirical study of selection in natural populations and sparked much debate about the relative roles of natural selection and genetic drift for driving allele frequency changes (Fisher and Ford, 1947; Kettlewell, 1958; Owen and Clarke, 1993; Cook, 2003).

In our literature search, we found 38 studies that reported 336 estimates of selection on alleles underlying a Mendelian phenotype. We chose to analyse these estimates of selection separately, as they infer selection on alleles by tracking changes in phenotype without directly measuring genotypes. Further, for most studies the Mendelian inheritance of the phenotype was assumed or determined with crosses, although in rare cases (e.g., Rosser et al., 2014) the location and identity of the genetic region controlling the phenotype was known. Most estimates of selection came from studies of three organisms that display some of the most famous polymorphisms known to biology: the peppered moth, Biston betularia; the scarlet tiger moth, Panaxia dominula; and the snail Cepaea nemoralis. Data sets on these organisms, especially B. betularia, have been studied intensively since the middle of the 20th century, making it difficult to account for pseudoreplication. While we could eliminate strict pseudoreplication (that is, multiple estimates of selection calculated from exactly the same data), more recent studies often incorporated data from earlier studies to calculate selection across different spatial or temporal scales. As such, many estimates of selection for B. betularia are not independent.

The distribution of selection on Mendelian phenotypes was significantly different from the distribution of directional selection coefficients at the genetic level (Kolmogorov-Smirnov test, D = 0.123, p = 1.69 × 10^-4, Figure A.6). While weaker estimates of selection were still most common, the distribution was less exponential. The mean magnitude of s for Mendelian phenotypes was 0.158 (95% CI 0.140-0.176, 10000 bootstrap replicates), significantly greater than the mean s for all selection coefficients at the genetic level (see section 1.4.2). The mean s was similar if we excluded all B. betularia estimates (mean = 0.167, 95% CI 0.144-0.192) or included only the estimates from Mathieson and McVean (2013), a recent paper which used a novel hidden Markov model and all available historical data to estimate selection coefficients for the melanic allele in B. betularia (mean = 0.161, 95% CI 0.139-0.185).

In our analysis of directional selection coefficients, we hypothesized that strength of selection would be correlated with the proportion of phenotypic variance explained by the genetic unit under selection. Thus, selection would be weaker on SNPs than on haplotypes, as SNP variants would tend to have smaller phenotypic effects. Following this logic, mean s should be greatest for the genetic variants underlying Mendelian phenotypes, which by definition explain nearly all of the variation in a trait. This was indeed the case, although the 95% confidence interval of the estimate for Mendelian phenotypes overlaps with that of the estimate for SNPs in the full data set and haplotypes in the reduced data set. Together, these data suggest that selection on Mendelian phenotypes and their underlying genes is of similar form but generally greater strength than selection on genetic units that explain less phenotypic variation.

A.1.10 Supplemental tables and figures

| Field | Description |
|---|---|
| study.ID | numerical identifier for each study that reports an estimate of s |
| study.organism | scientific name of organism in which selection was quantified |
| common.name | common name of organism in which selection was quantified |
| taxon.class | V = vertebrates, I = invertebrates, P = plant, M = microorganisms |
| unit.selection | SNP, QTL, or allele (anything not a SNP or QTL). For analysis, QTLs were lumped with alleles to create the "haplotype" category. |
| target.of.selection | the name of the genetic unit at which selection was identified (e.g., a gene name or SNP identifier) |
| sample.size | number of individual replicates used in the calculation of selection |
| sample.details | explanation of how sample size was determined or calculated |
| generation.time | generation time for the organism, if reported |
| time | time period of the study, if reported |
| gens.in.study | number of generations in study. Either reported, or calculated from other information |
| time.period | factor for analysis. Within generation, short (< 200 generations), long (≥ 200 generations), or unknown |
| form.of.selection | whether selection was positive (for) or negative (against) the genetic unit |
| s | the numerical estimate of s |
| lower.bound | the lower bound of the numerical estimate of s, if reported |
| upper.bound | the upper bound of the numerical estimate of s, if reported |
| sig | whether the estimate is statistically significant, not significant, or of unknown significance |
| error.method | the type of error bound (e.g., 95% confidence intervals, 95% HPD intervals, etc.) |
| exp.obs | whether the study was observational (no manipulation) or experimental |
| i.calculated | NO: authors directly reported s. YES: authors reported estimates of absolute and/or relative fitness, and we converted these to a measure of s against the less-fit homozygote. |
| method.Class | what general method was used to detect and quantify selection? pop.gen.known: from data on changes in allele frequency through time, authors use population genetic equations or simulations to determine the s necessary to drive such changes; used when starting and ending allele frequencies are sampled. pop.gen.unknown: as above, but when the initial allele frequency is assumed or estimated. fitness.diff: selection calculated directly from estimates of absolute or relative fitness. sequence: s estimated from features of DNA sequence data (e.g., haplotype structure, the site frequency spectrum). cline: s estimated from allele frequency differences across a cline, assuming mutation/selection balance. |
| method.Details | details about the method used to calculate s |
| notes | other relevant notes, especially location data |

Table A.1: Description of the directional selection database

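| Category | Dataset | New conclusion | Number of s = 0 added | % of dataset |
|---|---|---|---|---|
| Overall distribution | Full dataset | mean s < 0.135 | 253 | 7.41 |
| Overall distribution | Reduced dataset | mean s < 0.093 | 137 | 42.41 |
| Genetic scale | Full dataset | SNP < haplotype | 481 | 14.08 |
| Genetic scale | Full dataset | haplotype < SNP | 23 | 0.67 |
| Genetic scale | Reduced dataset | haplotype = SNP | 84 | 26.01 |
| Genetic scale | Reduced dataset | haplotype < SNP | 553 | 171.21 |
| Time period | Full dataset | within < short | 1776 | 51.99 |
| Time period | Full dataset | within = long | 4847 | 141.89 |
| Time period | Full dataset | within < long | 9744 | 285.25 |
| Time period | Full dataset | short = long | 89 | 2.61 |
| Time period | Full dataset | short < long | 372 | 10.89 |
| Time period | Reduced dataset | within = short | 1 | 0.31 |
| Time period | Reduced dataset | within < short | 107 | 33.13 |
| Time period | Reduced dataset | within = long | 56 | 17.34 |
| Time period | Reduced dataset | within < long | 343 | 106.19 |
| Time period | Reduced dataset | short = long | 89 | 27.55 |
| Time period | Reduced dataset | short < long | 372 | 115.17 |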

Table A.2: Results of a sensitivity analysis to determine how many estimates of no selection (s = 0) are necessary to change our conclusions about the strength of selection and how it varies with genetic and temporal scales. The "New conclusion" column lists the conclusion that would contradict our earlier results. The last two columns show the number and percentage (in terms of the original dataset) of estimates of s = 0 that must be added to the dataset to reach the new conclusion.

[Figure A.1 image: two stacked histogram panels, "negative" and "positive"; x-axis: selection coefficient (0 to 1); y-axis: # of estimates]

Figure A.1: The distribution of selection coefficients in the full dataset, categorized by direction of selection (positive or negative).

[Figure A.2 image: two histogram panels, A and B, with bars coloured by study (Anderson, Gompert, Other); x-axis: selection coefficient (0 to 1); y-axis: # of estimates]

Figure A.2: Analysis of publication bias: Figure 1.1 of the main text, with selection coefficients binned at 0.01 intervals

[Figure A.3 image: two scatterplot panels; x-axis: sample size; y-axis: s. Full dataset: slope = −1.285486e−05, adj. r^2 = 0.0034, significant. Reduced dataset: slope = 1.24944e−05, adj. r^2 = 0.0282, significant]

Figure A.3: Analysis of publication bias: the relationship between sample size and estimated selection coefficient, s, in the full and reduced datasets.

[Figure A.4 image: two scatterplot panels; x-axis: sample size; y-axis: size of error bound. Top (all data with error bounds): slope = −1.4152e−05, adj. r^2 = 0.015, significant. Bottom (N < 5000): slope = −2.6108e−06, adj. r^2 = −0.002, not significant]

Figure A.4: Analysis of publication bias: the relationship between sample size and precision of estimation of s, for all data which reported error bounds (top) and excluding studies with N > 5000 (bottom).

[Figure A.5 image: scatterplot; x-axis: size of error bound; y-axis: s. Slope = 0.6385368, adj. r^2 = 0.85, significant]

Figure A.5: Analysis of publication bias: the relationship between the precision of the estimate of s (size of error bounds) and the magnitude of s, for all selection coefficients with error bounds.

[Figure A.6 image: histogram panels; x-axis: selection coefficient (0 to 1); y-axis: # of estimates. Panel A: all estimates. Panel B: estimates split by statistical significance: Significant (n = 115), Insignificant (n = 105), Not reported (n = 116)]

Figure A.6: The distribution of selection coefficients for Mendelian phenotypes. (A) The distribution of all selection coefficients. (B) The distribution of selection coefficients across different categories of statistical significance.

A.2 Theory Appendix to Chapter 1

Genetic architecture and the distribution of phenotypic and genetic selection coefficients

Sarah P. Otto^1, Timothy J. Thurman^2,3, Rowan D.H. Barrett^2

Author affiliations:
^1 Department of Zoology, University of British Columbia
^2 Redpath Museum and Department of Biology, McGill University
^3 Smithsonian Tropical Research Institute

In this appendix, we explore the connection between the distribution of selection coefficients acting at the phenotypic level and at the genetic level. Our key assumption is that, at the phenotypic level, selection coefficients are approximately exponentially distributed, as seen in meta-analyses of phenotypic selection (Kingsolver et al., 2001, 2012):

$$f(\beta) \approx \frac{1}{\mu} e^{-\frac{\beta}{\mu}} \tag{A.8}$$

where f(β) is the distribution describing the selective importance of different phenotypic changes, β represents the selection coefficient for a unit change in a particular phenotype, and µ is a scaling constant that represents the average strength of selection acting on a mutation of unit effect. While the unit change typically considered is the phenotypic standard deviation of a trait, in this appendix we measure β on a scale given by the average effect of mutations that alter that phenotype, which itself should be proportional to the phenotypic standard deviation, all else being equal. Another key assumption is that we consider only beneficial mutations and assume that the phenotypic axes are oriented such that positive changes are selectively favored. Epistasis and pleiotropy are also ignored.

Recognizing that phenotypic selection must be transmitted to the genetic level through the genetic architecture of the trait(s) under selection, we derive the distribution of genetic selection coefficients by starting with the empirical distribution of phenotypic selection coefficients and working down to the genetic level. We model both a genetic architecture with many uniformly small-effect loci ("infinitesimal" model) and an "exponential-like" genetic architecture (i.e., traits are controlled by one or a few loci with large phenotypic effects and many loci with small phenotypic effects). We assume that there are a large number of phenotypic traits or dimensions upon which selection can act. A mutation that occurs has an effect size on the phenotype, x, which itself is drawn from a distribution that describes the genetic architecture of the trait under selection. The actual selection on that mutation is thus its phenotypic effect, x, multiplied by the impact of the affected phenotype on fitness, β, the product of which will generate the selective effect of the mutation, s. The selective effect can thus be found by integrating over the distribution of mutational effects, after replacing β with s/x:

\[ g(s) = \int_{\min(x)}^{\max(x)} \mathrm{pdf}(x)\, \frac{1}{\mu}\, e^{-\frac{s}{\mu x}}\, dx \tag{A.9} \]

where pdf(x) is the probability density function for the genetic effect and g(s) is the probability density function for mutations having selective effect s. By integrating over all possible genetic effect sizes, equation (A.9) gives the probability density function for genetic mutations that have selective effect s. Although, in reality, the effect sizes of mutations affecting a particular trait may be correlated (i.e., some traits may be influenced by genes of larger effect than others), we ignore these correlations and seek the overall distribution of selective effects for mutations across all phenotypes.

The final distribution of selective effects, s, will be influenced by the genetic architecture of the phenotype under selection (i.e., the distribution of x). With an "infinitesimal" genetic architecture, each locus has a similar, small effect on the phenotype. This may be approximately modeled by a uniform distribution of pdf(x), where:

\[ \mathrm{pdf}(x) = \frac{1}{\max(x) - \min(x)} \tag{A.10} \]

As the units for x, we assume that the average effect of a mutation on the phenotype is one (i.e., that is how the phenotypes are scaled). Consequently (max(x) + min(x))/2 = 1, which, after integrating across the maximum and minimum values of x, can be used to rewrite equation (A.9) as:

\[ g(s) = \frac{ e^{-\frac{s}{\mu(2-\max(x))}}\,(2-\max(x)) \;-\; e^{-\frac{s}{\mu\max(x)}}\,\max(x) \;-\; \frac{s}{\mu}\left( \Gamma\!\left(0, \frac{s}{\mu(2-\max(x))}\right) - \Gamma\!\left(0, \frac{s}{\mu\max(x)}\right) \right) }{ 2\mu\,(1-\max(x)) } \tag{A.11} \]

where Γ represents the incomplete gamma function (Abramowitz and Stegun, 1972). The mean of equation (A.11) is µ(1 + (max(x) − 1)²/3). When all genetic effects are identical, max(x) = average(x) = 1, equation (A.11) simplifies to an exponential distribution:

\[ g(s) = \frac{1}{\mu}\, e^{-s/\mu} \tag{A.12} \]

Though this distribution is only exactly exponential in the limit of equal effect sizes, it has an exponential form even when min(x) ≠ max(x). Assuming that µ = 0.1 (i.e., the average selection coefficient of a mutant of average effect is 0.1), we can plot the resulting distribution of s (in red), and compare it to an exponential distribution with the same mean (dashed curve):

Figure A.7: Distribution of s with uniform genetic architecture and µ = 0.1

Even with the largest range possible for a mean phenotypic effect of 1 (i.e., max(x) = 2), the resulting distribution is quite similar in shape to an exponential distribution:

Figure A.8: Distribution of s with uniform genetic architecture and µ = 1

Thus, if the selection coefficients acting on phenotypic traits are exponentially distributed, and the distribution of genetic effects of mutations is uniform, then the distribution of selection coefficients of mutations is roughly, though not exactly, exponentially shaped. To quantify the departure, we calculate the coefficient of variation (CV) for distribution (A.11). Even in the worst case scenario (max(x) = 2), the CV is √5/2 ≈ 1.12 and remains close to that for an exponential distribution (CV = 1).

Next we consider the distribution of selective effects when the distribution of genetic effects is itself exponential, pdf(x) = e^{−x}, where we again set the mean phenotypic effect of mutations to be one, without loss of generality. This corresponds to a model of genetic architecture in which traits are controlled by a few loci of large effect and many loci of small effect. By substituting this density function into equation (A.9), integrating, and simplifying, we obtain:

\[ g(s) = \frac{2}{\mu}\, \sqrt{\frac{s}{\mu}}\; K_1\!\left( 2\sqrt{\frac{s}{\mu}} \right) \tag{A.13} \]

where K_n(z) is the modified Bessel function of the second kind (Abramowitz and Stegun, 1972). We can again set µ = 0.1 and compare this distribution (in red) to an exponential distribution with the same mean (dashed curve):

Figure A.9: Distribution of s with exponential genetic architecture and µ = 1

The resulting distribution is more L-shaped and clearly not perfectly exponential. However, it again follows the general exponential form: there are many small selection coefficients, and few large selection coefficients. Furthermore, the CV for this distribution, √2 ≈ 1.4, remains close to that for the exponential distribution, regardless of µ or s.
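The two CV values quoted in this appendix can be checked by hand. The following is a sketch of one route, assuming equation (A.9) exactly as written and x > 0; the moment identity on the first line is our own intermediate step, not a formula stated in the text.

```latex
\begin{align*}
% Raw moments of s implied by (A.9): swap the order of integration
\mathbb{E}[s^k] &= \int \mathrm{pdf}(x)\,\frac{1}{\mu}\int_0^\infty s^k e^{-s/(\mu x)}\,ds\,dx
                 = k!\,\mu^k\,\mathbb{E}[x^{k+1}] \\
% Uniform architecture on [0, 2]: E[x^2] = 4/3, E[x^3] = 2
\text{uniform, } \max(x)=2:\quad
\mathbb{E}[s] &= \tfrac{4}{3}\mu,\qquad \mathbb{E}[s^2] = 4\mu^2,\qquad
\mathrm{CV} = \frac{\sqrt{4\mu^2-\tfrac{16}{9}\mu^2}}{\tfrac{4}{3}\mu} = \frac{\sqrt{5}}{2}\approx 1.12 \\
% Exponential architecture with mean 1: E[x^2] = 2, E[x^3] = 6
\text{exponential:}\quad
\mathbb{E}[s] &= 2\mu,\qquad \mathbb{E}[s^2] = 12\mu^2,\qquad
\mathrm{CV} = \frac{\sqrt{12\mu^2-4\mu^2}}{2\mu} = \sqrt{2}\approx 1.41
\end{align*}
```

In both cases µ cancels from the ratio, which is why the CV does not depend on the scaling constant. The uniform-case mean, (4/3)µ, also agrees with the expression µ(1 + (max(x) − 1)²/3) evaluated at max(x) = 2.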

A.3 Supplemental Material for Chapter 2

Supplemental material for: Movement of a Heliconius hybrid zone over 30 years: a Bayesian approach

A.3.1 Details on collection data

For Mallet’s collections, we used the locality and phenotype data presented in Table 4 of Mallet (1986). Mallet sampled 20 sites across Panamá, but some of these sites were not on the main transect he used to estimate cline parameters. We included 15 of Mallet’s sites in our analysis. We excluded 5 sites (which Mallet also excluded) for being on the coast and too far from the main transect, on islands off the mainland, or too far east. However, we included two sites (Río Iglesias and Madden Dam) that Mallet excluded from his cline estimates.

For Blum’s collections, we extracted locality and phenotype data from Table 1 of Blum (2002). Blum collected at 24 sites, but did not include all sites in his calculation of the cline. We included 22 sites in our analysis, excluding 2 sites (which Blum also excluded) for being on islands off the mainland. We included two sites (Pipeline Road and Madden Dam) that Blum excluded from his cline estimates. Blum did not use all four phenotypic categories, only H. e. hydara, H. e. demophoon, and heterozygotes.

N.B.: the taxonomy of H. erato has changed since the publication of these earlier papers. H. erato demophoon in Panamá was previously considered H. erato petiverana, and goes by that older name in Mallet (1986) and Blum (2002).

For the 2015 collections, at some sites we collected at multiple subsites to achieve sufficient sample sizes. Before fitting clines, we tested for genetic differentiation between subsites at the same site using Fisher’s exact tests on allele counts (pooling the CrWC and CrCA alleles). We found no evidence of differentiation (all P > 0.05, results not shown), and thus combined subsites, using the GPS coordinates of the subsite with the most samples or, if sampling was equal, of a randomly selected subsite.

A.3.2 Phenotyping, genotyping, and estimation of allele frequencies

In Panamá there are three alleles at the Cr locus, which we designate: (1) CrHYD, the dominant, black-hindwing allele found in H. e. hydara; (2) CrWC, the recessive, ventral-only yellow allele found in the west Colombian H. e. venus; and (3) CrCA, the recessive, yellow allele found in the Central American H. e. demophoon. The dominance relationship is: CrHYD is dominant to CrWC which is dominant to CrCA (Mallet, 1986). Given this dominance relationship, we can assign genotypes to the four phenotypic classifications:

(A) The north Colombian race, H. e. hydara, with fully black hindwings, is homozygous for CrHYD.

(B) Heterozygotes, with black hindwings that display a faint yellow bar on the ventral side, could be either CrHYD/CrCA or CrHYD/CrWC. These genotypes cannot be distinguished visually.

(C) The west Colombian race, H. e. venus, with the yellow hindwing band present only on the ventral side, could be either CrWC/CrCA or CrWC/CrWC. These genotypes cannot be distinguished visually.

(D) The Central American race, H. e. demophoon, with the yellow hindwing band on the dorsal and ventral sides, is homozygous for CrCA.

Given these phenotype possibilities, the frequency of the CrHYD allele can be directly observed as:

\[ f(Cr_{HYD}) = \frac{2A + B}{2(A + B + C + D)} \tag{A.14} \]

where A, B, C, and D are the numbers of individuals in each phenotypic class.

The combined frequency of the yellow alleles, f(Cryel), is simply 1 − f(CrHYD). However, determining which yellow allele(s) are present in a population and calculating their frequencies is less straightforward. For populations where f(Cryel) > 0, there are 4 possible situations. When neither yellow homozygote is present in the population, we assume that any heterozygotes have the CrCA allele, as this is more common in our study populations.

In populations with both types of yellow allele (e.g., with both type C and type D individuals), the frequencies of the individual yellow alleles cannot be directly observed. The appendix of Mallet (1986) presents a maximum likelihood method for partitioning f(Cryel) into f(CrCA) and f(CrWC). Assuming the locus is at Hardy-Weinberg equilibrium, the ratio between f(CrCA) and f(CrWC) can be calculated as:

\[ \frac{f(Cr_{CA})}{f(Cr_{WC})} = \frac{f_D + f_D(f_C + f_D)}{f_C} \tag{A.15} \]

where f(x) is the frequency of allele x and f_X is the frequency of phenotype X. Allele frequencies must add up to one such that

\[ f(Cr_{HYD}) + f(Cr_{WC}) + f(Cr_{CA}) = 1 \tag{A.16} \]

We can solve equation A.15 for f(CrWC):

\[ f(Cr_{WC}) = \frac{f(Cr_{CA})\, f_C}{f_D + f_D(f_C + f_D)} \tag{A.17} \]

We can then substitute this result into equation A.16 and rearrange to derive an equation for the frequency of the Central American allele:

\[ f(Cr_{CA}) = \frac{\left[f_D + f_D(f_C + f_D)\right]\left(1 - f(Cr_{HYD})\right)}{f_C + f_D + f_D f_C + f_D^2} \tag{A.18} \]

Using this equation, we can calculate allele frequencies for all three alleles. Blum (2002) did not distinguish between west Colombian alleles and Central American alleles. But, applying this method to the collections data from Mallet (1986) and this paper, we find that the west Colombian CrWC allele is rare in our study (Tables A.6 and A.7). Thus, for this study we focus on the dominant CrHYD allele, which can be directly observed, and pool the rare CrWC yellow allele with the more common CrCA yellow allele, as they cannot be visually distinguished in heterozygotes.
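The allele-frequency calculations in this section can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used for the analysis: the function name and the dictionary keys are hypothetical, and the CrHYD frequency is computed by dividing by the total number of allele copies, 2(A + B + C + D), which is the form that reproduces the values reported in Table A.7.

```python
def cr_allele_frequencies(a, b, c, d):
    """Estimate Cr allele frequencies from counts of phenotype classes A-D.

    a, b, c, d are the numbers of individuals scored as classes (A)-(D).
    Function name and return format are illustrative assumptions.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("no individuals sampled")

    # CrHYD frequency: class A carries two copies, class B carries one.
    f_hyd = (2 * a + b) / (2 * n)
    f_yel = 1.0 - f_hyd  # combined frequency of the two yellow alleles

    f_c, f_d = c / n, d / n  # phenotype frequencies of classes C and D
    if f_c == 0 and f_d == 0:
        # Neither yellow homozygote class observed: following the text,
        # assign any yellow alleles to CrCA.
        return {"HYD": f_hyd, "CA": f_yel, "WC": 0.0}

    # Partition the yellow alleles using the ratio in equation (A.15),
    # rearranged as in equation (A.18).
    ca_weight = f_d + f_d * (f_c + f_d)
    f_ca = ca_weight * f_yel / (f_c + ca_weight)
    return {"HYD": f_hyd, "CA": f_ca, "WC": f_yel - f_ca}


# Counts for the 2015 site "el llano" from Table A.4: A=8, B=16, C=1, D=9.
el_llano = cr_allele_frequencies(8, 16, 1, 9)
```

Rounded to two decimals, the el llano example gives the same three frequencies listed for that site in Table A.7.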

A.3.3 Assembling transect

To calculate the one-dimensional distance along the transect, we calculated the arclength of the cubic transect accounting for both the curvature of the transect and the curvature of the earth. To calculate the distance between a and b, we evaluate:

\[ R\,\frac{\pi}{180} \int_a^b \sqrt{ \cos^2\!\left( \frac{f(x)\,\pi}{180} \right) + f'(x)^2 }\; dx \tag{A.19} \]

where R is the radius of the earth at the equator, in km, a and b are longitudes for the two sites, and f(x) is the equation describing the cubic transect. For our calculations, we used R = 6378.137 km.
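As a sketch of how the integral in equation (A.19) might be evaluated numerically, the Python below applies Simpson's rule to a cubic transect. It assumes that f(x) returns latitude in decimal degrees as a function of longitude x in decimal degrees; the function name, the coefficient ordering, and the choice of Simpson's rule are illustrative assumptions, and the cubic coefficients of the actual transect are not given in this appendix.

```python
import math

EARTH_RADIUS_KM = 6378.137  # equatorial radius quoted in the text


def transect_distance(coeffs, lon_a, lon_b, n_steps=1000):
    """Distance (km) along latitude = f(longitude) between two longitudes.

    coeffs = (c0, c1, c2, c3) defines the hypothetical cubic
    f(x) = c0 + c1*x + c2*x**2 + c3*x**3, with x and f(x) in degrees.
    """
    c0, c1, c2, c3 = coeffs

    def f(x):
        return c0 + c1 * x + c2 * x**2 + c3 * x**3

    def f_prime(x):
        return c1 + 2 * c2 * x + 3 * c3 * x**2

    def integrand(x):
        # East-west distance shrinks with cos(latitude); f'(x) adds the
        # north-south component.
        return math.sqrt(math.cos(math.radians(f(x))) ** 2 + f_prime(x) ** 2)

    # Composite Simpson's rule needs an even number of intervals.
    if n_steps % 2:
        n_steps += 1
    h = (lon_b - lon_a) / n_steps
    total = integrand(lon_a) + integrand(lon_b)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * integrand(lon_a + i * h)
    integral = total * h / 3
    return abs(EARTH_RADIUS_KM * math.pi / 180 * integral)


# Sanity check: one degree of longitude along the equator (f(x) = 0).
one_degree_at_equator = transect_distance((0.0, 0.0, 0.0, 0.0), -80.0, -79.0)
```

With a flat transect on the equator the integrand is 1 everywhere, so the result reduces to R·π/180 per degree of longitude, which is a convenient check on the implementation.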

A.3.4 Cline model equations

There are a number of parameterizations for cline introgression tails, with slightly differing equations. We use the parameterization from Gay et al. (2008), though our equations are modified to work with clines of increasing allele frequency. We use three equations: one to describe the left tail, one to describe the cline center, and one to describe the right tail. For each model, we include introgression tails as necessary, otherwise the equation for cline center is used to describe the cline. For the mirrored tail model the parameters for the left and right tails are equal, such that δ_L = δ_R = δ_M and τ_L = τ_R = τ_M. The equations are:

Equation for left tail (when x_i ≤ c − δ_L):

\[ p_i = p_{min} + (p_{max} - p_{min})\, \frac{1}{1 + e^{4\delta_L/w}}\, \exp\!\left( \frac{4\tau_L (x_i - c + \delta_L)/w}{1 + e^{-4\delta_L/w}} \right) \tag{A.20} \]

Equation for center (when c − δ_L < x_i < c + δ_R):

\[ p_i = p_{min} + (p_{max} - p_{min})\, \frac{e^{4(x_i - c)/w}}{1 + e^{4(x_i - c)/w}} \tag{A.21} \]

Equation for right tail (when x_i ≥ c + δ_R):

\[ p_i = p_{min} + (p_{max} - p_{min}) \left[ 1 - \frac{1}{1 + e^{4\delta_R/w}}\, \exp\!\left( \frac{-4\tau_R (x_i - c - \delta_R)/w}{1 + e^{-4\delta_R/w}} \right) \right] \tag{A.22} \]
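The three cline equations can be combined into a single piecewise function. The sketch below is one possible Python rendering, not the Stan code used for model fitting; the function name and the convention of passing None to omit a tail are assumptions made for illustration.

```python
import math


def cline_frequency(x, center, width, p_min, p_max,
                    delta_l=None, tau_l=None, delta_r=None, tau_r=None):
    """Expected allele frequency at transect position x for an increasing cline.

    Tails are included only when the corresponding delta is supplied;
    otherwise the center equation (A.21) is used everywhere.
    """
    if delta_l is not None and x <= center - delta_l:
        # Left introgression tail, equation (A.20).
        scale = 1.0 / (1.0 + math.exp(4.0 * delta_l / width))
        rate = (4.0 * tau_l * (x - center + delta_l) / width) / (
            1.0 + math.exp(-4.0 * delta_l / width))
        core = scale * math.exp(rate)
    elif delta_r is not None and x >= center + delta_r:
        # Right introgression tail, equation (A.22).
        scale = 1.0 / (1.0 + math.exp(4.0 * delta_r / width))
        rate = (-4.0 * tau_r * (x - center - delta_r) / width) / (
            1.0 + math.exp(-4.0 * delta_r / width))
        core = 1.0 - scale * math.exp(rate)
    else:
        # Sigmoid center, equation (A.21); 1/(1+e^-z) equals e^z/(1+e^z).
        z = 4.0 * (x - center) / width
        core = 1.0 / (1.0 + math.exp(-z))
    return p_min + (p_max - p_min) * core


# Posterior means for the 2015 right-tail model (Table A.10).
p_at_center = cline_frequency(450.89, center=450.89, width=93.18,
                              p_min=0.04, p_max=0.94,
                              delta_r=24.81, tau_r=0.66)
```

A useful property to check is continuity: at x = c − δ_L and x = c + δ_R the tail equations return the same value as the center equation, so the piecewise curve has no jumps.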

A.3.5 Simulated data and model validation

To test our model, we simulated genotypic data from clines and compared our model estimates to the simulated parameters. For each simulated collection site, we used the cline equation without introgression tails (equation 2.3 in the main text, A.21 above) to calculate the expected allele frequency, p, at that site. Then, following equation 2.2 from the main text, we calculated predicted genotype frequencies given the allele frequency, p, and the simulated level of inbreeding, FIS. From these genotype frequencies, we simulated genotypes of diploid individuals by drawing from the multinomial distribution of genotype frequencies, following equation 2.1 from the main text. Each simulated dataset consisted of 41 collection sites spread at 10 km intervals from 0 to 400 km along a transect, with 40 individuals collected at each site.

We simulated datasets under a variety of parameters. We held the center of the cline constant at 200 km while varying the other parameters: cline width of 20 km and 80 km, pmin of 0.04 and 0.15, pmax of 0.85 and 0.97, and FIS of 0, 0.1, 0.25, 0.5, 0.75, and 1. There were thus 48 different possible parameter combinations. For each parameter combination we simulated 15 datasets, for a total of 720 simulated datasets.

For each simulated dataset, we fit the cline models two ways: (1) using our novel Bayesian model, and (2) using a maximum likelihood approach in the R package HZAR (Derryberry et al., 2014) and applying the effective sample size correction of Alexandrino et al. (2005). We refer to these approaches as (1) Bayesian and (2) corrected ML.

For the Bayesian approach, we fit the cline model without introgression tails in Stan v2.17.0 and RStan v2.17.3 (Carpenter et al., 2017; Stan Development Team, 2018). We placed weak normal priors on the center N(350, 100) and width N(50, 100), both constrained to be positive. For pmin and pmax, we used uniform priors of U(0, 0.2) and U(0.8, 1), respectively. We fit four independent chains with 3000 iterations of warm-up and 7000 iterations of sampling, for a total of 28000 samples from the posterior distribution. Chains were run in parallel across 4 processor cores. We generated point estimates and credible intervals for each parameter using the mean and 95% highest posterior density interval (HPDI) of the marginal posterior distribution of each parameter.

For the corrected ML approach, we fit models in HZAR following the example code given in appendix 1 of Derryberry et al. 2014, but removing unnecessary visualization steps to speed model fitting. We fit only the model without introgression tails (“free.none”). We used default settings for all functions and ran chains in parallel. However, we modified the initialization values for all parameters. The models would often fail to fit using the default initialization values, so instead we drew random starting values for each parameter from the same distributions we used to initialize our Bayesian models: for center, a normal distribution with mean equal to the simulated value and a standard deviation of 20; for width, a normal distribution with mean equal to the simulated value and a standard deviation of 15; and for pmin and pmax uniform distributions of U(0, 0.2) and U(0.8, 1), respectively. For each parameter, we used the ML value as the point estimate and the lower and upper two-unit log-likelihood limits as the lower and upper confidence limits (Derryberry et al., 2014).

We used the R package tictoc to time each individual instance of model fitting and compare across methods (Izrailev, 2014). All model fitting was done on a Mac Pro with a 3-GHz, 8-core Intel Xeon processor with 64 GB of RAM. The average time to fit a model using our Bayesian approach was 30 seconds, while the average runtime for the corrected ML approach was 5.81 minutes.

To compare model accuracy, we calculated the root-mean-square deviation (RMSD) between the estimated parameter values from our models and the simulated values for each of the 48 combinations of parameters. RMSD is a measure of accuracy, with lower values indicating a more accurate model (i.e., smaller average squared differences between the estimated parameter value and the simulated parameter value). For each of the 48 parameter combinations, we calculated the RMSD of each model and for each cline parameter (RMSD is scale-dependent and cannot be compared across parameters).

We used paired t-tests to examine whether the average RMSD across parameter combinations differed between our Bayesian approach and the corrected ML approach. Our Bayesian approach had a significantly smaller average RMSD (was more accurate) than the corrected ML approach for the center and width parameters (Table A.5). However, these differences in accuracy were relatively small. For the pmin and pmax parameters the difference in average RMSD between the models did not differ significantly from 0 (Table A.5).

As another measure of model accuracy, we also calculated how often the “true” simulated value of a parameter was included within the confidence intervals estimated by the model. For our Bayesian method, the true simulated value fell within the 95% credible intervals 93.33% of the time. This is a slight improvement over the ML model (92.71%), and indicates that our method better models uncertainty around parameter estimates.
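The simulation design described in this section can be sketched as follows. This is a minimal Python illustration, not the code used in the study. Equations 2.1 and 2.2 of the main text are not reproduced in this appendix, so the genotype-frequency formula below is an assumption: it uses the standard form in which FIS reduces heterozygosity relative to Hardy-Weinberg proportions. All function and variable names are hypothetical.

```python
import itertools
import math
import random


def cline_center(x, center, width, p_min, p_max):
    """Expected allele frequency without introgression tails (equation A.21)."""
    z = 4.0 * (x - center) / width
    return p_min + (p_max - p_min) / (1.0 + math.exp(-z))


def genotype_probs(p, f_is):
    """Genotype frequencies (AA, Aa, aa) given allele frequency p and F_IS.

    Assumed standard form; the main-text equation 2.2 is not shown here.
    """
    q = 1.0 - p
    return (p * p + f_is * p * q, 2.0 * p * q * (1.0 - f_is), q * q + f_is * p * q)


def simulate_dataset(center, width, p_min, p_max, f_is, rng, n_ind=40):
    """Simulate genotype counts at 41 sites spaced every 10 km from 0 to 400 km."""
    rows = []
    for x in range(0, 401, 10):
        p = cline_center(x, center, width, p_min, p_max)
        # Multinomial draw of n_ind diploid genotypes at this site.
        draws = rng.choices(("AA", "Aa", "aa"), weights=genotype_probs(p, f_is), k=n_ind)
        rows.append((x, draws.count("AA"), draws.count("Aa"), draws.count("aa")))
    return rows


# Parameter grid from the text: width, p_min, p_max, F_IS (center fixed at 200 km).
PARAMETER_GRID = list(itertools.product(
    (20, 80), (0.04, 0.15), (0.85, 0.97), (0, 0.1, 0.25, 0.5, 0.75, 1)))

rng = random.Random(2019)  # arbitrary seed, for reproducibility of the sketch
example = simulate_dataset(200, 20, 0.04, 0.97, 0.25, rng)
```

The grid has 2 × 2 × 2 × 6 = 48 entries, matching the number of parameter combinations stated above; repeating `simulate_dataset` 15 times per entry would give the 720 datasets.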

A.3.6 Forest data

We extracted forest data for Panamá from v1.5 of the Global Forest Change dataset of Hansen et al. 2013, found at https://earthenginepartners.appspot.com/science-2013-global-forest/download_v1.5.html. We downloaded six files:

Filename                                      Year       Data
Hansen_GFC-2017-v1.5_first_10N_080W.tif       2000       Landsat multispectral
Hansen_GFC-2017-v1.5_first_10N_090W.tif       2000       Landsat multispectral
Hansen_GFC-2017-v1.5_last_10N_080W.tif        2017       Landsat multispectral
Hansen_GFC-2017-v1.5_last_10N_090W.tif        2017       Landsat multispectral
Hansen_GFC-2017-v1.5_lossyear_10N_080W.tif    2000-2017  forest loss
Hansen_GFC-2017-v1.5_lossyear_10N_090W.tif    2000-2017  forest loss

Table A.3: Files downloaded from Hansen et al. 2013

For each dataset, we used QGIS (QGIS Development Team, 2018) to merge the files for east Panamá (80W) and west Panamá (90W) together, and to crop the images down to include only Panamá.

The Hansen et al. 2013 forest loss images encode the data for each pixel as either 0 (no forest loss, where loss is “defined as a stand-replacement disturbance”) or a number from 1-17, representing the year of major forest loss. Thus, to determine the proportion of forest lost within a given area, we calculated 1 − (number of pixels with value of 0 / total number of pixels). We made those calculations using the raster package in R (Hijmans, 2019).

The Hansen et al. 2013 Landsat multispectral images contain data from four bands, with 8-bit, normalized top-of-atmosphere reflectance values for each band (ρ). We used band 3 (red) and band 4 (near infrared, NIR) to calculate NDVI as:

\[ \mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red}} \tag{A.23} \]

We calculated NDVI separately for each year (2000 and 2017), using the raster calculator in QGIS (QGIS Development Team, 2018) to calculate NDVI and ∆NDVI from 2000 to 2017 (i.e., ∆NDVI = NDVI2017 − NDVI2000). To find the mean NDVI or mean ∆NDVI within a given area, we used the raster package in R (Hijmans, 2019).
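The per-pixel arithmetic behind the forest-loss proportion and NDVI is simple enough to sketch directly. The Python below is only an illustration of those two formulas on plain lists of pixel values; the analysis itself used QGIS and the R raster package, and the function names and example pixel values here are hypothetical.

```python
def proportion_forest_lost(lossyear_pixels):
    """Proportion of pixels with any recorded loss (lossyear value 1-17).

    Pixels equal to 0 encode no forest loss, as described in the text.
    """
    pixels = list(lossyear_pixels)
    if not pixels:
        raise ValueError("no pixels in area")
    n_no_loss = sum(1 for value in pixels if value == 0)
    return 1.0 - n_no_loss / len(pixels)


def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance, equation (A.23)."""
    if nir + red == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / (nir + red)


def mean_delta_ndvi(pixels_2000, pixels_2017):
    """Mean change in NDVI; each input is a list of (red, nir) pairs per pixel."""
    deltas = [ndvi(nir17, red17) - ndvi(nir00, red00)
              for (red00, nir00), (red17, nir17) in zip(pixels_2000, pixels_2017)]
    return sum(deltas) / len(deltas)


# Hypothetical 8-bit values for a two-pixel area.
loss = proportion_forest_lost([0, 0, 5, 17])
change = mean_delta_ndvi([(50, 200), (40, 160)], [(100, 150), (40, 160)])
```

In the toy example, half of the pixels record a loss year, and the mean ∆NDVI is negative because the first pixel's NDVI falls from 0.6 to 0.2 while the second is unchanged.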

A.3.7 Supplemental tables and figures

site.collected  coord.N.decdeg  coord.W.decdeg  A.melanized  B.hetero  C.west.col  D.postman  total
el valle        8.593233        -80.14108       3            0         0           19         22
cerro campana   8.704150        -79.89185       0            1         0           34         35
sherman         9.248100        -79.94779       0            3         0           28         31
gamboa          9.116050        -79.69837       0            0         0           35         35
tocumen         9.201050        -79.39247       0            4         0           33         37
tapagra         9.166333        -79.20790       3            5         0           9          17
el llano        9.237533        -78.95347       8            16        1           9          34
loma naranjo    9.179817        -78.88455       3            10        0           6          19
corp. bayano    9.207400        -78.82647       1            8         0           14         23
mangowichi      9.137667        -78.68952       20           12        1           3          36
ipeti           8.972917        -78.51060       15           6         0           0          21
agua fria       8.858850        -78.22670       15           8         1           1          25
casa pastoral   8.663683        -78.16152       17           4         1           1          23
puertolara      8.613550        -78.13982       45           5         0           0          50
meteti          8.406767        -77.99898       32           6         0           0          38
IFAD            8.304800        -77.81590       28           2         0           0          30
yaviza          8.204617        -77.71825       27           6         0           0          33

Table A.4: Site names, GPS coordinates (in decimal degrees), and number of samples from each phenotypic class (as defined by Mallet 1986, see section A.3.2).

parameter  difference in RMSD  T statistic  Degrees of freedom  P
center     -0.022              -2.435       47                  0.019
width      -0.156              -2.037       47                  0.047
pmin        0.000               1.416       47                  0.163
pmax        0.000               0.424       47                  0.674

Table A.5: Results of paired t-tests comparing the RMSD for each model. When difference in RMSD is negative the Bayesian model has a smaller RMSD (is more accurate).

Year  Site             f.HYD  f.CA  f.WC
1982  El Copé          0.07   0.93  0.00
1982  Madden Dam       0.00   1.00  0.00
1982  Ciudad Panamá    0.01   0.99  0.00
1982  Tocumen          0.15   0.85  0.00
1982  El Llano-Cartí   0.05   0.95  0.00
1982  Bayano           0.04   0.96  0.00
1982  Piriatí          0.12   0.88  0.00
1982  Near Tortí       0.35   0.65  0.00
1982  Cañazas          0.67   0.31  0.02
1982  Quebrada Mono    0.89   0.11  0.00
1982  Río Iglesias     0.85   0.15  0.00
1982  Meteí            0.91   0.09  0.00
1982  Canglón          0.95   0.05  0.00
1982  Near Yaviza      0.91   0.09  0.00
1982  Cana             0.95   0.05  0.00

Table A.6: Maximum-likelihood estimates of the allele frequencies of the three Cr alleles in the 1982 samples.

Year  Site           f.HYD  f.CA  f.WC
2015  el valle       0.14   0.86  0.00
2015  cerro campana  0.01   0.99  0.00
2015  sherman        0.05   0.95  0.00
2015  gamboa         0.00   1.00  0.00
2015  tocumen        0.05   0.95  0.00
2015  tapagra        0.32   0.68  0.00
2015  el llano       0.47   0.49  0.04
2015  loma naranjo   0.42   0.58  0.00
2015  corp bayano    0.22   0.78  0.00
2015  mangowichi     0.72   0.21  0.06
2015  ipeti          0.86   0.14  0.00
2015  agua fria      0.76   0.12  0.12
2015  casa pastoral  0.83   0.09  0.08
2015  puertolara     0.95   0.05  0.00
2015  meteti         0.92   0.08  0.00
2015  IFAD           0.97   0.03  0.00
2015  yaviza         0.91   0.09  0.00

Table A.7: Maximum-likelihood estimates of the allele frequencies of the three Cr alleles in the 2015 samples.

           Introgression tails
parameter  none    left    right   mirror  ind
center     516.05  516.42  516.43  516.54  516.67
width      52.87   49.42   50.96   48.64   46.99
pmin       0.07    0.06    0.07    0.06    0.06
pmax       0.91    0.91    0.93    0.92    0.93
deltaL     NA      30.43   NA      NA      29.10
tauL       NA      0.62    NA      NA      0.62
deltaR     NA      NA      26.94   NA      25.22
tauR       NA      NA      0.59    NA      0.59
deltaM     NA      NA      NA      30.91   NA
tauM       NA      NA      NA      0.61    NA

Table A.8: Cline parameter estimates (posterior mean) for all five possible tail models (no introgression tails, left tail, right tail, mirrored tails, and independent tails) for 1982.

           Introgression tails
parameter  none    left    right   mirror  ind
center     467.03  466.48  467.15  466.88  466.59
width      63.59   60.67   59.52   59.13   56.45
pmin       0.04    0.03    0.04    0.03    0.03
pmax       0.92    0.92    0.95    0.94    0.94
deltaL     NA      31.82   NA      NA      30.60
tauL       NA      0.57    NA      NA      0.57
deltaR     NA      NA      20.56   NA      19.66
tauR       NA      NA      0.57    NA      0.56
deltaM     NA      NA      NA      27.51   NA
tauM       NA      NA      NA      0.54    NA

Table A.9: Cline parameter estimates (posterior mean) for all five possible tail models (no introgression tails, left tail, right tail, mirrored tails, and independent tails) for 1999.

           Introgression tails
parameter  none    left    right   mirror  ind
center     450.09  450.00  450.89  451.06  451.37
width      99.45   89.77   93.18   82.75   80.52
pmin       0.04    0.03    0.04    0.03    0.03
pmax       0.91    0.91    0.94    0.92    0.94
deltaL     NA      44.23   NA      NA      35.57
tauL       NA      0.69    NA      NA      0.70
deltaR     NA      NA      24.81   NA      22.38
tauR       NA      NA      0.66    NA      0.61
deltaM     NA      NA      NA      32.23   NA
tauM       NA      NA      NA      0.71    NA

Table A.10: Cline parameter estimates (posterior mean) for all five possible tail models (no introgression tails, left tail, right tail, mirrored tails, and independent tails) for 2015.

model   WAIC    pWAIC  dWAIC  weight
none    102.80  8.57   0.00   0.28
right   103.19  8.67   0.39   0.23
mirror  103.58  8.77   0.78   0.19
left    103.71  8.82   0.91   0.18
ind     104.42  9.07   1.62   0.12

Table A.11: Table of WAIC comparisons and Akaike weights for the five possible tail models (no introgression tails, left tail, right tail, mirrored tails, and independent tails) for 1982 cline. pWAIC is the effective number of parameters, dWAIC is the difference in WAIC compared with the model with the lowest WAIC, and weight is the Akaike weight.

model   WAIC    pWAIC  dWAIC  weight
right   115.21  9.44   0.00   0.29
ind     115.42  9.64   0.21   0.27
mirror  115.56  9.66   0.36   0.25
left    117.38  10.04  2.17   0.10
none    117.50  9.95   2.29   0.09

Table A.12: Table of WAIC comparisons and Akaike weights for the five possible tail models (no introgression tails, left tail, right tail, mirrored tails, and independent tails) for 1999 cline. pWAIC is the effective number of parameters, dWAIC is the difference in WAIC compared with the model with the lowest WAIC, and weight is the Akaike weight.

model   WAIC    pWAIC  dWAIC  weight
right   157.64  15.37  0.00   0.48
none    158.52  15.26  0.88   0.31
mirror  161.29  16.82  3.65   0.08
left    161.40  16.50  3.76   0.07
ind     161.75  17.51  4.11   0.06

Table A.13: Table of WAIC comparisons and Akaike weights for the five possible tail models (no introgression tails, left tail, right tail, mirrored tails, and independent tails) for 2015 cline. pWAIC is the effective number of parameters, dWAIC is the difference in WAIC compared with the model with the lowest WAIC, and weight is the Akaike weight.
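The weight column in Tables A.11 to A.13 follows from the WAIC values alone. As a sketch, assuming the usual Akaike-weight formula (each model's weight is proportional to exp(−dWAIC/2)), the Python below recomputes the weights; the function name is hypothetical and the input values are those of Table A.11.

```python
import math


def akaike_weights(waic_by_model):
    """Akaike weights from a mapping of model name to WAIC."""
    best = min(waic_by_model.values())
    # Relative likelihood of each model given its difference from the best WAIC.
    relative = {name: math.exp(-0.5 * (waic - best))
                for name, waic in waic_by_model.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# WAIC values for the 1982 cline (Table A.11).
weights_1982 = akaike_weights(
    {"none": 102.80, "right": 103.19, "mirror": 103.58, "left": 103.71, "ind": 104.42})
```

Rounded to two decimals, these reproduce the weights listed in Table A.11. Because the weights depend only on differences in WAIC, no single tail model is strongly favoured when all dWAIC values are small, as in the 1982 cline.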

Figure A.10: Trace plots of the cline parameters for the best fit model for the 1982 cline. [Panels: center, width, pmin, pmax; chains 1-4.]

Figure A.11: Trace plots of the cline parameters for the best fit model for the 1999 cline. [Panels: center, width, pmin, pmax, deltaR, tauR; chains 1-4, with divergences marked.]

Figure A.12: Trace plots of the cline parameters for the best fit model for the 2015 cline. [Panels: center, width, pmin, pmax, deltaR, tauR; chains 1-4.]

Figure A.13: Estimates of the inbreeding coefficient, FIS, across the hybrid zone. Points and lines show the mean ± 95% HPDI of the posterior distribution of FIS for each site along the transect, as estimated from the best-fit cline model for each year. [Panels: 1982, 1999, 2015; x-axis: distance along transect (km); y-axis: inbreeding coefficient (FIS).]

Figure A.14: Posterior predictive check of the best fit cline model (no tails) for the 1982 cline. Within each panel, vertical lines represent the 95% posterior predictive interval of the expected genotype frequency at that site, while points show the observed genotype frequency. Genotypes are represented separately in each panel (top = AA, middle = Aa, bottom = aa). Sites coloured in orange show an observed genotype frequency outside of the 95% posterior predictive interval. [x-axis: site; y-axis: genotype frequency.]

Figure A.15: Posterior predictive check of the best fit cline model (right tail) for the 1999 cline. Within each panel, vertical lines represent the 95% posterior predictive interval of the expected genotype frequency at that site, while points show the observed genotype frequency. Genotypes are represented separately in each panel (top = AA, middle = Aa, bottom = aa). Sites coloured in orange show an observed genotype frequency outside of the 95% posterior predictive interval. [x-axis: site; y-axis: genotype frequency.]

Figure A.16: Posterior predictive check of the best fit cline model (right tail) for the 2015 cline. Within each panel, vertical lines represent the 95% posterior predictive interval of the expected genotype frequency at that site, while points show the observed genotype frequency. Genotypes are represented separately in each panel (top = AA, middle = Aa, bottom = aa). Sites coloured in orange show an observed genotype frequency outside of the 95% posterior predictive interval. [x-axis: site; y-axis: genotype frequency.]


Figure A.17: Variation in forest loss, NDVI, and change in NDVI across Panamá. For all panels, each point represents one of 47 circles of radius 5km at 15km intervals along our transect. Y-axes display measurements of forest dynamics within each circle: (A) proportion of forest loss, (B) mean NDVI in 2000, (C) mean NDVI in 2017, (D) mean difference in NDVI from 2000 to 2017. Each panel includes a loess-smoothed line to visualize trends. The vertical lines show the estimated center of the hybrid zone in 2000 (dotted) and 2015 (dashed).

A.4 Supplemental Material for Chapter 3

Supplemental material for: Predicting evolutionary change from ecological interactions: a field experiment with Anolis lizards

A.4.1 Ethics

This work was conducted under permits from the BEST Commission and the Bahamas Department of Agriculture. Princeton University’s Institutional Animal Care and Use Committee provided guidance on and approved the study protocols (permits 1922-13 and 1922-F16).

A.4.2 Study organisms

Anolis sagrei- The brown anole, A. sagrei, is a small lizard native to the Bahamas. Anolis species are often classified into different “ecomorphs,” groupings of lizards with common morphological, ecological, and behavioral traits which appear to be well-suited for various structural habitats (Williams, 1972; Beuttell and Losos, 1999). A. sagrei are “trunk-ground” ecomorphs: they are generally found on or close to the ground, and have skeletal traits suited to moving on broad perches such as rocks and tree trunks. A. sagrei are sexually dimorphic, with males being larger than females (Butler and Losos, 2002). On small islands in the Bahamas, year-to-year survival rates are often extremely low, such that A. sagrei are effectively an annual species with 1 generation per year (Calsbeek and Smith, 2007; Cox and Calsbeek, 2010).

Anolis smaragdinus- The green anole, A. smaragdinus, is a “trunk-crown” ecomorph. Thus, it has a more arboreal lifestyle than A. sagrei, using narrower perches in taller vegetation, and has morphological traits to match. In both the field and in the literature, A. smaragdinus is often confused with its close relative A. carolinensis, which is native to the southeast United States (Les and Powell, 2014).

Leiocephalus carinatus- The curly-tailed lizard, L. carinatus, is a large-bodied, ground-dwelling lizard. It is native to the Bahamas, but much less is known about its ecology than about that of Anolis species. Curly-tailed lizards are known predators of Anolis lizards, both generally (Schoener et al., 1982) and in our experiment specifically (Pringle et al., 2019). They also consume a wide variety of other foods and prey items (Schoener et al., 1982).

A.4.3 Experiment details

Our experimental islands are located in the Exuma chain of the Bahamas near Staniel Cay, an area that has been used for studies of anoles since the 1970s (e.g., Schoener and Schoener, 1983). We selected 16 islands for our study (Table A.14, Figure A.18) because they: (1) were small enough to be effectively surveyed, (2) had existing A. sagrei populations (5 islands had been experimentally colonized in the 1970s, the remainder were naturally-occurring), (3) had no other lizard or predator populations, and (4) had vegetation greater than 2 m tall (considered necessary to support A. smaragdinus populations, Losos and Spiller, 1999).

After an initial population census in 2011, we assigned islands to treatments randomly. We first used Google Earth Pro to calculate the vegetated area of each island (excluding the bare rock around the island edge). We stratified our random assignment to ensure that each treatment contained 2 replicates among the 8 largest islands and 2 replicates among the 8 smallest. We captured green anoles for transplantation from Staniel Cay (which has no curly-tailed lizards) and captured curly-tailed lizards from nearby Thomas Cay (which has all 3 species present). Transplanted lizards were randomly assigned to islands. We released green anoles in groups of 10 or 11 (equal sex ratio, with the 11th individual a juvenile). We released unsexed curly-tailed lizards in groups of 5-7. These numbers were chosen to mimic colonization by small founder populations.

A.4.4 Census and population estimation

To estimate population sizes, we used a mark-resight procedure developed for Caribbean Anolis (Heckel et al., 1979). A team of 3-6 researchers comprehensively searched islands for all lizard species over three consecutive days. Using a squirt gun, we marked lizards with a day-specific colour of non-toxic, water-soluble paint (blue, red, and yellow for days 1, 2, and 3, respectively). Following Pringle et al. (2015), we estimated population size as the mean of the three possible Chapman estimates:

\[ n_{Chap} = \frac{(M + 1)(C + 1)}{R + 1} \tag{A.24} \]

where nChap is the Chapman estimate of population size, M and C are the numbers of individuals marked on the first and second visits, and R is the number of marked individuals re-sighted on the second visit. During each census, we recorded data on habitat use for each lizard spotted. Specifically, we noted: (1) the sex of the lizard, (2) whether the lizard was on the ground, (3) perch height (estimated to nearest 5 cm, 0 for individuals found on the ground), (4) perch diameter (estimated to the nearest 0.5 cm, not recorded for individuals on the ground), and (5) maximum vegetation height within a 1-m radius (recorded only 2014-2016).
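
The calculation in equation A.24 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the census: the function names, the pairing of census days (1-2, 1-3, 2-3) used to obtain three estimates, and the counts are all assumptions made for the example. The sketch follows A.24 as printed; some formulations of the Chapman estimator subtract 1 from this quantity.

```python
from itertools import combinations
from statistics import mean


def chapman(marked_first, marked_second, resighted):
    """Equation A.24 as printed: (M + 1)(C + 1) / (R + 1)."""
    return (marked_first + 1) * (marked_second + 1) / (resighted + 1)


def mean_chapman(day_counts, resights):
    """Mean of the estimates from the three possible pairs of census days.

    day_counts: number of lizards marked on days 1, 2 and 3.
    resights:   maps a (first_day, second_day) index pair to the number of
                lizards seen on the second day that already carried the
                first day's paint colour (assumed pairing, see lead-in).
    """
    pairs = combinations(range(3), 2)  # (0, 1), (0, 2), (1, 2)
    return mean(chapman(day_counts[i], day_counts[j], resights[(i, j)])
                for i, j in pairs)


# Hypothetical counts for one island (not data from the experiment).
example_estimate = mean_chapman([9, 4, 4], {(0, 1): 1, (0, 2): 1, (1, 2): 4})
```

Averaging over day pairs uses all three survey days rather than relying on a single mark and resight occasion.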

A.4.5 Quantifying environmental differences across islands

We quantified environmental differences in island size (as measured in Google Earth Pro, section A.4.3), vegetation structure, and density of each species (from population surveys, section A.4.4). To quantify differences in vegetation structure across islands, we surveyed each island in transects and took the following measurements for each island: mean maximum vegetation height, mean maximum perch diameter, mean total number of perches, mean number of perches at 0.75 m off the ground, mean number of perches at 1.5 m off the ground, the mean maximum perch diameter at 0.75 m off the ground, and the mean maximum perch diameter at 1.5 m off the ground. For our measures of density, we used the estimated density for each species on each island in each year. In total, our analysis of environmental differences included 23 measurements: the 8 measurements describing island size and vegetation structure, and the 15 (3 species x 5 years) measures of population density for each lizard. We centered and scaled all variables to a mean of 0 and standard deviation of 1 to remove effects of measurement scale, and calculated the Euclidean distance between each pair of islands as our quantification of island (dis)similarity.
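
The standardization and distance calculation can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the analysis code: the function names are hypothetical, the sample standard deviation (n − 1 denominator) is assumed for scaling, and the input values in any usage would be the 23 measurements per island.

```python
from math import dist
from statistics import mean, stdev


def standardize_columns(rows):
    """Centre and scale each variable (column) to mean 0 and SD 1.

    rows: one list of measurements per island, all in the same order.
    Assumes the sample standard deviation (n - 1 denominator).
    """
    scaled_columns = []
    for column in zip(*rows):
        centre, spread = mean(column), stdev(column)
        scaled_columns.append([(value - centre) / spread for value in column])
    # Transpose back so that each row is again one island.
    return [list(row) for row in zip(*scaled_columns)]


def pairwise_distances(names, rows):
    """Euclidean distance between every pair of islands after scaling."""
    z = standardize_columns(rows)
    return {
        (names[i], names[j]): dist(z[i], z[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
```

Scaling first means that a variable measured in square metres (vegetated area) does not dominate variables measured in centimetres or counts.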

A.4.6 Analysis of perch use

In Pringle et al. (2019), we reported quantitative analysis of changes in A. sagrei perch height and qualitative analysis of changes in A. sagrei perch diameter throughout the course of the experiment. Here, we re-analyze changes in perch behavior under a different statistical framework, recapitulating the results from Pringle et al. (2019), but with more nuance. We used generalized linear mixed models (GLMMs) to analyze changes in perch height and diameter. Our analysis of change in perch height is based on 5464 perch observations of A. sagrei from 2011 and 2016. Because we cannot measure the perch diameter of lizards found on the ground, our analysis of perch diameter includes fewer observations (3449 total across 2011 and 2016). As with the analysis of univariate phenotypes, we fit these GLMMs in a Bayesian framework with the R package brms (Bürkner, 2017, 2018). For perch height, we rounded estimates of perch height to the nearest centimeter and used a zero-inflated negative binomial likelihood to account for the excess of individuals found on the ground at a height of 0 cm. We included year (as a factor), sex, presence/absence of predator, presence/absence of competitor, and all possible interactions as main effects, and allowed the intercept and effect of sex to vary across islands. We used the same predictor variables to estimate the zero-inflated term. In the syntax of brms, the model equation is thus:

height.cm ~ year*sex*pred*comp + (1 + sex | island), zi ~ year*sex*pred*comp + (1 + sex | island)

For perch diameter, we converted the raw estimates, in centimeters, to millimeters and rounded to the nearest millimeter. We used a negative binomial likelihood and used the same predictor variables as for perch height (but without the zero-inflated term). For both models, we used uninformative normal priors of N(0, 50) for all coefficients and the brms default priors for all other parameters.
We fit four independent chains with 1000 iterations of warmup and 1000 iterations of sampling each. We assessed convergence using the Gelman-Rubin R̂ statistic (Gelman and Rubin, 1992) and ensured that the ratio of effective samples to total samples for each chain was > 0.1. We assessed statistical significance for the interactions of interest by determining whether the 95% highest posterior density interval (HPDI) for that parameter contained 0.

As we previously reported (Pringle et al., 2019), A. sagrei significantly increase their perch height in response to predators (Table A.15). This is true for both males and females. In females, introduced A. smaragdinus have a significant negative effect on A. sagrei perch height, and there is a significant positive predator:competitor interaction. These effects were not significant in males. As shown in Pringle et al. (2019), A. sagrei were also less likely to be found on the ground after the introduction of the terrestrial predators (Table A.16). Although the effects of introduced competitors were non-significant, the predator:competitor interaction was significantly negative in both sexes.

Previous work has suggested that increased perch height should lead to decreases in perch diameter (e.g., Losos et al., 2004, 2006). However, we do not see a significant decrease in perch diameter in response to introduced predators in either sex (Table A.17). In fact, none of our manipulations had a statistically significant effect on the change in average perch diameter. Our analysis of perch diameter only considers lizards perching off the ground, as we cannot measure the diameter of the ground. For islands with predators, this may tend to underestimate the overall decrease in perch diameter among all lizards, both on and off the ground. That is, if we could include lizards on the ground, the average perch diameters for all islands would be larger, and the difference between our (off-ground only) average perch diameter and the on- and off-ground average perch diameter would be greatest on islands with a large proportion of individuals on the ground. Thus we may be underestimating the “true” decrease in average perch diameter due to predators, as predators make A. sagrei less likely to be found on the ground. Nevertheless, our analysis shows that, among lizards perching off the ground, the decrease in perch diameter after the introduction of predators is small and not statistically significant.
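
The significance criterion used in this section (whether the 95% HPDI contains 0) can be illustrated with a short Python sketch. It assumes that the posterior draws for one parameter are available as a plain list of numbers; the function names are hypothetical and this is not the brms workflow used for the analysis.

```python
from math import ceil


def hpdi(samples, prob=0.95):
    """Narrowest interval that contains a fraction `prob` of the draws."""
    draws = sorted(samples)
    n = len(draws)
    k = ceil(prob * n)  # number of draws the interval must cover
    # Width of every window of k consecutive sorted draws.
    widths = [draws[i + k - 1] - draws[i] for i in range(n - k + 1)]
    start = widths.index(min(widths))
    return draws[start], draws[start + k - 1]


def excludes_zero(samples, prob=0.95):
    """True when the HPDI lies entirely above or entirely below 0."""
    lower, upper = hpdi(samples, prob)
    return lower > 0 or upper < 0
```

Unlike an equal-tailed interval, the HPDI is the shortest interval with the requested coverage, so the two can differ when a posterior is strongly skewed.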

A.4.7 Lizard sampling

On all sampling trips, we captured lizards by hand or with small nooses made of dental floss or thread. In 2016, we recorded habitat use information (e.g., perch height and diameter) for captured lizards. Each lizard was given a unique ID (which contained no treatment information). We kept lizards individually in disposable plastic containers and transported them to Staniel Cay for phenotyping. We anesthetized lizards with isoflurane, then used a portable X-ray to capture an X-ray image of the full lizard for measuring skeletal traits. We also weighed each lizard to the nearest 0.1 g. We scanned the underside of each lizard for measurements of toepad traits, though green anoles were not scanned in 2011. Finally, we took a small piece of tail tip (0.5-1.25 cm) from each lizard and either preserved it in DMSO or froze it at -20 °C in 100% ethanol for genetic analysis. After leaving the Bahamas, tissue samples were kept at -80 °C for long-term storage.

A.4.8 Phenotypic measurement details

T.J.T. performed all skeletal measures of A. sagrei (Figure A.19), while A.C.A.V. counted the number of toepad lamellae for all A. sagrei. We excluded juvenile lizards from phenotypic analysis following Kolbe et al. (2012). We excluded males with SVL < 38.5 mm and females with SVL < 35 mm.

A.4.9 Repeatability of phenotypic measures

To ensure that our phenotypic measurements were repeatable, we measured a subset of X-rays and scans multiple times and estimated repeatability of all trait measurements following Quinn and Keough (2002) and Whitlock and Schluter (2009). Repeatability is equal to:

\[ \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_E^2} \tag{A.25} \]

where $\sigma_\alpha^2$ is the variance among groups (here the groups are the individual lizards) and $\sigma_E^2$ is the variance within groups (Whitlock and Schluter, 2009). Within-group variance is estimated from the mean-squared error (or residual) term of a single-factor, random-effects ANOVA in which the trait measurement is the response variable and lizard ID is the factor (Whitlock and Schluter, 2009).

Because X-rays and scans were measured different numbers of times, we estimated $\sigma_\alpha^2$ as:

\[ \frac{MS_{groups} - MS_{residual}}{\left( \sum n_i - \sum n_i^2 / \sum n_i \right) / (p - 1)} \tag{A.26} \]

where $n_i$ is the number of measurements of individual $i$ and $p$ is the total number of individuals that were measured multiple times (Quinn and Keough, 2002). X-ray and scan quality varied across years, so we estimated repeatability within each year for each trait. In total, we measured 72 A. sagrei X-rays and 87 A. sagrei scans multiple times. All measurements were highly repeatable (Table A.18).
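
Equations A.25 and A.26 can be combined into a short calculation. The Python sketch below is illustrative only: the function name and the input layout (a dictionary from lizard ID to repeated measurements of one trait) are assumptions, and the mean squares are computed directly rather than taken from a fitted ANOVA.

```python
from statistics import mean


def repeatability(measurements):
    """Repeatability from a single-factor ANOVA with unequal group sizes.

    measurements: dict mapping lizard ID -> list of repeated measurements.
    """
    groups = list(measurements.values())
    p = len(groups)                      # individuals measured repeatedly
    sizes = [len(g) for g in groups]     # n_i for each individual
    total_n = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total_n

    ss_groups = sum(n * (mean(g) - grand_mean) ** 2
                    for g, n in zip(groups, sizes))
    ss_residual = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_groups = ss_groups / (p - 1)
    ms_residual = ss_residual / (total_n - p)  # within-group variance

    # Denominator of A.26: the effective number of measurements per group.
    n0 = (total_n - sum(n ** 2 for n in sizes) / total_n) / (p - 1)
    var_among = (ms_groups - ms_residual) / n0

    return var_among / (var_among + ms_residual)  # A.25
```

When every individual is measured the same number of times, the denominator of A.26 reduces to that common number of measurements.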

A.4.10 Univariate linear mixed models

We fit our LMMs using the R package brms. For all phenotypic traits, we included year (as a factor), sex, presence/absence of predator, presence/absence of competitor, and all possible interactions as main effects. We allowed the phenotypic variance, σ, to vary across sexes. For skeletal traits, which need to be corrected for size, we included body size (SVL) and its interaction with sex as covariates, and we allowed these effects to vary across islands. In the syntax of brms, the model equation is:

trait.value ~ SVL*sex + sex*pred*comp*year + (1 + SVL*sex | island), sigma ~ sex

for the traits which required size correction. To improve fitting and interpretation for these models, we standardized SVL to have a mean of 0 and standard deviation of 1. For the toepad traits and body size (SVL) itself, which do not require size correction, we used the same model but without the SVL terms. In the syntax of brms:

trait.value ~ sex*pred*comp*year + (1 + sex | island), sigma ~ sex

For all phenotypic models, we used a normal prior of N(0, 100) for the intercept and a normal prior of N(0, 10) for the main effect of sex. For all other coefficients, we used a normal prior with µ = 0 and σ scaled to be one-half the standard deviation of the trait being analyzed. These weakly informative priors encapsulate our prior belief that the main effects are unlikely to shift trait means by more than 1 standard deviation. For all other parameters, we used the brms default priors. We used the Student’s t-distribution as the likelihood. This is similar to using a Gaussian likelihood, but more robust to possible outliers in the data because the Student’s t-distribution has heavier tails (Gelman and Hill, 2007). Our conclusions were similar when using a Gaussian likelihood (results not shown).

We fit four independent chains with 1000 iterations of warmup and 1000 iterations of sampling each. We assessed convergence using the Gelman-Rubin R̂ statistic (Gelman and Rubin, 1992) and ensured that the ratio of effective samples to total samples for each chain was > 0.1. We assessed statistical significance for the interactions of interest by determining whether the 95% highest posterior density interval (HPDI) for that parameter contained 0. Table A.19 summarizes the posterior mean and 95% HPDI for the interactions of interest for all traits.

A.4.11 Size-corrected traits for multivariate analysis

To calculate size-corrected traits for our multivariate phenotypic analysis, we again fit LMMs using brms. As before, we did not correct for body size for number of toepad lamellae (which did not vary with body size) nor for body size (SVL) itself. For all other traits, we fit similar linear models as in our univariate analysis, but without the year, predator, and competitor terms. In the syntax of brms, our model was:

value ~ SVL*sex + (1 + SVL*sex | island), sigma ~ sex

All likelihoods, priors, numbers of chains and iterations, and other model fitting procedures were the same as for our univariate trait analysis. We used the fitted models for each trait to predict the phenotype for each individual. We then calculated the residuals between these predicted values and the observed trait value and used these residuals as our size-corrected measures for each trait.

A.4.12 Library preparation and sequencing

We extracted DNA from individual lizard tail tips using a standard phenol-chloroform based method. We used individual sterilized razor blades to chop a portion of the tail tip into 0.5 mm pieces. Those pieces were left overnight at 55 °C in tissue digestion buffer and proteinase K. From the digested tissue, we used 25:24:1 phenol:chloroform:isoamyl alcohol washes and centrifugation to isolate DNA, then ethanol washes to precipitate DNA. Extracted DNA was resuspended in molecular-grade water or TE and cleaned using NucleoSpin® gDNA cleanup kits (Takara Bio) according to the manufacturer’s instructions.

When selecting samples for library preparation, we distributed individuals from islands and timepoints as evenly as was feasible across libraries and sequencing lanes to avoid batch effects. Overall, we made 29 libraries. For each library, we digested at least 300 ng of genomic DNA per sample for three hours using the restriction enzymes NlaIII and MluCI. We performed a bead cleanup using NaCl-PEG diluted SpeedBeads to remove small fragments (Rohland and Reich, 2012), then quantified DNA using the Quant-iT™ PicoGreen® dsDNA Assay Kit (Thermo Fisher Scientific). We normalized individual samples to 90 ng using a Biomek FXP liquid handling robot (Beckman Coulter), then ligated adaptors to the digested DNA. We used the 48 individually-barcoded P1 adaptors from Peterson et al. (2012). Our P2 adaptors were custom designed to include degenerate base regions (DBRs), short stretches of random bases which allow PCR duplicates to be bioinformatically removed (which is not otherwise possible with ddRAD sequencing). Our adaptors were modified slightly from the design of Vendrami et al. (2017). The sequences of the two strands are:

flex_DBR_P2.1 GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNMMGGACG
flex_DBR_P2.2_bio /5Phos/AATTCGTCCIINNNNNNAGATCGGAAGAGCGAGAACAA/3Bio/

where N represents a random base (A, C, T, or G), M is a random base of either A or C, /5Phos/ is a phosphate group on the 5′ end of the primer, I is inosine, a synthetic base which binds with all 4 natural bases, and /3Bio/ is a biotin tag on the 3′ end of the primer.

After adaptor ligation we pooled individual samples into single libraries and performed a second bead cleanup to remove small fragments. For each library, we size-selected fragments with inserts of 400-450 bp using a PippinPrep (Sage Science). We performed a bead cleanup with Dynabeads (Thermo Fisher Scientific), which select only fragments with P1-P2 adaptor combinations. We PCR amplified libraries using the AccuPrime™ Taq DNA Polymerase System (Invitrogen). We divided each library into 4 separate 50 µl reactions and used a PCR program with 12 amplification cycles to minimize the chance of PCR duplicates. We used the PCR primers and 12 library barcodes from Peterson et al. (2012) as well as 4 custom library barcodes for a total of 16 library barcodes. Combined with the 48 individual barcodes, this allows for multiplexing of up to 768 samples per lane. After PCR amplification, we did one final round of bead cleanup, after which each library was ready for sequencing.

A.4.13 Bioinformatic pipeline

We demultiplexed the raw reads from each library using the process_radtags tool in Stacks v1.46 (Catchen et al., 2011, 2013), using the --inline_index, --disable_rad_check, and -P options. Next, we used the clone_filter tool in Stacks v1.46 to remove potential PCR duplicates. Again, we used the --inline_index option and set the length of our paired-end oligo DBR sequence as 8 bp with the --oligo_len_2 option. We trimmed extra bases from the reverse reads using the fastx_trimmer from the FASTX Toolkit v0.0.14 found at http://hannonlab.cshl.edu/fastx_toolkit/index.html. Finally, we used cutadapt v2.1 (Martin, 2011) to remove possible adaptor contamination, requiring 10 bp of adaptor sequence before trimming (--overlap 10). We aligned these processed reads to the preliminary Anolis sagrei genome using the default settings of the BWA-MEM algorithm in bwa v0.7.15 (Li and Durbin, 2009). We used samtools v1.5 to convert the sam output from bwa into sorted bam files, keeping only alignments with a mapping quality of at least 20 (Li et al., 2009).

For our analysis of allele frequency change, we used ANGSD to estimate allele frequencies directly from these bam files (Korneliussen et al., 2014). For each island/year combination, we ran ANGSD with the following options: -P 1: use a single processor; -GL 2: use ANGSD’s implementation of the early GATK method for calculating genotype likelihoods; -doMaf 1: estimate allele frequencies using the method from Kim et al. (2011) with a fixed major allele and known minor allele; -doMajorMinor 4: set the reference allele as the major allele; -uniqueOnly 1: use only reads that map uniquely; -minIndDepth 5: minimum per-individual read depth of 5; -minMapQ 30: minimum read mapping quality of 30; -minQ 20: minimum base quality score of 20; -minMaf 0.05: minimum minor allele frequency of 5%; -SNP_pval 1e-6: only keep SNPs that are likely to be variable according to a likelihood ratio test with p value < 10⁻⁶; and -minInd set to ~70% (with rounding) of the total number of individuals per population and timepoint, such that SNPs must be in ~70% of individuals to be retained.

For the CMH test, which requires called genotypes, we first created read pileups from the bam files using the samtools mpileup program (Li, 2011). We used the -C 50 flag to adjust mapping quality as suggested in the manual, the -E flag to recalculate base quality, and the -t flag to calculate and add allelic depth (AD), strand bias (SP), and depth (DP) to the output. We only included reads with a minimum mapping quality of 20 (-q 20) and bases with a minimum quality of 30 (-Q 30). We then called variants and calculated genotype posterior probabilities (-f GP) using the multiallelic SNP caller (-m) in bcftools v1.5 (Li, 2011; Danecek et al., 2014). We used vcftools v0.1.14 (Danecek et al., 2011) to filter these raw called variants. We retained only (1) SNPs (--remove-indels); (2) with two alleles (--min-alleles and --max-alleles of 2); (3) a minor allele frequency greater than 5% (--maf 0.05); (4) a minimum site quality (QUAL) score of 25 (--minQ 25); (5) a maximum of 50% missing genotypes across the experiment (--max-missing 0.5); (6) a minimum mean depth per sample of 5 (--min-meanDP 5); and (7) a maximum total depth of 42 (--maxDP 42). After filtering, we retained 85888 SNPs for analysis.

We used PLINK v1.9 to perform the CMH test on the four islands in each treatment (Chang et al., 2015; Purcell and Chang, 2017). We used PLINK to convert our filtered VCF file into PLINK bed/bim/fam files, then ran the CMH test in PLINK using the --mh module. We used the --allow-extra-chr and --allow-no-sex flags to override PLINK’s default checks for human chromosome names.

A.4.14 Estimating variance effective population size

We estimated the average variance effective population size, Ne, for each island over the course of our experiment following Jorde and Ryman (2007). Specifically, we estimated Ne as:

\[ N_e = \frac{t}{2F} \tag{A.27} \]

where t is the number of generations (5) and F is an estimate of the variance in the change in allele frequency (∆p) across timepoints. We estimated F as:

\[ F = \frac{\sum_{i=1}^{a} (x_i - y_i)^2}{\sum_{i=1}^{a} z_i (1 - z_i)} \tag{A.28} \]

where x and y are the frequencies for a given allele at the two timepoints, z is the average frequency of the allele across timepoints, and i = 1, 2, ..., a indexes SNPs.
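
Equations A.27 and A.28 translate directly into code. The Python sketch below follows the two equations as printed; the function name and the example allele frequencies are hypothetical, and any further corrections applied in the original method are not represented here.

```python
def variance_effective_size(freq_start, freq_end, generations):
    """Ne = t / (2F), with F pooled across SNPs (A.27 and A.28).

    freq_start, freq_end: allele frequencies for the same SNPs, in the same
    order, at the first and last timepoints.
    """
    # Numerator of A.28: squared change in frequency, summed over SNPs.
    numerator = sum((x - y) ** 2 for x, y in zip(freq_start, freq_end))
    # Denominator of A.28: z(1 - z), where z is the mean of the two frequencies.
    denominator = sum(
        z * (1 - z)
        for z in ((x + y) / 2 for x, y in zip(freq_start, freq_end))
    )
    f = numerator / denominator
    return generations / (2 * f)


# Hypothetical frequencies for two SNPs over 5 generations.
example_ne = variance_effective_size([0.5, 0.5], [0.6, 0.4], generations=5)
```

Summing the numerator and denominator separately before dividing weights each SNP by its heterozygosity, rather than averaging per-SNP ratios.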

A.4.15 Genetic mapping

We performed genotype-phenotype association mapping using the Bayesian Sparse Linear Mixed Model (BSLMM) approach of the program GEMMA v0.97 (Zhou and Stephens, 2012; Zhou et al., 2013). Importantly, GEMMA estimates a genetic relatedness matrix to account for population structure during association mapping (Zhou et al., 2013). We first converted our VCF file of filtered SNPs into a BIMBAM mean genotype file using a custom Perl script (V. Soria-Carrasco, pers. communication). This mean genotype file uses genotype likelihoods to calculate the posterior mean genotype for each individual. The posterior mean genotype is a value between 0 (no copies of the alternate allele) and 2 (2 copies of the alternate allele). This approach provides some of the benefits of the more complete likelihood models as implemented in a program like ANGSD; that is, it accounts for some genotyping uncertainty.

Then, for each analyzed trait, we ran GEMMA using the following options: -bslmm 1: use the BSLMM model; -w 5000000: 5 million warmup iterations; -s 25000000: 25 million sampling iterations; -rpace 1000: thinning interval of 1000, to target 25,000 total samples from the posterior; -wpace 10000: writing interval of 10000; and -seed: the seed for random number generation, set to a different randomly-chosen number for each trait. For samples from which we had genotypic data but could not measure a phenotype, we encoded the phenotype value as missing data but kept all genetic data for use in the calculation of the genetic relatedness matrix. For each SNP, GEMMA calculates a posterior inclusion probability (PIP), the posterior probability that the SNP makes a significant individual contribution to the phenotype, and β, the individual SNP’s effect on the phenotype. We considered a SNP to be significantly associated with a trait if it had a PIP > 0.95.
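
The idea of a posterior mean genotype, and the PIP threshold applied afterwards, can be illustrated with a short Python sketch. It is not the custom Perl script referred to above: the function names are hypothetical, and the inputs are assumed to be the three genotype probabilities for one individual at one SNP and a dictionary of PIP values keyed by SNP name.

```python
def posterior_mean_genotype(p_hom_ref, p_het, p_hom_alt):
    """Expected number of alternate-allele copies, between 0 and 2.

    The three arguments are the probabilities (or unnormalized weights) of
    the homozygous-reference, heterozygous and homozygous-alternate genotypes.
    """
    total = p_hom_ref + p_het + p_hom_alt
    # 0 copies * P(hom ref) + 1 copy * P(het) + 2 copies * P(hom alt)
    return (p_het + 2 * p_hom_alt) / total


def significant_snps(pip_by_snp, threshold=0.95):
    """Names of SNPs whose posterior inclusion probability exceeds the threshold."""
    return [snp for snp, pip in pip_by_snp.items() if pip > threshold]
```

A confidently called heterozygote gives a value near 1, while an uncertain call gives an intermediate value, which is how genotyping uncertainty is carried into the association model.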

A.4.16 Genetic parallelism in outlier SNPs

In section 3.4 we reported results for our tests of genetic parallelism considering correlations in allele frequency change across all SNPs shared between islands, rALL. Here we report the results considering only SNPs in the 5% tails of the allele frequency change in either island, r5%. Such loci may be, or be linked to, loci under natural selection, which may lead to stronger signals of parallelism than when considering all SNPs, the majority of which are expected to be changing mainly due to genetic drift.

The minimum and maximum values of r5% were indeed more extreme than those for rALL, ranging from moderate positive correlation (r5% = 0.489) to moderate negative correlation (r5% = -0.356). However, this did not lead to clearer patterns of parallelism across the experiment (Figure A.21). In general, pairwise comparisons which showed significant correlations for rALL had stronger correlations for r5%, while pairwise comparisons that were not significant for rALL were weaker and still not significant for r5%. Overall, rALL and r5% were strongly correlated (r = 0.945). Thus, as with rALL, there is no statistically significant variation in r5% across treatments (ANOVA, F3,20 = 1.91, p = 0.16), nor are there differences in r5% across all pairwise comparisons for islands from the same or different treatments (ANOVA, F1,118 = 0.06, p = 0.8). Environmental differences were not significantly correlated with r5% (∆env × r5%, r = 0.1, p = 0.3). Finally, neither measure of parallelism at the phenotypic level was significantly correlated with r5% (∆D × r5%, r = 0.12, p = 0.2; θ × r5%, r = 0, p = 0.96; Figure A.22).

A.4.17 Supplemental tables and figures

alias  island  latitude  longitude  treatment  vegetated  avg      avg      avg      avg      avg      avg      avg
                                               area       max      max      total    perches  perches  max      max
                                               (m2)       veg      perch    perches  at       at       perch    perch
                                                          height   diam              0.75m    1.5m     diam at  diam at
                                                          (cm)     (cm)                                0.75m    1.5m
CON1   936     24.19     -76.47     CON        2772       202.6    5.12     6.88     2.88     1.71     2.00     1.68
CON2   ANDREW  24.17     -76.45     CON        1758       109.6    6.54     6.85     2.08     0.23     1.15     0.27
CON3   332     24.21     -76.50     CON        1450       52.0     2.53     2.33     0.20     0.07     0.13     0.07
CON4   5       24.15     -76.45     CON        1333       79.5     NA       2.69     0.85     0.15     NA       NA
PRED1  930     24.19     -76.47     PRED       2582       185.1    6.94     9.44     3.88     1.31     3.88     2.41
PRED2  WBC     24.15     -76.45     PRED       1575       122.7    NA       3.73     1.45     0.18     NA       NA
PRED3  314     24.24     -76.52     PRED       1400       123.2    4.21     5.00     0.75     0.17     1.25     0.21
PRED4  204     24.23     -76.49     PRED       487        138.7    7.79     8.33     2.42     0.42     2.46     0.46
COMP1  311     24.24     -76.49     COMP       2241       132.3    4.56     7.33     1.11     0.06     1.47     0.06
COMP2  6       24.15     -76.45     COMP       1851       101.5    NA       2.86     0.29     0.07     NA       NA
COMP3  931     24.21     -76.49     COMP       1070       153.9    5.75     5.50     1.83     0.58     2.42     0.54
COMP4  305     24.24     -76.50     COMP       603        91.5     4.12     6.54     1.00     0.00     0.65     0.00
ALL1   926     24.16     -76.47     ALL        3320       123.9    3.58     7.67     1.17     0.17     1.36     0.58
ALL2   922     24.19     -76.47     ALL        1648       142.1    8.42     4.50     1.71     0.50     1.71     0.68
ALL3   1       24.15     -76.47     ALL        1429       82.0     1.77     6.92     1.50     0.08     NA       NA
ALL4   312     24.24     -76.52     ALL        640        117.1    4.04     6.75     1.17     0.09     0.92     0.08

Table A.14: Names, locations, and characteristics of experimental islands.

sex     parameter       posterior mean  lower 95% CI  upper 95% CI
female  pred:year       0.455           0.268         0.659
female  comp:year       -0.300          -0.489        -0.113
female  pred:comp:year  0.684           0.413         0.966
male    pred:year       0.428           0.206         0.654
male    comp:year       0.029           -0.168        0.247
male    pred:comp:year  0.138           -0.179        0.455

Table A.15: GLMM results for the effects of predators, competitors, and their interaction on change in perch height in A. sagrei.

sex     parameter       posterior mean  lower 95% CI  upper 95% CI
female  pred:year       -1.035          -1.628        -0.469
female  comp:year       0.070           -0.339        0.526
female  pred:comp:year  -1.363          -2.282        -0.521
male    pred:year       -1.505          -2.407        -0.472
male    comp:year       -0.429          -1.015        0.179
male    pred:comp:year  -29.411         -95.690       -0.314

Table A.16: GLMM results for the effects of predators, competitors, and their interaction on the probability of being on the ground for A. sagrei (the zero-inflated term in our model).

sex     parameter       posterior mean  lower 95% CI  upper 95% CI
female  pred:year       -0.203          -0.414        0.019
female  comp:year       -0.091          -0.295        0.115
female  pred:comp:year  -0.078          -0.378        0.231
male    pred:year       -0.135          -0.375        0.115
male    comp:year       -0.049          -0.274        0.179
male    pred:comp:year  0.064           -0.295        0.407

Table A.17: GLMM results for the effects of predators, competitors, and their interaction on the change in perch diameter of A. sagrei.

trait           2011   2016
femur           1.000  0.999
fore.limb       1.000  1.000
fore.scale      0.842  0.893
head.length     1.000  0.999
head.width      1.000  0.999
hind.limb       1.000  0.999
hind.scale      0.858  0.914
humerus         0.999  0.998
pectoral.width  1.000  0.999
pelvic.width    0.999  0.999
SVL             1.000  1.000
tibia           1.000  0.998
toe.III         0.998  0.995
toe.IV          1.000  0.999
ulna            1.000  0.998

Table A.18: Estimates of repeatability for measured phenotypic traits across years and species.

trait           sex     parameter       posterior mean  lower 95% CI  upper 95% CI  significant
SVL             female  pred:year       -0.713          -1.693        0.196         false
SVL             female  comp:year       0.145           -0.768        1.040         false
SVL             female  pred:comp:year  0.267           -0.994        1.470         false
SVL             male    pred:year       -2.070          -3.351        -0.782        true
SVL             male    comp:year       -0.572          -1.780        0.671         false
SVL             male    pred:comp:year  0.333           -1.543        2.117         false
head width      female  pred:year       -0.088          -0.183        0.002         false
head width      female  comp:year       -0.053          -0.137        0.031         false
head width      female  pred:comp:year  0.044           -0.072        0.171         false
head width      male    pred:year       -0.037          -0.133        0.062         false
head width      male    comp:year       -0.175          -0.267        -0.079        true
head width      male    pred:comp:year  0.102           -0.042        0.244         false
head length     female  pred:year       -0.021          -0.127        0.090         false
head length     female  comp:year       -0.101          -0.210        0.003         false
head length     female  pred:comp:year  0.122           -0.025        0.268         false
head length     male    pred:year       0.070           -0.064        0.208         false
head length     male    comp:year       -0.045          -0.170        0.080         false
head length     male    pred:comp:year  0.188           0.004         0.375         true
pectoral width  female  pred:year       -0.008          -0.105        0.099         false
pectoral width  female  comp:year       -0.048          -0.146        0.052         false
pectoral width  female  pred:comp:year  0.022           -0.115        0.163         false
pectoral width  male    pred:year       0.037           -0.084        0.149         false
pectoral width  male    comp:year       -0.124          -0.231        -0.002        true
pectoral width  male    pred:comp:year  0.029           -0.142        0.182         false
humerus         female  pred:year       -0.076          -0.178        0.030         false
humerus         female  comp:year       -0.054          -0.153        0.055         false
humerus         female  pred:comp:year  0.068           -0.076        0.209         false
humerus         male    pred:year       0.050           -0.071        0.186         false
humerus         male    comp:year       0.047           -0.066        0.166         false
humerus         male    pred:comp:year  -0.052          -0.227        0.122         false
ulna            female  pred:year       0.013           -0.077        0.092         false
ulna            female  comp:year       -0.006          -0.086        0.071         false
ulna            female  pred:comp:year  -0.033          -0.146        0.081         false
ulna            male    pred:year       0.033           -0.061        0.122         false
ulna            male    comp:year       -0.025          -0.113        0.068         false
ulna            male    pred:comp:year  0.017           -0.114        0.155         false
toe III         female  pred:year       0.003           -0.077        0.086         false
toe III         female  comp:year       0.006           -0.067        0.089         false
toe III         female  pred:comp:year  0.004           -0.104        0.114         false
toe III         male    pred:year       -0.026          -0.119        0.068         false
toe III         male    comp:year       -0.070          -0.157        0.017         false
toe III         male    pred:comp:year  0.101           -0.029        0.239         false
fore limb       female  pred:year       -0.019          -0.264        0.229         false
fore limb       female  comp:year       -0.124          -0.347        0.105         false
fore limb       female  pred:comp:year  -0.001          -0.314        0.324         false
fore limb       male    pred:year       0.044           -0.229        0.328         false
fore limb       male    comp:year       -0.087          -0.340        0.166         false
fore limb       male    pred:comp:year  0.008           -0.381        0.400         false
pelvic width    female  pred:year       0.050           -0.022        0.126         false
pelvic width    female  comp:year       -0.019          -0.089        0.051         false
pelvic width    female  pred:comp:year  -0.013          -0.108        0.082         false
pelvic width    male    pred:year       -0.013          -0.083        0.056         false
pelvic width    male    comp:year       -0.034          -0.104        0.029         false
pelvic width    male    pred:comp:year  -0.006          -0.100        0.091         false
femur           female  pred:year       0.072           -0.043        0.201         false
femur           female  comp:year       0.005           -0.109        0.122         false
femur           female  pred:comp:year  -0.185          -0.352        -0.026        true
femur           male    pred:year       0.163           0.006         0.320         true
femur           male    comp:year       0.018           -0.131        0.152         false
femur           male    pred:comp:year  -0.119          -0.346        0.092         false
tibia           female  pred:year       0.020           -0.100        0.141         false
tibia           female  comp:year       -0.015          -0.124        0.098         false
tibia           female  pred:comp:year  -0.050          -0.206        0.110         false
tibia           male    pred:year       0.066           -0.060        0.196         false
tibia           male    comp:year       0.058           -0.069        0.172         false
tibia           male    pred:comp:year  -0.098          -0.267        0.099         false
toe IV          female  pred:year       -0.006          -0.205        0.188         false
toe IV          female  comp:year       0.054           -0.135        0.234         false
toe IV          female  pred:comp:year  -0.134          -0.407        0.106         false
toe IV          male    pred:year       0.132           -0.071        0.344         false
toe IV          male    comp:year       0.012           -0.181        0.215         false
toe IV          male    pred:comp:year  -0.164          -0.454        0.124         false
hind limb       female  pred:year       0.090           -0.299        0.538         false
hind limb       female  comp:year       0.092           -0.322        0.474         false
hind limb       female  pred:comp:year  -0.390          -0.946        0.167         false
hind limb       male    pred:year       0.361           -0.088        0.847         false
hind limb       male    comp:year       0.136           -0.278        0.575         false
hind limb       male    pred:comp:year  -0.407          -1.063        0.235         false
scale fore toe  female  pred:year       -0.008          -0.324        0.325         false
scale fore toe  female  comp:year       -0.074          -0.368        0.246         false
scale fore toe  female  pred:comp:year  -0.020          -0.411        0.398         false
scale fore toe  male    pred:year       -0.108          -0.478        0.235         false
scale fore toe  male    comp:year       -0.398          -0.734        -0.053        true
scale fore toe  male    pred:comp:year  0.444           -0.041        0.913         false
scale hind toe  female  pred:year       -0.310          -0.638        0.043         false
scale hind toe  female  comp:year       -0.102          -0.436        0.213         false
scale hind toe  female  pred:comp:year  0.124           -0.306        0.555         false
scale hind toe  male    pred:year       -0.639          -1.008        -0.293        true
scale hind toe  male    comp:year       -0.406          -0.754        -0.082        true
scale hind toe  male    pred:comp:year  0.806           0.321         1.291         true

Table A.19: Results for LMMs of the effects of predators, competitors, and their interaction on all univariate pheno- types for A. sagrei males and females.

alias   island   Ne
CON1    936      109
CON2    ANDREW   104
CON3    332      73
CON4    5        90
PRED1   930      98
PRED2   WBC      129
PRED3   314      79
PRED4   204      44
COMP1   311      48
COMP2   6        89
COMP3   931      97
COMP4   305      98
ALL1    926      145
ALL2    922      78
ALL3    1        88
ALL4    312      107

Table A.20: Estimates of variance effective population size, Ne, for each island.

[Figure A.18: map. Latitude 24.150 to 24.225; longitude −76.525 to −76.425; scale bar 0 to 4 km.]

Figure A.18: The 16 experimental islands near Staniel Cay in the Exuma Chain of the Bahamas. Islands are coloured by treatment type as in the main text: grey, control islands; blue, predator islands; green, competitor islands; red, all islands.

Figure A.19: An example X-ray image of an Anolis sagrei male, labelled with the 13 skeletal traits we measured.

[Figure A.20: histogram. x-axis: Difference in magnitude (∆D); y-axis: count; legend (Islands compared): in different treatments, in same treatment.]

Figure A.20: Histogram of difference in magnitude, ∆D, for all possible pairwise comparisons. Colours indicate whether the comparison was between islands within the same treatment or between islands in different treatments.

[Figure A.21: heatmap. The values printed in the cells are reproduced below, with rows in the figure's top-to-bottom order; self-comparisons, marked ".", have no value. Colour scale: −0.25 to 0.25.]

         CON1   CON2   CON3   CON4  PRED1  PRED2  PRED3  PRED4  COMP1  COMP2  COMP3  COMP4   ALL1   ALL2   ALL3   ALL4
ALL4     0.07   0.01   0.15  −0.14   0.08  −0.08   0.01   −0.1   0.03      0   0.08  −0.03  −0.05   0.01   0.02      .
ALL3    −0.04   0.11   0.04  −0.01   0.06    0.2  −0.05  −0.03   0.18   0.22   0.06   0.09   0.06   0.11      .   0.02
ALL2     0.13  −0.04   0.04  −0.09   0.06  −0.06   0.07   0.13   −0.1  −0.19  −0.24   0.02  −0.09      .   0.11   0.01
ALL1    −0.28  −0.21  −0.29   0.15   0.19   0.09  −0.04  −0.12  −0.07  −0.03  −0.35   0.01      .  −0.09   0.06  −0.05
COMP4     0.1   0.08      0   0.22      0   0.27  −0.14   0.12  −0.04   0.06  −0.02      .   0.01   0.02   0.09  −0.03
COMP3    0.34   0.34   0.37      0  −0.24  −0.02  −0.04   0.01   0.19    0.1      .  −0.02  −0.35  −0.24   0.06   0.08
COMP2    0.13   0.26  −0.02   0.16  −0.12   0.49  −0.22  −0.04   0.34      .    0.1   0.06  −0.03  −0.19   0.22      0
COMP1    0.18   0.32   0.08   0.03  −0.11   0.43  −0.25  −0.01      .   0.34   0.19  −0.04  −0.07   −0.1   0.18   0.03
PRED4    0.08   0.06   0.14  −0.06  −0.01   0.05   0.14      .  −0.01  −0.04   0.01   0.12  −0.12   0.13  −0.03   −0.1
PRED3   −0.17  −0.26   0.07  −0.05  −0.01  −0.36      .   0.14  −0.25  −0.22  −0.04  −0.14  −0.04   0.07  −0.05   0.01
PRED2    0.19   0.44   0.07   0.34  −0.21      .  −0.36   0.05   0.43   0.49  −0.02   0.27   0.09  −0.06    0.2  −0.08
PRED1   −0.13  −0.18  −0.07  −0.04      .  −0.21  −0.01  −0.01  −0.11  −0.12  −0.24      0   0.19   0.06   0.06   0.08
CON4    −0.04   0.08  −0.14      .  −0.04   0.34  −0.05  −0.06   0.03   0.16      0   0.22   0.15  −0.09  −0.01  −0.14
CON3     0.22    0.2      .  −0.14  −0.07   0.07   0.07   0.14   0.08  −0.02   0.37      0  −0.29   0.04   0.04   0.15
CON2     0.32      .    0.2   0.08  −0.18   0.44  −0.26   0.06   0.32   0.26   0.34   0.08  −0.21  −0.04   0.11   0.01
CON1        .   0.32   0.22  −0.04  −0.13   0.19  −0.17   0.08   0.18   0.13   0.34    0.1  −0.28   0.13  −0.04   0.07

Figure A.21: Heatmap of the pairwise correlation in allele frequency change for SNPs in the 5% tail of allele frequency change in either island. Shaded colours indicate the strength and direction of correlation for each pairwise comparison (orange if positive, blue if negative, white if 0). The Pearson correlation coefficient for the top 5% of outlier SNPs, r5%, is shown in each cell, printed in black if statistically significant and in grey if non-significant. Islands are grouped by treatment (shown with dotted lines), and the within-treatment comparisons are in the solid boxes along the diagonal.
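As an illustration of the statistic in Figure A.21, the following is a minimal Python sketch of how a value like r5% could be computed for one pair of islands. It is not the analysis code used in this thesis. The function names (`pearson`, `tail_cutoff`, `outlier_correlation`) are hypothetical, and the sketch assumes that the "5% tail" is defined by the absolute magnitude of allele frequency change and that both islands are scored at the same SNPs in the same order.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tail_cutoff(changes, tail):
    """Smallest |change| that still falls in the top `tail` fraction of SNPs."""
    ranked = sorted((abs(c) for c in changes), reverse=True)
    n_keep = max(1, math.ceil(tail * len(ranked)))
    return ranked[n_keep - 1]

def outlier_correlation(delta_a, delta_b, tail=0.05):
    """Correlate allele frequency change over SNPs that are outliers in either island.

    Assumption (not stated in the thesis): outliers are ranked by absolute change.
    Returns the correlation and the number of SNPs retained.
    """
    if len(delta_a) != len(delta_b):
        raise ValueError("both islands must be scored at the same SNPs")
    cut_a = tail_cutoff(delta_a, tail)
    cut_b = tail_cutoff(delta_b, tail)
    # Keep a SNP if it is in the tail for island A or for island B.
    pairs = [(a, b) for a, b in zip(delta_a, delta_b)
             if abs(a) >= cut_a or abs(b) >= cut_b]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)
```

Calling `outlier_correlation(delta_a, delta_b)` on two lists of per-SNP allele frequency change returns the correlation and the number of SNPs it was computed over. The sketch does not guard against the degenerate case in which the retained SNPs have zero variance in one island.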

[Figure A.22: two scatter plots. Panel A: x-axis, Phenotypic parallelism, magnitude (∆D); y-axis, Genetic parallelism, r5%. Panel B: x-axis, Phenotypic parallelism, direction (θ); y-axis, Genetic parallelism, r5%.]

Figure A.22: Correlations between measures of parallelism at the phenotypic and genetic levels. A, Correlation between parallelism in magnitude of phenotypic evolution, ∆D, and parallelism in allele frequency change for SNPs in the 5% tail of allele frequency change, r5%. B, Correlation between parallelism in direction of phenotypic evolution, θ, and parallelism in allele frequency change, r5%. Each point is a single pairwise comparison between islands. Orange lines representing linear trends are for visualization purposes only; our statistical analysis examined correlations.

Bibliography

Abramowitz, M. and I. A. Stegun (Eds.) (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (10th ed.). Applied Mathematics Series 55. New York, USA: US Department of Commerce, National Bureau of Standards.

Adams, D. C. and M. L. Collyer (2009). A general framework for the analysis of phenotypic trajectories in evolutionary studies. Evolution 63(5), 1143–1154.

Alexandrino, J., S. J. E. Baird, L. Lawson, J. R. Macey, C. Moritz, and D. B. Wake (2005). Strong selection against hybrids at a hybrid zone in the Ensatina ring species complex and its evolutionary implications. Evolution 59(6), 1334–1347.

Anderson, J. T., C.-R. Lee, and T. Mitchell-Olds (2014). Strong selection genome-wide enhances fitness trade-offs across environments and episodes of selection. Evolution 68(1), 16–31.

Anderson, J. T., C.-R. Lee, C. A. Rushworth, R. I. Colautti, and T. Mitchell-Olds (2013). Genetic trade-offs and conditional neutrality contribute to local adaptation. Molecular Ecology 22(3), 699–708.

Baquero, F. and J. Blázquez (1997). Evolution of antibiotic resistance. Trends in Ecology & Evolution 12(12), 482–487.

Barrett, R. D. H. and A. P. Hendry (2012). Evolutionary rescue under environmental change? In U. Candolin and B. B. Wong (Eds.), Behavioural Responses to a Changing World, pp. 216–233. Oxford: Oxford University Press.

Barrett, R. D. H. and H. E. Hoekstra (2011). Molecular spandrels: tests of adaptation at the genetic level. Nature Reviews Genetics 12(11), 767–780.

Barrett, R. D. H., S. Laurent, R. Mallarino, S. P. Pfeifer, C. C. Y. Xu, M. Foll, K. Wakamatsu, J. S. Duke-Cohan, J. D. Jensen, and H. E. Hoekstra (2019). Linking a mutation to survival in wild mice. Science 363(6426), 499–504.

Barrett, R. D. H., L. K. M’Gonigle, and S. P. Otto (2006). The distribution of beneficial mutant effects under strong selection. Genetics 174(4), 2071–2079.

Barrett, R. D. H., S. M. Rogers, and D. Schluter (2008). Natural selection on a major armor gene in threespine stickleback. Science 322(5899), 255–257.

Barrett, R. D. H. and D. Schluter (2008). Adaptation from standing genetic variation. Trends in Ecology & Evolution 23(1), 38–44.

Barton, N. H. (1983). Multilocus clines. Evolution 37(3), 454–471.

Barton, N. H. and K. S. Gale (1993). Genetic analysis of hybrid zones. In R. G. Harrison (Ed.), Hybrid zones and the evolutionary process, pp. 13–45. New York: Oxford University Press.

Barton, N. H. and G. M. Hewitt (1985). Analysis of hybrid zones. Annual Review of Ecology and Systematics 16(1), 113–148.

Barton, N. H. and M. R. Servedio (2015). The interpretation of selection coefficients. Evolution 69(5), 1101–1112.

Bastide, H., A. Betancourt, V. Nolte, R. Tobler, P. Stöbe, A. Futschik, and C. Schlotterer (2013). A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genetics 9(6), e1003534.

Beatty, J. (2006). Replaying life’s tape. Journal of Philosophy 103(7), 336–362.

Bell, G. (2013). Evolutionary rescue and the limits of adaptation. Philosophical Transactions of the Royal Society B: Biological Sciences 368(1610), 20120080.

Benson, W. W. (1972). Natural selection for Mullerian mimicry in Heliconius erato in Costa Rica. Science 176(4037), 936–939.

Bérénos, C., P. A. Ellis, J. G. Pilkington, S. H. Lee, J. Gratten, and J. M. Pemberton (2015). Heterogeneity of genetic architecture of body size traits in a free-living population. Molecular Ecology 24(8), 1810–1830.

Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv, 1–60.

Beuttell, K. and J. B. Losos (1999). Ecological morphology of Caribbean anoles. Herpetological Monographs 13, 1–28.

Blount, Z. D., R. E. Lenski, and J. B. Losos (2018). Contingency and determinism in evolution: Replaying life’s tape. Science 362(6415), eaam5979.

Blum, M. J. (2002). Rapid movement of a Heliconius hybrid zone: evidence for phase III of Wright’s shifting balance theory? Evolution 56(10), 1992–1998.

Blum, M. J. (2008). Ecological and genetic associations across a Heliconius hybrid zone. Journal of Evolutionary Biology 21(1), 330–341.

Bolnick, D. I., R. D. H. Barrett, K. B. Oke, D. J. Rennison, and Y. E. Stuart (2018). (Non)parallel evolution. Annual Review of Ecology, Evolution, and Systematics 49(1), 303–330.

Bonneaud, C., E. Marnocha, A. Herrel, B. Vanhooydonck, D. J. Irschick, and T. B. Smith (2015). Developmental plasticity affects sexual size dimorphism in an anole lizard. Functional Ecology 30(2), 235–243.

Bourret, V., M. Dionne, and L. Bernatchez (2014). Detecting genotypic changes associated with selective mortality at sea in Atlantic salmon: polygenic multilocus analysis surpasses genome scan. Molecular Ecology 23(18), 4444–4457.

Bradburd, G. S., P. L. Ralph, and G. M. Coop (2013). Disentangling the effects of geographic and ecological isolation on genetic differentiation. Evolution 67(11), 3258–3273.

Brodie, III, E. D., A. J. Moore, and F. J. Janzen (1995). Visualizing and quantifying natural selection. Trends in Ecology & Evolution 10(8), 313–318.

Buffalo, V. and G. Coop (2019). The linked selection signature of rapid adaptation in temporal genomic data. bioRxiv, 1–58.

Buggs, R. J. A. (2007). Empirical study of hybrid zone movement. Heredity 99(3), 301–312.

Burke, M. K., J. P. Dunham, P. Shahrestani, K. R. Thornton, M. R. Rose, and A. D. Long (2010). Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467(7315), 587–590.

Butler, M. A. and J. B. Losos (2002). Multivariate sexual dimorphism, sexual selection, and adaptation in Greater Antillean Anolis lizards. Ecological Monographs 72(4), 541–559.

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software 80(1), 1–28.

Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal 10(1), 395–411.

Calsbeek, R. (2009). Experimental evidence that competition and habitat use shape the individual fitness surface. Journal of Evolutionary Biology 22(1), 97–108.

Calsbeek, R. and T. B. Smith (2007). Probing the adaptive landscape using experimental islands: density-dependent natural selection on lizard body size. Evolution 61(5), 1052–1061.

Carlson, T. N. and D. A. Ripley (1997). On the relation between NDVI, fractional vegetation cover, and leaf area index. Remote Sensing of Environment 62(3), 241–252.

Carpenter, B., A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell (2017). Stan: a probabilistic programming language. Journal of Statistical Software 76(1), 1–32.

Carroll, S. P. (2007). Natives adapting to invasive species: ecology, genes, and the sustainability of conservation. Ecological Research 22(6), 892–901.

Carroll, S. P., P. S. Jørgensen, M. T. Kinnison, C. T. Bergstrom, R. F. Denison, P. Gluckman, T. B. Smith, S. Y. Strauss, and B. E. Tabashnik (2014). Applying evolutionary biology to address global challenges. Science 346(6207), 1245993.

Catchen, J., P. A. Hohenlohe, S. Bassham, A. Amores, and W. A. Cresko (2013). Stacks: an analysis tool set for population genomics. Molecular Ecology 22(11), 3124–3140.

Catchen, J. M., A. Amores, P. Hohenlohe, W. Cresko, and J. H. Postlethwait (2011). Stacks: building and genotyping loci de novo from short-read sequences. G3 1(3), 171–182.

Chang, C. C., C. C. Chow, L. C. Tellier, S. Vattikuti, S. M. Purcell, and J. J. Lee (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7.

Charlesworth, B., M. T. Morgan, and D. Charlesworth (1993). The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303.

Charlesworth, B. and S. I. Wright (2004). The HKA test revisited: a maximum-likelihood-ratio test of the standard neutral model. Genetics 168(2), 1071–1076.

Chen, H. and M. Slatkin (2013). Inferring selection intensity and allele age from multilocus haplotype structure. G3 3(8), 1429–1442.

Chouteau, M., M. Arias, and M. Joron (2016). Warning signals are under positive frequency-dependent selection in nature. Proceedings of the National Academy of Sciences 113, 2164–2169.

Collins, S., J. De Meaux, and C. Acquisti (2007). Adaptive walks toward a moving optimum. Genetics 176(2), 1089–1099.

Collyer, M. L. and D. C. Adams (2007). Analysis of two-state multivariate phenotypic change in ecological studies. Ecology 88(3), 683–692.

Comeault, A. A., V. Soria-Carrasco, Z. Gompert, T. E. Farkas, C. A. Buerkle, T. L. Parchman, and P. Nosil (2014). Genome-wide association mapping of phenotypic traits subject to a range of intensities of natural selection in Timema cristinae. The American Naturalist 183(5), 711–727.

Cook, L. M. (2003). The rise and fall of the carbonaria form of the peppered moth. The Quarterly Review of Biology 78(4), 399–417.

Cox, R. M. and R. Calsbeek (2010). Sex-specific selection and intraspecific variation in sexual size dimorphism. Evolution 64(3), 798–809.

Danecek, P., A. Auton, G. R. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, and 1000 Genomes Project Analysis Group (2011). The variant call format and VCFtools. Bioinformatics 27(15), 2156–2158.

Danecek, P., S. Schiffels, and R. Durbin (2014). Multiallelic calling model in bcftools (-m).

Darwin, C. and A. Wallace (1858). On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection. Zoological Journal of the Linnean Society 3(9), 45–62.

Dasmahapatra, K. K., M. J. Blum, A. Aiello, S. Hackwell, N. Davies, E. P. Bermingham, and J. Mallet (2002). Inferences from a rapidly moving hybrid zone. Evolution 56(4), 741–753.

Delahaie, B., J. Cornuault, C. Masson, J. A. M. Bertrand, Y. X. C. Bourgeois, B. Milá, and C. Thébaud (2017). Narrow hybrid zones in spite of very low population differentiation in neutral markers in an island bird species complex. Journal of Evolutionary Biology 30(12), 2132–2145.

Delignette-Muller, M. L. and C. Dutang (2015). fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software 64(4), 1–34.

Derryberry, E. P., G. E. Derryberry, J. M. Maley, and R. T. Brumfield (2014). HZAR: hybrid zone analysis using an R software package. Molecular Ecology Resources 14(3), 652–663.

Ebert, D., C. Haag, M. Kirkpatrick, M. Riek, J. W. Hottinger, and V. I. Pajunen (2002). A selective advantage to immigrant genes in a Daphnia metapopulation. Science 295(5554), 485–488.

Ellstrand, N. C. (1983). Why are juveniles smaller than their parents? Evolution 37(5), 1091–1094.

Elstrott, J. and D. J. Irschick (2004). Evolutionary correlations among morphology, habitat use and clinging performance in Caribbean Anolis lizards. Biological Journal of the Linnean Society 83(3), 389–398.

Endler, J. A. (1986). Natural Selection in the Wild. Princeton University Press.

Estes, S. and S. J. Arnold (2007). Resolving the paradox of stasis: models with stabilizing selection explain evolutionary divergence on all timescales. The American Naturalist 169(2), 227–244.

Excoffier, L., T. Hofer, and M. Foll (2009). Detecting loci under selection in a hierarchically structured population. Heredity 103(4), 285–298.

Exposito-Alonso, M., 500 Genomes Field Experiment Team, H. A. Burbano, O. Bossdorf, R. Nielsen, and D. Weigel (2019). Natural selection on the Arabidopsis thaliana genome in present and future climates. Nature 348, 571–575.

Eyre-Walker, A. and P. D. Keightley (2007). The distribution of fitness effects of new mutations. Nature Reviews Genetics 8(8), 610–618.

Feldman, C. R., A. M. Durso, C. T. Hanifin, M. E. Pfrender, P. K. Ducey, A. N. Stokes, K. E. Barnett, E. D. Brodie III, and E. D. Brodie Jr (2016). Is there more than one way to skin a newt? Convergent toxin resistance in snakes is not due to a common genetic mechanism. Heredity 116(1), 84–91.

Fijarczyk, A. and W. Babik (2015). Detecting balancing selection in genomes: limits and prospects. Molecular Ecology 24(14), 3529–3545.

Fisher, R. A. and E. B. Ford (1947). The spread of a gene in natural conditions in a colony of the moth Panaxia dominula L. Heredity 1(2), 143–174.

Foll, M., H. Shim, and J. D. Jensen (2015). WFABC: a Wright-Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Molecular Ecology Resources 15(1), 87–98.

Forbes, C., M. Evans, N. Hastings, and B. Peacock (Eds.) (2011). Statistical Distributions (4th ed.). Hoboken, USA: John Wiley & Sons, Inc.

Fugère, V. and A. P. Hendry (2018). Human influences on the strength of phenotypic selection. Proceedings of the National Academy of Sciences 16, 201806013.

Garrigan, D. and P. W. Hedrick (2003). Perspective: detecting adaptive molecular polymorphism: lessons from the MHC. Evolution 57(8), 1707–1722.

Gay, L., P. A. Crochet, D. A. Bell, and T. Lenormand (2008). Comparing clines on molecular and phenotypic traits in hybrid zones: a window on tension zone models. Evolution 62(11), 2789–2806.

Gelman, A. and J. Hill (2007). Data analysis using regression and multilevel/hierarchical models. New York, USA: Cambridge University Press.

Gelman, A. and D. B. Rubin (1992). Inference from iterative simulation using multiple sequences. Statistical Science 7(4), 457–472.

Gerbault, P., C. Moret, M. Currat, and A. Sanchez-Mazas (2009). Impact of selection and demography on the diffusion of lactase persistence. PLoS ONE 4(7), e6369.

Gingerich, P. D. (1983). Rates of evolution: effects of time and temporal scaling. Science 222(4620), 159–161.

Glossip, D. and J. B. Losos (1997). Ecological correlates of number of subdigital lamellae in Anoles. Herpetologica 53(2), 192–199.

Gompert, Z. (2016). Bayesian inference of selection in a heterogeneous environment from genetic time-series data. Molecular Ecology 25(1), 121–134.

Gompert, Z. and C. A. Buerkle (2012). bgc: Software for Bayesian estimation of genomic clines. Molecular Ecology Resources 12(6), 1168–1176.

Gompert, Z., A. A. Comeault, T. E. Farkas, J. L. Feder, T. L. Parchman, C. A. Buerkle, and P. Nosil (2014). Experimental evidence for ecological selection on genome variation in the wild. Ecology Letters 17(3), 369–379.

Gompert, Z., S. P. Egan, R. D. H. Barrett, J. L. Feder, and P. Nosil (2017). Multilocus approaches for the measurement of selection on correlated genetic loci. Molecular Ecology 26(1), 365–382.

Gompert, Z., E. G. Mandeville, and C. A. Buerkle (2017). Analysis of population genomic data from hybrid zones. Annual Review of Ecology, Evolution, and Systematics 48(1), 207–229.

Good, B. H., M. J. McDonald, J. E. Barrick, R. E. Lenski, and M. M. Desai (2017). The dynamics of molecular evolution over 60,000 generations. Nature 551, 45–50.

Goodwin, S., J. D. McPherson, and W. R. McCombie (2016). Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics 17(6), 333–351.

Gould, B. A. and J. R. Stinchcombe (2017). Population genomic scans suggest novel genes underlie convergent flowering time evolution in the introduced range of Arabidopsis thaliana. Molecular Ecology 26(1), 92–106.

Gould, S. J. (1989). Wonderful life: the Burgess Shale and the nature of history. New York, USA: W.W. Norton & Company.

Gould, S. J. and R. C. Lewontin (1979). The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proceedings of the Royal Society B 205, 581–598.

Gurevitch, J. and L. V. Hedges (1999). Statistical issues in ecological meta-analyses. Ecology 80(4), 1142–1149.

Hadfield, J. D. (2010). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software 33(2), 1–22.

Haldane, J. B. S. (1948). The theory of a cline. Journal of Genetics 48(3), 277–284.

Haller, B. C. and A. P. Hendry (2014). Solving the paradox of stasis: squashed stabilizing selection and the limits of detection. Evolution 68(2), 483–500.

Halsey, L. G., D. Curran-Everett, S. L. Vowler, and G. B. Drummond (2015). The fickle P value generates irreproducible results. Nature Methods 12(3), 179–185.

Hansen, M. C., P. V. Potapov, R. Moore, M. Hancher, S. A. Turubanova, A. Tyukavina, D. Thau, S. V. Stehman, S. J. Goetz, T. R. Loveland, A. Kommareddy, A. Egorov, L. Chini, C. O. Justice, and J. R. G. Townshend (2013). High-resolution global maps of 21st-century forest cover change. Science 342(6160), 850–853.

Hansen, T. F. (2006). The evolution of genetic architecture. Annual Review of Ecology, Evolution, and Systematics 37(1), 123–157.

Harrison, R. G. (1990). Hybrid zones: windows on evolutionary process. In D. J. Futuyma and J. Antonovics (Eds.), Oxford Surveys in Evolutionary Biology, pp. 69–128. New York: Oxford University Press.

Hartl, D. L. and A. G. Clark (1997). Principles of Population Genetics (3rd ed.). Sunderland, MA: Sinauer Associates, Inc.

Heckel, D. G. and J. Roughgarden (1979). A technique for estimating the size of lizard populations. Ecology 60(5), 966–975.

Hedrick, P. W. (2012). What is the evidence for heterozygote advantage selection? Trends in Ecology & Evolution 27(12), 698–704.

Hendry, A. P. and A. Gonzalez (2008). Whither adaptation? Biology & Philosophy 23(5), 673–699.

Hendry, A. P., D. J. Schoen, M. E. Wolak, and J. M. Reid (2018). The contemporary evolution of fitness. Annual Review of Ecology, Evolution, and Systematics 49(1), 457–476.

Hereford, J. (2009). A quantitative survey of local adaptation and fitness trade-offs. The American Naturalist 173(5), 579–588.

Hereford, J., T. F. Hansen, and D. Houle (2004). Comparing strengths of directional selection: how strong is strong? Evolution 58(10), 2133–2143.

Hermisson, J. and P. S. Pennings (2005). Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169(4), 2335–2352.

Herrel, A., R. Joachim, B. Vanhooydonck, and D. J. Irschick (2006). Ecological consequences of ontogenetic changes in head shape and bite performance in the Jamaican lizard Anolis lineatopus. Biological Journal of the Linnean Society 89(3), 443–454.

Hijmans, R. J. (2019). raster: Geographic data analysis and modeling. R package version 2.8-19.

Hoekstra, H. E., K. E. Drumm, and M. W. Nachman (2004). Ecological genetics of adaptive color polymorphism in pocket mice: geographic variation in selected and neutral genes. Evolution 58(6), 1329–1341.

Hoekstra, H. E., J. M. Hoekstra, D. Berrigan, S. N. Vignieri, A. Hoang, C. E. Hill, P. Beerli, and J. G. Kingsolver (2001). Strength and tempo of directional selection in the wild. Proceedings of the National Academy of Sciences 98(16), 9157–9160.

Hoffman, M. D. and A. Gelman (2011). The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. arXiv.org.

Hohenlohe, P. A. and W. A. Cresko (2010). Using population genomics to detect selection in natural populations: key concepts and methodological considerations. International Journal of Plant Sciences 171(9), 1059–1071.

Houle, D., D. R. Govindaraju, and S. Omholt (2010). Phenomics: the next challenge. Nature Reviews Genetics 11(12), 855–866.

Hunter, E. A., M. D. Matocq, P. J. Murphy, and K. T. Shoemaker (2017). Differential effects of climate on survival rates drive hybrid zone movement. Current Biology 27(24), 3898–3903.

Izrailev, S. (2014). tictoc: Functions for timing R scripts, as well as implementations of stack and list structures. R package version 1.0.

Jorde, P. E. and N. Ryman (2007). Unbiased estimator for genetic drift and effective population size. Genetics 177(2), 927–935.

Kapan, D. D. (2001). Three-butterfly system provides a field test of müllerian mimicry. Nature 409(6818), 338–340.

Kawecki, T. J. and D. Ebert (2004). Conceptual issues in local adaptation. Ecology Letters 7(12), 1225–1241.

Kawecki, T. J., R. E. Lenski, D. Ebert, B. Hollis, I. Olivieri, and M. C. Whitlock (2012). Experimental evolution. Trends in Ecology & Evolution 27(10), 547–560.

Kettlewell, H. (1958). A survey of the frequencies of Biston betularia (L.)(Lep.) and its melanic forms in Great Britain. Heredity 12(1), 51–72.

Kim, S. Y., K. E. Lohmueller, A. Albrechtsen, Y. Li, T. Korneliussen, G. Tian, N. Grarup, T. Jiang, G. Andersen, D. Witte, T. Jørgensen, T. Hansen, O. Pedersen, J. Wang, and R. Nielsen (2011). Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics 12(1), 231.

Kingsolver, J. G. and S. E. Diamond (2011). Phenotypic selection in natural populations: what limits directional selection? The American Naturalist 177(3), 346–357.

Kingsolver, J. G., S. E. Diamond, A. M. Siepielski, and S. M. Carlson (2012). Synthetic analyses of phenotypic selection in natural populations: lessons, limitations and future directions. Evolutionary Ecology 26(5), 1101–1118.

Kingsolver, J. G., H. E. Hoekstra, J. M. Hoekstra, D. Berrigan, S. N. Vignieri, C. E. Hill, A. Hoang, P. Gibert, and P. Beerli (2001). The strength of phenotypic selection in natural populations. The American Naturalist 157(3), 245–261.

Kinnison, M. T. and A. P. Hendry (2001). The pace of modern life II: from rates of contemporary microevolution to pattern and process. Genetica 112-113, 145–164.

Kolbe, J. J., M. Leal, T. W. Schoener, D. A. Spiller, and J. B. Losos (2012). Founder effects persist despite adaptive differentiation: a field experiment with lizards. Science 335(6072), 1086–1089.

Kopp, M. and J. Hermisson (2007). Adaptation of a quantitative trait to a moving optimum. Genetics 176(1), 715–719.

Kopp, M. and J. Hermisson (2009a). The genetic basis of phenotypic adaptation I: fixation of beneficial mutations in the moving optimum model. Genetics 182(1), 233–249.

Kopp, M. and J. Hermisson (2009b). The genetic basis of phenotypic adaptation II: the distribution of adaptive substitutions in the moving optimum model. Genetics 183(4), 1453–1476.

Korneliussen, T. S., A. Albrechtsen, and R. Nielsen (2014). ANGSD: Analysis of next generation sequencing data. BMC Bioinformatics 15(1), 356.

Kreitman, M. and H. Akashi (1995). Molecular evidence for natural selection. Annual Review of Ecology and Systematics 26(1), 403–422.

Kryazhimskiy, S., G. Tkacik, and J. B. Plotkin (2009). The dynamics of adaptation on correlated fitness landscapes. Proceedings of the National Academy of Sciences of the United States of America 106(44), 18638–18643.

Labbé, P., C. Berticat, A. Berthomieu, S. Unal, C. Bernard, M. Weill, and T. Lenormand (2007). Forty years of erratic insecticide resistance evolution in the mosquito Culex pipiens. PLoS Genetics 3(11), e205.

Lande, R. and S. J. Arnold (1983). The measurement of selection on correlated characters. Evolution 37(6), 1210–1226.

Lapiedra, O., T. W. Schoener, M. Leal, J. B. Losos, and J. J. Kolbe (2018). Predator-driven natural selection on risk-taking behavior in anole lizards. Science 360, 1017–1020.

Leaché, A. D., J. A. Grummer, R. B. Harris, and I. K. Breckheimer (2017). Evidence for concerted movement of nuclear and mitochondrial clines in a lizard hybrid zone. Molecular Ecology 26(8), 2306–2316.

Lee, Y. W., B. A. Gould, and J. R. Stinchcombe (2014). Identifying the genes underlying quantitative traits: a rationale for the QTN programme. AoB PLANTS 6, plu004.

Leimu, R. and M. Fischer (2008). A meta-analysis of local adaptation in plants. PLoS ONE 3(12), e4010.

Lenormand, T., D. Bourguet, T. Guillemaud, and M. Raymond (1999). Tracking the evolution of insecticide resistance in the mosquito Culex pipiens. Nature 400(6747), 861–864.

Lenormand, T., T. Guillemaud, D. Bourguet, and M. Raymond (1998). Evaluating gene flow using selected markers: a case study. Genetics 149(3), 1383–1392.

Lenski, R. E., M. R. Rose, S. C. Simpson, and S. C. Tadler (1991). Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. The American Naturalist 138(6), 1315–1341.

Les, A. M. and R. Powell (2014). Anolis smaragdinus. Catalogue of American Amphibians and Reptiles 902, 1–14.

Lewontin, R. C. and J. L. Hubby (1966). A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54, 595–609.

Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27(21), 2987–2993.

Li, H. and R. Durbin (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14), 1754–1760.

Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. R. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16), 2078–2079.

Li, J., H. Li, M. Jakobsson, S. Li, P. Sjodin, and M. Lascoux (2012). Joint analysis of demography and selection in population genetics: where do we stand and where could we go? Molecular Ecology 21(1), 28–44.

Linnen, C. R. and H. E. Hoekstra (2009). Measuring natural selection on genotypes and phenotypes in the wild. Cold Spring Harbor Symposia on Quantitative Biology 74, 155–168.

Losos, J. B. (1990a). Ecomorphology, performance capability, and scaling of West Indian Anolis lizards: an evolutionary analysis. Ecological Monographs 60(3), 369–388.

Losos, J. B. (1990b). The evolution of form and function: morphology and locomotor performance in West Indian Anolis lizards. Evolution 44(5), 1189–1203.

Losos, J. B. (2009). Lizards in an evolutionary tree. Ecology and adaptive radiation of anoles. Berkeley and Los Angeles, California: University of California Press.

Losos, J. B. (2011). Convergence, adaptation, and constraint. Evolution 65(7), 1827–1840.

Losos, J. B. and D. J. Irschick (1996). The effect of perch diameter on escape behaviour of Anolis lizards: laboratory predictions and field tests. Animal Behaviour 51, 593–602.

Losos, J. B., T. W. Schoener, R. B. Langerhans, and D. A. Spiller (2006). Rapid temporal reversal in predator-driven natural selection. Science 314(5802), 1111.

Losos, J. B., T. W. Schoener, and D. A. Spiller (2004). Predator-induced behaviour shifts and natural selection in field-experimental lizard populations. Nature 432(7016), 505–508.

Losos, J. B. and B. Sinervo (1989). The effects of morphology and perch diameter on sprint performance of Anolis lizards. Journal of Experimental Biology 145(1), 23–30.

Losos, J. B. and D. A. Spiller (1999). Differential colonization success and asymmetrical interactions between two lizard species. Ecology 80(1), 252–258.

Losos, J. B., K. I. Warheit, and T. W. Schoener (1997). Adaptive differentiation following experimental island colonization in Anolis lizards. Nature 387, 70–73.

Lucki, N. C. and C. W. Nicolay (2007). Phenotypic plasticity and functional asymmetry in response to grip forces exerted by intercollegiate tennis players. American Journal of Human Biology 19(4), 566–577.

Macholán, M., S. J. Baird, P. Munclinger, P. Dufková, B. Bímová, and J. Piálek (2008). Genetic conflict outweighs heterogametic incompatibility in the mouse hybrid zone? BMC Evolutionary Biology 8(1), 271–14.

Mallet, J. (1986). Hybrid zones of Heliconius butterflies in Panama and the stability and movement of warning colour clines. Heredity 56(2), 191–202.

Mallet, J. and N. Barton (1989a). Inference from clines stabilized by frequency-dependent selection. Genetics 122(4), 967–976.

Mallet, J., N. Barton, G. Lamas, J. Santisteban, M. Muedas, and H. Eeley (1990). Estimates of selection and gene flow from measures of cline width and linkage disequilibrium in Heliconius hybrid zones. Genetics 124(4), 921–936.

Mallet, J. and N. H. Barton (1989b). Strong natural selection in a warning-color hybrid zone. Evolution 43(2), 421–431.

Martin, G. and T. Lenormand (2006). A general multivariate extension of Fisher’s geometrical model and the distribution of mutation fitness effects across species. Evolution 60(5), 893–907.

Martin, G. and T. Lenormand (2015). The fitness effect of mutations across environments: Fisher’s geometrical model with multiple optima. Evolution 69(6), 1433–1447.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17(1), 10–12.

Mathieson, I. and G. McVean (2013). Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193, 973–984.

McElreath, R. (2015). Statistical rethinking: a Bayesian course with examples in R and Stan. Boca Raton, FL: CRC Press.

Meir, J. U. and P. J. Ponganis (2009). High-affinity hemoglobin and blood oxygen saturation in diving emperor penguins. Journal of Experimental Biology 212(20), 3330–3338.

Merrill, R. M., R. W. R. Wallbank, V. Bull, P. C. A. Salazar, J. Mallet, M. Stevens, and C. D. Jiggins (2012). Disruptive ecological selection on a mating cue. Proceedings of the Royal Society B: Biological Sciences 279(1749), 4907–4913.

Messer, P. W. and R. A. Neher (2012). Estimating the strength of selective sweeps from deep population diversity data. Genetics 191(2), 593–605.

Mettler, R. D. and G. M. Spellman (2009). A hybrid zone revisited: molecular and morphological analysis of the maintenance, movement, and evolution of a Great Plains avian (Cardinalidae: Pheucticus) hybrid zone. Molecular Ecology 18(15), 3256–3267.

Miller, M. J., S. E. Lipshutz, N. G. Smith, and E. Bermingham (2014). Genetic and phenotypic characterization of a hybrid zone between polyandrous Northern and Wattled Jacanas in western Panama. BMC Evolutionary Biology 14(1), 227.

Mitchell-Olds, T., J. H. Willis, and D. B. Goldstein (2007). Which evolutionary processes influence natural genetic variation for phenotypic traits? Nature Reviews Genetics 8(11), 845–856.

Morrissey, M. B. and J. D. Hadfield (2012). Directional selection in temporally replicated studies is remarkably consistent. Evolution 66(2), 435–442.

Nadeau, N. J., C. Pardo-Diaz, A. Whibley, M. A. Supple, S. V. Saenko, R. W. R. Wallbank, G. C. Wu, L. Maroja, L. Ferguson, J. J. Hanly, H. Hines, C. Salazar, R. M. Merrill, A. J. Dowling, R. H. ffrench Constant, V. Llaurens, M. Joron, W. O. McMillan, and C. D. Jiggins (2016). The gene cortex controls mimicry and crypsis in butterflies and moths. Nature 534(7605), 106–110.

Nei, M. (2005). Selectionism and neutralism in molecular evolution. Molecular Biology and Evolution 22(12), 2318–2342.

Nei, M., Y. Suzuki, and M. Nozawa (2010). The neutral theory of molecular evolution in the genomic era. Annual Review of Genomics and Human Genetics 11(1), 265–289.

Nielsen, R. (2009). Adaptionism—30 years after Gould and Lewontin. Evolution 63(10), 2487–2490.

Nosil, P. (2004). Reproductive isolation caused by visual predation on migrants between divergent environments. Proceedings of the Royal Society B: Biological Sciences 271(1547), 1521–1528.

Nosil, P. and B. J. Crespi (2006). Experimental evidence that predation promotes divergence in adaptive radiation. Proceedings of the National Academy of Sciences 103(24), 9090–9095.

Nosil, P., Z. Gompert, T. E. Farkas, A. A. Comeault, J. L. Feder, C. A. Buerkle, and T. L. Parchman (2012). Genomic consequences of multiple speciation processes in a stick insect. Proceedings of the Royal Society B: Biological Sciences 279(1749), 5058–5065.

Nsanzabana, C., I. M. Hastings, J. Marfurt, I. Müller, K. Baea, L. Rare, A. Schapira, I. Felger, B. Betschart, T. A. Smith, H. P. Beck, and B. Genton (2010). Quantifying the evolution and impact of antimalarial drug resistance: drug use, spread of resistance, and drug failure over a 12-year period in Papua New Guinea. The Journal of Infectious Diseases 201(3), 435–443.

Ohashi, J., I. Naka, J. Patarapotikul, H. Hananantachai, G. Brittenham, S. Looareesuwan, A. G. Clark, and K. Tokunaga (2004). Extended linkage disequilibrium surrounding the hemoglobin E variant due to malarial selection. The American Journal of Human Genetics 74(6), 1198–1208.

Ohashi, J., I. Naka, and N. Tsuchiya (2011). The impact of natural selection on an ABCC11 SNP determining earwax type. Molecular Biology and Evolution 28(1), 849–857.

Orengo, D. J. and M. Aguade (2007). Genome scans of variation and adaptive change: extended analysis of a candidate locus close to the phantom gene region in Drosophila melanogaster. Molecular Biology and Evolution 24(5), 1122–1129.

Orozco-terWengel, P., M. Kapun, V. Nolte, R. Kofler, T. Flatt, and C. Schlotterer (2012). Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Molecular Ecology 21(20), 4931–4941.

Orr, H. A. (1998). The population genetics of adaptation: the distribution of factors fixed during adaptive evolution. Evolution 52(4), 935–949.

Orr, H. A. (2003). The distribution of fitness effects among beneficial mutations. Genetics 163(4), 1519–1526.

Orr, H. A. (2005a). The genetic theory of adaptation: a brief history. Nature Reviews Genetics 6(2), 119–127.

Orr, H. A. (2005b). Theories of adaptation: what they do and don’t say. Genetica 123(1-2), 3–13.

Orr, H. A. (2009). Fitness and its role in evolutionary genetics. Nature Reviews Genetics 10(8), 531–539.

Owen, D. F. and C. A. Clarke (1993). The medionigra polymorphism in the moth, Panaxia dominula (Lepidoptera: Arctiidae): a critical re-assessment. Oikos 67(3), 393–402.

Palmer, A. R. (1999). Detecting publication bias in meta-analyses: a case study of fluctuating asymmetry and sexual selection. The American Naturalist 154(2), 220–233.

Pavlidis, P., J. D. Jensen, W. Stephan, and A. Stamatakis (2012). A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Molecular Biology and Evolution 29(10), 3237–3248.

Peterson, B. K., J. N. Weber, E. H. Kay, H. S. Fisher, and H. E. Hoekstra (2012). Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7(5), e37135.

Pettorelli, N., J. O. Vik, A. Mysterud, J.-M. Gaillard, C. J. Tucker, and N. C. Stenseth (2005). Using the satellite-derived NDVI to assess ecological responses to environmental change. Trends in Ecology & Evolution 20(9), 503–510.

Phillips, B. L., S. Baird, and C. Moritz (2004). When vicars meet: a narrow contact zone between morphologically cryptic phylogeographic lineages of the rainforest skink, Carlia rubrigularis. Evolution 58(7), 1536–1548.

Pinheiro, C. E. G. (1996). Palatablility and escaping ability in Neotropical butterflies: tests with wild kingbirds (Tyrannus melancholicus, Tyrannidae). Biological Journal of the Linnean Society 59(4), 351–365.

Pinheiro, C. E. G. (2011). On the evolution of warning coloration, Batesian and Müllerian mimicry in Neotropical butterflies: the role of jacamars (Galbulidae) and tyrant-flycatchers (Tyrannidae). Journal of Avian Biology 42(4), 277–281.

Porter, A. H., R. Wenger, H. Geiger, A. Scholl, and A. M. Shapiro (1997). The Pontia daplidice-edusa hybrid zone in northwestern Italy. Evolution 51(5), 1561–1573.

Prentis, P. J., J. R. U. Wilson, E. E. Dormontt, D. M. Richardson, and A. J. Lowe (2008). Adaptive evolution in invasive species. Trends in Plant Science 13(6), 288–294.

Pringle, R. M., T. R. Kartzinel, T. M. Palmer, T. J. Thurman, K. Fox-Dobbs, C. C. Y. Xu, M. C. Hutchinson, T. C. Coverdale, J. H. Daskin, D. A. Evangelista, K. M. Gotanda, N. A. Man in 't Veld, J. E. Wegener, J. J. Kolbe, T. W. Schoener, D. A. Spiller, J. B. Losos, and R. D. H. Barrett (2019). Predator-induced collapse of niche structure and species coexistence. Nature 570(7759), 58–64.

Pringle, R. M., D. M. Kimuyu, R. L. Sensenig, T. M. Palmer, C. Riginos, K. E. Veblen, and T. P. Young (2015). Synergistic effects of fire and elephants on arboreal animals in an African savanna. Journal of Animal Ecology 84(6), 1637–1645.

Purcell, S. M. and C. C. Chang (2017). Plink v1.90b4.6. www.cog-genomics.org/plink/1.9/.

QGIS Development Team (2018). QGIS Geographic Information System. Open Source Geospatial Foundation Project. http://qgis.osgeo.org.

Quesada, H., U. E. M. Ramírez, J. Rozas, and M. Aguadé (2003). Large-scale adaptive hitchhiking upon high recombination in Drosophila simulans. Genetics 165(2), 895–900.

Quinn, G. P. and M. J. Keough (2002). Experimental design and data analysis for biologists. Cambridge, UK: Cambridge University Press.

R Core Team (2013). R: A language and environment for statistical computing. https://www.R-project.org/.

R Core Team (2017). R: A language and environment for statistical computing. https://www.R-project.org/.

Rasband, W. (1997–2018). ImageJ. https://imagej.nih.gov/ij/.

Rego, A., F. J. Messina, and Z. Gompert (2019). Dynamics of genomic change during evolutionary rescue in the seed beetle Callosobruchus maculatus. Molecular Ecology 28, 2136–2154.

Reznick, D., H. Bryga, and J. A. Endler (1990). Experimentally induced life-history evolution in a natural population. Nature 346, 357–359.

Reznick, D. N., J. Losos, and J. Travis (2018). From low to high gear: there has been a paradigm shift in our understanding of evolution. Ecology Letters 51, 1742–12.

Rieseberg, L. H., A. Widmer, A. M. Arntz, and J. M. Burke (2002). Directional selection is the primary cause of phenotypic diversification. Proceedings of the National Academy of Sciences 99(19), 12242–12245.

Robinson, S. J., M. D. Samuel, C. J. Johnson, M. Adams, and D. I. McKenzie (2012). Emerging prion disease drives host selection in a wildlife population. Ecological Applications 22(3), 1050–1059.

Rockman, M. V. (2012). The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution 66(1), 1–17.

Rohland, N. and D. Reich (2012). Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Research 22(5), 939–946.

Roper, C., R. Pearce, B. Bredenkamp, J. Gumede, C. Drakeley, F. Mosha, D. Chandramohan, and B. Sharp (2003). Antifolate antimalarial resistance in southeast Africa: a population-based analysis. The Lancet 361(9364), 1174–1181.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin 86(3), 638–641.

Rosser, N., K. K. Dasmahapatra, and J. Mallet (2014). Stable Heliconius butterfly hybrid zones are correlated with a local rainfall peak at the edge of the Amazon basin. Evolution 68(12), 3470–3484.

Roy, J.-S., D. O’Connor, and D. M. Green (2012). Oscillation of an anuran hybrid zone: morphological evidence spanning 50 years. PLoS ONE 7(12), e52819.

Sandoval, C. P. (1994). The effects of the relative geographic scales of gene flow and selection on morph frequencies in the walking-stick Timema cristinae. Evolution 48(6), 1866–1879.

Santure, A. W. and D. Garant (2018). Wild GWAS — association mapping in natural populations. Molecular Ecology Resources 18(4), 729–738.

Santure, A. W., J. Poissant, I. De Cauwer, K. van Oers, M. R. Robinson, J. L. Quinn, M. A. M. Groenen, M. E. Visser, B. C. Sheldon, and J. Slate (2015). Replicated analysis of the genetic architecture of quantitative traits in two wild great tit populations. Molecular Ecology 24(24), 6148–6162.

Schielzeth, H. and A. Husby (2014). Challenges and prospects in genome-wide quantitative trait loci mapping of standing genetic variation in natural populations. Annals of the New York Academy of Sciences 1320(1), 35–57.

Schielzeth, H., A. Rios Villamil, and R. Burri (2018). Success and failure in replication of genotype-phenotype associations: How does replication help in understanding the genetic basis of phenotypic variation in outbred populations? Molecular Ecology Resources 18(4), 739–754.

Schlotterer, C., R. Kofler, E. Versace, R. Tobler, and S. U. Franssen (2015). Combining experimental evolution with next-generation sequencing: a powerful tool to study adaptation from standing genetic variation. Heredity 114(5), 431–440.

Schneider, C. A., W. S. Rasband, and K. W. Eliceiri (2012). NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9(7), 671–675.

Schoener, T. W., J. J. Kolbe, M. Leal, J. B. Losos, and D. A. Spiller (2017). A multigenerational field experiment on eco-evolutionary dynamics of the influential lizard Anolis sagrei: a mid-term report. Copeia 105(3), 543–549.

Schoener, T. W. and A. Schoener (1983). The time to extinction of a colonizing propagule of lizards increases with island area. Nature 332, 332–334.

Schoener, T. W., J. B. Slade, and C. H. Stinson (1982). Diet and sexual dimorphism in the very catholic lizard genus, Leiocephalus of the Bahamas. Oecologia 53(2), 160–169.

Schweyen, H., A. Rozenberg, and F. Leese (2014). Detection and removal of PCR duplicates in population genomic ddRAD studies by addition of a degenerate base region (DBR) in sequencing adapters. The Biological Bulletin 227(2), 146–160.

Seetharaman, S. and K. Jain (2013). Adaptive walks and distribution of beneficial fitness effects. Evolution 68(4), 965–975.

Sheppard, P. M., J. R. G. Turner, K. S. Brown, W. W. Benson, and M. C. Singer (1985). Genetics and the evolution of Muellerian mimicry in Heliconius butterflies. Philosophical Transactions of the Royal Society B: Biological Sciences 308(1137), 433–610.

Siepielski, A. M., J. D. DiBattista, and S. M. Carlson (2009). It’s about time: the temporal dynamics of phenotypic selection in the wild. Ecology Letters 12(11), 1261–1276.

Siepielski, A. M., J. D. DiBattista, J. A. Evans, and S. M. Carlson (2011). Differences in the temporal dynamics of phenotypic selection among fitness components in the wild. Proceedings of the Royal Society B: Biological Sciences 278(1711), 1572–1580.

Siepielski, A. M., K. M. Gotanda, M. B. Morrissey, S. E. Diamond, J. D. DiBattista, and S. M. Carlson (2013). The spatial patterns of directional phenotypic selection. Ecology Letters 16(11), 1382–1392.

Siepielski, A. M., M. B. Morrissey, M. Buoro, S. M. Carlson, C. M. Caruso, S. M. Clegg, T. Coulson, J. DiBattista, K. M. Gotanda, C. D. Francis, J. Hereford, J. G. Kingsolver, K. E. Augustine, L. E. B. Kruuk, R. A. Martin, B. C. Sheldon, N. Sletvold, E. I. Svensson, M. J. Wade, and A. D. C. MacColl (2017). Precipitation drives global variation in natural selection. Science 355(6328), 959–962.

Slate, J. (2004). Quantitative trait locus mapping in natural populations: progress, caveats and future directions. Molecular Ecology 14(2), 363–379.

Slatkin, M. (2008). A Bayesian method for jointly estimating allele age and selection intensity. Genetics Research 90(1), 129–137.

Smith, J. M. and J. Haigh (1974). The hitch-hiking effect of a favourable gene. Genetical Research 23, 23–35.

Stan Development Team (2018). RStan: the R interface to Stan.

Stuart, Y. E., T. S. Campbell, P. A. Hohenlohe, R. G. Reynolds, L. J. Revell, and J. B. Losos (2014). Rapid evolution of a native species following invasion by a congener. Science 346(6208), 463–466.

Stuart, Y. E., T. Veen, J. N. Weber, D. Hanson, M. Ravinet, B. K. Lohman, C. J. Thompson, T. Tasneem, A. Doggett, R. Izen, N. Ahmed, R. D. H. Barrett, A. P. Hendry, C. L. Peichel, and D. I. Bolnick (2017). Contrasting effects of environment and genetics generate a continuum of parallel evolution. Nature Ecology & Evolution 1(6), 0158.

Supple, M., R. Papa, B. Counterman, and W. O. McMillan (2014). The genomics of an adaptive radiation: insights across the Heliconius speciation continuum. In Ecological Genomics: Ecology and the Evolution of Genes and Genomes, pp. 249–271. Dordrecht: Springer Netherlands.

Szymura, J. M. and N. H. Barton (1986). Genetic analysis of a hybrid zone between the Fire-Bellied Toads, Bombina bombina and B. variegata, near Cracow in Southern Poland. Evolution 40(6), 1141–1159.

Taylor, S. M., A. Antonia, G. Feng, V. Mwapasa, E. Chaluluka, M. Molyneux, F. O. ter Kuile, S. J. Rogerson, and S. R. Meshnick (2012). Adaptive evolution and fixation of drug-resistant Plasmodium falciparum genotypes in pregnancy-associated malaria: 9-year results from the QuEERPAM study. Infection, Genetics and Evolution 12(2), 282–290.

Terhorst, J., C. Schlotterer, and Y. S. Song (2015). Multi-locus analysis of genomic time series data from experimental evolution. PLoS Genetics 11(4), e1005069.

Thurman, T. (2019). bahz: Bayesian analysis of hybrid zones. https://github.com/tjthurman/BAHZ. R package version 0.0.0.9011.

Thurman, T. J. and R. D. H. Barrett (2016). The genetic consequences of selection in natural populations. Molecular Ecology 25(7), 1429–1488.

Travisano, M., J. A. Mongold, A. F. Bennett, and R. E. Lenski (1995). Experimental tests of the roles of adaptation, chance, and history in evolution. Science 267(5194), 87–90.

Travisano, M. and R. G. Shaw (2013). Lost in the map. Evolution 67(2), 305–314.

Turchin, M. C., C. W. Chiang, C. D. Palmer, S. Sankararaman, D. Reich, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, and J. N. Hirschhorn (2012). Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics 44(9), 1015–1019.

Urban, M. C., G. Bocedi, A. P. Hendry, J. B. Mihoub, G. Peer, A. Singer, J. R. Bridle, L. G. Crozier, L. De Meester, W. Godsoe, A. Gonzalez, J. J. Hellmann, R. D. Holt, A. Huth, K. Johst, C. B. Krug, P. W. Leadley, S. C. F. Palmer, J. H. Pantel, A. Schmitz, P. A. Zollner, and J. M. J. Travis (2016). Improving the forecast for biodiversity under climate change. Science 353(6304), aad8466.

Van Belleghem, S. M., P. Rastas, A. Papanicolaou, S. H. Martin, C. F. Arias, M. A. Supple, J. J. Hanly, J. Mallet, J. J. Lewis, H. M. Hines, M. Ruiz, C. Salazar, M. Linares, G. R. P. Moreira, C. D. Jiggins, B. A. Counterman, W. O. McMillan, and R. Papa (2017). Complex modular architecture around a simple toolkit of wing pattern genes. Nature Ecology & Evolution 1, 0052.

Vendrami, D. L. J., L. Telesca, H. Weigand, M. Weiss, K. Fawcett, K. Lehman, M. S. Clark, F. Leese, C. McMinn, H. Moore, and J. I. Hoffman (2017). RAD sequencing resolves fine-scale population structure in a benthic invertebrate: implications for understanding phenotypic plasticity. Royal Society Open Science 4(2), 160548–17.

Vischer, N. and S. Nastase (2015). objectj. https://sils.fnwi.uva.nl/bcb/objectj/index.html.

Vitalis, R., M. Gautier, K. J. Dawson, and M. A. Beaumont (2014). Detecting and measuring selection from gene frequency data. Genetics 196(3), 799–817.

Vitti, J. J., S. R. Grossman, and P. C. Sabeti (2013). Detecting natural selection in genomic data. Annual Review of Genetics 47(1), 97–120.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and Widely Applicable Information Criterion in singular learning theory. Journal of Machine Learning Research 11, 3571–3594.

Whitlock, M. C. and D. Schluter (2009). The analysis of biological data. Greenwood Village, CO: Roberts and Co. Publishers.

Williams, E. E. (1972). The origins of faunas. Evolution of lizard congeners in a complex island fauna: a trial analysis. Evolutionary Biology 6, 47–89.

Wilson, D. J. (2019). The harmonic mean p-value for combining dependent tests. Proceedings of the National Academy of Sciences 104, 201814092.

Wilson, J. D., D. J. Schmidt, and J. M. Hughes (2016). Movement of a hybrid zone between lineages of the Australian glass shrimp (Paratya australiensis). Journal of Heredity 107(5), 413–422.

Xu, S. (2003). Theoretical basis of the Beavis effect. Genetics 165(4), 2259–2268.

Yeaman, S. and M. C. Whitlock (2011). The genetic architecture of adaptation under migration–selection balance. Evolution 65(7), 1897–1911.

Zhou, X., P. Carbonetto, and M. Stephens (2013). Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genetics 9(2), e1003264.

Zhou, X. and M. Stephens (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics 44(7), 821–824.
