The Evolution of Reproductive Divergence in the Sea

by Jennifer M. Sunday B.Sc., University of British Columbia, 2002

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

in the Department of Biological Sciences Faculty of Science

© Jennifer M. Sunday 2012 SIMON FRASER UNIVERSITY Fall 2012

All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced, without authorization, under the conditions for “Fair Dealing.” Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately.

Approval

Name: Jennifer M. Sunday

Degree: Doctor of Philosophy (Science)

Title of Thesis: The evolution of reproductive divergence in the sea

Examining Committee: Chair: Gordon Rintoul, Associate Professor

Michael Hart Senior Supervisor Associate Professor

Felix Breden Supervisor Associate Professor

Elizabeth Elle Internal Examiner Assistant Professor Department of Biological Sciences

Michael Hellberg External Examiner Associate Professor, Department of Biological Sciences Louisiana State University

Date Defended: October 3, 2012


Partial Copyright Licence


Abstract

Understanding how speciation occurs in the ocean is challenging because the high dispersal potential of marine larvae, and the scarcity of absolute physical barriers to their dispersal, suggest that gene flow should slow or prevent the evolution of divergence among populations. However, spatial heterogeneity in gene flow and localized sexual selection are two potential drivers of divergence among marine populations. Here I investigate how gene flow and sexual selection contribute to reproductive divergence in a coastal seastar with a long larval pelagic phase, Patiria miniata. I first use microsatellite markers to assess genetic population structure across the species range along the west coast of North America, and find a genetic disjunction near the central coast of British Columbia consistent with two hypotheses about the effects of historical climate events and contemporary gene flow. I next use an oceanographic dispersal model to assess the extent to which variation in larval dispersal can account for this structure. I find that oceanographic features predict the observed genetic structure better than dispersal distance between populations alone. Given this genetic structure, I next test the hypothesis that fertilization proteins of this broadcast spawner have diversified between the two (northern and southern) populations, as predicted under a hypothesis of sexual conflict. My findings reveal divergent, positive selection in the sperm protein bindin, which suggests that sexual selection has led to localized divergence at this fertilization compatibility gene. Finally, I test fertilization compatibility between males and females as a function of population source and male bindin genotype. I find that localized coevolution has occurred in the southern population, and that southern females have a greater affinity than northern females for male bindin genotypes found in the south.
Together, these findings provide evidence that patterns in larval dispersal and sexual selection can lead to reproductive divergence in a marine species in spite of its high dispersal potential. Characterizing both genetic structure and adaptive molecular evolution among populations is a powerful approach for understanding incipient speciation in the sea.

Keywords: Larval dispersal, population genetic structure, gamete compatibility, bindin, positive selection, sexual conflict, reproductive coevolution, reproductive divergence


I dedicate this work to my parents. My dad has taught me to be infinitely curious, persistently creative and never to stop learning, and my mom has taught me to aim high in all of my endeavours and to handle challenges with grace and integrity. They are both the most wonderful teachers and fantastic friends.


Acknowledgements

There are many people to acknowledge for the completion of this work. I thank first my senior supervisor, Mike Hart, who has been infinitely supportive and has provided tremendous guidance towards every chapter of my thesis from conception to completion. Thanks also to Felix Breden who has provided much guidance and advice in the design of projects, analyses, interpretation, and writing. Thanks to Carson Keever for a huge amount of field and laboratory assistance, and very helpful discussions throughout. Thanks also to Susana Patiño for critical support in bindin genotyping and many chats over data collection and interpretation, and to Iva Popovic for helpful discussions and edits. Thanks to each of my coauthors for their various contributions and assistance with writing. Thanks also to Isabelle Côté for guidance and support in early stages of my thesis work, and to my examining committee for helpful comments on my thesis.

I wish to thank all the members of FAB* lab for their fantastic guidance. This lab group is a great place to develop as a scientist, and I am extremely grateful for their dependable ability to help clarify ideas, often poignantly and immediately. Thanks especially to Bernie Crespi, Arne Mooers, and Jeffery Joy (and those aforementioned) in this regard.

Thanks to Shane Anderson, John and Vicki Pearse, Edmund Sunday, Scott Walker, Morgan Hocking, Joel Harding, Sophia and Olivia Dulvy, the staff at the Bamfield Marine Sciences Centre, and staff at Moresby Explorers for assistance in collection of sea stars.

I thank the Natural Sciences and Engineering Research Council of Canada, the Simon Fraser University Graduate Fellowship program, and the Garfield Weston Foundation/B.C. Packers for financial support.

I wish to thank my parents for their tremendous support and encouragement.

Finally, I thank my fantastic husband, Mike McDermid, for a million useful discussions, encouragement, comic relief, and above all his patience and undying support.


Table of Contents

Approval
Partial Copyright Licence
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures

Chapter 1. General Introduction
    Thesis chapter overview and author contributions
    References

Chapter 2. Discordant distribution of populations and genetic variation in a sea star with high dispersal potential
    Abstract
    Introduction
    Methods
    Population Sampling
    Genetic Data Collection
    Microsatellites
    mtDNA
    Introns
    Quantitative Analysis
    Polymorphism
    Heuristic measures of population structure
    Population differentiation
    Gene flow and effective population size
    Results
    Polymorphism
    Population Structure
    Population differentiation
    Gene Flow and Effective Population Size
    Discussion
    Population Genetic Patterns in P. miniata
    Comparison to Close Neighbours and Close Relatives
    References
    Tables and Figures

Chapter 3. There’s no place like home: oceanographic circulation model predicts low larval dispersal and observed genetic structure along a complex coastline
    Abstract
    Introduction
    Methods
    Study species
    Oceanographic Model
    Seeding locations
    Connectivity Matrix
    Genetic model
    Molecular sampling
    New molecular samples
    Analysis
    Results
    Population Clustering
    Discussion
    References
    Tables and Figures

Chapter 4. Sea star populations diverge by positive selection at a sperm-egg compatibility locus
    Abstract
    Introduction
    Methods
    Bindin sampling and alignment
    Gene genealogy
    Repetitive domain analysis
    Tests of population structure
    Tests of positive selection
    Results
    Polymorphism
    Bindin genealogy
    Relationships among repeat paralogs
    Bindin population structure
    Tests of positive selection
    Number of Indels in coding vs. non coding region
    Covariation between sites under selection and indels
    Discussion
    Selective mechanisms
    Potential function of bindin coding sequence variation
    Conclusion
    References
    Tables and Figures

Chapter 5. Localized reproductive coevolution in a seastar
    Abstract
    Introduction
    Methods
    Study Design
    Collections
    Fertilization Crosses
    Fertilization success
    Bindin genotypes
    Characterization of repeat variation
    Characterization of sites under positive selection
    Models of bindin influence on fertilization
    Dominance
    Heterozygosity
    Male-female bindin matching
    Gamete measurements
    Analysis
    Differences in fertilization success among populations
    Differences in fertilization success among bindin genotypes
    Results
    Effect of population source on fertilization success
    Effect of genotype on fertilization success
    Gamete traits
    Discussion
    Conclusion
    References
    Tables and Figures

Chapter 6. Synthesis
    Summary of Findings
    Novel Contributions
    The potential role of sexual conflict
    Speciation in the sea?
    References

Appendix
    Appendix A. Population pairwise FST values averaged across 7 microsatellite loci.
    Appendix B. Branch-sites model results from PAML analysis.
    Appendix C. Site-by-site Bayes-empirical-Bayes probability of positive selection in the first recombinant region of bindin in P. miniata for all gene trees investigated.
    Appendix D. Model results of fertilization success with and without inclusion of high sperm treatment.
    Appendix E. Egg cell diameter and egg jelly coat thickness across populations.
    Appendix F. Sperm head diameter does not vary across populations.

List of Tables

Table 2.1. Sample locations and number of individuals genotyped per location (n) for each marker class.

Table 2.2. Patiria miniata population structure from analysis of molecular variance (AMOVA) in allele frequencies.

Table 3.1. Connectivity matrix of the proportion of particles moving from each dispersal area to each recruitment area.

Table 3.2. FST and distance matrices.

Table 4.1. Details of codon sites under positive selection.

Table 4.2. Review of allelic and indel variation in gamete recognition genes of broadcast spawning marine invertebrates.

Table 5.1. Results of linear mixed effects model showing effect of male and population source on fertilization success.

Table 5.2. Generalized linear model results for fertilization success.

Table 5.3. Generalized linear model results for fertilization success based on male and female genotype similarity.

Table 5.4. Linear model results for effect of male heterozygosity on variance in fertilization success among the females with which he was mated.

List of Figures

Figure 1.1. Haplotype network for Patiria miniata mtDNA sequences.

Figure 1.2. STRUCTURE clustering of Patiria miniata microsatellite (A) and GPI + microsatellite (B) genotypes.

Figure 1.3. Asymmetrical gene flow and effective population sizes estimated in MIGRATE.

Fig. 3.1. Dispersal of simulated larvae through an oceanographic circulation model on the west coast of British Columbia and south eastern Alaska.

Fig. 3.2. Pairwise FST between two example population pairs with time.

Fig. 3.3. Relationships between model-predicted FST, observed FST, and geographic distance between sites.

Fig. 3.4. Predicted and observed FST for comparisons to the newly-sampled Bella Bella population.

Figure 3.5. Results from STRUCTURE analysis showing clustering data for Central Coast in combination with genetic data from previously sampled populations.

Fig. 4.1. Schematic diagram of bindin gene coding region and map of study area.

Fig. 4.2. Summary of bindin diversity across sampled populations.

Fig. 4.3. Consensus Bayesian gene tree and gene alignment for the first recombinant region of the bindin gene in Patiria miniata.

Fig. 4.4. Neighbour-joining gene trees from alignments of (a) collagen-like copies and (b) tandem repeats across 44 individuals.

Fig. 4.5. Site-by-site probability of positive selection, and location of indels, in the first exon of bindin in Patiria miniata.

Fig. 5.1. Summary of bindin variation and location of collection sites.

Fig. 5.2. Fertilization success depends on the population of origin in different combinations of mating pairs.

Fig. 5.3. Heterozygosity in collagen-like copy number variation affects fertilization success for southern males.

Fig. 5.4. Length of shortest male bindin allele affects fertilization success with (a) southern females, but not with (b) northern females.

Fig. 5.5. Number of derived states at sites under selection affects fertilization success with populations.

Fig. 5.6. Sperm velocity affects fertilization success for different male source populations.

Chapter 1.

General Introduction

Understanding mechanisms of speciation in the ocean challenges our classic, terrestrially based conceptualizations of divergence and macro-evolution, because dispersal potential is high and physical barriers are rare in the ocean (Palumbi 1992; Palumbi 1994; Hellberg 1998). Large-scale barriers and historical tectonic events can create obvious barriers to gene flow in the ocean, and have likely contributed to speciation events in the past (e.g. Mayr 1954; Lessios 1981). But what is the potential for local adaptation and speciation among marine populations on smaller geographic and temporal scales?

Gene flow is traditionally considered the primary impediment to localized trait divergence in the ocean (Mayr 1963). Development mode and pelagic duration of propagules vary greatly among marine taxa, and estimated dispersal distances can range from 10^-3 to 10^4 km per generation (Shanks et al. 2003). However, potential dispersal distances based on life histories do not always predict realized dispersal. Substantial evidence suggests that interactions between larval behaviour, local oceanographic conditions, and availability of suitable habitat can limit realized dispersal of larvae (Jones et al. 1999; Cowen et al. 2000; Taylor & Hellberg 2003; reviews in Swearer et al. 2002; Queiroga & Blanton 2005; Levin 2006).

There is also evidence that selection, if strong enough, can overcome intermediate levels of gene flow, and that local adaptation can occur despite ongoing gene flow between populations (Ehrlich & Raven 1969; Koehn et al. 1980; Saint-Laurent et al. 2003). Local adaptation is a prominent phenomenon in terrestrial and freshwater systems, but has less frequently been identified in marine taxa (Sanford et al. 2003; Sotka 2005; Conover et al. 2006), potentially reflecting a lack of empirical effort (Sotka 2005), or a detection problem in identifying ecological trait divergence in marine taxa (Knowlton 1993; Sotka 2005; Conover et al. 2006). Indeed, marine organisms are more difficult to access and preserve than terrestrial species, and the prominence of chemical rather than visual interactions makes important ecological differences difficult to detect (Knowlton 1993).

Multiple findings of monophyletic species groups coexisting along coastlines, or in remote island archipelagos, suggest that speciation in marine taxa has occurred without large permanent barriers to dispersal (Hellberg 1998; Munday et al. 2004; Rocha et al. 2005; Crow et al. 2010; Bird et al. 2011; Krug 2011). These have sometimes been attributed to ecological speciation in sympatry (Munday et al. 2004; Crow et al. 2010; Bird et al. 2011; Krug 2011), or to divergence during non-permanent allopatric events such as Pleistocene glacial shifts (Hellberg 1998; Krug 2011). Remarkably, pelagic dispersal does not seem to have been an impediment to speciation in these examples (Hellberg 1998; Munday et al. 2004; Crow et al. 2010; Bird et al. 2011).

Considerable recent work suggests that among broadcast spawning invertebrates, visually cryptic reproductive traits may be under strong diversifying selection. In broadcast spawning species, fertilization compatibility is primarily facilitated by proteins on the surface of sperm and eggs. Analysis of genomic sequences of these fertilization proteins has revealed high rates of positive selection in multiple species, including sea urchins (Metz & Palumbi 1996; Biermann 1998; McCartney & Lessios 2004), abalone (Lee & Vacquier 1992; Galindo et al. 2003), turban snails (Hellberg & Vacquier 1999), mussels (Riginos & McDonald 2003; Springer & Crespi 2007) and oysters (Moy et al. 2008). These findings run counter to the general expectation that functional proteins so closely associated with reproductive fitness should be under strong purifying selection.
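The contrast drawn here between positive and purifying selection is commonly summarized by comparing the rate of nonsynonymous nucleotide substitutions with the rate of synonymous substitutions. As a brief sketch in conventional notation (the symbols ω, d_N and d_S are introduced here for illustration and are not defined in the text above):

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive selection}
\end{cases}
```

Under this convention, a protein under strong purifying selection would be expected to show ω well below 1, so reports of ω above 1 at fertilization loci are what make these findings surprising.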

One mechanism to explain this observation is antagonistic coevolution between the sexes akin to an arms race. Sexual conflict is thought to arise among male and female gametes of broadcast spawning species when sperm competition is high, because eggs are at risk of polyspermy (supernumerary sperm fusion), which usually results in embryo death (Styan 1998; Franke et al. 2002; Levitan 2004). While males are presumably selected for high rates of fertilization under strong inter-male competition, selection may favour females who reduce fertilization rates, allowing the egg time to establish a physical block to polyspermy (Vacquier et al. 1997). Eggs with surface protein receptors that are unmatched or rare among competing sperm may therefore be favoured by selection, while sperm should be selected to closely match the receptors of most females (Vacquier et al. 1997). Direct evidence for this conflict comes from fertilization experiments in sea urchins, in which females with non-matching (Strongylocentrotus franciscanus) or rare (S. purpuratus) protein genotypes had greater mating success when sperm densities were high (Levitan & Ferrell 2006; Levitan & Stapper 2009).

Remarkably, few empirical studies have investigated divergence in fertilization proteins and fertilization compatibility between spatially distinct populations of marine invertebrates. Some work has shown differences in allele frequencies of gamete compatibility loci among geographically separated populations (in sea urchins: Geyer & Palumbi 2003; in mussels: Riginos et al. 2006; Springer & Crespi 2007), and other work has shown reduced gamete compatibility between geographically separated populations of the same species (in sea urchins: Biermann & Marks 2000; polychaetes: Styan et al. 2008; oysters: Zhang et al. 2010). To date, no study has combined analyses of molecular divergence in fertilization loci with experiments on compatibility across populations to test whether selection within populations can drive incompatibility between them.

The aim of this dissertation is to understand how gene flow and sexual selection can contribute to localized reproductive divergence within a marine species, as a possible precursor to speciation. I focus on Patiria miniata, a coastal asteroid with planktonic larval development found in the northeastern Pacific Ocean.

Thesis chapter overview and author contributions

With a geographic distributional range that crosses several putative current-driven dispersal barriers (Palumbi 1994), and regions historically influenced by glacial cover, P. miniata could reasonably be expected to show signals of historical and/or contemporary barriers to gene flow. Alternatively, the long planktonic larval duration of P. miniata, on the order of 6 to 10 weeks, may have brought the system to drift-migration equilibrium, decreasing the overall genetic structure (Bohonak 1999). Chapter 2 represents a collaborative project to characterize the genetic structure of P. miniata and test these predictions using multiple classes of non-coding molecular loci. My collaborators were Carson Keever, Jonathan Puritz, Jason Addison, Robert Toonen, Richard Grosberg, and Michael Hart. I investigated seven nuclear microsatellite loci from nine populations along the P. miniata distributional range, and revealed evidence for two genetically differentiated populations north and south of Queen Charlotte Sound, between Haida Gwaii and Vancouver Island in British Columbia. Carson and Jonathan collected and analysed two other classes of markers (mitochondrial and nuclear sequences), which showed a concordant genetic break in this same region, and provided additional genealogical information to understand the history of the divergence. Our findings show a historical genetic break that appears to have been maintained at least since the last glaciation. I share first authorship of this paper with Carson and Jonathan, and all authors contributed text, analytical advice, and interpretation.

The phylogeographic break discovered in Chapter 2 indicates that gene flow between the northern and southern populations may be low, despite a long pelagic duration in this species. In Chapter 3, I use an independent model of contemporary gene flow to determine whether predicted larval advection can explain the observed patterns in genetic structure. I use a particle dispersion model that simulates ocean currents in the northeastern Pacific Ocean, developed by Dr. Michael Foreman and colleagues at the Department of Fisheries and Oceans (Foreman et al. 2008). I use this model to simulate advection of P. miniata larvae to and from multiple locations throughout the species range, and test the predictions of this model against the observed genetic structure described in Chapter 2. I also use the connectivity matrix generated by the dispersal model to predict allele frequencies in a previously unsampled region of the P. miniata range, at the approximate location of the phylogeographic break. One of my coauthors, Iva Popovic, subsequently sampled this population and we analyzed how well the observed genetic structure matched the model predictions. We show that the circulation model reasonably predicts the genetic structure observed along the BC coast, indicating that patterns in larval dispersal may be responsible for maintaining the high genetic structure observed. Iva genotyped and analysed samples from the newly sampled population, Michael Foreman contributed the oceanographic circulation simulation model, Wendy Palen conducted spatially explicit analyses in GIS, and Michael Hart helped substantially with interpretation and text.

Populations in allopatry may undergo different evolutionary trajectories in reproductive traits, potentially leading to reproductive divergence. In Chapter 4, I investigate molecular signals of localized sexual selection within populations north and south of the phylogeographic break at Queen Charlotte Sound. P. miniata is found in high densities (Rumrill 1989), and is likely to mirror other high-density spawning species in which a high risk of polyspermy and strong sexual conflict have been identified (Levitan & Ferrell 2006). Such a process is predicted to favour the accumulation of novel or rare genotypes in fertilization compatibility proteins within populations, which are expected to differ across populations (Gavrilets 2000). In this chapter, I sample the gene coding for the sperm-expressed fertilization protein, bindin, in both regions, and test for positive selection (a high rate of nonsynonymous nucleotide substitutions relative to the rate of synonymous substitutions) within each population. I find a remarkable level of genetic variation in both populations relative to variation in fertilization proteins in other taxa, particularly in copy number at two variable repetitive domains. I also find a signal of positive Darwinian selection in both populations, at different codon sites in each population. These findings provide the first evidence of population-divergent positive selection in a fertilization compatibility locus, and suggest that divergent sexual selection can lead to reproductive divergence in marine taxa. Michael Hart contributed substantially to analytical design, interpretation, and text.

While these findings suggest that selection has promoted divergence between these two populations at reproductive loci, the functional consequences of this variation were as yet unknown. In Chapter 5, therefore, I test how fertilization success relates to both population source and male bindin genotype, by reciprocally mating individuals from populations north and south of the phylogeographic break. I find evidence of localized reproductive coevolution in the southern population, as southern males have greater fertilization success with southern females than with northern females. I also find greater fertilization success among males with shorter bindin alleles (fewer repeats in variable-length regions) in crosses with southern females, but not with northern females. These findings provide evidence of a greater female affinity for shorter bindin alleles in the southern population, where shorter bindin genotypes are found in greater frequencies, where the signal for positive selection was stronger, and where there is likely greater sperm competition. Michael Hart contributed substantially to analytical design, interpretation, and text.

At the end of my thesis I provide a general conclusion which brings these findings together and addresses some unresolved issues and future directions for understanding the mechanisms of the reproductive divergence observed, and implications for the evolution of reproductive isolation in the sea.
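The idea outlined in the overview above, of using a connectivity matrix from a dispersal model to predict allele frequencies and genetic structure, can be illustrated with a minimal sketch. This is not the genetic model used in the thesis: it assumes a single biallelic locus, deterministic mixing with no drift or selection, and equal population sizes, and the function names, matrix values and starting frequencies are all hypothetical.

```python
def step_allele_frequencies(freqs, connectivity):
    """Advance allele frequencies by one generation of larval exchange.

    freqs[i] is the frequency of one allele at site i.
    connectivity[i][j] is the proportion of particles released at site i
    that recruit to site j (hypothetical values, not thesis data).
    """
    n = len(freqs)
    new_freqs = []
    for j in range(n):
        arrivals = sum(connectivity[i][j] for i in range(n))
        if arrivals == 0:
            # No recruits reach site j, so its frequency is left unchanged.
            new_freqs.append(freqs[j])
            continue
        # Recruits at site j are a connectivity-weighted mix of all sources.
        weighted = sum(connectivity[i][j] * freqs[i] for i in range(n))
        new_freqs.append(weighted / arrivals)
    return new_freqs


def pairwise_fst(p1, p2):
    """Simplified FST at one biallelic locus for two equal-sized populations."""
    p_bar = (p1 + p2) / 2
    if p_bar in (0.0, 1.0):
        return 0.0
    variance = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2
    return variance / (p_bar * (1 - p_bar))


# Hypothetical example: sites 0 and 1 exchange larvae, site 2 is isolated.
connectivity = [
    [0.8, 0.2, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
freqs = [0.9, 0.1, 0.1]
for _ in range(50):
    freqs = step_allele_frequencies(freqs, connectivity)

fst_connected = pairwise_fst(freqs[0], freqs[1])
fst_isolated = pairwise_fst(freqs[0], freqs[2])
```

In this toy example the two connected sites converge on a shared allele frequency, so their pairwise FST approaches zero, while the isolated site stays differentiated from them. That is the qualitative pattern a circulation model would predict across a dispersal barrier.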


References

Biermann CH (1998) The molecular evolution of sperm bindin in six species of sea urchins (Echinoida : Strongylocentrotidae). Molecular Biology and Evolution 15, 1761-1771.

Biermann CH, Marks JA (2000) Geographic divergence of gamete recognition systems in two species in the sea urchin Strongylocentrotus. Zygote 8, S86-S87.

Bird CE, Karl SA, Smouse PE, Toonen RJ (2011) Detecting and measuring genetic differentiation. In Phylogeography and Population Genetics in Crustacea (ed. Koenemann S, Held C, Schubart C), pp. 31-55: CRC Press Crustacean Issues Series, vol. 19.

Bohonak AJ (1999) Dispersal, gene flow, and population structure. Quarterly Review of Biology 74, 21-45.

Conover D, Clarke L, Munch S, Wagner G (2006) Spatial and temporal scales of adaptive divergence in marine fishes and the implications for conservation. Journal of Fish Biology 60, 21-47.

Cowen RK, Lwiza KMM, Sponaugle S, Paris CB, Olson DB (2000) Connectivity of Marine Populations: Open or Closed? Science 287, 857-859.

Crow KD, Munehara H, Bernardi G (2010) Sympatric speciation in a genus of marine reef fishes. Molecular Ecology 19, 2089-2105.

Ehrlich P, Raven P (1969) Differentiation of populations. Science 195, 1228–1232.

Foreman MGG, Crawford WR, Cherniawsky JY, Galbraith J (2008) Dynamic ocean topography for the northeast Pacific and its continental margins. Geophysical Research Letters 35.

Franke ES, Babcock RC, Styan CA (2002) Sexual conflict and polyspermy under sperm-limited conditions: In situ evidence from field simulations with the free-spawning marine echinoid Evechinus chloroticus. American Naturalist 160, 485-496.

Galindo BE, Vacquier VD, Swanson WJ (2003) Positive selection in the egg receptor for abalone sperm lysin. Proceedings of the National Academy of Sciences of the United States of America 100, 4639-4643.

Gavrilets S (2000) Rapid evolution of reproductive barriers driven by sexual conflict. Nature 403, 886-889.

Geyer LB, Palumbi SR (2003) Reproductive character displacement and the genetics of gamete recognition in tropical sea urchins. Evolution 57, 1049-1060.

Hellberg ME (1998) Sympatric sea shells along the sea's shore: The geography of speciation in the marine gastropod Tegula. Evolution 52, 1311-1324.


Hellberg ME, Vacquier VD (1999) Rapid evolution of fertilization selectivity and lysin cDNA sequences in teguline gastropods. Molecular Biology and Evolution 16, 839-848.

Jones G, Milicich M, Emslie M, Lunow C (1999) Self recruitment in a coral reef fish population. Nature 402, 802-804.

Knowlton N (1993) Sibling species in the sea. Annual Review of Ecology and Systematics 24, 189–216.

Koehn R, Newell R, Immermann F (1980) Maintenance of an aminopeptidase allele frequency cline by natural-selection. Proceedings of the National Academy of Sciences 77, 5385–5389.

Krug PJ (2011) Patterns of speciation in marine gastropods: A review of the phylogenetic evidence for localized radiations in the sea. American Malacological Bulletin 29, 169-186.

Lee Y, Vacquier VD (1992) The divergence of species-specific abalone sperm lysins is promoted by positive Darwinian selection. Biological Bulletin 182, 97-104.

Lessios HA (1981) Divergence in allopatry - molecular and morphological differentiation between sea urchins separated by the Isthmus of Panama. Evolution 35, 618-634.

Levin L (2006) Recent Progress in Understanding Larval Dispersal: New Directions and Digressions. Integrative and Comparative Biology 46, 282-297.

Levitan DR (2004) Density-dependent sexual selection in external fertilizers: Variances in male and female fertilization success along the continuum from sperm limitation to sexual conflict in the sea urchin Strongylocentrotus franciscanus. American Naturalist 164, 298-309.

Levitan DR, Ferrell DL (2006) Selection on gamete recognition proteins depends on sex, density, and genotype frequency. Science 312, 267-269.

Levitan DR, Stapper AP (2009) Simultaneous positive and negative frequency-dependent selection on sperm bindin, a gamete recognition protein in the sea urchin Strongylocentrotus purpuratus. Evolution 64, 785-797.

Mayr E (1954) Geographic speciation in tropical echinoids. Evolution 8, 1-18.

Mayr E (1963) Species and Evolution. Cambridge: Belknap Press, Harvard.

McCartney MA, Lessios HA (2004) Adaptive evolution of sperm bindin tracks egg incompatibility in neotropical sea urchins of the genus Echinometra. Molecular Biology and Evolution 21, 732-745.


Metz EC, Palumbi SR (1996) Positive selection and sequence rearrangements generate extensive polymorphism in the gamete recognition protein bindin. Molecular Biology and Evolution 13, 397-406.

Moy GW, Springer SA, Adams SL, Swanson WJ, Vacquier VD (2008) Extraordinary intraspecific diversity in oyster sperm bindin. Proceedings of the National Academy of Sciences of the United States of America 105, 1993-1998.

Munday PL, van Herwerden L, Dudgeon CL (2004) Evidence for sympatric speciation by host shift in the sea. Current Biology 14, 1498-1504.

Palumbi SR (1992) Marine speciation on a small planet. Trends in Ecology & Evolution 7, 144-188.

Palumbi SR (1994) Genetic divergence, reproductive isolation, and marine speciation. Annual Review of Ecology and Systematics 25, 547-572.

Queiroga H, Blanton BO (2005) Interactions between behaviour and physical forcing in the control of horizontal transport of decapod crustaceans’ larvae: an overview. Advances in Marine Biology 47, 107–214.

Riginos C, McDonald JH (2003) Positive selection on an acrosomal sperm protein, M7 lysin, in three species of the genus Mytilus. Molecular Biology and Evolution 20, 200-207.

Riginos C, Wang D, Abrams AJ (2006) Geographic variation and positive selection on M7 lysin, an acrosomal sperm protein in mussels (Mytilus spp.). Molecular Biology and Evolution 23, 1952-1965.

Rocha LA, Robertson DR, Roman J, Bowen BW (2005) Ecological speciation in tropical reef fishes. Proceedings of the Royal Society B-Biological Sciences 272, 573-579.

Rumrill SS (1989) Population size structure, juvenile growth, and breeding periodicity of the sea star Asterina miniata in Barkley Sound, British Columbia. Marine Ecology-Progress Series 56, 37-47.

Saint-Laurent R, Legault M, Bernatchez L (2003) Divergent selection maintains adaptive differentiation despite high gene flow between sympatric rainbow smelt ecotypes (Osmerus mordax Mitchill). Molecular Ecology 12, 315–330.

Sanford E, Roth M, Johns G, Wares J, Somero G (2003) Local selection and latitudinal variation in a marine predator-prey interaction. Science 300, 1135–1137.

Shanks AL, Grantham BA, Carr MH (2003) Propagule dispersal distance and the size and spacing of marine reserves. Ecological Applications 13, S159–S169.

Sotka EE (2005) Local adaptation in host use among marine invertebrates. Ecology Letters 8, 448–459.

Springer SA, Crespi BJ (2007) Adaptive gamete-recognition divergence in a hybridizing Mytilus population. Evolution 61, 772-783.

Styan CA (1998) Polyspermy, egg size, and the fertilization kinetics of free-spawning marine invertebrates. American Naturalist 152, 290-297.

Styan CA, Kupriyanova E, Havenhand JN (2008) Barriers to cross-fertilization between populations of a widely dispersed polychaete species are unlikely to have arisen through gametic compatibility arms races. Evolution 62, 3041-3055.

Swearer SE, Shima JS, Hellberg ME, Thorrold SR, Jones GP, Robertson DR, Morgan SG, Selkoe KA, Ruiz GM, Warner RR (2002) Evidence of self-recruitment in demersal marine populations. Bulletin of Marine Science 70, 251-271.

Taylor MS, Hellberg ME (2003) Genetic evidence for local retention of pelagic larvae in a Caribbean reef fish. Science 299, 107-109.

Vacquier VD, Swanson WJ, Lee YH (1997) Positive Darwinian selection on two homologous fertilization proteins: What is the selective pressure driving their divergence? Journal of Molecular Evolution 44, S15-S22.

Zhang HB, Scarpa J, Hare MP (2010) Differential Fertilization Success Between Two Populations of Eastern Oyster, Crassostrea virginica. Biological Bulletin 219, 142-150.

Chapter 2.

Discordant distribution of populations and genetic variation in a sea star with high dispersal potential

This chapter is published in the journal, Evolution (Keever et al., 2009), and is included with permission from the coauthors. Authors: C.C. Keever, J.M. Sunday, J.B. Puritz, J.A. Addison, R.J. Toonen, R.K. Grosberg, M.W. Hart

Abstract

Patiria miniata, a broadcast-spawning sea star species with high dispersal potential, has a geographic range in the intertidal zone of the northeast Pacific Ocean from Alaska to California that is characterized by a large range gap in Washington and Oregon. We analyzed spatial genetic variation across the P. miniata range using multilocus sequence data (mtDNA, nuclear introns) and multilocus genotype data (microsatellites). We found a strong phylogeographic break at Queen Charlotte Sound in British Columbia that was not in the location predicted by the geographical distribution of the populations. However, this population genetic discontinuity does correspond to previously described phylogeographic breaks in other species. Northern populations from Alaska and Haida Gwaii were strongly differentiated from all southern populations from Vancouver Island and California. Populations from Vancouver Island and California were undifferentiated with evidence of high gene flow or very recent separation across the range disjunction between them. The surprising and discordant spatial distribution of populations and alleles suggests that historical vicariance (possibly caused by glaciations) and contemporary dispersal barriers (possibly caused by oceanographic conditions) both shape population genetic structure in this species.

Keywords: ; ATPS; effective population size; GPI; life history; planktotrophy

Introduction

Explaining the origin and persistence of large geographical discontinuities in species distributions, such as the antitropical distributions of many temperate-zone animals and plants, is one of the original goals of evolutionary ecology (Darwin 1859; Ekman 1953; Briggs 1987; Wiley 1988; Lindberg 1991). Such range disjunctions may be initiated and maintained by a complex combination of factors, encompassing extrinsic geological and climatic barriers to dispersal and colonization, and intrinsic biological properties of organisms including habitat preferences and dispersal capabilities (Schwaninger 2008). Dispersal barriers such as mountain ranges or climatic effects associated with Pleistocene glacial cycles (DeChaine and Martin 2006; Knowles and Carstens 2007) are well-known determinants of the geographical distribution both of populations and of genetic variation in animals and plants of the Northern Hemisphere (Hewitt 1996; Riddle 1996; Byun et al. 1997; Knowles 2001; Lovette 2005).

The intrinsic potential of some organisms to traverse physical and geological features that are barriers to dispersal for other species (e.g., Lessios et al. 1998) is arguably greatest among sessile and sedentary marine animals that have prolonged development of feeding planktonic (planktotrophic) larvae. Direct estimates of realized dispersal in species with planktotrophic development are restricted for logistical reasons to cases in which realized dispersal is limited to small spatial scales (e.g., Knowlton and Keller 1986), but inferences based on oceanography and larval duration suggest that realized dispersal can exceed 10³–10⁴ km per generation (Scheltema 1986; Shanks et al. 2003). Numerous lines of evidence suggest a correlation between the evolutionary gain or loss of planktonic larval development and the magnitude or geographical scale of neutral genetic differentiation among marine animal populations (Arndt and Smith 1998; Bohonak 1999; Kyle and Boulding 2000; Grosberg and Cunningham 2001; Hellberg et al. 2002; Kelly and Eernisse 2007; Teske et al. 2007). However, this trend conflicts with a conspicuous minority of other studies that reveal strong phylogeographic breaks in spite of prolonged planktonic larval development, or large differences in population

genetic structure between sympatric species with similar larval dispersal potential (Benzie 1999; Swearer et al. 1999; Forward et al. 2003; Marko 2004; Hickerson and Cunningham 2005; Crandall et al. 2008; Hamilton et al. 2008).

The concordance between the geographical distributions of populations (including patterns such as range disjunctions) and of population genetic variation (phylogeographic breaks) can be used to test biogeographic hypotheses (Avise 2000). Range disjunctions that coincide with phylogeographic breaks are consistent with the effects of geological or physical processes (like glaciations) that limit dispersal, and implicate dispersal barriers in the origin of the disjunct range (e.g., Muñoz-Salazar et al. 2005). Alternatively, range disjunctions that do not correspond to phylogeographic breaks suggest gene flow across the disjunction, and implicate ecological effects such as the distribution of suitable habitat, habitat selection by recruiting propagules, juvenile mortality, or recent extirpation (rather than dispersal barriers) in the origin of the disjunct range. Comparative analyses of sympatric species that share the same range disjunction have revealed mixtures of results that match both of these patterns (Bernardi et al. 2003; Ayre et al. 2009).

Here we use a large suite of mitochondrial and nuclear genetic markers to test the strength of the concordance between biogeographic and phylogeographic discontinuities in the bat star Patiria miniata (formerly Asterina miniata; O’Loughlin and Waters [2004]) from the northeast Pacific. Bat stars are abundant in intertidal and shallow subtidal marine communities in sheltered waters of southeast Alaska, British Columbia, central and southern California, and Baja California (Fisher 1911; Lambert 2000), but are strikingly rare in or absent from the outer coasts of Washington, Oregon, and northern California between Cape Flattery, WA (48°38′N, 124°71′W) and Cape Mendocino, CA (40°26′N, 124°24′W) (Kozloff 1983) (Fig. 1). The northern part of the P. miniata range includes the transition between the Oregonian and Aleutian zoogeographical provinces (Briggs 1974), and the range disjunction includes the southern extent of the Wisconsin glaciation on the Pacific coast of North America at about 48°N latitude (Pielou 1991).

Comparative population genetic studies of this community (Hellberg 1996; Arndt and Smith 1998; Burton 1998; Rocha-Olivares and Vetter 1999; Kyle and Boulding 2000;

Dawson 2001; Edmands 2001; Hellberg et al. 2001; Hickerson and Ross 2001; Marko 2004; Sotka et al. 2004; Hickerson and Cunningham 2005; Cassone and Boulding 2006; Harley et al. 2006; Wilson 2006; Kelly and Eernisse 2007; Lee and Boulding 2007; Polson et al. 2009) have typically revealed evidence of either (1) strong north-south genetic differentiation only in species that lack a prolonged planktonic larval stage (with low intrinsic dispersal potential); or (2) broad genetic homogeneity, often in species with long-lived planktonic larvae, but sometimes in species lacking long-distance dispersal potential (reviewed by Marko, 2004). The first pattern has usually been interpreted as evidence of north-south vicariance, with persistence in northern and southern ice-free refuges during the Pleistocene (Byun et al. 1997; Holder et al. 1999; Smith et al. 2001; Hetherington et al. 2003). The second pattern likely arises from either long-term gene flow across these geological and oceanographic boundaries, or a short recent history of northern populations wherein northern extirpation was followed by subsequent Holocene range expansion out of southern refuges (Edmands 2001).

We use our population genetic data to test two working hypotheses about the factors responsible for the range disjunction in P. miniata. Coincidence of the P. miniata distributional gap with both the southern extent of the last glacial maximum and a significant phylogeographic break would be consistent with a history of vicariant isolation of bat star populations north and south of the range disjunction, and would be consistent with slow and incomplete range expansion into the formerly glaciated areas as the cause of the range disjunction. In contrast, population genetic homogeneity across the range disjunction would be consistent with larval dispersal and gene flow across the disjunct part of the range. Such a result would implicate benthic ecological factors, such as the distribution or larval selection of suitable benthic habitat, as the cause of the distributional gap. The results allow us to disentangle the different roles of past vicariant events and contemporary ecological processes in establishing and maintaining a striking range disjunction in the sea.

Methods

Population Sampling

We obtained tissue samples (tube feet) preserved in 70–95% ethanol from 423 individual sea stars collected from 14 locations in southeast Alaska, Haida Gwaii (the Queen Charlotte Islands), Vancouver Island, and California (Table 1, Fig. 1). Based on preliminary results from analysis of mtDNA sequences for populations sampled in 2004 (BA, BOD) and 2005 (VC, DI, RS, MC, TO, FB, HOP, CAR, SB), we focused our analysis of microsatellite and nuclear intron sequence variation on a subset of those populations plus additional populations from Vancouver Island (WH) and Haida Gwaii (LN, TA) sampled in 2006. In total, we genotyped 253 individuals from 13 populations for mtDNA, 408 individuals from 12 populations for microsatellites, and 172–180 individuals from eight populations for the nuclear introns (Table 1).

Genetic Data Collection

Microsatellites

We extracted DNA from tube feet using either a CTAB protocol (Grosberg et al. 1996) or a simple proteinase K digestion of a single tube foot in water (Addison and Hart 2005a). We used these DNA extractions in PCR to amplify and characterize allele size variation at ATG or CAG microsatellite loci. We used PCR cocktails, thermal cycling conditions, electrophoresis methods, and allele size estimation software to obtain genotypes for seven loci (B11, B201, B202, C8, C204a, C113, and C210) as described previously (Keever et al. 2008).

mtDNA

We used the same CTAB protocol noted above to extract genomic DNA. We used primers from Colgan et al. (2005) to amplify part of the mitochondrial genome that contains five transfer RNA genes and the 5′ end of the cytochrome c oxidase subunit I (COI) gene (Hart et al. 1997). Thermal cycling conditions were 90° (2:00), 55° (0:40), 72° (2:00) for one cycle; 90° (0:30), 55° (0:30), 72° (1:40) for 30 cycles; and 90° (0:40), 55° (0:40), 72° (7:00) for one cycle. Amplicons were checked by agarose gel electrophoresis, purified by sodium acetate precipitation, and directly sequenced with the

forward primer (Hart et al. 1997). Sequences were proofread in 4Peaks v. 1.6 (A. Griekspoor and T. Groothuis, mekentosj.com), aligned in Clustal W using the default settings (Thompson et al. 1997), and trimmed to standard length (369 bp; GenBank accession numbers EF165733-EF165790; EF165792-EF165971; FJ939314-FJ939328).

Introns

Genomic DNA used for nuclear DNA sequence analysis was extracted using the Qiagen DNeasy kit (Qiagen, Valencia, CA). We used primers from Jarman et al. (2002) to amplify and sequence an intron in the alpha subunit of the ATP synthetase gene (ATPS). From these preliminary sequences, we made and used two P. miniata-specific primers PMATPSF9 (5′-TAAGGCCGTGGATAGTCTGG) and PMATPSR669 (5′-TGATGGTGTCAATGGCTACAG) that were designed to amplify a ∼670 bp fragment. Thermal cycling conditions for these primers were 95° (3:00) for one cycle; 94° (0:30), 58.5° (0:30), 72° (0:30) for 40 cycles; and 72° (20:00) for one cycle. Amplicons were then cleaned using the QIAquick PCR Purification Kit (Qiagen) and directly sequenced in both directions. Amplicons from heterozygous individuals were cloned into the pZero plasmid vector using a custom TA cloning system, and 5–8 clones per individual were then sequenced and compared directly to the heterozygous genotype sequence to ascertain each allele and correct all cloning errors. All sequences were edited in Sequencher v. 4.8 (Gene Codes), aligned using Clustal W in MEGA v. 3.0 (Kumar et al. 2004), then aligned again by eye and trimmed to standard length (635 bp; GenBank accession numbers FJ850593-FJ850958).

We amplified and sequenced a second intron from the glucose-6-phosphate isomerase gene (GPI). We used degenerate primers designed from an alignment of genomic GPI sequences from Strongylocentrotus purpuratus, Danio rerio, Canis familiaris, and Bufo melanostictus to obtain preliminary sequence data. From these preliminary sequences, we designed two asterinid-specific primers GPIFN4 (5′ GCCAAGCACTTTGTBGCCCT) and GPIR28M (5′ TCCCARAAVGGAAACATRTTWTCCTTGT). Thermal cycling conditions for these primers were 95° (3:00) for one cycle; 94° (0:30), 54–57° (0:45), 72° (0:30) for 40 cycles; and 72° (20:00) for one cycle. Amplicons (∼240–480 bp) were then cleaned using the QIAquick kit and directly sequenced in both directions. Alleles of some

heterozygous individuals could be inferred parsimoniously from comparison to known alleles previously sampled from homozygous individuals. Amplicons of all other heterozygotes were cloned using the CloneJet (Fermentas, Burlington, ON, Canada) blunt end cloning kit and compared directly to the heterozygous genotype sequence. All sequences were edited and aligned as described above (476 bp; GenBank accession numbers FJ850243-FJ850592).

Quantitative Analysis

Polymorphism

For each population sample (Table 1), we calculated allelic frequencies, expected and observed heterozygosity, linkage disequilibrium (for microsatellites), and haplotype and nucleotide diversity (for mtDNA and introns) using Arlequin v. 3.11 (Excoffier et al. 2005). For the microsatellites, we calculated allelic richness by standardizing to the smallest sample size (n= 15; Table 1) using the rarefaction method in FSTAT v. 2.9.3 (Goudet 1995). Differences in allelic richness between groups of populations identified by clustering analysis in STRUCTURE (see below) were tested by a whole-sample permutation test. For the three sequenced loci, we used Arlequin to calculate Tajima's D and Fu and Li's F to test for departures from neutral variation in allelic diversity relative to the number of segregating sites (Tajima 1989) or number of alleles (Fu and Li 1993; Simonsen et al. 1995).
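The rarefaction step used to standardize allelic richness can be illustrated with a minimal sketch. It applies the standard hypergeometric rarefaction formula, which gives the expected number of distinct alleles in a random subsample of gene copies. The function name and the allele counts are hypothetical, and treating the smallest sample (n = 15 diploid individuals) as 30 gene copies is an assumption about how the standardization was set up, not a description of FSTAT's internals.

```python
from math import comb

def rarefied_allelic_richness(allele_counts, n_genes):
    """Expected number of distinct alleles in a random subsample of n_genes gene copies.

    allele_counts: observed copy number of each allele at one locus in one population.
    """
    total = sum(allele_counts)
    if n_genes > total:
        raise ValueError("subsample cannot exceed the number of sampled gene copies")
    denom = comb(total, n_genes)
    # For each allele: 1 - P(allele is absent from the subsample)
    return sum(1 - comb(total - c, n_genes) / denom for c in allele_counts)

# Hypothetical counts for four alleles at one locus (40 gene copies sampled),
# rarefied to 30 gene copies (assumed to correspond to n = 15 diploid individuals).
counts = [25, 10, 4, 1]
print(round(rarefied_allelic_richness(counts, 30), 3))
```

Averaging this quantity across loci within a population would give a per-population value comparable in form to the standardized richness reported in Table 1.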

Heuristic measures of population structure

We used the clustering method in STRUCTURE v. 2.1 (Pritchard et al. 2000) with or without geographical priors to identify genotype groupings that minimize Hardy-Weinberg disequilibrium and linkage disequilibrium by assigning individual multilocus genotypes to k groups. We analyzed the microsatellite genotypes alone (n= 408) and the combined microsatellite and GPI genotype data for individuals genotyped for both marker classes (n= 138; we did not include ATPS in this combined analysis because of the very high allelic diversity at this locus and the limited scope for measuring linkage with alleles at other loci). For each data set, we estimated the number of populations sampled (k) under an admixture model without geographical priors, with allele frequencies correlated between populations. Admixture proportions were estimated from

the dataset and fixed for all populations. We ran Bayesian MCMC searches of 1,000,000 steps with a burn-in of 250,000. For each analysis, we carried out seven independent runs for each value of k up to k= 5 populations. We used the method of Evanno et al. (2005) to find the best-fit value of k. These analyses identified two significant genotype clusters (k= 2) that were clearly segregated into a northern (Alaska, Haida Gwaii) and a southern (Vancouver Island, California) group.
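The Δk criterion of Evanno et al. (2005) can be sketched as follows. The log-likelihood values are invented for illustration (three replicate runs per k rather than the seven used here), and the second-order difference is taken on the mean log likelihood per k, which is one common way of computing the statistic and is an assumption rather than a record of how it was computed in this study.

```python
from statistics import mean, stdev

def evanno_delta_k(log_liks):
    """log_liks maps k -> list of ln P(D|k) values from replicate runs.

    Returns k -> delta-k for every k that has both neighbours (k-1 and k+1).
    Raises ZeroDivisionError if replicate runs at some k are identical.
    """
    means = {k: mean(v) for k, v in log_liks.items()}
    delta = {}
    for k in sorted(log_liks):
        if k - 1 in means and k + 1 in means:
            # absolute second-order rate of change of the mean log likelihood
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / stdev(log_liks[k])
    return delta

# Hypothetical replicate log likelihoods for k = 1..5
runs = {
    1: [-9000, -9002, -8998],
    2: [-8500, -8501, -8499],
    3: [-8480, -8470, -8490],
    4: [-8470, -8450, -8490],
    5: [-8465, -8460, -8470],
}
dk = evanno_delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

The statistic is undefined for the smallest and largest k tested, so a run up to k = 5 can only rank k = 2 to 4.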

We also examined spatial associations within haplotype networks based on the ancestor–descendant relationships among mtDNA and intron sequences. The very high haplotype diversity in ATPS and GPI samples included many apparent recombinants. Because recombination obscures ancestor–descendant relationships among sequences, we used DnaSP v. 4.20 (Rozas et al. 2003) to identify the largest contiguous block of nucleotide sites for each intron alignment that satisfied the four-gamete test and was therefore assumed to experience no recombination (e.g., Woerner et al. 2007). For ATPS, this largest nonrecombining block consisted of nucleotide sites 202–314 (∼18% of the original sequence alignment); for GPI, we analyzed nucleotide sites 85–427 (∼72% of the original sequence alignment). We used median-joining networks constructed by the algorithm of Bandelt et al. (1999) implemented in Network v. 4.5 using the default parameter values (equal character weighting = 10; epsilon = 0; distance criterion = connection cost) and simplified by using the MP algorithm of Polzin and Daneshmand (2003). We resolved one ambiguity (a closed circuit) in the resulting mtDNA network in favor of ancestor–descendant relationships between common haplotypes and singletons.
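The four-gamete criterion used to delimit a nonrecombining block can be sketched in a few lines. This is a simplified illustration, not DnaSP's implementation: it assumes biallelic sites coded as characters, ignores gaps and missing data, and the function names and toy haplotypes are hypothetical.

```python
def four_gamete_violation(haplotypes, i, j):
    """True if biallelic sites i and j show all four two-site combinations."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) >= 4

def largest_compatible_block(haplotypes):
    """Longest run of contiguous sites in which no pair of sites fails the test.

    Returns a half-open interval (start, end) of site indices.
    """
    n_sites = len(haplotypes[0])
    best = (0, 0)
    start = 0
    for end in range(n_sites):
        # Find the right-most site in the current window that conflicts with
        # the new site, and restart the window just after it.
        for k in range(end - 1, start - 1, -1):
            if four_gamete_violation(haplotypes, k, end):
                start = k + 1
                break
        if end + 1 - start > best[1] - best[0]:
            best = (start, end + 1)
    return best

# Toy alignment of segregating sites only (0/1 coding, hypothetical data)
haps = ["0000", "0100", "1000", "1110", "1111"]
print(four_gamete_violation(haps, 0, 1), largest_compatible_block(haps))
```

In the toy data the first two sites carry all four gametes, so the largest compatible block starts at the second site.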

Population differentiation

We calculated fixation indices (F-statistics) among all populations, between pairs of populations (FST), and among individuals within populations (FIS), using Arlequin. We used Mantel tests in Arlequin to characterize isolation by distance as the correlation between population differentiation (pairwise FST) and straight-line geographical distance. All F-statistics were computed by the method of Weir and Cockerham (1984), and F-values significantly different from zero were identified by comparison to results from 10,000 permutations of the data (Raymond and Rousset 1995). We tested the statistical significance of pairwise FST values by genotypic permutation using a G-test implemented

in FSTAT. We adjusted the critical P-values for these tests using sequential Bonferroni corrections.

We asked whether microsatellite allele size contributed significant additional information to estimates of population structure (RST) compared to estimates based on allele frequencies alone (FST). We calculated RST for each locus and compared the results to an expected RST distribution generated by 10,000 allele-size permutations in SPAGeDi v. 1.2.1 (Hardy and Vekemans 2002).

To test for regional subdivision of sequence and microsatellite diversity, we used analysis of molecular variance (AMOVA) in Arlequin. We used 10,000 permutations of the data to identify measures of differentiation (Φ) significantly different from zero. We partitioned this variance into differences between northern and southern population groups (ΦCT) and differences among populations within each group (ΦSC). We carried out two of these analyses: one based on population groups north (Alaska, Haida Gwaii, Vancouver Island) and south (all California sites) of the range disjunction; and a second based on the STRUCTURE results, which strongly suggested a phylogeographic break between population groups north (Alaska, Haida Gwaii) and south (Vancouver Island, California) of Queen Charlotte Sound (see Fig. 1).

Gene flow and effective population size

We carried out coalescent analyses of migration rate (m) and effective population size (Ne) for the mtDNA and intron sequences using the Bayesian method in MIGRATE-N v. 2.4 (Beerli and Felsenstein 2001; Beerli 2006). We tried to avoid the confounding effects of recombination on each analysis of intron sequences by using only the largest nonrecombining block of nucleotide sites identified by DnaSP (see above). Because our STRUCTURE and haplotype network analyses identified an unexpected and strong phylogeographic break at Queen Charlotte Sound in northern British Columbia, we were specifically interested in contrasting gene flow across Queen Charlotte Sound relative to gene flow across the range disjunction. We therefore organized our MIGRATE analyses to match our AMOVA analyses. We pooled all population samples from California, Vancouver Island, Haida Gwaii, and Alaska, into four regional samples, and estimated the 16 corresponding parameter values (four effective population sizes, 12 pairwise migration rates). For this number of parameters, it was not possible to run MIGRATE

using our full population samples because the gene genealogies were prohibitively large. Consequently, we limited the analysis to the 102 individuals from Alaska (n= 17), Haida Gwaii (n= 21), Vancouver Island (n= 30), and California (n= 34), for which we had characterized all loci. The MCMC search was based on a chain of 20,000,000 (ATPS, GPI) or 50,000,000 (mtDNA) steps sampled every 40 or 100 steps for a total of 500,000 samples (with a burn-in of 50,000 samples). We used exponential prior distributions for migration rate (0, 1000) and effective population size (0, 0.01) because such priors often explore parameter space much more efficiently than other prior distributions. We used adaptive heating of three additional chains to search more effectively the genealogy space for the highly polymorphic ATPS and GPI data (but not for mtDNA). We repeated this intron analysis using uniform (rather than exponential) priors to test for any bias caused by the exponential priors, and because parameter estimates under uniform (but not other) prior distributions can also be considered maximum likelihood estimates. We assessed convergence by comparing results from multiple runs. We analyzed the two introns separately and in parallel to obtain a single two-locus estimate of parameter values. We characterized effective population size as θ= 4Neµ, and expressed the magnitude of gene flow estimated from these analyses as the number of migrants per generation M=θm/µ. We used the “estimate” option in MIGRATE to infer the mutation rate parameter µ that scales both θ and M (Beerli 2006).
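Written out with the definitions given above, the mutation rate cancels from the migrant number, so M is expressed in units of individuals per generation. The substitution below uses only the symbols already defined in the text:

```latex
\theta = 4 N_e \mu, \qquad
M = \frac{\theta\, m}{\mu} = \frac{4 N_e \mu \, m}{\mu} = 4 N_e m
```

This is why θ and M share the same scaling by µ: an error in the assumed mutation rate would shift the implied Ne and m in opposite directions while leaving their product unchanged.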

An alternative approach to accounting for the effects of recombination on such coalescent analyses is to estimate the recombination rate as part of the MCMC process using the whole sequence alignment (rather than analyzing a smaller portion of the alignment that is inferred to be nonrecombining). We used LAMARC v. 2.1.3 (Kuhner 2006) to estimate simultaneously recombination rate (the ratio of recombination events per site per generation to mutations per site per generation), migration, and effective population size for both the mtDNA sequences (which presumably were nonrecombining) and the full intron sequence alignments. Estimating recombination rate greatly increased the computational time for these MCMC analyses, so we further trimmed the data set (and the allelic genealogies) to a randomly selected group of 10 individuals (20 alleles) from each of the four pooled population samples (80 alleles total). We used a short chain of 1,250,000 steps sampled every 50 steps for a total of 25,000 samples (with a burn-in of 2500) and a long chain of 25,000,000 steps sampled every

100 steps for 250,000 samples (with a burn-in of 25,000). We used logarithmic prior distributions for recombination rate (0.001, 10), migration rate (0.01, 3000), and effective population size (0.00001, 10). We assessed convergence by monitoring the effective sample size for each parameter and by confirming similar results from multiple runs.

Results

Polymorphism

We found 3–12 microsatellite alleles per locus (summed across all 12 populations), with standardized allelic richness of 3.9–5.5 (averaged across loci; Table 1). Observed and expected heterozygosities were ∼0.3 to ∼0.6 across all populations. We found homozygote excesses (high FIS) significantly greater than zero in most populations (0.13 < FIS < 0.44) and at most loci (0.09 < FIS < 0.70), a characteristic of many broadcast-spawning marine invertebrates (Addison and Hart 2005b). Five of 2856 total single-locus genotypes were null homozygotes. We found one case of a pair of loci (B11 and B202) in one population (Louise Narrows) in linkage disequilibrium; no other loci were in linkage disequilibrium within any other populations.

The mtDNA sequence alignment included 28 unique haplotypes that differed by up to five substitutions. There were 2–11 haplotypes per population, with haplotype diversities of 0.13–0.75 (Table 1). Nucleotide diversity was low (π < 0.004) for these slow-evolving tRNA sequences but varied about four-fold among populations.
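For readers unfamiliar with the two summary statistics reported here, the following sketch computes haplotype diversity and nucleotide diversity from a set of aligned sequences using the standard definitions. The toy sequences are hypothetical, gaps are not treated specially, and the code is an illustration, not a reproduction of Arlequin's calculations.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Probability that two sequences drawn at random are different haplotypes."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site among aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    n_pairs = n * (n - 1) / 2
    return diffs / n_pairs / length

# Hypothetical sample: three copies of one haplotype and one single-step variant
sample = ["ACGT", "ACGT", "ACGT", "ACGA"]
print(haplotype_diversity(sample), nucleotide_diversity(sample))
```

A sample can have high haplotype diversity but low nucleotide diversity when many haplotypes differ by only one or two substitutions, which is the pattern described for these mtDNA sequences.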

We found 143 GPI alleles among 172 individuals, with 9–36 alleles per population that differed by up to 14 substitutions. Allele size ranged from 242 to 415 bp. The sequence alignment included 318 gap sites and was consistent with 70 insertion–deletion events of 15.0 bp average length. Both haplotype (0.74–0.99) and nucleotide (0.047–0.067) diversity were correspondingly high.

The ATPS sequences exhibited the greatest overall diversity: 252 alleles among 180 individuals, and 19–50 alleles per population, with up to 30 substitution differences. Allele size varied from 584 to 616 bp. The alignment included 158 gap sites, consistent with 59 insertion–deletion events of 4.2 bp average length. Most individuals in most

populations were heterozygotes (h >> 0.9) in part due to the many short insertion–deletion differences. Nucleotide diversity was slightly lower (0.033–0.042) than GPI diversity in the same samples.

Although there was no detectable association between latitude and mitochondrial diversity, the nuclear markers were generally less diverse in northern than in southern populations. Sample site latitude (Table 1) was negatively correlated with average allelic richness for microsatellites (r=−0.79, P= 0.002) and with haplotype diversity for GPI (r=−0.71, P= 0.047). For the very high ATPS haplotype diversities, the correlation was in the same direction but weaker and nonsignificant (r=−0.58, P= 0.135). A permutation test suggested that Haida Gwaii and Alaska samples had significantly lower mean microsatellite allelic richness (4.0) than samples from California and Vancouver Island (5.3; P= 0.003). These differences in diversity appeared to be consistent with neutral variation (or at least not the product of large and easily detected nonneutral effects). After correction for 58 simultaneous tests of neutrality (D or F) across three loci, eight or 13 populations, and two methods, four values of F (from −5.56 to −22.04) for ATPS from Bodega Bay and for mtDNA from Louise Narrows, Tofino, and Bodega Bay, were significantly different from zero (neutral expectation). No values of D were significantly different from zero. The bias toward a few significant F (rather than D) values could reflect the information from invariant sites incorporated into the F statistic (329/635 bp for ATPS, 273/476 bp for GPI, 346/369 bp for mtDNA). The result suggests some isolated cases of nonneutral deficit of haplotypes relative to segregating sites or number of alleles, consistent with recent population expansion.

Population Structure

Replicated analysis with the admixture model in STRUCTURE identified two significant clusters of microsatellite genotypes with very different frequencies in two geographic regions. One group of samples (shaded orange in Fig. 2A) occurred mainly in California and Vancouver Island, and the other (shaded purple in Fig. 2A) was mainly restricted to Haida Gwaii and Alaska. The marginal likelihood improvements with higher values of k (Fig. 2C) were not significant: the Δk statistic of Evanno et al. (2005) showed a strong mode at k= 2 that was about two orders of magnitude higher than at k= 3.

Adding the GPI genotypes to the microsatellite data set confirmed the existence of just two significant clusters of genotypes with the same pronounced geographical difference in their distribution (Fig. 2B, C).

The geographical distribution of the mtDNA clades exhibited a similar north– south grouping. Most individuals (88%) had one of three haplotypes that differed from each other by one or two substitutions; four other shared haplotypes occurred in 2–5 individuals. Of the three most common haplotypes, only one (n= 42) appeared in all four geographical regions from Alaska to California (Fig. 1). The single most common haplotype (n= 120) was entirely absent from our large sample of sea stars from Alaska and was rare in Haida Gwaii. The second most common haplotype (n= 59) was restricted to these two northern regions, as were two other shared haplotypes. Altogether, 70 of 96 DNA sequences from Alaska and Haida Gwaii samples were unique to those two regions north of Queen Charlotte Sound. South of Queen Charlotte Sound, the Vancouver Island and California populations shared two of the three most common haplotypes as well as two of the rarest haplotypes.

Networks of ATPS and GPI alleles based on analysis of the complete alignment (with recombination) or based on the largest nonrecombining block of nucleotide sites were considerably more complex (data not shown). Most alleles were relatively rare and separated from each other by large numbers of substitutions and insertion–deletion changes. Some geographical structure similar to the mtDNA results was evident in the form of shared alleles found only in Alaska and Haida Gwaii, or only in Vancouver Island and California samples.

Population differentiation

Global measures of deviation from equilibrium genotype and allele frequencies differed significantly from zero for all loci and suggested a considerable history of genetic drift within populations and changes in allele frequencies among some populations (Table 2). We used FST rather than RST in these comparisons for the microsatellite data because results from the allele-size permutation test (data not shown) suggested that allele size did not add useful information for any of the seven loci.

Population pairwise FST values were typically lower in comparisons among populations north or south of Queen Charlotte Sound, and considerably higher in comparisons between populations separated by Queen Charlotte Sound (see Appendix A for mean pairwise FST values from microsatellite data, and Supplementary Materials from Keever et al. 2009 for pairwise FST values of all other markers). Mantel tests identified weak signals of isolation-by-distance for microsatellites (r= 0.26, P= 0.057) and mtDNA (r= 0.32, P= 0.022), but not for ATPS (r= 0.05, P= 0.325), or GPI (r= 0.21, P= 0.131). None of these correlations were significantly different from zero after Bonferroni correction for four simultaneous tests.
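
The Bonferroni step can be checked with a few lines of arithmetic. The sketch below applies the correction to the four Mantel P-values reported above; the nominal significance level of 0.05 is an assumption, since the text does not state it, and the variable and function names are illustrative only.

```python
# Mantel test P-values as reported in the text.
mantel_p = {
    "microsatellites": 0.057,
    "mtDNA": 0.022,
    "ATPS": 0.325,
    "GPI": 0.131,
}

ALPHA = 0.05  # assumed nominal significance level (not stated in the text)

def bonferroni(p_values, alpha=ALPHA):
    """Return (per-test threshold, {marker: significant after correction})."""
    threshold = alpha / len(p_values)
    decisions = {marker: p <= threshold for marker, p in p_values.items()}
    return threshold, decisions

threshold, decisions = bonferroni(mantel_p)
```

With four tests the per-test threshold is 0.05 / 4 = 0.0125, so even the mtDNA correlation (P= 0.022) does not remain significant.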

In an AMOVA that grouped populations into regions north and south of the range disjunction, differentiation between groups was small (0.01 ≤ΦCT≤ 0.07), was not significantly different from zero, and accounted for 0.1–7.5% of molecular variance (Table 3) for all marker types. A large proportion of the molecular variance (up to 21% for mtDNA) was due to differences among populations within these two groups, in particular to the many large pairwise differences between populations from Vancouver Island and those from Alaska and Haida Gwaii.

In contrast, when we grouped populations north and south of Queen Charlotte Sound (the population grouping found in our STRUCTURE analysis above), we found considerably greater differentiation between regional groups (as high as ΦCT= 0.33; Table 3). After correction for simultaneous comparisons for four markers, only the estimate of between-group differentiation for the very highly polymorphic ATPS (ΦCT= 0.03) was not significantly different from zero. For all four markers, this alternative grouping improved the among-groups sum of squares by a factor of ∼2 (for highly polymorphic introns) to ∼5 (for microsatellites and mtDNA).
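
For readers less familiar with hierarchical AMOVA, the sketch below shows how ΦCT and the related fixation indices are conventionally derived from the three variance components (among groups, among populations within groups, and within populations). The function name and the variance components are hypothetical and are not taken from Table 3.

```python
def phi_statistics(var_among_groups, var_among_pops, var_within_pops):
    """Conventional AMOVA fixation indices from variance components.

    phi_ct: among groups relative to the total
    phi_sc: among populations within groups, relative to within-group variance
    phi_st: among populations (all levels) relative to the total
    """
    total = var_among_groups + var_among_pops + var_within_pops
    return {
        "phi_ct": var_among_groups / total,
        "phi_sc": var_among_pops / (var_among_pops + var_within_pops),
        "phi_st": (var_among_groups + var_among_pops) / total,
    }

# Hypothetical variance components, for illustration only.
example_phi = phi_statistics(0.33, 0.05, 0.62)
# Share of molecular variance attributed to differences between regional groups.
percent_among_groups = 100 * example_phi["phi_ct"]
```

Under these definitions the percentage of variance among groups is simply 100 × ΦCT, which is why the two quantities move together when populations are regrouped.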

Gene Flow and Effective Population Size

The analyses above based on allele frequencies and genetic distances consistently revealed one major phylogeographic break in P. miniata. We used the coalescent analyses to explore two aspects of this population genetic structure: (1) demographic differences among populations manifested as variation in effective population size; and (2) asymmetries in gene flow that might underlie patterns of allele frequency similarities between populations.

Some of the MIGRATE results suggested large differences in effective population sizes (Fig. 3), consistent with the frequency-based observations of genetic diversity (Table 1). In particular, the most probable estimates of θ based on introns from Alaska samples were one to three orders of magnitude lower than those for introns from all populations to the south (Fig. 3B). Among the latter populations, θ values for Vancouver Island and California populations were significantly higher than for Haida Gwaii. We found similar parameter estimates using uniform priors (Fig. 3) and using exponential priors (data not shown). The analyses of much less polymorphic mtDNA did not reveal such significant differences among population sizes (Fig. 3A).

Analysis of gene flow from the mtDNA data did not identify any significant differences in migration rates among these populations. In general, the MCMC search converged on coalescent patterns that emphasized widespread ancestral polymorphisms within populations to account for allele or haplotype sharing rather than high and variable rates of migration between populations (Fig. 3A).

In contrast, the considerably greater allelic diversity and higher divergence among alleles of the two introns allowed MIGRATE to identify several significant differences in migration rates, including higher M from Vancouver Island to California than from California to Alaska or Haida Gwaii, or from Vancouver Island to Alaska (Fig. 3B). We found similar results in separate analyses of the two intron alignments individually (data not shown). We obtained these more informative results despite the relatively short sequence alignments (113–343 bp) that could be identified as nonrecombining blocks suitable for MIGRATE analysis. The MIGRATE analysis of ATPS and GPI variation included low nonzero estimates of gene flow across Queen Charlotte Sound, most notably as north-to-south gene flow from Haida Gwaii to Vancouver Island (Supplementary Materials from Keever et al. 2009).
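
To give a sense of scale for the mutation-scaled parameters discussed above, the sketch below converts θ and M into an effective number of immigrants per generation. It assumes the usual parameterization in which θ = 4Neμ for a diploid nuclear locus and M = m/μ, so that Nem = θM/4; this convention, the function name, and the input values are assumptions made for illustration and are not estimates from this study.

```python
def migrants_per_generation(theta, m_scaled, inheritance_scalar=4.0):
    """Return N_e * m from mutation-scaled parameters.

    Assumes theta = x * N_e * mu and M = m / mu, where x is the
    inheritance scalar (assumed to be 4 for a diploid nuclear locus).
    The mutation rate cancels in the product theta * M.
    """
    return theta * m_scaled / inheritance_scalar

# Hypothetical values, for illustration only.
example_nm = migrants_per_generation(theta=0.01, m_scaled=200.0)
```

Because the mutation rate cancels, this conversion does not require an independent estimate of μ, but it inherits all of the uncertainty in both θ and M.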

The most probable estimates (and confidence intervals) of recombination rates from LAMARC analysis of the full GPI (R= 0.46, 0.31–0.58) and ATPS (R= 1.21, 0.91–1.68) alignments were high and repeatable in a second MCMC analysis (e.g., for GPI R= 0.50, 0.32–0.72). Because the ATPS alignment (635 bp) was based on longer sequences (584–616 bp) and included fewer indels than the 476 bp alignment of GPI alleles (242–415 bp), the very high ATPS recombination rate might be the more accurate estimate of the two in spite of the lower precision implied by the broader confidence interval. As expected, recombination rate estimated for mtDNA was low (R < 0.001) and not different from zero.

Parameter value estimates from these coalescent analyses were positively correlated across methods and genetic markers. For example, MIGRATE (Fig. 3A) and LAMARC converged on four similar pairs of θ estimates for the same mtDNA alignment (r= 0.52; see the Supporting Information). Similarly, the LAMARC analysis converged on 12 similar pairs of M estimates for ATPS and GPI alignments that differed in recombination rate, total length, and overall levels of polymorphism (r= 0.90).

Discussion

Previous reviews of empirical studies across oceans and taxa (Bohonak 1999; Grosberg and Cunningham 2001; Hellberg et al. 2002) have clearly articulated the expected patterns of population genetic variation for marine animals such as P. miniata with sedentary benthic adults and long-lived planktonic larvae. These patterns include (1) genetic homogeneity, perhaps with isolation-by-distance across the extremes of the range (Hellberg 1996); (2) genetic differentiation corresponding to present-day biogeographic features including physical oceanographic structure, range disjunctions, or boundaries between zoogeographic provinces that represent past or continuing barriers to dispersal (Burton 1998); or (3) population genetic signals of disjunction or vicariance that do not correspond to obvious contemporary barriers to gene flow (Benzie 1999). Identifying such patterns, documenting their relative frequencies, and understanding their causes are important goals of evolutionary ecology and contribute to a general understanding of the opportunities for local adaptation, extirpation, and speciation in the seas (Foltz et al. 2008).

Population Genetic Patterns in P. miniata

Two consistent and strong population genetic patterns emerged from our multilocus analysis of genetic structure in the bat star. First, most loci and methods of analysis showed strong and highly significant genetic differentiation between P. miniata populations north and south of Queen Charlotte Sound, including populations from southern Haida Gwaii and northern Vancouver Island separated by just a few hundred kilometers. Second, there was no evidence from any locus or method for genetic differentiation across the majority of the bat star geographic range from Vancouver Island to southern California that includes the broad distributional gap in Washington, Oregon, and northern California.

Our success in identifying these patterns from extremely variable intron sequence data depended on the method of analysis. Under infinite allele models based on allele and genotype frequencies (such as AMOVA) these data sometimes failed to identify the major phylogeographic break that was clearly and consistently evident in data from less polymorphic markers (such as mtDNA; Table 3). However, the introns were considerably more informative than other markers under coalescent models that emphasize allelic genealogies (such as MIGRATE; Fig. 3) because such methods use the magnitude and history of allelic divergence to estimate demographic parameters. Coalescent analysis of the intron data suggested large and highly significant differences in the effective population size that were consistent with severe population bottlenecks in Alaska and Haida Gwaii. We also found 10-fold higher rates of gene flow across the range disjunction in comparison to some rates of gene flow across Queen Charlotte Sound. Neither of these patterns was evident in similar analyses of the mtDNA data, probably due to the very shallow coalescent depth of the mtDNA phylogeny (Fig. 1). Such differences in information content across markers and models highlight the specific utility of adding highly variable nuclear sequences to phylogeographic studies (in coalescent analyses) and the importance of comparing such data against less variable markers (in frequency-based analyses).

In the absence of genetic information, the coincidence of the P. miniata geographic range disjunction and the southern extent of the last North American glaciation on the Pacific coast might imply that bat star populations gradually expanded out of northern and southern glacial refuges, leaving descendants that established the extant populations north and south (respectively) of the range disjunction. However, the discordance between the location of the phylogeographic break and the geographic range disjunction, together with the genetic affinity between Vancouver Island and California populations, allows us to reject this simple prediction. Our data and analyses are consistent with either high gene flow across the range disjunction or recent colonization of Vancouver Island by California migrants. A third scenario (which would be difficult to reject on the basis of genetic data alone) is recent local extirpation of bat star populations in Washington, Oregon, and northern California that fragmented a formerly more continuous species range. However, under any of these scenarios, the origin and maintenance of the range disjunction seems to require an ecological rather than a geological or historical explanation.

The origin and maintenance of the phylogeographic break at Queen Charlotte Sound may be more easily associated with geological and oceanographic dispersal barriers. A coalescent estimate of population divergence time between bat stars from Alaska and Vancouver Island using mtDNA and six anonymous nuclear DNA sequence markers (T. McGovern, C. Keever, M. Hart, C. Saski, and P. Marko, unpubl. data) is ∼80,000 years ago. This divergence time is consistent with population isolation due to changes in climate, sea level, and oceanographic circulation associated with the last glacial episode (or with other ecological or geological processes operating on the same time scale but not associated with glacial cycles). The phylogeographic break that we observed coincides with a west-to-east surface current (the North Pacific Current) that diverges at about the same latitude to form the northward Alaska and southward California current systems (Cummins and Freeland 2007). High-resolution models of density- and wind-driven circulation in the northeast Pacific (Foreman et al. 2008) suggest that exchange of P. miniata larvae across Queen Charlotte Sound may be limited. This physical current structure could contribute to the maintenance of a phylogeographic break established by Pleistocene climate change in spite of the prolonged 6- to 10-week period of planktonic larval growth and development in bat stars (Strathmann 1987; Rumrill 1989; Basch 1996).

Comparison to Close Neighbours and Close Relatives

Bat stars share a similar range disjunction with two other rocky intertidal species, the kelp Eisenia arborea and the turban snail Astraea gibberosa. Future phylogeographic studies of these species across the shared range disjunction could test the hypothesis of a shared ecological cause for this unusual distribution (if all three species show the same population genetic pattern across the disjunction; Bermingham and Moritz 1998; Avise 2000).

Bat stars also share a similar phylogeographic break with a taxonomically heterogeneous group of snails, fishes, and sea cucumbers (see Marko 2004; Hickerson and Cunningham 2005) that all lack a long-lived planktonic larval stage in the life cycle. Our study is the first from this well-known zoogeographic transition zone to show strong population genetic differentiation in a species with long-lived feeding larvae and high intrinsic dispersal potential. Among members of the same community with planktotrophic development, our results contrast with those for the abundant keystone predator Pisaster ochraceus. Harley et al. (2006) found no significant mtDNA differentiation in this sea star species at Queen Charlotte Sound (or at any other well-known phylogeographic breaks such as Point Conception in California; Dawson 2001). Pisaster ochraceus and P. miniata have broadly overlapping geographical distributions but occur in different habitats and on different substrates (Lambert 2000). If the phylogeographic break between bat star populations was initiated by late Pleistocene glaciations and maintained by the North Pacific Current (as suggested above), then these two species may also differ in their response to Pleistocene climate change (and population history) or in their response to present-day ocean currents and the tendency to cross Queen Charlotte Sound during larval development. The nature of such differences is unknown.

Bat stars are members of a clade of shallow-water sea stars (Family Asterinidae) in which highly derived mating systems and modes of dispersal have evolved in parallel in several genera (Hart et al. 1997; Byrne 2006; Keever and Hart 2008). As expected, strong population genetic structure has been found in asterinid species with benthic development of brooded larvae (Hunt 1993; Matsuoka and Asano 2003; Waters et al. 2004; Baus et al. 2005; Colgan et al. 2005; Sherman et al. 2008). These results suggest that evolutionary changes in mode of larval development are important determinants of population genetic structure in asterinids. However, the discovery of unexpected and strong phylogeographic breaks in P. miniata and another asterinid (from New Zealand; Waters and Roy 2004) with planktotrophic development suggests that extrinsic physical or geological barriers and historical processes might also have significant effects on population genetic variation in asterinids. Phylogenetic comparative analyses (e.g., Kyle and Boulding 2000; Collin 2001; Kelly and Eernisse 2007; Teske et al. 2007; Sherman et al. 2008) can complement community-level analyses of the contribution of biogeographic history, mode of development, and ecological process to the evolution of population genetic structure. Such phylogenetic comparative analyses using multiple asterinid lineages and genetic markers could be used to estimate the phylogenetic correlations between evolutionary changes in life history and population genetic structure, and the residual contributions of other (geological or ecological) processes in shaping the evolution of population genetic structure.

References

Addison JA, Hart MW (2005a) Colonization, dispersal, and hybridization influence phylogeography of North Atlantic sea urchins (Strongylocentrotus droebachiensis). Evolution 59:532–543.

Addison JA, Hart MW (2005b) Spawning, copulation and inbreeding coefficients in marine invertebrates. Biology Letters 1:450–453.

Arndt A, Smith MJ (1998) Genetic diversity and population structure in two species of sea cucumber: differing patterns according to mode of development. Molecular Ecology 7:1053–1064.

Avise JC (2000) Phylogeography: the history and formation of species. Harvard Univ Press, Cambridge, MA.

Ayre DJ, Minchinton TE, Perrin C (2009) Does life history predict past and current connectivity for rocky intertidal invertebrates across a marine biogeographic barrier? Molecular Ecology 18:1887–1903.

Bandelt HJ, Forster P, Rohl A (1999) Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution 16:37–48.

Basch LV (1996) Effects of algal and larval densities on development and survival of asteroid larvae. Marine Biology 126:693–701.

Baus E, Darrock DJ, Bruford MW (2005) Gene-flow patterns in Atlantic and Mediterranean populations of the Lusitanian sea star Asterina gibbosa. Molecular Ecology 14:3373–3382.

Beerli P (2006) Comparison of Bayesian and maximum likelihood inference of population genetic parameters. Bioinformatics 22:341–345.

Beerli P, Felsenstein J (2001) Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc Natl Acad Sci USA 98:4563–4568.

Benzie JAH (1999) Genetic structure of coral reef organisms: ghosts of dispersal past. Amer Zool 39:131–145.

Bermingham E, Moritz C (1998) Comparative phylogeography: concepts and applications. Molecular Ecology 7:367–369.

Bernardi G, Findley L, Rocha-Olivares A (2003) Vicariance and dispersal across Baja California in disjunct marine fish populations. Evolution 57:1599–1609.

Bohonak AJ (1999) Dispersal, gene flow, and population structure. Quarterly Review of Biology 74:21–45.

Briggs JC (1987) Antitropical distribution and evolution in the Indo-West Pacific ocean. Syst Zool 36:237–247.

Briggs JC (1974) Marine zoogeography. McGraw-Hill, New York, NY.

Burton RS (1998) Intraspecific phylogeography across the Point Conception biogeographic boundary. Evolution 52:734–745.

Byrne M (2006) Life history diversity and evolution in the Asterinidae. Integr Comp Biol 46:243–254.

Byun SA, Koop BF, Reimchen TE (1997) North American black bear mtDNA phylogeography: implications for morphology and the Haida Gwaii glacial refugium controversy. Evolution 51:1647–1653.

Cassone BJ, Boulding EG (2006) Genetic structure and phylogeography of the lined shore crab, Pachygrapsus crassipes, along the northeastern and western Pacific coasts. Marine Biology 149:213–226.

Chia F-S, Walker CW (1991) Echinodermata: Asteroidea. Pp 301–353 in AC Giese, JS Pearse, VB Pearse, eds. Reproduction of marine invertebrates, Vol 6: Echinoderms and lophophorates. The Boxwood Press, Pacific Grove, CA.

Colgan DJ, Byrne M, Rickard E, Castro LR (2005) Limited nucleotide divergence over large spatial scales in the asterinid sea star Patiriella exigua. Marine Biology 146:263–270.

Collin R (2001) The effects of mode of development on phylogeography and population structure of North Atlantic Crepidula (Gastropoda: Calyptraeidae). Molecular Ecology 10:2249–2262.

Crandall ED, Frey MA, Grosberg RK, Barber PH (2008) Contrasting demographic history and phylogeographical patterns in two Indo-Pacific gastropods. Molecular Ecology 17:611–626.

Cummins PF, Freeland HJ (2007) Variability of the North Pacific current and its bifurcation. Progr Ocean 75:253–265.

Darwin C (1859) On the origin of species by means of natural selection. John Murray, London.

Dawson MN (2001) Phylogeography in coastal marine animals: a solution from California? Journal of Biogeography 28:723–736.

DeChaine EG, Martin AP (2006) Using coalescent simulations to test the impact of Quaternary climate cycles on divergence in an alpine plant-insect association. Evolution 60:1004–1013.

Edmands S (2001) Phylogeography of the intertidal copepod Tigriopus californicus reveals substantially reduced population differentiation at northern latitudes. Molecular Ecology 10:1743–1750.

Ekman S (1953) Zoogeography of the sea. Sidgwick and Jackson, London.

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611–2620.

Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online 1:47–50.

Fisher WK (1911) Asteroidea of the north Pacific and adjacent waters. Part 1. Phanerozonia and Spinulosa. Bull US Natl Mus 76:1–419.

Foltz DW, Nguyen AT, Kiger JR, Mah CL (2008) Pleistocene speciation of sister taxa in a North Pacific clade of brooding sea stars (Leptasterias). Marine Biology 154:593–602.

Foreman MGG, Crawford WR, Cherniawsky JY, Galbraith J (2008) Dynamic ocean topography for the northeast Pacific and its continental margins. Geophysical Research Letters 35:L22606.

Forward RB, Tankersley RA, Welch JM (2003) Selective tidal stream transport of the blue crab Callinectes sapidus: an overview. Bulletin of Marine Science 72:347–365.

Fu Y-X, Li W-H (1993) Statistical tests of neutrality of mutations. Genetics 133:693–709.

Goudet J (1995) FSTAT version 1.2: a computer program to calculate F statistics. Journal of Heredity 86:485–486.

Grosberg RK, Cunningham CW (2001) Genetic structure in the sea: from populations to communities. Pp 61–84 in MD Bertness, SD Gaines, ME Hay, eds. Marine community ecology. Sinauer Associates, Sunderland, MA.

Grosberg RK, Levitan DR, Cameron BB (1996) Characterization of genetic structure and genealogies using RAPD-PCR markers: a random primer for the novice and nervous. Pp 67–100 in JD Ferraris, SR Palumbi, eds. Molecular zoology: advances, strategies, and protocols. Wiley-Liss, New York.

Hamilton SL, Regetz J, Warner RR (2008) Postsettlement survival linked to larval life in a marine fish. Proc Natl Acad Sci USA 105:1561–1566.

Hardy OJ, Vekemans X (2002) SPAGEDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes 2:618–620.

Harley CDG, Pankey MS, Wares JP, Grosberg RK, Wonham MJ (2006) Color polymorphisms and genetic structure in the sea star Pisaster ochraceus. Biological Bulletin 211:248–262.

Hart MW, Byrne M, Smith MJ (1997) Molecular phylogenetic analysis of life-history evolution in asterinid starfish. Evolution 51:1848–1861.

Hellberg ME (1996) Dependence of gene flow on geographic distance in two solitary corals with different larval dispersal capabilities. Evolution 50:1167–1175.

Hellberg ME, Balch DP, Roy K (2001) Climate-driven range expansion and morphological evolution in a marine gastropod. Science 292:1707–1710.

Hellberg ME, Burton RS, Neigel JE, Palumbi SR (2002) Genetic assessment of connectivity among marine populations. Bulletin of Marine Science 70(Suppl):273–290.

Hetherington R, Barrie JV, Reid RGB, MacLeod R, Smith DJ, James TS, Kung R (2003) Pleistocene coastal paleogeography of the Queen Charlotte Islands, British Columbia, Canada, and its implications for terrestrial biogeography and early postglacial human occupation. Canadian Journal of Earth Sciences 40:1755–1766.

Hewitt GM (1996) Some genetic consequences of ice ages, and their role in divergence and speciation. Biological Journal of the Linnean Society 58:247–276.

Hickerson MJ, Cunningham CW (2005) Contrasting quaternary histories in an ecologically divergent sister species pair of low-dispersing intertidal fish (Xiphister) revealed by multilocus DNA analysis. Evolution 59:344–360.

Hickerson MJ, Ross JRP (2001) Post-glacial population history and genetic structure of the northern clingfish (Gobbiesox maeandricus) revealed from mtDNA analysis. Marine Biology 138:407–419.

Holder K, Montgomerie R, Friesen VL (1999) A test of the glacial refugium hypothesis using patterns of mitochondrial and nuclear DNA sequence variation in rock ptarmigan (Lagopus mutus). Evolution 53:1936–1950.

Hunt A (1993) Effects of contrasting patterns of larval dispersal on the genetic connectedness of local populations of two intertidal starfish, Patiriella calcar and P. exigua. Marine Ecology Progress Series 92:179–186.

Jarman SN, Ward RD, Elliott NG (2002) Oligonucleotide primers for PCR amplification of coelomate introns. Marine Biotechnology 4:347–355.

Keever CC, Hart MW (2008) Something for nothing? Reconstruction of ancestral character states in asterinid sea star development. Evol Dev 10:62–73.

Keever CC, Sunday J, Wood C, Byrne M, Hart MW (2008) Discovery and cross-amplification of microsatellite polymorphisms in asterinid sea stars. Biological Bulletin 215:164–172.

Keever CC, Sunday J, Puritz JB, Addison JA, Toonen RJ, Grosberg RK, Hart MW (2009) Discordant distribution of populations and genetic variation in a sea star with high dispersal potential. Evolution 63:3214–3227.

Kelly RP, Eernisse DJ (2007) Southern hospitality: a latitudinal gradient in gene flow in the marine environment. Evolution 61:700–717.

Knowles LL (2001) Did the Pleistocene glaciations promote divergence? Tests of explicit refugial models in montane grasshoppers. Molecular Ecology 10:691–701.

Knowles LL, Carstens BC (2007) Estimating a geographically explicit model of population divergence. Evolution 61:477–493.

Knowlton N, Keller BD (1986) Larva which fall short of their potential: Highly localized recruitment in an alpheid shrimp with extended larval development. Bulletin of Marine Science 39:213–223.

Kozloff EN (1983) Seashore life of the northern Pacific coast. Univ Washington Press, Seattle, WA.

Kuhner MK (2006) LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters. Bioinformatics 22:768–770.

Kumar S, Tamura K, Nei M (2004) MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Briefings in Bioinformatics 5:150–163.

Kyle CJ, Boulding EG (2000) Comparative population genetic structure of marine gastropods (Littorina spp) with and without pelagic larval dispersal. Marine Biology 137:835–845.

Lambert P (2000) Sea stars of British Columbia, southeast Alaska, and Puget Sound. University of British Columbia Press, Vancouver, BC.

Lee HJ, Boulding EG (2007) Mitochondrial DNA variation in space and time in the northeastern Pacific gastropod, Littorina keenae. Molecular Ecology 16:3084–3103.

Lessios HA, Kessing BD, Robertson DR (1998) Massive gene flow across the world’s most potent marine biogeographic barrier. Proceedings of the Royal Society of London B 265:583–588.

Lindberg DR (1991) Marine biotic interchange between the northern and southern hemispheres. Paleobiology 17:308–324.

Lovette IJ (2005) Glacial cycles and the tempo of avian speciation. Trends in Ecology and Evolution 20:57–59.

Marko PB (2004) ‘What’s larvae got to do with it?’ Disparate patterns of post-glacial population structure in two benthic marine gastropods with identical dispersal potential. Molecular Ecology 13:597–611.

Matsuoka N, Asano H (2003) Genetic variation in northern Japanese populations of the starfish Asterina pectinifera. Zool Sci 20:985–988.

Munoz-Salazar R, Talbot SL, Sage GK, Ward DH, Cabello-Pasini A (2005) Population genetic structure of annual and perennial populations of Zostera marina L. along the Pacific coast of Baja California and the Gulf of California. Molecular Ecology 14:711–722.

O’Loughlin PM, Waters JM (2004) A molecular and morphological revision of genera of Asterinidae (Echinodermata: Asteroidea). Mem Mus Vic 61:1–40.

Pielou EC (1991) After the ice age: the return of life to glaciated North America. Univ Chicago Press, Chicago.

Polson MP, Hewson WE, Eernisse DJ, Baker PK, Zacherl DC (2009) You say conchaphila, I say lurida: Molecular evidence for restricting the Olympia oyster (Ostrea lurida Carpenter 1864) to temperate western North America. J Shell Res 28:11–21.

Polzin T, Daneshmand SV (2003) On Steiner trees and minimum spanning trees in hypergraphs. Oper Res Lett 31:12–20.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959.

Raymond M, Rousset F (1995) An exact test for population differentiation. Evolution 49:1280–1283.

Riddle BR (1996) The molecular phylogeographic bridge between deep and shallow history in continental biotas. Trends in Ecology and Evolution 11:207–211.

Rocha-Olivares A, Vetter RD (1999) Effects of oceanographic circulation on the gene flow, genetic structure, and phylogeography of the rosethorn rockfish (Sebastes helvomaculatus). Canadian Journal of Fisheries and Aquatic Sciences 56:803–813.

Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497.

Rumrill SS (1989) Population size-structure, juvenile growth, and breeding periodicity of the sea star Asterina miniata in Barkley Sound, British Columbia. Marine Ecology Progress Series 56:37.

Scheltema RS (1986) Long-distance dispersal by planktonic larvae of shoalwater benthic invertebrates among central Pacific islands. Bulletin of Marine Science 39:241–256.

Schwaninger HR (2008) Global mitochondrial DNA phylogeography and biogeographic history of the antitropically and longitudinally disjunct marine bryozoan Membranipora membranacea L. (Cheilostomata): another cryptic marine sibling species complex? Molecular Phylogenetics and Evolution 49:893–908.

Shanks AL, Grantham BA, Carr MH (2003) Propagule dispersal distance and the size and spacing of marine reserves. Ecological Applications 13:S159–S169.

Sherman CD, Hunt A, Ayre DJ (2008) Is life history a barrier to dispersal? Contrasting patterns of genetic differentiation along an oceanographically complex coast. Biological Journal of the Linnean Society 95:106–116.

Simonsen KL, Churchill GA, Aquadro CF (1995) Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413–429.

Smith CT, Nelson RJ, Wood CC, Koop BF (2001) Glacial biogeography of North American coho salmon (Oncorhynchus kisutch). Molecular Ecology 10:2775–2785.

Sotka EE, Wares JP, Barth JA, Grosberg RK, Palumbi SR (2004) Strong genetic clines and geographical variation in gene flow in the rocky intertidal barnacle Balanus glandula. Molecular Ecology 13:2143–2156.

Strathmann MF (1987) Reproduction and development of marine invertebrates of the northern Pacific coast. Univ Washington Press, Seattle, WA.

Swearer SE, Caselle JE, Lea DW, Warner RR (1999) Larval retention and recruitment in an island population of a coral-reef fish. Nature 402:799–802.

Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.

Teske PR, Papadopoulos I, Zardi GI, McQuaid CD, Edkins MT, Griffiths CL, Barker NP (2007) Implications of life history for genetic structure and migration rates of southern African coastal invertebrates: planktonic, abbreviated and direct development. Marine Biology 152:697–711.

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25:4876–4882.

Waters JM, Roy MS (2004) Phylogeography of a high-dispersal New Zealand sea star: does upwelling block gene flow? Molecular Ecology 13:2797–2806.

Waters JM, O’Loughlin PM, Roy MS (2004) Cladogenesis in a starfish species complex from southern Australia: evidence for vicariant speciation? Molecular Phylogenetics and Evolution 32:236–245.

Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370.

Wiley EO (1988) Vicariance biogeography. Annual Review of Ecology and Systematics 19:513–542.

Wilson AB (2006) Genetic signature of recent glaciation on populations of a near-shore marine fish species (Syngnathus leptorhynchus). Molecular Ecology 15:1857–1871.

Woerner AE, Cox MP, Hammer MF (2007) Recombination filtered genomic datasets by information maximization. Bioinformatics 23:1851–1853.

Tables and Figures

Table 2.1. Sample locations and number of individuals genotyped per location (n) for each marker class.

For ATPS and GPI, the number of alleles sequenced=2n. Summary statistics for diversity are observed (HO), and expected (HE) heterozygosity and standardized allelic richness (RS) averaged across seven microsatellite loci; and haplotype (h) and nucleotide (π) diversity for each of the three sequenced loci.

Table 2.2. Patiria miniata population structure from analysis of molecular variance (AMOVA) in allele frequencies.

| Marker | Source of variation | d.f. | Sum of squares | % of variance | Fixation index | P-value |
|---|---|---|---|---|---|---|
| Microsatellites | Among populations | 10 | 101.4 | 4.7 | FST=0.04 | <0.001 |
| Microsatellites | Within populations | 396 | 993.7 | 22.5 | FIS=0.24 | <0.001 |
| Microsatellites | Within individuals | 6 | 632.0 | 72.7 | FIT=0.27 | <0.001 |
| mtDNA | Among populations | 12 | 32.9 | 25.5 | FST=0.25 | <0.001 |
| mtDNA | Within populations | 240 | 87.1 | 74.4 | | |
| ATPS | Among populations | 7 | 178.6 | 2.4 | FST=0.02 | 0.004 |
| ATPS | Within populations | 172 | 2257.7 | 15.5 | FIS=0.16 | <0.001 |
| ATPS | Within individuals | 180 | 1716.1 | 82.1 | FIT=0.18 | <0.001 |
| GPI | Among populations | 7 | 403.6 | 6.4 | FST=0.06 | <0.001 |
| GPI | Within populations | 164 | 3582.4 | 73.4 | FIS=0.78 | <0.001 |
| GPI | Within individuals | 172 | 453.9 | 20.2 | FIT=0.79 | <0.001 |

Figure 1.1 Haplotype network for Patiria miniata mtDNA sequences.

Each symbol represents a unique sequence (28 total). The area of each circle (and of the segments in each pie diagram) is proportional to haplotype frequency (n= 1–120). Lines between symbols indicate single substitution differences between haplotypes (the small open symbol indicates one missing haplotype inferred for two sequences that differed by two substitutions), and are not proportional to genetic distance between haplotypes. Color coding indicates the origin of each private haplotype (solid circles) or the frequency of shared haplotypes (pie diagrams) in each region indicated in the map at right: Alaska (AK, dark blue) and Haida Gwaii (HG, cyan) to the north of Queen Charlotte Sound (QCS); Vancouver Island (VI, green) and California (CA, red) to the south. Black rings on the map show approximate locations of sample sites (some indicate >1 locations separated by small distances; see Table 1). The arrows show the approximate location of the divergence of the North Pacific Current. The hatched shading shows the P. miniata range disjunction.

Figure 1.2 STRUCTURE clustering of Patiria miniata microsatellite (A) and GPI + microsatellite (B) genotypes.

Both data sets suggest a strong phylogeographic break between samples north (Alaska, Haida Gwaii) and south (Vancouver Island, California) of Queen Charlotte Sound. The heights of individual bars in each panel show the probability of assignment to one of k=2 empirically identified groups indicated by color (orange or purple). Sample site abbreviations (two- or three-letter acronyms) as in Table 1. Microsatellite data alone could be fitted to k=2 groups (C, left) with a nonsignificant improvement in log-likelihood scores up to k=4; the combined data (C, right) strongly suggested a best fit with k=2 groups. Both data sets had a modal value of Δk (a measure of the best-fit number of genotype clusters) at k=2.

Figure 1.3 Asymmetrical gene flow and effective population sizes estimated in MIGRATE.

Gene flow (M= θm/µ) and population size (θ= 4Neµ) were estimated using mtDNA (A) and combined results from ATPS and GPI intron polymorphisms (B) for regional population samples (see Fig. 1). Arrows show asymmetrical gene flow from one region to another; numbers show effective population size for each region. Estimates of θ for mtDNA were multiplied by 4 for comparison to diploid biparental nuclear markers. Within each panel, similar line weights indicate groups of migration rates with overlapping 95% confidence intervals in post hoc comparisons (see the legend for each panel); significant differences between migration rates in panel B are indicated by migration rates that do not share the same italic letters a or b. Asymmetries are shown as differences in line weight for the same arrow (e.g., between Vancouver Island and California for introns). Within each panel, different font sizes for θ show significant differences between regions. See the Supplementary Materials from Keever et al. 2009 for most probable estimates and confidence intervals for θ and M values for each population, population pair, data set, and method.

Chapter 3.

There’s no place like home: oceanographic circulation model predicts low larval dispersal and observed genetic structure along a complex coastline

Authors: J.M. Sunday, I. Popovic, W.J. Palen, M.G.G. Foreman, M.W. Hart

Abstract

Understanding the movement of genes and individuals across marine landscapes is a long-standing challenge in marine ecology and evolution, and can inform predictions on the possibility of local adaptation, the persistence and movement of populations, and the spatial scale of effective management. Patterns of gene flow in the ocean are often inferred based on population genetic analyses coupled with knowledge of species’ dispersive life histories. However, genetic analyses often rely on simplistic model assumptions and may not capture present-day connectivity between populations. Here we use a high-resolution oceanographic circulation model to predict larval dispersal along the complex coastline of western Canada, separating two previously-established zoogeographic provinces. We simulate dispersal in a benthic sea star with a 6-10 week pelagic larval phase, and test predictions of this model against observed genetic structure within this region. We also test predictions with de novo genetic sampling in a site within the phylogenetic break. We find that the genetic and circulation models both predict the high degree of genetic structure observed in this species, despite its long pelagic duration. High genetic structure on this complex coastline can therefore be explained through advective circulation patterns, validating the utility of this model for predictions at a broad scale.

Introduction

Understanding patterns of gene flow across geographic landscapes is a long-standing goal of ecology. The movement of genes and individuals across landscapes can provide information on the possibility of local adaptation, the persistence of populations, the synchrony with which species’ ranges will shift under environmental change, and the spatial scale at which management efforts will be most effective. As genetic tools become increasingly available, the use of molecular markers to infer gene flow has become widespread. However, inferring gene flow based on the spatial distribution of genetic variation is fraught with potential errors due to mismatches between the natural processes that may lead to genetic differentiation (e.g. genetic drift, historical vicariance) and the highly simplistic assumptions made when interpreting differentiation as a consequence of contemporary gene flow (Whitlock & McCauley 1999; Marko & Hart 2011). For example, Wright’s Island model is widely used to calculate and interpret genetic structure using F-statistics. Yet this model restrictively assumes that populations are in drift-migration equilibrium, and thus does not attribute genetic variation to historical gene flow that may differ from contemporary gene flow, or to similarity between populations by descent rather than gene flow (Wright 1969; Slatkin 1987).

Coalescent approaches, such as the isolation with migration model, provide a non-equilibrium based method to disentangle common ancestry from allele-sharing due to gene flow (Nielsen & Wakeley 2001; Hey & Nielsen 2004), and can therefore provide more robust estimates of gene flow between populations. However, these estimates of gene flow are not very sensitive to recent changes, and they tend to require large amounts of data to be precise (Marko & Hart 2011). In addition, comparisons can only be made between two populations at a time.

Another approach is to base strong predictions of gene flow on independent data (Irwin 2002). For example, the increasing availability of high-resolution oceanographic circulation models in marine systems can provide information on advection patterns of dispersive life phases (e.g. Siegel et al. 2003). Several recent studies have used high-resolution oceanographic circulation models for direct comparison to empirical genetic data of marine organisms. Some approaches involve qualitative comparison of dispersal patterns and observed genetic structure (e.g. Gilg & Hilbish 2003; Kenchington et al. 2006), while others use analytical tests, usually producing a matrix of dispersal probabilities between locations and relating this to observed pairwise genetic differentiation (Galindo et al. 2006; Dupont et al. 2007; Galindo et al. 2010; White et al. 2010; Foster et al. 2012), or multilocus assignment tests (Schunter et al. 2011). Results of these studies have either shown good agreement with genetic studies (Galindo et al. 2006; Foster et al. 2012; White et al. 2010), or have been useful for identifying alternative or additional mechanisms in structuring populations, such as the role of history, natural selection, and larval behaviour (e.g. Galindo et al. 2010).

The west coast of North America from southern British Columbia to Alaska is an area of potentially complex oceanography, with large fjords, canyons, and islands promoting multiple eddies and counter-currents. Two major oceanic gyres split along the coastline within this region, forming the Alaska current moving northwards, and the California current moving southwards (Dodimead et al. 1963). This roughly coincides with the transition between the Oregonian and Aleutian zoogeographical provinces (Briggs 1974). In addition, there is a relatively recent history of glacial cover over much of the coastal region during the last glacial maximum (~15ky bp), through which population geographic distributions have likely changed (Pielou 1991). Effects of this history coupled with prominent circulation patterns are expected to be integrated in observable genetic variation sampled in present-day populations. Indeed, comparative studies of marine genetic structure in this region have tended to reveal either relatively low diversity and little genetic structure throughout, or relatively high diversity with significant genetic structure between locations sampled between Oregon and Alaska (Hellberg 1996; Arndt & Smith 1998; Rocha-Olivares & Vetter 1999; Kyle & Boulding 2000; Edmands 2001; Marko 2004; Hickerson & Cunningham 2005; Harley et al. 2006; Wilson 2006; Kelly & Eernisse 2007; Kelly & Palumbi 2010). Most strikingly, these patterns do not seem to be related to pelagic larval duration (Marko 2004; Kelly & Palumbi 2010), and several species with long pelagic durations have been found to have significant population structure across samples in this region (Rocha-Olivares & Vetter 1999; Kyle & Boulding 2000; Keever et al. 2009; Kelly & Palumbi 2010). This suggests that local oceanographic features may contribute to genetic structure through effects on larval dispersal and gene flow.

To characterize the potential contribution of ongoing gene flow to spatial genetic structure (distinct from the effects of historical demographic changes) we used a high-resolution oceanographic circulation model to simulate larval dispersal along the northwest coast of North America. We simulated gene flow through this model and tested predictions against observed population genetic structure of a genetically well-sampled marine invertebrate with a long pelagic larval dispersal duration, the sea star Patiria miniata (Keever et al. 2009). We also used this model to make predictions of gene flow into and out of a previously unsampled location, and tested these predictions with new genetic samples. We found that the oceanographic model matches observed patterns well, and provides accurate predictions of genetic structure in the test populations.

Methods

Study species

We simulated population connectivity of the bat star, Patiria miniata, in which genetic population structure has been previously sampled using several classes of molecular marker: a mitochondrial sequence, seven nuclear microsatellites, two nuclear intron sequence markers (Keever et al. 2009), and sequences encoding gamete recognition genes (Sunday and Hart, unpublished). P. miniata adults live in the intertidal and shallow subtidal benthos, and their distribution is not continuous along the eastern Pacific. P. miniata are known to occur in California as far north as Fort Bragg, are absent from the coast of Oregon and Washington, but are found on the outer coast of British Columbia, including rocky shores of Haida Gwaii, and as far north as southeastern Alaska (Lambert 2000). P. miniata spawns primarily in the summer (Rumrill 1989), and has a planktotrophic larval phase estimated to be 6-10 weeks (Strathmann 1987; Rumrill 1989; Basch 1996).

Oceanographic Model

We used a buoyancy-, wind-, and tidally-driven Lagrangian particle dispersal model for the coastal region of North America from California to southeastern Alaska. This model uses depth-varying ocean velocities to simulate the movement of individual particles from a given starting location and depth. Ocean velocities were computed with a high-resolution diagnostic finite element model of the northeast Pacific Ocean forced with seasonal climatologies of temperature, salinity, and wind stress (Foreman et al. 2008). While the model was developed to provide seasonal estimates of dynamic ocean topography, and in this capacity was validated against satellite altimetry and coastal tide gauge measurements, it also produces three-dimensional velocity fields that can be used for transport and dispersion studies. In particular, depth-varying velocities are computed at the vertices of an unstructured triangular grid with 169,869 triangles and sides whose lengths vary from 100 m in some narrow coastal channels to 70 km in the deep ocean (Foreman et al. 2008). We used the summer-averaged velocities to simulate the spawning season of P. miniata. These included a time-invariant background buoyancy- and wind-driven flow field and a tidal current composed of time-varying flows from eight constituents (the four largest semi-diurnals and diurnals), which together account for approximately 85% of the total tidal range.

We used these velocity fields with the drogue-tracking algorithm implemented in DROG3D (Blanton 1995) to trace passive Lagrangian particles through the combined tidal and buoyancy-wind velocity fields with a 0.5 hour time step. This particle tracking simulation was deterministic (without a random diffusion step), but outcomes varied depending on the half-hour at which the drogue was released, and we therefore obtained high variation among particles released from the same location at different release times. We held particles at 1 m below the surface, based on observation of other larvae (Pennington & Emlet 1986; Martin et al. 1997) and observations of negative-geotactic swimming behaviour in Pisaster ochraceus (Crawford & Jackson 2002) and P. miniata (JS personal observation).
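
The tracking step can be illustrated with a toy example. The Python sketch below is not DROG3D and does not use the Foreman et al. (2008) velocity fields; it assumes a hypothetical flow made of a steady background current plus a single sinusoidal tidal constituent, planar coordinates in km, and simple forward-Euler stepping. Only the 0.5 hour time step and the absence of a random diffusion step are taken from the text; all function names and velocity values are illustrative.

```python
import math

DT_HOURS = 0.5            # time step stated in the text
M2_PERIOD_HOURS = 12.42   # period of the principal lunar semi-diurnal tide


def velocity(t_hours, background=(0.05, 0.02), tidal_amp=(0.5, 0.0)):
    """Hypothetical velocity (km/h): steady flow plus one tidal constituent."""
    phase = 2.0 * math.pi * t_hours / M2_PERIOD_HOURS
    u = background[0] + tidal_amp[0] * math.cos(phase)
    v = background[1] + tidal_amp[1] * math.cos(phase)
    return u, v


def track_particle(x0, y0, release_hour, duration_days):
    """Deterministic forward-Euler track; returns a list of (t, x, y)."""
    n_steps = int(duration_days * 24 / DT_HOURS)
    t, x, y = release_hour, x0, y0
    track = [(t, x, y)]
    for _ in range(n_steps):
        u, v = velocity(t)
        x += u * DT_HOURS      # advect with the velocity at the current time
        y += v * DT_HOURS
        t += DT_HOURS
        track.append((t, x, y))
    return track
```

Releasing the same particle at a different half-hour changes the tidal phase it experiences, so end points differ among release times even though each individual track is deterministic.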

Seeding locations

We defined 10 coastal regions, each a 110 km stretch of coastline, within the northern end of the P. miniata range (Fig. 1). These were in sites where P. miniata had either been observed, or where their habitat (shallow rocky substratum) is known to exist. Within each region, we established 40 seeding locations along the coastline, each ~2 km offshore to reduce the number of particles that became grounded along the coastline near the release sites. We tracked simulated larvae from each location starting at 25 different times throughout the breeding season, producing 1000 tracks from each region. These 10 regions included areas of the coastline which were not sampled genetically, but provide “stepping stones” for more realistic dispersal between the 5 sampled populations. We refer to genetically-sampled populations using bold capitals (BA, WH, BB, HG, AK) and to stepping stone populations using lower case italics (cvi, scc, shg, kit, pru; see Table 1 for definitions).

Connectivity Matrix

To determine the number of simulated larvae that could settle in each region, we defined an area bounded by a 50 km radius from the center of each region on the coast using Geographic Information Systems (ESRI, ArcGIS 10) (Fig. 1a). A larva was considered to settle within any such “settlement region” through which it passed during its competency period, which we defined as being between 40 days, the time after which larvae are capable of settling, and 70 days, after which they are likely to die (Basch 1996). We assumed constant survival between 40 and 70 days. We calculated dispersal probability between pairs of populations based on the number of arrivals out of the total number of larvae started from each population.
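
The settlement rule and the resulting dispersal probabilities can be sketched as follows. This is an illustration under simplifying assumptions, not the GIS workflow used here: positions are treated as planar coordinates in km, each track is a list of `(day, x_km, y_km)` tuples, and the function and variable names are hypothetical. The 50 km radius and the 40 to 70 day competency window are taken from the text.

```python
import math


def settlement_regions(track, centers, radius_km=50.0, window_days=(40.0, 70.0)):
    """Regions whose settlement circle the track enters while competent."""
    hit = set()
    for day, x, y in track:
        if window_days[0] <= day <= window_days[1]:
            for name, (cx, cy) in centers.items():
                if math.hypot(x - cx, y - cy) <= radius_km:
                    hit.add(name)
    return hit


def connectivity_matrix(tracks_by_source, centers):
    """matrix[source][dest] = arrivals at dest / particles released at source."""
    matrix = {}
    for source, tracks in tracks_by_source.items():
        counts = {name: 0 for name in centers}
        for track in tracks:
            # A larva counts once for every region it passes through.
            for dest in settlement_regions(track, centers):
                counts[dest] += 1
        matrix[source] = {d: c / len(tracks) for d, c in counts.items()}
    return matrix
```

Because a larva is counted in every settlement region it crosses, rows of the matrix need not sum to one: particles that reach no region lower the row total, and particles that cross several regions raise it.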

From preliminary work we determined that there was no evidence of direct connectivity between sites in California and those in British Columbia and Alaska. Therefore, we did not include California populations in the connectivity matrix, and focussed on hypotheses of genetic structure within the northern part of the P. miniata range only.

Genetic model

To predict the genetic structure arising from the estimated connectivity matrix, we ran a genetic simulation similar to that in Galindo et al. (2006). We simulated 10 unlinked loci with 2 alleles starting at a frequency of 0.5 in each population. All populations had Npop individuals, and in every generation, a random sample of parents was drawn, from which Nlarv larvae were produced and dispersed to each of the other populations according to probabilities calculated in the connectivity matrix. When new recruits arrived at each population, we randomly subsampled Npop individuals in the new population to simulate post-settlement mortality. Hence there were two episodes of genetic drift in every generation: once when larvae were sampled from parents, and again through mortality in the population after new recruitment. Values of Npop and Nlarv affected the rate at which populations reached drift-migration equilibrium but did not qualitatively or quantitatively affect results (data not shown). We present results from Npop=500 and Nlarv=100 here.
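
A minimal sketch of one way to implement a simulation of this kind is given below. It is not the code used in this study, and several details are assumptions made for illustration: individuals are treated as haploid, the "new population" is taken to be the resident adults plus the arriving recruits, larvae that settle nowhere are discarded, and all names are hypothetical. The defaults (10 loci, starting frequency 0.5, Npop=500, Nlarv=100, 500 generations) follow the text.

```python
import random


def step(counts, conn, n_pop, n_larv, rng):
    """Advance one locus by one generation.

    counts[i]: copies of allele A among n_pop individuals in population i.
    conn[i][j]: probability that a larva from i settles in j (row sum <= 1).
    """
    k = len(counts)
    recruits = [[0, 0] for _ in range(k)]       # [allele-A recruits, all recruits]
    for i in range(k):
        p = counts[i] / n_pop
        lost = max(0.0, 1.0 - sum(conn[i]))     # larvae that settle nowhere
        dests = rng.choices(range(k + 1), weights=list(conn[i]) + [lost], k=n_larv)
        for j in dests:
            if j < k:                           # index k means "lost"
                recruits[j][0] += rng.random() < p   # first drift episode
                recruits[j][1] += 1
    new_counts = []
    for j in range(k):
        ones = counts[j] + recruits[j][0]
        zeros = n_pop + recruits[j][1] - ones
        pool = [1] * ones + [0] * zeros
        new_counts.append(sum(rng.sample(pool, n_pop)))  # second drift episode
    return new_counts


def simulate(conn, n_pop=500, n_larv=100, n_loci=10, generations=500, seed=1):
    """Return allele-A frequencies indexed as [locus][population]."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_loci):
        counts = [n_pop // 2] * len(conn)       # every population starts at 0.5
        for _ in range(generations):
            counts = step(counts, conn, n_pop, n_larv, rng)
        freqs.append([c / n_pop for c in counts])
    return freqs
```

The two sampling steps in `step` correspond to the two episodes of drift described above: drawing larval alleles from the parental frequency, and culling the pooled residents and recruits back to Npop.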

We ran the model for sufficient time for genetic structure to become independent of starting conditions, but before many alleles became fixed within populations. Longer time periods did not strongly affect patterns of spatial differentiation, but increased variability among model runs as more populations became fixed for single alleles (Fig. 2). We measured overall and pairwise FST as the standardized variance in allele frequency (p): FST = var(p) / [mean(p) × (1 − mean(p))].

We constructed a matrix of predicted pairwise FST values from the mean of multiple model runs. We found that pairwise FST values varied greatly between model runs, and stabilized with ~50 independent model runs. We therefore used the mean FST of 50 model runs to generate robust predictions.
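
The FST definition given above translates directly into code. The sketch below assumes the population variance (dividing by the number of populations rather than by one less), which is a choice the text does not specify, and returns zero when an allele is fixed in every population compared; the function names are illustrative.

```python
from statistics import mean, pvariance


def fst(freqs):
    """FST = var(p) / (mean(p) * (1 - mean(p))) for one biallelic locus."""
    p_bar = mean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0          # allele fixed everywhere: nothing to partition
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def pairwise_fst(freqs_by_locus):
    """Mean pairwise FST across loci; input indexed as [locus][population]."""
    k = len(freqs_by_locus[0])
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            vals = [fst([locus[i], locus[j]]) for locus in freqs_by_locus]
            out[i][j] = out[j][i] = mean(vals)
    return out
```

With this choice of variance, two populations fixed for alternative alleles give FST = 1 and two populations with identical frequencies give FST = 0. Averaging the resulting matrices over replicate runs gives the predicted matrix described above.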

Molecular sampling

We used population genetic data from a previously published study in which individuals were sampled from various sites within the study range (Keever et al. 2009; Fig.1). We focussed on microsatellite marker data from this study because allelic diversity within populations for these seven loci was relatively low and similar (n=3-12 alleles per locus) to allelic diversity in our connectivity model (n=2), in comparison to the much larger number of haplotypes observed within populations for mitochondrial and intron sequence data (Keever et al. 2009). We calculated the mean pairwise FST values from the seven microsatellite loci for the four previously sampled populations used in our dispersal model. In some cases, two sample sites were geographically proximal and found within a single recruitment area as defined in our dispersal model (Fig. 1a).

New molecular samples

We collected new microsatellite data for the same seven loci from a previously unsampled population, in order to test predictions from the dispersal model. Individuals were sampled from Bella Bella (52°10'N, 128°10'W), a location approximately midway between the northern and southern phylogeographic groups. We obtained tissue samples and microsatellite genotypes from 50 individuals using the same methods as in Keever et al. (2009).

Analysis

We tested the correlation between pairwise FST values observed and those predicted from the genetic model using a Mantel test with 9999 permutations, and estimated the relationship using reduced major axis regression (Jensen et al. 2005). We also tested for a correlation between observed FST values and the geographic distance between populations. Geographic distances were defined as the closest straight lines drawn between sample locations without crossing land, estimated coarsely to the nearest 20 km.
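
For readers unfamiliar with the Mantel test, a minimal permutation version is sketched below. It is an illustration only, not the implementation used for the analyses reported here; it assumes symmetric distance matrices stored as nested lists, a one-tailed test on the Pearson correlation of the upper triangles, and hypothetical function names.

```python
import math
import random


def upper(m, order):
    """Upper-triangle entries of m after relabelling populations by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, n_perm=9999, seed=0):
    """Return (r, p): observed correlation and one-tailed permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    x = upper(a, identity)
    r_obs = pearson(x, upper(b, identity))
    hits = 1                        # the observed arrangement counts once
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)          # relabel rows and columns of b together
        if pearson(x, upper(b, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

Rows and columns are permuted together so that the non-independence of pairwise distances is preserved. With only four populations there are just 4! = 24 distinct relabellings, so the smallest attainable p-value is roughly 1/24 regardless of how many permutations are drawn.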

We ran Mantel tests and reduced major axis regressions first without the additional sample from Bella Bella to estimate the relationship predicted from existing data, and then asked if the pairwise FST values of the new samples fell along the predicted relationship. Reduced major axis regression was performed using IBDWS (Jensen et al. 2005), and all other analyses were performed in R (R Development Core Team 2009).
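
As a reminder of what reduced major axis (RMA) regression estimates, the textbook formulas can be written in a few lines. This sketch is not the IBDWS implementation; it simply uses the standard definitions, in which the slope is the ratio of the standard deviations of y and x, signed by their covariance, and the line passes through the two means.

```python
from statistics import mean, stdev


def rma(x, y):
    """Reduced major axis regression; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = stdev(y) / stdev(x)
    if sxy < 0:
        slope = -slope              # slope takes the sign of the covariance
    return slope, my - slope * mx
```

Unlike ordinary least squares, this estimator treats x and y symmetrically, which is why it is often preferred when both variables are measured with error.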

We also used the relationship (slope and intercept) from the reduced major axis regression between observed and predicted FST from previously-sampled sites to convert predicted FST values for newly-sampled sites into the scale of FST expected using microsatellites. From 100 independent model runs, we iteratively drew 7 and calculated mean pairwise FST, simulating sampling of 7 independent loci. We bootstrapped 1000 times to generate a distribution of plausible FST estimates between Bella Bella and all other sampled populations. We compared this distribution directly to the observed FST.
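
The resampling described above can be sketched as follows. The details are assumptions made for illustration: each model run is taken to supply one predicted FST value for a given population pair, the seven runs in each replicate are drawn without replacement, the regression is assumed to have the form observed = intercept + slope × predicted, and the interval is a simple percentile interval. All names are hypothetical.

```python
import random
from statistics import mean


def bootstrap_fst(run_fst, slope, intercept, n_loci=7, n_boot=1000, seed=0):
    """Sorted distribution of rescaled mean FST from n_loci resampled runs."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = rng.sample(run_fst, n_loci)        # mimic 7 independent loci
        draws.append(intercept + slope * mean(sample))
    return sorted(draws)


def interval(sorted_draws, level=0.95):
    """Percentile interval from a sorted list of bootstrap values."""
    lo = int((1.0 - level) / 2.0 * len(sorted_draws))
    hi = len(sorted_draws) - 1 - lo
    return sorted_draws[lo], sorted_draws[hi]
```

An observed pairwise FST can then be compared with the interval returned for the same pair of populations.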

We used the clustering method in STRUCTURE (Pritchard et al. 2000) to heuristically search genotype groupings to determine how samples from Bella Bella (BB) are grouped with other populations. This method allows identification of the likely number of populations in the system (k), and an assignment probability of individuals to each group based on minimizing within-group Hardy-Weinberg and linkage disequilibrium. Analyses with previously-sampled sites indicated two linkage groups. Our specific interest was in differentiating among three hypotheses: clustering of BB individuals with northern samples; clustering of BB individuals with southern samples; or evidence of admixture of southern-like and northern-like genotypes in the BB sample. We used an admixture model without geographical priors, with allele frequencies correlated between populations, and admixture proportions estimated from the dataset and fixed for all populations. We ran Bayesian MCMC searches of 1,000,000 steps with a burn-in of 250,000. For each analysis, we carried out seven independent runs for each value of k up to k = 5 populations. We used the method of Evanno et al. (2005) to find the best-fit value of k.
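
The Δk statistic of Evanno et al. (2005) can be computed from the replicate log-likelihoods reported by STRUCTURE. The sketch below is a simplified illustration, not the procedure used here: it assumes the common formulation in which the absolute second difference of the mean log-likelihood is divided by the standard deviation among replicate runs at that k, and it expects a hypothetical dictionary mapping consecutive integer values of k to lists of log-likelihoods.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Delta-k for each interior k; lnp maps k to replicate log-likelihoods."""
    ks = sorted(lnp)
    out = {}
    for k in ks[1:-1]:              # undefined at the smallest and largest k
        second = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        out[k] = second / stdev(lnp[k])
    return out
```

The best-fit k is taken as the value with the largest Δk. Because the statistic needs both neighbours, runs from k = 1 to k = 5 yield Δk only for k = 2 to 4.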

Results

The dispersal model predicted relatively diffuse movement of particles from their starting locations relative to the strength of the off-shelf coastal currents. Within 70 days, particles from Vancouver Island (including BA, cvi, and WH) tended to move northwards, though not as far as Haida Gwaii (Fig. 1b). Particles from the southern central coast of British Columbia (scc) moved distinctly offshore, towards northern Vancouver Island, and southwards along the west coast of Vancouver Island (Fig. 1c). Populations from Haida Gwaii were very isolated, with only a single particle of 1000 moving from Kitimat (kit) to the Haida Gwaii (HG) recruitment area, and no larvae moving in the opposite direction (Fig. 1c). There was no successful particle movement between the northern (BC and Alaska) and the southern (California) regions of the P. miniata species range. Although particles from the most northern known location within California moved almost due north in the Davidson coastal counter-current, they did not travel far enough to reach any of the northern sites (Fig. 1d). These patterns (excluding California) are reflected in the probabilities of the connectivity matrix (Table 1).

Multiple runs of the genetic model showed that pairwise FST became independent of starting conditions within several hundred generations, and the mean remained stable although variability increased with time (Fig. 2). We used the modeled FST after 500 generations to characterize predicted population connectivity.

The observed (microsatellite) and predicted (model) population pairwise FST values were significantly correlated in a Mantel test (r = 0.89, p = 0.04; Fig. 3a). Notably, these same pairwise FST values also showed a tight correlation with physical distance between sites, indicating isolation-by-distance (r = 0.99, p = 0.08; Fig. 3b).

Using these relationships to make predictions of pairwise FST with the newly-sampled Bella Bella population revealed the relative predictive strength of the dispersal model. The dispersal model predicted that Bella Bella would be genetically more similar to Winter Harbour (WH) on Vancouver Island than to Haida Gwaii (HG) to the north, despite a similar physical distance between them (Fig. 3c, Table 2). This was owing to the high connectivity of the southern central coast region (scc) with both Winter Harbour (WH) and Bella Bella (BB; Fig. 1c). This prediction was borne out in the observed microsatellite frequency distributions, which revealed pairwise FST between BB and HG that was greater than that expected based on distance (Fig. 3d) and more similar to that predicted based on the dispersal model (Fig. 3c). While the dispersal model overestimated connectivity with Alaska and underestimated connectivity with Bamfield, on average the predictions fell along the slope of the predicted relationship (Fig. 3c; observed FST vs. model predictions, Mantel test, r = 0.5778245, p = 0.0408). The predictions of FST with BB were within the margin of error expected based on bootstrapped analysis simulating 7 independent loci (Fig. 4.4). By contrast, the observed pairwise FST between Bella Bella and more northern populations was greater than predicted under isolation by distance (Fig. 3d; FST vs. distance, Mantel test, r = 0.6748198, p = 0.0512).

Population Clustering

Based on the circulation model and connectivity matrix, we predicted that individuals in Bella Bella would represent immigrants from the more northern coastal region around Kitimat, but no further north, as well as immigrants from the southern central coast, which would receive immigrants from Vancouver Island and further south. We therefore predicted that individuals in this region would be included in the southern linkage group.

Results from the STRUCTURE analysis indicated two clusters of microsatellite genotypes (Fig. 4.4), as before (Keever et al. 2009). Most BB individuals were assigned to the linkage group that occurs predominantly on Vancouver Island (Fig. 4). However, some BB individuals had high assignment probabilities to the other linkage group that is found in northern populations (Fig. 3.4).

Discussion

Genetic structure was relatively well predicted from the oceanographic dispersal model and revealed low dispersal rates among populations along the coast of British Columbia and southeastern Alaska. Our coupled dispersal and genetic model predicted the pattern of isolation-by-distance observed between the previously sampled northern populations, and also predicted that the newly-sampled Bella Bella population would have higher similarity to the southern populations on Vancouver Island than to the northern populations in Haida Gwaii and Alaska. These predictions were met in the genetic data, and validate this model for use as a predictive tool at the scales examined.

The model predicted relatively low rates of larval dispersal through the system and high self-recruitment in most populations. This finding is counter to conventional predictions of dispersal distances in species with long pelagic durations (reviewed in Cowen et al. 2000), but is consistent with the genetic pattern observed, showing isolation-by-distance. Our dispersal model validates this interpretation of stepping-stone dispersal through the study region, suggesting that larvae migrate primarily to adjacent populations. The inclusion of Bella Bella showed genetic structure that was more consistent with predictions from the dispersal model than with expectations based on distance alone, owing to asymmetrical oceanographic connectivity between Bella Bella and adjacent northern and southern populations. This helps to validate the specific, smaller-scale dispersal patterns predicted by the physically-based dispersal model.

The dispersal model did not provide perfect predictions of pairwise FST between Bella Bella and other populations. These discrepancies may be the result of simplifying assumptions made in the dispersal model (see below), or of the (unmodelled) influence of historical processes on population genetic structure. Previous work indicates that populations north and south of Queen Charlotte Sound descend from two ancient lineages (McGovern et al. 2010). The estimated time to coalescence of 282 k years indicates that these two populations were historically vicariant through several cycles of the Pleistocene glaciation (McGovern et al. 2010). These findings are complementary to our model. The isolation-with-migration analysis of McGovern et al. (2010) estimated a low level of per-generation gene flow between the populations, which is consistent with our findings. Our model provides an explanation of how these two ancient lineages have come to persist despite the long pelagic duration of this species. The high similarity between Bella Bella and the southern region of Vancouver Island, and the low similarity between Bella Bella and Alaska, suggest that the effect of this historical vicariance may not yet have been erased through drift-migration equilibrium, as simulated in our genetic model. This result highlights that the coupled oceanographic and genetic models can capture contemporary gene flow, but not the multiple historic processes that generate genetic divergence among alleles.

While the circulation model had a reasonable ability to predict the genetic structure found within British Columbia and southeastern Alaska, it was not able to resolve the extremely high genetic similarity between southern British Columbia and California. Particles seeded in Vancouver Island did not travel far south, and although particles seeded in the most northern population of California travelled distinctly northwards, presumably in the Davidson Current, they did not move far enough to connect with the northern region of the P. miniata range. This result suggests three alternative mechanisms. First, connectivity may be greater in some extreme years, such as during El Niño Southern Oscillation (ENSO) events, which were not captured in the dispersal model. The northward Davidson Current is known to strengthen during ENSO years, and has been invoked to explain sporadic and temporary occurrences of California-affiliated species on Vancouver Island (Schoener & Fluharty 1985; Behrens Yamada & Hunt 2000). The regularity of ENSO events lends support to this hypothesis as a plausible mechanism for the high similarity in diverse allele distributions between California and Vancouver Island (Keever et al. 2009). Second, this genetic pattern may be maintained by intermediate populations occurring along the Oregon or Washington coasts, which have either gone undetected, or have recently gone extinct. Third, historical re-colonization from the south after the last glacial maximum may explain their similarities, together with large effective population sizes and hence a slow rate of genetic drift.

Within British Columbia and southeastern Alaska, the model predictions were significantly but not strongly correlated with observed FST. Pairwise FST between Bella Bella and Bamfield was lower than expected, and between Bella Bella and Alaska was greater than expected. This suggests that there may be historical processes, such as shared recent ancestry between Bella Bella and Bamfield, also influencing FST. In addition, there are several simplifying assumptions inherent in the ocean circulation model that may contribute to the differences we found between model predictions and observed genetic structure. First, the circulation model uses mean seasonal wind and buoyancy currents, and thus does not allow fluctuations in velocities due to storm or ENSO events. Changes in ocean topography and wind velocities in such extreme periods may be especially important to population connectivity, and in turn influence the resulting genetic structure. Second, the particle tracking model did not include a stochastic element to allow diffusion of a particle outside of its deterministic track. This limited the number of particles we were able to seed in each location, and potentially limited the number of rare vagrants connecting certain pairs of populations. More particles and some stochasticity may have improved our estimates of connectivity. Third, we did not incorporate larval behaviour such as vertical migration in the water column, or directed movement. Vertical swimming over small distances, especially if timed with tidal changes, can have major effects on larval advection by bringing larvae into different depth-varying velocity fields (Tankersley et al. 1995; Robinson et al. 2005). Nevertheless, the high degree of congruence between predictions generated from this model and observed genetic structure suggests that it provides reasonably robust predictions at the spatial scales we examined, and warrants future use in predictive genetic studies in other taxa.

One important application of our findings is how they inform the varying patterns of genetic structure recorded among marine fish and invertebrates in British Columbia and southeastern Alaska. Because we modeled gene flow of a species with a long pelagic duration (70 days) and found substantial genetic structure, this informs the level of genetic structure that can be attributed to passive dispersal through currents in this region. For example, localized natural selection and active larval retention have been suggested to explain the high genetic structure found in the rockfish, Sebastes helvomaculatus, with a similar genetic disjunction between Vancouver Island and Haida Gwaii as found in P. miniata (Rocha-Olivares & Vetter 1999). However, our results suggest that this structure could potentially be maintained by passive dispersal alone (Hilbish 1996). By contrast, additional mechanisms are required to explain the lack of genetic structure in taxa with similar dispersal potentials, e.g. the sea star Pisaster ochraceus (Harley et al. 2006), dispersing snails (Kyle & Boulding 2000; Marko 2004), chitons with dispersive larvae (Kelly & Eernisse 2007), intertidal copepods (Edmands 2001), and other dispersing species (Kelly & Palumbi 2010). The relatively low connectivity generated by our oceanographic model suggests that the lack of genetic structure in other taxa is more likely attributable to historical processes such as relatively recent range expansions, or large effective population sizes that limit the rate of lineage sorting through genetic drift. Coalescent methods that simultaneously estimate gene flow, the timing of isolation, and effective population sizes could be used to independently test this hypothesis (Marko & Hart 2011).

Our results have specific implications for marine spatial planning along the coastline of western Canada. Guidelines for the placement of marine protected areas (MPAs) ubiquitously include consideration of self-recruitment into MPAs, connectivity among MPAs, and dispersal from MPAs to adjacent unprotected areas (Halpern 2003; UNEP-CBD 2008; Gaines et al. 2010; Hamilton et al. 2010; Jessen et al. 2011). Our findings demonstrate that dispersal of summer-spawning species with larvae that develop at the surface layer (as modelled here) occurs over relatively small spatial scales (but see Robinson et al. 2005, for greater dispersal of winter propagules at 30 m depth). Hence surface propagule dispersal to areas adjacent to an MPA may be high, but the effects of MPAs are primarily expected to be local. However, some regions in our model had lower local connectivity than others. Notably, Haida Gwaii was relatively isolated from the mainland, and propagules from the southern region of Haida Gwaii did not self-recruit and moved either north within Haida Gwaii, or to the west and offshore. In addition, populations north of Kitimat (pru and AK) were also relatively isolated. Given the relatively low distance of stepping stone dispersal suggested by our simulations, effective connectivity between protected areas would most likely require a network of geographically proximal MPAs along the coastline. Future work could also help to identify nodes along the coast that are most critical for maintaining a network of connectivity (Watson et al. 2011), such as the southern central coast region, which appears to connect the central coast with western Vancouver Island.

References

Arndt A, Smith MJ (1998) Genetic diversity and population structure in two species of sea cucumber: differing patterns according to mode of development. Molecular Ecology 7, 1053-1064.

Avise JC, Arnold J, Ball RM, Bermingham E, Lamb T, Neigel JE, Reeb CA, Saunders NC (1987) Intraspecific phylogeography - the mitochondrial-DNA bridge between population-genetics and systematics. Annual Review of Ecology and Systematics 18, 489-522.

Basch LV (1996) Effects of algal and larval densities on development and survival of asteroid larvae. Marine Biology 126, 693-701.

Baums IB, Paris CB, Chérubin LM (2006) A bio-oceanographic filter to larval dispersal in a reef-building coral. Limnology and Oceanography 51, 1969-1981.

Behrens Yamada S, Hunt C (2000) The arrival and spread of the European green crab, Carcinus maenas, in the Pacific Northwest. Dreissena 11, 1-7.

Blanton BO (1995) DROG3D: User’s manual for 3-dimensional drogue tracking on a finite element grid with linear finite elements. Unpublished manuscript, Ocean Processes Numerical Methods Laboratory, University of North Carolina at Chapel Hill.

Briggs JC (1974) Marine zoogeography. New York: McGraw-Hill.

Crawford BJ, Jackson D (2002) Effect of microgravity on the swimming behaviour of larvae of the starfish Pisaster ochraceus. Canadian Journal of Zoology-Revue Canadienne De Zoologie 80, 2218-2225.

Dodimead AJ, Favorite F, Hirano T (1963) Salmon of the North Pacific Ocean: Part II. Review of oceanography of the subarctic Pacific region: Int. N. Pacific Fisheries Commission, Vancouver B.C. 195 pp.

Dupont L, Ellien C, Viard F (2007) Limits to gene flow in the slipper limpet Crepidula fornicata as revealed by microsatellite data and a larval dispersal model. Marine Ecology-Progress Series 349, 125-138.

Edmands S (2001) Phylogeography of the intertidal copepod Tigriopus californicus reveals substantially reduced population differentiation at northern latitudes. Molecular Ecology 10, 1743-1750.

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14, 2611-2620.

Foreman MGG, Crawford WR, Cherniawsky JY, Galbraith J (2008) Dynamic ocean topography for the northeast Pacific and its continental margins. Geophysical Research Letters 35.

Foster NL, Paris CB, Kool JT, Baums IB, Stevens JR, Sanchez JA, Bastidas C, Agudelo C, Bush P, Day O, Ferrari R, Gonzalez P, Gore S, Guppy R, McCartney MA, McCoy C, Mendes J, Srinivasan A, Steiner S, Vermeij MJA, Weil E, Mumby PJ (2012) Connectivity of Caribbean coral populations: complementary insights from empirical and modelled gene flow. Molecular Ecology 21, 1143-1157.

Gaines SD, White C, Carr MH, Palumbi SR (2010) Designing marine reserve networks for both conservation and fisheries management. Proceedings of the National Academy of Sciences of the United States of America 107, 18286-18293.

Galindo HM, Olson DB, Palumbi SR (2006) Seascape genetics: A coupled oceanographic-genetic model predicts population structure of Caribbean corals. Current Biology 16, 1622-1626.

Galindo HM, Pfeiffer-Herbert AS, McManus MA, Chao Y, Chai F, Palumbi SR (2010) Seascape genetics along a steep cline: using genetic patterns to test predictions of marine larval dispersal. Molecular Ecology 19, 3692-3707.

Gilg MR, Hilbish TJ (2003) The geography of marine larval dispersal: Coupling genetics with fine-scale physical oceanography. Ecology 84, 2989-2998.

Halpern BS (2003) The impact of marine reserves: Do reserves work and does reserve size matter? Ecological Applications 13, S117-S137.

Hamilton SL, Caselle JE, Malone DP, Carr MH (2010) Incorporating biogeography into evaluations of the Channel Islands marine reserve network. Proceedings of the National Academy of Sciences of the United States of America 107, 18272-18277.

Harley CDG, Pankey MS, Wares JP, Grosberg RK, Wonham MJ (2006) Color polymorphism and genetic structure in the sea star Pisaster ochraceus. Biological Bulletin 211, 248-262.

Hellberg ME (1996) Dependence of gene flow on geographic distance in two solitary corals with different larval dispersal capabilities. Evolution 50, 1167-1175.

Hey J, Nielsen R (2004) Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167, 747-760.

Hickerson MJ, Cunningham CW (2005) Contrasting quaternary histories in an ecologically divergent sister pair of low-dispersing intertidal fish (Xiphister) revealed by multilocus DNA analysis. Evolution 59, 344-360.

Hilbish TJ (1996) Population genetics of marine species: The interaction of natural selection and historically differentiated populations. Journal of Experimental Marine Biology and Ecology 200, 67-83.

Irwin DE (2002) Phylogeographic breaks without geographic barriers to gene flow. Evolution 56, 2383-2394.

Jensen JL, Bohonak AJ, Kelley ST (2005) Isolation by distance, web service. BMC Genetics 6.

Jessen S, Chan K, Côté I, Dearden P, Santo ED, Fortin MJ, Guichard F, Haider W, Jamieson G, Kramer DL, McCrea-Strub A, Mulrennan M, Montevecchi WA, Roff J, Salomon A, Gardner J, Honka L, Menafra R, Woodley A (2011) Science-based Guidelines for MPAs and MPA Networks in Canada, pp. 58. Vancouver: Canadian Parks and Wilderness Society.

Keever CC, Sunday J, Puritz JB, Addison JA, Toonen RJ, Grosberg RK, Hart MW (2009) Discordant distribution of populations and genetic variation in a sea star with high dispersal potential. Evolution 63, 3214-3227.

Kelly RP, Eernisse DJ (2007) Southern hospitality: A latitudinal gradient in gene flow in the marine environment. Evolution 61, 700-707.

Kelly RP, Palumbi SR (2010) Genetic Structure Among 50 Species of the Northeastern Pacific Rocky Intertidal Community. PLoS ONE 5.

Kenchington EL, Patwary MU, Zouros E, Bird CJ (2006) Genetic differentiation in relation to marine landscape in a broadcast-spawning bivalve mollusc (Placopecten magellanicus). Molecular Ecology 15, 1781-1796.

Kyle CJ, Boulding EG (2000) Comparative population genetic structure of marine gastropods (Littorina spp.) with and without pelagic larval dispersal. Marine Biology 137, 835-845.

Marko PB (2004) 'What's larvae got to do with it?' Disparate patterns of post-glacial population structure in two benthic marine gastropods with identical dispersal potential. Molecular Ecology 13, 597-611.

Marko PB, Hart MW (2011) The complex analytical landscape of gene flow inference. Trends in Ecology & Evolution 26, 448-456.

Martin D, Claret M, Pinedo S, Sarda R (1997) Vertical and spatial distribution of the near-shore littoral meroplankton off the Bay of Blanes (NW Mediterranean Sea). Journal of Plankton Research 19, 2079-2089.

McGovern TM, Keever CC, Saski CA, Hart MW, Marko PB (2010) Divergence genetics analysis reveals historical population genetic processes leading to contrasting phylogeographic patterns in co-distributed species. Molecular Ecology 19, 5043-5060.

Nielsen R, Wakeley J (2001) Distinguishing migration from isolation: A Markov chain Monte Carlo approach. Genetics 158, 885-896.

Pennington JT, Emlet RB (1986) Ontogenetic and diel vertical migration of a planktonic echinoid larva, Dendraster excentricus (Eschscholtz) - occurrence, causes, and probable consequences. Journal of Experimental Marine Biology and Ecology 104, 69-95.

Pielou EC (1991) After the ice age: The return of life to glaciated North America. Chicago: Univ. Chicago Press.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.

R Development Core Team (2009) R: A Language and Environment for Statistical Computing. Vienna, Austria.

Robinson CLK, Morrison J, Foreman MGG (2005) Oceanographic connectivity among marine protected areas on the north coast of British Columbia, Canada. Canadian Journal of Fisheries and Aquatic Sciences 62, 1350-1362.

Rocha-Olivares A, Vetter RD (1999) Effects of oceanographic circulation on the gene flow, genetic structure, and phylogeography of the rosethorn rockfish (Sebastes helvomaculatus). Canadian Journal of Fisheries and Aquatic Sciences 56, 803-813.

Rumrill SS (1989) Population size structure, juvenile growth, and breeding periodicity of the sea star Asterina miniata in Barkley Sound, British Columbia. Marine Ecology-Progress Series 56, 37-47.

Schoener A, Fluharty DL (1985) Biological anomalies off Washington in 1982–83 and other major Niño periods. In El Niño effects in the eastern subarctic Pacific Ocean (W.S. Wooster and D.L. Fluharty, eds), pp. 211-225. Seattle: Wash. Sea Grant Program, University of Washington.

Schunter C, Carreras-Carbonell J, MacPherson E, Tintore J, Vidal-Vijande E, Pascual A, Guidetti P, Pascual M (2011) Matching genetics with oceanography: directional gene flow in a Mediterranean fish species. Molecular Ecology 20, 5167-5181.

Siegel D, Kinlan B, Gaylord B, Gaines S (2003) Lagrangian descriptions of marine larval dispersion. Marine Ecology Progress Series 260, 83–96.

Strathmann MF (1987) Reproduction and development of marine invertebrates of the northern Pacific coast. Seattle: Univ. Washington Press.

Tankersley RA, McKelvey LM, Forward RB (1995) Responses of estuarine crab megalopae to pressure, salinity and light - implications for flood tide transport. Marine Biology 122, 391-400.

UNEP-CBD (2008) Decision adopted by the conference of the parties to the convention of biological diversity at its ninth meeting: (United Nations Environment Programme -Convention on Biological Diversity) UNEP/CBD/COP/DEC/IX/20. www.cbd.int/doc/decisions/cop-connect09/cop-09-dec-20-en.pdf.

Watson JR, Siegel DA, Kendall BE, Mitarai S, Rassweiller A, Gaines SD (2011) Identifying critical regions in small-world marine metapopulations. Proceedings of the National Academy of Sciences of the United States of America 108, E907-E913.

White C, Selkoe KA, Watson J, Siegel DA, Zacherl DC, Toonen RJ (2010) Ocean currents help explain population genetic structure. Proceedings of the Royal Society B-Biological Sciences 277, 1685-1694.

Whitlock MC, McCauley DE (1999) Indirect measures of gene flow and migration: F-ST not equal 1/(4Nm+1). Heredity 82, 117-125.

Wilson AB (2006) Genetic signature of recent glaciation on populations of a near-shore marine fish species (Syngnathus leptorhynchus). Molecular Ecology 15, 1857-1871.

Wright S (1969) Evolution and the Genetics of Populations. Variability Within and Between Populations: University of Chicago Press.

Tables and Figures

Table 3.1. Connectivity matrix of the proportion of particles moving from each dispersal area to each recruitment area.

From:                            To:
                                 BA       cvi      WH       scc      BB       shg      HG       kit      pru      AK
Bamfield                   BA    0.755    0.155    0        0        0        0        0        0        0        0
central Vancouver Island   cvi   0        0.4833   0.145    0        0        0        0        0        0        0
Winter Harbour             WH    0        0.0917   0.574    0.122    0        0        0        0        0        0
southern Central Coast     scc   0        0.1917   0.301    0.156    0        0        0        0        0        0
Bella Bella                BB    0        0        0        0.018    0.449    0        0        0.106    0        0
southern Haida Gwaii       shg   0        0        0        0        0        0        0.534    0.00097  0        0
Haida Gwaii                HG    0        0        0        0        0        0.855    0.0724   0        0        0
Kitimat                    kit   0        0        0        0        0.0259   0        0        0.427    0        0
Prince Rupert              pru   0        0        0        0        0        0        0        0        0.6052   0
southeast Alaska           AK    0        0        0        0        0        0        0        0        0.00099  0.037

Acronyms indicate locations described in the left-hand column and shown in Fig. 3.1.
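One way to read Table 3.1 is sketched below in Python. The matrix is stored as printed, with rows as dispersal (source) areas and columns as recruitment areas; the diagonal then gives self-recruitment and the off-diagonal entries give export to other areas. The names `AREAS`, `CONNECTIVITY`, `self_recruitment` and `exports` are illustrative assumptions and are not taken from the original analysis.

```python
# Connectivity matrix from Table 3.1.
# Rows: dispersal (source) areas. Columns: recruitment areas, same order.
AREAS = ["BA", "cvi", "WH", "scc", "BB", "shg", "HG", "kit", "pru", "AK"]

CONNECTIVITY = [
    [0.755, 0.155, 0, 0, 0, 0, 0, 0, 0, 0],        # BA
    [0, 0.4833, 0.145, 0, 0, 0, 0, 0, 0, 0],       # cvi
    [0, 0.0917, 0.574, 0.122, 0, 0, 0, 0, 0, 0],   # WH
    [0, 0.1917, 0.301, 0.156, 0, 0, 0, 0, 0, 0],   # scc
    [0, 0, 0, 0.018, 0.449, 0, 0, 0.106, 0, 0],    # BB
    [0, 0, 0, 0, 0, 0, 0.534, 0.00097, 0, 0],      # shg
    [0, 0, 0, 0, 0, 0.855, 0.0724, 0, 0, 0],       # HG
    [0, 0, 0, 0, 0.0259, 0, 0, 0.427, 0, 0],       # kit
    [0, 0, 0, 0, 0, 0, 0, 0, 0.6052, 0],           # pru
    [0, 0, 0, 0, 0, 0, 0, 0, 0.00099, 0.037],      # AK
]


def self_recruitment(matrix, areas):
    """Proportion of particles from each area that recruit back to it."""
    return {area: matrix[i][i] for i, area in enumerate(areas)}


def exports(matrix, areas, source):
    """Non-zero proportions of particles leaving `source` for other areas."""
    i = areas.index(source)
    return {
        areas[j]: p
        for j, p in enumerate(matrix[i])
        if j != i and p > 0
    }


if __name__ == "__main__":
    for area, p in self_recruitment(CONNECTIVITY, AREAS).items():
        print(f"{area:>4}  self-recruitment = {p}")
    print("shg exports to:", exports(CONNECTIVITY, AREAS, "shg"))
```

Rows need not sum to one, because particles that end the simulation outside every recruitment area are not counted in any column.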

Table 3.2. FST and distance matrices.

In (a), values above the diagonal show observed genetic differentiation (FST) among populations, and values below the diagonal show predicted FST from model outcomes. In (b), values below the diagonal represent geographic distances between each pair of populations. Population abbreviations are as defined in Table 3.1.

a) Observed and predicted FST

      BA       WH       BB       HG       AK
BA    0        0.0244   0.0578   0.0784   0.1053
WH    0.3107   0        0.0468   0.0353   0.0655
BB    0.7649   0.2618   0        0.0673   0.1063
HG    0.6086   0.3406   0.5841   0        0.0372
AK    0.7872   0.4771   0.6171   0.5227   0

b) Geographic distance (km)

      BA       WH       BB       HG
BA
WH    290
BB    510      220
HG    670      380      260
AK    1040     750      450      300
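The comparison of predicted and observed FST in Table 3.2a can be sketched as follows. This is a minimal Python illustration, not the original analysis: it pairs the above-diagonal (observed) and below-diagonal (predicted) values, then computes a Pearson correlation and a reduced major axis slope of observed on predicted FST. The function names are assumptions, and no significance test is attempted, because pairwise comparisons share populations and so are not independent (a permutation approach such as a Mantel test would be needed).

```python
import math

POPS = ["BA", "WH", "BB", "HG", "AK"]

# Table 3.2a as printed: observed FST above the diagonal,
# model-predicted FST below the diagonal.
FST = [
    [0,      0.0244, 0.0578, 0.0784, 0.1053],
    [0.3107, 0,      0.0468, 0.0353, 0.0655],
    [0.7649, 0.2618, 0,      0.0673, 0.1063],
    [0.6086, 0.3406, 0.5841, 0,      0.0372],
    [0.7872, 0.4771, 0.6171, 0.5227, 0],
]


def paired_values(matrix):
    """Return (observed, predicted) lists with one entry per population pair."""
    observed, predicted = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            observed.append(matrix[i][j])   # above the diagonal
            predicted.append(matrix[j][i])  # below the diagonal
    return observed, predicted


def _sums_of_squares(x, y):
    """Centred sums of squares and cross-products for two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def pearson_r(x, y):
    """Pearson product-moment correlation between x and y."""
    sxx, syy, sxy = _sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)


def rma_slope(x, y):
    """Reduced major axis slope of y on x: sign(r) * sd(y) / sd(x)."""
    sxx, syy, sxy = _sums_of_squares(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)


if __name__ == "__main__":
    observed, predicted = paired_values(FST)
    print("pairs:", len(observed))
    print("Pearson r:", round(pearson_r(predicted, observed), 3))
    print("RMA slope:", round(rma_slope(predicted, observed), 3))
```

Reduced major axis regression is used here because it matches the regression lines described for Fig. 3.3; it treats both variables as measured with error, unlike ordinary least squares.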

Fig. 3.1. Dispersal of simulated larvae through an oceanographic circulation model on the west coast of British Columbia and south eastern Alaska.

(a) Initial position of larvae (points) and larval recruitment areas (dark grey polygons) for each region. (b-d) Overlaid dispersal tracks of 1000 particles after 70 days. (c) Close-up within Queen Charlotte Sound and Hecate Strait. (d) Dispersal of particles from Fort Bragg, California.

[Fig. 3.2 plot: y-axis, pairwise FST; x-axis, generations (0 to 1000); labelled series, BA to AK and BA to WH.]

Fig. 3.2. Pairwise FST between two example population pairs with time.

Comparisons between Bamfield and Alaska (blue), and Bamfield and Winter Harbour (grey) are shown from 50 independent model runs with the same starting conditions. Thick lines show mean FST values across model runs.

[Fig. 3.3 plot: four panels. (a, c) Observed FST against predicted FST; (b, d) observed FST against distance (km). Point labels recovered from the panels: BAtoWH, BAtoHG, BAtoAK, WHtoHG, WHtoAK and HGtoAK in (a, b); BBtoBA, BBtoWH, BBtoHG and BBtoAK in (c, d).]

Fig. 3.3. Relationships between model-predicted FST, observed FST, and geographic distance between sites.

(a,b) Previously-sampled populations, (c,d) all populations. Text represents population comparisons for each point. Grey solid lines represent reduced major axis regression lines. Black dashed lines in (c) and (d) represent reduced major axis regression lines from previously sampled data only for comparison. Circled points in (d) highlight sites in which observed FST was greater than expected given distance from Bella Bella (BB). Population abbreviations in Table 3.1.

[Fig. 3.4 plot: y-axis, FST (-0.05 to 0.25); x-axis categories, BBtoBA, BBtoWH, BBtoHG and BBtoAK.]

Fig. 3.4. Predicted and observed FST for comparisons to the newly-sampled Bella Bella population.

Black density plots show the density distribution of predicted FST using the dispersal model and the relationship between observed and predicted FST. White points show observed FST.

Fig. 3.5. Results from STRUCTURE analysis showing clustering data for Central Coast in combination with genetic data from previously sampled populations.

(a) Posterior probabilities of genetic cluster membership for every individual, for 2 clusters (k=2). (b) Estimated log-likelihood values for k=1 to k=5 populations across seven independent runs. (c) Delta K estimates indicating largest Δk at k=2.
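The Delta K statistic of Evanno et al. (2005) reported in panel (c) can be sketched in Python as below. This is an illustration under stated assumptions: the log-likelihood values are invented toy numbers, not the values from this analysis, and Delta K is computed from the run means as |L(K+1) - 2L(K) + L(K-1)| divided by the standard deviation of L(K) across runs, which is one common reading of the statistic. The function name `delta_k` is hypothetical.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Delta K for each interior K.

    log_likelihoods maps K to a list of log-likelihoods, one per
    independent run. Delta K is undefined for the smallest and largest K
    because it needs both neighbours.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / stdev(log_likelihoods[k])
    return result


if __name__ == "__main__":
    # Hypothetical toy values for illustration only (three runs per K).
    toy = {
        1: [-5000, -5002, -4998],
        2: [-4700, -4701, -4699],
        3: [-4690, -4680, -4700],
        4: [-4685, -4670, -4700],
    }
    estimates = delta_k(toy)
    best = max(estimates, key=estimates.get)
    print(estimates, "largest Delta K at K =", best)
```

In the toy data the likelihood rises sharply from K=1 to K=2 and then plateaus, which is the pattern that produces a Delta K peak at K=2.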

Chapter 4.

Sea star populations diverge by positive selection at a sperm-egg compatibility locus

Authors: J.M. Sunday and M.W. Hart

Abstract

Fertilization proteins of marine broadcast spawning species often show signals of positive selection. Among geographically isolated populations, positive selection within populations can lead to differences between them, and may result in reproductive isolation upon secondary contact. Here we test for positive selection in the reproductive compatibility locus, bindin, in two populations of a sea star on either side of a phylogeographic break. We find evidence for positive selection at codon sites in both populations, which are under neutral or purifying selection in the reciprocal population. The signal of positive selection is stronger and more robust in the population where effective population size is larger and bindin diversity is greater. In addition, we find high variation in coding sequence length caused by large indels at two repetitive domains within the gene, with greater length diversity in the larger population. These findings provide evidence of population-divergent positive selection in a fertilization compatibility locus, and suggest that sexual selection can lead to reproductive divergence between conspecific marine populations.

Introduction

One of the most remarkable patterns to emerge from molecular genetic investigations of natural populations is the signature of diversifying selection in proteins involved in fertilization (Swanson and Vacquier 2002; Turner and Hoekstra 2008).

Several non-mutually exclusive mechanisms have been proposed to explain these patterns, including sexually antagonistic coevolution (Vacquier et al. 1997; Gavrilets 2000; Galindo et al. 2003; Haygood 2004), arms races with infecting pathogens (Vacquier et al. 1997), and reproductive character displacement among hybridizing species (Metz et al. 1998; Swanson and Vacquier 2002; Palumbi 2009). Regardless of the precise mechanism(s), if the selective agents that lead to such diversifying selection act in a population-specific manner, then geographically isolated populations may undergo differential, or divergent, selection in the very proteins that confer reproductive compatibility (Gavrilets 2000). This may thus be an important mechanism for reproductive isolation (Vacquier et al. 1997).

In broadcast spawning marine invertebrates, mate choice is primarily mediated by gamete surface proteins, providing a relatively simple system in which it is possible to understand the evolution of reproductive isolation. The identification of these proteins in the genome of broadcast spawning species allows a mechanistic understanding at the level of single genes and nucleotide substitutions. Considerable evidence suggests rapid evolution of these proteins in response to selection in multiple species. Signals of positive selection have been detected among sea urchins (Metz and Palumbi 1996; Biermann 1998; McCartney and Lessios 2004), abalone (Lee and Vacquier 1992; Galindo et al. 2003), turban snails (Hellberg and Vacquier 1999; Hellberg et al. 2012), mussels (Riginos and McDonald 2003; Springer and Crespi 2007), and oysters (Moy et al. 2008). Among sea urchin species, variation in pairwise reproductive compatibility was better explained by pairwise divergence in the sperm protein, bindin, than by neutral genetic markers, indicating that evolution in this protein has been particularly important in driving reproductive isolation between species pairs.

The relatively simple mating systems of broadcast spawners are particularly good systems in which to analyze the origins of reproductive isolation at its earliest stage among geographically separated populations of conspecifics. However, in comparison to the abundant evidence for among-species differences driven by selection, there is much less evidence implicating positive selection as the cause of population differences in reproductive compatibility loci. Differential presence or absence of alleles from clades found to be under positive selection was found among populations of Echinometra oblonga (Geyer and Palumbi 2003), and Mytilus galloprovincialis (Riginos et al. 2006; Springer and Crespi 2007), suggesting positive selection has occurred in some, but not all, populations. More traditional approaches using FST statistics have shown significant population differentiation of reproductive compatibility loci in Echinometra oblonga (Geyer and Palumbi 2003), Mytilus galloprovincialis (Riginos et al. 2006), and Heliocidaris bajulus (Hart et al. 2012), but not in Strongylocentrotus franciscanus (Debenham et al. 2000), Echinometra lucunter (Geyer and Lessios 2009), or Heliocidaris erythrogramma (Binks et al. 2012). Other evidence for population-level divergence in reproductive compatibility can be found from experimental work. Fertilization rates were higher between mates from the same local population than between mates from different, geographically separated populations in the sea urchin Strongylocentrotus droebachiensis (Biermann and Marks 2000; Styan et al. 2008; Zhang et al. 2010), suggesting that reproductive isolation can evolve between populations in allopatry.

Here we investigate population-level variation in a gamete compatibility locus in a broadcast-spawning sea star, and test for divergent positive selection across populations. Our approach differs from previous work in that we directly compare lineages of alleles found within each population, and test for site-specific rates of positive selection that differ between populations. We studied the bat star, Patiria miniata, which is abundant and ecologically important in shallow marine habitats of the Pacific Ocean from southeastern Alaska to southern California (Rumrill 1989). This species is found at high local densities (Rumrill 1989), and mates by external fertilization. We focused on two populations on either side (north and south) of an established phylogeographic break in this species, characterized previously using three classes of selectively neutral marker loci (Keever et al. 2009) and located in the region of Queen Charlotte Sound, British Columbia (at ca. 51°N; Fig. 4.1). The timing of this population split was dated to ca. 200,000 years b.p. using coalescent analysis, with low subsequent rates of gene flow between the populations (McGovern et al. 2010), and low potential for contemporary dispersal of planktonic larvae across Queen Charlotte Sound (Sunday et al., in prep). We analyzed population variation in the gene encoding the sperm protein, bindin (Patiño et al. 2009). Bindin is expressed in the acrosomal vesicle of the sperm head, and interacts with a glycoprotein receptor on the egg surface (Kamei and Glabe 2003). This interaction is strongly species-specific among several sea urchins (reviewed in Vacquier et al. 1995), and bindin evolves under positive selection in many sea urchin taxa
(reviewed in Swanson and Vacquier 2002; Palumbi 2009; Lessios ). Sea star bindin includes a short core region similar to the invariant core domain of bindin in sea urchins, but most of the sea star bindin coding sequence is structurally distinctive from sea urchins and consists of several types of long repetitive motifs (Fig. 4.1; see also Patiño et al. 2009).

We report a remarkable level of intraspecific variation in this repetitive structure, which we describe and analyze in detail. We find a significant bindin difference between the two populations, including many single nucleotide polymorphisms that encode amino acid substitutions, as well as a large number of insertion-deletion differences in coding sequence length (and protein size). Most importantly, we show that nucleotide differences are most likely a response to selection that favours amino acid substitutions (positive selection) at different sites among lineages of alleles in the northern and southern populations. We identify those sites, and develop a series of hypotheses pertaining to the mechanism of selection and population divergence.

Methods

Bindin sampling and alignment

We analyzed bindin variation among 44 individual sea stars collected from populations north and south of the phylogeographic break. We sampled 20 northern individuals from Sandspit, Haida Gwaii (53° 14' 28"N, 131° 50' 00"W), and 20 southern individuals from Bamfield, Vancouver Island (48° 49' 43"N, 125° 08' 05"W). Our quantitative analyses focus on those 40 individuals from those two populations, but for a qualitative comparison within each region we also sampled two northern individuals from Dunbar Inlet, Alaska (55° 04' 09"N, 132° 50' 47"W), and two southern individuals from Fort Bragg, California (39° 24' 32"N, 123° 48' 22"W). Genomic DNA was extracted from a single tubefoot of each individual using a 2x CTAB extraction (described in Keever et al. 2009). Most of the coding sequence for the mature bindin protein was amplified using the AEE and VLS primers developed by Patiño et al. (2009). This PCR product was 2739-4008 bp in length. It includes the coding sequence for the amino end of the mature bindin protein, consisting mainly of multiple copies of several distinctive
repetitive amino acid motifs, as well as a non-repetitive amino acid sequence, an intron, and part of the invariant core domain at the carboxy end of the predicted protein sequence.

PCR products were purified on agarose gels, extracted using the Qiagen gel-extraction kit, and cloned using the Invitrogen Topo-TA cloning kit. Both ends of six to ten clones for each individual were sequenced using universal plasmid primers (ca. 700 bp from both ends of each cloned PCR product). From the consensus of those sequences, one or two alleles were identified per individual. Any unique nucleotide differences between an individual clone and the consensus sequence for that allele were scored as sequencing errors (estimated as 0.00128 per nucleotide). Because the sea star bindin coding sequence is considerably longer than sea urchin bindin, it was not possible to obtain the full-length sequence of an allele from the paired-end sequences of a single clone (as is often done in sea urchin bindin studies, e.g., Geyer and Lessios 2009). To complete the genotypes for each individual sea star, one clone for each allele was then fully sequenced using the following three custom internal primers: KLN-f: 5’-CCAGTGGAAGGGAAGCTAAACT-3’; QPA-F: 5’-GGAATCGGAGTCACAACCAGCGG-3’; and GML-R: 5’-CGAAGCATACCGAAACAGC-3’.

The full-length sequences for all alleles were aligned using default settings of the protein alignment algorithm in MAFFT v.6 (Katoh et al. 2002), followed by minimal adjustments by eye in Se-Al v.2.1. That alignment of 88 sequences included 161 singleton polymorphisms in the part of the gene sequenced using internal primers from one clone per allele. Among these, those polymorphisms that were unique to one out of 88 alleles could represent real but rare polymorphisms or sequencing errors. Because the frequency of these singleton polymorphisms (about 0.0013 per site) from the internal sequence data for single clones per allele was similar to the sequencing error rate (0.00128 per nucleotide) estimated from sequences for the ends of multiple clones per allele, we assumed that they were errors and recoded them with the consensus nucleotide for that site.
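The recoding step described above can be sketched in Python as follows. This is a hedged illustration rather than the procedure actually used: it treats a base as a singleton when it occurs in exactly one sequence at an alignment column and replaces it with the most common base in that column. The function names and the toy alignment are assumptions made for the example. The length of the internally sequenced region is not given in this section, so the reported frequency of about 0.0013 per site is not recomputed here.

```python
from collections import Counter


def recode_singletons(alignment, gap="-"):
    """Replace singleton bases with the column consensus.

    alignment is a list of equal-length aligned sequences. A base is a
    singleton if it appears in exactly one sequence at that column.
    Gaps are never recoded, and columns whose consensus is a gap are
    left unchanged. Returns (recoded_alignment, number_of_bases_recoded).
    """
    recoded = [list(seq) for seq in alignment]
    n_recoded = 0
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        consensus = counts.most_common(1)[0][0]
        if consensus == gap:
            continue
        for row, seq in enumerate(alignment):
            base = seq[col]
            if base != gap and base != consensus and counts[base] == 1:
                recoded[row][col] = consensus
                n_recoded += 1
    return ["".join(seq) for seq in recoded], n_recoded


def per_site_rate(n_events, n_sequences, n_sites):
    """Events per sequenced site, pooled across all sequences."""
    return n_events / (n_sequences * n_sites)


if __name__ == "__main__":
    # Toy alignment (hypothetical): the third sequence carries one singleton.
    toy = ["ACGTAC", "ACGTAC", "ACGAAC", "ACGTAC"]
    cleaned, n = recode_singletons(toy)
    print(cleaned, n, per_site_rate(n, len(toy), len(toy[0])))
```

A polymorphism shared by two or more alleles is left untouched, which mirrors the reasoning in the text that only polymorphisms unique to one allele are indistinguishable from sequencing error.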

Gene genealogy

We used the genetic algorithm for recombination detection (GARD) (Kosakovsky Pond et al. 2006) to screen the bindin alignment for evidence of recombination. We found a single recombination site 4 basepairs upstream of the beginning of the intron (Fig. 4.1), and divided our alignment of alleles into two partitions upstream and downstream of this site. Because the second partition downstream of this site comprises mainly the intron and the highly-conserved core domain (with few variable codons), our phylogeny and analysis of positive selection were conducted using only the first partition, which includes the large majority of the coding region of the gene and almost all of the amino acid polymorphisms we observed.

We constructed a genealogy for bindin using a Bayesian phylogenetic analysis with Markov chain Monte Carlo (MCMC) sampling, using MrBayes v.3.1.2. We rooted the gene tree using a single bindin allele from P. pectinifera, the sister species of P. miniata, as the outgroup. We ran the MCMC search for 10,000,000 steps, sampling every 1000 steps after a burn-in of 40,000. Independence from starting conditions and convergence were checked using Tracer v.1.5. We used a model of evolution with two substitution types and four rate categories of gamma-distributed rates across sites, based on previous identification of the HKY85 model of evolution using the model selection procedure implemented on the Datamonkey webserver (Delport et al. 2010). Model priors were uniform.

Repetitive domain analysis

We expected to find two repetitive domains within the coding sequence of bindin based on previous work (Patiño et al. 2009). The first is a collagen-like repeat domain, which is made up of 12 to 16 KGKK(G/R)R motifs (Fig. 4.1). Together with their flanking region, these domains were themselves repeated twice in the sequence described in Patiño et al. (2009). Downstream of each of these collagen-like domains is a series of tandem repeats (Fig. 4.1). We defined repetitive domains from the longest allele in our alignment using Radar (http://www.ebi.ac.uk/Radar), which identified a repeat domain of 30-35 amino acids (corresponding to the B+C repeat types of Patiño et al. 2009). To explore the evolutionary history of these repeats, we aligned each repeat unit within individual alleles, and combined this alignment across all alleles. To investigate phylogenetic relationships among repeat units, and to identify patterns of gene conversion among them, we used this alignment to construct a neighbour-joining tree from the alignment of collagen-like copies (without their flanking regions) and from the alignment of the tandem repeats. In the alignment of tandem repeats, we included all nine copies in the major tandem repeat region downstream of the third collagen-like domain, plus the additional copies identified in the flanking regions of the first and second collagen-like domains (Figs. 1 and 4).

Tests of population structure

We used Arlequin 3.0 to estimate bindin differentiation (ΦST) between the Sandspit and Bamfield populations under a Kimura two-parameter substitution model, and used the randomization test of the hypothesis that ΦST=0. In addition, we compared bindin differentiation to differentiation estimated in two nuclear intron loci previously sampled from different individuals in the same two populations. These were: an intron of the alpha subunit of ATP synthetase (ATPS, Genbank accession: FJ850593-FJ850958, Bamfield n (sequences)=52; Sandspit n=58) and an intron of the glucose-6-phosphate isomerase gene (GPI, Genbank accession: FJ850243-FJ850592, Bamfield n=50; Sandspit n=54) (Keever et al. 2009). We selected these loci among other non-coding loci also sampled from these populations (microsatellite and mitochondrial markers) because they had high levels of polymorphism comparable to that observed in bindin (Keever et al. 2009). To allow comparisons between fixation indices calculated from different loci, we used a standardized fixation index (Φ’ST) obtained by dividing observed ΦST by the maximum possible ΦST for each locus (Hedrick 2005; Meirmans 2006). We used the standardization methods described in Bird et al. (2011) based on (i) the maximum observed genetic distance between any two haplotypes in the alignment, and (ii) the fragment length of each locus. These analyses treated gaps in the alignment as a 5th character state.
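The standardization can be written compactly as follows. The symbol Φ_ST(max) is notation introduced here for the maximum ΦST attainable at a locus; how that maximum is obtained under each of the two methods follows Bird et al. (2011) and is not reproduced in this sketch.

```latex
\Phi'_{ST} = \frac{\Phi_{ST}}{\Phi_{ST(\max)}}, \qquad 0 < \Phi_{ST(\max)} \le 1
```

Because the denominator cannot exceed 1, the standardized index is never smaller than a non-negative observed ΦST, and loci with different levels of polymorphism are placed on a common scale.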

Tests of positive selection

We tested for Darwinian positive selection among lineages found in each population using the branch-sites model in PAML v.4.4 (Yang 2007). This model fits relative rates of non-synonymous to synonymous nucleotide changes (ω=dN/dS) for three different rate classes (ω0, ω1, ω2) representing sites experiencing purifying, neutral, and positive selection, and estimates the proportion of codons assigned to each class. The test compares two sets of branches or allelic lineages in the gene tree, which are assumed to share the same estimated rate of codon evolution at sites experiencing purifying selection (ω0<1.0) or neutral accumulation of substitutions (ω1=1.0), but differ at the third class of codons. Branches or lineages of alleles in the ‘foreground’ set are assumed to experience positive selection (ω2>1.0), while all other branches or lineages in the gene tree (the ‘background’ set) are assumed to experience purifying selection (with the same ω0 rate estimated for the first class of codons) or neutral accumulation of substitutions (ω1=1.0). The hypothesis of positive selection at some sites in the foreground (that is, ω2 significantly greater than 1.0), and significant divergence between foreground and background populations, can be tested by comparing the maximum likelihood score for the selection model (with ω2 estimated from the data) to the likelihood score for a null model (with ω2 fixed to 1.0). We used the likelihood ratio test of significance (as twice the difference in log-likelihood scores for the selection and null models) with the χ² approximation and one degree of freedom.
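The likelihood ratio test described above can be sketched in a few lines. The log-likelihood values below are hypothetical placeholders, not results from this study, and the function name is introduced here only for illustration. For one degree of freedom the χ² tail probability reduces to a complementary error function, so no statistics library is needed.

```python
import math

def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (statistic, p_value) for a likelihood ratio test with 1 df.

    statistic = 2 * (lnL_alt - lnL_null); the p-value uses the chi-square
    approximation, which for 1 df equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods for the null (omega2 fixed to 1) and
# selection (omega2 estimated) models.
stat, p = likelihood_ratio_test(lnl_null=-5234.80, lnl_alt=-5231.10)
reject_null = p < 0.05
```

With one degree of freedom, the null model is rejected at the 5% level when the statistic exceeds roughly 3.84.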

We ran this analysis twice, once with Bamfield lineages and once with Sandspit lineages as the foreground branches. We defined the foreground set of branches as all terminal branches of the gene tree leading to alleles found in the focal population (Bamfield or Sandspit), plus internal branches subtending any clades that consisted only of alleles sampled from that population. From each model fit, we used the Bayes empirical Bayes estimator (Yang et al. 2005) to identify those codons with a high posterior probability that ω>1 in the foreground population.
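The rule for choosing foreground branches can be made concrete with a small sketch. It assumes a rooted gene tree stored as nested tuples with allele names at the tips; the tree, the allele names, and the function `foreground_branches` are all hypothetical. Each branch is identified by the set of alleles descending from it.

```python
def foreground_branches(tree, population_of, focal):
    """Return the branches that are foreground for the focal population.

    A branch is foreground if every allele descending from it was sampled
    in the focal population. This covers terminal branches leading to focal
    alleles and internal branches subtending focal-only clades.
    """
    foreground = set()

    def visit(node, is_root):
        if isinstance(node, str):          # a tip (allele name)
            leaves = frozenset([node])
        else:                              # an internal node
            leaves = frozenset()
            for child in node:
                leaves |= visit(child, False)
        # The root has no branch above it, so it is never labelled.
        if not is_root and all(population_of[a] == focal for a in leaves):
            foreground.add(leaves)
        return leaves

    visit(tree, True)
    return foreground

# Hypothetical gene tree and sampling locations.
tree = ((("BA1", "BA2"), "SS1"), ("SS2", ("BA3", "SS3")))
population_of = {"BA1": "Bamfield", "BA2": "Bamfield", "BA3": "Bamfield",
                 "SS1": "Sandspit", "SS2": "Sandspit", "SS3": "Sandspit"}
fg = foreground_branches(tree, population_of, "Bamfield")
```

In this toy tree the Bamfield foreground consists of three terminal branches and the one internal branch subtending the clade (BA1, BA2). In a PAML tree file such branches would then be marked with a branch label such as `#1`.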

Because the PAML branch-sites model and similar models of codon evolution were originally developed for quantifying patterns of selection among species, the underlying gene tree is assumed to be known (that is, the gene tree is not inferred as part of the analysis of selection), and the inference of strong or weak selection may be highly sensitive to errors in the underlying gene tree. Phylogenetic analysis of alleles from multiple species that differ from each other by large numbers of substitutions will typically reveal a single most likely genealogy with relatively long internal branches and strongly supported clades (e.g., from bootstrapping), such that gene tree uncertainty may be low. In contrast, phylogenetic analysis of many alleles from one species, in which some pairs of alleles differ from each other by one or a few substitutions, will typically result in a posterior distribution of many slightly different genealogies in which many internal branches are short and bootstrapping percentages or other measures of clade support and topological confidence will be weak. Under a codon model of positive selection, these genealogies may lead to different inferences about the overall strength of selection, or about differences in selection among lineages or among codons. It may therefore be important to account for this kind of uncertainty in intraspecific analyses using codon models of selection like the branch-sites model. To explore the sensitivity of our results to uncertainty in the bindin gene tree topology, we repeated these pairs of PAML analyses using all of the 10 most likely trees from the Bayesian posterior distribution of gene trees generated by our MrBayes analysis.

The branch-sites model overall is conservative with respect to false positives (sites inferred to be under positive selection in the foreground), because the selection model is tested against a null model that allows the same sites to be under purifying selection in the background branches but evolving neutrally in the foreground branches (Zhang et al. 2005). For all analyses we used pairwise deletion of missing nucleotide sites so that we included potential signals of positive selection (and differences among populations) at sites that were present in some alleles but absent in others (a difference we inferred to represent mainly deletions of parts of the repetitive structure of the gene; see Results). This approach was crucial to our analysis because of the extensive length variation in both populations (see Results). Some other codon-based tests of positive selection that use complete deletion rather than pairwise deletion of gap sites were able to characterize patterns of selection for only the small proportion of the alignment that lacked gaps (e.g., the branch-site REL method of Kosakovsky Pond et al. 2011). As a result, these methods were relatively ineffective in detecting sites or lineages under selection. Similarly, a lack of fixed differences between the populations precluded use of some other methods that use gene trees to test hypotheses of positive selection (e.g., McDonald and Kreitman 1991).

In order to characterize the potential effects of selection on codon insertion-deletion variation, we also compared the number of indels in the coding region of bindin to the number of indels in the non-coding intron. We hypothesized that under neutral evolution, the number of indels in the coding and non-coding regions should be proportional to the length of each region. From the full-length sequence alignment of 80 alleles we counted the number of unique contiguous series of gap sites found in the coding sequence (e.g., a single codon insertion and a 10-codon deletion were each scored as one unique indel) and in the intron (same scoring as for coding sequence). We used a χ² test (with Yates’ correction for small sample sizes) of the expectation of equal frequency of indel occurrence per unit length of coding or non-coding sequence (in nucleotides).
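The indel scoring rule can be sketched as follows, assuming an alignment held as equal-length strings with "-" marking gaps. The function name and toy alignment are hypothetical, and treating two gap runs as the same indel only when their start and end coordinates match exactly is an assumption of this sketch rather than a documented detail of the analysis.

```python
def unique_indels(alignment, gap="-"):
    """Collect unique contiguous gap runs as (start, end) half-open intervals.

    A run shared by several sequences is counted once, and a run of any
    length (one codon or ten) counts as a single indel.
    """
    runs = set()
    for seq in alignment:
        start = None
        for pos, char in enumerate(seq):
            if char == gap and start is None:
                start = pos                 # a gap run opens
            elif char != gap and start is not None:
                runs.add((start, pos))      # the gap run closes
                start = None
        if start is not None:               # run extends to the end
            runs.add((start, len(seq)))
    return runs

# Toy alignment (hypothetical): one 3-bp gap shared by two sequences and
# one 6-bp gap in a third sequence.
toy = ["ATG---CCA", "ATG---CCA", "ATGAAACCA", "A------CA"]
indels = unique_indels(toy)
n_indels = len(indels)
```

Applying the same function separately to the coding and intron columns of an alignment would give the two counts that enter the χ² test.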

Results

Polymorphism

The alignment of bindin genomic sequences including the intron was 4134 bp, with 196 variable sites. The coding region without the intron (Fig. 1) was 2853 bp, with 70 variable sites and 46 amino acid polymorphisms. Most of this variation occurred in the first partition upstream of the recombination site (at codon site 890, Fig. 4.1), where there were 69 variable sites and 45 amino acid polymorphisms. We found high haplotype diversity in this first partition, including 68 unique alleles among 88 sequenced gene copies.

We found striking variation in the length of the gene, owing to variation in the number of repeat units in the repetitive domains. The collagen-like repeat units varied in number from 1 to 4 copies with a mode of 3 (Figs. 2, 3). There was also polymorphism in the length of these collagen-like repeats, owing to insertions or deletions of the constituent KGKK(G/R)R repeat motifs. The tandem repeat motifs of 32-36 amino acids also varied in copy number from 3 to 9 copies, with a mode of 8 (Figs. 2-3). The alignment suggested that the 13 alleles with two collagen-like copies were not identical by descent. Instead, alleles with two collagen-like copies were consistently and unambiguously aligned to longer alleles (with three or four collagen-like copies) in one of three ways (variants b, d, and e in Fig. 4.2). This pattern strongly suggests that shorter alleles with fewer collagen-like domains were derived in parallel by several different types of deletion mutations.

Genetic diversity, heterozygosity, and repeat copy number polymorphism were greater in Bamfield than in Sandspit (Fig. 4.2). Pairwise nucleotide diversity was 6-fold greater in Bamfield than in Sandspit, from which the same number of individuals were sampled (Fig. 4.2). Almost all of the copy number variation in the collagen-like domains occurred in the Bamfield sample (1-3 copies); in the Sandspit sample all alleles had 3 collagen-like domains except for a single Sandspit allele with four collagen-like copies (Fig. 4.2). Similarly, the Bamfield population had five length variants among the tandem repeats, while the Sandspit population had only three, two of which occurred in both populations (Fig. 4.2). There were 11 other indels, which occurred either as variation in the number of KGKK(G/R)R repeat motifs within individual collagen-like repeats (n=8) or in non-repetitive regions (n=3). Overall, most of the length variation occurred in the Bamfield population (Fig. 4.2). There were no fixed differences between the populations at nucleotide or indel sites.
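For readers unfamiliar with the statistic, pairwise nucleotide diversity can be sketched as the mean proportion of differing sites over all pairs of sequences. The version below skips any site that is a gap in either member of a pair, which is an assumption made here for illustration; the function name and the toy sequences are hypothetical and do not represent the bindin data or the software used to obtain the reported values.

```python
from itertools import combinations

def pairwise_diversity(alignment, gap="-"):
    """Mean proportion of differing sites over all sequence pairs.

    Sites that are a gap in either sequence of a pair are skipped for
    that pair only (pairwise deletion).
    """
    per_pair = []
    for a, b in combinations(alignment, 2):
        compared = 0
        differences = 0
        for x, y in zip(a, b):
            if x == gap or y == gap:
                continue
            compared += 1
            differences += x != y
        if compared:
            per_pair.append(differences / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0

# Toy samples (hypothetical) from a more and a less diverse population.
south = ["ACGTACGT", "ACGAACGT", "TCGTACGA"]
north = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
pi_south = pairwise_diversity(south)
pi_north = pairwise_diversity(north)
fold_difference = pi_south / pi_north
```

The fold difference between two samples is then simply the ratio of their diversity estimates.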

Bindin genealogy

The consensus bindin gene tree did not show an obvious strong pattern of lineage sorting between the two populations (reciprocal monophyly of large clades), although some small clades were unique to one or the other population (Fig. 4.3). The gene tree included a large clade of alleles characterized by two 6-codon insertions in the third collagen copy, as well as an Arg−>Gly substitution at codon 365. Most alleles in this group also had a Pro−>Glu transversion at codon 842 (Fig. 4.3). Alleles from both of these lineages were sampled in northern and southern populations (Fig. 4.3).

Although gaps in the alignment were not included as a character state in the phylogenetic reconstruction, some alleles with the same copy number variants were grouped together within clades, indicating that they also shared nucleotide polymorphisms (e.g., repeat variants b and c, Fig. 4.3). Other repeat variants did not form clades but tended to be closely related to each other (variants d and e).

Relationships among repeat paralogs

In the alignment of collagen-like domain copies, the neighbour-joining tree showed that copies 1, 2, and 3 formed strongly supported clades subtended by long internal branches, with no strong evidence of gene conversion among them (Fig. 4.4). The fourth collagen-like copy, which was only observed in one individual, was identical to copy 2 (labeled ‘2b’ in Fig. 4.4). Further analysis showed that the flanking region downstream of copy 2b was also identical to that of copy 2 (Figs. 1 and 3). Collagen-like copy 3 had two forms, which differed by the two six-codon insertions and the nucleotide transversion described above, defining a large clade of the gene tree, and these were more similar to one another than they were to copies 1 and 2. There were two unique collagen-like copies found in single individuals that showed evidence of partial gene conversion: copy 2 in an allele from Bamfield (labeled ‘BA89-2’ in Fig. 4.4) had a 15-bp region that was strongly similar and readily aligned to copy 3; and copy 2 of another Bamfield allele was almost identical to copy 1 (labeled ‘BA48A-2’ in Fig. 4.4).

The neighbour-joining tree of tandem repeats indicated a complex history of duplications among these repeats (Fig. 4.4). Copies 5, 6, 7, 8, and 9 were strongly similar or identical to each other, consistent with relatively recent duplication. A single polymorphism was common though not fixed in copy 9. There were three apparent instances of gene conversion in the tandem repeats among individual alleles. Two of these instances involved sequence similarities that were out of phase with the repeat motif as defined by Radar, so that the affected alleles had strong sequence similarities involving parts of two adjacent repeat copies. In both of these alleles, a region including parts of tandem repeats 9 and 10 was found to be identical to the analogous parts of repeats 10 and 11. Both of these cases occurred within alleles sampled from Alaska. In the third instance, tandem repeat 1 in a Bamfield allele (downstream of the first collagen-like repeat) was found to be identical to repeat 3 (downstream of the third collagen-like repeat).

Bindin population structure

Population differentiation (ΦST=0.220) between Bamfield and Sandspit was strong and highly significant (p<0.001). Standardized genetic distance in bindin (Φ’ST) was 0.462 (standardized by maximum genetic distance) or 0.454 (standardized by fragment length). By contrast, Φ’ST values for the ATPS and GPI loci were 0.0545 and 0.0940 (standardized by maximum genetic distance) or 0.0310 and 0.0697 (standardized by fragment length). Thus, standardized genetic distance was nearly an order of magnitude larger across these two populations in bindin compared to the non-coding intron sequences.

Tests of positive selection

The branch-site model of positive selection in PAML was somewhat sensitive to which tree was used (among the top 10 most probable trees from the Bayesian posterior distribution). We therefore report all of the results in the online supplementary material (Table S1), and summarize the general findings below.

With Bamfield alleles and clades in the foreground, significant positive selection was detected for all trees, as the likelihood ratio tests rejected the null model in all cases (Table S1). The Bayes empirical Bayes estimator detected nine sites with a high posterior probability of positive selection (ω>1) (Figure 5). We found the same sites under positive selection in this population for all 10 gene trees tested (Figure S1).

With Sandspit alleles and clades in the foreground, significant positive selection was only detected in tests using two of ten gene trees (Table S1). For the other eight trees, the positive selection model (with an estimated value of ω2>1) was not a significantly better fit to the data and the gene tree in comparison to the neutral model (with the value of ω2 fixed to 1.0 rather than estimated). For the two trees that led to a significant signal of positive selection, the Bayes empirical Bayes estimator detected four sites with a high posterior probability of positive selection (ω>1) (Fig. 5). These same four sites were also detected in non-significant selection models by the Bayes empirical Bayes estimator (when a subset of models was run sufficiently long for this estimation to be made; Figure S1), but with lower posterior probabilities of falling into the positively selected class of sites. This indicates that these four sites may have notably high dN/dS when other gene trees are used, but the selection model with an extra estimated parameter was not more likely than the neutral model with a fixed value of ω=1 at those sites.

Notably, the sites that were strongly inferred to be under positive selection in Bamfield differed from the sites that were weakly inferred to be under selection in Sandspit (Table 1, Fig. 5). For only one site (codon 842) did we find relatively strong evidence of positive selection in both populations. Of the other eight sites under positive selection in Bamfield, six were under purifying selection in Sandspit; similarly, three of the four sites putatively identified as experiencing positive selection in Sandspit were under purifying selection in Bamfield (Table 1). Also of note is the location of these sites in the gene structure. Of the 11 total sites identified as positively selected in one or both populations, five were associated with the third collagen-like domain, three occurred in the tandem repeat region, and three occurred downstream of the last tandem repeat copy in the nonrepetitive coding sequence (Table 1; Fig. 5).

Number of indels in coding vs. non-coding regions

We found a total of 22 different types of indels (Fig. 4.3) in our alignment of coding sequences (2850 bp in length) and only 2 in the non-coding intron (1285 bp in length). This ~5-fold higher density of indels among coding sequences (0.0077 per bp) compared to introns (0.00155 per nucleotide) was significant (χ² = 4.76, df = 1, p<0.05).
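As a rough check on this comparison, the sketch below computes a Yates-corrected χ² statistic from the counts and lengths given above, with expected counts proportional to sequence length. The function name is hypothetical, and the exact expected-value convention used for the reported test is not specified in the text, so the result (about 4.8) is close to but not identical with the reported statistic.

```python
import math

def yates_chi_square(observed, lengths):
    """Yates-corrected chi-square for counts expected in proportion to length.

    Returns (statistic, p_value) with one degree of freedom, which is
    appropriate for two categories.
    """
    total_observed = sum(observed)
    total_length = sum(lengths)
    statistic = 0.0
    for obs, length in zip(observed, lengths):
        expected = total_observed * length / total_length
        statistic += (abs(obs - expected) - 0.5) ** 2 / expected
    # Chi-square tail probability for 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Indel counts and region lengths (bp) as reported in the text:
# coding sequence first, intron second.
chi2, p = yates_chi_square(observed=[22, 2], lengths=[2850, 1285])
density_ratio = (22 / 2850) / (2 / 1285)
```

The density ratio computed this way is just under 5, consistent with the approximately 5-fold difference described.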

Covariation between sites under selection and indels

All of the sites found to be under positive selection in Bamfield, and two of the four sites that are probably under positive selection in Sandspit, occurred in regions of indel variation (Fig. 5). Most of these were found among the tandem repeat domains; however, two codon sites were also found in parts of the alignment where there were smaller indels (Fig. 5). One of these indels included codon 805, under positive selection in Bamfield. This indel was common at Sandspit (n=5 alleles), but rare at Bamfield (n=1). The other indel, present in both populations, included codon 842, which was under positive selection in both populations. In addition to these, one small indel found only in Sandspit and Alaskan alleles included codon 778, which might be under positive selection in Bamfield (with a marginally nonsignificant posterior probability, P=0.94) (Fig. 5).

Discussion

We find a different pattern of selection in the bindin locus in two geographically separated populations of a single species. Our results specifically indicate that there are sites under selection (dN>dS) in one population that are under purifying or neutral selection in the other population (dN<=dS). Indeed, the sites found to be under positive selection differed between the two populations. Although the signal of positive selection in the Sandspit population was sensitive to the gene tree used, our results clearly indicate that sites under positive selection in Bamfield are not under selection in Sandspit. These findings strongly suggest that selection favors different bindin alleles in the two populations. To the extent that the patterns differ between the two populations (fewer sites under selection in Sandspit, weaker overall evidence for positive selection among those northern alleles and lineages), the populations may be experiencing different processes as well (weaker selection, or a different mode of selection, acting on northern bindin alleles).

This strong population divergence is remarkable given the very high level of within-population bindin diversity in P. miniata, including both single nucleotide polymorphisms and indel variation. This overall polymorphism is notably high compared to studies of other broadcast spawning species with comparable sample sizes (Table 2). The nucleotide variation is partitioned significantly between Sandspit and Bamfield, and the standardized genetic distance is much greater than that observed for neutral markers, suggesting that the divergence in bindin has been greater than that expected by genetic drift alone. We consider how these findings compare to theoretical expectations and to previous observations of polymorphism at reproductive loci, and contribute to our understanding of speciation in the marine environment.

This work builds on previous studies of population differentiation in reproductive proteins driven by positive selection. In the M7 lysin locus of the mussel Mytilus galloprovincialis, one clade (GD) of alleles which showed significant positive selection was found at different frequencies across populations, suggesting the selective processes were localized (Springer and Crespi 2007; Riginos et al. 2006). Similarly, in the sea urchin Echinometra oblonga, one clade (clade 3) of bindin alleles occurred exclusively in island populations with a sympatric congener. In this case, positive selection was detected only when clade 3 was included with other clades in an analysis, indicating that positive selection caused differentiation of clade 3 from other E. oblonga bindin alleles (Geyer and Palumbi 2003). In a third case study, no population differentiation was detected among bindin alleles of Strongylocentrotus franciscanus sampled along the Pacific coast of North America between Alaska and southern California (Debenham et al. 2000). Taken together, while many studies have detected positive selection in fertilization proteins within populations and across species (reviewed by Swanson and Vacquier 2002), surprisingly few studies appear to have tested for divergence between groups of alleles that characterize geographically separated populations. Our observations of population differentiation driven by positive selection suggest that future phylogeographic analyses of gamete recognition genes could reveal population-level patterns associated with reproductive divergence.

One striking feature in our bindin alignment was the high number of indels in repeat regions of the gene. Indel variation has previously been observed in gamete recognition loci of broadcast spawners, both across (Metz and Palumbi 1996; Biermann 1998; Zigler et al. 2005) and within species (Table 3). Indeed, complex repetitive protein domain structure and insertion-deletion of repeats appear to be a hallmark of species-level diversity of gamete recognition loci under positive selection (Palumbi 1999; McCartney and Lessios 2004; Levitan and Stapper 2009), and may allow relatively rapid diversification through unequal crossing over (Minor et al. 1991; Vacquier et al. 1995; Metz and Palumbi 1996). Positive selection on indel variation has been identified in sperm proteins of primates (Podlaha and Zhang 2003) and rodents (Podlaha et al. 2005), and indels were also found to be associated with regions of positive selection in seminal fluid proteins of Drosophila (Schully and Hellberg 2006). The occurrence in our study of indel variation within regions of the gene that include codons under positive selection further suggests that indel mutations may be selected for their effects on protein function, in the same way that selection acts on functional variation caused by positively selected amino acid substitutions.

If the number of collagen-like domains or the number of tandem repeat copies is functionally significant and under selection, that selection may differ between Bamfield and Sandspit. The most common number of collagen-like copies is three, and in the Bamfield population, alleles with only two copies appear to have evolved convergently in three different ways. This finding was supported by the differential alignment of the 2-copy variants, as well as the finding of one 2-copy variant within a distinct clade in the gene tree (clade b of Fig. 4.2). This is in stark contrast to the Sandspit population, in which only one allele was sampled with other than three collagen-like copies. Notably, this rare variant was found in the population with the weaker evidence for positive selection on individual codons, and had more (four) rather than fewer collagen-like domains. Tandem repeat number was also somewhat divergent between populations, with some private alleles found in both populations. The distribution of tandem repeat copy number on the consensus gene tree suggests that tandem repeat indels also occurred multiple times in parallel. Because the gene trees were inferred using pairwise deletion of missing characters, they are informed by nucleotide polymorphisms only. It therefore remains possible that the evolution of tandem repeat copy number was more conserved through time than seems apparent from the consensus tree. However, without an a priori model of molecular evolution that weighs indel variation relative to nucleotide polymorphism, this issue is not easily resolvable.

Although we looked for gene conversion across repeats, it does not appear to be a major process driving divergence among these P. miniata populations. If it were, we might expect repeats within alleles to be more similar to each other than to repeats in the same part of the coding sequence alignment in other alleles. Instead, almost all tandem repeat and collagen-like domain copies formed clades that implied homology of those domain copies to each other. However, our focus on the Bamfield and Sandspit populations does not rule out the possibility of finding more evidence of gene conversion in other populations. Of the two Alaskan individuals sampled, each had one allele with a fully converted repeat copy. Concerted evolution has been identified across species in the vitelline envelope receptor for lysin in molluscs (Swanson and Vacquier 1998), and may yet be an important mechanism of divergence in P. miniata bindin at the population level.

Selective mechanisms

Several potential mechanisms have been proposed to explain observations of positive selection in fertilization proteins, some of which also predict high allelic diversity and population-level variation. First, a sexual conflict may exist in broadcast spawners under conditions of high sperm competition, in which males are selected to maximize encounters and compatibility of sperm with eggs, and females are selected to moderate encounters between eggs and sperm, possibly through lowered sperm-egg compatibility, in order to reduce female risk of polyspermy (fertilization of individual eggs by multiple sperm). Theory suggests that such a conflict over the rate of sperm-egg encounters can lead to an arms race between the sexes, in which successive adaptations in the female reproductive system to reduce polyspermy rates are counteracted by specific male adaptations. Such an arms race operating independently in geographically isolated populations is expected to lead to population-specific male and female adaptations, population divergence of reproductive traits, and (potentially) to reproductive isolation between members of such populations (Gavrilets 2000; Gavrilets and Waxman 2002). Such a process is of obvious interest to evolutionary ecologists as a source of selection leading to the formation of new species.

Our finding of positive selection at different codon sites across populations is consistent with the evolutionary chase outcome of sexual conflict (Gavrilets and Waxman 2002). Divergent female reproductive traits may have evolved in northern and southern P. miniata populations, leading to differential selection on male bindin genotypes. However, because allele diversity was also high in both populations, and neither selective sweeps nor gene conversion were apparent, these findings are also consistent with the polymorphism-maintenance outcome of sexual conflict (Haygood 2004). Hence these alleles may be maintained by negative frequency dependence, with different novel genotypes selected in either population (Haygood 2004; Tomaiuolo and Levitan 2010). While we have no direct evidence for a sexual conflict via polyspermy in this species, Patiria miniata is known to live at high adult spatial densities in Bamfield (2.6-3.5 individuals m⁻², Rumrill 1989) and California (1.9 individuals m⁻², Schroeter et al. 1983), similar to the range of high densities that are associated with sperm competition and polyspermy in the urchin Strongylocentrotus franciscanus (Levitan and Ferrell 2006). The stronger signal of positive selection in the Bamfield sample (compared to Sandspit) is also consistent with greater expected response to selection in the population with the larger effective population size (Keever et al. 2009; McGovern et al. 2010).

Other mechanisms of positive selection are worthy of consideration, namely reproductive character displacement, sexual selection (without conflict per se), and immunological defense (Vacquier et al. 1997). Reproductive character displacement as a result of reinforcement selection against hybrids is not a likely mechanism in P. miniata as there are no closely related species in its range. The most closely related and sympatric species that are abundant and likely to have similar spawning seasons are from different taxonomic families (the asteropseid Dermasterias imbricata; several solasterid species in the genera Crossaster, Solaster; Mah 2011; Mah 2012). Because hybridization is unlikely across families (e.g., Nakachi 2006), and introgression of haplotypes from other species has not been observed in extensive phylogeographic surveys of mtDNA (Keever et al. 2009), P. miniata seems unlikely to experience selection against the formation of low-fitness hybrids with other species. Sexual selection without conflict over polyspermy rates could result in directional selection driven by sperm competition and female choice (e.g., for more compatible males), and indeed divergence between populations may represent different molecular pathways towards the same optimal male phenotype. However, under this type of sexual selection, a single optimal male genotype is expected to be selected toward fixation. Without the trade-offs associated with an accompanying sexual conflict, it is difficult to explain the high within-population bindin variability on the basis of sexual selection among males alone. Finally, immunological defense is a potential driver of evolution in egg surface receptors, and pathogens may drive both diversification and positive selection in similar conflict scenarios (Hughes and Nei 1988), via selection on sperm ligands to match changing female receptors (Vacquier et al. 1997). Hence immunological defense cannot be ruled out as a potential alternative explanation for the patterns observed. Whichever mechanism(s) may be involved, population-level divergence is an important outcome with potential consequences for incipient speciation.

Potential function of bindin coding sequence variation

Although the functional significance of bindin amino acid and indel variation remains to be tested, it is difficult to imagine that the polymorphism in P. miniata bindin is neutral to fertilization success. Variation in reproductive success is correlated with quantitatively lower levels of sequence variation at gamete recognition loci of other species (Palumbi 1999; Levitan and Ferrell 2006; Levitan and Stapper 2009). For example, complete gamete incompatibility between congeneric sea urchins can be associated with as few as 8-10 bindin amino acid substitutions (and perhaps correlated changes in the egg bindin receptor) (Zigler et al. 2005). Importantly, experimental studies of species-specific molecular responses highlight repeat elements in bindin as having a possible function in species recognition (Vacquier et al. 1995).

Conclusion

We find high intraspecific polymorphism and evidence for divergent positive selection between two populations at the gene encoding the male-expressed gamete recognition protein bindin. We also find evidence that indel variation is subject to diversifying selection across these populations. These findings suggest that selection favours different bindin alleles in the two populations, most likely through female choice via egg cell surface molecules. This divergence may represent early stages of incipient speciation based on processes intrinsic to Patiria miniata populations, rather than in response to extrinsic selective pressures associated with, for example, environmental or biotic differences between the two habitats occupied by those populations. Study of the functional role of bindin variation is required to understand the mechanisms maintaining polymorphism within populations and leading to divergence between populations. The potential for non-ecological processes to drive reproductive isolation among populations has profound implications for our understanding of the speciation process, particularly in the ocean, where strong physical barriers to gene flow are sparse and the origins of reproductive isolation at its earliest stages are not obvious.

References

Biermann CH (1998) The molecular evolution of sperm bindin in six species of sea urchins (Echinoida : Strongylocentrotidae). Molecular Biology and Evolution 15, 1761-1771.

Biermann CH, Marks JA (2000) Geographic divergence of gamete recognition systems in two species in the sea urchin genus Strongylocentrotus. Zygote 8, S86-S87.

Binks RM, Prince J, Evans JP, Kennington WJ (2012) More than bindin divergence: reproductive isolation between sympatric subspecies of a sea urchin by asynchronous spawning. Evolution 66, 3545-3557.

Bird CE, Karl SA, Smouse PE, Toonen RJ (2011) Detecting and measuring genetic differentiation. In Phylogeography and Population Genetics in Crustacea (ed. Koenemann S, Held C, Schubart C), pp. 31-55: CRC Press Crustacean Issues Series, vol. 19.

Debenham P, Brzezinski MA, Foltz KR (2000) Evaluation of sequence variation and selection in the bindin locus of the red sea urchin, Strongylocentrotus franciscanus. Journal of Molecular Evolution 51, 481-490.

Delport W, Poon AFY, Frost SDW, Pond SLK (2010) Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics 26, 2455-2457.

Galindo BE, Vacquier VD, Swanson WJ (2003) Positive selection in the egg receptor for abalone sperm lysin. Proceedings of the National Academy of Sciences, USA 100, 4639-4643.

Gavrilets S (2000) Rapid evolution of reproductive barriers driven by sexual conflict. Nature 403, 886-889.

Gavrilets S, Waxman D (2002) Sympatric speciation by sexual conflict. Proceedings of the National Academy of Sciences, USA 99, 10533-10538.

Geyer LB, Lessios H (2009) Lack of character displacement in the male recognition molecule, bindin, in Atlantic sea urchins of the genus Echinometra. Molecular Biology and Evolution 26, 2135-2146.

Geyer LB, Palumbi SR (2003) Reproductive character displacement and the genetics of gamete recognition in tropical sea urchins. Evolution 57, 1049-1060.

Hart MW, Popovic I, Emlet RB (2012) Low rates of bindin codon evolution in lecithotrophic Heliocidaris sea urchins. Evolution 66, 1709-1721.

Haygood R (2004) Sexual conflict and protein polymorphism. Evolution 58, 1414-1423.

Hedrick PW (2005) A standardized genetic differentiation measure. Evolution 59, 1633-1638.

Hellberg ME, Dennis AB, Arbour-Reily P, Aagaard JE, Swanson WJ (2012) The tegula tango: a coevolutionary dance of interacting, positively selected sperm and egg proteins. Evolution 66, 1681-1694.

Hellberg ME, Vacquier VD (1999) Rapid evolution of fertilization selectivity and lysin cDNA sequences in teguline gastropods. Molecular Biology and Evolution 16, 839-848.

Hughes AL, Nei M (1988) Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335, 167-170.

Kamei N, Glabe CG (2003) The species-specific egg receptor for sea urchin sperm adhesion is EBR1, a novel ADAMTS protein. Genes & Development 17, 2502-2507.

Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059-3066.

Keever CC, Sunday J, Puritz JB, Addison JA, Toonen RJ, Grosberg RK, Hart MW (2009) Discordant distribution of populations and genetic variation in a sea star with high dispersal potential. Evolution 63, 3214-3227.

Kosakovsky Pond SL, Murrell B, Fourment M, Frost SDW, Delport W, Scheffler K (2011) A random effects branch-site model for detecting episodic diversifying selection. Molecular Biology and Evolution 28, 3033-3043.

Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW (2006) Automated phylogenetic detection of recombination using a genetic algorithm. Molecular Biology and Evolution 23, 1891-1901.

Lee Y, Vacquier VD (1992) The divergence of species-specific abalone sperm lysins is promoted by positive Darwinian selection. Biological Bulletin 182, 97-104.

Lessios HA (2011) Speciation genes in free-spawning marine invertebrates. Integrative and Comparative Biology 51, 456-465.

Levitan DR, Ferrell DL (2006) Selection on gamete recognition proteins depends on sex, density, and genotype frequency. Science 312, 267-269.

Levitan DR, Stapper AP (2009) Simultaneous positive and negative frequency-dependent selection on sperm bindin, a gamete recognition protein in the sea urchin Strongylocentrotus purpuratus. Evolution 64, 785-797.

Mah C, Foltz D (2011) Molecular phylogeny of the Valvatacea (Asteroidea: Echinodermata). Zoological Journal of the Linnean Society 161, 769-788.

Mah CL, Blake DB (2012) Global diversity and phylogeny of the Asteroidea (Echinodermata). PLoS ONE 7, e35644.

McCartney MA, Lessios HA (2004) Adaptive evolution of sperm bindin tracks egg incompatibility in neotropical sea urchins of the genus Echinometra. Molecular Biology and Evolution 21, 732-745.

McDonald JH, Kreitman M (1991) Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652-654.

McGovern TM, Keever CC, Saski CA, Hart MW, Marko PB (2010) Divergence genetics analysis reveals historical population genetic processes leading to contrasting phylogeographic patterns in co-distributed species. Molecular Ecology 19, 5043-5060.

Meirmans PG (2006) Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60, 2399-2402.

Metz EC, Gomez-Gutierrez G, Vacquier VD (1998) Mitochondrial DNA and bindin gene sequence evolution among allopatric species of the sea urchin genus Arbacia. Molecular Biology and Evolution 15, 185-195.

Metz EC, Palumbi SR (1996) Positive selection and sequence rearrangements generate extensive polymorphism in the gamete recognition protein bindin. Molecular Biology and Evolution 13, 397-406.

Minor JE, Fromson DR, Britten RJ, Davidson EH (1991) Comparison of the bindin proteins of Strongylocentrotus franciscanus, Strongylocentrotus purpuratus, and Lytechinus variegatus: sequences involved in the species specificity of fertilization. Molecular Biology and Evolution 8, 781-795.

Moy GW, Springer SA, Adams SL, Swanson WJ, Vacquier VD (2008) Extraordinary intraspecific diversity in oyster sperm bindin. Proceedings of the National Academy of Sciences of the United States of America 105, 1993-1998.

Nakachi M, Moriyama H, Hoshi M, Matsumoto M (2006) Acrosome reaction is subfamily specific in sea star fertilization. Developmental Biology 298, 597-604.

Palumbi SR (1999) All males are not created equal: Fertility differences depend on gamete recognition polymorphisms in sea urchins. Proceedings of the National Academy of Sciences of the United States of America 96, 12632-12637.

Palumbi SR (2009) Speciation and the evolution of gamete recognition genes: pattern and process. Heredity 102, 66-76.

Patiño S, Aagaard JE, MacCoss MJ, Swanson WJ, Hart MW (2009) Bindin from a sea star. Evolution and Development 11, 376-381.

Podlaha O, Webb DM, Tucker PK, Zhang JZ (2005) Positive selection for indel substitutions in the rodent sperm protein Catsper1. Molecular Biology and Evolution 22, 1845-1852.

Podlaha O, Zhang JZ (2003) Positive selection on protein-length in the evolution of a primate sperm ion channel. Proceedings of the National Academy of Sciences of the United States of America 100, 12241-12246.

Riginos C, McDonald JH (2003) Positive selection on an acrosomal sperm protein, M7 lysin, in three species of the mussel genus Mytilus. Molecular Biology and Evolution 20, 200-207.

Riginos C, Wang D, Abrams AJ (2006) Geographic variation and positive selection on M7 lysin, an acrosomal sperm protein in mussels (Mytilus spp.). Molecular Biology and Evolution 23, 1952-1965.

Rumrill SS (1989) Population size structure, juvenile growth, and breeding periodicity of the sea star Asterina miniata in Barkley Sound, British Columbia. Marine Ecology-Progress Series 56, 37-47.

Schroeter SC, Dixon J, Kastendiek J (1983) Effects of the starfish Patiria miniata on the distribution of the sea urchin Lytechinus anamesus in a southern Californian kelp forest. Oecologia 56, 141-147.

Schully SD, Hellberg ME (2006) Positive selection on nucleotide substitutions and indels in accessory gland proteins of the Drosophila pseudoobscura subgroup. Journal of Molecular Evolution 62, 793-802.

Springer SA, Crespi BJ (2007) Adaptive gamete-recognition divergence in a hybridizing Mytilus population. Evolution 61, 772-783.

Styan CA, Kupriyanova E, Havenhand JN (2008) Barriers to cross-fertilization between populations of a widely dispersed polychaete species are unlikely to have arisen through gametic compatibility arms races. Evolution 62, 3041-3055.

Swanson WJ, Vacquier VD (1998) Concerted evolution in an egg receptor for a rapidly evolving abalone sperm protein. Science 281, 710-712.

Swanson WJ, Vacquier VD (2002) The rapid evolution of reproductive proteins. Nature Reviews Genetics 3, 137-144.

Tomaiuolo M, Levitan DR (2010) Modeling how reproductive ecology can drive protein diversification and result in linkage disequilibrium between sperm and egg proteins. American Naturalist 176, 14-25.

Turner LM, Hoekstra HE (2008) Causes and consequences of the evolution of reproductive proteins. International Journal of Developmental Biology 52, 769-780.

Vacquier VD, Swanson WJ, Hellberg ME (1995) What have we learned about sea urchin sperm bindin. Development Growth & Differentiation 37, 1-10.

Vacquier VD, Swanson WJ, Lee YH (1997) Positive Darwinian selection on two homologous fertilization proteins: What is the selective pressure driving their divergence? Journal of Molecular Evolution 44, S15-S22.

Yang ZH (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586-1591.

Yang ZH, Wong WSW, Nielsen R (2005) Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution 22, 1107-1118.

Zhang HB, Scarpa J, Hare MP (2010) Differential Fertilization Success Between Two Populations of Eastern Oyster, Crassostrea virginica. Biological Bulletin 219, 142-150.

Zhang JZ, Nielsen R, Yang ZH (2005) Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Molecular Biology and Evolution 22, 2472-2479.

Zigler KS, McCartney MA, Levitan DR, Lessios HA (2005) Sea urchin bindin divergence predicts gamete compatibility. Evolution 59, 2399-2404.

Tables and Figures

Table 4.1. Details of codon sites under positive selection.

Table 4.2. Review of allelic and indel variation in gamete recognition genes of broadcast spawning marine invertebrates

| Species | Study | Number of alleles (sample size) | No. unique repeat motif copy variants | Positive selection? |
|---|---|---|---|---|
| P. miniata | This study | 68 (88) | 18 | yes |
| C. gigas | Moy et al. 2008 | unknown | 4 | yes |
| S. purpuratus | Levitan and Stapper 2009 | 40 (135) | 2 | - |
| S. franciscanus | Debenham et al. 2000 | 14 (134) | 0 | no |
| | Levitan and Ferrell 2006 | 15 (254) | 0 | no |
| E. mathaei | Palumbi 1995 | 15 (85) | 3 | yes |
| E. oblonga | Geyer and Palumbi | 40 (82) | 2 | yes |
| E. sp. C | Geyer and Palumbi 2003 | 37 (84) | 0 | yes |
| E. lucunter | Geyer and Lessios 2009 | 96 (246) | 6 | yes |
| E. viridis | McCartney and Lessios 2004 | 28 (31) | 2 | weak |
| E. vanbrunti | McCartney and Lessios 2004 | 17 (17) | 2 | no |

[Figure 4.1 image: (a) bindin coding region plotted against codon position (0-800), from 5' to 3', with labels for collagen-like copies 1, 2, 2b and 3, tandem repeats, the recombination site (*), the 1285 bp intron, and the core; (b) map of British Columbia and the U.S.A. showing Sandspit and Bamfield.]

Fig. 4.1. Schematic diagram of bindin gene coding region and map of study area.

(a) Main region (above) shows the 914-amino acid haplotype with the modal number of collagen-like copies and tandem repeats. Inserts below show the extra collagen-like copy with 3’ flanking region, and the extra tandem repeat, each found only once in our sample. Black and grey solid rectangles show tandem repeat copies that are present in all alleles, dashed rectangles show tandem repeats that vary in copy number among alleles. Asterisk shows location of the single recombination site at codon site 855, black triangle denotes the 1285 bp intron at codon site 856, and solid black rectangle denotes the 135 bp invariant core. Colours in sequence are the default colours used in Se-al (v.2.0) (cyan=H,R,K; magenta=G,P,T,A,S; black=E,Q,D,N; green=L,V,M,I; blue=Y). (b) Map of study region and location of two main population samples. Dashed grey line represents location of phylogeographic break identified by Keever et al. (2009).

| | Bamfield | Sandspit | Alaska | Fort Bragg | Combined |
|---|---|---|---|---|---|
| No. individuals sampled | 20 | 20 | 2 | 2 | 44 |
| No. alleles | 37 | 26 | 4 | 4 | 68 |
| Heterozygosity | 1 | 0.9 | 1 | 1 | 0.98 |
| Pairwise nucleotide diversity | 0.131±0.064 | 0.020±0.009 | 0.014±0.009 | 0.008±0.005 | |

[Figure 4.2 image: frequency histograms (x-axis: Frequency) of collagen copy number (1-4), tandem repeat number (3-9), and other indels in the coding region (ind360, ind90, ind805, ind778, ind280, ind825, ind405, ind230, ind215, ind90l), shown for Bamfield, Sandspit, Alaska, Fort Bragg, and all populations combined.]

Fig. 4.2. Summary of bindin diversity across sampled populations.

[Figure 4.3 image: gene tree including P. pectinifera, with tips labelled by individual bindin allele (prefixes BA, SS, AK, CA; suffixes -A and -B), haplotype groups A and B, annotated codon sites (124, 175, 176, 365, 461, 493, 631, 640, 653, 805, 842), the 2 x 18bp inserts, the synonymous transversion at site 443, posterior probabilities on major clades, and repeat-number variant letters a-e.]

Fig. 4.3. Consensus Bayesian gene tree and gene alignment for the first recombinant region of the bindin gene in Patiria miniata.

The gene tree is annotated with the eleven codon sites under positive selection. Circled numbers refer to codon sites described in Table 3. Grey circles represent reversals. Two other differences are also annotated on the tree – the addition of two 18 bp inserts, which distinguishes the paraphyletic groups 1 and 2, and a single but common synonymous transversion, which most members of haplotype group B share. Uncircled numbers above major clades denote Bayesian posterior probability of partition. Gaps in the gene alignment show variation in repeat units, grey tone in sequences represents amino acid identity (light grey= H,R,K,G,P,T,A,S; dark grey= E,Q,D,N,L,V,M,I,Y), and letters a-d to the right indicate repeat-number variants that occurred multiple times (and are emphasized in the main text).

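[Figure 4.4 image: (a) neighbour-joining tree of collagen-like copies 1, 2, 2b and 3, with labels 3A, 3B and BA-2; (b) neighbour-joining tree of tandem repeats 1-12 (scale bar 0.02), with labels BA-3, 2 x AK-10, and 5,6,7,8,9.]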

Fig. 4.4. Neighbour-joining gene trees from alignments of (a) collagen-like copies and (b) tandem repeats across 44 individuals.

In both, clades of repeat units match their location in the gene by number and colour. In the collagen-like copy alignment (a), the location of repeat unit 2b is noted with a pink arrowhead, and the two copies of repeat unit 2 from Bamfield that show conversion with other copies (BA-2) are highlighted. In (b) the three tandem repeats that show partial or full gene conversion are highlighted (two from Alaska: AK-10; one from Bamfield: BA-3).

Fig. 4.5. Site-by-site probability of positive selection, and location of indels, in the first exon of bindin in Patiria miniata.

Peaks show for each codon the Bayes-empirical-Bayes estimate of the posterior probability of belonging to the positively selected class of sites (with ω>1) and high rates of amino acid substitution along the foreground branches from the maximum likelihood analysis. Panel (a) shows results with the Bamfield population in the foreground, panel (b) shows results with the Sandspit population in the foreground, both using the most visited tree from the posterior distribution of most likely gene trees generated by the MrBayes analysis (for results from other gene trees, see Fig. S1). Height of the curve shows the probability that dN/dS is greater than 1 for each site along the length (x-axis) of the first recombinant region. Grey bands show the 95% probability range. Numbered peaks indicate sites with >95% probability dN/dS>1. Panel (c) shows the distribution of indels.

Chapter 5.

Localized reproductive coevolution in a seastar

Authors: J.M. Sunday and M.W. Hart

Abstract

Findings of positive selection in fertilization proteins of broadcast spawners suggest sexual conflict drives diversification. One compelling and untested question is whether positive selection within allopatric populations leads to reproductive isolation between them. Here we investigate patterns of localized coevolution of mating affinity in two populations of a seastar north and south of a phylogeographic break, and compare the results with known patterns of divergence in the locus coding for the sperm protein, bindin. We test for effects of population source and male bindin genotype on fertilization success in single-mate crosses within and between populations. Our results reveal an asymmetrical pattern of localized reproductive coevolution, in which females from the larger southern population have greater fertilization success when mated with males from the same population than with northern males. We also find that copy number variation in repetitive regions in the male bindin locus is related to fertilization success in a population-specific manner: males with fewer copies and therefore shorter bindin alleles have greater fertilization success when mated to females from one, but not the other, population. Our results suggest that sexual selection for shorter bindin alleles has occurred in the population where among-male sperm competition is likely stronger and where short bindin alleles are sampled more frequently. These findings provide evidence that sexual selection can lead to population divergence in reproductive compatibility in natural populations.

Introduction

One of the most remarkable patterns to emerge from molecular genetic investigations of natural populations is the signature of diversifying selection in proteins involved in fertilization (Swanson & Vacquier 2002; Turner & Hoekstra 2008). While intuition suggests such proteins should be functionally conserved by selection, molecular evidence shows that selection has favoured high rates of change in proteins in numerous species of insects (e.g. Schully & Hellberg 2006), rodents (e.g. Sutton & Wilkinson 1997), primates (e.g. Gasper & Swanson 2006), and broadcast-spawning marine invertebrates (reviewed in Swanson & Vacquier 2002 and Palumbi 2009). Multiple non-mutually exclusive mechanisms have been proposed to explain these findings (reviewed in Swanson & Vacquier 2002), including evolutionary arms races between males and females over conflicting reproductive fitness optima, reproductive character displacement owing to selection against interspecific hybridization, and arms races with co-evolving pathogens. Each of these mechanisms suggests that diversifying selection within populations may lead to reproductive divergence and prezygotic isolation among populations, particularly if they are driven by a continual arms race (Rice & Holland 1997; Vacquier et al. 1997; Holland & Rice 1998; Rice 1998; Arnqvist et al. 2000; Gavrilets 2000; Swanson & Vacquier 2002).

In broadcast spawning marine invertebrates, mate choice is primarily mediated by gamete surface proteins, providing a relatively simple system in which it is possible to understand the evolution of reproductive isolation. Considerable evidence suggests that evolution in these proteins has been both rapid and dynamic in multiple species. In genomic sequences of these loci, signals of positive selection have been detected among numerous taxa, including urchins (Metz & Palumbi 1996; Biermann 1998; McCartney & Lessios 2004), abalone (Lee & Vacquier 1992; Galindo et al. 2003), turban snails (Hellberg & Vacquier 1999), mussels (Riginos & McDonald 2003; Springer & Crespi 2007), and oysters (Moy et al. 2008). In some taxa, signals of positive selection have been associated with low within-species diversity and signs of selective sweeps, such as in the abalone sperm lysin (Lee & Vacquier 1992). In other taxa, substantial genetic polymorphism exists within species, with or without a signal of positive selection, such as in the sea urchin sperm protein, bindin, of several species (Metz & Palumbi 1996; Palumbi 1999; Geyer & Palumbi 2003). Both of these outcomes are consistent with theoretical expectations under sexual conflict between male and female gametes (Gavrilets & Waxman 2002; Haygood 2004; Tomaiuolo & Levitan 2010).

Sexual conflict in broadcast spawners is thought to arise when sperm competition is high, because eggs are at risk of polyspermy (supernumerary sperm fusion), which usually results in embryo death. While males are selected for high rates of fertilization because of strong inter-male competition, selection may favour females that reduce fertilization rates, allowing the egg time to establish a physical block to polyspermy. Eggs with surface protein receptors that are unmatched to sperm surface proteins or rare among competing sperm may therefore be favoured by selection among females because of reduced fertilization and polyspermy rates, while sperm with close matches to most females should be favoured by selection among males (Palumbi 1999). Direct evidence for this pattern of selection comes from fertilization experiments in sea urchins, in which females with non-matching (Strongylocentrotus franciscanus) or rare (S. purpuratus) genotypes had greater mating success when sperm densities were high (Levitan & Ferrell 2006; Levitan & Stapper 2009).

Theoretical models suggest sexual conflict within populations could lead to rapid divergence in reproductive compatibility among allopatric populations (Gavrilets 2000; Gavrilets & Waxman 2002). However, there has been surprisingly little empirical investigation of this hypothesis. Some work has shown differences in allele frequencies at gamete compatibility loci among populations (in sea urchins: Geyer & Palumbi 2003; in mussels: Riginos et al. 2006; Springer & Crespi 2007). Other experimental work has demonstrated gamete incompatibility between populations of the same species, indicating localized coevolution in reproductive traits (in the urchin Strongylocentrotus droebachiensis: Biermann & Marks 2000; in the serpulid polychaete Galeolaria caespitosa: Styan et al. 2008; in the oyster Crassostrea virginica: Zhang et al. 2010). These studies each indicate partial and asymmetrical reproductive isolation between populations, albeit in populations separated by large distances. To date, no analysis has combined a study of divergence in fertilization loci with measurements of gamete compatibility between members of different populations to test the effect of selection within populations on the evolution of reproductive incompatibility between them.

We recently found a pattern of divergent positive selection in a gamete compatibility locus among populations of the coastal broadcast spawning seastar, Patiria miniata (Sunday & Hart in prep). In two sampled populations on either side of a phylogeographic break in British Columbia, Canada, we found high levels of variation in the gene for the sperm protein, bindin, with significant population differentiation. Positive Darwinian selection was found in both populations, with stronger evidence and more sites under positive selection in the southern sampled population, where effective population size is also greater; we found weaker evidence and fewer selected sites in the northern sampled population. Most of the codon sites identified as being under positive selection in one population were under purifying selection in the other population. This suggests that the bindin locus has responded to divergent sexual selection on either side of the historical barrier with possibly stronger selection, or stronger responses to selection, in the southern population. In addition, the bindin coding sequence in this species is characterized by two large repetitive domains, and we found high variability in the number of repeats at both domains. The variability in repeat number was greater in the southern population, with evidence of convergent losses in repeat motifs. These findings suggest that copy number variation may also be under differential selection in these populations. Although these findings provide compelling evidence for divergent sexual selection which may affect reproductive incompatibility between the two populations, the functional significance of this molecular variation in seastar bindin is yet unknown (Sunday & Hart in prep).

In the work reported here, we investigate reproductive compatibility in males and females within and between the northern and southern populations of P. miniata, and test for a functional significance of molecular variation in the bindin locus. We hypothesize that bindin polymorphism has been maintained by localized sexual selection, and test the prediction that males with derived genotypes at selected sites within a population have a fertilization advantage when mated with females from the same population. We tested this prediction in sperm-limiting, non-competitive crosses, focusing on copy number variation and on codon sites identified as being under positive selection in each population. We found a pattern of localized coevolution and evidence for population-specific female affinity for shorter bindin alleles in the southern population.

Methods

Study Design

We conducted full-factorial fertilization crosses between males and females from each of the two collection regions, in sperm-limiting non-competitive conditions, to assess variation in fertilization affinities between mating pairs. We genotyped all mating adults after the fertilization experiments were completed, and analyzed fertilization success as a function of population source, bindin genotype, and gamete physical traits. Because it is generally unknown how bindin variation is expressed in males and interacts with female receptors, we considered multiple models and used an information-theoretic approach to compare among them. Briefly, we tested effects of copy number variation and sites under positive selection, and considered how these variables affect fertilization success under different expression models of dominance, effects of heterozygosity, and effects of male-female bindin genotype matching (see below).
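As a rough illustration of how candidate models can be ranked in an information-theoretic framework, the sketch below computes AIC, the difference from the best model, and Akaike weights. It assumes AIC is the criterion (the text does not name one). The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not results from this study.

```python
import math

def akaike_table(models):
    """Rank models by AIC.

    models: dict mapping model name -> (log_likelihood, n_parameters).
    Returns a list of (name, aic, delta_aic, weight), best model first.
    """
    aic = {name: 2 * k - 2 * ll for name, (ll, k) in models.items()}
    best = min(aic.values())
    # Relative likelihood of each model given the data
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
    total = sum(rel.values())
    rows = [(name, aic[name], aic[name] - best, rel[name] / total) for name in aic]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical candidate models (illustrative values only)
candidates = {
    "population_source_only": (-120.4, 3),
    "copy_number_dominant": (-115.1, 4),
    "copy_number_additive": (-116.0, 4),
    "heterozygosity": (-119.8, 4),
}

table = akaike_table(candidates)
for name, aic, delta, weight in table:
    print(f"{name:24s} AIC={aic:7.1f}  dAIC={delta:5.1f}  weight={weight:.3f}")
```

Models with small differences from the best AIC share most of the weight, so several expression models can remain plausible at once instead of a single model being declared correct.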

Collections

P. miniata adults were collected from two locations north and south of a phylogeographic break at Queen Charlotte Strait in British Columbia, Canada. Collection locations were Bamfield, Vancouver Island (southern), and Sandspit, Haida Gwaii (northern; see Fig. 5.1). Specimens were collected at low tide and transported by road or by air to Simon Fraser University in chilled coolers within 10 hours. Specimens were kept together in recirculating UV-filtered seawater tanks at 11-12°C and fed regularly with mussels. Individuals were acclimated to lab conditions for at least 2 weeks before trials began.

Fertilization Crosses

We conducted ten independent fertilization crosses on separate days. Each cross involved a male and a female individual from each of the two populations, crossed reciprocally. To obtain gametes, we removed individuals from seawater, brought them to room temperature, and induced them to spawn by dorsal injection of 1-3 ml of 100 µM 1-methyladenine, which induces maturation of oocytes (in females) and spawning (in both sexes, Schroeder year). Eggs appearing within 30 minutes of injection were collected by Pasteur pipette into 0.45 µm filtered seawater, rinsed 3 times with filtered seawater, and allowed to settle to the bottom of a glass beaker at ~12°C. Sperm were collected with a pipette as they were extruded from the gonopore without water (“dry”) and were placed in a glass petri dish on ice.

Before each experimental cross, sperm were brought to three desired concentrations in 0.45 µm filtered seawater. We first diluted sperm by a factor of 100, fixed a subsample of this stock solution with 5% formalin, and measured absorbance using a Nanodrop 2000c spectrophotometer (Thermo Scientific). We used the difference in absorbance at 600 and 280 nm, as recommended for cell cultures by the manufacturer. We compared this absorbance to a standard curve of absorbance against sperm concentration generated previously, using haemocytometer counts of fixed sperm from a series of 10-fold dilutions. The experimental sperm stock was then independently diluted to three desired sperm concentrations in 20 ml of filtered seawater, chosen to approximate conditions of sperm limitation through to sperm competition based on preliminary experiments (low: 10⁴ ml⁻¹, medium: 10⁵ ml⁻¹, and high: 10⁶ ml⁻¹). Aliquots of sperm dilutions used in experimental treatments were also fixed in 5% formalin and counted on a haemocytometer after each experiment, to check that they were at desired concentrations.
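The dilution arithmetic described above can be sketched as follows. This is a minimal example under stated assumptions: the standard curve is taken to be linear, the standard-curve values and the measured absorbance are hypothetical, and the concentration read from the curve is treated as that of the stock being diluted.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical standard curve from a 10-fold dilution series:
# haemocytometer counts (sperm per ml) against absorbance difference.
std_conc = [1e5, 1e6, 1e7]
std_abs = [0.004, 0.04, 0.4]
slope, intercept = fit_line(std_conc, std_abs)

def stock_concentration(absorbance):
    """Invert the standard curve to estimate sperm per ml in the stock."""
    return (absorbance - intercept) / slope

def stock_volume_ml(stock_conc, target_conc, final_volume_ml=20.0):
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return target_conc * final_volume_ml / stock_conc

stock = stock_concentration(0.2)  # hypothetical absorbance reading
targets = {"low": 1e4, "medium": 1e5, "high": 1e6}  # sperm per ml
volumes = {label: stock_volume_ml(stock, conc) for label, conc in targets.items()}
for label, vol in volumes.items():
    print(f"{label}: {vol:.3f} ml stock + {20.0 - vol:.3f} ml filtered seawater")
```

Diluting each treatment independently from the stock, as in the text, means a pipetting error in one treatment does not propagate to the others, unlike a serial dilution.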

Immediately after the sperm stock was diluted and mixed, 4 ml of each sperm treatment was dispensed into two 5mm sterile plastic petri dishes. One drop of eggs from each female (~0.1ml) that had settled to the bottom of a glass beaker was added to each dish containing sperm. As well, eggs were added to two controls with only filtered seawater. Thus, the reciprocal crosses were replicated twice at each of three sperm concentrations and one control. All crosses were conducted in an incubation chamber held at ~12°C.
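To make the layout of a single day's cross concrete, the sketch below enumerates the dishes implied by this description. The labels are hypothetical, and it assumes the two sperm-free control dishes are set up per female, which the text does not state explicitly.

```python
from itertools import product

# Hypothetical labels for the four adults used in one day's cross
males = ["Bamfield_male", "Sandspit_male"]
females = ["Bamfield_female", "Sandspit_female"]
sperm_per_ml = {"low": 1e4, "medium": 1e5, "high": 1e6}
replicates = (1, 2)

# Every male x female pairing at every sperm concentration, in duplicate
dishes = [
    {"male": m, "female": f, "treatment": t, "replicate": r}
    for m, f, t, r in product(males, females, sperm_per_ml, replicates)
]

# Sperm-free controls (assumed: two dishes per female)
dishes += [
    {"male": None, "female": f, "treatment": "control", "replicate": r}
    for f, r in product(females, replicates)
]

n_controls = sum(d["treatment"] == "control" for d in dishes)
print(len(dishes), "dishes per cross, of which", n_controls, "are controls")
```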

Fertilization success

Fertilization success was scored at three stages after each fertilization as the proportion of embryos reaching a given stage, based on the first 100 eggs observed under a dissecting microscope. We measured (1) presence or absence of fertilization envelopes at 30 minutes post-insemination, (2) presence or absence of cleavage after 2 hours, and (3) development to the blastula stage after 24 hours. For each score, petri dishes were examined directly on the microscope and returned promptly to the incubation chamber. Within any single cross, fertilization success scored using one of these developmental stages was highly correlated with fertilization success scored from the other stages, so here we present results using just the final stage (proportion of blastula embryos after 24 hours). This measure integrates the effects of both successful fertilization encounters (between eggs and single sperm) and unsuccessful encounters (between eggs and multiple sperm leading to embryo death before blastula formation).

Bindin genotypes

After each fertilization experiment, tissue samples were collected from each male and female for genomic DNA extraction. Diploid bindin genotypes were obtained for every male and female, as described and reported in Sunday and Hart (in prep); samples from Bamfield and Sandspit analysed in this study were exactly those of the males and females in the present cross. Genomic bindin sequences of P. miniata have two exons, of 3135 and 174 nucleotide basepairs respectively, separated by a single intron of 1284 basepairs (Fig. 5.1). There is a single recombinant site between the first exon and the intron (Sunday and Hart, in prep). The first exon is highly variable with two repeat domain types (described below), while the second exon has almost no variation, and includes the 135-basepair invariant core (described in Patiño et al. 2009 and Sunday & Hart in prep; Fig. 5.1). We therefore considered only the variable first exon in the present analyses.

Characterization of repeat variation

There are two large repetitive domains in the first exon of P. miniata bindin, with high levels of polymorphism in repeat number (Fig. 5.1). One is a collagen-like repeat, made up of multiple KGKK(G/R)R repeats. Bindin alleles vary in the number of copies of this collagen-like domain (mode = 3 copies, range = 1 to 4), and copies also vary in length (mode number of KGKK(G/R)R repeats = 14, range = 9 to 15). The second repetitive domain is made up of multiple tandem repeats of a distinctive 30-35 amino acid motif (mode = 8 repeats, range = 3 to 9). Most copy number variation in both of these domain types occurs in the southern population, with only one repeat variant among northern males (Fig. 5.1).

Characterization of sites under positive selection

For the codon sites under positive selection, we identified each polymorphism as being either ancestral or derived using the ASR model in HyPhy (Kosakovsky Pond et al. 2006). This allowed us to test the specific hypothesis that derived sites provide a selective advantage in fertilization. We then scored each bindin haplotype by summing the number of derived states across each allele. This allowed us to test the combined effect of sites under selection under the simple assumption that they are additive, while avoiding the analytical issues of multiple tests, model over-parameterization, and non-independence of sites. We also tested the hypothesis that sites in which derived states were less frequent had a greater effect on fertilization success, by weighting the sum of derived states by the inverse frequency of each derived state. The sites under selection differed between the two populations (Sunday and Hart, in prep). Because we expected selection on males to be mediated by female affinity, the sites considered in each case were only those previously found to be under selection in the population source of the female of each cross.
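The scoring described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used in the study: haplotypes are coded as tuples of 0 (ancestral) and 1 (derived) at the selected sites, the example haplotypes are hypothetical, and derived-state frequencies are taken from the sample itself.

```python
def derived_score(haplotype, weights=None):
    """Sum of derived states (1 = derived, 0 = ancestral) across selected sites,
    optionally weighted per site."""
    if weights is None:
        weights = [1.0] * len(haplotype)
    return sum(w * state for w, state in zip(weights, haplotype))


def inverse_frequency_weights(haplotypes):
    """Per-site weight = 1 / (sample frequency of the derived state).
    Sites with no derived states in the sample get weight 0."""
    n = len(haplotypes)
    weights = []
    for site_states in zip(*haplotypes):
        freq = sum(site_states) / n
        weights.append(1.0 / freq if freq > 0 else 0.0)
    return weights


# Hypothetical haplotypes at four selected sites (illustration only).
haplotypes = [
    (1, 0, 0, 1),
    (1, 0, 0, 0),
    (1, 1, 0, 0),
    (0, 0, 0, 0),
]

weights = inverse_frequency_weights(haplotypes)
unweighted = [derived_score(h) for h in haplotypes]
weighted = [derived_score(h, weights) for h in haplotypes]
```

With inverse-frequency weighting, a haplotype carrying a rare derived state scores higher than one carrying only a common derived state, which is the contrast the weighted test is designed to capture.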

Models of bindin influence on fertilization

Dominance

It is generally unknown how the many bindin proteins expressed in a single male’s sperm interact with each other to form a bindin mass on the sperm head, whether both bindin alleles are expressed in each sperm of a heterozygous male, how the two bindin protein isoforms of heterozygotes may interact with each other (if both are expressed in each cell), and how the bindin mass interacts with egg surface molecules. This lack of knowledge prevents the development of a strong expectation about whether one or both bindin alleles are relevant to egg compatibility. We therefore explored two models by which diploid male genotypes may affect fertilization success: one in which the effect of each allele is additive, and the other in which the more derived allele is functionally dominant. The dominance model may be appropriate for ligands and receptors if only one allele (with the best match to the female phenotype) determines fertilization success, and is one mechanism by which polymorphisms may be maintained in a population under sexual conflict (Haygood 2004).
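The two models differ only in how a male's two allele scores are combined into one explanatory value. A minimal sketch, with hypothetical function names and example values; the choice of which allele counts as dominant (`max` for derived-state counts, `min` for allele length or copy number) is passed in explicitly as an assumption of the example:

```python
def additive_score(allele_a, allele_b):
    """Additive model: both alleles contribute to the male's score."""
    return allele_a + allele_b


def dominance_score(allele_a, allele_b, pick=max):
    """Dominance model: a single allele, chosen by `pick`, determines the score."""
    return pick(allele_a, allele_b)


# Hypothetical male carrying alleles with 1 and 3 derived states.
derived_counts = (1, 3)
additive = additive_score(*derived_counts)   # both alleles summed
dominant = dominance_score(*derived_counts)  # most derived allele only

# Hypothetical male with 8 and 6 tandem repeats, scored by the shortest allele.
shortest = dominance_score(8, 6, pick=min)
```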

Heterozygosity

Under the dominance model, we also considered the hypothesis that heterozygous males have an advantage in fertilization, because they are more likely to have a single match to female phenotypes. If so, we would expect heterozygous males to have greater success on average with the different females with which they are mated, and to have lower variability in fertilization success in each cross. We therefore calculated male heterozygosity at each repeat type and at sites under positive selection. For repetitive domains, we scored males as having the same or different numbers of collagen copies, or tandem repeat copies, across their two alleles. We also combined the collagen copy number and tandem repeat number and scored males as having the same or different haplotypes considering both repeat types simultaneously (but ignoring other SNP variation in the gene). For the sites under selection, we took a sum for each male of the number of sites under selection at which he was heterozygous.

Male-female bindin matching

We tested the hypothesis of greater fertilization success when male and female genotypes are matching, as has been demonstrated in three species of sea urchins to date (Palumbi 1999; Levitan & Ferrell 2006; Levitan & Stapper 2009). Although there is no evidence that females express bindin (Gao et al. 1986; M. Hart unpublished), a history of assortative mating can lead to linkage disequilibrium between female-expressed loci involved in sperm-egg compatibility (such as the egg bindin receptor) and male-expressed loci such as bindin (Palumbi 1999; Tomaiuolo & Levitan 2010).

We scored each male-female cross for similarity at the three variables of interest. For each repeat type, we scored each male-female cross as having a full match (pairs consisting of two homozygotes with the same number of copies, or two heterozygotes for the same pair of copy numbers); a zero match (pairs of homozygotes or heterozygotes that share no alleles with the same copy numbers); or a partial match (pairs including at least one heterozygote that share one but not two alleles with the same number of copies). We also combined the repeat types, considering the number of repeats of each together as a single haplotype, and scored for full, partial, or zero matches of this haplotype. For the codons under selection, we scored each site for full, partial, or zero matches, and took a sum of the number of matches at these codons for each cross. Because we only expect genotype matching where assortative mating has led to linkage disequilibrium between sperm bindin alleles and female receptors, we looked for an effect of genotype matching only in crosses within populations.
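The three match categories defined above can be expressed compactly. This is a sketch under the assumption that each diploid genotype is coded as a pair of copy numbers; the function name and example genotypes are hypothetical.

```python
def match_level(male, female):
    """Classify a male-female pair as a 'full', 'partial', or 'zero' match.

    male, female: pairs of copy numbers, one per allele (diploid genotype).
    """
    m, f = sorted(male), sorted(female)
    if m == f:
        # Same homozygous genotype, or heterozygous for the same pair of alleles.
        return "full"
    if set(m) & set(f):
        # At least one heterozygote; one allele shared but not both.
        return "partial"
    return "zero"


examples = {
    "hom_same": match_level((3, 3), (3, 3)),
    "het_same": match_level((2, 3), (3, 2)),
    "hom_het": match_level((3, 3), (2, 3)),
    "het_het_one_shared": match_level((2, 3), (3, 4)),
    "hom_diff": match_level((2, 2), (3, 3)),
}
```

The same function applies to the combined haplotype if each allele is coded as a tuple such as (collagen copies, tandem repeats) instead of a single copy number.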

Gamete measurements

We measured physical properties of gametes that might covary with fertilization rate. Eggs from 18 of the females used in the crosses were obtained from a separate spawning induction. Samples were mounted on slides with elevated coverslips in filtered seawater stained with Sumi ink to visualize the edges of transparent egg jelly coats as well as egg diameter (Schroeder 1980). Digital photos were taken and analysed using ImageJ software (Rasband 1997-2009) calibrated with a micrometer. For 20 eggs from each female, we made two perpendicular measurements of egg diameter, and of egg plus jelly coat diameter.

To measure sperm morphology, sperm from all males used in the crosses were obtained in a separate spawning induction, fixed in 10% formalin, and mounted on a glass slide. Photos were taken on an inverted spinning-disc confocal microscope (Zeiss Axio Observer) at 400x magnification. As sperm heads of P. miniata are spherical, we measured sperm head diameter, using Volocity Quantitation (4.3.2) software.

We also measured sperm swimming velocity in a subset of males (4 from each population). This sample size was smaller because our lab population of males experienced high mortality from a castrating gonadal parasite (Sunday et al. 2008). After ~12 months of parasite-free culture in the lab colony, remaining males from the fertilization experiment were induced to spawn, and sperm were collected dry on ice. Sperm stocks were serially diluted in filtered seawater to a final dilution factor of 10⁻⁴, and 4 ml of this solution was immediately mounted in a glass-bottom petri dish on the inverted microscope. A plane of focus in the center of this liquid (~2 mm thick) was recorded at 9 frames per second for two minutes. We moved the stage vertically and horizontally every 10 seconds to avoid double-counting sperm moving in circular paths.

All experiments were conducted at room temperature, and images were taken immediately after mounting dishes under the light source, to avoid advection within petri dishes during the experiment. Advection was easily identifiable by the movement of non-swimming sperm, and all subsequent images were discarded once we detected it. Velocity of active sperm was measured as the total distance covered across a minimum of 6 frames within the plane of focus, using ImageJ (Rasband 1997-2009) software. We measured a minimum of 20 sperm per male.
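The velocity measurement can be sketched as path length divided by elapsed time at the 9 frames per second recording rate. This is an illustrative calculation, not the ImageJ procedure itself; the function name, the micrometre units, and the example track are assumptions.

```python
import math

FRAME_RATE_HZ = 9.0  # recording rate stated in the text


def path_velocity(positions, frame_rate_hz=FRAME_RATE_HZ, min_frames=6):
    """Mean swimming velocity along a tracked path.

    positions: list of (x, y) coordinates, one per consecutive frame,
    in micrometres (assumed unit). Returns micrometres per second.
    """
    if len(positions) < min_frames:
        raise ValueError(f"need at least {min_frames} frames per track")
    # Total path length summed over consecutive frame-to-frame steps.
    distance = sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))
    elapsed_s = (len(positions) - 1) / frame_rate_hz
    return distance / elapsed_s


# Hypothetical track: seven frames, each step 5 µm long (3-4-5 triangle).
track = [(3.0 * i, 4.0 * i) for i in range(7)]
velocity = path_velocity(track)
print(f"{velocity:.1f} µm/s")
```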

Analysis

Differences in fertilization success among populations

We tested for localized coevolution of overall fertilization affinity using a linear model with population source of males, females, and their interaction, as fixed effects on proportion of eggs successfully developed to the blastula stage. With this approach, a positive interaction would indicate a disproportionately greater rate of fertilization success between males and females from the same population. We fitted a generalized linear model using a binomial error distribution for proportion data (Crawley 2007). To account for the effect of crossing each male and female to two different mates (one from the same population source, and one from the other population), we included male and female identity as crossed random effects. We also included sperm concentration treatment as a fixed effect. We tested that model assumptions were met by plotting model residuals against each variable and fitted values (Zuur et al. 2009). Because fertilization success was close to saturation (100%) in many crosses at both the medium and high sperm treatment (see Fig. 5.1), inclusion of the high (10⁶ ml⁻¹) treatment introduced a non-linearity that led to violation of the assumption of homoscedasticity, and we therefore excluded the high treatment from these tests. Models with and without inclusion of the high treatment were quantitatively similar (see Appendix D).

Fertilization success = F(male population + female population + male population × female population + sperm concentration; random = female i.d., male i.d.)
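Written out term by term, a binomial mixed model of this form might be expressed as follows. The notation is introduced here for illustration only, and the logit link is an assumption (it is the canonical link for binomial errors, but the link function is not stated in the text). Here y is the number of blastulae among the n eggs scored for male i, female j, and sperm treatment k; M and F are indicators of male and female population source; S is the sperm concentration treatment; and u and v are the crossed random effects of male and female identity.

```latex
\begin{aligned}
y_{ijk} &\sim \mathrm{Binomial}(n_{ijk},\, p_{ijk}) \\
\operatorname{logit}(p_{ijk}) &= \beta_0 + \beta_1 M_i + \beta_2 F_j + \beta_3 M_i F_j + \beta_4 S_k + u_i + v_j \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_j \sim \mathcal{N}(0, \sigma_v^2)
\end{aligned}
```

Under this formulation, localized coevolution corresponds to a positive interaction coefficient, \beta_3 > 0.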

Differences in fertilization success among bindin genotypes

We tested for effects of male bindin genotype, male heterozygosity, and amount of genotypic matching between males and females on fertilization success. We first tested for effects in within-population crosses, and next asked if variables found to be important within populations had the same or different effects across populations. We also tested for an effect of male heterozygosity on variability in fertilization success across females.

Because of the large number of potential explanatory variables relative to number of observations, we ran multiple single-variable models and compared them using the Akaike Information Criterion (adjusted for small sample sizes; AICc). This allowed identification of genotype variables which best explained variation in fertilization success. The single-variable approach ignores the possibility of additive effects and interactions among variables. We also explored a large number of more complex models with multiple variables and interactions, but considered them to be potentially confounded by over-parameterization relative to the small number of observations (n=10 crosses), especially given the uneven sampling across bindin genotypes, because we sampled males and females randomly with respect to bindin traits.
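For reference, the small-sample correction adds a penalty term to AIC that grows as the number of parameters k approaches the number of observations n. A minimal sketch of the comparison, using the standard AICc formula; the model names, log-likelihoods, parameter counts, and sample size below are hypothetical and are not results from this study.

```python
def aicc(log_likelihood, k, n):
    """AICc = -2 ln(L) + 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical single-variable models: (log-likelihood, number of parameters).
models = {
    "collagen_heterozygosity": (-20.0, 4),
    "shortest_allele_length": (-22.5, 4),
    "null": (-30.0, 3),
}
n_obs = 20  # hypothetical number of observations

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in models.items()}
best = min(scores.values())
delta = {name: score - best for name, score in scores.items()}

for name in sorted(delta, key=delta.get):
    print(f"{name}: AICc = {scores[name]:.2f}, delta = {delta[name]:.2f}")
```

Because the correction depends on n and k only, models with equal numbers of parameters fitted to the same observations are ranked by their log-likelihoods alone.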

Because almost all of the variation in fertilization success occurred at low sperm concentration in southern males and at medium sperm concentration in northern males (Fig. 5.2), we used fertilization success at these different treatments to test genotype effects within each population. This maximized our power to detect the influence of traits within populations.

Results

Effect of population source on fertilization success

At both low and medium sperm concentrations, southern males had greater fertilization success than northern males, indicating an overall greater fertilization efficiency in southern males (Table 5.1, Fig. 5.2). Southern males also had significantly greater fertilization success with females from their own population than with females from the northern population at these same sperm concentrations (significant interaction effect in Table 5.1). However, the opposite was not true: northern males did not have significantly greater fertilization success with females from either population (Table 5.1, Fig. 5.2). At high sperm concentration, there was no difference in performance (Fig. 5.2).

As expected, fertilization success increased with 10-fold increases in sperm concentration in both populations. The higher fertility of the southern sperm was of approximately the same order of magnitude as this 10-fold increase in sperm concentration: low and medium sperm concentration treatments of southern males qualitatively resembled the medium and high sperm treatments of northern males, respectively (Fig. 5.2).

Effect of genotype on fertilization success

Many of the models showed significant effects, but varied greatly in their AICc scores and in the extent to which they explained variability in fertilization success, from a minimum of 0 to a maximum of 53% of deviance explained (Table 5.2). We therefore highlight those models with the lowest AICc, in which most deviance was explained. In Table 5.2, we highlight those top models that explain greater than 15% of residual deviance.
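The text does not define how deviance explained was calculated. Assuming the usual definition for generalized linear models (one minus the ratio of residual to null deviance), it can be sketched as below; the deviance values are hypothetical and chosen only to reproduce a 53% figure.

```python
def deviance_explained(null_deviance, residual_deviance):
    """Proportion of deviance explained: 1 - residual / null (assumed definition)."""
    if null_deviance <= 0:
        raise ValueError("null deviance must be positive")
    return 1.0 - residual_deviance / null_deviance


# Hypothetical deviances (illustration only).
explained = deviance_explained(null_deviance=100.0, residual_deviance=47.0)
print(f"{explained:.0%} of deviance explained")
```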

Considering within-population crosses, bindin genotype had a greater effect on fertilization success in the southern population. Heterozygosity in collagen copy number had the best fit: males with one short (1 or 2 copies) and one long allele (3 copies) had greater fertilization success than males with the same copy number in both alleles (Fig. 5.3a, Table 5.2). Male collagen heterozygosity had the same effect when southern males were crossed with northern females (Fig. 5.3b, Table 5.2). We could not test for a similar effect among northern males because they were all homozygous for collagen copy number. Despite the good fit of heterozygosity in collagen copy number in explaining fertilization success, variability in fertilization success was not greater in homozygotes (Fig. 5.3c, Table 5.3), as would be expected if the heterozygote advantage were driven by a greater sampling probability of high-affinity alleles.

The next best model within populations was length of a male’s bindin allele, which was associated with increased fertilization success in the southern population (Table 5.2). Most of the models capturing repeat number variation and allele length showed the same effect: shorter alleles had greater success. But the best-fit models were those under the dominance model, where the allele with the shortest length, or fewest number of copies, best explained fertilization success. Considering all crosses, the effect of shortest allele was strongest when males from either population were crossed with females from the southern population (Table 5.2, Fig. 5.4), but was not detected when males from either population were crossed with northern females. The affinity for shorter alleles therefore appears to be unique to southern females.

In addition, the level of male-female matching at repeat genotypes was associated with greater fertilization success in the southern population, when both repeat types were simultaneously taken into account (Table 5.3). This was the sole model in which male-female matching had a reasonably good model fit.

The sum of derived states at sites under positive selection also explained variation in fertilization success with good model fits (Table 5.2). However, the effect was in the opposite direction to the expectation if these alleles were selected by female preference: males with more derived states had lower fertilization success (Fig. 5.5). This effect was also apparent when we weighted the tally of derived states by the inverse frequency of each state within the population. When only the allele with the most derived sites was considered (dominance model), the within-population effect was not as strong (Table 5.2).

We did not detect an overall effect of heterozygosity or male-female genotype matching at sites under positive selection (Table 5.3). However, in the single cross in which the male and female shared three rare derived states, fertilization success was greater than expected given the relationship with coding region length (Fig. 5.4).

Gamete traits

There was no significant difference in egg diameter, egg jelly coat, or sperm head diameter between the populations (see Appendices E and F).

There was a significant difference in sperm velocity between the two populations. Southern males had significantly higher sperm velocities than northern males, although only four males from each population were assayed (Fig. 5.6a). Fertilization success did not increase with sperm velocity within populations, but did increase with sperm velocity across populations (Fig. 5.6b).

Discussion

Our results show significant localized reproductive coevolution in Patiria miniata. The signal is asymmetric, suggesting that reproductive coevolution between male and female gametes has occurred in the southern population of Bamfield, but not in the northern population of Sandspit, Haida Gwaii. We detected this signal despite a large difference in overall fertilization efficiency between males of the two populations, with greater efficiency, and greater sperm velocities, in the males of the southern population. We also found that indel variation in the male bindin locus is related to fertilization success when matched with southern, but not northern, females. Below we consider alternative mechanisms for the difference in fertilization efficiencies between southern and northern males, and for the variation in fertilization success according to bindin genotype within and across populations.

The much greater fertilization efficiency of southern males with females from either population suggests it is due to a fixed character that does not interact with the female. Sperm velocity was greater in southern males, and may in part be responsible for this difference. However, the difference in sperm velocity was probably not great enough to explain the difference in fertilization rate according to a fertilization kinetics model (Vogel et al. 1982). There were no fixed differences in bindin genotypes between the populations, but the most consistent difference was in repeat number or coding region length. However, this alone cannot explain the difference in performance between the male populations, as the two southern males that were homozygous for the mode number of repeats (the two longest southern alleles in Fig. 5.3) had greater fertilization success than any of the northern males with the same repeat genotype. It is possible that sperm velocity and repeat genotype together explain the difference between male populations, or that there is another, unmeasured phenotypic difference between the populations that can explain their overall difference. Nevertheless, the differences in both sperm velocity and fertilization efficiency suggest that sperm competition is greater in the southern population.

The finding of greater fertilization success among southern males that were heterozygous for collagen copy number matched our prediction that males with two types of bindin have a greater chance of matching female phenotypes if they are polymorphic in the population. This finding is consistent with theoretical models showing that a heterozygote advantage can explain the maintenance of male ligand polymorphism in a population (Haygood 2004; Palumbi 2009). However, this mechanism also predicts that homozygous males should have greater variability in fertilization success among female mates (one from either population), under the expectation that homozygous males will sometimes match and sometimes not match particular female phenotypes. This prediction was not met in our data: variability in fertilization success was not greater among males homozygous for collagen copy number. With such a small sample size, however, it is possible that each homozygous male was, by chance, only mated with non-matching females. A more powerful test of this prediction would come from an experiment in which each male is mated with a larger sample of females.

The alternative set of models with good support showed that males with fewer repeats under the dominance model had greater fertilization success. Across populations, this advantage appeared to be confined to crosses with southern females. Southern males with shorter alleles had greater success only with southern females, and not with northern females. In addition, the single northern male with a tandem repeat deletion had a greater advantage when paired with a southern female (Fig. 5.4). This mechanism can also explain the pattern of reproductive coevolution apparent in southern crosses: southern females have an affinity for males with shorter alleles or fewer repeats, and more southern males have these types of alleles. This mechanism is also supported by the pattern of genotype matching in the southern population: males and females with matching repeat genotypes had greater fertilization success. This suggests that assortative mating has led to linkage disequilibrium between female receptor and bindin genotype (Palumbi 1999; Tomaiuolo & Levitan 2010), and is difficult to explain without a role for polymorphic female receptors in shaping selection on male bindin variation. We therefore find support for the existence of population- and female-specific affinities for males with fewer repeat copies in their bindin alleles.

Counter to our expectation, we did not observe improved fertilization success among males with more derived states at sites under positive selection. However, our analyses may have lacked the power to detect their effects for two reasons. First, variation at sites under positive selection may have subtle effects relative to the role of indel variation on fertilization success. We did not have enough observations to conduct multi-factorial tests, and the effect of variation at these sites may not be observable with single-factor models. Second, while these codon sites were identified as having greater-than-expected changes at the amino acid level and therefore are likely to be maintained by selection, most of the derived states occurred at low frequencies in the population. If female receptors exist that select for males with these mutations in either population, we expect that they are not common and therefore may not have been well represented in our sample of females. Both of these issues could be resolved with greater sampling in future fertilization experiments, with many more crosses between males and females.

While the no-competition crosses conducted here allow observation of fertilization affinities, the conditions do not mimic those of natural spawning events in which multiple males may compete for females. Under male-competitive conditions, particular male genotypes may have precedence not apparent from these experiments (Geyer & Palumbi 2003). In addition, a competitive experiment is needed to reveal how sexual selection may favour certain male genotypes at high sperm densities. Natural populations of P. miniata exist in relatively dense aggregations in Bamfield (2.6-3.5 individuals m⁻², Rumrill 1989), and California (1.9 individuals m⁻², Schroeter et al. 1983), which are within the density range that is associated with strong sperm competition and frequent polyspermy in sea urchins (Levitan & Ferrell 2006). A competitive experiment conducted under such polyspermic conditions may reveal selection for male genotypes contingent on the frequency of male genotypes in the spawning population, as well as the identity of females (Levitan & Ferrell 2006; Levitan & Stapper 2009). Nevertheless, our results provide the first evidence of functional polymorphism in sea star bindin, as well as a window into how selection has occurred within this species. One or more female traits that favour male bindin alleles with fewer repeats appear to have been selected in the southern population. This affinity may have been selected because of greater polyspermy avoidance, at a time when shorter bindin alleles were relatively rare. Our results suggest that this selection has occurred locally in the southern population, where sperm competition appears to be stronger.

Shorter bindin alleles with fewer repeats in either of the two repeat domains appear to be under sexual selection in the southern population, and have occurred multiple times within the gene tree (Sunday & Hart in prep). However, these repeats also represent historical duplications, suggesting that female preference for shorter bindin alleles may not always have existed, and may previously have been neutral, or in the opposite direction. Variation in repeat number may represent an easily variable molecular polymorphism on which evolving sexual selection can act (Vacquier et al. 1995; Podlaha et al. 2005). Indeed, repeat copy variation appears to be associated with positive selection in broadcast spawning species (Palumbi 1999; Levitan & Stapper 2009; Sunday & Hart in prep), as well as in sperm proteins of Drosophila (Schully & Hellberg 2006), rodents (Podlaha et al. 2005), and primates (Podlaha & Zhang 2003). Our findings provide further experimental support that indel variation is subject to sexual selection that may vary over time or space.

The asymmetric pattern of compatibility between populations of P. miniata is in keeping with the three previous intraspecific cross-fertilization studies known, in which a signal of incompatibility was greater in one direction of the reciprocal cross than in the other (Biermann & Marks 2000; Styan et al. 2008; Zhang et al. 2010). It is also in keeping with patterns of asymmetric prezygotic isolation among species of marine broadcast spawners, in which experimental observations of asymmetric hybridization appear to be the rule rather than the exception (e.g. Strathmann 1981; Harper & Hart 2005; Zigler et al. 2005; Lessios 2007). In cross-fertilizations between sea urchin species pairs, Zigler et al. (2005) identified two examples of highly asymmetric incompatibilities. In both cases, the more discriminating female was the species with the more ‘derived’ form of male bindin (Heliocidaris erythrogramma and Echinometra lucunter) while the less discriminating female was the species with a more ‘ancestral’ form. In P. miniata, the southern females were more discriminating for males from their own population compared to the northern females, and the southern population had more derived variation (Sunday & Hart in prep). Bindin from the southern population also had more sites under positive selection and greater repeat polymorphism, suggesting it is more derived than bindin from the northern population. Our results within species are therefore consistent with patterns previously found across species. Our findings go one step further by demonstrating a greater affinity among southern females for males with shorter bindin alleles, and suggest a plausible route by which species differences might have arisen through asymmetric coevolution of male and female traits in some but not all populations.

If the divergence observed between these two populations represents incipient speciation, it is at an early stage. At present, secondary contact between these populations would likely lead to a spread of southern male genotypes in the population, as southern males appear to have a competitive advantage in spawning events. However, experiments in competitive spawning scenarios that mimic natural conditions are needed to test this. For example, if sperm are highly limiting in natural spawns in the northern population, then sperm longevity, rather than velocity, may be more advantageous for male fertilization success. Sperm longevity is known to trade off with sperm velocity in sea urchins (Levitan 2000) and thus males in the north may have an advantage in terms of longevity, which was not measured in the present experiment. Molecular analysis of non-coding markers suggests that gene flow has been low and asymmetrical between these populations, with greater rates of migration from the north to south (McGovern et al. 2010). Simulations of oceanographic dispersal trajectories support this finding, showing low migration from the southern coastal region of British Columbia to Haida Gwaii, but no migration from Haida Gwaii to the southern coastal region (Sunday et al., in prep). This asymmetrical migration may in part explain why southern male genotypes have not spread to the northern population. If females select males with particular novel or rare male genotypes in the southern population but not in the northern population, then both female receptors and bindin alleles may become more derived in the southern population. Over time, the signature may become similar to the asymmetric species crosses observed in species pairs of sea urchins.

Conclusion

We found a significant pattern of localized reproductive coevolution, and a population-specific fertilization advantage for male bindin alleles that are more frequent in the southern population. We also discovered a large difference in overall fertilization efficiency between males of either population, which suggests that sperm competition, and possibly sexual conflict, is stronger in the southern population, where effective population size is also greater. Our findings are consistent with species-level compatibility experiments, and suggest a way in which these species-level differences may have arisen through sexual selection within populations.

References

Arnqvist G, Edvardsson M, Friberg U, Nilsson T (2000) Sexual conflict promotes speciation in insects. Proceedings of the National Academy of Sciences of the United States of America 97, 10460-10464.

Biermann CH (1998) The molecular evolution of sperm bindin in six species of sea urchins (Echinoida: Strongylocentrotidae). Molecular Biology and Evolution 15, 1761-1771.

Biermann CH, Marks JA (2000) Geographic divergence of gamete recognition systems in two species in the sea urchin genus Strongylocentrotus. Zygote 8, S86-S87.

Crawley MJ (2007) The R Book. Chichester: John Wiley & Sons Ltd.

Galindo BE, Vacquier VD, Swanson WJ (2003) Positive selection in the egg receptor for abalone sperm lysin. Proceedings of the National Academy of Sciences, USA 100, 4639-4643.

Gasper J, Swanson WJ (2006) Molecular population genetics of the gene encoding the human fertilization protein zonadhesin reveals rapid adaptive evolution. American Journal of Human Genetics 79, 820-830.

Gavrilets S (2000) Rapid evolution of reproductive barriers driven by sexual conflict. Nature 403, 886-889.

Gavrilets S, Waxman D (2002) Sympatric speciation by sexual conflict. Proceedings of the National Academy of Sciences, USA 99, 10533-10538.

Geyer LB, Palumbi SR (2003) Reproductive character displacement and the genetics of gamete recognition in tropical sea urchins. Evolution 57, 1049-1060.

Harper FM, Hart MW (2005) Gamete compatibility and sperm competition affect paternity and hybridization between sympatric Asterias sea stars. Biological Bulletin 209, 113-126.

Haygood R (2004) Sexual conflict and protein polymorphism. Evolution 58, 1414-1423.

Hellberg ME, Vacquier VD (1999) Rapid evolution of fertilization selectivity and lysin cDNA sequences in teguline gastropods. Molecular Biology and Evolution 16, 839-848.

Holland B, Rice WR (1998) Perspective: Chase-away sexual selection: Antagonistic seduction versus resistance. Evolution 52, 1-7.

Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW (2006) Automated phylogenetic detection of recombination using a genetic algorithm. Molecular Biology and Evolution 23, 1891-1901.

Lee Y, Vacquier VD (1992) The divergence of species-specific abalone sperm lysins is promoted by positive Darwinian selection. Biological Bulletin 182, 97-104.

Lessios HA (2007) Reproductive isolation between species of sea urchins. Bulletin of Marine Science 81, 191-208.

Levitan DR (2000) Sperm velocity and longevity trade off each other and influence fertilization in the sea urchin Lytechinus variegatus. Proceedings of the Royal Society of London Series B-Biological Sciences 267, 531-534.

Levitan DR, Ferrell DL (2006) Selection on gamete recognition proteins depends on sex, density, and genotype frequency. Science 312, 267-269.

Levitan DR, Stapper AP (2009) Simultaneous positive and negative frequency-dependent selection on sperm bindin, a gamete recognition protein in the sea urchin Strongylocentrotus purpuratus. Evolution 64, 785-797.

McCartney MA, Lessios HA (2004) Adaptive evolution of sperm bindin tracks egg incompatibility in neotropical sea urchins of the genus Echinometra. Molecular Biology and Evolution 21, 732-745.

McGovern TM, Keever CC, Saski CA, Hart MW, Marko PB (2010) Divergence genetics analysis reveals historical population genetic processes leading to contrasting phylogeographic patterns in co-distributed species. Molecular Ecology 19, 5043-5060.

Metz EC, Palumbi SR (1996) Positive selection and sequence rearrangements generate extensive polymorphism in the gamete recognition protein bindin. Molecular Biology and Evolution 13, 397-406.

Moy GW, Springer SA, Adams SL, Swanson WJ, Vacquier VD (2008) Extraordinary intraspecific diversity in oyster sperm bindin. Proceedings of the National Academy of Sciences of the United States of America 105, 1993-1998.

Palumbi SR (1999) All males are not created equal: Fertility differences depend on gamete recognition polymorphisms in sea urchins. Proceedings of the National Academy of Sciences of the United States of America 96, 12632-12637.

Palumbi SR (2009) Speciation and the evolution of gamete recognition genes: pattern and process. Heredity 102, 66-76.

Patiño S, Aagaard JE, MacCoss MJ, Swanson WJ, Hart MW (2009) Bindin from a sea star. Evolution and Development 11, 376-381.

Podlaha O, Webb DM, Tucker PK, Zhang JZ (2005) Positive selection for indel substitutions in the rodent sperm protein Catsper1. Molecular Biology and Evolution 22, 1845-1852.

Podlaha O, Zhang JZ (2003) Positive selection on protein-length in the evolution of a primate sperm ion channel. Proceedings of the National Academy of Sciences of the United States of America 100, 12241-12246.

Rasband WS (1997-2009) ImageJ. Bethesda: U. S. National Institutes of Health.

Rice WR (1998) Intergenomic conflict, interlocus antagonistic coevolution and the evolution of reproductive isolation. In Endless Forms: Species and Speciation (ed. Howard DJ, Berlocher SH), pp. 261-270: Oxford University Press.

Rice WR, Holland B (1997) The enemies within: intergenomic conflict, interlocus contest evolution (ICE), and the intraspecific Red Queen. Behavioral Ecology and Sociobiology 41, 1-10.

Riginos C, McDonald JH (2003) Positive selection on an acrosomal sperm protein, M7 lysin, in three species of the mussel genus Mytilus. Molecular Biology and Evolution 20, 200-207.

Riginos C, Wang D, Abrams AJ (2006) Geographic variation and positive selection on M7 lysin, an acrosomal sperm protein in mussels (Mytilus spp.). Molecular Biology and Evolution 23, 1952-1965.

Rumrill SS (1989) Population size structure, juvenile growth, and breeding periodicity of the sea star Asterina miniata in Barkley Sound, British Columbia. Marine Ecology-Progress Series 56, 37-47.

Schroeder TE (1980) The jelly canal marker of polarity for sea urchin oocytes, eggs and embryos. Experimental Cell Research 128, 490–494.

Schroeter SC, Dixon J, Kastendiek J (1983) Effects of the starfish Patiria miniata on the distribution of the sea urchin Lytechinus anamesus in a southern Californian kelp forest. Oecologia 56, 141-147.

Schully SD, Hellberg ME (2006) Positive selection on nucleotide substitutions and indels in accessory gland proteins of the Drosophila pseudoobscura subgroup. Journal of Molecular Evolution 62, 793-802.

Springer SA, Crespi BJ (2007) Adaptive gamete-recognition divergence in a hybridizing Mytilus population. Evolution 61, 772-783.

Strathmann RR (1981) On barriers to hybridization between Strongylocentrotus droebachiensis (O.F. Muller) and Strongylocentrotus pallidus (G.O. Sars). Journal of Experimental Marine Biology and Ecology 55, 39-47.

Styan CA, Kupriyanova E, Havenhand JN (2008) Barriers to cross-fertilization between populations of a widely dispersed polychaete species are unlikely to have arisen through gametic compatibility arms races. Evolution 62, 3041-3055.

Sunday J, Raeburn L, Hart MW (2008) Emerging infectious disease in sea stars: castrating ciliate parasites in Patiria miniata. Diseases of Aquatic Organisms 81, 173-176.

Sunday JM, Hart MW (in prep) Divergent positive selection across populations in a reproductive compatibility locus.

Sutton KA, Wilkinson MF (1997) Rapid evolution of a homeodomain: Evidence for positive selection. Journal of Molecular Evolution 45, 579-588.

Swanson WJ, Vacquier VD (2002) The rapid evolution of reproductive proteins. Nature Reviews Genetics 3, 137-144.

Tomaiuolo M, Levitan DR (2010) Modeling how reproductive ecology can drive protein diversification and result in linkage disequilibrium between sperm and egg proteins. American Naturalist 176, 14-25.

Turner LM, Hoekstra HE (2008) Causes and consequences of the evolution of reproductive proteins. International Journal of Developmental Biology 52, 769-780.

Vacquier VD, Swanson WJ, Hellberg ME (1995) What have we learned about sea urchin sperm bindin. Development Growth & Differentiation 37, 1-10.

Vacquier VD, Swanson WJ, Lee YH (1997) Positive Darwinian selection on two homologous fertilization proteins: What is the selective pressure driving their divergence? Journal of Molecular Evolution 44, S15-S22.

Vogel H, Czihak G, Chang P, Wolf W (1982) Fertilization kinetics of sea urchin eggs. Mathematical Biosciences 58, 189-216.

Zhang HB, Scarpa J, Hare MP (2010) Differential Fertilization Success Between Two Populations of Eastern Oyster, Crassostrea virginica. Biological Bulletin 219, 142-150.

Zigler KS, McCartney MA, Levitan DR, Lessios HA (2005) Sea urchin bindin divergence predicts gamete compatibility. Evolution 59, 2399-2404.

Zuur AF, Ieno EN, Walker NJ, Saveliev AA, Smith GM (2009) Mixed Effects Models and Extensions in Ecology with R. New York: Springer.

Tables and Figures

Table 5.1. Results of linear mixed effects model showing effect of male and female population source on fertilization success. Data from both the low and medium sperm concentrations are included. Contrasts show the effect of southern males and females relative to northern males and females, and the effect of increased sperm concentration (medium, compared to the reference level, low). Coefficients and standard errors are on the logit scale of fertilization success.

Term                             Coefficient   Std. Error   z value   P-value
Intercept (N)                    0.01          0.316        0.03      0.98
male population (S)              2.26          0.409        5.54      3.02E-08 ***
female population (S)            0.098         0.203        0.48      0.63
male pop (S) x female pop (S)    0.368         0.12         3.07      0.002 **
sperm treatment (medium)         2.74          0.07         39.2      < 2e-16 ***
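Because the coefficients in Table 5.1 are on the logit scale, they can be back-transformed to approximate fertilization probabilities for each type of cross. The Python sketch below is illustrative only: it uses the rounded fixed-effect estimates from the table, ignores the random effects of the mixed model, and its function and variable names are hypothetical rather than part of the original analysis.

```python
import math

# Fixed-effect estimates copied from Table 5.1 (logit scale, rounded).
# The intercept is the reference cross: northern male x northern female
# at the low sperm concentration.
COEF = {
    "intercept": 0.01,
    "male_S": 2.26,
    "female_S": 0.098,
    "male_S:female_S": 0.368,
    "sperm_medium": 2.74,
}


def inv_logit(x):
    """Back-transform a logit value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_success(male_south, female_south, medium_sperm=False):
    """Approximate fertilization probability from the fixed effects only."""
    eta = COEF["intercept"]
    if male_south:
        eta += COEF["male_S"]
    if female_south:
        eta += COEF["female_S"]
    if male_south and female_south:
        eta += COEF["male_S:female_S"]
    if medium_sperm:
        eta += COEF["sperm_medium"]
    return inv_logit(eta)


if __name__ == "__main__":
    for male_south in (False, True):
        for female_south in (False, True):
            label = "{} male x {} female".format(
                "S" if male_south else "N", "S" if female_south else "N"
            )
            p = predicted_success(male_south, female_south)
            print(f"{label}: {p:.3f} (low sperm concentration)")
```

On this reading, the male population term accounts for most of the difference among crosses at the low sperm concentration, with the interaction term adding a smaller increase for southern males crossed with southern females.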

Table 5.2. Generalized linear model results for fertilization success.

Table 5.3. Generalized linear model results for fertilization success based on male and female genotype similarity. Table shows estimated coefficients (slopes), Akaike information criteria, corrected for small sample sizes (AICc), and residual deviance explained (dev.) for each model. Columns indicate type of fertilization cross used in each model. Grey areas highlight top models with greater than 15% deviance explained.

                                      Within populations
                                      southern male x southern female    northern male x northern female
Trait                                 coef.    AICc    dev.              coef.    AICc    dev.
Male and female similarity score
  matching collagen                   0.15     195     6.22              -0.22    360     1.3
  tandem repeat matching              0.11     196     5.68              -0.63    333     9.5
  matching at collagen & tandem       0.21     181     16.38             -0.46    333     9.9
  matching at sites under selection   0.16     196     6.17              -0.58    351     4.1
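The AICc values in Table 5.3 can be read with the standard small-sample correction in mind. The Python sketch below shows that correction and ranks the four southern-cross models by their difference from the lowest AICc. It is a sketch under stated assumptions: the function names are hypothetical, the standard formula AICc = AIC + 2k(k + 1)/(n - k - 1) is assumed, and the numbers of parameters and observations used to produce the table are not reported here, so only the published AICc values are ranked.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Akaike information criterion with the small-sample correction.

    Assumes the standard form AICc = AIC + 2k(k + 1) / (n - k - 1).
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = 2.0 * n_params - 2.0 * log_likelihood
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def delta_aicc(scores):
    """Difference between each model's AICc and the lowest AICc."""
    best = min(scores.values())
    return {name: value - best for name, value in scores.items()}


# AICc values for the southern male x southern female models in Table 5.3.
SOUTHERN_AICC = {
    "matching collagen": 195,
    "tandem repeat matching": 196,
    "matching at collagen & tandem": 181,
    "matching at sites under selection": 196,
}

if __name__ == "__main__":
    ranked = sorted(delta_aicc(SOUTHERN_AICC).items(), key=lambda item: item[1])
    for name, delta in ranked:
        print(f"{name}: delta AICc = {delta}")
```

The model that combines matching at collagen-like and tandem repeats has the lowest AICc among the southern crosses, which agrees with it being the only model in the table with more than 15% deviance explained.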

Table 5.4. Linear model results for effect of male heterozygosity on variance in fertilization success among the females with which he was mated. Columns indicate type of fertilization cross used in each model.

                                                 Southern males                 Northern males
Trait                                            coef       t       p-value     coef      t        p-value
Heterozygosity in collagen-like copy number      -0.00047   -0.09   0.93        -         -        -
Heterozygosity in tandem repeat number           0.0047     0.88    0.41        -0.0088   -0.181   0.86
Heterozygosity in collagen-like copy and
  tandem repeat number                           -0.00047   -0.09   0.93        -0.0088   -0.181   0.86
Heterozygosity at sites under selection          -0.0028    -1.22   0.26        -0.015    -0.951   0.37

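[Figure 5.1: panels (a) and (b), bindin alignment schematics for the northern (N) and southern (S) populations, labelled with the core, tandem repeats, collagen-like repeats, intron, and asterisks at sites under selection; panel (c), map showing British Columbia and the U.S.A.]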

Fig. 5.1. Summary of bindin variation and location of collection sites.

(a-b) Image of bindin amino acid alignment, showing mode length and variation in the northern (a) and southern (b) populations. Light blue regions show collagen-like repeats, boxes along gene indicate tandem repeats (dashed = varies as indels, solid = does not vary), black triangle indicates the 1285 bp intron, and black solid box indicates the 135 bp core. Grey horizontal lines indicate regions of indel variation at repeat domains that decrease gene length from mode. Inserts below gene indicate indels that increase length from mode. Asterisks indicate location of sites under selection. (c) Map of sample locations. N=northern population in Sandspit, S=southern population in Bamfield, British Columbia. Grey dashed line indicates location of phylogeographic break (from Keever et al., 2009).

[Figure 5.2: panels (a) Southern males and (b) Northern males; y-axis: fertilization success (0.0 to 1.0); x-axis: sperm concentration (sperm/µl), 10^4 to 10^6; legend: Southern females, Northern females.]

Fig. 5.2. Fertilization success depends on the population of origin in different combinations of mating pairs.

Mean proportion of eggs fertilized in each cross is shown for crosses between southern females (black points) and northern females (white points) with southern (a) and northern (b) males. Lines link mean values from the 10 crosses (solid = southern females, dashed = northern females).

[Figure 5.3: panels (a), (b) and (c); y-axis: fertilization success (0.0 to 1.0); x-axis in (a) and (b): Homozygous, Heterozygous; x-axis in (c): southern male identity (2 to 10).]

Fig. 5.3. Heterozygosity in collagen-like copy number variation affects fertilization success for southern males.

(a) and (b) show success of males heterozygous and homozygous for the number of collagen repeats in either allele, from crosses with (a) southern females and (b) northern females. Whisker plots show median, upper and lower quartiles, and minimum and maximum values. (c) Fertilization success of each southern male from crosses with southern females (circles) and northern females (squares), according to male heterozygosity (filled points) or homozygosity (open points) at collagen repeats, showing similar variability among females in homozygotes and heterozygotes.

[Figure 5.4: panels (a) Southern females and (b) Northern females; y-axis: fertilization success (0.0 to 1.0); x-axis: length of shortest allele (1400 to 2600); legend: Southern males, Northern males.]

Fig. 5.4. Length of shortest male bindin allele affects fertilization success with (a) southern females, but not with (b) northern females.

Results at low sperm concentration shown. Lines show best-fit relationship from linear mixed effects models. Diamond point shows single cross in which both male and female had derived states at the same three sites under positive selection. Legend applies to both panels.

[Figure 5.5: y-axis: fertilization success (0.0 to 1.0); x-axis: number of derived states at sites under selection (0 to 8); legend: Southern males x southern females, Northern males x northern females.]

Fig. 5.5. Number of derived states at sites under selection affects fertilization success within populations.

Results at low sperm concentration shown. Lines show best-fit relationship from linear mixed effects models.

[Figure 5.6: panel (a), sperm velocity (µm s^-1, 100 to 180) by male source population (Southern, Northern); panel (b), fertilization success (0.0 to 1.0) against sperm velocity (µm s^-1, 100 to 180); legend: Southern males, Northern males.]

Fig. 5.6. Sperm velocity affects fertilization success for different male source populations.

(a) Whisker plots show median, interquartile and extreme values of sperm velocity. (b) Fertilization success in relation to sperm velocity, grey = northern males, black = southern males.

Chapter 6.

Synthesis

Here I briefly summarize the findings of each chapter and the contributions that my thesis makes within the broader research field. I also discuss future research directions and consider the implications of this work towards understanding the evolution of reproductive isolation in the sea.

Summary of Findings

In the population genetic structure analysis (Chapter 2), we showed that populations of Patiria miniata consist of two relatively ancient genetic lineages roughly separated at Queen Charlotte Sound, British Columbia. This finding was robust to the three different classes of markers used, and clearly showed that the genetic break was not at the location of the species range disjunction between California and Vancouver Island. This work provided evidence that gene flow in this species is not high enough to homogenize P. miniata populations across Queen Charlotte Sound, despite its long pelagic larval phase. It also served to refute the hypothesis that the range disjunction is a remnant of historical vicariance, pointing to ecological factors as an alternative explanation for the lack of populations in Washington and Oregon.

Using an oceanographic dispersal model (Chapter 3), I next showed that passive dispersal through British Columbia and southeastern Alaska can predict the genetic structure observed. The model predicted relatively low, stepping-stone gene flow throughout the northern region of the species range.

In my analyses of genomic sequences of the sperm protein bindin (Chapter 4), I found a remarkable level of variation, both as single nucleotide polymorphisms and as varying numbers of repeat motifs. Population divergence in bindin across Queen Charlotte Sound was greater than population divergence in non-coding nuclear introns, and diversity was greater in the southern population. I found a robust signal of positive selection in the southern population, a weak signal of positive selection in the northern population, and evidence that positive selection has occurred at different codon sites in each region. This provides evidence that bindin has been subject to differential selection among populations, most likely attributable to divergent sexual selection.

Finally, my investigation of reproductive success as a function of population source and bindin genotype revealed a significant signal of reproductive coevolution in the southern population (Chapter 5). I found that fertilization success was greater in males with fewer copies of repetitive motifs in their bindin genotypes, but only when crossed with females from the southern population. These findings suggest that selection favours males with shorter bindin haplotypes in the southern population, and provide an explanation for the greater frequency of shorter bindin alleles sampled in the south. I also found that sperm velocity and fertilization efficiency were greater among southern males. This provides evidence that sperm competition is greater in the southern population.

Novel Contributions

The findings in these chapters provide important research advances towards understanding reproductive evolution in marine taxa. The population genetics and gene flow studies of Chapters 2 and 3 provide a new reference for the level of gene flow and the potential for local adaptation that may be expected within a species with a long-lived pelagic phase. The low gene flow inferred in P. miniata may be unique to complex and glacially-influenced coastlines such as British Columbia and southeastern Alaska, but nevertheless sets a precedent which may adjust how scientists and managers consider the potential levels of connectivity along such coastlines. This information will also be useful for marine protected area design in British Columbia. In addition, this work provided important historical and contemporary context for the subsequent study of reproductive evolution among populations.

The study of molecular variation in bindin (Chapter 4) provides the first example of divergent positive selection in a gamete recognition gene between natural populations, and in doing so fills a critical gap in the literature. Previous work on gamete recognition loci has focused on signals of positive selection at levels above and below this type of comparison: either investigating variation within populations, or between species. The hypothesis that positive selection may be divergent between populations has been considered mathematically (Gavrilets 2000; Gavrilets & Waxman 2002) and verbally (Palumbi 1992; Rice & Holland 1997; Holland & Rice 1998; Parker & Partridge 1998; Coyne & Orr 2004), and my work provides empirical data in support of this theory.

The investigation of fertilization compatibility in the context of population source and bindin variation (Chapter 5) provides a window into how this bindin variation relates to fertilization success. Previous work in sea urchins has shown how bindin variation affects fertilization success within populations (Palumbi 1999; Levitan & Ferrell 2006; Levitan & Stapper 2009) and across species (Zigler et al. 2005). My work provides an important link between these levels of study for the first time, showing how bindin traits that differ between populations have differential fertilization success. It also represents the first functional study of bindin polymorphism in a seastar. In addition, the importance of repeat number variation in P. miniata provides compelling evidence that similar variation observed in reproductive loci of other species may be under sexual selection.

The potential role of sexual conflict

My work suggests that bindin has been under differential positive selection among populations of P. miniata. While I did not design my experiments to identify the mechanism of this selection, some evidence suggests that sexual conflict may be an underlying mechanism. First, the finding of high polymorphism in bindin largely rules out directional sexual selection for an optimal male genotype (without conflict), as the expectation would be for low polymorphism following a selective sweep of the optimal male genotype. Instead, the finding of high functional polymorphism is more consistent with the existence of fitness trade-offs and frequency-dependent selection such as that proposed in a sexual conflict over polyspermy (Haygood 2004; Levitan & Ferrell 2006). Second, the southern population showed a greater strength of and/or response to diversifying selection, and also showed signals of greater sperm competition (faster sperm velocities, greater rate of fertilization). Where sperm competition is higher, sexual conflict is expected to be stronger (Gavrilets 2000) and hence these findings are consistent with a greater strength of sexual conflict in the southern population. Finally, population densities of P. miniata are known to be high (Schroeter et al. 1983; Rumrill 1989) and within the range of spawner densities in which polyspermy and negative frequency-dependence of bindin have been observed (Levitan & Ferrell 2006).

However, future work is needed in order to identify sexual conflict per se relative to other hypotheses as the mechanism behind positive selection. Although P. miniata gametes are short-lived and do not likely interact strongly with their environment, the hypothesis that female sperm receptors coevolve antagonistically with pathogens, which in turn selects for matching male genotypes (Vacquier et al. 1995), cannot be ruled out. One fruitful avenue for future research is the multi-spawning experimental design of Levitan and colleagues (Levitan & Ferrell 2006; Levitan & Stapper 2009), in which mating success at both low and high (polyspermic) sperm concentrations is investigated, and multiple male genotypes are competed for fertilization success with multiple females. Such a design can allow identification of negative frequency-dependent selection on bindin traits at high sperm densities, and simulate sexual selection in a natural population more accurately than no-choice crosses.

Another potential avenue for future research is simultaneous investigation of the female receptor for bindin, as well as other potentially important ligand-receptor systems that contribute to reproductive compatibility (Hart 2012). My work used female identity and population source as identifiers of potential differences in female affinities for different male bindin genotypes, but knowledge of female receptor genotypes may be possible in the near future (Hart, unpublished), especially given the increasing availability of high-throughput sequencing technologies. Future phylogeographic study of interacting genes may reveal an historical signature of antagonistic coevolution, particularly in combination with ongoing fertilization experiments.

A third avenue for future research is to obtain ecological information to support or refute alternatives to the sexual conflict hypothesis. Cryptic ecological specialization may exist among the focal populations of adult P. miniata, or in gametes vulnerable to local pathogens, and local ecological specialization could be tested with common-garden experiments. Likewise, knowledge of the relative fitness of hybrids in various environments would help to contextualize the relative importance of pre- vs. post-zygotic isolation in this system. Finally, information about natural spawning conditions of P. miniata among populations would provide the context under which to assess the strength of sexual conflict in each population.

This potential future work will benefit from the foundations developed in my thesis. My work provides information on the level of gene flow between populations, potential features of the bindin gene that are likely under negative frequency-dependent selection, and the hypothesis that sexual conflict is stronger in the southern population.

Speciation in the sea?

The findings of my thesis together provide new insight into how speciation may occur in the ocean. Since the ground-breaking discovery of strong positive selection in fertilization proteins among broadcast spawning taxa (Lee & Vacquier 1992; Palumbi 1992), it has been widely speculated that divergent selection among isolated populations may lead to reproductive isolation and speciation (Palumbi 1992; Rice & Holland 1997; Holland & Rice 1998; Parker & Partridge 1998; Coyne & Orr 2004; Panhuis et al. 2006). If diversifying selection of reproductive traits results from sexually antagonistic coevolution, this provides a unique pathway towards speciation that is independent of ecological factors, and an alternative to ecological speciation (Coyne & Orr 2004; Schluter 2009; Sobel et al. 2009). My findings of divergent positive selection and localized coevolution in the southern population of P. miniata are consistent with this model of speciation driven by sexual conflict. Moreover, the asymmetric pattern in the apparent rate of reproductive evolution among P. miniata populations is consistent with the pattern observed across multiple sister species pairs (Chapter 5). It is compelling to think that P. miniata is at an early stage of speciation driven by intrinsic mechanisms, and that these mechanisms have driven previous speciation events.

Because of the potential for high gene flow given the life history of P. miniata, these findings are especially important to how we may conceptualize localized evolution and speciation in dispersing marine taxa. The differences in reproductive traits between the populations may have evolved during the long vicariant separation of the two lineages, but they have not been homogenized by gene flow. This suggests that localized evolution in response to divergent natural or sexual selection pressures may exist at similar spatial scales in other marine species with shorter pelagic life histories. Additional analyses of locally selective traits at various spatial scales may reveal more cryptic local adaptation and signals of incipient speciation in the sea.

References

Coyne JA, Orr HA (2004) Speciation. Sunderland: Sinauer Associates.

Gavrilets S (2000) Rapid evolution of reproductive barriers driven by sexual conflict. Nature 403, 886-889.

Gavrilets S, Waxman D (2002) Sympatric speciation by sexual conflict. Proceedings of the National Academy of Sciences of the United States of America 99, 10533-10538.

Hart MW (2012) Next-generation studies of mating system evolution. Evolution 66, 1675-1680.

Haygood R (2004) Sexual conflict and protein polymorphism. Evolution 58, 1414-1423.

Holland B, Rice WR (1998) Perspective: Chase-away sexual selection: Antagonistic seduction versus resistance. Evolution 52, 1-7.

Lee Y, Vacquier VD (1992) The divergence of species-specific abalone sperm lysins is promoted by positive Darwinian selection. Biological Bulletin 182, 97-104.

Levitan DR, Ferrell DL (2006) Selection on gamete recognition proteins depends on sex, density, and genotype frequency. Science 312, 267-269.

Levitan DR, Stapper AP (2009) Simultaneous positive and negative frequency-dependent selection on sperm bindin, a gamete recognition protein in the sea urchin Strongylocentrotus purpuratus. Evolution 64, 785-797.

Palumbi SR (1992) Marine speciation on a small planet. Trends in Ecology & Evolution 7, 114-118.

Palumbi SR (1999) All males are not created equal: Fertility differences depend on gamete recognition polymorphisms in sea urchins. Proceedings of the National Academy of Sciences of the United States of America 96, 12632-12637.

Panhuis TM, Clark NL, Swanson WJ (2006) Rapid evolution of reproductive proteins in abalone and Drosophila. Philosophical Transactions of the Royal Society B-Biological Sciences 361, 261-268.

Parker GA, Partridge L (1998) Sexual conflict and speciation. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 353, 261-274.

Rice WR, Holland B (1997) The enemies within: intergenomic conflict, interlocus contest evolution (ICE), and the intraspecific Red Queen. Behavioral Ecology and Sociobiology 41, 1-10.

Rumrill SS (1989) Population size structure, juvenile growth, and breeding periodicity of the sea star Asterina miniata in Barkley Sound, British Columbia. Marine Ecology-Progress Series 56, 37-47.

Schluter D (2009) Evidence for Ecological Speciation and Its Alternative. Science 323, 737-741.

Schroeter SC, Dixon J, Kastendiek J (1983) Effects of the starfish Patiria miniata on the distribution of the sea urchin Lytechinus anamesus in a southern Californian kelp forest. Oecologia 56, 141-147.

Sobel JM, Chen GF, Watt LR, Schemske DW (2009) The biology of speciation. Evolution 64, 295–315.

Vacquier VD, Swanson WJ, Hellberg ME (1995) What have we learned about sea urchin sperm bindin. Development Growth & Differentiation 37, 1-10.

Zigler KS, McCartney MA, Levitan DR, Lessios HA (2005) Sea urchin bindin divergence predicts gamete compatibility. Evolution 59, 2399-2404.

Appendix

Appendix A.

Population pairwise FST values averaged across 7 microsatellite loci.

Two- or three-letter population identifier acronyms as in Table 2.1

Appendix B.

Branch-sites model results from PAML analysis.

          log likelihood    log likelihood      log likelihood    estimated     proportion of sites    proportion of sites
          of null model:    of selection        difference        foreground    at foreground omega    at foreground omega
          foreground        model: foreground   (sel. model -     omega         w/ background          w/ background
          omega fixed       omega estimated     null model)                     omega at 0             omega at 1
          at 1

Bamfield in foreground
tree 1    -4506.5           -4539               32.6              15.6          0.01566                0.00165
tree 2    -4549.9           -4581.2             31.3              13.6          0.01543                0.00155
tree 3    -4539.7           -4570.2             30.5              14.2          0.01415                0.00147
tree 4    -4526.1           -4557.2             31.1              13.4          0.01562                0.00160
tree 5    -4537.3           -4577.5             40.2              16.1          0.01440                0.00150
tree 6    -4515.0           -4546.4             31.3              14.0          0.01526                0.00160
tree 7    -4523.7           -4555.6             31.8              14.1          0.01643                0.00165
tree 8    -4514.9           -4554.0             39.1              14.6          0.01338                0.00374
tree 9    -4550.5           -4589.3             38.8              15.8          0.01446                0.00150
tree 10   -4529.0           -4560.7             31.7              13.9          0.01554                0.00158

Sandspit in foreground
tree 1    -4521.6           -4539               17.4              21.7          0.00853                0.00099
tree 2    -4543.4           -4542.0             -1.4              0             0                      0
tree 3    -4539.2           -4539.0             -0.2              0             0                      0
tree 4    -4531.1           -4530.7             -0.4              0             0                      0
tree 5    -4558.5           -4576.2             17.7              19.4          0.00924                0.00104
tree 6
tree 7
tree 8
tree 9

Bold log-likelihood differences indicate significant values in a log-likelihood ratio test. Grey and missing values are from model comparisons where selected models did not resolve a Dn/Ds parameter > 1 within the first 10 rounds. Grey values were obtained by running these unresolved models for the maximum of 200 rounds.
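The log-likelihood differences in this appendix can be converted to approximate p-values for a likelihood ratio test. The Python sketch below is illustrative and rests on two assumptions that are not stated in the appendix: that the test statistic is twice the tabulated log-likelihood difference, and that it is compared against a chi-square distribution with one degree of freedom. The function name is hypothetical. For one degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)), so only the standard library is needed.

```python
import math


def lrt_pvalue(lnl_difference):
    """Approximate p-value for a likelihood ratio test with 1 degree of freedom.

    lnl_difference is lnL(selection model) - lnL(null model), as tabulated.
    Assumes the statistic is 2 * lnl_difference; negative differences
    (selection model fits no better than the null) are treated as zero.
    """
    statistic = max(0.0, 2.0 * lnl_difference)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))


# A few log-likelihood differences copied from the table above.
EXAMPLES = {
    "Bamfield, tree 1": 32.6,
    "Sandspit, tree 1": 17.4,
    "Sandspit, tree 2": -1.4,
}

if __name__ == "__main__":
    for label, diff in EXAMPLES.items():
        print(f"{label}: difference = {diff}, p = {lrt_pvalue(diff):.3g}")
```

Under these assumptions, a log-likelihood difference of about 1.92 corresponds to p = 0.05, so the positive differences tabulated above lie well beyond that threshold, while the negative differences give no support for the selection model.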

Appendix C.

Site-by-site Bayes-empirical-Bayes probability of positive selection in the first recombinant region of bindin in P. miniata for all gene trees investigated.

[Figure: for each of the ten gene trees (tree 1 to tree 10), paired panels with Bamfield in foreground (left) and Sandspit in foreground (right); x-axis: codon site (0 to 800); y-axis: probability that dN/dS > 1.]

Height of line shows the probability that dN/dS is greater than 1 for each site along the length (x-axis) of the first exon. Grey horizontal bands show the 95% probability range. Light grey and missing plots are where selected models did not resolve a dN/dS parameter > 1 within the first 10 rounds; light grey plots show results of 3 of these models after 200 rounds, where the selection model was not better than the neutral null model.

Appendix D.

Model results of fertilization success with and without inclusion of high sperm treatment.

Term                             Coefficient   Std. Error   z value   P-value

a) Without data from high sperm treatment
Intercept (N)                    0.01001       0.31638      0.03      0.97476
male population (S)              2.26415       0.40869      5.54      3.02E-08 ***
female population (S)            0.09802       0.20272      0.48      0.62872
male pop (S) x female pop (S)    0.36788       0.1197       3.07      0.00212 **
sperm treatment (medium)         2.74132       0.06988      39.23     < 2e-16 ***

b) With data from high sperm treatment
Intercept (N)                    -2.15724      0.2546       -8.47     < 2e-16 ***
male population (S)              1.58086       0.29212      5.41      6.24E-08 ***
female population (S)            0.16323       0.21322      0.77      0.444
male pop (S) x female pop (S)    0.19653       0.09804      2         0.045 *
sperm treatment (medium)         2.37965       0.0601       39.59     < 2e-16 ***
sperm treatment (high)           3.58602       0.07157      50.1      < 2e-16 ***

Appendix E.

Egg cell diameter and egg jelly coat thickness across populations.

[Figure: egg jelly coat thickness (µm, 32 to 44) plotted against egg cell diameter (µm, 180 to 220).]

Black = southern females, grey=northern females. Lines and points exterior to plot show mean and standard deviation for each population.

Appendix F.

Sperm head diameter does not vary across populations.

[Figure: sperm diameter (µm, 1.75 to 2.00) by male population (Northern, Southern).]

Points and lines show mean and standard deviation for each male, respectively.
