MOLECULAR POPULATION GENETICS OF MATING SYSTEM
TRANSITIONS IN THE BRASSICACEAE
JOHN PAUL FOXE
A DISSERTATION SUBMITTED TO THE FACULTY OF GRADUATE
STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
GRADUATE PROGRAM IN BIOLOGY
YORK UNIVERSITY
TORONTO, ONTARIO
OCTOBER 2010

Library and Archives Canada, Published Heritage Branch
395 Wellington Street, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-80588-6

NOTICE:
The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

Abstract
The transition from outcrossing to selfing represents one of the most common
major evolutionary transitions in the plant kingdom. Predominantly selfing and
outcrossing species pairs exist in a variety of plant taxa, yet little is known about the
underlying evolutionary dynamics promoting such a transition. In Chapter 2 of this
dissertation, I investigate the evolution of the selfing Capsella rubella from its sister species, the
outcrossing Capsella grandiflora. My results suggest that this transition to selfing has led to a
substantial bottleneck in C. rubella with a 100-1500 fold reduction in effective
population size. Furthermore, I estimate a divergence time of approximately 13,500 years
between these two species suggesting that this recent speciation event has led to dramatic
genotypic and phenotypic changes in C. rubella over a relatively short period of time.
Results from Chapter 2 show little evidence for introgression in natural populations
of C. grandiflora and C. rubella. Building on this, in Chapter 3 I ask whether a shift in mating system has
led to the establishment of effective reproductive isolating barriers between these two
sister species. By investigating the population genetics of potentially
hybridizing populations of C. grandiflora and C. rubella, I find no significant evidence
for ongoing gene flow between the two species, suggesting that differences
in mating system are acting as an effective reproductive barrier between them. In Chapter 4, I investigate the evolutionary origins of the third species in the
Capsella genus, the selfing polyploid, C. bursa-pastoris. My results suggest that C. bursa-pastoris originated via autopolyploidization involving two distinct C. grandiflora haplotypes. Furthermore, I have dated this event at approximately 667,000 years ago.
In Chapter 5, I investigate the demographic history and population genetics of selfing and outcrossing populations of Arabidopsis lyrata in the Great Lakes region. I find evidence for strong geographic population clustering irrespective of mating system, suggesting that, unlike the transition to selfing in C. rubella, selfing either evolved multiple times or has spread to multiple genetic backgrounds.
Acknowledgements
The work presented in this dissertation was made possible by a large number of people, to whom I am very grateful. Specific acknowledgements are outlined at the end of Chapters 2 and 5. However, there are a number of other individuals who contributed to additional aspects of my dissertation.
I thank each of my committee members: Dr. Joel Shore, Dr. Imogen Coe and Dr.
Dawn Bazely for their support, their thoughtful suggestions and their encouragement throughout my time spent as a PhD candidate.
I thank my examination committee members Dr. Amro Zayed, Dr. Bridget
Stutchbury, Dr. Raymond Mar and Dr. Dan Schoen for their willingness to serve on my examination committee and for being so flexible around the timing of my defence.
I thank each of my coauthors on Chapters 2 and 5, namely: Tanja Slotte, Eli Stahl,
Barbara Neuffer, Herbert Hurka, Marc Stift, Andrew Tedder, Annabelle Haudry and
Barbara Mable. In particular, Marc Stift and Barbara Mable are two fantastic people to work with and I count myself very lucky to have had the opportunity to collaborate with them.
I thank Kate St. Onge for her assistance while sampling Capsella in Greece and
Martin Lascoux for providing Capsella seeds used in Chapter 4.
There are too many of my peers to thank individually. I thank the members of the
Wright laboratory, as well as the many graduate students in the Biology Department at
York University and in the Department of Ecology and Evolutionary Biology at the
University of Toronto, with whom I have had such a great experience over the past few years.
Finally, I thank my supervisor, Dr. Stephen Wright. In addition to his supervision,
Dr. Wright has acted as a mentor to me. More than that, he has become a friend. Without his constant support, advice and guidance, the body of work presented here would not have been possible. Thank you.
Note on authorship
Chapters 2 and 5 of this thesis have been published as refereed journal articles. In each case, I am the first author. Below I outline the contributions of each author to each of these Chapters.
For Chapter 2, I performed the research. B. Neuffer and H. Hurka provided
Capsella grandiflora and C. rubella seeds. E. Stahl provided computer code allowing the program MIMAR to be run in a number of different ways. The data were analyzed and the paper written by T. Slotte, S. Wright and me.
For Chapter 5, I generated all of the nuclear gene data. M. Stift, A. Tedder and A.
Haudry generated the microsatellite and chloroplast data. M. Stift, A. Tedder, A. Haudry,
S. Wright, B. Mable and I analyzed the data. M. Stift, B. Mable, S. Wright and I wrote the paper.
I have received written permission from each of my coauthors to include these published articles in my dissertation (see Appendix A).
Table of Contents
Abstract
Acknowledgements
Note on authorship
Table of Contents
List of Tables
List of Figures
Chapter 1
Introduction to Molecular Population Genetics of Mating System Transitions in the Brassicaceae
Chapter 2
Recent speciation associated with the evolution of selfing in Capsella
Chapter 3
No evidence for ongoing and widespread hybridization between sympatric populations of C. grandiflora and C. rubella
Chapter 4
Dynamics of polyploid speciation in the genus Capsella
Chapter 5
Reconstructing origins of loss of self-incompatibility and selfing in North American Arabidopsis lyrata: a population genetic context
Chapter 6
Conclusions
Appendix A - Permission to include published manuscripts in dissertation
List of Tables
Chapter 2
Table 1. Modes of parameter estimates under a range of MIMAR models, with 90% HPD intervals in parentheses.
SI Table 2. Predictive posterior probabilities from simulations of the posterior distributions.
SI Table 3. Likelihood ratio test comparing model 1 to other models.
SI Table 4. Goodness-of-fit tests based on simulations under the marginal modes (a) and under maximum likelihood parameter estimates (b).
SI Table 5. Goodness-of-fit test P values based on simulations under the marginal modes (a) and under maximum likelihood parameter estimates (b), allowing for a lineage-specific change in recombination in population 2.
SI Table 6. Sequence-based summary statistics as estimated for each locus in both C. grandiflora and C. rubella.
SI Table 7. Modes of parameter estimates under a range of MIMAR models using summaries of the data that do not rely on outgroup inference, with 90% HPD intervals in parentheses.
Chapter 3
Table 1. Modes of parameter estimates under a range of MIMAR models using the entire dataset, allopatric populations only and sympatric populations only.
Table 2. Number of silent sites subdivided by species and population category (allopatric and sympatric) as well as gene ontology terms for each of the 18 loci used in this study.
Chapter 4
Table 1. Minimum number of synonymous substitutions and number of fixed differences between C. bursa-pastoris A and C. grandiflora, C. bursa-pastoris B and C. grandiflora, and C. bursa-pastoris A and C. bursa-pastoris B. Numbers of fixed differences are given in parentheses.
Table 2. Modes of parameter estimates under a range of MIMAR models.
Table S1. Accession name and origin of each individual used in this study.
Table S2. Names and gene ontology terms for each of the 14 loci studied.
Table S3. Modes of parameter estimates under a range of MIMAR models for 5 simulated datasets where an autopolyploid event was simulated using the coalescent simulation program ms (see Methods).
Chapter 5
Table 1. Population-level outcrossing rates (tm), proportion of self-compatible (SC) individuals (as an indication of selfing phenotype), summary statistics and observed heterozygosity for 18 nuclear gene sequences, and observed and expected heterozygosities (Ho and He) across nine microsatellite loci, with populations ordered by increasing outcrossing rates.
Table 2. Linear regressions of outcrossing rate (tm) and the proportion of self-compatible (SC) individuals per population on synonymous diversity (πsyn), corrected synonymous diversity, the recombination parameter (ρ), the corrected recombination parameter and observed heterozygosity (Ho) across 18 nuclear gene sequences; and observed and expected microsatellite heterozygosity (Ho and He) across nine microsatellite loci.
Table 3. Total and unique number of variants (microsatellite alleles across nine loci, nuclear gene haplotypes across 18 genes) for the group of inbreeding populations and for the group of outcrossing populations; overall total number of different variants; number of variants shared across inbreeding and outcrossing populations.
Table S1. Explanation of population abbreviations, lakefront (where relevant), and geographic specifications (state/province, country and coordinates) for each of the 24 populations used in this study (more details regarding sampling method are given in the supplementary text).
List of Figures
Chapter 2
Figure 1. Floral organs and petals are reduced in C. rubella (left) compared to C. grandiflora (right).
Figure 2. Comparison of polymorphism patterns between C. grandiflora and C. rubella. Bars represent the median, boxes the interquartile range, and whiskers extend out to 1.5 times the interquartile range. a) π synonymous, where π is the average number of pairwise differences (16); b) the population recombination estimator ρ per base pair, using the composite likelihood estimator of Hudson (32); c) Tajima's D (16) in C. grandiflora and C. rubella.
Figure 3. Derived SNP frequencies in C. grandiflora and C. rubella calculated using A. thaliana as an outgroup.
Figure 4. Smoothed marginal posterior distributions of speciation parameters estimated by MIMAR, for two models with posterior modes showing good fit to data summaries, assuming either symmetric migration (solid lines) or no migration (dashed lines) and equal effective population sizes in the ancestor and in present-day C. grandiflora. (A) Marginal densities for effective population size (individuals) in C. rubella (grey) and C. grandiflora/the ancestral species. (B) Marginal densities for divergence time (years). (C) Marginal density for migration (4Nm), where N is the effective population size of C. grandiflora.
SI Figure 5. Observed and expected (under neutrality, as calculated using Equation 49 of Tajima 1989) minor allele frequency distribution of synonymous SNPs in a) C. grandiflora and b) C. rubella using A. thaliana as an outgroup.
SI Figure 6. Average levels of linkage disequilibrium as measured by the squared correlation coefficient r² in C. rubella and C. grandiflora in 100-bp windows.
SI Figure 7. STRUCTURE output for a) C. grandiflora and C. rubella combined (k = 2) and for b) C. rubella alone (k = 3).
SI Figure 8. Distribution of the fraction of shared polymorphism from simulated 39-gene datasets.
Chapter 3
Figure 1. Comparison of polymorphism patterns between C. grandiflora and C. rubella for a) the entire dataset, b) the allopatric populations and c) the sympatric populations, as measured by π synonymous, where π is the average number of pairwise differences.
Figure 2. Distribution of synonymous variants. Variants are classed as unique to C. rubella, unique to C. grandiflora or shared between species. The datasets are subdivided into a) the entire dataset, b) allopatric populations only and c) sympatric populations only.
Figure 3. Posterior probabilities of Bayesian clustering analysis (InStruct) conducted on the entire dataset, where k = 2-4.
Figure 4. Posterior probabilities of Bayesian clustering analysis (InStruct), using the a) C. grandiflora individuals and the b) C. rubella individuals, where k = 2-4.
Figure S1. Geographic location of each of the nine populations included in this study.
Chapter 4
Figure 1. Floral organs and petals are reduced in C. bursa-pastoris and C. rubella (left and middle, respectively) compared with C. grandiflora (right).
Figure 2. Comparison of silent polymorphism patterns between C. bursa-pastoris A, C. bursa-pastoris B, C. grandiflora and C. rubella, given by π synonymous, where π is the average pairwise difference.
Figure 3. Number of synonymous fixed differences between all pairs of C. bursa-pastoris A, C. bursa-pastoris B, C. grandiflora and C. rubella.
Figure 4. Bayesian estimates of species trees of the Capsella genus generated using BEST, including as outgroups the close relatives a) A. thaliana and b) A. thaliana and Neslia.
Figure 5. Marginal posterior distributions of speciation parameters estimated by MIMAR, with posterior modes showing good fit to data summaries. θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate (1.5 × 10⁻⁸). a) Constrained model assuming equal effective population sizes in the ancestor and in present-day C. grandiflora; b) unconstrained model.
Figure 6. Numbers of unique synonymous variants for each of the three species pairs: a) C. bursa-pastoris A and C. bursa-pastoris B, b) C. grandiflora and C. bursa-pastoris A and c) C. grandiflora and C. bursa-pastoris B.
Figure 7. Numbers of a) synonymous shared variants and b) synonymous fixed differences for each of the three species pairs: 1) C. bursa-pastoris A and C. bursa-pastoris B, 2) C. grandiflora and C. bursa-pastoris A and 3) C. grandiflora and C. bursa-pastoris B.
Figure S1. Autopolyploid event simulated using the coalescent simulation program ms. θ is the scaled effective population size (4Neμ, where Ne is the effective population size and μ is the mutation rate, 1.5 × 10⁻⁸).
Chapter 5
Figure 1. Posterior probabilities of Bayesian clustering analysis (InStruct) using the combined nuclear gene sequence and microsatellite datasets, based on a prior of six clusters (k = 6).
Figure 2. Comparisons between microsatellite-based multilocus outcrossing rates (tm) and genetic diversity: (a) synonymous diversity (πsyn) based on nuclear sequence data from 18 unlinked loci, where π is the average number of pairwise differences between two sequences; (b) expected heterozygosity (He) estimated from microsatellite data.
Figure S1. Posterior probabilities of Bayesian clustering analysis (using STRUCTURE, 2,000,000 generations with a burn-in of 200,000), using a combination of the nuclear haplotype and microsatellite data.
Figure S2. Posterior probabilities of Bayesian clustering analysis (using InStruct, 2,000,000 generations with a burn-in of 200,000), using a combination of the nuclear haplotype and microsatellite data.
Figure S3. Distribution of the -ln probability (-ln(P), represented by closed circles) and its variance (Var[ln(P)], represented by open squares) for Bayesian clustering analysis (2,000,000 generations with a burn-in of 200,000; five chains for each setting of the number of predefined clusters (k), with k ranging from 1 to 12) with a) InStruct and b) STRUCTURE.
CHAPTER 1
Introduction to Molecular Population Genetics of Mating System Transitions in the
Brassicaceae
The transition from outcrossing to selfing represents one of the most common
major evolutionary transitions in flowering plants (Stebbins 1970; Barrett 2002).
Predominantly selfing and outcrossing species pairs exist in a variety of plant taxa
including Leavenworthia (Charlesworth and Yang 1998; Liu et al. 1998; Filatov and
Charlesworth 1999; Liu et al. 1999), Arabidopsis (Ross-Ibarra et al. 2008; Savolainen et
al. 2000; Wright et al. 2003), Lycopersicon (Baudry et al. 2001), Miscanthus (Chiang et
al. 2003), Amsinckia (Schoen et al. 1997) and Capsella (Foxe et al. 2009 (Chapter 2)).
This transition confers two major advantages on selfers relative to their outcrossing
congeners. First, because selfers contribute two gene copies to each selfed seed and can also act as
pollen donors for seed produced by other individuals, they have an inherent transmission
advantage over outcrossers (Fisher 1941; Nagylaki 1976; Lloyd 1979). Second, a
selfing lifestyle provides reproductive assurance when pollinators or potential mates are
limited (Baker 1955). Evolutionary theory predicts that selfing will evolve when these
advantages outweigh the costs associated with inbreeding depression (Charlesworth
2006).
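The transmission advantage and its trade-off with inbreeding depression can be made concrete with simple gene-copy bookkeeping. The sketch below is illustrative only: it assumes a rare variant that self-fertilizes all of its own ovules, exports as much outcross pollen as an outcrosser (no pollen discounting), and loses a fraction δ (the hypothetical parameter `delta`) of its selfed progeny to inbreeding depression. These simplifications are ours, not the full models of the cited studies.

```python
def gene_copies_per_parent(selfing: bool, delta: float = 0.0) -> float:
    """Gene copies transmitted per parent under a simple counting model.

    Illustrative assumptions (not taken from the cited studies):
      * every plant has one unit of ovules and sires one unit of seed on
        other plants through pollen export;
      * a selfer fertilizes all of its own ovules, placing two of its gene
        copies in each selfed seed, and its pollen export is not reduced
        (no pollen discounting);
      * selfed seed survives with probability 1 - delta (inbreeding depression);
      * an outcrosser places one copy in its own seed and one in seed it sires.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    via_pollen = 1.0  # copies placed in seed sired on other plants
    if selfing:
        return 2.0 * (1.0 - delta) + via_pollen
    return 1.0 + via_pollen


def selfing_favored(delta: float) -> bool:
    """True when a selfer out-transmits an outcrosser, i.e. 3 - 2*delta > 2."""
    return gene_copies_per_parent(True, delta) > gene_copies_per_parent(False)


if __name__ == "__main__":
    for delta in (0.0, 0.3, 0.5, 0.7):
        copies = gene_copies_per_parent(True, delta)
        print(f"delta={delta:.1f}  selfer copies={copies:.2f}  favored={selfing_favored(delta)}")
```

Under these assumptions a selfer with no inbreeding depression transmits three gene copies for every two transmitted by an outcrosser, and the advantage disappears once δ reaches 0.5. Pollen discounting or partial selfing would shift both numbers.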
The shift from outcrossing to selfing often has both phenotypic and genotypic
consequences. Phenotypically, the transition to selfing is commonly associated with the 'selfing syndrome', characterized by a marked reduction in flower size and a breakdown of the morphological and genetic mechanisms that prevent
self-fertilization. Empirically, the population genetic consequences of a transition to selfing have been well documented at the species level in both plant and animal systems (Charlesworth and Yang 1998; Baudry et al. 2001; Chiang et al. 2003;
Cutter and Payseur 2003; Glémin 2006). Most directly, selfing is associated with an
increase in homozygosity and a decrease in levels of genetic diversity (Wright et al.
2008). Because selfed individuals are highly homozygous, each carries fewer independent
gene copies, and the effective population size (Ne) is reduced. Theory predicts that
complete selfing halves the effective population size (Charlesworth et al.
1993; Nordborg 2000). Furthermore, increased homozygosity reduces the effective rate of
recombination, raising levels of linkage disequilibrium among loci and amplifying the effects of
genetic hitchhiking. Both the fixation of positively selected mutations (selective sweeps) and the
removal of deleterious alleles by purifying selection (background selection) then reduce
diversity at linked sites, further reducing Ne.
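The direct effect of homozygosity on Ne can be illustrated with two standard results: at equilibrium under a constant selfing rate s the inbreeding coefficient is F = s/(2 - s), and Ne = N/(1 + F), so complete selfing (F = 1) halves Ne. The sketch below assumes these textbook formulas, a hypothetical census size, and a per-site mutation rate μ = 1.5 × 10⁻⁸ for the diversity calculation θ = 4Neμ; it ignores the additional hitchhiking effects described above.

```python
def equilibrium_f(s: float) -> float:
    """Equilibrium inbreeding coefficient under a constant selfing rate s.

    Standard result: F = s / (2 - s), giving F = 0 without selfing and
    F = 1 under complete selfing.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return s / (2.0 - s)


def effective_size(n: float, s: float) -> float:
    """Ne = N / (1 + F): only the direct effect of homozygosity.

    Selective sweeps and background selection would reduce Ne further.
    """
    return n / (1.0 + equilibrium_f(s))


def theta(ne: float, mu: float = 1.5e-8) -> float:
    """Expected neutral diversity per site for diploids, theta = 4 * Ne * mu."""
    return 4.0 * ne * mu


if __name__ == "__main__":
    census = 100_000  # hypothetical census size, for illustration only
    for s in (0.0, 0.7, 1.0):
        ne = effective_size(census, s)
        print(f"s={s:.1f}  F={equilibrium_f(s):.3f}  Ne={ne:,.0f}  theta={theta(ne):.5f}")
```

Even a selfing rate of 0.7 gives F ≈ 0.54 in this model, so most of the twofold reduction is reached well before selfing is complete.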
While there are clear advantages to a selfing lifestyle, selfing
also has disadvantages. The transition to selfing can lead to a rapid increase in
homozygote frequencies. This increase can cause individuals to express recessive or
partially recessive deleterious alleles, leading to lower survival rates and reduced fertility, a
condition known as inbreeding depression (Charlesworth 2006).
Despite these advantages, only 20-25% of plant taxa are predominantly selfing (Barrett and Eckert 1990), which raises an obvious question: why
aren't more plants selfing? The most prominent hypothesis addressing this
question is that selfing is an "evolutionary dead end." Specifically, this hypothesis
proposes that selfing lineages do not persist over extended evolutionary time periods and that new selfing lineages are repeatedly founded from outcrossing progenitors.
Several studies suggest that the complex traits promoting outcrossing have broken down frequently, leading to repeated transitions to selfing. Furthermore, selfers are typically restricted to the terminal clades of phylogenies, suggesting that selfing lineages evolved from outcrossing progenitors and tend not to be ancient. For example, a phylogenetic analysis of the genus Amsinckia, based on chloroplast DNA restriction site variation, suggests four independent origins of homostylous selfing species from heterostylous outcrossers. These findings were based on the assumption that distyly was ancestral or that the loss of distyly was more likely than its gain (Schoen et al. 1997).
Although studies have suggested frequent transitions to selfing, in most cases the timescale over which these transitions occurred remains difficult to ascertain. The model plant A. thaliana is thought to have evolved self-fertilization approximately 1 million years ago through inactivation of the self-incompatibility (SI) locus, referred to as the S-locus
(Tang et al. 2007). Evidence for the role of the S-locus comes from transformation studies, which identified five accessions in which full SI could be restored by transformation with a functional S-locus, implying that all other genes required for SI are still intact in these accessions (Tang et al. 2007; Boggs et al. 2009). Recent results suggest that a mutation in SCR, the male component of SI, has resulted in loss of SI across an apparently wide range of accessions (Tsuchimatsu et al. 2010). In addition, a modifier locus unlinked to the S-locus has been identified (Liu et al. 2007), which suggests that S-locus inactivation may not be the sole mechanism by which SI
broke down in A. thaliana and that different mechanisms of loss could have operated in
different accessions (Boggs et al. 2009).
Systems with more recent transitions from outcrossing to selfing may provide a
more direct picture of the causes and short-term consequences of mating system
evolution (Foxe et al. 2009 (Chapter 2); Guo et al. 2009; Ness et al. 2010). For example,
if the evolution of selfing involves the long-term spread of modifiers through previously
outcrossing populations, recently-derived selfing populations are expected to retain
reasonably high levels of ancestral polymorphism, as recently observed in Eichhornia paniculata (Ness et al. 2010). In contrast, if a highly selfing lineage evolves rapidly from
a small number of founders, we would expect a severe loss of genetic variation, as seen in
Capsella rubella (Foxe et al. 2009 (Chapter 2)).
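The contrast between these two scenarios can be illustrated with the classical neutral expectation for the decay of heterozygosity under genetic drift at a constant effective size, H_t = H_0(1 - 1/(2Ne))^t. The sketch below is a generic calculation with hypothetical parameter values; it ignores mutation, migration and selection and is not the coalescent model fitted in Chapter 2.

```python
def heterozygosity_after(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after `generations` of drift at constant size Ne.

    Classical neutral expectation H_t = H_0 * (1 - 1 / (2 * Ne)) ** t,
    ignoring new mutation, migration and selection.
    """
    if ne < 1 or generations < 0:
        raise ValueError("need Ne >= 1 and a non-negative number of generations")
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


if __name__ == "__main__":
    h0 = 0.02  # hypothetical ancestral diversity
    # Gradual spread through a large population versus a lineage founded by
    # very few individuals; both scenarios and all numbers are hypothetical.
    for label, ne in (("large population", 50_000), ("few founders", 5)):
        print(label, heterozygosity_after(h0, ne, generations=20))
```

With Ne = 5, about 12% of the starting heterozygosity remains after 20 generations, whereas a large population retains nearly all of it over the same interval.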
In Chapter 2 of this dissertation, I investigate the evolution of selfing in diploid
species in the genus Capsella. Capsella rubella is characterized by a high rate of self-
fertilization (Hurka et al. 1989) and shows the typical morphological characteristics of a
selfing syndrome (Hurka and Neuffer 1997). From genetic marker studies, the selfing rate
in C. rubella has been estimated as 1, with a lower-bound estimate of 0.7 (Hurka et al.
1989). In comparison with its self-incompatible congener C. grandiflora, C. rubella shows a derived breakdown of the self-incompatibility mechanism and highly reduced floral organs. The breakdown of self-incompatibility is also associated with an expansion of geographic range: C. grandiflora is restricted to Greece and Albania, with local occurrences in northern Italy, while C. rubella has expanded into much of southern Europe, extending into central Europe, northern Africa, Australia, and North and South
America (Hurka and Neuffer 1997; Paetsch et al. 2006). Interspecific crossing experiments suggest that, in addition to mating system evolution, there is considerable post-pollination reproductive isolation between the species, with only a small proportion of crosses producing viable seed (Hurka and Neuffer 1997; Koch and Kiefer 2005; T. Slotte, K. Hazzouri, and S. Wright, unpublished).
My results from Chapter 2 suggest that this transition to selfing has led to a substantial bottleneck in C. rubella, with a 100- to 1500-fold reduction in effective population size (Ne) (Foxe et al. 2009 (Chapter 2)). Furthermore, I estimate a divergence time of approximately 13,500 years between these two species (Foxe et al. 2009 (Chapter 2)), suggesting that dramatic genotypic and phenotypic changes in C. rubella occurred over a relatively short period of time.
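Effective population sizes of this kind are usually obtained by rescaling the population mutation parameter θ = 4Neμ by an assumed mutation rate; the Chapter 4 figure legends use μ = 1.5 × 10⁻⁸ per site. The sketch below shows only that rescaling step, with hypothetical θ values rather than the estimates reported in Chapter 2, and it is not a reimplementation of the MIMAR analysis.

```python
MU = 1.5e-8  # per-site, per-generation mutation rate assumed in this sketch


def ne_from_theta(theta_per_site: float, mu: float = MU) -> float:
    """Invert theta = 4 * Ne * mu to obtain the effective population size."""
    if theta_per_site <= 0 or mu <= 0:
        raise ValueError("theta and mu must be positive")
    return theta_per_site / (4.0 * mu)


def fold_reduction(theta_ancestral: float, theta_derived: float) -> float:
    """Ratio of ancestral to derived Ne; mu cancels, so only theta matters."""
    return ne_from_theta(theta_ancestral) / ne_from_theta(theta_derived)


if __name__ == "__main__":
    # Hypothetical per-site theta values, for illustration only.
    theta_outcrosser, theta_selfer = 0.02, 0.0004
    print(f"Ne outcrosser ~ {ne_from_theta(theta_outcrosser):,.0f}")
    print(f"Ne selfer     ~ {ne_from_theta(theta_selfer):,.0f}")
    print(f"fold reduction ~ {fold_reduction(theta_outcrosser, theta_selfer):.0f}")
```

Because μ cancels in the ratio, a fold reduction in Ne is less sensitive to the assumed mutation rate than the absolute Ne values are.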
Results from Chapter 2 show little evidence for introgression in natural populations of C. grandiflora and C. rubella. Building on this, in Chapter 3 I ask whether a shift in mating system has led to the establishment of effective reproductive isolating barriers between these two sister species. By investigating the population genetics of potentially hybridizing populations of C. grandiflora and C. rubella, I find no significant evidence for ongoing gene flow between the two species, suggesting that differences in mating system are acting as an effective reproductive barrier between them.
An additional factor in the speciation process is that of polyploidy. Polyploidy is considered by many to be the predominant mode of sympatric speciation (Coyne and Orr
2004; Mallet 2007). Polyploidization can create a new species instantly, as the
newly formed polyploid is often immediately reproductively isolated from its diploid
progenitor(s) because of the change in ploidy. Changes in ploidy are often associated
with mating system transitions, and vice versa. The relative contribution of
polyploidy to speciation in plants is a controversial topic with widely varying estimates
of the frequency of polyploid speciation. Based upon the fraction of speciation events that
involve any change in chromosome number as well as the fraction of changes in
chromosome number that involve polyploidy, Otto and Whitton (2000) report that 2-4%
of speciation events in angiosperms and 7% in ferns are a direct result of polyploidy.
More recent estimates based upon phylogenetic data estimate the frequency of polyploid
speciation by tracking changes in ploidy level across infrageneric phylogenetic trees
(Wood et al. 2009). Wood and colleagues (2009) put this figure at 15% in angiosperms
and 31% in ferns. These estimates indicate that polyploidization is a common route to
speciation in plants.
Given the prominent role of polyploidization in plant speciation, understanding the
evolutionary context in which this process occurs is an important aspect of
speciation genetics. Elucidating the evolutionary history of a polyploid species requires
answering several questions: does the species have a single or multiple origins; what role
did founder events play; is there ongoing gene flow between the species and its ancestor(s); when did the polyploidization event occur; and is the species an allo- or autopolyploid? Addressing these questions is difficult given the challenges of distinguishing multiple
origins of polyploids, the extinction of parental lineages and the sampling of standing
variation in progenitor species (Doyle and Egan 2009).
One such polyploid is the weed C. bursa-pastoris, or shepherd's purse, a
selfing tetraploid member of the genus Capsella (Hurka and Neuffer 1997).
Like C. rubella, C. bursa-pastoris clearly displays the phenotypic selfing
syndrome. Previous work on these species has suggested that both C. grandiflora and C.
rubella may be ancestral to C. bursa-pastoris (Hurka and Neuffer 1997), and more recent
findings reveal that C. rubella diverged from C. grandiflora approximately 13,500 years
ago (Foxe et al. 2009 (Chapter 2, see above)). Capsella bursa-pastoris has a worldwide
distribution that can partly be explained by human activity (Hurka and Neuffer 1997;
Slotte et al. 2008). In contrast to C. grandiflora and C. rubella, C. bursa-pastoris is
found on every continent and thrives across a wide range of climates (Hurka and Neuffer 1997).
In Chapter 4, using DNA sequence data from 14 unlinked nuclear loci in C.
bursa-pastoris, C. grandiflora and C. rubella, I address three aims. First, I
characterize patterns of polymorphism in all three species in this genus. Next, using
molecular phylogenetic techniques, I resolve the phylogenetic relationships among C. bursa-pastoris, C. grandiflora and C. rubella. Finally, using coalescent-based analyses, I date the divergence of C. bursa-pastoris from the ancestral C. grandiflora and assess evidence for population bottlenecks in C. bursa-pastoris, thus clarifying the evolutionary history of all three species in the genus Capsella. My results suggest that C. bursa-pastoris originated via autopolyploidization involving two
distinct C. grandiflora haplotypes. Furthermore, I date this event to approximately
667,000 years ago.
In Chapter 5, I investigate the demographic history and population structure of
North American A. lyrata (Foxe et al. 2010). It has been suggested that A. lyrata
colonized North America from ancestral European populations (Clauss and Mitchell-Olds
2006; Ross-Ibarra et al. 2008), which are highly self-incompatible and exclusively
outcrossing. The North American populations are unique because some are still
predominantly outcrossing, despite the occurrence of self-compatible individuals at low
frequency, while others are almost entirely self-compatible and have undergone a
transition to high rates of selfing (Mable et al. 2005; Mable and Adam 2007). This
transition to selfing in A. lyrata appears to be very recent, as selfing populations belong
to a chloroplast lineage that also contains outcrossing populations (Hoebe et al. 2009).
Moreover, selfing populations are not characterized by smaller flowers (Hoebe 2009),
which contrasts with other systems where the transition to selfing has led to notable floral
evolution towards smaller flowers (Hurka and Neuffer 1997; Charlesworth and
Vekemans 2005; Tang et al. 2007; Foxe et al. 2009 (Chapter 2); Guo et al. 2009).
In this chapter, I integrate polymorphism information from nuclear genes,
chloroplast markers and nuclear microsatellites in order to obtain a detailed picture of the
demographic history and population structure of A. lyrata in the Great Lakes region of
North America. I find evidence for strong geographic clustering irrespective of mating
system, suggesting that selfing either evolved multiple times or has spread to multiple genetic backgrounds. I find much reduced diversity in selfing populations, but not to the
extent of the severe loss of variation expected if selfing had evolved under strong selective pressure to colonize new areas. Thus, unlike the transition to selfing in C. rubella, these results suggest multiple transitions to selfing in this system.
References
Baker, H. G. 1955. Self-compatibility and establishment after long-distance dispersal.
Evolution 9:347-349.
Barrett, S. C. H. 2002. The evolution of plant sexual diversity. Nat Rev Genet 3:274-284.
Barrett, S. C. H., and C. G. Eckert. 1990. Variation and evolution of mating systems in
seed plants. Pp. 229-254 in S. Kawano, ed. Biological approaches and
evolutionary trends in plants. Academic Press, New York, New York, USA.
Baudry, E., C. Kerdelhue, H. Innan, and W. Stephan. 2001. Species and recombination
effects on DNA variability in the tomato genus. Genetics 158:1725-1735.
Boggs, N. A., J. B. Nasrallah, and M. E. Nasrallah. 2009. Independent S-locus mutations
caused self-fertility in Arabidopsis thaliana. PLoS Genet 5:e1000426.
Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1993. The effects of deleterious
mutations on neutral molecular variation. Genetics 134:1289-1303.
Charlesworth, D. 2006. Evolution of plant breeding systems. Curr Biol 16:R726-735.
Charlesworth, D., and X. Vekemans. 2005. How and when did Arabidopsis thaliana
become highly self-fertilising. Bioessays 27:472-476.
Charlesworth, D., and Z. Yang. 1998. Allozyme diversity in Leavenworthia populations
with different inbreeding levels. Heredity 81:453-461.
Chiang, Y. H., B. A. Schaal, C. H. Chou, S. Huang, and T. Y. Chiang. 2003. Contrasting
selection modes at the Adhl locus in outcrossing Miscanthus sinensis vs.
inbreeding Miscanthus condensatus (Poaceae). Am J Bot 90:561-570.
Clauss, M. J., and T. Mitchell-Olds. 2006. Population genetic structure of Arabidopsis
lyrata in Europe. Mol Ecol 15:2753-2766.
Coyne, J. A., and H. A. Orr. 2004. Speciation. Sinauer Associates, Inc., Sunderland,
Massachusetts.
Cutter, A. D., and B. A. Payseur. 2003. Selection at Linked Sites in the Partial Selfer
Caenorhabditis elegans. Mol Biol Evol 20:665-673.
Doyle, J. J., and A. N. Egan. 2009. Dating the origins of polyploidy events. New Phytol
186:73-85.
Filatov, D. A., and D. Charlesworth. 1999. DNA polymorphism, haplotype structure and
balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423-1434.
Fisher, R. A. 1941. Average excess and average effect of a gene substitution. Ann Eugen
11:53-63.
Foxe, J. P., T. Slotte, E. A. Stahl, B. Neuffer, H. Hurka, and S. I. Wright. 2009. Rapid
morphological evolution and speciation associated with the evolution of selfing in
Capsella. PNAS 106:5241-5245.
Foxe, J. P., M. Stift, A. Tedder, A. Haudry, S. I. Wright, and B. K. Mable. 2010.
Reconstructing Origins of Loss of Self-Incompatibility and Selfing in North
American Arabidopsis lyrata: a Population Genetic Context. Evolution, in press.
Glémin, S., E. Bazin, and D. Charlesworth. 2006. Impact of mating systems on patterns of
sequence polymorphism in flowering plants. Proc Biol Sci 273:3011-3019.
Guo, Y.-L., J. S. Bechsgaard, T. Slotte, B. Neuffer, M. Lascoux, D. Weigel, and M. H.
Schierup. 2009. Recent speciation of Capsella rubella from Capsella grandiflora,
associated with loss of self-incompatibility and an extreme bottleneck. PNAS
106:5246-5251.
Hoebe, P. N., M. Stift, A. Tedder, and B. K. Mable. 2009. Multiple losses of self-
incompatibility in North-American Arabidopsis lyrata: phylogeographic context
and population genetic consequences. Mol Ecol 18:4924-4939.
Hurka, H., S. Freundner, A. H. Brown, and U. Plantholt. 1989. Aspartate
aminotransferase isozymes in the genus Capsella (Brassicaceae): subcellular
location, gene duplication, and polymorphism. Biochem Genet 27:77-90.
Hurka, H., and B. Neuffer. 1997. Evolutionary processes in the genus Capsella
(Brassicaceae). Plant Sys Evol 206:295-316.
Koch, M., and M. Kiefer. 2005. Genome evolution among cruciferous plants: a lecture
from the genetic maps of three diploid species— Capsella rubella, Arabidopsis
lyrata subsp. petraea, and A. thaliana. Am J Bot 92:761-767.
Liu, F., D. Charlesworth, and M. Kreitman. 1999. The effect of mating system
differences on nucleotide diversity at the phosphoglucose isomerase locus in the
plant genus Leavenworthia. Genetics 151:343-357.
Liu, F., L. Zhang, and D. Charlesworth. 1998. Genetic diversity in Leavenworthia
populations with different inbreeding levels. Proc R Soc Lond B Biol Sci
265:293-301.
Liu, P., S. Sherman-Broyles, M. E. Nasrallah, and J. B. Nasrallah. 2007. A Cryptic
Modifier Causing Transient Self-Incompatibility in Arabidopsis thaliana. Curr
Biol 17:734-740.
Lloyd, D. G. 1979. Some reproductive factors affecting the selection of self-fertilization
in plants. Am Nat 113:67-79.
Mable, B. K., and A. Adam. 2007. Patterns of genetic diversity in outcrossing and selfing
populations of Arabidopsis lyrata. Mol Ecol 16:3565-3580.
Mable, B. K., A. V. Robertson, S. Dart, C. Di Berardo, and L. Witham. 2005. Breakdown
of self-incompatibility in the perennial Arabidopsis lyrata (Brassicaceae) and its
genetic consequences. Evolution 59:1437-1448.
Mallet, J. 2007. Hybrid speciation. Nature 446:279-283.
Nagylaki, T. 1976. A model for the evolution of self-fertilization and vegetative
reproduction. J Theor Biol 58:55-58.
Ness, R. W., S. I. Wright, and S. C. H. Barrett. 2010. Mating-System Variation,
Demographic History and Patterns of Nucleotide Diversity in the Tristylous Plant
Eichhornia paniculata. Genetics 184:381-392.
Nordborg, M. 2000. Linkage disequilibrium, gene trees and selfing: an ancestral
recombination graph with partial self-fertilization. Genetics 154:923-929.
Otto, S. P., and J. Whitton. 2000. Polyploid incidence and evolution. Ann Rev Gen
34:401-437.
Paetsch, M., S. Maryland-Quellhorst, and B. Neuffer. 2006. Evolution of the self-
incompatibility system in the Brassicaceae: identification of S-locus receptor
kinase (SRK) in self-incompatible Capsella grandiflora. Heredity 97:283-290.
Ross-Ibarra, J., S. I. Wright, J. P. Foxe, A. Kawabe, L. DeRose-Wilson, G. Gos, D.
Charlesworth, and B. S. Gaut. 2008. Patterns of Polymorphism and Demographic
History in Natural Populations of Arabidopsis lyrata. PloS One 3:e2411.
Savolainen, O., C. H. Langley, B. P. Lazzaro, and H. Fréville. 2000. Contrasting patterns
of nucleotide polymorphism at the alcohol dehydrogenase locus in the outcrossing
Arabidopsis lyrata and the selfing Arabidopsis thaliana. Mol Biol Evol 17:645-
655.
Schoen, D. J., M. O. Johnston, A.-M. L'Heureux, and J. V. Marsolais. 1997. Evolutionary
history of the mating system in Amsinckia (Boraginaceae). Evolution 51:1090-
1099.
Slotte, T., H. Huang, M. Lascoux, and A. Ceplitis. 2008. Polyploid speciation did not
confer instant reproductive isolation in Capsella (Brassicaceae). Mol Biol Evol
25:1472-1481.
Stebbins, G. L. 1970. Adaptive radiation of reproductive characteristics in angiosperms.
I. Pollination mechanisms. Ann Rev Ecol Sys 1:307-326.
Tang, C., C. Toomajian, S. Sherman-Broyles, V. Plagnol, Y. L. Guo, T. T. Hu, R. M.
Clark, J. B. Nasrallah, D. Weigel, and M. Nordborg. 2007. The evolution of
selfing in Arabidopsis thaliana. Science 317:1070-1072.
Tsuchimatsu, T., K. Suwabe, R. Shimizu-Inatsugi, S. Isokawa, P. Pavlidis, T. Stadler, G.
Suzuki, S. Takayama, M. Watanabe, and K. K. Shimizu. 2010. Evolution of self-compatibility
in Arabidopsis by a mutation in the male specificity gene. Nature
464:1342-1346.
Wood, T. E., N. Takebayashi, M. S. Barker, I. Mayrose, P. B. Greenspoon, and L. H.
Rieseberg. 2009. The frequency of polyploid speciation in vascular plants. PNAS
106:13875-13879.
Wright, S. I., B. Lauga, and D. Charlesworth. 2003. Subdivision and haplotype structure in
natural populations of Arabidopsis lyrata. Mol Ecol 12:1247-1263.
Wright, S. I., R. W. Ness, J. P. Foxe, and S. C. H. Barrett. 2008. Genomic consequences
of outcrossing and selfing in plants. Int J Plant Sci 169:105-118.
Reprinted with permission from the Proceedings of the National Academy of Sciences of the United States of America, 106:5241-5245. Copyright 2009
CHAPTER 2
Recent speciation associated with the evolution of selfing in Capsella
Biological Sciences: Evolution
John Paul Foxe*†, Tanja Slotte*‡, Eli A. Stahl§, Barbara Neuffer¶, Herbert Hurka¶ and Stephen I. Wright†‡
* These authors contributed equally to the work
† Department of Biology, York University, 4700 Keele St., Toronto, ON, Canada M3J 1P3
‡ Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks St., Toronto, ON, Canada M5S 3B2
§ Department of Biology, University of Massachusetts Dartmouth, North Dartmouth, MA 02747, USA
¶ Universität Osnabrück, Fachbereich Biologie/Chemie, Spezielle Botanik, Barbarastrasse 11, D-49076 Osnabrück, Germany
Corresponding Author: Stephen I. Wright Department of Ecology and Evolutionary Biology University of Toronto 25 Willcocks St., Toronto, ON Canada. Phone: 416-946-8508 Fax: 416-978-5878 [email protected]
Abstract
The evolution from outcrossing to predominant self-fertilization represents one of the most common transitions in flowering plant evolution. This shift in mating system is almost universally associated with the 'selfing syndrome', characterized by a marked reduction in flower size and a breakdown of the morphological and genetic mechanisms that prevent self-fertilization. In general, the timescale over which these transitions occur, and the evolutionary dynamics associated with the evolution of the selfing syndrome, are poorly known. We investigated the origin and evolution of selfing in the annual plant Capsella rubella from its self-incompatible, outcrossing progenitor Capsella grandiflora by characterizing multilocus patterns of DNA sequence variation at nuclear genes. We estimate that the transition to selfing and subsequent geographic expansion have taken place during the last 20,000 years. This transition was probably associated with a shift from stable equilibrium towards a near-complete population bottleneck causing a major reduction in effective population size. The timing and severe founder event support the hypothesis that selfing was favored during colonization as new habitats emerged following the last glaciation and the expansion of agriculture. These results suggest that natural selection for reproductive assurance can lead to major morphological evolution and speciation on relatively short evolutionary timescales.
Introduction
Selfing plants benefit from two distinct advantages over their outcrossing
competitors (1-3). First, because selfers are 100% related to their progeny and can
also act as outcross pollen donors for seed produced by other individuals, they have
an inherent transmission advantage over outcrossers (4). A second major advantage
conferred by selfing, first discussed by Darwin (5), is the ability to reproduce when
pollinators or potential mates are limited (reproductive assurance). One important
aspect of reproductive assurance is the ability of selfing lineages to colonize new
habitats from a very small founding population. Evolutionary theory predicts that
selfing will evolve when these advantages outweigh the costs associated with
inbreeding depression, reduced fitness caused by increased homozygosity of
deleterious recessive alleles (6).
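Simple gene-copy accounting makes the transmission advantage concrete. The sketch below is an illustrative textbook calculation, not an analysis from this study. It rests on assumptions introduced here. A fully selfing mutant produces as many seeds as an outcrosser. It exports the same amount of outcross pollen, so there is no pollen discounting. Its selfed progeny survive with probability 1 − δ, where δ is a hypothetical inbreeding-depression parameter. Under these assumptions a selfer transmits three genome copies for every two transmitted by an outcrosser, and selfing spreads whenever δ < 0.5.

```python
"""Gene-copy accounting for the transmission advantage of selfing.

Illustrative assumptions only: equal seed output, no pollen discounting,
and selfed offspring surviving with probability 1 - delta.
"""


def copies_outcrosser() -> float:
    # One genome copy through each ovule and one through exported pollen.
    return 1.0 + 1.0


def copies_selfer(delta: float) -> float:
    # A selfed seed carries two copies of the parent's genome but survives
    # with probability (1 - delta); exported pollen still sires seeds elsewhere.
    return 2.0 * (1.0 - delta) + 1.0


def selfing_favored(delta: float) -> bool:
    # The selfing allele spreads when it transmits more gene copies than an
    # outcrosser, which under these assumptions means delta < 0.5.
    return copies_selfer(delta) > copies_outcrosser()


if __name__ == "__main__":
    print(copies_selfer(0.0) / copies_outcrosser())  # the 3:2 advantage
    for delta in (0.2, 0.4, 0.6):
        print(delta, selfing_favored(delta))
```

Relaxing the no-pollen-discounting assumption shrinks the selfer's pollen term and lowers the δ threshold accordingly.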
In most cases the timescale over which the selfing syndrome has evolved
remains difficult to ascertain. In the model genus Arabidopsis, for example, the selfing
A. thaliana diverged from its closest outcrossing relatives roughly 5 million years ago
(7), and patterns of diversity have suggested a possibly complicated picture of the breakdown of outcrossing over a period of more than a million years (7, 8). Thus it remains unclear how rapidly the selfing syndrome can arise, and what historical conditions have favored mating system evolution. Capturing and characterizing a recent transition is important for inferring this timescale, and the relative importance of various forces favoring its evolution. For example, if selfing is initially favored through the genetic transmission advantage, we would expect that self-compatibility alleles would invade individual populations that were ancestrally outcrossing, and the
recently derived selfing species should maintain considerable levels of genetic variation at regions unlinked to the locus causing selfing (9). In contrast, selection for colonization ability in small, founding populations should lead to a severe reduction in genetic variation genome-wide in the derived selfing lineage.
Here, we investigate the evolution of selfing in diploid species in the genus
Capsella. Capsella rubella is characterized by a high rate of self-fertilization (10) and shows the typical morphological characteristics of a selfing syndrome (11). From genetic marker studies, the selfing rate in C. rubella has been estimated as 1, with a lower-bound estimate of 0.7 (10). In comparison with its self-incompatible congener
C. grandiflora, C. rubella shows a derived breakdown of the self-incompatibility mechanism and highly reduced floral organ sizes (Figure 1). The breakdown of self-incompatibility is also associated with an expansion of geographic range: C. grandiflora is restricted to western Greece and Albania, with local occurrences in northern Italy, while C. rubella has expanded into much of southern Europe, extending to Middle
Europe, Northern Africa and into Australia and North and South America (11, 12).
Interspecific crossing experiments suggest that, in addition to mating system evolution, there is considerable post-pollination reproductive isolation between the species, with only a small proportion of crosses producing viable seed ((11, 13), T.
Slotte, K. Hazzouri, and S. Wright, unpublished). We characterized multilocus patterns of neutral DNA sequence diversity across the genome to investigate the evolutionary history associated with this transition to selfing. We estimated the parameters of a model of isolation with or without migration, and tested the goodness of fit of the data to models incorporating species-specific recombination rates. Our
approach provides a detailed examination of the evolutionary history associated with the origins of selfing in Capsella.
Results
Patterns of Polymorphism
We estimated levels of synonymous and nonsynonymous diversity in a geographically broad sample of both C. rubella (14 individuals) and C. grandiflora
(20 diploid individuals) using direct sequencing of 39 nuclear genes. We identified a total of 587 synonymous single nucleotide polymorphisms (SNPs) and 343 nonsynonymous SNPs in C. grandiflora. In C. rubella, diversity was highly reduced with 81 synonymous and 41 nonsynonymous SNPs. Figure 2a illustrates this major reduction in synonymous diversity in C. rubella when compared to C. grandiflora.
Over a third of loci (36%) were devoid of any variation in C. rubella, and almost half
(46%) lacked synonymous site variation. In contrast, all loci were highly variable in
C. grandiflora.
Although a reduction in sequence diversity in selfing species is predicted under models of long-term equilibrium (14), the diversity reduction seen here is extreme (15), and our sequence data imply a very close relationship between C. rubella and C. grandiflora. At the majority of loci (74%), we identify sequence haplotypes shared between the species, and sequence divergence, when present, is very low. To illustrate the patterns of polymorphism, we plot the joint derived SNP frequency spectrum from twelve samples of each species (Figure 3). More than 80% of the segregating sites found in C. rubella are shared with C. grandiflora, and thus
there is a very small proportion of private polymorphisms present in this species
(Figure 3; private polymorphisms to C. rubella have a derived frequency of 0 in C. grandiflora). We also identify a significant fraction of the segregating sites found in
C. grandiflora to be fixed derived SNPs in C. rubella (Figure 3; derived allele frequency of 12 in C. rubella). Similarly, only 3 fixed differences were identified in a total of over 20,000 bp surveyed for this study. This suggests that the sequence haplotypes present in C. rubella were mostly sub-sampled from existing variation in C. grandiflora.
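The site categories used above can be read directly off outgroup-polarized derived-allele counts. They are shared polymorphisms, polymorphisms private to one species, sites polymorphic in C. grandiflora but fixed for the derived allele in C. rubella, and fixed differences. The following is a minimal sketch. It assumes biallelic sites, no missing data and a fixed sample size per species, here twelve as in the joint spectrum described above. The function names and category labels are hypothetical and do not come from the Perl scripts used in this study.

```python
from collections import Counter


def classify_site(d1: int, n1: int, d2: int, n2: int) -> str:
    """Classify a site from derived-allele counts d1, d2 in samples of size n1, n2."""
    poly1 = 0 < d1 < n1
    poly2 = 0 < d2 < n2
    if poly1 and poly2:
        return "shared polymorphism"
    if poly1:
        # Species 2 is fixed for either the ancestral (0) or derived (n2) allele.
        return "private to species 1" if d2 == 0 else "polymorphic in 1, fixed derived in 2"
    if poly2:
        return "private to species 2" if d1 == 0 else "polymorphic in 2, fixed derived in 1"
    if (d1 == n1) != (d2 == n2):
        return "fixed difference"
    # Both ancestral, or both fixed derived relative to the outgroup.
    return "monomorphic in both"


def joint_sfs(sites, n1: int, n2: int):
    """Matrix whose [i][j] entry counts sites with i derived copies in species 1 and j in species 2."""
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for d1, d2 in sites:
        sfs[d1][d2] += 1
    return sfs


if __name__ == "__main__":
    # Hypothetical sites: (derived count in C. grandiflora, derived count in C. rubella).
    sites = [(5, 3), (7, 0), (2, 12), (0, 4), (12, 0), (0, 0)]
    print(Counter(classify_site(d1, 12, d2, 12) for d1, d2 in sites))
    print(joint_sfs(sites, 12, 12)[5][3])
```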
Analysis of the frequency distribution of polymorphisms and linkage disequilibrium suggest that C. grandiflora represents a large, stable equilibrium population. In particular, the derived frequency spectrum of synonymous SNPs matches closely to the expected distribution under long-term constant population size
(Fig. 2c, SI Fig. 5). Furthermore, linkage disequilibrium decays rapidly with distance
(SI Fig. 6), giving a high estimate of the effective rate of recombination (Fig. 2b), consistent with expectations from neutral equilibrium (10, 16). In contrast, C. rubella shows much greater within-locus linkage disequilibrium and very little effective recombination within genes, consistent with a severe bottleneck and a recent transition towards high selfing (Fig. 2; SI Fig. 6). Nevertheless, the lack of linkage disequilibrium between loci (SI Fig. 6) suggests that C. rubella undergoes some outcrossing, which uncouples the coalescent histories of distant loci. The site frequency spectrum, summarized by Tajima's D (16), shows an elevated variance across loci (Figure 2) relative to the predictions of stable equilibrium in C. rubella (simulation results,
p<0.05; Figure 2), providing further evidence for recent changes in population size
(17).
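Tajima's D contrasts two estimators of θ. One is the mean number of pairwise differences (π). The other is Watterson's estimator, based on the number of segregating sites (S). Negative values indicate an excess of rare variants and positive values an excess of intermediate-frequency variants. A recent bottleneck is expected to inflate the spread of D among loci. The sketch below implements the standard single-locus formula and assumes π is supplied for the whole locus rather than per site. It computes the statistic only. It does not reproduce the simulation-based significance test used in this study.

```python
import math


def tajimas_d(n: int, S: int, pi: float) -> float:
    """Tajima's D for one locus.

    n  -- number of sampled sequences
    S  -- number of segregating sites at the locus
    pi -- mean number of pairwise differences at the locus (not per site)
    """
    if n < 4:
        # For n <= 3, pi and S/a1 always coincide, so the variance term is zero.
        raise ValueError("need at least 4 sequences")
    if S == 0:
        raise ValueError("Tajima's D is undefined when S == 0")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = S / a1  # Watterson's estimator for the whole locus
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    # Hypothetical loci with n = 4: three singletons vs. two sites at frequency 1/2.
    print(tajimas_d(4, 3, 1.5))      # negative: excess of rare variants
    print(tajimas_d(4, 2, 4.0 / 3))  # positive: excess of intermediate variants
```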
Evidence for Single Origin of C. rubella
The extent of shared haplotypes and polymorphism between these species contrasts with the 5-million-year divergence of the selfing species Arabidopsis thaliana from its closest extant outcrossing relatives (7). It suggests one of three possibilities: very recent evolution of selfing, multiple origins of C. rubella, and/or ongoing hybridization. We first sought to test whether selfing C. rubella was derived from a single speciation event, or whether we could detect evidence for multiple origins or recent hybridization. A Bayesian clustering algorithm applied to the entire dataset (18) found the most likely number of clusters to be 2. All C. rubella and C. grandiflora individuals clustered with their own species, with no evidence for very recent hybridization affecting this sample (SI Fig. 7). When we analyze each species separately, we find no evidence for geographic substructure in our sample of C. grandiflora, while C. rubella subdivides into three distinct clusters, broadly consistent with the geographic origins of our samples (SI Fig. 7). Given the wide geographic sampling for this study, the combined results suggest that C. rubella has evolved once, with subsequent geographic expansion and subdivision.
Demographic Model Fitting
Our results provide evidence for a single recent origin of Capsella rubella from Capsella grandiflora, with little or no recent hybridization affecting our
samples. To investigate the dynamics of the speciation event in detail, we used a
Markov Chain Monte Carlo (MCMC) approach based on coalescent simulations to fit a model of isolation with and without migration to the data (19). The approach makes use of the observed information from each locus on the number of shared and unique polymorphisms, as well as fixed differences. For our analysis, we restrict the inference to synonymous sites, to avoid potential effects of selection on the nonsynonymous variants. The model assumes that a single ancestral population of size N_A split into two at time t, and that the two derived populations have distinct effective population sizes (N_1 and N_2).
Although our Structure results suggest little or no hybridization, it remains possible that historical gene flow has contributed to the shared polymorphism between the species. To estimate demographic parameters, we therefore investigated a series of models that varied in the inclusion of symmetrical, asymmetrical or no migration, and varied whether or not the ancestral population size was constrained to be the same size as present-day C. grandiflora. We estimated historical demographic parameters from a total of four alternative models (Table 1, SI Tables 2 to 5). To evaluate the fit of these models to the data, we used goodness-of-fit tests. In particular we simulated datasets from both the posterior distributions and the point parameter estimates of each model, and estimated the probability of seeing our observed multilocus summary statistics, including additional data summaries not used in parameter estimation (see Supplemental Information).
Our parameter estimates from these models are consistent with an extremely recent speciation event associated with a major reduction in effective population size
in C. rubella; Figure 4 shows the parameter estimates from two of the best-fit models, based on the results of goodness-of-fit tests (Table 1 and Supplementary Information).
Under the model with no migration, our most likely estimate of divergence time is approximately 13,500 years ago with a 90% posterior density from 1500 to 51,000 years, suggesting species divergence since the last glacial maximum about 21,000 years ago (20). Similarly, models with migration provide estimates of divergence time of 15,000 or 22,000 years ago. The boundaries on the divergence time estimate across all models examined depend on whether migration is included in the model (Table 1); in particular, allowing for a wide prior probability on migration and time leads to a long tail in the posterior density which is small but non-zero for older divergence
(Figure 4b). However, the best-fitting models clearly support a recent divergence within the last 25,000 years. Estimates of population size parameters suggest that C. grandiflora has maintained a large population size since its divergence from the ancestor, while we infer an approximately 100- to 1,500-fold smaller effective population size in C. rubella (Table 1; Figure 4b). Taken together, these results clearly indicate recent speciation associated with the breakdown of self-incompatibility from a small number of founding lineages.
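The divergence-time intervals quoted above summarize MCMC draws from a skewed posterior with a long right tail. The text does not specify how the interval was constructed. The following sketch therefore assumes one common choice: the shortest interval containing a given fraction of the sorted posterior draws, which is a highest-posterior-density interval when the posterior is unimodal. The function name and the example draws are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.9):
    """Shortest interval containing a fraction `mass` of the sampled values.

    Assumes a unimodal posterior; multimodal posteriors need a different method.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    s = sorted(samples)
    k = math.ceil(mass * len(s))  # number of draws the interval must contain
    best = (s[0], s[k - 1])
    for i in range(1, len(s) - k + 1):
        # Slide a window of k consecutive draws and keep the narrowest one.
        if s[i + k - 1] - s[i] < best[1] - best[0]:
            best = (s[i], s[i + k - 1])
    return best


if __name__ == "__main__":
    draws = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]  # hypothetical, with one far-tail draw
    print(hpd_interval(draws, mass=0.75))
```

For a right-skewed posterior like the divergence-time estimate, this interval sits further left than an equal-tailed interval of the same mass, because it is free to exclude the sparse tail.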
The one exception is a 1.4 million year divergence time estimate from the fully unconstrained model (Table 1, Supplementary Information); this model also infers a very high migration rate from C. rubella to C. grandiflora, an extremely low effective population size in C. rubella and a very low ancestral population size. The inferred level of gene flow under this model seems implausibly high given the mating system, our Structure results, and the evidence for post-pollination reproductive
isolation between the species. Furthermore, goodness-of-fit tests and comparisons of likelihoods suggest that this unconstrained model provides a poorer fit to our data
(Supplementary Information). However, since some models with and without migration appear to provide comparable fits to the observed data, divergence may be too recent to have sufficient power to infer whether or not there has been ongoing gene flow.
One potential concern with the approach used is that the model assumes a constant rate of recombination since the species split. Since the transition to selfing is expected to lead to a dramatic shift in the effective recombination rate (12), this assumption is violated. However, the method has been shown to be robust to inaccuracies in estimation of recombination (17), and given the very recent timescale and massive population bottleneck inferred here, effective recombination rates are suppressed even in the absence of this effect. Consistent with this, goodness-of-fit tests using simulations that allow for a lineage-specific change in recombination rate to zero in C. rubella show that the best-fit models provide an equally adequate fit to the data when this shift is incorporated (SI Table 5).
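Standard approximations for a population with selfing rate s quantify the shift in effective recombination discussed above (see Nordborg 2000). The equilibrium inbreeding coefficient is F = s/(2 − s). Effective population size falls to roughly N/(1 + F), and the effective recombination rate falls to roughly r(1 − F). Combining the two, the population recombination parameter ρ = 4Ner is scaled by (1 − F)/(1 + F), which simplifies algebraically to 1 − s. The sketch below applies these approximations and assumes inbreeding equilibrium, which a lineage that has only recently become selfing may not have reached. The function names and the example ρ value are hypothetical.

```python
def equilibrium_f(s: float) -> float:
    """Equilibrium inbreeding coefficient under a constant selfing rate s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return s / (2.0 - s)


def effective_size(n_census: float, s: float) -> float:
    # Approximation: Ne ~ N / (1 + F).
    return n_census / (1.0 + equilibrium_f(s))


def effective_rho(rho_outcrossing: float, s: float) -> float:
    # rho = 4 * Ne * r; with Ne ~ N/(1+F) and r_eff ~ r(1-F), the scaling
    # factor is (1-F)/(1+F), which equals (1 - s).
    f = equilibrium_f(s)
    return rho_outcrossing * (1.0 - f) / (1.0 + f)


if __name__ == "__main__":
    # Selfing rates of 0.7 and 1 match the estimates cited for C. rubella.
    for s in (0.0, 0.7, 1.0):
        print(s, round(equilibrium_f(s), 3), effective_rho(10.0, s))
```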
Discussion
The timescale of speciation suggested by our analysis is consistent with selfing evolving following the last glacial maximum. C. grandiflora is distributed in the Balkans, which was a glacial refugium for many plant species (11), and this is consistent with our inference of long-term equilibrium for the outcrossing species.
Furthermore, the massive loss of diversity in C. rubella suggests that selfing alleles
did not spread through a previously outcrossing population because of their transmission advantage. Instead, selection for reproductive assurance, either due to a lack of pollinators or via selection for colonization ability, seems the most likely explanation for the evolution of selfing. Under either mode of selection for reproductive assurance, the enhanced colonization ability likely contributed to the geographic spread of selfing. As the glaciers receded, agriculture spread and Europe warmed, new habitats emerged, and this would favor colonization by selfing genotypes that can reproduce in the absence of pollinators and/or available mates
(21-23). Our inference of a very severe population bottleneck associated with the recent evolution of selfing is consistent with this model for the evolution and spread of selfing in Capsella. This result is in line with mounting ecological data suggesting that reproductive assurance is a major factor favoring the evolution and maintenance of selfing (24-26).
If, as our results suggest, C. rubella experienced recent evolution of selfing associated with a severe population bottleneck, we would predict a similar severe loss of variation with little sequence divergence at the self-incompatibility locus. Recent results confirm this prediction (Guo, Y., et al, unpublished), providing support for our inference of recent speciation tied to the breakdown of self-incompatibility.
In A. thaliana, the transition to selfing appears to have been gradual, and occurred considerably further back in time (about 1 Myr or more), providing ample time for the evolution of morphological traits associated with the selfing habit (8). In contrast, our inference of recent speciation in Capsella implies that extensive phenotypic evolution has occurred on a very short evolutionary timescale. Given the
evidence for rapid morphological evolution, one possible alternative explanation to a severe bottleneck in C. rubella is the occurrence of positive selection reducing diversity genome-wide. Although it is likely that the evolution of selfing and spread through Europe have led to novel selective pressures, our observed low between-locus linkage disequilibrium (SI Figure 6) implies that this would likely be restricted to a small subset of the genome even in this highly selfing species, unless the extent of positive selection is very high. Furthermore, positive selection at a subset of loci would be expected to increase the variance in diversity across loci beyond the bottleneck expectations, but goodness-of-fit tests suggest that our models are sufficient to explain the observed variance (SI Tables 4 and 5). Finally, the observed retention of shared ancestral polymorphism in C. rubella is unlikely under a model of genome-wide selective sweeps (SI Figure 8).
Given the extreme population bottleneck, this nevertheless implies that rapid evolution has occurred from small amounts of standing variation. Understanding the genomic extent and genetic basis of these changes, and testing the extent to which phenotypic evolution occurred via positive selection or relaxation of stabilizing selection, will be of considerable interest in future investigations.
Methods
Species Sampling and DNA Sequencing
We collected nucleotide polymorphism data from 39 large exons (see SI Table
6) in C. grandiflora and C. rubella. A total of 20 diploid C. grandiflora and 14 C. rubella individuals were used for this study. The C. grandiflora samples originated
from eight natural localities in Greece. The plants from which sequences were obtained were four individuals from Ioannina; three individuals from the Katara Pass; two individuals from Metsovo; four individuals from Sokraki on the island of Corfu; three individuals from Doukades, Corfu; two individuals from Votonosi, Corfu; one individual from Paleokastritsas; and one individual from Pantokrator, Corfu. The C. rubella samples originated from four natural localities. Four individuals were from
Buenos Aires, Argentina; four were from Cumbre Dorsal, Tenerife, Spain; three were from Caldera de Taburiente, La Palma, Spain; and three were from Ioannina, Greece.
Outgroup data from Boechera stricta was obtained using DNA kindly provided by
B.H. Song and T. Mitchell-Olds.
Seeds were placed at 4°C on wet filter paper for 14 days before being allowed to germinate at room temperature. The seedlings were grown at 20°C under conditions of 18 hours of light and 6 hours of darkness. After 6 weeks of growth,
DNA was extracted from leaf material using a FastDNA® Kit and the FastPrep®
Instrument (Qbiogene, Inc., CA).
PCR primers for the large exons were designed as described by Wright et al. and Ross-Ibarra et al. (27, 28). Briefly, primers were designed to amplify 650-700 bp from single large exons based on the A. thaliana genome sequence, chosen with no a priori expectation as to their function or the action of selection upon these genes.
Each exon was used as a BLAST query against the shotgun genome sequence of
Brassica oleracea and homologous regions were used to design primers using
PrimerQuest (Integrated DNA Technologies). PCR reactions were carried out in
25 µL volumes on an Eppendorf Mastercycler. The cycling conditions were as follows: 2 minutes
at 94°C, followed by 35 cycles of 20 seconds at 94°C, 20 seconds at 55°C and 40 seconds at 72°C, with a final extension of 4 minutes at 72°C.
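The thermal-cycling program above can be written out as data. This makes the step order explicit and lets the programmed run time be checked. The step labels below are conventional names we assume, since the text gives only temperatures and durations. The total ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str    # conventional step name (assumed, not given in the text)
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time


INITIAL = Step("initial denaturation", 94, 120)
CYCLE = (
    Step("denaturation", 94, 20),
    Step("annealing", 55, 20),
    Step("extension", 72, 40),
)
N_CYCLES = 35
FINAL = Step("final extension", 72, 240)


def programmed_hold_seconds() -> int:
    """Sum of hold times for the whole run, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


if __name__ == "__main__":
    total = programmed_hold_seconds()
    print(f"{total} s (about {total / 60:.1f} min) of programmed holds")
```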
These products were sequenced directly using ABI sequencing by Cogenics
(Houston, TX). Chromatograms were checked manually for heterozygous sites, using
Sequencher version 4.7 (Gene Codes, Ann Arbor, MI), with the aid of the 'Call secondary peaks' option. Sequences were aligned using Genedoc (29). Consistent with high levels of selfing, no heterozygous sites were identified in our C. rubella dataset. This complete lack of heterozygosity in C. rubella also allowed us to confirm that we were sequencing single copy regions only. Nucleotide sequences are deposited in GenBank under accession numbers (FJ182244-FJ183352).
Sequence Statistics and Analysis
Sequence-based summary statistics θ (30) and π (31) at synonymous and nonsynonymous sites, as well as frequency data, were calculated using a modified version of Perl code (Polymorphurama) written by D. Bachtrog and P. Andolfatto (U.C. San
Diego). The frequency spectra of derived polymorphic variants and the numbers of shared derived polymorphisms, unique polymorphisms and fixed differences were calculated using Perl scripts written by S. Wright.
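The two diversity statistics named above, and Tajima's D computed from them, can be sketched in a few lines of Python. This is a minimal illustration of the standard formulas: Watterson's θ per site, π as the mean number of pairwise differences, and Tajima's (1989) D. It is not the Polymorphurama code, and the function names are hypothetical. It also ignores gaps, missing data and the split into synonymous and nonsynonymous sites.

```python
from itertools import combinations
from math import sqrt


def harmonic(n):
    """a1 = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def segregating_sites(seqs):
    """Count alignment columns carrying more than one base (no gap/N handling)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """k-hat: average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def watterson_theta(S, n, sites):
    """Watterson's theta per site: S / (a1 * L)."""
    return S / (harmonic(n) * sites)


def tajimas_d(k_hat, S, n):
    """Tajima's (1989) D from k-hat, S and sample size n (NaN when S == 0)."""
    if S == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = harmonic(n)
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k_hat - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy alignment: four sequences, six sites, two of them segregating.
seqs = ["ACGTAC", "ACGTTC", "ACGAAC", "ACGTAC"]
S = segregating_sites(seqs)
k_hat = mean_pairwise_differences(seqs)
print(S, k_hat / 6, watterson_theta(S, len(seqs), 6), tajimas_d(k_hat, S, len(seqs)))
```

As a check, the values reported for At1g01040 in SI Table 6 can be plugged in: S = 6, n = 34, 120.11 synonymous sites, and π = 0.01248 per site (π multiplied by the number of sites gives k-hat). These give θ ≈ 0.01222 and D ≈ 0.060, matching the tabulated values.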
Population recombination rate estimates were calculated using the composite likelihood approach of Hudson (32). We used the maxdip program for diploid unphased data in C. grandiflora and LDhat (33) for C. rubella, with data processing code written by J. Ross-Ibarra (http://rossibarra.googlepages.com/ldpipereadme). Linkage disequilibrium, measured as the squared correlation coefficient r², was calculated using Weir's algorithm (34) for unphased data with an R script kindly provided by
Stuart MacDonald (U. Kansas). DnaSP (version 4.0) (35) was used to calculate linkage disequilibrium in C. rubella. To assess the significance of the mean and variance of Tajima's D estimates, we used Jody Hey's HKA program
(http://lifesci.rutgers.edu/heylab/DistributedProgramsandData.htm#HKA).
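No heterozygous sites were found in C. rubella, so each C. rubella sequence can be treated as a haplotype. The r² between two biallelic sites then follows directly from haplotype frequencies. The sketch below shows only that phased-data calculation, under this assumption. It is not Weir's estimator for unphased genotypes used for C. grandiflora, nor the DnaSP implementation, and the function name is hypothetical.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites observed on the same set of haplotypes.

    site_a and site_b give the allele carried by each haplotype at each site.
    """
    if len(site_a) != len(site_b):
        raise ValueError("sites must be scored on the same haplotypes")
    alleles_a, alleles_b = sorted(set(site_a)), sorted(set(site_b))
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        raise ValueError("both sites must be biallelic")
    n = len(site_a)
    A, B = alleles_a[0], alleles_b[0]  # arbitrary reference allele at each site
    p_a = sum(x == A for x in site_a) / n
    p_b = sum(y == B for y in site_b) / n
    p_ab = sum(x == A and y == B for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


print(r_squared("AAGG", "CCTT"))  # 1.0: complete association
print(r_squared("AAGG", "CTCT"))  # 0.0: no association
```

Because r² squares D, the choice of reference allele at each site does not change the result.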
To infer haplotypes in C. grandiflora, we used PHASE 2.1, which implements a Bayesian statistical method to reconstruct haplotypes from diploid data
(36, 37). Haplotypes with the highest posterior probabilities were used for cluster analysis with the program STRUCTURE (18). The program was run under the haploid model for values of K (number of populations) from 1 to 7, each with
1,000,000 repetitions and a burn-in period of 100,000.
Coalescent Simulations
Coalescent simulations were conducted using MIMAR, which estimates the parameters of an isolation-migration model using simulations based on Hudson's ms (38). Because
MIMAR uses outgroup information to infer the number of derived SNP fixations, we made every effort to minimize errors associated with ancestral state inference, as described in the Supplemental Information. Simulations using a modified version of MIMAR that does not rely on outgroup inference provide comparable estimates under all models (SI Table 7).
Simulations were conducted using the 25 loci for which we had sequence data from three outgroup species for ancestral state inference. Sites with more than two segregating bases were excluded from the analysis. To allow for locus-specific mutation rates, the mutation rate scalar for each of the 26 loci was estimated by dividing its θsyn by the mean θsyn. We ran a series of MIMAR coalescent simulations that differed in the degree of constraint on migration rates and effective population sizes (Table 1). Migration rates were either unconstrained (asymmetric migration), constrained to be symmetrical, or set to zero (no migration). Effective population sizes were either unconstrained or assumed to be identical in C. grandiflora and in the ancestor of C. rubella and C. grandiflora.
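The per-locus mutation rate scalar described above is simple arithmetic. Each locus's θsyn is divided by the mean θsyn across loci, so the scalars average to one. Below is a minimal Python sketch. The locus names and θsyn values are hypothetical and chosen only for illustration; they are not data from this study.

```python
def mutation_rate_scalars(theta_syn):
    """Map locus -> theta_syn / mean(theta_syn); the scalars average to 1."""
    mean_theta = sum(theta_syn.values()) / len(theta_syn)
    return {locus: theta / mean_theta for locus, theta in theta_syn.items()}


# Hypothetical values, for illustration only.
example = {"locus_1": 0.010, "locus_2": 0.020, "locus_3": 0.030}
print(mutation_rate_scalars(example))  # about 0.5, 1.0 and 1.5
```

In this naive form, a locus with no synonymous polymorphism would receive a scalar of zero. A real analysis would need to handle that case explicitly.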
Prior limits for the Bayesian procedure implemented in MIMAR were set based on initial runs using wide priors. Priors for θ were uniform on 0.001–0.1 for both
C. grandiflora and the ancestral species, and uniform on 0–0.0025 for C. rubella. All runs assumed an exponentially distributed prior with rate 1 for ρ/θ, and a mutation rate of 1.5 × 10⁻⁸ per bp (39). Migration rate priors were log-uniform between -5 and 2.5 for migration from C. grandiflora to C. rubella (forward in time) and log-uniform between -5 and 6 for migration from C. rubella to C. grandiflora. The prior for the time of the split between C. rubella and C. grandiflora was uniform on 0–4 × 10⁶.
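The θ priors above can be translated into effective population sizes using the stated μ = 1.5 × 10⁻⁸ per bp. This assumes the standard diploid relation θ = 4Neμ, which the text does not spell out, so treat it as an assumption of this sketch. The helper names are hypothetical.

```python
MU = 1.5e-8  # per-bp mutation rate assumed in all MIMAR runs (from the text)


def theta_to_ne(theta_per_bp, mu=MU):
    """Effective population size implied by theta = 4 * Ne * mu (assumed relation)."""
    return theta_per_bp / (4 * mu)


def ne_to_theta(ne, mu=MU):
    """Inverse of theta_to_ne."""
    return 4 * ne * mu


priors = {
    "C. grandiflora and ancestor": (0.001, 0.1),
    "C. rubella": (0.0, 0.0025),
}
for label, (low, high) in priors.items():
    print(f"{label}: Ne from {theta_to_ne(low):,.0f} to {theta_to_ne(high):,.0f}")
```

For instance, θ = 0.03 per bp corresponds to Ne = 500,000. This is close to the C. grandiflora modes of roughly 460,000–525,000 in Table 1, where Ne is reported in thousands of individuals.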
Each simulation was run for a total of 20,160 minutes (2 weeks) with a burn-in of 100,000 steps, and we performed three sets of simulations with different random seeds for each model. Mixing was monitored by assessing parameter autocorrelation over runs, and we considered that MIMAR had reached convergence when the posterior distributions from independent runs were highly similar (19). The mode of the marginal posterior probability distribution was taken as the point estimate for each parameter, and we calculated 90% highest posterior density (HPD) intervals from the
MIMAR output using the boa package in R 2.6.2 (40).
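An HPD interval can be approximated from MCMC draws as the shortest interval containing the required fraction of the sorted samples. The Python sketch below illustrates that common estimator. It is not the boa code, it has not been checked against boa's output, and the function name is hypothetical. It assumes a unimodal posterior, which matters here because some divergence-time posteriors in Table 1 are bimodal.

```python
import math


def hpd_interval(samples, prob=0.90):
    """Shortest interval containing a fraction `prob` of the posterior draws.

    Assumes a unimodal posterior; for a bimodal posterior a single interval
    can span a low-density gap between the modes.
    """
    xs = sorted(samples)
    n = len(xs)
    # Number of draws the interval must contain (small guard against float error).
    k = max(1, math.ceil(prob * n - 1e-9))
    if k > n:
        raise ValueError("prob must be in (0, 1]")
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


draws = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]  # toy, right-skewed "posterior"
print(hpd_interval(draws, 0.90))  # (0, 8)
```

The shortest-window approach is why the long right tail (the draw at 100) is excluded. An equal-tailed interval would instead trim the same number of draws from each end.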
Acknowledgements
We thank C. Becquet for assistance and advice with running MIMAR, and S.C.H.
Barrett and A. Cutter for comments on the manuscript. This work was supported by a
Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grant, an Early Researcher Award from the Ontario Ministry of Research and
Innovation, and an Alfred P. Sloan Foundation Fellowship to SIW.
References
1. Charlesworth D (2006) Evolution of plant breeding systems. Curr Biol 16(17):R726-R735.
2. Stebbins GL (1950) Variation and Evolution in Plants (Columbia Univ Press, New York).
3. Barrett SCH (2002) The evolution of plant sexual diversity. Nat Rev Genet 3(4):274-284.
4. Fisher RA (1941) Average excess and average effect of a gene substitution. Ann Eugen 11:53-63.
5. Darwin CR (1876) The Effects of Cross and Self-Fertilization in the Vegetable Kingdom (John Murray, London).
6. Charlesworth D & Charlesworth B (1987) Inbreeding depression and its evolutionary consequences. Annu Rev Ecol Syst 18:237-268.
7. Koch MA, Haubold B, & Mitchell-Olds T (2000) Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol Biol Evol 17(10):1483-1498.
8. Tang C, et al. (2007) The evolution of selfing in Arabidopsis thaliana. Science 317(5841):1070-1072.
9. Schoen DJ, Morgan MT, & Bataillon T (1996) How does self-pollination evolve? Inferences from floral ecology and molecular genetic variation. Philos Trans R Soc Lond B 351:1281-1290.
10. Hurka H, Freundner S, Brown AH, & Plantholt U (1989) Aspartate aminotransferase isozymes in the genus Capsella (Brassicaceae): subcellular location, gene duplication, and polymorphism. Biochem Genet 27(1-2):77-90.
11. Hurka H & Neuffer B (1997) Evolutionary processes in the genus Capsella (Brassicaceae). Plant Syst Evol 206:295-316.
12. Paetsch M, Mayland-Quellhorst S, & Neuffer B (2006) Evolution of the self-incompatibility system in the Brassicaceae: identification of S-locus receptor kinase (SRK) in self-incompatible Capsella grandiflora. Heredity 97:283-290.
13. Koch M & Kiefer M (2005) Genome evolution among cruciferous plants: a lecture from the comparison of the genetic maps of three diploid species – Capsella rubella, Arabidopsis lyrata subsp. petraea, and A. thaliana. Am J Bot 92(4):761-767.
14. Charlesworth D & Wright SI (2001) Breeding systems and genome evolution. Curr Opin Genet Dev 11(6):685-690.
15. Wright SI, Ness RW, Foxe JP, & Barrett SCH (2008) Genomic consequences of outcrossing and selfing in plants. Int J Plant Sci 169(1):105-118.
16. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
17. Wright SI & Gaut BS (2005) Molecular population genetics and the search for adaptive evolution in plants. Mol Biol Evol 22(3):506-519.
18. Pritchard JK, Stephens M, & Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155(2):945-959.
19. Becquet C & Przeworski M (2007) A new approach to estimate parameters of speciation models with application to apes. Genome Res 17(10):1505-1519.
20. Kageyama M, et al. (2001) The Last Glacial Maximum climate over Europe and western Siberia: a PMIP comparison between models and data. Clim Dyn 17(1):23-43.
21. Ammerman AJ & Cavalli-Sforza LL (1984) The Neolithic Transition and the Genetics of Populations in Europe (Princeton Univ Press, Princeton).
22. Barker G (1985) Prehistoric Farming in Europe (Cambridge Univ Press, Cambridge).
23. Pinhasi R, Fort J, & Ammerman AJ (2005) Tracing the origin and spread of agriculture in Europe. PLoS Biol 3(12):e410.
24. Baker HG (1955) Self-compatibility and establishment after long-distance dispersal. Evolution 9:347-349.
25. Pannell JR & Barrett SCH (1998) Baker's law revisited: reproductive assurance in a metapopulation. Evolution 52:657-668.
26. Eckert CG, Samis KE, & Dart S (2006) Reproductive assurance and the evolution of uniparental reproduction in flowering plants. Ecology and Evolution of Flowers, eds Harder LD & Barrett SCH (Oxford Univ Press, Oxford), pp 183-203.
27. Wright SI, et al. (2006) Testing for effects of recombination rate on nucleotide diversity in natural populations of Arabidopsis lyrata. Genetics 174(3):1421-1430.
28. Ross-Ibarra J, et al. (2008) Patterns of polymorphism and demographic history in natural populations of Arabidopsis lyrata. PLoS ONE 3(6):e2411.
29. Nicholas KB, Nicholas HB Jr, & Deerfield DW II (1997) GeneDoc: analysis and visualization of genetic variation. EMBNEW.NEWS 4:14.
30. Watterson GA (1975) On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7:256-276.
31. Tajima F (1993) Measurement of DNA polymorphism. Mechanisms of Molecular Evolution, eds Takahata N & Clark AG (Japan Scientific Societies Press, Tokyo), pp 37-60.
32. Hudson RR (2001) Two-locus sampling distributions and their application. Genetics 159:1805-1817.
33. McVean G, Awadalla P, & Fearnhead P (2002) A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160(3):1231-1241.
34. Weir BS (1990) Genetic Data Analysis (Sinauer Associates, Sunderland, MA).
35. Rozas J, Sánchez-DelBarrio JC, Messeguer X, & Rozas R (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19(18):2496-2497.
36. Stephens M, Smith NJ, & Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68(4):978-989.
37. Stephens M & Donnelly P (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73(5):1162-1169.
38. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18(2):337-338.
39. Koch M, Haubold B, & Mitchell-Olds T (2001) Molecular systematics of the Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences. Am J Bot 88:534-544.
40. Smith BJ (2007) boa: an R package for MCMC output convergence assessment and posterior inference. J Stat Softw 21(11):1-37.
Table 1. Modes of parameter estimates under a range of MIMAR models, with 90% HPD intervals in parentheses.

Model | Ne(Cg)ᵃ | Ne(Cr)ᵃ | Ne(A)ᵃ | M(Cg→Cr)ᵇ | M(Cr→Cg)ᶜ | Tᵈ
1 Ancestral size constrained, no migration | 488.2 (412.2, 536.0) | 3.0 (0.5, 11.5) | 488.2ᵉ (412.2, 536.0) | – | – | 13.6 (1.5, 51.7)
2 Ancestral size constrained, symmetrical migration | 463.6 (352.4, 602.2) | [illegible] (2.0, 17.5) | 463.6ᵉ (352.4, 602.2) | 1.9ᵉ (0.007, 4.2) | 1.9ᵉ (0.007, 4.2) | [illegible] (0.8, 3546.8)
3 Ancestral size constrained, asymmetrical migration | 487.5 (400.4, 540.9) | 0.3 (0.06, 6.0) | 487.5ᵉ (400.4, 540.9) | [illegible] (0.009, 3.1) | 50.9 (7.0, 336.9) | [illegible] (3.5, 3579.5)
4 Ancestral size unconstrained, asymmetrical migration | 525.0 (410.4, 652.1) | 0.4 (0.06, 5.5) | [illegible] (16.7, 508.5) | 1.0 (0.01, 4.2) | 73.0 (13.3, 399.8) | [illegible] (13.3, 3156.1)

ᵃ Effective population size (effective number of individuals × 10⁻³) for C. rubella (Cr), C. grandiflora (Cg) and their ancestor (A).
ᵇ Migration rate (4Nem) from C. grandiflora to C. rubella.
ᶜ Migration rate (4Nem) from C. rubella to C. grandiflora.
ᵈ Time (ka) of the split of C. rubella and C. grandiflora.
ᵉ Constrained.
ᶠ Bimodal distribution, with second mode <30,000 years. See also goodness-of-fit tests in SI Text.
Figure 1. Floral organs and petals are reduced in C. rubella (left) compared to C. grandiflora (right).
Figure 2. Comparison of polymorphism patterns between C. grandiflora and C. rubella. Bars represent the median, boxes the interquartile range, and whiskers extend to 1.5 times the interquartile range. a) Synonymous π, where π is the average number of pairwise differences (16). b) The population recombination estimator ρ per base pair, using the composite likelihood estimator of Hudson (32). c) Tajima's D (16) in C. grandiflora and C. rubella.
Figure 3. Derived SNP frequencies in C. grandiflora and C. rubella calculated using
A. thaliana as an outgroup. For this plot, we randomly subsampled the data to include
12 individuals from each species. Derived allele frequencies greater than 0 and less than 12 represent polymorphisms, a frequency of 12 is a derived fixation, and a frequency of zero implies complete absence of the derived
SNP. For example, a SNP whose derived allele is at frequency 0 or 12 in one species but polymorphic in the other represents a polymorphism unique to the other species, while derived frequencies greater than zero and less than 12 in both species represent shared polymorphisms.
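The frequency classes described in this caption map directly onto the SNP categories counted elsewhere in the text. Below is a minimal Python sketch of that classification. It assumes 12 sampled chromosomes per species, as in the figure, and derived-allele counts already polarized with the outgroup. The function and category names are hypothetical.

```python
def classify_snp(derived_cg, derived_cr, n_cg=12, n_cr=12):
    """Classify one SNP from derived-allele counts in C. grandiflora and C. rubella."""
    poly_cg = 0 < derived_cg < n_cg
    poly_cr = 0 < derived_cr < n_cr
    if poly_cg and poly_cr:
        return "shared"
    if poly_cg:
        return "unique_grandiflora"
    if poly_cr:
        return "unique_rubella"
    # Neither sample is polymorphic: each count is 0 or the sample size.
    if derived_cg == n_cg and derived_cr == n_cr:
        return "derived_fixed_in_both"
    if derived_cg == 0 and derived_cr == 0:
        return "derived_absent"
    return "fixed_difference"


def fraction_shared_in_rubella(snps):
    """Share of C. rubella polymorphisms that are also polymorphic in C. grandiflora."""
    classes = [classify_snp(cg, cr) for cg, cr in snps]
    shared = classes.count("shared")
    unique = classes.count("unique_rubella")
    return shared / (shared + unique)  # fails if C. rubella has no polymorphisms


# Hypothetical (C. grandiflora, C. rubella) derived counts out of 12 each.
toy = [(3, 5), (0, 4), (12, 0), (6, 6), (2, 0)]
print([classify_snp(cg, cr) for cg, cr in toy])
print(fraction_shared_in_rubella(toy))
```

The second function computes the same kind of quantity as the fraction of C. rubella variation shared with C. grandiflora (84%) discussed in the SI.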
Figure 4. Smoothed marginal posterior distributions of speciation parameters
estimated by MIMAR for the two models whose posterior modes fit the data
summaries well. Both models assume an ancestral effective population size equal to that of present-day C. grandiflora, with either symmetric migration (solid lines) or no migration (dashed lines). (A) Marginal densities for effective population size (individuals) in C. rubella (grey) and in C. grandiflora/the ancestral species. (B) Marginal densities for
divergence time (years). (C) Marginal density for migration (4Nm), where N is the
effective population size of C. grandiflora.
Supplementary Information
MIMAR Model Assessment
MIMAR simulates samples under a model of isolation with migration with six parameters, where population 1 is C. grandiflora and population 2 is C. rubella: θ1, θ2,
θA, M12 (migration rate from population 1 to population 2), M21 (migration rate from population 2 to population 1), and τ (divergence time). We considered four nested models of demographic history. Model 1 assumed no migration and constrained the ancestral effective population size to equal that of present-day C. grandiflora (population 1), i.e. θA = θ1, M12 = M21 = 0. Model 2 allowed for symmetric migration (θA = θ1, M12 = M21). Model 3 allowed for asymmetric migration (θA = θ1). Model 4 allowed for asymmetric migration and a free effective population size for each population, including the ancestor.
Goodness-of-fit Tests and Likelihood Ratios
We performed goodness-of-fit tests using MIMARgof (1) and a modified version of ms, mspopr, which can simulate a lineage-specific change in population recombination rate (E. Stahl, unpublished). For each model, we generated
10,000 simulations of 25 loci using the same number of sites and individuals sequenced per locus as in the original data set. Another set of 10,000 simulations of
25 loci was run in mspopr, including a lineage-specific change in recombination in
C. rubella but with parameter estimates for each model otherwise unchanged.
Specifically, we set the population recombination rate to zero in the C. rubella lineage
at the time of the split of C. grandiflora and C. rubella. Finally, we also performed
goodness-of-fit tests using the predictive posterior distributions of each model. Model
fit was assessed by calculating one-tailed P-values for observed summary statistics,
based on distributions of simulated statistics. In addition to the mean over all loci of
shared and unique variants used by MIMAR, we performed goodness-of-fit tests on
mean nucleotide diversity estimates, mean Fst and mean Tajima's D, to assess how
well the model could accommodate aspects of data not directly used by MIMAR. To
further assess the fit of the data to a model with no locus-specific positive selection,
we also assessed the fit of the data to the variance in π and Tajima's D (varπ1, varπ2,
varTajD1, varTajD2).
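A one-tailed P value of this kind can be computed as the fraction of simulated values at least as extreme as the observed one, in whichever direction the observation falls. The Python sketch below takes the smaller of the two tail fractions. That choice is our assumption, consistent with the tabulated P values never exceeding 0.5; it is not a documented detail of MIMARgof. The function name is hypothetical.

```python
import random


def one_tailed_p(observed, simulated):
    """Smaller tail fraction of simulated statistics relative to the observed value."""
    n = len(simulated)
    lower = sum(s <= observed for s in simulated) / n
    upper = sum(s >= observed for s in simulated) / n
    return min(lower, upper)


# Toy check against 10,000 draws from a standard normal "null" distribution.
rng = random.Random(1)
sims = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
print(one_tailed_p(-1.645, sims))  # approximately 0.05
```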
Goodness-of-fit tests were performed in two ways: 1) simulating under the
Bayesian posterior distribution of parameters (i.e. 'predictive posterior'), and 2)
simulating under the marginal modes of the posterior parameter distributions. The
second approach allowed us to assess the degree to which a single demographic
model can explain all aspects of the data, while the first approach allows a
comparison of the overall fit of each inferred posterior distribution. As shown in
Table 2, predictive posterior simulations indicate that all models fit the data equally
well under this criterion. Model 1 is therefore sufficient to explain our data,
without the need to invoke migration or a change in effective population size
between population 1 and the ancestral population. In other words, we cannot reject the null hypothesis of no migration and equal effective population sizes in population 1 and the ancestral population. A likelihood ratio test based on maximum likelihood estimates across the MIMAR runs leads to a similar conclusion: the more parameter-rich models do not significantly improve the likelihood over model 1 (Table 3). Although the likelihoods in this analysis should be considered approximate, the combined results of the predictive posterior and likelihood analyses are consistent with the hypothesis that model 1 adequately explains the data.
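The likelihood ratio test described above can be reproduced from the log-likelihoods in SI Table 3. The Python sketch below computes 2(L_model - L_1) and a chi-squared upper-tail probability. It uses closed forms valid for 1 to 3 degrees of freedom, which is enough for these nested models, and needs only the standard library. The helper names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable; closed form for df = 1, 2, 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("closed form implemented only for df = 1, 2, 3")


def lrt(lnl_null, lnl_alt, extra_params):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its chi-squared P value."""
    stat = 2 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, extra_params)


# (ln likelihood, number of free parameters) as reported in SI Table 3.
models = {1: (-117.68, 3), 2: (-117.66, 4), 3: (-116.29, 5), 4: (-114.08, 6)}
lnl_1, k_1 = models[1]
for m in (2, 3, 4):
    lnl, k = models[m]
    stat, p = lrt(lnl_1, lnl, k - k_1)
    print(f"model {m}: 2(L{m} - L1) = {stat:.2f}, df = {k - k_1}, P = {p:.3f}")
```

Recomputing from the rounded log-likelihoods gives 7.20 for model 4, which SI Table 3 lists as 7.3. All three P values exceed 0.05, matching the "ns" entries.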
As shown in Table 4, simulations under the marginal modes provide further evidence in favor of the simpler models over the parameter-rich models. In particular, best-fit parameters under models 1 and 2 are consistent with all tested summary statistics, whereas model 3 fails one goodness-of-fit test, and model 4 fails five tests.
Very similar conclusions are obtained when we allow for a lineage-specific change in recombination (Table 5). One possible explanation for the poor fit of model 4 is that the combination of modes from the marginal posterior distributions differs from the best-fitting parameter set. To explore this possibility, we also conducted goodness-of-fit tests under the maximum likelihood parameter estimates for each model. These simulations improve the fit considerably. However, the unconstrained model still fits the data more poorly than the simpler models for summary statistics not used in estimation (Tables 4 and 5). In particular, the observed values of nucleotide diversity, Tajima's D and the variance of Tajima's D in C. grandiflora have low probabilities under this model. This inconsistency likely arises because this model infers high gene flow into, and population growth in, C. grandiflora. Both conflict with the patterns observed in this species, which are consistent with a stable equilibrium.
Exploring the Effects of Positive Selection on Diversity
If recurrent positive selection reduced diversity in C. rubella across the genome, we would predict that these selective events would erode ancestral variation, so that most segregating variation would be unique to
C. rubella. Contrary to this expectation, we found that 84% of variation in C. rubella is shared with C. grandiflora. To illustrate this expectation, we simulated
10,000 39-gene datasets under three models:
a) the inferred bottleneck model assuming no migration;
b) the selective sweep model of Thornton and Jensen (2007), allowing for only a slight bottleneck of a twofold reduction in effective population size (2);
c) the selective sweep model of Innan and Kim (2008) under a slight bottleneck (3), with positive selection acting on standing variation.
All simulations assumed a divergence time of 14,000 years and an ancestral θ equal to that of C. grandiflora, 0.03 per base pair.
Under the bottleneck model, 60% of 39-gene datasets show 84% or more shared polymorphisms in C. rubella, consistent with our observed data (SI Figure 8).
In contrast, we simulated under a model of selective sweeps with only a slight bottleneck (reduction of Ne by half in C. rubella) and a selective sweep during the bottleneck (4Ns = 6500). None of the resulting 10,000 39-gene datasets had this high a fraction of shared polymorphisms, and the maximum proportion of shared polymorphisms was 0.46 (SI Figure 8). Similarly, the model of selection from standing variation generated no datasets with the observed fraction of shared polymorphism, and the maximum proportion observed was 0.79. Although these simulations are not exhaustive, they illustrate that under reasonable parameter values, we would not expect genome-wide positive selection to lead to the observed maintenance of shared polymorphism.
Generating Data Summaries and the Use of Ancestral State Inference
Because MIMAR relies on the inference of derived SNPs, we made every effort to minimize errors associated with ancestral misinference. To determine derived states we used PAML (4) to perform likelihood reconstruction of ancestral states for the common ancestor of Capsella under various substitution models. These likelihood reconstructions were based upon sequence data from A. thaliana, A. lyrata, B. stricta,
C. grandiflora and C. rubella. Because the phylogenetic position of Capsella relative to Boechera and Arabidopsis is somewhat uncertain (5, 6), we assumed a star-shaped phylogeny for the three genera. For the purposes of these reconstructions, we also assumed that within-Capsella genealogies are star shaped. Data reported here are from likelihood reconstructions under the Kimura two-parameter (K80) model, which distinguishes between transitions and transversions. The reconstructed ancestor of Capsella was then used to infer ancestral states for the Capsella polymorphism data. Using the method of Baudry and Depaulis (2003), we estimated the rate of ancestral misinference in our dataset as 0.084 per base, using transition/transversion ratios estimated directly at synonymous sites (7). To explore any effect that this residual error may have on demographic inference, we generated a modified version of the MIMAR code that uses data summaries independent of outgroup inference. In particular, we modified Ss to include only shared polymorphisms, while S1 and S2 represent polymorphisms unique to populations 1 and 2, respectively, regardless of ancestral vs. derived states. As shown in Table 7, parameter estimates using this approach are very much in line with the estimates reported in Table 1 of the text.
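The outgroup-free summaries used in the modified MIMAR runs can be computed from the two population samples alone. The Python sketch below is a hypothetical helper, not the modified MIMAR code. It counts sites polymorphic only in population 1 (S1), only in population 2 (S2), or in both (Ss). Following the main text, it skips sites with more than two bases across both samples.

```python
def unpolarized_summaries(pop1, pop2):
    """Return (S1, S2, Ss) for two aligned samples, without ancestral states.

    pop1 and pop2 are lists of aligned sequences of equal length.
    """
    s1 = s2 = ss = 0
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        if len(set(col1) | set(col2)) > 2:
            continue  # more than two bases across both samples: excluded
        poly1 = len(set(col1)) > 1
        poly2 = len(set(col2)) > 1
        if poly1 and poly2:
            ss += 1
        elif poly1:
            s1 += 1
        elif poly2:
            s2 += 1
    return s1, s2, ss


pop1 = ["AACAC", "AGCAG", "AACGC"]  # e.g. C. grandiflora (population 1)
pop2 = ["AACAT", "AGTAT", "AACAT"]  # e.g. C. rubella (population 2)
print(unpolarized_summaries(pop1, pop2))  # (1, 1, 1); the last site is excluded
```

Fixed differences between the samples are not counted here, since they carry no information about which population is polymorphic.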
References
1. Becquet C & Przeworski M (2007) A new approach to estimate parameters of speciation models with application to apes. Genome Res 17(10):1505-1519.
2. Thornton KR & Jensen JD (2007) Controlling the false-positive rate in multilocus genome scans for selection. Genetics 175(2):737-750.
3. Innan H & Kim Y (2008) Detecting local adaptation using the joint sampling of polymorphism data in the parental and derived populations. Genetics 179(3):1713-1720.
4. Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.
5. Koch M, Haubold B, & Mitchell-Olds T (2001) Molecular systematics of the Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences. Am J Bot 88:534-544.
6. Al-Shehbaz IA, Beilstein MA, & Kellogg EA (2006) Systematics and phylogeny of the Brassicaceae (Cruciferae): an overview. Plant Syst Evol 259:89-120.
7. Baudry E & Depaulis F (2003) Effect of misoriented sites on neutrality tests with outgroup. Genetics 165(3):1619-1622.
SI Table 2. Predictive posterior probabilities from simulations of the posterior distributions

Model    S1ᵃ    S2ᵇ    Ssᶜ    Sfᵈ    FSTᵉ   π1ᶠ    π2ᵍ    TajD1ʰ  TajD2ⁱ
Model 1  0.42   0.25   0.44   0.35   0.47   0.42   0.28   0.34    0.27
Model 2  0.37   0.27   0.42   0.50   0.40   0.37   0.06   0.31    0.47
Model 3  0.44   0.39   0.39   0.38   0.46   0.42   0.25   0.33    0.48
Model 4  0.39   0.34   0.43   0.43   0.45   0.24   0.30   0.16    0.48

Values shown are one-tailed probabilities (P values) of the observed data from simulations under the posterior parameter distributions. ᵃMean number of unique polymorphisms in C. grandiflora. ᵇMean number of unique polymorphisms in C. rubella. ᶜMean number of shared polymorphisms. ᵈMean number of fixed differences. ᵉMean population differentiation. ᶠAverage pairwise differences in C. grandiflora. ᵍAverage pairwise differences in C. rubella. ʰAverage Tajima's D in C. grandiflora. ⁱAverage Tajima's D in C. rubella.
SI Table 3. Likelihood ratio test comparing model 1 to other models

Model  Description                                      No. of parameters               Ln likelihood  2(L2-L1)*
1      Ancestral θ constrained, no migration            3 (θ1, θ2, t)                   -117.68        –
2      Ancestral θ constrained, symmetric migration     4 (θ1, θ2, M, t)                -117.66        0.04 ns
3      Ancestral θ constrained, asymmetric migration    5 (θ1, θ2, m12, m21, t)         -116.29        2.78 ns
4      Asymmetric migration, ancestral θ distinct       6 (θ1, θ2, θA, m12, m21, t)     -114.08        7.3 ns

*Twice the difference in ln likelihood between the model and model 1. Significance is given using the chi-squared approximation (ns = not significant), with the number of degrees of freedom equal to the difference in the number of free parameters.
SI Table 4. Goodness-of-fit tests based on simulations under the marginal modes (a) and under maximum likelihood parameter estimates (b).

Model  S1ᶜ      S2ᵈ      Ssᵉ      π1ᶠ      π2ᵍ      TD1ʰ     TD2ⁱ     FSTʲ     Sfᵏ      varπ1ˡ   varπ2ᵐ   varTD1ⁿ  varTD2ᵒ
1a     0.411    0.301    0.372    0.332    0.231    0.319    0.220    0.385    0.456    0.497    0.226    0.278    0.386
1b     0.455    0.492    0.404    0.302    0.190    0.314    0.215    0.332    0.470    0.480    0.206    0.286    0.397
2a     0.196    0.118    0.237    0.202    0.286    0.324    0.150    0.099    0.257    0.413    0.459    0.290    0.385
2b     0.207    0.405    0.486    0.472    0.004*   0.460    0.459    0.004*   0.323    0.428    0.018*   0.312    0.276
3a     0.326    0.125    0.476    0.326    0.017*   0.325    0.468    0.043*   0.420    0.491    0.050    0.294    0.421
3b     0.408    0.201    0.481    0.300    0.048    0.328    0.466    0.113    0.417    0.483    0.092    0.283    0.469
4a     0.442    0.213    0.009*   0.001*   0.008*   0.031*   0.470    0.100    0.468    0.018*   0.026    0.036*   0.421
4b     0.215    0.343    0.257    0.072*   0.202    0.055*   0.472    0.392    0.315    0.140    0.142    0.045*   0.483

Values shown are one-tailed P values of the observed means and variances of summary statistics, based on coalescent simulations under the various parameter combinations. Significant and marginally significant departures are marked with an asterisk. ᶜMean number of polymorphisms unique to C. grandiflora. ᵈMean number of polymorphisms unique to C. rubella. ᵉMean number of shared polymorphisms. ᶠAverage pairwise differences in C. grandiflora. ᵍAverage pairwise differences in C. rubella. ʰAverage Tajima's D in C. grandiflora. ⁱAverage Tajima's D in C. rubella. ʲMean differentiation (FST). ᵏMean number of fixed differences. ˡVariance in pairwise differences in C. grandiflora. ᵐVariance in pairwise differences in C. rubella. ⁿVariance in Tajima's D in C. grandiflora. ᵒVariance in Tajima's D in C. rubella.
SI Table 5. Goodness-of-fit test P values based on simulations under the marginal modes (a) and under maximum likelihood parameter estimates (b), allowing for a lineage-specific change in recombination in population 2.

Model  S1ᶜ      S2ᵈ      Ssᵉ      π1ᶠ      π2ᵍ      TD1ʰ     TD2ⁱ     FSTʲ     Sfᵏ      varπ1ˡ   varπ2ᵐ   varTD1ⁿ  varTD2ᵒ
1a     0.415    0.309    0.372    0.337    0.238    0.333    0.219    0.384    0.439    0.498    0.236    0.278    0.397
1b     0.450    0.492    0.406    0.305    0.187    0.332    0.221    0.327    0.478    0.482    0.202    0.284    0.404
2a     0.193    0.115    0.232    0.200    0.287    0.321    0.151    0.098    0.256    0.411    0.448    0.290    0.380
2b     0.206    0.407    0.497    0.475    0.004*   0.460    0.453    0.003*   0.314    0.428    0.018*   0.318    0.262
3a     0.319    0.128    0.475    0.331    0.016*   0.329    0.468    0.042*   0.423    0.495    0.046*   0.285    0.424
3b     0.399    0.204    0.470    0.316    0.054*   0.328    0.473    0.109    0.405    0.475    0.100    0.282    0.475
4a     0.437    0.210    0.009    0.001*   0.007*   0.031    0.469    0.099    0.464    0.017*   0.024*   0.037*   0.414
4b     0.221    0.338    0.250    0.063*   0.201    0.054*   0.475    0.380    0.310    0.131    0.144    0.044*   0.490

Values shown are one-tailed P values of the observed means and variances of summary statistics, based on coalescent simulations under the various parameter combinations. Significant and marginally significant departures are marked with an asterisk. ᶜMean number of polymorphisms unique to C. grandiflora. ᵈMean number of polymorphisms unique to C. rubella. ᵉMean number of shared polymorphisms. ᶠAverage pairwise differences in C. grandiflora. ᵍAverage pairwise differences in C. rubella. ʰAverage Tajima's D in C. grandiflora. ⁱAverage Tajima's D in C. rubella. ʲMean differentiation (FST). ᵏMean number of fixed differences. ˡVariance in pairwise differences in C. grandiflora. ᵐVariance in pairwise differences in C. rubella. ⁿVariance in Tajima's D in C. grandiflora. ᵒVariance in Tajima's D in C. rubella.
SI Table 6. Sequence-based summary statistics as estimated for each locus in both C. grandiflora and C. rubella.
Species  Locus  Sample size  Syn. sites  θ syn.  S syn.  π syn.  Tajima's D syn.  Rep. sites  θ rep.  S rep.  π rep.  Tajima's D rep.
C. grandiflora  At1g01040  34  120.1095238  0.012217381  6  0.012481181  0.059785052  419.8904762  0.002912315  5  0.001095269  -1.649659063
C. grandiflora  At1g03560  28  128.1551724  0.006015516  3  0.00487174  -0.456422119  459.8448276  0.007264736  13  0.004245739  -1.392653386
C. grandiflora  At1g04650  32  149.4090909  0.024929075  15  0.020308498  -0.616260234  474.5909091  0.005755263  11  0.006346722  0.326203043
C. grandiflora  At1g06520  28  115.3045977  0.02451511  11  0.022484729  -0.270635841  400.6954023  0.005771862  9  0.004819663  -0.520883643
C. grandiflora  At1g06530  38  111.4188034  0.034178123  16  0.036921937  0.260600475  440.5811966  0.008103115  15  0.006715557  -0.550965203
C. grandiflora  At1g10900  26  139.7407407  0.003750622  2  0.00457991  0.473577154  448.2592593  0  0  0  0
C. grandiflora  At1g11050  32  118.4646465  0.044017205  21  0.052792394  0.689702738  352.5353535  0.004226107  6  0.0035629  -0.440821078
C. grandiflora  At1g15240  30  111.9516129  0.022547226  10  0.011417084  -1.563384883  410.0483871  0.015389637  25  0.009127042  -1.448709834
C. grandiflora  At1g31930  8  108.0925926  0.024976036  7  0.023458724  -0.289096242  317.9074074  0.001213167  1  0.000786392  -1.054819107
C. grandiflora  At1g59720  30  114.3494624  0.077260479  35  0.093502442  0.76792837  419.6505376  0.015639002  26  0.016340909  0.160323884
C. grandiflora  At1g62390  36  124.472973  0.025185881  13  0.021551182  -0.459368216  466.527027  0.003101439  6  0.002534772  -0.499232495
C. grandiflora  At1g62520  28  129.8045977  0.075228311  38  0.079260365  0.199241023  416.1954023  0.00617434  10  0.00359772  -1.342343614
C. grandiflora  At1g65450  22  117.6811594  0.058276411  25  0.067207792  0.580648839  389.3188406  0.004932333  7  0.003891801  -0.6750978
C. grandiflora  At1g68530  36  135.7792793  0.030192862  17  0.033469352  0.358580449  443.2207207  0.001632263  3  0.000605238  -1.409039404
C. grandiflora  At1g72390  40  115.2804878  0.014275502  7  0.006561477  -1.496242606  409.7195122  0.004016618  7  0.00269728  -0.909511059
C. grandiflora  At1g74600  30  121.1236559  0.025007815  12  0.018258153  -0.880694329  406.8763441  0.003722308  6  0.004418299  0.533564216
C. grandiflora  At1g78850  36  103.6621622  0.0604842  26  0.08571801  1.442991383  340.3378378  0.009919885  14  0.007606821  -0.750466787
C. grandiflora  At2g23170  36  123.3648649  0.050824219  26  0.050784957  -0.002671947  407.6351351  0.005915851  10  0.003247535  -1.374013943
C. grandiflora  At2g26730  38  132.1538462  0.052228182  29  0.050471371  -0.116404366  425.8461538  0.0016767  3  0.001299396  -0.496927108
C. grandiflora  At2g28050  32  115.4848485  0.025801691  12  0.025226742  -0.071730967  394.5151515  0.00566462  9  0.001998165  -1.98260169
C. grandiflora  At2g44900  36  123.1711712  0.007831407  4  0.005940887  -0.59129855  386.8288288  0.000623405  1  0.000143618  -1.133212888
C. grandiflora  At2g47430  38  125.5726496  0.005686079  3  0.005290132  -0.153773609  402.4273504  0.000591424  1  0.00067867  0.213763113
C. grandiflora  At3g10340  36  135.3423423  0.040981043  23  0.035535988  -0.454237424  428.6576577  0.004500578  8  0.003036426  -0.948751959
C. grandiflora  At3g23590  26  106.5061728  0.056591271  23  0.042207738  -0.923383112  313.4938272  0.005015551  6  0.002129842  -1.702992442
C. grandiflora  At3g26650  38  93.73504274  0.012695644  5  0.024174554  2.328116714  287.2649573  0  0  0  0
C. grandiflora  At3g44530  38  114.4059829  0.018723218  9  0.018065981  -0.103660344  356.5940171  0.005339525  8  0.003219172  -1.144812013
C. grandiflora  At3g60750  22  145.6086957  0.003767926  2  0.002318971  -0.871247959  448.3913043  0.00061179  1  0.000202745  -1.162402043
C. grandiflora  At3g62890  38  103.8504274  0.022918091  10  0.010738718  -1.601213668  352.1495726  0.007434508  11  0.003728372  -1.527871009
C. grandiflora  At4g08840  28  35.71264368  0.021586739  3  0.009555995  -1.337838808  123.2873563  0.00625303  3  0.004699307  -0.596460285
C. grandiflora  At4g14190  30  91.87096774  0.049455853  18  0.045741315  -0.258696163  301.1290323  0.015926651  19  0.011481694  -0.967102366
C. grandiflora  At4g14370  22  127.0652174  0.034542441  16  0.036930962  0.25117467  451.9347826  0.013960848  23  0.00758643  -1.718251797
C. grandiflora  At4g38160  34  132  0.012969655  7  0.007778318  -1.148619662  429  0.001710284  3  0.001337937  -0.494978031
C. grandiflora  At5g04190  32  127.0707071  0.019540986  10  0.018103332  -0.229741211  412.9292929  0.010824024  18  0.008012189  -0.883634204
C. grandiflora  At5g20280  20  112.9603175  0.044915357  18  0.040815451  -0.343415808  373.0396825  0  0  0  0
C. grandiflora  At5g41920  30  125.0967742  0.048427116  24  0.050627471  0.161166801  390.9032258  0.005165879  8  0.005786775  0.365120611
C. grandiflora  At5g43670  32  112.5656566  0.02205901  10  0.028746664  0.946713864  322.4343434  0.001540212  2  0.000737835  -1.046838824
C. grandiflora  At5g51670  36  129.6936937  0.044625309  24  0.05319004  0.658884766  407.3063063  0.004144439  7  0.001340592  -1.916647035
C. grandiflora  At5g53020  34  64.30952381  0.049439308  13  0.080853398  2.045386197  241.6904762  0.020238335  20  0.033977844  2.310606856
C. grandiflora  At5g66280  30  135.1505376  0.013073857  7  0.010971163  -0.47519625  395.8494624  0.001912999  3  0.000505243  -1.731782748
C. rubella  At1g01040  13  120.047619  0  0  0  0  419.952381  0  0  0  0
C. rubella  At1g03560  12  128.1538462  0.01033568  4  0.012768744  0.82792558  459.8461538  0.001440218  2  0.001779251  0.687881658
C. rubella  At1g04650  14  149.5  0  0  0  0  477.5  0  0  0  0
C. rubella  At1g06520  14  114.7444444  0.002740457  1  0.001245003  -1.155241342  401.2555556  0  0  0  0
C. rubella  At1g06530  14  112.5777778  0  0  0  0  442.4222222  0  0  0  0
C. rubella  At1g10900  12  139.6410256  0  0  0  0  451.3589744  0  0  0  0
C. rubella  At1g11050  13  120.5952381  0.026721361  10  0.038271699  1.726825312  359.4047619  0.001793226  2  0.001712233  -0.126877208
C. rubella  At1g15240  14  113.3777778  0  0  0  0  408.6222222  0  0  0  0
C. rubella  At1g31930  14  108.2111111  0.002905914  1  0.002437238  -0.341438343  317.7888889  0.0009895  1  0.00082991  -0.341438343
C. rubella  At1g59720  14  114.0777778  0.002756472  1  0.00433481  1.212185563  419.9222222  0  0  0  0
C. rubella  At1g62390  13  124.3928571  0.007771674  3  0.003710329  -1.652312061  466.6071429  0.002071851  3  0.000989137  -1.652312061
C. rubella  At1g62520  14  129.7333333  0.048476698  20  0.058446179  0.859675381  416.2666667  0.001510821  2  0.002455104  1.695975145
C. rubella  At1g65450  12  118.3846154  0  0  0  0  388.6153846  0  0  0  0
C. rubella  At1g68530  14  135.7888889  0  0  0  0  443.2111111  0  0  0  0
C. rubella  At1g72390  12  115.6794872  0.005725117  2  0.004060331  -0.849714979  409.3205128  0.002426993  3  0.001998878  -0.578636747
C. rubella  At1g74600  13  121.6309524  0.013246912  5  0.022135054  2.392106459  406.3690476  0.002378972  3  0.003975166  2.121451886
C. rubella  At1g78850  14  105.5444444  0.002979334  1  0.001353526  -1.155241342  344.4555556  0  0  0  0
C. rubella  At2g23170  14  122.6333333  0  0  0  0  408.3666667  0  0  0  0
C. rubella  At2g26730  13  132.202381  0.004875054  2  0.002327434  -1.468005781  425.797619  0  0  0  0
C. rubella  At2g28050  14  115.4222222  0.005448729  2  0.002475384  -1.48074498  394.5777778  0.001593867  2  0.001838103  0.415804348
C. rubella  At2g44900  12  123.6923077  0  0  0  0  386.3076923  0  0  0  0
C. rubella  At2g47430  14  126.4222222  0.002487317  1  0.002086154  -0.341438343  404.5777778  0  0  0  0
C. rubella  At3g10340  13  135.6190476  0  0  0  0  428.3809524  0  0  0  0
C. rubella  At3g23590  11  106.5833333  0.01921973  6  0.024905821  1.171200707  313.4166667  0  0  0  0
C. rubella  At3g26650  14  94.02222222  0  0  0  0  286.9777778  0.001095737  1  0.001531688  0.842275109
C. rubella  At3g44530  14  114.7888889  0.008218187  3  0.011487883  1.217964186  356.2111111  0.003531077  4  0.004103012  0.533557751
C. rubella  At3g60750  5  145.4444444  0  0  0  0  448.5555556  0  0  0  0
C. rubella  At3g62890  11  104.375  0  0  0  0  351.625  0.00097097  1  0.00051708  -1.128501595
C. rubella  At4g08840  7  35.35416667  0.03463495  3  0.029632124  -0.65405158  123.6458333  0.006602135  2  0.006161993  -0.27492444
C. rubella  At4g14190  13  92.73809524  0.006949612  2  0.006635726  -0.126877208  300.2619048  0  0  0  0
C. rubella  At4g14370  14  127.1222222  0.012368104  5  0.005618889  -1.889327875  451.8777778  0.00626291  9  0.003623464  -1.617340049
C. rubella  At4g38160  14  131.4444444  0.007176846  3  0.003260476  -1.670526133  429.5555556  0.000732041  1  0.00033257  -1.155241342
C. rubella  At5g04190  14  127.2333333  0  0  0  0  412.7666667  0.000761816  1  0.000346097  -1.155241342
C. rubella  At5g20280  12  112.6025641  0  0  0  0  373.3974359  0  0  0  0
C. rubella  At5g41920  13  124.6309524  0  0  0  0  391.3690476  0.000823384  1  0.000393097  -1.149147105
C. rubella  At5g43670  12  113  0.011721744  4  0.014481094  0.82792558  322  0.001028383  1  0.001270469  0.540554689
C. rubella  At5g51670  14  129.5555556  0.007281483  3  0.007888338  0.25513408  407.4444444  0.000771767  1  0.001078823  0.842275109
C. rubella  At5g53020  14  64.33333333  0  0  0  0  241.6666667  0  0  0  0
C. rubella  At5g66280  13  136.1785714  0  0  0  0  397.8214286  0  0  0  0
SI Table 7. Modes of parameter estimates under a range of MIMAR models using
summaries of the data that do not rely on outgroup inference, with 90% HPD intervals in parentheses.
Model                                                   Ne(Cg)ᵃ                 Ne(Cr)ᵃ                   Ne(A)ᵃ                  M(Cg→Cr)ᵇ                  M(Cr→Cg)ᶜ                  Tᵈ
1 Ancestral size constrained, no migration              503.8ᵉ (442.6, 576.0)   [illegible] (0.3, 11.1)   503.8ᵉ (442.6, 576.0)   -                          -                          9.3 (1.2, 37.8)
2 Ancestral size constrained, symmetrical migration     497.1ᵉ (378.3, 824)     [illegible] (5.6, 27.6)   497.1ᵉ (378.3, 824)     [illegible]ᵉ (1.1, 10.7)   [illegible]ᵉ (1.1, 10.7)   [illegible] (0.6, 3673.0)
3 Ancestral size constrained, asymmetrical migration    493.8ᵉ (421.7, 594.0)   0.4 (0.1, 8.8)            493.8ᵉ (421.7, 594.0)   1.9 (0.011, 5.5)           51.7 (7.7, 376.1)          [illegible] (2.5, 3608.8)
4 Ancestral size unconstrained, asymmetrical migration  532.5 (417.6, 893.6)    0.6 (0.138, 8.6)          [illegible] (16.7, 584.8)   1.9 (0.01, 8.6)        81.0 (10.5, 400.8)         1362.5 (1.94, 3273.6)
ᵃ Effective population size (effective number of individuals × 10³) for C. rubella (Cr), C. grandiflora (Cg) and their ancestor (A).
ᵇ Migration rate (4Nem) from C. grandiflora to C. rubella.
ᶜ Migration rate (4Nem) from C. rubella to C. grandiflora.
ᵈ Time (ka) of the split of C. rubella and C. grandiflora.
ᵉ Constrained.
SI Figure 5. Observed and expected (under neutrality, as calculated using Equation 49 of Tajima 1989) minor allele frequency distribution of synonymous SNPs in a) C. grandiflora and b) C. rubella, using A. thaliana as an outgroup.
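The neutral expectation in SI Figure 5 can be illustrated with the standard coalescent result that, in a sample of n sequences, the expected number of sites at which the derived allele is carried by i sequences is proportional to 1/i; folding by minor allele count pools classes i and n - i. The Python sketch below computes the resulting expected proportions under the standard neutral model (constant population size, random mating). Whether it matches Tajima's (1989) Equation 49 term by term has not been checked here, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_folded_sfs(n):
    """Expected proportion of segregating sites in each minor-allele-count
    class (1 .. n // 2) for a sample of n sequences under the standard
    neutral model, where the expected number of sites with derived-allele
    count i is proportional to 1 / i."""
    if n < 2:
        raise ValueError("need at least two sequences")
    folded = {}
    for i in range(1, n // 2 + 1):
        j = n - i
        # Derived counts i and n - i map to the same minor-allele class;
        # for even n the middle class (i == n - i) is counted only once.
        folded[i] = Fraction(1, i) + (Fraction(1, j) if j != i else 0)
    total = sum(folded.values())
    return {i: w / total for i, w in folded.items()}


# Example: expected proportions for 14 sampled chromosomes
for count, share in expected_folded_sfs(14).items():
    print(count, float(share))
```

Exact fractions are used so that the proportions sum to exactly one; multiplying them by the observed number of synonymous SNPs gives expected counts to compare with the observed histogram.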
[Plot: x-axis Distance (bp), in classes including 201-300, 301-400 and between-locus; series: C. rubella, C. grandiflora]
SI Figure 6. Average levels of linkage disequilibrium, as measured by the squared correlation coefficient r², in C. rubella and C. grandiflora in 100 bp windows.
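The statistic in SI Figure 6 can be sketched as follows. For two biallelic sites with allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). The Python sketch below assumes phased haplotypes coded 0/1 at each site and averages r² within 100 bp distance classes for pairs of sites within a locus. It does not implement the between-locus class, any frequency filter used in the original analysis is not reproduced, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites coded 0/1 across the same phased
    haplotypes; returns None when either site is monomorphic."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    if p_a in (0, 1) or p_b in (0, 1):
        return None
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def mean_r2_by_window(positions, sites, window=100):
    """Average r^2 over all pairs of sites within one locus, grouped into
    distance classes 1-100 bp (class 0), 101-200 bp (class 1), and so on."""
    classes = {}
    for i, j in combinations(range(len(sites)), 2):
        r2 = r_squared(sites[i], sites[j])
        if r2 is None:
            continue
        distance = abs(positions[j] - positions[i])
        classes.setdefault((distance - 1) // window, []).append(r2)
    return {k: mean(v) for k, v in sorted(classes.items())}
```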
[Bar plots labelled Capsella grandiflora and Capsella rubella (panel a) and Capsella rubella (panel b)]
SI Figure 7. STRUCTURE output for a) C. grandiflora and C. rubella combined (k = 2) and b) C. rubella alone (k = 3). Each bar represents an individual, and the colors denote the proportion of that individual's genome assigned to each cluster (k) based on haplotype information.
[Plot: x-axis fraction of shared polymorphism, in classes from 0-0.1 to 0.9-1; series: neutral bottleneck, selective sweep, selection on standing variation]
SI Figure 8. Distribution of the fraction of shared polymorphism from simulated 39-gene datasets. Arrow shows the observed fraction of shared polymorphisms from the data. The selective sweep model was run using the deterministic model of Thornton and Jensen
(2007), with the sweep occurring 10,500 years ago, and a selection coefficient (s) of
0.003. The model of standing variation assumed a selection coefficient of 0.003 and an initial allele frequency of 0.01 for the selected allele.
CHAPTER 3
No evidence for ongoing and widespread hybridization between sympatric populations of C. grandiflora and C. rubella
Abstract
The speciation process involves the establishment and maintenance of
reproductive isolating barriers that restrict hybridization and gene flow. Despite the
establishment of these barriers, hybridization in the wild is a relatively common
occurrence. While in some cases specific reproductive barriers have been identified as the primary isolating barrier, reproductive isolation often results from the combination of several barriers. One such barrier is mating system; however, the efficacy of differences in mating system as a reproductive barrier remains unclear. Here, I search for evidence of hybridization in the genus Capsella. Capsella rubella is characterized by a high rate of self-fertilization and shows the typical morphological characteristics of a selfing syndrome. In comparison with its self-incompatible sister species C. grandiflora, C. rubella has undergone a derived breakdown of the self-incompatibility mechanism, and its floral organ sizes are highly reduced. In this study, I ask whether a shift in mating system has led to the establishment of effective reproductive isolating barriers between these two sister species. By investigating the population genetics of potentially hybridizing populations of C. grandiflora and C. rubella, I find no evidence for ongoing and widespread hybridization between sympatric populations of the two species. These results suggest that mating system may be acting as an effective reproductive isolating barrier between C. grandiflora and C. rubella.
Introduction
The speciation process involves the establishment and maintenance of
reproductive isolating barriers that restrict hybridization and gene flow (Dobzhansky
1937; Mayr 1942; Coyne and Orr 2004). These isolating barriers may be classified
according to their timing of action. Prezygotic isolating barriers impede gene flow before fertilization, either before the transfer of pollen or sperm to members of other species (premating barriers) or after such transfer but before fertilization. Postzygotic barriers, such as hybrid inviability or sterility, act after fertilization (Coyne and Orr 2004; Lowry et al. 2008). Although few studies have attempted to comprehensively quantify the action of the various isolating barriers involved in a speciation event, it has been established that reproductive isolation often results from the cumulative effects of many barriers (Rieseberg and Willis 2007).
Complete estimates of reproductive isolation have been obtained in a number of different systems. Habitat isolation, followed by pollinator isolation, has been identified as the primary cause of speciation between Mimulus lewisii and M. cardinalis (Ramsey et al. 2003) and between diploid and tetraploid Chamerion angustifolium (Husband and Sabara 2004). Similarly, ecogeographic and pollinator isolation are thought to contribute most to total isolation between Costus pulverulentus and C. scaber (Kay 2006). Although hybrids are sometimes formed, they exhibit severely reduced viability.
One additional potential premating barrier is mating system. The transition from outcrossing to selfing represents one of the most common major evolutionary transitions in the plant kingdom (Stebbins 1970; Barrett 2002). Phenotypically, this transition to
selfing is almost universally associated with the 'selfing syndrome', characterized by a
severe reduction in flower size and a breakdown of the morphological and genetic
mechanisms preventing self-fertilization and promoting outcrossing. By reducing the
probability of pollen transfer between diverging populations, such changes may directly
act as prezygotic reproductive barriers, thus aiding the speciation process (Fishman et al.
2002). However, the efficacy of differences in mating system in functioning as a
reproductive barrier remains unclear. Furthermore, the extent to which gene flow occurs
between outcrossing and selfing species pairs is not yet known.
Despite the presence of strong intrinsic postzygotic barriers, mating system
isolation has been identified as the primary reproductive barrier between the outcrossing
M. guttatus and its selfing sister species M. nasutus (Martin and Willis 2007). It has been
shown that the difference in mating system (in combination with karyotype) has led to the maintenance of reproductive barriers between outcrossing and selfing Stephanomeria
(Gottlieb 1973).
Despite the establishment and maintenance of these reproductive barriers, the processes of hybridization and introgression are thought to be common in many groups of plants and animals (Arnold 1997), yet the majority of hybridizing species remain morphologically distinct because of selection against hybrids (Coyne and Orr 2004).
Theoretically, neutral or advantageous alleles may move across species boundaries unless they are tightly linked to loci contributing to reproductive isolation, yet little is known about how often this occurs (Barton 1979; Harrison 1990). Conclusive detection of hybridization and introgression can be difficult.
Cryptic introgression can be inferred if phylogenies based on different loci are not concordant; however, such discordance may also reflect the presence of ancestral polymorphism (Yatabe et al. 2007).
With recent advances in coalescent modeling, it is now possible to distinguish between hybridization and shared ancestral polymorphism (Soltis et al. 2003; Noor and Feder 2006; Becquet and Przeworski 2007; Hey and Nielsen 2007; Hey 2010). Formalized speciation models now allow ancestral polymorphism to be distinguished from introgression and provide statistically based estimates of the timing of divergence events (Wakeley and Hey 1997; Nielsen and Wakeley 2001). Using these models, speciation processes have been studied in Drosophila (Wang et al. 1997; Hey and Nielsen 2007), Arabidopsis (Ramos-Onsins et al. 2004), Oryza (Zhang and Ge 2007) and Capsella (Foxe et al. 2009 (Chapter 2)), and evidence for interspecific hybridization has been detected across a wide variety of genera, including Macaca (macaques) (Bonhomme et al. 2009), Oryctolagus (rabbit) (Geraldes et al. 2006), Serrasalmus (Hubert et al. 2008), Acropora (Vollmer and Palumbi 2002), Sorghum (Morrell et al. 2005), Arabidopsis (Castric et al. 2008) and Capsella (Slotte et al. 2008).
Here, I search for evidence of hybridization in the genus Capsella. Capsella rubella is characterized by a high rate of self-fertilization (Hurka et al. 1989; Hurka and
Neuffer 1997) and shows the typical morphological characteristics of a selfing syndrome
(Hurka and Neuffer 1997). In comparison with its self-incompatible sister species C. grandiflora, there has been a derived breakdown of the self-incompatibility mechanism, and its floral organ sizes are highly reduced (see Figure 1, Foxe et al. 2009 (Chapter 2)).
This transition to selfing has also led to a substantial bottleneck in C. rubella, with a 100- to 1500-fold reduction in effective population size (Ne) (Foxe et al. 2009 (Chapter 2)).
While C. grandiflora is restricted to Greece and Albania and locally in Northern Italy, C.
rubella has expanded into much of southern Europe extending to Middle Europe,
Northern Africa, and into Australia and North and South America (Hurka and Neuffer
1997; Paetsch et al. 2006). This expansion is likely associated with the colonization
ability conferred upon C. rubella as a result of its transition to selfing. Previous studies
have estimated a divergence time of approximately 13,500 years between these two
species (Foxe et al. 2009 (Chapter 2)).
Interspecific crossing experiments suggest that, in addition to mating system
evolution, there is considerable postpollination reproductive isolation between the
species, with only a small proportion of crosses producing viable seed (Hurka and
Neuffer 1997; Koch and Kiefer 2005). Under controlled conditions, forced crossing may
result in viable seed when the outcrossing C. grandiflora receives pollen from the selfing
C. rubella. Approximately 80% of these crosses gave rise to viable F1 hybrids, while the reciprocal cross was found to succeed at a frequency of approximately 10% (K. M. Hazzouri, personal communication).
While previous work has shown little evidence for introgression in natural
populations of C. grandiflora and C. rubella (Foxe et al. 2009 (Chapter 2)), this work
focused on allopatric populations of these species. Here, I ask whether a shift in mating
system has led to the establishment of effective reproductive isolating barriers between these two sister species. By investigating the population genetics of potentially
hybridizing populations of C. grandiflora and C. rubella, I ask whether there is evidence
for ongoing and widespread hybridization between sympatric populations of C.
grandiflora and C. rubella.
Methods
Species Sampling and DNA Sequencing
Nucleotide polymorphism data from 18 large exons were collected (Table 2) in C. grandiflora and C. rubella. Nine natural Capsella localities were visited in the Zagori region of Northern Greece, from which seed was sampled from a total of 37 diploid C. grandiflora and 35 C. rubella individuals (Figure S1). Of these nine populations, three
were found to be allopatric C. grandiflora populations, four were found to be allopatric
C. rubella populations and the remaining two populations were found to be sympatric for
C. grandiflora and C. rubella. Five C. grandiflora individuals were sampled from
Lazaena, eleven individuals from Monodendri, eight individuals from Papigo, five
individuals from Retsina and eight individuals from Serviana. Twelve C. rubella
individuals were sampled from Ellinka, seven from Kalavryta, six from Milies, three
from Papigo, two from Retsina and five from Souli. Outgroup data from Arabidopsis
thaliana was obtained from GenBank.
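As a quick consistency check of the sampling design above, the per-locality counts can be tallied and the localities split into sympatric (both species sampled) and allopatric sets. This minimal Python sketch uses only the numbers given in the text; the dictionary and function names are illustrative.

```python
# Individuals sampled per locality, as listed in the text
grandiflora = {"Lazaena": 5, "Monodendri": 11, "Papigo": 8,
               "Retsina": 5, "Serviana": 8}
rubella = {"Ellinka": 12, "Kalavryta": 7, "Milies": 6,
           "Papigo": 3, "Retsina": 2, "Souli": 5}


def classify_localities(sp1, sp2):
    """Split localities into sympatric (both species sampled) and
    allopatric (only one species sampled) sets."""
    sympatric = sorted(sp1.keys() & sp2.keys())
    allopatric_1 = sorted(sp1.keys() - sp2.keys())
    allopatric_2 = sorted(sp2.keys() - sp1.keys())
    return sympatric, allopatric_1, allopatric_2


sym, allo_cg, allo_cr = classify_localities(grandiflora, rubella)
print("sympatric:", sym)
print("allopatric C. grandiflora:", allo_cg)
print("allopatric C. rubella:", allo_cr)
print("totals:", sum(grandiflora.values()), sum(rubella.values()))
```

The check recovers the design described above: two sympatric localities, three allopatric C. grandiflora localities, four allopatric C. rubella localities, and totals of 37 and 35 individuals.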
Following sterilization, seeds were placed at 4°C on sterile Murashige-Skoog (MS) nutrient medium for 14 days before being allowed to germinate at room temperature. The seedlings were grown at 20°C under conditions of 18 hours of light and
6 hours of darkness.
After 6 weeks of growth, DNA was extracted from leaf material using a DNeasy
kit (QIAGEN, Hilden, Germany). PCR primers for the large exons were designed as described previously (Wright et al. 2006; Ross-Ibarra et al. 2008). Briefly, primers were designed to amplify 650-700 bp from single large exons based on the A. thaliana genome sequence; the exons were chosen with no a priori expectation as to their
function or the action of selection upon these genes. Each exon was used as a BLAST
(Altschul et al. 1990) query against the shotgun genome sequence of Brassica oleracea
and homologous regions were used to design primers using PrimerQuest (Integrated
DNA Technologies). PCR reactions were performed in 25 µL reaction volumes (15 mM PCR (10X) buffer, 2 mM MgSO4, 10 mM dNTPs, 10 µM forward primer, 10 µM reverse primer, 1 U Tsg polymerase and 50-100 ng DNA) on an Eppendorf Mastercycler with the following program: 2 minutes at 94°C, followed by 35 cycles of 20 seconds at 94°C, 20 seconds at 55°C and 40 seconds at 72°C, with a final extension of 4 minutes at 72°C.
These products were sequenced on an ABI 3730 sequencer at the Genome Quebec
Innovation Centre (McGill University, Canada). Chromatograms were checked manually for heterozygous sites, using Sequencher version 4.7 (Gene Codes, Ann Arbor, MI), with the aid of the 'Call secondary peaks' option. Sequences were aligned using Genedoc
(Nicholas 1997). Consistent with high levels of selfing, no heterozygous sites were identified in our C. rubella dataset. This complete lack of heterozygosity in C. rubella also allowed us to confirm that we were sequencing single copy regions only.
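The heterozygosity screen can be illustrated with a small sketch. It assumes, hypothetically, that heterozygous calls are written into the edited sequences as two-base IUPAC ambiguity codes (R, Y, S, W, K, M); the text does not state how the calls were recorded, and this is not the Sequencher procedure. Under that assumption, a C. rubella sample with zero ambiguity codes at every locus is consistent with complete homozygosity.

```python
# Two-base IUPAC ambiguity codes, commonly used to record a
# heterozygous call in a consensus sequence
HET_CODES = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def heterozygous_sites(seq):
    """0-based positions in `seq` that carry a two-base ambiguity code."""
    return [i for i, base in enumerate(seq.upper()) if base in HET_CODES]


def count_heterozygous(alignment):
    """Number of heterozygous calls per sample in a {name: sequence} map."""
    return {name: len(heterozygous_sites(seq)) for name, seq in alignment.items()}


# Hypothetical sample names and sequences, for illustration only
print(count_heterozygous({"Cr_sample1": "ACGTTGCA", "Cg_sample1": "ACRTTGYA"}))
```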
Sequence Statistics and Analysis
Synonymous and nonsynonymous sites were identified by aligning each fragment
to the corresponding fragment in the A. thaliana genome sequence, identified using
BLAST (Altschul et al. 1990), and using the protein annotation from A. thaliana.
Standard population genetic descriptive statistics, including numbers of synonymous and nonsynonymous sites, estimates of synonymous (π_syn) and nonsynonymous (π_rep) diversity
(Tajima 1993), as well as frequency data, were calculated using a modified version of Perl
code (Polymorphurama) written by D. Bachtrog and P. Andolfatto (University of
California at San Diego, available from http://ib.berkeley.edu/labs/bachtrog/data/polyMORPHOrama/polyMORPHOrama.html).
The frequency spectra of derived polymorphic variants, and the number of shared derived polymorphisms, unique polymorphisms and fixed differences were calculated using Perl
scripts written by S. Wright.
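The site categories counted by these scripts follow the usual definitions for two-species samples: a shared polymorphism segregates in both species, a unique polymorphism segregates in one species only, and a fixed difference is a site where each species is monomorphic for a different base. The hypothetical Python sketch below classifies biallelic alignment columns in this way. Unlike the scripts described above, it does not orient variants as derived using an outgroup, so it counts shared polymorphisms rather than shared derived polymorphisms.

```python
from collections import Counter


def classify_site(alleles_1, alleles_2):
    """Classify one biallelic aligned site from the bases observed in each
    species: 'shared', 'unique_1', 'unique_2', 'fixed' or None (invariant)."""
    s1, s2 = set(alleles_1), set(alleles_2)
    if len(s1 | s2) > 2:
        raise ValueError("sites with more than two bases are not handled")
    if len(s1) == 2 and len(s2) == 2:
        return "shared"      # both species polymorphic for the same pair
    if len(s1) == 2:
        return "unique_1"    # polymorphic only in species 1
    if len(s2) == 2:
        return "unique_2"    # polymorphic only in species 2
    if s1 != s2:
        return "fixed"       # each species monomorphic for a different base
    return None


def tally_sites(columns_1, columns_2):
    """Count site classes over paired alignment columns (one string of
    bases per column and species)."""
    counts = Counter(classify_site(a, b) for a, b in zip(columns_1, columns_2))
    counts.pop(None, None)   # drop invariant sites
    return counts


def fraction_shared(counts):
    """Shared polymorphisms as a fraction of all polymorphic sites."""
    polymorphic = counts["shared"] + counts["unique_1"] + counts["unique_2"]
    return counts["shared"] / polymorphic if polymorphic else 0.0
```

The fraction_shared denominator excludes fixed differences, matching the way shared and unique synonymous variants are reported as percentages that sum to 100% in the Results.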
Bayesian inference of population structure
Individual haplotypes of unphased diploid sequences were reconstructed using the software PHASE (Stephens et al. 2001), as implemented in DnaSP Version 5.0 (Librado and Rozas 2009). Haplotypes with the highest posterior probabilities were used for cluster analysis performed with the program InStruct version 1.0 (Gao et al. 2007).
InStruct performs Bayesian clustering, assigning individuals to a given number of clusters in such a way that deviations from Hardy-Weinberg equilibrium are minimized. Unlike STRUCTURE (Pritchard et al. 2000), InStruct can accommodate non-random mating due to selfing. Based on exploratory runs, the number of clusters (k) was restricted to range from k = 1 to k = 4, and InStruct was run for 2,000,000 generations with a burnin of 200,000 generations, with two independent chains (runs) for each k. InStruct was run in this manner on the entire dataset, on C. grandiflora alone and on C. rubella alone. DISTRUCT v1.1 (Rosenberg 2004) was used to create bar plots of the aligned matrices.
Coalescent Simulations
Coalescent simulations were conducted using MIMAR (Becquet and Przeworski
2007), which estimates the parameters of an isolation-with-migration model using coalescent simulations based on Hudson's ms (Hudson 2002). Simulations were conducted using all 18 loci included in this study. Sites with more than two segregating bases were excluded from the analysis. Based upon previous analyses (Foxe et al. 2009 (Chapter 2)) and results from crossing experiments, migration rates were either unconstrained (asymmetric migration) or set to zero (no migration), whereas effective population sizes were assumed to be identical in C. grandiflora and the ancestor of C. rubella and C. grandiflora.
Prior limits for the Bayesian procedure implemented in MIMAR were based on those used previously (Foxe et al. 2009 (Chapter 2)). Priors for θ (θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate) were uniform on 0.001-0.1 for both C. grandiflora and the ancestral species, and uniform on 0-0.0025 for C. rubella. All runs assumed an exponentially distributed prior with rate 1 for ρ/θ and a mutation rate of 1.5 × 10^-8 per bp (Koch et al. 2001). Migration rate priors were log-uniform on -5 to 2.5 for migration from C. grandiflora to C. rubella (forward in time) and log-uniform on -5 to 9 for migration from C. rubella to C. grandiflora. The prior for the time of the split between C. rubella and C. grandiflora was uniform on 0-4 × 10^6.
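The priors above can be summarized by a sampler that draws one parameter set, together with the conversion θ = 4Neμ used to express θ as an effective population size. This is an illustrative Python sketch, not MIMAR's code. It assumes that the log-uniform migration priors are on a base-10 scale and that the recombination prior applies to ρ/θ, and it keeps the split-time range exactly as stated in the text without assigning units. The function and key names are hypothetical.

```python
import random

MU = 1.5e-8  # per-bp mutation rate used in the text (Koch et al. 2001)


def theta_to_ne(theta, mu=MU):
    """Invert theta = 4 * Ne * mu for a per-bp theta."""
    return theta / (4 * mu)


def sample_prior(rng, constrain_ancestor=True):
    """Draw one parameter set from the priors described in the text.

    Assumptions: migration priors are log10-uniform; when
    constrain_ancestor is True, the ancestral theta equals theta for
    C. grandiflora, as in the constrained models."""
    theta_cg = rng.uniform(0.001, 0.1)
    return {
        "theta_cg": theta_cg,
        "theta_anc": theta_cg if constrain_ancestor else rng.uniform(0.001, 0.1),
        "theta_cr": rng.uniform(0.0, 0.0025),
        "rho_over_theta": rng.expovariate(1.0),       # exponential, rate 1
        "m_cg_to_cr": 10 ** rng.uniform(-5.0, 2.5),  # forward in time
        "m_cr_to_cg": 10 ** rng.uniform(-5.0, 9.0),
        "t_split": rng.uniform(0.0, 4e6),            # units not stated here
    }


draw = sample_prior(random.Random(42))
print({k: round(v, 6) for k, v in draw.items()})
print(round(theta_to_ne(0.03)))  # Ne implied by a per-bp theta of 0.03
```

With μ = 1.5 × 10^-8, the upper θ bound of 0.1 corresponds to an Ne of roughly 1.7 million, and the C. rubella bound of 0.0025 to roughly 42,000.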
Simulations were conducted in three ways: 1) on the dataset as a whole, 2)
including allopatric populations only and 3) including sympatric populations only. Each
simulation was run for a total of 10,080 minutes (1 week) with a burnin of 100,000 steps.
Mixing was monitored by assessing parameter autocorrelation over runs and I considered
that MIMAR reached convergence when the posterior distributions from independent
runs were highly similar (Becquet and Przeworski 2007). The mode of the marginal
posterior probability distribution was considered as a point estimate for each parameter
and 90% highest posterior density (HPD) intervals from the MIMAR output were
calculated using the boa package in R 2.9.0 (Smith 2007).
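The mode and 90% HPD summaries were obtained in R with boa; an analogous calculation can be sketched in Python. Here the HPD interval is taken as the shortest interval containing 90% of the sorted posterior samples, which is appropriate for a unimodal marginal posterior, and the mode as the midpoint of the most populated histogram bin. The bin count and the function names are assumptions of this sketch, which is not boa's implementation.

```python
import math
import random


def hpd_interval(samples, prob=0.90):
    """Shortest interval containing a fraction `prob` of the samples
    (an empirical HPD interval, appropriate for a unimodal posterior)."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(prob * n)  # number of samples the interval must cover
    if not 1 <= k <= n:
        raise ValueError("prob must be in (0, 1] and samples non-empty")
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def histogram_mode(samples, bins=50):
    """Midpoint of the most populated bin of an equal-width histogram."""
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    best = max(range(bins), key=counts.__getitem__)
    return lo + (best + 0.5) * width


# Usage with simulated draws standing in for a MIMAR posterior sample
rng = random.Random(0)
draws = [rng.gauss(10.0, 2.0) for _ in range(10_000)]
print(histogram_mode(draws), hpd_interval(draws))
```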
Results
Patterns of polymorphism
Levels of synonymous and nonsynonymous diversity were estimated in both C. grandiflora (37 diploid individuals) and C. rubella (35 individuals) using direct
sequencing of 18 nuclear genes. A total of 299 synonymous single nucleotide polymorphisms (SNPs) and 190 nonsynonymous SNPs were found in C. grandiflora versus 97 synonymous and 63 nonsynonymous SNPs in the selfing C. rubella. In C. rubella 28% of loci were found to be completely devoid of any variation while 33%
lacked synonymous variation. Levels of synonymous variation in C. grandiflora were found to be 2.8 times higher than those in C. rubella (Figure 1a). These results are consistent with previous estimates of diversity in C. grandiflora and C. rubella, which pointed towards massive reductions in diversity in C. rubella compared to C. grandiflora
resulting from an extreme population bottleneck (Foxe et al. 2009 (Chapter 2); Guo et al.
2009).
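For reference, the diversity statistics reported here and in SI Table 6 can be computed with the standard formulas: π is the mean number of pairwise differences, Watterson's θ is S/a₁ with a₁ = Σ 1/i for i = 1 to n - 1, and Tajima's D scales their difference by its expected standard deviation (Tajima 1989). The Python sketch below is a minimal illustration, not the Polymorphurama code, and assumes aligned, equal-length sequences without missing data. As a check, 6 synonymous segregating sites among 34 sequences over 120.11 synonymous sites give θ ≈ 0.0122, matching the C. grandiflora At1g01040 entry in SI Table 6.

```python
import math
from itertools import combinations


def watterson_theta(s, n, sites=1.0):
    """Watterson's theta, S / a1, optionally divided by the number of sites."""
    a1 = sum(1.0 / i for i in range(1, n))
    return s / a1 / sites


def diversity_stats(seqs, sites=1.0):
    """Return (S, pi, theta_w, tajima_d) for equal-length aligned sequences.

    pi and theta_w are divided by `sites` (e.g. the number of synonymous
    sites); Tajima's D is returned as None when S == 0."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    theta_w = watterson_theta(s, n)
    if s == 0:
        return s, k / sites, 0.0, None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    d = (k - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return s, k / sites, theta_w / sites, d


print(diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"]))
print(watterson_theta(6, 34, 120.1095238))
```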
Levels of synonymous and nonsynonymous diversity were also estimated separately in allopatric and sympatric populations (Figures 1b and 1c). If C. grandiflora and C. rubella were hybridizing in natural populations, levels of diversity would be expected to be elevated in sympatric populations, particularly in the selfing C. rubella. Levels of synonymous and nonsynonymous diversity in both C. grandiflora and C. rubella in allopatric populations were similar to those found in the total dataset. In comparison with allopatric C. rubella, synonymous and nonsynonymous diversity in sympatric C. rubella were reduced 1.7-fold and 2.3-fold, respectively.
The distribution of synonymous variants across C. grandiflora and C. rubella was also estimated. Overall, 46% of synonymous variants were shared between species (Figure 2a). This estimate is considerably higher than the previous estimate of shared variation between C. grandiflora and C. rubella (26%; Foxe et al. 2009 (Chapter 2)). However, this may reflect the larger sample size in this study (72 compared with 34). Given the extremely recent divergence of these two species (~13,500 years; Foxe et al. 2009 (Chapter 2)), a larger sample could readily explain an increase in the proportion of shared variants. A further 50% of synonymous variants were unique to C. grandiflora, while 4% were unique to C. rubella.
In allopatric populations, 49% of synonymous variants were shared (Figure 2b), versus 45% in sympatric populations (Figure 2c). If C. grandiflora and C. rubella were hybridizing in these natural sympatric populations, individuals from sympatric populations would be expected to show a higher proportion of shared variants than individuals from allopatric populations. This is not the case: the proportion of shared variants is lower in sympatric than in allopatric populations, although this difference is not statistically significant (p > 0.05, Fisher's exact test).
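The comparison above is a 2 × 2 test of shared versus non-shared synonymous variants in allopatric versus sympatric populations. A two-sided Fisher's exact test can be sketched in plain Python as below, using the common convention of summing the probabilities of all tables (with the observed margins) that are no more probable than the observed one. The counts in the usage line are hypothetical, since the text reports only percentages.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # probability that the top-left cell equals x, given the margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # small relative tolerance so ties with p_obs are not lost to rounding
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: [shared, not shared] synonymous variants in
# allopatric (first row) and sympatric (second row) populations
print(fisher_exact_two_sided(49, 51, 45, 55))
```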
Bayesian Clustering Analyses
The results from Bayesian clustering analyses of the combined nuclear sequence
data suggest the existence of three clusters across the populations sampled from Greece
(Figure 3). Broadly speaking, Capsella grandiflora and C. rubella fall into two discrete
clusters, with each individual clustering with its own species. The main exception is
the C. grandiflora individuals from Retsina and Lazaena, which group together to form
a third cluster (Figure 3, k = 3). This pattern is also clearly seen in the results of
Bayesian clustering analyses conducted on C. grandiflora alone (Figure 4a).
The results from Bayesian clustering analyses conducted on C. rubella alone
suggest the existence of three clusters in C. rubella (Figure 4b). Kalavryta predominantly
forms its own cluster, although there is evidence for admixture with Souli. Ellinika clusters with Milies and Papigo, while Retsina and Souli form the third cluster.
Geographically, Kalavryta lies to the east of the other C. rubella populations, which
may explain why it forms a separate cluster. There are no obvious geographic
explanations for the clustering of the other populations. These patterns may instead
reflect descent from a common ancestral population.
Demographic Model Fitting
The results from coalescent simulations agree with previous work in providing
strong evidence for a single, recent origin of C. rubella from C.
grandiflora. Here, a Markov chain Monte Carlo (MCMC) approach based on coalescent
simulations was used to fit a model of isolation with and without migration to the data
(Becquet and Przeworski 2007). The approach uses the observed numbers of shared and
unique polymorphisms, as well as fixed differences, at each locus. Inference was
restricted to synonymous sites to avoid potential effects of selection on
nonsynonymous variants. The model assumes that a single ancestral population of size NA
split into two populations at time t. The two derived populations are assumed to have
distinct effective population sizes (N1 and N2).
Previous results suggested little or no hybridization between C. grandiflora and C. rubella, but they were based on individuals from
allopatric populations. To test specifically for hybridization, individuals
from both allopatric and sympatric populations were included in the analyses. To estimate
demographic parameters, I investigated a series of models. These differed in whether they allowed
asymmetrical migration or no migration, and in whether individuals came from allopatric or
sympatric (and therefore potentially hybridizing) populations (Table 1).
The parameter estimates from these models are consistent with previous results
suggesting an extremely recent speciation event associated with a major reduction in
effective population size in C. rubella (Foxe et al. 2009 (Chapter 2)). Under the model
with no migration, using the entire dataset, the modal estimate of divergence time was
approximately 14,000 years ago. Similarly, the model with asymmetrical migration gives a
divergence time of approximately 18,000 years ago.
When the allopatric and sympatric datasets were analysed separately, the point estimates
of divergence time were 14,000 and 10,000 years, respectively, in the absence of migration,
and 822,000 and 14,000 years when allowing for asymmetrical migration. Under asymmetrical migration, the point
estimate of divergence time for allopatric populations differs considerably from that for sympatric populations. However, the 90% HPD
intervals of these estimates overlap (Table 1).
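The comparison above rests on whether two 90% HPD intervals overlap. A minimal check, using the divergence-time intervals under asymmetrical migration as reported in Table 1, is sketched below. The function name is hypothetical. Overlap of credible intervals is an informal criterion, not a formal test of a difference between estimates.

```python
def intervals_overlap(first, second):
    """True if two closed intervals (low, high) share at least one value."""
    return max(first[0], second[0]) <= min(first[1], second[1])


# 90% HPD intervals for divergence time (years) under asymmetrical migration, Table 1
allopatric_t = (27_164, 3_813_830)
sympatric_t = (2_340, 3_793_770)
print(intervals_overlap(allopatric_t, sympatric_t))  # True
```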
Migration rate estimates are also consistent with previous results and provide scant evidence for hybridization between these two species. The point estimate of the population migration rate (4Nem) from C. rubella to C. grandiflora was 82.7 in sympatric populations, versus 32.7 in allopatric populations. However, the 90% HPD intervals of these estimates overlap (Table 1).
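Table 1 reports migration as the population-scaled rate M = 4Nem, so the corresponding number of migrants per generation is Nem = M/4. Below is a small sketch of that conversion, applied to the point estimates from C. rubella to C. grandiflora in Table 1. The helper name is hypothetical. Which population's Ne enters the scaling depends on the MIMAR parameterization, which is not restated here.

```python
def migrants_per_generation(scaled_rate):
    """Convert a population migration rate M = 4*Ne*m into Ne*m."""
    return scaled_rate / 4.0


# Point estimates of M from C. rubella to C. grandiflora (Table 1)
estimates = {"sympatric": 82.6839, "allopatric": 32.6719}
for category, scaled_rate in estimates.items():
    print(category, round(migrants_per_generation(scaled_rate), 1))
```

This gives roughly 20.7 migrants per generation for sympatric populations and 8.2 for allopatric populations.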
Discussion
Here, I find no clear evidence for gene flow between the outcrossing C. grandiflora and its selfing sister species C. rubella. These results suggest that differences in mating system have led to the establishment of effective reproductive barriers preventing gene flow between these two species. Several lines of evidence support these conclusions. First, diversity in sympatric C. rubella is not increased relative to allopatric C. rubella, as would be predicted if C. grandiflora and C. rubella were hybridizing. Second, the distributions of synonymous variants in allopatric and sympatric Capsella are comparable, with no increase in the proportion of shared variants in sympatric Capsella. Third, Bayesian clustering analyses indicate little or no admixture between the two species. Finally, demographic model fitting reveals no significant differences between migration rate estimates in allopatric and sympatric populations.
A difference in mating system alone can act as the primary reproductive barrier (Gottlieb 1973; Martin and Willis 2007), but it can also promote speciation in other ways. For instance, selfing can allow a single individual to colonize and populate a new habitat that was previously unavailable to an outcrossing relative. This may in turn result in the establishment of habitat isolation
(Coyne and Orr 2004). Interestingly, the coalescent simulations without and with migration reveal ~27-fold and ~14-fold reductions, respectively, in Ne in sympatric C. rubella compared with allopatric C. rubella. It is possible that C. grandiflora outcompetes C. rubella where the two species occur in close proximity, resulting in fewer C. rubella individuals at these localities. If this
is the case, and C. rubella performs better in the absence of C. grandiflora,
habitat isolation may have developed as a byproduct of mating system isolation.
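Because θ = 4Neμ, the ratio of two θ estimates equals the ratio of the corresponding Ne values as long as μ is the same in both. The fold reductions can therefore be read directly from the C. rubella θ point estimates in Table 1. Below is a short sketch; the dictionary layout and function name are hypothetical.

```python
# Point estimates of theta for C. rubella (Table 1)
theta_rubella = {
    "no migration": {"allopatric": 0.00109, "sympatric": 0.00004},
    "asymmetrical migration": {"allopatric": 0.00083, "sympatric": 0.00006},
}


def fold_reduction(model):
    """Ratio of allopatric to sympatric theta; equals the Ne ratio if mu is shared."""
    values = theta_rubella[model]
    return values["allopatric"] / values["sympatric"]


for model in theta_rubella:
    print(f"{model}: {fold_reduction(model):.0f}-fold")
```

This gives roughly 27-fold without migration and 14-fold with asymmetrical migration.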
Capsella grandiflora is thought to represent a large, stable population at
equilibrium (Foxe et al. 2009 (Chapter 2)). Despite this, there is some evidence for
population structure in C. grandiflora with the existence of two population clusters. A
study investigating the demographic history of Greek Capsella has recently identified the
presence of three clusters in C. grandiflora in Greece (St. Onge et al. 2010). These
clusters are separated by geography with one cluster located in Northern Greece, a second
cluster in Southern Greece and a third cluster on the island of Corfu. The presence of the
Retsina and Lazaena cluster in C. grandiflora in this dataset is likely reflective of this
geographic clustering.
Mating system isolation has been identified as the primary reproductive barrier in just two plant species pairs to date (Gottlieb 1973; Martin and Willis 2007). Mating
system appears to be acting as a reproductive barrier between C. grandiflora and
C. rubella, but the relative contribution of this mating system isolation to
the speciation process is not clear. Other factors are likely also playing a role, and some evidence
for postmating reproductive isolation has been observed. Given the extremely recent
divergence between C. grandiflora and C. rubella, understanding the extent to
which different reproductive isolating barriers, including mating system isolation, contribute to speciation will be
of considerable interest in future investigations. Nevertheless, the results from this study provide no significant evidence for ongoing and widespread hybridization between these two species.
References
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local
alignment search tool. J Mol Biol 215:403-410.
Arnold, M. L. 1997. Natural Hybridization and Evolution. Oxford University Press, New
York.
Barrett, S. C. H. 2002. The evolution of plant sexual diversity. Nature Reviews 3:274-
284.
Barton, N. H. 1979. Dynamics of hybrid zones. Heredity 43:341-359.
Becquet, C., and M. Przeworski. 2007. A new approach to estimate parameters of
speciation models with application to apes. Genome Res 17:1505-1519.
Bonhomme, M., S. Cuartero, A. Blancher, and B. Crouau-Roy. 2009. Assessing natural
introgression in 2 biomedical model species, the rhesus macaque (Macaca
mulatta) and the long-tailed macaque (Macaca fascicularis). J Hered 100:158-
169.
Castric, V., J. Bechsgaard, M. H. Schierup, and X. Vekemans. 2008. Repeated adaptive
introgression at a gene under multiallelic balancing selection. PLoS Genetics
4:e1000168.
Coyne, J. A., and H. A. Orr. 2004. Speciation. Sinauer Associates, Inc., Sunderland,
Massachusetts.
Dobzhansky, T. 1937. Genetics and the Origin of Species.
Fishman, L., A. J. Kelly, and J. H. Willis. 2002. Minor quantitative trait loci underlie
floral traits associated with mating system divergence in Mimulus. Evolution
56:2138-2155.
Foxe, J. P., T. Slotte, E. A. Stahl, B. Neuffer, H. Hurka, and S. I. Wright. 2009. Rapid
morphological evolution and speciation associated with the evolution of selfing in
Capsella. PNAS 106:5241-5245.
Gao, H., S. Williamson, and C. D. Bustamante. 2007. A Markov chain Monte Carlo
approach for joint inference of population structure and inbreeding rates from
multilocus genotype data. Genetics 176:1635-1651.
Geraldes, A., N. Ferrand, and M. W. Nachman. 2006. Contrasting patterns of
introgression at X-linked loci across the hybrid zone between subspecies of the
European rabbit (Oryctolagus cuniculus). Genetics 173:919-933.
Gottlieb, L. D. 1973. Genetic differentiation, sympatric speciation, and the origin of a
diploid species of Stephanomeria. Am J of Bot 60:545-553.
Guo, Y.-L., J. S. Bechsgaard, T. Slotte, B. Neuffer, M. Lascoux, D. Weigel, and M. H.
Schierup. 2009. Recent speciation of Capsella rubella from Capsella grandiflora,
associated with loss of self-incompatibility and an extreme bottleneck. PNAS
106:5246-5251.
Harrison, R. G. 1990. Hybrid zones: windows on evolutionary processes. Oxf. Surv.
Evol. Biol. 7:69-128.
Hey, J. 2010. Isolation with migration models for more than two populations. Mol Biol
Evol 27:905-920.
Hey, J., and R. Nielsen. 2007. Integration within the Felsenstein equation for improved
Markov chain Monte Carlo methods in population genetics. PNAS 104:2785-
2790.
Hubert, N., J. P. Torrico, F. Bonhomme, and J. F. Renno. 2008. Species polyphyly and
mtDNA introgression among three Serrasalmus sister-species. Mol Phylogenet
Evol 46:375-381.
Hudson, R. R. 2002. Generating samples under a Wright-Fisher neutral model of genetic
variation. Bioinformatics 18:337-338.
Hurka, H., S. Freundner, A. H. Brown, and U. Plantholt. 1989. Aspartate
aminotransferase isozymes in the genus Capsella (Brassicaceae): subcellular
location, gene duplication, and polymorphism. Biochemical Genetics 27:77-90.
Hurka, H., and B. Neuffer. 1997. Evolutionary processes in the genus Capsella
(Brassicaceae). Plant Sys and Evol 206:295-316.
Husband, B. C., and H. A. Sabara. 2004. Reproductive isolation between autotetraploids
and their diploid progenitors in fireweed, Chamerion angustifolium
(Onagraceae). New Phytologist 161:703-713.
Kay, K. M. 2006. Reproductive isolation between two closely related hummingbird-
pollinated neotropical gingers. Evolution 60:538-552.
Koch, M., B. Haubold, and T. Mitchell-Olds. 2001. Molecular systematics of the
Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences.
Am J Bot 88:534-544.
Koch, M., and M. Kiefer. 2005. Genome evolution among cruciferous plants: a lecture
from the genetic maps of three diploid species— Capsella rubella, Arabidopsis
lyrata subsp. petraea, and A. thaliana. Am J Bot 92:761-767.
Librado, P., and J. Rozas. 2009. DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics 25:1451-1452.
Lowry, D. B., J. L. Modliszewski, K. M. Wright, C. A. Wu, and J. H. Willis. 2008. The
strength and genetic basis of reproductive isolating barriers in flowering plants.
Phil Trans R Soc Lon 363:3009-3021.
Martin, N. H., and J. H. Willis. 2007. Ecological divergence associated with mating
system causes nearly complete reproductive isolation between sympatric Mimulus
species. Evolution 61:68-82.
Mayr, E. 1942. Systematics and the origin of species. Columbia University Press, New
York.
Morrell, P. L., T. D. Williams-Coplin, A. L. Lattu, J. E. Bowers, J. M. Chandler, and A.
H. Paterson. 2005. Crop-to-weed introgression has impacted allelic composition
of johnsongrass populations with and without recent exposure to cultivated
sorghum. Mol Ecol 14:2143-2154.
Nicholas, K. B., H. B. Nicholas Jr., and D. W. Deerfield II. 1997. GeneDoc: Analysis and
Visualization of Genetic Variation. EMBNEW.NEWS 4:14.
Nielsen, R., and J. Wakeley. 2001. Distinguishing migration from isolation: a Markov
chain Monte Carlo approach. Genetics 158:885-896.
Noor, M. A., and J. L. Feder. 2006. Speciation genetics: evolving approaches. Nature
Reviews 7:851-861.
Paetsch, M., S. Maryland-Quellhorst, and B. Neuffer. 2006. Evolution of the self-
incompatibility system in the Brassicaceae: identification of S-locus receptor
kinase (SRK) in self-incompatible Capsella grandiflora. Heredity 97:283-290.
Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure
using multilocus genotype data. Genetics 155:945-959.
Ramos-Onsins, S. E., B. E. Stranger, T. Mitchell-Olds, and M. Aguade. 2004. Multilocus
analysis of variation and speciation in the closely related species Arabidopsis
halleri and A. lyrata. Genetics 166:373-388.
Ramsey, J., H. D. Bradshaw, Jr., and D. W. Schemske. 2003. Components of
reproductive isolation between the monkeyflowers Mimulus lewisii and M.
cardinalis (Phrymaceae). Evolution 57:1520-1534.
Rieseberg, L. H., and J. H. Willis. 2007. Plant speciation. Science 317:910-914.
Rosenberg, N. A. 2004. DISTRUCT: a program for the graphical display of population
structure. Mol Ecol Notes 4:137-138.
Ross-Ibarra, J., S. I. Wright, J. P. Foxe, A. Kawabe, L. DeRose-Wilson, G. Gos, D.
Charlesworth, and B. S. Gaut. 2008. Patterns of Polymorphism and Demographic
History in Natural Populations of Arabidopsis lyrata. PloS One 3:e2411.
Slotte, T., H. Huang, M. Lascoux, and A. Ceplitis. 2008. Polyploid speciation did not
confer instant reproductive isolation in Capsella (Brassicaceae). Mol Biol Evol
25:1472-1481.
Smith, B. J. 2007. boa: An R Package for MCMC Output Convergence Assessment and
Posterior Inference. Journal of Statistical Software 21:1-37.
Soltis, D. E., P. S. Soltis, and J. A. Tate. 2003. Advances in the study of polyploidy since
Plant Speciation. New Phytologist 161:173-191.
St. Onge, K., T. Kallman, T. Slotte, M. Lascoux, and A. Palme. 2010. Divergent
population history and structure in two closely related species (Capsella rubella
and C. grandiflora) with different mating system. In prep.
Stebbins, G. L. 1970. Adaptive radiation of reproductive characteristics in angiosperms.
I. Pollination mechanisms. Ann Rev of Ecol Sys 1:307-326.
Stephens, M., N. J. Smith, and P. Donnelly. 2001. A new statistical method for haplotype
reconstruction from population data. Am J Hum Gen 68:978-989.
Tajima, F. 1993. Measurement of DNA polymorphism. Pp. 37-60 in N. Takahata, and A.
Clark, eds. Mechanisms of molecular evolution. Japan Scientific Societies Press,
Tokyo.
Vollmer, S. V., and S. R. Palumbi. 2002. Hybridization and the evolution of reef coral
diversity. Science 296:2023-2025.
Wakeley, J., and J. Hey. 1997. Estimating ancestral population parameters. Genetics
145:847-855.
Wang, R. L., J. Wakeley, and J. Hey. 1997. Gene flow and natural selection in the origin
of Drosophila pseudoobscura and close relatives. Genetics 147:1091-1106.
Wright, S. I., J. P. Foxe, L. DeRose-Wilson, A. Kawabe, M. Looseley, B. S. Gaut, and D.
Charlesworth. 2006. Testing for effects of recombination rate on nucleotide
diversity in natural populations of Arabidopsis lyrata. Genetics 174:1421-1430.
Yatabe, Y., N. C. Kane, C. Scotti-Saintagne, and L. H. Rieseberg. 2007. Rampant gene
exchange across a strong reproductive barrier between the annual sunflowers,
Helianthus annuus and H. petiolaris. Genetics 175:1883-1893.
Zhang, L.-B., and S. Ge. 2007. Multilocus analysis of nucleotide variation and speciation
in Oryza officinalis and its close relatives. Mol Biol Evol 24:769-783.
Table 1. Modes of parameter estimates under a range of MIMAR models using the entire dataset, allopatric populations only and sympatric populations only. 90% HPD intervals are given in parentheses.

Dataset and model | θ (Cg)^a | θ (Cr)^a | θ (A)^a | M Cg→Cr^b | M Cr→Cg^c | T (years)^d
Entire dataset, no migration | 0.02275 (0.01908, 0.02662) | 0.00151 (0.00062, 0.00244) | 0.02275 (0.01908, 0.02662) | - | - | 14,014 (4,210, 33,053)
Entire dataset, asymmetrical migration | 0.02216 (0.01839, 0.02744) | 0.00067 (0.00015, 0.00221) | 0.02216 (0.01839, 0.02744) | 0.37390 (0.00674, 9.63982) | 42.0776 (0.01323, 164.544) | 18,018 (3,731, 3,812,460)
Allopatric populations, no migration | 0.02335 (0.01777, 0.02595) | 0.00109 (0.00034, 0.00212) | 0.02335 (0.01777, 0.02595) | - | - | 14,014 (5,126, 39,173)
Allopatric populations, asymmetrical migration | 0.02265 (0.01783, 0.02629) | 0.00083 (0.00007, 0.00145) | 0.02265 (0.01783, 0.02629) | 6.46406 (0.00675, 2.8334) | 32.6719 (0.01003, 194.819) | 822,823 (27,164, 3,813,830)
Sympatric populations, no migration | 0.02295 (0.01888, 0.02788) | 0.00004 (0.00003, 0.00214) | 0.02295 (0.01888, 0.02788) | - | - | 10,010 (459, 35,520)
Sympatric populations, asymmetrical migration | 0.02305 (0.01845, 0.02894) | 0.00006 (0.000002, 0.00167) | 0.02305 (0.01845, 0.02894) | 11.4302 (0.00674, 1.00024) | 82.6839 (0.00688, 1,296.58) | 14,014 (2,340, 3,793,770)

a θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate (1.5 × 10^…), for C. grandiflora (Cg), C. rubella (Cr) and their ancestor (A)
b Migration rate (4Nem) from C. grandiflora to C. rubella
c Migration rate (4Nem) from C. rubella to C. grandiflora
d Time of the split of C. rubella and C. grandiflora
Table 2. Number of silent polymorphisms subdivided by species and population category (allopatric and sympatric), together with gene ontology terms, for each of the 18 loci used in this study.

Locus | Total | C. grandiflora allopatric | C. grandiflora sympatric | C. rubella allopatric | C. rubella sympatric | Gene ontology terms
At1g01040 | 9 | 8 | 7 | 0 | 0 | DEAD/DEAH box helicase carpel factory
At1g03560 | 11 | 8 | 5 | 4 | 0 | pentatricopeptide (PPR) repeat-containing
At1g04650 | 14 | 12 | 11 | 8 | 0 | hypothetical protein
At1g06520 | 13 | 13 | 9 | 0 | 0 | phospholipid/glycerol acyltransferase family
At1g10900 | 3 | 3 | 1 | 0 | 0 | phosphatidylinositol-4-phosphate 5-kinase family
At1g59720 | 23 | 21 | 18 | 13 | 5 | pentatricopeptide (PPR) repeat-containing
At1g62390 | 7 | 7 | 5 | 2 | 0 | octicosapeptide/Phox/Bem1p (PB1)
At1g62520 | 39 | 31 | 28 | 21 | 13 | expressed protein
At1g65450 | 28 | 25 | 25 | 17 | 0 | transferase family protein
At1g68530 | 27 | 20 | 22 | 1 | 0 | very-long-chain fatty acid condensing enzyme
At1g72390 | 8 | 5 | 5 | 0 | 1 | expressed protein
At1g74600 | 10 | 9 | 6 | 0 | 0 | pentatricopeptide (PPR) repeat-containing
At2g23170 | 35 | 28 | 32 | 0 | 0 | auxin-responsive GH3 family protein
At2g26730 | 21 | 17 | 18 | 1 | 1 | leucine-rich repeat transmembrane protein
At2g44900 | 5 | 5 | 2 | 0 | 0 | armadillo/beta-catenin repeat family protein/F-box family protein
At2g47430 | 4 | 4 | 2 | 1 | 1 | cytokinin-responsive histidine kinase (CKI1)
At4g14190 | 27 | 20 | 16 | 13 | 0 | pentatricopeptide (PPR) repeat-containing
At4g14370 | 26 | 24 | 20 | 8 | 0 | phosphoinositide binding
Figure 1. Comparison of polymorphism patterns between C. grandiflora and C. rubella for a) the entire dataset, b) the allopatric populations and c) the sympatric populations, as measured by synonymous π, where π is the average number of pairwise differences. Bars represent the median, boxes the interquartile range, and whiskers extend to 1.5 times the interquartile range.
[Figure 2 legend: unique to C. rubella; unique to C. grandiflora; shared; fixed differences]
Figure 2. Distribution of synonymous variants. Variants are classed as unique to C. rubella, unique to C. grandiflora or shared between species. The datasets are subdivided into a) the entire dataset, b) allopatric populations only and c) sympatric populations only.
[Figure 3 panels: k = 2, k = 3 and k = 4; individuals grouped under C. grandiflora and C. rubella]
Figure 3. Posterior probabilities of Bayesian clustering analysis (InStruct) conducted on the entire dataset, where k = 2-4. Bar plots show individual posterior probabilities.
* denotes sympatric populations
Figure 4. Posterior probabilities of Bayesian clustering analysis (InStruct) for (a) C. grandiflora individuals and (b) C. rubella individuals, where k = 2-4. Bar plots show individual posterior probabilities.
* denotes sympatric populations
Figure S1. Geographic location of each of the nine populations included in this study.
Allopatric C. grandiflora populations are marked in blue, allopatric C. rubella populations in pink and sympatric populations in yellow.
CHAPTER 4
Dynamics of polyploid speciation in the genus Capsella
Abstract
Polyploidy has long been recognized as an important mode of speciation.
Polyploidization can create a new species instantly. The newly formed polyploid
is often immediately reproductively isolated from its diploid progenitor(s) because of
problems with chromosome pairing and segregation in meiosis. Given the dominant role
of polyploidization in plant speciation, understanding the evolutionary context in which
this process occurs is an important aspect of speciation genetics. Capsella bursa-pastoris, or Shepherd's Purse, is a selfing tetraploid and a sister species to the outcrossing
C. grandiflora and the selfing C. rubella. Like C. rubella, C. bursa-pastoris clearly
displays the selfing syndrome: compared with the outcrossing C. grandiflora, the floral organs of both selfers are much reduced. Capsella bursa-pastoris has a worldwide
distribution, which can partly be explained by human activity. Investigations into the evolutionary origin of C. bursa-pastoris have so far been inconclusive. Here, using DNA sequence data from 14 unlinked nuclear loci in C. bursa-pastoris, C. grandiflora and C. rubella, I attempt to identify the evolutionary mode of origin of C. bursa-pastoris. Furthermore, I address the phylogenetic relationships among C. bursa-pastoris, C. grandiflora and C. rubella. My results suggest that C. bursa-pastoris diverged from C. grandiflora via autopolyploidization approximately 667,000 years ago. Estimates of the strength of the population bottleneck in C. bursa-pastoris are inconsistent with a speciation event involving a single individual.
Introduction
Polyploidy has long been associated with speciation and is considered by many to
be the predominant mode of sympatric speciation (Coyne and Orr 2004; Mallet 2007).
Polyploidization can create a new species instantly, because the newly formed polyploid
is often immediately reproductively isolated from its diploid progenitor(s) by the
change in ploidy. Hybridization between a newly formed tetraploid and an
ancestral diploid produces triploid offspring. These progeny are typically
sterile. With an odd number of chromosome sets, chromosomes cannot pair and segregate evenly during meiosis (Ramsey and
Schemske 1998).
The relative contribution of polyploidy to speciation in plants is controversial, and estimates of the frequency of polyploid speciation vary widely. Otto and Whitton (2000) based their estimate on the fraction of speciation events that involve any change in chromosome number and
the fraction of those changes that involve polyploidy. They report that 2-4% of speciation events in angiosperms and 7% in ferns are
a direct result of polyploidy. More recent estimates use phylogenetic data to
infer the frequency of polyploid speciation by tracking changes in ploidy level across
infrageneric phylogenetic trees (Wood et al. 2009). Wood and colleagues (2009) put this number at 15% in angiosperms and 31% in ferns. These estimates indicate that polyploidization is a common route to speciation in plants.
Given the dominant role of polyploidization in plant speciation, understanding the
evolutionary context in which this process occurs is an important aspect of
speciation genetics. Several questions must be addressed to
elucidate the evolutionary history of a polyploid species. Does the species have a single origin or multiple origins? What role did founder events play? Is there ongoing gene flow between the species and its ancestor(s)? When did the polyploidization event occur? Is the species an allo- or autopolyploid? Addressing these questions is difficult given the challenges of distinguishing multiple origins of polyploids, the extinction of parental lineages and the sampling of standing variation in progenitor species (Doyle and Egan 2009).
Recent advances in coalescent modeling have enabled much progress in speciation genetics (Soltis et al. 2003; Noor and Feder 2006; Becquet and
Przeworski 2007; Hey and Nielsen 2007; Hey 2010). Formalized speciation models now make it possible to distinguish ancestral polymorphism from introgression and to estimate divergence times statistically (Wakeley and Hey 1997;
Nielsen and Wakeley 2001). Using these models, speciation processes have been studied in Drosophila (Wang et al. 1997; Hey and Nielsen 2007), Arabidopsis (Ramos-Onsins et al. 2004), Oryza (Zhang and Ge 2007) and Capsella (Foxe et al. 2009 (Chapter 2)). Polyploid speciation has been investigated in soybean (Gill et al. 2009), A. suecica
(Jakobsson et al. 2006) and C. bursa-pastoris (Slotte et al. 2008), to name but a few.
It is possible that the formation of a new species involves only a single individual. This situation, however, would result in a severe population bottleneck, as
there would be a massive reduction in effective population size. There are several
well-characterized examples of polyploid species for which a single origin seems likely, including
wheat (Levy and Feldman 2002), Arachis hypogaea (Kochert et al. 1996), Spartina
anglica (Raybould AF 1991; Ainouche et al. 2004) and Arabidopsis suecica (Jakobsson et al.
2006; Hazzouri et al. 2008). However, molecular studies have shown that recurrent
polyploid formation is frequent. Over 45 examples are
listed in just two literature reviews (Soltis and Soltis 1993; Soltis and Soltis 1999), and more examples are published regularly (e.g. A. kamchatica (Shimizu-Inatsugi
et al. 2009); Aegilops (Meimberg et al. 2009); Asteraceae (Grubbs et al. 2009; Symonds
et al. 2010)). In fact, recurrent origins of polyploid species may be the rule, not the
exception (Soltis and Soltis 1999).
Polyploids are considered to be tremendously successful species for a number of reasons. Stebbins (1950) suggested that the availability of new ecological niches, closed to the diploid progenitors, was vital to the success of polyploids. It has also been documented anecdotally that many of the world's most successful weeds are polyploids (Hegarty and Hiscock 2008). One such polyploid is the weed C. bursa-pastoris, or Shepherd's
Purse, a selfing tetraploid species in the genus Capsella (Hurka and Neuffer 1997). There are three species in this genus. Capsella bursa-pastoris is a predominantly selfing tetraploid and among the most successful colonizing plant species in existence (Hintz et al. 2006). Capsella grandiflora is a diploid outbreeder, and C. rubella is a diploid selfer (Hurka and Neuffer 1997). Because these three closely related species differ in
mating system and ploidy level, the genus is an attractive model. Previous work has
demonstrated the ability of diploid Capsella to form interspecific hybrids, allowing for
many types of genetic analysis (Hurka and Neuffer 1997). Both C. bursa-pastoris and C.
rubella clearly display the selfing syndrome: compared with the
outcrossing C. grandiflora, their floral organs are much reduced (Figure 1). Previous
work on these species has suggested that both C. grandiflora and C. rubella may be
ancestral to C. bursa-pastoris (Hurka and Neuffer 1997). More recent findings reveal
that C. rubella diverged from C. grandiflora approximately 13,500 years ago (Foxe et al.
2009 (Chapter 2)). Capsella bursa-pastoris has a worldwide distribution that can partly be
explained anthropogenically (Hurka and Neuffer 1997; Slotte et al. 2008). In contrast to
C. grandiflora and C. rubella, C. bursa-pastoris can be found on each continent and thrives across a wide range of climates (Hurka and Neuffer 1997).
Whether C. bursa-pastoris is an autopolyploid or an allopolyploid remains unresolved, although there has been speculation. Capsella bursa-pastoris displays disomic inheritance (Hurka et al. 1989). It is often assumed that polyploids that form bivalents during meiosis (i.e. exhibit disomic segregation) are allopolyploids, and that those that form multivalents during meiosis are autopolyploids (Otto and Whitton 2000).
This, however, is not always the case. Polyploids with tetrasomic segregation (pairing among four homologous chromosomes during meiosis) tend to rediploidize over time as mutations accumulate and chromosomes diverge (Ramsey and Schemske 1998).
Furthermore, autopolyploids with small chromosomes or low chiasma frequencies may exhibit disomic inheritance immediately after their formation (Stebbins 1971).
Thus, the inheritance pattern observed in C. bursa-pastoris cannot be used to infer
its mode of origin.
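Expected gamete frequencies make the contrast between disomic and tetrasomic inheritance concrete. Consider a plant carrying two copies each of alleles A and a. Below is a minimal sketch, assuming random chromosome segregation without double reduction. The function names are hypothetical, and the example is a textbook illustration, not data from C. bursa-pastoris. Under tetrasomic inheritance an AAaa plant produces AA, Aa and aa gametes in a 1:4:1 ratio. Under disomic inheritance with AA on one chromosome pair and aa on the other, every gamete is Aa.

```python
from collections import Counter
from itertools import combinations, product


def tetrasomic_gametes(homologues):
    """Gamete frequencies when any 2 of 4 homologous chromosomes can pair and
    segregate together (random chromosome segregation, no double reduction).

    homologues -- alleles on the four homologues, e.g. "AAaa"
    """
    gametes = Counter("".join(sorted(pair)) for pair in combinations(homologues, 2))
    total = sum(gametes.values())
    return {g: n / total for g, n in gametes.items()}


def disomic_gametes(pair_1, pair_2):
    """Gamete frequencies when chromosomes pair only within two fixed pairs,
    so each gamete receives one chromosome from each pair."""
    gametes = Counter("".join(sorted(x + y)) for x, y in product(pair_1, pair_2))
    total = sum(gametes.values())
    return {g: n / total for g, n in gametes.items()}


print(tetrasomic_gametes("AAaa"))   # AA : Aa : aa = 1 : 4 : 1
print(disomic_gametes("AA", "aa"))  # every gamete is Aa
```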
Isozyme electrophoresis indicated that C. bursa-pastoris shared alleles with both
C. grandiflora and C. rubella and was hence thought to be an allopolyploid between
these two species (Hurka et al. 1989). Later, based upon restriction site variation in the
chloroplast genome, C. bursa-pastoris was inferred to be an ancient autopolyploid of C. grandiflora (Hurka and Neuffer 1997). Most recently, it has been hypothesized that C.
bursa-pastoris is in fact an allopolyploid, although not between C. grandiflora and C.
rubella (Slotte et al. 2006).
There are a number of mechanisms by which polyploidy may result in
instantaneous speciation. If the newly formed polyploid is selfing, the polyploidization
event need occur only once; however, this would represent an extreme population bottleneck. Alternatively, multiple origins of different polyploid species have been hypothesized, and this mode of formation would explain reported shared polymorphism across species with different ploidy levels. Finally, single or multiple origins of a polyploid species followed by introgression could explain these patterns of shared polymorphism. In this case, polyploidization would not have conferred instant reproductive isolation (Ramsey and Schemske 1998). Hybridization between a polyploid and a diploid is unusual, but there are mechanisms by which it may occur. For example, hybridization between a tetraploid and a diploid produces triploid offspring.
Backcrossing between such a triploid and a diploid can lead to the formation of a fully fertile tetraploid individual (Müntzing 1930; Skalińska 1945; Ramsey and Schemske 1998).
These three alternatives have recently been tested in C. bursa-pastoris (Slotte et
al. 2008). Using a coalescent-based isolation-with-migration model (Nielsen and Wakeley
2001), Slotte et al. (2008) found evidence for gene flow from C. rubella to C. bursa-pastoris following the dispersal of C. bursa-pastoris throughout Eurasia. These findings
indicate that, in this case, polyploidy did not result in instantaneous reproductive isolation
in C. bursa-pastoris. However, these conclusions were made using sequence data from C.
bursa-pastoris and C. rubella only. Studies demonstrating an extremely recent divergence time for C. rubella from C. grandiflora suggest it is unlikely that C. rubella is
ancestral to C. bursa-pastoris (Foxe et al. 2009 (Chapter 2); Guo et al. 2009). Given this,
including C. grandiflora in the analysis should allow more accurate inferences about the
origins of C. bursa-pastoris.
Here, using DNA sequence data from 14 unlinked nuclear loci in C. bursa-pastoris, C. grandiflora and C. rubella, I address the following questions. First, I characterize patterns of polymorphism in all three species in the genus. Next, using molecular phylogenetic techniques, I elucidate the phylogenetic relationships among
C. bursa-pastoris, C. grandiflora and C. rubella. Finally, using coalescent-based analyses,
I date the divergence of C. bursa-pastoris from its putative ancestor C. grandiflora and assess evidence for population bottlenecks in C. bursa-pastoris, thereby clarifying the evolutionary history of all three species in the genus Capsella.
Methods
Sampling
Seeds were obtained from a total of 78 accessions of C. bursa-pastoris from
China, Taiwan, Israel and Europe, 53 accessions of C. grandiflora from its native Greece,
43 accessions of C. rubella from Africa, South America, Europe and Israel, as well as
one accession of Neslia paniculata. The accession designations and geographical origins of the seed material are given in Table S1.
Following sterilization, seeds were placed at 4°C on sterile Murashige-Skoog (MS)
nutrient medium for 14 days before being allowed to germinate at room temperature. The seedlings were grown at 20°C under an 18-hour light/6-hour dark photoperiod. After 6 weeks of growth, DNA was extracted from fresh or frozen
leaf tissue from a single individual per accession. Leaf tissue was ground to a fine powder
in liquid nitrogen, and DNA was extracted using the QIAGEN DNeasy Plant Mini Kit
(QIAGEN, Valencia, California, USA).
PCR and Sequencing
Fourteen single-copy, effectively unlinked nuclear genes were chosen for inclusion in this analysis (Table S2). Because gene content is highly conserved between A. thaliana, C. grandiflora and C. rubella (Acarkan et al. 2000; Boivin et al.
2004), such genes are likely to be single copy in the diploids C. grandiflora and C. rubella and to be present as two copies duplicated by polyploidy (homoeologs) in the tetraploid C. bursa-pastoris.
PCR primers for the large exons were designed as described by Ross-Ibarra et al.
(2008) and Slotte et al. (2008). In brief, primers were
designed to amplify 650-700 bp from single large exons based on the A. thaliana genome
sequence, chosen with no a priori expectation as to their function or the action of
selection on these genes. Each exon was used as a BLAST query against the shotgun
genome sequence of Brassica oleracea, and homologous regions were used to design primers using PrimerQuest (Integrated DNA Technologies). PCR reactions were performed in 25 µL reaction volumes (15 mM PCR (10X) buffer, 2 mM MgSO4, 10 mM
dNTPs, 10 µM forward primer, 10 µM reverse primer, 1 U Tsg polymerase and 50-100 ng
DNA) on an Eppendorf Mastercycler with the following program: 2 minutes at 94°C, followed by 35 cycles of 20 seconds at 94°C, 20 seconds at 55°C and 40 seconds at 72°C, with a final extension of 4 minutes at 72°C.
I amplified ~200 bp to ~800 bp of each gene in C. grandiflora, C. rubella and C. bursa-pastoris. These products were sequenced at Lark Technologies (Houston, Texas) and at the Genome Quebec Innovation Centre (McGill University, Canada).
Chromatograms were checked manually for heterozygous sites, using Sequencher version
4.7 (Gene Codes, Ann Arbor, MI), with the aid of the 'Call secondary peaks' option.
Sequences were aligned using GeneDoc (Nicholas et al. 1997).
Based on these data, homoeolog-specific primers were designed from polymorphisms distinguishing the two C. bursa-pastoris homoeologs, allowing each copy to be amplified selectively. Each copy of the 14 loci used in this study was successfully amplified using this approach and sequenced on an ABI 3730 sequencer at the Genome Quebec Innovation Centre (McGill University,
Canada). Because all C. bursa-pastoris sequences analyzed in this study were amplified using homoeolog-specific primers and sequenced directly, I avoided sequence
artefacts resulting from cloning of heterogeneous polymerase chain reaction products
(Cronn et al. 2002). For each accession and homoeolog, a single sequence was retrieved,
as expected in a predominantly selfing species.
Standard Population Genetic Analyses
Synonymous and nonsynonymous sites were identified by aligning each fragment
to the corresponding fragment in the A. thaliana genome sequence, identified using
BLAST (Altschul et al. 1990), and using the protein annotation from A. thaliana.
Sequence-based summary statistics θ (Watterson 1975), π (Tajima 1993) and Tajima's D
(Tajima 1989), for synonymous and nonsynonymous sites, as well as site frequency data, were
calculated using a modified version of the Perl code Polymorphurama written by D.
Bachtrog and P. Andolfatto (University of California at San Diego, available from
http://ib.berkeley.edu/labs/bachtrog/data/polyMORPHOrama/polyMORPHOrama.html).
The frequency spectra of derived polymorphic variants, and the numbers of shared derived polymorphisms, unique polymorphisms and fixed differences, were calculated using
Perl scripts written by S. Wright. The minimum number of synonymous substitutions between C. grandiflora haplotypes and each of the two C. bursa-pastoris alleles was estimated using DnaSP version 5.0 (Librado and Rozas 2009).
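The three summary statistics named above have standard closed forms. The sketch below is an illustrative Python version, not the Polymorphurama code used in this analysis. It assumes a gap-free alignment from which the caller has already extracted the sites of interest (for example, synonymous sites), and it returns per-sequence values; dividing by the number of sites gives per-site estimates. The toy alignment is hypothetical.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """pi: average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def watterson_theta(seqs):
    """theta_W = S / a1, where a1 = sum of 1/i for i = 1 .. n-1."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """Tajima's (1989) D; None when undefined (S = 0 or fewer than 4 sequences)."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0 or n < 4:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    numerator = mean_pairwise_differences(seqs) - s / a1
    return numerator / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment, not data from this study
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
print(watterson_theta(toy), mean_pairwise_differences(toy), tajimas_d(toy))
```

Negative D values, like those reported below for both species, arise when π falls short of θ_W, i.e. when rare variants are in excess.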
Based upon the minimum number of synonymous substitutions between C. grandiflora and each of the two C. bursa-pastoris alleles, the two homoeologs at each locus were designated
locus A and locus B, where locus B has the larger minimum distance to C. grandiflora
(Table 1; following Slotte et al. 2006, 2008).
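The A/B designation rule can be sketched as follows. This is a hypothetical illustration, not the DnaSP calculation: it assumes synonymous sites have already been extracted, counts simple mismatches, and takes the minimum over all pairs of C. grandiflora and C. bursa-pastoris sequences. Function names and toy sequences are invented for the example.

```python
def min_differences(query, reference):
    """Smallest number of mismatches between any query and any reference sequence."""
    return min(
        sum(a != b for a, b in zip(q, r))
        for q in query
        for r in reference
    )


def designate_a_b(copy1, copy2, grandiflora):
    """Label the homoeolog farther from C. grandiflora as 'B' (ties keep copy1 as 'A')."""
    d1 = min_differences(copy1, grandiflora)
    d2 = min_differences(copy2, grandiflora)
    if d2 >= d1:
        return {"A": copy1, "B": copy2, "dist_A": d1, "dist_B": d2}
    return {"A": copy2, "B": copy1, "dist_A": d2, "dist_B": d1}


# Hypothetical sequences for one locus
grandiflora = ["AAAA", "AAAT"]
copy1 = ["TTTA", "TTTT"]
copy2 = ["AAAA"]
labels = designate_a_b(copy1, copy2, grandiflora)
print(labels["dist_A"], labels["dist_B"])
```

Because the rule always assigns the more distant copy to B, it builds in an A/B asymmetry even when both copies derive from C. grandiflora, which is why the text treats this designation as an extreme case.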
To infer haplotype data in C. grandiflora, I used PHASE 2.1, as implemented in
DnaSP version 5.0 (Librado and Rozas 2009), which uses a Bayesian statistical method
to reconstruct haplotypes from diploid data (Stephens et al. 2001; Stephens and Donnelly
2003).
Bayesian Estimation of Species Tree
The molecular phylogenetic program BEST (Bayesian estimation of species trees;
Liu 2008), which implements a Bayesian hierarchical model that accounts for deep
coalescence, was used to estimate the species tree of the genus Capsella from this
multi-locus dataset. BEST works using concatenated alignments and reportedly performs
poorly when data are missing. Consequently, BEST was run using the 7 loci in this dataset with the most consistent sampling of individuals across loci
(At1g03560, At1g15240, At1g65450, At2g26730, At4g14190, At5g51670 and
At5g53020). Alignments were concatenated using MacClade version 4.08 (available from http://macclade.org/). BEST was run in two ways, once using A. thaliana as an outgroup and again including both A. thaliana and N. paniculata (where available). In each case BEST was run twice, with 4 chains for a maximum of 2 million generations, a burn-in of 200,000 generations, and sampling every 100 generations.
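The MCMC settings above determine how many samples contribute to the posterior. The arithmetic below is a sketch assuming that only the cold chain is sampled, that each run reaches the full 2 million generations, and that no sample is stored at generation 0; the function name is hypothetical.

```python
def retained_samples(n_generations, burnin, sample_every, n_runs=1):
    """Posterior samples kept after discarding burn-in, pooled over independent runs."""
    per_run = n_generations // sample_every - burnin // sample_every
    return per_run * n_runs


per_run = retained_samples(2_000_000, 200_000, 100)
pooled = retained_samples(2_000_000, 200_000, 100, n_runs=2)
burnin_fraction = 200_000 / 2_000_000
print(per_run, pooled, burnin_fraction)
```

Under these assumptions each run keeps 18,000 of its 20,000 samples (a 10% burn-in), and the two replicate runs together provide 36,000.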
Coalescent Simulations
Coalescent simulations were conducted using MIMAR (Becquet and Przeworski 2007), which estimates the parameters of an isolation-migration model based on Hudson's ms (Hudson 2002).
Simulations were conducted using the 14 loci included in this study; sites
with more than two segregating bases were excluded from the analysis. Coalescent simulations were
run 1) in the absence of migration, 2) allowing symmetrical migration and 3) allowing
asymmetrical migration. Effective population sizes were either unconstrained or assumed to be identical in C. grandiflora and the ancestor of C. bursa-pastoris and C. grandiflora.
Prior limits for the Bayesian procedure implemented in MIMAR were set based
on initial runs using wide priors. Priors for θ were uniform 0.001-0.1 for both C. grandiflora and the ancestral species, and uniform 0-0.0025 for C. rubella. All runs assumed an exponentially distributed prior with rate 1 for ρ/θ, and a mutation rate of 1.5 × 10⁻⁸ per bp (Koch et al. 2001). Symmetrical migration rate priors were log-uniform between -5 and 2.5. Asymmetrical migration rate priors were log-uniform between -5 and 2.5 for migration from C. grandiflora to C. rubella (forward in time) and log-uniform between -5 and 6 for migration from C. rubella to C. grandiflora. The prior for the time of the split between C. bursa-pastoris and C. grandiflora was uniform 0-4x10 .
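For readers unfamiliar with these prior families, the sketch below draws one parameter set from priors of the stated form. It is not MIMAR code: the dictionary keys are hypothetical labels, species 1 stands for C. grandiflora and species 2 for the derived species, the migration bounds are assumed to be on a log10 scale, and the split-time prior is omitted.

```python
import random


def draw_from_priors(rng, asymmetric=False):
    """One draw from uniform, exponential and log-uniform priors like those above."""
    draw = {
        "theta_species1": rng.uniform(0.001, 0.1),  # C. grandiflora
        "theta_ancestor": rng.uniform(0.001, 0.1),
        "theta_species2": rng.uniform(0.0, 0.0025),  # derived species
        "rho_over_theta": rng.expovariate(1.0),  # exponential, rate 1
    }
    if asymmetric:
        # Assumed log10 bounds: species 1 -> 2 in [-5, 2.5], species 2 -> 1 in [-5, 6]
        draw["m_1_to_2"] = 10 ** rng.uniform(-5, 2.5)
        draw["m_2_to_1"] = 10 ** rng.uniform(-5, 6)
    else:
        m = 10 ** rng.uniform(-5, 2.5)  # one shared symmetric rate
        draw["m_1_to_2"] = draw["m_2_to_1"] = m
    return draw


rng = random.Random(2010)
draws = [draw_from_priors(rng, asymmetric=True) for _ in range(1000)]
print(draws[0])
```

A log-uniform prior spreads draws evenly across orders of magnitude, so very low and moderately high migration rates are explored with comparable weight.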
Simulations were conducted in three ways: 1) including C. grandiflora and C. bursa-pastoris A, 2) including C. grandiflora and C. bursa-pastoris B and 3) including C. bursa-pastoris A and C. bursa-pastoris B. Each simulation was run for a total of 10,080 min (1 week) with a burn-in of 100,000. Mixing was monitored by assessing parameter autocorrelation over runs, and MIMAR was considered to have reached convergence when the posterior distributions from independent runs were highly similar (Becquet and
Przeworski 2007). The mode of the marginal posterior probability distribution was taken as the point estimate for each parameter, and 90% highest posterior density
(HPD) intervals were calculated from the MIMAR output using the boa package in R
2.9.0 (Smith 2007).
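The point estimate and interval described above can be illustrated with a short independent sketch. boa provides its own HPD routine; the hypothetical functions below compute the shortest interval containing 90% of the sorted samples (appropriate for a unimodal posterior) and a histogram-based approximation of the mode. The normal draws stand in for MCMC output and are not MIMAR results.

```python
import math
import random


def hpd_interval(samples, mass=0.90):
    """Shortest interval containing `mass` of the sorted samples (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def histogram_mode(samples, bins=50):
    """Midpoint of the most populated histogram bin, a crude posterior mode."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    best = counts.index(max(counts))
    return lo + (best + 0.5) * width


rng = random.Random(7)
draws = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
low, high = hpd_interval(draws)
print(round(low, 2), round(high, 2), round(histogram_mode(draws), 2))
```

For a skewed posterior, such as a divergence time bounded below by zero, the HPD interval is shifted toward the bulk of the distribution rather than being symmetric around the mode.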
To assess the validity of the results from MIMAR, an autopolyploid event
(depicted in Figure S1) was simulated using the coalescent simulation program ms
(Hudson 2002). A total of 14,000 simulations were run. From this output, sixty loci were
randomly selected and the C. bursa-pastoris loci "A" and "B" were assigned such that,
based on the number of fixed differences and unique variants, the locus most diverged
from C. grandiflora was designated locus B. These 60 loci were then divided into five
groups of twelve and run through MIMAR in the absence of migration under the three
models described above.
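The subsampling step can be sketched as below, assuming the simulated loci are indexed 0 to 13,999. The function name and seed are hypothetical, and the A/B assignment of each selected locus would follow this selection.

```python
import random


def replicate_groups(n_simulated=14_000, n_selected=60, group_size=12, seed=None):
    """Pick loci at random without replacement and split them into equal groups."""
    if n_selected % group_size:
        raise ValueError("n_selected must be a multiple of group_size")
    rng = random.Random(seed)
    chosen = rng.sample(range(n_simulated), n_selected)
    return [chosen[i:i + group_size] for i in range(0, n_selected, group_size)]


groups = replicate_groups(seed=1)
print(len(groups), [len(g) for g in groups])
```

Sampling without replacement ensures that no simulated locus appears in more than one of the five replicate datasets.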
Results
Patterns of Polymorphism
Synonymous diversity, measured as π synonymous (where π is the average
number of pairwise differences between two sequences), was estimated in C. grandiflora,
C. rubella and the A and B loci of C. bursa-pastoris (Figure 2a). In agreement with previous studies (Foxe et al. 2009 (Chapter 2)), levels of synonymous diversity in the outcrossing C. grandiflora were greatly elevated above those in the
selfing C. rubella. Similarly, synonymous diversity in C. grandiflora was 5- and 7-fold higher than in C. bursa-pastoris A and C. bursa-pastoris B, respectively.
It has been hypothesized that the reduction of diversity in C. rubella is the result of a massive population bottleneck associated with the transition to selfing as
recently as 13,500 years ago (Foxe et al. 2009 (Chapter 2)). The similar reduction in
synonymous diversity in C. bursa-pastoris may also be due to a recent population
bottleneck associated with its origin and/or its selfing mating system, as it is well
documented that selfing species display much reduced levels of genetic diversity in
comparison with their outcrossing relatives (Charlesworth and Yang 1998; Baudry et al.
2001; Glémin et al. 2006; Foxe et al. 2009 (Chapter 2)). Average synonymous Tajima's D was
considerably lower in C. bursa-pastoris (Tajima's D synonymous = -0.474) than in
C. grandiflora (Tajima's D synonymous = -0.132), consistent with a
population expansion following a population bottleneck in C. bursa-pastoris.
Evidence for an autopolyploid origin of C. bursa-pastoris
The minimum number of synonymous substitutions between C. grandiflora and
each of the two C. bursa-pastoris alleles, as well as between C. bursa-pastoris A and C.
bursa-pastoris B, was calculated (Table 1). Were C. bursa-pastoris an allopolyploid with
C. grandiflora as one progenitor, one of the alleles would be expected to display a
significantly greater number of synonymous substitutions relative to C. grandiflora than
the other allele. Locus by locus, there is, in general, minimal difference in the number of
synonymous substitutions between C. grandiflora and C. bursa-pastoris A versus C. grandiflora and
C. bursa-pastoris B. Because the more distantly related homoeolog at each locus has been designated C.
bursa-pastoris B, these results represent the most extreme case. Even in this extreme case, where all the more distant homoeologs are assigned to B, the minimum distance to C. grandiflora is still considerably lower than the
distance between C. bursa-pastoris A and C. bursa-pastoris B. These results are perhaps
more consistent with the formation of C. bursa-pastoris through an autopolyploid event
involving two distinct C. grandiflora haplotypes.
At synonymous sites, there were 29 fixed differences between C.
bursa-pastoris A and C. bursa-pastoris B, compared with 2 between C. bursa-pastoris A
and C. grandiflora and 19 between C. bursa-pastoris B and C. grandiflora (Figure 3).
Again, these results show that both the C. bursa-pastoris "A" and "B" loci are more
closely related to C. grandiflora than they are to each other. These results are also
consistent with the origin of C. bursa-pastoris from an autopolyploid event involving two
distinct C. grandiflora haplotypes. A possible alternative is an allopolyploid event between C. grandiflora and a very close relative.
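Fixed differences, together with shared and unique polymorphisms, are the per-locus counts on which this comparison and the MIMAR analysis rely. The sketch below classifies the columns of two aligned samples; it is an illustration rather than the Perl scripts used here, and, in line with the MIMAR filter described in the Methods, it skips columns with more than two bases overall. The toy sequences are hypothetical.

```python
def classify_sites(pop1, pop2):
    """Count shared, unique and fixed sites between two aligned samples.

    Columns that are monomorphic across both samples, or that carry more
    than two distinct bases overall, are skipped.
    """
    counts = {"shared": 0, "unique_1": 0, "unique_2": 0, "fixed": 0}
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        bases1, bases2 = set(col1), set(col2)
        if len(bases1 | bases2) != 2:
            continue
        poly1, poly2 = len(bases1) == 2, len(bases2) == 2
        if poly1 and poly2:
            counts["shared"] += 1      # same two bases segregate in both samples
        elif poly1:
            counts["unique_1"] += 1    # polymorphic only in sample 1
        elif poly2:
            counts["unique_2"] += 1    # polymorphic only in sample 2
        else:
            counts["fixed"] += 1       # each sample monomorphic for a different base
    return counts


counts = classify_sites(["AACG", "ATCT"], ["AAGG", "TAGT"])
print(counts)
```

Under this scheme, few fixed differences and many shared polymorphisms indicate a recent split, which is why the 2 fixed differences between C. bursa-pastoris A and C. grandiflora point to a close relationship.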
Molecular Phylogenetics
Taking a molecular phylogenetic approach, the program BEST (Bayesian estimation of species trees; Liu 2008), which implements a Bayesian hierarchical model that accounts for deep coalescence, was used to estimate the species tree of the genus Capsella from the multi-locus dataset. BEST was run in two ways (see Methods), once using
A. thaliana as an outgroup (Figure 4a) and again including both A. thaliana and N. paniculata
(where available, Figure 4b). In neither tree is C. grandiflora more closely related to one C. bursa-pastoris locus ("A" or "B") than to the other, and in fact the tree illustrated in Figure 4a represents exactly what we would expect to see if C. bursa-pastoris is an autopolyploid. Once again, these results do not provide strong evidence for
an allopolyploid event between two distantly related species.
Demographic Model Fitting
To investigate the timing of speciation between C. grandiflora and C.
bursa-pastoris and to quantify effective population size (Ne), I used a Markov chain
Monte Carlo (MCMC) approach based on coalescent simulations to fit a model of
isolation with and without migration to the data (Becquet and Przeworski 2007). This
approach uses the observed information from each locus on the numbers of
shared and unique polymorphisms, as well as fixed differences. The model assumes that a
single ancestral population of size Na split into two at time T, and that the two derived
populations have distinct effective population sizes (N1 and N2). The coalescent
simulations were run under three models: 1) species 1 = C. grandiflora, species 2 = C.
bursa-pastoris A; 2) species 1 = C. grandiflora, species 2 = C. bursa-pastoris B; 3)
species 1 = C. bursa-pastoris A, species 2 = C. bursa-pastoris B. Each model was run
1) in the absence of gene flow, 2) allowing for symmetrical migration and 3) allowing for
asymmetrical migration (Table 2). The models in which C. grandiflora is the ancestor assumed that the ancestral effective population size was equal to that of present-day C. grandiflora.
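Effective population sizes follow from θ through the standard relation θ = 4Neμ. The sketch below uses the mutation rate stated in the Methods and hypothetical θ values chosen only to show the arithmetic. Treating each homoeolog as a diploid-like locus, and applying the rate per generation (i.e. one generation per year), are assumptions of the sketch, not statements about the MIMAR output.

```python
MU_PER_SITE = 1.5e-8  # mutation rate per bp used in the MIMAR runs (Koch et al. 2001)


def effective_size(theta_per_site, mu=MU_PER_SITE):
    """Ne from theta = 4 * Ne * mu (diploid-style locus assumed)."""
    return theta_per_site / (4.0 * mu)


def fold_reduction(theta_reference, theta_derived):
    """Ratio of effective sizes; mu cancels, so only the theta values matter."""
    return effective_size(theta_reference) / effective_size(theta_derived)


# Hypothetical theta values, for illustration only
ne_anc = effective_size(0.02)
ne_der = effective_size(0.004)
print(round(ne_anc), round(ne_der), round(fold_reduction(0.02, 0.004), 3))
```

Because μ cancels in the ratio, the fold reductions reported below do not depend on the assumed mutation rate, whereas absolute Ne values scale inversely with it.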
It should again be noted that, based upon the minimum number of synonymous
substitutions between C. grandiflora and each of the two C. bursa-pastoris alleles, each homoeolog was designated locus A or locus B, where locus B has the larger minimum distance (Table 1). This designation allows exploration of
one extreme possibility, in which C. bursa-pastoris originated via allopolyploidization.
However, the comparison between C. bursa-pastoris A and C. bursa-pastoris B should not
be biased by this designation.
The results from each of these three species pair comparisons are shown in Table
2 and Figure 5. The results from demographic models allowing for symmetrical and
asymmetrical migration show no evidence for migration between C. grandiflora and C.
bursa-pastoris. Consequently, the results outlined below refer to the models run in the
absence of migration.
In comparison with C. grandiflora, C. bursa-pastoris A and C. bursa-pastoris B display a 5- and 7-fold decrease in effective population size, respectively (Figure 5a).
These results contrast sharply with previous work, which demonstrated a 100- to 1,500-fold reduction in effective population size in C. rubella in comparison with C. grandiflora,
likely the result of a population bottleneck associated with the transition to selfing (Foxe et al. 2009 (Chapter 2)). It may be that the formation of C. bursa-pastoris as a new species did not involve as severe a bottleneck as that in C. rubella. Were C. bursa-pastoris to have a single origin, we would likely see evidence for a strong population bottleneck in the present-day population. It may be that evidence of a bottleneck has eroded with time. It is also possible that C. bursa-pastoris is the result of recurrent polyploid formation, which would not leave the signature of a severe population bottleneck.
This would be in keeping with the literature, which suggests that recurrent polyploid formation is far more common than a single polyploid speciation event (Soltis and
Soltis 1999).
Divergence time between C. grandiflora and C. bursa-pastoris A was estimated at
~308,000 years, and at ~1.1 million years between C. grandiflora and C. bursa-pastoris B
(Figure 5a). Divergence between C. bursa-pastoris A and C. bursa-pastoris B was estimated
at ~667,000 years (Figure 5b). While the results from the first set of simulations (Figure
5a) do seem to point to an allopolyploid event, they do not completely rule out an
autopolyploid event, as the assumptions made when designating each of the
C. bursa-pastoris alleles "A" and "B" may be biasing the results. The divergence time from the second
set of simulations, between C. bursa-pastoris A and C. bursa-pastoris B (Figure 5b), lies roughly midway between the divergence times of C. grandiflora from C. bursa-pastoris A and from C. bursa-pastoris B,
consistent with an autopolyploid origin of C. bursa-pastoris.
To assess the plausibility of an autopolyploid model, an autopolyploid event
(depicted in Figure S1) was simulated using the coalescent simulation program ms
(Hudson 2002). The distributions of synonymous variants in the observed and five
simulated datasets were compared (Figures 6 and 7). The numbers of shared and unique synonymous variants, as well as synonymous fixed differences, were comparable between each simulated dataset and the observed data.
These simulated datasets were run through MIMAR in the absence of migration
(see Methods). The results of these simulations are presented in Table S3. Each of the five simulated datasets recapitulates the observed results under the models in which 1) species 1 = C. grandiflora, species 2 = C. bursa-pastoris A and 2) species 1 = C. grandiflora, species 2 = C. bursa-pastoris B. As in the
observed data, each simulated dataset shows an asymmetry in divergence time, with an average simulated
divergence time of ~325,000 years between C. grandiflora and C. bursa-pastoris A and
~967,000 years between C. grandiflora and C. bursa-pastoris B. While the divergence
time between C. bursa-pastoris A and C. bursa-pastoris B fluctuates across simulated
datasets, two of the datasets give divergence time estimates similar to those from the
observed data. Overall, the congruence between observed and simulated data serves to
validate the use of our autopolyploid model in MIMAR.
Discussion
Previous studies investigating the evolutionary history of the genus Capsella have
inferred a single origin of the selfing C. rubella from the outcrossing C. grandiflora as
little as 13,500 years ago. Furthermore, this speciation event is thought to have resulted in
a massive population bottleneck in C. rubella (Foxe et al. 2009 (Chapter 2); Guo et al.
2009). The results presented here suggest multiple origins of C. bursa-pastoris via
autopolyploidization involving two distinct C. grandiflora haplotypes.
While C. grandiflora and C. rubella are thought to have an extremely recent
divergence time, results from demographic model fitting estimate a divergence time of
approximately 667,000 years between C. grandiflora and C. bursa-pastoris. With the
divergence of C. rubella from C. grandiflora dated at approximately 13,500 years ago, these results conclusively exclude a role for C. rubella in the formation of C. bursa-pastoris.
The question of whether C. bursa-pastoris is an allopolyploid or an autopolyploid
has been the subject of some controversy in the literature (Hurka et al. 1989; Hurka and
Neuffer 1997; Slotte et al. 2006). Previous studies have produced conflicting
hypotheses about the evolutionary origins of C. bursa-pastoris. Direct comparisons of the
minimum number of synonymous changes between C. grandiflora and C. bursa-pastoris,
comparisons of fixation patterns, and a molecular phylogenetic approach all point to
an autopolyploid event resulting in C. bursa-pastoris.
Were C. bursa-pastoris to have a single origin, we might expect to see evidence
for a population bottleneck. While there is evidence for a reduction in effective population size in C. bursa-pastoris relative to its ancestor, the scale of this reduction is
not comparable to the bottleneck observed in C. rubella, for which a 100- to 1,500-fold reduction in
Ne has been estimated (Foxe et al. 2009 (Chapter 2)). It is possible that evidence for a population bottleneck has eroded with time, as the divergence of C. bursa-pastoris from
C. grandiflora is approximately 50 times older than the divergence of C. rubella from C. grandiflora. It seems more likely, however, that C. bursa-pastoris had multiple origins, which would not leave the signature of an extreme population bottleneck.
In keeping with earlier results (Foxe et al. 2009 (Chapter 2)), levels of synonymous diversity were reduced in C. rubella compared with C. grandiflora. However, the C. rubella estimates here are 2.3 times higher than previous studies indicate.
Much of the diversity found in C. rubella is driven by the DFR locus (π synonymous =
0.1539). Dihydroflavonol reductase (DFR) is a key enzyme in anthocyanin synthesis
(Holton and Cornish 1995). Anthocyanins are responsible for brick-red, red and blue
pigments in plants (Holton and Cornish 1995). Interestingly, one of the
diagnostic characteristics of C. rubella is a reddish tinge to its fruits.
Recent unpublished flower bud expression data generated from Illumina mRNA
sequencing show an approximately 5-fold increase in DFR expression in C. rubella
relative to C. grandiflora (S. Wright, personal communication). However, this does not
explain the increased levels of diversity present in DFR in C. rubella. Anthocyanins are
also known to play a role in UV protection and defence against pathogens (Koes et al.
1994). Disease resistance and defence-related genes are often subject to balancing
selection resulting from continuing plant-pathogen dynamics (Tiffin and Moeller 2006).
It may be that DFR is under balancing selection in C. rubella, which would result in
elevated levels of genetic diversity.
While these results shed much light on the evolutionary origin of C. bursa-pastoris and establish the molecular phylogeny of the genus Capsella, little is yet known about the extensive phenotypic changes that have occurred in both C. bursa-pastoris and
C. rubella. Understanding the genomic context and underlying evolutionary forces that have prompted these changes will be of considerable interest in future studies.
References
Acarkan, A., M. Rossberg, M. Koch, and R. Schmidt. 2000. Comparative genome analysis reveals extensive conservation of genome organisation for Arabidopsis thaliana and Capsella rubella. Plant J 23:55-6
Ainouche, M. L., A. Baumel, and A. Salmon. 2004. Spartina anglica C. E. Hubbard: a
natural model system for analysing early evolutionary changes that affect
allopolyploid genomes. Biol J Linn Soc 82:475-484.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local
alignment search tool. J Mol Biol 215:403-410.
Baudry, E., C. Kerdelhue, H. Innan, and W. Stephan. 2001. Species and recombination
effects on DNA variability in the tomato genus. Genetics 158:1725-1735.
Becquet, C., and M. Przeworski. 2007. A new approach to estimate parameters of
speciation models with application to apes. Genome Res 17:1505-1519.
Boivin, K., A. Acarkan, R. S. Mbulu, O. Clarenz, and R. Schmidt. 2004. The Arabidopsis
genome sequence as a tool for genome analysis in Brassicaceae. A comparison of
the Arabidopsis and Capsella rubella genomes. Plant Physiol 135:735-744.
Charlesworth, D., and Z. Yang. 1998. Allozyme diversity in Leavenworthia populations
with different inbreeding levels. Heredity 81:453-461.
Coyne, J. A., and H. A. Orr. 2004. Speciation. Sinauer Associates, Inc., Sunderland,
Massachusetts.
Cronn, R., M. Cedroni, T. Haselkorn, C. Grover, and J. F. Wendel. 2002. PCR-mediated
recombination in amplification products derived from polyploid cotton. Theor
Appl Genet 104:482-489.
Doyle, J. J., and A. N. Egan. 2009. Dating the origins of polyploidy events. New Phytol.
Foxe, J. P., T. Slotte, E. A. Stahl, B. Neuffer, H. Hurka, and S. I. Wright. 2009. Rapid
morphological evolution and speciation associated with the evolution of selfing in
Capsella. PNAS 106:5241-5245.
Gill, N., S. Findley, J. G. Walling, C. Hans, J. Ma, J. Doyle, G. Stacey, and S. A.
Jackson. 2009. Molecular and chromosomal evidence for allopolyploidy in
soybean. Plant Physiol 151:1167-1174.
Glémin, S., E. Bazin, and D. Charlesworth. 2006. Impact of mating systems on patterns of
sequence polymorphism in flowering plants. Proc Biol Sci 273:3011-3019.
Grubbs, K. C., R. L. Small, and E. E. Schilling. 2009. Evidence for multiple, autoploid
origins of agamospermous populations in Eupatorium sessilifolium (Asteraceae).
Plant Sys Evol 279:151-161.
Guo, Y.-L., J. S. Bechsgaard, T. Slotte, B. Neuffer, M. Lascoux, D. Weigel, and M. H.
Schierup. 2009. Recent speciation of Capsella rubella from Capsella grandiflora,
associated with loss of self-incompatibility and an extreme bottleneck. PNAS
106:5246-5251.
Hazzouri, K. M., A. Mohajer, S. I. Dejak, S. P. Otto, and S. I. Wright. 2008. Contrasting
patterns of transposable-element insertion polymorphism and nucleotide diversity
in autotetraploid and allotetraploid Arabidopsis species. Genetics 179:581-592.
Hegarty, M. J., and S. J. Hiscock. 2008. Genomic clues to the evolutionary success of
polyploid plants. Curr Biol 18:R435-444.
Hey, J. 2010. Isolation with migration models for more than two populations. Mol Biol
Evol 27:905-920.
Hey, J., and R. Nielsen. 2007. Integration within the Felsenstein equation for improved
Markov chain Monte Carlo methods in population genetics. PNAS 104:2785-
2790.
Hintz, M., C. Bartholmes, P. Nutt, J. Ziermann, S. Hameister, B. Neuffer, and G.
Theissen. 2006. Catching a 'hopeful monster': shepherd's purse (Capsella bursa-
pastoris) as a model system to study the evolution of flower development. J Exp
Bot 57:3531-3542.
Holton, T. A., and E. C. Cornish. 1995. Genetics and Biochemistry of Anthocyanin
Biosynthesis. The Plant Cell 7:1071-1083.
Hudson, R. R. 2002. Generating samples under a Wright-Fisher neutral model of genetic
variation. Bioinformatics 18:337-338.
Hurka, H., S. Freundner, A. H. Brown, and U. Plantholt. 1989. Aspartate
aminotransferase isozymes in the genus Capsella (Brassicaceae): subcellular
location, gene duplication, and polymorphism. Biochem Genet 27:77-90.
Hurka, H., and B. Neuffer. 1997. Evolutionary processes in the genus Capsella
(Brassicaceae). Plant Sys Evol 206:295-316.
Jakobsson, M., J. Hagenblad, S. Tavaré, T. Säll, C. Halldén, C. Lind-Halldén, and M.
Nordborg. 2006. A unique recent origin of the allotetraploid species Arabidopsis
suecica: Evidence from nuclear DNA markers. Mol Biol Evol 23:1217-1231.
Koch, M., B. Haubold, and T. Mitchell-Olds. 2001. Molecular systematics of the
Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences.
Am J Bot 88:534-544.
Kochert, G., H. T. Stocker, M. Gimenes, L. Galgaro, C. R. Lopes, and K. Moore. 1996.
RFLP and cytogenetic evidence on the origin and evolution of allotetraploid
domesticated peanut, Arachis hypogaea (Leguminosae). Am J Bot 83:1282-1291.
Koes, R. E., F. Quattrocchio, and J. N. M. Mol. 1994. The flavonoid biosynthetic pathway in
plants: function and evolution. BioEssays 16:123-132.
Levy, A. A., and M. Feldman. 2002. The impact of polyploidy on grass genome
evolution. Plant Physiol 130:1587-1593.
Librado, P., and J. Rozas. 2009. DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics 25:1451-1452.
Liu, L. 2008. BEST: Bayesian estimation of species trees under the coalescent model.
Bioinformatics 24:2542-2543.
Mallet, J. 2007. Hybrid speciation. Nature 446:279-283.
Meimberg, H., K. J. Rice, N. F. Milan, C. C. Njoku, and J. K. McKay. 2009. Multiple
origins promote the ecological amplitude of allopolyploid Aegilops (Poaceae).
Am J Bot 96:1262-1273.
Müntzing, A. 1930. Outlines to a genetic monograph of the genus Galeopsis with special
reference to the nature and inheritance of partial sterility. Hereditas 13:185-341.
Nicholas, K. B., H. B. Nicholas Jr., and D. W. Deerfield II. 1997. GeneDoc: Analysis and
Visualization of Genetic Variation. EMBNEW.NEWS 4:14.
Nielsen, R., and J. Wakeley. 2001. Distinguishing migration from isolation: a Markov
chain Monte Carlo approach. Genetics 158:885-896.
Noor, M. A., and J. L. Feder. 2006. Speciation genetics: evolving approaches. Nat Rev
Genet 7:851-861.
Otto, S. P., and J. Whitton. 2000. Polyploid incidence and evolution. Ann Rev Gen
34:401-437.
Ramos-Onsins, S. E., B. E. Stranger, T. Mitchell-Olds, and M. Aguade. 2004. Multilocus
analysis of variation and speciation in the closely related species Arabidopsis
halleri and A. lyrata. Genetics 166:373-388.
Ramsey, J., and D. Schemske. 1998. Pathways, mechanisms and rates of polyploid
formation in flowering plants. Ann Rev Ecol Sys 29:467-501.
Raybould, A. F., A. J. Gray, M. J. Lawrence, and D. F. Marshall. 1991. The evolution of Spartina
anglica C. E. Hubbard (Gramineae): origin and genetic variability. Biol J Linn Soc
83:1282-1291.
Ross-Ibarra, J., S. I. Wright, J. P. Foxe, A. Kawabe, L. DeRose-Wilson, G. Gos, D.
Charlesworth, and B. S. Gaut. 2008. Patterns of Polymorphism and Demographic
History in Natural Populations of Arabidopsis lyrata. PloS One 3:e2411.
Shimizu-Inatsugi, R., J. Lihova, H. Iwanaga, H. Kudoh, K. Marhold, O. Savolainen, K.
Watanabe, V. V. Yakubov, and K. K. Shimizu. 2009. The allopolyploid
Arabidopsis kamchatica originated from multiple individuals of Arabidopsis
lyrata and Arabidopsis halleri. Mol Ecol 18:4024-4048.
Skalińska, M. 1945. Cytogenetic studies in triploid hybrids of Aquilegia. J Gen 47:87-
111.
Slotte, T., A. Ceplitis, B. Neuffer, H. Hurka, and M. Lascoux. 2006. Intrageneric phylogeny of Capsella (Brassicaceae) and the origin of the tetraploid C. bursa-pastoris based on chloroplast and nuclear DNA sequences. Am J Bot 93:1714-1724.
Slotte, T., H. Huang, M. Lascoux, and A. Ceplitis. 2008. Polyploid speciation did not
confer instant reproductive isolation in Capsella (Brassicaceae). Mol Biol Evol
25:1472-1481.
Smith, B. J. 2007. boa: An R Package for MCMC Output Convergence Assessment and
Posterior Inference. Journal of Statistical Software 21:1-37.
Soltis, D. E., and P. S. Soltis. 1993. Molecular data and the dynamic nature of
polyploidy. Crit Rev Plant Sci 12:243-273.
Soltis, D. E., and P. S. Soltis. 1999. Polyploidy: recurrent formation and genome
evolution. TREE 14:348-352.
Soltis, D. E., P. S. Soltis, and J. A. Tate. 2003. Advances in the study of polyploidy since
Plant Speciation. New Phyt 161:173-191.
Stebbins, G. L. 1950. Variation and evolution in plants. Columbia University Press, New York, New York, USA.
Stebbins, G. L. 1971. Chromosomal Evolution in Higher Plants. Arnold, London, UK.
Stephens, M., and P. Donnelly. 2003. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73:1162-1169.
Stephens, M., N. J. Smith, and P. Donnelly. 2001. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978-989.
Symonds, V. V., P. S. Soltis, and D. E. Soltis. 2010. Dynamics of Polyploid Formation in
Tragopogon (Asteraceae): Recurrent Formation, Gene Flow, and Population
Structure. Evolution 64:1984-2003.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA
polymorphism. Genetics 123:585-595.
Tajima, F. 1993. Measurement of DNA polymorphism. Pp. 37-60 in N. Takahata, and A.
Clark, eds. Mechanisms of molecular evolution. Japan Scientific Societies Press,
Tokyo.
Tiffin, P., and D. A. Moeller. 2006. Molecular evolution of plant immune system genes.
Trends Genet 22:662-670.
Wakeley, J., and J. Hey. 1997. Estimating ancestral population parameters. Genetics
145:847-855.
Wang, R. L., J. Wakeley, and J. Hey. 1997. Gene flow and natural selection in the origin
of Drosophila pseudoobscura and close relatives. Genetics 147:1091-1106.
Watterson, G. A. 1975. On the number of segregating sites in genetical models without
recombination. Theor Popul Biol 7:256-276.
Wood, T. E., N. Takebayashi, M. S. Barker, I. Mayrose, P. B. Greenspoon, and L. H.
Rieseberg. 2009. The frequency of polyploid speciation in vascular plants. PNAS
106:13875-13879.
Zhang, L.-B., and S. Ge. 2007. Multilocus analysis of nucleotide variation and speciation
in Oryza officinalis and its close relatives. Mol Biol Evol
Table 1. Minimum number of synonymous substitutions and number of fixed differences between C. bursa-pastoris A and C. grandiflora, C. bursa-pastoris B and C. grandiflora, and C. bursa-pastoris A and C. bursa-pastoris B. Numbers of fixed differences are given in parentheses.

Locus        C. bursa-pastoris A    C. bursa-pastoris B    C. bursa-pastoris A
             and C. grandiflora     and C. grandiflora     and C. bursa-pastoris B
At1g03560    0 (0)                  2 (1)                  1 (1)
At1g15240    1 (1)                  2 (1)                  4 (4)
At1g65450    2 (1)                  2 (2)                  4 (2)
At1g77120    0 (0)                  2 (2)                  2 (2)
At2g18790    0 (0)                  4 (4)                  6 (5)
At2g26730    2 (0)                  2 (1)                  8 (4)
At4g00650    1 (0)                  2 (0)                  3 (2)
At4g02560    0 (0)                  0 (0)                  0 (0)
At4g08920    0 (0)                  6 (6)                  6 (4)
At4g14190    0 (0)                  2 (0)                  4 (1)
At5g10140    0 (0)                  0 (0)                  0 (0)
At5g42800    3 (0)                  5 (0)                  0 (0)
At5g51670    2 (0)                  4 (1)                  4 (3)
At5g53020    0 (0)                  KD                     KD
Table 2. Modes of parameter estimates under a range of MIMAR models. 90% HPD intervals are given in parentheses.

Constrained model; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris A
  θA 0.02588 (0.01930, 0.03711); θSp1 0.02588 (0.01930, 0.03711); θSp2 0.00307 (0.00149, 0.00631); M(Sp1→Sp2) n/a; M(Sp2→Sp1) n/a; T 278,278 (116,568, 518,126)
Constrained model; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris B
  θA 0.02601 (0.01951, 0.03499); θSp1 0.02601 (0.01951, 0.03496); θSp2 0.00522 (0.00256, 0.00842); M(Sp1→Sp2) n/a; M(Sp2→Sp1) n/a; T 1,060,000 (639,571, 1,567,280)
Unconstrained model; Species 1 = C. bursa-pastoris A, Species 2 = C. bursa-pastoris B
  θA 0.03843 (0.01891, 0.07939); θSp1 0.00526 (0.00255, 0.00851); θSp2 0.00424 (0.00218, 0.00771); M(Sp1→Sp2) n/a; M(Sp2→Sp1) n/a; T 562,563 (220,087, 1,159,130)
Constrained model, symmetrical migration; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris A
  θA 0.02524 (0.01721, 0.03431); θSp1 0.02524 (0.01721, 0.03431); θSp2 0.00307 (0.00129, 0.006013); M(Sp1→Sp2) 0.33664 (0.00674, 0.85197); M(Sp2→Sp1) n/a; T 314,314 (75,036, 2,203,320)
Constrained model, symmetrical migration; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris B
  θA 0.02538 (0.01854, 0.03438); θSp1 0.02538 (0.01854, 0.03438); θSp2 0.00413 (0.00167, 0.00750); M(Sp1→Sp2) 0.03912 (0.00674, 0.12639); M(Sp2→Sp1) n/a; T 1,263,260 (730,954, 2,029,360)
Unconstrained model, symmetrical migration; Species 1 = C. bursa-pastoris A, Species 2 = C. bursa-pastoris B
  θA 0.02656 (0.00250, 0.06506); θSp1 0.00516 (0.00287, 0.00902); θSp2 0.00395 (0.00191, 0.00746); M(Sp1→Sp2) 0.01214 (0.00674, 0.04608); M(Sp2→Sp1) n/a; T 1,279,280 (397,615, 2,480,350)
Constrained model, asymmetrical migration; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris A
  θA 0.02524 (0.01814, 0.03494); θSp1 0.02524 (0.01814, 0.03494); θSp2 0.00200 (0.00045, 0.00514); M(Sp1→Sp2) 0.00866 (0.00674, 0.63413); M(Sp2→Sp1) 2.84202 (0.00675, 5.89006); T 342,342 (107,360, 3,493,050)
Constrained model, asymmetrical migration; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris B
  θA 0.026374 (0.01889, 0.03486); θSp1 0.026374 (0.01889, 0.03486); θSp2 0.00323 (0.00123, 0.00669); M(Sp1→Sp2) 0.00707 (0.00674, 0.09163); M(Sp2→Sp1) 0.24184 (0.00675, 0.79889); T 1,347,350 (727,725, 2,537,080)
Unconstrained model, asymmetrical migration; Species 1 = C. bursa-pastoris A, Species 2 = C. bursa-pastoris B
  θA 0.019028 (0.00250, 0.07184); θSp1 0.00555 (0.00292, 0.00928); θSp2 0.00384 (0.00171, 0.00712); M(Sp1→Sp2) 0.00707 (0.00674, 0.06124); M(Sp2→Sp1) 0.01546 (0.00674, 0.11387); T 1,355,360 (393,270, 2,812,750)

a θA, θSp1, θSp2: θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate (1.5 × 10^-8), for the ancestor (A), species 1 (Sp1) and species 2 (Sp2).
b M(Sp1→Sp2): migration rate (4Nem) from Species 1 to Species 2.
c M(Sp2→Sp1): migration rate (4Nem) from Species 2 to Species 1.
d T: time of the split of Species 1 and Species 2.
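Because θ = 4Neμ, the posterior modes in Table 2 can be turned into rough effective population sizes by dividing by 4μ. The sketch below does this for the first constrained model, assuming that the footnoted rate of 1.5 × 10^-8 is per site per generation; the function name is illustrative, and the conversion ignores both uncertainty in μ and the HPD intervals.

```python
# Convert per-site theta estimates into effective population sizes using
# theta = 4 * Ne * mu  =>  Ne = theta / (4 * mu).
MU = 1.5e-8  # mutation rate from the table footnote (assumed per site per generation)


def ne_from_theta(theta: float, mu: float = MU) -> float:
    """Return the effective population size implied by a per-site theta."""
    if theta < 0 or mu <= 0:
        raise ValueError("theta must be non-negative and mu positive")
    return theta / (4.0 * mu)


# Posterior modes from the first constrained model in Table 2
modes = {
    "C. grandiflora (and ancestor)": 0.02588,
    "C. bursa-pastoris A": 0.00307,
}

for label, theta in modes.items():
    print(f"{label}: Ne ~ {ne_from_theta(theta):,.0f}")

ratio = modes["C. grandiflora (and ancestor)"] / modes["C. bursa-pastoris A"]
print(f"theta ratio ~ {ratio:.1f}")
```

Under these assumptions the two modes correspond to Ne of roughly 431,000 and 51,000, about an 8.4-fold difference.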
Figure 1. Floral organs and petals are reduced in C. bursa-pastoris and C. rubella (left and middle, respectively) compared with C. grandiflora (right).
Figure 2. Comparison of silent polymorphism patterns among C. bursa-pastoris A, C. bursa-pastoris B, C. grandiflora and C. rubella, measured as synonymous π, where π is the average number of pairwise differences per site. Bars represent the median, boxes the interquartile range, and whiskers extend to 1.5 times the interquartile range.
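The diversity statistic summarized in Figure 2 can be illustrated with a short calculation. The sketch below computes π as the mean number of pairwise differences per site for a toy alignment; it assumes aligned, gap-free sequences and treats every site as a silent site, whereas the study restricted π to synonymous sites, which would require codon annotation not shown here. The function name and toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Return pi: the mean number of pairwise differences per site.

    Assumes the sequences are already aligned, of equal length and free of
    gaps or ambiguous bases (a simplification; real data need filtering).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(seqs[0])
    if n_sites == 0 or any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(base1 != base2 for base1, base2 in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * n_sites)


# Hypothetical toy alignment of four 10-bp sequences (not data from this study)
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
print(nucleotide_diversity(toy))
```

For this toy alignment the six pairwise comparisons contain six differences in total over 10 sites, so π = 0.1.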
[Figure 3 plot; y-axis 10 to 35; x-axis species pairs: CbpA and CbpB, CbpA and Cgl, CbpA and Crub, CbpB and Cgl, CbpB and Crub, Cgl and Crub]
Figure 3. Number of synonymous fixed differences between all pairs of C. bursa-pastoris A, C. bursa-pastoris B, C. grandiflora and C. rubella.
[Figure 4 trees. a) Taxa: A. thaliana, C. bursa-pastoris A, C. bursa-pastoris B, C. grandiflora, C. rubella. b) Taxa: A. thaliana, Neslia, C. bursa-pastoris A, C. bursa-pastoris B, C. grandiflora, C. rubella.]
Figure 4. Bayesian estimates of species trees for the genus Capsella, generated using BEST with a) A. thaliana and b) A. thaliana and Neslia as outgroups. The numbers at each node represent branch lengths.
[Figure 5 plots: marginal posterior densities (density axes 0 to 0.005); divergence-time axis in panel b from 2,000,000 to 4,000,000]
Figure 5. Marginal posterior distributions of speciation parameters estimated by MIMAR, with posterior modes showing good fit to data summaries. θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate (1.5 × 10^-8). a) Constrained model assuming the same effective population size in the ancestor as in present-day C. grandiflora.
Model 1 (Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris A) is shown in blue.
Model 2 (Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris B) is shown in red.
θ1: C. grandiflora
θ2: C. bursa-pastoris A (blue) and C. bursa-pastoris B (red)
Tgen: divergence time (years) between C. grandiflora and C. bursa-pastoris A (blue) and C. bursa-pastoris B (red)
b) Unconstrained model.
θA: ancestral C. grandiflora
θ1: C. bursa-pastoris A (blue)
θ2: C. bursa-pastoris B (red)
Tgen: divergence time (years) between C. bursa-pastoris A and C. bursa-pastoris B
[Figure 6 panels: a) C. bursa-pastoris A and C. bursa-pastoris B; b) C. grandiflora and C. bursa-pastoris A; c) C. grandiflora and C. bursa-pastoris B; each showing observed and simulated values]
Figure 6. Numbers of unique synonymous variants for each of the three species pairs: a) C. bursa-pastoris A and C. bursa-pastoris B, b) C. grandiflora and C. bursa-pastoris A, and c) C. grandiflora and C. bursa-pastoris B. Shown are the observed values as well as 5 simulated datasets for which an autopolyploid event was simulated using the coalescent simulation program ms (see Methods).
[Figure 7 panels: C. bursa-pastoris A and C. bursa-pastoris B; C. grandiflora and C. bursa-pastoris A; C. grandiflora and C. bursa-pastoris B; x-axis: Observed, 1 to 5]
Figure 7. Numbers of a) synonymous shared variants and b) synonymous fixed differences for each of the three species pairs: 1) C. bursa-pastoris A and C. bursa-pastoris B, 2) C. grandiflora and C. bursa-pastoris A, and 3) C. grandiflora and C. bursa-pastoris B. Shown are the observed values as well as 5 simulated datasets for which an autopolyploid event was simulated using the coalescent simulation program ms (see Methods).
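Figures 6 and 7 partition variable sites between two samples into unique (exclusive), shared and fixed classes. A minimal sketch of that bookkeeping follows, using the usual two-sample definitions (as in Wakeley and Hey 1997): a fixed difference is a site where each sample is monomorphic for a different allele, a shared polymorphism segregates in both samples, and an exclusive polymorphism segregates in only one. It assumes aligned, gap-free, biallelic sites and simply skips sites with more than two alleles; the function and toy data are hypothetical, not the pipeline used in this chapter.

```python
def classify_sites(sample1: list[str], sample2: list[str]) -> dict[str, int]:
    """Count fixed, shared and exclusive polymorphisms between two aligned samples."""
    if not sample1 or not sample2:
        raise ValueError("both samples must contain sequences")
    if len({len(s) for s in sample1 + sample2}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    counts = {"fixed": 0, "shared": 0, "exclusive1": 0, "exclusive2": 0, "skipped": 0}
    # zip(*sample) turns a list of sequences into a sequence of alignment columns
    for col1, col2 in zip(zip(*sample1), zip(*sample2)):
        alleles1, alleles2 = set(col1), set(col2)
        if len(alleles1 | alleles2) > 2:
            counts["skipped"] += 1  # outside the biallelic assumption
            continue
        poly1, poly2 = len(alleles1) == 2, len(alleles2) == 2
        if poly1 and poly2:
            counts["shared"] += 1
        elif poly1:
            counts["exclusive1"] += 1
        elif poly2:
            counts["exclusive2"] += 1
        elif alleles1 != alleles2:
            counts["fixed"] += 1
    return counts


# Hypothetical toy samples: site 0 fixed, site 2 exclusive to sample 2,
# site 3 exclusive to sample 1, site 4 shared
result = classify_sites(["AAGTA", "AAGCG"], ["CAGTA", "CATTG"])
print(result)
```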
Supplementary Information
Table S1. Accession name and origin of each individual used in this study.

Species              Accession            Origin
C. bursa-pastoris    AQ404                China
C. bursa-pastoris    AQ416                China
C. bursa-pastoris    BJB234               China
C. bursa-pastoris    BJB236               China
C. bursa-pastoris    CSH1                 China
C. bursa-pastoris    CSH2                 China
C. bursa-pastoris    CSH3                 China
C. bursa-pastoris    DL167                China
C. bursa-pastoris    DL170                China
C. bursa-pastoris    GY31                 China
C. bursa-pastoris    GY36                 China
C. bursa-pastoris    HD11                 China
C. bursa-pastoris    HD6                  China
C. bursa-pastoris    HD63                 China
C. bursa-pastoris    HD68                 China
C. bursa-pastoris    HRB135               China
C. bursa-pastoris    HRB137               China
C. bursa-pastoris    HY7                  China
C. bursa-pastoris    HY76                 China
C. bursa-pastoris    HY89                 China
C. bursa-pastoris    JZH1                 China
C. bursa-pastoris    JZH145               China
C. bursa-pastoris    JZH147               China
C. bursa-pastoris    KMB205               China
C. bursa-pastoris    KMB21                China
C. bursa-pastoris    KMB210               China
C. bursa-pastoris    LN16 (DL)            China
C. bursa-pastoris    NCH3                 China
C. bursa-pastoris    NCH350               China
C. bursa-pastoris    NCH355               China
C. bursa-pastoris    NCHJ (NCH)           China
C. bursa-pastoris    NJ218                China
C. bursa-pastoris    NJ222                China
C. bursa-pastoris    QD3                  China
C. bursa-pastoris    QD318                China
C. bursa-pastoris    QD319                China
C. bursa-pastoris    QD4                  China
C. bursa-pastoris    QF3
C. bursa-pastoris    QF335                China
C. bursa-pastoris    QF340                China
C. bursa-pastoris    SHX2 (XA)            China
C. bursa-pastoris    SHX3                 China
C. bursa-pastoris    TY11                 China
C. bursa-pastoris    TY121                China
C. bursa-pastoris    TY123                China
C. bursa-pastoris    WH2                  China
C. bursa-pastoris    WH48                 China
C. bursa-pastoris    WH49                 China
C. bursa-pastoris    XA106                China
C. bursa-pastoris    XA110                China
C. bursa-pastoris    XN442                China
C. bursa-pastoris    XY11                 China
C. bursa-pastoris    XY18                 China
C. bursa-pastoris    XY20                 China
C. bursa-pastoris    ZZH2                 China
C. bursa-pastoris    ZZH279               China
C. bursa-pastoris    ZZH284               China
C. bursa-pastoris    ZZH3                 China
C. bursa-pastoris    6 20                 France
C. bursa-pastoris    FR4 2                France
C. bursa-pastoris    ISR3 3 (MAE-ISR3)    Israel
C. bursa-pastoris    22 19                Italy
C. bursa-pastoris    32 25                Italy
C. bursa-pastoris    32 27                Italy
C. bursa-pastoris    39 12                Italy
C. bursa-pastoris    BEL RU13-1           Russia
C. bursa-pastoris    MOG RU10             Russia
C. bursa-pastoris    OBL RU5              Russia
C. bursa-pastoris    RU10 2 (MOG-RU10)    Russia
C. bursa-pastoris    RU3 1 (KAB-RU3)      Russia
C. bursa-pastoris    RU3 2 (KAB-RU3)      Russia
C. bursa-pastoris    RU5 5 (OBL-RU5)      Russia
C. bursa-pastoris    RU8 1 (BAD-RU8)      Russia
C. bursa-pastoris    VLA RU2              Russia
C. bursa-pastoris    1272 16 2            Spain
C. bursa-pastoris    TWPL                 Taiwan
C. bursa-pastoris    CHPL (TWPL)          ?
C. bursa-pastoris    CHPY (TWTY)          ?
C. grandiflora       8c block3            Greece
C. grandiflora       1 8h (8h block1)     Greece
C. grandiflora       10h                  Greece
C. grandiflora       11f                  Greece
C. grandiflora       2e_block5            Greece
C. grandiflora       4a block5            Greece
C. grandiflora       4o                   Greece
C. grandiflora       7g block3            Greece
C. grandiflora       910 18 1             Greece (Corfu)
C. grandiflora       910 18 2             Greece (Corfu)
C. grandiflora       910 20 1             Greece (Corfu)
C. grandiflora       910 20 2             Greece (Corfu)
C. grandiflora       910 21 1             Greece (Corfu)
C. grandiflora       910 21 2             Greece (Corfu)
C. grandiflora       918 1 1              Greece (Corfu)
C. grandiflora       918 1 2              Greece (Corfu)
C. grandiflora       921 3 1              Greece (Corfu)
C. grandiflora       921 3 2              Greece (Corfu)
C. grandiflora       925 14 1             Greece (Katara Pass)
C. grandiflora       925 14 2             Greece (Katara Pass)
C. grandiflora       925 19 1             Greece (Ioannina)
C. grandiflora       925 19 2             Greece (Ioannina)
C. grandiflora       925 21A 1            Greece (Ioannina)
C. grandiflora       925 21A 2            Greece (Ioannina)
C. grandiflora       925 26A 1            Greece (Ioannina)
C. grandiflora       925 26A 2            Greece (Ioannina)
C. grandiflora       925 9A 1             Greece (Ioannina)
C. grandiflora       925 9A 2             Greece (Ioannina)
C. grandiflora       926 4 1              Greece (Corfu)
C. grandiflora       926 4 2              Greece (Corfu)
C. grandiflora       926 8 1              Greece (Corfu)
C. grandiflora       926 8 2              Greece (Corfu)
C. grandiflora       933 14 1             Greece (Katara Pass)
C. grandiflora       933 14 2             Greece (Katara Pass)
C. grandiflora       933 15 1             Greece (Katara Pass)
C. grandiflora       933 15 2             Greece (Katara Pass)
C. grandiflora       933 18 1             Greece (Katara Pass)
C. grandiflora       933 18 2             Greece (Katara Pass)
C. grandiflora       934 31 1             Greece (Metsovo)
C. grandiflora       934 31 2             Greece (Metsovo)
C. grandiflora       934 32 1             Greece (Metsovo)
C. grandiflora       934 32 2             Greece (Metsovo)
C. grandiflora       935 13 1             Greece (Corfu)
C. grandiflora       935 13 2             Greece (Corfu)
C. grandiflora       935 15A 1            Greece (Corfu)
C. grandiflora       935 15A 2            Greece (Corfu)
C. grandiflora       935 17 1             Greece (Corfu)
C. grandiflora       935 17 2             Greece (Corfu)
C. grandiflora       935 4A 1             Greece (Corfu)
C. grandiflora       935 4A 2             Greece (Corfu)
C. grandiflora       91                   Greece
C. rubella           60 1                 Algeria
C. rubella           1377 10A             Argentina
C. rubella           1377 1A              Argentina
C. rubella           1377 2A              Argentina
C. rubella           1377 5A              Argentina
C. rubella           6 24                 France
C. rubella           6 25                 France
C. rubella           6 26                 France
C. rubella           63 1                 France
C. rubella           FR1 3 (U S-FR1)      France
C. rubella           FR4 1 (STJ-FR4)      France
C. rubella           U S FR1              France
C. rubella           925 3                Greece
C. rubella           925 6                Greece
C. rubella           AET ISR1             Israel
C. rubella           ENY ISR2             Israel
C. rubella           ISR1 6 (AET-ISR1)    Israel
C. rubella           ISR2 2 (ENY-ISR2)    Israel
C. rubella           2 3                  Italy
C. rubella           2 5                  Italy
C. rubella           22 12                Italy
C. rubella           22 15                Italy
C. rubella           23 13                Italy
C. rubella           27 6                 Italy
C. rubella           28 9                 Italy
C. rubella           3 11                 Italy
C. rubella           32 13                Italy
C. rubella           32 15                Italy
C. rubella           32 17                Italy
C. rubella           35 13                Italy
C. rubella           39 5                 Italy
C. rubella           49 13                Italy
C. rubella           8 5                  Luxembourg
C. rubella           4 23                 Spain
C. rubella           50 14                Spain
C. rubella           1209 38A             Spain (Canary Islands)
C. rubella           1209 24              Spain (Canary Islands)
C. rubella           1209 26              Spain (Canary Islands)
C. rubella           1209 36              Spain (Canary Islands)
C. rubella           1504_10              Spain (Canary Islands)
C. rubella           1504_2A              Spain (Canary Islands)
C. rubella           15048                Spain (Canary Islands)
C. rubella           1204 2               Spain (Tenerife)
Table S2. Names and gene ontology terms for each of the 14 loci studied.

Locus        Gene ontology terms
At1g03560    pentatricopeptide (PPR) repeat-containing
At1g15240    phox (PX) domain-containing protein
At1g65450    transferase family protein
At1g77120    ADH1 (alcohol dehydrogenase 1)
At2g18790    PHYB (PHYTOCHROME B); G-protein coupled photoreceptor/protein histidine kinase/red or far-red light photoreceptor (PHYB)
At2g26730    leucine-rich repeat transmembrane protein
At4g00650    FRI (FRIGIDA); protein heterodimerization/protein homodimerization (FRI)
At4g02560    LD (luminidependens); transcription factor (LD)
At4g08920    ATP binding/blue light photoreceptor/protein homodimerization/protein kinase (CRY1)
At4g14190    pentatricopeptide (PPR) repeat-containing
At5g10140    specific transcriptional repressor/transcription factor (FLC)/flowering locus C
At5g42800    dihydrokaempferol 4-reductase (DFR)
At5g51670    expressed protein
At5g53020    expressed protein
Table S3. Modes of parameter estimates under a range of MIMAR models for 5 simulated datasets in which an autopolyploid event was simulated using the coalescent simulation program ms (see Methods). 90% HPD intervals are given in parentheses.

Constrained model; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris A
  Dataset 1: θA 0.03168 (0.02389, 0.04140); θSp1 0.03168 (0.02389, 0.04140); θSp2 0.00296 (0.00136, 0.00572); T 254,254 (114,175, 139,990)
  Dataset 2: θA 0.03344 (0.02533, 0.04348); θSp1 0.03344 (0.02533, 0.04348); θSp2 0.00317 (0.00154, 0.00583); T 430,430 (223,414, 795,308)
  Dataset 3: θA 0.02699 (0.01965, 0.03559); θSp1 0.02699 (0.01965, 0.03559); θSp2 0.00313 (0.00146, 0.00604); T 286,286 (118,124, 531,389)
  Dataset 4: θA 0.02688 (0.01903, 0.03485); θSp1 0.02688 (0.01903, 0.03485); θSp2 0.00459 (0.00219, 0.00804); T 310,310 (148,054, 572,365)
  Dataset 5: θA 0.03509 (0.02677, 0.04526); θSp1 0.03509 (0.02677, 0.04526); θSp2 0.00346 (0.00174, 0.00652); T 342,342 (163,933, 600,000)
Constrained model; Species 1 = C. grandiflora, Species 2 = C. bursa-pastoris B
  Dataset 1: θA 0.02687 (0.02014, 0.03529); θSp1 0.02687 (0.02014, 0.03529); θSp2 0.00529 (0.00316, 0.00906); T 974,975 (598,900, 1,414,660)
  Dataset 2: θA 0.03331 (0.02515, 0.04299); θSp1 0.03331 (0.02515, 0.04299); θSp2 0.00166 (0.00610, 0.00378); T 1,023,020 (602,382, 1,490,120)
  Dataset 3: θA 0.03164 (0.02372, 0.04109); θSp1 0.03164 (0.02372, 0.04109); θSp2 0.00771 (0.00436, 0.01124); T 950,951 (584,542, 1,378,180)
  Dataset 4: θA 0.02646 (0.01921, 0.03416); θSp1 0.02646 (0.01921, 0.03416); θSp2 0.00554 (0.00318, 0.00897); T 1,031,030 (652,155, 1,502,950)
  Dataset 5: θA 0.02795 (0.02094, 0.03714); θSp1 0.02795 (0.02094, 0.03714); θSp2 0.00375 (0.00184, 0.00654); T 854,855 (529,511, 1,295,880)
Unconstrained model; Species 1 = C. bursa-pastoris A, Species 2 = C. bursa-pastoris B
  Dataset 1: θA 0.036258 (0.01815, 0.06888); θSp1 0.00379 (0.00185, 0.00662); θSp2 0.00357 (0.00184, 0.00659); T 426,426 (179,693, 807,732)
  Dataset 2: θA 0.02579 (0.00251, 0.05503); θSp1 0.00324 (0.00156, 0.00594); θSp2 0.00545 (0.00317, 0.00908); T 926,927 (445,775, 1,865,000)
  Dataset 3: θA 0.00614 (0.00250, 0.03983); θSp1 0.00315 (0.00150, 0.00590); θSp2 0.00485 (0.00276, 0.00829); T 1,143,140 (503,620, 1,824,720)
  Dataset 4: θA 0.01634 (0.00250, 0.04241); θSp1 0.00457 (0.00235, 0.00784); θSp2 0.00237 (0.00103, 0.00471); T 558,559 (220,947, 1,277,640)
  Dataset 5: θA 0.01699 (0.00251, 0.04681); θSp1 0.00386 (0.00195, 0.00680); θSp2 0.00474 (0.00270, 0.00818); T 982,983 (449,219, 1,777,900)

a θA, θSp1, θSp2: θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate (1.5 × 10^-8), for the ancestor (A), species 1 (Sp1) and species 2 (Sp2).
b T: time of the split of Species 1 and Species 2.
[Figure S1 diagram; labels: A, B, C. grandiflora]
Figure S1. Autopolyploid event simulated using the coalescent simulation program ms.
θ is the population mutation parameter (4Neμ, where Ne is the effective population size and μ is the mutation rate, 1.5 × 10^-8).
1: C. grandiflora
2: C. bursa-pastoris locus A
3: C. bursa-pastoris locus B
T: time at which the autopolyploid event occurred
Reprinted with permission from Evolution: International Journal of Organic Evolution, In Press. Copyright 2010
CHAPTER 5
Title: Reconstructing origins of loss of self-incompatibility and selfing in North
American Arabidopsis lyrata: a population genetic context
Running title: Reconstructing origins of selfing in A. lyrata
John Paul Foxe*1, Marc Stift*2,3, Andrew Tedder2, Annabelle Haudry2, Stephen I. Wright4,5, Barbara K. Mable2
*These authors contributed equally to this work
1 Department of Biology, York University, 4700 Keele St. Toronto, ON Canada M3J 1P3
2 Division of Ecology and Evolutionary Biology, University of Glasgow, Glasgow, Scotland G12 8QQ
3 Present address: Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, R. Padre Armando Quintas, 4485-661, Vairão, Portugal
4 Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks St., Toronto, ON Canada M5S 3B2
5 Centre for the Analysis of Genome Evolution and Function, University of Toronto
Corresponding Author: Dr. Marc Stift, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, R. Padre Armando Quintas, 4485-661, Vairão, Portugal. email: [email protected] Phone: +351 252660411 Fax: +351 252661780
Keywords: mating system evolution; breakdown of self-incompatibility; demography; inbreeding depression; Arabidopsis; population genetics; bottlenecks; effective population size
Abstract
Theoretical and empirical comparisons of molecular diversity in selfing and outcrossing plants have primarily focused on the long-term consequences of differences in mating system (between species). However, improving our understanding of the causes of mating system evolution requires ecological and genetic studies of the early stages of mating system transition. Here, we examine nuclear and chloroplast DNA sequences and microsatellite variation in a large sample of populations of Arabidopsis lyrata from the Great Lakes region of eastern North America that show intra- and inter-population variation in the degree of self-incompatibility and realized outcrossing rates. Populations show strong geographic clustering irrespective of mating system, suggesting that selfing either evolved multiple times or has spread to multiple genetic backgrounds. Diversity is reduced in selfing populations, but not to the extent of the severe loss of variation expected if selfing evolved due to selection for reproductive assurance in connection with strong founder events. The spread of self-compatibility in this region may have been favored as colonization bottlenecks following glaciation or migration from Europe reduced standing levels of inbreeding depression. However, our results do not suggest a single transition to selfing in this system, as has been suggested for some other species in the Brassicaceae.
Introduction
Inbreeding has often been posited as an evolutionary dead end because of the accumulation of slightly deleterious mutations and reduced adaptability (Stebbins 1957; Takebayashi and Morrell 2001). Paradoxically, the transition to selfing is cited as one of the most common major evolutionary transitions across the plant kingdom and is well documented in a variety of species across many genera (Stebbins 1950; Stebbins 1970; Grant 1981; Barrett et al. 1996; Barrett 2002; Igic et al. 2008). Selfers have two main advantages over outcrossers: an inherent transmission advantage (Fisher 1941) and the ability to reproduce without mates (Darwin 1876; Kalisz et al. 2004; Charlesworth 2006). While outcrossers transmit only 50% of their genome to each offspring, strict selfers transmit their whole genome and can at the same time act as pollen donors for seed produced by other individuals (Fisher 1941). Moreover, unlike outcrossers, selfers can reproduce when pollinators or potential mates are limited (reproductive assurance, first proposed by Darwin (1876)). Thus, a selfing lifestyle can result in increased colonization ability, as a new population may be founded from a single plant (Baker 1955; Stebbins 1956; Stebbins 1957; Pannell and Barrett 1998). However, particularly in the initial stages after a transition to selfing, increased homozygosity may lead to inbreeding depression (the reduction in fitness of selfed versus outcrossed individuals) due to the expression of recessive deleterious mutations. Hence, theory predicts that selfing is only likely to evolve when the advantages of selfing outweigh the costs associated with inbreeding depression (Charlesworth and Charlesworth 1987).
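The transmission advantage and its tension with inbreeding depression can be made concrete with a simple gene-copy count. The sketch below is a simplified version of Fisher's (1941) argument, not a model used in this chapter: each plant produces one seed from its own ovule and sires one outcrossed seed elsewhere via exported pollen; a selfed seed carries two copies of the parent's genome, an outcrossed seed one. It assumes complete selfing of own ovules, no pollen discounting, and inbreeding depression (delta) acting only on selfed-seed survival; the function name is hypothetical.

```python
def genome_copies_transmitted(selfs: bool, inbreeding_depression: float = 0.0) -> float:
    """Copies of one parent's genome passed on via its own seed plus exported pollen.

    A selfed seed carries two copies of the parent's genome but survives with
    probability 1 - delta; an outcrossed seed carries one copy. Both plant types
    also sire one outcrossed seed elsewhere (assumes no pollen discounting).
    """
    if not 0.0 <= inbreeding_depression <= 1.0:
        raise ValueError("inbreeding depression must lie in [0, 1]")
    if selfs:
        via_own_seed = 2 * (1.0 - inbreeding_depression)
    else:
        via_own_seed = 1.0
    via_exported_pollen = 1.0
    return via_own_seed + via_exported_pollen


outcrosser = genome_copies_transmitted(selfs=False)
for delta in (0.0, 0.25, 0.5, 0.75):
    selfer = genome_copies_transmitted(selfs=True, inbreeding_depression=delta)
    print(f"delta={delta:.2f}: selfer/outcrosser = {selfer / outcrosser:.2f}")
```

With no inbreeding depression the selfer transmits three genome copies for every two transmitted by the outcrosser (the 3:2 advantage), and the advantage disappears once delta reaches 0.5, the familiar threshold under these simplifying assumptions.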
A selfing strategy is also expected to come at the cost of reduced genetic variation (Charlesworth et al. 1993; Nordborg 2000; Charlesworth and Wright 2001; Glemin et al. 2006; Wright et al. 2008). First, selfing increases levels of homozygosity, thereby reducing the effective population size (Ne) and levels of diversity by up to twofold under complete selfing (Pollak 1987). This increased homozygosity also reduces the effective recombination rate, resulting in increased linkage disequilibrium (LD) across loci (Nordborg 2000). This facilitates genetic hitchhiking through both positive selection (selective sweeps; Maynard Smith and Haigh 1974) and negative selection (background selection; Charlesworth et al. 1993). Genetic hitchhiking exacerbates the reduction in Ne and diversity. Finally, increased population turnover and colonization bottlenecks in selfing plants may contribute to further reductions in diversity (Ingvarsson 2002).
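The "up to twofold" reduction follows from standard partial-selfing theory: at equilibrium with selfing rate s, the inbreeding coefficient is F = s/(2 - s), Ne is reduced by a factor 1/(1 + F), and the effective recombination rate is approximately r(1 - F) (Pollak 1987; Nordborg 2000). The sketch below tabulates these relations; it ignores hitchhiking and bottlenecks, which the text notes can reduce diversity further, and the function names are illustrative.

```python
def equilibrium_inbreeding(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient under partial selfing: F = s / (2 - s)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def relative_ne(selfing_rate: float) -> float:
    """Ne relative to a fully outcrossing population of the same size: 1 / (1 + F)."""
    return 1.0 / (1.0 + equilibrium_inbreeding(selfing_rate))


def relative_recombination(selfing_rate: float) -> float:
    """Approximate effective recombination rate relative to outcrossing: 1 - F."""
    return 1.0 - equilibrium_inbreeding(selfing_rate)


for s in (0.0, 0.5, 0.9, 1.0):
    print(
        f"s={s:.1f}  F={equilibrium_inbreeding(s):.3f}  "
        f"Ne/N={relative_ne(s):.3f}  r_eff/r={relative_recombination(s):.3f}"
    )
```

Under complete selfing F = 1, so Ne (and neutral diversity) is halved and effective recombination approaches zero, which is why hitchhiking effects become so strong in highly selfing lineages.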
Empirically, the population genetic consequences associated with a transition to selfing have been well documented at the species level in both plant and animal systems (Charlesworth and Yang 1998; Baudry et al. 2001; Chiang et al. 2003; Cutter and Payseur 2003; Glemin et al. 2006). Selfing species are typically characterized by greater than twofold reductions in diversity, consistent with roles for genetic hitchhiking and/or increased colonization bottlenecks (Wright et al. 2008). These patterns of reduced genetic diversity in selfers have been found in several plant genera containing both outcrossing and selfing species, such as Leavenworthia (Charlesworth and Yang 1998; Liu et al. 1998; Filatov and Charlesworth 1999; Liu et al. 1999), Arabidopsis (Ross-Ibarra et al. 2008; Savolainen et al. 2000; Wright et al. 2003), Lycopersicon (Baudry et al. 2001), and Miscanthus (Chiang et al. 2003).
The model plant A. thaliana is thought to have evolved self-fertilization approximately 1 million years ago through inactivation of the self-incompatibility (SI) locus, referred to as the S-locus (Tang et al. 2007). Evidence for the role of the S-locus stems from transformation studies, which identified five accessions in which full self-incompatibility could be restored by transformation with a functional S-locus, implying that all other genes required for SI are still intact in these accessions (Tang et al. 2007; Boggs et al. 2009). Recent results suggest that a mutation in the male component of self-incompatibility (SCR) has resulted in loss of SI, apparently across a wide range of accessions (Tsuchimatsu et al. 2010). In addition, a modifier locus unlinked to the S-locus has been identified (Liu et al. 2007), which suggests that S-locus inactivation may not be the sole mechanism by which SI broke down in A. thaliana, and that different mechanisms of loss could have operated in different accessions (Boggs et al. 2009).
Systems with more recent transitions from outcrossing to selfing may provide a more direct picture of the causes and short-term consequences of mating system evolution (Foxe et al. 2009 (Chapter 2); Guo et al. 2009; Ness et al. 2010). For example, if the evolution of selfing involves the long-term spread of modifiers through previously outcrossing populations, recently derived selfing populations are expected to retain reasonably high levels of ancestral polymorphism, as recently observed in Eichhornia paniculata (Ness et al. 2010). In contrast, if a highly selfing lineage evolves rapidly from a small number of founders, we would expect a severe loss of genetic variation, as seen in Capsella rubella (Foxe et al. 2009 (Chapter 2)).
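Why a small number of founders should strip variation so quickly can be illustrated with the standard Wright-Fisher result for a diploid population, in which the expected fraction of initial heterozygosity remaining after t generations at constant effective size Ne is (1 - 1/(2Ne))^t. The sketch below compares hypothetical founder-sized and large populations; it assumes random mating at constant size, and because selfing lowers Ne further, the losses for a selfing founder lineage would if anything be larger. The function name and scenarios are illustrative only.

```python
def heterozygosity_retained(ne: float, generations: int) -> float:
    """Expected fraction of initial heterozygosity left after drift at constant Ne.

    Standard Wright-Fisher result for a diploid population:
    H_t / H_0 = (1 - 1 / (2 * Ne)) ** t.
    """
    if ne < 0.5:
        raise ValueError("Ne must be at least 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# Hypothetical scenarios: a handful of founders versus a large outcrossing population
for ne in (2, 10, 1000):
    retained = heterozygosity_retained(ne, 10)
    print(f"Ne={ne:>5}: {retained:.3f} of initial heterozygosity after 10 generations")
```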
Here, we investigate the loss of self-incompatibility in North American populations of the normally outcrossing species A. lyrata (Brassicaceae). It has been suggested that A. lyrata colonized North America from ancestral European populations (Clauss and Mitchell-Olds 2006; Ross-Ibarra et al. 2008), which are highly self-incompatible and exclusively outcrossing. The North American populations are unique because some remain predominantly outcrossing despite the occurrence of self-compatible individuals at low frequency, while others are almost entirely self-compatible and have undergone a transition to high rates of selfing (Mable et al. 2005; Mable and Adam 2007). This transition to selfing in A. lyrata appears to be very recent, as selfing populations belong to a chloroplast lineage that also contains outcrossing populations (Hoebe et al. 2009). Moreover, selfing populations are not characterized by smaller flowers (Hoebe 2009), in contrast to other systems where the transition to selfing has led to notable floral evolution towards smaller flowers (Hurka and Neuffer 1997; Charlesworth and Vekemans 2005; Tang et al. 2007; Foxe et al. 2009 (Chapter 2); Guo et al. 2009).
Previous work on North American populations of A. lyrata in the Great Lakes region (where loss of self-incompatibility has occurred) has been based on chloroplast sequences and microsatellite genotypes. The former, as a marker with a single coalescent history and limited variation across populations, allowed (limited) phylogeographic inferences (Hoebe et al. 2009; Tedder et al. 2010). The latter allowed basic inferences about population structure and diversity (Mable et al. 2005; Mable and Adam 2007; Hoebe et al. 2009), but suffers from potential limitations due to homoplasy and uncertainties in the mutation model, particularly for the individual markers that we had been using (Muller et al. 2008). Nuclear gene sequences, by contrast, provide more power to explicitly test population genetic predictions about reductions in diversity in selfing populations. They also allow for the detection of recombination, and for testing whether selfing populations show more evidence of departures from demographic equilibrium than outcrossing populations. We have extended the population sampling presented in previous work (Mable et al. 2005; Mable and Adam 2007; Hoebe et al. 2009), primarily to establish whether more populations have undergone a transition to selfing.
In this study, we integrate polymorphism information from nuclear genes, chloroplast markers and nuclear microsatellites, in order to obtain a detailed picture of the demographic history and population structure of A. lyrata in the Great Lakes region of North America. Our ultimate goal is to use this framework to elucidate the origins of the selfing populations. Specifically, we aim to: 1) investigate the demographic and population genetic consequences of losses of SI and transition to selfing, by describing population structure and testing the effects of individual selfing phenotype and selfing rate on genetic diversity; and 2) elucidate the extent to which severe population bottlenecks may have played a role in the transition to inbreeding.
Methods
Sampling
The samples for this study were collected from 22 locations throughout the
Great Lakes region of eastern North America (Figure 1, Table SI). From each location, we collected batches of seeds from 25-30 independent plants (growing at least 5 meters apart). More detailed sampling description is available as a supplement
(Supporting Information). In one location (Tobermory Cliffs, on the Bruce Peninsula that extends into Georgian Bay), we collected seeds from two spatially separated areas: in the first, plants were previously demonstrated to be highly selfing (dubbed
TC: Mable et al. 2005); in the second, plants were observed to have lower and more variable seed set in the field and were thus suspected of being more highly outcrossing (TCA). In another location (Tobermory Singing Sands, also on the Bruce
Peninsula, but on the Lake Huron side), we collected seeds from two spatially separated areas: in the first, plants grew on the characteristic sand dune habitat and were previously demonstrated to be highly outcrossing (dubbed TSS: Mable et al.
2005); in the second, plants grew on alvar (limestone pavement) and were observed to have high seed set in the field, suggestive of selfing (TSSA). We grew eight plants from each of 10 seed batches from each of the 24 population samples (22 locations, cf. Table S1), forming a collection of 192 plants, grown in a common greenhouse environment at the Scottish Crop Research Institute, Invergowrie, under a constant regime of 16 hours light: 8 hours dark, with 22°C days and 18°C nights.
Selfing phenotype determination
For all individual plants that flowered, we manually self-pollinated six
flowers. The resulting siliques were scored as either negative (no seeds), small (a
short silique smaller than 9 mm with no more than 3 seeds) or positive (a silique of 9
mm or longer with more than three seeds). Small siliques were considered as negative
for the purposes of classifying selfing phenotypes (as in Mable et al. 2005) but were
recorded to enable assessment of whether the degree of leakiness in the SI system
varied by population or geographic region. We then defined the selfing
phenotype of each plant from the siliques produced after manual self-pollination
as: 1) self-incompatible (SI): zero or one (out of six) positive siliques; 2) self-
compatible (SC): five or six (out of six) positive siliques; and 3) partially self-
compatible (PC): two, three or four (out of six) positive siliques. To exclude pollen
sterility or total sterility causing a false SI phenotype, plants classified as self-
incompatible were crossed with plants from the same population to test for fertility.
All self-incompatible plants were cross-fertile with at least one other plant tested.
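The scoring and classification rules above can be expressed as two short functions. This is a minimal Python sketch, not the authors' code; the function names and string labels are hypothetical, and because the text does not say how siliques matching neither the "small" nor the "positive" definition (for example, a 10 mm silique with two seeds) were handled, the sketch simply flags them as an error.

```python
from collections import Counter


def score_silique(length_mm: float, n_seeds: int) -> str:
    """Score one silique from a manual self-pollination (hypothetical helper)."""
    if n_seeds == 0:
        return "negative"
    if length_mm < 9 and n_seeds <= 3:
        return "small"
    if length_mm >= 9 and n_seeds > 3:
        return "positive"
    # Assumption: combinations not covered by the published criteria are flagged.
    raise ValueError(f"silique ({length_mm} mm, {n_seeds} seeds) fits no category")


def classify_phenotype(scores: list[str]) -> str:
    """Assign SI, PC or SC from the scores of six self-pollinated flowers.

    Small siliques count as negative, so only 'positive' scores are tallied.
    """
    if len(scores) != 6:
        raise ValueError("expected scores for exactly six flowers")
    unknown = set(scores) - {"negative", "small", "positive"}
    if unknown:
        raise ValueError(f"unrecognised scores: {sorted(unknown)}")
    n_positive = Counter(scores)["positive"]
    if n_positive <= 1:
        return "SI"  # zero or one positive silique
    if n_positive >= 5:
        return "SC"  # five or six positive siliques
    return "PC"      # two, three or four positive siliques


# Three small and three positive siliques -> partially self-compatible
print(classify_phenotype(["small"] * 3 + ["positive"] * 3))  # PC
```

The fertility check applied to SI plants (crosses within the population) is a separate experimental step and is not modelled here.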
Mating system determination (establishing population level outcrossing rates)
For 12 populations (IND, LPT, LSP, MAN, PIC, PIN, PTP, PUK, RON, TC,
TSS, WAS), multi-locus outcrossing rates had been determined in previous studies
(Mable and Adam 2007). For nine of the newly sampled populations (BEI, HDC, KTT, OWB, PCR, PIR, PRIA, SBD, TSSA), we used the same microsatellite markers and procedures to calculate outcrossing rates as outlined in Mable and Adam (2007) and Hoebe et al. (2009). In brief, we genotyped progeny arrays (6-10 offspring per mother) from 17 to 27 mothers per population, and estimated multilocus outcrossing rates using MLTR version 2.3 (Ritland 2002), which implements the mixed-mating model of Ritland and Jain (1981). Too few seed families from IOM and TCA germinated to allow reliable estimation of outcrossing rates; a rough estimate for TCA was obtained from 5 maternal families, and the estimate for IOM was provided by Yvonne Willi (personal communication), who used a similar set of microsatellite markers on the same seed samples. Similarly, too few seed families from NCM germinated for reliable outcrossing estimates, so the mating system of NCM was inferred from selfing phenotypes, which showed a predominance of SI individuals.
Microsatellite genotyping
Nine microsatellite loci previously used by Mable and Adam (2007) were screened for variation across all 192 individuals: ADH-1, AthZFPG, ATTS0392,
F20D22, ICE12, ICE9 (Clauss et al. 2002), LYR104, LYR133 and LYR417 (obtained from V. Castric and X. Vekemans, personal communication). Products were amplified by multiplex PCR (QIAGEN Multiplex PCR Kit, QIAGEN Ltd) using the default reagent concentrations recommended in the kit manual; exact primer concentrations can be requested from the authors. Thermocycling was performed on PTC-200 (MJ Research) machines using the following programme: initial denaturation at 95°C for 15 min, followed by 34 cycles of 94°C for 30 s, 55°C for 90 s and 72°C for 90 s (ramp to 72°C at 0.7°C/s), and a final 72°C extension for 10 min.
Multiplex products (1:160 dilutions) were genotyped using an ABI 3730 sequencer
(by The Sequencing Service, University of Dundee). Genotypes were analyzed using
GENEMAPPER 4.0 (Applied Biosystems) and corrected manually.
PCR and sequencing of nuclear genes
For each of the 192 individuals across 24 populations, products were amplified with PCR primer pairs that were previously designed and confirmed to amplify large exons from 18 nuclear genes (putative functions listed in Table S1 in Ross-Ibarra et al. 2008). Following methods described by Wright et al. (2006) and Ross-Ibarra et al. (2008), PCR reactions were performed in 25 µL reaction volumes (15 mM PCR (10X) buffer, 2 mM MgSO4, 10 mM dNTPs, 10 µM forward primer, 10 µM reverse primer, 1 U Tsg polymerase and 50-100 ng DNA) on an Eppendorf Mastercycler with the following program: 2 min at 94°C, followed by 35 cycles of 20 s at 94°C, 20 s at 55°C and 40 s at 72°C, with a final extension of 4 min at 72°C.
Sequencing reactions were carried out by Lark Technologies, Texas, USA.
Chromatograms were analyzed using Sequencher 4.6 (Gene Codes, Corp.), using the
'call secondary peaks' option to aid in the identification of heterozygous sites. All chromatograms were checked manually for heterozygous nucleotide positions, using the sequence from both strands to confirm putative heterozygous sites. Because of extensive sequencing failure across all 18 loci for individuals from LPT, this population was removed from subsequent nuclear data analyses. All sequences have been submitted to GenBank, with accession numbers HM168020-HM171110.
DNA sequencing of chloroplast DNA
The noncoding cpDNA region trnL(UAA)3'exon-trnF(GAA) was amplified
with the primers E 5'-GGTTCAAGTCCCTCTATCCC-3' and F 5'-
ATTTGAACTGGTGACACGAG-3' (Taberlet et al. 1991) and sequenced. This
includes a region with pseudogene copies of the trnF gene (Koch and Kiefer 2005;
Ansell et al. 2007; Tedder et al. 2010). Using the same primers on a smaller
population sample, Hoebe et al. (2009) identified a short haplotype (515 bp, dubbed
S1) and two long haplotypes (741 bp, dubbed L1 and L2). After purification with
QiaQuick gel extraction kits (Qiagen Inc.), all PCR products were sequenced directly
on an ABI 3730 sequencer by The Sequencing Service, University of Dundee.
Sequences were visually checked using Sequencher 4.7 (Gene Codes, Corp.) and
aligned to the previously identified L1, L2 and S1 haplotypes (Hoebe et al. 2009).
Nuclear gene sequence analysis
We reconstructed individual haplotypes of unphased diploid sequences using
the software PHASE (Stephens et al. 2001), as implemented in DnaSP Version 5.0
(Librado and Rozas 2009). For the sequence data, synonymous and nonsynonymous sites were identified by aligning each fragment to the corresponding fragment in the A. thaliana genome sequence, identified using BLAST (Altschul et al. 1990), and using the protein annotation from A. thaliana. Standard population genetic descriptive statistics, including numbers of synonymous and nonsynonymous sites, estimates of synonymous (πsyn) and nonsynonymous (πrep) diversity and Tajima's D, were calculated using a modified version of Polymorphurama, a Perl script written by D. Bachtrog and P. Andolfatto (available from http://ib.berkeley.edu/labs/bachtrog/data/polyMORPHOrama/polyMORPHOrama.html). Significance of within-population mean Tajima's D was determined by conducting 10,000 coalescent simulations as implemented in the HKA software (available from http://genfaculty.rutgers.edu/hey/software#HKA; Kliman et al. 2000).
The within-population recombination parameter ρ (where $\rho = 4N_e r$, with $N_e$ the
effective population size and $r$ the recombination rate) was calculated for each locus with
more than three segregating sites and for each population by using the maxdip
program (available from http://genapps.uchicago.edu/maxdip/index.html) for diploid
unphased data. Maxdip applies a composite likelihood approach fit to the observed
pairwise SNP frequencies (Hudson 2001) and assumes an infinite-sites constant-
population-size neutral model.
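To illustrate the statistics summarized per locus, the sketch below computes nucleotide diversity (π, as mean pairwise differences) and Tajima's D from a set of phased, aligned haplotypes, using the standard formulas for the variance of D. It is a minimal Python sketch under stated assumptions, not the modified Polymorphurama script: it assumes equal-length sequences without gaps or missing data and treats all sites together, whereas πsyn and πrep would first require partitioning sites into synonymous and nonsynonymous classes.

```python
import math
from itertools import combinations


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Average number of differences over all pairs of aligned haplotypes."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns carrying more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D; returns NaN when there are no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("this sketch needs at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    s = segregating_sites(seqs)
    if s == 0:
        return math.nan
    pi = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Per-site diversity divides pi by the number of sites considered
haplotypes = ["ACGTAC", "ACGTTC", "ACCTAC", "TCGTAC"]
print(mean_pairwise_differences(haplotypes) / len(haplotypes[0]))
print(tajimas_d(haplotypes))
```

Averaging tajimas_d over loci within a population gives a within-population mean D; assessing its significance by coalescent simulation, as described above, is beyond this sketch.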
Microsatellite analysis
For the microsatellite data, we used MSA (Dieringer and Schlötterer 2003) to
calculate observed and expected heterozygosity (Ho and He) for each locus.
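For reference, the two per-locus heterozygosities can be sketched as follows. This Python sketch assumes genotypes are recorded as pairs of allele sizes and uses Nei's unbiased estimator for He, (2n/(2n - 1))(1 - Σp²) for n diploid individuals; whether MSA applies exactly this small-sample correction is an assumption, and the function names are hypothetical.

```python
from collections import Counter

Genotype = tuple[int, int]  # two allele sizes (bp) at one microsatellite locus


def observed_heterozygosity(genotypes: list[Genotype]) -> float:
    """Proportion of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes: list[Genotype]) -> float:
    """Nei's unbiased gene diversity from the allele frequencies at one locus."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    copies = len(alleles)  # 2n gene copies for n diploid individuals
    if copies < 2:
        raise ValueError("need at least two gene copies")
    sum_p2 = sum((count / copies) ** 2 for count in Counter(alleles).values())
    return copies / (copies - 1) * (1 - sum_p2)


locus = [(150, 152), (150, 150), (152, 152), (150, 152)]
print(observed_heterozygosity(locus), expected_heterozygosity(locus))
```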
The effects of selfing phenotype and mating system on genetic diversity and heterozygosity
Observed heterozygosity at nuclear loci was calculated using Perl scripts written by A.H. For each individual and each locus, heterozygosity status was determined by comparing the two gene copies carried by the individual: if the two haplotypes differed, the individual was scored as heterozygous at that locus; if they were identical, it was scored as homozygous. Individual Ho was then calculated as the average heterozygosity status over all loci, i.e., the proportion of loci at which the individual was heterozygous.
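The per-individual calculation described above can be sketched as follows. The Perl scripts are not reproduced here; this Python version assumes each individual's phased haplotypes are stored in a dictionary keyed by locus name (a hypothetical layout), with loci that failed to sequence simply omitted.

```python
def individual_ho(haplotypes_by_locus: dict[str, tuple[str, str]]) -> float:
    """Fraction of loci at which an individual's two haplotypes differ.

    Each value is the pair of phased haplotypes (e.g. from PHASE) at one locus.
    """
    if not haplotypes_by_locus:
        raise ValueError("no loci available for this individual")
    heterozygous = [h1 != h2 for h1, h2 in haplotypes_by_locus.values()]
    return sum(heterozygous) / len(heterozygous)


# Hypothetical locus names and sequences
plant = {
    "locus_a": ("ACGTTA", "ACGTTA"),  # homozygous
    "locus_b": ("GGCATC", "GGCTTC"),  # heterozygous (one SNP)
    "locus_c": ("TTAGCA", "TTAGCA"),  # homozygous
    "locus_d": ("CATGGA", "CTTGGA"),  # heterozygous
}
print(individual_ho(plant))  # 0.5
```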
Linear regressions as implemented in JMP 8.0.2 (http://www.jmp.com) were used to test whether summary statistics describing genetic diversity and heterozygosity at the population level (πsyn, ρ and Ho for nuclear gene data; Ho and He for microsatellites) varied in relation to outcrossing rate (Tm) and/or the proportion of SC individuals in each population. One-way ANOVA, also implemented in JMP 8.0.2, was used to test the effect of selfing phenotype on individual heterozygosity.
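Both tests reduce to familiar closed-form statistics. The following Python sketch (standard library only, function names hypothetical, not the JMP implementation) computes an ordinary least-squares slope, R² and F on (1, n - 2) degrees of freedom for a single predictor, and the F ratio of a one-way ANOVA; the p-values that JMP also reports are omitted to keep it short, and the toy data are invented for illustration.

```python
def simple_regression(x: list[float], y: list[float]) -> dict[str, float]:
    """OLS fit of y on x with R^2 and the F statistic on (1, n - 2) df."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = sxy**2 / (sxx * syy)
    f = r2 / (1 - r2) * (n - 2) if r2 < 1 else float("inf")
    return {"slope": slope, "intercept": mean_y - slope * mean_x, "r2": r2, "F": f}


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """F ratio of between- to within-group mean squares, with its two df."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    ss_within = sum((v - means[name]) ** 2 for name, g in groups.items() for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation")
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical toy data: diversity against outcrossing rate (Tm)
fit = simple_regression([0.1, 0.4, 0.7, 0.9], [0.002, 0.005, 0.007, 0.010])
print(round(fit["slope"], 4), round(fit["r2"], 3))
print(one_way_anova({"SI": [0.2, 0.3], "SC": [0.0, 0.1], "PC": [0.2, 0.2]}))
```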
Explaining the reduced genetic diversity of selfing populations: testing for bottleneck effects beyond selfing alone
Selfing reduces effective population size (Nordborg 2000). Therefore, genetic diversity is expected to decrease with selfing rate, up to a twofold reduction for completely selfing populations. Other demographic effects and genetic hitchhiking could decrease genetic diversity further. We tested whether genetic diversity (πsyn and πrep for gene sequences) in populations classified as selfing (based on multilocus outcrossing rates) was lower than expected from the selfing rate ($S = 1 - T_m$) alone.
To do this, we 'corrected' the estimated genetic diversity using the formula $\theta_{corrected} = \theta_{obs}(1 + F)$, where $\theta$ is the genetic diversity measure being corrected and $F = S/(2 - S)$ is the inbreeding coefficient expected at equilibrium for a population with selfing rate $S$ (Nordborg 2000). Similarly, we corrected ρ using the formula $\rho_{corrected} = \rho_{obs}/(1 - S)$, where $\rho_{obs}$ is the observed population recombination parameter and $S$ is the selfing rate of the population (Nordborg 2000). We then used linear regressions to test whether population outcrossing rates (Tm) predicted corrected πsyn, πrep, microsatellite Ho and He, and ρ. If so, and if corrected genetic diversity decreased with selfing rate, this would provide evidence that genetic diversity in selfing populations is reduced by additional factors beyond selfing alone (e.g., a bottleneck or selection).
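The two corrections can be written as small functions. This is a Python sketch of the formulas as stated (F = S/(2 - S), S = 1 - Tm), not the authors' code; the function names are hypothetical, and outcrossing estimates are assumed to lie between 0 and 1, with values outside that range rejected.

```python
def selfing_rate(outcrossing_rate: float) -> float:
    """S = 1 - Tm, rejecting outcrossing estimates outside [0, 1]."""
    if not 0.0 <= outcrossing_rate <= 1.0:
        raise ValueError("outcrossing rate must lie in [0, 1]")
    return 1.0 - outcrossing_rate


def inbreeding_coefficient(s: float) -> float:
    """Equilibrium inbreeding coefficient F = S / (2 - S) under partial selfing."""
    return s / (2.0 - s)


def correct_diversity(theta_obs: float, outcrossing_rate: float) -> float:
    """Scale observed diversity by (1 + F) to undo the selfing-induced loss of Ne."""
    s = selfing_rate(outcrossing_rate)
    return theta_obs * (1.0 + inbreeding_coefficient(s))


def correct_rho(rho_obs: float, outcrossing_rate: float) -> float:
    """Scale observed rho by 1 / (1 - S); undefined under complete selfing."""
    s = selfing_rate(outcrossing_rate)
    if s >= 1.0:
        raise ValueError("rho correction is undefined when S = 1")
    return rho_obs / (1.0 - s)


print(correct_diversity(0.01, 0.0))  # complete selfing doubles the estimate: 0.02
print(correct_rho(2.0, 0.5))         # 4.0
```

For an outcrossing rate of 0.09, the lowest estimate reported here, S = 0.91 and F ≈ 0.83, so observed diversity is scaled up by a factor of about 1.83, while ρ is scaled up roughly 11-fold.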
Bayesian inference of population structure
We used InStruct (version 1.0, Gao et al. 2007) to infer population structure using a combination of phased nuclear gene sequence and microsatellite data. For comparative purposes, we also used STRUCTURE (version 2.3.2, Pritchard et al.
2000) to infer population structure. Both programs perform Bayesian clustering and work by assigning individuals to a given number of clusters in such a way that deviations from Hardy-Weinberg equilibrium are minimized. Unlike STRUCTURE,
InStruct can accommodate non-random mating due to selfing. Based on exploratory runs, we restricted the number of clusters (k) to range from k = 1 to k = 12, and ran both programs for 2,000,000 generations with a burn-in of 200,000 generations, with five independent chains (runs) for each k. We ran InStruct in the mode that allows for admixture and individual selfing rates. We used Perl scripts written by Joseph Hughes (available from http://linnaeus.zoology.gla.ac.uk/~jhughes/Bioinformatics.html) on InStruct output files, and Structure Harvester (v0.3 by D.A. Earl: http://taylor0.biology.ucla.edu/struct_harvest/) on STRUCTURE output files, to extract the probability matrices for each value of k; the matrices were then aligned using CLUMPP v1.1.1 (Jakobsson and Rosenberg 2007). DISTRUCT v1.1 (Rosenberg 2004) was used to create bar plots of the aligned matrices.
Inferring the origin of selfing populations
For each locus, we counted the total number of microsatellite alleles and gene sequence haplotypes over all populations using Perl scripts written by A.H. We then grouped populations according to their classification as selfing (Tm < 0.5) or outcrossing (Tm > 0.5). For each group, we counted the total number of microsatellite alleles and nuclear gene sequence haplotypes and the number unique to that group; finally, we counted those shared between the groups. We then tested whether selfing and outcrossing populations differed significantly in the number of unique variants relative to the total number of variants observed in each group. We did this separately for microsatellite alleles (across all loci) and nuclear gene sequence haplotypes (across all genes), using a G-test for goodness-of-fit to assess significance.
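The counting and testing steps can be sketched as follows. The text does not spell out how expected counts for the goodness-of-fit test were formed; this Python sketch assumes the outcrossing group's proportion of unique variants serves as the expectation for the selfing group, with one degree of freedom. Variants are represented as (locus, allele or haplotype) pairs, populations with Tm exactly 0.5 are left unassigned, populations without an estimate (such as NCM) need a value supplied by the caller, and all names are hypothetical.

```python
import math

Variant = tuple[str, str]  # (locus, allele or haplotype identifier)


def pool_by_mating_system(variants: dict[str, set[Variant]], tm: dict[str, float]):
    """Pool variants of selfing (Tm < 0.5) and outcrossing (Tm > 0.5) populations."""
    selfing: set[Variant] = set()
    outcrossing: set[Variant] = set()
    for pop, pop_variants in variants.items():
        if tm[pop] < 0.5:
            selfing |= pop_variants
        elif tm[pop] > 0.5:
            outcrossing |= pop_variants
    return selfing, outcrossing


def g_test_unique(selfing: set[Variant], outcrossing: set[Variant]) -> tuple[float, float]:
    """G statistic and p-value for the share of selfing-group variants that are unique."""
    unique_self = len(selfing - outcrossing)
    total_self = len(selfing)
    expected_share = len(outcrossing - selfing) / len(outcrossing)
    expected = (total_self * expected_share, total_self * (1 - expected_share))
    observed = (unique_self, total_self - unique_self)
    if min(expected) == 0:
        raise ValueError("expected counts must be positive")
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    p = math.erfc(math.sqrt(g / 2))  # upper tail of a chi-square with 1 df
    return g, p
```

Applied once to pooled microsatellite alleles and once to pooled gene sequence haplotypes, this mirrors the two separate tests described above.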
We then looked at the haplotypes that were unique to the selfing population group (i.e., the haplotypes not shared between the selfing and outcrossing group of populations) in a separate analysis. Specifically, we explored whether haplotypes unique to selfing populations were derived from haplotypes occurring in particular outcrossing populations, or unrelated to any haplotypes in outcrossing populations.
We considered haplotypes to be derived from another haplotype if they had up to two base pair differences, and unrelated if they differed by more than two base pairs from any other haplotype in our sampling. The presence of such 'unrelated' haplotypes within selfing populations would indicate an origin not included in our sampling, or mutations that have accumulated subsequent to divergence from a shared ancestor.
Note that if a selfing population originated from another selfing population, this would be reflected by two or more selfing populations sharing haplotypes derived from the same ancestral population. However, this would be indistinguishable from a scenario of independent colonizations from the same outcrossing population.
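The relatedness criterion can be sketched as a count of base-pair differences between aligned haplotypes. This Python sketch assumes haplotypes of one gene are aligned to equal length without gaps and compares a selfing-specific haplotype only against haplotypes from outcrossing populations (the text's "unrelated" category also considers other selfing populations); all names are hypothetical. A close match (two or fewer differences) in a single population points to a candidate source, matches in several populations are uninformative, and no match means "unrelated".

```python
def bp_differences(hap1: str, hap2: str) -> int:
    """Number of differing positions between two aligned haplotypes."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(a != b for a, b in zip(hap1, hap2))


def close_outcrossing_sources(hap: str, outcrossing: dict[str, set[str]],
                              max_diff: int = 2) -> list[str]:
    """Outcrossing populations holding a haplotype within max_diff bp of hap."""
    return sorted(
        pop for pop, haps in outcrossing.items()
        if any(bp_differences(hap, other) <= max_diff for other in haps)
    )


def classify_origin(hap: str, outcrossing: dict[str, set[str]]) -> str:
    sources = close_outcrossing_sources(hap, outcrossing)
    if not sources:
        return "unrelated"      # more than 2 bp from every outcrossing haplotype
    if len(sources) == 1:
        return f"derived from {sources[0]}"
    return "uninformative"      # close to haplotypes in several populations


outcrossing_haps = {"POP1": {"AAAA"}, "POP2": {"AATT", "CCCC"}}
for h in ("AAAT", "CCCA", "GGGG"):
    print(h, classify_origin(h, outcrossing_haps))
```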
Results
Selfing phenotypes and outcrossing rates
Population-level multilocus outcrossing rate estimates (Tm) ranged from 0.09 to 0.99 (Table 1; Table S2). We categorized populations as selfing if Tm < 0.5 and outcrossing if Tm > 0.5. Eight populations were classified as selfing (Tm < 0.5: KTT, LPT, PTP, RON, TC, TCA, TSSA, WAS). The remaining populations with outcrossing-rate estimates were classified as outcrossing (Tm > 0.5: BEI, HDC, IND, IOM, LSP, MAN, OWB, PCR, PIC, PIN, PIR, PRI, PUK, SBD, TSS). NCM was classified as outcrossing based on a preponderance of SI individuals, but its outcrossing rate was not estimated because of insufficient seeds per mother (Table 1). Small siliques were found in all predominantly outcrossing populations but not in the predominantly self-compatible populations, emphasizing that their occurrence reflects leakiness of the self-incompatibility system rather than its complete loss, as suggested previously (Mable et al. 2005). Overall, the proportion of self-compatible individuals in each population was a strong predictor of outcrossing rate (linear regression, β = -0.68, R² = 0.887, F[1,21] = 165.7, p < 0.0005). Nevertheless, some self-compatible (SC) or partially self-compatible (PC) plants were found in the majority of the outcrossing populations (Table S2). SC individuals were found in five of the 15 outcrossing populations (HDC, OWB, IND, MAN and PRI) and PC individuals in nine (BEI, HDC, IOM, NCM, OWB, PCR, PIR, PUK and SBD). Likewise, some of the populations classified as selfing included SI and PC individuals, with fully SI individuals found in two of the eight selfing populations (TSSA and TC).
The effect of selfing phenotype and mating system (outcrossing rates) on genetic diversity and heterozygosity
Significant linear regressions indicated that both outcrossing rate and the proportion of SC individuals in a population were good predictors of average multilocus synonymous nucleotide diversity (πsyn) and expected microsatellite heterozygosity (He) (Table 2): both πsyn and He increased with increasing outcrossing rate (Tm) (Figure 2) and decreased with an increasing proportion of SC individuals. A similar trend was observed for πrep, but it was not statistically significant. Average population multilocus Ho also increased with increasing outcrossing rate and decreased with an increasing proportion of SC individuals (Table 2).
In contrast, neither outcrossing rate nor the proportion of SC individuals explained variation in the population recombination parameter ρ (Table 2).
Observed heterozygosities (Ho) calculated for each individual from the nuclear and the microsatellite datasets were highly correlated (Spearman rank correlation: r² = 0.58, p < 0.0001). For both datasets, selfing phenotype (PC, SC, SI) had a significant effect on individual heterozygosity (one-way ANOVA, F[2,167] = 19.0, p < 0.0001). SC individuals (mean Ho = 0.10 for the nuclear gene sequences, mean Ho = 0.11 for the microsatellites) were less heterozygous than SI individuals (mean Ho = 0.21 for the nuclear gene sequences, mean Ho = 0.26 for the microsatellites) and PC individuals (mean Ho = 0.22 for the nuclear gene sequences, mean Ho = 0.25 for the microsatellites). When the effect of selfing phenotype (SI, PC, SC) on heterozygosity was tested within outcrossing populations only, selfing phenotype had no significant effect for either the nuclear gene sequence data (overall mean Ho = 0.22) or the microsatellite data (overall mean Ho = 0.27).
Explaining the reduced genetic diversity of selfing populations: testing for bottleneck effects beyond selfing alone
After correcting πsyn for the differences in effective population size expected from selfing alone, neither outcrossing rate (Tm) nor the proportion of SC individuals per population explained levels of diversity (r² = 0.039 and r² = 0.14, respectively; both p > 0.05), indicating that the neutral effects of selfing alone may explain the reduced diversity in selfing populations (Table 2).
Many populations showed average Tajima's D values that departed significantly from a standard neutral model; in particular, average Tajima's D was significantly positive in 15 of our 22 populations and significantly negative in one (Table 1). However, neither selfing nor outcrossing populations consistently displayed more significant departures from neutrality.
Chloroplast Haplotype Distribution
Extending the results of Hoebe et al. (2009) and following their naming scheme, we found the previously identified 741 bp L1 and L2 haplotypes and two additional 741 bp haplotypes (dubbed L3 and L4) among the 24 populations sampled here. In addition, we found the previously identified 515 bp S1 haplotype (Hoebe et al. 2009) and two additional 498 bp haplotypes (dubbed S2 and S3), which differed from one another by 1 bp. Based on the eight individuals sampled per population, most populations (21) were fixed for a single chloroplast haplotype, while three populations contained a mixture of cpDNA haplotypes (Figure 1). Throughout the Great Lakes region, haplotypes L1, L2, L4 and S1 predominate (Figure 1). Self-compatible individuals were found with the L1, L2, L3, L4 and S1 chloroplast haplotypes, and partially self-compatible individuals with haplotypes L1, L2, S1, S2 and S3. Predominantly selfing populations were associated with L1 (LPT, PTP, RON, TSSA), S1 (TC, TCA, TSSA), L3 (KTT) and L4 (WAS). Of these, only L3 was unique to selfing populations. Self-incompatible individuals, and outcrossing populations in general, were found with all haplotypes except L3.
Bayesian Clustering Analyses
Bayesian clustering using InStruct with estimation of individual selfing levels gave results similar to those from STRUCTURE (compare Figures S1 and S2) and identified six main clusters when combining nuclear gene sequence and microsatellite data (Figure 1; Figure S3). Four clusters contained both selfing and outcrossing populations, while the two remaining clusters contained only outcrossing populations. Clustering based on nuclear gene or microsatellite data alone generally agreed well with the results from the combined dataset and also identified six clusters (data not shown); here too, four of the clusters included both selfing and outcrossing populations, and no cluster consisted only of selfing populations. Only a few populations showed evidence of admixture, of which MAN and IND had the strongest signal.
Inferences on the origin of selfing populations
Only one microsatellite allele was unique to the group of selfing populations (Tm < 0.5). The remaining 30 microsatellite alleles that occurred in the selfing group were shared with the 52 alleles occurring in the outcrossing group of populations (Table 3). Unique microsatellite alleles were highly significantly under-represented in the selfing vs. outcrossing populations (Table 3). A similar pattern emerged for the gene sequence haplotypes. Although in absolute terms there were more variants unique to the selfing group (55 of 145), the majority (90) were still shared with the 449 haplotypes occurring in the group of outcrossing populations (Table 3). The group of selfing populations thus appeared to harbor a subset of the microsatellite alleles and gene sequence haplotypes found in the outcrossing group. None of these patterns was driven by particular microsatellite loci or nuclear genes (Table S3).
The haplotypes unique to the group of selfing populations were mostly uninformative with regard to their potential origin, because they were either not closely related (i.e., three or more bp differences) to any haplotype found in outcrossing populations, or closely related (i.e., only one or two bp differences) to haplotypes that occurred in multiple outcrossing populations (Table S4). Finally, each selfing population had its own unique haplotypes (not shared with other selfing populations in our sampling), and these were never closely related to haplotypes unique to other selfing populations (Table S4). Haplotypes that were not closely related (i.e., three or more bp differences) to any haplotype in outcrossing populations or other selfing populations occurred in all selfing populations except PTP (Table S4).
Discussion
Breakdown of self-incompatibility and patterns of genetic variation
Previous investigations of the levels of genetic diversity in mixed-mating populations of North American A. lyrata revealed the expected pattern, with genetic diversity reduced in selfing populations compared with outcrossing populations (Mable et al. 2005; Mable and Adam 2007; Hoebe et al. 2009). Those studies were based on microsatellite variation alone, whereas in this study we combined nuclear gene sequence data with microsatellites and found striking concordance between them. Synonymous nucleotide diversity (πsyn) for nuclear sequence data and He for microsatellite data both increased with increasing outcrossing rate (Table 2, Figure 2), which corroborates previous conclusions (Mable and Adam 2007; Hoebe et al. 2009) and confirms the theoretical prediction that a shift to selfing comes at a cost to genetic diversity (Charlesworth et al. 1993; Nordborg 2000; Charlesworth and Wright 2001; Glémin et al. 2006; Wright et al. 2008). For both datasets, individual Ho was significantly lower for SC than for SI individuals. This effect appeared to be due solely to the transition to inbreeding (i.e., an effect of mating system rather than of the loss of SI): when the heterozygosity of SC, PC and SI individuals was compared excluding the inbreeding populations (Tm < 0.5), selfing phenotype (SC, PC, SI) had no effect on heterozygosity. Maintenance of high heterozygosity in SC and PC individuals in outcrossing populations emphasizes that loss of SI does not always lead to a shift to inbreeding (i.e., there is a two-step process). Even though we sampled only eight individuals per population, self-compatible individuals were found in five of the 15 predominantly outcrossing populations, and partially compatible individuals in nine, suggesting that self-compatibility is widespread. The comparable levels of heterozygosity of SC and SI individuals in outcrossing populations, and the lack of any clustering according to selfing phenotype (Figures S1 and S2), also suggest that self-compatible plants in outcrossing populations, despite their ability to self-fertilize, still predominantly outcross. This is not entirely unexpected, because neither the loss of SI nor the shift to selfing in A. lyrata is associated with a reduction in flower size (Hoebe 2009), a change that would promote more exclusive selfing. These results are compatible with a scenario in which one or more mutations facilitating the loss of SI arose early in the colonization history of North American A. lyrata and now segregate in all populations, producing SC phenotypes throughout but shifts towards selfing in only a subset.
No role for bottlenecks in the breakdown of self-incompatibility of North American A. lyrata?
Population bottlenecks may be expected to be common in highly selfing populations, particularly if strong founder events were important in their origins
(Foxe et al. 2009 (Chapter 2); Guo et al. 2009). If this is the case, we would expect a
more severe reduction in diversity in selfing populations than expected under
neutrality (i.e. a greater than twofold reduction in diversity under complete selfing).
Similarly, we would expect a greater skew in the allele frequency spectrum in selfing
populations, generating more positive Tajima's D values. When levels of πsyn were
corrected for differences in Ne due to selfing alone, no significant correlation between
πsyn and multilocus estimates of the outcrossing rate (Tm) was found. Thus, similar to
recent results from Eichhornia paniculata (Ness et al. 2010), our data provided no
evidence for reductions in diversity beyond neutral expectations, and thus no
signature of elevated demographic effects or hitchhiking in selfing populations.
Furthermore, although 15 populations had a significantly positive Tajima's D value
(Table 1), which is expected if recent bottlenecks have played a role, there was no
difference in Tajima's D values between selfing and outcrossing populations.
While population bottlenecks alone may result in a positive Tajima's D,
admixture resulting from gene flow can also elevate such estimates (Wright and Gaut
2005). It is possible that the positive Tajima's D values found in outcrossing
populations are the result of elevated incoming gene flow when compared to the
selfing populations; although our clustering analyses do not suggest that this is
generally true, larger within-population samples might reveal greater gene flow
among outcrossing than selfing populations.
Demographic history and the breakdown of self-incompatibility of North American A. lyrata
The results from Bayesian clustering analyses of the combined nuclear sequence and microsatellite data suggest the existence of six clusters across the populations sampled from eastern North America (Figure 1). There was good agreement between InStruct and STRUCTURE (Figures S1 and S2). This is surprising given that the STRUCTURE assumption of random mating (Pritchard et al. 2000) is clearly violated in this system with varying levels of inbreeding, and may be reassuring for studies that have used STRUCTURE in systems where selfing plays a role (e.g., Hoebe et al. 2009; Foxe et al. 2009 (Chapter 2)). The clusters also appear to be fairly isolated: a signal of admixture was particularly strong only in IND and MAN. These populations are also characterized by a mixture of chloroplast haplotypes (IND, L1 and L2: cf. Figure 1; MAN, L1 and L2: cf. Hoebe et al. 2009; Tedder et al. 2010).
The overwhelming pattern here is that populations are not clustered by mating system or selfing phenotype (Figure 1), as would be expected if selfing had evolved only once in the region. Instead, they are predominantly clustered by geographic location. This geographic clustering of populations by both nuclear markers and chloroplast haplotypes (Figure 1) likely reflects the recolonization history of North America after the end of the Wisconsin glaciation (ca. 10,000 years ago), before which the entire Great Lakes region was covered in ice (Lewis et al. 2008). Hoebe et al. (2009) concluded that a mating system transition may have occurred more than once in North American A. lyrata, because the loss of self-incompatibility and a transition to high levels of selfing occurred in multiple chloroplast lineages (L1, S1; Hoebe et al. 2009). The more extensive geographic sampling here confirmed this and identified a new population that has undergone a complete transition to predominant selfing (KTT); based on the clustering analysis and its unique chloroplast haplotype (L3), this population appears to represent a fourth origin.
The variation among different selfing populations within a geographic area is
once again in stark contrast with the uniformity of different populations of C. rubella
(Foxe et al. 2009 (Chapter 2); Guo et al. 2009). This may indicate that North American
A. lyrata is at an earlier stage in the transition to selfing, or that the underlying forces
driving the transition are inherently different. None of the selfing populations had a
strong relationship with any of the outcrossing populations we sampled in terms of
sharing of haplotypes (data not shown), contrary to what would be expected if selfing
populations had an origin in specific outcrossing populations in our sampling. Selfing
populations as a group appear to harbor a subset of the genetic variation of
outcrossing populations, with most haplotypes shared with outcrossing populations
and a significant under-representation of unique variants (Table 3).
Outcrossing rate and the proportion of self-compatible individuals do not account for levels of recombination
The transition from outcrossing to selfing can have a number of consequences at the genotypic level (reviewed in Wright et al. 2008). Selfing is associated with a decrease in levels of polymorphism and an increase in levels of linkage disequilibrium. Increased homozygosity in selfing populations should make recombination ($r$) less effective and reduce $N_e$, both of which should be reflected in a reduced estimate of $\rho$ ($\rho = 4N_e r$). However, in our dataset neither outcrossing rate nor the proportion of SC individuals explained the population recombination parameter $\rho$ (Table 2). One possible explanation is that the power to estimate $\rho$ in selfing populations is diminished by their low levels of diversity. Alternatively, populations that have recently transitioned to selfing may retain ancestral recombination events, especially over short physical distances (Tang et al. 2007).
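As a rough illustration of why selfing is expected to lower $\rho$, the sketch below combines two standard approximations that are assumptions here rather than results of this study: $N_e \approx N/(1+F)$ (Pollak 1987) and an effective recombination rate of about $r(1-F)$ (Nordborg 2000), with $F = S/(2-S)$. Together they reduce $\rho$ by a factor of $(1-S)$, the same factor that the $\rho$ correction in Tables 1 and 2 divides out. The function names and parameter values are hypothetical.

```python
def inbreeding_coefficient(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F = S / (2 - S) for selfing rate S."""
    return selfing_rate / (2.0 - selfing_rate)


def rho_under_selfing(n: float, r: float, selfing_rate: float) -> float:
    """Hypothetical helper: rho = 4 * Ne * r_eff under partial selfing.

    Assumes Ne = N / (1 + F) and an effective recombination rate r * (1 - F);
    both are approximations, not values estimated in this study.
    """
    f = inbreeding_coefficient(selfing_rate)
    ne = n / (1.0 + f)      # selfing reduces the effective population size
    r_eff = r * (1.0 - f)   # crossing over in homozygous individuals has no effect
    return 4.0 * ne * r_eff


# With N = 1000 and r = 0.001, rho is 4.0 at S = 0, 2.0 at S = 0.5, 0.4 at S = 0.9
for s in (0.0, 0.5, 0.9):
    print(s, round(rho_under_selfing(1000, 0.001, s), 3))
```

Because $(1-F)/(1+F) = 1-S$ exactly when $F = S/(2-S)$, a population with $t_m = 0.3$ would be expected, under these approximations, to show about 30% of the $\rho$ of a fully outcrossing population with the same $N$ and $r$.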
What favours self-compatibility in the Great Lakes region?
The question remains as to why self-incompatibility broke down and selfing has persisted in North American A. lyrata, but not in the European subspecies. Selfing is only expected to become established in a previously outcrossing population when the advantages associated with selfing outweigh the costs in terms of inbreeding depression and reduced adaptability (Charlesworth and Charlesworth 1987). On a coarse scale, the geographic distribution of selfing populations is not consistent with expectations of reproductive assurance being favored in peripheral populations at the front of a colonization wave (Baker's law, Baker 1955; see also Pannell and Barrett 1998), since both selfing and outcrossing populations have colonized new areas within the past 10,000 years. Selfing populations tend to be distributed towards the southern part of the habitat that opened up first following the last glacial maximum (and so are not at the periphery of the current distribution), and there do not appear to be differences in population size or obvious differences in habitat between the two types of populations (Mable and Adam 2007).
An alternative explanation is that North American populations have generally experienced reduced inbreeding depression through purging, which would lower the cost associated with a transition to inbreeding. Population bottlenecks, particularly when accompanied by long-term reductions in effective population size, can lead to significant purging and/or fixation of deleterious alleles (Bataillon and Kirkpatrick 2000). Lower levels of genetic diversity in North American than in European A. lyrata have been suggested to reflect a long-term population bottleneck associated with the colonization of North America from European populations (Ross-Ibarra et al. 2008). Reduced inbreeding depression as a consequence of this bottleneck may therefore have partly facilitated the evolution of selfing. Because the Great Lakes populations of A. lyrata are thought to have spread north from their glacial refugia following the last Ice Age, this range expansion may have further contributed to substantial purging of deleterious alleles in founder populations. Consistent with this possibility, population range expansion has recently been shown to lead to a significant decrease in genetic load in Mercurialis annua (Pujol et al. 2009). Comparative studies of inbreeding depression in both North American and European populations could enable a test of this hypothesis in A. lyrata. Previous studies have demonstrated high levels of inbreeding depression in European populations (Kärkkäinen et al. 1999), whereas inbreeding depression does not seem to be as pronounced in North American A. lyrata (Hoebe 2009).
Conclusions
In summary, we have shown that genetic diversity was reduced in selfing populations of North American A. lyrata compared to predominantly outcrossing populations. We found strong concordance among chloroplast markers, nuclear gene sequences and microsatellite data. The general reduction in diversity appeared to be a consequence of the transition to a selfing mating system, and not of the loss of SI alone. We found no evidence of severe bottlenecks associated with the transition to selfing beyond the bottleneck expected from the transition itself. Although we assume that the transition to selfing in this system is recent (at least much more recent than in other systems), and that there have been multiple independent origins of selfing, we did not find a clear relationship between any of the selfing populations and a particular outcrossing "parent" population. The genetic basis for the loss of SI has not yet been elucidated in A. lyrata, but it is important to identify potentially distinct origins of self-compatibility before designing and interpreting crossing studies to investigate mechanisms of loss. For example, initial investigations of the loss of SI in A. thaliana suggested a selective sweep across all populations (Shimizu et al. 2004); later studies that sampled populations more broadly found that the story is more complicated, with the possibility of multiple independent losses, perhaps involving different mechanisms (Bechsgaard et al. 2006; Sherman-Broyles et al. 2007; Tang et al. 2007). The North American A. lyrata populations offer an exciting system to unravel the causes and consequences of the loss of SI and of an evolutionary transition from outcrossing to selfing, as they involve a much more recent change than the one that gave rise to the highly selfing species A. thaliana.
Acknowledgements
We thank Yvonne Willi and David Remington for generously providing seeds and outcrossing rates; Aileen Adam, Peter Hoebe, and Hong-Guang Zha for assistance with laboratory work and plant maintenance; Peter Hoebe, Rob Ness and Tanja Slotte for useful discussions; Joseph Hughes for assistance with Perl scripting; Hong Gao for assistance with the InStruct source code; and three anonymous referees for their constructive and detailed comments, which greatly improved the manuscript. We are grateful for funding from the Natural Environment Research Council (NE/D013461/1) and European Research Area Network in Plant Genomics/Biotechnology and Biological Sciences Research Council joint funding (ERAPGFP-06.058A) to BKM. We thank Parks Canada, Ontario Parks, Michigan State Parks Authority, U.S. National Park Service, Ohio Department of Natural Resources, and the Ohio Nature Conservancy for access to protected park areas and advice on plant locations.
References
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local
alignment search tool. J. Mol. Biol. 215:403-410.
Ansell, S. W., H. Schneider, N. Pedersen, M. Grundmann, S. J. Russell, and J. C.
Vogel. 2007. Recombination diversifies chloroplast trnF pseudogenes in
Arabidopsis lyrata. J. Evol. Biol. 20:2400-2411.
Baker, H. G. 1955. Self-compatibility and establishment after long-distance dispersal.
Evolution 9:347-349.
Barrett, S. C. H. 2002. The evolution of plant sexual diversity. Nat. Rev. Genet.
3:274-284.
Barrett, S. C. H., L. D. Harder, and A. Worley. 1996. The comparative biology of
pollination and mating in flowering plants. Phil. Trans. R. Soc. Ser. B.
351:1271-1280.
Bataillon, T., and M. Kirkpatrick. 2000. Inbreeding depression due to mildly
deleterious mutations in finite populations: size does matter. Genet. Res.
75:75-81.
Baudry, E., C. Kerdelhue, H. Innan, and W. Stephan. 2001. Species and
recombination effects on DNA variability in the tomato genus. Genetics
158:1725-1735.
Bechsgaard, J. S., V. Castric, D. Charlesworth, X. Vekemans and M. H. Schierup.
2006. The transition to self-compatibility in Arabidopsis thaliana and
evolution within S-haplotypes over 10 Myr. Mol. Biol. Evol. 23:1741-1750.
Boggs, N. A., J. B. Nasrallah, and M. E. Nasrallah. 2009. Independent S-locus
mutations caused self-fertility in Arabidopsis thaliana. PLoS Genetics
5:e1000426.
Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1993. The effects of
deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.
Charlesworth, D. 2006. Evolution of plant breeding systems. Curr. Biol. 16:R726-
735.
Charlesworth, D., and B. Charlesworth. 1987. Inbreeding depression and its
evolutionary consequences. Annu. Rev. Ecol. Syst. 18:237-268.
Charlesworth, D., and X. Vekemans. 2005. How and when did Arabidopsis thaliana
become highly self-fertilising. Bioessays 27:472-476.
Charlesworth, D., and S. I. Wright. 2001. Breeding systems and genome evolution.
Curr. Opin. Genet. Dev. 11:685-690.
Charlesworth, D., and Z. Yang. 1998. Allozyme diversity in Leavenworthia
populations with different inbreeding levels. Heredity 81:453-461.
Chiang, Y. H., B. A. Schaal, C. H. Chou, S. Huang, and T. Y. Chiang. 2003.
Contrasting selection modes at the Adh1 locus in outcrossing Miscanthus
sinensis vs. inbreeding Miscanthus condensatus (Poaceae). Am. J. Bot.
90:561-570.
Clauss, M. J., H. Cobban, and T. Mitchell-Olds. 2002. Cross-species microsatellite
markers for elucidating population genetic structure in Arabidopsis and Arabis
(Brassicaeae). Mol. Ecol. 11:591-601.
Clauss, M. J., and T. Mitchell-Olds. 2006. Population genetic structure of Arabidopsis
lyrata in Europe. Mol. Ecol. 15:2753-2766.
Cutter, A. D., and B. A. Payseur. 2003. Selection at linked sites in the partial selfer
Caenorhabditis elegans. Mol. Biol. Evol. 20:665-673.
Darwin, C. R. 1876. The effects of cross and self-fertilization in the vegetable
kingdom. John Murray, London.
Dieringer, D., and C. Schlötterer. 2003. Two distinct modes of microsatellite mutation
processes: evidence from the complete genomic sequences of nine species.
Genome Res. 13:2242-2251.
Filatov, D. A., and D. Charlesworth. 1999. DNA polymorphism, haplotype structure
and balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423-
1434.
Fisher, R. A. 1941. Average excess and average effect of a gene substitution. Ann.
Eugen. 11:53-63.
Foxe, J. P., T. Slotte, E. A. Stahl, B. Neuffer, H. Hurka, and S. I. Wright. 2009. Rapid
morphological evolution and speciation associated with the evolution of
selfing in Capsella. Proc. Natl. Acad. Sci. USA 106:5241-5245.
Gao, H., S. Williamson and C. D. Bustamante. 2007. A Markov chain Monte Carlo
approach for joint inference of population structure and inbreeding rates from
multilocus genotype data. Genetics 176:1635-1651.
Glémin, S., E. Bazin, and D. Charlesworth. 2006. Impact of mating systems on
patterns of sequence polymorphism in flowering plants. Proc. R. Soc. B.
273:3011-3019.
Grant, V. 1981. Plant Speciation. Columbia University Press, New York.
Guo, Y.-L., J. S. Bechsgaard, T. Slotte, B. Neuffer, M. Lascoux, D. Weigel, and M.
H. Schierup. 2009. Recent speciation of Capsella rubella from Capsella
grandiflora, associated with loss of self-incompatibility and an extreme
bottleneck. Proc. Natl. Acad. Sci. USA 106:5246-5251.
Hoebe, P. N. 2009. Evolutionary dynamics of mating systems in populations of North
American Arabidopsis lyrata. Dissertation, 164pp. University of Glasgow.
Hoebe, P. N., M. Stift, A. Tedder, and B. K. Mable. 2009. Multiple losses of self-
incompatibility in North-American Arabidopsis lyrata: phylogeographic
context and population genetic consequences. Mol. Ecol. 18:4924-4939.
Hudson, R. R. 2001. Two-locus sampling distributions and their application. Genetics
159:1805-1817.
Hurka, H., and B. Neuffer. 1997. Evolutionary processes in the genus Capsella
(Brassicaceae). Plant Syst. Evol. 206:295-316.
Igic, B., R. Lande, and J. R. Kohn. 2008. Loss of self-incompatibility and its
evolutionary consequences. Int. J. Plant Sci. 169:93-104.
Ingvarsson, P. K. 2002. A metapopulation perspective on genetic diversity and
differentiation in partially self-fertilizing plants. Evolution 56:2368-2373.
Jakobsson, M., and N. A. Rosenberg. 2007. CLUMPP: a cluster matching and
permutation program for dealing with label switching and multimodality in
analysis of population structure. Bioinformatics 23:1801-1806.
Kalisz, S., D. W. Vogler, and K. M. Hanley. 2004. Context-dependent autonomous
self-fertilization yields reproductive assurance and mixed mating. Nature
430:884-887.
Kärkkäinen, K., H. Kuittinen, R. Van Treuren, C. Vogl, S. Oikarinen, and O.
Savolainen. 1999. Genetic basis of inbreeding depression in Arabis petraea.
Evolution 53:1354-1365.
Kliman, R. M., P. Andolfatto, J. A. Coyne, F. Depaulis, M. Kreitman, A. J. Berry, J.
McCarter, J. Wakeley, and J. Hey. 2000. The Population Genetics of the
Origin and Divergence of the Drosophila simulans Complex Species. Genetics
156:1913-1931.
Koch, M., and M. Kiefer. 2005. Genome evolution among cruciferous plants: a
lecture from the comparison of the genetic maps of three diploid species-
Capsella rubella, Arabidopsis lyrata subsp. petraea, and A. thaliana. Am. J.
Bot. 92:761-767.
Lewis, C. F. M., P. F. Karrow, S. M. Blasco, F. M. G. McCarthy, J. W. King, T. C.
Moore Jr., and D. K. Rea. 2008. Evolution of lakes in the Huron basin:
Deglaciation to present. Aquatic Ecosystem Health and Management 11:127-
136.
Librado, P., and J. Rozas. 2009. DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics 25:1451-1452.
Liu, F., D. Charlesworth, and M. Kreitman. 1999. The effect of mating system
differences on nucleotide diversity at the phosphoglucose isomerase locus in
the plant genus Leavenworthia. Genetics 151:343-357.
Liu, F., L. Zhang, and D. Charlesworth. 1998. Genetic diversity in Leavenworthia
populations with different inbreeding levels. Proc. R. Soc. Lond. B. Biol. Sci.
265:293-301.
Liu, P., S. Sherman-Broyles, M. E. Nasrallah, and J. B. Nasrallah. 2007. A Cryptic
Modifier Causing Transient Self-Incompatibility in Arabidopsis thaliana.
Curr. Biol. 17:734-740.
Mable, B. K., and A. Adam. 2007. Patterns of genetic diversity in outcrossing and
selfing populations of Arabidopsis lyrata. Mol. Ecol. 16:3565-3580.
Mable, B. K., A. V. Robertson, S. Dart, C. Di Berardo, and L. Witham. 2005.
Breakdown of self-incompatibility in the perennial Arabidopsis lyrata
(Brassicaceae) and its genetic consequences. Evolution 59:1437-1448.
Maynard Smith, J., and J. Haigh. 1974. The hitch-hiking effect of a favourable gene.
Genet. Res. 23:23-25.
Müller, M. H., J. Leppälä, and O. Savolainen. 2008. Genome-wide effects of
postglacial colonization in Arabidopsis lyrata. Heredity 100:47-58.
Ness, R. W., S. I. Wright, and S. C. H. Barrett. 2010. Mating-System Variation,
Demographic History and Patterns of Nucleotide Diversity in the Tristylous
Plant Eichhornia paniculata. Genetics 184:381-392.
Nordborg, M. 2000. Linkage disequilibrium, gene trees and selfing: an ancestral
recombination graph with partial self-fertilization. Genetics 154:923-929.
Pannell, J. R., and S. C. H. Barrett. 1998. Baker's law revisited: reproductive
assurance in a metapopulation. Evolution 52: 657-668.
Pollak, E. 1987. On the theory of partially inbreeding finite populations. I. Partial
selfing. Genetics 117:353-360.
Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population
structure using multilocus genotype data. Genetics 155:945-959.
Pujol, B., S. R. Zhou, J. Sanchez Vilas, and J. R. Pannell. 2009. Reduced inbreeding
depression after species range expansion. Proc. Natl. Acad. Sci. USA 106:
15379-15383.
Ritland, K. 2002. Extensions of models for the estimation of mating systems using n
independent loci. Heredity 88:221-228.
Ritland, K., and S. K. Jain. 1981. A model for the estimation of outcrossing rate and
gene frequencies using n independent loci. Heredity 47:35-52.
Rosenberg, N. A. 2004. DISTRUCT: a program for the graphical display of
population structure. Mol. Ecol. Notes 4:137-138.
Ross-Ibarra, J., S. I. Wright, J. P. Foxe, A. Kawabe, L. DeRose-Wilson, G. Gos, D.
Charlesworth, and B. S. Gaut. 2008. Patterns of Polymorphism and
Demographic History in Natural Populations of Arabidopsis lyrata. PLoS ONE
3:e2411.
Savolainen, O., C. H. Langley, B. P. Lazzaro, and H. Fréville. 2000. Contrasting
patterns of nucleotide polymorphism at the alcohol dehydrogenase locus in the
outcrossing Arabidopsis lyrata and the selfing Arabidopsis thaliana. Mol.
Biol. Evol. 17:645-655.
Sherman-Broyles, S., N. Boggs, A. Farkas, P. Liu, J. Vrebalov, M. E. Nasrallah and J.
B. Nasrallah. 2007. S locus genes and the evolution of self-fertility in
Arabidopsis thaliana. Plant Cell 19:94-106.
Shimizu, K. K., J. M. Cork, A. L. Caicedo, C. A. Mays, R. C. Moore, K. M. Olsen, S.
Ruzsa, G. Coop, C. D. Bustamante, P. Awadalla and M. D. Purugganan. 2004.
Darwinian selection on a selfing locus. Science 306:2081-2084.
Stebbins, G. L. 1950. Variation and evolution in plants. Columbia University Press,
New York.
Stebbins, G. L. 1956. Taxonomy and evolution of genera with special reference to
family Gramineae. Evolution 10:235-245.
Stebbins, G. L. 1957. Self-fertilization and population variability in higher plants.
Am. Nat. 91:337-354.
Stebbins, G. L. 1970. Adaptive radiation of reproductive characteristics in
angiosperms. I. Pollination mechanisms. Ann. Rev. Ecol. Syst. 1:307-326.
Stephens, M., N. J. Smith, and P. Donnelly. 2001. A new statistical method for
haplotype reconstruction from population data. Am. J. Hum. Genet. 68:978-
989.
Taberlet, P., L. Gielly, G. Pautou, and J. Bouvet. 1991. Universal primers for
amplification of three non-coding regions of chloroplast DNA. Plant Mol.
Biol. 17:1105-1109.
Takebayashi, N., and P. L. Morrell. 2001. Is self-fertilization an evolutionary dead
end? Revisiting an old hypothesis with genetic theories and a
macroevolutionary approach. Am. J. Bot. 88:1143-1150.
Tang, C., C. Toomajian, S. Sherman-Broyles, V. Plagnol, Y. L. Guo, T. T. Hu, R. M.
Clark, J. B. Nasrallah, D. Weigel, and M. Nordborg. 2007. The evolution of
selfing in Arabidopsis thaliana. Science 317:1070-1072.
Tedder, A., P. N. Hoebe, S. W. Ansell, and B. K. Mable. 2010. Using chloroplast trnF
pseudogenes for phylogeography in Arabidopsis lyrata. Diversity 2:653-678.
Tsuchimatsu, T., K. Suwabe, R. Shimizu-Inatsugi, S. Isokawa, P. Pavlidis, T. Stadler,
G. Suzuki, S. Takayama, M. Watanabe and K. K. Shimizu. 2010. Evolution of
self-compatibility in Arabidopsis by a mutation in the male specificity gene.
Nature 464:1342-1346.
Wright, S. I., J. P. Foxe, L. DeRose-Wilson, A. Kawabe, M. Looseley, B. S. Gaut, and
D. Charlesworth. 2006. Testing for effects of recombination rate on nucleotide
diversity in natural populations of Arabidopsis lyrata. Genetics 174:1421-
1430.
Wright, S. I. and B. S. Gaut. 2005. Molecular population genetics and the search for
adaptive evolution in plants. Mol. Biol. Evol. 22:506-519.
Wright, S. I., B. Lauga, and D. Charlesworth. 2003. Subdivision and haplotype
structure in natural populations of Arabidopsis lyrata. Mol. Ecol. 12:1247-
1263.
Wright, S. I., R. W. Ness, J. P. Foxe, and S. C. H. Barrett. 2008. Genomic
consequences of outcrossing and selfing in plants. Int. J. Plant Sci. 169:105-
118.
Table 1. Population-level outcrossing rates (t_m), proportion of self-compatible (SC) individuals (as an indication of selfing phenotype), summary statistics and observed heterozygosity for 18 nuclear gene sequences, and observed and expected heterozygosities (Ho and He) across nine microsatellite loci, with populations ordered by increasing outcrossing rate.
Syn.(3) through Ho(nuc) are based on nuclear gene sequences of 18 loci; Ho(ms), He(ms) and c.He(6) are based on nine microsatellite loci. c. = corrected for selfing; - = not available.

Pop.     t_m(1) SC(2) Syn.(3)  Repl.(4) π_syn(5) c.π_syn(6) π_rep(7) c.π_rep(6) Taj.D(8)  ρ(9)   c.ρ(10) Ho(nuc) Ho(ms) He(ms) c.He(6)
PTP      0.09   1     1880.8   6225.2   0.005    0.0087     0.001    0.0018     0.421     0      0       0.022   0.069  0.077  0.14
LPT(11)  0.13   1     -        -        -        -          -        -          -         -      -       -       0.069  0.13   0.23
TC       0.18   0.88  1868.03  6174.97  0.011    0.0193     0.002    0.0034     0.505*    0.005  0.025   0.117   0.181  0.281  0.48
WAS      0.25   1     1862.12  6174.88  0.005    0.0073     0.001    0.0016     0.774**   0      0       0.104   0.083  0.136  0.22
RON      0.28   1     1870.3   6190.8   0.004    0.0056     0.001    0.0016     -0.831*** 0      0       0.045   0.028  0.028  0.044
KTT      0.31   1     1876.4   6211.6   0.008    0.012      0.001    0.0015     0.631**   0.008  0.024   0.057   0      0.044  0.067
TSSA     0.41   0.5   1880.52  6225.48  0.006    0.0082     0.002    0.0028     -0.278    0      0       0.084   0.097  0.187  0.27
TCA      0.48   1     1880.66  6225.34  0.013    0.0182     0.002    0.0027     0.509**   0      0       0.063   0.042  0.121  0.16
OWB      0.64   0.29  1835.9   6090     0.008    0.0094     0.001    0.0012     0.754**   0.004  0.006   0.27    0.35   0.3    0.36
HDC      0.65   0.14  1879.8   6226.3   0.003    0.0037     0.001    0.0012     0.480     0.009  0.014   0.17    0.069  0.14   0.17
PIC      0.77   0     1823.2   6015.8   0.013    0.0142     0.002    0.0023     0.513*    0      0       0.21    0.36   0.32   0.36
MAN      0.83   0.13  1790.4   5850.6   0.016    0.0177     0.002    0.0022     0.700**   0.005  0.005   0.23    0.22   0.27   0.29
PIN      0.84   0     1878.4   6218.7   0.014    0.0149     0.002    0.0022     0.191     0.002  0.002   0.22    0.18   0.26   0.28
PIR      0.88   0     1780.5   5929.5   0.012    0.013      0.002    0.0021     0.170     0.001  0.001   0.21    0.26   0.26   0.28
PRI      0.89   0.13  1854.2   6143.8   0.008    0.0086     0.001    0.0011     0.527*    0      0       0.076   0.14   0.15   0.16
TSS      0.91   0     1880.1   6225.9   0.01     0.0101     0.002    0.0021     0.578**   0.005  0.005   0.301   0.361  0.456  0.48
IOM      0.94   0     1873.9   6199     0.01     0.0106     0.002    0.0021     0.419*    0.003  0.003   0.23    0.28   0.4    0.41
LSP      0.94   0     1781.3   5829.7   0.005    0.0057     0.001    0.001      -0.335    0.01   0.011   0.16    0.14   0.11   0.11
SBD      0.94   0     1813.3   6010.7   0.015    0.0157     0.003    0.0031     0.757***  0.023  0.024   0.27    0.42   0.54   0.56
PUK      0.96   0     1856.8   6150.2   0.016    0.0161     0.002    0.002      0.822***  0.009  0.009   0.26    0.34   0.34   0.34
BEI      0.98   0     1853.8   6135.2   0.02     0.0204     0.003    0.003      0.683**   0.016  0.016   0.33    0.32   0.39   0.39
PCR      0.98   0     1871.6   6195.4   0.012    0.0119     0.001    0.001      -0.009    0.002  0.002   0.23    0.21   0.3    0.3
IND      0.99   0.14  1834.6   6067.4   0.011    0.0111     0.002    0.002      0.421*    0.005  0.005   0.3     0.43   0.47   0.47
NCM(12)  -      0     1741.9   5668.1   0.003    -          0.001    -          0.875**   0.003  -       0.12    0.14   0.17   -
* p < 0.05; ** p < 0.01; *** p < 0.001.
1 Based on multilocus estimates obtained from microsatellite variation in progeny arrays, using MLTR version 2.3 (Ritland 2002).
2 Based on manual self-pollinations (see Table S2 for more details).
3 Total number of synonymous sites across loci.
4 Total number of replacement (non-synonymous) sites across loci.
5 π_syn: synonymous nucleotide diversity, the average number of pairwise differences between two sequences across loci.
6 For each of the diversity measures, corrections for the reduction in effective population size due to selfing rate ($S = 1 - t_m$) were performed using the formula $\theta_{corrected} = \theta_{obs}(1 + F)$, where $\theta$ represents the genetic diversity measure and $F = S/(2 - S)$, where $F$ is the inbreeding coefficient and $S$ is the selfing rate for the population.
7 π_rep: replacement (non-synonymous) nucleotide diversity, the average number of pairwise differences between two sequences across loci.
8 Taj.D: average Tajima's D across loci.
9 ρ: population recombination parameter (see text for details), median across loci.
10 The recombination parameter ρ was corrected for the reduction in effective population size due to selfing rate ($S = 1 - t_m$) using the formula $R_{corrected} = R_{obs}/(1 - S)$, where $R$ is the recombination parameter and $S$ is the selfing rate for the population.
11 Nuclear gene sequencing failed for this population due to PCR problems.
12 Outcrossing rates could not be obtained for this population due to insufficient seeds in the batches for progeny arrays.
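The diversity measure defined in notes 5 and 7 and the selfing corrections in notes 6 and 10 can be expressed as a few small functions. This is a minimal sketch with hypothetical function names; it assumes π is computed over every aligned site of a toy alignment, whereas the study counts synonymous and replacement sites separately.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Average number of pairwise differences per aligned site (pi).

    Simplified: every aligned column counts as a site; the study instead
    partitions synonymous and replacement sites.
    """
    pairs = list(combinations(seqs, 2))
    n_sites = len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / n_sites


def selfing_f(t_m: float) -> float:
    """Inbreeding coefficient F = S / (2 - S), with selfing rate S = 1 - t_m."""
    s = 1.0 - t_m
    return s / (2.0 - s)


def correct_diversity(theta_obs: float, t_m: float) -> float:
    """theta_corrected = theta_obs * (1 + F), as in note 6."""
    return theta_obs * (1.0 + selfing_f(t_m))


def correct_rho(rho_obs: float, t_m: float) -> float:
    """R_corrected = R_obs / (1 - S), as in note 10 (undefined when t_m = 0)."""
    s = 1.0 - t_m
    return rho_obs / (1.0 - s)


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
print(round(correct_diversity(0.281, 0.18), 2))  # TC, microsatellite He
```

For example, the TC population ($t_m = 0.18$, microsatellite He = 0.281) gives a corrected He of about 0.476, consistent with the tabulated 0.48 after rounding.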
Table 2. Linear regressions of synonymous diversity (π_syn), corrected synonymous diversity, replacement diversity (π_rep), corrected replacement diversity, the recombination parameter (ρ), the corrected recombination parameter and observed heterozygosity (Ho) across 18 nuclear gene sequences, and of observed and expected microsatellite heterozygosity (Ho and He) across nine microsatellite loci, on population outcrossing rate (t_m) and on the proportion of self-compatible (SC) individuals per population.
                    Outcrossing rate, t_m (1)                | Proportion of SC individuals (2)
Variable            r²      F ratio  Beta      df  p value   | r²      F ratio  Beta      df  p value
π_syn(3)            0.3     8.65     0.008     21  0.008     | 0.21    5.4      -0.004    22  0.03
Corrected π_syn(4)  0.039   0.81     0.003     21  0.38      | 0.14    0.28     -0.001    21  0.61
π_rep(5)            0.171   4.12     0.008     21  0.0558    | 0.112   2.77     -0.0005   22  0.11
Corrected π_rep(4)  0.0069  0.139    0.00019   21  0.71      | 0.0004  0.08     0.0001    21  0.78
ρ(6)                0.149   3.49     -0.005    21  0.0761    | 0.129   2.96     -0.005    22  0.1006
Corrected ρ(7)      0.0008  0.0156   0.0003    21  0.90      | 0.0003  0.0053   0.0003    21  0.94
Nuclear Ho          0.64    35.23    0.250     21  <0.0001   | 0.58    1.13     -0.169    22  <0.0001
Microsatellite Ho   0.49    19.1     0.306     22  0.0003    | 0.51    20.5     -0.223    23  0.0002
Microsatellite He   0.44    15.6     0.313     22  0.0008    | 0.44    15.5     -0.223    23  0.0008
1 Estimated from multilocus microsatellite variation in progeny arrays, using MLTR version 2.3 (Ritland 2002).
2 Proportion of self-compatible (SC) individuals within the population (cf. Table S2).
3 Synonymous diversity across all 18 nuclear loci in each population as measured by π_syn, where π is the average number of pairwise differences between two sequences.
4 For each of π_syn and π_rep, corrections for the reduction in effective population size due to selfing rate ($S = 1 - t_m$) were performed using the formula $\theta_{corrected} = \theta_{obs}(1 + F)$, where $\theta$ represents the genetic diversity measure and $F = S/(2 - S)$, where $F$ is the inbreeding coefficient and $S$ is the selfing rate for the population.
5 Replacement diversity across all 18 nuclear loci in each population as measured by π_rep, where π is the average number of pairwise differences between two sequences.
6 The population recombination parameter ρ; here we use the median ρ across all 18 nuclear loci in each population, for sequences with > 3 segregating sites.
7 The recombination parameter ρ was corrected for the reduction in effective population size due to selfing rate ($S = 1 - t_m$) using the formula $R_{corrected} = R_{obs}/(1 - S)$, where $R$ is the recombination parameter and $S$ is the selfing rate for the population.
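Each row of Table 2 summarizes a single-predictor linear regression. As a minimal sketch of how r², the slope (beta) and the F ratio relate, assuming ordinary least squares with one predictor, the hypothetical helper below computes all three from paired observations. A p value would additionally require the upper tail of the F(1, n - 2) distribution (for example from a statistics library) and is omitted. The function reports residual degrees of freedom (n - 2), which is an assumption and may not match the convention used in the df column of Table 2.

```python
def simple_regression(x, y):
    """Ordinary least-squares regression of y on a single predictor x.

    Returns the slope (beta), r^2, and the F ratio on (1, n - 2) degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # slope of y on x
    r2 = sxy ** 2 / (sxx * syy)           # proportion of variance in y explained
    f_ratio = r2 * (n - 2) / (1.0 - r2)   # regression MS / residual MS
    return {"beta": beta, "r2": r2, "f_ratio": f_ratio, "residual_df": n - 2}


# Toy data, not from the study: x might be t_m and y a diversity statistic.
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)
```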
Table 3. Total and unique number of variants (microsatellite alleles across nine loci, nuclear gene haplotypes across 18 genes) for the
group of inbreeding populations and for the group of outcrossing populations; overall total number of different variants; number of variants shared across inbreeding and outcrossing populations. For the unique alleles, expected numbers under the null-hypothesis (equal
proportion of unique alleles in inbreeding vs. outcrossing populations) are given in brackets. A goodness-of-fit-test (G-test) was used to
evaluate if the observed numbers of unique alleles significantly differed from null-expectations (* p < 0.05-10" , ** p < 0.05-10" ).
Inbr. = group of inbreeding populations; Outcr. = group of outcrossing populations; expected numbers of unique variants in brackets; Total different = overall number of different variants; Shared = variants shared between inbreeding and outcrossing populations.

Marker           Inbr. total  Inbr. unique (exp.)  Outcr. total  Outcr. unique (exp.)  G statistic    Total different  Shared
Microsatellites  31           1 (8.59)             52            22 (14.4)             14.3*          53               30
Nuclear genes    145          55 (101.1)           449           359 (312.9)           [illegible]**  504              90
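The bracketed expectations are consistent with a null model in which unique variants are divided between the two groups in proportion to each group's total number of variants. A minimal sketch of that G-test, assuming one degree of freedom and the standard statistic $G = 2\sum O \ln(O/E)$; the function name is hypothetical. With the microsatellite counts it reproduces the bracketed expectations (8.59 and 14.4) and G of about 14.3.

```python
import math


def unique_variant_g_test(total_a, unique_a, total_b, unique_b):
    """G goodness-of-fit test (1 df) for the split of unique variants.

    Null hypothesis: unique variants are divided between groups A and B in
    proportion to each group's total number of variants.
    """
    all_unique = unique_a + unique_b
    grand_total = total_a + total_b
    expected = (all_unique * total_a / grand_total,
                all_unique * total_b / grand_total)
    observed = (unique_a, unique_b)
    # G = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    # Upper tail of a chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2.0))
    return expected, g, p_value


# Microsatellite counts: inbreeding (31 total, 1 unique), outcrossing (52 total, 22 unique)
expected, g, p_value = unique_variant_g_test(31, 1, 52, 22)
print([round(e, 2) for e in expected], round(g, 1), p_value)
```

Applied to the nuclear-gene counts (145, 55, 449, 359), the same function returns the bracketed expectations 101.1 and 312.9.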
[Figure: map of sampled A. lyrata populations in the Great Lakes region (legible labels include PIC, PUK, TCA, TC and WAS), with populations coded as Selfing or Outcrossing]