EVOLUTIONARY CONSEQUENCES OF COLONIZATION IN THE GENUS CAPSELLA

GESSECA GOS

A DISSERTATION SUBMITTED TO

THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

GRADUATE PROGRAM IN BIOLOGY

YORK UNIVERSITY

TORONTO, ONTARIO

JUNE 2012

© Gesseca Gos, 2012

Library and Archives Canada, Published Heritage Branch

395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-90373-5

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

Abstract

The establishment of a species in a new geographic range has profound implications for its evolution. A colonizing species may encounter founder effects such as decreased genetic diversity and drift, as well as new selection pressures in the environment. Other genetic changes that frequently accompany range expansion, such as shifts in mating system and chromosome copy number, can also influence the evolutionary processes of the founding population. In this thesis, I have investigated the consequences of colonization for two different species, Capsella rubella and Capsella bursa-pastoris, following divergence from their range-stable progenitor Capsella grandiflora.

To investigate the relationship between ecological stoichiometry and colonization ability, I compared nitrogen use efficiency and other plant performance-related traits under three different soil nitrogen levels in the three Capsella species that differ in their colonization histories. No differences in the traits were found between species, but a large degree of between-population variation was observed. This variation indicates a large potential for local adaptation that was likely present prior to species divergence.

To investigate the genetic diversity of disease resistance genes following colonization and a population bottleneck, I partially sequenced nine NBS-LRR disease resistance genes in the outcrossing Capsella grandiflora and the recently derived, bottlenecked selfing species Capsella rubella, and compared patterns of nucleotide diversity and divergence with genome-wide reference loci. Average diversity at resistance loci was comparable between C. rubella and C. grandiflora, indicating a retention of genetic diversity at disease resistance genes in Capsella rubella, despite the genome-wide diversity reduction following a population bottleneck.

Finally, I investigated the genome-wide consequences of polyploidy in the worldwide colonizer Capsella bursa-pastoris. Indications of the past population bottleneck were apparent, as was a large-scale reduction in the prevalence of purifying selection. This implies a lighter load of deleterious mutations in the polyploid C. bursa-pastoris compared to its progenitor C. grandiflora, which has implications for its evolution and may be related to its colonization success. These results provide an overview of the evolutionary consequences that the colonization process has had in the Capsella genus, with regard to nitrogen use, disease resistance, and polyploidy.

This thesis is dedicated to my parents, Silvana and Elci, who always knew I could accomplish this,

and to my supervisor, Stephen, who gave me the chance.

Thank you.

Acknowledgements

I would like to thank the members of my Supervisory Committee, Dr. Joel Shore, Dr. Bridget Stutchbury, and Dr. Norman Yan for their continued guidance throughout my time as a graduate student. I would also like to thank the members of my Examining Committee, for agreeing to be a part of the most important step in my career.

I would also like to thank my Supervisor, Dr. Stephen Wright, for giving me the opportunity to study in his laboratory, and for providing such strong guidance along the way.

I want to thank the members of the Wright lab, especially Khaled Hazzouri, not only for help in the laboratory, but also for a great friendship throughout our time as graduate students together.

Author Contributions

Chapters 2, 3 and 4 of this thesis are being prepared as refereed journal articles. I am the first author in each case; however, those with whom I share authorship must be acknowledged.

My supervisor, Stephen Wright, is a co-author on all journal articles.

For Chapter 3, Tanja Slotte contributed partially to the data collection by providing the sequences from the 283 reference genes, which I used for comparison to the disease resistance genes.

For Chapter 4, Khaled Hazzouri performed the CTAB DNA extraction protocol on leaf material for both Capsella species. Robert Williamson wrote several of the scripts that were either used directly or modified for use in my analysis.

Table of Contents

Abstract

Dedication

Acknowledgements

Author Contributions

Table of Contents

List of Tables

List of Figures

Chapter One:

Introduction to the evolutionary consequences of colonization

References

Chapter Two:

No relationship between species' colonization history and plant performance in the genus Capsella

Abstract

Introduction

Materials and Methods

Results

Discussion

References

Appendix and Supplementary Material

Chapter Three:

Signatures of balancing selection are maintained at disease resistance loci following mating system evolution and a population bottleneck in the genus Capsella

Abstract

Introduction

Materials and Methods

Results

Discussion

Conclusions

Supplementary Material

References

Chapter Four:

Genome-wide relaxation of purifying selection in the recent polyploid Capsella bursa-pastoris

Introduction

Materials and Methods

Results

Discussion

References

Chapter Five:

Conclusions

List of Tables

Chapter 2

Table 1:

Log likelihood values for the first full model and its explanatory factors, as predictors of the measurements

Table 2:

Log likelihood values for the second full model and its explanatory factors, as predictors of the measurements

Appendix S1:

Seed sample genotype labels and collecting locations

Chapter 3

Table 1:

Individual and average summary statistics for the disease resistance genes

Table 2:

Differentiation of individual and average disease resistance genes between species

Table 3:

Percentages of shared, unique and fixed polymorphisms by category for individual and average disease resistance genes

Table S1:

Locations of the individuals from which the R-gene sequences were sampled

Table S2:

Primers used for PCR amplification of R-gene fragments

Table S3:

BlastX coordinates and protein coding domains for the R-gene fragments

Table S4:

Locations of the individuals from which the genome-wide sequences were sampled

List of Figures

Chapter 2

Figure 1:

Nitrogen use efficiency for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Figure 2:

Total biomass (grams) for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Figure 3:

Root to shoot ratio for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Appendix S2:

A branch diagram representing the sampling design for this study

Appendix S3:

Randomized Latin Square design used in the greenhouse

Appendix S4:

Nutrient solution recipes

Appendix S5:

Aboveground (shoot) biomass (grams) for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Appendix S6:

Root biomass (grams) for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Appendix S7:

Percent carbon contained in leaves for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Appendix S8:

Percent nitrogen contained in leaves for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Appendix S9:

Carbon to nitrogen ratio in the leaves for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization

Chapter 3

Figure 1:

Correlation in nucleotide diversity between Capsella species and

Chapter 4

Figure 1:

Flow cytometry results for A) radish and B) radish and the Capsella bursa-pastoris sample from Spain

Figure 2:

Sliding windows of nucleotide diversity (pi) across chromosome 1, with a window size of 1000 SNPs and a step of 500 SNPs, shown for A) Capsella grandiflora, and B) Capsella bursa-pastoris, excluding SNPs fixed between homeologs

Figure 3:

Sliding windows of the site frequency spectrum (Tajima's D statistic) across chromosome 1, with a window size of 1000 SNPs and a step of 500 SNPs, shown for A) Capsella grandiflora, and B) Capsella bursa-pastoris, excluding SNPs fixed between homeologs

Figure 4:

Minor allele frequencies across all individuals and for all site type categories in A) Capsella grandiflora and B) Capsella bursa-pastoris

Figure 5:

The ratios of the total genome-wide non-synonymous segregating sites compared to synonymous segregating sites in all individuals of Capsella grandiflora and Capsella bursa-pastoris

Figure 6:

The ratio of total genome-wide segregating and fixed stop codons to 4-fold synonymous sites for all individuals of Capsella grandiflora and Capsella bursa-pastoris

Figure 7:

Two measures of the genome-wide average synonymous and non-synonymous nucleotide diversity, Theta and Pi, shown for all individuals of Capsella grandiflora and Capsella bursa-pastoris

Figure 8:

Average genome-wide synonymous and non-synonymous Tajima's D statistic for all individuals of Capsella grandiflora and Capsella bursa-pastoris

Figure 9:

The distribution of fitness effects of new mutations at 0-fold sites, across all individuals, in both species

Figure 10:

Frequency spectra of single nucleotide polymorphism (SNP) types that are predicted to have the most deleterious effects on fitness, or "high impact", including SNPs located in A) splice acceptor sites, B) splice donor sites, and SNPs causing C) the loss of a start codon, D) the gain of a stop codon, as well as E) non-synonymous SNPs in coding regions

Figure 11:

Proportion of adaptive substitutions (α) for all site types in Capsella grandiflora and Capsella bursa-pastoris

Figure 12:

Relative rate of adaptive substitutions (ωa) for all site types in Capsella grandiflora and Capsella bursa-pastoris

CHAPTER 1

INTRODUCTION TO THE EVOLUTIONARY CONSEQUENCES OF COLONIZATION

The colonization of a novel environment and subsequent geographic range expansion have profound implications for the evolution of a population. During this process, new selection pressures in the environment will drive phenotypic evolution. Loss of genetic diversity, reduction in effective population size, and the impact of genetic drift from founder effects can influence how those new pressures affect evolution at the genetic level. Inbreeding, which is known to enhance colonization ability, also reduces effective population size and increases the power of genetic drift. Polyploidy, which is widely associated with range expansion, further influences evolutionary patterns throughout the entire genome. Colonization is a complex process, both in terms of the factors that enable range expansion and in terms of its genotypic and phenotypic consequences. My goal is to investigate the evolutionary consequences of colonization at the phenotypic and genetic levels. I focus on three factors that are of considerable importance to colonization: nitrogen use, disease resistance, and whole-genome duplication.

Before considering colonization, we must explore the nature of the limits to a species' range, which must be overcome. Dispersal of propagules is the first limit to range expansion for plant species, which have limited mobility (Brown et al. 1996; Gaston, 1996; Lloyd et al. 2003; Lowry and Lester, 2006). The ability to transport pollen or seeds to a new location is limited by dispersal mechanisms (Edwards and Westoby, 1996). Range limits, if not driven by dispersal, may be demographic or evolutionary in nature (Moeller et al. 2011). Dispersal alone is insufficient to expand a species' range if propagules are transported to suitable habitats but are unable to establish there. In a landscape where suitable habitat is heterogeneous, or patchy in distribution, the limit may begin where there is a decline in the frequency of inhabitable patches, where local population extinction rates are high, or where colonization rates of patches are low (Lennon et al. 1997; Holt and Keitt, 2000; Moeller et al. 2011; Holt, 2005). When range expansion is limited by adaptation across environmental gradients, peripheral populations may be maintained entirely by immigration, if dispersal rates are sufficient. This is the source-sink model of range limits. The individuals in the sink population are maladapted to their environment, but do not respond to selection, and are not self-sufficient (Moeller et al. 2011).

Where range limits are evolutionary, populations fail to adapt to environmental conditions beyond their range (Kirkpatrick and Barton, 1997). This includes the source-sink model, but does not necessarily require the presence of sink populations. Whether peripheral populations are self-maintaining or not, further expansion is limited by a lack of the evolutionary potential required to adapt to the environment neighboring the range boundary. In the source-sink model, gene flow is directional from center to edge, hindering adaptive evolution to the edge environment (Holt, 1996). This follows the 'abundant center hypothesis', where the environment is most favourable, and population density is greatest, in the center of the species range, and declines approaching the edges (Guo et al. 2005; Moeller et al. 2011). Alternatively, there may be inherent genetic constraints, such as ecological trade-offs, where multiple traits cannot be simultaneously optimized for fitness, or a lack of genetic variation in the particular trait required. Either constraint prevents further adaptation in the necessary direction and causes an evolutionary range limit that is not driven by population processes, such as patterns of gene flow (Moeller et al. 2011).

The colonization of novel habitats therefore requires a species to overcome the driving force of the range limit. If the limit is dispersal-driven, it may be overcome by long-distance transport by humans, or human activity, to a climate similar to that of the species' native range, which is the case with many invasive species. Examples include giant hogweed, Heracleum mantegazzianum, which was introduced to Europe from the Caucasus Mountains as a garden plant (Otte et al. 1998), the zebra mussel, Dreissena polymorpha, which was introduced to the Great Lakes in North America from Eurasia through the ballast water of freighters (Johnson and Carlton, 1993), and the emerald ash borer, Agrilus planipennis, which was introduced to North America from Asia in wood packing materials for international trade (Poland and McCullough, 2006). If the limit is driven by a specific environmental factor, such as temperature, climate change may play a role in driving colonization, though this reflects an expansion of the range limit itself more than the species' colonization ability. The British butterfly, Aricia agestis, has recently expanded its range in response to climate change (Buckley et al. 2012), as has the pipistrelle bat, Pipistrellus nathusii (Lundy et al. 2010), and the prickly lettuce, Lactuca serriola (D'Andrea et al. 2009).

Inbreeding has long been thought to enhance colonization ability, and a transition to this mating system may allow a species to overcome a range limit. The ability to self-fertilize provides the advantage of reproductive assurance in founder populations where both pollinators and mates may be scarce. Therefore, selfing is predicted to be prominent among colonizing species, according to Baker's Law (Baker, 1959). Cases where selection for reproductive assurance facilitates the evolution of selfing include Leavenworthia (Busch et al. 2011), evening primrose (Evans et al. 2011), and alpine ginger (Zhang and Li, 2008). For populations under pollen limitation, selfing has been shown to decrease the risk of extinction compared to outcrossing populations (Lennartsson, 2002). Inbreeding may allow a species to overcome a range limit that is driven by either dispersal or heterogeneity of suitable habitat, including low patch frequency, high extinction, or low colonization rates.

Polyploidy, or whole genome duplication, is also associated with colonization success. Rapid range expansion is commonly associated with newly formed polyploids (Hull-Sanders et al. 2009; Treier et al. 2009). Invasive species, the most successful colonizers, are also more likely to be polyploid than rare or endangered species, suggesting that polyploidy enhances colonization ability (Pandit et al. 2011). This may be due, in part, to the fact that polyploids have a wider range of environmental tolerance, allowing them to thrive in environments where their diploid relatives may not (Levin, 1983; Otto, 2000; Leitch and Leitch, 2008). Genome duplication can also lead to rapid phenotypic change (Hegarty and Hiscock, 2007; Leitch and Leitch, 2008; Otto and Whitton, 2000), giving polyploids the ability to readily adapt to their new environment, even to the point of outcompeting the resident natives. While polyploidy is thought to promote adaptive evolution (Leitch and Leitch, 2008), it is not known whether polyploids are successful colonizers due to evolutionary potential, or are pre-adapted for invasion. It has been demonstrated for several species, where polyploids and their diploid relatives coexist in the native range, that the invaded range contains exclusively the polyploids (Lafuma et al. 2003; Mandak et al. 2005; Kubatova et al. 2008; Schlaepfer et al. 2008; Treier et al. 2009). This scenario supports the pre-adaptation hypothesis, but cannot discount the importance of post-colonization evolution. The two are, of course, not mutually exclusive, and both may play a role in successful colonization by polyploids. Many species also colonize, or are introduced to, novel ranges, and then subsequently undergo polyploidization, suggesting that there is selection towards traits promoting range expansion in the new environment, including chromosome duplication (Mandak et al. 2004; Suda et al. 2010). Polyploidy may allow species to overcome range limits that are evolutionary in nature, such as inherent genetic constraints, or ecological trade-offs.

When considering the implications of colonization for evolution, the question remains as to whether successful colonization and range expansion occur when species arrive at their new habitats already well adapted for life there, or whether adaptation to the novel environment follows colonization. The scenario of pre-adaptation may apply when the new climate is similar to that of the native range. This most often occurs through human-mediated long-distance dispersal to regions that are environmentally similar, but too far away for the natural dispersal mechanisms of the species (Otte et al. 1998; Johnson and Carlton, 1993; Welk et al. 2002; Poland and McCullough, 2006; Richardson and Thuiller, 2007). Polyploidy may also play a role in pre-adaptation to novel environments, by conferring tolerance for a broader range of environmental factors to the polyploids, compared to their diploid relatives (Levin, 1983; Otto, 2000; Leitch and Leitch, 2008).

Whether pre-adapted or not, the process of colonization and range expansion has profound implications for the evolution of a population. Range expansion can expose populations to new and strong selection pressures, promoting local adaptation (Maron et al. 2004; Xu et al. 2010; Colautti et al. 2010). The degree of adaptation following range expansion will necessarily depend, in part, on the genetic diversity of the founder populations. Those with higher diversity are expected to have greater colonization success, due to their adaptive potential (Sakai et al. 2001; Lee, 2002; Holt, 2005), and this has been demonstrated empirically (Crawford and Whitney, 2010). A reduction in diversity, caused by the population bottleneck and reduced effective population size during colonization (Nei, 1975), may present a problem for adaptation to the new environment, because less genetic variation is available on which natural selection may act (Kliber and Eckert, 2005; Puillandre et al. 2008), and evolutionary potential becomes reduced (Willi et al. 2006; van Heerwaarden et al. 2008). Indeed, genetic bottlenecks have been reported in many colonizing species, including the wetland herb Lythrum salicaria (Eckert et al. 1996), and the forest herb Primula elatior in Belgium (Jacquemyn et al. 2009). Increased self-fertilization can exacerbate this problem, as it further reduces population genetic diversity, which may lead to inbreeding depression (Lande and Schemske, 1985).
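The loss of diversity during a bottleneck can be illustrated with the textbook expectation for neutral heterozygosity under drift, H_t = H_0 (1 - 1/(2Ne))^t. The sketch below is not taken from the analyses in this thesis. It assumes an idealized, randomly mating diploid population of constant effective size with no mutation or migration, and the function name and the example values of Ne are hypothetical choices made only for illustration.

```python
def expected_heterozygosity(h0, n_e, generations):
    """Expected heterozygosity after drift in an idealized diploid population.

    Assumes constant effective size n_e, random mating, and no mutation,
    migration, or selection (illustrative assumptions only).
    """
    return h0 * (1.0 - 1.0 / (2.0 * n_e)) ** generations


# Fraction of the initial heterozygosity retained after a 10-generation
# bottleneck, for three hypothetical effective population sizes.
for n_e in (10, 100, 1000):
    retained = expected_heterozygosity(1.0, n_e, 10)
    print(f"Ne = {n_e:>4}: fraction of heterozygosity retained = {retained:.3f}")
```

Under these assumptions, the smaller the founding population and the longer it stays small, the more diversity is lost. Selfing species depart from the random-mating assumption, so the sketch should be read as a lower bound on the severity of the effect discussed above.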

Nevertheless, adaptation following colonization has been observed in many species, particularly the most successful colonizers, invasive species. Examples include copepods in the Great Lakes (Lee et al. 2011), and purple loosestrife Lythrum salicaria (Barrett et al. 2008). Some degree of genetic diversity is required to facilitate this (Sakai et al. 2001; Lee, 2002; Holt, 2005), which implies that, in the event of population bottlenecks, populations must overcome the reduction in effective population size that is associated with founder effects, and in selfing species, with inbreeding as well. Some species may overcome genetic bottlenecks by having their genetic variation increased by multiple introductions from isolated source populations (Verhoeven et al. 2011). The hybridization between individuals from different source populations enhances overall diversity in the founders, sometimes to the point where they have more genetic variation than the populations in their native range (Kolbe et al. 2004; Genton et al. 2005; Crawford and Whitney, 2010). Examples include reed canary grass Phalaris arundinacea (Lavergne and Molofsky, 2007), and ground finches Geospiza magnirostris (Grant et al. 2001). This may also be accomplished by a high degree of gene flow between multiple founder populations, or between founders and the source population.

Genetic bottlenecks themselves, despite reducing overall effective population size and diversity, may enhance evolutionary potential in some respects. This appears to be a contradiction, since a reduction in population size and diversity is expected to decrease adaptive potential. However, some evolutionary models propose that the genetic drift experienced by small populations can allow them to reach new adaptive peaks by altering the genetic background, which is stable in large populations (Wright, 1931; Templeton, 1980; Carson and Templeton, 1984; van Heerwaarden et al. 2008). Altering the genetic background is expected to release cryptic variation, that is, genetic variation which is not expressed in the phenotype due to effects such as dominance or epistasis (McGuigan and Sgro, 2009). Once expressed, the phenotypic differences are visible to natural selection, and may promote adaptive evolution. Therefore, it is unclear whether a genetic bottleneck will inhibit or promote adaptive evolution, but theory suggests that in some cases, the latter may be true.

The potential for the evolution of a phenotype depends on additive genetic variance (VA), the genetic variation that is transmitted from parents to offspring (Connor and Hartl, 2004). A population bottleneck is expected to reduce adaptive potential by decreasing VA (Chakraborty and Nei, 1982; Lynch and Hill, 1986). This decrease is expected to be proportional to the degree of inbreeding in the population (Falconer and Mackay, 1996). However, bottlenecked populations may differ significantly from their source population in terms of the distribution of genotypic and phenotypic variation (Jarvis et al. 2011). For quantitative traits where genes have dominance or epistatic architecture, it is possible for VA to increase following a bottleneck (van Heerwaarden et al. 2008). Where there are dominance effects between alleles, including complete dominance or overdominance, a population bottleneck may increase the frequency of recessive alleles by genetic drift, which will increase their overall effect in the population, and the VA of the trait (Willis and Orr, 1993; Wang et al. 1998). Where there are epistatic interactions between loci, and the epistatic allele increases in frequency compared to the masking allele, VA is also expected to increase for the related trait, following a genetic bottleneck (Bryant et al. 1986; Goodnight, 1988; Lopez-Fanjul et al. 1999; Hill et al. 2006; Turelli and Barton, 2006). Increased VA will theoretically increase the adaptive potential of the trait if recessive or epistatic alleles are beneficial; however, it is noteworthy that if recessive alleles are deleterious, natural selection will favor a reversion to the previous state of the population, and this will lead to inbreeding depression (Lopez-Fanjul et al. 1999). Therefore, whether the result of the bottleneck is increased adaptation, or inbreeding depression, may depend largely on the genetic architecture of the trait, and the fitness effect of the alleles in question. Empirical evidence for increased additive genetic variance following a bottleneck has been reported in Drosophila species (Lopez-Fanjul et al. 1999; Fowler and Whitlock, 1999; van Heerwaarden et al. 2008), and the house mouse Mus musculus (Jarvis et al. 2011). However, an increase in the response to selection of the traits has not been demonstrated (van Heerwaarden et al. 2008), and the ability of an increase in additive genetic variance to promote adaptive evolution is uncertain (Lopez-Fanjul et al. 1999; van Heerwaarden et al. 2008). Therefore, the question of whether the increase in additive genetic variance of a trait following a bottleneck enhances its evolutionary potential is controversial, and remains unresolved.

The increase in additive genetic variance for a trait following a population bottleneck, as a result of the increased frequency of beneficial recessive alleles, or the exposure of epistatic ones, can be said to release 'cryptic variation' that was previously hidden from the phenotype, and therefore inaccessible to natural selection. This variation may include trait values beyond those expressed in the source population (Gibson and Dworkin, 2004; Masel et al. 2006). Prior to the increase in VA, polymorphisms of this nature would have been retained or lost due to random genetic drift alone (McGuigan and Sgro, 2009), but once exposed, they may be selected on, and therefore increase the adaptive potential of the trait in question. A population bottleneck is not the only component of the colonization process that may have this effect. Novel selection pressures or environmental stress in the newly colonized environment are also predicted to release cryptic variation (Schlichting, 2008; Hoffmann, 2000; Badyaev, 2005; McGuigan and Sgro). This can occur if polymorphisms that were weakly deleterious in the native range become beneficial in the new environment, as a result of changes in the direction of selection pressures (Carlborg et al. 2006; Masel et al. 2006; Schlichting, 2008). This has been demonstrated for reproductive morphology in the yellow dung fly Scathophaga stercoraria, following exposure to high temperatures (Berger et al. 2011), for resource-use traits in the tadpoles of spadefoot toads Spea following the transition to a carnivorous diet (Ledon-Rettig et al. 2010), and for body size in the three-spined stickleback Gasterosteus aculeatus, under conditions of low salinity (McGuigan et al. 2010).
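The dominance argument made earlier, that drift can raise additive genetic variance by increasing the frequency of a rare recessive allele, can be made concrete with the standard single-locus expression from quantitative genetics (as presented in texts such as Falconer and Mackay), VA = 2pq[a + d(q - p)]^2. The sketch below is an illustration under stated assumptions rather than an analysis from this thesis: it considers one biallelic locus in Hardy-Weinberg proportions, with genotypic values +a, d and -a, and the function name and parameter values are hypothetical.

```python
def additive_variance(q, a=1.0, d=1.0):
    """Single-locus additive genetic variance, VA = 2pq[a + d(q - p)]^2.

    q is the frequency of the allele whose homozygote has value -a, and
    p = 1 - q. Setting d = a makes that allele completely recessive;
    d = 0 gives purely additive gene action.
    Assumes Hardy-Weinberg proportions (illustrative assumption only).
    """
    p = 1.0 - q
    alpha = a + d * (q - p)  # average effect of an allele substitution
    return 2.0 * p * q * alpha ** 2


# A rare recessive allele (d = a) drifting from q = 0.1 to q = 0.3
# contributes more additive variance, not less.
before = additive_variance(0.1)
after = additive_variance(0.3)
print(f"VA at q = 0.1: {before:.4f}; VA at q = 0.3: {after:.4f}")
```

With complete dominance the expression reduces to 8a^2pq^3, which rises steeply while the recessive allele is rare. With purely additive gene action (d = 0), VA is simply 2pqa^2, so the same shift in frequency has a much smaller proportional effect, which is why the outcome depends on the genetic architecture of the trait.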

However, the ability of environmental stress to release cryptic genetic variation or enhance evolvability remains controversial. The argument has been made that the designs of laboratory experiments demonstrating this empirically are overly simplistic, considering a single trait and environmental variable, and that the degree of stress or level of environmental variation required to release cryptic genetic variation is not plausible for natural populations to encounter (McGuigan and Sgro, 2009). Furthermore, some empirical studies demonstrated that the release of cryptic variation by the environment is inconsistent, and is dependent on the genetic background (Yeyati et al. 2007; Sangster et al. 2008). This is an indication that this process may interact with colonization, as population bottlenecks can lead to genetic background shifts.

When a species overcomes the limits to its geographic range and colonizes a new environment, it is difficult to predict how its evolution may be affected. It may experience a decrease in genetic diversity due to the founder effect of a population bottleneck and reduced effective population size, or it may have increased diversity as a result of multiple introductions from previously isolated source populations, or extensive gene flow from source to founder population, or between founder populations. A colonizing species may experience decreased adaptive potential due to a reduction in effective population size and diversity, or it may have increased evolvability by the release of cryptic variation caused by the population bottleneck or new selective pressures. Adaptive potential may be unnecessary for species invading suitable habitats that are similar to their native range, if their limit was dispersal-based and overcome by human transport. Conversely, adaptive potential may be essential for a colonizing population, as it experiences new selection pressures in its new environment. In this thesis, I investigate the consequences of colonization in the genus Capsella, in which three closely related species have varying degrees of colonization success in their histories, with respect to three traits that are likely to be highly relevant to the colonization process: nitrogen use efficiency and phenotypic plasticity, pathogen resistance, and polyploidy.

In the second chapter, I compared nitrogen use efficiency and other plant performance-related traits, as well as phenotypic plasticity of those traits, under three different levels of soil nitrogen in a common garden experiment with three members of the Capsella genus, which have vastly different histories of colonization success. The more successful colonizers are both inbreeding, and have undergone population bottlenecks prior to range expansion. My goal was to determine if increased nitrogen use efficiency, phenotypic plasticity, or plant performance was associated with the historically more successful colonizers within this genus, Capsella rubella and Capsella bursa-pastoris. Instead, I found extensive population-level variation in the traits investigated, but no differences between species, indicating a retention of phenotypic variation following speciation and colonization.

In the third chapter, I examined DNA sequences of the putative pathogen-binding sites in nine disease resistance genes in the outcrossing Capsella grandiflora and the recently derived, bottlenecked selfing species Capsella rubella. I compared levels and patterns of nucleotide diversity and divergence at the disease resistance genes in both species with the DNA sequences of reference genes sampled throughout the rest of the genome. Plant disease resistance genes are frequently under balancing selection, which maintains genetic diversity, in order to protect the plant from pathogen attack. My goal was to determine if nucleotide diversity is preferentially maintained at disease resistance genes, due to historical or current balancing selection, following a genome-wide reduction in diversity as a result of the population bottleneck. The results did indeed support a retention of diversity at disease resistance loci.

In the fourth chapter, I used next-generation whole-genome sequencing techniques to investigate the genome-wide molecular evolutionary consequences of polyploidy in the bottlenecked, selfing, highly successful colonizer Capsella bursa-pastoris, and compared it with its diploid progenitor Capsella grandiflora. Eight whole-genome sequences of C. bursa-pastoris and thirteen whole-genome sequences of C. grandiflora were subjected to extensive molecular evolutionary analyses. Evidence of the bottleneck and selfing mating system is apparent from reductions in genome-wide nucleotide diversity. The genome-wide strength of purifying selection is extensively reduced in the polyploid C. bursa-pastoris, a result which has not been previously demonstrated, but no clear evidence for increased positive selection was found.

References

Andersson, S, M Ellmer, TH Jorgensen, A Palme. 2010. Quantitative genetic effects of

bottlenecks: experimental evidence from a wild plant species, Nigella degenii. Journal of

Heredity 101:298-307.

Badyaev, AV. 2005. Stress-induced variation in evolution: from behavioural plasticity to genetic

assimilation. Proceedings of the Royal Society B-Biological Sciences 272:877-886.

Baker, HG. 1959. Reproductive methods as factors in speciation in flowering plants. Cold Spring

Harbor Symposia on Quantitative Biology 24:177-191.

Barrett, SCH, RI Colautti, CG Eckert. 2008. Plant reproductive systems and evolution during

biological invasion. Molecular Ecology 17:373-383.

Berger, D, SS Bauerfeind, WU Blanckenhorn, MA Schafer. 2011. High temperatures reveal

cryptic genetic variation in a polymorphic female sperm storage organ. Evolution

65:2830-2842.

Brown, JH, GC Stevens, DM Kaufman. 1996. The geographic range: Size, shape, boundaries,

and internal structure. Annual Review of Ecology and Systematics 27:597-623.

Bryant, EH, SA McCommas, LM Combs. 1986. The effect of an experimental bottleneck upon

quantitative genetic variation in the housefly. Genetics 114:1191-1211.

Buckley, J, RK Butlin, JR Bridle. 2012. Evidence for evolutionary change associated with the

recent range expansion of the British butterfly, Aricia agestis, in response to climate

change. Molecular Ecology 21:267-280.

Busch, JW, S Joly, DJ Schoen. 2011. Demographic signatures accompanying the evolution of selfing in Leavenworthia alabamica. Molecular Biology and Evolution 28:1717-1729.

Carlborg, O, L Jacobsson, P Ahgren, P Siegel, L Andersson. 2006. Epistasis and the release of

genetic variation during long-term selection. Nature Genetics 38:418-420.

Carson, HL, AR Templeton. 1984. Genetic Revolutions in Relation to Speciation Phenomena -

the Founding of New Populations. Annual Review of Ecology and Systematics 15:97-

131.

Chakraborty, R, M Nei. 1982. Genetic differentiation of quantitative characters between

populations or species 1. Mutation and random genetic drift. Genetical Research 39:303-

314.

Colautti, RI, CG Eckert, SCH Barrett. 2010. Evolutionary constraints on adaptive evolution

during range expansion in an invasive plant. Proceedings of the Royal Society B-

Biological Sciences 277:1799-1806.

Conner, JK, DL Hartl. 2004. A Primer of Ecological Genetics. Sunderland, MA, USA: Sinauer

Associates.

Crawford, KM, KD Whitney. 2010. Population genetic diversity influences colonization success.

Molecular Ecology 19:1253-1263.

D'Andrea, L, O Broennimann, G Kozlowski, A Guisan, X Morin, J Keller-Senften, F Felber.

2009. Climate change, anthropogenic disturbance and the northward range expansion of

Lactuca serriola (Asteraceae). Journal of Biogeography 36:1573-1587.

Eckert, CG, D Manicacci, SCH Barrett. 1996. Genetic drift and founder effect in native versus

introduced populations of an invading plant, Lythrum salicaria (Lythraceae). Evolution 50:1512-1519.

Edwards, W, M Westoby. 1996. Reserve mass and dispersal investment in relation to geographic

range of plant species: Phylogenetically independent contrasts. Journal of Biogeography

23:329-338.

Evans, ME, DJ Hearn, KE Theiss, K Cranston, KE Holsinger, MJ Donoghue. 2011. Extreme

environments select for reproductive assurance: evidence from evening primroses

(Oenothera). New Phytologist 191:555-563.

Excoffier, L, M Foll, RJ Petit. 2009. Genetic Consequences of Range Expansions. Annual

Review of Ecology Evolution and Systematics 40:481-501.

Falconer, DS, TFC Mackay. 1996. Introduction to Quantitative Genetics. London, England:

Longman Group.

Fowler, K, MC Whitlock. 1999. The distribution of phenotypic variance with inbreeding.

Evolution 53:1143-1156.

Gaston, KJ. 1996. Species-range-size distributions: Patterns, mechanisms and implications.

Trends in Ecology & Evolution 11:197-201.

Genton, BJ, JA Shykoff, T Giraud. 2005. High genetic diversity in French invasive populations

of common ragweed, Ambrosia artemisiifolia, as a result of multiple sources of

introduction. Molecular Ecology 14:4275-4285.

Gibson, G, I Dworkin. 2004. Uncovering cryptic genetic variation. Nature Reviews Genetics

5:681-690.

Goodnight, CJ. 1988. Epistasis and the effect of founder events on the additive genetic variance. Evolution 42:441-454.

Grant, PR, BR Grant, K Petren. 2001. A population founded by a single pair of individuals:

establishment, expansion, and evolution. Genetica 112:359-382.

Guo, QF, M Taper, M Schoenberger, J Brandle. 2005. Spatial-temporal population dynamics

across species range: from centre to margin. Oikos 108:47-57.

Hegarty, M, S Hiscock. 2007. Polyploidy: Doubling up for evolutionary success. Current

Biology 17:R927-R929.

Hill, WG, NH Barton, M Turelli. 2006. Prediction of effects of genetic drift on variance

components under a general model of epistasis. Theoretical Population Biology 70:56-62.

Hoffmann, AA, MJ Hercus. 2000. Environmental stress as an evolutionary force. Bioscience

50:217-226.

Holt, RD. 1996. Adaptive evolution in source-sink environments: Direct and indirect effects of

density-dependence on niche evolution. Oikos 75:182-192.

Holt, RD, TH Keitt. 2000. Alternative causes for range limits: a metapopulation perspective.

Ecology Letters 3:41-47.

Holt, RD, TH Keitt, MA Lewis, BA Maurer, ML Taper. 2005. Theoretical models of species'

borders: single species approaches. Oikos 108:18-27.

Houle, D. 1992. Comparing Evolvability and Variability of Quantitative Traits. Genetics

130:195-204.

Hull-Sanders, HM, RH Johnson, HA Owen, GA Meyer. 2009. Effects of polyploidy on

secondary chemistry, physiology, and performance of native and invasive genotypes of Solidago gigantea (Asteraceae). American Journal of Botany 96:762-770.

Jacquemyn, H, K Vandepitte, I Roldan-Ruiz, O Honnay. 2009. Rapid loss of genetic variation in

a founding population of Primula elatior (Primulaceae) after colonization. Annals of

Botany 103:777-783.

Jarvis, JP, SN Cropp, TT Vaughn, LS Pletscher, K King-Ellison, E Adams-Hunt, C Erickson, JM

Cheverud. 2011. The effect of a population bottleneck on the evolution of genetic

variance/covariance structure. Journal of Evolutionary Biology 24:2139-2152.

Johnson, LE, JT Carlton. 1996. Post-establishment spread in large-scale invasions: Dispersal

mechanisms of the zebra mussel Dreissena polymorpha. Ecology 77:1686-1690.

Kirkpatrick, M, NH Barton. 1997. Evolution of a species' range. American Naturalist 150:1-23.

Kliber, A, CG Eckert. 2005. Interaction between founder effect and selection during biological

invasion in an aquatic plant. Evolution 59:1900-1913.

Kolbe, JJ, RE Glor, L Rodriguez Schettino, AC Lara, A Larson, JB Losos. 2004. Genetic

variation increases during biological invasion by a Cuban lizard. Nature 431:177-181.

Kubatova, B, P Travnicek, D Bastlova, V Curn, V Jarolimova, J Suda. 2008. DNA ploidy-level

variation in native and invasive populations of Lythrum salicaria at a large geographical

scale. Journal of Biogeography 35:167-176.

Lafuma, L, K Balkwill, E Imbert, R Verlaque, S Maurice. 2003. Ploidy level and origin of the European invasive weed Senecio inaequidens (Asteraceae). Plant Systematics and

Evolution 243:59-72.

Lande, R, DW Schemske. 1985. The Evolution of Self-Fertilization and Inbreeding Depression

in Plants. I. Genetic Models. Evolution 39:24-40.

Lavergne, S, J Molofsky. 2007. Increased genetic variation and evolutionary potential drive the

success of an invasive grass. Proceedings of the National Academy of Sciences USA

104:3883-3888.

Ledon-Rettig, CC, DW Pfennig, EJ Crespi. 2010. Diet and hormonal manipulation reveal cryptic

genetic variation: implications for the evolution of novel feeding strategies. Proceedings

of the Royal Society B-Biological Sciences 277:3569-3578.

Lee, CE. 2002. Evolutionary genetics of invasive species. Trends in Ecology & Evolution

17:386-391.

Lee, CE, M Kiergaard, GW Gelembiuk, BD Eads, M Posavi. 2011. Pumping Ions: Rapid Parallel

Evolution of Ionic Regulation Following Habitat Invasions. Evolution 65:2229-2244.

Leitch, AR, IJ Leitch. 2008. Genomic plasticity and the diversity of polyploid plants. Science

320:481-483.

Lennartsson, T. 2002. Extinction thresholds and disrupted plant-pollinator interactions in

fragmented plant populations. Ecology 83:3060-3072.

Lennon, JJ, JRG Turner, D Connell. 1997. A metapopulation model of species boundaries. Oikos 78:486-502.

Levin, DA. 1983. Polyploidy and novelty in flowering plants. American Naturalist 122:1-25.

Lloyd, KM, JB Wilson, WG Lee. 2003. Correlates of geographic range size in New Zealand

Chionochloa (Poaceae) species. Journal of Biogeography 30:1751-1761.

Lopez-Fanjul, C, A Fernandez, MA Toro. 1999. The role of epistasis in the increase in the

additive genetic variance after population bottlenecks. Genetical Research 73:45-59.

Lowry, E, SE Lester. 2006. The biogeography of plant reproduction: potential determinants of

species' range sizes. Journal of Biogeography 33:1975-1982.

Lundy, M, I Montgomery, J Russ. 2010. Climate change-linked range expansion of Nathusius'

pipistrelle bat, Pipistrellus nathusii (Keyserling & Blasius, 1839). Journal of

Biogeography 37:2232-2242.

Lynch, M, WG Hill. 1986. Phenotypic evolution by neutral mutation. Evolution 40:915-935.

Mandak, B, K Bimova, P Pysek, J Stepanek, I Plackova. 2005. Isoenzyme diversity in

Reynoutria (Polygonaceae) taxa: escape from sterility by hybridization. Plant Systematics

and Evolution 253:219-230.

Mandak, B, P Pysek, K Bimova. 2004. History of the invasion and distribution of Reynoutria

taxa in the Czech Republic: a hybrid spreading faster than its parents. Preslia 76:15-64.

Maron, JL, M Vila, R Bommarco, S Elmendorf, P Beardsley. 2004. Rapid evolution of an

invasive plant. Ecological Monographs 74:261-280.

Masel, J. 2006. Cryptic genetic variation is enriched for potential adaptations. Genetics

172:1985-1991.

McGuigan, K, N Nishimura, M Currey, D Hurwit, WA Cresko. 2011. Cryptic genetic variation

and body size evolution in the threespine stickleback. Evolution 65:1203-1211.

McGuigan, K, CM Sgro. 2009. Evolutionary consequences of cryptic genetic variation. Trends

in Ecology & Evolution 24:305-311.

Moeller, DA, MA Geber, P Tiffin. 2011. Population genetics and the evolution of geographic

range limits in an annual plant. American Naturalist 178 Suppl 1:S44-57.

Nei, M, T Maruyama, R Chakraborty. 1975. Bottleneck Effect and Genetic-Variability in

Populations. Evolution 29:1-10.

Otte, A, R Franke. 1998. The ecology of the Caucasian herbaceous perennial Heracleum

mantegazzianum Somm. et Lev. (Giant Hogweed) in cultural ecosystems of Central

Europe. Phytocoenologia 28:205-232.

Otto, SP, J Whitton. 2000. Polyploid incidence and evolution. Annual Review of Genetics

34:401-437.

Pandit, MK, MJO Pocock, WE Kunin. 2011. Ploidy influences rarity and invasiveness in plants.

Journal of Ecology 99:1108-1115.

Petit, RJ. 2011. Early insights into the genetic consequences of range expansions. Heredity

106:203-204.

Poland, TM, DG McCullough. 2006. Emerald ash borer: Invasion of the urban forest and the

threat to North America's ash resource. Journal of Forestry 104:118-124.

Puillandre, N, S Dupas, O Dangles, JL Zeddam, C Capdevielle-Dulac, K Barbin, M Torres-Leguizamon, JF Silvain. 2008. Genetic bottleneck in invasive species: the potato tuber

moth adds to the list. Biological Invasions 10:319-333.

Richardson, DM, W Thuiller. 2007. Home away from home - objective mapping of high-risk

source areas for plant introductions. Diversity and Distributions 13:299-312.

Sakai, AK, FW Allendorf, JS Holt, et al. 2001. The population biology of invasive species.

Annual Review of Ecology and Systematics 32:305-332.

Sangster, TA, N Salathia, HN Lee, E Watanabe, K Schellenberg, K Morneau, H Wang, S

Undurraga, C Queitsch, S Lindquist. 2008. HSP90-buffered genetic variation is common

in Arabidopsis thaliana. Proceedings of the National Academy of Sciences USA

105:2969-2974.

Schlaepfer, DR, PJ Edwards, JC Semple, R Billeter. 2008. Cytogeography of Solidago gigantea

(Asteraceae) and its invasive ploidy level. Journal of Biogeography 35:2119-2127.

Schlichting, CD. 2008. Hidden Reaction Norms, Cryptic Genetic Variation, and Evolvability.

Year in Evolutionary Biology 2008 1133:187-203.

Suda, J, P Travnicek, B Mandak, K Berchova-Bimova. 2010. Genome size as a marker for

identifying the invasive alien taxa in Fallopia section Reynoutria. Preslia 82:97-106.

Templeton, AR. 1980. The Theory of Speciation Via the Founder Principle. Genetics 94:1011-

1038.

Treier, UA, O Broennimann, S Normand, A Guisan, U Schaffner, T Steinger, H Muller-Scharer.

2009. Shift in cytotype frequency and niche space in the invasive plant Centaurea maculosa. Ecology 90:1366-1377.

Turelli, M, NH Barton. 2006. Will population bottlenecks and multilocus epistasis increase

additive genetic variance? Evolution 60:1763-1776.

van Heerwaarden, B, Y Willi, TN Kristensen, AA Hoffmann. 2008. Population bottlenecks

increase additive genetic variance but do not break a selection limit in rain forest

Drosophila. Genetics 179:2135-2146.

Verhoeven, KJF, M Macel, LM Wolfe, A Biere. 2011. Population admixture, biological

invasions and the balance between local adaptation and inbreeding depression.

Proceedings of the Royal Society B-Biological Sciences 278:2-8.

Wang, JL, A Caballero, PD Keightley, WG Hill. 1998. Bottleneck effect on genetic variance: A

theoretical investigation of the role of dominance. Genetics 150:435-447.

Welk, E, K Schubert, MH Hoffmann. 2002. Present and potential distribution of invasive garlic

mustard (Alliaria petiolata) in North America. Diversity and Distributions 8:219-233.

Willis, JH, HA Orr. 1993. Increased heritable variation following population bottlenecks - the

role of dominance. Evolution 47:949-957.

Wright, S. 1931. Evolution in Mendelian populations. Genetics 16:97-159.

Xu, CY, MH Julien, M Fatemi, C Girod, RD Van Klinken, CL Gross, SJ Novak. 2010.

Phenotypic divergence during the invasion of Phyla canescens in Australia and France:

evidence for selection-driven evolution. Ecology Letters 13:32-44.

Yeyati, PL, RM Bancewicz, J Maule, V van Heyningen. 2007. Hsp90 selectively modulates

phenotype in vertebrate development. PLoS Genetics 3:431-447.

Zhang, ZQ, QJ Li. 2008. Autonomous selfing provides reproductive assurance in an alpine

ginger Roscoea schneideriana (Zingiberaceae). Annals of Botany 102:531-538.

CHAPTER 2

NO RELATIONSHIP BETWEEN SPECIES' COLONIZATION HISTORY AND

PLANT PERFORMANCE IN THE GENUS CAPSELLA

Abstract

Biological invasion by exotic species is a growing threat to biodiversity, yet the traits that characterize invaders are largely unknown. To investigate the relationship between ecological stoichiometry and colonization ability, we compared nitrogen use efficiency and other plant performance-related traits under three different levels of soil nitrogen in three closely related members of the genus Capsella that differ widely in their colonization histories. The investigation of traits conferring advantages in the colonization phase of species invasion, rather than the competition phase, may be beneficial in identifying invaders early, and lead to more effective preventative measures against the spread of invasive species.

We treated the three species with three different levels of soil nitrogen concentration, and measured nitrogen use efficiency, biomass, root:shoot ratio, carbon and nitrogen leaf content, and carbon:nitrogen ratio. The differences between species and treatments, as well as between populations and treatments, were analyzed using mixed effects models.

There were no differences in any of the traits measured between species, but a large degree of between-population variation was observed. The traits that were measured are likely unrelated to the differences in colonization history between the three Capsella species. However, the large degree of between-population variation indicates a large potential for local adaptation that was present prior to species divergence.

Introduction

Biological invasion by exotic species is a growing threat to biodiversity and ecosystem functioning worldwide. An understanding of the invasion process is critical to preventing or controlling the spread of invasives, and preserving ecosystem integrity (Vitousek et al. 1997; Simberloff, 2005). Most invasive species are non-native (exotic) to the ecosystem that they threaten, but only a subset of non-native species becomes invasive (Richardson & Pysek, 2006). Many species introduced to novel ecosystems fail to establish populations, or simply acclimatize to the community, without displacing native species, or altering the functioning of the ecosystem. Despite extensive efforts to predict species' potential invasiveness, the ecological traits that characterize invaders in general have remained elusive, and are a major focus in invasion biology (Rejmanek 1996; Hastwell 2005).

Ecological stoichiometry, a discipline focused on the relationship between the balance of chemical elements as resources and the functioning of biological systems, has recently been proposed as an important component in both organism invasiveness and ecosystem invasibility (Gonzalez et al. 2010). The theory of 'fluctuating resource availability' predicts that community invasibility is correlated with resource availability, since invaders will be less resource-limited, and nutrient-rich habitats tend to be more invaded than those that are nutrient-poor (Davis et al. 2000; Blumenthal 2005). Although resources may benefit both natives and exotics, the exotics may have a greater ability to take advantage of resources. Some invaders have shown higher productivity under nutrient enrichment, while their competing natives have not (Gonzalez et al. 2010). Invasive perennial forbs were demonstrated to have higher leaf nitrogen accumulation and higher nitrogen use efficiency compared to their native counterparts (Drenovsky et al. 2008). In these cases, higher resource availability can facilitate invasion. However, habitats low in nutrients and other resources are also susceptible to biological invasion (Funk and Vitousek, 2007).

Low nutrient environments may be invaded by exotics with higher resource use efficiencies and lower nutrient requirements for growth and reproduction. Indeed, invasives have been shown to outperform native species in both low and high resource environments, as they can better take advantage of high nutrient availability (Chun et al. 2007; Hastwell et al. 2008; Gonzalez et al. 2010) and are more efficient when resources are limited (Funk and Vitousek 2007). Therefore, nutrient use efficiency, and other traits related to organism stoichiometry and overall performance, may be important factors in the success of an invading species.

Here, our focus is on plant species, and the stoichiometric traits that influence exotic plant invasiveness. Traits related to plant performance, stoichiometry, and potential invasiveness include plant biomass, nutrient use efficiency, the proportion of biomass allocated to roots, leaf nutrient content, and the ratio of carbon to nutrients. Total above- and below-ground biomass can be used as a surrogate measure to evaluate plant performance (Farris & Lechowicz 1990; Givnish 2002; Funk et al. 2008). The proportion of biomass that a plant allocates to its roots, which can be measured by root biomass and the root to shoot ratio, affects its ability to capture nutrients from the soil, and a higher allocation of biomass to roots can increase resource acquisition (Fransen et al. 1999). Root biomass is correlated with leaf nutrient content, another indicator of a plant's success at capturing nutrients (Fransen et al. 1998). Carbon content, and the carbon:nutrient ratio, are indicators of a plant's growth rate and nutrient demands, as plants with higher growth rates and nutrient requirements have lower carbon:nutrient ratios (Elser et al. 2006; Gonzalez et al. 2010).

Phenotypic plasticity, the change in the expression of a genotype in response to environmental factors (Schlichting and Levin 1986), has often been suggested as a trait that increases species' invasiveness (Schlichting and Levin, 1986; Rejmanek and Richardson 1996; Sultan 2001; Pigliucci 2005). This is because phenotypic plasticity should enable a colonizing species to tolerate a wide range of environmental conditions (Schlichting and Levin, 1986), as well as take advantage of environmental fluctuations, thereby facilitating population establishment (Callaway et al. 2003; Pigliucci 2005). Invasive species are expected to have high phenotypic plasticity (Marshall & Jain, 1968; Sultan 2001; Pigliucci 2005), and while empirical evidence exists (Pattison et al. 1998; Gerlach & Rice 2003; Funk 2008), there is a need for more phylogenetically and ecologically equivalent comparisons between invaders and non-invaders (Muth and Pigliucci, 2006; Funk, 2008).

Plant biomass, nutrient use efficiency, and other physiological traits related to nutrient metabolism and plant performance, as well as the phenotypic plasticity of these traits, are all factors that are expected to influence the ability of species to colonize novel environments, establish populations, and potentially become invasive. In this study, we focus on nitrogen physiology because it is the most common limiting nutrient for plants in terrestrial temperate ecosystems (Vitousek & Howarth 1991; Reich and Oleksyn 2004). We investigated nitrogen use efficiency (NUE), root and shoot biomass, and carbon and nitrogen leaf content in three very closely related and ecologically similar plant species belonging to the genus Capsella.

There are several stages to a biological invasion. These include the initial introduction, population establishment, growth and range expansion, followed by the out-competing or displacing of native species, or other negative impacts on the ecosystem (Sakai et al. 2001). Invasives must be both successful colonizers, able to establish and expand their populations, and good competitors, if they are to out-compete native species. Here, we focus on the colonization stage, which is a prerequisite for becoming an invader, and yet has received relatively little attention compared to the competition stage.

The genus Capsella is ideal for a study of this nature, as the three species, Capsella grandiflora, C. rubella, and C. bursa-pastoris, are closely related, but differ greatly in their colonization history, from completely range-restricted, to non-invasive successful colonizer, to world-wide colonizer. Capsella grandiflora is an obligately outcrossing annual herb that is closely related to the genetic model Arabidopsis thaliana. It is native to Western Greece, and its geographic range is largely restricted to this area (Hurka & Neuffer, 1997; Paetsch et al. 2006). The effective population size is large, approximately 500,000 individuals, and it shows no evidence for recent changes in population size (Foxe et al. 2009). There is also relatively little population structure in this species (Foxe et al. 2009; Slotte et al. 2010).

Capsella rubella diverged from C. grandiflora in a single event that is estimated to have occurred within the last 20 000 to 139 000 years, a recent event in evolutionary time (St. Onge, 2011). This speciation was associated with a transition in mating system from outcrossing to self-fertilizing, and this was followed by a geographic range expansion throughout most of Southern Europe, as well as Middle Europe, North Africa, Australia, and America (Hurka & Neuffer, 1997; Paetsch et al. 2006). Genetic diversity is greatly reduced in C. rubella compared to C. grandiflora, due to a nearly complete population bottleneck associated with the speciation event combined with the mating system transition (Foxe et al. 2009). The effective population size of Capsella rubella is approximately 100- to 1500-fold smaller than that of C. grandiflora (Foxe et al. 2009).

The third member of this genus is Capsella bursa-pastoris, which also diverged from C. grandiflora (Hurka & Neuffer, 1997), but less recently, an estimated 43 000 to 430 000 years ago (Slotte et al. 2006; St. Onge et al. 2012). There is evidence for a severe founder event during speciation, followed by population expansion (Slotte et al. 2008). It is predominantly self-fertilizing, and is tetraploid, while the other two Capsella species are diploid. The distribution of C. bursa-pastoris is worldwide. It is one of the most successful colonizing angiosperms in the world (Hurka & Neuffer, 1997).

In this study, we investigate the traits related to plant performance and stoichiometry in order to determine if they are involved in the colonization phase of biological invasion, by comparing these traits in the three closely related species in the genus Capsella, which have different colonization histories. We address the following three questions:

1) Are the successful colonizers, Capsella rubella and Capsella bursa-pastoris, more efficient in their nitrogen use than the geographically stable Capsella grandiflora, and is the response proportional to the historical colonization success of each species?

2) Do the colonizers differ from C. grandiflora in other traits which are expected to affect ecological success in new environments, including plant performance, root allocation, leaf nitrogen content, and nitrogen requirements?

3) Do the three species differ in their phenotypic plasticity of these traits? Do the successful colonizers show patterns that are distinct from the geographically limited species? Is plasticity correlated with the historical colonization success of each species?

Materials and Methods

Capsella grandiflora seeds were collected by Barbara Neuffer (University of Osnabrück, Germany) in Greece; C. rubella and C. bursa-pastoris seeds were collected in Spain, Greece, Argentina, and Germany (Appendix S1, see Supplemental Data with the online version of this article). For each species, seeds from five populations, and four different accessions (mother plants) within each population, were included in the study. Six seeds per accession were used, two for each nitrogen level (Appendix S2, see Supplemental Data with the online version of this article), to make a total of 360 plants.
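The design arithmetic above can be checked with a short sketch. The Python below is illustrative only: it enumerates one record per plant for a fully crossed design of three species, five populations per species, four accessions per population, three nitrate levels, and two replicate seeds per level. The population and accession indices are placeholders rather than the actual collection sites, and the nitrate concentrations are those given later in this section.

```python
from itertools import product

SPECIES = ["C. grandiflora", "C. rubella", "C. bursa-pastoris"]
POPULATIONS_PER_SPECIES = 5            # five populations per species
ACCESSIONS_PER_POPULATION = 4          # four mother plants per population
NITRATE_LEVELS_MM = [10.0, 3.0, 0.1]   # high, medium, low nitrate (mM)
REPLICATES_PER_LEVEL = 2               # two seeds per accession per level


def enumerate_plants():
    """Return one record per plant in the fully crossed design."""
    plants = []
    for species, pop, acc, nitrate, rep in product(
        SPECIES,
        range(1, POPULATIONS_PER_SPECIES + 1),
        range(1, ACCESSIONS_PER_POPULATION + 1),
        NITRATE_LEVELS_MM,
        range(1, REPLICATES_PER_LEVEL + 1),
    ):
        plants.append(
            {
                "species": species,
                "population": pop,      # placeholder index, not a site name
                "accession": acc,       # placeholder index
                "nitrate_mM": nitrate,
                "replicate": rep,
            }
        )
    return plants


plants = enumerate_plants()
print(len(plants))  # 3 species * 5 populations * 4 accessions * 6 seeds
```

Each accession contributes six plants (three nitrate levels times two replicates), which is where the total of 360 comes from.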

Seeds were sterilized, plated on half-strength Murashige-Skoog nutrient medium, and placed in the fridge at 4°C to be vernalized for three weeks, after which they were removed from the fridge and stored at room temperature for several days to germinate, until the seedlings were large enough to be planted. Seedlings were planted in 495 ml square pots, one plant per pot, in a 1:1 mixture of Profile and Turface, two nutrient-free potting mixes whose combination resembles the water retention of soil. Fertilization began immediately after planting. Plants were treated with nutrient solutions containing high (10 mM), medium (3 mM), and low (0.1 mM) levels of nitrate, and sufficient levels of all other plant essential nutrients, following Loudet et al. (2003) (Appendix 4, see Supplemental Data with the online version of this article). Nutrient solutions were stabilized at a pH between 6 and 6.5. Pots were arranged in trays in the greenhouse in a completely randomized Latin square design (Appendix 5, see Supplemental Data with the online version of this article). Trays were rotated clockwise, and moved one tray position to the right, every two weeks. Time to flowering was recorded when the first flower appeared. Plants were harvested at flowering, dried in a drying oven for 48 hours at 60°C, and weighed to obtain the root and shoot biomass at flowering. Dried leaves were ground to a fine powder, and their carbon and nitrogen content was measured with a C:N Analyzer. Nitrogen use efficiency was calculated by dividing shoot dry weight (g) by leaf nitrogen content (g), following Good et al. (2004).
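The NUE calculation described above is a simple ratio. A minimal Python sketch follows, assuming both quantities are expressed in grams; the function name and the example values are hypothetical and are not taken from the study's data.

```python
def nitrogen_use_efficiency(shoot_dry_weight_g: float, leaf_nitrogen_g: float) -> float:
    """Return NUE as shoot dry weight (g) divided by leaf nitrogen content (g)."""
    if leaf_nitrogen_g <= 0:
        raise ValueError("leaf nitrogen content must be positive")
    return shoot_dry_weight_g / leaf_nitrogen_g


# Hypothetical plant: 2.0 g shoot dry weight and 0.05 g leaf nitrogen.
example_nue = nitrogen_use_efficiency(2.0, 0.05)
print(example_nue)
```

Under this definition, a plant that produces the same shoot biomass with less leaf nitrogen has a higher NUE.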

Data for all plant traits were analyzed by fitting a mixed effects model, hereafter referred to as 'Model 1', in which the explanatory variables were nitrogen treatment level, species, and their interaction, as fixed effects. The random effect was accession (maternal family) nested within population. The explanatory power of the model was evaluated by comparison to the null model using a log likelihood ratio test. Explanatory factors were assessed by comparing the full model with a reduced model, excluding the factor under assessment, using a chi-square test. We also considered a second mixed effects model, hereafter referred to as 'Model 2', which included nitrogen treatment level and population, as well as the interaction of nitrogen treatment by population, as fixed effects. Accession nested within population was again included as a random effect. All statistical analyses were performed using the R 2.13.1 statistical programming environment (R Development Core Team, 2005).
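The model comparisons described above rest on likelihood ratio tests between nested models. The sketch below illustrates only that final step, in Python rather than the R environment used for the analyses: given the maximized log-likelihoods of a full and a reduced model, it computes the test statistic and a chi-square p-value. The function names and the log-likelihood values are hypothetical, and the chi-square tail probability is computed with a simple series expansion instead of a statistics library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2. This is adequate for the
    moderate statistics typical of model comparisons, but loses precision for
    very small tail probabilities.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df_diff: int):
    """Return (statistic, p_value) for a full model against a nested reduced model."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return statistic, chi2_sf(statistic, df_diff)


# Hypothetical log-likelihoods; the models differ by two parameters.
statistic, p_value = likelihood_ratio_test(-120.0, -125.0, df_diff=2)
print(statistic, p_value)
```

Here `df_diff` is the number of parameters dropped from the full model to obtain the reduced one, for example the interaction terms when testing the nitrogen by species interaction.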

Results

Nitrogen use efficiency (NUE) for the three species and nitrogen levels is shown in Figure 1A. All species responded to differences in nitrogen treatment, increasing their nitrogen use efficiency with decreasing nitrogen availability. All species were most efficient in the low (0.1 mM) nitrogen treatment and least efficient in the high (10 mM) nitrogen treatment, though the difference between the medium and low treatments was much greater than that between the medium and high treatments (Figure 1A).

Nitrogen use efficiency for all eight populations having at least three samples per treatment is shown in Figure 1B. The variability between populations, both within and between species, is striking, which indicates that population is a much better predictor of nitrogen use efficiency than species is. Furthermore, there is a significant interaction between population and nitrogen treatment in predicting nitrogen use efficiency (Table 2), indicating that populations differ in their levels of phenotypic plasticity for this trait.

All other plant traits measured, including whole plant biomass (Figure 2A), root:shoot ratio (Figure 3A), aboveground biomass (Appendix S5A, see Supplemental Data with the online version of this article), root biomass (Appendix S6A), leaf percent carbon (Appendix S7A), leaf percent nitrogen (Appendix S8A), and leaf carbon:nitrogen ratio (Appendix S9A), show patterns similar to NUE in that all three species responded to the different nitrogen treatments, but there was no clear difference in the response of individual species (Table 1). Biomass increased with nitrogen supply (Figure 2A), as did shoot biomass (Appendix S5A), root biomass (Appendix S6A), and percent nitrogen (Appendix S8A). Root to shoot ratio decreased with increasing nitrogen availability (Figure 3A), as did percent carbon (Appendix S7A) and the carbon to nitrogen ratio (Appendix S9A).

Model 1, in which nitrogen treatment level, species, and their interaction were fixed effects and accession nested within population was the random effect, described the data significantly better than the null model for all measured plant traits (Table 1), according to a log likelihood ratio test (p<0.001 for root:shoot ratio, p<0.0001 for all other traits). Species was not a significant explanatory factor for any trait, nor was the interaction term (Table 1), but nitrogen treatment was highly significant for all traits in this model (Table 1, p<0.0001).

As species was not a significant predictor for any of the measured plant traits, a second model, Model 2, was built with population as an explanatory factor in place of species. Populations with fewer than three samples per treatment were excluded from this analysis. This model, like Model 1, described the data significantly better than the null model (Table 2, p<0.0001). Nitrogen treatment, population, and the nitrogen treatment by population interaction were all significant predictors of all measured plant traits (Table 2, p<0.05 in every case).
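The exclusion rule for Model 2 (dropping populations with fewer than three samples in any treatment) can be illustrated with a short Python sketch. The record layout, the treatment names, and the population labels in the example are hypothetical and chosen only for illustration; the study's own filtering was presumably done in R.

```python
from collections import Counter

TREATMENTS = ("low", "medium", "high")


def populations_to_keep(records, min_per_treatment=3):
    """Return populations with enough plants in every nitrogen treatment.

    records: iterable of (population, treatment) pairs, one pair per plant.
    """
    counts = Counter(records)
    populations = {population for population, _ in counts}
    return sorted(
        population
        for population in populations
        # Counter returns 0 for a (population, treatment) pair with no plants.
        if all(counts[(population, t)] >= min_per_treatment for t in TREATMENTS)
    )


# Hypothetical data: "gr1" has at least three plants per treatment,
# while "ru4" has only two plants in the medium treatment.
example_records = (
    [("gr1", "low")] * 3 + [("gr1", "medium")] * 3 + [("gr1", "high")] * 4
    + [("ru4", "low")] * 3 + [("ru4", "medium")] * 2 + [("ru4", "high")] * 3
)
kept = populations_to_keep(example_records)
```

Here only "gr1" is retained, because "ru4" falls below the threshold in one treatment.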

Population was a significant predictor of all measured plant traits, and there were significant interactions between populations and nitrogen treatments, indicating population-level differences in phenotypic plasticity for all traits (Table 2). The variability between populations for these traits is shown in Figures 1B-3B (NUE, total biomass, and root:shoot ratio) and Appendix S5B-S9B (aboveground biomass, root biomass, leaf percent carbon, leaf percent nitrogen, and leaf carbon:nitrogen ratio).

Discussion

Despite extensive research efforts, the struggle to understand species invasiveness is ongoing. It has been suggested that opportunistic species with the ability to take advantage of available nutrients, hardy species that are equipped to withstand harsh conditions by using resources more efficiently, and species with higher levels of phenotypic plasticity are more likely to become good invaders than species more limited in their environmental tolerance (Chun et al. 2007; Drenovsky et al. 2008; Funk 2008; Hastwell et al. 2008; Gonzalez et al. 2010). Here, we investigated nitrogen use efficiency and other plant stoichiometric and performance-related traits in three species in the genus Capsella that vary widely in their colonization history, but are closely related and otherwise ecologically similar (Hurka & Neuffer, 1997; Paetsch et al. 2006), thus fulfilling the need for phylogenetically equivalent comparisons between invasive and non-invasive species (Funk, 2008).

We sought to address whether the successful colonizers of this genus were more efficient in their nitrogen use than the non-colonizer, Capsella grandiflora, as well as whether they performed better overall, or showed differences in their tissue nitrogen and carbon concentrations, or the amount of biomass they partitioned to their root systems, which might indicate different nitrogen use strategies (Evans, 1989; Fransen et al., 1999).

We also investigated the degree of phenotypic plasticity for these traits in all three species, in order to determine whether the colonizers were more plastic than the range-restricted species, and whether plasticity was correlated with the degree of colonization success.

We found no evidence that nitrogen use efficiency (NUE), total biomass, root to shoot biomass ratio, leaf tissue nitrogen or carbon concentration, or the leaf carbon to nitrogen mass ratio differed between the three species Capsella grandiflora, C. rubella, and C. bursa-pastoris. All species responded similarly to the differences in nitrogen concentration in their soil, increasing efficiency when nitrogen was limited (Table 1, Figures 1A-3A, Appendices S5A-S9A). Rather, there were major differences between the populations of each species in their responses to different levels of nitrogen availability (Table 2). Furthermore, no differences in phenotypic plasticity were observed between species in the investigated traits, as indicated by the non-significant interaction factor in Model 1 (Table 1).

There were highly significant differences between populations for all traits examined, as well as in the phenotypic plasticity of these traits, which is measured by the nitrogen treatment by population interaction factor in Model 2 (Table 2). This indicates that there are much greater differences among populations than between species. We can therefore conclude that differences in nitrogen use efficiency, plant performance, root allocation, and tissue concentrations of carbon and nitrogen, as well as the phenotypic plasticity for these traits, are not responsible for the large differences in colonization history between the three Capsella species. Although we cannot determine whether the population differences are adaptive with our current analysis, the variation among populations highlights that there is clearly plenty of scope for local adaptation, even in the highly bottlenecked selfing species.

We find no evidence for the fluctuating resource hypothesis, which suggests that invasive species are better able to take advantage of excess available nutrients in disturbed areas or those with high atmospheric nitrogen deposition (Davis et al. 2000; Blumenthal 2005), nor do we find evidence that invasive species may be more tolerant of low nutrient systems by having increased nitrogen use efficiency (Funk and Vitousek 2007). Our data are not consistent with other studies that have shown invasives to outperform native species through their enhanced abilities to capture nutrients, including having higher nitrogen use efficiency and leaf nitrogen concentration, allocating a larger proportion of their biomass to root systems, and generally displaying higher levels of phenotypic plasticity (Richards et al. 2006; Drenovsky et al. 2008; Funk 2008).

However, the aforementioned studies involve comparisons between invasive plants and the natives that co-occur in their invaded range, whereas our comparison between the three Capsella species involves close relatives that originated from the same geographic area and have different colonization histories and patterns of range expansion. A recent meta-analysis found similar results with respect to phenotypic plasticity. Palacio-Lopez and Gianoli (2011) compared 93 pairs of invasive and non-invasive closely related plant species and found no difference in phenotypic plasticity between the two groups. These results imply that the advantage attained by invasive species through nutrient acquisition and use efficiency in low and high resource environments may be more related to a competitive advantage over co-occurring native species than to the success of the initial colonization and population establishment phase of invasion.

As the major differences between the three Capsella species are their mating systems and ploidy levels, these may be the traits affecting colonization ability and the degree of range expansion in this genus, as has been previously suggested (Hurka and Neuffer, 1997). The range expansion of Capsella rubella coincided with its divergence from C. grandiflora and a shift to self-compatibility, and the origin of C. bursa-pastoris was also followed by a large range expansion (Hurka and Neuffer, 1997; Slotte et al. 2006, 2008; Foxe et al. 2009). The greater range expansion of Capsella bursa-pastoris compared to C. rubella may be due to more time having passed since its divergence from C. grandiflora, an estimated 43 000 to 430 000 years ago (Slotte et al. 2006) compared to 20 000 years ago for C. rubella (Foxe et al. 2009). It is also possible, though, that the great colonization success of C. bursa-pastoris is attributable, at least in part, to its polyploidy, which is a trait commonly present in good colonizers and invasive species (Hurka and Neuffer, 1997; Pandit et al. 2011). It has been previously suggested that polyploidy provided C. bursa-pastoris with the genetic variability to extend its range beyond that of the other Capsella species (Hurka and Neuffer, 1997), particularly because its disomic inheritance increased intraspecific genetic diversity, and could have helped to avoid inbreeding depression (Hintz et al. 2006).

Whatever the role of mating system and ploidy level may be, our results provide no evidence that the differences in colonization histories and range expansion between Capsella grandiflora, C. rubella, and C. bursa-pastoris are due to the performance and nitrogen-use related traits examined here.

Given the large amount of variation between populations in overall trait measurements, as well as in phenotypic plasticity, local adaptation may be the major driving force of evolution in the Capsella genus with regard to the nitrogen use and plant performance traits that were investigated here. The variation may also be a result of relaxed constraint, if the plant traits investigated are not under pressure from natural selection. Relaxed constraint of nitrogen use efficiency under high nitrogen levels has been demonstrated in Arabidopsis lyrata. Populations were found to have vastly different levels of nitrogen use efficiency under low levels of nitrogen, as those in areas with high amounts of historical atmospheric nitrogen deposition had lost their plastic ability to become more efficient when nitrogen is limiting (Vergeer et al. 2008). It is possible that a similar situation has arisen in the different Capsella populations, but information on soil nitrogen levels in the geographic sampling areas would be needed to evaluate this hypothesis.

Since a great deal of variation exists between populations of all three species, we suggest that it may have existed prior to the divergence of either colonizing species from its ancestor, C. grandiflora, and that this genetic variation is not the product of novel selection pressures or evolution following range expansion.

Further investigation into the traits conferring advantages in the colonization phase of species invasion, rather than the competition phase, may be beneficial in identifying invaders early, before their populations grow to unmanageable levels. This may lead to more effective preventative measures against the spread of invasive species.

Tables

Measurement                Model 1   Nitrogen Treatment   Species   Nitrogen Treatment * Species
NUE                        ***       ***                  -         -
Whole plant biomass (g)    ***       ***                  -         -
Aboveground biomass (g)    ***       ***                  -         -
Root Biomass (g)           ***       ***                  -         -
Root:Shoot (mass ratio)    **        ***                  -         -
Percent Carbon             ***       ***                  -         -
Percent Nitrogen           ***       ***                  -         -
C:N (mass ratio)           ***       ***                  -         -

*** p<0.001, ** p<0.01, * p<0.05, - not significant (p >= 0.05)

Table 1. Statistical significance of the first full model and its explanatory factors, as predictors of the measurements. The model was assessed by comparison to the null model using a log likelihood ratio test. Explanatory factors were assessed by comparing the model with a reduced model excluding the factor.

Measurement                Model 2   Nitrogen Treatment   Population   Nitrogen Treatment * Population
NUE                        ***       ***                  ***          ***
Whole plant biomass (g)    ***       ***                  ***          ***
Aboveground biomass (g)    ***       ***                  ***          ***
Root Biomass (g)           ***       ***                  ***          ***
Root:Shoot (mass ratio)    ***       ***                  *            **
Percent Carbon             ***       ***                  ***          **
Percent Nitrogen           ***       ***                  ***          ***
C:N (mass ratio)           ***       ***                  ***          ***

*** p<0.001, ** p<0.01, * p<0.05, - not significant (p >= 0.05)

Table 2. Statistical significance of the second full model and its explanatory factors, as predictors of the measurements. The model was assessed by comparison to the null model using a log likelihood ratio test. Explanatory factors were assessed by comparing the model with a reduced model excluding the factor.

Figures

[Figure 1, panels A and B. Box plots of nitrogen use efficiency by nitrogen treatment (Low, Medium, High). Panel A legend: Capsella grandiflora, Capsella rubella, Capsella bursa-pastoris. Panel B legend: populations gr1, gr2, gr5, ru1, ru2, ru3, bp1, bp5.]

Figure 1. Nitrogen use efficiency for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization. Midlines represent median values, boundaries of the boxes are the interquartile ranges (25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to 1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend, Capsella grandiflora populations are denoted "gr" with a number denoting the individual population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp" respectively. Populations containing fewer than two individuals in one or more nitrogen treatment level were excluded in the graph.

[Figure 2, panels A and B. Box plots of log total biomass by nitrogen treatment (Low, Medium, High). Panel A legend: Capsella grandiflora, Capsella rubella, Capsella bursa-pastoris. Panel B legend: individual populations.]

Figure 2. Log values of total biomass (grams) for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization. Midlines represent median values, boundaries of the boxes are the interquartile ranges (25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to 1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend, Capsella grandiflora populations are denoted "gr" with a number denoting the individual population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp" respectively. Populations containing fewer than two individuals in one or more nitrogen treatment level were excluded in the graph.

[Figure 3, panels A and B. Box plots of root to shoot ratio by nitrogen treatment (Low, Medium, High). Panel A legend: Capsella grandiflora, Capsella rubella, Capsella bursa-pastoris. Panel B legend: populations gr1, gr2, gr5, ru1, ru2, ru3, bp1, bp5.]

Figure 3. Root to shoot mass ratio for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization. Midlines represent median values, boundaries of the boxes are the interquartile ranges (25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to 1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend, Capsella grandiflora populations are denoted "gr" with a number denoting the individual population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp" respectively. Populations containing fewer than two individuals in one or more nitrogen treatment level were excluded in the graph.

References

Bloom, AJ, FS Chapin, HA Mooney. 1985. Resource limitation in plants - an economic analogy. Annual Review of Ecology and Systematics 16:363-392.

Blumenthal, D. 2005. Ecology - Interrelated causes of plant invasion. Science 310:243-244.

Callaway, RM, SC Pennings, CL Richards. 2003. Phenotypic plasticity and interactions among plants. Ecology 84:1115-1128.

Chun, YJ, ML Collyer, KA Moloney, JD Nason. 2007. Phenotypic plasticity of native vs. invasive purple loosestrife: A two-state multivariate approach. Ecology 88:1499-1512.

Davis, MA, JP Grime, K Thompson. 2000. Fluctuating resources in plant communities: a general theory of invasibility. Journal of Ecology 88:528-534.

Drenovsky, RE, CE Martin, MR Falasco, JJ James. 2008. Variation in resource acquisition and utilization traits between native and invasive perennial forbs. American Journal of Botany 95:681-687.

Elser, JJ, DR Dobberfuhl, NA MacKay, JH Schampel. 1996. Organism size, life history, and N:P stoichiometry. Bioscience 46:674-684.

Evans, JR. 1989. Photosynthesis and nitrogen relationships in leaves of C3 plants. Oecologia 78:9-19.

Farris, MA, MJ Lechowicz. 1990. Functional interactions among traits that determine reproductive success in a native annual plant. Ecology 71:548-557.

Foxe, JP, T Slotte, EA Stahl, B Neuffer, H Hurka, SI Wright. 2009. Recent speciation associated with the evolution of selfing in Capsella. Proceedings of the National Academy of Sciences of the United States of America 106:5241-5245.

Fransen, B, H de Kroon, F Berendse. 1998. Root morphological plasticity and nutrient acquisition of perennial grass species from habitats of different nutrient availability. Oecologia 115:351-358.

Fransen, B, H De Kroon, CGF De Kovel, F Van den Bosch. 1999. Disentangling the effects of root foraging and inherent growth rate on plant biomass accumulation in heterogeneous environments: A modelling study. Annals of Botany 84:305-311.

Funk, JL. 2008. Differences in plasticity between invasive and native plants from a low resource environment. Journal of Ecology 96:1162-1173.

Funk, JL, PM Vitousek. 2007. Resource-use efficiency and plant invasion in low-resource systems. Nature 446:1079-1081.

Gerlach, JD, KJ Rice. 2003. Testing life history correlates of invasiveness using congeneric plant species. Ecological Applications 13:167-179.

Givnish, TJ. 2002. Ecological constraints on the evolution of plasticity in plants. Evolutionary Ecology 16:213-242.

Gonzalez, AL, JS Kominoski, M Danger, S Ishida, N Iwai, A Rubach. 2010. Can ecological stoichiometry help explain patterns of biological invasions? Oikos 119:779-790.

Guo, YL, JS Bechsgaard, T Slotte, B Neuffer, M Lascoux, D Weigel, MH Schierup. 2009. Recent speciation of Capsella rubella from Capsella grandiflora, associated with loss of self-incompatibility and an extreme bottleneck. Proceedings of the National Academy of Sciences of the United States of America 106:5246-5251.

Hastwell, GT, AJ Daniel, G Vivian-Smith. 2008. Predicting invasiveness in exotic species: do subtropical native and invasive exotic aquatic plants differ in their growth responses to macronutrients? Diversity and Distributions 14:243-251.

Hastwell, GT, FD Panetta. 2005. Can differential responses to nutrients explain the success of environmental weeds? Journal of Vegetation Science 16:77-84.

Hintz, M, C Bartholmes, P Nutt, J Ziermann, S Hameister, B Neuffer, G Theissen. 2006. Catching a 'hopeful monster': shepherd's purse (Capsella bursa-pastoris) as a model system to study the evolution of flower development. Journal of Experimental Botany 57:3531-3542.

Hurka, H, B Neuffer. 1997. Evolutionary processes in the genus Capsella (Brassicaceae). Plant Systematics and Evolution 206:295-316.

Marshall, DR, SK Jain. 1968. Phenotypic plasticity of Avena fatua and A. barbata. American Naturalist 102:457-&.

Muth, NZ, M Pigliucci. 2006. Traits of invasives reconsidered: Phenotypic comparisons of introduced invasive and introduced noninvasive plant species within two closely related clades. American Journal of Botany 93:188-196.

Paetsch, M, S Mayland-Quellhorst, B Neuffer. 2006. Evolution of the self-incompatibility system in the Brassicaceae: identification of S-locus receptor kinase (SRK) in self-incompatible Capsella grandiflora. Heredity 97:283-290.

Palacio-Lopez, K, E Gianoli. 2011. Invasive plants do not display greater phenotypic plasticity than their native or non-invasive counterparts: a meta-analysis. Oikos 120:1393-1401.

Pandit, MK, MJO Pocock, WE Kunin. 2011. Ploidy influences rarity and invasiveness in plants. Journal of Ecology 99:1108-1115.

Pattison, RR, G Goldstein, A Ares. 1998. Growth, biomass allocation and photosynthesis of invasive and native Hawaiian rainforest species. Oecologia 117:449-459.

Pigliucci, M. 2005. Evolution of phenotypic plasticity: where are we going now? Trends in Ecology & Evolution 20:481-486.

Reich, PB, J Oleksyn. 2004. Global patterns of plant leaf N and P in relation to temperature and latitude. Proceedings of the National Academy of Sciences of the United States of America 101:11001-11006.

Rejmanek, M, DM Richardson. 1996. What attributes make some plant species more invasive? Ecology 77:1655-1661.

Richards, CL, O Bossdorf, NZ Muth, J Gurevitch, M Pigliucci. 2006. Jack of all trades, master of some? On the role of phenotypic plasticity in plant invasions. Ecology Letters 9:981-993.

Richardson, DM, P Pysek. 2006. Plant invasions: merging the concepts of species invasiveness and community invasibility. Progress in Physical Geography 30:409-431.

Robinson, D. 1994. The responses of plants to nonuniform supplies of nutrients. New Phytologist 127:635-674.

Sakai, AK, FW Allendorf, JS Holt, et al. 2001. The population biology of invasive species. Annual Review of Ecology and Systematics 32:305-332.

Schlichting, CD, DA Levin. 1986. Phenotypic plasticity - an evolving plant character. Biological Journal of the Linnean Society 29:37-47.

Simberloff, D. 2005. Non-native species do threaten the natural environment! Journal of Agricultural & Environmental Ethics 18:595-607.

Slotte, T, A Ceplitis, B Neuffer, H Hurka, M Lascoux. 2006. Intrageneric phylogeny of Capsella (Brassicaceae) and the origin of the tetraploid C. bursa-pastoris based on chloroplast and nuclear DNA sequences. American Journal of Botany 93:1714-1724.

Slotte, T, JP Foxe, KM Hazzouri, SI Wright. 2010. Genome-Wide Evidence for Efficient Positive and Purifying Selection in Capsella grandiflora, a Plant Species with a Large Effective Population Size. Molecular Biology and Evolution 27:1813-1821.

St Onge, KR, T Kallman, T Slotte, M Lascoux, AE Palme. 2011. Contrasting demographic history and population structure in Capsella rubella and Capsella grandiflora, two closely related species with different mating systems. Molecular Ecology 20:3306-3320.

Sultan, SE. 2001. Phenotypic plasticity and ecological breadth in plants. American Zoologist 41:1599-1599.

Sultan, SE. 2001. Phenotypic plasticity for fitness components in Polygonum species of contrasting ecological breadth. Ecology 82:328-343.

Vitousek, P. 1982. Nutrient cycling and nutrient use efficiency. American Naturalist 119:553-572.

Vitousek, PM, JD Aber, RW Howarth, GE Likens, PA Matson, DW Schindler, WH Schlesinger, DG Tilman. 1997. Human alteration of the global nitrogen cycle: Sources and consequences. Ecological Applications 7:737-750.

Vitousek, PM, RW Howarth. 1991. Nitrogen limitation on land and in the sea - how can it occur. Biogeochemistry 13:87-115.

Appendix and Supplementary Material

Appendix S1. Seed sample genotype labels and collecting locations.

Population  Capsella grandiflora              Capsella rubella                           Capsella bursa-pastoris
            Label  Sample  Location           Label  Sample     Location                 Label  Sample   Location
1           G1     910/18  Korfu, Greece      R1     1209/38-2  Teneriffe, Spain         b1     1475/9   Tartastan, Russia
            G2     910/20  Korfu, Greece      R2     1209/26-2  Teneriffe, Spain         b2     1475/8   Tartastan, Russia
            G3     910/21  Korfu, Greece      R3     1209/24    Teneriffe, Spain         b3     1475/6   Tartastan, Russia
            G4     910/19  Korfu, Greece      R4     1209/36    Teneriffe, Spain         b4     1475/13  Tartastan, Russia
2           G5     88.21   Ioaninna, Greece   R5     925/9      Igoumenitsa, Greece      b5     578/10   Parkplatzes, Germany
            G6     88.12   Ioaninna, Greece   R6     925/4      Igoumenitsa, Greece      b6     578/3    Parkplatzes, Germany
            G7     88.58   Ioaninna, Greece   R7     925/3      Igoumenitsa, Greece      b7     578/15   Parkplatzes, Germany
            G8     88.55   Ioaninna, Greece   R8     925/6      Igoumenitsa, Greece      b8     578/4    Parkplatzes, Germany
3           G9     926/8   Korfu, Greece      R9     1377/2     Buenos Aires, Argentina  b9     1171/13  Oulo, Finland
            G10    926/4   Korfu, Greece      R10    1377/5     Buenos Aires, Argentina  b10    1171/14  Oulo, Finland
            G11    926/3   Korfu, Greece      R11    1377/10    Buenos Aires, Argentina  b11    1171/12  Oulo, Finland
            G12    926/1   Korfu, Greece      R12    1377/18    Buenos Aires, Argentina  b12    1171/15  Oulo, Finland
4           G13    928/6   Metsovo, Greece    R13    1504/10    La Palma, Italy          b13    740/4    Nevada, USA
            G14    928/7   Metsovo, Greece    R14    1504/11    La Palma, Italy          b14    740/1    Nevada, USA
            G15    928/5   Metsovo, Greece    R15    1504/8     La Palma, Italy          b15    740/8    Nevada, USA
            G16    928/4   Metsovo, Greece    R16    1504/2     La Palma, Italy          b16    740/7    Nevada, USA
5           G17    934/32  Metsovo, Greece    R17    698/2      Die Hutte, Germany       b17    1272/17  Matalascanas, Spain
            G18    934/30  Metsovo, Greece    R18    698/4      Die Hutte, Germany       b18    1272/20  Matalascanas, Spain
            G19    934/29  Metsovo, Greece    R19    698/8      Die Hutte, Germany       b19    1272/19  Matalascanas, Spain
            G20    934/31  Metsovo, Greece    R20    698/1      Die Hutte, Germany       b20    1272/16  Matalascanas, Spain

Appendix S2. A branch diagram representing the sampling design for this study.

[Branch diagram: 3 species (Capsella rubella, Capsella grandiflora, Capsella bursa-pastoris); each species with 5 populations (Pop 1 to Pop 5); each population with 4 families (Fam 1 to Fam 4); each family with 2 seeds in each nitrogen treatment.]

Appendix S3. Randomized Latin Square design used in the greenhouse. Coloured squares represent pots. Rectangles labeled "Tray" represent trays containing the pots. Each pot is labeled with the genotype of the seed it contains. The first letters of the genotype labels and the colours of the squares correspond to the species: green squares are Capsella grandiflora (G), pink squares are C. rubella (R), and blue squares are C. bursa-pastoris (b). This design was replicated for each nitrogen treatment.

[Layout diagram: Nitrogen Treatment 1 (Low); numbered trays, each with pot positions 1 to 6.]

Appendix S4. Nutrient Solution Recipes.

10 mM Nitrate Solution:
5 mM KNO3 potassium nitrate
2.5 mM Ca(NO3)2 calcium nitrate
0.25 mM KH2PO4 potassium phosphate (monobasic)
0.25 mM MgSO4 magnesium sulfate (anhydrous)
0.2 mM NaCl sodium chloride

3 mM Nitrate Solution:
2.5 mM KNO3 potassium nitrate
0.25 mM Ca(NO3)2 calcium nitrate
0.25 mM KH2PO4 potassium phosphate
0.25 mM MgSO4 magnesium sulfate
0.25 mM CaCl2 calcium chloride
0.2 mM NaCl sodium chloride

0.1 mM Nitrate Solution:
0.1 mM KNO3 potassium nitrate
0.25 mM KH2PO4 potassium phosphate
0.325 mM K2SO4 potassium sulfate
0.25 mM MgSO4 magnesium sulfate
0.5 mM CaCl2 calcium chloride
0.2 mM NaCl sodium chloride
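As a quick arithmetic check on the recipes above, the total nitrate concentration of each solution can be computed from its nitrate-bearing salts, counting one nitrate per KNO3 and two per Ca(NO3)2. The following Python sketch is illustrative only; the dictionary and function names are hypothetical, and only the nitrate-bearing salts from the recipes are listed.

```python
# Nitrate ions contributed per formula unit of each nitrate-bearing salt.
NITRATE_PER_FORMULA = {"KNO3": 1, "Ca(NO3)2": 2}

# Salt concentrations in mM, taken from the recipes in Appendix S4.
RECIPES_MM = {
    "10 mM": {"KNO3": 5.0, "Ca(NO3)2": 2.5},
    "3 mM": {"KNO3": 2.5, "Ca(NO3)2": 0.25},
    "0.1 mM": {"KNO3": 0.1},
}


def total_nitrate_mm(recipe: dict) -> float:
    """Sum nitrate (mM) over all salts; salts without nitrate contribute zero."""
    return sum(
        concentration * NITRATE_PER_FORMULA.get(salt, 0)
        for salt, concentration in recipe.items()
    )


totals = {name: total_nitrate_mm(recipe) for name, recipe in RECIPES_MM.items()}
```

The totals come to 5 + 2 x 2.5 = 10 mM, 2.5 + 2 x 0.25 = 3 mM, and 0.1 mM, matching the nominal nitrate level of each solution.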

Appendix S5. Logarithm of aboveground (shoot) biomass (grams) for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization. Midlines represent median values, boundaries of the boxes are the interquartile ranges (25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to 1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend, Capsella grandiflora populations are denoted "gr" with a number denoting the individual population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp" respectively. Populations containing fewer than two individuals in one or more nitrogen treatment level were excluded in the graph.

[Appendix S5, panels A and B. Box plots of log aboveground biomass by nitrogen treatment (Low, Medium, High). Panel A legend: Capsella grandiflora, Capsella rubella, Capsella bursa-pastoris. Panel B legend: individual populations.]

Appendix S6. Root biomass (grams) for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization. Midlines represent median values, boundaries of the boxes are the interquartile ranges (25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to 1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend, Capsella grandiflora populations are denoted "gr" with a number denoting the individual population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp" respectively. Populations containing fewer than two individuals in one or more nitrogen treatment level were excluded in the graph.

[Appendix S6, panels A and B. Box plots of root biomass by nitrogen treatment (Low, Medium, High). Panel A legend: Capsella grandiflora, Capsella rubella, Capsella bursa-pastoris. Panel B legend: individual populations.]

Appendix S7. Percent carbon contained in leaves for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization. Midlines represent median values, boundaries of the boxes are the interquartile ranges (25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to 1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend, Capsella grandiflora populations are denoted "gr" with a number denoting the individual population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp" respectively. Populations containing fewer than two individuals in one or more nitrogen treatment level were excluded in the graph.

[Appendix S7, panels A and B. Box plots of leaf percent carbon by nitrogen treatment (Low, Medium, High). Panel A legend: Capsella grandiflora, Capsella rubella, Capsella bursa-pastoris. Panel B legend: populations gr1, gr2, gr5, ru1, ru2, ru3, bp1, bp5.]

Appendix S8. Percent nitrogen contained in leaves for (A) the three Capsella species and (B) the eight Capsella populations under three different levels of nitrogen fertilization. Midlines represent median values, boundaries of the boxes are the interquartile ranges (25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to 1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend, Capsella grandiflora populations are denoted "gr" with a number denoting the individual population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp" respectively. Populations containing fewer than two individuals in one or more nitrogen treatment level were excluded in the graph.

[Appendix S8, panels A and B. Box plots of leaf percent nitrogen by nitrogen treatment (Low, Medium, High). Panel A legend: Capsella grandiflora, Capsella rubella, Capsella bursa-pastoris. Panel B legend: individual populations.]

71 Appendix S9. Carbon to nitrogen ratio in the leaves for (A) the three Capsella species and

(B) the eight Capsella populations under three different levels of nitrogen fertilization.

Midlines represent median values, boundaries of the boxes are the interquartile ranges

(25th to 75th percentiles), and whiskers indicate the maximum and minimum values, up to

1.5 times the interquartile range. Outliers are plotted individually. (B) In the legend,

Capsella grandiflora populations are denoted "gr" with a number denoting the individual

population. Capsella rubella and C. bursa-pastoris are similarly denoted "ru" and "bp"

respectively. Populations containing fewer than two individuals in one or more nitrogen

treatment level were excluded in the graph.

[Figure: (A) box plots for Capsella grandiflora, Capsella rubella and Capsella bursa-pastoris; (B) box plots by population. x-axis in both panels: Nitrogen Treatment (Low, Medium, High).]

CHAPTER 3

SIGNATURES OF BALANCING SELECTION ARE MAINTAINED AT

DISEASE RESISTANCE LOCI FOLLOWING MATING SYSTEM EVOLUTION

AND A POPULATION BOTTLENECK IN THE GENUS CAPSELLA

Abstract

Population bottlenecks can lead to a loss of variation at disease resistance loci, which could have important consequences for the ability of populations to adapt to pathogen pressure. Alternatively, current or past balancing selection could maintain high diversity, creating a strong heterogeneity in the retention of polymorphism across the genome of bottlenecked populations. We sequenced part of the LRR region of nine NBS-LRR disease resistance genes in the outcrossing Capsella grandiflora and the recently derived, bottlenecked selfing species Capsella rubella, and compared levels and patterns of nucleotide diversity and divergence with genome-wide reference loci.

In strong contrast with reference loci, average diversity at resistance loci was comparable between C. rubella and C. grandiflora, primarily due to two loci with highly elevated diversity indicative of past or present balancing selection. Average between-species differentiation was also reduced at the set of R-genes compared with reference loci, which is consistent with the maintenance of ancestral polymorphism. Historical or ongoing balancing selection on plant disease resistance genes is a likely contributor to the retention of ancestral polymorphism in some regions of the bottlenecked Capsella rubella genome.

Introduction

The prevalence of adaptive evolution in natural populations is one of the most

widely investigated questions in evolutionary genetics. The long-held theory that the vast

majority of mutations are either neutral or strongly deleterious (Kimura 1968) has

recently come into question, in light of evidence to the contrary. Several model

organisms, including Drosophila melanogaster (Smith, Eyre-Walker 2002), Mus musculus

(Halligan et al. 2010), Escherichia coli (Charlesworth, Eyre-Walker 2006), Capsella

grandiflora (Slotte et al. 2010), and several Helianthus species (Strasburg et al. 2011), are

estimated to have large proportions (40-50%) of amino acid divergence driven to fixation

by positive selection. Estimates for other organisms, using comparable approaches, are

much lower and thus more consistent with the neutral theory. These include humans

(Zhang, Li 2005; Boyko et al. 2008), and Arabidopsis thaliana (Bustamante et al. 2002;

Nordborg et al. 2005; Foxe et al. 2008; Slotte et al. 2010). The presence and prevalence

of species-wide fixations of beneficial mutations across genomes therefore appears to

vary among taxa, and is currently a focal point of interest in the field of molecular

evolution.

One possible reason for differences in the amount of adaptive evolution between species is a difference in effective population size (Kimura 1968; Ohta 1992). Effective

population size influences the substitution rate of beneficial mutations. Smaller

populations will have lower rates of adaptive substitution compared to larger ones, in

addition to having an increased number of slightly deleterious mutations fixed by genetic drift (Ohta 1973). This will reduce the efficiency and prevalence of both positive and

purifying selection in the genome. The genetic model plant Arabidopsis thaliana has

been shown to have less efficient positive and purifying selection compared to its close

relative Capsella grandiflora, a result that is consistent with the higher population

structure, recent range expansion, and lower effective population size of A. thaliana

(Slotte et al. 2010). A comparison of another genetic model system, the house mouse

(Mus musculus), to humans showed a similar pattern according to their differences in

effective population size (Halligan et al. 2010), and an analysis of six sunflower species

(Helianthus) showed a positive correlation between effective population size and the rate

of adaptive evolution by positive selection (Strasburg et al. 2011).
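
The dependence of selection efficiency on effective population size can be illustrated with the standard diffusion approximation for the fixation probability of a new mutation. The sketch below is illustrative only and was not part of the analyses in this chapter; the function name, the assumption that census and effective size are equal, the genic (additive) fitness scheme, and the example values are all assumptions made for the illustration.

```python
import math


def fixation_probability(s, n_e):
    """Diffusion approximation for the fixation probability of a single new
    mutation with selection coefficient s (genic selection) in a diploid
    population whose census size is assumed equal to its effective size n_e."""
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral case: the initial frequency
    exponent = -4 * n_e * s
    if exponent > 700:
        return 0.0  # strongly deleterious: probability is effectively zero
    # (1 - exp(-2s)) / (1 - exp(-4 * n_e * s)), written with expm1 for accuracy
    return math.expm1(-2 * s) / math.expm1(exponent)


# Hypothetical comparison: a slightly deleterious mutation (s = -1e-5)
# in a small and in a large population, relative to a neutral mutation.
for n_e in (500, 500_000):
    relative = fixation_probability(-1e-5, n_e) / fixation_probability(0.0, n_e)
    print(f"Ne = {n_e}: fixation probability relative to neutral = {relative:.3g}")
```

Under these assumptions the slightly deleterious mutation behaves almost neutrally in the small population but is very unlikely to fix in the large one, which is the sense in which purifying selection is less efficient at low effective population size.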

In addition to the effects on the rates of positive and negative selection, species'

differences in effective population sizes can influence the strength and impact of

balancing selection. On the one hand, severe reductions in effective population size

could lead to a loss of the diversity that is maintained by balancing selection, which could

have important deleterious consequences. For instance, when considering a species'

ability to maintain resistance to parasites, balancing selection is critical at the major

histocompatibility locus (MHC) in vertebrates (fish (Xu, Sun, Wang 2010); prairie

chickens (Eimes et al. 2010); honeycreepers (Jarvi et al. 2004); voles (Bryja et al. 2006);

deer mice (Richman, Herrera, Nash 2001); foxes (Aguilar et al. 2004)), and balancing selection has also been shown in a number of cases in plants at disease resistance (R) loci

(Arabidopsis (Stahl et al. 1999; Tian et al. 2002; Tian et al. 2003); tomato (Rose, Michelmore, Langley 2007); rice (Liu et al. 2010); grasses (Meyer et al. 2010)). Strong reductions in effective size could greatly reduce the possibilities of maintaining resistance in populations (Aguilar et al. 2004). Alternatively, strong balancing selection, either past or present, could maintain high polymorphism in heavily bottlenecked populations at specific loci, showing a pronounced retention of diversity at specific regions of the genome, despite genome-wide loss of diversity. At the MHC locus, several cases of striking retention of diversity under severe bottlenecks have been found (Aguilar et al.

2004; Jarvi et al. 2004). In other cases, a loss of balancing selection has been observed at

MHC (Eimes et al. 2010). A similar pattern has been observed at the plant self-incompatibility (SI) locus, as long-term allelic variation maintained by balancing selection at SI was lost following an ancient population bottleneck of the Solanaceae

(Paape et al. 2008).

Here, we investigate the comparative population genetics of a set of disease resistance (R) genes in two plant species, Capsella grandiflora and Capsella rubella, two members of the Brassicaceae. Capsella grandiflora is an annual, self-incompatible herb that is closely related to the genetic model Arabidopsis thaliana (divergence time ~10 mya; Koch, Kiefer 2005). Capsella rubella, a recently diverged relative, is self-fertilizing, and has experienced a severe population bottleneck. The bottleneck and change in mating system resulted in a major reduction in genetic diversity and effective population size (Foxe et al. 2009). The fact that speciation is recent, coupled with the severity of the diversity reduction, makes these two species a useful system in which to explore the evolutionary fate of selected genes, in light of a dramatic shift in genetic background.

Capsella grandiflora is native to Western Greece, and its geographic range is largely restricted to this area, in addition to small populations in Albania and Northern

Italy (Hurka, Neuffer 1997; Paetsch, Mayland-Quellhorst, Neuffer 2006). Its effective population size is large, approximately 500,000 individuals, and appears to have been relatively stable over a long time period, as it shows no evidence for recent changes in population size (Foxe et al. 2009). There is also relatively little population structure in this species, and the effective rate of recombination is high (Foxe et al. 2009; St Onge et al. 2011). As stated earlier, selection has been inferred to be highly efficient in this species, with over 40% of amino acid divergence inferred to be subject to positive selection (Slotte et al. 2010).

Capsella rubella diverged from C. grandiflora in a single event that is estimated to have taken place within the last 20,000 years (Foxe et al. 2009; Guo et al. 2009).

Speciation was associated with the breakdown of self-incompatibility in C. rubella, and this species has evolved to be highly self-fertilizing (Hurka et al. 1989). The transition in mating system was followed by a geographic range expansion throughout most of

Southern Europe, as well as Middle Europe, North Africa, Australia, and North and

South America (Hurka, Neuffer 1997; Paetsch, Mayland-Quellhorst, Neuffer 2006).

Genetic diversity is greatly reduced in C. rubella compared to C. grandiflora, even more so than would be expected from inbreeding alone, due to a nearly complete population bottleneck (Foxe et al. 2009). Capsella rubella therefore has a much smaller effective population size than C. grandiflora, approximately 100- to 1500-fold smaller, as well as a lower effective recombination rate (Foxe et al. 2009). These two species represent a recent and rapid dramatic shift in genomic characteristics, including a mating system transition, a reduction in genetic diversity and effective population size following a population bottleneck, and recent widespread expansion in geographic range. Despite this severe bottleneck, however, there is strong heterogeneity in the retention of polymorphism at different loci in C. rubella (Foxe et al. 2009; St Onge et al. 2011). One

possible explanation for this heterogeneity could be the maintenance of balancing

selection and/or historical balancing selection having led to a higher retention of diversity

at a subset of genomic regions.

The genes we investigated here are a subset of the genes thought to be involved in plant immune system function, the disease resistance (R) genes. These genes are abundant in every plant species investigated to date (Michelmore, Meyers 1998), and can be subdivided into classes based on their coding domains. The largest class is characterized by a nucleotide binding site combined with a region of leucine-rich repeats,

referred to as the NBS-LRR region, which is thought to be the site of pathogen

recognition. The R-genes are typically characterized by a gene-for-gene interaction (Flor

1955; Flor 1971), whereby each gene in the plant specifically recognizes an avirulence

(avr) gene in the pathogen, and recognition triggers a defense response in the plant.

Evidence for natural selection on plant R-genes, including positive and balancing

selection, has been well documented for several well-characterized genes in the genetic

model Arabidopsis thaliana, including RPM1, RPS2, RPS4, RPS5, RPP1, RPP13, and

RPP8 (Botella et al. 1998; McDowell et al. 1998; Caicedo, Schaal, Kunkel 1999; Stahl et

al. 1999; Bergelson et al. 2001; Tian et al. 2002; Rose et al. 2004). The majority of clear

evidence for selection was found to be balancing selection in these genes, with the

exception of RPS4, which has undergone a recent selective sweep (Bergelson et al.

2001). Plant R-genes often segregate for alleles that confer either resistance or

susceptibility to a specific pathogen. In the case of balancing selection, both alleles are

maintained over long time periods. The mechanism for this is proposed to be frequency-

dependent selection, where resistance genes are advantageous when the pathogen is

common, but incur a fitness cost when pathogens are rare. The result is a cycle of

resistance and susceptibility alleles that alternate in frequency following the dynamics of

the pathogen population (Stahl et al. 1999; Tian et al. 2002). Although it has not been

demonstrated directly to date, another possible mode of balancing selection could arise

when alleles at a single locus show varying specificities to different pathogen strains, and

they are subject to frequency-dependent selection (Rose, Michelmore, Langley 2007). On

the other hand, R genes may also often experience relaxed constraint under conditions

where target pathogens are absent and there is no cost of resistance (Gos, Wright 2008).

In general, large surveys of R-genes show clearer evidence for balancing selection than for positive selection, although this is still only at a subset of loci (Bergelson et al. 2001; Bakker et al. 2006). Patterns indicating that new R-gene

alleles are constantly being generated but only briefly maintained have also been found at

disease resistance loci, a scenario closer to diversifying selection (Bakker et al. 2006).

Here, we aim to investigate the consequences of a severe population bottleneck and mating system transition on the polymorphism patterns at R-genes in the two

Capsella species. Genetic signatures of natural selection that are present in Capsella

grandiflora at disease resistance genes may be diminished or absent in C. rubella if the

bottleneck has effectively eroded allelic variation generated by selection. However, if the selective signatures at R-genes in C. grandiflora are also present in C. rubella, this would suggest that strong balancing or diversifying selection associated with pathogen resistance, or a history of such selection in C. grandiflora, has caused the allelic diversity

in these regions to be maintained in C. rubella, despite a genome-wide loss of neutral

variation. We take advantage of an extensive dataset on coding region polymorphism in the two species at 283 reference genes, in order to contrast R gene diversity in a comparable population sample with the genome-wide pattern.

Methods

Frozen leaf material from 8 accessions of Capsella grandiflora and 7 accessions

of C. rubella was used in this study. Samples of C. grandiflora were collected in Greece,

and those of C. rubella were collected from several countries in Europe (Table S1).

Sampling was conducted to largely match the populations and sample sizes of a survey of

genome-wide polymorphism at reference genes (Slotte et al. 2010) (Table S4).

The primers used for PCR (Table S2) were designed by Bakker (Bakker et al.

2006) to amplify fragments in the leucine-rich-repeat regions of R-genes in Arabidopsis.

Those included in the study were the primers that were successful in amplifying LRR

regions of R-genes in both Capsella species. Protein coding domains of the amplified

fragments were confirmed in the Arabidopsis thaliana genome using the Basic Local

Alignment Search Tool (BLAST) program, BlastX, using the default settings (Altschul et

al. 1990; Camacho et al. 2009) (Table S3). The single-copy nature of the R-genes was

confirmed by the lack of double peaks in the chromatograms for individuals of C.

rubella, indicating no 'heterozygosity' in this species reflective of the amplification of

gene duplicates.

DNA was extracted from frozen leaf material using the DNeasy Plant Mini Kit

(Qiagen), and amplified by polymerase chain reaction (PCR) in 96-well plates using a

MasterCycler thermocycler (Eppendorf). Temperature cycles began with 2 min at 94°C,

followed by 35 cycles of the following: 20 s at 94°C, 20 s at 55°C, and 40 s at 72°C. When the cycles were completed the samples were kept at 72°C for 4 min, and then cooled to 4°C, after which they were moved to a -20°C freezer and stored until sequencing. Sanger

sequencing of PCR products was performed by McGill University and the Genome

Quebec Innovation Center (Quebec, Canada). Chromatograms were analyzed using

Sequencher 4.7 (Gene Codes, Ann Arbor, MI). Heterozygous sites were found by first

calling secondary peaks at the 35% threshold, followed by manual inspection of all

putative heterozygous positions. Sequence data from both forward and reverse sequence

strands were used for confirmation. Homologous regions of the sequences in the closely

related genetic model plant Arabidopsis thaliana were determined using BLAST

(Altschul et al. 1990), and aligned to the sequences using the software GeneDoc

(Nicholas 1997) in order to obtain an outgroup for estimates of nucleotide divergence.

Sequences with chromatograms of poor quality were excluded, as were those with sample sizes of less than 6 haploid sequences, and those containing fewer than 60

synonymous sites. In total, one sequence fragment from the LRR region of each gene was

included for each of the nine R-genes. Sequences from seven of the genes were included

for both species, and two were included only in C. grandiflora. The R-gene sample sizes

are therefore 9 and 7 for C. grandiflora and C. rubella, respectively.

Population summary statistics for nucleotide diversity, divergence, and frequency

were generated using a version of the perl script Polymorphurama (Bachtrog, Andolfatto

2006). Diversity statistics included two estimators of the population mutation parameter,

pi (π) and Watterson's θW. Pi (π) is defined as the average pairwise number of nucleotide differences per site for a sample of DNA sequences (Tajima 1983; Nei, Tajima 1987), and Watterson's θW summarizes the amount of nucleotide diversity based on the total

number of segregating sites, and the sample size, in a group of DNA sequences

(Watterson 1975). The frequency spectrum of polymorphism was measured by Tajima's

D statistic, calculated by taking the difference between π and θW (Tajima 1989). Average

pairwise divergence (K) was calculated, using Arabidopsis thaliana as an outgroup, as

the average number of nucleotide substitutions per site between species, with a Jukes and

Cantor correction (Nei, Tajima 1987). Direction and degree of selection was qualitatively

measured using the neutrality index (Rand, Kann 1996), based on the

McDonald-Kreitman test (McDonald, Kreitman 1991). This statistic measures the degree to which

the levels of amino acid variation within species depart from the expectations of

neutrality. Differentiation between species was measured using Wright's Fst Statistic

(Wright 1951; Weir, Cockerham 1984), which uses the amount of variation in SNP allele frequencies between samples of DNA sequences from different groups to determine the degree to which those groups are genetically dissimilar.
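
The diversity statistics defined above can be sketched for a small alignment as follows. This is an illustrative re-implementation, not the Polymorphurama script used in this study; the function names and the toy alignment are hypothetical, and the sketch assumes equal-length sequences with no gaps or missing data and does not separate synonymous from nonsynonymous sites. Tajima's D follows the usual standardization of the difference between π and θW by its estimated standard deviation.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def harmonic_terms(n):
    """Return a1 and a2: the sums of 1/i and 1/i^2 for i = 1 .. n-1."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    return a1, a2


def pi_per_site(seqs):
    """Nucleotide diversity: mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def theta_w_per_site(seqs):
    """Watterson's estimator per site, from segregating sites and sample size."""
    a1, _ = harmonic_terms(len(seqs))
    return segregating_sites(seqs) / a1 / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D; returns None when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return None  # D is undefined without polymorphism
    a1, a2 = harmonic_terms(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment of four sequences, eight sites each.
toy_alignment = ["ATGCATGC", "ATGCATGA", "ATGAATGC", "ATGCATGC"]
print(pi_per_site(toy_alignment))
print(theta_w_per_site(toy_alignment))
print(tajimas_d(toy_alignment))
```

Returning None when no sites segregate mirrors the fact that Tajima's D cannot be computed for a locus without polymorphism.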

R-genes were noted as being single copy or members of gene clusters according to the information available for Arabidopsis thaliana (Bakker et al. 2006).

In order to detect non-neutral patterns of evolution, summary statistics for the R-genes were compared to a genome-wide sample of 'reference' genes from plants of both species from related plant families of equivalent geographic sampling (Slotte et al. 2010), each containing a minimum of 60 synonymous sites and a sample size of at least six. In total there were 283 neutral genes that met the above criteria for both species. R-gene summary statistics were tested for significance in a two-tailed test (p<0.05). The values of summary statistics for which individual R-genes fell within the 2.5% tails of the genome-wide distribution were noted. The R-genes as a group were compared to the neutral genome-wide distribution using permutation tests for the means of the summary statistics. For each permutation the means of the summary statistics were calculated for the samples of R-genes in both species, and compared to an equal number of reference genes resampled from the genome-wide dataset for ten thousand permutations, in order to calculate the proportion of permuted datasets for which the mean value was as or more extreme than the disease resistance loci at the one-tailed 2.5% level. Shared, unique, and fixed nucleotide differences at the R-loci between species were calculated using a perl script written by the authors.
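
The group-level comparison described above can be sketched as a simple resampling procedure. This is a minimal illustration under stated assumptions, not the code used in this study: the function name, the fixed random seed, sampling without replacement, and the upper-tail direction of the comparison are all choices made for the example, and the input values are hypothetical.

```python
import random
from statistics import mean


def resampling_p_value(r_gene_values, reference_values,
                       n_permutations=10_000, seed=0):
    """Proportion of resampled reference-gene sets whose mean is at least
    as large as the observed R-gene mean (one-tailed, upper tail)."""
    rng = random.Random(seed)
    observed = mean(r_gene_values)
    k = len(r_gene_values)
    as_extreme = 0
    for _ in range(n_permutations):
        # Draw as many reference genes as there are R-genes.
        resampled = rng.sample(reference_values, k)
        if mean(resampled) >= observed:
            as_extreme += 1
    return as_extreme / n_permutations


# Hypothetical values: 283 reference loci and 7 R-gene loci.
reference = [0.0001 * i for i in range(283)]
r_genes = [0.020, 0.025, 0.030, 0.015, 0.060, 0.022, 0.027]
print(resampling_p_value(r_genes, reference))
```

A lower-tail test, as would apply to a statistic expected to be reduced at R-genes (for example Fst), would count resampled means that are less than or equal to the observed mean instead.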

Since the primers for the disease resistance genes used in this study were designed for Arabidopsis thaliana by Bakker (Bakker et al. 2006), there is complete overlap with the R-genes sampled here and those analyzed in A. thaliana (Bakker et al. 2006).

Therefore we correlated values for the nucleotide diversity statistic π between both

Capsella species and Arabidopsis thaliana. Figures were produced by the statistical software R 2.13.1 (R-Team 2011).

Results

Diversity levels at R genes in both C. rubella and C. grandiflora are highly

variable across loci (Table 1). Nevertheless, average nonsynonymous diversity (π, θW)

and divergence from the outgroup Arabidopsis thaliana (Ka) are significantly greater

than the genome-wide average (p < 0.001) for the R-genes in both species, according to

the permutation tests. None of the Ks values, including averages, was significantly elevated

(data not shown). In C. rubella, average synonymous diversity at R-genes is also greater than the genome average (p < 0.001), but this is not the case in C. grandiflora (Table 1).

The summary statistics of many of the individual R-genes fall within the upper 2.5% tail of the genome-wide distribution, as indicated with asterisks (Table 1). The locus

AT1G56540 has particularly elevated diversity, both synonymous and non-synonymous, in both species, in addition to showing increased non-synonymous divergence and

Tajima's D statistic. Additionally, locus AT1G63730, has extremely high polymorphism

levels in C. rubella alone, and it is these two loci that are largely driving elevated

diversity levels in C. rubella R genes (Table 1). Many individual resistance loci have

higher levels of non-synonymous divergence than the genome average (p < 0.025) in one or both species, including AT1G17600 (C. grandiflora), AT1G63730 (C. rubella),

AT1G27170, AT1G54540, and AT1G64070 (Table 1).

Differentiation between species is lower than the genome-wide distribution (p <

0.025) in four of the seven R-genes that were investigated in both species (Table 2). The disease resistance genes as a group have significantly lower differentiation than the genome average by an even greater extent (p < 0.001), according to the permutation tests

(Table 2). The percentage of unique synonymous polymorphisms is also significantly increased in Capsella rubella in the R-genes as a group compared to the genome-wide distribution (p < 0.001) (Table 3). The pattern is largely driven by an extremely high percentage of unique synonymous polymorphisms in C. rubella at the highly polymorphic AT1G63730. The two loci AT1G56540 and AT5G17680 also have more unique synonymous polymorphisms in C. rubella than the genome average, showing a trend in the same direction.

Interestingly, the levels of synonymous nucleotide diversity (πs) at the R-genes in

Capsella grandiflora show a strong correlation with levels of nucleotide diversity (π) for the same sample of R-genes in Arabidopsis thaliana (Bakker et al. 2006) (Figure 1A; p =

0.0175, r2 = 0.768). However, when the outlier AT1G56540 is removed, the correlation disappears (Figure 1B; p = 0.748, r2 = 0.136), suggesting the pattern may be driven primarily by a shared selective history at this locus. No correlation exists between synonymous nucleotide diversity at the R-genes in Capsella rubella and the same R-genes in Arabidopsis thaliana (Figure 1C; p = 0.487, r2 = 0.318), although the shared pattern of high polymorphism in AT1G56540 is evident.
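
The sensitivity of a correlation to a single high-diversity locus can be illustrated with a short calculation. The sketch below uses made-up values rather than the diversity estimates from this study, and the function name is hypothetical; it only shows how the squared Pearson correlation can be recomputed after dropping one locus.

```python
def pearson_r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical per-locus diversity in two species; the last locus is an outlier.
species_1 = [0.010, 0.020, 0.015, 0.012, 0.080]
species_2 = [0.004, 0.002, 0.005, 0.003, 0.030]

r2_all = pearson_r_squared(species_1, species_2)
r2_without_outlier = pearson_r_squared(species_1[:-1], species_2[:-1])
print(f"all loci: r2 = {r2_all:.3f}")
print(f"outlier removed: r2 = {r2_without_outlier:.3f}")
```

With these invented values the correlation is strong when the outlier is included and weak once it is removed, which is the qualitative pattern described above.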

Discussion

Overall, patterns of polymorphism at disease resistance loci show clear departures

from the reference genes. In particular, the difference in synonymous and non-

synonymous nucleotide diversity statistics is striking (Table 1). Although C. rubella has

largely reduced diversity compared to C. grandiflora throughout its genome, it has even

higher average diversity than C. grandiflora at this set of disease resistance loci (Table

1). Similarly, the average synonymous diversity at this set of R genes in C. rubella is five

times higher than the genome-wide pattern, while in C. grandiflora average diversity is in

fact slightly lower in this set of genes. This is a strong indication that balancing selection, either ongoing or historical, may have maintained ancestral polymorphism at some R-

genes in C. rubella, despite a severe population bottleneck and mating system shift. The significant reduction in levels of differentiation between the species, as measured by Fst, as well as higher proportions of shared (AT1G56540) and unique polymorphism in C. rubella (Table 3), is also in line with the maintenance of variation due to balancing selection. Although we do not see a significant elevation of synonymous Tajima's D, as expected under some parameter space for balancing selection, the trend is towards elevated Tajima's D values, particularly at the loci showing elevated diversity (Table 1).

It is important to consider the extent to which the patterns described above are

reflective of individual unusual loci vs. the set of R-genes as a whole. Clearly, a global assessment of the retention of polymorphism in C. rubella due to balancing selection will

require a genome-wide analysis, but our present data allows us to get a first sense of the variance in selection patterns across R genes. The elevated levels of non-synonymous

nucleotide diversity in several of the R-genes are indications of non-neutral evolution

(Table 1). This pattern is consistent with balancing selection, as well as relaxed selective

constraint on the R-genes. However, balancing selection is expected to cause elevated

levels of synonymous polymorphism as well, a pattern that was observed in only one of

the R-genes in C. grandiflora (AT1G56540) and two in C. rubella (AT1G56540,

AT1G63730) (Table 1). Furthermore, there is a similar excess of nonsynonymous

divergence across genes, suggestive of relaxed constraint.

The two genes with elevated synonymous diversity therefore show the strongest evidence for balancing selection, and the highly elevated average diversity in C. rubella

is in large part driven by AT1G63730 that, surprisingly, shows low diversity in C.

grandiflora. This could reflect an unsampled divergent haplotype or a loss of balancing

selection in C. grandiflora. Since we recovered PCR amplicons from all individuals for

this locus, biased amplification of one allele seems unlikely. However, inspection of new

Illumina resequencing data in C. grandiflora indicates that the haplotype occurs in C.

grandiflora, but it simply remains unsampled in this dataset (data not shown).

Interestingly, the locus AT1G56540 also shows evidence of balancing selection in

Arabidopsis thaliana (Figure 1), while AT1G63730 is a candidate for a partial selective sweep in A. thaliana (Bakker et al. 2006). This suggests that particular loci may remain

the target of ongoing diversifying and balancing selection over long evolutionary timescales. A third locus, AT1G64070, shows significantly elevated nonsynonymous diversity, as well as a non-significant trend towards higher synonymous diversity in C. grandiflora. It is possible that this locus is subject to weak balancing selection, or the region we have sequenced is linked to a region under balancing selection, and the loss of variation at this locus in C. rubella could reflect a loss of selected diversity. In the case of the R-genes with elevated non-synonymous divergence, but not diversity (C. grandiflora:

AT1G17600, AT1G27170; C. rubella: AT1G27170, AT1G64070; Table 1), this could be due to positive selection or lower levels of constraint on these genes compared with the rest of the genome. However, McDonald-Kreitman tests (McDonald, Kreitman 1991) were not significant for any of these R-genes (data not shown), so there is no evidence for positive selection, and thus we cannot reject the hypothesis that this simply reflects relaxed selective constraint.
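
For readers unfamiliar with the test, the McDonald-Kreitman comparison and the associated neutrality index can be sketched from the four counts of polymorphic and fixed, nonsynonymous and synonymous differences. The sketch below is illustrative only: the counts are not data from this study, the function names are hypothetical, and the use of a two-sided Fisher's exact test for the 2 x 2 table is an assumption made for the example rather than a statement of how the tests in this chapter were run.

```python
from math import comb


def neutrality_index(pn, ps, dn, ds):
    """Neutrality index: (Pn / Ps) / (Dn / Ds). Requires ps, dn, ds > 0."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Illustrative counts: fixed (Dn, Ds) and polymorphic (Pn, Ps) differences.
dn, ds, pn, ps = 7, 17, 2, 42
print(neutrality_index(pn, ps, dn, ds))
print(fisher_exact_two_sided(dn, ds, pn, ps))
```

A neutrality index below one indicates an excess of nonsynonymous divergence relative to polymorphism, as expected under positive selection, whereas values above one indicate an excess of nonsynonymous polymorphism. With the small numbers of sites available at single loci, such tests have limited power.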

Why do we detect more significant differences at R genes compared to the genome-wide average in C. rubella than in the ancestral C. grandiflora? One possibility might be that PCR problems meant that we failed to amplify more divergent alleles in C. grandiflora. However, this would predict a general reduction in our inferred C. grandiflora diversity levels in loci with smaller realized sample sizes, but there is no evidence that this is the case (Table 1). Nevertheless, a divergent haplotype at

AT1G63670 did remain unsampled in C. grandiflora. Additionally, high levels of linkage disequilibrium in C. rubella combined with the genome-wide loss of diversity due to the population bottleneck may exaggerate the signal of balanced polymorphism. In highly outcrossing, equilibrium species, recombination events will limit the signal of balancing selection to a very narrow region surrounding the selected site (Charlesworth, Nordborg,

Charlesworth 1997), whereas we expect a more extended elevation of diversity in selfing species, creating a much greater difference between regions under balancing selection and the genomic average. Our results highlight the possible importance of balancing

selection in generating strong variance in the retention of diversity in bottlenecked, selfing species.

Conclusions

Our data are consistent with previous studies that found more polymorphism

patterns consistent with balancing selection compared to positive selection in surveys of

plant disease resistance genes (Botella et al. 1998; McDowell et al. 1998; Caicedo,

Schaal, Kunkel 1999; Stahl et al. 1999; Tian et al. 2002; Rose et al. 2004). However, there is also a great deal of evidence for relaxed selective constraint on many of the R-genes compared to reference genes, which was also found in the R-genes in Arabidopsis thaliana (Bakker et al. 2006), a pattern which may be reflective of conditional neutrality of the loci under environmental conditions where the functional gene is no longer adaptive (Gos, Wright 2008). In either case, diversity may be maintained by ongoing selection, or neutrally, following a history of balancing selection, at the disease resistance genes in C. rubella. Nevertheless, it is also quite possible that the low power of the

McDonald-Kreitman test at individual loci is preventing the detection of positive selection on amino acids. Further studies of genome-wide patterns will help assess the degree to which high amino acid substitution at R-genes is driven by recurrent positive selection, weak negative selection, diversifying selection, or relaxed constraint. Two of the seven R-genes in our sample for which we have data from both species, AT1G56540 and AT1G63730, show patterns of polymorphism that are consistent with either present or past balancing selection acting at these loci. The patterns persisted through speciation and the reduction in diversity in Capsella rubella, regardless of whether the selective forces are still active. Therefore we can conclude that historical or ongoing balancing selection may play an important role in the differential retention of polymorphism across the genome.

Figures

[Figure 1: scatter plots. (A) x-axis: Capsella grandiflora synonymous nucleotide diversity; dashed line fitted with the outlier removed. (B) x-axis: Capsella rubella synonymous nucleotide diversity.]

Figure 1. Correlation in nucleotide diversity between Capsella species and Arabidopsis

thaliana. Correlation between synonymous nucleotide diversity (synonymous π) for the

R-genes in the two Capsella species and nucleotide diversity (π) in the same set of R-

genes in Arabidopsis thaliana (Bakker et al. 2006). A) Capsella grandiflora and

Arabidopsis thaliana. The dashed line represents the correlation between the two species

after the outlier, AT1G56540, has been removed. B) Correlation between R-genes in

Capsella rubella and Arabidopsis thaliana.

Tables

Table 1. Individual and average summary statistics for the disease resistance genes.

Species / Locus   n   # of sites  Theta    Pi       Tajima's D  # of sites  Theta      Pi         Ka         Tajima's D
At1g17600         10  123         0.0029   0.0038   0.8198      444         0.0016     0.0019     0.0998*    0.5259
At1g56540         6   69          0.0637*  0.0776*  1.3080      222         0.0197*    0.0240*    0.1219*    1.3080*
At1g63740         12  129         0.0258   0.0345   1.4014      420         0.0087     0.0076     0.0547     -0.4916
At3g50950         16  127         0.0213   0.0255   0.7168      428         0.0014     0.0011     0.0124     -0.5778
R-genes average       122         0.0157   0.0173   0.1981      404         0.0063***  0.0064***  0.0708***  -0.2452
Capsella rubella
At1g12290         7   126         0.0032   0.0023   -1.0062     420         0.0000     0.0000     0.0441
At1g56540         7   69          0.0534*  0.0692*  1.5748      222         0.0147*    0.0193*    0.1204*    1.6416*
At1g64070         7   146         0.0000   0.0000               469         0.0009     0.0006     0.1153*    -1.0062
At5g17680         7   128         0.0159   0.0111   -1.4861     415         0.0039     0.0034     0.0462     -0.5976
Genome average        126         0.0059   0.0061   0.0121      412         0.0007     0.0006     0.0262     -0.0558

* p<0.025, ** p<0.01, *** p<0.001

Table 2. Differentiation of individual and average disease resistance genes between species.

Locus      Fst
At1g27170
At1g63730  0.0000*
At3g50950
R-genes    0.1864**

* p<0.025, ** p<0.001.

Table 3. Percentages of shared, unique and fixed polymorphisms by category for individual and average disease resistance genes

Site class       Synonymous                                                         Non-synonymous
R-Gene           % unique, C. grandiflora  % unique, C. rubella  % shared  % fixed  % unique, C. grandiflora  % unique, C. rubella  % shared  % fixed
At1g56540        0.11                      0.11                  0.78*     0.00     0.14                      0.14                  0.71*     0.00
At1g64070        0.69                      0.00                  0.00      0.31*    0.71                      0.00                  0.05      0.24*
At5g17680        0.44                      0.33                  0.22      0.00     0.43                      0.57                  0.00      0.00
R-Gene           0.61                      0.20*                 0.14      0.04     0.48                      0.23                  0.11      0.03

* p<0.05

Supplementary Material

Table S1. Locations of the individuals from which the R-gene sequences were sampled

Sample Name  Designation  Species               Location
G1           2e-TS2       Capsella grandiflora  Paleokastritsas, Corfu, Greece
G2           2h-TS3       Capsella grandiflora  Paleokastritsas, Corfu, Greece
G3           918/1-TS2    Capsella grandiflora  Pantokrator, Corfu, Greece
G4           918/8-TS2    Capsella grandiflora  Pantokrator, Corfu, Greece
G5           934/32-TS2   Capsella grandiflora  Metsovo, Zagori, Greece
G6           934/31-TS1   Capsella grandiflora  Metsovo, Zagori, Greece
G7           935/13-TS1   Capsella grandiflora  Sokraki, Corfu, Greece
G8           925/9-TS2    Capsella grandiflora  Ioannina, Zagori, Greece
R1           1GR1-TS1     Capsella rubella      Samos, Greece
R2           39.5-TS2     Capsella rubella      Bacia, Sicily, Italy
R3           1209/26-TS3  Capsella rubella      Cumbre Dorsal, Tenerife, Spain
R4           80TR1-TS1    Capsella rubella      Istanbul, Turkey
R5           50.1-TS2     Capsella rubella      La Calma, Spain
R6           1215/16-TS1  Capsella rubella      La Laguna, Tenerife, Spain
R7           TAAL-1-TS3   Capsella rubella      Taguemont, Algeria

Table S2. Primers used for PCR amplification of R-gene fragments

* Data from C. grandiflora only.

Name  Locus Fragment  R-Gene Type   Gene Name  Forward Primer        Reverse Primer
P2    At1g12290_2     CC-NBS-LRR               CTGACTGATTCTTCGTCGAG  GCCaGTGGATAGCATCTG
P3    At1g17600_1*    TIR-NBS-LRR              GATCTCTGATGCTTGATGGC  CAGACGCAAAAGTGAAAATG
P10   At1g27170_6     TIR-NBS-LRR              TGGnGCCACAGCTTAGAAG   GAACTCGGAACaGTTTCAG
P13   At1g56540_1     TIR-NBS-LRR   WRR4       TGGATnTCTTCCTCGCCTA   CAGTCGTAAGCATCCCATCA
P15   At1g63730_2     TIR-NBS-LRR              GCATTCAGCCCCTTAC      CCTACGTGCTTCTTGACCCA
P17   At1g63740_2*    TIR-NBS-LRR              TGTGGGTCCTTGAGATTGAA  CCCTTCCTGGTAAGTATGC
P19   At1g64070_2     TIR-NBS-LRR   RLM1       GGATGCATACCCAAGCAAGT  GCGTGCTTCACAAGAGACTG
P20   At3g50950_1     CC-NBS-LRR    ZAR1       GCCTCAACTGTCGTCACCTT  TGGAGGAGTAAGGGCGTCTA
P21   At5g17680_3     TIR-NBS-LRR              ATGAGaGCTTGAGGTGGTT   TTGGTGATTGAAGCAAGTGG

Table S3. BlastX Coordinates and protein coding domains for the R-gene fragments

* Data from C. grandiflora only.

Locus Fragment  RefSeq Accession Number  BlastX Coordinates  Coordinates of Protein Domain  Conserved Domain Database (CDD) BLAST Result
At1g12290_2     NM_001160857.1           460-641             465-602                        LRR protein
At1g17600_1*    NM_101622.1              752-941, 919-1000   843-953                        LRR protein
At1g27170_6     NM_001160900.1           696-878, 814-1015   708-1104                       LRR receptor-like protein kinase
At1g56540_1     NM_104531.1              597-626, 620-691    532-750                        LRR receptor-like protein kinase
At1g63730_2     NM_105050.1              642-814             601-732                        LRR protein
At1g63740_2*    NM_105051.3              618-800             588-715                        LRR protein
At1g64070_2     NM_105080.2              619-825             602-621, 760-782               LRR protein
At3g50950_1     NM_114955.4              530-714
At5g17680_3     NM_121774.1              898-1078            719-997                        LRR protein

Table S4. Locations of the individuals from which the genome-wide sequences were sampled.

Species         Designation           Origin
C. grandiflora  2e-TSl                Paleokastritsas, Corfu, Greece
C. grandiflora  918/1-TS4, 918/8-TS1  Pantokrator, Corfu, Greece
C. grandiflora  934/31-TS3            Metsovo, Zagori, Greece
C. grandiflora  935/13-TS2            Sokraki, Corfu, Greece
C. grandiflora  925/9-TS3             Ioannina, Zagori, Greece
C. rubella      1GR1-TS1              Samos, Greece
C. rubella      39.1-TS1              Bacia, Sicily, Italy
C. rubella      1209/26-TS4           Cumbre Dorsal, Tenerife, Spain
C. rubella      80TR1-TS1             Istanbul, Turkey
C. rubella      50.1-TS1              La Calma, Spain
C. rubella      1215/17-TS1           La Laguna, Tenerife, Spain
C. rubella      TAAL-1-TS2            Taguemont, Algeria
C. rubella      75.2                  Moni Megalou, Spilaiou, Greece

References

Aguilar, A, G Roemer, S Debenham, M Binns, D Garcelon, RK Wayne. 2004. High

MHC diversity maintained by balancing selection in an otherwise genetically

monomorphic mammal. Proceedings of the National Academy of Sciences of the

United States of America 101:3490-3494.

Altschul, SF, W Gish, W Miller, EW Myers, DJ Lipman. 1990. Basic local alignment

search tool. Journal of Molecular Biology 215:403-410.

Bachtrog, D, P Andolfatto. 2006. Selection, recombination and demographic history in

Drosophila miranda. Genetics 174:2045-2059.

Bakker, EG, C Toomajian, M Kreitman, J Bergelson. 2006. A genome-wide survey of R

gene polymorphisms in Arabidopsis. Plant Cell 18:1803-1818.

Bergelson, J, M Kreitman, EA Stahl, DC Tian. 2001. Evolutionary dynamics of plant R-

genes. Science 292:2281-2285.

Botella, MA, JE Parker, LN Frost, PD Bittner-Eddy, JL Beynon, MJ Daniels, EB Holub,

JDG Jones. 1998. Three genes of the arabidopsis RPP1 complex resistance locus

recognize distinct Peronospora parasitica avirulence determinants. Plant Cell

10:1847-1860.

Boyko, AR, SH Williamson, AR Indap, et al. 2008. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genetics 4(5):e1000083.

Bryja, J, M Galan, N Charbonnel, JF Cosson. 2006. Duplication, balancing selection and trans-species evolution explain the high levels of polymorphism of the DQA MHC class II gene in voles (Arvicolinae). Immunogenetics 58:191-202.

Bustamante, CD, R Nielsen, SA Sawyer, KM Olsen, MD Purugganan, DL Hartl. 2002.

The cost of inbreeding in Arabidopsis. Nature 416:531-534.

Caicedo, AL, BA Schaal, BN Kunkel. 1999. Diversity and molecular evolution of the

RPS2 resistance gene in Arabidopsis thaliana. Proceedings of the National

Academy of Sciences of the United States of America 96:302-306.

Camacho, C, G Coulouris, V Avagyan, N Ma, J Papadopoulos, K Bealer, TL Madden. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421.

Charlesworth, B, M Nordborg, D Charlesworth. 1997. The effects of local selection,

balanced polymorphism and background selection on equilibrium patterns of

genetic diversity in subdivided populations. Genetical Research 70:155-174.

Charlesworth, J, A Eyre-Walker. 2006. The rate of adaptive evolution in enteric bacteria.

Molecular Biology and Evolution 23:1348-1356.

Eimes, JA, JL Bollmer, PO Dunn, LA Whittingham, C Wimpee. 2010. Mhc class II

diversity and balancing selection in greater prairie-chickens. Genetica 138:265-

271.

Flor, HH. 1955. Host-parasite interaction in flax rust - its genetics and other implications.

Phytopathology 45:680-685.

Flor, HH. 1971. Current status of the gene-for-gene concept. Annual Review of Phytopathology 9:275-296.

Foxe, JP, VUN Dar, H Zheng, M Nordborg, BS Gaut, SI Wright. 2008. Selection on

amino acid substitutions in Arabidopsis. Molecular Biology and Evolution

25:1375-1383.

Foxe, JP, T Slotte, EA Stahl, B Neuffer, H Hurka, SI Wright. 2009. Recent speciation

associated with the evolution of selfing in Capsella. Proceedings of the National

Academy of Sciences of the United States of America 106:5241-5245.

Gos, G, SI Wright. 2008. Conditional neutrality at two adjacent NBS-LRR disease

resistance loci in natural populations of Arabidopsis lyrata. Molecular Ecology

17:4953-4962.

Guo, YL, JS Bechsgaard, T Slotte, B Neuffer, M Lascoux, D Weigel, MH Schierup.

2009. Recent speciation of Capsella rubella from Capsella grandiflora, associated

with loss of self-incompatibility and an extreme bottleneck. Proceedings of the

National Academy of Sciences of the United States of America 106:5246-5251.

Halligan, DL, F Oliver, A Eyre-Walker, B Harr, PD Keightley. 2010. Evidence for Pervasive Adaptive Protein Evolution in Wild Mice. PLoS Genetics 6(1):e1000825.

Hurka, H, S Freundner, AHD Brown, U Plantholt. 1989. Aspartate-aminotransferase

isozymes in the genus Capsella (Brassicaceae). Biochemical Genetics 27:77-90.

Hurka, H, B Neuffer. 1997. Evolutionary processes in the genus Capsella (Brassicaceae). Plant Systematics and Evolution 206:295-316.

Jarvi, SI, CL Tarr, CE McIntosh, CT Atkinson, RC Fleischer. 2004. Natural selection of the major histocompatibility complex (Mhc) in Hawaiian honeycreepers (Drepanidinae). Molecular Ecology 13:2157-2168.

Kimura, M. 1968. Evolutionary rate at the molecular level. Nature 217:624-626.

Koch, MA, M Kiefer. 2005. Genome evolution among cruciferous plants: A lecture from

the comparison of the genetic maps of three diploid species - Capsella rubella,

Arabidopsis lyrata subsp Petraea, and A. thaliana. American Journal of Botany

92:761-767.

Liu, XQ, L Wang, XD Liu, XQ Liu, DB Wang, CT Wang, F Lin, QH Pan. 2010. The

molecular evolution of the rice blast resistance gene Pi36. International Journal of

Plant Sciences 171:235-243.

McDonald, JH, M Kreitman. 1991. Adaptive protein evolution at the Adh locus in

Drosophila. Nature 351:652-654.

McDowell, JM, M Dhandaydham, TA Long, MGM Aarts, S Goff, EB Holub, JL Dangl.

1998. Intragenic recombination and diversifying selection contribute to the

evolution of downy mildew resistance at the RPP8 locus of arabidopsis. Plant Cell

10:1861-1874.

Meyer, SE, DL Nelson, S Clement, A Ramakrishnan. 2010. Ecological genetics of the Bromus tectorum (Poaceae) - Ustilago bullata (Ustilaginaceae) pathosystem: A role for frequency-dependent selection? American Journal of Botany 97:1304-1312.

Michelmore, RW, BC Meyers. 1998. Clusters of resistance genes in plants evolve by

divergent selection and a birth-and-death process. Genome Research 8:1113-

1130.

Nei, M, F Tajima. 1987. Problems arising in phylogenetic inference from restriction-site

data. Molecular Biology and Evolution 4:320-323.

Nicholas, KB, HB Nicholas Jr, DW Deerfield II. 1997. GeneDoc: Analysis and Visualization of Genetic Variation. EMB NEWS 4:14.

Nordborg, M, TT Hu, Y Ishino, et al. 2005. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biology 3:1289-1299.

Ohta, T. 1973. Slightly deleterious mutant substitutions in evolution. Nature 246:96-98.

Ohta, T. 1992. The Nearly Neutral Theory Of Molecular Evolution. Annual Review of

Ecology and Systematics 23:263-286.

Paape, T, B Igic, SD Smith, R Olmstead, L Bohs, JR Kohn. 2008. A 15-Myr-old genetic

bottleneck. Molecular Biology and Evolution 25:655-663.

Paetsch, M, S Mayland-Quellhorst, B Neuffer. 2006. Evolution of the self-

incompatibility system in the Brassicaceae: identification of S-locus receptor

kinase (SRK) in self-incompatible Capsella grandiflora. Heredity 97:283-290.

R-Team. 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Rand, DM, LM Kann. 1996. Excess amino acid polymorphism in mitochondrial DNA: Contrasts among genes from Drosophila, mice, and humans. Molecular Biology and Evolution 13:735-748.

Richman, AD, LG Herrera, D Nash. 2001. MHC class II beta sequence diversity in the

deer mouse (Peromyscus maniculatus): implications for models of balancing

selection. Molecular Ecology 10:2765-2773.

Rose, LE, PD Bittner-Eddy, CH Langley, EB Holub, RW Michelmore, JL Beynon. 2004.

The maintenance of extreme amino acid diversity at the disease resistance gene,

RPP13, in Arabidopsis thaliana. Genetics 166:1517-1527.

Rose, LE, RW Michelmore, CH Langley. 2007. Natural variation in the Pto disease

resistance gene within species of wild tomato (Lycopersicon). II. Population

genetics of Pto. Genetics 175:1307-1319.

Slotte, T, JP Foxe, KM Hazzouri, SI Wright. 2010. Genome-Wide Evidence for Efficient

Positive and Purifying Selection in Capsella grandiflora, a Plant Species with a

Large Effective Population Size. Molecular Biology and Evolution 27:1813-1821.

Smith, NGC, A Eyre-Walker. 2002. Adaptive protein evolution in Drosophila. Nature

415:1022-1024.

St Onge, KR, T Kallman, T Slotte, M Lascoux, AE Palme. 2011. Contrasting demographic history and population structure in Capsella rubella and Capsella grandiflora, two closely related species with different mating systems. Molecular Ecology 20:3306-3320.

Stahl, EA, G Dwyer, R Mauricio, M Kreitman, J Bergelson. 1999. Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature 400:667-671.

Strasburg, JL, NC Kane, AR Raduski, A Bonin, R Michelmore, LH Rieseberg. 2011.

Effective Population Size Is Positively Correlated with Levels of Adaptive

Divergence among Annual Sunflowers. Molecular Biology and Evolution

28:1569-1580.

Tajima, F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437-460.

Tajima, F. 1989. DNA polymorphism in a subdivided population - the expected number of segregating sites in the 2-subpopulation model. Genetics 123:229-240.

Tian, D, MB Traw, JQ Chen, M Kreitman, J Bergelson. 2003. Fitness costs of R-gene-

mediated resistance in Arabidopsis thaliana. Nature 423:74-77.

Tian, DC, H Araki, E Stahl, J Bergelson, M Kreitman. 2002. Signature of balancing

selection in Arabidopsis. Proceedings of the National Academy of Sciences of the

United States of America 99:11525-11530.

Watterson, GA. 1975. Number of segregating sites in genetic models without

recombination. Theoretical Population Biology 7:256-276.

Weir, BS, CC Cockerham. 1984. Estimating F-statistics for the analysis of population-

structure. Evolution 38:1358-1370.

Wright, S. 1951. The genetical structure of populations. Annals of Eugenics 15:323-354.

Xu, TJ, YN Sun, RX Wang. 2010. Gene duplication and evidence for balancing selection acting on MHC class II DAA gene of the half-smooth tongue sole (Cynoglossus semilaevis). Marine Genomics 3:117-123.

Zhang, LQ, WH Li. 2005. Human SNPs reveal no evidence of frequent positive selection.

Molecular Biology and Evolution 22:2504-2507.

CHAPTER 4

GENOME-WIDE RELAXATION OF PURIFYING SELECTION IN THE

RECENT POLYPLOID CAPSELLA BURSA-PASTORIS

Introduction

Organisms that have acquired additional copies of their ancestral chromosomes, leaving them with more than two complete sets of chromosomes, are said to be polyploid. This phenomenon is widespread in plants. An estimated 30% to 80% of all plant species are polyploid (Masterson, 1994), and a significant number of plant speciation events include a change in ploidy level (Wood et al. 2009). Many of the most important agricultural species, including wheat, potatoes, sugarcane, soybean, and cotton, are polyploid (Hilu, 1993). Even plant species that are currently diploid often have polyploidy in their past, and are termed "paleopolyploid". The proportion of angiosperms estimated to have had a polyploidy event in their history is between 47% and 100%, according to the fossil record (Masterson, 1994) and genomic data (Cui et al. 2006). Many plant species have undergone multiple rounds of polyploidy in their past, including those that are current polyploids (Soltis and Soltis 1999) and paleopolyploids (Fawcett et al. 2009).

A polyploid species may arise in one of two ways. Those originating from a hybridization event between two different species are referred to as 'allopolyploid', while polyploids that have undergone whole genome duplication within the same species are called 'autopolyploid' (Otto and Whitton, 2000). Modes of chromosomal inheritance also differ between polyploid species, and depend to some extent on the type of polyploid origin. Disomic inheritance, where the chromosomes form bivalents during meiosis and segregate in pairs, displaying "chromosomal diploidy", is typical of allopolyploids. Since they have two full genomes from two separate species, they behave as diploids with extra sets of chromosomes (Doyle et al. 2008). Autopolyploids tend to display polysomic inheritance, where multivalents are formed during meiosis, and all copies of the chromosomes have equal chances of pairing with each other (Cifuentes et al. 2010).

There are, however, autopolyploids that exhibit disomic inheritance, and recent work has shown that this is more common than was previously believed (Soltis et al. 2010). In fact, polysomy may be a transient stage leading to eventual chromosomal diploidy and disomic inheritance (Ramsey and Schemske, 2002; Doyle et al. 2008).

The evolutionary advantages of polyploidy are controversial, as whole genome duplication has both apparent advantages and apparent disadvantages. Polyploidy is considered by many to be a major mechanism of evolution, adaptation, and speciation in plants. Large differences in morphology, ecology, and physiology can be observed between polyploids and their progenitors (Ramsey and Schemske, 2002; Van de Peer, 2009). These phenotypic differences are visible to natural selection and provide it with novel variation, thereby driving evolution. Some of the most common phenotypic effects include increased cell volume, especially stomatal cell size, associated with the extra chromosomes; the resulting change in the surface to volume ratio of cells in turn alters metabolic processes (Levin, 1983; Otto and Whitton, 2000). The seeds of polyploids are routinely larger than those of related diploids, which can lead to earlier development and subsequent niche differentiation (Villar et al. 1998; Otto and Whitton, 2000).

Conversely, polyploidy may hinder evolutionary processes, as genome-wide redundancy can reduce the efficiency of natural selection acting on advantageous mutations. Due to the presence of multiple alleles at each gene, the spread of a favorable allele is expected to be slower at higher ploidy levels, for a given allele frequency (Otto and Whitton, 2000). Furthermore, polyploid plants have been demonstrated to have lower rates of diversification and higher rates of extinction (Mayrose et al. 2011). This appears to contradict the prevalence of paleopolyploidy events in the majority of plant taxa (Masterson, 1994; Cui et al. 2006). Mayrose et al. (2011) attempt to resolve the contradiction with the observation that, if diversification rates were equal between polyploids and diploids, the expected number of paleopolyploidization events would be higher than the number that is observed. Their conclusion is that polyploidy can be an evolutionary 'dead end', but polyploids that do survive may experience enhanced evolutionary potential, thereby driving them to success in the long term (Mayrose et al. 2011).

Changes in reproductive systems that are associated with a polyploidization event include asexuality and the breakdown of self-incompatibility. Polyploid plants have higher selfing rates, which is predicted to facilitate reproductive isolation and speciation from their diploid progenitors (Otto and Whitton, 2000; Comai, 2005; Barringer, 2007; but see Mable, 2004). Higher selfing rates also provide reproductive assurance, thereby enhancing colonization ability. Polyploid plants are famously good colonizers, and they excel at both establishing new populations and expanding their geographic ranges (te Beest et al. 2012). Establishment may be facilitated by the reproductive assurance of selfing (Barrett, 2002; Kalisz, Vogler, and Hanley, 2004; Wolf and Takebayashi, 2004), the masking of deleterious alleles (Comai, 2005), or both. Another feature that may enhance a polyploid plant's colonization and establishment ability is the wider range of ecological conditions that it is able to tolerate, compared to its diploid progenitor (Otto, 2000; Leitch and Leitch, 2008). The link between polyploidy and geographic range expansion is well established. In fact, polyploids are over-represented among plant species that are known to be invasive, which is the most extreme form of colonization (te Beest et al. 2012). Therefore, the phenotypic effects of having multiple chromosome sets may have evolutionary advantages.

Polyploidy has profound implications for genome evolution, and among them is the process of differential gene loss. Polyploidy is a cyclical process, whereby polyploids inevitably evolve towards diploidy until the next genome doubling event in their evolutionary history (Wendel 2000; Leitch and Bennett 2004; Adams and Wendel 2005). The result is that many diploid plant species have had multiple polyploidy events in their past (Fawcett et al. 2009), even those with genomes as small as Arabidopsis (Blanc et al. 2003). Multiple rounds of polyploidy must necessarily involve the loss of duplicated genes, if genome sizes are to be restrained, and diploidy eventually restored. However, only a subset of duplicated genes are removed, while others are retained (Adams and Wendel 2005). Evidence suggests that gene loss is non-random, and instead is subject to natural selection. Certain classes of genes are retained more than others (Blanc and Wolfe, 2004), and duplicated genes that are retained following one round of polyploidy are more likely to be retained through subsequent genome doubling events (Seoighe and Gehring, 2004). This is because the duplicated genes that are maintained over long periods of time are likely to have acquired novel functions, a process called neofunctionalization, or to perform only a portion of their original function while their homeolog performs the rest, a process called subfunctionalization. Genes that are not maintained are likely to have been silenced (nonfunctionalization) (Lynch and Conery, 2000; Adams and Wendel 2005). This implies that polyploidy drives genome evolution, by providing new raw material for natural selection to act upon through gene redundancy. It is therefore possible that, in contrast to the arguments above, polyploid species show greater amounts of adaptive evolution at the molecular level than their diploid relatives, as there is a greater pool of genes and alleles on which natural selection may act (Hegarty and Hiscock, 2008). Gene duplicates that do not gain new or partitioned functions may be silenced in order to preserve the ancestral gene dosage, before eventually being lost.

The sequence evolution of duplicated genes may differ from that of their diploid ancestors due to changes in selection pressures following polyploidization. Given the functional changes in duplicated genes, including neo- and subfunctionalization, it is reasonable to expect novel sequence evolution. For example, when a gene formerly under purifying selection is duplicated, one copy may become pseudogenized while the other remains under selection. The pseudogene will experience relaxed selective constraint, and an increase in nucleotide diversity as a result (Wendel 2000). Duplicate genes that acquire new or partitioned functions are expected to be under the influence of positive selection, and to harbor more adaptive mutations than their ancestral homologs.

Sequence evolution of the polyploid genome overall is expected to diverge from that of the ancestral diploid as a consequence of the complete genome redundancy. One such consequence may be the accumulation of deleterious mutations. Having a duplicate of every gene in the genome provides a buffer against the effects of deleterious mutations. Functional gene copies are able to compensate for a homeolog that is damaged by a harmful mutation. We can therefore predict that a polyploid species will harbor a larger ratio of protein-altering (non-synonymous) substitutions to silent (synonymous) substitutions, which are not expected to alter gene function, than its diploid progenitor. We can also expect fewer mutations of strong deleterious effect and more of nearly neutral effect in a polyploid compared to a diploid, as many mutations will be less harmful in a polyploid as a result of gene redundancy.

The genus Capsella, within the family Brassicaceae, is ideal for investigating the genomic consequences of polyploidy. The genus consists of three closely related species, Capsella grandiflora, C. rubella, and C. bursa-pastoris. Here, I focus on Capsella grandiflora and C. bursa-pastoris. Capsella grandiflora is an annual, self-incompatible diploid plant that is native to Western Greece, where it exists in a large, size-stable population with an effective size of about 500,000 individuals (Foxe et al. 2009; Slotte et al. 2010). It is largely restricted to this geographic range, and occurs elsewhere, in Albania and Italy, only in small populations (Hurka and Neuffer, 1997; Koch et al. 2005; Paetsch et al. 2006). Capsella bursa-pastoris is a recent allopolyploid of C. grandiflora; the polyploidization event is estimated at 270,000 to 700,000 years ago (St. Onge and Foxe, 2012), yet the species already exhibits disomic inheritance. The event initially caused a severe population bottleneck, but the population subsequently expanded (Slotte et al. 2008). The geographic range of C. bursa-pastoris is now world-wide, excluding only the hot wet tropics (Hurka and Neuffer, 1997). It is one of the most successful known colonizing species.

Here my aim is to investigate the genomic consequences of polyploidy on the Capsella bursa-pastoris genome, by comparing patterns of nucleotide polymorphism across the genome to those in C. grandiflora, its diploid progenitor, using full-coverage genome sequences. This allows me to investigate the consequences of polyploidy at a greater genomic resolution than has previously been possible, given recent advances in genome sequencing technology. It also provides insights into the timing of genomic changes following a polyploidization event, as the C. bursa-pastoris event is estimated at 270-700 thousand years ago, and any genomic changes detected can be placed within this time scale. Here, I characterize the Capsella bursa-pastoris genome in comparison with the C. grandiflora genome with respect to patterns of nucleotide polymorphism and adaptive evolution, in order to gain insight into the process of polyploidization at the molecular level. The Capsella bursa-pastoris data come from two homeologous loci in a highly selfing tetraploid with disomic inheritance. Therefore, we estimate population summary statistics by taking the number of individuals as the sample size. This method is based on the assumption that individuals are largely homozygous within homeologs, which is appropriate due to the high degree of selfing in this species. In order to investigate within-homeolog genetic variation independently, polymorphisms that were heterozygous in all individuals (fixed heterozygotes) were removed from the analysis of population summary statistics, as described below.

Materials and Methods

Seeds were collected from eight Capsella bursa-pastoris plants, one individual from each of the following countries: Greece (70.5), Spain (5.16), Poland (13.16), Italy (39-12-28), Germany (12.4), the Netherlands (16.9), Russia (VAL), and Iceland (RK32). Seeds from thirteen Capsella grandiflora plants were collected from various locations in Greece, including Corfu (5a, AXE), Ioannina (83.17, 85.33, 86.8), Zagori (88.56, 91.2), Metsovo (93.2, 97.26), Mikropapingo (94.12, 95.15), and Trikala (103.17). Collections were completed by Kate St. Onge and Martin Lascoux (Uppsala University, Sweden).

Seeds were sterilized, plated on half-strength Murashige-Skoog nutrient medium, and vernalized at 4°C for three weeks, after which they were moved to room temperature for several days to germinate and grow until the seedlings were large enough to be planted. Seedlings were potted in a standard potting mix in 1 L round pots and grown in a greenhouse. Leaf material was collected when plants were fully grown. To ensure proper species identification, tetraploidy was confirmed using flow cytometry, a method that measures the optical properties of fluorescent-stained cells in order to quantify the total amount of nuclear DNA. For calibration, we used a leaf sample of radish, Raphanus sativus. The result for one sample is shown in Figure 1.
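As a rough illustration of how a calibration standard is typically used in flow cytometry, the sketch below scales the sample's fluorescence peak by the standard's peak and known DNA content, then compares the estimate to a diploid relative. The function names and every numeric value are hypothetical placeholders, not measurements or methods taken from this study.

```python
# Minimal sketch, assuming a single internal standard with known DNA content.
# All names and numbers are illustrative assumptions, not values from the thesis.

def estimate_dna_content(sample_peak, standard_peak, standard_pg):
    """Estimate nuclear DNA content (pg) from mean fluorescence peak positions."""
    if standard_peak <= 0:
        raise ValueError("standard peak must be positive")
    # Fluorescence is assumed proportional to DNA content.
    return standard_pg * sample_peak / standard_peak


def ratio_to_diploid(sample_pg, diploid_pg):
    """Ratio of sample DNA content to that of a diploid relative.

    A ratio near 2 would be consistent with tetraploidy, although genome
    size can change after polyploidization, so this is only a rough guide.
    """
    return sample_pg / diploid_pg


if __name__ == "__main__":
    # Hypothetical peak positions (arbitrary fluorescence units).
    sample_pg = estimate_dna_content(sample_peak=160.0, standard_peak=200.0,
                                     standard_pg=1.0)
    print(sample_pg, ratio_to_diploid(sample_pg, diploid_pg=0.4))
```

The proportionality between fluorescence and DNA content is the key assumption; in practice the standard and the sample would be stained and run together.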

DNA was extracted from leaf material using a modified CTAB protocol (Doyle and Doyle, 1987). Whole genome sequencing of each individual was completed at the Genome Quebec Innovation Centre using the Illumina Genome Analyzer platform (Illumina, San Diego, California, USA). Paired-end 108 base pair reads were generated for Capsella bursa-pastoris, for a median coverage of 20 reads per site. Reads for C. grandiflora were single-end, of similar length and depth of coverage. For genome assembly of C. bursa-pastoris and C. grandiflora, reads were aligned to the C. rubella de novo genome assembly (Joint Genome Institute, California, USA).

Sequences for each individual were obtained in fastq format. The program

Stampyl3 was used for file format conversion and read mapping to the Capsella rubella reference genome (JGI). The program Picard (picard.sourceforge.net) was used for sorting reads into the correct order and further file format conversion in preparation for the analysis that followed. The Genome Analysis Toolkit (GATK, McKenna et al. 2010) was used to re-align sequences that are close to insertion-deletions, as these can be inaccurately mapped by standard read-mapping programs. Single nucleotide polymorphisms (SNPs) were called (genotyped) using GATK, and the output was generated in variant call format (Danecek et al. 2011). The vcf file was filtered to remove sites with a SNP quality score of less than 60 or a depth of coverage that was less than 20 or greater than 100. The minimum quality and depth scores were chosen in order to be confident in the accuracy of the genotype call, and the upper limit of depth was chosen in order to exclude repetitive regions that may map incorrectly. Nucleotide polymorphisms for which all individuals were heterozygous in Capsella bursa-pastoris were also filtered out, as these represent SNPs that are fixed between homeologs, and therefore represent ancestral polymorphism from the progenitor species, Capsella grandiflora. During

filtering, lists of different nucleotide site types, including 4-fold, intergenic, intron, 0-fold, 3-prime untranslated region, and 5-prime untranslated region, and their locations were compiled based on the Capsella rubella genome annotation (Joint Genome Institute) and the genotype calls from GATK, for use in future analyses.
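The filtering criteria described above can be sketched as a small stand-alone filter over VCF text lines. This is an illustrative reconstruction, not the scripts used in this study: the function names are hypothetical, and it assumes that depth is read from the site-level `DP` key of the INFO column, which the text does not specify.

```python
# Minimal sketch of the site filters described in the text (Python 3).
# Assumptions: depth comes from INFO "DP"; genotypes are diploid-style calls
# such as 0/0, 0/1, 1/1; function names are hypothetical.

def parse_info(info):
    """Turn 'DP=40;AF=0.5' into {'DP': '40', 'AF': '0.5'}."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
    return out


def is_het(gt):
    """True for a called genotype with two different alleles."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def keep_record(line, min_qual=60.0, min_dp=20, max_dp=100, drop_fixed_hets=True):
    """Apply the quality, depth and fixed-heterozygote filters to one record."""
    fields = line.rstrip("\n").split("\t")
    qual, info = fields[5], parse_info(fields[7])
    if qual == "." or float(qual) < min_qual:
        return False
    if "DP" not in info:
        return False
    depth = int(info["DP"])
    if depth < min_dp or depth > max_dp:
        return False
    if drop_fixed_hets and len(fields) > 9:
        fmt = fields[8].split(":")
        if "GT" in fmt:
            gt_idx = fmt.index("GT")
            gts = [sample.split(":")[gt_idx] for sample in fields[9:]]
            called = [g for g in gts if "." not in g]
            # Sites where every called individual is heterozygous are treated
            # as fixed differences between homeologs and removed.
            if called and all(is_het(g) for g in called):
                return False
    return True


def filter_vcf(lines):
    """Yield header lines unchanged and only those records that pass."""
    for line in lines:
        if line.startswith("#") or keep_record(line):
            yield line
```

The fixed-heterozygote filter would apply only to the C. bursa-pastoris calls; for C. grandiflora the same function could be called with `drop_fixed_hets=False`.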

Different site types are analyzed separately because they are subject to different types of selection and levels of constraint. Introns and intergenic regions, which are non-coding, are presumably subject to less selective constraint, as nucleotide changes are not expected to alter protein function. For this reason, variability is usually higher in non-coding regions, except for those with important functions, such as the regulation of gene activity. Similarly, 4-fold degenerate (hereafter 4-fold) sites are expected to be unconstrained due to their redundancy in the genetic code. A 4-fold site is a site in a coding region at which any possible nucleotide change will still result in the same amino acid. Untranslated regions, both three prime and five prime, are non-coding but may be constrained because they regulate gene activity. The 0-fold sites are expected to be especially constrained, because any possible nucleotide change will be non-synonymous, and will therefore alter the amino acid sequence of the protein, which is, more often than not, deleterious.
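The definitions of 4-fold and 0-fold sites given above can be made concrete with a short sketch that classifies a codon position under the standard genetic code. The function names are hypothetical and this is not the annotation pipeline used in this study; it only illustrates the definitions.

```python
# Minimal sketch: classify codon positions by degeneracy (standard genetic code).
# Function names are illustrative assumptions.

BASES = "TCAG"
# Amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table():
    """Return a dict mapping each of the 64 codons to its amino acid."""
    table = {}
    for i, first in enumerate(BASES):
        for j, second in enumerate(BASES):
            for k, third in enumerate(BASES):
                table[first + second + third] = AMINO[16 * i + 4 * j + k]
    return table


CODON_TABLE = build_codon_table()


def degeneracy(codon, pos):
    """Degeneracy of position pos (0, 1 or 2) within a codon.

    Returns 4 if all four nucleotides at that position encode the same amino
    acid, 0 if every change alters the amino acid, and 2 or 3 otherwise.
    """
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    same = sum(
        CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == amino_acid
        for base in BASES
    )
    return 0 if same == 1 else same


if __name__ == "__main__":
    print(degeneracy("GGT", 2))  # third position of a glycine codon
    print(degeneracy("GGT", 0))  # first position of the same codon
```

Under this definition, the third position of any glycine codon (GGN) is 4-fold, while the first and second positions of most codons are 0-fold.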

Sliding window graphs of chromosome-wide nucleotide diversity and the allele frequency spectrum were extracted from the VCF file using scripts written in Python 2.7. Windows included 1000 SNPs, with a step of 500 SNPs, for optimal resolution. The minor allele frequency spectra for both species were calculated from the VCF file with Python scripts, and graphs were produced using the R statistical programming software.
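The SNP-based windowing scheme can be sketched as follows. This is an illustration only: the original Python 2.7 scripts are not reproduced here, the function name sliding_windows is hypothetical, and the input is assumed to be a list holding one per-SNP value (for example, per-site heterozygosity) ordered by chromosomal position.

```python
def sliding_windows(values, size=1000, step=500):
    """Yield (start_index, window_mean) for overlapping windows of SNPs.

    `values` holds one statistic per SNP, ordered along the chromosome.
    Each window spans `size` consecutive SNPs and windows advance by `step`.
    Trailing SNPs that do not fill a complete window are ignored.
    """
    for start in range(0, len(values) - size + 1, step):
        window = values[start:start + size]
        yield start, sum(window) / float(size)
```

Because windows are defined by SNP count rather than by physical length, each window covers a different number of base pairs, depending on local SNP density.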

Summary statistics for Capsella bursa-pastoris and Capsella grandiflora, including nucleotide diversity, segregating sites and allele frequency, were generated using a version of the Perl script Polymorphurama (Bachtrog and Andolfatto, 2006) that was modified by the authors. The script input included FASTA files that were generated using the Capsella rubella genome annotation (JGI) and the genotype calls from GATK. Diversity statistics included two estimators of the population mutation parameter, pi (π; Nei, 1987) and Watterson's θW. Pi (π) is defined as the average pairwise number of nucleotide differences per site for a sample of DNA sequences (Tajima, 1983), and Watterson's θW summarizes the amount of nucleotide diversity based on the total number of segregating sites and the sample size in a group of DNA sequences (Watterson, 1975). The frequency spectrum of polymorphism was measured by Tajima's D statistic, which is based on the difference between π and θW (Tajima 1989). The effects of SNPs were predicted using the program SnpEff (Cingolani, P. "snpEff: Variant effect prediction", http://snpeff.sourceforge.net, 2012). For this analysis, SNPs fixed between homeologs (50% segregation) were also included as a separate category, to test for fixation of deleterious mutations within a homeolog. The distribution of fitness effects and proportion of adaptive substitutions were estimated using the program DFE Alpha, which relies on within-species polymorphism and between-species divergence, while accounting for differences in sample size in its calculations (Keightley and Eyre-Walker, 2007; 2012). Divergence estimates were calculated with the close relative Neslia paniculata as the outgroup.
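As a worked illustration of the three statistics defined above, the following Python sketch computes π, θW and Tajima's D from allele counts at biallelic segregating sites. It is not the modified Polymorphurama script. The function name diversity_stats and its input layout are assumptions made for this example, and the variance constants follow the standard published formulation of Tajima's D.

```python
import math


def diversity_stats(alt_counts, n, n_sites):
    """Return (pi per site, Watterson's theta per site, Tajima's D).

    alt_counts: alternate-allele count k (0 < k < n) at each segregating site
    n:          number of sampled chromosomes
    n_sites:    total number of sites surveyed, including invariant sites
    """
    s = len(alt_counts)  # number of segregating sites

    # Pi: expected pairwise differences, summed over segregating sites.
    pi_total = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in alt_counts)

    # Watterson's theta: segregating sites scaled by the harmonic number a1.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_total = s / a1

    # Tajima's D: difference between the two estimators, divided by the
    # estimated standard deviation of that difference.
    b1 = (n + 1.0) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2.0) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    d = (pi_total - theta_total) / math.sqrt(variance) if variance > 0 else float("nan")

    return pi_total / n_sites, theta_total / n_sites, d
```

A sample consisting only of singletons gives π below θW and therefore a negative D, whereas an excess of intermediate-frequency variants gives a positive D.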

Statistical significance of the differences between species, for both the ratio of non-synonymous to synonymous segregating sites and the number of stop codons compared to four-fold sites, was evaluated using a Chi-square test with software available online from the VassarStats Website for Statistical Computing (http://vassarstats.net/). Segregating four-fold synonymous sites were used for this comparison because they approximate the baseline mutation rate of the species. This allows a more direct comparison of the numbers of segregating and fixed stop codons between species, as it controls for sample size as well as the difference in overall genetic diversity between the two species. It is noteworthy that the algorithm used to calculate the distribution of fitness effects, which is based on within-species polymorphism and between-species divergence, is designed to take into account differences in sample size (Keightley and Eyre-Walker, 2007).
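The between-species comparison of site counts corresponds to a Chi-square test on a 2 x 2 contingency table (two species by two site classes). The sketch below is a stand-alone Python illustration, not the VassarStats implementation. It applies no continuity correction, and the function name chi_square_2x2 is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test for the 2 x 2 table [[a, b], [c, d]].

    For example, a and b could be the non-synonymous and synonymous
    segregating site counts in one species, and c and d those in the other.
    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / float((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```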

Results

Sliding windows of nucleotide diversity show a great deal more local heterogeneity in Capsella grandiflora compared to C. bursa-pastoris, for the same window size and number of SNPs (Figure 2A), while more global heterogeneity is visible across chromosomes in C. bursa-pastoris (Figure 2B). In this species, long stretches of chromosome possess diversity values that are consistently above or below the average (Figure 2B). This is likely a result of extended linkage disequilibrium generated by the population bottleneck in C. bursa-pastoris following the transition in ploidy and mating system from C. grandiflora. Average values of nucleotide diversity across chromosomes are reduced in Capsella bursa-pastoris compared to C. grandiflora (Figure 2). This is another signal of the population bottleneck, as well as the selfing mating system, in C. bursa-pastoris.

Sliding windows of Tajima's D statistic show a pattern similar to nucleotide diversity: greater local and less global variability across chromosomes in the frequency spectrum of Capsella grandiflora compared to C. bursa-pastoris (Figure 3). This likely also reflects the large scope of linkage disequilibrium in Capsella bursa-pastoris that was caused by its historical population bottleneck and shift in mating system. Average Tajima's D values across chromosomes are notably different in the two species: Capsella grandiflora has a negative average D (Figure 3A), while the average is positive in Capsella bursa-pastoris (Figure 3B). This indicates an excess of rare variants in Capsella grandiflora, especially for non-synonymous variants, with largely negative values of D (Figure 8), and an excess of intermediate-frequency variants, especially at synonymous sites, in Capsella bursa-pastoris. Synonymous D is only half as negative as non-synonymous D in Capsella grandiflora, indicating a greater excess of non-synonymous compared to synonymous rare variants (Figure 8).

The site frequency spectrum of minor alleles is shown for both species and different site types in Figure 4. The number of 0-fold sites exceeds 4-fold sites most prominently in the singleton category, where the minor allele count is one, indicating purifying selection. Frequencies of all site types decrease as the minor allele count increases, indicating a skew towards rare alleles in both species (Figure 4A).
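The minor allele spectrum summarized here can be tabulated from per-site allele counts as in the following Python sketch. It is illustrative only; the function name minor_allele_spectrum and the input format are assumptions rather than the scripts used in this study.

```python
from collections import Counter


def minor_allele_spectrum(alt_counts, n):
    """Return the folded site frequency spectrum as a list.

    alt_counts: alternate-allele count at each site
    n:          number of sampled chromosomes
    Element i of the result is the number of sites whose minor allele
    count equals i + 1; the first element is the singleton class.
    """
    spectrum = Counter()
    for k in alt_counts:
        minor = min(k, n - k)  # fold: count the rarer of the two alleles
        if minor > 0:          # skip sites that are invariant in the sample
            spectrum[minor] += 1
    return [spectrum.get(i, 0) for i in range(1, n // 2 + 1)]
```

Computing one such spectrum per site type (0-fold, 4-fold, intron, and so on) gives the per-category comparison shown in Figure 4.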

Consistent with the hypothesis of relaxed selection, the ratio of non-synonymous to synonymous segregating sites is significantly greater for Capsella bursa-pastoris compared to C. grandiflora (Figure 5, Chi-square test, p < 0.0001). The same is true for the ratio of segregating and fixed stop codons to four-fold synonymous sites (Figure 6, Chi-square test, p < 0.0001).

Both measures of nucleotide diversity, Watterson's Theta (Watterson, 1975) and Pi (Nei, 1987), reveal higher synonymous than non-synonymous diversity in both species, with Capsella grandiflora having higher diversity of both types and for both measurements, while also showing a greater difference between synonymous and non-synonymous diversity than Capsella bursa-pastoris (Figure 7).

The distribution of fitness effects for 0-fold mutations is shown in Figure 9. Capsella bursa-pastoris has a larger proportion of weakly deleterious mutations: those with an Ne(s) value of less than one, where 'Ne' is the effective population size and 's' is the selection coefficient. The pattern continues for moderately deleterious mutations, with Ne(s) values of up to 100, beyond which it becomes dramatically inverted (Figure 9). Mutations with Ne(s) values of greater than 100 are considered to be very strongly deleterious, and Capsella bursa-pastoris has a proportion of mutations in this category that is less than half that of C. grandiflora.

The frequency spectra of deleterious polymorphisms, including SNPs located in splice acceptor sites and splice donor sites, those causing the loss of start codons or gain of stop codons, and those in non-synonymous coding regions, are shown in Figure 10. Capsella grandiflora has many more rare deleterious mutations than Capsella bursa-pastoris, especially in the rarest category, which includes SNPs with a frequency of 0.125 in the population of sampled haplotypes. Capsella grandiflora has nearly six times as many rare splice site acceptor and donor mutations as C. bursa-pastoris (Figure 10A,B), more than four times the frequency of start codon losses (Figure 10C), more than six times the frequency of stop codon gain mutations (Figure 10D), and more than eight times the number of deleterious mutations in non-synonymous coding regions (Figure 10E). For deleterious mutations having frequencies of more than 50%, the numbers in Capsella bursa-pastoris well outnumber, and in some cases more than double, those in C. grandiflora for all types of deleterious mutations presented here (Figure 10A-E). The pattern is even more pronounced for fixed deleterious mutations. Capsella bursa-pastoris has many orders of magnitude more fixed deleterious mutations compared to Capsella grandiflora (Figure 10A-E).

It is noteworthy that the sample size differences make for a conservative comparison of the two species, as the greater number of C. grandiflora individuals should lend this analysis towards the capture of more rare alleles, yet consistently higher numbers are detected in Capsella bursa-pastoris.

The proportion (α) and relative rate (ωa) of adaptive substitutions are both higher in Capsella grandiflora compared to C. bursa-pastoris for all site types, including introns, intergenic regions, 0-fold sites, and 3-prime and 5-prime untranslated regions (Figures 11 and 12). The most striking differences between the two species are at intergenic regions and 0-fold sites, where the proportion and relative rate of adaptive substitution are many times higher in Capsella grandiflora than in Capsella bursa-pastoris, although the pattern holds for all site types (Figures 11 and 12).
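To clarify what the two quantities measure, the sketch below uses the simple McDonald-Kreitman-style estimator of α and the corresponding relative rate ωa. This is a deliberately simplified stand-in, not the DFE Alpha method used in this study: DFE Alpha additionally corrects for slightly deleterious polymorphism and demography, which this estimator does not. The function names and argument layout are hypothetical.

```python
def mk_alpha(d_selected, d_neutral, p_selected, p_neutral):
    """Simple McDonald-Kreitman-style estimate of alpha.

    d_selected, d_neutral: divergence counts at the focal site class and
                           at the neutral reference class (e.g. 4-fold sites)
    p_selected, p_neutral: polymorphism counts at the same two classes
    Alpha is the estimated proportion of substitutions at the focal class
    that were fixed by positive selection.
    """
    return 1.0 - (d_neutral * p_selected) / float(d_selected * p_neutral)


def omega_a(alpha, d_selected_rate, d_neutral_rate):
    """Rate of adaptive substitution relative to the neutral rate.

    d_selected_rate and d_neutral_rate are per-site divergence estimates
    for the focal and neutral site classes, respectively.
    """
    return alpha * d_selected_rate / d_neutral_rate
```

Because slightly deleterious polymorphisms inflate the polymorphism ratio, this uncorrected estimator tends to underestimate α, which is one motivation for model-based approaches such as DFE Alpha.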

Discussion

The genomic changes induced by whole-genome duplication, and the timescale in which they occur, remain largely unknown, despite the prevalence of polyploidy events among plant species. Here, I have investigated the differences between closely related polyploid and progenitor species, Capsella bursa-pastoris and Capsella grandiflora, at an unprecedented genomic resolution, in order to determine what changes in nucleotide polymorphism patterns have resulted from whole-genome duplication within the last 270-700 thousand years, since the two species diverged.

Genomic signatures of the population bottleneck and reduction in effective size that followed the transition to polyploidy and speciation in Capsella bursa-pastoris are abundant. The high degree of global heterogeneity across chromosomes in both nucleotide diversity (Figure 2B) and Tajima's D values (Figure 3B), compared to the locally heterogeneous values in C. grandiflora (Figures 2A; 3A), is a signature of the extended linkage disequilibrium caused by the bottleneck and shift to selfing, as large regions have either retained or lost diversity. Genome-wide average values of nucleotide diversity and Tajima's D statistic (Figures 7, 8) also reflect the bottleneck and mating system in Capsella bursa-pastoris, as an overall reduction in nucleotide diversity, and an excess of intermediate-frequency polymorphisms, is expected under a small effective population size.

Capsella grandiflora and Capsella bursa-pastoris both exhibit signs of purifying selection, which is the force preventing deleterious mutations from increasing in frequency. This is reflected in the site frequency spectrum, which contains a skew towards rare alleles, and an excess of 0-fold compared to 4-fold sites, in both species, particularly in the singleton category (Figure 4). The pattern provides evidence of purifying selection, because mutations at 4-fold synonymous sites may increase in frequency without hindering fitness, as they do not alter protein function, and are usually harmless. They are therefore less constrained by selection than 0-fold sites: 4-fold sites are fewer than 0-fold sites in the singleton category, but more numerous among the higher frequencies. The negative average Tajima's D values in Capsella grandiflora (Figure 8) may also be indicative of purifying selection on synonymous sites.

Relaxation of purifying selection in the polyploid Capsella bursa-pastoris compared to its diploid progenitor, C. grandiflora, is also apparent in the data. The significant increase in Capsella bursa-pastoris in the ratios of both non-synonymous to synonymous segregating sites (Figure 5; Chi-square test, p < 0.0001) and segregating stop codons to 4-fold synonymous sites (Figure 6; Chi-square test, p < 0.0001) is a clear indication of relaxed selection. Deleterious mutations such as non-synonymous SNPs and segregating stop codons are usually kept at low frequencies by purifying selection, as they can be expected to be highly damaging to the gene's protein product and overall function. If they increase in frequency, it is an indication that purifying selection is weak and is not preventing their spread through the population.

The distribution of fitness effects of new mutations at 0-fold non-synonymous sites quantifies the degree to which negative selection is relaxed in Capsella bursa-pastoris (Figure 9), by predicting the explicit effect that non-synonymous mutations have on fitness in terms of the effective population size (Ne) and the selection coefficient (s) (Keightley and Eyre-Walker, 2007). The larger proportion of weakly and intermediately deleterious mutations (Ne*s < 100) in Capsella bursa-pastoris indicates that many more mutations are less harmful to fitness in individuals of this species, compared to C. grandiflora, and/or that selection is less efficient at eliminating them. Capsella grandiflora has more than double the proportion of mutations with extreme deleterious effects on fitness (Ne*s > 100) compared to C. bursa-pastoris (Figure 9). Both results are likely due to gene redundancy in the polyploid. Mutations in Capsella bursa-pastoris may often not be harmful as long as one copy of the gene is functional, placing many of them in the weak or intermediate category, and leaving very few in the extremely deleterious category. Capsella grandiflora, without the advantage of genome-wide redundancy, possesses many more mutations that are significantly harmful to fitness (Figure 9). Therefore, the forces driving negative selection may be muted in C. bursa-pastoris, as fitness is much less dependent on the purging of harmful mutations in the genome.

This finding is further reinforced by examining the frequency distribution of several classes of harmful mutations, according to the SnpEff algorithm (Cingolani, 2012), which predicts the types of effects that mutations will have on the genome. I show the frequency distributions of the most harmful types of mutations, which include splice-site acceptor mutations (Figure 10A), splice-site donor mutations (Figure 10B), loss of start codons (Figure 10C), gain of stop codons (Figure 10D), and non-synonymous coding region mutations (Figure 10E). The frequency distributions of all these mutation types show the same pattern: much higher numbers in the rarest category, with a frequency of 12.5%, in Capsella grandiflora, and the opposite for the fixed category (Figure 10A-E). Capsella bursa-pastoris has allowed many more deleterious mutations to become fixed, while they are kept rare in C. grandiflora, lending further support to the inference that the strength of negative selection is greatly reduced in the polyploid.

Polyploids are expected to accumulate deleterious mutations, since they are buffered from harmful effects by the redundancy in their genomes (Matzke and Matzke, 1998; Wendel 2000). As long as one functional copy of each gene is present, the other may in many cases accumulate all manner of deleterious mutations without compromising the fitness of the organism, assuming it does not undergo neofunctionalization. Therefore, the need for negative selection to purge these mutations is greatly reduced, and their accumulation indicates a reduction in this force acting on the genome. Some mutations that compromise protein function may even be beneficial in a newly-formed polyploid genome, as it may be essential to silence certain genes whose protein products are required to maintain a specific dosage (Osborn et al., 2003), or where groups of genes participating in a particular regulatory network or metabolic pathway have co-evolved, and gene copies may have conflicting alleles. This can lead to the silencing of groups of genes involved in one type of pathway that are all from the same parent, in order to maintain the functioning of a tightly co-evolved network (Riddle and Birchler, 2003; Adams and Wendel 2005).

Adaptive evolution appears to be decreased in Capsella bursa-pastoris, compared to C. grandiflora, according to both the proportion (α, Figure 11) and rate (ωa, Figure 12) of adaptive substitutions. The pattern is especially strong in intergenic and 0-fold sites (Figures 11, 12). These results imply, contrary to some predictions, a loss of adaptive evolution in the tetraploid. However, this conclusion should be treated with some caution, since the signal of adaptive substitution can be masked by the presence of slightly deleterious mutations, which could be limiting the power to detect positive selection in C. bursa-pastoris (Keightley and Eyre-Walker, 2007).

With the caveat above about power in mind, there are a number of reasons why Capsella bursa-pastoris may experience less adaptive evolution than its diploid progenitor. The smaller effective population size can reduce the efficiency of selection, both positive and negative. Polyploidy itself can reduce the efficiency of natural selection, as the spread of favorable alleles is expected to be slower (Otto and Whitton, 2000). These two factors are not mutually exclusive, and it may be a combination of both that reduces the efficiency of selection in Capsella bursa-pastoris, leading to a lower rate of adaptive evolution. Finally, the time since the divergence of the two species, polyploid and progenitor, must be taken into account. Our investigation of the two genomes is a snapshot in time, and can only capture the patterns of polymorphism that are present at this moment during the evolution of the two species. We can infer that adaptive evolution is reduced in Capsella bursa-pastoris compared to C. grandiflora within the first 270,000-700,000 years of their divergence. It may be that adaptation will increase over time. In fact, the same is true for the relaxation of negative selection in C. bursa-pastoris. The result holds true at this point in the species' evolutionary history, and therefore occurred quickly following genome duplication.

This study demonstrates the effects of a transition to polyploidy, in the context of population bottleneck and shift in mating system, all three of which are common aspects of the colonization process in plants. The observations include an increase in chromosome-wide heterogeneity of the frequency spectrum and levels of nucleotide diversity, as large regions of the chromosomes have retained or lost genetic diversity, or polymorphisms of specific frequencies. These patterns, and an overall decrease in diversity and increase in intermediate-frequency polymorphisms reflect the reduction in effective population size caused by the population bottleneck and transition to inbreeding that accompanied speciation and the polyploidy event in C. bursa-pastoris.

An increase in adaptive evolution is not observed in Capsella bursa-pastoris, and in fact the trend is in the opposite direction. This may be due to the reduced effective population size reducing the efficiency of selection in this species, or it may simply represent a time lag, and adaptive evolution will increase later in the evolutionary history of this species. Another possibility is a lack of power to detect adaptive evolution using the DFE Alpha method.

The most striking observation is the genome-wide reduction in negative, or purifying, selection in the polyploid species. The result was expected due to gene redundancy, and has implications for future genome evolution, as selective constraint is relaxed, and more functional variation may be available for natural selection to act upon, which may increase adaptive potential and eventually lead to neofunctionalization of pseudogenes. We can conclude that relaxation of purifying selection has quickly followed the transition to polyploidy in Capsella bursa-pastoris. This is the first genome-wide evidence of a reduction in purifying selection in a polyploid.

Figures

Figure 1. Flow cytometry results for A) radish and B) radish and the Capsella bursa-pastoris sample from Spain. The y-axis shows the number of cells possessing the corresponding area (FL2-Area) on the x-axis.

Figure 2. Sliding windows of nucleotide diversity (pi) across chromosome 1, with a window size of 1000 SNPs and a step of 500 SNPs, shown for A) Capsella grandiflora, and B) Capsella bursa-pastoris, excluding SNPs fixed between homeologs. The dashed line indicates the overall average pi value for the chromosome.

Figure 3. Sliding windows of the site frequency spectrum (Tajima's D statistic) across chromosome 1, with a window size of 1000 SNPs and a step of 500 SNPs, shown for A) Capsella grandiflora, and B) Capsella bursa-pastoris, excluding SNPs fixed between homeologs. The dashed line indicates the overall average Tajima's D value for the chromosome.

Figure 4. Minor allele frequencies across all individuals and for all site type categories (4-fold, intergenic, 0-fold, 3-prime UTR, intron, and 5-prime UTR) in A) Capsella grandiflora and B) Capsella bursa-pastoris. The x-axis shows the minor allele count.

Figure 5. The ratios of the total genome-wide non-synonymous segregating sites compared to synonymous segregating sites in all individuals of Capsella grandiflora and Capsella bursa-pastoris. *** p < 0.0001.

Figure 6. The ratio of total genome-wide segregating and fixed stop codons to 4-fold synonymous sites for all individuals of Capsella grandiflora and Capsella bursa-pastoris. *** p < 0.0001.

Figure 7. Two measures of the genome-wide average synonymous and non-synonymous nucleotide diversity, Theta and Pi, shown for all individuals of Capsella grandiflora and Capsella bursa-pastoris.

Figure 8. Average genome-wide synonymous and non-synonymous Tajima's D statistic for all individuals of Capsella grandiflora and Capsella bursa-pastoris.

Figure 9. The distribution of fitness effects of new mutations at 0-fold sites, across all individuals, in both species. Fitness effects are shown by Ne(s) category (< 1, 1 to 10, 10 to 100, and > 100), with mutations with Ne(s) < 1 being the most weakly deleterious and mutations with Ne(s) > 100 being the most deleterious.

Figure 10. Frequency spectra of single nucleotide polymorphism (SNP) types that are predicted to have the most deleterious effects on fitness, or "high impact", including SNPs located in A) splice acceptor sites, B) splice donor sites, and SNPs causing C) the loss of a start codon, D) the gain of a stop codon, as well as E) non-synonymous SNPs in coding regions. In each panel the x-axis shows the frequency class of the mutation (0.125 to 1) for C. bursa-pastoris and C. grandiflora.

Figure 11. Proportion of adaptive substitutions (α) for all site types (intergenic, intron, 0-fold, 3-prime UTR, and 5-prime UTR) in Capsella grandiflora and Capsella bursa-pastoris.

Figure 12. Relative rate of adaptive substitutions (ωa) for all site types (intergenic, intron, 0-fold, 3-prime UTR, and 5-prime UTR) in Capsella grandiflora and Capsella bursa-pastoris.

References

Adams, KL, JF Wendel. 2005. Polyploidy and genome evolution in plants. Current Opinion in Plant Biology 8:135-141.

Bachtrog, D, P Andolfatto. 2006. Selection, recombination and demographic history in Drosophila miranda. Genetics 174:2045-2059.

Barrett, SCH. 2002. The evolution of plant sexual diversity. Nature Reviews Genetics 3:274-284.

Barringer, BC. 2007. Polyploidy and self-fertilization in flowering plants. American Journal of Botany 94:1527-1533.

Blanc, G, K Hokamp, KH Wolfe. 2003. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Research 13:137-144.

Blanc, G, KH Wolfe. 2004. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16:1679-1691.

Chantret, N, J Salse, F Sabot, et al. 2005. Molecular basis of evolutionary events that shaped the hardness locus in diploid and polyploid wheat species (Triticum and Aegilops). Plant Cell 17:1033-1045.

Cifuentes, M, L Grandont, G Moore, AM Chevre, E Jenczewski. 2010. Genetic regulation of meiosis in polyploid species: new insights into an old question. New Phytologist 186:29-36.

Cingolani, P. "snpEff: Variant effect prediction", http://snpeff.sourceforge.net, 2012.

Comai, L. 2005. The advantages and disadvantages of being polyploid. Nature Reviews Genetics 6:836-846.

Coyne, JA, NH Barton, M Turelli. 2000. Is Wright's shifting balance process important in evolution? Evolution 54:306-317.

Cui, LY, PK Wall, JH Leebens-Mack, et al. 2006. Widespread genome duplications throughout the history of flowering plants. Genome Research 16:738-749.

Danecek, P, A Auton, G Abecasis, et al. 2011. The variant call format and VCFtools. Bioinformatics 27:2156-2158.

Doyle, JJ, JL Doyle. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemistry: 11-15.

Doyle, JJ, LE Flagel, AH Paterson, RA Rapp, DE Soltis, PS Soltis, JF Wendel. 2008. Evolutionary Genetics of Genome Merger and Doubling in Plants. Annual Reviews Genetics 42:443-461.

Eyre-Walker, A, PD Keightley. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Molecular Biology and Evolution 26:2097-2108.

Fawcett, JA, S Maere, Y Van de Peer. 2009. Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event. Proceedings of the National Academy of Sciences USA 106:5737-5742.

Flagel, LE, JF Wendel. 2009. Gene duplication and evolutionary novelty in plants. New Phytologist 183:557-564.

Grover, CE, H Kim, RA Wing, AH Paterson, JF Wendel. 2007. Microcolinearity and genome evolution in the AdhA region of diploid and polyploid cotton (Gossypium). Plant Journal 50:995-1006.

Hegarty, MJ, SJ Hiscock. 2008. Genomic clues to the evolutionary success of polyploid plants. Current Biology 18:R435-R444.

Hilu, KW. 1993. Polyploidy and the Evolution of Domesticated Plants. American Journal of Botany 80:1494-1499.

Hurka, H, B Neuffer. 1997. Evolutionary processes in the genus Capsella (Brassicaceae). Plant Systematics and Evolution 206:295-316.

Kalisz, S, DW Vogler, KM Hanley. 2004. Context-dependent autonomous self-fertilization yields reproductive assurance and mixed mating. Nature 430:884-887.

Keightley, PD, A Eyre-Walker. 2007. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177:2251-2261.

Keightley, PD, A Eyre-Walker. 2012. Estimating the Rate of Adaptive Molecular Evolution When the Evolutionary Divergence Between Species is Small. Journal of Molecular Evolution 74:61-68.

Koch, MA, M Kiefer. 2005. Genome evolution among cruciferous plants: A lecture from the comparison of the genetic maps of three diploid species - Capsella rubella, Arabidopsis lyrata subsp. petraea, and A. thaliana. American Journal of Botany 92:761-767.

Leitch, AR, IJ Leitch. 2008. Perspective - Genomic plasticity and the diversity of polyploid plants. Science 320:481-483.

Leitch, IJ, MD Bennett. 2004. Genome downsizing in polyploid plants. Biological Journal of the Linnean Society 82:651-663.

Levin, DA. 1983. Polyploidy and Novelty in Flowering Plants. American Naturalist 122:1-25.

Li, H, B Handsaker, A Wysoker, T Fennell, J Ruan, N Homer, G Marth, G Abecasis, R Durbin, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.

Lynch, M, JS Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.

Mable, BK. 2004. Polyploidy and self-compatibility: is there an association? New Phytologist 162:803-811.

Masterson, J. 1994. Stomatal Size in Fossil Plants - Evidence for Polyploidy in Majority of Angiosperms. Science 264:421-424.

Matzke, MA, AJM Matzke. 1998. Polyploidy and transposons. Trends in Ecology & Evolution 13:241-241.

Mayrose, I, SH Zhan, CJ Rothfels, K Magnuson-Ford, MS Barker, LH Rieseberg, SP Otto. 2011. Recently Formed Polyploid Plants Diversify at Lower Rates. Science 333:1257-1257.

McKenna, A, M Hanna, E Banks, et al. 2010. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20:1297-1303.

Nei, M. 1987. Molecular Evolutionary Genetics. New York: Columbia University Press.

Osborn, TC, JC Pires, JA Birchler, et al. 2003. Understanding mechanisms of novel gene expression in polyploids. Trends in Genetics 19:141-147.

Otto, SP, J Whitton. 2000. Polyploid incidence and evolution. Annu Rev Genet 34:401-437.

Paetsch, M, S Mayland-Quellhorst, B Neuffer. 2006. Evolution of the self-incompatibility

system in the Brassicaceae: identification of S-locus receptor kinase (SRK) in self-

incompatible Capsella grandiflora. Heredity 97:283-290.

Ramsey, J, DW Schemske. 1998. Pathways, mechanisms, and rates of polyploid formation in

flowering plants. Annual Review of Ecology and Systematics 29:467-501.

Ramsey, J, DW Schemske. 2002. Neopolyploidy in flowering plants. Annual Review of Ecology

and Systematics 33:589-639.

Riddle, NC, JA Birchler. 2003. Effects of reunited diverged regulatory hierarchies in

allopolyploids and species hybrids. Trends in Genetics 19:597-600.

Seoighe, C, C Gehring. 2004. Genome duplication led to highly selective expansion of the

Arabidopsis thaliana proteome. Trends in Genetics 20:461-464.

Slotte, T, JP Foxe, KM Hazzouri, SI Wright. 2010. Genome-wide evidence for efficient positive

and purifying selection in Capsella grandiflora, a plant species with a large effective

population size. Molecular Biology and Evolution 27:1813-1821.

Slotte, T, HR Huang, M Lascoux, A Ceplitis. 2008. Polyploid speciation did not confer instant

reproductive isolation in Capsella (Brassicaceae). Molecular Biology and Evolution

25:1472-1481.

Soltis, DE, RJA Buggs, JJ Doyle, PS Soltis. 2010. What we still don't know about polyploidy.

Taxon 59:1387-1403.

Soltis, DE, PS Soltis. 1999. Polyploidy: recurrent formation and genome evolution. Trends in

Ecology & Evolution 14:348-352.

St Onge, K, JP Foxe, L Junrui, L Haipeng, K Holm, P Corcoran, T Slotte, M Lascoux, S Wright.

2012. Coalescent-based analysis distinguishes between allo- and autopolyploid origin in

shepherd's purse (Capsella bursa-pastoris). Molecular Biology and Evolution.

Stebbins, GL. 1957. Self fertilization and population variability in the higher plants. American

Naturalist 91:337-354.

Tajima, F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics

105:437-460.

Tajima, F. 1989. DNA polymorphism in a subdivided population: the expected number of

segregating sites in the two-subpopulation model. Genetics 123:229-240.

te Beest, M, JJ Le Roux, DM Richardson, AK Brysting, J Suda, M Kubesova, P Pysek. 2012.

The more the better? The role of polyploidy in facilitating plant invasions. Annals of

Botany 109:19-45.

Van de Peer, Y, S Maere, A Meyer. 2009. The evolutionary significance of ancient genome

duplications. Nature Reviews Genetics 10:725-732.

Villar, R, EJ Veneklaas, P Jordano, H Lambers. 1998. Relative growth rate and biomass

allocation in 20 Aegilops (Poaceae) species. New Phytologist 140:425-437.

Wang, BS, ZY Ding, W Liu, J Pan, CB Li, S Ge, DM Zhang. 2009. Polyploid evolution in Oryza

officinalis complex of the genus Oryza. BMC Evolutionary Biology 9:250.

Watterson, GA. 1975. On the number of segregating sites in genetical models without

recombination. Theoretical Population Biology 7:256-276.

Wendel, JF. 2000. Genome evolution in polyploids. Plant Molecular Biology 42:225-249.

Whitlock, MC, PC Phillips, K Fowler. 2002. Persistence of changes in the genetic covariance

matrix after a bottleneck. Evolution 56:1968-1975.

Wolf, DE, N Takebayashi. 2004. Pollen limitation and the evolution of androdioecy from dioecy.

American Naturalist 163:122-137.

Wood, TE, N Takebayashi, MS Barker, I Mayrose, PB Greenspoon, LH Rieseberg. 2009. The

frequency of polyploid speciation in vascular plants. Proceedings of the National

Academy of Sciences USA 106:13875-13879.

CHAPTER 5

CONCLUSIONS

The establishment of a species in a new geographic range has profound implications for its evolution. Founder effects such as genetic drift, and shifts in the amount and distribution of genetic diversity, can interact with new selection pressures and with other genetic changes that may accompany range expansion, such as shifts in mating system and chromosome copy number. In this thesis, I have investigated the genotypic and phenotypic differences between three members of the Capsella genus, which differ in colonization history, mating system, and ploidy, with respect to three factors that are likely to be relevant to colonization: nutrient use and phenotypic plasticity thereof, disease resistance, and genome duplication.

My common garden experiment comparing the nitrogen use efficiency, plant performance traits, and phenotypic plasticity of these species demonstrated extensive population-level variation within all three species. While no associations between species colonization history and any of these traits were found, we can infer from the population-level variation that phenotypic diversity was preserved in the two colonizing species following the genetic bottlenecks and shift to an inbreeding mating system that both species experienced, and the transition to polyploidy in Capsella bursa-pastoris. I have shown that, while genome-wide diversity was reduced in these species by the aforementioned processes, variation in nutrient use, performance traits, and plasticity was preserved. This variation may provide the fuel for local adaptation in these populations, if it has not done so already.

My analysis of the NBS-LRR regions of the disease resistance genes in Capsella grandiflora and C. rubella demonstrated that nucleotide diversity was maintained at these loci following major reduction in genetic diversity in C. rubella. Polymorphism patterns at plant disease resistance genes are expected to be influenced by balancing selection, either historical or ongoing, which maintains nucleotide diversity. Given the ecological importance of disease resistance, and the genetic diversity observed at the R-genes, we can infer that it is possible to preferentially retain genetic diversity at certain loci, even following a genome-wide bottleneck and transition to inbreeding, if selection pressures dictate.

These two projects demonstrate that phenotypic and genetic diversity can be maintained through a population bottleneck and transition to inbreeding. The whole-genome sequencing comparisons between Capsella bursa-pastoris and Capsella grandiflora demonstrated a major reduction in purifying selection on the genome of C. bursa-pastoris, a bottlenecked selfing polyploid that is a tremendously successful colonizer. The relaxation of purifying selection caused a shift in the spectrum of polymorphisms from being deleterious towards being neutral. The excess of neutral diversity and decrease in deleterious mutation load may have helped Capsella bursa-pastoris to overcome the effects of genetic drift following the bottleneck, or of inbreeding depression following the transition to selfing. Genetic drift causes more deleterious mutations to increase in frequency by chance, and inbreeding leads to increased homozygosity, allowing recessive deleterious mutations to become expressed. Deleterious mutations shifting towards becoming neutral may buffer the genome against those effects.

In this thesis, I have investigated the consequences of colonization for two different plant species, Capsella rubella and Capsella bursa-pastoris, following divergence from their range-stable progenitor Capsella grandiflora. I have demonstrated the maintenance of phenotypic diversity and plasticity with respect to nitrogen use efficiency and plant performance in both Capsella rubella and C. bursa-pastoris. My analyses also show the retention of genetic diversity at disease resistance loci in Capsella rubella. Finally, I provide evidence for the genome-wide reduction in purifying selection and degree of deleterious polymorphism in Capsella bursa-pastoris. These results provide an overview of the evolutionary consequences that the colonization process has had in the Capsella genus, with regard to nitrogen use, disease resistance, and polyploidy.
