Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1632

Genome evolution and adaptation of a successful allopolyploid, bursa-pastoris

DMYTRO KRYVOKHYZHA

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0236-2 UPPSALA urn:nbn:se:uu:diva-341709 2018 Dissertation presented at Uppsala University to be publicly examined in Lindhalsalen, Norbyväagen 18, Uppsala, Tuesday, 27 March 2018 at 10:00 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Dr. Vincent Castric (CNRS / Université de Lille 1, France).

Abstract Kryvokhyzha, D. 2018. Genome evolution and adaptation of a successful allopolyploid, Capsella bursa-pastoris. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1632. 51 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0236-2.

The term allopolyploid refers to an organism that originated through hybridization and increased its ploidy level by retaining the unreduced genomes of its parents. Both hybridization and polyploidy usually have negative consequences for the organism. However, there are species that not only survive these modifications but even thrive and can outcompete their diploid relatives. There are many intuitive explanations for the success of polyploids, but the number of empirical studies is limited. The shepherd's purse (Capsella bursa-pastoris) is an emerging model for studying a successful allopolyploid species. C. bursa-pastoris occurs worldwide, whereas its parental species, and Capsella orientalis, have more limited distribution range. C. grandiflora is confined to Northern Greece and Albania, and C. orientalis is found only in the steppes of Central Asia. We described the genetic variation within C. bursa-pastoris and showed that it is not homogeneous across Eurasia but rather subdivided into three genetically distinct populations: one comprises accessions from Europe and Eastern Siberia, the second one is located in Eastern Asia and the third one groups accessions around the Middle East. Reconstruction of the colonization history suggested that this species originated in the Middle East and subsequently spread to Europe and Eastern Asia. This colonization was probably human-mediated. Interestingly, these three populations survive in different environmental conditions, and yet most gene expression differences between them could be explained by neutral processes. We also found that despite a common history within one species, the two subgenomes retained differences already present between the parental species. In particular, the genetic load was still higher on the subgenome inherited from C. orientalis than on the one inherited from C. grandiflora. The two subgenomes were also differentially influenced by introgression and selection in the three genetic clusters. Gene expression variation was highly correlated between the two subgenomes but the total level of expression showed variation in parental dominance across flower, leaf, and root tissues. This thesis for the first time shows that the evolutionary pathways of allopolyploids may differ not only on the species level but also between populations within one species. It also supports the theory that alloploidy provides an increased amount of genetic material that enables evolutionary flexibility.

Keywords: allopolyploidy, population structure, selection, genetic drift, gene expression, parental legacy, genetic load

Dmytro Kryvokhyzha, Department of Ecology and Genetics, Norbyvägen 18 D, Uppsala University, SE-752 36 Uppsala, Sweden.

© Dmytro Kryvokhyzha 2018

ISSN 1651-6214 ISBN 978-91-513-0236-2 urn:nbn:se:uu:diva-341709 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-341709) ”If you thought that science was certain - well, that is just an error on your part.” – Richard P. Feynman

List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Cornille A., Salcedo A., Kryvokhyzha D., Glémin S., Holm K., Wright S.I. and Lascoux M., 2016. Genomic signature of successful colonization of Eurasia by the allopolyploid shepherd’s purse (Capsella bursa-pastoris). Molecular ecology, 25(2), pp. 616-629.

II Kryvokhyzha D.*, Holm K.*, Chen J., Cornille A., Glémin S., Wright S.I., Lagercrantz U. and Lascoux M., 2016. The influence of population structure on gene expression and flowering time variation in the ubiquitous weed Capsella bursa-pastoris (). Molecular ecology, 25(5), pp. 1106-1121.

III Kryvokhyzha D.*, Salcedo A.*, Eriksson M., Duan T., Tawari N., Chen J., Guerrina M., Kreiner J.M., Kent T.V., Lagercrantz U., Stinchcombe J., Glémin S., Wright S.I., and Lascoux M., 2017. Parental legacy, demography, and introgression influenced the evolution of the two subgenomes of the tetraploid Capsella bursa-pastoris (Brassicaceae). bioRxiv, 234096.

IV Kryvokhyzha D., Duan T., Orsucci M., Milesi P., Glémin, S., Wright, S.I., and Lascoux, M., 2018. Towards the new normal: Genomic and transcriptomic changes in the two subgenomes of a 100,000 years old tetraploid, Capsella bursa-pastoris. (Manuscript)

* These authors contributed equally to the work. Reprints were made with permission from the publishers.

Contents

1 Introduction ...... 9 1.1 Evolutionary significance of polyploidy and hybridization ...... 10 1.2 Population structure, genetic drift and adaptation ...... 14 1.3 Challenges of studying allopolyploid organisms ...... 18 1.4 A promising model system to understand allopolyploidy: C. bursa-pastoris ...... 19

2 Present study ...... 24

3 Conclusions and future perspectives ...... 35

Svensk sammanfattning ...... 37

Резюме українською ...... 39

Acknowledgements ...... 41

References ...... 43

1. Introduction

If you ate oatmeal or a sandwich for breakfast today, you ate a polyploid. Oat- meal is made from oat Avena sativa, which is a polyploid. Bread flour is usu- ally produced by graining seeds of either common wheat Triticum aestivum or durum wheat Triticum turgidum, both of which are also polyploids. Moreover, these three species are also hybrids. Polyploids and hybrids are quite common among domesticated , and with the recent advent of molecular tools, it has become evident that they are also abundant in the wild. Polyploid is an organism that has more than two paired sets of chromo- somes. The majority of macroorganisms have a diploid set of chromosomes (2n), but some animals and many plants have tri- (3n), tetra- (4n), hexa- (6n) and even octoploid (8n) sets of chromosomes. Such increase in the ploidy level may arise from a doubling of the same genome within a cell, resulting in an autopolyploid, or from merging two genomes from different species during hy- bridization, resulting in an allopolyploid (Fig. 1.1). Both autopolyploidy and allopolyploidy imply significant genetic changes that disrupt cellular functions and usually lead to extinction. However, hybrids and polyploids that manage to survive often become successful species and can even outperform their diploid relatives. Studying the phenomena associated to polyploidization may shed more light on our understanding of the link between the content of a genome and fitness. The genetic variability of polyploids has often been used in the domestication process. More than half of all crop species are polyploids. The frequency of polyploids in the wild, while significantly lower, remains high ( 40%) [1]. There has been a lot of progress in studying the genomic consequences of poly- ploidy in crops [2, 3]. However, polyploidy was rather a precondition of do- mestication than a consequence [1]. So, we also need to study the origin and mechanisms of evolution in natural polyploid species. The genome-wide analysis of allopolyploid species is, however, not easy. The significant size of polyploid genomes, the aggregate of two or more diver- gent genomes and the presence of multiple alleles and loci, as well as massive structural and functional rearrangements, make it challenging to assemble the genome, obtain accurate genotype data and analyze it with existing popula- tion genetics models that are primarily developed for diploid data. This is the main reason why genomic studies of polyploids are still lagging behind those of diploids. This thesis is a contribution to the progress of genomic studies of allopoly- ploids. Here, I summarize our investigations of genome evolution and adap- tation in one of the most successful natural polyploids, the shepherd’s purse

9 > Homoploid hybrid

> Allopolyploid

> Autopolyploid

Figure 1.1. Scheme of origin of a homoploid hybrid, allopolyploid and autopoly- ploid species. Parental species contain two chromosomes (2n). Diploid hybrid (homo- ploid) species arises through merging of one chromosome from each parental species. Allopolyploid species originates the same way except that it combines the entire chro- mosomal sets of both parental species. Autopolyploid species emerges by chromosome doubling of the same species.

Capsella bursa-pastoris. The results show that evolution of allopolyploids is more complex than previously imagined. In particular, our results indicate that both demography and introgression played an important role in the evolution of the shepherd’s purse.

1.1 Evolutionary significance of polyploidy and hybridization The evolutionary role of polyploidy has long been a subject of debate. In the 20th century, G. L. Stebbins stated that polyploidy contributed little to progres- sive evolution [4]. W. H. Wagner also denied the significance of polyploidy and regarded it as ”evolutionary noise” [5]. On the other hand, R. J. Schultz insisted that polyploidy was far from playing a secondary role in evolution and had actually contributed to major steps in evolution [6]. Despite important progress in our understanding of polyploids since these early studies [7], it is still debated whether polyploidy is an evolutionary dead-end or a major driver of diversification [8, 9, 10, 11]. The evolutionary significance of hybridization has also been controversial. Charles Darwin himself changed his point of view from discounting hybridiza-

10 tion as a source of new forms (”I am very far from believing in hybrids...”, Darwin’s letter to Joseph Hooker in 1856) to emphasizing its significance (”the several kinds of dogs are almost certainly descended from more than one spe- cies, and so it is with cattle, pigs and some other domesticated animals.”, Vari- ation of Animals and Plants under Domestication, 1890). The progressive role of hybridization was totally discarded during the era of the biological species concept that viewed species as discrete reproductively isolated units [12]. Dur- ing the last two decades, the view of hybridization as a source of diversification and adaptation has been revived [13]. Moreover, with the advent of DNA se- quencing and genome-scale analyses, it has become evident that hybridization resulting in genomic introgression and speciation is widespread [14, 15]. The evolutionary significance of polyploidy is usually attributed to its abil- ity to rapidly produce new vigorous forms. An increase of ploidy results in nearly instantaneous reproductive isolation from parental species [16]. For example, if a newly emerged tetraploid (4n) individual crossed with a diploid (2n) ancestor, it would produce a triploid (3n) offspring. Having an odd num- ber of chromosomes, such triploid progeny would be sterile due to problems with chromosome pairing during meiosis. Reproductive isolation may also arise from switching to the asexual reproduction mode, e.i. reproduction from maternal tissues bypassing sexual fusion of an egg and a sperm. Asexual re- production in polyploids is often realized through parthenogenesis, a process when an embryo develops from an unfertilized egg cell. This is a common reproduction mode in polyploid animals [17]. Polyploids may also reproduce by selfing. The change of ploidy level often disrupts the self-incompatibility system and allows self-fertilization, which is especially common among plants [18, 19]. Thus, polyploidy is one of the few mechanisms that may lead to sym- patric speciation (speciation without geographic isolation). New polyploid lineages often demonstrate superiority relative to diploid species, a phenomenon known as polyploid vigor. This vigor is generally attributed to increased genetic variability. First, polyploid genomes are usu- ally unstable at early stages and experience massive rearrangements and frag- ment loss [20]. These major genomic changes introduce more variation into a polyploid population that survives initial stages of formation and thus provides more material that can be selected for. Second, polyploids have several copies of each gene. This gene redundancy also benefits a polyploid because genes copies can gain new functions while retaining the old function. Third, poly- ploids have greater ability to mask deleterious mutations and suffer less from inbreeding depression. Because polyploids have several genomic copies, dele- terious mutations that are usually recessive have smaller chance to influence a phenotype. For instance, take a diploid that has a genotype with two dominant alleles AA. If one of its two alleles mutates to the deleterious recessive allele a, the resulting genotype is Aa, and a allele needs to replace only one dominant allele A to produce a genotype aa and change the phenotype. Whereas one recessive allele would need to replace three dominant alleles in a tetraploid to

11 achieve the same phenotypic change because the genotype of a tetraploid is AAAa. Several genomic copies also ensure that crossing between closely re- lated individuals, known as inbreeding, rarely leads to homozygous offspring and thus does not have a deleterious effect. Thus, deleterious mutations need to reach higher frequency in polyploids than in diploids to achieve a homozygous state and influence the phenotype. Fourth, polyploids often experience rapid changes in patterns of gene expression due to gene duplication. Duplicated genes may gain new functions (neo-functionalization), partition their origi- nal ancestral function (subfunctionalization) or completely lose their function (pseudogenization) [21, 22, 23]. Finally, these genetic changes also influence the proteome, metabolome and phenotypic variation. A few studies that have investigated the effect of polyploidy on proteome and metabolome show that the variation in proteome and metabolome may substantially differ between polyploids and parental species [24, 25]. Polyploids also tend to possess en- larged organs and overall larger size [26]. All these genetic and phenotypic modifications of polyploids provide more variability that may facilitate sur- vival. Hybridization also increases genetic variance and can produce new vigorous forms. The genetic variance is boosted by genomic re-patterning because of the conflict between two divergent genomes and release of transposable elements from the suppression mechanisms established within each parent. The ge- nomic conflict between two divergent genomes also disrupts gene expression regulation networks and lead to non-additive gene expression patterns when expression levels are not non-intermediate between parental species. This new variation and mixture of different parental traits may result in transgressive phenotypes, whereby a hybrid exhibits extreme traits relative to its two par- ents. Such traits may facilitate colonization of new ecological niches that are not accessible to parental species. Hence, hybridization may result into hy- brid speciation. The hybrid species is then isolated from its parental species by its novel ecological niche. The most famous examples of such a speciation scenario include Helianthus sunflowers, Rhagoletis flies, Cottus sculpins, Ly- caeides butterflies [27, 28, 29, 30]. Isolation from parental species can also arise due to assortative mating when hybrids have different mating seasons or phenology from the parental species and mate only among themselves as in some hybrid Heliconius butterflies [31]. These are scenarios of homoploid hy- brid speciation because the chromosome number of a hybrid does not increase (Fig. 1.1). However, hybridization can also be accompanied by chromosomal dou- bling, i.e. allopolyploidy (Fig. 1.1). Allopolyploid forms can emerge from chromosome doubling in a formed diploid hybrid or from a fusion of two unreduced gametes during the hybridization event [32, 33]. There is also a way through a triploid bridge when an unreduced diploid gamete fuses with a haploid gamete from a diploid species to form a triploid, and later this triploid produces an unreduced triploid gamete that fuses with a haploid gamete of a

12 diploid. There are numerous extant allopolyploid species but in most cases it is still unclear how they originated. A few documented examples include Primula kewensis that originated from somatic chromosome doubling in a diploid hybrid [34], a tetraploid strain of silk moths (Bombyx) that was ob- tained through a triploid bridge [35], and fusion of diploid gametes was used in the production of synthetic polyploids of red clover and potato [36, 37]. Al- lopolyploids combine features of both hybrids and polyploids. They are more diverse than autopolyploids because they comprise two divergent genomes in- stead of one duplicated genome. They also suffer less from aneuploidy than homoploid hybrids and autopolyploids because each divergent subgenome is doubled and thus can find a pair during meiosis. This is why allopolyploids usually have even chromosome number, most frequently 4n, and exhibit di- somic inheritance when divergent subgenomes do not recombine. Allopoly- ploids also benefit from having two divergent genomes by differentially ex- pressing genes from the two subgenomes (homeologue-specific expression). Hybridization may also contribute to adaptability and diversification thro- ugh genomic introgression when one species just acquires a fragment of ge- netic material from another species. It is well known that bacteria thrive on introgression that often confers adaptive traits [38]. There is also growing ev- idence of such benefits among plants and animals. For example, transfer of herbivore resistance in sunflower Helianthus [39] and insecticide resistance in malaria mosquitoes [40] have been attributed to introgressive hybridization. Introgressive hybridization has also contributed to color pattern diversity of Heliconius butterflies [41] and adaptive radiations in Darwin’s finches [42]. There are even evidence for archaic adaptive introgression in humans [43]. All these factors that increase genetic diversity and were described above as advantageous can have negative effects as well. Assuming that the genome structure and regulatory networks were optimized under selection in the paren- tal species, both increase of ploidy level and hybridization would disrupt this long-established design and result in fitness loss. Moreover, an increase in the copy number of chromosomes causes problems with mitosis and meiosis during cell divisions and changes cellular architecture due to an increased nu- cleus size [44]. Forms that switch to self-fertilization suffer from the negative effect of inbreeding depression. Hybrid forms experience conflicts between divergent genomes and may suffer from outbreeding depression. Maybe this is the reason why some allopolyploids evolve towards functional diploidiza- tion when subgenomes experience differential gene loss, deletion, and down- regulation of gene expression [45, 46, 47, 48]. On balance, hybridization and polyploidy are more often deleterious than advantageous but those individuals that survive frequently become evolutionary successful lineages.

13 1.2 Population structure, genetic drift and adaptation Species are highly dynamics entities: their size, density, and geographical lo- cation fluctuate over time; they are also often highly fragmented over space. This seriously complicates the study of species and the reconstruction of their histories. Even if one is primarily interested in a trait that, at first glance, is not related to demographics, for instance, the relation between the two subgenomes of a polyploid, the details of the demographic history of the polyploid species will need to be considered. Demography will for instance influence the evolu- tion of the trait through the local and overall amount of random genetic drift. Structured populations will also face different environmental conditions and be under different selection pressures. A textbook example of the impact of environmental changes and local adaptation is the Peppered moth (Biston be- tularia). It was found that dark moths were prevalent in the polluted environ- ment around Birmingham, UK, because they were less conspicuous on dark tree trunks, while light moths were common in a clean environment such as Devon, UK, where tree trunks were covered with lichens and thus were light in color [49]. This difference is proven to be due to differential selection [50]. However, as hinted above, evolutionary differences between populations are not necessarily the result of selection. Differences may also arise simply due to neutral processes. Suppose there are two well-isolated populations and a new mutation arises in one of these populations. Its probability to disappear from the population is high, even if it is positively selected. This is nicely illustrated by J. Haldane’s [51] early result on the probability of fixation of a positively selected allele: the probability of fixation of a mutation with a selected advan- tage s is 2s. If s = 0.01, a large value, the probability of fixation will be a mere 2%! Of course, a neutral mutation generally has an even bleaker future. If by chance the mutation does survive and increases in frequency, a popula- tion with this new allele will become divergent from a population without this allele because the two populations do not exchange alleles. Over time there will be many mutations, some will go to fixation, primarily neutral or nearly neutral, and populations will diverge through random genetic drift [52]. The concept of random genetic drift stemmed from the work of J. Haldane, S. Wright, and R. Fisher but gained prominence through M. Kimura’s neutral theory of molecular evolution [53]. The neutral theory of molecular evolu- tion states that most changes observed at the molecular level reflect a balance between mutations that are functionally neutral or nearly neutral and random genetic drift acting on these mutations in a population of finite size. The neu- tral theory of evolution does not state that adaptive selection does not occur but assumes that it plays a minor part in changes at the genome level. The neutral theory has been challenged by a selection theory of evolution which states that truly neutral mutations represent a minority of all changes and that selection is a major force of evolution [54]. Whether the majority of changes are neutral or adaptive is still hotly debated [54, 55, 56]. Independently of this debate,

14 the neutral theory laid a foundation for statistical frameworks to distinguish natural selection from random genetic drift at the molecular level [57]. The difficulty in telling apart different theories of molecular evolution stem- med in great part on the difficulty to measure random genetic drift. Since the seminal work of S. Wright, this is done through estimation of the so-called ef- fective population size, Ne [58]. The effective population size of a population, Ne, is the size of an idealized population, the Wright-Fisher population, that has experienced the same amount of random genetic drift than the population of interest. The Wright-Fisher population is a population in which individuals mate randomly and where the only source of change in allele frequencies is allele sampling. In other words, a population without structure and without se- lection or migration. Of course, this is a highly idealized situation. At neutral site, the effective size is directly and simply related to genetic diversity and this is captured by the quantity θ = 4Neµ, where µ is the mutation rate and θ is genetic diversity that can be estimated by the number of pairwise differences between DNA sequences (Tajima’s π) [59] or by the number of segregating sites in DNA sequences (Watterson’s θ) [60]. Hence, assuming equal or sim- ilar mutation rate, a more genetically diverse population has a larger Ne than a less diverse one. The relationship between random genetic drift and Ne is well illustrated in Figure 1.2. In a large population of 2000 individuals, an allele does not deviate too much from its original frequency of 0.5 over the time of 50 generations. However, when the population size is as small as 20 individuals, there is an increasing effect of randomness and an allele can get fixed (frequency = 1) or get lost (frequency = 0) even within 10 generations. If the three populations from the Figure 1.2 were real populations, they would strongly differ in their allele frequency simply due to stochasticity. The effectiveness of selection also depends on Ne. Selection is more ef- fective in populations with large Ne because they are less affected by genetic drift, and the contrary is true for populations with small Ne. More precisely, the probabilities of an advantageous mutation to go to fixation and a deleteri- ous mutation to be eliminated is a function of its initial frequency 1/2Ne and the product Nes, where s measures the strength of selection. The fate of new mutation is largely determined by selection when Nes is much greater than one, and by random genetic drift when Nes is much smaller than one. Thus, if the strength of selection is the same in two populations but their Ne differ, levels of adaptation as the outcome of this selection will be different in these two populations. The level of Ne is affected by several factors. A population may go through a bottleneck, i.e. a strong decrease in size over a short period of time. This can greatly reduce Ne and such populations will evolve largely under random genetic drift. A population bottleneck can be caused by some environmental event (e.g. drought, fire, overhunting), or when a small fraction of the original populations colonizes a new location (founder event). For example, selfing species likely go through repeated bottlenecks when they expand across a new

15 Figure 1.2. Effect of population size on genetic drift. Ten simulations each of ran- dom change in the frequency distribution of a single hypothetical allele over 50 gen- erations for different sized populations; first population size n=20, second population n=200, and third population n=2000. (Image by Professor marginalia, distributed un- der a CC BY-SA 3.0 license).

16 territory. Ne can also decrease with the switch of reproductive mode from outcrossing to self-fertilization, which frequently occurs in plants [61]. Ne can gradually increase when a population is growing or when new variation is introduced by migrants from other populations. It can be concluded from the considerations above that natural populations are constantly under selection, leading to the evolution of adaptive traits, and genetic drift, leading to neutral variation. Distinguishing the two processes is one of the greatest challenges of evolutionary biology. This problem becomes particularly obvious in structured populations. Samples from two divergent populations exhibit differences that may look adaptive, but in fact, they are caused by genetic drift. One of the ways to separate the two processes is to as- sume that populations evolved largely neutrally if the variation between them can be explained by population structure. This is done through various math- ematical models that take into account the population structure during tests for adaptive differences [62, 63, 64]. This control for population structure, however, may also mask real effects if the distribution of adaptive variation coincides with population structure. For example, the first association study between variation in Dwarf8, a gene regulating flowering, and flowering time in maize revealed that the association was due to population structure [65], but a later study with larger sampling proved that it was significant [66]. Thus, knowledge on the population structure of a species is extremely important if the aim is to find adaptive variation. With growing evidence of inter-specific hybridization [14], admixture with other species in structured populations also need to be considered. One pop- ulation of a species may be in contact with other not reproductively isolated species and this may lead to genomic introgression specifically in this popula- tion. These introgressed genomic regions can be neutral, but they may appear as the regions under selection in some selection tests (e.g. selective sweeps scans) because they stand out from the genome-wide and species-wide varia- tion. Without the information on possible admixture with other species, such results may spuriously be interpreted as a local adaptive change because it is present only in one population. Admixture can also be strong enough to ho- mogenize the difference between species and disrupt the whole-genome phy- logeny for some populations. For example, a phylogenetic analyses of Helico- nius elevatus and Heliconius pardalinus butterflies revealed that they are more genetically related to each other in the area where their distribution ranges overlap than to conspecific populations where these two species are geograph- ically isolated [67]. An introgression can, of course, be under selection and truly adaptive. For example, in Spain, where house mice (Mus musculus do- mesticus) can hybridize with the Algerian mouse (Mus spretus), the majority of house mice have resistance to anticoagulant rodenticides that they acquired through introgression of the gene vkorc1 from the Algerian mouse [68]. House mice from other parts of Europe (Greece, Italy, UK), where the Algerian mouse does not occur, do not have vkorc1 that provide such resistance. Thus, know-

17 ing whether some populations of a species can potentially be admixed with any other species is extremely important. The population genetics analyses of Capsella bursa-pastoris presented in this thesis illustrate well how a species can be structured into several popula- tions and how these populations differ in their environments, effects of genetic drift, targets and strengths of selection. This thesis also demonstrates how dif- ferent populations of one species can be influenced by admixture with sister species. It also provides a good example of challenges associated with analyses requiring a correction for population structure.

1.3 Challenges of studying allopolyploid organisms To unravel the complex evolutionary history of allopolyploid organisms, one has to solve many practical and theoretical issues. Many allopolyploid species do not have a reference genome or the quality of their genomes is poor. This is primarily because genome assembly tools that reconstruct a whole genome from short DNA reads coming from a sequencing machine are not designed to resolve genome complexity of allopolyploids that have large genomes with many repeats, paralogs, and high heterozygosity. Genome assembly tools also generally do not handle properly the hybrid nature of allopolyploid genomes - regions of low divergence between two subgenomes would be collapsed to a single sequence by an assembler. Even such economically important species as bread wheat (Triticum aestivum) had only 61% of its genome sequence assem- bled [69] and only recent developments in both genomics and bioinformatics led to near-completion of its genome [70] A general approach to deal with the absence of a reference genome for an allopolyploid species is to use an available reference genome of the closest diploid species. However, this approach has some issues that a researcher should be aware of. First, one of the subgenomes of the allopolyploid may be more related to the reference genome than the other subgenome and this will increase the success of mapping genomic reads from the more closely re- lated subgenome. This mapping bias is not that critical for DNA data because at sufficient coverage most of the variation will be captured correctly. This is, however, a serious problem for gene expression analyses because mapping bias in RNA reads would directly influence the results. To reduce the severity of this mapping bias, one has to specify the correct divergence from the refer- ence genome during mapping. Bias in the RNA data can also be reduced by removing sites that show strong biases in the DNA data. Second, to separate the two subgenomes, the mapping data needs to be phased. There are several programs that can phase the data into haplotype blocks [71], but the concatena- tion of these blocks into continuous sequences representing two subgenomes is only possible if the origin of an allopolyploid is known and parental ge- nomic data is available. Allopolyploids usually exhibit disomic inheritance

18 (two subgenomes do not recombine) and this allows the use of the informa- tion provided by fixed differences between the two subgenomes and the two parental species to perform the concatenation. Third, most of genome analysis software, as well as population genetic models, are developed for haploid or diploid data and they expect bi-allelic variation (two alleles per variable site). Allopolyploids, however, have four alleles at each genomic site and because of their hybrid nature, some variable sites may include more than two alleles. To deal with this issue, one usually has to treat tetraploid data as diploid data and remove all non-biallelic sites. This simplification is based on the fact that selfing and disomic inheritance that are common in allopolyploids lead to the strong reduction of heterozygosity within each subgenome and hence variation within subgenomes can be neglected. The nature of allopolyploid data also makes it complicated to compare gene expression data between the two subgenomes of allopolyploid and the genomes of its diploid parental species. Traditionally, the gene expression level of a given gene is estimated by counting all reads that map to a gene. This is how the data is usually obtained for diploid parental species. On the other hand, gene expression data for the two subgenomes of an allopolyploid species can be obtained only by phasing the count data per variable site. For example, if a site is heterozygous AATT, one can obtain the number of reads with allele A and T correspondingly. Reads that are mapped to the region without polymor- phism cannot be distinguished between two subgenomes and thus are skipped. This results in the reduction of the total read counts and under-estimation of expression level for genes with a low number of heterozygous sites. To correct this bias, one can use the unphased reads counts of an allopolyploid and trans- form them into the phased data by using the ratio between the two subgenomes that was estimated on the phased data. Suppose 30 reads map to a gene, 16 of which map to a polymorphic site AATT, 10 with allele A and 6 with allele T. Hence, the proportion of one subgenome is 6/(6 + 10) = 0.375 and the other one is 1 − 0.375 = 0.625. Extrapolating this proportion to total read count per gene gives 0.375 ∗ 30 = 11 reads for the first subgenome and 0.625 ∗ 30 = 19 reads for the second one. This method is not ideal because allopolyploid genes without any polymorphism are still skipped, but it makes the diploid and al- lopolyploid data more comparable. All these challenges have been faced during the analyses of C. bursa-pastoris in this thesis. Some of the solutions presented above also have been developed during the work on this thesis.

1.4 A promising model system to understand allopolyploidy: C. bursa-pastoris Genetic studies on the shepherd’s purse Capsella bursa-pastoris were started in the early twentieth century by G. Shull. G. Shull, then one of the most

19 prominent geneticists of his time [72] (among other things he introduced the concept and word ”heterosis”), wanted to ”carry Mendelian analysis beyond the field of domesticated plants and animals into the realm of wild nature where relationships have been unmodified by artificial breeding and selection by hu- man agency”. He and his associates carried out a very large amount of crosses (4000 pedigreed families!) from a worldwide collection and also performed some cytological studies. These studies led to a delineation of the species of the Capsella genus [73, 74, 75]. In particular, S. Hill showed that the species could be divided into two groups based on their ploidy level, a tetraploid and a diploid one [74]. The Capsella genus is today considered to consist of 4-5 species: C. grandiflora, C. orientalis, C. rubella, C. bursa-pastoris, and possi- bly C. thracica [76]. Among these species only C. bursa-pastoris is a polyploid with a worldwide distribution (Fig. 1.3). Actually, C. bursa-pastoris is one of the most common weed in the world [77]. C. thracica is also a polyploid that is endemic to Bulgaria and was described as a separate from C. bursa-pastoris species [76]. However, given the recent information on the origin of C. bursa- pastoris [78], the status of C. thracica needs to be verified. It could as well be a divergent population of C. bursa-pastoris. The widespread allotetraploid C. bursa-pastoris has recently emerged as a promising system for studying the evolution of allopolyploidy. First, in- formation on its two progenitor diploid species and their current distribution is readily available. C. bursa-pastoris, a selfing species, originated from the hybridization of the Capsella orientalis and Capsella grandiflora/rubella lin- eages some 100-300 kya [78] (Fig. 1.3). Nowadays, C. orientalis, a geneti- cally depauperate selfer, occurs across the steppes of Central Asia and Eastern Europe, while C. grandiflora, an extremely genetically diverse obligate out- crosser, is primarily confined to a tiny distribution range in the mountains of Northern Greece and Albania (Fig. 1.3). Second, self-fertilization, small size, yearly reproduction and low requirements for growing conditions make it easy to use C. bursa-pastoris in laboratory experiments (Fig. 1.4). Third, genomic data of C. bursa-pastoris can easily be phased into subgenomes [78, 79] due to a disomic inheritance [80]; thus, meiotic pairing happens between homologous chromosomes from the same ancestor and subgenomes inherited from different ancestors do not recombine. Fourth, genomic data of C. bursa-pastoris can be analyzed by mapping to the reference genome of C. rubella, a selfer recently derived from C. grandiflora, that was sequenced, assembled and edited in 2013 [81]. Moreover, high quality genome assemblies of C. bursa-pastoris, C. gran- diflora and C. orientalis should be available soon (D. Weigel, personal comm.). Fifth, genomic studies of C. bursa-pastoris and other Capsella species bene- fit from their close relatedness to the best studied model species Ara- bidopsis thaliana. The estimated divergence time between the Arabidopsis and Capsella genera is about 10 million years [82, 83]. Sixth, the composition of the genus Capsella provides a good set up to study consequences of selfing and polyploidy. There have already been interesting studies on the genetic ba-

20 ~900kya

~100kya

~25-50kya

C. rubella C. grandiflora C. bursa-pastoris C. orientalis

Figure 1.3. Origin and distribution of the Capsella species. The origin scheme shows the most likely scenario of the origin of C. bursa-pastoris through the hy- bridization between C. orientalis and C. grandiflora/rubella lineages and the increase of ploidy level to 4n. The approximate time of species splits is shown in thousands of years ago (kya). Reduced flowers in C. bursa-pastoris, C. rubella, and C. orientalis demonstrate the consequence of selfing when there is no need to attract pollinators. The map shows the limited distribution ranges of the diploid Capsella species in Eura- sia, but C. bursa-pastoris is distributed worldwide because of its invasiveness.

21 sis of the shift to selfing and the so-called selfing syndrome [84, 85] and on the establishment of reproduction barriers [86, 87] in the diploid species of the genus. Our group also recently started a series of growth chamber experiments comparing the competitive ability of the different Capsella species. The first studies indicate that the two diploid selfing species have a much lower com- petitive ability than C. grandiflora, but C. bursa-pastoris has a competitive ability as high if not higher than C. grandiflora [88, 89]. It is, hence, as if the genome doubling associated with polyploidization restored the competi- tive ability lost because of the shift in the mating system. Seventh, worldwide distribution of C. bursa-pastoris exemplifies colonizing success [90]. Eighth, this worldwide distribution also enable studies of clinal variation in such traits as flowering time [91, 92, 93, 94]. Finally, evidence for gene flow between C. bursa-pastoris and its close relatives [95, 79] indicate that C. bursa-pastoris and the genus Capsella can also be used to investigate genomic introgression between diploids and polyploids. In addition, this thesis demonstrates that C. bursa-pastoris provides a model to study allopolyploid evolution in structured populations [96, 94]. In Eurasia, C. bursa-pastoris is divided into three genetic clusters - Middle East, Europe, and Asia - with low gene flow among them and strong differentiation both at the nucleotide and gene expression levels (Papers I and II). This thesis also ex- emplifies that the genomic data of C. bursa-pastoris can easily be phased into subgenomes [79] (Paper III). It also reveals sophisticated regulation of gene ex- pression that varies across different tissues (Paper IV). So, C. bursa-pastoris can be also used to study allopolyploid gene expression evolution across tis- sues and populations. The puzzle posed by C. bursa-pastoris had already fascinated G. Shull, who wrote in 1929 ”It is considered a matter of fundamental significance that the increase in the number of chromosomes in the bursa-pastoris group is cor- related with greater variability, greater adaptability, greater vigor and greater hardiness” [75]. The results presented in this thesis further highlight the intri- cacy of C. bursa-pastoris and more generally the complexity of allopoplyploid evolution. I believe that after reading this thesis C. bursa-pastoris will fasci- nate you too.

22 Figure 1.4. C. bursa-pastoris growing in laboratory conditions.

23 2. Present study

The four papers of this thesis are sequential studies of the shepherd’s purse C. bursa-pastoris. First, we determined its population structure and coloniza- tion history. Then we compared phenotypic variation in the discovered pop- ulations to assess how much of this variation was due to local adaption and genetic drift. Finally, we investigated the evolution of the two subgenomes of C. bursa-pastoris by comparing their genomic and gene expression variation with each other and with the parental species overall and in different popula- tions. The summaries of each of these studies are provided below.

Paper I. Population structure and colonization history Polyploid species usually inhabit larger geographic areas and have wider eco- logical niches than their diploid progenitors [97] but our understanding of how they gained the capacity to colonize different localities with wide ranges of biotic and abiotic conditions remains limited. The first step to address this question is to characterize the population structure of a polyploid species and determine where it originated and how it spread. To that end, one has to ob- tain a wide-range sampling of a polyploid species, generate molecular data for many samples, and use fast and powerful computational tools to infer demo- graphic histories. C. bursa-pastoris is a polyploid species that grows under a large range of environmental conditions world-wide [90] and its global pop- ulation structure, place of origin and colonization history was not known. In this study, we genotyped a large sample of C. bursa-pastoris accessions with individuals sampled across Eurasia and a few sites from North America. We then analyzed its genetic structure and demographic history. We genotyped 261 accessions of C. bursa-pastoris with genotyping-by- sequencing (GBS) and obtained 4,274 SNPs after quality control. The GBS technique uses restriction enzymes to selectively target DNA sequences out- side of repetitive regions and to obtain SNPs in multiple samples with high efficiency [98]. Phasing of the GBS data was not feasible due to low cover- age, so we analyzed the unphased data. To avoid bias in estimates of polymor- phism, we removed sites showing fixed heterozygosity because most of them represented differences between the two subgenomes and not real polymor- phism. We determined the population genetic structure with model-based an- cestry inference implemented in ADMIXTURE [99]. The demographic history

24 was inferred by comparing alternative demographic models in the coalescent- based approximate Bayesian computation framework (ABC) [100]. To further characterize the inferred populations, we also estimated pairwise genetic dif- ferentiation between them [101] and their genetic diversity with Watterson‘s theta (θW ) [60] and mean pairwise nucleotide differences (π) [59].

The ABC approach is a likelihood-free method based on summary statistics of the data that allows estimating several parameters (e.g. di- vergence time, migration rates, effective population size) of a given demographic model. For the sake of simplicity, let us assume a stan- dard coalescent model with a single parameter, θ = 4Neµ. Briefly, one first starts by calculating summary statistics of the data, for instance, the nucleotide diversity in the sample, πobs. In a second step, values of the model parameter, in our example, values of θ, are randomly sampled from a prior distribution and used in coalescent simulations under the demographic model which parameters we want to estimate. For each simulation run, we then compute the simulated summary statistic, πsim, and compare it to the observed estimate, πobs. If the difference between πsim and πobs is less than a threshold value (tolerance) we keep the value of θ used in the simulation run. Otherwise, we discard it. We repeat this a certain number of times until we obtain a posterior distribution of the parameter. ABC can handle complex demographics models and meth- ods have been developed to compare the probability of the different models [102].

C. bursa-pastoris could be divided into three genetically distinct clusters: one cluster grouped accessions from Western Europe and Southeastern Siberia (hereafter EUR), a second included samples from the Middle East and Northern Africa (hereafter ME), and the third one united accessions from Eastern Asia (hereafter ASI) (Fig. 2.1). The ABC analysis supported the hypothesis that C. bursa-pastoris most likely originated in the Middle East and then spread towards Europe and later into Asia. The ME population diverged 167 250 years ago from an ancestral population, the EUR population diverged from ME 7 967 years ago, and the Asian population diverged from the Middle Eastern population 942 years ago. The estimated current effective population sizes (Ne) were 14 748, 21 130 and 25 376 for ASI, EUR and ME, respectively. The gene flow between these populations was moderate and each population underwent a bottleneck at its origin with the strongest bottleneck in ASI. The ME population, as the putative origin population, was the most genetically diverse and the ASI population was the least diverse. A recent colonization of the world by C. bursa-pastoris was in agreement with earlier studies. Previous estimates of the origin time of C. bursa-pastoris were around 67,770 ya [95], 80,000 ya [103] and 128,000 ya [78], which were close to our estimate of the origin of the ME population some 167,250 year ago.

25 A

FST=0.19

FST=0.10 FST=0.16

B

Figure 2.1. Genetic structure and ancestry of C. bursa-pastoris accessions in Eura- sia and Northern America. A. Map with the mean membership proportions inferred by ADMIXTURE for the three detected clusters in samples collected from the same site; FST : pairwise genetic differentiation [101]. B. Population structure of C. bursa- pastoris inferred with ADMIXTURE. Each vertical line represents an individual. Col- ors represent the inferred ancestry from three ancestral populations.

The discovered genetic differentiation between ASI, EUR and ME was also in line with previous studies [104, 95]. We further proposed that the colonization history of C. bursa-pastoris was human-mediated. This was primarily sup- ported by the accessions from the Russian Far East that were geographically closer to the Asian cluster but genetically belonged to the European one. This long-distance dispersion was likely carried out by humans along the Trans- Siberian Railway, or through human colonization of Siberia in the 16th cen- tury. Similarly, the origin of the North American populations from the EUR and ME populations was best explained by human-mediated movement (see [76] and references therein). As illustrated by the following three papers, this strong population structure and the recent multistage colonization history played a key role in the evolution of C. bursa-pastoris.

Paper II. Adaptation and neutral processes The complex population structure and demographic history of species add to the difficulty of telling apart the effects of natural selection and genetic drift. In structured populations, observed phenotypic differences can simply be due to genetic drift, and if the genetic distance between populations is not con-

26 A B C

Figure 2.2. Differentiation between the three populations of C. bursa-pastoris in climatic parameters, gene expression levels and genetic variation. A. A PCA of 20 climatic variables for 65 populations. B. A multidimensional scaling plot of the expression levels in 19 484 genes from 24 accessions. Distances can be interpreted in terms of log2-fold-change (logFC). C. A PCA of 261 accessions based on 4 274 nuclear SNPs obtained by GBS.

sidered, the differentiation may be falsely interpreted as adaptive. The preva- lence of neutral evolutionary processes has been suggested for gene expression evolution in various organisms, including humans [105], plants [106] and fish [107]. The variation in gene expression was often found to be correlated with genetic distance and thus was more likely a consequence of genetic drift than adaptive divergence. In the present study, we assessed the effect of popula- tion genetic structure on variation in gene expression and flowering time in C. bursa-pastoris. To assess population structure, we used the GBS data generated for 261 ac- cessions in Paper I [96]. The gene expression data was generated with RNA- Seq for a subset of 24 accessions. We also measured the variation in flowering time and circadian period length in the 261 accessions. We used flowering time and circadian rhythm as potentially adaptive traits because the onset of flower- ing at the right time ensures reproductive success, and the regulatory network of flowering time overlaps with the regulatory pathways of circadian rhythm [108, 109]. To compare the environmental differences between populations, we retrieved climatic data for the collection sites (20 bioclimatic variables) [110, 111]. The variation in climatic variables, gene expression, and GBS data was explored with a principal component analysis (PCA), multidimensional scaling plot (MDS), correlation tests and linear models. The correction for the population structure was done by using the phylogenetic distance between samples as a random effect in generalized linear mixed models. The Asian (ASI), European (EUR) and Middle Eastern (ME) populations of C. bursa-pastoris showed clear differentiation in climatic conditions, gene expression variation and genetic polymorphism (Fig. 2.2). Variation in flow- ering time and circadian period length was significantly associated with these genetic groups. Controlling for population structure did not affect this associ-

27 ation indicating that variation in these traits could be adaptive. We also found significant differences between the three populations in gene expression. Out of 19 484 analyzed genes, 2 457 genes were differentially expressed between ASI and EUR, 1 317 between ASI and ME, 903 between EUR and ME. Some of these genes control flowering time variation and thus these differences could be adaptive. However, all significant expression differences were removed when correcting for the population genetic structure. This indicated that most differences were due to genetic drift. We assumed that these results could also be due to the confounding of population structure and distribution of adaptive traits. To bypass this effect, we also compared gene expression differences be- tween early and late flowering ecotypes within the ASI and EUR clusters. We found different sets of genes showing differential expression in each of these two populations. Among these genes, FLOWERING LOCUS C (FLC), a gene known to regulate flowering time, was the best candidates for local adaptation in EUR. These results exemplify how a demographic history that results in a strong population structure may complicate the interpretation of gene expression dif- ferences within a species. On the one hand, gene expression differences be- tween populations of C. bursa-pastoris detected without the correction of pop- ulation structure could be interpreted as adaptive. However, all these differ- ences were explained by the population structure. Thus, the potentially adap- tive changes were likely false positives. It has been shown how easy it is to develop biologically sensible explanations of the adaptive importance for the difference detected in the data simulated under a purely neutral model [112]. On the other hand, the three populations inhabit regions with different climatic conditions and variation in gene expression between these populations is un- likely to be purely neutral. For example, variation in flowering time and cir- cadian rhythm was not purely neutral. Thus, either the distribution of adaptive traits in C. bursa-pastoris coincides with population structure or the effect of local adaptation, if present, is weak. Finally, local adaptation may get even more complicated if selection varies over the entire geographical range but acts at a very local scale. For example, it seems that neither humans nor Arabidopsis thaliana have experienced large-scale local adaptation across their geograph- ical range, but rather local selection within different populations [113, 114]. Evidence of selection on FLC and some other genes in EUR and ASI also supports this hypothesis of a more local adaptation for C. bursa-pastoris.

Paper III. Differential evolution of two subgenomes across populations It remains largely unknown how the two genomes evolve once they have fused in an allopolyploid species and how strongly their evolutionary trajectories de- pend on the initial differences between the two parental species and the specific

28 0.10

0.09

0.08

0.07

0.06

0.05 Deleterious / Total

0.04

Co Cg Co Cg Co Cg CO CG ASI EUR ME Figure 2.3. Genetic load in the subgenomes of C. bursa-pastoris and its parental species. The proportion of deleterious nonsynonymous changes was estimated on de- rived alleles, i.e. alleles accumulated after the speciation of C. bursa-pastoris. Co and Cg are the two subgenomes of C. bursa-pastoris. ASI, EUR, ME, CO, CG indi- cate Asian, European and Middle Eastern populations C. bursa-pastoris, and parental species C. orientalis and C. grandiflora, respectively.

demographic history of the newly formed allopolyploid species. For example, species that went through repeated bottlenecks during their range expansion are expected to have reduced genetic variation and higher genetic load than more ancient central populations [115, 116]. Populations can also admix with different related species and such admixture could shift the evolutionary path of the local population. Finally, range expansion will expose newly formed allopolyploid populations to divergent selective pressures, providing the pos- sibility of differentially exploiting the two subgenomes, and creating asym- metrical patterns of adaptive evolution in different populations. In this paper, we tested whether the two subgenomes of C. bursa-pastoris have similar or different evolutionary trajectories in term of hybridization, selection and gene expression in three known populations. We phased 31 genomes and 24 transcriptomes for variation between the two subgenomes, one descended from the outcrossing and highly diverse C. gran- diflora (hereafter CbpCg) and the other one descended from the selfing and ge- netically depauperate C. orientalis (hereafter CbpCo). To perform the phasing, we used genomic data of 10 C. orientalis and 13 C. grandiflora as references. For each subgenome, we assessed its phylogenetic relationship with the diploid relatives, temporal changes of effective population size (Ne), signatures of pos- itive selection, amount of accumulated deleterious mutations (genetic load), genetic diversity, and levels of gene expression. We also performed tests for admixture of the phased subgenomes of C. bursa-pastoris with other Capsella species.

29 Figure 2.4. Phylogenetic relationships between subgenomes of C. bursa-pastoris and other Capsella species. A. Whole genome neighbor-joining phylogenetic tree. The bootstrap support based on 100 replicates is shown only for the major clades. The root (Neslia paniculata) is not shown. B. Overlapping of 1002 neighbor-joining trees reconstructed with 100 Kb sliding genomic windows. ASI, EUR ME, CO, CG, CR indicate Asian, European and Middle Eastern populations of C. bursa-pastoris, C. orientalis, C. grandiflora, and C. rubella, respectively. The CbpCo and CbpCg subgenomes are marked with Co and Cg, respectively.

30 We found that during ∼100,000 generations of co-existence within the same species, the effective population sizes, Ne, of the two subgenomes decreased gradually and converged in all three regions. However, the two subgenomes retained part of the initial difference between the two parental species. The CbpCg subgenome, inherited from a genetically more diverse parent, was also more diverse than the CbpCo subgenome inherited from a selfing parent with a very low genetic diversity. The amount of genetic load also differed between subgenomes and indicated that purifying selection removed deleterious muta- tions more efficiently from CbpCg (Fig. 2.3) that had a larger effective popu- lation size than CbpCo. However, purifying selection was not as efficient as in outcrossing C. grandiflora, but it was more efficient than in selfing diploid C. orientalis. These differences in diversity and load, as well as differences in patterns of positive selection and levels of gene expression, also strongly depended on the specific histories of the three populations considered. The Asian population that was the most distant from the tentative place of origin of the species was the least genetically diverse and had the highest number of deleterious mutations. This population also experienced more selective sweeps on the CbpCo subgenome, whereas the European and Middle Eastern popula- tions experienced positive selection mostly on the CbpCg subgenome. Most strikingly, and unexpectedly, the three populations hybridized with different diploid relatives. There was strong hybridization between C. orientalis and C. bursa-pastoris in Asia and this admixture profoundly affected the phylo- genetic relationship between subgenomes and parental species (Fig. 2.4). It looked like C. orientalis was derived from the Asian C. bursa-pastoris. This pattern could also be retrieved if one assumes a scenario of multiple origins of C. bursa-pastoris, but it seems more likely to be the result of admixture. Admixture was also detected between C. rubella and C. bursa-pastoris in Eu- rope. There were no major changes in gene expression between subgenomes, though the CbpCg subgenome was slightly more expressed than CbpCo in Eu- rope and the Middle-East, whereas there was rather equal expression between the subgenomes in Asia. In summary, the shift to selfing and polyploidy had a global impact on the two subgenomes of C. bursa-pastoris resulting in a sharp reduction of the ef- fective population size in all populations, that was accompanied by relaxed purifying selection and accumulation of deleterious mutations. However, the two subgenomes still retained a strong signature of parental legacy and were differentially affected by introgression and selection in different geographi- cal regions. Hence, this study illustrates that, first, the parental legacy has a long-term impact on the evolution of allopolyploid species, and second, the evolutionary trajectories of allopolyploid populations may differ depending on the ecological arena.

31 Paper IV. Convergence and parental legacy in the subgenomes The impact of allopolyploidy on genome diversity and expression pattern still remains to be understood and put in a clear temporal perspective. Some studies have described genome upheaval and ”transcriptomic shock” [117, 118, 119], whereas others detected a strong legacy from the parental species and much more subtle and concerted changes [120, 121]. These differences across stud- ies could result from many factors such as age and demographic history of the alloploid species or extent of the divergence between the parental species. In the present study, we investigated genomic and gene expression changes in the two subgenomes of the recent allopolyploid C. bursa-pastoris. We sequenced the transcriptomes of three tissues (root, leaf, and flower) of C. bursa-pastoris, its two parental species, C. grandiflora and C. orientalis, and its close relative C. rubella. We compared the overall and homeologue- specific levels of expression between C. bursa-pastoris and its parental species. To control for mapping bias in gene expression data, we used the genomic data generated in the previous study (Paper III) [79]. We compared both the total expression level (CbpCo + CbpCg) as well as homeologue-specific expres- sion with correlation, differential expression analyses and a similarity index (S), which measured relative expression deviance of each subgenome from the average parental expression level. We also compared the distribution of synonymous and deleterious mutations between the two subgenomes with a model-based approach that accounts for an inherent bias towards one of the subgenomes. We then accounted for this bias and tested if the deleterious mu- tations were randomly distributed between the two subgenomes or whether they occurred more frequently on one of them than expected by chance. Fi- nally, we checked whether the proportion of deleterious mutations per gene and gene expression levels were correlated in any way. The gene expression ratio between subgenomes was overall equal (0.496) and not different from the ratio inferred in the DNA count data (0.497), where no difference between subgenomes is expected. Levels of gene expression were strongly correlated between the subgenomes (Spearman’s ρ 0.87-0.94), and expression in each subgenome was even more correlated with the corre- sponding parental species (Spearman’s ρ 0.88-0.97). These strong correlations were further supported by the differential gene expression analysis, which re- vealed that 55-80% of the genes depending on the tissue were of the same level in all three species (Table 2.1). However, a small proportion of genes showed the expression dominance of one of the parental species in the total level of expression. Notably, this dominance was opposite between the two tissues: the total expression level was more similar to C. orientalis in flower, and to C. grandiflora in root (Fig. 2.1). Interestingly, the expression of the selfing species, C. rubella, was the most similar to the expression pattern of its closest relative, the outcrossing C. grandiflora in all tissues including flower. This

32 Table 2.1. Gene expression levels in C. bursa-pastoris and the parental species

Expression pattern Flower Leaf Root

No difference 8 805 12 656 10 278 (55.6%) (80.0%) (65.0%)

CO Cbp CG

Additivity 1 498 713 941 (9.5%) (4.5%) (5.9%)

CO Cbp CG CO Cbp CG

Dominance 2 164 506 687 (13.7%) (3.2%) (4.3%)

CO Cbp CG CO Cbp CG 1 229 720 1 297 (7.8%) (4.6%) (8.2%)

CO Cbp CG CO Cbp CG

Transgressive 1 139 565 1 226 (7.2%) (3.6%) (7.7%)

CO Cbp CG CO Cbp CG CO Cbp CG 989 664 1 395 (6.2%) (4.2%) (8.8%)

CO Cbp CG CO Cbp CG CO Cbp CG

CO, CG, and Cbp correspond to C. orientalis, C. grandiflora, and C. bursa-pastoris, respectively. The y-axes show levels of expression. The levels were considered different if they showed significant differential expression (FDR < 0.05). The most striking differences showing reversed levels of expression between tissues are highlighted in bold.

33 suggested that the detected similarity of C. bursa-pastoris and C. orientalis in flower tissues was not purely due to selfing. The variation between tissues was further confirmed by computing a similarity index (S), which was systemati- cally biased toward the corresponding parental species in each subgenome but this bias varied between tissues. In addition, the correlation of S values of homeologous genes of two subgenomes suggested that these genes were gen- erally regulated into the same direction. The slope of the regression between the expression ratio of the subgenomes over the expression ratio of the parental species was always less than one, indicating the presence of co-regulation of the two subgenomes through a mixture of trans- and cis-regulation [122, 123]. The slope was less in flower tissue than in leaf and root tissues, suggesting a stronger trans-regulation in the former than in the latter. Finally, the deleteri- ous mutations accumulated more on the CbpCo subgenome relative to the syn- onymous mutations, the distribution of deleterious mutations between the two genomes had a smaller variance compared to the distribution of synonymous mutations. This suggests that both copies are needed and prevent to accumu- late too many deleterious mutations. We also found no association between levels of expression and the number of deleterious mutations in genes. To conclude, since the speciation of C. bursa-pastoris around 100,000 years ago, some of the major differences between the ancestral genomes started to be erased and the two subgenomes have started to act in a coordinated manner. However, a significant legacy effect on the number of deleterious mutations carried by the two subgenomes, and on the total expression level in the three tissues can still be detected.

34 3. Conclusions and future perspectives

In the twenty-first century, the research on polyploidy has become particularly molecular-centered due to the availability of huge amount of whole-genome sequencing data. Although this thesis is also primarily based on the whole- genome data, it differs from many modern works by considering not only molecular but also demographic and ecological aspects of polyploidy evolu- tion. In Paper I, we have shown that C. bursa-pastoris is structured into three genetically distinct populations that have different effective population size and thus are unequally affected by random genetic drift. In Paper II, we fur- ther bolster the differentiation between these populations by showing the dif- ferences in climatic, phenotypic and gene expression variation between them. In Paper III, we reveal that selection targets different subgenomes across these populations and that the populations carry a different amount of genetic load. Moreover, C. bursa-pastoris hybridizes with different species depending on geographical region and this hybridization has a profound effect on molecular variation in some populations. All these results emphasize the importance of studying polyploid species across their distribution range because evolutionary processes may substantially differ across populations and sampling only one population could lead to biased results and misleading conclusions. The major global changes relative to the parental species would, of course, be evident in all populations of an allopolyploid. In C. bursa-pastoris, these changes include a steep decrease in the effective population size, relaxed pu- rifying selection and no strong expression dominance between subgenomes. There is also a long-lasting effect of parental legacy in the amount of ge- netic load between subgenomes. From these findings, it can be concluded that C. bursa-pastoris experienced some events shared by many polyploids. For example, it went through a demographic bottleneck and shifted to self- ing which led to reduced efficacy of natural selection. Gene expression in C. bursa-pastoris as in many other allopolyploids was not disrupted, though some allopolyploids experience profound expression changes (so-called ”tran- scriptomic shock”). However, I still would like to point out that smaller scale variation in these effects across population was significant and cannot be ig- nored. This thesis also addresses the question of adaptation in polyploids. Our results on local adaptation seem to strongly depend on the levels of integra- tion and geographical scale that are investigated. Signs of global adaptation in flowering time and circadian rhythm variation are observed in Paper II. On the other hand, gene expression variation does not depart from neutral expecta- tions. In Paper III, variation in positive selection across subgenomes depends

35 on population and could be considered as indirect evidence of local adaptation. However, this inference is not straightforward as the observed patterns could also be due to complex demographic history, especially admixture with differ- ent sister species. The general view of low local adaptation, or possibly mal- adaptation in some subpopulations, is also supported by a large-scale, multi- site, common garden experiment that is not included in this thesis (Manuscript in prep.). Hence, despite being distributed across diverse environments and climates around the globe, the evolution of C. bursa-pastoris seems to have been dominated by neutral rather than adaptive processes. In addition, this thesis further exemplifies how C. bursa-pastoris can be used as a model species to study polyploid evolution. In Paper III and IV, we successfully phased a major part of C. bursa-pastoris genome. Such phasing is only possible because C. bursa-pastoris is of known origin, it exhibits a strictly disomic inheritance, the genomic data of parental species is available, it has a small genome, and the fully assembled and annotated genome of a closely re- lated species is available. These conditions are not met in many allopolyploid species. Paper III shows how this phased data sheds new light on the differen- tial evolution of the subgenomes across populations of C. bursa-pastoris. Pa- per IV, that is based on the phased expression data of C. bursa-pastoris and its parental species, demonstrates that gene expression regulation can vary across different tissues and that there is a strong legacy from the parental species. Finally, these thesis has already shed light on many questions of polyploid evolution, but if I had a chance to continue working on this project, I would work in the following directions. First, I would further extend Paper IV with additional gene expression analyses and a stronger focus on different popula- tions. Second, I would verify the demographic history reconstructed in Paper II but this time with the ABC analysis of the phased genomic data. Third, I think one of the most intriguing questions would be to compare the abso- lute expression levels between C. bursa-pastoris and its diploid relatives at the single-cell level. Fourth, it would be most exciting to analyze genomic data with good quality reference genome assemblies of all four Capsella species to eliminate all technical issues I had to cope with during the work on this thesis. Finally, as a long time perspective, it would also be interesting to look at small RNAs and epigenetic regulation in C. bursa-pastoris.

36 Svensk sammanfattning

Termen allopolyploid avser en organism som har sitt ursprung genom hybri- disering och polyploidi. Hybridisering förenar två genom från olika arter till en organism, medan polyploidi dubblerar (ibland tripplerar eller till och med fyrdubblar) uppsättningarna av kromosomer. Dessa modifieringar har vanligt- vis negativa konsekvenser för organismen. Det finns dock arter som inte bara överlever dessa modifieringar utan till och med frodas och konkurrerar ut nor- mala arter. Det finns flera teoretiska förklaringar till detta fenomen, men antalet empiriska studier är begränsat. Capsella bursa-pastoris, känd som lomme, håller på att bli en modell för studier av en framgångsrik allopolyploid art. C. bursa-pastoris förekommer över hela världen, medan dess föräldraarter, Capsella grandiflora och Cap- sella orientalis, har mer begränsade distributionsområden. C. grandiflora är begränsad till norra Grekland och Albanien, och C. orientalis finns på stäp- perna i Centralasien. Genetiska studier på C. bursa-pastoris startades i början av 1900-talet. Studier om C. bursa-pastoris evolutionsbiologi påbörjades dock senare. Det var exempelvis osäkerhet runt ursprunget av C. bursa-pastoris och det var endast några få år sedan det blev klarlagt att C. bursa-pastoris är en al- lopolyploid. Den världsomspännande distributionen av C. bursa-pastoris var länge känd, men om det fanns någon populationsstruktur, var arten härstamma- de ifrån och hur den spred sig fastställdes inte. C. bursa-pastoris allopolyplo- ida ursprung leder också till frågor om utvecklingen av de två subgenom som ärvts av ganska olika föräldrar, en genetiskt fattig självbefruktare (C. orienta- lis) och en extremt genetiskt varierad fakultativ utkorsare (C. grandiflora). Om C. bursa-pastoris står inför olika miljöförhållanden runt om i världen, anpas- sar den sig lokalt till dessa olika miljöer? Hur påverkar självreproduktionen, som är känd för att leda till ökad genetisk belastning och inavelsdepression, C. bursa-pastoris prestanda? Jag har analyserat genomiska och transkriptomis- ka data samt några fenotypiska egenskaper för att svara på alla dessa frågor i min avhandling. I Artikel I beskrev vi den genetiska variationen i C. bursa-pastoris och visade att den inte är homogen över Eurasien utan snarare indelad i tre gene- tiskt distinkta populationer: den första består av individer från Europa och öst- ra Sibirien, den andra ligger i östra Asien, och den tredje grupperar individer från Mellanöstern och Nordafrika. Rekonstruktion av koloniseringshistorien föreslog att denna art härstammar från Mellanöstern och därefter spridits till Europa och östra Asien. Vi upptäckte också att anslutningar från ryska Fjärran Östern tillhörde den europeiska gruppen genetiskt men befann sig närmare den

37 asiatiska gruppen. Detta indikerade att denna spridning mest sannolikt utfördes av människor längs den transsibiriska järnvägen, eller genom kolonisering av Sibirien av människor under 1500-talet. På samma sätt är ursprunget för de nor- damerikanska populationerna från Europeiska och Mellanöstern-populationer sannolikt också resultatet av människoförmedlad spridning. I Artikel II testade vi om det finns tecken på lokal anpassning i genuttryck och variation av blomningstid i C. bursa-pastoris samt om det finns någon effekt av den upptäckta populationsgenetiska struktur på variation av dessa två egenskaper. I strukturerade populationer kan de observerade fenotypiska skillnaderna helt enkelt bero på genetisk drift, och om det genetiska avståndet mellan populationer inte beaktas, kan differentieringen felaktigt tolkas som ad- aptiv. Vi upptäckte att populationerna av C. bursa-pastoris från Asien, Europa och Mellanöstern visade tydlig differentiering i klimatförhållanden, genuttryck och blomningstid. Variation i blomningstid, en potentiellt adaptiv egenskap ef- tersom blomningsstart vid rätt tidpunkt säkerställer reproduktiv framgång, på- verkades svagt av populationens genetiska struktur vilket indikerade att varia- tion i denna egenskap kunde vara adaptiv. Emellertid försvann alla skillnaderna i uttryck efter korrigering för populationens genetiska struktur. Detta indike- rade antingen att de flesta skillnaderna i genuttryck var neutrala eller att den lokala anpassningen, om den var närvarande, var svag och korrelerad med po- pulationsstrukturen. Vi upptäckte också några bevis för att anpassningen kunde vara mer lokal och variera inom varje genetisk grupp. I Artikel III och IV fokuserade vi på utvecklingen av de två subgenom- en hos C. bursa-pastoris och mängden kvarvarande föräldraarv i dessa två subgenom. Vi upptäckte att båda subgenomen uttryckte kraftig minskning av genetisk mångfald efter artbildningen av C. bursa-pastoris, förmodligen på grund av självbefruktning. Trots deras gemensam historia som en art behöll de två subgenomen också skillnader i mängden genetisk belastning som ärvts av föräldraarterna och påverkades differentiellt av introgression och urval i de tre populationerna. Variationen i genuttryck var högt korrelerad mellan de två subgenomen men en liten del av generna visade differentiering över populatio- ner och vävnader. Det fanns flera gener som visade överuttryck av subgenomet som ärvts av C. grandiflora i Mellanöstern och Europa, medan de två subge- nomen uttrycktes lika i Asien. I blommor hade C. bursa-pastoris ett uttryck som liknar C. orientalis både i den övergripande nivån av transkript och på subgenomnivån, medan den i blad och rötter liknade C. grandiflora. Sammanfattningsvis visar denna avhandling för första gången att allopoly- ploider kanske inte är homogena i sina distributionsområden och de kan också markant följa olika utvecklingsvägar i olika populationer. Denna avhandling visar också att trots om sammanslagningen av två divergerande genom i al- lopolyploider resulterar i deras konvergens, så kan vissa skillnader bibehållas och troligtvis möjliggör dessa skillnader evolutionär flexibilitet hos allopoly- ploida genom.

38 Резюме українською

Алополіплоїд - це організм, який утворився шляхом гібридизації та одно- часного збільшення набору хромосом (поліплоїдія). Гібридизація та полі- плоїдія зазвичай мають негативні наслідки для організму. Однак є види, які не тільки виживаюсь при цих модифікаціях, а навіть процвітають та конкурують з батьківськими видами. Є кілька теоретичних пояснень цьо- го явища, але кількість практичних досліджень обмежена. Вид Capsella bursa-pastoris, відомий як грицики звичайні, є перспе- ктивною моделлю для вивчення таких успішних алополіплоїдних видів. C. bursa-pastoris поширена по всьому світі, тоді як її батьківські види, Capsella grandiflora та Capsella orientalis, мають значно менші ареали. Поширення C. grandiflora обмежується Грецією та Албанією, а C. orienta- lis зустрічається в Середній Азії. Генетичні дослідження грициків звичай- них були розпочаті на початку XX століття. Проте дослідження еволюцій- ної біології C. bursa-pastoris було розпочато значно пізніше. Наприклад, походження C. bursa-pastoris залишалось невідомим досить довго і тіль- ки кілька років тому з’ясувалося що C. bursa-pastoris є алополіплоїдом. Про всесвітнє поширення C. bursa-pastoris було відомо давно, але не було встановлено чи існує у цього виду певна генетична структура популяції, звідки цей вид походить і яким чином він поширився. Алополіплоїдне по- ходження C. bursa-pastoris також ставить питання щодо еволюції її двох субгеномів, які були успадковані від досить різних батьків - генетично бідного пращура C. orientalis, який розмножується самозапиленням, та надзвичайно генетично різноманітного пращура C. grandiflora, який роз- множується перехресним запиленням. Також якщо C. bursa-pastoris зу- стрічається в різних середовищах по всьому світі, як вона адаптується до цих різноманітних умов? Я проаналізував геномні, транскриптомні та де- які фенотипічні дані C. bursa-pastoris для вирішення всіх цих питань в цій дисертації. У статті I ми описали генетичні варіації C. bursa-pastoris і показали, що цей вид не є однорідними, а поділяться на три генетично відмінних популяції на території Євразії: одна розповсюджена в Європі та на Сході Сибіру, друга - в Східній Азії, а третя - на Близькому Сході та Північній Африці. Реконструкція історії колонізації цих регіонів показала, що цей вид зародився на Близькому Сході й згодом поширився в Європу та Схі- дну Азію. Ми також виявили, що зразки зі Сходу Сибіру генетично нале- жали до європейської групи, але були розташовані ближче до азіатської. Це дозволило припустити, що така далека колонізація швидше за все здій- снилась за допомогою людей вздовж Транссибірської магістралі, або під

39 час колонізації Сибіру людьми в XVI столітті. Аналогічним чином, похо- дження північноамериканської популяції від європейської та близькосхі- дної популяції імовірно є наслідком розповсюдження з людьми. У статті II ми перевірили наявність ознак локальної адаптації в експре- сії генів та варіації часу початку цвітіння, а також перевірили чи варіації цих двох рис залежать від виявленої генетичної структури. Фенотипічні відмінності, що спостерігаються у структурованих популяціях, можуть бути пов’язані з дрейфом генів, і якщо не брати до уваги генетичну від- стань між популяціями, відмінності можуть бути помилково інтерпрето- вані як адаптивні. Ми виявили, що всі три популяції демонструють чі- тке розмежування в кліматичних умовах, експресії генів та в часі поча- тку цвітіння. На варіацію в часі цвітіння, як потенційно адаптивної риси, оскільки початок цвітіння в потрібний час забезпечує репродуктивний успіх, популяційна генетична структура мала слабкий вплив, що вказує на те, що варіація цієї ознаки може бути адаптивною. Проте всі відмінно- сті в експресії генів зникли після корекції даних на генетичну структуру популяції. Це вказує на те, що більшість відмінностей в експресії генів є нейтральними або місцева адаптація, якщо вона присутня, є слабкою та збігатися зі структурою популяції. Ми також виявили деякі докази, що адаптація може бути більш локальною в межах кожної генетичної групи. У статтях III та IV ми зосередили увагу на еволюції двох субгеномів та кількості батьківських рис, що залишились в цих двох субгеномах. Ми виявили, що обидва субгенома проявляють зниження генетичної різно- манітності після видоутворення C. bursa-pastoris, ймовірно через самоза- пилення. Проте, не зважаючи на спільну історію в межах одного виду, ці два субгеноми також зберегли відмінності в кількості генетичного тягаря, успадкованого від батьківських видів, а також зазнали різного впливу від- бору та гібридизації з близькими видами. Варіація експресії генів сильно корелювала між субгеномами, але мала частка генів відрізнялась в екс- пресії між популяціями та тканинами. На Близькому Сході та в Європі більша частка генів демонструвала надмірну експресію субгеному успад- кованого від C. grandiflora, тоді як в Азії субгеноми були більш рівні між собою. У квітах експресія генів була більш подібна до C. orientalis, тоді як у листках та корінні вона була більш схожа з C. grandiflora. На завершення варто зазначити, що ця дисертація вперше демонструє, що алополіплоїди можуть бути генетично неоднорідними по території поширення й еволюційні шляхи можуть різнитися між популяціями. Ця робота також виявила, що хоча об’єднання двох різних геномів у алопо- ліплоїдах призводить до їх еволюційного зближення, деякі відмінності між ними можуть бути збережені досить довго, і ці відмінності ймовірно забезпечують еволюційну гнучкість алополіплідних видів.

40 Acknowledgements

There were many people who helped me during the course of this thesis as well as on the way to it and I would like to reflect on all of them here. First of all, I would like to express my sincere gratitude to Martin Lascoux, my Ph.D. supervisor, for giving me a chance to do this Ph.D. project. I am very thankful for your patient guidance, enthusiastic encouragement, gentle criti- cism and constant positivity. These four years have been exciting, challenging and gratifying. You taught me to like negative and intricate results. I also have to admit that thanks to you I do appreciate neutral evolutionary processes now. I would like to express my very great appreciation to Stephen Wright and Sylvain Glémin for their advice and assistance during the work on this Ph.D. thesis. Stephen, thank you for your hospitality during my visit to Toronto and for your prompt replies during our remote communication. Sylvain, your ideas were fascinating and inspiring. I would also like to thank Ulf Lagercrantz for co-supervising this project. Your assistance on the RNA-Seq experiments was very valuable. Special thanks go to Karl Holm, Amandine Cornille, Jun Chen, Maria Guerrina, Clément Lafon-Placette and Marion Orsucci for their coopera- tion in some projects of this thesis. Karl, your time and willingness to help when I just started my Ph.D. studies is greatly appreciated. Amandine, it has been fun to share the office with you and to experience fickle Swedish weather during the common garden experiment. Jun, your expertise in genomic analy- ses was invaluable. Maria and Clément, thank you for helping with the crosses. Marion, I greatly appreciate your help at the final stage of this thesis when there was strong time pressure. I also wish to thank Mimmi Eriksson and Tianlin Duan for doing their Master’s project with us. Your persistence and independence was impressive and the results you produced were a great contribution to this thesis. My grate- ful thanks also go to our collaborators in Toronto. Adriana Salcedo, Julia Kreiner, Tyler Kent, Huirun Huang and John Stinchcombe, working with you has been pleasant and productive. I also wish to recognize the valuable help of Kerstin Jeppsson whose assis- tance in the lab was vital. Without your responsive help and timely provision of chemicals, none of my Ph.D. projects would be finished in time. I express my very profound gratitude to my friend Glib Mazepa, who en- couraged me to apply to this Ph.D. program and helped with the application process. Your contribution to this thesis was indirect but extremely valuable.

41 My sincere thanks also goes to Yevgen Ryeznik for his continuous encourage- ment. Talking to you in some down moments helped me to keep going. It was also a pleasure to work with you on our Data Science project. My grateful thanks are also extended to all people at Evolutionary Biology Center and the Department of Plant Ecology and Evolution, in particular. I am thankful to Andres Cortes, Xiaodong Liu, Lili Li, Judith Trunschke, Elodie Chapurlat, Giulia Zacchello, Luis Leal, Anna-Malin Linde, Fia Bengts- son, Charles Campbell, Matthew Tye, Linus Söderquist, Froukje Postma, Thomas Ellis, Venkat Talla, Ludovic Dutoit, Nagarjun Vijay, Moos Blom, Foteini Spagopoulou, Homa Papoli, Paulina Bolivar, Willian Silva, Ser- gio Tusso, Lore Ament, Kristaps Sokolovskis, Josefine Stångberg, Ivain Martinossi, Matthias Weissensteiner, and Jesper Svedberg for stimulating scientific discussions and all the fun we had together. I also would like to thank the organizers of all courses I took during my Ph.D. studies. I am especially grateful to Lucas Sinclair, Ludovic Dutoit, and Alexander Eiler for the Python course, Raazesh Sainudiin for the Data Science course, Andreas Dahlin for the Visualize your Science course, Terese Bergfors for the Writing Scientific English course, and Göran Arnqvist for the Statistics course. These courses had the most profound impact on my sci- entific skills. Probably, I would never be able to do this Ph.D. thesis without the knowl- edge and experience I gained during the Erasmus Mundus Master Program in Evolutionary Biology. I would like to pay my regards to people who run this program, in particular, Franjo Weissing, Irma Knevel and Jacob Höglund. When I look back and think about the time I decided to pursue a scientific career, I wish to acknowledge my very first scientific supervisor, Gennadiy Shandikov. You kept telling me to do what I like and dispelled all my doubts. I am glad I followed your advice and I owe you a debt of gratitude for the very first step towards a Ph.D. degree. I must express my very profound gratitude to my English teacher, Nikolay Yakymenko. You opened me a door to international science and made this Ph.D. thesis possible. Finally, this accomplishment would not have been possible without unfail- ing support from my parents, sister, and wife. They have given up many things for me to do this Ph.D. thesis. Thank you for your patience, under- standing and motivation during these last four years and my life in general. My sister deserves special thanks for designing the cover of this thesis. I also wish to thank my little daughter who was born just at the time of writing this thesis. I will blissfully remember restless days and sleepless night in the strug- gle to write this thesis and to calm you down.

42 References

[1] A. Salman-Minkov, N. Sabath, and I. Mayrose, “Whole-genome duplication as a key factor in crop domestication,” Nature Plants, vol. 2, p. 16115, 2016. [2] S. Renny-Byfield and J. F. Wendel, “Doubling down on genomes: Polyploidy and crop plants,” American Journal of Botany, vol. 101, no. 10, pp. 1711–1725, 2014. [3] M. C. Sattler, C. R. Carvalho, and W. R. Clarindo, “The polyploidy and its key role in plant breeding,” Planta, vol. 243, no. 2, pp. 281–296, 2016. [4] G. L. Stebbins, Processes of organic evolution. Prentice-Hall, 1971. [5] W. Wagner Jr, “Biosystematics and evolutionary noise,” Taxon, pp. 146–151, 1970. [6] R. J. Schultz, “Role of polyploidy in the evolution of fishes,” in Polyploidy, pp. 313–340, Springer, 1980. [7] D. E. Soltis, C. J. Visger, D. B. Marchant, and P. S. Soltis, “Polyploidy: Pitfalls and paths to a paradigm,” American Journal of Botany, vol. 103, no. 7, pp. 1146–1166, 2016. [8] I. Mayrose, S. H. Zhan, C. J. Rothfels, K. Magnuson-Ford, M. S. Barker, L. H. Rieseberg, and S. P. Otto, “Recently formed polyploid plants diversify at lower rates,” Science, vol. 333, no. 6047, pp. 1257–1257, 2011. [9] D. E. Soltis, M. C. Segovia-Salcedo, I. Jordon-Thaden, L. Majure, N. M. Miles, E. V. Mavrodiev, W. Mei, M. B. Cortez, P. S. Soltis, and M. A. Gitzendanner, “Are polyploids really evolutionary dead-ends (again)? A critical reappraisal of Mayrose et al. (2011),” New Phytologist, vol. 202, no. 4, pp. 1105–1117, 2014. [10] I. Mayrose, S. H. Zhan, C. J. Rothfels, N. Arrigo, M. S. Barker, L. H. Rieseberg, and S. P. Otto, “Methods for studying polyploid diversification and the dead end hypothesis: a reply to Soltis et al. (2014),” New Phytologist, vol. 206, no. 1, pp. 27–35, 2015. [11] E. A. Kellogg, “Has the connection between polyploidy and diversification actually been tested?,” Current Opinion in Plant Biology, vol. 30, pp. 25–32, 2016. [12] E. Mayr, Systematics and the origin of species, from the viewpoint of a zoologist. Harvard University Press, 1942. [13] M. L. Arnold, Natural hybridization and evolution. Oxford University Press, 1997. [14] J. Mallet, “Hybridization as an invasion of the genome,” Trends in Ecology & Evolution, vol. 20, no. 5, pp. 229–237, 2005. [15] J. Mallet, “Hybrid speciation,” Nature, vol. 446, no. 7133, pp. 279–283, 2007. [16] J. A. Coyne and H. A. Orr, Speciation. Sinauer Associates, Inc, 2004. [17] I. Schön, K. Martens, and P. van Dijk, “Lost sex,” The evolutionary biology of parthenogenesis, 2009.

43 [18] B. C. Barringer, “Polyploidy and self-fertilization in flowering plants,” American Journal of Botany, vol. 94, no. 9, pp. 1527–1533, 2007. [19] B. C. Husband, B. Ozimec, S. L. Martin, and L. Pollock, “Mating consequences of polyploid evolution in flowering plants: current trends and insights from synthetic polyploids,” International Journal of Plant Sciences, vol. 169, no. 1, pp. 195–206, 2008. [20] J. F. Wendel, “Genome evolution in polyploids,” in Plant molecular evolution, pp. 225–249, Springer, 2000. [21] S. Ohno, Evolution by Gene Duplication. Springer, 1970. [22] G. Blanc and K. H. Wolfe, “Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution,” The Plant Cell, vol. 16, no. 7, pp. 1679–1691, 2004. [23] K. L. Adams, R. Percifield, and J. F. Wendel, “Organ-specific silencing of duplicated genes in a newly synthesized cotton allotetraploid,” Genetics, vol. 168, no. 4, pp. 2217–2226, 2004. [24] W. Albertin, T. Balliau, P. Brabant, A.-M. Chevre, F. Eber, C. Malosse, and H. Thiellement, “Numerous and rapid nonstochastic modifications of gene products in newly synthesized Brassica napus allotetraploids,” Genetics, vol. 173, no. 2, pp. 1101–1113, 2006. [25] I. S. Pearse, T. Krügel, and I. T. Baldwin, “Innovation in anti-herbivore defense systems during neopolypoloidy – the functional consequences of instantaneous speciation,” The Plant Journal, vol. 47, no. 2, pp. 196–210, 2006. [26] S. P. Otto and J. Whitton, “Polyploid incidence and evolution,” Annual Review of Genetics, vol. 34, no. 1, pp. 401–437, 2000. [27] L. H. Rieseberg, O. Raymond, D. M. Rosenthal, Z. Lai, K. Livingstone, T. Nakazato, J. L. Durphy, A. E. Schwarzbach, L. A. Donovan, and C. Lexer, “Major ecological transitions in wild sunflowers facilitated by hybridization,” Science, vol. 301, no. 5637, pp. 1211–1216, 2003. [28] D. Schwarz, K. D. Shoemaker, N. L. Botteri, and B. A. McPheron, “A novel preference for an invasive plant as a mechanism for animal hybrid speciation,” Evolution, vol. 61, no. 2, pp. 245–256, 2007. [29] A. W. Nolte, J. Freyhof, K. C. Stemshorn, and D. Tautz, “An invasive lineage of sculpins, Cottus sp.(Pisces, Teleostei) in the rhine with new habitat adaptations has originated from hybridization between old phylogeographic groups,” Proceedings of the Royal Society of London B: Biological Sciences, vol. 272, no. 1579, pp. 2379–2387, 2005. [30] Z. Gompert, J. A. Fordyce, M. L. Forister, A. M. Shapiro, and C. C. Nice, “Homoploid hybrid speciation in an extreme habitat,” Science, vol. 314, no. 5807, pp. 1923–1925, 2006. [31] J. Mavárez, C. A. Salazar, E. Bermingham, C. Salcedo, C. D. Jiggins, and M. Linares, “Speciation by hybridization in Heliconius butterflies,” Nature, vol. 441, no. 7095, pp. 868–871, 2006. [32] J. M. Kreiner, P. Kron, and B. C. Husband, “Evolutionary dynamics of unreduced gametes.,” Trends In Genetics, vol. 33, no. 9, pp. 583–593, 2017. [33] J. M. Kreiner, P. Kron, and B. C. Husband, “Frequency and maintenance of unreduced gametes in natural plant populations: associations with reproductive mode, life history and genome size.,” The New Phytologist,

44 vol. 214, no. 2, pp. 879–889, 2017. [34] M. Upcott, “The nature of tetraploidy in Primula kewensis,” Journal of Genetics, vol. 39, pp. 79–100, Nov 1939. [35] B. Astaurov, “Experimental polyploidy in animals,” Annual Review of Genetics, vol. 3, no. 1, pp. 99–126, 1969. [36] W. Parrott, R. Smith, and M. Smith, “Bilateral sexual tetraploidization in red clover,” Canadian journal of genetics and cytology, vol. 27, no. 1, pp. 64–68, 1985. [37] J. E. Werner and S. J. Peloquin, “Yield and tuber characteristics of 4x progeny from 2x×2x crosses,” Potato Research, vol. 34, no. 3, pp. 261–267, 1991. [38] H. Ochman, J. G. Lawrence, and E. A. Groisman, “Lateral gene transfer and the nature of bacterial innovation,” Nature, vol. 405, no. 6784, pp. 299–304, 2000. [39] K. D. Whitney, R. A. Randell, and L. H. Rieseberg, “Adaptive introgression of herbivore resistance traits in the weedy sunflower Helianthus annuus,” The American Naturalist, vol. 167, no. 6, pp. 794–807, 2006. [40] M. Weill, F. Chandre, C. Brengues, S. Manguin, M. Akogbeto, N. Pasteur, P. Guillet, and M. Raymond, “The kdr mutation occurs in the Mopti form of Anopheles gambiaes. s. through introgression,” Insect Molecular Biology, vol. 9, no. 5, pp. 451–455, 2000. [41] K. K. Dasmahapatra, J. R. Walters, A. D. Briscoe, J. W. Davey, A. Whibley, N. J. Nadeau, A. V. Zimin, D. S. Hughes, L. C. Ferguson, S. H. Martin, et al., “Butterfly genome reveals promiscuous exchange of mimicry adaptations among species,” Nature, vol. 487, no. 7405, p. 94, 2012. [42] P. R. Grant and B. R. Grant, 40 Years of Evolution: Darwin’s Finches on Daphne Major Island. Princeton University Press, 2014. [43] F. Racimo, S. Sankararaman, R. Nielsen, and E. Huerta-Sánchez, “Evidence for archaic adaptive introgression in humans,” Nature Reviews Genetics, vol. 16, no. 6, p. 359, 2015. [44] L. Comai, “The advantages and disadvantages of being polyploid,” Nature Reviews Genetics, vol. 6, no. 11, pp. 836–846, 2005. [45] J. A. Tate, P. Joshi, K. A. Soltis, P. S. Soltis, and D. E. Soltis, “On the road to diploidization? Homoeolog loss in independently formed populations of the allopolyploid Tragopogon miscellus (Asteraceae),” BMC Plant Biology, vol. 9, no. 1, p. 80, 2009. [46] J. C. Schnable, X. Wang, J. C. Pires, and M. Freeling, “Escape from preferential retention following repeated whole genome duplications in plants,” Frontiers in plant science, vol. 3, p. 94, 2012. [47] R. De Smet, K. L. Adams, K. Vandepoele, M. C. Van Montagu, S. Maere, and Y. Van de Peer, “Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants,” Proceedings of the National Academy of Sciences, vol. 110, no. 8, pp. 2898–2903, 2013. [48] A. M. Session, Y. Uno, T. Kwon, J. A. Chapman, A. Toyoda, S. Takahashi, A. Fukui, A. Hikosaka, A. Suzuki, M. Kondo, et al., “Genome evolution in the allotetraploid frog Xenopus laevis,” Nature, vol. 538, no. 7625, pp. 336–343, 2016. [49] H. B. D. Kettlewell, “Selection experiments on industrial melanism in the

45 lepidoptera,” Heredity, vol. 9, no. 3, p. 323, 1955. [50] L. Cook, B. Grant, I. Saccheri, and J. Mallet, “Selective bird predation on the peppered moth: the last experiment of Michael Majerus,” Biology Letters, vol. 8, no. 4, pp. 609–612, 2012. [51] J. B. S. Haldane, “A mathematical theory of natural and artificial selection, part V: selection and mutation,” in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 23, pp. 838–844, Cambridge University Press, 1927. [52] J. Masel, “Genetic drift,” Current Biology, vol. 21, no. 20, pp. R837–R838, 2011. [53] M. Kimura et al., “Evolutionary rate at the molecular level,” Nature, vol. 217, no. 5129, pp. 624–626, 1968. [54] M. W. Hahn, “Toward a selection theory of molecular evolution,” Evolution, vol. 62, no. 2, pp. 255–265, 2008. [55] A. Wagner, “Neutralism and selectionism: a network-based reconciliation,” Nature Reviews Genetics, vol. 9, no. 12, pp. 965–974, 2008. [56] M. Nei, Y. Suzuki, and M. Nozawa, “The neutral theory of molecular evolution in the genomic era,” Annual Review of Genomics and Human Genetics, vol. 11, pp. 265–289, 2010. [57] J. J. Vitti, S. R. Grossman, and P. C. Sabeti, “Detecting natural selection in genomic data,” Annual Review of Genetics, vol. 47, pp. 97–120, 2013. [58] S. Wright, “Evolution in Mendelian populations,” Genetics, vol. 16, no. 2, pp. 97–159, 1931. [59] F. Tajima, “Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.,” Genetics, vol. 123, no. 3, pp. 585–595, 1989. [60] G. Watterson, “On the number of segregating sites in genetical models without recombination,” Theoretical Population Biology, vol. 7, no. 2, pp. 256–276, 1975. [61] V. Grant, Plant speciation. New York: Columbia University Press, 1981. [62] G. N. Stone, S. Nee, and J. Felsenstein, “Controlling for non-independence in comparative analysis of patterns across populations within species,” Philosophical Transactions of the Royal Society of London B: Biological Sciences, vol. 366, no. 1569, pp. 1410–1424, 2011. [63] B. Devlin and K. Roeder, “Genomic control for association studies,” Biometrics, vol. 55, no. 4, pp. 997–1004, 1999. [64] J. K. Pritchard and P. Donnelly, “Case-control studies of association in structured or admixed populations,” Theoretical Population Biology, vol. 60, no. 3, pp. 227–237, 2001. [65] J. R. Andersen, T. Schrag, A. E. Melchinger, I. Zein, and T. Lübberstedt, “Validation of Dwarf8 polymorphisms associated with flowering time in elite European inbred lines of maize (Zea mays l.),” Theoretical and Applied Genetics, vol. 111, no. 2, pp. 206–217, 2005. [66] L. Camus-Kulandaivelu, J.-B. Veyrieras, D. Madur, V. Combes, M. Fourmann, S. Barraud, P. Dubreuil, B. Gouesnard, D. Manicacci, and A. Charcosset, “Maize adaptation to temperate climate: relationship between population structure and polymorphism in the Dwarf8 gene,” Genetics, vol. 172, no. 4, pp. 2449–2463, 2006.

46 [67] D. Kryvokhyzha, “Whole genome resequencing of Heliconius butterflies revolutionizes our view of the level of admixture between species,” Master’s thesis, Uppsala University, 2014. [68] Y. Song, S. Endepols, N. Klemann, D. Richter, F.-R. Matuschka, C.-H. Shih, M. W. Nachman, and M. H. Kohn, “Adaptive introgression of anticoagulant rodent poison resistance by hybridization between old world mice,” Current Biology, vol. 21, no. 15, pp. 1296–1301, 2011. [69] I. W. G. S. Consortium et al., “A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome,” Science, vol. 345, no. 6194, p. 1251788, 2014. [70] A. V. Zimin, D. Puiu, R. Hall, S. Kingan, B. J. Clavijo, and S. L. Salzberg, “The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum.,” GigaScience, vol. 6, no. 11, pp. 1–7, 2017. [71] S. R. Browning and B. L. Browning, “Haplotype phasing: existing methods and new developments,” Nature Reviews Genetics, vol. 12, no. 10, p. 703, 2011. [72] J. F. Crow, “90 years ago: The beginning of hybrid maize,” Genetics, vol. 148, no. 3, pp. 923–928, 1998. [73] G. H. Shull, “Elementary species and hybrids of Bursa,” Science, vol. 25, no. 641, p. 590, 1907. [74] S. E. Hill, “Chromosome numbers in the genus Bursa,” The Biological Bulletin, vol. 53, no. 6, pp. 413–415, 1927. [75] G. H. Shull, “Species hybridization among old and new species of shepherd’s purse,” Int Congr Plant Sci., no. 1, pp. 837–888, 1929. [76] H. Hurka, N. Friesen, D. A. German, A. Franzke, and B. Neuffer, “‘Missing link’ species Capsella orientalis and Capsella thracica elucidate evolution of model plant genus Capsella (Brassicaceae),” Molecular Ecology, vol. 21, no. 5, pp. 1223–1238, 2012. [77] M. Coquillat, “Sur les plantes les plus communes a la surface du globe,” Bulletin mensuel de la Société linnéenne de Lyon, vol. 20, no. 7, pp. 165–170, 1951. [78] G. M. Douglas, G. Gos, K. A. Steige, A. Salcedo, K. Holm, E. B. Josephs, R. Arunkumar, J. A. Ågren, K. M. Hazzouri, W. Wang, et al., “Hybrid origins and the earliest stages of diploidization in the highly successful recent polyploid Capsella bursa-pastoris,” Proceedings of the National Academy of Sciences, vol. 112, no. 9, pp. 2806–2811, 2015. [79] D. Kryvokhyzha, A. Salcedo, M. C. Eriksson, T. Duan, N. Tawari, J. Chen, M. Guerrina, J. M. Kreiner, T. V. Kent, U. Lagercrantz, et al., “Parental legacy, demography, and introgression influenced the evolution of the two subgenomes of the tetraploid Capsella bursa-pastoris (Brassicaceae),” bioRxiv, p. 234096, 2017. [80] H. Hurka, S. Freundner, A. H. Brown, and U. Plantholt, “Aspartate aminotransferase isozymes in the genus Capsella (Brassicaceae): Subcellular location, gene duplication, and polymorphism,” Biochemical genetics, vol. 27, no. 1, pp. 77–90, 1989. [81] T. Slotte, K. M. Hazzouri, J. A. Ågren, D. Koenig, F. Maumus, Y.-L. Guo, K. Steige, A. E. Platts, J. S. Escobar, L. K. Newman, et al., “The Capsella

47 rubella genome and the genomic consequences of rapid mating system evolution,” Nature Genetics, vol. 45, no. 7, pp. 831–835, 2013. [82] M. A. Koch, B. Haubold, and T. Mitchell-Olds, “Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae),” Molecular Biology and Evolution, vol. 17, no. 10, pp. 1483–1498, 2000. [83] M. Koch, B. Haubold, and T. Mitchell-Olds, “Molecular systematics of the Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences,” American Journal of Botany, vol. 88, no. 3, pp. 534–544, 2001. [84] A. Sicard, N. Stacey, K. Hermann, J. Dessoly, B. Neuffer, I. Bäurle, and M. Lenhard, “Genetics, evolution, and adaptive significance of the selfing syndrome in the genus Capsella,” The Plant Cell, vol. 23, no. 9, pp. 3156–3171, 2011. [85] A. Sicard, C. Kappel, Y. W. Lee, N. J. Woźniak, C. Marona, J. R. Stinchcombe, S. I. Wright, and M. Lenhard, “Standing genetic variation in a tissue-specific enhancer underlies selfing-syndrome evolution in Capsella,” Proceedings of the National Academy of Sciences, vol. 113, no. 48, pp. 13911–13916, 2016. [86] C. A. Rebernig, C. Lafon-Placette, M. R. Hatorangan, T. Slotte, and C. Köhler, “Non-reciprocal interspecies hybridization barriers in the Capsella genus are established in the endosperm,” PLoS Genetics, vol. 11, no. 6, p. e1005295, 2015. [87] A. Sicard, C. Kappel, E. B. Josephs, Y. W. Lee, C. Marona, J. R. Stinchcombe, S. I. Wright, and M. Lenhard, “Divergent sorting of a balanced ancestral polymorphism underlies the establishment of gene-flow barriers in Capsella,” Nature Communications, vol. 6, 2015. [88] S. Petrone, M. Lascoux, and S. Glémin, “Competitive ability of Capsella species with different mating systems and ploidy levels,” Annals of Botany, p. in press, 2018. [89] X. Yang, M. Lascoux, and S. Glemin, “Competitive ability of a tetraploid selfing species (Capsella bursa-pastoris) across its expansion range and comparison with its sister species,” bioRxiv, p. 214866, 2017. [90] H. Hurka and B. Neuffer, “Evolutionary processes in the genus Capsella (Brassicaceae),” Plant Systematics and Evolution, vol. 206, no. 1-4, pp. 295–316, 1997. [91] T. Slotte, K. Holm, L. M. McIntyre, U. Lagercrantz, and M. Lascoux, “Differential expression of genes important for adaptation in Capsella bursa-pastoris (Brassicaceae),” Plant physiology, vol. 145, no. 1, pp. 160–173, 2007. [92] T. Slotte, H.-R. Huang, K. Holm, A. Ceplitis, K. S. Onge, J. Chen, U. Lagercrantz, and M. Lascoux, “Splicing variation at a FLOWERING LOCUS C homeolog is associated with flowering time variation in the tetraploid Capsella bursa-pastoris,” Genetics, vol. 183, no. 1, pp. 337–345, 2009. [93] H.-R. Huang, P.-C. Yan, M. Lascoux, and X.-J. Ge, “Flowering time and transcriptome variation in Capsella bursa-pastoris (Brassicaceae),” New Phytologist, vol. 194, no. 3, pp. 676–689, 2012. [94] D. Kryvokhyzha, K. Holm, J. Chen, A. Cornille, S. Glémin, S. I. Wright,

48 U. Lagercrantz, and M. Lascoux, “The influence of population structure on gene expression and flowering time variation in the ubiquitous weed Capsella bursa-pastoris (Brassicaceae),” Molecular Ecology, vol. 25, no. 5, pp. 1106–1121, 2016. [95] T. Slotte, H. Huang, M. Lascoux, and A. Ceplitis, “Polyploid speciation did not confer instant reproductive isolation in Capsella (Brassicaceae),” Molecular Biology and Evolution, vol. 25, no. 7, pp. 1472–1481, 2008. [96] A. Cornille, A. Salcedo, D. Kryvokhyzha, S. Glémin, K. Holm, S. I. Wright, and M. Lascoux, “Genomic signature of successful colonization of eurasia by the allopolyploid shepherd’s purse (Capsella bursa-pastoris),” Molecular Ecology, vol. 25, no. 2, pp. 616–629, 2016. [97] C. Parisod, “Polyploids integrate genomic changes and ecological shifts,” New Phytologist, vol. 193, no. 2, pp. 297–300, 2012. [98] R. J. Elshire, J. C. Glaubitz, Q. Sun, J. A. Poland, K. Kawamoto, E. S. Buckler, and S. E. Mitchell, “A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species,” PloS one, vol. 6, no. 5, p. e19379, 2011. [99] D. H. Alexander, J. Novembre, and K. Lange, “Fast model-based estimation of ancestry in unrelated individuals,” Genome research, vol. 19, no. 9, pp. 1655–1664, 2009. [100] M. A. Beaumont, W. Zhang, and D. J. Balding, “Approximate Bayesian computation in population genetics.,” Genetics, vol. 162, no. 4, pp. 2025–2035, 2002. [101] B. S. Weir and C. C. Cockerham, “Estimating F-statistics for the analysis of population structure,” evolution, vol. 38, no. 6, pp. 1358–1370, 1984. [102] M. A. Beaumont, “Approximate Bayesian Computation in Evolution and Ecology,” dx.doi.org, 2010. [103] C. Roux and J. R. Pannell, “Inferring the mode of origin of polyploid species from next-generation sequence data,” Molecular ecology, vol. 24, no. 5, pp. 1047–1059, 2015. [104] A. Ceplitis, Y. Su, and M. Lascoux, “Bayesian inference of evolutionary history from chloroplast microsatellites in the cosmopolitan weed Capsella bursa-pastoris (Brassicaceae),” Molecular ecology, vol. 14, no. 14, pp. 4221–4233, 2005. [105] P. Khaitovich, G. Weiss, M. Lachmann, I. Hellmann, W. Enard, B. Muetzel, U. Wirkner, W. Ansorge, and S. Pääbo, “A neutral model of transcriptome evolution,” PLoS Biology, vol. 2, no. 5, p. e132, 2004. [106] M. Broadley, P. White, J. Hammond, N. Graham, H. Bowen, Z. Emmerson, R. Fray, P. Iannetta, J. McNicol, and S. May, “Evidence of neutral transcriptome evolution in plants,” New Phytologist, vol. 180, no. 3, pp. 587–593, 2008. [107] A. Whitehead and D. L. Crawford, “Neutral and adaptive variation in gene expression,” Proceedings of the National Academy of Sciences, vol. 103, no. 14, pp. 5425–5430, 2006. [108] P. Suárez-López, K. Wheatley, F. Robson, H. Onouchi, F. Valverde, and G. Coupland, “CONSTANS mediates between the circadian clock and the control of flowering in Arabidopsis,” Nature, vol. 410, no. 6832, p. 1116, 2001. [109] F. Valverde, A. Mouradov, W. Soppe, D. Ravenscroft, A. Samach, and

49 G. Coupland, “Photoreceptor regulation of constans protein in photoperiodic flowering,” Science, vol. 303, no. 5660, pp. 1003–1006, 2004. [110] R. J. Hijmans, S. E. Cameron, J. L. Parra, P. G. Jones, and A. Jarvis, “Very high resolution interpolated climate surfaces for global land areas,” International Journal of Climatology, vol. 25, no. 15, pp. 1965–1978, 2005. [111] A. Trabucco and R. J. Zomer, “Global aridity index (global-aridity) and global potential evapo-transpiration (global-PET) geospatial database,” CGIAR Consortium for Spatial Information, 2009. [112] P. Pavlidis, J. D. Jensen, W. Stephan, and A. Stamatakis, “A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans,” Molecular Biology and Evolution, vol. 29, no. 10, pp. 3237–3248, 2012. [113] C. D. Huber, M. Nordborg, J. Hermisson, and I. Hellmann, “Keeping it local: Evidence for positive selection in Swedish Arabidopsis thaliana,” Molecular Biology and Evolution, vol. 31, no. 11, pp. 3026–3039, 2014. [114] B. F. Voight, S. Kudaravalli, X. Wen, and J. K. Pritchard, “A map of recent positive selection in the human genome,” PLoS Biology, vol. 4, no. 3, p. e72, 2006. [115] S. Peischl, I. Dupanloup, L. Bosshard, and L. Excoffier, “Genetic surfing in human populations: from genes to genomes,” Current Opinion in Genetics & Development, vol. 41, pp. 53–61, 2016. [116] K. J. Gilbert, N. P. Sharp, A. L. Angert, G. L. Conte, J. A. Draghi, F. Guillaume, A. L. Hargreaves, R. Matthey-Doret, and M. C. Whitlock, “Local adaptation interacts with expansion load during range expansion: Maladaptation reduces expansion load,” The American Naturalist, vol. 189, no. 4, pp. 368–380, 2017. [117] K. L. Adams, R. Cronn, R. Percifield, and J. F. Wendel, “Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing,” Proceedings of the National Academy of Sciences, vol. 100, no. 8, pp. 4649–4654, 2003. [118] R. J. Buggs, L. Zhang, N. Miles, J. A. Tate, L. Gao, W. Wei, P. S. Schnable, W. B. Barbazuk, P. S. Soltis, and D. E. Soltis, “Transcriptomic shock generates evolutionary novelty in a newly formed, natural allopolyploid plant,” Current Biology, vol. 21, no. 7, pp. 551–556, 2011. [119] R. J. Buggs, S. Chamala, W. Wu, J. A. Tate, P. S. Schnable, D. E. Soltis, P. S. Soltis, and W. B. Barbazuk, “Rapid, repeated, and clustered loss of duplicate genes in allopolyploid plant populations of independent origin,” Current Biology, vol. 22, no. 3, pp. 248–252, 2012. [120] M.-J. Yoo, X. Liu, J. C. Pires, P. S. Soltis, and D. E. Soltis, “Nonadditive gene expression in polyploids,” Annual Review of Genetics, vol. 48, pp. 485–517, 2014. [121] M. Pfeifer, K. G. Kugler, S. R. Sandve, B. Zhan, H. Rudi, T. R. Hvidsten, K. F. Mayer, O.-A. Olsen, I. W. G. S. Consortium, et al., “Genome interplay in the grain transcriptome of hexaploid bread wheat,” Science, vol. 345, no. 6194, p. 1250091, 2014. [122] P. J. Wittkopp, B. K. Haerum, and A. G. Clark, “Evolutionary changes in cis and trans gene regulation,” Nature, vol. 430, no. 6995, p. 85, 2004.

50 [123] M.-C. Combes, A. Dereeper, D. Severac, B. Bertrand, and P. Lashermes, “Contribution of subgenomes to the transcriptome and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures.,” The New Phytologist, vol. 200, no. 1, pp. 251–260, 2013.

51 Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1632 Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-341709 2018