<<

Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1714

Speciation of recently diverged

VENKAT TALLA

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0429-8 UPPSALA urn:nbn:se:uu:diva-358292 2018 Dissertation presented at Uppsala University to be publicly examined in Lindahlsalen, Evolutionary Centre, Norbyvagan 18, Uppsala, Friday, 19 October 2018 at 10:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Dr Nicola Nadeau (University of Sheffield).

Abstract Talla, V. 2018. genetics of recently diverged species. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1714. 74 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0429-8.

Species differentiation can be a consequence of evolutionary forces including and random . Patterns of genomic differentiation vary across the . This variation seems to be dependent on, for example, differences in genomic architecture and molecular mechanisms. However, the knowledge we currently possess, both regarding the processes driving speciation and the resulting genomic signatures, is from a very small subset of the overall that resides on the planet. Therefore, characterization of the architecture of genomic divergence from more groups will be important to understand the effects of molecular mechanisms and evolutionary forces driving divergence between lineages. Hence it has not been possible to come to a consensus on the relative importance of genetic drift and natural selection on divergence processes in general. In this thesis, I use genomic approaches to investigate the forces underlying species and population differentiation in the European cryptic wood white ( sinapis, L. reali and L. juvernica) and two closely related species, the chiffchaff (Phylloscopus collybita abietinus) and the (P. tristis). Both these groups contain recently diverged species, a prerequisite for investigating initial differentiation processes. However, the study systems also differ in several respects, allowing for applying distinct approaches to understand the divergence process in each system. In summary, by applying a suite of genomic approaches, my thesis work gives novel insights into the speciation history of wood whites and chiffchaff. I identify candidate genes for local in both systems and concludes that differentiation in wood white butterflies have been driven by a combination of random genetic drift and week directional selection in allopatry. In the chiffchaff, the general differentiation landscape seems to have been shaped by recurrent background selection (and potentially selective sweeps), likely as a consequence of regional variation in the recombination rate which has also been observed in other genome- scans in . Potentially, some of the highly differentiated regions contain barriers to gene- flow as these regions are still present in sympatry, where species exchange genetic material at a high rate.

Keywords: Speciation, Leptidea, , chiffchaff, Cryptic species, genome-scan, Diapause

Venkat Talla, Department of and Genetics, , Norbyvägen 18D, Uppsala University, SE-75236 Uppsala, .

© Venkat Talla 2018

ISSN 1651-6214 ISBN 978-91-513-0429-8 urn:nbn:se:uu:diva-358292 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-358292)

To Mom & Dad

List of Papers

I Talla, V., Suh, A., Kalsoom, F., Dincă, V., Vila, R., Friberg, M., Wiklund, C. and Backström, N. (2017) Rapid increase in genome size as a consequence of transposable element hyperactivity in wood-white (Leptidea) butterflies. Genome Biology and Evolu- tion, 9(10), 2491-2505.

II Talla, V., Johansson, A., Dincă, V., Vila, R., Friberg, M., Wiklund, C. and Backström, N. Lack of parallelism: lineage-spe- cific directional selection drives genome differentiation across a triplet of cryptic species. Submitted manuscript.

III Talla, V., Kalsoom, F., Shipilina, D., Marova, I. and Backström, N. (2017) Heterogeneous Patterns of Genetic Diversity and Dif- ferentiation in European and Siberian Chiffchaff (Phylloscopus collybita abietinus/P. tristis). G3: Genes, , Genetics, 7(12), 3983-3998.

IV Talla, V., Soler, L., Kawakami, T.,, Dincă, V., Vila, R., Friberg, M., Wiklund, C. and Backström, N. Dissecting the effects of na- tural selection and on genetic diversity in three recently diverged cryptic butterfly species. Manuscript.

V Leal, L.*, Talla, V.*, Källman, T., Friberg, M., Wiklund, C., Dincă, V., Vila, R. and Backström, N. (2018) Gene expression profiling across ontogenetic stages in the wood white () reveals pathways linked to butterfly diapause regulation. Molecular Ecology, 27(4), 935-948.

* These authors contributed equally to the study.

Reprints were made with permission from the respective publishers.

Contents

1. Introduction ...... 11 1.1 Speciation ...... 13 1.2 Speciation genes ...... 15 1.3 ...... 16 1.3.1 Identifying genes responsible for divergent traits ...... 17 1.3.2 Effect of hybridization ...... 18 1.3.3 Effect of genetic drift ...... 20 1.4 Genome ...... 21 1.4.1 Evolution of transposable elements ...... 21 1.4.2 Evolution of coding genes ...... 22 1.5 Gene expression differences ...... 23 1.6 Cryptic species ...... 24 1.7 Diapause ...... 25 2. Methods ...... 26 2.1 Study systems ...... 26 2.1.1 The wood white butterflies ...... 26 2.1.2 The common chiffchaff ...... 28 2.2 Genome sequencing and assembly ...... 29 2.3 Population re-sequencing ...... 31 2.4 ...... 32 2.4.1 Inference of demographic history ...... 32 2.4.2 Detecting population structure ...... 33 2.4.3 Introgression analysis ...... 33 2.4.4 Genome wide patterns of diversity and differentiation ...... 34 2.4.5 Selection tests ...... 35 2.5 Gene annotation ...... 36 2.6 RNA sequencing and gene expression ...... 36 3. Research aims ...... 38 3.1 General aims ...... 38 3.2 Scientific aims ...... 39

4. Paper summaries ...... 41 5. Conclusions and future projects ...... 48 Svensk sammanfattning ...... 51 Acknowledgements ...... 53 References ...... 55

Abbreviations

DNA Deoxyribonucleic acid RNA Ribonucleic acid SNP Single Indel Insertion and deletion 4D sites 4 fold degenerate sites TE Transposable element TMRCA Time to the most recent common ancestor Mb Mega base pairs Kb Kilo base pairs bp My Million years

10 1. Introduction

The vast diversity of species that resides on earth has probably always been fascinating to mankind. Naming and clustering of into groups has been done by civilisations since the beginning of written language (Stevens, 2001). Although several civilisations classified organisms in their own lan- guages, the most known early classifications were conducted by early Greek philosophers Aristotle and his student Theophrastus (Stevens, 2001). It wasn’t until 1758 that a Swedish scientist named classified organisms using binomial nomenclature and by clustering similar organisms into groups called taxa (Linnaeus, 1758). The concept of natural selection in the formation of species was introduced by almost a century later in his book “ by Means of Natural Selection” (Darwin, 1859). Darwin described how adaptive change resulted in phenotypically distinct populations. However, Darwin did not explain in detail how species accumu- late reproductive incompatibilities – just the process of evolutionary change via natural selection (Darwin, 1859; Mallet, 2007b). Although there were mixed reactions to Darwin’s theory, he raised important arguments in support of the evolutionary theory, despite that he lacked knowledge about genetics or hereditary principles. Almost 40 years later, in 1864, Gregor Mendel formu- lated the laws of inheritance, which brought about the principles of how genes (he called them particles) segregate from parents to the next generation (Mendel, 1866). The inspiration to studying speciation and identifying species grew drastically after these major discoveries. Naturalists continuously clas- sified species based on morphological similarities (Mallet, 2007b) and as mu- seum collections around the world became more accessible, the word ‘subspe- cies’ came into place to classify geographical variants (Wallace, 1865). “What are species?” “How are species formed?” These questions have been spurring a major debate among naturalists ever since Darwin’s spearheading work, and have been key components in the field of evolutionary biology. Although, when the question of what a species really is was first raised, Poul- ton and Wallace proposed the term “syngamy” to refer to interbreeding indi- viduals as species (Wallace, 1865). In 1930, Dobzhansky studied morpholog- ically different Drosophila species and found that individuals preferred to mate with their conspecific mates rather than the individuals from the other species. This led Dobzhansky to propose the concept of species as an inter- breeding group of individuals (Dobzhansky, 1936; Mallet, 2007b). This was later popularised by Ernst Mayr as the “Biological species concept” (Mayr,

11 1963). Mayr described species as a group of interbreeding individuals occur- ring in natural populations that are reproductively isolated from other such groups (Mayr, 1970, 1982). Although this concept of species to some extent cleared the questions regarding previously applied morphological methods, new questions of already classified and geographical variants were to be treated as separate species appeared (Mallet, 2007b). Since the assess- ment of reproductive isolation is tedious and time consuming and sometimes not even possible, this also raised novel questions about if populations that were considered as one species could actually consist of undetected groups of reproductively isolated lineages. The discovery of cryptic species (see in sec- tion 1.6) has increased after the publication of the Ernst Mayr’s biological species concept and actualized the problem of species classification (Bickford et al., 2007). The biological species concept is now generally reduced to an explanatory model for how new species can be formed, but the concept itself does not provide a detailed solution to classify reproductively isolated species. A consensus is rather that the detailed methods, specific for different taxo- nomic groups, are required to ideally classify species (Bickford et al., 2007) if we should continue discussing ‘species’ as independent or reticulated enti- ties (Mallet, Besansky and Hahn, 2016). Using a combination of Darwin’s theory of evolution by natural selection and Mendel’s laws of inheritance, Ronald Fisher in his book ‘The Genetical Theory of Natural Selection’ described how populations could accumulate re- productive barriers by gene frequency changes that can cause phenotypic changes resulting in speciation (Fisher, 1930). Fisher’s expertise in mathemat- ics and theoretical predictions, made his early work combined with Wright’s and Haldane’s work laid the foundation for the field of population genetics (Fisher, 1930; Wright, 1931; Haldane, 1934). The population genetics during the Modern Synthesis had described population differentiation as combined effect of multiple genes using mathematical models (Fisher, 1930). Wright and Fisher introduced the concept of allelic drift as new arising in a population increases in frequency due to stochastic processes (Wright, 1931, 1942). However, it was not until the discovery of DNA as genetic material by (Hershey and Chase, 1952) and the development of methods to generate mo- lecular information (Hubby and Lewontin, 1966; Lewontin and Hubby, 1966) that biologists could start using molecular variation data in natural populations to understand evolutionary processes (McDonald and Kreitman, 1991). Wide range of study systems spanning across the tree of life have been investigated to understand the genetic basis of speciation and reproductive isolation (Seehausen et al., 2014). In this thesis I tried to understand the ge- netic basis of speciation in natural populations of the closely related wood white butterflies (Leptidea sinapis, L. reali and L. juvernica) and two recently diverged bird species, the common European chiffchaff (Phylloscopus colly- bita abietinus) and the Siberian chiffchaff (P. tristis). I have investigated the genome wide patterns of genetic differentiation in these two species groups to

12 identify factors responsible for reproductive isolation and speciation. Using the previously available resources and knowledge about these species, we tried to expand the genetic knowledge and genomic resources to these two study systems. Since diapause (see in section 1.7) has been a key trait for local adaptation in species, we also studied gene expression differences to understand the genetic basis of diapause regulation in the wood white butter- (L. sinapis).

1.1 Speciation The process of speciation has been of major interest for evolutionary biolo- gists for as long as this field of research has existed (De Queiroz, 2007). Spe- ciation can be defined as the process of diversification of populations into dis- tinct entities by accumulation of genetic and phenotypic differences that cause reproductive isolation between them (Mayr, 1942). This gradual transfor- mation from populations to species was emphasised already by Darwin and Wallace, although they had no knowledge of potential genetic barriers be- tween populations (Mayr, 1942; Mallet, 2008). Reproductive isolation refers to any form of reduction or restriction to gene-flow between individuals and are generally classified as pre- or post-mating barriers. Within the class of post-mating barriers there is generally a distinction between pre-zygotic or post-zygotic barriers to discriminate between gametic incompatibilities and barriers due to reduced (Via, 2009). Pre-mating barriers include ecological isolation, where populations are ecologically isolated and hindered from mating due to ecological barriers, for example due to differences in sea- sonal occurrence (Harrison, 1979; Futuyma, 2014). An example of that is pro- vided by the species Gryllus pennsylvanicus and G. veletis, two closely related cricket species that occur in north-eastern United States that are reproductively isolated as they reach reproductive age in different seasons (Harrison, 1979). A similar phenomenon, illustrating that ecological differences may keep spe- cies apart also in sympatry, it is observed in the green lacewings (Chrysoperla) (Martínez Wells and Henry, 1992) . In this system, three closely related spe- cies (C. plorabunda, C. adamsi and C. johnsoni) occur in sympatry but do not recognise each other’s songs (Martínez Wells and Henry, 1992). Post-mating, pre-zygotic barriers prevent by restricting the formation of success- ful hybrids even if the mating takes place, for example via gametic incompat- ibilities (Alipaz, Wu and Karr, 2001) or mismatching genitalia that prevent successful fertilization. In many insect species, genital morphology differ be- tween closely related species restricting the formation of hybrids (Sota and Kubota, 1998). This phenomenon is observed in closely related species of Drosophila where males of the three species (D. simulans, D. sechellia and D. mauritiana) exhibit different genital structures which restricts the transfer of

13 sperm to females when interspecific crosses occur (Eberhard, 1996). Post-zy- gotic barriers include hybrid inviability where hybrids mortality is higher than non-hybrids, and/or hybrid sterility/infertility where hybrids are completely or partly incapable of producing viable offspring. Hybrid inviability is seen be- tween D. melanogaster and D. simulans (Presgraves, 2010) and can be a rea- son for the maintenance of species integrity in closely related sympatric spe- cies pairs (Baker, 1959; Futuyma, 2014). One theoretical obstacle to the un- derstanding of formation of genetic incompatibilities is that novel mutations that provide some kind of isolation would result in reduced fitness of the pop- ulation in which they occur (Barton and Bengtsson, 1986). Bateson, Dobzhan- sky and Muller showed that, when populations are separated, mutations that contribute to reproductive isolation could happen independently at different loci (at least two) in each population, and these could come into effect when range shifts result in secondary contact (famously known as the Bateson-Dob- zhansky-Muller model) (Bateson, 1909; Dobzhansky, 1937; Muller, 1942; Orr, 1996). This model has attracted much attention since it does not include a period of reduced population fitness in any of the involved lineages, but on the other hand requires a period of allopatry. Indirect evidence for that Bateson-Dobzhansky-Muller model is prevalent comes from Haldane’s rule (Haldane, 1922). Haldane’s rule states that if there are genetic incompatibili- ties between species, these should be expressed more often in hybrids of the heterogametic sex (the sex with two different sex ) (Haldane, 1922). The reason for this is that recessive mutations on the sex-chromosomes (X in XY-systems, Z in ZW-systems) are expressed in the heterogametic sex. Study systems across the tree of life have been screened by evolutionary biologists to understand the barriers to gene flow and genetic incompatibilities between closely related species using modern molecular technologies (Seehausen et al., 2014). Studying genome-wide segregating polymorphisms in natural populations of closely related species with different divergence times can be useful to underpin genes responsible for reproductive isolation and these processes can be ‘caught in the act’ as they play their role in repro- ductive isolation (Vijay et al., 2016; Wolf and Ellegren, 2017). Diversification of species is achieved by accumulation of genetic differ- ences between the previously interbreeding populations. These differences are generally thought to accumulate gradually over time. The most common mode is likely the development of reproductive barriers during periods of geo- graphic isolation, either via random fixation of incompatibility or via divergent selection as a consequence of local adaptation to different habitats (Mayr, 1942, 1963; Levene, 1953). However, there has been a recent interest in understanding the efficiency of diversifying selection as a force to induce reproductive isolation between sympatric or parapatric populations – gener- ally referred to as ecological speciation (Jiggins et al., 2001; Schluter, 2001; Nosil, Crespi and Sandoval, 2002; Rundle and Nosil, 2005). In order to un- derstand the relative importance of random genetic drift and natural selection

14 in driving speciation, research has to be extended to more study systems – preferably closely related taxa where initial isolation processes are at play. In addition, to understanding the forces promoting speciation in natural settings, natural populations occurring in the wild have to be in focus.

1.2 Speciation genes Any gene contributing to the process of speciation is defined as a “speciation gene” (Nosil and Schluter, 2011). To classify a gene as speciation gene, it should hence be involved in reproductive isolation at some level (Nosil and Schluter, 2011). Genes responsible for reproductive isolation can be identified using different methods. Identification of genes under divergent selection in comparison to neutral genes can explain the role of natural selection and ran- dom genetic drift in driving speciation (Coyne and Orr, 2004; Schluter, 2009). One can also use candidate gene approaches to identify speciation genes that are involved in reproductive isolation between species. Such candidate genes are usually identified in model organisms and have specific functions that are relevant for the focal study system (Jeukens et al., 2009). Genome-wide se- lection scans can be used for detection of candidate genes present in the dif- ferentiation outlier regions of the genome (Haasl and Payseur, 2016). Alt- hough various genes have been identified that cause reproductive isolation in well studied model organisms like D. melanogaster (OdsH; (Ting et al., 1998; Ting, Tsaur and Wu, 2000; Sun, Ting and Wu, 2004), Hmr; (Barbash et al., 2003; Barbash, Awadalla and Tarone, 2004), Lhr; (Brideau et al., 2006), Nup96; (Presgraves et al., 2003)), there is currently no example of a verified gene that is involved in reproductive isolation in a natural study system out- side of Drosophila. The identification of ‘speciation genes’ has also been bi- ased towards studies comparatively deeply divergent lineages and the genes have mainly been involved in post-zygotic barriers (Orr, Masly and Presgraves, 2004). This means that it is difficult to judge whether the identi- fied genes actually were involved during the formation of reproductively iso- lated lineages or if they have evolved functions providing post-zygotic barri- ers after speciation was completed. Hence, it is important to extend speciation genetic research to study organisms that occur in the wild and to lineages that are at the early stage of divergence. With the advancement of next generation sequencing approaches, identification of loci involved in reproductive isola- tion between species can be done across the genome (Noor and Feder, 2006). However, the verification of a role in reproductive isolation can be hard to establish in most study systems and one has to be cautious to link phenotypic traits with specific genes (Tabor, Risch and Myers, 2002).

15 1.3 Local adaptation Environmental factors change with geography around the globe, forcing phe- notypic divergence in species living in different geographic regions (Endler, 1986). of species tend to change across large environmental gra- dients such as latitude, longitude, elevation and water availability(Atkinson and Sibly, 1997). Understanding the genetic factors responsible for these changes in phenotypes explains the role of natural selection in the diversifica- tion of these species (Endler, 1986; Conover, Duffy and Hice, 2009) as natural selection effects primarily on rather than (Mayr, 1963). Local adaptation can be observed in natural populations as phenotypic differ- entiation across an environmental gradient (Conover, Duffy and Hice, 2009). Rapidly changing climate will undoubtedly have a considerable effect on dis- tribution range shifts and many species will likely encounter radically differ- ent selection pressures. To understand the consequences of environmental change and to predict species distributions and potential formation of novel hybrid zones, it is important to understand the genetic basis of adaptive traits (Franks and Hoffmann, 2012). We are just at the beginning of understanding the genetic architecture underlying adaptive phenotypes. Some well-known examples include the colour polymorphism in the peppered moth (Biston bet- ularia) allowing for rapid expansion of melanic individuals during the indus- trial revolution in UK (Van’t Hof et al., 2011, 2016; Cook and Saccheri, 2013), relaxed selection leading to reduced number of armour plates in fresh- water colonizing three-spine sticklebacks (Gasterosteus aculeatus) (Cresko et al., 2004; Colosimo et al., 2005) and the evolutionary dynamics of cryptic colour in populations of stick (Timema cristinae) (Soria-Carrasco et al., 2014; Comeault et al., 2015; Lindtke et al., 2017; Nosil et al., 2018). With limited genomic resources it has been challenging to identify the genetic basis of traits related to local adaptation (Rockman, 2012). With the advent of ge- nomic resources (Ekblom and Galindo, 2011), efforts to investigate the ge- netic basis of adaptive traits can be initiated at a larger scale and for organisms with different properties. We can also dissect in more detail if adaptive traits are determined by few genes with large effect (Fisher, 1930) or many loci with smaller effects or a mix (Orr, 2001). The relationship between efficiency of selection and effective population size can also be studied in more detail when the genes underlying adaptive traits have been identified (Slotte et al., 2010; Corbett-Detig, Hartl and Sackton, 2015). Determining the genetic factors responsible for adaptive traits gives the power to predict evolutionary change. Although initial efforts using available knowledge in local adaptation and natural selection in some natural popula- tions have led evolutionary biologists to predict short term processes, long term patterns of evolution involving multiple random- and selective forces and environmental factors are extremely hard to predict (Grant and Grant, 2002; Lässig, Mustonen and Walczak, 2017; Nosil et al., 2018).

16 1.3.1 Identifying genes responsible for divergent traits Linking phenotypic data with data has traditionally been done using the method called QTL (Quantitative Trait Locus) mapping (Falconer and Mackay, 1996). With the availability of modern genomic tools, identification of adaptive difference can also be done using genome scan approaches of whole genomes of populations that are from distinct environments (Beaumont and Balding, 2004). Scanning genomes for frequency variations be- tween geographically separated populations to identify loci under selection is currently carried out in several species (Nielsen et al., 2007; Pool et al., 2010). Comparative can also be a powerful tool to characterize loci under selection (Schlötterer, 2003). The level of genetic differentiation between pop- ulations is traditionally measured by the F-statistic (Wright, 1921). If a locus in population A is selected for and the same locus in population B (which is geographically isolated from population A so that gene-flow does not occur) is evolving neutrally, this would reflect in a higher F-statistic at the selected locus than at neutral loci between the populations (Lewontin and Krakauer, 1973). However detection of selection only through a single estimator like the F-statistic can be dubious and subject to detection of considerable frequencies of false positives (Robertson, 1975). For example, a locus with high genetic differentiation (as estimated by F-statistics) can also be a consequence of se- lection in the ancestral population, reducing the in the re- gion before the lineages started diverging. This is because the F-statistic is calculated as a ratio of diversity estimates. Hence, a recommended procedure is to get a detailed understanding of the demographic history of populations and combine the F-statistic (Vitalis, Dawson and Boursot, 2001) with sum- mary statistics that are not dependent on intraspecific levels of genetic diver- sity. The estimator absolute divergence (DXY), is another statistic that describes genetic divergence between populations (it was referred before as πXY in Nei and Li, 1979 and as DXY in Nei, 1987). DXY gives the pairwise difference be- tween sequences of two populations, and it is independent of nucleotide di- versity levels in both populations (Nei and Li, 1979; Cruickshank and Hahn, 2014) as DXY captures the number of pairwise differences accumulated since the split from the most recent common ancestor of the two populations (Nei, 1987; Cruickshank and Hahn, 2014). Nucleotide diversity (Ɵπ) measures the within population pairwise differences in nucleotide sequences. FST is af- fected by the levels of nucleotide diversity in a region. FST can be inflated in the regions where nucleotide diversity within populations is low (Nei, 1973; Charlesworth, Nordborg and Charlesworth, 1997). DXY will not be affected by the levels of present nucleotide diversity in the populations, it can be affected by the levels of diversity in the ancestral state and the substitution rate. Loci harbouring larger number of fixed polymorphisms due to positive selection will show a higher DXY compared to the neutral loci (Cruickshank and Hahn, 2014).

17 Genome-wide differentiation has been studied in a wide range of species with the aim of identifying loci that could inform on traits under adaptive di- vergence or other reproductive barriers (Ellegren et al., 2012; Martin et al., 2013; Liu et al., 2014; Nadeau et al., 2014; Poelstra et al., 2014). Differentia- tion islands (regions with significantly higher genetic differentiation - in gen- eral FST - than other regions of the genome) have been observed in many of these comparisons and the initial interpretation was that such differentiation islands could be barriers to gene flow and therefore harbour genes of im- portance for reproductive isolation (Turner, Hahn and Nuzhdin, 2005). How- ever, more detailed analysis of empirical data using combinations of summary statistics and theoretical modelling and simulations showed that differentia- tion islands not necessarily point towards that the regions have been under diversifying selection or that they contain barriers to gene flow (Cruickshank and Hahn, 2014). In theory, it can just be a consequence of a regional reduc- tion in the recombination rate which results in reduced diversity due to selec- tive sweeps or recurrent background selection in the ancestor (Cruickshank and Hahn, 2014; Burri, 2017; Wolf and Ellegren, 2017). Hence, to understand genome divergence patterns it is essential to combine genome-wide genetic diversity data with detailed recombination data and use combinations of sum- mary statistics that pin-point different evolutionary forces (Seehausen et al., 2014; Wolf and Ellegren, 2017). It should be noted however, even if genome- scan approaches using summary statistics likely have too little power to iden- tify genes involved in reproductive isolation, such efforts can generate useful information about the general effects of natural selection and random genetic drift in shaping genome-wide patterns of diversity within species and diver- gence across species (Turner and Hahn, 2010; Cruickshank and Hahn, 2014). Importantly, even if genome-scans with summary statistics or comparative ge- nomics approaches can generate correlations between adaptive traits and spe- cific genotypes, verification experiments will be needed to establish a causal link between genotype and phenotype by integrating QTL maps to the genome scans (Rogers and Bernatchez, 2005). Various signatures of divergent selec- tion can be seen in the genome of divergent populations. Divergent selection reduces the nucleotide diversity in the species where selection is occurring and increases relative genetic differentiation between geographically / reproduc- tively isolated populations at these selected loci. Genomic regions under di- vergent selection are therefore most likely to reside in the highly differentiated regions of the genome (Beaumont and Nichols, 1996; Beaumont and Balding, 2004; Gompert et al., 2012).

1.3.2 Effect of hybridization Hybridization is a process where partly reproductively isolated populations mate and produce viable offspring (Harrison, 1990). The process of hybridi- zation is of interest for evolutionary biologists as it often represents the early

18 stages of speciation and sometimes can be the source of new (Mallet, 2007a; Taylor, Larson and Harrison, 2015). Hybrid zones are geo- graphical regions where genetically different populations interbreed and pro- duce offspring of mixed ancestry (Barton and Hewitt, 1985; Harrison, 1990). As a consequence of hybridization, quantitative phenotypic characters and al- lele frequencies of individuals change across the hybrid zone which generally results in a distinct cline (Barton and Hewitt, 1985). The processes forming hybrid zones can be separated into two categories. First, which is generally referred to as a primary hybrid zone, can originate due to geographical differ- ences in environmental conditions within the range of an initially homogene- ous population (Futuyma, 2014). As a previously homogeneous population inhabits ecologically different habitats, the population may diverge due to ad- aptation to the different niches. If gene-flow occurs between the populations inhabiting the different ecological niches, a primary hybrid zone may form. The question of how much gene-flow that is allowed to occur for this type of hybrid zone to be correctly interpreted is still under much debate as it relates very much to the classical dispute between sympatric and scenarios. Secondary hybrid zones are more easily understood and likely the most common type of hybrid zones observed. A secondary hybrid zone is formed when one or both of two allopatric populations that are genetically differentiated expand the distribution range. When the populations come into contact, hybridization may occur if reproductive barriers are permeable which results in a hybrid zone. As an example, the carrion crow (Corvus corone cor- one) and the hooded crow (C. cornix) have been through a range expansion as a result of recolonization after separation in glacial refugia. The two (sub)spe- cies formed a narrow hybrid zone at secondary contact where gene-flow now occurs (Hewitt, 2000; Poelstra et al., 2014). In both primary and secondary hybrid zones, allele frequency clines are expected in certain locithat are not introgressed and these may contain genes involved in reproductive isolation (Barton and Hewitt, 1985; Jiggins et al., 1996). (Futuyma, 2014). Understand- ing the dynamics of hybrid zones is one way to investigate reproductive bar- riers between divergent lineages. Describing gene-flow in different parts of the genome may reveal loci that are not introgressed and these may contain genes involved in reproductive isolation (Barton and Hewitt, 1985; Jiggins et al., 1996). Regions of the genomes can be transferred from one species to the other during hybridisation, these loci are referred to as introgressed loci and the pro- cess is called introgression. Introgression happens with transfer of alleles from one species to the other with multiple back-crossings (Barton and Hewitt, 1985; Szymura and Barton, 1986; Rieseberg, Whitton and Gardner, 1999; Lexer et al., 2007; Gompert and Buerkle, 2009). Transfer of this genetic ma- terial between species can supress the genetic differences accumulated be- tween them (Levin et al., 1996; Taylor et al., 2006; Vonlanthen et al., 2012). Regions that show reduced gene-flow as compared to the genomic average

19 between hybridising populations show us that the loci may have been under divergent selection and that they potentially affect the fitness of the hybrids, and in particular later generation back-crosses (Gompert et al., 2012). This effect of the locus on the hybrid fitness can be dependent on the environment (Nolte et al., 2009; Teeter et al., 2010). Genetic differences between popula- tions that are reproductively isolated can be due to non-adaptive forces like linked selection and genetic drift (Gavrilets, Li and Vose, 1998; Fierst and Hansen, 2010). Hybrid inviabilities can also be generated due to differential activity of transposable elements in non-hybridizing, geographically isolated populations (Hurst and Werren, 2001; Lynch, 2007).

1.3.3 Effect of genetic drift Genetic drift is the random change in allele frequencies in a population. It was first described by Wright and Fisher as random change in gene frequency in populations (Wright, 1931, 1942). It was tested in lab stains of Drosophila by Buri in 1956 as he raised 100 populations for 19 generations from 8 males and 8 females (in same conditions) to see random change in gene frequencies (Buri, 1956). The average time to fixation of a new neutral mutation in a dip- loid populations is derived to be in 4Ne generations (Ne is the effective popu- lation size of the population; Kimura and Ohta, 1969). Ne is not the census size of a population but the number of individuals contributing to the next generation (Wright, 1931). Different regions of the genome might have dif- ferent effective population size. The time to fixation of an allele in a popula- tion is dependent on the effective population size of the allele. A new mutation goes to fixation faster in populations with smaller effective population size compared to the populations with larger effective population size (Kimura and Ohta, 1969). The variability of in a region of the genome was pre- dicted to be 4Neµ, where µ is the mutation rate (Watterson, 1975). Hence the effect of genetic drift is dependent on the effective population size and the mutation rate of the population (Watterson, 1975). During the process of population differentiation due to selection or re- striction to gene flow in some regions of the genome, the other regions of the genome have a similar effect due to genetic drift (Wu, 2001; Gavrilets and Vose, 2005). Genetic hitchhiking of alleles linked to the selected loci can also lead to amplification of differentiation due to linkage (Robertson, 1961; Hill and Robertson, 1966). Regions of the genome that show high genetic differ- entiation are caused by selection or restriction to gene flow, regions with in- termediate levels of differentiation are caused by linkage to the selected loci and drift (Nosil, Funk and Ortiz-Barrientos, 2009). Demographic events also have a large effect on drift. If populations have different historic sizes and fluctuations due to environmental factors, Ne in the genome changes due to this fluctuations (Allendorf, Luikart and Aitken, 2009) and as a consequence, facilitates genetic differentiation.

20 1.4 Genome evolution Studying and comparing genomes of different species is a very recent ad- vancement in the field of evolutionary biology. This is obviously a conse- quence of the fact whole-genome sequencing has been affordable and techni- cally achievable for non-model species for not more than a decade (Gilad, Pritchard and Thornton, 2009). The advancement in molecular and sequencing techniques has helped evolutionary biologists describe genome architecture and get initial understanding of the processes driving genome size and com- position (Futuyma, 2014). It has been known since initial annotation processes that 1.5 % of the human genome consists of protein coding sequences (Kellis et al., 2014). The rest of the genome is a mix of different sequence elements with or without direct functional roles. The genome size varies drastically across the tree of life and it only correlates weakly with the inferred functional protein coding DNA – this is usually referred to as the “C-value paradox” after assessment of DNA (Thomas, 1971). C-value is the weight of a DNA meas- ured in pictograms (pg). The main DNA elements that seem to contribute to variation in genome sizes are transposable elements, TEs (Pritham, 2009; Tenaillon, Hollister and Gaut, 2010; Lee and Kim, 2014). There have been two arguments regarding the causes of the genome size expansion. One argu- ment is that the genome size expansion can be an adaptive process (Gregory and Hebert, 1999; Cavalier-Smith, 2005) and the other argument is random accumulation or loss of DNA (Petrov, 2001, 2002; Lynch, 2007).

1.4.1 Evolution of transposable elements Transposable elements (TEs) are selfish genetic elements that are ubiquitous across the genomes of eukaryotes, often encode their own enzymes to mediate spreading (e.g., reverse transcriptase, transposase), and may affect gene regulation through their transcription factor binding sites or non-coding RNA (Chuong, Elde and Feschotte, 2017). The involvement of transposable ele- ments in the evolution of eukaryotic genomes was first described by Barbara McClintock in 1956 when she was researching unstable inheritance of colour patterns in (Zea mays) seeds. She discovered two genetic loci in maize (Dissociation and Activator) that can change position on the and they control the colouration of maize seeds (McClintock, 1956). She was awarded the Nobel Prize in physiology and medicine for this discovery of transposition of genetic material. Later, it has been widely appreciated that TEs play key roles in the repetitive nature of eukaryotic genomes by their movement across the genome by copying-and-pasting (retrotransposons) or cutting-and-pasting (DNA transposons) (Britten and Davidson, 1971; Chuong, Elde and Feschotte, 2017). The actual movement of TEs in genomes can be explained by replication activities and genetic drift alone (Lynch, 2007). Transposable elements use the host cell mechanisms to utilize internal

21 genes that promote their own replication and spreading. Long terminal repeat elements (LTRs) are TEs that contain their own regulatory elements to pro- mote transcription (Mager and Stoye, 2015). Long interspersed nuclear ele- ments (LINEs) are long DNA elements that contains a promoter in the 5`UTR (Garcia-Perez et al., 2015). Short TEs that contain small RNA promoters (i.e., get transcribed by RNA polymerase III) and depend on enzymes generated by LINEs are classified as short interspersed nuclear elements (SINEs), which are short sequences that are transcribed using a promoter that is located within the SINE sequence, but they are dependent on the reverse transcription ma- chinery of the host genome or LINEs for their spreading in the genome (Garcia-Perez et al., 2015). LINEs, SINEs, LTRs can contribute to a signifi- cant proportion of the genome size increase (Sanmiguel and Bennetzen, 1998). DNA transposons proliferate in the eukaryote genome using a “cut-paste” mechanism to insert themselves in the DNA or using replication to proliferate and can also contribute to a significant proportion of the genome size increase (Feschotte and Pritham, 2007). TE proliferation in the genomes can have evo- lutionary consequences like changes in gene regulation (Rebollo, Romanish and Mager, 2012; Elbarbary, Lucas and Maquat, 2016; Chuong, Elde and Feschotte, 2017). The classic example for evolutionary implications due to TEs can be seen in Biston betularia (the peppered moth) as the TE insertion in an intron resulted in a phenotypic change in the colouration of the moth (Van’t Hof et al., 2016).

1.4.2 Evolution of protein coding genes The neutral theory of evolution by Kimura states that the majority of the fixed mutations in the genome do not have any fitness effects and are driven to fix- ation by genetic drift (Kimura, 1979). The rate at which amino acid or nucle- otide mutations happen is roughly constant across the genome per generation (Kimura, 1979). Functional portions of the genome evolve slower than the functionally non-vital portions of the genome (Kimura, 1979, 1991) as a result of a lower rate of fixation of non-synonymous mutations which are generally deleterious. Positive selection can escalate the rate of evolution in protein cod- ing genes. Purifying selection can decrease the rate of evolution by removal of non-advantageous mutations (Charlesworth, Charlesworth and Morgan, 1995; Larracuente et al., 2008). The proportion of amino acid substitutions (adaptive substitutions) that are fixed in a population due to positive selection is predicted to be higher in pop- ulations or species with larger effective population size (Gillespie, 2001; Eyre- Walker and Keightley, 2007; Leffler et al., 2012). This was shown to be cor- rect in flies ( large effective population size), bacteria (large population size) and humans (small effective population size) (Charlesworth and Eyre-Walker, 2006; Eyre-Walker, 2006; Eyre-Walker and Keightley, 2007). While some species such as yeast, maize showed limited adaptive substitutions even

22 though they have a higher effective population sizes (Liti et al., 2009). The reasons for the variation in the rate of adaptive substitutions among taxa is not clearly understood (Strasburg et al., 2011). With the incorporation of genomic data we are now able to use a combina- tion of data from DNA and RNA sequencing to understand selection in genes. We can measure the synonymous and non-synonymous substitutions and pol- ymorphisms using computational tools and understand the effects of different selection regimes in more detail (Li, Wu and Luo, 1985; Yang, 2007). We can test if ratios of non-synonymous to synonymous substitutions (dN/dS) differ significantly from ratios of non-synonymous to synonymous polymorphisms (pN/pS) using the McDonald–Kreitman test (McDonald and Kreitman, 1991). Direction of selection can be obtained by the degree of difference between pN/pS and dN/dS (Stoletzki and Eyre-Walker, 2011). Proportion of adaptive changes going to fixation across the genome can be calculated using the fixed substitutions and the segregating polymorphisms at synonymous and non-syn- onymous sites. The metric is referred to as α (Fay, Wyckoff and Wu, 2001; Obbard et al., 2009; Gossmann, Keightley and Eyre-Walker, 2012).

1.5 Gene expression differences The central dogma of biology explains how transcription of DNA results in RNA and that translation of RNA results in that are central to life of the organism (Crick, 1970). The amount of specific protein produced in a tis- sue depends on the levels of expression of the gene encoding the protein and the post-transcriptional regulation processes leading to specific amounts of mature functional protein (Brockmann et al., 2007; Schwanhäusser et al., 2011; Marguerat et al., 2012; Liu, Beyer and Aebersold, 2016). The process of an organism changing its phenotype to cope with environment without changing its genotype is referred to as phenotypic plasticity (Schlichting, 1986; West-Eberhard, 1989; Scheiner, 1993). Environmental activity do not change the structure or the sequence of the protein coding gene directly but it changes the amount of protein produced and that in turn affects the phenotype (Carroll, 2008). Various studies have found gene expression changes in the organism to be responsible for phenotypic plasticity (Meier et al., 2014; Morris et al., 2014). The best known example of phenotypic plasticity is the eye size variation of the seasonal morphs of the species Bicyclus anynana, which was associated to the differential expression of a few developmental genes (Macias-Munoz et al., 2016). Hence, different phenotypes can be driven by the gene expression differences triggered by the environmental factors (Brand and Perrimon, 1993). Several studies have already established a major effect of gene expression on the process of speciation (Wittkopp, Haerum and Clark, 2008; Tirosh et al., 2009) and there are multiple examples of where gene expression variation

23 underlies relevant adaptive traits in natural populations (López-Maury, Marguerat and Bähler, 2008). Adaptive traits such as response to environmen- tal stress in yeast (Chen et al., 2003; Enjalbert et al., 2006), Arabidopsis tha- liana (Ma and Bohnert, 2007), Drosophila melanogaster (Girardot, Monnier and Tricoire, 2004) and human cells (Murray et al., 2004) and pigmentation in mice (Steiner, Weber and Hoekstra, 2007) and Drosophila biarmipes (Gompel et al., 2005) are some of the examples. Populations that are geo- graphically separated undergo adaptive changes driven by environmental pressure. Extreme phenotypes generated by these gene expression changes can cause hybrid breakdown and sterility (Burke and Arnold, 2001; Rogers and Bernatchez, 2006). These kind of extreme phenotypes can form slowly with time triggered by the action of genes (Rieseberg, Archer and Wayne, 1999). The rate of evolution has been shown to be slower in highly than in lowly expressed genes (Pal, Papp and Hurst, 2001; Drummond, Raval and Wilke, 2005; Larracuente et al., 2008) as a consequence of more efficient purifying selection in genes that are widely (in many/multiple tissues) expressed. Many candidate genes responsible for the phenotypic differences have been identi- fied by analysing the gene expression in model organisms. Phenotypes like castes in ants (Wurm, Wang and Keller, 2010; Ometto et al., 2011), condition dependent sexual dimorphism in Drosophila (Wyman, Agrawal and Rowe, 2010), sex-specific phenotypes in Drosophila (Ranz et al., 2003) are a few that have been identified using gene expression differences. With the increas- ing efficiency of technologies for expression analysis, these phenotypes can be linked to the genes in non-model organisms.

1.6 Cryptic species Cryptic species are visually similar but reproductively isolated from one an- other due to genetic differences. Several cryptic species have been identified and described due to the advancement of molecular techniques (Bickford et al., 2007). As morphological classification system has its own significant lim- itations, biologists started using DNA sequences to identify monophyletic groups. The discovery of mitochondrial gene cytochrome c oxidase I (COI) has given biologists the power to easily identify new species hidden as cryptic units within the morphological classification system (Brown, Emberson and Paterson, 1999; Trewick, 2000; Vincent, Vian and Carlotti, 2000; Hebert et al., 2003; Dincă et al., 2011). Although the DNA barcoding has provided power to identify species beyond morphological data, it alone is not sufficient to identify hidden cryptic species (Schlick-Steiner et al., 2007). A combina- tion of DNA barcoding, morphological and ecological data is generally re- quired to appropriately classify species (Beheregaray and Caccone, 2007). Several insect species have been identified to have been reproductively iso- lated as they have differences in their genital morphology (Dufour, 1844;

24 Masly, 2012). The traditional lock and key hypothesis describes how differ- ences in genital structure can restrict species from hybridizing (Eberhard, 1992). Reproductive isolation by genital morphology differences are hypoth- esised to function in two ways, one is the reproductive isolation due to the structural differences that cause mechanical incompatibilities between popu- lations with distinct genital morphologies (Sota and Kubota, 1998). The other is the sensory lock-and-key model where populations first evolved to have different genital morphology and then evolved to have sensory and behav- ioural changes that resulted in premating reproductive isolation (Eberhard, 1992; Masly, 2012).

1.7 Diapause To cope with variation in environmental conditions like temperature and light conditions, insects in the temperate regions go through a phase of restricted development called diapause (Soediono, 1989). Over the cold winters these stages of delayed development increases their tolerance to stress. This adap- tive process occurs during a time in the life cycle such that it prevents the species from while in extreme environments (Brady, 1976). The identification of genes and pathways responsible for the traits that cause this kind of adaptive plasticity can provide better understanding of speciation in organisms with extreme phenotypes. The wood white (Leptidea sinapis) is a butterfly species that is split in to multiple geographically separated ecotypes with different habitat preferences to the environment (Friberg et al., 2008; Friberg and Wiklund, 2010). L. sinapis in the cold north Europe enter diapause at the pupa stage after just one generation (univoltine) while L. sinapis in warm southern Europe enter diapause after several generations (multivoltine) every season (Friberg et al., 2008).

25 2. Methods

2.1 Study systems Speciation and genetics of adaptation has traditionally been studied in labora- tory model organisms like mouse, drosophila and zebra fish due to various appealing factors of such organisms, for example short generation times, ac- cessible genomic resources, easy maintenance in lab environments and tradi- tion (Hedges, 2002). However, although this definitely helps us understand evolutionary processes in general, the model organisms naturally contribute to just a tiny fraction of the overall biodiversity occupying the planet. In recent years, researches have started focusing more on organisms with interesting genomic features and non-model organisms with interesting phenotypic traits (Hedges, 2002). With the increasing technologies in next generation sequenc- ing, we can now start looking at natural populations to understand different evolutionary processes. Gene expression profiling, for example, has led evo- lutionary biologists to understand gene expression differences in populations and pin-point genes responsible for adaptation to environmental factors (Ekblom and Galindo, 2011). Processes responsible for diversification of pop- ulations into new species can be investigated by studying recently diverged species at the genomic level (Seehausen et al., 2014; Wolf and Ellegren, 2017). In addition, genomic comparisons of species across the tree of life are useful for identification of conserved genes and non-coding elements that can inform on relative selection pressures and functional roles and for tracking down genes under natural selection in different lineages (Hedges and Kumar, 2002). In this thesis, I have utilized two different study systems to answer different questions related to speciation processes. These are described in de- tail below.

2.1.1 The wood white butterflies The order Lepidoptera includes butterflies and moths and spans about 160,000 species spread across the globe (Van Nieukerken et al., 2011). The immense diversity and easy accessibility of natural populations in the wild makes but- terflies and moths an ideal model group to study speciation and evolution (Šíchová et al., 2015). The most famous model organisms in Lepidoptera in- clude Heliconius butterflies that had several evolutionary studies focus on its

26 wing pattern driven speciation, speciation with gene flow and intro- gression (Jiggins et al., 1996, 2001; Consortium., 2012; Martin et al., 2013; Nadeau et al., 2014), The Monarch butterfly (Danaus plexippus) known for its long distance migration across North America (Zhan et al., 2014) and The Swallowtail butterflies of the genus Papilio which are known for their mim- icry patterns (Scriber, Hagen and Lederhouse, 1996; Iijima et al., 2018). Due to the holocentric (absence of a distinct centromere) nature of the lepidopteran chromosomes, evolution is expected to happen via fissions and fu- sions (Suomalainen, 1969). But a surprising amount of chromosomal synteny has been observed between chromosome level assemblies of other lepidop- teran genomes to the reference genome assembly of Bombyx mori (Pringle et al., 2007; Beldade et al., 2009; Yasukochi et al., 2009). In the order Lepidop- tera recombination has been observed to be absent in females in general due to achiasmatic meiosis (Suomalainen, Cook and Turner, 1973; Turner and Sheppard, 1975). Although the genomic resources are growing in Lepidop- tera, majority of the known knowledge is from well-known model organisms like the Post man butterfly (Heliconius melpomene), the Silk moth (Bombyx mori) and the Monarch butterfly (Danaus plexippus). Wood white butterflies are commonly distributed across Eurasia. These butterflies were in the spotlight when a new species, (Reissin- ger, 1989), was discovered to be hidden within the initially described species, Leptidea sinapis (Linnaeus, 1758). The two species were distinguished by the differences in their genital morphology (Réal, 1988; Lorkovic, 1993). Initially it was reported that L. sinapis and L. reali happened to be occurring in sym- patry across Europe and that they could be distinguished by genital morphol- ogy (Lorkovic, 1993; Martin, Gilles and Descimon, 2003; Fumi, 2008), DNA information (Martin, Gilles and Descimon, 2003) and slight variation in pupal colouration (Friberg and Wiklund, 2007). This cryptic complex was again in focus when a third wood white species, L. juvernica (Dincă, 2011), was found recently, showing a distinct mitochondrial DNA haplotype and different kar- yotype set-up as compared to L. reali (Dincă et al., 2011). All species show some variation in chromosome numbers across populations. Within one spe- cies, L. sinapis, this variation is extreme. The karyotype variation in L. sinapis occurs in a cline ranging from the largest number of chromosomes in SW Eu- rope (2n = 106) to only about half as many (2n = 56) in NW Europe and central (Dincă et al., 2011; Lukhtanov et al., 2011). Across the cline the chro- mosome number decreases gradually from SW to N and E so that populations inhabiting geographic regions between these extremes have gradually de- creasing chromosome counts. The chromosome changes have resulted from fissions and fusions as verified from chromosome staining experiments show- ing that populations with a large number of chromosomes have smaller chro- mosomes. Ecological and behavioural experiments have indicated that L. sin- apis, L. reali and L. juvernica are reproductively isolated. This is predomi- nantly a result of female mate choice with no successful mating observed in

27 trials of heterospecific courtship. Geographically isolated populations of L. sinapis and L. juvernica are not reproductively isolated (Dincǎ et al., 2013) but the most extreme karyotype ecotypes in L. sinapis show evidence of con- siderable hybrid break-down. Hence, this system consists of three recently di- verged species with divergence in ecological and behavioural traits and vari- ation in karyotype set-up. In addition, partial reproductive isolation exist be- tween ecotypes within L. sinapis. This provides a range of divergences from partial to complete isolation and can therefore be used to study the causes and consequences of species divergence.

2.1.2 The common chiffchaff The Old World leaf warblers (Phylloscopus) is a group of bird species that show morphological similarities in plumage colour (del Hoyo et al., 2006) but considerable differences in behaviour, migration patterns, vocalizations and habitat preference (Bensch et al., 2009). Within Phylloscopus, the chiffchaff super-species system consists of a set of species and subspecies that are widely distributed across the Palearctic region. Chiffchaff occur with four rather dis- tinct species and multiple subspecies. P. canariensis is restricted to the . P. ibericus is breeding in the and P. sindianus in the Mountains (subspecies P. s. lorenzii) to (subspecies P. s. sindianus). The common chiffchaff (P. collybita), my focal study system, has traditionally been divided into five subspecies but a revision of species status was recently made, separating the Siberian chiffchaff (P. tristis) from all other common chiffchaff (P. collybita) (del Hoyo et al., 2006). P. tristis (from here on tristis) is distributed from the Ural mountains to eastern and migrates to winter in south central and south eastern Asia (Stepanyan, 1990; Svensson, 1992; del Hoyo et al., 2006). P. c. abietinus (from here on abietinus) has a breeding distribution range that spans north-eastern Europe to the Ural mountains and migrates to wintering quarters in eastern and central (Stepanyan, 1990; Hoyo et al., 2006). The two species abietinus and tristis have different habitat preferences and slight differences in size and plumage coloration - abietinus with more greenish yellow plumage and larger body size and tristis with essentially no yellow in the plumage and smaller body size (Marova and Alekseev, 2008; Marova et al., 2009, 2013). These two sister species also have different vocalisations, while abietinus sings a slower song with a clear rhythm, tristis sings a more diverse, erratic and faster song (Ticehurst, 1938; Marova et al., 2009). Previously, it was thought that a dis- tinct chiffchaff taxon (P. fulvescens) with intermediate plumage characters and song occurred in a range between tristis and abietinus, essentially in the area of the Ural Mountains (Helbig et al., 1996). However, based on mtDNA and nuclear data, it was later proven that this was a hybrid zone between abi- etinus and tristis (Marova et al., 2013; Shipilina et al., 2017). The hybrid zone is most likely a secondary contact zone formed due to the re-colonization from

28 refugia inhabited during the last glacial maximum (Hewitt, 1999, 2001). The morphological and acoustic differences are more pronounced in allopatric zones than they are in the sympatric zone (Marova et al., 2009; Shipilina et al., 2017). Although birds with distinct abietinus and tristis characters appear in the sympatric zone, the majority of individuals appear to have intermediate characteristics and sing a mixed song (Lindholm, 2008; Marova and Alekseev, 2008; Shipilina et al., 2017). In the sympatric zone, individuals are also iden- tified to contain a genetic composition of both abietinus and tristis alleles (Ticehurst, 1938; Marova et al., 2009; Shipilina et al., 2017). In addition, birds that appear to be pure abietinus or tristis based on morphology, may contain a significant proportion of foreign alleles, indicating that species identification based on morphology can be dubious (Shipilina et al., 2017). The hybrid zone between abietinus and tristis provides an opportunity to investigate introgres- sion patterns between partly reproductively isolated lineages. By investigating genomic differentiation in allopatry and sympatry there is a chance to identify barriers to gene-flow between these intensively hybridizing lineages and in- vestigate the functional basis of genes located in regions where no introgres- sion occurs.

2.2 Genome sequencing and assembly A DNA molecule can be sequenced to obtain the order of the nucleotides ar- ranged in it (Shendure and Ji, 2008). DNA sequencing until the late 2000’s was mostly based on Sanger’s method of sequencing (Sanger et al., 1977). This method was for example used for the first successful human genome as- sembly, but the cost for generating this assembly was very high (Eaton, 2003; Margulies et al., 2005). In the past decade, much more efficient technologies have made the genome sequencing costs drop (Shendure and Ji, 2008) and generating genomic data is now achievable for almost any organism of interest (Ellegren, 2014). In my thesis, I applied short-read sequencing using Illumina technology (Bentley et al., 2008). The fragments of DNA that are sequenced are generally in the range of 100-150 bp and generally referred to as sequenc- ing reads (Bentley et al., 2008; Ekblom and Wolf, 2014). Libraries of DNA fragments are prepared from the DNA sample obtained from the organism with adapters ligated to the ends of the fragment (van Dijk, Jaszczyszyn and Thermes, 2014). The number of reads that are sequenced for each base in the target DNA is referred to as coverage or depth. Millions of reads are se- quenced and based on sequence similarity between reads and estimated dis- tance between read-pairs in paired-end and mate-pair libraries, the reads can be put together to obtain the genome assembly of the target species. This pro- cess is called de novo whole genome assembly (Sohn and Nam, 2018). The final quality of the de novo genome assembly is generally dependent on the average coverage across the genome and the length of each individual read

29 (Alkan, Sajjadian and Eichler, 2010). The longer the read length, the more efficient the identification of similarities between reads, and the assembly con- tiguity can be improved with longer reads. Various algorithms and computa- tional programs are available for the de novo genome assembly process and these are dependent on the combination of library types and the read length (Sohn and Nam, 2018). The final output of the de novo genome assembly pro- cess is a set of longer sequences that contain the order of the nucleotides pre- sent in the genome, these long sequences are called contigs (Zhang et al., 1994). The contigs can be merged by using distance information from long- insert libraries or very long reads (Zerbino and Birney, 2008). This process is referred to as scaffolding and results in longer fragments containing multiple contigs (where sequence information is available for each base) and unknown nucleotides (usually designated with N:s) that represent inferred gaps in the scaffolds where no read information is available (Luo et al., 2012). The length of single DNA sequencing read that can be sequenced has been a challenging factor for scientists (Ekblom and Wolf, 2014). Transposable elements and other repeats proliferating in the genome are also challenging to assemble since the high sequence similarity of reads from different repeats can generate erroneous inference of sequence homology – this often results in collapsing of reads from different repeats into a single contig with extremely high coverage and a high level of nucleotide inconsistencies (Quail et al., 2012). For exam- ple, over 50% of the human genome comprises of the repeat elements like LINEs, SINEs and LTRs (de Koning et al., 2011). Whole genomes of organ- isms with large repeat content are therefore harder to assemble (Sohn and Nam, 2018). Although the de novo genome assembly can give us more infor- mation about structural variation in the genome, it is possible to assemble the genome of a species using the information from a close relative (Kim et al., 2013). Such reference assisted genome assemblies use a genome assembly from a related species and uses sequence similarity search algorithms to iden- tify homologous regions between the assembly and the sequence reads from the target species (Kolmogorov et al., 2014). Reliable reference assisted ge- nome assemblies can only be efficient in clades with highly conserved genome architecture, like birds (Griffin et al., 2007). To assess assembly quality, different assembly statistics are generally ap- plied. N50 is a common metric used to assess the quality of a genome assem- bly. The N50 statistic is defined as the length of the shortest contig or scaffold included in the genome assembly that covers 50 percent of the total assembly length (the contigs or scaffolds in the genome assembly should be sorted de- scending by length) (Earl et al., 2011). The quality of the assembled genome can also be examined by other factors like average coverage across the ge- nome, comparing observed and expected GC content, tallying the number of conserved genes present in the genome and the total number of scaffolds. Pro- grammes like BUSCO (Benchmarking Universal Single Copy Othologs) and

30 CEGMA (Core Eukaryotic genes Mapping Approaches) can check for the to- tal number of conserved eukaryotic genes in the sequences which can be used to assess the quality (completeness) of assembled genome (Parra, Bradnam and Korf, 2007; Simão et al., 2015). We used ALLPATHS-LG (Butler et al., 2008) to de novo assemble the genome of L. sinapis where we sequenced reads for an average coverage of over 200X for a single inbred (five generations of full-sib mating) male individual. A high quality genome was assembled with a total length of 643 mega bases (Mb), in 7090 scaffolds. It had a N50 length of 857 kilo bases (Kb) and the longest scaffold was 6.9 Mb. The genome as- sembly was one of the best butterfly genomes assemblies with respective to the longest scaffold, total number of scaffolds and N50 value at the time of the assembly (June 2015). A reference assisted genome assembly of chiffchaff (Phylloscopus collybita abietinus) was assembled using the genome assembly of collared flycatcher (Ficedula albicollis) (Ellegren et al., 2012). The chiff- chaff genome assembly was assembled by reads covering about 89% of the reference genome of F. albicollis.

2.3 Population re-sequencing With the emerging cost effective technologies for genome sequencing, biolo- gists studying population wide are moving towards so called next generation sequencing (NGS) methods (Schuster, 2008). In the early 2000s biologists started looking at the genome-wide patterns of nucleotide variation to understand trait variation in natural populations (Black IV et al., 2001; Luikart et al., 2003; Ellegren, 2014). Sequencing population samples of multiple individuals to obtain information about polymorphisms and other structural variation within the population is referred to as population re-se- quencing. Sequencing reads from the re-sequenced individuals can be aligned to the reference genome of the species to identify variants that are segregating in the population and estimate allele frequencies. Computer programs like BWA (Li and Durbin, 2009), Bowtie (Langmead et al., 2009) and Stampy (Lunter and Goodson, 2011) are traditionally used to align the reads to the reference genome. Tools like VCFtools (Danecek et al., 2011), GATK (McKenna et al., 2010) and BayeScan (Foll and Gaggiotti, 2008) provide re- sources to identify SNPs and indels (insertions and deletions) from the se- quencing data. Programs like ANGSD (Analysis of Next Generation Sequenc- ing Data)(Korneliussen, Albrechtsen and Nielsen, 2014) provide resources to make evolutionary inferences from the data. Sequencing errors in different sequencing technologies are a major limitation to the variant discovery proce- dure. This can be bypassed by sequencing a sample to a higher average cov- erage (Chen et al., 2017). We sampled 10 individuals from each of six populations (L. sinapis: Swe- den, and , L. reali: Spain, L. juvernica: Ireland, Kazakhstan)

31 spanning all three species of the Leptidea cryptic complex. These populations were aligned to the reference genome assembly of L. sinapis using the soft- ware BWA (Li and Durbin, 2009). Common variants were discovered using VCFtools (Danecek et al., 2011), GATK (McKenna et al., 2010) and BayeScan (Foll and Gaggiotti, 2008) were used in VQSR filtering (Variant Quality Score Recalibration) in GATK (McKenna et al., 2010) to obtain a high confidence set of variants. To avoid sequencing errors that were not de- tected, we only considered SNPs that were covered in all the individuals for downstream analysis. We sampled 10 individuals from both sympatric and allopatric populations of both abietinus and tristis. These were also mapped to the reference assisted genome assembly of abietinus using BWA (Li and Durbin, 2009). Variant discovery was done using the best practices specified in the software manual of GATK (McKenna et al., 2010). To avoid sequencing errors that were not detected, we only considered SNPs that were covered in at least seven indi- viduals in each population for downstream analysis.

2.4 Population genetics Various molecular mechanisms and evolutionary factors like mutation, re- combination, genetic drift, natural selection, migration and hybridisation play a role in shaping the genome-wide allele frequencies of populations. The ge- nome-wide patterns of polymorphisms in populations can also inform on the demographic history (Locke et al., 2011), and the effects of positive and neg- ative selection (McDonald and Kreitman, 1991) in the population. Various different inferences can be made about the population and speciation in gen- eral when we look at variation segregating in a population. We used the poly- morphism data to estimate the following population genetic summary statis- tics (described below in detail).

2.4.1 Inference of demographic history Genome-wide patterns of genetic diversity and differentiation of a population or species are effected by the evolutionary history and fluctuation of popula- tion size in the past (Charlesworth, 2009). During the periods of glaciation many species in the northern hemisphere has taken refugia in the south (Arenas et al., 2011). After the ice caps melted during the interglacial period, large areas were recolonized by some of the species taking refugia and ex- panded their geographical distribution (Hewitt, 1999, 2000, 2001). Fluctua- tions in the effective population size through time in the past can be estimated using methods like the Pairwise Sequentially Markovian Coalescent model (PSMC) (Li and Durbin, 2011). This method uses the information from the genome-wide polymorphisms from a single diploid individual to obtain

32 changes in the historical population size given a certain mutation and recom- bination rate (Li and Durbin, 2011). The programme estimates TMRCA (time to most recent common ancestor) using the density of heterozygous positions and homozygous positions. It should be noted that PSMC and similar methods are sensitive to input parameter settings and multiple runs with different sets of parameters is generally recommended to achieve robust results regarding changes in population size over time. Although exact estimates are almost cer- tainly out of reach, trends in relative population size can be obtained and com- parisons across species and populations with similar rates of molecular pro- cesses are probably relatively robust (Nadachowska-Brzyska et al., 2016). In the population genetics study with the wood white butterflies we used the settings recommended in Li and Durbin, 2011 for recombination / muta- tion ratios ranging from 0.1 – 5.0 and average generation time was set to one to calculate change in the population size.

2.4.2 Detecting population structure Genome-wide single nucleotide polymorphism data can be used to investigate the presence of population structure between populations or species (Pritchard, Stephens and Donnelly, 2000). Tools like EIGENSOFT (Price et al., 2006; Galinsky et al., 2016) and SNPRelate (Zheng et al., 2012) use the genome-wide single nucleotide polymorphisms to generate a matrix of vari- ants to cluster genetically similar individuals together. This can inform on the number of distinct genetic groups present in a dataset, the level of genetic structure between clusters and identify hybrids and backcrosses with shared genetic components from two groups. Tools like STRUCTURE (Pritchard, Stephens and Donnelly, 2000) and ADMIXTURE (Alexander, Novembre and Lange, 2009) uses genome-wide genotype data to cluster samples together to show proportions of admixture between groups in a dataset. In population genetic studies of the wood white butterflies and the chiff- chaff study system we applied PCA using SNPRelate (Zheng et al., 2012) and checked for population structure using ADMIXTURE (Alexander, Novembre and Lange, 2009) in population genetic study of the wood white butterflies and LEA (Landscape and Ecological Association Studies) in population ge- netic study of the chiffchaff study system (Frichot et al., 2014).

2.4.3 Introgression analysis Detecting hybridization and introgression can direct us towards the regions of the genome that are harbouring loci under restricted gene flow or loci under gene flow. This can for example be done using the four taxon test using shared polymorphisms called the ABBA-BABA test (Kulathinal, Stevison and Noor, 2009; Green et al., 2010; Durand et al., 2011). In the ABBA-BABA test the ‘A’ represents the ancestral state and ‘B’ represents the derived state of the

33 alleles. The test is based on the expectation that equal number of ABBA sites and BABA sites are present at a locus in the four taxon system (Martin, Davey and Jiggins, 2015). Imagine a hypothetical test with four recently diverged taxa G1, G2, G3 and O. ‘O’ is the outgroup to rest of the three taxa and G3 is an outgroup to G1 and G2. A position is said to be as ABBA when nucleotide at that position in G2 will be the same as the nucleotide in G3 and nucleotide in G1 will be the same as in O. An ABBA position specifies that G2 and G3 shares the same nucleotide at that locus. A position is said to be in BABA when nucleotide at that position in G2 will be the same as O and nucleotide at G1 will be the same as G3. A BABA site specifies that G1 and G3 shares the same nucleotide. Under this test G2 and G3 will be identified to be exchanging genetic material if there are an access of ABBA sites and G1 and G3 will be identified to be exchanging genetic material if these are an access BABA sites. The D statistics is calculated as the ratio of the sum and difference of the total number of ABBA positions and BABA positions (D= (nABBA- nBABA)/(nABBA+nBABA); Green et al., 2010).

2.4.4 Genome wide patterns of diversity and differentiation Patterns that occur across the genome are shaped due to forces acting on the segregating genetic variation in populations sampled (Charlesworth, 1998, 2009). The patterns of segregating genetic variation within populations and differences between populations can be used to infer the evolutionary forces that are driving divergence between the populations and to guide to assessing the traits that are important for divergence (Wolf and Ellegren, 2017). Looking at genome-wide patterns of differentiation can point us to loci that are open to gene flow from other species and to loci with restricted gene flow (Turner, Hahn and Nuzhdin, 2005; Nosil, Funk and Ortiz-Barrientos, 2009). The ad- vantage of having genome-wide data is access to all loci across the genome where we can compare the relative effects of random genetic drift and differ- ent types of natural selection (Wolf and Ellegren, 2017). Fixed nucleotide dif- ferences (sometimes within the protein coding genes) can be identified be- tween reproductively isolated species and geographically isolated popula- tions. Genes that are identified as potentially affected by selection can be stud- ied in depth to infer functions related to local adaptation and species divergence (Turner, Hahn and Nuzhdin, 2005; Nosil, Funk and Ortiz- Barrientos, 2009).

Summary statistics like ϴπ (nucleotide diversity, the average number of pair- wise nucleotide differences between two randomly sampled chromosomes in a population; Nei, 1973), FST (the genetic differentiation between populations; Wright, 1931, 1950), DXY (absolute divergence between populations, the av- erage number of nucleotide differences between a pair of chromosomes sam- pled, one in each respective population or species; (Nei and Li, 1979; Nei,

34 1987)) are some of the widely used statistics in genome-wide studies. FST is the measure of allele frequency differences between two populations and has been extensively used in genome-wide differentiation studies (Wright, 1931, 1950). Factors like selection, restricted gene flow and migration have an effect on FST at the locus. The loci with extremely high FST is metaphorically referred to as “islands of genetic differentiation” (Turner, Hahn and Nuzhdin, 2005; Cruickshank and Hahn, 2014). Initially these regions were believed to harbour genes responsible for reproductive isolation (Turner, Hahn and Nuzhdin, 2005). Upon reanalysing, these regions were identified to occur through pro- cesses like ancestral linked selection and reduced recombination in recently diverged species (Cruickshank and Hahn, 2014; Burri, 2017). DXY is the abso- lute divergence between populations and identifies the number of nucleotide differences between populations (Nei, 1987). ϴπ and DXY are effected differ- ently by linage sorting relative to FST. DXY is a measured independent to the diversity levels in the populations in focus (Nei, 1987). Interpreting genomic differentiation with respective to DXY, FST and ϴπ are key in understanding genome-wide patterns of differentiation.

2.4.5 Selection tests

The ratio of non-synonymous to synonymous polymorphisms (pN/pS) and the ratio of non-synonymous to synonymous substitutions (dN/dS ) can be used to understand the processes affecting evolution of genes. Under neutral condi- tions, pN/pS and dN/dS are expected to be equal (This means the amount of amino acid changes are the same as the polymorphism; McDonald and Kreitman, 1991) while the ratios are not expected to be equal in the presence of positive or negative selection. This was observed in natural populations for the first time by McDonald and Kreitman in 1991 (McDonald and Kreitman, 1991). The McDonald-Kreitman test has since then been a classical test to assess presence of selection. As deleterious mutations are rarely fixed in a populations they do not contribute to dN/dS , but can stills segregate (at low frequency) and therefore contribute to a higher pN/pS ratio (Smith and Eyre- Walker, 2002; Bierne and Eyre-Walker, 2004). Genomic regions with lower pN/pS indicates that negative selection removes the deleterious mutations from the population (Murray et al., 2017). A derivative of the classical McDonald- Kreitman test is the test for Direction of Selection (DOS) which is a statistical measure that gives an estimate of selective constraints or adaptive processes acting on the genes (Stoletzki and Eyre-Walker, 2011). Positively selected genes are expected to have a significantly higher dN/dS - than pN/pS-ratio and a positive DOS value (Stoletzki and Eyre-Walker, 2011). Positively selected genes are extracted using the McDonald-Kreitman test and the DOS value (Stoletzki and Eyre-Walker, 2011) in each of three species of the wood white butterflies.

35 2.5 Gene annotation Annotation is a computational procedure of fetching out information from the genome such as genes, CDS (coding sequence), mRNA (messenger RNA), 5’UTR (upstream untranslated region), 3’UTR (downstream untranslated re- gion) and other sequence classes like TEs (transposable elements). There are several ways of building a gene annotation for a genome sequence. When gene annotation information is available from a (close) relative to the focal species, the CDS can be identified using alignments between the two species. This strategy is one of the most cost effective protocols, given that the annotation information resources are already available. An alternative method is ab initio prediction of genes across the scaffolds using computational algorithms. Pro- grammes like AUGUSTUS (Stanke and Waack, 2003) and GENSCAN (Burge and Karlin, 1997) use gene models of already available coding se- quences to identify genes in a reference genome sequence. As ab initio pre- diction accuracies vary between programs it is always used in combination with RNA sequencing data to train the ab initio prediction (Wang, Chen and Li, 2004). The most accurate way of building a gene annotation is usually a combination on RNA-seq data and ab initio predictions from previously avail- able gene models and using alignments to databases (e.g. Swiss-Prot; Bairoch and Apweiler, 2000) to identify potential functionalities of the genes (Mudge and Harrow, 2016).

2.6 RNA sequencing and gene expression RNA-seq data generated using high throughput sequencing can be used for detecting differentially expressed genes (Mortazavi et al., 2008). Identifica- tion of differentially expressed genes from transcriptome assemblies involve three steps. One, RNA obtained from extraction protocols are fragmented and converted into cDNA sequences (Complementary DNA) and sequenced using high throughput sequencing technologies (Mortazavi et al., 2008). Two, se- quencing reads obtained from sequencing are mapped to a transcriptome as- sembly using mapping algorithms such as STAR (Dobin et al., 2013), TopHat (Trapnell, Pachter and Salzberg, 2009). The transcriptome assembly can either be assembled de novo from the produced RNA sequencing reads of the focal organism using an assembly tool (e.g. TRINITY; Grabherr et al., 2011) or a previously available assembly from a related species can be used to map the reads against (STAR; Dobin et al., 2013, TopHat; Trapnell et al., 2009). It is usually recommended to de novo assemble the transcriptomes even if the ref- erence transcriptome is available for mapping as de novo assemblies can iden- tify transcripts that are not present in the reference transcriptome (Martin and Wang, 2011). Three, eradication of confounding effects has to be done to get the unbiased final read counts (expression levels) for each gene to identify

36 differentially expressed genes between groups of interest (Costa-Silva, Domingues and Lopes, 2017). A Gene Ontology analysis (GO) analysis can be done to understand the functions and regulatory pathways the genes are involved in (Ashburner et al., 2000). Phenotypes can be controlled far before the organism reaches adulthood (Mank et al., 2010). Depending on the phenotypes of interest, controlled ex- periments can be conducted to understand the genes underlying the phenotypic changes. We used L. sinapis larvae raised in different light treatments (LD: 22 hours day light and 2 hours darkness, SD: 12 hours day light and 12 hours darkness) to understand genetic basis of diapause regulation. We sequence the RNA of larvae harvested at instar-III, instar-V, one day after pupation in SD and LD treatments. Butterflies that developed directly into adults were also sequenced. The sequencing was performed using Illumina HiSeq 2500 plat- form to obtain an average read length of 126 bp. The transcriptomes of all samples were assembled using TRINITY (Grabherr et al., 2011). Differential gene expression analysis was conducted using DESEQ2 (Love, Huber and Anders, 2014) on R/BIOCONDUCTOR. Batch effects were later removed for technical errors, family and sex using SVA (Leek et al., 2012).

37 3. Research aims

3.1 General aims The general theme of my thesis is to understand the genetic basis of population differentiation and speciation. I have examined the Leptidea cryptic complex and the chiffchaff study system at the genomic level to understand the pro- cesses underlying species divergence. The two study systems provide oppor- tunities to explore different aspects of the divergence process. In the wood whites, the system consists of lineages at different stages of divergence, from completely reproductively isolated taxa (L. juvernica on the one hand versus L. sinapis and L.reali on the other) to partly isolated population pairs (L. sin- apis ecotypes that also differ in chromosome numbers). At the intermediate level, there are hybridizing species with incomplete pre-zygotic barriers where hybrid fertility seems to be extremely low since no back-crossing has been observed (L. sinapis versus L. reali). I was particularly interested in under- standing the speciation history of these largely sympatric species and investi- gate the origin of the formation of the unique chromosome number cline in L. sinapis. A main aim was to characterize the factors shaping genome wide pat- terns of genetic variation (ϴπ), differentiation (FST) and absolute divergence (DXY) in the recently diverged species and to identify targets of divergent se- lection that could inform on potential ecological effects on the divergence pro- cess. In addition, since chromosome changes can induce reproductive isola- tion, I wanted to quantify the effects of karyotypic change on the divergence in general. Finally, since adaptation to climatic conditions may be a potent force underlying divergence processes, I aimed at investigating the genetic basis of a key trait for adaptation in butterflies, the propensity to enter diapause during unfavourable climatic conditions. The chiffchaff study system provides opportunity to investigate the ge- nomic consequences of recurrent hybridization and back-crossing. I first wanted to describe if the phenotypic differences between species are the cor- rect predictors to differentiate between abietinus and tristis in relation to their genetic components. I also wanted to identify the genetic basis of partial iso- lation in this study system. By taking advantage of the occurrence of both al- lopatric and sympatric populations, I wanted to assess how gene-flow affects patterns of genomic divergence and potentially identify barriers to introgres-

38 sion. The amount of gene flow between the three species in the Leptidea cryp- tic complex was previously unknown, I used genome wide polymorphisms to estimate the admixture proportions and gene flow between them.

3.2 Scientific aims Paper I – My primary goal here was to explain the coincidental discovery that the genome size in Leptidea was considerably larger than any other described butterfly genome. I also wanted to examine if the processes driving genome size expansion in Leptidea in general had continued in any (or all) lineage(s), resulting in genome size differences across species/populations within the cryptic complex – i.e. to investigate if all of the populations in the cryptic complex have the same genome size or not. In addition, I wanted to use diver- gence information to determine the rate at which the genome size expansion happened in Leptidea. By estimating expansion rates and combining rate in- ference with analysis of temporal variation in TE activity and global diversity estimates, one aim was to investigate if the genome-expansion has happened as a consequence of stochastic accumulation of fragments (neutral scenario) or if genome-expansion could be advantageous.

Paper II – In this chapter, I wanted to use genome-wide re-sequencing data to characterize the genomic landscape of differentiation between Leptidea species and ecotypes with distinct chromosome numbers. A major aim was to infer the speciation history by applying a combination of demographic infer- ence, tests for post-divergence gene-flow and assessment of genomic regions with high differentiation. I wanted to use a combination of population genetic summary statistics to discriminate between processes driving genomic diver- gence (underlying molecular mechanisms or diversifying selection). The pres- ence of multiple lineages at different stages of divergence also allowed me to assess if genomic divergence had involved the same regions in independent divergence processes – i.e. if there have been specific regions of the genome that have been important for the divergence process over and over again – a process generally referred to as parallelism. Finally, I wanted to get infor- mation about the functions of the genes located in potential differentiation peaks to understand the relevance of ecological and behavioural differences across species and ecotypes in the divergence process.

Paper III – The aims were to use whole genome re-sequencing data to i) quantify the genome-wide level of genetic differentiation between European (Phylloscopus collybita abietinus) and Siberian chiffchaff (P. tristis) in both sympatric and allopatric regions, and ii) use differentiation outliers and infor- mation about fixed, shared and private non-synonymous differences to try to

39 characterize the genetic basis for key phenotypic differences (plumage colour, song, habitat preference and migration direction) between the two species that might contribute to reproductive barriers. A secondary aim was to estimate the association between nuclear genome composition and phenotypic expression in hybrids and backcrosses from the hybrid zone to achieve information about species characterization based on external appearance.

Paper IV – In this study, my main aim was to combine the previously devel- oped genome assembly and transcriptomic data (Paper I, Paper II, Paper V) to generate annotation information. This data was used in combination with a traditional selection test using polymorphism and substitution frequencies (McDonald-Kreitmen test; McDonald and Kreitman, 1991) to detect genes under divergent selection between the three Leptidea species and between eco- types within L. sinapis and L. juvernica. I also used the annotation information to estimate differences in density of targets of natural selection and the popu- lation re-sequencing data was used to quantify regional recombination rates. This was performed in order to investigate the effects of different selection regimes (positive and negative selection) and the mutation rate on regional variation in genetic diversity. The results from the study were used to infer the functional categories of genes underlying genomic divergence between spe- cies and ecotypes and to quantify the relative importance of selective sweeps and back-ground selection in shaping the rate of reduction in genetic diversity in different genomic regions.

Paper V – This study was designed with the major aim to investigate the ge- netic basis of diapause induction – a key trait for local adaptation in many insect species where the response is known to be triggered by exposure to particular light regimes (day lengths) (Soediono, 1989). By controlled rearing of Swedish L. sinapis larvae in both long-day (LD; 22 hours light, 2 hours dark) and short-day (SD; 12 hours light, 12 hours dark) light treatments we forced the larvae reared in different treatments to develop directly (LD) or enter diapause (SD). We used an RNA-seq approach to assemble the transcrip- tomes of L. sinapis in different developmental stages and get expression data to quantify gene expression changes associated with diapause induction. Sec- ondary aims were to characterize the response in gene expression across on- togenetic stages to assess if diapause seems to be induced by a sudden switch mechanism or if the trait is rather determined by gradual change in specific molecules. I also wanted to investigate if there were any sex-specific effects of the light treatment. The major aim was thus to detect the genes and path- ways that could be responsible for the regulation of diapause in L. sinapis.

40 4. Paper summaries

Paper I – Rapid increase in genome size as a consequence of transposable element hyperactivity in wood white (Leptidea) butterflies: Published in Ge- nome Biology and Evolution (Talla et al., 2017).

The Eurasian wood white butterflies Leptidea sinapis, L. reali and L. juver- nica are classified as cryptic species as they are similar in external appearance and can be separated only by examining their genital morphology, behaviour, host plant preference and karyotype variation. Recent observations using flow cytometry have indicated that cell nucleus size differs considerably between species, with a larger nucleus size in L. juvernica compared to L. sinapis and L. reali. As no previous attempt to sequence the genome of any butterfly in the Leptidea cryptic complex was made, we sequenced and de novo assembled the nuclear and mitochondrial genomes of L. sinapis (Swedish population). The individual selected for sequencing was an inbred male individual from five generations of full sib mating. Two independent genome assembly pro- cedures revealed that the genome size of L. sinapis is approximately twice as big as previously published butterfly genomes. We also generated population re-sequencing data of 10 male individuals from six different populations; L. sinapis from Sweden, Kazakhstan and Spain, L. reali from Spain and L. ju- vernica from Kazakhstan and Ireland. We used a k-mer based method and estimated the average genome size of each populations and found that the ge- nome size of the Kazakhstan L. juvernica individuals was significantly larger than the genome size in all other individuals. Using the L. sinapis genome assembly together with the 12 publicly avail- able butterfly genomes at the time, we used repeat identification algorithms to classify transposable elements across the genomes of all available species. This was also done for each of the six Leptidea populations. We found that genome size correlated with the TE content across butterfly lineages and that TEs explained the genome size difference between L. juvernica from Kazakh- stan and other Leptidea populations. Both the expansion of genome size in Leptidea in general and in L. juvernica from Kazakhstan could be attributed to recent expansions of predominantly LINEs, DNA elements and unclassified TEs. To understand the rates at which these genome expansions happened in wood white butterflies, we reconstructed a phylogeny and estimated diver- gence times of the 15 butterfly species using 224 conserved core

41 nuclear genes. Previously available divergence time estimates for key nodes were used as priors. The divergence time between the Leptidea species com- plex and other butterflies was estimated to be 80.6 Million years (My) (95% CI 67.6-93.2 My), translating to a genome expansion rate of approxi- mately 4 Mb / My. We estimated the time to the most recent common ancestor (TMRCA) for the Leptidea complex to be approximately 3 My (95% CI 2.5- 3.6 My), considerably deeper than previous estimates based on mtDNA, trans- lating to an overall expansion rate of 72 Mb / My in L. juvernica from Ka- zakhstan after the split from Irish L. juvernica. Preliminary diversity estimates suggest that genome-wide genetic diversity has been low in Leptidea in gen- eral and in L. juvernica in particular. Hence, we interpret the rapid increase in genome-size as a consequence of low effective population sizes and lower efficiency of selection to remove novel insertions of (potentially slightly del- eterious) TEs than in other Lepidoptera lineages.

Paper II – Lack of parallelism: lineage-specific directional selection drives genome differentiation across a triplet of cryptic butterfly species: Manuscript in review.

Genome wide patterns of genetic diversity and differentiation can be used to understand the relative role of selection and genetic drift underlying diver- gence processes between recently diverged species. Scanning the genome of incipient species can also inform on functions of potential candidate genes underlying adaptive divergence. The Leptidea cryptic complex is well-known for harbouring extreme karyotype variation, with L. sinapis showing a chro- mosomal cline from Sweden (2n = 57, 58) and Kazakhstan (2n = 56 - 64) on one end of the spectrum to Spain (2n = 106, 108) on the other (Lukhtanov et al., unpublished). By using whole-genome re-sequencing information from six populations representing different levels of divergence and different levels of reproductive isolation, we estimated population genetic summary statistics across the genome to investigate the speciation history in general and to iden- tify potential differentiation outliers that could be relevant for the divergence processes. The combination of populations allowed us to investigate potential patterns of parallelism – if the same genomic regions seem to be differentiated in different population comparisons – indicating that the same set of genes has important in independent divergence processes. We also wanted to assess po- tential post-divergence gene-flow to have a chance to identify potential barri- ers to gene-flow. In short, I used population genetics summary statistics esti- mated from SNPs collected from the same populations that were used in paper I to understand 1) speciation and demographic history of these three species, 2) estimate the regional variation in genetic differentiation, absolute diver- gence and nucleotide diversity, 3) identify potential differentiation outliers to find genes that could be responsible for lineage-specific adaptations, 4) test

42 for presence or absence of parallel evolution in genomic differentiation in dif- ferent species- and population pairs, and, 5) understand the effects of karyo- type differences on the genetic differentiation between populations of L. sin- apis. We found that L. juvernica had the lowest overall nucleotide diversity, fol- lowed by L. reali and L. sinapis. Admixture proportions showed that the three species are clearly distinct and that there is considerable genetic structure within L. sinapis and L. juvernica. In L. sinapis, the population from Spain (2n = 106 - 108) was significantly differentiated from the populations from Kazakhstan (2n = 56 - 64) and Sweden (2n = 57, 58) (FST = 0.13 ± 0.06 vs. L. sinapis (Sweden), FST = 0.14 ± 0.06 vs. L. sinapis (Kazakhstan)), while the latter two were hardly differentiated at all (FST = 0.019 ± 0.030). I did not detect any evidence for post-divergence gene-flow between any of the species. The genetic diversity, differentiation and absolute divergence estimated in pair-wise comparisons were highly heterogeneous across the genome and the levels of nucleotide diversity were highly correlated between populations. Ge- netic differentiation was identified to be higher between L. reali and L. juver- nica (FST = 0.28 ± 0.081) compared to L. sinapis and the other two species (FST = 0.15 ± 0.055 vs. L. reali; FST = 0.16 ± 0.044 vs. L. juvernica), probably as a consequence of low effective population sizes, and faster sorting of seg- regating alleles in both L. reali and L. juvernica. In contrast to many previous genome-scan studies of closely to moderately divergent species pairs, the highly differentiated regions also had higher than average absolute divergence. Highly differentiated regions were also narrow and dispersed rather than clustered in specific regions. This suggests that dif- ferentiation has not been driven by reduced recombination (lower regional Ne) resulting in reduced diversity already in the ancestral population (before di- vergence occurred). It rather suggests that genetic drift in combination with weak lineage specific selection, potentially on many genes with small effect, has been the main forces underlying genomic divergence between the species and ecotypes. Differentiation outliers also appeared to be independent in dif- ferent population and species comparisons suggesting that different gene sets have been involved in genomic divergence in each case. However, the func- tional categories of genes in differentiation outliers were overlapping between comparisons suggesting a parallel pattern in terms of function rather than po- sition. Gene functions were related to metabolism, cellular processes and re- sponse to stimuli which is in line with ecological differences between species and ecotypes.

43 Paper III – Heterogeneous patterns of genetic diversity and differentiation in European and Siberian Chiffchaff (Phylloscopus collybita abietinus / P. tristis): Published in G3 (Talla et al., 2017).

Hybrid zones of recently diverged species provide a way to study speciation processes between lineages that are still exchanging genetic material. Hybrids collected from the sympatric regions also provide resources to obtain the ge- netic basis of potential phenotypic differences between the hybridizing taxa. Studying the species in the early stages of diversification is key to understand- ing the genetic basis of speciation. The European (Phylloscopus collybita ab- ietinus) and the Siberian chiffchaff (P. tristis) that were once considered sub- species are now recognised as independent species due to the differences in morphology, plumage colour, song, migration routes and habitat preference (Marova and Alekseev, 2008; Marova et al., 2009, 2013). They are recently diverged species that formed a secondary hybrid zone near the Ural Moun- tains, possibly as a consequence of population expansions out of refugia after glacial periods. The phenotypic differences are less pronounced in the indi- viduals from the hybrid zone and sympatric birds often sing mixed songs and show intermediate morphological and phenotypic characteristics. In this paper we used a population genetics approach using population re- sequencing of both sympatric and allopatric abietinus and tristis. We sampled 10 male individuals of abietinus from both the sympatric and the allopatric region and 10 male individuals of tristis from both sympatric and allopatric regions. A reference assisted genome assembly of one abietinus individual was obtained by applying high-coverage sequencing (>100X) and using the high-quality genome assembly of Ficedula albicollis as reference for map- ping. The chiffchaff assembly covered approximately 90% of the F. albicollis genome and completeness estimates were similar between the two species. The 10 individuals from each of the four populations were sequenced to ap- proximately 4X coverage using standard Illumina paired-end technology and sequence reads were mapped to the reference assisted assembly for identifica- tion of polymorphisms (SNPs). The SNPs were used to estimate global (ge- nome-wide) and regional (10 kb non-overlapping windows across the ge- nome) variation in diversity. We applied both admixture and PCA analysis which both suggested that phenotype is a poor predictor of the genomic composition of birds in the sym- patric zone, as some individuals identified to be abietinus based on external appearance had the majority of the genome introgressed with tristis alleles. The overall level of genomic differentiation was identified to be reduced in the sympatric regions compared to the allopatric regions, likely due to consid- erable historical hybridisation and backcrossing. Consistent with previous ge- nome-wide analyses in birds, regional variation in differentiation correlated negatively with both absolute divergence and nucleotide diversity and differ-

44 entiation islands were located in the same regions as observed in the independ- ent comparison of pied- and collared flycatcher (Ellegren et al., 2012). This suggests that the recombination landscape is conserved also in chiffchaff and that historical linked selection has resulted in reduced diversity in low recom- bination regions prior to species divergence. The genome scan was combined with identification of fixed non-synony- mous and splice altering substitutions between the allopatric populations to identify candidate genes for species specific traits. We could identify a hand- ful of candidate genes that might have been important for trait divergence be- tween the two species.

Paper IV – Dissecting the effects of natural selection and mutation on genetic diversity in three recently diverged cryptic butterfly species: Manuscript.

The association between effective population size (Ne) and genetic diversity and the rate of adaptive evolution has been much debated. According to pop- ulation genetic theory, species with large Ne should harbour more genetic var- iation and potentially accumulate fixed beneficial mutations at a higher rate than species with low Ne. The rate of adaptive change and the level of genetic variation in populations can vary across regions of the genome as a conse- quence of regional variation in Ne (in essence regional variation in recombi- nation rate). Identification of adaptive variants can provide information about the factors leading to phenotypic evolution in populations and in some cases also inform on potential divergence processes between diverging lineages. In this paper, we used the previously available de novo genome and transcrip- tome assemblies of L. sinapis and population re-sequencing data from multi- ple populations representing all Leptidea species. The data were used for gene annotation of the L. sinapis genome and for estimating the population recom- bination rate in non-overlapping windows across the genome of all six popu- lations. The annotation information was used to estimate the nucleotide diver- sity at different site categories across the genome, and to estimate the ratio of non-synonymous to synonymous polymorphisms (pN/pS) and non-synony- mous to synonymous substitutions (dN/dS) for all genes independently and across the genome in all populations. Generating this data set allowed us to investigate the overall effects of natural selection, quantify the effects of link- age on regional levels of genetic diversity and identify potential genes affected by positive selection in each Leptidea species, and in populations with distinct and distribution ranges. Our analyses showed that the level of ge- nome-wide genetic variation was low in all species, again suggesting small effective population sizes. Despite that we found that back-ground selection probably has been the main force reducing nucleotide diversity, especially in regions with high gene density and/or low recombination rate. We also found a low (<1) transition to transversion ratio which was verified with polymor- phism data in the genome assembly. In line with some other studies (Keller,

45 Bensasson and Nichols, 2007), this could indicate that the assumption of a general transition bias across organisms is incorrect and that different muta- tion mechanisms may be at play in different taxa. We found that the overall proportion of adaptive substitutions was similar to previously studied organ- isms. The analysis aimed at identification of positively selected resulted in a set of genes related to metabolism and response to stimulus – functions that may be relevant for the development of ecological differences between the species and between the different populations within species.

Paper V – Gene expression profiling across ontogenetic stages in the wood white (Leptidea sinapis) reveals pathways linked to butterfly diapause regula- tion. Published in Molecular Ecology (Leal et al., 2018).

Insects in temperate environments often go through a phase of arrested devel- opment called diapause to cope with the extreme weather conditions during winter. The propensity to enter diapause may differ across latitudes and light cues are often used in the ‘decision’ making whether direct development or diapause stage should be entered. Understanding the genetic basis of diapause is important to understand the evolutionary processes shaping local adaptation and potentially also how these could contribute to diversification of species. Our primary goal in this paper was to characterize the pathways responsible for regulation of diapause initiation in L. sinapis. Populations of L. sinapis in northern Europe generally enter diapause as pupae after one generation of mating (univoltine), while the populations from southern Europe enter dia- pause after several generations (multivoltine). A second generation (bivoltine) can sometime appear in northern Europe if conditions are beneficial. Diapause induction is triggered by day length (Friberg et al., 2008). Hence, these obvi- ously adaptive characteristics have evolved as a response to differences in the day length and temperatures between north and south to maximize reproduc- tive output in each respective region. Populations of Swedish L. sinapis can be forced to develop directly by treating the larvae with long day length, or to diapause by treating the larvae with short day length. In this study, several L. sinapis females were collected in central Sweden and reared in 50*50*50 cm cages to lay eggs in a common garden setting. The eggs from each female were separated in a split-brood design where one cohort was forced into direct development by treating them with 22 hours daylight and 2 hours darkness (LD) while the other cohort was treated with 12 hours day light and 12 hours darkness (SD) to trigger diapause. When larvae appeared, they were harvested at instar-III, instar V, and one day after pupation in SD treatment and at instar- III, instar V, one day after pupation and adults from LD treatment. For instar V, pupae and adults, both males and females were collected. At least 10 indi- viduals (pupae) from each treatment were kept in the light regimes to verify that the treatment resulted in the expected response. All LD pupae emerged in approximately one week while no SD larvae had emerged after > 3 months.

46 Hence, the treatment resulted in the expected response in both cases. Total RNA was extracted from the whole body, enriched for mRNA and sequenced using standard Illumina pared-end sequencing. The RNA-seq data were used to quantify gene expression differences between treatment groups across on- togenetic stages to identify potential pathways responsible for the regulation of diapause. After correcting for potential confounding effects (family, sex, sequencing batch), a set of differentially expressed genes was identified in each develop- mental stage. The genes were as expected related to metabolic rate, but a sig- nificantly overrepresented group of genes were related to neurotransmission. We found that several genes related to neuroamine synthesis and sequestration were differentially expressed between treatment groups, consistently across ontogenetic stages. Short day conditions triggered an increase in the dopamine levels which, in turn potentially activates the switch to diapause. Long day treatment decreased dopamine levels and potentially forced the butterflies to development directly. This suggests that diapause induction may be triggered by accumulation of specific molecules over a longer time period (hour-glass model) rather than by an instant signal at a specific time point. The hour glass hypothesis predicts that diapause is induced when a certain metabolite is pro- duced up to a minimum threshold. The role of dopamine in diapause regula- tion has also been suggested in other butterfly and moth species (Isabel, Gourdoux and Moreau, 2001; Noguchi and Hayakawa, 2001; Wang et al., 2015).

47 5. Conclusions and future projects

Different factors such as genetic drift, natural selection, chromosome rear- rangements may be the drivers of reproductive isolation. The detailed descrip- tion of these mechanisms is key to understand speciation processes. At the same time, identification of the genetic underpinnings of adaptive traits can inform on the importance of ecological divergence driving species apart. The natural and adaptive process governing the gradual increase of genetic differ- ences and change of allele frequencies can be thoroughly studied in recently diverged species with populations sampled in a broad geographical range (Wolf and Ellegren, 2017). The Leptidea cryptic complex (TMRCA ~ 3 My) and the Chiffchaff study system (TMRCA < 1.5 My (Alstrom et al., 2018)) are both very recently diverged and suited for extensive genomic analysis. Prior to the genomics era, the knowledge about the genetic basis of species barriers and adaptive traits was restricted to model organisms kept in the lab. With the possibility to utilize comparatively cheap methods to generate large scale genetic data from non-model organisms comes the chance to investigate these processes also in natural populations. My primary aims in this thesis were to extend the knowledge of the genetic processes of speciation and ad- aptation using the Leptidea cryptic complex and the chiffchaff study system. The two systems provided opportunities to study genomic divergence at dif- ferent settings. In the Leptidea system reproductive isolation was already com- plete and that no post-divergence gene-flow had occurred between species. In the chiffchaff on the other hand, hybridization and backcrossing have resulted in that a major part of the genome had lower differentiation in sympatry than in allopatry allowing for identification of potential barriers to gene-flow. I also generated RNA-seq data to use expression profiling to find candidate genes / pathways involved in regulation of diapause in L. sinapis. Taken together, my efforts have resulted in increased understanding of the speciation history of both systems and the joint analyses of genomic re-sequencing data and ex- pression data identified candidate regions for local adaptation and potentially reproductive isolation. Further verification is needed to understand the effects of these genes on the phenotype in both Leptidea and in the chiffchaffs. With the increase efficiency of applying verification experiments, for example via RNA interference and / or gene editing with Crispr-Cas9 (Ran et al., 2013) this is potentially an achievable goal within a near future. Especially in the Leptidea system, this should be a potential way forward, since butterflies are

48 relatively easy to in captivity and it is straightforward to monitor phe- notypic variation in many individuals. This may not be achievable in a wild bird study system like the chiffchaff. However, additional analyses, for exam- ple cline sampling across the chiffchaff hybrid zone, may lead to independent verification of candidate barrier loci. My studies also discovered novel things in Lepidoptera. I found that the genome size variation was large and that this was driven by different activity of transposable elements in different butterfly and moth lineages. An addi- tional discovery was that the transition transversion rate was considerably lower than what is observed in other taxa. However, preliminary data from other insect species indicate this as well and also in Heliconius butterflies, this ratio is lower than what is generally observed. Maybe this is a result of poor taxon sampling for these kind of studies, which calls for more analyses of mutation classes in many different organisms. This will increase our under- standing of potential variation in mutation mechanisms across the tree of life. My results also suggest that the Leptidea clade in general has a comparatively low effective population size but that linked selection (predominantly back- ground selection reducing diversity) still has had a considerable effect on di- versity. The regional variation in genetic diversity across the genome in the Leptidea cryptic complex was explained by linked selection (gene density and recombination rate). In depth analysis will be required to elucidate the effects of GC-biased gene conversion (gBGC) on the variation in genetic diversity in various site categories and potential confounding effects on signals of selec- tion across different site categories. With an improved chromosome level ge- nome assembly, effects of chromosome size and position on the chromosome on genetic diversity can be understood in forthcoming projects. In agreement with many previous analyses, a major conclusion is that ge- nomic divergence is dependent on combined effects of many different molec- ular and evolutionary forces. In Leptidea, my studies indicate that reproduc- tive barriers appeared before secondary contact and these probably involve both pre-zygotic barriers (observed before) and karyotype changes. My inves- tigations also pin-point the importance of having information about both di- versity-based estimates and the recombination rate to be able to interpret pat- terns of genomic differentiation. Although more detailed recombination data estimating the cross-over rate (r) directly will be needed to verify this, one conclusion is that the holocentric nature of butterfly chromosomes may lead to an even recombination landscape across chromosomes. I could not detect a significant effect of karyotype on the global recombination rate between kar- yotypic extremes in Leptidea, potentially indicating that karyotypic changes do not result in altered global recombination rates. This could be a result of that crossing-over is not necessary for correct segregation of chromosomes in meiosis in Lepidoptera, but again, more detailed and direct estimates of the recombination rate in different karyotypes will be necessary to verify this. De-

49 tailed recombination maps and linkage maps will hence be needed for the ex- treme karyotype variants of L. sinapis to understand the relationship between recombination, chromosome size, genetic diversity and genetic differentia- tion. A high-density recombination map can also reveal effects of linked se- lection on genetic differentiation in more detail. Understanding the genomic level fitness effects of the hybrids and backcrosses between the extreme kar- yotypes in L. sinapis can also reveal chromosomal incompatibilities. For this, hybrids and back-crosses between the karyotypic extremes have to be made and association between chromosome changes and fitness effects have to be established. We also found that the dopamine production pathway may be key to dia- pause regulation in L. sinapis, in depth quantifications of dopamine in dia- pausing and directly developing individuals will add strong validation to the relevance of this pathway in diapause initiation. Gene expression changes gen- erated in different stages of the butterfly can be used to understand local ad- aptation to various other ecological changes such as host plant differences, temperature and effects of drought. Ideally, an experimental set-up with denser time sampling at earlier developmental stages could be used to detect potential upstream regulators of the neurotransmitter cascade. Finally, a population genetic study with extensive sampling across Eurasia where all species and populations from the genus Leptidea included, would reveal the evolutionary history of this genus in more detail. With the modern sequencing and computational techniques, obtaining a genome assembly of all species in the genus and genetic variation data would be feasible. This would reveal structural variation, gene flow and the complete speciation his- tory of the genus. Interestingly, the Leptidea genus is relatively species poor – unless many cryptic species are hidden within the currently twelve described species – and the closest relatives are other lineages (mimic sulphurs) of the subfamily which occur in South and Central America. Addi- tional sampling from this outgroup subfamily could reveal the origin of the Leptidea colonization of Eurasia and the loss of colourful wing-patterns.

50 Svensk sammanfattning

Genetisk differentiering och artbildning styrs av en mängd olika faktorer som naturligt urval, lokal anpassning och genetisk drift. Det finns också en stor variation i mönstret för genetisk differentiering mellan olika organismer. Denna variation kan till stor del bero på skillnader i hur genomet, arvsmassan, är uppbyggt hos olika arter, men också på olika miljöfaktorer. Det medför att betydelsen av naturligt urval och genetisk drift kan variera mellan olika taxo- nomiska grupper. Den kunskap som vi har om den process som ligger bakom artbildning kommer från en väldigt liten del av världens biodiversitet och det är svårt att nå en konsensus om vilken faktor som har störst betydelse, genetisk drift eller naturligt urval. I denna avhandling försöker jag förstå processen bakom art- och populations diversifiering hos de europeiska vitvingarna, skogsvitvinge (Leptidea sinapis), Reals vitvinge (L. reali) och ängsvitvinge (L. juvernica), samt hos europeisk och sibirisk gransångare (Phylloscopus col- lybita abietinus och P. tristis). Vi har funnit att vitvingarna har dubbelt så stor arvsmassa jämfört med ge- nomsnittet hos andra dagfjärilar, och detta tycks orsakas av en hyperaktivitet hos mobila genelement, så kallade transposoner. En analys av vitvingarnas demografiska historia visade att dessa arter verkar har utvecklats separat utan senare genetiskt utbyte. Det var också en tydlig skillnad i hur populationsstor- leken varierat över tid hos de olika arterna. L. sinapis visade tecken på en sen expansion av populationsstorleken, medan de övriga arterna visade på en längre tid med konstant låg populationsstorlek. Den låga populationsstorleken har dock haft begränsad inverkan på vitvingarnas adaptiva potential, även om den genetiska variationen var generellt låg hos samtliga tre arter. Jag hittade inte något samband mellan vilka regioner i genomet som differentierats hos de olika arterna, dvs parallellism saknades, och de genetiska skillnaderna mel- lan arterna tycks ha utvecklats under tiden som populationerna var åtskilda. Genetisk differentiering hos vitvinge verkar drivas av en kombination av slumpmässig genetisk drift och svagt riktad selektion. Detekterade kandidat- regioner för differentiering och positivt naturligt urval visade sig innehålla ge- ner med liknande funktion hos de olika arterna, bland annat relaterat till äm- nesomsättning och reaktion på stimuli. Europeisk gransångare (Phylloscopus collybita abietinus) och sibirisk gransångare (Phylloscopus collybita tristis) är relativt nybildade fågelarter men kan särskiljas genom tydliga skillnader i utseende, fjäderdräkt och sång.

51 Vi fann att genomet visar ett oregelbundet variations- och differentierings- mönster, samt att den genetiska differentieringen är mer uttalad i sympatriska populationer jämfört med allopatriska. Hos gransångaren tycks genomets dif- ferentiering skapas av upprepad bakgrundsselektion och de har ett konserverat genomisk landskap i likhet med övriga undersökta fågelarter. Vi fann vidare att uppvisade fenotypiska karaktäristika hos individer i gränszonerna mellan de båda arterna inte förutsäger andelen genetiskt bidrag från de båda arterna. Ett fåtal kandidatgener kunde också identifieras som kan vara inblandade i fenotypiska skillnader mellan dessa fågelarter. Lokal anpassning är en av nyckelfaktorerna i den evolutionära processen. En viktig anpassning till den lokala miljön är förmågan att gå in i diapaus (vilostadium). Vi valde därför att studera hur diapaus regleras hos skogsvit- vinge. Fjärilslarver från samma familj utsattes för olika ljusförhållanden, en grupp behandlades med kort dagslängd och den andra gruppen lång dags- längd. Fjärilarna samlades in vid fyra olika åldrar, genuttrycket analyserades för varje stadium och jämfördes mellan de olika behandlingarna. En tydlig skillnad kunde ses i genuttryck hos gener involverade i omsättningen av sig- nalsubstansen dopamin, vilket indikerar att dopamin sannolikt är involverat i regleringen av diapaus hos skogsvitvinge. Sammanfattningsvis, genom att använda en mångsidig uppsättning verktyg för genomisk analys ger min avhandling nya insikter om artbildningsproces- sen hos vitvinge och gransångare.

Translated by: Karin Näsvall

52 Acknowledgements

I want to thank my main supervisor Niclas Backström for his patient guidance throughout my PhD studies and for inspiring me in biology. Thank you for introducing me to the world of butterfly and bird biology. Your support is vital in my overall development as a bioinformatician. I want to thank Hans Elle- gren and the Evolutionary biology department for letting me do my PhD stud- ies here in EBC, Uppsala. Thanks to Jochen Wolf for being my co-supervisor and his support with the graduate school on genomes and phenotypes. Thanks to my fellow PhD student at Backström lab, Karin Näsvall for helping me with the Swedish summary and taking care of the important lab work and all the future projects. Thanks to all the current and past members of Backström lab for sharing their valuable scientific knowledge. Thank you Faheema Kalsoom for some good programming questions, I have definitely improved my pro- graming skills to the next level trying to solve those problems. Thank you Luis Leal for your amazing collaboration and knowledge. Thanks to Lars Höök for the great collaboration. Thanks to all the other bachelors and masters students supervised by Niclas (Keshika Ravichandran, Ye Xiong, Paulina Widell, Axel Söderberg, Kristaps Sokolovskis and John Burley). Thank you Magne Friberg for showing me how to take care of butterflies in the green house and teaching me about the ecology of Leptidea species. Thanks to Alexander Suh for his knowledge regarding TEs and all the whisky club, beer club and Frisbee eve- nings. Thank you Roger Vila and Vlad Dincă for sharing your vast knowledge on butterflies and the Leptidea samples that are vital for my PhD. Thanks to Christer Wiklund for being so inspiring, your skill with taking care of the but- terflies in the green house and the amazing photos for the cover. Thanks to Daria Shipilina for the great collaboration and knowledge with the chiffchaff. Thanks to all the help from UPPMAX, Scilifelab, National Genomics Infra- structure (NGI) and National Bioinformatics Infrastructure (NBIS) for the help with sequencing, data warehousing and bioinformatics support. Thanks to Takeshi Kawakami for cool bioinformatics challenges and whisky clubs. Thanks to Martin Lascoux and his group for interesting discussions at the jour- nal club. Thanks to Hanna Johannesson and Ravi Vumma for recommending me for the PhD position with Niclas. Thanks to Nagarjun Vijay for motivating me to programme everything myself. Thanks to Robert Ekblom for constantly informing us about seminars. Thanks to all the group leaders at the Depart- ment of Evolutionary biology at EBC, Uppsala University for the collabora- tive research environment.

53 Thank you Homa Papoli for your help with my programing problems and coming to my rescue when I am stuck. I also want to thank Berrit Kiehl for helping out in the wet lab and giving me her precious protocols for extractions. Thanks to Ludovic Dutoit for being a great friend, office mate and for bringing me out of perl and introducing me to python. Thank you Ghazal Alavioon for helping me with all the paper work (from migrationsverket to PhD thesis). Thanks to all the people who played ping pong with me during my time here as a PhD student (Severin Uebbing, Aaron Shafer and Jelmer Poelstra). Thanks to Barbara Ellis for being a great friend and office mate. Thanks to Mercè for organising the Beervolution with me. Hope this keeps going for a long time. Viva La Beervolution! . I want to thank all the PhD students and friends for all the fun and support (Nina, Torsten, Alex, Thijessen, Arielle, Mario, Luciana, João, Marco, Rado, Francesca, Valentina, Carina, Aeron, Mathilde, Keri, Jasper, Dmytro, Tobi, Lore, Mahvash, Sergio, Matthias, Mat, Ivain, Annika, Moritz, Varun, Prathusha, Manoj, Kanaka Lakshmi, Sudarsan and Rory). Thanks to my brother Nivas for keeping my morale up whenever I was stuck in a project. I want to thank Jente and his dad (Dirk Ottenburghs) for providing the photo for paper III, Vlad Dincă for providing photos of Lep- tidea for the main cover and Christer Wiklund for providing the cover for Pa- per II and IV. Thanks to Vignyatha for proofreading my thesis and signifi- cantly improving it. I want to thank my Mom (Uma Devi) and Dad (Raja Sekhar Reddy) for supporting and encouraging me to start my PhD. My cousin Viraja for making sure I am doing fine. Finally my wife Manognya for understanding the hard- ships of science and for dragging me through the last months of my PhD.

54 References

Alexander, D. H.,Novembre, J. and Lange, K. (2009) Fast model-based estimation of ancestry in unrelated individuals, Genome Research, 19(9), 1655–1664. Alipaz, J. A.,Wu, C. I. and Karr, T. L. (2001) Gametic incompatibilities between races of Drosophila melanogaster., Proceedings of the Royal Society B: Biological Sciences, 268(1469), 789–795. Alkan, C.,Sajjadian, S. and Eichler, E. E. (2010) Limitations of next-generation genome sequence assembly, Nature methods. Nature Publishing Group, 8(1), 61. Allendorf, F. W.,Luikart, G. and Aitken, S. N. (2009) Conservation and the genetics of populations. John Wiley & Sons. Alstrom, P.,Rheindt, F. E.,Zhang, R.,Zhao, M.,Wang, J.,Zhu, X.,Gwee, C. Y.,Hao, Y.,Ohlson, J.,Jia, C.,Prawiradilaga, D. M.,Ericson, P. G. P.,Lei, F. and Olsson, U. (2018) Complete species-level phylogeny of the (Aves: Phylloscopidae) radiation., Molecular and evolution. United States, 126, 141–152. Arenas, M.,Ray, N.,Currat, M. and Excoffier, L. (2011) Consequences of range contractions and range shifts on molecular diversity, Molecular biology and evolution. Oxford University Press, 29(1), 207–218. Ashburner, M.,Ball, C. A.,Blake, J. A.,Botstein, D.,Butler, H.,Cherry, J. M.,Davis, A. P.,Dolinski, K.,Dwight, S. S.,Eppig, J. T.,Harris, M. A.,Hill, D. P.,Issel-Tarver, L.,Kasarskis, A.,Lewis, S.,Matese, J. C.,Richardson, J. E.,Ringwald, M.,Rubin, G. M. and Sherlock, G. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium., Nature genetics. United States, 25(1), 25–29. Atkinson, D. and Sibly, R. M. (1997) Why are organisms usually bigger in colder environments? Making sense of a life history puzzle., Trends in ecology & evolution. , 12(6), 235–239. Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000, Nucleic Acids Research. Oxford, UK: Oxford University Press, 28(1), 45–48. Baker, H. G. (1959) Reproductive methods as factors in speciation in flowering plants, in Cold Spring Harbor Symposia on Quantitative Biology, 177–191. Barbash, D. A.,Awadalla, P. and Tarone, A. M. (2004) Functional divergence caused by ancient positive selection of a Drosophila hybrid incompatibility locus, PLoS Biology, 2(6), 142. Barbash, D. A.,Siino, D. F.,Tarone, A. M. and Roote, J. (2003) A rapidly evolving MYB-related protein causes species isolation in Drosophila, Proceedings of the National Academy of Sciences, 100(9), 5302–5307. Barton, N. and Bengtsson, B. O. (1986) The barrier to genetic exchange between hybridising populations, . Nature Publishing Group, 57(3), 357. Barton, N. H. and Hewitt, G. M. (1985) Analysis of hybrid zones, Annual Review of Ecology and , 16(1), 113–148. Bateson, W. (1909) Heredity and variation in modern lights (ed. Seward A. C.), in Darwin and Modern Science, 85–101.

55 Beaumont, M. A. and Balding, D. J. (2004) Identifying adaptive genetic divergence among populations from genome scans, Molecular Ecology, 13(4), 969–980. Beaumont, M. A. and Nichols, R. A. (1996) Evaluating loci for use in the genetic analysis of population structure, Proceedings of the Royal Society B: Biological Sciences, 236(1377), 1619–1626. Beheregaray, L. B. and Caccone, A. (2007) Cryptic biodiversity in a changing world, Journal of Biology, 6(4), 9. Beldade, P.,Saenko, S. V,Pul, N. and Long, A. D. (2009) A gene-based linkage map for Bicyclus anynana butterflies allows for a comprehensive analysis of synteny with the lepidopteran reference genome, PLoS genetics. Public Library of Science, 5(2), e1000366. Bensch, S.,Grahn, M.,Muller, N.,Gay, L. and Akesson, S. (2009) Genetic, morphological, and feather isotope variation of migratory willow warblers show gradual divergence in a ring., Molecular ecology. England, 18(14), 3087–3096. Bentley, D. R.,Balasubramanian, S.,Swerdlow, H. P.,Smith, G. P.,Milton, J.,Brown, C. G.,Hall, K. P.,Evers, D. J.,Barnes, C. L.,Bignell, H. R.,Boutell, J. M.,Bryant, J.,Carter, R. J.,Keira Cheetham, R.,Cox, A. J.,Ellis, D. J.,Flatbush, M. R.,Gormley, N. A.,Humphray, S. J.,Irving, L. J.,Karbelashvili, M. S.,Kirk, S. M.,Li, H.,Liu, X.,Maisinger, K. S.,Murray, L. J.,Obradovic, B.,Ost, T.,Parkinson, M. L.,Pratt, M. R.,Rasolonjatovo, I. M. J.,Reed, M. T.,Rigatti, R.,Rodighiero, C.,Ross, M. T.,Sabot, A.,Sankar, S. V,Scally, A.,Schroth, G. P.,Smith, M. E.,Smith, V. P.,Spiridou, A.,Torrance, P. E.,Tzonev, S. S.,Vermaas, E. H.,Walter, K.,Wu, X.,Zhang, L.,Alam, M. D.,Anastasi, C.,Aniebo, I. C.,Bailey, D. M. D.,Bancarz, I. R.,Banerjee, S.,Barbour, S. G.,Baybayan, P. A.,Benoit, V. A.,Benson, K. F.,Bevis, C.,Black, P. J.,Boodhun, A.,Brennan, J. S.,Bridgham, J. A.,Brown, R. C.,Brown, A. A.,Buermann, D. H.,Bundu, A. A.,Burrows, J. C.,Carter, N. P.,Castillo, N.,Chiara E Catenazzi, M.,Chang, S.,Neil Cooley, R.,Crake, N. R.,Dada, O. O.,Diakoumakos, K. D.,Dominguez-Fernandez, B.,Earnshaw, D. J.,Egbujor, U. C.,Elmore, D. W.,Etchin, S. S.,Ewan, M. R.,Fedurco, M.,Fraser, L. J.,Fuentes Fajardo, K. V,Scott Furey, W.,George, D.,Gietzen, K. J.,Goddard, C. P.,Golda, G. S.,Granieri, P. A.,Green, D. E.,Gustafson, D. L.,Hansen, N. F.,Harnish, K.,Haudenschild, C. D.,Heyer, N. I.,Hims, M. M.,Ho, J. T.,Horgan, A. M.,Hoschler, K.,Hurwitz, S.,Ivanov, D. V,Johnson, M. Q.,James, T.,Huw Jones, T. A.,Kang, G.-D.,Kerelska, T. H.,Kersey, A. D.,Khrebtukova, I.,Kindwall, A. P.,Kingsbury, Z.,Kokko- Gonzales, P. I.,Kumar, A.,Laurent, M. A.,Lawley, C. T.,Lee, S. E.,Lee, X.,Liao, A. K.,Loch, J. A.,Lok, M.,Luo, S.,Mammen, R. M.,Martin, J. W.,McCauley, P. G.,McNitt, P.,Mehta, P.,Moon, K. W.,Mullens, J. W.,Newington, T.,Ning, Z.,Ling Ng, B.,Novo, S. M.,O’Neill, M. J.,Osborne, M. A.,Osnowski, A.,Ostadan, O.,Paraschos, L. L.,Pickering, L.,Pike, A. C.,Pike, A. C.,Chris Pinkard, D.,Pliskin, D. P.,Podhasky, J.,Quijano, V. J.,Raczy, C.,Rae, V. H.,Rawlings, S. R.,Chiva Rodriguez, A.,Roe, P. M.,Rogers, J.,Rogert Bacigalupo, M. C.,Romanov, N.,Romieu, A.,Roth, R. K.,Rourke, N. J.,Ruediger, S. T.,Rusman, E.,Sanches-Kuiper, R. M.,Schenker, M. R.,Seoane, J. M.,Shaw, R. J.,Shiver, M. K.,Short, S. W.,Sizto, N. L.,Sluis, J. P.,Smith, M. A.,Ernest Sohna Sohna, J.,Spence, E. J.,Stevens, K.,Sutton, N.,Szajkowski, L.,Tregidgo, C. L.,Turcatti, G.,Vandevondele, S.,Verhovsky, Y.,Virk, S. M.,Wakelin, S.,Walcott, G. C.,Wang, J.,Worsley, G. J.,Yan, J.,Yau, L.,Zuerlein, M.,Rogers, J.,Mullikin, J. C.,Hurles, M. E.,McCooke, N. J.,West, J. S.,Oaks, F. L.,Lundberg, P. L.,Klenerman, D.,Durbin, R. and Smith, A. J. (2008) Accurate whole human genome sequencing using reversible terminator chemistry., Nature. England, 456(7218), 53–59.

56 Bickford, D.,Lohman, D. J.,Sodhi, N. S.,Ng, P. K. L.,Meier, R.,Winker, K.,Ingram, K. K. and Das, I. (2007) Cryptic species as a window on diversity and conservation, Trends in Ecology and Evolution, 22(3), 148–155. Bierne, N. and Eyre-Walker, A. (2004) The genomic rate of adaptive amino acid substitution in Drosophila, Molecular Biology and Evolution. Oxford University Press, 21(7), 1350–1360. Black IV, W. C.,Baer, C. F.,Antolin, M. F. and DuTeau, N. M. (2001) : genome-wide sampling of insect populations, Annual review of entomology, 46(1), 441–469. Brady, J. (1976) Insect clocks, Nature, 264(490). Brand, A. H. and Perrimon, N. (1993) Targeted gene expression as a means of altering cell fates and generating dominant phenotypes., Development (Cambridge, England). England, 118(2), 401–415. Brideau, N. J.,Flores, H. A.,Wang, J.,Maheshwari, S.,Wang, X. and Barbash, D. A. (2006) Two Dobzhansky-Muller Genes interact to cause hybrid lethality in Drosophila, Science, 314(5803), 1292–1296. Britten, R. J. and Davidson, E. H. (1971) Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty., The Quarterly review of biology. United States, 46(2), 111–138. Brockmann, R.,Beyer, A.,Heinisch, J. J. and Wilhelm, T. (2007) Posttranscriptional Expression Regulation: What Determines Translation Rates?, PLOS Computational Biology. Public Library of Science, 3(3), 1–9. Brown, B.,Emberson, R. and Paterson, A. (1999) Mitochondrial COI and II provide useful markers for Wiseana (Lepidoptera: Hepialidae) species identification, Bulletin of Entomological Research, 89(4), 287–293. Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA1, Journal of molecular biology, 268(1), 78–94. Buri, P. (1956) Gene frequency in small populations of mutant Drosophila, Evolution, 10(4), 367–402. Burke, J. M. and Arnold, M. L. (2001) Genetics and the Fitness of Hybrids, Annual Review of Genetics, 35(1), 31–52. Burri, R. (2017) Interpreting differentiation landscapes in the light of long-term linked selection, Evolution Letters. Wiley Online Library, 1(3), 118–131. Butler, J.,MacCallum, I.,Kleber, M.,Shlyakhter, I. A.,Belmonte, M. K.,Lander, E. S.,Nusbaum, C. and Jaffe, D. B. (2008) ALLPATHS: De novo assembly of whole-genome shotgun microreads, Genome Research, 18(5), 810–820. Carroll, S. B. (2008) Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution., Cell. United States, 134(1), 25–36. Cavalier-Smith, T. (2005) Economy, speed and size matter: evolutionary forces driving nuclear genome miniaturization and expansion, Annals of botany. Oxford University Press, 95(1), 147–175. Charlesworth, B. (1998) Measures of divergence between populations and the effect of forces that reduce variability., Molecular Biology and Evolution, 15(5), 538– 543. Charlesworth, B. (2009) Effective population size and patterns of and variation, Nature Reviews Genetics. Nature Publishing Group, 10, 195. Charlesworth, B.,Nordborg, M. and Charlesworth, D. (1997) The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations, Genetics Research. Cambridge University Press, 70(2), 155–174.

57 Charlesworth, D.,Charlesworth, B. and Morgan, M. T. (1995) The pattern of neutral molecular variation under the background selection model, Genetics, 141(4), 1619–1632. Charlesworth, J. and Eyre-Walker, A. (2006) The rate of adaptive evolution in enteric bacteria., Molecular biology and evolution. United States, 23(7), 1348–1356. Chen, D.,Toone, W. M.,Mata, J.,Lyne, R.,Burns, G.,Kivinen, K.,Brazma, A.,Jones, N. and Bahler, J. (2003) Global transcriptional responses of fission yeast to environmental stress., Molecular biology of the cell. United States, 14(1), 214– 229. Chen, L.,Liu, P.,Evans, T. C. and Ettwiller, L. M. (2017) DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification, Science, 355(6326), 752–756. Chuong, E. B.,Elde, N. C. and Feschotte, C. (2017) Regulatory activities of transposable elements: From conflicts to benefits, Nature Reviews Genetics, 18(2), 71–86. Colosimo, P. F.,Hosemann, K. E.,Balabhadra, S.,Villarreal, G.,Dickson, H.,Grimwood, J.,Schmutz, J.,Myers, R. M.,Schluter, D. and Kingsley, D. M. (2005) Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles, Science, 307(5717), 1928–1933. Comeault, A. A.,Flaxman, S. M.,Riesch, R.,Curran, E.,Soria-Carrasco, V.,Gompert, Z.,Farkas, T. E.,Muschick, M.,Parchman, T. L.,Schwander, T.,Slate, J. and Nosil, P. (2015) Selection on a Genetic Polymorphism Counteracts Ecological Speciation in a Stick Insect, Current Biology, 25(15), 1975–1981. Conover, D. O.,Duffy, T. A. and Hice, L. A. (2009) The covariance between genetic and environmental influences across ecological gradients: reassessing the evolutionary significance of countergradient and cogradient variation., Annals of the New York Academy of Sciences. United States, 1168, 100–129. Consortium., H. G. (2012) Butterfly genome reveals promiscuous exchange of mimicry adaptations among species., Nature. England, 487(7405), 94–98. Cook, L. M. and Saccheri, I. J. (2013) The peppered moth and industrial melanism: Evolution of a natural selection case study, Heredity, 110(3), 207–212. Corbett-Detig, R. B.,Hartl, D. L. and Sackton, T. B. (2015) Natural Selection Constrains Neutral Diversity across A Wide Range of Species, PLOS Biology. Public Library of Science, 13(4), 1–25. Costa-Silva, J.,Domingues, D. and Lopes, F. M. (2017) RNA-Seq differential expression analysis: An extended review and a software tool, PLoS ONE, 12(12), e0190152. Coyne, J. A. and Orr, A. H. (2004) Speciation, Sunderland, MA: Sinauer Associates. Cresko, W. A.,Amores, A.,Wilson, C.,Murphy, J.,Currey, M.,Phillips, P.,Bell, M. A.,Kimmel, C. B. and Postlethwait, J. H. (2004) Parallel genetic basis for repeated evolution of armor loss in Alaskan threespine stickleback populations, Proceedings of the National Academy of Sciences. National Academy of Sciences, 101(16), 6050–6055. Crick, F. (1970) Central Dogma of Molecular Biology, Nature. Nature Publishing Group, 227, 561. Cruickshank, T. E. and Hahn, M. W. (2014) Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow, Molecular Ecology, 23(13), 3133–3157. Danecek, P.,Auton, A.,Abecasis, G.,Albers, C. A.,Banks, E.,DePristo, M. A.,Handsaker, R. E.,Lunter, G.,Marth, G. T.,Sherry, S. T.,McVean, G. and Durbin, R. (2011) The variant call format and VCFtools, Bioinformatics, 27(15), 2156–2158.

58 Darwin, C. (1859) On the origins of species by means of natural selection, Murray, . van Dijk, E. L.,Jaszczyszyn, Y. and Thermes, C. (2014) Library preparation methods for next-generation sequencing: Tone down the bias, Experimental Cell Research, 322(1), 12–20. Dincă, V.,Lukhtanov, V. A.,Talavera, G. and Vila, R. (2011) Unexpected layers of cryptic diversity in wood white Leptidea butterflies, Nature Communications. Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved., 2, 324. Dincǎ, V.,Wiklund, C.,Lukhtanov, V. A.,Kodandaramaiah, U.,Norén, K.,Dapporto, L.,Wahlberg, N.,Vila, R. and Friberg, M. (2013) Reproductive isolation and patterns of genetic differentiation in a cryptic butterfly species complex, Journal of Evolutionary Biology, 26(10), 2095–2106. Dobin, A.,Davis, C. A.,Schlesinger, F.,Drenkow, J.,Zaleski, C.,Jha, S.,Batut, P.,Chaisson, M. and Gingeras, T. R. (2013) STAR: ultrafast universal RNA-seq aligner, Bioinformatics, 29(1), 15–21. Dobzhansky, T. (1937) Genetics and the Origin of Species. New York: Columbia university press. Dobzhansky, T. H. (1936) Studies on hybrid sterility. II. Localization of sterility factors in Drosophila pseudoobscura hybrids, Genetics. Genetics Society of America, 21(2), 113. Drummond, D. A.,Raval, A. and Wilke, C. O. (2005) A single determinant dominates the rate of yeast protein evolution, Molecular biology and evolution. Oxford University Press, 23(2), 327–337. Dufour, L. (1844) Anatomie generale des dipteres, Annales des Sciences Naturelles, 1, 244–264. Durand, E. Y.,Patterson, N.,Reich, D. and Slatkin, M. (2011) Testing for ancient admixture between closely related populations, Molecular Biology and Evolution, 28(8), 2239–2252. Earl, D.,Bradnam, K.,St. John, J.,Darling, A.,Lin, D.,Fass, J.,Yu, H. O. K.,Buffalo, V.,Zerbino, D. R.,Diekhans, M.,Nguyen, N.,Ariyaratne, P. N.,Sung, W. K.,Ning, Z.,Haimel, M.,Simpson, J. T.,Fonseca, N. A.,Birol, I.,Docking, T. R.,Ho, I. Y.,Rokhsar, D. S.,Chikhi, R.,Lavenier, D.,Chapuis, G.,Naquin, D.,Maillet, N.,Schatz, M. C.,Kelley, D. R.,Phillippy, A. M.,Koren, S.,Yang, S. P.,Wu, W.,Chou, W. C.,Srivastava, A.,Shaw, T. I.,Ruby, J. G.,Skewes-Cox, P.,Betegon, M.,Dimon, M. T.,Solovyev, V.,Seledtsov, I.,Kosarev, P.,Vorobyev, D.,Ramirez- Gonzalez, R.,Leggett, R.,MacLean, D.,Xia, F.,Luo, R.,Li, Z.,Xie, Y.,Liu, B.,Gnerre, S.,MacCallum, I.,Przybylski, D.,Ribeiro, F. J.,Sharpe, T.,Hall, G.,Kersey, P. J.,Durbin, R.,Jackman, S. D.,Chapman, J. A.,Huang, X.,DeRisi, J. L.,Caccamo, M.,Li, Y.,Jaffe, D. B.,Green, R. E.,Haussler, D.,Korf, I. and Paten, B. (2011) Assemblathon 1: A competitive assessment of de novo short read assembly methods, Genome Research, 21(12), 2224–2241. Eaton, L. (2003) Human genome project completed, BMJ : British Medical Journal. BMJ Publishing Group Ltd, 326(7394), 838. Eberhard, W. G. (1992) Species Isolation, Genital Mechanics, and the Evolution of Species-Specific Genitalia in Three Species of Macrodactylus Beetles (Coleoptera, Scarabeidae, Melolonthinae), Evolution, 46(6), 1774–1783. Ekblom, R. and Galindo, J. (2011) Applications of next generation sequencing in molecular ecology of non-model organisms, Heredity, 107(1), 1. Ekblom, R. and Wolf, J. B. W. (2014) A field guide to whole-genome sequencing, assembly and annotation, Evolutionary Applications, 7(9), 1026–1042.

59 Elbarbary, R. A.,Lucas, B. A. and Maquat, L. E. (2016) Retrotransposons as regulators of gene expression, Science, 351(6274), aac7247. Ellegren, H. (2014) Genome sequencing and population genomics in non-model organisms, Trends in Ecology & Evolution, 29(1), 51–63. Ellegren, H.,Smeds, L.,Burri, R.,Olason, P. I.,Backström, N.,Kawakami, T.,Künstner, A.,Mäkinen, H.,Nadachowska-Brzyska, K.,Qvarnström, A.,Uebbing, S. and Wolf, J. B. W. (2012) The genomic landscape of species divergence in Ficedula flycatchers, Nature, 491(7426), 756. Endler, J. (1986) Natural Selection in the Wild., Princeton University Press. Princeton University Press. Enjalbert, B.,Smith, D. A.,Cornell, M. J.,Alam, I.,Nicholls, S.,Brown, A. J. P. and Quinn, J. (2006) Role of the Hog1 stress-activated protein kinase in the global transcriptional response to stress in the fungal Candida albicans., Molecular biology of the cell. United States, 17(2), 1018–1032. Eyre-Walker, A. (2006) The genomic rate of adaptive evolution., Trends in ecology & evolution. England, 21(10), 569–575. Eyre-Walker, A. and Keightley, P. D. (2007) The distribution of fitness effects of new mutations, Nature Reviews Genetics. Nature Publishing Group, 8, 610. Falconer, D. S. and Mackay, T. F. C. (1996) Introduction to , Prentice Hall. Harlow, England. Fay, J. C.,Wyckoff, G. J. and Wu, C. I. (2001) Positive and negative selection on the human genome., Genetics. United States, 158(3), 1227–1234. Feschotte, C. and Pritham, E. J. (2007) DNA Transposons and the Evolution of Eukaryotic Genomes, Annual Review of Genetics, 41, 331–68. Fierst, J. L. and Hansen, T. F. (2010) Genetic architecture and postzygotic reproductive isolation: Evolution of Bateson-Dobzhansky-Muller incompatibilities in a polygenic model, Evolution, 64(3), 675–693. Fisher, R. (1930) The genetical theory of natural selection, The Clarendon Press. Foll, M. and Gaggiotti, O. (2008) A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective, Genetics, 180(2), 977–993. Franks, S. J. and Hoffmann, A. A. (2012) Genetics of Climate Change Adaptation, Annual Review of Genetics, 46, 185–208. Friberg, M.,Bergman, M.,Kullberg, J.,Wahlberg, N. and Wiklund, C. (2008) Niche separation in space and time between two sympatric sister species-a case of ecological pleiotropy, , 22(1), 1–18. Friberg, M. and Wiklund, C. (2007) Generation-dependent female choice: Behavioral polyphenism in a bivoltine butterfly, Behavioral Ecology, 18(4), 758–763. Friberg, M. and Wiklund, C. (2010) Host-plant-induced larval decision-making in a habitat/host-plant generalist butterfly, Ecology, 91(1), 15–21. Frichot, E.,Mathieu, F.,Trouillon, T.,Bouchard, G. and Francois, O. (2014) Fast and efficient estimation of individual ancestry coefficients., Genetics. United States, 196(4), 973–983. Fumi, M. (2008) Distinguishing between Leptidea sinapis and L. reali (Lepidoptera: Pieridae) using a morphometric approach: Impact of measurement error on the discriminative characters, Zootaxa, 1819, 40–54. Futuyma, D. J. (2014) Evolution, Sunderland, MA: Sinauer Associates. Galinsky, K. J.,Loh, P. R.,Mallick, S.,Patterson, N. J. and Price, A. L. (2016) Population Structure of UK Biobank and Ancient Eurasians Reveals Adaptation at Genes Influencing Blood Pressure, American Journal of Human Genetics, 95(5), 1130–1139.

60 Garcia-Perez, J. L.,Doucet, A. J.,Kopera, H. C.,Richardson, S. R.,Moldovan, J. B. and Moran, J. V. (2015) The Influence of LINE-1 and SINE Retrotransposons on Mammalian Genomes, Microbiology Spectrum, 3(2), MDNA3-0061. Gavrilets, S.,Li, H. and Vose, M. D. (1998) Rapid on holey adaptive landscapes, Proceedings of the Royal Society B: Biological Sciences, 265(1405), 1483–1489. Gavrilets, S. and Vose, A. (2005) Dynamic patterns of , Proceedings of the National Academy of Sciences. National Acad Sciences, 102(50), 18040–18045. Gilad, Y.,Pritchard, J. K. and Thornton, K. (2009) Characterizing natural variation using next-generation sequencing technologies., Trends in genetics : TIG. England, 25(10), 463–471. Gillespie, J. H. (2001) Is the population size of a species relevant to its evolution?, Evolution; international journal of organic evolution. United States, 55(11), 2161–2169. Girardot, F.,Monnier, V. and Tricoire, H. (2004) Genome wide analysis of common and specific stress responses in adult drosophila melanogaster., BMC genomics. England, 5, 74. Gompel, N.,Prud’homme, B.,Wittkopp, P. J.,Kassner, V. A. and Carroll, S. B. (2005) Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila, Nature. Macmillan Magazines Ltd., 433, 481. Gompert, Z. and Buerkle, C. A. (2009) A powerful regression-based method for admixture mapping of isolation across the genome of hybrids, Molecular Ecology, 18(6), 1207–24. Gompert, Z.,Lucas, L. K.,Nice, C. C.,Fordyce, J. A.,Forister, M. L. and Buerkle, C. A. (2012) Genomic regions with a history of divergent selection affect fitness of hybrids between two butterfly species, Evolution, 66(7), 2167–2181. Gossmann, T. I.,Keightley, P. D. and Eyre-Walker, A. (2012) The effect of variation in the effective population size on the rate of adaptive molecular evolution in eukaryotes., Genome biology and evolution. England, 4(5), 658–667. Grabherr, M. G.,Haas, B. J.,Yassour, M.,Levin, J. Z.,Thompson, D. A.,Amit, I.,Adiconis, X.,Fan, L.,Raychowdhury, R.,Zeng, Q.,Chen, Z.,Mauceli, E.,Hacohen, N.,Gnirke, A.,Rhind, N.,di Palma, F.,Birren, B. W.,Nusbaum, C.,Lindblad-Toh, K.,Friedman, N. and Regev, A. (2011) Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data, Nature biotechnology, 29(7), 644–652. Grant, P. R. and Grant, B. R. (2002) Unpredictable evolution in a 30-year study of Darwin’s finches, Science, 296(5568), 707–711. Green, R. E.,Krause, J.,Briggs, A. W.,Maricic, T.,Stenzel, U.,Kircher, M.,Patterson, N.,Li, H.,Zhai, W.,Fritz, M. H. Y.,Hansen, N. F.,Durand, E. Y.,Malaspinas, A. S.,Jensen, J. D.,Marques-Bonet, T.,Alkan, C.,Prüfer, K.,Meyer, M.,Burbano, H. A.,Good, J. M.,Schultz, R.,Aximu-Petri, A.,Butthof, A.,Höber, B.,Höffner, B.,Siegemund, M.,Weihmann, A.,Nusbaum, C.,Lander, E. S.,Russ, C.,Novod, N.,Affourtit, J.,Egholm, M.,Verna, C.,Rudan, P.,Brajkovic, D.,Kucan, Ž.,Gušic, I.,Doronichev, V. B.,Golovanova, L. V.,Lalueza-Fox, C.,De La Rasilla, M.,Fortea, J.,Rosas, A.,Schmitz, R. W.,Johnson, P. L. F.,Eichler, E. E.,Falush, D.,Birney, E.,Mullikin, J. C.,Slatkin, M.,Nielsen, R.,Kelso, J.,Lachmann, M.,Reich, D. and Pääbo, S. (2010) A draft sequence of the neandertal genome, Science, 328(5979), 710–722. Gregory, T. R. and Hebert, P. D. N. (1999) The modulation of DNA content: proximate causes and ultimate consequences, Genome Research. Cold Spring Harbor Lab, 9(4), 317–324.

61 Griffin, D. K.,Robertson, L. B. W.,Tempest, H. G. and Skinner, B. M. (2007) The evolution of the avian genome as revealed by comparative molecular ., Cytogenetic and genome research. Switzerland, 117(1–4), 64–77. Haasl, R. J. and Payseur, B. A. (2016) Fifteen years of genomewide scans for selection: trends, lessons and unaddressed genetic sources of complication., Molecular ecology. England, 25(1), 5–23. Haldane, J. B. S. (1922) Sex ratio and unisexual sterility in hybrid , Journal of Genetics, 12(2), 101–109. Haldane, J. B. S. (1934) A Mathematical Theory of Natural and Artificial Selection Part X. Some Theorems on Artificial Selection, Genetics, 19(5), 412–429. Harrison, R. G. (1979) Speciation in North American Field Crickets: Evidence from Electrophoretic Comparisons, Evolution, 33(4), 1009–1023. Harrison, R. G. (1990) Hybrid zones: windows on evolutionary process, Oxford Surveys in Evolutionary Biology, 7, 69–128. Hebert, P. D. N.,Cywinska, A.,Ball, S. L. and DeWaard, J. R. (2003) Biological identifications through DNA barcodes, Proceedings of the Royal Society B: Biological Sciences, 270(1512), 313–21. Hedges, S. B. (2002) The origin and evolution of model organisms, Nature Reviews Genetics, 3(11), 838–849. Hedges, S. B. and Kumar, S. (2002) Vertebrate Genomes Compared, Science. American Association for the Advancement of Science, 297(5585), 1283–1285. Helbig, A. J.,Martens, J.,Seibold, I.,Henning, F.,Schottler, B. and Wink, M. (1996) Phylogeny and species limits in the Palaearctic chiffchaff Phylloscopus collybita complex: mitochondrial genetic differentiation and bioacoustic evidence, Ibis. Wiley Online Library, 138(4), 650–666. Hershey, A. D. and Chase, M. (1952) Independent functions of viral protein and nucleic acid in growth of bacteriophage., The Journal of general physiology. United States, 36(1), 39–56. Hewitt, G. (2000) The genetic legacy of the quaternary ice ages, Nature, 405(6789), 907–913. Hewitt, G. M. (1999) Post-glacial re-colonization of European biota, Biological Journal of the Linnean Society, 68(1), 87–112. Hewitt, G. M. (2001) Speciation, hybrid zones and phylogeography—or seeing genes in space and time, Molecular ecology. Wiley Online Library, 10(3), 537–549. Hill, W. G. and Robertson, A. (1966) The effect of linkage on limits to artificial selection, Genetics Research. Cambridge University Press, 8(3), 269–294. Hoyo, J.,Elliott, A.,Christie, D. and Sargatal, J. (2006) Handbook of the Birds of the World: Old world flycatchers to old world warblers. Lynx Edicions (Handbook of the Birds of the World). Hubby, J. L. and Lewontin, R. C. (1966) A Molecular Approach to the Study of Genic Heterozygosity in Natural Populations. I. the Number of Alleles at Different Loci in DROSOPHILA PSEUDOOBSCURA, Genetics, 54(2), 577–594. Hurst, G. D. D. and Werren, J. H. (2001) The role of selfish genetic elements in eukaryotic evolution, Nature Reviews Genetics, 2(8), 597–606. Iijima, T.,Kajitani, R.,Komata, S.,Lin, C.-P.,Sota, T.,Itoh, T. and Fujiwara, H. (2018) Parallel evolution of Batesian mimicry supergene in two Papilio butterflies, P. polytes and P. memnon, Science Advances. American Association for the Advancement of Science, 4(4). Isabel, G.,Gourdoux, L. and Moreau, R. (2001) Changes of biogenic amine levels in haemolymph during diapausing and non-diapausing status in Pieris brassicae L., Comparative Biochemistry and Physiology - A Molecular and Integrative Physiology, 128(1), 117–127.

62 Jeukens, J.,Bittner, D.,Knudsen, R. and Bernatchez, L. (2009) Candidate genes and adaptive radiation: Insights from transcriptional adaptation to the limnetic niche among coregonine fishes (Coregonus spp., Salmonidae), Molecular Biology and Evolution, 26(1), 155–166. Jiggins, C. D.,McMillan, W. O.,Neukirchen, W. and Mallet., J. (1996) What can hybrid zones tell us about speciation? The case of Heliconius erato and H. himera (Lepidoptera: Nymphalidae), Biological Journal of the Linnean Society, 59(3), 221–242. Jiggins, C. D.,Naisbit, R. E.,Coe, R. L. and Mallet, J. (2001) Reproductive isolation caused by colour pattern mimicry, Nature, 411(6835), 302–305. Keller, I.,Bensasson, D. and Nichols, R. A. (2007) Transition-Transversion Bias Is Not Universal: A Counter Example from Grasshopper Pseudogenes, PLOS Genetics. Public Library of Science, 3(2), 1–7. Kellis, M.,Wold, B.,Snyder, M. P.,Bernstein, B. E.,Kundaje, A.,Marinov, G. K.,Ward, L. D.,Birney, E.,Crawford, G. E.,Dekker, J.,Dunham, I.,Elnitski, L. L.,Farnham, P. J.,Feingold, E. A.,Gerstein, M.,Giddings, M. C.,Gilbert, D. M.,Gingeras, T. R.,Green, E. D.,Guigo, R.,Hubbard, T.,Kent, J.,Lieb, J. D.,Myers, R. M.,Pazin, M. J.,Ren, B.,Stamatoyannopoulos, J. A.,Weng, Z.,White, K. P. and Hardison, R. C. (2014) Defining functional DNA elements in the human genome., Proceedings of the National Academy of Sciences of the United States of America, 111(17), 6131–6138. Kim, J.,Larkin, D. M.,Cai, Q.,Asan,Zhang, Y.,Ge, R.-L.,Auvil, L.,Capitanu, B.,Zhang, G.,Lewin, H. A. and Ma, J. (2013) Reference-assisted chromosome assembly, Proceedings of the National Academy of Sciences, 110(5), 1785–1790. Kimura, M. (1979) The Neutral Theory of Molecular Evolution, Scientific American. Scientific American, a division of Nature America, Inc., 241(5), 98–129. Kimura, M. (1991) The neutral theory of molecular evolution: a review of recent evidence., Japanese Journal of Genetics, 66(4), 367–386. Kimura, M. and Ohta, T. (1969) The Average Number of Generations until Fixation of a Mutant Gene in a Finite Population, Genetics, 61(3), 763–771. Kolmogorov, M.,Raney, B.,Paten, B. and Pham, S. (2014) Ragout - A reference- assisted assembly tool for bacterial genomes, Bioinformatics, 30(12), i302–i309. de Koning, A. P. J.,Gu, W.,Castoe, T. A.,Batzer, M. A. and Pollock, D. D. (2011) Repetitive elements may comprise over Two-Thirds of the human genome, PLoS Genetics, 7(12), e1002384. Korneliussen, T. S.,Albrechtsen, A. and Nielsen, R. (2014) ANGSD: Analysis of Next Generation Sequencing Data, BMC Bioinformatics, 15(1), 356. Kulathinal, R. J.,Stevison, L. S. and Noor, M. A. F. (2009) The genomics of speciation in Drosophila: Diversity, divergence, and introgression estimated using low- coverage genome sequencing, PLoS Genetics, 5(7), e1000550. Langmead, B.,Trapnell, C.,Pop, M. and Salzberg, S. (2009) Ultrafast and memory- efficient alignment of short DNA sequences to the human genome, Genome biology, 10(3), R25. Larracuente, A. M.,Sackton, T. B.,Greenberg, A. J.,Wong, A.,Singh, N. D.,Sturgill, D.,Zhang, Y.,Oliver, B. and Clark, A. G. (2008) Evolution of protein-coding genes in Drosophila, Trends in Genetics, 24(3), 114–123. Lässig, M.,Mustonen, V. and Walczak, A. M. (2017) Predicting evolution, Nature Ecology and Evolution, 1(3), 0077. Leal, L.,Talla, V.,Källman, T.,Friberg, M.,Wiklund, C.,Dincă, V.,Vila, R. and Backström, N. (2018) Gene expression profiling across ontogenetic stages in the wood white (Leptidea sinapis) reveals pathways linked to butterfly diapause regulation, Molecular Ecology, 27(4), 935–948.

63 Lee, S.-I. and Kim, N.-S. (2014) Transposable Elements and Genome Size Variations in Plants, Genomics & Informatics, 12(3), 87–97. Leek, J. T.,Johnson, W. E.,Parker, H. S.,Jaffe, A. E. and Storey, J. D. (2012) The sva package for removing batch effects and other unwanted variation in high- throughput experiments., Bioinformatics (Oxford, England). England, 28(6), 882–883. Leffler, E. M.,Bullaughey, K.,Matute, D. R.,Meyer, W. K.,Ségurel, L.,Venkat, A.,Andolfatto, P. and Przeworski, M. (2012) Revisiting an Old Riddle: What Determines Genetic Diversity Levels within Species?, PLOS Biology. Public Library of Science, 10(9), 1–9. Levene, H. (1953) Genetic equilibrium when more than one ecological niche is available, The American Naturalist. Science Press, 87(836), 331–333. Levin, D. A.,Francisco-Ortega, J. and Jansen, R. K. (1996) Hybridization and the extinction of rare plant species, , 10(1), 10–16. Lewontin, R. C. and Hubby, J. L. (1966) A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura, Genetics. Genetics Society of America, 54(2), 595. Lewontin, R. C. and Krakauer, J. (1973) Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms, Genetics, 74(1), 175–195. Lexer, C.,Buerkle, C. A.,Joseph, J. A.,Heinze, B. and Fay, M. F. (2007) Admixture in European Populus hybrid zones makes feasible the mapping of loci that contribute to reproductive isolation and trait differences, Heredity, 98(2), 74. Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows- Wheeler transform, Bioinformatics, 25(14), 1754–1760. Li, H. and Durbin, R. (2011) Inference of human population history from individual whole-genome sequences, Nature, 475(7357), 493–496. Li, W. H.,Wu, C. I. and Luo, C. C. (1985) A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes., Molecular biology and evolution. United States, 2(2), 150–174. Lindholm, A. (2008) Mixed songs of chiffchaffs in Northern ., Alula, 3, 108– 115. Lindtke, D.,Lucek, K.,Soria-Carrasco, V.,Villoutreix, R.,Farkas, T. E.,Riesch, R.,Dennis, S. R.,Gompert, Z. and Nosil, P. (2017) Long-term balancing selection on chromosomal variants associated with crypsis in a stick insect, Molecular Ecology, 26(22), 6189–6205. Linnaeus, C. von (1758) Systema Naturae per regna tria naturae. Secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis, Editio, 1(10), 823. Liti, G.,Carter, D. M.,Moses, A. M.,Warringer, J.,Parts, L.,James, S. A.,Davey, R. P.,Roberts, I. N.,Burt, A.,Koufopanou, V.,Tsai, I. J.,Bergman, C. M.,Bensasson, D.,O’Kelly, M. J. T.,van Oudenaarden, A.,Barton, D. B. H.,Bailes, E.,Nguyen, A. N.,Jones, M.,Quail, M. A.,Goodhead, I.,Sims, S.,Smith, F.,Blomberg, A.,Durbin, R. and Louis, E. J. (2009) Population genomics of domestic and wild yeasts, Nature. Macmillan Publishers Limited. All rights reserved, 458, 337. Liu, S.,Lorenzen, E. D.,Fumagalli, M.,Li, B.,Harris, K.,Xiong, Z.,Zhou, L.,Korneliussen, T. S.,Somel, M.,Babbitt, C.,Wray, G.,Li, J.,He, W.,Wang, Z.,Fu, W.,Xiang, X.,Morgan, C. C.,Doherty, A.,O’Connell, M. J.,McInerney, J. O.,Born, E. W.,Dalén, L.,Dietz, R.,Orlando, L.,Sonne, C.,Zhang, G.,Nielsen, R.,Willerslev, E. and Wang, J. (2014) Population genomics reveal recent

64 speciation and rapid evolutionary adaptation in polar bears, Cell, 157(4), 785– 794. Liu, Y.,Beyer, A. and Aebersold, R. (2016) On the Dependency of Cellular Protein Levels on mRNA Abundance, Cell, 165(3), 535–550. Locke, D. P.,Hillier, L. W.,Warren, W. C.,Worley, K. C.,Nazareth, L. V.,Muzny, D. M.,Yang, S. P.,Wang, Z.,Chinwalla, A. T.,Minx, P.,Mitreva, M.,Cook, L.,Delehaunty, K. D.,Fronick, C.,Schmidt, H.,Fulton, L. A.,Fulton, R. S.,Nelson, J. O.,Magrini, V.,Pohl, C.,Graves, T. A.,Markovic, C.,Cree, A.,Dinh, H. H.,Hume, J.,Kovar, C. L.,Fowler, G. R.,Lunter, G.,Meader, S.,Heger, A.,Ponting, C. P.,Marques-Bonet, T.,Alkan, C.,Chen, L.,Cheng, Z.,Kidd, J. M.,Eichler, E. E.,White, S.,Searle, S.,Vilella, A. J.,Chen, Y.,Flicek, P.,Ma, J.,Raney, B.,Suh, B.,Burhans, R.,Herrero, J.,Haussler, D.,Faria, R.,Fernando, O.,Darré, F.,Farré, D.,Gazave, E.,Oliva, M.,Navarro, A.,Roberto, R.,Capozzi, O.,Archidiacono, N.,Valle, G. Della,Purgato, S.,Rocchi, M.,Konkel, M. K.,Walker, J. A.,Ullmer, B.,Batzer, M. A.,Smit, A. F. A.,Hubley, R.,Casola, C.,Schrider, D. R.,Hahn, M. W.,Quesada, V.,Puente, X. S.,Ordõez, G. R.,Ĺpez-Otín, C.,Vinar, T.,Brejova, B.,Ratan, A.,Harris, R. S.,Miller, W.,Kosiol, C.,Lawson, H. A.,Taliwal, V.,Martins, A. L.,Siepel, A.,Roychoudhury, A.,Ma, X.,Degenhardt, J.,Bustamante, C. D.,Gutenkunst, R. N.,Mailund, T.,Dutheil, J. Y.,Hobolth, A.,Schierup, M. H.,Ryder, O. A.,Yoshinaga, Y.,De Jong, P. J.,Weinstock, G. M.,Rogers, J.,Mardis, E. R.,Gibbs, R. A. and Wilson, R. K. (2011) Comparative and demographic analysis of orang-utan genomes, Nature, 469(7331), 529–33. López-Maury, L.,Marguerat, S. and Bähler, J. (2008) Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation, Nature Reviews Genetics. Nature Publishing Group, 9, 583. Lorkovic, Z. (1993) Leptidea reali Reissinger, 1989 (= lorkovicii Real 1988), a new European species (Lepid., Pieridae), Natura Croatica, 2(1), 1–26. Love, M. I.,Huber, W. and Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome biology. BioMed Central, 15(12), 550. Luikart, G.,England, P. R.,Tallmon, D.,Jordan, S. and Taberlet, P. (2003) The power and promise of population genomics: From genotyping to genome typing, Nature Reviews Genetics, 4(12), 981–994. Lukhtanov, V. A.,Dinc, V.,Talavera, G. and Vila, R. (2011) Unprecedented within- species chromosome number cline in the Wood White butterfly Leptidea sinapis and its significance for karyotype evolution and speciation, BMC Evolutionary Biology, 11(1), 109. Lunter, G. and Goodson, M. (2011) Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads, Genome Research, 21(6), 936–939. Luo, R.,Liu, B.,Xie, Y.,Li, Z.,Huang, W.,Yuan, J.,He, G.,Chen, Y.,Pan, Q.,Liu, Y.,Tang, J.,Wu, G.,Zhang, H.,Shi, Y.,Liu, Y.,Yu, C.,Wang, B.,Lu, Y.,Han, C.,Cheung, D. W.,Yiu, S.-M.,Peng, S.,Xiaoqian, Z.,Liu, G.,Liao, X.,Li, Y.,Yang, H.,Wang, J.,Lam, T.-W. and Wang, J. (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler, GigaScience, 1(1), 18. Lynch, M. (2007) The Origins of Genome Architecture, Science. New York. Ma, S. and Bohnert, H. J. (2007) Integration of Arabidopsis thaliana stress-related transcript profiles, promoter structures, and cell-specific expression., Genome biology. England, 8(4), R49. Macias-Munoz, A.,Smith, G.,Monteiro, A. and Briscoe, A. D. (2016) Transcriptome- Wide Differential Gene Expression in Bicyclus anynana Butterflies: Female Vision-Related Genes Are More Plastic., Molecular biology and evolution. United States, 33(1), 79–92.

65 Mager, D. L. and Stoye, J. P. (2015) Mammalian Endogenous Retroviruses., in Mobile DNA III. United States: American Society of Microbiology, 1079–1100. Mallet, J. (2007a) Hybrid speciation, Nature, 446(7133), 279. Mallet, J. (2007b) SPECIES, CONCEPTS OF, Encyclopedia of Biodiversity, 5, 427– 440. Mallet, J. (2008) Hybridization, ecological races and the nature of species: Empirical evidence for the ease of speciation, Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1506), 2971–2986. Mallet, J.,Besansky, N. and Hahn, M. W. (2016) How reticulated are species?, BioEssays. Wiley Online Library, 38(2), 140–149. Mank, J. E.,Nam, K.,Brunström, B. and Ellegren, H. (2010) Ontogenetic Complexity of Sexual Dimorphism and Sex-Specific Selection, Molecular Biology and Evolution, 27(7), 1570–1578. Marguerat, S.,Schmidt, A.,Codlin, S.,Chen, W.,Aebersold, R. and Bähler, J. (2012) Quantitative Analysis of Fission Yeast Transcriptomes and Proteomes in Proliferating and Quiescent Cells, Cell, 151(3), 671–683. Margulies, M.,Egholm, M.,Altman, W. E.,Attiya, S.,Bader, J. S.,Bemben, L. A.,Berka, J.,Braverman, M. S.,Chen, Y.-J.,Chen, Z.,Dewell, S. B.,Du, L.,Fierro, J. M.,Gomes, X. V,Godwin, B. C.,He, W.,Helgesen, S.,Ho, C. H.,Irzyk, G. P.,Jando, S. C.,Alenquer, M. L. I.,Jarvie, T. P.,Jirage, K. B.,Kim, J.-B.,Knight, J. R.,Lanza, J. R.,Leamon, J. H.,Lefkowitz, S. M.,Lei, M.,Li, J.,Lohman, K. L.,Lu, H.,Makhijani, V. B.,McDade, K. E.,McKenna, M. P.,Myers, E. W.,Nickerson, E.,Nobile, J. R.,Plant, R.,Puc, B. P.,Ronan, M. T.,Roth, G. T.,Sarkis, G. J.,Simons, J. F.,Simpson, J. W.,Srinivasan, M.,Tartaro, K. R.,Tomasz, A.,Vogt, K. A.,Volkmer, G. A.,Wang, S. H.,Wang, Y.,Weiner, M. P.,Yu, P.,Begley, R. F. and Rothberg, J. M. (2005) Genome sequencing in microfabricated high-density picolitre reactors, Nature. The Author(s), 437, 376. Marova, I. M. and Alekseev, V. N. (2008) Population structure and distribution of vocal dialects of chiffchaff (Phylloscopus collybita) in southern Ural mountains, Proceedings of the South Ural State Natural Reserve, 1, 306–318. Marova, I. M.,Fedorov, V. V,Shipilina, D. A. and Alekseev, V. N. (2009) Genetic and vocal differentiation in hybrid zones of birds: Siberian and European chiffchaffs (Phylloscopus [collybita] tristis and Ph. [c.] abietinus) in the southern Urals., in Doklady biological sciences : proceedings of the Academy of Sciences of the USSR, Biological sciences sections. United States: SP MAIK Nauka/Interperiodica., 384–386. Marova, I. M.,Shipilina, D. A.,Fedorov, V. V and Ivanitskii, V. V (2013) Siberian and East European chiffchaffs: Geographical distribution, morphological features, vocalization, phenomenon of mixed singing, and evidences of hybridization in sympatry zone, El mosquitero ibérico. Grupo Iberico de Anillamiento Leon, 119– 139. Martin, J.-F.,Gilles, A. and Descimon, H. (2003) Species Concepts and Sibling Species: The Case of Leptidea sinapis and Leptidea reali, Butterflies: ecology and evolution taking flight, 459–476. Martin, J. A. and Wang, Z. (2011) Next-generation transcriptome assembly., Nature reviews. Genetics. England, 12(10), 671–682. Martin, S. H.,Dasmahapatra, K. K.,Nadeau, N. J.,Salazar, C.,Walters, J. R.,Simpson, F.,Blaxter, M.,Manica, A.,Mallet, J. and Jiggins, C. D. (2013) Genome-wide evidence for speciation with gene flow in Heliconius butterflies, Genome Research, 23(11), 1817–1828.

66 Martin, S. H.,Davey, J. W. and Jiggins, C. D. (2015) Evaluating the use of ABBA- BABA statistics to locate introgressed loci, Molecular Biology and Evolution, 32(1), 244–257. Martínez Wells, M. and Henry, C. S. (1992) Behavioural responses of green lacewings (Neuroptera: Chrysopidae: Chrysoperla) to synthetic mating songs, Behaviour. Academic Press, 44(4), 641–652. Masly, J. P. (2012) 170 Years of ‘Lock-and-Key’: Genital Morphology and Reproductive Isolation, International Journal of Evolutionary Biology. Mayr, E. (1942) Systematics and the Origin of Species, from the Viewpoint of a Zoologist. Harvard University Press. Mayr, E. (1963) Animal species and evolution. Harvard University Press, Cambridge, MA. Mayr, E. (1970) Populations, species, and evolution: an abridgment of animal species and evolution. Harvard University Press, Cambridge, MA. Mayr, E. (1982) The Growth of Biological Thought. Diversity, Evolution, and Inheritance, Harvard University Press. McClintock, B. (1956) Intranuclear systems controlling gene action and mutation., in Brookhaven Symp Biol. United States, 58–74. McDonald, J. H. and Kreitman, M. (1991) Adaptive protein evolution at the Adh locus in Drosophila, Nature, 351(6328), 652. McKenna, A.,Hanna, M.,Banks, E.,Sivachenko, A.,Cibulskis, K.,Kernytsky, A.,Garimella, K.,Altshuler, D.,Gabriel, S.,Daly, M. and DePristo, M. A. (2010) The genome analysis toolkit: A MapReduce framework for analyzing next- generation DNA sequencing data, Genome Research, 20(9), 1297–1303. Meier, K.,Hansen, M. M.,Normandeau, E.,Mensberg, K.-L. D.,Frydenberg, J.,Larsen, P. F.,Bekkevold, D. and Bernatchez, L. (2014) Local Adaptation at the Transcriptome Level in Brown Trout: Evidence from Early Life History Temperature Genomic Reaction Norms, PLOS ONE. Public Library of Science, 9(1), 1–13. Mendel, G. (1866) Versuche über Pflanzenhybriden, Verhandlungen des naturforschenden Vereines in Brunn 4: 3, 44. Morris, M. R. J.,Richard, R.,Leder, E. H.,Barrett, R. D. H.,Aubin-Horth, N. and Rogers, S. M. (2014) Gene expression plasticity evolves in response to colonization of freshwater lakes in threespine stickleback., Molecular ecology. England, 23(13), 3226–3240. Mortazavi, A.,Williams, B. A.,McCue, K.,Schaeffer, L. and Wold, B. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq, Nature Methods, 5(7), 621–628. Mudge, J. M. and Harrow, J. (2016) The state of play in higher eukaryote gene annotation, Nature Reviews Genetics, 17(12), 758–772. Muller, H. J. (1942) Isolating mechanisms, evolution and temperature, Biol. Symp, 161(3), 939–944. Murray, G. G. R.,Soares, A. E. R.,Novak, B. J.,Schaefer, N. K.,Cahill, J. A.,Baker, A. J.,Demboski, J. R.,Doll, A.,Da Fonseca, R. R.,Fulton, T. L.,Gilbert, T. P.,Heintzman, P. D.,Letts, B.,McIntosh, G.,O’Connell, B. L.,Peck, M.,Pipes, M. L.,Rice, E. S.,Santos, K. M.,Sohrweide, A. G.,Vohr, S. H.,Corbett-Detig, R. B.,Green, R. E. and Shapiro, B. (2017) Natural selection shaped the rise and fall of passenger pigeon genomic diversity, Science, 358(6365), 951–954. Murray, J. I.,Whitfield, M. L.,Trinklein, N. D.,Myers, R. M.,Brown, P. O. and Botstein, D. (2004) Diverse and specific gene expression responses to stresses in cultured human cells., Molecular biology of the cell. United States, 15(5), 2361– 2374.

67 Nadachowska-Brzyska, K.,Burri, R.,Smeds, L. and Ellegren, H. (2016) PSMC analysis of effective population sizes in molecular ecology and its application to black-and-white Ficedula flycatchers, Molecular Ecology, 25(5), 1058–1072. Nadeau, N. J.,Ruiz, M.,Salazar, P.,Counterman, B.,Medina, J. A.,Ortiz-Zuazaga, H.,Morrison, A.,McMillan, W. O.,Jiggins, C. D. and Papa, R. (2014) Population genomics of parallel hybrid zones in the mimetic butterflies, H. melpomene and H. erato, Genome Research, 24(8), 1316–1333. Nei, M. (1973) Analysis of gene diversity in subdivided populations, Proceedings of the National Academy of Sciences. National Acad Sciences, 70(12), 3321–3323. Nei, M. (1987) Molecular evolutionary genetics. Columbia university press. Nei, M. and Li, W.-H. (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases., Proceedings of the National Academy of Sciences of the United States of America, 76(10), 5269–5273. Nielsen, R.,Hellmann, I.,Hubisz, M.,Bustamante, C. and Clark, A. G. (2007) Recent and ongoing selection in the human genome, Nature Reviews Genetics, 8(11), 857. Van Nieukerken, E.,Kaila, L.,Kitching, I.,Kristensen, N.,Lees, D.,Minet, J.,Mitter, C.,Mutanen, M.,Regier, J.,Simonsen, T. and others (2011) Order Lepidoptera Linnaeus, 1758., in. Zootaxa. Noguchi, H. and Hayakawa, Y. (2001) Dopamine is a key factor for the induction of egg diapause of the silkworm, Bombyx mori, European Journal of Biochemistry, 268(3), 774–780. Nolte, A. W.,Gompert, Z. and Buerkle, C. A. (2009) Variable patterns of introgression in two sculpin hybrid zones suggest that genomic isolation differs among populations, Molecular Ecology, 18(12), 2615–2627. Noor, M. A. F. and Feder, J. L. (2006) Speciation genetics: Evolving approaches, Nature Reviews Genetics, 7(11), 851. Nosil, P.,Crespi, B. J. and Sandoval, C. P. (2002) Host-plant adaptation drives the parallel evolution of reproductive isolation, Nature, 417(6887), 440. Nosil, P.,Funk, D. J. and Ortiz-Barrientos, D. (2009) Divergent selection and heterogeneous genomic divergence., Molecular ecology. England, 18(3), 375– 402. Nosil, P. and Schluter, D. (2011) The genes underlying the process of speciation, Trends in Ecology and Evolution, 26(4), 160–167. Nosil, P.,Villoutreix, R.,de Carvalho, C. F.,Farkas, T. E.,Soria-Carrasco, V.,Feder, J. L.,Crespi, B. J. and Gompert, Z. (2018) Natural selection and the predictability of evolution inTimemastick insects., Science, 359(6377), 765–770. Obbard, D. J.,Welch, J. J.,Kim, K. W. and Jiggins, F. M. (2009) Quantifying adaptive evolution in the Drosophila immune system, PLoS Genetics. Ometto, L.,Shoemaker, D.,Ross, K. G. and Keller, L. (2011) Evolution of gene expression in fire ants: the effects of developmental stage, caste, and species., Molecular biology and evolution. United States, 28(4), 1381–1392. Orr, H. A. (1996) Dobzhansky, Bateson, and the genetics of speciation, Genetics, 144(4), 1331. Orr, H. A. (2001) The genetics of species differences., Trends in ecology & evolution. England, 16(7), 343–350. Orr, H. A.,Masly, J. P. and Presgraves, D. C. (2004) Speciation genes., Current opinion in genetics & development. England, 14(6), 675–679. Pal, C.,Papp, B. and Hurst, L. D. (2001) Highly Expressed Genes in Yeast Evolve Slowly, Genetics, 158(2), 927–931. Parra, G.,Bradnam, K. and Korf, I. (2007) CEGMA: A pipeline to accurately annotate core genes in eukaryotic genomes, Bioinformatics, 23(9), 1061–1067.

68 Petrov, D. A. (2001) Evolution of genome size: new approaches to an old problem, TRENDS in Genetics. Elsevier, 17(1), 23–28. Petrov, D. A. (2002) DNA loss and evolution of genome size in Drosophila, Genetica. Springer, 115(1), 81–91. Poelstra, J. W.,Vijay, N.,Bossu, C. M.,Lantz, H.,Ryll, B.,Müller, I.,Baglione, V.,Unneberg, P.,Wikelski, M.,Grabherr, M. G. and Wolf, J. B. W. (2014) The genomic landscape underlying phenotypic integrity in the face of gene flow in crows, Science, 344(1690), 1410–1414. Pool, J. E.,Hellmann, I.,Jensen, J. D. and Nielsen, R. (2010) Population genetic inference from genomic sequence variation, Genome Research, 20(3), 291–300. Presgraves, D. C. (2010) The molecular evolutionary basis of species formation, Nature Reviews Genetics, 11(3), 175–180. Presgraves, D. C.,Balagopalan, L.,Abmayr, S. M. and Orr, H. A. (2003) Adaptive evolution drives divergence of a hybrid inviability gene between two species of Drosophila, Nature, 423(6941), 715–719. Price, A. L.,Patterson, N. J.,Plenge, R. M.,Weinblatt, M. E.,Shadick, N. A. and Reich, D. (2006) Principal components analysis corrects for stratification in genome- wide association studies, Nature Genetics, 38(8), 904–909. Pringle, E. G.,Baxter, S. W.,Webster, C. L.,Papanicolaou, A.,Lee, S. F. and Jiggins, C. D. (2007) Synteny and chromosome evolution in the Lepidoptera: evidence from mapping in Heliconius melpomene, Genetics. Genetics Soc America, 177(1), 417–426. Pritchard, J. K.,Stephens, M. and Donnelly, P. (2000) Inference of population structure using multilocus genotype data, Genetics, 155(2), 945–959. Pritham, E. J. (2009) Transposable elements and factors influencing their success in eukaryotes, Journal of Heredity, 100(5), 648–655. Quail, M. A.,Smith, M.,Coupland, P.,Otto, T. D.,Harris, S. R.,Connor, T. R.,Bertoni, A.,Swerdlow, H. P. and Gu, Y. (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics, 13(1), 341. De Queiroz, K. (2007) Species concepts and species delimitation, Systematic Biology, 56(6), 879–886. Ran, F. A.,Hsu, P. D.,Lin, C.-Y.,Gootenberg, J. S.,Konermann, S.,Trevino, A. E.,Scott, D. A.,Inoue, A.,Matoba, S.,Zhang, Y. and others (2013) Double nicking by RNA-guided CRISPR Cas9 for enhanced specificity, Cell. Elsevier, 154(6), 1380–1389. Ranz, J. M.,Castillo-Davis, C. I.,Meiklejohn, C. D. and Hartl, D. L. (2003) Sex- Dependent Gene Expression and Evolution of the Drosophila Transcriptome, Science. American Association for the Advancement of Science, 300(5626), 1742–1745. Réal, P. (1988) Lépidoptères nouveaux principalement jurassiens, Mémoires du comité de liaison pour les recherches ecofaunistiques dans le Jura., Publication périodique, 17–18. Rebollo, R.,Romanish, M. T. and Mager, D. L. (2012) Transposable Elements: An Abundant and Natural Source of Regulatory Sequences for Host Genes, Annual Review of Genetics, 46, 21–42. Rieseberg, L. H.,Archer, M. A. and Wayne, R. K. (1999) Transgressive segregation, adaptation and speciation, Heredity, 83(4), 363. Rieseberg, L. H.,Whitton, J. and Gardner, K. (1999) Hybrid zones and the genetic architecture of a barrier to gene flow between two sunflower species, Genetics, 152(2), 713–727.

69 Robertson, A. (1961) in artificial selection programmes, Genetics Research. Cambridge University Press, 2(2), 189–194. Robertson, A. (1975) Remarks on the Lewontin-Krakauer test, Genetics. Genetics, 80(2), 396. Rockman, M. V. (2012) The QTN program and the alleles that matter for evolution: All that’s gold does not glitter, Evolution, 66(1), 1–17. Rogers, S. M. and Bernatchez, L. (2005) Integrating QTL mapping and genome scans towards the characterization of candidate loci under parallel selection in the lake whitefish (Coregonus clupeaformis)., Molecular ecology. England, 14(2), 351– 361. Rogers, S. M. and Bernatchez, L. (2006) The genetic basis of intrinsic and extrinsic post-zygotic reproductive isolation jointly promoting speciation in the lake whitefish species complex (Coregonus clupeaformis), Journal of Evolutionary Biology, 19(6), 1979–1994. Rundle, H. D. and Nosil, P. (2005) Ecological speciation, Ecology Letters, 8(3), 336– 352. Sanger, F.,Air, G. M.,Barrell, B. G.,Brown, N. L.,Coulson, a R.,Fiddes, C. a,Hutchison, C. a,Slocombe, P. M. and Smith, M. (1977) Nucleotide sequence of bacteriophage phi X174 DNA., Nature, 265(5596), 687–695. Sanmiguel, P. and Bennetzen, J. L. (1998) Evidence that a Recent Increase in Maize Genome Size was Caused by the Massive Amplification of Intergene Retrotransposons, Annals of Botany, 82(suppl_1), 37–44. Scheiner, S. M. (1993) Genetics and Evolution of Phenotypic Plasticity, Annual Review of Ecology and Systematics, 24(1), 35–68. Schlichting, C. D. (1986) The Evolution of Phenotypic Plasticity in Plants, Annual Review of Ecology and Systematics, 17(1), 667–693. Schlick-Steiner, B. C.,Seifert, B.,Stauffer, C.,Christian, E.,Crozier, R. H. and Steiner, F. M. (2007) Without morphology, cryptic species stay in taxonomic crypsis following discovery, Trends in Ecology and Evolution, 22(8), 391–392. Schlötterer, C. (2003) Hitchhiking mapping - Functional genomics from the population genetics perspective, Trends in Genetics, 19(1), 32–38. Schluter, D. (2001) Ecology and the origin of species, Trends in Ecology and Evolution, 16(7), 372–380. Schluter, D. (2009) Evidence for ecological speciation and its alternative, Science, 323(5915), 737–741. Schuster, S. C. (2008) Next-generation sequencing transforms today’s biology, Nature Methods, 5(1), 16. Schwanhäusser, B.,Busse, D.,Li, N.,Dittmar, G.,Schuchhardt, J.,Wolf, J.,Chen, W. and Selbach, M. (2011) Global quantification of mammalian gene expression control, Nature. Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved., 473, 337. Scriber, J. M.,Hagen, R. H. and Lederhouse, R. C. (1996) Genetics of mimicry in the swallowtail butterflies, Papilio glaucus and P. canadensis (Lepidoptera: Papilionidae), Evolution. Wiley Online Library, 50(1), 222–236. Seehausen, O.,Butlin, R. K.,Keller, I.,Wagner, C. E.,Boughman, J. W.,Hohenlohe, P. A.,Peichel, C. L.,Saetre, G. P.,Bank, C.,Brännström, Å.,Brelsford, A.,Clarkson, C. S.,Eroukhmanoff, F.,Feder, J. L.,Fischer, M. C.,Foote, A. D.,Foote, A. D.,Franchini, P.,Jiggins, C. D.,Jones, F. C.,Lindholm, A. K.,Lucek, K.,Maan, M. E.,Marques, D. A.,Marques, D. A.,Martin, S. H.,Matthews, B.,Meier, J. I.,Meier, J. I.,Möst, M.,Möst, M.,Nachman, M. W.,Nonaka, E.,Rennison, D. J.,Schwarzer, J.,Schwarzer, J.,Schwarzer, J.,Watson, E. T.,Westram, A. M. and Widmer, A.

70 (2014) Genomics and the origin of species, Nature Reviews Genetics, 15(3), 176– 92. Shendure, J. and Ji, H. (2008) Next-generation DNA sequencing, Nature Biotechnology, 26(10), 1135. Shipilina, D.,Serbyn, M.,Ivanitskii, V.,Marova, I. and Backström, N. (2017) Patterns of genetic, phenotypic, and acoustic variation across a chiffchaff (Phylloscopus collybita abietinus/tristis) hybrid zone, Ecology and Evolution, 7(7), 2169–2180. Šíchová, J.,Voleníková, A.,Dincă, V.,Nguyen, P.,Vila, R.,Sahara, K. and Marec, F. (2015) Dynamic karyotype evolution and unique sex determination systems in Leptidea wood white butterflies, BMC Evolutionary Biology, 15(1), 89. Simão, F. A.,Waterhouse, R. M.,Ioannidis, P.,Kriventseva, E. V. and Zdobnov, E. M. (2015) BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs, Bioinformatics, 31(19), 3210–3212. Slotte, T.,Foxe, J. P.,Hazzouri, K. M. and Wright, S. I. (2010) Genome-Wide Evidence for Efficient Positive and Purifying Selection in Capsella grandiflora, a Plant Species with a Large Effective Population Size, Molecular Biology and Evolution, 27(8), 1813–1821. Smith, N. G. C. and Eyre-Walker, A. (2002) Adaptive protein evolution in Drosophila, Nature. Nature Publishing Group, 415(6875), 1022. Soediono, B. (1989) The Insects, Structure and Functions, Journal of Chemical Information and Modeling. Sohn, J. Il and Nam, J. W. (2018) The present and future of de novo whole-genome assembly, Briefings in Bioinformatics, 19(1), 23–40. Soria-Carrasco, V.,Gompert, Z.,Comeault, A. A.,Farkas, T. E.,Parchman, T. L.,Johnston, J. S.,Buerkle, C. A.,Feder, J. L.,Bast, J.,Schwander, T.,Egan, S. P.,Crespi, B. J. and Nosil, P. (2014) Stick insect genomes reveal natural selection’s role in parallel speciation, Science. Sota, T. and Kubota, K. (1998) Genital lock-and-key as a selective agent against hybridization, Evolution. Stanke, M. and Waack, S. (2003) Gene prediction with a hidden Markov model and a new intron submodel, Bioinformatics. Oxford University Press, 19(suppl_2), ii215--ii225. Steiner, C. C.,Weber, J. N. and Hoekstra, H. E. (2007) Adaptive Variation in Beach Mice Produced by Two Interacting Pigmentation Genes, PLOS Biology. Public Library of Science, 5(9), 1–10. Stepanyan, L. S. (1990) List of ornithofauna of USSR., Moscow: Nauka. Stevens, P. F. (2001) History of , e LS. Wiley Online Library. Stoletzki, N. and Eyre-Walker, A. (2011) Estimation of the neutrality index, Molecular Biology and Evolution, 28(1), 63–70. Strasburg, J. L.,Kane, N. C.,Raduski, A. R.,Bonin, A.,Michelmore, R. and Rieseberg, L. H. (2011) Effective Population Size Is Positively Correlated with Levels of Adaptive Divergence among Annual Sunflowers, Molecular Biology and Evolution. Oxford University Press, 28(5), 1569–1580. Sun, S.,Ting, C. T. and Wu, C. I. (2004) The normal function of a speciation gene, Odysseus, and its hybrid sterility effect, Science. Suomalainen, E. (1969) Chromosome evolution in the Lepidoptera, Chromosomes today, 2, 132–138. Suomalainen, E.,Cook, L. M. and Turner, J. R. G. (1973) Achiasmatic oogenesis in the Heliconiine butterflies, Hereditas, 74(2), 302–304. Svensson, L. (1992) Identification Guide to European ., British Trust for .

71 Szymura, J. M. and Barton, N. H. (1986) GENETIC ANALYSIS OF A HYBRID ZONE BETWEEN THE FIRE-BELLIED TOADS, BOMBINA BOMBINA AND B. VARIEGATA, NEAR CRACOW IN SOUTHERN ., Evolution; international journal of organic evolution. United States, 40(6), 1141– 1159. Tabor, H. K.,Risch, N. J. and Myers, R. M. (2002) Candidate-gene approaches for studying complex genetic traits: practical considerations., Nature reviews. Genetics. England, 3(5), 391–397. Talla, V.,Kalsoom, F.,Shipilina, D.,Marova, I. and Backström, N. (2017) Heterogeneous patterns of genetic diversity and differentiation in European and Siberian Chiffchaff (Phylloscopus collybita abietinus/P. tristis), G3: Genes, Genomes, Genetics, 7(12), 3983–3998. Talla, V.,Suh, A.,Kalsoom, F.,Dinca, V.,Vila, R.,Friberg, M.,Wiklund, C. and Backström, N. (2017) Rapid increase in genome size as a consequence of transposable element hyperactivity in wood-white (leptidea) butterflies, Genome Biology and Evolution, 9(10), 2491–2505. Taylor, E. B.,Boughman, J. W.,Groenenboom, M.,Sniatynski, M.,Schluter, D. and Gow, J. L. (2006) Speciation in reverse: Morphological and genetic evidence of the collapse of a three-spined stickleback (Gasterosteus aculeatus) species pair, Molecular Ecology, 15(2), 343–355. Taylor, S. A.,Larson, E. L. and Harrison, R. G. (2015) Hybrid zones: Windows on climate change, Trends in Ecology and Evolution, 30(7), 398–406. Teeter, K. C.,Thibodeau, L. M.,Gompert, Z.,Buerkle, C. A.,Nachman, M. W. and Tucker, P. K. (2010) The variable genomic architecture of isolation between hybridizing species of house mice, Evolution, 64(2), 472–485. Tenaillon, M. I.,Hollister, J. D. and Gaut, B. S. (2010) A triptych of the evolution of plant transposable elements, Trends in Plant Science, 15(8), 471–478. Thomas, C. A. (1971) The Genetic Organization of Chromosomes, Annual Review of Genetics, 5, 237–256. Ticehurst, C. B. (1938) A systematic review of the genus Phylloscopus., British Museum. Ting, C.-T.,Tsaur, S.-C. and Wu, C.-I. (2000) The phylogeny of closely related species as revealed by the genealogy of a speciation gene, Odysseus, Proceedings of the National Academy of Sciences, 97(10), 5313–5316. Ting, C. T.,Tsaur, S. C.,Wu, M. L. and Wu, C. I. (1998) A rapidly evolving homeobox at the site of a hybrid sterility gene, Science, 282(5393), 1501–1504. Tirosh, I.,Reikhav, S.,Levy, A. a and Barkai, N. (2009) A yeast hybrid provides insight into the evolution of gene expression regulation., Science, 324(5927), 659–662. Trapnell, C.,Pachter, L. and Salzberg, S. L. (2009) TopHat: discovering splice junctions with RNA-Seq, Bioinformatics, 25(9), 1105–1111. Trewick, S. A. (2000) Mitochondrial DNA sequences support allozyme evidence for cryptic radiation of New Zealand Peripatoides (Onychophora), Molecular Ecology, 9(3), 269–281. Turner, J. R. G. and Sheppard, P. M. (1975) Absence of crossing-over in female butterflies (Heliconius), Heredity. The Genetical Society of , 34, 265. Turner, T. L. and Hahn, M. W. (2010) Genomic islands of speciation or genomic islands and speciation?, Molecular ecology. England, 19(5), 848–850. Turner, T. L.,Hahn, M. W. and Nuzhdin, S. V (2005) Genomic islands of speciation in Anopheles gambiae., PLoS biology. United States, 3(9), e285.

72 Van’t Hof, A. E.,Campagne, P.,Rigden, D. J.,Yung, C. J.,Lingley, J.,Quail, M. A.,Hall, N.,Darby, A. C. and Saccheri, I. J. (2016) The industrial melanism mutation in British peppered moths is a transposable element, Nature, 534(7605), 102–105. Van’t Hof, A. E.,Edmonds, N.,Dalíková, M.,Marec, F. and Saccheri, I. J. (2011) Industrial melanism in british peppered moths has a singular and recent mutational origin, Science, 332(6032), 958–960. Via, S. (2009) Natural selection in action during speciation, Proceedings of the National Academy of Sciences, 106(Suppl 1), 9939–9946. Vijay, N.,Bossu, C. M.,Poelstra, J. W.,Weissensteiner, M. H.,Suh, A.,Kryukov, A. P. and Wolf, J. B. W. (2016) Evolution of heterogeneous genome differentiation across multiple contact zones in a crow species complex, Nature Communications. The Author(s), 7, 13195. Vincent, S.,Vian, J. M. and Carlotti, M. P. (2000) Partial sequencing of the cytochrome oxydase b subunit gene I: a tool for the identification of European species of blow flies for postmortem interval estimation., Journal of forensic sciences, 45(4), 820–823. Vitalis, R.,Dawson, K. and Boursot, P. (2001) Interpretation of variation across marker loci as evidence of selection, Genetics, 158(4), 1811–1823. Vonlanthen, P.,Bittner, D.,Hudson, A. G.,Young, K. A.,Müller, R.,Lundsgaard- Hansen, B.,Roy, D.,Di Piazza, S.,Largiader, C. R. and Seehausen, O. (2012) Eutrophication causes speciation reversal in whitefish adaptive radiations, Nature, 482(7385), 357. Wallace, A. . R. R. (1865) On the phenomena of variation and geographical distribution as illustrated by the Papilionidae of the Malayan region, Trans. Linn. Soc. London, 25(1), 1–71. Wang, Q.,Egi, Y.,Takeda, M.,Oishi, K. and Sakamoto, K. (2015) Melatonin pathway transmits information to terminate pupal diapause in the Chinese oak silkmoth Antheraea pernyi and through reciprocated inhibition of dopamine pathway functions as a photoperiodic counter, Entomological Science, 18(1), 74–84. Wang, Z.,Chen, Y. and Li, Y. (2004) A Brief Review of Computational Gene Prediction Methods, Genomics, Proteomics & Bioinformatics. Elsevier, 2(4), 216–221. Watterson, G. A. (1975) On the number of segregating sites in genetical models without recombination., Theoretical population biology. United States, 7(2), 256– 276. West-Eberhard, M. J. (1989) Phenotypic Plasticity and the Origins of Diversity, Annual Review of Ecology and Systematics, 20(1), 249–278. William G. Eberhard (1996) Female Control: Sexual Selection by Cryptic Female Choice. Wittkopp, P. J.,Haerum, B. K. and Clark, A. G. (2008) Regulatory changes underlying expression differences within and between Drosophila species, Nature Genetics, 40(3), 346. Wolf, J. B. W. and Ellegren, H. (2017) Making sense of genomic islands of differentiation in light of speciation, Nature Reviews Genetics, 18(2), 87. Wright, S. (1921) Systems of Mating. IV. the Effects of Selection, Genetics, 6(2), 162–166. Wright, S. (1931) Evolution in mendelian populations, Genetics, 16(2), 97. Wright, S. (1942) Statistical genetics and evolution, Bulletin of the American Mathematical Society, 48(4), 223–246. Wright, S. (1950) Genetical structure of populations, Nature, 166, 247–249.

73 Wu, C.-I. (2001) The genic view of the process of speciation, Journal of Evolutionary Biology. Wiley Online Library, 14(6), 851–865. Wurm, Y.,Wang, J. and Keller, L. (2010) Changes in reproductive roles are associated with changes in gene expression in fire ant queens., Molecular ecology. England, 19(6), 1200–1211. Wyman, M. J.,Agrawal, A. F. and Rowe, L. (2010) Condition-dependence of the sexually dimorphic transcriptome in Drosophila melanogaster., Evolution; international journal of organic evolution. United States, 64(6), 1836–1848. Yang, Z. (2007) PAML 4: Phylogenetic analysis by maximum likelihood, Molecular Biology and Evolution, 24(8), 1586–1591. Yasukochi, Y.,Tanaka-Okuyama, M.,Shibata, F.,Yoshido, A.,Marec, F.,Wu, C.,Zhang, H.,Goldsmith, M. R. and Sahara, K. (2009) Extensive conserved synteny of genes between the karyotypes of Manduca sexta and Bombyx mori revealed by BAC-FISH mapping, PLoS One. Public Library of Science, 4(10), e7465. Zerbino, D. R. and Birney, E. (2008) Velvet: Algorithms for de novo short read assembly using de Bruijn graphs, Genome Research. Cold Spring Harbor Laboratory Press, 18(5), 821–829. Zhan, S.,Zhang, W.,Niitepold, K.,Hsu, J.,Haeger, J. F.,Zalucki, M. P.,Altizer, S.,de Roode, J. C.,Reppert, S. M. and Kronforst, M. R. (2014) The genetics of monarch butterfly migration and warning colouration., Nature. England, 514(7522), 317– 321. Zhang, P.,Schon, E. A.,Fischer, S. G.,Cayanis, E.,Weiss, J.,Kistler, S. and Bourne, P. E. (1994) An algorithm based on graph theory for the assembly of contigs in physical mapping of DNA, Bioinformatics, 10(3), 309–317. Zheng, X.,Levine, D.,Shen, J.,Gogarten, S. M.,Laurie, C. and Weir, B. S. (2012) A high-performance computing toolset for relatedness and principal component analysis of SNP data, Bioinformatics, 28(24), 3326–3328.

74

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1714 Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-358292 2018