<<

Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1192

The genetic basis for in natural

SANGEET LAMICHHANEY

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6206 ISBN 978-91-554-9502-2 UPPSALA urn:nbn:se:uu:diva-279969 2016 Dissertation presented at Uppsala University to be publicly examined in B41, BMC, Husargätan 3, Uppsala, Friday, 29 April 2016 at 13:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English. Faculty examiner: Professor Henrik Kaessmann (Heidelberg University, Germany).

Abstract Lamichhaney, S. 2016. The genetic basis for adaptation in natural populations. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1192. 60 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9502-2.

Many previous studies in evolutionary have been based on few model that can be reared at ease in the laboratory. In contrast, genetic studies of non-model, natural populations are desirable as they provide a wider range of adaptive throughout evolutionary timescales and allow a more realistic understanding of how drives adaptive . This thesis represents an example of how modern genomic tools can be effectively used to study adaptation in natural populations. Atlantic herring is one of the world’s most numerous fish having multiple populations with phenotypic differences adapted to strikingly different environments. Our study demonstrated insignificant level of in herring that resulted in minute genetic differences in the majority of the among these populations. In contrast, a small percentage of the loci showed striking genetic differentiation that were potentially under natural selection. We identified loci associated with adaptation to the Baltic Sea and with seasonal (spring- and autumn-spawning) and demonstrated that ecological adaptation in Atlantic herring is highly polygenic but controlled by a finite number of loci. The study of Darwin’s finches constitutes a breakthrough in characterizing their evolution. We identified two loci, ALX1 and HMGA2, which most likely are the two most prominent loci that contributed to diversification and thereby to expanded food utilization. These loci have played a key role in adaptive evolution of Darwin’s finches. Our study also demonstrated that interspecies played a significant role in the radiation of Darwin’s finches and some species have a mixed ancestry. This thesis also explored the genetic basis for the remarkable phenotypic differences between three male morphs in the ruff. Identification of two different versions of a 4.5 MB inversion in Satellites and Faeders that occurred about 4 million years ago revealed clues about the genetic foundation of male mating strategies in ruff. We highlighted two genes in the inverted region; HSD17B2 that affects of testosterone and MC1R that has a key role in regulating pigmentation, as the major loci associated with this adaptation.

Keywords: Adaptive evolution, Atlantic herring, ecological adaptation, seasonal reproduction, TSHR, Darwin’s finches, natural selection, beak, ALX1, HMGA2, ruff, lek, inversion, HSD17B2, MC1R

Sangeet Lamichhaney, Department of Medical Biochemistry and Microbiology, Box 582, Uppsala University, SE-75123 Uppsala, Sweden.

© Sangeet Lamichhaney 2016

ISSN 1651-6206 ISBN 978-91-554-9502-2 urn:nbn:se:uu:diva-279969 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-279969)

The measure of intelligence is the ability to change

- Albert Einstein

Cover images: (1) Herring in the Baltic Sea © Riku Lumiaro/Finnish Envi- ronment Institute (2) Medium ground finch © P. R. Grant, originally pub- lished in Grant & Grant (2014) and (3) Independent male ruff dominating Satellite male at lek © Torsten Green-Petersen

List of papers

This thesis is based on the following five papers

I. Sangeet Lamichhaney*, Alvaro Martinez Barrio*, Nima Rafati*, Görel Sundstrom*, Carl-Johan Rubin, Elizabeth R. Gilbert, Jonas Berglund, Anna Wetterbom, Linda Laikre, Matthew T. Webster, Manfred Grabherr, Nils Ryman and Leif Andersson (2012) -scale sequencing reveals genetic differ- entiation due to local adaptation in Atlantic herring. Proceedings of the Nation- al Academy of Sciences U.S.A. 109, 19345-19350.

II. Alvaro Martinez Barrio*, Sangeet Lamichhaney*, Guangyi Fan*, Nima Ra- fati*, Mats Pettersson, He Zhang, Jacques Dainat, Diana Ekman, Marc Hoeppner, Patric Jern, Marcel Martin, Björn Nystedt, Xin Liu, Wenbin Chen, Xinming Liang, Chengcheng Shi, Yuanyuan Fu, Kailong Ma, Xiao Zhan, Chungang Feng, Ulla Gustafson, Carl-Johan Rubin, Markus Sällman Almén, Martina Blass, Michele Casini, Arild Folkvord, Linda Laikre, Nils Ryman, Si- mon Ming-Yuen Lee, Xun Xu and Leif Andersson, The genetic basis for eco- logical adaptation of the Atlantic herring revealed by genome sequencing, Manuscript under review.

III. Sangeet Lamichhaney*, Jonas Berglund*, Markus Sällman Almén, Khurram Maqbool, Manfred Grabherr, Alvaro Martinez-Barrio, Marta Promerova´, Carl- Johan Rubin, Chao Wang, Neda Zamani, B. Rosemary Grant, Peter R. Grant, Matthew T. Webster and Leif Andersson (2015) Evolution of Darwin's finches and their revealed by genome sequencing. Nature 518, 371-375

IV. Sangeet Lamichhaney, Fan Han, Jonas Berglund, Chao Wang, Markus Säll- man Almén, Matthew T. Webster, B. Rosemary Grant, Peter R. Grant and Leif Andersson, A beak size in Darwin’s finches facilitated character dis- placement during a drought, , in press .

V. Sangeet Lamichhaney*, Guangyi Fan*, Fredrik Widemo*, Ulrika Gunnars- son, Doreen Schwochow Thalmann, Marc P Hoeppner, Susanne Kerje, Ulla Gustafson, Chengcheng Shi, He Zhang, Wenbin Chen, Xinming Liang, Leihuan Huang, Jiahao Wang, Enjing Liang, Qiong Wu, Simon Ming-Yuen Lee, Xun Xu, Jacob Höglund, Xin Liu and Leif Andersson (2016) Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax), Nature Genetics, 48, 84-88.

* These authors contributed equally

Reprints were made with permission from the respective publishers.

List of papers (not included in the thesis)

I. Markus Sällman Almén, Sangeet Lamichhaney, Jonas Berglund, B. Rosemary Grant, Peter R. Grant, Matthew T. Webster and Leif Andersson (2015) of Darwin’s finches revisited using , Bioessays, 38, 14-20.

II. Suzanne V. Saenko, Sangeet Lamichhaney, Alvaro Martinez Barrio, Nima Rafati, Leif Andersson and Michel C. Milinkovitch (2015) Amelanism in the corn snake is caused by the insertion of an LTR-retrotransposon in the OCA2 gene, Scientific Reports, 5, 17118.

III. Mats E. Pettersson, Marcin Kierczak, Markus Sällman Almén, Sangeet Lamich- haney and Leif Andersson, A model-free approach for detecting genomic regions of deep divergence using the distribution of distances, Manuscript sub- mitted.

IV. Neda Zamani*, Behrooz Torabi Moghadam*, Nima Rafati*, Sangeet Lamich- haney*, Görel Sundström, Henrik Lantz, Alvaro Martinez Barrio, Jan Komor- owski, Bernardo J. Clavijo, Patric Jern and Manfred G Grabherr, A local web inter- face for protein and transcript aligners: Smörgås, Manuscript submitted

* These authors contributed equally

Table of contents

1. Introduction ...... 13 1.1. Adaptation ...... 13 1.2. Genetics of adaptation ...... 14 1.3. The ‘’ of adaptation ...... 15 1.4. Methods to identify loci underlying adaptation ...... 15 1.4.1. Forward genetics approaches ...... 15 1.4.2. approaches ...... 18 1.4.3. Candidate genes approaches ...... 19 1.5. Background to the thesis projects ...... 20 1.5.1. Atlantic herring ...... 20 1.5.2. Darwin’s finches ...... 22 1.5.3. Ruff ...... 25 2. Methods ...... 28 2.1. Sampling and collection of ecological data ...... 28 2.2. Whole-genome sequencing ...... 29 2.3. Variant discovery and downstream analysis ...... 29 2.4. Genome-wide screen for genetic differentiation ...... 30 2.5. Underlying genetic basis for adaptive divergence ...... 30 2.6. Further characterization of candidate loci ...... 31 3. Results and Discussion ...... 32 3.1. Herring ...... 32 3.1.1. Paper I ...... 32 3.1.2. Paper II ...... 34 3.2. Darwin’s finches ...... 39 3.2.1 Paper III ...... 39 3.2.2. Paper IV ...... 42 3.3. Ruff ...... 44 3.3.1. Paper V ...... 44 4. Conclusion and future perspectives ...... 48 5. Acknowledgements ...... 53 6. References ...... 55

Abbreviations

ALX1 Aristaless-like homeobox 1 BGI Beijing Genomics Institute CALM Calmodulin CENPN protein N CRISPR Clustered regularly-interspaced short palindromic repeats DNA Deoxyribonucleic acid EMSA Electrophoretic mobility shift assay ESR2a Estrogen receptor 2a FGF10 Fibroblast growth factor 10 FOXC1 Forkhead box C1 FST GO Gene ontology GSC Goosecoid homeobox GWAS Genome wide association studies HCE High choriolytic enzyme HMGA2 High mobility group AT-hook 2 HSD17B2 Hydroxysteroid (17-Beta) dehydrogenase 2 IBD InDels Insertions and deletions KB Kilobase LD MB Megabase MC1R Melanocortin 1 receptor MRCA Most recent common ancestor mtDNA Mitochondrial deoxyribonucleic acid NGS Next generation sequencing PCR Polymerase chain reaction PRLR Prolactin receptor QC Quality control QTL Quantitative trait loci RDH14 Retinol dehydrogenase 14 RNA Ribonucleic acid SciLifeLab Science for Life Laboratory SDR42E1 Short chain dehydrogenase/reductase family 42E, member 1 SNP Single SOX11 SRY (Sex Determining Region Y)-Box 11 TSHR Thyroid stimulating hormone receptor UTR Un-translated region

1. Introduction

1.1. Adaptation Living organisms inhabit diverse ecosystems on earth and these natural envi- ronments present a multitude of biotic and abiotic challenges. (1). Adapta- tion is an evolutionary process whereby a species becomes more able to live in such challenging environments (2). Every species that has successfully adapted to a certain possesses some unique adaptive traits that may be structural, behavioral and physiological or can involve life-history param- eters. Structural are morphological features of an (for instance, large beaks in certain Darwin’s finch species increase their ability to crack larger seeds). Behavioral adaptations are a set of actions that an individual carries out to increase its chance of survival and reproduction (for instance, unique lekking1 behavior in different morphs of male ruff allows them an increased access to females for mating). Physiological adaptation comprises of the response of a species to a particular stimulus (for instance, local adaptation of Atlantic herring populations to salinity gradients in the Baltic sea).

Adaptive evolution is driven by natural selection and is one of the important process that explains the diversity of life we find in nature (3). The origin of this diversity on earth and the mechanisms that drive species to change through time have fascinated scientists and general public since ancient times (4). One of the key challenges in to date is to understand how natural selection drives evolution (5). A species can adapt to either abiotic environment (e.g. climate change), to another species (e.g. a competitor) or a combination of both abiotic and biotic forces which makes mechanistic dissection of adaptive evolution empirically difficult (6).

1 An aggregation of males with competitive displays to entice the visiting females for mating

13 1.2. Genetics of adaptation Many biologists have contributed to our current understanding of adaptive evolution and no one is more famous than , whose theory of evolution provided compelling arguments about the power of natural selec- tion to produce remarkable phenotypic adaptation (7). But, Darwin himself did not know about the actual mode of , i.e. “genes”, and later on, only after the Mendel’s laws of inheritance were accepted, genetics was in- corporated into evolutionary theory (8). in a population provides flexibility to adapt to changing environment and is crucial for sur- vival of the population over time. is regarded as the fun- damental resource upon which adaptation and will depend (9).

Clues underlying specific evolutionary processes responsible for adaptation are known to be hidden in the “genes” that control organismal phenotypes (7). Therefore, knowledge about the genetic basis of adaptive traits has be- come an important requirement to understand the majority of hypotheses regarding the genetics of adaptation (10). Historical developments in the field of adaptation genetics have gone through three major transitions. The first era saw theoretical developments in population and including the codification of population genetics theory by Fisher, Wright and Haldane (11–13). Dobzhansky (2) led the application of population ge- netic principles to natural populations. This era set a theoretical framework for the utilization of genetic data that was soon to follow.

The second era was characterized by a gradual increment in the availability of genetic data. Developments in protein electrophoresis methods opened up the door for assessing the levels of genetic variation in wider variety of or- ganisms rather then being limited to certain model species (14). There was an increased interests in natural populations, asking questions about the lev- els of genetic variation for ecologically important traits. Later on, the devel- opment of Sanger sequencing methods became one of the landmarks of this era.

The third era began with the publication of first metazoan genome sequence (15) in 1998. In this last decade, we have experienced an enormous progress in high-throughput sequencing technologies. Their availability has allowed sequencing at population scales and generating toolkits that has facilitated integration of ecological and genomic data (16). Applications of these methodological advances have now started to answer some of the pre- viously elusive long-standing questions on adaptation and evolutionary biol- ogy (17).

14 1.3. The ‘Genomics’ of adaptation Progress in genetic studies of adaptive evolution until recently had been constrained by the lack of resolution and the absence of a genomic perspec- tive (16). Ironically, the majority of our knowledge on such genetic mecha- nisms was only based on a few model organisms that could be reared at rela- tive ease in the laboratory, away from their natural environment (1). Ge- nomic characterization of natural populations was always desirable as they provide a wider range of phenotypes across evolutionary timescales that allows a more realistic understanding on the origins of genetic variation and how natural selection drives adaptive evolution. Next Generation Sequenc- ing (NGS) methods have not only made the existing genetic techniques cheaper and faster, but have provided ample opportunities for genomic stud- ies on any ecologically interesting species (17). In addition, widespread availability of these technologies has also enhanced the collaborative efforts of diverse research communities working in a particular organismal system with multi-dimensional expertise (18). Cost-effective opportunities for indi- vidual research groups to create genomic resources for their favorite species are changing the research landscape for many evolutionary biologists and ecologists.

1.4. Methods to identify loci underlying adaptation An important objective in the majority of the current studies in the genetics of adaptation is to identify the loci associated with adaptive phenotypes (19). The challenging aspect of narrowing down genomic regions of interest is usually done using three major approaches, namely forward genetics, reverse genetics and studies; either alone or often combined (20, 21).

1.4.1. Forward genetics approaches Forward genetics is a method to identify genomic regions of interest in cases where there is prior knowledge regarding the associated . These approaches perform genetic characterization of the adaptive trait in cases where the phenotypic response to selection has already been quantified. The methods include QTL2 mapping using pedigrees or association mapping (GWAS), both of which screen large number of molecular markers across the genomes of individuals that are segregating for adaptive trait of interest (19).

2 , a genomic region that is linked to, or contains, the genetic factors that control a particular phenotype

15 Parent 1 Parent 2

F1 X F1

Figure 1 QTL mapping requires parental lines that differ genetically for the trait of interest. Parental lines are crossed to produce F1. F1’s are then intercrossed for additional generations to produce individuals that contain different fraction of ge- nome of each parental line. Phenotypes of these recombinant individuals are meas- ured, as is the genotype of multiple markers that are polymorphic between parents. Probability of each markers (or their interval) being associated to the trait is meas- ured using likelihood ratio tests and the best estimate of QTL position is given by chromosomal position with highest significant likelihood ratio (Adapted by permis- sion from Macmillan Publishers Ltd: (22), copyright (2001)

QTL mapping (Figure 1) attempts to create recombinant populations and performs linkage analysis to map genomic loci that are associated with the trait of interest (23). These approaches have been extensively used in hu- mans, livestock, as well as in other model organisms in the past (24). Unfor- tunately, applicability of such studies has been limited due to the require- ment of large families to obtain enough recombinant offspring (19). In addi- tion, mapping traits that are shaped by many minimal effect loci (i.e. com- plex traits) have had much less success (23) with QTL mapping approaches.

16 Identify and measure phenotype

Sub-divide population

Genotype individuals using high density SNP panel Associate genotypes with phenotype

Identify SNPs / Regions and Fine mapping

Figure 2 Genome wide association studies (GWAS) compares common single nu- cleotide polymorphisms between two groups with different phenotypes. The results of GWAS are most often presented as a manhattan plots showing significance of trait-association across the genome. Adapted from (25) © 2014 Schierding, Cutfield and O’Sullivan.

GWAS (Figure 2) is the examination of genetic variants across the genome of different individuals to identify if any variant is associated with the phe- notype of interest by exploiting linkage disequilibrium as measured across several hundred thousand markers (26). GWAS is used for mapping genes within a single population as opposed to a controlled intercross done in QTL mapping studies. In the last few years, such studies have led to many scien- tific discoveries about genes and biological pathways associated with certain diseases in as well as new biological insights in other species (23). However, bridging the gap between loci identified using GWAS and the ac- tual identification of causal variants is still challenging. In addition, in many cases, these studies have only been able to identify a small fraction of the genetic variation associated with the traits of interest (27).

17 1.4.2. Reverse genetics approaches Reverse genetics is a method to identify genes in cases where there is no prior knowledge regarding the associated phenotype. These methods do not require quantification of phenotypic traits and are based on genome-wide screening of markers to detect footprints of selection (28). Other traditional approaches of reverse genetics not mentioned in this thesis include mutagen- esis screens, or direct transgenic modification of candidate genes.

When a new arises in a population that increases the of car- riers, natural selection will favor such individuals and thus frequency of the beneficial mutation increases over time (Figure 3A). Due to a process termed genetic hitchhiking, neutral that are nearby to this beneficial mutation also tend to change their and may gradually lead to fixation (29). This phenomenon will result in a reduction of genetic diversity resulting in the signature of a . Genome-wide scans of nucle- otide diversity3 (π), Tajima’s D4 and patterns of linkage disequilibrium5 are commonly used measures for detecting such signatures of selection within a single population (Figure 3B).

6 Fixation index (FST) is the most widely used method to identify loci show- ing genetic differentiation between populations. The availability of high density SNP genotyping chips and whole-genome sequencing have now allowed the development of new algorithms that account for haplotype struc- tures and not only individual SNP frequency alone, for quantifying genetic differentiation among populations (30, 31).

3 The average number of nucleotide differences per site for a group of DNA sequences sampled 4 Difference between two estimates of genetic diversity; mean number of pairwise differences and number of segregating sites 5 Non-random association of alleles at different loci 6 Ratio of average number of differences between pairs of sampled within a population with average number sampled between different populations

18 A B

Before selection After selection Scaled statistics

Selective sweep Sequence position

Figure 3 Selective sweep screen (A) Under selection, a beneficial mutation will rise in frequency and in addition, nearby linked alleles also “hitchhike” along with it to higher frequency creating a region of selective sweep © 2008 Nature Education (B) Effect of selective sweep on genetic variation, Adapted from (32) © Annual Reviews, 2005.

One major challenge of such studies is to differentiate signatures of natural selection from signals generated due to genetic drift7, population bottlenecks8 and other demographic factors (33). There has been a lot of effort in recent times to develop statistical procedures to better distinguish signals of natural selection from genetic drift (34). Still there is much debate on the proportion of genomic loci under selection and drift (32). Studies on model organisms where genetic drift plays a non-significant role in comparison to other mech- anisms that shape up the adaptive evolution could address many issues of this debate.

1.4.3. Candidate gene approaches In contrast to forward and reverse genetics methods that attempt to scan the entire genome, candidate gene approaches focus on pre-specified genes. In the majority of cases, these genes have already been associated with similar phenotype in other species (35). In addition, such genes are also selected based on a priori knowledge regarding their function and possible biological impact on the particular phenotype.

These methods are relatively cheap and quick to perform and they have been extensively used in human (36) and domestic (37). Currently, as knowledge about genes and their interactions are increasingly becoming available, these studies are moving away from simplistic single gene to trait

7 change in the relative frequency of allele in a population due to random sampling of organisms 8 sharp reduction in (and thereby variation in ) due to environmental events (such as floods, landslides, diseases, droughts) or human activities

19 approach to analysis of multiple genes that are potentially involved in bio- logical pathways associated with a certain trait of interest (38).

However, for the majority of adaptive traits of interest, particularly in natural populations, only a handful of candidate genes are available to date. In addi- tion, these approaches tend to get biased towards well-studied candidate genes. Due to the higher chances of false positives and negatives, results of these studies are also difficult to be replicated in subsequent follow-up stud- ies (36). In today’s post-genomic era, rather than using candidate gene meth- ods as a first step in genetic association studies, these have been effectively used to characterize regions already narrowed down by forward or reverse genetics approaches.

1.5. Background to the thesis projects After the arrival of next-generation sequencing technologies, there have been revolutionary changes in population genetics research. The possibility of data collection from genomes of many individual at affordable costs have become a reality, even for smaller research groups (39). The generation of unprecedented amounts of genomic data has changed the scenario from ex- ploring only a limited “snapshot” of the genome (for instance, mtDNA, mi- crosatellites or few nuclear markers) to development and usage of genome- wide high-density markers (40). This has allowed inference of the evolution- ary history and identification of genetic loci associated with a particular trait in any species at very high resolution. In addition, cost effective approaches of sequencing in “pools” of multiple individual DNA originating from a certain population have been successfully applied to infer demographic his- tory and screen genomes for signatures of selection (41–43). This PhD pro- ject was motivated by the progress achieved in areas of genome analysis in the last decade. It aims to harness the power of high-throughput sequencing technologies to characterize genetic diversity in natural populations and un- derstand the genetic basis of adaptive phenotypes.

1.5.1. Atlantic herring Atlantic herring is a schooling, pelagic fish species that inhabits both sides of the North Atlantic Ocean and the Baltic Sea (Figure 4). It is one of the most abundant marine species with the number of individuals in a herring school being up to a billion (44).

20 Density

Figure 4 Geographical distributions of Atlantic herring © Misigon/Wikimedia Commons/CC-BY-SA-3.0/GFDL

The fecundity in herring females is about 30,000-70,000 eggs per female (45) and herring spawn in huge schools as females release eggs and males release clouds of milt simultaneously in the water. Atlantic herring migrate over great distances in open Sea and Ocean to the feeding grounds during the majority of the year and return back to their specified locations during spawning seasons from early spring to late fall.

Atlantic herring has been critical for the economic development of northern Europe in the past (46) and possess an important presence in scandinavian cuisine. Until imported food became available, salted herring was a crucial source of protein and energy, in particular during winter periods. In addition to human consumption, it is also widely used in fish feed and currently ranks among one of the largest fishery industries in the world (47).

As the habitat of Atlantic herring ranges throughout Baltic Sea and Atlantic Ocean, it shows remarkable adaptation to environmental variables as, for instance, salinity, temperature, light, feed resources and predators are strik- ingly different between the Baltic Sea and Atlantic Ocean. This has allowed the formation of distinct populations that are locally adapted to a particular environment (48). Several herring stocks have been recognized in the past (Figure 5) based on their morphological features, life history parameters, spawning locations and spawning time (49).

21 78°

76°

74°

72°

70°

46°E 68° 44° 66° Northern 42° Norwegian Bothnian 64° Spring Spawners 40° 62° Icelandic Southern 38° 26°W Summer Spawners Bothnian 60° Local 36° Coastal 58° 24° Stocks 34°

56° West of Scotland Central 22° (VIaN) Baltic 32° North Sea Clyde 54° Autumn Spawners 52° 30° (Div IVa&b) 20° West of Ireland

& Porcupine Bank Irish Sea Western 50° Baltic Spring Spawners 28° Thames 48°N 18°W Celtic Sea & VIIj&k Downs 26°E VIIe&f

16°W 14° 12° 10° 8° 6° 4° 2°W 0° 2°E 4° 6° 8° 10° 12° 14° 16° 18° 20° 22° 24°E

Figure 5 Atlantic herring stocks in North Atlantic Ocean and Baltic Sea © C Zim- mermann/www.clupea.net.

Despite clear phenotypic differences, no or only minute genetic differences between these stocks had been documented. Earlier studies based on limited genetic markers (allozymes (14, 50), and SNPs (51–53) have shown limited evidence of genetic differentiations between ecologically divergent stocks or populations with different spawning seasons (autumn- spawners and spring-spawners). Our aim was to generate high quality ge- nome resources for this species and understand the genetic basis of popula- tion differentiations due to local adaptation.

1.5.2. Darwin’s finches Darwin’s finches, studied on the Galápagos archipelago, have historic im- portance in the field of evolutionary biology as they provided some of the fundamental insights into processes of natural selection and adaptive radia- tion9 (54). This archipelago is geologically young and rose from the ocean less than five million years ago (55) and has experienced significant changes in its geography in the past due to regular shifts in sea level during periods of glaciation (56). These changes have resulted in a striking diversity of flora,

9A process in which organisms diversify rapidly into multitude of new forms, particularly when changes in environment make new resources available, create new challenges and open environmental niches

22 fauna and overall landscape in the archipelago (56). These islands range from dry zones with prickly cacti, littoral zones with bushes and grasses to dense forests in humid zones with high elevation and rainfall (57). The abil- ity of Darwin’s finches to adapt, survive and evolve in such diversified habi- tats have made them a classic textbook example for a variety of concepts in evolutionary biology (58).

Apart from a single Darwin’s finch species on Cocos Island, there were 14 recognized species in the Galápagos archipelago (Figure 6). Phylogenetic reconstructions from mtDNA suggested that Darwin’s finches were mono- phyletic and derived from a mainland common ancestor that colonized the archipelago about 2-3 million years ago (58). Evolution in Darwin’s finches is characterized by rapid adaptation to an unstable and challenging environ- ment leading to ecological diversification and speciation. This has resulted in striking diversity in their phenotypes (for instance, beak types, body size, plumage, feeding behavior and song types) (Extended Data table 1, paper III). Beaks are one of the most diversified features in these and are well adapted to the type of food they eat; ranging from fine needle-like beaks in warbler finches that are perfect for picking up insects; long, sharp and point- ed beaks in cactus finches for probing into cactus or deep, broad and blunt beaks in large ground finches suited for cracking large nuts and seeds (Fig- ure 6).

23

Figure 6 Recognized species of Darwin’s finches, their beak types and primary feeding habit. Adapted from © 2015 Encyclopedia Britannica.

Darwin’s finches have always been the subject of intense research; however, we knew little about how processes such as natural selection and gene flow have shaped the patterns of polymorphisms and divergence across the ge- nomes of these birds to produce such astonishing phenotypic variation. Pre- vious studies on the evolutionary history of Darwin’s finches were either based on mtDNA (58) or few nuclear microsatellites (59) and complete ge- nomes of all 14 recognized species were as yet unexplored. In this study, we have characterized genetic diversity within and between each of these spe- cies in a genome-wide scale. We have explored their genomes to identify loci that most likely contributed to morphological diversification and under- stand their roles in adaptive evolution of these iconic birds.

24 1.5.3. Ruff Ruff is a medium-sized wading that breeds in marshes and wet mead- ows across the Palearctic zone. It is a migratory bird that spends winter in the tropics (mainly in Africa) and breeds in the wetlands of northern Europe and Siberia during summer (Figure 7).

Non-breeding period in winter

Present all year Breeding period in summer

Figure 7 Habitat of ruff across the year © Jimfbleak/Wikimedia Commons/CC-BY- SA-3.0/GFDL

Ruffs show a marked sexual dimorphism, with males being larger than fe- males (Supplementary Figure 4, paper V). In addition, males develop dis- tinctive breeding plumage (ornamental feathers around their neck and head tufts) during the breeding season and show one of the most remarkable mat- ing behaviors (60, 61). There are three strikingly different male morphs (In- dependents, Satellites and Faeders) that differ in behavior, plumage color and body size (Figure 8). Independents (80-95% of males) show spectacular diversity in color of ruff/head tufts and vigorously defend territories at leks10. Satellites (5-20% of males) usually have white ruff/head tufts, do not defend territories and display submissive behavior at leks. Faeder is a rare (< 1% of males) third morph, mimicking females by its smaller size and female-like plumage. Independents attract females by their elaborate performance dis- play, high degree of aggression and strong colored plumage. Satellites allow Independents to dominate them, in exchange for getting closer to females visiting the territories occupied by Independents. Faeders being a female- mimic, get uninterrupted access to mating territories due to this disguise and

10 An aggregation of males with competitive displays to entice the visiting females for mating

25 attempt to mate with females accordingly (62). Adaptations in each of theses male morphs allow them to take advantage of their unique courtship behav- ior to reproduce and survive.

Figure 8 Ruff birds (A) Outside the breeding season, © J.M Garg/Wikimedia Com- mons/CC-BY-SA-3.0/GFDL (B) During the breeding season, © Farrell et al.; licen- see BioMed Central Ltd. 2013 (63)

Previous studies in ruff captive populations have suggested that the genetic polymorphism associated with development of either Satellite or Independ- ent was an autosomal, dominant single locus with mendelian mode of inher- itance (64, 65). Later on, following discovery of the third morph Faeder (66), the possibility of a third dominant allele at the same locus was sup- posed to account for the development of female-mimicking Faeder males (67). In addition, a marker near MC1R was found to be linked with ornamental plumage in these birds (63) but a recent study did not find any coding changes in MC1R that were associated with plumage color varia- tion (68).

26 It was an enigma, how could a single genetic locus explain such complex differences in these birds. In this study, we aimed to characterize the genetic architecture underlying phenotypic differences in these three male morphs and get insights into scenarios of adaptive changes that possibly could have led to the evolution of this spectacular mating system.

27 2. Methods

2.1. Sampling and collection of ecological data Herring samples for paper I were collected between 1978-1980. The samples were originally used for studying genetic differentiation among herring pop- ulations based on loci using starch gel electrophoresis (14, 50). Interestingly, we could reuse them for whole-genome sequencing after 30 years. Hence these samples have literally witnessed transitions in genomic technologies in last few decades. For paper II, we used additional herring samples collected recently in 2012-2013 by our collaborators. Muscle tissues of each fish were used for DNA isolations. For transcriptomics studies, RNA extractions from multiple tissues (for instance, skeletal muscle, brain, eye, spleen, intestine and kidney) were used. For paper III and IV, Darwin’s finch samples were collected on the Galápagos archipelago and Cocos Island by our collaborators. Similarly, closely related were collected on main- land Barbados to be used as out-group. The birds were captured in mist nets and released after collecting blood samples on FTA papers. DNA was isolat- ed from pieces of these FTA papers. For paper V, our collaborators provided ruff population samples and DNA from a fresh blood sample required for genome assembly was collected from the Helsinki Zoo. The DNA/RNA extractions were done using current standard protocols (for details, see methods for each paper).

A successful study on adaptation genomics is only possible by integration of the data from long-term life history and other ecological traits, history of and field experimental studies. Our collaborations with ecol- ogists and evolutionary biologists helped us not only to collect samples for genetic studies but, most importantly, allowed us to access their rich ecolog- ical knowledgebase, establish research hypothesis and experimental designs. In addition, they were also crucial for interpreting the results and deciding on future directions.

28 2.2. Whole-genome sequencing All studies within this thesis involved whole-genome sequencing of multiple populations of a particular species. We used two different approaches for genome sequencing of samples (individually or pooled). Individual sequenc- ing is a standard approach where DNA of every individual is sequenced sep- arately. For pooled sequencing, DNA of multiple individuals in each popula- tion was pooled together in equimolar concentrations and the resulting pooled DNA is sequenced. Sequence coverage11 is an important aspect that determines degree of confidence for discovery of genetic variants in such studies. For individual sequencing, we targeted the sequence coverage to ~10X, and for pooled sequencing to ~ 30X. Choice of individual or pooled sequencing depends on the research question. Pooled sequencing is a cost- effective method to scan genomes of multiple individuals across many popu- lations to estimate allele frequency differences and identify most informative SNPs whereas individual sequencing provides additional advantages of per- forming haplotype based demographic inferences and other intra- and inter- population genetics analysis. We used Illumina next generation sequencers for whole-genome sequencing, which is a commonly used platform for these studies.

2.3. Variant discovery and downstream analysis In the current era of wide-scale applications of NGS technologies in almost every biological discipline, methods and pipelines for analyzing these data have become quite robust and streamlined. After receiving the raw data (as short sequence reads of ~125 bp in FASTQ format) from Illumina sequenc- ing platforms, we first started the quality control (QC) steps by removing low quality sequences (generally at the start and end of the reads), trimming Illumina sequence adaptors, checking for possible contaminations and other errors during library preparation and sequencing protocols. These QC-passed sequences were then aligned against a reference genome assembly and se- quence alignments for each individual/pool were compared against each other to identify genetic variants. We used stringent in-house filtering pipe- lines to select high-confidence calls from raw genetic variants. Single nucle- otide polymorphisms (SNPs) were the most commonly used variants for genetic characterizations of the populations. Apart from SNPs, we also screened for other genetic variants such as small insertions and deletions (Indels) and large structural variants (inversion, deletion, duplication and translocation).

11 average number of reads sequenced for each base in the genome

29 2.4. Genome-wide screen for genetic differentiation SNPs were used to generate the phylogenetic trees, which demonstrated vital information regarding genetic diversity within and between populations. These trees also provided genome-wide perspectives on the evolutionary history of populations. Phylogenetic relationships also guided us towards the choice of methods to be used for screening genetic differentiation among these populations. In addition, statistics generated from such phylogenetic inferences were also used to estimate time since divergence between differ- ent populations and from the most recent common ancestor (MRCA).

In our studies, we grouped populations according to a particular ecological adaptation they shared, for instance, Atlantic herring populations adapted to certain salinity conditions or spawning seasons (paper I and II); Darwin’s finches with certain beak shapes (paper III) or beak sizes (paper IV) or male morphs in ruff with different reproductive strategies (paper V). We then compared the allele frequency at each SNP across genomes of these groups to identify regions with increased genetic divergence between them. Such regions with high genetic divergence were potential candidate loci associated with certain phenotypic traits that allowed populations to adapt to a particu- lar environment.

2.5. Underlying genetic basis for adaptive divergence Once genomic regions with increased divergence between populations were identified, an important step ahead was to identify candidate genes/ within the region that might contribute to phenotypic differences. In majority of the cases, it was challenging because these genomic regions were often large and contains multiple genes. Hence, such genomic regions needed careful, fine scale examination using different methods of and functional annotations of the genetic variants before coming to any robust conclusions.

Genetic variation at candidate loci under selection for particular adaptive trait of interest must be shared between populations possessing a similar trait. To identify this shared genetic variation, we looked into the within these regions in each population. Scanning haplotypes from multiple populations allowed us to locate the core IBD12 segment associated with a phenotype. Having narrowed down the candidate region, we further annotat- ed variants within this IBD segment to get overview of gene content and

12 A DNA segment in two or more individuals that is identical by descent due to same ances- tral origin

30 genomic distribution of these changes (e.g. intergenic, up- stream/downstream, intronic, synonymous or non-synonymous) to under- stand what particular genomic category has been enriched in the candidate regions. Such analysis helped us to understand relative importance of genetic variations in regulatory and coding sequences, which has been one of the long-standing questions in evolutionary genetics.

We also analyzed gene ontology (GO13) terms for genes within these candi- date regions to understand if any of the GO terms are over (or under) repre- sented. These analyses provided information regarding biological pathways associated with candidate regions and their contributions in the development of an adaptive phenotype.

We also examined sequence conservation of these genetic variants across published vertebrate sequence databases. The genetic variants showing high divergence between populations that were also highly conserved across the vertebrates were likely to possess functional significance underlying ecolog- ical adaptation. In addition, comparison against out-group populations al- lowed us to identify derived (or ancestral) state of the candidate locus in each population. Such analysis helped to understand evolution of candidate regions associated with the trait of interest.

2.6. Further characterization of candidate loci Once genomic regions that were highly differentiated between populations were identified using bioinformatics approaches, we further selected diag- nostic SNPs from each of these regions. These SNPs were genotyped in a set of individuals from each population using either a single SNP genotyping assay or custom-made SNP chip, depending upon the number of SNPs being used. Genotyping individual samples allowed us to confirm associations detected by whole-genome sequencing and to study the differentiated region in more detail. Genotyping data was also used to examine individual haplo- types in populations that were only sequenced in pools before, providing a better characterization of the differentiated locus. In the presence of quantita- tive phenotypic measurements, these data were also used to perform regres- sion analysis to study their association with phenotypic diversity (paper III and IV) and estimate the amount of phenotypic variance explained by the candidate locus.

13 Standard description of gene products in terms of their associated cellular component, biological processes and molecular functions (104)

31 3. Results and Discussion

3.1. Herring 3.1.1. Paper I Previous genetic studies in herring were limited to allozymes, mtDNA, mi- crosatellites or few numbers of SNPs and high resolution genomic resources were lacking. A major hurdle undertaking such studies in any non-model species like herring is to generate a high quality genome assembly. Generat- ing long insert size libraries for large eukaryotic genomes are still costly and prone to error. Additionally, creating linkage maps and anchoring the correct order of sequences along the chromosomes is technically challenging. In this study, we developed a cost effective strategy that does not require a high- quality genome assembly for such genome-wide studies in non-model spe- cies (Figure 9).

RNA seq reads

De novo transcriptome assembly

Transcript Contig

Align genomic reads III to transcript Genome reads where both pair map to transcript Genome reads where only one in pair maps to transcript Exon Unmapped member of a pair

Local reassembly of the mapped reads and their unmapped pair

Exome contig Figure 9 Exome assembly pipeline, adapted from (69)

The pipeline of this method first required a transcriptome, which was gener- ated by cDNA sequencing of skeletal muscle from a Baltic herring. Further- more, it utilized whole-genome sequences for alignment against transcripts

32 and added pieces of flanking sequences on both sides of exons to extend the transcriptome into an exome14 assembly. An exome assembly provided better alignability of genomic reads at the exon/ boundaries in comparison to a transcriptome assembly and also covered additional features of the genome by adding flanking intronic/intergenic sequences. Although this exome as- sembly was only ~ 6-8 % of whole genome (Table S1, paper I), most im- portantly it captured part of the coding sequences and flanking non-coding regions in the genome. This exome assembly was further used as a reference to characterize genetic variation in Atlantic herring populations.

A

100

90

80

70

60

50

40

Allele frequency (%) frequency Allele 30

20

10

0 AHA AHN AHS AHK AHB BHG BHV BHK

B

100

90

80

70

60

50

40 Allele frequency (%) frequency Allele 30

20

10

0 AHA AHN AHS AHK AHB BHG BHV BHK Figure 10 Example of SNP showing no significant differentiation (A) and highly significant differentiation (B) among populations

14 exons obtained from assembled mRNA transcripts with flanking sequences derived from genomic reads

33 We carried out whole-genome sequencing in eight pools of 50 fish each, sampled from geographic locations across Baltic Sea and Atlantic Ocean (Figure 1A, paper I). Mapping genomic reads against the exome assembly identified ~440,000 SNPs among these populations. Consistent with the results of low genetic differentiation among herring populations from previ- ous studies, the great majority of these SNPs did not show significant allele frequency differences (Figure 10A). A small set of SNPs (2-3%) showed highly significant allele frequency differences (Figure 10B) and a phyloge- netic tree based on these SNPs showed clear separation of populations (Fig- ure 2B, paper I). We validated these results from pooled sequencing by indi- vidual genotyping of the same fish samples using a selected set of “neutral” and “differentiated” SNPs (Figure 2G, S2 and S3, paper I). These results provided for the first time, evidence of genetically differentiated populations in Atlantic herring.

A highly debated question in population genetics is whether the observed genetic differentiation is due to drift or natural selection. We did a simula- tion study to compare distribution of FST in our data with the one expected for selectively neutral loci. The results indicated that there was a significant excess of loci showing extreme FST values in our genome-wide screen than was expected under a genetic drift model (Figure D, paper I). The results confirmed that genetic drift has played a subordinate role in herring popula- tions, consistent with their large population size and natural selection is pos- sibly the major factor underlying genetic differentiation in herring.

This study was an important step in genetic characterization of population structure in Atlantic herring. Identifying loci under natural selection from the background of genetic differentiation due to drift has always been a chal- lenge and Atlantic herring could be used as an excellent model system to address such issues.

3.1.2. Paper II Paper I provided us a preliminary screenshot regarding SNPs showing sig- nificant allele frequency differences. In order to characterize the total num- ber of independent loci in the genome that showed strong genetic differentia- tions among herring populations, to understand the patterns of genetic differ- entiation and annotate such genomic regions, a genome assembly was re- quired. In this study, we generated a high quality herring genome assembly in collaboration with research groups from BGI, China and SciLifeLab, Upp- sala.

34 Atlantic herring are exposed to diverse ecological conditions (for instance, salinity differences across Baltic Sea and Atlantic Ocean) and show a con- siderable variation in spawning time. In the context of genetic drift playing insignificant role in this species, herring provided us with a unique oppor- tunity to explore the genetic basis of ecological adaptation in natural popula- tions. In addition to the eight populations sequenced for paper I, we carried out whole-genome sequencing of 11 more populations across Baltic Sea and Atlantic Ocean and also included a population of Pacific herring as an out- group (Figure 1A, paper II). The reference herring genome assembly was used to analyze genetic differentiation among these 20 populations.

BÄV Baltic Sea BR BH Kattegat BU Skagerrak North Sea BÄS Atlantic Ocean

BA BC

BG

BK

BV

BF*

BÄH*

KT SH NS*

AB1 SB KB

AI

Figure 11 Neighbor-joining tree of Atlantic herring populations using ~6.5 million SNPs across the whole genome

Previous studies have described divergence of Atlantic and Pacific herring as due to geologic evolution of the Arctic-North Atlantic basin (70). Using the data from mtDNA, we estimated the date of split be- tween Pacific and Atlantic herring to be ~2.2 million years ago. A phyloge- netic tree provided a good indication on the genetic similarity of Atlantic herring populations and their divergence from Pacific herring (Figure 1C, paper II).

Although phylogeny of Atlantic herring was almost star-shaped, there were indications of genetic differentiation among populations (Figure 11). For instance, autumn-spawners from Baltic Sea clustered separately from spring-

35 spawning populations, which was possibly due to certain genomic regions that were distinct between autumn- and spring-spawning herring. In order to identify such regions, we analyzed the allele frequencies of each SNP across the genomes and identified ~100 independent loci that showed significant allele frequency differences between autumn- and spring-spawners (Figure 4A, paper II).

PSMC1 FOCN3 ZMPB NRXN3B

GTF2A1 KCNK13 TSHR CEP128 200

-log10P Autumn vs Spring P 10 (East Atlantic / Baltic herring) 100 -log 0 60 Figure 12 Genetic differentiations across the TSHR locus between autumn- and spring-spawners

The strongest signal overlapped TSHR that has an established role in photo- periodic regulation of reproduction in birds and mammals (71) (Figure 12). One of the major differences between populations that either spawn in spring or autumn was their response to varying day lengths and genetic variation in TSHR locus possibly had an important contribution towards this adaptation. Other strong signals included CALM, SOX11 and ESR2a (Figure 4C, paper II) that also had important functions in reproductive biology. The strong genetic differentiation at these loci indicated that they were major loci asso- ciated with preference of spawning season for a particular population.

Using similar methods, we identified ~500 independent loci showing strong genetic differences between populations adapted to Atlantic Ocean and Bal- tic Sea (Figure 3A, paper II). One of the important differences between these two is the presence of a steep salinity gradient. It changes from 35‰ in the marine water in Atlantic Ocean to ~20-32‰ in Kattegat/Skagerrak zone and down to ~3-12‰ across the brackish Baltic Sea. Hence, certain genetic differences between these populations must be associated with salini- ty tolerance. Interestingly, one of the strong differentiated region overlapped PRLR (Figure 13) which had previously been associated with osmoregula- tion in fish (72). Another example of strongly differentiated region was around hatching enzyme HCE, whose association with salinity had been reported previously (73). HCE might be an important locus for adaptation, as hatching eggs in brackish condition for a marine fish must be a challenging stage of development.

36 Genetic differentiation across PRLR

● ● ●

● ● ●● ● ●●● ●● ●● ● ●

60 ● ● ● ● ● ●

● ● ● ● ●● ● ●●

● ●● ● ● ● ●●

● ● 40

●● ● ●

● ●

● ● ● ● ● ●●●

●● ● ● ● ● ● ● ●

20 ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●

Degree of genetic differentiation, − log10P Degree of genetic differentiation, ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ●● ● ●● ● ● ●● ● ● ● ●● ●● ●● ● ●● ● ● ●● ●●● ● ● ●● ●●● ● ● ●● ●● ● ● ● ●●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ●●●●● ● ● ● ●●● ● ●● ● ●●●●● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ●●● ●●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●●● ● ●●●● ● ● ● ● ●● ●●● ● ●● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ●●●●●●●● ●●● ● ● ● ●●● ● ● ● ●●●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ●● ●● ● ● ● ● ●●●●● ● ●● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ●●● ●●●●● ●●●● ● ● ● ●● ●● ● ● ● ●●● ● ●● ●●● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●●●●●●●●●●● ●●● ●●● ●● ● ●● ●● ● ●●● ●● ●●● ● ● ● ●●●● ●● ● ● ● ● ● ●●● ● ● ●● ● ● ●●●●● ●● ● ●● ● ● ●● ● ● ●● ● ● ●● ●● ● ●●● ●●● ●●● ● ● ●● ●●● ●●●●●●●● ● ●●● ●●●●● ●●●● ● ●● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ●●●● ● ● ● ● ●● ●● ●●●● ●●●●●●●●●●●●●● ● ●●● ● ●●●●●●● ●● ●●● ●●● ●●●●●●●●● ●● ●● ●● ● ●●● ● ●●● ●●● ● ● ●● ● ●●●● ●●● ● ● ● ●●● ● ●● ●●● ●● ●●● ● ● ● ● ● ●● ●● ●● ● ●● ●● ● ●● ● ●● ● ●●●●●● ● ● ●●●●●●● ●●●● ●●● ● ● ● ● ●●●●●●● ●●●● ●● ●●●●●●● ●● ●● ● ● ● ●● ●● ●●● ●● ● ● ●● ● ●● ● ● ● ● ●● ●●● ● ● ●● ●●●●● ●●● ● ●● ●● ● ●●●●● ● ●●● ●● ●● ●●● ● ● ●● ● ●● ●●● ● ● ● ●●● ●●●● ● ● ●●●● ●●●●●●●● ●● ●●● ●●●●●●● ● ● ●●●●●●● ●●●●●● ●●●● ●● ●●●●●●●● ● ● ●●●●●●●● ●●●●● ● ● ●● ●●●● ●●●●●● ● ● ●● ●● ●●● ● ●● ●● ●●●●●● ● ●● ● ● ●● ●●● ● ●●● ●●●●●●●●● ● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ●●● ● ●●●● ●● ●●●●● ● ●●● ● ●●●●● ●●● ● ●● ● ● ●● ● ●●●●● ●●●●●●●●●●●● ●●●●●● ● ●●●●●●●● ● ● ●●● ●●● ●●●●●●●● ●● ● ●●● ●●●●●●● ●● ●●● ●● ●● ●●●●●●●●●● ● ● ●●● ● ●● ●● ●●●●●● ● ●●● ● ●●●●●●●● ●● ●● ●●●● ● ●●● ● ● ●●● ● ●●●●●● ●●●● ●● ●●●●●●● ●●● ●●●●●●● ● ●●●●● ● ●●●●●●●●●●●●●●●● ● ●●●●● ●●●● ●●●●●●●●●● ●●●● ●● ●●●●●●●●●● ● ●●● ●●●● ●●● ●●●●●●●●●●●●●●● ●●●●●●● ●●● ● ●●● ●●●●●●●● ●● ●● ● ● ●●●● ●●●● ● ● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ●●● ●●●● ● ●●●●●●●●●● ● ●●●● ● ● ●● ●●●●●●● ●●●●● ●●● ●● ●●●●●● ●●●●●●●● ● ●●●●●●●●●● ●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●● ●● ●●●●●● ●●● ●●●● ● ●●● ● ● ● ●●●●●●●● ●●●●●●●●●●●●●●●●●● ● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●● ● ●●●●●●●●● ●●●●●●● ● ●● ● ● ●●●●●●●●●● ●●● ● ●●●●●●● ●●●●●●●●●●●●●● ● ●●● ●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●● ●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●● ●●●●●●● ●●●●●●●●●● ● ●●● ●● ● ● ●● ●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●● ●●●●●●●●●●●●●●● ●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●●●● ●●●●● ● ● ●●● ●●●●●●● ●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 0

0 1000 2000 3000 4000 5000 SNP position, ~ 1.4 Mb Figure 13 Genetic differentiations across PRLR locus between populations adapted to Baltic Sea and Atlantic Ocean

The great majority of loci associated with either difference in choice of spawning season or adaptation to different habitats (Atlantic and Baltic) showed strong differentiation signals but no noticeable differences in the surrounding regions (Figure 12, 13). This result was consistent with low genetic differentiations in most regions of the genome and natural selection associated with ecological adaptation causing strong genetic differentiation at particular loci among herring populations.

There had been a lot of discussion in the past regarding relative importance of coding and regulatory mutations associated with a change in particular phenotype (74). We analyzed genomic distribution of loci showing strong differentiations and identified strong enrichment of non-synonymous as well as changes in UTR’s and 5kb upstream and downstream of the gene (Figure 6, paper II). The result indicated that both coding and regulatory changes contributed to adaptation in herring. The question regarding which of them contributed more could perhaps only be answered once the effect size of each of these loci is quantified.

The majority of loci that showed strong genetic differentiations occurred in larger blocks (Figure 12). We considered two possible explanations for oc- currence of such larger blocks. One, it may be a result of recent selective sweep leading to genetic hitchhiking of neutral variants linked with causal

37 mutations. Second, such regions may include multiple causal mutations as- sociated with the adaptive trait and to avoid the favoured haplotype to break up, there is selection for suppressed recombination and thus leading to large haplotype blocks. To explore more into these hypotheses, we compared nu- cleotide diversity in highly differentiated regions against random regions of the genome. The was significantly higher in larger blocks of highly differentiated regions compared to random regions both within and between divergent populations (Figure 5, paper II). These results indicated that highly differentiated regions occurring in larger blocks may possess multiple causal mutations and possibly were maintained during an evolutionary process rather than being a result of genetic hitchhiking due to recent selective sweep.

An important aspect of this study was the estimation of allele frequency in each population using a pooled-sequencing approach. Since the relative con- tribution of each individual in a pool might not be equal, an obvious question that might arise was regarding the reliability of these estimates. Therefore, results obtained from whole genome sequencing were verified by genotyping ~30 individual fish from each population using a custom-made 70K SNP chip. The same individual samples used for pooled–sequencing were used for genotyping. There was an excellent correlation in the allele frequency estimates from whole genome pooled sequencing and individual genotyping (Figure 3-S1, paper II), which indicated the reliability of allele frequency estimates obtained from pooled sequencing.

This paper constitutes a major advance regarding the genetic characterization of Atlantic herring. The availability of high-quality genome assembly and the annotation database of genes and population variants will be important resources for further studies in this species. Most importantly, it has provid- ed a comprehensive view on the genetic architecture underlying ecological adaptations in herring and these results could be of seminal influence for designing adaptation studies in any natural population. This study has demonstrated that variation in spawning time in herring populations is not just a but genetic factors must contribute. It has high- lighted herring as a unique model organism where a large proportion of the genome shows minute genetic differentiation between populations. In con- trast, natural selection is strong enough to cause genetic differences at loci underlying adaptation.

38 3.2. Darwin’s finches 3.2.1 Paper III In this study, we have done extensive genomic characterization of the entire Darwin’s finch radiation (Figure 6) by whole-genome sequencing 120 indi- vidual birds that included all currently recognized species and two of their close mainland relatives. For certain species, we had sequenced samples from multiple islands (Extended data table 2, paper III) to understand how populations adapt and evolve under different environmental conditions. In contrast to herring, a draft genome assembly from one species of Darwin’s finches (medium ground finch) had been already generated (75) and we used this genome as a reference for our study.

Traditional of Darwin’s finches was based on their (Figure 14A) and previous phylogenetic studies have been limited to mtDNA and few microsatellites. Using our dataset of ~45 million sites that were variable within or between populations, we generated a high-resolution that was in general consistent with current taxonomy but had several interesting new features (Figure 14B).

39 A small ground finch (G. fuliginosa)

medium ground finch (G. fortis)

large ground finch (G. magnirostris)

cactus finch (G. scandens)

large cactus finch (G. conirostris)* Ground finches Ground sharp-beaked finch (G. difficilis)*

small tree finch (C. parvulus)

large tree finch (C. psittacula)

medium tree finch (C. pauper)

woodpecker finch (C. pallidus) Tree finches Tree vegetarian finch (P. crassirostris)

cocos finch (P. inornata)

grey warbler finch (C. fusca)

green warbler finch (C. olivacea) B green warbler finch (C. olivacea)

grey warbler finch (C. fusca, Espanola)

grey warbler finch (C. fusca, San Cristobal)

vegetarian finch (P. crassirostris)

cocos finch (P. inornata)

sharp-beaked finch (G. difficilis, Pinta Fernandina and Santiago)* Ground finches large tree finch (C. psittacula) woodpecker finch (C. pallidus) mangrove finch (C. heliobates)

medium tree finch (C. pauper) Tree finches small tree finch (C. parvulus) sharp-beaked finch (G. difficilis, Wolf and Darwin)*

large cactus finch (G. conirostris, Espanola)*

large ground finch (G. magnirostris) cactus finch (G. scandens)

large cactus finch (G. conirostris, Genovesa)* Ground finches

sharp-beaked finch (G. difficilis, Genovesa)* medium ground finch (G. fortis) small ground finch (G. fuliginosa)

outgroups

Figure 14 Phylogeny of Darwin’s finches (A) traditional taxonomy, adapted from © Rands et al.; licensee BioMed Central Ltd. 2013 (76) (B) based on whole-genome sequence data from our study. The species marked with asterisks were the ones that deviated from the expected traditional phylogeny.

The populations of sharp-beaked ground finches (Geospiza difficilis) from six different islands did not form a monophyletic group but rather separated into three distinct clusters. This result was consistent with an earlier version of taxonomy where these three groups were classified as separate species (77, 78). An ABBA-BABA test15 designed to study the evidence of intro- gression confirmed that G. difficilis populations from Wolf and Darwin is-

15 Study of the pattern of shared derived variants using a four- test (105)

40 lands were species of mixed ancestry. The results indicated that majority of their genome is possibly shared with G. magnirostris or a similar close rela- tive and portion of the genome affecting phenotypic characters was derived from ancestral G. difficilis population (Figure 2b, paper III). Interestingly, beaks in G. difficilis populations from Wolf are not as sharp-pointed as other populations but demonstrate slightly curved-shaped mandible which is a characteristic feature in blunt-beaked birds such as G. magnirostris (Figure 15 A). This typical beak morphology possibly supports the hypothesis of its mixed ancestry.

A G. difficilis (Pinta) G. difficilis (Wolf) G. magnirostris

B G. conirostris (Genovesa) G. conirostris (Espanola)

Figure 15 Some species of Darwin’s finches, images reproduced from "How and Why Species Multiply: The Radiation of Darwin’s Finches by Peter R. Grant & B. Rosemary Grant, Copyright © 2008 Princeton University Press. Reprinted by per- mission.

Similarly, populations of the large cactus finch (G. conirostris) from two islands, Genovesa and Española did not cluster together in the phylogenetic tree (Figure 14B). This result was consistent with differences in their beak shape, as the population from Genovesa has pointed beaks like G. scandens whereas Española population has blunt beaks similar to G. magnirostris (Figure 15B). Due to these discrepancies with traditional phylogeny, we proposed a revised taxonomy for sharp-beaked ground finches and large cactus finches (Supplementary text, paper III).

The most striking morphological diversity among Darwin’s finches are their beaks (Figure 6), which have been a popular subject of interest since the famous visit to Galápagos by Charles Darwin almost 200 years ago. Ge- nome-wide comparisons among closely related species with blunt and point- ed beaks (G. magnirostris/G. conirostris-Española vs. G. conirostris-

41 Genovesa/G. difficilis-Wolf, Figure 15) identified 15 candidate regions with striking allele frequency differences between these birds (Figure 3a, paper III). There were six genes (CALM, GSC, RDH14, ALX1, FGF10 and FOXC1) within these 15 regions that had been previously associated with craniofacial and/or beak development in birds and mammals (79–85).

The strongest signal was a 240-kb region overlapping ALX1 (Figure 3a,b paper III) where blunt beaked G. magnirostris and G. conirostris were ho- mozygous for one haplotype (Figure 3d, paper III). Even though six popula- tions of G. difficilis were distinct for major part of their genome (Figure 14B), they all shared the “pointed” beak haplotype at this locus, as per their species name suggests (sharp-beaked ground finch). A phylogenetic tree based on this 240 kb region indicated deep divergence between “blunt” and “pointed” beak haplotypes that occurred soon after the split between warbler and non-warbler finches (Figure 3c, paper III).

The ALX1 locus was segregating in medium ground finches (G. fortis) (Fig- ure 3c, d paper III). Interestingly, field observations had shown that there was considerable beak shape diversity among medium ground finches (86). This provided an opportunity to study segregation at the ALX1 locus within a species. We genotyped an additional (n=62) medium ground finch for a di- agnostic SNP and observed significant association with beak shape (Figure 3e, paper III). The two homozygotes had significantly different beak shapes measurements whereas heterozygotes were more intermediate, suggesting an additive effect of this locus on beak shape.

This study is a classic example on the usage of recent advancements in popu- lation-scale genomics coupled with detailed knowledge on ecology and evo- lutionary history collected over decades from extensive fieldwork to gener- ate new insights into some age-old questions in evolutionary biology.

3.2.2. Paper IV Multiple species may coexist in geographically overlapping habitat. When- ever there is competition between them for limiting resources, differences among species are accentuated, allowing divergence of their resource ex- ploiting traits. Such ecological character displacement (87, 88) has been regarded as an important component of speciation. It has been difficult to obtain evidence of these events in nature (89, 90) and in many cases it has only been inferred. But, in a landmark study by Peter and Rosemary Grant (91), they actually saw it happening. Due to severe drought on the Daphne island in Galápagos during 2004-05, food availability declined, newly ar- rived large ground finches with large beaks dominated the feeding ground and with near removal of large seeds, larger-beaked birds among medium

42 ground finch did not have enough food to survive. Smaller-beaked birds survived better as they could eat small seeds that were relatively unreward- ing to large beaked birds. This trait was passed onto future generation, as there was a strong shift towards smaller beak size in the next two genera- tions.

Beak size was identified as a major factor affecting survival of medium ground finches during this episode of drought (91). Survivors and non- survivors did not differ in beak shape; hence as expected ALX1 locus that regulates variation in beak shape (paper III) was not associated with surviv- al. In this study, we took advantage of a unique feature in Darwin’s finches; presence of triplet species only differing in size-related traits; large, medium and small ground and tree finches (Figure 1A, paper IV) to understand the genetic basis associated with beak size diversity and its role in character displacement during this particular episode of drought.

We sequenced ten additional birds from each of these large, medium and small ground and tree finch species to ~10X coverage and conducted a ge- nome-wide FST screen that revealed seven independent genomic region showing significant genetic differences among these birds (Fig 2A, Table S2, Fig S1, paper IV). The strongest signal was a ~525 kb region overlapping HMGA2, which was an excellent candidate locus for differences in growth patterns, as this gene is associated with human height, craniofacial distances (92), growth retardation in pygmy mouse (93) and body size in dogs (94). The ~525 kb HMGA2 locus revealed two major haplotypes associated with large and small birds that split even before the divergence of warbler and non-warbler finches about a million years ago (Figure 1C, paper IV). Most species appeared to be fixed for one haplotype but as expected, this locus was segregating in medium ground and tree finches.

As body and beak size are correlated traits, an important question was whether this HMGA2 locus was primarily associated with body size or beak size, or both. Similar as in paper III, we genotyped a diagnostic SNP from the HMGA2 locus in 133 additional medium ground finches and the results demonstrated a highly significant association with beak size rather than body size (Figure 2E, paper IV). Similar to the ALX1 locus, HMGA2 also had an additive effect with heterozygotes showing intermediate beak size compared with the two homozygotes.

Further, in order to investigate the possible role of HMGA2 locus in charac- ter displacement episode during the drought in 2004-05, we genotyped 71 medium ground finches from Daphne that had experienced the drought. Thirty-seven of these birds had survived the drought and 34 had died. Indi-

43 viduals with the SS genotype (associated with small beaks) survived best, LL individuals (associated with large beaks) survived worst and heterozygous had an intermediate survival rate (Table 2F, paper IV). These results indicat- ed that HMGA2 was a major locus controlling beak size that had facilitated rapid diversification of an adaptive trait during a character displacement episode.

Paper III and IV together have provided evidence of two loci with major effects on beak morphology (shape and size) that were the most important traits involved in major evolutionary changes during the adaptive radiation of Darwin’s finches.

3.3. Ruff 3.3.1. Paper V Ruff shows a remarkable lekking behavior where three different male morphs have evolved a unique courtship behavior to attract and gain access to females that are ready to mate. In order to understand the genetic basis of phenotypic differences among male morphs, we first generated a genome assembly from an independent male and further whole-genome sequenced an additional 15 Independents, nine Satellites and a single Faeder male. Com- paring genomes of Independents and Satellites identified a strongly differen- tiated 4.5 MB region (Figure 16, upper panel). Apart from this region, there was no significant differentiation between these male morphs in the rest of genome (Figure 1c, paper V) indicating that this might be the locus control- ling the observed phenotypic differences between the morphs.

44 Independents vs Satellites 0

5

10 ST 15 ZF Highly differentiated 4.5 MB region 20

25

Highly differentiated region associated with 4.5 MB inversion Independent

Satellite chromosome

Diagnostic test for inversion

Independent chromosome

~ 750 bp

Satellite chromosome

~ 1000 bp Satellites and Faeders are heterozygous for the inversion

Independents Satellites Faeders

~ 1000 bp

~ 750 bp

Figure 16 Identification of 4.5 Mb inversion associated with Satellite and Faeder morph in ruff, adapted from (95)

Presence of a large region showing genetic differentiation may be explained by structural changes, in particular inversions. These structural changes re- duce frequency of recombination in a region and allow accumulation of mul- tiple changes leading to a large region of genetic differentiation. We screened for the presence of such structural changes throughout the genome using the sequence data and identified a 4.5 MB inversion in Satellites, per- fectly overlapping with the differentiated region identified in the genome- wide screen (Figure 16, middle panel). A diagnostic PCR-based test con- firmed the break points of this inversion and showed that all Satellites and Faeders were heterozygous for this inversion (Figure 16, lower panel). The inversion disrupts the CENPN, a gene that encodes the centromere protein N and its inactivation has severe deleterious effect in other species (96, 97). Therefore, we hypothesized that this inversion was recessive lethal and maintained by balancing selection. A recent study on pedigrees from a cap- tive ruff population has now confirmed that this inversion was in fact reces- sive lethal (98).

45 As the inverted region was large with ~90 genes, it was challenging to iden- tify candidate genes that might be associated with phenotypic differences. Independents have higher level of circulating testosterone in comparison to Satellites and Faeders (98) which possibly is associated with their aggres- sive behavior in leks. Hence, genes within this inversion that were associated with steroid metabolism were obvious candidate genes to look into.

Additional screening of structural changes within this inversion identified three heterozygous deletions in Satellites and Faeders that clustered in vicin- ity of HSD17B2 and SDR42E1 genes and deleted evolutionary conserved sequences (Figure 2d, 2e and Supplementary Figure 6, paper V). These two genes have important roles in the metabolism of sex hormones; for instance, HSD17B2 catalyzes conversion of testosterone to less active keto-forms. We postulated that these deletions constitute regulatory mutations affecting ex- pression of these genes. For instance, the deletions around HSD17B2 in Sat- ellites and Faeders possibly leads to overexpression of this gene, which en- hances the degradation of testosterone leading to submissive behavior in leks. We also identified additional structural changes unique to Faeders (Supplementary Figure 7, paper V), which may contribute to its female- mimicking phenotype.

Apart from differences in behavior among the male morphs, they also pos- sess striking diversity in plumage color (Figure 8B). One of the candidate genes associated with pigmentation, MC1R, was located within the inverted region. Coding changes in MC1R have been associated with plumage varia- tion in several species of birds (Figure 3, paper V). We found that the Satel- lite MC1R sequence consisted of four missense mutations at positions that were identical among birds and mammals (Figure 3, paper V). We proposed that MC1R allele on Satellite chromosome and possible altered metabolism of sex hormones was associated with white color of ornamental feathers in Satellites. It was surprising that a previous study in ruff (68) did not identify any coding changes in MC1R that were associated with plumage color, prob- ably due to errors in phenotyping or sequencing.

A closer examination of the inverted region revealed disruptions in the strong genetic differentiation between Satellite and Independent chromo- somes at two regions (Figure 2a, paper V). Remarkably, a comparison of Satellite and Faeder chromosomes showed a mirror image with these two regions showing high genetic differentiation. We postulated that the Satellite chromosome arose by one or two rare recombination events between an In- dependent and a Faeder-like chromosome leading to such pattern of genetic differentiations (Figure 2f, paper V). Estimation of evolutionary age indicat- ed that first inversion event occurred ~3.8 million years ago whereas second recombination event must have happened ~520,000 years ago.

46 This study has demonstrated how structural changes like an inversion of a chromosomal section can have profound effect on the evolutionary trajectory of an organism. The inversion constituted the starting point of an evolution- ary process leading to suppression of recombination and allowing for accu- mulation of subsequent adaptive changes that ultimately led to the formation of three different male morphs in ruff.

47 4. Conclusion and future perspectives

The genetic basis of adaptive traits has been an elusive long-standing ques- tion in the field of evolutionary biology. But, progress in such studies were always hampered by the absence of a genomic perspectives and lack of enough genetic markers. Enormous progress in high-throughput sequencing technologies in the last decade has allowed sequencing genomes at a popula- tion scale and generated toolkits that have facilitated the integration of eco- logical and genomic data for any species. This PhD work has successfully utilized these methodological advancements in the areas of whole-genome sequencing and has provided some unique insights into the adaptive traits that have amazed evolutionary biologists for decades.

First, this thesis has characterized the genetic architecture for ecological adaptations in Atlantic herring in unprecedented detail that may have im- portant implications for sustainable herring fishery management. Our results have established herring as a model organism to study natural selection where the noise due to genetic drift is of minor importance. Second, this thesis has revisited the iconic Darwin’s finches from Galápagos islands and revealed clues about the genetic foundation associated with phenotypic di- versity in beak shape and size that played a key role in shaping up the evolu- tion of Darwin’s finches. Finally, the study on ruff has highlighted the im- portance of large genomic structural changes in shaping up an adaptive trait, which we believe will be a textbook example for the evolution of alternative mating strategies in animals.

This thesis opens up a plethora of possibilities for future work. Herring pop- ulations mix when they migrate for feeding outside their spawning seasons. Fisheries, in general catch fish from such a mix and methods to distinguish a particular stock in these mixed catches are not well established. This might lead to overfishing of certain stocks, which has always been a challenging issue for herring fisheries. The comprehensive list of genetic markers identi- fied in our study can be used to develop genotyping panels that can be used to characterize and classify respective herring stocks outside their spawning seasons. Such genotyping panels can be very useful tools in stock assess- ment programs of Atlantic herring.

48 Our findings on the genomic regions that show strong differentiation be- tween autumn- and spring-spawning herring have indicated that spawning preference is not just influenced by the environmental conditions (for in- stance, availability of planktons, certain water temperature) but genetic fac- tors have an important role to determine when does a particular population spawn. Diagnostic genetic markers from such regions can be used to develop genetic tools to distinguish autumn and spring spawning herring that would aid in proper management of stocks with different spawning preference. Candidate genes identified within these regions include some of those that have key roles in photoperiodic regulation of reproduction in birds and mammals. Genetic basis of photoperiodic regulation is not so well studied in fish and our results can provide opportunities for detailed studies in these aspects. Photoperiodic manipulations have been used to control body growth, egg production and fertilization success in aquaculture for commer- cial purposes. The candidate genes and associated genetic markers identified in our study can be of interest for marker-assisted pro- grams for these commercial aquaculture species.

This study has addressed only two of the major questions in Atlantic herring (a) adaptation of a marine species to low salinity and (b) genetic factors as- sociated with timing of reproduction. There is a long list of research ques- tions that can be explored using the data generated in this study. For in- stance, populations in Baltic Sea and Atlantic Ocean are also adapted to dif- ferent feed resources, pathogen content, predators and temperature/light conditions. Whole-genome datasets generated in this study can be utilized to study the underlying genetic basis of such adaptations.

Autumn-spawning herring populations from southern-Baltic Sea used in our study that were sampled in 1970’s have been reported to have begun to dis- appear in recent times (99). An interesting aspect would be to study whether the whole population simply disappeared due to overfishing or if there was a change in spawning preference in this population. Additional sequencing of newly inhabited spring-spawning herring in these areas and their compari- sons to the autumn-spawning samples collected in 1970’s might throw light into this important biological question.

The ongoing study in Atlantic herring could include additional geographic regions that would broaden present knowledge regarding the evolutionary history and population structure of this species. For instance, there are au- tumn- and spring-spawning herring on the west side of the Atlantic Ocean as well. An interesting follow-up study would be to find out whether the results of genetic regions associated with spawning could be replicated in the West- Atlantic populations or if a different set of genetic variants control these phenotypes in those populations. These comparisons would allow us to study

49 of genetic factors associated with a common ecological adaptation in geographically distant populations.

Similarly, there are different populations of herring in the White Sea and the taxonomic status of these populations is not very clear. Occurrence of re- mote populations of Pacific herring in the White Sea is known, whereas At- lantic herring also penetrates the White Sea from the west (100). A study of mitochondrial DNA in these populations had indicated evidence of intro- gression from Atlantic herring to Pacific herring in northern European wa- ters (101). Whole-genome studies of these populations will certainly add a new dimension regarding ecological adaptation of two similar species (Pacif- ic and Atlantic herring) in similar environmental conditions and identify if there are genomic signatures of in these populations.

Our work on Darwin’s finches has revealed some of the underlying genetic variations explaining the diversity in beak shape and size which, probably is the most widely used illustration of morphological adaptation for exploita- tion of available food resource. In addition to beak diversity, Darwin’s finches also possess diversity in types of song they sing. Finch mating dy- namics is known to be under the influence of song and previous studies have associated song evolution to speciation and adaptive radiation in Darwin’s finches (102, 103). Extensive on-field recordings of song for many species have been done and it would be interesting to utilize the sequence data we have generated to explore the genetics behind the vocal signals in species with contrasting song types.

Our results have indicated that certain populations of the same Darwin’s finch species that reside in different islands do not share similar genetic ar- chitecture. This probably is associated with the adaptation to variable envi- ronmental conditions on each island. For certain species, we have not yet explored the populations from multiple islands (for instance, 12 different established populations of medium ground finches have been described in the entire Galápagos archipelago). It would be useful to sequence additional populations of these species to investigate whether they behave as a mono- phyletic or they possess different genetic architectures, as was the case for G. difficilis and G. conirostris populations in our study.

Our results have also provided evidence of widespread interspecific gene flow throughout the radiation. Field studies have shown evidence of inter- species hybridization between G. fortis and G. scandens on Daphne Island in Galápagos archipelago that has resulted in large changes in average beak shape over the period of 30 years. Whole genome sequencing of these birds from pre- and post-introgression period would allow us to trace the genomic changes in these species as a result of introgression and understand if the

50 previously identified locus (i.e. ALX1) is the only genetic factor involved in this phenotypic shift or if there are additional loci involved. In addition, sim- ilar introgressive hybridization has led to the formation of an entirely new reproductively isolated lineage in Daphne Island. Whole genome sequencing of these samples will provide further insights on the importance of introgres- sion among Darwin’s finches.

Our study in ruff has provided evidence of how major structural changes in the genome can lead to evolution of adaptive phenotypes. Results have indi- cated that an inversion itself and further accumulation of genetic changes within this inversion possibly altered the metabolism of sex hormones and pigmentation patterns, which were key for evolution of three male morphs. However, the molecular mechanisms behind these hypotheses remain yet to be unveiled. As a subsequent follow up, tissue samples can be collected from these birds during the breeding season and transcriptomics studies can be carried out. This will allow us to understand which of the genes within the inversion show differential expression among these male morphs during the breeding season. For instance, we have hypothesized that candidate genes such as HSD17B2 is over-expressed in Satellites leading to increased con- version of testosterone to less active keto-forms. These gene expression stud- ies could test for such hypothesis generated from our current study. In depth analysis of differentially expressed genes will also provide clues behind the metabolic pathways associated with these phenotypic changes. In addition, transcriptome data can also be used to improve the current version of ruff genome annotation.

Even though genetic basis of alternative mating strategies was our major biological question in ruff, there are additional aspects that are yet unex- plored. There is, for instance, a striking amount of phenotypic diversity in plumage color within Independents. We have access to collections of tissue samples and photographs from Independents with wide variety of plumage. Comparing genome sequences of the birds with different plumage types will be very useful in understanding the genetic mechanisms behind the plumage diversity in Independent males.

All three studies in this thesis have identified major loci associated with adaptive traits of interest. From a perspective, one im- portant task moving forward is to identify causal variants and understand the cellular and molecular mechanisms that are involved in generating the phe- notypic effects observed. The majority of markers in our study showing strong genetic differentiations are located either upstream or downstream of the gene, probably indicating that they may play an important role in gene regulation. Electrophoretic mobility shift assays (EMSA) is a technique that can be used to screen allele-specific gel shift differences in order to study

51 altered DNA-protein interactions and thus identify potential factor binding sites. These in-vitro studies will allow us to diagnose if such variants are candidate causal mutations that contributed to the phenotypic differences. Similarly, functional assays can be developed using wild type and mutated constructs to characterize missense mutations in the candidate genes showing strong genetic differentiations. tools such as CRISPR/Cas9 have generated considerable excitement in recent times for precise and reliable targeted changes in the genome. We can make use of such methods to manipulate candidate causal mutations in an appropriate model to mimic the trait of interest and study phenotype-genotype correlations in a controlled environment.

In today’s post-genomic era, high-throughput methods that translate vast wealth of data generated by genomics or transcriptomics projects into pro- files at each stage of the flow of biological information from DNA à RNA à proteins à protein interactions are in their infancy. The technological advancements of these methods and their wise usage in the future will allow us to characterize the entire multi-dimensional space of biomolecules that will provide a more complete picture of how an adaptive phenotype has evolved. As gene products function together in biosynthetic networks, these results can complement the work of scientists studying related networks in any species, even well beyond the field of adaptation genomics.

52 5. Acknowledgements

Leif, there are no enough words to explain how grateful I am to get an op- portunity to work with you. You have been an inspirational mentor to me. I can now proudly say that I have worked with you, one of the best in the world. The way you identify the research questions and design bril- liant ideas to solve them, have inspired and motivated me, literally everyday. Apart from being an outstanding scientist, you are also a great human being. Although you have a large number of personnel working with you on multi- ple projects, your tireless efforts to organize everything and taking care of everyone is one of those unique qualities, which I guess is a rarity. The re- search excellence and amazing environment in your group made my PhD never a “work” for me, but a sort of fun adventurous trip to unearth some hidden mysteries of the nature.

I had a fantastic opportunity to receive supervision from Calle and Alvaro, who are among the best in our business. Before I arrived in Sweden, I was just an ordinary veterinarian with limited knowledge in bioinformatics and research skills. I have learnt most of the workflow skills required for my PhD from both of you. Thank you for all your in-depth suggestions and criti- cal insights that has been the backbone of my PhD.

The cherishing aspect of this PhD has been the excellent collaborations with experts from all over the world. To get an opportunity to work alongside Peter and Rosemary Grant on the iconic Darwin’s finches has been one of those “dream come true” moments to me and I would like to extend my sin- cere gratitude to them for sharing their expertise and their vast wealth of knowledge. I also take this opportunity to thank Nils, Linda, Arild, Michele, Fredrik and Jacob for all the fruitful collaborations.

I am very grateful to Matt for all the insightful discussions and suggestions. You have been my primary resource for any technical questions regarding population genetics. Thank you for allowing me to knock your office door anytime I needed. Very special thanks to Nima, Jonas, Fan, Doreen, Mats, Susanne Kerje, Markus, Manfred, Görel and Marta for your valuable contribution to my thesis. Thank you Ulla, Jessica and Ulrika for all your help on wet-lab stuffs whenever I needed them. I have enjoyed the company

53 of brilliant researchers within our department and I am thankful to Patric, Andreas, Erik, Marcin, Ronnie and Miguel for your expert suggestions whenever I needed them. I would also like to thank Björn, Jacques, Marc and the whole SciLifeLab BILS, WABI and sequencing platforms for being extremely helpful collaborators. I am also grateful to all the personnel at IMBIM administration for taking care of all the practical aspects of my PhD program.

I would like to take this opportunity to thank Göran Andersson, my first mentor in Sweden. It was your suggestion that I should contact Leif for my master’s thesis project and the rest is history. Thank you for the valuable advices and insights every time I got an opportunity to meet you. Special thanks for sharing your rare books about Darwin’s finches. I would also like to thank Kerstin for being an inspiring personality and a source of motiva- tion.

I have been blessed with friends and colleagues who have made my experi- ence in the lab very special. Nima and Shady, we three have shared a long partnership, way back from our master’s. A heartfelt thanks for all your sup- port throughout, which has played an important role in my growth as a re- searcher. I have had some quality times with Jonas, Freyja, Iris, Matteo, Fan, Chao, Fabiana, Doreen, Dora, Anna, Ann, Angela, Sam, Sharadha, Martin, Axel, Rakan, Ginger, Jessika, Chungang, Khurram and Shu- maila both in and outside the lab and it has been a lot of fun. I am fortunate to know you all. A big thanks to my “SLU gang”, Lin, Merina, Agnese, Chrissy, Daniela, Bingjie, Sandrine, Nancy, Andre and Viviane for shar- ing all the good times together.

Living far way from your home country makes you alone and miss your family. But love, warmth and support I received from Malin and her whole family here in Sweden made sure that I didn’t miss my home too often. Thank you not only for providing a physical space in your house where I could spend all these years in Sweden, but also accepting me as a part of your extended family. I would also like to thank Maite, Regine and Mat- thias for your friendship, support and care.

Most importantly, none of this would have been possible without the love and blessing of my family. I would like to thank my wife Durga for standing besides me in all rough and happy times and for her continued love, support and encouragement. Special thanks for all your patience that is needed to take care of a crazy husband like me.

54 6. References

1. Ungerer MC, Johnson LC, Herman MA (2008) Ecological genomics: understanding gene and genome function in the natural environment. Heredity (Edinb) 100(2):178–183. 2. Dobzhansky T (1941) Genetics and the origin of species. New York Columbia Univ Press. 3. Mitton JB (2001) Adaptation and Natural Selection: Overview. Encycl Life Sci. 4. Lewontin, C. (1974) The Genetic Basis of Evolutionary Change. Columbia University Press 5. Gilad Y, Pritchard JK, Thornton K (2009) Characterizing natural variation using next-generation sequencing technologies. Trends Genet 25(10):463–471. 6. Turcotte MM, Corrin MSC, Johnson MTJ (2012) Adaptive Evolution in Ecological Communities. PLoS Biol 10(5):e1001332. 7. Palmer DH, Kronforst MR (2015) Divergence and gene flow among Darwin’s finches: A genome-wide view of adaptive radiation driven by interspecies allele sharing. BioEssays 37(9):968–974. 8. Galton D (2009) Did Darwin read Mendel? Qjm 102(8):587–589. 9. Amos W, Harwood J (1998) Factors affecting levels of genetic diversity in natural populations. Philos Trans R Soc B Biol Sci 353(1366):177–186. 10. Phillips PC (2005) Testing hypotheses regarding the genetics of adaptation. Genetica 123(1-2):15–24. 11. Fisher R. (1930) The Genetical Theory of Natural Selection (Oxford University Press). 12. Wright S (1942) Statistical genetics and evolution. Bull Am Math Soc 48(4):223–247. 13. Haldane JBS (1934) A Mathematical Theory of Natural and Artificial Selection Part X. Some Theorems on Artificial Selection. Genetics 19(5):412– 429. 14. Andersson L, Rymani N, Stahli G (1981) Genetic variability in Atlantic herring. 78:69–78. 15. The C. elegans Sequencing Consortium (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282(5396):2012–2018. 16. Radwan J, Babik W (2012) The genomics of adaptation. Proc R Soc B Biol Sci 279(1749):5024–5028. 17. Stapley J, et al. (2010) Adaptation genomics: the next generation. Trends Ecol Evol 25(12):705–712. 18. Twyford AD, Streisfeld MA, Lowry DB, Friedman J (2015) Genomic studies on the nature of species: adaptation and speciation in Mimulus. Mol Ecol 24(11):2601–9. 19. Pardo-Diaz C, Salazar C, Jiggins CD (2015) Towards the identification of the loci of adaptive evolution. Methods Ecol Evol 6(4):445–464.

55 20. Nadeau NJ, Jiggins CD (2015) A golden age for evolutionary genetics? Genomic studies of adaptation in natural populations. Trends Genet 26(11):484–492. 21. Stinchcombe JR, Hoekstra HE (2007) Combining and quantitative genetics: finding the genes underlying ecologically important traits. Heredity (Edinb) 100(2):158–170. 22. Mackay TF (2001) Quantitative trait loci in Drosophila. Nat Rev Genet 2(1):11–20. 23. Visscher PM, Brown MA, McCarthy MI, Yang J (2012) Five Years of GWAS Discovery. Am J Hum Genet 90(1):7–24. 24. Schielzeth H, Husby A (2014) Challenges and prospects in genome-wide quantitative trait loci mapping of standing genetic variation in natural populations. Ann N Y Acad Sci 1320(1):35–57. 25. Schierding W, Cutfield WS, O’Sullivan JM (2014) The missing story behind Genome Wide Association Studies: single nucleotide polymorphisms in gene deserts have a story to tell. Front Genet 5(February):1–7. 26. Stranger BE, Stahl EA, Raj T (2011) Progress and Promise of Genome-Wide Association Studies for Human Complex Trait Genetics. Genetics 187(2):367– 383. 27. Klein C, Lohmann K, Ziegler A (2012) The Promise and Limitations of Genome-wide Association Studies. Jama 308(18):1867–8. 28. Hoekstra HE, Coyne JA, Rausher M (2007) The locus of evolution: evo devo and the genetics of adaptation. Evolution (N Y) 61(5):995–1016. 29. Cutter AD, Payseur BA (2013) Genomic signatures of selection at linked sites: unifying the disparity among species. Nat Rev Genet 14(4):262–274. 30. Lawson DJ, Hellenthal G, Myers S, Falush D (2012) Inference of Population Structure using Dense Haplotype Data. PLoS Genet 8(1):e1002453. 31. Pavlidis P, Živković D, Stamatakis A, Alachiotis N (2013) SweeD: Likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol Evol 30(9):2224–2234. 32. Nielsen R (2005) Molecular signatures of natural selection Annu Rev Genet 39:197–218. 33. Staubach F, et al. (2012) Genome Patterns of Selection and Introgression of Haplotypes in Natural Populations of the House Mouse (Mus musculus). PLoS Genet 8(8):e1002891. 34. Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res 20(3):393–402. 35. Zhu M, Zhao S (2007) Candidate gene identification approach: Progress and challenges. Int J Biol Sci 3(7):420–427. 36. Tabor HK, Risch NJ, Myers RM (2002) Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet 3(5):391–397. 37. Kerje S, Lind J, Schütz K, Jensen P, Andersson L (2003) Melanocortin 1- receptor (MC1R) mutations are associated with plumage colour in chicken. Anim Genet 34(4):241–248. 38. Sharafeldin N, et al. (2015) A Candidate-Pathway Approach to Identify Gene- Environment Interactions: Analyses of Colon Cancer Risk and Survival. J Natl Cancer Inst 107(9):1–7. 39. Fumagalli M, Vieira FG, Linderoth T, Nielsen R (2014) ngsTools: methods for population genetics analyses from Next-Generation Sequencing data. Bioinformatics

56 40. Veeramah KR, Hammer MF (2014) The impact of whole-genome sequencing on the reconstruction of human population history. Nat Rev Genet 15(3):149– 162. 41. Axelsson E, et al. (2013) The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature 495(7441):360–364. 42. Rubin C-J, et al. (2012) Strong signatures of selection in the domestic pig genome. Proc Natl Acad Sci 109 (48 ):19529–19536. 43. Rubin C-J, et al. (2010) Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464(7288):587–591. 44. Parrish JK, Viscido S V., Grünbaum D (2002) Self-organized fish schools: An examination of emergent properties. Biol Bull 202(3):296–305. 45. Bucholtz RH, Tomkiewicz J, Nyengaard JR, Andersen JB (2013) Oogenesis, fecundity and condition of Baltic herring (Clupea harengus L.): A stereological study. Fish Res 145:100–113. 46. Dickey-Collas M, et al. (2010) Lessons learned from stock collapse and recovery of North Sea herring: a review. ICES J Mar Sci J du Cons 67 (9 ):1875–1886. 47. FAO Yearbook: Fishery and Aquaculture Statistics (2012) 76. 48. Iles TD, Sinclair M (1982) Atlantic Herring: Stock Discreteness and Abundance. Sci 215 (4533 ):627–633. 49. McQuinn I (1997) Metapopulations and the Atlantic herring. Rev Fish Biol Fish 7(3):297–329. 50. Ryman N, Lagercrantz U, Andersson L, Chakraborty R, Rosenberg R (1984) Lack of correspondence between genetic and morphologic variability patterns in Atlantic herring (Clupea harengus). Heredity (Edinb) 53(3):687–704. 51. Gaggiotti OE, et al. (2009) Disentangling the effects of evolutionary, demographic, and environmental factors influencing genetic structure of natural populations: Atlantic herring as a case study. Evolution (N Y) 63(11):2939–2951. 52. Limborg MT, et al. (2012) Environmental selection on transcriptome-derived SNPs in a high gene flow marine fish, the Atlantic herring (Clupea harengus). Mol Ecol 21(15):3686–703. 53. Larsson LC, et al. (2007) Concordance of allozyme and microsatellite differentiation in a marine fish, but evidence of selection at a microsatellite locus. Mol Ecol 16:1135–1147. 54. Darwin C (1988) The Voyage of the Beagle. New Am Libr. 55. Rassmann K (1997) Evolutionary Age of the Galápagos Iguanas Predates the Age of the Present Galápagos Islands. Mol Phylogenet Evol 7(2):158–172. 56. Ali JR, Aitchison JC (2014) Exploring the combined role of eustasy and oceanic island thermal subsidence in shaping on the Galápagos. J Biogeogr 41(7):1227–1241. 57. Galapagos Flora and Fauna Available at: https://www.redmangrove.com/ about-galapagos/galapagos-flora-fauna/. 58. Sato A, et al. (1999) Phylogeny of Darwin’s finches as revealed by mtDNA sequences. Proc Natl Acad Sci U S A 96(9):5101–5106. 59. Grant PR, Grant BR (2010) Conspecific versus heterospecific gene exchange between populations of Darwin’s finches. Philos Trans R Soc B Biol Sci 365(1543):1065–1076. 60. Lank DB, Dale J (2001) Visual Signals for Individual Identification: The Silent “Song” of Ruffs. Auk 118(3):759–765. 61. Höglund J. & Alatalo R.V. (1995) Leks (Princeton University Press).

57 62. Lekking in Ruff (2015) Available at: http://www.scilifelab.se/research/ scientific-highlights/ruff/. 63. Farrell LL, Burke T, Slate J, McRae SB, Lank DB (2013) Genetic mapping of the female mimic morph locus in the ruff. BMC Genet 14(1):1–4. 64. Lank DB, Smith CM, Hanotte O, Burke T, Cooke F (1995) Genetic polymorphism for alternative mating behaviour in lekking male ruff Philomachus pugnax. Nature 378(6552):59–62. 65. Lank DB, Coupe M, Wynne-Edwards KE (1999) Testosterone-induced male traits in female ruffs (Philomachus pugnax): autosomal inheritance and gender differentiation. Proc R Soc L B 266. doi:10.1098/rspb.1999.0926. 66. Jukema J, Piersma T (2006) Permanent female mimics in a lekking shorebird. Biol Lett 2. doi:10.1098/rsbl.2005.0416. 67. Lank DB, Farrell LL, Burke T, Piersma T, McRae SB (2013) A dominant allele controls development into female mimic male and diminutive female ruffs. Biol Lett 9(6). 68. Farrell LL, Kupper C, Burke T, Lank DB (2015) Major breeding plumage color differences of male ruffs (philomachus pugnax) are not associated with coding sequence variation in the MC1R gene. J Hered 106(2):211–215. 69. Lamichhaney S, Martinez A, Rafati N, Sundström G, Rubin C (2012) Population-scale sequencing reveals genetic differentiation due to local adaptation in Atlantic herring. 70. Grant WS (1986) Biochemical Genetic Divergence between Atlantic, Clupea harengus, and Pacific, C. pallasi, Herring. Copeia 1986(3):714–719. 71. Nakao N, et al. (2008) Thyrotrophin in the pars tuberalis triggers photoperiodic response. Nature 452(7185):317–322. 72. Manzon LA (2002) The Role of Prolactin in Fish Osmoregulation: A Review. 310:291–310. 73. Kawaguchi M, et al. (2013) Adaptive hatching enzyme: one amino acid substitution results in differential salt dependency of the enzyme. J Exp Biol 216(9):1609–1615. 74. Wray GA (2007) The evolutionary significance of cis-regulatory mutations. Nat Rev Genet 8(3):206–216. 75. Zhang G, et al. (2014) reveals insights into avian genome evolution and adaptation. Science (80- ) 346(6215):1311–1320. 76. Rands CM, et al. (2013) Insights into the evolution of Darwin’s finches from comparative analysis of the Geospiza magnirostris genome sequence. BMC Genomics 14(1):1–15. 77. Lack D.L. (1945) The Galapagos finches (Geospizinae) a study in variation (California academy of sciences). 78. Swarth HS (1931) The avifauna of the Galapagos Islands (Occ.Pap.Calif.Acad.Sci). 79. Abzhanov A, et al. (2006) The calmodulin pathway and evolution of elongated beak morphology in Darwin’s finches. Nature 442(7102):563–567. 80. Rivera-Perez JA, Wakamiya M, Behringer RR (1999) Goosecoid acts cell autonomously in mesenchyme-derived tissues during craniofacial development. Development 126(17):3811–3821. 81. Rowe A, Richman JM, Brickell PM (1991) Retinoic acid treatment alters the distribution of retinoic acid receptor-beta transcripts in the embryonic chick face. Development 111(4):1007–1016. 82. Uz E, et al. (2015) Disruption of ALX1 causes extreme microphthalmia and severe facial clefting: Expanding the Spectrum of Autosomal-Recessive ALX- Related Frontonasal Dysplasia. Am J Hum Genet 86(5):789–796.

58 83. Dee CT, Szymoniuk CR, Mills PED, Takahashi T (2013) Defective neural crest migration revealed by a Zebrafish model of Alx1-related frontonasal dysplasia. Hum Mol Genet 22 (2 ):239–251. 84. Brugmann SA, et al. (2010) Comparative gene expression analysis of avian embryonic facial structures reveals new candidates for human craniofacial disorders. Hum Mol Genet 19 (5 ):920–930. 85. Sommer P, Napier HR, Hogan BL, Kidson SH (2006) Identification of Tgfβ1i4 as a downstream target of Foxc1. Dev Growth Differ 48(5):297–308. 86. Grant, PR Grant BR (2014) 40 Years of Evolution: Darwin’s Finches on Daphne Major Island (Princeton University Press). 87. Brown WL, Wilson EO (1956) Character Displacement. Syst Zool 5(2):49–64. 88. Grant, P R (1972) Convergent and divergent character displacement. Biol J Linn Soc 4(1):39–68. 89. Stuart YE, Losos JB (2013) Ecological character displacement: glass half full or half empty? Trends Ecol Evol 28(7):402–408. 90. Tobias JA, et al. (2014) Species coexistence and the dynamics of phenotypic evolution in adaptive radiation. Nature 506(7488):359–363. 91. Grant PR, Grant BR (2006) Evolution of Character Displacement in Darwin’s Finches. Sci 313 (5784 ):224–226. 92. Weedon MN, et al. (2008) Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 40(5):575–583. 93. Zhou X, Benson KF, Ashar HR, Chada K (1995) Mutation responsible for the mouse pygmy phenotype in the developmentally regulated factor HMGI-C. Nature 376(6543):771–774. 94. Webster MT, et al. (2015) Linked genetic variants on chromosome 10 control ear morphology and body mass among dog breeds. BMC Genomics 16(1):1– 17. 95. Lamichhaney S, et al. (2016) Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax). Nat Genet 48(1):84– 88. 96. Foltz DR, et al. (2006) The human CENP-A centromeric nucleosome- associated complex. Nat Cell Biol 8(5):458–469. 97. Amsterdam A, et al. (2004) Identification of 315 genes essential for early zebrafish development. Proc Natl Acad Sci United States Am 101 (35 ):12792–12797. 98. Kupper C, et al. (2015) A supergene determines highly divergent male reproductive morphs in the ruff. Nat Genet advance on. Available at: http://dx.doi.org/10.1038/ng.3443. 99. Parmanne R, Rechlin O, Sjöstrand B (1994) Status and future of herring and sprat stocks in the Baltic Sea. Dana 10:29–59. 100. Laakkonen HM, Lajus DL, Strelkov P, Väinölä R (2013) of amphi-boreal fish: tracing the history of the Pacific herring Clupea pallasii in North-East European seas. BMC Evol Biol 13:67. 101. Laakkonen HM, Strelkov P, Lajus DL, Väinölä R (2014) Introgressive hybridization between the Atlantic and Pacific herrings (Clupea harengus and C. pallasii) in the north of Europe. Mar Biol 162(1):39–54. 102. Podos J, Nowicki S (2004) Beaks, Adaptation, and Vocal Evolution in Darwin’s Finches. Biosci 54 (6 ):501–510. 103. Grant BR, Grant PR (2010) Songs of Darwin’s finches diverge when a new species enters the community. Proc Natl Acad Sci 107 (47 ):20156–20163. 104. Ashburner M, et al. (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25(1):25–29.

59 105. Durand EY, Patterson N, Reich D, Slatkin M (2011) Testing for ancient admixture between closely related populations. Mol Biol Evol . doi:10.1093/molbev/msr048.

60

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1192 Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-279969 2016