Natural selection and demography shape the genomes of New World birds

Lucas Rocha Moreira

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2021

© 2021

Lucas Rocha Moreira

All Rights Reserved

Abstract

Natural selection and demography shape the genomes of New World birds

Lucas Rocha Moreira

Genomic diversity is shaped by the interplay between mutation, genetic drift, recombination, and natural selection. A major goal of evolutionary biology is to understand the relative contribution of these different microevolutionary forces to patterns of genetic variation both within and across species. The advent of massively parallel sequencing technologies opened new avenues to investigate the extent to which alternative evolutionary mechanisms impact the genome and the footprints they leave. We can leverage genomic information to, for example, trace back the demographic trajectory of populations and identify genomic regions underlying adaptive traits. In this dissertation, I employ genomic data to explore the role of demography and natural selection in two New World systems distributed along steep environmental gradients: the Altamira Oriole (Icterus gularis), a Mesoamerican bird that exhibits large variation in body size across its range, and the Hairy and Downy Woodpeckers (Dryobates villosus and D. pubescens), two sympatric species whose phenotypes vary extensively in response to environments in North America.

In Chapter 1, I combine ecological niche modeling, phenotypic data, and ddRAD sequencing data from several individuals of I. gularis to investigate which spatial process best explains geographic variation in phenotypes and alleles: (i) isolation by distance (IBD), (ii) isolation by history (IBH), or (iii) isolation by environment (IBE). I find that the pronounced genetic and phenotypic variation in I. gularis are only partially correlated and differ in their spatial predictors. Whereas genomic variation is largely explained by historical barriers to gene flow (IBH), variation in body size is best predicted by contemporary environmental heterogeneity (IBE), a pattern consistent with either natural selection or environmental plasticity.

In Chapter 2, I conduct whole-genome resequencing of 140 Downy and Hairy Woodpeckers from across North America to elucidate more explicitly the impact of demography and natural selection on the genome. I find that despite spatial congruence in allele frequencies, population structure in these two species was produced at different temporal scales. Whereas Hairy Woodpeckers were isolated in two glacial refugia, one eastern and one western, Downy Woodpecker populations seem to have expanded from a single ancestral refugium. Demographic analyses suggest large variation in Ne over the past one million years in both Hairy and Downy Woodpeckers, with repeated bottlenecks followed by population expansion, consistent with the onset of the Pleistocene climatic oscillations. Nucleotide diversity in both species was positively correlated with recombination rate and negatively correlated with gene density, suggesting the effect of linked selection. The magnitude of this effect, however, seems to have been modulated by the demographic trajectory of each population and species. Nevertheless, patterns of nucleotide diversity along the genome are highly correlated between Hairy and Downy Woodpeckers, which may be attributed to pervasive selection acting on a conserved recombination landscape.

Finally, in Chapter 3, I use a suite of statistical methods to scan the genomes of Hairy and Downy Woodpeckers for signatures of natural selection associated with population-specific environmental differences. I test whether climatic adaptation was achieved through selection on the same loci in both species, which would indicate parallel genetic mechanisms of adaptation. I find limited evidence of genomic parallelism at the SNP level but substantial parallelism at the gene level. Candidate genes were involved in a broad range of biological processes, including immune response, nutritional metabolism, mitochondrial respiration, and embryonic development. Lastly, I identify potential candidates for key phenotypic traits in Downy and Hairy Woodpeckers, such as genes in the IGF signaling pathway, putatively linked to differences in body size, and the melanoregulin gene (MREG), potentially involved in plumage variation. Together, these findings highlight the significant role of demography and natural selection in shaping genomic variation.

Table of Contents

List of Figures

List of Tables

List of Supplemental Materials

Acknowledgements

Dedication

Chapter 1

1.1. Abstract

1.2. Introduction

1.3. Materials and Methods

1.3.1. Sampling and DNA extraction

1.3.2. Double-digest restriction site-associated DNA (ddRAD) sequencing

1.3.3. mtDNA dataset

1.3.4. Genetic structure

1.3.5. Demographic modeling

1.3.6. Modeling contemporary and paleo-distributions

1.3.7. Quantifying phenotypic data from museum specimens

1.3.8. Predictors of genetic and phenotypic differentiation

1.4. Results

1.4.1. Characteristics of SNP dataset

1.4.2. Genetic structuring across lowland Middle America

1.4.3. Demographic history

1.4.4. Patterns and correlates of phenotypic variation

1.4.5. Predictors of genetic and phenotypic differentiation

1.5. Discussion

1.5.1. Phylogeographic structure is best explained by historical barriers

1.5.2. Patterns of phenotypic variation are best explained by environment

1.6. Conclusion

1.7. Author contribution

1.8. References

1.9. Supplemental Material

Chapter 2

2.1. Abstract

2.2. Introduction

2.3. Results

2.3.1. Congruent population structure and genetic diversity

2.3.2. Demographic history

2.3.3. Genomic correlates of nucleotide diversity and differentiation

2.3.4. Genetic load and the efficacy of selection

2.4. Discussion

2.4.1. Conserved properties of the genome underlie the correlated genomic landscape of Hairy and Downy Woodpecker

2.4.2. The interplay between natural selection and recombination produces a heterogeneous genomic landscape

2.4.3. Dynamic population demography characterizes the evolution of Hairy and Downy Woodpecker in the Pleistocene

2.4.4. The efficacy of linked selection was affected by different evolutionary trajectories of Downy and Hairy Woodpecker

2.5. Conclusion

2.6. Material and Methods

2.6.1. Sample collection and whole genome sequencing

2.6.2. Read alignment, variant calling and filtering

2.6.4. Population structure

2.6.5. Demographic inference

2.6.6. Genetic diversity, recombination rates, and linkage disequilibrium

2.6.7. Genomic predictors of regional variation in nucleotide diversity

2.6.8. Natural selection and genetic load

2.7. Author contribution

2.8. References

2.9. Supplemental Material

Chapter 3

3.1. Abstract

3.2. Introduction

3.2. Results & Discussion

3.2.1. Genotype-environment association analysis (GEA)

3.2.2. Parallelism at the genic level vs nucleotide level

3.2.3. Signatures of elevated genetic differentiation

3.2.4. FST-outlier analysis reveals selection on multiple genes related to immune system and nutrition

3.2.5. A wide array of biological processes underlie adaptation at the range peripheries

3.2.6. Parallel selection on the IGF signaling pathway

3.2.7. Parallel selection in hemoglobin genes at high-elevation populations

3.2.8. Melanoregulin as a candidate for plumage variation

3.3. Conclusions

3.4. Material and Methods

3.4.1. Sample acquisition and whole genome sequencing

3.4.2. Read alignment, variant calling and filtering

3.4.3. Genotype-environment association analysis

3.4.4. Signature of selective sweep

3.4.5. Population structure outliers

3.4.6. FST-outlier analysis

3.4.7. enrichment

3.5. Author contribution

3.6. References

3.7. Supplemental Material

List of Figures

Figure 1.1. Geographic distribution of genetic variation in Icterus gularis.

Figure 1.2. Landscape GIS surfaces utilized in CIRCUITSCAPE analyses.

Figure 1.3. Principal component analysis of SNPs showing genetic differentiation among samples of Icterus gularis based on 11,873 unlinked SNPs.

Figure 1.4. The best-fit demographic model: isolation with migration.

Figure 1.5. Geographic variation in body size in Icterus gularis.

Figure 1.6. Plots showing strongest effect sizes for associations between wing length and principal components of the environmental variables.

Figure 1.7. Predictors of genome-wide genetic and phenotypic differentiation in Icterus gularis, as revealed by commonality analysis.

Figure 2.1. Geographic distribution of genetic variation and demographic history of the Downy (D. pubescens) and Hairy Woodpecker (D. villosus).

Figure 2.2. Population genetic structure in the Downy and Hairy Woodpecker.

Figure 2.3. Spatial patterns of gene flow.

Figure 2.4. Characterization of genome-wide genetic variation in Downy and Hairy Woodpecker.

Figure 2.5. Changes in effective population size (Ne) over time and linkage disequilibrium (LD) in Downy and Hairy Woodpecker.

Figure 2.6. Genomic predictors of nucleotide diversity in Downy and Hairy Woodpecker.

Figure 2.7. Landscape of diversity and differentiation of 2 of Downy and Hairy Woodpecker.

Figure 2.8. Deleterious load in Downy and Hairy Woodpecker.

Figure 3.1. Environmental variation across the ranges of Downy and Hairy Woodpecker.

Figure 3.2. Scatter plot representing enriched biological processes in the candidate set identified in LFMM.

Figure 3.3. Candidate SNPs for local adaptation in Downy and Hairy Woodpecker and their effects.

Figure 3.4. Genomic parallelism in Downy and Hairy Woodpecker.

Figure 3.5. FST-outlier analysis comparing Alaska (AK) and the Southeast (SE) population.

Figure 3.6. Heatmap of the number of FST-outlier windows across all pairwise population comparisons and its correlates in Downy and Hairy Woodpecker.

Figure 3.7. Genomic signatures of selective sweep in the comparison between Alaska (AK) and the Southeast (SE) in a segment of chromosome 2 of Downy Woodpecker.

Figure 3.8. A candidate gene for plumage variation in Hairy Woodpecker.

List of Tables

Table 1.1. Demographic parameter estimates from G-PhoCS with their respective 95% highest posterior density intervals in brackets.

Table 1.2. Demographic model selection results showing the likelihood of each model in momi2.

Table 2.1. Strength of correlation between nucleotide diversity (θπ) and gene density across the four genetic clusters of Downy and Hairy Woodpecker.

Table 3.1. Number of candidate SNPs associated with each predictor variable in LFMM 2.

Table 3.2. Enriched gene ontologies for parallel FST-outlier genes across all pairwise population comparisons. Significance was determined through a Fisher’s Exact test and false discovery rate (FDR) correction.

Table 3.3. Candidate genes within windows of elevated FST in the comparison between Alaska (AK) and the Southeast (SE).

List of Supplemental Materials

Figure 1.S1. Geographic distribution and mtDNA (ND2) diversity in Icterus gularis.

Figure 1.S2. Bayesian timetree of 19 ND2 haplotypes of Icterus gularis, plus I. auratus and I. nigrogularis as outgroups.

Figure 1.S3. Boxplot of body size, as measured by wing length (in mm), in three genetic groups detected in Icterus gularis.

Figure 1.S4. Predictors of genetic differentiation in Icterus gularis, as revealed by the commonality analysis.

Figure 1.S5. Predictors of phenotypic differentiation in Icterus gularis, as revealed by the commonality analysis.

Figure 1.S6. Predicted current and past distribution of Icterus gularis in Mesoamerica in MaxEnt.

Figure 2.S1. Correlated landscape of diversity in Downy and Hairy Woodpecker.

Figure 2.S2. Boxplot of recombination rate in each chromosome of Downy Woodpecker.

Figure 2.S3. Boxplot of recombination rate in each chromosome of Hairy Woodpecker.

Figure 2.S4. Correlation among genomic variables in Downy Woodpecker.

Figure 2.S5. Correlation among genomic variables in Hairy Woodpecker.

Table 1.S1. Loadings of the 19 Bioclim environmental variables on the first four principal component (PC) axes of temperature and precipitation, with their respective proportion of variance explained.

Table 2.S1. Sample information.

Table 2.S2. Model selection in fastsimcoal2.

Table 2.S3. Parameter estimates for the best model in fastsimcoal2 and their respective 95% confidence intervals.

Table 2.S4. Principal component regression.

Table 3.S1. Candidate SNPs found near genes detected in all three methods (LFMM 2, PCAdapt, and H-scan) and their respective annotation information.

Table 3.S2. Enriched gene ontologies for FST-outlier genes across all pairwise population comparisons of Downy Woodpecker. Significance was determined through a Fisher’s Exact test and false discovery rate (FDR) correction.

Table 3.S3. Enriched gene ontologies for FST-outlier genes across all pairwise population comparisons of Hairy Woodpecker. Significance was determined through a Fisher’s Exact test and false discovery rate (FDR) correction.

Table 3.S4. Top 5 SNPs with the strongest environmental correlation for each climatic variable, according to LFMM 2.

Acknowledgements

I would like to thank my amazing advisor, Brian T. Smith, whose support and encouragement were crucial for the completion of this dissertation. He gave me the tools and the freedom to pursue very independent research, and for that, I am very thankful. Many thanks to my wonderful lab mates in the Smith Lab (both past and present), whose contributions were fundamental to the development of this research: Kaiya Provost, Jon Merwin, Vivien Chua, Glenn Seeholzer, Gregory Thom, Elkin Tenorio, William Mauck, Lukas Musher, Laís Coelho, Amanda Rocha, and Bruno Almeida. I have been so fortunate to have the opportunity to work with so many bright-minded people at one of my favorite places on planet Earth, the American Museum of Natural History. I will never forget the smile on my face seeing my photo immortalized next to Ernst Mayr; what an honor! The museum provided the happiest, most rewarding moments of my Ph.D. Many thanks to Paul Sweet, Thomas Trombone, Gabrielle Rosen, Bently Bird, and Peter Capaionolo for fueling my passion for museum-based science; to John Flynn, Rebecca Johnson, Maria Rios, and Anna Manuel at the Richard Gilder Graduate School, and to Sajesh Sign and Apurva Narechania, for all their help navigating graduate school and taming Huxley and Cuvier; and to all my friends and the staff at the E3B Department at Columbia University, especially Holly Fuong, Stephanie Schmiege, Sebastian Heilpern, Alexandra Vamanu, Kyle Bukhari, and Lourdes Gautier.

I would also like to acknowledge my collaborators, Blanca E. Hernández-Baños and John Klicka, and my Ph.D. committee, Joel Cracraft, Frank Burbrink, Molly Przeworski, Deren Eaton, and Don Melnick, for their time and insight. This work would not have been possible without the assistance of Lucas DeCicco, Matt Brady, and Paul Sweet, who generously helped collect samples in the field. I would also like to thank the following individuals and institutions for providing tissue samples and specimen loans for this study: A. G. Navarro Sigüenza (Museo de Zoología, UNAM), J. Klicka/S. Birk/R. Faucett (University of Washington Burke Museum), C. M. Milensky (Smithsonian Institution), C. Dardia (Cornell University Museum of Vertebrates), G. Spellman/A. Doll (Denver Museum of Nature & Science), B. Marks/S. Hackett/J. Bates (Field Museum of Natural History), F. Sheldon/D. Dittmann (LSU Museum of Natural Science), K. Barker/T. Imfeld (Midwest Museum of Natural History), and C. Witt/A. Johnson/M. Anderson (Museum of Southwestern Biology). This research would not have been possible without funding from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), the Columbia University E3B Department/GSAS, the Chapman Memorial Fund and the Linda J. Gormezano Memorial Fund (AMNH), the American Ornithological Society Research Award (AOS), and the Society of Systematic Biologists Graduate Student Research Award (SSB).

My final thanks go to all the friends who helped me keep my sanity throughout my Ph.D. years: Pedro Piffer and Manoela Poletto, my second family and my main support system during the hard moments of graduate school; Nicole Mioni, who taught me to love myself and be a better person; Rebecca Barr, my best board game partner; Bruno Lemos and Thalis Pires, for all the vegan holiday parties; and my cousin, Marianna Rodovalho, who always had the patience to listen to my 10-minute-long audio messages complaining about life or just sharing random thoughts. And to my extraordinary family (Rosenval A. Moreira, Eva L. da Rocha Moreira, Matheus R. Moreira, and Emanuel R. Moreira), who always supported me and nurtured my passion for science and biology.


Dedication

To my parents, Eva L. da Rocha Moreira and Rosenval A. Moreira, who worked so hard to give me the education I needed to be the first Ph.D. in the family.


Chapter 1

SPATIAL PREDICTORS OF GENOMIC AND PHENOTYPIC VARIATION DIFFER IN A LOWLAND MIDDLE AMERICAN BIRD (Icterus gularis)

* Published in Molecular Ecology (2020) 29 (16), 3084-3101: https://dx.doi.org/10.1111/mec.15536


1.1. Abstract

Spatial patterns of intraspecific variation are shaped by geographical distance among populations, historical changes in gene flow and interactions with local environments. Although these factors are not mutually exclusive and operate on both genomic and phenotypic variation, it is unclear how they affect these two axes of variation. We address this question by exploring the predictors of genomic and phenotypic divergence in Icterus gularis, a broadly distributed Middle American bird that exhibits marked geographical variation in body size across its range. We combined a comprehensive single nucleotide polymorphism (SNP) and phenotypic data set to test whether genome-wide genetic and phenotypic differentiation are best explained by (i) isolation by distance, (ii) isolation by history or (iii) isolation by environment. We find that the pronounced genetic and phenotypic variation in I. gularis are only partially correlated and differ in their spatial predictors. Whereas genomic variation is largely explained by historical barriers to gene flow, phenotypic diversity is best predicted by contemporary environmental heterogeneity. Our genomic analyses reveal strong phylogeographical structure coinciding with the Chivela Pass at the Isthmus of Tehuantepec; this structure formed during the Pleistocene, when populations were isolated in north–south refugia.

In contrast, we found a strong association between body size and environmental variables, such as temperature and precipitation. The relationship between body size and local climate is consistent with a pattern produced by either natural selection or environmental plasticity. Overall, these results provide empirical evidence for why phenotypic and genomic data are often in conflict in taxonomic and phylogeographical studies.


1.2. Introduction

Intraspecific variation is often spatially structured across landscapes. Historical events that change gene flow regimes, distance-biased mating, and interactions with local environmental conditions are mechanisms that can cause species to differentiate across their ranges (Bradburd et al., 2013; Wang & Bradburd, 2014; Weber et al., 2017). In phylogeographic or population genetic studies, these alternative scenarios are tested by assessing whether genetic patterns are best explained by (1) isolation by distance (IBD; Wright, 1943), (2) isolation by history (IBH; Mayr, 1942; Vasconcellos et al. 2019), or (3) isolation by environment (IBE; Orr & Smith 1998; Schneider et al. 1999).

IBD refers to differentiation caused by the non-random mating of individuals due to their limited dispersal distances. Under IBD, nearby populations tend to be more similar than distant ones (Wright, 1943). For example, IBD can account for genetic differentiation across an environmental gradient along which body size is extremely variable (Seeholzer & Brumfield, 2017). In addition to geographic distance, historical processes can have a long and persistent effect on current patterns of population differentiation (Vasconcellos et al. 2019). IBH is a common mode of divergence for species separated by topographic features that restrict gene flow across populations, such as rivers and mountains (Mayr, 1942; Smith et al. 2014), as well as by transient barriers caused by past climatic fragmentation of habitat (e.g., during glacial periods; Davis, 2001; Zink et al. 2004; Araújo et al. 2008; Vasconcellos et al. 2019). Finally, IBE arises when gene flow is constrained by natural selection across an environmentally heterogeneous landscape (Orr & Smith 1998; Schneider et al. 1999; Wang & Bradburd 2014). Under IBE, genetic differentiation increases as a function of environmental differences, independent of geographic distance (Wang & Bradburd, 2014). These mechanisms are not mutually exclusive and operate on both phenotypic and genome-wide genetic variation (Wang et al., 2013). However, it is unclear how these two axes of variation (i.e., phenotypic versus genome-wide genetic variation) are correlated and which of the aforementioned mechanisms best explains their geographic distribution. Identifying the spatial patterns that underlie genetic and phenotypic variation within and among populations will provide context for the relative importance of adaptive and stochastic processes in generating and maintaining intraspecific diversity (Holderegger et al., 2006; Keller & Taylor, 2008; Ellegren & Galtier, 2016).
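IBD and IBE are commonly tested by correlating pairwise genetic distances with pairwise geographic or environmental distances. The following is a minimal sketch of a simple Mantel test. It illustrates the idea and is not the analysis used in this chapter. The transect, distances, and noise are hypothetical toy values, and `mantel_test` is a helper defined here, not a library function.

```python
import numpy as np


def mantel_test(d1, d2, n_perm=999, seed=0):
    """One-sided Mantel test between two square, symmetric distance matrices.

    The statistic is the Pearson correlation between the off-diagonal
    (upper-triangle) entries; the null distribution is built by permuting
    rows and columns of d2 together, which preserves its internal structure.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)          # each population pair counted once
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]

    rng = np.random.default_rng(seed)
    n_ge = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        r_perm = np.corrcoef(x, d2[np.ix_(perm, perm)][iu])[0, 1]
        n_ge += int(r_perm >= r_obs)
    p_value = (n_ge + 1) / (n_perm + 1)    # observed value counts as one draw
    return r_obs, p_value


# Toy isolation-by-distance example (all values hypothetical):
rng = np.random.default_rng(1)
coords = np.linspace(0, 500, 10)                  # 10 sites along a 500-km transect
geo = np.abs(coords[:, None] - coords[None, :])   # pairwise geographic distance (km)
noise = rng.normal(0, 0.2, size=geo.shape)
gen = 0.01 * geo + (noise + noise.T) / 2          # symmetric toy "genetic distance"
np.fill_diagonal(gen, 0.0)

r, p = mantel_test(geo, gen)
print(f"Mantel r = {r:.2f}, p = {p:.3f}")
```

To get the analogous IBE test, pass a matrix of environmental distances in place of `geo`. Geographic and environmental distances usually covary, however, and separating IBD from IBE then requires partial approaches that this sketch does not implement.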

At the genomic level, mutation, recombination, genetic drift, and natural selection are the primary drivers of evolutionary divergence, but these forces do not affect the genome equally (Wu, 2001; Gossmann et al., 2014; Burri et al., 2015; Samuk et al., 2017). Whereas mutation, recombination, and genetic drift act throughout the entire genome, natural selection acts only on regions that affect an individual’s fitness (e.g., coding and regulatory regions; Feder et al., 2012). Similarly, whereas IBH and IBD produce genome-wide signatures, IBE tends to affect only regions with phenotypic effects (Coop et al. 2010; Le Corre & Kremer 2012; Savolainen et al. 2013). Nevertheless, when selection against maladapted immigrants (i.e., local adaptation) is strong enough to constrain gene flow, IBE might lead to genome-wide divergence, which can be readily detected with neutral markers (Shafer & Wolf, 2013; Weber et al., 2017). A review found that a larger proportion of surveyed population genetic studies using neutral markers supported IBE rather than IBD, indicating that environmental factors strongly influence the generation of diversity in many systems (Sexton et al., 2014), as seen in snake species in western North America (Myers et al., 2019).

At the phenotypic level, disentangling the effects of these different evolutionary forces is challenging because phenotypic traits are simultaneously affected by genetic drift, natural selection and environmental plasticity (Mitchell-Olds et al. 2007; Nonaka et al. 2015; Zamudio et al., 2016). Not surprisingly, patterns of phenotypic and genetic variation often disagree in taxonomic and phylogeographic studies (reviewed in Zamudio et al., 2016). Without knowledge of the genomic basis and heritability of these traits, it is difficult to determine precisely which forces have shaped phenotypic variation (Mitchell-Olds et al. 2007). However, when combined with genome-wide genetic data, phenotypic information can provide insight into which mechanisms might have driven the origins of phenotypic variation (Rausher & Delph, 2015; Cabanne et al. 2014). For example, if geographic patterns of phenotypic variation closely coincide with historical barriers to dispersal (IBH) or change clinally with geographic distance (IBD), phenotypic differences among populations might have evolved primarily as a byproduct of genetic drift, potentially reinforced by natural selection across the barrier or along the cline (Lande, 1980). This expectation follows from the observation that the longer populations are isolated, the more phenotypically distinct they tend to be (Lande, 1980; Wang & Summers, 2010; Winger & Bates, 2015; Zamudio et al., 2016). Conversely, a correlation between geographic patterns of phenotypic and environmental variation (IBE) could indicate that local adaptation is at play, causing spatial clustering of phenotypes even when genome-wide diversity is homogeneous (Savolainen et al., 2013; Tigano & Friesen, 2016). IBE in phenotypes could also arise from environmental plasticity, in the absence of genetic differentiation (Pfennig et al., 2010; Gruber et al., 2013; Landry & Aubin-Horth, 2014). Identifying the spatial mechanisms underlying phenotypic diversity will therefore require understanding how intraspecific variation is correlated with geographical and environmental space.
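Geographic and environmental predictors usually covary. Asking which of them explains differentiation therefore means separating the explained variance unique to each predictor from the variance they share. This chapter reports that partition through commonality analysis (Figure 1.7). The sketch below shows the arithmetic for two predictors under ordinary least squares. It works on plain vectors, not on the pairwise distance matrices and permutation-based significance a real landscape analysis would use. The data and the helper names `r_squared` and `commonality_two` are hypothetical.

```python
import numpy as np


def r_squared(y, X):
    """Coefficient of determination of an OLS fit of y on X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def commonality_two(y, x1, x2):
    """Split the R^2 of y ~ x1 + x2 into unique and shared (common) parts."""
    r1 = r_squared(y, x1)
    r2 = r_squared(y, x2)
    r12 = r_squared(y, np.column_stack([x1, x2]))
    return {
        "unique_x1": r12 - r2,      # gained by adding x1 to a model with x2
        "unique_x2": r12 - r1,      # gained by adding x2 to a model with x1
        "common": r1 + r2 - r12,    # variance x1 and x2 explain jointly
        "total": r12,
    }


# Toy data (hypothetical): the trait depends on environment only,
# while environment partly tracks geography.
rng = np.random.default_rng(42)
n = 200
dist = rng.uniform(0, 1, n)                  # hypothetical geographic predictor
env = 0.6 * dist + rng.normal(0, 0.3, n)     # hypothetical environmental predictor
size = 2.0 * env + rng.normal(0, 0.3, n)     # toy body-size trait
parts = commonality_two(size, dist, env)
for name, value in parts.items():
    print(f"{name:>9}: {value:.3f}")
```

The three components always sum to the full-model R². The common component can be negative when one predictor acts as a suppressor.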

To empirically examine the predictors of genome-wide genetic and phenotypic divergence, we investigated the Altamira Oriole (Icterus gularis), a resident bird distributed from southern Texas along the Caribbean coast of Mexico and the Yucatan Peninsula, along the Pacific coast of Mexico from Guerrero south to west-central Nicaragua, and locally in the interior valleys of Guatemala and Honduras (Howell & Webb, 1995; Figure 1.1). Icterus gularis occurs in a range of habitats, including arid scrubland, scattered clusters of trees in open country, and humid woodland edges. It also exhibits geographic variation in body size that is atypical for birds, with taxa differing by a striking 20% in average size (Dickerman, 2007). The most recent taxonomic review of the species recognized three taxa that differ significantly in body size: I. g. gularis, the southernmost subspecies, comprising the largest individuals; I. g. mentalis, the northernmost subspecies, with the smallest individuals; and I. g. flavescens, containing individuals of intermediate size and occurring in central-western Guerrero, Mexico (Dickerman, 2007). The distribution of these body sizes, however, follows an intriguing geographic pattern in which large-bodied populations of I. g. gularis along the Pacific coast of Mexico are separated by small-bodied populations of I. g. mentalis. Individuals abruptly shift in size within a 125-km stretch of land between Oaxaca and Chiapas and shift back near the border between Guatemala and El Salvador (Dickerman, 2007). The species’ range coincides with well-known biogeographic barriers in Mexico, such as the Isthmus of Tehuantepec and the Trans-Mexican Neovolcanic Belt, and its habitat has been severely affected by the climatic oscillations of the Pleistocene (García-Moreno et al., 2004; Klicka et al., 2011; Gutiérrez-Rodríguez et al., 2011).

We evaluate whether the atypical phenotypic patterns in I. gularis arose as a result of historical population divergence (i.e., as a byproduct of genome-wide genetic structure) and whether the same landscape drivers (i.e., IBD, IBE or IBH) promote differentiation in both types of intraspecific variation.

To address these questions, we collected genomic, phenotypic, and environmental data to reconstruct the evolutionary history of I. gularis and elucidate the drivers of genomic and phenotypic differentiation. We used an integrative landscape genomics approach to test whether intraspecific variation (both genome-wide genetic and phenotypic) in I. gularis is best explained by (a) isolation by distance (IBD), (b) isolation by history (IBH), or (c) isolation by environment (IBE). We first characterize genetic variation by analyzing a SNP dataset and mtDNA to investigate patterns of phylogeographic structure and to model demographic history. We then measure phenotypic traits from museum specimens and examine whether patterns of body size variation are concordant with genome-wide genetic structure. We predict that if the same spatial process (i.e., IBD, IBH, or IBE) is structuring multiple axes of intraspecific variation, then patterns of genome-wide genetic structure and body size should coincide. Alternatively, if genome-wide genetic variation and body size are uncorrelated, predictors of population differentiation will differ between the two, indicating that other factors influence body size variation. By integrating genetic and phenotypic variation in a variable taxon distributed along the pronounced environmental gradients of lowland Middle America, we explore how variation arises in a biodiversity hotspot.


1.3. Materials and Methods

1.3.1. Sampling and DNA extraction

We sampled 78 vouchered specimens of I. gularis distributed across the entire range of the species, which includes the Gulf of Mexico (Tamaulipas, Veracruz, and Tabasco), Central Mexico and Bajio (San Luis Potosi, Queretaro, and Hidalgo), the Yucatan Peninsula, the Pacific Coast of Mexico (Guerrero, Oaxaca, and Chiapas), Guatemala, El Salvador, and Honduras (Figure 1.1a; Figure 1.S1). We also included five individuals of I. nigrogularis, the sister species of I. gularis (Powell et al., 2014), as an outgroup taxon. Each sampled locality comprised 1–10 individuals. We extracted total genomic DNA from tissues using the DNeasy tissue extraction kit (Qiagen, Valencia, CA) and quantified DNA extractions using a Qubit 2.0 fluorometer.

1.3.2. Double-digest restriction site-associated DNA (ddRAD) sequencing

We collected double-digest restriction site-associated DNA (ddRAD) sequencing data using the commercial service at the University of Texas at Austin Genomic Sequencing and Analysis Facility (UT GSAF), which followed a protocol modified from Peterson et al. (2012). Briefly, DNA extractions were normalized to equal concentration and volume, digested with the restriction enzymes EcoRI and MspI, size-selected to 200–500 bp fragments, ligated to adaptors, purified, and sequenced (2 × 150 bp paired-end) on a single lane of an Illumina HiSeq 4000. Raw reads were assembled de novo using ipyrad 0.7.22 (Eaton & Overcast, 2016; https://github.com/dereneaton/ipyrad). We clustered reads within samples and across ddRAD loci using a 90% sequence similarity threshold. Loci with more than two alleles, heterozygosity above 50%, or depth of coverage below six were excluded. For downstream analyses, we further retained only loci present in at least 50% of individuals and excluded samples with an excessive amount of missing data (> 75%).
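The final two missing-data filters can be expressed on a sample-by-locus presence/absence matrix. The following minimal sketch does not reproduce ipyrad's implementation. It assumes samples are filtered before loci, an order the text does not specify. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def filter_missing(present, min_locus_frac=0.5, max_sample_missing=0.75):
    """Return boolean masks (keep_samples, keep_loci) for a presence matrix.

    present: array of shape (n_samples, n_loci); True where the sample has
    a genotype call at that locus. (Hypothetical helper for illustration.)
    """
    present = np.asarray(present, dtype=bool)
    # Step 1 (assumed order): drop samples missing more than
    # `max_sample_missing` of all loci.
    sample_missing = 1.0 - present.mean(axis=1)
    keep_samples = sample_missing <= max_sample_missing
    # Step 2: among retained samples, keep loci called in >= `min_locus_frac`.
    locus_frac = present[keep_samples].mean(axis=0)
    keep_loci = locus_frac >= min_locus_frac
    return keep_samples, keep_loci


present = np.array([
    [1, 1, 1, 1],   # complete sample
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],   # exactly 75% missing: kept, since only > 75% is excluded
    [0, 0, 0, 0],   # 100% missing: excluded
], dtype=bool)
keep_samples, keep_loci = filter_missing(present)
print(keep_samples)   # [ True  True  True  True False]
print(keep_loci)      # [ True  True False  True]
```

Reversing the two steps can change the retained set, because removing poorly covered samples raises the apparent coverage of each locus.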

1.3.3. mtDNA dataset

To examine a fast-evolving, maternally inherited mitochondrial gene, we also sequenced the entire 1,041-bp NADH dehydrogenase subunit 2 (ND2) gene. We included three sequences available through GenBank (FMNH ATP91-093, MZFC KEO-003, and BMNH 42540; Omland et al., 1999).

To investigate the relationships among mtDNA haplotypes, we first produced a median-joining haplotype network using PopART (Leigh & Bryant, 2015). Next, we constructed a Bayesian phylogenetic tree and estimated divergence times using BEAST 2.4.6 (Drummond & Bouckaert, 2014). We determined the best substitution model (TN93) using a maximum likelihood model selection approach in MEGA X (Kumar et al., 2018). We used a relaxed log-normal clock model with a mean substitution rate of 0.0105 substitutions per site per million years (Weir & Schluter, 2008) and a Yule process tree prior. We ran the MCMC chain for 50 million generations, sampling every generation and discarding the first 10% as burn-in. TRACER v1.6 (Rambaut et al., 2016) was used to examine convergence. We obtained appropriate mixing, as indicated by effective sample size (ESS) values > 200 (Drummond et al., 2012).
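The ESS criterion can be illustrated with a simple estimator: the number of retained MCMC samples divided by the integrated autocorrelation time. The sketch below assumes a single scalar parameter trace and truncates the autocorrelation sum at the first non-positive lag; this is one common rule and not necessarily the exact estimator used by TRACER.

```python
import random


def effective_sample_size(trace, burn_in=0.1, max_lag=1000):
    """Estimate ESS = N / (1 + 2 * sum of autocorrelations) after discarding burn-in.

    The autocorrelation sum stops at the first non-positive lag (a simple
    truncation rule; TRACER's own estimator may differ in detail).
    """
    x = trace[int(len(trace) * burn_in):]
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("constant trace: ESS is undefined")
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, min(max_lag, n)):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


# Usage on synthetic traces: independent draws vs. a strongly autocorrelated chain.
rng = random.Random(42)
iid_trace = [rng.gauss(0, 1) for _ in range(5000)]
sticky_trace = [0.0]
for _ in range(4999):
    sticky_trace.append(0.95 * sticky_trace[-1] + rng.gauss(0, 1))
ess_iid = effective_sample_size(iid_trace)
ess_sticky = effective_sample_size(sticky_trace)
```

A well-mixed chain keeps an ESS close to the number of retained samples, whereas a sticky chain of the same length can fall far below the ESS > 200 guideline.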

1.3.4. Genetic structure

To characterize patterns of genetic structure across samples, we conducted a principal component analysis (PCA) using the R package SNPRelate (Zheng et al., 2012). To avoid potential biases, we used only unlinked SNPs (linkage disequilibrium threshold < 0.2) present in more than 25% of samples and with a minor allele frequency greater than 0.03. We then interpolated each principal component across geographic space using triangulated irregular network (TIN) interpolation in QGIS 3.0 (http://www.qgis.org).
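The SNP selection rules can be sketched as follows. This is a simplified stand-in, not SNPRelate's `snpgdsLDpruning`: we assume genotypes coded as 0/1/2 copies of one allele (`None` for missing), measure LD as the absolute Pearson correlation of genotype dosages, and compare each candidate against every SNP already kept rather than within a sliding window.

```python
def call_rate(genotypes):
    """Fraction of samples with a called genotype (None marks missing data)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 copies of one allele."""
    called = [g for g in genotypes if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def pearson_r(a, b):
    """Correlation of genotype dosages over samples called at both SNPs."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_snps(snps, min_call_rate=0.25, min_maf=0.03, ld_threshold=0.2):
    """Greedy filter: call rate > 25%, MAF > 0.03, |r| < 0.2 with all kept SNPs.

    snps: list of (snp_id, genotypes) in genomic order.
    """
    kept = []
    for snp_id, g in snps:
        # Missingness is checked first, so MAF is never computed on an empty SNP.
        if call_rate(g) <= min_call_rate or minor_allele_frequency(g) <= min_maf:
            continue
        if all(abs(pearson_r(g, other)) < ld_threshold for _, other in kept):
            kept.append((snp_id, g))
    return [snp_id for snp_id, _ in kept]
```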

We further characterized genetic structure using STRUCTURE 2.3 (Pritchard et al., 2000), a Bayesian clustering approach that assigns individuals to a predefined number of populations. We considered an admixture model with correlated allele frequencies and no prior information on sampling locations. We used StrAuto 1.0 (Chhatre & Emerson, 2017) to automate and parallelize five independent runs for each assumed number of genetic clusters (K = 1–8). Each run consisted of 100,000 Markov chain Monte Carlo (MCMC) iterations after a burn-in period of 2,500 iterations. The outputs of the independent runs were used as input to the CLUMPAK web server (Kopelman et al., 2015) to summarize and display individual probabilities of membership in each genetic cluster. We evaluated the best K using the Evanno et al. (2005) method, which selects the number of clusters that maximizes the second-order rate of change of the log probability of the data (ΔK). Because uneven sampling and hierarchical structure are known to negatively affect STRUCTURE results (Janes et al., 2017; Puechmaille, 2016), we also ran STRUCTURE by subsampling individuals hierarchically. Finally, we explored the relationships among individuals by producing a maximum likelihood tree from the concatenated dataset in RAxML (Stamatakis, 2014), using the GTR+gamma substitution model and 100 bootstrap replicates to evaluate support.
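The ΔK statistic of Evanno et al. (2005) is the mean absolute second difference of ln P(D) across K, divided by the standard deviation of ln P(D) at that K. A minimal sketch, assuming replicate runs are paired by index across neighbouring K values; the log-probabilities in the usage example are hypothetical, not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(log_probs):
    """Delta K (Evanno et al., 2005) from replicate STRUCTURE runs.

    log_probs: dict K -> list of ln P(D), one value per replicate run;
    replicate i at K is paired with replicate i at K - 1 and K + 1.
    Returns dict K -> Delta K for every K that has both neighbours.
    """
    delta = {}
    for k in sorted(log_probs):
        if k - 1 not in log_probs or k + 1 not in log_probs:
            continue  # Delta K is undefined at the smallest and largest K
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, computed per replicate run
        second = [abs(up - 2 * mid + down) for up, mid, down in
                  zip(log_probs[k + 1], log_probs[k], log_probs[k - 1])]
        sd = stdev(log_probs[k])
        delta[k] = mean(second) / sd if sd > 0 else float("inf")
    return delta


# Hypothetical ln P(D) values for three replicates at K = 1-4.
example = {
    1: [-1000.0, -1001.0, -999.0],
    2: [-800.0, -802.0, -798.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -795.0, -775.0],
}
delta = evanno_delta_k(example)
best_k = max(delta, key=delta.get)
```

Because ΔK needs both neighbours, it can never select the smallest or largest K tested, which is one reason the hierarchical runs are informative.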

1.3.5. Demographic modeling

We modeled the demographic history of I. gularis using two independent approaches. First, we used the Generalized Phylogenetic Coalescent Sampler (G-PhoCS; Gronau et al., 2011), a full-likelihood method based on the multispecies coalescent model, to estimate demographic parameters (mutation-scaled ancestral population size, divergence time, and migration rate). We estimated demographic parameters in G-PhoCS under two separate models: pure isolation and isolation with migration. The pure isolation model assumes that an ancestral population with mutation rate parameter θ ($\theta = 4N_e\mu$ for a diploid organism, where $N_e$ is the effective population size and $\mu$ is the number of mutations per nucleotide site per generation) splits into two populations, each with its own θ parameter, at time $T = \tau/\mu$ generations ago. The isolation with migration model assumes an additional migration rate per generation parameter, $M_{sx} = m_{sx} \times \theta_x / 4$, which is the proportion of individuals in population x that migrated into population s per generation (Gronau et al., 2011).

G-PhoCS uses gamma distributions to specify the prior distributions for its parameters (θ, τ, and $m_{sx}$).

To assess the robustness of posterior estimates of demographic parameters to the chosen priors, we ran analyses under a set of prior distributions for θ, τ, and $m_{sx}$. Estimates of θ and τ were consistent across priors, so we report results only for the following priors: α = 1 and β = 300 for θ and τ; α = 0.002 and β = 0.00001 for $m_{sx}$. For each model, we ran 500,000 MCMC iterations, sampling every 100 iterations. MCMC results were diagnosed in TRACER v1.6 (Rambaut et al., 2016), and demographic parameter estimates were converted into biologically informative values using the Collared Flycatcher (Ficedula albicollis) germline mutation rate (4.6 × 10⁻⁹ mutations per site per generation; Smeds et al., 2016) and a generation time of one year (Mooers, 1994).
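The conversions follow directly from the definitions above: $N_e = \theta/(4\mu)$, $T = \tau/\mu$ generations (times the generation time in years), and $M_{sx} = m_{sx}\theta_x/4$. The sketch assumes θ and τ are per-site values as defined in the text; if a G-PhoCS run reports them rescaled by a constant, that scaling would need to be undone first. The function names are illustrative only.

```python
MU = 4.6e-9           # mutations per site per generation (Smeds et al., 2016)
GENERATION_YEARS = 1  # generation time in years (Mooers, 1994)


def theta_to_ne(theta, mu=MU):
    """Effective population size from theta = 4 * Ne * mu (diploid)."""
    return theta / (4 * mu)


def tau_to_years(tau, mu=MU, generation_years=GENERATION_YEARS):
    """Divergence time: T = tau / mu generations, converted to years."""
    return tau / mu * generation_years


def migrants_per_generation(m_sx, theta_x):
    """M_sx = m_sx * theta_x / 4, with theta_x of population x."""
    return m_sx * theta_x / 4
```

For example, a posterior mean of θ = 4 × 4.6 × 10⁻⁹ × 100,000 translates back to an $N_e$ of 100,000 individuals.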

Second, because G-PhoCS does not allow for model comparison, we also ran momi2 (Kamm et al., 2018), a computationally efficient composite-likelihood approach that fits the empirical site frequency spectrum (SFS) to its theoretical expectation under a given model. We employed momi2 to estimate parameters and to test the support for competing demographic scenarios. To reduce computational effort, we randomly selected 15 samples from each population (north and south of the Chivela Pass). We then tested a set of alternative two-population models to find the demographic scenario that best fit our empirical SFS. We implemented three models that differed in the presence and timing of gene flow across the Chivela Pass: 1) pure isolation, 2) isolation with migration, and 3) isolation with secondary contact. For all models, we allowed effective population sizes (Ne) to change at any point in time after the population split.

Because gene flow is modeled as pulse events in momi2, we added four equally spaced events of gene exchange, placed as a function of the divergence time. For the secondary contact model, we constrained the migration events to occur after the Last Glacial Maximum (LGM; ~22,000 years ago). For each model, we ran 10 optimizations and retained the one with the highest likelihood for model selection. We used Akaike weights (AICw; Sakamoto et al., 1986) to select the best-fit model and performed 300 nonparametric bootstraps to generate the mean/median and 95% confidence interval for each estimated demographic parameter. Confidence intervals were calculated using the adjusted bootstrap percentile (BCa) method in the R package boot (Canty & Ripley, 2012), which corrects for bias and skewness in the bootstrap distribution.
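Model selection by AIC reduces to simple arithmetic on the AIC values: $\Delta_i = \mathrm{AIC}_i - \min_j \mathrm{AIC}_j$, relative likelihood $\exp(-\Delta_i/2)$, and Akaike weight $w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)$. The sketch below applies these formulas to the rounded AIC values reported in Table 1.2; it is an illustration of the formulas, not momi2 output.

```python
import math


def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def akaike_table(aics):
    """Return (delta AIC, relative likelihoods, normalized Akaike weights)."""
    best = min(aics.values())
    delta = {m: a - best for m, a in aics.items()}
    rel = {m: math.exp(-d / 2) for m, d in delta.items()}
    total = sum(rel.values())
    weights = {m: r / total for m, r in rel.items()}
    return delta, rel, weights


aics = {
    "pure isolation": 1592.30,
    "isolation with migration": 1578.15,
    "secondary contact": 1587.78,
}
delta, rel, weights = akaike_table(aics)
```

With these values the isolation with migration model carries roughly 99% of the normalized weight.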

1.3.6. Modeling contemporary and paleo-distributions

To explore changes in the distribution of I. gularis, we modeled its climatic niche using MaxEnt 3.2.2 (Phillips et al., 2006), as implemented in the R package dismo v. 1.0-5 (Hijmans et al., 2005). MaxEnt uses a machine learning algorithm to find the model that provides the maximum entropy probability distribution of species occurrence, given a set of presence localities, randomly generated pseudo-absence localities, and environmental predictors (Phillips et al., 2006). A total of 861 occurrence points were obtained from VertNet (http://vertnet.org), an online data platform for specimen records from museum collections. To reduce potential biases associated with uneven sampling of occurrence data, we used the R package spThin (Aiello-Lammens et al., 2015) to retain only records at least 10 km apart from each other, resulting in 203 presence points.
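Spatial thinning can be sketched as a randomized sequential filter: shuffle the records, keep a record only if it lies at least 10 km from every record already kept, repeat, and retain the largest thinned set. This is a simplified reimplementation under that assumption, not spThin's code; coordinates are (latitude, longitude) in degrees and distances are great-circle (haversine) distances on a sphere of radius 6,371 km.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def thin_once(points, min_km, rng):
    """One randomized pass: keep points at least `min_km` from all kept points."""
    order = points[:]
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


def thin(points, min_km=10.0, reps=100, seed=0):
    """Repeat the randomized pass and return the largest thinned set."""
    rng = random.Random(seed)
    best = []
    for _ in range(reps):
        kept = thin_once(points, min_km, rng)
        if len(kept) > len(best):
            best = kept
    return best
```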

To construct ecological niche models (ENMs), we used the 19 bioclimatic variables from the WorldClim database (Hijmans et al., 2005) at 30 arc-second resolution (c. 1 km), cropped to Middle America. Because MaxEnt is able to downweight variables of low importance, including multiple correlated variables has little impact on the model (Elith et al., 2011). We generated pseudo-absence localities by sampling 10,000 points from the background environmental space of I. gularis. We then tuned our model settings using the R package ENMeval (Muscarella et al., 2014), choosing the set of parameters that produced the model with the highest area under the receiver operating characteristic curve (AUC). To randomly split occurrence points into training (80%) and testing (20%) datasets, we used k-fold partitioning (k = 5). The best model had the following parameters: regularization multiplier = 0.5; features = linear, quadratic, product, and hinge. Finally, to convert the continuous model into a presence/absence raster, we used the fixed cumulative value 5 threshold (0.145), as this provided the best fit to the empirical distribution of the species. Cells with suitability values below this threshold were considered unsuitable.

To investigate past changes in the species distribution, the model under current conditions was projected to the mid-Holocene (MID; ~6,000 years ago) and the Last Glacial Maximum (LGM; ~22,000 years ago) using the same set of 19 bioclimatic variables from the WorldClim database (Hijmans et al., 2005). To account for the uncertainty associated with different paleoclimatic projections, we averaged suitability scores across three general circulation models (Marmion et al., 2009): the Community Climate System Model (CCSM4), the Max-Planck-Institut für Meteorologie model (MPI-ESM-P), and the Model for Interdisciplinary Research on Climate (MIROC). We applied the same threshold used for the present distribution to produce presence/absence maps for the past; as before, cells with suitability values below the threshold were considered unsuitable.
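The ensemble-and-threshold step amounts to a cell-wise mean of the three projected suitability grids followed by a comparison with the 0.145 threshold. A minimal sketch, assuming the grids are equally shaped nested lists; the toy values in the usage are hypothetical.

```python
THRESHOLD = 0.145  # fixed cumulative value 5 threshold from the present-day model


def ensemble_mean(grids):
    """Cell-wise mean of equally shaped suitability grids (lists of rows)."""
    n = len(grids)
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[r][c] for g in grids) / n for c in range(cols)]
            for r in range(rows)]


def binarize(grid, threshold=THRESHOLD):
    """1 = suitable (value >= threshold), 0 = unsuitable (value below it)."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]


# Hypothetical 1 x 2 LGM projections from three circulation models.
ccsm4 = [[0.10, 0.30]]
mpi = [[0.20, 0.10]]
miroc = [[0.15, 0.00]]
lgm_mean = ensemble_mean([ccsm4, mpi, miroc])
lgm_presence = binarize(lgm_mean)
```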

We estimated geographic distance, as well as contemporary and historical climatic connectivity among localities, using CIRCUITSCAPE 4.0 (McRae & Beier, 2007). CIRCUITSCAPE uses electrical circuit theory to estimate the total resistance of the landscape separating a pair of localities. To obtain the geographic distance among points, we calculated the pairwise resistance distance across a "flat" landscape in which all cells of the raster had the same resistance value (Figure 1.2a). For contemporary and paleo connectivity, we used the ecological niche models generated with MaxEnt (Phillips et al., 2006) for current and LGM (~22,000 years ago) climates, respectively, to specify conductance values in CIRCUITSCAPE (Figure 1.2b–c). High suitability values represented high connectivity, whereas low suitability values represented limited dispersal. To avoid biases, we removed samples from grid cells that were not suitable in all time slices (i.e., both in the LGM and in the present).
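The resistance distance between two localities is the voltage that appears when 1 A of current is injected at one cell and the other cell is grounded. A toy version on a small raster is sketched below. Assumptions (not CIRCUITSCAPE's exact defaults): cells connect only to their four orthogonal neighbours, and each connection's conductance is the mean of the two cell conductances; CIRCUITSCAPE offers several connection schemes, including eight-neighbour connections.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def resistance_distance(conductance, a, b):
    """Effective resistance between cells a and b, given as (row, col)."""
    if a == b:
        return 0.0
    rows, cols = len(conductance), len(conductance[0])
    n = rows * cols

    def node(r, c):
        return r * cols + c

    lap = [[0.0] * n for _ in range(n)]  # weighted graph Laplacian
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):  # right and down neighbours
                if r2 < rows and c2 < cols:
                    g = (conductance[r][c] + conductance[r2][c2]) / 2
                    i, j = node(r, c), node(r2, c2)
                    lap[i][i] += g
                    lap[j][j] += g
                    lap[i][j] -= g
                    lap[j][i] -= g
    # Ground b (voltage 0) and inject 1 A at a; the voltage at a equals R(a, b).
    grounded, source = node(*b), node(*a)
    keep = [k for k in range(n) if k != grounded]
    A = [[lap[i][j] for j in keep] for i in keep]
    rhs = [1.0 if k == source else 0.0 for k in keep]
    voltages = solve(A, rhs)
    return voltages[keep.index(source)]
```

On a "flat" raster this distance grows with the number of cells separating two points but shrinks when many parallel paths exist, which is why it captures connectivity better than a single least-cost path.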

1.3.7. Quantifying phenotypic data from museum specimens

To explore the potential drivers of phenotypic divergence across individuals of I. gularis, we examined and measured three morphological traits in vouchered specimens held at the American Museum of Natural History (N = 171), the Museum of Zoology Alfonso L. Herrera of UNAM (N = 46), and the Museum of Zoology of ECOSUR Chetumal (N = 3): (1) wing chord length, a widely used proxy for body size in birds (Rand, 1961); (2) bill length; and (3) tarsus length. The measured specimens differed from those in the genetic dataset because skins were not accessible for most of our genetic samples. We removed juvenile individuals from downstream analyses, as these have been suggested to differ in body size (Dickerman, 2007). Finally, we retained only georeferenced specimens or specimens that could be georeferenced with a maximum uncertainty of 50 km, resulting in 151 samples. We performed a k-means clustering analysis to assign samples to phenotypic categories based on the three measured traits, and determined the optimal number of phenotypic clusters using the average silhouette method implemented in the R package factoextra (Kassambara & Mundt, 2017). We evaluated whether variation in traits was associated with the genetic cluster to which each sample presumably belonged, based on its geographic location, using a non-parametric Kruskal-Wallis test in base R. We also tested for homogeneity of variances among genetic groups using a non-parametric Fligner-Killeen test, also in base R. Finally, to assess the association between phenotype and environment, we ran a series of multiple linear regression models of wing length on environmental variables in R, with the assumed genetic cluster and sex as additional fixed effects. We restricted this analysis to wing length because we found it to best predict spatial variation in body size. We separated the 19 bioclimatic variables from the WorldClim database (Hijmans et al., 2005) into temperature (bio1 to bio11) and precipitation (bio12 to bio19) variables and conducted independent principal component analyses (PCA) on each set to reduce dimensionality, retaining the first four principal components, which together explained more than 95% of the total environmental variation in each set.
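The Kruskal-Wallis statistic ranks all observations jointly (averaging tied ranks), computes $H = \frac{12}{N(N+1)} \sum_i R_i^2/n_i - 3(N+1)$, and divides by the tie correction $1 - \sum (t^3 - t)/(N^3 - N)$. The sketch below mirrors that textbook definition rather than R's `kruskal.test` source. For the p-value we include only the closed-form χ² upper tail for two degrees of freedom (three groups), $\exp(-H/2)$; other group counts need a general χ² survival function (for example `scipy.stats.chi2.sf`).

```python
import math


def rank_with_ties(values):
    """1-based ranks with ties averaged; also returns the size of each tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more groups of measurements."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks, tie_sizes = rank_with_ties(values)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


def chi2_sf_df2(x):
    """Upper-tail chi-squared probability for exactly 2 degrees of freedom."""
    return math.exp(-x / 2)
```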

1.3.8. Predictors of genetic and phenotypic differentiation

To evaluate the relative contribution of different spatial processes to patterns of genetic and phenotypic differentiation in I. gularis, we used a multiple matrix regression with randomization approach (MMRR; Legendre et al., 1994; Wang et al., 2013) in conjunction with variance partitioning via commonality analysis (Prunier et al., 2015). We tested the effects of the following predictors on genetic and phenotypic distances: (1) geographic distance (IBD), (2) environmental dissimilarity (ENV), (3) contemporary connectivity (PRES), and (4) paleo connectivity (LGM). Pairwise genetic distance among individuals was measured by the identity-by-state (IBS) index, estimated with the R package SNPRelate (Zheng et al., 2012). Pairwise phenotypic distance among individuals was calculated as the absolute difference between two individuals' wing lengths, as a proxy for difference in body size. Environmental dissimilarity was estimated by performing a principal component analysis on the 19 bioclimatic variables from WorldClim (Hijmans et al., 2005) and calculating the multivariate Euclidean distance among points in R (R Development Core Team, 2008). We also tried separating environmental dissimilarity into temperature and precipitation components, but the results were qualitatively similar (Figures 1.S4 and 1.S5).
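The three kinds of pairwise matrices can be sketched as follows. We assume genotypes coded as 0/1/2 allele copies with `None` for missing data, and compute IBS as the proportion of alleles shared across loci called in both individuals, i.e. (2 − |g_i − g_j|)/2 averaged over loci; this is our reading of the IBS index, not SNPRelate's source code. `math.dist` (Python 3.8+) gives the Euclidean distance between environmental PC scores.

```python
import math


def ibs(g1, g2):
    """Proportion of alleles identical by state over loci called in both samples."""
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    return sum(2 - abs(a - b) for a, b in shared) / (2 * len(shared))


def abs_diff(a, b):
    """Phenotypic distance: absolute difference in wing length."""
    return abs(a - b)


def euclidean(u, v):
    """Environmental distance between vectors of PC scores."""
    return math.dist(u, v)


def pairwise(items, metric):
    """Full symmetric matrix of metric(items[i], items[j])."""
    n = len(items)
    return [[metric(items[i], items[j]) for j in range(n)] for i in range(n)]
```

For example, `pairwise([150.0, 160.0], abs_diff)` yields `[[0.0, 10.0], [10.0, 0.0]]`.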

We performed multiple matrix regression on z-transformed matrices using the MMRR function from the supplementary material of Wang (2013), with 1,000 permutations to assess the significance of each predictor. We further evaluated the relative importance of each predictor via commonality analysis (CA), a method that partitions the total model variance into unique and common effects (Prunier et al., 2015). This approach has the advantage of accounting for multicollinearity among predictor variables, which can otherwise produce spurious relationships. Commonality coefficients were calculated with the R package yhat (Nimon et al., 2013), and 95% confidence intervals around these coefficients were computed from 1,000 bootstrap replicates.
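The core of MMRR and of a two-predictor commonality analysis can be sketched in plain Python. Assumptions: the lower triangles of the distance matrices are z-scored and fitted by ordinary least squares; significance comes from permuting individual labels of the response matrix (rows and columns together) and comparing absolute coefficients, whereas Wang's implementation may compare t-statistics instead. For two predictors, the unique effect of each is the drop in R² when it is removed, and the common effect is $R^2_1 + R^2_2 - R^2_{12}$. All function names are illustrative.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lower_tri(m):
    """Vectorize the strict lower triangle of a square matrix."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]


def zscore(v):
    n = len(v)
    mu = sum(v) / n
    sd = (sum((x - mu) ** 2 for x in v) / (n - 1)) ** 0.5
    return [(x - mu) / sd for x in v]


def ols(y, xs):
    """Least squares with intercept via normal equations; returns (beta, R^2)."""
    cols = [[1.0] * len(y)] + list(xs)
    p = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)]
         for i in range(p)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(p)]
    beta = solve(A, rhs)
    fitted = [sum(beta[k] * cols[k][i] for k in range(p)) for i in range(len(y))]
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


def mmrr(Y, Xs, n_perm=1000, seed=0):
    """Return (standardized coefficients, R^2, permutation p-values)."""
    y = zscore(lower_tri(Y))
    xs = [zscore(lower_tri(X)) for X in Xs]
    beta, r2 = ols(y, xs)
    rng = random.Random(seed)
    n = len(Y)
    exceed = [0] * len(xs)
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # relabel individuals in the response matrix only
        Yp = [[Y[order[i]][order[j]] for j in range(n)] for i in range(n)]
        bp, _ = ols(zscore(lower_tri(Yp)), xs)
        for k in range(len(xs)):
            if abs(bp[k + 1]) >= abs(beta[k + 1]):
                exceed[k] += 1
    pvals = [(e + 1) / (n_perm + 1) for e in exceed]
    return beta[1:], r2, pvals


def commonality_two(y, x1, x2):
    """Unique and common R^2 components for a two-predictor model."""
    r2_full = ols(y, [x1, x2])[1]
    r2_1 = ols(y, [x1])[1]
    r2_2 = ols(y, [x2])[1]
    return {"unique_x1": r2_full - r2_2, "unique_x2": r2_full - r2_1,
            "common": r2_1 + r2_2 - r2_full}
```

The yhat package generalizes the same subset-regression logic to four predictors, which yields the 15 commonality coefficients reported below.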

1.4. Results

1.4.1. Characteristics of SNP dataset

We obtained a total of 74,307,192 reads for 83 samples, including the outgroup. We discarded 14 samples due to low read recovery (< 200,000 reads per sample), resulting in a total of 69 samples. We chose to discard these samples because they were redundant in geographic representation and their removal substantially increased the total number of recovered loci, reducing the amount of missing data. A total of 76,887 loci were obtained after excluding low-quality samples. After filtering paralogues and missing data, 19,232 loci were retained for downstream analyses (see Materials and Methods). After excluding the outgroup taxon, the final dataset had an average of 13,566 loci per sample, with a mean coverage depth of 19.7× (SD = 63.6×). We obtained a total of 125,112 SNPs (an average of 6.5 SNPs per locus), of which 73,298 were parsimony-informative sites.

1.4.2. Genetic structuring across lowland Middle America

For the PCA, we used a subset of 11,873 unlinked SNPs filtered from the total dataset (see Materials and Methods). The first three principal components together explained 21.6% of the total genomic variation. PC1, which explained 11.5% of the variation, corresponded to the major split across the Chivela Pass (Figure 1.3a,b). PC2, which explained 6.1% of the total variation, mainly discriminated populations south of the Pass (Figure 1.3a,c). Finally, PC3 explained 4% of the variation and separated the Northeast population from the remaining ones (Figure 1.3a,d).

STRUCTURE analysis supported two clusters (best K = 2; Figure 1.1b) with little admixture between them. In accordance with the PCA, this grouping separates samples from the Atlantic and Pacific coasts, north and south of the Chivela Pass, respectively. To assess potential population substructure in I. gularis, we also ran STRUCTURE hierarchically by analyzing samples from each major genetic group independently (i.e., south or north of the Chivela Pass). Within both the northern and the southern samples, we found support for K = 2 as the best partitioning of individuals. In the northern group, STRUCTURE revealed additional substructure suggesting differentiation of the Northeast samples (e.g., Tamaulipas and San Luis Potosi). However, this pattern could be an artifact of our sampling gap in Veracruz. In the southern group, the two recovered clusters form a continuum of change in ancestry proportions from west to east, consistent with a pattern of isolation by distance (Mantel test: r = 0.82; p = 0.001). Nevertheless, two groups are detected: one immediately south of the Chivela Pass and the other towards the southern extreme of the distribution, in El Salvador and Honduras. Notably, this substructure is not detected when the entire dataset is used, because the ΔK method tends to detect only the uppermost level of genetic structure; this highlights the importance of performing hierarchical STRUCTURE analyses, as recommended elsewhere (Janes et al., 2017; Pritchard et al., 2000; Puechmaille, 2016).

Figure 1.1. Geographic distribution of genetic variation in Icterus gularis. (A) Geographic map indicating the range of I. gularis (yellow shading), the localities of the ddRAD samples, and their respective admixture proportions from the STRUCTURE analysis (pie charts). The blue inset shows a zoomed-in view of the Isthmus of Tehuantepec region. (B) Results of the STRUCTURE analysis for the best-fit K = 2 with the entire dataset (above) and hierarchically within each genetic cluster (K = 2 in each group; below). Each bar indicates an individual's estimated ancestry proportion for each genetic cluster, represented by different colors, grouped by clade and then ordered by latitude. (C) Maximum likelihood tree based on the concatenated SNP dataset in RAxML. Bootstrap support is shown at the nodes; nodes with bootstrap values < 95 were collapsed. Colored boxes indicate different geographic regions in Middle America. 1: Tamaulipas; 2: San Luis Potosi; 3: Queretaro; 4: Hidalgo; 5: Guerrero; 6: Veracruz; 7: Oaxaca; 8: Tabasco; 9: Yucatan; 10: Chiapas; 11: Guatemala; 12: El Salvador; 13: Honduras.

The maximum likelihood tree constructed from the concatenated ddRAD loci showed a topology concordant with the STRUCTURE and PCA results (Figure 1.1c). There are two clades: one composed of samples from along the coast of the Gulf of Mexico and the Yucatan Peninsula (i.e., north of the Chivela Pass), and another composed of samples from sites along the Pacific coast (i.e., south of the Chivela Pass: Oaxaca, Chiapas, Guatemala, and El Salvador). The southern group contains a well-supported clade comprising individuals from Guerrero, Oaxaca, and Chiapas, but shows less resolution elsewhere. Within the northern group, there is a well-supported northeastern clade that includes samples from Tamaulipas, San Luis Potosi, Hidalgo, and Queretaro, but the relationships among the other samples (i.e., Yucatan Peninsula and Veracruz) were unresolved. Overall, the southern clade appears to have accumulated more fixed differences that allow phylogenetic sorting than the northern clade.

Figure 1.2. Landscape GIS surfaces used in CIRCUITSCAPE analyses. Each dot represents a sampling location. (A) "Flat" resistance surface. The area in red shows the current predicted range of Icterus gularis based on the thresholded MaxEnt niche model. (B) Suitability values for I. gularis in the present. (C) Suitability values for I. gularis in the Last Glacial Maximum (LGM; ~22,000 years ago). In (B) and (C), warm colors correspond to areas of high suitability and cold colors to areas of low suitability, according to the MaxEnt niche model.


Across the 76 samples analyzed for ND2, we found 19 unique haplotypes. The mtDNA haplotype network also indicated genetic structure across the Chivela Pass (Figure 1.S1). Two genetic groups, separated by eight point mutations, consist of samples from the Atlantic and Pacific portions of the species range, north and south of the Chivela Pass. Each of these genetic clusters shows a star-like shape. The northernmost part of the distribution (Tamaulipas, San Luis Potosi, etc.) exhibited the least diversity, being represented by a single haplotype (haplotype 9).

Consistent with the haplotype network, the mtDNA tree showed two clades that diverged 589,900 years ago (95% HPD = 330,000–871,200; Figure 1.S2). These clades are composed of haplotypes unique to samples from either coast (Atlantic and Pacific). In the northern clade, two highly supported lineages emerge: one containing haplotypes found mostly in the Yucatan Peninsula, and one containing haplotypes found in Tabasco, Chiapas, and Veracruz (haplotypes 9 and 10) plus the northeast region (haplotype 8). In the southern clade, the haplotypes from Guerrero and Oaxaca (haplotype 19) diverge from those from Honduras (haplotype 18) and from those of the remaining southern locations.


Figure 1.3. Principal component analysis of SNPs showing genetic differentiation among samples of Icterus gularis based on 11,873 unlinked SNPs. (A) Each dot represents an individual, with its color indicating geographic location. Each axis represents the score on one of the first three principal components (PC1: 11.5% of variation; PC2: 6.1% of variation; PC3: 4% of variation). (B) Geographic interpolation of PC1. (C) Geographic interpolation of PC2. (D) Geographic interpolation of PC3.

1.4.3. Demographic history

We inferred demographic parameters in G-PhoCS using a set of 19,189 loci. Most parameter estimates were consistent between the pure isolation and the isolation with migration models, as well as across the sets of priors, but the estimates of divergence time differed (Table 1.1). Our G-PhoCS inference under the isolation with migration model suggests that populations diverged across the Chivela Pass around 182,000 years ago (176,000–185,500; 95% CI). The ancestral Ne was estimated at 159,000 individuals (156,000–161,000; 95% CI). The mean Ne of the southern population (331,000; 324,000–338,000; 95% CI) was about 2.6 times that of the northern population (126,000; 122,000–128,000; 95% CI). We estimated an average of 0.0002 migrants per generation (0.000204–0.000241; 95% CI) from the northern to the southern population and an average of 0.0005 migrants per generation (0.000479–0.000553; 95% CI) from the southern to the northern population.

The best-fit demographic model in momi2 was isolation with migration, which had the lowest AIC (1578.15) and the highest Akaike weight (Table 1.2; Figure 1.4). The second-best model, isolation with secondary contact, had an AIC of 1587.78 (Table 1.2). Some parameter estimates, specifically migration rates, exhibited a pathological runaway behavior common in SFS-based demographic inference algorithms (Rosen et al., 2018) and should therefore be interpreted with caution. For the isolation with migration model, the median estimate of divergence time was younger than in G-PhoCS (148,204 years ago; 124,345–169,809; 95% CI). The northern population had an estimated Ne of 212,007 individuals (191,058–224,871; 95% CI) until it dropped drastically to 34,375 individuals (15,063–51,289; 95% CI) around 55,303 years ago (44,838–69,337; 95% CI). Estimates for the southern population suggest a steady Ne of 217,765 individuals since the population split (205,310–231,767; 95% CI). Estimates of the fraction of migrants were generally low. Gene flow from south to north was slightly larger (0.039 of Ne per migration event; 0.0262–0.0626; 95% CI) than from north to south (0.027 of Ne per migration event; 0.0116–0.0416; 95% CI).

Table 1.1. Demographic parameter estimates from G-PhoCS, with their respective 95% highest posterior density intervals in brackets.

Model | Ne SOUTH | Ne NORTH | ANC Ne | Tdiv | MNS | MSN
Pure isolation | 400,635 [391,000–409,000] | 142,483 [140,000–145,000] | 197,510 [195,000–200,000] | 92,882 [91,300–93,500] | – | –
Isolation with migration | 331,423 [324,000–338,000] | 125,548 [122,000–128,000] | 158,923 [156,000–161,000] | 181,656 [176,000–185,000] | 0.000515 [0.000479–0.000553] | 0.000222 [0.000204–0.000241]

Ne: effective population size; ANC: ancestral population; Tdiv: divergence time (in years); MNS: number of migrants per generation from North to South; MSN: number of migrants per generation from South to North.


Figure 1.4. The best-fit demographic model: isolation with migration. The width of the tubes represents the mean effective population size in haploid individuals (Ne), the arrows indicate pulses of gene flow, and the dashed lines indicate times of divergence (TD) and retraction (TR). The northern population shows an estimated Ne of 212,007 individuals (191,058–224,871; 95% CI) until it drops to only 34,375 individuals (15,063–51,289; 95% CI) around 55,303 years ago (44,838–69,337; 95% CI). Estimates for the southern population indicate a steady Ne of 217,765 individuals since the population split (205,310–231,767; 95% CI). Other parameter estimates are shown in the figure. Figure based on Provost et al. (2018).

1.4.4. Patterns and correlates of phenotypic variation

Measurements of wing, bill, and tarsus length varied substantially across individuals of I. gularis. The k-means clustering algorithm found k = 2 to be the best number of phenotypic clusters, corresponding to large and small individuals (Figure 1.5). All measured traits differed significantly among the genetic groups (Kruskal-Wallis χ² > 10; p < 0.001 for all traits), but wing length showed the strongest association (Kruskal-Wallis χ² = 40.459; p = 1.639 × 10⁻⁹; Figure 1.S3). Variances within these groups, however, were not homogeneous (Fligner-Killeen median χ² = 15.816; p = 0.0003). The southern clade exhibited the largest variation in wing length. As previously reported by Dickerman (2007), we observed a shift to smaller body sizes in the southern clade between longitudes -90 and -94 for all measured traits (Figure 1.5).

The first four axes of our PCA on temperature-related climatic variables (bio1–bio11) explained over 98% of the variance, with PC1 loading most strongly on variables related to seasonality and cold extremes, PC2 on hot extremes, PC3 on diurnal range, and PC4 on annual mean temperature (Table 1.S1). Similarly, the first four axes of the PCA on precipitation-related climatic variables (bio12–bio19) accounted for over 95% of the variation, with PC1 loading most strongly on seasonality, PC2 on annual precipitation and wet extremes, and PC3 on dry extremes (Table 1.S1). After accounting for sex and genetic ancestry, we found a significant association (p < 0.05) between wing length and PC1, PC2, and PC4 of temperature, and PC1, PC2, and PC3 of precipitation. Sex and genetic ancestry had a strong effect on wing length in all regression models (p < 0.001). PC2 of temperature and PC3 of precipitation had the strongest associations with wing length (PC2-T model: adjusted R² = 0.557; PC3-P model: adjusted R² = 0.527; Figure 1.6).

Table 1.2. Demographic model selection results showing the likelihood of each model in momi2.

Model | AIC | ΔAIC | AIC weight
Pure isolation | 1592.30 | 14.158 | 8.421e-04
Isolation with migration | 1578.15 | 0 | 1
Isolation with secondary contact | 1587.78 | 9.635 | 8.086e-03


Figure 1.5. Geographic variation in body size in Icterus gularis. Each colored circle on the map represents a specimen measured for phenotypic traits and its respective phenotypic cluster. The box at the bottom zooms in on the area of the shift in body size, showing wing length (in mm) along a longitudinal axis for males (black points) and females (purple points). The gray area indicates the transition from large-bodied to small-bodied individuals and back to large birds, as described by Dickerman (2007). Longitudes -94 and -90 correspond to areas near Tapanatepec, Oaxaca, and La Avellana, Guatemala, respectively.

1.4.5. Predictors of genetic and phenotypic differentiation

The multiple matrix regression model explained 50.3% of the variation in genetic distance (i.e., IBS index) among individuals (Figure 1.7). History (connectivity during the LGM) was the most important predictor, uniquely accounting for 29.19% of the total variance of the model (Figure 1.7). Isolation by history (IBH) was followed by pure geographic distance (IBD), which uniquely accounted for 19.28% of the model variation, and by present connectivity (PRES), which uniquely accounted for only 4.42%. Pure geographic distance and present connectivity were highly correlated, as shown by the large effect size of their second-order common component. After accounting for multicollinearity among predictor variables, only environmental dissimilarity (ENV) was nonsignificant (p = 0.789).

Figure 1.6. Plots showing the strongest effect sizes for associations between wing length and principal components of the environmental variables. (A) Relationship between wing length (in mm) and principal component 2 of temperature-related variables, which loads most heavily on hot extremes (N = 134; R² = 0.557; p < 0.001). (B) Relationship between wing length (in mm) and principal component 3 of precipitation-related variables, which loads most strongly on dry extremes (N = 134; R² = 0.527; p < 0.001).


For phenotype, the multiple matrix regression model explained 5.5% of the total variation in pairwise wing length differences among individuals. Only environmental dissimilarity (ENV; p = 0.003) was a significant predictor of differences in wing length. Unlike the genetic data, phenotypes were best predicted by environmental dissimilarity among localities (59.64% of the model variation; Figure 1.7).

Figure 1.7. Predictors of genome-wide genetic and phenotypic differentiation in Icterus gularis, as revealed by commonality analysis. Each of the 15 commonality coefficients represents the percent variance of the respective distance metric (i.e., IBS for genetic distance and absolute difference for phenotypic distance) explained by each set of predictors, that is, its effect size. The percent total, in turn, represents the share of the total variance accounted for by the multiple regression model (0.533 for genomic and 0.055 for phenotypic distance) that is explained by each set of predictors. The confidence intervals were computed through 1,000 bootstrap replicates of a random selection of 90% of the samples without replacement. The first four rows show the unique effects (U) of each of the four predictor variables (IBD: geographic distance, PRES: present connectivity, LGM: past connectivity, and ENV: environmental dissimilarity), followed by second- and third-order common effects. *: p < 0.01.


1.5. Discussion

We found that the predictors of genome-wide genetic and phenotypic variation across the distribution of I. gularis differed. Whereas genomic differentiation was best explained by changes in habitat suitability and fragmentation during the Pleistocene (i.e., isolation by history; IBH), body size was best predicted by differences in contemporary environmental factors, such as temperature and precipitation (isolation by environment; IBE). The percentage of explained variation was notably higher for the genomic than for the phenotypic metric. This suggests that genome-wide genetic variation can be largely explained by simple metrics that characterize present and past connectivity and their effects on gene flow across the landscape. In contrast, phenotypic variation is expected to be governed by a more complex suite of factors and interactions (e.g., environmental plasticity, seasonal environmental changes, and selective regimes) that are not fully captured by our abiotic variables. Our findings illustrate how different factors can structure genomic and phenotypic variation across the landscape. This provides insight into why phenotypic and genetic data are often in conflict in taxonomic and phylogeographic studies (Campagna et al., 2012, 2017; Rheindt & Edwards, 2011; Lamichhaney et al., 2015; Chaves et al., 2016).

1.5.1. Phylogeographic structure is best explained by historical barriers

Icterus gularis exhibits both deep and shallow genetic structuring across lowland Middle America. The most pronounced genetic discontinuity coincides with the Chivela Pass, a narrow mountain gap in the Isthmus of Tehuantepec region that separates three mountain chains in southern Mexico: the Sierra Madre Oriental, the Sierra Madre del Sur, and the Chiapas-Guatemala Mountains (Barrier et al., 1998). In this area, elevation drops steeply from 2,000 m to 200 m, making it a well-circumscribed east-west barrier for montane species (Binford, 1989; Sullivan et al., 2000; García-Moreno et al., 2004; Klicka et al., 2011; Gutiérrez-Rodríguez et al., 2011; Jiménez & Ornelas, 2015; Manthey et al., 2017). In lowland species, by contrast, a north-south break is commonly observed (Mulcahy et al., 2006; Vázquez-Miranda et al., 2009; Smith et al., 2011). This pattern is surprising, considering that the Chivela Pass does not impose a clear physical barrier to dispersal for lowland species. Our demographic modeling estimated that the split across the Chivela Pass occurred during the Middle to Late Pleistocene (124,345–185,500 years ago), and that gene flow continued after the genetic break formed. Within both the northern and southern clades, we observe further genetic differentiation.

The southern clade has the highest genetic diversity, but its variation appears to follow a pattern of isolation by distance. In the north, populations in Tamaulipas and San Luis Potosi are differentiated from those in southeastern Mexico. However, this pattern could be an artifact of the gap in our geographic sampling in the state of Veracruz. Therefore, we cannot rule out that isolation by distance also drives divergence within the northern clade.

Our landscape genomics analysis indicates that isolation by history (IBH), represented by paleo-connectivity (LGM), was the best predictor of genetic differentiation in I. gularis, uniquely explaining more than 50% of the total model variance. The reconstruction of the climatic niche of I. gularis revealed a retraction of suitable climatic conditions during the LGM. Populations that currently inhabit areas north of Veracruz (e.g., San Luis Potosi, Tamaulipas, and southern Texas) were displaced southwards during glacial periods, when temperatures dropped dramatically. In accordance with these findings, our demographic model supports a strong genetic bottleneck in the northern clade around 55,300 years ago (Late Pleistocene), when effective population size (Ne) declined to less than 20% of its original size. This event coincides with the onset of the Wisconsinan Glaciation, the most recent glacial period in North America (Clague & James, 2002).
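A reduction in Ne is expected to leave lower genetic diversity. The sketch below illustrates this with the textbook expectation for a diploid Wright-Fisher population, in which expected heterozygosity declines each generation by a factor of (1 − 1/(2Ne)). It ignores mutation, migration, selection, and recovery. The ancestral Ne and the number of generations are hypothetical; only the reduction to 20% mirrors the estimate reported above.

```python
def expected_heterozygosity(h0, ne_by_generation):
    """Expected heterozygosity after drift in a diploid Wright-Fisher population.

    Each generation multiplies H by (1 - 1 / (2 * Ne)); mutation, migration
    and selection are ignored.
    """
    h = h0
    for ne in ne_by_generation:
        h *= 1 - 1 / (2 * ne)
    return h


ancestral_ne = 50_000              # hypothetical pre-bottleneck effective size
bottleneck_ne = ancestral_ne // 5  # reduced to 20% of the original size
generations = 5_000                # hypothetical duration, in generations

h_constant = expected_heterozygosity(1.0, [ancestral_ne] * generations)
h_bottleneck = expected_heterozygosity(1.0, [bottleneck_ne] * generations)
print(f"fraction of H retained, constant Ne:     {h_constant:.3f}")
print(f"fraction of H retained, bottlenecked Ne: {h_bottleneck:.3f}")
```

Under these assumptions, about 95% of the initial heterozygosity remains after 5,000 generations at the ancestral size. At one-fifth of that size, only about 78% remains.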

This bottleneck is further supported by the lower genetic diversity observed in the northern populations (Figures 1.1 and 1.4c). For instance, individuals from Tamaulipas share a single ND2 haplotype, which is consistent with recent colonization (Excoffier et al., 2009). The climatic niche model also reveals that connectivity between the Atlantic and Pacific portions of the range of I. gularis via the Chivela Pass was lost during the Last Glacial Maximum (~22,000 years ago) but re-established during the Mid-Holocene (~6,000 years ago; Figure 1.S6). This suggests temporary isolation between southern and northern populations followed by repeated episodes of reconnection. Such a dynamic history might explain both the major genetic break coinciding with the Chivela Pass and the low level of gene flow detected between the two clades, because the demographic model that best fit our empirical data was isolation with pulse migration. Our estimates of the divergence time between the northern and southern clades do not fall within the Last Glacial Maximum (~22,000 years ago). Nevertheless, our paleo niche model likely reproduces distributional changes similar to those that occurred during preceding glacial cycles.

The second-best predictor of genetic differentiation in I. gularis was geographic distance, which uniquely accounts for almost 20% of the model variance. Geographic distance appears to have played a major role in promoting and maintaining genetic differentiation across populations. For instance, a pattern of isolation by distance is noticeable within the southern clade, where admixture proportions change nearly continuously with latitude. Moreover, because individuals can only disperse through a narrow stretch of lowland, the Chivela Pass might also have functioned as a “funnel” that limits dispersal from one side to the other. This reduction in dispersal might explain why levels of admixture across the Pass are low, despite geographic continuity.
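An isolation-by-distance pattern of this kind is commonly summarized by a Mantel test between genetic and geographic distance matrices. The sketch below is illustrative only and is not one of the analyses reported in this chapter. It assumes hypothetical sampling localities, a simulated latitudinal cline in an ancestry score, and great-circle (haversine) distances on a spherical Earth of radius 6,371 km, and it computes the Mantel correlation with a one-sided permutation p-value. Legendre and Fortin (2010) discuss the limitations of Mantel-based inference.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def great_circle_km(lat, lon):
    """Pairwise haversine distances (km) between points given in degrees."""
    phi, lam = np.radians(lat), np.radians(lon)
    dphi = phi[:, None] - phi[None, :]
    dlam = lam[:, None] - lam[None, :]
    a = (np.sin(dphi / 2) ** 2
         + np.cos(phi)[:, None] * np.cos(phi)[None, :] * np.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def mantel(D1, D2, n_perm=999, seed=0):
    """Pearson r between lower triangles of D1 and D2, one-sided permutation p."""
    rng = np.random.default_rng(seed)
    i, j = np.tril_indices(D1.shape[0], k=-1)
    x = D2[i, j]
    r_obs = np.corrcoef(D1[i, j], x)[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(D1.shape[0])
        r_perm = np.corrcoef(D1[np.ix_(p, p)][i, j], x)[0, 1]
        hits += int(r_perm >= r_obs)
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical localities with a latitudinal cline in an ancestry score
rng = np.random.default_rng(11)
lat = rng.uniform(13.0, 17.0, size=25)
lon = rng.uniform(-92.0, -91.0, size=25)
q = (lat - 13.0) / 4.0 + rng.normal(scale=0.05, size=25)
gen_d = np.abs(q[:, None] - q[None, :])
geo_d = great_circle_km(lat, lon)

r, p = mantel(gen_d, geo_d)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```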

Despite the marked environmental heterogeneity across the species’ distribution, we find little support for a role of the environment in driving genome-wide differentiation in I. gularis, a result that parallels findings in other systems (Hoelzer et al., 2008; Rundell & Price, 2009; Seeholzer & Brumfield, 2017). The lack of association between genomic differentiation and environmental dissimilarity suggests that niche evolution has played a negligible role in promoting genome-wide divergence in I. gularis. If local adaptation is occurring across the species range, natural selection might not have been strong enough to reduce gene flow globally and promote genome-wide population differentiation. Instead, population isolation via historical range fragmentation seems to be the main driver of genetic divergence, as evidenced by our multiple matrix regression analysis. This finding adds to the large body of studies supporting geographic isolation as the main driver of genetic divergence in birds (Mayr, 1963; reviewed in Price, 2008).


1.5.2. Patterns of phenotypic variation are best explained by environment

Phenotypic variation in birds can be shaped by a number of factors, including population history, natural and sexual selection, mimicry, competition, camouflage, and plasticity (Galeotti et al., 2003; Zamudio et al., 2016). We tested two alternatives. First, body size variation in I. gularis could be explained by neutral processes associated with the species’ population history (i.e., isolation by history and isolation by distance). Second, it could be explained by proxies for processes driven by the local environment, such as local adaptation and environmental plasticity (i.e., isolation by environment). In contrast to the genetic data, environmental dissimilarity was the only significant predictor of phenotypic differentiation in I. gularis. The total amount of variation explained by the model, however, was low, suggesting that factors other than those considered here could play a more important role in shaping phenotypic variation. Interestingly, the effect size of environmental dissimilarity in our commonality analysis was small. Yet we found a strong linear relationship between wing length (our proxy for body size) and several principal components of climatic variables, after accounting for sex and population ancestry. This difference in explanatory power suggests that the low effect size of environmental dissimilarity in our multiple matrix regression analysis might be due to the non-independence introduced by our pairwise distance transformation. Compared with the corresponding untransformed node-based data, this non-independence is known to lower Pearson correlations and add non-linearity to regression models, causing an overall loss in explanatory power (Dow et al., 1987; Legendre & Fortin, 2010; Franckowiak et al., 2017).
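A small simulation shows how explanatory power is lost when node-based values are converted into pairwise distances. The sketch below draws hypothetical individual climate scores and wing lengths with a true correlation of 0.8. It then compares the Pearson correlation on the individual values with the correlation between the corresponding pairwise absolute differences. All values are simulated, and the sample size merely echoes Figure 1.6.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 134  # same sample size as Figure 1.6; values are simulated
climate = rng.normal(size=n)
wing = 0.8 * climate + rng.normal(scale=0.6, size=n)  # true correlation 0.8

# Correlation on the individual (node-based) values
r_nodes = np.corrcoef(climate, wing)[0, 1]

# Correlation after converting both variables to pairwise absolute differences
i, j = np.tril_indices(n, k=-1)
r_pairs = np.corrcoef(np.abs(climate[i] - climate[j]),
                      np.abs(wing[i] - wing[j]))[0, 1]

print(f"individual values:      r = {r_nodes:.2f}, R^2 = {r_nodes**2:.2f}")
print(f"pairwise |differences|: r = {r_pairs:.2f}, R^2 = {r_pairs**2:.2f}")
print(f"{n} individuals give {len(i)} pairs; each individual is in {n - 1} pairs")
```

For bivariate normal data with correlation 0.8, the expected correlation between absolute pairwise differences is roughly 0.6, so R² falls from about 0.64 to about 0.36. This simulation captures only the transformation effect. The pairs are additionally non-independent, because each individual contributes to many of them.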

The environmental variables most strongly associated with wing length largely represent temperature and precipitation extremes, which are considered strong influences on the evolution of body size in vertebrates (Hamilton, 1961; McNab, 1971; Murphy, 1985; Ashton, 2002; Yom-Tov & Geffen, 2006). For example, birds and mammals occupying colder regions tend to be larger than their counterparts in warmer regions, a pattern known as Bergmann’s rule (Bergmann, 1847). This pattern has been widely demonstrated in birds both within (Hamilton, 1961; Ashton, 2002) and across species (Olson et al., 2009). Several physiological mechanisms have been proposed to explain body size patterns. These include heat conservation through a decreased surface area to mass ratio (Bergmann, 1847) and greater fasting endurance owing to more fat storage in larger animals (Boyce, 1979; Lindstedt & Boyce, 1985). Seasonality and precipitation have also been invoked to explain the latitudinal gradient in body size, as larger animals are better able to resist starvation and desiccation (Boyce, 1978; Lindstedt & Boyce, 1985; Murphy, 1985; Le Lagadec et al., 1998; Yom-Tov & Geffen, 2006; Olalla-Tárraga et al., 2009). Interestingly, body size variation in I. gularis follows the reverse of Bergmann’s rule across its lowland tropical distribution: individuals in warmer areas are larger than individuals in cooler ones. Precipitation is also negatively associated with body size in I. gularis, with smaller individuals inhabiting more humid areas.
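The surface-area argument for Bergmann’s rule follows from geometric scaling. Consider geometrically similar bodies of characteristic length L and constant density. This is a simplifying assumption, since real birds do not scale isometrically. Under it, surface area A and mass M scale as:

```latex
A \propto L^{2}, \qquad M \propto L^{3}
\quad\Longrightarrow\quad
\frac{A}{M} \propto L^{-1} \propto M^{-1/3}
```

Doubling body mass therefore lowers surface area per unit mass by a factor of 2^(−1/3) ≈ 0.79, or by about 21%.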

Ecological factors other than climate might also be important determinants of body size. Resource availability, population density, and interspecific interactions are known to affect the body size of birds and mammals via phenotypic plasticity (Damuth, 1981; McAdam & Boutin, 2004; Husby et al., 2011). A study of three passerine bird populations found that recent declines in body size could not be attributed to genetic change, but were instead due to phenotypic plasticity (Husby et al., 2011). Phenotypic changes peaked during times of food scarcity, suggesting that birds adjusted their body size in response to changes in food abundance (Husby et al., 2011). Body size changes of this magnitude are much smaller than those we observed in I. gularis. Even so, fine-scale ecological factors could contribute to the high percentage of unexplained variance in our model.

Only experiments that measure heritability and explicitly test for the effect of a range of ecological predictors on body size of I. gularis will be able to elucidate whether variation is driven by natural selection or phenotypic plasticity.

Our phenotypic data show that individuals of I. gularis fall into one of two categories. These categories are significantly associated with the genetic ancestry of individuals: northern individuals tend to be smaller and southern individuals larger. However, the southern clade exhibits much more variation in body size, encompassing the entire range of body size in the species (Figure 1.S3). This difference was largely due to a noticeable pattern of geographic variation within the southern clade. There, populations of large-bodied birds are separated by an intervening population of small-bodied birds between southwestern Chiapas and the border of Guatemala and El Salvador. Such “leapfrog” patterns may arise from long-distance dispersal or extinction of intervening populations (Cadena et al., 2011). This does not appear to be the case for I. gularis, because its small-bodied populations lack phylogenetic affinity (i.e., populations north of the Chivela Pass are not sister to the southern small-bodied population). Alternatively, this pattern may arise through parallel evolution of phenotypes or retention of an ancestral character state (Cadena et al., 2011). In I. gularis, the small-bodied populations may have independently evolved small body sizes, or they may have retained this ancestral state while adjacent populations increased in size. These patterns can also be associated with local adaptation when disjunct populations evolve convergent phenotypes via natural selection (Remsen, 1984; Cadena et al., 2011). Our data do not allow us to tease these hypotheses apart. However, the strong correlation we found between phenotypic traits and environmental variables suggests that natural selection might have played a role in driving spatial patterns of body size variation. Finally, we cannot rule out a role for environmental plasticity, via genotype-environment interactions, in producing this geographic pattern. More detailed research on the population biology of I. gularis could clarify whether any particular factors in this area cause a reduction in its body size.

1.6. Conclusion

In sum, we found support for distinct spatial processes shaping genomic versus phenotypic variation in a songbird (I. gularis). Our results indicate that genome-wide genetic variation was primarily driven by demographic processes during the Middle to Late Pleistocene (isolation by history), when populations recurrently contracted to isolated refugia on different sides of the Chivela Pass. In contrast, phenotypic diversity (as represented by differences in body size) in I. gularis was better predicted by environmental variation (isolation by environment), with individuals generally larger in hotter and drier regions. Our finding that historical and environmental processes predict different axes of intraspecific variation may offer a general explanation of why phenotypic and genetic patterns frequently differ in taxonomic and phylogeographic studies.

1.7. Author contribution

This study was conceived and designed by Lucas Rocha Moreira and Brian Tilston Smith. Samples were collected and made available by Blanca E. Hernandez-Baños. Lucas Rocha Moreira conducted laboratory work, took phenotypic measurements, performed statistical analyses and drafted the paper with input from all authors.


1.8. References

Aiello-Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541–545. https://doi.org/10.1111/ecog.01132

Araújo, M. B., Nogués-Bravo, D., Diniz-Filho, J. A. F., Haywood, A. M., Valdes, P. J., & Rahbek, C. (2008). Quaternary climate changes explain diversity among reptiles and amphibians. Ecography, 31(1), 8–15. https://doi.org/10.1111/j.2007.0906-7590.05318.x

Ashton, K. G. (2002). Patterns of within-species body size variation of birds: strong evidence for Bergmann’s rule. Global Ecology and Biogeography, 11(6), 505–523. https://doi.org/10.1046/j.1466-822X.2002.00313.x

Barrier, E., Velasquillo, L., Chavez, M., & Gaulon, R. (1998). Neotectonic evolution of the Isthmus of Tehuantepec (southeastern Mexico). Tectonophysics, 287(1–4), 77–96. https://doi.org/10.1016/S0040-1951(98)80062-0

Bergmann, C. (1847). About the relationships between heat conservation and body size of animals. Goett Stud, 1, 595–708.

Binford, L. C. (1989). A distributional survey of the birds of the Mexican state of Oaxaca. AOU Ornithological Monographs, 43, 1–418.

Boyce, M. S. (1978). Climatic variability and body size variation in the muskrats (Ondatra zibethicus) of North America. Oecologia, 36(1), 1–19. https://doi.org/10.1007/BF00344567

Boyce, M. S. (1979). Seasonality and patterns of natural selection for life histories. The American Naturalist, 114(4), 569–583. https://doi.org/10.1086/283503

Bradburd, G. S., Ralph, P. L., & Coop, G. M. (2013). Disentangling the effects of geographic and ecological isolation on genetic differentiation. Evolution, 67(11), 3258–3273. https://doi.org/10.1111/evo.12193

Burri, R., Nater, A., Kawakami, T., Mugal, C. F., Olason, P. I., Smeds, L., … Ellegren, H. (2015). Linked selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers. Genome Research, 25(11), 1656–1665. https://doi.org/10.1101/gr.196485.115


Cabanne, G. S., Trujillo-Arias, N., Calderón, L., D’Horta, F. M., & Miyaki, C. Y. (2014). Phenotypic evolution of an Atlantic Forest passerine (Xiphorhynchus fuscus): biogeographic and systematic implications. Biological Journal of the Linnean Society, 113(4), 1047–1066. https://doi.org/10.1111/bij.12362

Cadena, C. D., Cheviron, Z. A., & Funk, W. C. (2011). Testing the molecular and evolutionary causes of a “leapfrog” pattern of geographical variation in coloration. Journal of Evolutionary Biology, 24(2), 402–414. https://doi.org/10.1111/j.1420-9101.2010.02175.x

Campagna, L., Benites, P., Lougheed, S. C., Lijtmaer, D. A., Di Giacomo, A. S., Eaton, M. D., & Tubaro, P. L. (2012). Rapid phenotypic evolution during incipient speciation in a continental avian radiation. Proceedings of the Royal Society B: Biological Sciences, 279(1734), 1847–1856. https://doi.org/10.1098/rspb.2011.2170

Campagna, L., Repenning, M., Silveira, L. F., Fontana, C. S., Tubaro, P. L., & Lovette, I. J. (2017). Repeated divergent selection on pigmentation genes in a rapid finch radiation. Science Advances, 3(5), e1602404. https://doi.org/10.1126/sciadv.1602404

Canty, A., & Ripley, B. (2012). boot: Bootstrap R (S-Plus) functions.

Chaves, J. A., Cooper, E. A., Hendry, A. P., Podos, J., De León, L. F., Raeymaekers, J. A. M., … Uy, J. A. C. (2016). Genomic variation at the tips of the adaptive radiation of Darwin’s finches. Molecular Ecology, 25(21), 5282–5295. https://doi.org/10.1111/mec.13743

Chhatre, V. E., & Emerson, K. J. (2017). StrAuto: automation and parallelization of STRUCTURE analysis. BMC Bioinformatics, 18(1), 192. https://doi.org/10.1186/s12859-017-1593-0

Clague, J. J., & James, T. S. (2002). History and isostatic effects of the last ice sheet in southern British Columbia. Quaternary Science Reviews, 21(1–3), 71–87. https://doi.org/10.1016/S0277-3791(01)00070-1

Coop, G., Witonsky, D., Di Rienzo, A., & Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423. https://doi.org/10.1534/genetics.110.114819

Damuth, J. (1981). Population density and body size in mammals. Nature, 290(5808), 699–700. https://doi.org/10.1038/290699a0


Davis, M. B. (2001). Range shifts and adaptive responses to Quaternary climate change. Science, 292(5517), 673–679. https://doi.org/10.1126/science.292.5517.673

Dickerman, R. W. (2007). Birds of the Southern Pacific Lowlands of Guatemala with a review of Icterus gularis. Special Publication of the Museum of Southwestern Biology, (7), 1–45.

Dow, M. M., Cheverud, J. M., & Friedlaender, J. S. (1987). Partial correlation of distance matrices in studies of population structure. American Journal of Physical Anthropology, 72(3), 343–352. https://doi.org/10.1002/ajpa.1330720307

Drummond, A. J., & Bouckaert, R. R. (2014). Bayesian evolutionary analysis with BEAST 2.

Drummond, A. J., Suchard, M. A., Xie, D., & Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29(8), 1969–1973. https://doi.org/10.1093/molbev/mss075

Eaton, D. A. R., & Overcast, I. (2016). ipyrad: interactive assembly and analysis of RADseq data sets.

Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17(1), 43–57. https://doi.org/10.1111/j.1472-4642.2010.00725.x

Ellegren, H., & Galtier, N. (2016). Determinants of genetic diversity. Nature Reviews Genetics, 17(7), 422–433. https://doi.org/10.1038/nrg.2016.58

Evanno, G., Regnaut, S., & Goudet, J. (2005). Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14(8), 2611–2620. https://doi.org/10.1111/j.1365-294X.2005.02553.x

Excoffier, L., Foll, M., & Petit, R. J. (2009). Genetic consequences of range expansions. Annual Review of Ecology, Evolution, and Systematics, 40(1), 481–501. https://doi.org/10.1146/annurev.ecolsys.39.110707.173414

Feder, J. L., Egan, S. P., & Nosil, P. (2012). The genomics of speciation-with-gene-flow. Trends in Genetics, 28(7), 342–350. https://doi.org/10.1016/j.tig.2012.03.009

Franckowiak, R. P., Panasci, M., Jarvis, K. J., Acuña-Rodriguez, I. S., Landguth, E. L., Fortin, M.-J., & Wagner, H. H. (2017). Model selection with multiple regression on distance matrices leads to incorrect inferences. PLOS ONE, 12(4), e0175194. https://doi.org/10.1371/journal.pone.0175194

Galeotti, P., Rubolini, D., Dunn, P. O., & Fasola, M. (2003). Colour polymorphism in birds: causes and functions. Journal of Evolutionary Biology, 16(4), 635–646. https://doi.org/10.1046/j.1420-9101.2003.00569.x

García-Moreno, J., Navarro-Sigüenza, A. G., Peterson, A. T., & Sánchez-González, L. A. (2004). Genetic variation coincides with geographic structure in the common bush-tanager (Chlorospingus ophthalmicus) complex from Mexico. Molecular Phylogenetics and Evolution, 33(1), 186–196. https://doi.org/10.1016/j.ympev.2004.05.007

Gossmann, T. I., Santure, A. W., Sheldon, B. C., Slate, J., & Zeng, K. (2014). Highly variable recombinational landscape modulates efficacy of natural selection in birds. Genome Biology and Evolution, 6(8), 2061–2075. https://doi.org/10.1093/gbe/evu157

Gould, S. J., & Johnston, R. F. (1972). Geographic variation. Annual Review of Ecology and Systematics, 3(1), 457–498. https://doi.org/10.1146/annurev.es.03.110172.002325

Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G., & Siepel, A. (2011). Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43(10), 1031–1034. https://doi.org/10.1038/ng.937

Gruber, K., Schöning, C., Otte, M., Kinuthia, W., & Hasselmann, M. (2013). Distinct subspecies or phenotypic plasticity? Genetic and morphological differentiation of mountain honey bees in East Africa. Ecology and Evolution, 3(10), 3204–3218. https://doi.org/10.1002/ece3.711

Gutiérrez-Rodríguez, C., Ornelas, J. F., & Rodríguez-Gómez, F. (2011). Chloroplast DNA phylogeography of a distylous shrub (Palicourea padifolia, Rubiaceae) reveals past fragmentation and demographic expansion in Mexican cloud forests. Molecular Phylogenetics and Evolution, 61(3), 603–615. https://doi.org/10.1016/j.ympev.2011.08.023

Hamilton, T. H. (1961). The adaptive significances of intraspecific trends of variation in wing length and body size among bird species. Evolution, 15(2), 180. https://doi.org/10.2307/2406079

Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., & Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25(15), 1965–1978. https://doi.org/10.1002/joc.1276

Hoelzer, G. A., Drewes, R., Meier, J., & Doursat, R. (2008). Isolation-by-distance and outbreeding depression are sufficient to drive parapatric speciation in the absence of environmental influences. PLoS Computational Biology, 4(7), e1000126. https://doi.org/10.1371/journal.pcbi.1000126

Holderegger, R., Kamm, U., & Gugerli, F. (2006). Adaptive vs. neutral genetic diversity: implications for landscape genetics. Landscape Ecology, 21(6), 797–807. https://doi.org/10.1007/s10980-005-5245-9

Howell, S. N. G., & Webb, S. (1995). A guide to the birds of Mexico and northern Central America. Oxford University Press.

Husby, A., Hille, S. M., & Visser, M. E. (2011). Testing mechanisms of Bergmann’s rule: phenotypic decline but no genetic change in body size in three passerine bird populations. The American Naturalist, 178(2), 202–213. https://doi.org/10.1086/660834

Janes, J. K., Miller, J. M., Dupuis, J. R., Malenfant, R. M., Gorrell, J. C., Cullingham, C. I., & Andrew, R. L. (2017). The K = 2 conundrum. Molecular Ecology, 26(14), 3594–3602. https://doi.org/10.1111/mec.14187

Jiménez, R. A., & Ornelas, J. F. (2015). Historical and current introgression in a Mesoamerican hummingbird species complex: a biogeographic perspective. PeerJ, 4, e1556. https://doi.org/10.7717/peerj.1556

Kamm, J. A., Terhorst, J., Durbin, R., & Song, Y. S. (2018). Efficiently inferring the demographic history of many populations with allele count data. BioRxiv, 1–29. https://doi.org/10.1101/287268

Kassambara, A., & Mundt, F. (2017). factoextra: extract and visualize the results of multivariate data analyses. Retrieved from https://cran.r-project.org/package=factoextra

Keller, S. R., & Taylor, D. R. (2008). History, chance and adaptation during biological invasion: separating stochastic phenotypic evolution from response to selection. Ecology Letters, 11(8), 852–866. https://doi.org/10.1111/j.1461-0248.2008.01188.x


Klicka, J., Spellman, G. M., Winker, K., Chua, V., & Smith, B. T. (2011). A phylogeographic and population genetic analysis of a widespread, sedentary North American bird: the Hairy Woodpecker (Picoides villosus). The Auk, 128(2), 346–362. https://doi.org/10.1525/auk.2011.10264

Kopelman, N. M., Mayzel, J., Jakobsson, M., Rosenberg, N. A., & Mayrose, I. (2015). Clumpak: A program for identifying clustering modes and packaging population structure inferences across K. Molecular Ecology Resources, 15(5), 1179–1191. https://doi.org/10.1111/1755-0998.12387

Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution, 35(6), 1547–1549. https://doi.org/10.1093/molbev/msy096

Lamichhaney, S., Berglund, J., Almén, M. S., Maqbool, K., Grabherr, M., Martinez-Barrio, A., … Andersson, L. (2015). Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature, 518, 371–375. https://doi.org/10.1038/nature14181

Lande, R. (1980). Genetic variation and phenotypic evolution during allopatric speciation. The American Naturalist, 116(4), 463–479. https://doi.org/10.1086/283642

Landry, C. R., & Aubin-Horth, N. (2014). Ecological Genomics. Advances in Experimental Medicine and Biology, 781, 1–5. https://doi.org/10.1007/978-94-007-7347-9

Le Corre, V., & Kremer, A. (2012). The genetic differentiation at quantitative trait loci under local adaptation. Molecular Ecology, 21(7), 1548–1566. https://doi.org/10.1111/j.1365-294X.2012.05479.x

Le Lagadec, M. D., Chown, S. L., & Scholtz, C. H. (1998). Desiccation resistance and water balance in southern African keratin beetles (Coleoptera, Trogidae): the influence of body size and habitat. Journal of Comparative Physiology B: Biochemical, Systemic, and Environmental Physiology, 168(2), 112–122. https://doi.org/10.1007/s003600050127

Legendre, P., & Fortin, M.-J. (2010). Comparison of the Mantel test and alternative approaches for detecting complex multivariate relationships in the spatial analysis of genetic data. Molecular Ecology Resources, 10(5), 831–844. https://doi.org/10.1111/j.1755-0998.2010.02866.x


Legendre, P., Lapointe, F.-J., & Casgrain, P. (1994). Modeling brain evolution from behavior: a permutational regression approach. Evolution, 48(5), 1487. https://doi.org/10.2307/2410243

Leigh, J. W., & Bryant, D. (2015). POPART: full-feature software for haplotype network construction. Methods in Ecology and Evolution, 6, 1110–1116. https://doi.org/10.1111/2041-210X.12410

Lindstedt, S. L., & Boyce, M. S. (1985). Seasonality, fasting endurance, and body size in mammals. The American Naturalist, 125(6), 873–878. https://doi.org/10.1086/284385

Manthey, J. D., Geiger, M., & Moyle, R. G. (2017). Relationships of morphological groups in the northern flicker superspecies complex (Colaptes auratus & C. chrysoides). Systematics and Biodiversity, 15(3), 183–191. https://doi.org/10.1080/14772000.2016.1238020

Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R. K., & Thuiller, W. (2009). Evaluation of consensus methods in predictive species distribution modelling. Diversity and Distributions, 15(1), 59–69. https://doi.org/10.1111/j.1472-4642.2008.00491.x

Mayr, E. (1942). Systematics and the Origin of Species. New York: Columbia University Press.

Mayr, E. (1963). Animal species and evolution. Cambridge, MA: Harvard University Press.

McAdam, A. G., & Boutin, S. (2004). Maternal effects and the response to selection in red squirrels. Proceedings of the Royal Society of London. Series B: Biological Sciences, 271(1534), 75–79. https://doi.org/10.1098/rspb.2003.2572

McNab, B. K. (1971). On the ecological significance of Bergmann’s rule. Ecology, 52(5), 845–854. https://doi.org/10.2307/1936032

McRae, B. H., & Beier, P. (2007). Circuit theory predicts gene flow in plant and animal populations. Proceedings of the National Academy of Sciences, 104(50), 19885–19890. https://doi.org/10.1073/pnas.0706568104

Mitchell-Olds, T., Willis, J. H., & Goldstein, D. B. (2007). Which evolutionary processes influence natural genetic variation for phenotypic traits? Nature Reviews Genetics, 8(11), 845–856. https://doi.org/10.1038/nrg2207

Mooers, A. (1994). Metabolic rate, generation time, and the rate of molecular evolution in birds. Molecular Phylogenetics and Evolution, 3(4), 344–350. https://doi.org/10.1006/mpev.1994.1040

Mulcahy, D. G., Morrill, B. H., & Mendelson, J. R. (2006). Historical biogeography of lowland species of toads (Bufo) across the trans-Mexican neovolcanic belt and the Isthmus of Tehuantepec. Journal of Biogeography, 33(11), 1889–1904. https://doi.org/10.1111/j.1365-2699.2006.01546.x

Murphy, E. C. (1985). Bergmann’s rule, seasonality, and geographic variation in body size of house sparrows. Evolution, 39(6), 1327–1334. https://doi.org/10.1111/j.1558-5646.1985.tb05698.x

Muscarella, R., Galante, P. J., Soley-Guardia, M., Boria, R. A., Kass, J. M., Uriarte, M., & Anderson, R. P. (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for MaxEnt ecological niche models. Methods in Ecology and Evolution, 5(11), 1198–1205. https://doi.org/10.1111/2041-210X.12261

Myers, E. A., Xue, A. T., Gehara, M., Cox, C., Davis Rabosky, A. R., Lemos‐Espinal, J., … Burbrink, F. T. (2019). Environmental heterogeneity and not vicariant biogeographic barriers generate community‐wide population structure in desert‐adapted snakes. Molecular Ecology, mec.15182. https://doi.org/10.1111/mec.15182

Nimon, K., Oswald, F., & Roberts, J. K. (2013). yhat: Interpreting regression effects. R package version 2.0-0. Retrieved from https://CRAN.R-project.org/package=yhat

Nonaka, E., Svanbäck, R., Thibert-Plante, X., Englund, G., & Brännström, Å. (2015). Mechanisms by which phenotypic plasticity affects adaptive divergence and ecological speciation. The American Naturalist, 186(5), E126–E143. https://doi.org/10.1086/683231

Olalla-Tárraga, M. Á., Diniz-Filho, J. A. F., Bastos, R. P., & Rodríguez, M. Á. (2009). Geographic body size gradients in tropical regions: water deficit and anuran body size in the Brazilian Cerrado. Ecography, 32(4), 581–590. https://doi.org/10.1111/j.1600-0587.2008.05632.x

Olson, V. A., Davies, R. G., Orme, C. D. L., Thomas, G. H., Meiri, S., Blackburn, T. M., … Bennett, P. M. (2009). Global biogeography and ecology of body size in birds. Ecology Letters, 12(3), 249–259. https://doi.org/10.1111/j.1461-0248.2009.01281.x


Omland, K. E., Lanyon, S. M., & Fritz, S. J. (1999). A molecular phylogeny of the New World Orioles (Icterus): the importance of dense taxon sampling. Molecular Phylogenetics and Evolution, 12(2), 224–239. https://doi.org/10.1006/mpev.1999.0611

Orr, M. R., & Smith, T. B. (1998). Ecology and speciation. Trends in Ecology & Evolution, 13, 502–506.

Peterson, B. K., Weber, J. N., Kay, E. H., Fisher, H. S., & Hoekstra, H. E. (2012). Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE, 7(5), e37135. https://doi.org/10.1371/journal.pone.0037135

Pfennig, D. W., Wund, M. A., Snell-Rood, E. C., Cruickshank, T., Schlichting, C. D., & Moczek, A. P. (2010). Phenotypic plasticity’s impacts on diversification and speciation. Trends in Ecology & Evolution, 25(8), 459–467. https://doi.org/10.1016/j.tree.2010.05.006

Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3–4), 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026

Powell, A. F. L. A., Barker, F. K., Lanyon, S. M., Burns, K. J., Klicka, J., & Lovette, I. J. (2014). A comprehensive species-level molecular phylogeny of the New World blackbirds (Icteridae). Molecular Phylogenetics and Evolution, 71(1), 94–112. https://doi.org/10.1016/j.ympev.2013.11.009

Price, T. (2008). Speciation in birds. Roberts and Company Publishers.

Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155(2), 945–959.

Prunier, J. G., Colyn, M., Legendre, X., Nimon, K. F., & Flamand, M. C. (2015). Multicollinearity in spatial genetics: separating the wheat from the chaff using commonality analyses. Molecular Ecology, 24(2), 263–283. https://doi.org/10.1111/mec.13029

Puechmaille, S. J. (2016). The program structure does not reliably recover the correct population structure when sampling is uneven: Subsampling and new estimators alleviate the problem. Molecular Ecology Resources, 16(3), 608–627. https://doi.org/10.1111/1755-0998.12512

Rambaut, A., Suchard, M. A., Xie, D., & Drummond, A. J. (2016). Tracer v1.6. Retrieved from http://tree.bio.ed.ac.uk/software/tracer/

Rand, A. L. (1961). Wing length as an indicator of weight: a contribution. Bird-Banding, 32(2), 71. https://doi.org/10.2307/4510860

Rausher, M. D., & Delph, L. F. (2015). Commentary: When does understanding phenotypic evolution require identification of the underlying genes? Evolution, 69(7), 1655–1664. https://doi.org/10.1111/evo.12687

Remsen, J. V. (1984). High incidence of “leapfrog” pattern of geographic variation in Andean birds: implications for the speciation process. Science, 224(4645), 171–173. https://doi.org/10.1126/science.224.4645.171

Rheindt, F. E., & Edwards, S. V. (2011). Genetic Introgression: an integral but neglected component of speciation in birds. The Auk, 128(4), 620–632. https://doi.org/10.1525/auk.2011.128.4.620

Rosen, Z., Bhaskar, A., Roch, S., & Song, Y. S. (2018). Geometry of the sample frequency spectrum and the perils of demographic inference. Genetics, 210(2), 665–682. https://doi.org/10.1534/genetics.118.300733

Rundell, R. J., & Price, T. D. (2009). Adaptive radiation, nonadaptive radiation, ecological speciation and nonecological speciation. Trends in Ecology & Evolution, 24(7), 394–399. https://doi.org/10.1016/j.tree.2009.02.007

Sakamoto, Y., Ishiguro, M., & Kitagawa, G. (1986). Akaike information criterion statistics. Dordrecht, The Netherlands: D. Reidel.

Samuk, K., Owens, G. L., Delmore, K. E., Miller, S. E., Rennison, D. J., & Schluter, D. (2017). Gene flow and selection interact to promote adaptive divergence in regions of low recombination. Molecular Ecology, 26(17), 4378–4390. https://doi.org/10.1111/mec.14226

Savolainen, O., Lascoux, M., & Merilä, J. (2013). Ecological genomics of local adaptation. Nature Reviews. Genetics, 14(11), 807–820. https://doi.org/10.1038/nrg3522

Seeholzer, G. F., & Brumfield, R. T. (2017). Isolation by distance, not incipient ecological speciation, explains genetic differentiation in an Andean songbird (Aves: Furnariidae: Cranioleuca antisiensis, Line-cheeked Spinetail) despite near threefold body size change across an environmental gradient. Molecular Ecology, (May), 1–18. https://doi.org/10.1111/mec.14429

Sexton, J. P., Hangartner, S. B., & Hoffmann, A. A. (2014). Genetic isolation by environment or distance: which pattern of gene flow is most common? Evolution, 68(1), 1–15. https://doi.org/10.1111/evo.12258

Shafer, A. B. A., & Wolf, J. B. W. (2013). Widespread evidence for incipient ecological speciation: A meta-analysis of isolation-by-ecology. Ecology Letters, 16(7), 940–950. https://doi.org/10.1111/ele.12120

Schneider, C. J., Smith, T. B., Larison, B., & Moritz, C. (1999). A test of alternative models of diversification in tropical rainforests: ecological gradients vs. rainforest refugia. Proceedings of the National Academy of Sciences, 96, 13869–13873.

Smeds, L., Qvarnström, A., & Ellegren, H. (2016). Direct estimate of the rate of germline mutation in a bird. Genome Research, 26(9), 1211–1218. https://doi.org/10.1101/gr.204669.116

Smith, B. T., Escalante, P., Hernández Baños, B. E., Navarro-Sigüenza, A. G., Rohwer, S., & Klicka, J. (2011). The role of historical and contemporary processes on phylogeographic structure and genetic diversity in the Northern Cardinal, Cardinalis cardinalis. BMC Evolutionary Biology, 11(1), 136. https://doi.org/10.1186/1471-2148-11-136

Smith, B. T., McCormack, J. E., Cuervo, A. M., Hickerson, M. J., Aleixo, A., Cadena, C. D., … Brumfield, R. T. (2014). The drivers of tropical speciation. Nature, 515(7527), 1–8. https://doi.org/10.1038/nature13687

Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9), 1312–1313. https://doi.org/10.1093/bioinformatics/btu033

Sullivan, J., Arellano, E., & Rogers, D. S. (2000). Comparative phylogeography of Mesoamerican highland rodents: concerted versus independent response to past climatic fluctuations. The American Naturalist, 155(6), 755–768. https://doi.org/10.1086/303362

Tigano, A., & Friesen, V. L. (2016). Genomics of local adaptation with gene flow. Molecular Ecology, 25(10), 2144–2164. https://doi.org/10.1111/mec.13606


Vasconcellos, M. M., Colli, G. R., Weber, J. N., Ortiz, E. M., Rodrigues, M. T., & Cannatella, D. C. (2019). Isolation by instability: Historical climate change shapes population structure and genomic divergence of treefrogs in the Neotropical Cerrado savanna. Molecular Ecology, 28(7), 1748–1764. https://doi.org/10.1111/mec.15045

Vázquez-Miranda, H., Navarro-Sigüenza, A. G., & Omland, K. E. (2009). Phylogeography of the Rufous-Naped Wren (Campylorhynchus rufinucha): Speciation and Hybridization in Mesoamerica. The Auk, 126(4), 765–778. https://doi.org/10.1525/auk.2009.07048

Wang, I. J., & Summers, K. (2010). Genetic structure is correlated with phenotypic divergence rather than geographic isolation in the highly polymorphic strawberry poison-dart frog. Molecular Ecology, 19(3), 447–458. https://doi.org/10.1111/j.1365-294X.2009.04465.x

Wang, I. J., Glor, R. E., & Losos, J. B. (2013). Quantifying the roles of ecology and geography in spatial genetic divergence. Ecology Letters, 16(2), 175–182. https://doi.org/10.1111/ele.12025

Wang, I. J., & Bradburd, G. S. (2014). Isolation by environment. Molecular Ecology, 23(23), 5649–5662. https://doi.org/10.1111/mec.12938

Weber, J. N., Bradburd, G. S., Stuart, Y. E., Stutz, W. E., & Bolnick, D. I. (2017). Partitioning the effects of isolation by distance, environment, and physical barriers on genomic divergence between parapatric threespine stickleback. Evolution, 71(2), 342–356. https://doi.org/10.1111/evo.13110

Weir, J. T., & Schluter, D. (2008). Calibrating the avian molecular clock. Molecular Ecology, 17(10), 2321–2328. https://doi.org/10.1111/j.1365-294X.2008.03742.x

Winger, B. M., & Bates, J. M. (2015). The tempo of trait divergence in geographic isolation: Avian speciation across the Marañon Valley of Peru. Evolution, 69(3), 772–787. https://doi.org/10.1111/evo.12607

Wright, S. (1943). Isolation by distance. Genetics, 28(2), 114–138.

Wu, C.-I. (2001). The genic view of the process of speciation. Journal of Evolutionary Biology, 14(6), 851–865. https://doi.org/10.1046/j.1420-9101.2001.00335.x

Yom-Tov, Y., & Geffen, E. (2006). Geographic variation in body size: the effects of ambient temperature and precipitation. Oecologia, 148(2), 213–218. https://doi.org/10.1007/s00442-006-0364-9

Zamudio, K. R., Bell, R. C., & Mason, N. A. (2016). Phenotypes in phylogeography: Species’ traits, environmental variation, and vertebrate diversification. Proceedings of the National Academy of Sciences, 113(29), 8041–8048. https://doi.org/10.1073/pnas.1602237113

Zheng, X., Levine, D., Shen, J., Gogarten, S. M., Laurie, C., & Weir, B. S. (2012). A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics, 28(24), 3326–3328. https://doi.org/10.1093/bioinformatics/bts606

Zink, R. M., Klicka, J., & Barber, B. R. (2004). The tempo of avian diversification during the Quaternary. Philosophical Transactions of the Royal Society B: Biological Sciences, 359(1442), 215–220. https://doi.org/10.1098/rstb.2003.1392


1.9. Supplemental Material

Figure 1.S1. Geographic distribution and mtDNA (ND2) diversity in Icterus gularis. The yellow-shaded area shows the geographic range of the species. Each dot represents a sampling location, with a pie chart illustrating the frequency of the 19 ND2 haplotypes at that location. The diameter of each pie chart is scaled to sample size. The median-joining haplotype network represents the relationships among all haplotypes. Each circle represents a unique haplotype, with circle area proportional to haplotype frequency. Bars connecting circles indicate the number of mutational steps or missing haplotypes.


Figure 1.S2. Bayesian timetree of the 19 ND2 haplotypes of Icterus gularis, with I. auratus and I. nigrogularis as outgroups. Values along branches indicate Bayesian posterior probabilities. The time scale bar is in millions of years before present (Mya). Colored circles correspond to the colors associated with the different haplotypes in Figure 1.S1.


Figure 1.S3. Boxplot of body size, as measured by wing length (in mm), in three genetic groups detected in Icterus gularis.


Figure 1.S4. Predictors of genetic differentiation in Icterus gularis, as revealed by commonality analysis. Each of the 31 commonality coefficients represents the percent variance in genetic distance (IBS) explained by a given set of predictors, that is, its effect size. The percent total, in contrast, represents the proportion of the total variance accounted for by the multivariate model (R² = 0.523) that is explained by each set of predictors. Confidence intervals were computed from 1,000 bootstrap replicates, each drawing a random 90% of the samples without replacement. The first five rows show the unique effects (U) of each of the five predictor variables (IBD: geographic distance; PRES: present connectivity; LGM: past connectivity; TEMP: temperature dissimilarity; PRE: precipitation dissimilarity), followed by higher-order interactions.


Figure 1.S5. Predictors of phenotypic differentiation in Icterus gularis, as revealed by commonality analysis. Each of the 31 commonality coefficients represents the percent variance in phenotypic distance explained by a given set of predictors, that is, its effect size. The percent total, in contrast, represents the proportion of the total variance accounted for by the multivariate model (R² = 0.043) that is explained by each set of predictors. Confidence intervals were computed from 1,000 bootstrap replicates, each drawing a random 90% of the samples without replacement. The first five rows show the unique effects (U) of each of the five predictor variables (IBD: geographic distance; PRES: present connectivity; LGM: past connectivity; TEMP: temperature dissimilarity; PRE: precipitation dissimilarity), followed by higher-order interactions.
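The commonality coefficients in Figures 1.S4 and 1.S5 partition the full-model R² among every non-empty subset of predictors, which gives 2⁵ − 1 = 31 coefficients for five predictors. The following is a minimal Python sketch of that partition, assuming ordinary least squares on a response vector and a predictor matrix (for example, vectorized lower triangles of distance matrices). It computes each coefficient by inclusion-exclusion over the R² of all sub-models. It is not the yhat R package (Nimon et al. 2013), and the function names are hypothetical.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the predictor columns in `cols` (with intercept)."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def commonality(X, y):
    """Commonality coefficient for every non-empty subset S of predictors:
    C(S) = sum over T subset of S of (-1)**(|S| - |T| + 1) * R2(all predictors except T).
    The coefficients sum to the full-model R^2."""
    p = X.shape[1]
    full = tuple(range(p))
    # R^2 of every sub-model, keyed by the sorted tuple of included columns
    r2 = {cols: r_squared(X, y, list(cols))
          for k in range(p + 1) for cols in itertools.combinations(full, k)}
    coeffs = {}
    for k in range(1, p + 1):
        for S in itertools.combinations(full, k):
            total = 0.0
            for j in range(k + 1):
                for T in itertools.combinations(S, j):
                    rest = tuple(c for c in full if c not in T)
                    total += (-1) ** (k - j + 1) * r2[rest]
            coeffs[S] = total
    return coeffs

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))   # hypothetical stand-ins for IBD, PRES, LGM, TEMP, PRE
X[:, 1] += 0.8 * X[:, 0]        # a collinear pair of predictors
y = X @ np.array([0.5, 0.3, 0.0, 0.2, 0.1]) + rng.normal(size=300)
cc = commonality(X, y)
print(len(cc))  # 31
```

Single-predictor keys such as `(0,)` give the unique effects (U). Multi-predictor keys give the shared components, which can be negative when predictors act as suppressors.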


Figure 1.S6. Current and past distributions of Icterus gularis in Mesoamerica, as predicted by MaxEnt. Colors correspond to values of climatic suitability, ranging from low (cool colors) to high (warm colors). LGM: Last Glacial Maximum (~22 kya); Mid-Holocene (~6 kya).


Table 1.S1. Loadings of the 19 Bioclim environmental variables on the first four principal component (PC) axes of temperature and precipitation, with their respective proportion of variance explained.

Temperature PCA
Variable                                 PC1     PC2     PC3     PC4
Annual Mean Temperature                0.340   0.137   0.218  -0.566
Mean Diurnal Range                    -0.064   0.065   0.795   0.270
Isothermality                          0.404   0.049   0.271   0.316
Temperature Seasonality               -0.432  -0.054  -0.065  -0.306
Max Temperature of Warmest Month      -0.132  -0.456   0.295  -0.078
Min Temperature of Coldest Month       0.345  -0.328  -0.110   0.081
Temperature Annual Range              -0.421  -0.010   0.312  -0.132
Mean Temperature of Wettest Quarter   -0.149  -0.470   0.039  -0.232
Mean Temperature of Driest Quarter     0.257  -0.405  -0.072   0.079
Mean Temperature of Warmest Quarter    0.107  -0.499   0.071  -0.050
Mean Temperature of Coldest Quarter    0.342   0.148   0.186  -0.567
Proportion of variance explained       0.433   0.338   0.129   0.084

Precipitation PCA
Variable                                 PC1     PC2     PC3     PC4
Annual Precipitation                  -0.206   0.655   0.366  -0.313
Precipitation of Wettest Month        -0.340   0.380  -0.243   0.505
Precipitation of Driest Month          0.425   0.123   0.075  -0.171
Precipitation Seasonality              0.444  -0.044   0.174  -0.073
Precipitation of Wettest Quarter       0.296   0.626  -0.105  -0.017
Precipitation of Driest Quarter       -0.214  -0.105   0.855   0.222
Precipitation of Warmest Quarter       0.443  -0.019   0.139  -0.022
Precipitation of Coldest Quarter       0.366   0.086   0.099   0.750
Proportion of variance explained       0.595   0.168   0.123   0.067
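The squared loadings in each column of Table 1.S1 sum to approximately one. For temperature PC1, for example, 0.340² + (−0.064)² + … + 0.342² ≈ 1.00. This is consistent with loadings reported as unit-length eigenvectors of a PCA on standardized variables. The following minimal Python sketch shows that computation, assuming a samples × variables matrix of Bioclim values. The function name and the simulated data are hypothetical. The sign of each PC axis is arbitrary, so loadings from any implementation may differ from the table in sign.

```python
import numpy as np

def pca_loadings(data, n_components=4):
    """PCA of standardized variables (i.e., of the correlation matrix).

    data: array of shape (n_samples, n_variables), e.g. Bioclim values per cell.
    Returns (loadings, proportion_of_variance) for the first n_components axes;
    loadings are unit-length eigenvectors, one column per PC.
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are eigenvectors of the correlation matrix, ordered by singular value
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s**2 / (z.shape[0] - 1)
    proportion = eigenvalues / eigenvalues.sum()
    return vt[:n_components].T, proportion[:n_components]

rng = np.random.default_rng(0)
climate = rng.normal(size=(500, 11))   # hypothetical: 500 cells x 11 temperature variables
climate[:, 1] += climate[:, 0]         # induce correlation between two variables
loadings, prop = pca_loadings(climate)
print(loadings.shape)                  # (11, 4)
```

Running the temperature and precipitation variables through separate PCAs, as Table 1.S1 does, keeps each set of axes interpretable as a single climatic dimension.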

Chapter 2

DEMOGRAPHY AND LINKED SELECTION INTERACT TO SHAPE THE GENOMIC LANDSCAPE OF CODISTRIBUTED WOODPECKERS DURING THE ICE AGE

*L.R. Moreira is the lead author on a version of this manuscript in preparation for publication.


2.1. Abstract

The glacial cycles of the Pleistocene had a global impact on the evolution of species. The influence of genetic drift on population genetic dynamics is well understood, but the role of selection in shaping patterns of genomic variation during dramatic climatic changes is less clear. In this study, we used whole-genome data to investigate the interplay between demography and natural selection and their influence on the genomic landscape of Downy and Hairy Woodpecker, two codistributed species whose populations have been strongly affected by glaciations. We tested whether levels of nucleotide diversity along the genome are correlated with intrinsic genomic properties, such as recombination rate and gene density, and whether different demographic trajectories affected the efficacy of natural selection. As expected, our results reveal a dynamic population history for Downy and Hairy Woodpecker, with repeated cycles of bottleneck and expansion, and genetic structure associated with glacial refugia. We found substantial variation in levels of nucleotide diversity across the genomes of Downy and Hairy Woodpecker, but this variation was highly correlated between the two species, suggesting the presence of conserved genomic features. Nucleotide diversity in both species was positively correlated with recombination rate and negatively correlated with gene density, suggesting that linked selection played a role in reducing diversity in regions of low recombination and high density of targets of selection. Despite strong temporal fluctuations in Ne, our demographic analyses indicate that Downy and Hairy Woodpecker maintained relatively large effective population sizes during glaciations, which might have favored natural selection.
However, we found evidence that the magnitude of the effect of linked selection was modulated by the individual demographic trajectories of populations and species, such that purifying selection has been more efficient at removing deleterious alleles in Hairy Woodpecker owing to its larger long-term Ne. These results highlight the complexity of understanding the impact of natural selection in organisms with fluctuating demographic dynamics and large effective population sizes during Ice Age climatic fluctuations.


2.2. Introduction

Pleistocene glacial cycles altered the distribution and evolution of entire communities (Hewitt 2000; Hewitt 2004). Despite the profound impact glaciations had on the evolutionary trajectory of species, most research on the topic has focused on how demographic dynamics shaped genome-wide neutral variation (Hewitt 2004; Nadachowska-Brzyska et al. 2015). Population expansion (Lessa et al. 2003; Burbrink et al. 2016), genetic structuring in refugia (Knowles 2001; Zink et al. 2004; Anderson et al. 2006; Waltari et al. 2007; Shafer et al. 2010), and decreased diversity in expanding populations (Campbell-Staton et al. 2012; Pulgarín-R and Burg 2012; Reid et al. 2018) are among the most commonly recovered patterns. However, as species rapidly expanded and colonized areas undergoing extreme environmental change, they would have been subject to strong selective pressures, such as selection for increased cold tolerance and against deleterious mutations (Davis 2001; Gossmann et al. 2019). Understanding how natural selection and genetic drift interact with features of the genome to shape the genomic landscape of diversity and differentiation will clarify the broader significance of the Ice Age for the evolution of species.

Demography and natural selection play a central role in shaping levels of genetic diversity, but their effects are intertwined (Li et al. 2012; Kern and Hahn 2018; Jensen et al. 2019). Neutral genetic diversity in a population (θ) is proportional to the product of the rate at which new alleles are generated (the mutation rate, μ) and the effective population size (θ = 4Neμ), so diversity levels are predicted to be correlated with population size (Kimura and Crow 1964; Kimura 1983). However, selection also tends to be more efficient in large populations. The fixation of beneficial alleles (selective sweeps; Maynard and Haigh 2007; Cutter and Choi 2010) and the removal of deleterious mutations (background selection; Charlesworth et al. 1993; Cutter and Choi 2010; Cutter and Payseur 2013; Comeron 2014) can both reduce genetic diversity across the genome through linked selection (Cutter and Payseur 2013). Demographic perturbations that cause Ne to fluctuate over time and space (e.g., glacial bottlenecks) are, therefore, expected to result in a larger accumulation of mildly deleterious alleles than in populations with constant Ne, because purifying selection is less efficient when genetic drift is strong (Henn et al. 2016; Willi et al. 2018; Wang et al. 2018; Rougemont et al. 2020; de Pedro et al. 2021). Hence, populations resulting from founder events, such as those at the leading edge of a postglacial expansion, often show elevated genetic load (Willi et al. 2018; Mattila et al. 2019; de Pedro et al. 2021). The outcome of these drift-selection dynamics, together with genomic features, is a highly heterogeneous landscape in which certain regions of the genome are more diverse or differentiated than others.
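The two quantitative claims in this paragraph can be made concrete with a small Python sketch. First, θ = 4Neμ can be inverted to give the Ne implied by an observed θ. Plugging in a genome-wide θπ of 0.006 and the mutation rate specified for the demographic analyses in this chapter (2.42 × 10⁻⁹) gives roughly 6.2 × 10⁵. That figure is only an equilibrium back-of-envelope value, not an estimate reported here, and the negative Tajima's D values reported in this chapter argue against equilibrium. Second, the reduced efficacy of purifying selection under strong drift follows from Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch assumes census size equals Ne, and the function names are hypothetical.

```python
import math

def ne_from_theta(theta, mu):
    """Equilibrium Ne implied by theta = 4 * Ne * mu (diploid, constant size)."""
    return theta / (4.0 * mu)

def fixation_probability(s, ne):
    """Kimura's diffusion approximation for the fixation probability of a single
    new mutation (initial frequency 1/(2Ne)) with selection coefficient s,
    assuming census size equals Ne. Negative s means deleterious."""
    if s == 0:
        return 1.0 / (2.0 * ne)
    if s > 0:
        return math.expm1(-2.0 * s) / math.expm1(-4.0 * ne * s)
    a = -s  # deleterious case rewritten so that exp(4 * Ne * |s|) never overflows
    return math.expm1(2.0 * a) * math.exp(-4.0 * ne * a) / -math.expm1(-4.0 * ne * a)

print(round(ne_from_theta(0.006, 2.42e-9)))  # 619835

# Fixation probability of a mildly deleterious mutation relative to a neutral one
for ne in (1e4, 1e6):
    ratio = fixation_probability(-1e-5, ne) / fixation_probability(0.0, ne)
    print(f"Ne = {ne:.0e}: relative fixation probability = {ratio:.3g}")
```

With the same s, the deleterious mutation fixes almost as often as a neutral one when Ne is small (|4Nes| < 1). Its fixation probability becomes vanishingly small when Ne is large, which is the sense in which drift weakens purifying selection.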

Levels of diversity and differentiation along the genome vary because of the differing effects of intrinsic genomic properties (Begun and Aquadro 1992; Gossmann et al. 2011; Dutoit et al. 2017; Stankowski et al. 2019; Wang et al. 2020). Genome features such as variation in mutation rate, recombination rate, distribution of functional elements, and nucleotide composition affect the rates at which genetic variants are produced, maintained, and lost (Talla et al. 2019). Regions enriched for functional elements (e.g., coding sequences), for instance, tend to exhibit significantly lower levels of genetic diversity due to the recurrent effect of natural selection (Andolfatto 2007; Branca et al. 2011; Gossmann et al. 2011; Beissinger et al. 2015). The loss of variation is further amplified by linkage disequilibrium (LD), which reduces diversity at neutrally evolving sites close to the targets of selection (the hitchhiking effect; Maynard and Haigh 2007). The extent to which linked selection affects neighboring sites depends on the recombination rate, which varies considerably across the genome (Jensen-Seaman 2004; Smukowski and Noor 2011; Kawakami et al. 2014; Schield et al. 2020). In estrildid finches, for instance, recombination rates vary by nearly six orders of magnitude across the genome, with gene-rich hotspots interspersed among large deserts of low recombination (Singhal et al. 2015). Larger reductions in nucleotide diversity are therefore expected in genomic regions that are enriched for functional elements and have low recombination rates, and a correlation among nucleotide diversity, gene density, and recombination rate suggests that linked selection is at play.
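A widely used approximation for background selection expresses expected diversity at a neutral site, relative to its neutral expectation, as B = exp(−Σ uᵢtᵢ / (tᵢ + rᵢ(1 − tᵢ))²). Here uᵢ is the deleterious mutation rate at selected element i, tᵢ = hs is its heterozygous selection coefficient, and rᵢ is the recombination fraction between it and the focal site. The following minimal Python sketch evaluates B to illustrate the expectation stated above: more selected elements and less recombination both push B further below 1. The parameter values, the linear mapping from base pairs to recombination fraction, and the function name are hypothetical. This is an illustration, not a model fitted in this chapter.

```python
import numpy as np

def bgs_b(focal, selected_pos, u, t, rec_per_bp):
    """Expected pi/pi0 at a neutral site under background selection:
    B = exp(-sum_i u * t / (t + r_i * (1 - t))**2)."""
    d = np.abs(np.asarray(selected_pos, dtype=float) - focal)
    r = np.minimum(d * rec_per_bp, 0.5)  # linear bp -> recombination fraction, capped at 0.5
    return float(np.exp(-np.sum(u * t / (t + r * (1.0 - t)) ** 2)))

window = 1_000_000
focal = window // 2
gene_poor = np.linspace(0, window, 20)    # hypothetical: 20 selected elements
gene_rich = np.linspace(0, window, 200)   # hypothetical: 200 selected elements
for label, sites in (("gene-poor", gene_poor), ("gene-rich", gene_rich)):
    for rate in (1e-9, 1e-7):             # low vs. high recombination per bp
        b = bgs_b(focal, sites, u=1e-5, t=0.01, rec_per_bp=rate)
        print(f"{label:9s} rec={rate:.0e}  B={b:.3f}")
```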


Despite the common finding of heterogeneity along the genome, metrics of diversity and differentiation are often correlated across taxa (Renaut et al. 2013; Han et al. 2017; Dutoit et al. 2017; Van Doren et al. 2017; Delmore et al. 2018; Stankowski et al. 2019). For instance, genome-wide nucleotide diversity (θπ) and the fixation index (FST) covary across bird species that diverged more than 50 million years ago (Vijay et al. 2017). Because distantly related species are not expected to share ancestral polymorphism, a correlated landscape of diversity and differentiation suggests recurrent selection acting on conserved properties of the genome (Vijay et al. 2017; Dutoit et al. 2017). In fact, parallel peaks and valleys in θπ and FST across species are often linked to correlated gene density and recombination rate, further supporting the role of natural selection (Cruickshank and Hahn 2014; Burri et al. 2015; Stankowski et al. 2019). Quantifying covariance between evolutionarily independent species can therefore help disentangle how conserved features of the genome shape patterns of diversity and differentiation.
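One common way to quantify whether two species share a landscape of diversity is a rank correlation between window-level estimates, such as θπ in non-overlapping windows mapped to the same reference. The following minimal Python sketch uses simulated, hypothetical values. It is not the analysis performed in this chapter. The rank function assumes no ties, which is reasonable for continuous window estimates.

```python
import numpy as np

def rank(x):
    """Ranks 0..n-1; assumes no ties (continuous windowed estimates)."""
    x = np.asarray(x)
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

# Hypothetical windowed diversity for two species that share a conserved
# landscape (e.g., recombination and gene density) plus species-specific noise
rng = np.random.default_rng(7)
shared = rng.lognormal(size=1000)
pi_species_a = 0.006 * shared * rng.lognormal(sigma=0.2, size=1000)
pi_species_b = 0.006 * shared * rng.lognormal(sigma=0.2, size=1000)
unrelated = rng.lognormal(size=1000)
print(spearman(pi_species_a, pi_species_b), spearman(pi_species_a, unrelated))
```

A rank-based measure is a natural choice here because windowed diversity is typically right-skewed.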

We aim to address drift-selection dynamics in the Pleistocene by estimating the impact of demography and linked selection on the genomes of Downy (Dryobates pubescens) and Hairy (D. villosus) Woodpeckers, two codistributed species that share similar ecologies and evolutionary histories. Downy and Hairy Woodpeckers are year-round residents of a variety of habitats in North America and occur in sympatry across an exceptionally broad geographic area from Alaska to Florida, although the range of the Hairy Woodpecker extends farther south, into portions of Central America and the Bahamas (Ouellet 1977). Despite looking very similar, Downy and Hairy Woodpeckers are not sister species and shared a common ancestor more than eight million years ago (Weibel and Moore 2005; Dufort 2016). During the glacial cycles of the Pleistocene, especially when the polar ice sheets reached their maximum extent (Last Glacial Maximum; 21 kya), a large portion of the present-day distribution of both species was covered in ice, and their populations were restricted to southern refugia (Klicka et al. 2011; Graham and Burg 2012; Pulgarín-R and Burg 2012). After the retreat of Pleistocene glaciers, Downy and Hairy Woodpeckers expanded their distributions northward, recolonizing higher latitudes. Phylogeographic studies of Downy and Hairy Woodpeckers revealed that populations currently inhabiting previously glaciated areas show strong signatures of population expansion and population structuring consistent with multiple glacial refugia (Ball and Avise 1992; Klicka et al. 2011; Pulgarín-R and Burg 2012; Graham and Burg 2012). This shared demographic history provides an opportunity to investigate multiple genomic factors that might have shaped the distribution of diversity across populations and within the genomes of these two natural evolutionary replicates.

In this study, we generated whole-genome resequencing data for Downy and Hairy Woodpeckers to explore the role of demography and natural selection in patterns of genome-wide genetic variation. We test whether the heterogeneous genomic landscape of diversity and differentiation in Downy and Hairy Woodpeckers is correlated with intrinsic features of the genome, such as recombination rate and gene density, and whether differences in demographic history affected the efficacy of selection. We hypothesize that if linked selection reduced diversity at neutral sites along the genomes of Downy and Hairy Woodpeckers, local levels of nucleotide diversity should be positively correlated with recombination rate and negatively correlated with the density of targets of selection.

In addition, we predict that if the efficacy of selection was a function of the demographic trajectory of populations during the Ice Age, more stable populations (i.e., larger Ne) will exhibit lower genetic load and a stronger correlation between nucleotide diversity and intrinsic genomic properties, such as recombination rate. These results have implications for our understanding of the relative importance of neutral and selective processes in the evolution of the genomic landscape of species heavily impacted by glaciations.

2.3. Results

2.3.1. Congruent population structure and genetic diversity

We characterized population genetic structure in Downy and Hairy Woodpeckers across an array of ecological zones that would have been subject to varying effects of Pleistocene climatic cycles. We sequenced the whole genomes of 70 individuals of each species (140 samples in total; Table 2.S1), representing seven geographic locations in North America: Northeast (NE), Southeast (SE), Midwest (MW), Southern Rockies (SR), Northern Rockies (NR), Pacific Northwest (NW), and Alaska (AK; Figure 2.1a–b). Sequenced reads were mapped to a pseudo-reference genome of Downy Woodpecker (Jarvis et al. 2014), yielding an average sequencing depth of 5.1× (range 1.4–12.5×) for Downy Woodpecker and 4.5× (range 1.1–11.7×) for Hairy Woodpecker. A total of 16,736,465 and 15,463,356 single nucleotide polymorphisms (SNPs) were identified in the Downy and Hairy Woodpecker genomes, respectively, using the genotype likelihood approach implemented in ANGSD v0.917 (Korneliussen et al. 2014).

Figure 2.1. Geographic distribution of genetic variation and demographic history of the Downy (D. pubescens; top) and Hairy Woodpecker (D. villosus; bottom). (a–b) Results of the NGSadmix analysis for K = 2–4. Each bar indicates an individual's estimated ancestry proportion for each genetic cluster, represented by different colors. (c–d) Map indicating the current range of Downy and Hairy Woodpecker (green shading), sample localities, and their respective admixture proportions from NGSadmix (pie charts). (e–f) The best-fit demographic models from fastsimcoal2. The widths of the rectangles and arrows are scaled relative to the estimated effective population sizes in haploid individuals (Ne) and the migration rate (m) in fraction of haploid individuals per generation. Only migration rates > 10⁻⁷ × Ne migrants per generation are shown. Illustrations reproduced with permission from Lynx Edicions.


To assess patterns of genetic differentiation among these broadly distributed populations, we first performed a principal component analysis (PCA) on a subset of unlinked SNPs (linkage disequilibrium r² < 0.2). PCA revealed clear genetic structure separating samples from different localities, consistent with previous genetic studies (Figure 2.2a,c; Klicka et al. 2011; Graham and Burg 2012; Pulgarín-R and Burg 2012). In Downy Woodpecker, the first and second principal components (PC1 and PC2; 5.4% of variation explained jointly) separated populations from the East + Pacific Northwest (NE, SE, MW, and NW), the Rocky Mountains (SR and NR), and Alaska (AK; Figure 2.2a). The third principal component (PC3; 2.1% of variation explained) split the Pacific Northwest population (NW) from all others. In contrast, the first principal component in Hairy Woodpecker (PC1; 3.6% of variation explained) revealed a clear split between East + Alaska (NE, SE, MW, and AK) and West (SR, NR, and NW) populations. Consistent with these findings, NGSadmix (Skotte et al. 2013) supported four geographically congruent genetic clusters (K = 4) in both Downy and Hairy Woodpecker: East (NE, SE, and MW), Pacific Northwest (NW), Rocky Mountains (SR and NR), and Alaska (AK; Figure 2.1c–d). In agreement with previous genetic studies, the average genome-wide estimate of FST was larger in Hairy Woodpecker (average FST = 0.1; range 0.03–0.19) than in Downy Woodpecker (average FST = 0.08; range 0.03–0.16), indicating somewhat higher (but overlapping) levels of population differentiation. In both species, the largest FST values involved comparisons between Alaska and other populations (Downy: FST [AK vs. NR] = 0.16; Hairy: FST [AK vs. SR] = 0.19). Pairwise FST within the East and Rocky Mountain clusters showed the lowest values overall (Downy: FST = 0.03–0.06; Hairy: FST = 0.03–0.04), indicating greater genetic homogeneity within these groups.
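PCA and admixture models assume approximately independent markers. That is why the SNP set used here was thinned so that retained pairs have r² < 0.2. The following minimal Python sketch shows one common greedy way to do such thinning, using hard-called genotypes (0/1/2 copies of the alternative allele). The window size and function name are hypothetical. Low-coverage data such as these are usually handled with genotype likelihoods rather than hard calls, so this is a simplified illustration rather than the pipeline used here.

```python
import numpy as np

def ld_prune(genotypes, positions, r2_max=0.2, window_bp=50_000):
    """Greedy LD pruning along one chromosome.

    genotypes: (n_snps, n_individuals) array of 0/1/2 alt-allele counts,
               rows sorted by `positions`; SNPs must be polymorphic.
    Keeps a SNP only if its r^2 with every previously kept SNP closer than
    `window_bp` is below `r2_max`. Returns the indices of kept SNPs.
    """
    kept = []
    for i in range(genotypes.shape[0]):
        keep = True
        for j in reversed(kept):                 # nearest kept SNPs first
            if positions[i] - positions[j] > window_bp:
                break                            # earlier kept SNPs are even farther away
            r = np.corrcoef(genotypes[i], genotypes[j])[0, 1]
            if r * r >= r2_max:
                keep = False
                break
        if keep:
            kept.append(i)
    return kept

rng = np.random.default_rng(3)
g0 = rng.integers(0, 3, size=200)
g2 = rng.integers(0, 3, size=200)
# SNP 1 duplicates SNP 0 nearby; SNP 3 duplicates it too, but 1 Mb away
genotypes = np.vstack([g0, g0.copy(), g2, g0.copy()])
positions = np.array([0, 100, 200, 1_000_000])
print(ld_prune(genotypes, positions))  # [0, 2, 3]
```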


Figure 2.2. Population genetic structure in the Downy (top) and Hairy (bottom) Woodpecker. (a,c) Principal component analysis (PCA) of Downy and Hairy Woodpecker based on 71,228 and 71,763 unlinked genome-wide SNPs, respectively, with < 25% missing data and a minor allele frequency (maf) > 0.05. (b,d) Heatmap showing genome-wide pairwise FST values (left) and associated maximum likelihood tree based on the polymorphism-aware phylogenetic model (PoMo) in IQ-Tree 2. All nodes show 100% bootstrap support. Darker colors on the heatmap correspond to larger values of FST. Illustrations reproduced with permission from Lynx Edicions.

Because the expansion and contraction of glaciers were expected to affect the connectivity of populations across the landscape, we explored spatial patterns of gene flow using estimated effective migration surfaces (EEMS; Petkova et al. 2016). EEMS compares pairwise genetic dissimilarity among localities to identify geographic areas that deviate from the null expectation of isolation by distance (IBD). In both Downy and Hairy Woodpecker, we detected a pronounced reduction in effective migration near the Great Plains and along the Rocky Mountains, especially in their northern portion. In contrast, eastern North America showed a higher degree of connectivity than western North America (Figure 2.3). This finding suggests that major topographic features and variation in habitat availability contributed to the maintenance of population differentiation, despite high levels of gene flow.

Figure 2.3. Spatial patterns of gene flow. (a) Effective migration surface inferred by EEMS in Downy Woodpecker and (b) Hairy Woodpecker. Warmer colors indicate lower, and cooler colors higher, effective migration rates on a log scale relative to the overall migration rate across the species range. Triangles represent the grid chosen to assign sampling locations to discrete demes.


The broadly distributed Downy and Hairy Woodpeckers exhibited genome-wide levels of nucleotide diversity higher than those observed in most bird species (Ellegren 2013; Lamichhaney et al. 2015; Dutoit et al. 2017; Barton and Zeng 2019). Mean nucleotide diversity was slightly higher in Hairy (θπ = 0.0064; within populations = 0.0045–0.0065) than in Downy Woodpecker (θπ = 0.006; within populations = 0.0049–0.0061). Genetic diversity was lowest in Alaska (AK; Downy θπ = 0.0049, Hairy θπ = 0.0045) and highest in the Northern Rockies (NR; Downy θπ = 0.0061, Hairy θπ = 0.0065) and Southeast (SE; Downy θπ = 0.0059, Hairy θπ = 0.0062; Figure 2.4a). Regionally, genetic diversity in each population of Downy Woodpecker exceeded that of the corresponding Hairy Woodpecker population, with the exception of the Northern Rockies and Southeast (Figure 2.4a). Genome-wide values of Tajima's D (Tajima, 1989) were consistently negative across six populations of Downy and Hairy Woodpecker (Downy: -1.35 to -0.39; Hairy: -0.88 to -0.24; Figure 2.4b). Negative values of Tajima's D indicate an excess of low-frequency alleles and are suggestive of population expansion. In Alaska, however, genome-wide Tajima's D was positive (Downy: 0.19; Hairy: 0.42), indicating ongoing population contraction or very recent population expansion (Simonsen et al., 1995).
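Tajima's D contrasts two estimators of θ: the mean number of pairwise differences (θπ) and Watterson's estimator based on the number of segregating sites. The sketch below computes D from per-site allele counts in a sample of n chromosomes using the standard Tajima (1989) constants. It is a minimal illustration, not the pipeline behind the estimates above; the function name and input format are our own assumptions, and a genome scan would apply it per window or per population.

```python
import math

def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts in a sample of n chromosomes.

    allele_counts: count k of one allele at each site (derived or minor; the
    statistic is symmetric in k and n - k). Monomorphic sites are ignored.
    """
    seg = [k for k in allele_counts if 0 < k < n]
    S = len(seg)
    if S == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # theta_pi: average number of pairwise differences, summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    theta_w = S / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))

print(round(tajimas_d([1] * 5, n=10), 2))  # five singletons: negative D
print(round(tajimas_d([5] * 5, n=10), 2))  # five intermediate-frequency sites: positive D
```

With ten chromosomes, five singleton sites give D ≈ -1.74 (an excess of rare alleles, as expected after expansion), whereas five sites at intermediate frequency give a positive D.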


Figure 2.4. Characterization of genome-wide genetic variation in Downy and Hairy Woodpecker. (a) Genome-wide pairwise nucleotide diversity (θπ) per population. (b) Genome-wide Tajima's D per population. (c) Harmonic mean of effective population size (Ne) estimated over the past one million years with Stairway Plot 2 for all four genetic clusters. (d) Relationship between pairwise nucleotide diversity (θπ) and long-term effective population size (Ne).

2.3.2. Demographic history

We tested for signatures of Quaternary climatic oscillations on the population dynamics of Downy and Hairy Woodpecker by assessing changes in Ne over time and estimating demographic parameters. First, we employed Stairway Plot 2 (Liu and Fu 2020) to infer fluctuations in Ne over the past two million years. Stairway Plot 2 uses the site frequency spectrum (SFS) to fit a flexible multi-epoch model of changes in population size. For all demographic analyses, we used the folded SFS and specified a mutation rate of 2.42 × 10⁻⁹ mutations per site per generation, as estimated from non-coding regions of the Downy Woodpecker genome (Jarvis et al. 2014; Zhang et al. 2014), and a generation time of one year for both species (AnAge database; Tacutu et al., 2018). Changes in effective population size over time were generally consistent between Downy and Hairy Woodpecker and were characterized by recurrent bottlenecks followed by population expansion (Figure 2.5a–b). At around 1 mya, both Downy and Hairy Woodpecker had an Ne of nearly 500,000 individuals, and population sizes dropped between 1 mya and 500 kya to approximately 100,000 individuals.

This was followed by an episode of demographic expansion in which populations increased nearly 10-fold. A second population decline occurred around the onset of the Last Glacial Period (LGP; 115 kya), but the exact timing varied across populations. A final spike in Ne occurred during the LGP, between 115 kya and 22 kya, when populations in the East and Rocky Mountains grew more than 20-fold. In Alaska, a final population expansion occurred immediately after the Last Glacial Maximum (LGM; 22 kya), likely as a result of glacial retreat. As expected from θπ = 4Neμ, nucleotide diversity within each genetic cluster was highly correlated with the harmonic mean of Ne estimated from Stairway Plot 2 over the past million years (long-term Ne; linear regression: t = 4.876; R² = 0.76; p < 0.002; Figure 2.4d), indicating that these independent analyses were consistent.
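As a rough illustration of θπ = 4Neμ, the equilibrium expectation can be inverted to back-calculate Ne from the genome-wide θπ values and the mutation rate used here. This sketch assumes a neutral, diploid population at mutation-drift equilibrium, an assumption the fluctuating demographic history above clearly violates, so the result is only an order-of-magnitude summary; the helper name is hypothetical.

```python
# Hypothetical helper: inverts theta_pi = 4 * Ne * mu under neutral equilibrium.
def ne_from_theta(theta_pi: float, mu: float) -> float:
    return theta_pi / (4.0 * mu)

MU = 2.42e-9  # mutations per site per generation (value used in the text)

for species, theta_pi in [("Downy", 0.006), ("Hairy", 0.0064)]:
    print(f"{species}: Ne ~ {ne_from_theta(theta_pi, MU):,.0f}")
```

This gives roughly 6.2 × 10⁵ for Downy and 6.6 × 10⁵ for Hairy Woodpecker.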


Figure 2.5. Changes in effective population size (Ne) over time and linkage disequilibrium (LD) in Downy (top) and Hairy (bottom) Woodpecker. (a–b) Inferred history of effective population size of all four genetic clusters in Downy (a) and Hairy Woodpecker (b), obtained with Stairway Plot 2 using the folded SFS. For this analysis, we specified a mutation rate of 2.42 × 10⁻⁹ mutations per site per year. Both axes are on a log scale. Dotted lines represent 95% confidence intervals, and vertical lines mark the Last Glacial Period (LGP; 115 kya) and the Last Glacial Maximum (LGM; 22 kya). (c–d) Decay of linkage disequilibrium (LD) in all seven populations of Downy (c) and Hairy (d) Woodpecker.

To further elucidate the evolutionary relationships among populations of Hairy and Downy Woodpecker, we built a rooted maximum likelihood tree from genome-wide intergenic SNPs using the IQ-Tree 2 polymorphism-aware phylogenetic model (PoMo; Schrempf et al. 2016; Minh et al. 2020). PoMo incorporates polymorphic states into DNA substitution models, thereby accounting for incomplete lineage sorting among recently diverged populations. All nodes of the tree had 100% bootstrap support. In agreement with the previously described patterns of population structure and consistent with past genetic studies (Klicka et al. 2011; Graham and Burg 2012), the phylogenetic tree for Hairy Woodpecker showed two distinct clades: an East + Alaska clade and a West clade. The phylogenetic tree for Downy Woodpecker, however, revealed a different topology. First, the Pacific Northwest population (NW) was more closely related to the eastern clade than to the western clade, consistent with our PCA. Second, the Alaska (AK) population was sister to all other populations. Two hypotheses could explain this pattern: either (1) Alaska is a distinct clade that differentiated from the other Downy Woodpecker populations as a consequence of persistence in a separate glacial refugium in Beringia, as has been suggested for other North American taxa (Pruett and Winker 2008; Hewitt 2004; Brubaker et al. 2005), or (2) the topology of the Downy Woodpecker population tree reflects other factors, such as patterns of gene flow and geographic distance among localities, rather than the actual order of population splits. If the latter were the case, we would expect the relationships among populations to be better described by a polytomy than by a bifurcating tree. To test these hypotheses, we used the SFS-based method fastsimcoal2 v2.6.0.3 (Excoffier and Foll 2011) to estimate demographic parameters and evaluate the support for two alternative models: (1) a model in which all populations diverge synchronously from a single ancestral refugium and expand independently with asymmetric gene flow, and (2) a bifurcating model in which populations diverge at different times from multiple refugia (e.g., Beringia and East, or East and West) and expand independently with asymmetric gene flow, following the IQ-Tree 2 tree topology. For these analyses, we considered our four identified genetic clusters: East, Rocky Mountains, Pacific Northwest, and Alaska.
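The section does not state how support for the two fastsimcoal2 models was quantified. A common choice is the Akaike information criterion (AIC) with Akaike weights, and the sketch below assumes that approach. fastsimcoal2 reports maximum likelihoods on a log10 scale, so they have to be converted to natural-log units before computing AIC = 2k - 2 ln L. The model names, likelihoods, and parameter counts below are invented placeholders, not the chapter's results.

```python
import math

def aic_from_log10_lhood(log10_lhood: float, n_params: int) -> float:
    """AIC = 2k - 2 ln(L), converting a log10 likelihood to natural-log units."""
    return 2 * n_params - 2 * log10_lhood * math.log(10)

def akaike_weights(aics: dict) -> dict:
    """Relative support for each model, normalized to sum to 1."""
    best = min(aics.values())
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}

# Hypothetical placeholders: (max log10 likelihood, number of free parameters).
models = {
    "single_refugium": (-12345.6, 14),
    "bifurcating": (-12340.2, 17),
}
aics = {name: aic_from_log10_lhood(l, k) for name, (l, k) in models.items()}
weights = akaike_weights(aics)
print(weights)
```

Penalizing the extra parameters of the bifurcating model through k is what keeps a more complex topology from winning by default.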

Demographic analyses with fastsimcoal2 showed differing support for the alternative demographic models in Hairy and Downy Woodpecker. The best-supported model for Hairy Woodpecker was model 2 (Table 2.S2; Figure 2.1f), in which two ancestral populations (East and West) diverged from each other around 873 kya (95% CI = 848–926 kya; Table 2.S3). Around 425–508 kya, these two populations gave rise to the four genetic clusters, which underwent strong bottlenecks that reduced Ne to less than 15% of their ancestral size. The largest decline in Ne occurred in Alaska (2.4% of original size), followed by East (4.4%) and Pacific Northwest (4.9%). A final explosive expansion then occurred between 319 and 350 kya, when populations grew up to 12-fold. Rocky Mountains and East reached remarkably large Ne values of 47 million (95% CI = 20–94 million) and 18 million individuals (95% CI = 10–86 million), respectively. In contrast, Downy Woodpecker showed support for model 1, in which all populations diverge from a single major refugium (Table 2.S1; Figure 2.1e). This divergence occurred around 516 kya (95% CI = 241–910 kya; Table 2.S3) and was accompanied by a large bottleneck that reduced Ne to less than 10% of its original size in most populations. During the Mid-Pleistocene (251–383 kya), populations grew 2-fold in the Pacific Northwest and almost 50-fold in the Rocky Mountains. In Downy Woodpecker, expansion occurred first in the Rocky Mountains, around 383 kya (95% CI = 112–629 kya), reaching an Ne of 9.7 million individuals (95% CI = 1–35 million), and last in Alaska (251 kya; 95% CI = 114–599 kya), reaching an Ne of 1.9 million individuals (95% CI = 1–3 million). Overall, estimates of Ne from fastsimcoal2 confirmed the trends observed in Stairway Plot 2. We found large and variable levels of post-expansion gene flow across populations in both Downy and Hairy Woodpecker (Downy: 0–8 migrants per generation; Hairy: 0–11 migrants per generation). The largest migration rates were estimated from East and Rocky Mountains into the Pacific Northwest (Figure 2.1e–f). There were substantial levels of gene flow from all genetic clusters into Alaska (Downy: 4.8–6.17 migrants per generation; Hairy: 2.4–6.2 migrants per generation), consistent with a northward range expansion.

2.3.3. Genomic correlates of nucleotide diversity and differentiation

To elucidate the evolutionary processes shaping levels of genetic variation along the genome of Downy and Hairy Woodpecker, we investigated the correlation between regional levels of nucleotide diversity, measured across non-overlapping 100 kb windows, and three genomic features: recombination rate, gene density, and base composition. We found that nucleotide diversity varied widely along the genome (Downy θπ = 7.5 × 10⁻⁴ to 1.9 × 10⁻²; Hairy θπ = 1.1 × 10⁻³ to 2.2 × 10⁻²), but this variation was highly correlated between Downy and Hairy Woodpecker (Pearson's r = 0.9; p < 0.001; Figure 2.S1). To estimate recombination rates, we used ReLERNN (Adrion et al. 2020), a method that uses a machine-learning approach to infer per-base recombination rates. We found recombination rates to be highly correlated between the two species (Pearson's r = 0.66; p < 0.001).

Across the genome, we estimated a mean per-base recombination rate (r) of 1.47 × 10⁻⁹ c/bp (range 0–2.35 × 10⁻⁹) in Downy Woodpecker and 2.24 × 10⁻⁹ c/bp (range 2.94 × 10⁻¹⁰–2.35 × 10⁻⁹) in Hairy Woodpecker. Taking the average long-term Ne of the East population of Downy and Hairy Woodpecker as approximately 1.5 × 10⁶, these recombination rates correspond to population-scaled rates of ρ = 4Ner = 0.008 and 0.012, respectively. Mean recombination rates were 2–3-fold higher on the autosomes than on the sex-linked Z chromosome (Figure 2.S2–3), consistent with suppressed recombination on sex chromosomes (Sundström et al., 2004; Xu et al., 2019; Zhou et al., 2014). As a result of both high recombination rates and large Ne, linkage disequilibrium (LD) in Downy and Hairy Woodpecker decays very rapidly: LD drops to half of its initial level in less than 100 bp (Figure 2.5c–d). Consistent with this, average LD was greater in populations with smaller Ne or populations that have likely experienced a more recent founder event, such as Alaska and the Southern Rockies. We found a significant positive association between nucleotide diversity (θπ) and recombination rate in both species (linear regression – Downy: t = 47.67, R² = 0.165, p < 0.001; Hairy: t = 54.17, R² = 0.204, p < 0.001; Figure 2.6a–b). This association, however, is expected to some extent even if diversity is not truly correlated with recombination, because ReLERNN estimates recombination rates directly from Watterson's θ (θW).
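The population-scaled recombination rate follows directly from ρ = 4Ner. The sketch below plugs in the mean per-base rates and the approximate long-term Ne of the East population given above; the helper name is hypothetical, and the per-window scaling assumes the 100 kb windows used in this section. Using exactly 1.5 × 10⁶ yields about 0.0088 (Downy) and 0.0134 (Hairy) per bp, the same order as the rounded values reported, with the small differences reflecting the approximate Ne.

```python
def rho(ne: float, r_per_bp: float, length_bp: float = 1.0) -> float:
    """Population-scaled recombination rate rho = 4 * Ne * r over length_bp base pairs."""
    return 4.0 * ne * r_per_bp * length_bp

NE_EAST = 1.5e6  # approximate long-term Ne of the East population (from the text)
MEAN_R = {"Downy": 1.47e-9, "Hairy": 2.24e-9}  # crossovers per bp per generation

for species, r in MEAN_R.items():
    per_bp = rho(NE_EAST, r)
    per_window = rho(NE_EAST, r, length_bp=100_000)  # one 100 kb window
    print(f"{species}: rho/bp = {per_bp:.4f}, rho per 100 kb = {per_window:.0f}")
```

A ρ of several hundred per 100 kb window is what allows LD to decay within a few hundred base pairs or less.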

To further investigate the impact of linked selection on the genomic landscape of diversity, we also tested the prediction that regions of the genome with a higher density of targets of selection (i.e., genes) exhibit lower nucleotide diversity. Gene density was measured as the percentage of coding sequence in each 100 kb window. Our results revealed a weak but significant negative association between nucleotide diversity (θπ) and gene density (linear regression – Downy: t = -12.03, R² = 0.0123, p < 0.001; Hairy: t = -14.89, R² = 0.0189, p < 0.001; Figure 2.6c–d). This association was not driven by collinearity between gene density and recombination rate, because their correlation was positive and negligible (Downy: Pearson's r = 0.045; Hairy: Pearson's r = 0.032). We also found that regions with high GC content tended to show higher nucleotide diversity (linear regression – Downy: t = 36.37, R² = 0.0123, p < 0.001; Hairy: t = 44.16, R² = 0.145, p < 0.001; Figure 2.6e–f). GC content, however, was positively correlated with gene density in both species (Downy: Pearson's r = 0.25; p < 0.001; Hairy: Pearson's r = 0.25; p < 0.001; Figure 2.S4–5) and weakly correlated with recombination rate in Hairy Woodpecker (Pearson's r = 0.064; p < 0.001; Figure 2.S4–5). We then performed a principal component regression (PCR) to separate the effects of individual explanatory variables while controlling for multicollinearity among predictors. Principal component regression summarizes the predictors into orthogonal components (PCs) and uses these components as predictors in a linear regression. Our results revealed that PC2, which represented almost exclusively recombination rate (Table 2.S4), uniquely explained 12.3% and 18.6% of the variation in nucleotide diversity in Downy and Hairy Woodpecker, respectively (PC2 linear regression – Downy: t = 40.14, R² = 0.123; Hairy: t = 51.1, R² = 0.186). Both PC1 and PC3 represented the correlation between gene density and GC content, but PC3 had a much stronger effect (Table 2.S4), accounting for 15.5% and 14.4% of the variation in nucleotide diversity in Downy and Hairy Woodpecker, respectively (PC3 linear regression – Downy: t = 45.92, R² = 0.155; Hairy: t = 43.97, R² = 0.144). Because gene density and GC content contributed equally to PC3 (Table 2.S4), we were unable to differentiate their relative contributions to this relationship. Regardless, our analyses confirm the central role that these genomic properties played in shaping patterns of nucleotide diversity along the genome.
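The principal component regression described above can be sketched in a few lines of NumPy. This version standardizes the predictors, obtains orthogonal PC scores by singular value decomposition, and regresses the response on each PC separately. Because the PC scores are mutually orthogonal, the per-PC R² values add up to the R² of the full multiple regression, which is what lets each component's contribution be read off on its own. The standardization step, the function name, and the simulated window data are assumptions for illustration; the chapter's exact settings may differ.

```python
import numpy as np

def pcr_per_component(X: np.ndarray, y: np.ndarray):
    """Return PC loadings (columns) and the R^2 of y regressed on each PC alone."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each predictor
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt.T                                # predictor weights per PC
    scores = Z @ loadings                          # orthogonal, centered PC scores
    yc = y - y.mean()
    r2 = []
    for j in range(scores.shape[1]):
        pc = scores[:, j]
        slope = (pc @ yc) / (pc @ pc)              # OLS slope; intercept absorbed by centering
        resid = yc - slope * pc
        r2.append(1.0 - (resid @ resid) / (yc @ yc))
    return loadings, np.array(r2)

# Simulated 100 kb windows (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
n = 5000
recomb = rng.normal(size=n)
gene_density = rng.normal(size=n)
gc = 0.6 * gene_density + rng.normal(scale=0.8, size=n)  # GC tracks gene density
theta = 0.4 * recomb - 0.1 * gene_density + 0.2 * gc + rng.normal(size=n)

X = np.column_stack([recomb, gene_density, gc])
loadings, r2 = pcr_per_component(X, theta)
print(np.round(loadings, 2))
print(np.round(r2, 3))
```

When two predictors load equally on the same component, as gene density and GC content do on PC3, the per-PC R² cannot apportion that component's effect between them.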

The effect of linked selection is expected to be weaker in populations that underwent more severe bottlenecks, owing to their smaller long-term Ne, than in stable populations that maintained large Ne (Kirkpatrick and Jarne 2000; Charlesworth 2009). We tested this prediction by quantifying the strength of the correlation between nucleotide diversity (θπ) and gene density in each of the four genetic clusters of Downy and Hairy Woodpecker, which showed varied demographic responses to the Pleistocene glaciations. We found that long-term Ne predicted the strength of the correlation between genetic diversity and the density of targets of selection (Table 2.1). Alaska, for example, showed the weakest correlation (Downy: Pearson's r = -0.1008, t = -10.8, p < 0.001; Hairy: Pearson's r = -0.1083, t = -11.6, p < 0.001), whereas the Rocky Mountains showed the strongest (Downy: Pearson's r = -0.1106, t = -11.9, p < 0.001; Hairy: Pearson's r = -0.1351, t = -14.5, p < 0.001). These results suggest that varied demographic trajectories during the Ice Age affected the efficacy of natural selection, likely through differences in the strength of genetic drift.

Table 2.1. Strength of correlation between nucleotide diversity (θπ) and gene density across the four genetic clusters of Downy and Hairy Woodpecker.

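| Population | Downy Woodpecker: Pearson's r | Downy Woodpecker: t-value | Hairy Woodpecker: Pearson's r | Hairy Woodpecker: t-value |
|---|---|---|---|---|
| AK | -0.1008* | -10.867 | -0.1083* | -11.643 |
| NW | -0.1007* | -10.847 | -0.1384* | -12.966 |
| E | -0.1077* | -11.618 | -0.1215* | -13.084 |
| R | -0.1106* | -11.927 | -0.1351* | -14.571 |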

* p < 0.001.
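Table 2.1 reports both Pearson's r and its t-value. For a test of r against zero with n paired observations (here n would be the number of 100 kb windows, which is not stated in this section), the two are linked by t = r√((n - 2)/(1 - r²)). The helper functions below, whose names are our own, convert between the two and back out the implied n, which can serve as a quick internal-consistency check on a table like this one.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given Pearson's r and n paired observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def r_from_t(t: float, n: int) -> float:
    """Inverse of t_from_r (keeps the sign of t)."""
    return t / math.sqrt(t * t + n - 2)

def implied_n(r: float, t: float) -> float:
    """Sample size implied by a reported (r, t) pair."""
    return t * t * (1.0 - r * r) / (r * r) + 2

# Example with the Alaska Downy Woodpecker entry of Table 2.1.
print(round(implied_n(-0.1008, -10.867)))
```

Rows whose implied n departs markedly from the others would be worth rechecking against the underlying output.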


Figure 2.6. Genomic predictors of nucleotide diversity in Downy (left) and Hairy (right) Woodpecker. Association between nucleotide diversity (θπ) and three features of the genome: (a–b) recombination rate (Downy: t = 47.67, p < 0.001; Hairy: t = 54.17, p < 0.001), (c–d) gene density (Downy: t = -12.03, p < 0.001; Hairy: t = -14.89, p < 0.001), and (e–f) GC content (Downy: t = 36.37, p < 0.001; Hairy: t = 44.16, p < 0.001). Each point in the scatter plots represents a 100 kb window of the genome. Colors indicate the density of points.


Because genomic properties are also expected to affect levels of population differentiation across the genome, we also tested the association between average intraspecific population differentiation (FST) and both nucleotide diversity and recombination rate across non-overlapping 100 kb windows. For each window, we calculated FST between each pair of populations and summarized the global FST landscape using two approaches: (1) the average FST across all population pairs; and (2) the first principal component (PC1) of pairwise FST, which captured the largest share of its variation (Downy: 37.51% of variance explained; Hairy: 47.5%). The summaries produced by these two approaches were highly correlated (Downy: Pearson's r = 0.97; p < 0.001; Hairy: Pearson's r = 0.98; p < 0.001), so for simplicity we only considered the average FST.

There was considerable variation in FST along the genome (Downy: FST = 0.01–0.25; Hairy: FST = 0.01–0.32), indicating high variability in patterns of population differentiation. We found a weak but significant negative association between average FST and nucleotide diversity, suggesting that areas of the genome that show elevated differentiation tend to be characterized by reduced diversity (linear regression – Downy: t = -19.12, R² = 0.03; p < 0.001; Hairy: t = -53.49, R² = 0.2; p < 0.001; Figure 2.7). We also found a weak negative association between average FST and recombination rate, indicating higher differentiation in regions of low recombination (linear regression – Downy: t = -32.18, R² = 0.08; p < 0.001; Hairy: t = -41.55, R² = 0.13; p < 0.001).
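The two ways of summarizing the multi-population FST landscape described above can be sketched with NumPy: the per-window mean across population pairs, and the per-window score on the first principal component of the windows × pairs matrix. This assumes PCA on centered (not standardized) FST values, and the simulated matrix is hypothetical; the number of population pairs and the FST estimator used in the chapter are not specified here. Because the sign of a principal component is arbitrary, the sketch orients PC1 to increase with mean FST before comparing the two.

```python
import numpy as np

def summarize_fst(fst: np.ndarray):
    """fst: (n_windows, n_pairs) matrix of pairwise FST per window.

    Returns per-window mean FST, per-window PC1 score, and PC1's share of variance.
    """
    mean_fst = fst.mean(axis=1)
    centered = fst - fst.mean(axis=0)              # center each population pair
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]
    if np.corrcoef(pc1, mean_fst)[0, 1] < 0:       # SVD sign is arbitrary
        pc1 = -pc1
    var_explained = s[0] ** 2 / np.sum(s ** 2)
    return mean_fst, pc1, var_explained

# Hypothetical data: 2,000 windows x 6 population pairs sharing a common signal.
rng = np.random.default_rng(7)
shared = rng.uniform(0.01, 0.25, size=2000)
scale = np.array([0.8, 1.0, 1.2, 0.9, 1.1, 1.3])
fst = np.clip(shared[:, None] * scale + rng.normal(0, 0.01, size=(2000, 6)), 0, 1)

mean_fst, pc1, ve = summarize_fst(fst)
print(f"PC1 variance explained: {ve:.2f}")
print(f"r(mean FST, PC1): {np.corrcoef(mean_fst, pc1)[0, 1]:.3f}")
```

When most pairs share a common differentiation signal, PC1 and the simple mean carry nearly the same information, which is the situation that justifies keeping only the mean.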


Figure 2.7. Landscape of diversity and differentiation along chromosome 2 of Downy (a) and Hairy (b) Woodpecker. The top plot shows the average pairwise FST calculated across non-overlapping 100 kb windows. The middle plot shows the recombination rate in c/bp (red) and nucleotide diversity (θπ; blue) for each non-overlapping 100 kb window. The bottom plot represents the percentage of coding sequence in each non-overlapping 100 kb window. Illustrations reproduced with permission from Lynx Edicions.

2.3.4. Genetic load and the efficacy of selection

To further explore the magnitude of linked selection in the genome of Downy and Hairy Woodpecker, we classified each variant according to its functional impact as predicted by the gene annotation. The majority of SNPs identified in Downy and Hairy Woodpecker were classified as modifiers (Downy: 99.35%; Hairy: 99.13%), which are variants in intergenic or intronic regions whose impacts are hard to determine but tend to be neutral or nearly neutral. Low-impact variants (i.e., synonymous mutations) made up 0.46% and 0.64% of SNPs in Downy and Hairy Woodpecker, respectively. Moderate-impact variants, which change the amino acid sequence (i.e., nonsynonymous mutations), represented 0.17% and 0.22% of SNPs in Downy and Hairy Woodpecker, respectively. Finally, only 0.006% and 0.007% of SNPs were classified as high impact in Downy and Hairy Woodpecker, respectively. These variants correspond to mutations that cause loss of function, such as the loss or gain of a start or stop codon, and are therefore expected to occur at very low frequencies.

We investigated differences in the burden of deleterious alleles carried by populations of Downy and Hairy Woodpecker that could reflect differences in the efficacy of purifying selection. For this analysis, we focused on sites that were polymorphic in at least one of the two species and whose ancestral states could be determined unambiguously. The frequency distribution of mutations with moderate and high impact was shifted toward lower frequencies relative to that of low-impact mutations (Figure 2.8a–b), indicating that purifying selection has been effective in purging highly deleterious mutations. Hairy Woodpecker, however, showed a larger excess of low-frequency, high-impact mutations than Downy Woodpecker (Figure 2.8a–b), suggesting that purifying selection might have been more efficient in Hairy Woodpecker.

To further investigate whether the efficacy of purifying selection varied across populations with different demographic trajectories, we estimated the genetic load of each individual as the ratio of the count of homozygous derived alleles of high impact (i.e., highly deleterious) to the count of homozygous derived alleles of low impact (i.e., synonymous). This metric is a proxy for the genetic load under a recessive model that controls for underlying population differences in the neutral SFS (Simons et al. 2014; Simons and Sella 2016). We also computed the same metric under an additive model, in which a single copy of the derived allele has fitness consequences. The recessive deleterious load was overall larger in Downy Woodpecker than in Hairy Woodpecker, but this difference was not statistically significant (Kruskal-Wallis χ² = 1.33, df = 1, p = 0.24; Figure 2.8c–d). We also found that the recessive deleterious load was much larger in the Rocky Mountains than in other populations. Alaska also showed elevated recessive deleterious load in both species, generally larger than in the East and Pacific Northwest (Figure 2.8c–d). Overall, these findings do not support the prediction that populations with lower long-term Ne, as a consequence of stronger bottlenecks, exhibit a higher deleterious load.
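The two load proxies described above can be computed per individual from derived-allele genotypes (0, 1, or 2 copies) and impact classes. The sketch below assumes impact labels "HIGH" and "LOW" (hypothetical names modelled on the impact categories used here) and, for the additive model, counts derived alleles rather than variant sites; the chapter's exact counting may differ. Sites of other impact classes are ignored, and a ratio is returned as NaN when its denominator is zero.

```python
def load_ratios(genotypes, impacts):
    """Per-individual deleterious load proxies.

    genotypes: derived-allele counts (0, 1, 2) for one individual, one per site.
    impacts: impact label for each site ("HIGH", "LOW", or other classes).
    Returns (recessive, additive):
      recessive = homozygous-derived HIGH sites / homozygous-derived LOW sites
      additive  = derived HIGH alleles / derived LOW alleles
    """
    hom = {"HIGH": 0, "LOW": 0}
    alleles = {"HIGH": 0, "LOW": 0}
    for g, imp in zip(genotypes, impacts):
        if imp not in hom:
            continue  # MODERATE / MODIFIER sites do not enter these two ratios
        alleles[imp] += g
        if g == 2:
            hom[imp] += 1
    recessive = hom["HIGH"] / hom["LOW"] if hom["LOW"] else float("nan")
    additive = alleles["HIGH"] / alleles["LOW"] if alleles["LOW"] else float("nan")
    return recessive, additive

genotypes = [2, 1, 0, 2, 2, 1]
impacts = ["HIGH", "HIGH", "LOW", "LOW", "LOW", "MODERATE"]
print(load_ratios(genotypes, impacts))  # (0.5, 0.75)
```

Dividing by the synonymous (LOW) count is what normalizes each individual's deleterious burden against differences in the neutral SFS between populations.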


Figure 2.8. Deleterious load in Downy and Hairy Woodpecker. (a) Site frequency spectrum (SFS) for variants with low (neutral), moderate (mild), and high (deleterious) impact in Downy Woodpecker and (b) Hairy Woodpecker. (c) Ratio of homozygous derived variants of high impact (deleterious) to homozygous derived variants of low impact (neutral) in each genetic cluster and species (recessive model). (d) Ratio of the total number of derived variants of high impact (deleterious) to the total number of derived variants of low impact (neutral) in each genetic cluster and species (additive model). Horizontal bars denote population medians.

Lastly, we investigated the overall impact of natural selection on the protein-coding sequences of Downy and Hairy Woodpecker. We calculated the ratio of nonsynonymous to synonymous substitution rates (dN/dS) along the branches leading to Downy and Hairy Woodpecker using a set of 397 high-quality orthologous genes distributed throughout the genome. The dN/dS ratio was higher in Downy Woodpecker (dN/dS = 0.065) than in Hairy Woodpecker (dN/dS = 0.053), suggesting that purifying selection might have been weaker in Downy Woodpecker over deeper evolutionary time (>4Ne generations ago; Elyashiv et al., 2010; Figuet et al., 2016; Herrera-Álvarez et al., 2020).
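dN/dS divides the rate of nonsynonymous substitution per nonsynonymous site by the rate of synonymous substitution per synonymous site, so values well below 1, such as those above, indicate purifying selection. The sketch below is a simple counting estimate with a Jukes-Cantor correction, in the spirit of the Nei-Gojobori approach. It is not necessarily the method behind the branch-specific estimates above, which are typically obtained with codon-based likelihood models, and the site and difference counts in the example are invented.

```python
import math

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from the proportion of differing sites (p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dN/dS: corrected nonsynonymous distance over corrected synonymous distance."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    return dn / ds

# Hypothetical counts for one gene alignment (not data from this chapter).
print(round(dn_ds(10, 3000, 50, 1000), 3))
```

Normalizing by the number of sites in each class matters because most coding positions are nonsynonymous, so raw substitution counts alone would overstate dN.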

2.4. Discussion

Our genomic analyses reveal that both demography and linked selection played a significant role in shaping patterns of diversity and differentiation across populations and along the genomes of Downy and Hairy Woodpecker. We found that genome-wide nucleotide diversity, as well as the landscape of recombination, is highly correlated between these two species, which diverged more than 8 mya. This correlation suggests that intrinsic properties of the genome might be conserved across deep evolutionary time. We posit that linked selection might underlie the observed genomic heterogeneity, as indicated by a significant association between nucleotide diversity, recombination rate, and gene density. Despite strong fluctuations in Ne over the Pleistocene, Downy and Hairy Woodpecker maintained very large population sizes, which might have facilitated the action of natural selection. Nevertheless, given the large differences in long-term Ne observed among populations, our results indicate variation in the efficacy of selection.

2.4.1. Conserved properties of the genome underlie the correlated genomic landscape of Hairy and Downy Woodpecker

We recovered large heterogeneity in patterns of nucleotide diversity (θπ) and FST along the genomes of Downy and Hairy Woodpecker. Despite this variation, our results revealed a highly correlated genomic landscape between the two species. Such covariation in genome-wide measures of diversity and differentiation across distantly related species is fairly common (Renaut et al. 2013; Burri et al. 2015; Dutoit et al. 2017; Van Doren et al. 2017; Delmore et al. 2018; Stankowski et al. 2019) and suggests that properties of the genome, such as mutation rate, recombination rate, and density of targets of selection, are conserved across deep evolutionary time (Dutoit et al. 2017). For example, bird genomes are known for their high karyotypic stability, with very few chromosomal rearrangements and high synteny across highly divergent species (Ellegren 2010; Volker et al. 2010; Ellegren 2013; Singhal et al. 2015). Features of the genome such as recombination rate and GC content might also be conserved across species. We found that estimates of recombination rate are highly correlated between Downy and Hairy Woodpecker, although slightly higher in Hairy Woodpecker. Linkage disequilibrium (LD), which is a function of both recombination rate and Ne, decayed over extremely short distances in Downy and Hairy Woodpecker. Whereas LD extends over thousands of base pairs in humans (Reich et al. 2001; Ardlie et al. 2002), for instance, it drops to half its initial level within about 100 bp in Downy and Hairy Woodpecker. Similar patterns have been observed in other bird species with very large Ne (Balakrishnan and Edwards 2009; Kardos et al. 2016). We also found large variation in recombination rate both within and among chromosomes, with the Z chromosome showing the lowest rates. Given the lack of recombination across much of the Z chromosome in female birds (the heterogametic sex; ZW), crossing-over at the population level occurs at a much lower rate on the sex chromosomes than on the autosomes (Sundström et al. 2004; Wilson Sayres 2018; Irwin 2018). As in Downy and Hairy Woodpecker, recombination in the chicken (Gallus gallus) is approximately 2.5 times lower on the Z chromosome than on the autosomes (Levin et al., 1993; Schmid et al., 2000). As a consequence, many bird species show reduced diversity and faster divergence on the Z chromosome (Sundström et al. 2004; Borge et al. 2005; Mank et al. 2007; Balakrishnan and Edwards 2009; Zhang et al. 2014).

2.4.2. The interplay between natural selection and recombination produces a heterogeneous genomic landscape

One of the main mechanisms proposed to explain the substantial heterogeneity in levels of polymorphism along the genome is linked selection (Charlesworth et al. 1993; Maynard Smith and Haigh 2007; Cutter and Payseur 2013). Both positive selection (i.e., in favor of a beneficial allele) and negative selection (i.e., against a deleterious allele) are expected to reduce diversity around functional elements (Maynard Smith and Haigh 2007; Charlesworth et al. 1993). This reduction extends to all neighboring sites that happen to be linked to the target of selection (the hitchhiking effect; Maynard Smith and Haigh 2007). The extent to which adjacent sites are affected by linked selection depends on the recombination landscape, such that regions with lower recombination rates tend to show lower genetic diversity and vice versa (Begun and Aquadro 1992; Mugal et al. 2013; Wang et al. 2016). Similarly, the higher the density of functional elements (i.e., targets of selection), the more severe the reduction in genetic diversity due to recurrent selection (Andolfatto 2007; Branca et al. 2011; Gossmann et al. 2011; Beissinger et al. 2015). A correlation between genetic diversity, recombination rate, and gene density has therefore been interpreted as strong evidence of selection acting on linked neutral sites and can be used to assess the magnitude of linked selection (Corbett-Detig et al. 2015; Cutter and Payseur 2013). We found strong evidence that linked selection has contributed to shaping patterns of genetic diversity along the genomes of Downy and Hairy Woodpecker. First, nucleotide diversity (θπ) was positively associated with recombination rate in both species. Second, there was a weak but highly significant negative association between nucleotide diversity (θπ) and gene density. Third, as predicted by theory, the strength of the association between nucleotide diversity (θπ) and gene density varied with long-term Ne, such that larger populations showed more pronounced signatures of linked selection.

We observed an association between nucleotide diversity (θπ) and GC content in Downy and Hairy Woodpecker. The frequency of GC nucleotides is known to be considerably higher in coding than in noncoding sequence (Talla et al., 2019). Accordingly, we found a significant correlation between GC content and gene density, but our principal component regression failed to disentangle the effects of these two variables on patterns of nucleotide diversity. We also found a weak correlation between GC content and recombination rate in Hairy Woodpecker. A mechanism widely invoked to explain this correlation is GC-biased gene conversion (gBGC), a process whereby AT/GC heterozygotes are more likely to transmit GC nucleotides to descendants during meiotic recombination (Duret and Galtier 2009; Mugal et al. 2015). This mechanism mimics selection favoring GC and is tightly linked to variation in recombination rate, so that high GC content is expected in regions of high recombination. In birds, gBGC is an important factor shaping GC content along the genome and is positively correlated with divergence at neutral sites (Webster et al. 2006; Nabholz et al. 2011; Weber et al. 2014; Bolívar et al. 2019).

Natural selection is also expected to affect levels of genetic differentiation along the genome (Cruickshank and Hahn 2014; Matthey-Doret and Whitlock 2019; Stankowski et al. 2019). We estimated a weak but significant negative association between nucleotide diversity (θπ) and the average pairwise FST, indicating that regions of the genome that are highly differentiated between populations tend to show reduced diversity. In favor of this scenario, we also found that average pairwise FST is negatively correlated with recombination rate. These correlations are consistent with linked selection continuously eroding diversity near targets of selection (especially in regions of low recombination), which inflates local levels of population differentiation (Cruickshank & Hahn, 2014). Because beneficial mutations are expected to arise only rarely, background selection against deleterious alleles is the most likely selective mechanism underlying the correlation between FST, nucleotide diversity, and recombination rate (Vijay et al. 2017; Matthey-Doret and Whitlock 2019). These findings suggest that population-specific selection associated with local adaptation (i.e., divergent selection) is not necessary to produce a correlated genomic landscape. Comparative analyses across both distantly and closely related bird species demonstrate that linked selection can reduce genetic diversity prior to population splits and consequently produce parallel patterns of genetic differentiation in regions of low recombination (Burri et al. 2015; Irwin et al. 2016; Vijay et al. 2017; Delmore et al. 2018).

2.4.3. Dynamic population demography characterizes the evolution of Hairy and Downy Woodpecker in the Pleistocene

We found that population structure was spatially congruent between Downy and Hairy Woodpecker, a pattern likely driven by drift and varying gene flow regimes across the landscape. Both species are characterized by four genetic clusters that are consistent with previous phylogeographic studies: East, Alaska, Rocky Mountains, and Pacific Northwest (Klicka et al. 2011; Graham and Burg 2012; Pulgarín-R and Burg 2012). Genetic structure in Hairy Woodpecker shows a clear east-west subdivision, which is estimated to have occurred in the Mid-Pleistocene transition (848–926 kya), when glacial-interglacial cycles increased in length and intensity (Willeit et al., 2019). An east-west split is a common biogeographic pattern observed in widely distributed North American birds (Zink 1996; Manthey et al. 2011; Walstrom et al. 2012; Smith et al. 2017; Aguillon et al. 2018). Our demographic analyses supported the existence of at least two glacial refugia that isolated populations of Hairy Woodpecker on either side of North America and gave rise to the four genetic clusters. Previous paleoclimate modelling supports multiple southern refugia during the Last Glacial Maximum (LGM; Klicka et al. 2011; Graham and Burg 2012).

Despite geographic congruence, genetic structure in Downy Woodpecker shows a few differences. First, we found genome-wide population differentiation to be higher in Hairy Woodpecker (average FST = 0.1; 0.03–0.19) than in Downy Woodpecker (average FST = 0.08; 0.03–0.16). These results agree with previous genetic studies using a smaller number of loci, which reported very shallow population differentiation in Downy Woodpecker (Pulgarín-R and Burg 2012; Ball and Avise 1992). However, genetic diversity and structure appear to be much lower in the mtDNA of Downy Woodpecker than in its nuclear genome (Pulgarín-R and Burg 2012; Ball and Avise 1992). Such discrepancies may reflect inherent differences in Ne between the mitochondrial and nuclear genomes, a possible selective sweep that could have wiped out diversity from the mtDNA, or sex-biased dispersal (Toews and Brelsford 2012). In fact, females of Downy Woodpecker have a higher tendency for long-distance dispersal than males, which could in part explain the homogeneity of the mitochondrial genome (Browning, 1995). Regardless, the elevated FST in Hairy Woodpecker compared to Downy Woodpecker indicates that gene flow in Hairy Woodpecker might be more restricted. Second, the clear east-west subdivision observed in Hairy Woodpecker was not seen in Downy Woodpecker. Our phylogenetic tree shows that Alaska was the first population to diverge from the clade containing all other populations, followed by the Rocky Mountains. Such a topology may arise if Alaska contained suitable habitat for populations to persist through glacial cycles, which seems to have been the case for several boreal species (Hewitt 2004; Brubaker et al. 2005; Pruett and Winker 2008). Nevertheless, the low genetic diversity and signature of recent population expansion in Alaska favor a scenario of colonization (Pulgarín-R and Burg 2012). Moreover, we found support for a model in which all daughter populations of Downy Woodpecker arose simultaneously from a single ancestral population (i.e., a polytomy). Under this scenario, the elevated differentiation of Alaska could be due to its greater geographic distance from other populations.

Our demographic analyses reveal a dynamic population history for Hairy and Downy Woodpecker during the Ice Age. Both species underwent repeated cycles of population contraction and expansion, consistent with the climatic fluctuations of the Pleistocene. Two main episodes of bottleneck followed by expansion can be detected in our dataset. The first occurred in the Late-Mid Pleistocene, between 1 mya and 500 kya, when range contraction and persistent isolation in glacial refugia likely contributed to population differentiation. The second occurred during the Last Glacial Period (LGP; 115–12 kya), when populations underwent a strong decline followed by more than 20-fold growth. The timing and magnitude of these changes differed across geographic regions. Despite strong variation in Ne over the past million years, our data indicate that Downy and Hairy Woodpecker have been resilient enough to maintain relatively large populations, which favored the maintenance of very high genetic diversity, even in the face of repeated bottlenecks. Our estimates of Ne of more than 20 million individuals in certain populations far surpass current estimates of census population size in the United States and Canada, which are approximately 13 million individuals of Downy Woodpecker and 8.5 million individuals of Hairy Woodpecker (North American Breeding Bird Survey; Sauer et al. 2017). This discrepancy suggests that current populations might still be growing and have not yet reached their past peak sizes. It is worth noting, however, that these estimates are critically dependent on the choice of mutation rate. Other estimates of mutation rate in birds are larger than the one used here (Smeds et al. 2016; Hruska and Manthey 2021), which would imply smaller estimates of Ne and more recent divergence times.
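The dependence on mutation rate follows from the fact that SFS-based methods estimate a population-scaled quantity rather than Ne itself. As a sketch of the standard diploid relationships (not a re-derivation of the Stairway Plot 2 or fastsimcoal2 outputs), with θ the population mutation rate per site, τ a time expressed in expected mutations per site (τ = μt, with t in generations), and g the generation time in years:

```latex
\hat{N}_e = \frac{\hat{\theta}}{4\mu}, \qquad
\hat{t}_{\text{years}} = \frac{\hat{\tau}}{\mu}\, g
\quad\Longrightarrow\quad
\frac{\hat{N}_e(\mu')}{\hat{N}_e(\mu)} = \frac{\hat{t}_{\text{years}}(\mu')}{\hat{t}_{\text{years}}(\mu)} = \frac{\mu}{\mu'}
```

For a fixed θ and τ, replacing the assumed rate μ = 2.42 × 10⁻⁹ with a rate twice as large would therefore halve both the Ne estimates and the inferred times.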


Consistent with theoretical predictions, nucleotide diversity within populations was strongly correlated with long-term Ne. For example, Alaska showed the lowest genome-wide genetic diversity, likely as a consequence of being one of the last areas to be deglaciated and the most recently founded. On the other hand, populations in eastern North America (e.g., MW, SE, and NE) showed high levels of genetic diversity, consistent with their large population sizes. In both focal species, the Northern Rockies exhibited the largest nucleotide diversity and long-term Ne. Data from multiple sources support the existence of a temporally fluctuating ice-free corridor along the Canadian Rocky Mountains that might have functioned as a glacial refugium (Jackson 1979; Rutter 1984; Shafer et al. 2010; Pedersen et al. 2016). Thus, it is possible that suitable habitat allowed rapid growth and persistence of large populations in the Northern Rockies during the glacial periods of the Pleistocene (Loehr et al. 2006; Shafer et al. 2010; Pulgarín-R and Burg 2012).

2.4.4. The efficacy of linked selection was affected by different evolutionary trajectories of Downy and Hairy Woodpecker

We investigated whether differences in the demographic trajectories of populations of Downy and Hairy Woodpecker in response to the Pleistocene glaciation had an impact on the efficacy of natural selection across the genome. Given that purifying selection is more efficient in larger populations (Ohta, 1973), we hypothesized that populations that underwent a stronger bottleneck or maintained lower levels of Ne were more likely to have accumulated highly deleterious mutations (i.e., genetic load; Henn et al. 2016; Willi et al. 2018; Wang et al. 2018; Rougemont et al. 2020; de Pedro et al. 2021). We failed to find support for this prediction. In contrast to our expectations, we found that the Rocky Mountains, the genetic cluster with the largest long-term Ne, exhibited the largest genetic load in both species. One possible explanation for this finding is that highly deleterious alleles might have been more efficiently purged from populations that went through more severe bottlenecks due to higher inbreeding (Kirkpatrick and Jarne 2000). For example, species whose populations underwent extreme bottlenecks, such as the island foxes (Robinson et al., 2018), the mountain gorillas (Xue et al., 2015), and the alpine ibex (Grossen et al., 2020), show fewer mutations of high impact because extensive inbreeding made highly deleterious alleles more likely to be exposed in homozygosity. This is not the case for Downy and Hairy Woodpecker, which despite repeated bottlenecks still managed to maintain considerably large population sizes, making inbreeding very unlikely. Besides, we found that Alaska, the population with the lowest long-term Ne, does not carry the fewest highly deleterious alleles, as the “purging under inbreeding” scenario would predict. Instead, it carries a larger load than the East and the Pacific Northwest, which are populations with higher long-term Ne.

At the species level, however, we found that genetic load was generally larger in Downy Woodpecker than in Hairy Woodpecker, which is consistent with more efficient purifying selection in Hairy Woodpecker. This finding makes sense considering that Hairy Woodpecker exhibits slightly larger Ne than Downy Woodpecker. Supporting this observation, we also found a larger excess of highly deleterious mutations at low frequencies in Hairy Woodpecker, indicating that deleterious alleles were less likely to rise to high frequencies in Hairy Woodpecker than in Downy Woodpecker because of more efficient selection. Lastly, we observed that the genome-wide ratio of nonsynonymous to synonymous substitutions (dN/dS) was higher in Downy Woodpecker than in Hairy Woodpecker. An elevated genome-wide, as opposed to gene-specific, dN/dS ratio is suggestive of reduced efficacy of purifying selection, because adaptive substitutions are expected to occur less often (Elyashiv et al., 2010; Figuet et al., 2016). This result indicates that a smaller Ne in the lineage leading to Downy Woodpecker might have allowed more fixation of slightly deleterious alleles.

2.5. Conclusion

In conclusion, we investigated the impact of demography and natural selection on the genomic landscape of two co-distributed woodpecker species whose population histories have been profoundly impacted by the Ice Age. We found that despite a dynamic demographic history, Downy and Hairy Woodpecker were able to maintain very large Ne even during glacial periods, which might have facilitated the action of natural selection. Supporting this conclusion, our results reveal a correlation between nucleotide diversity, recombination rate, and gene density, which suggests that linked selection shaped the genomic landscape. In addition, we found that the magnitude of linked selection was associated with population-specific Ne trajectories, indicating that demography and natural selection operated in concert to shape patterns of polymorphism along the genome. This study adds to the growing body of literature supporting the role of natural selection in driving patterns of genome-wide variation, but highlights the difficulty of interpreting the outcome of the interplay between genetic drift and natural selection in organisms with non-equilibrium demographic dynamics and large effective population sizes.

2.6. Material and Methods

2.6.1. Sample collection and whole genome sequencing

We collected 70 samples each of Downy Woodpecker (D. pubescens) and Hairy Woodpecker (D. villosus) from seven populations (n = 10 per population) across their temperate North American ranges (Figure 2.1): New York (Northeast), Louisiana (Southeast), Minnesota (Midwest), New Mexico and Colorado (Southern Rockies), Wyoming (Northern Rockies), Washington (Pacific Northwest), and Alaska. The samples were obtained through museum loans of vouchered specimens and augmented by field collections in Wyoming, Louisiana, and Alaska (Table 2.S1). We extracted genomic DNA from tissue samples using the MagAttract High Molecular Weight DNA Kit (Qiagen, California, USA) following the manufacturer’s instructions. These samples were then submitted for whole genome resequencing on a paired-end Illumina HiSeq X Ten machine at RAPiD Genomics (Gainesville, Florida, USA).


2.6.2. Read alignment, variant calling and filtering

Raw reads were trimmed for Illumina adapters using Trimmomatic v0.36 (Bolger et al. 2014) with the following parameters: “ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:true”, resulting in an average of 35,689,979 paired reads per sample. Read quality was assessed with FastQC v0.11.4 (Andrews 2010). Given the high synteny and evolutionary stasis of bird chromosomes (Ellegren 2010), we produced a chromosome-length reference genome for Downy Woodpecker by ordering and orienting the scaffolds and contigs of the Downy Woodpecker genome assembly (Jarvis et al. 2014) along the 35 chromosomes of the Zebra finch (Taeniopygia guttata; version taeGut3.2.4) using Chromosemble from the Satsuma package (Grabherr et al. 2010). We verified the completeness of this new reference by searching for a set of single-copy avian orthologs using BUSCO v2.0.1 (Benchmarking Universal Single-Copy Orthologs; Waterhouse et al. 2018). A total of 91.1% of these genes were present and complete in our pseudo-chromosome reference, indicating sufficient completeness. Finally, we transferred the genome annotation of the Downy Woodpecker by mapping the genomic coordinates of each annotated feature against the pseudo-chromosome reference using gmap (Wu and Watanabe 2005). A total of 99.98% of all 14,443 annotated genes in Downy Woodpecker were successfully mapped to the pseudo-chromosome reference.
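The ILLUMINACLIP string above packs several positional parameters into one colon-separated token. The sketch below decodes it using the field order given in the Trimmomatic manual (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, minimum adapter length, keepBothReads). The class and function names are hypothetical; Trimmomatic itself exposes no such API, and the defaults for the two optional fields (8 and false) are our reading of the manual.

```python
from dataclasses import dataclass


@dataclass
class IlluminaClip:
    """Fields of a Trimmomatic ILLUMINACLIP step, in the order the tool expects (hypothetical helper)."""
    adapter_fasta: str              # FASTA file with adapter sequences
    seed_mismatches: int            # mismatches still allowed in the seed alignment
    palindrome_threshold: int       # match score required in palindrome (read-through) mode
    simple_threshold: int           # match score required in simple adapter mode
    min_adapter_length: int = 8     # shortest adapter fragment removed in palindrome mode
    keep_both_reads: bool = False   # keep the reverse read after palindrome clipping


def parse_illuminaclip(step: str) -> IlluminaClip:
    """Split an 'ILLUMINACLIP:...' token into named fields."""
    name, *fields = step.split(":")
    if name != "ILLUMINACLIP" or not 4 <= len(fields) <= 6:
        raise ValueError(f"not an ILLUMINACLIP step: {step!r}")
    clip = IlluminaClip(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))
    if len(fields) >= 5:
        clip.min_adapter_length = int(fields[4])
    if len(fields) == 6:
        clip.keep_both_reads = fields[5].lower() == "true"
    return clip


clip = parse_illuminaclip("ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:true")
print(clip)
```

Read this way, the setting used here allows 2 seed mismatches, requires scores of 30 (palindrome) and 10 (simple), removes adapter fragments as short as 8 bp, and retains both reads of a pair after clipping.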

Trimmed reads for both Downy and Hairy Woodpecker were aligned against the pseudo-chromosome reference genome of the Downy Woodpecker using the BWA v0.7.15 mem algorithm (Li and Durbin 2009). On average, 97.27% of reads from Downy Woodpecker and 96.38% of reads from Hairy Woodpecker were successfully mapped, demonstrating that despite the large evolutionary distance between these two species (6–10 mya; Dufort 2016; Shakya et al. 2017), sequence conservation allows efficient mapping. The resulting sequence alignment/map (SAM) files were converted to their binary format (BAM), and read group information was added. Next, reads were sorted, marked for duplicates, and indexed using Picard (http://broadinstitute.github.io/picard/). The Genome Analysis Toolkit (GATK v3.6; DePristo et al., 2011) was then used to perform local realignment of reads near insertion and deletion (indel) polymorphisms. We first used the RealignerTargetCreator tool to identify regions where realignment was needed, then produced a new set of realigned BAM files using IndelRealigner. The final quality of mapping was assessed using QualiMap v2.2.1 (Okonechnikov et al. 2016).

We implemented two complementary approaches for the downstream analysis of genetic polymorphism. First, we used ANGSD v0.917 (Korneliussen et al., 2014), a method that accounts for the genotype uncertainty inherent to low-depth sequencing data by inferring genotype likelihoods instead of relying on genotype calls. We estimated genotype likelihoods from BAM files using the GATK model (-GL 2; DePristo et al., 2011), retaining only sites present in at least 70% of sampled individuals (-minInd 50) and applying the following filters: a minimum mapping quality of 30 (-minMapQ 30), a minimum base quality score of 20 (-minQ 20), a minimum minor allele frequency of 5% (-minMaf 0.05), and a P-value threshold of 0.01 for the allele-frequency likelihood ratio test (-SNP_pval 0.01). Allele frequencies were estimated directly from genotype likelihoods assuming known major and minor alleles (-doMajorMinor 1 -doMaf 1; Kim et al., 2011). A total of 16,736,465 and 15,463,356 SNPs were identified for Downy and Hairy Woodpecker, respectively.

Because several downstream analyses do not support genotype likelihoods, we also called genotypes using GATK v3.8.0 (McKenna et al., 2010). First, we ran HaplotypeCaller separately for each sample using the --emitRefConfidence GVCF -minPruning 1 -minDanglingBranchLength 1 options to create one gVCF per individual; we then ran GenotypeGVCFs with default settings across all samples to jointly call genotypes. In the absence of a training SNP panel for our non-model species, we applied hard-filtering recommendations from the Broad Institute’s Best Practices (https://gatk.broadinstitute.org/). We filtered SNPs with quality by depth below 2 (QD < 2.0), SNPs whose alternative allele was found disproportionately closer to the ends of reads than the reference allele (ReadPosRankSum < -8.0), SNPs with evidence of strand bias (FS > 60.0 and SOR > 3.0), SNPs with a root mean square mapping quality below 40 (MQ < 40.0), and SNPs in which reads carrying the alternative allele had lower mapping quality than reads carrying the reference allele (MQRankSum < -12.5). In addition, we used VCFtools v0.1.17 (Danecek et al. 2011) to retain only biallelic SNPs occurring in at least 75% of samples, with a minimum mean coverage of 2x, a maximum mean coverage of 100x, and a p-value above 0.01 for the exact test for Hardy-Weinberg Equilibrium.
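The hard-filtering thresholds listed above can be expressed as a small predicate over a SNP’s INFO annotations. This is a minimal sketch, not GATK’s VariantFiltration: it assumes each threshold acts as an independent filter expression (so FS and SOR each flag a SNP on their own), and it assumes a missing annotation lets the SNP pass. The helper names are hypothetical.

```python
# Thresholds copied from the text; each one flags a SNP on its own (assumption).
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,               # low quality by depth
    "FS": lambda v: v > 60.0,              # Fisher strand bias
    "SOR": lambda v: v > 3.0,              # strand odds ratio
    "MQ": lambda v: v < 40.0,              # low RMS mapping quality
    "MQRankSum": lambda v: v < -12.5,      # alt reads map worse than ref reads
    "ReadPosRankSum": lambda v: v < -8.0,  # alt allele concentrated near read ends
}


def parse_info(info_field: str) -> dict[str, float]:
    """Pull the filtered annotations out of a VCF INFO string such as 'QD=12.1;FS=0.5'."""
    values = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        if sep and key in HARD_FILTERS:
            values[key] = float(value)
    return values


def failed_filters(info: dict[str, float]) -> list[str]:
    """Names of the hard filters a SNP fails; missing annotations are treated as passing."""
    return [name for name, fails in HARD_FILTERS.items() if name in info and fails(info[name])]


def passes(info: dict[str, float]) -> bool:
    return not failed_filters(info)


print(failed_filters(parse_info("QD=1.2;FS=72.0;MQ=60.0")))
```

Returning the list of failed filters, rather than a single boolean, mirrors how filtered VCF records carry the names of the filters they fail, which makes it easier to audit which criterion removes most sites.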

We applied three different minor allele frequency (maf) thresholds: 0.05 (for most analyses), 0.02 (for the estimation of recombination rates), and no threshold (for demographic analyses based on the SFS).
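The site-level Hardy-Weinberg and maf filters can be sketched for a single biallelic SNP from its genotype counts. The HWE function below follows the exact test of Wigginton, Cutler and Abecasis (2005), which the VCFtools documentation cites for its --hwe option; it is our Python transcription of that algorithm, not VCFtools code, and the function names are hypothetical.

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact HWE test p-value for one biallelic SNP (Wigginton et al. 2005 algorithm)."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))   # rare and common homozygote counts
    rare = 2 * n_homr + n_het                   # copies of the rarer allele
    n = n_het + n_homr + n_homc                 # genotyped individuals
    if n == 0:
        raise ValueError("no genotypes")
    probs = [0.0] * (rare + 1)
    # Start at the heterozygote count expected under HWE, with the right parity.
    mid = rare * (2 * n - rare) // (2 * n)
    if rare % 2 != mid % 2:
        mid += 1
    probs[mid] = 1.0
    # Recurse towards fewer heterozygotes (unnormalized probabilities).
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # Recurse towards more heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    total = sum(probs)
    observed = probs[n_het]
    # p-value: total probability of configurations no more likely than the observed one.
    return min(1.0, sum(x for x in probs if x <= observed) / total)


def minor_allele_frequency(n_het: int, n_hom_ref: int, n_hom_alt: int) -> float:
    """Frequency of the less common allele from genotype counts."""
    alt = (n_het + 2 * n_hom_alt) / (2 * (n_het + n_hom_ref + n_hom_alt))
    return min(alt, 1.0 - alt)


# A site would be kept if hwe_exact_p(...) > 0.01 and its maf exceeds the analysis-specific threshold.
print(hwe_exact_p(50, 25, 25), hwe_exact_p(0, 50, 50))
```

The test conditions on the observed allele counts, so it remains valid for small samples such as the ten individuals per population used here, where chi-square approximations are unreliable.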

2.6.4. Population structure

To assess population structure, we performed a principal component analysis (PCA) using the R package SNPRelate v3.3 (Zheng et al. 2012). We first applied the function snpgdsLDpruning to select a subset of unlinked SNPs (LD r² threshold = 0.2) with < 25% missing data and maf > 0.05, which resulted in a total of 71,228 SNPs for Downy Woodpecker and 71,763 SNPs for Hairy Woodpecker. We then used the function snpgdsPCA to calculate the eigenvectors and eigenvalues for the principal component analysis, and investigated population structure by examining the first three principal components (PC1–PC3). In addition, we used NGSadmix (Skotte et al. 2013), implemented in ANGSD (Korneliussen et al. 2014), to investigate the number of genetic clusters and the associated admixture proportions of each individual. NGSadmix is a maximum likelihood approach analogous to STRUCTURE (Pritchard et al. 2000), but it bases its inferences on genotype likelihoods instead of SNP calls, thereby accounting for genotype uncertainty.
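The two PCA steps (LD pruning, then eigen-decomposition of the genotype matrix) can be illustrated with NumPy. This is a simplified sketch, not SNPRelate: pruning here is a greedy pass that keeps a SNP only if its squared genotype correlation with every previously kept SNP within a window is below the threshold, whereas snpgdsLDpruning uses its own sliding-window procedure and LD measure. The 500 kb window is an assumed value, and genotypes are standardized by allele frequency before the SVD (as in Patterson et al. 2006).

```python
import numpy as np


def r2(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Pearson correlation between two 0/1/2 genotype vectors."""
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def ld_prune(geno: np.ndarray, pos: np.ndarray, r2_max: float = 0.2, window_bp: int = 500_000):
    """Greedy pruning of an individuals x SNPs matrix sorted by position (simplified sketch)."""
    kept: list[int] = []
    for j in range(geno.shape[1]):
        if all(abs(pos[j] - pos[k]) > window_bp or r2(geno[:, j], geno[:, k]) < r2_max
               for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)


def genotype_pca(geno: np.ndarray, n_pcs: int = 3):
    """PCA on allele-frequency-standardized genotypes; returns individual scores and eigenvalues."""
    p = geno.mean(axis=0) / 2.0
    poly = (p > 0) & (p < 1)                        # drop monomorphic SNPs
    z = (geno[:, poly] - 2 * p[poly]) / np.sqrt(2 * p[poly] * (1 - p[poly]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigvals = s ** 2 / z.shape[1]
    return u[:, :n_pcs] * s[:n_pcs], eigvals[:n_pcs]


rng = np.random.default_rng(1)
g = rng.integers(0, 3, size=(40, 5)).astype(float)
g = np.column_stack([g, g[:, 0]])                   # SNP 5 is a perfect copy of SNP 0
kept = ld_prune(g, np.arange(6) * 1000)
scores, eigvals = genotype_pca(g[:, kept])
```

Pruning before PCA keeps blocks of tightly linked SNPs (such as the duplicated SNP in the example) from dominating the leading components.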

We also described the relationships among populations by building a maximum likelihood tree based on the polymorphism-aware phylogenetic model (PoMo; Schrempf et al. 2016) implemented in IQ-Tree 2 (Minh et al. 2020). PoMo is a phylogenetic method that accounts for the incomplete lineage sorting inherent to population-level data by incorporating polymorphic states into DNA substitution models. We used a Python script (https://github.com/pomo-dev/cflib) to convert our VCF files containing only intergenic SNPs into the input format of PoMo (counts file). IQ-Tree was run using the HKY+P model of sequence evolution with 100 non-parametric bootstraps to assess support. We used three samples from Hairy Woodpecker as an outgroup to root the tree for Downy Woodpecker, and vice versa.

We estimated pairwise FST values among populations in each species using ANGSD v0.917 (Korneliussen et al. 2014). We first produced site-allele-frequency likelihoods using the -doSaf command, followed by the realSFS -fold 1 command to generate a folded site frequency spectrum (SFS). We then estimated weighted FST values using the realSFS fst command, both globally and across non-overlapping 100 kb windows.
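A “weighted” FST is the ratio of the summed per-site numerators to the summed per-site denominators, rather than the mean of per-site ratios; our understanding is that this is what realSFS reports as its weighted estimate. The sketch below aggregates hypothetical per-site components into non-overlapping 100 kb windows; it does not parse realSFS output, and the input layout is an assumption.

```python
from collections import defaultdict


def windowed_weighted_fst(positions, numerators, denominators, window=100_000):
    """Weighted FST per non-overlapping window: sum(numerator) / sum(denominator).
    positions are 1-based; keys of the result are 1-based window starts."""
    num = defaultdict(float)
    den = defaultdict(float)
    for pos, a, ab in zip(positions, numerators, denominators):
        w = (pos - 1) // window          # window index 0, 1, 2, ...
        num[w] += a
        den[w] += ab
    return {w * window + 1: num[w] / den[w] for w in sorted(num) if den[w] > 0}


def global_weighted_fst(numerators, denominators):
    """Genome-wide weighted FST: the same ratio of sums over all sites."""
    return sum(numerators) / sum(denominators)


print(windowed_weighted_fst([1, 50_000, 100_000, 100_001],
                            [0.1, 0.0, 0.2, 0.05], [0.5, 0.5, 1.0, 0.5]))
```

Weighting by the denominator stops sites with little information (for example, near-monomorphic sites with tiny denominators) from dominating the window average.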

We investigated patterns of gene flow across the landscape using estimated effective migration surfaces (EEMS; Petkova et al. 2016), a method to visualize spatial variation in gene flow across a habitat. Low values of the relative effective migration rate (m) indicate a rapid decay of genetic similarity with geographic distance, suggesting the presence of barriers to gene flow. In contrast, high values of m indicate greater genetic similarity than expected given the geographic distance, suggesting genetic connectivity. We generated pairwise identity-by-state (IBS) matrices using the -doIBS function in ANGSD (Korneliussen et al., 2014) and used these matrices to represent dissimilarity between individuals. We ran EEMS using 200 demes, with a single MCMC chain of 1 × 10⁷ iterations following a burn-in of 5 × 10⁶ iterations and a thinning interval of 9,999. We then checked the posterior probabilities to ensure convergence.
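As an illustration of the kind of dissimilarity matrix EEMS takes as input, the sketch below computes a pairwise IBS distance from called 0/1/2 genotypes: for each pair, the mean of |gᵢ − gⱼ|/2 over sites genotyped in both individuals, that is, one minus the average proportion of alleles shared identical by state. This is a genotype-call analogue for illustration only. ANGSD’s -doIBS works from sequencing reads, and the function name is hypothetical.

```python
import numpy as np


def ibs_dissimilarity(geno: np.ndarray) -> np.ndarray:
    """Pairwise IBS distance from an individuals x SNPs matrix of 0/1/2 genotypes (NaN = missing)."""
    n = geno.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = ~np.isnan(geno[i]) & ~np.isnan(geno[j])   # sites typed in both individuals
            d[i, j] = d[j, i] = np.mean(np.abs(geno[i, shared] - geno[j, shared])) / 2.0
    return d


g = np.array([[0, 1, 2], [0, 1, 2], [2, 1, np.nan]], dtype=float)
print(ibs_dissimilarity(g))
```

Restricting each comparison to jointly genotyped sites keeps missing data from biasing individual pairs toward similarity or dissimilarity.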

2.6.5. Demographic inference

We inferred past changes in effective population size (Ne) using Stairway Plot 2 (Liu and Fu 2020), a method that leverages information contained in the site frequency spectrum (SFS) to estimate recent population history. Unlike methods based on the Sequentially Markov Coalescent (e.g., PSMC, SMC++), Stairway Plot 2 is applicable to a large sample of unphased whole genome sequences, and it is insensitive to read depth limitations. We estimated the folded site frequency spectrum for each population using the realSFS function in ANGSD (Korneliussen et al., 2014). For each population, we used the default 67% of sites for training and calculated median estimates and 95% pseudo-CI based on 200 replicates. We assumed a mutation rate of 2.42 × 10⁻⁹ mutations per site per generation, as estimated from non-coding regions of the Downy Woodpecker genome (Jarvis et al. 2014; Zhang et al. 2014), and a generation time of one year for both species (AnAge database; Tacutu et al., 2018). We then used the estimates of Ne from Stairway Plot 2 across the past 1 million years to calculate the harmonic mean, representing each population’s long-term Ne.
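The long-term Ne summary can be sketched as a harmonic mean over a piecewise-constant Ne trajectory. The text does not say whether epochs were weighted by duration. This sketch assumes time weighting, the usual definition for a population whose size changes over time, and takes epoch start times in years (equal to generations under the one-year generation time). Function and variable names are hypothetical and do not reflect the Stairway Plot 2 output format.

```python
import numpy as np

MU = 2.42e-9  # mutations per site per generation, as assumed in the text


def ne_from_theta(theta_per_site: float, mu: float = MU) -> float:
    """Diploid effective size from the population mutation rate: theta = 4 * Ne * mu."""
    return theta_per_site / (4.0 * mu)


def harmonic_mean_ne(times_yr, ne, horizon_yr=1_000_000):
    """Time-weighted harmonic mean of a piecewise-constant Ne trajectory.
    times_yr[k] is the (increasing) start of epoch k looking back from the present (0);
    epoch k has size ne[k] and ends at times_yr[k + 1], or at the horizon for the last epoch."""
    t = np.asarray(times_yr, dtype=float)
    n = np.asarray(ne, dtype=float)
    ends = np.append(t[1:], np.inf)
    dt = np.clip(np.minimum(ends, horizon_yr) - t, 0.0, None)   # years spent in each epoch
    return dt.sum() / np.sum(dt / n)


# Two epochs of 500 kyr each, with Ne = 100 and 400:
print(harmonic_mean_ne([0, 500_000], [100, 400]))  # 1e6 / (5000 + 1250) = 160.0
```

Because the harmonic mean is dominated by the smallest values, a brief bottleneck lowers long-term Ne far more than an equally brief expansion raises it. This property is what makes it the natural summary for predicting levels of genetic diversity.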

We further investigated the demographic history of the two species using fastsimcoal2 v2.6.0.3, a composite likelihood method that uses the joint site frequency spectrum (jSFS) to perform model selection and estimate demographic parameters (Excoffier and Foll 2011). We tested the support for two competing demographic models: (1) a model where all populations diverge synchronously from a single large refugium and expand independently with asymmetric gene flow, and (2) a bifurcating model where populations diverge at different times from multiple refugia and expand independently with asymmetric gene flow. Since only a reasonably large subset of the genome is needed to obtain an accurate estimate of the site frequency spectrum (Nunziata and Weisrock 2018; Beichman et al. 2018), we generated the four-population folded jSFS from a set of high-quality SNPs with no maf filtering (Downy: 6,030,759 SNPs; Hairy: 7,967,215 SNPs) present in chromosome 1 using easySFS.py (https://github.com/isaacovercast/easySFS). We projected the jSFS down to 20 chromosomes (i.e., 10 diploid samples) per population to avoid issues associated with differences in sample size and missing data. To minimize the impact of selection, we only included sites in non-coding regions of the genome. All models followed the topology of the population tree obtained from IQ-Tree 2 and assumed a mutation rate of 2.42 × 10⁻⁹ mutations per site per generation. For each model, we conducted 75 iterations of the optimization procedure, each with 40 expectation conditional maximization cycles and 100,000 genealogical simulations per cycle. We performed model selection using the run with the highest likelihood for each model. For each species, we chose the model with the highest Akaike weight (AICw; Sakamoto et al., 1986) as the best-fit model. We obtained 95% pseudo-CI for parameter estimates by performing 100 parametric bootstraps, simulating jSFSs under the best model and re-estimating parameters from these simulated datasets.
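Two steps in this paragraph have a compact numerical form: projecting an SFS to a smaller sample size and converting maximum likelihoods into Akaike weights. The sketch below shows both in one dimension. The projection is the hypergeometric down-sampling implemented in dadi, on which easySFS builds; this is our own illustration, not easySFS code. For the weights, we assume the likelihoods are supplied as log10 values, the units fastsimcoal2 reports, and convert them to natural logs. Model names in the example are hypothetical labels.

```python
from math import comb, exp, log


def project_sfs(sfs, m):
    """Project a 1-D SFS (entries for 0..n derived copies) down to m < n sampled copies
    by hypergeometric sampling without replacement."""
    n = len(sfs) - 1
    out = [0.0] * (m + 1)
    for i, count in enumerate(sfs):
        for j in range(m + 1):               # math.comb returns 0 for impossible draws
            out[j] += count * comb(i, j) * comb(n - i, m - j) / comb(n, m)
    return out


def akaike_weights(max_log10_lik: dict[str, float], n_params: dict[str, int]) -> dict[str, float]:
    """AIC = 2k - 2 ln L (ln L converted from log10 units); w_i = exp(-dAIC_i/2) / sum_j exp(-dAIC_j/2)."""
    aic = {m: 2 * n_params[m] - 2 * max_log10_lik[m] * log(10) for m in max_log10_lik}
    best = min(aic.values())
    rel = {m: exp(-(a - best) / 2) for m, a in aic.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


print(project_sfs([0, 1, 0, 0, 0], 2))   # a singleton among 4 copies -> [0.5, 0.5, 0.0]
print(akaike_weights({"synchronous": -100.0, "bifurcating": -110.0},
                     {"synchronous": 5, "bifurcating": 5}))
```

Projection preserves the total number of sites while giving every population the same sample size, which is why it handles uneven missing data. Akaike weights rescale AIC differences into relative support values that sum to one across the candidate models.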

2.6.6. Genetic diversity, recombination rates, and linkage disequilibrium

We compared genetic diversity among populations of the two species by estimating genome-wide pairwise nucleotide diversity (θπ; Tajima, 1989) and the Watterson estimator of the rescaled mutation rate per base (θW; Watterson, 1975) using ANGSD (Korneliussen et al. 2014). We first ran the -doSaf command in ANGSD to generate site-allele-frequency likelihoods based on the GATK model (McKenna et al. 2010), then used realSFS with the option -fold 1 to estimate the folded SFS. ANGSD was also used to estimate genome-wide Tajima’s D.

We estimated recombination rates (r = recombination rate per base pair per generation) along the genome of the two species using ReLERNN, a deep learning algorithm (Adrion et al. 2020). ReLERNN takes a VCF file as input and simulates training, validation, and test datasets matching the empirical distribution of θW. ReLERNN then uses the raw genotype matrix and a vector of genomic coordinates to train a model that predicts per-base recombination rates across sliding windows (Adrion et al. 2020). To reduce the impact of population structure on estimates, we restricted the prediction of recombination rates to the Eastern populations (Northeast + Southeast + Midwest), the genetic cluster with the most samples. Given the conserved landscape of recombination in birds, we do not expect major differences in recombination across populations (Singhal et al. 2015). We used the SNP dataset with maf > 0.02 and ran the analysis with default settings. Because ReLERNN is robust to demographic model misspecification (Adrion et al. 2020), we simulated an equilibrium model with a mutation rate of 2.42 × 10⁻⁹ mutations per site per generation (Jarvis et al. 2014; Zhang et al. 2014) and a generation time of one year (AnAge database; Tacutu et al., 2018).

Finally, we explored the recombination history of each population by analyzing patterns of linkage disequilibrium (LD) decay using PopLDdecay (Zhang et al. 2018). We calculated pairwise D′/r² using the default maximum distance between SNPs of 300 kb and plotted it as a function of genomic distance (in kb).
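The θπ, θW and Tajima’s D estimates described in this subsection are all functions of the SFS, and θπ can be computed directly from a folded spectrum because the weight i(n − i) is symmetric in i and n − i. The sketch below works from a point estimate of a folded SFS for n sampled chromosomes using the standard formulas (Tajima 1989; Watterson 1975). ANGSD instead integrates over genotype likelihoods, so this is illustrative only. The input layout (index i = minor-allele count) is an assumption, and the outputs are regional totals (divide by the number of sites for per-site values).

```python
from math import comb, sqrt


def diversity_from_folded_sfs(folded, n):
    """theta_pi, theta_W and Tajima's D from a folded SFS of n chromosomes.
    folded[i] = number of sites with minor-allele count i, for i = 0..n//2 (folded[0] ignored)."""
    S = sum(folded[1:])                                    # segregating sites
    pi = sum(i * (n - i) * folded[i] for i in range(1, n // 2 + 1)) / comb(n, 2)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = S / a1
    # Tajima's (1989) variance constants
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    denom = sqrt(e1 * S + e2 * S * (S - 1))
    tajima_d = (pi - theta_w) / denom if denom > 0 else float("nan")
    return pi, theta_w, tajima_d


# Expected neutral folded SFS for n = 10 and theta = 100: pi = theta_W = 100 and D = 0.
n, theta = 10, 100.0
neutral = [0.0] + [theta / i + theta / (n - i) for i in range(1, n // 2)] + [theta / (n // 2)]
print(diversity_from_folded_sfs(neutral, n))
```

Under a constant-size neutral model θπ and θW estimate the same quantity, so D ≈ 0. An excess of rare variants, as expected after population expansion, pulls θπ below θW and makes D negative.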

2.6.7. Genomic predictors of regional variation in nucleotide diversity

To investigate the factors shaping the genomic landscape of diversity in the two woodpecker species, we tested the effect of (1) recombination rate, (2) gene density, and (3) GC content on regional patterns of nucleotide diversity. We computed pairwise nucleotide diversity (θπ) across 100 kb non-overlapping windows using ANGSD (Korneliussen et al. 2014). We first used the -doThetas function to estimate site-specific nucleotide diversity from the posterior probability of allele frequency (SAF), using the estimated site frequency spectrum (SFS) as a prior (Korneliussen et al. 2013). Then, we ran the thetaStat do_stat command to perform the sliding-window analysis. To quantify variation in recombination rates, we calculated weighted averages of the recombination rates estimated by ReLERNN across 100 kb non-overlapping windows. We assessed gene density (i.e., density of targets of selection) as the proportion of coding sequence (in base pairs) in each 100 kb non-overlapping window, and estimated GC content in each window using the function GC of the R package seqinr version 3.6-1 (Charif and Lobry 2007). We fit a multiple linear regression in R to assess the relationship between nucleotide diversity (θπ) and the three predictor variables: recombination rate, gene density, and base composition. To control for collinearity among these variables, we also ran a principal component regression (PCR). PCR summarizes the predictor variables into orthogonal components (PCs) before performing the regression, thereby removing the correlation among variables. PCR was conducted using the R package pls (Wehrens and Mevik 2007). All variables were Z-transformed before these analyses.
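The two regression approaches can be sketched with NumPy in place of R’s lm and the pls package. This is an illustrative analogue, not the code used for the analyses. All variables are Z-transformed as in the text, and the PCR step regresses the response on the leading principal components of the predictors. Keeping all components recovers the ordinary least-squares slopes once the PC coefficients are mapped back through the loadings, a useful sanity check. The synthetic data and the names in the example are hypothetical.

```python
import numpy as np


def zscore(a):
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def ols(X, y):
    """Least squares with an intercept; returns [intercept, slope_1, ..., slope_p]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def principal_component_regression(X, y, n_components):
    """Regress standardized y on the leading PCs of standardized X.
    Returns PC slopes, variance fraction per retained PC, and loadings (PCs x predictors)."""
    Xz, yz = zscore(X), zscore(y)
    _, s, vt = np.linalg.svd(Xz, full_matrices=False)
    scores = Xz @ vt[:n_components].T          # orthogonal predictors
    gamma = ols(scores, yz)[1:]                # intercept is ~0 after centering
    var_frac = s[:n_components] ** 2 / np.sum(s ** 2)
    return gamma, var_frac, vt[:n_components]


# Hypothetical windows in which gene density is correlated with GC content.
rng = np.random.default_rng(0)
gc = rng.normal(size=500)
genes = 0.8 * gc + 0.6 * rng.normal(size=500)
rec = rng.normal(size=500)
pi = -0.5 * genes + 0.3 * rec + rng.normal(scale=0.5, size=500)
X = np.column_stack([rec, genes, gc])
gamma, frac, loadings = principal_component_regression(X, pi, 2)
```

Inspecting the loadings shows which original predictors load on each significant PC. When gene density and GC content load on the same component, as in this example, their separate effects cannot be distinguished, which matches the limitation noted in the Discussion.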

We also investigated the association between patterns of intraspecific population differentiation (FST) and intrinsic properties of the genome (i.e., nucleotide diversity and recombination rates). To summarize the genomic landscape of differentiation into a single response variable, we employed two approaches: for each 100 kb window, we (1) calculated the average FST across all pairwise population comparisons, and (2) performed a principal component analysis and extracted the first principal component (PC1), which captured the largest share of variance across all pairwise population comparisons.
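The two summaries of the differentiation landscape can be sketched from a matrix of windowed FST values, with one row per 100 kb window and one column per population pair. This is a minimal illustration under two assumptions: columns are centered but not scaled before the PCA (the text does not specify scaling), and windows with missing comparisons have been removed. Because the sign of a principal component is arbitrary, PC1 is oriented to increase with mean FST.

```python
import numpy as np


def summarize_pairwise_fst(fst: np.ndarray):
    """fst: windows x pairwise-comparisons matrix (no missing values).
    Returns (mean FST per window, PC1 score per window, variance fraction on PC1)."""
    mean_fst = fst.mean(axis=1)
    centered = fst - fst.mean(axis=0)                 # center each comparison
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    if np.corrcoef(pc1, mean_fst)[0, 1] < 0:          # fix the arbitrary sign of PC1
        pc1 = -pc1
    return mean_fst, pc1, s[0] ** 2 / np.sum(s ** 2)


# Hypothetical example: six comparisons sharing a common landscape plus small noise.
rng = np.random.default_rng(2)
base = rng.uniform(0.02, 0.3, size=200)
fst = np.column_stack([base + rng.normal(0, 0.01, size=200) for _ in range(6)])
mean_fst, pc1, frac = summarize_pairwise_fst(fst)
```

When most of the variance loads on PC1, the pairwise landscapes are largely parallel, which is the pattern expected if linked selection acting before population splits shapes differentiation.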

2.6.8. Natural selection and genetic load

To estimate the genetic load of each species and population, we first used the software snpEff v4.1 (Cingolani et al. 2012) to classify SNPs into one of four categories of functional impact according to their predicted effect on the gene annotation: (1) modifier: variants in non-coding regions of the genome (e.g., introns, intergenic regions) whose effects are hard to predict; (2) low: variants in coding sequences that cause no change in amino acid (i.e., synonymous); (3) moderate: variants in coding sequences that cause a change in amino acid (i.e., nonsynonymous); and (4) high: variants in coding sequences that cause the gain or loss of a start or stop codon. We then selected a subset of individuals in each population to polarize our SNPs. To do so, we looked for biallelic SNPs in Downy Woodpecker for which one of the alleles was fixed in Hairy Woodpecker, and vice versa. The allele fixed in the outgroup was assumed to be the ancestral state. This is a sensitive step in the estimation of genetic load, so we only kept SNPs for which the ancestral state could be determined unambiguously (Simons and Sella 2016; Grossen et al. 2020). We ended up with a total of 363,903 polarized SNPs across the genome.
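The polarization rule can be written as a small function for one biallelic SNP. This is a minimal sketch of the rule as described: the ancestral allele is the one fixed in the outgroup, and any SNP where the outgroup is polymorphic, entirely missing, or carries a third allele is discarded. The genotype representation (tuples of allele strings, `None` for missing) and the function name are hypothetical.

```python
def polarize(ingroup_alleles, outgroup_genotypes):
    """Return (ancestral, derived) for a biallelic ingroup SNP, or None if ambiguous.
    outgroup_genotypes: allele tuples from outgroup individuals, e.g. [("G", "G"), (None, None)]."""
    ref, alt = ingroup_alleles
    observed = {a for gt in outgroup_genotypes for a in gt if a is not None}
    if observed == {ref}:
        return ref, alt          # outgroup fixed for the reference allele
    if observed == {alt}:
        return alt, ref          # outgroup fixed for the alternative allele
    return None                  # polymorphic, missing, or a third allele: drop the SNP


print(polarize(("A", "G"), [("G", "G"), ("G", "G")]))   # ('G', 'A')
print(polarize(("A", "G"), [("A", "G")]))               # None
```

Discarding every ambiguous case trades sample size for accuracy. Mispolarized sites would turn ancestral alleles into apparent high-frequency derived alleles, which would inflate estimates of genetic load.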

We characterized the site frequency spectrum (SFS) for each type of variant (according to the impact inferred by snpEff) by estimating the total frequency of each derived allele and calculating the proportion of alleles in each frequency bin. As a proxy for genetic load, for each individual, we estimated the ratio of the number of derived alleles of high impact (i.e., loss of function) in homozygosity to the number of derived alleles of low impact (i.e., synonymous) in homozygosity. This metric assumes a recessive model, in which derived alleles are only deleterious in a homozygous state. We therefore also considered an additive (i.e., semi-dominant) model that assumes that derived alleles have deleterious effects in both homozygosity and heterozygosity. For this metric, we counted the total number of derived alleles, instead of only those in homozygosity (Simons and Sella 2016).
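Both load proxies can be computed per individual from polarized genotypes (derived-allele counts 0, 1 or 2) and the snpEff impact class of each SNP. The sketch below implements the two ratios as defined in the text. It counts homozygous genotypes for the recessive ratio, which gives the same value as counting homozygous alleles because both terms are doubled. The input layout and function name are hypothetical.

```python
from collections import Counter


def individual_load(genotypes, impacts):
    """Load proxies for one individual from polarized SNPs.
    genotypes: derived-allele counts (0, 1, 2, or None if missing), one per SNP;
    impacts:   snpEff impact class per SNP ("HIGH", "MODERATE", "LOW", "MODIFIER").
    Recessive: homozygous-derived HIGH / homozygous-derived LOW.
    Additive:  all derived HIGH alleles / all derived LOW alleles."""
    hom = Counter()
    total = Counter()
    for g, impact in zip(genotypes, impacts):
        if g is None:
            continue                 # skip missing genotypes
        total[impact] += g
        if g == 2:
            hom[impact] += 1
    recessive = hom["HIGH"] / hom["LOW"] if hom["LOW"] else float("nan")
    additive = total["HIGH"] / total["LOW"] if total["LOW"] else float("nan")
    return recessive, additive


print(individual_load([2, 1, 2, 0, 2, 1, None],
                      ["HIGH", "HIGH", "LOW", "LOW", "LOW", "LOW", "HIGH"]))  # (0.5, 0.6)
```

Using synonymous (LOW) variants as the denominator normalizes for differences in overall diversity and sequencing coverage among individuals, so the ratios reflect the relative burden of putatively deleterious variation.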

To look at selection over a deeper evolutionary timescale, we estimated dN/dS, the ratio of nonsynonymous to synonymous substitution rates, using a set of 397 genes that were orthologous across Downy Woodpecker, Hairy Woodpecker, and two avian outgroups: Chicken (Gallus gallus) and Zebra Finch (Taeniopygia guttata). We identified orthologous genes across all four species using the software JustOrthologs (Miller et al. 2019) and only kept well-aligned loci. We first downloaded Ensembl genome assemblies and gene annotations for versions GRCg6a and bTaeGut1_v1.p of the Chicken and Zebra Finch genomes, respectively (Ensembl v103). We then extracted coding sequences (CDS) for all identified orthologs from their respective reference genomes using a GFF3 parser included in JustOrthologs and aligned them with the frameshift-aware MACSE software (Ranwez et al. 2011). We used the parameter settings --min_percent_NT_at_ends 0.3 and -codonForInternalStop NNN for aligning and exporting sequences. The resulting amino-acid alignments were inspected with HMMcleaner to mask sites that were likely misaligned (Amemiya et al. 2013; Philippe et al. 2017). Finally, we used codeml in PAML (Yang 2007) to estimate the overall dN/dS ratio along each branch of the tree, assuming a one-ratio branch model.

2.7. Author contribution

This study was conceived and designed by Lucas Rocha Moreira and Brian Tilston Smith. A sub- set of samples was collected and made available by John Klicka. Lucas Rocha Moreira conducted all bioinformatic analyses and drafted the paper with input from all authors.

2.8. References

Adrion, J. R., Galloway, J. G., & Kern, A. D. (2020). Predicting the landscape of recombination using deep learning. Molecular Biology and Evolution, 1–27.

Aguillon, S. M., Campagna, L., Harrison, R. G., & Lovette, I. J. (2018). A flicker of hope: Genomic data distinguish Northern Flicker taxa despite low levels of divergence. The Auk, 135(3), 748–766.

Amemiya, C. T., Alföldi, J., Lee, A. P., Fan, S., Philippe, H., Maccallum, I., … Lindblad-Toh, K. (2013). The African coelacanth genome provides insights into tetrapod evolution. Nature, 496(7445), 311–316.

Anderson, L. L., Hu, F. S., Nelson, D. M., Petit, R. J., & Paige, K. N. (2006). Ice-age endurance: DNA evidence of a white spruce refugium in Alaska. Proceedings of the National Academy of Sciences, 103(33), 12447–12450.

Andolfatto, P. (2007). Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Research, 17, 1755–1762. doi: 10.1101/gr.6691007

Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom.

Ardlie, K. G., Kruglyak, L., & Seielstad, M. (2002). Patterns of linkage disequilibrium in the human genome. Nature Reviews. Genetics, 3(4), 299–309.

Balakrishnan, C. N., & Edwards, S. V. (2009). Nucleotide variation, linkage disequilibrium and founder-facilitated speciation in wild populations of the zebra finch (Taeniopygia guttata). Genetics, 181(2), 645–660.

Ball, R. M., & Avise, J. C. (1992). Mitochondrial DNA Phylogeographic Differentiation among Avian Populations and the Evolutionary Significance of Subspecies. The Auk, 109(3), 626–636.

Barton, H. J., & Zeng, K. (2019). The Impact of Natural Selection on Short Insertion and Deletion Variation in the Great Tit Genome. Genome Biology and Evolution, 11(6), 1514–1524.

Begun, D. J., & Aquadro, C. F. (1992). Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature, 356(6369), 519–520.

Beichman, A. C., Huerta-Sanchez, E., & Lohmueller, K. E. (2018). Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. Annual Review of Ecology, Evolution, and Systematics. doi: 10.1146/annurev-ecolsys-110617-062431

Beissinger, T. M., Wang, L., Crosby, K., Durvasula, A., Hufford, M. B., & Ross-Ibarra, J. (2015). Recent demography drives changes in linked selection across the maize genome. bioRxiv, 31666.

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120.

Bolívar, P., Guéguen, L., Duret, L., Ellegren, H., & Mugal, C. F. (2019). GC-biased gene conversion conceals the prediction of the nearly neutral theory in avian genomes. Genome Biology, 20(1), 5.

Borge, T., Webster, M. T., Andersson, G., & Saetre, G.-P. (2005). Contrasting Patterns of Polymorphism and Divergence on the Z Chromosome and Autosomes in Two Ficedula Flycatcher Species. Genetics, 171(4), 1861–1873.

Branca, A., Paape, T. D., Zhou, P., Briskine, R., Farmer, A. D., Mudge, J., … Tiffin, P. (2011). Whole-genome nucleotide diversity, recombination, and linkage disequilibrium in the model legume Medicago truncatula. Proceedings of the National Academy of Sciences, 108(42), E864–E870.

Browning, M. R. (1995). Do Downy Woodpeckers Migrate? (¿Migra Picoides pubescens?). Journal of Field Ornithology, 66(1), 12–21.

Brubaker, L. B., Anderson, P. M., Edwards, M. E., & Lozhkin, A. V. (2005). Beringia as a glacial refugium for boreal trees and shrubs: new perspectives from mapped pollen data. Journal of Biogeography, 32(5), 833–848.

Burbrink, F., Chan, Y. L., Myers, E. A., Ruane, S., Smith, B. T., & Hickerson, M. J. (2016). Asynchronous demographic responses to Pleistocene climate change in Eastern Nearctic vertebrates. Ecology Letters, 19(12), 1457–1467.

Burri, R., Nater, A., Kawakami, T., Mugal, C. F., Olason, P. I., Smeds, L., … Ellegren, H. (2015). Linked selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers. Genome Research, 25(11), 1656–1665.

Campbell-Staton, S. C., Goodman, R. M., Backström, N., Edwards, S. V., Losos, J. B., & Kolbe, J. J. (2012). Out of Florida: mtDNA reveals patterns of migration and Pleistocene range expansion of the Green Anole lizard (Anolis carolinensis). Ecology and Evolution, 2(9), 2274–2284.

Charif, D., & Lobry, J. R. (2007). SeqinR 1.0-2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis. In U. Bastolla, M. Porto, H. E. Roman, & M. Vendruscolo (Eds.), Structural Approaches to Sequence Evolution: Molecules, Networks, Populations (pp. 207–232). Berlin, Heidelberg: Springer Berlin Heidelberg.

Charlesworth, B. (2009). Fundamental concepts in genetics: effective population size and patterns of molecular evolution and variation. Nature Reviews. Genetics, 10(3), 195–205.

Charlesworth, B., Morgan, M. T., & Charlesworth, D. (1993). The effect of deleterious mutations on neutral molecular variation. Genetics, 134(4), 1289–1303.

Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., … Ruden, D. M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly, 6(2), 80–92.

Comeron, J. M. (2014). Background Selection as Baseline for Nucleotide Variation across the Drosophila Genome. PLoS Genetics, 10(6), e1004434.

Corbett-Detig, R. B., Hartl, D. L., & Sackton, T. B. (2015). Natural selection constrains neutral diversity across a wide range of species. PLoS Biology, 13(4), e1002112.

Cruickshank, T. E., & Hahn, M. W. (2014). Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Molecular Ecology, 23(13), 3133–3157.

Cutter, A. D., & Choi, J. Y. (2010). Natural selection shapes nucleotide polymorphism across the genome of the nematode Caenorhabditis briggsae. Genome Research, 20(8), 1103–1111.

Cutter, A. D., & Payseur, B. A. (2013). Genomic signatures of selection at linked sites: unifying the disparity among species. Nature Reviews. Genetics, 14(4), 262–274.

Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., … Durbin, R. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156–2158.

Davis, M. B. (2001). Range shifts and adaptive responses to Quaternary climate change. Science, 292(5517), 673–679.

Delmore, K. E., Lugo Ramos, J. S., Van Doren, B. M., Lundberg, M., Bensch, S., Irwin, D. E., & Liedvogel, M. (2018). Comparative analysis examining patterns of genomic differentiation across multiple episodes of population divergence in birds. Evolution Letters, 2(2), 76–87.

de Pedro, M., Riba, M., González‐Martínez, S. C., Seoane, P., Bautista, R., Claros, M. G., & Mayol, M. (2021). Demography, genetic diversity and expansion load in the colonizing species Leontodon longirostris (Asteraceae) throughout its native range. Molecular Ecology, mec.15802.

DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., … Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498.

Dufort, M. J. (2016). An augmented supermatrix phylogeny of the avian family Picidae reveals uncertainty deep in the family tree. Molecular Phylogenetics and Evolution, 94, 313–326.

Duret, L., & Galtier, N. (2009). Biased Gene Conversion and the Evolution of Mammalian Genomic Landscapes. Annual Review of Genomics and Human Genetics, 10(1), 285–311.

Dutoit, L., Burri, R., Nater, A., Mugal, C. F., & Ellegren, H. (2017). Genomic distribution and estimation of nucleotide diversity in natural populations: perspectives from the collared flycatcher (Ficedula albicollis) genome. Molecular Ecology Resources, 17(4), 586–597.

Dutoit, L., Vijay, N., Mugal, C. F., Bossu, C. M., Burri, R., Wolf, J., & Ellegren, H. (2017). Covariation in levels of nucleotide diversity in homologous regions of the avian genome long after completion of lineage sorting. Proceedings of the Royal Society B: Biological Sciences, 284(1849). doi: 10.1098/rspb.2016.2756

Ellegren, H. (2010). Evolutionary stasis: the stable chromosomes of birds. Trends in Ecology & Evolution, 25(5), 283–291.

Ellegren, H. (2013). The evolutionary genomics of birds. Annual Review of Ecology, Evolution, and Systematics, 44(1), 239–259.

Elyashiv, E., Bullaughey, K., Sattath, S., Rinott, Y., Przeworski, M., & Sella, G. (2010). Shifts in the intensity of purifying selection: An analysis of genome-wide polymorphism data from two closely related yeast species. Genome Research, 20(11), 1558–1573.

Excoffier, L., & Foll, M. (2011). fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics, 27(9), 1332–1334.

Figuet, E., Nabholz, B., Bonneau, M., Mas Carrio, E., Nadachowska-Brzyska, K., Ellegren, H., & Galtier, N. (2016). Life History Traits, Protein Evolution, and the Nearly Neutral Theory in Amniotes. Molecular Biology and Evolution, 33(6), 1517–1527.

Gossmann, T. I., Shanmugasundram, A., Börno, S., Duvaux, L., Lemaire, C., Kuhl, H., … Ralser, M. (2019). Ice-Age Climate Adaptations Trap the Alpine Marmot in a State of Low Genetic Diversity. Current Biology: CB, 29(10), 1712–1720.e7.

Gossmann, T. I., Woolfit, M., & Eyre-Walker, A. (2011). Quantifying the Variation in the Effective Population Size Within a Genome. Genetics, 189(4), 1389–1402.

Grabherr, M. G., Russell, P., Meyer, M., Mauceli, E., Alföldi, J., Di Palma, F., & Lindblad-Toh, K. (2010). Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics, 26(9), 1145–1151.

Graham, B. A., & Burg, T. M. (2012). Molecular markers provide insights into contemporary and historic gene flow for a non-migratory species. Journal of Avian Biology, 43(3), 198–214.

Grossen, C., Guillaume, F., Keller, L. F., & Croll, D. (2020). Purging of highly deleterious mutations through severe bottlenecks in Alpine ibex. Nature Communications, 11(1), 1001.

Han, F., Lamichhaney, S., Grant, B. R., Grant, P. R., Andersson, L., & Webster, M. T. (2017). Gene flow, ancient polymorphism, and ecological adaptation shape the genomic landscape of divergence among Darwin’s finches. Genome Research, 27(6), 1004–1015.

Henn, B. M., Botigué, L. R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B. K., … Bustamante, C. D. (2016). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proceedings of the National Academy of Sciences of the United States of America, 113(4), E440–E449.

Herrera-Álvarez, S., Karlsson, E., Ryder, O. A., Lindblad-Toh, K., & Crawford, A. J. (2020). How to Make a Rodent Giant: Genomic Basis and Tradeoffs of Gigantism in the Capybara, the World’s Largest Rodent. Molecular Biology and Evolution. doi: 10.1093/molbev/msaa285

Hewitt, G. (2000). The genetic legacy of the Quaternary ice ages. Nature, 405(6789), 907–913.

Hewitt, G. M. (2004). Genetic consequences of climatic oscillations in the Quaternary. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 359(1442), 183–195; discussion 195.

Hruska, J. P., & Manthey, J. D. (2021). De novo assembly of a chromosome-scale reference genome for the northern flicker Colaptes auratus. G3, 11(1). https://doi.org/10.1093/g3journal/jkaa026

Irwin, D. E. (2018). Sex chromosomes and speciation in birds and other ZW systems. Molecular Ecology, 27(19), 3831–3851.

Irwin, D. E., Alcaide, M., Delmore, K. E., Irwin, J. H., & Owens, G. L. (2016). Recurrent selection explains parallel evolution of genomic regions of high relative but low absolute differentiation in a ring species. Molecular Ecology, 25(18), 4488–4507.

Jackson, L. E., Jr. (1979). New evidence for the existence of an icefree corridor in the Rocky Mountain foothills near Calgary, Alberta, during Late Wisconsinan time. Anatomy & Physiology: Current Research, 107–111.

Jarvis, E. D., Mirarab, S., Aberer, A. J., Li, B. B., Houde, P., Li, C., … Zhang, G. (2014). Whole-genome analyses resolve early branches in the tree of life of modern birds. Science, 346(6215), 1320–1331.

Jensen, J. D., Payseur, B. A., Stephan, W., Aquadro, C. F., Lynch, M., Charlesworth, D., & Charlesworth, B. (2019). The importance of the Neutral Theory in 1968 and 50 years on: A response to Kern and Hahn 2018. Evolution; International Journal of Organic Evolution, 73(1), 111–114.

Jensen-Seaman, M. I. (2004). Comparative Recombination Rates in the Rat, Mouse, and Human Genomes. Genome Research, 14(4), 528–538.

Kardos, M., Husby, A., McFarlane, S. E., Qvarnström, A., & Ellegren, H. (2016). Whole-genome resequencing of extreme phenotypes in collared flycatchers highlights the difficulty of detecting quantitative trait loci in natural populations. Molecular Ecology Resources, 16(3), 727–741.

Kawakami, T., Smeds, L., Backström, N., Husby, A., Qvarnström, A., Mugal, C. F., … Ellegren, H. (2014). A high-density linkage map enables a second-generation collared flycatcher genome assembly and reveals the patterns of avian recombination rate variation and chromosomal evolution. Molecular Ecology, 23(16), 4035–4058.

Kern, A. D., & Hahn, M. W. (2018). The Neutral Theory in Light of Natural Selection. Molecular Biology and Evolution, 35(6), 1366–1371.

Kim, S. Y., Lohmueller, K. E., Albrechtsen, A., Li, Y., Korneliussen, T., Tian, G., … Nielsen, R. (2011). Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics, 12, 231.

Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press.

Kimura, M., & Crow, J. F. (1964). The number of alleles that can be maintained in a finite population. Genetics, 49(4), 725–738.

Kirkpatrick, M., & Jarne, P. (2000). The Effects of a Bottleneck on Inbreeding Depression and the Genetic Load. The American Naturalist, 155(2), 154–167.

Klicka, J., Spellman, G. M., Winker, K., Chua, V., & Smith, B. T. (2011). A phylogeographic and population genetic analysis of a widespread, sedentary North American bird: The Hairy Woodpecker (Picoides villosus). The Auk, 128(2), 346–362.

Knowles, L. L. (2001). Did the Pleistocene glaciations promote divergence? Tests of explicit refugial models in montane grasshoppers. Molecular Ecology, 10(3), 691–701.

Korneliussen, T. S., Albrechtsen, A., & Nielsen, R. (2014). ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics, 15(1), 356.

Korneliussen, T. S., Moltke, I., Albrechtsen, A., & Nielsen, R. (2013). Calculation of Tajima’s D and other neutrality test statistics from low depth next-generation sequencing data. BMC Bioinformatics, 14(1), 289.

Lamichhaney, S., Berglund, J., Almén, M. S., Maqbool, K., Grabherr, M., Martinez-Barrio, A., … Andersson, L. (2015). Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature, 518, 371–375.

Lessa, E. P., Cook, J. A., & Patton, J. L. (2003). Genetic footprints of demographic expansion in North America, but not Amazonia, during the Late Quaternary. Proceedings of the National Academy of Sciences, 100(18), 10331–10334.

Levin, I., Crittenden, L. B., & Dodgson, J. B. (1993). Genetic Map of the Chicken Z Chromosome Using Random Amplified Polymorphic DNA (RAPD) Markers. Genomics, 16(1), 224–230.

Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760.

Li, J., Li, H., Jakobsson, M., Li, S., Sjödin, P., & Lascoux, M. (2012). Joint analysis of demography and selection in population genetics: Where do we stand and where could we go? Molecular Ecology, 21(1), 28–44.

Liu, X., & Fu, Y.-X. (2020). Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome Biology, 21(1), 280.

Loehr, J., Worley, K., Grapputo, A., Carey, J., Veitch, A., & Coltman, D. W. (2006). Evidence for cryptic glacial refugia from North American mountain sheep mitochondrial DNA. Journal of Evolutionary Biology, 19(2), 419–430.

Mank, J. E., Axelsson, E., & Ellegren, H. (2007). Fast-X on the Z: rapid evolution of sex-linked genes in birds. Genome Research, 17(5), 618–624.

Manthey, J. D., Klicka, J., & Spellman, G. M. (2011). Cryptic diversity in a widespread North American songbird: Phylogeography of the Brown Creeper (Certhia americana). Molecular Phylogenetics and Evolution, 58(3), 502–512.

Matthey-Doret, R., & Whitlock, M. C. (2019). Background selection and FST: Consequences for detecting local adaptation. Molecular Ecology, 28(17), 3902–3914.

Mattila, T. M., Laenen, B., Horvath, R., Hämälä, T., Savolainen, O., & Slotte, T. (2019). Impact of demography on linked selection in two outcrossing Brassicaceae species. Ecology and Evolution, 9(17), 9532–9545.

Maynard Smith, J., & Haigh, J. (2007). The hitch-hiking effect of a favourable gene. Genetics Research, 89(5-6), 391–403.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., … DePristo, M. A. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297–1303.

Miller, J. B., Pickett, B. D., & Ridge, P. G. (2019). JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm. Bioinformatics, 35(4), 546–552.

Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., von Haeseler, A., & Lanfear, R. (2020). IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution, 37(5), 1530–1534.

Mugal, C. F., Nabholz, B., & Ellegren, H. (2013). Genome-wide analysis in chicken reveals that local levels of genetic diversity are mainly governed by the rate of recombination. BMC Genomics, 14(1), 86.

Mugal, C. F., Weber, C. C., & Ellegren, H. (2015). GC-biased gene conversion links the recombination landscape and demography to genomic base composition: GC-biased gene conversion drives genomic base composition across a wide range of species. BioEssays: News and Reviews in Molecular, Cellular and Developmental Biology, 37(12), 1317–1326.

Nabholz, B., Künstner, A., Wang, R., Jarvis, E. D., & Ellegren, H. (2011). Dynamic evolution of base composition: causes and consequences in avian phylogenomics. Molecular Biology and Evolution, 28(8), 2197–2210.

Nadachowska-Brzyska, K., Li, C., Smeds, L., Zhang, G., & Ellegren, H. (2015). Temporal dynamics of avian populations during Pleistocene revealed by whole-genome sequences. Current Biology: CB, 25(10), 1375–1380.

Nunziata, S. O., & Weisrock, D. W. (2018). Estimation of contemporary effective population size and population declines using RAD sequence data. Heredity, 120(3), 196–207.

Ohta, T. (1973). Slightly deleterious mutant substitutions in evolution. Nature, 246(5428), 96–98.

Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 32(2), 292–294.

Ouellet, H. R. (1977). Biosystematics and ecology of Picoides villosus (L.) and P. pubescens (L.) (Aves Picidae). McGill University.

Pedersen, M. W., Ruter, A., Schweger, C., Friebe, H., Staff, R. A., Kjeldsen, K. K., … Willerslev, E. (2016). Postglacial viability and colonization in North America’s ice-free corridor. Nature, 1–15.

Petkova, D., Novembre, J., & Stephens, M. (2016). Visualizing spatial population structure with estimated effective migration surfaces. Nature Genetics, 48(1), 94–100.

Philippe, H., de Vienne, D. M., Ranwez, V., Roure, B., Baurain, D., & Delsuc, F. (2017). Pitfalls in supermatrix phylogenomics. European Journal of Taxonomy, (283). doi: 10.5852/ejt.2017.283

Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155(2), 945–959.

Pruett, C. L., & Winker, K. (2008). Evidence for cryptic northern refugia among high- and temperate-latitude species in Beringia. Climatic Change, 86(1-2), 23–27.

Pulgarín-R, P. C., & Burg, T. M. (2012). Genetic signals of demographic expansion in Downy Woodpecker (Picoides pubescens) after the Last North American Glacial Maximum. PloS One, 7(7), e40412.

Ranwez, V., Harispe, S., Delsuc, F., & Douzery, E. J. P. (2011). MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons. PLoS ONE, 6, e22594. doi: 10.1371/journal.pone.0022594

Reich, D. E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P. C., Richter, D. J., … Lander, E. S. (2001). Linkage disequilibrium in the human genome. Nature, 411(6834), 199–204.

Reid, B. N., Kass, J. M., Wollney, S., Jensen, E. L., Russello, M. A., Viola, E. M., … Naro-Maciel, E. (2018). Disentangling the genetic effects of refugial isolation and range expansion in a trans-continentally distributed species. Heredity. doi: 10.1038/s41437-018-0135-5

Renaut, S., Grassa, C. J., Yeaman, S., Moyers, B. T., Lai, Z., Kane, N. C., … Rieseberg, L. H. (2013). Genomic islands of divergence are not affected by geography of speciation in sunflowers. Nature Communications, 4(1), 1827.

Robinson, J. A., Brown, C., Kim, B. Y., Lohmueller, K. E., & Wayne, R. K. (2018). Purging of Strongly Deleterious Mutations Explains Long-Term Persistence and Absence of Inbreeding Depression in Island Foxes. Current Biology: CB, 28(21), 3487–3494.e4.

Rougemont, Q., Moore, J.-S., Leroy, T., Normandeau, E., Rondeau, E. B., Withler, R. E., … Bernatchez, L. (2020). Demographic history shaped geographical patterns of deleterious mutation load in a broadly distributed Pacific Salmon. PLoS Genetics, 16(8), e1008348.

Rutter, N. W. (1984). Pleistocene history of the western Canadian ice-free corridor. Quaternary Stratigraphy of Canada: A Canadian Contribution to the IGCP Project, 24, 49–56.

Sakamoto, Y., Ishiguro, M., & Kitagawa, G. (1986). Akaike information criterion statistics (pp. 902–926). Dordrecht, The Netherlands: D. Reidel.

Sauer, J. R., Link, W. A., Fallon, J. E., Pardieck, K. L., & Ziolkowski, D. J. (2017). The North American Breeding Bird Survey: Results and Analysis 1966–2017 Version 2.07.2017. USGS Patuxent Wildlife Research Center.

Schield, D. R., Pasquesi, G. I. M., Perry, B. W., Adams, R. H., Nikolakis, Z. L., Westfall, A. K., … Castoe, T. A. (2020). Snake recombination landscapes are concentrated in functional regions despite PRDM9. Molecular Biology and Evolution, 37(5), 1272–1294.

Schmid, M., Nanda, I., Guttenbach, M., Steinlein, C., Hoehn, M., Schartl, M., … Mizuno, S. (2000). First report on chicken genes and chromosomes 2000. Cytogenetic and Genome Research, 90(3-4), 169–218.

Schrempf, D., Minh, B. Q., De Maio, N., von Haeseler, A., & Kosiol, C. (2016). Reversible polymorphism-aware phylogenetic models and their application to tree inference. Journal of Theoretical Biology, 407, 362–370.

Shafer, A. B. A., Cullingham, C. I., Côté, S. D., & Coltman, D. W. (2010). Of glaciers and refugia: A decade of study sheds new light on the phylogeography of northwestern North America. Molecular Ecology, 19(21), 4589–4621.

Simonsen, K. L., Churchill, G. A., & Aquadro, C. F. (1995). Properties of statistical tests of neutrality for DNA polymorphism data. Genetics, 141(1), 413–429.

Simons, Y. B., & Sella, G. (2016). The impact of recent population history on the deleterious mutation load in humans and close evolutionary relatives. Current Opinion in Genetics & Development, 41, 150–158.

Simons, Y. B., Turchin, M. C., Pritchard, J. K., & Sella, G. (2014). The deleterious mutation load is insensitive to recent population history. Nature Genetics, 46(3), 220–224.

Singhal, S., Leffler, E. M., Sannareddy, K., Turner, I., Venn, O., Hooper, D. M., … Przeworski, M. (2015). Stable recombination hotspots in birds. Science, 350(6263), 928–932.

Skotte, L., Korneliussen, T. S., & Albrechtsen, A. (2013). Estimating individual admixture proportions from next generation sequencing data. Genetics, 195(3), 693–702.

Smeds, L., Qvarnström, A., & Ellegren, H. (2016). Direct estimate of the rate of germline mutation in a bird. Genome Research, 26(9), 1211–1218.

Smith, B. T., Seeholzer, G. F., Harvey, M. G., Cuervo, A. M., & Brumfield, R. T. (2017). A latitudinal phylogeographic diversity gradient in birds. PLoS Biology, 15(4), e2001073.

Smukowski, C. S., & Noor, M. A. F. (2011). Recombination rate variation in closely related species. Heredity, 107(6), 496–508.

Stankowski, S., Chase, M. A., Fuiten, A. M., Rodrigues, M. F., Ralph, P. L., & Streisfeld, M. A. (2019). Widespread selection and gene flow shape the genomic landscape during a radiation of monkeyflowers. PLoS Biology, 17(7), e3000391.

Sundström, H., Webster, M. T., & Ellegren, H. (2004). Reduced Variation on the Chicken Z Chromosome. Genetics, 167(1), 377–385. doi: 10.1534/genetics.167.1.377

Tacutu, R., Thornton, D., Johnson, E., Budovsky, A., Barardo, D., Craig, T., … de Magalhães, J. P. (2018). Human Ageing Genomic Resources: new and updated databases. Nucleic Acids Research, 46(D1), D1083–D1090.

Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123(3), 585–595.

Talla, V., Soler, L., Kawakami, T., Dincă, V., Vila, R., Friberg, M., … Backström, N. (2019). Dissecting the Effects of Selection and Mutation on Genetic Diversity in Three Wood White (Leptidea) Butterfly Species. Genome Biology and Evolution, 11(10), 2875–2886.

Toews, D. P. L., & Brelsford, A. (2012). The biogeography of mitochondrial and nuclear discordance in animals. Molecular Ecology, 21(16), 3907–3930.

Van Doren, B. M., Campagna, L., Helm, B., Illera, J. C., Lovette, I. J., & Liedvogel, M. (2017). Correlated patterns of genetic diversity and differentiation across an avian family. Molecular Ecology, 26(15), 3982–3997.

Vijay, N., Weissensteiner, M., Burri, R., Kawakami, T., Ellegren, H., & Wolf, J. B. W. (2017). Genomewide patterns of variation in genetic diversity are shared among populations, species and higher-order taxa. Molecular Ecology, 26(16), 4284–4295.

Volker, M., Backstrom, N., Skinner, B. M., Langley, E. J., Bunzey, S. K., Ellegren, H., & Griffin, D. K. (2010). Copy number variation, chromosome rearrangement, and their association with recombination during avian evolution. Genome Research, 20(4), 503–511.

Walstrom, V. W., Klicka, J., & Spellman, G. M. (2012). Speciation in the White-breasted Nuthatch (Sitta carolinensis): a multilocus perspective. Molecular Ecology, 21(4), 907–920.

Waltari, E., Hijmans, R. J., Peterson, A. T., Nyári, Á. S., Perkins, S. L., & Guralnick, R. P. (2007). Locating Pleistocene refugia: comparing phylogeographic and ecological niche model predic- tions. PloS One, 2(7), e563.

Wang, J., Street, N. R., Park, E., Liu, J., & Ingvarsson, P. K. (2020). Evidence for widespread selection in shaping the genomic landscape during speciation of Populus. Molecular Ecology, 29(6), 1120–1136.

Wang, J., Street, N. R., Scofield, D. G., & Ingvarsson, P. K. (2016). Natural selection and recombination rate variation shape nucleotide polymorphism across the genomes of three related Populus species. Genetics, 202(3), 1185–1200.

Wang, X. J., Hu, Q. J., Guo, X. Y., Wang, K., Ru, D. F., German, D. A., … Liu, J. Q. (2018). Demographic expansion and genetic load of the halophyte model plant Eutrema salsugineum. Molecular Ecology, 27(14), 2943–2955.

Waterhouse, R. M., Seppey, M., Simão, F. A., Manni, M., Ioannidis, P., Klioutchnikov, G., … Zdobnov, E. M. (2018). BUSCO applications from quality assessments to gene prediction and phylogenomics. Molecular Biology and Evolution, 35(3), 543–548.

Watterson, G. A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology, 7(2), 256–276.

Weber, C. C., Boussau, B., Romiguier, J., Jarvis, E. D., & Ellegren, H. (2014). Evidence for GC-biased gene conversion as a driver of between-lineage differences in avian base composition. Genome Biology, 15(12), 549.

Webster, M. T., Axelsson, E., & Ellegren, H. (2006). Strong regional biases in nucleotide substitution in the chicken genome. Molecular Biology and Evolution, 23(6), 1203–1216.

Wehrens, R., & Mevik, B.-H. (2007). The pls package: principal component and partial least squares regression in R. Retrieved from https://repository.ubn.ru.nl/bitstream/handle/2066/36604/36604.pdf

Weibel, A. C., & Moore, W. S. (2005). Plumage convergence in Picoides woodpeckers based on a molecular phylogeny, with emphasis on convergence in Downy and Hairy woodpeckers. The Condor, 107(4), 797–809.

Willeit, M., Ganopolski, A., Calov, R., & Brovkin, V. (2019). Mid-Pleistocene transition in glacial cycles explained by declining CO2 and regolith removal. Science Advances, 5(4), eaav7337.

Willi, Y., Fracassetti, M., Zoller, S., & Van Buskirk, J. (2018). Accumulation of Mutational Load at the Edges of a Species Range. Molecular Biology and Evolution, 35(4), 781–791.

Wilson Sayres, M. A. (2018). Genetic diversity on the sex chromosomes. Genome Biology and Evolution, 10(4), 1064–1078.

Wu, T. D., & Watanabe, C. K. (2005). GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics, 21(9), 1859–1875.

Xue, Y., Prado-Martinez, J., Sudmant, P. H., Narasimhan, V., Ayub, Q., Szpak, M., … Scally, A. (2015). Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding. Science, 348(6231), 242–245.

Xu, L., Wa Sin, S. Y., Grayson, P., Edwards, S. V., & Sackton, T. B. (2019). Evolutionary Dynamics of Sex Chromosomes of Paleognathous Birds. Genome Biology and Evolution, 11(8), 2376–2390.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24(8), 1586–1591.

Zhang, C., Dong, S.-S., Xu, J.-Y., He, W.-M., & Yang, T.-L. (2018). PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics, (October), 1–3.

Zhang, G., Li, C., Li, Q., Li, B. B., Larkin, D. M., Lee, C., … Froman, D. P. (2014). Comparative genomics reveals insights into avian genome evolution and adaptation. Science, 346(6215), 1311–1320.

Zheng, X., Levine, D., Shen, J., Gogarten, S. M., Laurie, C., & Weir, B. S. (2012). A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics, 28(24), 3326–3328.

Zhou, Q., Zhang, J., Bachtrog, D., An, N., Huang, Q., Jarvis, E. D., … Zhang, G. (2014). Complex evolutionary trajectories of sex chromosomes across bird taxa. Science, 346(6215), 1246338.

Zink, R. M. (1996). Comparative phylogeography in North American birds. Evolution; International Journal of Organic Evolution, 50(1), 308–317.

Zink, R. M., Klicka, J., & Barber, B. R. (2004). The tempo of avian diversification during the Quaternary. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 359(1442), 215–220.

2.9. Supplemental Material

Table 2.S1. Sample information.

Species | Sample ID | Population | Institution | Voucher number | Sex
D. pubescens | PP-NE-26 | NE | AMNH | DOT 17125 | F
D. pubescens | PP-NE-37 | NE | AMNH | DOT 20863 | F
D. pubescens | PP-NE-38 | NE | AMNH | DOT 20864 | F
D. pubescens | PP-NE-39 | NE | AMNH | DOT 20865 | F
D. pubescens | PP-NE-40 | NE | AMNH | DOT 20866 | F
D. pubescens | PP-NE-42 | NE | AMNH | DOT 21195 | M
D. pubescens | PP-NE-43 | NE | AMNH | DOT 21196 | F
D. pubescens | PP-NE-47 | NE | CUMV | 55468 | F
D. pubescens | PP-NE-48 | NE | CUMV | 55567 | F
D. pubescens | PP-NE-49 | NE | CUMV | 55647 | M
D. pubescens | PP-MW-2 | MW | FMNH | 442445 | M
D. pubescens | PP-MW-3 | MW | FMNH | 442446 | M
D. pubescens | PP-MW-4 | MW | FMNH | 442447 | M
D. pubescens | PP-MW-7 | MW | FMNH | 461726 | M
D. pubescens | PP-MW-11 | MW | FMNH | 485368 | F
D. pubescens | PP-MW-18 | MW | MMNH | 47208 | F
D. pubescens | PP-MW-19 | MW | MMNH | 47209 | M
D. pubescens | PP-MW-20 | MW | MMNH | 47292 | M
D. pubescens | PP-MW-21 | MW | MMNH | 47293 | F
D. pubescens | PP-MW-22 | MW | MMNH | 49295 | M
D. pubescens | PP-SR-9 | SR | DMNS | 44330 | M
D. pubescens | PP-SR-10 | SR | DMNS | 44404 | F
D. pubescens | PP-SR-12 | SR | MSB | 26384 | M
D. pubescens | PP-SR-13 | SR | MSB | 26652 | M
D. pubescens | PP-SR-15 | SR | MSB | 28961 | F
D. pubescens | PP-SR-16 | SR | MSB | 29394 | F
D. pubescens | PP-SR-17 | SR | MSB | 29766 | F
D. pubescens | PP-SR-18 | SR | MSB | 30563 | F
D. pubescens | PP-SR-19 | SR | MSB | 30599 | M
D. pubescens | PP-SR-21 | SR | MSB | 41048 | M
D. pubescens | PP-NW-5 | NW | UWBM | 79382 | F
D. pubescens | PP-NW-8 | NW | UWBM | 81721 | F
D. pubescens | PP-NW-9 | NW | UWBM | 85226 | M
D. pubescens | PP-NW-10 | NW | UWBM | 85940 | F
D. pubescens | PP-NW-11 | NW | UWBM | 89835 | F
D. pubescens | PP-NW-12 | NW | UWBM | 91432 | F
D. pubescens | PP-NW-13 | NW | UWBM | 119080 | M
D. pubescens | PP-NW-15 | NW | UWBM | 121308 | M
D. pubescens | PP-NW-16 | NW | UWBM | 121544 | F
D. pubescens | PP-NW-18 | NW | UWBM | 122655 | F
D. pubescens | PP-AK-1 | AK | AMNH | LRM10 | M
D. pubescens | PP-AK-2 | AK | AMNH | LRM46 | F
D. pubescens | PP-AK-3 | AK | AMNH | LRM50 | F
D. pubescens | PP-AK-4 | AK | AMNH | LRM49 | F
D. pubescens | PP-AK-5 | AK | AMNH | LRM52 | F
D. pubescens | PP-AK-6 | AK | AMNH | LRM51 | F
D. pubescens | PP-AK-7 | AK | AMNH | LRM47 | M
D. pubescens | PP-AK-8 | AK | AMNH | LRM48 | F
D. pubescens | PP-AK-9 | AK | AMNH | LRM53 | M
D. pubescens | PP-AK-10 | AK | AMNH | LRM74 | F
D. pubescens | PP-NR-01 | NR | AMNH | LRM054 | F
D. pubescens | PP-NR-02 | NR | AMNH | LRM058 | F
D. pubescens | PP-NR-03 | NR | AMNH | LRM059 | M
D. pubescens | PP-NR-04 | NR | AMNH | LRM067 | M
D. pubescens | PP-NR-05 | NR | AMNH | LRM068 | M
D. pubescens | PP-NR-06 | NR | AMNH | LRM069 | F
D. pubescens | PP-NR-07 | NR | AMNH | LRM070 | M
D. pubescens | PP-NR-08 | NR | AMNH | LRM071 | M
D. pubescens | PP-NR-09 | NR | AMNH | LRM072 | F
D. pubescens | PP-NR-10 | NR | AMNH | LRM073 | F
D. pubescens | PP-SE-01 | SE | LSUMZ | B45645 | F
D. pubescens | PP-SE-02 | SE | LSUMZ | B45652 |
D. pubescens | PP-SE-08 | SE | LSUMZ | B56712 | M
D. pubescens | PP-SE-09 | SE | LSUMZ | B56924 | F
D. pubescens | PP-SE-10 | SE | LSUMZ | B56926 | M
D. pubescens | PP-SE-12 | SE | LSUMZ | B57578 |
D. pubescens | PP-SE-14 | SE | LSUMZ | B62479 |
D. pubescens | PP-SE-15 | SE | LSUMZ | B62481 |
D. pubescens | PP-SE-16 | SE | UWBM | 96589 | F
D. pubescens | PP-SE-18 | SE | UWBM | 105426 | F
D. villosus | PV-NE-01 | NE | AMNH | | F
D. villosus | PV-NE-27 | NE | AMNH | DOT 18646 | M
D. villosus | PV-NE-28 | NE | AMNH | DOT 18681 | M
D. villosus | PV-NE-29 | NE | AMNH | DOT 18682 | F
D. villosus | PV-NE-31 | NE | AMNH | DOT 18789 | M
D. villosus | PV-NE-32 | NE | AMNH | DOT 20174 | M
D. villosus | PV-NE-36 | NE | AMNH | DOT 21148 | M
D. villosus | PV-NE-37 | NE | AMNH | DOT 21149 | F
D. villosus | PV-NE-47 | NE | AMNH | DOT 22674 | M
D. villosus | PV-NE-51 | NE | AMNH | DOT 23032 | F
D. villosus | PV-NW-8 | NW | MMNH | 47181 | F
D. villosus | PV-NW-12 | NW | MMNH | 47215 | F
D. villosus | PV-NW-16 | NW | UWBM | 49955 | F
D. villosus | PV-NW-17 | NW | UWBM | 49958 | F
D. villosus | PV-NW-18 | NW | UWBM | 62606 | F
D. villosus | PV-NW-21 | NW | UWBM | 79778 | M
D. villosus | PV-NW-23 | NW | UWBM | 84259 | F
D. villosus | PV-NW-25 | NW | UWBM | 89951 | M
D. villosus | PV-NW-26 | NW | UWBM | 109529 | M
D. villosus | PV-NW-27 | NW | UWBM | 109530 | F
D. villosus | PV-SE-1 | SE | LSUMZ | 804 | M
D. villosus | PV-SE-2 | SE | LSUMZ | 3840 | M
D. villosus | PV-SE-3 | SE | LSUMZ | 8532 | F
D. villosus | PV-SE-4 | SE | UWBM | 116288 | F
D. villosus | PV-SE-5 | SE | UWBM | 116289 | F
D. villosus | PV-SE-6 | SE | AMNH | LRM86 | M
D. villosus | PV-SE-7 | SE | AMNH | LRM87 | M
D. villosus | PV-SE-8 | SE | AMNH | LRM88 | M
D. villosus | PV-SE-9 | SE | AMNH | LRM89 | F
D. villosus | PV-SE-10 | SE | AMNH | LRM90 | F
D. villosus | PV-MW-2 | MW | FMNH | 387974 | M
D. villosus | PV-MW-3 | MW | FMNH | 432747 | F
D. villosus | PV-MW-7 | MW | FMNH | 477418 | F
D. villosus | PV-MW-9 | MW | FMNH | 480382 | M
D. villosus | PV-MW-11 | MW | FMNH | 486061 | F
D. villosus | PV-MW-12 | MW | FMNH | 487467 | F
D. villosus | PV-MW-13 | MW | MMNH | 43391 | F
D. villosus | PV-MW-14 | MW | MMNH | 43570 | M
D. villosus | PV-MW-15 | MW | MMNH | 47218 | M
D. villosus | PV-MW-16 | MW | MMNH | 47646 | F
D. villosus | PV-SR-6 | SR | DMNS | 43511 |
D. villosus | PV-SR-12 | SR | DMNS | 46730 | M
D. villosus | PV-SR-16 | SR | DMNS | 47414 | M
D. villosus | PV-SR-18 | SR | MSB | 26715 | F
D. villosus | PV-SR-19 | SR | MSB | 29251 | F
D. villosus | PV-SR-23 | SR | MSB | 39737 | F
D. villosus | PV-SR-24 | SR | MSB | 40436 | M
D. villosus | PV-SR-27 | SR | MSB | 40790 | F
D. villosus | PV-SR-30 | SR | MSB | 45060 | F
D. villosus | PV-SR-31 | SR | MSB | 45150 | F
D. villosus | PV-AK-1 | AK | AMNH | LHD1133 | M
D. villosus | PV-AK-2 | AK | AMNH | LHD1134 | F
D. villosus | PV-AK-3 | AK | AMNH | LHD1137 | M
D. villosus | PV-AK-4 | AK | AMNH | LHD1106 | F
D. villosus | PV-AK-5 | AK | AMNH | LHD1105 | M
D. villosus | PV-AK-6 | AK | AMNH | LHD1107 | M
D. villosus | PV-AK-7 | AK | AMNH | LHD1108 | F
D. villosus | PV-AK-8 | AK | AMNH | LHD1117 | F
D. villosus | PV-AK-9 | AK | AMNH | LHD1118 | M
D. villosus | PV-AK-10 | AK | AMNH | LHD1138 | F
D. villosus | PV-NR-1 | NR | AMNH | LRM055 | F
D. villosus | PV-NR-2 | NR | AMNH | LRM056 | F
D. villosus | PV-NR-3 | NR | AMNH | LRM057 | M
D. villosus | PV-NR-4 | NR | AMNH | LRM060 | M
D. villosus | PV-NR-5 | NR | AMNH | LRM061 | F
D. villosus | PV-NR-6 | NR | AMNH | LRM062 | F
D. villosus | PV-NR-7 | NR | AMNH | LRM063 | M
D. villosus | PV-NR-8 | NR | AMNH | LRM064 |
D. villosus | PV-NR-9 | NR | AMNH | LRM065 | F
D. villosus | PV-NR-10 | NR | AMNH | LRM066 | M

NE: Northeast; SE: Southeast; MW: Midwest; SR: Southern Rockies; NR: Northern Rockies; NW: Pacific Northwest; AK: Alaska. F: female; M: male. AMNH: American Museum of Natural History; CUMV: Cornell University Museum of Vertebrates; FMNH: Field Museum of Natural History; MMNH: Midwest Museum of Natural History; DMNS: Denver Museum of Nature & Science; MSB: Museum of Southwestern Biology; UWBM: University of Washington Burke Museum; LSUMZ: Louisiana State University Museum of Zoology.

Table 2.S2. Model selection in fastsimcoal2.

Species | Model | Max ln(L) | Number of parameters | AIC | Relative weight
Downy Woodpecker | Single ancestral population | -4361743.523 | 26 | 8723539 | 1
Downy Woodpecker | Two ancestral populations | -4365454.312 | 30 | 8730969 | 0
Hairy Woodpecker | Single ancestral population | -3572642.346 | 26 | 7145337 | 0
Hairy Woodpecker | Two ancestral populations | -3571022.1 | 30 | 7142104 | 1
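The AIC values and relative weights in Table 2.S2 follow from the standard definitions AIC = 2k − 2 ln L and Akaike weights w_i ∝ exp(−Δ_i/2), where Δ_i is a model's AIC minus the lowest AIC. The minimal sketch below recomputes the Downy Woodpecker rows. It takes the Max ln(L) column at face value as a natural-log likelihood, and the function names are illustrative only.

```python
import math


def aic(max_ln_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * max_ln_likelihood


def akaike_weights(aics):
    """Relative likelihood exp(-delta/2) of each model, normalised to sum to 1."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


# Downy Woodpecker values from Table 2.S2: (Max ln(L), number of parameters)
downy = {"single ancestral population": (-4361743.523, 26),
         "two ancestral populations": (-4365454.312, 30)}
downy_aic = {model: aic(lnl, k) for model, (lnl, k) in downy.items()}
weights = akaike_weights(list(downy_aic.values()))
for (model, value), w in zip(downy_aic.items(), weights):
    print(f"{model}: AIC = {value:.0f}, weight = {w:.3f}")
```

With a ΔAIC of roughly 7,430, exp(−Δ/2) underflows to zero, which is why the table reports weights of exactly 1 and 0.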


Table 2.S3. Parameter estimates for the best model in fastsimcoal2 and their respective 95% confidence intervals.

Downy Woodpecker

Parameter | Ne_AK | Ne_E | Ne_R | Ne_NW | B-Ne_AK | B-Ne_E | B-Ne_R | B-Ne_NW
Estimate | 1908367 | 8185752 | 9765060 | 1744261 | 533967 | 454210 | 204653 | 1556039
Lower 95% CI | 1016566 | 1046137 | 1015006 | 1015893 | 4.55E+02 | 8.32E+02 | 0 | 7.99E+02
Upper 95% CI | 3003138 | 11273989 | 35811351 | 2675784 | 2.23E+07 | 4426606 | 1782097 | 1.45E+08

Parameter | Anc_Ne | T_div | T_div_E | T_div_W | T_exp_AK | T_exp_E | T_exp_R | T_exp_NW
Estimate | 7964011 | 516553 | 251877 | 252967 | 383864 | 309059 | 2.53E-06 | 2.15E-07
Lower 95% CI | 5653252 | 241530 | 114423 | 115212 | 112731 | 113475 | 1.52E-06 | 1.75E-09
Upper 95% CI | 20261433 | 910177 | 599007 | 613959 | 629105 | 634054 | 3.15E-06 | 2.26E-06

Parameter | M_R>AK | M_AK>R | M_NW>AK | M_AK>NW | M_R>E | M_E>R | M_NW>E | M_E>NW
Estimate | 1.27E-06 | 3.35E-07 | 3.24E-06 | 1.63E-07 | 3.26E-08 | 8.29E-09 | 5.13E-07 | 4.83E-06
Lower 95% CI | 2.92E-09 | 1.81E-09 | 1.87E-06 | 1.95E-09 | 2.10E-09 | 1.50E-09 | 6.66E-10 | 6.15E-10
Upper 95% CI | 2.80E-06 | 6.53E-06 | 3.51E-06 | 3.74E-06 | 2.35E-06 | 3.55E-06 | 8.34E-05 | 1.08E-04

Parameter | M_NW>R | M_R>NW
Estimate | 1.66E-07 | 3.70E-06
Lower 95% CI | 7.47E-10 | 6.91E-10
Upper 95% CI | 2.73E-05 | 6.95E-05

Hairy Woodpecker

Parameter | Ne_AK | Ne_E | Ne_R | Ne_NW | B-Ne_AK | B-Ne_E | B-Ne_R | B-Ne_NW
Estimate | 7567355 | 18365368 | 47953365 | 5059890 | 804416 | 1486729 | 4693555 | 1515766
Lower 95% CI | 1015730 | 10152829 | 20313563 | 1015965 | 31680 | 1187902 | 12159534 | 22617
Upper 95% CI | 12108980 | 86705380 | 94821663 | 6615990 | 44404284 | 1913030 | 51097323 | 2.3E+07

Parameter | Ne_AK+E | Ne_NW+R | Anc_Ne | T_div | T_div_E | T_div_W | T_exp_AK | T_exp_E
Estimate | 33510699 | 30829288 | 2751382 | 873091 | 425555 | 508965 | 319711 | 347851
Lower 95% CI | 27603525 | 10155085 | 1033504 | 848731 | 272556 | 405623 | 115956 | 115921
Upper 95% CI | 52560389 | 67639132 | 3651731 | 926396 | 486276 | 571448 | 529659 | 633082

Parameter | T_exp_R | T_exp_NW | M_E>AK | M_AK>E | M_R>AK | M_AK>R | M_NW>AK | M_AK>NW
Estimate | 350026 | 335266 | 8.30E-07 | 8.89E-08 | 3.22E-07 | 9.78E-09 | 7.99E-07 | 3.90E-07
Lower 95% CI | 307984 | 117679 | 1.98E-09 | 6.70E-09 | 1.34E-09 | 2.76E-09 | 1.28E-09 | 1.73E-09
Upper 95% CI | 356689 | 429556 | 5.81E-06 | ‡ | 3.55E-06 | ‡ | 4.11E-06 | 3.88E-06

Parameter | M_R>E | M_E>R | M_NW>E | M_E>NW | M_NW>R | M_R>NW
Estimate | 3.87E-07 | 4.07E-10 | 5.17E-07 | 9.91E-07 | 4.23E-09 | 1.45E-06
Lower 95% CI | 1.24E-09 | 3.07E-10 | 1.05E-07 | 1.81E-09 | 2.41E-09 | 2.22E-06
Upper 95% CI | ‡ | 5.03E-10 | ‡ | 5.62E-06 | ‡ | 4.02E-06

Ne_[pop]: current Ne in population pop; B-Ne_[pop]: Ne during the bottleneck in population pop; Ne_[pop1+pop2]: Ne in the population ancestral to pop1 and pop2; Anc_Ne: ancestral Ne; T_div: time of divergence (in years); T_div_E: time of divergence of the Eastern clade (in years); T_div_W: time of divergence of the Western clade (in years); T_exp_[pop]: time of expansion of population pop (in years); M_[pop1>pop2]: migration rate (in percent of Ne) from pop1 to pop2. ‡: Confidence intervals could not be determined because the point estimate fell outside the range of values estimated from bootstrap simulations.

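Table 2.S4. Principal component regression.

Species | Explanatory variable | PC1 R² (%) | PC2 R² (%) | PC3 R² (%)
Downy Woodpecker | Recombination rate | 0.08 | 11.89 | 0.11
Downy Woodpecker | Gene density | 1.7 | 0.03 | 7.8
Downy Woodpecker | GC content | 1.65 | 0.36 | 7.58
Downy Woodpecker | Total | 3.45 | 12.3 | 15.51
Hairy Woodpecker | Recombination rate | 0.35 | 17.35 | 0.01
Hairy Woodpecker | Gene density | 2.84 | 1.02 | 7
Hairy Woodpecker | GC content | 2.96 | 0.06 | 7.34
Hairy Woodpecker | Total | 6.18 | 18.6 | 14.47

Values are the percentage of variance explained (R²) in each principal component.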

Figure 2.S1. Correlated landscape of diversity in Downy (top) and Hairy (bottom) Woodpecker. (a) Manhattan plot of nucleotide diversity (θπ) along the genome. Each point represents a non-overlapping 100 kb window. Colors depict different chromosomes. (b) Scatterplot showing the correlation in nucleotide diversity between Downy and Hairy Woodpecker. Illustrations reproduced with permission from Lynx Edicions.
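Figure 2.S1 summarizes nucleotide diversity in non-overlapping 100 kb windows and its correlation between the two species. The sketch below shows one way to compute these quantities under stated assumptions. SNPs are biallelic and given as alternate-allele counts. Both species' SNPs sit on the same reference coordinates. Every site in a window is treated as callable, so summed diversity is divided by the full window length, whereas real analyses typically divide by the number of callable sites. Function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

WINDOW = 100_000  # non-overlapping 100 kb windows, as in Figure 2.S1


def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity for a biallelic SNP: n/(n-1) * 2p(1-p)."""
    p = alt_count / n_chrom
    return n_chrom / (n_chrom - 1) * 2 * p * (1 - p)


def windowed_pi(snps, window=WINDOW):
    """snps: iterable of (chrom, pos, alt_count, n_chrom) tuples.
    Returns {(chrom, window_index): theta_pi}, assuming every site in a window
    is callable (summed diversity divided by the full window length)."""
    sums = defaultdict(float)
    for chrom, pos, alt, n in snps:
        sums[(chrom, pos // window)] += site_pi(alt, n)
    return {key: total / window for key, total in sums.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cross_species_r(pi_a, pi_b):
    """Correlate diversity over windows present in both species
    (assumes both datasets use the same reference coordinates)."""
    shared = sorted(set(pi_a) & set(pi_b))
    return pearson_r([pi_a[k] for k in shared], [pi_b[k] for k in shared])


# Hypothetical toy data: (chromosome, position, alt-allele count, chromosomes sampled)
downy = windowed_pi([("chr1", 10, 5, 20), ("chr1", 150_000, 10, 20), ("chr2", 5, 2, 20)])
hairy = windowed_pi([("chr1", 20, 4, 20), ("chr1", 160_000, 9, 20), ("chr2", 50, 1, 20)])
print(round(cross_species_r(downy, hairy), 3))
```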


Figure 2.S2. Boxplot of recombination rate in each chromosome of Downy Woodpecker. Horizontal lines indicate medians, boxes span the interquartile range (IQR), and points represent outliers.

Figure 2.S3. Boxplot of recombination rate in each chromosome of Hairy Woodpecker. Horizontal lines indicate medians, boxes span the interquartile range (IQR), and points represent outliers.


Figure 2.S4. Correlation among genomic variables in Downy Woodpecker. Colder colors represent positive values of Pearson’s r. Warmer colors represent negative values of Pearson’s r.


Figure 2.S5. Correlation among genomic variables in Hairy Woodpecker. Colder colors represent positive values of Pearson’s r. Warmer colors represent negative values of Pearson’s r.


Chapter 3

PARALLEL GENOMIC SIGNATURES OF LOCAL ADAPTATION ACROSS A CONTINENTAL-SCALE ENVIRONMENTAL GRADIENT

*L.R. Moreira is the lead author on a version of this manuscript in preparation for publication.


3.1. Abstract

The repeated and independent use of genetic mechanisms for local adaptation provides a window into the role of constraint and stochasticity in species evolution. While most studies investigating the genomics of parallel local adaptation focus on closely related lineages distributed across sharp environmental contrasts, little is known about adaptation across continental scales that encompass extreme gradients. To fill this gap, we investigate the genomic architecture of parallel local adaptation in Downy (Dryobates pubescens) and Hairy Woodpecker (D. villosus), two ecologically similar species that co-occur across a complex environmental gradient in North America. Downy and Hairy Woodpecker exhibit remarkably parallel patterns of geographic variation in plumage and body size: birds are generally darker in the west and larger at higher latitudes and elevations. Their long-lasting coexistence in this shared landscape has led them to experience very similar biotic and abiotic selective pressures. If parallel genetic mechanisms for local adaptation exist, we expect the same loci to have been targeted by selection in the two species. We tested this hypothesis by comparing signatures of selection across several populations of Downy and Hairy Woodpecker using whole-genome resequencing data. Our results uncovered limited evidence of genomic parallelism at the SNP level, but an exceedingly large overlap in candidate genes, indicating that climatic adaptation was more repeatable than expected. We found a large number of SNPs showing correlation with temperature and precipitation, most of which were in non-coding regions, suggesting a dominant role of regulatory change in adaptive evolution. Population comparisons detected several candidate genes exhibiting evidence of selective sweeps (e.g., elevated FST and extended homozygosity). A closer look at these loci revealed a range of biological processes, including immune response, nutritional metabolism, mitochondrial respiration, and embryonic development. Our genomic scan for selection also identified potential candidates associated with key phenotypic traits in Downy and Hairy Woodpecker, such as genes in the IGF signaling pathway, putatively linked to differences in body size, and the melanoregulin gene (MREG), potentially involved in plumage variation. Our results provide compelling evidence of the dominant role of genomic parallelism in local adaptation across a broad-scale environmental gradient.

3.2. Introduction

Parallel local adaptation provides insight into the relative roles of determinism and stochasticity in evolution (Lobkovsky and Koonin 2012; Orgogozo et al. 2015). One question of particular interest is how often independent lineages subjected to the same selective forces adapt via the same genetic mechanisms. Gould (1989) used the popular metaphor of “replaying the tape of life” to express his view that evolution is dominated by stochastic processes: if we could go back in time and replay the tape of life, the outcome would be unpredictable. However, empirical studies on the genomics of adaptation have suggested that parallelism at the genetic level is much more common than previously thought (Conte et al. 2012; Martin and Orgogozo 2013; Holliday et al. 2016; Fraser and Whiting 2020; Konečná et al. 2021). Common genomic regions have been implicated in adaptation to freshwater environments in sticklebacks (Cresko et al. 2004; Colosimo et al. 2005; Chan et al. 2010; Terekhanova et al. 2019; Magalhaes et al. 2020), cold tolerance in Drosophila (Pool et al. 2016), and high altitude in birds (McCracken et al. 2009; Natarajan et al. 2015; Lim et al. 2019).

The repeated use of the same loci during independent episodes of adaptation not only supports parallelism but also suggests that particular constraints might limit the number of evolutionary pathways available for adaptive evolution. For example, some genes might contribute more often to adaptation owing to their larger phenotypic effects, lower functional redundancy, higher mutation rates, or fewer epistatic and pleiotropic interactions (Stern and Orgogozo 2008; Conte et al. 2012; Rosenblum et al. 2014). Under such circumstances, closely related taxa diverging along a similar environmental gradient, such as a latitudinal cline, will tend to exhibit some degree of genetic parallelism, as natural selection operates on organisms with a similar genetic background (Holliday et al. 2016; Yeaman et al. 2016). On the other hand, local adaptation may evolve via different genetic mechanisms when adaptive mutations are highly redundant and genomic backgrounds differ greatly (Yuan and Stinchcombe 2020; Fraser and Whiting 2020). For example, in divergent populations of Arabidopsis lyrata, climatic adaptation is characterized by lineage-specific signatures of selection (Walden et al. 2020). Similarly, multiple molecular mechanisms underlie thermal adaptation in mice of the genus Peromyscus that inhabit contrasting temperature zones (Colella et al. 2020).

While most studies on parallel adaptation have focused on simple environmental contrasts (e.g., high vs. low elevation, temperature gradients, marine vs. freshwater environments; Hohenlohe et al. 2010; Foll et al. 2014; Pool et al. 2016; Walsh et al. 2019), little is known about the mechanisms underlying adaptation at a continental scale, encompassing multiple pronounced contrasts. Moreover, much of our knowledge of the repeatability of adaptive evolution is based on comparisons of either very closely related lineages, whose shared standing genetic variation leads to evolutionary nonindependence (Pool et al. 2016; Lamichhaney et al. 2017; Fang et al. 2020; Magalhaes et al. 2020), or very distantly related species, whose diverse demographic histories and genomic backgrounds complicate the interpretation of genomic parallelism (Wang et al. 2013; Yeaman et al. 2016; Walsh et al. 2019; Walters et al. 2020).

We investigate the genomic architecture of parallel local adaptation in Downy (Dryobates pubescens) and Hairy Woodpecker (D. villosus), two sympatric and ecologically similar species that co-occur across a complex environmental gradient in North America. These two woodpeckers are year-round residents of a variety of forested habitats, including coniferous, deciduous, and mixed forests, and are found from Alaska to Florida, although populations of Hairy Woodpecker also occur in Central America and the Bahamas (Ouellet 1977). Despite belonging to different clades that separated more than 8 million years ago (mya; Dufort 2016; Shakya et al. 2017), Downy and Hairy Woodpecker resemble each other more closely than they resemble other species in their respective clades (Weibel and Moore 2005). This plumage convergence is hypothesized to be the result of interspecific social dominance mimicry (Prum and Samuelson 2012), a phenomenon commonly found in woodpeckers (Lammertink et al. 2016; Miller et al. 2019; Fernández et al. 2020). Both species also exhibit extensive geographic variation in plumage and body size throughout their range (Ouellet 1977). In general, their parallel geographic variation complies with major ecogeographical rules: individuals of both species are darker in the more humid west and larger at higher latitudes and elevations, a pattern commonly observed in North American birds (Rand 1961; James 1970; Cooper 2018).

This apparent association between phenotype and environment suggests a potential effect of natural selection on phenotypes. Because Downy and Hairy Woodpecker share the same distribution, have similar ecologies, have evolved in a shared landscape over the same period of time, and exhibit parallel phenotypes, they are natural evolutionary replicates for asking questions about parallel natural selection.

The recent population history of Downy and Hairy Woodpecker has been strongly impacted by the Pleistocene glaciations. Over the past one million years, populations of both woodpeckers experienced repeated cycles of bottleneck and population expansion as a result of the advance and retreat of Pleistocene glaciers in North America (Chapter 2). These changes in habitat availability during the Pleistocene led to isolation in multiple glacial refugia, which, along with heterogeneous gene flow across the landscape, caused population differentiation. Despite this dynamic demographic history, Downy and Hairy Woodpecker were able to maintain very large effective population sizes, which might have facilitated the action of natural selection. Genomic evidence suggests that linked selection was an important driver of variation in nucleotide diversity along the genome (Chapter 2).

As Downy and Hairy Woodpeckers independently colonized previously glaciated habitats in North America, founder populations had to adapt to a number of novel environmental stressors, such as exceptionally low temperatures, seasonal food scarcity, different diets, and new pathogens and competitors. Genetic variants that allowed individuals to overcome these challenges were likely to have rapidly increased in frequency, leaving clear signatures in the genome (Savolainen et al. 2013; Hoban et al. 2016). Populations currently persisting in extreme environments, such as the boreal forests, where temperatures drop below -10°C, must have adapted to these local conditions. Prolonged winters, in particular, present a major physiological challenge to small birds, as they must maintain high metabolic rates in the face of severe cold, reduced access to food, and fewer hours of daylight (Steen 1958; Liknes and Swanson 1996). Traits that confer greater cold resistance (e.g., behavioral and physiological adjustments) are therefore likely to be targeted by natural selection (Kendeigh and Blem 1974).

Downy Woodpecker basal and peak metabolic rates are significantly higher in winter than in summer, which indicates that individuals are capable of elevating their metabolic rates to compensate for heat loss (Liknes and Swanson 1996; Swanson 2006). Common garden experiments in birds suggest that intraspecific variation in metabolic rates has a strong genetic component and is frequently subject to divergent natural selection (Wikelski et al. 2003; Broggi et al. 2005). Body size also plays an important adaptive role: heat loss is more pronounced in smaller birds than in larger ones because of their higher surface-to-volume ratios. Consequently, optimal body mass seems to vary according to climate (James 1970). Downy and Hairy Woodpeckers fit this expectation, showing variation that closely follows Bergmann’s ecogeographic rule, which states that individuals in cooler climates are generally larger than conspecifics living in warmer climates (Rand 1961; Hamilton 1961; James 1970; McNab 1971; Ouellet 1977).
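The surface-to-volume argument can be made explicit with a standard scaling sketch. It assumes geometric similarity, meaning bodies of the same shape that differ only in a characteristic length L. This is an idealization, not a measured property of these woodpeckers.

```latex
\begin{align*}
S &\propto L^{2}, \qquad V \propto L^{3}
  &&\Longrightarrow\quad \frac{S}{V} \propto L^{-1},\\
M &\propto V \;\Longrightarrow\; L \propto M^{1/3}
  &&\Longrightarrow\quad \frac{S}{M} \propto M^{-1/3},\\
\frac{(S/M)\big|_{8M}}{(S/M)\big|_{M}} &= 8^{-1/3} = \tfrac{1}{2}.
\end{align*}
```

Under this idealization, an eightfold increase in body mass halves the surface area available for heat loss per unit of body mass. This is the physical reasoning usually invoked for Bergmann’s rule.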

Considering the variety of biotic and abiotic factors that impose spatially varying selective pressures on populations of Downy and Hairy Woodpeckers, we employed a suite of genomic approaches to identify signatures of local adaptation. We resequenced the whole genomes of 140 individuals of Downy and Hairy Woodpecker to characterize the genetic basis of local adaptation in these two species and to test whether similar genes or loci have been targeted by natural selection for adaptation to a shared environment. We predict that if constraints exist on the number of available pathways for adaptation, a greater-than-expected number of variants, genes, or biological processes should show signatures of selection in both species (i.e., genomic parallelism). On the other hand, if multiple evolutionary solutions exist for local adaptation and the outcomes of natural selection are completely contingent on past stochastic events, we expect signatures of selection to be largely species-specific. This study thus provides insight into the role of parallelism in adaptive evolution and presents exciting new candidates putatively implicated in key phenotypic differences in woodpeckers.

3.2. Results & Discussion

We performed whole-genome resequencing of 140 individuals of Downy and Hairy Woodpecker (70 samples per species) from seven geographic locations representing major bioclimatic domains in temperate North America (n = 10 individuals per population; Figure 3.1) to identify loci contributing to local adaptation. Sampled locations covered most of the climatic variation observed across the species’ ranges, which is characterized by multiple clines in temperature and precipitation (Figure 3.1). Sequencing coverage ranged from 1.4x to 12.5x (mean = 5.1x) in Downy Woodpecker and from 1.1x to 11.7x (mean = 4.5x) in Hairy Woodpecker. Our genotype-calling pipeline produced datasets of 7,009,778 and 4,579,046 biallelic SNPs in Downy and Hairy Woodpecker, respectively.

3.2.1. Genotype-environment association analysis (GEA)

We found that a large number of SNPs showed a direct association with environmental variables. To assemble climatic data, we used the 19 bioclimatic variables from the WorldClim database (Hijmans et al. 2005) and performed separate principal component analyses (PCA) on the variables related to temperature (BIO1–BIO11) and on those related to precipitation (BIO12–BIO19). The first three principal components (PC1–3) of each set were retained for our genotype-environment association analysis, explaining 94–98% of the total environmental variation. We then used a latent factor mixed model implemented in LFMM 2 (Caye et al. 2019) to test for an association between the genotypes at each SNP and PC1–3 of temperature and precipitation. LFMM 2 controls for population structure by modeling K latent factors (Frichot et al. 2013). We assumed K = 4, as this was the number of genetic clusters identified in previous analyses (Chapter 2). A Benjamini–Hochberg p-value correction (Benjamini and Hochberg 1995) with a false discovery rate (FDR) < 0.01 was then applied to detect SNPs highly correlated with the environment. Our results revealed multiple SNPs showing an association with environmental variables in both species (Table 3.1; Table 3.S4). More SNPs were correlated with precipitation than with temperature, indicating that precipitation might be a stronger selective force driving local adaptation. In Downy and Hairy Woodpecker, a total of 87,277 (1.24%) and 11,064 (0.24%) unique SNPs, respectively, were associated with precipitation variables, whereas only 3,723 (0.05%) and 5,048 (0.11%) SNPs were correlated with temperature variables. For temperature, most SNPs (3,195 in Downy Woodpecker and 4,572 in Hairy Woodpecker) were associated with PC2, which loaded most heavily on hot extremes (BIO5, BIO8, and BIO10). For precipitation, a large proportion of SNPs (58,412 in Downy Woodpecker and 2,045 in Hairy Woodpecker) were associated with PC1, which loaded most strongly on annual precipitation (BIO12), and, in Downy Woodpecker, with PC2 (43,578 SNPs), which loaded most heavily on precipitation seasonality (BIO15). In Hairy Woodpecker, most SNPs (8,938) were correlated with PC3 of precipitation, which loaded most heavily on precipitation in the warmest quarter (BIO18). Our results are consistent with a meta-analysis of directional selection in plants and animals showing that precipitation predicts 20–40% of the variation in selection, whereas temperature predicts little (Siepielski et al. 2017).
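The multiple-testing step described above can be illustrated with a minimal sketch of the Benjamini–Hochberg step-up rule. The text does not say which implementation was used, so this is not the analysis code. The function names and toy p-values are hypothetical; in practice the input would be the per-SNP p-values from LFMM 2 for one environmental PC. Taking the union across PCs is one plausible reading of how "unique SNPs" per variable set are counted.

```python
def benjamini_hochberg(pvalues, fdr=0.01):
    """Return the sorted indices of tests rejected by the Benjamini-Hochberg
    step-up procedure at the given false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ranks by ascending p-value
    k_max = 0
    # Find the largest rank k with p_(k) <= (k / m) * FDR; reject ranks 1..k.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    return sorted(order[:k_max])


def unique_candidates(pvalues_by_pc, fdr=0.01):
    """Union of significant SNP indices across several PCs (e.g. PC1-3 of precipitation)."""
    hits = set()
    for pvals in pvalues_by_pc:
        hits.update(benjamini_hochberg(pvals, fdr))
    return hits


# Hypothetical p-values for five SNPs tested against two PCs.
pc1 = [0.0001, 0.0004, 0.02, 0.5, 0.003]
pc2 = [0.9, 0.00001, 0.8, 0.7, 0.6]
print(benjamini_hochberg(pc1))                # [0, 1, 4]
print(sorted(unique_candidates([pc1, pc2])))  # [0, 1, 4]
```

Note that SNP 4 (p = 0.003) is retained even though 0.003 exceeds 0.01/5. The step-up rule compares each p-value with a threshold that grows with its rank.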


Figure 3.1. Environmental variation across the ranges of Downy and Hairy Woodpecker. (a) Map depicting the sympatric range of Downy and Hairy Woodpecker, the locations of the study samples (dots), and their respective populations of origin (large circles). Colors on the map are based on the principal component analysis (PCA) of the bioclimatic data shown in (b). We converted scores on the first three principal components into RGB values (PC1: red; PC2: green; PC3: blue) to represent variation in climate; similar colors represent similar climates. (b) PCA of the 19 bioclimatic variables from the WorldClim database (Hijmans et al. 2005). Background points (grey) represent 1,000 randomly sampled points across the sympatric range of both focal species. Points from each population are shown in different colors. (c) Biplot of the PCA of bioclimatic data indicating the correlations among variables, as well as the direction and magnitude of their contributions to the first principal components. AK: Alaska; MW: Midwest; NE: Northeast; NR: Northern Rockies; NW: Pacific Northwest; SE: Southeast; SR: Southern Rockies.
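The caption states only that scores on the first three principal components were converted into RGB values. The sketch below illustrates one way to do this. Standardizing the variables before the PCA, rescaling each component independently to 0–255 with min-max scaling, and using random numbers as a stand-in for the 1,000 background points by 19 bioclimatic variables are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """PCA of an (n_points x n_variables) matrix via SVD of the standardized data.
    Returns scores on the first components and the fraction of variance each explains."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def scores_to_rgb(scores):
    """Min-max rescale PC1, PC2, PC3 independently to 0-255 (red, green, blue)."""
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    return np.round(255 * (scores - lo) / (hi - lo)).astype(np.uint8)


# Hypothetical stand-in for 1,000 background points x 19 bioclimatic variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 19))
scores, explained = scores_to_rgb_input = pca_scores(X)
rgb = scores_to_rgb(scores)
print(rgb.shape, explained.round(3))
```

The signs of principal components returned by an SVD are arbitrary. Flipping a component inverts its color channel without changing which locations share similar climates.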

The majority of candidate SNPs correlated with climatic variables were found in intergenic regions (63.82–65.1%; Figure 3.3a–b), suggesting that the targets of local adaptation may lie predominantly in regulatory regions. A small proportion of these SNPs were located either upstream (5.66–5.98%) or downstream (4.53–5.58%) of a gene. Of the 24.69–25.39% of SNPs within genes, the majority were intronic (95.47–96.95%), whereas only 2.87–4.0% were in coding sequence. A very small fraction of SNPs correlated with climatic variables were nonsynonymous (0.23–0.4%), suggesting limited adaptation at the level of protein amino acid sequence. Although a correlation with the environment does not necessarily imply that a particular SNP is adaptive, this finding suggests, first, that climatic adaptation is likely highly polygenic, acquired via many variants of small effect, and second, that most of the variation putatively implicated in local adaptation is likely regulatory in nature (i.e., cis- and trans-acting factors), as evidenced by the large number of non-coding candidate SNPs. This predominance of putatively regulatory candidates is consistent with the hypothesis that regulatory regions are more likely to contribute to adaptation because of their larger mutational target and lower pleiotropic effects compared with coding regions (Stern and Orgogozo 2008; Barghi et al. 2020). Our findings highlight the importance of regulatory innovation in local adaptation and trait evolution, as reported in several recent studies (Chan et al. 2010; Bozicevic et al. 2016; Martinez Barrio et al. 2016; Phifer-Rixey et al. 2018; Yusuf et al. 2020).

We found that an array of biological processes is represented by our set of annotated genes harboring one or more candidate SNPs (either within or in close proximity). Gene Ontology (GO) enrichment analyses showed an overrepresentation of functional categories related to anatomical structure development (GO:0048856; p < 0.001), response to stimulus (GO:0050896; p < 0.001), immune system (GO:0002376; p < 0.001), lipid metabolism (GO:0006629; p < 0.001), and reproduction (GO:0000003; p < 0.001), among others (Figure 3.2). Such diversity of gene functions suggests that, at a broad environmental scale, local adaptation involves a multitude of phenotypic, behavioral, and physiological traits, most of which have a complex genetic underpinning (Bozicevic et al. 2016; Pool et al. 2016; Exposito-Alonso et al. 2019; Bourgeois and Boissinot 2019; Jackson et al. 2020). Among the candidates associated with temperature, we found several genes directly involved in the response to heat, particularly through the HSF1-mediated heat shock response (e.g., Downy: MAPK1, GSK3B, and NUP153; Hairy: NUP85, NUP93, NUP98, PDCL3, TMEM48), including the heat shock factor-binding protein 1 (HSBP1), a well-known negative regulator of HSF1 (Zhang et al. 2010; Wang et al. 2013). We also identified genes linked to pathways related to cold response, such as cold-induced thermogenesis (e.g., Downy: PLCL2, DNAJC3, IP6K1, FABP5, DOCK7, ZNF423; Hairy: DNAJC3, DHRS7B, PDGFC, PPARGC1A, TMEM135, FLCN, GPR120).
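The enrichment p-values above are reported without naming the test. A one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test, is a common choice for GO overrepresentation and is assumed here purely for illustration. The function name and all numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p(k, n_candidates, n_term, n_background):
    """One-sided hypergeometric p-value P(X >= k): the chance of drawing at least
    k genes annotated to a GO term when n_candidates genes are sampled without
    replacement from n_background genes, n_term of which carry the annotation."""
    upper = min(n_candidates, n_term)
    hits = sum(comb(n_term, i) * comb(n_background - n_term, n_candidates - i)
               for i in range(k, upper + 1))
    return hits / comb(n_background, n_candidates)


# Hypothetical: 40 of 500 candidate genes carry a term that annotates 400 of
# 15,000 background genes (about 13.3 would be expected by chance).
print(enrichment_p(40, 500, 400, 15_000))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division. In a real analysis the resulting p-values would also be corrected for testing many GO terms.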

Table 3.1. Number of candidate SNPs associated with each predictor variable in LFMM 2.

Environmental variable | Strongest variable contribution | Candidate SNPs*, Downy Woodpecker | Candidate SNPs*, Hairy Woodpecker
PC1 - Temperature | Cold extremes | 245 | 94
PC2 - Temperature | Hot extremes | 3,195 | 4,572
PC3 - Temperature | Mean diurnal range | 377 | 477
PC1 - Precipitation | Annual precipitation | 58,412 | 2,045
PC2 - Precipitation | Precipitation seasonality | 43,578 | 144
PC3 - Precipitation | Precipitation in warmest quarter | 4,291 | 8,938

PC: principal component. * Total SNPs: Downy = 7,009,778; Hairy = 4,579,046.


Figure 3.2. Scatter plot representing enriched biological processes in the candidate sets identified in LFMM 2 (Caye et al. 2019). GO terms are clustered by semantic similarity after multidimensional scaling in REVIGO (Supek et al. 2011). Plots show candidates associated with (a) temperature and (b) precipitation in Downy Woodpecker, and (c) temperature and (d) precipitation in Hairy Woodpecker. Colors indicate the level of significance, and circle size is scaled by the number of genes in the total genomic database.

We employed additional approaches to detect variants under selection in order to refine and validate our list of candidate SNPs obtained from LFMM. Because variants under selection tend to deviate from the general patterns of population structure under neutral evolution, we used the R package PCAdapt (Luu et al. 2017; Privé et al. 2020) to identify population structure outliers.
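PCAdapt's outlier statistic can be sketched in simplified form (Luu et al. 2017). Each SNP is regressed on the first K principal components. The resulting z-scores are combined into a squared Mahalanobis distance, which is rescaled by a genomic inflation factor and compared with a chi-square distribution with K degrees of freedom. This is not the package's code. The real implementation uses robust estimates of the mean and covariance and handles missing data. The sketch below uses plain estimates, assumes a complete 0/1/2 genotype matrix, and uses hypothetical function names. The chi-square tail probability is computed from its standard recurrence, so NumPy is the only dependency.

```python
import math
import numpy as np


def chi2_sf(x, k):
    """Upper-tail probability of chi-square(k) for integer k >= 1, via
    Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    q, df = (math.erfc(math.sqrt(x / 2)), 1) if k % 2 else (math.exp(-x / 2), 2)
    while df < k:
        q += (x / 2) ** (df / 2) * math.exp(-x / 2) / math.gamma(df / 2 + 1)
        df += 2
    return q


def chi2_median(k, iters=200):
    """Median of chi-square(k), found by bisection on chi2_sf(x, k) = 0.5."""
    lo, hi = 0.0, k + 10.0 * math.sqrt(2 * k) + 10.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if chi2_sf(mid, k) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pcadapt_like(G, K):
    """Simplified PCAdapt-style outlier p-values.
    G: individuals x SNPs matrix of 0/1/2 genotypes (no missing data); K: number of PCs."""
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0                        # monomorphic SNPs: avoid division by zero
    Z = (G - G.mean(axis=0)) / sd            # standardize each SNP
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    U = U[:, :K]                             # orthonormal scores of the first K PCs
    beta = U.T @ Z                           # K x SNPs regression slopes (U'U = I)
    resid = Z - U @ beta
    sigma = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    sigma[sigma == 0] = np.finfo(float).eps
    z = (beta / sigma).T                     # SNPs x K z-scores
    centered = z - z.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(centered, rowvar=False)))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)  # squared Mahalanobis
    gif = np.median(d2) / chi2_median(K)     # genomic inflation factor
    return np.array([chi2_sf(x / gif, K) for x in d2])


# Simulated example: 100 individuals from two populations, 500 weakly
# differentiated SNPs plus 5 strongly differentiated ones (indices 500-504).
rng = np.random.default_rng(1)
p_pop1 = np.r_[np.full(500, 0.4), np.full(5, 0.05)]
p_pop2 = np.r_[np.full(500, 0.6), np.full(5, 0.95)]
G = rng.binomial(2, np.vstack([np.tile(p_pop1, (50, 1)), np.tile(p_pop2, (50, 1))]))
pvals = pcadapt_like(G, K=2)
print(np.argsort(pvals)[:5])
```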

We found 158,705 outlier SNPs in Downy Woodpecker and 233,943 outlier SNPs in Hairy Woodpecker deviating from the global population structure (FDR < 0.01). Between 0.16% and 4% of these SNPs overlapped with our candidate set from LFMM (Figure 3.3c–d). Next, we searched for SNPs showing a strong signature of a selective sweep using H-scan (Messer Lab website 2014), a method that computes the average length of pairwise homozygosity tracts (H) around each SNP to detect extended tracts of homozygosity. For this analysis, we considered as outliers all variants in the top 1% of H values within each genetic cluster. This analysis yielded 127,933 unique SNPs in Downy Woodpecker and 93,931 unique SNPs in Hairy Woodpecker. Although many candidate SNPs from H-scan overlapped with candidates from LFMM (1–2,255 SNPs) and PCAdapt (55–3,296 SNPs), only a small fraction of SNPs were shared among all three methods in Downy Woodpecker (temperature = 6 SNPs; precipitation = 148 SNPs; Figure 3.3c; Table 3.S1), and no SNP was shared among all three methods in Hairy Woodpecker (Figure 3.3d). For temperature, five of the six candidate SNPs were intergenic, whereas a single SNP was located ~2.8 kb downstream of the keratin type I cytoskeletal 15 (KRT15) gene, which is involved in epidermal development and keratinization (Leube et al. 1988). Interestingly, the same SNP was also a candidate in our three-way comparison for precipitation. In the precipitation set, 75 SNPs were intergenic (50.6%), 22 were downstream or upstream of a gene (14.8%), and 50 were intronic (33.7%). A single SNP was exonic but silent (i.e., synonymous). Genes related to eye morphogenesis (TDRD7, HCN1, PTPRM, and IFT122) were overrepresented in the three-way candidate set for precipitation (GO:0048592; p = 0.001). Because birds and mammals adjust their vision to differences in light level across latitudinal and habitat gradients (Pearce and Dunbar 2012; Thomas et al. 2002; Martínez-Ortega et al. 2014), it is possible that selection has acted on vision in populations of Downy Woodpecker.
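The H statistic can be illustrated with a simplified sketch: for every pair of haplotypes, extend outward from the focal SNP until the two haplotypes first differ, record the length of that identical tract, and average over all pairs; SNPs in the top 1% of H are then flagged. This is an illustration under stated assumptions (phased 0/1 haplotypes, tracts measured between the outermost matching SNPs, no special handling of chromosome ends), not the H-scan program itself, and all function names are hypothetical.

```python
from itertools import combinations
import math


def pairwise_tract_length(h1, h2, positions, focal):
    """Length (bp) of the identical tract around `focal` shared by two haplotypes.

    Returns 0 if the haplotypes differ at the focal SNP; otherwise the tract
    runs between the outermost SNPs reached before the first mismatch.
    """
    if h1[focal] != h2[focal]:
        return 0
    left = focal
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = focal
    while right < len(positions) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return positions[right] - positions[left]


def h_statistic(haplotypes, positions, focal):
    """Average pairwise homozygosity tract length at one SNP."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(pairwise_tract_length(a, b, positions, focal) for a, b in pairs)
    return total / len(pairs)


def top_fraction(values, fraction=0.01):
    """Indices of the largest `fraction` of values (at least one index)."""
    k = max(1, math.ceil(len(values) * fraction))
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


if __name__ == "__main__":
    haps = ["0000", "0001", "1000"]  # toy phased haplotypes
    pos = [100, 200, 300, 400]       # toy SNP coordinates (bp)
    scores = [h_statistic(haps, pos, i) for i in range(len(pos))]
    print(scores, top_fraction(scores))
```

Long shared tracts around a SNP inflate H, which is why recently swept regions, where one haplotype has risen to high frequency, stand out at the top of the distribution.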


Figure 3.3. Candidate SNPs for local adaptation in Downy and Hairy Woodpecker and their effects. (a–b) Pie charts showing the fraction of candidate SNPs identified by the genotype-environment association analysis in each functional category. (c–d) Venn diagrams illustrating the overlap among the three methods used to detect SNPs under selection.

3.2.2. Parallelism at the genic level vs nucleotide level

We found genomic parallelism between Downy and Hairy Woodpecker at the genic level but not at the nucleotide level. To assess the degree of genomic parallelism in local adaptation to climate, we examined the overlap between candidate loci in the two species. At the nucleotide level, no SNP putatively associated with temperature (Figure 3.4a) and only four SNPs putatively associated with precipitation (Figure 3.4b) were shared between Downy and Hairy Woodpecker. Given that adaptive mutations are unlikely to arise at the same site in independent evolutionary lineages (Stern and Orgogozo 2008; Conte et al. 2012; Rosenblum et al. 2014; Storz 2016), it is not surprising that only a limited number of candidate SNPs overlap between the two species. We therefore tested whether genomic parallelism occurs instead at the genic level. If the same genes, rather than the same SNPs, contributed to local adaptation, we would expect a large number of overlapping candidate genes. A total of 216 and 1,957 shared orthologous genes (out of 15,407) were associated with temperature (Figure 3.4e) and precipitation (Figure 3.4f), respectively. An overlap this large is extremely unlikely to occur by chance: the probability of randomly drawing this many shared genes from the total genomic pool is near zero (Figure 3.4g–h). These results suggest that, at the genic level, climatic adaptation is more repeatable than expected given the highly polygenic nature of adaptive phenotypes (Yeaman et al. 2016).
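The chance expectation behind this comparison can be sketched with the hypergeometric distribution: if species A has n_a candidate genes and species B has n_b among N shared orthologs, the number shared purely by chance has mean n_a·n_b/N, and the probability of seeing at least the observed overlap is an upper-tail hypergeometric sum. The sketch below computes that tail exactly with Python integers. The per-species candidate-gene counts are not given in this section, so the values in the usage example are hypothetical, as are the function names.

```python
from math import comb


def expected_overlap(total, n_a, n_b):
    """Mean number of shared candidates if species B's set were drawn at random."""
    return n_a * n_b / total


def overlap_p_value(total, n_a, n_b, observed):
    """P(X >= observed), X ~ Hypergeometric(total, n_a, n_b).

    total    -- number of genes in the pool (e.g., shared orthologs)
    n_a      -- candidate genes in species A ("successes" in the pool)
    n_b      -- candidate genes in species B (genes drawn without replacement)
    observed -- candidate genes shared by both species
    """
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, i) * comb(total - n_a, n_b - i)
               for i in range(observed, upper + 1))
    return tail / comb(total, n_b)  # exact integer ratio, rounded once to float


if __name__ == "__main__":
    # Hypothetical candidate-set sizes, for illustration only.
    print(expected_overlap(15_407, 300, 400))  # about 7.8 genes shared by chance
    print(overlap_p_value(15_407, 300, 400, 30))
```

Working with exact integers avoids the underflow that a naive product of floating-point probabilities would hit when the tail probability is extremely small.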

The genetic architecture of traits involved in local adaptation plays an important role in genetic repeatability and could explain the high levels of parallelism observed between Downy and Hairy Woodpecker. Phenotypes controlled by a small number of genes of large effect are more likely to exhibit parallelism because of their lower functional redundancy (Vasemägi and Primmer 2005; Stern and Orgogozo 2008; Rosenblum et al. 2014). Similarly, particular properties of genes may make them easier targets for adaptation. For instance, some genes may be more susceptible to new mutations, either because they are near hotspots of recombination, insertion, and deletion or because of their larger size (Colosimo et al. 2004; Fraser and Whiting 2020). We did not find significant differences in recombination rate between SNPs putatively associated with climatic variables and presumably neutral sites (Kruskal-Wallis test, p = 0.37), which does not support this hypothesis. Constraints imposed by gene interactions may also affect the probability of parallel genetic evolution (Stern and Orgogozo 2008; Rosenblum et al. 2014; Fraser and Whiting 2020). For example, epistatic interactions, in which the effect of a gene depends on other genes, may limit the number of evolutionary routes available for adaptation (Conte et al. 2012; Storz 2016). In addition, pleiotropy, in which the same gene affects multiple distinct phenotypic traits, may also strongly influence parallelism (Chevin et al. 2010; Wang et al. 2010; Wagner and Zhang 2011; Hämälä et al. 2020). If mutations that produce beneficial effects on one trait cause deleterious effects on other traits (i.e., antagonistic pleiotropy), the number of viable pathways for adaptation is substantially reduced (Rosenblum et al. 2014). We hypothesize that pervasive gene interactions in the genomes of Downy and Hairy Woodpecker could have favored genic parallelism.
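The recombination-rate comparison is a rank-based Kruskal-Wallis test. With two groups (candidate vs. presumably neutral SNPs), the H statistic is approximately chi-square distributed with one degree of freedom, whose upper tail can be written with `math.erfc`. The sketch below is a from-scratch illustration under that two-group, large-sample assumption; in practice a library routine such as `scipy.stats.kruskal` would normally be used, and the function names here are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two samples, applying the standard tie correction."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every value tied: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(h / 2))  # chi-square upper tail, 1 degree of freedom
    return h, p


if __name__ == "__main__":
    # Toy recombination rates (cM/Mb) at candidate vs. neutral SNPs, hypothetical.
    print(kruskal_wallis_two_groups([1.2, 0.8, 2.1, 1.5], [1.1, 0.9, 1.9, 1.4, 1.6]))
```

Because the test works on ranks, it is insensitive to the strongly skewed distributions typical of recombination-rate estimates.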


Figure 3.4. Genomic parallelism in Downy and Hairy Woodpecker. Venn diagrams describing the number of candidate (a–b) SNPs or (e–f) genes from LFMM 2 (Caye et al. 2019) shared by Downy (left) and Hairy (right) Woodpecker. Cumulative hypergeometric distributions showing the probability of observing a given number of overlapping candidate (c–d) SNPs or (g–h) genes. The red dashed line indicates the empirical observation. Illustrations reproduced with permission from Lynx Edicions.


3.2.3. Signatures of elevated genetic differentiation

By scanning the genomes of Downy and Hairy Woodpecker, we characterized regions of elevated population differentiation (FST) relative to the genomic background. We estimated FST across 50 kb sliding windows in 10 kb increments using the genotype likelihood approach in ANGSD (Korneliussen et al., 2014). Any genomic window with an FST value more than five standard deviations above the genome-wide mean was considered an outlier. Because population expansion is expected to produce exceedingly long tails in the FST distribution, thereby confounding FST-outlier analyses, we chose a conservative cutoff, corresponding to approximately the top 3 × 10⁻⁵ % of a normal distribution (a one-tailed probability of about 2.9 × 10⁻⁷). Simulations have shown that, although conservative, this cutoff performs well in other systems with non-equilibrium demographies (Walsh et al. 2019). Nevertheless, we cannot exclude the possibility that some of our candidate loci are false positives and that others were missed. FST-outlier analysis across all 21 pairwise population comparisons revealed a number of outlier windows harboring candidate loci putatively under natural selection (Figure 3.5).
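The windowing and outlier rule described above can be sketched as follows. The sketch assumes per-site FST numerator and denominator components (as produced by a genotype-likelihood method) and combines them as a ratio of sums within each window, a common way to compute windowed FST. The exact estimator used by ANGSD, the input layout, and whether the sample or population standard deviation was used are assumptions here, and the function names are hypothetical. Windows that do not fit entirely on the chromosome are skipped.

```python
import math
import statistics


def sliding_windows(chrom_length, size=50_000, step=10_000):
    """Yield (start, end) half-open windows that fit entirely on the chromosome."""
    start = 0
    while start + size <= chrom_length:
        yield start, start + size
        start += step


def window_fst(sites, start, end):
    """Ratio-of-sums FST for sites in [start, end); None if no informative sites.

    sites -- iterable of (position, numerator, denominator) per-site components
    """
    num = den = 0.0
    for pos, a, b in sites:
        if start <= pos < end:
            num += a
            den += b
    return num / den if den > 0 else None


def outlier_cutoff(fst_values, n_sd=5.0):
    """Genome-wide mean plus n_sd sample standard deviations."""
    return statistics.mean(fst_values) + n_sd * statistics.stdev(fst_values)


def normal_upper_tail(z):
    """One-sided probability that a standard normal variate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    print(list(sliding_windows(100_000)))
    print(normal_upper_tail(5.0))  # tail mass beyond the five-SD cutoff
```

Averaging numerators and denominators separately (rather than averaging per-site FST values) keeps low-information sites from dominating a window.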

We asked whether the number of outlier windows detected in each population comparison was a function of the genome-wide FST or of the average environmental dissimilarity between populations. Mantel tests revealed a correlation between the number of outlier windows and the average environmental dissimilarity (Downy: Spearman’s r = 0.5, p = 0.01; Hairy: Spearman’s r = 0.42, p = 0.04; Figure 3.6e–f) but not the genome-wide FST (Downy: Spearman’s r = -0.27, p = 0.83; Hairy: Spearman’s r = -0.15, p = 0.73; Figure 3.6c–d). This result suggests that the detected candidates are likely associated with local adaptation rather than being an artifact of higher genome-wide population differentiation.
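A Mantel test asks whether two distance matrices over the same populations are correlated, using a permutation null in which the population labels of one matrix are shuffled (rows and columns together). The sketch below uses a Spearman correlation on the upper triangles, matching the statistic reported above, with a one-sided permutation p-value. The permutation count, seed, and function names are illustrative assumptions, not the settings used in this study.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order=None):
    """Upper-triangle entries, optionally after relabelling rows and columns."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, n_perm=999, seed=1):
    """One-sided Mantel test (positive association) using Spearman's rho."""
    x = upper_triangle(dist_a)
    r_obs = spearman(x, upper_triangle(dist_b))
    rng = random.Random(seed)
    order = list(range(len(dist_b)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute population labels of matrix B
        if spearman(x, upper_triangle(dist_b, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting whole populations, rather than individual matrix cells, preserves the non-independence of pairwise values that share a population. With seven populations there are only 7! = 5,040 distinct relabellings, so full enumeration would also be feasible.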


3.2.4. FST-outlier analysis reveals selection on multiple genes related to immune system and nutrition

Across all pairwise population comparisons, we found 90–503 (Downy Woodpecker) and 86–533 (Hairy Woodpecker) outlier windows showing elevated differentiation (Figure 3.5a–b). Most of these 50 kb regions harbored annotated genes: across all population comparisons, we found 572 and 610 candidate genes within regions of elevated FST in Downy and Hairy Woodpecker, respectively. These candidate genes encompassed a broad range of molecular functions and biological processes, including response to nutrients (e.g., OTC, PHEX, HMGCL, ABCG5, ABCG8, TGFBR2), embryonic development (e.g., LRP6, MSGN1, ALS2, MEOX1, SEMA3C, OVOL2, SKIL, PRKACB, CCNB2, MYO1E, FZD5, EOMES, TRIM71), organism growth and maturation (e.g., EZH2, SH3BP4, WNT7A, ASPM, IGFBP3, IGFR1, FZD7, SLITRK1), heat response (e.g., HSP90AA1), and melanogenesis (e.g., RAB38, SHROOM2, ADAMTS20, NF1).

These results support our findings from the genotype-environment association analysis, indicating that local adaptation is a complex process involving multiple molecular pathways.

Next, we compared candidate outlier windows in Downy and Hairy Woodpecker and identified 271 outlier windows (12.6% and 11.5%, respectively; with 139 annotated genes) shared between population comparisons of Downy (total unique outlier windows = 2,150) and Hairy Woodpecker (total unique outlier windows = 2,355). Parallel candidate genes showed an overrepresentation of biological processes associated with immune response against pathogens (e.g., CD36, TLR1B, MFHAS1; Table 3.2). Enriched gene ontologies included “innate immune response in mucosa” (GO:0002227; p < 0.001), “antibacterial humoral response” (GO:0019731; p = 0.019), “defense response to Gram-positive bacterium” (GO:0050830; p = 0.021), “antimicrobial humoral immune response mediated by antimicrobial peptide” (GO:0061844; p = 0.022), and “immune system development” (GO:0002520; p = 0.039). Birds living closer to the equator are hypothesized to mount stronger immune responses than birds at higher latitudes, owing either to exposure to a greater diversity of pathogens (“adjustment to pathogen load” hypothesis; Piersma 1997; Møller 1998; Lindström et al. 2004; Hasselquist 2007) or to their longer life expectancy and greater investment in self-maintenance (“pace-of-life” hypothesis; Ricklefs 1992; Tieleman et al. 2005; Edwards 2012; Tieleman 2018). In House Sparrows under common garden conditions, tropical individuals exhibited stronger immune responses than temperate ones (Martin et al. 2004). Birds are also thought to adjust their immune activity in response to the local environment and pathogen pressure (Rubenstein et al. 2008; Prüter et al. 2020). For instance, differential gene expression profiles of House Finches from the eastern US that had been exposed to the conjunctivitis-causing bacterium Mycoplasma gallisepticum for 12 years indicated that the innate immune system had been targeted by selection in response to previous infection, whereas no such effect was observed in a western US population with no history of infection (Bonneaud et al. 2012). Such geographic differences in pathogen load, ecophysiology, and life-history traits could explain the signatures of parallel selection for local adaptation observed in immune genes of Downy and Hairy Woodpecker.


Figure 3.5. FST-outlier analysis comparing the Alaska (AK) and Southeast (SE) populations in Downy (a) and Hairy (b) Woodpecker. Each dot represents the FST value estimated for a given 50 kb sliding window along the genome. Colors differentiate consecutive chromosomes. The red line indicates the genome-wide mean FST (Downy FST = 0.12; Hairy FST = 0.13) and the blue line indicates the cutoff of five standard deviations above the mean for a window to be considered an outlier (Downy FST = 0.32; Hairy FST = 0.34). Squares indicate the location of key annotated genes found within outlier windows. Illustrations reproduced with permission from Lynx Edicions.


Table 3.2. Enriched gene ontologies for parallel FST-outlier genes across all pairwise population comparisons. Significance was determined through a Fisher’s exact test with false discovery rate (FDR) correction.

GO ID | GO description | Candidate count | Genome count | Odds ratio | Corrected p-value
GO:0002227 | innate immune response in mucosa | 45 | 800 | 8.14 | 1.82E-19
GO:0006082 | organic acid metabolic process | 8 | 18 | 82.72 | 9.96E-10
GO:0006278 | RNA-dependent DNA biosynthetic process | 10 | 42 | 32.81 | 1.67E-09
GO:0006334 | nucleosome assembly | 9 | 42 | 28.40 | 3.98E-08
GO:0006342 | chromatin silencing | 7 | 23 | 44.91 | 2.21E-07
GO:0006352 | DNA-templated transcription | 8 | 84 | 10.85 | 0.0001
GO:0006508 | proteolysis | 4 | 13 | 44.63 | 0.0004
GO:0006805 | xenobiotic metabolic process | 4 | 13 | 44.63 | 0.0004
GO:0006970 | response to osmotic stress | 3 | 8 | 59.77 | 0.0030
GO:0007190 | activation of adenylate cyclase activity | 3 | 8 | 59.77 | 0.0030
GO:0008210 | estrogen metabolic process | 4 | 24 | 20.09 | 0.0045
GO:0009404 | toxin metabolic process | 3 | 11 | 37.40 | 0.0067
GO:0015074 | DNA integration | 3 | 11 | 37.40 | 0.0067
GO:0015671 | oxygen transport | 3 | 12 | 33.25 | 0.0076
GO:0017144 | drug metabolic process | 3 | 12 | 33.25 | 0.0076
GO:0019731 | antibacterial humoral response | 3 | 17 | 21.36 | 0.0190
GO:0032496 | response to lipopolysaccharide | 3 | 17 | 21.36 | 0.0190
GO:0035093 | spermatogenesis, exchange of chromosomal proteins | 2 | 4 | 98.89 | 0.0190
GO:0042744 | hydrogen peroxide catabolic process | 6 | 106 | 6.08 | 0.0208
GO:0043086 | negative regulation of catalytic activity | 10 | 303 | 3.51 | 0.0217
GO:0043408 | regulation of MAPK cascade | 4 | 43 | 10.29 | 0.0217
GO:0043567 | regulation of insulin-like growth factor receptor signaling pathway | 8 | 194 | 4.40 | 0.0217
GO:0050830 | defense response to Gram-positive bacterium | 3 | 19 | 18.70 | 0.0217
GO:0051552 | flavone metabolic process | 3 | 20 | 17.60 | 0.0217
GO:0052696 | flavonoid glucuronidation | 2 | 5 | 65.86 | 0.0217
GO:0052697 | xenobiotic glucuronidation | 2 | 5 | 65.86 | 0.0217
GO:0061844 | antimicrobial humoral immune response mediated by antimicrobial peptide | 4 | 45 | 9.79 | 0.0222
GO:0070980 | biphenyl catabolic process | 3 | 21 | 16.62 | 0.0232
GO:0070995 | NADPH oxidation | 2 | 6 | 49.47 | 0.0283
GO:0072592 | oxygen metabolic process | 4 | 50 | 8.72 | 0.0297
GO:0090502 | RNA phosphodiester bond hydrolysis, endonucleolytic | 8 | 218 | 3.89 | 0.0300
GO:0098869 | cellular oxidant detoxification | 3 | 24 | 14.25 | 0.0303
GO:1990418 | response to insulin-like growth factor stimulus | 2 | 7 | 39.61 | 0.0345
GO: gene ontology.
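Each row of Table 3.2 summarizes a 2 × 2 contingency table (candidate vs. non-candidate genes, annotated vs. not annotated to the term). The sketch below computes a one-sided Fisher's exact test for over-representation and the sample odds ratio, assuming the genome count includes the candidate genes and that all candidates belong to the genome pool. The total numbers of candidate and background genes are not listed in the table, so inputs are hypothetical, the subsequent FDR correction is omitted, and `go_enrichment` is a hypothetical name; a library routine such as `scipy.stats.fisher_exact(table, alternative="greater")` would normally be used instead.

```python
from math import comb, inf


def go_enrichment(cand_in_term, cand_total, genome_in_term, genome_total):
    """One-sided Fisher's exact test and sample odds ratio for one GO term.

    2 x 2 table (assumes candidates are a subset of the genome pool):
                      in term        not in term
        candidates    a              b
        other genes   c              d
    """
    a = cand_in_term
    b = cand_total - cand_in_term
    c = genome_in_term - cand_in_term
    d = (genome_total - cand_total) - c
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent")
    odds_ratio = (a * d) / (b * c) if b * c > 0 else inf
    # P(at least `a` candidates fall in the term) under random draws
    # of cand_total genes without replacement from the genome pool.
    upper = min(genome_in_term, cand_total)
    tail = sum(comb(genome_in_term, i)
               * comb(genome_total - genome_in_term, cand_total - i)
               for i in range(a, upper + 1))
    p_value = tail / comb(genome_total, cand_total)
    return odds_ratio, p_value


if __name__ == "__main__":
    # Hypothetical totals: 130 candidate genes in a pool of 13,000 annotated genes.
    print(go_enrichment(3, 130, 8, 13_000))
```

The one-sided (greater) alternative matches the question being asked, namely whether candidates are over-represented in a term, not merely different from the background.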

Other immune-related gene ontologies were enriched only in Downy Woodpecker (Table 3.S2). Candidate genes were enriched for “negative regulation of toll-like receptor 2 signaling pathway” (GO:0034136; p < 0.001) and “toll-like receptor 10 signaling pathway” (GO:0034166; p < 0.001). Toll-like receptors (TLRs) play an important role in the innate immune system: they are expressed on the membranes of many immune cells and recognize molecular features that are conserved across a large number of pathogens (Akira and Takeda 2004).

Birds have ten described TLR genes, many of which show evidence of positive selection across the evolutionary history of birds (Chen et al. 2013; Velová et al. 2018; Shultz and Sackton 2019).

Considering the large spatial and seasonal variation in pathogen prevalence across North America (Sol et al. 2000; Olsen et al. 2006; Pagenkopp et al. 2008; Benskin et al. 2009), it is plausible that birds have evolved several mechanisms to better detect and fight these different pathogens, either through changes in receptors or in intracellular signaling pathways (Randall and Goodbourn 2008; Quintana-Murci and Clark 2013; Sironi et al. 2015).

In addition to GO terms directly related to the immune system, we also found an overrepresentation of genes associated with nucleic acid replication, transcription, and repair in both species (Table 3.2). Parallel candidates were enriched for genes related to “RNA-dependent DNA biosynthetic process” (GO:0006278; p < 0.001), “DNA-templated transcription” (GO:0006352; p < 0.001), “nucleosome assembly” (GO:0006334; p < 0.001), “DNA integration” (GO:0015074; p = 0.006), and “RNA phosphodiester bond hydrolysis, endonucleolytic” (GO:0090502; p = 0.03). Shultz and Sackton (2019) found that pathways related to DNA replication and repair were significantly enriched for positively selected genes in birds. They suggest that these genes might be indirectly related to immune responses against viruses, which are known to subvert the host DNA/RNA replication and repair machinery to promote their own replication (Chaurushiya and Weitzman 2009; Luftig 2014; Shultz and Sackton 2019). Such genetic mechanisms could also have evolved in response to transposable elements (TEs), repetitive DNA sequences that can move across the genome (Bourque et al. 2018; Bourgeois and Boissinot 2019). Woodpeckers, in particular, show a large expansion in the number of TEs, especially the retrotransposon CR1, compared with other bird lineages, which are generally known for their paucity of TEs (Zhang et al. 2014; Manthey et al. 2018). TEs can have deleterious effects on their “hosts” by disrupting gene expression through random insertions, producing deleterious RNAs or proteins, or causing chromosomal rearrangements through ectopic recombination between non-allelic copies (Bourgeois and Boissinot 2019). Selection is therefore expected to act to remove TEs from the genome, leading to a host-parasite evolutionary arms race. Consistent with this hypothesis, Manthey et al. (2018) found evidence of purifying selection against polymorphic TEs in Downy Woodpecker and two other closely related woodpecker species. However, TEs can also be co-opted for adaptive purposes, such as supplying regulatory elements (e.g., promoters and cis-regulatory elements) and modulating gene expression, a process known as “TE gene domestication” (Feschotte 2008; Schrader et al. 2014; Bourgeois and Boissinot 2019). In Drosophila, for example, TEs play an important role in adaptation to temperate climates (González et al. 2010). Candidate genes involved in reverse transcription and DNA integration suggest that TEs could have adaptive value in Downy and Hairy Woodpecker and might be under divergent selection across populations.

We identified several enriched biological processes putatively associated with differences in diet among populations of Downy and Hairy Woodpecker (Table 3.2). Although Downy and Hairy Woodpecker feed predominantly on animal matter (invertebrates make up ~75% of the diet; Beal 1910; Beal 1911; Howell 1911; Neff 1928), fruits, seeds, and other plant material still make up about a quarter of the food items consumed. The proportion of plant-based foods in the diet varies among populations, likely owing to differences in the availability of food sources (Beal 1910; Beal 1911; Howell 1911; Neff 1928). In Sage Grouse, genomic signatures of local adaptation are evident in several genes associated with detoxification of plant secondary metabolites, such as alkaloids (Oh et al. 2019). We found an overrepresentation of genes related to “xenobiotic metabolic process” (GO:0006805; p < 0.001), “toxin metabolic process” (GO:0009404; p = 0.006), “drug metabolic process” (GO:0017144; p = 0.007), and “flavone metabolic process” (GO:0051552; p = 0.02) in our set of parallel candidate genes. Both Downy and Hairy Woodpecker showed signatures of selection in known xenobiotic-metabolizing enzymes, such as FMO4, FMO5, and UGT1A9 (Tolson and Wang 2010; Hecker et al. 2019). It is possible that dietary adaptation across the broad ranges of these species plays a role in local adaptation.


Figure 3.6. Heatmap of the number of FST-outlier windows across all pairwise population comparisons and their correlates in Downy (left) and Hairy (right) Woodpecker. (a–b) Heatmap of the number of genomic windows detected in the FST-outlier analysis for each population comparison in Downy (a) and Hairy (b) Woodpecker. (c–d) Genome-wide FST across all population comparisons in Downy (c) and Hairy (d) Woodpecker. (e–f) Average environmental dissimilarity among populations, calculated as the Euclidean distance between principal components of the 19 bioclimatic variables from the WorldClim database (Hijmans et al. 2005), in Downy (e) and Hairy (f) Woodpecker. AK: Alaska; MW: Midwest; NE: Northeast; SE: Southeast; NW: Pacific Northwest; NR: Northern Rockies; SR: Southern Rockies.

3.2.5. A wide array of biological processes underlie adaptation at the range peripheries

To explore the genomic architecture of local adaptation more narrowly, we focused on a key population comparison that provided an opportunity to understand adaptation at the range peripheries. We investigated signatures of selection in the comparison between Alaska (AK) and the Southeast (SE), the two latitudinal extremes of the sympatric distribution of Downy and Hairy Woodpecker, which occupy opposite ends of the environmental space (Figure 3.1b). This comparison revealed many candidate genes with elevated genetic differentiation relative to the genomic background (Figure 3.5; Table 3.3). Candidates included several genes related to immune response, such as protein phosphatase 1B (PPM1B), an enzyme that inhibits TBK1-mediated antiviral signaling (Zhao et al. 2012); G-protein coupled receptor 1 (GPR1) and C-C motif chemokine 20 (CCL20), both involved in the inflammation-associated chemotactic response of several immune cells (Schutyser et al. 2000; Nelson et al. 2001; Barnea et al. 2008; Röhrl et al. 2010); and lymphocyte cytosolic protein 2 (LCP2), which plays a role in T-cell activation (Jordan et al. 2003; Lowell 2004; Luftig 2014). The analysis also identified genes related to amino acid and lipid metabolism (e.g., GADL1 and DEGS2), as well as both the pancreatic and hepatic α-amylase genes (AMY), which encode enzymes that play an important role in breaking down long-chain polysaccharides. Passerine birds fed a starch-rich diet show higher pancreatic amylase activity (Kohl et al. 2011), and birds with seed-rich diets exhibit higher values of dN/dS (ω) in amylase genes, suggesting positive selection for enzymatic efficiency (Chen and Zhao 2019). Ravinet et al. (2018) also found signatures of selection in amylase alpha 2 (AMY2A) in House Sparrow populations adapted to urban environments. This finding may indicate that differences in polysaccharide consumption between populations of Downy Woodpecker in Alaska and the Southeast impose strong selective pressures on enzymatic genes.

Several candidate genes in the comparison between Alaska (AK) and the Southeast (SE) were associated with embryonic development. R-spondin-2 (RSPO2) plays a crucial role in limb specification in humans (Szenker-Ravi et al. 2018), and MAGUK p55 subfamily member 4 (MPP4) is important for the localization of retinal photoreceptors in mice (Aartsen et al. 2006). Another candidate gene under selection was vitellogenin-3 (VTG3), which encodes a precursor of the egg-yolk proteins, an essential source of nutrients in the early stages of bird development (Mann and Mann 2008; Schneider 2009). We also found a number of candidate genes associated with neurodevelopment and learning, including ATP8A1, a gene responsible for transporting lipid signaling molecules across the cell membrane and implicated in hippocampus-dependent learning (Levano et al. 2012), and adenylate cyclase type 1 (ADCY1), implicated in memory and learning behavior as well as regulation of the circadian clock in mice (Wu et al. 1995; Hwang et al. 2013). Several genes related to mitochondrial maintenance and respiration were also among the candidate genes showing elevated FST between Alaska (AK) and the Southeast (SE): NDUFS1, a component of the NADH dehydrogenase complex, a key catalyst of mitochondrial membrane respiration (Elkholi et al. 2019; Ni et al. 2019); FASTKD5, involved in the processing of mitochondrial mRNA (Antonicka and Shoubridge 2015); and MRPL44, a key component of the 39S subunit of the mitochondrial ribosome (Carroll et al. 2013). Downy Woodpecker’s ability to elevate its metabolic rate to compensate for heat loss (Liknes and Swanson 1996; Swanson 2006) is expected to be accompanied by elevated mitochondrial activity (Pörtner 2004), which could lead to different selective pressures on mitochondrial efficiency.


Table 3.3. Candidate genes within windows of elevated FST in the comparison between Alaska (AK) and the Southeast (SE).

Chromosome | Gene | Protein | General function | Spp.*
2 | AGR2/AGR3 | Anterior gradient protein homologs 2 and 3 | Production of mucus and regulation of intracellular calcium in tracheal epithelial cells | D
2 | GADL1 | Acidic amino acid decarboxylase | Decarboxylation of amino acids | D
2 | STT3A | Dolichyl-diphosphooligosaccharide--protein glycosyltransferase | Protein glycosylation | D
2 | IGFBP1/IGFBP3 | Insulin-like growth factor-binding proteins 1 and 3 | Regulation of growth, cell proliferation, muscular development, response to stress, and body size | B
2 | RSPO2 | R-spondin-2 | Limb specification during embryonic development | H
2 | PSMA2 | Proteasome subunit alpha type-2 | Maintenance of protein homeostasis; immune system | D
3 | PPM1B | Protein phosphatase 1B | Regulation of immune response to infection and stress | D
4 | ATP8A1 | Phospholipid-transporting ATPase IA | Aminophospholipid translocase at the plasma membrane; involved in brain connectivity | D
4 | RELL1 | RELT-like protein 1 | Activation of the MAPK14/p38 cascade; response to stress | D
4 | PCDH1 | Protocadherin-1 | Cell-cell interactions and cell adhesion | D
4 | FASTKD | FAST kinase domain-containing protein 2 | Processing of non-canonical mitochondrial mRNA precursors | D
5 | DEGS2 | Sphingolipid delta(4)-desaturase/C4-monooxygenase | Sphingolipid biosynthesis | H
5 | YY1 | Transcriptional repressor protein YY1 | Development and differentiation | H
5 | SLC25A29 | Mitochondrial basic amino acids transporter | Mitochondrial basic amino acids transporter | H
7 | NDUFS1 | NADH-ubiquinone oxidoreductase 75 kDa subunit | Core subunit of the mitochondrial membrane respiratory chain NADH dehydrogenase | D
7 | GPR1 | G protein-coupled receptor GPR1 | Receptor for the inflammation-associated leukocyte chemoattractant; regulation of inflammation; detection of glucose | D
7 | EEF1B2 | Elongation factor 1-beta | Translation elongation | D
7 | VTG3 | Vitellogenin-3 | Precursor of the egg-yolk proteins | D
7 | MPP4 | MAGUK p55 subfamily member 4 | Retinal photoreceptors development | D
7 | ALS2CR4 | Transmembrane protein 237 | Ciliogenesis | D
8 | AMY1/AMY2 | Alpha amylase | Polysaccharide endohydrolysis (digestive enzyme) | D
9 | MFF | Mitochondrial fission factor | Mitochondrial and peroxisomal fission | H
9 | MRPL44 | 39S ribosomal protein L44 | Component of the 39S subunit of mitochondrial ribosome | H
9 | SLC19A3 | Thiamine transporter 2 | Thiamine transporter | H
9 | CCL20 | C-C motif chemokine 20 | Chemotaxis of immune cells at skin and mucosal surfaces | H
13 | KCNMB1 | Calcium-activated potassium channel subunit beta-1 | Regulatory subunit of the calcium-activated potassium KCNMA1 (maxiK) channel | H
13 | LCP2 | Lymphocyte cytosolic protein 2 | T-cell antigen receptor mediated signaling | H
15 | MYOC | Myocilin | Regulation of cell adhesion, cell-matrix adhesion, cytoskeleton organization and cell migration; bone formation; muscle hypertrophy; neurite outgrowth | B
15 | PRRC2C | BAT2 domain-containing protein 1 | Formation of stress granules | B
15 | FMO4 | Dimethylaniline monooxygenase | Metabolism of xenobiotics | B
* Species in which the candidate gene was detected. D: Downy Woodpecker; H: Hairy Woodpecker; B: both.

3.2.6. Parallel selection on the IGF signaling pathway

We found parallel signatures of selection at genes of the insulin-like growth factor (IGF) signaling pathway in comparisons involving small- vs. large-bodied populations of Downy and Hairy Woodpecker. The largest FST peak in the comparison between Alaska (AK) and the Southeast (SE) was located on chromosome 2 between 50 and 55 Mb (Figure 3.5). This peak contained several genes (some of which were discussed in the previous paragraphs), but the largest FST values were found in proximity to two insulin-like growth factor-binding proteins, IGFBP1 and IGFBP3 (Figure 3.7). IGFBPs are a highly conserved family of proteins that bind to insulin-like growth factors (IGFs), especially IGF-1, to assist their transport around the body and prolong their half-lives (Upton et al. 1993; Duan and Xu 2005; Allard and Duan 2018). The majority of circulating IGFs are bound to IGFBPs, especially IGFBP3, which binds over 75% of IGF molecules in blood serum (Jones and Clemmons 1995). IGF-1 plays a crucial role in stimulating cell growth, differentiation, and proliferation, thereby mediating the overall postnatal growth rate and body size of various vertebrates (Baker et al. 1993; Lupu et al. 2001; Zhou et al. 2005; Laviola et al. 2007; Perrini et al. 2010; Hellström et al. 2016). IGF-1 has been linked to several metabolic and developmental pathways, including muscle mass growth (Otto and Patel 2010), remodeling of skeletal tissue (Yakar et al. 2002; Kawai and Rosen 2009; Mohan and Kesavan 2012), neurogenesis (O'Kusky et al. 2000; Daftary and Gore 2005), nutrient metabolism (Rajpathak et al. 2009), and tissue regeneration (Emmerson et al. 2012). In chickens, higher expression and plasma levels of IGF-1 were correlated with larger body weight (Beccavin et al. 2001; Zhou et al. 2005), and in ovo injections of IGF-1 resulted in increased body size (Kocamis et al. 1998; Wang et al. 2012). Other studies found a positive association between IGF-1 levels and body size both across and within passerine species (Lodjak et al. 2014; Lodjak et al. 2017) and revealed that IGF-1 levels are associated with life-history strategies (Lodjak et al. 2018; Lodjak and Verhulst 2020) and the expression of plumage traits (Mahr et al. 2020).

The expression and secretion of IGFBPs in the liver are also highly regulated by catabolic factors and hormones. Starvation, hypoxia, and stress, for example, are physiological states that stimulate the expression of IGFBP1 (Maures and Duan 2002; Kajimura et al. 2006).

We found that across 12 population comparisons in Downy Woodpecker and 11 in Hairy Woodpecker, the genomic region harboring the IGFBP1 and IGFBP3 genes shows exceptionally high genetic differentiation relative to the genomic background. In both species, an FST peak was evident in all comparisons between the Southeast (SE) and any other population. This genomic region was also an outlier in the comparisons between the Northeast (NE) and western populations (e.g., NR, NW, AK), and, in Hairy Woodpecker, in the comparisons between the Midwest (MW) and western populations. The region of chromosome 2 containing IGFBP1 and IGFBP3 was also characterized by low nucleotide diversity and extended homozygosity, genomic signatures of a selective sweep (Figure 3.7). Body size in both species is known to vary clinally with latitude: birds at higher latitudes and elevations (e.g., Alaska and the Northern Rockies) are larger than their southern counterparts (e.g., Southeast; Ouellet 1977). This pattern conforms to Bergmann's rule, whereby individuals in cooler climates tend to be larger than conspecifics in warmer areas (Bergmann 1847), and might indicate that Downy and Hairy Woodpeckers have evolved larger bodies to conserve heat where temperatures are lower (McNab 1971; Lindstedt and Boyce 1985; Murphy 1985; Ashton 2002; Olson et al. 2009). Our results show that elevated FST in the genomic region harboring IGFBP1 and IGFBP3 was observed in all comparisons between populations of small vs. large birds. We hypothesize that these candidate genes contribute to differences in body size across populations and have been under parallel natural selection in both species.
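Nucleotide diversity (π), one of the sweep signatures mentioned above, is the average number of pairwise differences per site among sequences from a population. A minimal sketch, assuming aligned, fully called sequences; with low-coverage data π is usually estimated from genotype likelihoods instead, and `nucleotide_diversity` is a hypothetical name. A sweep is expected to depress π around the selected site relative to flanking windows.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned sequences (pi)."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    differences = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


if __name__ == "__main__":
    # 4 differences / (3 pairs x 4 sites) = 0.333...
    print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

Computing π in the same sliding windows used for FST makes the two signatures directly comparable along a chromosome segment.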


Figure 3.7. Genomic signatures of a selective sweep in the comparison between Alaska (AK) and the Southeast (SE) in a segment of chromosome 2 of Downy Woodpecker. (top) FST between Alaska (AK) and the Southeast (SE) across 10 kb windows in 2 kb increments. The red line represents the local polynomial regression fit and the blue rectangles indicate the location of genes. Five key genes with elevated FST are indicated by different colors. (middle) Nucleotide diversity in Alaska (AK; black line) and the Southeast (SE; blue line). (bottom) The average length of pairwise homozygosity tracts for each SNP (H) along this segment of chromosome 2 in the Alaska (red) and Southeast (blue) populations.


3.2.7. Parallel selection on hemoglobin genes in high-elevation populations

High-elevation populations of Downy and Hairy Woodpecker show parallel signatures of selection in hemoglobin genes. Across our set of candidate genes for parallel local adaptation, we also found an overrepresentation of genes associated with “oxygen transport” (GO:0015671, p < 0.001) and “gas transport” (GO:0015669, p < 0.001). Population comparisons between highland (NR and SR) and lowland (e.g., SE and NW) populations show outlier FST values in a region of chromosome 1 between 20.75 and 20.80 Mb and in a region of chromosome 14 between 58 and 58.2 Mb. Both regions harbor clusters of genes that encode the β- and α-type subunits of hemoglobin, the tetrameric protein responsible for carrying oxygen in the bloodstream of most vertebrates (Storz 2018). In woodpeckers, the β-globin cluster on chromosome 1 contains three genes arranged in tandem (Hbb-ρ, Hbb-βA, and Hbb-ε) and two pseudogenes (pseudo-Hbb-βH and an inactivated copy of Hbb-ε) (Zhang et al. 2014; Opazo et al. 2015; Hoffmann et al. 2012). Some of these subunits make up distinct hemoglobin isoforms that are expressed at different stages of prenatal development (Cirotto et al. 1987; Ikehara et al. 1997; Alev et al. 2009; Storz et al. 2011). Although adaptive increases in oxygen affinity in high-altitude birds are often associated with (sometimes predictable) amino acid changes in these hemoglobin proteins (Projecto-Garcia et al. 2013; Galen et al. 2015; Natarajan et al. 2016; Zhu et al. 2018), we did not find nonsynonymous mutations that could explain the allele-frequency differences associated with high elevation in these woodpeckers, as most candidate SNPs were located in pseudogenes. However, we cannot rule out the possibility that selection on regulatory and/or structural variants not assessed in this study underlies differences between lowland and highland populations.

3.2.8. Melanoregulin as a candidate for plumage variation

Another component of parallel adaptation in Downy and Hairy Woodpecker is their covariation in plumage color. In both species, western birds tend to be darker than their eastern counterparts (Ouellet 1977). We found a very conspicuous FST peak on chromosome 7 (between 20.2 and 20.5 Mb) in all comparisons between the Pacific Northwest (NW) and any other population of the Hairy Woodpecker (Figure 3.8a). This genomic region includes the gene melanoregulin (MREG), which has been implicated in hair and skin pigmentation in mammals (Damek-Poprawa et al. 2009; Wu et al. 2012a; 2012b; Ohbayashi et al. 2012; Rout et al. 2018). In mice, melanoregulin mediates the transfer of melanin from melanocytes (the cells that synthesize melanin) to keratinocytes (the main cells of skin and hair; Moore et al. 1988; O'Sullivan et al. 2004; Ohbayashi et al. 2012). In birds, the particular location and timing of this transfer during feather development is thought to produce unique pigmentation patterns (O'Sullivan et al. 2004; Yu et al. 2004; Ng and Li 2018). Our results indicate that MREG, a gene poorly explored in the avian literature, is a potential candidate for plumage variation. Individuals of both Downy and Hairy Woodpecker in the Pacific Northwest population are much darker than individuals from the other populations sampled in this study.

Our results revealed exceptionally high FST values near the MREG gene and evidence of a selective sweep in the Pacific Northwest population, characterized by unusually long tracts of homozygosity (Figure 3.8b–c); these signals appeared only in comparisons involving the Pacific Northwest (NW) population of the Hairy Woodpecker. This finding suggests that, although convergent, plumage variation in Downy and Hairy Woodpeckers might have different genetic origins. If plumage color is an adaptive trait, this result suggests that, despite the overarching importance of genetic parallelism in local adaptation, phenotypic convergence can still be achieved through species-specific solutions.


Figure 3.8. A candidate gene for plumage variation in Hairy Woodpecker. (a) Manhattan plot showing FST values estimated for 50 kb sliding windows along the genome with 10 kb increments. Colors differentiate consecutive chromosomes. The red line indicates the genome-wide mean FST (FST = 0.05) and the blue line indicates the cutoff of five standard deviations above the mean for a window to be considered an outlier (FST = 0.11). (b) Average length of pairwise homozygosity tracts for each SNP (H) for the Pacific Northwest (NW; dark individuals) and Northern Rockies (NR; white individuals) populations in a segment of chromosome 7 containing the gene melanoregulin (MREG). (c) Genotypes for each SNP in the segment containing the MREG gene, separated by population. Blue: homozygous for the reference allele; pink: heterozygous; red: homozygous for the alternative allele. AK: Alaska; MW: Midwest; NE: Northeast; SE: Southeast; SR: Southern Rockies; NR: Northern Rockies; NW: Northwest. Illustrations reproduced with permission from Lynx Edicions.

3.3. Conclusions

Parallel local adaptation provides a unique opportunity to understand how natural selection operates in independent evolutionary lineages. In particular, genomic data allow us to explore whether selective constraints limit the number of genetic avenues available for adaptation, leading to genomic repeatability. We investigated parallel local adaptation in Downy and Hairy Woodpecker, two species co-distributed across a highly heterogeneous environmental gradient in North America. Our results reveal that, despite the large evolutionary distance between the two species, natural selection targeted parallel genetic mechanisms for local adaptation. This genomic parallelism was observed only at the gene level and not at the SNP level, indicating that adaptive changes involved mutations in the same genes but at different nucleotide positions. Our genotype-environment analysis identified numerous SNPs strongly associated with temperature and precipitation. Most of these variants were correlated with precipitation variables, suggesting that humidity gradients impose strong selective pressures (either directly or indirectly) across the species' ranges. The large number of associated SNPs suggests that climatic adaptation is highly polygenic, likely involving the recruitment of many mutations of small effect, and the fact that the majority of SNPs putatively under selection lie in non-coding regions suggests that most of these mutations are regulatory. This finding adds to the growing body of research showing the importance of regulatory innovation in adaptive evolution.

Using a combination of methods, we detected several candidate genes exhibiting signatures of natural selection (e.g., elevated population differentiation and extended homozygosity) in Downy and Hairy Woodpecker. These candidate genes were involved in a broad array of biological processes, including embryonic development, nutritional metabolism, mitochondrial respiration, and oxygen transport. Among the candidates shared by Downy and Hairy Woodpecker, we found an overrepresentation of genes related to DNA replication and immune response, both of which may be linked to defense mechanisms against region-specific pathogens. Our genomic scan for selection also identified potential candidates associated with key phenotypic traits in Downy and Hairy Woodpecker. For example, signatures of a selective sweep around the melanoregulin gene (MREG) in a darker-plumage population of Hairy Woodpecker suggest a role for this gene in plumage variation. In addition, parallel signatures of selection in genes of the IGF signaling pathway were consistent with differences in body size among the populations compared. Taken together, these results provide compelling evidence of the dominant role of genomic parallelism in local adaptation across a shared environmental gradient.

3.4. Material and Methods

3.4.1. Sample acquisition and whole genome sequencing

Seventy samples each of Downy (D. pubescens) and Hairy Woodpecker (D. villosus) were collected in seven geographic locations (hereafter referred to as populations) representing major bioclimatic domains of temperate North America (n = 10 per population; Figure 3.1): New York (Northeast), Louisiana (Southeast), Minnesota (Midwest), New Mexico and Colorado (Southern Rockies), Wyoming (Northern Rockies), Washington (Pacific Northwest), and Alaska. Tissue samples from museum-vouchered specimens were acquired through targeted field expeditions in Wyoming, Louisiana, and Alaska and supplemented by loans from natural history museums (Table 2.S1). Genomic DNA was extracted from tissue samples using the MagAttract High Molecular Weight DNA Kit (Qiagen, California, USA) following the manufacturer’s instructions. Extracted DNA was submitted for paired-end whole genome resequencing on an Illumina HiSeq X Ten machine by RAPiD Genomics (Gainesville, Florida, USA).

3.4.2. Read alignment, variant calling and filtering

We removed adapters, trimmed low-quality ends, and filtered raw reads using Trimmomatic v0.36 (Bolger et al. 2014), resulting in an average of 35,689,979 paired reads per sample. Read quality was verified using FastQC v0.11.4 (Andrews 2010). Processed reads were then mapped against the pseudo-chromosome reference genome of the Downy Woodpecker using the BWA-MEM algorithm of BWA v0.7.15 (Li and Durbin 2009), following the procedures described in detail in Chapter 2. We converted the resulting sequence alignment/map (SAM) files to their binary format (BAM), then added read group information, sorted, marked duplicates, and indexed the files using Picard (http://broadinstitute.github.io/picard/). We then used IndelRealigner, part of the Genome Analysis Toolkit (GATK v3.6; DePristo et al., 2011), to correct read alignment errors near insertions and deletions (indels). Mapping quality was assessed using QualiMap v2.2.1 (Okonechnikov et al. 2016).

We performed genotype calling using GATK v3.8.0 (McKenna et al., 2010). We first ran the HaplotypeCaller algorithm separately for each sample using the following options: --emitRefConfidence GVCF -minPruning 1 -minDanglingBranchLength 1. We then jointly called genotypes across all gVCF files using GenotypeGVCFs with default settings. We followed the hard-filtering recommendations from the Broad Institute's Best Practices (https://gatk.broadinstitute.org/) to remove SNPs with annotation values above or below the following thresholds: QD < 2.0, ReadPosRankSum < -8.0, FS > 60.0, SOR > 3.0, MQ < 40.0, and MQRankSum < -12.5. The meaning of these annotations is explained in detail in Chapter 2. Finally, we used VCFtools v0.1.17 (Danecek et al. 2011) to retain only biallelic SNPs meeting the following criteria: (1) missing data < 25% across all samples, (2) read depth between 2x and 100x, (3) no strong deviation from Hardy-Weinberg equilibrium (exact test; p-value > 0.01), and (4) minor allele frequency (MAF) > 0.05. We then used SnpEff v4.1 (Cingolani et al. 2012) to annotate the functional impact of each SNP according to the gene annotation of the Downy Woodpecker.
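The VCFtools site filters (1), (3), and (4) can be sketched in Python to show what each threshold does. This is a minimal illustration under stated assumptions, not the VCFtools implementation: each SNP is represented by a hypothetical list of genotypes coded as 0, 1, or 2 copies of the alternative allele (None for a missing call), the depth filter (2) is left out, and the Hardy-Weinberg p-value follows the exact test of Wigginton, Cutler and Abecasis (2005), which is the test VCFtools documents for its `--hwe` option. The function names are illustrative.

```python
from typing import Optional, Sequence


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Hardy-Weinberg exact test p-value (Wigginton et al. 2005 recurrence)."""
    n = n_het + n_hom1 + n_hom2                 # genotyped individuals
    if n == 0:
        return 1.0
    n_rare = 2 * min(n_hom1, n_hom2) + n_het    # copies of the rarer allele
    probs = [0.0] * (n_rare + 1)                # unnormalized P(heterozygote count)

    # Start at the most likely heterozygote count, matching the parity of n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurrence towards fewer heterozygotes.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Recurrence towards more heterozygotes.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    # Sum probabilities of all configurations no more likely than the observed one.
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / sum(probs))


def passes_site_filters(genotypes: Sequence[Optional[int]],
                        max_missing: float = 0.25,
                        min_hwe_p: float = 0.01,
                        min_maf: float = 0.05) -> bool:
    """Apply missing-data, HWE and MAF thresholds to one biallelic SNP."""
    called = [g for g in genotypes if g is not None]
    if not called or 1 - len(called) / len(genotypes) >= max_missing:
        return False
    n_hom_ref, n_het, n_hom_alt = called.count(0), called.count(1), called.count(2)
    if hwe_exact_p(n_het, n_hom_ref, n_hom_alt) <= min_hwe_p:
        return False
    alt_freq = (n_het + 2 * n_hom_alt) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq) > min_maf
```

For example, a site with five individuals homozygous for each allele and no heterozygotes has an exact p-value of about 0.0014 and would be removed under criterion (3).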

3.4.3. Genotype-environment association analysis

We performed a genotype-environment association (GEA) analysis to search for SNPs whose genotypes were directly associated with bioclimatic variables extracted from the WorldClim database (Hijmans et al. 2005). To reduce collinearity among the 19 environmental variables, we performed separate principal component analyses (PCA) on two sets of variables: one representing temperature (BIO1–BIO11) and another representing precipitation (BIO12–BIO19). All variables were centered and scaled before the PCA. We retained the first three principal components of each set (PC1–3; temperature = 95% of variance explained; precipitation = 98% of variance explained) to summarize the variation in these two sets of variables. We then used the latent factor mixed model implemented in LFMM 2 (Caye et al. 2019) to test the association between the genotypes at each SNP and the scores of the retained temperature and precipitation PCs. LFMM 2 models the environmental variables as fixed effects while controlling for the underlying population structure through latent factors (Frichot et al. 2013). We used K = 4 latent factors, as this is the number of genetic clusters previously detected in both species (Chapter 2). Because LFMM requires a complete genotype matrix, we imputed missing data using the impute function implemented in the R package LEA (Frichot and François 2015). This approach uses the factors estimated by the snmf function to predict genotypes according to the inferred genetic structure (Gain and François 2021). For the imputation, we assumed K = 4 and used the “mode” method. Parameters in LFMM 2 were estimated using the ridge function, and p-values were calibrated using the built-in genomic control method, which uses the genomic inflation factor (λ) calculated from the Z-scores. Finally, we applied a Benjamini–Hochberg correction (Benjamini and Hochberg 1995) with a significance cutoff of q < 0.01 to identify SNPs significantly associated with the environment.
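The calibration and correction steps can be illustrated with a short Python sketch. It assumes one LFMM Z-score per SNP for a given environmental PC and follows the usual genomic-control recipe (λ is the median of Z² divided by the median of a χ² distribution with one degree of freedom, and p-values are then computed from Z²/λ), followed by Benjamini–Hochberg q-values. It is written to show the arithmetic and is not the LFMM 2 source code; the function names are hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence

# Median of a chi-square distribution with 1 df, i.e. (Phi^-1(0.75))^2, about 0.455.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores: Sequence[float]) -> float:
    """Genomic inflation factor lambda = median(Z^2) / median(chi-square, 1 df)."""
    return median([z * z for z in z_scores]) / CHI2_1_MEDIAN


def calibrated_pvalues(z_scores: Sequence[float]) -> List[float]:
    """Two-sided p-values after dividing each Z^2 by lambda (chi-square, 1 df upper tail)."""
    lam = genomic_inflation(z_scores)
    # For X ~ chi-square(1), P(X > x) = erfc(sqrt(x / 2)).
    return [math.erfc(math.sqrt(z * z / lam / 2.0)) for z in z_scores]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

Under the threshold used here, a SNP would be retained when its q-value is below 0.01.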

To test for parallelism in sets of candidate SNPs and genes, we calculated the probability of observing a given number of overlapping candidates under a hypergeometric distribution, that is, if SNPs or genes were drawn at random from the total pool. We used the dhyper function in R to calculate significance and plot the cumulative distributions.
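The overlap test can be sketched in Python using only the standard library. Here `pool` is the total number of SNPs (or genes) that could be drawn, `n_a` and `n_b` are the numbers of candidates in each species, and `observed` is the number shared. Summing the hypergeometric probabilities over the upper tail is equivalent to summing R's `dhyper` over that range, or to calling `phyper(observed - 1, n_a, pool - n_a, n_b, lower.tail = FALSE)`. Whether the original analysis used the upper tail in exactly this form is an assumption, and the names are illustrative.

```python
from math import comb


def hypergeom_pmf(k: int, pool: int, n_a: int, n_b: int) -> float:
    """P(overlap = k) when n_b items are drawn at random from a pool holding n_a candidates."""
    return comb(n_a, k) * comb(pool - n_a, n_b - k) / comb(pool, n_b)


def overlap_pvalue(observed: int, pool: int, n_a: int, n_b: int) -> float:
    """Upper-tail probability P(overlap >= observed) under random draws."""
    upper = min(n_a, n_b)  # the overlap can never exceed the smaller candidate set
    return sum(hypergeom_pmf(k, pool, n_a, n_b) for k in range(observed, upper + 1))
```

A small p-value indicates that the two species share more candidates than expected if each species' candidates were an independent random draw from the same pool.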

3.4.4. Signature of selective sweep

We investigated signatures of selective sweeps by calculating, for each SNP in each genetic cluster (Alaska, East, Rocky Mountains, and Pacific Northwest), the H statistic implemented in H-scan (Messer Lab website 2014). This metric measures the average length of pairwise homozygosity tracts around a given focal SNP and, in contrast to other methods that search for genomic signatures of selective sweeps, does not require phased haplotypes. Selective sweeps are expected to produce elevated linkage disequilibrium around the target of selection, resulting in exceptionally long tracts of homozygosity. We used the H statistic to validate candidates identified through other methods: if genes showing atypically high FST or strong association with environmental variables are in fact under selection, we expect them to show large values of H. We also used H-scan to detect SNPs with strong signatures of selective sweeps, considering as outliers all SNPs in the top 1% of H values.
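To make the H statistic concrete, the Python sketch below computes, for one focal SNP, the length of the identity tract shared by every pair of sequences, averages it over pairs, and flags the top 1% of values. It is a simplified illustration and not the H-scan program: each row is treated as a sequence of allele codes compared site by site, tract length is measured in base pairs between the outermost identical SNPs, and the way H-scan handles unphased diploid genotypes and chromosome ends is not reproduced. All names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def tract_length(a: Sequence[int], b: Sequence[int], positions: Sequence[int], focal: int) -> int:
    """Span (bp) of the run of identical sites shared by a and b around the focal SNP."""
    if a[focal] != b[focal]:
        return 0  # the pair differs at the focal SNP itself
    left = focal
    while left > 0 and a[left - 1] == b[left - 1]:
        left -= 1
    right = focal
    while right < len(a) - 1 and a[right + 1] == b[right + 1]:
        right += 1
    return positions[right] - positions[left]


def h_statistic(seqs: Sequence[Sequence[int]], positions: Sequence[int], focal: int) -> float:
    """Average pairwise tract length around the focal SNP."""
    pairs = list(combinations(range(len(seqs)), 2))
    total = sum(tract_length(seqs[i], seqs[j], positions, focal) for i, j in pairs)
    return total / len(pairs)


def top_percent_outliers(h_values: Sequence[float], quantile: float = 0.99) -> List[int]:
    """Indices of SNPs whose H lies above the given empirical quantile."""
    ranked = sorted(h_values)
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    return [i for i, h in enumerate(h_values) if h > cutoff]
```

Under a sweep, many pairs share the same long haplotype around the selected site, so the average tract length at SNPs near the target rises well above the genome-wide distribution.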

3.4.5. Population structure outliers

To scan the genome for SNPs deviating from the neutral population structure, we used the R package PCAdapt 4.1.0 (Luu et al. 2017; Privé et al. 2020). PCAdapt computes Mahalanobis distances to measure the extent to which each SNP in the dataset is related to the first K principal components (PCs) of genetic variation. SNPs under selection are expected to deviate strongly (i.e., show large Mahalanobis distances) from the general population structure. For both Downy and Hairy Woodpecker, we used K = 4, as principal components beyond the fourth no longer captured population structure. After obtaining the z-scores from the regression of each SNP on the four principal components, we identified as outliers the variants with exceptionally high squared Mahalanobis distances (D²) computed from the vector of z-scores. Statistical significance was assessed using a χ² distribution with a false discovery rate (FDR) of 0.01.
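The outlier statistic can be sketched with NumPy. This is a simplified illustration, assuming a matrix `z` of z-scores with one row per SNP and one column per retained PC. It uses the classical sample covariance, whereas pcadapt, as we understand it, uses a robust covariance estimate and rescales the statistic by a genomic inflation factor before computing p-values. Because K = 4 is even, the χ² upper tail has a closed form, which avoids a SciPy dependency. Function names are hypothetical.

```python
import math

import numpy as np


def mahalanobis_d2(z: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of z (SNPs x K) from the mean z-score vector."""
    centered = z - z.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(z, rowvar=False))  # classical (non-robust) covariance
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail of a chi-square distribution with an even number of degrees of freedom."""
    if df % 2:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total
```

Each SNP's p-value would then be `chi2_sf_even_df(d2, 4)`, and the resulting p-values would be passed to an FDR procedure at 0.01.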

3.4.6. FST-outlier analysis

To detect signatures of selection associated with differences in allele frequencies between populations, we estimated FST across 50 kb sliding windows with a step of 10 kb using ANGSD v0.917 (Korneliussen et al., 2014). ANGSD estimates FST from genotype likelihoods instead of relying on genotype calls, thereby accounting for the uncertainty associated with low-depth sequencing data. For this analysis, we first estimated allele frequencies directly from genotype likelihoods assuming known major and minor alleles (-doSaf 1 -doMajorMinor 1 -doMaf 1; Kim et al., 2011), using only variants that met the following quality filters: -minMapQ 30, -minQ 20, -minMaf 0.05, -SNP_pval 0.01. A detailed explanation of these filters can be found in Chapter 2.

The resulting site-allele-frequency likelihood files were then used to generate a folded two-dimensional site frequency spectrum (2-d SFS) with the command realSFS -fold 1. Weighted FST estimates were calculated using the realSFS fst command with the 2-d SFS as a prior. We considered as outliers any windows with FST values more than five standard deviations above the genome-wide mean. This analysis was performed for all 21 pairwise population comparisons, and results were plotted using the R package qqman (Turner 2014). To avoid biasing the genome-wide mean FST, we removed the sex chromosomes, which have a lower effective population size (Ne) and higher overall FST than the autosomes.
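The window summary and outlier rule can be sketched in Python. The sketch assumes hypothetical per-site numerator and denominator components of FST (ANGSD derives such components from the 2-d SFS), computes a ratio-of-sums (weighted) FST for each 50 kb window moved in 10 kb steps, and flags windows more than five standard deviations above the mean across windows. Window boundaries and the treatment of empty windows are simplifying choices, not necessarily those of `realSFS fst stats2`.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def window_fst(positions: Sequence[int], numer: Sequence[float], denom: Sequence[float],
               chrom_length: int, size: int = 50_000, step: int = 10_000) -> List[Tuple[int, float]]:
    """Ratio-of-sums FST for sliding windows [start, start + size) along one chromosome."""
    windows = []
    for start in range(0, max(chrom_length - size, 0) + 1, step):
        end = start + size
        in_win = [i for i, p in enumerate(positions) if start <= p < end]
        total_denom = sum(denom[i] for i in in_win)
        if total_denom > 0:  # skip windows without informative sites
            windows.append((start, sum(numer[i] for i in in_win) / total_denom))
    return windows


def outlier_windows(fst_values: Sequence[float], n_sd: float = 5.0) -> List[int]:
    """Indices of windows whose FST exceeds the mean by more than n_sd standard deviations."""
    cutoff = mean(fst_values) + n_sd * pstdev(fst_values)
    return [i for i, v in enumerate(fst_values) if v > cutoff]
```

Summing components before dividing gives sites with more information (larger denominators) more weight than averaging per-site FST ratios would.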


3.4.7. Gene ontology enrichment

We performed a Gene Ontology (GO) term enrichment analysis to test for overrepresentation of any particular biological function or molecular pathway in our set of candidate genes. Gene information for both Downy and Hairy Woodpecker was retrieved from the gene annotation of the Downy Woodpecker (Jarvis et al. 2014). We used ShinyGO (Ge et al. 2020) to test for enrichment of existing GO terms using the gene sets of the Zebra Finch (Taeniopygia guttata), the avian species with the largest number of matching annotated genes, and humans (Homo sapiens), the species with the most comprehensive gene annotation. A false discovery rate (FDR) cutoff of 5% was used to determine enriched GO terms. Because GO term lists can be very extensive and redundant, we used REVIGO (Supek et al. 2011) to summarize the resulting GO terms into semantic clusters and aid interpretation. As cross-species GO term analyses can produce biased results when species differ in the completeness of their genome annotation, we also used PANNZER2 (Törönen et al. 2018) to extract GO terms directly from the list of annotated proteins in the Downy Woodpecker based on homology searches in the UniProt database (Bateman et al. 2020).

We then carried out Fisher’s exact tests in R (R Core Team 2020) comparing the number of candidate genes annotated with a given GO term with the total number of genes carrying that GO term annotation in the entire gene dataset. GO terms were considered significant at p < 0.05 after FDR correction for multiple testing. This more conservative approach resulted in a smaller, but likely more reliable, list of enriched GO terms.
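For a single GO term, the test amounts to a 2 × 2 table: candidate versus non-candidate genes, with versus without the term. The Python sketch below computes a two-sided Fisher's exact p-value in the same way as the default of R's `fisher.test`, summing the probabilities of all tables (with fixed margins) that are no more probable than the observed one. That the original analysis used the two-sided version, and the table layout shown in the docstring, are assumptions; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = candidate genes with the GO term, b = candidate genes without it,
    c = non-candidate genes with the term, d = non-candidate genes without it.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # The small relative tolerance mirrors the one R's fisher.test uses for ties.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))
```

Applying this test to every GO term and then correcting the resulting p-values for multiple testing follows the logic of the procedure described above.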

3.5. Author contribution

This study was conceived and designed by Lucas Rocha Moreira and Brian Tilston Smith. Lucas Rocha Moreira conducted all statistical analyses and drafted the paper with input from Brian Tilston Smith.


3.6. References

Aartsen, Wendy M., Albena Kantardzhieva, Jan Klooster, Agnes G. S. H. van Rossum, Serge A. van de Pavert, Inge Versteeg, Bob Nunes Cardozo, et al. 2006. “Mpp4 Recruits Psd95 and Veli3 towards the Photoreceptor Synapse.” Human Molecular Genetics 15 (8): 1291–1302.

Akira, Shizuo, and Kiyoshi Takeda. 2004. “Toll-like Receptor Signalling.” Nature Reviews. Immunology 4 (7): 499–511.

Alev, Cantas, Kaori Shinmyozu, Brendan A. S. McIntyre, and Guojun Sheng. 2009. “Genomic Organization of Zebra Finch Alpha and Beta Globin Genes and Their Expression in Primitive and Definitive Blood in Comparison with Globins in Chicken.” Development Genes and Evolution 219 (7): 353–60.

Allard, John B., and Cunming Duan. 2018. “IGF-Binding Proteins: Why Do They Exist and Why Are There so Many?” Frontiers in Endocrinology 9 (APR): 1–12.

Antonicka, Hana, and Eric A. Shoubridge. 2015. “Mitochondrial RNA Granules Are Centers for Posttranscriptional RNA Processing and Ribosome Biogenesis.” Cell Reports 10 (6): 920–32.

Ashton, Kyle G. 2002. “Patterns of within-Species Body Size Variation of Birds: Strong Evidence for Bergmann’s Rule.” Global Ecology and Biogeography: A Journal of Macroecology 11 (6): 505–23.

Baker, J., J. P. Liu, E. J. Robertson, and A. Efstratiadis. 1993. “Role of Insulin-like Growth Factors in Embryonic and Postnatal Growth.” Cell 75 (1): 73–82.

Barghi, Neda, Joachim Hermisson, and Christian Schlötterer. 2020. “Polygenic Adaptation: A Unifying Framework to Understand Positive Selection.” Nature Reviews. Genetics 21 (12): 769–81.

Barnea, Gilad, Walter Strapps, Gilles Herrada, Yemiliya Berman, Jane Ong, Brian Kloss, Richard Axel, and Kevin J. Lee. 2008. “The Genetic Design of Signaling Cascades to Record Receptor Activation.” Proceedings of the National Academy of Sciences of the United States of America 105 (1): 64–69.

Bateman, Alex, Maria-Jesus Martin, Sandra Orchard, Michele Magrane, Rahat Agivetova, Shadab Ahmad, Emanuele Alpi, et al. 2020. “UniProt: The Universal Protein Knowledgebase in 2021.” Nucleic Acids Research. https://academic.oup.com/nar/advance-article-abstract/doi/10.1093/nar/gkaa1100/6006196.

Beal, F. E. L. 1910. “Birds of California in Relation to the Fruit Industry, Pt. 2.” U.S. Department of Agriculture, Biological Survey Bulletin 34.

Beal, F. E. L. 1911. “Food of the Woodpeckers of the United States.” U.S. Department of Agriculture, Biological Survey Bulletin 37.

Beccavin, C., B. Chevalier, L. A. Cogburn, J. Simon, and M. J. Duclos. 2001. “Insulin-like Growth Factors and Body Growth in Chickens Divergently Selected for High or Low Growth Rate.” The Journal of Endocrinology 168 (2): 297–306.


Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society. Series B, Statistical Methodology 57 (1): 289–300.

Benskin, Clare Mcw H., Kenneth Wilson, Keith Jones, and Ian R. Hartley. 2009. “Bacterial Pathogens in Wild Birds: A Review of the Frequency and Effects of Infection.” Biological Reviews of the Cambridge Philosophical Society 84 (3): 349–73.

Bergmann, C. 1847. “About the Relationships between Heat Conservation and Body Size of Animals.” Goett Stud 1: 595–708.

Bonneaud, Camille, Susan L. Balenger, Jiangwen Zhang, Scott V. Edwards, and Geoffrey E. Hill. 2012. “Innate Immunity and the Evolution of Resistance to an Emerging Infectious Disease in a Wild Bird.” Molecular Ecology 21 (11): 2628–39.

Bourgeois, Yann, and Stéphane Boissinot. 2019a. “On the Population Dynamics of Junk: A Review on the Population Genomics of Transposable Elements.” Genes 10 (6). https://doi.org/10.3390/genes10060419.

Bourgeois, Yann, and Stéphane Boissinot. 2019b. “Selection at Behavioural, Developmental and Metabolic Genes Is Associated with the Northward Expansion of a Successful Tropical Colonizer.” Molecular Ecology 28 (15): 3523–43.

Bourque, Guillaume, Kathleen H. Burns, Mary Gehring, Vera Gorbunova, Andrei Seluanov, Molly Hammell, Michaël Imbeault, et al. 2018. “Ten Things You Should Know about Transposable Elements.” Genome Biology 19 (1): 199.

Bozicevic, Vedran, Stephan Hutter, Wolfgang Stephan, and Andreas Wollstein. 2016. “Population Genetic Evidence for Cold Adaptation in European Drosophila melanogaster Populations.” Molecular Ecology 25 (5): 1175–91.

Broggi, Juli, Esa Hohtola, Markku Orell, and Jan-Åke Nilsson. 2005. “Local Adaptation to Winter Conditions in a Passerine Spreading North: A Common-Garden Approach.” Evolution; International Journal of Organic Evolution 59 (7): 1600–1603.

Carroll, Christopher J., Pirjo Isohanni, Rosanna Pöyhönen, Liliya Euro, Uwe Richter, Virginia Brilhante, Alexandra Götz, et al. 2013. “Whole-Exome Sequencing Identifies a Mutation in the Mitochondrial Ribosome Protein MRPL44 to Underlie Mitochondrial Infantile Cardiomyopathy.” Journal of Medical Genetics 50 (3): 151–59.

Caye, Kevin, Basile Jumentier, Johanna Lepeule, and Olivier François. 2019. “LFMM 2: Fast and Accurate Inference of Gene-Environment Associations in Genome-Wide Studies.” Molecular Biology and Evolution 36 (4): 852–60.

Chan, Yingguang Frank, Melissa E. Marks, Felicity C. Jones, Guadalupe Villarreal, Michael D. Shapiro, Shannon D. Brady, Audrey M. Southwick, et al. 2010. “Adaptive Evolution of Pelvic Reduction in Sticklebacks by Recurrent Deletion of a Pitx1 Enhancer.” Science 327 (5963): 302–5.

Chaurushiya, Mira S., and Matthew D. Weitzman. 2009. “Viral Manipulation of DNA Repair and Cell Cycle Checkpoints.” DNA Repair 8 (9): 1166–76.


Chen, Shun, Anchun Cheng, and Mingshu Wang. 2013. “Innate Sensing of Viruses by Pattern Recognition Receptors in Birds.” Veterinary Research 44 (September): 82.

Chen, Yan-Hong, and Huabin Zhao. 2019. “Evolution of Digestive Enzymes and Dietary Diversification in Birds.” PeerJ 7 (April): e6840.

Chevin, Luis-Miguel, Guillaume Martin, and Thomas Lenormand. 2010. “Fisher’s Model and the Genomics of Adaptation: Restricted Pleiotropy, Heterogenous Mutation, and Parallel Evolution.” Evolution; International Journal of Organic Evolution 64 (11): 3213–31.

Cingolani, Pablo, Adrian Platts, Le Lily Wang, Melissa Coon, Tung Nguyen, Luan Wang, Susan J. Land, Xiangyi Lu, and Douglas M. Ruden. 2012. “A Program for Annotating and Predicting the Effects of Single Nucleotide Polymorphisms, SnpEff.” Fly 6 (2): 80–92.

Cirotto, C., F. Panara, and I. Arangi. 1987. “The Minor Haemoglobins of Primitive and Definitive Erythrocytes of the Chicken Embryo. Evidence for Haemoglobin L.” Development 101 (4): 805–13.

Colella, Jocelyn P., Anna Tigano, Olga Dudchenko, Arina D. Omer, Ruqayya Khan, Ivan D. Bochkov, Erez L. Aiden, and Matthew D. MacManes. 2020. “Multiple Evolutionary Pathways to Achieve Thermal Adaptation in Small Mammals.” bioRxiv. https://doi.org/10.1101/2020.06.29.178392.

Colosimo, Pamela F., Kim E. Hosemann, Sarita Balabhadra, Guadalupe Villarreal Jr., Mark Dickson, Jane Grimwood, Jeremy Schmutz, Richard M. Myers, Dolph Schluter, and David M. Kingsley. 2005. “Widespread Parallel Evolution in Sticklebacks by Repeated Fixation of Ectodysplasin Alleles.” Science 307 (5717): 1928–33.

Colosimo, Pamela F., Catherine L. Peichel, Kirsten Nereng, Benjamin K. Blackman, Michael D. Shapiro, Dolph Schluter, and David M. Kingsley. 2004. “The Genetic Architecture of Parallel Armor Plate Reduction in Threespine Sticklebacks.” Edited by Nipam Patel. PLoS Biology 2 (5): e109.

Conte, Gina L., Matthew E. Arnegard, Catherine L. Peichel, and Dolph Schluter. 2012. “The Probability of Genetic Parallelism and Convergence in Natural Populations.” Proceedings of the Royal Society B: Biological Sciences 279 (1749): 5039–47.

Cooper, Jacob C. 2018. “Niche Theory and Its Relation to Morphology and Phenotype in Geographic Space: A Case Study in Woodpeckers (Picidae).” Journal of Avian Biology 49 (10): e01771.

Cresko, William A., Angel Amores, Catherine Wilson, Joy Murphy, Mark Currey, Patrick Phillips, Michael A. Bell, Charles B. Kimmel, and John H. Postlethwait. 2004. “Parallel Genetic Basis for Repeated Evolution of Armor Loss in Alaskan Threespine Stickleback Populations.” Proceedings of the National Academy of Sciences 101 (16): 6050–55.

Daftary, Shabrine S., and Andrea C. Gore. 2005. “IGF-1 in the Brain as a Regulator of Reproductive Neuroendocrine Function.” Experimental Biology and Medicine 230 (5): 292–306.

Damek-Poprawa, Monika, Tanja Diemer, Vanda S. Lopes, Concepción Lillo, Dawn C. Harper, Michael S. Marks, Yalin Wu, et al. 2009. “Melanoregulin (MREG) Modulates Lysosome Function in Pigment Epithelial Cells.” The Journal of Biological Chemistry 284 (16): 10877–89.


Duan, Cunming, and Qijin Xu. 2005. “Roles of Insulin-like Growth Factor (IGF) Binding Proteins in Regulating IGF Actions.” General and Comparative Endocrinology 142 (1-2): 44–52.

Dufort, Matthew J. 2016. “An Augmented Supermatrix Phylogeny of the Avian Family Picidae Reveals Uncertainty Deep in the Family Tree.” Molecular Phylogenetics and Evolution 94 (January): 313–26.

Edwards, Darryl B. 2012. “Immune Investment Is Explained by Sexual Selection and Pace-of-Life, but Not Longevity in Parrots (Psittaciformes).” PloS One 7 (12): e53066.

Elkholi, Rana, Ioana Abraham-Enachescu, Andrew P. Trotta, Camila Rubio-Patiño, Jarvier N. Mohammed, Mark P. A. Luna-Vargas, Jesse D. Gelles, et al. 2019. “MDM2 Integrates Cellular Respiration and Apoptotic Signaling through NDUFS1 and the Mitochondrial Network.” Molecular Cell 74 (3): 452–65.e7.

Emmerson, Elaine, Laura Campbell, Faith C. J. Davies, Nina L. Ross, Gillian S. Ashcroft, Andrée Krust, Pierre Chambon, and Matthew J. Hardman. 2012. “Insulin-like Growth Factor-1 Promotes Wound Healing in Estrogen-Deprived Mice: New Insights into Cutaneous IGF-1R/ERα Cross Talk.” The Journal of Investigative Dermatology 132 (12): 2838–48.

Exposito-Alonso, Moises, Hernán A. Burbano, Oliver Bossdorf, Rasmus Nielsen, and Detlef Weigel. 2019. “Natural Selection on the Arabidopsis thaliana Genome in Present and Future Climates.” Nature 573 (7772): 126–29.

Fang, Bohao, Petri Kemppainen, Paolo Momigliano, Xueyun Feng, and Juha Merilä. 2020. “On the Causes of Geographically Heterogeneous Parallel Evolution in Sticklebacks.” Nature Ecology and Evolution 4 (8): 1105–15.

Fernández, Juan Manuel, Juan Ignacio Areta, and Martjan Lammertink. 2020. “Does Foraging Competition Drive Plumage Convergence in Three Look-Alike Atlantic Forest Woodpecker Species?” Journal of Ornithology. https://doi.org/10.1007/s10336-020-01802-8.

Feschotte, Cédric. 2008. “Transposable Elements and the Evolution of Regulatory Networks.” Nature Reviews. Genetics 9 (5): 397–405.

Foll, Matthieu, Oscar E. Gaggiotti, Josephine T. Daub, Alexandra Vatsiou, and Laurent Excoffier. 2014. “Widespread Signals of Convergent Adaptation to High Altitude in Asia and America.” American Journal of Human Genetics 95 (4): 394–407.

Fraser, Bonnie A., and James R. Whiting. 2020. “What Can Be Learned by Scanning the Genome for Molecular Convergence in Wild Populations?” Annals of the New York Academy of Sciences 1476 (1): 23–42.

Frichot, Eric, and Olivier François. 2015. “LEA: An R Package for Landscape and Ecological Association Studies.” Edited by Brian O’Meara. Methods in Ecology and Evolution / British Ecological Society 6 (8): 925–29.

Frichot, Eric, Sean D. Schoville, Guillaume Bouchard, and Olivier François. 2013. “Testing for Associations between Loci and Environmental Gradients Using Latent Factor Mixed Models.” Molecular Biology and Evolution 30 (7): 1687–99.


Gain, Clément, and Olivier François. 2021. “LEA 3: Factor Models in Population Genetics and Ecological Genomics with R.” Molecular Ecology Resources, February. https://doi.org/10.1111/1755-0998.13366.

Galen, Spencer C., Chandrasekhar Natarajan, Hideaki Moriyama, Roy E. Weber, Angela Fago, Phred M. Benham, Andrea N. Chavez, Zachary A. Cheviron, Jay F. Storz, and Christopher C. Witt. 2015. “Contribution of a Mutational Hot Spot to Hemoglobin Adaptation in High-Altitude Andean House Wrens.” Proceedings of the National Academy of Sciences of the United States of America 112 (45): 13958–63.

Ge, Steven Xijin, Dongmin Jung, and Runan Yao. 2020. “ShinyGO: A Graphical Gene-Set Enrichment Tool for Animals and Plants.” Edited by Alfonso Valencia. Bioinformatics 36 (8): 2628–29.

González, Josefa, Talia L. Karasov, Philipp W. Messer, and Dmitri A. Petrov. 2010. “Genome-Wide Patterns of Adaptation to Temperate Environments Associated with Transposable Elements in Drosophila.” PLoS Genetics 6 (4): e1000905.

Gould, Stephen Jay. 1989. Wonderful Life: The Burgess Shale and the Nature of History. Norton, New York.

Hämälä, Tuomas, Amanda J. Gorton, David A. Moeller, and Peter Tiffin. 2020. “Pleiotropy Facilitates Local Adaptation to Distant Optima in Common Ragweed (Ambrosia artemisiifolia).” PLoS Genetics 16 (3): e1008707.

Hamilton, T. H. 1961. “The Adaptive Significances of Intraspecific Trends of Variation in Wing Length and Body Size among Bird Species.” Evolution; International Journal of Organic Evolution 15 (2): 180.

Hasselquist, Dennis. 2007. “Comparative Immunoecology in Birds: Hypotheses and Tests.” Journal of Ornithology / DO-G 148 (2): 571–82.

Hecker, Nikolai, Virag Sharma, and Michael Hiller. 2019. “Convergent Gene Losses Illuminate Metabolic and Physiological Changes in Herbivores and Carnivores.” Proceedings of the National Academy of Sciences 116 (8): 3036–41.

Hellström, Ann, David Ley, Ingrid Hansen-Pupp, Boubou Hallberg, Luca A. Ramenghi, Chatarina Löfqvist, Lois E. H. Smith, and Anna-Lena Hård. 2016. “Role of Insulinlike Growth Factor 1 in Fetal Development and in the Early Postnatal Life of Premature Infants.” American Journal of Perinatology 33 (11): 1067–71.

Hijmans, Robert J., Susan E. Cameron, Juan L. Parra, Peter G. Jones, and Andy Jarvis. 2005. “Very High Resolution Interpolated Climate Surfaces for Global Land Areas.” International Journal of Climatology 25 (15): 1965–78.

Hoban, Sean, Joanna L. Kelley, Katie E. Lotterhos, Michael F. Antolin, Gideon Bradburd, David B. Lowry, Mary L. Poss, Laura K. Reed, Andrew Storfer, and Michael C. Whitlock. 2016. “Finding the Genomic Basis of Local Adaptation: Pitfalls, Practical Solutions, and Future Directions.” The American Naturalist 188 (4): 379–97.

Hoffmann, Federico G., Juan C. Opazo, and Jay F. Storz. 2012. “Whole-Genome Duplications Spurred the Functional Diversification of the Globin Gene Superfamily in Vertebrates.” Molecular Biology and Evolution 29 (1): 303–12.

Hohenlohe, Paul A., Susan Bassham, Paul D. Etter, Nicholas Stiffler, Eric A. Johnson, and William A. Cresko. 2010. “Population Genomics of Parallel Adaptation in Threespine Stickleback Using Sequenced RAD Tags.” PLoS Genetics 6 (2): e1000862.

Holliday, Jason A., Lecong Zhou, Rajesh Bawa, Man Zhang, and Regis W. Oubida. 2016. “Evidence for Extensive Parallelism but Divergent Genomic Architecture of Adaptation along Altitudinal and Latitudinal Gradients in Populus trichocarpa.” The New Phytologist 209 (3): 1240–51.

Howell, Arthur Holmes. 1911. Birds of Arkansas. U.S. Department of Agriculture, Biological Survey.

Hwang, Christopher K., Shyam S. Chaurasia, Chad R. Jackson, Guy C-K Chan, Daniel R. Storm, and P. Michael Iuvone. 2013. “Circadian Rhythm of Contrast Sensitivity Is Regulated by a Dopamine-Neuronal PAS-Domain Protein 2-Adenylyl Cyclase 1 Signaling Pathway in Retinal Ganglion Cells.” The Journal of Neuroscience: The Official Journal of the Society for Neuroscience 33 (38): 14989–97.

Ikehara, T., Y. Eguchi, S. Kayo, and H. Takei. 1997. “Isolation and Sequencing of Two Alpha-Globin Genes alpha(A) and alpha(D) in Pigeon and Evidence for Embryo-Specific Expression of the alpha(D)-Globin Gene.” Biochemical and Biophysical Research Communications 234 (2): 450–53.

Irene Tieleman, B., Joseph B. Williams, Robert E. Ricklefs, and Kirk C. Klasing. 2005. “Constitutive Innate Immunity Is a Component of the Pace-of-Life Syndrome in Tropical Birds.” Proceedings. Biological Sciences / The Royal Society 272 (1573): 1715–20.

Jackson, Jason M., Meaghan L. Pimsler, Kennan J. Oyen, James P. Strange, Michael E. Dillon, and Jeffrey D. Lozier. 2020. “Local Adaptation across a Complex Bioclimatic Landscape in Two Montane Bumble Bee Species.” Molecular Ecology 29 (5): 920–39.

James, Frances C. 1970. “Geographic Size Variation in Birds and Its Relationship to Climate.” Ecology 51 (3): 365–90.

Jarvis, Erich D., Siavash Mirarab, Andre J. Aberer, B. Bo Li, Peter Houde, Cai Li, et al. 2014. “Whole-Genome Analyses Resolve Early Branches in the Tree of Life of Modern Birds.” Science 346 (6215): 1320–31.

Jones, J. I., and D. R. Clemmons. 1995. “Insulin-like Growth Factors and Their Binding Proteins: Biological Actions.” Endocrine Reviews 16 (1): 3–34.

Jordan, Martha S., Andrew L. Singer, and Gary A. Koretzky. 2003. “Adaptors as Central Mediators of Signal Transduction in Immune Cells.” Nature Immunology 4 (2): 110–16.

Kajimura, Shingo, Katsumi Aida, and Cunming Duan. 2006. “Understanding Hypoxia-Induced Gene Expression in Early Development: In Vitro and in Vivo Analysis of Hypoxia-Inducible Factor 1-Regulated Zebra Fish Insulin-like Growth Factor Binding Protein 1 Gene Expression.” Molecular and Cellular Biology 26 (3): 1142–55.

Kawai, Masanobu, and Clifford J. Rosen. 2009. “Insulin-like Growth Factor-I and Bone: Lessons from Mice and Men.” Pediatric Nephrology 24 (7): 1277–85.

Kendeigh, S. Charles, and Charles R. Blem. 1974. “Metabolic Adaptation to Local Climate in Birds.” Comparative Biochemistry and Physiology. Part A, Physiology 48 (1): 175–87.

Kocamis, H., D. C. Kirkpatrick-Keller, H. Klandorf, and J. Killefer. 1998. “In Ovo Administration of Recombinant Human Insulin-like Growth Factor-I Alters Postnatal Growth and Development of the Broiler Chicken.” Poultry Science 77 (12): 1913–19.

Kohl, Kevin D., Paweł Brzęk, Enrique Caviedes-Vidal, and William H. Karasov. 2011. “Pancreatic and Intestinal Carbohydrases Are Matched to Dietary Starch Level in Wild Passerine Birds.” Physiological and Biochemical Zoology: PBZ 84 (2): 195–203.

Konečná, Veronika, Sian Bray, Jakub Vlček, Magdalena Bohutínská, Doubravka Požárová, Rimjhim Roy Choudhury, Anita Bollmann-Giolai, et al. 2021. “Parallel Adaptation in Autopolyploid Arabidopsis arenosa Is Dominated by Repeated Recruitment of Shared Alleles.” bioRxiv, 2021.01.15.426785.

Lamichhaney, Sangeet, Angela P. Fuentes-Pardo, Nima Rafati, Nils Ryman, Gregory R. McCracken, Christina Bourne, Rabindra Singh, Daniel E. Ruzzante, and Leif Andersson. 2017. “Parallel Adaptive Evolution of Geographically Distant Herring Populations on Both Sides of the North Atlantic Ocean.” Proceedings of the National Academy of Sciences 114 (17): E3452–61.

Lammertink, Martjan, Cecilia Kopuchian, Hanja B. Brandl, Pablo L. Tubaro, and Hans Winkler. 2016. “A Striking Case of Deceptive Woodpecker Colouration: The Threatened Helmeted Woodpecker Dryocopus galeatus Belongs in the Genus Celeus.” Journal of Ornithology / DO-G 157 (1): 109–16.

Laviola, Luigi, Annalisa Natalicchio, and Francesco Giorgino. 2007. “The IGF-I Signaling Pathway.” Current Pharmaceutical Design 13 (7): 663–69.

Leube, R. E., B. L. Bader, F. X. Bosch, R. Zimbelmann, T. Achtstaetter, and W. W. Franke. 1988. “Molecular Characterization and Expression of the Stratification-Related Cytokeratins 4 and 15.” The Journal of Cell Biology 106 (4): 1249–61.

Levano, Kelly, Vineet Punia, Michael Raghunath, Priya Ranjan Debata, Gina Marie Curcio, Amit Mogha, Sudarshana Purkayastha, Dan McCloskey, Jimmie Fata, and Probal Banerjee. 2012. “ATP8A1 Deficiency Is Associated with Phosphatidylserine Externalization in Hippocampus and Delayed Hippocampus-Dependent Learning.” Journal of Neurochemistry 120 (2): 302–13.

Liknes, Eric T., and David L. Swanson. 1996. “Seasonal Variation in Cold Tolerance, Basal Metabolic Rate, and Maximal Capacity for Thermogenesis in White-Breasted Nuthatches Sitta carolinensis and Downy Woodpeckers Picoides pubescens, Two Unrelated Arboreal Temperate Residents.” Journal of Avian Biology 27 (4): 279–88.

Lim, Marisa C. W., Christopher C. Witt, Catherine H. Graham, and Liliana M. Dávalos. 2019. “Parallel Molecular Evolution in Pathways, Genes, and Sites in High-Elevation Hummingbirds Revealed by Comparative Transcriptomics.” Edited by Balazs Papp. Genome Biology and Evolution 11 (6): 1573–85.

Lindstedt, Stan L., and Mark S. Boyce. 1985. “Seasonality, Fasting Endurance, and Body Size in Mammals.” The American Naturalist 125 (6): 873–78.

Lindström, Karin M., Johannes Foufopoulos, Henrik Pärn, and Martin Wikelski. 2004. “Immunological Investments Reflect Parasite Abundance in Island Populations of Darwin’s Finches.” Proceedings. Biological Sciences / The Royal Society 271 (1547): 1513–19.

Lobkovsky, Alexander E., and Eugene V. Koonin. 2012. “Replaying the Tape of Life: Quantification of the Predictability of Evolution.” Frontiers in Genetics 3 (NOV): 1–8.

Lodjak, Jaanis, Marko Mägi, Elin Sild, and Raivo Mänd. 2017. “Causal Link between Insulin‐like Growth Factor 1 and Growth in Nestlings of a Wild Passerine Bird.” Functional Ecology 31 (1): 184–91.

Lodjak, Jaanis, Marko Mägi, and Vallo Tilgar. 2014. “Insulin-like Growth Factor 1 and Growth Rate in Nestlings of a Wild Passerine Bird.” Edited by Jennifer Grindstaff. Functional Ecology 28 (1): 159–66.

Lodjak, Jaanis, Raivo Mänd, and Marko Mägi. 2018. “Insulin-like Growth Factor 1 and Life-History Evolution of Passerine Birds.” Functional Ecology 32 (2): 313–23.

Lodjak, Jaanis, and Simon Verhulst. 2020. “Insulin-like Growth Factor 1 of Wild Vertebrates in a Life-History Context.” Molecular and Cellular Endocrinology 518 (December): 110978.

Lowell, Clifford A. 2004. “SRC-Family Kinases: Rheostats of Immune Cell Signaling.” Molecular Immunology 41 (6-7): 631–43.

Luftig, Micah A. 2014. “Viruses and the DNA Damage Response: Activation and Antagonism.” Annual Review of Virology 1 (1): 605–25.

Lupu, F., J. D. Terwilliger, K. Lee, G. V. Segre, and A. Efstratiadis. 2001. “Roles of Growth Hormone and Insulin-like Growth Factor 1 in Mouse Postnatal Growth.” Developmental Biology 229 (1): 141–62.

Luu, Keurcien, Eric Bazin, and Michael G. B. Blum. 2017. “Pcadapt: An R Package to Perform Genome Scans for Selection Based on Principal Component Analysis.” Molecular Ecology Resources 17 (1): 67–77.

Magalhaes, Isabel S., James R. Whiting, Daniele D’Agostino, Paul A. Hohenlohe, Muayad Mahmud, Michael A. Bell, Skúli Skúlason, and Andrew D. C. MacColl. 2020. “Intercontinental Genomic Parallelism in Multiple Three-Spined Stickleback Adaptive Radiations.” Nature Ecology & Evolution, November. https://doi.org/10.1038/s41559-020-01341-8.

Mahr, Katharina, Orsolya Vincze, Zsófia Tóth, Herbert Hoi, and Ádám Z. Lendvai. 2020. “Insulin-like Growth Factor 1 Is Related to the Expression of Plumage Traits in a Passerine Species.” Behavioral Ecology and Sociobiology 74 (3): 39.

Mann, Karlheinz, and Matthias Mann. 2008. “The Chicken Egg Yolk Plasma and Granule Proteomes.” Proteomics 8 (1): 178–91.

Manthey, Joseph D., Robert G. Moyle, and Stéphane Boissinot. 2018. “Multiple and Independent Phases of Transposable Element Amplification in the Genomes of Piciformes (woodpeckers and Allies).” Genome Biology and Evolution 10 (6): 1445–56.

Martin, Arnaud, and Virginie Orgogozo. 2013. “The Loci of Repeated Evolution: A Catalog of Genetic Hotspots of Phenotypic Variation.” Evolution; International Journal of Organic Evolution 67 (5): 1235–50.

Martinez Barrio, Alvaro, Sangeet Lamichhaney, Guangyi Fan, Nima Rafati, Mats Pettersson, He Zhang, Jacques Dainat, et al. 2016. “The Genetic Basis for Ecological Adaptation of the Atlantic Herring Revealed by Genome Sequencing.” eLife 5 (May): 1–32.

Martínez-Ortega, Cristina, Eduardo Sa Santos, and Diego Gil. 2014. “Species-Specific Differences in Relative Eye Size Are Related to Patterns of Edge Avoidance in an Amazonian Rainforest Bird Community.” Ecology and Evolution 4 (19): 3736–45.

Martin, Lynn B., II, Monica Pless, Julia Svoboda, and Martin Wikelski. 2004. “Immune Activity in Temperate and Tropical House Sparrows: A Common Garden Experiment.” Ecology 85 (8): 2323–31.

Maures, Travis J., and Cunming Duan. 2002. “Structure, Developmental Expression, and Physiological Regulation of Zebrafish IGF Binding Protein-1.” Endocrinology 143 (7): 2722–31.

McCracken, K. G., C. P. Barger, M. Bulgarella, K. P. Johnson, S. A. Sonsthagen, J. Trucco, T. H. Valqui, R. E. Wilson, K. Winker, and M. D. Sorenson. 2009. “Parallel Evolution in the Major Haemoglobin Genes of Eight Species of Andean Waterfowl.” Molecular Ecology 18 (19): 3992–4005.

McNab, Brian K. 1971. “On the Ecological Significance of Bergmann’s Rule.” Ecology 52 (5): 845–54.

Miller, Eliot T., Gavin M. Leighton, Benjamin G. Freeman, Alexander C. Lees, and Russell A. Ligon. 2019. “Ecological and Geographical Overlap Drive Plumage Evolution and Mimicry in Woodpeckers.” Nature Communications 10 (1): 1602.

Mohan, Subburaman, and Chandrasekhar Kesavan. 2012. “Role of Insulin-like Growth Factor-1 in the Regulation of Skeletal Growth.” Current Osteoporosis Reports 10 (2): 178–86.

Møller, Anders P. 1998. “Evidence of Larger Impact of Parasites on Hosts in the Tropics: Investment in Immune Function within and Outside the Tropics.” Oikos 82 (2): 265–70.

Moore, K. J., D. A. Swing, E. M. Rinchik, M. L. Mucenski, A. M. Buchberg, N. G. Copeland, and N. A. Jenkins. 1988. “The Murine Dilute Suppressor Gene Dsu Suppresses the Coat-Color Phenotype of Three Pigment Mutations That Alter Melanocyte Morphology, D, Ash and Ln.” Genetics 119 (4): 933–41.

Murphy, Edward C. 1985. “Bergmann’s Rule, Seasonality, and Geographic Variation in Body Size of House Sparrows.” Evolution; International Journal of Organic Evolution 39 (6): 1327–34.

Natarajan, Chandrasekhar, Federico G. Hoffmann, Roy E. Weber, Angela Fago, Christopher C. Witt, and Jay F. Storz. 2016. “Predictable Convergence in Hemoglobin Function Has Unpredictable Molecular Underpinnings.” Science 354 (6310): 336–39.

Natarajan, Chandrasekhar, Joana Projecto-Garcia, Hideaki Moriyama, Roy E. Weber, Violeta Muñoz-Fuentes, Andy J. Green, Cecilia Kopuchian, et al. 2015. “Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.” Edited by David J. Begun. PLoS Genetics 11 (12): e1005681.

Neff, Johnson Andrew. 1928. A Study of the Economic Status of the Common Woodpeckers in Relation to Oregon Horticulture. Marionville, MO: Free Press Print.

Nelson, R. T., J. Boyd, R. P. Gladue, T. Paradis, R. Thomas, A. C. Cunningham, P. Lira, et al. 2001. “Genomic Organization of the CC Chemokine Mip-3alpha/CCL20/larc/exodus/SCYA20, Showing Gene Structure, Splice Variants, and Chromosome Localization.” Genomics 73 (1): 28–37.

Ng, Chen Siang, and Wen-Hsiung Li. 2018. “Genetic and Molecular Basis of Feather Diversity in Birds.” Edited by Esther Betran. Genome Biology and Evolution 10 (10): 2572–86.

Ni, Yang, Muhammad A. Hagras, Vassiliki Konstantopoulou, Johannes A. Mayr, Alexei A. Stuchebrukhov, and David Meierhofer. 2019. “Mutations in NDUFS1 Cause Metabolic Reprogramming and Disruption of the Electron Transfer.” Cells 8 (10). https://doi.org/10.3390/cells8101149.

Ohbayashi, Norihiko, Yuto Maruta, Morié Ishida, and Mitsunori Fukuda. 2012. “Melanoregulin Regulates Retrograde Melanosome Transport through Interaction with the RILP-p150Glued Complex in Melanocytes.” Journal of Cell Science 125 (Pt 6): 1508–18.

Oh, Kevin P., Cameron L. Aldridge, Jennifer S. Forbey, Carolyn Y. Dadabay, and Sara J. Oyler-McCance. 2019. “Conservation Genomics in the Sagebrush Sea: Population Divergence, Demographic History, and Local Adaptation in Sage-Grouse (Centrocercus spp.).” Edited by Charles Baer. Genome Biology and Evolution 11 (7): 2023–34.

O’Kusky, J. R., P. Ye, and A. J. D’Ercole. 2000. “Insulin-like Growth Factor-I Promotes Neurogenesis and Synaptogenesis in the Hippocampal Dentate Gyrus during Postnatal Development.” The Journal of Neuroscience: The Official Journal of the Society for Neuroscience 20 (22): 8435–42.

Olsen, Björn, Vincent J. Munster, Anders Wallensten, J. Waldenstrom, A. D. M. E. Osterhaus, and R. A. M. Fouchier. 2006. “Global Patterns of Influenza A Virus in Wild Birds.” Science 312 (5772): 384–88.

Olson, Valérie A., Richard G. Davies, C. David L. Orme, Gavin H. Thomas, Shai Meiri, Tim M. Blackburn, Kevin J. Gaston, Ian P. F. Owens, and Peter M. Bennett. 2009. “Global Biogeography and Ecology of Body Size in Birds.” Ecology Letters 12 (3): 249–59.

Opazo, Juan C., Federico G. Hoffmann, Chandrasekhar Natarajan, Christopher C. Witt, Michael Berenbrink, and Jay F. Storz. 2015. “Gene Turnover in the Avian Globin Gene Families and Evolutionary Changes in Hemoglobin Isoform Expression.” Molecular Biology and Evolution 32 (4): 871–87.

Orgogozo, Virginie, Baptiste Morizot, and Arnaud Martin. 2015. “The Differential View of Genotype-Phenotype Relationships.” Frontiers in Genetics 6 (MAY): 1–14.

O’Sullivan, T. Norene, Xufeng S. Wu, Rivka A. Rachel, Jiang-Dong Huang, Deborah A. Swing, Lydia E. Matesic, John A. Hammer 3rd, Neal G. Copeland, and Nancy A. Jenkins. 2004. “Dsu Functions in a MYO5A-Independent Pathway to Suppress the Coat Color of Dilute Mice.” Proceedings of the National Academy of Sciences of the United States of America 101 (48): 16831–36.

Otto, Anthony, and Ketan Patel. 2010. “Signalling and the Control of Skeletal Muscle Size.” Experimental Cell Research 316 (18): 3059–66.

Ouellet, Henri Roger. 1977. “Biosystematics and Ecology of Picoides villosus (L.) and P. pubescens (L.) (Aves Picidae).” McGill University.

Pagenkopp, K. M., J. Klicka, K. L. Durrant, J. C. Garvin, and R. C. Fleischer. 2008. “Geographic Variation in Malarial Parasite Lineages in the Common Yellowthroat (Geothlypis trichas).” Conservation Genetics 9 (6): 1577–88.

Pearce, Eiluned, and Robin Dunbar. 2012. “Latitudinal Variation in Light Levels Drives Human Visual System Size.” Biology Letters 8 (1): 90–93.

Perrini, Sebastio, Luigi Laviola, Marcos C. Carreira, Angelo Cignarelli, Annalisa Natalicchio, and Francesco Giorgino. 2010. “The GH/IGF1 Axis and Signaling Pathways in the Muscle and Bone: Mechanisms Underlying Age-Related Skeletal Muscle Wasting and Osteoporosis.” The Journal of Endocrinology 205 (3): 201–10.

Phifer-Rixey, Megan, Ke Bi, Kathleen G. Ferris, Michael J. Sheehan, Dana Lin, Katya L. Mack, Sara M. Keeble, Taichi A. Suzuki, Jeffrey M. Good, and Michael W. Nachman. 2018. “The Genomic Basis of Environmental Adaptation in House Mice.” Edited by Bret A. Payseur. PLoS Genetics 14 (9): e1007672.

Piersma, Theunis. 1997. “Do Global Patterns of Habitat Use and Migration Strategies Co-Evolve with Relative Investments in Immunocompetence due to Spatial Variation in Parasite Pressure?” Oikos 80 (3): 623–31.

Pool, John E., Dylan T. Braun, and Justin B. Lack. 2016. “Parallel Evolution of Cold Tolerance within Drosophila melanogaster.” Molecular Biology and Evolution 34 (2): msw232.

Pörtner, Hans O. 2004. “Climate Variability and the Energetic Pathways of Evolution: The Origin of Endothermy in Mammals and Birds.” Physiological and Biochemical Zoology: PBZ 77 (6): 959–81.

Privé, Florian, Keurcien Luu, Bjarni J. Vilhjálmsson, and Michael G. B. Blum. 2020. “Performing Highly Efficient Genome Scans for Local Adaptation with R Package Pcadapt Version 4.” Molecular Biology and Evolution 37 (7): 2153–54.

Projecto-Garcia, Joana, Chandrasekhar Natarajan, Hideaki Moriyama, Roy E. Weber, Angela Fago, Zachary A. Cheviron, Robert Dudley, Jimmy A. McGuire, Christopher C. Witt, and Jay F. Storz. 2013. “Repeated Elevational Transitions in Hemoglobin Function during the Evolution of Andean Hummingbirds.” Proceedings of the National Academy of Sciences 110 (51): 20669–74.

Prum, Richard Owen, and Larry Samuelson. 2012. “The Hairy–Downy Game: A Model of Interspecific Social Dominance Mimicry.” Journal of Theoretical Biology 313 (November): 42–60.

Prüter, Hanna, Mathias Franz, Sönke Twietmeyer, Niklas Böhm, Gudrun Middendorff, Ruben Portas, Jörg Melzheimer, et al. 2020. “Increased Immune Marker Variance in a Population of Invasive Birds.” Scientific Reports 10 (1): 21764.

Quintana-Murci, Lluís, and Andrew G. Clark. 2013. “Population Genetic Tools for Dissecting Innate Immunity in Humans.” Nature Reviews. Immunology 13 (4): 280–93.

Rajpathak, Swapnil N., Marc J. Gunter, Judith Wylie-Rosett, Gloria Y. F. Ho, Robert C. Kaplan, Radhika Muzumdar, Thomas E. Rohan, and Howard D. Strickler. 2009. “The Role of Insulin-like Growth Factor-I and Its Binding Proteins in Glucose Homeostasis and Type 2 Diabetes.” Diabetes/metabolism Research and Reviews 25 (1): 3–12.

Rand, A. L. 1961. “Some Size Gradients in North American Birds.” The Wilson Bulletin 73 (1): 46–56.

Randall, Richard E., and Stephen Goodbourn. 2008. “Interferons and Viruses: An Interplay between Induction, Signalling, Antiviral Responses and Virus Countermeasures.” The Journal of General Virology 89 (Pt 1): 1–47.

Ravinet, Mark, Tore Oldeide Elgvin, Cassandra Trier, Mansour Aliabadian, Andrey Gavrilov, and Glenn-Peter Sætre. 2018. “Signatures of Human-Commensalism in the House Sparrow Genome.” Proceedings of the Royal Society B: Biological Sciences 285 (1884): 20181246.

Ricklefs, R. E. 1992. “Embryonic Development Period and the Prevalence of Avian Blood Parasites.” Proceedings of the National Academy of Sciences of the United States of America 89 (10): 4722–25.

Röhrl, Johann, De Yang, Joost J. Oppenheim, and Thomas Hehlgans. 2010. “Specific Binding and Chemotactic Activity of mBD4 and Its Functional Orthologue hBD2 to CCR6-Expressing Cells.” The Journal of Biological Chemistry 285 (10): 7028–34.

Rosenblum, Erica Bree, Christine E. Parent, and Erin E. Brandt. 2014. “The Molecular Basis of Phenotypic Convergence.” Annual Review of Ecology, Evolution, and Systematics 45 (1): 203–26.

Rout, Ashok K., Xufeng Wu, Mary R. Starich, Marie-Paule Strub, John A. Hammer, and Nico Tjandra. 2018. “The Structure of Melanoregulin Reveals a Role for Cholesterol Recognition in the Protein’s Ability to Promote Dynein Function.” Structure 26 (10): 1373–83.e4.

Rubenstein, Dustin R., A. F. Parlow, Chelsea R. Hutch, and Lynn B. Martin 2nd. 2008. “Environmental and Hormonal Correlates of Immune Activity in a Cooperatively Breeding Tropical Bird.” General and Comparative Endocrinology 159 (1): 10–15.

Savolainen, Outi, Martin Lascoux, and Juha Merilä. 2013. “Ecological Genomics of Local Adaptation.” Nature Reviews. Genetics 14 (11): 807–20.

Schneider, Wolfgang J. 2009. “Receptor-Mediated Mechanisms in Ovarian Follicle and Oocyte Development.” General and Comparative Endocrinology 163 (1-2): 18–23.

Schrader, Lukas, Jay W. Kim, Daniel Ence, Aleksey Zimin, Antonia Klein, Katharina Wyschetzki, Tobias Weichselgartner, et al. 2014. “Transposable Element Islands Facilitate Adaptation to Novel Environments in an Invasive Species.” Nature Communications 5 (December): 5495.

Schutyser, E., S. Struyf, P. Menten, J. P. Lenaerts, R. Conings, W. Put, A. Wuyts, P. Proost, and J. Van Damme. 2000. “Regulated Production and Molecular Diversity of Human Liver and Activation-Regulated Chemokine/Macrophage Inflammatory Protein-3 Alpha from Normal and Transformed Cells.” Journal of Immunology 165 (8): 4470–77.

Shakya, Subir B., Jérôme Fuchs, Jean-Marc Pons, and Frederick H. Sheldon. 2017. “Tapping the Woodpecker Tree for Evolutionary Insight.” Molecular Phylogenetics and Evolution 116 (November): 182–91.

Shultz, Allison J., and Timothy B. Sackton. 2019. “Immune Genes Are Hotspots of Shared Positive Selection across Birds and Mammals.” eLife 8 (January): 1–33.

Siepielski, Adam M., Michael B. Morrissey, Mathieu Buoro, Stephanie M. Carlson, Christina M. Caruso, Sonya M. Clegg, Tim Coulson, et al. 2017. “Precipitation Drives Global Variation in Natural Selection.” Science 355 (6328): 959–62.

Sironi, Manuela, Rachele Cagliani, Diego Forni, and Mario Clerici. 2015. “Evolutionary Insights into Host-Pathogen Interactions from Mammalian Sequence Data.” Nature Reviews. Genetics 16 (4): 224–36.

Sol, Daniel, Roger Jovani, and Jordi Torres. 2000. “Geographical Variation in Blood Parasites in Feral Pigeons: The Role of Vectors.” Ecography 23 (3): 307–14.

Steen, Jon. 1958. “Climatic Adaptation in Some Small Northern Birds.” Ecology 39 (4): 625–29.

Stern, David L., and Virginie Orgogozo. 2008. “The Loci of Evolution: How Predictable Is Genetic Evolution?” Evolution; International Journal of Organic Evolution 62 (9): 2155–77.

Storz, Jay F. 2016. “Causes of Molecular Convergence and Parallelism in Protein Evolution.” Nature Reviews. Genetics 17 (4): 239–50.

Storz, Jay F. 2018. Hemoglobin: Insights into Protein Structure, Function, and Evolution. Oxford University Press.

Storz, Jay F., Juan C. Opazo, and Federico G. Hoffmann. 2011. “Phylogenetic Diversification of the Globin Gene Superfamily in Chordates.” IUBMB Life 63 (5): 313–22.

Supek, Fran, Matko Bošnjak, Nives Škunca, and Tomislav Šmuc. 2011. “REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms.” PloS One 6 (7): e21800.

Swanson, David L. 2006. “A Comparative Analysis of Thermogenic Capacity and Cold Tolerance in Small Birds.” The Journal of Experimental Biology 209 (3): 466–74.

Szenker-Ravi, Emmanuelle, Umut Altunoglu, Marc Leushacke, Célia Bosso-Lefèvre, Muznah Khatoo, Hong Thi Tran, Thomas Naert, et al. 2018. “RSPO2 Inhibition of RNF43 and ZNRF3 Governs Limb Development Independently of LGR4/5/6.” Nature 557 (7706): 564–69.

Terekhanova, Nadezhda V., Anna E. Barmintseva, Alexey S. Kondrashov, Georgii A. Bazykin, and Nikolai S. Mugue. 2019. “Architecture of Parallel Adaptation in Ten Lacustrine Threespine Stickleback Populations from the White Sea Area.” Edited by Mar Alba. Genome Biology and Evolution 11 (9): 2605–18.

Thomas, Robert J., Tamás Székely, Innes C. Cuthill, David G. C. Harper, Stuart E. Newson, Tim D. Frayling, and Paul D. Wallis. 2002. “Eye Size in Birds and the Timing of Song at Dawn.” Proceedings. Biological Sciences / The Royal Society 269 (1493): 831–37.

Tieleman, B. Irene. 2018. “Understanding Immune Function as a Pace of Life Trait Requires Environmental Context.” Behavioral Ecology and Sociobiology 72 (3): 55.

Tolson, Antonia H., and Hongbing Wang. 2010. “Regulation of Drug-Metabolizing Enzymes by Xenobiotic Receptors: PXR and CAR.” Advanced Drug Delivery Reviews 62 (13): 1238–49.

Törönen, Petri, Alan Medlar, and Liisa Holm. 2018. “PANNZER2: A Rapid Functional Annotation Web Server.” Nucleic Acids Research 46 (W1): W84–88.

Turner, Stephen D. 2014. “Qqman: An R Package for Visualizing GWAS Results Using Q-Q and Manhattan Plots.” Cold Spring Harbor Laboratory. https://doi.org/10.1101/005165.

Upton, Z., S. J. Chan, D. F. Steiner, J. C. Wallace, and F. J. Ballard. 1993. “Evolution of Insulin-like Growth Factor Binding Proteins.” Growth Regulation 3 (1): 29–32.

Vasemägi, A., and C. R. Primmer. 2005. “Challenges for Identifying Functionally Important Genetic Variation: The Promise of Combining Complementary Research Strategies.” Molecular Ecology 14 (12): 3623–42.

Velová, Hana, Maria W. Gutowska-Ding, David W. Burt, and Michal Vinkler. 2018. “Toll-Like Receptor Evolution in Birds: Gene Duplication, Pseudogenization, and Diversifying Selection.” Edited by Meredith Yeager. Molecular Biology and Evolution 35 (9): 2170–84.

Wagner, Günter P., and Jianzhi Zhang. 2011. “The Pleiotropic Structure of the Genotype–Phenotype Map: The Evolvability of Complex Organisms.” Nature Reviews. Genetics 12 (3): 204–13.

Walden, Nora, Kay Lucek, and Yvonne Willi. 2020. “Lineage-Specific Adaptation to Climate Involves Flowering Time in North American Arabidopsis lyrata.” Molecular Ecology 29 (8): 1436–51.

Walsh, Jennifer, Phred M. Benham, Petra E. Deane-Coe, Peter Arcese, Bronwyn G. Butcher, Yvonne L. Chan, Zachary A. Cheviron, et al. 2019. “Genomics of Rapid Ecological Divergence and Parallel Adaptation in Four Tidal Marsh Sparrows.” Evolution Letters 3 (4): 324–38.

Walters, Sheree J., Todd P. Robinson, Margaret Byrne, Grant W. Wardell‐Johnson, and Paul Nevill. 2020. “Contrasting Patterns of Local Adaptation along Climatic Gradients between a Sympatric Parasitic and Autotrophic Tree Species.” Molecular Ecology, no. March (July): mec.15537.

Wang, Guo-Dong, Weiwei Zhai, He-Chuan Yang, Ruo-Xi Fan, Xue Cao, Li Zhong, Lu Wang, et al. 2013. “The Genomics of Selection in Dogs and the Parallel Evolution between Dogs and Humans.” Nature Communications 4 (May): 1860.

Wang, Guo-Song, He-He Liu, Lin-Seng Li, and Ji-Wen Wang. 2012. “Influence of Ovo Injecting IGF-1 on Weights of Embryo, Heart and Liver of Duck during Hatching Stages.” free-journal.umm.ac.id. 2012.

Wang, Yanjiu, Jingmin Huang, Peng Xia, Jianbin He, Changfa Wang, Zhihua Ju, Jianbin Li, Rongling Li, Jifeng Zhong, and Qiuling Li. 2013. “Genetic Variations of HSBP1 Gene and Its Effect on Thermal Performance Traits in Chinese Holstein Cattle.” Molecular Biology Reports 40 (6): 3877–82.

Wang, Zhi, B-Y Liao, and Jianzhi Zhang. 2010. “Genomic Patterns of Pleiotropy and the Evolution of Complexity.” Proceedings of the National Academy of Sciences 107 (42): 18034–39.

Weibel, Amy C., and William S. Moore. 2005. “Plumage Convergence in Picoides Woodpeckers Based on a Molecular Phylogeny, with Emphasis on Convergence in Downy and Hairy Woodpeckers.” The Condor 107 (4): 797–809.

Wikelski, Martin, Laura Spinney, Wendy Schelsky, Alexander Scheuerlein, and Eberhard Gwinner. 2003. “Slow Pace of Life in Tropical Sedentary Birds: A Common-Garden Experiment on Four Stonechat Populations from Different Latitudes.” Proceedings of the Royal Society B: Biological Sciences 270 (1531): 2383–88.

Wu, Xufeng S., Jose A. Martina, and John A. Hammer 3rd. 2012. “Melanoregulin Is Stably Targeted to the Melanosome Membrane by Palmitoylation.” Biochemical and Biophysical Research Communications 426 (2): 209–14.

Wu, Xufeng S., Andreas Masedunskas, Roberto Weigert, Neal G. Copeland, Nancy A. Jenkins, and John A. Hammer. 2012. “Melanoregulin Regulates a Shedding Mechanism That Drives Melanosome Transfer from Melanocytes to Keratinocytes.” Proceedings of the National Academy of Sciences of the United States of America 109 (31): E2101–9.

Wu, Z. L., S. A. Thomas, E. C. Villacres, Z. Xia, M. L. Simmons, C. Chavkin, R. D. Palmiter, and D. R. Storm. 1995. “Altered Behavior and Long-Term Potentiation in Type I Adenylyl Cyclase Mutant Mice.” Proceedings of the National Academy of Sciences of the United States of America 92 (1): 220–24.

Yakar, Shoshana, Clifford J. Rosen, Wesley G. Beamer, Cheryl L. Ackert-Bicknell, Yiping Wu, Jun-Li Liu, Guck T. Ooi, et al. 2002. “Circulating Levels of IGF-1 Directly Regulate Bone Growth and Density.” The Journal of Clinical Investigation 110 (6): 771–81.

Yeaman, Sam, Kathryn A. Hodgins, Katie E. Lotterhos, Haktan Suren, Simon Nadeau, Jon C. Degner, Kristin A. Nurkowski, et al. 2016. “Convergent Local Adaptation to Climate in Distantly Related Conifers.” Science 353 (6306): 23–26.

Yuan, Meng, and John R. Stinchcombe. 2020. “Population Genomics of Parallel Adaptation.” Molecular Ecology 29 (21): 4033–36.

Yu, Mingke, Zhicao Yue, Ping Wu, Da-Yu Wu, Julie-Ann Mayer, Marcus Medina, Randall B. Widelitz, Ting-Xin Jiang, and Cheng-Ming Chuong. 2004. “The Biology of Feather Follicles.” The International Journal of Developmental Biology 48 (2-3): 181–91.

Yusuf, Leeban, Matthew C. Heatley, Joseph P. G. Palmer, Henry J. Barton, Christopher R. Cooney, and Toni I. Gossmann. 2020. “Noncoding Regions Underpin Avian Bill Shape Diversification at Macroevolutionary Scales.” Genome Research 30 (4): 553–65.

Zhang, Guojie, C. Li, Q. Li, B. Bo Li, Denis M. Larkin, Chul Lee, Jay F. Storz, et al. 2014. “Comparative Genomics Reveals Insights into Avian Genome Evolution and Adaptation.” Science 346 (6215): 1311–20.

Zhang, Hong-Yan, Xue-Qi Liu, Yu-Na Sun, et al. 2010. “Expression, Purification and Crystallization of Heat Shock Factor Binding Protein 1.” Prog Biochem Biophys 37: 441–44.

Zhao, Yanling, Li Liang, Yihui Fan, Surong Sun, Lei An, Zhongcheng Shi, Jin Cheng, et al. 2012. “PPM1B Negatively Regulates Antiviral Response via Dephosphorylating TBK1.” Cellular Signalling 24 (11): 2197–2204.

Zhou, H., A. D. Mitchell, J. P. McMurtry, C. M. Ashwell, and S. J. Lamont. 2005. “Insulin-like Growth Factor-I Gene Polymorphism Associations with Growth, Body Composition, Skeleton Integrity, and Metabolic Traits in Chickens.” Poultry Science 84 (2): 212–19.

Zhu, Xiaojia, Yuyan Guan, Anthony V. Signore, Chandrasekhar Natarajan, Shane G. DuBay, Yalin Cheng, Naijian Han, et al. 2018. “Divergent and Parallel Routes of Biochemical Adaptation in High-Altitude Passerine Birds from the Qinghai-Tibet Plateau.” Proceedings of the National Academy of Sciences of the United States of America 115 (8): 1865–70.


3.7. Supplemental Material

Table 3.S1. Candidate SNPs detected by all three methods (LFMM 2, PCAdapt, and H-scan) that lie within or near genes, with their annotation information.

| Chromosome | Position (bp) | Functional annotation | Environmental correlate | Distance from nearest gene (kb) | D. pubescens gene ID | Gene | Protein |
|---|---|---|---|---|---|---|---|
| 1 | 21563815 | Intron variant | precipitation | – | Ppu_R012755 | GART | Trifunctional purine biosynthetic protein adenosine-3 |
| 1 | 51053364 | Intron variant | precipitation | – | Ppu_R000430 | ARSD | Sulfatase domain-containing protein |
| 1A | 27717360 | Intron variant | precipitation | – | Ppu_R003091 | LRP6 | Low-density lipoprotein receptor-related protein 6 |
| 1A | 27717717 | Intron variant | precipitation | – | Ppu_R003091 | LRP6 | Low-density lipoprotein receptor-related protein 6 |
| 1A | 27718804 | Intron variant | precipitation | – | Ppu_R003091 | LRP6 | Low-density lipoprotein receptor-related protein 6 |
| 1A | 27724459 | Intron variant | precipitation | – | Ppu_R003091 | LRP6 | Low-density lipoprotein receptor-related protein 6 |
| 1A | 27758745 | Intron variant | precipitation | – | Ppu_R003091 | LRP6 | Low-density lipoprotein receptor-related protein 6 |
| 1A | 71735103 | Intron variant | precipitation | – | Ppu_R000535 | IQSEC1 | IQ motif and SEC7 domain-containing protein 1 |
| 2 | 1095942 | Intron variant | precipitation | – | Ppu_R000561 | ZMYND11 | Zinc finger MYND domain-containing protein 11 |
| 2 | 52092271 | Intron variant | precipitation | – | Ppu_R001765 | ABHD4 | (Lyso)-N-acylphosphatidylethanolamine lipase |
| 2 | 52106222 | Upstream gene variant | precipitation | 14.567 | Ppu_R001765 | ABHD4 | (Lyso)-N-acylphosphatidylethanolamine lipase |
| 2 | 52792987 | Intron variant | precipitation | – | Ppu_R001754 | TNS3 | Tensin-3 |
| 2 | 92590916 | Intron variant | precipitation | – | Ppu_R014952 | PTPRM | Receptor-type tyrosine-protein phosphatase mu |
| 2 | 94593305 | Intron variant | precipitation | – | Ppu_R007607 | THOC1 | THO complex subunit 1 |
| 2 | 108212833 | Upstream gene variant | precipitation | 8.857 | Ppu_R001894 | LACTB2 | Endoribonuclease LACTB2 |
| 2 | 108353337 | Upstream gene variant | precipitation | 64.373 | Ppu_R001893 | EYA1 | Eyes absent homolog 1 |
| 2 | 108353901 | Upstream gene variant | precipitation | 64.937 | Ppu_R001893 | EYA1 | Eyes absent homolog 1 |
| 3 | 62882529 | Downstream gene variant | precipitation | 4.314 | Ppu_R011563 | CNKSR3 | Connector enhancer of kinase suppressor of ras 3 |
| 3 | 91819237 | Intron variant | precipitation | – | Ppu_R006246 | SOGA3 | Protein SOGA3 |
| 4 | 27118730 | Intron variant | precipitation | – | Ppu_R004340 | GRID2 | Glutamate receptor ionotropic |
| 4 | 27134598 | Intron variant | precipitation | – | Ppu_R004340 | GRID2 | Glutamate receptor ionotropic |
| 4 | 27145628 | Intron variant | precipitation | – | Ppu_R004340 | GRID2 | Glutamate receptor ionotropic |
| 4 | 30398809 | Intron variant | precipitation | – | Ppu_R007104 | GABRA2 | Gamma-aminobutyric acid receptor subunit alpha-2 |
| 4 | 43182762 | Intron variant | precipitation | – | Ppu_R006697 | KIAA1239 | NACHT and WD repeat domain-containing protein 2 |
| 4A | 23228055 | Intron variant | precipitation | – | Ppu_R002828 | | Uncharacterized |
| 5 | 5128012 | Intron variant | precipitation | – | Ppu_R013560 | SOX6 | Transcription factor SOX-6 |
| 5 | 5209051 | Intron variant | precipitation | – | Ppu_R013560 | SOX6 | Transcription factor SOX-6 |
| 5 | 45070403 | Intron variant | precipitation | – | Ppu_R010076 | MEIS2B | Myeloid ecotropic viral insertion site-2a protein |
| 6 | 12627079 | Intron variant | precipitation | – | Ppu_R006090 | CAMK2D | Calcium/calmodulin-dependent protein kinase type II subunit delta |
| 7 | 21335811 | Upstream gene variant | precipitation | 3.349 | Ppu_R007794 | VTG3 | Vitellogenin-3 |
| 8 | 6613067 | Intron variant | precipitation | – | Ppu_R010825 | WDR78 | Dynein intermediate chain 4, axonemal |
| 8 | 6614114 | Intron variant | precipitation | – | Ppu_R010825 | WDR78 | Dynein intermediate chain 4, axonemal |
| 8 | 6645770 | Upstream gene variant | precipitation | 2.867 | Ppu_R010824 | MIER1 | Mesoderm induction early response protein 1 |
| 8 | 6647324 | Upstream gene variant | precipitation | 1.313 | Ppu_R010824 | MIER1 | Mesoderm induction early response protein 1 |
| 8 | 29813769 | Intron variant | precipitation | – | Ppu_R009856 | FGGY | FGGY carbohydrate kinase domain-containing protein |
| 8 | 29814938 | Intron variant | precipitation | – | Ppu_R009856 | FGGY | FGGY carbohydrate kinase domain-containing protein |
| 8 | 29880056 | Intron variant | precipitation | – | Ppu_R009855 | | Uncharacterized |
| 8 | 30565078 | Intron variant | precipitation | – | Ppu_R008643 | TTLL7 | Tubulin polyglutamylase TTLL7 |
| 9 | 7933056 | Downstream gene variant | precipitation | 2.597 | Ppu_R003248 | SLC7A14 | Probable cationic amino acid transporter |
| 9 | 7933511 | Downstream gene variant | precipitation | 3.052 | Ppu_R003248 | SLC7A14 | Probable cationic amino acid transporter |
| 9 | 7933840 | Downstream gene variant | precipitation | 3.381 | Ppu_R003248 | SLC7A14 | Probable cationic amino acid transporter |
| 9 | 7943261 | Intron variant | precipitation | – | Ppu_R003249 | CLDN11 | Claudin-11 |
| 9 | 10877031 | Upstream gene variant | precipitation | 50.317 | Ppu_R004992 | VEPH1 | Ventricular zone-expressed PH domain-containing protein homolog 1 |
| 11 | 2027456 | Intron variant | precipitation | – | Ppu_R000515 | PSKH1 | Serine/threonine-protein kinase H1 |
| 11 | 2042839 | Upstream gene variant | precipitation | 2.348 | Ppu_R000516 | | Uncharacterized |
| 11 | 2072247 | Intron variant | precipitation | – | Ppu_R000520 | E2F4 | Transcription factor E2F4 |
| 11 | 2083827 | Intron variant | precipitation | – | Ppu_R000521 | ELMO3 | Engulfment and cell motility protein 3 |
| 12 | 24355533 | Intron variant | precipitation | – | Ppu_R002286 | IFT122 | Intraflagellar transport protein 122 homolog |
| 15 | 1271587 | Intron variant | precipitation | – | Ppu_R015557 | PRRC2C | BAT2 domain-containing protein 1 |
| 15 | 8431688 | Intron variant | precipitation | – | Ppu_R000941 | ATXN2 | Ataxin-2 |
| 15 | 8947318 | Downstream gene variant | precipitation | 3.775 | Ppu_R000921 | BCL7A | B-cell CLL/lymphoma 7 protein family member A |
| 15 | 16727281 | Intron variant | precipitation | – | Ppu_R014136 | RTDR1 | Radial spoke head 14 homolog |
| 15 | 18786042 | Synonymous variant | precipitation | – | Ppu_R012405 | | Uncharacterized |
| 15 | 18804925 | Intron variant | precipitation | – | Ppu_R012402 | DDX51 | ATP-dependent RNA helicase DDX51 |
| 15 | 18849593 | Intron variant | precipitation | – | Ppu_R012399 | | Uncharacterized |
| 20 | 213159 | Intron variant | precipitation | – | Ppu_R003558 | PIGU | Phosphatidylinositol glycan anchor biosynthesis class U protein |
| 20 | 1265371 | Upstream gene variant | precipitation | 3.058 | Ppu_R010768 | CPNE1 | Copine-1 |
| 20 | 17324786 | Intron variant | precipitation | – | Ppu_R013428 | SRMS | Tyrosine-protein kinase Srms |
| 23 | 54584 | Downstream gene variant | precipitation | 2.715 | Ppu_R014796 | | Uncharacterized |
| 23 | 1174357 | Upstream gene variant | precipitation | 0.375 | Ppu_R014740 | GALE | UDP-glucose 4-epimerase |
| 23 | 1175994 | Downstream gene variant | precipitation | 0.814 | Ppu_R014740 | GALE | UDP-glucose 4-epimerase |
| 23 | 1184392 | Downstream gene variant | precipitation | 3.92 | Ppu_R014738 | PITHD1 | PITH domain-containing protein 1 |
| 27 | 2021180 | Upstream gene variant | precipitation | 6.436 | Ppu_R005571 | CNTD1 | Cyclin N-terminal domain-containing protein 1 |
| 27 | 2507175 | Downstream gene variant | temperature and precipitation | 2.879 | Ppu_R001199 | K1C15 | Type I alpha-keratin 15 |
| Z | 19743362 | Downstream gene variant | precipitation | 1.374 | Ppu_R002587 | CENPK | Centromere protein K |
| Z | 25910536 | Intron variant | precipitation | – | Ppu_R005350 | SLC24A2 | Sodium/potassium/calcium exchanger 2 |
| Z | 31020298 | Intron variant | precipitation | – | Ppu_R000280 | YTHDC2 | 3'-5' RNA helicase YTHDC2 |
| Z | 45510199 | Intron variant | precipitation | – | Ppu_R014530 | TDRD7 | Tudor domain-containing protein 7 |
| Z | 55531036 | Intron variant | precipitation | – | Ppu_R000526 | HCN1 | Potassium/sodium hyperpolarization-activated cyclic nucleotide-gated channel 1 |
| Z | 55531037 | Intron variant | precipitation | – | Ppu_R000526 | HCN1 | Potassium/sodium hyperpolarization-activated cyclic nucleotide-gated channel 1 |
| Z | 55570357 | Intron variant | precipitation | – | Ppu_R000526 | HCN1 | Potassium/sodium hyperpolarization-activated cyclic nucleotide-gated channel 1 |
| Z | 75143011 | Intron variant | precipitation | – | Ppu_R009457 | UHRF2 | E3 ubiquitin-protein ligase UHRF2 |
| Z | 82231434 | Intron variant | precipitation | – | Ppu_R008951 | ACOT12 | Acetyl-coenzyme A thioesterase |
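The consensus filter described in the Table 3.S1 caption (keeping only SNPs flagged by LFMM 2, PCAdapt, and H-scan) and the nearest-gene distance column can be illustrated with a minimal Python sketch. It assumes each method's output has already been reduced to a set of (chromosome, position) pairs and that gene models are available as simple (chromosome, start, end, gene ID) tuples; these data layouts and the function names are hypothetical, not the pipeline used here. The distance calculation ignores strand, so unlike a dedicated variant annotator it does not label a SNP as upstream or downstream.

```python
from typing import List, Optional, Set, Tuple

SNP = Tuple[str, int]              # (chromosome, position in bp)
Gene = Tuple[str, int, int, str]   # hypothetical layout: (chromosome, start, end, gene_id)


def consensus(*method_hits: Set[SNP]) -> Set[SNP]:
    """SNPs flagged by every method passed in (e.g. LFMM 2, PCAdapt and H-scan)."""
    result = set(method_hits[0])
    for hits in method_hits[1:]:
        result &= hits  # keep only SNPs also present in this method's set
    return result


def nearest_gene(snp: SNP, genes: List[Gene]) -> Optional[Tuple[str, float]]:
    """Closest gene on the same chromosome and its distance in kb (0 if inside the gene).

    Strand is ignored, so this is a simplified stand-in for a real annotator.
    """
    chrom, pos = snp
    best = None
    for g_chrom, start, end, gene_id in genes:
        if g_chrom != chrom:
            continue
        if start <= pos <= end:
            dist = 0  # SNP falls within the gene body (e.g. an intron)
        else:
            dist = min(abs(pos - start), abs(pos - end))
        if best is None or dist < best[1]:
            best = (gene_id, dist)
    if best is None:
        return None
    return best[0], best[1] / 1000  # convert bp to kb
```

Intersecting sets is the strictest way to combine outlier methods: it lowers false positives at the cost of missing loci that only one approach can detect.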

Table 3.S2. Enriched gene ontology (GO) terms among FST-outlier genes across all pairwise population comparisons of Downy Woodpecker. Significance was determined with Fisher's exact test and false discovery rate (FDR) correction.

| GO ID | GO description | Candidate count | Genome count | Odds ratio | Corrected p-value |
|---|---|---|---|---|---|
| GO:0006508 | proteolysis | 75 | 800 | 2.679234 | 1.32E-08 |
| GO:0015671 | oxygen transport | 8 | 18 | 19.26124 | 0.000217 |
| GO:0071726 | cellular response to diacyl bacterial lipopeptide | 4 | 4 | Inf | 0.001851 |
| GO:0006352 | DNA-templated transcription, initiation | 10 | 42 | 7.539544 | 0.002565 |
| GO:0034136 | negative regulation of toll-like receptor 2 signaling pathway | 4 | 5 | 95.51428 | 0.005375 |
| GO:0042744 | hydrogen peroxide catabolic process | 7 | 23 | 10.51274 | 0.008259 |
| GO:0006334 | nucleosome assembly | 9 | 42 | 6.569507 | 0.010059 |
| GO:0015074 | DNA integration | 21 | 194 | 2.957407 | 0.010059 |
| GO:0006278 | RNA-dependent DNA biosynthetic process | 28 | 303 | 2.493505 | 0.010127 |
| GO:0034166 | toll-like receptor 10 signaling pathway | 3 | 3 | Inf | 0.013826 |
| GO:1901985 | positive regulation of protein acetylation | 4 | 8 | 23.91531 | 0.031023 |
| GO:0007099 | centriole replication | 5 | 15 | 11.97948 | 0.038202 |
| GO:0031100 | animal organ regeneration | 8 | 42 | 5.657532 | 0.038202 |
| GO:0007190 | activation of adenylate cyclase activity | 6 | 24 | 7.996506 | 0.045924 |

GO: gene ontology.
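The enrichment test named in the captions of Tables 3.S2 and 3.S3 can be sketched with the Python standard library. This sketch assumes a one-sided (over-representation) hypergeometric form of Fisher's exact test, that the genome count includes the candidate genes, and Benjamini-Hochberg as the FDR procedure; none of these choices, nor the total numbers of candidate and background genes, are stated in the tables, and the function names are hypothetical. It also shows why terms whose candidate count equals their genome count report an infinite odds ratio: the 2×2 table then contains an empty cell.

```python
from math import comb, inf, nan
from typing import List, Tuple


def fisher_enrichment(k: int, K: int, n: int, N: int) -> Tuple[float, float]:
    """One-sided Fisher's exact test for over-representation of a GO term.

    k: candidate (outlier) genes annotated to the term
    K: background genes annotated to the term (candidates included)
    n: total candidate genes
    N: total genes in the background
    Returns (sample odds ratio, P(X >= k)) with X ~ Hypergeometric(N, K, n).
    """
    # 2x2 table: rows = candidate / non-candidate, columns = in term / not in term
    a, b = k, n - k
    c, d = K - k, (N - K) - (n - k)
    if b * c == 0:
        odds = inf if a * d > 0 else nan  # empty cell in the denominator
    else:
        odds = (a * d) / (b * c)
    # Upper tail of the hypergeometric distribution, summed in exact integers
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return odds, tail / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, with hypothetical totals of 100 candidate genes and 10,000 background genes, `fisher_enrichment(4, 4, 100, 10_000)` returns an infinite odds ratio and a tail probability of roughly 9.4e-9; the raw p-values for all tested terms would then be passed together to `benjamini_hochberg`.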

Table 3.S3. Enriched gene ontology (GO) terms among FST-outlier genes across all pairwise population comparisons of Hairy Woodpecker. Significance was determined with Fisher's exact test and false discovery rate (FDR) correction.

| GO ID | GO description | Candidate count | Genome count | Odds ratio | Corrected p-value |
|---|---|---|---|---|---|
| GO:0006334 | nucleosome assembly | 12 | 42 | 9.072881 | 2.83E-05 |
| GO:0006352 | DNA-templated transcription, initiation | 13 | 42 | 10.18481 | 3.66E-06 |
| GO:0006508 | proteolysis | 85 | 800 | 2.917113 | 3.19E-12 |
| GO:0008210 | estrogen metabolic process | 14 | 24 | 31.89072 | 3.52E-11 |
| GO:0009812 | flavonoid metabolic process | 4 | 7 | 29.88721 | 0.016201 |
| GO:0015671 | oxygen transport | 9 | 18 | 22.59493 | 4.29E-06 |
| GO:0016999 | antibiotic metabolic process | 6 | 8 | 67.37634 | 3.40E-05 |
| GO:0042744 | hydrogen peroxide catabolic process | 7 | 23 | 9.850683 | 0.005733 |
| GO:0043086 | negative regulation of catalytic activity | 17 | 50 | 11.7782 | 4.29E-09 |
| GO:0051552 | flavone metabolic process | 11 | 12 | 250.0335 | 5.16E-12 |
| GO:0052695 | cellular glucuronidation | 4 | 4 | Inf | 0.000599 |
| GO:0052696 | flavonoid glucuronidation | 13 | 13 | Inf | 1.59E-15 |
| GO:0052697 | xenobiotic glucuronidation | 13 | 13 | Inf | 1.59E-15 |
| GO:0070980 | biphenyl catabolic process | 4 | 4 | Inf | 0.000599 |

Table 3.S4. Top 5 SNPs with the strongest environmental correlation for each climatic variable, according to LFMM 2.

| Species | Chromosome | Position (bp) | Functional annotation | Environmental correlate | -log10(p-value) | Gene |
|---|---|---|---|---|---|---|
| Downy Woodpecker | 1 | 98706861 | intergenic region | PC1-T | 47.56911891 | ? - UBE3A |
| Downy Woodpecker | 1 | 136711329 | intron variant | PC1-T | 47.56911891 | ARHGEF17 |
| Downy Woodpecker | 7 | 38819251 | intron variant | PC1-T | 47.56911891 | TMEM163 |
| Downy Woodpecker | 7 | 17101382 | downstream gene variant | PC1-T | 32.87466356 | TTN |
| Downy Woodpecker | 9 | 2410453 | upstream gene variant | PC1-T | 17.93554235 | EIF4E2 |
| Downy Woodpecker | 3 | 21248898 | intron variant | PC2-T | 14.672407 | ? |
| Downy Woodpecker | 4 | 38173127 | intergenic region | PC2-T | 14.611323 | AGXT2L1 - RCJMB04_2p15 |
| Downy Woodpecker | 3 | 21248899 | intron variant | PC2-T | 14.242413 | ? |
| Downy Woodpecker | 2 | 43697315 | intergenic region | PC2-T | 13.066548 | VTG3 - CDH8 |
| Downy Woodpecker | 11 | 14327128 | intergenic region | PC2-T | 13.027377 | NA |
| Downy Woodpecker | 7 | 17101382 | downstream gene variant | PC3-T | 21.872318 | TTN |
| Downy Woodpecker | 9 | 2410453 | upstream gene variant | PC3-T | 16.272248 | EIF4E2 |
| Downy Woodpecker | 4 | 75535532 | intergenic region | PC3-T | 13.628446 | C4orf23 - GPR78 |
| Downy Woodpecker | 1 | 1687056 | intergenic region | PC3-T | 13.357558 | KCNJ6 - KCNJ15 |
| Downy Woodpecker | 1 | 98706861 | intergenic region | PC3-T | 12.666721 | ? - UBE3A |
| Downy Woodpecker | 7 | 17101382 | downstream gene variant | PC1-P | 106.72281 | TTN |
| Downy Woodpecker | 5 | 29363340 | intron variant | PC1-P | 30.710356 | ZFYVE1 |
| Downy Woodpecker | 1 | 97380553 | intergenic region | PC1-P | 29.127983 | OCA2 - ? |
| Downy Woodpecker | 5 | 1710290 | upstream gene variant | PC1-P | 28.995058 | RCJMB04_15f5 |
| Downy Woodpecker | 5 | 45682823 | intergenic region | PC1-P | 28.79364 | C15orf41 - ATPBD4 |
| Downy Woodpecker | 7 | 17101382 | downstream gene variant | PC2-P | 74.570809 | TTN |
| Downy Woodpecker | 2 | 118607770 | intergenic region | PC2-P | 32.876074 | PARD6A - VTG3 |
| Downy Woodpecker | 19 | 5930761 | intron variant | PC2-P | 27.444742 | NF1 |
| Downy Woodpecker | 4A | 10277217 | intergenic region | PC2-P | 26.022023 | CDKN1A - FGF12 |
| Downy Woodpecker | 1A | 65685363 | intron variant | PC2-P | 25.337774 | MON2 |
| Downy Woodpecker | 7 | 37620491 | intron variant | PC3-P | 13.608021 | BAZ2B |
| Downy Woodpecker | 15 | 8080864 | intergenic region | PC3-P | 13.517517 | RPH3A - ? |
| Downy Woodpecker | 2 | 42803063 | intergenic region | PC3-P | 13.499545 | ANKRD28 - GALNTL2 |
| Downy Woodpecker | 6 | 34576603 | intergenic region | PC3-P | 13.471091 | ? - MKI67 |
| Downy Woodpecker | 15 | 4494309 | upstream gene variant | PC3-P | 13.371722 | TBX4 |
| Hairy Woodpecker | 7 | 38555381 | intron variant | PC1-T | 39.540937 | R3HDM1 |
| Hairy Woodpecker | 5 | 55378308 | upstream gene variant | PC1-T | 11.789355 | EIF2B2 |
| Hairy Woodpecker | 18 | 6917211 | synonymous variant | PC1-T | 11.747086 | RECQL5 |
| Hairy Woodpecker | 7 | 29455387 | intron variant | PC1-T | 11.340045 | UBE2F |
| Hairy Woodpecker | 7 | 2957761 | downstream gene variant | PC1-T | 10.72663 | SH3BP4 |
| Hairy Woodpecker | 7 | 38555381 | intron variant | PC2-T | 37.266519 | R3HDM1 |
| Hairy Woodpecker | 10 | 28737702 | intergenic region | PC2-T | 13.571734 | MCTP1 - NR2F2 |
| Hairy Woodpecker | 1 | 109623526 | intron variant | PC2-T | 12.035758 | DIAPH2 |
| Hairy Woodpecker | 23 | 6464390 | intergenic region | PC2-T | 11.329763 | FGR - SERINC2 |
| Hairy Woodpecker | 13 | 18775004 | intron variant | PC2-T | 11.072553 | UBTD1 |
| Hairy Woodpecker | 7 | 38555381 | intron variant | PC3-T | 26.92805407 | R3HDM1 |
| Hairy Woodpecker | 12 | 16343297 | intron variant | PC3-T | 14.10978645 | SLC25A26 |
| Hairy Woodpecker | 11 | 14481723 | intergenic region | PC3-T | 13.09621761 | VTG3 - CDH8 |
| Hairy Woodpecker | 5 | 55378308 | upstream gene variant | PC3-T | 12.50141109 | EIF2B2 |
| Hairy Woodpecker | 11 | 14480694 | intergenic region | PC3-T | 11.98771808 | VTG3 - CDH8 |
| Hairy Woodpecker | 7 | 38555381 | intron variant | PC1-P | 77.706429 | R3HDM1 |
| Hairy Woodpecker | 5 | 55378308 | upstream gene variant | PC1-P | 16.551297 | EIF2B2 |
| Hairy Woodpecker | 12 | 16343297 | intron variant | PC1-P | 15.9728 | SLC25A26 |
| Hairy Woodpecker | 18 | 6917211 | synonymous variant | PC1-P | 15.368538 | RECQL5 |
| Hairy Woodpecker | 1 | 42415490 | upstream gene variant | PC1-P | 14.609225 | ? |
| Hairy Woodpecker | 7 | 38555381 | intron variant | PC2-P | 99.376743 | R3HDM1 |
| Hairy Woodpecker | 3 | 41010329 | upstream gene variant | PC2-P | 12.951863 | VTG3 |
| Hairy Woodpecker | 3 | 113042593 | intergenic region | PC2-P | 12.951863 | DLGAP2 - ERICH1 |
| Hairy Woodpecker | 26 | 3716275 | intergenic region | PC2-P | 11.663952 | ST7L - WNT2B |
| Hairy Woodpecker | 2 | 16233421 | intergenic region | PC2-P | 10.722445 | ACVR2B - ? |
| Hairy Woodpecker | 4 | 46799521 | upstream gene variant | PC3-P | 17.163191 | ZDHHC20 |
| Hairy Woodpecker | 1A | 8617899 | intergenic region | PC3-P | 16.312499 | VTG3 - SEMA3D |
| Hairy Woodpecker | 1 | 75511739 | intergenic region | PC3-P | 15.677093 | ? - ? |
| Hairy Woodpecker | 1 | 54022426 | intergenic region | PC3-P | 15.4277 | ARGLU1 - FAM155A |
| Hairy Woodpecker | 3 | 90853059 | intron variant | PC3-P | 15.4277 | LAMA2 |

PC: principal component; T: temperature; P: precipitation; ?: uncharacterized gene.
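Table 3.S4 ranks SNPs by -log10(p), so larger values mean stronger associations with a climatic principal component. The ranking step can be sketched in a few lines of Python, assuming the LFMM 2 p-values have been collected as (species, variable, SNP ID, p-value) records with strictly positive p-values; the record layout and the function name `top_snps` are hypothetical and do not reproduce the original analysis code.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str, float]  # hypothetical: (species, climatic variable, SNP id, p-value)


def top_snps(records: Iterable[Record], k: int = 5) -> Dict[Tuple[str, str], List[Tuple[str, float]]]:
    """Return the k SNPs with the largest -log10(p) for each (species, variable) pair."""
    groups: Dict[Tuple[str, str], List[Tuple[str, float]]] = defaultdict(list)
    for species, variable, snp_id, p in records:
        if p <= 0:
            raise ValueError(f"p-value must be positive, got {p} for {snp_id}")
        groups[(species, variable)].append((snp_id, -math.log10(p)))
    # sorted() is stable even with reverse=True, so tied scores keep their input order
    return {key: sorted(hits, key=lambda h: h[1], reverse=True)[:k] for key, hits in groups.items()}
```

Because ties are kept in input order, SNPs sharing an identical score (such as the three Downy Woodpecker PC1-T hits above) would be reported in whatever order the records arrived.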