<<

Molecular Ecology (2015) doi: 10.1111/mec.13140

Differentially expressed genes match bill morphology and plumage despite largely undifferentiated genomes in a Holarctic songbird

NICHOLAS A. MASON*† and SCOTT A. TAYLOR*† *Department of Ecology and Evolutionary Biology, Cornell University, 215 Tower Rd., Ithaca, NY 14853, USA, †Fuller Evolutionary Biology Program, Laboratory of Ornithology, Cornell University, 159 Sapsucker Woods Road, Ithaca, NY 14850, USA

Abstract Understanding the patterns and processes that contribute to phenotypic diversity and speciation is a central goal of evolutionary biology. Recently, high-throughput sequencing has provided unprecedented phylogenetic resolution in many lineages that have experienced rapid diversification. The Holarctic finches (Genus: Acanthis) provide an intriguing example of a recent, phenotypically diverse lineage; traditional sequencing and genotyping methods have failed to detect any genetic differences between currently recognized species, despite marked variation in plumage and mor- phology within the genus. We examined variation among 20 712 anonymous single nucleotide polymorphisms (SNPs) distributed throughout the redpoll genome in com- bination with 215 825 SNPs within the redpoll transcriptome, gene expression data and ecological niche modelling to evaluate genetic and ecological differentiation among currently recognized species. Expanding upon previous findings, we present evidence of (i) largely undifferentiated genomes among currently recognized species; (ii) substantial niche overlap across the North American Acanthis range; and (iii) a strong relationship between polygenic patterns of gene expression and continuous phenotypic variation within a sample of from . The patterns we report may be caused by high levels of ongoing gene flow between polymorphic populations, incomplete lineage sorting accompanying very recent or ongoing diver- gence, variation in cis-regulatory elements, or phenotypic plasticity, but do not support a scenario of prolonged isolation and subsequent secondary contact. Together, these findings highlight ongoing theoretical and computational challenges presented by recent, rapid bouts of phenotypic diversification and provide new insight into the evo- lutionary dynamics of an intriguing, understudied non-model system.

Keywords: Fringillidae, gene expression, high-throughput sequencing, phenotypic diversity, species limits Received 4 December 2014; accepted 27 February 2015

that different regions of the genome may convey differ- Introduction ent information about the speciation process (Key 1968; Inferring the patterns and processes that accompany the Bazykin 1969; Barton & Hewitt 1981; Rand & Harrison generation of phenotypic diversity and new species is 1989; Harrison & Rand 1989; Harrison 1990; Wu 2001; an overarching goal of evolutionary biology. In recent Nosil & Feder 2012; Seehausen et al. 2014). Moreover, years, evolutionary biologists have embraced the notion the criteria used to delimit species have changed over time—the onset of high-throughput sequencing has pro- Correspondence: Nicholas A. Mason, Fax: (607) 255-8088; vided unprecedented amounts of genetic data that can E-mail: [email protected] be used to infer evolutionary history (Lemmon &

© 2015 John Wiley & Sons Ltd 2 N. A. MASON and S. A. TAYLOR

Lemmon 2013; McCormack et al. 2013). These novel spread genus of songbirds: the redpoll finches (Acan- technologies provide promising tools to study the evo- this). lution of phenotypic diversity and speciation, particu- The redpoll finch complex currently includes three larly within groups that have experienced recent and species, Acanthis flammea, Acanthis hornemanni and Acan- rapid diversification, which typically lack coalescence this cabaret, which are recognized by most authorities and exhibit incomplete lineage sorting (Maddison & (e.g. Clements et al. 2014; Fig. 1B). However, between Knowles 2006). Such ‘species flocks’ present an ongoing one and six species have been recognized based on challenge for evolutionary biologists to discriminate plumage and morphology (Coues 1862; Harris et al. true speciation events from hybrid swarms and ongoing 1965; Troy 1985; Herremans 1989; Seutin et al. 1992; gene flow, especially when marked phenotypic varia- Marthinsen et al. 2008). Collectively, redpolls are dis- tion is present. tributed throughout the Holarctic and individuals that Reduced-representation approaches, such as double- differ in plumage and morphology (putative species) digest restriction-associated DNA sequencing (ddRAD- frequently co-occur within the Holarctic range of the Seq) and genotyping by sequencing (GBS), are genus. The most widespread taxon, A. flammea, has dar- outperforming traditional Sanger-sequencing methods ker plumage overall, heavily streaked sides and under- in their ability to help us understand the evolution of tail coverts, and a longer, wider bill (Clement 2010a). phenotypic variation in young lineages. Hybridization By comparison, A. hornemanni is lighter with less streak- and incomplete lineage sorting are often common in ing and a stubby, narrower, conical bill (Clement such lineages, which can obscure evolutionary rela- 2010b). The most recently recognized species, A. cabaret, tionships (e.g. Lake Victoria cichlids, Wagner et al. is the smallest of the redpoll taxa and is browner over- 2012; Nicaraguan crater lake cichlids, Elmer et al. all (Knox et al. 2001; Sangster et al. 2002). Acanthis flam- 2014; Heliconius butterflies, Nadeau et al. 2013; Xipho- mea and A. hornemanni are both widespread and phorus fishes, Cui et al. 2013; Jones et al. 2013; wolf- abundant throughout the Holarctic, although A. horne- like canids, vonHoldt et al. 2011; American oaks, Hipp manni is generally found at higher latitudes. Acanthis et al. 2014; Carex, Escudero et al. 2014). Reduced-repre- cabaret was historically restricted to the British Isles, but sentation approaches have also generated evidence of has recently colonized northern mainland and strong genetic differentiation in the absence of obvi- southern . ous differences in plumage and morphology in a Although some studies have suggested geographic widely distributed lowland Neotropical species structuring or multimodal distributions of phenotypic (Harvey & Brumfield 2014), provided the resolution variation within the redpolls (Molau 1985; Seutin et al. necessary to robustly test hypotheses about the gener- 1992), other studies have indicated a high prevalence of ation of hybrid species (e.g. Nice et al. 2012) and intermediate phenotypes and overlap in plumage and proved useful for resolving shallow population struc- morphological characters between currently recognized ture in species of conservation concern (e.g. Larson species (Troy 1985; Herremans 1989). Beyond morpho- et al. 2013). logical differences, previous studies have cited differ- In addition to the novel insights gained from ences in vocalizations (Molau 1985; Herremans 1989), restriction enzyme-based approaches, RNA-Seq experi- phenology (Herremans 1989) and physiology (Brooks ments have revealed that gene expression differences 1968) as evidence of multiple species within the com- also play an important role in the speciation process plex. Most recently, Lifjeld & Bjerke (1996) suggested (Wolf et al. 2010). Changes in gene expression often that A. cabaret and A. flammea pair assortatively in underlie phenotypic differences among taxa, and dif- southeast Norway. However, mixed pairs have also ferential gene expression may associate with early been documented (Harris et al. 1965), and the presence stages of the speciation process (Pavey et al. 2010; of hybrid offspring has been debated (Molau 1985). Brawand et al. 2012; Harrison et al. 2012; Filteau et al. Despite phenotypic variation among currently recog- 2013). Interactions among genotypes and the environ- nized species, molecular studies of redpoll populations ment can also affect gene expression, resulting in have consistently failed to document genetic differentia- phenotypic plasticity and polymorphism within a tion (restriction fragment length polymorphisms lineage (West-Eberhard 2005). However, studies that (RFLPs), Marten & Johnson 1986; RFLPs, Seutin et al. evaluate gene expression in addition to variation 1995; mitochondrial control region, Ottvall et al. 2002; among putatively neutral, anonymous loci are scarce. mitochondrial control region and 10 microsatellites Here, we integrate data from reduced-representation Marthinsen et al. 2008). From a biogeographic perspec- libraries with gene expression data to investigate the tive, this lack of genetic variation is unusual; most role of population differentiation and gene expression Holarctic demonstrate some degree of phylogeo- in generating phenotypic diversity within a wide- graphic structuring among temperate ecoregions or

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 3

ABC N ° 80 N °

N70 °

N60 °

Common redpoll 50 Habitat suitability

N Low High ° 40 N ° 30 160° W140°W120 °W100 ° W80° W60° W Hoary redpoll N ° 80

N Hoary redpoll °

N70

Species ° Common redpoll Hoary redpoll Count

N60 10 °

50 Habitat suitability Range 20

N Low High °

Breeding 30 40 Resident

N

Wintering ° 30 160° W140° W120° W100° W80°W60 ° W Lesser redpoll

Fig. 1 The redpoll system. (A) Combined Holarctic distribution of redpoll finches. Dark purple indicates breeding grounds, interme- diate purple indicates resident status, and light purple indicates wintering distributions. Sampling sites are indicated with coloured circles: the size of the circle corresponds to the number of individuals sampled from each site. (B) Representative phenotypes of com- mon, hoary and lesser redpolls. Note differences in plumage coloration, patterning and bill morphology. (C) Ecological niche models (ENMs) constructed for common and hoary redpolls in North America. Darker colours indicate more suitable habitat. Occurrence data used to create niche models are shown with black dots. ENMs suggest considerable overlap in suitable abiotic conditions between hoary and common redpolls. Nonetheless, hoary redpolls prefer higher latitudes, while common redpolls are more wide- spread. between continents (e.g. Questiau et al. 1998; Drovetski Materials and methods et al. 2004, 2009). The paucity of genetic differentiation within the red- Sample collections and phenotyping poll complex, despite marked phenotypic variation across a Holarctic distribution, could be the result of Molecular analyses were based on 77 individuals, multiple evolutionary scenarios (Marthinsen et al. 2008): including representatives of all three redpoll species redpolls may be comprised of (i) a single, undifferenti- currently recognized by most authorities (e.g. Clements ated gene pool that exhibits phenotypic polymorphism, et al. 2014), which were from different regions of their in which phenotypic differences reflect locally adapted current distribution (Fig. 1, Table S1, Supporting infor- demes or neutral phenotypic variation within a single mation). Based on recently published phylogenies of the metapopulation; (ii) multiple gene pools that have family Fringillidae (Zuccon et al. 2012), we included recently diverged, in which incomplete lineage sorting two white-winged (Loxia leucoptera) individuals has hindered the capacity of previous studies to differ- as an out-group in our analyses. Because the main goal entiate populations or species; or (iii) multiple divergent of this study was to assess genetic differentiation gene pools that are actively exchanging genes through between redpolls with different plumage and morphol- hybridization and introgression via secondary contact. ogy characteristics (i.e. putative species), our geographic In this study, we implement high-throughput sampling was not exhaustive from a phylogeographic sequencing to evaluate these hypotheses by examining perspective and we did not include representatives genome-wide variation in anonymous loci among red- from all currently recognized redpoll subspecies. For polls sampled from different regions of the Holarctic. this component of our study, we relied on the classifica- We also assess variation among transcriptome sequence tions of collectors and museum curators to assign indi- data and gene expression in a subset of North Ameri- viduals to one of the three currently recognized species. can redpolls that span the phenotypic continuum We collected 10 of the 77 redpolls included in this described above. Finally, we use breeding season occur- study on the same day at the same wintering locality rence records to generate ecological niche models (Cortland, Cortland County, NY, USA; 42.6°N, 76.2°W; (ENMs) that characterize differences in suitable abiotic nine males and one female). These individuals were col- conditions between North American A. flammea and lected because they represented the broadest pheno- A. hornemanni. typic variation possible within the wintering flock. The

© 2015 John Wiley & Sons Ltd 4 N. A. MASON and S. A. TAYLOR

flock remained at this location for over 3 weeks before unique barcodes for each of four subsequent index collection. Therefore, individuals experienced similar groups (a total of 76 unique identifiers—the DNA from environmental conditions and foraging opportunities one sample was excluded due to low quality). Each that approximate a common garden setting. This shared digestion reaction contained 300 ng genomic DNA, experience should have reduced environmentally 3 lL109 CutSmart buffer, 1 lL of 250 nM P1, 1 lLof induced differences in gene expression between flock 25 lM P2, 3 lL10mM ATP, 0.75 lL (15 U) each of members; however, we cannot completely rule out dif- 20 U/lL SbfI-HF and MspI, 0.75 lL of 400 U/lLT4 ferences in microclimate or diet. DNA ligase, and water to a total of 30 lL. Next, sam- Rather than binning these individuals into putative ples were incubated at 37 °C for 30 min followed by species based on plumage characteristics and bill shape, 1 h at 20 °C, pooled in groups of 19 and cleaned with which are known to vary continuously (Troy 1985), we 1.59 volumes of AMPure beads (Beckman Coulter Inc) measured multiple morphological characters for each and two washes of 70% ethanol. The pooled samples individual (Table S2, Supporting information). We were then eluted into 30 lL of Qiagen EB buffer and quantified the amount of streaking on the undertail co- quantified using QFC. For each of the four index verts and rump of each individual by taking digital groups (index primers 6, 12, 1, 2), we set up six repli- photographs that were subsequently measured with cate PCRs containing 20 ng DNA, 12.5 lL29 Phusion IMAGEJ 1.48v (Abramoff et al. 2004). We took four mea- MM, 1.25 lLof5lM P1, 1.25 lLof5lM index primer, surements of shape (width, depth, culmen length, and water to a total reaction volume of 25 lL. The PCR mandible length) for each individual using digital calli- temperature profile included a 30-s incubation at 98 °C pers. Bill and plumage measurements were then incor- followed by 16 cycles of 98 °C for 5 s, 60 °C for 25 s porated into a principal components analysis (PCA) to and 72 °C for 10 s with a final extension step of 5 min obtain multivariate dimensions of phenotypic variation at 72 °C. The six replicate PCRs were pooled within (see Fig. S1, Supporting information for loadings, and each index group and visualized on a 1% agarose gel. Fig. S2, Supporting information for PCA scores). PCA We size-selected (200–1000 bp) each index group using scores were then used to assess statistical associations a two-step AMPure cleanup with 6.5% PEG/1.2 M between phenotypic variation and multiple indices of NaCl, 25% PEG/1.2 M NaCl and 11.5% PEG/1.2 M genetic variation. NaCl. Final elutions of each index group were analysed In addition to collecting genomic DNA from these 10 using QFC and an Agilent Bioanalyzer 2100 (Agilent, individuals, we also preserved separate samples of Santa Clara, CA, USA). Finally, we diluted each index whole brain, liver and muscle in RNAlater within group to 2 nM, combined all four in equal proportions 25 min post-mortem for RNA-Seq data generation and and sequenced the total redpoll ddRAD-Seq library on gene expression analyses. Specimens were processed in two lanes of an Illumina HiSeq 2000 [100 base pair (bp), the order in which they were caught, meaning that paired end] at the Cornell University Life Sciences Core some individuals were held captive longer than others Laboratories Center (Ithaca, NY, USA). The crossbill before collecting tissues. Genomic DNA and RNA sam- ddRAD-Seq library, consisting of two individuals, was ples were subsequently stored at À80 °C until library prepared using identical protocols and was sequenced preparation. on 5.2% of one shared lane of an Illumina HiSeq 2000 (100 bp, single end) at the Cornell University Life Sci- ences Core Laboratories Center. ddRAD-Seq library preparation and sequencing We extracted genomic DNA from each sample using â RNA-Seq library preparation and sequencing Qiagen DNeasy kits (tissue protocol; Qiagen, Valencia, CA, USA), eluted the DNA in water, concentrated each We combined ~15 mg of homogenized liver, pectoral sample using a vacuum centrifuge and determined the muscle and brain from each of the 10 individuals that final concentration of each extraction using Qubit Fluo- were collected on the same day at the same wintering rometric Calibration (QFC; Invitrogen, Carlsbad, CA, locality to total 45 mg of tissue per individual library. USA). DNA extractions are archived at the Cornell Lab We recognize that gene expression probably varies of Ornithology (Ithaca, NY, USA). across the three tissue types we pooled and that the tis- We prepared ddRAD-Seq libraries using a modified sues we chose are not ideal for detecting gene expres- version of the protocol outlined in Peterson et al. (2012). sion differences related to plumage or facial morphology Following a standardizing dilution (all genomic DNA (Abzhanov et al. 2004; Ekblom et al. 2012; Poelstra et al. ~30 ng/lL), we plated the samples and digested each 2014). Nevertheless, maintenance of plumage patterning with the restriction enzymes SbfI and MspI while ligat- and facial morphology (e.g. melanin deposition and ing P1 (barcode) and P2 adaptor primers using 19 bill growth patterns) should be preserved throughout

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 5 the life of an individual, and there is the possibility that reads to the reference transcriptome. Before continuing these gene expression profiles may be detectable in with downstream analyses, we tested a range of differ- pooled tissue samples (see Results). One possible draw- ent transcript abundance cut-off values to remove back of pooling tissues is that we are unable to detect underrepresented contigs from the de novo assembly organ-specific differential gene expression among red- (Fig. S3, Supporting information). We removed contigs polls; however, the goal of our RNA-Seq experiment that failed to meet a 1.0 Transcripts Per Million thresh- was to detect possible species-level differences and old and selected the longest isoform for each compo- find candidate genes worthy of further exploration nent. After selecting the longest isoform to represent under controlled conditions. Future experiments in this each contig, our final Trinity assembly included 30 357 system will take a tissue-specific approach to target contigs with an N50 of 2715 bp. The transcriptome more relevant tissue types and developmental stages assembly and raw RNA-Seq reads can be accessed via (e.g. Abzhanov et al. 2004; Ekblom et al. 2012; Poelstra the NCBI Transcriptome Shotgun Assembly Database et al. 2014). (BioProject number PRJNA256306). Following collection, we used a TissueRuptor (Qia- To compare gene expression profiles among the 10 gen) to homogenize each individual tissue pool and fol- individuals that we collected and phenotyped, we lowed the ‘standard’ mRNA extraction protocol as aligned individual RNA-Seq libraries back to the de â TM detailed in the Dynabeads mRNA DIRECT kit (Invi- novo assembly with BOWTIE (Langmead et al. 2009) and trogen). To remove as much rRNA as possible from our used RSEM to estimate read abundance for each gene. extraction before constructing complimentary DNA We applied Trimmed Mean of M-values normalization (cDNA) libraries, we performed the mRNA extraction to obtain Fragments Per Kilobase of exon per Million protocol twice. We converted mRNA into cDNA fragments mapped (FPKM) values for each sample and libraries using the NEBNext RNA First Strand Synthesis contig, which were then log-transformed (Robinson Module (New BioLabs, Ipswich, MA, USA). et al. 2009). We applied multidimensional scaling to We then performed second-strand cDNA synthesis, end FPKM counts to generate multivariate dimensions of repair, dA-tailing and adaptor ligation for each individ- differential gene expression among individuals using ual cDNA library. Following adapter ligation and the package LIMMA (Smyth 2005; Ritchie et al. 2015) library purification, we performed 12 cycles of the within the R programming environment (R Core Devel- ‘denaturation annealing extension’ step during the opment Team 2014). index PCR. Prior to pooling multiplexed individuals, To identify differentially expressed genes that are we assessed the quality and quantity of cDNA using potentially associated with phenotypic variation in red- QFC and an Agilent Bioanalyzer 2100. Multiplexed indi- polls, we used principal component scores of plumage vidual cDNA libraries were pooled in equimolar ratio and morphology as the response variable in a general- and sequenced on a single lane using 100-bp single-end ized linear model (GLM) with normalized, log-trans- reads on the Illumina HiSeq 2000 at the Cornell Core formed FPKM read counts as predictor variables in the Laboratories Center. Raw, demultiplexed reads are LIMMA package (Smyth 2005; Ritchie et al. 2015) and cor- available through the NCBI Short Read Archive rected for multiple hypothesis testing by applying the (SRP052607). false discovery rate method (Benjamini & Hochberg 1995). Sampling order was included as a random factor in the GLM. We searched for matches between each Transcriptome assembly and gene expression profiling transcript in our assembly and the NCBI nonredundant Barcoded RNA-Seq reads were demultiplexed, filtered protein database using BLAST+ (Camacho et al. 2009). For and trimmed prior to assembly. We used TRIMMOMATIC each hit that met a threshold e-value < 1e-5, we v0.27 (Lohse et al. 2012) to remove low-quality reads obtained corresponding Gene Ontology terms using that dropped below a Phred-scale quality score of 20 or BLAST2GO (Conesa et al. 2005) to assess the functional included contamination from Illumina adapters. Filtered implications of any differentially expressed genes. reads were then loaded into TRINITY r2013-02-25 (Grabh- err et al. 2011; Haas et al. 2013) to assemble a de novo Locus assembly and SNP calling reference transcriptome using all 10 individual cDNA libraries, including individuals of both Acanthis flammea We concatenated ddRAD-Seq reads from the forward and Acanthis hornemanni. and reverse direction for downward locus assembly We performed transcript quantification of the de with STACKS v1.20 (Catchen et al. 2011, 2013). We used novo assembly with RNA-Seq by Expectation Maximi- fastx_trimmer (http://hannonlab.cshl.edu/fastx_tool- zation (RSEM v.1.2.3; Li & Dewey 2011), which estimates kit/) to trim all reads to an equal length of 94 bp and transcript abundance by aligning filtered and trimmed then filtered out reads with any bases that fell below a

© 2015 John Wiley & Sons Ltd 6 N. A. MASON and S. A. TAYLOR

Phred score of 10. We also trimmed out any reads for Seq SNPs using the same analytical pipeline: (i) loci which ≥5% of bases had a Phred score below 20. assembled from only redpoll data and (ii) loci We then separated multiplexed libraries using the assembled with data from redpolls and the crossbill process_radtags function from the STACKS pipeline out-group. For the redpoll data set, we ran three repli- (Catchen et al. 2013). The final filtered, trimmed and cate analyses for 10 000 generations following 10 000 demultiplexed data set contained 365 000 000 reads. generations of burn-in, using the ‘admixture’ model We pooled reads from all redpolls to perform de across a range of K values from 1 to 5 (three replicates novo locus assembly for redpolls only using the each), which were then averaged for population assign- denovo_map.pl script, which executes ustacks, cstacks ment scores. Because there were three putative species and sstacks in succession and comes bundled with in this analysis, we paid specific attention to results STACKS (Catchen et al. 2013). In brief, STACKS groups iden- from runs where the a priori constraint on the number tical reads based on sequence similarity to form ‘stacks’, of population clusters was K = 3 (redpolls only). Results which can then be combined to form putative loci. We from STRUCTURE were analysed using the Evanno et al. required a minimum of five reads for stack depth (-m), (2005) method in STRUCTURE HARVESTER (Earl & vonHoldt allowed five SNPs between any two stacks at a locus 2011). For the redpoll + crossbill data set, we ran STRUC- (-M) and five SNPs between any two loci when build- TURE as above with an a priori constraint on the number ing catalogues (-n). These parameter settings performed of population clusters K = 4 (redpolls + ). We well in a comparison of library assembly pipelines did this to ensure that our SNP data could differentiate (Mastretta-Yanes et al. 2015). We allowed 20% missing redpolls from the out-group taxon. data for each locus and extracted one locus per SNP To corroborate our Bayesian clustering analyses, we using the –write_single_snp flag when running the performed principal component analyses (PCA) on the populations program within STACKS. One individual had same two sets of loci using ADEGENET v1.4 (Jombart 2008; to be dropped from the ddRAD-Seq assembly pipeline Jombart & Ahmed 2011) and performed an analysis of due to poor coverage; therefore, the finalized ddRAD molecular variance (AMOVA; Excoffier et al. 1992) set included 76 individuals, including 9 of 10 individu- using PEGAS (Paradis 2010) to examine how genetic vari- als that comprise the RNA-Seq portion of this study. ation is partitioned among currently recognized species Crossbill raw sequencing reads were processed in the within the redpoll complex. We also tested for isolation same manner as redpolls. The final filtered, trimmed by distance among redpolls through a partial Mantel and demultiplexed crossbill data set contained 151 511 test using the R package ADE4 (Chessel et al. 2004). We reads. We pooled all reads from both crossbills and red- used BAYESCAN v2.1 (Foll & Gaggiotti 2008) with default polls to perform de novo crossbill and redpoll locus settings to identify any outlier loci that are highly dif- assembly using the denovo_map.pl script and used the ferentiated between currently recognized species and same STACKS settings detailed above. used a Friedman test to examine differences in We also identified a separate panel of SNPs from the observed heterozygosity between putative species. We de novo transcriptome and 10 individual RNA-Seq also assessed population structure among the 10 indi- libraries. We generated an index from our transcriptom- viduals we collected for our RNA-Seq experiment e and aligned each individual library to the reference (which span the phenotypic continuum—see Results) using BWA under default settings (Li & Durbin 2009). with the panel of 215 825 SNPs called from our de novo We called SNPs from indexed alignments using the transcriptome by running a PCA using ADEGENET (Jom- UnifiedGenotyper tool within Genome Analysis Toolkit bart 2008; Jombart & Ahmed 2011). under default settings (GATK; DePristo et al. 2011). As We performed Bayes factor delimitation (Grummer part of the SNP calling process, we filtered out sites et al. 2014) using a subsample of SNPs from the with Phred quality scores <30 and filtered by mean ddRAD-Seq data set in combination with SNAPP (Bry- depth, allele frequency and call rate and applied the ant et al. 2012), which is a module of BEAST 2 (Bouckaert BadCigarFilter using VCFTOOLS (version 3.0; Danecek et al. 2014), to assess support for lumping or splitting et al. 2011), which removes malformed reads that start redpoll species within a multispecies coalescent frame- with spurious deletions. We retained a total of 215 825 work (Leache et al. 2014). Due to computational con- out of 784 141 possible SNPs after filtering. straints, we randomly sampled six individuals from each species and combined these with the two out- group individuals to construct two data sets. For these Population genetic analyses 20 individuals, one data set included 35 loci with no We used the Bayesian clustering program STRUCTURE v missing data; the other matrix included 200 randomly 2.3.4 (Pritchard et al. 2000) to evaluate genetic differenti- sampled loci with no missing data for the two out- ation among redpolls. We analysed two sets of ddRAD- group individuals and a maximum of 20% missing data

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 7 among redpolls. Although these loci represent a small package (Hijmans et al. 2013). We filtered occurrence portion entire genome, similar numbers of loci have data so that only records from the breeding season (i.e. successfully characterized general patterns of coales- June and July; Seutin et al. 1991) were included in our cence in other empirical studies (Grummer et al. 2014; analyses. Because sampling bias is common among Leache et al. 2014). We conducted path sampling with occurrence records (Hijmans et al. 2000), we subsam- 12 steps (100 000 MCMC steps; 100 000 pre-burn-in pled our data set to include only one occurrence record steps) to estimate the marginal likelihoods of models for every cell within a 0.5° latitude 9 0.5° longitude where redpolls are split or lumped (Leache et al. 2014), grid of North America. This resulted in 545 breeding which were later compared using Bayes factors (Kass & occurrence records of A. flammea and 159 breeding Raftery 1995). occurrence records of A. hornemanni (Table S3, Support- ing information). After gathering occurrence records, we used the Generalized linear models of phenotypic and genetic BioClim data set (Hijmans et al. 2005) with a resolution variation of 2.5 arc-minutes to extract 19 abiotic variables associ- To examine statistical associations between phenotypic ated with each set of coordinates. Using these data, we variation (within the sample of 10 individuals that we generated ENMs using MAXENT 3.3 (Phillips et al. 2006; collected in Cortland, NY) and different aspects of Elith et al. 2011), which performs well compared to genetic variation, we constructed GLMs with either PC1 alternative algorithms for generating ENMs (Elith et al. or PC2 from the plumage + bill morphology PCA as 2006). We generated pseudoabsence data by extracting response variables and used various indices of genetic bioclimatic data associated with 500 random points variation as predictor variables. First, we assessed within North America (extent in unprojected coordi- whether phenotypic variation was associated with pat- nates: latitude 30.0° to 80.0° and longitude À50.0° to terns of gene expression by including multidimensional À174.0°; Fig. 1C). This extent is reasonable given that scaling log-fold change dimension 1 and 2 scores redpolls are highly vagile and have been recently (Ritchie et al. 2015) as predictor variables and PC1 and reported as far south as Julian, CA (33.08°N, 116.60°W), PC2 scores from the plumage + bill morphology PCA making this geographic extent an approproiate approxi- as response variables, respectively. We included pro- mation of the ‘accessible’ niche space (Barve et al. 2011). cessing order as a random factor in both of these mod- We used the default settings within Maxent and parti- els to account for the potentially confounding effect of tioned 20% of the occurrence data from both species to time spent in captivity prior to tissue collection. To validate the performance of our models via k-fold cross- determine whether phenotypes were associated with validation (Hastie et al. 2001). More specifically, we cal- variation in anonymous SNP loci, we quantified associ- culated the area under the curve (AUC) of the receiver ations between PC1 and PC2 scores from the plum- operating characteristic score, which indicates the abil- age + bill morphology PCA and PC1 and PC2 scores ity of our model to predict a designated subset of our from the ddRAD-Seq PCA for the 10 individuals we occurrence data. This procedure was replicated 50 times collected. Because one of the individuals we collected to produce a distribution of AUC scores for each ENM. was dropped from the ddRAD-Seq data set due to poor An AUC score of 1 indicates a model that perfectly pre- coverage, these comparisons were restricted to nine dicts the testing data (i.e. species occurrences), whereas data points. Finally, we also assessed whether there an AUC of 0.5 indicates a model that has no predictive were relationships between the PC1 and PC2 scores power. Thus, we accepted a given ENM if the median from the SNPs generated from the de novo transcrip- AUC score across k-fold replicates was greater than an tome and PC1 and PC2 of the plumage + bill morphol- arbitrary cut-off value of 0.80. ogy PCA. We implemented two different statistical tests to com- pare the projected models of A. flammea and A. horne- manni in North America using the PHYLOCLIM package Niche modelling of Acanthis flammea and Acanthis (Heibl & Calenge 2013). Following the methodology hornemanni in North America provided by Warren et al. (2008), we evaluated ‘niche To compare the abiotic niches that constitute the breed- identity’ and ‘niche similarity’ by calculating a modified ing ranges of A. flammea and A. hornemanni in North version of Hellinger’s distance (I; Legendre & Gallagher America (Acanthis cabaret was not included here due to 2001), as well as Schoener’s D (Schoener 1968), between the difficulty of niche modelling for island populations the projected ENMs of the two species (Warren et al. and its limited range), we first gathered occurrence 2010). To assess niche identity, we compared these records through the Global Biodiversity Information niche similarity values to a null distribution of Facility (GBIF; http://www.gbif.org/) via the DISMO similarity measures that was built by comparing pseu-

© 2015 John Wiley & Sons Ltd 8 N. A. MASON and S. A. TAYLOR doreplicated ENMs based on 50 pooled, randomized information). Specifically, Wilcoxon signed-rank tests occurrence data from both species (Graham et al. 2004; with Bonferroni correction indicated that observed Warren et al. 2008). This procedure tests niche conserva- heterozygosity was lower in Acanthis cabaret compared tism in the strictest sense and determines whether the to Acanthis flammea (P < 0.05) and Acanthis hornemanni environmental tolerances for the two species are identi- (P < 0.05). Observed heterozygosity did not differ cal. We also quantified niche similarity by comparing between A. hornemanni and A. flammea (P > 0.05). similarity values between the projected ENMs of These differences probably reflect variation in sample A. flammea and A. hornemanni to distributions of similar- sizes because far fewer A. cabaret samples (6) were ity values obtained by comparing ENMs built from included in this study. occurrence data of each species to ENMs constructed At K = 3 (the putative number of species in the red- with random points sampled throughout the geographic poll analysis), results from STRUCTURE indicated that all extent (i.e. extent pictured in Fig. 1C; Peterson et al. redpolls clustered together (Fig. 2A) and the genetic 1999; Warren et al. 2008). This procedure was repeated PCA revealed low levels of differentiation among cur- 50 times to generate null distributions of similarity val- rently recognized redpoll species (Fig. 2B). By compar- ues and determine whether the observed overlap ing changes in likelihood scores among different between the niches of A. flammea and A. hornemanni is settings of the K parameter, we identified K = 2 as the simply the product of regional similarities or the niche preferred setting via the Evanno et al. (2005) method models of the two species are more similar or different (Table S4, Supporting information); however, at every than would be expected by chance. setting of K, all redpolls were assigned to the same pop- ulation cluster. Our path-sampling analysis of different species delimitation models conducted using Bayes fac- Results tor delimitation with SNAPP favoured a model with redpolls lumped as a single species within a coalescent Population genetics using anonymous loci framework (Fig. 2C); both the 35 locus data set with Filtering and locus assembly protocols from the complete sampling (Bayes factor = 36.80) and the 200 ddRAD data generated a redpoll data set consisting locus data set (Bayes factor = 15.22) supported this find- of 20 712 SNPs and a redpoll + out-group data set ing. consisting of 1587 SNPs. A Friedman test revealed dif- When analysed with the out-group and a smaller ferences in observed heterozygosity between species SNP data set (1587 loci), redpolls were easily differenti- (v2 = 4679.26, d.f. = 2, P < 0.05; Fig. S4, Supporting ated from the out-group, but exhibited low genetic

A Fig. 2 Redpoll population genetic analy- ses. (A) Bayesian assignment probabili- ties from STRUCTURE showing lack of population clustering among currently recognized redpoll species using 20 721 SNPs. (B) Genetic PCA plot indicating weak population structure among cur- rently recognized species of redpolls. Common redpoll is represented with blue, hoary redpoll is represented with red, and lesser redpoll is represented Common Hoary Lesser with yellow dots. (C) SNAPP tree using 1587 SNPs for common, hoary and lesser BC redpoll, and white-winged crossbill (grey). Bayes factor delimitation strongly favoured lumping redpolls into a single species (Bayes factor = 36.80). 0 5 10 15 PC2 (1.92%) −15 −5

−10 −5 0 5 PC1 (2.12%) Outgroup

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 9 differentiation among currently recognized species or Differential gene expression and gene annotations geographically isolated populations (Fig. S5A, Support- ing information); the genetic PCA (Fig. S5B, Supporting We used BLAST+ (Camacho et al. 2009) to align our con- information) also separated the out-group from redpolls tigs against the UniProtKB/Swiss-Prot protein database but did not appreciably separate currently recognized and found that 29 850 of our contigs (24.5%) had a sig- redpoll species. Similarly, a PCA of the 215 825 locus nificant BLAST hit with an e-value of ≤1e-5. Our GLMs data set generated from our RNA-Seq experiment (10 that quantified associations between phenotype and individuals that spanned the phenotypic continuum— FPKM values did not recover any genes with a false see Table S2, Supporting information) did not differenti- discovery rate below 0.05 (Table S5, Supporting infor- ate currently described species (Fig. S6, Supporting mation). However, we did identify a number of candi- information). date genes related to morphology that are worthy of An AMOVA of the ddRAD-Seq SNP panel revealed that further study under controlled conditions; for complete- the overwhelming majority of genetic variation is parti- ness, we present the 100 contigs with BLAST hits that tioned within species (98.11%; Table 1) rather than were most strongly associated with phenotypic varia- between species (1.89%; Table 1). We also found that tion among redpolls (Table S5, Supporting information). redpolls exhibit isolation by distance (r = 0.12, P = 0.04): individuals are more closely related to geo- Niche modelling graphically proximate individuals, regardless of their phenotype. Our BAYESCAN analysis did not identify any Acanthis flammea and A. hornemanni demonstrate consid- outlier loci that were highly divergent between cur- erable overlap in suitable habitat in North America, rently recognized species (Fig. S7, Supporting informa- although A. hornemanni does seem to prefer higher lati- tion). tudes (Fig. 1C). The ENM for A. flammea (median AUC = 0.88, interquartile range = 0.87–0.90) predicted highly suitable habitat across much of northern Canada Associations between phenotypes, differentially and Alaska, which reflects its widespread distribution expressed genes and anonymous SNPs throughout North America. In contrast, A. hornemanni We found strong associations between multidimen- (median AUC = 0.94, interquartile range = 0.93–0.95) sional scaling scores of differential gene expression and prefers abiotic conditions associated with higher lati- both principal component scores of the plumage + mor- tudes throughout Canada and Alaska. The variable that phology PCA. The leading log-fold change dimension contributed most to the ENMs of both species was the (LLFC) 1 of multidimensional scaling space was corre- maximum temperature of the warmest month (56.2% lated with PC1 [b = 2.882 Æ 0.307 (SE), P = 3.2e-05, and 49.4% for A. flammea and A. hornemanni, respec- Table 2, Fig. 3A], and LLFC 2 was correlated with PC2 tively; Fig. S8, Supporting information). However, the (b = 0.059 Æ 0.068, P = 0.04, Table 2, Fig. 3B). In con- response curves for certain BioClim variables differed trast, LLFC 1 was not correlated with PC1 of SNP varia- substantially between the two species. For example, the tion from the ddRAD-Seq data (b = À0.229 Æ 0.495, ENM that we constructed for A. hornemanni indicated a P = 0.66, Table 2, Fig. 3C) and LLFC 2 was not corre- higher probability of occurrence, compared to the ENM lated with PC2 of ddRAD-Seq SNP variation for A. flammea, among localities where the maximum (b = 0.144 Æ 0.232, P = 0.55, Table 2, Fig. 3D). Finally, temperature of the warmest month and the annual PC1 and PC2 scores from the panel of SNPs called from mean temperatures were lower (Fig. S9, Supporting the de novo transcriptome were not correlated with information). PC1 and PC2 scores, respectively, from the plum- The niche equivalency test indicated that A. flammea age + morphology PCA (b = À0.812 Æ 0.726, P = 0.30; and A. hornemanni do not occupy identical niches b = À0.005 Æ 0.007, P = 0.50, Table 2, Fig. 3E). (D = 0.579, P < 0.0001; I = 0.824, P < 0.0001; Fig. S10,

Table 1 Results from analysis of molecular variance (AMOVA) using 20 712 SNPs, including the degrees of freedom (d.f.), the sum of squares (SS), mean squared deviation (MSD), variance (r2), the amount of total variation explained by hierarchical level and the Φ estimate of population differentiation ( ST)

r2 Φ d.f. SS MS % of total variation ST

Between species 2 2143.93 1071.97 14.66 1.89 À0.01 Within species 73 55597.32 761.61 761.61 98.11 Total 75 57741.25 769.88

© 2015 John Wiley & Sons Ltd 10 N. A. MASON and S. A. TAYLOR

Table 2 Results from generalized linear models built to assess statistical associations between plumage and morphology principal component scores, multidimensional scaling leading log-fold change scores from RNA-Seq data, SNPs called from the ddRAD-Seq data set, and the transcriptome. Each model is separated with a horizontal rule: models with multidimensional scaling leading log- fold changes also included processing order as a possible predictor variable. The magnitude of each effect score with their corre- sponding standard error (SE) values is shown. All linear model terms that have a P-value lower than 0.05 are indicated with an asterisk

Response Predictor b Æ SE T value P-value

PC1 mdsPC1 2.882 Æ 0.307 9.399 3.20e-05* PC1 Processing order 0.059 Æ 0.068 0.865 0.416 PC2 mdsPC2 1.200 Æ 0.477 2.517 0.04* PC2 Processing order 0.151 Æ 0.08 1.894 0.1 PC1 ddRAD-Seq SNP PC1 À0.229 Æ 0.495 À0.463 0.657 PC2 ddRAD-Seq SNP PC2 0.144 Æ 0.232 0.621 0.554 PC1 RNA-Seq SNP PC1 À0.812 Æ 0.726 À1.119 0.296 PC2 RNA-Seq SNP PC2 À0.005 Æ 0.007 À0.698 0.505

Supporting information). Yet, our background similarity differences in gene expression that are correlated with test indicated that the abiotic conditions that character- redpoll phenotypes, suggesting that gene expression ize the distribution of A. hornemanni are more similar to might play an important role in generating phenotypic those of A. flammea than would be expected based on diversity among redpolls. Finally, we demonstrated that the availability of habitat in North America (D = 0.579, Acanthis flammea and Acanthis hornemanni exhibit 95% CI = 0.371–0.402; I = 0.824, 95% CI = 0.661–0.690; substantial overlap in suitable habitat in North Amer- Fig. S10, Supporting information). Similarly, the ica, with A. hornemanni typically occurring at higher observed similarity indices were higher than the confi- latitudes than A. flammea. dence interval of the null distribution constructed using actual A. flammea occurrence data and randomly gener- Evolutionary history of redpolls ated A. hornemanni data (D = 0.579, 95% CI = 0.502– 0.558; I = 0.824, 95% CI = 0.773–0.813; Fig. S10, Support- The consistent lack of genetic differentiation in two ing information). Thus, while the abiotic niches of large panels of anonymous loci, including hundreds of A. flammea and A. hornemanni are not completely identi- thousands of SNPs within the transcriptome, is surpris- cal, they are more similar than comparisons of either ing given the geographic and phenotypic breadth of species’ ENMs with null models generated from ran- sampling included in this study. Recently, reduced-rep- dom background points throughout North America. resentation genomic approaches, like those used here, have provided unprecedented resolution in other lin- eages that exhibit marked phenotypic diversity on Discussion recent evolutionary timescales (e.g. Lake Victoria cich- Our findings expand upon previous studies of the evo- lids, Wagner et al. 2012; Nicaraguan crater lake cichlids, lutionary dynamics of redpolls that have relied on tra- Elmer et al. 2014; American oaks, Hipp et al. 2014). ditional molecular markers, such as mtDNA and Given the low levels of genome-wide genetic differenti- microsatellites (Marten & Johnson 1986; Seutin et al. ation we detected in redpolls, it appears unlikely that 1995; Ottvall et al. 2002; Marthinsen et al. 2008). currently recognized species underwent long periods of Genome-wide panels of SNPs generated using ddRAD- allopatric divergence and have since come back into Seq and a de novo transcriptome both support the secondary contact. Rather, our findings suggest that hypothesis that redpolls comprise a single, Holarctic A. flammea, A. hornemanni and Acanthis cabaret have a gene pool with remarkably little genetic differentiation predominantly shared evolutionary history and cur- between currently recognized species. The absence of rently comprise a single gene pool distributed through- outlier loci among a panel of tens of thousands of SNPs out the Holarctic, which may be undergoing suggests that currently recognized redpoll species share contemporary differentiation via ecological selection. very recent common ancestry; whole-genome sequenc- Our ENMs demonstrate that the abiotic conditions ing will further clarify patterns of genomic differentia- that characterize A. hornemanni and A. flammea differ tion between redpolls. Intriguingly, we found novel (i.e. A. hornemanni tend to occur at higher latitudes), but

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 11

Fig. 3 Statistical associations between phenotypic diversity and A genetic variation. Phenotypic variation is associated with varia- tion in differences in gene expression profiles, but is uncorre- 12 lated with neutral variation from panels of SNPs constructed streakier plumage

Longer, shallower bill shallower Longer, from ddRAD-Seq loci or the transcriptome. Each individual that was phenotyped and included in the RNA-Seq analyses is coded with a different colour that is consistent across panels. An outer blue circle indicates individuals that would be classi- fied as Acanthis flammea according to traditional , and an outer red circle indicates individuals that would be classi- Bill morphology & plumage (PC1) −4 −3 −2 −1 0 cleaner plumage Shorter, deeper bill Shorter, fied as Acanthis hornemanni. Scatter plots showing (A) PC1 of −1.0 −0.5 0.0 0.5 plumage and morphology variation and leading log-fold Leading logFC dim 1 change dimension (LLFC) 1 of gene expression data; (B) PC2 B of plumage and morphology variation and LLFC 2 of gene expression data; (C) PC1 of plumage and morphology variation

1.0 1.5 and PC1 of SNP variation from ddRAD-Seq data; (D) PC2 of plumage and morphology variation and PC2 of SNP variation Shorter, shallower bill shallower Shorter, more ventral streaking more ventral from ddRAD-Seq data; and (E) PC1 of plumage and morphol- 0.0 0.5 ogy variation and PC1 scores from a genetic PCA of SNPs

−0.5 called from the de novo transcriptome.

Bill morphology & plumage (PC2) are more similar than expected based on available con- −1.5 −1.0

Longer, deeper bill Longer, −1.0 −0.5 0.0 0.5 ditions in North America. This pattern may be present less ventral streaking less ventral Leading logFC dim 2 due to the dichotomous nature of the current species C classification scheme, which does not account for the continuous nature of phenotypic diversity in redpolls and could underemphasize perceived differences in abi-

streakier plumage otic conditions between taxa. Individuals with interme- Longer, shallower bill shallower Longer, diate plumage are placed into one of the two species categories, which may influence how occurrence records are classified and the resulting species distribu- tion models. Redpolls could potentially be experiencing Bill morphology & plumage (PC1) −4 −3 −2 −1 0 1 2 different abiotic conditions in their contemporary distri- cleaner plumage Shorter, deeper bill Shorter, −2 −1 0 1 2 butions, which may have important implications for the ddRAD−Seq PC1 patterns of differential gene expression we report (see D below). By comparing different models of species delimitation within a multispecies coalescent framework that Shorter, shallower bill shallower Shorter, more ventral streaking more ventral accounts for incomplete lineage sorting, we find strong

0.0 0.5 1.0 1.5 support that all currently recognized species of redpolls comprise a single coalescent lineage. In combination with the other population genetic analyses and ecologi- −1.0 −0.5 cal niche modelling results, our findings support the −1.5 Bill morphology & plumage (PC2) assertion that redpolls are part of a single, polymorphic −1 0 1 2 3 Longer, deeper bill Longer,

less ventral streaking less ventral ddRAD−Seq PC2 metapopulation rather than distinct biological entities with separate evolutionary histories. Previous studies based on far fewer molecular markers reached similar E conclusions (Marten & Johnson 1986; Seutin et al. 1995; Ottvall et al. 2002; Marthinsen et al. 2008). Given the streakier plumage Longer, shallower bill shallower Longer, low genetic differentiation we documented between currently recognized species and across large geo- graphic expanses, redpolls might best be treated as a single species. Although the possibility persists that cer- tain regions of the genome may be fixed or highly −4 −3 −2 −1 0 1 2 Bill morphology & plumage (PC1) cleaner plumage Shorter, deeper bill Shorter, −2 −1 0 1 divergent between redpoll types, our findings do not RNA−Seq SNP PC1 support the assertion that multiple, separately evolving

© 2015 John Wiley & Sons Ltd 12 N. A. MASON and S. A. TAYLOR metapopulations exist within the genus Acanthis.We however, the number of loci analysed here represents also demonstrate that individual redpolls classified as only a fraction of the redpoll genome and denser different species span a phenotypic continuum, rather sampling will likely be necessary to detect loci under than discrete classes, which has been shown by previ- divergent selection, if they exist (Michel et al. 2010; ous studies (Troy 1985). Certain authorities, such as Tiffin & Ross-Ibarra 2014). Given that redpolls are BirdLife International, already treat redpolls as a single abundant, incomplete lineage sorting due to large species, and previous studies have arrived at similar ancestral population sizes may be obscuring our ability conclusions (Troy 1985). to detect independently evolving lineages. Redpolls are primarily granivorous and rely on patchily distributed tree seed crops that can vary sub- Differential gene expression and phenotypic diversity stantially in abundance from year to year (Clement 2010a,b). As a result, redpolls are highly nomadic dur- Although the tissues sampled were not optimal for ing the nonbreeding season and travel great distances looking at expression differences related to the pheno- to find food within flocks that regularly contain individ- typic traits we measured, we found a strong correlation uals that vary in phenotype; anecdotes from bird band- between overall levels of differential gene expression ing recoveries suggest that individual redpolls and phenotypic variation among individuals. This sug- frequently travel thousands of miles (e.g. recoveries gests that gene expression variation could be playing an between Scandinavia and China). Widespread annual important role in generating phenotypic diversity movements could contribute to genetic connectivity within redpolls. Controlled aviary experiments that among geographically disjunct populations if redpolls sample multiple stages of development and tissues show low breeding site fidelity and pair with individu- more relevant to plumage streaking and bill size and als possessing dissimilar phenotypes. Indeed, there is shape (e.g. feather follicle, embryonic bill) will be little evidence for assortative mating in redpolls (Troy important extensions of this work. Additionally, it is 1985; but see Lifjeld & Bjerke 1996) and a hybrid zone unclear whether gene expression varies seasonally in between the phenotypes, which would indicate adult redpolls. It is possible that gene expression during restricted gene flow between the species, has never the breeding season could be different than what we been reported. Finally, in addition to the continuous detected in this sample of wintering birds. This will be variation in phenotypic characters used to identify spe- explored in future studies. cies (Troy 1985), redpolls possess very similar vocal Gene expression can play an important role in the repertoires and have been shown to adopt identical evolution of phenotypic differences and local adapta- breeding calls in mixed pairs (Molau 1985) and match tions, as shown in humans (Fraser 2013), sticklebacks flock-mate calls (Mundinger 1979). The possibility exists (Shapiro et al. 2004), Peromyscus mice (Manceau et al. that redpolls may develop flock-specific call repertoires 2010) and Heliconius butterflies (Reed et al. 2011). Cis- that are independent of variation in plumage and mor- regulatory modifications can involve few or many phology (N. Pieplow, personal communication), which upstream promoters or enhancers with large cascading could facilitate gene flow between currently recognized effects on gene expression and resulting phenotypes redpoll species. (Romero et al. 2012). Associations between our multi- Although our data suggest the presence of a single, variate measures of gene expression and phenotypic weakly differentiated gene pool within Acanthis, the diversity indicate broad, multigenic patterns of differen- observed patterns could be the result of extremely tial gene expression among individuals. Thus, we are recent and ongoing speciation, perhaps via ecological currently unable to pinpoint the causal genes underly- selection (Marthinsen et al. 2008). The slight differences ing differences in bill morphology or plumage pattern- we detected in abiotic niches between A. flammea and ing among redpolls, but have reason to believe that A. hornemanni in North America could play a role in these differences may be adaptive and potentially the divergence process. Indeed, the plumage and mor- related to the slight niche differences that we detected phological characters we measured may have adaptive (see above). Given that phenotypic variation is continu- significance: lighter plumage may improve camouflage ous within the genus (Fig. 3; Troy 1985), there are prob- at higher latitudes, and smaller bills (possessed by the ably many loci that contribute to the differences in higher latitude A. hornemanni) are known to reduce heat morphology and plumage discussed here. loss in birds (Symonds and Tattersall 2010; Greenberg Although none of the genes identified in this study et al. 2012). If redpolls have recently experienced diver- were significantly associated with phenotypic variation gent selection, we would expect to have detected some following a correction for multiple hypothesis testing, level of genomic heterogeneity rather than the wide- we did find multiple candidate genes worthy of further spread lack of differentiation that we document here; study. For example, our list of genes that were most

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 13 strongly associated with phenotypic variation included always be differentiated via high-throughput sequenc- multiple genes involved in the Wnt signalling pathway ing. This has important implications for taxonomic revi- (e.g. tsukushin and frizzled-3; Table S5, Supporting infor- sions, conservation and the designation of conservation mation). Expression levels of multiple genes involved in units (see, Funk et al. 2012; McCormack & Maley 2015): Wnt signalling appear to play a role in developing dif- such phenotypically diverse lineages may represent sin- ferent bill morphologies among birds (Brugmann et al. gle evolutionary units, as appears to be the case in the 2010) and facial developmental pathways of various redpoll finches, for which thousands of anonymous loci vertebrates (Brugmann et al. 2007). Additionally, the provided no evidence of population divergence, or they Wnt signalling pathway regulates Bmp pathway activity may not. Even with whole-genome data, identifying (Tzahor et al. 2003), a well-studied developmental path- loci that contribute to phenotypic diversity is challeng- way that affects bill morphology in birds (Abzhanov ing, particularly for quantitative traits that probably et al. 2004). With respect to plumage variation among involve many loci (Tiffin & Ross-Ibarra 2014). Intrigu- redpolls, the MC1R pathway has been implicated in ingly, our results indicate that gene expression informa- melanin-based phenotypic variation in many vertebrate tion may reveal how phenotypic differences arise systems (Mundy 2005; Hubbard et al. 2010). We found among taxa; however, controlled conditions are that higher expression levels of MC5R, which also plays required to remove environmentally induced variation a role in regulating cyclic AMP levels and the melano- as a source of bias. genesis pathways, are associated with increased ventral Rapid bouts of phenotypic diversification and specia- and dorsal streaking (Table S5, Supporting informa- tion have provided seminal examples of evolution in tion). action, yet also present theoretical, computational and The presence of differential gene expression despite conservation challenges. Despite technological advances low levels of genomic differentiation suggests that phe- that have characterized the genomic era, these chal- notypic plasticity could also play an important role in lenges remain in many systems. Continued persistence generating phenotypic diversity in redpolls. Identical and analytical innovation will reward molecular ecolo- genotypes can produce variable phenotypes under dif- gists with acute knowledge regarding the evolutionary fering environmental conditions, but examinations of and ecological processes that comprise lineage diversifi- phenotypic plasticity usually focus on intraspecific com- cation and phenotypic differentiation in rapidly evolv- parisons rather than interspecific comparisons (West- ing lineages. Eberhard 1989, 2005; Nijhout 2003). Environmental cues could produce different phenotypes among redpolls if they act early in development as is commonly observed Acknowledgements in insects and plants (Nijhout 2003; West-Eberhard M. Young, A. Cibois, D. Bonter, J. Barry, L. Seitz and S. Birks 2005). The potential for environmentally induced phe- at The University of Washington Burke Museum helped notypic plasticity to cause geographic variation in mor- gather genetic resources. E. Bondra and S. Bogdanowicz phology in vertebrates is less well known (but see assisted with library prep. We thank I. Lovette, R. Harrison and K. Wagner for comments on the manuscript. G. Brad- James 1983) and is an important avenue of future burd, P. Title, L. Campagna, P. Deane-Coe, A. Ellison, C. Bala- inquiry in redpolls. krishnan, Z. Cheviron and M. Hare provided useful insight and discussions. W. Jeyasingham drew redpoll portraits for Conclusions our figures. We thank three anonymous reviewers for helpful feedback that improved the manuscript. We are grateful Our ability to understand the evolutionary context that towards all institutions that have provided occurrence data facilitates rapid phenotypic evolution has been greatly through the GBIF portal. This study was funded in part by a Keickheffer Adirondack Fellowship, the Cornell Lab of Orni- improved by the adoption of high-throughput sequenc- thology Athena Fund and the Fuller Evolutionary Biology ing technologies and reduced-representation genomic Program at the Cornell Lab of Ornithology. SAT was sup- approaches (e.g. Wagner et al. 2012; Cui et al. 2013; ported in part by the Cornell Center for Comparative and Jones et al. 2013; Nadeau et al. 2013; Elmer et al. 2014). Population Genomics as well as a Banting Fellowship from However, these approaches still generate data sets that the Natural Sciences and Engineering Research Council of represent a small portion of the genome. It is becoming Canada. NAM was supported in part by an EPA STAR Grad- increasingly clear that such approaches are not a pana- uate Fellowship (F13F21201). cea for resolving species boundaries and the genetic architecture of phenotypic diversity within rapidly References diversifying lineages. As demonstrated here, even lin- ~ eages that have traditionally been classified as separate Abramoff MD, Magalhaes PJ, Ram SJ (2004) Image processing – species with pronounced phenotypic variation cannot with ImageJ. Biophotonics International, 11,36 43.

© 2015 John Wiley & Sons Ltd 14 N. A. MASON and S. A. TAYLOR

Abzhanov A, Protas M, Grant BR, Grant PR, Tabin CJ (2004) Cui R, Schumer M, Kruesi K, Walter R, Andolfatto P, Rosen- Bmp4 and morphological variation of in Darwin’s thal GG (2013) Phylogenomics reveals extensive reticulate finches. Science, 305, 1462–1465. evolution in Xiphophorus fishes. Evolution, 67, 2166–2179. Barton NH, Hewitt GM (1981) Hybrid zones and speciation. Danecek P, Auton A, Abecasis G et al. (2011) The variant call In: Evolution and Speciation (eds Atchley WR, Woodruff DS), format and VCFtools. Bioinformatics, 27, 2156–2158. pp. 109–145. Cambridge University Press, Cambridge, UK. DePristo MA, Banks E, Poplin R et al. (2011) A framework for Barve N, Barve V, Jimenez-Valverde A et al. (2011) The crucial variation discovery and genotyping using next-generation role of the accessible area in ecological niche modeling and spe- DNA sequencing data. Nature Genetics, 43, 491–498. cies distribution modeling. Ecological Modelling, 222,1810–1819. Drovetski SV, Zink RM, Rohwer S et al. (2004) Complex bioge- Bazykin AD (1969) Hypothetical mechanism of speciation. Evo- ographic history of a Holarctic . Proceedings of the lution, 23, 685–687. Royal Society B: Biological Sciences, 271, 545–551. Benjamini Y, Hochberg Y (1995) Controlling the false discovery Drovetski SV, Zink RM, Mode NA (2009) Patchy distribution rate: a practical and powerful approach to multiple testing. belie morphological and genetic homogeneity in rosy-finches. Journal of the Royal Statistical Society Series B-Methodological, Molecular Phylogenetics and Evolution, 50, 437–445. 57, 289–300. Earl DA, vonHoldt BM (2011) STRUCTURE HARVESTER: a Bouckaert R, Heled J, Kuhnert€ D et al. (2014) BEAST 2: a soft- website and program for visualizing STRUCTURE output ware platform for Bayesian evolutionary analysis. PLoS Com- and implementing the Evanno method. Conservation Genetics putational Biology, 10, e1003537. Resources, 4, 359–361. Brawand D, Soumillon M, Necsulea A et al. (2012) The evolu- Ekblom R, Farrell LL, Lank DB, Burke T (2012) Gene expres- tion of gene expression levels in mammalian organs. Nature, sion divergence and nucleotide differentiation between males 478, 343–348. of different color morphs and mating strategies in the ruff. Brooks WS (1968) Comparative adaptations of the Alaskan red- Ecology and Evolution, 2, 2485–2505. polls to the arctic environment. Wilson Bulletin, 80, 253–280. Elith J, Graham CH, Anderson RP et al. (2006) Novel methods Brugmann SA, Goodnough LH, Gregorieff A et al. (2007) Wnt improve prediction of species’ distributions from occurrence signaling mediates regional specification in the vertebrate data. Ecography, 29, 129–151. face. Development, 134, 3283–3295. Elith J, Phillips SJ, Hastie T et al. (2011) A statistical explana- Brugmann SA, Powder KE, Young NM et al. (2010) Compara- tion of MaxEnt for ecologists. Diversity and Distributions, 17, tive gene expression analysis of avian embryonic facial struc- 43–57. tures reveals new candidates for human craniofacial Elmer KR, Fan S, Kusche H et al. (2014) Parallel evolution of disorders. Human Molecular Genetics, 19, 920–930. Nicaraguan crater lake cichlid fishes via non-parallel routes. Bryant D, Bouckaert R, Felsenstein J et al. (2012) Inferring spe- Nature Communications, 5, 5168. cies trees directly from biallelic genetic markers: bypassing Escudero M, Eaton DAR, Hahn M, Hipp AL (2014) Genotyp- gene trees in a full coalescent analysis. Molecular Biology and ing-by-sequencing as a tool to infer phylogeny and ancestral Evolution, 29, 1917–1932. hybridization: a case study in Carex (Cyperaceae). Molecular Camacho C, Coulouris G, Avagyan V et al. (2009) BLAST+: Phylogenetics and Evolution, 79, 359–367. architecture and applications. BMC Bioinformatics, 10, 421. Evanno G, Regnaut S, Goudet J (2005) Detecting the number of Catchen JM, Amores A, Hohenlohe P et al. (2011) Stacks: build- clusters of individuals using the software structure: a simula- ing and genotyping Loci de novo from short-read sequences. tion study. Molecular Ecology, 14, 2611–2620. G3 (Bethesda, Md.), 1, 171–182. Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecu- Catchen J, Hohenlohe PA, Bassham S et al. (2013) Stacks: an lar variance inferred from metric distances among DNA analysis tool set for population genomics. Molecular Ecology, haplotypes: application to human mitochondrial DNA 22, 3124–3140. restriction data. Genetics, 131, 479–491. Chessel D, Dufour AB, Dray S (2004) CRAN Package ade4. Filteau M, Pavey SA, St-Cyr J, Bernatchez L (2013) Gene coex- Analysis of Ecological Data. v1.6-2. R Foundation for Statistical pression networks reveal key drivers of phenotypic diver- Computing, Vienna, Austria. gence in lake whitefish. Molecular Biology and Evolution, 30, Clement P (2010a) Common Redpoll ( flammea). In: 1384–1396. Handbook of the Birds of the World (eds del Hoyo J, Elliott A, Foll M, Gaggiotti O (2008) A genome-scan method to identify Christie DA), pp. 564–565. Lynx Edicions, Barcelona, Spain. selected loci appropriate for both dominant and codominant Clement P (2010b) (Carduelis hornemanni). In: markers: a Bayesian perspective. Genetics, 180, 977–993. Handbook of the Birds of the World (eds del Hoyo J, Elliott A, Fraser HB (2013) Gene expression drives local adaptation in Christie DA), pp. 565–566. Lynx Edicions, Barcelona, Spain. humans. Genome Research, 23, 1089–1096. Clements JF, Schulenberg TS, Iliff MJ et al. (2014) The Clements Funk WC, McKay JK, Hohenlohe PA, Allendorf FW (2012) Checklist of Birds of the World: Version 6.9. Cornell University Harnessing genomics for delineating conservation units. Press, Ithaca, New York. Trends in Ecology and Evolution, 27, 489–496. Conesa A, Gotz S, Garcia-Gomez JM et al. (2005) Blast2GO: a Grabherr MG, Haas BJ, Yassour M et al. (2011) Full-length tran- universal tool for annotation, visualization and analysis in scriptome assembly from RNA-Seq data without a reference functional genomics research. Bioinformatics, 21, 3674–3676. genome. Nature Biotechnology, 29, 644–652. Coues E (1862) A monograph of the genus Aegiothus with Graham CH, Ron SR, Santos JC et al. (2004) Integrating phylog- descriptions of new species. Proceedings of the Academy of Nat- enetics and environmental niche models to explore specia- ural Sciences of Philadelphia, 13, 373–390. tion mechanisms in dendrobatid frogs. Evolution, 58, 1781.

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 15

Greenberg R, Cadena V, Danner RM, Tattersall G (2012) Heat Jones JC, Fan S, Franchini P, Schartl M, Meyer A (2013) The loss may explain bill size differences between birds occupy- evolutionary history of Xiphophorus fish and their sexually ing different habitats. PLoS ONE, 7, e40933. selected sword: a genome-wide approach to using restriction Grummer JA, Bryson RW, Reeder TW (2014) Species delimita- site-associated DNA sequencing. Molecular Ecology, 22, 2986– tion using bayes factors: simulations and application to the 3001. sceloporus scalaris species group (Squamata: Phrynosomati- Kass RE, Raftery AE (1995) Bayes factors. Journal of the Ameri- dae). Systematic Biology, 63, 119–133. can Statistical Association, 90, 773–795. Haas BJ, Papanicolaou A, Yassour M et al. (2013) De novo tran- Key K (1968) The concept of stasipatric speciation. Systematic script sequence reconstruction from RNA-seq using the Trin- Biology, 17,14–22. ity platform for reference generation and analysis. Nature Knox A, Helbig A, Parkin D, Sangster G (2001) The taxonomic Protocols, 8, 1494–1512. status of Lesser Redpoll. British Birds, 94, 260–267. Harris MP, Norman FJ, McColl R (1965) A mixed population Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast of redpolls in northern Norway. British Birds, 58, 288–294. and memory-efficient alignment of short DNA sequences to Harrison RG (1990) Hybrid zones: windows on evolutionary the human genome. Genome Biology, 10, R25. process. Oxford Surveys in Evolutionary Biology, 7,69–128. Larson WA, Seeb LW, Everett MV, Waples RK, Templin WD, Harrison RG, Rand DM (1989) Mosaic hybrid zone and the nat- Seeb JE (2013) Genotyping by sequencing resolves shallow ure of species boundaries. In: Speciation and its Consequences population structure to inform conservation of Chinook sal- (eds Otte D, Ender JA), pp. 111–133. Sinauer Associates Inc, mon (Oncorhynchus tshawytscha). Evolutionary Applications, 7, Sunderland, Massachusetts. 355–369. Harrison PW, Wright AE, Mank JE (2012) The evolution of Leache AD, Fujita MK, Minin VN, Bouckaert RR (2014) Species gene expression and the transcriptome–phenotype relation- delimitation using genome-wide SNP data. Systematic Biol- ship. Seminars in Cell & Developmental Biology, 23, 222–229. ogy, 63, 534–542. Harvey MG, Brumfield RT (2014) Genomic variation in a wide- Legendre P, Gallagher E (2001) Ecologically meaningful spread Neotropical bird (Xenops minutus) reveals divergence, transformations for ordination of species data. Oecologia, 129, population expansion, and gene flow. Molecular Phylogenetics 271–280. and Evolution, 83, 305–316. Lemmon EM, Lemmon AR (2013) High-throughput genomic Hastie T, Tibshirani R, Friedman JH (2001) The Elements of Sta- data in systematics and phylogenetics. Annual Review of Ecol- tistical Learning: Data Mining, Inference, and Prediction. ogy Evolution and Systematics, 44,99–121. Springer-Verlag, New York, New York. Li B, Dewey CN (2011) RSEM: accurate transcript quantifica- Heibl C, Calenge C (2013) phyloclim: Integrating phylogenetics tion from RNA-Seq data with or without a reference. BMC and climatic niche modeling. R package version 0.9-4. Bioinformatics, 12, 323. Herremans M (1989) Taxonomy and evolution in redpolls Li H, Durbin R (2009) Fast and accurate short read alignment Carduelis flammea–hornemanni; a multivariate study of their with Burrows-Wheeler transform. Bioinformatics, 25, 1754– biometry. Ardea, 78, 441–458. 1760. Hijmans RJ, Garrett KA, Huaman Z et al. (2000) Assessing the Lifjeld JT, Bjerke BA (1996) Evidence for assortative pairing by geographic representativeness of genebank collections: the cabaret and flammea subspecies of the common redpoll the case of Bolivian wild potatoes. Conservation Biology, 14, Carduelis flammea in SE Norway. Fauna Norvegica Series C, 1755–1765. Cinclus, 19,1–8. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A (2005) Lohse M, Bolger AM, Nagel A et al. (2012) RobiNA: a user- Very high resolution interpolated climate surfaces for glo- friendly, integrated software solution for RNA-Seq-based bal land areas. International Journal of Climatology, 25, 1965– transcriptomics. Nucleic Acids Research, 40, W622–W627. 1978. Maddison W, Knowles L (2006) Inferring phylogeny despite Hijmans RJ, Phillips S, Leathwick J, Elith J (2013) dismo: Spe- incomplete lineage sorting. Systematic Biology, 55,21–30. cies distribution modeling. Manceau M, Domingues VS, Linnen CR, Rosenblum EB, Hoek- Hipp AL, Eaton DAR, Cavender-Bares J, Fitzek E, Nipper R, stra HE (2010) Convergence in pigmentation at multiple lev- Manos PS (2014) A framework phylogeny of the American els: mutations, genes and function. Philosophical Transactions oak clade based on sequenced RAD data. PLoS ONE, 9, of the Royal Society B: Biological Sciences, 365, 2439–2450. e93975. Marten JA, Johnson NK (1986) Genetic relationships of North vonHoldt BM, Pollinger JP, Earl DA et al. (2011) A genome- American cardueline finches. Condor, 88, 409–420. wide perspective on the evolutionary history of enigmatic Marthinsen G, Wennerberg L, Lifjeld JT (2008) Low support wolf-like canids. Genome Research, 21, 1294–1305. for separate species within the redpoll complex (Carduelis Hubbard JK, Uy JAC, Hauber ME, Hoekstra HE, Safran RJ flammea–hornemanni–cabaret) from analyses of mtDNA and (2010) Vertebrate pigmentation: from underlying genes to microsatellite markers. Molecular Phylogenetics and Evolution, adaptive function. Trends in Genetics, 26, 231–239. 47, 1005–1017. James FC (1983) Environmental component of morphological Mastretta-Yanes A, Arrigo N, Alvarez N, Jorgensen TH, Pinero~ differentiation in birds. Science, 221, 184–186. D, Emerson BC (2015) Restriction site-associated DNA Jombart T (2008) adegenet: a R package for the multivariate sequencing, genotyping error estimation and de novo assem- analysis of genetic markers. Bioinformatics, 24, 1403–1405. bly optimization for population genetic inference. Molecular Jombart T, Ahmed I (2011) adegenet 1.3-1: new tools for the Ecology, 15,28–41. analysis of genome-wide SNP data. Bioinformatics, 27, 3070– McCormack JE, Maley JM (2015) Interpreting negative results 3071. with taxonomic and conservation implications: another look

© 2015 John Wiley & Sons Ltd 16 N. A. MASON and S. A. TAYLOR

at the distinctness of coastal California Gnatcatchers. Auk, R Core Team (2014) R: A Language and Environment for Statisti- 132, 380–388. cal Computing. R Core Team, Vienna, Austria. McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield Rand DM, Harrison RG (1989) Ecological genetics of a mosaic RT (2013) Applications of next-generation sequencing to phy- hybrid zone: mitochondrial, nuclear, and reproductive differ- logeography and phylogenetics. Molecular Phylogenetics and entiation of crickets by soil type. Evolution, 43, 432–449. Evolution, 66, 526–538. Reed RD, Papa R, Martin A et al. (2011) Optix drives the Michel AP, Sim S, Powell TH et al. (2010) Widespread genomic repeated convergent evolution of butterfly wing pattern divergence during sympatric speciation. Proceedings of the mimicry. Science, 333, 1137–1141. National Academy of Sciences of the USA, 107, 9724–9729. Ritchie ME, Phipson B, Wu D et al. (2015) Limma powers dif- Molau M (1985) Grasiskkomplexet i Sverige (The redpoll com- ferential expression analyses for RNA-sequencing and micro- plex in ). Var Fagelv€arld, 44,5–20. array studies. Nucleic Acids Research, 43,1–13. Mundinger PC (1979) Call learning in the : etholo- Robinson MD, McCarthy DJ, Smyth GK (2009) edgeR: a Bio- gical and systematic considerations. Systematic Biology, 28, conductor package for differential expression analysis of dig- 270–283. ital gene expression data. Bioinformatics, 26, 139–140. Mundy NI (2005) A window on the genetics of evolution: Romero IG, Ruvinsky I, Gilad Y (2012) Comparative studies of MC1R and plumage colouration in birds. Proceedings of the gene expression and the evolution of gene regulation. Nature Royal Society B: Biological Sciences, 272, 1633–1640. Reviews Genetics, 13, 505–516. Nadeau NJ, Martin SH, Kozak KM et al. (2013) Genome-wide Sangster G, Knox AG, Helbig AJ, Parkin DT (2002) Taxonomic patterns of divergence and gene flow across a butterfly radi- recommendations for European birds. Ibis, 144, 153–159. ation. Molecular Ecology, 22, 814–816. Schoener TW (1968) The Anolis lizards of Bimini: resource par- Nice CC, Gomp[ert Z, Fordyce JA, Forister ML, Lucas LK, titioning in a complex fauna. Ecology, 49, 704–726. Buerkle CA (2012) Hybrid speciation and independent Seehausen O, Butlin RK, Keller I et al. (2014) Genomics and the evolution in lineages of alpine butterflies. Evolution, 67, origin of species. Nature Reviews Genetics, 15, 176–192. 1055–1068. Seutin G, Boag PT, White BN, Ratcliffe LM (1991) Sequential Nijhout HF (2003) Development and evolution of adaptive polyandry in the Common Redpoll (Carduelis flammea). Auk, polyphenisms. Evolution and Development, 5,9–18. 108, 166–170. Nosil P, Feder JL (2012) Genomic divergence during speciation: Seutin G, Boag PT, Ratcliffe LM (1992) Plumage variability in causes and consequences. Philosophical Transactions of the redpolls from Churchill, Manitoba. Auk, 109, 771–785. Royal Society B, 367, 332–342. Seutin G, Ratcliffe LM, Boag PT (1995) Mitochondrial DNA Ottvall R, Bensch S, Walinder G, Lifjeld JT (2002) No evidence homogeneity in the phenotypically diverse redpoll finch of genetic differentiation between lesser redpolls Carduelis complex (Aves: Carduelinae: Carduelis flammea-hornemanni). flammea cabaret and common redpolls Carduelis f. flammea. Evolution, 49, 962–973. Avian Science, 2, 237–244. Shapiro MD, Marks ME, Peichel CL et al. (2004) Genetic and Paradis E (2010) pegas: an R package for population genetics developmental basis of evolutionary pelvic reduction in with an integrated-modular approach. Bioinformatics, 26, threespine sticklebacks. Nature, 428, 717–723. 419–420. Smyth GK (2005) Limma: linear models for microarray data. Pavey SA, Collin H, Nosil P, Rogers S (2010) The role of gene In: Bioinformatics and Computational Biology Solutions Using expression in ecological speciation. Annals of the New York R and Bioconductor (eds Gentleman R, Carey V, Huber W, Academy of Sciences, 1206, 110–129. Irizarry R, Dudoit S), pp. 397–420. Springer, New York, Peterson AT, Soberon J, Sanchez-Cordero V (1999) Conserva- NY. tism of ecological niches in evolutionary time. Science, 285, Symonds MRE, Tattersall GJ (2010) Geographical variation in 1265–1267. bill size across bird species provides evidence for allen’s Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE rule. The American Naturalist, 176, 188–197. (2012) Double digest RADseq: an inexpensive method for de Tiffin P, Ross-Ibarra J (2014) Advances and limits of using pop- novo SNP discovery and genotyping in model and non- ulation genetics to understand local adaptation. Trends in model species. PLoS ONE, 7, e37135. Ecology and Evolution, 29, 673–680. Phillips SJ, Anderson RP, Schapire RE (2006) Maximum Troy DM (1985) A phenetic analysis of the redpolls Carduel- entropy modeling of species geographic distributions. Ecolog- is flammea flammea and C. hornemanni exilipes. Auk, 102,82– ical Modelling, 190, 231–259. 96. Poelstra JW, Vijay N, Bossu CM et al. (2014) The genomic land- Tzahor E, Kempf H, Mootoosamy RC et al. (2003) Antago- scape underlying phenotypic integrity in the face of gene nists of Wnt and BMP signaling promote the formation of flow in crows. Science, 344, 1410–1414. vertebrate head muscle. Genes and Development, 17, 3087– Pritchard JK, Stephens M, Donnelly P (2000) Inference of popu- 3099. lation structure using multilocus genotype data. Genetics, Wagner CE, Keller I, Wittwer S et al. (2012) Genome-wide 155, 945–959. RAD sequence data provide unprecedented resolution of Questiau S, Eybert MC, Gaginskaya AR, Gielly L, Taberlet P species boundaries and relationships in the Lake Victoria (1998) Recent divergence between two morphologically dif- cichlid adaptive radiation. Molecular Ecology, 22, 787–798. ferentiated subspecies of bluethroat (Aves: Muscicapidae: Warren DL, Glor RE, Turelli M (2008) Environmental niche Luscinia svecica) inferred from mitochondrial DNA sequence equivalency versus conservatism: quantitative approaches to variation. Molecular Ecology, 7, 239–245. niche evolution. Evolution, 62, 2868–2883.

© 2015 John Wiley & Sons Ltd PHENOTYPIC AND GENETIC VARIATION IN REDPOLLS 17

Warren DL, Glor RE, Turelli M (2010) ENMTools: a toolbox for comparative studies of environmental niche models. Ecogra- Supporting information phy, 33, 607–611. Additional supporting information may be found in the online ver- West-Eberhard MJ (1989) Phenotypic plasticity and the origins sion of this article. of diversity. Annual Review of Ecology and Systematics, 20, 249–278. Table S1 Sampling information for 77 ingroup and two out- West-Eberhard MJ (2005) Developmental plasticity and the ori- group individuals for population genetic analyses presented in gin of species differences. Proceedings of the National Academy this study. of Sciences of the USA, 102, 6543–6549. Wolf JBW, Lindell J, Backstrom N (2010) Speciation genetics: Table S2 Phenotype data for 10 individuals collected for RNA- current status and evolving approaches. Philosophical Transac- Seq experiment. tions of the Royal Society B: Biological Sciences, 365, 1717–1733. Table S3 Breeding occurrence data for A. flammea and A. horne- Wu CI (2001) The genic view of the process of speciation. Jour- manni in North America used in generating ecological niche nal of Evolutionary Biology, 14, 851–865. models. Zuccon D, Prys-Jones^ R, Rasmussen PC, Ericson PGP (2012) The phylogenetic relationships and generic limits of Table S4 Results of Structure with different settings of the K finches (Fringillidae). Molecular Phylogenetics and Evolution, parameter. 62, 581–596. Table S5 Results of differential gene expression analysis using generalized linear models and morphology + plumage PC1 as the response variable. S.A.T. and N.A.M. conceived the study. N.A.M. and S.A.T. conducted field work. S.A.T. generated ddRAD- Fig. S1 Loadings for principal component axes of plumage and Seq libraries. N.A.M. performed transcriptome assembly morphology characters. and differential gene expression analyses. N.A.M. and Fig. S2 Scatterplot of first two principal component axes of S.A.T. annotated the transcriptome. N.A.M. ran the plumage and morphology. population genetic analyses. N.A.M. generated figures Fig. S3 Summary statistics from Trinity de novo assembly. with help from S.A.T. N.A.M. and S.A.T. wrote the manuscript. Fig. S4 Comparison of observed heterozygosity among 20 712 loci sampled from A. flammea, A. hornemanni, and A. cabaret.

Fig. S5 Results of Structure and genetic PCA analyses using a locus assembly pipeline that includes outgroup individuals. Data accessibility Fig. S6 Results of PCA using SNPs called from the transcrip- The de novo transcriptome assembly and raw reads of tome. RNA-Seq data are available through BioProject number PRJNA256306. Raw, demultiplexed reads from the Fig. S7 Output of BAYESCAN run on ddRAD-Seq loci, which did ddRAD-Seq portion of this study are available through not detect any outlier loci between currently recognized species of redpolls. the NCBI Short Read Archive (SRP052607). Phenotypic measurements and occurrence data used in this study Fig. S8 Percent contribution of each BioClim variable included are available through the Supporting Information. Pan- in constructing ecological niche models for A. hornemanni and els of SNP data generated from the ddRAD-Seq pipe- A. flammea. lines and the transcriptome, FPKM read counts for gene Fig. S9 Response curves of each BioClim variable included in expression analyses, input files for SNAPP, a posterior constructing ecological niche models for A. hornemanni and distribution of trees generated from SNAPP, and R and A. flammea. shell scripts used to run bioinformatics pipelines and Fig. S10 Statistical tests for niche identity (top) and back- analyse data are available through Dryad (http:// ground similarity (bottom) for A. hornemanni and A. flammea. dx.doi.org/10.5061/dryad.15rk0).

© 2015 John Wiley & Sons Ltd