ARTICLE doi:10.1038/nature14181

Evolution of Darwin’s finches and their beaks revealed by genome sequencing

Sangeet Lamichhaney1*, Jonas Berglund1*, Markus Sällman Almén1, Khurram Maqbool2, Manfred Grabherr1, Alvaro Martinez-Barrio1, Marta Promerová1, Carl-Johan Rubin1, Chao Wang1, Neda Zamani1,3, B. Rosemary Grant4, Peter R. Grant4, Matthew T. Webster1 & Leif Andersson1,2,5

Darwin’s finches, inhabiting the Galápagos archipelago and Cocos Island, constitute an iconic model for studies of speciation and adaptive evolution. Here we report the results of whole-genome re-sequencing of 120 individuals representing all of the Darwin’s finch species and two close relatives. Phylogenetic analysis reveals important discrepancies with the phenotype-based taxonomy. We find extensive evidence for interspecific gene flow throughout the radiation. Hybridization has given rise to species of mixed ancestry. A 240 kilobase haplotype encompassing the ALX1 gene that encodes a transcription factor affecting craniofacial development is strongly associated with beak shape diversity across Darwin’s finch species as well as within the medium ground finch (Geospiza fortis), a species that has undergone rapid evolution of beak shape in response to environmental changes. The ALX1 haplotype has contributed to diversification of beak shapes among the Darwin’s finches and, thereby, to an expanded utilization of food resources.

Adaptive radiations are particularly informative for understanding the ecological and genetic basis of biodiversity1,2. Those causes are best identified in young radiations, as they represent the early stages of diversification when phenotypic transitions between species are small and interpretable and extinctions are likely to be minimal3. Darwin’s finches are a classic example of such a young adaptive radiation3,4. They have diversified in beak sizes and shapes, feeding habits and diets in adapting to different food resources4,5 (Extended Data Table 1). The radiation is entirely intact, unlike most other radiations, none of the species having become extinct as a result of human activities4.

Fourteen of the currently recognized species evolved from a common ancestor in the Galápagos archipelago (Fig. 1a) in the past 1.5 million years according to mitochondrial DNA (mtDNA) dating6; a fifteenth species inhabits Cocos Island. The radiation proceeded rapidly as a result of strong isolation from the South American continent, generation of new islands by volcanic activity, climatic oscillations caused by the El Niño phenomenon, and sea level changes associated with glacial and interglacial cycles over the past million years that led to repeated alternations of island formation and coalescence7,8.

Traditional taxonomy of Darwin’s finches is based on morphology3, and has been largely supported by observations of breeding birds4,5 and genetic analysis6,9. However, the branching order of several recently diverged taxa is unresolved6 and genetic analysis of phylogeny has been limited to mtDNA and a few microsatellite loci. Some candidate genes for beak development are differentially expressed in species with different beak morphologies10–12, but the loci controlling genetic variation in beak diversity among Darwin’s finches remain to be discovered.

Here we report results from whole genome re-sequencing of 120 individuals representing all Darwin’s finch species and two closely related tanagers, Tiaris bicolor and Loxigilla noctis13. For some species we collected samples from multiple islands (Fig. 1a). We comprehensively analyse patterns of intra- and interspecific genome diversity and phylogenetic relationships among species. We find widespread evidence of interspecific gene flow that may have enhanced evolutionary diversification throughout phylogeny, and report the discovery of a locus with a major effect on beak shape.

Considerable nucleotide diversity
We generated approximately 10× sequence coverage per individual using 2 × 100 base-pair (bp) paired-end reads (Extended Data Fig. 1). Reads were aligned to the genome assembly of a female medium ground finch (G. fortis)14. We identified Z- and W-linked scaffolds on the basis of significant differences in read depth between males (ZZ) and females (ZW) (Supplementary Table 1) and generated a G. fortis mtDNA sequence through a combined bioinformatics and experimental approach. Stringent variant calling revealed approximately 45 million variable sites within or between populations. We found a considerable amount of genetic diversity within each population, in the range 0.3 × 10⁻³ to 2.2 × 10⁻³ (Extended Data Table 2), similar to that reported in other bird populations15 including island populations of the zebra finch16. We used these estimates of diversity to estimate effective population sizes of Darwin’s finch species within a range of 6,000–60,000 (Supplementary Text). Extensive sharing of genetic variation among populations was evident, particularly among ground and tree finches, with almost no fixed differences between species in each group (Extended Data Fig. 2).

Genome-based phylogeny
According to the classical taxonomy of Darwin’s finches, supported by morphological and mitochondrial (cytochrome b) data, warbler finches were the first to branch off, and ground and tree finches constitute the most recent major split3,6,9. Our maximum-likelihood phylogenetic tree based on autosomal genome sequences is generally consistent with current taxonomy, but shows several interesting deviations (Fig. 1b). First, Geospiza difficilis occurring on six different islands forms a polyphyletic group separated into three distinct groups: (1) populations occupying the highlands of Pinta, Santiago and Fernandina, (2) populations occupying the low islands of Wolf and Darwin in the northwest3,6,9 and (3) the population on Genovesa in the northeast. This is consistent with

1Department of Medical Biochemistry and Microbiology, Uppsala University, SE-751 23 Uppsala, Sweden. 2Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden. 3Department of Plant Physiology, Umeå University, SE-901 87 Umeå, Sweden. 4Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA. 5Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843-4458, USA. *These authors contributed equally to this work.

19 FEBRUARY 2015 | VOL 518 | NATURE | 371 ©2015 Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE

an earlier version of the taxonomy, in which these three groups were classified as distinct species on the basis of morphological differences17,18.

Second, Geospiza conirostris on Española showed the highest genetic similarity to another species, Geospiza magnirostris, whereas G. conirostris on Genovesa clustered with Geospiza scandens (Fig. 1b). Here, phenotypic similarity parallels genetic similarity; G. conirostris on Genovesa have a pointed beak similar to G. scandens, whereas those on Española have a blunt beak more similar to the beaks of G. magnirostris (Extended Data Fig. 3).

A network constructed from autosomal genome sequences indicates conflicting signals in the internal branches of ground and tree finches that may reflect incomplete lineage sorting and/or gene flow (Extended Data Fig. 3). The exact branching order of the most recently evolved ground and tree finches should be interpreted with caution as it may change with additional sampling. Since our data revealed some important discrepancies with the phenotype-based taxonomy, we propose a revised taxonomy for the sharp-beaked ground finch (G. difficilis) and the large cactus finch (G. conirostris) (Supplementary Text and Extended Data Fig. 4), but will use the current names in the text.

We dated phylogenetic splits on the basis of genome divergence (Fig. 2a), and compared these estimates with those obtained using mtDNA (Extended Data Fig. 5a and Supplementary Text). We infer that the most basal split, between warbler finches (Certhidea sp.) and other finches, occurred about 900,000 years ago. The rapid radiations of ground and tree finches began around 100,000–300,000 years ago. Although these estimates are based on whole-genome data, they should be considered minimum times, as they do not take into account gene flow.

Extensive interspecies gene flow
The discrepancies between phylogenies based on morphology and genome sequences may be due to convergent evolution and/or interspecies gene flow. We found evidence of introgression from three sources: ABBA–BABA tests, discrepancies between phylogenetic trees based on autosomal and sex-linked loci, and mtDNA (Supplementary Text and Extended Data Fig. 5a).

Figure 1 | Sample locations and phylogeny of Darwin’s finches. a, Geographical origin of samples; the letter after the species name is the abbreviation used for geographical origin. The map is modified from ref. 30. b, Maximum-likelihood trees based on all autosomal sites; all nodes having full local support on the basis of the Shimodaira–Hasegawa test are marked by asterisks. The colour code for groups of species applies to both panels. Taxa that showed deviations from classical taxonomy are underscored.

Figure 2 | Population history. a, Dating the nodes (in thousands of years) with confidence intervals (when applicable) in the phylogeny on the basis of divergence corrected for coalescence in ancestral populations; the topology is the representation of the inferred species tree from Fig. 1b. b, ABBA–BABA analysis of G. magnirostris, G. difficilis from Wolf and Pinta, and L. noctis. Number of sites supporting different trees is indicated both as a percentage and as actual numbers (panel values: 418,099 sites, 39%; 361,033 sites, 34%; 296,072 sites, 27%). The D statistic and corresponding Holm–Bonferroni-corrected P value are given for testing the null hypothesis of symmetry in genetic relationships (panel values: D = 0.098, P = 5 × 10⁻¹¹³). Finch heads are reproduced from ref. 5, How and Why Species Multiply: The Radiation of Darwin’s Finches by Peter R. Grant & B. Rosemary Grant. Copyright © 2008 Princeton University Press. Reprinted by permission.
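As an illustrative aside, the D statistic shown in Fig. 2b can be approximated from the site counts printed in that panel. The sketch below is not the authors' pipeline. It assumes the simple count-based form D = (nABBA − nBABA)/(nABBA + nBABA), and it assumes that the two smaller counts in the panel (361,033 and 296,072) are the two discordant site patterns; the function name d_statistic and the choice of which count is labelled ABBA are hypothetical conveniences.

```python
def d_statistic(n_abba: int, n_baba: int) -> float:
    """Count-based D: normalized excess of ABBA over BABA site patterns.

    D is 0 when the two discordant patterns are equally frequent, as
    expected under incomplete lineage sorting alone.
    """
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative (ABBA or BABA) sites")
    return (n_abba - n_baba) / total


# Site counts as printed in Fig. 2b. The assignment of the labels
# ABBA and BABA to the two smaller counts is an assumption.
counts = {"BBAA": 418_099, "ABBA": 361_033, "BABA": 296_072}

d = d_statistic(counts["ABBA"], counts["BABA"])
print(f"D = {d:.3f}")
```

Under these assumptions the value comes out at about 0.099, close to the reported D = 0.098. The small difference may reflect rounding or a frequency-weighted estimator in the original analysis. The significance test (the Holm–Bonferroni-corrected P value) is not reproduced here.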


First, the D statistic19 associated with the ABBA–BABA test was used to compare two populations of G. difficilis from Pinta and Wolf, and G. magnirostris from Genovesa, using L. noctis as outgroup; G. magnirostris also occurs on Wolf but we lacked samples from that population. The analysis confirmed that G. difficilis on Wolf has a closer genetic relationship with G. magnirostris than with G. difficilis on Pinta (Fig. 2b). But there is evidence of gene flow between G. difficilis on Wolf and Pinta (P = 5 × 10⁻¹¹³), because the substantial asymmetry in genetic relationships cannot be explained by incomplete lineage sorting. However, the D statistic does not distinguish admixture from ancestral subdivision19. We conclude that the closely related populations of G. difficilis on Wolf and Darwin are a species of mixed ancestry where most of the genome originates from G. magnirostris or a close relative (Supplementary Table 2), whereas a considerable proportion of the genome, possibly including genetic variants affecting phenotypic characters, is derived from G. difficilis. Similarly, G. difficilis on Genovesa shows a closer genetic relationship to the other ground and tree finches than to G. difficilis on Pinta, but we also found evidence for gene flow between the two groups previously classified as G. difficilis (P = 3 × 10⁻⁸⁷; Supplementary Table 2).

We next investigated gene flow involving the populations of G. conirostris on Genovesa and Española, which appear as separate species in our phylogenetic analysis. The ABBA–BABA analysis confirmed that G. conirostris on Española shows a closer genetic relationship to G. magnirostris than to G. conirostris on Genovesa (Extended Data Fig. 6a), but also provided evidence for gene flow between G. conirostris on Española and G. conirostris on Genovesa, which may explain some of their phenotypic similarities and their previous classification as a single species.

Given the evidence of relatively recent hybridization, we explored the possibility of more ancient hybridization between warbler finches (Certhidea fusca and Certhidea olivacea) and other finches. ABBA–BABA analysis provided evidence for gene flow between C. fusca and the other finches (P = 7 × 10⁻¹⁹⁹; Extended Data Fig. 6b). This pattern of gene flow was apparent for all non-warbler finches, implying that it occurred before the radiation of the non-warbler finches (Supplementary Table 2).

The trees based on autosomal (Fig. 1b) and Z-linked sites (Extended Data Fig. 5b) are not completely congruent. The tree based on Z-linked polymorphisms indicated that G. difficilis present on the highlands of Pinta, Fernandina and Santiago is more closely related to Platyspiza crassirostris and emerged before the Cocos finch split off from the ground and tree finches, whereas the autosomal tree indicates a reversed order for the emergence of the two species. This discrepancy can potentially be explained by gene flow between G. difficilis and tree and ground finches after the Cocos finch became reproductively isolated from the finches on the Galápagos, which affected Z-linked and autosomal loci to different degrees. It is a common observation in closely related species that there is more interspecies sharing of sequence polymorphisms at autosomal loci than at sex-linked loci20. This interpretation of the phylogenetic status of G. difficilis (highland group) is supported by the trees based on both mtDNA and W (Extended Data Fig. 5), which suggest that G. difficilis diverged from the ancestor of other ground and tree finches before the emergence of the Cocos finch.

Finally, our analysis of demographic history using the pairwise sequentially Markovian coalescent (PSMC) model21 was consistent with extensive interspecies gene flow among the ground finches, as they have maintained larger effective population sizes than the other species (Supplementary Text and Extended Data Fig. 6c, d).

A major locus controlling beak shape
The most striking morphological difference among Darwin’s finches concerns beak shape (Extended Data Fig. 3). We performed a genome-wide scan on the basis of populations that are closely related but show different beak morphology: G. magnirostris and G. conirostris on Española have blunt beaks, whereas G. conirostris on Genovesa and G. difficilis on Wolf have pointed beaks. We used non-overlapping 15-kilobase (kb) windows to identify regions with the highest fixation indices (FST) between groups. The FST distribution was Z-transformed (ZFST) and regions with striking ZFST values were identified (Fig. 3a). Among the 15 most significant regions, six harboured genes previously associated with craniofacial and/or beak development in mammals or birds including calmodulin (CALM)11, goosecoid (GSC)22, retinol dehydrogenase 14 (RDH14)23, ALX homeobox 1 (ALX1)24,25, fibroblast growth factor 10 (FGF10)26 and forkhead box C1 (FOXC1)27. A previous study demonstrated differential expression of CALM between finches with different beak types11. Two other studies reported differential expression of bone morphogenetic protein 4 (BMP4)10,12, but we did not observe any elevated ZFST values in the vicinity of this locus, suggesting that differential expression is controlled by other loci.

The most striking finding was a 240-kb region with high ZFST values, including the window with the highest ZFST score (9.46) overall (Fig. 3a, b). The region overlaps part of LRRIQ1 (leucine-rich repeats and IQ motif containing 1), the entire ALX1 gene and about 130 kb downstream of ALX1. No previous report indicates that LRRIQ1 has a role during development in vertebrates. By contrast, ALX1 is an excellent candidate for variation in beak morphology. It encodes a paired-type homeodomain protein that plays a crucial role in development of structures derived from craniofacial mesenchyme, the first branchial arch and the limb bud24, and in migration of cranial neural crest cells, highly relevant to beak development25. Loss of ALX1 in humans causes disruption of early craniofacial development24.

All individuals in the blunt beak category were homozygous for a blunt beak-associated haplotype (denoted B), except one heterozygous G. conirostris individual from Española. Furthermore, except for one heterozygous bird from Genovesa, all 19 G. difficilis individuals not included in the FST scan were homozygous for a pointed beak haplotype (P), consistent with their phenotypic appearance (sharp-beaked ground finches). This is notable because genome-wide, G. difficilis on Wolf, Darwin and Genovesa are all more closely related to the blunt-beaked G. magnirostris than to the pointed-beaked G. difficilis from Pinta (Fig. 2b).

A phylogenetic tree based on this region revealed a deep divergence between the B and P haplotypes that must have occurred soon after the split between warbler finches and other Darwin’s finches (Fig. 3c). Apart from the blunt-beaked G. magnirostris and G. conirostris on Española, all individuals except three were homozygous for P haplotypes, the remaining three being heterozygous. The two G. fortis from Daphne Major Island were both homozygous, but for different haplotypes (BB and PP; Fig. 3c). The short branch lengths among B haplotypes are consistent with a selective sweep. There were 335 fixed differences between the B and P haplotypes (Fig. 3d, upper panel), which we assigned as derived or ancestral on the basis of comparison with the outgroup sequence (L. noctis). Derived alleles on the B haplotype were aggregated in the vicinity of ALX1, including the downstream region (Fig. 3d, middle panel). Furthermore, 8 of these 335 fixed differences occurred at conserved sites, and the B haplotype carried the derived allele at seven of them (Fig. 3d, lower panel). Four derived alleles occurred at sites corresponding to transcription factor binding sites in the human genome28. Two other changes constitute missense mutations (L112P and I208V) at ALX1 amino-acid residues that are highly conserved among birds and mammals (Extended Data Fig. 7), and ‘Sorting Intolerant From Tolerant’ (SIFT)29 analysis classified both as damaging (score 0.03 for both). The ratio of non-synonymous to synonymous substitutions between the P and B alleles is high (2/1 = 2.00) compared with the ratio observed between the ancestral P allele and orthologous zebra finch (2/14 = 0.14) and human (21/122 = 0.17) sequences, suggesting that one or both of these missense mutations are non-neutral.

That ALX1 is polymorphic in G. fortis (Fig. 3c, d, upper panel) is particularly interesting, because field observations have shown there is considerable diversity in beak shape in this species5,30. We genotyped an additional 62 G. fortis birds from Daphne Major Island for a diagnostic


single nucleotide polymorphism (SNP), and observed a significant association with beak shape (P = 8.8 × 10⁻⁵, Fig. 3e). PP homozygotes tended to have proportionately long, pointed beaks, BB homozygotes had proportionately deep, blunt beaks, whereas heterozygotes (BP) had intermediate beak shapes. We also compared haplotype frequencies among G. fortis individuals on Daphne Major Island with those on Santa Cruz, which have a larger and blunter beak on average31, possibly as a result of introgressive hybridization with G. magnirostris4,5. We found the B haplotype to be more frequent on Santa Cruz than on Daphne Major (0.74, n = 21 versus 0.49, n = 62; P = 0.007, Fisher’s exact test).

Natural selection on beak size and shape of G. fortis on Daphne Major Island has led to evolutionary change in the past few decades5,30. Moreover, genetic variation in beak shape has been increased through introgressive hybridization5,30 with two species of Geospiza, scandens and fuliginosa, that have relatively pointed beaks. Therefore we expect hybrids and backcrosses in the G. fortis population to have a relatively high frequency of the P haplotype. We genotyped an additional 25 G. fortis at ALX1, added them to the sample of 62 (Methods) and compared the haplotype frequencies in eight hybrids (including backcrosses) and 79 non-hybrids. ALX1-P had a frequency of 0.75 among hybrids, and 0.44 among the others, which is statistically significant in the expected direction (P = 0.03, Fisher’s exact test). Thus, ALX1-P alleles introduced by introgressive hybridization most probably contributed to evolution of more pointed beaks in 1987 following natural selection as a result of a change in food supply in the 1985–86 drought30.

Figure 3 | A major locus controlling beak shape. a, Genome-wide FST screen comparing G. magnirostris and G. conirostris (Española) having blunt beaks with G. conirostris (Genovesa) and G. difficilis (Wolf) having pointed beaks. The y axis represents ZFST values. b, Nucleotide diversities in the ALX1 region. The 240-kb region showing high homozygosity in blunt-beaked species is highlighted. Red and blue colours in b–d refer to blunt and pointed beak haplotypes, respectively. c, Neighbour-joining haplotype tree of ALX1 region. Haplotypes originating from heterozygous birds (see text) are indicated in yellow. Estimated time since divergence (± confidence interval) of blunt and pointed beak haplotypes are given in thousands of years. d, Upper panel: genotypes at 335 SNPs showing complete fixation between ALX1 haplotypes associated with blunt (B) and pointed (P) beaks. d, Middle panel: classification of alleles associated with blunt beaks at the 335 SNPs as derived or ancestral on the basis of allelic state in the outgroup. d, Lower panel: PhastCons scores35 (on the basis of human, mouse and finch alignments) for the 335 SNP sites. TFBS, transcription factor binding sites. e, Linear regression analysis of beak-shape scores (degree of pointedness) among G. fortis individuals on Daphne Major Island classified according to ALX1 genotype (BB, BP, PP); distribution of pointedness in each class is shown as a boxplot; n = 62; F = 17.7, adjusted R² = 0.22, P = 8.85 × 10⁻⁵. Differences in six individual body and beak size traits were not significant (all P > 0.05).

Discussion
Our revised and dated phylogeny of Darwin’s finches shows that the radiation took place in the past million years, with a rapid accumulation of species recently (Supplementary Text). We have genomically characterized the entire radiation, which has revealed a striking connection between past and present evolution. Evidence of introgressive hybridization, which has been documented as a contemporary process, is found throughout the radiation. Hybridization has given rise to species of mixed ancestry, in the past (this study) and the present30. It has influenced the evolution of a key phenotypic trait: beak shape. Similar introgressive hybridization affecting an adaptive trait (mimicry) has been described in Heliconius butterflies32. The degree of continuity between historical and contemporary evolution is unexpected because introgressive hybridization plays no part in traditional accounts of adaptive radiations of animals1,2. For young radiations it complements the better-known role of natural selection.

Charles Darwin first noted the diversity in beak shapes among the finches on Galápagos. Our genomic study has now revealed some of

the underlying genetic variation explaining this diversity. A polygenic basis for beak diversity is indicated by our discovery of about 15 regions with strong genetic differentiation between groups of finches with blunt or pointed beaks. We present evidence that the ALX1 locus contributes to beak diversity, within and among species. The derived ALX1-B haplotype associated with blunt beaks has a long evolutionary history (hundreds of thousands of years), because its origin predates the radiation of vegetarian, tree and ground finches (Fig. 3c). This haplotype is fixed or nearly fixed in two ground finches with blunt beaks, G. magnirostris and G. conirostris on Española, and it co-segregates with variation in beak shape in G. fortis. As previously documented in domestic animals33 and natural populations34, the haplotype might have evolved by accumulating both coding and regulatory changes affecting ALX1 function. Natural selection and introgression affecting this locus have contributed to the diversification of beak shapes among Darwin’s finches and hence to their expanded utilization of food resources on Galápagos.

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 9 October; accepted 31 December 2014.
Published online 11 February 2015.

1. Schluter, D. The Ecology of Adaptive Radiation (Oxford Univ. Press, 2000).
2. Seehausen, O. African cichlid fish: a model system in adaptive radiation research. Proc. R. Soc. B 273, 1987–1998 (2006).
3. Lack, D. Darwin’s Finches (Cambridge Univ. Press, 1947).
4. Grant, P. R. Ecology and Evolution of Darwin’s Finches (Princeton Univ. Press, 1999).
5. Grant, P. R. & Grant, B. R. How and Why Species Multiply. The Radiation of Darwin’s Finches (Princeton Univ. Press, 2008).
6. Petren, K., Grant, P. R., Grant, B. R. & Keller, L. F. Comparative landscape genetics and the adaptive radiation of Darwin’s finches: the role of peripheral isolation. Mol. Ecol. 14, 2943–2957 (2005).
7. Ali, J. R. & Aitchison, J. C. Exploring the combined role of eustasy and oceanic island thermal subsidence in shaping biodiversity on the Galápagos. J. Biogeogr. 41, 1227–1241 (2014).
8. Geist, D., Snell, H., Snell, H., Goddard, C. & Kurz, M. in The Galápagos: A Natural Laboratory for the Earth Sciences (eds Harpp, K. S., Mittelstaedt, E., d’Ozouville, N. & Graham, D.) 145–166 (American Geophysical Union, 2014).
9. Farrington, H. L., Lawson, L. P., Clark, C. M. & Petren, K. The evolutionary history of Darwin’s finches: speciation, gene flow, and introgression in a fragmented landscape. Evolution 68, 2932–2944 (2014).
10. Abzhanov, A., Protas, M., Grant, B. R., Grant, P. R. & Tabin, C. J. Bmp4 and morphological variation of beaks in Darwin’s finches. Science 305, 1462–1465 (2004).
11. Abzhanov, A. et al. The calmodulin pathway and evolution of elongated beak morphology in Darwin’s finches. Nature 442, 563–567 (2006).
12. Mallarino, R. et al. Two developmental modules establish 3D beak-shape variation in Darwin’s finches. Proc. Natl Acad. Sci. USA 108, 4057–4062 (2011).
13. Burns, K. J. et al. Phylogenetics and diversification of tanagers (Passeriformes: Thraupidae), the largest radiation of Neotropical songbirds. Mol. Phylogenet. Evol. 75, 41–77 (2014).
14. Zhang, G., Parker, P., Li, B., Li, H. & Wang, J. The genome of Darwin’s finch (Geospiza fortis). GigaScience, http://dx.doi.org/10.5524/100040 (3 August 2012).
15. Ellegren, H. The evolutionary genomics of birds. Annu. Rev. Ecol. Evol. Syst. 44, 239–259 (2013).
16. Balakrishnan, C. N. & Edwards, S. V. Nucleotide variation, linkage disequilibrium and founder-facilitated speciation in wild populations of the zebra finch (Taeniopygia guttata). Genetics 181, 645–660 (2009).
17. Swarth, H. S. The avifauna of the Galapagos Islands. Occ. Pap. Calif. Acad. Sci. 18, 1–299 (1931).
18. Lack, D. The Galapagos finches (Geospizinae): a study in variation. Occ. Pap. Calif. Acad. Sci. 21, 1–159 (1945).
19. Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011).
20. Qvarnström, A. & Bailey, R. I. Speciation through evolution of sex-linked genes. Heredity 102, 4–15 (2009).
21. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
22. Rivera-Perez, J. A., Wakamiya, M. & Behringer, R. R. Goosecoid acts cell autonomously in mesenchyme-derived tissues during craniofacial development. Development 126, 3811–3821 (1999).
23. Rowe, A., Richman, J. M. & Brickell, P. M. Retinoic acid treatment alters the distribution of retinoic acid receptor-β transcripts in the embryonic chick face. Development 111, 1007–1016 (1991).
24. Uz, E. et al. Disruption of ALX1 causes extreme microphthalmia and severe facial clefting: expanding the spectrum of autosomal-recessive ALX-related frontonasal dysplasia. Am. J. Hum. Genet. 86, 789–796 (2010).
25. Dee, C. T., Szymoniuk, C. R., Mills, P. E. D. & Takahashi, T. Defective neural crest migration revealed by a zebrafish model of Alx1-related frontonasal dysplasia. Hum. Mol. Genet. 22, 239–251 (2013).
26. Brugmann, S. A. et al. Comparative gene expression analysis of avian embryonic facial structures reveals new candidates for human craniofacial disorders. Hum. Mol. Genet. 19, 920–930 (2010).
27. Sommer, P., Napier, H. R., Hogan, B. L. & Kidson, S. H. Identification of Tgfb1i4 as a downstream target of Foxc1. Dev. Growth Differ. 48, 297–308 (2006).
28. Wang, J. et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 41, D171–D176 (2013).
29. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4, 1073–1081 (2009).
30. Grant, P. R. & Grant, B. R. 40 Years of Evolution. Darwin’s Finches on Daphne Major Island (Princeton Univ. Press, 2014).
31. Boag, P. T. Growth and allometry of external morphology in Darwin’s finches (Geospiza) on Isla Daphne Major, Galápagos. J. Zool. 204, 413–441 (1984).
32. The Heliconius Genome Consortium. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 487, 94–98 (2012).
33. Andersson, L. Molecular consequences of animal breeding. Curr. Opin. Genet. Dev. 23, 295–301 (2013).
34. Linnen, C. R. et al. Adaptive evolution of multiple traits through multiple mutations at a single gene. Science 339, 1312–1316 (2013).
35. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

Supplementary Information is available in the online version of the paper.

Acknowledgements The National Science Foundation (USA) funded the collection of material under permits from the Galápagos and Costa Rica National Parks Services, and in accordance with protocols of Princeton University’s Animal Welfare Committee. The map and images of finch heads are reproduced with permission from Princeton University Press. The project was supported by the Knut and Alice Wallenberg Foundation. Sequencing was performed by the SNP&SEQ Technology Platform, supported by Uppsala University and Hospital, SciLifeLab and Swedish Research Council (80576801 and 70374401). Computer resources were supplied by UPPMAX.

Author Contributions P.R.G. and B.R.G. collected the material. L.A., P.R.G. and B.R.G. conceived the study. L.A. and M.T.W. led the bioinformatic analysis of data. S.L. and J.B. performed the bioinformatic analysis with contributions from M.S.A., K.M., M.G., A.M.-B., C.-J.R. and N.Z. M.P. and C.W. performed experimental work. L.A., S.L., J.B., B.R.G., P.R.G. and M.T.W. wrote the paper with input from the other authors. All authors approved the manuscript before submission.

Author Information The Illumina reads have been submitted to the short reads archive (http://www.ncbi.nlm.nih.gov/sra) under accession number PRJNA263122 and the consensus sequence for the G. fortis mtDNA has been submitted to GenBank under accession number KM891730. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to L.A. ([email protected]).

19 FEBRUARY 2015 | VOL 518 | NATURE | 375 ©2015 Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE

METHODS

Study samples. No statistical methods were used to predetermine sample size. Blood samples from a total of 120 individuals of Darwin’s finches, captured in mist nets and then released, were collected on FTA papers and stored at −70 °C until DNA preparation. These included all 15 species of Darwin’s finches currently present on the Galápagos and Cocos Island, and two closely related tanagers from Barbados used as outgroups13. Details on the name of each species, the specific island where they were sampled and the total number of individuals sampled from each species are in Extended Data Table 2 and phenotype descriptions of each species are in Extended Data Table 1.

Whole-genome sequencing. DNA was isolated from pieces of FTA papers using the DNeasy tissue kit (QIAGEN). Each DNA sample was uniquely tagged with a sequence index during the multiplexing library preparation protocol. The libraries (average fragment size about 400 bp) were sequenced using Illumina Hiseq2000 sequencers and 2 × 100 bp paired-end reads were generated. The amount of sequence per bird was targeted to approximately 10× coverage.

Reference genome assembly. Sequence reads were aligned to the genome assembly of a female medium ground finch (G. fortis)14. This draft genome assembly has a size of ~1.07 Gb with a scaffold N50 size of ~5.2 Mb and a contig N50 size of ~30 kb. The annotation of the genome included a total of 16,286 protein-coding genes.

In addition, as the complete sequence for mtDNA was not previously available for any of the Darwin’s finches, we also generated an assembly of the mtDNA genome sequence. For this, we first mapped all reads from one G. fortis individual against the zebra finch (Taeniopygia guttata) mtDNA. All the aligned reads were locally reassembled using SOAP DENOVO36, and then the gaps between the contigs were filled using Sanger sequencing to generate a single mtDNA genome sequence of 16.8 kb in length.

Sequence alignment and variant calling. The short sequence reads (2 × 100 bp) were quality checked using FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then we used BWA37 (version 0.6.2) with default parameters to map the genomic reads from each individual against the reference genome assembly. The alignments were further checked for PCR duplicates using PICARD (http://picard.sourceforge.net/). We used Genome Analysis Toolkit (GATK)38 for base quality recalibrations, insertion/deletion (INDEL) realignment, SNP and INDEL discovery and genotyping across all 120 samples simultaneously according to GATK best practice recommendations39,40.

Quality filtering of the raw variant calls was done according to an in-house filtering pipeline that excluded a variant as low quality if it did not satisfy the following cut-offs: SNP quality > 100, base quality > 30, mapping quality > 50, haplotype score < 10, Fisher strand bias < 60, mapping quality rank sum > −4.0, read position rank sum > −2.0, quality by depth > 2.0, minimum depth (summing all 120 samples) > 125, and maximum depth (summing all 120 samples) < 1,875. These parameters are explained in detail in the GATK user manual39. The cut-offs were chosen on the basis of the distribution of each of these parameters from the raw variant calls generated by the GATK UnifiedGenotyper module. The missing and low quality genotypes from the call set were inferred separately for each population using BEAGLE (version 3.3.2)41. Finally, we retained 44,753,624 variable sites in the data set. The variant calling in mtDNA was also performed using a similar BWA and GATK pipeline as described above. We identified 1,429 variable sites in mtDNA. We calculated the average nucleotide diversity for autosomes, Z and W, and in the mtDNA genome separately to estimate the amount of genetic variation in each population in different parts of the genome.

Identification of scaffolds from chromosomes Z and W. The medium ground-finch genome assembly contains 27,239 scaffolds unassigned to chromosomes. We used the MultiSV package to identify scaffolds that belong to chromosomes Z and W by comparing the read depth for each scaffold in 85 males and 35 females. This analysis identified 133 scaffolds, which belonged to chromosome Z with a total length of 67,176,652 bp (Supplementary Table 1a), and 662 scaffolds, which belonged to chromosome W with a total length of 643,111 bp (Supplementary Table 1b).

Estimation of genetic distance and phylogeny reconstruction. We used PLINK (version 1.07)42 to calculate genetic distance (on the basis of proportion of alleles identical by state) for all pairs of individuals separately for autosomes and the Z chromosome. We used the neighbour-net method of SplitsTree4 (http://www.splitstree.org/) to compute the phylogenetic network from genetic distances. We used FastTree to infer approximately maximum-likelihood phylogenies with standard parameters for nucleotide alignments of variable positions in the data set (http://meta.microbesonline.org/fasttree/). FastTree computes local support values with the Shimodaira–Hasegawa test.

ABBA–BABA analysis. Patterns of gene flow and the extent of admixture in populations were analysed and tested for asymmetry in the frequencies of discordant gene trees in a three-population phylogeny rooted with an outgroup using the D statistic43 as implemented for polymorphic sites19. The D statistics were transformed to Z scores by division by the standard error, which was calculated with a jackknife procedure. Blocks of 40,000 variable sites for autosomes and 10,000 for the Z chromosome were used in the jackknife to overcome the effect of linkage disequilibrium, which yielded 1,027 and 291 blocks, respectively. The Z scores were translated to two-sided P values that were Holm–Bonferroni-corrected44 for multiple testing by stepwise multiplication of the lowest P value by the remaining number of tests performed for all 1,768 possible tests in the phylogeny and the two tests with pooled species (Supplementary Table 2).

Mutation rates. We used the following previously reported estimated mutation rates for nuclear and mtDNA: nuclear DNA, 2.04 × 10^−9 per site per year estimated from the synonymous mutation rate on the Darwin’s finches’ lineage since the split from zebra finch45; mtDNA, a fossil-calibrated divergence rate of 2.1% per million years for bird cytochrome b sequences46.

Estimation of effective population size. Effective population sizes (Ne) were calculated from Watterson’s θ (ref. 47) across the whole genome and the above-mentioned mutation rate. Fluctuations in Ne were inferred using PSMC37 and with ‘64*1’ as the time interval parameter pattern. Plots were scaled assuming a mutation rate per generation of 1.02 × 10^−8 and a generation time of 5 years (ref. 48).

Dating the nodes in the phylogeny and demographic history. Times of population splits were calculated with our estimates of genetic distances in the two subtrees of a node and corrected for the time to coalescence in ancestral populations49 and mutation rate. Confidence intervals were estimated from the standard deviation of genetic distances estimated from the pairwise species comparisons. We estimated the time of divergence between the blunt and pointed ALX1 haplotypes by estimating the average pairwise difference at this locus between species containing all blunt and all pointed haplotypes and correcting for mutation rate. G. fortis and heterozygous individuals were excluded. Cytochrome b sequences were used to date the mtDNA phylogeny in which the most recently evolved ground finches (that is, G. magnirostris, conirostris, scandens, fortis, fuliginosa and difficilis on Genovesa) were treated as one population, with diversities averaged across species, because they did not form monophyletic groups according to species.

To elucidate and display the demographic history of Darwin’s finches we used the pairwise sequentially Markovian coalescent (PSMC) model, which infers fluctuations in effective population size over evolutionary time from a single genome sequence21.

Signatures of selection for beak diversification. We scanned the whole genome in non-overlapping 15-kb windows to identify regions with increased genetic divergence (FST) between species with blunt and pointed beaks. We used VCFtools version 0.1.11 (ref. 50) to calculate FST. The genomic windows with high ZFST (> 6) were analysed for gene content.

ALX1 genotyping in additional samples. A TaqMan SNP genotyping assay (Life Technologies) was designed for one SNP (A/C at nucleotide position 517,149 bp in scaffold JH739921) diagnostic for the ALX1 haplotypes associated with blunt and pointed beaks. A standard TaqMan allele discrimination assay was performed using an Applied Biosystems 7900 HT real-time PCR instrument. The association of individual genotypes with beak shape measurements was evaluated using standard linear regression in R.

Comparison of ALX1 protein sequences among vertebrates. The ALX1 protein sequence for G. fortis was downloaded from NCBI (XP_005421635). This G. fortis protein is a representative for the pointed allele and was edited to create a blunt counterpart by introducing the two amino-acid substitutions (L112P and I208V). ALX1 protein sequences from other species were collected from predicted orthologues of the chicken ALX1 gene in Ensembl51, including representative species from teleosts, reptiles, birds and mammals. The protein sequences were aligned using MUSCLE52 (version 3.8.31) with default settings, and the multiple sequence alignment was viewed and edited using Jalview29,53. The probability of functional consequences of amino-acid substitutions was predicted using SIFT29 with the multiple sequence alignment as input after exclusion of the blunt allele. Both substitutions were predicted to be damaging with probability scores of 0.03, where a score less than 0.05 is considered significant. Both predictions were reported to have a low confidence due to limited divergence in the alignment. However, we argue that because we have sampled orthologues from such a diverse set of species where ALX1 displays considerable conservation, these predictions can be viewed with greater confidence. Protein domains were predicted with Interpro scan54 using the G. fortis ALX1 protein sequence.

Functional annotation of SNPs. NCBI’s genome annotation for the G. fortis assembly (GeoFor1) was downloaded from NCBI’s FTP server (ftp://ftp.ncbi.nlm.nih.gov/genomes/Geospiza_fortis/) in GFF format. The annotation was filtered to include only genes annotated with a coding sequence (13,949 genes with 16,365 transcripts) before using it to build a local SnpEff (version 3.4) database55. The SnpEff database was subsequently used to annotate all detected sequence variants among the Darwin’s finches with putative functional effects according to categories

defined in the SnpEff manual. The upstream and downstream categories are regions within 5,000 bp in the respective direction of an annotated gene. SnpEff allows SNPs to be included in multiple categories; for example, a SNP may be intronic in one gene and a synonymous change in another gene residing in the intron of the first gene.

36. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1, 18 (2012).
37. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
38. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
39. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43, 491–498 (2011).
40. Van der Auwera, G. A. et al. From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinform. 43, 11.10.1–11.10.33 (2013).
41. Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
42. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
43. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
44. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979).
45. Rands, C. et al. Insights into the evolution of Darwin’s finches from comparative analysis of the Geospiza magnirostris genome sequence. BMC Genomics 14, 95 (2013).
46. Weir, J. T. & Schluter, D. Calibrating the avian molecular clock. Mol. Ecol. 17, 2321–2328 (2008).
47. Watterson, G. A. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256–276 (1975).
48. Grant, B. R. & Grant, P. R. Demography and the genetically effective sizes of two populations of Darwin’s finches. Ecology 73, 766–784 (1992).
49. Nei, M. in Molecular Evolutionary Genetics 276–279 (Columbia Univ. Press, 1987).
50. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
51. Vilella, A. J. et al. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 19, 327–335 (2009).
52. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
53. Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
54. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
55. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
56. Grant, B. R., Grant, P. R. & Petren, K. The allopatric phase of speciation: the sharp-beaked ground finch (Geospiza difficilis) on the Galápagos islands. Biol. J. Linn. Soc. 69, 287–317 (2000).
57. Grant, P. R., Abbott, I., Schluter, D., Curry, R. L. & Abbott, L. K. Variation in the size and shape of Darwin’s finches. Biol. J. Linn. Soc. 25, 1–39 (1985).
58. Schluter, D. & Grant, P. R. Ecological correlates of morphological evolution in a Darwin’s finch, Geospiza difficilis. Evolution 38, 856–869 (1984).
59. Rabosky, D. Diversity-dependence, ecological speciation, and the role of competition in macroevolution. Annu. Rev. Ecol. Evol. Syst. 44, 481–502 (2013).
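The variant quality cut-offs listed in Methods can be read as a single pass/fail predicate per site. The following is a minimal sketch in Python, not the authors' in-house pipeline: the `VariantStats` class and its field names are hypothetical stand-ins for the per-site annotations, and only the threshold values are taken from the Methods text.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Per-site summary values (hypothetical field names, for illustration)."""
    snp_quality: float
    base_quality: float
    mapping_quality: float
    haplotype_score: float
    fisher_strand_bias: float
    mapping_quality_rank_sum: float
    read_position_rank_sum: float
    quality_by_depth: float
    total_depth: int  # read depth summed over all 120 samples


def passes_filters(v: VariantStats) -> bool:
    """Return True if the site satisfies every cut-off stated in Methods."""
    return (
        v.snp_quality > 100
        and v.base_quality > 30
        and v.mapping_quality > 50
        and v.haplotype_score < 10
        and v.fisher_strand_bias < 60
        and v.mapping_quality_rank_sum > -4.0
        and v.read_position_rank_sum > -2.0
        and v.quality_by_depth > 2.0
        and 125 < v.total_depth < 1875
    )


# A site with comfortable values on every criterion is kept.
good_site = VariantStats(150, 35, 55, 5, 20, -1.0, 0.5, 10.0, 1200)
# The same site with excessive summed depth is rejected.
deep_site = VariantStats(150, 35, 55, 5, 20, -1.0, 0.5, 10.0, 2000)
print(passes_filters(good_site), passes_filters(deep_site))
```

Writing the cut-offs as one conjunction makes it explicit that a site is excluded as soon as any single criterion fails.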


Extended Data Figure 1 | Read depth. Average read depth in all 120 samples of Darwin’s finches and outgroup species.


Extended Data Figure 2 | Genetic diversity among Darwin’s finches. Heat map illustrating the proportion of shared and fixed polymorphisms among Darwin’s finches and outgroup species.
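The step from genetic diversity to effective population size described in Methods (Watterson's θ combined with the mutation rate) can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' code: it assumes the standard diploid autosomal relation θ = 4·Ne·μ and the per-site estimator θ_W = S / (a_n · L); the function names and the example numbers are hypothetical. Only the mutation rate (2.04 × 10^−9 per site per year) and the generation time (5 years) come from Methods.

```python
import math


def harmonic_sum(n_chromosomes: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, the normalising constant in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n_chromosomes))


def watterson_theta(segregating_sites: int, n_chromosomes: int, sequence_length: int) -> float:
    """Per-site Watterson's theta from the number of segregating sites."""
    return segregating_sites / (harmonic_sum(n_chromosomes) * sequence_length)


def effective_population_size(theta_per_site: float, mu_per_generation: float) -> float:
    """Ne from theta = 4 * Ne * mu (assumed diploid autosomal model)."""
    return theta_per_site / (4.0 * mu_per_generation)


# Per-generation rate implied by the per-year rate and generation time in Methods.
MU_PER_YEAR = 2.04e-9
GENERATION_TIME_YEARS = 5
mu_per_generation = MU_PER_YEAR * GENERATION_TIME_YEARS  # about 1.02e-8

# Hypothetical example: 100 segregating sites in 100 kb sampled from one diploid bird.
theta = watterson_theta(segregating_sites=100, n_chromosomes=2, sequence_length=100_000)
print(theta, effective_population_size(theta, mu_per_generation))
```

The product of the per-year rate and the generation time reproduces the per-generation rate of 1.02 × 10^−8 quoted for scaling the PSMC plots.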


Extended Data Figure 3 | Network tree for the Darwin’s finches on the basis of all autosomal sites. Taxa that showed deviations from classical taxonomy are underscored. Finch heads are reproduced from ref. 5. How and Why Species Multiply: The Radiation of Darwin’s Finches by Peter R. Grant & B. Rosemary Grant. Copyright © 2008 Princeton University Press. Reprinted by permission.


Extended Data Figure 4 | Taxonomy and rate of speciation. a, Morphological variation among populations of ground finch (Geospiza) species, scandens, fuliginosa and three others, acutirostris, difficilis and septentrionalis, that were formerly classified as a single species (difficilis). Data are from refs 56, 57, and from ref. 58 for weights and measures of difficilis on Fernandina. b, Morphological variation among populations of G. scandens, conirostris, propinqua and magnirostris assessed by multiple discriminant function analysis in JMP version 9. In a discriminant function analysis of the measured variables, all populations were correctly identified to species (−2 log likelihood P = 0.02). Maximum discrimination was achieved by entering three variables in the sequence beak width, beak length and body size (weight or wing). Substituting beak depth for beak width gave the same result. No other variable entered significantly. Data are from ref. 57, except for scandens and magnirostris data from ref. 30. c, Species accumulation on a log scale as a function of time before the present, dating based on mtDNA. Species are expected to accumulate linearly according to a ‘birth–death’ process, eventually declining under a density- (diversity-) dependent mechanism59.


Extended Data Figure 5 | Phylogenies for mtDNA and the sex chromosomes Z and W. a, Tree based on mtDNA sequences. The dating of the nodes and their variances (in thousands of years) is based on the cytochrome b sequences using the fossil-calibrated divergence rate 2.1% per million years for birds46. This tree based on the full mtDNA sequences shows only minor differences compared with previously published trees based only on the cytochrome b sequence6,9. b, Maximum-likelihood trees based on all Z-linked sites; all nodes having full local support on the basis of the Shimodaira–Hasegawa test are marked by asterisks. c, Tree based on W sequences, only females. Taxa that showed deviations from classical taxonomy are underscored (applies to a–c).
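To make the node dating concrete, the conversion from cytochrome b divergence to time can be sketched as follows. This is an illustrative calculation, not the authors' implementation: it assumes the 2.1% per million years figure is a pairwise divergence rate applied directly to sequence divergence between two groups, and it models the correction for ancestral coalescence as a simple subtraction of ancestral diversity (Nei's net divergence), which is one common approach and may differ from the exact procedure used in the paper. The function name and example values are hypothetical.

```python
import math

DIVERGENCE_RATE_PER_MYR = 0.021  # 2.1% pairwise divergence per million years (ref. 46)


def split_time_years(pairwise_divergence: float, ancestral_diversity: float = 0.0) -> float:
    """Estimate a split time in years from cytochrome b divergence.

    pairwise_divergence: mean proportion of differing sites between the two groups.
    ancestral_diversity: diversity assumed for the ancestral population, subtracted
        so that coalescence time before the split is not counted (assumption).
    """
    net_divergence = max(pairwise_divergence - ancestral_diversity, 0.0)
    return net_divergence / DIVERGENCE_RATE_PER_MYR * 1_000_000


# Hypothetical example: 2.1% divergence with no ancestral correction is one million years.
print(split_time_years(0.021))
# The same net divergence reached after subtracting ancestral diversity.
print(split_time_years(0.0315, ancestral_diversity=0.0105))
```

Clamping the net divergence at zero avoids negative ages when sampling noise makes the within-population diversity exceed the between-population divergence.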


Extended Data Figure 6 | ABBA–BABA analysis and demographic history. a, ABBA–BABA analysis of G. magnirostris, G. conirostris on Española and on Genovesa, and with L. noctis as outgroup. b, Comparison of C. olivacea, C. fusca, a pool of all non-warblers, and with L. noctis as outgroup. The number of informative sites supporting the different trees is indicated both as a percentage and as the actual number. The D statistic and corresponding Holm–Bonferroni-corrected P value are also given for testing the null hypothesis of symmetry in genetic relationships. Finch heads are reproduced from ref. 5. c, PSMC analysis21 of all species except the G. difficilis group. d, PSMC analysis of the G. difficilis group.
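The D statistic, its jackknife-based Z score and the Holm–Bonferroni adjustment reported in this figure can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the authors' implementation: it works from per-block counts of ABBA and BABA sites rather than the allele-frequency form used for polymorphic sites, it uses an equal-weight delete-one block jackknife, and all function names and example counts are hypothetical.

```python
import math


def d_statistic(n_abba: float, n_baba: float) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); zero is expected under symmetry."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative sites")
    return (n_abba - n_baba) / total


def jackknife_z(block_abba, block_baba):
    """Return (D, standard error, Z) from a delete-one block jackknife."""
    n_blocks = len(block_abba)
    total_abba, total_baba = sum(block_abba), sum(block_baba)
    d_full = d_statistic(total_abba, total_baba)
    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_statistic(total_abba - a, total_baba - b)
        for a, b in zip(block_abba, block_baba)
    ]
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((d - mean) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    z = d_full / se if se > 0 else math.nan
    return d_full, se, z


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2))


def holm_adjust(p_values):
    """Holm step-down adjustment: the k-th smallest P is multiplied by (m - k + 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical counts for four jackknife blocks.
d, se, z = jackknife_z([12, 8, 10, 11], [8, 9, 7, 10])
print(d, se, z, two_sided_p(z))
print(holm_adjust([0.01, 0.04, 0.03]))
```

Taking the running maximum in `holm_adjust` keeps the adjusted P values monotone in the order of the raw P values, which is the usual convention for reporting Holm-corrected results.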


Extended Data Figure 7 | Sequence conservation of ALX1. Amino-acid alignment of the complete ALX1 sequence among different vertebrates. Amino-acid substitutions between ALX1 alleles associated with blunt and pointed beaks are highlighted. The homeobox domain is indicated.


Extended Data Table 1 | Phenotypic description of Darwin’s finches


Extended Data Table 2 | Summary of samples of Darwin’s finches and outgroup species
