Molecular Phylogenetics and Evolution 102 (2016) 233–245

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution

journal homepage: www.elsevier.com/locate/ympev

Multiple instances of paraphyletic and cryptic taxa revealed by mitochondrial and nuclear RAD data for (Aves: Alaudidae) ⇑ ⇑ Martin Stervander a,b, , Per Alström c,d,e, , Urban Olsson f, Ulf Ottosson b, Bengt Hansson a, Staffan Bensch a a Molecular Ecology and Evolution Lab, Dept of Biology, Lund University, Ecology Building, SE-223 62 Lund, Sweden b AP Leventis Ornithological Research Institute, Laminga, Jos East, Jos, Plateau, Nigeria c Department of Ecology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, SE-752 36 Uppsala, Sweden d Swedish Species Information Centre, Swedish University of Agricultural Sciences, Box 7007, SE-750 07 Uppsala, Sweden e Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China f Department of Biological and Environmental Sciences, University of Göteborg, Box 463, SE-405 30 Göteborg, Sweden article info abstract

Article history: The avian Calandrella (larks) was recently suggested to be non-monophyletic, and was divided into Received 9 October 2015 two genera, of which Calandrella sensu stricto comprises 4–5 species in Eurasia and Africa. We analysed Revised 7 May 2016 mitochondrial cytochrome b (cytb) and nuclear Restriction-site Associated DNA (RAD) sequences from all Accepted 24 May 2016 species, and for cytb we studied 21 of the 22 recognised , with the aim to clarify the phyloge- Available online 26 May 2016 netic relationships within the genus and to compare large-scale nuclear sequence patterns with a widely used mitochondrial marker. Cytb indicated deep splits among the currently recognised species, although Keywords: it failed to support the interrelationships among most of these. It also revealed unexpected deep diver- Cryptic taxa gences within C. brachydactyla, C. blanfordi/C. erlangeri, C. cinerea, and C. acutirostris. It also suggested that Phylogeny RAD both C. brachydactyla and C. blanfordi, as presently circumscribed, are paraphyletic. In contrast, most of SNAPP the many subspecies of C. brachydactyla and C. cinerea were unsupported by cytb, although two popula- Museum specimens tions of C. cinerea were found to be genetically distinct. The RAD data corroborated the cytb tree (for the smaller number of taxa analysed) and recovered strongly supported interspecific relationships. However, coalescence analyses of the RAD data, analysed in SNAPP both with and without an outgroup, received equally strong support for two conflicting topologies. We suggest that the tree rooted with an outgroup – which is not recommended for SNAPP – is more trustworthy, and suggest that the reliability of analyses performed without any outgroup species should be thoroughly evaluated. We also demonstrate that degraded museum samples can be phylogenetically informative in RAD analyses following careful bioin- formatic treatment. We note that the genus Calandrella is in need of taxonomic revision. Ó 2016 Elsevier Inc. All rights reserved.

1. Introduction Donsker, 2014). Alström et al. (2013) presented the first compre- hensive phylogeny of the family, which was based on two mito- The avian family Alaudidae, larks, comprises 93–97 species in chondrial and three nuclear loci for >80% of all species and 21 genera (Dickinson and Christidis, 2014; Gill and Donsker, representatives from all recognised genera (though not all loci 2014). Larks are found on all continents except Antarctica. were available for all species), as well as mitochondrial data for Although the majority of the species occur in Africa, followed by multiple subspecies. They found several non-monophyletic genera Eurasia, only one species each occur in Australia and the Americas and multiple cases of unpredicted deep divergences among taxa (de Juana et al., 2004; Dickinson and Christidis, 2014; Gill and that were treated as conspecific, as well as shallow splits between some recognised species. They concluded that ‘‘few groups of show the same level of disagreement between based on ⇑ Corresponding authors at: Molecular Ecology and Evolution Lab, Dept of Biology, Lund University, Ecology Building, SE-223 62 Lund, Sweden (M. Stervan- morphology and phylogenetic relationships as inferred from DNA der). Department of Animal Ecology, Evolutionary Biology Centre, Uppsala Univer- sequences”, and proposed a revised generic classification. sity, Norbyvägen 18 D, SE-752 36 Uppsala, Sweden (P. Alström). Although the notion of cryptic species is several centuries old, E-mail addresses: [email protected] (M. Stervander), per.alstro- reports of cryptic diversity have dramatically increased after the [email protected] (P. Alström). http://dx.doi.org/10.1016/j.ympev.2016.05.032 1055-7903/Ó 2016 Elsevier Inc. All rights reserved. 234 M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 introduction of PCR methods (Bickford et al., 2007). The detection 2. Material and methods of cryptic diversity is of great importance to science as well as con- servation, since the lack thereof severely can misguide conserva- 2.1. Study group tion policy (Bickford et al., 2007; Trontelj and Fišer, 2009). Whereas cryptic diversity has been unravelled across the animal We analysed all of the species in the genus Calandrella (sensu kingdom, the frequency at which it occurs varies (Trontelj and Alström et al., 2013) as well as all of the subspecies except C. Fišer, 2009). Birds are a class offering relatively few such discover- brachydactyla orientalis (Dickinson and Christidis, 2014; Gill and ies, and the rate of discovery of cryptic diversity is, for example, Donsker, 2014), in total 115 individuals (Table S1). Taxonomy fol- twice as high in mammals and three times higher in amphibians lows Gill and Donsker (2014), except that we follow Ryan (2004) (Trontelj and Fišer, 2009). Most larks are cryptically coloured and regarding the subspecies within C. cinerea. It should be noted that patterned, and many of the species are renowned for being difficult the taxonomic annotation of museum specimens is not always reli- to distinguish (de Juana et al., 2004). Surprisingly few taxonomic able, and we have classified the taxa according to current taxon- revisions have been undertaken using molecular and/or vocal data. omy (Ryan, 2004) and known distributions (Gombobaatar and Nearly all of those have proposed taxonomic splits (Alström, 1998; Monks, 2011; BirdLife International and NatureServe, 2014; pers. Ryan et al., 1998; Ryan and Bloomer, 1999; Guillaumet et al., 2005, obs.). 2006, 2008; Alström et al., 2013), except one recent paper that advocated the lumping of two species (Spottiswoode et al., 2.2. Laboratory procedures 2013). As remarked by Alström et al. (2013), it seems likely that the number of presently recognised species is underestimated. Tissue samples were taken from toe pads of museum specimens One of the lark genera suggested by Alström et al. (2013) to be at the Natural History Museum London (Tring; BMNH), the Royal non-monophyletic was Calandrella. This genus was separated into Museum for Central Africa Tervuren (RMCA), the Natural History two non-sister clades: (1) C. cinerea, C. brachydactyla and C. acu- Museum of Denmark (ZMUC), the American Museum of Natural tirostris in a clade sister to the genus (‘horned larks’), History (AMNH), and the University of Michigan Museum of Zool- and (2) C. raytal, C. rufescens, C. cheleensis and C. athensis in a sister ogy (UMMZ). position to a clade comprising the two monotypic genera Cher- Blood samples from live birds and tissue samples from dead sophilus and Eremalauda. Based on morphology, all of these rela- embryos were extracted using standard phenol-chloroform proto- tionships were totally unexpected. By priority, the name col (Sambrook and Russell, 1989), whereas tissue samples from Calandrella was restricted to the first of these clades, while the museum specimens were digested overnight at 55 °C in 100 ll available name was resurrected for the second Calandrella lysis buffer (0.1 M Tris, 0.005 EDTA, 0.2% SDS, 0.2 M NaCl, pH 8.5) clade. Alström et al. (2013) also found C. brachydactyla dukhunensis with 3–6 ll proteinase K (10 mg/ml) and then precipitated with to be anciently separated from the other C. brachydactyla sub- ethanol. species, and actually sister to C. acutirostris – again completely We used Qiagen Multiplex PCR Kit (Qiagen Inc.), with amplifica- unpredicted based on morphological assessments. tion reactions containing 5 ll Qiagen Multiplex PCR Master Mix, The genus Calandrella (sensu Alström et al., 2013, subsequently 0.2 ll each of 10 lM forward and reverse primer (Table S2), 2 ll followed by Dickinson and Christidis (2014) and Gill and Donsker template DNA (5–10 ng/ll), and 2.6 ll water. We ran the PCR reac- (2014)) comprises four (Dickinson and Christidis, 2014) or five tions for activation at 95 °C for 15 min; 30–45 three step cycles (Gill and Donsker, 2014) species. The taxonomy has been much with denaturation at 94 °C for 30 s, annealing at varying tempera- debated over the years. Formerly, C. brachydactyla was considered tures for 90 s (Table S2), and extension at 72 °C for 90 s; final conspecific with C. cinerea under the latter name (e.g. extension at 72 °C for 10 min. PCR products were checked on a

Meinertzhagen, 1951; Vaurie, 1959; Mayr and Greenway, 1960). 2% agarose gel, precipitated with NH4Ac and ethanol, and then dis- Based on widely allopatric distributions, different migratory beha- solved in 10–25 ml of water. We used 2 ll for sequencing of suc- viour and plumage differences, Voous (1960) and Hall and Moreau cessful amplifications (BigDye sequencing kit; Applied (1970) proposed to treat these as separate species, and this has Biosystems) in an ABI Prism 3100 capillary sequencer (Applied been followed by most subsequent authors (e.g. Glutz von Biosystems). Blotzheim and Bauer, 1985; Cramp, 1988; Sibley and Monroe, The DNA sequence of the full cytb gene was obtained from sam- 1990; Keith et al., 1992). In addition, Mayr and Greenway (1960) ples of wild-caught birds using amplification primers ND5-Syl separated C. blanfordi as a monotypic species; Sibley and Monroe (Stervander et al., 2015) and mtF-NP (Fregin et al., 2009), and inter- (1990) agreed with this and also split off C. erlangeri (monotypic). nal sequencing primers Cytb_seq_H15541 (Stervander et al., 2015) In contrast, Dickinson and Christidis (2014) treated C. blanfordi as a and Cytb_seq_L15383 (this study; see Table S2). Owing to DNA polytypic species (with subspecies C. b. blanfordi in Eritrea; C. b. degradation in the museum specimens, only shorter DNA frag- daaroodensis in northern Somalia; C. b. eremica in south-western ments could be amplified, and we therefore designed specific Arabia; and C. b. erlangeri in the highlands of Ethiopia), whereas primers based on the sequences of C. cinerea and C. brachydactyla Gill and Donsker (2014) recognised C. blanfordi with three sub- (Table S2). For some museum samples, only a 356 basepair (bp) species and C. erlangeri as a monotypic species. None of these fragment was sequenced, and specifically to type a number of C. authors gave any reasons for their treatments. acutirostris in Kashmir, a 107 bp diagnostic fragment was We here study all species and all except one of the subspecies in sequenced. PCR success was screened on 2% agarose gels. the genus Calandrella (sensu Alström et al., 2013) using full-length We performed paired-end Restriction site-Associated DNA cytochrome b (cytb) sequences for a sample of 46 individuals (RAD) sequencing on fresh samples representing C. a. acutirostris, (all species but not all subspecies), a mix of full-length and short C. a. tibetana, C. brachydactyla dukhunensis, C. b. longipennis, C. b. cytb sequences for 114 individuals (all taxa) and Restriction-site rubiginosa, C. c. cinerea, C. c. williamsi, and C. c. saturatior from Nige- Associated DNA (RAD) sequences for 12 individuals (all species, a ria, as well as Eremophila alpestris as an outgroup. few subspecies). We aim to clarify the phylogenetic relationships Library preparation followed Baird et al. (2008) and Etter et al. within the genus, to investigate the recently suggested paraphyly (2011) with the following modifications: We considered no RNase of C. brachydactyla, and to compare analyses of large-scale nuclear A treatment necessary; DNA fragments were sheared with a sequence data with a mitochondrial marker. Bioruptor Standard UCD-200 (Diagenode) and selected to a size M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 235 range comprising 150–750 bp (length excluding adapters); PCR 2.4. Bioinformatic pipeline for nuclear RAD data reactions were set up at 50 ll with 16 ll RAD library template run- ning 16 cycles. Such protocol, with more library template and The paired raw reads were first filtered using the process_rad- fewer cycles, leads to more P2–P2 fragments but fewer PCR clones. tags and clone_filter components of Stacks v. 1.17 (Catchen et al., We used the restriction enzyme SbfI and the samples were multi- 2013). There is no genome assembly available for Calandrella larks, plexed in a library with 16 samples (together with samples from thus raw reads must either be assembled de novo, or mapped to a other projects); the library was then sequenced on one Illumina genome assembly of another species. The remarkably conserved HiSeq 2000 channel at BGI, Hong Kong. synteny of the avian karyotype in most orders (Romanov et al., Furthermore, we evaluated the inclusion of degraded museum 2014; Zhang et al., 2014) allows for such interspecific mapping samples of C. erlangeri and C. blanfordi daaroodensis into the library. (e.g. Stervander et al., 2015). On the one hand, this results in While extractions had yielded reasonable DNA concentrations, an excluding loci that are too diverged from the mapping target spe- analysis with a DNA1000 chip for BioAnalyzer 2100 (Agilent) cies. On the other hand, it effectively avoids problems of e.g. dupli- revealed that the C. erlangeri and C. blanfordi museum samples cated sequences, which can be much harder to detect when were heavily degraded, with few fragments exceeding 50 bp in assembling raw RAD reads de novo. Furthermore, even though length. Effectively, this made for a very low concentration of some genomic rearrangements are expected between different DNA that could be included in our RAD sequencing libraries species, mapping towards a reference genome allows a coarse together with DNA from fresh samples (with longer fragments), subset selection of SNPs based on genomic position. We thus even without shearing. In order to be able to include the museum mapped the filtered reads to the Zebra Finch Taeniopygia guttata samples at all, we therefore modified the protocol as follows: The genome (assembly taeGut3.2.4, downloaded from the UCSC gen- samples were first whole-genome amplified using a REPLI-g Mini ome website http://genome.ucsc.edu/) with Bowtie v. 2.2.2 Kit (Qiagen) and no shearing was performed. (Langmead and Salzberg, 2012). Given the interspecific mapping we set maximum and minimum mismatch penalties (--mp)to2 2.3. Phylogenetic analyses of cytochrome b and 1, respectively. Further, we only accepted paired reads map- ping in the expected relative direction within 150–1,000 bp from All cytb sequences were manually inspected, assembled, edited each other (--no-discordant, --no-mixed). From this point, two sep- and aligned in Geneious v. 5–6 (Biomatters), and only acceptable arate paths were followed for preparing data for concatenated sequences were retained (occasionally with ambiguity codes intro- sequence super-matrices and SNP-based analyses, respectively. duced for specifically problematic base calls). We created a full For the sequence-based analyses, we removed read 2 from the alignment (all 114 samples irrespective of sequence length; align- resulting sam files and used read 1 as input to Stacks, following ment FULL) and one alignment only containing sequences >900 bp the ref_map track for phylogenetic and coalescent analyses. We (46 individuals; alignment LONG). As outgroup, we used either used default parameters except for setting a higher minimum Horned Lark E. alpestris and Teminck’s Horned Lark E. bilopha, from sequencing coverage to call a locus in a given sample (m = 4). the sister clade to Calandrella (Alström et al., 2013), or these two We filtered the output so that it only contained loci (i) in which and three other closely related lark species pairs (Agulhas Long- all samples had been assigned 1–2 alleles; (ii) which contained billed Lark brevirostris, Eastern Long-billed Lark C. semi- 1–10 single nucleotide polymorphisms (SNPs); (iii) which com- torquata, cristata, Maghreb Lark G. prised 612 alleles. The rationale for these filters was to eliminate macrorhyncha, mongolica and Tibe- any loci in which individuals had been assigned more than two tan Lark M. tibetana), as well as the more distantly related Bearded alleles, and/or which were remarkably variable (cut-off values Reedling Panurus biarmicus and Banded Prinia Prinia bairdii however assigned arbitrarily after inspecting data). By doing so, (cf. Alström et al., 2013). we may have traded off erroneous exclusion of highly variable loci Substitution models were evaluated with jModelTest v. 2.1.4 in order not to include mis-assembled loci. (Guindon and Gascuel, 2003; Darriba et al., 2012), selecting from The Stacks output was then converted to biallelic sequences for 88 available models allowing for rate heterogeneity according to all samples and loci, using custom scripts, and concatenated based four gamma categories and for a proportion of invariable sites. on sample (but with random allele pairing) to long super-matrices. Model selection was performed according to the Bayesian Informa- We included data from all high quality (fresh) samples in a super- tion Criterion (BIC; Schwarz, 1978). The substitution model matrix in which all individuals were included and no missing data selected for both alignments was HKY (Hasegawa et al., 1985) with at any locus was tolerated, comprising 38,038 loci with total length rate variation following a discrete gamma distribution with four of 3.27 million bp (Mbp). rate categories (G; Yang, 1994) and with an estimated fraction of For SNP-based analyses, we merged all individual bam files invariant sites (I; Gu et al., 1995). (including museum samples) and performed variant calling with Cytb gene trees were computed within a Bayesian inference (BI) GATK. Index and dictionary files were prepared with PicardTools, framework with BEAST v. 1.8.2 (Drummond et al., 2012), using a then indels in the bam file were realigned using LeftAlignIndels. Birth-Death tree prior (Gernhard, 2008), and an uncorrelated The UnifiedGenotyper was called and a limitation was set on max- relaxed molecular clock (Drummond et al., 2006) with a rate of imum number of alternate alleles to 9 for the full dataset. 0.0105 ± 0.0005 substitution/site/lineage/my, based on overall cytb Filtering with VCFtools (Danecek et al., 2011) was performed substitution rates for a wide range of avian species (Weir and separately for fresh samples and museum samples. Both were fil- Schluter, 2008). We sampled trees every 1,000 generations, over tered based on variant type (SNP), but minimum sequencing depth 50 million generations, of which the first 25% were discarded as (8) per genotype (individual and site) was required only for fresh burn-in. Every analysis was repeated to ascertain congruence. samples, since sequencing success was generally low for museum The results were inspected using Tracer v. 1.6 (Rambaut et al., samples. For the fresh samples, hereafter called dataset FRESH, sites 2013), ensuring stationarity and effective sample sizes (ESS) of were further filtered to require coverage in all individuals, followed >300. Additionally, the cytb gene was analysed by a maximum like- by requiring no more or less than two alleles (i.e. variable, biallelic lihood (ML) tree search (1000 replicates) and bootstrapping (1,000 sites). We prepared one dataset including the outgroup E. alpestris, replicates) using RAxML version 8.2.4 (Stamatakis, 2014). and one dataset in which only the ingroup was retained. GTRGAMMA was used for the bootstrapping phase and for the final Museum samples (C. blanfordi daaroodensis and C. erlangeri) tree inference. were processed separately, and preliminary phylogenetic analyses 236 M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 placed them at a distance 2.6–3.2 times further from other Calan- and a minimum sequence depth of P8 in each individual), which drella samples compared to that of the distance from the latter to were variable among fresh samples (including outgroup). For the the outgroup (Eremophila). This likely demonstrates that the museum samples, this estimate was based on dataset MUSEUM, con- sequenced DNA contained many changed nucleotides, probably taining 11,684 SNPs, after verifying that the heterozygosity levels either owing to errors introduced during whole-genome amplifica- for fresh samples were comparable to those obtained from the tion or by post mortem DNA damage (data not shown). The latter larger dataset FRESH. can be caused by, for example, hydrolytic deamination, strand Theta was co-estimated with the species tree in SNAPP runs. We crosslinks, or oxidative nucleotide damage, all of which can alter combined multiple replicate runs for maximising ESS, all of which the sequenced DNA (cf. Pääbo et al., 2004; Willerslev and Cooper, were ensured to be >200. Theta co-estimation was done for runs in 2005; Stiller et al., 2006). In order to make use of museum speci- which each individual was treated as a terminal taxon, as well as mens for RAD sequence data, despite these being affected by for runs in which clades D1 and A (see Section 3) were grouped. DNA damage, we took the following approach to remove polymor- phic positions likely to be erroneous. SNPs from the dataset FRESH 3. Results (including outgroup) were intersected with nucleotide data from museum samples and only those sites polymorphic in the dataset 3.1. Phylogenetic analyses of cytochrome b

FRESH were retained. In this way we retrieved SNPs covered by all taxa and variable among fresh samples, and excluded SNPs unique Both BI and ML analyses based on the full-length sequence data to the museum samples but monomorphic in other samples. This (Fig. 2) recovered a cytb tree with seven strongly supported procedure was performed separately for C. blanfordi daaroodensis primary clades (labeled A–G). Sister relationships between clades (dataset DAAROODENSIS) and C. erlangeri (dataset ERLANGERI), as well as B + C, D + E and F + G, respectively, were strongly supported, but for the intersection for both (dataset MUSEUM). the other, more ancient, relationships were unsupported. The tree shown in Fig. 2 resulted from a BI analysis where clade ABC was constrained based on the RAD data (see below); unconstrained 2.5. Phylogenomic analyses of nuclear RAD data analysis of these data resulted in a sister relationship between clades A and DE, with very poor support (BI posterior probability The RAD sequence super-matrix was analysed in RAxML (PP)/ML bootstrap proportion (MLBS) <0.5; not shown). v. 8.0.26 (Stamatakis, 2014), using the GTRGAMMA substitution model Clade A comprised all C. brachydactyla subspecies except C. b. (general time-reversible model allowing for rate heterogeneity dukhunensis. Although the divergences within this clade were very according to a gamma distribution). The best maximum likelihood shallow, it was subdivided into a western clade (A1: NW Africa – C. (ML) tree from 1,000 searches was selected, and 1,000 bootstrap b. rubiginosa; SW Europe – C. b. brachydactyla) and an eastern clade replicates were run. Convergence was verified running a posteriori (A2: Kazakhstan, Uzbekistan, Somalia [winter] and India [winter] – bootstrap tests using the autoMRE function (Pattengale et al., C. b. longipennis), with one Greek sample (C. b. brachydactyla)in 2009). each of these two clades. The placement of one C. b. longipennis The datasets DAAROODENSIS, ERLANGERI, and MUSEUM were thinned to sample (64_KAZ) in the ML tree was inconclusive with regard to retain SNPs at least 10 Kbp apart. The dataset FRESH was first divided clades A1 and A2. into three non-overlapping subsets spread over the whole genome, Clade B comprised the Asian C. acutirostris. There was a deep and then each genomic subset was thinned by 50 Kbp. Since phase split, estimated to 1.5 million years ago (mya) (95% highest poste- is unknown, the pairing of alleles between sites was randomised, rior density [HPD] 0.9–2.2 mya), between the samples from the and two replicates of different allele pairing was analysed for each Tibetan plateau (China and India; B1) and Afghanistan (B2), respec- genomic subset of dataset FRESH, while three replicates were anal- tively. Clade C contained our single sample of C. brachydactyla ysed for datasets DAAROODENSIS, ERLANGERI, and MUSEUM. dukhunensis, with an estimated divergence time from its sister All datasets, including genomic subsets and replicated random clade, B, at 3.1 mya (95% HPD 2.1–4.2 mya). allele pairing, were imported into BEAUti (Bouckaert et al., 2014), Clades D and E contained the sub-Saharan C. cinerea, with a where the data were prepared for analyses with the SNAPP v. deep split, estimated at 2.7 mya (95% HPD 1.7–3.7 mya) that sepa- 1.1.16 plugin (Bryant et al., 2012) in BEAST v. 2.1.3 (Bouckaert rated the Nigerian C. c. saturatior (D1) and Kenyan C. c. williamsi et al., 2014). The priors for forward (u) and reverse (v) mutation (D2) from the South African C. c. cinerea and DR Congo C. c. satura- rates were set to be estimated and the remaining parameters were tior (E). The divergence time between clades D1 and D2 was con- left at default values, e.g. species divergence rate k = 0.00765, and h siderably younger (0.8 mya, 95% HPD 0.4–1.3 mya). defined by a c prior with shape parameter a = 11.750 and scale Clade F included C. b. blanfordi (N Eritrea; F1) and C. erlangeri parameter b = 109.73. Runs were carried out for P1.5 million gen- (Ethiopia; F2), separated from clade G, which contained C. blanfordi erations, sampling every 1,000 generations. Output was inspected eremica (Arabia; G1; one sample (44_SAU) inconclusively placed with Tracer v. 1.6 (Rambaut et al., 2013), ensuring stationarity within clade G in ML analyses) and C. b. daaroodensis (N Somalia; and ESS >200 for all parameters (with a few exceptions for single G2), 4.3 mya (95% HPD 3.0–5.6 mya). theta values of internal branches, in single replicates). We first The cytb tree including all sequences, i.e. a mix of full-length analysed the data using each sample as a terminal taxon, in order and short sequences (Fig. 3), recovered the same seven well- to verify whether there were any deviations in topology between supported clades as in the tree based on only the full-length the nuclear and the mitochondrial trees, and then reran the analy- sequences, as well as additional deep splits within clade B. Like ses with clades A (C. brachydactyla) and D1 (Nigerian C. cinerea sat- in Fig. 2, the C. brachydactyla clade (A) was divided into a western uratior; both estimated with cytb being <600 ky old) collapsed into (A1) and an eastern (A2) clade. In addition to the previously men- tree-tip taxa. tioned subspecies, clade A1 also included a sample from Hungary, presumably representing C. b. hungarica, and clade A2 also 2.6. Within-lineage genetic variation included C. b. artemisiana (Georgia, Iran), C. b. hermonensis (Leba- non), a presumed C. b. woltersi (E Turkey), and a C. b. brachydactyla SNP heterozygosity was calculated, for individuals sampled from W Turkey. from fresh material, as the proportion of heterozygous sites in Clade B was divided into more clades than in Fig. 2. Clade B1 dataset FRESH, comprising 215,996 high quality SNPs (full coverage contained three short sequence samples from India, whereas the M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 237

Fig. 1. Distribution map and sampling locations. Samples are denoted with coloured symbols; plain symbols represent full (or long) sequences of cytochrome b, symbols with a central dot represent samples sequenced for 300–400 basepairs (bp), and symbols with crosses represent samples sequenced for c. 100 bp. Distribution of resident Calandrella cinerea (orange; clade E circles, clade D squares), C. erlangeri (green diamonds), and C. blanfordi (purple; clade F diamonds, clade G squares), breeding distribution of C. brachydactyla (brown; clade A red circles, clade C red squares) and C. acutirostris (blue; clade B1 circles, clade B2 triangles, clade B3 diamonds, clade B4 squares). Clade names refer to Fig. 3. In the inset, the region holding four distinct C. acutirostris clades is shown (sample numbers refer to Table S2). Distribution data from BirdLife International and NatureServe (2014) modified for C. erlangeri and C. blanfordi according to Ryan (2004) and for eastern C. brachydactyla based on Gombobaatar and Monks (2011) and unpublished personal studies (which primarily exclude the Tibetan plateau from the previously assumed breeding range). full-length Indian sample that fell in this clade in Fig. 2 (110_IND) dactyla samples except the deviating C. b. dukhunensis and both was now placed in clade B4 (though lacking support). Clade B2 Nigerian C. cinerea saturatior samples as single taxa. The SNAPP included six additional Afghan samples. Both the new clades B3 analyses of the dataset FRESH, which included only the fresh Calan- and B4 were comprised exclusively of samples from the Indian part drella samples (i.e. all of the species except C. blanfordi and C. erlan- of the Tibetan plateau. Clades B2 and B3 were sisters, with insignif- geri) and with E. alpestris as outgroup (26,629 SNPs divided evenly icant support (posterior probability [PP] 0.89), and had an esti- over three genomic subsets, in turn run as two replicates with dif- mated divergence at 1.4 mya (95% HPD 0.5–2.5 mya). Clades B1 ferent random allele pairing), consistently recovered a tree (Fig. 4) and B4 were sisters with strong support (PP 0.99) and an estimated that was largely in agreement with the cytb tree. However, unlike split at 1.5 mya (95% HPD 0.7–2.4 mya). The age of the divergence the cytb tree, it strongly (PP 1.0) supported a sister relationship between these two pairs of clades was estimated to be 2.5 mya between clade A (containing C. brachydactyla, excluding C. b. (95% HPD 1.4–3.7 mya). Clade C included five samples of C. b. dukhunensis) and clade BC (comprising C. acutirostris and C. brachy- dukhunensis from the winter quarters (C India, Myanmar) in addi- dactyla dukhunensis) (labelling as in Figs. 2 and 3). In contrast, the tion to the one from the breeding grounds included in Fig. 2. The analyses of the same data but excluding the outgroup species estimated split between clades B and C was older than that esti- (20,051 SNPs [after the exclusion of 6,578 sites that differed mated in Fig. 2 (3.8 mya, 95% HPD 2.3–5.5 mya). between E. alpestris and Calandrella but were monomorphic in The position of clade D was as in Fig. 2, with the addition of two Calandrella] over three genomic subsets) consistently recovered samples of C. cinerea williamsi from Kenya in clade D2. Clade E con- clades A and DE as sisters, with strong support (PP 1.0) (Fig. 4). tained, in addition to the samples in Fig. 2, C. c. alluvia and C. c. mil- The SNAPP analyses of the dataset MUSEUM (1,590 SNPs), includ- lardi from Botswana, C. c. fulvida from Zambia and Zimbabwe, C. c. ing all Calandrella species, i.e. also samples obtained from museum niveni from South Africa, C. c. saturatior from Angola, Malawi, specimens of C. blanfordi and C. erlangeri, consistently recovered Tanzania and Uganda, and C. c. spleniata from Namibia. Clades F clade FG as sister to the other Calandrella species with strong sup- and G agreed with Fig. 2 (same sequences). port (PP 1.0) in two out of three replicates (Fig. 5a). Analyses of the larger dataset DAAROODENSIS (4,165 SNPs), containing C. blanfordi daa- 3.2. Phylogenomic analyses of nuclear RAD data roodensis (G2), resulted in a tree with the same basic topology in one of the three replicates (Fig. 5b). However, the remaining Using single samples as terminal taxa, the SNAPP analyses did MUSEUM and DAAROODENSIS replicates, as well as all replicates for data- not deviate from clade assignment in the cytb analyses (Fig. S1), set ERLANGERI (4,343 SNPs) containing C. erlangeri (F2), resulted in and we therefore continued the analyses by treating all C. brachy- trees that placed the museum taxa as sisters to the outgroup 238 M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245

Fig. 2. Relationships within the genus Calandrella based on the complete mitochondrial cytochrome b gene, analysed in BEAST v. 1.8.2 under a lognormal relaxed clock with a rate of 2.1%/million years. Posterior probabilities (PPs) are indicated at the nodes; ⁄ means PP 1.00, and – means PP < 0.50. Bootstrap support from ML analyses (MLBS; same topology) are given after PP values, with ⁄ representing MLBS = 100. Clade ABC was constrained based on the results from the RAD data (Fig. 4); cytb trees based on unconstrained analyses did not support this clade (cf. Fig. 3). Clade names and colours as in Figs. 3–5.

E. alpestris (Fig. 5c). Thus, it should be noted, that the museum In all of the SNAPP analyses of the RAD data the relative branch sequences still seem to suffer from erroneous bases due to DNA lengths differed markedly from the cytb trees (Figs 2 and 3), with damage. clades DE and FG being considerably shallower. Divergences within M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 239

Fig. 3. Relationships within the genus Calandrella based on a dataset comprising both full-length and short mitochondrial cytochrome b sequences, analysed under a lognormal relaxed clock with a rate of 2.1%/million years. Posterior probabilities (PPs) are indicated at the nodes; ⁄ means PP 1.00, and – means PP < 0.50. Clade labels and colours as in Figs. 2, 4 and 5 for simplified comparison. # indicates full length sample (also used in analysis in Fig. 2). 240 M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245

Fig. 4. Relationships between and within three of the four major clades (A, BC, DE) of the genus Calandrella based on nuclear RAD sequence data from live samples, representing a total of >20,000 SNPs. The coalescent-based trees were produced by SNAPP and drawn in Densitree (Bouckaert, 2010). Thin lines, making up densities that represent the branches, represent individual replicate trees. Alternative topologies are drawn in different colours: the most supported topology in blue, followed in turn by red and green (the latter hardly visible due to low densities). The main tree represents one of three independent genomic subsets, in which the Horned Lark Eremophila alpestris was used as outgroup, and one of two replicates with random allele pairings per genomic subset. Only in this one analysis, out of six, the alternative (red) topology received any substantial support. The inset shows the corresponding analysis without any outgroup. Note that clade A is sister to clade BC in the outgroup-rooted tree, whereas it is sister to clade DE in the ingroup-only tree. Maximum likelihood bootstrap analyses (MLBS) of a RAD sequence super-matrix comprising 3.27 Mbp, including the outgroup, were performed in RAxML. The topology of that ML tree agrees entirely with the main outgroup-rooted SNAPP tree topology. Posterior probabilities (PP) and MLBS are indicated at nodes, with ⁄ representing PP/MLBS = 1.0. PP/MLBS < 1.0 are represented according to the legend included in the figure. The analyses of SNP data from the RAD sequencing resolved the major topology, and agreed with the cytb trees (Figs. 2 and 3). Clade names and colours correspond to those in Figs. 2, 3 and 5. For corresponding analyses, in which all samples were treated as terminal taxa, see Fig. S1.

clade A, when individuals were treated as terminal taxa, were most of these. It also inferred deep divergences within C. cinerea markedly deeper compared to clade BC (Fig. S1). Moreover, within and C. acutirostris, and two non-monophyletic species, namely clade A the relationship among C. b. brachydactyla, C. b. rubiginosa C. brachydactyla and C. blanfordi. The RAD data strongly supported and C. b. longipennis varied among different analyses, with short sister relationships between C. acutirostris and C. brachydactyla internode distances and long terminal branches (Fig. S1). dukhunensis (clade BC), between this clade and the other sub- The RAxML bootstrap analysis of the RAD sequence super- species of C. brachydactyla (clade ABC), and between clade FG matrix based on the fresh samples (including E. alpestris as out- (including only representatives for clades F2 and G2) and the rest group; 3.27 Mbp) recovered the same topology as the SNAPP tree of the species (clade DE). with 100% support for all clades (see Fig. 4). The divergence times inferred from the cytb data had large con- fidence intervals, and are based on a molecular clock, and should 3.3. Within-lineage genetic variation therefore be interpreted with caution. This is particularly true for the deepest nodes that were associated with low posterior proba- Calculated for the individual samples, the SNP heterozygosity bilities, as well as the divergence times in Fig. 3, which were based was almost twice as high (0.13–0.14) in C. brachydactyla (excluding on an analysis in which the majority of the sequences were incom- C. b. dukhunensis) as in the samples from all the other taxa (0.03– plete. However, as the divergence times were estimated under the 0.09) (Table 1). There was considerably greater variation in theta: best-fit model, HKY + G + I, rather than the GTR + G model used by lowest in the African clades DE (0.012–0.033) and FG (0.015– Weir and Schluter (2008) in their analysis that confirmed a mean 0.016), intermediate in the Asian clade BC (0.025–0.041), and high- cytb rate of 2.1%/my across 74 avian calibrations, our time esti- est in C. brachydactyla (clade A; 0.067–0.087) (Table 1). When mates are likely to be underestimates rather than overestimates. collapsed to a single terminal taxon, clade A had a particularly high The overall somewhat younger divergences compared to those estimated theta (0.331; Table 1). estimated by Alström et al. (2013) are also likely the result of the use of different models (GTR + G in Alström et al. (2013)). 4. Discussion 4.2. Intraspecific variation and divergence 4.1. Interspecific relationships and dating In the cytb tree, the C. brachydactyla clade (A) showed very shal- Cytb supported deep splits among the currently recognised spe- low divergence across its large range, here sampled from Morocco cies, although it failed to support the interrelationships among to eastern Kazakhstan. Still, there was evidence of geographical M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 241

a

b c

Fig. 5. Relationships between and within all four major clades (A, BC, DE, FG) of the genus Calandrella based on nuclear RAD sequence data from fresh samples, as well as two 50–90 year old museum specimens (C. erlangeri and C. blanfordi daaroodensis). (a) A matrix of 1,590 SNPs including both C. erlangeri and C. blanfordi daaroodensis, (b) 4,165 SNPs including only C. blanfordi daaroodensis, and (c) 4,343 SNPs including only C. erlangeri. These are the consensus trees from coalescent-based analyses made in SNAPP, with all nodes supported by posterior probabilities (PPs) = 1.0, unless stated otherwise at the nodes. Since the DNA of the museum samples was damaged, these analyses are based only on SNPs represented in all taxa and variable among fresh samples, i.e. SNPs unique to the museum samples have been excluded on the notion that in the majority of the cases they would represent erroneous nucleotides from damaged DNA (leading to a placement of the museum samples outside the genus Calandrella, such as in (c)). This procedure risks to decrease the true branch lengths of the taxa represented by museum samples, but it is likely counteracted by the inclusion of some erroneous nucleotide calls among the sites used. Clade names and colours correspond to those in Figs. 2–4.

Table 1 Genetic diversity based on RAD SNP data. SNP heterozygosity is calculated as the proportion of heterozygous SNPs among a dataset of 215,996 SNPs with full coverage in all fresh samples, and a restricted dataset of 11,684 SNPs including the museum samples. Note that the heterozygosity values of the museum samples may be deflated (see text). Theta values are co-estimated in SNAPP, for individual samples and for young clades (A, D1) comprising multiple samples, separately. Effective sample sizes (ESS) for these theta estimates were >400.

Clade Sample SNP heterozygosity Theta

Dataset FRESH Dataset MUSEUM A 0.331 A1 brachydactyla_brachydactyla_2076_ESP 0.135 0.124 0.071 A1 brachydactyla_rubiginosa_46_MAR 0.137 0.125 0.067 A2 brachydactyla_longipennis_67_KAZ 0.142 0.135 0.087 B1 acutirostris_tibetana_87_CHN 0.065 0.068 0.025 B2 acutirostris_acutirostris_83_AFG 0.088 0.092 0.032 C brachydactyla_dukhunensis_73_MNG 0.091 0.093 0.041 D1 0.028 D1 cinerea_saturatior_28_NGA 0.073 0.081 0.012 D1 cinerea_saturatior_29_NGA 0.064 0.073 0.012 D2 cinerea_williamsi_33_KEN 0.083 0.087 0.015 E cinerea_cinerea_02_ZAF 0.079 0.082 0.033 F2 blanfordi_daaroodensis_41_SOM 0.032 0.016 G2 erlangeri_37_ETH 0.034 0.015

structuring, with western and eastern birds in separate clades failed to support any geographic structuring within C. brachy- (A1 and A2, respectively). The only signs of geographical overlap dactyla (clade A), likely due to incomplete lineage sorting, which between these two clades were from Crete, where two birds repre- is more common in nuclear markers than in mitochondrial as a senting each of these two clades had been collected in the latter result of the four times larger effective population size in nuclear half of May. Although we have no proof that these two birds were compared to mitochondrial DNA (see further Section 4.3, below). breeding there, the chance of one or both of them being migrants C. brachydactyla (clade A) is the most widespread of the Calandrella on their way to breeding grounds further northeast and passing species, and by far exhibits the largest genetic diversity (Table 1). through Crete that late in spring seems slight (cf. Cramp, 1988; The deep divergence between C. b. dukhunensis (clade C) and the de Juana et al., 2004). In contrast to the cytb data, the RAD data other C. brachydactyla subspecies (clade A), and the sister 242 M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 relationship between the former and C. acutirostris (clade B), was 4.3. SNAPP analyses of RAD data first found by Alström et al. (2013). They used the same C. b. dukhunensis sample from the Mongolian breeding grounds as anal- Individual gene trees frequently differ among each other as well ysed here (73_MNG). Their finding was corroborated by additional as from the phylogeny they are contained within (reviews in e.g. samples of C. b. dukhunensis from the winter quarters in Myanmar Avise, 1994, 2000; Maddison, 1997; Page and Charleston, 1998; and eastern India and, more importantly, by RAD data from the Funk and Omland, 2003; Degnan and Rosenberg, 2009). Moreover, same breeding site bird (73_MNG). The RAD data refute that the mitochondrial genes are particularly prone to introgress across pattern might be caused by mitochondrial introgression. The taxo- species boundaries, at least during early stages of divergence (e.g. nomic status of C. b. dukhunensis is in obvious need of revision. Funk and Omland, 2003; Chan and Levin, 2005; cf. e.g. Alström The deep splits inferred within Calandrella acutirostris in the et al., 2008; Irwin et al., 2009; Wang et al., 2014). Hence, any cytb tree were unexpected. The estimated age of clade B differed single-locus tree needs to be corroborated by analyses of indepen- between the analyses in Figs. 2 and 3 (mean 1.5 mya and dent loci. In the present study, the RAD data provided independent 2.6 mya, respectively). The more ancient separation in the latter testing of the cytb hypothesis, although RAD data were only avail- tree is likely due to the addition of samples that suggest deep splits able for a small number of individuals. In particular, the non- within Indian birds. This may also be the main reason for the older monophyly of C. brachydactyla first reported by Alström et al. age of clade BC in Fig. 3 than in Fig. 2 (3.9 mya and 3.1 mya, respec- (2013), and the deep divergence between clades F and G and tively), perhaps in combination with the addition of more between clade FG and the others found in our cytb trees, were cor- dukhunensis samples in the former analysis. Further studies are roborated by the RAD data. Moreover, as expected, the RAD data- warranted to clarify the relationship between the clades and the set, which was vastly larger in terms of number of nucleotides timing of divergence. than the cytb matrix, resolved the deeper nodes much better than The phylogeographic pattern within clade DE in the cytb tree cytb. was unexpected, with a total lack of geographical structure Concatenation, as in the present RAxML analysis of the RAD throughout almost the entire distribution of C. cinerea, from South data, has been suggested to be statistically inconsistent under Africa to DR Congo, Uganda, and southern Tanzania (clade E) and some circumstances (e.g. Kolaczkowski and Thornton, 2004; comprising seven different subspecies. In contrast, clades D (Nige- Kubatko and Degnan, 2007; Degnan and Rosenberg, 2006), but is ria and Kenya) and E were estimated to have separated around still used without evaluation against other methods (e.g. Eaton 2.5 mya, and there was a deep split between the geographically and Ree, 2013; Wagner et al., 2013; Cruaud et al., 2014). To deal isolated Nigerian and Kenyan populations (clade D1 and D2, with the inconsistency problem, multiple ‘‘species tree methods” respectively). The ancient divergence between the samples from have been proposed. Some of these (e.g. BEST, Liu and Pearl, Uganda (C. c. saturatior) and Kenya (C. c. williamsi), on either side 2007; Liu, 2008; ⁄BEAST, Heled and Drummond, 2010; MP-EST, of Lake Victoria, are particularly noteworthy. It should be noted Liu et al., 2010) are based on the multispecies coalescent model that the Nigerian population (clade D1) is treated as belonging to (e.g. Nielsen, 1998; Rannala and Yang, 2003; Degnan and Salter, C. c. saturatior, as are the samples from Angola, DR Congo, Malawi, 2005; Liu and Pearl, 2007), which assumes that gene trees evolve Tanzania, and Uganda (clade E) (Ryan, 2004; Dickinson and under coalescent processes in every branch of a species phylogeny. Christidis, 2014). This calls for a taxonomic revision. The RAD data Unlike previous algorithms integrating the species tree in parallel supported the topology of the cytb tree for the four samples with marker trees (e.g. Liu, 2008; Heled and Drummond, 2010), included in the RAD analysis. It is noteworthy that the Nigerian SNAPP (Bryant et al., 2012) is tailored for SNP datasets and inte- samples of C. c. saturatior displays intermediate genetic diversity grates a coalescent species tree without inferring marker trees. (Table 1), despite seemingly very restricted current range and pop- Unlike multi-marker sequence data SNPs cannot easily be parti- ulation size (pers. obs.). tioned and indeed SNAPP assumes unlinked markers. Conse- The old age of clade FG, and the approximately 4 my old diver- quently, to decrease linkage SNPs should first be thinned based gence between clades F (C. blanfordi eremica and C. b. daaroodensis) on their physical position, as we did by 50 Kbp (10 Kbp for the and G (C. erlangeri and C. blanfordi blanfordi) were totally unex- small datasets including museum samples). As the results from pected. These deep splits among the Horn of Africa/Arabian taxa the SNAPP analyses agreed with the RAxML of the concatenated stand in strong contrast to the lack of divergence within the much SNPs, concatenation is probably not compromised for these data larger geographical area covered by the taxa in clade E, although it (but compare with the inconsistencies found by Stervander et al. resembles (though is much older than) the separation of the geo- (2015)). graphically close populations from opposite sides of Lake Victoria SNAPP will sample the root position along with the rest of the (in clades D2 and E, respectively). The taxonomy of the two species nodes in the tree and therefore, in theory, does not require an out- in clade FG is in need of revision. The genetic diversity measured in group to root the tree (Bryant et al., 2012). This view is also pre- C. b. daaroodensis and C. erlangeri was low (Table 1). This is compat- sented in the instructions for sequence-based analyses in BEAST ible with their rather small geographic distributions, but our esti- v. 2 (http://beast2.org/faq/). However, our analyses of the dataset mates may be somewhat deflated owing to less power to detect FRESH, with and without an outgroup, respectively, received equally heterozygote positions due to low sequencing coverage. strong support (PP 1.0) for two conflicting topologies [(outgroup, Trontelj and Fišer (2009) pointed out that meaningful investiga- ((A, BC), DE)) vs. ((A, DE), BC)]. From both a biogeographical and tion of cryptic diversity should be undertaken at genus level, as we morphological viewpoint, the former seems more likely (cf. Fig. 1 have done with extensive sampling, and that information about and de Juana et al., (2004)), and this best reflects the general any resulting cryptic lineages are of value for conservation policy. sequence (SNP) similarity: clade A–BC 70.5–72.6% vs. clade A–DE We have demonstrated the distinctiveness, both of previously 67.4–68.4%. We suggest that the use of an outgroup produces a unknown lineages (within C. cinerea and C. acutirostris) and more reliable result, at least for datasets which contain short between deep lineages that have previously been classified as sub- internode distances close to the root. Coalescence-based analyses species. Pending a taxonomic revision, this information may prove of SNP datasets are often very computationally demanding and very important for the management and conservation of e.g. the time-consuming, and it may therefore be compelling to follow small and restricted Nigerian population of what is presently trea- the recommendation by Bryant et al. (2012), not to include an out- ted as C. cinerea saturatior. group. Indeed, adding an outgroup to the analyses does not only M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 243 increase the taxon matrix by one, but it will perhaps double the suggest that analyses without an outgroup resulted in a strongly number of SNPs (owing to the inherent longer evolutionary dis- supported but incorrect tree. We showed that precious museum tance between the outgroup and the ingroup, than between any specimens can be successfully used in RAD sequencing analyses, of the ingroup taxa). In practice, this will often incur a trade-off, although they should preferably be prepared in separate libraries. where the inclusion of an outgroup may limit the number of SNPs Nevertheless, we managed to include them in our analyses despite that can be included in analyses. Based on our results, we suggest heavy degradation and DNA damage. In the genus Calandrella,we that the reliability of analyses without outgroup be thoroughly discovered several cryptic lineages within C. acutirostris and C. evaluated. cinerea, whereas many described subspecies were not genetically The branch lengths differed strongly between the cytb and RAD distinct; the family is thus in need of taxonomic revision, which trees. This might represent true differences between these data- we will address elsewhere. sets. As has already been remarked, different loci may have differ- ent histories. However, it seems unlikely that C. brachydactyla, Author contributions which is the most northerly distributed and most highly migratory species and which shows an extremely shallow population struc- The study was conceived in two separate processes by M.S., U.Ot., ture in the cytb tree (excluding C. b. dukhunensis), actually has B.H., S.B., and P.A. and U.Ol, respectively. The parts were merged, and the deepest intraspecific divergence of all species in nuclear DNA its final design was determined by M.S. with input from all other (Fig. S1). It seems more likely that this pattern results from random authors; P.A., U.Ol., and U.Ot. carried out the field work, whereas effects due to small sample sizes for all taxa, which prevents reli- museum sample collection was done by M.S., U.Ol., and U.Ot.; M.S. able estimates of effective population sizes and hence branch performed the major parts of the lab work, with additions from U. lengths. The higher heterozygosity and estimated theta within C. Ol.; M.S. and P.A. performed all analyses with input from B.H. and brachydactyla than in any of the other taxa in the RAD data (Table 1) S.B.; P.A. and M.S. wrote the paper with input from the other authors, might be the main reason for the anomalous pattern, but larger all of which have approved of the final version. samples for all taxa are needed to evaluate whether this is consis- tent or not. Acknowledgements

4.4. Use of degraded DNA from museum samples We would like to thank Michel Louette, Alain Reygel and Gael Carin (RMCA), Mark Adams, Robert Prys-Jones, and Hein van We tested including degraded DNA from museum samples in Grouw (BMNH), Paul Sweet, Peter Capainolo, and Thomas J. Trom- our RAD libraries, by using whole-genome amplification and bone (AMNH), Janet Hinshaw (UMMZ), Jon Fjeldså and Jan Bolding applying no shearing. This seems to have introduced errors in the Kristensen (ZMUK), and Sharon Birks (UWBM) for providing data DNA sequences. Posterior to our library construction, other studies and museum samples for genetic analyses. Yahkat Barshep, Mark have successfully amplified RAD tags from museum material, with Hulme, Nick Horrocks, Jesus Garcia, Pierre-André Crochet, Claire one important difference – they made separate libraries with Spottiswoode, Jan Lifjeld, Paul J. Leader, and Stephane Ostrowski museum material only, which allowed much shorter fragment provided samples from the field, while Keith Barnes and Paulette lengths (e.g. Tin et al., 2014) than our target, which was optimized Bloomer provided unpublished sequences. Naomi Keehnen and for the fresh samples. Peter Pruisscher were instrumental in creating the RAD sequencing However, despite these problems, we attempted to make use of library, and are warmly thanked. Robert Prys-Jones and Pamela C. the data through a process of careful bioinformatic scrutiny. After Rasmussen kindly shared their expertise in scrutinizing specimens all putative erroneous nucleotides had been excluded from the from R. Meinertzhagen’s collections, in order to validate that these degraded museum specimens representing clade FG from the Horn were likely genuine. Martin Irestedt gave advice on DNA extraction of Africa, we could place that clade basal within the genus (Fig. 5). from ancient tissues, and Martin Larsson and Linus Hedh assisted In the attempt to exclude erroneous nucleotides, we acknowledge in some Sanger sequencing. We thank Dennis Hasselquist and the risk to also exclude true variants unique to clade FG, however one anonymous reviewer for comments on the manuscript. M.S. that risk may be balanced by the risk of included erroneous nucleo- has received grants from the Royal Physiographic Society and Bird- tides among the sites variable among the fresh samples. Life Sweden, P.A. and U.O were supported by the Sound Approach Lastly, while traditional RAD sequencing is an option, the new and the Jornvall Foundation; B.H. by the Swedish Research Council, hyRAD method (Suchan et al., 2016) and other NGS approaches the Crafoord Foundation, and EU-FP7 (project Avian Genomics); such as whole-genome shotgun sequencing and sequence capture and S.B. by the Swedish Research Council. This is contribution no. approaches are available and may be preferable for the degraded 109 from A. P. Leventis Ornithological Research Institute. DNA from museum samples (reviewed by Burrell et al. (2015)). Appendix A. Supplementary material

5. Conclusions Sanger sequences >200 bp are deposited at GenBank with accession numbers KX379897–KX379995. Sanger and RAD data- Although the mitochondrial cytb and nuclear RAD data differed sets are available at DataDryad, see http://dx.doi.org/10.5061/ strongly in number of nucleotides (61,143 bp vs. 63,270,000 bp) dryad.16mg6. Supplementary data associated with this article and taxa (22 vs. 10), they agreed well. As expected, the RAD data can be found, in the online version, at http://dx.doi.org/10.1016/j. provided better resolution and support for deeper nodes. More- ympev.2016.05.032. over, representing a vastly larger proportion of the genome, which was analysed using a coalescent based method (SNAPP), the RAD References data should provide a more robust estimate of the phylogeny of the genus Calandrella than cytb. The pronounced differences in Alström, P., 1998. Taxonomy of the assamica complex. Forktail 13, 97–107. relative branch lengths between cytb and RAD trees that we noted Alström, P., Barnes, K.N., Olsson, U., Barker, F.K., Bloomer, P., Khan, A.A., Qureshi, M. need to be evaluated by more densely sampled RAD data. Sugges- A., Guillaumet, A., Crochet, P.A., Ryan, P.G., 2013. Multilocus phylogeny of the avian family Alaudidae (larks) reveals complex morphological evolution, non- tions that outgroup rooting is superfluous in SNAPP (and related monophyletic genera and hidden species diversity. Mol. Phylogenet. Evol. 69, ⁄ methods, e.g. BEAST) need to be evaluated. For our data, we 1043–1056. 244 M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245

Alström, P., Olsson, U., Lei, F., Wang, H.-T., Gao, W., Sundberg, P., 2008. Phylogeny Guillaumet, A., Crochet, P.A., Pons, J.M., 2008. Climate-driven diversification in two and classification of the Old World Emberizini (Aves, Passeriformes). Mol. widespread Galerida larks. BMC Evol. Biol. 8, 32. Phylogenet. Evol. 47, 960–973. Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate algorithm to estimate Avise, J.C., 1994. Molecular Markers, Natural History and Evolution. Chapman and large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704. Hall, New York. Hall, B.P., Moreau, R.E., 1970. An Atlas of Speciation in African Birds. Avise, J.C., 2000. Phylogeography: The History and Formation of Species. Harvard Trustees of the British Museum (Natural History), London. University Press, Cambridge. Hasegawa, M., Kishino, H., Yano, T.A., 1985. Dating of the human ape splitting by a Baird, N.A., Etter, P.D., Atwood, T.S., Currey, M.C., Shiver, A.L., Lewis, Z.A., Selker, E.U., molecular clock of mitochondrial-DNA. J. Mol. Evol. 22, 160–174. Cresko, W.A., Johnson, E.A., 2008. Rapid SNP discovery and genetic mapping Heled, J., Drummond, A.J., 2010. Bayesian inference of species trees from multilocus using sequenced RAD markers. PLoS One 3, e3376. data. Mol. Biol. Evol. 27, 570–580. Bickford, D., Lohman, D.J., Sodhi, N.S., Ng, P.K.L., Meier, R., Meier, R., Winker, K., Irwin, D.E., Rubtsov, A.S., Panov, E.N., 2009. Mitochondrial introgression and Ingram, K.K., Das, I., 2007. Cryptic species as a window on diversity and replacement between yellowhammers (Emberiza citrinella) and pine buntings conservation. Trends Ecol. Evol. 22, 148–155. (Emberiza leucocephalos) (Aves: Passeriformes). Biol. J. Linn. Soc. 98, 422– BirdLife International, NatureServe, 2014. Bird Species Distribution Maps of the 438. World. BirdLife International, Cambridge, UK. Keith, S., Urban, E.K., Fry, C.H. (Eds.), 1992. The Birds of Africa, vol. 4. Academic Bouckaert, R.R., 2010. DensiTree: making sense of sets of phylogenetic trees. Press, London. Bioinformatics 26, 1372–12373. Kolaczkowski, B., Thornton, J.W., 2004. Performance of maximum par-simony and Bouckaert, R., Heled, J., Kuhnert, D., Vaughan, T., Wu, C.H., Xie, D., Suchard, M.A., likelihood phylogenetics when evolution is heterogeneous. Nature 431, 980– Rambaut, A., Drummond, A.J., 2014. BEAST 2: a software platform for bayesian 984. evolutionary analysis. PLoS Comput. Biol. 10, e1003537. Kubatko, L.S., Degnan, J.H., 2007. Inconsistency of phylogenetic estimates from Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A., RoyChoudhury, A., 2012. concatenated data under coalescence. Syst. Biol. 56, 17–24. Inferring species trees directly from biallelic genetic markers: bypassing gene Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nat. trees in a full coalescent analysis. Mol. Biol. Evol. 29, 1917–1932. Methods 9, 357–359. Burrell, A.S., Disotell, T.R., Bergey, C.M., 2015. The use of museum specimens with Liu, L., 2008. BEST: Bayesian estimation of species trees under the coalescent model. high-throughput DNA sequencers. J. Hum. Evol. 79, 35–44. Bioinformatics 24, 2542–2543. Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., Cresko, W.A., 2013. Stacks: an Liu, L., Pearl, D.K., 2007. Species trees from gene trees: reconstructing Bayesian analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140. posterior distributions of a species phylogeny using estimated gene tree Chan, K.M.A., Levin, S.A., 2005. Leaky prezygotic isolation and porous genomes: distributions. Syst. Biol. 56, 504–514. rapid introgression of maternally inherited DNA. Evolution 59, 720–729. Liu, L., Yu, L., Edwards, S.V., 2010. A maximum pseudo-likelihood approach for Cramp, S. (Ed.), 1988. The Birds of the Western Palearctic, vol. V. Oxford University estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302. Press, Oxford. Maddison, W.P., 1997. Gene trees in species trees. Syst. Biol. 46, 523–536. Cruaud, A., Gautier, M., Galan, M., Foucaud, J., Saune, L., Genson, G., Dubois, E., Mayr, E., Greenway, J.C., 1960. Check-list of birds of the world: a continuation of the Nidelet, S., Deuve, T., Rasplus, J.Y., 2014. Empirical assessment of RAD work of James L. Peters, vol. 9. Harvard University Press, Cambridge. sequencing for interspecific phylogeny. Mol. Biol. Evol. 31, 1272–1274. Meinertzhagen, R., 1951. A review of the Alaudidae. Proc. Zool. Soc. Lond. 121, 81– Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, 132. R.E., Lunter, G., Marth, G.T., Sherry, S.T., McVean, G., Durbin, R., Grp, G.P.A., 2011. Nielsen, R., 1998. Maximum likelihood estimation of population divergence times The variant call format and VCFtools. Bioinformatics 27, 2156–2158. and population phylogenies under the infinite sites model. Theor. Popul. Biol. Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2012. JModelTest 2: more models, 53, 143–151. new heuristics and parallel computing. Nat. Methods 9, 772-772. Pääbo, S., Poinar, H., Serre, D., Jaenicke-Després, V., Hebler, J., Rohland, N., Kuch, M., Degnan, J.H., Rosenberg, N.A., 2006. Discordance of species trees with their most Krause, J., Vigilant, L., Hofreiter, M., 2004. Genetic analysis from ancient DNA. likely gene trees. PLoS Genet. 2, 762–768. Annu. Rev. Genet. 38, 645–679. Degnan, J.H., Rosenberg, N.A., 2009. Gene tree discordance, phylogenetic inference, Page, R.D.M., Charleston, M.A., 1998. Trees within trees: phylogeny and historical and the multispecies coalescent. Trends Ecol. Evol. 24, 332–340. association. Trends Ecol. Evol. 13, 356–359. Degnan, J.H., Salter, L.A., 2005. Gene tree distributions under the coalescent process. Pattengale, N.D., Alipour, M., 2009. How Many Bootstrap Replicates Are Necessary? Evolution 59, 24–37. Lect Not. de Juana, E., Suárez, F., Ryan, P., Alström, P., Donald, P., 2004. Family alaudidae Rambaut, A., Suchard, M.A., Drummond, A.J., 2013. Tracer v. 1.6, available from (Larks). In: del Hoyo, J., Elliott, A., Christie, D.A. (Eds.), Handbook of the Birds of . the World, vol. 9. Lynx Edicions, Barcelona, pp. 496–601. Rannala, B., Yang, Z.H., 2003. Bayes estimation of species divergence times and Dickinson, E.C., Christidis, L. (Eds.), 2014. The Howard and Moore Complete ancestral population sizes using DNA sequences from multiple loci. Genetics Checklist of the Birds of the World, Vol. 2. , fourth ed. Aves Press, 164, 1645–1656. Eastbourne. Romanov, M.N., Farré, M., Lithgow, P.E., Fowler, K.E., Skinner, B.M., O’Connor, R., Drummond, A.J., Ho, S.Y.W., Phillips, M.J., Rambaut, A., 2006. Relaxed phylogenetics Fonseka, G., Backström, N., Matsuda, Y., Nishida, C., Houde, P., Jarvis, E.D., and dating with confidence. PLoS Biol. 4, 699–710. Ellegren, H., Burt, D.W., Larkin, D.M., Griffin, D.K., 2014. Reconstruction of gross Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A., 2012. Bayesian phylogenetics avian genome structure, organization and evolution suggests that the chicken with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973. lineage most closely resembles the dinosaur avian ancestor. BMC Genomics 15, Eaton, D.A.R., Ree, R.H., 2013. Inferring phylogeny and introgression using RADseq 1060. data: an example from flowering plants (Pedicularis: Orobanchaceae). Syst. Biol. Ryan, P., 2004. Erlanger’s Lark (Calandrella erlangeri) and Blanford’s Lark (Calandrella 62, 689–706. blanfordi). In: del Hoyo, J., Elliott, A., Christie, D.A. (Eds.), Handbook of the Birds Etter, P.D., Bassham, S., Hohenlohe, P.A., Johnson, E.A., Cresko, W.A., 2011. SNP of the World, vol. 9. Cotingas to Pipits and Wagtails. Lynx Edicions, Barcelona. discovery and genotyping for evolutionary genetics using RAD sequencing. Ryan, P.G., Bloomer, P., 1999. The long-billed lark complex: a species mosaic in Methods Mol. Biol. 772, 157–178. southwestern Africa. Auk 116, 194–208. Fregin, S., Haase, M., Olsson, U., Alstrom, P., 2009. Multi-locus phylogeny of the Ryan, P.G., Hood, I., Bloomer, P., Komen, J., Crowe, T.M., 1998. Barlow’s Lark: a new family Acrocephalidae (Aves: Passeriformes) – the traditional taxonomy species in the Certhilauda albescens complex of southwest Africa. Ibis overthrown. Mol. Phylogenet. Evol. 52, 866–878. 140, 605–619. Funk, D.J., Omland, K.E., 2003. Species-level paraphyly and polyphyly: frequency, Sambrook, J., Russell, D., 1989. Molecular Cloning: A Laboratory Manual. Cold Spring causes, and consequences, with insights from animal mitochondrial DNA. Annu. Harbor Laboratory Press, Cold Spring Harbor, NY. Rev. Ecol. Evol. Syst. 34, 397–423. Schwarz, G., 1978. Estimating dimension of a model. Ann. Stat. 6, 461–464. Gernhard, T., 2008. The conditioned reconstructed process. J. Theor. Biol. 253, 769– Sibley, C.G., Monroe Jr., B.L., 1990. Distribution and Taxonomy of Birds of the World. 778. Yale University Press, New Haven. Gill, F.B., Donsker, D., 2014. IOC World Bird List (v 4.3). . Hoddinott, D., Dagne, A., Wood, C., Donald, P.F., Collar, N.J., Alström, P., 2013. Glutz von Blotzheim, U.N., Bauer, K.M. (Eds.), 1985. Handbuch der Vögel Rediscovery of a long-lost lark reveals the conspecificity of endangered Mitteleuropas, Band 10, Passeriformes. Aula-Verlag, Wiesbaden. populations in the Horn of Africa. J. Ornithol. 154, 813–825. Gombobaatar, S., Monks, E.M. (compilers), Seidler, R., Sumiya, D., Tseveenmyadag, Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post- N., Bayarkhuu, S., Baillie, J.E.M., Boldbaatar, Sh, Uuganbayar, Ch (Eds.), 2011. analysis of large phylogenies. Bioinformatics 30, 1312–1313. Mongolian Red List of Birds. Regional Red List Series, vol. 7. Zoological Society of Stervander, M., Illera, J.C., Kvist, L., Barbosa, P., Keehnen, N.P., Pruisscher, P., Bensch, London, National University of Mongolia and Mongolian Ornithological Society, S., Hansson, B., 2015. Disentangling the complex evolutionary history of the London and Ulaanbaatar. Western Palearctic blue tits (Cyanistes spp.) – phylogenomic analyses suggest Gu, X., Fu, Y.-X., Li, W.-H., 1995. Maximum likelihood estimation of the radiation by multiple colonisation events and subsequent isolation. Mol. Ecol. heterogeneity of substitution rate among nucleotide sites. Mol. Biol. Evol. 12, 24, 2477–2494. 546–557. Stiller, M., Green, R.E., Ronan, M., Simons, J.F., Du, L., He, W., Egholm, M., Rothberg, J. Guillaumet, A., Crochet, P.A., Godelle, B., 2005. Phenotypic variation in Galerida larks M., Keats, S.G., Ovodov, N.D., Antipina, E.E., Baryshnikov, G.F., Kuzmin, Y.V., in Morocco: the role of history and natural selection. Mol. Ecol. 14, 3809–3821. Vasilevski, A.A., Wuenschell, G.E., Termini, J., Hofreiter, M., Jaenicke-Despres, V., Guillaumet, A., Pons, J.M., Godelle, B., Crochet, P.A., 2006. History of the Crested Lark Paabo, S., 2006. Patterns of nucleotide misincorporations during enzymatic in the Mediterranean region as revealed by mtDNA sequences and morphology. amplification and direct large-scale sequencing of ancient DNA. Proc. Natl. Acad. Mol. Phylogenet. Evol. 39, 645–656. Sci. USA 103, 13578–13584. M. Stervander et al. / Molecular Phylogenetics and Evolution 102 (2016) 233–245 245

Suchan, T., Pitteloud, C., Gerasimova, N.S., Kostikova, A., Schmid, S., Arrigo, N., resolution of species boundaries and relationships in the Lake Victoria cichlid Pajkovic, M., Ronikier, M., Alvarez, N., 2016. Hybridization capture using RAD adaptive radiation. Mol. Ecol. 22, 787–798. probes (hyRAD), a new tool for performing genomic analyses on collection Wang, W., Dai, C., Alström, P., Zhang, C., Qu, Y., Li, S.H., Yang, X., Zhao, N., Song, G., specimens. PLoS One 11 (3), e0151651. Lei, F., 2014. Past hybridization between two East Asian long-tailed tits Tin, M.M.-Y., Economo, E.P., Mikheyev, A.S., 2014. Sequencing degraded DNA from (Aegithalos bonvaloti and A. fuliginosus). Front. Zool. 11, 40. non-destructively sampled museum specimens for RAD-tagging and low- Weir, J.T., Schluter, D., 2008. Calibrating the avian molecular clock. Mol. Ecol. 17, coverage shotgun phylogenetics. PLoS One 9 (5), e96793. 2321–2328. Trontelj, P., Fišer, C., 2009. Cryptic species diversity should not be trivialised. Syst. Willerslev, E., Cooper, A., 2005. Ancient DNA. Proc. R. Soc. B 272, 3–16. Biodivers. 7, 1–3. Yang, Z., 1994. Maximum likelihood phylogenetic estimation from DNA sequences Vaurie, C., 1959. The Birds of the Palearctic Fauna. Passeriformes. Witherby, London. with variable rates over sites: approximate methods. J. Mol. Evol. 39, Voous, K.H., 1960. Atlas of European Birds. Nelson, London. 306–314. Wagner, C.E., Keller, I., Wittwer, S., Selz, O.M., Mwaiko, S., Greuter, L., Sivasundar, A., Zhang, G., Li, C., Li, Q., Li, B., Larkin, D.M., et al., 2014. Comparative genomics reveals Seehausen, O., 2013. Genome-wide RAD sequence data provide unprecedented insights into avian genome evolution and adaptation. Science 346, 1311–1320.