
Available online at www.sciencedirect.com

Molecular Phylogenetics and Evolution 46 (2008) 430–445 www.elsevier.com/locate/ympev

DNA evidence for a Paleocene origin of the Alcidae (Aves: Charadriiformes) in the Pacific and multiple dispersals across northern oceans

Sergio L. Pereira a,*, Allan J. Baker a,b

a Department of Natural History, Royal Ontario Museum, 100 Queen’s Park Crescent, Toronto, Ont., Canada M5S 2C6
b Department of Ecology and Evolution, University of Toronto, Toronto, Ont., Canada M5S 1A1

Received 30 December 2006; revised 5 November 2007; accepted 27 November 2007 Available online 5 December 2007

Abstract

The Alcidae is a group of marine, wing-propelled diving birds known as auks that are distributed along the coasts of the northern oceans. It has been suggested that auks originated in the Pacific coastal shores as early as the Miocene, and dispersed to the Atlantic either through the Arctic coasts of Eurasia and North America (northern dispersal route), or through upwelling zones in the coastal areas of to (southern dispersal route), before the closure of the Isthmus of Panama in the Pliocene. These hypotheses have not been tested formally because proposed phylogenies failed to recover fully bifurcating, well-supported phylogenetic relationships among and within genera. We therefore constructed a large data set of mitochondrial and nuclear DNA sequences for 21 of the 23 species of extant auks. We also included sequences from two other extant and one extinct species retrieved from GenBank. Our analyses recovered a well-supported phylogenetic hypothesis among and within genera. is the only genus for which we could not obtain strong support for species relationships, probably due to incomplete lineage sorting. By applying a Bayesian method of molecular dating that allows for rate variation across lineages and genes, we showed that auks became an independent lineage in the Early Paleocene and radiated gradually from the Early to the Quaternary. Reconstruction of ancestral areas strongly suggests that auks originated in the Pacific during the Paleocene. The southern dispersal route seems to have favored the subsequent colonization of the northern Atlantic during the Eocene and . The northern route across the Arctic Ocean was probably only used more recently after the opening of the Norwegian Sea in the Middle Miocene and the opening of the Bering Strait in the Late Miocene. We postulate that the ancestors of auks lived in a warmer world than that currently occupied by auks, and became gradually adapted to feeding in cool marine currents with high biomass productivity. Hence, warmer tropical waters are now a barrier for the dispersal of auks into the Southern Hemisphere, as they are for penguins in the opposite direction.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Alcidae; Divergence times; Molecular phylogenetics; Biogeography; Paleocene; Dispersal; Speciation; Pacific Ocean;

* Corresponding author. Fax: +1 416 586 5553. E-mail address: [email protected] (S.L. Pereira).

1055-7903/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2007.11.020
S.L. Pereira, A.J. Baker / Molecular Phylogenetics and Evolution 46 (2008) 430–445

1. Introduction

The Alcidae is a group of 23 extant and one recently extinct species of marine, wing-propelled diving birds that spend most of their lives foraging in the northern seas of the Northern Hemisphere and are collectively known as auks. They come to land briefly to reproduce, usually on rocks and cliffs of offshore islands and narrow coastal strips that are hardly accessible to land predators (Gaston and Jones, 1998).

Based on the age and diversity of forms observed in the marine fossil record it has been hypothesized that auks originated in the Pacific region (Gaston and Jones, 1998; Olson, 1985). Hydrotherikornis from the Late Eocene of Oregon is the oldest fossil that has been attributed to the Alcidae, but the first undisputed fossils of the group are the extinct genera Alcodes (Howard, 1968) and Miocepphus (Olson, 1985) from marine deposits of the Middle Miocene of the Pacific and Atlantic Oceans, respectively. Later on, the extinct genus Praemancalla and forms possibly related to modern , Cerorhinca, Aethia and appear in marine deposits of the Late Miocene of the Pacific. The extinct genus Australca and modern Alca, Alle, Fratercula and Pinguinus appear in the marine fossil record of Atlantic deposits of the Late Miocene and Pliocene. If auks originated as far back as at least the Miocene, it is very likely that their radiation began in the Pacific Ocean and they became adapted to marine environments early in their evolutionary history. During the Miocene, the Pacific would have provided a greater expanse for the evolution of marine birds compared to the Northern Atlantic Ocean, which was still in the process of opening up during the end of the Eocene and throughout the Oligocene (Olson, 1985).

From the Pacific, auks are thought to have colonized the Atlantic and Arctic Oceans, but competing hypotheses have been proposed to explain possible colonization routes. One hypothesis posits that they crossed the Bering Strait into the Arctic Ocean, and reached the Atlantic region via the Arctic coasts of Eurasia and North America (Bédard, 1985). New geological evidence for reduced salinity and increased freshwater input in the Arctic Ocean during the Eocene (Brinkhuis et al., 2006) would make this dispersal route less likely if the ancestors of auks were already adapted to marine environments. On the other hand, an alternative colonization route has been suggested via the Pacific southern coasts of North America to the Atlantic before North and South America finally became connected to each other through the Isthmus of Panama (Konyukhov, 2002). This hypothesis also postulates that the exchange between the Atlantic and the Pacific Oceans occurred more recently, through periodic openings of the Bering Strait from the Late Miocene to the Pleistocene (Konyukhov, 2002).

The lack of a well-supported phylogenetic hypothesis at the genus and species level has precluded further testing of the proposed hypotheses for their center of origin and dispersal routes. Published phylogenies (Fig. 1) agree that the auks can be subdivided into two major sister clades (Friesen et al., 1996a; Moum et al., 2002, 1994; Strauch, 1985; Thomas et al., 2004; Watada et al., 1987). One of the major clades includes puffins (Fratercula) and auklets (Cerorhinca, Aethia and Ptychoramphus) that feed mainly on zooplankton, and the other contains murres (Uria), murrelets (Brachyramphus and Synthliboramphus), guillemots (Cepphus), Dovekie (Alle alle), Razorbill (Alca torda) and the recently extinct Great Auk (Pinguinus impennis) that feed primarily on fishes. However, these same phylogenies have failed to provide strong support for the generic and some of the specific relationships within the mainly due to use of characters under natural selection and prone to at the genotypic and phenotypic levels (Friesen et al., 1996b; Strauch, 1985; Watada et al., 1987), insufficient character sampling (Friesen et al., 1996a; Moum et al., 1994), incomplete sampling (Moum et al., 2002; Watada et al., 1987), and contentious treatment of clades shared among multiple phylogenetic source trees as characters (supertree approaches) (Thomas et al., 2004).

In the present study, we aimed to fully resolve the phylogenetic relationships of all the genera and species of the Alcidae with a much larger data set of DNA sequences than was used in previous studies. We used the statistically best-supported tree topology to estimate divergence times and areas where ancestors of the Alcidae may have originated. This is the first study to test the hypothesis of a Pacific origin and alternative hypotheses for their dispersal routes within a strong phylogenetic framework using modern methods of tree inference, molecular dating and ancestral area reconstruction. Our estimates are interpreted based on the geological history of the Northern Hemisphere throughout the Cenozoic.

Fig. 1. Proposed phylogenetic relationships for the Alcidae based on non-molecular (Strauch, 1985) and molecular characters (Friesen et al., 1996a; Moum et al., 2002, 1994; Watada et al., 1987), as well as on a supertree approach (Thomas et al., 2004) in which clades are transformed into characters. In the present study, the monotypic Cyclorrhynchus has been considered synonymous with Aethia, following American Ornithologists’ Union (1997).

2. Materials and methods

2.1. Taxon sampling, DNA amplification, sequencing and sequence alignments

Twenty (87%) out of the 23 recognized extant species within the Alcidae, and three other Charadriiformes outgroups were sampled for this study (Table 1). DNA was isolated following standard protocols (Sambrook et al., 1989). Mitochondrial and nuclear DNA amplifications were carried out for the small and large ribosomal subunit (12S and 16S rDNA, respectively), NADH dehydrogenase subunit 2 (ND2), cytochrome b (cyt b) (primers designed by O. Haddrath and described in Pereira and Baker, 2004b) and cytochrome c oxidase subunit I (COI) (COIaR: aac yaa cca caa aga cat ygg and COIbR: gan agg aca tag tgg aag tgg gc; O. Haddrath, personal communication), and the nuclear recombination activating protein (RAG-1) gene (primer combinations R13 and R18, R17 and R22, and R21 and R2b in Groth and Barrowclough, 1999), following published protocols (Groth and Barrowclough, 1999; Pereira and Baker, 2005). After amplification, PCR products were separated in and recovered from 1% agarose gels, purified by centrifuging each recovered fragment through a filter tip, cycle-sequenced and run on a Li-Cor 4200 bidirectional automated DNA sequencer according to the manufacturer’s suggested protocol. Both L- and H-strand sequences were checked for ambiguities and final consensus sequences were created for each fragment sequenced in Sequencher 4.1.2 (Gene Codes Corp., Inc., Ann Arbor, Michigan). Consensus sequences plus sequences for alcids retrieved from GenBank (Table 1), including Cepphus grylle, Synthliboramphus hypoleucus and the extinct Pinguinus impennis for which we did not have DNA samples, were aligned visually in MacClade 4.0 (Maddison and Maddison, 2000). A final matrix of 7403 base pairs (bp), including gaps, was used for phylogenetic analyses. New sequences obtained in this study have been deposited in GenBank (see Table 1 for accession numbers).

2.2. Base composition and model selection

Bias in base composition was checked in TREE-PUZZLE 5.0 (Strimmer and von Haeseler, 1996) as it can negatively influence phylogenetic inference (Haddrath and Baker, 2001). Model selection was carried out separately for each gene according to the Akaike Information Criterion (AIC) in MrModeltest 2.0 (Nylander, 2004) because the AIC is not influenced by the order in which models are compared, and allows quantification of model uncertainty through Akaike weights. The best fitting model for each partition was subsequently used in Bayesian phylogenetic analyses, and parameters were estimated during the analyses. Model selection was also performed for all genes concatenated to carry out a non-partitioned maximum likelihood analysis in PAUP 4.0 beta 10 (Swofford, 2001) as the current version of this software does not allow for data partition.

2.3. Phylogenetic Bayesian inference

A Metropolis-coupled Markov chain Monte Carlo (MCMC) Bayesian approach as implemented in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003) was used to infer phylogenetic relationships under two simultaneous independent runs, each starting with a different random tree. Each run had one cold and five heated chains to allow better mixing of the MCMC chain and quicker convergence, and to minimize the chance of being trapped in local optima.
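The role of the one cold and five heated chains can be illustrated with a minimal sketch. It assumes the incremental heating scheme commonly used for Metropolis-coupled MCMC, in which chain i samples the posterior raised to the power β_i = 1/(1 + λi), with i = 0 for the cold chain. The temperature λ = 0.2 and the log-posterior values are illustrative assumptions only, because the heating temperature used in the analyses is not reported here; the function names are hypothetical and are not part of MrBayes.

```python
import math


def chain_heats(n_chains, temperature):
    """Heat (inverse temperature) of each chain; chain 0 is the cold chain."""
    return [1.0 / (1.0 + temperature * i) for i in range(n_chains)]


def swap_acceptance(log_post_i, log_post_j, heat_i, heat_j):
    """Probability of accepting a state swap between chains i and j.

    log r = (heat_i - heat_j) * (log_post_j - log_post_i),
    and the swap is accepted with probability min(1, r).
    """
    log_ratio = (heat_i - heat_j) * (log_post_j - log_post_i)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


# One cold and five heated chains; temperature 0.2 is an assumed value.
heats = chain_heats(6, 0.2)

# Hypothetical log-posteriors: the heated chain has found a better tree,
# so a swap with the cold chain is always accepted.
p_up = swap_acceptance(-100.0, -90.0, heats[0], heats[5])

# The reverse situation is accepted only occasionally.
p_down = swap_acceptance(-100.0, -110.0, heats[0], heats[5])
```

Heated chains see a flattened posterior and cross valleys between local optima more easily; accepted swaps pass their states to the cold chain, which is the only chain sampled for inference.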

Table 1
Species sampled (and their common names)

Species | Common name | Voucher | Type of material | 12S rDNA | 16S rDNA | COI | ND2 | cyt b | RAG-1
Aethia cristatella | | JP3166 | M | EF373064 | EF380278 | EF380315 | EF373219 | U37087 | EF373165
Aethia psittacula | | PAAU2340 | M | EF373077 | EF380290 | EF380327 | EF373235 | U37296 | EF373179
Aethia pusilla | | JP3140 | M | EF380303 | EF380279 | EF380316 | EF380337 | U37104 | EF380266
Aethia pygmaea | | JP3395 | M | EF380304 | EF380280 | EF380317 | EF380338 | U37286 | EF380267
Alca torda | Razorbill | GC001 | M | EF373065 | EF380281 | EF380318 | EF373220 | U37288 | AY228788
Alle alle alle | Dovekie | 1B-1361 | M | AJ242684 | EF380282 | EF380319 | EF373221 | U37287 | EF373166
Alle alle polaris | Dovekie | JP3145 | M | EF380305 | EF380283 | EF380320 | EF380339 | EF380300 | EF380268
Brachyramphus brevirostris | Kittlitz’s murrelet | JP3067 | M | EF373070 | EF380284 | EF380321 | EF373227 | U63058 | EF373172
Brachyramphus marmoratus | | JP3312 | M | EF380306 | EF380285 | EF380322 | EF380340 | U63054 | EF380229
Brachyramphus perdix | Long-billed murrelet | JMB992 | M | EF380307 | EF380286 | EF380323 | EF380341 | U63057 | EF380270
Cepphus carbo | Spectacled guillemot | CSW4372 | M | EF380308 | EF380287 | EF380324 | EF380342 | U37292 | EF380271
Cepphus columba columba | | JP2462 | M | X76349 | EF380288 | EF380325 | EF373229 | U37293 | EF373173
Cepphus grylle | | — | H | AJ242688 | — | — | — | AJ242688 | —
Cerorhinca monocerata | | JMB862 | M | EF373072 | EF380289 | EF380326 | EF373230 | U37295 | EF373174
Fratercula arctica arctica | Atlantic puffin | AJB5521 | M | DQ385279 | DQ385296 | DQ385177 | DQ385092 | DQ385228 | AY228787
Fratercula cirrhata | Tufted puffin | JP3416 | M | EF380309 | EF380291 | EF380329 | EF380343 | U37298 | EF380273
Fratercula corniculata | Horned puffin | JP2600 | M | EF380310 | EF380292 | EF380328 | EF380344 | U37299 | EF380272
Pinguinus impennis | Great Auk | — | S/F | AJ242685 | — | — | — | AJ242685 | —
Ptychoramphus a. aleuticus | Cassin’s auklet | 1B-2650 | M | EF373103 | EF380293 | EF380330 | EF373261 | U37302 | EF373204
Synthliboramphus antiquus | | ANMU2338 | M | EF373111 | EF380294 | EF380331 | EF373269 | U37303 | EF373212
Synthliboramphus craveri | Craveri’s murrelet | B-5790 | U | EF380311 | EF380295 | EF380332 | EF380345 | U37304 | EF380274
Synthliboramphus wumizusume | | JMB811 | M | EF380312 | EF380296 | EF380333 | EF380346 | U37306 | EF380275
Synthliboramphus hypoleucus | Xantus’s murrelet | — | U | — | — | — | — | U37305 | —
Uria aalge aalge | | AJB5518 | M | EF380313 | EF380297 | EF380334 | EF380348 | EF380301 | EF380276
Uria aalge inornata | Common murre | JP3360 | M | EF380314 | EF380298 | EF380335 | EF380347 | EF380302 | EF380277
Uria lomvia lomvia | Thick-billed murre | JP3364 | M | AJ242687 | EF380299 | EF380336 | EF373273 | U37308 | EF373216
Catharacta | | MKP1592 | M | DQ385278 | DQ385295 | DQ385176 | DQ385091 | DQ385227 | AY228783
Rissa tridactyla | Black-legged kittiwake | AJB5531 | M | DQ385280 | DQ385297 | DQ385178 | DQ385093 | DQ385229 | AY228785
Jacana jacana | Wattled jacana | L51903 | M | DQ385273 | DQ385290 | DQ385171 | DQ385086 | DQ385222 | AY228776
Spheniscus magellanicus | Magellanic penguin | MM5 | B | DQ137200 | DQ137160 | DQ137178 | — | DQ137218 | DQ137239
Eudyptula minor | Little blue penguin | CIB6 | B | DQ137204 | DQ137164 | DQ137174 | NC_004538 | DQ137214 | DQ137235
Aptenodytes patagonicus | King penguin | KI2 | B | DQ137188 | DQ137148 | DQ13718 | AY139622 | DQ137226 | DQ137247
Diomedea exulans | Wandering albatross | 1B-111 | M | DQ137205 | DQ137165 | DQ137168 | — | DQ137208 | DQ137229
Gavia immer | | 1B-105 | M | DQ137206 | DQ137166 | DQ137167 | — | DQ137207 | DQ137228
Anser albifrons | White-fronted goose | MKP991 | — | NC_004539 | NC_004539 | NC_004539 | NC_004539 | NC_004539 | DQ137227
Gallus gallus | Red junglefowl | — | — | NC_001323 | NC_001323 | NC_001323 | NC_001323 | NC_001323 | M58530

Voucher numbers are given for birds sequenced in this study. Type of material is the source used to isolate DNA: B, blood; F, ; H, heart; M, muscle; S, skin; U, unknown. GenBank accession numbers are given for each gene fragment. All samples are deposited at the Royal Ontario Museum, except B-5790, which is at the Louisiana State University Museum of Natural Science.

Convergence was considered to have been reached when the average standard deviation of the split frequencies between both simultaneous runs was smaller than 0.01. Runs were carried out assuming the same topology to be shared among all gene partitions but model parameters (substitution matrix, state frequency, proportion of invariable sites and shape of the γ distribution) were unlinked and each gene partition was allowed to have its own model of DNA evolution as chosen by AIC (hereafter, the nucleotide model). Alternatively, we performed a second analysis, partitioning the concatenated protein-coding genes by codon position and each ribosomal gene in non-coding partitions (hereafter, the codon model). Among-partition rate variation was assumed a priori. All trees were considered equally likely and were rooted with sequences from the Wattled Jacana (Jacana jacana). Other priors applied for all partitions were: unconstrained:exponential (10.0) for branch lengths, flat Dirichlet (1,1,1,1) for stationary base frequencies, flat Dirichlet (1,1,1,1,1,1) for the nucleotide substitution ratio, uniform distribution (0,200) for the shape parameter of the γ distribution of rate variation and uniform distribution (0,1) for the proportion of invariable sites. MCMC samples were taken every 1000 cycles. A plot of log-likelihoods of sampled topologies versus sampled cycles was built to determine the burn-in period after which the MCMC chain had reached stationary status. Post-burn-in samples from both simultaneous and independent runs were used to construct a 50% majority-rule consensus tree. The percentage of times a node is recovered from the posterior tree distribution after the burn-in period is interpreted as the posterior probability (PP) of that node, or the probability that that node is true (Ronquist and Huelsenbeck, 2003). Nodes receiving PP ≥ 0.95 were considered to be strongly supported.

2.4. Phylogenetic maximum likelihood analysis

Maximum likelihood (ML) inference was performed in PAUP 4 beta 10 (Swofford, 2001). The model parameters estimated in MrModeltest for the combined data set were used during a heuristic search, with random addition of taxa. The starting tree was obtained by stepwise addition, and the tree-bisection–reconnection algorithm and "MulTrees" option were in effect. A 100-replicate non-parametric maximum likelihood bootstrapping analysis was also performed in PAUP, with sequences being randomly added in each replicate. Bootstrapping analysis was also performed in GARLI version 0.942 (Zwickl, 2006), assuming the GTR substitution model with a proportion of invariable sites and four-category γ-distributed rate variation estimated from the data set for 100 replicates. GARLI is a new approach developed for large data sets and uses a genetic algorithm to search for the topology, branch lengths and the substitution model parameters (rate of DNA substitution, base frequency, proportion of invariable sites, and shape parameter of the γ distribution) that maximize the likelihood of generating the observed set of DNA sequences. Briefly, the genetic algorithm will create a sample of individual solutions at a given time, each solution with its own topology, branch lengths and model parameters representing the phylogenetic relationships of the given DNA sequences; next, each individual solution is ascribed a fitness likelihood score, which will determine the proportion of "offspring" solutions that each individual solution will contribute to the next generation; the topology, branch lengths or one of the substitution model parameters of each offspring solution is then changed to a new state and the others are re-estimated to maximize the likelihood; these steps are repeated for a number of generations until an arbitrary stop condition is reached. For the Alcidae data set, the stop condition was set to automatic termination, which was reached when three conditions were met: (1) changes in topology did not significantly increase the offspring fitness, (2) the total increase in the likelihood score over a number of generations was less than a specified amount and (3) the parameter controlling the degree of branch length optimization reached its minimum value. Two independent GARLI runs were performed, each one starting with a different random tree topology. Judging from the similar results compared across runs, GARLI is unlikely to have been trapped in local optima. Only the results of one run are reported.

2.5. Phylogenetic maximum parsimony analysis

Maximum parsimony (MP) analysis was executed in PAUP 4. Gaps were treated as missing characters and multistate characters were treated as uncertain. A heuristic tree search was performed by stepwise addition, with sequences randomly added in 100 replicates. The "MulTrees" option was in effect. Branch support was estimated in PAUP 4 by non-parametric bootstrapping with the same parameters as the heuristic search. In this case, 100 random sequence additions were in effect for each bootstrap replicate.

2.6. Tree selection

To evaluate the significance among the different tree topologies obtained from different methods of phylogenetic inference applied to our data set and those previously published using molecular and non-molecular data (Friesen et al., 1996a; Strauch, 1985; Thomas et al., 2004; Watada et al., 1987), the approximately unbiased (AU) test (Shimodaira, 2002) was used as implemented in the program CONSEL version 0.1f (Shimodaira and Hasegawa, 2001). The AU test uses a multiscale bootstrap technique and site log-likelihoods when computing P-values for the topologies being tested. The multiscale bootstrap technique removes the bias of more conservative tests that increase the number of trees within the confidence set with the increase in the number of trees being compared. The distribution of posterior probabilities for each alternative topology is obtained by counting the number of times the hypothesis is supported by the replicates.

2.7. Molecular dating

To estimate the timeframe for the evolution of auks, we applied a Bayesian approach that accounts for uncertainty in branch lengths and rates of evolution for individual gene partitions, and uses fossil data as prior information on the age of clades (Thorne and Kishino, 2002; Thorne et al., 1998). For each gene partition, maximum likelihood estimates of the transition/transversion ratio and nucleotide frequencies were obtained in PAML 3.14 (Yang, 1997) under the F84 model of DNA substitution assuming that rate variation across sites follows a γ distribution with five discrete rate categories (Hasegawa et al., 1985). These parameters were used to estimate branch lengths for each data partition and their approximate variance–covariance matrix under a maximum likelihood approach, and to derive estimates of divergence times and 95% credibility intervals based on all data partitions under a Bayesian framework (Thorne and Kishino, 2002; Thorne et al., 1998). These methods are implemented in the software ESTBRANCHES and MULTIDIVTIME (Thorne, 2003). The Ostrich Struthio camelus (AF143727, NC_002785) was used as the outgroup to root the tree and had the same fixed rate of change in the rate of DNA substitution at the beginning and at the end of its branch as imposed by the method (Kishino et al., 2001; Thorne et al., 1998). Additional outgroups were included (Table 1), for which the fossil record provides additional prior information on the age of relevant cladogenic events (described below).

The Bayesian dating method was carried out to derive prior and posterior age estimates assuming a burn-in period = 5000, sample frequency = 1000, number of samples = 10,000. Two shorter runs were also performed to estimate posterior ages, evaluate the sensitivity of the method to sampling schemes and check for convergence of the method. The shorter runs were sampled every 100th cycle and the number of samples collected was set to 10,000 after a burn-in period = 3000. The following gamma priors were also in effect in all MULTIDIVTIME runs: expected prior time between tip and root (rttm) = 122 million years ago (Mya) (Pereira and Baker, 2006a) with standard deviation (SD) = 20 Mya, rate of the root node (rtrate) and its SD = 0.00263 substitutions per site per million years as estimated from the median of the tip-to-root branch lengths for all genes. Simulation and empirical studies have shown that these priors do not seem to have any appreciable effect on the Bayesian posterior distribution of node ages and rates of evolution, and sequence data and time constraints should determine the overall rate and the age of the root (Wiegmann et al., 2003; Yang and Yoder, 2003). The prior for the rate change between ancestral and descendant nodes (brownmean) was 0.00818 (SD = 0.00818) substitutions per site per million years, so that rttm × brownmean = 1. This latter prior follows the suggestion that this is a meaningful value for real and simulated data sets (Wiegmann et al., 2003). Because a priori information for rate change is unknown, it is advisable to use a large SD value (Thorne and Kishino, 2002), which allows a gene to have a priori a large variation in rate change over time. Convergence of the MCMC algorithm was also checked by comparing the posterior distribution of divergence times, branch lengths and the proportion of successful changes of those parameters along the Markov chain among the three independent runs, each one starting with a different randomly selected initial state.

To evaluate if mtDNA sequences could cause older dates due to sequence saturation with distant outgroups, as usually claimed (Hugall et al., 2007), we performed separate analyses of RAG-1 and mtDNA sequences, using the same parameters as in the shorter runs described for the combined analysis of all sequences.

Finally, as an independent check on our time estimates, we used a modified implementation of the Bayesian method of molecular dating available in the software mcmctree in the phylogenetic package PAML version 4 (Yang, 2007). Unlike multidivtime, which imposes a zero probability of divergence times being younger or older than the minimum and maximum time constraints (hard bound), mcmctree allows a small but non-zero probability (soft bound) that the true divergence time falls outside the bounds. Hence, we obtained estimates of divergence times under a correlated rates model using mcmctree and the same temporal constraints given below. We compared time estimates derived from hard and soft bounds as a proxy to evaluate how much conflict there is among time constraints, and between time constraints and the molecular data.

2.8. Temporal constraints in molecular dating

Several temporal constraints were imposed in the Bayesian tree topology used as the model tree for molecular dating. These constraints were based on the fossil record as follows: minimum age of 65 Myr for the separation between Charadriiformes and the lineage leading to Gaviiformes, Procellariiformes and Sphenisciformes based on the oldest members of Charadriiformes (Cimolopteryx and Palaeotringa) from the Late Cretaceous (Brodkorb, 1964); minimum age of 55.8 Myr for the separation of Jacana jacana (Suborder Scolopaci) and other shorebirds (Suborder ) based on Halcyornis topiapicus (Suborder Lari) from the Late Paleocene (Brodkorb, 1964); minimum age of 15 Myr for the split between Alcidae and Stercorariidae based on Miocepphus and Alcodes from deposits of the Middle Miocene; minimum age of 11.6 Myr for the separation between Cerorhinca and Fratercula based on Cerorhinca dubia from the Middle Miocene and for the separation of Uria and Alle based on Uria antiqua from the Middle Miocene (Brodkorb, 1964); minimum age of 3.6 Myr for the separation of Brachyramphus and of Ptychoramphus from their respective sister clades based on Brachyramphus pliocenus and Ptychoramphus tenuis from the Middle Pliocene (Brodkorb, 1964); minimum age of 61 Myr for the separation of Sphenisciformes from Procellariiformes based on the extinct genus Waimanu from the Early Paleocene (Ksepka et al., 2006; Slack et al., 2006); and minimum age of 66 Myr for the separation of Anseriformes and Galliformes based on Vegavis iaai from the Late Cretaceous (Clarke et al., 2005). Except for Vegavis iaai, the phylogenetic placement of the other fossils has not been performed in a cladistic framework. However, it seems reasonable to use them as minimum time constraints because the genus level affinities of these fossils have not been contested (Chandler, 1990; Howard, 1972; Warheit, 1992). In previous work, the divergence time between Galloanserae and Neoaves was estimated at 122 Myr with 95% credible interval between 110 and 135 Myr based on a Bayesian analysis of complete mitochondrial genomes and assuming multiple independent time constraints from the fossil record spread evenly throughout the tree (Pereira and Baker, 2006a). Hence, the 95% credible interval was set as minimum and maximum time constraints for the separation of Galloanserae and Neoaves in the present analysis.

To evaluate whether the above deep time constraints based on the 95% credible interval would overestimate the age of alcids, we removed the maximum constraint placed deep in the tree, and chose an independent maximum constraint of 72 Myr for the split between Alcidae and Stercorariidae, based on the upper limit of the 95% credible interval previously estimated by us in a study of the generic relationships of the Charadriiformes (Baker et al., 2007). An extra analysis was run using the 110 and 135 Myr constraints for the base of the tree and the 72 Myr constraint for the split between Alcidae and Stercorariidae.

2.9. Reconstruction of ancestral area states

To infer ancestral areas where the Alcidae may have originated we assumed geographic characters to be discrete distributions departing away from the Arctic Ocean, based on the breeding distribution of auks classified according to the oceanographic climate zones of Gaston and Jones (1998). Ancestral breeding areas were scored as High Arctic, Atlantic Low Arctic, Pacific Low Arctic, Pacific Boreal and Pacific Subtropical. Because some species breed in more than one climate zone, two breeding score systems were used. These two schemes differ from each other in whether the northernmost or the southernmost breeding zone for these species was scored as the current breeding area. We also considered a third scoring scheme, where the Arctic, Atlantic and Pacific Oceans were considered ancestral areas without further oceanographic subdivision. Next, we applied a maximum likelihood approach that maximizes the probability that the observed discrete states evolved under a stochastic Markov model of evolution (Lewis, 2001), as implemented in Mesquite 1.1 (build h61) (Maddison and Maddison, 2005a,b). For each internal node, the state that maximizes the probability of observing the states at the terminal nodes is found, while allowing states at all other nodes to vary independently. We reconstructed ancestral area states based on the consensus Bayesian tree topology assuming branch lengths to be scaled to the mean posterior divergence times as estimated by the Bayesian dating approach described above using the longer sampling parameters.

3. Results

3.1. Sequence variability

We discarded the possibility of having amplified nuclear copies of mitochondrial genes in the present study because we obtained a single PCR product of the expected size, no frame-shift mutations or premature stop codons were present in protein-coding genes, and predictions of secondary structure for ribosomal genes were similar to published models (not shown). This is consistent with findings that the size of the amplified fragments we obtained is larger than the size of the few mitochondrial pseudogenes present in the nuclear genome of most sequenced vertebrates, including the chicken (Pereira and Baker, 2004a). The nuclear RAG-1 gene amplifications were clean, only a single product was seen in agarose gels and no frame-shift mutations were seen in the predicted protein sequences. Mean sequence divergence for all genes combined was 0.076 ± 0.018 (range = 0.01–0.13), for RAG-1 = 0.014 ± 0.006 (range = 0.0–0.2), 12S rDNA = 0.06 ± 0.02 (range = 0.0–0.11), 16S rDNA = 0.10 ± 0.02 (range = 0.01–0.13), COI = 0.11 ± 0.02 (range = 0.01–0.14), cyt b = 0.10 ± 0.02 (range = 0.0–0.13), ND2 = 0.13 ± 0.02 (range = 0.01–0.16). The average base composition was A = 0.31, C = 0.20, G = 0.24 and T = 0.25 for RAG-1 and A = 0.30, C = 0.30, G = 0.15 and T = 0.25 for the combined mitochondrial DNA. The values of sequence divergence and base composition above are similar to those reported for other Charadriiformes (Paton and Baker, 2006; Paton et al., 2003; Pereira and Baker, 2005). Considering variable sites for all sequences combined, no taxon failed the test of base composition. The GTR+I+Γ model of DNA substitution was the best fitting model for all gene partitions and for all genes combined. Akaike weights were >0.98 for each partition, indicating little uncertainty in model selection. The only exception was for the 12S rDNA gene, in that GTR+I+Γ had the highest support with Akaike weight = 0.62, followed by HKY+I+Γ with Akaike weight = 0.37, totaling a cumulative Akaike weight = 0.99. Because HKY+I+Γ is a special case of GTR+I+Γ, we used GTR+I+Γ for the 12S rDNA partition.

3.2. Tree inference

The standard deviation of the split frequencies between the two independent Markov chains was lower than 0.001 after 2 million cycles for both the partition by gene and the partition by codon position/non-coding. The Markov chain was sampled until the 5,951,000th cycle for the gene partition. The 50% majority-rule consensus Bayesian tree obtained from the analysis assuming the nucleotide model (Fig. 2) was based on 10,902 trees sampled after discarding the first 500 sampled trees in both independent runs. The Alcidae was recovered as a monophyletic group with posterior probability (PP) = 1.0 and most internal nodes received PP > 0.95. Two major clades can be recognized within the Alcidae. One of these clades contains the puffins (Fratercula) and auklets (Cerorhinca, Aethia and Ptychoramphus) distributed mainly in the Pacific Ocean, and the other includes the murres (Uria), murrelets (Brachyramphus and Synthliboramphus), guillemots (Cepphus), dovekie (Alle alle), razorbill (Alca torda) and the extinct Great Auk (Pin-

Fig. 2. Bayesian tree inference based on the nucleotide partition model. The inset at the top shows the alternative relationships obtained with the codon partition model. Outgroups are not shown. Numbers at nodes are posterior probabilities for the nucleotide/codon partition, indicated as à when 1.0. Scale at bottom left is expected number of substitutions per site. guinus impennis), found in all three northern Oceans. Only Using maximum parsimony (MP) two equally most par- the phylogenetic relationships among Aethia cristatella, simonious trees of 5976 steps were obtained (Fig. 4), differ- Aethia pygmae and Aethia psittacula could not be resolved. ing in the placement of the murrelets of the genus The Markov chain for the analysis derived from the codon Synthliboramphus and the interrelationships of the model was sampled for 6,020,000 cycles. The 50% majority- (Alca, Pinguinus) with Alle and Uria. In one of the most rule consensus Bayesian tree (inset in Fig. 2) was based on parsimonious trees (Fig. 4, left), Synthliboramphus was 11,042 trees sampled, discarding the first 500 samples from placed as sister to Alca, Pinguinus, Alle and Uria, similar each of two independent runs. The recovered topology was to the position recovered in the ML and both BA analyses. identical to the one obtained from the nucleotide model, The second tree (Fig. 4, right) placed Synthliboramphus except that Alle and (Alca, Pinguinus) were sister clades, more basally in the clade of the guillemots, murres and excluding Uria, with PP = 0.51, compared to the allies. Additionally, Uria was placed as a sister genus to PP = 0.71 for the placement of Alle and Uria as sister gen- Alle, excluding the clade (Alca, Pinguinus), as seen in the era in the analysis based on the nucleotide model. Bayesian topology derived from the nucleotide model. Maximum likelihood analyses in PAUP and GARLI However, MP bootstrapping analysis did not provide resulted in similar tree topologies (Fig. 
3), except that both strong support for the placement of Synthliboramphus methods differed in how species in the genus Aethia are basally in the clade (BP = 51%), or for the remaining gen- related to each other. Moreover, neither PAUPà nor eric relationships within the Alcidae (BP < 50%) and the GARLI bootstrap proportions (BP) supported the internal monophyly of the Alcidae (BP = 58%). relationships within Aethia or the relationships among The approximately unbiased (AU) test indicated that Alle, Uria and the (Alca, Pinguinus) clade. Despite these only the alternative trees obtained in this study were differences, the remaining relationships within the family included in the 95% confidence set of the likely representa- were highly supported by BP and similar to the Bayesian tion of the phylogenetic relationships of the Alcidae. The topology recovered in the analysis with the nucleotide AU test statistic for the nucleotide model derived BA model. topology = 0.16; codon model derived BA = 0.70; 438 S.L. Pereira, A.J. Baker / Molecular Phylogenetics and Evolution 46 (2008) 430–445

Fig. 3. Maximum likelihood tree topology obtained in GARLI. The inset depicts an alternative arrangement for Aethia found in PAUP*. Outgroups are not shown. Numbers at nodes are PAUP*/GARLI bootstrap proportions from 100 replicates with random sequence addition, indicated as * when 100. Scale at bottom left is expected number of substitutions per site.

Fig. 4. Two equally most parsimonious topologies with 5976 steps found using the heuristic tree search algorithm. Outgroups are not shown. Numbers at nodes in one of the trees are bootstrap proportions based on 100 replicates with 100 cycles of random sequence addition, indicated as * when 100. Scale at bottom left of each tree is expected number of substitutions.
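As a side note on the model selection reported in Section 3.1, Akaike weights can be computed from AIC scores as in the minimal Python sketch below. The AIC values and the third model name are hypothetical and were chosen only so that the resulting weights resemble those reported for the 12S rDNA partition (0.62 and 0.37); they are not taken from the study, and the function name `akaike_weights` is an illustrative choice, not part of any program cited here.

```python
import math

def akaike_weights(aic_scores):
    """Return Akaike weights for a dict mapping model name -> AIC score."""
    best = min(aic_scores.values())
    # Relative likelihood of each model given its AIC difference to the best model.
    rel = {model: math.exp(-0.5 * (aic - best)) for model, aic in aic_scores.items()}
    total = sum(rel.values())
    return {model: value / total for model, value in rel.items()}

# Hypothetical AIC scores (assumption, for illustration only).
aic = {"GTR+I+G": 10000.00, "HKY+I+G": 10001.03, "other model": 10008.25}
weights = akaike_weights(aic)

for model, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {w:.2f}")
```

With these assumed scores the two best models together carry a cumulative weight of about 0.99, which is the situation described for the 12S rDNA gene.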

ML = 0.70; MP with Synthliboramphus more derived = 0.70 and MP with Synthliboramphus placed more basally = 0.12, and each tree published previously <0.04. Among the trees with the highest test statistics in the AU test, we conclude that the topology recovered in the BA analysis with the codon model (Fig. 2, inset), which is identical to one of the MP topologies (Fig. 4, left), is the best representation of the phylogenetic relationships of the Alcidae. Additionally, we consider that the relationships among Aethia cristatella, Aethia psittacula and Aethia pygmaea are still uncertain and represent them as a trichotomy in the model tree used for the molecular dating analysis and reconstruction of ancestral areas.

Table 2. Prior and posterior distributions of Bayesian estimates of divergence times (95% credible interval, CrI). Each cell gives the mean (95% CrI).

| Node | Multidivtime prior | Multidivtime (longer run) | Multidivtime (shorter run) | Multidivtime RAG-1 | Multidivtime mtDNA | mcmctree |
|---|---|---|---|---|---|---|
| 1 | 115.6 (95.2, 131.8) | 101.5 (86.9, 116.7) | 101.4 (87.3, 116.6) | 105.2 (88.0, 123.2) | 106.7 (89.2, 124.7) | 95.2 (85.1, 109.2) |
| 2 | 106.5 (80.3, 127.2) | 96.3 (81.9, 111.5) | 96.2 (82.4, 111.2) | 102.3 (84.8, 120.4) | 95.6 (79.0, 113.8) | 84.2 (74.2, 97.27) |
| 3 | 96.9 (68.1, 121.6) | 87.1 (72.7, 102.5) | 87.1 (73.1, 102.4) | 92.2 (72.6, 111.9) | 87.7 (70.1, 106.8) | 75.2 (64.3, 88.2) |
| 4 | 106.1 (78.4, 127.4) | 86.9 (72.1, 102.6) | 86.8 (72.1, 102.9) | 86.4 (66.9, 106.5) | 97.5 (78.2, 117.8) | 80.3 (70.1, 92.9) |
| 5 | 96.5 (65.3, 121.8) | 65.4 (52.3, 79.8) | 65.4 (52.6, 80.0) | 63.5 (43.6, 84.1) | 77.1 (60.4, 96.2) | 62.0 (53.4, 72.6) |
| 6 | 86.5 (53.2, 114.9) | 61.5 (48.5, 75.8) | 61.5 (48.8, 75.8) | 57.3 (38.1, 77.6) | 73.3 (56.5, 92.1) | 55.8 (47.9, 65.4) |
| 7 | 76.8 (43.3, 108.0) | 54.9 (43.1, 68.6) | 54.9 (42.9, 68.5) | 51.8 (33.2, 71.5) | 65.2 (49.6, 83.9) | 50.4 (43.2, 59.1) |
| 8 | 66.6 (35.0, 100.1) | 47.5 (36.7, 60.0) | 47.6 (36.6, 60.0) | 46.3 (29.3, 65.2) | 56.4 (42.3, 73.2) | 44.0 (37.50, 51.8) |
| 9 | 56.0 (26.4, 90.6) | 44.3 (34.0, 56.1) | 44.4 (34.0, 56.4) | 42.8 (26.4, 61.5) | 52.4 (39.2, 68.8) | 41.6 (35.4, 48.8) |
| 10 | 45.4 (18.7, 79.4) | 39.0 (29.2, 50.5) | 39.1 (29.2, 50.5) | 40.3 (24.3, 58.8) | 42.5 (30.5, 57.8) | 38.7 (32.9, 45.5) |
| 11 | 34.6 (13.3, 67.2) | 31.0 (22.7, 41.0) | 31.0 (22.7, 40.9) | 28.4 (15.5, 45.5) | 36.0 (25.2, 49.6) | 31.7 (26.70, 37.8) |
| 12 | 23.1 (4.1, 53.8) | 29.4 (21.3, 39.1) | 29.3 (21.2, 39.1) | 25.8 (13.6, 42.2) | 33.8 (23.6, 46.9) | 28.4 (23.9, 34.1) |
| 13 | 11.6 (0.4, 37.2) | 24.2 (16.1, 33.9) | 24.2 (16.1, 33.9) | n/a | 27.9 (17.9, 41.0) | 18.7 (10.8, 26.3) |
| 14 | 11.7 (0.3, 37.8) | 0.6 (0.0, 1.7) | 0.6 (0.0, 1.7) | 10.3 (2.9, 22.0) | 0.6 (0.0, 1.8) | 3.1 (1.6, 5.5) |
| 15 | 23.0 (4.4, 53.2) | 14.7 (9.5, 21.2) | 14.7 (9.5, 21.4) | 20.9 (9.3, 37.6) | 15.4 (9.3, 23.8) | 16.0 (11.5, 21.5) |
| 16 | 11.4 (0.4, 36.5) | 3.4 (1.7, 5.9) | 3.4 (1.7, 5.9) | 7.9 (0.9, 19.4) | 3.5 (1.6, 6.5) | 3.6 (2.0, 5.9) |
| 17 | 30.3 (6.1, 65.4) | 15.2 (10.1, 21.9) | 15.3 (10.1, 22.1) | 20.7 (9.3, 37.4) | 16.5 (10.3, 24.7) | 20.2 (15.3, 25.8) |
| 18 | 15.1 (0.6, 45.7) | 8.8 (5.3, 13.6) | 8.9 (5.3, 13.6) | 15.5 (5.7, 30.9) | 8.9 (5.0, 14.9) | 11.1 (7.7, 15.0) |
| 19 | 15.3 (0.6, 47.4) | 2.5 (0.4, 5.6) | 2.5 (0.4, 5.6) | n/a | 2.7 (0.5, 6.2) | 4.1 (1.5, 8.5) |
| 20 | 37.6 (7.9, 76.1) | 20.2 (12.0, 31.4) | 20.4 (12.0, 32.1) | n/a | 24.6 (13.4, 45.2) | 23.4 (15.6, 31.3) |
| 21 | 18.8 (0.6, 56.1) | 4.9 (1.8, 11.8) | 5.1 (1.8, 12.7) | 10.1 (1.1, 25.1) | 6.9 (1.6, 27.5) | 9.3 (5.2, 15.1) |
| 22 | 44.8 (9.5, 85.4) | 40.3 (30.2, 52.7) | 40.4 (30.0, 52.9) | 39.7 (23.7, 59.2) | 47.8 (34.1, 64.6) | 34.5 (28.8, 41.3) |
| 23 | 22.6 (0.9, 64.1) | 26.4 (18.3, 36.4) | 26.5 (18.2, 36.5) | 33.5 (18.1, 51.7) | 28.1 (17.7, 41.8) | 23.0 (16.7, 29.22) |
| 24 | 62.1 (28.0, 97.5) | 49.4 (38.0, 62.7) | 49.5 (37.7, 62.7) | 47.4 (29.9, 67.7) | 57.8 (42.1, 76.4) | 45.8 (38.6, 54.0) |
| 25 | 46.5 (14.6, 84.9) | 20.9 (14.5, 29.0) | 20.9 (14.5, 29.2) | 20.6 (9.1, 37.9) | 25.8 (16.8, 37.6) | 23.0 (18.0, 28.7) |
| 26 | 30.7 (5.0, 68.7) | 16.2 (10.8, 23.1) | 16.2 (10.9, 23.1) | 14.2 (5.2, 29.0) | 20.6 (13.1, 30.3) | 16.0 (12.4, 20.0) |
| 27 | 15.1 (0.5, 49.7) | 15.0 (10.0, 21.5) | 15.0 (10.0, 21.3) | 9.8 (2.4, 22.1) | 19.1 (12.0, 28.0) | 14.1 (10.9, 17.8) |
| 28 | 46.8 (16.3, 84.1) | 29.0 (19.2, 42.0) | 29.1 (19.2, 41.8) | 31.4 (16.1, 52.1) | 34.4 (20.5, 56.5) | 26.9 (20.7, 35.0) |
| 29 | 31.4 (5.7, 69.1) | 8.5 (3.1, 17.9) | 8.7 (3.1, 17.9) | 18.3 (7.2, 35.7) | 8.0 (2.7, 22.0) | 13.7 (10.40, 18.2) |
| 30 | 15.7 (0.5, 49.6) | 4.3 (1.1, 10.7) | 4.5 (1.1, 10.8) | 8.8 (1.0, 22.2) | 4.0 (0.9, 12.9) | 9.1 (6.4, 12.7) |

Estimates are given in million years. Only one of the two shorter posterior runs for the combined data set is shown because estimates were virtually identical. Nodes are numbered as in Fig. 5. RAG-1 sequences were not available for three comparisons (n/a).

3.3. Divergence time

The estimates of divergence times and 95% credible intervals (95% CrI) are given in Table 2 and illustrated in Fig. 5. The size of the 95% CrI of the prior distribution for node ages is considerably larger than the size of the 95% CrI of the posterior distribution estimated in multidivtime, which is expected because the estimates of the prior distribution ignore the information contained in the sequence data. The longer and the two shorter sampling schemes used in multidivtime resulted in very similar posterior estimates (only one of the shorter runs is shown in Table 2). According to the 95% CrI of the Bayesian posterior estimates of divergence times, Charadriiformes last shared a common ancestor with the lineage leading to Sphenisciformes, and Procellariiformes around 101 Mya (95% CrI 87, 116 Mya). These estimates are consistent with estimates derived from mitogenomic data that place the origin of many Avian Orders in the Era (Pereira and Baker, 2006a). Alcidae likely originated in the Paleocene, with 95% CrI extending between the to Early Eocene.

The posterior distributions of the analyses with and without the deep time constraint at the root were very similar, as expected (Wiegmann et al., 2003; Yang and Yoder, 2003). Hence, we are confident that if dates have been over- or under-estimated in this study, the use of maximum constraints is not the cause. Moreover, it is unlikely that the upper constraints tested would cause equal biases across the tree.

On average, mtDNA ages within alcids are 2.4 Myr older compared to RAG-1 dates (Table 2). If the three oldest nodes within alcids are not considered (nodes 7, 8 and 24), mtDNA provided estimates only 0.6 Myr older compared to RAG-1. In addition, the 95% credible intervals largely overlap across different dating analyses. Hence, we found no strong evidence that mtDNA will produce considerably older dates, and it seems that the combination of sequences from both genomes is the best approach available so far because the Bayesian method implemented in multidivtime accounts for differences in the evolutionary process of each gene partition, and reduces the stochastic error associated with single gene estimates (Pereira and Baker, 2006b).

The 95% CrI of the posterior distribution of divergence times obtained in mcmctree using soft bounds was similar to that obtained in multidivtime using hard bounds, indicating no significant conflict among time constraints (Table 2).

Fig. 5. Divergence times for the Alcidae obtained in the longer analysis as described in the text. Nodes are numbered as in Table 2. The geological timescale is indicated at the bottom. The bar at 65 Myr marks the boundary between the Mesozoic and Cenozoic, which is characterized by an event of global mass extinction.

3.4. Reconstruction of ancestral areas

We mapped the current distribution of auks on the best topology and estimated the ancestral areas where the group likely originated (Fig. 6). Whether we scored the northernmost or southernmost breeding range for species occurring in more than one of the climate zones we defined did not change our conclusions, except for the origin of Synthliboramphus. Both scoring schemes favored the Pacific Low Arctic as the region in which the ancestors of modern auks occurred, with proportional marginal likelihood (PML) = 0.97 and 0.93, respectively. These values were significantly higher than those for the other alternative areas. The Atlantic was invaded once, between 30 and 40 Mya, by ancestors of Alca, Pinguinus, Alle and Uria (PML = 0.68 and 0.93 for the northernmost and southernmost scoring schemes, respectively), and one re-colonization of the Pacific by Uria can be inferred from the reconstruction of ancestral areas. The colonization of the High Arctic region seems to have occurred multiple times: via the Pacific by Cepphus grylle and Fratercula arctica, and via the Atlantic by Uria lomvia lomvia and Alle alle.

Considering a northernmost ancestral breeding area for Synthliboramphus significantly favored an origin for the evolution of this genus either within the Pacific Boreal region (PML = 0.56) or the Pacific Low Arctic (PML = 0.34). On the other hand, the Pacific Subtropical (PML = 0.76) and the Pacific Low Arctic (PML = 0.19) received significantly more support when a southernmost ancestral area for Synthliboramphus was considered. However, because the Pacific Subtropical and the Pacific Low Arctic are reciprocally discontinuous areas, it is unlikely that the southernmost scoring scheme is biogeographically plausible for this genus. Hence, the colonization of southernmost areas such as the Pacific Boreal and the Pacific Subtropical seems to have occurred no earlier than 15 Mya. Reconstructions of ancestral areas using a scheme that does not subdivide the Oceans into smaller breeding areas resulted in similar conclusions to those obtained from partitions by climate zones (not shown). The Pacific was the favored ancestral area with PML = 0.98 and the same number of invasions of the Atlantic and Arctic and one re-invasion of the Pacific were observed.

Fig. 6. Reconstruction of ancestral areas using current breeding areas defined by oceanographic climate zone (Gaston and Jones, 1998). Because some species breed in more than one climate zone, two scoring schemes were used, assuming the northernmost (to the left) or the southernmost (to the right) breeding areas. Pie charts at nodes represent the proportion of the marginal likelihood given to each state. Colored circles at tips are the current breeding areas and were scored as: white, High Arctic; green, Atlantic Low Arctic; blue, Pacific Low Arctic; red, Pacific Boreal; and black, Pacific Subtropical.

4. Discussion

4.1. Phylogenetic relationships

The phylogenetic relationships among the Alcidae were inferred here using multiple methods of phylogenetic inference and over 7.4 kb of nuclear and mitochondrial DNA sequences, the largest molecular data set collected to date for the group. Among congeneric species, most relationships received strong nodal support. The phylogenetic relationships within Aethia could not be resolved in previous studies (Friesen et al., 1996a; Moum et al., 2002; Thomas et al., 2004) or with the larger data set collected in the present study. Compared to other auks, auklets in the genus Aethia have the largest overlap in breeding sites and virtually no breeding area is exclusive to a single Aethia species as in other congeneric species (Gaston and Jones, 1998). However, hybridization among Aethia species in the wild has not been reported and, hence, retention of ancestral DNA polymorphism and/or incomplete lineage sorting of mitochondrial haplotypes remains a plausible explanation for the obscured phylogenetic relationships (Walsh and Friesen, 2003; Walsh et al., 2005). An extensive molecular study at the population level within the group is still needed to evaluate this issue.

At the genus level, three alternative topologies were recovered by the methods of phylogenetic inference we applied. They differ in which genus is the sister to Alle and in the placement of the genus Synthliboramphus. Additionally, the intergeneric relationships inferred in this study differ from those previously published based on molecular and anatomical data (Friesen et al., 1996a; Strauch, 1985; Thomas et al., 2004; Watada et al., 1987). A statistical test to evaluate the significance of the topological differences at the genus level excluded all the previously published trees from the 95% confidence set of possible topologies. Among the trees included in the 95% confidence set, the codon model BA topology, and an identical one recovered by the MP analysis, received the highest P value in the AU test (P = 0.70). Hence, this topology was considered the best representation of the phylogenetic relationships among alcids, and shows that auks can be divided into two major sister clades that differ in feeding strategies (Fig. 2, inset), in agreement with phylogenetic analyses using skeletal, integument and natural history characters (Strauch, 1985), protein electrophoresis (Friesen et al., 1996a; Moum et al., 1994) and mitochondrial DNA sequences (Friesen et al., 1996a; Moum et al., 2002). Nonetheless, the nature of the characters sampled and the insufficient amount of data from previous analyses could not provide a strong, well-resolved phylogenetic hypothesis at the genus level, especially within the fish-eating clade (Fig. 1). By increasing character sampling, we were able to solve this issue.

One of the major alcid clades includes primarily planktivorous species with deep laterally compressed bills and colorful ornaments, such as the puffins (Fratercula) and auklets (Cerorhinca, Aethia and Ptychoramphus), mainly from the Pacific Ocean, and one species found in the Arctic and Northern Atlantic (Fratercula arctica). The other major alcid clade contains species that are mostly fish-eating birds with sharp pointed bills and that lack colorful ornaments, such as the murres (Uria), murrelets (Brachyramphus and Synthliboramphus), guillemots (Cepphus), Dovekie (Alle alle), Razorbill (Alca torda) and the extinct Great Auk (Pinguinus impennis), with a broader distribution across the coast of the Arctic, Northern Atlantic and Northern Pacific Oceans.

An alternative method of phylogenetic inference known as matrix representation with parsimony, or simply supertrees, has been applied to Charadriiformes, including all species of extant auks and the extinct Pinguinus impennis. The supertree approach uses a matrix of presence/absence of clades in source trees derived from any type of data as characters, and does not consider the raw data used to establish the source trees. The supertree method is the only one so far to have failed to recover the planktivorous clade as monophyletic (Thomas et al., 2004) by placing the clade (Cerorhinca, Fratercula) as a sister lineage to (Aethia, Ptychoramphus) plus the balance of the remaining fish-eating genera (Fig. 1). The phylogenetic signal for the reciprocal monophyly of the planktivorous and fish-eating clades seen in source trees for alcids (references 43–48 in Thomas et al., 2004) is probably being overridden by other source trees where few alcids were sampled in conjunction with other non-alcids. Indeed, the recovered phylogenetic relationships from a supertree can be strongly biased by taxon sampling and the number of source trees derived from the same or overlapping data sets. For example, a supertree generated for the big clade of Charadriiformes based on more molecular than non-molecular source trees is more consistent with the molecular source trees (Thomas et al., 2004). Additionally, it is not surprising that the phylogenetic relationships recovered for congeneric alcids are similar to those of the two molecular source trees that included all extant species (Friesen et al., 1996a; Moum et al., 1994). Therefore, the supertree approach does not seem to be a suitable approach to phylogenetic inference, especially if a strong phylogenetic framework is required for testing evolutionary hypotheses.

4.2. The origin of Alcidae and chronology of their radiation

To better understand the biogeographic history of auks, we estimated divergence times and reconstructed the ancestral areas using Bayesian approaches. Our estimates indicated that the ancestor of the Laridae (Rissa) followed by the ancestor of Stercorariidae (Catharacta) split from the lineage leading to the Alcidae around 65 Mya (52, 80 Mya) and 61 Mya (48, 76 Mya), respectively, when global climate was warmer and the Equator-to-pole gradient in sea-surface temperature was narrower than today (Jenkyns et al., 2004). Hence, the inferred origin of modern auks in the Pacific Low Arctic during the Early Paleocene estimated here (PML = 0.97; Fig. 6) and suggested by Bédard (1985) implies an origin of the Alcidae in a temperate or subtropical region, followed by a gradual adaptation to colder conditions as the world cooled and the cold, salinity-driven deep-sea currents formed (Bédard, 1985).

Additionally, the timing for the origin of the Alcidae corresponds to the transition between the Mesozoic and Cenozoic Eras 65.5 Mya, which is marked by a bolide impact on the Yucatan Peninsula in eastern Mexico (Alvarez et al., 1980), leading to the demise of much of the biodiversity of the planet at that time. Thus, the inferred origin of modern auks in the Pacific Low Arctic is consistent with this scenario because their ancestors would have had a lower chance of survival had they been in areas closer to the site of the bolide impact, such as the eastern side of the Pacific Boreal or the Pacific Subtropical areas.

The gradual radiation of auks starting around 55 Mya (43, 68 Mya) is in conflict with previous suggestions of a rapid adaptive radiation in the Mid-Late Oligocene, the lack of phylogenetic resolution at the genus level according to the fossil record and Cenozoic climate data (Bédard, 1985) and the rapid radiation shown in previous molecular phylogenies (Friesen et al., 1996a; Moum et al., 1994). The conflict is here clarified by assuming that the alcid fossil record is incomplete (Bédard, 1985) and that insufficient DNA data were gathered previously to establish the phylogenetic relationships within auks. A gradual radiation starting in the Eocene is also characteristic of other birds (Baker et al., 2006; Moyle, 2005; Nahum et al., 2003; Pereira et al., 2002; Tavares et al., 2006) and non-Avian vertebrates

(Arnason et al., 2006; Cassens et al., 2000; Janis, 2003), and may represent a post-bolide recovery phase and reshaping of the planet's biodiversity associated with major tectonic events, significant global cooling, and change in oceanic circulation at the Mid-Late Eocene that led to planetary conditions similar to those seen on Earth in recent geological times (Janis, 2003; Thomas, 2004).

4.3. A temporal framework of the historical biogeography of modern auks

Two alternative routes of dispersal of auks from the Pacific into the Arctic and Atlantic Oceans have been proposed: (1) a northern route through the Arctic coastal zones of Eurasia and North America to the Atlantic (Bédard, 1985) or (2) a southern route via the Central American Seaway, connecting the Pacific and the Atlantic through the coastal regions from California to Florida for most of the Cenozoic (Konyukhov, 2002). The reconstruction of ancestral areas mapped onto the chronogram of our best phylogenetic hypothesis for the Alcidae (Fig. 6) suggests that dispersal played a large role in the speciation process within the family and that both routes are plausible, but at different times and for different genera.

The early radiation of the clade containing primarily planktivorous puffins (Fratercula) and auklets (Cerorhinca, Aethia and Ptychoramphus), the fish-eating murrelets of the genera Brachyramphus and Synthliboramphus, and guillemots of the genus Cepphus, occurred during the Eocene–Early Miocene in the Pacific.

The most recent common ancestor of Alca, Alle, Pinguinus and Uria seems to have invaded the Atlantic Ocean in the Eocene/Oligocene between 40 and 30 Mya (Fig. 6), and their radiation started in the Early Oligocene (Fig. 5; Table 2). The fossil record indicates that the ancestors of auks were already well adapted to marine environments at this time. It is unlikely that they could have dispersed via the Arctic coasts of Eurasia and North America (Bédard, 1985) because extensive landmass still separated the Arctic and Pacific oceans during this time (Golonka et al., 2003). Additionally, survival in the Arctic Ocean would have been rather difficult because the Arctic Ocean had large episodic inputs of freshwater and reduced salinity in the Eocene (Brinkhuis et al., 2006). Hence, the most probable route taken by the common ancestor of Alca, Alle, Pinguinus and Uria from the Pacific to the Atlantic would have been through the Central American Seaway via the southern coasts of North America (Fyler et al., 2005; Konyukhov, 2002). The regions around coastal California and Florida had high biomass productivity due to upwelling zones during the Miocene (Konyukhov, 2002) until the closure of the Isthmus of Panama (Emslie and Morgan, 1994), likely providing a good food supply on which the ancestors of the Atlantic alcids could rely. This scenario is consistent with the presence of modern alcids and the extinct Australca in the Late Miocene fossil record of Florida (Konyukhov, 2002).

The invasion of the Arctic and the Northern Atlantic Oceans by Fratercula arctica and Cepphus grylle, and the invasion of the Pacific by Uria aalge inornata via the Arctic, coincide with intermittent openings of the Bering Strait during the Pliocene and Pleistocene, favoring exchange of biota between both oceans (Marincovich and Gladenkov, 1999). The hypothesis for the dispersal of Uria is not novel (Gaston and Jones, 1998), and it was originally assumed that Uria would have disappeared from the Atlantic after colonizing the Pacific because of the absence of this genus in Atlantic deposits until the Pliocene.

The Atlantic origin for Uria has been discarded previously on grounds that an Atlantic fossil has been wrongly assigned to this genus, and by the discovery of a Uria fossil from the Miocene of California (Bédard, 1985). This disagrees with our reconstructions of ancestral areas (Fig. 6). Using the northernmost breeding areas, the Atlantic (PML = 0.37), Arctic (PML = 0.27) and Pacific (PML = 0.36) cannot be discarded as possible ancestral areas. However, assuming the southernmost coding of ancestral breeding areas, the Atlantic is the preferred center of origin for Uria (PML = 0.99) over the other two oceans. The alcid fossil record is incomplete (Bédard, 1985) and the "disappearance" of the common ancestor of Uria species from the Atlantic may be more apparent than real.

In conclusion, the southern route via the Central American Seaway probably was of greater importance in the early radiation of alcids compared to the northern route, which served as a more recent two-way route for exchanges between the Atlantic and Pacific biota. Furthermore, alcids did not use the American southern route to extend their distribution further through the coastal areas of South America, nor did auks use the southern coastal areas of eastern to reach localities closer to or below the Equator.

Acknowledgments

This work was supported by an operating grant to A.J.B. from the Natural Sciences and Engineering Research Council of Canada, the Royal Ontario Museum Foundation, and the National Science Foundation (Assembling the Tree of Life (AToL) Program, EF-0228693). We are grateful to the Louisiana State University Museum of Natural Science for granting a tissue sample for Synthliboramphus craveri.

References

Alvarez, L.W., Alvarez, W., Asaro, F., Michel, H., 1980. Extraterrestrial cause for the Cretaceous–Tertiary extinction. Science 208, 1095–1108.
American Ornithologists' Union, 1997. Forty-first supplement to the American Ornithologists' Union check-list of North American birds. The Auk 114, 542–552.
Arnason, U., Gullberg, A., Janke, A., Kullberg, M., Lehman, N., Petrov, E.A., Vainola, R., 2006. Pinniped phylogeny and a new hypothesis for their origin and dispersal. Mol. Phylogenet. Evol. 41, 345–354.

Baker, A.J., Pereira, S.L., Haddrath, O.P., Edge, K.-A., 2006. Multiple gene evidence for expansion of extant penguins out of Antarctica due to global cooling. Proceedings of the Royal Society of London B 273, 11–17.

Baker, A.J., Pereira, S.L., Paton, T.A., 2007. Phylogenetic relationships and divergence times of Charadriiformes genera: multigene evidence for the Cretaceous origin of at least 14 clades of shorebirds. Biol. Lett. 3, 205–209.

Bédard, J., 1985. Evolution and characteristics of the Atlantic Alcidae. In: Nettleship, D.N., Birkhead, T.R. (Eds.), The Atlantic Alcidae: The Evolution, Distribution and Biology of the Auks Inhabiting the Atlantic Ocean and adjacent Areas. Academic Press, Inc., Orlando, pp. 1–51.

Brinkhuis, H., Schouten, S., Collinson, M.E., Sluijs, A., Sinninghe Damsté, J.S., Dickens, G.R., Huber, M., Cronin, T.M., Onodera, J., Takahashi, K., Bujak, J.P., Stein, R., van der Burgh, J., Eldrett, J.S., Harding, I.C., Lotter, A.F., Sangiorgi, F., van Konijnenburg-van Cittert, H., de Leeuw, J.W., Matthiessen, J., Backman, J., Moran, K., 2006. Episodic fresh surface waters in the Eocene Arctic Ocean. Nature 441, 606–609.

Brodkorb, P., 1964. Catalogue of fossil birds, part 2 (Anseriformes through Galliformes). Bull. Florida State Mus. Biol. Sci. 8, 195–335.

Cassens, I., Vicario, S., Waddell, V.G., Balchowsky, H., Van Belle, D., Ding, W., Fan, C., Mohan, R.S., Simoes-Lopes, P.C., Bastida, R., Meyer, A., Stanhope, M.J., Milinkovitch, M.C., 2000. Independent adaptation to riverine habitats allowed survival of ancient cetacean lineages. Proc. Natl. Acad. Sci. USA 97, 11343–11347.

Chandler, R.M., 1990. Fossil birds of the San Diego Formation, Late Pliocene, Blancan, San Diego County, California. Ornithological Monographs 44, 73–161.

Clarke, J.A., Tambussi, C.P., Noriega, J.I., Erickson, G.M., Ketcham, R.A., 2005. Definitive fossil evidence for the extant avian radiation in the Cretaceous. Nature 433, 305–308.

Emslie, S.D., Morgan, G.S., 1994. A catastrophic death assemblage and paleoclimatic implications of Pliocene seabirds of Florida. Science 264, 684–685.

Friesen, V.L., Baker, A.J., Piatt, J.F., 1996a. Phylogenetic relationships within the Alcidae (Charadriiformes: Aves) inferred from total molecular evidence. Mol. Biol. Evol. 13, 359–367.

Friesen, V.L., Piatt, J.F., Baker, A.J., 1996b. Evidence from cytochrome b sequences and allozymes for a “new” species of alcid: the long-billed murrelet (Brachyramphus perdix). Condor 98, 681–690.

Fyler, C.A., Reeder, T.W., Berta, A., Antonelis, G., Aguilar, A., Androukaki, E., 2005. Historical biogeography and phylogeny of monachine seals (Pinnipedia: Phocidae) based on mitochondrial and nuclear DNA data. J. Biogeogr. 32, 1267–1279.

Gaston, A.J., Jones, I.L., 1998. The Auks. Alcidae. Oxford University Press, New York.

Golonka, J., Bocharova, N.Y., Ford, D., Edrich, M.E., Bednarczyk, J., Wildharber, J., 2003. Paleogeographic reconstructions and basins development of the Arctic. Mar. Petrol. Geol. 20, 211–248.

Groth, J.G., Barrowclough, G.F., 1999. Basal divergences in birds and the phylogenetic utility of the nuclear RAG-1 gene. Mol. Phylogenet. Evol. 12, 115–123.

Haddrath, O., Baker, A.J., 2001. Complete mitochondrial DNA genome sequences of extinct birds: ratite phylogenetics and the vicariance biogeography hypothesis. Proc. Biol. Sci. 268, 939–945.

Hasegawa, M., Kishino, H., Yano, T., 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174.

Howard, H., 1968. Tertiary birds from Laguna Hills, Orange County, California. Los Angeles County Museum Contributions to Science 142, 1–21.

Howard, H., 1972. Type specimens of avian fossils in the collections of the Natural History Museum of Los Angeles County. Natural History Museum of Los Angeles County Contributions in Science 228, 1–27.

Hugall, A.F., Foster, R., Lee, M.S., 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst. Biol. 56, 543–563.

Janis, C., 2003. Tectonics, climatic change, and the evolution of mammalian ecosystems. In: Rothschild, L.J., Lister, A.M. (Eds.), Evolution on Planet Earth. The impact of the Physical Environment. Academic Press, London, pp. 319–338.

Jenkyns, H.C., Forster, A., Schouten, S., Sinninghe Damsté, J.S., 2004. High temperatures in the Late Cretaceous Arctic Ocean. Nature 432, 888–892.

Kishino, H., Thorne, J.L., Bruno, W.J., 2001. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol. 18, 352–361.

Konyukhov, N.B., 2002. Possible way of spreading and evolution of alcids. Biology Bulletin 29, 447–454.

Ksepka, D.T., Bertelli, S., Giannini, N.P., 2006. The phylogeny of the living and fossil Sphenisciformes (penguins). Cladistics 22, 412–441.

Lewis, P.O., 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol. 50, 913–925.

Maddison, D.R., Maddison, W.P., 2000. MacClade 4.0. Sinauer Associates, Inc., Sunderland.

Maddison, W.P., Maddison, D.R., 2005a. Mesquite: A Modular System for Evolutionary Analysis. Available from: .

Maddison, W.P., Maddison, D.R., 2005b. Stochar: A Package of Mesquite Modules for Stochastic Models of Character Evolution. Available from: .

Marincovich Jr., L., Gladenkov, A.Y., 1999. Evidence for an early opening of the Bering Strait. Nature 397, 149–151.

Moum, T., Arnason, U., Arnason, E., 2002. Mitochondrial DNA sequence evolution and phylogeny of the Atlantic Alcidae, including the extinct Great Auk (Pinguinus impennis). Mol. Biol. Evol. 19, 1434–1439.

Moum, T., Johansen, S., Erikstad, K.E., Piatt, J.F., 1994. Phylogeny and evolution of the auks (subfamily Alcinae) based on mitochondrial DNA sequences. Proc. Natl. Acad. Sci. USA 91, 7912–7916.

Moyle, R.G., 2005. Phylogeny and biogeographical history of Trogoniformes, a pantropical order. Biol. J. Linn. Soc. 84, 725–738.

Nahum, L.A., Pereira, S.L., Fernandes, F.M.D., Matioli, S.R., Wajntal, A., 2003. Diversification of Ramphastinae (Aves, Ramphastidae) prior to the Cretaceous/Tertiary boundary as shown by molecular clock of mtDNA sequences. Genetics and Molecular Biology 26, 411–418.

Nylander, J.A., 2004. MrModeltest 2.0. Program distributed by the author. Evolutionary Biology Centre, Uppsala University.

Olson, S.L. (Ed.), 1985. The Fossil Record of Birds. Academic Press, New York.

Paton, T.A., Baker, A.J., 2006. Sequences from 14 mitochondrial genes provide a well-supported phylogeny of the Charadriiform birds congruent with the nuclear RAG-1 tree. Mol. Phylogenet. Evol. 39, 657–667.

Paton, T.A., Baker, A.J., Groth, J.G., Barrowclough, G.F., 2003. RAG-1 sequences resolve phylogenetic relationships within Charadriiform birds. Mol. Phylogenet. Evol. 29, 268–278.

Pereira, S.L., Baker, A.J., 2004a. Low number of mitochondrial pseudogenes in the chicken (Gallus gallus) nuclear genome: implications for molecular inference of population history and phylogenetics. BMC Evol. Biol. 4, 17.

Pereira, S.L., Baker, A.J., 2004b. Vicariant speciation of curassows (Aves, Cracidae): a hypothesis based on mitochondrial DNA phylogeny. Auk 121, 682–694.

Pereira, S.L., Baker, A.J., 2005. Multiple gene evidence for parallel evolution and retention of ancestral morphological states in the shanks (Charadriiformes: Scolopacidae). Condor 107, 514–526.

Pereira, S.L., Baker, A.J., 2006a. A mitogenomics timescale for birds detects variable phylogenetic rates of molecular evolution and refutes the standard molecular clock. Mol. Biol. Evol. 23, 1731–1740.

Pereira, S.L., Baker, A.J., 2006b. A molecular timescale for galliform birds accounting for uncertainty in time estimates and heterogeneity of rates of DNA substitutions across lineages and sites. Mol. Phylogenet. Evol. 38, 499–509.

Pereira, S.L., Baker, A.J., Wajntal, A., 2002. Combined nuclear and mitochondrial DNA sequences resolve generic relationships within the Cracidae (Galliformes, Aves). Syst. Biol. 51, 946–958.

Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.

Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Shimodaira, H., 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508.

Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247.

Slack, K.E., Jones, C.M., Ando, T., Harrison, G.L., Fordyce, R.E., Arnason, U., Penny, D., 2006. Early penguin fossils, plus mitochondrial genomes, calibrate avian evolution. Mol. Biol. Evol. 23, 1144–1155.

Strauch Jr., J.G., 1985. The phylogeny of the Alcidae. Auk 102, 520–539.

Strimmer, K., von Haeseler, A., 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13, 964–969.

Swofford, D.L., 2001. PAUP*: Phylogenetic Analysis Using Parsimony (* and Other Methods) 4.0. Sinauer Associates, Inc., Sunderland.

Tavares, E.S., Baker, A.J., Pereira, S.L., Miyaki, C.Y., 2006. Phylogenetic relationships and historical biogeography of neotropical parrots (Psittaciformes: Psittacidae: Arini) inferred from mitochondrial and nuclear DNA sequences. Syst. Biol. 55, 454–470.

Thomas, D.J., 2004. Evidence for deep-water production in the North Pacific Ocean during the early Cenozoic warm interval. Nature 430, 65–68.

Thomas, G.H., Wills, M.A., Székely, T., 2004. A supertree approach to shorebird phylogeny. BMC Evol. Biol. 4, 28.

Thorne, J.L., 2003. Multidivtime. Department of Genetics and Statistics, North Carolina State University, Raleigh, NC.

Thorne, J.L., Kishino, H., 2002. Divergence time and evolutionary rate estimation with multilocus data. Syst. Biol. 51, 689–702.

Thorne, J.L., Kishino, H., Painter, I.S., 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15, 1647–1657.

Walsh, H.E., Friesen, V.L., 2003. A comparison of intraspecific patterns of DNA sequence variation in mitochondrial DNA, alpha-enolase, and MHC II B loci in auklets (Charadriiformes: Alcidae). J. Mol. Evol. 57, 681–693.

Walsh, H.E., Jones, I.L., Friesen, V.L., 2005. A test of founder effect speciation using multiple loci in the auklets (Aethia spp.). Genetics 171, 1885–1894.

Warheit, K.I., 1992. A review of the fossil seabirds from the Tertiary of the North Pacific: plate tectonics, paleoceanography, and faunal change. Paleobiology 18, 401–424.

Watada, M., Kakizawa, R., Kuroda, N., Utida, S., 1987. Genetic differentiation and phylogenetic relationships of an avian family, Alcidae (auks). J. Yamashina Inst. Ornithol. 19, 79–88.

Wiegmann, B.M., Yeates, D.K., Thorne, J.L., Kishino, H., 2003. Time flies, a new molecular time-scale for brachyceran fly evolution without a clock. Syst. Biol. 52, 745–756.

Yang, Z., 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556.

Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591.

Yang, Z., Yoder, A.D., 2003. Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Syst. Biol. 52, 705–716.

Zwickl, D.J., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. University of Texas, Austin.