Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Christophe Jean Douady, Frédéric Delsuc, Yan Boucher, W Ford Doolittle, Emmanuel Douzery

To cite this version:

Christophe Jean Douady, Frédéric Delsuc, Yan Boucher, W Ford Doolittle, Emmanuel Douzery. Com- parison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability.. Molecular Biology and Evolution, Oxford University Press (OUP), 2003, 20 (2), pp.248-54. ￿halsde-00192997￿

HAL Id: halsde-00192997 https://hal.archives-ouvertes.fr/halsde-00192997 Submitted on 30 Nov 2007

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Letter to the Editor, intended for Molecular Biology and Evolution Manuscript number: 2925

Comparison of Bayesian and Maximum Likelihood Bootstrap Measures of Phylogenetic Reliability

ChristopheJ.Douady,*FrédéricDelsuc,†YanBoucher,*W.FordDoolittle,*and EmmanuelJ.P.Douzery† *DepartmentofBiochemistryandMolecularBiology,DalhousieUniversity,Halifax, NovaScotia,B3H4H7,Canada †LaboratoiredePaléontologie,PaléobiologieetPhylogénie,InstitutdesSciencesde l’Evolution,UniversitéMontpellierII,Montpellier,France Author for correspondence and reprints: ChristopheJ.Douady:Email: [email protected] EmmanuelJ.P.Douzery:Email: [email protected]montp2.fr

1 Testingevolutionaryhypothesesinaphylogeneticcontextbecomesmorereliable asreconstructionmethodsbasedonmorerealisticmodelsofmolecularevolutionare available.However,computingtimeburdenlimitstheapplicationofmodelbased methodssuchasMaximumLikelihood(ML) whenmanytaxaand/orassessmentof reliabilityviastandard—nonparametric—bootstrapmethodsareinvolved(Felsenstein

1985).TimesavingsthusaccountinpartfortheincreasingpopularityofBayesian inferencemethods(e.g.,Karoletal.2001;Lutzoni,PagelandReeb2001;Murphyetal.

2001),asimplementedinprogramslikeMrBayes(HuelsenbeckandRonquist2001).

Thesemethodspromisecomputationaltractabilitywithlargedatasetsandcomplex evolutionarymodels(LargetandSimon1999;Huelsenbecketal.2001).

Bayesianinferenceofphylogenycombinesthepriorprobabilityofaphylogeny withthetreelikelihoodtoproduceaposteriorprobabilitydistributionontrees

(Huelsenbecketal.2001).Thebestestimateofthephylogenycanbeselectedasthetree withthehighestposteriorprobability,i.e.theMAximumPosteriorprobability(MAP) tree(RannalaandYang1996).Topologiesandbranchlengthsarenottreatedas parameters—asinMLmethods(Felsenstein1981)—butasrandomvariables.Because posteriorprobabilitiescannotbeobtainedanalytically,theyareapproximatedby numericalmethodsknownasMarkovchainMonteCarlo(MCMC)orMetropolis coupledMCMC(MCMCMC).Thesechainsaredesignedtoexploretheposterior probabilitysurfacebyintegrationoverthespaceofmodelparameters.Treesaresampled atfixedintervalsandtheposteriorprobabilityofagiventreeisapproximatedbythe proportionoftimethatthechainsvisitedit(YangandRannala1997).Aconsensustree canbeobtainedfromthesesampledtrees,andBayesianposteriorprobabilitiesof

2 individualclades(PP),asexpressedbytheconsensusindices,maybeviewedasclade credibilityvalues.Thus,Bayesiananalysisoftheinitialmatrixoftaxaandcharacters producesbothaMAPtreeandestimatesofuncertaintyofitsnodes,directlyassessing substitutionmodel,branchlengthandtopologicalvariables,aswellascladereliability values,allinareasonablecomputationtime.

Reliabilityofnodesinphylogenetictreesisclassicallyevaluatedintwoways.

First,fromtheinitialmatrixofcharacters,astrengthofgroupingvalueismeasured,i.e. theleastdecreaseinloglikelihoodassociatedwiththebreakingofthecladedefinedby thatnode(e.g.Meirelesetal.1999).Thestatisticalsignificanceofthisdecreasecanbe estimatedwithnonparametricorparametrictests(Goldman,AndersonandRodrigo

2000).WithBayesianmethods,reliabilityofMAPtreenodesderivesdirectlyfrom correspondingposteriorprobabilities.Inthesecondway,theinitialcharactermatrixis redrawnwithreplacement,andbootstrappercentages(BP)arecalculated,forexample undertheMLcriterion(BP ML ),andinterpretedasameasureofexperimentrepeatability

(Felsenstein1985)orphylogeneticaccuracy(FelsensteinandKishino1993).

TheBayesianapproachispresumedtoperformroughlyasbootstrappedML

(Huelsenbecketal.2001)butrunsmuchfaster(LargetandSimon1999;Huelsenbecket al.2001).RecentanalyseshaveaimedatcomparingBayesianandMLsupportsby studyingthecorrelationbetweenposteriorprobabilities(PP)andbootstrappercentages

(BP ML )(LeachéandReeder2002;Whittinghametal.2002).Acompilationofliterature values(Karoletal.2001;Murphyetal.2001;Buckleyetal.2002;LeachéandReeder

2002;Whittinghametal.2002;Wilcoxetal.inpress)revealsthatplottingPPasa functionofBP ML canshowsignificantcorrelation( P-values<0.02),butthatthestrength

3 2 ofthiscorrelationishighlyvariableandsometimeverylow(correlationcoefficientr between0.29and0.99,medianat0.71).Moreover,theslope(S)oftheregressionline(S between0.29and1.08,medianat0.79)indicatesthatBP ML aregenerallylowerthanPP.

ThistrendhasalreadybeennoticedbyRannalaandYang(1996)intheirpioneeringwork wherePPappearedsystematicallyhigherthanresamplingestimatedloglikelihood

(RELL)bootstrapsupportvalues.

AsmorephylogeneticresultsrelyingstrictlyonBayesiananalysesarepublished

(ArkhipovaandMorrison2001;Henzeetal.2001;Lutzoni,PagelandReeb2001),a betterunderstandingoftherelationbetweenPPandBP ML becomesessential.Inaworkto bepublished,Wilcoxetal.exploredthisrelationbyperformingsimulationsontheir originaldataset.Theyconcludethat,undertheconditionoftheirstudy,PPandBPare bothoverconservativemeasuresofphylogeneticaccuracy,butthatBayesiansupport valuesprovidecloserestimatesofthetrueprobabilitiesofrecoveringclades.Thusthey advocatethepreferentialuseofPPratherthanBP(Wilcoxetal.inpress).However, caseswhereconflictinghypothesesaresupportedbyhighposteriorprobabilitieshave beenreported(Buckleyetal.2002;Douadyetal.,inpress).Thissuggeststhatatleastin certaincasesPPputoverconfidenceonagivenphylogenetichypothesisanddrawing conclusionsfromthissolemeasureofsupportmightbemisleading.

TobetterunderstandtherelationshipbetweenPPandBP,weappliedstandard bootstrapresamplingprocedurestotheBayesianapproach,studyingthecorrelation betweenPP,BP ML ,andBP Bay —i.e.posteriorprobabilitiesestimatedafterbootstrapping ofthedata—foreightempiricaldatasetsspanningdifferentkindofcharacters,typesof sequences,genomiccompartments,andtaxonomicgroups.Evenwhenthecorrelation

4 2 2 betweenPPandBP ML wasweak(r <0.52),itbecameverystrong(r >0.96)when

Bayesianposteriorprobabilitiesarecomputedonbootstrappeddatamatrices.Moreover, albeitlessclearly,simulationseemstoconfirmthistrend.Thesesimulationsalsotendto predictthatPPovercomeBPsupportforbothtrueandfalsenodes.Wediscusstheeffect ofthebootstrappedapproachinthecaseofapparentconflictsbetweendatasets,and consideritspracticalimplicationsformeasuringphylogeneticreliability.

Eighthighlydiverseempiricaldatasetswerechosen(seedetailsin

SupplementaryMaterialatMBEwebsite:http://www.molbiolevol.org),includingtwo pairsshowingconflict(i.e.,PPstronglysupportingmutuallyexclusivenodes): mitochondrialversusnuclearmarkersfor14cicadas(Buckleyetal.2002),and mitochondrialrRNAmarkersforeither21or23sharks(Douadyetal.inpress).The modelofsequenceevolutionthatbestfitseachDNAdatasetandthecorrespondingGTR substitutionrateparameters,shapeofthefourcategoriesgammadistribution( Γ4)and fractionofinvariablesites(INV)wereestimatedbyModeltest3.06(PosadaandCrandall

1998),andthenusedinPAUP*4b10(Swofford2002)tocomputeMLbootstrap percentages(BP ML )after100pseudoreplicationswithNJstartingtreesandTBRbranch swapping.FortheHMGRaminoaciddataset,BP ML wereobtainedusingP ROML version

3.6a2.1ofthePHYLIP package(Felsenstein2001)withaJTTsubstitutionmatrixprovided byE.Tillier(pers.comm.)combinedtoa Γ4 +INVmodel,andparametersoptimizedby

PUZZLE 4.0.2(StrimmerandvonHaeseler1996).Bayesianposteriorprobabilities(PP) werecomputedunderthesameMLmodelswithMrBayes2.01(Huelsenbeckand

Ronquist2001)byrunningfourchainsfor100,000MCMCMCgenerationsusingthe programdefaultpriorsonmodelparameters.ForbootstrappedBayesiananalyseswe

5 generated100bootstrappseudoreplicatesforeachdatasetusingtheprogramS EQBOOT

3.6a2.1(Felsenstein2001),andestimatedBayesianposteriorprobabilitiesaspreviously describedforeachpseudoreplicate.Forallanalyses,1,000treesweresampledfromthe posteriorprobabilitydistribution(oneevery100generations)andaconservative50%of thetrees(500)wassystematicallydiscardedas“burnin”toensurethatthechainshave reachedstationarity.Bayesianbootstrappercentages(BP Bay )werecomputedforeach nodeintothreeways:i)theconsensusofthe500x100=50,000treesgeneratedfromthe

100bootstrappedpseudoreplicates,ii)theaverageofeachnodesPPforthe100MAP trees,andiii)theconsensusofthe100MAPtrees(“consensusofconsensus”).Giventhe tediousaspectofpreparingfilesforbootstrappedBayesiananalyses,aPerlscriptwas custommadeandisavailableuponrequest.

WealsoexploredtherelationbetweenPPandBPusingasimulationdesign.

MonteCarlosimulationof100datasetsof1,000charactersforseventaxaeachwas performedusingSEQ GEN 1.2.5(RambautandGrassly1997),underamodeltopology andassociatedbranchlengthstakenfromthearmadillosubsetofVWFxenarthrandata

(SupplementaryMaterial).TheK2Pmodelofnucleotidesubstitution(Kimura,1980)was chosenwithatransition:transversionratioof2.00anda Γ8distributionwith α=1.00.

BP ML andPPsupportswereobtainedforthese100simulateddatasetsfollowingthesame procedureasdescribedabove.Forcomputingtimereasons,i.e.running2,500times

MrBayes,BP Bay wereonlycomputedforthe25datasetsshowingthegreatestcontrast betweenBP ML andPP.

ForalleightdatasetsthescatterplotsofPPandBP Bay versusBP ML werevery similar.Inallcases,PPversusBP ML arecharacterizedbyamoderatedispersionbuta

6 flattenedslope(ScolumninSupplementaryMaterial:0.180.93)whileBP Bay vs.BP ML haveverylittleunexplainedvariation,slopesofcorrelationlinesappearingmuchsteeper andbeingalwaysverycloseto1(0.931.22).Figure1illustratesthistrendforthree individualdatasetsshowingthattheresultsareindependentofthenatureofthedata analyzed:nucleotideversusaminoacidcharacters,nuclearversusmitochondrial compartments,proteincodingversusnoncodingmarkersanddifferenttaxonomicgroups andlevels(Fig.1AC).Becauseempiricalobservationssufferfromthedifficultyof drawinggeneralconclusionsfromalimitednumberofobservations,wecombinedthesix strictlyindependentdatasetsandconfirmedourobservations(Fig.1D;Supplementary

Material).Therefore,weareconfidentthat,inempiricaldatasets,PPandBP ML will

2 proveonlymoderatelycorrelated(r =0.270.93;S=0.180.93; Pvalues<0.02),

2 whereastheBP Bay andBP ML arestronglycorrelated(r =0.950.99;S=0.931.22; P values<10 6).

SuchacorrelationbetweenBP ML andBP Bay seemsexpectablesincetheuseof uniformpriorsintheBayesiananalysesinvolvesthattheposteriorprobabilitydensityis stronglydependentuponthelikelihoodfunction.However,thiscorrelationisnottrivial eitherbecausetheMLandtheMAPtreesobtainedfromeachbootstrappseudoreplicate arenotalwaysidentical.Forexample,inthecaseofthe21sharkandxenarthrandata sets,MLandMAPtreesaredifferentin38%and27%ofthereplicates,respectively.

2 Therefore,theveryhighqualitycorrelationbetweenBP Bay andBP ML (r >0.95)cannotbe expected a priori .Wealsotestedseveraloftheassumptionsleadingtothestrong correlationbetweenBP Bay andBP ML .First,thepossibilitythatthequalityofthe correlationsobservedcoulddependuponrandomerroroccurringbetweenindependent

7 runsseemstobediscardedbytheminimalPPvarianceobservedonMCMCMC repeatabilityplots(Huelsenbecketal.2001).Itisthusunlikelythatthelowcorrelation betweenPPandBP ML reflectsaproblemofrepeatabilitybetweenindependentruns.

Second,we a priori removed50%ofthesampledtreesasMCMCMC“burnin”.This wasdonetoensurethatalltreessampledbeforestationaritywerediscarded,without actuallycheckingBayesianresultsofeachindividualbootstrappseudoreplicate.To checkforpotentialbiasesatthisstage,werecomputedBP Bay ,keeping90%ofall sampledtrees(i.e.,removing100insteadof500treesforeachpseudoreplicate).Results indicatethatbiasisquiteunlikelyasthelevelofBP Bay variationisverylow(e.g.,1%for theITSdataset).Therefore,“burnin”thresholdseemstobeofmodestimportanceas longasitiskeptrealistic,probablybecauseoftherapidconvergencetowardsstationarity ofourdata.Third,welookedattheeffectofmakinganoverallconsensus(i.e.,consensus ofall50,000treessampledoverall100pseudoreplicatesandaftera50%burnin)versus makingtheconsensusofthe100MAPtreesortheaverageofthePP.Compilationof nodesupports—forexampleforbothITSandBuckleyetal.(2002)nucleardatasets—

2 yieldshighcorrelations(r >0.95)betweenBP ML andBP Bay ,"PPaverage"or"MAPtrees consensus".However,itseemsthatBP Bay and"PPaverage"areclosertoBayesian philosophywhereas"MAPtreeconsensus"valuesareclosertotheMLbootstrap approach.Indeed,inthetwofirstcases,thecompletecollectionoftreesisconsidered whileinthelastcaseasingleoptimaltreeiskepttorepresenteachpseudoreplicate.

Giventhelikelylossofinformationduringtheconsensusiteration,itseemsthatusingan overallconsensuswasabetteroption.

8 Nonparametricbootstrappingmaybeanoverconservativeestimatorofnode reliability(HillisandBull1993;Wilcoxetal.inpress;butseeFelsensteinandKishino

1993;Efron,HalloranandHolmes1996).However,itremainsthemostcommonlyused waytocharacterizeit.Fromthestatisticalpointofview,posteriorprobabilitieshavethe advantageofstraightforwardnessbut,aswejustshowed,theyarenottightlycorrelated withMLbootstrappercentages.Theseestimatorsseemratherdifferent,asPPneedstobe calculatedonbootstrappeddatatobehavelikeBP ML supports.Recently,Wilcoxetal.(in press),basedonasimulationstudy,concludethatPPandBParebothoverconservative measuresofnodesupport,butthatPPprovidedcloserestimatesofthetrueprobabilities ofrecoveringclades.ResultsfromoursimulationsseemtoconfirmthefactthatPPisless conservativethanBP.Indeed,whenconsideringtruenodes—thosethatwerepresentin themodeltopology—PParegenerallyhigherthanBP ML andBP Bay (Fig.2A:upperright quarter).However,PPisalsohigherwhenlookingatstrongsupportforfalsenodes— thosethatwereabsentofthemodeltree(Fig.2B:upperrightquarter).Below50%ofPP andBP(Fig.2B:lowerleftquarter),i.e.forvaluesthatareusuallynotinterpretedfor phylogeneticinference,thereisalargedispersionofpointswithatrendoflowBPto overestimateaccuracyasnotedbyHillisandBull(1993).Asawholethesesimulation resultsimplythat,atleastincertaincases,highPPfalselyinterpretsignalandmayend upstronglysupportingincorrectphylogeneticrelationships.Thus,themoreconservative

BP ML andBP Bay seemlesssubjecttothebehaviorofstronglysupportinganodewhenit isactuallyfalse.

Furthermore,theexistenceofstrongconflictsinempiricaldatausingBayesian inferenceseemstoarguethatthisapproachmaybesensitivetosmallmodel

9 misspecificationsastheoreticallyanticipatedbyWaddell,KishinoandOta(2001)and subsequentlyshownbyBuckleyetal.(2002)andBuckley(2002).Bayesiananalyseson bootstrappeddatawereabletoeliminateapparentconflicts.TwonodesopposedbyPP=

0.93/0.94( Rhodopsalta sistertoeither Maoricicada or Kikihia dependingonthechoice ofmitochondrialornuclearmarkers:Buckleyetal.2002)and0.98/0.99(relativeposition ofHexanchiformesinsharks’interordinaltreedependingonthechoiceoftheoutgroup:

Douadyetal.inpress)respectively,thenreceivedBPBay =59/65and57/47after bootstrapresampling.Evidently,certainconflictsdiagnosedbyPPcouldbebiologically explainedbydifferencesbetweengenetreesandspeciestreesintroducedbyhorizontal transfer,lineagesorting,andgeneduplicationandextinction(reviewinMaddison1997).

Inparticular,hybridizationbetweentaxamightalternativelyaccountfortheconflict observedbetweenmitochondrialandnucleargenesinCicadas.However,inthecaseof sharks,theconflictarosewhentaxaareaddedtotheoutgroup(forthesamegene).It appearsmorethanlikelythatthisspuriousconflictwastheresultoftheoverestimationof nodesupportbasedonPPandthatconclusionsbasedsolelyonthisestimatorwouldhave beenpositivelymisleading.

Drawinggeneralconclusionsfromempiricalstudiescouldbeproblematic becausewedonotknowhowrepresentativeourexampledatasetsareofphylogenetic problems.However,using“real”datasetsdoeshavetheadvantageofavoidingthe simplifyingassumptionsinherentinsimulatingDNAdataunderagivenmodel(Buckley

2002;BuckleyandCunningham2002).Furthermore,inourcaseanalysesbasedonboth empiricalandsimulateddataseemtocorroborateeachotherinsuggestingthat,being moreconservative,BP ML andBP Bay mightbelesspronetostronglysupportingafalse

10 phylogenetichypothesis,thusreinforcingconcernsregardingPPsensitivitytomodel misspecifications.

Nevertheless,Bayesianinference(withorwithoutbootstrap)remainsavery efficientwaytosimultaneouslyestimatesubstitutionmodelparameters,branchlengths andtopologyundercomplexmodelsofevolutionarychange(Huelsenbeck2002).Ifwe takeourshark12S16Sdatasetwith23taxaasanexample,aregularPP(oroneBP Bay replicate)requiresroughly1.5hourofcomputingtimeonaPentium4runningat1.80

GHz,against120hoursforasinglereplicateofBPML withsimultaneousestimationofall parameters.BayesiansearchonbootstrapdataismuchfasterthanMLiftheuserwants parameterstobeestimatedasthesearchgoes,andgivesverysimilarresults.However,in thewidemajorityofcases,aML(orBP ML )searchwithsimultaneousestimationofthe parametersisnotnecessaryand a priori approximationsallowtheidentificationofthe optimaltreesandbootstrapsupports.TheBayesianapproachalsoprovidesauniqueway toanalyzeaminoaciddatawithsimultaneousparametersestimation(inpopular phylogeneticpackagessuchasP AUP orP HYLIP thisoptionisonlyavailableforDNA).

BothPPandBootstrapsupportsareofgreatinteresttophylogenyaspotentialupperand lowerboundofnodesupportaccuracy,buttheyaresurelynotinterchangeableandcannot bedirectlycompared.Inthatcontext,usersmayprefercomputingPPandBP Bay orBP ML tobetterexploretherangeofnodesupportestimates,especiallywhenpotentialconflicts betweendatasetsareexplored.

11 Acknowledgments

ChrisSimonandThomasR.BuckleykindlymadetheirCicadadatasets available.DavidM.HillisandThomasP.Wilcoxprovidedpreciousenlightenmentabout theirsimulationsontheSnakedataset.WearegratefultoAlistairG.B.Simpson,the

AssociateEditorMarkA.Ragan,andtwoanonymousreviewersforinsightfulcomments.

ThisworkwassupportedbyagrantfromtheCanadianInstituteforHealthResearch

(MOP42470),aKillamPostdoctoralFellowshiptoCJD,adoctoralMENRTgrant

(contract99075)toFD,adoctoralresearchawardfromtheCanadianInstitutesofHealth

ResearchtoYB,andbytheTMRNetwork"Mammalianphylogeny"(contractFMRX–

CT98–022)oftheEuropeanCommunity,the"GénopoleMontpellierLanguedoc

Roussillon",andthe"ActionBioinformatique"oftheCNRStoEJPD.Thisisthe contributionISEM2002BBBoftheInstitutdesSciencesdel’EvolutiondeMontpellier

(UMR5554CNRS).

12 LITERATURE CITED

ARKHIPOVA ,I.R.,and H.G.M ORRISON .2001.Threeretrotransposonfamiliesinthe

genomeof Giardia lamblia :twotelomeric,onedead.Proc.Natl.Acad.Sci.USA

98:14497502.

BOUCHER ,Y.,H.H UBER ,S.L'H ARIDON ,K.O.S TETTER ,and W.F.D OOLITTLE .2001.

BacterialoriginfortheisoprenoidbiosynthesisenzymeHMGCoAreductaseofthe

archaealordersThermoplasmatalesandArchaeoglobales.Mol.Biol.Evol.18:

137888.

BUCKLEY ,T.R.2002.Modelmisspecificationandprobabilistictestsoftopology:

evidencefromempiricaldatasets.Syst.Biol.51:509523.

BUCKLEY ,T.R.,and C.W.C UNNINGHAM .2002.Theeffectofnucleotidesubstitution

modelassumptionsonestimatesofnonparametricbootstrapsupport.Mol.Biol.

Evol.19:394405.

BUCKLEY ,T.R.,P.A RENSBURGER ,C.S IMON ,and G.K.C HAMBERS .2002.Combined

data,Bayesianphylogenetics,andtheoriginoftheNewZealandcicadagenera.

Syst.Biol.51:418.

DELSUC ,F.,M.S CALLY ,O.M ADSEN ,M.J.S TANHOPE ,W.W. DE JONG ,F.M.

CATZEFLIS ,M.S.S PRINGER ,and E.J.P.D OUZERY .Inpress.Molecularphylogeny

oflivingxenarthransandtheimpactofcharacterandtaxonsamplingonthe

placentaltreerooting.Mol.Biol.Evol.

DOUADY ,C. J., M. DOSAY ,M. S. SHIVJIAND M. J. STANHOPE .Inpress.Molecular

phylogeneticevidencerefutingthehypothesisofBatoidea(raysandskates)as

derivedsharks.Mol.Phylogenet.Evol.

13 DOUZERY ,E. J., A. M. PRIDGEON ,P. KORES ,H. P. LINDER ,H. KURZWEIL ,andM. W.

CHASE .1999.Molecularphylogeneticsof():acontribution

fromnuclearribosomalITSsequences.Am.J.Bot.86:887899.

EFRON ,B.,E.H ALLORAN ,andS.H OLMES .1996.Bootstrapconfidencelevelsfor

phylogenetictrees.Proc.Natl.Acad.Sci.USA93:1342913434.

FELSENSTEIN ,J.1981.EvolutionarytreesfromDNAsequences:amaximumlikelihood

approach.J.Mol.Evol.17:368376.

FELSENSTEIN ,J.1985.Confidencelimitsonphylogenies:anapproachusingthe

bootstrap.Evolution39:783791.

FELSENSTEIN ,J.2001.PHYLIP(PHYLogenyInferencePackage).Version3.6a2.1.

DepartmentofGenomeSciences.UniversityofWashington.Seattle.

FELSENSTEIN ,J.,and H.K ISHINO .1993.Istheresomethingwrongwiththebootstrapon

phylogeniesareply.Syst.Biol.42:193200.

GOLDMAN ,N.,J.P.A NDERSON ,and A.G.R ODRIGO .2000.Likelihoodbasedtestsof

topologiesinphylogenetics.Syst.Biol.49:652670.

HENZE ,K.,D.S.H ORNER ,S.S UGURI ,D.V.M OORE ,L.B.S ANCHEZ ,M.M ULLER ,and T.

M.E MBLEY .2001.Uniquephylogeneticrelationshipsofglucokinaseand

glucosephosphateisomeraseoftheamitochondriateeukaryotes Giardia intestinalis ,

Spironucleus barkhanus and Trichomonas vaginalis .Gene281:123131.

HILLIS ,D.M.,and J.J.B ULL .1993.Anempiricaltestofbootstrappingasamethodfor

assessingconfidenceinphylogeneticanalysis.Syst.Biol.42:182192.

HUELSENBECK ,J.P.2002.TestingacovariotidemodelofDNAsubstitution.Mol.Biol.

Evol.19:698707.

14 HUELSENBECK ,J.P.,and F.R ONQUIST .2001.MRBAYES:Bayesianinferenceof

phylogenetictrees.Bioinformatics17:754755.

HUELSENBECK ,J.P.,F.R ONQUIST ,R.N IELSEN ,and J.P.B OLLBACK .2001.Bayesian

inferenceofphylogenyanditsimpactonevolutionarybiology.Science294:2310

2314.

KAROL ,K.G.,R.M.M CCOURT ,M.T.C IMINO ,and C.F.D ELWICHE .2001.Theclosest

livingrelativesofland.Science294:23513.

KIMURA ,M. 1980.Asimplemethodforestimationevolutionaryrateofbasesubstitutions

throughcomparativestudiesofnucleotidesequences.J.Mol.Biol.16:111120.

LARGET ,B.,and D.L.S IMON .1999.MarkovchainMonteCarloalgorithmsforthe

Bayesiananalysisofphylogenetictrees.Mol.Biol.Evol.16:750759.

LEACHÉ ,A.D.,and T.W.R EEDER .2002.MolecularsystematicsoftheEasternFence

Lizard( Sceloporus undulatus ):acomparisonofparsimony,likelihood,and

Bayesianapproaches.Syst.Biol.51:4468.

LUTZONI ,F.,M.P AGEL ,and V.R EEB .2001.Majorfungallineagesarederivedfrom

lichensymbioticancestors.Nature411:93740.

MADDISON ,W.P.1997.Genetreesinspeciestrees.Syst.Biol.46:523536.

MEIRELES ,C.M.,J.C ZELUSNIAK ,M.P.S CHNEIDER ,J.A.M UNIZ ,M.C.B RIGIDO ,H.S.

FERREIRA ,and M.G OODMAN .1999.MolecularphylogenyofatelineNewWorld

monkeys(Platyrrhini,Atelinae)basedongammaglobingenesequences:evidence

that Brachyteles isthesistergroupof Lagothrix .Mol.Phylogenet.Evol.12:1030.

MURPHY ,W.J.,E.E IZIRIK ,S.J.O'B RIEN ,O.M ADSEN ,M.S CALLY ,C.J.D OUADY ,E.

TEELING ,O.A.R YDER ,M.J.S TANHOPE ,W.W. DE JONG ,and M.S.S PRINGER .

15 2001.ResolutionoftheearlyplacentalmammalradiationusingBayesian

phylogenetics.Science294:23482351.

POSADA ,D.,and K.A.C RANDALL .1998.MODELTEST:testingthemodelofDNA

substitution.Bioinformatics14:817818.

RAMBAUT ,A.,andN.C.GRASSLY .1997.SeqGen:anapplicationfortheMonteCarlo

simulationofDNAsequenceevolutionalongphylogenetictrees.Comput.Appl.

Biosci.13:235238.

RANNALA ,B.,and Z.Y ANG .1996.Probabilitydistributionofmolecularevolutionary

trees:anewmethodofphylogeneticinference.J.Mol.Evol.43:304311.

STRIMMER ,K.,and A. VON HAESELER .1996.QuartetPuzzling:aquartetmaximum

likelihoodmethodforreconstructingtreetopologies.Mol.Biol.Evol.13:964969.

SWOFFORD ,D.L.2002.PAUP*.PhylogeneticAnalysisUsingParsimony(*andOther

Methods).Version4.0b10.SinauerAssociates.Sunderland,Massachusetts.

WADDELL ,P.J.,H.K ISHINO ,and R.O TA .2001.APhylogeneticFoundationfor

ComparativeMammalianGenomics.GenomeInformaticsSeries.12:141155.

WHITTINGHAM ,L.A.,B.S LIKAS ,D.W.W INKLER ,and F.H.S HELDON .2002.Phylogeny

ofthetreeswallowgenus, Tachycineta (Aves:Hirundinidae),byBayesiananalysis

ofmitochondrialDNAsequences.Mol.Phylogenet.Evol.22:430441.

WILCOX ,T.P.,D.J.ZWICKL ,T.A.HEATH ,and D.M.HILLIS .Inpress.Phylogenetic

relationshipsofthedwarfboasandacomparisonofBayesianandbootstrap

measuresofphylogeneticsupport.Mol.Phylogenet.Evol.

YANG ,Z.H.,and B.R ANNALA .1997.BayesianphylogeneticinferenceusingDNA

sequences:aMarkovChainMonteCarlomethod.Mol.Biol.Evol.14:717724.

16 FIGURE LEGENDS

FIG . 1. Linearcorrelationbetweenmaximumlikelihoodbootstrappercentages(BP ML ) andBayesianposteriorprobabilities(PP,opencircles)orbootstrappedBayesianposterior probabilities(BP Bay ,blacktriangles)forempiricaldatasets.Thedottedlinerepresentsa slopeof1—withequalityofBP ML andPPorBP Bay —whiledashedandplainlines representPP=f(BP ML )andBP Bay =f(BP ML )regressionlines,respectively.Allaxes representnodesupportaspercentages.SeeSupplementaryMaterialforfurther informationregardingdatasets.

FIG . 2. Linearcorrelationbetweenmaximumlikelihoodbootstrappercentages(BP ML ) andBayesianposteriorprobabilities(PP,opencircles)orbootstrappedBayesianposterior probabilities(BP Bay ,blacktriangles)in25simulateddatasets.“Truenodes”arenodes thatwerepresentinthemodeltopologyusedtosimulatethedatasetsand“Falsenodes” arenodesthatwerenotinthemodeltopology.Thedottedlinerepresentsaslopeof1— withequalityofBP ML andPPorBP Bay —whiledashedandplainlinesrepresentPP= f(BP ML )andBP Bay =f(BP ML )regressionlines,respectively.Allaxesrepresentnode supportaspercentages.Topologyandparametersusedforthesimulation:

(A:0.043143,((B:0.027559,(C:0.018247,D:0.024211):0.003601):0.011055,(E:0.005704,(

F:0.010024,G:0.006528):0.000708):0.021913):0.003809)withbranchlengthsissued fromthexenarthrandataset;1000nucleotides;K2PwithTi/Tv=2.00; Γ8with α=1.00.

17 SUPPLEMENTARY MATERIAL

Linear Correlation between ML Bootstrap Percentages and Bayesian support for

Eight Highly Diverse Empirical Data Sets

Xaxis:BP ML Xaxis:BP ML

Yaxis:PP Yaxis:BP Bay Data r2 S B r2 S B

Orchids,ITS 1 0.85 0.59 44.09 0.99 1.22 21.47

Mammals,VWF 2 0.93 0.74 27.29 0.99 1.07 8.13

Insects,EF1 α3 0.75 0.36 64.33 0.99 1.07 6.97

Insects,Mitochondrial 4 0.75 0.93 12.05 0.99 1.10 9.95

3domains,HMGR 5 0.73 0.59 43.89 0.98 0.98 1.77

Sharks,12S16S(23taxa) 6 0.52 0.18 83.48 0.96 0.95 2.81

Sharks,12S16S(21taxa) 7 0.49 0.38 64.70 0.99 0.98 1.19

Snakes,12S16S 8 0.27 0.25 73.37 0.95 0.93 4.85

9 Combinationof6datasets 0.54 0.47 55.64 0.96 1.01 1.85

Combinationof8datasets 0.54 0.45 57.40 0.97 1.01 1.38

NOTE.—SandBarerespectivelytheslopeandtheinterceptofthelinearcorrelationY=

SxBP ML +B.

1:SubsetofnuclearribosomalITS(682alignednucleotides,nt)for23Diseaeorchids including10Satyriinae,12DisiinaeandoneBrownleeinaespecies(Douzeryetal.1999); highestlikelihoodtree:( Brownleea ,((( Disa uniflora ,( Disa racemosa ,Disa pillansii ,( Disa cardinalis ,Disa tripetaloides ))),( Disa glandulosa ,Disa longicornis )),(( Monadenia ,( Disa

18 chrysostachya ,Herschelia )),( Disa rosea ,Disa sagittalis ))),((( membranaceum ,( Satyrium humile ,( Satyrium stenopetalum ,( Satyrium acuminatum ,( Satyrium carneum ,Satyrium ligulatum ))))),( Satyrium nepalense ,Satyrium odorum )),( Satyrium bicallosum ,Satyrium rhynchanthum ))).

2:NuclearproteincodinggenevWF(1161nt)for13xenarthranmammals(Delsucetal. inpress);highestlikelihoodtree:((( Dasypus novemcinctus ,Dasypus kappleri ),(( Euphractus sexcinctus ,( Chaetophractus villosus ,Zaedyus pichiy )),( Tolypeutes matacus ,( Priodontes maximus ,Cabassous unicinctus )))),( Cyclopes didactylus ,( Tamandua tetradactyla ,Myrmecopha tridactyla )),( Choloepus didactylus ,Bradypus tridactylus )).

Sevenarmadillogeneraareunderlinedandwereusedtoprovidethemodeltreefor simulations.

3:EF1αproteincodinggene(2033nt)ofBuckleyetal.(2002)for14cicadainsects; highestlikelihoodtree:( Diemeniana frenchi ,Diemeniana tillyardi ,((( Amphipsalta cingulata ,Notopsalta sericea ),( Cicadetta celis ,Cicadetta puer )),( Pauropsalta johanae ,( Myersalna depicta ,(( Maoricicada cassiope ,Maoricicada hamiltoni ),(( Kikihia scutellaris ,Kikihia cauta ),( Rhodopsalta cruentata ,Rhodopsalta leptomera ))))).

4:Mitochondrial(12S16SribosomalRNA[rRNA]+COI+COII)markers(2249nt)of

Buckleyetal.(2002)for14cicadas;highestlikelihoodtree:( Diemeniana frenchi ,Diemeniana tillyardi ,((( Amphipsalta cingulata ,Notopsalta sericea ),( Cicadetta celis ,Cicadetta puer )),( Pauropsalta johanae ,( Myersalna depicta ,(( Kikihia scutellaris ,Kikihia cauta ),(( Maoricicada cassiope ,Maoricicada hamiltoni ),( Rhodopsalta cruentata ,Rhodopsalta leptomera ))))).

19 5:3hydroxy3methylglutarylcoenzymeAreductase(HMGR,258aminoacids)for15 taxarepresentingallthreedomainsoflife(EukaryaBacteriaArchea;Boucheretal.

2001);highestlikelihoodtree:((( Archaeoglobus profundus ,( Archaeoglobus fulgidus ,( Streptococcus pyogenes ,Pseudomonas mevalonii ))),(( Saccharomyces cerevisiae ,Homo sapiens ),( Arabidopsis thaliana ,Zea mays ))),(( Methanothermobacter thermautotrophicus ,( Vibrio cholerae ,Haloferax volcanii )),((( Pyrococcus abyssi ,Pyrococcus horikoshii ), Streptomyces aeriouvifer ), Aeropyrum pernix )))

6 :Sharkmitochondrial12S16SrRNAfor23taxa(1880nt,Douadyetal.,inpress); highestlikelihoodtree:(Petromyzon marinus ,( Polymixia japonica ,(((((( Centrophorus granulosus ,Squalus acanthias ),( Squatina californica ,Pristiophorus nudipinnis )),(( Heterodontus francisci ,Ginglymostoma cirratum ),((((( Isurus oxyrinchus ,Isurus paucus ), Lamna nasus ), Carcharodon carcharias ),( Carcharias taurus ,Alopias vulpinus )),(( Carcharhinus porosus ,Mustelus manazo ), Scyliorhinus canicula )))),( Hexanchus griseus ,Heptranchias perlo )),( Raja radiata ,Urobatis jamaicensis )), Hydrolagus colliei )), Siren intermedia ).

7 :Sharkmitochondrial12S16SrRNAfor21taxa(1963nt,Douadyetal.,inpress); highestlikelihoodtree:(Polymixia japonica ,((((((( Centrophorus granulosus ,Squalus acanthias ),( Squatina californica ,Pristiophorus nudipinnis )),( Hexanchus griseus ,Heptranchias perlo )),(((((( Isurus oxyrinchus ,Isurus paucus ), Lamna nasus ), Carcharodon carcharias ), Carcharias taurus ), Alopias vulpinus ),(( Carcharhinus porosus ,Mustelus manazo ), Scyliorhinus canicula ))), Heterodontus francisci ), Ginglymostoma cirratum ),( Raja radiata ,Urobatis jamaicensis )), Hydrolagus barbouri ).

20 8:Snakemitochondrial12S16SrRNAfor23taxa(1545nt,Wilcoxetal.,inpress); highestlikelihoodtree:(( Leptotyphlops dulcis ,( Typhlops jamaicensis ,Typhlops ruber )),( Anilius scytale ,(( Trachyboa boulengeri ,( Tropidophis greenwayi ,( Tropidophis pardalis ,( Tropidophis feicki ,Tropidophis melanurus )))),(( Xenopeltis unicolor ,( Morelia boeleni ,Loxocemus bicolor )),(( Cylindrophis ruffus ,( Uropeltis melanogaster ,Rhinophis philippinus )),((( Ungaliophis continentalis ,Exiliboa placata ), Eryx conicus ),( Boa constrictor ,( Acrochordus javanicus ,( Pituophis lineaticolis ,( Crotalus polysticus ,Azemiops feae )))))))))).

9:Sixstrictlyindependentdatasets(Sharks12S16S[21taxa]andInsectsEF1αdatasets excluded).

21 Figure 1

A. The three domains of life B. Sharks (23 taxa) HMGR 12S-16S 100 100

90 90

80 80 y = 0.18x + 83.48 y = 0.59x + 43.89 R² = 0.52 R² = 0.73 70 70

60 60 Bay Bay

50 50 PP and PP BP PP and PP BP 40 40 y = 0.95x + 2.81 30 30 R² = 0.96 y = 0.98x + 1.77 20 R² = 0.98 20

10 10

0 0 0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100

BP ML BP ML C. Orchids D. Combination ITS Six empirical data sets 100 100

90 90

80 y = 0.59x + 44.09 80 y = 0.47x + 55.64 R² = 0.85 R² = 0.54 70 70

60 60 Bay Bay

50 50 y = 1.01x - 1.85 PP and PP BP 40 and PP BP 40 R² = 0.96 y = 1.22x - 21.47 30 R² = 0.99 30

20 20

10 10

0 0 0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100

BP ML BP ML

22 Figure 2

A. True nodes B. False nodes

100 100

90 y = 0.87x + 15.79 90 y = 1.14x - 2.05 R² = 0.88 R² = 0.78 80 80

70 70

60 60 Bay Bay

50 50 y = 0.92x + 4.99 PP andPP BP PP and PP BP 40 R ²= 0.92 40

30 30

20 20 y = 0.84x + 11.54 R² = 0.80 10 10

0 0 0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100

BP ML BP ML

23 Figure 3

A. Mitochondrial genes Nuclear gene (COI + COII + 12S + 16S) (EF-1ααα) 2249 sites 2033 sites Diemeniana frenchi

Diemeniana tillyardi

Amphipsalta cingulata

Notopsalta sericea

Cicadetta celis

Cicadetta puer

Pauropsalta johanae

Myersalna depicta

Kikihia scutellaris Maoricicada cassiope

Kikihia cauta Maoricicada hamiltoni

Maoricicada cassiope Kikihia scutellaris

Maoricicada hamiltoni Kikihia cauta

Rhodopsalta cruentata

PP = 0.93 Rhodopsalta leptomera PP = 0.94

BP ML = 69 BP ML = 65 BP Bay = 59 0.1 substitution / site BP Bay = 65

B. 23 taxa 21 taxa (12S + 16S) (12S + 16S) 1880 sites 1963 sites

Petromyzon Siren

Polymixia Chimaeriformes Rajiformes Myliobatiformes Orectolobiformes Hexanchiformes Heterodontiformes

Squaliformes

Squatiniformes Pristiophoriformes Heterodontiformes Hexanchiformes Orectolobiformes PP = 0.98 PP = 0.99 BP ML = 61 BP ML = 53 Carcharhiniformes BP Bay = 57 BP Bay = 47

Lamniformes

0.1 substitution / site

24