Syst. Biol. 50(4):525–539, 2001

Bias in Phylogenetic Estimation and Its Relevance to the Choice between Parsimony and Likelihood Methods

DAVID L. SWOFFORD,1,6 PETER J. WADDELL,2 JOHN P. HUELSENBECK,3 PETER G. FOSTER,1,7 PAUL O. LEWIS,4 AND JAMES S. ROGERS5

1 Laboratory of Molecular Systematics, National Museum of Natural History, Smithsonian Institution Museum Support Center, 4210 Silver Hill Road, Suitland, Maryland 20746, USA; E-mail: [email protected]
2 Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand; E-mail: [email protected]
3 Department of Biology, University of Rochester, Rochester, New York 14627, USA; E-mail: [email protected]
4 Department of Ecology and Evolutionary Biology, The University of Connecticut, U-43, 75 N. Eagleville Road, Storrs, Connecticut 06269-30437, USA; E-mail: [email protected]
5 Department of Biological Sciences, University of New Orleans, New Orleans, Louisiana 70148, USA; E-mail: [email protected]

6 Current address: The Natural History Museum, Cromwell Road, London SW7 5BD, U.K.; E-mail: [email protected]
7 Current address: School of Computational Science and Information Technology, Florida State University, Tallahassee, Florida, 32306-4120.

It is now widely recognized that under relatively simple models of stochastic change, phylogenetic inference methods can actively mislead investigators attempting to estimate evolutionary trees from molecular sequences and other data. One instance of this phenomenon is "long-branch attraction," in which some pairs of taxa have a higher probability of sharing the same character state because of parallel or convergent changes along long branches than do taxa that are more closely related because they have retained the same state from a common ancestor. Methods that systematically underestimate the actual amount of divergence may then become statistically inconsistent or "positively misleading" (Felsenstein, 1978; Hendy and Penny, 1989), estimating an incorrect tree with increasing certainty as the amount of character data increases. Although usually associated with parsimony methods, long-branch attraction can also afflict maximum likelihood and distance analyses when the assumed substitution models of these methods are strongly violated (e.g., Huelsenbeck and Hillis, 1993; Huelsenbeck, 1995; Waddell, 1995:377–404; Gaut and Lewis, 1995; Chang, 1996a; Lockhart et al., 1996; Sullivan and Swofford, 1997). In this case, although the methods are explicitly designed to deal with superimposed substitutions (multiple hits), the underlying models predict fewer of these than actually occur and thus do not go far enough in correcting for the problem. Inconsistency can also arise under parsimony, even when all branches have the same length (Kim, 1996), although in this case there must still be particular imbalances in the total lengths of the paths from internal nodes to tips of the tree; "long-path attraction" would describe this phenomenon.

Long-branch attraction has been widely used, and abused, in justifying choices of methods and in explaining anomalous results. Critics of the relevance of long-branch attraction and related artifacts have generally taken two tacks. The first (e.g., Farris, 1983) claims that the demonstration of long-branch attraction requires simple and unrealistic models of evolutionary change. As pointed out by Kim (1996), this argument lacks force because conditions that lead to inconsistency are much more general and complex than those outlined by Felsenstein (1978); further relaxation of Felsenstein's conditions simply exacerbates the problem.

The second line of argument (e.g., Siddall and Kluge, 1997) follows from the fact that "truth" is unknowable in science generally; because it is not possible to be certain that the analysis of a real data set has been compromised by long-branch attraction, the ability of a method to converge, in principle, to the correct solution with increasing amounts of data is irrelevant. In this view, "'accuracy' is rendered empty as an empirical claim" (Siddall and Kluge, 1997:318). Proponents of model-based (or statistical) methods that seek to avoid inconsistency attributable to long-branch or long-path artifacts have not been dissuaded by this argument. They certainly appreciate

the elusiveness of "truth" but understand that all methods are susceptible to failure under certain conditions. Consequently, these proponents seek methods and models that will succeed under a wide range of plausible conditions and that are less likely to yield misleading results purely because of artifacts. Historically, the different perspectives have led to a schism between those who would approach phylogenetics from a statistical perspective and those who place strong faith in one particular approach over all others. In many areas of science, the statistical modeling viewpoint tends to become more predominant as a subject matures. However, proponents of model-based methods in phylogenetics have not always helped their case by making overly assertive and sometimes misleading claims about the superiority of these methods (see Sidow, 1994; Hillis et al., 1994).

Against this backdrop of confusing and often acrimonious debate, Siddall (1998) offered a new challenge to the position that considerations of long-branch attraction favor model-based methods. Siddall's position, which seems reasonable at least on the surface, can be summarized simply: Although maximum likelihood and corrected-distance methods outperform parsimony methods in the so-called Felsenstein zone (four-taxon tree with two long, but unrelated, terminal branches and all other branches short), parsimony is better able to infer the correct tree topology in what Siddall calls the Farris zone, where the two long terminal branches instead lead to sister taxa (or are adjacent on an unrooted tree). Thus, if an unrooted phylogeny contains two long (terminal) branches plus three short branches, and the long branches are expected to lead to sister taxa about as often as they lead to nonsister taxa, then one might argue that there is no compelling reason for preferring one method over another on the basis of long-branch attraction. Waddell (1995) too had earlier referred to the Farris zone, calling it the "anti-Felsenstein" zone. Neither of these designations seems entirely appropriate, and we will use the term "inverse-Felsenstein zone" here. Siddall refers to the poor performance of maximum likelihood in the inverse-Felsenstein zone as "long-branch repulsion," a term used by Waddell (1995) for the significantly different problem of performance in the inverse-Felsenstein zone when the model is misspecified (e.g., overcorrection for among-site rate variation). However, we will use this term in Siddall's context for the present purposes.

We, and undoubtedly others, realized long ago that when long-branch attraction favors the correct unrooted tree for four taxa rather than one of the two incorrect trees, parsimony would outperform maximum likelihood in choosing a topology. Parsimony "succeeds" in the inverse-Felsenstein zone because it is a strongly biased method, the direction of the bias favoring the correct tree rather than an incorrect one, in contrast to the situation in the Felsenstein zone. This point was obvious enough not to merit publication on its own, although we have mentioned it in various other contexts (e.g., Swofford et al., 1995; Waddell, 1995). However, the portrayal by Siddall (1998) of this observation as a victory for parsimony methods demands closer scrutiny. Properly interpreted, the results of Siddall's simulations actually support the superiority of model-based methods for dealing with long-branch artifacts just as strongly as did those from earlier studies that concentrated on the Felsenstein zone. We emphasize at the outset, however, that the following analysis is not intended as a general criticism of the parsimony method. Rather, we show that results such as those of Siddall (1998) should not be taken as a vindication of parsimony with respect to one particular problem: sensitivity to long-branch-attraction artifacts.

PERFORMANCE OF MAXIMUM LIKELIHOOD IN THE INVERSE-FELSENSTEIN ZONE

Siddall's (1998) simulation results are summarized in Figure 1. Siddall's branch-length parameters were defined (Siddall, 1998:212) as the "expected percentage change of the ... branches." This refers to the expected percentage of sites for which the nucleotide at one end of a branch (internode or edge) differs from the nucleotide at the other end. To avoid ambiguity, we prefer to call this quantity the expected percentage difference. Under the model used for his simulations, this value, expressed as a proportion p, is a lower bound on the expected number of changes (substitutions) per site including multiple hits, which we will call d. The two measures are related by using the familiar distance

equation of Jukes and Cantor (1969):

$$d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$

and its inverse:

$$p = \frac{3}{4} - \frac{3}{4}\exp\left(-\frac{4}{3}d\right).$$

Thus, the longest branch length simulated by Siddall, p = 0.75, corresponds to an infinitely long branch, and the next longest, p = 0.675, corresponds to a mean of about 1.7 substitutions per site along the branch.

FIGURE 1. (a) Four-taxon model tree used by Siddall (1998). The probability of a difference in character states between the nodes incident to branches labeled a and b is given by pa and pb, respectively. (b) Parameter space investigated by Siddall, showing relative performance of parsimony and likelihood (under the Jukes–Cantor model) in various regions of this space.

Close examination of Siddall's (1998) simulation results immediately reveals some anomalies. The first involves his claim (1998:213 and his Fig. 4) that parsimony achieved high accuracy when all branch lengths were p = 0.75, in which case each sequence was fully randomized with respect to all others. A method that could successfully reconstruct the true topology most of the time from completely random data would be a powerful one indeed, but parsimony is not this method. We repeated these simulations using a research version of PAUP* 4.0d65 written by the first author. (With the long, biologically implausible, branches on trees simulated here, the likelihood surface with respect to branch-length parameters becomes extremely flat, so convergence of these parameters to their optimal values is very slow. To adjust for this, the limit on the maximum number of passes over the tree, MaxPass, was increased from the default value of 20 to 1,000, thus minimizing the possibility that failure of likelihood to converge to an optimal solution might affect accuracy rates.) Our results, shown in Table 1, are in complete accord with the prediction that given random data, parsimony can do no better than picking a tree at random, with a 1 in 3 chance of choosing the correct tree. For long but finite branch lengths (p = 0.675), parsimony performs somewhat better than a random tree selection, but even for 1,000 sites parsimony has a <50% chance of correctly inferring the tree. Thus, Siddall's statement that "with 1000 characters free to vary, [parsimony] reconstructed the correct model tree more than 95% of the time across the whole parameter" space is clearly untrue. Note that with long but finite branch lengths, maximum likelihood slightly outperforms parsimony (Table 1, p = 0.675 columns). This result is also at variance with Siddall's (1998:213) statement that "likelihood methods also recovered the correct topology with somewhat lower accuracies than parsimony when all branch rates were equal but high." Siddall claimed that this "same phenomenon" was evident in Huelsenbeck's (1995) simulation study but was "not noted." However, reexamination of Huelsenbeck's Appendix 1, where the relevant comparisons are presented (Huelsenbeck, 1995:37, rows 1 and 5), reveals no qualitative difference in the relative performance of parsimony and likelihood in the upper right corner of the graphs.

A second anomaly in Siddall's (1998) presentation is the suggestion that in the inverse-Felsenstein zone, the accuracy of likelihood methods declines irreversibly with increasing sequence length: "... as the number of characters was increased to 500 or 1000, the relative accuracies of all implementations of likelihood varied around 33% which is equivalent to randomly picking one of the three possible topologies for four taxa" (Siddall, 1998:213). This result is in direct opposition to theoretical predictions. Chang (1996b) and Rogers (1997) have independently proved that on binary trees with finite branch lengths, maximum likelihood is guaranteed to be statistically consistent when characters evolve according to a common mechanism under the assumptions of the model. These proofs establish that when assumptions of the model are met, as they are in these simulations, maximum likelihood methods should converge toward 100% accuracy with increasing sequence length at any point in the inverse-Felsenstein zone (or any other zone), except for points involving

TABLE 1. Performance of parsimony and likelihood (under the Jukes–Cantor model) when all five branch lengths are equal and long.

                                        Branch lengths
                       p = 0.75                            p = 0.675
Number
of sites  Method       Prop. correct^a  Prop. correct^b    Prop. correct^a  Prop. correct^b
100       Parsimony    0.403            0.3375             0.446            0.3770
          Likelihood   0.477            0.3395             0.513            0.3808
500       Parsimony    0.352            0.3205             0.472            0.4368
          Likelihood   0.426            0.3478             0.511            0.4453
1000      Parsimony    0.363            0.3405             0.487            0.4632
          Likelihood   0.389            0.3258             0.530            0.4828

a Proportion of correctly estimated trees in 1,000 simulation replicates using Siddall's system for handling tied trees. When more than one optimal tree is found, the result is considered fully correct if the true tree is contained in this set.
b Proportion of correctly estimated trees in 1,000 simulation replicates using our preferred system for handling tied trees. One-half credit is given if the true tree is one of two optimal trees; one-third credit is given if all three trees have equal scores.

infinite-length branches. Our own simulations are in accord with this prediction (as was acknowledged in Siddall's "note added in proof"), although the success rate does not monotonically approach perfect accuracy. For example, Figure 2 shows the results of our simulations for one fairly extreme inverse-Felsenstein-zone point evaluated by Siddall (1998). The accuracy of likelihood is higher for 100 sites than for 500 or 1,000 sites, so in the absence of relevant theory, one could not fault an investigator for guessing that the accuracy of the likelihood method might continue to decline with still longer sequences. However, the relevant theory does exist, and consistent with its prediction, increasing sequence length enables likelihood to eventually turn the corner and begin moving gradually toward 100% accuracy. In this example, the phylogenetic problem is simply so difficult that it is unreasonable to expect any relatively unbiased method to perform well without an extremely large amount of data, and the simulations confirm this intuition.

Rather than considering the possibility of an error in his simulations, Siddall (1998) adopts the position that Felsenstein's (1978) claim for the consistency of maximum likelihood estimation of phylogenetic trees, based on earlier work of Wald (1949), is not valid. Siddall (1998:215), following Farris (1997, 1999) and possibly Yang (1996), asserts that

    among Wald's (1949) criteria for consistency were requirements for independence and identical distributions, which sequenced nucleotides cannot have, and that the likelihood function is everywhere continuous and continuously differentiable with respect to the parameter of interest. Cladograms being discrete, it has yet to be explained how that condition can be satisfied or indeed what it would mean in this case.

Neither part of this statement is true. In principle, the sites of a nucleotide distribution certainly can be independently and identically distributed, whether or not they actually are so distributed in any particular case. Siddall and Kluge's (1997) earlier assertion that nucleotide characters cannot logically be independent is based on a

fundamental misunderstanding of the nature and application of the independence assumption. In any case, correlation between sites does not preclude consistency as long as the strength of the correlation decays at some very minor rate (Waddell et al., 1997). With regard to the claim of a requirement of continuity and differentiability of the likelihood function, Wald (1949:595) states explicitly that his proof "make[s] no differentiability assumptions (thus, not even the existence of the likelihood equation is postulated)." Furthermore, Chang (1996b) explicitly treats tree topology as a parameter in his proof of the consistency of maximum likelihood for estimating trees, which he refers to as a "customized variant" of Wald's proof. For L(θ), the likelihood function with respect to a parameter θ, the "likelihood equation" referred to by Wald (1949) is

$$\frac{\partial L(\theta)}{\partial \theta} = 0$$

(Kendall and Stuart, 1979:39). If such an equation exists, the optimal value of θ can be solved for as one of the roots, either explicitly or by iterative procedures such as Newton's method. This simplifies finding the optimal value of θ, but it is not a requirement for the consistency of the maximum likelihood method. In the case of an unordered, discrete-valued parameter such as tree topology, this simply means we must use some other method for searching the parameter space (e.g., a tree-searching algorithm such as exhaustive search, branch-and-bound, or branch swapping) to attempt to find the optimal "value" of this parameter. Although clumsier and more time-consuming than conventional mathematical solution procedures, this requirement is merely a practical problem for the method, not a theoretical one.

FIGURE 2. Performance of parsimony and likelihood with increasing sequence length for one point in the inverse-Felsenstein zone (pa = 0.675, pb = 0.15). Accuracy is measured as the proportion of correctly estimated trees in 1,000 simulation replicates. Parsimony achieves nearly perfect accuracy with only 50 sites. The accuracy of likelihood with very short sequences is helped by a bias favoring the correct tree. As sequence length increases, the bias exerts less influence and accuracy initially declines, but eventually moves toward the predicted 100% accuracy.

Confusion over this latter point may arise from Wald's declaration of his Assumption 1: "F(x, θ) is either discrete for all θ or is absolutely continuous for all θ." As Wald explains in the preceding sentence, F(x, θ) is the cumulative distribution function of the random variable x, which in the case of sequence data is the nucleotide site pattern, a discrete variable. So, in this case the distribution of the random variable is discrete and the second part of Assumption 1 does not apply. However, even in the case of continuous distribution functions, the required absolute continuity is with regard to the random variable x, not to the parameters θ. In the continuous case, continuity is required so that the probability density function f(x, θ), which is related to the distribution function by the equation

$$f(x, \theta) = \frac{\partial F(x, \theta)}{\partial x},$$

will always exist for all values of x. In the discrete case, as Wald notes, f(x, θ) is the probability of x, not the probability density, so the requirement of differentiability does not arise at all.

BIAS IN MAXIMUM LIKELIHOOD ESTIMATION

An estimator is biased if its expected value differs from its true (population) value. Even when maximum likelihood is consistent, it is not guaranteed to be unbiased. A well-known example is the maximum likelihood estimator of a population variance when the data are drawn from a normal distribution,

$$s^2 = \frac{\sum (X - \bar{X})^2}{n},$$

where n must be replaced by n − 1 to obtain an unbiased estimator. (In this case, X is a continuous random variable, but see Kuhner and Felsenstein (1994) for one approach to quantifying bias on discrete tree topologies.) When two terminal branches on a four-taxon tree are extremely long and the remaining three branches are short, maximum likelihood tree inference under the Jukes–Cantor model is affected by bias. The presence of bias is suggested by the results shown in Figure 2, where the performance of likelihood declines initially and then improves as sequence length increases. In this case, although estimates of underlying parameters (branch lengths) are biased, maximum likelihood manages to obtain a correct tree topology more often than either of the incorrect topologies at all sequence lengths. This is not always the case. Figure 3 shows that in the Felsenstein zone where the two long branches are not adjacent on the tree, a bias in likelihood causes the

accuracy rate for likelihood for very short sequences to be lower than randomly picking a tree. However, as the consistency proofs guarantee, the bias is eventually overcome and the accuracy of likelihood increases toward 100% with longer sequences. The only conditions under which Siddall's conclusion of equal preference for all three possible trees is realized involve infinitely long branches (whereas the consistency proofs require finite branch lengths). Because infinite branch lengths are not reasonable biologically, the performance of a method under these conditions is not highly relevant to the choice between methods, although one would hope that a method would not return a strong preference for any one four-taxon tree when two of the four sequences are completely random (see below).

FIGURE 3. Performance of parsimony and likelihood with increasing sequence length for the point in the Felsenstein zone analogous to the one used for the inverse-Felsenstein zone simulations of Figure 2. Accuracy is measured as the proportion of correctly estimated trees in 1,000 simulation replicates. At 50 or fewer sites, likelihood actually does slightly worse than picking a tree at random (because of bias), but with increasing sequence length the bias decays and the correct tree is recovered with increasing certainty.

BIAS IN PARSIMONY ANALYSIS

Despite the errors in Siddall's simulation results and their interpretation, his primary conclusion is correct: in the inverse-Felsenstein zone of four-taxon branch-length space, parsimony estimates a phylogeny correctly more often than does maximum likelihood. We think it informative to examine the reason for the superior performance by parsimony under these conditions.

For the simple model of evolution simulated by Siddall, one can calculate the probability that an apparent synapomorphy uniting the two long-branch taxa is in fact due to homoplasy. (Here, we use the term synapomorphy in an unrooted sense; it will correspond to its traditional meaning if any one of the four terminal taxa is designated as an outgroup.) In the extreme end of the inverse-Felsenstein zone, an overwhelming number of apparent synapomorphies link the two long-branch taxa together. However, the synapomorphies uniting the two long branches can arise in many different ways. Figure 4 illustrates a few of the different character histories that can lead to an apparent synapomorphy linking the long-branch taxa. For this example, the nucleotides observed at

the tips of the tree are C for the long-branch taxa and A for the remaining two taxa. All of the examples except that shown in Figure 4a involve more than one change. In all cases, however, the parsimony method interprets the history of the character as a single change that occurred along the internal branch of the tree. In other words, except for the single example of Figure 4a, parsimony misinterprets homoplasy as evidence of relationship (in this case, as a relationship uniting the two long-branch taxa). This would not be a problem for the parsimony method if the probability is small that homoplasy underlies the apparent synapomorphies. In the inverse-Felsenstein zone, however, a vast majority of the apparent synapomorphies uniting the long-branch taxa are due to homoplasy. Consider the point in the parameter space analyzed in Figure 2, where the expected numbers of changes on long and short branches are 1.727 and 0.167, respectively. The probability that a single change will occur along the internal branch but no change will occur along the remaining branches for this tree is

$$
\begin{aligned}
\Pr[\text{True Synapomorphy}] &= \Pr[\text{No change on long terminal branches}] \\
&\quad \times \Pr[\text{No change on short terminal branches}] \\
&\quad \times \Pr[\text{single change on internal branch}] \\
&\approx (e^{-1.727})^2 (e^{-0.167})^2 (0.167\,e^{-0.167}) \\
&\approx 0.0032
\end{aligned}
$$

On the other hand, the probability of observing a pattern of nucleotides in which the two taxa on one side of the central branch share the same nucleotide and that nucleotide is different from a nucleotide shared by the two taxa on the opposite side of the central branch is 0.1172 (obtained as the sum of the single-site likelihoods for all xxyy-type patterns, e.g., AACC, AAGG, ..., TTCC). Thus, about (0.1172 − 0.0032)/0.1172, or 97%, of all apparent synapomorphies will actually be misinterpreted homoplasies. At more extreme points of the parameter space examined by Siddall, the misinterpretation becomes even more pronounced. For example, at the second most extreme point simulated by Siddall (pa = 0.675, pb = 0.0075), about 99.8% of apparent synapomorphies supporting the true tree will in fact be misinterpreted homoplasies!

FIGURE 4. Different scenarios that will lead to an apparent synapomorphy. (a) A "true" synapomorphy. (b, c) Two scenarios in which an apparent synapomorphy is actually the result of misinterpreted homoplasy.

Figures 5 and 6 summarize the relative contribution of actual synapomorphies (those apparent synapomorphies that arise from a single change along the internal branch of the tree) versus misinterpreted homoplasy for the full parameter space explored by Siddall (1998). Figure 5a shows the expected proportion of parsimony-informative sites for which the two internal nodes have a different state and each pair of adjacent taxa have the same state (this includes both true synapomorphies as well as sites for which multiple substitutions have occurred along a branch but parsimony correctly reconstructs the ancestral states). The expected proportion of parsimony-informative sites that are apparent synapomorphies resulting from homoplasy in the long-branch taxa is shown

FIGURE 5. Contour plots showing the proportion of parsimony-informative sites for which (a) parsimony correctly reconstructs the states at the internal nodes, and this reconstruction suggests an apparent synapomorphy, and (b) parsimony misinterprets parallel changes in the terminal branches as a synapomorphy. See Figure 1 for definition of pa and pb.
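The share of apparent synapomorphies that are really homoplasy, worked out in the text for the point analyzed in Figure 2 (pa = 0.675, pb = 0.15), can be reproduced with a short calculation. The following is a minimal sketch under the Jukes–Cantor model, not the procedure used to produce the figures; the function names (`jc_distance`, `jc_match`, `true_synapomorphy_prob`, `apparent_synapomorphy_prob`, `misinterpreted_fraction`) are hypothetical and introduced only for illustration. It assumes the tree ((A,B),(C,D)) with A and B on the long branches and the internal branch as short as the short terminal branches.

```python
import itertools
import math


def jc_distance(p):
    """Jukes-Cantor: expected substitutions per site d for a difference proportion p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_match(p, same):
    """Probability of one specific end state on a branch with difference proportion p."""
    return 1.0 - p if same else p / 3.0


def true_synapomorphy_prob(p_long, p_short):
    """Exactly one change on the internal branch, none on the four terminal branches."""
    d_long, d_short = jc_distance(p_long), jc_distance(p_short)
    no_change_long = math.exp(-d_long) ** 2          # two long terminal branches
    no_change_short = math.exp(-d_short) ** 2        # two short terminal branches
    one_change_internal = d_short * math.exp(-d_short)
    return no_change_long * no_change_short * one_change_internal


def apparent_synapomorphy_prob(p_long, p_short):
    """Total probability of xxyy patterns: A = B = x, C = D = y, x != y."""
    total = 0.0
    for x, y in itertools.permutations(range(4), 2):        # observed tip states
        for u, v in itertools.product(range(4), repeat=2):  # unobserved internal states
            total += (0.25                                   # uniform state at node u
                      * jc_match(p_long, u == x) ** 2        # A and B both show x
                      * jc_match(p_short, u == v)            # internal branch
                      * jc_match(p_short, v == y) ** 2)      # C and D both show y
    return total


def misinterpreted_fraction(p_long, p_short):
    """Fraction of apparent synapomorphies that are not single internal-branch changes."""
    apparent = apparent_synapomorphy_prob(p_long, p_short)
    return (apparent - true_synapomorphy_prob(p_long, p_short)) / apparent


print(round(true_synapomorphy_prob(0.675, 0.15), 4))
print(round(apparent_synapomorphy_prob(0.675, 0.15), 4))
print(round(misinterpreted_fraction(0.675, 0.15), 3))
```

Calling `misinterpreted_fraction(0.675, 0.0075)` gives the corresponding value for the second most extreme point mentioned in the text. Enumerating the internal-node states directly keeps the sketch close to the "sum of single-site likelihoods" description, at the cost of efficiency that does not matter for four taxa.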

FIGURE 6. Contour plots showing the proportion of parsimony-informative sites that represent true synapomorphies (a single change along the internal branch) with no changes in the terminal branch. In the extreme regions of the inverse-Felsenstein zone (upper left corner), nearly all of the parsimony-informative characters support the true tree, but almost none of them will be true synapomorphies. See Figure 1 for definition of pa and pb.

in Figure 5b. Figure 6 illustrates the bottom line; almost all of the good performance by parsimony in the inverse-Felsenstein zone is due to sites with more than one substitution. Siddall (citing Farris, 1983) was apparently aware that parsimony's performance was being boosted by misinterpreted homoplasy, as suggested by the following statement (Siddall, 1998:216): "the reason [that parsimony does well in the inverse-Felsenstein zone] is that the number of synapomorphies recovered for a pair of sister taxa need not all actually be homologies for the method to have behaved correctly." We would not dispute this statement in the least. However, we would add that most researchers would be worried if they knew that 99% (or more) of the apparent support for an "optimal" tree came from an inherent bias in the method used rather than from actual phylogenetic signal. Surprisingly, Siddall seems entirely comfortable with this possibility, referring to parsimony as "positively leading" (1998:216) in the inverse-Felsenstein zone.

The ultimate cause of the bias toward trees that group "long-branch" taxa is that parsimony severely underestimates the true number of substitutions that occur along the long branches. It is important to remember that likelihood methods can have similar problems when their models are strongly violated. In general, if the violation of the model is such that the assumed model is too simple (e.g., if high transition/transversion ratios or among-site rate variations are ignored), underestimation of the actual number of substitutions can lead to inconsistency of likelihood in the Felsenstein zone (Waddell, 1995:377–385; Gaut and Lewis, 1995; Chang, 1996a; Sullivan and Swofford, 1997) and overconfidence in the inverse-Felsenstein zone (Waddell, 1995:385–398; Bruno and Halpern, 1999). However, attempting to account for multiple substitutions by using an oversimplified model is a step in the right direction, whereas ignoring them entirely is to accept ignorance. Maximum likelihood methods are much more robust to artifacts of long-branch attraction than are parsimony methods, even when their assumed models are inadequate.

LONG-BRANCH REPULSION OR ABSENCE OF LONG-BRANCH ATTRACTION?

If the true unrooted tree for four taxa has an internal branch length close enough to zero that the information in the resulting sequences is insufficient to reliably choose one of the trees over the others, then the tree is effectively a star tree. In this case, if a method is unbiased, it should choose equivocally: it might favor all three trees equally, or it might choose one tree at random (and, ideally, discover as well that the other trees were not significantly different). By this argument, if a method correctly chooses the true tree one-third of the time, then it is successful, even though it chooses an incorrect topology the other two-thirds of the time. On the other hand, if a biased method is used when the true tree is effectively a star tree, one topology will be preferred over the others. If there are exactly three choices and the available information is inadequate to decide among them, then the method is failing if it deviates strongly from a 1 in 3 preference for each choice. In this case, a method obviously is failing if it preferentially chooses the wrong tree, but perhaps less obviously, it is also failing if it always favors the correct tree.

Siddall (1998) focused on a near-star tree in the inverse-Felsenstein zone for which there was little or no information in the sequences to distinguish among the three possible trees and found that parsimony nonetheless chooses the correct tree topology most of the time. A similar situation exists for a near-star tree in the Felsenstein zone, except that parsimony usually chooses the same topology, which is now incorrect. In both cases, parsimony is failing rather than succeeding, its failure following directly from its bias. Likelihood, on the other hand, is succeeding in both of these cases because its choice is much closer to a random one in both zones. Virtually any method for assessing reliability, including bootstrapping (Felsenstein, 1985), jackknifing (Penny and Hendy, 1986; Felsenstein, 1988; Farris et al., 1996), or the test of Kishino and Hasegawa (1989), will fail to find significant support for any of the three possible topologies under the likelihood criterion. In this case, it is not appropriate to say the likelihood is failing because of "long-branch repulsion;" rather, it is succeeding in remaining uncommitted when the data do not decisively support any single topology. If indeed likelihood were affected by long-branch repulsion, then it would obtain the correct topology significantly less than one-third of the time, which it does not do.

This basic notion can be encapsulated in the simulation results shown in Figure 7. This simulation evaluates the relative performance of parsimony and likelihood for three sequence lengths as the tree approaches a star tree from the inverse-Felsenstein zone, becomes an exact star tree, and then moves into the Felsenstein zone. Likelihood methods do well in both zones when the central branch length is at least 0.04 substitutions per site and the number of sites is not small. Parsimony is inconsistent in the Felsenstein zone for all branch lengths in the range of 0–0.05 substitutions per site (and even higher), doing better for shorter sequences than longer ones. As the central branch length shrinks toward zero in both zones, the accuracy of likelihood decreases, reaching the expected 1 in 3 accuracy rate when the central branch is extremely small in either zone. Bootstrap (as well as jackknife) support is low for all three trees (details not shown). Parsimony, on the other hand, abruptly shifts from nearly perfect accuracy to complete inaccuracy on either side of the zero point. When the central branch is extremely short, parsimony simply chooses the tree that groups the long-branch taxa, regardless of what the true tree might be, with high bootstrap support for either the correct or the incorrect result. This behavior of parsimony in the extreme regions of the Felsenstein and inverse-Felsenstein zones is analogous to an oracle who responds to any question by responding "0.492." If the question asked is, "What is the sum of 0.450 and 0.042?" or "What is 3 times 0.164?" the oracle will answer correctly, but presumably once interrogators realized that the answer was always the same regardless of the question, they would not be ready to give up their electronic calculators. There are times when "I don't know" is a better answer than a confident guess that has a high probability of being incorrect.

It is interesting, and somewhat amusing, to examine the performance of a phenetic

FIGURE 7. Results of simulation comparing the behavior of parsimony with likelihood in the transition between the inverse-Felsenstein and Felsenstein zones. The lengths of the terminal branches, in expected substitutions per site, are 0.5 (long branches) and 0.05 (short branches). Accuracy is measured as the proportion of correctly estimated trees in 1,000 simulation replicates. (a) Parsimony shifts abruptly from nearly perfect accuracy to nearly complete inaccuracy as the true tree goes from being a near-star tree in the inverse-Felsenstein zone to a near-star tree in the Felsenstein zone, especially for sequence lengths of 1,000 or longer. (b) When the internal branch length is not extremely small or the sequence length is not too short, likelihood achieves reasonable accuracy in both the inverse-Felsenstein and Felsenstein zones. As the internal branch becomes progressively shorter, likelihood is less able to infer the tree correctly, appropriately reflecting the lack of resolving power in the data. In both plots, the points just to the left and right of the zero on the abscissa represent branches of infinitesimal length in the inverse-Felsenstein and Felsenstein zones, respectively.
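The parsimony half of this comparison can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the simulation code used for Figure 7: it assumes a Jukes-Cantor substitution model, a four-taxon true tree AB|CD, and random resolution of ties, and every function and constant name is hypothetical. Likelihood is not implemented.

```python
import math
import random

STATES = "ACGT"
LONG, SHORT = 0.5, 0.05  # terminal branch lengths given in the Figure 7 caption

# Terminal branch lengths for taxa (A, B, C, D); the true tree is always AB|CD.
INVERSE_FELSENSTEIN = (LONG, LONG, SHORT, SHORT)  # long branches are sisters
FELSENSTEIN = (LONG, SHORT, LONG, SHORT)          # long branches are not sisters


def jc_change_prob(t):
    """Jukes-Cantor probability that a site differs across a branch of length t
    (assumed model; t is in expected substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def evolve(state, t, rng):
    """Return the state at the end of a branch of length t."""
    if rng.random() < jc_change_prob(t):
        return rng.choice([s for s in STATES if s != state])
    return state


def simulate_site(lengths, internal, rng):
    """Simulate one site on the tree AB|CD with the given internal branch."""
    x = rng.choice(STATES)        # internal node joining A and B
    y = evolve(x, internal, rng)  # internal node joining C and D
    a, b, c, d = lengths
    return (evolve(x, a, rng), evolve(x, b, rng),
            evolve(y, c, rng), evolve(y, d, rng))


def parsimony_choice(sites, rng):
    """Pick the topology supported by the most parsimony-informative sites."""
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    best = max(counts.values())
    # Ties are broken at random (an assumption of this sketch).
    return rng.choice([tree for tree, n in counts.items() if n == best])


def accuracy(lengths, internal, n_sites, n_reps, seed=1):
    """Proportion of replicates in which parsimony recovers AB|CD."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        sites = [simulate_site(lengths, internal, rng) for _ in range(n_sites)]
        hits += parsimony_choice(sites, rng) == "AB|CD"
    return hits / n_reps
```

Calling `accuracy(INVERSE_FELSENSTEIN, 0.0, 1000, 100)` and `accuracy(FELSENSTEIN, 0.0, 1000, 100)` evaluates parsimony on an exact star tree in each arrangement. Under these assumptions one would expect the first value to be close to 1 and the second close to 0, although the data carry no information about the internal branch in either case.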

FIGURE 8. Performance of UPGMA in the transition between the inverse-Felsenstein and Felsenstein zones mimics that of parsimony (see Fig. 7a), reinforcing the position that the high “accuracy” of parsimony in the inverse-Felsenstein zone is purely the result of bias.
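The criterion that a method applied to an effectively star-like tree should not deviate strongly from a 1 in 3 preference for each topology can be checked numerically. The sketch below is one possible way to do so and is not taken from the paper: it assumes a chi-square goodness-of-fit test against equal frequencies, and the function name is hypothetical.

```python
import math


def star_tree_bias_test(counts):
    """Chi-square goodness-of-fit test of three topology counts against 1/3 each.

    counts: number of replicates in which each of the three possible
    four-taxon topologies was chosen when the true tree is a star tree.
    Returns (chi2, p_value). With 2 degrees of freedom the chi-square
    survival function is exactly exp(-chi2 / 2).
    """
    if len(counts) != 3:
        raise ValueError("expected counts for exactly three topologies")
    n = sum(counts)
    expected = n / 3.0
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value
```

A method that chooses the same topology in every replicate, for example `star_tree_bias_test([1000, 0, 0])`, yields a vanishingly small p-value whether or not the favored topology happens to be the true one, whereas counts near `[334, 333, 333]` give no evidence of bias.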

clustering method, UPGMA, under the same conditions (Fig. 8). Because UPGMA determines a rooted tree, we can disregard the position of the root in order to compare its performance fairly with that of the intrinsically unrooted parsimony and maximum likelihood methods. UPGMA demonstrates almost exactly the same behavior as parsimony. It finds the correct tree 100% of the time in the inverse-Felsenstein zone, at the price of missing it nearly 100% of the time in the Felsenstein zone. Application of UPGMA as a phylogenetic method requires the assumption of a “molecular clock” (equal rates of substitution in all lineages), but violation of this assumption does not necessarily lead to reduced accuracy: the method’s inherent bias works in its favor if the taxa that are most similar are in fact close relatives. Reasoning analogous to Siddall’s could then be used to argue that UPGMA is behaving properly (even outperforming parsimony) by avoiding “long-branch repulsion” in the inverse-Felsenstein zone when the clock assumption is violated. We doubt, however, that many proponents of parsimony methods would find this argument compelling.

CONCLUSIONS

The demonstration that parsimony analysis can, under specific conditions, achieve greater accuracy than maximum likelihood fails to rescue parsimony from the criticism that potential biases can lead it to support or reject alternative topologies strongly when information is insufficient for reaching a definitive conclusion. Although the property of providing strong support for a correct, but near-star, tree in the inverse-Felsenstein zone seems desirable, this advantage is negated if the method also provides strong support for an incorrect, but near-star, tree in the Felsenstein zone. Whatever good properties parsimony might have (and we do not deny their existence), strong commitment to a topology purely on the basis of the length of a tree’s terminal branches is not one of them.

It is often suggested that the conditions under which maximum likelihood outperforms parsimony are extreme, whereas under more “typical” conditions this advantage disappears. Siddall has taken a different position, claiming to have found a “limiting case” for which likelihood methods, rather than parsimony, “more often than not will fail to converge on the correct model topology”
(Siddall, 1998:211). We have shown that this claim is simply false and further suggest that most scientists would prefer to use methods that are honest about how strongly a result is supported than to use a method that pretends that a result is strongly supported when the majority of that support is a consequence of bias. When interpreted properly, simulation studies such as those of Siddall (1998) merely reinforce arguments for the utility of model-based methods, including maximum likelihood, for phylogenetic analysis of molecular sequence data. These methods acknowledge the inevitability of multiple substitutions and explicitly accommodate them as a fundamental component of their operation. The parsimony method is useful and powerful in many situations, but its ability to obtain a “correct” result for reasons that are clearly inappropriate should not be used as an argument in its favor.

ACKNOWLEDGMENTS

We thank Hirohisa Kishino, Jack Sullivan, Jianzhi Zhang, and members of the “Phylobrew” group (especially Frank “Andy” Anderson, Robb Brumfield, Kevin de Queiroz, Jim McGuire, Steve Poe, and Jim Wilgenbusch) for helpful discussion and editorial suggestions. Correspondence with Ziheng Yang was extremely helpful in clarifying our ideas on Wald’s consistency proof and related ideas, although this should not be taken as complete endorsement of what we have written here. We thank Ron DeBry, Mark Siddall, and an anonymous reviewer for suggestions that improved the clarity and accuracy of the paper. Finally, the “oracle” analogy is not original to this paper. We have heard variations on it from Joe Felsenstein and Bret Larget, among others.

This work was supported by the following grants: Marsden Fund of New Zealand to P.J.W., National Science Foundation (NSF) DEB-0075406 to J.P.H., NSF DEB-9628835 to J.S.R., NSF DEB-9974124 to D.L.S., and an Alfred P. Sloan/NSF Young Investigator Award to P.O.L.

REFERENCES

BRUNO, W. J., AND A. L. HALPERN. 1999. Topological bias and inconsistency of maximum likelihood using wrong models. Mol. Biol. Evol. 16:564–566.
CHANG, J. T. 1996a. Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. Math. Biosci. 134:189–215.
CHANG, J. T. 1996b. Full reconstruction of Markov models on evolutionary trees: Identifiability and consistency. Math. Biosci. 137:51–73.
FARRIS, J. S. 1983. The logical basis of phylogenetic analysis. Pages 7–36 in Advances in cladistics, Volume 2 (N. I. Platnick and V. A. Funk, eds.). Columbia Univ. Press, New York.
FARRIS, J. S. 1997. “Who, really is a statistician?” Paper presented at the Sixteenth Meeting of the Willi Hennig Society. George Washington Univ. Washington, DC.
FARRIS, J. S. 1999. Likelihood and inconsistency. Cladistics 15:199–204.
FARRIS, J. S., V. A. ALBERT, D. LIPSCOMB, AND A. G. KLUGE. 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12:99–124.
FELSENSTEIN, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
FELSENSTEIN, J. 1988. Phylogenies from molecular sequences: Inference and reliability. Annu. Rev. Genet. 22:521–565.
GAUT, B. S., AND P. O. LEWIS. 1995. Success of maximum likelihood in the four-taxon case. Mol. Biol. Evol. 12:152–162.
HENDY, M. D., AND D. PENNY. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297–309.
HILLIS, D. M., J. P. HUELSENBECK, AND D. L. SWOFFORD. 1994. Hobgoblin of phylogenetics? Nature 369:363–364.
HUELSENBECK, J. 1995. Performance of phylogenetic methods in simulation. Syst. Biol. 44:17–48.
HUELSENBECK, J. P., AND D. M. HILLIS. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Zool. 42:247–264.
JUKES, T. H., AND C. R. CANTOR. 1969. Evolution of protein molecules. Pages 21–132 in Mammalian protein metabolism (H. N. Munro, ed.). Academic Press, New York.
KENDALL, M., AND A. STUART. 1979. Advanced theory of statistics, 2nd edition. Charles Griffin, London.
KIM, J. 1996. General inconsistency conditions for maximum parsimony: Effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45:363–374.
KISHINO, H., AND M. HASEGAWA. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
KUHNER, M. K., AND J. FELSENSTEIN. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459–468.
LOCKHART, P. J., A. W. LARKUM, M. A. STEEL, P. J. WADDELL, AND D. PENNY. 1996. Evolution of chlorophyll and bacteriochlorophyll: The problem of invariant sites in sequence analysis. Proc. Natl. Acad. Sci. USA 93:1930–1934.
PENNY, D., AND M. D. HENDY. 1986. Estimating the reliability of evolutionary trees. Mol. Biol. Evol. 3:403–417.
ROGERS, J. S. 1997. On the consistency of maximum likelihood estimation of phylogenetic trees from nucleotide sequences. Syst. Biol. 46:354–357.
SIDDALL, M. E. 1998. Success of parsimony in the four-taxon case: Long-branch repulsion by likelihood in the Farris zone. Cladistics 14:209–220.
SIDDALL, M. E., AND A. G. KLUGE. 1997. Probabilism and phylogenetic inference. Cladistics 13:313–336.
SIDOW, A. 1994. Parsimony or statistics? Nature 367:26–27.
SULLIVAN, J., AND D. L. SWOFFORD. 1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J. Mammal. Evol. 4:77–86.
SWOFFORD, D. L., P. O. LEWIS, AND P. J. WADDELL. 1995. “The phylogenetic utility of LogDet/paralinear distances for more realistic evolutionary models. I. Is there a heavy price for using them when a simpler model would suffice?” Paper presented at annual meeting of the Society for the Study of Evolution/Society of Systematic Biologists, Montreal, Canada.
WADDELL, P. J. 1995. Statistical methods of phylogenetic analysis, including Hadamard conjugations, LogDet transforms, and maximum likelihood. Massey Univ., Palmerston North, New Zealand.
WADDELL, P. J., D. PENNY, AND T. MOORE. 1997. Hadamard conjugations and modeling sequence evolution with unequal rates across sites. Mol. Phylogenet. Evol. 8:33–50.
WALD, A. 1949. Note on the consistency of maximum likelihood. Ann. Math. Stat. 20:595–601.
YANG, Z. 1996. Phylogenetic analysis using parsimony and likelihood methods. J. Mol. Evol. 42:294–307.

Received 19 January 2000; accepted 18 March 2000
Associate Editor: R. Olmstead