Bias in Phylogenetic Estimation and Its Relevance to the Choice

Syst. Biol. 50(4):525–539, 2001

Bias in Phylogenetic Estimation and Its Relevance to the Choice between Parsimony and Likelihood Methods

DAVID L. SWOFFORD,1,6 PETER J. WADDELL,2 JOHN P. HUELSENBECK,3 PETER G. FOSTER,1,7 PAUL O. LEWIS,4 AND JAMES S. ROGERS5

1 Laboratory of Molecular Systematics, National Museum of Natural History, Smithsonian Institution Museum Support Center, 4210 Silver Hill Road, Suitland, Maryland 20746, USA; E-mail: [email protected]
2 Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand; E-mail: [email protected]
3 Department of Biology, University of Rochester, Rochester, New York 14627, USA; E-mail: [email protected]
4 Department of Ecology and Evolutionary Biology, The University of Connecticut, U-43, 75 N. Eagleville Road, Storrs, Connecticut 06269-30437, USA; E-mail: [email protected]
5 Department of Biological Sciences, University of New Orleans, New Orleans, Louisiana 70148, USA; E-mail: [email protected]
6 Current address: The Natural History Museum, Cromwell Road, London SW7 5BD, U.K.; E-mail: [email protected]
7 Current address: School of Computational Science and Information Technology, Florida State University, Tallahassee, Florida, 32306-4120.

It is now widely recognized that under relatively simple models of stochastic change, phylogenetic inference methods can actively mislead investigators attempting to estimate evolutionary trees from molecular sequences and other data. One instance of this phenomenon is "long-branch attraction," in which some pairs of taxa have a higher probability of sharing the same character state because of parallel or convergent changes along long branches than do taxa that are more closely related because they have retained some same state from a common ancestor. Methods that systematically underestimate the actual amount of divergence may then become statistically inconsistent or "positively misleading" (Felsenstein, 1978; Hendy and Penny, 1989), estimating an incorrect tree with an increasing certainty as the amount of character data increases. Although usually associated with parsimony methods, long-branch attraction can also afflict maximum likelihood and distance analyses when the assumed substitution models of these methods are strongly violated (e.g., Huelsenbeck and Hillis, 1993; Huelsenbeck, 1995; Waddell, 1995:377–404; Gaut and Lewis, 1995; Chang, 1996a; Lockhart et al., 1996; Sullivan and Swofford, 1997). In this case, although the methods are explicitly designed to deal with superimposed substitutions (multiple hits), the underlying models predict fewer of these than actually occur and thus do not go far enough in correcting for the problem. Inconsistency can also arise under parsimony, even when all branches have the same length (Kim, 1996), although in this case there must still be particular imbalances in the total lengths of the paths from internal nodes to tips of the tree; "long-path attraction" would describe this phenomenon.

Long-branch attraction has been widely used, and abused, in justifying choices of methods and in explaining anomalous results. Critics of the relevance of long-branch attraction and related artifacts have generally taken two tacks. The first (e.g., Farris, 1983) claims that the demonstration of long-branch attraction requires simple and unrealistic models of evolutionary change. As pointed out by Kim (1996), this argument lacks force because conditions that lead to inconsistency are much more general and complex than those outlined by Felsenstein (1978); further relaxation of Felsenstein's conditions simply exacerbates the problem. The second line of argument (e.g., Siddall and Kluge, 1997) follows from the fact that "truth" is unknowable in science generally; because it is not possible to be certain that the analysis of a real data set has been compromised by long-branch attraction, the ability of a method to converge, in principle, to the correct solution with increasing amounts of data is irrelevant. In this view, "'accuracy' is rendered empty as an empirical claim" (Siddall and Kluge, 1997:318). Proponents of model-based (or statistical) methods that seek to avoid inconsistency attributable to long-branch or long-path artifacts have not been dissuaded by this argument. They certainly appreciate the elusiveness of "truth" but understand that all methods are susceptible to failure under certain conditions. Consequently, these proponents seek methods and models that will succeed under a wide range of plausible conditions and that are less likely to yield misleading results purely because of artifacts. Historically, the different perspectives have led to a schism between those who would approach phylogenetics from a statistical perspective and those who place strong faith in one particular approach over all others. In many areas of science, the statistical modeling viewpoint tends to become more predominant as a subject matures. However, proponents of model-based methods in phylogenetics have not always helped their case by making overly assertive and sometimes misleading claims about the superiority of these methods (see Sidow, 1994; Hillis et al., 1994).

Against this backdrop of confusing and often acrimonious debate, Siddall (1998) offered a new challenge to the position that considerations of long-branch attraction favor model-based methods. Siddall's position, which seems reasonable at least on the surface, can be summarized simply: Although maximum likelihood and corrected-distance methods outperform parsimony methods in the so-called Felsenstein zone (four-taxon tree with two long, but unrelated, terminal branches and all other branches short), parsimony is better able to infer the correct tree topology in what Siddall calls the Farris zone, where the two long terminal branches instead lead to sister taxa (or are adjacent on an unrooted tree). Thus, if an unrooted phylogeny contains two long (terminal) branches plus three short branches, and the long branches are expected to lead to sister taxa about as often as they lead to nonsister taxa, then one might argue that there is no compelling reason for preferring one method over another on the basis of long-branch attraction. Waddell (1995) too had earlier referred to the Farris zone, calling it the "anti-Felsenstein" zone. Neither of these designations seems entirely appropriate, and we will use the term "inverse-Felsenstein zone" here. Siddall refers to the poor performance of maximum likelihood in the inverse-Felsenstein zone as "long-branch repulsion," a term used by Waddell (1995) for the significantly different problem of performance in the inverse-Felsenstein zone when the model is misspecified (e.g., overcorrection for among-site rate variation). However, we will use this term in Siddall's context for the present purposes.

We, and undoubtedly others, realized long ago that when long-branch attraction favors the correct unrooted tree for four taxa rather than one of the two incorrect trees, parsimony would outperform maximum likelihood in choosing a topology. Parsimony "succeeds" in the inverse-Felsenstein zone because it is a strongly biased method, the direction of the bias favoring the correct tree rather than an incorrect one, in contrast to the situation in the Felsenstein zone. This point was obvious enough not to merit publication on its own, although we have mentioned it in various other contexts (e.g., Swofford et al., 1995; Waddell, 1995). However, the portrayal by Siddall (1998) of this observation as a victory for parsimony methods demands closer scrutiny. Properly interpreted, the results of Siddall's simulations actually support the superiority of model-based methods for dealing with long-branch artifacts just as strongly as did those from earlier studies that concentrated on the Felsenstein zone. We emphasize at the outset, however, that the following analysis is not intended as a general criticism of the parsimony method. Rather, we show that results such as those of Siddall (1998) should not be taken as a vindication of parsimony with respect to one particular problem: sensitivity to long-branch-attraction artifacts.

PERFORMANCE OF MAXIMUM LIKELIHOOD IN THE INVERSE-FELSENSTEIN ZONE

Siddall's (1998) simulation results are summarized in Figure 1. Siddall's branch-length parameters were defined (Siddall, 1998:212) as the "expected percentage change of the ... branches." This refers to the expected percentage of sites for which the nucleotide at one end of a branch (internode or edge) differs from the nucleotide at the other end. To avoid ambiguity, we prefer to call this quantity the expected percentage difference. Under the model used for his simulations, this value, expressed as a proportion p, is a lower bound on the expected number of changes (substitutions) per site including multiple hits, which we will call d. The two measures are related by using the familiar distance equation of Jukes and Cantor (1969):

$$ d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right) $$

and its inverse:

$$ p = \frac{3}{4}\left(1 - \exp\left(-\frac{4}{3} d\right)\right). $$

Thus, the longest branch length simulated by Siddall, p = 0.75, corresponds to an infinitely long branch, and the next longest, p = 0.675, corresponds to a mean of about 1.7 substitutions per site along the branch.

FIGURE 1. (a) Four-taxon model tree used by Siddall (1998). The probability of a difference in character states between the nodes incident to branches labeled a and b is given by p_a and p_b, respectively. (b) Parameter-space investigated by Siddall, showing relative performance of parsimony and likelihood (under the Jukes–Cantor model) in various regions of this space.

Close examination of Siddall's (1998) simulation results immediately reveals some anomalies. The first involves his claim (1998:213 and his Fig. 4) that parsimony
